Ask HN: Anyone Using a Mac Studio for Local AI/LLM?
57 points by UmYeahNo 17 days ago | 35 comments

Curious to know your experience running local LLMs with a well-specced M3 Ultra or M4 Max Mac Studio. I don't see a lot of discussion of the Mac Studio for local LLMs, but it seems like you could put big models in memory with the shared VRAM. I assume token generation would be slow, but you might get higher-quality results because you can put larger models in memory.

I've only had it a couple of months, but so far it's proving its worth in the quality of LLM output, even quantized.
I generally run Qwen3-VL at 235B, quantized to Q4_K_M so that it fits. That leaves me plenty of RAM for workstation tasks while delivering around 30 tok/s.
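For anyone sanity-checking what fits in their RAM, a quick back-of-envelope in Python. The bits-per-weight figures are rough averages for these quant formats, not exact GGUF file sizes:

    # Rough weight-memory estimate for llama.cpp-style quants.
    # bpw averages are approximate: Q4_K_M ~ 4.8, Q8_0 ~ 8.5 (illustrative).
    def weight_gb(params_billions: float, bits_per_weight: float) -> float:
        # 1e9 weights per "B" * (bpw / 8) bytes per weight, reported in GB
        return params_billions * bits_per_weight / 8

    print(f"235B @ Q4_K_M: ~{weight_gb(235, 4.8):.0f} GB")  # ~141 GB
    print(f"235B @ Q8_0:   ~{weight_gb(235, 8.5):.0f} GB")  # ~250 GB

That's roughly why Q4_K_M of the 235B model fits with headroom on a maxed-out Studio while Q8 of the same model would not on a 256GB config.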
I use the smaller Qwen3 models (like Qwen3-Coder) in tandem; they run much faster, of course, and I tend to run them at higher quants, up to Q8, for quality.
The gigantic RAM's biggest boon, I've found, is that I can run the models with full context allocated, which lets me hand them larger and more complicated tasks than I could before. This alone makes the money I spent worth it, IMO.
I did manage to get GLM-4.7 (a 358B model) running at Q3; its output is adequate quality-wise, though it only delivers around 15 tok/s, and I had to cut the context down to 128k to leave enough room for the desktop.
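To see why context eats so much room on top of the weights, the standard KV-cache arithmetic helps. I don't know GLM's actual layer/head counts, so every architecture number below is purely illustrative:

    # KV cache footprint: keys + values, for every layer, for every token.
    # Layer/head/dim values here are made-up guesses, not GLM's real config.
    def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elt=2):
        return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt / 1e9

    # e.g. 92 layers, 8 KV heads, head_dim 128, fp16 cache:
    print(f"128k ctx: ~{kv_cache_gb(92, 8, 128, 131072):.0f} GB")  # ~49 GB
    print(f" 32k ctx: ~{kv_cache_gb(92, 8, 128, 32768):.0f} GB")   # ~12 GB

With numbers anywhere in that ballpark, tens of GB can disappear into the cache alone, which is why full context on a 350B+ model forces trade-offs even with huge unified memory.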
If you get something this big, it's a powerhouse, but not nearly as much of one as a dedicated Nvidia GPU rig. The point is to be able to run these models _adequately_, not at production speeds, and get your work done. I found the price/performance/energy usage compelling at this level, and I am very satisfied.