I'm using an M3 Ultra with 512GB of RAM, running LM Studio and mostly MLX models. It runs massive models with reasonable tokens per second, though prompt processing can be slow. It handles long conversations fine as long as the KV cache hits. It's usable with opencode and crush, though my main motivation for getting it was specifically to be able to process personal data (e.g. emails) privately, and to experiment freely with abliterated models for security research. Also, I appreciate being able to run it off solar power.
I'm still trying to figure out a good solution for fast external storage; I only went for 1TB internal, which doesn't go very far with models that have hundreds of billions of parameters.
You might then consider getting four 40Gbps NVMe enclosures and RAIDing them together (e.g. in a big stripe you could get 160Gbps of throughput, limited only by the number of physical interfaces). Each slice could be multiple TBs.
Obviously this increases your failure rate, but if you're just re-downloading the same models (and not creating your own), you don't really need redundancy.
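On macOS this kind of stripe can be built with AppleRAID. A rough sketch below, assuming four external enclosures show up as disk4 through disk7 (those identifiers are placeholders, yours will differ, so check `diskutil list` first), and note that creating the set erases the member disks:

```shell
# List external disks to find the identifiers of the four enclosures.
diskutil list external

# Create a RAID 0 (stripe) set named "Models" across the four disks.
# WARNING: this destroys existing data on disk4..disk7 (placeholder IDs).
diskutil appleRAID create stripe Models JHFS+ disk4 disk5 disk6 disk7

# Confirm the set assembled.
diskutil appleRAID list
```

With RAID 0 a single enclosure failure takes the whole volume with it, which is the failure-rate tradeoff mentioned above, but for a cache of re-downloadable model weights that's usually acceptable.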