A note that this setup runs the 671B model at Q4 quantization at 3-4 tokens per second; running it at Q8 would need something beefier. To run the 671B model in the original Q8 at 6-8 TPS you'd need a dual-socket EPYC server motherboard with 768GB of RAM.
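Rough napkin math on where those RAM figures come from, assuming roughly 4.5 and 8.5 effective bits per weight for typical Q4/Q8 quants (exact sizes vary by quant scheme, and this ignores KV cache and runtime overhead):

```python
# Back-of-the-envelope weight-memory estimate for a 671B-parameter model
# at different quantization widths. Approximate figures only.

PARAMS = 671e9  # parameter count

def weights_gb(bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given quantization width."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("Q4", 4.5), ("Q8", 8.5)]:
    print(f"{name}: ~{weights_gb(bits):.0f} GB for the weights alone")

# Q4: ~377 GB of weights alone
# Q8: ~713 GB, hence the dual-socket EPYC board with 768 GB of RAM
```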
The “wall” they’re talking about is orthodox AI not getting better despite being fed ever more data. DeepSeek sidesteps this by using many smaller models that can be switched between for different tasks, instead of the orthodox method of trying to make one “general intelligence” model that works for everything.
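For illustration only, here's a toy sketch of that switching idea, in the spirit of a mixture-of-experts router. The sizes, names, and routing rule are made up and are not DeepSeek's actual architecture; the point is just that only a few small experts do work for any given input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "experts": small independent weight matrices standing in for
# smaller specialised models. Sizes are arbitrary, purely illustrative.
NUM_EXPERTS, D_IN, D_OUT, TOP_K = 4, 8, 8, 2
experts = [rng.normal(size=(D_IN, D_OUT)) for _ in range(NUM_EXPERTS)]
router_w = rng.normal(size=(D_IN, NUM_EXPERTS))  # learned in a real system

def route(x: np.ndarray) -> np.ndarray:
    """Send the input only to the top-k experts and mix their outputs."""
    scores = x @ router_w                      # one score per expert
    top = np.argsort(scores)[-TOP_K:]          # pick the best-matching experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = route(rng.normal(size=D_IN))
print(y.shape)  # (8,) -- only 2 of the 4 experts did any work
```

Because only a fraction of the total parameters is active for any given input, a very large model can stay comparatively cheap to run.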
Just reasoning alone almost destroyed the Western AI industry overnight. They scrambled to make their own reasoning models in less than a week, but it changed everything.
I think the next big idea could be models dynamically training sub-models on demand. Approaches like HRM, which require far less training data and far fewer parameters, are already being explored. Another avenue focuses on creating reusable memory components, as seen with MemOS. That blurs the line between training and operation, with the model just continuously learning. What we might see is models that create an agent to learn a new task, and once it's learned, that agent can be reused and shared going forward.
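Purely as a hypothetical sketch of that last idea (none of this is HRM's or MemOS's actual API), the "train a sub-model on demand, then reuse and share it" pattern boils down to something like a task-keyed registry:

```python
from typing import Callable, Dict

# Hypothetical sketch: a parent system trains a task-specific sub-model the
# first time a task shows up, then reuses the cached one afterwards.

class SubModelRegistry:
    """Keeps task-specific sub-models around so they can be reused or shared."""

    def __init__(self, train_fn: Callable[[str], Callable[[str], str]]):
        self._train_fn = train_fn          # how the parent trains a new sub-model
        self._models: Dict[str, Callable[[str], str]] = {}

    def solve(self, task: str, query: str) -> str:
        if task not in self._models:       # first encounter: train a sub-model
            self._models[task] = self._train_fn(task)
        return self._models[task](query)   # afterwards: just reuse it

# Stand-in "training" that returns a trivial task-specific function.
registry = SubModelRegistry(lambda task: (lambda q: f"[{task} model] answer to {q!r}"))
print(registry.solve("translation", "hello"))    # trains once
print(registry.solve("translation", "goodbye"))  # reuses the cached sub-model
```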
From what we know, human intelligence is also structured hierarchically: the brain has regions responsible for specific tasks like vision processing, with a higher-level reasoning system built on top of that.