- cross-posted to:
- [email protected]
- cross-posted to:
- [email protected]
A note that this setup runs a 671B model in Q4 quantization at 3-4 TPS, running a Q8 would need something beefier. To run a 671B model in the original Q8 at 6-8 TPS you’d need a dual socket EPYC server motherboard with 768GB of RAM.
I found a YouTube link in your comment. Here are links to the same video on alternative frontends that protect your privacy: