Note that this setup runs a 671B model at Q4 quantization at 3-4 TPS; running Q8 would need something beefier. To run a 671B model at the original Q8 at 6-8 TPS you'd need a dual-socket EPYC server motherboard with 768GB of RAM.
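As a rough sanity check on those RAM figures: weight memory scales linearly with bits per weight. The sketch below is a back-of-the-envelope estimate, assuming llama.cpp-style GGUF quantization where Q8_0 averages about 8.5 bits/weight and Q4_K about 4.5 bits/weight (KV cache and runtime buffers are extra, so treat these as lower bounds):

```python
def gguf_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB.

    bytes = params * (bits / 8); does NOT include KV cache or buffers.
    """
    return params_billion * bits_per_weight / 8


# 671B model at Q8_0 (~8.5 bpw) vs Q4_K (~4.5 bpw) -- approximate bpw values
q8 = gguf_weight_gb(671, 8.5)  # ~713 GB: why Q8 wants a 768GB server board
q4 = gguf_weight_gb(671, 4.5)  # ~377 GB: why Q4 fits on smaller setups
print(f"Q8 ≈ {q8:.0f} GB, Q4 ≈ {q4:.0f} GB")
```

This lines up with the figures above: the Q8 weights alone nearly fill 768GB of RAM, while Q4 roughly halves the footprint at the cost of some quality.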

  • CriticalResist8@lemmygrad.ml
    2 months ago

    ooh, now I'm stressed I'm gonna have to download and try 20 different models to find the one I like best haha. Do you know some that are good for coding tasks? I also do design stuff (the AI helps walk through the design thinking process with me); it's kind of a reasoning/logical task, but it's also highly specialized.

    • KrasnaiaZvezda@lemmygrad.ml
      2 months ago

      There is a new Qwen3 Coder 30B-A3B that looks good, and people were talking about GLM-4.5 32B, but I haven't used local models for code much, so I can't give a good answer.