Big tech boss tells delegates at Davos that broader global use is essential if technology is to deliver lasting growth

  • tal@lemmy.today · 13 hours ago

    > Unless you have some really serious hardware, 24 billion parameters is probably the maximum that would be practical for self-hosting on a reasonable hobbyist set-up.

    Eh…I don’t know if you’d call it “really serious hardware”, but when I picked up my 128GB Framework Desktop, it was $2k (without storage), and that box is often described as being aimed at the hobbyist AI market. That’s pricier than most video cards, but an AMD Radeon RX 7900 XTX GPU was north of $1k, an NVIDIA RTX 4090 was about $2k, and it looks like the NVIDIA RTX 5090 is presently something over $3k (and rising) on eBay, well over MSRP. None of those GPUs are dedicated AI-compute hardware, just high-end gaming cards that people have used for AI work.

    I think that the largest LLM I’ve run on the Framework Desktop was a 106-billion-parameter GLM model at Q4_K_M quantization. It was certainly usable, and I wasn’t trying to squeeze as large a model as possible onto the thing. I’m sure that one could run substantially larger models.
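
    As a rough sanity check (my own back-of-the-envelope arithmetic, not measured numbers: Q4_K_M averages somewhere around 4.5 bits per weight, and actual GGUF file sizes vary by architecture and tensor mix), the weights for a model that size come in well under 128GB:

    ```python
    # Rough weight-memory estimate for a quantized LLM (back-of-the-envelope).
    # Assumption: Q4_K_M averages ~4.5 bits/weight (it mixes 4- and 6-bit
    # quants); real GGUF sizes vary by architecture and tensor mix.
    params = 106e9          # total parameters
    bits_per_weight = 4.5   # approximate average for Q4_K_M
    bytes_total = params * bits_per_weight / 8
    print(f"~{bytes_total / 2**30:.0f} GiB for weights alone")  # ~56 GiB
    ```

    That leaves a healthy margin of the 128GB for the KV cache, context, and the OS.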

    EDIT: Also, some of the newer LLMs are MoE-based, and for those, it’s not necessarily unreasonable to offload expert layers to main memory. If a particular expert isn’t being used, it doesn’t need to live in VRAM. That relaxes some of the hardware requirements, from needing a ton of VRAM to just needing a fair bit of VRAM plus a ton of main memory.
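
    To make that concrete (my numbers, and a simplification: the 106B GLM model is reported as having roughly 12B active parameters per token, and shared layers plus routing overhead still need VRAM), here’s the same arithmetic applied to an MoE split:

    ```python
    # Rough sketch of why MoE offloading relaxes VRAM requirements.
    # Assumptions: ~106B total / ~12B active parameters (a GLM-4.5-Air-like
    # split), Q4_K_M at ~4.5 bits/weight; ignores KV cache and activations.
    def gib(params, bits_per_weight=4.5):
        """Approximate quantized weight size in GiB."""
        return params * bits_per_weight / 8 / 2**30

    total_params = 106e9   # all experts plus shared layers
    active_params = 12e9   # weights actually touched per token

    print(f"everything in VRAM:     ~{gib(total_params):.0f} GiB")   # ~56 GiB
    print(f"hot path in VRAM:       ~{gib(active_params):.0f} GiB")  # ~6 GiB
    print(f"experts in main memory: ~{gib(total_params - active_params):.0f} GiB")  # ~49 GiB
    ```

    As I understand it, llama.cpp has tensor-placement options along these lines for keeping expert tensors in system RAM while the rest stays on the GPU.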

    • wonderingwanderer@sopuli.xyz · 11 hours ago

      See, you have more experience in the matter than I do; hence the caveat that I’m not an expert. Thanks for sharing your experience.

      Then again, I’d consider 128GB of memory to be fairly serious hardware, but if that’s common among hobbyists, then I stand corrected. I was operating on the assumption that 64GB of RAM is already a lot.

      All in all, 106 billion parameters on 128GB of memory with quantization doesn’t surprise me all that much. But again, I’m just going off of the vague notions I’ve gathered from reading about it.

      The focus of my original comment was more on the fact that self-hosting is an option; I wasn’t trying to be too precise with the specs. My bad if it came off that way.