• brucethemoose@lemmy.world
    1 day ago

    I am referencing this: https://z.ai/blog/glm-4.5

    The full GLM? Basically a 3090 or 4090 and a budget EPYC CPU. Or maybe 2 GPUs on a Threadripper system.

    GLM Air? Now this would work on a 16GB+ VRAM desktop, just slap in 96GB+ (maybe 64GB?) of fast RAM. Or the recent Framework desktop, or any mini PC/laptop with the 128GB Ryzen AI Max 395 config, or a 128GB+ Mac.
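    To see why those hardware configs line up, here's a back-of-the-envelope size estimate (a sketch: parameter counts are from the z.ai blog post, and the bits-per-weight figure is a rough average for a typical ~4-bit llama.cpp-style quant):

    ```python
    # Rough weight-size estimate for a quantized model.
    # Parameter counts per the z.ai blog: GLM-4.5 is 355B total params,
    # GLM-4.5-Air is 106B total. 4.5 bits/weight approximates a Q4-ish quant.
    def weight_gib(params_billion, bits_per_weight):
        """Approximate size of the quantized weights in GiB."""
        return params_billion * 1e9 * bits_per_weight / 8 / 2**30

    air = weight_gib(106, 4.5)   # GLM-4.5 Air at ~4.5 bits/weight
    full = weight_gib(355, 4.5)  # full GLM-4.5 at the same quant

    print(f"Air:  ~{air:.0f} GiB")   # fits in 16 GB VRAM + ~96 GB system RAM
    print(f"Full: ~{full:.0f} GiB")  # needs an EPYC/Threadripper-class RAM pool
    ```

    Air lands in the mid-50s of GiB, which is why 16GB of VRAM plus ~96GB of fast RAM works; the full model is roughly 3x that, hence the server-class memory.
    
    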

    You’d download the weights, quantize them yourself if needed, and run them in ik_llama.cpp (which should get GLM support imminently).

    https://github.com/ikawrakow/ik_llama.cpp/
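    Roughly, that workflow looks like this (a sketch, not exact commands: the GGUF filenames are illustrative, binary names in the ik_llama.cpp fork may differ from mainline llama.cpp, and the tensor-override pattern depends on the model's layer names):

    ```shell
    # Build the fork (CUDA assumed; adjust for your GPU)
    git clone https://github.com/ikawrakow/ik_llama.cpp
    cd ik_llama.cpp
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release

    # Quantize a full-precision GGUF yourself if no prequant exists yet
    # (Q4_K_M is a common size/quality tradeoff; filenames are illustrative)
    ./build/bin/llama-quantize glm-4.5-air-f16.gguf glm-4.5-air-q4_k_m.gguf Q4_K_M

    # Run with GPU offload; on a 16GB card, keep the big MoE expert
    # tensors in system RAM via a tensor override (pattern is illustrative)
    ./build/bin/llama-server -m glm-4.5-air-q4_k_m.gguf -ngl 99 -ot "exps=CPU"
    ```

    The expert-offload trick is the whole point of the MoE setup: only the small active slice needs to live in VRAM, and the bulk of the weights can sit in system RAM.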

    But these are…not lightweight models. If you don’t want a homelab, there are better-suited smaller models that will fit typical hardware configs.