• partofthevoice@lemmy.zip
    1 day ago

    I’m on an M2 MacBook with 32GB of RAM. Literally just tried:

    • gemma4:12b - very slow, unworkable
    • qwen3:8b - very slow, unworkable
    • qwen2.5-coder:7b - slow but workable. Doesn’t use tools properly in OpenCode.

    Well, I guess I’ll try again next year.

    For context: my home PC runs gemma4:31b just fine. It’s also a beefy-ass desktop, though.

    • fluxx@mander.xyz
      17 hours ago

      Are you running an MLX model? If not, try that. My M4 MacBook runs qwen3.6-35b-a3b lightning fast. It has its issues, but it’s fast nonetheless.

        • fluxx@mander.xyz
          4 hours ago

          I have the 64GB RAM model. I’ve limited context to 16k to make it more stable, but tbh it’s rather unreliable no matter what I do. With my setup (mlx_lm and webui), it frequently collapses or loops regardless of the settings. I’ve done a lot of debugging and concluded it’s probably inherent model behavior.

    • NotMyOldRedditName@lemmy.world
      1 day ago

      You might be doing something wrong. Models that size shouldn’t be that slow on a 32GB M2 if properly configured.

      You need a Metal-optimized client and model, not the same models you’d run on your desktop machine.
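
For a rough sense of why models in this size range should fit comfortably in 32GB, here is a back-of-envelope sketch of the memory taken by the weights alone. The parameter counts are read off the model tags in the thread, and the bit widths are illustrative assumptions, not the quantizations anyone here actually ran. It ignores the KV cache, context length, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts taken from the tags mentioned in the thread;
# the bit widths are hypothetical quantization levels for illustration.
for tag, params_billion in [("7b", 7), ("8b", 8), ("12b", 12), ("31b", 31)]:
    for bits in (4, 8, 16):
        size = weight_memory_gb(params_billion, bits)
        print(f"{tag} at {bits}-bit: ~{size:.1f} GB of weights")
```

By this estimate the 7b to 12b models leave plenty of headroom on a 32GB machine even at 16-bit, which is consistent with the suggestion that the slowness comes from the client or model build rather than from the model size itself.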