• NotMyOldRedditName@lemmy.world
    1 day ago

    Also, developers often want more RAM, and if you’re on the Mac side, the M-series unified memory doubles as video RAM for loading and running models, so there’s a good chance you can already run something better than most people can. Apple is leaning into this by adding more NPUs and increasing memory bandwidth. They aren’t good at training, but they can do inference.

    • partofthevoice@lemmy.zip
      1 day ago

      I’m on a MacBook with an M2 and 32GB of RAM. Literally just tried:

      • gemma4:12b - very slow, unworkable
      • qwen3:8b - very slow, unworkable
      • qwen2.5-coder:7b - slow but workable. Doesn’t use tools properly in OpenCode.

      Well, I guess I’ll try again next year.

      For context: my home PC is running gemma4:31b just fine. It’s also a beefy ass desktop, though.

      • fluxx@mander.xyz
        18 hours ago

        Are you running an MLX model? If not, try that. My M4 MacBook runs qwen3.6-35b-a3b lightning fast. It has its issues, but it’s fast nonetheless.
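
A rough way to see why a model like that can feel fast: token generation is usually limited by memory bandwidth, so an upper bound on decode speed is roughly the bandwidth divided by the bytes of weights read per token. The sketch below is back-of-the-envelope only. It assumes 4-bit weights, reads the "a3b" suffix as about 3B active parameters per token, and uses a placeholder 100 GB/s of bandwidth; none of those numbers come from the thread, and real throughput will be lower.

```python
def decode_tokens_per_second(active_params_billions, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed, assuming each active weight is read once per token."""
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumed values for illustration: 4-bit weights, 100 GB/s memory bandwidth.
dense_35b = decode_tokens_per_second(35, 4, 100)  # every one of 35B weights read per token
moe_3b_active = decode_tokens_per_second(3, 4, 100)  # only ~3B active weights read per token

print(f"dense 35B bound:     {dense_35b:.1f} tok/s")
print(f"3B-active MoE bound: {moe_3b_active:.1f} tok/s")
```

Under these assumptions the mixture-of-experts bound is more than ten times the dense one, which fits the "lightning fast" impression even though the full 35B of weights still has to sit in memory.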

          • fluxx@mander.xyz
            5 hours ago

            I have the model with 64GB of RAM. I’ve limited context to 16k in an effort to make it more stable, but tbh it is rather unreliable no matter what I do. With my setup (mlx_lm and webui), it frequently collapses or loops, no matter the settings. I have done a lot of debugging and have concluded it is probably inherent model behavior.

            • NotMyOldRedditName@lemmy.world
              27 minutes ago

              That’s lame about the looping, but ya, I don’t think that’s an MLX issue. I’ve had it on my desktop with my NVIDIA card as well. I also tried fussing with configurations, and I was never sure if it was the models or my settings. I was mainly toying around with Llama-based models.

      • NotMyOldRedditName@lemmy.world
        1 day ago

        You might be doing something wrong. Models that size shouldn’t be that slow on a 32GB M2 if it’s properly configured.

        You need a Metal-optimized client and model, not the same models you’d run on your desktop machine.
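
As a rough check on the sizing claim above, here is a minimal sketch of the weight-only memory footprint for the model sizes mentioned in the thread. It is an estimate under stated assumptions: it ignores the KV cache and runtime overhead, and the `usable_fraction` of 0.7 is a guess at how much unified memory the GPU can realistically claim, not a measured figure.

```python
def weight_footprint_gb(params_billions, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9


def fits_in_memory(params_billions, bits_per_weight, ram_gb, usable_fraction=0.7):
    """True if the weights fit in the assumed GPU-usable share of unified memory."""
    return weight_footprint_gb(params_billions, bits_per_weight) <= ram_gb * usable_fraction


# Model sizes from the thread, checked against 32 GB at two assumed precisions.
for name, params in [("7b", 7), ("8b", 8), ("12b", 12)]:
    for bits in (4, 16):
        size = weight_footprint_gb(params, bits)
        ok = fits_in_memory(params, bits, ram_gb=32)
        print(f"{name} @ {bits}-bit: {size:.1f} GB, fits in 32 GB: {ok}")
```

By this estimate all three sizes fit comfortably at 4-bit, while the 12B model at 16-bit does not, so the quantization of the model you pull matters as much as the parameter count.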