• jacksilver@lemmy.world
    10 months ago

    My main point is that gpt4o and the other models it’s being compared to are multimodal, while R1 is only an LLM from what I can find.

    Something trained on audio/pictures/videos/text is probably going to cost more to train than something trained on text alone.

    But maybe I’m missing something.

    • will_a113@lemmy.ml
      10 months ago

      The original gpt4 is just an LLM though, not multimodal, and the training cost for that is still estimated to be over 10x R1’s if you believe the numbers. I think where R1 is compared to 4o is in so-called reasoning, where you can see the chain of thought or internal prompt paths the model uses to (expensively) produce an output.
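
      If you want to poke at that yourself, here’s a rough sketch of pulling R1’s reasoning trace through DeepSeek’s OpenAI-compatible API. The model name and the reasoning_content field are what I remember from their docs, so double-check before relying on it:

      ```python
      # Minimal sketch (not official sample code): show R1's visible chain of thought
      # separately from its final answer via DeepSeek's OpenAI-compatible endpoint.
      from openai import OpenAI

      client = OpenAI(
          api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder key
          base_url="https://api.deepseek.com",  # assumed OpenAI-compatible base URL
      )

      resp = client.chat.completions.create(
          model="deepseek-reasoner",            # assumed name for R1 on their API
          messages=[{"role": "user", "content": "How many primes are below 100?"}],
      )

      msg = resp.choices[0].message
      # reasoning_content is (per their docs, as I recall) where the chain of thought lands
      print("reasoning:", getattr(msg, "reasoning_content", None))
      print("answer:", msg.content)
      ```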

      • jacksilver@lemmy.world
        10 months ago

        I’m not sure how good a source it is, but Wikipedia says it was multimodal and came out about two years ago - https://en.m.wikipedia.org/wiki/GPT-4.

        That being said, the comparisons are against gpt4o’s LLM benchmarks, so maybe it’s a valid argument for the LLM capabilities.

        However, I think a lot of the more recent models are pursuing architectures with the ability to act on their own, like Claude’s computer use - https://docs.anthropic.com/en/docs/build-with-claude/computer-use, which DeepSeek R1 is not attempting (rough sketch of what that API call looks like at the end of this comment).

        Edit: and I think the real money will be in the more complex models focused on workflow automation.
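
        For reference, the gist of that computer use API looks roughly like this. The tool type, beta flag, and model string are copied from the linked docs as best I remember, so treat them as approximate rather than gospel:

        ```python
        # Rough sketch (assumptions flagged inline): ask Claude to drive a screen via the
        # computer use beta. The model replies with tool_use actions (screenshot, click,
        # type, ...) that your own agent loop has to execute and feed back as results.
        import anthropic

        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

        response = client.beta.messages.create(
            model="claude-3-5-sonnet-20241022",   # assumed model name from the docs
            max_tokens=1024,
            tools=[{
                "type": "computer_20241022",      # assumed tool type from the docs
                "name": "computer",
                "display_width_px": 1024,
                "display_height_px": 768,
                "display_number": 1,
            }],
            messages=[{"role": "user", "content": "Take a screenshot of the desktop."}],
            betas=["computer-use-2024-10-22"],    # assumed beta flag from the docs
        )

        # In a real agent loop you'd execute each tool_use block and send the result back;
        # here we just print what the model asked for.
        for block in response.content:
            print(block)
        ```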

      • veroxii@aussie.zone
        9 months ago

        Holy smoke balls. I wonder what else they have ready to release over the next few weeks. They might have a whole suite of things just waiting to be strategically deployed.