A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • TankovayaDiviziya@lemmy.world · 2 hours ago

    We poked fun at this meme, but it goes to show that an LLM is still like a child that needs to be taught to make implicit assumptions and to possess contextual knowledge. The current generation of LLMs needs far more input and instructions to do specifically what you want, like a child.

    Edit: I know Lemmy scoffs at LLMs, but people probably also scoffed at Verbiest’s steam machine, saying it would never amount to anything. Give it time and it will improve. I’m not endorsing AI, by the way; I’m on the fence about its long-term consequences. But whether people like it or not, AI will impact human lives.

    • Rob T Firefly@lemmy.world · 5 hours ago

      LLMs are not children. Children can have experiences, learn things, know things, and grow. Spicy autocomplete will never actually do any of these things.

        • Rob T Firefly@lemmy.world · 1 hour ago

          Our microorganism ancestors also did all those things, and they were far beyond anything an LLM can do. Turning a given list of words into numbers, doing a string of math to those numbers, and turning the resulting numbers back into words is not consciousness or wisdom and never will be.
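          That “words into numbers, math on the numbers, numbers back into words” loop can be sketched in a few lines. Everything below (the four-word vocabulary, the weight table) is a made-up toy for illustration, not any real model’s data:

```python
# Toy illustration (not a real LLM): turn a word into a number, do some
# arithmetic on it, and turn the result back into a word.
# The vocabulary and score table are invented for this example.
vocab = ["the", "cat", "sat", "down"]
word_to_id = {w: i for i, w in enumerate(vocab)}

# Hypothetical weights: scores[i][j] = how strongly word i predicts word j next.
scores = [
    [0.1, 0.7, 0.1, 0.1],  # after "the", "cat" scores highest
    [0.1, 0.1, 0.6, 0.2],  # after "cat", "sat" scores highest
    [0.2, 0.1, 0.1, 0.6],  # after "sat", "down" scores highest
    [0.4, 0.2, 0.2, 0.2],  # after "down", "the" scores highest
]

def next_word(word: str) -> str:
    """Word -> number -> arithmetic -> number -> word. No understanding anywhere."""
    row = scores[word_to_id[word]]                    # math on the number
    best = max(range(len(row)), key=row.__getitem__)  # pick the highest score
    return vocab[best]                                # number back into a word

print(next_word("the"))  # -> "cat"
```

          The whole pipeline is deterministic arithmetic over a lookup table, which is the point being made: nothing in it resembles experience or growth.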

          • TankovayaDiviziya@lemmy.world · 12 minutes ago

            You think microorganisms can reason? Wow, AI haters are grasping at straws.

            Honestly, I don’t understand Lemmy scoffing at AI and thinking the current iteration is all it will ever be. I’m sure some people thought automobile technology would never go anywhere simply because the first model ran at 3 mph. These things always take time.

            To be clear, I’m not endorsing AI, but I think there is huge potential in the years to come, for better or worse. And it is especially important, particularly for AI haters, never to underestimate it, given the destructive potential AI has.

    • kshade@lemmy.world · 5 hours ago

      We have already thrown just about the entire Internet and then some at them. It shows that LLMs cannot think or reason. Which isn’t surprising; they weren’t meant to.

      • eronth@lemmy.world · 5 hours ago

        Or at least they can’t reason the way we do about our physical world.

        • Nalivai@lemmy.world · 3 hours ago

          You’re falling into the same trap. When the letters on the screen tell you something, it’s not necessarily the truth. When “I’m reasoning” appears in a chatbot window, it doesn’t mean there is something that’s actually reasoning.

        • zalgotext@sh.itjust.works · 5 hours ago

          No, they cannot reason, by any definition of the word. LLMs are statistics-based autocomplete tools. They don’t understand what they generate, they’re just really good at guessing how words should be strung together based on complicated statistics.
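          A bare-bones version of that statistical guessing is a bigram counter: count which word follows which in some text, then always emit the most frequent follower. The tiny corpus below is a toy assumption for illustration, not how any real model is trained:

```python
from collections import Counter, defaultdict

# Minimal "statistics-based autocomplete" sketch: tally next-word frequencies
# over a made-up corpus, then guess the most common follower.
corpus = "the cat sat on the mat the cat ran to the cat".split()

followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def autocomplete(word: str) -> str:
    # Pick the statistically most common next word; no understanding involved.
    return followers[word].most_common(1)[0][0]

print(autocomplete("the"))  # -> "cat" ("cat" follows "the" 3 times, "mat" once)
```

          Real models replace the frequency table with billions of learned weights, but the output is still a guess at a plausible next token, not a statement the system knows to be true.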

      • Nalivai@lemmy.world · 3 hours ago

        By now it’s becoming clear that, fundamentally, this is the best version of the thing we’re going to get. This is its prime time.
        For some time there was a legitimate question of “if we give it enough data, will there be a qualitative jump?”, and as far as we can see, we’re already past that jump. A predictive algorithm can form grammatically correct sentences that are related to the context. That’s it; that’s the jump.
        Now a bunch of salespeople are trying to convince us that if there was one jump, there will necessarily be others, while there is no real indication of that.