A screenshot of this question was making the rounds last week, but this article covers testing against all the well-known models out there.

It also includes outtakes on the ‘reasoning’ models.

  • Snot Flickerman@lemmy.blahaj.zone · English · 124 up, 3 down · edited 16 hours ago

    I mean, I’ve been saying this since LLMs were released.

    We finally built a computer that is as unreliable and irrational as humans… which shouldn’t be considered a good thing.

    I’m under no illusion that LLMs are “thinking” in the same way that humans do, but god damn if they aren’t almost exactly as erratic and irrational as the hairless apes whose thoughts they’re trained on.

    • Peekashoe@lemmy.wtf · English · 31 up · 16 hours ago

      Yeah, the article cites that as a control, but it’s not at all surprising, since “humanity by survey consensus” is an accurate description of how LLM weights trained on random human output work.

      It’s impressive up to a point, but you wouldn’t exactly want your answers to complex math operations or other specialized areas to track layperson human survey responses.

    • MangoCats@feddit.it · English · 4 up, 7 down · 15 hours ago

      “which shouldn’t be considered a good thing.”

      Good and bad are subjective and depend on your area of application.

      What it definitely is, is different from what was available before, and since it is different, there will be some things it is better at than what was available before, and many things it is much worse for.

      Still, in the end, there is real power in diversity. Just don’t use a sledgehammer to swipe-browse on your cellphone.

      • Lost_My_Mind@lemmy.world · English · 10 up · 14 hours ago

        I asked Lars Ulrich to define good and bad. He said…

        FIRE GOOD!!! NAPSTER BAD!!! OOOOH FIRE HOT!!! FIRE BAD!!! FIIIRRREEE BAAAAAAAD!!!