• boonhet@sopuli.xyz
    7 hours ago

    The carwash thing applies to low-end and older models. Here’s Claude from lowest to highest model, ignoring the banned Fable:

    • replicat@lemmy.world
      6 hours ago

      They altered the training data to address this challenge. The underlying issue wasn’t solved in any way. Don’t be naive.

      • boonhet@sopuli.xyz
        4 hours ago

        It takes months to train a model, and there were already models that got it right when the question was popular, as long as thinking was enabled.

        Also, if they were optimising for this question, why not update their lower-end model (Haiku) as well?

        The interesting question would be what percentage of humans get it wrong. Lower than for LLMs, for sure, but I somehow doubt it’s 0.