I tested 9 flagship models (Claude 4.6, GPT-5.2, Gemini 3.1 Pro, Kimi K2.5, etc.) in my own mini-benchmark of novel tasks, with web search disabled, zero training contamination, and no cheating possible.

TL;DR: Claude 4.6 is currently the best reasoning model, GPT-5.2 is overrated, and open-source is catching up fast; Moonshot.ai’s Kimi K2.5 in particular seems very capable.

  • Iconoclast@feddit.uk
    17 hours ago

    The hostility just seems unnecessary and unproductive from my point of view. Unless of course your intention is to hurt - but I’ll give you the benefit of the doubt and assume you’d rather change minds instead.

    It’s a nuanced discussion, which is why I don’t think either fanaticism or militant opposition is going to get us anywhere. This is a technology community - people should be free to have civil discussions about technology. Criticism is just as valid without the jabs and insults.

    • IratePirate@feddit.org
      7 hours ago

      Appreciate the call to reason. Yet, though this may have been sharply worded, insulting it was not.