wake up

open twitter to catch up

see deepseek did it again

(and as a reminder, DeepSeek-R1 only came out in January, so it’s been less than 12 months since their last bombshell)

One more graph:

What this all means

Traditional AI models are trained with a reward for a correct final answer: get the expected answer, win points, and be incentivized to produce that answer more often. This has a major flaw: a correct answer does not guarantee correct reasoning. A model can guess, take a shortcut, or even follow flawed logic and still output the right answer. That approach completely fails for tasks like theorem proving, where the process is the product. DeepSeekMath-V2 tackles this with a novel self-verifying reasoning framework:

  • The Generator: one part of the model generates mathematical proofs and solutions.
  • The Verifier: another part acts as the critic, checking every step of the reasoning for logical rigor and correctness.
  • The Loop: if the verifier finds a flaw, it provides feedback and the generator revises the proof. This creates a co-evolution cycle where both components push each other to become smarter (a rough sketch of this loop follows below).
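
In rough pseudocode, the cycle looks something like this. To be clear, this is a hypothetical sketch: the function names, prompts, acceptance test, and round limit are all made up for illustration, and the paper’s actual training setup is more involved.

    from typing import Protocol

    class TextModel(Protocol):
        """Anything with a text-in, text-out generate method (stand-in for either role)."""
        def generate(self, prompt: str) -> str: ...

    def generate_proof(generator: TextModel, problem: str, feedback: str | None) -> str:
        """Ask the generator for a proof, revising against verifier feedback if there is any."""
        prompt = problem if feedback is None else f"{problem}\n\nVerifier feedback:\n{feedback}"
        return generator.generate(prompt)

    def verify_proof(verifier: TextModel, problem: str, proof: str) -> tuple[bool, str]:
        """Ask the verifier to check every step; returns (accepted, critique)."""
        critique = verifier.generate(f"Check each step of this proof of: {problem}\n\n{proof}")
        return "no flaws found" in critique.lower(), critique  # toy acceptance test

    def prove(generator: TextModel, verifier: TextModel, problem: str, max_rounds: int = 4) -> str:
        """Generate, verify, and revise until the verifier accepts or the rounds run out."""
        proof, feedback = "", None
        for _ in range(max_rounds):
            proof = generate_proof(generator, problem, feedback)
            accepted, feedback = verify_proof(verifier, problem, proof)
            if accepted:
                break
        return proof

During training the two roles co-evolve: the verifier gets harder to fool, and the generator gets harder to fault.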

This new approach lets the model put up record performance for an open model. As you can see from the charts above, it takes second place on ProofBench-Advanced, just behind Gemini. But Gemini isn’t open source; DeepSeekMath-V2 is.

The model weights are available on Hugging Face under an Apache 2.0 license: https://huggingface.co/deepseek-ai/DeepSeek-Math-V2.

This means researchers, developers, and enthusiasts around the world can download, study, and build upon this model right now. They can fine-tune or modify the model to fit their needs and research, which promises a lot of exciting math discoveries soon. I predict (on no basis, mind you) that it will first help solve computing problems, either practical or theoretical.
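
For anyone who wants to grab it, here is a minimal sketch using the standard huggingface_hub client (the repo id comes from the link above; nothing else here is model-specific):

    from huggingface_hub import snapshot_download

    # Fetch the safetensors shards, config, and tokenizer files into the
    # local Hugging Face cache, then print where they landed.
    local_dir = snapshot_download("deepseek-ai/DeepSeek-Math-V2")
    print(local_dir)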

Beyond just the math, the self-verification mechanism is a crucial step towards building AI systems whose reasoning we can trust, which is vital for applications such as scientific research, formal verification, and safety-critical systems. It also demonstrates that ‘verification-driven’ training is a viable and powerful alternative to the ‘answer-driven’ method used to this day.

  • Conselheiro@lemmygrad.ml · 6 days ago

    A second open source LLM has hit the S&P500.

    I find it fascinating how much of the progress in DNNs over the last 5 years has been simply “how can we converge on training this in an adversarial framework”. Somebody more versed in Hegel could probably draw some relation between the success of GANs and the nature of contradictions.

    I also hope this training method removes the annoying artefact of the chatbot replying to all my comments with “You are absolutely correct!” like it wants some favour.

    I’ll check out the paper later, but at a glance I’m somewhat sceptical of the critic increasing reliability in non-math stuff. Depending on how it’s modelled, it could just become a more convincing bullshitter.

    • CriticalResist8@lemmygrad.ml (OP) · 6 days ago

      I don’t think they’ll put it on the web version, at least not yet. A few months ago they released DeepSeek-OCR, which is pretty amazing at turning pictures into text, but they didn’t roll that out to their web interface either. Math-V2 is a completely different model and they’d have to load it on their servers, which is probably why they don’t include it. They might eventually offer API access for it though, paid of course.

      The full model is 185 gigs, which means as many gigs of VRAM to run it, so I don’t think we’ll see it on consumer hardware anytime soon… however, the model is released, so labs and universities (aka the centers that have that kind of VRAM available) can run it right now by downloading the safetensors files from Hugging Face. So to answer that question: yes, it exists and is available right now!
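
      If you have the hardware, loading it should look like the usual transformers recipe, something along these lines (a sketch on my part: I’m assuming it loads like their other releases, and the dtype and config details here are guesses):

          from transformers import AutoModelForCausalLM, AutoTokenizer

          model_id = "deepseek-ai/DeepSeek-Math-V2"
          tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
          model = AutoModelForCausalLM.from_pretrained(
              model_id,
              device_map="auto",       # shard the ~185 GB of weights across all visible GPUs
              torch_dtype="auto",
              trust_remote_code=True,  # DeepSeek repos often ship custom modeling code
          )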

      But with the way Chinese innovation in AI is going, I expect that by 2026 or 2027 we’ll be able to run these kinds of models on our own computers, without heavy quantization or distills that basically destroy the model’s capabilities. In China they are already finding new ways to quantize models without losing accuracy. If you think about it, commercial LLMs have only existed for 3 years and we already have GGUF optimizations, so it’s only progressing haha

      • Soviet Snake@lemmygrad.ml · 6 days ago

        As far as I know OCR is actually included in the web version. I’ve used it several times, and when you upload a picture it only lets you send it if it contains text; otherwise it says it can’t do anything with it. So maybe they will add it soon.

        • CriticalResist8@lemmygrad.ml (OP) · 6 days ago

          I think this is still the “old” OCR version; they released a new one a few months ago (https://huggingface.co/deepseek-ai/DeepSeek-OCR) that can pretty much OCR anything. It can also compress what it reads and use that as context, which could unlock practically unlimited context.

          Mind you, OCR is only about 6 gigs, so you could totally run it on most modern GPUs. Yogthos is going to try to make it parse a full PDF, which will be very useful for uploading books to ProleWiki (I used to have a good OCR program, but an OS change broke it…)

          Wish it could come to their web interface too though 🥲