wake up

open twitter to catch up

see deepseek did it again

(and as a reminder, DeepSeek-R1 only came out in January, so it’s been less than 12 months since their last bombshell)

One more graph:

What this all means

Reasoning models are typically trained with a reward on the final answer: get the expected answer, win points, and be pushed to get that answer more often. This has a major flaw: a correct answer does not guarantee correct reasoning. A model can guess, take a shortcut, or even use flawed logic and still output the right answer. The approach breaks down completely for tasks like theorem proving, where the process is the product. DeepSeekMath-V2 tackles this with a novel self-verifying reasoning framework:

  • The Generator: one part of the model generates mathematical proofs and solutions.
  • The Verifier: another part acts as the critic, checking every step of the reasoning for logical rigor and correctness.
  • The Loop: if the verifier finds a flaw, it provides feedback, and the generator revises the proof. This creates a co-evolution cycle in which both components push each other to improve.
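
To make the loop concrete, here is a toy sketch in Python. It is not DeepSeek's implementation: the "proof" is just a list of arithmetic steps, and `verify`, `revise`, and `refine` are hypothetical names made up for illustration. It only shows the control flow (check, give feedback, revise, repeat), not the training side where generator and verifier improve together.

```python
import operator
from typing import Optional

OPS = {"+": operator.add, "*": operator.mul}

# A step is (a, op, b, claimed), i.e. the claim "a op b = claimed".
Step = tuple[int, str, int, int]


def verify(proof: list[Step]) -> Optional[int]:
    """Toy verifier: index of the first wrong step, or None if every step holds."""
    for i, (a, op, b, claimed) in enumerate(proof):
        if OPS[op](a, b) != claimed:
            return i
    return None


def revise(proof: list[Step], bad: int) -> list[Step]:
    """Toy generator revision: redo the flagged step, leave the others untouched."""
    a, op, b, _ = proof[bad]
    fixed = list(proof)
    fixed[bad] = (a, op, b, OPS[op](a, b))
    return fixed


def refine(proof: list[Step], max_rounds: int = 5) -> tuple[list[Step], bool]:
    """Alternate verification and revision until accepted or out of rounds."""
    for _ in range(max_rounds):
        bad = verify(proof)
        if bad is None:
            return proof, True
        proof = revise(proof, bad)  # verifier feedback drives the next draft
    return proof, verify(proof) is None


# Draft with two wrong steps for "3 * 4 + 2".
draft = [(3, "*", 4, 10), (12, "+", 2, 15)]
final, accepted = refine(draft)
```

In the real system both roles are played by a language model and the feedback is natural-language critique rather than a step index, so treat this purely as a picture of the loop.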

This new approach pays off. As you can see from the charts above, the model places second on ProofBench-Advanced, just behind Gemini. But Gemini isn’t open-source; DeepSeekMath-V2 is.

The model weights are available on Hugging Face under an Apache 2.0 license: https://huggingface.co/deepseek-ai/DeepSeek-Math-V2.

This means researchers, developers, and enthusiasts around the world can download, study, and build upon this model right now. They can fine-tune or modify it to fit their needs and research, which could lead to a lot of exciting math discoveries soon. I predict (on no basis, mind you) that it will help solve computing problems first, whether practical or theoretical.

Beyond just the math, the self-verification mechanism is a crucial step towards building AI systems whose reasoning we can trust, which is vital for applications such as scientific research, formal verification, and safety-critical systems. It also shows that ‘verification-driven’ training is a viable and powerful alternative to the ‘answer-driven’ method that has been the norm until now.
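
The gap between ‘answer-driven’ and ‘verification-driven’ rewards can be shown with a small Python sketch. Everything in it is an assumption for illustration (the `Step` class, the two reward functions, the arithmetic task); it is not how DeepSeek scores proofs. It just shows a solution that lands on the right answer through two wrong steps: the answer-only reward pays it in full, while the step-checked reward pays nothing.

```python
from dataclasses import dataclass


@dataclass
class Step:
    claim: str  # human-readable statement of the step
    lhs: int    # actual value of the left-hand side
    rhs: int    # value the solution claims it equals


def outcome_reward(final_answer: int, expected: int) -> float:
    """Answer-driven: only the final answer is checked."""
    return 1.0 if final_answer == expected else 0.0


def step_checked_reward(steps: list[Step], final_answer: int, expected: int) -> float:
    """Verification-driven (toy version): any invalid step zeroes the reward."""
    if any(s.lhs != s.rhs for s in steps):
        return 0.0
    return outcome_reward(final_answer, expected)


# Task: evaluate 3 * 4 + 2, expected answer 14.
flawed = [Step("3 * 4 = 10", 3 * 4, 10), Step("10 + 2 = 14", 10 + 2, 14)]
sound = [Step("3 * 4 = 12", 3 * 4, 12), Step("12 + 2 = 14", 12 + 2, 14)]

flawed_outcome = outcome_reward(14, 14)               # rewarded despite bad steps
flawed_checked = step_checked_reward(flawed, 14, 14)  # rejected
sound_checked = step_checked_reward(sound, 14, 14)    # rewarded
```

A real verifier has to judge natural-language proof steps rather than compare integers, which is the hard part this sketch skips.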

  • CriticalResist8@lemmygrad.ml (OP)
    6 days ago

    I think this is still the “old” OCR version. They released a new one a few months ago (https://huggingface.co/deepseek-ai/DeepSeek-OCR) that can OCR pretty much anything. It can also compress what it reads and use it as context, which could unlock unlimited context.

    Mind you, the OCR model is only about 6 gigs, so you could totally run it on most modern GPUs. Yogthos is going to try to make it parse a full PDF, which will be very useful for uploading books to ProleWiki (I used to have a good OCR program, but after an OS change it doesn’t work anymore…)

    Wish it could come to their web interface too though 🥲