• Kissaki@programming.dev · 4 hours ago

      Their original blog post

      My personal conclusion, however, can only be that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any notably higher or more advanced degree than the other tools did before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that makes a significant dent in code analysis.

      This is just one source code repository, and maybe it does much better on other things. I can only comment on what it found here.

      So it comes down to “maybe slightly better, but not significantly better than existing tools, at least in the context of this single project” versus “greatest marketing stunt ever”.

      Some more quotes from that post, if you just want those – specifically context about their process and other AI tools:

      Before this first Mythos report, we had already scanned curl with several different very capable AI-powered tools (I mean in addition to running a number of “normal” static code analyzers all the time, using the pickiest compiler options, doing fuzzing on it for years, etc.). Primarily AISLE, Zeropath and OpenAI’s Codex Security have been used to scrutinize the code with AI. These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged into curl throughout the recent 8–10 months or so. A bunch of the findings these AI tools reported were confirmed vulnerabilities and have been published as CVEs. Probably a dozen or more.

      but the PR review bots regularly highlight issues that we fix: our merges would be worse without them. The AI reviews are used in addition to the human reviews.

      Once we dug into the details, we trimmed the list down and were left with one confirmed vulnerability. Of the other four, three were false positives (they highlighted shortcomings that are documented in the API documentation) and the fourth we deemed “just a bug”.

      The Mythos report on curl also contained a number of spotted bugs that it concluded were not vulnerabilities.

      All in all about twenty bugs that are described and explained very nicely. Barely any false positives, so I presume they have had a rather high threshold for certainty.

      curl is certainly getting better thanks to this report, but counted by the volume of issues found, all the previous AI tools we have used have resulted in larger bugfix amounts. This is only natural, of course, since the first tools we ran had many more and easier bugs to find. As we have fixed issues along the way, finding new ones is slowly becoming harder. Additionally, a bug can be small or big, so it’s not always fair to just compare numbers.

      Still very good

      But allow me to highlight and reiterate what I have said before: AI-powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers were in the past. All modern AI models are good at this now. Anyone with time and some experimental spirit can find security problems now. The high-quality chaos is real.

      How AI analyzers differ

      • They can spot when a comment says something about the code and conclude that the code does not work as the comment says.
      • They can check code for platforms and configurations we otherwise cannot run analyzers for.
      • They “know” details about third-party libraries and their APIs, so they can detect misuse or bad assumptions.
      • They “know” details about the protocols curl implements and can question code that seems to violate or contradict the protocol specifications.
      • They are typically good at summarizing and explaining a flaw, something that can be rather tedious and difficult with old-style analyzers.
      • They can often generate and offer a patch for an issue they find (even if the patch usually is not a 100% fix).

      We have not seen any AI so far report a vulnerability that would somehow be of a novel kind or something totally new. They do not reinvent the field in that way, but they do dig up more issues than any other tools did before.

      I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems.

  • makeshift0546@lemmy.today · 2 hours ago

    Dev on one of the most highly used simple tools finds project is secure. News at 11. They really would have to try hard to find a worse example.

  • statelesz@slrpnk.net · 5 hours ago

    I’m as sceptical of AI and the hype as anyone, but maybe the curl codebase is just quite secure and there are not many vulnerabilities left? Not finding a bunch of things doesn’t mean the model sucks. That’s a stupid conclusion.

    • Not a newt@piefed.ca · 1 hour ago

      Daniel has been quite vocal about his views on AI slop reports, but he’s also been honest about how some AI systems have been able to identify issues in the curl code, ranging from documentation drift to actual vulnerabilities. It’s not that Mythos isn’t finding vulns. It’s that Mythos is not noticeably better at finding them than other tools (LLM or non-LLM), unlike what Anthropic are claiming.

  • HaraldvonBlauzahn@feddit.org (OP) · 6 hours ago

    Note: This does not mean that AI tools can’t find bugs. There are plenty of tools that are able to, and for sure plenty of bugs out there that have not been found yet.