• Kissaki@programming.dev · 4 hours ago

      Their original blog post

      My personal conclusion, however, can only be that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any notably higher or more advanced degree than the other tools did before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that makes a significant dent in code analysis.

      This is just one source code repository, and maybe it does much better on other things. I can only comment on what it found here.

      So it comes down to “maybe slightly better, but not significantly better than existing tools, at least in the context of this single project” versus “greatest marketing stunt ever”.

      Some more quotes from that post, if you just want those – specifically context about their process and other AI tools:

      Before this first Mythos report, we had already scanned curl with several different very capable AI-powered tools (I mean in addition to running a number of “normal” static code analyzers all the time, using the pickiest compiler options, doing fuzzing on it for years, etc.). Primarily AISLE, Zeropath and OpenAI’s Codex Security have been used to scrutinize the code with AI. These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged into curl throughout the recent 8–10 months or so. A bunch of the findings these AI tools reported were confirmed vulnerabilities and have been published as CVEs. Probably a dozen or more.

      but the PR review bots regularly highlight issues that we fix: our merges would be worse without them. The AI reviews are used in addition to the human reviews.

      Once we dug into the details, we trimmed the list down and were left with one confirmed vulnerability. Of the other four, three were false positives (they highlighted shortcomings that are documented in the API documentation) and the fourth we deemed “just a bug”.

      The Mythos report on curl also contained a number of spotted bugs that it concluded were not vulnerabilities.

      All in all about twenty bugs that are described and explained very nicely. Barely any false positives, so I presume they have had a rather high threshold for certainty.

      curl is certainly getting better thanks to this report, but counted by the volume of issues found, all the previous AI tools we have used have resulted in larger bugfix amounts. This is only natural, of course, since the first tools we ran had many more and easier bugs to find. As we have fixed issues along the way, finding new ones is slowly becoming harder. Additionally, a bug can be small or big, so it’s not always fair to just compare numbers.

      Still very good

      But allow me to highlight and reiterate what I have said before: AI-powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers were in the past. All modern AI models are good at this now. Anyone with time and some experimental spirit can find security problems now. The high-quality chaos is real.

      How AI analyzers differ

      • They can spot when a comment says something about the code and conclude that the code does not work as the comment says.
      • They can check code for platforms and configurations we otherwise cannot run analyzers for.
      • They “know” details about third-party libraries and their APIs, so they can detect misuse or bad assumptions.
      • They “know” details about the protocols curl implements and can question code that seems to violate or contradict the protocol specifications.
      • They are typically good at summarizing and explaining a flaw, something that can be rather tedious and difficult with old-style analyzers.
      • They can often generate and offer a patch for an issue they find (even if the patch usually is not a 100% fix).

      We have not seen any AI so far report a vulnerability that would somehow be of a novel kind or something totally new. They do not reinvent the field in that way, but they do dig up more issues than any other tools did before.

      I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems.

  • makeshift0546@lemmy.today · 2 hours ago

    Dev on one of the most highly used simple tools finds project is secure. News at 11. They really would have to try hard to find a worse example.

  • statelesz@slrpnk.net · 5 hours ago

    I’m as sceptical of AI and the hype as anyone, but maybe the curl codebase is just quite secure and there are not many vulnerabilities left? Not finding a bunch of things doesn’t mean the model sucks. That’s a stupid conclusion.

    • Not a newt@piefed.ca · 1 hour ago

      Daniel has been quite vocal about his views on AI slop reports, but he’s also been honest about how some AI systems have been able to identify issues in the curl code, ranging from documentation drift to actual vulnerabilities. It’s not that Mythos isn’t finding vulns. It’s that Mythos is not noticeably better at finding them than other tools (LLM or non-LLM), unlike what Anthropic are claiming.

  • HaraldvonBlauzahn@feddit.org (OP) · 6 hours ago

    Note: This does not mean that AI tools can’t find bugs. There are plenty of tools that are able to, and for sure plenty of bugs out there that have not been found yet.