I’m no fan of AI generally, but “AI Vulnerable” as a term just doesn’t make much sense to me. Code reviewing should be filtering out bad code whether it originates from an AI or a human.
PR spamming with the usage of AI is another problem which is very serious and harmful for OSS, but that’s not due to some unique danger that only AI code has and human contributors don’t.
Code reviewing should be filtering out bad code whether it originates from an AI or a human.
But studies are showing it doesn’t work.
A human makes a mental model of the entire system, does some testing, and submits code that works, passes tests, and fits their understanding of what is needed.
A present day AI makes an educated guess which existing source code snippets best match the request, does some testing, and submits code that it judges is most likely to pass code review.
And yes, plenty of human coders fall into the second bracket, as well.
But AI is very good at writing code that looks right. Code review is a good and necessary tool, but the data tells us code review isn’t solving the problem of bugs introduced by AI generated code.
I don’t have an answer, but “just use code review” probably isn’t it. In my opinion, “never use AI code assist” also isn’t the answer. There’s just more to learn about it, and we should proceed with drastically more caution.
Here’s an example I ran into. Work wants us to use AI to produce work stuff; whatever, they get to deal with the result.
I had asked it to add some debug code to verify that a process was working by saving the in-memory result of that process to a file, so I could check that the next step was even possible given the output of the first step (because the second step was failing). I get the file output and it looks fine, other than some missing whitespace, but that’s OK.
And then while debugging, it says the issue is that the step 1 data isn’t being passed to the function that calls it at all. Wait, how can this be? The file looks fine. Oh: when it added the debug code, it added a new code path that just calls the step 1 code (correctly). Which does work for verifying step 1 on its own, but not for verifying the actual code path.
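A minimal sketch of the difference (all names here are hypothetical, not from the actual codebase): a useful debug hook dumps the in-memory value at the point in the real pipeline where step 2 will consume it, while what the AI added was a separate path that re-runs step 1 from scratch.

```python
import json

def step1(raw):
    # Hypothetical: parse/transform the raw input.
    return {"items": sorted(raw)}

def step2(data):
    # Hypothetical: the downstream step that was failing.
    return [x * 2 for x in data["items"]]

def pipeline(raw, debug_path=None):
    data = step1(raw)
    if debug_path:
        # Useful debug hook: dump the exact value that step 2
        # will receive, from inside the real code path.
        with open(debug_path, "w") as f:
            json.dump(data, f)
    return step2(data)

def dump_step1_output(raw, debug_path):
    # What the AI did instead: a fresh path that re-runs step 1.
    # This verifies step 1 in isolation, but says nothing about
    # whether the real pipeline passes that data on to step 2.
    with open(debug_path, "w") as f:
        json.dump(step1(raw), f)
```

Both versions produce a plausible-looking file, which is why the output "looked fine" while telling you nothing about the failing hand-off.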
The code for this task is full of examples like that. It’s almost as if it is intelligent, but it’s using the genie model of helpfulness: technically following directions while subverting expectations anywhere they aren’t specified.
Thinking about my overall task, I’m not sure using AI has saved time. It produces code that looks more like final code, but adds a lot of subtle unexpected issues on the way.
It produces code that looks more like final code, but adds a lot of subtle unexpected issues on the way.
That is an excellent summary of the challenge. The code looks high quality sooner in the debug lifecycle, which actually makes debugging a little bit slower, at least with our current tools.
Yeah, it’s good enough that it even had me fooled, despite all my “it just correlates words” comments. It was getting to the desired result, so I was starting to think that the framework around the agentic coding AIs was able to give it enough useful context to make the correlations useful, even if it wasn’t really thinking.
But it’s really just a bunch of duct tape slapped over cracks in a leaky tank they want to put more water in. While it’s impressive how far it has come, the fundamental issues will always be there because it’s still accurate to call LLMs massive text predictors.
The people who believe LLMs have achieved AGI are either lying to try to prolong the bubble, hoping to reach the singularity before it pops, or revealing their own lack of expertise: they either haven’t noticed the fundamental issues, or they think those are minor things that can be solved because any individual instance can be patched.
But a) they can only be patched by people who already know the correction (so the patches won’t reach the bleeding edge until humans solve the problem they wanted the AI to solve), and b) it would require an effectively infinite number of these patches just to cover all permutations of what we already know.
A present day AI makes an educated guess which existing source code snippets best match the request, does some testing, and submits code that it judges is most likely to pass code review.
That’s still on the human that opened the PR without doing the slightest effort of testing the AI changes though.
I agree there should be a lot of caution overall; I just think the problem is a bit mischaracterized. The problem is the newfound ability to spam PRs that look legit but are actually crap, but the root cause is humans doing this for GitHub rep or whatever, not AI inherently making codebases vulnerable. There need to be ways to detect users who repeatedly make zero-effort contributions like that and ban them.
Yes, it is their fault; and also, that fault is a widespread problem.