Developers who are told to use AI whether they like it or not, however, tell a different story.
Well there’s the problem.
I’m a software developer and I say that AI is the greatest force-multiplier that’s been introduced into the field since the compiler. I love using it; it handles the most tedious and annoying parts of the process. But there are situations I don’t want to use it in, and of course being forced to use it would give me a more negative opinion of it. Obviously.
As someone who works with coworkers who fully embraced it, it doesn’t look like they are any faster. There is one group that is faster, but they don’t verify their code and push the burden onto whoever reviews the PR to go through their shit code (sorry, but it is unnecessarily complex, does things in weird ways, and I’ve seen it have bugs that even canceled each other out, probably from re-running until things work).
There isn’t any credible evidence out there that actually shows LLMs are a “force multiplier.” That is almost certainly just a made up marketing term for unprofitable chatbot companies.
If it’s a tool you can use yourself and it makes you more efficient, you don’t need a study to recognize its efficiency.
If you’re a software engineer, just try it yourself. Your own experience is the best proof you can find to judge if a tool is useful to you or not.
I am a software engineer, and trying it is exactly how I know it is not a “force multiplier.”
Outside of my personal experience, there’s also zero actual evidence it provides anywhere near the benefits it’s marketed as providing.
Then we have a different experience, and that’s fine.
In this case the evidence is literally first-hand experience. There is nothing that will change my mind on this because it’s my direct personal experience from actual use.
I honestly don’t care what marketing says, and if other people have different experiences then that’s just them. In my own real-world experience, they let me get tons more done, and their quality of work is perfectly fine as long as you’re using the right tools and giving them the right instructions.
The article says that developers are disagreeing with that in situations where they are “forced” to use AI, and that’s fair; it doesn’t make sense to force a tool to be used for something it’s not good at. They might be using it wrong. I use it whenever it’s better than not using it, and that ends up being quite often in my workflow.
Unfortunately you’re being downvoted by the echo chamber participants who have to make sure you know that your opinion is wrong and theirs is better. AI is a tool, just like my impact gun. Yeah, there are times where you absolutely should not use an impact gun on something, but it’s THE tool for some situations. And yeah, using an impact gun where you shouldn’t will get you in trouble, just like using AI in situations you shouldn’t will get you in trouble. There is nothing new on that front!
I don’t disagree with the post you are responding to, almost all of that is reasonable.
Your overall argument would be more convincing if you weren’t doing the exact same thing you’re complaining about.
As for specifics, the “just a tool” argument is meh; not all tools are equal in potential benefit and harm.
Asbestos (while it is a material) was a “tool” used to insulate from heat.
Was it good at that? Sure, and it probably saved many lives. Was it also harmful as fuck in the medium to long term? Yes, it was.
It can be a useful tool and also be a detriment, those things aren’t mutually exclusive.
The danger of a tool can also be mitigated with adequate safeguards that come from experience gained over time.
The argument then becomes risk vs reward, which is an entirely different conversation.
There are way too many ways to use LLMs for programming to make a blanket statement.
‘Your personal experience is invalid because it goes against the circlejerk’
Did you miss the part about no credible evidence? Feeling like something is a certain way doesn’t make it true.
I kind of agree it’s a multiplier. But so far, every time I’ve had it do something, it’s written such an ugly turd that I have to rework it all, taking more time than if I’d just solved the problem myself to start with. Maybe someday, but it’s not up to the quality I expect of development.
It’s definitely a force multiplier, it’s just that the factor after the X can be less than 1.0.
Have you tried giving it coding standards and other such preferences about how you like your code to be organized? I’ve found that coding agents can be quite adaptable to various styles: you can put stuff like “try to keep functions less than 100 lines long” or “include assertions validating all function inputs” into your coding agent’s general instructions and it’ll follow them.
For me, one of the things that’s a huge fundamental improvement is telling the agent to create and run unit tests for everything. That way when it does mess up accidentally it can immediately catch the problem and usually fixes it in the same session without further intervention. Unit tests used to be more trouble than they were worth most of the time, now I love them.
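To make that concrete, here’s a minimal sketch of what such an instruction file might look like (the file name depends on the tool, e.g. CLAUDE.md for Claude Code; the wording is just an illustration):

```
# Project conventions for the coding agent (illustrative example)
- Try to keep functions less than 100 lines long.
- Include assertions validating all function inputs.
- For every change, create or update unit tests and run them
  before reporting the task as done.
```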
We have ours configured with our coding standards and MCPs, and we have a skill library.
It still outputs code full of mistakes. Usually they’re minor mistakes, but not always.
When we use it to fix defects, it usually fixes the problem, but not in a very robust way. It still needs a lot of supervision to output quality code. For example, it will often spot-fix defects instead of applying the principle of the fix to other areas that also need it (e.g., we needed to normalize some data, but it only did so in one place because the ticket only mentioned that one place, even though that data is used elsewhere as well).
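To illustrate the difference with a hypothetical Python sketch (the names are invented):

```python
def normalize_phone(raw: str) -> str:
    """Hypothetical normalization rule: keep digits only."""
    return "".join(ch for ch in raw if ch.isdigit())

# Spot fix: only the code path named in the ticket gets the call.
def handle_signup(form: dict) -> dict:
    form["phone"] = normalize_phone(form["phone"])
    return form

# The robust fix normalizes where the data enters the system, so every
# consumer (signup, bulk import, API) sees clean values.
class Contact:
    def __init__(self, phone: str) -> None:
        self.phone = normalize_phone(phone)
```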
It’s a helpful tool for sure, but it’s rare that I don’t need to make corrections.
You… just started writing unit tests?
No, I’ve used them plenty before. I just found them to generally be a huge hassle with minimal benefit. They became much more useful in the context of agentic coding, where you want the agent to be able to immediately realize “oh, this change I made causes these specific problems when it’s run.” The hassle is all on the agent, not on me.
I think we do very different development.
Could be. I’m a professional programmer whose usage runs the whole gamut - large applications with hundreds of programmers working on them for years, smaller apps that I make for my own use, and one-off scripts to do some particular task and then generally throw away afterwards.
I don’t do unit tests for that last category, of course. I don’t even use coding agents for those, generally speaking - a bit of back-and-forth in a chat interface is usually enough there.
Is this like a who’s got a bigger portfolio situation? I’m not sure how to respond
I guess I’ve been developing for decades, including consulting for Page 6 and a stint in R&D at Sony Music. One of my open source contributions was used as part of the backend for one of Obama’s State of the Unions. I spend my time these days writing and maintaining multiple software stacks integrating across multiple platforms.
Since you brought up the notion that we might be doing different styles of development, I was giving you context as to the kinds of development that I do. Sounds like we might not be doing such different scales of development after all, but I couldn’t have known that until you gave that information just now.
This isn’t supposed to be some kind of duel or argument, I don’t see the point of that. I’m just explaining my usage of coding agents, and specifically unit tests in that context, since that’s what you were questioning.
Wow what a circlejerk this turned into.
Oh well, I guess that’s what everything really is the whole time.
I’ll say that during a recent week where I was forced to use an LLM, I found Claude Opus to be extremely poor at referencing this guide: https://mywiki.wooledge.org/BashPitfalls
It took almost an hour to get Claude to write me a shell script that I considered to be of acceptable quality. It completely hallucinated about several of the points in that guide, requiring me to just go read the guide myself to verify that the language model was falsifying information. That same task would have taken me about 5 minutes.
I believe that GIGO applies here. 99% of shell scripts on the internet are unsafe and terrible (looking at you, set -euo pipefail), and Claude is much more likely to generate god-awful garbage because of the inherent bias present in the training data.

And as for unit tests? IMO, anything other than property-based testing is irrelevant. If you’re using something like Pydantic, you can auto-generate a LOT of your tests using the rich type annotations available in that library along with Hypothesis. I tend to write a testing framework once, and then special-case property tests for things that fall outside of my models. None of this is super helpful for big ugly codebases with a lot of inertia around practices, but that’s not been my environment, thankfully.
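As a minimal sketch of that approach (the model and the property under test are hypothetical; assumes Pydantic v2 and Hypothesis):

```python
from hypothesis import given, strategies as st
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

def normalize(user: User) -> User:
    """Hypothetical function under test: trims whitespace from names."""
    return user.model_copy(update={"name": user.name.strip()})

# Generate valid User instances from the type annotations and assert a
# property (idempotence) instead of hand-picking example cases.
@given(st.builds(User, name=st.text(), age=st.integers(min_value=0, max_value=150)))
def test_normalize_is_idempotent(user: User) -> None:
    assert normalize(normalize(user)) == normalize(user)
```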
Why not just give it shellcheck and have it run that on every script it creates?
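For instance, a minimal sketch of that loop in Python (assumes shellcheck is installed; the path is invented):

```python
import subprocess

def lint_script(path: str) -> str:
    """Run shellcheck on a generated script; empty output means clean."""
    result = subprocess.run(["shellcheck", path], capture_output=True, text=True)
    return result.stdout

# Feed any findings back to the agent for another pass.
findings = lint_script("generated/deploy.sh")
if findings:
    print("Fix these shellcheck findings:\n" + findings)
```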
Shellcheck, while good, doesn’t capture all best practices in my opinion. There are many items in that doc which shellcheck would happily allow, worst of all being set -euo pipefail.

Sounds like you were writing bad unit tests and AI showed you how to do it right.
If so, it was project-wide across hundreds of devs.
It lets me focus on the software architecture, not the minutiae. It feels exactly like when I ran a team of brand new interns. They require a lot of hand holding but with the right direction they get good at their jobs very fast.
I think the problem is that, for now at least, it will continue to require that hand-holding, whereas interns and new programmers need less and less of it and become more independent over time.
Some programmers do get more independent. Some do not.