I felt the familiar feeling of depression and lethargy creep in while my eyes darted between watching claude-code work and my phone. “What’s the point of it all?” I thought. LLMs can generate decent-ish, correct-ish-looking code while I have more time to do what? Doomscroll? This was the third time I’d given claude-code a try. I felt the same feelings every single time and ended up deleting claude-code after 2-3 weeks, and whaddyouknow?
Writing code is only the tip of the iceberg. You actually have to:
While large language models can help with the last step, they are very limited in the previous ones, except as a search engine on steroids.
More like a search engine on LSD.
AI results are always shit when you’re trying to find anything that isn’t completely obvious. More often than not you end up with a hallucinated reality that has absolutely no value.
No, AI results can be quite good, especially if your internal documentation is poor and disorganised. Fundamentally you cannot trust it, but in software we have the luxury of being able to check solutions cheaply (usually).
Our internal search at work is dogshit, but the internal LLM can turn up things quicker. Do I wish they’d improve the internal search? Yes. Am I going to make that my problem by continuing to use a slower tool? No.
“It’s quite good” “you cannot trust it”
What is your definition of good?
What a recipe for disaster…
Something that you can’t trust can still be good, as long as its accuracy is sufficiently high and it can be verified without significant penalty.
In my country, you would never just trust the weather forecast if your life depended on it not raining: if you book an open-air event more than a week in advance, the plan cannot rely on the weather being fair, because the long-range forecast is not that reliable. But this is OK if the cost of inaccuracy is that you take an umbrella with you, or change plans last-minute and stay in. It’s not OK if you don’t have an umbrella, or if staying in would cost you dearly.
In software development, if you ask a question like, “how do I fix this error message from the CI system”, and it comes back with some answer, you can just try it out. If it doesn’t work, oh well, you wasted a few minutes of your time and some minutes on the CI nodes. If it does, hurrah!
Given that the alternative, in practice, is often spending hours digging through internal posts and messaging other people (disrupting their time) who don’t know the answer, only to end up with a hacky workaround, this is well worth a go - at my place of work, anyway. In fact, let’s compare the AI process to the internal search one. I search for the error message and the top 5 results are all completely unrelated. That isn’t much different from the AI returning a hallucinated solution - the difference is the cost of checking. To check a search result, I read the post: it probably takes 30 seconds to click the link, load the page, and read enough of it to see it’s wrong. To check the AI’s suggestion, I run the command it gives (or whatever), which takes a few minutes of my time actually typing commands, watching them run, and looking at the results (not counting waiting for CI to complete, which I can spend doing something else). So that ratio - a few minutes versus 30 seconds - is roughly how much better the LLM needs to be than search, in terms of the percentage of good results.
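To put a rough number on that, here’s a back-of-envelope sketch. The per-check times are the figures I used above; the hit rates are completely made up for illustration.

```python
# Back-of-envelope comparison of "ask the LLM and verify" vs "use internal search".
# All numbers are assumptions: ~0.5 min to dismiss a bad search result,
# ~3 min to try out a bad LLM suggestion, hit rates invented for illustration.

def expected_minutes(hit_rate: float, minutes_per_check: float) -> float:
    """Expected time until one suggestion actually works.

    Attempts are treated as independent, so on average you need
    1 / hit_rate tries, each costing `minutes_per_check`.
    """
    return minutes_per_check / hit_rate

search_cost = 0.5   # minutes to click a result, skim it, and see it's unrelated
llm_cost = 3.0      # minutes to run the suggested command and look at the output

search_hit_rate = 0.05  # assumed: 1 in 20 internal search results is useful
llm_hit_rate = 0.40     # assumed: 2 in 5 LLM answers actually work

print(f"search: {expected_minutes(search_hit_rate, search_cost):.1f} min per solved query")
print(f"LLM:    {expected_minutes(llm_hit_rate, llm_cost):.1f} min per solved query")

# Break-even: the LLM wins whenever its hit rate beats the search hit rate
# by more than the ratio of per-attempt costs (here 3.0 / 0.5 = 6x).
print(f"LLM needs a hit rate at least {llm_cost / search_cost:.0f}x the search hit rate")
```

With those made-up hit rates the LLM comes out slightly ahead; the point is just that the break-even is the ratio of checking costs, nothing more exotic.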
Like I said, I wish that the state of our internal search and internal documentation were better, but it ain’t.
Good point. Reading the documentation of the library and the source code is often a better use of a software developer’s time.