Writing code with an AI as an experienced software developer is like writing code by instructing a junior developer.
… That keeps making the same mistakes over and over again because it never actually learns from what you try to teach it.
Yep, except the junior is actually capable of learning.
Wait till I get hired as a junior
Yeah, not all people who enter the industry should be doing so.
Most of this was boomers being boomers and claiming anyone and everyone should code.
My job believes the solution to this is a 7,000-line agents.md file
Sometimes. And if they’re not, they’ll be replaced or replace themselves.
This is not really true.
The way you teach an LLM, outside of training your own, is with rules files and MCP tools. Record your architectural constraints, favored dependencies, and style guide information in your rule files and the output you get is going to be vastly improved. Give the agent access to more information with MCP tools and it will make more informed decisions. Update them whenever you run into issues and the vast majority of your repeated problems will be resolved.
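To make that concrete, here’s roughly the kind of thing I mean. This is just an illustrative sketch: the filename and every rule in it are placeholders, and different tools read different files (.cursor/rules, CLAUDE.md, AGENTS.md, etc.).

```python
# Illustrative sketch only: write a small starter rules file for a coding agent.
# The filename (AGENTS.md) and every rule below are placeholders; adapt them
# to whatever your tool actually reads (.cursor/rules, CLAUDE.md, etc.).
from pathlib import Path

RULES = """\
# Project rules

## Architecture
- All database access goes through the repository layer; no raw SQL in request handlers.

## Dependencies
- Prefer the standard library; adding a third-party package requires a note in the PR.

## Style
- Follow the checked-in formatter/linter config; do not hand-format code.
"""

Path("AGENTS.md").write_text(RULES, encoding="utf-8")
print(f"Wrote AGENTS.md ({len(RULES.splitlines())} lines of project rules)")
```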
Well, that’s what they say, but then it doesn’t actually work, and even if it did, it’s not any easier or cheaper than teaching humans to do it.
More to the point, that is exactly what the people in this study were doing.
If it doesn’t work for you, it’s because you’re a failure!
Still not convinced these LLM bros aren’t junior developers (at best) who someone gave a senior title to because everyone else left their shit hole company.
They don’t really go into a lot of detail about what they were doing. But they have a table on limitations of the study that would indicate it is not:

“We do not provide evidence that: There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting. Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup.”
Back to this:

“even if it did, it’s not any easier or cheaper than teaching humans to do it.”
In my experience, the kinds of information that an AI needs to do its job effectively have a significant overlap with the info humans need when just starting on a project. The biggest problem for onboarding is typically poor or outdated internal documentation. Fix that for your humans and you have it for your LLMs at no extra cost. Use an LLM to convert your docs into rules files and to keep them up to date.
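As a rough sketch of what that conversion could look like (this assumes the OpenAI Python client with an API key in the environment; the model name, the docs/ folder, and the output filename are all placeholders for whatever you actually use):

```python
# Rough sketch: distill existing onboarding docs into a rules file with an LLM.
# Assumes the OpenAI Python client and an API key in the environment; the model
# name, the docs/ layout, and the output filename are placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Gather the existing internal docs (placeholder layout: markdown files in docs/).
docs = "\n\n".join(p.read_text(encoding="utf-8") for p in Path("docs").glob("*.md"))

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": "Summarize these internal docs into concise rules for a "
                       "coding agent: architecture constraints, preferred "
                       "dependencies, and style conventions. Output markdown.",
        },
        {"role": "user", "content": docs},
    ],
)

# Write the distilled rules where the agent will pick them up (placeholder name).
Path("AGENTS.md").write_text(response.choices[0].message.content, encoding="utf-8")
```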
Your argument depends entirely on the assumption that you know more about using AI to support coding than the experienced devs that participated in this study. You want to support that claim with more than a “trust me, bro”?
Do you think that like nobody has access to AI or something? These guys are the ultimate authorities on AI usage? I won’t claim to be, but I am a 15 YOE dev working with AI right now, and I’ve found the quality is a lot better with better rules and context.
And, ultimately, I don’t really care if you believe me or not. I’m not here to sell you anything. Don’t use the tools, doesn’t matter to me. Anybody else who does use them, give my advice a try and see if it helps you.
These guys all said the same thing before they participated in a study that proved that they were less efficient than their peers.
Again, read and understand the limitations of the study. Just the portion I quoted you alone is enough to show you that you’re leaning way too heavily on conclusions that they don’t even claim to provide evidence for.
That is a moronic take. You would be better off learning to structure your approach to SW development than trying to learn how to use a glorified slop machine to plagiarize other people’s works.
Codex literally lies about being connected to configured MCP servers.
Are you trying to make a point that agents can’t use MCP based off of a picture of a tweet you saw or something?
I’m talking from my personal, daily experience using codex.
In theory yes.
In practice I find the more stuff like this you throw at it, the more rope it has to hang itself with. And you spend so much time adjusting prompts so it doesn’t do the wrong things that you’d have been better off just doing half of the tasks yourself.
This is why you use a downloaded LLM and customize it; there are ways to fix these issues.
Unless you are retraining the model locally at your 23-acre data center in your garage after every interaction, it’s still not learning anything. You are just dumping more data into its temporary context.
Sounds like you have no clue what an LLM/AI actually is or is capable of.
https://medium.com/sciforce/step-by-step-guide-to-your-own-large-language-model-2b3fed6422d0
It’s not hard to keep a data library updated for context, and some are under a TB in size.
Where are you getting your information from?
It seems you are still confusing context with training? Did you read that text and understand it?
Did you follow it yourself to build an llm?
I bet they had an LLM read it and summarize it for them
Why do you think it’s solely a training issue?
So, you did not? Ok
Can’t answer the question eh?
What a shocker.
If you can’t explain or justify your side, I’ve got no time for people like you.
What part of customize did you not understand?
And lots fit on personal computers, dude. Do you even know what different LLMs there are…?
One for programming doesn’t need all the fluff of books and art, so now it’s a manageable size. LLMs are customizable to any degree; you can even use your own data library for the context data!
What part about how LLMs actually work do you not understand?
“Customizing” is just dumping more data into its context. You can’t actually change the root behavior of an LLM without rebuilding its model.
Yes, which would fix the incorrect coding issues. It’s not an LLM issue; it’s too much data. Or remove the context causing that issue. These require a little legwork and knowledge to make useful, like anything else.
You really don’t know how these work do you?
You do understand that the model weights and the context are not the same thing, right? They operate completely differently and have different purposes.

Trying to change the model’s behavior using instructions in the context is going to fail. That’s like trying to change how a word processor works by typing into the document. Sure, you can kind of get the formatting you want if you manhandle the data, but you haven’t changed how the application works.
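Here’s a toy sketch of the distinction, with entirely made-up names. It’s not a real LLM, just an illustration that context is re-read on every request and then forgotten, while the weights only change if you actually train the model:

```python
# Toy illustration of weights vs. context (not a real LLM; names are made up).
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    # "Weights": persistent parameters, fixed unless you actually train.
    weights: dict = field(default_factory=lambda: {"style_bias": 0.9})

    def generate(self, context: str, prompt: str) -> str:
        # "Context": read fresh on every call, then forgotten.
        combined = f"{context}\n{prompt}".strip()
        return f"output from weights={self.weights} given {len(combined)} chars of input"

    def fine_tune(self, examples: list[str]) -> None:
        # Only this path changes the model's persistent behavior.
        self.weights["style_bias"] -= 0.1 * len(examples)


model = ToyModel()
rules = "Rule: never suggest deprecated APIs."

print(model.generate(rules, "Write a function."))   # rules shape this call only
print(model.generate("", "Write a function."))      # next call: the rules are gone
model.fine_tune(["curated example A", "curated example B"])  # weights actually change
print(model.weights)                                 # persistent change survives the call
```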
Why are you so focused on just the training? The data is ALSO the issue.
Of course, if you ignore the one fix that works, then of course you can only cry that it’s not fixable.
But it is.
But
Is not inside the context; that comes from training. So you know how an LLM works?
Where do you think the errors are coming from? From data bleed-over: the word “coding” shows up in books, so yes, the context would incorrectly pull book data too.
Or do you not realize coding books exist as well…? And would be in the dataset.
If it’s constantly making an error, fix the context data, dude. What about an LLM/AI makes you think this isn’t possible…? Lmfao, you just want to bitch about AI, not comprehend how they work.
This is Lemmy, bitching about AI is the norm.
Yeah, but LLMs still consistently don’t follow all the rules they’re given; they’ll randomly ignore one or more with no indication that they did so, so you can’t really fix these issues consistently, just most of the time.

Edit: to put this a little more clearly after a bit more thought: it’s not even necessarily a problem that it doesn’t always follow rules, it’s more that when it doesn’t follow the rules, there’s no indication that it skipped them. If it had that, it would actually be fine!
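For rules that are mechanically checkable, one way to get that indication is to not trust the model at all and check its output yourself, say with a small script in CI or a pre-commit hook. This is only a sketch; the specific rule (no direct use of the requests library) and the src/ layout are made-up examples:

```python
# Sketch: fail loudly when generated code breaks a mechanically checkable rule,
# instead of trusting the model to have followed it. The rule and paths below
# are made-up examples.
import pathlib
import re
import sys

# Example rule: HTTP calls must go through an internal wrapper, not `requests`.
FORBIDDEN = re.compile(r"^\s*(import requests\b|from requests\b)", re.MULTILINE)

violations = [
    str(path)
    for path in pathlib.Path("src").rglob("*.py")
    if FORBIDDEN.search(path.read_text(encoding="utf-8"))
]

if violations:
    print("Rule violated (use the internal HTTP wrapper, not requests):")
    print("\n".join(f"  {v}" for v in violations))
    sys.exit(1)

print("All checked rules passed.")
```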
Without the payoff of the next generation of developers learning.
Management: “Treat it like a junior dev”
… So where are we going to get senior devs if we’re not training juniors?
Very true. I’ve been saying this for years. However, the flip side is that you get the best results from AI by treating it as a junior developer as well. When you do, you can in fact have a fleet of virtual junior developers working for you as a senior.
However, and I tell this to the junior I work with: you are responsible for the code you put into production, regardless of whether you wrote it yourself or used AI. You must review what it creates because you’re signing off on it.
That in turn means you may not save as much time as you think, because you have to review everything, and you have to make sure you understand everything.
But understanding will get progressively harder the more code is written by other people or AI. It’s best to try to stay current with the code base as it develops.
Unfortunately this cautious approach does not align with the profit motives of those trying to replace us with AI, so I remain cynical about the future.
Usually, having to wrangle a junior developer takes a senior more time than doing the junior’s job themselves. The problem grows the more juniors they’re responsible for, so having LLMs simulate a fleet of junior developers will be a massive time sink and not faster than doing everything themselves. With real juniors, though, this can still be worthwhile, as eventually they’ll learn, and then require much less supervision and become a net positive. LLMs do not learn once they’re deployed, though, so the only way they get better is if a cleverer model is created that can simulate a mid-level developer, and so far, the diminishing returns of progressively larger and larger models make it seem pretty likely that something based on LLMs won’t be enough.
I’m a senior working with junior developers, guiding them through difficult tasks and delegating work to them. I also use AI for some of the work. Everything you say is correct.
However, that doesn’t stop a) some seniors from spinning up several copies of AI and testing them like a group of juniors, and b) management from seeing this as a way to cut personnel.
I think denying these facts as a senior is just shooting yourself in the foot. We need to find the most productive ways of using AI or become obsolete.
At the same time we need to ensure that juniors can develop into future seniors. AI is throwing a major wrench in the works of that, but management won’t care.
Basically, the smart thing to do is to identify where AI, seniors, and juniors all fit in. I think the bubble needs to pop before that truly happens, though. Right now there’s too much excitement about cutting costs/salaries among the people holding the purse strings. Until AI companies start trying to actually make a profit, that won’t happen.
If LLMs aren’t going to reach a point where they outperform a junior developer who needs too much micromanaging to be a net gain to productivity, then AI’s not going to be a net gain to productivity, and the only productive way to use it is to fight its adoption, much like the only way to productively use keyboards that had a bunch of the letters missing would be to refuse to use them. It’s not worth worrying about obsolescence until such a time as there’s some evidence that they’re likely to be better, just like how it wasn’t worth worrying about obsolescence yet when neural nets were being worked on in the 80s.
You’re not wrong, but in my personal experience AI that I’ve used is already at the level of a decent intern, maybe fresh junior level. There’s no reason it can’t improve from there. In fact I get pretty good results by working incrementally to stay within its context window.
I was around for the dotcom bubble and I expect this to go similarly: at first there is a rush to put AI into everything. Then they start realizing they have to actually make money and the frivolous stuff drops by the wayside and the useful stuff remains.
But it doesn’t go away completely. After the dotcom bust, the Internet age was firmly upon us, just with less hype. I expect AI to follow a similar trend. So, we can hope for another AI winter or we can figure out where we fit in. I know which one I’m doing.
There’s a pretty good reason to think it’s not going to improve much. The size of models and amount of compute and training data required to create them is increasing much faster than their performance is increasing, and they’re already putting serious strain on the world’s ability to build and power computers, and the world’s ability to get human-written text into training sets (hence why so many sites are having to deploy things like Anubis to keep themselves functioning). The levers AI companies have access to are already pulled as far as they can go, and so the slowing of improvement can only increase, and the returns can only diminish faster.
I can only say I hope you’re right. I don’t like the way things are going, but I need to do what I can to adapt and survive so I choose to not put my hopes on AI failing anytime soon.
By the way, thank you for the thoughtful responses and discussion.
Apparently some people would love to manage a fleet of virtual junior devs instead of coding themselves, I really don’t see the appeal.
I think the appeal is that they already tried to learn to code and failed.
Folks I know who are really excited about vibe coding are the ones who are tired of not having access to a programmer.
In some of their cases, vibe coding is a good enough answer. In other cases, it is not.
Their workplaces get to find out later which cases were which.
Funny, cause my experience is completely the reverse. I’ve seen a ton of medium-level developers just use Copilot-style autocomplete without really digging into new workflows, and on the other end really experienced people spinning up agents in parallel and getting a lot of shit done.
The “failed tech business people” are super hyped for ten minutes when cursor gives them a static html page for free, but they quickly grow very depressed when the actual work starts. Making sense of a code base is where the rubber meets the road, and agents won’t help if you have zero experience in a software factory.
That’s the funny thing. I definitely fall into the ‘medium level’ dev group (coding is my job, but I haven’t written a single line of code in my spare time for years), and frankly, I really like Copilot. It’s like the standard code completion on steroids. No need to spend excessive amounts of time describing the problem and reviewing a massive blob of dubious code; just short-ish snippets of easily reviewed code based on the current context.
Everyone seems to argue against AI as if vibe coding is the only option and you have to spend time describing every single task, but I’ve changed literally nothing in my normal workflow and get better and more relevant code completion results.
Obviously having to describe every task in detail taking edge cases into account is going to be a waste of time, but fortunately that’s not the only option.
What a wonderful statement.
I get what you are saying and agree. But corporations don’t give a fuck. As long as they can keep seeing increased profits from it, it’s coming. It’s not about code quality or time or humans. It’s about profits.
Are they though? They’ve invested like a trillion dollars into this and it doesn’t seem any closer to actually making money.
True. The AI providers are having issues. We all know OpenAI is hemorrhaging money, and I think Anthropic is as well. They are all passing money between each other. But software companies, like the one I work for, don’t care what those companies are doing. As long as my company can use the services they provide, it’s not an issue if the AI providers themselves are losing money. Or if software companies can shove out their own AI feature (like the AI in ServiceNow, or how Office 365 is getting some rebranding), all is well and they can brag about having AI to the shareholders.
That’ll work right up until the shareholders start hearing “we got AI!” as the equivalent to “we invested in Enron!”. I hope they have a plan for that.
It reminds me of that period when A/B testing was big and everybody and their mother had to at least do some. Never mind that it solved problems we didn’t have; it was still a cool thing to say in a meeting lol
Wow, great analogy. Might steal this to use myself.
And that’s what I don’t understand. Instructing a team of juniors works very well; in fact, it has been the predominant way of making software for some time now. Hire people a bit more junior than what you need, and work them a bit above their pay grade thanks to your experience. That’s just business as usual.
So I guess what these studies show is that most engineers are not really good when it comes to piloting juniors, which has been a known fact forever. That’s often cited as a reason why most seniors will never make it to staff level.