A company not making self-serving predictions & studies.

  • Kissaki@programming.dev · 18 hours ago

    From the paper abstract:

    […] Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI.

    We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library.

    We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation – particularly in safety-critical domains.
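    For a concrete sense of what “conceptual understanding” of an async library means here, a minimal Python sketch may help. The abstract doesn’t name the library the study used, so this is only a generic asyncio illustration of the kind of concept a quiz might probe:

    ```python
    import asyncio

    async def fetch(name: str, delay: float) -> str:
        # Stand-in for an I/O-bound call (network, disk, etc.).
        await asyncio.sleep(delay)
        return f"{name}: done"

    async def main() -> None:
        # Sequential awaits: the second call starts only after the first
        # finishes, so this takes roughly 0.1 + 0.2 = 0.3 seconds.
        print(await fetch("a", 0.1))
        print(await fetch("b", 0.2))

        # Concurrent awaits: gather() schedules both coroutines together,
        # so this takes roughly max(0.1, 0.2) = 0.2 seconds.
        print(await asyncio.gather(fetch("c", 0.1), fetch("d", 0.2)))

    asyncio.run(main())
    ```

    Understanding why the two halves behave differently is the kind of code-reading knowledge the quiz questions appear to target.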

  • PolarKraken@programming.dev · 14 hours ago

    Interesting read and feels intuitively plausible. Also matches my growing personal sense that people are using these things wildly differently and having completely different outcomes as a result. Some other random disconnected thoughts:

    1. I’m surprised they’re publishing this; it seems to me like a pretty stark condemnation of the technology. What benefits do they anticipate that made them decide this should be published rather than quietly set aside “pending further research”? Obviously people knowing how to use the tools better is good for longevity, but that’s just not what our idiotic investment cycles prioritize.

    2. I’m no scientist or expert in experimental design, but this seems like way too few people for the level of detail they’re bringing to the conclusions they’re drawing. That, plus the way it all just feels intuitively plausible, gives the interpretation a very “just so” feeling rather than true exploration. I mean, c’mon: the behavioral buckets they’re talking about range from 2 to 7 people apiece, most commonly just 4 individuals. “Four junior engineers behaved kinda like this and had that average outcome” MIGHT reflect a broader pattern, but it sure doesn’t feel compelling or scientific.

    Nonetheless I selfishly enjoyed having my own vague subconscious observations validated lol, would like to see more of this (and anything else that seems to work against the crazy bubble being inflated).

    • AbelianGrape@beehaw.org · 12 hours ago

      For 1: as a software company, they have a vested interest in ensuring that software engineers are as capable as possible. I don’t know if Anthropic as a company uses this as a guiding principle, but certainly some companies do (e.g. Jane Street). So they might see this as more important than investment cycles.

      The quality of software engineers and computer scientists I’ve seen coming out of undergraduate programs in the last year has been astonishingly poor compared to 2-3 years ago. I think it’s almost guaranteed that the larger companies have also noticed this.

      • PolarKraken@programming.dev · edited · 8 hours ago

        I completely agree, and I sincerely appreciate that they released this. It’s unfortunate the way the obviously-nonsense claims made by the industry at large (“LLMs are AI and can do everything!”) have polluted a lot of devs’ ability or willingness to see the tools for what they are; maybe official acknowledgements like this one can help.

        It also seems likely to me that the major players know a lot of negative truths about all this stuff, so you’re probably right about the hiring observations. I don’t really follow any of their marketing, so I have to admit I’m just assuming that releasing this is out of character.

        If I’m being honest, I’m mostly just on the edge of my seat waiting for the hype bubble to burst, lol, and curious about how it’ll unfold. Probably just kind of hoping this marks a step toward that.

  • entwine@programming.dev · 14 hours ago

    In a randomized controlled trial, we examined 1) how quickly software developers picked up a new skill (in this case, a Python library) with and without AI assistance; and 2) whether using AI made them less likely to understand the code they’d just written.

    We found that using AI assistance led to a statistically significant decrease in mastery. On a quiz that covered concepts they’d used just a few minutes before, participants in the AI group scored 17% lower than those who coded by hand, or the equivalent of nearly two letter grades. Using AI sped up the task slightly, but this didn’t reach the threshold of statistical significance.

    Who designed this study? I assume it wasn’t a software engineer, because this doesn’t reflect real-world “coding skills”. This is just a programming-flavored memory test. Obviously the people who coded by hand remembered more about the library, in the same way that students who take notes by hand tend to remember more than those who type.

    A proper study would need to evaluate critical thinking and problem-solving skills using real-world software engineering tasks. Maybe find some already-solved but obscure bug in an open source project and have them try to fix it in a controlled environment (so they don’t just find the existing solution); see the sketch below.
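    Everything in this sketch is hypothetical: the repo path, test command, polling interval, and time limit are placeholders, not anything from the study. It just makes the proposed design concrete:

    ```python
    import subprocess
    import time

    # Hypothetical setup: each participant gets a working copy of a repository
    # checked out just *before* a known bug fix, and we measure whether (and
    # how fast) they make the project's test suite pass.

    def run_debugging_trial(repo_dir: str, test_cmd: list[str], limit_s: float) -> dict:
        """Re-run the project's tests until they pass or time runs out."""
        start = time.monotonic()
        while time.monotonic() - start < limit_s:
            result = subprocess.run(test_cmd, cwd=repo_dir, capture_output=True)
            if result.returncode == 0:
                return {"solved": True, "seconds": time.monotonic() - start}
            time.sleep(30)  # poll the participant's working copy every 30 seconds
        return {"solved": False, "seconds": limit_s}

    # Example invocation (placeholder values):
    # outcome = run_debugging_trial("/sandbox/participant-07", ["pytest", "-x"], 3600)
    ```

    Scoring pass/fail and time-to-fix this way would measure problem solving directly, rather than recall of an API used minutes earlier.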

  • eleijeep@piefed.social · 19 hours ago

    Discussion

    Our main finding is that using AI to complete tasks that require a new skill (i.e., knowledge of a new Python library) reduces skill formation.

    The erosion of conceptual understanding, code reading, and debugging skills that we measured among participants using AI assistance suggests that workers acquiring new skills should be mindful of their reliance on AI during the learning process.

    Among participants who use AI, we find a stark divide in skill formation outcomes between high-scoring interaction patterns (65%-86% quiz score) vs low-scoring interaction patterns (24%-39% quiz score). The high scorers only asked AI conceptual questions instead of code generation or asked for explanations to accompany generated code; these usage patterns demonstrate a high level of cognitive engagement. Contrary to our initial hypothesis, we did not observe a significant performance boost in task completion in our main study.

    Our qualitative analysis reveals that our finding is largely due to the heterogeneity in how participants decide to use AI during the task.

    These contrasting patterns of AI usage suggest that accomplishing a task with new knowledge or skills does not necessarily lead to the same productive gains as tasks that require only existing knowledge. Together, our results suggest that the aggressive incorporation of AI into the workplace can have negative impacts on the professional development [of] workers if they do not remain cognitatively [sic] engaged. Given time constraints and organizational pressures, junior developers or other professionals may rely on AI to complete tasks as fast as possible at the cost of real skill development. Furthermore, we found that the biggest difference in test scores is between the debugging questions. This suggests that as companies transition to more AI code writing with human supervision, humans may not possess the necessary skills to validate and debug AI-written code if their skill formation was inhibited by using AI in the first place.

    • idriss@lemmy.ml (OP) · 19 hours ago

      Yep, they are selling learning models too, but they are not pretending medical doctors will be out of work next week like OpenAI is doing.

      • d0ntpan1c@lemmy.blahaj.zone · 15 minutes ago

        Anthropic may avoid saying the dumb things OpenAI says, but do not mistake that for being a better company/product. Amodei is still out to eliminate all jobs and has a history of being just as self-serving as Altman.

  • troi@techhub.social · 21 hours ago

    @idriss Seems predictable to me. Programmers on the left or middle of whatever distribution identifies “good” programmers or engineers will use AI and be comfortable having completed some task. Those on the right of the distribution may or may not use AI, but will insist on understanding what has been created.

    Now, an interesting question for me, unrelated to the post, is: “what would be a good metric to identify really good programmers?”

    • idriss@lemmy.ml (OP) · 4 hours ago

      @[email protected] tbh I could see people being considered good programmers in one place but not in another (just prompting to get things done with minimum effort & reserving the effort for something else). It probably comes back to interest & care: how much the person is interested in iterating on their solution & architecture, plus learning things regardless of seniority level, in order to achieve a higher-level goal (a simpler design, for example, rather than stopping when it works). Maybe that could be an indication of a good programmer?

      • troi@techhub.social · 1 hour ago

        @idriss makes sense. The 80-20 rule might apply here: a good programmer knows where to spend their time. I’ve been kicking this around with an old boss and we don’t have any firm ideas. A metric should be quantifiable, but your “interest & care” gets into self-actualization. Maybe a version of Maslow’s hierarchy of needs for software developers?

        I am also thinking the word “good” was a bad choice. It’s too subjective, and it has a negative implication for anyone on the left side of the bell curve. Competent programmers are a thing, and I suspect they actually keep most things running smoothly.

        I wish I had my old copy of Weinberg’s _The Psychology of Computer Programming_. It’s been decades since I read it so I don’t recall if it addressed this sort of question, but it might suggest something.