Does vibe coding risk destroying the Open Source ecosystem? According to a pre-print paper by a number of high-profile researchers, this might indeed be the case based on observed patterns and some…
Oh, sorry, I didn’t mean to imply that consumer-grade hardware has gotten more efficient. I wouldn’t really know about that, but I assume most of the focus is on data centers.
Those were two separate thoughts:
Models are getting better, and the tooling built around them is getting better, so hopefully we can get to a point where small models (capable of running on consumer-grade hardware) become much more useful.
Some modern data center GPUs and TPUs compute more per watt-hour than previous generations.
Can you provide evidence the “more efficient” models are actually more efficient for vibe coding? Results would be the best measure.
It also seems like costs for these models are increasing, and companies like Cursor had to stoop to offering people services below cost (before pulling the rug out from under them).
I wish I could, but it would kinda be PII for me. Though, to clarify some things:
I’m mostly not talking about vibe coding. Vibe coding might be okay for quickly exploring or (in)validating some concept/idea, but it tends to make things brittle and pile up a lot of tech debt if you let it.
I don’t think “more efficient” (in terms of energy and pricing) models are more efficient for work. I haven’t measured it, but the smaller/“dumber” models tend to require more cycles before they reach their goals, as they have to debug their code more along the way. However, with the right workflow (using subagents, etc.), you can often still reach the goals with smaller models.
There’s a difference between efficiency and effectiveness. The hardware is becoming more efficient, while models and tooling are becoming more effective. The tooling/techniques to use LLMs more effectively also tend to burn a LOT of tokens.
TL;DR:
Hardware is getting more efficient.
Models, tools, and techniques are getting more effective.
I think this kind of claim really lies in a sour spot.
On the one hand, it is trivial to get an IDE, plug it into GLM 4.5 or some other smaller, more efficient model, and see how it fares on a project. But that’s just anecdotal. On the other hand, model creators do this thing called benchmaxing, where they fine-tune their model to hell and back to respond well to specific benchmarks. And the whole culture around benchmarks is… I don’t know, I don’t like the vibe; it’s all AGI maximalists wanking to percent changes in performance. Not fun. So, yeah, evidence is hard to come by when there are so many snake oil salesmen around.
That said, it’s pretty easy to check on your own. Install opencode, get $20 of GLM credit, make it write, deploy, and monitor a simple SaaS product, and see how you like it. Then do another one. And do a third one with Claude Code as a control if you can get a guest pass (I have some, hit me up if you’re interested).
What is clear from casual observation is that yes, small models have improved tremendously in the last year, to the point where they’re starting to get usable. Code generation is a much more constrained world than generalist text gen, and can be tested automatically, so progress is expected to continue at breakneck pace. Large models are still categorically better, but this is expected to change rapidly.