TL;DR: The big tech AI company LLMs have gobbled up all of our data, but the damage they have done to open source and free culture communities is particularly insidious. By taking advantage of those who share freely, they destroy the bargain that made free software spread like wildfire.


If you don’t want your code to be used by others, then don’t make it open source.
Do you understand how free software works? Did you read the post? I’d love to clarify, but I’m not going to rewrite the article.
Also - this conclusion is ridiculous:
That is absolutely not true. It doesn’t remove the copyright from the original work and no court has ruled as such.
If I wrote a “random code generator” that just happened to create the source code for Microsoft Windows in entirety it wouldn’t strip Microsoft of its copyright.
Yes. And this is kinda hand-wavy bullshit.
That’s not how it works. Your code is not “incorporated” into the model in any recognizable form. It trains a model of vectors. There isn’t a file with your for loop in there though.
I can read your code, learn from it, and create my own code with the knowledge gained from your code without violating an OSS license. So can an LLM.
No, you can’t. In the same way you can’t watch a Mickey Mouse movie and then draw your own Mickey Mouse from what you recall from the movie.
Copying can be done manually by memory; it doesn’t need to be a 1:1 match. Otherwise you could take a GPL-licensed file, change the name of one variable, and make it proprietary code.
LLMs are just fancy lossy compression algorithms you can interact with. If I save a Netflix series on my hard drive, then re-encode it, it is still protected by copyright, even if the bytes don’t match.
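To make the variable-renaming point concrete, here is a rough Python sketch. It is my own illustration, not anything from the article: the sample function and the helper names `structure` and `digest` are made up. It only shows that a byte-level comparison and a structural comparison can disagree about whether two files are "the same".

```python
import ast
import hashlib

# Hypothetical sample code standing in for a licensed file.
ORIGINAL = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

# The "change one variable name" edit described above.
RENAMED = ORIGINAL.replace("result", "acc")


class _Normalize(ast.NodeTransformer):
    """Replace each identifier with a placeholder based on first appearance."""

    def __init__(self):
        self.names = {}

    def _alias(self, name):
        return self.names.setdefault(name, f"id{len(self.names)}")

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node


def structure(source):
    """Dump the syntax tree with identifiers normalized away."""
    tree = _Normalize().visit(ast.parse(source))
    return ast.dump(tree)


def digest(source):
    """Byte-level fingerprint of the source text."""
    return hashlib.sha256(source.encode()).hexdigest()


if __name__ == "__main__":
    print("same bytes:    ", digest(ORIGINAL) == digest(RENAMED))
    print("same structure:", structure(ORIGINAL) == structure(RENAMED))
```

The hashes differ while the normalized syntax trees match, which is the sense in which "the bytes don’t match" is a weak test. Whether that kind of similarity counts as copying is a legal question this sketch doesn’t answer.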
Why is Clean-room design a thing then?
Not copy your code. Use it to learn what algorithms it uses and ideas on how to implement it.
No, sometimes they spit out shit verbatim.
You are assuming way too much about how the models work.
Not even just an OSS license. No license backed by law is any stronger than copyright. And you are allowed to learn from or statistically analyze even fully copyrighted work.
Copyright is just a lot more permissive than I think many people realize. And there’s a lot of good that comes from that. It’s enabled things like API emulation and reverse engineering and being able to leave our programming job to go work somewhere else without getting sued.
The article is about how LLMs circumvent copyleft licenses like GPL
“If you didn’t want people in your house then don’t have doors” buddy… That’s not how anything works.
If you put a fucking sign on your door saying “come on in!” then don’t be angry when people do?
We do hang signs on the doors, but they say something slightly different.
https://en.wikipedia.org/wiki/Open-source_license
Public domain licenses are truly as you describe, but copyleft licenses are far from that. There are also many “source available” licenses which aren’t open at all. Just because you can read a book doesn’t mean you can print and sell it.
Who is wholesale copying OSS code and releasing it under a non-compliant license with an LLM?
Uh… Lots of people? That’s kinda the problem. Maybe use a search engine. There are plenty of cases of LLMs ‘laundering’ copyleft code into (often) proprietary codebases. And that’s just the most blatant and brain-dead obvious example; the use of GPL code to train commercial models is a bit more subtle and nuanced but no less nefarious, and the laws are currently unequipped to handle that part at all.
You don’t need an LLM to find and copy GPL code. The LLM isn’t adding anything new here.
Oh I’m sorry I didn’t realize you had the intelligence of an LLM. The conversation is now over. I hope you have a pleasant day.
🙄