The Chinese models are way ahead on this, training smartly and frugally instead of on a huge quantity of data. They don’t really need more of the internet to work fine as tools.
…And they seemingly share data with each other and ignore copyright. I wouldn’t be surprised if the Chinese government is providing a lot of data to them.
Also, it turns out multilingual training works very well. So even if the English internet turns to slop, other languages may fare better.