A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.
It also includes outtakes on the ‘reasoning’ models.
The really interesting part will be how successful they are at training the training-data selectors to choose high-quality data sources.
I think a lot of that selection is still done by hand, and of course there is also synthetic data distilled from larger models.