Simply explained: how does GPT work?

sizeoftheuniverse@programming.dev · 3 years ago

W^Unt!2@waveform.social · 3 years ago

You know when you’re typing on your phone and you have that bar above the keyboard showing you what word it thinks you are writing? If you tap the word before you finish typing it, it can even show you the word it thinks you are going to write next. GPT works the same way; it just has waaaay more data that it can sample from.

It’s all just very advanced predictive text algorithms.

Ask it a question about basketball. It looks through all documents it can find about basketball and sees how often they reference hoops, Michael Jordan, sneakers, the NBA, etc., and then just outputs things that are highly referenced in a structure that makes grammatical sense.

For instance, if you had the word ‘basketball’, it knows it’s very unlikely for the word before it to be ‘radish’ and more likely to be a word like ‘the’ or ‘play’, so it just strings the words together logically.

That’s the basics anyway.
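
To make the predictive-text comparison concrete, here is a minimal sketch in Python. It is not how GPT is actually built (GPT is a neural network working on tokens, not a table of word counts); it only shows the "which word usually comes next" idea. The tiny corpus and the names `train_bigrams` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = "we play basketball . they play basketball . we play chess . i eat the radish ."
model = train_bigrams(corpus)

print(predict_next(model, "play"))   # the follower seen most often after "play"
print(predict_next(model, "zebra"))  # a word that never appeared in the corpus
```

In this toy corpus ‘radish’ never follows ‘play’, so it is never suggested there, which is the same intuition as the ‘basketball’ example above.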

qwertyasdef@programming.dev · 3 years ago

Ask it a question about basketball. It looks through all documents it can find about basketball…

I get that this is a simplified explanation, but I want to add that this part can be misleading. The model doesn’t contain the original documents and doesn’t have internet access to look them up (that can be added as an extra feature, but even then it’s used more as a source to show humans than as something for the model to learn from on the fly). The actual word associations are all learned during training, and during inference it just uses the stored weights. One implication of this is that the model doesn’t know about anything that happened after its training data was collected.
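
A rough sketch of that training/inference split, in Python. This is a toy word-count model, not a real language model, and the documents and the names `train` and `infer` are invented for illustration. The point is only that the documents are used once to produce numbers, and afterwards only those numbers are consulted.

```python
from collections import Counter, defaultdict


def train(documents):
    """Turn documents into 'weights': for each word, the probability of each next word."""
    counts = defaultdict(Counter)
    for doc in documents:
        words = doc.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    # Only these numbers are returned; the documents themselves are not stored.
    return {
        prev: {word: n / sum(followers.values()) for word, n in followers.items()}
        for prev, followers in counts.items()
    }


def infer(weights, word):
    """Pick the most probable next word using only the stored weights."""
    options = weights.get(word.lower())
    if options is None:
        return None  # the model never saw this word during training
    return max(options, key=options.get)


# Hypothetical training data, collected "before the cutoff".
training_docs = ["the lakers won the title", "the lakers won again"]
weights = train(training_docs)
del training_docs  # inference below never touches the original text

print(infer(weights, "lakers"))   # learned during training
print(infer(weights, "celtics"))  # appeared nowhere in training, so nothing is known
```

Anything that only shows up in documents written after `train` ran is invisible to `infer`, which is the toy version of the training-data cutoff.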

W^Unt!2@waveform.social · 3 years ago

I wonder what an ELI5 version of ‘stored weights’ would be in this context.

Lmaydev@programming.dev · edit-2 3 years ago

How closely related words and their attributes are to other words.
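
One hedged way to picture that: give each word a short list of numbers and measure how closely two lists point in the same direction. The numbers below are invented by hand purely for illustration; in a real model they are learned during training, there are far more of them, and the weights also include the network that combines them.

```python
import math

# Made-up three-number "attributes" per word (roughly: sport, ball, vegetable).
vectors = {
    "basketball": [0.9, 0.8, 0.0],
    "hoops": [0.8, 0.9, 0.1],
    "radish": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity: near 1 means closely related, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine(vectors["basketball"], vectors["hoops"]))
print(cosine(vectors["basketball"], vectors["radish"]))
```

With these invented numbers, ‘basketball’ scores much closer to ‘hoops’ than to ‘radish’.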