By now, you have probably heard of OpenAI’s ChatGPT, or of alternatives such as GPT-3, GPT-4, Microsoft’s Bing Chat, Meta’s LLaMA, or Google’s Bard. They are artificial intelligence programs that can take part in a conversation. Impressively smart, they can easily be mistaken for humans, and they are skilled in a variety of tasks, from writing a dissertation to creating a website.
How can a computer hold such a conversation?
I get that this is a simplified explanation, but I want to add that this part can be misleading. The model doesn’t contain the original documents and doesn’t have internet access to look them up (that can be added as an extra feature, but even then it’s used more as a source to show humans than as something for the model to learn from on the fly). The actual word associations are all learned during training, and during inference the model just uses the stored weights. One implication of this is that the model doesn’t know about anything that happened after its training data was collected.
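To make the training/inference split concrete, here is a toy sketch. It is nothing like a real language model: it only counts which word follows which, and the names `train` and `predict_next` are made up for this illustration. It does show the same idea, though: the associations are built once from the text, and afterwards only the stored table is used.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Learn next-word counts from text; the table stands in for 'stored weights'."""
    weights = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            weights[current][following] += 1
    return weights

def predict_next(weights, word):
    """Inference: consult only the stored table, never the training text."""
    candidates = weights.get(word.lower())
    if not candidates:
        return None  # the word never appeared during training
    return candidates.most_common(1)[0][0]

corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the cat sat down",
]
weights = train(corpus)
del corpus  # the original documents are gone; only the weights remain

print(predict_next(weights, "cat"))    # sat
print(predict_next(weights, "drone"))  # None
```

The second call returns `None` because "drone" was not in the training text, which is the toy version of a model not knowing about anything that happened after its data was collected.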
I wonder what an ELI5 version of ‘stored weights’ would be in this context.
Numbers that record how closely related words and their attributes are to other words.
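A rough way to picture that, as a sketch only: give each word a short list of numbers and compare the lists. The three attributes and all the values below are invented by hand for illustration; a real model learns far longer lists during training, and nobody labels what each number means.

```python
import math

# Hypothetical hand-picked numbers per word: (royalty, animal, food).
vectors = {
    "king":  [0.9, 0.1, 0.0],
    "queen": [0.8, 0.1, 0.1],
    "dog":   [0.0, 0.9, 0.1],
}

def cosine_similarity(a, b):
    """Return a score near 1 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

king_queen = cosine_similarity(vectors["king"], vectors["queen"])
king_dog = cosine_similarity(vectors["king"], vectors["dog"])

print(f"king vs queen: {king_queen:.2f}")
print(f"king vs dog:   {king_dog:.2f}")
```

With these made-up numbers, "king" scores much closer to "queen" than to "dog", which is the sense in which the stored numbers capture how related two words are.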