A token is the smallest piece of text a language model reads. It’s not always a word — sometimes it’s part of one.
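‘Chatbot’ might be one token. ‘Unbelievable’ might be three: ‘un’ + ‘believ’ + ‘able’. The exact split depends on the model’s vocabulary. The model doesn’t see letters; it sees tokens, each one represented by an ID number from that vocabulary.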
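To make the splitting concrete, here is a minimal sketch in Python. It is a toy, not how any real model does it: the four-entry vocabulary and the ID numbers are made up for illustration, and the greedy "take the longest matching piece" rule stands in for the learned schemes (such as byte-pair encoding) that real tokenizers use.

```python
# Hypothetical toy vocabulary: token text -> integer ID.
# Real vocabularies are learned from data and are far larger.
VOCAB = {"chatbot": 0, "un": 1, "believ": 2, "able": 3}


def tokenize(word, vocab=VOCAB):
    """Split `word` by repeatedly taking the longest vocabulary
    entry that matches at the current position."""
    tokens = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first, then shrink it.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in vocab:
                tokens.append(piece)
                pos = end
                break
        else:
            # No entry matched at this position.
            raise ValueError(f"no vocabulary entry matches {word[pos:]!r}")
    return tokens


def encode(word, vocab=VOCAB):
    """Turn a word into the list of integer IDs the model would receive."""
    return [vocab[token] for token in tokenize(word.lower(), vocab)]


print(tokenize("chatbot"))       # ['chatbot']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(encode("unbelievable"))    # [1, 2, 3]
```

The last line is the important one: what reaches the model is the list of numbers, not the letters.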
Here’s the weird part: even the people who built these models don’t fully understand what each token means inside the neural network. Each token gets turned into a long list of numbers, and those numbers are learned during training rather than written by hand. We know it works. We just don’t always know why.