u/Brief-Object-5230

Tokens and Embeddings – the food for your favourite LLM

The way we usually interact with an LLM is through a chat interface: we write something, send it to the model, and get a response back.

But that’s not how LLMs actually work under the hood. Your raw textual input makes no sense to an LLM in the first place.

Tokens and embeddings are two central concepts in how an LLM processes its input.

Small chunks of text are called tokens, and for a large language model to compute over language, these tokens need to be converted into numeric representations called embeddings.
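As a rough sketch of the embedding half of that pipeline: once text has been mapped to integer token IDs, each ID simply indexes a row of a learned embedding table. The tiny vocabulary, the IDs, and the 4-dimensional vectors below are invented for illustration; real models use vocabularies of tens of thousands of tokens and vectors with thousands of dimensions, learned during training.

```python
# Toy sketch: token IDs index into an embedding table.
# Vocabulary, IDs, and vector values here are made up for illustration;
# real models learn these vectors during training.
import random

random.seed(0)

vocab = {"the": 0, "cat": 1, "sat": 2}
dim = 4

# One vector per token ID.
embedding_table = [[random.uniform(-1, 1) for _ in range(dim)] for _ in vocab]

token_ids = [vocab[w] for w in ["the", "cat", "sat"]]
embeddings = [embedding_table[i] for i in token_ids]

print(token_ids)           # [0, 1, 2]
print(len(embeddings[0]))  # 4
```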

LLM Tokenization
The process of converting text into tokens is called tokenization. Each LLM has its own tokenizer, which breaks the prompt into tokens.
Here is an example showing the tokenizer of GPT-4 on the OpenAI Platform.

https://preview.redd.it/w2gy5bngne0h1.jpg?width=698&format=pjpg&auto=webp&s=eeee35259660c15a7b772baf165bb934ab012c32
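A minimal sketch of what a tokenizer does, assuming a toy word-level scheme (real tokenizers such as GPT-4's work on subword units via byte-pair encoding, not whole words):

```python
# Toy word-level tokenizer: split on whitespace and assign each new
# token the next free ID in a reference table. Real LLM tokenizers
# (e.g. GPT-4's) use subword / byte-pair-encoding vocabularies instead.

def tokenize(text, table):
    ids = []
    for token in text.split():
        if token not in table:
            table[token] = len(table)  # register a new unique ID
        ids.append(table[token])
    return ids

table = {}
ids = tokenize("tokens and embeddings feed the model", table)
print(ids)              # [0, 1, 2, 3, 4, 5]
print(table["tokens"])  # 0
```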

While breaking the prompt into tokens, the tokenizer also assigns each token a unique ID recorded in its own reference table. The LLM actually operates on this series of integers, not on the text itself.
Apart from the input side, tokenizers are also used on the output side of the LLM, to turn the resulting token IDs back into the words or tokens associated with them.
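That output-side step can be sketched as the inverse lookup of the same reference table. The table and IDs below are made up for illustration; real tokenizers additionally handle merging subword pieces back into whole words.

```python
# Decode side: invert a toy token-ID reference table to map model
# output IDs back to text. Vocabulary and IDs are invented here.

table = {"the": 0, "cat": 1, "sat": 2}          # token -> ID (encode side)
id_to_token = {i: t for t, i in table.items()}  # ID -> token (decode side)

def decode(ids):
    return " ".join(id_to_token[i] for i in ids)

print(decode([0, 1, 2]))  # the cat sat
```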
