How Words Become Numbers
Computers cannot learn from text unless it is converted into numbers. In this lesson we explore how tokens are transformed into numerical representations called embeddings, allowing neural networks to learn patterns in language.
Quick Recap
In the previous lesson we learned that Large Language Models do not read full sentences directly.
Instead, text is broken into small pieces called tokens.
When an LLM generates a response, it predicts one token at a time and gradually builds a sentence.
But this raises another important question.
If tokens are pieces of language, how does a computer actually understand them?
Computers cannot think in words. They can only process numbers.
So before a model can learn from language, those tokens must first be converted into numbers.
Why Computers Need Numbers
Imagine showing a computer the word:
cat
To a human, the meaning is obvious. You picture an animal.
But a computer does not see meaning. It only sees characters stored as data.
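To make this concrete, here is a minimal Python sketch of what the computer actually stores for the word. It assumes the text is encoded as UTF-8, and the variable names are only for illustration.

```python
# What the computer stores for "cat": one byte value per character (UTF-8 assumed).
word = "cat"
byte_values = list(word.encode("utf-8"))
print(byte_values)  # [99, 97, 116]
```

These values only identify the characters. Nothing in them says that a cat is an animal, or that it has anything in common with a dog.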
For a neural network to learn patterns in language, words must be translated into a numerical form that mathematics can operate on.
This numerical representation is called an embedding.
From Words to Vectors
An embedding is a list of numbers, called a vector, that represents a piece of language such as a token.
For example, a token might be represented like this:
cat → [0.21, -0.44, 0.73, 0.10, ...]
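One common way to picture this step is as a lookup table: each token has an ID, and the ID selects a row of numbers. The sketch below is a toy illustration, not how any particular model stores its embeddings. The vocabulary, the IDs, the four-number vectors, and the embed function are all made up for this example. Real models learn the values during training and use far longer vectors.

```python
# Toy vocabulary: each token gets an integer ID (hypothetical values).
vocab = {"cat": 0, "dog": 1, "volcano": 2}

# Toy embedding table: one row of numbers per token ID.
# These values are invented for illustration; real models learn them.
embedding_table = [
    [0.21, -0.44, 0.73, 0.10],   # cat
    [0.25, -0.40, 0.69, 0.05],   # dog
    [-0.80, 0.62, -0.15, 0.90],  # volcano
]

def embed(token):
    """Look up the embedding vector for a token (hypothetical helper)."""
    return embedding_table[vocab[token]]

print(embed("cat"))  # [0.21, -0.44, 0.73, 0.1]
```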
No single number describes the word on its own. Together, the numbers place the word inside a mathematical space where words with similar meanings end up closer together.
For example, words like:
- cat
- dog
- rabbit
will often appear near each other in this space because they are used in similar contexts.
Meanwhile, unrelated words like:
- airplane
- database
- volcano
will appear in very different regions.
This structure allows the model to learn relationships between words.
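One widely used way to measure how close two embeddings are is cosine similarity, which compares the directions of two vectors and returns a value between -1 and 1. The sketch below assumes three hand-made, four-number vectors. They are invented for illustration and do not come from a trained model.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings: cat and dog point in similar directions, volcano does not.
cat = [0.21, -0.44, 0.73, 0.10]
dog = [0.25, -0.40, 0.69, 0.05]
volcano = [-0.80, 0.62, -0.15, 0.90]

# Related words score higher than unrelated ones.
print(cosine_similarity(cat, dog) > cosine_similarity(cat, volcano))  # True
```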
A Simple Way to Picture It
Imagine a large map where every word is represented by a point.
Words that appear in similar contexts are placed closer together.
Words used in very different situations are placed far apart.
This map is called a vector space.
Embeddings are the coordinates that place each word on that map.
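The map idea can be sketched in Python with two coordinates per word. The coordinates below are chosen by hand for illustration, and the word_map and nearest names are hypothetical. Real embeddings use many more than two dimensions, but the idea of measuring distance between points is the same.

```python
import math

# Hypothetical 2-D coordinates, chosen by hand for illustration.
word_map = {
    "cat": (1.0, 1.2),
    "dog": (1.1, 1.0),
    "rabbit": (0.7, 1.5),
    "airplane": (5.0, -2.0),
    "volcano": (-4.0, 3.5),
}

def nearest(word):
    """Return the other word whose point lies closest to the given word."""
    others = (w for w in word_map if w != word)
    return min(others, key=lambda w: math.dist(word_map[word], word_map[w]))

print(nearest("cat"))  # dog
```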
Why Embeddings Matter
Once tokens are converted into embeddings, the neural network can begin learning patterns.
Because words with similar meanings sit close together in the vector space, the network can treat them in similar ways.
This allows the model to:
- understand relationships between words
- recognize patterns in language
- predict which token should come next
Without embeddings, a neural network would see only arbitrary symbols, with no indication of how they relate to one another.
Embeddings give language a mathematical structure the model can learn from.
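To show how numbers make prediction possible, here is a heavily simplified sketch of scoring candidate next tokens. It assumes a context vector already exists. In a real model that vector is computed by the network's layers, while here it is written by hand. The candidate tokens, all vector values, and the variable names are made up for illustration.

```python
import math

# Hypothetical 3-number embeddings for candidate next tokens (made-up values).
candidates = {
    "purrs": [0.9, 0.1, 0.0],
    "barks": [0.2, 0.8, 0.1],
    "erupts": [-0.5, -0.3, 0.9],
}

# Hypothetical vector summarising the context "The cat ...".
# A real model computes this; here it is written by hand.
context = [1.0, 0.2, -0.1]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = list(candidates)
# Score each candidate by its dot product with the context vector.
scores = [sum(c * v for c, v in zip(context, candidates[t])) for t in tokens]
probs = softmax(scores)
best = tokens[probs.index(max(probs))]
print(best)  # purrs
```

The candidate whose embedding lines up best with the context receives the highest probability. None of this arithmetic would be possible if the tokens were still plain text.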
The Key Idea
Computers cannot learn directly from words.
Tokens must first be converted into numerical representations called embeddings.
These embeddings place words inside a mathematical space where relationships between language patterns can be learned.
Summary
Before a model can learn language, tokens must be converted into numbers.
These numerical representations are called embeddings. They allow neural networks to analyze patterns in language and make predictions about what token should come next.
In the next lesson, we will explore how transformer models use attention to decide which parts of a sentence matter most when generating text.