How Words Become Numbers

Quick Recap

In the previous lesson we learned that Large Language Models do not read full sentences directly.

Instead, text is broken into small pieces called tokens.

When an LLM generates a response, it predicts one token at a time and gradually builds a sentence.

But this raises another important question.

If tokens are pieces of language, how does a computer actually understand them?

Computers cannot think in words. They can only process numbers.

So before a model can learn from language, those tokens must first be converted into numbers.

Why Computers Need Numbers

Imagine showing a computer the word:

cat

To a human, the meaning is obvious. You picture an animal.

But a computer does not see meaning. It only sees characters stored as data.
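
To make "characters stored as data" concrete, here is a minimal Python sketch that prints the numbers a computer stores for the word. These numbers only identify the letters. They say nothing about what the word means.

```python
# Each character is stored as a number (its Unicode code point).
word = "cat"
codes = [ord(ch) for ch in word]
print(codes)  # [99, 97, 116]

# For plain English letters, the UTF-8 bytes hold the same numbers.
utf8_bytes = list(word.encode("utf-8"))
print(utf8_bytes)  # [99, 97, 116]
```

Nothing in 99, 97, 116 hints that a cat is an animal or that it is related to a dog. That is the gap embeddings are meant to fill.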

For a neural network to learn patterns in language, each token must be translated into a numerical form that mathematics can operate on.

This numerical representation is called an embedding.

From Words to Vectors

An embedding is a list of numbers that represents a piece of language.

For example, a token might be represented like this:

cat → [0.21, -0.44, 0.73, 0.10, ...]

No single number describes the word on its own. Together, the numbers place the word inside a mathematical space where words with similar meanings end up close together.
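
One way to picture this in code is as a lookup table. The sketch below is only an illustration: the vocabulary, the token IDs, and every number in the table are made up for this example, and the function name embed is hypothetical. Real models learn these values during training and use far more numbers per token.

```python
# Toy vocabulary: each token gets an integer ID (made-up values).
vocab = {"cat": 0, "dog": 1, "volcano": 2}

# Toy embedding table: one row of numbers per token ID.
# These values are invented for illustration, not taken from a real model.
embedding_table = [
    [0.21, -0.44, 0.73, 0.10],   # row 0: "cat"
    [0.25, -0.40, 0.68, 0.05],   # row 1: "dog"
    [-0.60, 0.82, -0.15, 0.47],  # row 2: "volcano"
]

def embed(token):
    """Return the embedding (a list of numbers) for a token."""
    token_id = vocab[token]           # token -> integer ID
    return embedding_table[token_id]  # integer ID -> row of numbers

print(embed("cat"))  # [0.21, -0.44, 0.73, 0.1]
```

The two-step lookup (token to ID, ID to row) is a common way to organize this, but it is an assumption here rather than something the lesson specifies.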

For example, words like:

cat
dog
rabbit

will often appear near each other in this space because they are used in similar contexts.

Meanwhile, unrelated words like:

airplane
database
volcano

will usually land far away from the animal words, in other regions of the space.
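
"Closer together" can be measured. A common choice is cosine similarity, which compares the directions of two vectors: values near 1 mean very similar, values near 0 or below mean unrelated. The sketch below uses made-up vectors chosen to illustrate the idea, so the results show the mechanics only and are not measurements from a real model.

```python
import math

# Made-up embeddings for illustration only.
vectors = {
    "cat":     [0.21, -0.44, 0.73, 0.10],
    "dog":     [0.25, -0.40, 0.68, 0.05],
    "volcano": [-0.60, 0.82, -0.15, 0.47],
}

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

sim_cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
sim_cat_volcano = cosine_similarity(vectors["cat"], vectors["volcano"])

print(sim_cat_dog > sim_cat_volcano)  # True
```

With these toy values, "cat" and "dog" point in almost the same direction, while "cat" and "volcano" point in roughly opposite ones.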

This structure allows the model to learn relationships between words.

A Simple Way to Picture It

Imagine a large map where every word is represented by a point.

Words that appear in similar contexts are placed closer together.

Words used in very different situations are placed far apart.

This map is called a vector space.

Embeddings are the coordinates that place each word on that map.
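
The map picture can be sketched with two coordinates per word. This is a simplification: real embeddings have many more than two numbers, and the coordinates below are invented so that the animal words sit in one corner of the map. The helper name nearest is hypothetical.

```python
import math

# Invented 2D coordinates: animals cluster together, other words are spread out.
points = {
    "cat":      (1.0, 1.2),
    "dog":      (1.3, 1.0),
    "rabbit":   (0.9, 1.5),
    "airplane": (6.0, 0.5),
    "database": (3.0, 7.0),
    "volcano":  (7.5, 6.0),
}

def nearest(word):
    """Return the other word whose point is closest to the given word."""
    others = [w for w in points if w != word]
    return min(others, key=lambda w: math.dist(points[word], points[w]))

print(nearest("cat"))  # rabbit
print(nearest("dog"))  # cat
```

On this toy map, the nearest neighbor of each animal word is another animal word, which is the behavior the map analogy describes.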

Why Embeddings Matter

Once tokens are converted into embeddings, the neural network can begin learning patterns.

It can recognize that words with similar meanings tend to appear in similar places in the vector space.

This allows the model to:

understand relationships between words
recognize patterns in language
predict which token should come next
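
As a rough illustration of the last point, the sketch below scores a few candidate tokens against a vector that stands in for the text so far, then turns the scores into probabilities with softmax. Everything here is an assumption made for the example: the two-number embeddings, the context vector, and the candidate list are invented, and how a real model produces its context vector is outside this sketch.

```python
import math

# Invented 2-number embeddings for three candidate next tokens.
candidates = {
    "purred":  [0.9, 0.1],
    "barked":  [0.6, 0.3],
    "erupted": [-0.8, 0.7],
}

# Invented vector standing in for the text so far, e.g. "The cat".
context = [1.0, 0.0]

def dot(a, b):
    """Higher dot product means the two vectors agree more."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = list(candidates)
scores = [dot(context, candidates[t]) for t in tokens]
probs = softmax(scores)

prediction = tokens[probs.index(max(probs))]
print(prediction)  # purred
```

The candidate whose embedding agrees most with the context vector receives the highest probability.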

Without embeddings, a neural network would see only arbitrary symbols, with no way to tell which ones are related.

Embeddings give language a mathematical structure the model can learn from.

The Key Idea

Computers cannot learn directly from words.

Tokens must first be converted into numerical representations called embeddings.

These embeddings place words inside a mathematical space where the relationships between them can be learned.

Summary

Before a model can learn language, tokens must be converted into numbers.

These numerical representations are called embeddings. They allow neural networks to analyze patterns in language and make predictions about what token should come next.

In the next lesson, we will explore how transformer models use attention to decide which parts of a sentence matter most when generating text.

Foundational Intelligence

Phase 1. Core mechanics
