Token

February 24, 2026
4 min read
Learn more about tokens as essential units of data in machine learning and natural language processing.

Definition

In the context of AI, particularly in Natural Language Processing (NLP) and machine learning, a token is a basic unit of data that a model processes. Tokens are typically pieces of text (such as words, subwords, or characters) that have been extracted from input text during a process called tokenisation, enabling AI models to analyse and understand the input.

What is a Token?

A token is a small unit created by breaking larger pieces of data into manageable parts. In text, this might be a whole word, part of a word or even punctuation. In other types of data, such as images or audio, tokens represent structured chunks of visual or sound information.

AI models do not process sentences, pictures, or recordings in their raw form. They convert everything into tokens first. Each token is linked to a numerical representation, which allows the model to detect relationships and meaning. Put simply, tokens are how AI reads the world. For example:

  • The word darkness may be split into two tokens: "dark" and "ness"
  • Brightness could become "bright" and "ness"
  • The shared token "ness" helps the model recognise a similarity in meaning

Meaning can also depend on context. The word lie maps to the same token whether it refers to lying down or to saying something untrue; during training, the model learns to tell the two senses apart from the surrounding tokens.
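The darkness/brightness example above can be sketched with a toy greedy longest-match tokeniser. The vocabulary below is invented purely for illustration and is far smaller than any real model's:

```python
# A minimal greedy longest-match subword tokeniser over a toy vocabulary.
# The vocabulary is an invented example, not taken from any real model.
VOCAB = {"dark", "bright", "ness", "light"}

def tokenise(word: str) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest possible piece first
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece matches: emit the character on its own
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenise("darkness"))    # ['dark', 'ness']
print(tokenise("brightness"))  # ['bright', 'ness']
```

Because both words end in the shared piece "ness", the model sees a common token in both, which is the similarity signal described above.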

What Is Tokenisation?

Tokenisation is the process of translating data into tokens. It happens across many data types:

  • Text is split into words or sub-words
  • Images and video can be mapped from pixels or voxels into tokens
  • Audio may be turned into spectrograms or semantic sound units

Efficient tokenisation matters: when a system can represent the same data with fewer, more meaningful tokens, it needs less computing power for both training and day-to-day use. Different tokenisers are built for different tasks, and vocabulary size is one of the trade-offs, since it affects how many tokens an input is split into and how efficiently it can be processed.
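As a rough illustration of how the choice of scheme changes the token count, the two deliberately simplistic tokenisations below split the same sentence at the word level and at the character level:

```python
# Sketch: the same sentence under two illustrative tokenisation schemes.
# Real tokenisers usually sit between these extremes (subword units).
text = "Tokens are how AI reads the world"

word_tokens = text.split()   # word-level: one token per word
char_tokens = list(text)     # character-level: one token per character

print(len(word_tokens))  # 7
print(len(char_tokens))  # 33
```

The character-level scheme needs almost five times as many tokens for the same sentence, which is the kind of cost difference efficient tokenisation avoids.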

Tokens during AI training

Training begins with tokenising large datasets. The scale is enormous, often billions or trillions of tokens. According to scaling laws, more training tokens generally lead to higher-quality models.

During pretraining, a model:

  • Receives a sequence of tokens
  • Predicts the next token
  • Adjusts itself based on whether it was correct

This cycle repeats until the model's predictions stop improving meaningfully, a point known as convergence.
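The predict-the-next-token loop above can be sketched with a toy model that simply counts which token follows which, standing in for the neural network; the corpus is a made-up example:

```python
# Toy sketch of next-token prediction: bigram counts stand in for a
# neural network. The corpus is an invented example.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# "Training": count which token follows each token in the sequence
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Predict the next token most frequently seen during training."""
    return counts[token].most_common(1)[0][0]

print(predict_next("the"))  # 'cat', which follows "the" twice vs "mat" once
```

A real model replaces the counting table with learned parameters and adjusts them when its prediction is wrong, but the input-predict-correct loop is the same shape.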

After that comes post-training, where the model learns from more specialised tokens. These may relate to law, medicine, business, or specific tasks such as translation or reasoning. The goal is for the model to generate the right tokens during inference, which is when it responds to real user input.

Tokens during inference and reasoning

When a user provides a prompt, the AI converts it into tokens. It processes these and produces output tokens, which are then turned back into text, images, or other formats. A key factor here is the context window, which is the number of tokens a model can handle at once. This affects what the system can understand in a single request:

  • A few thousand tokens may cover a high-resolution image or several pages of text
  • Tens of thousands can handle long documents or extended audio
  • Some systems support a million or more tokens for very large datasets
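One simple way a system can respect the limit is to drop the oldest tokens when the input is too long. A minimal sketch, with an invented window size:

```python
# Illustrative sketch of fitting input into a context window by keeping
# only the most recent tokens. The window size is an arbitrary example.
CONTEXT_WINDOW = 8  # maximum tokens the hypothetical model accepts

def fit_to_window(tokens: list[str], window: int = CONTEXT_WINDOW) -> list[str]:
    """Keep only the most recent tokens when the input is too long."""
    return tokens[-window:] if len(tokens) > window else tokens

prompt = "please summarise the long report attached to this message for me".split()
print(len(prompt))            # 11 tokens, too many for the window
print(fit_to_window(prompt))  # only the last 8 tokens survive
```

Anything truncated this way is invisible to the model, which is why larger context windows let a system understand more in a single request.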

Reasoning models go further. Besides input and output tokens, they generate additional reasoning tokens while working through complex problems. This can take minutes or hours and may require over 100 times more computing power than a simple response.

Tokens and AI economics

Tokens are not simply technical units; they also drive the cost and value of an AI system. During training, tokens represent investment in intelligence. During inference, they influence pricing, revenue, and performance. Many AI services now measure usage in tokens, covering both input and output.

Different usage patterns are possible:

  • A short prompt can generate a long response using many output tokens
  • Large documents can be provided as input and summarised into a few tokens
  • Services may also set token-per-minute limits to manage heavy demand
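Token-based billing reduces to simple arithmetic. The sketch below uses invented per-token rates, not any real provider's pricing:

```python
# Hypothetical token-based pricing. The rates are invented examples,
# not any real provider's prices.
PRICE_PER_1K_INPUT = 0.50    # currency units per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 1.50   # output tokens often cost more than input

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost of one request, billed on both input and output."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A short prompt producing a long response: output dominates the cost
print(request_cost(200, 4000))  # 0.1 + 6.0 = 6.1
```

This is why the usage patterns above matter economically: a short prompt with a long answer and a long document with a short summary hit very different sides of the bill.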

User experience is shaped by token performance too:

  • Time to first token is how quickly a response begins
  • Inter-token latency is how fast the rest of the tokens appear

Chat systems benefit from quick first tokens, while research-style tasks may prioritise higher-quality tokens even if responses take longer.
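Both metrics can be computed from the times at which output tokens arrive. A sketch with hypothetical timings:

```python
# Sketch: computing time-to-first-token and average inter-token latency
# from hypothetical arrival timestamps (seconds since the request was sent).
arrival_times = [0.4, 0.45, 0.5, 0.55, 0.6]  # one timestamp per output token

time_to_first_token = arrival_times[0]
gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
inter_token_latency = sum(gaps) / len(gaps)

print(time_to_first_token)            # 0.4 seconds before anything appears
print(round(inter_token_latency, 3))  # 0.05 seconds between later tokens
```

A chat interface tunes for the first number so the reply starts quickly; a batch or research workload may tolerate both being larger if the output quality is higher.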

Key Takeaways

  • Tokens are small data units that AI models use to understand text, images, audio and more
  • Tokenisation converts raw data into numerical tokens that models can process
  • Training quality improves with larger numbers of tokens, often in the billions or trillions
  • Context windows limit how many tokens a model can handle at once, affecting task complexity
  • Reasoning models generate extra tokens to work through harder problems, increasing compute needs
  • Tokens influence AI economics, including cost, pricing and user experience
