Zero-Shot Learning (ZSL)

Isabell Hamecher
March 20, 2026
4 min read
Discover how Zero-Shot Learning allows AI systems to recognise classes they have never seen during training.

Definition

Zero-shot learning is a concept in artificial intelligence (AI), particularly in machine learning and computer vision, in which a model is able to recognise and classify objects or concepts that it has never seen during training. This is achieved by leveraging auxiliary information such as semantic descriptions, attributes, or relationships to bridge the gap between seen and unseen classes.

Imagine seeing a zebra for the first time. You might describe it as a striped horse. Even without ever having seen a zebra, you can identify it because you understand its characteristics in relation to something familiar. This is the core idea behind zero-shot learning (ZSL).

Zero-shot learning is a problem setup in deep learning where a model is asked to recognise classes it has never seen during training. Unlike standard supervised learning, which requires many labelled examples, ZSL relies on auxiliary information to generalise from known classes to new ones.

How Zero-Shot Learning works

At the heart of ZSL is the use of additional knowledge about the unseen classes. This can include:

  • Attributes: Descriptions of observable features, such as “red head” or “long beak” for birds.
  • Textual descriptions: Encyclopaedia entries or Wikipedia pages explaining the new class.
  • Class-class similarity: Embedding classes in a continuous space and matching unseen samples to the nearest known class.

For example, a model trained to recognise horses can identify a zebra if it knows a zebra looks like a horse with stripes. This approach allows the model to make predictions without ever seeing labelled examples of the zebra during training.
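The zebra example above can be sketched in a few lines. This is a minimal, illustrative version of attribute-based ZSL: the attribute vectors and the predicted scores are made up, and in practice the attribute predictor would be a model trained only on the seen classes.

```python
import numpy as np

# Hypothetical binary attribute vectors: (has_stripes, has_mane, has_four_legs).
# "zebra" is an unseen class, described only by its attributes.
class_attributes = {
    "horse": np.array([0, 1, 1]),
    "tiger": np.array([1, 0, 1]),
    "zebra": np.array([1, 1, 1]),
}

def predict_class(predicted_attributes):
    """Match predicted attribute scores to the closest class description."""
    return min(
        class_attributes,
        key=lambda c: np.linalg.norm(class_attributes[c] - predicted_attributes),
    )

# Suppose an attribute predictor (trained only on seen classes) outputs
# these scores for a photo of a zebra: stripes, a mane, and four legs.
print(predict_class(np.array([0.9, 0.8, 1.0])))  # → zebra
```

Even though no labelled zebra image was ever seen, the attribute description "striped horse" is enough to route the prediction to the right class.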

History and development

The first mention of zero-shot learning appeared in natural language processing research in 2008, then in computer vision under the name “zero-data learning”. By 2009, the term “zero-shot learning” had become popular, taking inspiration from one-shot learning, where models learn from just one or a few examples.

In computer vision, ZSL models typically rely on representational similarity among class labels to classify new instances. In NLP, models aim to “understand the labels” by representing them in the same semantic space as the documents they are classifying.

Key methods in Zero-Shot Learning

Zero-shot learning methods can be grouped into several approaches:

  • Classifier-based methods: Construct classifiers for unseen classes using relationships or correspondences with known classes.
  • Instance-based methods: Generate or borrow examples of unseen classes using projection, instance-borrowing, or synthesising pseudo-instances.
  • Embedding-based methods: Represent classes and samples as vectors in a semantic space and classify new samples based on proximity to class embeddings.
  • Generative methods: Use models such as GANs or variational autoencoders to synthesise data for unseen classes.

Joint embedding and contrastive learning are often used to align representations across different modalities, such as text and images, enabling the model to compare and classify effectively.
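A minimal sketch of the embedding-based approach, assuming both images and class labels have already been mapped into a shared semantic space. The vectors here are invented placeholders; in a real system they would come from trained image and text encoders aligned via joint embedding or contrastive learning.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy class embeddings in a shared semantic space (made-up values;
# in practice these would come from a text encoder).
class_embeddings = {
    "horse": np.array([1.0, 0.1, 0.0]),
    "zebra": np.array([0.9, 0.9, 0.1]),
    "bird":  np.array([0.0, 0.1, 1.0]),
}

def classify(image_embedding):
    """Assign the sample to the class whose embedding is most similar.
    No labelled examples of the class are needed, only its embedding."""
    return max(
        class_embeddings,
        key=lambda c: cosine(image_embedding, class_embeddings[c]),
    )

sample = np.array([0.8, 1.0, 0.0])  # pretend output of an image encoder
print(classify(sample))  # → zebra
```

Adding a new class then only requires embedding its name or description, not retraining the model.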

Generalised Zero-Shot Learning

Real-world scenarios often involve both seen and unseen classes appearing at the same time. Generalised zero-shot learning (GZSL) addresses this challenge. Techniques include:

  • Gating modules: Decide whether a sample belongs to a known or new class.
  • Generative modules: Create feature representations of unseen classes to train standard classifiers on combined datasets.
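A gating module can be sketched as a simple confidence check: route the sample to the seen-class classifier when it is confident, and fall back to zero-shot matching otherwise. The classifier, embeddings, and threshold below are stand-ins chosen for illustration, not a reference implementation.

```python
import numpy as np

def gated_predict(sample_emb, seen_classifier, unseen_embeddings, threshold=0.5):
    """Toy gating module for generalised zero-shot learning.
    The 0.5 confidence threshold is an assumption for the example."""
    probs = seen_classifier(sample_emb)
    if max(probs.values()) >= threshold:
        # Confident: treat as a seen class.
        return max(probs, key=probs.get)
    # Low confidence: match against unseen-class embeddings instead.
    return max(unseen_embeddings, key=lambda c: sample_emb @ unseen_embeddings[c])

# Hand-made stand-ins for the trained components:
seen = lambda x: {"horse": float(x[0]), "bird": float(x[2])}
unseen = {"zebra": np.array([0.7, 0.7, 0.0])}

print(gated_predict(np.array([0.2, 0.9, 0.1]), seen, unseen))  # → zebra
print(gated_predict(np.array([0.9, 0.0, 0.1]), seen, unseen))  # → horse
```

The generative alternative replaces this gate entirely: synthetic features for unseen classes are generated, and one ordinary classifier is trained over the combined set.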

Challenges in Zero-Shot Learning

Despite its potential, ZSL faces several limitations:

  • Bias: Models are naturally inclined to predict seen classes.
  • Domain shift: Differences in the statistical distribution between training and test data can affect accuracy.
  • Hubness: In high-dimensional spaces, some points dominate nearest neighbour searches, skewing predictions.
  • Semantic loss: Models may overlook important features that are only relevant for unseen classes.

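The hubness effect can be demonstrated with random data: as dimensionality grows, a few points start appearing in a disproportionate share of nearest-neighbour lists. This is a quick empirical sketch, not a formal analysis; the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

def neighbour_counts(dim, n=300, k=5):
    """Count how often each random point appears among the
    k nearest neighbours of the other points."""
    pts = rng.standard_normal((n, dim))
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        dists = np.linalg.norm(pts - pts[i], axis=1)
        dists[i] = np.inf  # exclude the query point itself
        counts[np.argsort(dists)[:k]] += 1
    return counts

# In high dimensions a few "hub" points tend to dominate the
# neighbour lists, skewing nearest-neighbour classification.
for dim in (3, 300):
    c = neighbour_counts(dim)
    print(f"dim={dim}: most frequent neighbour appears {c.max()} times")
```

For an embedding-based ZSL model, such hubs become classes that attract far more predictions than they should, which is one reason hubness-reduction techniques are studied alongside ZSL.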
Applications of Zero-Shot Learning

Zero-shot learning has found use across a range of fields, including:

  • Image classification: Identifying novel objects in visual search engines.
  • Semantic segmentation: Medical imaging, such as identifying COVID-19 features in X-rays.
  • Image generation: Text-to-image or sketch-to-image generation.
  • Object detection: Autonomous vehicles recognising new obstacles on the fly.
  • Natural language processing: Classifying text into topics or emotions.
  • Action recognition: Detecting new actions in video frames.
  • Style transfer and resolution enhancement: Improving image quality or transferring textures without prior examples.
  • Audio processing: Converting voices in voice synthesis applications.

Key Takeaways

  • Zero-shot learning allows AI models to generalise to unseen classes without labelled examples.
  • It relies on auxiliary information, such as textual descriptions, attributes, or semantic embeddings.
  • Generalised zero-shot learning addresses real-world scenarios where seen and unseen classes appear simultaneously.
  • Applications span computer vision, NLP, audio processing, and beyond.
  • While powerful, ZSL faces challenges including bias, domain shift, hubness, and semantic loss.