Definition
A trial-and-error process in which one changes some hyperparameters, runs the algorithm on the data again, and compares performance on the validation set to determine which set of hyperparameters produces the most accurate model.
AI models rarely perform at their best straight out of the box. Behind every accurate prediction or fluent response lies a careful process known as tuning. Often referred to as hyperparameter optimisation or hyperparameter tuning, this process focuses on refining the settings that shape how a model learns.
In simple terms, model tuning adjusts a machine learning model’s hyperparameters to achieve the best possible model performance. The goal is to find the optimal combination of values that improves accuracy, generation quality and other key performance metrics.
What are hyperparameters?
Hyperparameters are configuration variables set before training begins. They cannot be learned directly from the training data. Instead, they determine how a model is structured and how it behaves during learning. Some hyperparameters influence the training process itself, while others define the model’s architecture. Examples include:
- Learning rate
- Batch size
- Number of epochs
- Momentum
- Number of hidden layers
- Nodes per layer
- Activation function
- Regularisation settings
- Dropout rate
Choosing the right combination is essential. Data scientists must define these values in advance, and the quality of those choices has a direct impact on the outcome of training.
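As a minimal sketch, hyperparameters are often gathered into a configuration that is fixed before training starts. All names and values below are illustrative, not taken from any particular library:

```python
# Hypothetical training configuration: every value here is a hyperparameter,
# chosen before training begins rather than learned from the data.
hyperparams = {
    "learning_rate": 3e-4,   # step size for weight updates
    "batch_size": 32,        # examples processed per gradient step
    "epochs": 10,            # full passes over the training set
    "momentum": 0.9,         # smoothing applied to gradient updates
    "hidden_layers": 2,      # architecture choice
    "nodes_per_layer": 128,  # architecture choice
    "dropout_rate": 0.1,     # regularisation setting
}

def train(data, hp):
    """Hypothetical trainer: hp stays constant throughout a run;
    only the model's internal weights change."""
    ...
```

Tuning then amounts to rerunning training with different values in this configuration and comparing the results on a validation set.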
Hyperparameters versus model parameters
It is helpful to distinguish hyperparameters from model parameters.
Model parameters, often called weights, are learned automatically during training. As the model processes data, it adjusts these weights to reduce error. They represent what the model has learned from the dataset.
Hyperparameters, by contrast, are set externally. They guide the learning process but are not themselves learned. If model parameters are the knowledge gained by a student, hyperparameters are the rules of study set before lessons begin.
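The distinction can be made concrete with a toy gradient-descent example. Here the weight `w` is a model parameter, updated automatically during training, while `learning_rate` is a hyperparameter fixed before training starts. This is an illustrative sketch, not production code:

```python
# Toy example: fit y = w * x. The weight w is a model parameter (learned);
# the learning rate is a hyperparameter (set externally, never learned).
def fit_weight(xs, ys, learning_rate, steps=200):
    w = 0.0  # model parameter, adjusted during training
    for _ in range(steps):
        # gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # the hyperparameter controls the step size
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # true relationship: y = 2x
w = fit_weight(xs, ys, learning_rate=0.05)  # converges towards w = 2
```

Changing `learning_rate` changes how the search for `w` proceeds, but `learning_rate` itself never appears in the trained model's knowledge.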
Why model tuning is important for good AI models
Hyperparameters directly shape model performance. Poor choices can lead to:
- Overfitting, where a model memorises training data but struggles with new data
- High bias, where the model is too simplistic to capture meaningful patterns
- High variance, where predictions are inconsistent on unseen data
Overfitting is particularly common when a model becomes too complex for its dataset. Imagine two students: one memorises facts, the other understands concepts. Both may perform well in class, but only the student who understands principles can apply them to new topics. An overfitted model behaves like the student who memorises but cannot generalise.
Model tuning helps manage the balance between bias and variance. Techniques such as regularisation shift this balance deliberately, aiming for better predictions rather than perfect performance on training data alone.
Because different algorithms respond differently to hyperparameters, tuning often focuses on the most impactful settings to reduce time and computational cost.
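One concrete regularisation technique is ridge regression, where an L2 penalty (its strength `lam` is itself a hyperparameter) deliberately shrinks the learned weights, trading a little bias for lower variance. The following is a minimal numpy sketch of the closed-form solution:

```python
import numpy as np

# Ridge regression sketch: the penalty strength lam is a hyperparameter.
# Larger lam shrinks the learned weights, trading bias for lower variance.
def ridge_fit(X, y, lam):
    n_features = X.shape[1]
    # closed-form solution: (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=20)

w_plain = ridge_fit(X, y, lam=0.0)   # ordinary least squares, no penalty
w_ridge = ridge_fit(X, y, lam=10.0)  # heavier regularisation, smaller weights
```

Tuning `lam` on a validation set is exactly the bias-variance trade-off described above: too small and the model may overfit, too large and it becomes overly simplistic.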
How tuning works
Model tuning searches for the hyperparameter configuration that produces the best training outcome. For simple models, hyperparameters may be set manually. However, modern architectures such as transformers have far too many possible combinations to test exhaustively. To make the search practical, data scientists limit the search space and use automated methods. Common tuning approaches include:
- Grid search: Tests every possible combination within a defined range. It is thorough but computationally expensive.
- Random search: Samples combinations from a statistical distribution. It is faster but does not guarantee the absolute best configuration.
- Bayesian optimisation: Builds a probabilistic model based on previous results and uses it to predict better combinations. Over time, it becomes more efficient by focusing on promising areas.
- Hyperband: Improves on random search by discarding poorly performing configurations early. Its successive halving strategy concentrates resources on the most promising candidates.
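The first two strategies can be sketched in a few lines of plain Python. The `validation_loss` function here is a stand-in for actually training a model and scoring it on held-out data; its shape and the grid values are purely illustrative:

```python
import itertools
import random

# Stand-in for "train a model with these hyperparameters and measure
# validation loss"; in practice this is the expensive step being searched.
def validation_loss(lr, batch_size):
    return (lr - 0.01) ** 2 + (batch_size - 64) ** 2 / 1e4

lr_grid = [0.001, 0.01, 0.1]
batch_grid = [16, 32, 64, 128]

# Grid search: evaluate every combination (thorough but expensive).
grid_best = min(itertools.product(lr_grid, batch_grid),
                key=lambda cfg: validation_loss(*cfg))

# Random search: evaluate only a fixed budget of sampled combinations.
random.seed(0)
samples = [(random.choice(lr_grid), random.choice(batch_grid))
           for _ in range(6)]
random_best = min(samples, key=lambda cfg: validation_loss(*cfg))
```

Grid search is guaranteed to find the best point on the grid; random search trades that guarantee for a much smaller evaluation budget, which is often the better deal when each evaluation means a full training run.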
Model tuning versus model training
Model tuning and model training are closely related but distinct.
Training is the process in which a model learns patterns from data. During training, optimisation algorithms such as gradient descent minimise a loss function by updating model weights. The model reaches convergence when the loss is sufficiently reduced.
Tuning, by contrast, determines the best hyperparameter values that shape how training unfolds.
After training, models are evaluated through cross-validation and testing. These steps compare predictions against unseen data to ensure the model generalises effectively. In practice, retraining may occur over time as part of the wider machine learning operations (MLOps) lifecycle.
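The splitting logic behind k-fold cross-validation can be sketched in pure Python. Each example lands in exactly one validation fold, so every evaluation is made on data the model did not train on:

```python
# Minimal k-fold split: returns (train_indices, val_indices) pairs.
# Each sample appears in exactly one validation fold.
def k_fold_indices(n_samples, k):
    # distribute samples as evenly as possible across k folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i not in set(val)]
        folds.append((train, val))
        start += size
    return folds

splits = k_fold_indices(10, k=5)  # five folds of two validation samples each
```

Libraries such as scikit-learn provide production-grade versions of this (with shuffling and stratification); the sketch above only shows the core idea.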
Model tuning versus fine-tuning
Model tuning should not be confused with fine-tuning.
Fine-tuning adapts a pre-trained model to a new, specific task. It is a form of transfer learning. Rather than training a model from scratch, developers start with a foundation model that has already learned broad patterns from large datasets.
In natural language processing, for example, GPT-style models are pre-trained using self-supervised learning. Fine-tuning then tailors them for tasks such as instruction following, coding or specialised domains. Fine-tuning can involve:
- Updating all model weights
- Updating only selected layers while freezing others
- Adding adapters or low-rank updates to reduce computational cost
Parameter-efficient fine-tuning methods, such as low-rank adaptation (LoRA), reduce the number of trainable parameters and memory demands while maintaining strong performance.
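The parameter savings of low-rank adaptation come from replacing a full weight update with two small factors. The sketch below uses illustrative dimensions and zero-filled arrays just to count parameters; it is not a faithful LoRA implementation:

```python
import numpy as np

# Low-rank adaptation sketch: instead of updating a full d x d weight
# matrix, train two small factors A (d x r) and B (r x d); the adapted
# layer behaves as if its weight were W + A @ B.
d, r = 512, 8            # model width and adapter rank (illustrative values)
W = np.zeros((d, d))     # frozen pre-trained weight (contents irrelevant here)
A = np.zeros((d, r))     # trainable low-rank factor
B = np.zeros((r, d))     # trainable low-rank factor

def effective_weight(W, A, B):
    # What the forward pass effectively multiplies by after adaptation.
    return W + A @ B

full_params = W.size            # parameters a full fine-tune would update
lora_params = A.size + B.size   # parameters LoRA actually trains
```

With these dimensions, LoRA trains 8,192 parameters instead of 262,144, a 32x reduction for this single layer, which is where the memory savings come from.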
Hyperparameters remain critical during fine-tuning. Learning rate, batch size and regularisation settings often need adjustment to avoid destabilising previously learned knowledge.
The role of hyperparameters in fine-tuning
When adapting a pre-trained model to a specific use case, hyperparameters influence:
- How quickly the model updates its weights
- Whether earlier layers remain stable or change significantly
- The risk of overfitting to small, task-specific datasets
- The overall computational cost of adaptation
A smaller learning rate, for example, is often used during fine-tuning to avoid catastrophic forgetting, where a model loses core knowledge acquired during pre-training. Careful hyperparameter selection ensures that fine-tuning enhances performance without undermining robustness.
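The effect of the fine-tuning learning rate can be seen in a single gradient step. The numbers below are hypothetical; the point is only that a smaller step size moves the pre-trained weights less far from where pre-training left them:

```python
# Sketch: one gradient step on a new task moves pre-trained weights
# further when the fine-tuning learning rate is larger.
pretrained_w = 1.0    # a single pre-trained weight (illustrative)
task_gradient = 5.0   # hypothetical gradient from the new task's loss

def one_step(w, grad, lr):
    return w - lr * grad

w_small_lr = one_step(pretrained_w, task_gradient, lr=1e-4)
w_large_lr = one_step(pretrained_w, task_gradient, lr=1e-1)

drift_small = abs(w_small_lr - pretrained_w)  # small drift from pre-training
drift_large = abs(w_large_lr - pretrained_w)  # much larger drift
```

Scaled up across millions of weights and thousands of steps, the larger drift is what erodes pre-trained knowledge, which is why fine-tuning recipes typically start from a conservative learning rate.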
Key takeaways
- Tuning identifies the best configuration settings for training a machine learning model
- Hyperparameters shape how a model learns, while model parameters are learned during training
- Poor tuning can lead to overfitting, high bias or high variance
- Common tuning methods include grid search, random search, Bayesian optimisation and Hyperband
- Fine-tuning adapts a pre-trained model for specific tasks, and hyperparameters remain central to its success
- Effective tuning balances performance, generalisation and computational efficiency

