Hyperparameters are configurations that determine the structure of machine learning models and control their learning processes. They shouldn’t be confused with the model’s parameters (such as the weights and biases), whose optimal values are determined during training.
Hyperparameters are adjustable configurations that are set manually and tuned to optimize model performance. They are top-level parameters whose values influence the values that the model parameters take after training. The two main types are model hyperparameters (such as the number of layers and the number of units in each layer), which determine the structure of the model, and algorithm hyperparameters (such as the optimization algorithm and learning rate), which influence and control the learning process.
Some standard hyperparameters for training neural nets include:
Number of hidden layers
Number of units for hidden layers
Dropout rate - the fraction of nodes randomly dropped during training; dropout lets a single model simulate a large number of different network architectures
Activation function (ReLU, Sigmoid, Tanh) - defines the output of a node given an input or set of inputs
Optimization algorithm (Stochastic Gradient Descent, Adam, RMSprop, etc.) - the method for updating model parameters to minimize the value of the loss function, as evaluated on the training set
Loss function - a measurement of how good your model is in terms of predicting the expected outcome
Learning rate - controls how much to change the model in response to the estimated error each time the model weights are updated
Number of training iterations (epochs) - the number of times that the learning algorithm works through the entire training dataset
Batch size - the number of training samples to work through before the model’s internal parameters are updated
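To make the distinction between parameters and hyperparameters concrete, here is a minimal pure-Python sketch (not from the original article). It fits a one-variable linear model with mini-batch gradient descent. The function name train_linear and the toy data are assumptions chosen for illustration: learning_rate, epochs and batch_size are hyperparameters we pick up front, while w and b are parameters learned during training.

```python
import random

def train_linear(xs, ys, learning_rate=0.1, epochs=500, batch_size=4, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on squared error.

    learning_rate, epochs and batch_size are hyperparameters (set by us);
    w and b are model parameters (learned from the data).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the mean squared error over the current batch.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Toy, noise-free data generated from y = 3x + 1 (illustrative only).
xs = [-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0]
ys = [3.0 * x + 1.0 for x in xs]
w, b = train_linear(xs, ys)
print(f"learned parameters: w={w:.3f}, b={b:.3f}")
```

Changing learning_rate or batch_size changes how quickly (or whether) w and b approach 3 and 1, which is the kind of effect hyperparameter tuning tries to exploit.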
When building machine learning models, hyperparameters are set to guide the training process. Depending on the performance of the model after initial training, these values are repeatedly adjusted to improve the model, until a combination of values that produces the best results is chosen. The process of adjusting hyperparameters to obtain the right set of values that optimizes the performance of machine learning models is known as Hyperparameter Tuning.
Tuning hyperparameters can be challenging in deep learning. This is mainly due to the number of configurations that need to be set correctly, the repeated trials needed to re-adjust these values, and the poor results that arise from sub-optimal values. In practice, these values are usually set and fine-tuned based on general principles for specific problems (e.g., using the softmax activation function for multiclass classification), prior experience from building models (e.g., progressively reducing the units of hidden layers by a factor of 2), domain knowledge, and the size of the input data (building simpler networks for smaller datasets).
Even with this understanding, it is still difficult to come up with perfect values for these hyperparameters. Practitioners often determine the best hyperparameters using a trial and error approach. This is done by initializing the values based on their understanding of the problem, and then instinctively adjusting the values on several training trials according to the model’s performance before choosing the final values with the best performance for the model.
Manually fine-tuning hyperparameters this way is often laborious, time-consuming, sub-optimal and inefficient for managing computing resources. An alternative approach is to utilize scalable hyperparameter search algorithms such as Bayesian optimization, Random search and Hyperband. Keras Tuner is a scalable Keras framework that provides these algorithms built-in for hyperparameter optimization of deep learning models. It also provides an algorithm for optimizing Scikit-Learn models.
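As a rough illustration of what an automated search does, here is a minimal pure-Python random search. This is a sketch, not Keras Tuner's implementation: validation_score is a hypothetical stand-in for "train a model and return its validation accuracy", and the search space values are assumptions.

```python
import math
import random

def validation_score(learning_rate, num_units):
    """Hypothetical stand-in for training a model and measuring validation accuracy.

    The score peaks at learning_rate=0.001 and num_units=64; purely illustrative.
    """
    lr_penalty = (math.log10(learning_rate) + 3) ** 2
    units_penalty = ((num_units - 64) / 64) ** 2
    return 1.0 / (1.0 + lr_penalty + units_penalty)

def random_search(num_trials, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(num_trials):
        config = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-2.
            "learning_rate": 10 ** rng.uniform(-4, -2),
            "num_units": rng.choice([16, 32, 64, 128, 256]),
        }
        score = validation_score(**config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

best_config, best_score = random_search(num_trials=50)
print(best_config, round(best_score, 3))
```

Bayesian optimization and Hyperband refine this idea by choosing the next configuration more cleverly or by stopping weak trials early, but the loop of "propose, evaluate, keep the best" is the same.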
In this article, we will learn how to use various functions of the Keras Tuner to perform an automatic search for optimal hyperparameters. The task is to use the Keras Tuner to obtain optimal hyperparameters for building a model that accurately classifies the images of the CIFAR-10 dataset.
Using Keras Tuner requires installing the TensorFlow and Keras Tuner packages and importing the libraries needed to build our model. Keras Tuner requires Python 3.6+ and TensorFlow 2.0+. These come pre-installed on Gradient Machines.
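If you are unsure whether the packages are available in your environment, a small check like the following can help. It is a sketch that assumes Python 3.8+ (for importlib.metadata) and the PyPI package names tensorflow and keras-tuner; the helper name installed_version is made up for this example.

```python
import sys
from importlib import metadata

def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print("Python", ".".join(str(part) for part in sys.version_info[:3]))
for package in ("tensorflow", "keras-tuner"):
    version = installed_version(package)
    if version is None:
        print(f"{package}: not installed (try: pip install {package})")
    else:
        print(f"{package}: {version}")
```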
We will load the CIFAR-10 dataset, which contains 50,000 training and 10,000 test images of 10 object classes. You can read more about the dataset here. We also normalize the image pixel values so that the inputs share a similar scale, which simplifies training.
A preprocessed version of the dataset is available through the Keras datasets module for easy access and use.
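A minimal sketch of this loading step might look like the following. The helper names normalize_images and load_cifar10 are assumptions for this example, and scaling pixels to the range 0 to 1 by dividing by 255 is one common normalization choice rather than necessarily the one used originally.

```python
import numpy as np

def normalize_images(images):
    """Scale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return images.astype("float32") / 255.0

def load_cifar10():
    """Download (on first use) and normalize CIFAR-10 via the Keras datasets module."""
    # Imported here so normalize_images can be reused without TensorFlow installed.
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
    return (normalize_images(x_train), y_train), (normalize_images(x_test), y_test)

if __name__ == "__main__":
    (x_train, y_train), (x_test, y_test) = load_cifar10()
    print(x_train.shape, y_train.shape)  # image tensors and integer class labels
    print(x_test.shape, y_test.shape)
```

The labels are left as integer class indices here, which pairs with a sparse categorical cross-entropy loss later on.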
Now that we have completed the setup and prepared our input data, we can build our model for hypertuning. This is done using Keras Tuner to define a search model (known as a hypermodel), which is then passed to a tuner for hypertuning.
Hypermodels are defined by creating a custom model builder function, by using the built-in models, or by subclassing the HyperModel class for advanced use cases. We will be using the first two approaches to create search models for autotuning our hyperparameters.
To use a custom model, we will define a model-building function by defining the layers we need, tailor the search space for finding the best parameters and define a default value for the hyperparameters when we are not tuning them.
The function takes a parameter (hp), an instance of Keras Tuner's HyperParameters class, which is used to define the search space for the hyperparameter values. We will also compile and return the hypermodel for use. We will be using the Keras functional model pattern for building our model.
Line 3: We define a model building function (build_model) and pass it a parameter (hp), an instance of the HyperParameters class of the Keras Tuner package. It is used for defining the search space for the hyperparameter values.
Line 5-6: We define our input layer and pass it to a variable (x)
Line 11: We define a search space for the number of convolution blocks in our model. The hp.Int function creates an integer hyperparameter whose range runs from min_value to max_value, both inclusive. Searching over 4 and 5 convolution blocks for the value that maximizes accuracy therefore means setting the bounds to 4 and 5.
Line 12: We define a search space for the number of filters used by the convolutional layers in each block. A step of 32 spaces the candidate filter counts 32 apart, starting from min_value.
Line 14-24: We define a set of three layers for each block. Each sub-layer applies convolution, batch normalization and ReLU activation to the input, and the predefined filter search space is passed to the convolution layer. The hp.Choice function defines a categorical hyperparameter for the pooling layer, so in each trial the tuner selects one of the supplied pooling types to apply to the input.
Line 26: We apply global average pooling, followed by a dense layer whose number of units is searched from min_value to max_value in steps of 10. We also define the output layer with a softmax activation.
Line 34-40: Finally we define the model using the input and output layers, compile the model and return the built hypermodel.
For compiling the model we define a learning rate search space with the hp.Float function which creates a search space from 0.0001 to 0.002 for selecting the optimal learning rate.
After building the hypermodel, we can now initialize our search algorithm. We choose from the built-in search algorithms: Bayesian Optimization, Hyperband, and Random Search for deep learning models, plus a Scikit-Learn tuner for classical machine learning models.
We will be using the Hyperband search algorithm for our example. The tuner takes parameters such as the hypermodel, an objective metric for evaluating the model, the max_epochs for training any single model, the number of hyperband_iterations (how many times the full Hyperband algorithm is repeated), a directory for saving the training logs (which can be visualized with TensorBoard) and the project_name.
Keras Tuner currently provides two tunable built-in models, HyperResNet and HyperXception, which search through different combinations for the ResNet and Xception architectures respectively. Defining the tuner using built-in models is similar to using the model building function.
We can then use our tuner to search for the optimal hyperparameters for the model within the defined search space. The method is similar to fitting a model using Keras.
The best hyperparameters for the model within the defined search space can be retrieved using the get_best_hyperparameters method of the tuner instance, and the best model using the get_best_models method.
We can also view the best hyperparameters. In our example, we can do this as follows:
This displays the optimal values for the number of convolution blocks, filters and units for the convolution and dense layers, choices of pooling layer and the learning rate.
We can also view the summary and structure of the optimal model using the appropriate Keras functions.
Finally, we will build a model using the optimal hyperparameters before calling the fit function for training the model.
Here, I train the model for 50 epochs and add an EarlyStopping callback to stop training when the model is no longer improving.
We can evaluate the model on the test set. We will be evaluating the model using the loss and accuracy score of the model. You can try out other metrics as applicable.
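The final training and evaluation steps might be wrapped up as in the sketch below. The helper name retrain_and_evaluate is an assumption, as are the monitored metric (val_loss), the patience of 5, the 20% validation split, and the expectation that the hypermodel was compiled with a single accuracy metric so that evaluate returns a loss and an accuracy.

```python
from tensorflow import keras

def retrain_and_evaluate(tuner, train_data, test_data, epochs=50, patience=5):
    """Rebuild the best model from a finished search, train it, and score it on test data."""
    x_train, y_train = train_data
    x_test, y_test = test_data

    # Build a fresh, untrained model from the best hyperparameters.
    best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
    model = tuner.hypermodel.build(best_hp)

    # Stop when the validation loss has not improved for `patience` epochs.
    early_stopping = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, restore_best_weights=True
    )
    model.fit(
        x_train,
        y_train,
        epochs=epochs,
        validation_split=0.2,
        callbacks=[early_stopping],
        verbose=2,
    )

    # Assumes the model was compiled with metrics=["accuracy"].
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return model, loss, accuracy
```

Given a tuner whose search has finished, a call such as retrain_and_evaluate(tuner, (x_train, y_train), (x_test, y_test)) returns the trained model together with its test loss and accuracy.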
Hyperparameters are key determinants for the performance of machine learning models and tuning them with a trial and error approach is inefficient. Keras Tuner applies search algorithms to automatically find the best hyperparameters in a defined search space.
In this article, we used Keras Tuner to determine the best hyperparameters for a multiclass classification task. We defined a search space in a hypermodel, using both our custom model and the built-in models, before leveraging the provided search algorithms to automatically search through several values and find an optimal combination of hyperparameters for our model.
You can check out the Keras Tuner guide to learn about visualizing the tuning process on TensorBoard, distributing the hypertuning process, tailoring the search space and subclassing the Tuner class for advanced use cases.