
The Missing Library in your Machine Learning Workflow


A quick guide to using Optuna for hyperparameter optimization


Sound engineers can create the perfect blend in audio by tuning the sliders and knobs to the right positions on audio mixers.

In much the same way, machine learning models are tuned to achieve their best performance.

Before we go into how we can use Optuna for tuning hyperparameters, here’s a quick intro to the topic.

Hyperparameter Optimization

What are hyperparameters?

Hyperparameters can be thought of as configuration values that control the learning process of an algorithm.

For example, let’s say you’re building a motorized toy car for a racing competition from scratch. You have control over the car’s specifications, such as the size of the tires, the motor speed, the torque, etc., that will determine how it performs on a race track. If the goal is to win the race, you would configure those settings for optimal performance.

Similarly, most machine learning algorithms also have these settings (hyperparameters) that you can tune or tweak to obtain the best performance from your models.

I say most because simple ones, such as Simple Linear Regression, do not have them.

Below are some examples of model hyperparameters that are configured before model training.

  • k value in K-Nearest Neighbors
  • Learning rate in Neural Networks
  • C value in Logistic Regression
  • n_components in Principal Components Analysis
  • max_depth, n_estimators, max_features, etc. in Random Forest

What is hyperparameter optimization?

Now that you understand what hyperparameters are (hopefully), it’s time to understand how to optimize them.

Hyperparameter optimization is the idea of finding the right set of hyperparameters that yields an optimized model which minimizes or maximizes an objective function.

Depending on the metric you want to optimize in a machine learning model, the objective function could return the loss or the accuracy of the model, where loss is something we want to minimize and accuracy is something we want to maximize.

A function that we want to minimize is called a loss function, which in simple terms, is a way to tell how poorly your machine learning model is performing.

Why is it important?

Machine Learning models aren’t able to learn the right set of hyperparameters to use by themselves. This is why it’s fundamental to tune them to the right settings so that they can achieve higher predictive power.

How to do it

There are various algorithms and tools that can be used to perform hyperparameter tuning.

The most common way that many intro courses teach is GridSearchCV, an exhaustive approach where every possible combination of parameters is used to fit a model and the best-performing combination is selected. This approach is very expensive and takes a lot of time.

There are other approaches such as RandomizedSearchCV, which randomly samples hyperparameter values to fit the model, and more model-specific approaches such as LogisticRegressionCV and ElasticNetCV.

These approaches have problems and limitations, but there are newer packages for hyperparameter tuning that solve these limitations, and one of them is Optuna.

Introducing Optuna


Optuna is “an automatic hyperparameter optimization software framework, particularly designed for machine learning.”

The key features of Optuna are as follows (source):

  • Lightweight, versatile, and platform agnostic architecture
  • Pythonic search spaces
  • Efficient optimization algorithms
  • Easy parallelization
  • Quick visualization

It’s popular among Kaggle Competitors and well-received by the ML community.

It’s also framework agnostic, meaning it supports any machine learning or deep learning framework.

In this article, we’ll be exploring what Optuna does and try it out with sklearn in a simple example.


Install Dependencies

! pip install --quiet optuna

First, install optuna with pip.

Load Libraries

import pandas as pd
import numpy as np

import optuna
from optuna.visualization import plot_param_importances

import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm

optuna.logging.set_verbosity(optuna.logging.WARNING)

Here, we load the optuna library and sklearn and set the verbosity level to warnings only.

Basics of Optuna

Let’s first understand three terms in Optuna.

  • Trial: An execution of the objective function
  • Study: An optimization session based on the objective function; it contains a set of trials.
  • Parameter: A variable that we want to optimize

Let’s start with an example.

We have the quadratic function (x - 1)² below, and we want to optimize it.

If you forgot your calculus, optimizing a function means finding an input to the function that results in the minimum or maximum output from the function.

To do that, you:

  1. Differentiate the function: 2(x - 1)
  2. Set the result to zero: 2x - 2 = 0
  3. Solve for x: x = 1

Let’s now optimize it with Optuna and see the results.

Define the objective function

First, we start by defining an objective function.

We can suggest values that we want Optuna to sample from for our hyperparameter in the function.

def objective(trial):
  x = trial.suggest_float("x", -10, 10)
  return (x - 1) ** 2 # objective function

In our case, x is a float, and we give Optuna a range from -10 to 10 to sample from.

In other cases, if our variable were categorical or an integer, we could use suggest_categorical or suggest_int, respectively.
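For example, inside an objective these calls could look like this (the parameter names here are hypothetical, just to illustrate the API):

optimizer = trial.suggest_categorical("optimizer", ["adam", "sgd"]) # one value from a fixed set
n_layers = trial.suggest_int("n_layers", 1, 5) # an integer from 1 to 5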

Once we have the objective defined, we create the study using create_study.
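Note that the optimize call below passes a logging_callback, which isn’t defined above. A minimal version, adapted from the pattern in Optuna’s FAQ, prints a line only when the best value improves:

def logging_callback(study, frozen_trial):
  # compare against the best value stored from previous trials
  previous_best_value = study.user_attrs.get("previous_best_value", None)
  if previous_best_value != study.best_value:
    study.set_user_attr("previous_best_value", study.best_value)
    print(
      f"Trial {frozen_trial.number} finished with best value: {study.best_value} "
      f"and parameters: {frozen_trial.params}. "
    )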

study = optuna.create_study()
study.optimize(objective, n_trials = 100, callbacks=[logging_callback])
Trial 0 finished with best value: 0.28501019083587587 and parameters: {'x': 1.5338634571085343}. 
Trial 1 finished with best value: 0.03894285911347894 and parameters: {'x': 0.802660548512268}. 
Trial 33 finished with best value: 0.0010347011566727482 and parameters: {'x': 0.9678332289983476}. 
Trial 49 finished with best value: 0.0004660814488122324 and parameters: {'x': 0.978411080415819}. 
Trial 51 finished with best value: 3.213952149612913e-11 and parameters: {'x': 1.0000056691729111}. 

Then we call the optimize function on it and set the number of trials.

The logging_callback function passed to the callbacks parameter tells Optuna to produce output only when the best value is updated; it isn’t required otherwise.

After the study is done optimizing, you can get the best parameters like below.

study.best_params
{'x': 1.0000056691729111}

You can also get the best value, which should be (close to) zero for our function.

study.best_value
3.213952149612913e-11

I coded a custom function below to print out all the useful info of a study, including the best trial of the study.

# print study info
def study_info(study):
  num_trial = len(study.trials)
  trial = study.best_trial
  print(f"Number of finished trials: {num_trial}")

  print("Best trial:")
  print(f"  Number: {trial.number}")
  print(f"  Value: {trial.value}")
  print("  Params: ")

  for key, value in trial.params.items():
    print(f"    {key}: {value}")

Here’s what we get when using this function on our study.

study_info(study)
Number of finished trials: 100
Best trial:
  Number: 51
  Value: 3.213952149612913e-11
  Params: 
    x: 1.0000056691729111

Let’s increase the number of variables for the function.

Now we have a quadratic function with three variables (x, y, z): (x - 1)² + (y - 2)² + (z - 3)².

With a bit of calculus or basic math intuition, you can easily figure out the values for x, y, and z that will make this equation equal to zero.

The answer: x = 1, y = 2, z = 3

Let’s use Optuna to optimize this function.

def objective(trial):
  x = trial.suggest_float("x", -10, 10)
  y = trial.suggest_float("y", -10, 10)
  z = trial.suggest_float("z", -10, 10)
  return (x - 1)**2 + (y - 2)**2 + (z - 3)**2

study = optuna.create_study()
study.optimize(objective, n_trials = 100, callbacks=[logging_callback])
Trial 0 finished with best value: 185.36401921699712 and parameters: {'x': -8.29537193039899, 'y': -7.9444851162075185, 'z': 3.2594140822600064}. 
Trial 1 finished with best value: 159.83255179442162 and parameters: {'x': 6.919588162035236, 'y': -6.57704125596668, 'z': -4.157191563718232}. 
Trial 2 finished with best value: 7.626764123650401 and parameters: {'x': -1.3053998639717879, 'y': 2.978717173954813, 'z': 4.1636186163237365}. 
Trial 21 finished with best value: 4.082878261206643 and parameters: {'x': -0.9926841299551272, 'y': 1.665213499245479, 'z': 3.0024936607957393}. 
Trial 29 finished with best value: 3.6726776120838824 and parameters: {'x': 2.0038973632986297, 'y': 0.3787112861454629, 'z': 3.190500924863483}. 
Trial 32 finished with best value: 1.6660402994238952 and parameters: {'x': 1.9779070538930343, 'y': 2.1608518645468258, 'z': 3.8269611665864067}. 
Trial 71 finished with best value: 0.5027736619471901 and parameters: {'x': 1.5627642695646873, 'y': 2.2084263728124554, 'z': 3.3776618672367618}. 

After 100 trials, it seems Optuna couldn’t find the best values for our variables.

study.best_params
{'x': 1.5627642695646873, 'y': 2.2084263728124554, 'z': 3.3776618672367618}

Here comes the best part about Optuna.

It saves the trials it has already run, so we can keep optimizing our study until we are satisfied.

Let’s optimize with 500 more trials.

study.optimize(objective, n_trials=500, callbacks=[logging_callback])
Trial 112 finished with best value: 0.2838186411163353 and parameters: {'x': 1.3821926696075142, 'y': 2.303204599867226, 'z': 2.785957071983277}. 
Trial 113 finished with best value: 0.2636754457290153 and parameters: {'x': 1.4633972103468333, 'y': 2.112157177799116, 'z': 2.8093190134283823}. 
Trial 114 finished with best value: 0.13125427164753062 and parameters: {'x': 1.213208423245971, 'y': 2.028933781114155, 'z': 2.7085222543401715}. 
Trial 142 finished with best value: 0.09764704185839798 and parameters: {'x': 1.109293241869957, 'y': 1.900423125463834, 'z': 3.2752934347156897}. 
Trial 143 finished with best value: 0.023384939524816253 and parameters: {'x': 1.1079473428591624, 'y': 1.9704852367666095, 'z': 3.104217030497609}. 
Trial 381 finished with best value: 0.02105327177210695 and parameters: {'x': 0.8966563295800967, 'y': 1.9386566935358678, 'z': 2.918695902266264}. 

With 600 trials in total, Optuna is able to get closer to the right values.

study_info(study)
Number of finished trials: 600
Best trial:
  Number: 381
  Value: 0.02105327177210695
  Params: 
    x: 0.8966563295800967
    y: 1.9386566935358678
    z: 2.918695902266264

Now let’s look at Optuna’s built-in functions for visualizing the optimizations.

Visualizations

Optimization history

With plot_optimization_history, we can observe at which trial Optuna obtained the best value:
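optuna.visualization.plot_optimization_history(study)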

Objective Values

With plot_slice, we can also see that as the number of trials increases (darker shade), most of the sampled values converge around the right values:
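optuna.visualization.plot_slice(study)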

Now let’s try Optuna on a dataset and use it with sklearn to optimize for the right classifier.

Wine Dataset

We’ll be using the classic wine dataset for this example.

We can load the dataset using the sklearn dataset package.
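A minimal sketch of loading it (the variable names wine, X, and y are my own):

# load the wine dataset: 178 samples, 13 features, 3 wine classes
wine = sklearn.datasets.load_wine()
X, y = wine.data, wine.target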

We have a target we want to classify: the type of wine, of which there are three classes.

Now our goal is to predict the class and optimize for accuracy.

Optuna with Sklearn

Below you see an example of integrating Optuna with sklearn.
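Here’s a sketch in the spirit of Optuna’s official scikit-learn example (the exact search ranges here are assumptions, and it reuses the X and y loaded above):

def objective(trial):
  # have Optuna pick which classifier to try
  classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])

  if classifier_name == "SVC":
    # sample the SVC regularization strength on a log scale
    svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
    classifier_obj = sklearn.svm.SVC(C=svc_c, gamma="auto")
  else:
    # sample the Random Forest hyperparameters (ranges are assumptions)
    rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32)
    rf_n_estimators = trial.suggest_int("rf_n_estimators", 10, 100)
    classifier_obj = sklearn.ensemble.RandomForestClassifier(
      max_depth=rf_max_depth, n_estimators=rf_n_estimators
    )

  # 3-fold cross-validated accuracy is the value Optuna will maximize
  score = sklearn.model_selection.cross_val_score(classifier_obj, X, y, cv=3)
  return score.mean()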

First, we have Optuna sample which classifier algorithm to use: the Support Vector Classifier or the Random Forest algorithm.

Then, depending on which algorithm was sampled, Optuna samples its respective hyperparameters.

At the end, the score is calculated, and the accuracy is the value we want to optimize.

Since higher accuracy is better, we create a study set to maximize, and we can tell Optuna that like below:
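study = optuna.create_study(direction = "maximize")
study.optimize(objective, n_trials = 100)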

Running 100 trials on the study, we get an accuracy of 96.6%, and it tells us the Random Forest algorithm should be used, with 20 estimators and a maximum depth of 24.

Let’s plot the optimization history.
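optuna.visualization.plot_optimization_history(study)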

It seems that around the 18-trial mark, the best value was already obtained.

Let’s plot the hyperparameters as well.

optuna.visualization.plot_slice(study)

It seems max_depth has a lot of variation, and notice how Random Forest has more data points, which suggests it was the better algorithm, achieving higher accuracies.

You can also plot hyperparameter importance, which tells us which hyperparameters are important and which to discard, using the plot_param_importances function:
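plot_param_importances(study)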

This is important because the difficulty of optimization increases roughly exponentially with regard to the number of parameters.

So it’s essential to only optimize for the important parameters.

Check out more visualizations you can do with Optuna.

Conclusion

This was a short guide to using Optuna. If you want to dive deeper into this tool, check out the resources below.

Resources

Kaggle Notebooks


Want to discuss the latest developments in Data Science and AI with other data scientists? Join our discord server!

Follow Bitgrit’s socials 📱 to stay updated on workshops and upcoming competitions!