Bayesian Optimization Example
In this article, I will present a very useful technique for finding the global maximum of a function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\) that is expensive to evaluate. It is called Bayesian Optimization.
I will use the very good BayesianOptimization framework (https://github.com/fmfn/BayesianOptimization).
Let us begin with a brief recap of what Bayesian Optimization is and why so many people use it to optimize their models.
Bayesian Optimization is a constrained global optimization technique built upon Bayesian inference and Gaussian processes that attempts to find the maximum value of an unknown function in as few iterations as possible. This makes it particularly well suited to optimizing expensive-to-evaluate functions, and to situations where the balance between exploration and exploitation is important.
Instead of attacking Bayesian Optimization from a theoretical point of view, I think it is much easier to understand through a short example, so I have selected an analytic function (defined below) and we will see if we can find its global maximum.
First of all, let us start by installing the required packages (for those who don't want to mess up their Python setup, please consider using a virtual environment):
pip install numpy scipy scikit-learn bayesian-optimization
From there, let's proceed step by step.
from bayes_opt import BayesianOptimization
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline
Target Function
The function we will analyze today is a 1-D function with multiple local maxima: \(f(x) = e^{-(x - 2)^2} + e^{-\frac{(x - 6)^2}{10}} + \frac{1}{x^2 + 1}\). Its maximum is at \(x = 2\), and we assume that we already know it lies in the interval \(x \in (-2, 10)\).
def target(x):
    return np.exp(-(x - 2)**2) + np.exp(-(x - 6)**2 / 10) + 1 / (x**2 + 1)

KAPPA = 5  # exploration-exploitation parameter for the UCB utility, used by every maximize call below

# Plot the target function over the search interval
x = np.linspace(-2, 10, 1000)
y = target(x)
plt.plot(x, y)
Create a BayesianOptimization Object
A minimum of two initial guesses is necessary to kick-start the algorithm; these can be either random or user-defined.
bo = BayesianOptimization(target, {'x': (-2, 10)})
In this example we will use the Upper Confidence Bound (UCB) as our utility function. It has a free parameter \(\kappa\) which controls the balance between exploration and exploitation; we will set \(\kappa = 5\), which, in this case, makes the algorithm quite bold. Additionally, we will use a cubic correlation function in our Gaussian Process.
gp_params = {'corr': 'cubic'}
bo.maximize(init_points=2, n_iter=0, acq='ucb', kappa=KAPPA, **gp_params)
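For intuition, the UCB utility simply scores each candidate point with the GP posterior mean plus \(\kappa\) times the posterior standard deviation, \(\mathrm{UCB}(x) = \mu(x) + \kappa\,\sigma(x)\), and the next point probed is the candidate with the highest score. A toy sketch of the rule (illustrative only, not the library's internal implementation; mu and sigma stand for the GP posterior evaluated on a grid of candidates):

def ucb(mu, sigma, kappa=KAPPA):
    # Upper Confidence Bound: posterior mean plus kappa times posterior std.
    # Large kappa favors exploration; small kappa favors exploitation.
    return mu + kappa * sigma

# The point probed next maximizes the utility over the candidate grid, e.g.:
# x_next = x[np.argmax(ucb(mu, sigma))]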
Plotting and visualizing the algorithm at each step
Two random points
After we probe two points at random, we can fit a Gaussian Process and start the Bayesian Optimization procedure. Two points should give us an uneventful posterior, with the uncertainty growing as we move away from the observations.
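A helper along the following lines can be used to visualize each step. This is a minimal, self-contained sketch, assuming we track the probed points ourselves; plot_posterior, x_obs, and y_obs are illustrative names, and the helper fits its own scikit-learn GP rather than reaching into the library's internals:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def plot_posterior(x_obs, y_obs, x_grid):
    # Fit a stand-alone GP to the points probed so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(x_obs.reshape(-1, 1), y_obs)
    mu, sigma = gp.predict(x_grid.reshape(-1, 1), return_std=True)

    # Posterior mean and ~95% confidence band against the true target
    plt.plot(x_grid, target(x_grid), label='target')
    plt.plot(x_grid, mu, '--', label='GP mean')
    plt.fill_between(x_grid, mu - 1.96 * sigma, mu + 1.96 * sigma,
                     alpha=0.3, label='95% confidence')
    plt.plot(x_obs, y_obs, 'o', label='observations')
    plt.legend()

Recording each probed point manually and calling plot_posterior(x_obs, y_obs, x) after every step below reproduces the evolution that the headings describe.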
After one step of GP (and two random points)
bo.maximize(init_points=0, n_iter=1, kappa=KAPPA)
After two steps of GP (and two random points)
bo.maximize(init_points=0, n_iter=1, kappa=KAPPA)
After three steps of GP (and two random points)
bo.maximize(init_points=0, n_iter=1, kappa=KAPPA)
After four steps of GP (and two random points)
bo.maximize(init_points=0, n_iter=1, kappa=KAPPA)
After five steps of GP (and two random points)
bo.maximize(init_points=0, n_iter=1, kappa=KAPPA)
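The five single-iteration calls above are spelled out so that each posterior can be inspected in turn; for longer runs they collapse naturally into a loop using the same call:

# Equivalent to the five single-step calls above
for _ in range(5):
    bo.maximize(init_points=0, n_iter=1, kappa=KAPPA)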
Stopping
After just a few points the algorithm was able to get pretty close to the true maximum. It is important to notice that the trade-off between exploration (exploring the parameter space) and exploitation (probing points near the current known maximum) is fundamental to a successful Bayesian Optimization procedure. The utility function used here (the Upper Confidence Bound) has a free parameter \(\kappa\) that allows the user to make the algorithm more or less conservative. Additionally, the larger the initial set of random points explored, the less likely the algorithm is to get stuck at a local optimum due to being too conservative.
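To illustrate that last point, one could simply restart the optimizer with a larger random initialization. A quick sketch reusing the same calls as above (bo2 is a hypothetical second run, and the counts are arbitrary choices, not tuned values):

# A second run with a larger random initialization before the GP steps
bo2 = BayesianOptimization(target, {'x': (-2, 10)})
bo2.maximize(init_points=10, n_iter=10, acq='ucb', kappa=KAPPA, **gp_params)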
All of the source code is available in the repository at https://github.com/fmfn/BayesianOptimization.