Global Optimization and Solver Tuning

The selection of solver parameters or initial guesses can be determined by another optimization algorithm that searches among categorical or continuous parameters. In machine learning, these solver parameters are called hyperparameters. This tutorial is an introduction to hyperparameter optimization and its application to global optimization. A simple test problem with two local minima demonstrates the approach.

Examples of optimizer hyperparameters include initial guesses and solver options. The best combination of values is determined through a process called hyperparameter search. The objective may be to minimize the number of solver iterations or to find a global solution.

Hyperparameter Search Methods

There are several common methods for hyperparameter optimization, each with its own strengths and weaknesses:

1️⃣ Grid search: A technique where a set of possible values for each hyperparameter is specified, and the algorithm will train and evaluate a model for each combination of hyperparameter values. Grid search can be computationally expensive, particularly when searching over many hyperparameters or a large range of values for each hyperparameter.

2️⃣ Random search: A technique where hyperparameter values are sampled at random from a predefined distribution for each hyperparameter. Random search is less computationally expensive than an exhaustive grid search and, for a fixed evaluation budget, often has a better chance of finding a good set of hyperparameters (a minimal sketch comparing grid and random search follows this list).

3️⃣ Bayesian optimization: A probabilistic model-based approach that uses Bayesian inference to model the function that maps the hyperparameters to the performance of the model. It uses the acquired knowledge to direct the search toward regions where it expects to find the best performance. Standard Bayesian optimization is sequential (difficult to parallelize) and is designed for continuous rather than categorical hyperparameters. It converges quickly to an optimal solution when there are few hyperparameters, but this efficiency degrades as the search dimension increases.

4️⃣ Genetic algorithm: An evolutionary algorithm that uses concepts of natural selection and genetics (selection, crossover, mutation) to optimize the parameters.

5️⃣ Gradient-based optimization: A method that uses gradient information to optimize the hyperparameters. This can be done using optimization algorithms such as gradient descent or Adam.

6️⃣ Hyperband: An algorithm that uses early stopping (successive halving) to allocate more resources to promising configurations, which reduces the number of models that must be fully trained and evaluated and makes it faster than grid or random search for the same budget.

Which method to use depends on the problem, the complexity of the model, the computational resources available, and the desired trade-off between computation time and optimization quality.
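As a simple illustration of the first two methods, the following sketch compares grid search and random search on a toy tuning problem. The function toy_objective and the candidate values are hypothetical stand-ins for the cost of a real solver run, not part of the GEKKO example below.

import itertools, random

# Hypothetical objective: a stand-in for the cost of a solver run
# (e.g. iteration count) as a function of two tuning parameters
def toy_objective(tol, max_iter):
    return (tol - 1e-6)**2 * 1e9 + abs(max_iter - 250)/100

# Grid search: evaluate every combination of candidate values
tols  = [1e-4, 1e-6, 1e-8]
iters = [50, 250, 500]
grid_best = min(itertools.product(tols, iters),
                key=lambda p: toy_objective(*p))

# Random search: sample the same number of candidates from ranges
random.seed(0)
samples = [(10**random.uniform(-8, -4), random.randint(50, 500))
           for _ in range(len(tols)*len(iters))]
rand_best = min(samples, key=lambda p: toy_objective(*p))

print('Grid search best:  ', grid_best)
print('Random search best:', rand_best)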

Global Optimization

Objective: This optimization problem has two local minima: (0,0,8) with objective 936.0 (the global minimum) and (7,0,0) with objective 951.0. Use gekko and a multi-start method to find the global solution.

$$\begin{align}\mathrm{minimize} \quad & 1000-x_1^2-2x_2^2-x_3^2-x_1x_2-x_1x_3 \\ \mathrm{subject\;to}\quad & 8x_1+14x_2+7x_3=56 \\ & x_1^2+x_2^2+x_3^2\geq25 \\ & x_1,x_2,x_3\ge0 \end{align}$$


Python GEKKO Local Solution

The following script produces the local (not global) solution of (7,0,0) with objective 951.0.

from gekko import GEKKO
m = GEKKO(remote=False)     # solve locally
x = m.Array(m.Var,3,lb=0)   # three non-negative variables
x1,x2,x3 = x
# objective
m.Minimize(1000-x1**2-2*x2**2-x3**2-x1*x2-x1*x3)
# constraints
m.Equations([8*x1+14*x2+7*x3==56,
             x1**2+x2**2+x3**2>=25])
m.solve(disp=False)
res=[print(f'x{i+1}: {xi.value[0]}') for i,xi in enumerate(x)]
print(f'Objective: {m.options.objfcnval:.2f}')

Multi-Start with Parallel Threading

This solution demonstrates the use of the threading module to perform a multi-start method with different initial conditions over a grid of starting points. Multi-threading in Python is the ability of the interpreter to run multiple threads (functions of a program) concurrently within the same process as the main program. Threads allow tasks to overlap and can improve performance, especially when much of the work (such as a local GEKKO solve) happens outside the Python interpreter. The Python threading module creates and manages threads: a thread is created by instantiating the Thread class (or a subclass) and is launched with the start() method. Thanks to Erwin Kalvelagen for proposing a multi-start method.
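A minimal sketch of this pattern (subclass threading.Thread, override run(), launch with start(), wait with join()) is shown below; the sleep call and Worker class are placeholders for a local solve, not part of the multi-start script that follows.

import threading, time

class Worker(threading.Thread):
    def __init__(self, id):
        threading.Thread.__init__(self)
        self.id = id
        self.result = None
    def run(self):
        time.sleep(0.1)        # placeholder for a local solve
        self.result = self.id**2

workers = [Worker(i) for i in range(4)]
for w in workers:
    w.start()                  # launch each thread
for w in workers:
    w.join()                   # wait for all threads to finish
print([w.result for w in workers])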

import numpy as np
import threading
import time, random
from gekko import GEKKO

class ThreadClass(threading.Thread):
    def __init__(self, id, xg):
        s = self
        s.id = id
        s.m = GEKKO(remote=False)
        s.xg = xg
        s.objective = float('NaN')

        # initialize variables
        s.m.x = s.m.Array(s.m.Var,3,lb=0)
        for i in range(3):
            s.m.x[i].value = xg[i]
        s.m.x1,s.m.x2,s.m.x3 = s.m.x

        # Equations
        s.m.Equation(8*s.m.x1+14*s.m.x2+7*s.m.x3==56)
        s.m.Equation(s.m.x1**2+s.m.x2**2+s.m.x3**2>=25)

        # Objective
        s.m.Minimize(1000-s.m.x1**2-2*s.m.x2**2-s.m.x3**2
                     -s.m.x1*s.m.x2-s.m.x1*s.m.x3)

        # Set solver option
        s.m.options.SOLVER = 1

        threading.Thread.__init__(s)

    def run(self):
        print('Running application ' + str(self.id) + '\n')
        self.m.solve(disp=False,debug=0) # solve
        # Retrieve objective if successful
        if (self.m.options.APPSTATUS==1):
            self.objective = self.m.options.objfcnval
        else:
            self.objective = float('NaN')
        self.m.cleanup()

# Optimize at mesh points
x1_ = np.arange(0.0, 10.0, 3.0)
x2_ = np.arange(0.0, 10.0, 3.0)
x3_ = np.arange(0.0, 10.0, 3.0)
x1,x2,x3 = np.meshgrid(x1_,x2_,x3_)

threads = [] # Array of threads

# Load applications
id = 0
for i in range(x1.shape[0]):
    for j in range(x1.shape[1]):
        for k in range(x1.shape[2]):
            xg = (x1[i,j,k],x2[i,j,k],x3[i,j,k])
            # Create new thread
            threads.append(ThreadClass(id, xg))
            # Increment ID
            id += 1

# Run applications simultaneously as multiple threads
# Max number of threads to run at once
max_threads = 8
for t in threads:
    while (threading.active_count()>max_threads):
        # check for additional threads every 0.01 sec
        time.sleep(0.01)
    # start the thread
    t.start()

# Check for completion
mt = 10.0 # max time (sec)
it = 0.0  # time counter
st = 1.0  # sleep time (sec)
while (threading.active_count()>1): # wait for worker threads to finish
    time.sleep(st)
    it = it + st
    print('Active Threads: ' + str(threading.active_count()))
    # Terminate after max time
    if (it>=mt):
        break

# Initialize array for objective
obj = np.empty_like(x1)

# Retrieve objective results
id = 0
id_best = 0; obj_best = 1e10
for i in range(x1.shape[0]):
    for j in range(x1.shape[1]):
        for k in range(x1.shape[2]):
            obj[i,j,k] = threads[id].objective
            if obj[i,j,k]<obj_best:
                id_best = id
                obj_best = obj[i,j,k]
            id += 1

print(obj)
print(f'Best objective {obj_best}')
print(f'Solution {threads[id_best].m.x}')

Global Optimization with Hyperopt

An alternative strategy is to use hyperopt to search for a global solution, with gekko inside the fmin objective function performing the local solves of the multi-start method. hyperopt is a Python package for hyperparameter optimization with a variety of algorithms, including random search, Tree-structured Parzen Estimator (TPE), and adaptive TPE, as well as a simple and flexible way to define the search space for the hyperparameters.

The main function of the package is fmin, which performs the optimization. It takes an objective function, the search space, the optimization algorithm, and the maximum number of evaluations as input. The objective function takes the hyperparameters as input and returns a dictionary with the loss (or negative of the performance metric) and the status of the evaluation. hyperopt also provides helper functions for defining the search space: hp.uniform and hp.loguniform for continuous variables, hp.quniform and hp.qloguniform for quantized (discretized) values, hp.choice for categorical variables, and hp.randint for integers.

hyperopt provides built-in support for parallel execution and early stopping, and it can be used with most machine learning libraries, such as scikit-learn, TensorFlow, and PyTorch. It is a popular choice among data scientists and researchers because of its ease of use and ability to find good solutions in a relatively small number of evaluations. There is an example of using hyperopt to find the best hyperparameters of a k-Nearest Neighbors classifier in the Machine Learning for Engineers course.
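As a minimal sketch of these search-space helpers (the parameter names, ranges, and toy loss below are illustrative assumptions, not part of the example problem):

from hyperopt import fmin, tpe, hp

# illustrative search space mixing the helper types
space = {'solver': hp.choice('solver', ['apopt', 'ipopt']),  # categorical
         'tol': hp.loguniform('tol', -12, -4),               # continuous (log scale)
         'guess': hp.quniform('guess', 0, 10, 2),            # quantized continuous
         'seed': hp.randint('seed', 100)}                    # integer in [0, 100)

# toy loss that only illustrates the fmin call signature
def toy_loss(params):
    return params['tol'] + params['guess']*0.01

best = fmin(toy_loss, space, algo=tpe.suggest, max_evals=20)
print(best)

Note that fmin returns the best point in the raw parameterization; for hp.choice entries, the reported value is the index of the selected option.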

from gekko import GEKKO
from hyperopt import fmin, tpe, hp
from hyperopt import STATUS_OK, STATUS_FAIL

# Define the search space for the hyperparameters
space = {'x1': hp.quniform('x1', 0, 10, 3),
         'x2': hp.quniform('x2', 0, 10, 3),
         'x3': hp.quniform('x3', 0, 10, 3)}

def objective(params):
    m = GEKKO(remote=False)
    x = m.Array(m.Var,3,lb=0)
    x1,x2,x3 = x
    # use the sampled hyperparameters as initial guesses
    x1.value = params['x1']
    x2.value = params['x2']
    x3.value = params['x3']
    m.Minimize(1000-x1**2-2*x2**2-x3**2-x1*x2-x1*x3)
    m.Equations([8*x1+14*x2+7*x3==56,
                 x1**2+x2**2+x3**2>=25])
    m.options.SOLVER = 1   # APOPT solver
    m.solve(disp=False,debug=False)
    obj = m.options.objfcnval
    # report success or failure status to hyperopt
    if m.options.APPSTATUS==1:
        s=STATUS_OK
    else:
        s=STATUS_FAIL
    m.cleanup()
    return {'loss':obj, 'status': s, 'x':x}

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
sol = objective(best)
print(f"Solution Status: {sol['status']}")
print(f"Objective: {sol['loss']:.2f}")
print(f"Solution: {sol['x']}")