## Global Optimization and Solver Tuning

The selection of solver parameters or initial guesses can be determined by another optimization algorithm to search in among categorical or continuous parameters. These solver parameters are called hyperparameters in Machine Learning. This tutorial is an introduction to hyperparameter optimization and the application for global optimization. A simple test optimization case with two local minima demonstrates the approach.

Examples of optimizer hyperparameters include initial guesses and solver options. The best values for these solver options and initial guesses are determined through a process called hyperparameter search to find the best combination of values. The objective may be to minimize the number of solver iterations or find a global solution.

#### Hyperparameter Search Methods

There are several common methods for hyperparameter optimization, each with its own strengths and weaknesses:

1️⃣ Grid search: A technique where a set of possible values for each hyperparameter is specified, and the algorithm will train and evaluate a model for each combination of hyperparameter values. Grid search can be computationally expensive, particularly when searching over many hyperparameters or a large range of values for each hyperparameter.

2️⃣ Random search: A technique where a random set of hyperparameter values is sampled from a predefined distribution for each hyperparameter. Random search is less computationally expensive than grid search, but still has a higher chance of finding a good set of hyperparameters than a simple grid search.

3️⃣ Bayesian optimization: A probabilistic model-based approach that uses Bayesian inference to model the function that maps the hyperparameters to the performance of the model. It uses the acquired knowledge to direct the search to the regions where it expects to find the best performance. Bayesian optimization cannot be parallelized and requires continuous hyperparameters (not categorical). It quickly converges to an optimal solution when there are few hyperparameters, but this efficiency degrades when the search dimension increases.

4️⃣ Genetic Algorithm: A evolutionary based algorithm that uses concepts of natural selection and genetics to optimize the parameters.

5️⃣ Gradient-based optimization: A method that uses gradient information to optimize the hyperparameters. This can be done using optimization algorithms such as gradient descent or Adam.

6️⃣ Hyperband: An algorithm that uses the idea of early stopping to decide when to stop training a model, which reduces the number of models that need to be trained and evaluated, making it faster than grid search or random search.

Which method to use depends on the problem, the complexity of the model, the computational resources available, and the desired trade-off between computation time and optimization quality.

#### Global Optimization

**Objective:** An optimization example has 2 local minima at *(0,0,8)* with objective *936.0* and *(7,0,0)* with objective *951.0*. Use gekko and a multi-start method to find the global solution.

$$\begin{align}\mathrm{minimize} \quad & 100-x_1^2-2x_2^2-x_3^2-x_1x_2-x_1x_3 \\ \mathrm{subject\;to}\quad & 8x_1+14x_2+7x_3=56 \\ & x_1^2+x_2^2+x_3^2\geq25 \\ & x_1,x_2,x_3\ge0 \end{align}$$

**Python GEKKO Local Solution**

The following script produces the local (not global) solution of *(7,0,0)* with objective *951.0*.

m = GEKKO(remote=False)

x = m.Array(m.Var,3,lb=0)

x1,x2,x3 = x

m.Minimize(1000-x1**2-2*x2**2-x3**2-x1*x2-x1*x3)

m.Equations([8*x1+14*x2+7*x3==56,

x1**2+x2**2+x3**2>=25])

m.solve(disp=False)

res=[print(f'x{i+1}: {xi.value[0]}') for i,xi in enumerate(x)]

print(f'Objective: {m.options.objfcnval:.2f}')

**Multi-Start with Parallel Threading**

This solution demonstrates the use of the *threading* module to perform a multi-start method with different initial conditions over a grid search. Multi-threading in Python is the ability of the interpreter to execute multiple threads (functions of a program) concurrently, in the same process as the main program. This allows for parallel execution of code, to improve the performance by utilizing multiple CPU cores or executing tasks simultaneously. The Python *threading* module creates and manages threads. A thread is created by instantiating an instance of the Thread class and then starting the thread using the *start()* method. Thanks to Erwin Kalvelagen for proposing a multi-start method.

import threading

import time, random

from gekko import GEKKO

class ThreadClass(threading.Thread):

def __init__(self, id, xg):

s = self

s.id = id

s.m = GEKKO(remote=False)

s.xg = xg

s.objective = float('NaN')

# initialize variables

s.m.x = s.m.Array(s.m.Var,3,lb=0)

for i in range(3):

s.m.x[i].value = xg[i]

s.m.x1,s.m.x2,s.m.x3 = s.m.x

# Equations

s.m.Equation(8*s.m.x1+14*s.m.x2+7*s.m.x3==56)

s.m.Equation(s.m.x1**2+s.m.x2**2+s.m.x3**2>=25)

# Objective

s.m.Minimize(1000-s.m.x1**2-2*s.m.x2**2-s.m.x3**2

-s.m.x1*s.m.x2-s.m.x1*s.m.x3)

# Set solver option

s.m.options.SOLVER = 1

threading.Thread.__init__(s)

def run(self):

print('Running application ' + str(self.id) + '\n')

self.m.solve(disp=False,debug=0) # solve

# Retrieve objective if successful

if (self.m.options.APPSTATUS==1):

self.objective = self.m.options.objfcnval

else:

self.objective = float('NaN')

self.m.cleanup()

# Optimize at mesh points

x1_ = np.arange(0.0, 10.0, 3.0)

x2_ = np.arange(0.0, 10.0, 3.0)

x3_ = np.arange(0.0, 10.0, 3.0)

x1,x2,x3 = np.meshgrid(x1_,x2_,x3_)

threads = [] # Array of threads

# Load applications

id = 0

for i in range(x1.shape[0]):

for j in range(x1.shape[1]):

for k in range(x1.shape[2]):

xg = (x1[i,j,k],x2[i,j,k],x3[i,j,k])

# Create new thread

threads.append(ThreadClass(id, xg))

# Increment ID

id += 1

# Run applications simultaneously as multiple threads

# Max number of threads to run at once

max_threads = 8

for t in threads:

while (threading.activeCount()>max_threads):

# check for additional threads every 0.01 sec

time.sleep(0.01)

# start the thread

t.start()

# Check for completion

mt = 10.0 # max time (sec)

it = 0.0 # time counter

st = 1.0 # sleep time (sec)

while (threading.active_count()>=3):

time.sleep(st)

it = it + st

print('Active Threads: ' + str(threading.active_count()))

# Terminate after max time

if (it>=mt):

break

# Initialize array for objective

obj = np.empty_like(x1)

# Retrieve objective results

id = 0

id_best = 0; obj_best = 1e10

for i in range(x1.shape[0]):

for j in range(x1.shape[1]):

for k in range(x1.shape[2]):

obj[i,j,k] = threads[id].objective

if obj[i,j,k]<obj_best:

id_best = id

obj_best = obj[i,j,k]

id += 1

print(obj)

print(f'Best objective {obj_best}')

print(f'Solution {threads[id_best].m.x}')

**Global Optimization with Hyperopt**

An alternative strategy is to use *hyperopt* to search for a global solution with gekko in the *fmin* function to find local evaluations with a multi-start method. *hyperopt* is a Python package for performing hyperparameter optimization with a variety of optimization algorithms including random search, Tree-structured Parzen Estimator (TPE), and adaptive TPE, as well as a simple and flexible way to define the search space for the hyperparameters. The main function of the package is *fmin* that is used to perform the optimization. The function *fmin* takes an objective function, the search space, the optimization algorithm, and the maximum number of evaluations as input. The objective function takes the hyperparameters as input and returns a dictionary with the loss (or negative of the performance metric) and the status of the optimization. In addition to the *fmin* function, *hyperopt* also provides a number of helper functions for defining the search space. For example, *hp.quniform* and *hp.qloguniform* for continuous variables, *hp.choice* for categorical variables and *hp.randint* for integers. *hyperopt* provides a built-in support for parallel execution and early stopping. It can be used in combination with most machine learning libraries, such as scikit-learn, TensorFlow, and PyTorch. It is a popular choice among data scientists and researchers for the ease of use and ability to find better solutions in a relatively small number of evaluations. There is an example of using hyperopt to find the best hyperparameters of a k-Nearest Neighbors classifier in the Machine Learning for Engineers course.

from hyperopt import fmin, tpe, hp

from hyperopt import STATUS_OK, STATUS_FAIL

# Define the search space for the hyperparameters

space = {'x1': hp.quniform('x1', 0, 10, 3),

'x2': hp.quniform('x2', 0, 10, 3),

'x3': hp.quniform('x3', 0, 10, 3)}

def objective(params):

m = GEKKO(remote=False)

x = m.Array(m.Var,3,lb=0)

x1,x2,x3 = x

x1.value = params['x1']

x2.value = params['x2']

x3.value = params['x3']

m.Minimize(1000-x1**2-2*x2**2-x3**2-x1*x2-x1*x3)

m.Equations([8*x1+14*x2+7*x3==56,

x1**2+x2**2+x3**2>=25])

m.options.SOLVER = 1

m.solve(disp=False,debug=False)

obj = m.options.objfcnval

if m.options.APPSTATUS==1:

s=STATUS_OK

else:

s=STATUS_FAIL

m.cleanup()

return {'loss':obj, 'status': s, 'x':x}

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)

sol = objective(best)

print(f"Solution Status: {sol['status']}")

print(f"Objective: {sol['loss']:.2f}")

print(f"Solution: {sol['x']}")