
## Data Regression with Python

#### Python Data Regression

Correlations from data are obtained by adjusting the parameters of a model to best fit the measured outcomes. The analysis may include statistics, data visualization, or other calculations that turn the data into relevant, actionable information. This tutorial demonstrates how to create linear, polynomial, or nonlinear functions that best approximate the data and how to analyze the result. Script files of the Python source code with sample data are available below.
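As a minimal sketch of the linear and polynomial cases, the sample data from this tutorial can be fit with NumPy's `polyfit`. The polynomial degree (3) and the helper name `sse` are choices made for illustration, not part of the original scripts.

```python
import numpy as np

# sample data used throughout this tutorial
xm = np.array([18.3447,79.86538,85.09788,10.5211,44.4556,
               69.567,8.960,86.197,66.857,16.875,
               52.2697,93.917,24.35,5.118,25.126,
               34.037,61.4445,42.704,39.531,29.988])
ym = np.array([5.072,7.1588,7.263,4.255,6.282,
               6.9118,4.044,7.2595,6.898,4.8744,
               6.5179,7.3434,5.4316,3.38,5.464,
               5.90,6.80,6.193,6.070,5.737])

# least-squares fits: degree 1 (linear) and degree 3 (cubic, illustrative choice)
p1 = np.polyfit(xm, ym, 1)
p3 = np.polyfit(xm, ym, 3)

# sum of squared errors for a set of polynomial coefficients
def sse(p):
    return np.sum((np.polyval(p, xm) - ym)**2)

print('Linear coefficients: ' + str(p1))
print('Linear SSE: ' + str(sse(p1)))
print('Cubic coefficients: ' + str(p3))
print('Cubic SSE: ' + str(sse(p3)))
```

The cubic fit cannot have a larger SSE than the linear fit on the same data because the linear model is a special case of the cubic one. A lower SSE alone does not show that the higher-order model is the better description of the data.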

#### Regression with Python (GEKKO or SciPy)
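Both scripts in this section fit the same three-parameter model and minimize the same objective. Read directly from the code, the regression problem can be written as follows, where $\hat{y}_i$ is the predicted value and $(x_i, y_i)$ are the 20 measured data points:

$$\hat{y}_i = a + \frac{b}{x_i} + c\,\ln(x_i)$$

$$\min_{a,b,c} \; \sum_{i=1}^{20} \left(\frac{\hat{y}_i - y_i}{y_i}\right)^2 \quad \text{subject to} \quad -100 \le c \le 100$$

The value that the scripts print as the "SSE Objective" is this sum of squared relative errors, not the sum of squared absolute errors.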

import numpy as np
from gekko import GEKKO

# load data
xm = np.array([18.3447,79.86538,85.09788,10.5211,44.4556, \
69.567,8.960,86.197,66.857,16.875, \
52.2697,93.917,24.35,5.118,25.126, \
34.037,61.4445,42.704,39.531,29.988])

ym = np.array([5.072,7.1588,7.263,4.255,6.282, \
6.9118,4.044,7.2595,6.898,4.8744, \
6.5179,7.3434,5.4316,3.38,5.464, \
5.90,6.80,6.193,6.070,5.737])

# define GEKKO model
m = GEKKO()
# parameters and variables
a = m.FV(value=0)
b = m.FV(value=0)
c = m.FV(value=0,lb=-100,ub=100)
x = m.Param(value=xm)
ymeas = m.Param(value=ym)
ypred = m.Var()
# parameter and variable options
a.STATUS = 1 # available to optimizer
b.STATUS = 1 #  to minimize objective
c.STATUS = 1
# equation
m.Equation(ypred == a + b/x + c*m.log(x))
# objective
m.Obj(((ypred-ymeas)/ymeas)**2)
# application options
m.options.IMODE = 2   # regression mode
# solve
m.solve() # remote=False for local solve

# show final objective
print('Final SSE Objective: ' + str(m.options.objfcnval))

# print solution
print('Solution')
print('a = ' + str(a.value[0]))
print('b = ' + str(b.value[0]))
print('c = ' + str(c.value[0]))

# plot solution
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(x.value,ymeas.value,'ro')
plt.plot(x.value,ypred.value,'bx')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['Measured','Predicted'],loc='best')
plt.savefig('results.png')
plt.show()

import numpy as np
from scipy.optimize import minimize

# load data
xm = np.array([18.3447,79.86538,85.09788,10.5211,44.4556, \
69.567,8.960,86.197,66.857,16.875, \
52.2697,93.917,24.35,5.118,25.126, \
34.037,61.4445,42.704,39.531,29.988])

ym = np.array([5.072,7.1588,7.263,4.255,6.282, \
6.9118,4.044,7.2595,6.898,4.8744, \
6.5179,7.3434,5.4316,3.38,5.464, \
5.90,6.80,6.193,6.070,5.737])

# calculate y
def calc_y(x):
    a,b,c = x
    y = a + b/xm + c*np.log(xm)
    return y

# define objective
def objective(x):
    return np.sum(((calc_y(x)-ym)/ym)**2)

# initial guesses
x0 = np.zeros(3)

# show initial objective
print('Initial SSE Objective: ' + str(objective(x0)))

# optimize
# bounds on variables
bnds100 = (-100.0, 100.0)
no_bnds = (-1.0e10, 1.0e10)
bnds = (no_bnds, no_bnds, bnds100)
solution = minimize(objective,x0,method='SLSQP',bounds=bnds)
x = solution.x
y = calc_y(x)

# show final objective
print('Final SSE Objective: ' + str(objective(x)))

# print solution
print('Solution')
print('a = ' + str(x[0]))
print('b = ' + str(x[1]))
print('c = ' + str(x[2]))

# plot solution
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(xm,ym,'ro')
plt.plot(xm,y,'bx')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['Measured','Predicted'],loc='best')
plt.savefig('results.png')
plt.show()
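One way to check the optimizer result and analyze the fit is sketched below. The model is linear in a, b, and c, so the same relative-error objective can be minimized directly with linear least squares, assuming the bound on c is not active at the solution. The names `A`, `Aw`, `p`, and `r2` are illustrative and do not appear in the original scripts; the coefficient of determination (R²) is an added statistic.

```python
import numpy as np

# sample data used throughout this tutorial
xm = np.array([18.3447,79.86538,85.09788,10.5211,44.4556,
               69.567,8.960,86.197,66.857,16.875,
               52.2697,93.917,24.35,5.118,25.126,
               34.037,61.4445,42.704,39.531,29.988])
ym = np.array([5.072,7.1588,7.263,4.255,6.282,
               6.9118,4.044,7.2595,6.898,4.8744,
               6.5179,7.3434,5.4316,3.38,5.464,
               5.90,6.80,6.193,6.070,5.737])

# design matrix: columns multiply a, b, and c in y = a + b/x + c*ln(x)
A = np.column_stack((np.ones_like(xm), 1.0/xm, np.log(xm)))

# dividing each row by ym turns sum(((A @ p - ym)/ym)**2)
# into an ordinary least-squares problem with a right-hand side of ones
Aw = A / ym[:, None]
p, *_ = np.linalg.lstsq(Aw, np.ones_like(ym), rcond=None)
a, b, c = p

# predicted values, relative-error objective, and R^2
y = A @ p
obj = np.sum(((y - ym)/ym)**2)
r2 = 1.0 - np.sum((ym - y)**2) / np.sum((ym - np.mean(ym))**2)

print('a = ' + str(a))
print('b = ' + str(b))
print('c = ' + str(c))
print('Objective: ' + str(obj))
print('R^2: ' + str(r2))
```

If the parameters from GEKKO or `minimize` differ noticeably from these values, the solver tolerance or the initial guess is a reasonable first thing to examine.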

#### Excel and MATLAB

This regression tutorial can also be completed with Excel and MATLAB. A multivariate nonlinear regression case with multiple factors is also available in Python, with example data for energy prices.
