Main

Data Regression with Python

Python Data Regression

A frequent activity for scientists and engineers is to develop correlations from data. By importing the data into Python, data analysis such as statistics, trending, or calculations can be made to synthesize the information into relevant and actionable information. This tutorial demonstrates how to create a linear or polynomial functions that best approximate the data trend, plot the results, and perform a basic statistical analysis. A script file of the Python source code with sample data is below.

Linear and Polynomial Regression

Nonlinear Regression with APM Python

Nonlinear Regression with Python SciPy Optimize

import numpy as np
from scipy.optimize import minimize

# load data
xm = np.array([18.3447,79.86538,85.09788,10.5211,44.4556, \
               69.567,8.960,86.197,66.857,16.875, \
               52.2697,93.917,24.35,5.118,25.126, \
               34.037,61.4445,42.704,39.531,29.988])

ym = np.array([5.072,7.1588,7.263,4.255,6.282, \
               6.9118,4.044,7.2595,6.898,4.8744, \
               6.5179,7.3434,5.4316,3.38,5.464, \
               5.90,6.80,6.193,6.070,5.737])

# calculate y
def calc_y(x):
    a = x[0]
    b = x[1]
    c = x[2]
    y = a + b/xm + c*np.log(xm)
    return y

# define objective
def objective(x):
    # calculate y
    y = calc_y(x)
    # calculate objective
    obj = 0.0
    for i in range(len(ym)):
        obj = obj + ((y[i]-ym[i])/ym[i])**2    
    # return result
    return obj

# initial guesses
x0 = np.zeros(3)
x0[0] = 0.0 # a
x0[1] = 0.0 # b
x0[2] = 0.0 # c

# show initial objective
print('Initial SSE Objective: ' + str(objective(x0)))

# optimize
# bounds on variables
bnds100 = (-100.0, 100.0)
no_bnds = (-1.0e10, 1.0e10)
bnds = (no_bnds, no_bnds, bnds100)
solution = minimize(objective,x0,method='SLSQP',bounds=bnds)
x = solution.x
y = calc_y(x)

# show final objective
print('Final SSE Objective: ' + str(objective(x)))

# print solution
print('Solution')
print('a = ' + str(x[0]))
print('b = ' + str(x[1]))
print('c = ' + str(x[2]))

# plot solution
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(xm,ym,'ro')
plt.plot(xm,y,'bx');
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['Measured','Predicted'],loc='best')
plt.savefig('results.png')
plt.show()

This regression tutorial can also be completed with Excel and Matlab. Click on the appropriate link for additional information.


comments powered by Disqus