Regression Statistics with Python

Main.PythonRegressionStatistics History

Hide minor edits - Show changes to output

October 31, 2022, at 07:26 PM by 10.35.117.248 -

Changed line 46 from:

url = '~~https~~://apmonitor.com/che263/uploads/Main/stats_data.txt'

to:

url = 'http://apmonitor.com/che263/uploads/Main/stats_data.txt'

Changed line 163 from:

url = '~~https~~://apmonitor.com/che263/uploads/Main/stats_data.txt'

to:

url = 'http://apmonitor.com/che263/uploads/Main/stats_data.txt'

Changed line 271 from:

url = '~~https~~://apmonitor.com/che263/uploads/Main/stats_data.txt'

to:

url = 'http://apmonitor.com/che263/uploads/Main/stats_data.txt'

Changed line 426 from:

url = '~~https~~://apmonitor.com/che263/uploads/Main/stats_data.txt'

to:

url = 'http://apmonitor.com/che263/uploads/Main/stats_data.txt'

Restore

June 21, 2020, at 04:17 AM by 136.36.211.159 -

Deleted lines 564-582:

----

(:html:)
<div id="disqus_thread"></div>
<script type="text/javascript">
/* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
var disqus_shortname = 'apmonitor'; // required: replace example with your forum shortname

/* * * DON'T EDIT BELOW THIS LINE * * */
(function() {
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
dsq.src = 'https://' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
<a href="https://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
(:htmlend:)

Restore

February 14, 2019, at 10:23 PM by 10.37.111.106 -

Changed lines 37-38 from:

~~import pip~~
pip.main(['install','uncertainties'])

to:

try:
from pip import main as pipmain
except:
from pip._internal import main as pipmain
pipmain(['install','uncertainties'])

Changed lines 154-155 from:

~~import pip~~
pip.main(['install','uncertainties'])

to:

try:
from pip import main as pipmain
except:
from pip._internal import main as pipmain
pipmain(['install','uncertainties'])

Restore

March 21, 2018, at 11:53 PM by 10.37.35.33 -

Changed lines 439-440 from:

~~s_c = np.sqrt(pcov[2,2])~~

to:

# #####################################
# Single Variable Confidence Interval
# #####################################

Added lines 455-491:

# #####################################
# Linear Joint Confidence Interval
# thanks to Trent Okeson
# #####################################
eigenval, eigenvec = np.linalg.eig(pcov)
max_eigenval = np.argmax(eigenval)
largest_eigenvec = eigenvec[:, max_eigenval]
largest_eigenval = eigenval[max_eigenval]

if max_eigenval == 1:
smallest_eigenval = eigenval[0]
smallest_eigenvec = eigenvec[:, 0]
else:
smallest_eigenval = eigenval[1]
smallest_eigenvec = eigenvec[:, 1]

angle = np.arctan2(largest_eigenvec[1], largest_eigenvec[0])
if angle < 0:
angle += 2*np.pi

chisquare_val = 2.4477
theta_grid = np.linspace(0, 2*np.pi)
phi = angle
a_rot = chisquare_val*np.sqrt(largest_eigenval)
b_rot = chisquare_val*np.sqrt(smallest_eigenval)

ellipse_x_r = a_rot*np.cos(theta_grid)
ellipse_y_r = b_rot*np.sin(theta_grid)

R = [[np.cos(phi), np.sin(phi)], [-np.sin(phi), np.cos(phi)]]

r_ellipse = np.stack((ellipse_x_r, ellipse_y_r)).T.dot(R)
final_ellipse = r_ellipse + popt[None, 0:2]

# #####################################
# Nonlinear Joint Confidence Interval
# #####################################

Changed lines 494-496 from:

m = ~~400~~

i1 = np.linspace(a3L,a3U*2,m)

to:

m = 200

i1 = np.linspace(a3L,a3U*2.0,m)

Changed lines 526-527 from:

plt.xlabel('a')
plt.ylabel('b')

to:

plt.xlabel('Slope (a)')
plt.ylabel('Intercept (b)')

Added lines 538-539:

plt.plot(final_ellipse[:, 0], final_ellipse[:, 1], 'k-',\
label='Linear Joint Confidence Region')

Changed lines 541-542 from:

LineWidth=3,~~label='Joint~~ Confidence Region')

to:

LineWidth=4,\
label='Nonlinear Joint Confidence Region')

Changed line 546 from:

plt.title('Parameter Statistics (y = a exp(b*x) + c)')

to:

plt.title('Parameter Statistics (y = a x + b)')

Restore

March 21, 2018, at 11:44 PM by 10.37.35.33 -

Added lines 284-286:

# #####################################
# Single Variable Confidence Interval
# #####################################

Added lines 299-335:

# #####################################
# Linear Joint Confidence Interval
# thanks to Trent Okeson
# #####################################
eigenval, eigenvec = np.linalg.eig(pcov)
max_eigenval = np.argmax(eigenval)
largest_eigenvec = eigenvec[:, max_eigenval]
largest_eigenval = eigenval[max_eigenval]

if max_eigenval == 1:
smallest_eigenval = eigenval[0]
smallest_eigenvec = eigenvec[:, 0]
else:
smallest_eigenval = eigenval[1]
smallest_eigenvec = eigenvec[:, 1]

angle = np.arctan2(largest_eigenvec[1], largest_eigenvec[0])
if angle < 0:
angle += 2*np.pi

chisquare_val = 2.4477
theta_grid = np.linspace(0, 2*np.pi)
phi = angle
a_rot = chisquare_val*np.sqrt(largest_eigenval)
b_rot = chisquare_val*np.sqrt(smallest_eigenval)

ellipse_x_r = a_rot*np.cos(theta_grid)
ellipse_y_r = b_rot*np.sin(theta_grid)

R = [[np.cos(phi), np.sin(phi)], [-np.sin(phi), np.cos(phi)]]

r_ellipse = np.stack((ellipse_x_r, ellipse_y_r)).T.dot(R)
final_ellipse = r_ellipse + popt[None, :]

# #####################################
# Nonlinear Joint Confidence Interval
# #####################################

Added lines 382-383:

plt.plot(final_ellipse[:, 0], final_ellipse[:, 1], 'k--',\
label='Linear Joint Confidence Region')

Changed lines 385-386 from:

LineWidth=3,~~label='Joint~~ Confidence Region')

to:

LineWidth=4,\
label='Nonlinear Joint Confidence Region')

Restore

March 17, 2018, at 03:38 PM by 45.56.3.173 -

Added lines 247-250:

(:html:)
<iframe width="560" height="315" src="https://www.youtube.com/embed/qOV8QaOzWXA" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
(:htmlend:)

Restore

March 17, 2018, at 05:12 AM by 45.56.3.173 -

Changed lines 244-245 from:

!!!! Joint Confidence ~~Regions~~

to:

!!!! Joint Confidence Region - Linear Regression

Added lines 357-465:

!!!! Joint Confidence Region - Nonlinear Regression

%width=550px%Attach:sse_contour_nonlinear.png

(:toggle hide conf_int_nonlinear button show="Joint Confidence Solution":)
(:div id=conf_int_nonlinear:)
(:source lang=python:)
import numpy as np
import scipy as sp
from scipy import stats
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import pandas as pd

# import data
url = 'https://apmonitor.com/che263/uploads/Main/stats_data.txt'
data = pd.read_csv(url)
x = data['x'].values
y = data['y'].values

def f(x, a, b, c):
return a * np.exp(b*x) + c

# perform regression
popt, pcov = curve_fit(f, x, y)

# retrieve parameter values
a = popt[0]
b = popt[1]
c = popt[2]

# calculate parameter stdev
s_a = np.sqrt(pcov[0,0])
s_b = np.sqrt(pcov[1,1])
s_c = np.sqrt(pcov[2,2])

# calculate 95% confidence interval
aL = a-1.96*s_a
aU = a+1.96*s_a
bL = b-1.96*s_b
bU = b+1.96*s_b

# calculate 3sigma interval
a3L = a-3.0*s_a
a3U = a+3.0*s_a
b3L = b-3.0*s_b
b3U = b+3.0*s_b

# Generate contour plot of SSE ratio vs. Parameters
# meshgrid is +/- change in the objective value
m = 400

i1 = np.linspace(a3L,a3U*2,m)
i2 = np.linspace(b3L,b3U,m)
a_grid, b_grid = np.meshgrid(i1, i2)
n = len(y) # number of data points
p = 2 # number of parameters

# sum of squared errors
sse = np.empty((m,m))
for i in range(m):
for j in range(m):
at = a_grid[i,j]
bt = b_grid[i,j]
sse[i,j] = np.sum((y-f(x,at,bt,c))**2)

# normalize to the optimal solution
best_sse = np.sum((y-f(x,a,b,c))**2)
fsse = (sse - best_sse) / best_sse

# compute f-statistic for the f-test
alpha = 0.05 # alpha, confidence
# alpha=0.05 is 95% confidence
fstat = sp.stats.f.isf(alpha,p,(n-p))
flim = fstat * p / (n-p)
obj_lim = flim * best_sse + best_sse

# Create a contour plot
plt.figure()
CS = plt.contour(a_grid,b_grid,sse,[1,2,5,10],\
colors='gray')
plt.clabel(CS, inline=1, fontsize=10)
plt.xlabel('a')
plt.ylabel('b')
# solid line to show confidence region
CS = plt.contour(a_grid,b_grid,sse,[obj_lim],\
colors='orange',linewidths=[2.0])
plt.clabel(CS, inline=1, fontsize=10)

# 2 dummy plots (gray,orange) for legend
plt.plot([aL,aL],[bL,bL],c='gray',\
LineWidth=3,label='Objective')
plt.plot([aL,aU,aU,aL,aL],[bL,bL,bU,bU,bL],c='blue',\
LineWidth=3,label='Parameter Confidence Region')
plt.plot([aL,aL],[bL,bL],c='orange',\
LineWidth=3,label='Joint Confidence Region')
plt.scatter(a,b,s=10,c='red',marker='o',\
label='Optimal Solution')

plt.title('Parameter Statistics (y = a exp(b*x) + c)')
plt.legend(loc=1)

# Save the figure as a PNG
plt.savefig('contour.png')
plt.show()
(:sourceend:)
(:divend:)

Restore

March 17, 2018, at 03:51 AM by 45.56.3.173 -

Changed line 250 from:

(:toggle hide conf_int button show="~~Nonlinear Regression~~ Solution":)

to:

(:toggle hide conf_int button show="Joint Confidence Solution":)

Changed lines 263-265 from:

x = data['x'].values~~[0:20]~~
y = data['y'].values~~[0:20]~~

to:

x = data['x'].values
y = data['y'].values

Changed line 323 from:

CS = plt.contour(a_grid,b_grid,sse,[5,20,50,~~100,200,500~~],\

to:

CS = plt.contour(a_grid,b_grid,sse,[1,2,5,10],\

Restore

March 17, 2018, at 03:00 AM by 45.56.3.173 -

Changed lines 250-251 from:

(:toggle hide ~~nonlinear~~ button show="Nonlinear Regression Solution":)
(:div id=~~nonlinear~~:)

to:

(:toggle hide conf_int button show="Nonlinear Regression Solution":)
(:div id=conf_int:)

Restore

March 17, 2018, at 03:00 AM by 45.56.3.173 -

Added lines 249-350:

(:toggle hide nonlinear button show="Nonlinear Regression Solution":)
(:div id=nonlinear:)
(:source lang=python:)
import numpy as np
import scipy as sp
from scipy import stats
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import pandas as pd

# import data
url = 'https://apmonitor.com/che263/uploads/Main/stats_data.txt'
data = pd.read_csv(url)
x = data['x'].values[0:20]
y = data['y'].values[0:20]

def f(x, a, b):
return a * x + b

# perform regression
popt, pcov = curve_fit(f, x, y)

# retrieve parameter values
a = popt[0]
b = popt[1]

# calculate parameter stdev
s_a = np.sqrt(pcov[0,0])
s_b = np.sqrt(pcov[1,1])

# calculate 95% confidence interval
aL = a-1.96*s_a
aU = a+1.96*s_a
bL = b-1.96*s_b
bU = b+1.96*s_b

# calculate 3sigma interval
a3L = a-3.0*s_a
a3U = a+3.0*s_a
b3L = b-3.0*s_b
b3U = b+3.0*s_b

# Generate contour plot of SSE ratio vs. Parameters
# meshgrid is +/- change in the objective value
m = 200

i1 = np.linspace(a3L,a3U,m)
i2 = np.linspace(b3L,b3U,m)
a_grid, b_grid = np.meshgrid(i1, i2)
n = len(y) # number of data points
p = 2 # number of parameters

# sum of squared errors
sse = np.empty((m,m))
for i in range(m):
for j in range(m):
at = a_grid[i,j]
bt = b_grid[i,j]
sse[i,j] = np.sum((y-f(x,at,bt))**2)

# normalize to the optimal solution
best_sse = np.sum((y-f(x,a,b))**2)
fsse = (sse - best_sse) / best_sse

# compute f-statistic for the f-test
alpha = 0.05 # alpha, confidence
# alpha=0.05 is 95% confidence
fstat = sp.stats.f.isf(alpha,p,(n-p))
flim = fstat * p / (n-p)
obj_lim = flim * best_sse + best_sse

# Create a contour plot
plt.figure()
CS = plt.contour(a_grid,b_grid,sse,[5,20,50,100,200,500],\
colors='gray')
plt.clabel(CS, inline=1, fontsize=10)
plt.xlabel('Slope (a)')
plt.ylabel('Intercept (b)')
# solid line to show confidence region
CS = plt.contour(a_grid,b_grid,sse,[obj_lim],\
colors='orange',linewidths=[2.0])
plt.clabel(CS, inline=1, fontsize=10)

# 2 dummy plots (gray,orange) for legend
plt.plot([aL,aL],[bL,bL],c='gray',\
LineWidth=3,label='Objective')
plt.plot([aL,aU,aU,aL,aL],[bL,bL,bU,bU,bL],c='blue',\
LineWidth=3,label='Parameter Confidence Region')
plt.plot([aL,aL],[bL,bL],c='orange',\
LineWidth=3,label='Joint Confidence Region')
plt.scatter(a,b,s=10,c='red',marker='o',\
label='Optimal Solution')

plt.title('Parameter Statistics (y = a x + b)')
plt.legend(loc=1)

# Save the figure as a PNG
plt.savefig('contour.png')
plt.show()
(:sourceend:)
(:divend:)

Restore

March 17, 2018, at 01:43 AM by 45.56.3.173 -

Changed lines 252-254 from:

{$\frac{SSE(\theta)-SSE(\theta*)}{SSE(\theta*)}\le \frac{p}{n-p} F\left(\alpha,p,n-p\right)$}

with p=2 (number of parameters), n=100 (number of measurements), {`\theta`}=[a,b] (parameters), {`\~~theta*`~~} as the optimal parameters, SSE as the sum of squared errors, and the F statistic that has 3 arguments (alpha=confidence level, degrees of freedom 1, and degrees of freedom 2). For many problems, this creates a multi-dimensional nonlinear confidence region. In the case of 2 parameters, the nonlinear confidence region is a 2-dimensional space.

to:

{$\frac{SSE(\theta)-SSE(\theta^*)}{SSE(\theta^*)}\le \frac{p}{n-p} F\left(\alpha,p,n-p\right)$}

with p=2 (number of parameters), n=100 (number of measurements), {`\theta`}=[a,b] (parameters), {`\theta`}'^*^' as the optimal parameters, SSE as the sum of squared errors, and the F statistic that has 3 arguments (alpha=confidence level, degrees of freedom 1, and degrees of freedom 2). For many problems, this creates a multi-dimensional nonlinear confidence region. In the case of 2 parameters, the nonlinear confidence region is a 2-dimensional space.

Restore

March 16, 2018, at 08:01 PM by 45.56.3.173 -

Added lines 6-9:

(:html:)
<iframe width="560" height="315" src="https://www.youtube.com/embed/r4pgGD1kpYM" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
(:htmlend:)

Restore

March 16, 2018, at 07:38 PM by 45.56.3.173 -

Added lines 130-131:

%width=550px%Attach:regression_nonlinear_python.png

Restore

March 16, 2018, at 07:37 PM by 45.56.3.173 -

Changed lines 131-132 from:

(:toggle hide ~~linear~~ button show="~~Linear~~ Regression Solution":)
(:div id=~~linear~~:)

to:

(:toggle hide nonlinear button show="Nonlinear Regression Solution":)
(:div id=nonlinear:)

Restore

March 16, 2018, at 07:37 PM by 45.56.3.173 -

Deleted line 23:

~~import uncertainties as unc~~

Added line 31:

import uncertainties as unc

Changed lines 36-37 from:

to:

import uncertainties as unc

Added lines 130-236:

(:toggle hide linear button show="Linear Regression Solution":)
(:div id=linear:)
(:source lang=python:)
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
from scipy import stats
import pandas as pd

# pip install uncertainties, if needed
try:
import uncertainties.unumpy as unp
import uncertainties as unc
except:
import pip
pip.main(['install','uncertainties'])
import uncertainties.unumpy as unp
import uncertainties as unc

# import data
url = 'https://apmonitor.com/che263/uploads/Main/stats_data.txt'
data = pd.read_csv(url)
x = data['x'].values
y = data['y'].values
n = len(y)

def f(x, a, b, c):
return a * np.exp(b*x) + c

popt, pcov = curve_fit(f, x, y)

# retrieve parameter values
a = popt[0]
b = popt[1]
c = popt[2]
print('Optimal Values')
print('a: ' + str(a))
print('b: ' + str(b))
print('c: ' + str(c))

# compute r^2
r2 = 1.0-(sum((y-f(x,a,b,c))**2)/((n-1.0)*np.var(y,ddof=1)))
print('R^2: ' + str(r2))

# calculate parameter confidence interval
a,b,c = unc.correlated_values(popt, pcov)
print('Uncertainty')
print('a: ' + str(a))
print('b: ' + str(b))
print('c: ' + str(c))

# plot data
plt.scatter(x, y, s=3, label='Data')

# calculate regression confidence interval
px = np.linspace(14, 24, 100)
py = a*unp.exp(b*px)+c
nom = unp.nominal_values(py)
std = unp.std_devs(py)

def predband(x, xd, yd, p, func, conf=0.95):
# x = requested points
# xd = x data
# yd = y data
# p = parameters
# func = function name
alpha = 1.0 - conf # significance
N = xd.size # data sample size
var_n = len(p) # number of parameters
# Quantile of Student's t distribution for p=(1-alpha/2)
q = stats.t.ppf(1.0 - alpha / 2.0, N - var_n)
# Stdev of an individual measurement
se = np.sqrt(1. / (N - var_n) * \
np.sum((yd - func(xd, *p)) ** 2))
# Auxiliary definitions
sx = (x - xd.mean()) ** 2
sxd = np.sum((xd - xd.mean()) ** 2)
# Predicted values (best-fit model)
yp = func(x, *p)
# Prediction band
dy = q * se * np.sqrt(1.0+ (1.0/N) + (sx/sxd))
# Upper & lower prediction bands.
lpb, upb = yp - dy, yp + dy
return lpb, upb

lpb, upb = predband(px, x, y, popt, f, conf=0.95)

# plot the regression
plt.plot(px, nom, c='black', label='y=a exp(b x) + c')

# uncertainty lines (95% confidence)
plt.plot(px, nom - 1.96 * std, c='orange',\
label='95% Confidence Region')
plt.plot(px, nom + 1.96 * std, c='orange')
# prediction band (95% confidence)
plt.plot(px, lpb, 'k--',label='95% Prediction Band')
plt.plot(px, upb, 'k--')
plt.ylabel('y')
plt.xlabel('x')
plt.legend(loc='best')

# save and show figure
plt.savefig('regression.png')
plt.show()
(:sourceend:)
(:divend:)

Restore

March 16, 2018, at 07:06 PM by 45.56.3.173 -

Changed line 5 from:

~~Data regression~~ is ~~adjusts parameter values to best fit the data. Parameter uncertainty and the predicted outcome~~ uncertainty ~~is important for qualifying the degree of confidence in~~ the ~~solution. This tutorial shows how to perform a statistical analysis with~~ Python.

to:

Regression is an optimization method for adjusting parameter values so that a correlation best fits data. Parameter uncertainty and the predicted uncertainty is important for qualifying the confidence in the solution. This tutorial shows how to perform a statistical analysis with Python for both linear and nonlinear regression.

Restore

March 16, 2018, at 06:14 PM by 45.56.3.173 -

Added lines 6-7:

!!!! Linear Regression

Restore

March 16, 2018, at 06:11 PM by 45.56.3.173 -

Changed lines 65-66 from:

plt.scatter(x, y, s=3)

to:

plt.scatter(x, y, s=3, label='Data')

Changed lines 101-102 from:

plt.plot(px, nom, c='black')

to:

plt.plot(px, nom, c='black', label='y=a x + b')

Changed lines 104-105 from:

plt.plot(px, nom - 1.96 * std, c='orange')

to:

plt.plot(px, nom - 1.96 * std, c='orange',\
label='95% Confidence Region')

Changed line 108 from:

plt.plot(px, lpb, 'k--')

to:

plt.plot(px, lpb, 'k--',label='95% Prediction Band')

Added line 112:

plt.legend(loc='best')

Restore

March 16, 2018, at 05:35 PM by 45.56.3.173 -

Changed line 37 from:

data = pd.read_csv(url~~+file~~)

to:

data = pd.read_csv(url)

Restore

March 16, 2018, at 05:32 PM by 45.56.3.173 -

Changed lines 23-24 from:

import matplotlib.pyplot as plt
~~import uncertainties.unumpy as unp~~

to:

import matplotlib.pyplot as plt

Added lines 26-33:

# pip install uncertainties, if needed
try:
import uncertainties.unumpy as unp
except:
import pip
pip.main(['install','uncertainties'])
import uncertainties.unumpy as unp

Restore

March 16, 2018, at 05:30 PM by 45.56.3.173 -

Changed line 30 from:

data = pd.read_csv(url)

to:

data = pd.read_csv(url+file)

Added lines 102-103:

plt.ylabel('y')
plt.xlabel('x')

Restore

March 16, 2018, at 05:28 PM by 45.56.3.173 -

Changed lines 29-31 from:

url = 'https://apmonitor.com/che263/uploads/Main/'
~~file~~ = ~~'stats_data~~.txt'
data = pd.read_csv(url~~+file~~)

to:

url = 'https://apmonitor.com/che263/uploads/Main/stats_data.txt'
data = pd.read_csv(url)

Restore

March 16, 2018, at 05:28 PM by 45.56.3.173 -

Changed lines 20-106 from:

to:

import numpy as np
from scipy.optimize import curve_fit
import uncertainties as unc
import matplotlib.pyplot as plt
import uncertainties.unumpy as unp
from scipy import stats
import pandas as pd

# import data
url = 'https://apmonitor.com/che263/uploads/Main/'
file = 'stats_data.txt'
data = pd.read_csv(url+file)
x = data['x'].values
y = data['y'].values
n = len(y)

def f(x, a, b):
return a * x + b

popt, pcov = curve_fit(f, x, y)

# retrieve parameter values
a = popt[0]
b = popt[1]
print('Optimal Values')
print('a: ' + str(a))
print('b: ' + str(b))

# compute r^2
r2 = 1.0-(sum((y-f(x,a,b))**2)/((n-1.0)*np.var(y,ddof=1)))
print('R^2: ' + str(r2))

# calculate parameter confidence interval
a,b = unc.correlated_values(popt, pcov)
print('Uncertainty')
print('a: ' + str(a))
print('b: ' + str(b))

# plot data
plt.scatter(x, y, s=3)

# calculate regression confidence interval
px = np.linspace(14, 24, 100)
py = a*px+b
nom = unp.nominal_values(py)
std = unp.std_devs(py)

def predband(x, xd, yd, p, func, conf=0.95):
# x = requested points
# xd = x data
# yd = y data
# p = parameters
# func = function name
alpha = 1.0 - conf # significance
N = xd.size # data sample size
var_n = len(p) # number of parameters
# Quantile of Student's t distribution for p=(1-alpha/2)
q = stats.t.ppf(1.0 - alpha / 2.0, N - var_n)
# Stdev of an individual measurement
se = np.sqrt(1. / (N - var_n) * \
np.sum((yd - func(xd, *p)) ** 2))
# Auxiliary definitions
sx = (x - xd.mean()) ** 2
sxd = np.sum((xd - xd.mean()) ** 2)
# Predicted values (best-fit model)
yp = func(x, *p)
# Prediction band
dy = q * se * np.sqrt(1.0+ (1.0/N) + (sx/sxd))
# Upper & lower prediction bands.
lpb, upb = yp - dy, yp + dy
return lpb, upb

lpb, upb = predband(px, x, y, popt, f, conf=0.95)

# plot the regression
plt.plot(px, nom, c='black')

# uncertainty lines (95% confidence)
plt.plot(px, nom - 1.96 * std, c='orange')
plt.plot(px, nom + 1.96 * std, c='orange')
# prediction band (95% confidence)
plt.plot(px, lpb, 'k--')
plt.plot(px, upb, 'k--')

# save and show figure
plt.savefig('regression.png')
plt.show()

Restore

March 16, 2018, at 04:25 PM by 45.56.3.173 -

Changed lines 32-34 from:

!!!! ~~Parameter~~ Confidence

~~The nonlinear~~ confidence ~~interval~~ is shown by producing a contour plot of the SSE objective function with variations in the two parameters. The optimal solution is shown at the center of the plot and the objective function becomes worse (higher) away from the optimal solution.

to:

!!!! Joint Confidence Regions

The joint confidence region is shown by producing a contour plot of the SSE objective function with variations in the two parameters. The optimal solution is shown at the center of the plot and the objective function becomes worse (higher) away from the optimal solution.

Restore

March 16, 2018, at 04:24 PM by 45.56.3.173 -

Changed lines 28-29 from:

{$y = a \exp{b\, x} + c$}

to:

{$y = a \exp{\left(b\, x\right)} + c$}

Changed lines 40-42 from:

{$\frac{SSE(\theta)-SSE(\theta^*)}{SSE(\theta^*)}\le \frac{p}{n-p} F\left(\alpha,p,n-p\right)$}

with p=2 (number of parameters), n=100 (number of measurements), {`\theta`}=[a,b] (parameters), {`\theta^*`} as the optimal parameters, SSE as the sum of squared errors, and the F statistic that has 3 arguments (alpha=confidence level, degrees of freedom 1, and degrees of freedom 2). For many problems, this creates a multi-dimensional nonlinear confidence region. In the case of 2 parameters, the nonlinear confidence region is a 2-dimensional space.

to:

{$\frac{SSE(\theta)-SSE(\theta*)}{SSE(\theta*)}\le \frac{p}{n-p} F\left(\alpha,p,n-p\right)$}

with p=2 (number of parameters), n=100 (number of measurements), {`\theta`}=[a,b] (parameters), {`\theta*`} as the optimal parameters, SSE as the sum of squared errors, and the F statistic that has 3 arguments (alpha=confidence level, degrees of freedom 1, and degrees of freedom 2). For many problems, this creates a multi-dimensional nonlinear confidence region. In the case of 2 parameters, the nonlinear confidence region is a 2-dimensional space.

Restore

March 16, 2018, at 04:23 PM by 45.56.3.173 -

Changed lines 7-8 from:

~~Use the data at the link below to create~~ a ~~linear~~ and ~~nonlinear regression~~.

to:

Create a linear model with unknown coefficients ''a'' (slope) and ''b'' (intercept). Fit the model to the data by minimizing the sum of squared errors between the predicted and measured ''y'' values.

{$y = a \, x + b$}

Show the linear regression with 95% confidence bands and 95% prediction bands. The confidence band is the confidence region for the correlation equation. The prediction band is the region that contains approximately 95% of the measurements. If another measurement is taken, there is a 95% chance that it falls within the prediction band. Use the following data set that is a table of comma separated values.

Changed lines 17-18 from:

(:toggle hide ~~stats~~ button show="~~Python~~ Solution":)
(:div id=~~stats~~:)

to:

(:toggle hide linear button show="Linear Regression Solution":)
(:div id=linear:)

Changed lines 24-32 from:

!!!! Nonlinear ~~Parameter Confidence Region~~

to:

!!!! Nonlinear Regression

Create a nonlinear regression with unknown coefficients ''a'', ''b'', and ''c''.

{$y = a \exp{b\, x} + c$}

Similar to the linear regression, fit the model to the data by minimizing the sum of squared errors between the predicted and measured ''y'' values. Show the nonlinear regression with 95% confidence bands and 95% prediction bands.

!!!! Parameter Confidence

Restore

March 16, 2018, at 03:47 PM by 45.56.3.173 -

Changed line 24 from:

-> Attach:sse_contour.png

to:

%width=550px%Attach:sse_contour.png

Restore

March 16, 2018, at 03:46 PM by 45.56.3.173 -

Added lines 19-32:

!!!! Nonlinear Parameter Confidence Region

The nonlinear confidence interval is shown by producing a contour plot of the SSE objective function with variations in the two parameters. The optimal solution is shown at the center of the plot and the objective function becomes worse (higher) away from the optimal solution.

-> Attach:sse_contour.png

The parameter confidence region is determined with an F-test that specifies an upper limit to the deviation from the optimal solution

{$\frac{SSE(\theta)-SSE(\theta^*)}{SSE(\theta^*)}\le \frac{p}{n-p} F\left(\alpha,p,n-p\right)$}

with p=2 (number of parameters), n=100 (number of measurements), {`\theta`}=[a,b] (parameters), {`\theta^*`} as the optimal parameters, SSE as the sum of squared errors, and the F statistic that has 3 arguments (alpha=confidence level, degrees of freedom 1, and degrees of freedom 2). For many problems, this creates a multi-dimensional nonlinear confidence region. In the case of 2 parameters, the nonlinear confidence region is a 2-dimensional space.

----

Restore

March 16, 2018, at 03:01 PM by 45.56.3.173 -

Added lines 1-39:

(:title Regression Statistics with Python:)
(:keywords data regression, uncertainty, Python, unumpy, linear regression, prediction band, confidence interval:)
(:description Statistics for confidence interval and prediction band from a linear or nonlinear regression. The uncertainties package is used in Python to generate the confidence intervals.:)

Data regression is adjusts parameter values to best fit the data. Parameter uncertainty and the predicted outcome uncertainty is important for qualifying the degree of confidence in the solution. This tutorial shows how to perform a statistical analysis with Python.

Use the data at the link below to create a linear and nonlinear regression.

Attach:download.png[[Attach:stats_data.txt|View Data for Regression and Statistical Analysis]]

%width=550px%Attach:regression_statistics_python.png

(:toggle hide stats button show="Python Solution":)
(:div id=stats:)
(:source lang=python:)

(:sourceend:)
(:divend:)

Linear and nonlinear regression tutorials can also available in [[Main/PythonDataRegression|Python]], [[Main/ExcelDataRegression|Excel]], and [[Main/MatlabDataRegression|Matlab]]. A [[https://apmonitor.com/me575/index.php/Main/NonlinearRegression|multivariate nonlinear regression case with multiple factors]] is available with example data for energy prices in Python. Click on the appropriate link for additional information.

----

(:html:)
<div id="disqus_thread"></div>
<script type="text/javascript">
/* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
var disqus_shortname = 'apmonitor'; // required: replace example with your forum shortname

/* * * DON'T EDIT BELOW THIS LINE * * */
(function() {
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
dsq.src = 'https://' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
<a href="https://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
(:htmlend:)

Restore

Learn Programming

Regression Statistics with Python

Main.PythonRegressionStatistics History

Search

Options: