Regression is an optimization method for adjusting parameter values so that a correlation best fits data. Parameter uncertainty and the predicted uncertainty is important for qualifying the confidence in the solution. This tutorial shows how to perform a statistical analysis with Python for both linear and nonlinear regression.
Create a linear model with unknown coefficients a (slope) and b (intercept). Fit the model to the data by minimizing the sum of squared errors between the predicted and measured y values.
$$y = a \, x + b$$
Show the linear regression with 95% confidence bands and 95% prediction bands. The confidence band is the confidence region for the correlation equation. The prediction band is the region that contains approximately 95% of the measurements. If another measurement is taken, there is a 95% chance that it falls within the prediction band. Use the following data set that is a table of comma separated values.
Create a nonlinear regression with unknown coefficients a, b, and c.
$$y = a \exp{\left(b\, x\right)} + c$$
Similar to the linear regression, fit the model to the data by minimizing the sum of squared errors between the predicted and measured y values. Show the nonlinear regression with 95% confidence bands and 95% prediction bands.
The joint confidence region is shown by producing a contour plot of the SSE objective function with variations in the two parameters. The optimal solution is shown at the center of the plot and the objective function becomes worse (higher) away from the optimal solution.
The parameter confidence region is determined with an F-test that specifies an upper limit to the deviation from the optimal solution
$$\frac{SSE(\theta)-SSE(\theta^*)}{SSE(\theta^*)}\le \frac{p}{n-p} F\left(\alpha,p,n-p\right)$$
with p=2 (number of parameters), n=100 (number of measurements), `\theta`=[a,b] (parameters), `\theta`* as the optimal parameters, SSE as the sum of squared errors, and the F statistic that has 3 arguments (alpha=confidence level, degrees of freedom 1, and degrees of freedom 2). For many problems, this creates a multi-dimensional nonlinear confidence region. In the case of 2 parameters, the nonlinear confidence region is a 2-dimensional space.
Linear and nonlinear regression tutorials can also available in Python, Excel, and Matlab. A multivariate nonlinear regression case with multiple factors is available with example data for energy prices in Python. Click on the appropriate link for additional information.