(:title Linear Regression:) (:keywords Linear, Regression, Factors, Univariate, Multivariate, Optimization, Constraint:) (:description Perform univariate and multivariate linear regression with Python with and without parameter constraints.:)

Linear Regression
(:html:) <iframe width="560" height="315" src="https://www.youtube.com/embed/U05aLeWQLSc" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> (:htmlend:)
Regression is the method of adjusting parameters in a model to minimize the difference between the predicted output and the measured output. The predicted output is calculated from a measured input (univariate), multiple inputs and a single output (multiple linear regression), or multiple inputs and outputs (multivariate linear regression).

- linear regression: x and y are scalars
- multiple linear regression: x is a vector, y is a scalar response
- multivariate linear regression: x is a vector, y is a vector response
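A minimal sketch of the corresponding array shapes (the dimensions here are made up only for illustration) may help distinguish the three cases; the examples on this page use the first two forms.

(:source lang=python:)
import numpy as np

n = 8                    # number of observations (illustrative)

# linear regression: scalar input, scalar output
x = np.zeros(n)          # shape (8,)
y = np.zeros(n)          # shape (8,)

# multiple linear regression: vector input, scalar response
X = np.zeros((n,2))      # shape (8, 2), two inputs per observation
y2 = np.zeros(n)         # shape (8,)

# multivariate linear regression: vector input, vector response
Xm = np.zeros((n,2))     # shape (8, 2)
Ym = np.zeros((n,3))     # shape (8, 3), three outputs per observation
(:sourceend:)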
In machine learning terminology, the data inputs (x) are features and the measured outputs (y) are labels. For a single input and single output, m is the slope and c is the intercept.

$$y=m x + c$$

An alternate way to write this is in matrix form, with the slope renamed `\beta_1` and the intercept `\beta_2`.

$$y = \begin{bmatrix}x&1\end{bmatrix} \begin{bmatrix}m\\c\end{bmatrix} = \begin{bmatrix}x&1\end{bmatrix} \begin{bmatrix}\beta_1\\\beta_2\end{bmatrix}$$

Capital letters are often used to indicate multiple inputs (X) or multiple outputs (Y). The difference between the predicted output `X \beta` and the measured output `Y` is the error `\epsilon`.

$$Y=X \beta + \epsilon$$

Linear regression analysis determines whether the error `\epsilon` has certain statistical properties. A common requirement is that the errors (residuals) are normally distributed with zero mean and covariance `\Sigma`.

$$\epsilon \sim N(0,\Sigma)$$

The most basic type of linear regression, Ordinary Least Squares (OLS), assumes that the residuals are i.i.d. (independent and identically distributed) random variables with `\Sigma=I`, the identity matrix. Statistical tests determine if the data fits a linear regression model or if there are unmodeled features of the data that may require a different type of regression model.
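As a minimal sketch of the matrix form, the unknown `\beta` can be estimated from the normal equations `X^T Y = X^T X \beta`. The small data set below is made up only for illustration; the same matrix approach appears as one of the methods in the examples that follow.

(:source lang=python:)
import numpy as np

# illustrative data (not from the examples below)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.9, 4.1, 6.0, 8.2, 9.8])

# X = [x 1] so that Y = X [beta_1, beta_2]^T + epsilon
X = np.vstack((x, np.ones(len(x)))).T

# normal equations: (X^T X) beta = X^T Y
beta = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta)          # [slope, intercept], approximately [2.0, 0.0]

# residuals are an estimate of the error epsilon
eps = Y - X @ beta
print(eps)
(:sourceend:)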
Two examples demonstrate multiple Python methods for (1) univariate linear regression and (2) multiple linear regression.
Example 1: Linear Regression

(:html:) <iframe width="560" height="315" src="https://www.youtube.com/embed/BSwm2ZSstEY" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe> (:htmlend:)

Objective: Perform univariate (single input factor) linear regression on sample data with and without a parameter constraint.

For linear regression, find unknown parameters a0 (slope) and a1 (intercept) to minimize the difference between measured y and predicted yfit.

Data

$$x = [4,5,2,3,-1,1,6,7]$$

$$y = [0.3,0.8,-0.05,0.1,-0.8,-0.5,0.5,0.65]$$

Linear Equation

$$y_{fit} = a_0 x + a_1$$

Minimize Objective

$$\min_{a_0,a_1} \sum_{i=1}^n \left(y_i-y_{fit,i}\right)^2$$

where n is the length of y and a0 and a1 are adjusted to minimize the sum of the squared errors.

Report the parameter values, the R^2 value of the fit, and display a plot of the results. Enforce a constraint with the intercept>-0.5 and show the effect of that constraint on the regression fit compared to the unconstrained least squares solution.

Solution

There are many methods for regression in Python. Five different packages generate the solution; the first four give the same unconstrained result with different methods, while gekko also allows a constraint on the parameters. The methods are from the packages:

- scipy.stats.linregress
- numpy.polyfit
- numpy.linalg
- statsmodels ordinary least squares
- gekko optimization (allows constraints)

(:source lang=python:)
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.880
Method:                 Least Squares   F-statistic:                     52.19
Date:                Wed, 26 Aug 2020   Prob (F-statistic):           0.000357
Time:                        22:05:45   Log-Likelihood:                 2.9364
No. Observations:                   8   AIC:                            -1.873
Df Residuals:                       6   BIC:                            -1.714
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.1980      0.027      7.224      0.000       0.131       0.265
const         -0.5432      0.115     -4.721      0.003      -0.825      -0.262
==============================================================================
Omnibus:                        2.653   Durbin-Watson:                   0.811
Prob(Omnibus):                  0.265   Jarque-Bera (JB):                0.918
Skew:                           0.827   Prob(JB):                        0.632
Kurtosis:                       2.862   Cond. No.                         7.32
==============================================================================
(:sourceend:)

(:toggle hide regress1 button show="Show Source Code":)
(:div id=regress1:)
(:source lang=python:)
import numpy as np
from scipy.stats import linregress
import statsmodels.api as sm
import matplotlib.pyplot as plt
from gekko import GEKKO

# Data
x = np.array([4,5,2,3,-1,1,6,7])
y = np.array([0.3,0.8,-0.05,0.1,-0.8,-0.5,0.5,0.65])

# calculate R^2
def rsq(y1,y2):
    yresid = y1 - y2
    SSresid = np.sum(yresid**2)
    SStotal = len(y1) * np.var(y1)
    r2 = 1 - SSresid/SStotal
    return r2

# Method 1: scipy linregress
slope,intercept,r,p_value,std_err = linregress(x,y)
a = [slope,intercept]
print('R^2 linregress = '+str(r**2))

# Method 2: numpy polyfit (1=linear)
a = np.polyfit(x,y,1); print(a)
yfit = np.polyval(a,x)
print('R^2 polyfit = '+str(rsq(y,yfit)))

# Method 3: numpy linalg solution
#       y =     X a
#   X^T y = X^T X a
X = np.vstack((x,np.ones(len(x)))).T
XX = np.dot(X.T,X)
XTy = np.dot(X.T,y)
a = np.linalg.solve(XX,XTy)
# same solution with lstsq
a = np.linalg.lstsq(X,y,rcond=None)[0]
yfit = a[0]*x+a[1]; print(a)
print('R^2 matrix = '+str(rsq(y,yfit)))

# Method 4: statsmodels ordinary least squares
X = sm.add_constant(x,prepend=False)
model = sm.OLS(y,X).fit()
yfit = model.predict(X)
a = model.params
print(model.summary())

# Method 5: Gekko for constrained regression
m = GEKKO(remote=False); m.options.IMODE=2
c = m.Array(m.FV,2); c[0].STATUS=1; c[1].STATUS=1
c[1].lower=-0.5              # constrain the intercept to be >= -0.5
xd = m.Param(x); yd = m.Param(y); yp = m.Var()
m.Equation(yp==c[0]*xd+c[1])
m.Minimize((yd-yp)**2)
m.solve(disp=False)
c = [c[0].value[0],c[1].value[0]]
print(c)

# plot data and both regressed lines
plt.plot(x,y,'ko',label='data')
xp = np.linspace(-2,8,100)
slope = str(np.round(a[0],2))
intercept = str(np.round(a[1],2))
eqn = 'LstSQ: y='+slope+'x'+intercept
plt.plot(xp,a[0]*xp+a[1],'r-',label=eqn)
slope = str(np.round(c[0],2))
intercept = str(np.round(c[1],2))
eqn = 'Constraint: y='+slope+'x'+intercept
plt.plot(xp,c[0]*xp+c[1],'b--',label=eqn)
plt.legend(); plt.grid()
plt.show()
(:sourceend:)
(:divend:)
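scikit-learn is another package that is often used for the same fit. It is not one of the five methods above, so the following is only a sketch of the equivalent unconstrained least-squares fit with the same data; the slope, intercept, and R^2 should match the statsmodels results above (about 0.198, -0.543, and 0.897).

(:source lang=python:)
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([4,5,2,3,-1,1,6,7])
y = np.array([0.3,0.8,-0.05,0.1,-0.8,-0.5,0.5,0.65])

# scikit-learn expects a 2D feature array
Xf = x.reshape(-1,1)
lm = LinearRegression().fit(Xf,y)

print(lm.coef_[0], lm.intercept_)   # slope and intercept
print(lm.score(Xf,y))               # R^2
(:sourceend:)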
Example 2: Multiple Linear Regression

(:html:) <iframe width="560" height="315" src="https://www.youtube.com/embed/qlxVM-un2eo" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> (:htmlend:)

Objective: Perform multiple linear regression on sample data with two inputs.
For linear regression, find unknown parameters a0-a2 to minimize the difference between measured y and predicted yfit.
Data
$$x_0 = [4,5,2,3,-1,1,6,7]$$
$$x_1 = [3,2,3,4, 3,5,2,6]$$
$$y = [0.3,0.8,-0.05,0.1,-0.8,-0.5,0.5,0.65]$$

Linear Equation
$$y_{fit} = a_0 x_0 + a_1 x_1 + a_2$$
Minimize Objective
$$\min_{a_0,a_1,a_2} \sum_{i=1}^n \left(y_i-y_{fit,i}\right)^2$$
where n is the length of y and a0-a2 are adjusted to minimize the sum of the squared errors.
Report the parameter values, the R^2 value of the fit, and display a plot of the results.
Solution
As with univariate linear regression, there are several methods for multiple linear regression in Python, although fewer packages can perform multiple or multivariate linear regression. The methods below are from 3 packages:
- numpy.linalg
- statsmodels ordinary least squares
- gekko optimization (allows constraints)

(:source lang=python:)
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.933
Model:                            OLS   Adj. R-squared:                  0.906
Method:                 Least Squares   F-statistic:                     34.77
Date:                Wed, 26 Aug 2020   Prob (F-statistic):            0.00117
Time:                        23:16:24   Log-Likelihood:                 4.6561
No. Observations:                   8   AIC:                            -3.312
Df Residuals:                       5   BIC:                            -3.074
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.2003      0.024      8.256      0.000       0.138       0.263
x2            -0.0750      0.046     -1.639      0.162      -0.193       0.043
const         -0.2883      0.186     -1.551      0.182      -0.766       0.190
==============================================================================
Omnibus:                        1.262   Durbin-Watson:                   1.558
Prob(Omnibus):                  0.532   Jarque-Bera (JB):                0.075
Skew:                          -0.237   Prob(JB):                        0.963
Kurtosis:                       3.026   Cond. No.                         16.9
==============================================================================
(:sourceend:)
(:toggle hide regress2 button show="Show Source Code":)
(:div id=regress2:)
(:source lang=python:)
import numpy as np
import statsmodels.api as sm
from gekko import GEKKO

# Data
x0 = np.array([4,5,2,3,-1,1,6,7])
x1 = np.array([3,2,3,4, 3,5,2,6])
y = np.array([0.3,0.8,-0.05,0.1,-0.8,-0.5,0.5,0.65])

# calculate R^2
def rsq(y1,y2):
    yresid = y1 - y2
    SSresid = np.sum(yresid**2)
    SStotal = len(y1) * np.var(y1)
    r2 = 1 - SSresid/SStotal
    return r2

# Method 1: numpy linalg solution
#       Y =     X a
#   X^T Y = X^T X a
X = np.vstack((x0,x1,np.ones(len(x0)))).T
a = np.linalg.lstsq(X,y,rcond=None)[0]; print(a)
yfit = a[0]*x0+a[1]*x1+a[2]
print('R^2 = '+str(rsq(y,yfit)))

# Method 2: statsmodels ordinary least squares
model = sm.OLS(y,X).fit()
predictions = model.predict(X)
print(model.summary())

# Method 3: gekko
m = GEKKO(remote=False); m.options.IMODE=2
c = m.Array(m.FV,3)
for ci in c:
    ci.STATUS=1
xd = m.Array(m.Param,2); xd[0].value=x0; xd[1].value=x1
yd = m.Param(y); yp = m.Var()
s = m.sum([c[i]*xd[i] for i in range(2)])
m.Equation(yp==s+c[-1])
m.Minimize((yd-yp)**2)
m.solve(disp=False)
a = [c[i].value[0] for i in range(3)]
print(a)

# plot data and regressed plane
from mpl_toolkits import mplot3d
from matplotlib import cm
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot3D(x0,x1,y,'ko')
x0t = np.arange(-1,7,0.25)
x1t = np.arange(2,6,0.25)
X0,X1 = np.meshgrid(x0t,x1t)
Yt = a[0]*X0+a[1]*X1+a[2]
ax.plot_surface(X0,X1,Yt,cmap=cm.coolwarm,alpha=0.5)
plt.show()
(:sourceend:)
(:divend:)
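Because gekko solves the regression as an optimization problem, constraints can also be added to the multiple linear regression, as in Example 1. The sketch below repeats the gekko model from the source code above and adds an illustrative bound that forces the x1 coefficient to be non-negative; since the unconstrained value is -0.075, the bound is binding and changes the fit.

(:source lang=python:)
import numpy as np
from gekko import GEKKO

x0 = np.array([4,5,2,3,-1,1,6,7])
x1 = np.array([3,2,3,4, 3,5,2,6])
y = np.array([0.3,0.8,-0.05,0.1,-0.8,-0.5,0.5,0.65])

m = GEKKO(remote=False); m.options.IMODE=2
c = m.Array(m.FV,3)
for ci in c:
    ci.STATUS = 1
c[1].lower = 0    # illustrative constraint: x1 coefficient >= 0
xd = m.Array(m.Param,2); xd[0].value=x0; xd[1].value=x1
yd = m.Param(y); yp = m.Var()
s = m.sum([c[i]*xd[i] for i in range(2)])
m.Equation(yp==s+c[-1])
m.Minimize((yd-yp)**2)
m.solve(disp=False)
print([ci.value[0] for ci in c])   # constrained a0, a1, a2
(:sourceend:)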
Regression Statistics

The regression summary from statsmodels reports the following statistics:

- Dep. Variable: Model output
- Model: Regression model (OLS=Ordinary Least Squares)
- Method: Regression method
- Date/Time: Time stamp
- No. Observations: Number of data points
- DF Residuals: Residual degrees of freedom (number of data points minus the number of parameters)
- DF Model: Number of parameters, not including the constant term (intercept)
- R-squared: Coefficient of determination (0-1), a statistical measure of how closely the regression line fits the data points (1=perfect fit)
- Adj. R-squared: R-squared adjusted for the number of data points and DF Residuals
- F-statistic: Significance of the fit
- Prob (F-statistic): Probability of the F-statistic
- Log-Likelihood: Log of the likelihood function
- AIC: Akaike Information Criterion
- BIC: Bayesian Information Criterion
- coef: The regression coefficient
- std err: Standard error of the estimated coefficient
- t: t-statistic value, a measure of the coefficient significance
- P > |t|: P-value; if less than the significance level (typically 0.05), the coefficient is statistically significant in predicting the output
- [0.025 0.975]: 95% confidence interval bounds on the coefficient
- Skewness: Measure of data symmetry. With |skewness|>1 the data is highly skewed; with |skewness|<0.5 the data is approximately symmetric
- Kurtosis: Shape of the distribution that compares data at the center with the tails. Data sets with high kurtosis have heavy tails or more outliers; data sets with low kurtosis have fewer outliers. Kurtosis is 3 for a normal distribution
- Omnibus: D'Agostino's test, a statistical test for the presence of skewness and kurtosis
- Prob(Omnibus): Omnibus probability
- Jarque-Bera: Test of skewness and kurtosis
- Prob (JB): Jarque-Bera probability
- Durbin-Watson: Test for autocorrelation, relevant if the errors have a time-series component
- Cond. No: Condition number, a test for multicollinearity (inputs that are correlated with each other). A high condition number indicates that some of the inputs and their coefficients may not be needed
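Many of these statistics can also be read programmatically from a fitted statsmodels results object instead of parsing the printed summary. The sketch below refits the Example 1 data so it is self-contained; the attribute names are from the statsmodels API, and the Durbin-Watson and Jarque-Bera helpers are in statsmodels.stats.stattools.

(:source lang=python:)
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson, jarque_bera

# same data as Example 1
x = np.array([4,5,2,3,-1,1,6,7])
y = np.array([0.3,0.8,-0.05,0.1,-0.8,-0.5,0.5,0.65])
X = sm.add_constant(x,prepend=False)
model = sm.OLS(y,X).fit()

print(model.rsquared, model.rsquared_adj)  # R-squared, Adj. R-squared
print(model.fvalue, model.f_pvalue)        # F-statistic, Prob (F-statistic)
print(model.llf, model.aic, model.bic)     # Log-Likelihood, AIC, BIC
print(model.params)                        # coef
print(model.bse)                           # std err
print(model.tvalues, model.pvalues)        # t, P>|t|
print(model.conf_int())                    # [0.025 0.975] bounds
print(durbin_watson(model.resid))          # Durbin-Watson
jb, jb_pv, skew, kurtosis = jarque_bera(model.resid)
print(jb, jb_pv, skew, kurtosis)           # Jarque-Bera, Prob(JB), Skew, Kurtosis
(:sourceend:)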
There is additional information about nonlinear regression and regression statistics. Also see the Data Science Online Course for more information on regression (Module 6).