Thermophysical Properties

Empirical regression is unreliable when predictions are requested outside of the training region. Physics-based information can reduce this limitation by adding fundamental engineering knowledge, such as constraints on the fitted parameters, during the training process.

Background: The parachor is an input to the prediction of several thermophysical properties such as surface tension and thermal conductivity. Surface tension `\sigma` is predicted from the parachor value (P) and the difference between the density of saturated liquid `\rho_L` and saturated vapor `\rho_V` at the temperature of interest.

$$\sigma = \left(P \left(\rho_L-\rho_V\right)\right)^4$$
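
As a worked illustration of the equation above (commonly called the Macleod-Sugden correlation), the following minimal sketch evaluates the surface tension for one compound. The function name `surface_tension` and the numbers are assumptions for illustration only: approximate values for benzene near 20 °C, with molar densities in mol/cm^3 and the parachor in its conventional units so that the result is in mN/m. They are not taken from the case study data set.

```python
def surface_tension(parachor, rho_liquid, rho_vapor):
    """Surface tension from the parachor and the saturated molar densities.

    Assumed units: densities in mol/cm^3, parachor in (mN/m)^(1/4) cm^3/mol,
    result in mN/m.
    """
    return (parachor * (rho_liquid - rho_vapor)) ** 4

# Approximate, illustrative values for benzene near 20 degC (assumed)
P = 206.0        # parachor
rho_L = 0.01122  # saturated liquid density, mol/cm^3
rho_V = 4.0e-6   # saturated vapor density, mol/cm^3

sigma = surface_tension(P, rho_L, rho_V)
print(f"Surface tension: {sigma:.1f} mN/m")

# The fourth power amplifies parachor errors: +1% in P gives about +4% in sigma
sigma_hi = surface_tension(1.01 * P, rho_L, rho_V)
print(f"Change for +1% parachor: {100 * (sigma_hi / sigma - 1):.1f}%")
```

Because the parachor is raised to the fourth power, a small relative error in P is magnified roughly four times in the predicted surface tension, which is why a more accurate parachor predictor matters.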

Surface tension and thermal conductivity are two specific properties that need improved predictions. A more accurate predictor of the parachor value (P) is an important step to improve those thermophysical properties. Most deep learning methods use a limited set of activation functions (ReLU, sigmoid, tanh, and linear) and perform unconstrained regression to minimize a loss function. The purpose of this case study is to explore the addition of physics-based information in the fitting process. This may include the use of new types of activation functions or constraints on the adjustable weights. The data for this case study are from Gharagheizi et al. (2011), who explored deep learning (a multi-layered neural network) to improve parachor predictions for 277 compounds from 40 functional groups. The data set contains the following columns:

  • Name: The common chemical name
  • Formula: Chemical formula of the compound
  • CASN: Chemical Abstracts Service Registry Number
  • Family: Chemical family of the compound
  • Parachor: Estimate of parachor value
  • Grp1-Grp40: Number of occurrences of each of the 40 functional groups in the compound

Objective: Develop a prediction of the parachor from the chemical compound data set. Randomly split the data into a train (80%) and test (20%) set. Use Linear Regression and Neural Network (Deep Learning) with and without constraints. There is additional information on adding constraints to linear regression (see Gekko example) and to the neural network weights. For the constrained cases, enforce a positive parachor contribution for each group. Report the coefficient of determination (R2) for predicting Parachor in the test set. Discuss the performance of each method on the train and test sets. Submit source code and a summary memo (max 2 pages) of your results.
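
One possible starting point for the linear regression part of the objective is sketched below. It is not the reference solution. It uses synthetic data as a stand-in for the real data set (the 277 x 40 shape matches the description above, but the values are random), and it uses `numpy.linalg.lstsq` and `scipy.optimize.lsq_linear` rather than Gekko to fit the unconstrained and constrained models. The helper `r2` and all variable names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(1)

# Synthetic stand-in for the data set (NOT the real parachor data):
# 277 compounds, 40 group counts, positive "true" group contributions
n, g = 277, 40
X = rng.integers(0, 3, size=(n, g)).astype(float)
w_true = rng.uniform(5.0, 60.0, g)
y = X @ w_true + rng.normal(0.0, 2.0, n)

# Random 80% / 20% train / test split
idx = rng.permutation(n)
n_train = int(0.8 * n)
tr, te = idx[:n_train], idx[n_train:]

# Design matrix: group counts plus a column of ones for the intercept
A = np.column_stack([X, np.ones(n)])

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Unconstrained least squares
w_u, *_ = np.linalg.lstsq(A[tr], y[tr], rcond=None)

# Constrained least squares: group contributions >= 0, intercept free
lower = np.append(np.zeros(g), -np.inf)
upper = np.full(g + 1, np.inf)
w_c = lsq_linear(A[tr], y[tr], bounds=(lower, upper)).x

results = {}
for name, w in [("unconstrained", w_u), ("constrained", w_c)]:
    results[name] = (r2(y[tr], A[tr] @ w), r2(y[te], A[te] @ w))
    print(f"{name}: train R2 = {results[name][0]:.4f}, "
          f"test R2 = {results[name][1]:.4f}")
```

On the synthetic data both fits agree closely because the true contributions are positive by construction. With the real data, the comparison between the two on the test set is the point of the exercise.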

Constrained Linear Regression Example

Constraints on the parameters are possible with Python Gekko and Keras / TensorFlow. This example shows how to include a lower bound on the slope for linear regression with Python Gekko.

from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt

# define data
x = np.random.rand(100) + np.linspace(0,100,100)
y = np.random.rand(100) - 2*np.linspace(0,1,100)

# linear regression model with Gekko
m = GEKKO()

# unknown parameters
a,b = m.Array(m.FV,2)
a.STATUS = 1; b.STATUS = 1

# variables and parameters
yp = m.Var()
ym = m.Param(y)
xm = m.Param(x)

# equations and objective
m.Equation(yp == a*xm + b)
m.Minimize((yp-ym)**2)

# solve
m.options.IMODE = 2
m.solve(disp=False)
p1 = [a.value[0],b.value[0]]

print('Slope:', p1[0])
print('Intercept:', p1[1])

# add constraint to the slope (>=0)
a.LOWER = 0.0
m.solve(disp=False)
p2 = [a.value[0],b.value[0]]

print('Slope:', p2[0])
print('Intercept:', p2[1])

# plot results
plt.plot(x,y,'r.')
plt.plot(x,np.polyval(p1,x),label='Unconstrained')
plt.plot(x,np.polyval(p2,x),label='Constrained (slope>=0)')
plt.ylabel('y'); plt.xlabel('x'); plt.legend()
plt.show()
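
The same idea carries over to neural network weights. The sketch below is a plain NumPy illustration, not the Keras / TensorFlow implementation: it trains a one-hidden-layer ReLU network by gradient descent and clips the weights back to non-negative values after every step (a projection step). In Keras, a comparable effect is typically obtained by passing a constraint such as `NonNeg` as the `kernel_constraint` of a layer. The data are synthetic, and the layer sizes, learning rate, and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 5 group counts, positive contributions
X = rng.integers(0, 4, size=(200, 5)).astype(float)
y = X @ np.array([10.0, 20.0, 5.0, 40.0, 15.0]) + rng.normal(0.0, 1.0, 200)

# Scale inputs and target to order one for stable gradient descent
Xs = X / X.max()
ys = (y / y.max()).reshape(-1, 1)

# One hidden layer with 8 ReLU units; weights start non-negative
W1 = rng.uniform(0.0, 0.5, (5, 8)); b1 = np.zeros(8)
W2 = rng.uniform(0.0, 0.5, (8, 1)); b2 = np.zeros(1)

def forward(A, W1, b1, W2, b2):
    H = np.maximum(0.0, A @ W1 + b1)  # hidden ReLU activations
    return H, H @ W2 + b2             # linear output layer

def mse(W1, b1, W2, b2):
    return float(np.mean((forward(Xs, W1, b1, W2, b2)[1] - ys) ** 2))

loss_start = mse(W1, b1, W2, b2)
lr, n = 0.05, len(Xs)
for _ in range(2000):
    H, pred = forward(Xs, W1, b1, W2, b2)
    err = 2.0 * (pred - ys) / n        # gradient of MSE w.r.t. predictions
    gW2 = H.T @ err
    gb2 = err.sum(axis=0)
    dH = (err @ W2.T) * (H > 0)        # backpropagate through ReLU
    gW1 = Xs.T @ dH
    gb1 = dH.sum(axis=0)
    W1 = W1 - lr * gW1; b1 = b1 - lr * gb1
    W2 = W2 - lr * gW2; b2 = b2 - lr * gb2
    # Projection: clip weights (not biases) back to >= 0
    W1 = np.maximum(W1, 0.0)
    W2 = np.maximum(W2, 0.0)
loss_end = mse(W1, b1, W2, b2)

print("MSE before training:", loss_start)
print("MSE after training: ", loss_end)
print("Smallest weight:", min(W1.min(), W2.min()))
```

With non-negative weights and a non-decreasing activation such as ReLU, the network output cannot decrease when a group count increases, which is the neural network analog of a positive contribution for each group.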

References

  1. F. Gharagheizi, A. Eslamimanesh, A. H. Mohammadi, and D. Richon, Determination of parachor of various compounds using an artificial neural network group contribution method, Industrial & Engineering Chemistry Research, vol. 50, no. 9, pp. 5815–5823, 2011.
  2. S. Sugden, VI.-The variation of surface tension with temperature and some related functions, J. Chem. Soc., Trans., vol. 125, pp. 32–41, 1924. [Online]. Available: https://dx.doi.org/10.1039/CT9242500032
  3. T. A. Knotts, W. V. Wilding, J. L. Oscarson, and R. L. Rowley, Use of the DIPPR database for development of QSPR correlations: Surface tension, Journal of Chemical & Engineering Data, vol. 46, no. 5, pp. 1007–1012, 2001. [Online]. Available: https://doi.org/10.1021/je000232d
  4. F. Gharagheizi, A. Eslamimanesh, A. H. Mohammadi, and D. Richon, Use of artificial neural network-group contribution method to determine surface tension of pure compounds, Journal of Chemical & Engineering Data, vol. 56, no. 5, pp. 2587–2601, 2011.

Solutions