Glass Characterization

Glass Classification and Property Prediction

This assignment focuses on glass identification using 214 samples of window glass, containers, tableware and headlamps. Each sample is described by its refractive index and oxide composition (weight percent of Na, Mg, Al, Si, K, Ca, Ba and Fe) and is labelled according to the type of glass. The original study was motivated by forensic science: investigators needed an objective way to determine whether fragments of broken glass found at a crime scene came from a window, container or headlamp. The samples were collected by the UK Home Office Forensic Science Service and cover six classes (building windows made by float or non‑float processes, vehicle windows, containers, tableware and headlamps). Because glass type is closely tied to composition, machine‑learning models are a natural tool for classification.

Background: Composition, Color and Manufacturing

The glass in everyday windows and containers is typically a soda‑lime‑silica composition. Pure silica melts at roughly 1,700 °C, so manufacturers add sodium carbonate to lower the melting temperature and enable workable viscosities at around 800 °C. However, soda‑rich glasses are water‑soluble; adding calcium oxide (CaO) and magnesium oxide (MgO) stabilizes the network and makes the glass durable. Commercial soda‑lime glass therefore contains approximately 70–74 % silica, 12–16 % sodium oxide, 5–11 % calcium oxide and a few percent magnesium and aluminum oxides. Lead crystal and other heavy‑metal glasses replace calcium oxide with lead or barium oxides, greatly increasing the refractive index and producing brilliant, sparkling glass. These compositional adjustments underpin both the classification problem (different products use different oxide ratios) and the regression of optical properties.

In the float glass industry, these oxide ratios originate from carefully batched raw materials. A typical float batch contains roughly 60 % quartz sand, 20 % soda and sulfate, and 20 % limestone and dolomite. The mix is melted in a furnace at about 1,500 °C, refined at 1,100–1,300 °C to eliminate bubbles and homogenize the melt, and then poured over a bath of molten tin, where it spreads into a flat ribbon of uniform thickness. The ribbon leaves the tin bath at approximately 600 °C and enters the annealing lehr.

Color and optical appearance are determined by trace impurities and additives. Iron oxide impurities impart a pale green or aqua tint, an effect that becomes more apparent in thick pieces of soda‑lime glass. To obtain “flint” glass that appears colorless, manufacturers select low‑iron sand or introduce decolorizing agents such as selenium. Other colorants produce characteristic hues: depending on its oxidation state, iron can also give glass a yellow or brown hue, copper produces green or blue tones and cobalt leads to deep blues. Beyond color, the refractive index (RI), a continuous value in the data set, is also determined by composition. Heavy ions such as lead or barium increase the optical density and refractive index.

Float Glass Manufacturing and Annealing

Most flat glass used in windows and vehicles is produced by the float process, where molten glass is poured onto a bath of molten tin and allowed to spread into a continuous ribbon. After leaving the tin bath at roughly 600 °C, the ribbon is conveyed into a long, temperature‑controlled lehr oven. The lehr is typically several meters wide and more than 100 m long. It is divided into multiple temperature‑controlled sections, with closed and open areas to manage heat transfer. In the initial A‑section the temperature is kept uniform so that the ribbon does not warp. In the B‑sections the glass passes through the annealing range, where electric heaters and controlled cooling tubes maintain a slow, linear cooling rate to avoid locking in stresses. Finally, a mass‑air (forced‑cooling) section uses fans to cool the glass to near room temperature while maintaining uniformity across the ribbon.

Quoted limits for the annealing range vary with glass composition and source: approximately 540 °C down to 470 °C in some descriptions and roughly 566 °C down to 496 °C in others. Permanent stresses are locked in near the lower end of this range (470–480 °C under the first set of limits), while temporary stresses arise at lower temperatures. Uniform, controlled cooling from about 600 °C to 60 °C ensures that the residual stresses left in the finished glass are distributed evenly across the ribbon. If the glass cools too quickly or unevenly, lateral and plane stresses develop. Properly annealed glass exhibits compressive stress at the edges and tensile stress at the mid‑plane. Excessive residual stress at the edges makes cutting difficult, whereas too little stress leads to brittle edges that do not cut cleanly. If cooling is mismanaged, ribbon breakage and defects such as scratches or warping can occur.

This manufacturing context explains why composition and processing conditions influence optical and mechanical properties.

Soda‑lime glass owes its widespread adoption to workability and cost. Typical soda‑lime glass consists of about 70 % silica, 15 % soda and 9 % lime. The soda acts as a flux to lower the melting point, while the lime stabilizes the silica network. Soda‑lime glass can be re‑softened multiple times, and it is inexpensive and chemically stable, which makes it suited to products ranging from bottles and windowpanes to light bulbs.

Data Sets

The main data file for this assignment is glass.csv. It contains the refractive index, eight compositional features and a class label for each of 214 samples, with no missing values. The columns are:

| Column | Description (units) | Usage |
| --- | --- | --- |
| RI | Refractive index | classification feature and regression target |
| Na | Sodium weight % (in corresponding oxide) | feature |
| Mg | Magnesium weight % | feature |
| Al | Aluminum weight % | feature |
| Si | Silicon weight % | feature |
| K | Potassium weight % | feature |
| Ca | Calcium weight % | feature |
| Ba | Barium weight % | feature |
| Fe | Iron weight % | feature |
| Type | Class label identifying the glass type | target for classification |

Download the data here: glass.csv. The Glass Identification data set from the UCI Machine Learning Repository is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
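A quick check after loading helps confirm that the file matches the column table above. The sketch below is one possible way to do this with pandas. It assumes the CSV header uses exactly the column names from the table, and load_glass is a hypothetical helper name, not part of the assignment. Two made-up rows stand in for the file so that the example runs on its own.

```python
import io

import pandas as pd

# Column names taken from the table above; that the CSV header matches them
# exactly is an assumption to verify against the real file.
EXPECTED_COLUMNS = ["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]


def load_glass(source):
    """Read the glass data and confirm the expected columns and no missing values."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    df = df[EXPECTED_COLUMNS]
    if df.isna().any().any():
        raise ValueError("data contains missing values")
    return df


# Two made-up rows stand in for the file so the sketch runs on its own.
# With the real data: df = load_glass("glass.csv")
demo_csv = io.StringIO(
    "RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Type\n"
    "1.5200,13.5,3.5,1.2,72.5,0.5,8.5,0.0,0.0,1\n"
    "1.5170,14.2,0.0,2.0,73.0,0.0,9.0,1.1,0.0,7\n"
)
df = load_glass(demo_csv)
print(df.shape)   # (rows, columns)
print(df.dtypes)  # RI and the oxides should be floats, Type an integer
```

With the real file, the reported shape should show 214 rows if the data matches the description above.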

Part 1: Data Visualization and Cleansing

  1. Summary statistics: Use pandas, seaborn or a profiling tool to generate summary statistics for each feature. Identify the range, mean, median and standard deviation and comment on any anomalies. Are there obvious outliers in RI, Mg or other features? Plot box plots or violin plots to visualize the distribution of each feature.
  2. Pair plot and correlation matrix: Create a pair plot of the nine features colored by Type and compute a correlation matrix (Pearson coefficient). Identify which compositional variables are most correlated with specific glass types or with the refractive index.
  3. Outlier removal: Decide whether any samples should be removed based on the visualizations. If you remove observations, justify your decision and report how many data points remain.
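  4. Balanced representation: Count the number of samples of each glass type. If the classes are imbalanced, apply a resampling method (for example, oversampling the minority classes or undersampling the majority classes) to achieve a balanced representation of glass types.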
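The numeric parts of these steps can be sketched with pandas and scikit-learn as shown below. Plots are left out to keep the example short. This is a minimal sketch under stated assumptions, not a prescribed solution: the helper names are hypothetical, the 1.5 × IQR rule is only one common outlier heuristic, random oversampling is only one of several balancing methods, and the demo data are random numbers with illustrative class sizes rather than the real measurements.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

FEATURES = ["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe"]


def summarize(df):
    """Min, max, mean, median and standard deviation, one row per feature."""
    return df[FEATURES].agg(["min", "max", "mean", "median", "std"]).T


def iqr_outlier_counts(df, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each feature."""
    q1 = df[FEATURES].quantile(0.25)
    q3 = df[FEATURES].quantile(0.75)
    iqr = q3 - q1
    low = df[FEATURES].lt(q1 - k * iqr, axis="columns")
    high = df[FEATURES].gt(q3 + k * iqr, axis="columns")
    return (low | high).sum()


def oversample_to_largest(df, label="Type", seed=0):
    """Randomly duplicate minority-class rows until every class matches the largest."""
    target = df[label].value_counts().max()
    parts = []
    for _, group in df.groupby(label):
        parts.append(group)
        extra = target - len(group)
        if extra > 0:
            parts.append(
                resample(group, replace=True, n_samples=extra, random_state=seed)
            )
    return pd.concat(parts, ignore_index=True)


# Stand-in data: random numbers with illustrative, imbalanced class sizes.
rng = np.random.default_rng(0)
class_sizes = {1: 70, 2: 76, 3: 17, 5: 13, 6: 9, 7: 29}
demo = pd.DataFrame(rng.normal(size=(214, len(FEATURES))), columns=FEATURES)
demo["Type"] = np.repeat(list(class_sizes), list(class_sizes.values()))

print(summarize(demo))
print(iqr_outlier_counts(demo))
print(demo[FEATURES].corr(method="pearson")["RI"])  # correlation of each feature with RI
balanced = oversample_to_largest(demo)
print(balanced["Type"].value_counts())
```

One design point to consider: if duplicated rows are created before the train/test split in Part 2, copies of the same sample can land in both sets and inflate test scores. Oversampling only the training portion avoids this.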

Part 2: Classification of Glass Types

Your goal is to build models that classify the glass into the correct types using the eight compositional variables and refractive index. Complete the following steps:

  1. Train/test split: Randomly split the data into an 80 % training set and a 20 % test set using train_test_split with shuffle=True and a fixed random_state for reproducibility.
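  2. Try at least eight classifiers: Evaluate a variety of classifier families (for example linear, nearest‑neighbor, tree‑based, ensemble, kernel and neural‑network models) so that the comparison is not limited to one type of decision boundary.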
  3. Confusion matrices and metrics: For each classifier, compute the confusion matrix on the test set and report accuracy, precision, recall and F1 score. Discuss which classes are most frequently misclassified and speculate on possible reasons (e.g., overlapping compositions).
  4. Model selection: Recommend a “best” classifier based on your evaluation metrics. Justify your choice.
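The steps above can be sketched as a single evaluation loop. The example below is a hedged illustration, not the required solution: evaluate_classifiers is a hypothetical helper, only four classifiers are shown (the assignment asks for at least eight), macro averaging is one of several reasonable choices for multi-class precision, recall and F1, and the data come from make_classification as a stand-in for glass.csv.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe"]


def evaluate_classifiers(X, y, models, seed=42):
    """Fit each model on an 80/20 split; return a metrics table and confusion matrices."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=seed
    )
    labels = np.unique(y)  # fixes the row/column order of every confusion matrix
    rows, matrices = [], {}
    for name, model in models.items():
        y_hat = model.fit(X_tr, y_tr).predict(X_te)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_te, y_hat, average="macro", zero_division=0
        )
        rows.append({"model": name, "accuracy": accuracy_score(y_te, y_hat),
                     "precision": precision, "recall": recall, "f1": f1})
        matrices[name] = confusion_matrix(y_te, y_hat, labels=labels)
    return pd.DataFrame(rows).set_index("model"), matrices


# Stand-in data with six classes and nine features (not the real glass data).
X_arr, y_arr = make_classification(
    n_samples=214, n_features=9, n_informative=5, n_redundant=2,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X = pd.DataFrame(X_arr, columns=FEATURES)
y = pd.Series(y_arr, name="Type")

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
scores, matrices = evaluate_classifiers(X, y, models)
print(scores.round(3))
print(matrices["forest"])
```

Scaling is placed inside a pipeline for the distance‑based and linear models so that the scaler is fit on the training rows only. Off‑diagonal entries of each confusion matrix show which pairs of classes are confused most often.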

Part 3: Classification with RI Only

Refractive index is relatively easy to measure, whereas obtaining full composition data is more difficult and expensive. Repeat the classification analysis using only refractive index as the input feature, excluding all compositional variables. Train and evaluate the same set of classifiers using identical train/test splits and evaluation metrics. Determine the highest classification accuracy achievable with refractive index alone, and compare this performance to the models that use both composition and refractive index. Discuss the tradeoff between predictive performance and measurement practicality, and comment on whether refractive index alone could provide a viable simplified field measurement approach.
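One way to keep the comparison fair is to choose the train/test rows once and reuse them for every feature set. The sketch below illustrates that idea with a single classifier. compare_feature_sets is a hypothetical helper, the k‑nearest‑neighbors model is an arbitrary choice, and the data are a synthetic stand‑in in which the column named RI has no physical meaning.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

FEATURES = ["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe"]


def compare_feature_sets(df, feature_sets, model, label="Type", seed=42):
    """Score one model on several column subsets using the same train/test rows."""
    rows = np.arange(len(df))
    train_rows, test_rows = train_test_split(
        rows, test_size=0.2, shuffle=True, random_state=seed
    )
    train, test = df.iloc[train_rows], df.iloc[test_rows]
    results = {}
    for name, cols in feature_sets.items():
        # cols is a list, so train[cols] stays two-dimensional even for one column
        fitted = clone(model).fit(train[cols], train[label])
        results[name] = accuracy_score(test[label], fitted.predict(test[cols]))
    return results


# Stand-in data (not the real glass measurements).
X_arr, y_arr = make_classification(
    n_samples=214, n_features=9, n_informative=5, n_redundant=2,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
demo = pd.DataFrame(X_arr, columns=FEATURES)
demo["Type"] = y_arr

feature_sets = {"composition + RI": FEATURES, "RI only": ["RI"]}
results = compare_feature_sets(demo, feature_sets, KNeighborsClassifier(n_neighbors=5))
for name, accuracy in results.items():
    print(f"{name}: accuracy = {accuracy:.3f}")
```

Repeating the call for each classifier from Part 2 gives the highest accuracy achievable with refractive index alone, which can then be set against the full‑feature results.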

Part 4: Regression of Optical Properties

Develop regression models to predict a continuous optical property of glass, the refractive index, from its composition:

  • Refractive index prediction: Use Na, Mg, Al, Si, K, Ca, Ba and Fe as features to predict RI. Randomly split the data (80 % train, 20 % test) and evaluate at least three regression techniques, such as Linear Regression, K‑Nearest Neighbors Regressor, Support Vector Regressor, Random Forest Regressor, Neural Network Regressor or XGBoost Regressor. Report the coefficient of determination (R²) and mean absolute error (MAE) for train and test sets.

Compare the regression methods and discuss which models best capture the relationship between composition and refractive index. Highlight any features that have strong or weak predictive power.
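The regression workflow can be sketched in the same style as the classification loop. The example below is a hedged illustration rather than the expected answer: evaluate_regressors is a hypothetical helper, three regressors are picked from the list above, and the stand‑in data use made‑up linear coefficients to generate an RI‑like target from random compositions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

OXIDES = ["Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe"]


def evaluate_regressors(X, y, models, seed=42):
    """Fit each regressor on an 80/20 split and report R² and MAE for both sets."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=seed
    )
    rows = []
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred_tr, pred_te = model.predict(X_tr), model.predict(X_te)
        rows.append({
            "model": name,
            "r2_train": r2_score(y_tr, pred_tr),
            "r2_test": r2_score(y_te, pred_te),
            "mae_train": mean_absolute_error(y_tr, pred_tr),
            "mae_test": mean_absolute_error(y_te, pred_te),
        })
    return pd.DataFrame(rows).set_index("model")


# Stand-in data: random "compositions" and a made-up linear RI relationship.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(0.0, 1.0, size=(214, len(OXIDES))), columns=OXIDES)
made_up_weights = rng.normal(scale=0.002, size=len(OXIDES))
y = pd.Series(
    1.518 + X.to_numpy() @ made_up_weights + rng.normal(scale=1e-4, size=214),
    name="RI",
)

models = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
table = evaluate_regressors(X, y, models)
print(table)
```

A large gap between the train and test columns for a model suggests overfitting. For feature influence, the coefficients of the linear model and the feature_importances_ attribute of the random forest are two possible starting points.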

What to Turn In

Submit a repository containing:

  1. Source code for all analyses (Python script or Jupyter notebook). Comment your code so that others can follow your workflow.
  2. Executive summary memo (≤ 2 pages) describing:
    • (i) the outliers and data cleaning decisions
    • (ii) variables most correlated with glass type and/or refractive index
    • (iii) classification results and your recommended model
    • (iv) regression results and insights on predicting refractive index
    • (v) any further observations.

Data Provided

  • glass.csv: The primary data set for classification and regression (214 samples). Download here.
