AdaBoost Classification

AdaBoost (Adaptive Boosting) is an ensemble learning algorithm for classification. It improves accuracy by combining the predictions of multiple weak models into a single, stronger model.

The algorithm works by iteratively training a series of weak models on reweighted versions of the training data, with each model attempting to correct the mistakes of the previous models. At each iteration, the weights of the incorrectly predicted examples are increased, so that the next model focuses on the examples that earlier models misclassified. AdaBoost can be used as a supervisory layer over other classification algorithms such as neural networks, decision trees, and support vector machines. It combines the weak classifiers as a weighted sum and adaptively refines the output to focus on the harder-to-classify cases.
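
The reweighting loop described above can be sketched in a few lines. This is a minimal illustration of discrete AdaBoost, assuming binary labels coded as -1 and +1 and scikit-learn decision stumps as the weak learners. The function names fit_adaboost and predict_adaboost and the synthetic data are hypothetical choices for this sketch; the AdaBoostClassifier implementation in scikit-learn differs in detail.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost for labels in {-1, +1}; returns (stumps, alphas)."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])  # weighted training error (w sums to 1)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # vote of this weak learner
        w = w * np.exp(-alpha * y * pred)  # raise weights of the mistakes
        w = w / w.sum()  # renormalize so the weights sum to 1
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def predict_adaboost(stumps, alphas, X):
    """Sign of the alpha-weighted sum of the weak-learner votes."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(votes >= 0, 1, -1)


# Synthetic two-class data (an assumption for illustration)
X, y01 = make_classification(n_samples=400, n_features=10, random_state=0)
y = 2 * y01 - 1  # map {0, 1} to {-1, +1}

stumps, alphas = fit_adaboost(X, y)
y_hat = predict_adaboost(stumps, alphas, X)
ensemble_acc = np.mean(y_hat == y)
first_stump_acc = np.mean(stumps[0].predict(X) == y)
print("first stump: %0.2f, ensemble: %0.2f" % (first_stump_acc, ensemble_acc))
```

Each round fits a stump to the current weights, computes its weighted error, and converts that error into a vote alpha. Misclassified examples have y * pred = -1, so their weights grow by exp(alpha) before the next round.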

AdaBoost Classifier in Python

Here is an example of AdaBoost in Python using the scikit-learn library.

from sklearn.ensemble import AdaBoostClassifier
# XA, yA: training features and labels; XB: features to predict
ab = AdaBoostClassifier().fit(XA, yA)
yP = ab.predict(XB)

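The default base_estimator (if None is given) is a DecisionTreeClassifier with max_depth=1, and the default number of estimators (n_estimators) is 50. Specifying the base_estimator and n_estimators explicitly gives more control over the options. In scikit-learn 1.2 and later the base_estimator argument is named estimator, and the old name was removed in version 1.4.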
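
The defaults can be checked directly on a fitted model. The short sketch below assumes a small synthetic dataset from make_classification, which is not part of the original example, and reads the first weak learner from estimators_ so that it does not depend on the keyword name of the base model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic data, used only so the classifier can be fitted
X, y = make_classification(n_samples=200, random_state=0)
ab = AdaBoostClassifier().fit(X, y)

n_default = ab.get_params()["n_estimators"]  # default number of estimators
first_weak_learner = ab.estimators_[0]       # first fitted weak learner
print("n_estimators:", n_default)
print("weak learner:", type(first_weak_learner).__name__,
      "with max_depth =", first_weak_learner.max_depth)
```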

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Create a decision tree classifier as the base model
base_model = DecisionTreeClassifier(max_depth=1)

# Train an AdaBoost classifier using the decision tree as the base model
# (scikit-learn >= 1.2 names this argument estimator; it was base_estimator before)
ab = AdaBoostClassifier(estimator=base_model, n_estimators=50).fit(X_train, y_train)

# Make predictions on the test set
y_pred = ab.predict(X_test)

In this example, we first create a decision tree classifier with a maximum depth of 1 as the base model for AdaBoost. Then, we train an AdaBoost classifier using the decision tree as the base model and 50 estimators. Finally, we use the trained model to make predictions on the test set.
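
The snippet above assumes that X_train, y_train, and X_test already exist. A self-contained variant is sketched below with synthetic data from make_classification and a 70/30 split, both of which are assumptions for illustration. The base model is passed as the first positional argument, which avoids the base_estimator versus estimator keyword difference between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (an assumption; not from the original example)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Decision stump as the base model, passed positionally
base_model = DecisionTreeClassifier(max_depth=1)
ab = AdaBoostClassifier(base_model, n_estimators=50, random_state=0)
ab.fit(X_train, y_train)

# Predict on the held-out data and measure the accuracy
y_pred = ab.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy: %0.2f" % test_accuracy)
print("Weak learners trained:", len(ab.estimators_))
```

Boosting can stop early, so the number of trained weak learners may be smaller than n_estimators.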

Optical Character Recognition with AdaBoost

Optical character recognition (OCR) is the process of extracting text from images or scanned documents. OCR algorithms are typically based on machine learning models that are trained on large datasets of images containing text. Here is an example of OCR in Python using the scikit-learn library:

# Import necessary libraries
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Load the dataset of images of handwritten digits
digits = load_digits()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target)

# Create an AdaBoost classifier
model = AdaBoostClassifier(n_estimators=100)

# Train the model using the training set
model.fit(X_train, y_train)

# Evaluate the model performance on the test set
accuracy = model.score(X_test, y_test)
print("Accuracy: %0.2f" % accuracy)

The accuracy is low for this particular example (26%). Further tuning of the options, such as the depth of the base estimator and the number of estimators, may increase the accuracy. Below is an example that also shows a digit from the test set together with the class predicted by the AdaBoost classifier.

from sklearn import datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np

from sklearn.ensemble import AdaBoostClassifier
classifier = AdaBoostClassifier()

# The digits dataset
digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Split into train and test subsets (50% each)
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# Learn the digits on the first half of the data
classifier.fit(X_train, y_train)

# Test on a random sample from the second half of the data
n = np.random.randint(n_samples // 2, n_samples)
print('Predicted: ' + str(classifier.predict(data[n:n+1])[0]))

# Show number
plt.imshow(digits.images[n], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
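
One tuning option for the digits example is the depth of the base decision tree. The sketch below compares a few depths. The depth values, the 25% test split, and the fixed random_state are assumptions chosen for illustration, and the resulting accuracies depend on the scikit-learn version, so no particular numbers are claimed here.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

results = {}
for depth in (1, 3, 5):
    # Base tree passed positionally; its keyword name varies by version
    model = AdaBoostClassifier(DecisionTreeClassifier(max_depth=depth),
                               n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    results[depth] = model.score(X_test, y_test)
    print("max_depth=%d  accuracy: %0.2f" % (depth, results[depth]))
```

A stump splits the data only once, which is a very weak learner for a ten-class problem. Slightly deeper trees give each boosting round more capacity, at the cost of longer training and a higher risk of overfitting.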
