Transformer Forecasting with PyTorch
Transformers have revolutionized the field of Natural Language Processing (NLP) and are increasingly used in time-series forecasting. Originally introduced in the paper Attention Is All You Need, transformers have become the backbone of many state-of-the-art language models such as BERT and GPT. The examples on this page use PyTorch; code is also available for the Keras / TensorFlow packages.

Transformers in Large Language Models (LLM)
Transformers in LLMs like GPT and BERT use self-attention mechanisms to process text. They are capable of capturing contextual information from the entire text input, making them effective for a variety of NLP tasks.
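To make the self-attention idea concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in PyTorch. The function name self_attention, the random projection matrices, and the dimensions are illustrative assumptions, not taken from any particular model; production models add multiple heads, masking, and learned parameters.

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every position is scored against every other position
    scores = q @ k.T / math.sqrt(k.shape[-1])   # (seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)      # each row sums to 1
    return weights @ v, weights

torch.manual_seed(0)
seq_len, d_model, d_k = 5, 8, 4                  # assumed sizes for the demo
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)            # one d_k-dimensional output per input position
print(weights.sum(dim=-1))  # attention weights in each row sum to 1
```

Each row of weights describes how much one position draws on every other position, which is how context from the entire input reaches each token.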
Basic Usage Example with the transformers package
from transformers import pipeline

# Using a pre-trained model
generator = pipeline('text-generation', model='gpt2')
generated_text = generator("Today is a beautiful day and", max_length=30)
print(generated_text[0]['generated_text'])
Fine-tuning a transformer model allows for customization to specific tasks or datasets. For example, fine-tuning a model on Python Gekko data involves adjusting the model to better understand and generate Python code related to the Gekko library. For detailed steps and code examples for fine-tuning the phi2-microsoft model with training data for Python Gekko, refer to this GitHub repository:
Transformers in Time-Series Forecasting
In time-series forecasting, transformers are used to analyze sequential data, capturing temporal dependencies. They are particularly effective in scenarios where long-range dependencies are important.
Data Generation: This section of the script creates synthetic time-series data by sampling a sine wave, which stands in for a real-world dataset. The data is split into overlapping sequences of a specified length, and each sequence is paired with the next point in the series as its prediction target. This mirrors the common time-series setup in which past values are used to predict future values.
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Generating synthetic time-series data
def generate_data(size=1000, sequence_length=10):
    data = np.sin(np.linspace(0, 10 * np.pi, size))  # Sine wave data
    # Each window of sequence_length points is paired with the point that follows it
    sequences = [data[i:i+sequence_length] for i in range(size-sequence_length)]
    next_points = data[sequence_length:]
    return np.array(sequences), next_points
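The windowing step is easier to follow on a tiny array. The sketch below applies the same slicing to the integers 0 to 5 with a window length of 3; the helper name make_windows is hypothetical and exists only for this illustration.

```python
import numpy as np

def make_windows(data, sequence_length):
    """Pair each window of sequence_length values with the value that follows it."""
    n = len(data) - sequence_length
    sequences = np.array([data[i:i + sequence_length] for i in range(n)])
    next_points = data[sequence_length:]
    return sequences, next_points

data = np.arange(6)  # [0 1 2 3 4 5]
sequences, next_points = make_windows(data, sequence_length=3)
print(sequences)     # rows: [0 1 2], [1 2 3], [2 3 4]
print(next_points)   # [3 4 5]
```

Row i of sequences holds data[i] through data[i+2], and next_points[i] is data[i+3], so inputs and targets stay aligned by index.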
Custom Dataset Class: This part defines a custom Dataset class for handling the time-series data, making it compatible with PyTorch's data handling utilities. The class, TimeSeriesDataset, takes sequences and their corresponding next points, and implements methods to allow easy access to these data points and their labels (the next points in the series). This class is essential for feeding the data into a DataLoader for batching and shuffling during training.
class TimeSeriesDataset(Dataset):
    def __init__(self, sequences, next_points):
        self.sequences = sequences
        self.next_points = next_points

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return self.sequences[idx], self.next_points[idx]
Transformer Model Definition: In this section, a transformer model specifically tailored for numerical time-series data is defined. The TransformerModel class extends PyTorch's nn.Module. It's designed to process sequences of numerical data using a transformer encoder, which is capable of capturing temporal dependencies in the data. The model includes a fully connected output layer to make the final prediction of the next point in the series.
class TransformerModel(nn.Module):
    def __init__(self, input_size=1, sequence_length=10, num_layers=1,
                 num_heads=2, dim_feedforward=512):
        super(TransformerModel, self).__init__()
        self.sequence_length = sequence_length
        # d_model must be divisible by num_heads
        self.encoder_layer = nn.TransformerEncoderLayer(
            d_model=input_size*sequence_length, nhead=num_heads,
            dim_feedforward=dim_feedforward)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer,
                                                         num_layers=num_layers)
        self.fc_out = nn.Linear(input_size * sequence_length, 1)

    def forward(self, src):
        # Reshape to match the input dimensions
        src = src.reshape(-1, self.sequence_length, 1)
        src = src.flatten(start_dim=1)  # (batch, sequence_length)
        src = src.unsqueeze(0)  # Add leading sequence dimension: (1, batch, d_model)
        out = self.transformer_encoder(src)
        out = out.squeeze(0)  # Remove sequence dimension: (batch, d_model)
        return self.fc_out(out)
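The reshaping in forward is easiest to check by tracing tensor shapes. The sketch below repeats the same steps outside the class for an assumed batch of 32 random windows. Because nn.TransformerEncoderLayer defaults to batch_first=False, the encoder reads its input as (sequence, batch, feature), so each 10-point window is passed as a single feature vector of size d_model=10.

```python
import torch
import torch.nn as nn

sequence_length, num_heads = 10, 2
assert sequence_length % num_heads == 0  # d_model must be divisible by nhead

encoder_layer = nn.TransformerEncoderLayer(
    d_model=sequence_length, nhead=num_heads, dim_feedforward=512)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
fc_out = nn.Linear(sequence_length, 1)

batch = torch.randn(32, sequence_length)     # assumed batch of 32 windows
src = batch.reshape(-1, sequence_length, 1)  # (32, 10, 1)
src = src.flatten(start_dim=1)               # (32, 10)
src = src.unsqueeze(0)                       # (1, 32, 10) = (sequence, batch, feature)
out = encoder(src)                           # (1, 32, 10)
pred = fc_out(out.squeeze(0))                # (32, 1): one prediction per window
print(src.shape, out.shape, pred.shape)
```

The final (32, 1) shape is why the training targets are given a trailing dimension with unsqueeze(1) before the loss is computed.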
Data Preparation for Training: This part of the script involves instantiating the dataset and DataLoader. The DataLoader is used to batch and shuffle the dataset, making it ready for efficient training.
sequences, next_points = generate_data()
dataset = TimeSeriesDataset(sequences, next_points)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
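It can help to inspect one batch before training. The sketch below rebuilds the same data inline so that it runs on its own, then prints the shape and dtype of a batch. The sequences arrive as float64 tensors because NumPy defaults to double precision, which is one reason a training loop would convert them with .float() to match the model's float32 parameters.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TimeSeriesDataset(Dataset):
    def __init__(self, sequences, next_points):
        self.sequences = sequences
        self.next_points = next_points
    def __len__(self):
        return len(self.sequences)
    def __getitem__(self, idx):
        return self.sequences[idx], self.next_points[idx]

# Same sine-wave windows as generate_data(), rebuilt inline for this sketch
data = np.sin(np.linspace(0, 10 * np.pi, 1000))
sequences = np.array([data[i:i + 10] for i in range(1000 - 10)])
next_points = data[10:]

dataloader = DataLoader(TimeSeriesDataset(sequences, next_points),
                        batch_size=32, shuffle=True)
seq, next_point = next(iter(dataloader))
print(seq.shape, seq.dtype)   # (32, 10) windows, float64 from NumPy
print(next_point.shape)       # (32,) targets, one per window
print(len(dataloader))        # 31 batches: 30 full batches plus one of 30 samples
```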
Model Training: Here, the model is trained using the time-series data. The training loop involves passing batches of data through the model, calculating the loss (using Mean Squared Error as the criterion), and updating the model's weights with backpropagation. The optimizer used is Adam, a popular choice for training neural networks.
model = TransformerModel()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(9):  # Number of epochs
    for seq, next_point in dataloader:
        seq, next_point = seq.float(), next_point.float().unsqueeze(1)
        output = model(seq)
        loss = criterion(output, next_point)
        optimizer.zero_grad()  # Clear gradients from the previous step
        loss.backward()        # Backpropagation
        optimizer.step()       # Update the weights
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")
Epoch 1, Loss: 0.057174958288669586
Epoch 2, Loss: 0.030068831518292427
Epoch 3, Loss: 0.011044860817492008
Epoch 4, Loss: 0.034356195479631424
Epoch 5, Loss: 0.029425013810396194
Epoch 6, Loss: 0.10149335861206055
Epoch 7, Loss: 0.007862072438001633
Epoch 8, Loss: 0.0072705368511378765
Epoch 9, Loss: 0.008393393829464912
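The loss reported for each epoch is the loss of the last mini-batch only, which likely explains why the values jump between epochs. One possible alternative is to report the mean loss over the whole epoch, as sketched below. The helper train_one_epoch is a hypothetical name, and a single nn.Linear layer with a TensorDataset stands in for the transformer and the custom dataset so that the example stays short and runs quickly.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, dataloader, criterion, optimizer):
    """Run one pass over the data and return the sample-weighted mean loss."""
    model.train()
    total, count = 0.0, 0
    for seq, next_point in dataloader:
        seq, next_point = seq.float(), next_point.float().unsqueeze(1)
        loss = criterion(model(seq), next_point)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item() * len(seq)  # weight by batch size
        count += len(seq)
    return total / count

torch.manual_seed(0)
data = np.sin(np.linspace(0, 10 * np.pi, 1000))
sequences = np.array([data[i:i + 10] for i in range(990)])
dataset = TensorDataset(torch.from_numpy(sequences), torch.from_numpy(data[10:]))
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Linear(10, 1)  # stand-in model (assumption), not the transformer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

history = [train_one_epoch(model, dataloader, criterion, optimizer) for _ in range(3)]
for epoch, mean_loss in enumerate(history, start=1):
    print(f"Epoch {epoch}, Mean loss: {mean_loss:.6f}")
```

Averaging over all batches gives a smoother curve, which makes it easier to judge whether training is converging.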
Prediction: In the final part of the script, the trained model is used to make a prediction. It takes a sequence from the dataset and predicts the next data point. This demonstrates the model's ability to perform the intended time-series forecasting task.
test_seq = torch.tensor(sequences[0]).float()
predicted_point = model(test_seq)
print("Predicted next point:", predicted_point.item())
Predicted next point: 0.3461025655269623
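A single-step predictor can also be rolled forward to forecast several points by feeding each prediction back into the input window. The sketch below shows one way to do this. The function rollout and the stand-in mean_model are hypothetical names used for illustration. If the model is an nn.Module, it is switched to eval mode first so that dropout does not add noise to the forecast.

```python
import torch

def rollout(model, seed, steps):
    """Forecast `steps` points by feeding each prediction back into the window."""
    if isinstance(model, torch.nn.Module):
        model.eval()  # disable dropout for deterministic predictions
    window = seed.clone().float()
    preds = []
    with torch.no_grad():  # no gradients needed for inference
        for _ in range(steps):
            next_value = model(window).reshape(-1)[0]
            preds.append(next_value.item())
            # Drop the oldest point and append the new prediction
            window = torch.cat([window[1:], next_value.reshape(1)])
    return preds

# Hypothetical stand-in model: predicts the mean of the current window
mean_model = lambda w: w.mean()
preds = rollout(mean_model, torch.tensor([1.0, 2.0, 3.0]), steps=2)
print(preds)  # first value is mean(1, 2, 3) = 2.0, second is mean(2, 3, 2)
```

With a trained model from this page, a call such as rollout(model, test_seq, steps=20) would follow the same pattern. Errors accumulate as predictions are fed back in, so long rollouts tend to drift from the true series.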
The complete code below generates synthetic data, trains the transformer, and predicts the next point as a time-series forecast.
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Generating synthetic time-series data
def generate_data(size=1000, sequence_length=10):
    data = np.sin(np.linspace(0, 10 * np.pi, size))  # Sine wave data
    sequences = [data[i:i+sequence_length] for i in range(size-sequence_length)]
    next_points = data[sequence_length:]
    return np.array(sequences), next_points

# Custom dataset class
class TimeSeriesDataset(Dataset):
    def __init__(self, sequences, next_points):
        self.sequences = sequences
        self.next_points = next_points

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return self.sequences[idx], self.next_points[idx]

# Transformer Model (simplified for numerical data)
class TransformerModel(nn.Module):
    def __init__(self, input_size=1, sequence_length=10, num_layers=1,
                 num_heads=2, dim_feedforward=512):
        super(TransformerModel, self).__init__()
        self.sequence_length = sequence_length
        # d_model must be divisible by num_heads
        self.encoder_layer = nn.TransformerEncoderLayer(
            d_model=input_size*sequence_length, nhead=num_heads,
            dim_feedforward=dim_feedforward)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer,
                                                         num_layers=num_layers)
        self.fc_out = nn.Linear(input_size * sequence_length, 1)

    def forward(self, src):
        # Reshape to match the input dimensions
        src = src.reshape(-1, self.sequence_length, 1)
        src = src.flatten(start_dim=1)  # (batch, sequence_length)
        src = src.unsqueeze(0)  # Add leading sequence dimension: (1, batch, d_model)
        out = self.transformer_encoder(src)
        out = out.squeeze(0)  # Remove sequence dimension: (batch, d_model)
        return self.fc_out(out)

# Prepare data
sequences, next_points = generate_data()
dataset = TimeSeriesDataset(sequences, next_points)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Model
model = TransformerModel()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(9):  # Number of epochs
    for seq, next_point in dataloader:
        seq, next_point = seq.float(), next_point.float().unsqueeze(1)
        output = model(seq)
        loss = criterion(output, next_point)
        optimizer.zero_grad()  # Clear gradients from the previous step
        loss.backward()        # Backpropagation
        optimizer.step()       # Update the weights
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")

# Predict the next point after a sequence
test_seq = torch.tensor(sequences[0]).float()
predicted_point = model(test_seq)
print("Predicted next point:", predicted_point.item())
Further Reading
- Attention Is All You Need - Seminal paper introducing transformers.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - Paper introducing BERT.
- Language Models are Few-Shot Learners - Paper on the GPT-3 model.
- Transformers in Time-Series: A Survey
- How to Apply Transformers to Time Series Models - Medium Article by Intel Tech.
- Time Series Transformer - Documentation on HuggingFace.
✅ Knowledge Check
1. What is the primary mechanism that transformers use to process text in language models?
- Answer: Transformers use self-attention mechanisms to process text.
2. Why are transformers effective in time-series forecasting?
- Answer: Transformers are effective because they can capture long-range dependencies in sequential data.
- The effectiveness of transformers is not primarily due to reduced data requirements.
- Failing to capture long-range dependencies because of vanishing gradients is a disadvantage of LSTM models.