TCLab with Reinforcement Learning
Main.RLTCLab History
Added lines 92-93:
if action.dim() == 1: # If action is 1D, reshape it to (batch_size, action_dim)
action = action.unsqueeze(1)
Changed lines 113-115 from:
return torch.tensor(states, dtype=torch.float32), torch.tensor(actions, dtype=torch.float32), \
torch.tensor(rewards, dtype=torch.float32).unsqueeze(1), \
torch.tensor(next_states, dtype=torch.float32), torch.tensor(dones, dtype=torch.float32).unsqueeze(1)
to:
return (
torch.tensor(states, dtype=torch.float32),
torch.tensor(actions, dtype=torch.float32),
torch.tensor(rewards, dtype=torch.float32).unsqueeze(1),
torch.tensor(next_states, dtype=torch.float32),
torch.tensor(dones, dtype=torch.float32).unsqueeze(1)
)
def __len__(self): # Add this method
return len(self.buffer)
Added lines 128-130:
# Initialize Gymnasium environment
env = TCLabEnv(setpoint=50)
Changed lines 151-152 from:
action = actor(state_tensor).detach().numpy()
next_state, reward, done, _, _ = env.step(action)
to:
action = actor(state_tensor).detach().cpu().numpy().flatten()[0] # Convert to scalar
next_state, reward, done, _, _ = env.step([action]) # Wrap action in list
Added lines 97-98:
Implements a memory buffer that stores experience tuples (state, action, reward, next state, done). The replay buffer randomly samples batches of past experiences, which breaks the correlation between consecutive samples and stabilizes training of the neural networks.
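A minimal usage sketch of the buffer (assumes the ReplayBuffer class defined later on this page, including the added __len__ method; the numeric values are illustrative only):
(:source lang=python:)
# Sketch: how the replay buffer is filled and sampled during training
buffer = ReplayBuffer(capacity=10000)

# one transition is added per environment step (illustrative values)
buffer.add(state=[29.5], action=[40.0], reward=-20.5,
           next_state=[30.1], done=False)

# train only after enough experience is collected (__len__ makes len(buffer) work)
if len(buffer) > 64:
    states, actions, rewards, next_states, dones = buffer.sample(64)
    # states, actions: (64, 1) tensors; rewards, dones: (64, 1) after unsqueeze(1)
(:sourceend:)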
Added lines 119-125:
Executes the main training process where the RL agent interacts with the TCLab environment over multiple episodes. In each step:
* The actor selects heater power actions.
* Experience data is collected and stored in the replay buffer.
* The actor and critic networks are trained using sampled experiences.
* Target networks are softly updated to stabilize learning (see the soft-update sketch below).
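A minimal sketch of that soft (Polyak) target update, mirroring the training code later on this page (assumes the actor, critic, and target networks and the tau value defined there):
(:source lang=python:)
# Soft target update: theta_target <- tau*theta + (1 - tau)*theta_target
tau = 0.005  # small tau means slow-moving targets and more stable learning
for param, target_param in zip(critic.parameters(), target_critic.parameters()):
    target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)
for param, target_param in zip(actor.parameters(), target_actor.parameters()):
    target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)
(:sourceend:)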
Changed lines 21-22 from:
Define a custom Gymnasium environment to interface with TCLab.
to:
Define a custom Gymnasium environment to interface with TCLab. This class connects the TCLab (Temperature Control Lab) hardware to Python through the Gymnasium API, allowing the RL agent to apply heater actions and receive temperature readings from the sensors.
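A minimal interaction sketch with this environment (assumes the TCLabEnv class defined later on this page; the constant 40% heater action is for illustration only):
(:source lang=python:)
# Sketch: basic reset/step loop with the custom environment
env = TCLabEnv(setpoint=50)
state, info = env.reset()              # state = [T1]
for _ in range(10):
    action = [40.0]                    # constant 40% heater power (illustration only)
    state, reward, done, truncated, info = env.step(action)
    print(f"T1 = {state[0]:.1f} C, reward = {reward:.1f}")
env.close()
(:sourceend:)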
Changed lines 59-62 from:
Define the '''Actor''' and '''Critic''' neural networks.
to:
Define the '''Actor''' and '''Critic''' neural networks. This code creates two neural network classes using PyTorch (a short usage sketch follows the list):
* Actor: Determines the control action (heater power level) based on the current temperature.
* Critic: Evaluates the quality of actions by estimating the expected future rewards from a given state-action pair.
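A minimal usage sketch of the two networks (assumes the Actor and Critic classes and the torch import from the code on this page; the temperature value is illustrative):
(:source lang=python:)
# Sketch: instantiate the networks for the 1-state, 1-action TCLab problem
actor = Actor(state_dim=1, action_dim=1, max_action=100)
critic = Critic(state_dim=1, action_dim=1)

state = torch.tensor([[30.0]])     # batch of one temperature reading
action = actor(state)              # heater power in [0, 100] via the scaled sigmoid
q_value = critic(state, action)    # estimated return for this state-action pair
(:sourceend:)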
Changed lines 171-172 from:
This RL implementation successfully controls TCLab temperature using DDPG with PyTorch. The agent learns to adjust heater power to maintain the temperature setpoint with minimal error.
to:
This RL implementation attempts to control the TCLab temperature using DDPG with PyTorch. The agent learns to adjust heater power to maintain the temperature setpoint with minimal error.
Added line 179:
* Convert from SISO to MIMO control
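One possible starting point for that MIMO extension (a sketch only, not part of the page's code): widen the Gymnasium spaces to two heaters and two temperature sensors, for example:
(:source lang=python:)
import gymnasium as gym
import numpy as np

# Sketch: 2-heater / 2-sensor Box spaces for a MIMO version of TCLabEnv
action_space = gym.spaces.Box(low=np.zeros(2, dtype=np.float32),
                              high=np.full(2, 100.0, dtype=np.float32),
                              dtype=np.float32)
observation_space = gym.spaces.Box(low=np.zeros(2, dtype=np.float32),
                                   high=np.full(2, 100.0, dtype=np.float32),
                                   dtype=np.float32)
(:sourceend:)
In step(), action[0] and action[1] would then drive Q1 and Q2 separately, and the observation would return both T1 and T2.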
Changed line 5 from:
%width=15px%Attach:github.png [[https://github.com/APMonitor/dynopt/blob/master/RL_for_Engineers.ipynb|GitHub]] | %width=20px%Attach:colab.png [[https://colab.research.google.com/github/APMonitor/dynopt/blob/master/RL_for_Engineers.ipynb|Google Colab]]
to:
%width=15px%Attach:github.png [[https://github.com/APMonitor/dynopt/blob/master/TCLab_RL.ipynb|GitHub]] | %width=20px%Attach:colab.png [[https://colab.research.google.com/github/APMonitor/dynopt/blob/master/TCLab_RL.ipynb|Google Colab]]
Changed line 32 from:
self.lab = tclab.TCLab() # Connect to TCLab hardware
to:
self.lab = tclab.TCLabModel() # Digital twin simulator; use tclab.TCLab() to connect to real hardware
Changed line 1 from:
(:title TCLab Temperature Control with Reinforcement Learning:)
to:
(:title TCLab with Reinforcement Learning:)
Changed line 59 from:
Define the **Actor** and **Critic** neural networks.
to:
Define the '''Actor''' and '''Critic''' neural networks.
Changed lines 7-20 from:
This
The
- **
- **
The RL agent
We define
to:
This page demonstrates a Reinforcement Learning (RL) approach for controlling the Temperature Control Lab (TCLab) using a Deep Deterministic Policy Gradient (DDPG) algorithm in PyTorch. The RL agent learns to adjust the heater power to maintain a desired temperature setpoint.
'''TCLab Environment'''
The [[https://apmonitor.com/heat.htm|TCLab]] is an Arduino-based temperature control system with:
* Two heaters
* Two temperature sensors
* Python / MATLAB / Simulink interface
The RL agent learns to control heater power to maintain a temperature setpoint.
'''Gymnasium Custom Environment'''
Define a custom Gymnasium environment to interface with TCLab.
Changed lines 56-58 from:
We define
to:
'''Actor-Critic Networks (PyTorch)'''
Define the **Actor** and **Critic** neural networks.
Changed lines 92-93 from:
to:
'''Replay Buffer'''
Changed lines 113-114 from:
to:
'''Training Loop'''
Changed lines 171-179 from:
This
- **
- **
- **
This guide provides a
to:
This RL implementation successfully controls TCLab temperature using DDPG with PyTorch. The agent learns to adjust heater power to maintain the temperature setpoint with minimal error.
'''Next Steps'''
* Train on real TCLab hardware
* Optimize hyperparameters
* Compare RL vs. PID control
This guide provides a step-by-step RL implementation for process control applications using TCLab.
Added lines 1-173:
(:title TCLab Temperature Control with Reinforcement Learning:)
(:keywords TCLab, reinforcement learning, RL, Gymnasium, DDPG, PyTorch, process control:)
(:description Implementing Deep Deterministic Policy Gradient (DDPG) with PyTorch for temperature control using the TCLab environment:)
%width=15px%Attach:github.png [[https://github.com/APMonitor/dynopt/blob/master/RL_for_Engineers.ipynb|GitHub]] | %width=20px%Attach:colab.png [[https://colab.research.google.com/github/APMonitor/dynopt/blob/master/RL_for_Engineers.ipynb|Google Colab]]
## **Introduction**
This page demonstrates a **Reinforcement Learning (RL) approach** for **controlling the Temperature Control Lab (TCLab)** using the **Deep Deterministic Policy Gradient (DDPG) algorithm in PyTorch**. The RL agent learns to adjust the **heater power** to maintain a desired temperature setpoint.
## **TCLab Environment**
The **TCLab** is an Arduino-based **temperature control system** with:
- **Two heaters**
- **Two temperature sensors**
- **Python interface via the tclab package**
The RL agent will learn to control **heater power** to maintain a **temperature setpoint**.
### **Gymnasium Custom Environment**
We define a **custom Gymnasium environment** to interface with TCLab.
(:source lang=python:)
import gymnasium as gym
import numpy as np
import torch
import tclab
class TCLabEnv(gym.Env):
def __init__(self, setpoint=50):
super(TCLabEnv, self).__init__()
self.lab = tclab.TCLab() # Connect to TCLab hardware
self.setpoint = setpoint
self.action_space = gym.spaces.Box(low=np.array([0]), high=np.array([100]), dtype=np.float32)
self.observation_space = gym.spaces.Box(low=np.array([0]), high=np.array([100]), dtype=np.float32)
def reset(self):
self.lab.Q1(0) # Turn off heater
self.lab.Q2(0)
return np.array([self.lab.T1]), {}
def step(self, action):
self.lab.Q1(action[0]) # Apply action
self.lab.Q2(action[0])
temperature = self.lab.T1 # Read temperature
reward = -abs(temperature - self.setpoint) # Reward: minimize error
done = False # No terminal state in continuous control
return np.array([temperature]), reward, done, False, {}
def close(self):
self.lab.Q1(0)
self.lab.Q2(0)
self.lab.close()
(:sourceend:)
## **Actor-Critic Networks (PyTorch)**
We define the **Actor** and **Critic** neural networks.
(:source lang=python:)
import torch.nn as nn
import torch.optim as optim
class Actor(nn.Module):
def __init__(self, state_dim, action_dim, max_action):
super(Actor, self).__init__()
self.net = nn.Sequential(
nn.Linear(state_dim, 128), nn.ReLU(),
nn.Linear(128, 128), nn.ReLU(),
nn.Linear(128, action_dim),
nn.Sigmoid() # Output between 0 and 1
)
self.max_action = max_action
def forward(self, state):
return self.max_action * self.net(state)
class Critic(nn.Module):
def __init__(self, state_dim, action_dim):
super(Critic, self).__init__()
self.net = nn.Sequential(
nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
nn.Linear(128, 128), nn.ReLU(),
nn.Linear(128, 1)
)
def forward(self, state, action):
return self.net(torch.cat([state, action], dim=1))
(:sourceend:)
## **Replay Buffer**
(:source lang=python:)
import random
from collections import deque
class ReplayBuffer:
def __init__(self, capacity=10000):
self.buffer = deque(maxlen=capacity)
def add(self, state, action, reward, next_state, done):
self.buffer.append((state, action, reward, next_state, done))
def sample(self, batch_size):
batch = random.sample(self.buffer, batch_size)
states, actions, rewards, next_states, dones = zip(*batch)
return torch.tensor(states, dtype=torch.float32), torch.tensor(actions, dtype=torch.float32), \
torch.tensor(rewards, dtype=torch.float32).unsqueeze(1), \
torch.tensor(next_states, dtype=torch.float32), torch.tensor(dones, dtype=torch.float32).unsqueeze(1)
(:sourceend:)
## **Training Loop**
(:source lang=python:)
actor = Actor(state_dim=1, action_dim=1, max_action=100)
critic = Critic(state_dim=1, action_dim=1)
target_actor = Actor(state_dim=1, action_dim=1, max_action=100)
target_critic = Critic(state_dim=1, action_dim=1)
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())
actor_optimizer = optim.Adam(actor.parameters(), lr=1e-3)
critic_optimizer = optim.Adam(critic.parameters(), lr=1e-3)
buffer = ReplayBuffer()
gamma = 0.99
tau = 0.005
for episode in range(100):
state, _ = env.reset()
episode_reward = 0
for step in range(200):
state_tensor = torch.tensor(state, dtype=torch.float32).unsqueeze(0)
action = actor(state_tensor).detach().numpy()
next_state, reward, done, _, _ = env.step(action)
buffer.add(state, action, reward, next_state, done)
state = next_state
episode_reward += reward
if len(buffer) > 64:
states, actions, rewards, next_states, dones = buffer.sample(64)
with torch.no_grad():
next_actions = target_actor(next_states)
target_q = target_critic(next_states, next_actions)
target_q = rewards + gamma * (1 - dones) * target_q
current_q = critic(states, actions)
critic_loss = nn.MSELoss()(current_q, target_q)
critic_optimizer.zero_grad()
critic_loss.backward()
critic_optimizer.step()
actor_loss = -critic(states, actor(states)).mean()
actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()
for param, target_param in zip(critic.parameters(), target_critic.parameters()):
target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)
for param, target_param in zip(actor.parameters(), target_actor.parameters()):
target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)
print(f"Episode {episode+1}: Reward = {episode_reward:.2f}")
env.close()
(:sourceend:)
## **Conclusion**
This RL implementation successfully controls **TCLab temperature** using **DDPG with PyTorch**. The agent learns to **adjust heater power** to maintain the **temperature setpoint** with **minimal error**.
### **Next Steps**
- **Train on real TCLab hardware**
- **Optimize hyperparameters**
- **Compare RL vs. PID control**
This guide provides a **step-by-step RL implementation** for process control applications using **TCLab**.