This project explores the challenges and opportunities of Real-Time Reinforcement Learning in simulated autonomous driving environments. Traditional RL algorithms assume that the environment remains static during action selection, which breaks down in real-world applications where computation time introduces delays.
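To make the mismatch concrete, the effect of computation delay can be emulated with a small Gymnasium wrapper that applies each action one step late. This is a minimal illustrative sketch, not part of the project's codebase; the wrapper name and the random placeholder action on reset are assumptions.

```python
import gymnasium as gym

class OneStepActionDelay(gym.Wrapper):
    """Emulates a policy whose computation takes one full environment step:
    the action chosen at time t is only applied at time t+1 (hypothetical sketch)."""

    def __init__(self, env):
        super().__init__(env)
        self._pending_action = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # Nothing has been computed yet; use an arbitrary placeholder action.
        self._pending_action = self.env.action_space.sample()
        return obs, info

    def step(self, action):
        # The environment advances under the previously chosen action,
        # while the freshly chosen action waits until the next step.
        applied, self._pending_action = self._pending_action, action
        return self.env.step(applied)
```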

Research Questions

Key Findings

Methodology

We address computation delays by conditioning the policy on both the previous state and the previous action when sampling a new action. Instead of the classical policy π(a_t | s_t), we use π(a_t | s_{t-1}, a_{t-1}), allowing the policy to learn to predict the evolution of the state implicitly within the model.
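A minimal PyTorch sketch of such a delay-conditioned policy is shown below, assuming a continuous action space; the class name, layer sizes, and Gaussian parameterization are illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class DelayAwarePolicy(nn.Module):
    """Gaussian policy conditioned on the previous state and action,
    i.e. pi(a_t | s_{t-1}, a_{t-1}) (hypothetical sketch)."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, prev_state, prev_action):
        # Concatenating s_{t-1} and a_{t-1} lets the network implicitly
        # model how the state evolves while a_{t-1} is still executing.
        h = self.net(torch.cat([prev_state, prev_action], dim=-1))
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        return torch.distributions.Normal(self.mu(h), std)

# At time t, a_t is sampled from (s_{t-1}, a_{t-1}), so inference latency
# overlaps with the step in which a_{t-1} is executed:
# dist = policy(prev_state, prev_action); a_t = dist.sample()
```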

Demo & Resources

Authors

Guillaume Gagné-Labelle, Gabriel Sasseville, Nicolas Bosteels (2025)

Technologies

PyTorch, Gymnasium, Duckietown, Duckiematrix, TensorRT