Publications

For news about publications, follow us on X:

Click on any author names or tags to filter publications.

All topic tags:
survey deep-rl multi-agent-rl agent-modelling ad-hoc-teamwork autonomous-driving goal-recognition explainable-ai causal generalisation security emergent-communication iterated-learning intrinsic-reward simulator state-estimation deep-learning transfer-learning

Selected tags (click to remove):
TMLR deep-rl

2025

Zidu Yin, Zhen Zhang, Dong Gong, Stefano V. Albrecht, Javen Q. Shi
Highway Graph to Accelerate Reinforcement Learning
Transactions on Machine Learning Research, 2025
Abstract | BibTex | arXiv | Code
TMLR deep-rl

Abstract: Reinforcement Learning (RL) algorithms often suffer from low training efficiency. A strategy to mitigate this issue is to incorporate a model-based planning algorithm, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environmental model. The major limitation of VI is the need to iterate over a large tensor with the shape |S| × |A| × |S|, where S/A denotes the state/action space. This process iteratively updates the value of the preceding state st−1 based on the state st in one step via value propagation. These still lead to intensive computations. We focus on improving the training efficiency of RL algorithms by improving the efficiency of the value learning process. For the deterministic environments with discrete state and action spaces, on the sampled empirical state-transition graph, a non-branching sequence of transitions can directly bring the agent from s0 to sT without deviating from intermediate states, which we call a highway. On such non-branching highways, the value-updating process can be merged as a one-step process instead of iterating the value step-by-step. Based on this observation, we propose a novel graph structure, named highway graph, to model the state transition. Our highway graph compresses the transition model into a concise graph, where edges can represent multiple state transitions to support value propagation across multiple time steps in each iteration. We thus can obtain a more efficient value learning approach by facilitating the VI algorithm on highway graphs. By integrating the highway graph into RL (as a model-based off-policy RL method), the RL training can be remarkably accelerated in the early stages (within 1 million frames). Comparison against various baselines on four categories of environments reveals that our method outperforms both representative and novel model-free and model-based RL algorithms, demonstrating 10 to more than 150 times more efficiency while maintaining an equal or superior expected return, as confirmed by carefully conducted analyses. Moreover, a deep neural network-based agent is trained using the highway graph, resulting in better generalization and lower storage costs.

@article{yin2025highway,
   title = {Highway Graph to Accelerate Reinforcement Learning},
   author = {Zidu Yin and Zhen Zhang and Dong Gong and Stefano V. Albrecht and Javen Q. Shi},
   journal = {Transactions on Machine Learning Research (TMLR)},
   year = {2025}
}

2024

Trevor McInroe, Lukas Schäfer, Stefano V. Albrecht
Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning
Transactions on Machine Learning Research, 2024
Abstract | BibTex | arXiv | Code
TMLR deep-rl

@article{mcinroe2024hksl,
   title = {Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning},
   author = {Trevor McInroe and Lukas Schäfer and Stefano V. Albrecht},
   journal = {Transactions on Machine Learning Research (TMLR)},
   year = {2024}
}

2023

Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
Transactions on Machine Learning Research, 2023
Abstract | BibTex | arXiv | Code
TMLR deep-rl multi-agent-rl

@article{christianos2023pareto,
   title={Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning},
   author={Filippos Christianos and Georgios Papoudakis and Stefano V. Albrecht},
   journal={Transactions on Machine Learning Research (TMLR)},
   year={2023}
}

Arrasy Rahman, Elliot Fosong, Ignacio Carlucho, Stefano V. Albrecht
Generating Teammates for Training Robust Ad Hoc Teamwork Agents via Best-Response Diversity
Transactions on Machine Learning Research, 2023
Abstract | BibTex | arXiv | Code
TMLR ad-hoc-teamwork multi-agent-rl deep-rl

@article{rahman2023BRDiv,
   title={Generating Teammates for Training Robust Ad Hoc Teamwork Agents via Best-Response Diversity},
   author={Arrasy Rahman and Elliot Fosong and Ignacio Carlucho and Stefano V. Albrecht},
   journal={Transactions on Machine Learning Research (TMLR)},
   year={2023}
}