Publications
For news about publications, follow us on X:
Click on any author names or tags to filter publications.
All topic tags:
surveydeep-rlmulti-agent-rlagent-modellingad-hoc-teamworkautonomous-drivinggoal-recognitionexplainable-aicausalgeneralisationsecurityemergent-communicationiterated-learningintrinsic-rewardsimulatorstate-estimationdeep-learningtransfer-learning
Selected tags (click to remove):
Samuel-Garcin
2024
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
International Conference on Machine Learning, 2024
Abstract | BibTex | arXiv
ICMLdeep-rl
Abstract:
Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when they share characteristics with the environments they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the implicit regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which have more control over the data generation mechanism. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained over an initial set of level parameters, reducing distributional shift, and achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.
@inproceedings{garcin2024dred,
title={{DRED}: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design},
author={Samuel Garcin and James Doran and Shangmin Guo and Christopher G. Lucas and Stefano V. Albrecht},
year={2024},
booktitle={International Conference on Machine Learning (ICML)}
}
2023
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
NeurIPS Workshop on Agent Learning in Open-Endedness, 2023
Abstract | BibTex | arXiv
NeurIPSdeep-rl
Abstract:
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
@inproceedings{garcin2023level,
title={How the level sampling process impacts zero-shot generalisation in deep reinforcement learning},
author={Samuel Garcin and James Doran and Shangmin Guo and Christopher G. Lucas and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Agent Learning in Open-Endedness},
year={2023}
}
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
arXiv:2310.03494, 2023
Abstract | BibTex | arXiv
deep-rl
Abstract:
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
@misc{garcin2023level,
title={How the level sampling process impacts zero-shot generalisation in deep reinforcement learning},
author={Samuel Garcin and James Doran and Shangmin Guo and Christopher G. Lucas and Stefano V. Albrecht},
year={2023},
eprint={2310.03494},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
2022
Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, Filippos Christianos, Mhairi Dunion, Elliot Fosong, Samuel Garcin, Shangmin Guo, Balint Gyevnar, Trevor McInroe, Georgios Papoudakis, Arrasy Rahman, Lukas Schäfer, Massimiliano Tamborski, Giuseppe Vecchio, Cheng Wang, Stefano V. Albrecht
Deep Reinforcement Learning for Multi-Agent Interaction
AI Communications, 2022
Abstract | BibTex | arXiv | Publisher
AICsurveydeep-rlmulti-agent-rlad-hoc-teamworkagent-modellinggoal-recognitionsecurityexplainable-aiautonomous-driving
Abstract:
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
@article{albrecht2022aic,
author = {Ahmed, Ibrahim H. and Brewitt, Cillian and Carlucho, Ignacio and Christianos, Filippos and Dunion, Mhairi and Fosong, Elliot and Garcin, Samuel and Guo, Shangmin and Gyevnar, Balint and McInroe, Trevor and Papoudakis, Georgios and Rahman, Arrasy and Schäfer, Lukas and Tamborski, Massimiliano and Vecchio, Giuseppe and Wang, Cheng and Albrecht, Stefano V.},
title = {Deep Reinforcement Learning for Multi-Agent Interaction},
journal = {AI Communications, Special Issue on Multi-Agent Systems Research in the UK},
year = {2022}
}
2021
Cillian Brewitt, Balint Gyevnar, Samuel Garcin, Stefano V. Albrecht
GRIT: Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021
Abstract | BibTex | arXiv | Video | Code
IROSautonomous-drivinggoal-recognitionexplainable-ai
Abstract:
It is important for autonomous vehicles to have the ability to infer the goals of other vehicles (goal recognition), in order to safely interact with other vehicles and predict their future trajectories. This is a difficult problem, especially in urban environments with interactions between many vehicles. Goal recognition methods must be fast to run in real time and make accurate inferences. As autonomous driving is safety-critical, it is important to have methods which are human interpretable and for which safety can be formally verified. Existing goal recognition methods for autonomous vehicles fail to satisfy all four objectives of being fast, accurate, interpretable and verifiable. We propose Goal Recognition with Interpretable Trees (GRIT), a goal recognition system which achieves these objectives. GRIT makes use of decision trees trained on vehicle trajectory data. We evaluate GRIT on two datasets, showing that GRIT achieved fast inference speed and comparable accuracy to two deep learning baselines, a planning-based goal recognition method, and an ablation of GRIT. We show that the learned trees are human interpretable and demonstrate how properties of GRIT can be formally verified using a satisfiability modulo theories (SMT) solver.
@inproceedings{brewitt2021grit,
title={{GRIT:} Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving},
author={Cillian Brewitt and Balint Gyevnar and Samuel Garcin and Stefano V. Albrecht},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021}
}