Publications
For news about publications, follow us on X:
Click on any author names or tags to filter publications.
All topic tags:
surveydeep-rlmulti-agent-rlagent-modellingad-hoc-teamworkautonomous-drivinggoal-recognitionexplainable-aicausalgeneralisationsecurityemergent-communicationiterated-learningintrinsic-rewardsimulatorstate-estimationdeep-learningtransfer-learning
Selected tags (click to remove):
deep-rlNeurIPS
2024
Xuehui Yu, Mhairi Dunion, Xin Li, Stefano V. Albrecht
Skill-aware Mutual Information Optimisation for Generalisation in Reinforcement Learning
Conference on Neural Information Processing Systems, 2024
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rl
Abstract:
Meta-Reinforcement Learning (Meta-RL) agents can struggle to operate across tasks with varying environmental features that require different optimal skills (i.e., different modes of behaviour). Using context encoders based on contrastive learning to enhance the generalisability of Meta-RL agents is now widely studied but faces challenges such as the requirement for a large sample size, also referred to as the log-K curse. To improve RL generalisation to different tasks, we first introduce Skill-aware Mutual Information (SaMI), an optimisation objective that aids in distinguishing context embeddings according to skills, thereby equipping RL agents with the ability to identify and execute different skills across tasks. We then propose Skill-aware Noise Contrastive Estimation (SaNCE), a K-sample estimator used to optimise the SaMI objective. We provide a framework for equipping an RL agent with SaNCE in practice and conduct experimental validation on modified MuJoCo and Panda-gym benchmarks. We empirically find that RL agents that learn by maximising SaMI achieve substantially improved zero-shot generalisation to unseen tasks. Additionally, the context encoder trained with SaNCE demonstrates greater robustness to a reduction in the number of available samples, thus possessing the potential to overcome the log-K curse.
@inproceedings{yu2024skillaware,
title={Skill-aware Mutual Information Optimisation for Generalisation in Reinforcement Learning},
author={Xuehui Yu and Mhairi Dunion and Xin Li and Stefano V. Albrecht},
booktitle={Conference on Neural Information Processing Systems},
year={2024}
}
2023
Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah Hanna, Stefano V. Albrecht
Conditional Mutual Information for Disentangled Representations in Reinforcement Learning
Conference on Neural Information Processing Systems, 2023
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlcausalgeneralisation
Abstract:
Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.
@inproceedings{dunion2023cmid,
title={Conditional Mutual Information for Disentangled Representations in Reinforcement Learning},
author={Mhairi Dunion and Trevor McInroe and Kevin Sebastian Luck and Josiah Hanna and Stefano V. Albrecht},
booktitle={Conference on Neural Information Processing Systems},
year={2023}
}
Lukas Schäfer, Filippos Christianos, Amos Storkey, Stefano V. Albrecht
Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning
NeurIPS Workshop on Generalization in Planning, 2023
Abstract | BibTex | arXiv | Code
NeurIPSmulti-agent-rldeep-rl
Abstract:
Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.
@inproceedings{schaefer2023mate,
title={Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning},
author={Lukas Schäfer and Filippos Christianos and Amos Storkey and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Generalization in Planning},
year={2023}
}
Guy Azran, Mohamad H Danesh, Stefano V. Albrecht, Sarah Keren
Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning
NeurIPS Workshop on Generalization in Planning, 2023
Abstract | BibTex | arXiv
NeurIPSdeep-rlcausal
Abstract:
Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task’s rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
@inproceedings{azran2023contextual,
title={Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning},
author={Guy Azran and Mohamad H. Danesh and Stefano V. Albrecht and Sarah Keren},
booktitle={NeurIPS Workshop on Generalization in Planning},
year={2023}
}
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
NeurIPS Workshop on Agent Learning in Open-Endedness, 2023
Abstract | BibTex | arXiv
NeurIPSdeep-rl
Abstract:
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
@inproceedings{garcin2023level,
title={How the level sampling process impacts zero-shot generalisation in deep reinforcement learning},
author={Samuel Garcin and James Doran and Shangmin Guo and Christopher G. Lucas and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Agent Learning in Open-Endedness},
year={2023}
}
Sabrina McCallum, Max Taylor-Davies, Stefano V. Albrecht, Alessandro Suglia
Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning
NeurIPS Workshop on Goal-Conditioned Reinforcement Learning, 2023
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rl
Abstract:
Despite numerous successes, the field of reinforcement learning (RL) remains far from matching the impressive generalisation power of human behaviour learning. One way to help bridge this gap may be to provide RL agents with richer, more human-like feedback expressed in natural language. To investigate this idea, we first extend BabyAI to automatically generate language feedback from the environment dynamics and goal condition success. Then, we modify the Decision Transformer architecture to take advantage of this additional signal. We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents’ generalisation performance, and that agents can benefit from feedback even when this is only available during training, but not at inference.
@inproceedings{mccallum2023feedback,
title={Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning},
author={Sabrina McCallum and Max Taylor-Davies and Stefano V. Albrecht and Alessandro Suglia},
booktitle={NeurIPS Workshop on Goal-Conditioned Reinforcement Learning (GCRL)},
year={2023}
}
2022
Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna
Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
Conference on Neural Information Processing Systems, 2022
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rl
Abstract:
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to match the expected distribution of on-policy data after observing only a finite number of trajectories and this failure hinders data-efficient policy evaluation. Towards improved data-efficiency, we show how non-i.i.d., off-policy sampling can produce data that more closely matches the expected on-policy data distribution and consequently increases the accuracy of the Monte Carlo estimator for policy evaluation. We introduce a method called Robust On-Policy Sampling and demonstrate theoretically and empirically that it produces data that converges faster to the expected on-policy distribution compared to on-policy sampling. Empirically, we show that this faster convergence leads to lower mean squared error policy value estimates.
@inproceedings{zhong2022datacollection,
title={Robust On-Policy Data Collection for Data Efficient Policy Evaluation},
author={Rujie Zhong and Duohan Zhang and Lukas Sch\"afer and Stefano V. Albrecht and Josiah P. Hanna},
booktitle={Conference on Neural Information Processing Systems},
year={2022}
}
Trevor McInroe, Lukas Schäfer, Stefano V. Albrecht
Learning Representations for Reinforcement Learning with Hierarchical Forward Models
NeurIPS Workshop on Deep Reinforcement Learning, 2022
Abstract | BibTex | arXiv
NeurIPSdeep-rlgeneralisation
Abstract:
Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined. Previous approaches remedy this issue with auxiliary representation learning tasks, but they either do not consider the temporal aspect of the problem or only consider single-step transitions, which may miss relevant information if important environmental changes take many steps to manifest. We propose Hierarchical k-Step Latent (HKSL), an auxiliary task that learns representations via a hierarchy of forward models that operate at varying magnitudes of step skipping while also learning to communicate between levels in the hierarchy. We evaluate HKSL in a suite of 30 robotic control tasks with and without distractors and a task of our creation. We find that HKSL either converges to higher or optimal episodic returns more quickly than several alternative representation learning approaches. Furthermore, we find that HKSL's representations capture task-relevant details accurately across timescales (even in the presence of distractors) and that communication channels between hierarchy levels organize information based on both sides of the communication process, both of which improve sample efficiency.
@inproceedings{mcinroe2022hksl,
title={Learning Representations for Reinforcement Learning with Hierarchical Forward Models},
author={Trevor McInroe and Lukas Schäfer and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Deep RL},
year={2022}
}
Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah Hanna, Stefano V. Albrecht
Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning
NeurIPS Workshop on Deep Reinforcement Learning, 2022
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlgeneralisationcausal
Abstract:
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image, which can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
@inproceedings{dunion2022ted,
title={Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning},
author={Mhairi Dunion and Trevor McInroe and Kevin Sebastian Luck and Josiah Hanna and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Deep Reinforcement Learning},
year={2022}
}
Guy Azran, Mohamad Hosein Danesh, Stefano V. Albrecht, Sarah Keren
Enhancing Transfer of Reinforcement Learning Agents with Abstract Contextual Embeddings
NeurIPS Workshop on Neuro Causal and Symbolic AI, 2022
Abstract | BibTex
NeurIPSdeep-rlcausal
Abstract:
Deep reinforcement learning (DRL) algorithms have seen great success in performing a plethora of tasks, but often have trouble adapting to changes in the environment. We address this issue by using reward machines (RM), a graph-based abstraction of the underlying task to represent the current setting or context. Using a graph neural network (GNN), we embed the RMs into deep latent vector representations and provide them to the agent to enhance its ability to adapt to new contexts. To the best of our knowledge, this is the first work to embed contextual abstractions and let the agent decide how to use them. Our preliminary empirical evaluation demonstrates improved sample efficiency of our approach upon context transfer on a set of grid navigation tasks.
@inproceedings{Azran2022enhancing,
title={Enhancing Transfer of Reinforcement Learning Agents with Abstract Contextual Embeddings},
author={Guy Azran and Mohamad Hosein Danesh and Stefano V. Albrecht and Sarah Keren},
booktitle={NeurIPS Workshop on Neuro Causal and Symbolic AI (https://ncsi.cause-lab.net)},
year={2022}
}
2021
Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks
Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlmulti-agent-rl
Abstract:
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we consistently evaluate and compare three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase [Samvelyan et al., 2019] to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.
@inproceedings{papoudakis2021benchmarking,
title={Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks},
author={Georgios Papoudakis and Filippos Christianos and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
year={2021},
url = {http://arxiv.org/abs/2006.07869},
openreview = {https://openreview.net/forum?id=cIrPX-Sn5n},
code = {https://github.com/uoe-agents/epymarl}
}
Georgios Papoudakis, Filippos Christianos, Stefano V. Albrecht
Agent Modelling under Partial Observability for Deep Reinforcement Learning
Conference on Neural Information Processing Systems, 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlagent-modelling
Abstract:
Modelling the behaviours of other agents is essential for understanding how agents interact and making effective decisions. Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution. To eliminate this assumption, we extract representations from the local information of the controlled agent using encoder-decoder architectures. Using the observations and actions of the modelled agents during training, our models learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent. The representations are used to augment the controlled agent's decision policy which is trained via deep reinforcement learning; thus, during execution, the policy does not require access to other agents' information. We provide a comprehensive evaluation and ablations studies in cooperative, competitive and mixed multi-agent environments, showing that our method achieves significantly higher returns than baseline methods which do not use the learned representations.
@inproceedings{papoudakis2021local,
title={Agent Modelling under Partial Observability for Deep Reinforcement Learning},
author={Georgios Papoudakis and Filippos Christianos and Stefano V. Albrecht},
booktitle = {Proceedings of the Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
Rujie Zhong, Josiah P. Hanna, Lukas Schäfer, Stefano V. Albrecht
Robust On-Policy Data Collection for Data-Efficient Policy Evaluation
NeurIPS Workshop on Offline Reinforcement Learning, 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rl
Abstract:
This paper considers how to complement offline reinforcement learning (RL) data with additional data collection for the task of policy evaluation. In policy evaluation, the task is to estimate the expected return of an evaluation policy on an environment of interest. Prior work on offline policy evaluation typically only considers a static dataset. We consider a setting where we can collect a small amount of additional data to combine with a potentially larger offline RL dataset. We show that simply running the evaluation policy – on-policy data collection – is sub-optimal for this setting. We then introduce two new data collection strategies for policy evaluation, both of which consider previously collected data when collecting future data so as to reduce distribution shift (or sampling error) in the entire dataset collected. Our empirical results show that compared to on-policy sampling, our strategies produce data with lower sampling error and generally lead to lower mean-squared error in policy evaluation for any total dataset size. We also show that these strategies can start from initial off-policy data, collect additional data, and then use both the initial and new data to produce low mean-squared error policy evaluation without using off-policy corrections.
@inproceedings{zhong2021robust,
title={Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
author={Rujie Zhong and Josiah P. Hanna and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
year={2021}
}
2020
Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Conference on Neural Information Processing Systems, 2020
Abstract | BibTex | arXiv
NeurIPSdeep-rlmulti-agent-rl
Abstract:
Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
@inproceedings{christianos2020shared,
title={Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning},
author={Filippos Christianos and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle={34th Conference on Neural Information Processing Systems},
year={2020}
}