Publications
All topic tags:
survey, deep-rl, multi-agent-rl, agent-modelling, ad-hoc-teamwork, autonomous-driving, goal-recognition, explainable-ai, causal, generalisation, security, emergent-communication, iterated-learning, intrinsic-reward, simulator, state-estimation, deep-learning, transfer-learning
2024
Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer
Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
MIT Press (print version scheduled for December 2024), 2024
MITP, multi-agent-rl, deep-rl, deep-learning, survey
Abstract:
Textbook published by MIT Press.
@book{ marl-book,
author = {Stefano V. Albrecht and Filippos Christianos and Lukas Sch\"afer},
title = {Multi-Agent Reinforcement Learning: Foundations and Modern Approaches},
publisher = {MIT Press},
year = {2024},
url = {https://www.marl-book.com}
}
Anton Kuznietsov, Balint Gyevnar, Cheng Wang, Steven Peters, Stefano V. Albrecht
Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review
IEEE Transactions on Intelligent Transportation Systems, 2024
T-ITS, autonomous-driving, explainable-ai, survey
Abstract:
Artificial Intelligence (AI) shows promising applications for the perception and planning tasks in autonomous driving (AD) due to its superior performance compared to conventional methods. However, inscrutable AI systems exacerbate the existing challenge of safety assurance of AD. One way to mitigate this challenge is to utilize explainable AI (XAI) techniques. To this end, we present the first comprehensive systematic literature review of explainable methods for safe and trustworthy AD. We begin by analyzing the requirements for AI in the context of AD, focusing on three key aspects: data, model, and agency. We find that XAI is fundamental to meeting these requirements. Based on this, we explain the sources of explanations in AI and describe a taxonomy of XAI. We then identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Finally, we propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.
@article{kuznietsov2024avreview,
title={Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review},
author={Anton Kuznietsov and Balint Gyevnar and Cheng Wang and Steven Peters and Stefano V. Albrecht},
journal={IEEE Transactions on Intelligent Transportation Systems (T-ITS)},
year={2024}
}
Trevor McInroe, Lukas Schäfer, Stefano V. Albrecht
Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning
Transactions on Machine Learning Research, 2024
TMLR, deep-rl
Abstract:
Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined. Previous approaches remedy this issue with auxiliary representation learning tasks, but they either do not consider the temporal aspect of the problem or only consider single-step transitions. Instead, we propose Hierarchical k-Step Latent (HKSL), an auxiliary task that learns representations via a hierarchy of forward models that operate at varying magnitudes of step skipping while also learning to communicate between levels in the hierarchy. We evaluate HKSL in a suite of 30 robotic control tasks and find that HKSL either reaches higher episodic returns or converges to maximum performance more quickly than several current baselines. Also, we find that levels in HKSL's hierarchy can learn to specialize in long- or short-term consequences of agent actions, thereby providing the downstream control policy with more informative representations. Finally, we determine that communication channels between hierarchy levels organize information based on both sides of the communication process, which improves sample efficiency.
@article{mcinroe2024hksl,
title = {Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning},
author = {Trevor McInroe and Lukas Sch\"afer and Stefano V. Albrecht},
journal = {Transactions on Machine Learning Research (TMLR)},
year = {2024}
}
Zidu Yin, Zhen Zhang, Dong Gong, Stefano V. Albrecht, Javen Qinfeng Shi
Highway Graph to Accelerate Reinforcement Learning
Transactions on Machine Learning Research, 2024
TMLR, deep-rl
Abstract:
Reinforcement Learning (RL) algorithms often suffer from low training efficiency. A strategy to mitigate this issue is to incorporate a model-based planning algorithm, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environmental model. The major limitation of VI is the need to iterate over a large tensor with the shape |S| × |A| × |S|, where S/A denotes the state/action space. This process iteratively updates the value of the preceding state s_{t-1} based on the state s_t in one step via value propagation. These still lead to intensive computations. We focus on improving the training efficiency of RL algorithms by improving the efficiency of the value learning process. For the deterministic environments with discrete state and action spaces, on the sampled empirical state-transition graph, a non-branching sequence of transitions can directly bring the agent from s_0 to s_T without deviating from intermediate states, which we call a highway. On such non-branching highways, the value-updating process can be merged as a one-step process instead of iterating the value step-by-step. Based on this observation, we propose a novel graph structure, named highway graph, to model the state transition. Our highway graph compresses the transition model into a concise graph, where edges can represent multiple state transitions to support value propagation across multiple time steps in each iteration. We thus can obtain a more efficient value learning approach by facilitating the VI algorithm on highway graphs. By integrating the highway graph into RL (as a model-based off-policy RL method), the RL training can be remarkably accelerated in the early stages (within 1 million frames).
Comparison against various baselines on four categories of environments reveals that our method outperforms both representative and novel model-free and model-based RL algorithms, demonstrating 10 to more than 150 times more efficiency while maintaining an equal or superior expected return, as confirmed by carefully conducted analyses. Moreover, a deep neural network-based agent is trained using the highway graph, resulting in better generalization and lower storage costs.
@article{yin2024highway,
title = {Highway Graph to Accelerate Reinforcement Learning},
author = {Zidu Yin and Zhen Zhang and Dong Gong and Stefano V. Albrecht and Javen Qinfeng Shi},
journal = {Transactions on Machine Learning Research (TMLR)},
year = {2024}
}
Alain Andres, Lukas Schäfer, Esther Villar-Rodriguez, Stefano V. Albrecht, Javier Del Ser
Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments
Neurocomputing, 2024
Neurocomputing, deep-rl
Abstract:
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently during online RL training both consistently improve the sample-efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.
@article{andres2024offline,
title = {Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments},
author = {Andres, Alain and Sch\"afer, Lukas and Villar-Rodriguez, Esther and Albrecht, Stefano V. and Del Ser, Javier},
journal = {Neurocomputing},
year = {2024}
}
Xuehui Yu, Mhairi Dunion, Xin Li, Stefano V. Albrecht
Skill-aware Mutual Information Optimisation for Generalisation in Reinforcement Learning
Conference on Neural Information Processing Systems, 2024
NeurIPS, deep-rl
Abstract:
Meta-Reinforcement Learning (Meta-RL) agents can struggle to operate across tasks with varying environmental features that require different optimal skills (i.e., different modes of behaviour). Using context encoders based on contrastive learning to enhance the generalisability of Meta-RL agents is now widely studied but faces challenges such as the requirement for a large sample size, also referred to as the log-K curse. To improve RL generalisation to different tasks, we first introduce Skill-aware Mutual Information (SaMI), an optimisation objective that aids in distinguishing context embeddings according to skills, thereby equipping RL agents with the ability to identify and execute different skills across tasks. We then propose Skill-aware Noise Contrastive Estimation (SaNCE), a K-sample estimator used to optimise the SaMI objective. We provide a framework for equipping an RL agent with SaNCE in practice and conduct experimental validation on modified MuJoCo and Panda-gym benchmarks. We empirically find that RL agents that learn by maximising SaMI achieve substantially improved zero-shot generalisation to unseen tasks. Additionally, the context encoder trained with SaNCE demonstrates greater robustness to a reduction in the number of available samples, thus possessing the potential to overcome the log-K curse.
@inproceedings{yu2024skillaware,
title={Skill-aware Mutual Information Optimisation for Generalisation in Reinforcement Learning},
author={Xuehui Yu and Mhairi Dunion and Xin Li and Stefano V. Albrecht},
booktitle={Conference on Neural Information Processing Systems},
year={2024}
}
Mhairi Dunion, Stefano V. Albrecht
Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras
Reinforcement Learning Conference, 2024
RLC, deep-rl, generalisation
Abstract:
The performance of image-based Reinforcement Learning (RL) agents can vary depending on the position of the camera used to capture the images. Training on multiple cameras simultaneously, including a first-person egocentric camera, can leverage information from different camera perspectives to improve the performance of RL. However, hardware constraints may limit the availability of multiple cameras in real-world deployment. Additionally, cameras may become damaged in the real world, preventing access to all cameras that were used during training. To overcome these hardware constraints, we propose Multi-View Disentanglement (MVD), which uses multiple cameras to learn a policy that achieves zero-shot generalisation to any single camera from the training set. Our approach is a self-supervised auxiliary task for RL that learns a disentangled representation from multiple cameras, with a shared representation that is aligned across all cameras to allow generalisation to a single camera, and a private representation that is camera-specific. We show experimentally that an RL agent trained on a single third-person camera is unable to learn an optimal policy in many control tasks, but our approach, benefiting from multiple cameras during training, is able to solve the task using only the same single third-person camera.
@inproceedings{dunion2024mvd,
title={Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras},
author={Mhairi Dunion and Stefano V. Albrecht},
booktitle={1st Reinforcement Learning Conference},
year={2024}
}
Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Amos Storkey
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
Reinforcement Learning Conference, 2024
RLC, deep-rl
Abstract:
Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm well matched to a real-world RL deployment process. In this scenario, we aim to find the best-performing policy within a limited budget of online interactions. Previous work in the OtO setting has focused on correcting for bias introduced by the policy-constraint mechanisms of offline RL algorithms. Such constraints keep the learned policy close to the behavior policy that collected the dataset, but we show this can unnecessarily limit policy performance if the behavior policy is far from optimal. Instead, we forgo constraints and frame OtO RL as an exploration problem that aims to maximize the benefit of online data-collection. We first study the major online RL exploration methods based on intrinsic rewards and UCB in the OtO setting, showing that intrinsic rewards add training instability through reward-function modification, and UCB methods are myopic and it is unclear which learned-component's ensemble to use for action selection. We then introduce an algorithm for planning to go out-of-distribution (PTGOOD) that avoids these issues. PTGOOD uses a non-myopic planning procedure that targets exploration in relatively high-reward regions of the state-action space unlikely to be visited by the behavior policy. By leveraging concepts from the Conditional Entropy Bottleneck, PTGOOD encourages data collected online to provide new information relevant to improving the final deployment policy without altering rewards. We show empirically in several continuous control tasks that PTGOOD significantly improves agent returns during online fine-tuning and avoids the suboptimal policy convergence that many of our baselines exhibit in several environments.
@inproceedings{mcinroe2024planning,
title={Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning},
author={Trevor McInroe and Adam Jelley and Stefano V. Albrecht and Amos Storkey},
booktitle={1st Reinforcement Learning Conference},
year={2024}
}
Aditya Kapoor, Sushant Swamy, Kale-ab Tessera, Mayank Baranwal, Mingfei Sun, Harshad Khadilkar, Stefano V. Albrecht
Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning
RLC Workshop on Coordination and Cooperation for Multi-Agent Reinforcement Learning Methods, 2024
RLC, deep-rl, multi-agent-rl
Abstract:
The ability of agents to learn optimal policies is hindered in multi-agent environments where all agents receive a global reward signal sparsely or only at the end of an episode. The delayed nature of these rewards, especially in long-horizon tasks, makes it challenging for agents to evaluate their actions at intermediate time steps. In this paper, we propose Agent-Temporal Reward Redistribution (ATRR), a novel approach to tackle the agent-temporal credit assignment problem by redistributing sparse environment rewards both temporally and at the agent level. ATRR first decomposes the sparse global rewards into rewards for each time step and then calculates agent-specific rewards by determining each agent's relative contribution to these decomposed temporal rewards. We theoretically prove that there exists a redistribution method equivalent to potential-based reward shaping, ensuring that the optimal policy remains unchanged. Empirically, we demonstrate that ATRR stabilizes and expedites the learning process. We also show that ATRR, when used alongside single-agent reinforcement learning algorithms, performs as well as or better than their multi-agent counterparts.
@inproceedings{kapoor2024agenttemporal,
title={Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning},
author={Aditya Kapoor and Sushant Swamy and Kale-ab Tessera and Mayank Baranwal and Mingfei Sun and Harshad Khadilkar and Stefano V. Albrecht},
booktitle={Coordination and Cooperation for Multi-Agent Reinforcement Learning Methods Workshop},
year={2024},
url={https://openreview.net/forum?id=dGS1e3FXUH}
}
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
International Conference on Machine Learning, 2024
ICML, deep-rl
Abstract:
Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when they share characteristics with the environments they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the implicit regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which have more control over the data generation mechanism. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained over an initial set of level parameters, reducing distributional shift, and achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.
@inproceedings{garcin2024dred,
title={{DRED}: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design},
author={Samuel Garcin and James Doran and Shangmin Guo and Christopher G. Lucas and Stefano V. Albrecht},
year={2024},
booktitle={International Conference on Machine Learning (ICML)}
}
Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht
Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition
International Conference on Autonomous Agents and Multi-Agent Systems, 2024
AAMAS, multi-agent-rl
Abstract:
Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large joint policy space, and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach which uses an expert-provided decomposition of a task into simpler multi-agent sub-tasks. In each sub-task, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We show empirically that such approaches can greatly reduce the number of timesteps required to solve a complex target task relative to training from-scratch. However, we also identify and investigate two problems with naive implementations of approaches based on sub-task decomposition, and propose a simple and scalable method to address these problems which augments existing actor-critic algorithms. We demonstrate the empirical benefits of our proposed method, enabling sub-task decomposition approaches to be deployed in diverse multi-agent tasks.
@inproceedings{fosongLearningComplexTeamwork2024,
title = {Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition},
author = {Fosong, Elliot and Rahman, Arrasy and Carlucho, Ignacio and Albrecht, Stefano V.},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
year = {2024}
}
Balint Gyevnar, Cheng Wang, Christopher G. Lucas, Shay B. Cohen, Stefano V. Albrecht
Causal Explanations for Sequential Decision-Making in Multi-Agent Systems
International Conference on Autonomous Agents and Multi-Agent Systems, 2024
AAMAS, explainable-ai, autonomous-driving, causal
Abstract:
We present CEMA: Causal Explanations in Multi-Agent systems; a framework for creating causal natural language explanations of an agent's decisions in dynamic sequential multi-agent systems to build more trustworthy autonomous agents. Unlike prior work that assumes a fixed causal structure, CEMA only requires a probabilistic model for forward-simulating the state of the system. Using such a model, CEMA simulates counterfactual worlds that identify the salient causes behind the agent's decisions. We evaluate CEMA on the task of motion planning for autonomous driving and test it in diverse simulated scenarios. We show that CEMA correctly and robustly identifies the causes behind the agent's decisions, even when a large number of other agents is present, and show via a user study that CEMA's explanations have a positive effect on participants' trust in autonomous vehicles and are rated as high as high-quality baseline explanations elicited from other participants.
@inproceedings{gyevnar2024cema,
title={Causal Explanations for Sequential Decision-Making in Multi-Agent Systems},
author={Balint Gyevnar and Cheng Wang and Christopher G. Lucas and Shay B. Cohen and Stefano V. Albrecht},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
year={2024}
}
Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
Contextual Pre-planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning
AAAI Conference on Artificial Intelligence, 2024
AAAI, deep-rl, causal
Abstract:
Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task’s rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.
@inproceedings{azran2024contextual,
title={Contextual Pre-planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning},
author={Guy Azran and Mohamad H. Danesh and Stefano V. Albrecht and Sarah Keren},
booktitle={Proceedings of the 38th AAAI Conference on Artificial Intelligence},
year={2024}
}
Shangmin Guo, Yi Ren, Stefano V. Albrecht, Kenny Smith
lpNTK: Better Generalisation with Less Data via Sample Interaction During Learning
International Conference on Learning Representations, 2024
ICLR, deep-learning
Abstract:
Although much research has been done on proposing new models or loss functions to improve the generalisation of artificial neural networks (ANNs), less attention has been directed to the impact of the training data on generalisation. In this work, we start from approximating the interaction between samples, i.e. how learning one sample would modify the model's prediction on other samples. Through analysing the terms involved in weight updates in supervised learning, we find that labels influence the interaction between samples. Therefore, we propose the labelled pseudo Neural Tangent Kernel (lpNTK) which takes label information into consideration when measuring the interactions between samples. We first prove that lpNTK asymptotically converges to the empirical neural tangent kernel in terms of the Frobenius norm under certain assumptions. Secondly, we illustrate how lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning. Moreover, we also show that using lpNTK to identify and remove poisoning training samples does not hurt the generalisation performance of ANNs.
@inproceedings{guo2024lpntk,
title={{lpNTK}: Better Generalisation with Less Data via Sample Interaction During Learning},
author={Shangmin Guo and Yi Ren and Stefano V. Albrecht and Kenny Smith},
booktitle={12th International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=8Ju0VmvMCW}
}
Aleksandar Krnjaic, Raul D. Steleac, Jonathan D. Thomas, Georgios Papoudakis, Lukas Schäfer, Andrew Wing Keung To, Kuan-Ho Lao, Murat Cubuktepe, Matthew Haley, Peter Börsting, Stefano V. Albrecht
Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024
IROS, multi-agent-rl, simulator
Abstract:
We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput). Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), as the agents learn through experience how to optimally cooperate with one another. We develop hierarchical MARL algorithms in which a manager assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency and overall pick rates over baseline MARL algorithms in diverse warehouse configurations, and substantially outperform two established industry heuristics for order-picking systems.
@inproceedings{krnjaic2024scalable,
title={Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers},
author={Aleksandar Krnjaic and Raul D. Steleac and Jonathan D. Thomas and Georgios Papoudakis and Lukas Sch\"afer and Andrew Wing Keung To and Kuan-Ho Lao and Murat Cubuktepe and Matthew Haley and Peter B\"orsting and Stefano V. Albrecht},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
year={2024}
}
Anthony Knittel, Majd Hawasly, Stefano V. Albrecht, John Redford, Subramanian Ramamoorthy
DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving
IEEE International Conference on Robotics and Automation, 2024
ICRA, autonomous-driving, state-estimation
Abstract:
Accurate prediction is important for operating an autonomous vehicle in interactive scenarios. Prediction must be fast, to support multiple requests from a planner exploring a range of possible futures. The generated predictions must accurately represent the probabilities of predicted trajectories, while also capturing different modes of behaviour (such as turning left vs continuing straight at a junction). To this end, we present DiPA, an interactive predictor that addresses these challenging requirements. Previous interactive prediction methods use an encoding of k-mode-samples, which under-represents the full distribution. Other methods optimise closest-mode evaluations, which test whether one of the predictions is similar to the ground-truth, but allow additional unlikely predictions to occur, over-representing unlikely predictions. DiPA addresses these limitations by using a Gaussian-Mixture-Model to encode the full distribution, and optimising predictions using both probabilistic and closest-mode measures. These objectives respectively optimise probabilistic accuracy and the ability to capture distinct behaviours, and there is a challenging trade-off between them. We are able to solve both together using a novel training regime. DiPA achieves new state-of-the-art performance on the INTERACTION and NGSIM datasets, and improves over the baseline (MFP) when both closest-mode and probabilistic evaluations are used. This demonstrates effective prediction for supporting a planner on interactive scenarios.
@article{Knittel2023dipa,
title={{DiPA}: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving},
author={Anthony Knittel and Majd Hawasly and Stefano V. Albrecht and John Redford and Subramanian Ramamoorthy},
journal={IEEE Robotics and Automation Letters},
volume={8},
number={8},
pages={4887--4894},
year={2023}
}
Dongge Han, Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Peter Bell, Amos Storkey
LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots
International Conference on Computational Linguistics, 2024
COLING, generalisation, state-estimation
Abstract:
Large language models (LLMs) have shown significant potential for robotics applications, particularly task planning, by harnessing their language comprehension and text generation capabilities. However, in applications such as household robotics, a critical gap remains in the personalization of these models to individual user preferences. We introduce LLM-Personalize, a novel framework with an optimization pipeline designed to personalize LLM planners for household robotics. Our LLM-Personalize framework features an LLM planner that performs iterative planning in multi-room, partially-observable household scenarios, making use of a scene graph constructed with local observations. The generated plan consists of a sequence of high-level actions which are subsequently executed by a controller. Central to our approach is the optimization pipeline, which combines imitation learning and iterative self-training to personalize the LLM planner. In particular, the imitation learning phase performs initial LLM alignment from demonstrations, and bootstraps the model to facilitate effective iterative self-training, which further explores and aligns the model to user preferences. We evaluate LLM-Personalize on Housekeep, a challenging simulated real-world 3D benchmark for household rearrangements, and show that LLM-Personalize achieves more than a 30 percent increase in success rate over existing LLM planners, showcasing significantly improved alignment with human preferences.
@inproceedings{han2024llmpersonalize,
title={LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots},
author={Dongge Han and Trevor McInroe and Adam Jelley and Stefano V. Albrecht and Peter Bell and Amos Storkey},
booktitle={International Conference on Computational Linguistics},
year={2024}
}
Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning
ICAPS Workshop on Planning and Reinforcement Learning, 2024
Abstract | BibTex | arXiv | Code | Video
ICAPS | deep-rl | causal
Abstract:
Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task’s rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.
@inproceedings{Azran2022enhancing,
title={Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning},
author={Azran, Guy and Danesh, Mohamad H. and Albrecht, Stefano V. and Keren, Sarah},
booktitle={ICAPS Workshop on Planning and Reinforcement Learning},
url={https://prl-theworkshop.github.io/prl2024-icaps/},
year={2024}
}
Sarah Keren, Chaimaa Essayeh, Stefano V. Albrecht, Thomas Morstyn
Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems
arXiv:2404.15583, 2024
Abstract | BibTex | arXiv
multi-agent-rl | survey
Abstract:
The rapidly changing architecture and functionality of electrical networks and the increasing penetration of renewable and distributed energy resources have resulted in various technological and managerial challenges. These have rendered traditional centralized energy-market paradigms insufficient due to their inability to support the dynamic and evolving nature of the network. This survey explores how multi-agent reinforcement learning (MARL) can support the decentralization and decarbonization of energy networks and mitigate the associated challenges. This is achieved by specifying key computational challenges in managing energy networks, reviewing recent research progress on addressing them, and highlighting open challenges that may be addressed using MARL.
@misc{keren2024multiagent,
title={Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems},
author={Sarah Keren and Chaimaa Essayeh and Stefano V. Albrecht and Thomas Morstyn},
year={2024},
eprint={2404.15583},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
Kale-ab Tessera, Arrasy Rahman, Stefano V. Albrecht
HyperMARL: Adaptive Hypernetworks for Multi-Agent RL
arXiv:2412.04233, 2024
Abstract | BibTex | arXiv
multi-agent-rl
Abstract:
Balancing individual specialisation and shared behaviours is a critical challenge in multi-agent reinforcement learning (MARL). Existing methods typically focus on encouraging diversity or leveraging shared representations. Full parameter sharing (FuPS) improves sample efficiency but struggles to learn diverse behaviours when required, while no parameter sharing (NoPS) enables diversity but is computationally expensive and sample inefficient. To address these challenges, we introduce HyperMARL, a novel approach using hypernetworks to balance efficiency and specialisation. HyperMARL generates agent-specific actor and critic parameters, enabling agents to adaptively exhibit diverse or homogeneous behaviours as needed, without modifying the learning objective or requiring prior knowledge of the optimal diversity. Furthermore, HyperMARL decouples agent-specific and state-based gradients, which empirically correlates with reduced policy gradient variance, potentially offering insights into its ability to capture diverse behaviours. Across MARL benchmarks requiring homogeneous, heterogeneous, or mixed behaviours, HyperMARL consistently matches or outperforms FuPS, NoPS, and diversity-focused methods, achieving NoPS-level diversity with a shared architecture. These results highlight the potential of hypernetworks as a versatile approach to the trade-off between specialisation and shared behaviours in MARL.
@misc{tessera2024hyper,
title={{HyperMARL}: Adaptive Hypernetworks for Multi-Agent RL},
author={Kale-ab Tessera and Arrasy Rahman and Stefano V. Albrecht},
year={2024},
eprint={2412.04233},
archivePrefix={arXiv}
}
2023
Arrasy Rahman, Ignacio Carlucho, Niklas Höpner, Stefano V. Albrecht
A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning
Journal of Machine Learning Research, 2023
Abstract | BibTex | arXiv | Publisher | Code
JMLR | ad-hoc-teamwork | deep-rl | agent-modelling | multi-agent-rl
Abstract:
Open ad hoc teamwork is the problem of training a single agent to efficiently collaborate with an unknown group of teammates whose composition may change over time. A variable team composition creates challenges for the agent, such as the requirement to adapt to new team dynamics and dealing with changing state vector sizes. These challenges are aggravated in real-world applications where the controlled agent has no access to the full state of the environment. In this work, we develop a class of solutions for open ad hoc teamwork under full and partial observability. We start by developing a solution for the fully observable case that leverages graph neural network architectures to obtain an optimal policy based on reinforcement learning. We then extend this solution to partially observable scenarios by proposing different methodologies that maintain belief estimates over the latent environment states and team composition. These belief estimates are combined with our solution for the fully observable case to compute an agent's optimal policy under partial observability in open ad hoc teamwork. Empirical results demonstrate that our approach can learn efficient policies in open ad hoc teamwork in full and partially observable cases. Further analysis demonstrates that our methods' success is a result of effectively learning the effects of teammates' actions while also inferring the inherent state of the environment under partial observability.
@article{JRahman2022POGPL,
author = {Arrasy Rahman and Ignacio Carlucho and Niklas H\"opner and Stefano V. Albrecht},
title = {A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning},
journal = {Journal of Machine Learning Research},
year = {2023},
volume = {24},
number = {298},
pages = {1--74},
url = {http://jmlr.org/papers/v24/22-099.html}
}
Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
Transactions on Machine Learning Research, 2023
Abstract | BibTex | arXiv | Code
TMLR | deep-rl | multi-agent-rl
Abstract:
This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal Nash equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), which is an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and, therefore, is the preferred outcome for all agents. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC, which is shown to efficiently scale in games with a large number of agents.
@article{christianos2023pareto,
title={Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning},
author={Filippos Christianos and Georgios Papoudakis and Stefano V. Albrecht},
journal={Transactions on Machine Learning Research (TMLR)},
year={2023}
}
Arrasy Rahman, Elliot Fosong, Ignacio Carlucho, Stefano V. Albrecht
Generating Teammates for Training Robust Ad Hoc Teamwork Agents via Best-Response Diversity
Transactions on Machine Learning Research, 2023
Abstract | BibTex | arXiv | Code
TMLR | ad-hoc-teamwork | multi-agent-rl | deep-rl
Abstract:
Ad hoc teamwork (AHT) is the challenge of designing a robust learner agent that effectively collaborates with unknown teammates without prior coordination mechanisms. Early approaches address the AHT challenge by training the learner with a diverse set of handcrafted teammate policies, usually designed based on an expert's domain knowledge about the policies the learner may encounter. However, implementing teammate policies for training based on domain knowledge is not always feasible. In such cases, recent approaches attempted to improve the robustness of the learner by training it with teammate policies generated by optimising information-theoretic diversity metrics. The problem with optimising existing information-theoretic diversity metrics for teammate policy generation is the emergence of superficially different teammates. When used for AHT training, superficially different teammate behaviours may not improve a learner's robustness during collaboration with unknown teammates. In this paper, we present an automated teammate policy generation method optimising the Best-Response Diversity (BRDiv) metric, which measures diversity based on the compatibility of teammate policies in terms of returns. We evaluate our approach in environments with multiple valid coordination strategies, comparing against methods optimising information-theoretic diversity metrics and an ablation not optimising any diversity metric. Our experiments indicate that optimising BRDiv yields a diverse set of training teammate policies that improve the learner's performance relative to previous teammate generation approaches when collaborating with near-optimal previously unseen teammate policies.
@article{rahman2023BRDiv,
title={Generating Teammates for Training Robust Ad Hoc Teamwork Agents via Best-Response Diversity},
author={Arrasy Rahman and Elliot Fosong and Ignacio Carlucho and Stefano V. Albrecht},
journal={Transactions on Machine Learning Research (TMLR)},
year={2023}
}
Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah Hanna, Stefano V. Albrecht
Conditional Mutual Information for Disentangled Representations in Reinforcement Learning
Conference on Neural Information Processing Systems, 2023
Abstract | BibTex | arXiv | Code
NeurIPS | deep-rl | causal | generalisation
Abstract:
Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.
@inproceedings{dunion2023cmid,
title={Conditional Mutual Information for Disentangled Representations in Reinforcement Learning},
author={Mhairi Dunion and Trevor McInroe and Kevin Sebastian Luck and Josiah Hanna and Stefano V. Albrecht},
booktitle={Conference on Neural Information Processing Systems},
year={2023}
}
Lukas Schäfer, Filippos Christianos, Amos Storkey, Stefano V. Albrecht
Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning
NeurIPS Workshop on Generalization in Planning, 2023
Abstract | BibTex | arXiv | Code
NeurIPS | multi-agent-rl | deep-rl
Abstract:
Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.
@inproceedings{schaefer2023mate,
title={Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning},
author={Lukas Sch\"afer and Filippos Christianos and Amos Storkey and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Generalization in Planning},
year={2023}
}
Guy Azran, Mohamad H Danesh, Stefano V. Albrecht, Sarah Keren
Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning
NeurIPS Workshop on Generalization in Planning, 2023
Abstract | BibTex | arXiv
NeurIPS | deep-rl | causal
Abstract:
Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task’s rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
@inproceedings{azran2023contextual,
title={Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning},
author={Guy Azran and Mohamad H. Danesh and Stefano V. Albrecht and Sarah Keren},
booktitle={NeurIPS Workshop on Generalization in Planning},
year={2023}
}
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
NeurIPS Workshop on Agent Learning in Open-Endedness, 2023
Abstract | BibTex | arXiv
NeurIPS | deep-rl
Abstract:
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
@inproceedings{garcin2023level,
title={How the level sampling process impacts zero-shot generalisation in deep reinforcement learning},
author={Samuel Garcin and James Doran and Shangmin Guo and Christopher G. Lucas and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Agent Learning in Open-Endedness},
year={2023}
}
Sabrina McCallum, Max Taylor-Davies, Stefano V. Albrecht, Alessandro Suglia
Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning
NeurIPS Workshop on Goal-Conditioned Reinforcement Learning, 2023
Abstract | BibTex | arXiv | Code
NeurIPS | deep-rl
Abstract:
Despite numerous successes, the field of reinforcement learning (RL) remains far from matching the impressive generalisation power of human behaviour learning. One way to help bridge this gap may be to provide RL agents with richer, more human-like feedback expressed in natural language. To investigate this idea, we first extend BabyAI to automatically generate language feedback from the environment dynamics and goal condition success. Then, we modify the Decision Transformer architecture to take advantage of this additional signal. We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents’ generalisation performance, and that agents can benefit from feedback even when this is only available during training, but not at inference.
@inproceedings{mccallum2023feedback,
title={Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning},
author={Sabrina McCallum and Max Taylor-Davies and Stefano V. Albrecht and Alessandro Suglia},
booktitle={NeurIPS Workshop on Goal-Conditioned Reinforcement Learning (GCRL)},
year={2023}
}
Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah Hanna, Stefano V. Albrecht
Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning
International Conference on Learning Representations, 2023
Abstract | BibTex | arXiv | Code
ICLR | deep-rl | generalisation | causal
Abstract:
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image, which can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
@inproceedings{dunion2023ted,
title={Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning},
author={Mhairi Dunion and Trevor McInroe and Kevin Sebastian Luck and Josiah Hanna and Stefano V. Albrecht},
booktitle={International Conference on Learning Representations (ICLR)},
year={2023}
}
Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland
How to Prepare Your Task Head for Finetuning
International Conference on Learning Representations, 2023
Abstract | BibTex | arXiv
ICLR | deep-learning | transfer-learning
Abstract:
In the era of deep learning, transferring information from a pretrained network to a downstream task by finetuning has many benefits. The choice of task head plays an important role in fine-tuning, as the pretrained and downstream tasks are usually different. Although there exist many different designs for finetuning, a full understanding of when and why these algorithms work has been elusive. We analyze how the choice of task head controls feature adaptation and hence influences the downstream performance. By decomposing the feature's learning dynamics, we find the key aspect is the training accuracy and loss at the beginning of finetuning, which determines the "energy" available for the feature's adaptation. We identify a significant trend in the effect of changes in this initial energy on the resulting features after finetuning. Specifically, as the energy increases, the Euclidean and cosine distances between the resulting and original features increase, while their dot product (and the resulting features’ norm) first increases and then decreases. Inspired by this, we give several practical principles that lead to better downstream performance. We analytically prove this trend in an overparameterized linear setting and verify its applicability to different experimental settings.
@inproceedings{ ren2023how,
title={How to Prepare Your Task Head for Finetuning},
author={Yi Ren and Shangmin Guo and Wonho Bae and Danica J. Sutherland},
booktitle={International Conference on Learning Representations (ICLR)},
year={2023},
url={https://openreview.net/forum?id=gVOXZproe-e}
}
Anthony Knittel, Majd Hawasly, Stefano V. Albrecht, John Redford, Subramanian Ramamoorthy
DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving
IEEE Robotics and Automation Letters, 2023
Abstract | BibTex | arXiv | Publisher
RA-L | autonomous-driving | state-estimation
Abstract:
Accurate prediction is important for operating an autonomous vehicle in interactive scenarios. Prediction must be fast, to support multiple requests from a planner exploring a range of possible futures. The generated predictions must accurately represent the probabilities of predicted trajectories, while also capturing different modes of behaviour (such as turning left vs continuing straight at a junction). To this end, we present DiPA, an interactive predictor that addresses these challenging requirements. Previous interactive prediction methods use an encoding of k-mode-samples, which under-represents the full distribution. Other methods optimise closest-mode evaluations, which test whether one of the predictions is similar to the ground-truth, but allow additional unlikely predictions to occur, over-representing unlikely predictions. DiPA addresses these limitations by using a Gaussian-Mixture-Model to encode the full distribution, and optimising predictions using both probabilistic and closest-mode measures. These objectives respectively optimise probabilistic accuracy and the ability to capture distinct behaviours, and there is a challenging trade-off between them. We are able to solve both together using a novel training regime. DiPA achieves new state-of-the-art performance on the INTERACTION and NGSIM datasets, and improves over the baseline (MFP) when both closest-mode and probabilistic evaluations are used. This demonstrates effective prediction for supporting a planner on interactive scenarios.
@article{Knittel2023dipa,
title={{DiPA:} Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving},
author={Anthony Knittel and Majd Hawasly and Stefano V. Albrecht and John Redford and Subramanian Ramamoorthy},
journal={IEEE Robotics and Automation Letters},
volume={8},
number={8},
pages={4887--4894},
year={2023}
}
Cillian Brewitt, Massimiliano Tamborski, Cheng Wang, Stefano V. Albrecht
Verifiable Goal Recognition for Autonomous Driving with Occlusions
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023
Abstract | BibTex | arXiv
IROS | autonomous-driving | goal-recognition | explainable-ai
Abstract:
Goal recognition (GR) allows the future behaviour of vehicles to be more accurately predicted. GR involves inferring the goals of other vehicles, such as a certain junction exit. In autonomous driving, vehicles can encounter many different scenarios and the environment is partially observable due to occlusions. We present a novel GR method named Goal Recognition with Interpretable Trees under Occlusion (OGRIT). We demonstrate that OGRIT can handle missing data due to occlusions and make inferences across multiple scenarios using the same learned decision trees, while still being fast, accurate, interpretable and verifiable. We also present the inDO and rounDO datasets of occluded regions used to evaluate OGRIT.
@inproceedings{brewitt2023ogrit,
title={Verifiable Goal Recognition for Autonomous Driving with Occlusions},
author={Cillian Brewitt and Massimiliano Tamborski and Cheng Wang and Stefano V. Albrecht},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
year={2023}
}
Filippos Christianos, Peter Karkus, Boris Ivanovic, Stefano V. Albrecht, Marco Pavone
Planning with Occluded Traffic Agents using Bi-Level Variational Occlusion Models
IEEE International Conference on Robotics and Automation, 2023
Abstract | BibTex | arXiv
ICRA | deep-rl | autonomous-driving
Abstract:
Reasoning with occluded traffic agents is a significant open challenge for planning for autonomous vehicles. Recent deep learning models have shown impressive results for predicting occluded agents based on the behaviour of nearby visible agents; however, as we show in experiments, these models are difficult to integrate into downstream planning. To this end, we propose Bi-level Variational Occlusion Models (BiVO), a two-step generative model that first predicts likely locations of occluded agents, and then generates likely trajectories for the occluded agents. In contrast to existing methods, BiVO outputs a trajectory distribution which can then be sampled from and integrated into standard downstream planning. We evaluate the method in closed-loop replay simulation using the real-world nuScenes dataset. Our results suggest that BiVO can successfully learn to predict occluded agent trajectories, and these predictions lead to better subsequent motion plans in critical scenarios.
@inproceedings{christianos2023planning,
title={Planning with Occluded Traffic Agents using Bi-Level Variational Occlusion Models},
author={Filippos Christianos and Peter Karkus and Boris Ivanovic and Stefano V. Albrecht and Marco Pavone},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2023}
}
Cillian Brewitt, Massimiliano Tamborski, Cheng Wang, Stefano V. Albrecht
Verifiable Goal Recognition for Autonomous Driving with Occlusions
ICRA Workshop on Scalable Autonomous Driving, 2023
Abstract | BibTex | arXiv
ICRA | autonomous-driving | goal-recognition | explainable-ai
Abstract:
Goal recognition (GR) allows the future behaviour of vehicles to be more accurately predicted. GR involves inferring the goals of other vehicles, such as a certain junction exit. In autonomous driving, vehicles can encounter many different scenarios and the environment is partially observable due to occlusions. We present a novel GR method named Goal Recognition with Interpretable Trees under Occlusion (OGRIT). We demonstrate that OGRIT can handle missing data due to occlusions and make inferences across multiple scenarios using the same learned decision trees, while still being fast, accurate, interpretable and verifiable. We also present the inDO and rounDO datasets of occluded regions used to evaluate OGRIT.
@inproceedings{brewitt2023verifiable,
title={Verifiable Goal Recognition for Autonomous Driving with Occlusions},
author={Cillian Brewitt and Massimiliano Tamborski and Cheng Wang and Stefano V. Albrecht},
booktitle={ICRA 2023 Workshop on Scalable Autonomous Driving},
year={2023}
}
Giuseppe Vecchio, Simone Palazzo, Dario C Guastella, Riccardo E. Sarpietro, Ignacio Carlucho, Stefano V. Albrecht, Giovanni Muscato, Concetto Spampinato
MIDGARD: A Simulation Platform for Autonomous Navigation in Unstructured Environments
RSS Workshop on Multi-Agent Planning and Navigation in Challenging Environments, 2023
Abstract | BibTex | arXiv
RSS | simulator | deep-rl
Abstract:
We present MIDGARD, an open-source simulation platform for autonomous robot navigation in outdoor unstructured environments. MIDGARD is designed to enable the training of autonomous agents (e.g., unmanned ground vehicles) in photorealistic 3D environments, and to support the generalization skills of learning-based agents through the variability in training scenarios. MIDGARD's main features include a configurable, extensible, and difficulty-driven procedural landscape generation pipeline, with fast and photorealistic scene rendering based on Unreal Engine. Additionally, MIDGARD has built-in support for OpenAI Gym, a programming interface for feature extension (e.g., integrating new types of sensors, customizing exposing internal simulation variables), and a variety of simulated agent sensors (e.g., RGB, depth and instance/semantic segmentation). We evaluate MIDGARD's capabilities as a benchmarking tool for robot navigation utilizing a set of state-of-the-art reinforcement learning algorithms. The results demonstrate MIDGARD's suitability as a simulation and training environment, as well as the effectiveness of our procedural generation approach in controlling scene difficulty, which directly reflects on accuracy metrics.
@inproceedings{vecchio2022midgard,
title={MIDGARD: A Simulation Platform for Autonomous Navigation in Unstructured Environments},
author={Vecchio, Giuseppe and Palazzo, Simone and Guastella, Dario C and Sarpietro, Riccardo E. and Carlucho, Ignacio and Albrecht, Stefano V. and Muscato, Giovanni and Spampinato, Concetto},
booktitle={RSS 2023 Workshop on Multi-Agent Planning and Navigation in Challenging Environments},
year={2023}
}
Balint Gyevnar, Cheng Wang, Christopher G. Lucas, Shay B. Cohen, Stefano V. Albrecht
Causal Social Explanations for Stochastic Sequential Multi-Agent Decision-Making
AAMAS Workshop on Explainable and Transparent AI and Multi-Agent Systems, 2023
Abstract | BibTex | arXiv | Code
AAMASautonomous-drivingexplainable-aicausal
Abstract:
We present a novel framework to generate causal explanations for the decisions of agents in stochastic sequential multi-agent environments. Explanations are given via natural language conversations answering a wide range of user queries and requiring associative, interventionist, or counterfactual causal reasoning. Instead of assuming any specific causal graph, our method relies on a generative model of interactions to simulate counterfactual worlds which are used to identify the salient causes behind decisions. We implement our method for motion planning for autonomous driving and test it in simulated scenarios with coupled interactions. Our method correctly identifies and ranks the relevant causes and delivers concise explanations to the users' queries.
@inproceedings{gyevnar2023causal,
title={Causal Social Explanations for Stochastic Sequential Multi-Agent Decision-Making},
author={Balint Gyevnar and Cheng Wang and Christopher G. Lucas and Shay B. Cohen and Stefano V. Albrecht},
booktitle={5th International Workshop on EXplainable and TRAnsparent AI and Multi-Agent Systems},
year={2023}
}
Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
AAMAS Workshop on Optimization and Learning in Multiagent Systems, 2023
Abstract | BibTex | arXiv
AAMASdeep-rlmulti-agent-rl
Abstract:
This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address suboptimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games with identical rewards): each agent can assume the others will choose actions that will lead to a Pareto-optimal equilibrium. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to alternative MARL algorithms, as well as successfully converging to a Pareto-optimal equilibrium in a range of matrix games.
@inproceedings{christianos2023pareto,
title={Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning},
author={Filippos Christianos and Georgios Papoudakis and Stefano V. Albrecht},
booktitle={AAMAS Workshop on Optimization and Learning in Multiagent Systems},
year={2023}
}
Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht
Learning Complex Teamwork Tasks Using a Sub-task Curriculum
AAMAS Workshop on Multiagent Sequential Decision Making Under Uncertainty, 2023
Abstract | BibTex | arXiv | Code
AAMASmulti-agent-rlad-hoc-teamworktransfer-learning
Abstract:
Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large policy space, and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach which uses an expert-provided curriculum of simpler multi-agent sub-tasks. In each sub-task of the curriculum, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We present MEDoE, a flexible method which identifies situations in the target task where each agent can use its sub-task-specific skills, and uses this information to modulate hyperparameters for learning and exploration during the fine-tuning process. We compare MEDoE to multi-agent reinforcement learning baselines that train from scratch in the full task, and with naïve applications of standard multi-agent reinforcement learning techniques for fine-tuning. We show that MEDoE outperforms baselines which train from scratch or use naïve fine-tuning approaches, requiring significantly fewer total training timesteps to solve a range of complex teamwork tasks.
@inproceedings{fosong2023learning,
title={Learning complex teamwork tasks using a sub-task curriculum},
author={Elliot Fosong and Arrasy Rahman and Ignacio Carlucho and Stefano V. Albrecht},
booktitle={AAMAS Workshop on Multiagent Sequential Decision Making under Uncertainty},
year={2023},
}
Adam Michalski, Filippos Christianos, Stefano V. Albrecht
SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning
AAMAS Workshop on Multiagent Sequential Decision Making Under Uncertainty, 2023
Abstract | BibTex | arXiv | Code
AAMASdeep-rlmulti-agent-rl
Abstract:
There is a lack of standard benchmarks for Multi-Agent Reinforcement Learning (MARL) algorithms. The StarCraft Multi-Agent Challenge (SMAC) has been widely used in MARL research, but is built on top of a heavy, closed-source computer game, StarCraft II. Thus, SMAC is computationally expensive and requires knowledge and the use of proprietary tools specific to the game for any meaningful alteration or contribution to the environment. We introduce SMAClite -- a challenge based on SMAC that is both decoupled from StarCraft II and open-source, along with a framework which makes it possible to create new content for SMAClite without any special knowledge. We conduct experiments to show that SMAClite is equivalent to SMAC, by training MARL algorithms on SMAClite and reproducing SMAC results. We then show that SMAClite outperforms SMAC in both runtime speed and memory.
@inproceedings{michalski2023smaclite,
title={SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning},
author={Adam Michalski and Filippos Christianos and Stefano V. Albrecht},
booktitle={AAMAS Workshop on Multiagent Sequential Decision Making Under Uncertainty (MSDM)},
year={2023}
}
Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni
Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning
AAMAS Workshop on Adaptive and Learning Agents, 2023
Abstract | BibTex | arXiv
AAMASmulti-agent-rldeep-rl
Abstract:
Cooperative multi-agent reinforcement learning (MARL) requires agents to explore to learn to cooperate. Existing value-based MARL algorithms commonly rely on random exploration, such as ϵ-greedy, which is inefficient in discovering multi-agent cooperation. Additionally, the environment in MARL appears non-stationary to any individual agent due to the simultaneous training of other agents, leading to highly variant and thus unstable optimisation signals. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to extend any value-based MARL algorithm. EMAX trains ensembles of value functions for each agent to address the key challenges of exploration and non-stationarity: (1) The uncertainty of value estimates across the ensemble is used in a UCB policy to guide the exploration of agents to parts of the environment which require cooperation. (2) Average value estimates across the ensemble serve as target values. These targets exhibit lower variance compared to commonly applied target networks and we show that they lead to more stable gradients during the optimisation. We instantiate three value-based MARL algorithms with EMAX, independent DQN, VDN and QMIX, and evaluate them in 21 tasks across four environments. Using ensembles of five value functions, EMAX improves sample efficiency and final evaluation returns of these algorithms by 53%, 36%, and 498%, respectively, averaged over all 21 tasks.
@inproceedings{schaefer2023emax,
title={Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning},
author={Lukas Schäfer and Oliver Slumbers and Stephen McAleer and Yali Du and Stefano V. Albrecht and David Mguni},
year={2023},
booktitle={AAMAS Workshop on Adaptive and Learning Agents (ALA)},
}
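Illustrative sketch (not the authors' implementation): the two ensemble ideas named in the EMAX abstract above, a UCB-style action choice driven by disagreement across the ensemble and a target built from the ensemble-average value, can be written for a single agent with tabular value estimates roughly as follows. The function names, the `beta` exploration coefficient, and the use of the standard deviation as the uncertainty measure are assumptions made for this example only.

```python
import statistics


def ucb_action(ensemble_q_values, beta=1.0):
    """Pick the action maximising mean + beta * std across ensemble members.

    ensemble_q_values: list with one entry per ensemble member, each a list
    holding that member's value estimate for every action.
    beta: hypothetical exploration coefficient (an assumption of this sketch).
    """
    num_actions = len(ensemble_q_values[0])
    scores = []
    for action in range(num_actions):
        estimates = [member[action] for member in ensemble_q_values]
        mean = statistics.fmean(estimates)
        std = statistics.pstdev(estimates)  # disagreement across the ensemble
        scores.append(mean + beta * std)
    return max(range(num_actions), key=scores.__getitem__)


def ensemble_target(ensemble_next_q_values, reward, gamma=0.99):
    """One-step target using the ensemble-average value of the best next action."""
    num_actions = len(ensemble_next_q_values[0])
    mean_q = [
        statistics.fmean([member[action] for member in ensemble_next_q_values])
        for action in range(num_actions)
    ]
    return reward + gamma * max(mean_q)
```

With `beta=0` the choice reduces to greedy selection on the ensemble mean; larger values favour actions on which the ensemble members disagree.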
Callum Tilbury, Filippos Christianos, Stefano V. Albrecht
Revisiting the Gumbel-Softmax in MADDPG
AAMAS Workshop on Adaptive and Learning Agents, 2023
Abstract | BibTex | arXiv | Code
AAMASmulti-agent-rldeep-rl
Abstract:
MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.
@inproceedings{tilbury2023revisitingmaddpg,
title={Revisiting the Gumbel-Softmax in MADDPG},
author={Callum Tilbury and Filippos Christianos and Stefano V. Albrecht},
year={2023},
booktitle={AAMAS Workshop on Adaptive and Learning Agents (ALA)},
}
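Illustrative sketch (not taken from the paper or its codebase): the standard Gumbel-Softmax relaxation mentioned in the abstract above adds Gumbel noise to the logits and applies a temperature-scaled softmax, producing a continuous vector on the probability simplex in place of a one-hot sample. The function name, the `rng` parameter, and the clamping constant are assumptions of this example; the alternative estimators studied in the paper are not shown.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed (continuous) sample from a categorical distribution."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)).
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower temperatures push the sample towards a one-hot vector, while higher temperatures smooth it towards uniform.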
Alain Andres, Lukas Schäfer, Esther Villar-Rodriguez, Stefano V. Albrecht, Javier Del Ser
Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments
AAMAS Workshop on Adaptive and Learning Agents, 2023
Abstract | BibTex | arXiv
AAMASdeep-rl
Abstract:
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently during online RL training both consistently improve the sample-efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.
@inproceedings{andres2023using,
title={Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments},
author={Andres, Alain and Schäfer, Lukas and Villar-Rodriguez, Esther and Albrecht, Stefano V. and Del Ser, Javier},
booktitle={AAMAS Workshop on Adaptive and Learning Agents (ALA)},
year={2023}
}
Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning
IJCAI Workshop on Planning and Reinforcement Learning, 2023
Abstract | BibTex | arXiv
IJCAIdeep-rlcausal
Abstract:
Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task’s rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
@inproceedings{azran2023contextual,
title={Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning},
author={Guy Azran and Mohamad H. Danesh and Stefano V. Albrecht and Sarah Keren},
booktitle={IJCAI Workshop on Planning and Reinforcement Learning},
note={https://prl-theworkshop.github.io/},
year={2023}
}
Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah Hanna, Stefano V. Albrecht
Conditional Mutual Information for Disentangled Representations in Reinforcement Learning
European Workshop on Reinforcement Learning, 2023
Abstract | BibTex | arXiv | Code
EWRLdeep-rlcausalgeneralisation
Abstract:
Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.
@inproceedings{dunion2023cmid,
title={Conditional Mutual Information for Disentangled Representations in Reinforcement Learning},
author={Mhairi Dunion and Trevor McInroe and Kevin Sebastian Luck and Josiah Hanna and Stefano V. Albrecht},
booktitle={European Workshop on Reinforcement Learning},
year={2023}
}
Aleksandar Krnjaic, Raul D. Steleac, Jonathan D. Thomas, Georgios Papoudakis, Lukas Schäfer, Andrew Wing Keung To, Kuan-Ho Lao, Murat Cubuktepe, Matthew Haley, Peter Börsting, Stefano V. Albrecht
Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers
arXiv:2212.11498, 2023
Abstract | BibTex | arXiv | Website
multi-agent-rlsimulator
Abstract:
We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput). Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), as the agents learn through experience how to optimally cooperate with one another. We develop hierarchical MARL algorithms in which a manager assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency and overall pick rates over baseline MARL algorithms in diverse warehouse configurations, and substantially outperform two established industry heuristics for order-picking systems.
@misc{krnjaic2023scalable,
title={Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers},
author={Aleksandar Krnjaic and Raul D. Steleac and Jonathan D. Thomas and Georgios Papoudakis and Lukas Sch\"afer and Andrew Wing Keung To and Kuan-Ho Lao and Murat Cubuktepe and Matthew Haley and Peter B\"orsting and Stefano V. Albrecht},
year={2023},
eprint={2212.11498},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
arXiv:2310.03494, 2023
Abstract | BibTex | arXiv
deep-rl
Abstract:
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
@misc{garcin2023level,
title={How the level sampling process impacts zero-shot generalisation in deep reinforcement learning},
author={Samuel Garcin and James Doran and Shangmin Guo and Christopher G. Lucas and Stefano V. Albrecht},
year={2023},
eprint={2310.03494},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Trevor McInroe, Stefano V. Albrecht, Amos Storkey
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
arXiv:2310.05723, 2023
Abstract | BibTex | arXiv
deep-rl
Abstract:
Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm that is well matched to a real-world RL deployment process: in few real settings would one deploy an offline policy with no test runs and tuning. In this scenario, we aim to find the best-performing policy within a limited budget of online interactions. Previous work in the OtO setting has focused on correcting for bias introduced by the policy-constraint mechanisms of offline RL algorithms. Such constraints keep the learned policy close to the behavior policy that collected the dataset, but this unnecessarily limits policy performance if the behavior policy is far from optimal. Instead, we forgo policy constraints and frame OtO RL as an exploration problem: we must maximize the benefit of the online data-collection. We study major online RL exploration paradigms, adapting them to work well with the OtO setting. These adapted methods contribute several strong baselines. Also, we introduce an algorithm for planning to go out of distribution (PTGOOD), which targets online exploration in relatively high-reward regions of the state-action space unlikely to be visited by the behavior policy. By leveraging concepts from the Conditional Entropy Bottleneck, PTGOOD encourages data collected online to provide new information relevant to improving the final deployment policy. In that way the limited interaction budget is used effectively. We show that PTGOOD significantly improves agent returns during online fine-tuning and finds the optimal policy in as few as 10k online steps in Walker and in as few as 50k in complex control tasks like Humanoid. Also, we find that PTGOOD avoids the suboptimal policy convergence that many of our baselines exhibit in several environments.
@misc{mcinroe2023planning,
title={Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning},
author={Trevor McInroe and Stefano V. Albrecht and Amos Storkey},
year={2023},
eprint={2310.05723},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
2022
Stefano V. Albrecht, Michael Wooldridge
Special Issue on Multi-Agent Systems Research in the United Kingdom: Guest Editorial
AI Communications, 2022
Abstract | BibTex | Publisher | Special Issue
AICsurveydeep-rlmulti-agent-rlagent-modelling
Abstract:
The purpose of this special issue is to showcase current multi-agent systems research led by university and industry groups based in the United Kingdom. Research groups and institutes in the UK which have significant activity in multi-agent systems research were invited to submit an article describing: (1) the technical problems in multi-agent systems tackled by the group (their core research agenda), including applications and industry collaboration; (2) the main approaches developed by the group and any key results achieved; and (3) important open challenges in multi-agent systems research from the perspective of the group.
@article{albrecht2020special,
title = {Special Issue on Multi-Agent Systems Research in the United Kingdom: Guest Editorial},
author = {Stefano V. Albrecht and Michael Wooldridge},
journal = {AI Communications},
volume = {35},
number = {4},
year = {2022},
publisher = {IOS Press},
url = {https://content.iospress.com/articles/ai-communications/aic229003}
}
Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, Filippos Christianos, Mhairi Dunion, Elliot Fosong, Samuel Garcin, Shangmin Guo, Balint Gyevnar, Trevor McInroe, Georgios Papoudakis, Arrasy Rahman, Lukas Schäfer, Massimiliano Tamborski, Giuseppe Vecchio, Cheng Wang, Stefano V. Albrecht
Deep Reinforcement Learning for Multi-Agent Interaction
AI Communications, 2022
Abstract | BibTex | arXiv | Publisher
AICsurveydeep-rlmulti-agent-rlad-hoc-teamworkagent-modellinggoal-recognitionsecurityexplainable-aiautonomous-driving
Abstract:
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
@article{albrecht2022aic,
author = {Ahmed, Ibrahim H. and Brewitt, Cillian and Carlucho, Ignacio and Christianos, Filippos and Dunion, Mhairi and Fosong, Elliot and Garcin, Samuel and Guo, Shangmin and Gyevnar, Balint and McInroe, Trevor and Papoudakis, Georgios and Rahman, Arrasy and Schäfer, Lukas and Tamborski, Massimiliano and Vecchio, Giuseppe and Wang, Cheng and Albrecht, Stefano V.},
title = {Deep Reinforcement Learning for Multi-Agent Interaction},
journal = {AI Communications, Special Issue on Multi-Agent Systems Research in the UK},
year = {2022}
}
Majd Hawasly, Jonathan Sadeghi, Morris Antonello, Stefano V. Albrecht, John Redford, Subramanian Ramamoorthy
Perspectives on the System-level Design of a Safe Autonomous Driving Stack
AI Communications, 2022
Abstract | BibTex | arXiv | Publisher
AICsurveyautonomous-drivinggoal-recognitionexplainable-ai
Abstract:
Achieving safe and robust autonomy is the key bottleneck on the path towards broader adoption of autonomous vehicles technology. This motivates going beyond extrinsic metrics such as miles between disengagement, and calls for approaches that embody safety by design. In this paper, we address some aspects of this challenge, with emphasis on issues of motion planning and prediction. We do this through description of novel approaches taken to solving selected sub-problems within an autonomous driving stack, in the process introducing the design philosophy being adopted within Five. This includes safe-by-design planning, interpretable as well as verifiable prediction, and modelling of perception errors to enable effective sim-to-real and real-to-sim transfer within the testing pipeline of a realistic autonomous system.
@article{hawasly2022aic,
author = {Majd Hawasly and Jonathan Sadeghi and Morris Antonello and Stefano V. Albrecht and John Redford and Subramanian Ramamoorthy},
title = {Perspectives on the System-level Design of a Safe Autonomous Driving Stack},
journal = {AI Communications, Special Issue on Multi-Agent Systems Research in the UK},
year = {2022}
}
Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna
Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
Conference on Neural Information Processing Systems, 2022
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rl
Abstract:
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to match the expected distribution of on-policy data after observing only a finite number of trajectories and this failure hinders data-efficient policy evaluation. Towards improved data-efficiency, we show how non-i.i.d., off-policy sampling can produce data that more closely matches the expected on-policy data distribution and consequently increases the accuracy of the Monte Carlo estimator for policy evaluation. We introduce a method called Robust On-Policy Sampling and demonstrate theoretically and empirically that it produces data that converges faster to the expected on-policy distribution compared to on-policy sampling. Empirically, we show that this faster convergence leads to lower mean squared error policy value estimates.
@inproceedings{zhong2022datacollection,
title={Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning},
author={Rujie Zhong and Duohan Zhang and Lukas Sch\"afer and Stefano V. Albrecht and Josiah P. Hanna},
booktitle={Conference on Neural Information Processing Systems},
year={2022}
}
Trevor McInroe, Lukas Schäfer, Stefano V. Albrecht
Learning Representations for Reinforcement Learning with Hierarchical Forward Models
NeurIPS Workshop on Deep Reinforcement Learning, 2022
Abstract | BibTex | arXiv
NeurIPSdeep-rlgeneralisation
Abstract:
Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined. Previous approaches remedy this issue with auxiliary representation learning tasks, but they either do not consider the temporal aspect of the problem or only consider single-step transitions, which may miss relevant information if important environmental changes take many steps to manifest. We propose Hierarchical k-Step Latent (HKSL), an auxiliary task that learns representations via a hierarchy of forward models that operate at varying magnitudes of step skipping while also learning to communicate between levels in the hierarchy. We evaluate HKSL in a suite of 30 robotic control tasks with and without distractors and a task of our creation. We find that HKSL converges to higher or optimal episodic returns more quickly than several alternative representation learning approaches. Furthermore, we find that HKSL's representations capture task-relevant details accurately across timescales (even in the presence of distractors) and that communication channels between hierarchy levels organize information based on both sides of the communication process, both of which improve sample efficiency.
@inproceedings{mcinroe2022hksl,
title={Learning Representations for Reinforcement Learning with Hierarchical Forward Models},
author={Trevor McInroe and Lukas Schäfer and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Deep RL},
year={2022}
}
Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah Hanna, Stefano V. Albrecht
Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning
NeurIPS Workshop on Deep Reinforcement Learning, 2022
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlgeneralisationcausal
Abstract:
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image, which can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
@inproceedings{dunion2022ted,
title={Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning},
author={Mhairi Dunion and Trevor McInroe and Kevin Sebastian Luck and Josiah Hanna and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Deep Reinforcement Learning},
year={2022}
}
Cillian Brewitt, Massimiliano Tamborski, Stefano V. Albrecht
Verifiable Goal Recognition for Autonomous Driving with Occlusions
NeurIPS Workshop on Machine Learning for Autonomous Driving, 2022
Abstract | BibTex | arXiv | Code
NeurIPSautonomous-drivinggoal-recognitionexplainable-ai
Abstract:
Goal recognition (GR) allows the future behaviour of vehicles to be more accurately predicted. GR involves inferring the goals of other vehicles, such as a certain junction exit. In autonomous driving, vehicles can encounter many different scenarios and the environment is partially observable due to occlusions. We present a novel GR method named Goal Recognition with Interpretable Trees under Occlusion (OGRIT). We demonstrate that OGRIT can handle missing data due to occlusions and make inferences across multiple scenarios using the same learned decision trees, while still being fast, accurate, interpretable and verifiable. We also present the inDO and rounDO datasets of occluded regions used to evaluate OGRIT.
@inproceedings{brewitt2022,
title={Verifiable Goal Recognition for Autonomous Driving with Occlusions},
author={Cillian Brewitt and Massimiliano Tamborski and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Machine Learning for Autonomous Driving},
year={2022}
}
Shangmin Guo, Yi Ren, Stefano V. Albrecht, Kenny Smith
Sample Relationships through the Lens of Learning Dynamics with Label Information
NeurIPS Workshop on Interpolation and Beyond, 2022
Abstract | BibTex | arXiv
NeurIPSiterated-learningdeep-learningtransfer-learning
Abstract:
Although much research has been done on proposing new models or loss functions to improve the generalisation of artificial neural networks (ANNs), less attention has been directed to the data, which is also an important factor for training ANNs. In this work, we start from approximating the interaction between two samples, i.e. how learning one sample would modify the model's prediction on the other sample. Through analysing the terms involved in weight updates in supervised learning, we find that the signs of labels influence the interactions between samples. Therefore, we propose the labelled pseudo Neural Tangent Kernel (lpNTK) which takes label information into consideration when measuring the interactions between samples. We first prove that lpNTK would asymptotically converge to the well-known empirical Neural Tangent Kernel in terms of the Frobenius norm under certain assumptions. Secondly, we illustrate how lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning. Moreover, we also show that lpNTK can help to improve the generalisation performance of ANNs in image classification tasks, compared with the original whole training sets.
@inproceedings{guo2022relationship,
title={Sample Relationships through the Lens of Learning Dynamics with Label Information},
author={Shangmin Guo and Yi Ren and Stefano V. Albrecht and Kenny Smith},
booktitle={NeurIPS 2022 Workshop on Interpolation and Beyond},
year={2022}
}
Guy Azran, Mohamad Hosein Danesh, Stefano V. Albrecht, Sarah Keren
Enhancing Transfer of Reinforcement Learning Agents with Abstract Contextual Embeddings
NeurIPS Workshop on Neuro Causal and Symbolic AI, 2022
Abstract | BibTex
NeurIPSdeep-rlcausal
Abstract:
Deep reinforcement learning (DRL) algorithms have seen great success in performing a plethora of tasks, but often have trouble adapting to changes in the environment. We address this issue by using reward machines (RM), a graph-based abstraction of the underlying task to represent the current setting or context. Using a graph neural network (GNN), we embed the RMs into deep latent vector representations and provide them to the agent to enhance its ability to adapt to new contexts. To the best of our knowledge, this is the first work to embed contextual abstractions and let the agent decide how to use them. Our preliminary empirical evaluation demonstrates improved sample efficiency of our approach upon context transfer on a set of grid navigation tasks.
@inproceedings{Azran2022enhancing,
title={Enhancing Transfer of Reinforcement Learning Agents with Abstract Contextual Embeddings},
author={Guy Azran and Mohamad Hosein Danesh and Stefano V. Albrecht and Sarah Keren},
booktitle={NeurIPS Workshop on Neuro Causal and Symbolic AI (https://ncsi.cause-lab.net)},
year={2022}
}
Shangmin Guo, Yi Ren, Kory Mathewson, Simon Kirby, Stefano V. Albrecht, Kenny Smith
Expressivity of Emergent Languages is a Trade-off between Contextual Complexity and Unpredictability
International Conference on Learning Representations, 2022
Abstract | BibTex | arXiv | Code
ICLRmulti-agent-rlemergent-communication
Abstract:
Researchers are using deep learning models to explore the emergence of language in various language games, where simulated agents interact and develop an emergent language to solve a task. We focus on the factors which determine the expressivity of emergent languages, which reflects the amount of information about input spaces those languages are capable of encoding. We measure the expressivity of emergent languages based on their generalisation performance across different games, and demonstrate that the expressivity of emergent languages is a trade-off between the complexity and unpredictability of the context those languages are used in. Another novel contribution of this work is the discovery of message type collapse. We also show that using the contrastive loss proposed by Chen et al. (2020) can alleviate this problem, compared with the standard referential loss used by the existing works.
@inproceedings{guo2022expressivity,
title={Expressivity of Emergent Languages is a Trade-off between Contextual Complexity and Unpredictability},
author={Shangmin Guo and Yi Ren and Kory Mathewson and Simon Kirby and Stefano V. Albrecht and Kenny Smith},
booktitle={International Conference on Learning Representations (ICLR)},
year={2022}
}
Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht
Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration
International Conference on Autonomous Agents and Multiagent Systems, 2022
Abstract | BibTex | arXiv | Code
AAMASdeep-rlintrinsic-reward
Abstract:
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies.
@inproceedings{schaefer2022derl,
title={Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration},
author={Lukas Sch\"afer and Filippos Christianos and Josiah P. Hanna and Stefano V. Albrecht},
booktitle={International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
year={2022}
}
Lukas Schäfer
Task Generalisation in Multi-Agent Reinforcement Learning
International Conference on Autonomous Agents and Multiagent Systems, Doctoral Consortium, 2022
Abstract | BibTex | Paper
AAMASmulti-agent-rl
Abstract:
Multi-agent reinforcement learning agents are typically trained in a single environment. As a consequence, they overfit to the training environment, which results in sensitivity to perturbations and inability to generalise to similar environments. For multi-agent reinforcement learning approaches to be applicable in real-world scenarios, generalisation and robustness need to be addressed. However, unlike in supervised learning, generalisation lacks a clear definition in multi-agent reinforcement learning. We discuss the problem of task generalisation and demonstrate the difficulty of zero-shot generalisation and finetuning using the example of multi-robot warehouse coordination with preliminary results. Lastly, we discuss promising directions of research working towards generalisation of multi-agent reinforcement learning.
@inproceedings{schaefer2022task,
title={Task Generalisation in Multi-Agent Reinforcement Learning},
author={Lukas Sch\"afer},
booktitle={Doctoral Consortium at the International Conference on Autonomous Agents and Multiagent Systems},
year={2022}
}
Filippos Christianos
Collaborative Training of Multiple Autonomous Agents
International Conference on Autonomous Agents and Multiagent Systems, Doctoral Consortium, 2022
Abstract | BibTex | Paper
AAMASmulti-agent-rl
Abstract:
Exploration in multi-agent reinforcement learning is a challenging problem, especially with a large number of agents. Parameter sharing between agents is often used since it significantly decreases the number of trainable parameters, shortening training times to tractable levels and improving exploration efficiency. We present two algorithms that aim to be a middle ground between not sharing parameters and fully sharing parameters. These proposed algorithms show the advantages of the baselines at the two ends of the spectrum and minimise their drawbacks. First, Shared Experience Actor-Critic [Christianos et al. 2020] applies the basic idea of off-policy correction via importance weighting and combines the experiences generated by different agents into more informative and effective learning gradients. Then, Selective Parameter Sharing [Christianos et al. 2021], based on rigorous empirical analysis of the impact of parameter sharing, proposes a novel parameter sharing method that can be coupled with existing multi-agent reinforcement learning algorithms.
@inproceedings{christianos2022collaborative,
title={Collaborative Training of Multiple Autonomous Agents},
author={Filippos Christianos},
booktitle={Doctoral Consortium at the International Conference on Autonomous Agents and Multiagent Systems},
year={2022}
}
Francisco Eiras, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
A Two-Stage Optimization-based Motion Planner for Safe Urban Driving
IEEE Transactions on Robotics, 2022
Abstract | BibTex | arXiv | Publisher | Video
T-ROautonomous-driving
Abstract:
Recent road trials have shown that guaranteeing the safety of driving decisions is essential for the wider adoption of autonomous vehicle technology. One promising direction is to pose safety requirements as planning constraints in nonlinear, non-convex optimization problems of motion synthesis. However, many implementations of this approach are limited by uncertain convergence and local optimality of the solutions achieved, affecting overall robustness. To improve upon these issues, we propose a novel two-stage optimization framework: in the first stage, we find a solution to a Mixed-Integer Linear Programming (MILP) formulation of the motion synthesis problem, the output of which initializes a second Nonlinear Programming (NLP) stage. The MILP stage enforces hard constraints of safety and road rule compliance generating a solution in the right subspace, while the NLP stage refines the solution within the safety bounds for feasibility and smoothness. We demonstrate the effectiveness of our framework via simulated experiments of complex urban driving scenarios, outperforming a state-of-the-art baseline in metrics of convergence, comfort and progress.
@article{eiras2021twostage,
title = {A Two-Stage Optimization-based Motion Planner for Safe Urban Driving},
author = {Francisco Eiras and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
journal = {IEEE Transactions on Robotics},
volume = {38},
number = {2},
pages = {822--834},
year = {2022},
doi = {10.1109/TRO.2021.3088009}
}
Morris Antonello, Mihai Dobre, Stefano V. Albrecht, John Redford, Subramanian Ramamoorthy
Flash: Fast and Light Motion Prediction for Autonomous Driving with Bayesian Inverse Planning and Learned Motion Profiles
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022
Abstract | BibTex | arXiv
IROSautonomous-drivingstate-estimation
Abstract:
Motion prediction of road users in traffic scenes is critical for autonomous driving systems that must take safe and robust decisions in complex dynamic environments. We present a novel motion prediction system for autonomous driving. Our system is based on the Bayesian inverse planning framework, which efficiently orchestrates map-based goal extraction, a classical control-based trajectory generator and an ensemble of light-weight neural networks specialised in motion profile prediction. In contrast to many alternative methods, this modularity helps isolate performance factors and better interpret results, without compromising performance. This system addresses multiple aspects of interest, namely multi-modality, motion profile uncertainty and trajectory physical feasibility. We report on several experiments with the popular highway dataset NGSIM, demonstrating state-of-the-art performance in terms of trajectory error. We also perform a detailed analysis of our system's components, along with experiments that stratify the data based on behaviours, such as change lane versus follow lane, to provide insights into the challenges in this domain. Finally, we present a qualitative analysis to show other benefits of our approach, such as the ability to interpret the outputs.
@inproceedings{antonello2022flash,
title={Flash: Fast and Light Motion Prediction for Autonomous Driving with {Bayesian} Inverse Planning and Learned Motion Profiles},
author={Morris Antonello and Mihai Dobre and Stefano V. Albrecht and John Redford and Subramanian Ramamoorthy},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2022}
}
Giuseppe Vecchio, Simone Palazzo, Dario C Guastella, Ignacio Carlucho, Stefano V. Albrecht, Giovanni Muscato, Concetto Spampinato
MIDGARD: A Simulation Platform for Autonomous Navigation in Unstructured Environments
ICRA Workshop on Releasing Robots into the Wild: Simulations, Benchmarks, and Deployment, 2022
Abstract | BibTex | arXiv
ICRAdeep-rlsimulator
Abstract:
We present MIDGARD, an open source simulation platform for autonomous robot navigation in unstructured outdoor environments. We specifically design MIDGARD to enable training of autonomous agents (e.g., unmanned ground vehicles) in photorealistic 3D environments, and to support the generalization skills of learning-based agents by means of diverse and variable training scenarios. MIDGARD differs from other major simulation platforms in that it proposes a highly configurable procedural landscape generation pipeline, which enables autonomous agents to be trained in diverse scenarios while reducing the efforts and costs needed to create digital content from scratch.
@misc{Vecchio2022MIDGARD,
title={MIDGARD: A Simulation Platform for Autonomous Navigation in Unstructured Environments},
author={Giuseppe Vecchio and Simone Palazzo and Dario C Guastella and Ignacio Carlucho and Stefano V. Albrecht and Giovanni Muscato and Concetto Spampinato},
year={2022},
eprint={2205.08389},
archivePrefix={arXiv},
primaryClass={cs.MA}
}
Balint Gyevnar, Massimiliano Tamborski, Cheng Wang, Christopher G. Lucas, Shay B. Cohen, Stefano V. Albrecht
A Human-Centric Method for Generating Causal Explanations in Natural Language for Autonomous Vehicle Motion Planning
IJCAI Workshop on Artificial Intelligence for Autonomous Driving, 2022
Abstract | BibTex | arXiv | Code
IJCAIautonomous-drivingexplainable-aicausal
Abstract:
Inscrutable AI systems are difficult to trust, especially if they operate in safety-critical settings like autonomous driving. Therefore, there is a need to build transparent and queryable systems to increase trust levels. We propose a transparent, human-centric explanation generation method for autonomous vehicle motion planning and prediction based on an existing white-box system called IGP2. Our method integrates Bayesian networks with context-free generative rules and can give causal natural language explanations for the high-level driving behaviour of autonomous vehicles. Preliminary testing on simulated scenarios shows that our method captures the causes behind the actions of autonomous vehicles and generates intelligible explanations with varying complexity.
@inproceedings{gyevnar2022humancentric,
title={A Human-Centric Method for Generating Causal Explanations in Natural Language for Autonomous Vehicle Motion Planning},
author={Balint Gyevnar and Massimiliano Tamborski and Cheng Wang and Christopher G. Lucas and Shay B. Cohen and Stefano V. Albrecht},
booktitle={IJCAI Workshop on Artificial Intelligence for Autonomous Driving},
year={2022}
}
Arrasy Rahman, Elliot Fosong, Ignacio Carlucho, Stefano V. Albrecht
Towards Robust Ad Hoc Teamwork Agents By Creating Diverse Training Teammates
IJCAI Workshop on Ad Hoc Teamwork, 2022
Abstract | BibTex | arXiv | Code
IJCAIad-hoc-teamworkmulti-agent-rl
Abstract:
Ad hoc teamwork (AHT) is the problem of creating an agent that must collaborate with previously unseen teammates without prior coordination. Many existing AHT methods can be categorised as type-based methods, which require a set of predefined teammates for training. Designing teammate types for training is a challenging issue that determines the generalisation performance of agents when dealing with teammate types unseen during training. In this work, we propose a method to discover diverse teammate types based on maximising best response diversity metrics. We show that our proposed approach yields teammate types that require a wider range of best responses from the learner during collaboration, which potentially improves the robustness of a learner's performance in AHT compared to alternative methods.
@inproceedings{rahman2022towards,
title={Towards Robust Ad Hoc Teamwork Agents By Creating Diverse Training Teammates},
author={Arrasy Rahman and Elliot Fosong and Ignacio Carlucho and Stefano V. Albrecht},
booktitle={IJCAI Workshop on Ad Hoc Teamwork},
year={2022}
}
Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht
Few-Shot Teamwork
IJCAI Workshop on Ad Hoc Teamwork, 2022
Abstract | BibTex | arXiv
IJCAIad-hoc-teamworkmulti-agent-rl
Abstract:
We propose the novel few-shot teamwork (FST) problem, where skilled agents trained in a team to complete one task are combined with skilled agents from different tasks, and together must learn to adapt to an unseen but related task. We discuss how the FST problem can be seen as addressing two separate problems: one of reducing the experience required to train a team of agents to complete a complex task; and one of collaborating with unfamiliar teammates to complete a new task. Progress towards solving FST could lead to progress in both multi-agent reinforcement learning and ad hoc teamwork.
@inproceedings{fosong2022fewshot,
title={Few-Shot Teamwork},
author={Elliot Fosong and Arrasy Rahman and Ignacio Carlucho and Stefano V. Albrecht},
booktitle={IJCAI Workshop on Ad Hoc Teamwork},
year={2022}
}
Ignacio Carlucho, Arrasy Rahman, William Ard, Elliot Fosong, Corina Barbalata, Stefano V. Albrecht
Cooperative Marine Operations Via Ad Hoc Teams
IJCAI Workshop on Ad Hoc Teamwork, 2022
Abstract | BibTex | arXiv
IJCAIad-hoc-teamworkmulti-agent-rl
Abstract:
While research in ad hoc teamwork has great potential for solving real-world robotic applications, most developments so far have been focusing on environments with simple dynamics. In this article, we discuss how the problem of ad hoc teamwork can be of special interest for marine robotics and how it can aid marine operations. Particularly, we present a set of challenges that need to be addressed for achieving ad hoc teamwork in underwater environments and we discuss possible solutions based on current state-of-the-art developments in the ad hoc teamwork literature.
@inproceedings{Carlucho2022UnderwaterAHT,
title={Cooperative Marine Operations Via Ad Hoc Teams},
author={Ignacio Carlucho and Arrasy Rahman and William Ard and Elliot Fosong and Corina Barbalata and Stefano V. Albrecht},
booktitle={IJCAI Workshop on Ad Hoc Teamwork},
year={2022}
}
Reuth Mirsky, Ignacio Carlucho, Arrasy Rahman, Elliot Fosong, William Macke, Mohan Sridharan, Peter Stone, Stefano V. Albrecht
A Survey of Ad Hoc Teamwork Research
European Conference on Multi-Agent Systems, 2022
Abstract | BibTex | arXiv
EUMASsurveyad-hoc-teamwork
Abstract:
Ad hoc teamwork is the research problem of designing agents that can collaborate with new teammates without prior coordination. This survey makes a two-fold contribution: First, it provides a structured description of the different facets of the ad hoc teamwork problem. Second, it discusses the progress that has been made in the field so far, and identifies the immediate and long-term open problems that need to be addressed in ad hoc teamwork.
@inproceedings{mirsky2022survey,
title={A Survey of Ad Hoc Teamwork Research},
author={Reuth Mirsky and Ignacio Carlucho and Arrasy Rahman and Elliot Fosong and William Macke and Mohan Sridharan and Peter Stone and Stefano V. Albrecht},
booktitle={European Conference on Multi-Agent Systems (EUMAS)},
year={2022}
}
Arrasy Rahman, Ignacio Carlucho, Niklas Höpner, Stefano V. Albrecht
A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning
arXiv:2210.05448, 2022
Abstract | BibTex | arXiv
ad-hoc-teamworkdeep-rlagent-modelling
Abstract:
Open ad hoc teamwork is the problem of training a single agent to efficiently collaborate with an unknown group of teammates whose composition may change over time. A variable team composition creates challenges for the agent, such as the requirement to adapt to new team dynamics and dealing with changing state vector sizes. These challenges are aggravated in real-world applications where the controlled agent has no access to the full state of the environment. In this work, we develop a class of solutions for open ad hoc teamwork under full and partial observability. We start by developing a solution for the fully observable case that leverages graph neural network architectures to obtain an optimal policy based on reinforcement learning. We then extend this solution to partially observable scenarios by proposing different methodologies that maintain belief estimates over the latent environment states and team composition. These belief estimates are combined with our solution for the fully observable case to compute an agent's optimal policy under partial observability in open ad hoc teamwork. Empirical results demonstrate that our approach can learn efficient policies in open ad hoc teamwork in fully and partially observable cases. Further analysis demonstrates that our methods' success is a result of effectively learning the effects of teammates' actions while also inferring the inherent state of the environment under partial observability.
@misc{Rahman2022POGPL,
title={A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning},
author={Arrasy Rahman and Ignacio Carlucho and Niklas H\"opner and Stefano V. Albrecht},
year={2022},
eprint={2210.05448},
archivePrefix={arXiv}
}
Aleksandar Krnjaic, Jonathan D. Thomas, Georgios Papoudakis, Lukas Schäfer, Peter Börsting, Stefano V. Albrecht
Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers
arXiv:2212.11498, 2022
Abstract | BibTex | arXiv
deep-rlmulti-agent-rl
Abstract:
This project leverages advances in Multi-Agent Reinforcement Learning (MARL) to improve the efficiency and flexibility of order-picking systems for large-scale commercial warehouses. We envision a warehouse of the future in which dozens or even hundreds of mobile robots and humans work together to collect and deliver items. The fundamental problem we tackle - called the order-picking problem - is how these agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput) under given resource constraints. MARL algorithms implement a paradigm whereby the agents learn via a process of trial-and-error how to optimally collaborate with one another. Established industry methods using fixed heuristics require a large engineering effort to operate in specific warehouse configurations and resource constraints, and their achievable performance is often limited by heuristic design limitations. In contrast, the MARL framework can be applied to any warehouse configuration (e.g. size, layout, number/types of workers, item replenishment frequency) and resource constraints, and the learning process maximises performance by optimising agent behaviours for the specified warehouse environment.
@misc{Krnjaic2022HSNAC,
title={Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers},
author={Aleksandar Krnjaic and Jonathan D. Thomas and Georgios Papoudakis and Lukas Sch\"afer and Peter B\"orsting and Stefano V. Albrecht},
year={2022},
eprint={2212.11498},
archivePrefix={arXiv}
}
Lukas Schäfer, Filippos Christianos, Amos Storkey, Stefano V. Albrecht
Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning
arXiv:2207.02249, 2022
Abstract | BibTex | arXiv
deep-rlmulti-agent-rl
Abstract:
Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.
@misc{schaefer2022mate,
title={Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning},
author={Lukas Sch\"afer and Filippos Christianos and Amos Storkey and Stefano V. Albrecht},
year={2022},
eprint={2207.02249},
archivePrefix={arXiv},
primaryClass={cs.MA}
}
Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
arXiv:2209.14344, 2022
Abstract | BibTex | arXiv
deep-rlmulti-agent-rl
Abstract:
Equilibrium selection in multi-agent games refers to the problem of selecting a Pareto-optimal equilibrium. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address suboptimal equilibrium selection, we propose Pareto-AC (PAC), an actor-critic algorithm that utilises a simple principle of no-conflict games (a superset of cooperative games with identical rewards): each agent can assume the others will choose actions that will lead to a Pareto-optimal equilibrium. We evaluate PAC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to alternative MARL algorithms, as well as successfully converging to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose a graph neural network extension which is shown to efficiently scale in games with up to 15 agents.
@misc{christianos2022pareto,
title={Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning},
author={Filippos Christianos and Georgios Papoudakis and Stefano V. Albrecht},
year={2022},
eprint={2209.14344},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Filippos Christianos, Peter Karkus, Boris Ivanovic, Stefano V. Albrecht, Marco Pavone
Planning with Occluded Traffic Agents using Bi-Level Variational Occlusion Models
arXiv:2210.14584, 2022
Abstract | BibTex | arXiv
autonomous-driving
Abstract:
Reasoning with occluded traffic agents is a significant open challenge for planning for autonomous vehicles. Recent deep learning models have shown impressive results for predicting occluded agents based on the behaviour of nearby visible agents; however, as we show in experiments, these models are difficult to integrate into downstream planning. To this end, we propose Bi-level Variational Occlusion Models (BiVO), a two-step generative model that first predicts likely locations of occluded agents, and then generates likely trajectories for the occluded agents. In contrast to existing methods, BiVO outputs a trajectory distribution which can then be sampled from and integrated into standard downstream planning. We evaluate the method in closed-loop replay simulation using the real-world nuScenes dataset. Our results suggest that BiVO can successfully learn to predict occluded agent trajectories, and these predictions lead to better subsequent motion plans in critical scenarios.
@misc{christianos2022bivo,
title={Planning with Occluded Traffic Agents using Bi-Level Variational Occlusion Models},
author={Filippos Christianos and Peter Karkus and Boris Ivanovic and Stefano V. Albrecht and Marco Pavone},
year={2022},
eprint={2210.14584},
archivePrefix={arXiv}
}
Anthony Knittel, Majd Hawasly, Stefano V. Albrecht, John Redford, Subramanian Ramamoorthy
DiPA: Diverse and Probabilistically Accurate Interactive Prediction
arXiv:2210.06106, 2022
Abstract | BibTex | arXiv
autonomous-drivingstate-estimation
Abstract:
Accurate prediction is important for operating an autonomous vehicle in interactive scenarios. Previous interactive predictors have used closest-mode evaluations, which test if one of a set of predictions covers the ground-truth, but not if additional unlikely predictions are made. The presence of unlikely predictions can interfere with planning, by indicating conflict with the ego plan when it is not likely to occur. Closest-mode evaluations are not sufficient for showing a predictor is useful; an effective predictor also needs to accurately estimate mode probabilities, and to be evaluated using probabilistic measures. These two evaluation approaches, e.g. predicted-mode RMS and minADE/FDE, are analogous to precision and recall in binary classification, and there is a challenging trade-off between prediction strategies for each. We present DiPA, a method for producing diverse predictions while also capturing accurate probabilistic estimates. DiPA uses a flexible representation that captures interactions in widely varying road topologies, and uses a novel training regime for a Gaussian Mixture Model that supports diversity of predicted modes, along with accurate spatial distribution and mode probability estimates. DiPA achieves state-of-the-art performance on INTERACTION and NGSIM, and improves over a baseline (MFP) when both closest-mode and probabilistic evaluations are used at the same time.
@misc{knittel2022dipa,
title={{DiPA:} Diverse and Probabilistically Accurate Interactive Prediction},
author={Anthony Knittel and Majd Hawasly and Stefano V. Albrecht and John Redford and Subramanian Ramamoorthy},
year={2022},
eprint={2210.06106},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
2021
Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks
Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlmulti-agent-rl
Abstract:
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we consistently evaluate and compare three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase [Samvelyan et al., 2019] to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.
@inproceedings{papoudakis2021benchmarking,
title={Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks},
author={Georgios Papoudakis and Filippos Christianos and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
year={2021},
url = {http://arxiv.org/abs/2006.07869},
openreview = {https://openreview.net/forum?id=cIrPX-Sn5n},
code = {https://github.com/uoe-agents/epymarl}
}
Georgios Papoudakis, Filippos Christianos, Stefano V. Albrecht
Agent Modelling under Partial Observability for Deep Reinforcement Learning
Conference on Neural Information Processing Systems, 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlagent-modelling
Abstract:
Modelling the behaviours of other agents is essential for understanding how agents interact and making effective decisions. Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution. To eliminate this assumption, we extract representations from the local information of the controlled agent using encoder-decoder architectures. Using the observations and actions of the modelled agents during training, our models learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent. The representations are used to augment the controlled agent's decision policy which is trained via deep reinforcement learning; thus, during execution, the policy does not require access to other agents' information. We provide a comprehensive evaluation and ablation studies in cooperative, competitive and mixed multi-agent environments, showing that our method achieves significantly higher returns than baseline methods which do not use the learned representations.
@inproceedings{papoudakis2021local,
title={Agent Modelling under Partial Observability for Deep Reinforcement Learning},
author={Georgios Papoudakis and Filippos Christianos and Stefano V. Albrecht},
booktitle = {Proceedings of the Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
Rujie Zhong, Josiah P. Hanna, Lukas Schäfer, Stefano V. Albrecht
Robust On-Policy Data Collection for Data-Efficient Policy Evaluation
NeurIPS Workshop on Offline Reinforcement Learning, 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rl
Abstract:
This paper considers how to complement offline reinforcement learning (RL) data with additional data collection for the task of policy evaluation. In policy evaluation, the task is to estimate the expected return of an evaluation policy on an environment of interest. Prior work on offline policy evaluation typically only considers a static dataset. We consider a setting where we can collect a small amount of additional data to combine with a potentially larger offline RL dataset. We show that simply running the evaluation policy – on-policy data collection – is sub-optimal for this setting. We then introduce two new data collection strategies for policy evaluation, both of which consider previously collected data when collecting future data so as to reduce distribution shift (or sampling error) in the entire dataset collected. Our empirical results show that compared to on-policy sampling, our strategies produce data with lower sampling error and generally lead to lower mean-squared error in policy evaluation for any total dataset size. We also show that these strategies can start from initial off-policy data, collect additional data, and then use both the initial and new data to produce low mean-squared error policy evaluation without using off-policy corrections.
@inproceedings{zhong2021robust,
title={Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
author={Rujie Zhong and Josiah P. Hanna and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
year={2021}
}
Arrasy Rahman, Niklas Höpner, Filippos Christianos, Stefano V. Albrecht
Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning
International Conference on Machine Learning, 2021
Abstract | BibTex | arXiv | Video | Code
ICMLdeep-rlagent-modellingad-hoc-teamwork
Abstract:
Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with teammates without prior coordination mechanisms, including joint training. Prior work in this area has focused on closed teams in which the number of agents is fixed. In this work, we consider open teams by allowing agents with different fixed policies to enter and leave the environment without prior notification. Our solution builds on graph neural networks to learn agent models and joint-action value models under varying team compositions. We contribute a novel action-value computation that integrates the agent model and joint-action value model to produce action-value estimates. We empirically demonstrate that our approach successfully models the effects other agents have on the learner, leading to policies that robustly adapt to dynamic team compositions and significantly outperform several alternative methods.
@inproceedings{rahman2021open,
title={Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning},
author={Arrasy Rahman and Niklas H\"opner and Filippos Christianos and Stefano V. Albrecht},
booktitle={International Conference on Machine Learning (ICML)},
year={2021}
}
Filippos Christianos, Georgios Papoudakis, Arrasy Rahman, Stefano V. Albrecht
Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing
International Conference on Machine Learning, 2021
Abstract | BibTex | arXiv | Video | Code
ICMLdeep-rlmulti-agent-rl
Abstract:
Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents. Parameter sharing between agents significantly decreases the number of trainable parameters, shortening training times to tractable levels, and has been linked to more efficient learning. However, having all agents share the same parameters can also have a detrimental effect on learning. We demonstrate the impact of parameter sharing methods on training speed and converged returns, establishing that when applied indiscriminately, their effectiveness is highly dependent on the environment. We propose a novel method to automatically identify agents which may benefit from sharing parameters by partitioning them based on their abilities and goals. Our approach combines the increased sample efficiency of parameter sharing with the representational capacity of multiple independent networks to reduce training time and increase final returns.
@inproceedings{christianos2021scaling,
title={Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing},
author={Filippos Christianos and Georgios Papoudakis and Arrasy Rahman and Stefano V. Albrecht},
booktitle={International Conference on Machine Learning (ICML)},
year={2021}
}
Lukas Schäfer, Filippos Christianos, Josiah Hanna, Stefano V. Albrecht
Decoupling Exploration and Exploitation in Reinforcement Learning
ICML Workshop on Unsupervised Reinforcement Learning, 2021
Abstract | BibTex | arXiv | Code
ICMLdeep-rlintrinsic-reward
Abstract:
Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL) which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to scaling and speed of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically motivated baselines in fewer interactions.
@inproceedings{schaefer2021decoupling,
title={Decoupling Exploration and Exploitation in Reinforcement Learning},
author={Lukas Sch\"afer and Filippos Christianos and Josiah Hanna and Stefano V. Albrecht},
booktitle={ICML Workshop on Unsupervised Reinforcement Learning (URL)},
year={2021}
}
Stefano V. Albrecht, Cillian Brewitt, John Wilhelm, Balint Gyevnar, Francisco Eiras, Mihai Dobre, Subramanian Ramamoorthy
Interpretable Goal-based Prediction and Planning for Autonomous Driving
IEEE International Conference on Robotics and Automation, 2021
Abstract | BibTex | arXiv | Video | Code
ICRAautonomous-drivinggoal-recognitionexplainable-ai
Abstract:
We propose an integrated prediction and planning system for autonomous driving which uses rational inverse planning to recognise the goals of other vehicles. Goal recognition informs a Monte Carlo Tree Search (MCTS) algorithm to plan optimal maneuvers for the ego vehicle. Inverse planning and MCTS utilise a shared set of defined maneuvers and macro actions to construct plans which are explainable by means of rationality principles. Evaluation in simulations of urban driving scenarios demonstrates the system's ability to robustly recognise the goals of other vehicles, enabling our vehicle to exploit non-trivial opportunities to significantly reduce driving times. In each scenario, we extract intuitive explanations for the predictions which justify the system's decisions.
@inproceedings{albrecht2020igp2,
title={Interpretable Goal-based Prediction and Planning for Autonomous Driving},
author={Stefano V. Albrecht and Cillian Brewitt and John Wilhelm and Balint Gyevnar and Francisco Eiras and Mihai Dobre and Subramanian Ramamoorthy},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2021}
}
Cillian Brewitt, Balint Gyevnar, Samuel Garcin, Stefano V. Albrecht
GRIT: Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021
Abstract | BibTex | arXiv | Video | Code
IROSautonomous-drivinggoal-recognitionexplainable-ai
Abstract:
It is important for autonomous vehicles to have the ability to infer the goals of other vehicles (goal recognition), in order to safely interact with other vehicles and predict their future trajectories. This is a difficult problem, especially in urban environments with interactions between many vehicles. Goal recognition methods must be fast to run in real time and make accurate inferences. As autonomous driving is safety-critical, it is important to have methods which are human interpretable and for which safety can be formally verified. Existing goal recognition methods for autonomous vehicles fail to satisfy all four objectives of being fast, accurate, interpretable and verifiable. We propose Goal Recognition with Interpretable Trees (GRIT), a goal recognition system which achieves these objectives. GRIT makes use of decision trees trained on vehicle trajectory data. We evaluate GRIT on two datasets, showing that GRIT achieved fast inference speed and comparable accuracy to two deep learning baselines, a planning-based goal recognition method, and an ablation of GRIT. We show that the learned trees are human interpretable and demonstrate how properties of GRIT can be formally verified using a satisfiability modulo theories (SMT) solver.
@inproceedings{brewitt2021grit,
title={{GRIT:} Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving},
author={Cillian Brewitt and Balint Gyevnar and Samuel Garcin and Stefano V. Albrecht},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021}
}
Josiah P. Hanna, Arrasy Rahman, Elliot Fosong, Francisco Eiras, Mihai Dobre, John Redford, Subramanian Ramamoorthy, Stefano V. Albrecht
Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021
Abstract | BibTex | arXiv
IROSautonomous-drivinggoal-recognitionexplainable-ai
Abstract:
Recognising the goals or intentions of observed vehicles is a key step towards predicting the long-term future behaviour of other agents in an autonomous driving scenario. When there are unseen obstacles or occluded vehicles in a scenario, goal recognition may be confounded by the effects of these unseen entities on the behaviour of observed vehicles. Existing prediction algorithms that assume rational behaviour with respect to inferred goals may fail to make accurate long-horizon predictions because they ignore the possibility that the behaviour is influenced by such unseen entities. We introduce the Goal and Occluded Factor Inference (GOFI) algorithm which bases inference on inverse-planning to jointly infer a probabilistic belief over goals and potential occluded factors. We then show how these beliefs can be integrated into Monte Carlo Tree Search (MCTS). We demonstrate that jointly inferring goals and occluded factors leads to more accurate beliefs with respect to the true world state and allows an agent to safely navigate several scenarios where other baselines take unsafe actions leading to collisions.
@inproceedings{hanna2021interpretable,
title={Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles},
author={Josiah P. Hanna and Arrasy Rahman and Elliot Fosong and Francisco Eiras and Mihai Dobre and John Redford and Subramanian Ramamoorthy and Stefano V. Albrecht},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021}
}
Henry Pulver, Francisco Eiras, Ludovico Carozza, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
PILOT: Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021
Abstract | BibTex | arXiv | Video
IROSautonomous-driving
Abstract:
Achieving a proper balance between planning quality, safety and efficiency is a major challenge for autonomous driving. Optimisation-based motion planners are capable of producing safe, smooth and comfortable plans, but often at the cost of runtime efficiency. On the other hand, naively deploying trajectories produced by efficient-to-run deep imitation learning approaches might risk compromising safety. In this paper, we present PILOT -- a planning framework that comprises an imitation neural network followed by an efficient optimiser that actively rectifies the network's plan, guaranteeing fulfilment of safety and comfort requirements. The objective of the efficient optimiser is the same as the objective of an expensive-to-run optimisation-based planning system that the neural network is trained offline to imitate. This efficient optimiser provides a key layer of online protection from learning failures or deficiency in out-of-distribution situations that might compromise safety or comfort. Using a state-of-the-art, runtime-intensive optimisation-based method as the expert, we demonstrate in simulated autonomous driving experiments in CARLA that PILOT achieves a seven-fold reduction in runtime when compared to the expert it imitates without sacrificing planning quality.
@inproceedings{pulver2020pilot,
title={{PILOT:} Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving},
author={Henry Pulver and Francisco Eiras and Ludovico Carozza and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021}
}
Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, Stefano V. Albrecht
Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction
International Conference on Practical Applications of Agents and Multi-Agent Systems, 2021
Abstract | BibTex | arXiv | Publisher | Code
PAAMSsecurityagent-modelling
Abstract:
Current methods for authentication and key agreement based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach based on artificial intelligence research in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. Authentication and key agreement are decided based on the agents' observed behaviors during the interaction. The security of this approach rests upon the difficulty of modeling the decisions of interacting agents from limited observations, a problem which we conjecture is also hard for quantum computing. We release PyAMI, a prototype authentication and key agreement system based on the proposed method. We empirically validate our method for authenticating legitimate users while detecting different types of adversarial attacks. Finally, we show how reinforcement learning techniques can be used to train server models which effectively probe a client's decisions to achieve more sample-efficient authentication.
@inproceedings{ahmed2021quantum,
title={Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction},
author={Ibrahim H. Ahmed and Josiah P. Hanna and Elliot Fosong and Stefano V. Albrecht},
booktitle={International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS)},
year={2021}
}
Shangmin Guo, Yi Ren, Kory Mathewson, Simon Kirby, Stefano V. Albrecht, Kenny Smith
Expressivity of Emergent Language is a Trade-off between Contextual Complexity and Unpredictability
arXiv:2106.03982, 2021
Abstract | BibTex | arXiv
multi-agent-rlemergent-communication
Abstract:
Researchers are now using deep learning models to explore the emergence of language in various language games, where simulated agents interact and develop an emergent language to solve a task. Although it is quite intuitive that different types of language games posing different communicative challenges might require emergent languages which encode different levels of information, there is no existing work exploring the expressivity of the emergent languages. In this work, we propose a definition of partial order between expressivity based on the generalisation performance across different language games. We also validate the hypothesis that expressivity of emergent languages is a trade-off between the complexity and unpredictability of the context those languages are used in. Our second novel contribution is introducing contrastive loss into the implementation of referential games. We show that using our contrastive loss alleviates the collapse of message types seen using standard referential loss functions.
@misc{guo2021expressivity,
title={Expressivity of Emergent Language is a Trade-off between Contextual Complexity and Unpredictability},
author={Shangmin Guo and Yi Ren and Kory Mathewson and Simon Kirby and Stefano V. Albrecht and Kenny Smith},
year={2021},
eprint={2106.03982},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Trevor McInroe, Lukas Schäfer, Stefano V. Albrecht
Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning
arXiv:2110.04935, 2021
Abstract | BibTex | arXiv | Code
deep-rl
Abstract:
Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens. Agents must learn an action-selection policy that completes their given task, which requires them to learn a representation of the state space that discerns between useful and useless information. The reward function is the only supervised feedback that RL agents receive, which causes a representation learning bottleneck that can manifest in poor sample efficiency. We present k-Step Latent (KSL), a new representation learning method that enforces temporal consistency of representations via a self-supervised auxiliary task wherein agents learn to recurrently predict action-conditioned representations of the state space. The state encoder learned by KSL produces low-dimensional representations that make optimization of the RL task more sample efficient. Altogether, KSL produces state-of-the-art results in both data efficiency and asymptotic performance in the popular PlaNet benchmark suite. Our analyses show that KSL produces encoders that generalize better to new tasks unseen during training, and its representations are more strongly tied to reward, are more invariant to perturbations in the state space, and move more smoothly through the temporal axis of the RL problem than other methods such as DrQ, RAD, CURL, and SAC-AE.
@misc{mcinroe2021learning,
title={Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning},
author={Trevor McInroe and Lukas Sch\"afer and Stefano V. Albrecht},
year={2021},
eprint={2110.04935},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
2020
Stefano V. Albrecht, Peter Stone, Michael P. Wellman
Special Issue on Autonomous Agents Modelling Other Agents: Guest Editorial
Artificial Intelligence, 2020
Abstract | BibTex | Publisher | Special Issue
AIJsurveyagent-modelling
Abstract:
Much research in artificial intelligence is concerned with enabling autonomous agents to reason about various aspects of other agents (such as their beliefs, goals, plans, or decisions) and to utilise such reasoning for effective interaction. This special issue contains new technical contributions addressing open problems in autonomous agents modelling other agents, as well as research perspectives about current developments, challenges, and future directions.
@article{albrecht2020special,
title = {Special Issue on Autonomous Agents Modelling Other Agents: Guest Editorial},
author = {Stefano V. Albrecht and Peter Stone and Michael P. Wellman},
journal = {Artificial Intelligence},
volume = {285},
year = {2020},
publisher = {Elsevier},
url = {https://doi.org/10.1016/j.artint.2020.103292}
}
Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Conference on Neural Information Processing Systems, 2020
Abstract | BibTex | arXiv
NeurIPSdeep-rlmulti-agent-rl
Abstract:
Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
@inproceedings{christianos2020shared,
title={Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning},
author={Filippos Christianos and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle={34th Conference on Neural Information Processing Systems},
year={2020}
}
Georgios Papoudakis, Stefano V. Albrecht
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems
AAAI Workshop on Reinforcement Learning in Games, 2020
Abstract | BibTex | arXiv
AAAIdeep-rlagent-modelling
Abstract:
Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment. In this work, we are interested in controlling one agent in a multi-agent system and successfully learning to interact with the other agents that have fixed policies. Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system. By taking advantage of recent advances in unsupervised learning, we propose modeling opponents using variational autoencoders. Additionally, many existing methods in the literature assume that the opponent models have access to the opponent's observations and actions during both training and execution. To eliminate this assumption, we propose a modification that attempts to identify the underlying opponent model using only local information of our agent, such as its observations, actions, and rewards. The experiments indicate that our opponent modeling methods achieve equal or greater episodic returns in reinforcement learning tasks against another modeling method.
@inproceedings{papoudakis2020variational,
title={Variational Autoencoders for Opponent Modeling in Multi-Agent Systems},
author={Georgios Papoudakis and Stefano V. Albrecht},
booktitle={AAAI Workshop on Reinforcement Learning in Games},
year={2020}
}
Arrasy Rahman, Niklas Höpner, Filippos Christianos, Stefano V. Albrecht
Open Ad Hoc Teamwork using Graph-based Policy Learning
arXiv:2006.10412, 2020
Abstract | BibTex | arXiv
deep-rlagent-modellingad-hoc-teamwork
Abstract:
Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with previously unknown teammates. Prior work in this area has focused on closed teams in which the number of agents is fixed. In this work, we consider open teams by allowing agents of varying types to enter and leave the team without prior notification. Our proposed solution builds on graph neural networks to learn scalable agent models and value decompositions under varying team sizes, which can be jointly trained with a reinforcement learning agent using discounted returns objectives. We demonstrate empirically that our approach results in agent policies which can robustly adapt to dynamic team composition, and is able to effectively generalize to larger teams than were seen during training.
@misc{rahman2020open,
title={Open Ad Hoc Teamwork using Graph-based Policy Learning},
author={Arrasy Rahman and Niklas H\"opner and Filippos Christianos and Stefano V. Albrecht},
year={2020},
eprint={2006.10412},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms
arXiv:2006.07869, 2020
Abstract | BibTex | arXiv
deep-rlmulti-agent-rl
Abstract:
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.
@misc{papoudakis2020comparative,
title={Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms},
author={Georgios Papoudakis and Filippos Christianos and Lukas Sch\"afer and Stefano V. Albrecht},
year={2020},
eprint={2006.07869},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Georgios Papoudakis, Filippos Christianos, Stefano V. Albrecht
Local Information Opponent Modelling Using Variational Autoencoders
arXiv:2006.09447, 2020
Abstract | BibTex | arXiv
deep-rlagent-modelling
Abstract:
Modelling the behaviours of other agents (opponents) is essential for understanding how agents interact and making effective decisions. Existing methods for opponent modelling commonly assume knowledge of the local observations and chosen actions of the modelled opponents, which can significantly limit their applicability. We propose a new modelling technique based on variational autoencoders, which are trained to reconstruct the local actions and observations of the opponent based on embeddings which depend only on the local observations of the modelling agent (its observed world state, chosen actions, and received rewards). The embeddings are used to augment the modelling agent's decision policy which is trained via deep reinforcement learning; thus the policy does not require access to opponent observations. We provide a comprehensive evaluation and ablation study in diverse multi-agent tasks, showing that our method achieves comparable performance to an ideal baseline which has full access to the opponent's information, and significantly higher returns than a baseline method which does not use the learned embeddings.
@misc{papoudakis2020opponent,
title={Local Information Opponent Modelling Using Variational Autoencoders},
author={Georgios Papoudakis and Filippos Christianos and Stefano V. Albrecht},
year={2020},
eprint={2006.09447},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Ibrahim H. Ahmed, Josiah P. Hanna, Stefano V. Albrecht
Quantum-Secure Authentication via Abstract Multi-Agent Interaction
arXiv:2007.09327, 2020
Abstract | BibTex | arXiv
securityagent-modelling
Abstract:
Current methods for authentication based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach to authentication in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. The security of this approach rests upon the difficulty of learning the model parameters of interacting agents, a problem which we conjecture is also hard for quantum computing. We develop methods which enable a server agent to classify a client agent as either legitimate or adversarial based on their past interactions. Moreover, we use reinforcement learning techniques to train server policies which effectively probe the client's decisions to achieve more sample-efficient authentication, while making modelling attacks as difficult as possible via entropy-maximization principles. We empirically validate our methods for authenticating legitimate users while detecting different types of adversarial attacks.
@misc{ahmed2020quantumsecure,
title={Quantum-Secure Authentication via Abstract Multi-Agent Interaction},
author={Ibrahim H. Ahmed and Josiah P. Hanna and Stefano V. Albrecht},
year={2020},
eprint={2007.09327},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
Stefano V. Albrecht, Cillian Brewitt, John Wilhelm, Balint Gyevnar, Francisco Eiras, Mihai Dobre, Subramanian Ramamoorthy
Interpretable Goal-based Prediction and Planning for Autonomous Driving
arXiv:2002.02277, 2020
Abstract | BibTex | arXiv
autonomous-drivinggoal-recognitionexplainable-ai
Abstract:
We propose an integrated prediction and planning system for autonomous driving which uses rational inverse planning to recognise the goals of other vehicles. Goal recognition informs a Monte Carlo Tree Search (MCTS) algorithm to plan optimal maneuvers for the ego vehicle. Inverse planning and MCTS utilise a shared set of defined maneuvers and macro actions to construct plans which are explainable by means of rationality principles. Evaluation in simulations of urban driving scenarios demonstrates the system's ability to robustly recognise the goals of other vehicles, enabling our vehicle to exploit non-trivial opportunities to significantly reduce driving times. In each scenario, we extract intuitive explanations for the predictions which justify the system's decisions.
@misc{albrecht2020integrating,
title={Interpretable Goal-based Prediction and Planning for Autonomous Driving},
author={Stefano V. Albrecht and Cillian Brewitt and John Wilhelm and Balint Gyevnar and Francisco Eiras and Mihai Dobre and Subramanian Ramamoorthy},
year={2020},
eprint={2002.02277},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
Henry Pulver, Francisco Eiras, Ludovico Carozza, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
PILOT: Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving
arXiv:2011.00509, 2020
Abstract | BibTex | arXiv
autonomous-driving
Abstract:
Achieving the right balance between planning quality, safety and runtime efficiency is a major challenge for autonomous driving research. Optimisation-based planners are typically capable of producing high-quality, safe plans, but at the cost of efficiency. We present PILOT, a two-stage planning framework comprising an imitation neural network and an efficient optimisation component that guarantees the satisfaction of requirements of safety and comfort. The neural network is trained to imitate an expensive-to-run optimisation-based planning system with the same objective as the efficient optimisation component of PILOT. We demonstrate in simulated autonomous driving experiments that the proposed framework achieves a significant reduction in runtime when compared to the optimisation-based expert it imitates, without sacrificing the planning quality.
@misc{pulver2020pilot,
title={{PILOT:} Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving},
author={Henry Pulver and Francisco Eiras and Ludovico Carozza and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
year={2020},
eprint={2011.00509},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
Francisco Eiras, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
Two-Stage Optimization-based Motion Planner for Safe Urban Driving
arXiv:2002.02215, 2020
Abstract | BibTex | arXiv
autonomous-driving
Abstract:
Recent road trials have shown that guaranteeing the safety of driving decisions is essential for the wider adoption of autonomous vehicle technology. One promising direction is to pose safety requirements as planning constraints in nonlinear, nonconvex optimization problems of motion synthesis. However, many implementations of this approach are limited by uncertain convergence and local optimality of the solutions achieved, affecting overall robustness. To improve upon these issues, we propose a novel two-stage optimization framework: in the first stage, we find a solution to a Mixed-Integer Linear Programming (MILP) formulation of the motion synthesis problem, the output of which initializes a second Nonlinear Programming (NLP) stage. The MILP stage enforces hard constraints of safety and road rule compliance generating a solution in the right subspace, while the NLP stage refines the solution within the safety bounds for feasibility and smoothness. We demonstrate the effectiveness of our framework via simulated experiments of complex urban driving scenarios, outperforming a state-of-the-art baseline in metrics of convergence, comfort and progress.
@misc{eiras2020twostage,
title={Two-Stage Optimization-based Motion Planner for Safe Urban Driving},
author={Francisco Eiras and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
year={2020},
eprint={2002.02215},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
2019
Maciej Wiatrak, Stefano V. Albrecht, Andrew Nystrom
Stabilizing Generative Adversarial Networks: A Survey
arXiv:1910.00927, 2019
Abstract | BibTex | arXiv
surveysecurity
Abstract:
Generative Adversarial Networks (GANs) are a type of generative model which have received much attention due to their ability to model complex real-world data. Despite their recent successes, the process of training GANs remains challenging, suffering from instability problems such as non-convergence, vanishing or exploding gradients, and mode collapse. In recent years, a diverse set of approaches have been proposed which focus on stabilizing the GAN training procedure. The purpose of this survey is to provide a comprehensive overview of the GAN training stabilization methods which can be found in the literature. We discuss the advantages and disadvantages of each approach, offer a comparative summary, and conclude with a discussion of open problems.
@misc{wiatrak2019stabilizing,
title={Stabilizing Generative Adversarial Networks: A Survey},
author={Maciej Wiatrak and Stefano V. Albrecht and Andrew Nystrom},
year={2019},
eprint={1910.00927},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, Stefano V. Albrecht
Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning
arXiv:1906.04737, 2019
Abstract | BibTex | arXiv
surveydeep-rlmulti-agent-rl
Abstract:
Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.
@misc{papoudakis2019dealing,
title={Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning},
author={Georgios Papoudakis and Filippos Christianos and Arrasy Rahman and Stefano V. Albrecht},
year={2019},
eprint={1906.04737},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
2018
Stefano V. Albrecht, Peter Stone
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems
Artificial Intelligence, 2018
Abstract | BibTex | arXiv | Publisher
AIJsurveyagent-modellinggoal-recognition
Abstract:
Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
@article{ albrecht2018modelling,
title = {Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems},
author = {Stefano V. Albrecht and Peter Stone},
journal = {Artificial Intelligence},
volume = {258},
pages = {66--95},
year = {2018},
publisher = {Elsevier},
note = {DOI: 10.1016/j.artint.2018.01.002}
}
Craig Innes, Alex Lascarides, Stefano V. Albrecht, Subramanian Ramamoorthy, Benjamin Rosman
Reasoning about Unforeseen Possibilities During Policy Learning
arXiv:1801.03331, 2018
Abstract | BibTex | arXiv
causal
Abstract:
Methods for learning optimal policies in autonomous agents often assume that the way the domain is conceptualised - its possible states and actions and their causal structure - is known in advance and does not change during learning. This is an unrealistic assumption in many scenarios, because new evidence can reveal important information about what is possible, possibilities that the agent was not aware existed prior to learning. We present a model of an agent which both discovers and learns to exploit unforeseen possibilities using two sources of evidence: direct interaction with the world and communication with a domain expert. We use a combination of probabilistic and symbolic reasoning to estimate all components of the decision problem, including its set of random variables and their causal dependencies. Agent simulations show that the agent converges on optimal policies even when it starts out unaware of factors that are critical to behaving optimally.
@misc{innes2018reasoning,
title={Reasoning about Unforeseen Possibilities During Policy Learning},
author={Craig Innes and Alex Lascarides and Stefano V. Albrecht and Subramanian Ramamoorthy and Benjamin Rosman},
year={2018},
eprint={1801.03331},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
2017
Stefano V. Albrecht, Somchaya Liemhetcharat, Peter Stone
Special Issue on Multiagent Interaction without Prior Coordination: Guest Editorial
Journal of Autonomous Agents and Multi-Agent Systems, 2017
Abstract | BibTex | Publisher | MIPC Workshop Series
JAAMASsurveyad-hoc-teamwork
Abstract:
This special issue of the Journal of Autonomous Agents and Multi-Agent Systems sought research articles on the emerging topic of multiagent interaction without prior coordination. Topics of interest included empirical and theoretical investigations of issues arising from assumptions of prior coordination, as well as solutions in the form of novel models and algorithms for effective multiagent interaction without prior coordination.
@article{ albrecht2017special,
title = {Special Issue on Multiagent Interaction without Prior Coordination: Guest Editorial},
author = {Stefano V. Albrecht and Somchaya Liemhetcharat and Peter Stone},
journal = {Autonomous Agents and Multi-Agent Systems},
volume = {31},
number = {4},
pages = {765--766},
year = {2017},
publisher = {Springer},
url = {http://dx.doi.org/10.1007/s10458-016-9358-0}
}
Stefano V. Albrecht, Peter Stone
Reasoning about Hypothetical Agent Behaviours and their Parameters
International Conference on Autonomous Agents and Multiagent Systems, 2017
Abstract | BibTex | arXiv
AAMASad-hoc-teamworkagent-modelling
Abstract:
Agents can achieve effective interaction with previously unknown other agents by maintaining beliefs over a set of hypothetical behaviours, or types, that these agents may have. A current limitation in this method is that it does not recognise parameters within type specifications, because types are viewed as blackbox mappings from interaction histories to probability distributions over actions. In this work, we propose a general method which allows an agent to reason about both the relative likelihood of types and the values of any bounded continuous parameters within types. The method maintains individual parameter estimates for each type and selectively updates the estimates for some types after each observation. We propose different methods for the selection of types and the estimation of parameter values. The proposed methods are evaluated in detailed experiments, showing that updating the parameter estimates of a single type after each observation can be sufficient to achieve good performance.
@inproceedings{ albrecht2017reasoning,
title = {Reasoning about Hypothetical Agent Behaviours and their Parameters},
author = {Stefano V. Albrecht and Peter Stone},
booktitle = {Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems},
pages = {547--555},
year = {2017}
}
Stefano V. Albrecht, Subramanian Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks (Extended Abstract)
International Joint Conference on Artificial Intelligence, 2017
Abstract | BibTex | arXiv
IJCAIstate-estimationcausal
Abstract:
Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and uncertain observations. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF is evaluated in both synthetic processes and a simulated multi-robot warehouse, where it outperformed alternative filtering methods by exploiting passivity.
@inproceedings{ albrecht2017causality,
title = {Exploiting Causality for Selective Belief Filtering in Dynamic {B}ayesian Networks (Extended Abstract)},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence},
address = {Melbourne, Australia},
month = {August},
year = {2017}
}
2016
Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
Belief and Truth in Hypothesised Behaviours
Artificial Intelligence, 2016
Abstract | BibTex | arXiv | Publisher
AIJagent-modellingad-hoc-teamwork
Abstract:
There is a long history in game theory on the topic of Bayesian or “rational” learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used as a method to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications have a focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types. We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be correct. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long-term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able to complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.
@article{ albrecht2016belief,
title = {Belief and Truth in Hypothesised Behaviours},
author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
journal = {Artificial Intelligence},
volume = {235},
pages = {63--94},
year = {2016},
publisher = {Elsevier},
note = {DOI: 10.1016/j.artint.2016.02.004}
}
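Illustrative sketch (not taken from the paper): the abstract above describes maintaining beliefs over hypothesised types and updating them from observed actions. One textbook way to do this is a plain Bayes update in which the likelihood of a type is the product of the probabilities it assigned to the observed actions. The function name, the type names and the numbers below are hypothetical, and this generic form is only an assumption about what such an update can look like; it is not claimed to match any of the three formulations studied in the article.

```python
from typing import Dict, Sequence


def update_type_posterior(
    prior: Dict[str, float],
    action_likelihoods: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Return P(type | observed actions) using Bayes' rule.

    prior[t] is the prior belief in hypothesised type t.
    action_likelihoods[t] lists the probability that type t assigned
    to each action that was actually observed (hypothetical inputs).
    """
    unnormalised = {}
    for type_name, prior_prob in prior.items():
        likelihood = 1.0
        for p in action_likelihoods[type_name]:
            likelihood *= p  # product of per-step action probabilities
        unnormalised[type_name] = prior_prob * likelihood

    total = sum(unnormalised.values())
    if total == 0.0:
        # No hypothesised type explains the observations; keep the prior.
        return dict(prior)
    return {t: v / total for t, v in unnormalised.items()}


# Two hypothetical types and two observed actions.
prior = {"cooperator": 0.5, "defector": 0.5}
likelihoods = {"cooperator": [0.9, 0.8], "defector": [0.1, 0.2]}
posterior = update_type_posterior(prior, likelihoods)
print(posterior)
```

In this toy run the belief shifts strongly towards the "cooperator" type, because that type assigned much higher probability to both observed actions.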
Stefano V. Albrecht, Subramanian Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks
Journal of Artificial Intelligence Research, 2016
Abstract | BibTex | arXiv | Publisher
JAIRstate-estimationcausal
Abstract:
Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.
@article{ albrecht2016causality,
title = {Exploiting Causality for Selective Belief Filtering in Dynamic {B}ayesian Networks},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
journal = {Journal of Artificial Intelligence Research},
volume = {55},
pages = {1135--1178},
year = {2016},
publisher = {AI Access Foundation},
note = {DOI: 10.1613/jair.5044}
}
2015
Stefano V. Albrecht, Subramanian Ramamoorthy
Are You Doing What I Think You Are Doing? Criticising Uncertain Agent Models
Conference on Uncertainty in Artificial Intelligence, 2015
Abstract | BibTex | arXiv
UAIagent-modelling
Abstract:
The key for effective interaction in many multiagent applications is to reason explicitly about the behaviour of other agents, in the form of a hypothesised behaviour. While there exist several methods for the construction of a behavioural hypothesis, there is currently no universal theory which would allow an agent to contemplate the correctness of a hypothesis. In this work, we present a novel algorithm which decides this question in the form of a frequentist hypothesis test. The algorithm allows for multiple metrics in the construction of the test statistic and learns its distribution during the interaction process, with asymptotic correctness guarantees. We present results from a comprehensive set of experiments, demonstrating that the algorithm achieves high accuracy and scalability at low computational costs.
@inproceedings{ albrecht2015criticising,
title = {Are You Doing What {I} Think You Are Doing? Criticising Uncertain Agent Models},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence},
pages = {52--61},
year = {2015}
}
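Illustrative sketch (not the algorithm from the paper): the abstract above frames the question "is this behavioural hypothesis correct?" as a frequentist hypothesis test. A much simplified version of that idea is a Monte Carlo test: score the observed actions under the hypothesis, sample action sequences from the hypothesis itself, and report how often the sampled score is at least as low as the observed one. Everything here is an assumption made for illustration: the hypothesis is a fixed distribution over actions (rather than a history-dependent behaviour), a single metric is used, and the names `average_likelihood` and `monte_carlo_p_value` are hypothetical.

```python
import random
from typing import Dict, List


def average_likelihood(actions: List[str], hypothesis: Dict[str, float]) -> float:
    """Mean probability the hypothesis assigns to the given actions."""
    return sum(hypothesis[a] for a in actions) / len(actions)


def monte_carlo_p_value(
    observed_actions: List[str],
    hypothesis: Dict[str, float],
    num_samples: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided Monte Carlo p-value for 'the agent follows the hypothesis'.

    A small p-value means the observed actions score unusually low
    compared with action sequences drawn from the hypothesis itself.
    """
    rng = random.Random(seed)
    actions = list(hypothesis)
    weights = [hypothesis[a] for a in actions]
    observed_score = average_likelihood(observed_actions, hypothesis)
    n = len(observed_actions)

    at_most_observed = 0
    for _ in range(num_samples):
        sample = rng.choices(actions, weights=weights, k=n)
        if average_likelihood(sample, hypothesis) <= observed_score:
            at_most_observed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_most_observed + 1) / (num_samples + 1)


hypothesis = {"C": 0.9, "D": 0.1}  # hypothesised behaviour: mostly cooperate
p_consistent = monte_carlo_p_value(["C"] * 20, hypothesis)
p_inconsistent = monte_carlo_p_value(["D"] * 20, hypothesis)
print(p_consistent, p_inconsistent)
```

A caller would reject the hypothesis when the p-value falls below a chosen significance level; the threshold is a design choice left open here.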
Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types
AAAI Conference on Artificial Intelligence, 2015
Abstract | BibTex | arXiv | Appendix
AAAIagent-modellingad-hoc-teamwork
Abstract:
Many multiagent applications require an agent to learn quickly how to interact with previously unknown other agents. To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents. The posterior belief is complemented by the prior belief, which specifies the subjective likelihood of policies before any actions are observed. In this paper, we present the first comprehensive empirical study on the practical impact of prior beliefs over policies in repeated interactions. We show that prior beliefs can have a significant impact on the long-term performance of such methods, and that the magnitude of the impact depends on the depth of the planning horizon. Moreover, our results demonstrate that automatic methods can be used to compute prior beliefs with consistent performance effects. This indicates that prior beliefs could be eliminated as a manual parameter and instead be computed automatically.
@inproceedings{ albrecht2015empirical,
title = {An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types},
author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 29th AAAI Conference on Artificial Intelligence},
pages = {1988--1994},
year = {2015}
}
Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
E-HBA: Using Action Policies for Expert Advice and Agent Typification
AAAI Workshop on Multiagent Interaction without Prior Coordination, 2015
Abstract | BibTex | arXiv | Appendix
AAAIagent-modellingad-hoc-teamwork
Abstract:
Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.
@inproceedings{ albrecht2015ehba,
title = {{E-HBA}: Using Action Policies for Expert Advice and Agent Typification},
author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
booktitle = {AAAI Workshop on Multiagent Interaction without Prior Coordination},
address = {Austin, Texas, USA},
month = {January},
year = {2015}
}
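Illustrative sketch (not the E-HBA update rule): the abstract above says that the past payoff of an expert is gradually mixed with a predicted future payoff. The snippet below shows only the general shape of such a mix, as a convex combination controlled by a weight. The linear form, the greedy selection, the expert names and all numbers are assumptions for illustration; how the weight changes over time and how the prediction is computed from the type-based characterisation are left unspecified.

```python
from typing import Dict


def mixed_expert_score(
    past_avg_payoff: float,
    predicted_future_payoff: float,
    weight_on_prediction: float,
) -> float:
    """Convex mix of an expert's past average payoff and a predicted payoff."""
    if not 0.0 <= weight_on_prediction <= 1.0:
        raise ValueError("weight_on_prediction must lie in [0, 1]")
    w = weight_on_prediction
    return (1.0 - w) * past_avg_payoff + w * predicted_future_payoff


def select_expert(
    past: Dict[str, float],
    predicted: Dict[str, float],
    weight_on_prediction: float,
) -> str:
    """Pick the expert with the highest mixed score (greedy, for illustration)."""
    scores = {
        name: mixed_expert_score(past[name], predicted[name], weight_on_prediction)
        for name in past
    }
    return max(scores, key=scores.get)


# Hypothetical experts with past average payoffs and predicted future payoffs.
past = {"tit_for_tat": 2.0, "always_defect": 3.0}
predicted = {"tit_for_tat": 4.0, "always_defect": 1.0}

print(select_expert(past, predicted, 0.0))  # relies on past payoffs only
print(select_expert(past, predicted, 0.5))  # equal mix of past and predicted
```

With weight 0 the choice depends only on past payoffs; as the weight on the prediction grows, the choice can switch to the expert that is expected to do better in the future.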
2014
Stefano V. Albrecht, Subramanian Ramamoorthy
On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems
Conference on Uncertainty in Artificial Intelligence, 2014
Abstract | BibTex | arXiv | Appendix
UAIagent-modelling
Abstract:
While many multiagent algorithms are designed for homogeneous systems (i.e. all agents are identical), there are important applications which require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert could provide a specification of this set, albeit only a partially correct one. Algorithms have been proposed by several researchers to compute posterior beliefs over such policy libraries, which can then be used to determine optimal actions. In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen. We analyse convergence properties of two existing posterior formulations and propose a new posterior which can learn correlated distributions. Secondly, since the types are provided by an expert, they may be inaccurate in the sense that they do not predict the agents’ observed actions. We provide a novel characterisation of optimality which allows experts to use efficient model checking algorithms to verify optimality of types.
@inproceedings{ albrecht2014convergence,
title = {On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence},
pages = {12--21},
year = {2014}
}
2013
Stefano V. Albrecht, Subramanian Ramamoorthy
A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems
International Conference on Autonomous Agents and Multiagent Systems, 2013
Abstract | BibTex | arXiv (full technical report) | Extended Abstract
AAMASad-hoc-teamworkagent-modelling
Abstract:
The ad hoc coordination problem is to design an autonomous agent which is able to achieve optimal flexibility and efficiency in a multiagent system with no mechanisms for prior coordination. We conceptualise this problem formally using a game-theoretic model, called the stochastic Bayesian game, in which the behaviour of a player is determined by its private information, or type. Based on this model, we derive a solution, called Harsanyi-Bellman Ad Hoc Coordination (HBA), which utilises the concept of Bayesian Nash equilibrium in a planning procedure to find optimal actions in the sense of Bellman optimal control. We evaluate HBA in a multiagent logistics domain called level-based foraging, showing that it achieves higher flexibility and efficiency than several alternative algorithms. We also report on a human-machine experiment at a public science exhibition in which the human participants played repeated Prisoner's Dilemma and Rock-Paper-Scissors against HBA and alternative algorithms, showing that HBA achieves equal efficiency and a significantly higher welfare and winning rate.
@inproceedings{ albrecht2013game,
title = {A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems},
address = {St. Paul, Minnesota, USA},
month = {May},
year = {2013}
}
2012
Stefano V. Albrecht, Subramanian Ramamoorthy
Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems
International Conference on Autonomous Agents and Multiagent Systems, 2012
Abstract | BibTex | arXiv
AAMASmulti-agent-rlad-hoc-teamwork
Abstract:
This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogeneous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
@inproceedings{ albrecht2012comparative,
title = {Comparative Evaluation of {MAL} Algorithms in a Diverse Set of Ad Hoc Team Problems},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems},
pages = {349--356},
year = {2012}
}