Publications
2024
Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht
Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition
International Conference on Autonomous Agents and Multiagent Systems, 2024
Abstract | BibTeX | arXiv | Code
Tags: AAMAS, multi-agent-rl
Abstract:
Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large joint policy space, and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach which uses an expert-provided decomposition of a task into simpler multi-agent sub-tasks. In each sub-task, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We show empirically that such approaches can greatly reduce the number of timesteps required to solve a complex target task relative to training from scratch. However, we also identify and investigate two problems with naive implementations of approaches based on sub-task decomposition, and propose a simple and scalable method, which augments existing actor-critic algorithms, to address these problems. We demonstrate the empirical benefits of our proposed method, enabling sub-task decomposition approaches to be deployed in diverse multi-agent tasks.
@inproceedings{fosongLearningComplexTeamwork2024,
title = {Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition},
author = {Fosong, Elliot and Rahman, Arrasy and Carlucho, Ignacio and Albrecht, Stefano V.},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
year = {2024}
}
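To make the three-stage scheme in this abstract concrete, here is a minimal sketch in Python; the helper names, task labels, and step counts are invented for illustration and are not from the paper's code release.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    """Stand-in for a trainable per-agent policy."""
    agent_id: str
    steps_trained: int = 0

def train_on_subtask(agents, subtask, steps):
    """Stage 1: train a sub-team on one simpler multi-agent sub-task."""
    policies = {a: Policy(a) for a in agents}
    for p in policies.values():
        p.steps_trained += steps       # placeholder for an actual RL loop
    return policies

def merge_and_finetune(subteams, steps):
    """Stages 2-3: merge the sub-teams, then fine-tune jointly on the target task."""
    team = {a: p for policies in subteams for a, p in policies.items()}
    for p in team.values():
        p.steps_trained += steps       # placeholder for joint fine-tuning
    return team

# Expert-provided decomposition: two sub-teams trained on two sub-tasks.
decomposition = [(["a1", "a2"], "sub_task_A"), (["a3", "a4"], "sub_task_B")]
subteams = [train_on_subtask(ag, task, steps=10_000) for ag, task in decomposition]
team = merge_and_finetune(subteams, steps=5_000)
print({a: p.steps_trained for a, p in team.items()})
```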
Balint Gyevnar, Cheng Wang, Christopher G. Lucas, Shay B. Cohen, Stefano V. Albrecht
Causal Explanations for Sequential Decision-Making in Multi-Agent Systems
International Conference on Autonomous Agents and Multiagent Systems, 2024
Abstract | BibTeX | arXiv | Code | Dataset
Tags: AAMAS, explainable-ai, autonomous-driving, causal
Abstract:
We present CEMA (Causal Explanations in Multi-Agent systems), a framework for creating causal natural language explanations of an agent's decisions in dynamic sequential multi-agent systems, with the aim of building more trustworthy autonomous agents. Unlike prior work that assumes a fixed causal structure, CEMA only requires a probabilistic model for forward-simulating the state of the system. Using such a model, CEMA simulates counterfactual worlds to identify the salient causes behind the agent's decisions. We evaluate CEMA on the task of motion planning for autonomous driving and test it in diverse simulated scenarios. We show that CEMA correctly and robustly identifies the causes behind the agent's decisions, even when a large number of other agents are present, and we show via a user study that CEMA's explanations have a positive effect on participants' trust in autonomous vehicles and are rated as highly as high-quality baseline explanations elicited from other participants.
@inproceedings{gyevnar2024cema,
title={Causal Explanations for Sequential Decision-Making in Multi-Agent Systems},
author={Balint Gyevnar and Cheng Wang and Christopher G. Lucas and Shay B. Cohen and Stefano V. Albrecht},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
year={2024}
}
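The counterfactual-simulation idea behind CEMA can be illustrated in a few lines. The toy forward model and driving variables below are invented; CEMA itself samples rollouts from a probabilistic motion-planning model rather than applying a deterministic rule.

```python
def forward_simulate(state, intervention=None):
    """Toy forward model: roll the world forward, optionally intervening on one variable."""
    s = dict(state)
    if intervention is not None:
        key, value = intervention
        s[key] = value
    # Decision rule of the ego vehicle in this toy world:
    return "brake" if s["lead_car_slowing"] or s["pedestrian_near"] else "cruise"

state = {"lead_car_slowing": True, "pedestrian_near": False}
factual = forward_simulate(state)

# Rank each variable by whether intervening on it flips the decision.
# (CEMA instead aggregates over many sampled counterfactual rollouts.)
salience = {
    var: int(forward_simulate(state, (var, not val)) != factual)
    for var, val in state.items()
}
print(factual, salience)   # brake {'lead_car_slowing': 1, 'pedestrian_near': 0}
```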
2023
Balint Gyevnar, Cheng Wang, Christopher G. Lucas, Shay B. Cohen, Stefano V. Albrecht
Causal Social Explanations for Stochastic Sequential Multi-Agent Decision-Making
AAMAS Workshop on Explainable and Transparent AI and Multi-Agent Systems, 2023
Abstract | BibTeX | arXiv | Code
Tags: AAMAS, autonomous-driving, explainable-ai, causal
Abstract:
We present a novel framework to generate causal explanations for the decisions of agents in stochastic sequential multi-agent environments. Explanations are given via natural language conversations answering a wide range of user queries and requiring associative, interventionist, or counterfactual causal reasoning. Instead of assuming any specific causal graph, our method relies on a generative model of interactions to simulate counterfactual worlds which are used to identify the salient causes behind decisions. We implement our method for motion planning for autonomous driving and test it in simulated scenarios with coupled interactions. Our method correctly identifies and ranks the relevant causes and delivers concise explanations to the users' queries.
@inproceedings{gyevnar2023causal,
title={Causal Social Explanations for Stochastic Sequential Multi-Agent Decision-Making},
author={Balint Gyevnar and Cheng Wang and Christopher G. Lucas and Shay B. Cohen and Stefano V. Albrecht},
booktitle={5th International Workshop on EXplainable and TRAnsparent AI and Multi-Agent Systems},
year={2023}
}
Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
AAMAS Workshop on Optimization and Learning in Multiagent Systems, 2023
Abstract | BibTeX | arXiv
Tags: AAMAS, deep-rl, multi-agent-rl
Abstract:
This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address suboptimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games with identical rewards): each agent can assume the others will choose actions that will lead to a Pareto-optimal equilibrium. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to alternative MARL algorithms, as well as successfully converging to a Pareto-optimal equilibrium in a range of matrix games.
@inproceedings{christianos2023pareto,
title={Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning},
author={Filippos Christianos and Georgios Papoudakis and Stefano V. Albrecht},
booktitle={AAMAS Workshop on Optimization and Learning in Multiagent Systems},
year={2023}
}
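The property Pareto-AC exploits can be seen in a single matrix game: in a no-conflict game, an agent may evaluate each of its actions assuming the other agents complete the joint action optimally. A sketch with illustrative climbing-game-style payoffs (not the paper's exact experiments):

```python
import numpy as np

# Shared payoff matrix for two agents (illustrative climbing-game numbers).
Q = np.array([[ 11, -30,   0],
              [-30,   7,   6],
              [  0,   0,   5]])

# Standard independent learner: evaluates its own action against the other
# agent's current (here uniform) policy -- risks the Pareto-dominated (2, 2).
expected = Q.mean(axis=1)

# Pareto-AC-style evaluation: assume the other agent completes the joint
# action optimally, which preserves the Pareto-optimal equilibrium (0, 0).
optimistic = Q.max(axis=1)

print("expected:  ", expected)    # action 0 looks bad under uniform play
print("optimistic:", optimistic)  # action 0 is correctly preferred
```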
Elliot Fosong, Arrasy Rahman, Ignacio Carlucho, Stefano V. Albrecht
Learning Complex Teamwork Tasks Using a Sub-task Curriculum
AAMAS Workshop on Multiagent Sequential Decision Making Under Uncertainty, 2023
Abstract | BibTeX | arXiv | Code
Tags: AAMAS, multi-agent-rl, ad-hoc-teamwork, transfer-learning
Abstract:
Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large policy space, and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach which uses an expert-provided curriculum of simpler multi-agent sub-tasks. In each sub-task of the curriculum, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We present MEDoE, a flexible method which identifies situations in the target task where each agent can use its sub-task-specific skills, and uses this information to modulate hyperparameters for learning and exploration during the fine-tuning process. We compare MEDoE to multi-agent reinforcement learning baselines that train from scratch in the full task, and to naïve applications of standard multi-agent reinforcement learning techniques for fine-tuning. We show that MEDoE outperforms baselines which train from scratch or use naïve fine-tuning approaches, requiring significantly fewer total training timesteps to solve a range of complex teamwork tasks.
@inproceedings{fosong2023learning,
title={Learning Complex Teamwork Tasks Using a Sub-task Curriculum},
author={Elliot Fosong and Arrasy Rahman and Ignacio Carlucho and Stefano V. Albrecht},
booktitle={AAMAS Workshop on Multiagent Sequential Decision Making under Uncertainty},
year={2023},
}
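A rough sketch of the modulation idea behind MEDoE: estimate how familiar the current situation is for each agent given its sub-task training, and scale learning and exploration hyperparameters accordingly. The familiarity score and scaling rule below are invented placeholders, not the paper's actual design.

```python
def domain_of_expertise(agent, state) -> float:
    """Return a value in [0, 1]: how familiar this state is given the agent's sub-task."""
    return 1.0 if state["zone"] == agent["trained_zone"] else 0.2

def modulated_hparams(agent, state, base_lr=3e-4, base_temp=1.0):
    familiarity = domain_of_expertise(agent, state)
    return {
        "lr": base_lr * (1 - familiarity),            # learn less where already expert
        "temperature": base_temp * (1 - familiarity), # explore less there too
    }

agent = {"name": "a1", "trained_zone": "defence"}
print(modulated_hparams(agent, {"zone": "defence"}))  # in-domain: tiny updates
print(modulated_hparams(agent, {"zone": "attack"}))   # out-of-domain: learn/explore more
```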
Adam Michalski, Filippos Christianos, Stefano V. Albrecht
SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning
AAMAS Workshop on Multiagent Sequential Decision Making Under Uncertainty, 2023
Abstract | BibTeX | arXiv | Code
Tags: AAMAS, deep-rl, multi-agent-rl
Abstract:
There is a lack of standard benchmarks for Multi-Agent Reinforcement Learning (MARL) algorithms. The StarCraft Multi-Agent Challenge (SMAC) has been widely used in MARL research, but is built on top of a heavy, closed-source computer game, StarCraft II. Thus, SMAC is computationally expensive and requires knowledge and the use of proprietary tools specific to the game for any meaningful alteration or contribution to the environment. We introduce SMAClite -- a challenge based on SMAC that is both decoupled from StarCraft II and open-source -- along with a framework which makes it possible to create new content for SMAClite without any special knowledge. We conduct experiments to show that SMAClite is equivalent to SMAC, by training MARL algorithms on SMAClite and reproducing SMAC results. We then show that SMAClite outperforms SMAC in both runtime speed and memory usage.
@inproceedings{michalski2023smaclite,
title={SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning},
author={Adam Michalski and Filippos Christianos and Stefano V. Albrecht},
booktitle={AAMAS Workshop on Multiagent Sequential Decision Making Under Uncertainty (MSDM)},
year={2023}
}
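Usage would look like any Gymnasium-style environment loop; the environment id and exact step signature below are assumptions for this sketch, so check the SMAClite repository for the real API.

```python
import gymnasium as gym

env = gym.make("smaclite/MMM-v0")        # hypothetical id, for illustration only
obs, info = env.reset(seed=0)
done = False
while not done:
    actions = env.action_space.sample()  # replace with trained MARL policies
    obs, reward, terminated, truncated, info = env.step(actions)
    done = terminated or truncated
env.close()
```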
Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni
Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning
AAMAS Workshop on Adaptive and Learning Agents, 2023
Abstract | BibTeX | arXiv
Tags: AAMAS, multi-agent-rl, deep-rl
Abstract:
Cooperative multi-agent reinforcement learning (MARL) requires agents to explore to learn to cooperate. Existing value-based MARL algorithms commonly rely on random exploration, such as ϵ-greedy, which is inefficient in discovering multi-agent cooperation. Additionally, the environment in MARL appears non-stationary to any individual agent due to the simultaneous training of other agents, leading to high-variance and thus unstable optimisation signals. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to extend any value-based MARL algorithm. EMAX trains ensembles of value functions for each agent to address the key challenges of exploration and non-stationarity: (1) The uncertainty of value estimates across the ensemble is used in a UCB policy to guide the exploration of agents to parts of the environment which require cooperation. (2) Average value estimates across the ensemble serve as target values. These targets exhibit lower variance compared to commonly applied target networks, and we show that they lead to more stable gradients during optimisation. We instantiate EMAX with three value-based MARL algorithms (independent DQN, VDN, and QMIX) and evaluate them in 21 tasks across four environments. Using ensembles of five value functions, EMAX improves sample efficiency and final evaluation returns of these algorithms by 53%, 36%, and 498%, respectively, averaged across all 21 tasks.
@inproceedings{schaefer2023emax,
title={Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning},
author={Lukas Schäfer and Oliver Slumbers and Stephen McAleer and Yali Du and Stefano V. Albrecht and David Mguni},
year={2023},
booktitle={AAMAS Workshop on Adaptive and Learning Agents (ALA)},
}
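The two uses EMAX makes of its ensemble can be shown in a few lines of PyTorch; the shapes, exploration coefficient, and reward values here are illustrative only.

```python
import torch

n_models, n_actions = 5, 6
q_ensemble = torch.randn(n_models, n_actions)  # Q-values from 5 ensemble members

# (1) UCB exploration: prefer actions with high mean AND high disagreement,
# steering the agent towards under-explored, cooperation-critical states.
mean = q_ensemble.mean(dim=0)
std = q_ensemble.std(dim=0)
c = 1.0                                        # exploration coefficient
action = torch.argmax(mean + c * std)

# (2) Learning target: the ensemble mean replaces a single target network,
# lowering the variance of the bootstrapped target.
reward, gamma = 1.0, 0.99
target = reward + gamma * mean.max()

print(int(action), float(target))
```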
Callum Tilbury, Filippos Christianos, Stefano V. Albrecht
Revisiting the Gumbel-Softmax in MADDPG
AAMAS Workshop on Adaptive and Learning Agents, 2023
Abstract | BibTeX | arXiv | Code
Tags: AAMAS, multi-agent-rl, deep-rl
Abstract:
MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.
@inproceedings{tilbury2023revisitingmaddpg,
title={Revisiting the Gumbel-Softmax in MADDPG},
author={Callum Tilbury and Filippos Christianos and Stefano V. Albrecht},
year={2023},
booktitle={AAMAS Workshop on Adaptive and Learning Agents (ALA)},
}
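For reference, the Gumbel-Softmax estimator that the paper revisits can be written out directly (PyTorch also ships it as torch.nn.functional.gumbel_softmax); this is the standard formulation, not one of the paper's proposed alternatives.

```python
import torch

def gumbel_softmax(logits, tau=1.0, hard=False):
    gumbels = -torch.log(-torch.log(torch.rand_like(logits)))  # Gumbel(0,1) noise
    y_soft = torch.softmax((logits + gumbels) / tau, dim=-1)   # relaxed sample
    if hard:
        # Straight-through: discrete one-hot forward pass, soft gradients backward.
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(logits).scatter_(-1, index, 1.0)
        return y_hard + (y_soft - y_soft.detach())
    return y_soft

logits = torch.tensor([[2.0, 0.5, -1.0]], requires_grad=True)
sample = gumbel_softmax(logits, tau=0.5, hard=True)
sample.sum().backward()       # gradients flow despite the discrete sample
print(sample, logits.grad)
```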
Alain Andres, Lukas Schäfer, Esther Villar-Rodriguez, Stefano V. Albrecht, Javier Del Ser
Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments
AAMAS Workshop on Adaptive and Learning Agents, 2023
Abstract | BibTeX | arXiv
Tags: AAMAS, deep-rl
Abstract:
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve sample efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered levels) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently during online RL training both consistently improve sample efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.
@inproceedings{andres2023using,
title={Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments},
author={Andres, Alain and Schäfer, Lukas and Villar-Rodriguez, Esther and Albrecht, Stefano V. and Del Ser, Javier},
booktitle={AAMAS Workshop on Adaptive and Learning Agents (ALA)},
year={2023}
}
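A schematic of the two settings studied, using a toy policy and stand-in offline data; the losses and weighting are simplified placeholders rather than the paper's exact objectives.

```python
import torch
import torch.nn.functional as F

policy = torch.nn.Linear(16, 4)            # toy policy: observation -> action logits
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
offline_obs = torch.randn(256, 16)         # stand-in for offline trajectories
offline_act = torch.randint(0, 4, (256,))

# Setting (1): pre-training via plain behaviour cloning on the offline data.
for _ in range(100):
    loss = F.cross_entropy(policy(offline_obs), offline_act)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Setting (2): concurrent training adds a weighted imitation term to the RL loss.
def concurrent_loss(rl_loss, bc_weight=0.5):
    bc_loss = F.cross_entropy(policy(offline_obs), offline_act)
    return rl_loss + bc_weight * bc_loss
```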
2022
Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht
Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration
International Conference on Autonomous Agents and Multiagent Systems, 2022
Abstract | BibTeX | arXiv | Code
Tags: AAMAS, deep-rl, intrinsic-reward
Abstract:
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies.
@inproceedings{schaefer2022derl,
title={Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration},
author={Lukas Schäfer and Filippos Christianos and Josiah P. Hanna and Stefano V. Albrecht},
booktitle={International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
year={2022}
}
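The decoupling can be caricatured with a two-armed bandit: one policy collects data under intrinsic plus extrinsic reward, while a second policy learns from the same transitions using extrinsic reward only. Everything below is a toy stand-in for DeRL's actual deep RL learners and environments.

```python
import random

class TabularPolicy:
    """Toy epsilon-greedy Q-table, standing in for a deep RL learner."""
    def __init__(self, n_actions=2, lr=0.1, eps=0.1):
        self.q = [0.0] * n_actions
        self.lr, self.eps = lr, eps
    def act(self):
        if random.random() < self.eps:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)
    def update(self, action, reward):
        self.q[action] += self.lr * (reward - self.q[action])

explore, exploit = TabularPolicy(), TabularPolicy()
for step in range(1000):
    a = explore.act()                 # only the exploration policy acts
    r_ext = 1.0 if a == 1 else 0.0    # toy bandit environment
    r_int = 0.5 / (1 + step)          # decaying intrinsic bonus
    explore.update(a, r_ext + r_int)  # shaped reward for the exploration policy
    exploit.update(a, r_ext)          # pure task reward for the exploitation policy
print(exploit.q)                      # exploitation values unpolluted by the bonus
```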
Lukas Schäfer
Task Generalisation in Multi-Agent Reinforcement Learning
International Conference on Autonomous Agents and Multiagent Systems, Doctoral Consortium, 2022
Abstract | BibTeX | Paper
Tags: AAMAS, multi-agent-rl
Abstract:
Multi-agent reinforcement learning agents are typically trained in a single environment. As a consequence, they overfit to the training environment, which results in sensitivity to perturbations and an inability to generalise to similar environments. For multi-agent reinforcement learning approaches to be applicable in real-world scenarios, generalisation and robustness need to be addressed. However, unlike in supervised learning, generalisation lacks a clear definition in multi-agent reinforcement learning. We discuss the problem of task generalisation and demonstrate the difficulty of zero-shot generalisation and fine-tuning using the example of multi-robot warehouse coordination, presenting preliminary results. Lastly, we discuss promising directions of research working towards generalisation in multi-agent reinforcement learning.
@inproceedings{schaefer2022task,
title={Task Generalisation in Multi-Agent Reinforcement Learning},
author={Lukas Schäfer},
booktitle={Doctoral Consortium at the International Conference on Autonomous Agents and Multiagent Systems},
year={2022}
}
Filippos Christianos
Collaborative Training of Multiple Autonomous Agents
International Conference on Autonomous Agents and Multiagent Systems, Doctoral Consortium, 2022
Abstract | BibTeX | Paper
Tags: AAMAS, multi-agent-rl
Abstract:
Exploration in multi-agent reinforcement learning is a challenging problem, especially with a large number of agents. Parameter sharing between agents is often used since it significantly decreases the number of trainable parameters, shortening training times to tractable levels and improving exploration efficiency. We present two algorithms that aim to be a middle ground between not sharing parameters and fully sharing parameters, retaining the advantages of the baselines at the two ends of the spectrum while minimising their drawbacks. First, Shared Experience Actor-Critic [Christianos et al., 2020] applies the basic idea of off-policy correction via importance weighting to combine the experiences generated by different agents into more informative and effective learning gradients. Then, Selective Parameter Sharing [Christianos et al., 2021], based on a rigorous empirical analysis of the impact of parameter sharing, proposes a novel parameter sharing method that can be coupled with existing multi-agent reinforcement learning algorithms.
@inproceedings{christianos2022collaborative,
title={Collaborative Training of Multiple Autonomous Agents},
author={Filippos Christianos},
booktitle={Doctoral Consortium at the International Conference on Autonomous Agents and Multiagent Systems},
year={2022}
}
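The shared-experience term of SEAC summarised above has a compact form: agent i also learns from agent j's transitions, importance-weighted by the ratio of its own action probabilities to those of the behaviour policy that generated them. A sketch with dummy numbers (the agent's own on-policy term is omitted):

```python
import torch

def seac_loss(log_pi_i, log_mu_j, advantage_j, lam=1.0):
    """Policy-gradient term for agent i computed on agent j's transitions."""
    ratio = torch.exp(log_pi_i - log_mu_j)   # importance weight pi_i / mu_j
    return -lam * (ratio.detach() * log_pi_i * advantage_j).mean()

# Dummy numbers: agent i's log-probs of the actions agent j actually took,
# the log-probs under j's own (behaviour) policy, and j's advantages.
log_pi_i = torch.tensor([0.6, 0.3]).log()
log_mu_j = torch.tensor([0.5, 0.5]).log()
adv_j = torch.tensor([1.2, -0.4])
print(seac_loss(log_pi_i, log_mu_j, adv_j))
```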
2017
Stefano V. Albrecht, Peter Stone
Reasoning about Hypothetical Agent Behaviours and their Parameters
International Conference on Autonomous Agents and Multiagent Systems, 2017
Abstract | BibTeX | arXiv
Tags: AAMAS, ad-hoc-teamwork, agent-modelling
Abstract:
Agents can achieve effective interaction with previously unknown other agents by maintaining beliefs over a set of hypothetical behaviours, or types, that these agents may have. A current limitation of this method is that it does not recognise parameters within type specifications, because types are viewed as black-box mappings from interaction histories to probability distributions over actions. In this work, we propose a general method which allows an agent to reason about both the relative likelihood of types and the values of any bounded continuous parameters within types. The method maintains individual parameter estimates for each type and selectively updates the estimates for some types after each observation. We propose different methods for the selection of types and the estimation of parameter values. The proposed methods are evaluated in detailed experiments, showing that updating the parameter estimates of a single type after each observation can be sufficient to achieve good performance.
@inproceedings{albrecht2017reasoning,
title = {Reasoning about Hypothetical Agent Behaviours and their Parameters},
author = {Stefano V. Albrecht and Peter Stone},
booktitle = {Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems},
pages = {547--555},
year = {2017}
}
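The belief update underlying this line of work is a one-line Bayes rule over the hypothesised types; the paper's contribution additionally estimates continuous parameters inside each type, which this sketch (with invented likelihoods) omits.

```python
def update_beliefs(beliefs, likelihoods):
    """beliefs, likelihoods: dicts mapping type name -> probability."""
    posterior = {t: beliefs[t] * likelihoods[t] for t in beliefs}
    z = sum(posterior.values())
    return {t: p / z for t, p in posterior.items()}

beliefs = {"aggressive": 1/3, "defensive": 1/3, "random": 1/3}
# Probability each hypothesised type assigns to the action just observed:
obs_likelihood = {"aggressive": 0.7, "defensive": 0.1, "random": 0.25}
beliefs = update_beliefs(beliefs, obs_likelihood)
print(beliefs)   # belief shifts sharply towards the "aggressive" type
```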
2013
Stefano V. Albrecht, Subramanian Ramamoorthy
A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems
International Conference on Autonomous Agents and Multiagent Systems, 2013
Abstract | BibTeX | arXiv (full technical report) | Extended Abstract
Tags: AAMAS, ad-hoc-teamwork, agent-modelling
Abstract:
The ad hoc coordination problem is to design an autonomous agent which is able to achieve optimal flexibility and efficiency in a multiagent system with no mechanisms for prior coordination. We conceptualise this problem formally using a game-theoretic model, called the stochastic Bayesian game, in which the behaviour of a player is determined by its private information, or type. Based on this model, we derive a solution, called Harsanyi-Bellman Ad Hoc Coordination (HBA), which utilises the concept of Bayesian Nash equilibrium in a planning procedure to find optimal actions in the sense of Bellman optimal control. We evaluate HBA in a multiagent logistics domain called level-based foraging, showing that it achieves higher flexibility and efficiency than several alternative algorithms. We also report on a human-machine experiment at a public science exhibition in which the human participants played repeated Prisoner's Dilemma and Rock-Paper-Scissors against HBA and alternative algorithms, showing that HBA achieves equal efficiency and a significantly higher welfare and winning rate.
@inproceedings{albrecht2013game,
title = {A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems},
address = {St. Paul, Minnesota, USA},
month = {May},
year = {2013}
}
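HBA's action selection can be miniaturised as an expectation over the type posterior; the beliefs and return estimates below are invented, and in HBA the returns come from a Bellman-style planning procedure over the stochastic Bayesian game rather than a lookup table.

```python
beliefs = {"cooperator": 0.8, "defector": 0.2}
# Estimated return of each of our actions against each hypothesised type:
returns = {
    "cooperate": {"cooperator": 3.0, "defector": 0.0},
    "defect":    {"cooperator": 5.0, "defector": 1.0},
}

def hba_action(beliefs, returns):
    """Pick the action with the highest expected return under the type posterior."""
    value = {a: sum(beliefs[t] * r[t] for t in beliefs) for a, r in returns.items()}
    return max(value, key=value.get), value

print(hba_action(beliefs, returns))
```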
2012
Stefano V. Albrecht, Subramanian Ramamoorthy
Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems
International Conference on Autonomous Agents and Multiagent Systems, 2012
Abstract | BibTeX | arXiv
Tags: AAMAS, multi-agent-rl, ad-hoc-teamwork
Abstract:
This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogeneous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often focused on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
@inproceedings{albrecht2012comparative,
title = {Comparative Evaluation of {MAL} Algorithms in a Diverse Set of Ad Hoc Team Problems},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems},
pages = {349--356},
year = {2012}
}
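The evaluation criteria named in this abstract are easy to state in code. Below, a toy repeated Prisoner's Dilemma with random players illustrates how social welfare, fairness, and attainment of equilibrium might be measured; the payoffs and players are placeholders, not the study's actual setup.

```python
import random

# Prisoner's-Dilemma-style payoffs: payoff[a1][a2] -> (r1, r2); 0=cooperate, 1=defect.
payoff = {0: {0: (3, 3), 1: (0, 5)}, 1: {0: (5, 0), 1: (1, 1)}}

def evaluate(rounds=1000):
    history = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(rounds)]
    rewards = [payoff[a1][a2] for a1, a2 in history]
    totals = [sum(r) for r in zip(*rewards)]              # per-agent cumulative returns
    welfare = sum(totals) / rounds                        # social welfare per round
    fairness = min(totals) / max(totals)                  # ratio of worst to best return
    nash_rate = sum(a == (1, 1) for a in history) / rounds  # (D, D) is the one-shot NE
    return welfare, fairness, nash_rate

print(evaluate())
```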