Publications
For news about publications, follow us on X.
All topic tags:
survey, deep-rl, multi-agent-rl, agent-modelling, ad-hoc-teamwork, autonomous-driving, goal-recognition, explainable-ai, causal, generalisation, security, emergent-communication, iterated-learning, intrinsic-reward, simulator, state-estimation, deep-learning, transfer-learning
2025
Xu Liu, Haobo Fu, Stefano V. Albrecht, Qiang Fu, Shuai Li
Online-to-Offline RL for Agent Alignment
International Conference on Learning Representations, 2025
Abstract | BibTeX | Paper | Code
ICLR, deep-learning
Abstract:
Reinforcement learning (RL) has shown remarkable success in training agents to learn high-performing policies, particularly in domains like Game AI where simulation environments enable efficient interaction. However, despite their success in maximizing returns, such online-trained policies often fail to align with human preferences concerning actions, styles, and values. The challenge lies in efficiently adapting these online-trained policies to align with human preferences, given the scarcity and high cost of collecting human behavior data. In this work, we formalize the problem as online-to-offline RL and propose ALIGNment of Game AI to Preferences (ALIGN-GAP), an approach for aligning well-trained game agents with human preferences. Our method features a carefully designed reward model that encodes human preferences from limited offline data, and incorporates curriculum-based preference learning to align RL agents with targeted human preferences. Experiments across diverse environments and preference types demonstrate that ALIGN-GAP achieves effective alignment with human preferences.
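The abstract describes the method only at a high level. As a minimal illustrative sketch (assuming a Bradley-Terry-style pairwise preference model, which is standard in preference-based RL but not confirmed by the paper; the class and function names here are hypothetical), the reward-model component that encodes human preferences from offline preference labels might look like the following PyTorch code:

import torch
import torch.nn as nn

class PreferenceRewardModel(nn.Module):
    """Scores (observation, action) pairs with a scalar reward."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # (batch, horizon, obs_dim) and (batch, horizon, act_dim) -> (batch, horizon)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(model, seg_a, seg_b, prefer_a):
    """Bradley-Terry loss on pairs of trajectory segments.

    seg_a, seg_b: (obs, act) tensor tuples for two trajectory segments;
    prefer_a: float tensor in {0, 1}, 1 when humans preferred segment A.
    """
    r_a = model(*seg_a).sum(dim=-1)  # total predicted reward of segment A
    r_b = model(*seg_b).sum(dim=-1)  # total predicted reward of segment B
    # P(A preferred) = sigmoid(r_a - r_b) under the Bradley-Terry model.
    return nn.functional.binary_cross_entropy_with_logits(r_a - r_b, prefer_a)

A reward model of this kind could then be used to fine-tune the online-trained policy on the limited offline data, with a curriculum ordering preference pairs from easy to hard; how ALIGN-GAP actually implements these steps is detailed in the paper.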
@inproceedings{liu2025aligngap,
  title={Online-to-Offline RL for Agent Alignment},
  author={Xu Liu and Haobo Fu and Stefano V. Albrecht and Qiang Fu and Shuai Li},
  booktitle={13th International Conference on Learning Representations},
  year={2025}
}