Charakorn, Rujikorn, Manoonpong, Poramate and Dilokthanakul, Nat (2023) Generating Diverse Cooperative Agents by Learning Incompatible Policies. In: International Conference on Learning Representations (ICLR).
Official URL: https://openreview.net/forum?id=UkU05GOH7_6
Training a robust cooperative agent requires diverse partner agents, but obtaining such agents is difficult. Previous work aims to learn diverse behaviors by changing the state-action distribution of agents. However, without information about the task's goal, the diversified agents are not guided toward other important, albeit sub-optimal, solutions; they might learn only variations of the same solution. In this work, we propose to learn diverse behaviors via policy compatibility. Conceptually, policy compatibility measures whether policies of interest can coordinate effectively. We theoretically show that incompatible policies are not similar. Thus, policy compatibility, which has previously been used exclusively as a measure of robustness, can serve as a proxy for learning diverse behaviors. We then incorporate the proposed objective into a population-based training scheme to allow concurrent training of multiple agents. Additionally, we use state-action information to induce local variations of each policy. Empirically, the proposed method consistently discovers more solutions than baseline methods across various multi-goal cooperative environments. Finally, in multi-recipe Overcooked, we show that our method produces populations of behaviorally diverse agents, which enables generalist agents trained with such populations to be more robust.
Item Type: Conference or Workshop Item (Paper)
Subjects: Computer Science > Artificial Intelligence; Computer Science > Machine Learning; Computer Science > Robotics
Deposited by: Nat Dilokthanakul
Date Deposited: 2024-03-07 14:09:47
Last Modified: 2024-08-27 23:18:02