Francisco S. Melo and Alberto Sardinha
Autonomous Agents and Multi-Agent Systems, 30(2):175–219, 2016. DOI: 10.1007/s10458-015-9280-x
This paper addresses the problem of ad hoc teamwork, where a learning agent engages in a cooperative task with other (unknown) agents. The agent must coordinate effectively with the other agents to complete the intended task without relying on any predefined coordination strategy. We contribute a new perspective on the ad hoc teamwork problem and propose that, in general, the learning agent should not only identify (and coordinate with) the teammates' strategy but also identify the task to be completed. In our approach, we represent tasks as fully cooperative matrix games. Relying exclusively on observations of the teammates' behavior, the learning agent must identify the task at hand (namely, the corresponding payoff function) from a set of possible tasks and adapt to the teammates' behavior. Teammates are assumed to follow a bounded-rationality best-response model and thus also adapt their behavior to that of the learning agent. We formalize the ad hoc teamwork problem as a sequential decision problem and propose two novel approaches to address it. In particular, we propose (i) an online learning approach that weighs the different candidate tasks according to how well each predicts the teammates' behavior; and (ii) a decision-theoretic approach that models the ad hoc teamwork problem as a partially observable Markov decision process (POMDP). We provide theoretical bounds on the performance of both approaches and evaluate them in several domains of varying complexity.
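The online-learning idea in (i), weighting candidate tasks by how well they predict the teammate, can be illustrated with a minimal sketch. It rests on assumptions that are illustrative only and not taken from the paper: the teammate's bounded-rational best response is modeled as a Boltzmann (softmax) distribution over its expected payoff against the learner's empirical action frequencies, and each candidate task's weight is multiplied by the probability it assigned to the observed teammate action (an exponential-weights update under log-loss). All function names and the parameters `eta` (learning rate) and `beta` (rationality) are hypothetical.

```python
import math
import random

def softmax(values, beta):
    """Boltzmann distribution over actions; beta is a hypothetical rationality parameter."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def teammate_policy(payoff, agent_freq, beta):
    """Assumed bounded-rational best response: softmax over the teammate's expected
    payoff against the learner's empirical action frequencies.
    payoff[a][b] is the shared payoff when the learner plays a and the teammate plays b."""
    n_b = len(payoff[0])
    expected = [sum(agent_freq[a] * payoff[a][b] for a in range(len(payoff)))
                for b in range(n_b)]
    return softmax(expected, beta)

def update_weights(weights, tasks, agent_freq, observed_b, eta, beta):
    """Exponential-weights update with log-loss: w_k <- w_k * exp(-eta * -log p_k(b)),
    i.e. w_k * p_k(b) ** eta, followed by normalization."""
    new = [w * teammate_policy(payoff, agent_freq, beta)[observed_b] ** eta
           for w, payoff in zip(weights, tasks)]
    z = sum(new)
    return [w / z for w in new]

def choose_action(weights, tasks, agent_freq, beta):
    """Pick the learner action with the highest expected payoff, averaged over tasks
    (by weight) and over each task's predicted teammate response."""
    n_a = len(tasks[0])
    scores = [0.0] * n_a
    for w, payoff in zip(weights, tasks):
        pb = teammate_policy(payoff, agent_freq, beta)
        for a in range(n_a):
            scores[a] += w * sum(pb[b] * payoff[a][b] for b in range(len(pb)))
    return max(range(n_a), key=lambda a: scores[a])

def simulate(tasks, true_task, rounds=50, eta=1.0, beta=5.0, seed=0):
    """Play repeated rounds against a simulated teammate that follows the assumed
    response model for the true (hidden) task; return the final task weights."""
    rng = random.Random(seed)
    n_a = len(tasks[0])
    weights = [1.0 / len(tasks)] * len(tasks)
    counts = [1.0] * n_a  # pseudo-counts so the empirical frequencies start uniform
    for _ in range(rounds):
        total = sum(counts)
        freq = [c / total for c in counts]
        a = choose_action(weights, tasks, freq, beta)
        probs = teammate_policy(tasks[true_task], freq, beta)
        b = rng.choices(range(len(probs)), weights=probs)[0]
        weights = update_weights(weights, tasks, freq, b, eta, beta)
        counts[a] += 1
    return weights

if __name__ == "__main__":
    # Two hypothetical 2x2 coordination tasks: coordinate on action 0, or on action 1.
    tasks = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
    print(simulate(tasks, true_task=1))
```

Tasks under which the observed teammate behavior is likely gain weight, and the learner best-responds to the weighted mixture of tasks. Because the teammate model also reacts to the learner's action frequencies, the two adapt to each other over time, which is the coupling the abstract describes.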