Abstract | In this paper we investigate two hypotheses regarding the use of deep reinforcement learning in multiple tasks. The first hypothesis is driven by the question of whether a deep reinforcement learning algorithm, trained on two similar tasks, is able to outperform two single-task, individually trained algorithms by more efficiently learning a new, similar task that none of the three algorithms has encountered before. The second hypothesis is driven by the question of whether the same multi-task deep RL algorithm, trained on two similar tasks and augmented with elastic weight consolidation (EWC), is able to retain performance on the new task similar to that of an algorithm without EWC, whilst being able to overcome catastrophic forgetting in the two previous tasks. We show that a multi-task Asynchronous Advantage Actor-Critic on GPU (Hybrid GA3C) algorithm, trained on Space Invaders and Demon Attack, is in fact able to outperform two single-task GA3C versions, each trained individually on one task, when evaluated on a new, third task, namely Phoenix. We also show that, when training two such multi-task GA3C algorithms on the third task, the one augmented with EWC is not only able to achieve similar performance on the new task, but is also capable of overcoming a substantial amount of catastrophic forgetting on the two previous tasks. | |
Year | 2019 | |
Keywords | Reinforcement Learning; Intelligent Virtual Agents; Computer Games | |
Authors | João G. Ribeiro, Francisco S. Melo, João Dias | |
Booktitle | 5th Global Conference on Artificial Intelligence | |
Volume | 65 | |
Pages | 163-175 | |
Month | December | |
Pdf File | | |
BibTeX |
@inproceedings{ribeiro19,
  abstract  = {In this paper we investigate two hypotheses regarding the use of deep reinforcement learning in multiple tasks. The first hypothesis is driven by the question of whether a deep reinforcement learning algorithm, trained on two similar tasks, is able to outperform two single-task, individually trained algorithms by more efficiently learning a new, similar task that none of the three algorithms has encountered before. The second hypothesis is driven by the question of whether the same multi-task deep RL algorithm, trained on two similar tasks and augmented with elastic weight consolidation (EWC), is able to retain performance on the new task similar to that of an algorithm without EWC, whilst being able to overcome catastrophic forgetting in the two previous tasks. We show that a multi-task Asynchronous Advantage Actor-Critic on GPU (Hybrid GA3C) algorithm, trained on Space Invaders and Demon Attack, is in fact able to outperform two single-task GA3C versions, each trained individually on one task, when evaluated on a new, third task, namely Phoenix. We also show that, when training two such multi-task GA3C algorithms on the third task, the one augmented with EWC is not only able to achieve similar performance on the new task, but is also capable of overcoming a substantial amount of catastrophic forgetting on the two previous tasks.},
  booktitle = {5th Global Conference on Artificial Intelligence},
  keywords  = {Reinforcement Learning; Intelligent Virtual Agents; Computer Games},
  month     = {December},
  pages     = {163-175},
  title     = {Multi-task Learning and Catastrophic Forgetting in Continual Reinforcement Learning},
  volume    = {65},
  year      = {2019},
  author    = {João G. Ribeiro and Francisco S. Melo and João Dias}
}
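Note: The abstract's second hypothesis relies on elastic weight consolidation. For readers unfamiliar with it, below is a minimal sketch of the penalized objective EWC adds when moving from an old task A to a new task B, assuming the standard formulation of Kirkpatrick et al. (2017); the notation is illustrative and not taken from the paper itself.

\[
\mathcal{L}(\theta) = \mathcal{L}_B(\theta) + \sum_i \frac{\lambda}{2}\, F_i \left(\theta_i - \theta^{*}_{A,i}\right)^2
\]

Here \(\mathcal{L}_B\) is the loss on the new task, \(\theta^{*}_{A}\) are the parameters learned for the old task, \(F_i\) is the \(i\)-th diagonal entry of the Fisher information matrix estimated at \(\theta^{*}_{A}\), and \(\lambda\) controls how strongly old-task parameters are anchored. Intuitively, parameters that were important for task A (large \(F_i\)) are pulled back toward their old values, which is how EWC mitigates the catastrophic forgetting discussed in the abstract.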