A new convergent variant of Q-learning with linear function approximation


Abstract In this work, we identify a novel set of conditions that ensure convergence with probability 1 of Q-learning with linear function approximation, by proposing a two time-scale variation thereof. On the faster time scale, the algorithm features an update similar to that of DQN, where the impact of bootstrapping is attenuated by using a Q-value estimate akin to that of the target network in DQN. The slower time scale, in turn, can be seen as a modified target network update. We establish the convergence of our algorithm, provide an error bound, and discuss our results in light of existing convergence results on reinforcement learning with function approximation. Finally, we illustrate the convergent behavior of our method in domains where standard Q-learning has previously been shown to diverge.
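The two time-scale structure described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact algorithm: all names, the toy MDP, the feature map, and the step sizes are assumptions. The fast iterate `w` performs a TD update whose bootstrap target is computed from the slow iterate `w_bar` (playing the role of DQN's target network), and `w_bar` tracks `w` with a much smaller step size.

```python
import numpy as np

# Hedged sketch of two time-scale Q-learning with linear function
# approximation. Fast iterate w: TD update bootstrapping from the slow
# iterate w_bar. Slow iterate w_bar: soft target-network-style update.

rng = np.random.default_rng(0)

n_states, n_actions, d = 4, 2, 3
gamma = 0.9
alpha, beta = 0.05, 0.01          # fast / slow step sizes (beta << alpha)

# Fixed random features phi(s, a) in R^d (illustrative only).
phi = rng.normal(size=(n_states, n_actions, d))

# A small random deterministic MDP: transition table and rewards.
P = rng.integers(0, n_states, size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

w = np.zeros(d)      # fast iterate (online Q-value weights)
w_bar = np.zeros(d)  # slow iterate (target-network-like weights)

s = 0
for t in range(2000):
    a = int(rng.integers(n_actions))         # uniform exploration
    s_next, r = int(P[s, a]), R[s, a]
    # Fast time scale: the TD target bootstraps from w_bar, not w,
    # attenuating the impact of bootstrapping.
    target = r + gamma * np.max(phi[s_next] @ w_bar)
    w += alpha * (target - phi[s, a] @ w) * phi[s, a]
    # Slow time scale: w_bar tracks w (a modified target update).
    w_bar += beta * (w - w_bar)
    s = s_next
```

The key design point is the separation of step sizes: because `beta` is much smaller than `alpha`, the bootstrap target changes quasi-statically from the perspective of the fast update, which is what the convergence analysis exploits.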
Year 2020
Keywords Reinforcement Learning
Authors Diogo S. Carvalho, Francisco S. Melo, Pedro A. Santos
Journal Advances in Neural Information Processing Systems
Volume 33
BibTeX

@article{santos20,
  author   = {Diogo S. Carvalho and Francisco S. Melo and Pedro A. Santos},
  title    = {A new convergent variant of Q-learning with linear function approximation},
  journal  = {Advances in Neural Information Processing Systems},
  volume   = {33},
  year     = {2020},
  keywords = {Reinforcement Learning},
  abstract = {In this work, we identify a novel set of conditions that ensure convergence with probability 1 of Q-learning with linear function approximation, by proposing a two time-scale variation thereof. On the faster time scale, the algorithm features an update similar to that of DQN, where the impact of bootstrapping is attenuated by using a Q-value estimate akin to that of the target network in DQN. The slower time scale, in turn, can be seen as a modified target network update. We establish the convergence of our algorithm, provide an error bound, and discuss our results in light of existing convergence results on reinforcement learning with function approximation. Finally, we illustrate the convergent behavior of our method in domains where standard Q-learning has previously been shown to diverge.}
}
