Associative Metric for Learning in Factored MDPs Based on Classical Conditioning


Pedro Sequeira, Francisco S. Melo and Ana Paiva
GAIPS technical report series, GAIPS-TR-002-12, June 2012


Abstract
Classical conditioning is a behaviorist paradigm that allows organisms to acquire predictive associations between stimuli in the environment whenever those stimuli frequently co-occur. In this paper we propose a novel associative metric based on the classical conditioning paradigm that, much like what happens in nature, identifies associations between stimuli perceived by a learning agent while it interacts with the environment. We use an associative tree structure to identify associations between the perceived stimuli and to measure the degree of similarity between states in reinforcement learning (RL) scenarios. Our approach provides a state-space metric that requires no prior knowledge of the structure of the underlying decision problem and that is learned online, i.e., while the agent is learning the RL task itself. We combine our metric with Q-learning, generalizing the experience of the agent and improving the overall learning performance. We illustrate the application of our method in several problems of varying complexity and show that our metric leads to a performance comparable to that obtained with other well-studied metrics that, unlike ours, require full knowledge of the decision problem. The paper concludes by analyzing the impact of our metric in typified conditioning experiments, showing that combining our associative metric with standard TD(0) learning replicates common phenomena described in the classical conditioning literature.

sequeira2012techrep1.pdf
