CS790 Project

This project was submitted in August 2003 as my Master of Engineering project in
Computer Science at Cornell University, advised by Bart Selman.  It is the culmination
of my Master's studies at Cornell, bringing together aspects of the fields of Artificial
Intelligence and Machine Learning.

[Figure: Sample Prison Environment]


Overview

The Prison Guard Training Game is a simulation of a simple prison environment.  Two
prison guards are trained (through Reinforcement Learning techniques) to prevent a
prisoner's escape by navigating through a maze of hallways and catching the prisoner.
This simple learning environment was designed to encourage the prison guards to cooperate
with each other in their pursuit of a common goal.  Temporal difference Q-learning
was used to train each guard agent.  For each, a Neural Network was trained via
backpropagation to approximate the expected return of each state-action pair.
A variety of experiments were performed, each of which involved applying the learning
system to pairs of agents on a given prison environment.  The Prison Guard Training
Game turned out to be quite successful.  On a variety of different prison environments,
two guard agents learned to behave intelligently, developing a cooperative strategy
that allowed for near flawless performance against the prisoner adversary.
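
The temporal difference Q-learning step described above can be sketched roughly as
follows.  This is a minimal illustration, not the project's actual code: the class name,
the feature encoding of a state, and the hyperparameters are all hypothetical, and a
simple linear approximator stands in for the backpropagation-trained Neural Network so
that the example stays short.  Each update moves the estimate Q(s, a) toward the target
r + gamma * max Q(s', a').

```java
import java.util.Random;

/**
 * Sketch of one-step TD Q-learning with a function approximator.
 * ASSUMPTION: a linear approximator (one weight vector per action) replaces
 * the neural network used in the project; all names here are illustrative.
 */
class QLearningSketch {
    static final int NUM_ACTIONS = 4;   // hypothetical: e.g. four movement directions

    final double[][] weights;           // weights[action][feature]
    final double alpha;                 // learning rate
    final double gamma;                 // discount factor

    QLearningSketch(int numFeatures, double alpha, double gamma) {
        this.weights = new double[NUM_ACTIONS][numFeatures];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** Approximate expected return of taking `action` in `state`. */
    double q(double[] state, int action) {
        double sum = 0.0;
        for (int i = 0; i < state.length; i++) {
            sum += weights[action][i] * state[i];
        }
        return sum;
    }

    /** Largest approximate return available from `state`. */
    double maxQ(double[] state) {
        double best = q(state, 0);
        for (int a = 1; a < NUM_ACTIONS; a++) {
            best = Math.max(best, q(state, a));
        }
        return best;
    }

    /** Epsilon-greedy selection: explore with probability epsilon, else act greedily. */
    int chooseAction(double[] state, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(NUM_ACTIONS);
        }
        int best = 0;
        for (int a = 1; a < NUM_ACTIONS; a++) {
            if (q(state, a) > q(state, best)) {
                best = a;
            }
        }
        return best;
    }

    /**
     * One TD update for the transition (s, a, reward, sNext).
     * Returns the TD error: target minus the old estimate.
     */
    double update(double[] s, int a, double reward, double[] sNext, boolean terminal) {
        double target = reward + (terminal ? 0.0 : gamma * maxQ(sNext));
        double tdError = target - q(s, a);
        // Gradient step; for a linear approximator the gradient is the feature vector.
        for (int i = 0; i < s.length; i++) {
            weights[a][i] += alpha * tdError * s[i];
        }
        return tdError;
    }

    public static void main(String[] args) {
        QLearningSketch guard = new QLearningSketch(2, 0.5, 0.9);
        double[] s = {1.0, 0.0};
        double[] sNext = {0.0, 1.0};
        // A terminal transition with reward 1.0, e.g. the prisoner is caught.
        double tdError = guard.update(s, 0, 1.0, sNext, true);
        System.out.println("TD error: " + tdError + ", new Q(s, 0): " + guard.q(s, 0));
    }
}
```

In a two-guard setting, each guard would hold its own instance of such a learner and
update it from its own experience; with a Neural Network in place of the linear weights,
the TD error would instead be backpropagated through the network.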

Additionally, a communications channel was implemented in an attempt to get the agents
to talk to one another, further promoting cooperation.  No structure was imposed on the
information to be communicated back and forth, however.  In order to communicate
successfully, the agents had to learn what information to send and how to interpret the
information received.  The agents were, in a sense, required to develop their own
language.  In one simple situation, they learned to make very good use of the channel.
[See Demo]

View my project proposal.

To facilitate the experiments performed, a graphical user interface (GUI) was created.
The Prison Guard Training Game GUI allows for easy creation of prison environments,
training of prison guard agents, and viewing and playing of games.  This was implemented
in Java with AWT and Swing graphics primitives.  The complete source along with
compilation and execution instructions can be downloaded here.  It requires that Sun's
JDK version 1.4.0 or higher be installed.  The latest version of Java can be downloaded
here (follow links to "SDK").


Details


Contact

Stefan Witwicki
Cornell University

Please email me with your questions or comments: sjw28@cornell.edu.