
Autonomy Incubator Seminar Series:
OPTION AND CONSTRAINT GENERATION USING WORK DOMAIN ANALYSIS: IMPLEMENTATION FOR REINFORCEMENT LEARNING ALGORITHM

Dr. Karen Feigh, Associate Professor, School of Aerospace Engineering, Georgia Tech
November 7, 2014, 10:00 am, NASA LaRC Reid Center (Bldg 2102)
Hosts: Danette Allen (NASA) and Fred Brooks (NIA)

Abstract
The goal of interactive machine learning is to develop methods that exploit experienced humans' knowledge to teach machine learning agents in a manner people find natural and intuitive, and to design machine learning algorithms that take better advantage of a human teacher's guidance. What is needed is a systematic way to mine a human's knowledge about a domain and to translate it into a set of options and constraints for use by a machine learning agent. The conventional approaches most often associated with interactive machine learning are learning from demonstration, in which a human teacher demonstrates the desired behavior, and learning from critique, in which a human teacher observes the agent and gives feedback on its behavior. Both approaches are used to inform an underlying reinforcement learning scheme.
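To make the learning-from-critique channel concrete, the sketch below shows one common way such feedback can inform an underlying reinforcement learner: the human's critique is folded into the reward used by a tabular Q-learning update. This is an illustrative simplification, not the speaker's method; all names and signatures here are hypothetical.

```python
# Hypothetical sketch: human critique shaping the reward in tabular
# Q-learning. The environment reward and the human feedback signal are
# simply summed before the standard temporal-difference update.

def q_update_with_critique(q, state, action, env_reward, human_feedback,
                           next_state, next_actions, alpha=0.5, gamma=0.9):
    """One Q-learning step where a human critique signal shapes the reward."""
    reward = env_reward + human_feedback          # critique acts as extra reward
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] = (1 - alpha) * q.get((state, action), 0.0) + alpha * target

q = {}
# The human approves of the chosen action (+1) even though the environment
# gave no reward; the critique alone raises the action's estimated value.
q_update_with_critique(q, "s0", "left", env_reward=0.0, human_feedback=1.0,
                       next_state="s1", next_actions=["left", "right"])
print(q[("s0", "left")])  # 0.5
```

Learning from demonstration would instead initialize or bias the policy from recorded human trajectories; the point of the talk is that WDA offers a third channel distinct from both.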

The talk will describe recent work on the use of Work Domain Analysis (WDA), a technique from the field of cognitive engineering, to inform the creation of options and constraints for Reinforcement Learning (RL) algorithms, an approach that differs from both conventional interactive machine learning methods. The micro-world of Pac-Man, a classic arcade game, is used as a tractable and representative work domain. WDA was conducted with individuals familiar with Pac-Man, and an abstraction hierarchy, a means-ends representation of each individual's understanding of the game, was created for each. The abstraction hierarchies of the best- and worst-performing individuals were then combined to illustrate the differences between the two groups. Several differences were found: high performers used defensive as well as offensive strategies, whereas poor performers relied on defense alone, and high performers exhibited greater context sensitivity, additional goals, and more sophisticated constraints. These differences were translated into an options-and-constraints paradigm suitable for incorporation into RL algorithms, and the two resulting sets of options and constraints were given to RL algorithms for policy creation. The results indicate that the relative performance of the learned policies corresponds to the relative performance of the individuals from whom they were derived.
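One way to picture the options-and-constraints paradigm is as a filter that prunes the agent's primitive action set before each RL step. The sketch below shows a minimal version of this idea in a Pac-Man-like setting, assuming a single hypothetical WDA-derived constraint ("never step adjacent to a ghost"); the state representation, constraint, and function names are all illustrative, not taken from the talk.

```python
import random

# Hypothetical sketch of the options-and-constraints paradigm: constraints
# mined from a Work Domain Analysis prune the action set, and a standard
# tabular Q-learning update then operates only over the allowed actions.

ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def ghost_adjacent(state, action):
    """Assumed constraint: forbid moving onto a square adjacent to a ghost.
    Returns True when the action would violate the constraint."""
    px, py = state["pacman"]
    dx, dy = MOVES[action]
    nx, ny = px + dx, py + dy
    gx, gy = state["ghost"]
    return abs(nx - gx) + abs(ny - gy) <= 1

def allowed_actions(state, constraints):
    """Filter the primitive action set through all WDA-derived constraints."""
    ok = [a for a in ACTIONS if not any(c(state, a) for c in constraints)]
    return ok or ACTIONS  # fall back to the full set if everything is pruned

def epsilon_greedy(q, state_key, state, constraints, epsilon=0.1):
    """Select among allowed actions only, epsilon-greedily."""
    acts = allowed_actions(state, constraints)
    if random.random() < epsilon:
        return random.choice(acts)
    return max(acts, key=lambda a: q.get((state_key, a), 0.0))

def q_update(q, state_key, action, reward, next_key, next_state,
             constraints, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step, maximizing only over allowed actions."""
    nxt = allowed_actions(next_state, constraints)
    target = reward + gamma * max(q.get((next_key, a), 0.0) for a in nxt)
    q[(state_key, action)] = ((1 - alpha) * q.get((state_key, action), 0.0)
                              + alpha * target)

state = {"pacman": (2, 2), "ghost": (2, 0)}
print(allowed_actions(state, [ghost_adjacent]))  # ['down', 'left', 'right']
```

A richer constraint set derived from high performers' abstraction hierarchies would prune the search space differently from one derived from poor performers, which is one mechanism by which the resulting policies could inherit the relative performance of the individuals they were mined from.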

Bio
Karen Feigh is an Associate Professor in Georgia Tech's School of Aerospace Engineering. As a faculty member of the Georgia Tech Cognitive Engineering Center, she leads a research and education program focused on computational cognitive modeling and the design of cognitive work support systems and technologies to improve the performance of socio-technical systems, with particular emphasis on aerospace systems. She is responsible for undergraduate- and graduate-level instruction in flight dynamics, evaluation of human-integrated systems, human factors, and cognitive engineering. Dr. Feigh has over nine years of relevant research and design experience in fast-time air traffic simulation, ethnographic studies, airline and fractional-ownership operations control centers, synthetic vision systems for helicopters, expert systems for air traffic control towers, and the impact of context on undersea warfighters. She serves on the National Research Council's Aeronautics and Space Engineering Board (ASEB), as Associate Editor of the Journal of the American Helicopter Society, and as a guest editor for a special issue of the AIAA Journal of Aerospace Information Systems on human-automation interaction.