Machine Learning Meetings and Events
Group Meetings: Group meetings are held Mondays from 11am to Noon (talk starts 11:10am) in D.L. Pratt 290C unless otherwise noted. Meetings are coordinated by Hugo Larochelle.
Tea Talks: Tea talks are held every Wednesday at 4:00pm in D.L. Pratt 290C. Talks should be simple, accessible, and not exceed 15 minutes. Speakers bring snacks, make tea, and provide a copy of the presented paper.
Group Meeting Nov 2, 2009: Reinforcement Learning: A Model-Based Bayesian Approach.
- Speaker: Pascal Poupart (Waterloo)
- Abstract:
Reinforcement learning considers the problem of sequential decision making modeled as a Markov decision process, but with unknown parameters (e.g., unknown transition probabilities and costs). In many application domains, there may not be enough data available to accurately learn all the parameters, hence the decision maker is forced to optimize and execute a policy based on partial information with the opportunity to refine the policy as he/she executes it and gathers more data. As a result, the decision maker faces an important dilemma: select actions that minimize costs by "exploiting" the information of the data gathered so far or select actions that maximize information gain by "exploring" (with potentially high costs). Bayesian reinforcement learning provides a principled approach to naturally optimize this exploitation/exploration tradeoff. The idea is to maintain a distribution over the unknown parameters, which can be used to quantify the uncertainty about the model and to make more informed decisions. However, as is typical with Bayesian approaches, maintaining such distributions increases the conceptual and computational complexity.
In this talk, despite the increased complexity, I will explain how an analytical parameterization can be derived for the optimal value function. More precisely, I will show that the optimal value function for discrete states/actions and finite planning horizon is the upper surface of a set of polynomials. This result also holds for partially observable domains. I will also describe an approximate dynamic programming algorithm called BEETLE that uses this representation to optimize an adaptive policy that refines itself as additional data becomes available.
References:
* An Analytic Solution to Discrete Bayesian Reinforcement Learning
Pascal Poupart, Nikos Vlassis, Jesse Hoey and Kevin Regan
In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 697-704, Pittsburgh, Pennsylvania, USA, 2006.
http://www.cs.uwaterloo.ca/~ppoupart/publications/bayesian-rl/icml-brl-8pages.pdf* Model-based Bayesian Reinforcement Learning in Partially Observable Domains
Pascal Poupart and Nikos Vlassis
International Symposium on Artificial Intelligence and Mathematics (ISAIM), Fort Lauderdale, Florida, 2008.
http://www.cs.uwaterloo.ca/~ppoupart/publications/bporl/isaim08-poupart.pdf - Notes: in PT266