[ home ] [ people ] [ projects ] [ courses ] [ meetings ]


Direct value-approximation for factored Markov decision processes


Wednesday, Oct. 24th -- Dale Schuurmans (University of Waterloo)


Abstract:
 

It is well known that classical methods for determining optimal control policies for Markov Decision Processes (MDPs) do not scale up feasibly in the size of the state descriptions.  We present a simple approach for efficiently computing near-optimal policies in factored MDPs, when the optimal value function can be approximated by a compact linear form. Our method is based on solving a single linear program that approximates the best linear fit to the optimal value function.  By applying an efficient constraint generation procedure we obtain an iterative solution method that tackles concise linear programs.  This direct linear programming approach experimentally yields a significant reduction in computation time over approximate value- and policy-iteration methods (sometimes reducing several hours to a few seconds).  However, the quality of the solutions produced by linear programming is weaker---usually about twice the approximation error for the same approximating class.  Nevertheless, the speed advantage allows one to use larger approximation classes to achieve similar error in reasonable time.

Joint work with Relu Patrascu (U. Waterloo)

Link: