Speaker
Description
Markov decision processes (MDPs) are a minimalist framework designed to capture the most important aspects of decision making under uncertainty, a problem of major practical interest. Unfortunately, the price of this minimalist approach is that MDPs lack structure, and as such, planning and learning in MDPs with combinatorially sized state and action spaces are strongly intractable. An appealing idea for overcoming this intractability is to assume that the optimal action-value function, which effectively captures everything that needs to be known about an MDP in order to act optimally in it, enjoys some distinct structural property, such as being expressible as a linear combination of a tractable number of basis functions. The question is whether this structure is sufficient to design algorithms that overcome the inherent intractability of planning and learning in MDPs. In this talk, I will look back at research addressing this question (starting in the 1960s) and then present a recent result showing that the MDP planning and learning problems remain strongly intractable even under these extra assumptions, as far as scaling with the planning horizon and the number of basis functions is concerned. I will finish by listing a number of open problems.
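For concreteness, the structural assumption mentioned above can be sketched as follows (illustrative notation, not taken from the talk: $\phi_i$ denotes the given basis functions, $\theta$ an unknown weight vector, and $d$ the number of basis functions):

$$
Q^*(s,a) \;=\; \sum_{i=1}^{d} \theta_i \, \phi_i(s,a) \qquad \text{for all states } s \text{ and actions } a,
$$

where "tractable" means that $d$ is small relative to the sizes of the state and action spaces, so that an algorithm scaling polynomially in $d$ (and the planning horizon) would sidestep the combinatorial blow-up.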
Title
Hardness of planning and learning with linear function approximation in reinforcement learning
Authors | Affiliation
---|---
Weisz, Gellert | DeepMind
Amortila, Philip | University of Illinois, Urbana-Champaign, USA
Szepesvári, Csaba | DeepMind / University of Alberta