11–26 Nov 2021
Europe/Budapest timezone

Hardness of planning and learning with linear function approximation in reinforcement learning

Not scheduled
20m
Online lecture

Speaker

Csaba Szepesvari (DeepMind/University of Alberta)

Description

Markov decision processes (MDPs) form a minimalist framework designed to capture the most important aspects of decision making under uncertainty, a problem of major practical interest. Unfortunately, the price of the minimalist approach is that MDPs lack structure, and as such, planning and learning in MDPs with combinatorial-sized state and action spaces are strongly intractable. An appealing idea for overcoming this intractability is to assume that the optimal action-value function, which effectively captures everything that needs to be known about an MDP in order to act optimally in it, enjoys some distinct structural property, such as being expressible as a linear combination of a tractable number of basis functions. The question is whether this structure is sufficient to design algorithms that overcome the inherent intractability of planning and learning in MDPs. In this talk, I will look back at research addressing this question (starting in the 1960s) and then present a recent result showing that the MDP planning and learning problems remain strongly intractable even with the extra assumptions, as far as scaling with the planning horizon and the number of basis functions is concerned. I will finish by listing a number of open problems.
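
For concreteness, the linear structure mentioned in the abstract is usually formalized as follows; this is a standard statement from the linear function approximation literature rather than a quote from the talk, and the symbols Q*, \phi, \theta, and d are introduced here only for illustration:

    Q^\star(s,a) \;=\; \langle \theta, \phi(s,a) \rangle \;=\; \sum_{i=1}^{d} \theta_i \, \phi_i(s,a) \qquad \text{for all } (s,a) \in \mathcal{S} \times \mathcal{A},

where \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d stacks the d basis functions and \theta \in \mathbb{R}^d is a fixed coefficient vector. The hardness question addressed in the talk is how the cost of planning and learning must scale with d and the planning horizon under this kind of assumption.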

Title

Hardness of planning and learning with linear function approximation in reinforcement learning

Authors: Weisz, Gellert; Amortila, Philip; Szepesvári, Csaba
Affiliations: DeepMind; University of Illinois, Urbana-Champaign, USA; DeepMind/University of Alberta

Primary authors

Csaba Szepesvari (DeepMind/University of Alberta)
Mr Gellert Weisz (DeepMind)
Mr Philip Amortila (University of Illinois, Urbana-Champaign, USA)

Presentation materials

There are no materials yet.