Reinforcement Learning Online

The purpose of this web-site is to provide web-links and references to research related to reinforcement learning (RL), which also goes by other names such as neuro-dynamic programming (NDP) and adaptive or approximate dynamic programming (ADP). You'll find links to tutorials, MATLAB codes, papers, textbooks, and journals. Please send me an e-mail at gosavia AT mst DOT edu if you have any comments or questions about this web-page.
A brief description of Reinforcement Learning

Reinforcement learning is a simulation-based technique for solving Markov decision problems. Classical dynamic programming algorithms, such as value iteration and policy iteration, can solve these problems when the state space is small and the system under study is not very complex. However, if the system is very complex or the number of states is very large, these algorithms break down, because they require the so-called one-step transition probabilities, which are difficult to compute under these circumstances. If the system stochastics are very complex, it is difficult to obtain expressions for the transition probabilities (this is called the curse of modeling). If the state space is large (say, of the order of a million states), the number of transition probabilities grows with the square of the number of states (of the order of 10^12 per action for a million states), and it is then not possible even to store them, let alone process them to generate a solution (this is called the curse of dimensionality).

Reinforcement learning is rooted in dynamic programming and is capable of solving large-scale Markov decision problems. It does not require the computation or storage of the transition probabilities; it needs only a simulator that can generate sample transitions of the system. When the state space is large, it can be combined with a function approximation scheme, such as regression or a neural network, to approximate the value function of dynamic programming, thereby generating a solution. It has been shown through mathematically rigorous arguments that reinforcement learning can produce optimal or near-optimal solutions, and there is a great deal of empirical evidence to show the same. Here is a list of some of my own research papers, but there is more here, which you will hopefully like.
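The contrast described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-state, two-action problem, its transition probabilities and rewards, the discount factor of 0.8, the step-size rule, and all function names. Value iteration reads the transition probabilities directly, whereas the Q-learning routine (one common reinforcement learning algorithm, used here as an example) sees only sampled transitions from a simulator. In this toy the simulator happens to be built from the same probability table; in a real application it would be built from the system's underlying random variables.

```python
import random

# Hypothetical two-state, two-action problem (illustrative numbers only).
# P[a][s][t]: probability of moving from state s to state t under action a.
# R[a][s][t]: reward earned on that transition.
P = [[[0.7, 0.3], [0.4, 0.6]],
     [[0.9, 0.1], [0.2, 0.8]]]
R = [[[6.0, -5.0], [7.0, 12.0]],
     [[10.0, 17.0], [-14.0, 13.0]]]
GAMMA = 0.8  # discount factor (assumed)
STATES = range(2)
ACTIONS = range(2)


def value_iteration(tol=1e-8):
    """Classical DP: needs every transition probability explicitly."""
    v = [0.0, 0.0]
    while True:
        q = [[sum(P[a][s][t] * (R[a][s][t] + GAMMA * v[t]) for t in STATES)
              for a in ACTIONS] for s in STATES]
        new_v = [max(q[s]) for s in STATES]
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return q
        v = new_v


def simulate(state, action, rng):
    """Toy simulator: returns (next_state, reward) for one sampled transition."""
    next_state = 0 if rng.random() < P[action][state][0] else 1
    return next_state, R[action][state][next_state]


def q_learning(steps=200_000, seed=0):
    """Q-learning: learns from sampled transitions only, never reads P."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    state = 0
    for _ in range(steps):
        action = rng.randrange(2)  # explore both actions uniformly
        next_state, reward = simulate(state, action, rng)
        visits[state][action] += 1
        # Decaying step size (an assumed rule of the form A / (B + n)).
        alpha = 150.0 / (300.0 + visits[state][action])
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy(q):
    """Policy that picks the action with the largest Q-value in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in STATES]


if __name__ == "__main__":
    print("value iteration policy:", greedy(value_iteration()))
    print("Q-learning policy:     ", greedy(q_learning()))
```

For these assumed numbers, value iteration selects the second action in the first state and the first action in the second state, and the Q-learning run is expected to arrive at the same policy without ever storing a transition probability. For a large state space, the table q would be replaced by a function approximator, which this sketch does not attempt.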

Tutorials, Codes, and Other Web-based Resources

A MATLAB Repository for Reinforcement Learning (created by Abhijit Gosavi)

Neuro-Dynamic Programming (NDP) (Research of D.P. Bertsekas (MIT) and his colleagues.)

A Reinforcement Learning Repository (created by Sridhar Mahadevan (UMass Amherst))

Textbooks

Reinforcement Learning

Reinforcement Learning: An Introduction written by R. Sutton and A. Barto.

Neuro-Dynamic Programming written by D.P. Bertsekas and J.N. Tsitsiklis.

Simulation-based Optimization: Parametric Optimization Techniques and Reinforcement Learning written by Abhijit Gosavi. Some chapters from the book are freely available from this website.

Markov Decision Theory

Dynamic Programming and Optimal Control written by D.P. Bertsekas. This textbook treats the topic of Markov Decision Problems in great detail and is strongly recommended if you are a researcher. The chapter on reinforcement learning (ADP) in the book is freely available here.

If you are a beginner and are interested in a quick introduction to this subject, read the accessible account in the chapter on Markov Decision Processes presented in the following textbook:

Introduction to Operations Research written by Frederick S. Hillier and Gerald J. Lieberman.

Journals

Back to Abhijit Gosavi's homepage.