The purpose of this website is to provide links and references to research related to reinforcement learning (RL), which also goes by other names such as neuro-dynamic programming (NDP) and adaptive or approximate dynamic programming (ADP). You'll find links to tutorials, MATLAB code, papers, textbooks, and journals. Please send me an e-mail at gosavia AT mst DOT edu if you have any comments or questions about this web page.

Reinforcement Learning is a simulation-based technique for solving
Markov Decision Problems. Classical dynamic programming algorithms, such
as value iteration and policy iteration, can be used to solve these
problems if their state-space is small and the system under study is not
very complex. However, if the system under study is very complex or if the
number of states is very large, these algorithms break down.
This is because these algorithms require the computation of the so-called
one-step transition probabilities, which is difficult under
these circumstances. If the system stochastics are very complex, it is
difficult to obtain expressions for these transition probabilities (this is called the curse
of modeling). If the state space is large (say, on the order of a
million states), the number of transition probabilities grows with the square
of the number of states, and then it is not possible even to store them, let
alone process them to generate a solution.
Reinforcement learning is a simulation-based method rooted in
dynamic programming. It is capable of solving **large-scale** Markov decision problems.
It does not require the computation or storage of the transition probabilities.
When the state-space is large, it can be combined with a function approximation
scheme such as regression or a neural network algorithm to approximate the value
function of dynamic programming, thereby generating a solution. It has been shown through mathematically rigorous
arguments that reinforcement learning can produce optimal or near-optimal solutions.
There is also a great deal of empirical evidence to show the same. Here is a list of some of my own research
papers, but there is more on this page, which you will hopefully find useful.
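
To make the simulation-based idea above concrete, here is a minimal sketch in Python of Q-learning, one well-known reinforcement learning algorithm, applied to a toy problem. Everything in it is an assumption made for illustration and does not come from the papers or codes linked on this page: the five-state chain, the slip probability of 0.1, the discount factor, the step size, and the function names `simulate`, `q_learning`, and `greedy_policy`. The point it tries to show is that the learner only calls a simulator and never computes or stores a transition probability matrix.

```python
import random

N_STATES = 5      # hypothetical chain with states 0..4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor (assumed for this sketch)


def simulate(state, action, rng):
    """Toy simulator: returns one sampled (next_state, reward) pair.

    The learner never sees the probabilities used here; it only
    observes the sampled outcomes.
    """
    if rng.random() < 0.1:
        nxt = state  # the move fails and the system stays put
    elif action == 1:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def q_learning(n_steps=50000, alpha=0.1, seed=0):
    """Tabular Q-learning driven purely by simulated transitions."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)  # uniform random exploration
        nxt, reward = simulate(state, action, rng)
        # Move Q(state, action) toward the sampled one-step target.
        target = reward + GAMMA * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = nxt
    return q


def greedy_policy(q):
    """Pick, in each state, the action with the largest Q-value."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    q_values = q_learning()
    print("Greedy policy (0 = left, 1 = right):", greedy_policy(q_values))
```

In this toy chain the reward is earned only in the right-most state, so the greedy policy read off the learned Q-values should choose to move right in every state. For a large state space, the table `q` would be replaced by a function approximator such as a regression model or a neural network, as described above.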

A Survey Paper for Reinforcement Learning

A Tutorial for Reinforcement Learning

A MATLAB Repository for Reinforcement Learning (created by Abhijit Gosavi)

Neuro-Dynamic Programming (NDP) (Research of D.P. Bertsekas (MIT) and his colleagues.)

A Reinforcement Learning Repository (created by Sridhar Mahadevan (UMass Amherst))

Partially Observable Markov Decision Processes

Reinforcement Learning: An Introduction written by R. Sutton and A. Barto.

Neuro-Dynamic Programming written by D.P. Bertsekas and J.N. Tsitsiklis.

Simulation-based Optimization: Parametric Optimization Techniques and Reinforcement Learning written by Abhijit Gosavi. Some chapters from the book are freely available from this website.

**Dynamic Programming and Optimal Control**
written by D.P. Bertsekas.
This textbook treats the topic of Markov Decision
Problems in great detail and is strongly recommended if you are a researcher. The chapter on reinforcement
learning (ADP) in the book is freely available here.

If you are a beginner and are interested in a quick introduction to this subject, read the accessible account in the chapter on Markov Decision Processes presented in the following textbook:

**Introduction to Operations Research** written by Frederick S. Hillier and
Gerald J. Lieberman.

Journal of Machine Learning Research

Journal of Artificial Intelligence Research

Engineering Applications of Artificial Intelligence