MATLAB Repository for Reinforcement Learning

Funded by the National Science Foundation via grant ECS: 0841055.

This website provides MATLAB codes for Reinforcement Learning (RL), which is also called Adaptive or Approximate Dynamic Programming (ADP) or Neuro-Dynamic Programming (NDP).

This website has been created to make RL programming accessible to the engineering community, which widely uses MATLAB. Please feel free to use these codes in your research. I would appreciate it if you sent me an email acknowledging their use in your research.

The first set of codes that we provide uses a 2-state Markov chain as the test bed. Although this test bed is simple, it is useful for testing a new algorithm. These codes are also meant to give you an idea of how to incorporate a Q-learning algorithm within a discrete-event simulator of your own. Please click here to access these codes.
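To give a rough idea of what Q-learning on a 2-state chain looks like, here is a minimal sketch. It is not one of the codes hosted on this site: the transition probabilities P, the rewards R, the discount factor, the step-size rule, and the run length are all hypothetical values chosen only for illustration.

```matlab
% Q-learning sketch on a hypothetical 2-state, 2-action Markov chain.
% P, R, gamma, the step-size rule, and the run length are illustrative
% assumptions, not values taken from the codes on this site.
rng(0);                                  % fixed seed so runs are repeatable
nStates = 2;
nActions = 2;

% P(i,j,a): probability of moving from state i to state j under action a
P = zeros(nStates, nStates, nActions);
P(:,:,1) = [0.8 0.2; 0.3 0.7];
P(:,:,2) = [0.5 0.5; 0.1 0.9];

% R(i,j,a): immediate reward earned on that transition
R = zeros(nStates, nStates, nActions);
R(:,:,1) = [5 -2; 4 8];
R(:,:,2) = [9 3; -6 10];

gamma = 0.8;                             % discount factor (assumed)
Q = zeros(nStates, nActions);            % one Q-factor per state-action pair
visits = zeros(nStates, nActions);       % visit counts that drive the step size
state = 1;

for k = 1:50000
    action = randi(nActions);            % explore: each action equally likely

    % Simulator step: sample the next state from P(state,:,action)
    if rand() < P(state, 1, action)
        nextState = 1;
    else
        nextState = 2;
    end
    reward = R(state, nextState, action);

    % Decaying step size (assumed rule)
    visits(state, action) = visits(state, action) + 1;
    alpha = 100 / (200 + visits(state, action));

    % Q-learning update toward the one-step sampled target
    target = reward + gamma * max(Q(nextState, :));
    Q(state, action) = (1 - alpha) * Q(state, action) + alpha * target;

    state = nextState;
end

[~, policy] = max(Q, [], 2);             % greedy action in each state
disp(Q);
disp(policy');
```

In a discrete-event simulator of your own, the block marked "Simulator step" would be replaced by whatever routine advances your simulator and returns the next state and reward. The update itself does not need P or R explicitly.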

Codes are provided for Q-learning, R-SMART and also for value iteration (Q-factor versions).
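For comparison with the simulation-based methods, the Q-factor version of value iteration can be sketched as follows for the discounted-reward case. This is only an illustration under assumed data: the 2-state, 2-action model (P and R), the discount factor, and the stopping tolerance are hypothetical and are not taken from the codes on this site.

```matlab
% Q-factor value iteration sketch for a hypothetical 2-state, 2-action MDP.
% P, R, gamma, and tol are illustrative assumptions.
nStates = 2;
nActions = 2;

% P(i,j,a): probability of moving from state i to state j under action a
P = zeros(nStates, nStates, nActions);
P(:,:,1) = [0.8 0.2; 0.3 0.7];
P(:,:,2) = [0.5 0.5; 0.1 0.9];

% R(i,j,a): immediate reward earned on that transition
R = zeros(nStates, nStates, nActions);
R(:,:,1) = [5 -2; 4 8];
R(:,:,2) = [9 3; -6 10];

gamma = 0.8;                             % discount factor (assumed)
tol = 1e-8;                              % stopping tolerance (assumed)
Q = zeros(nStates, nActions);

while true
    V = max(Q, [], 2);                   % best Q-factor in each state
    Qnew = zeros(nStates, nActions);
    for a = 1:nActions
        % Expected immediate reward plus discounted value of the next state
        Qnew(:, a) = sum(P(:,:,a) .* R(:,:,a), 2) + gamma * P(:,:,a) * V;
    end
    change = max(abs(Qnew(:) - Q(:)));
    Q = Qnew;
    if change < tol
        break;
    end
end

[~, policy] = max(Q, [], 2);             % greedy action in each state
disp(Q);
disp(policy');
```

Unlike Q-learning, this version needs the transition probabilities and rewards explicitly, which makes it a convenient benchmark when testing a simulation-based algorithm on a small problem.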

For a tutorial on RL, please click here.

MATLAB codes for the following paper on semi-variance penalized MDPs and SMDPs (survival probabilities):

1. A. Gosavi. Target-sensitive control of Markov and semi-Markov processes, International Journal of Control, Automation, and Systems, 9(5):1-11, 2011.

can be found here.

Other papers that were partially funded from this project include:

2. Abhijit Gosavi. "Reinforcement Learning: A Tutorial Survey and Recent Advances." (pdf file) INFORMS Journal on Computing, 21(2):178-192, 2009.

3. Abhijit Gosavi, Susan L. Murray, Jiaqiao Hu, and Shuva Ghosh. Model-building Adaptive Critics for semi-Markov Control. Journal of Artificial Intelligence and Soft Computing Research, 2(1), 2012.

4. A. Gosavi, S.L. Murray, V.M. Tirumalasetty and S. Shewade. A Budget-Sensitive Approach to Scheduling Maintenance in a Total Productive Maintenance (TPM) Program, Engineering Management Journal, 23(3): 46-56, 2011.

5. K. Kulkarni, A. Gosavi, S. L. Murray and K. Grantham. Semi-Markov Adaptive Critic Heuristics with Application to Airline Revenue Management. Journal of Control Theory and Applications (special issue on Approximate Dynamic Programming), 9(3): 421-430, 2011.

We plan to put up numerous other MATLAB codes for RL on this website!

Back to Abhijit Gosavi's homepage.