General Information
Welcome to the Reinforcement Learning Reading Group at RSISE@ANU
- Who: Everyone is welcome.
- When: Every Wednesday, 11:30-12:30, with subsequent lunch.
(If you want to attend, but the time does not suit you, please let me know)
- Where: RSISE building, Common RHS Room, A206, Australian National University.
- Assumed Background: Basics in Reinforcement Learning (see below)
- Operation mode: Discussion of papers that participants have read beforehand. No email reminders.
The Reinforcement Learning Reading Group will concentrate on the background for and techniques pushing the frontier of generic reinforcement learning agents, in particular for partially observable domains (PORL). For many years, the reinforcement-learning community primarily focused on sequential decision making in fully observable but unknown domains while the planning-under-uncertainty community focused on known but partially observable domains. Since most problems are both partially observable and (at least partially) unknown, recent years have seen a surge of interest in combining the related, but often different, algorithmic machineries developed in the two communities.
See, for instance:
PORL09: Partially Observable Reinforcement Learning
Symposium at NIPS'09 December 10, Vancouver
http://www.hutter1.net/ai/porlsymp.htm and
http://grla.wikidot.com/nips for more details.
Given the substantial interest in RL and Planning at RSISE@ANU and CRL@NICTA, the time seems ripe for a reading group that brings these two communities together and reviews recent relevant papers (see below).
Regular (Past&Current) Participants:
Mostly students and researchers from RSISE@ANU and CRL@NICTA
and other RL friends nearby.
Book Reading
From 1.June'11-9.Nov'11 we read
"Neuro-dynamic programming" by Dimitri P. Bertsekas and John Tsitsiklis
Athena Scientific 1996
http://www.amazon.com/Neuro-Dynamic-Programming-Optimization-Neural-Computation/dp/1886529108
Chapters (http://www.athenasc.com/ndpcontents.html) 1: Peter, 2: Mayank, 3: Daniel, 4: Tor, 5: Phuong, 6: Wen and Matthew, 7: Peter, 8: Joseph
Reading List
The schedule for the reading group is given below and will be updated weekly.
10.Mar'12
- Wen reports from DCC 2012 on three papers (soon to be added to the wiki)
2.Mar'12
- Selecting the state representation in reinforcement learning Maillard, Munos and Ryabko
http://books.nips.cc/papers/files/nips24/NIPS2011_1427.pdf
18.April'12
- On Nicod's condition and the black raven paradox
The paper is available from Hadi or Peter by email
4.April'12
- Near-optimal Regret Bounds for Reinforcement Learning, Thomas Jaksch, Ronald Ortner and Peter Auer
http://jmlr.csail.mit.edu/papers/v11/jaksch10a.html
21.Mar'12
- Automatic discovery of ranking formulas for playing with multi-armed bandits, Francis Maes, Louis Wehenkel, and Damien Ernst, EWRL 2011
http://ewrl.files.wordpress.com/2011/08/ewrl2011_submission_15.pdf
14.Mar'12
- A theoretical analysis of model based interval estimation by A. Strehl and M. Littman, ICML 2005
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1496
presented by Peter
29.Feb'12
- Wen will be presenting his TPR on compression
15.Feb'12
- PAC bounds for Discounted MDPs, presented by Tor. Email tor.lattimore@anu.edu.au for a copy of the paper.
8.Feb'12
- Some inequalities in probability theory, presented by Tor
7.Dec'11
- Predictive State Temporal Difference Learning by Byron Boots and Geoff Gordon NIPS 2010
http://www.cs.cmu.edu/~ggordon/boots-gordon-PSTD.pdf
30.Nov'11
- Solomonoff Memorial conference in Melbourne. Tor, Ian, Peter and Wen presenting.
23.Nov'11
- An approximation of the universal intelligence measure, Shane Legg and Joel Veness
http://jveness.info/publications/rsmc2011%20-%20aiq.pdf
presented by Wen as a practice talk for Solomonoff Memorial
Further discussion of the paper follows the 20-minute presentation with slides
16.Nov'11
- Looping Suffix Tree-Based Inference of Partially Observable Hidden State, ICML 2006, Michael P. Holmes, Charles Lee Isbell, Jr.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.62.262
presented by Mayank
9.Nov'11
- We are done with the book. Paper reading resumes next week.
Nov'11
- Peter presents chapter 7 and Joseph Chapter 8.
Oct'11
- Wen presents Chapter 6
Sep'11
- After a break for the first two weeks, Phuong presents chapter 5
Aug'11
- Tor presents chapter 4
July'11
- Daniel finishes chapter 3
30.June'11
- Daniel presents chapter 3 of "Neuro-dynamic programming"
8,15,23.June'11
- Mayank presents chapter 2 of "Neuro-dynamic programming"
1.June'11
- We will start reading "Neuro-dynamic programming" by Dimitri P. Bertsekas and John Tsitsiklis
Athena Scientific 1996
http://www.amazon.com/Neuro-Dynamic-Programming-Optimization-Neural-Computation/dp/1886529108
We will go through the introduction this week and do some planning for the reading group.
This will be led by Peter
25.May'11
- Variable resolution discretization in optimal control
R. Munos, A.Moore
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1259&context=robotics
Presented by Daniel
18.May'11
- [WNLL] Planning and Learning in Environments with Delayed Feedback
Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.677
Presented by Matthew
11.May'11
- Meeting, discussing paper reviewing
4.May'11
- Zahra Zamani (PhD Monitoring) An Agent Architecture for Structured Uncertain Environments
http://cecs.anu.edu.au/seminars/more/SID/2834
- Phuong Nguyen (PhD Monitoring) Feature Reinforcement Learning In Practice
http://cecs.anu.edu.au/seminars/more/SID/2833
27.April'11
- Tor Lattimore (PhD Monitoring) Asymptotically Optimal Agents
http://cecs.anu.edu.au/seminars/more/SID/2832
- Wen Shao (PhD Monitoring) AIXI in Formalisation of Turing Test
http://cecs.anu.edu.au/seminars/more/SID/2835
20.April'11
- Matthew Robards (PhD Monitoring) Function Approximation for Model Based Reinforcement Learning
http://cecs.anu.edu.au/seminars/more/SID/2830
- Mayank Daswani (PhD Monitoring) Feature Dynamic Bayesian Networks
http://cecs.anu.edu.au/seminars/more/SID/2831
13.April'11
- Wen presents, [Chu10] Evgeny Chutchev (2010), A Formalization of the Turing Test
http://arxiv.org/abs/1005.4989
6.April'11
- Mayank Presents [Hut09] M.Hutter, Feature dynamic Bayesian networks.
In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.
http://www.hutter1.net/ai/phidbn.pdf
30.Mar'11
- Daniel Presents [NL09] A Nouri, M. Littman, Multi-resolution Exploration in Continuous Spaces, NIPS 2009
http://books.nips.cc/papers/files/nips21/NIPS2008_0730.pdf
23.Mar'11
- Peter presents "Dynamic Policy Programming"
http://www.mbfys.ru.nl/staff/m.azar/poster_NIPS09.pdf
http://arxiv.org/abs/1004.2027
16.Mar'11
- Phuong presents
9.Mar'11
- Pascal presents
Finale Doshi-Velez: Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Domains
http://www.informatik.uni-trier.de/%7Eley/db/conf/aaai/aaai2010.html#Doshi-Velez10
and Matthew presents Model Based RL with Function Approximation
4.Mar'11
"Pascal" Workshop on RL and Planning at NICTA Level 3, Meeting Room D
- 10:30 — 11:00 | Pascal Poupart: Explaining Automated Policies for Sequential Decision Making
- 11:00 — 11:18 | Debdeep Banerjee: Partial Order Support Link Scheduling
- 11:18 — 11:36 | Patrik Haslum: A Quick Overview of Factored (Classical) Planning
- Break — 12 minutes
- 11:48 — 12:06 | Scott Sanner: The Relational Dynamic Influence Diagram Language
- 12:06 — 12:24 | Peter Sunehag: History-based Reinforcement Learning
- 12:24 — 12:42 | Matt Robards: Model-Based Reinforcement Learning With Function Approximation
- 12:42 — 13:00 | Will Uther: topic TBD
2.Mar'11
- PAC-Bayesian Model Selection for Reinforcement Learning
Mahdi Milani Fard, Joelle Pineau
http://books.nips.cc/papers/files/nips23/NIPS2010_0431.pdf
Presented by Pascal Poupart
23.Feb'11
- Tor and Hassan present …
16.Feb'11
- Matthew and Peter present …
9.Feb.'11
- Bruno C. da Silva, Eduardo W. Basso, Ana L. C. Bazzan, Paulo M. Engel,
Dealing with Non-Stationary Environments using Context Detection
ICML 2006
http://www.autonlab.org/icml_documents/camera-ready/028_Dealing_with_Non_Sta.pdf
Presented by Aaron Li
15.Dec.'10
- Chapters 6—8, Sridhar Mahadevan, "Learning Representation and Control in Markov Decision Processes: New Frontiers".
Foundations and Trends in Machine Learning (editor: Michael Jordan), Vol. 1, No. 4, pp. 403-565 (163 pages), 2009.
http://www.cs.umass.edu/~mahadeva/papers/ml-found-trend.pdf
Presented by Scott
8.Dec.'10
- Statistical physics of social dynamics
Castellano, C., Fortunato, S., and Loreto, V. 2009. Reviews of Modern Physics 81, 2, 591
Section IV. Cultural Dynamics, Parts A & B (Axelrod model and variants)
http://dx.doi.org/10.1103/RevModPhys.81.591
Presented by Ian Wood
1.Dec.'10
- Constrained Complexity Generalized Context-Tree Algorithms, Robert J Drost and Andrew C Singer
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04301233
Presented by Peter
24.Nov'10
- Hopefully the following will be presented: Efficient real-time dynamic programming for factored MDPs.
Honors thesis by Sotirios Diamand. NOTE: In N101.
17.Nov'10
- Ofer Dekel and Shai Shalev-Shwartz and Yoram Singer
Power of Selective Memory: Self Bounded Learning of Prediction Suffix Trees
NIPS 2004
http://ttic.uchicago.edu/~shai/papers/DekelShSi04.pdf
(Presented by Hassan)
10.Nov'10
- Efficient real-time dynamic programming for factored MDPs.
Honors thesis by Sotirios Diamand. CANCELED
3.Nov'10
- Cancelled
27.Oct'10
- Optimality Issues of Universal Greedy Agents with Static Priors by Laurent Orseau http://www.springerlink.com/content/p2780778k054411x/
Presented by Tor
20.Oct'10
- Never Ending Language Learning (from Tom Mitchell's group at CMU)
Scientific: http://rtw.ml.cmu.edu/papers/carlson-aaai10.pdf
News: http://www.nytimes.com/2010/10/05/science/05compute.html
Webpage: http://rtw.ml.cmu.edu/rtw/publications
13.Oct'10
- Frank Stephan will talk about Inductive Inference. He is a visitor from Singapore who was a PC chair at ALT and tutorial speaker. Webpage: http://www.comp.nus.edu.sg/~fstephan
22.Sep'10, in Room 207 (ours only until 12:30, be on time)
- A Complete Theory of Everything, http://arxiv.org/abs/0912.5434
(Marcus)
15.Sep'10
- Constantine Caramanis and Shie Mannor
Learning in the Limit with Adversarial Disturbances
In Proceedings of COLT 2008.
http://www.ece.mcgill.ca/~smanno1//public/C-CarmanisM-COLT2008.pdf
(Presented by Hassan)
I list two more interesting papers that are more RL related but harder (I think):
- Huibert Kwakernaak, Robust control and H∞-optimization - Tutorial paper. Automatica, 29 (2). pp. 255-273. 1993.
http://doc.utwente.nl/29962/1/Kwakernaak93robust.pdf
- Jun Morimoto and Kenji Doya, Robust Reinforcement Learning. Neural Computation 2005.
http://mitpress.mit.edu/journals/pdf/neco_17_2_335_0.pdf
And another making the case for pursuing robust estimators in general:
- Peter J. Huber. On the non-optimality of optimal procedures. Optimality: The Third Erich L. Lehmann Symposium. 2009.
http://projecteuclid.org/euclid.lnms/1249305323
8.Sep'10
- No Free Lunch and Occam's Razor in Supervised Learning
(Tor presents his work)
18,25.Aug,1.Sep'10
- Chapter 8 of Universal AI book
(general discussion)
11.Aug'10
- New TD algorithms from Alberta
(Matthew presents a survey of work by Maei, Sutton and their colleagues)
4.Aug'10
- MC-AIXI-CTW,
(Joel Veness) NOTE LOCATION: A207
28.July'10
- End of Chapter 7 of Universal AI book
(Tor presents)
21.July'10
- Meeting about Advanced AI course
Aug'10
- Hyeong Soo Chang and Michael C. Fu and Jiaqiao Hu and Steven I. Marcus
An Adaptive Sampling Algorithm for Solving Markov Decision Processes
Operations Research, 53 (1), January–February 2005, pp. 126–139
http://www.rhsmith.umd.edu/faculty/mfu/fu_files/CFHM05.pdf
9,16.Jun'10
- Chapter 7 of Universal AI book
(Presented by Phuong)
2.Jun'10
- Canceled
26.May'10
- Matthew Robards and Peter Sunehag and Scott Sanner
RKHS Temporal Difference Learning
Tech Report, The Australian National University
28.Apr&5,19.May'10
- Chapter 6 of Universal AI book
(Presented by Peter, Marcus, Zahra)
21.April'10
- F. Willems and Y. Shtarkov and T. Tjalkens
Reflections on the Prize Paper: "The Context-Tree Weighting Method: Basic Properties"
IEEE Information Theory Society Newsletter (47) No 1, March 1997
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.1872&rep=rep1&type=pdf
For more details,
- F. Willems and Y. Shtarkov and T. Tjalkens
The context-tree weighting method: Basic properties
IEEE Transactions on Information Theory (41), 653 - 664, 1995
http://ieeexplore.ieee.org/iel1/18/8656/00382012.pdf?arnumber=382012
(the following is a more readable version of the same paper)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.1819&rep=rep1&type=pdf
(Presented by Tor)
14.April'10
- L. Kocsis, Cs. Szepesvári
Bandit Based Monte-Carlo Planning
In Proceedings of the 17th European Conference on Machine Learning
Springer-Verlag, Berlin, LNCS/LNAI 4212, September 18-22, pp. 282-293, 2006.
http://www.sztaki.hu/~szcsaba/papers/ecml06.pdf
(Presented by Peter)
7.April'10
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R.E. Schapire.
The nonstochastic multiarmed bandit problem.
SIAM Journal on Computing, 32:48-77, 2002.
http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/AuerCeFrSc01.ps
(Presented by Mark Reid)
31.Mar'10
- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer
Finite time analysis of the multiarmed bandit problem
Machine Learning, 47(2-3):235-256, 2002.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.9211&rep=rep1&type=pdf
(Presented by Hassan)
17&24.Mar'10
- M. Kearns, Y. Mansour, and A.Y. Ng.
A sparse sampling algorithm for near optimal planning in large Markovian decision processes.
In Proceedings of IJCAI'99, pages 1324-1331, 1999.
http://www.cis.upenn.edu/~mkearns/papers/sparseplan.pdf
(Presented by Zahra)
3&10.Mar'10
- Joel Veness and Kee Siong Ng and Marcus Hutter and David Silver
A Monte Carlo AIXI Approximation
Technical Report, arXiv 0909.0801 (2009) 1-42
[implementation & application of the AIXI]
http://www.hutter1.net/ai/aixictw.pdf
(Presented by Sam)
3&10&17&24.Feb'10
- C. Boutilier and T. Dean and S. Hanks
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
Journal of Artificial Intelligence Research, 11 (1999) 1—94.
http://www.eecs.harvard.edu/~avi/CS281r/F06/Papers/boutilier-et-al-mdp.pdf
(Presented 3rd by Scott and the group, 10th no meeting, 17th Phuong, 24th Zahra)
27.Jan'10
- Sebastian Thrun, Probabilistic Algorithms in Robotics
AI Magazine, 21:4 (2000) 93—109
http://www.cs.cmu.edu/~thrun/papers/thrun.probrob.pdf
(Presented by Marcus)
20.Jan'10
- Continuation of last week's paper + Summer Scholar presentation preview.
13.Jan'10
- Note: Changed from before.
Beal, M.J., Ghahramani, Z. and Rasmussen, C.E.
The Infinite Hidden Markov Model
In Advances in Neural Information Processing Systems 2002.
http://www.cse.buffalo.edu/faculty/mbeal/papers/ihmm.pdf
(Presented by Hassan).
23&30.Dec'09
- Break - (-: Christmas and New Years :-)
16.Dec'09
Scott will be presenting:
- (1) Excerpts of Scott's Thesis on factored MDPs.
- (2) Stochastic Planning using Decision Diagrams (SPUDD).
Hoey, St. Aubin, Hu, Boutilier (UAI-99)
http://www.cs.toronto.edu/~cebly/Papers/spudd.ps
- (3) Approximate Policy Construction using Decision Diagrams (APRICODD).
St. Aubin, Hoey, Boutilier (NIPS-00)
http://www.cs.ubc.ca/nest/lci/papers/docs2000/hoey-apricodd.pdf
09.Dec'09
- [RP08] S.Ross and J.Pineau.
Model-based Bayesian reinforcement learning in large structured domains.
In Proc. 24th Conference in Uncertainty in Artificial Intelligence
(UAI'08), pages 476-483, Helsinki, 2008. AUAI Press.
http://www.cs.mcgill.ca/~jpineau/files/sross-uai08.pdf
(Presented by Peter)
02.Dec'09
- Note: Changed from before.
M. Rosencrantz, G. Gordon, and S. Thrun.
Learning low dimensional predictive representations.
In Proceedings of the Twenty-First International Conference on Machine Learning,
Banff, Alberta, Canada, 2004.
http://robots.stanford.edu/papers/Rosencrantz04a.pdf
(Presented by Ian)
25.Nov'09
- [SJR04] S.P. Singh, M.R. James, and M.R. Rudary.
Predictive state representations: A new theory for modeling dynamical systems.
In Proc. 20th Conference in Uncertainty in Artificial Intelligence (UAI'04), pages 512-518, Banff, Canada, 2004. AUAI Press.
(Presented by Hassan)
18.Nov'09
- Matthew Robards will present his literature review on reinforcement learning in large, continuous spaces (focus on Part II).
Literature Review
11.Nov'09
- [SLL09] A.L. Strehl, L. Li, and M.L. Littman.
Reinforcement learning in finite MDPs: PAC analysis.
http://paul.rutgers.edu/~strehl/, 2009.
(Presented by Marcus)
04.Nov'09
- [McC95] McCallum, R. Andrew.
Instance-Based Utile Distinctions for Reinforcement Learning.
The Proceedings of the Twelfth International Machine Learning Conference (ML'95).
Lake Tahoe, CA, 1995.
ftp://ftp.cs.rochester.edu/pub/papers/robotics/95.mccallum-ml.ps.Z
(Presented by Peter)
28.Oct'09
Scott will be talking about several nice methods for solving MDPs efficiently.
The 4 papers to be covered are summarized in the following slides:
http://sml.nicta.com.au/rlp08/RLP_MDP_Extensions.pdf
The papers themselves are as follows (it's recommended that people read the first one
and skim through the others).
- Algorithms for Inverse Reinforcement Learning.
Andrew Y. Ng and Stuart Russell.
ICML 2000.
http://robotics.stanford.edu/~ang/papers/icml00-irl.pdf
- Policy invariance under reward transformations: theory and application to reward shaping.
Andrew Y. Ng and Daishi Harada and Stuart Russell.
ICML 1999.
http://robotics.stanford.edu/~ang/papers/shaping-icml99.pdf
- Hierarchical Solution of Markov Decision Processes using Macro-actions.
Milos Hauskrecht and Nicolas Meuleau and Leslie Pack Kaelbling and Thomas Dean and Craig Boutilier.
UAI 1998.
Note: this paper builds on the macro action semi-MDP framework of Sutton & Precup, but makes some
important changes which make things much cleaner (theoretically and implementationally).
http://www.cs.toronto.edu/kr/papers/macros.pdf
- Reinforcement Learning with Hierarchies of Machines.
Ronald Parr and Stuart Russell.
NIPS 1998.
http://eprints.kfupm.edu.sa/61888/1/61888.pdf
21.Oct'09
- Andrew Y. Ng and Michael Jordan.
PEGASUS: A policy search method for large MDPs and POMDPs.
In Uncertainty in Artificial Intelligence, Proceedings of the Sixteenth Conference, 2000.
http://robotics.stanford.edu/~ang/papers/uai00-pegasus.pdf
(Presented by Matthew)
Addendum: Policy gradient techniques from a robotics perspective:
- Policy gradient methods for robotics.
J. Peters and S.Schaal.
IROS 2006
http://www-clmc.usc.edu/publications/P/peters-IROS2006.pdf
14.Oct'09
- [NCD04] A.Y. Ng, A.Coates, M.Diel, V.Ganapathi, J.Schulte, B.Tse, E.Berger, and E.Liang.
Autonomous inverted helicopter flight via reinforcement learning.
In ISER, volume 21 of Springer Tracts in Advanced Robotics, pages 363-372. Springer, 2004.
(Presented by Phuong)
30.Sep'09&7.Oct'09
- [RPPC08] S. Ross, J. Pineau, S. Paquet, B. Chaib-draa,
Online planning algorithms for POMDPs,
Journal of Artificial Intelligence Research, 32 (2008) 663—704.
This paper compares the "online" "tree-search" planning approach, popular for games,
with the "offline" "self-consistent" Bellman equation approach,
popular in reinforcement learning (and described by Kaelbling et al. 1998).
(Presented by Peter).
16&23.Sep'09
- [KLC98] L.P. Kaelbling and M.L. Littman and A.R. Cassandra,
Planning and Acting in Partially Observable Stochastic Domains
Artificial Intelligence, 101 (1998) 99—134
(Presented by Marcus/Hassan/Sarah)
Papers in Queue
General POMDPs
- [Dim10] Christos Dimitrakakis (2010) Context MDPs
http://fias.uni-frankfurt.de/~dimitrakakis/papers/cmdp.pdf
State Abstractions for RL
- [LWL06] Lihong Li, Thomas J. Walsh, Michael L. Littman,
Towards a Unified Theory of State Abstraction for MDPs
In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.1229
- [GDG03] R.Givan, T.Dean, and M.Greig.
Equivalence notions and model minimization in Markov decision processes.
Artificial Intelligence, 147(1-2):163-223, 2003.
- [Hut09a] M.Hutter.
Feature dynamic Bayesian networks.
In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.
Exploration/Exploitation in RL
- [LL10], Lihong Li and Michael L. Littman, Reducing reinforcement learning to KWIK online regression
Tenth International Symposium on Artificial Intelligence and Mathematics
http://www.springerlink.com/content/g25m74160311n665/fulltext.pdf
- [SL07] Alexander L. Strehl, Michael L. Littman, Online linear regression and its application to model-based reinforcement learning (NIPS 2007)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.6591
General MDPs
- [WGL10] Thomas J. Walsh, Sergiu Goschin, Michael L. Littman: Integrating Sample-Based Planning and Model-Based Reinforcement Learning. AAAI 2010,
http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1880
- [WNLL07] Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman,
Planning and Learning in Environments with Delayed Feedback
Autonomous Agents and Multi-Agent Systems 18(1): 83-105 (2009)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.677
- [JS10] Tobias Jung, Peter Stone: Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration.
ECML/PKDD (1) 2010: 601-616
http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ECML10-jung.pdf
- [JS09] Nicholas K. Jong, Peter Stone: Compositional Models for Reinforcement Learning.
ECML/PKDD (1) 2009: 644-659
http://www.springerlink.com/content/11460wl75p04493v/
- [GP10] M. Geist and O. Pietquin (2010) Kalman Temporal Differences
JAIR Volume 39, pages 483-532
http://www.jair.org/papers/paper3077.html
- [LT10] T. Lang and M. Toussaint (2010) Planning with Noisy Probabilistic Relational Rules
JAIR Volume 39, pages 1-49
http://www.jair.org/papers/paper3093.html
- [BBSE10] (Book), Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst (2010)
"Reinforcement Learning and Dynamic Programming Using Function Approximators"
in the Automation and Control Engineering series of Taylor & Francis CRC Press.
- [Mah09] Sridhar Mahadevan (2009) Learning Representation and Control in Markov Decision Processes: New Frontiers
Foundations and Trends in Machine Learning: Vol. 1: No 4, pp 403-565.
http://dx.doi.org/10.1561/2200000003
Miscellaneous
- [Gru04] P.D. Gruenwald.
Tutorial on minimum description length.
In Minimum Description Length: recent advances in theory and practice, Chapters 1 and 2. MIT Press, 2004.
- [BLA02] B. Ng, L. Peshkin, and A. Pfeffer.
Factored Particles for Scalable Monitoring.
In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.
Background Reading
- L. P. Kaelbling and M. L. Littman and A. W. Moore,
Reinforcement learning: A Survey,
Journal of Artificial Intelligence Research, 4 (1996) 237—285
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/jair/pub/volume4/kaelbling96a.pdf
- R. Sutton and A. Barto. Reinforcement learning: An introduction
Cambridge, MA, MIT Press (1998),
http://www-anw.cs.umass.edu/~rich/book/the-book.html
- [Put94] M.L. Puterman.
Markov Decision Processes - Discrete Stochastic Dynamic Programming.
Wiley, New York, NY, 1994.
- [KV86] P.R. Kumar and P.P. Varaiya.
Stochastic Systems: Estimation, Identification, and Adaptive Control.
Prentice Hall, Englewood Cliffs, NJ, 1986.
Contact
Peter Sunehag <peter.sunehag@anu.edu.au> or
Marcus Hutter <marcus.hutter@anu.edu.au>