Reinforcement Learning Reading Group

General Information

Welcome to the Reinforcement Learning Reading Group at RSISE@ANU

  • Who: Everyone is welcome.
  • When: Every Wednesday, 11:30-12:30, with subsequent lunch.
    (If you want to attend, but the time does not suit you, please let me know)
  • Where: RSISE building, Common RHS Room, A206, Australian National University.
  • Assumed Background: Basics in Reinforcement Learning (see below)
  • Operation mode: Discussing read papers. No email reminders.

The Reinforcement Learning Reading Group will concentrate on the background for and techniques pushing the frontier of generic reinforcement learning agents, in particular for partially observable domains (PORL). For many years, the reinforcement-learning community primarily focused on sequential decision making in fully observable but unknown domains while the planning-under-uncertainty community focused on known but partially observable domains. Since most problems are both partially observable and (at least partially) unknown, recent years have seen a surge of interest in combining the related, but often different, algorithmic machineries developed in the two communities.

See, for instance:
PORL09: Partially Observable Reinforcement Learning
Symposium at NIPS'09 December 10, Vancouver
http://www.hutter1.net/ai/porlsymp.htm and
http://grla.wikidot.com/nips for more details.

Given the substantial interest in RL and Planning at RSISE@ANU and CRL@NICTA, the time seems ripe for a reading group that brings these two communities together and reviews recent relevant papers (see below).

Regular (Past&Current) Participants:
Mostly students and researchers from RSISE@ANU and CRL@NICTA
and other RL friends nearby.

Book Reading
From 1.June'11-9.Nov'11 we read
"Neuro-dynamic programming" by Dimitri P. Bertsekas and John Tsitsiklis
Athena Scientific 1996
http://www.amazon.com/Neuro-Dynamic-Programming-Optimization-Neural-Computation/dp/1886529108
Chapter (http://www.athenasc.com/ndpcontents.html) 1: Peter, 2: Mayank, 3: Daniel, 4: Tor, 5: Phuong, 6: Wen and Matthew, 7: Peter, 8: Joseph

Reading List

The schedule for the reading group is given below and will be updated weekly.

10.Mar'12

  • Wen reports from DCC 2012 on three papers (soon to be added to the wiki)

2.Mar'12

18.April'12

  • On Nicod's condition and the black raven paradox
    The paper is available from Hadi or Peter by email

4.April'12

21.Mar'12

14.Mar'12

29.Feb'12

  • Wen will be presenting his TPR on compression

15.Feb'12

  • PAC bounds for Discounted MDPs, presented by Tor. Email tor.lattimore@anu.edu.au for a copy of the paper.

8.Feb'12

  • Some inequalities in probability theory, presented by Tor

7.Dec'11

30.Nov'11

  • Solomonoff Memorial conference in Melbourne. Tor, Ian, Peter and Wen presenting.

23.Nov'11

  • An approximation of the universal intelligence measure, Shane Legg and Joel Veness
    http://jveness.info/publications/rsmc2011%20-%20aiq.pdf
    presented by Wen as a practice talk for Solomonoff Memorial
    Further discussion of the paper follows the 20-minute slide presentation

16.Nov'11

9.Nov'11

  • We are done with the book. Paper reading resumes next week.

Nov'11

  • Peter presents chapter 7 and Joseph Chapter 8.

Oct'11

  • Wen presents Chapter 6

Sep'11

  • After a break for the first two weeks, Phuong presents chapter 5

Aug'11

  • Tor presents chapter 4

July'11

  • Daniel finishes chapter 3

30.June'11

  • Daniel presents chapter 3 of "Neuro-dynamic programming"

8,15,23.June'11

  • Mayank presents chapter 2 of "Neuro-dynamic programming"

1.June'11

25.May'11

18.May'11

11.May'11

  • Meeting, discussing paper reviewing

4.May'11

27.April'11

20.April'11

13.April'11

6.April'11

  • Mayank presents [Hut09] M. Hutter, Feature dynamic Bayesian networks.
    In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.
    http://www.hutter1.net/ai/phidbn.pdf

30.Mar'11

23.Mar'11

16.Mar'11

  • Phuong presents

9.Mar'11

4.Mar'11
"Pascal" Workshop on RL and Planning at NICTA Level 3, Meeting Room D

  • 10:30 — 11:00 | Pascal Poupart: Explaining Automated Policies for Sequential Decision Making
  • 11:00 — 11:18 | Debdeep Banerjee: Partial Order Support Link Scheduling
  • 11:18 — 11:36 | Patrik Haslum: A Quick Overview of Factored (Classical) Planning
  • Break — 12 minutes
  • 11:48 — 12:06 | Scott Sanner: The Relational Dynamic Influence Diagram Language
  • 12:06 — 12:24 | Peter Sunehag: History-based Reinforcement Learning
  • 12:24 — 12:42 | Matt Robards: Model-Based Reinforcement Learning With Function Approximation
  • 12:42 — 13:00 | Will Uther: topic TBD

2.Mar'11

23.Feb'11

  • Tor and Hassan present …

16.Feb'11

  • Matthew and Peter present …

9.Feb.'11

15.Dec.'10

  • Chapters 6-8, Sridhar Mahadevan, "Learning Representation and Control in Markov Decision Processes: New Frontiers".
    Foundations and Trends in Machine Learning (editor: Michael Jordan), vol. 1, no. 4, pp. 403-565 (163 pages), 2009.
    http://www.cs.umass.edu/~mahadeva/papers/ml-found-trend.pdf
    Presented by Scott

8.Dec.'10

  • Statistical physics of social dynamics
    Castellano, C., Fortunato, S., and Loreto, V. 2009. Reviews of Modern Physics 81, 2, 591
    Section IV. Cultural Dynamics, Parts A & B (Axelrod model and variants)
    http://dx.doi.org/10.1103/RevModPhys.81.591
    Presented by Ian Wood

1.Dec.'10

  • Constrained Complexity Generalized Context-Tree Algorithms, Robert J Drost and Andrew C Singer
    http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04301233
    Presented by Peter

24.Nov'10

  • Hopefully the following will be presented: Efficient real-time dynamic programming for factored MDPs.
    Honors thesis by Sotirios Diamand. NOTE: In N101.

17.Nov'10

10.Nov'10

  • Efficient real-time dynamic programming for factored MDPs.
    Honors thesis by Sotirios Diamand. CANCELED

3.Nov'10

  • Cancelled

27.Oct'10

20.Oct'10

13.Oct'10

22.Sep'10, in Room 207 (ours only until 12:30, so be on time)

15.Sep'10

I list two more interesting papers that are more RL related but harder (I think)

And another making the case for pursuing robust estimators in general:

8.Sep'10

  • No Free Lunch and Occam's Razor in Supervised Learning
    (Tor presents his work)

18,25.Aug,1.Sep'10

  • Chapter 8 of Universal AI book
    (general discussion)

11.Aug'10

  • New TD algorithms from Alberta
    (Matthew presents a survey of work by Maei, Sutton and their colleagues)

4.Aug'10

  • MC-AIXI-CTW,
    (Joel Veness) NOTE LOCATION: A207

28.July'10

  • End of Chapter 7 of Universal AI book
    (Tor presents)

21.July'10

  • Meeting about Advanced AI course

Aug'10

9,16.Jun'10

  • Chapter 7 of Universal AI book
    (Presented by Phuong)

2.Jun'10

  • Canceled

26.May'10

  • Matthew Robards and Peter Sunehag and Scott Sanner
    RKHS Temporal Difference Learning
    Tech Report, The Australian National University

28.Apr&5,19.May'10

  • Chapter 6 of Universal AI book
    (Presented by Peter, Marcus, Zhara)

21.April'10

For more details,

14.April'10

  • L. Kocsis, Cs. Szepesvári
    Bandit Based Monte-Carlo Planning
    In Proceedings of the 17th European Conference on Machine Learning
    Springer-Verlag, Berlin, LNCS/LNAI 4212, September 18-22, pp. 282-293, 2006.
    http://www.sztaki.hu/~szcsaba/papers/ecml06.pdf
    (Presented by Peter)

7.April'10

31.Mar'10

17&24.Mar'10

3&10.Mar'10

  • Joel Veness and Kee Siong Ng and Marcus Hutter and David Silver
    A Monte Carlo AIXI Approximation
    Technical Report, arXiv 0909.0801 (2009) 1-42
    [implementation & application of the AIXI]
    http://www.hutter1.net/ai/aixictw.pdf
    (Presented by Sam)

3&10&17&24.Feb'10

27.Jan'10

20.Jan'10

  • Continuation of last week's paper + Summer Scholar presentation preview.

13.Jan'10

23&30.Dec'09

  • Break - (-: Christmas and New Years :-)

16.Dec'09

Scott will be presenting:

09.Dec'09

  • [RP08] S. Ross and J. Pineau.
    Model-based Bayesian reinforcement learning in large structured domains.
    In Proc. 24th Conference on Uncertainty in Artificial Intelligence
    (UAI'08), pages 476-483, Helsinki, 2008. AUAI Press.
    http://www.cs.mcgill.ca/~jpineau/files/sross-uai08.pdf
    (Presented by Peter)

02.Dec'09

  • Note: Changed from before.
    M. Rosencrantz, G. Gordon, and S. Thrun.
    Learning low dimensional predictive representations.
    In Proceedings of the Twenty-First International Conference on Machine Learning,
    Banff, Alberta, Canada, 2004.
    http://robots.stanford.edu/papers/Rosencrantz04a.pdf
    (Presented by Ian)

25.Nov'09

  • [SJR04] S.P. Singh, M.R. James, and M.R. Rudary.
    Predictive state representations: A new theory for modeling dynamical systems.
    In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI'04), pages 512-518, Banff, Canada, 2004. AUAI Press.
    (Presented by Hassan)

18.Nov'09

  • Matthew Robards will present his literature review on reinforcement learning in large, continuous spaces (focus on Part II).
    Literature Review

11.Nov'09

  • [SLL09] A.L. Strehl, L. Li, and Michael L. Littman.
    Reinforcement learning in finite MDPs: PAC analysis.
    http://paul.rutgers.edu/~strehl/, 2009.
    (Presented by Marcus)

04.Nov'09

28.Oct'09

Scott will be talking about several nice methods for solving MDPs efficiently.
The 4 papers to be covered are summarized in the following slides:

http://sml.nicta.com.au/rlp08/RLP_MDP_Extensions.pdf

The papers themselves are as follows (it's recommended that people read the first one
and skim through the others).

  • Hierarchical Solution of Markov Decision Processes using Macro-actions.
    Milos Hauskrecht and Nicolas Meuleau and Leslie Pack Kaelbling and Thomas Dean and Craig Boutilier.
    UAI 1998.
    Note: this paper builds on the macro action semi-MDP framework of Sutton & Precup, but makes some
    important changes which make things much cleaner (theoretically and implementationally).
    http://www.cs.toronto.edu/kr/papers/macros.pdf

21.Oct'09

Addendum: Policy gradient techniques from a robotics perspective:

14.Oct'09

  • [NCD04] A.Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang.
    Autonomous inverted helicopter flight via reinforcement learning.
    In ISER, volume 21 of Springer Tracts in Advanced Robotics, pages 363-372. Springer, 2004.
    (Presented by Phuong)

30.Sep'09&7.Oct'09

  • [RPPC08] S. Ross, J. Pineau, S. Paquet, B. Chaib-draa,
    Online planning algorithms for POMDPs,
    Journal of Artificial Intelligence Research, 32 (2008) 663-704.
    This paper compares the "online" "tree-search" planning approach, popular for games,
    with the "offline" "self-consistent" Bellman equation approach,
    popular in reinforcement learning (and described by Kaelbling et al., 1998).
    (Presented by Peter).

16&23.Sep'09

  • [KLC98] L.P. Kaelbling and M.L. Littman and A.R. Cassandra,
    Planning and Acting in Partially Observable Stochastic Domains
    Artificial Intelligence, 101 (1998) 99-134
    (Presented by Marcus/Hassan/Sarah)

Papers in Queue

General POMDPs

State Abstractions for RL

  • [GDG03] R. Givan, T. Dean, and M. Greig.
    Equivalence notions and model minimization in Markov decision processes.
    Artificial Intelligence, 147(1-2):163-223, 2003.
  • [Hut09a] M. Hutter.
    Feature dynamic Bayesian networks.
    In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.

Exploration/Exploitation in RL

General MDPs

  • [BBSE10] (Book), Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst (2010)
    "Reinforcement Learning and Dynamic Programming Using Function Approximators"
    in the Automation and Control Engineering series of Taylor & Francis CRC Press.
  • [Mah09] Sridhar Mahadevan (2009) Learning Representation and Control in Markov Decision Processes: New Frontiers
    Foundations and Trends in Machine Learning: Vol. 1: No 4, pp 403-565.
    http://dx.doi.org/10.1561/2200000003

Miscellaneous

  • [Gru04] P.D. Gruenwald.
    Tutorial on minimum description length.
    In Minimum Description Length: recent advances in theory and practice, Chapters 1 and 2. MIT Press, 2004.
  • [BLA02] B. Ng, L. Peshkin, and A. Pfeffer.
    Factored Particles for Scalable Monitoring.
    In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.

Background Reading

  • [Put94] M.L. Puterman.
    Markov Decision Processes - Discrete Stochastic Dynamic Programming.
    Wiley, New York, NY, 1994.
  • [KV86] P.R. Kumar and P.P. Varaiya.
    Stochastic Systems: Estimation, Identification, and Adaptive Control.
    Prentice Hall, Englewood Cliffs, NJ, 1986.

Contact

Peter Sunehag <peter.sunehag@anu.edu.au> or
Marcus Hutter <marcus.hutter@anu.edu.au>

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License