Reinforcement Learning Reading Group

General Information

Welcome to the Reinforcement Learning Reading Group at RSISE@ANU

  • Who: Everyone is welcome.
  • When: Every Wednesday, 12:00-13:00, with provided or subsequent lunch.
    (If you want to attend, but the time does not suit you, please let me know)
  • Where: Brian Anderson building (115), Meeting Room, A206 (or sometimes seminar room A105), Australian National University.
  • Assumed Background: Basics in Reinforcement Learning (e.g. [SB98], see end of this page)
  • Operation mode: Discussing read papers. No email reminders.

Regular (Past & Current) Participants:
Mostly students and researchers from RSISE@ANU and CRL@NICTA
and other RL friends nearby.

Reading List

The schedule for the reading group is given below and will be updated weekly.

Important: New schedule meeting at 12:00

31.Jan'18

24.Jan'18

17.Jan'18
TBA

10.Jan'18
TBA

3.Jan'18
[no reading group]

27.Dec'17
[no reading group]

20.Dec'17
[probably no reading group, but check a day in advance]

13.Dec'17
[probably no reading group, but check a day in advance]

6.Dec'17
[no reading group]

29.Nov'17
Elliot Catt presents his Thesis Progress on Quantum Computing

22.Nov'17
Group Excursion: Meet 13:00 at RSISE=BAB entrance (see email for details)

15.Nov'17
[no reading group]

8.Nov'17
[no reading group]

1.Nov'17
Owen presents his Honours Thesis on Universal Compression of Piecewise iid Sources

25.Oct'17
Xavier presents impressions from CFAR's AI Summer Fellows Program in San Francisco

18.Oct'17
[no reading group]

11.Oct'17
[no reading group]

4.Oct'17
Samuel presents Hindsight Experience Replay

27.Sep'17
Badri presents Convergence of Binarized CTW

20.Sep'17
Daoyi Dong from ADFA presents his research on Quantum RL and Quantum control theory

13.Sep'17
Elliot presents his formalization of TMs in HOL and proof of equivalence to PR functions.

6.Sep'17
[no reading group]

30.Aug'17
Sultan presents impressions from UAI

23.Aug'17
[no reading group]

16.Aug'17
[no reading group]

9.Aug'17
Arthur Franz presents incremental and hierarchical compression

2.Aug'17
Elliot practices conference talk

26.Jul'17
John practices conference talk

19.Jul'17
Tom presents Learning from Human Preferences

12.Jul'17
Tom presents applications of evidential semi-measures

5.Jul'17
John presents Deterministic Policy Gradient Algorithms

28.Jun'17
[No reading group]

21.Jun'17
Adam presents one of his recent research papers

14.Jun'17
Tom presents considerations on SARSA convergence: Gordon (1996) and Perkins and Precup (2003)

7.Jun'17
John presents FeUdal Networks for Hierarchical Reinforcement Learning in Room A105

31.May'17
Elliot presents The forget me not process

24.May'17
Sultan and Marcus present some AGI papers

17.May'17
Marcus presents Compress & Control

12.May'17 @12pm
Reading and discussing some UAI papers

10.May'17
Reading and discussing some UAI papers

3.May'17
Reading and discussing some UAI papers

26.April'17
Arie, Suraj and Elliot present MC-AIXI-CTW

19.April'17
Elliot presents Evolution Strategies as a Scalable Alternative to Reinforcement Learning

12.April'17
Tobias and Mikael continue

5.April'17
Tobias and Mikael present their Bachelor thesis on classifying games

29.March'17
Jarryd continues with generative adversarial networks for RL

22.March'17
Edward Barker presents Unsupervised Basis Function Adaptation for Reinforcement Learning

15.March'17
Jarryd presents generative adversarial networks for RL

8.March'17
Tor Lattimore talks about the adversarial/stochastic divide and some open problems there

1.March'17
Tom continues with The Delusionbox Problem.

22.Feb'17
Tom presents The Delusionbox Problem

15.Feb'17
Phuong Nguyen presents interesting experiences since leaving the group

8.Feb'17
Tor Lattimore talks about some open problems in online learning/statistics/RL.

1.Feb'17
Tor Lattimore presents The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

14.Dec'16
Jarryd presents - Learning to reinforcement learn

07.Dec'16
Boris gives update on his project

30.Nov'16
Tom gives AI-Safety talk

23.Nov'16
Suraj presents his thesis (mid-semester update)

16.Nov'16
[No reading group]

9.Nov'16
Tom gives monitoring talk

2.Nov'16
[No reading group]

26.Oct'16 (at different time: 1pm)
James presents symmetry of algorithmic information

26.Oct'16 (at different time: 12pm)
Farhana presents - Dimensionality of Spatio-Temporal Broadband Signals Observed Over Finite Spatial and Temporal Windows

19.Oct'16
Sultan presents his recent research insights
(thereafter group excursion, see email for details)

12.Oct'16
Jarryd presents exploration results (cont.)

5.Oct'16
John presents AIXIjs

28.Sep'16
Jarryd presents exploration results

21.Sep'16
Manlio presents On the Computability of AIXI

14.Sep'16
John speaks about visit to Bay Area, and Why does deep and cheap learning work so well?

7.Sep'16
Tom presents takeaways from US+UK trip: deep learning

31.Aug'16
Tom presents takeaways from US+UK trip: UC Berkeley and AGI

24.Aug'16
Tom presents takeaways from US+UK trip: New AI Safety research agendas (Google/OpenAI open safety problems and MIRI's machine learning agenda)

17.Aug'16
Tom presents takeaways from US+UK trip: Mainly Cooperative inverse reinforcement learning

3.Aug'16
Manlio Valenti from Trento introduces Upper-SemiComputable SemiMeasures

27.July'16
John and Sean present progress on Interactive GRL Demo

20.July'16
Break

13.July'16
Sultan presents 2 papers

22.June'16
Jarryd presents Unifying Count-Based Exploration and Intrinsic Motivation

15.June'16
Xian Wang presents his research on …

6.June'16 (obs: Monday)
Tom continues with AIXI tutorial

1.June'16
George Stamatescu presents KL Divergence and Reciprocal Chains

31.May'16
Tom presents AIXI tutorial

25.May'16
Gerhard Visser presents Interest-Relative Inductive Inference
thesis draft (unpublished)

11+18.May'16
Break

4.May'16
John presents Pedro A. Ortega, Naftali Tishby (2016) Memory controls time perception and inter-temporal choices

27.April'16
Tom presents wireheading result

20.April'16
Sultan continues AGI reviews

13.April'16
Sultan AGI reviews

6.April'16
Tom and Sultan UAI reviews

30.Mar'16
Jan "defends" his thesis
(in room A105)

23.Mar'16
Discussion of UAI reviews

16.Mar'16
Jan continues to talk about conferences from 2015

9.Mar'16
Sultan presents State of the Art Control of Atari Games Using Shallow Reinforcement Learning

4.Mar'16 11:30 EXTRA SESSION
Adam Case presents

2.Mar'16
Jan presents Safely Interruptible Agents

24.Feb'16
Djallel Bouneffouf presents

17.Feb'16
Jan continues to talk about conferences from 2015

10.Feb'16
No reading group.

3.Feb'16
Jan continues to talk about conferences from 2015

27.Jan'16
Tom presents Owain Evans' paper Learning the Preferences of Ignorant, Inconsistent Agents

20.Jan'16
Jan talks about conferences from 2015

16.Dec'15
Tom summarises the Australasian AI conference, and maybe continues with preliminary results on the wireheading problem.

9.Dec'15
Jae Hee Lee presents his PhD thesis Qualitative Reasoning about Relative Directions: Computational Complexity and Practical Algorithm

2.Dec'15
Break for Australian AI conference

25.Nov'15
Tom presents summary of MIRIx workshop.

18.Nov'15
Tom presents preliminary results on the wireheading problem.

11.Nov'15
Daniel continues with Agents Using Speed Priors

4.Nov'15
Daniel presents Agents Using Speed Priors

28.Oct'15
David presents Practical Extreme State Aggregation

21.Oct'15
Matt Alger presents a project on Deep Inverse Reinforcement Learning

14.Oct'15
Aqua Zhu presents background on classical sequence prediction and related problems.

7.Oct'15
Tom presents Analytical Results on the BFS vs. DFS Algorithm Selection Problem, Part II: Graph Search

23.Sep'15
Tor Lattimore presents Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

16.Sep'15
Spring break

9.Sep'15
Spring break

2.Sep'15
Tom presents Analytical Results on the BFS vs. DFS Algorithm Selection Problem, Part I: Tree Search

26.Aug'15
Marcus continues presenting impressions from ICML/EWRL

19.Aug'15
Hadi Afshar presents Reflection, Refraction, and Hamiltonian Monte Carlo
Recommended (background) reading

12.Aug'15
Tom presents Sequential Extensions of Causal and Evidential Decision Theory

5.Aug'15
Reading group resumes. Marcus presents impressions from ICML/EWRL

24.June'15 - 29.July'15
Winter break

17.June'15
Yiyun presents Modelling Causal Reasoning with Ambiguous Observations and Quantum Probability Model of "Zero-Sum" Beliefs

10.June'15
Mayank presents Neural Turing Machines

3.June'15
Continue discussing reviews for ALT

27.May'15
Discussing reviews for ALT

20.May'15
Jan continues from last time

13.May'15
Jan talks about merging and predicting,
in particular the results from Merging and Learning and
On Sequence Prediction for Arbitrary Measures

6.May'15
Continued discussion of AGI reviews

29.Apr'15
Discussing reviews for AGI

15. and 22.Apr'15
No reading group

8.Apr'15
Jan presents Reflective Oracles: A Foundation for Classical Game Theory

1.Apr'15
Jan presents Reflective Variants of Solomonoff Induction and AIXI

25.Mar'15
Yiyun presents Cognitive processes and mechanisms in causal reasoning with ambiguous observations

18.Mar'15
Marcus presents Compress and Control

11.Mar'15
Daniel presents the current status of his work on the speed prior
The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions

4.Mar'15
Jan presents a journal paper under review

18. and 25. Feb'15
No Reading Group

11.Feb'15
Tom presents Can we measure the difficulty of an optimization problem?

4.Feb'15
Mayank presents selected papers from ACML 2014

28.Jan'15
Jan presents Corrigibility
https://intelligence.org/files/CorrigibilityTR.pdf

21.Jan'15
Mayank gives a tutorial on convex optimization

10.Dec'14
PhD Monitoring Hadi (Room RSISE B123)

19'Nov'14
Peter leads discussion on the new book (with a focus on chapter 7)
Ethical Artificial Intelligence by Bill Hibbard
http://arxiv.org/ftp/arxiv/papers/1411/1411.1373.pdf
Bill builds on UAI, decision theoretic rationality, space-time embedded agents etc. to formally study ethical AI.

12'Nov'14
Xi Li presents on Leibniz's program and its
relation to UAI

5.Nov'14
Daniel Filan talks about Extreme state aggregation beyond MDPs

29.Oct'14
Neal Hughes (economics PhD student) presents on using RL for water management
Note: it's in B123

22.Oct'14
Tom Butler presents his honors thesis
Fuzzy Expert System Evolution: Increasing the accessibility of intelligent controllers

15.Oct'14
PhD Monitoring Mayank & Jan (Room RSISE B123)

8.Oct'14
Break

1.Oct'14
Break

24.Sep'14
Daniel Filan presents about the speed prior
http://link.springer.com/chapter/10.1007%2F3-540-45435-7_15

17.Sep'14
Jan presents Teleporting Universal Agents by Laurent Orseau AGI'2014
http://www.agroparistech.fr/mia/equipes:membres:page:laurent:teleport

10.Sep'14
Hadi presents his most recent work on symbolic Gibbs sampling

3.Sep'14
Peter reports from AAAI'2014

Integrating representation learning and temporal difference learning:
A matrix factorization approach by M. White
http://webdocs.cs.ualberta.ca/~whitem/publications/14aaaiw-crtd.pdf
with a closely related alternative
http://webdocs.cs.ualberta.ca/~whitem/publications/14aaaiw-frrl.pdf
Active Learning with Model Selection by A. Ali, R. Caruana and A. Kapoor
http://research.microsoft.com/en-us/um/people/akapoor/papers/AAAI2014.pdf
Natural Temporal Difference Learning by W. Dabney and P. Thomas
http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/viewFile/8568/8913

27.Aug'14
Peter reports from CogSci'2014.
Toward Boundedly Rational Analysis by Thomas Icard
http://web.stanford.edu/~icard/cogsci14.pdf
A Bounded Rationality Account of Wishful Thinking by R. Neumann, A. N. Rafferty, T. L. Griffiths
http://cocosci.berkeley.edu/anna/papers/WishfulThinking.pdf
The high availability of extreme events serves resource-rational decision-making by Lieder, Wills, Hsu, Griffiths
http://cocosci.berkeley.edu/falk/HighAvailabilityOfExtremeEvents.pdf
and a related recent journal paper providing the background for the above
One and Done? Optimal Decisions From Very Few Samples by Edward Vul, Noah Goodman, Thomas L. Griffiths and Joshua B. Tenenbaum
http://web.stanford.edu/~ngoodman/papers/VulGoodmanGriffithsTenenbaum-COGS-2014.pdf
Information vs Reward in a changing world by Navarro and Newell
http://health.adelaide.edu.au/psychology/ccs/docs/pubs/2014/NavarroNewell2014.pdf
Uncertainty and Exploration in a restless bandit task by Speekenbrink and Konstantinidis
http://www.psychol.ucl.ac.uk/m.speekenbrink/articles/cogsci2014.pdf

16.July'14 — 20.Aug'14
Currently no meetings planned, but check a day in advance or volunteer to present something.

9.July'14
Jan presents Eliezer Yudkowsky and Marcello Herreshoff,
Tiling Agents for Self-Modifying AI, and the Löbian Obstacle
https://intelligence.org/files/TilingAgents.pdf
and
Problems of self-reference in self-improving space-time embedded
intelligence. Benja Fallenstein and Nate Soares. AGI 2014.
https://intelligence.org/wp-content/uploads/2014/05/Fallenstein-Soares-Problems-of-self-reference-in-self-improving-space-time-embedded-intelligence.pdf

2.July'14
Peter gives practice talk for Quebec conference.
Note B123 and we start on time since the room has other events at 12:20.
Please arrive no later than 11:30 (always applies but in particular this week).

25.June'14
Marcus presents his ALT paper on Offline to Online Conversion.

20.June'14 Note, this is a Friday! Time 2pm
Daniel Cotton presents his ASC project on Reinforcement learning in computer science and psychology
Followed by Tony Allard giving his monitoring talk at 3pm on Logistics Planning.

18.June'14
Jan presents overview of MIRI's recent research

11.June'14
Break

4.June'14
Daniel Filan presents his ASC project on AIXI convergence

28.May'14
Mayank talks about game playing competition and reports on his progress.
In particular a report to Marcus and Peter, but others welcome.

21.May'14
Marcus presents his ALT submission Extreme State Aggregation Beyond MDPs

14.May'14
Paper reviewing discussions

7.May'14
No reading group

30.Apr'14
Paper reviewing discussions

23.Apr'14
Tor (visiting 23.-25.Apr) presents "Memory Allocation Bandits"

16.Apr'14
Paper reviewing discussions

9.Apr'14
Monitoring in RSISE seminar room
11:30 Hadi
12:00 Mayank
12:30 Jan
13:00-14:00 feedback.

2.Apr'14
Jan presents
Marcus Hutter: Discrete MDL Predicts in Total Variation. NIPS'09
http://arxiv.org/abs/0909.4588

26.Mar'14
Mayank presents "Cover Tree Bayesian Reinforcement Learning" by Nikolaos Tziortziotis, Christos Dimitrakakis and Konstantinos Blekas.
http://arxiv.org/pdf/1305.1809v1

12,19.Mar'14
Break due to travels and deadlines

5.Mar'14
Marcus talks about new extension of the context tree weighting algorithm

26.Feb'14
Peter presents "Using Expectation-Maximization for Reinforcement Learning" by Dayan and Hinton 1997
http://www.gatsby.ucl.ac.uk/~dayan/papers/rpp97.pdf
with a discussion of what has happened afterwards which includes Bayesian MCMC alternatives to the original frequentist EM approach, e.g.
http://www.stanford.edu/~ngoodman/papers/WingateEtAl-PolicyPrios.pdf
This line of work that includes many papers in the last 5 years is often called planning as inference
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/publications/12-botvinick-TICS.pdf

19.Feb'14
Alex presents "Changing tastes and Coherent Dynamic Choice" by Peter J. Hammond
http://www.jstor.org/stable/2296609

12.Feb'14
Mayank continues from last week

5.Feb'14
Mayank presents "Efficient Learning and Planning with Compressed Predictive States".
William Hamilton, Mahdi Milani Fard and Joelle Pineau.
http://arxiv.org/abs/1312.0286

29.Jan'14
Reading group restarts for 2014 with Peter talking about "Rationality, Optimism and Guarantees in General Reinforcement Learning" in the RSISE seminar room as an AI seminar. Please note 12:00-13:00 !

18'Dec'13-
MaxEnt, Xmas, New Year

11'Dec'13
Johannes presents his work on counter-examples in reinforcement learning

6'Dec'13
Tor's last day, at least here at ANU. Talk, farewell lunch etc. Details later

4'Dec'13
Hadi gives monitoring talk

27'Nov'13
Rachael continues from the 16th of Oct with the voting part of the paper.
Note, the talk will be in R214 in the Ian Ross building!

20'Nov'13
Ian Hon presents his honours thesis

12'Nov'13
Tony Allard monitoring talk. Note Tuesday! 3pm in the RSISE seminar room

13'Nov'13
ACML workshop (organized by Peter Sunehag, Marcus Hutter, Mark Reid) on theory and practice in Machine Learning at the Manning Clark Centre, ANU
https://sites.google.com/site/mltheoryandpractice/

14,15'Nov'13
ACML at ANU

6'Nov'13
Johannes presents "The Fixed Points of Off-Policy TD" by J. Zico Kolter NIPS 2011.
http://books.nips.cc/papers/files/nips24/NIPS2011_1200.pdf

30'October'13
Mayank gives monitoring talk

23'October'13
Peter talks about "Learning from human generated rewards", based on a sequence of papers making up the PhD thesis of Bradley Knox (http://www.bradknox.net/) supervised by Peter Stone, primarily: W. Bradley Knox and Peter Stone. Learning Non-Myopically from Human-Generated Reward. In Proceedings of the International Conference on Intelligent User Interfaces (IUI), March 2013.
http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/iui13-knox.pdf

16'October'13
Rachael Briggs presents her article Decision-Theoretic Paradoxes as Voting Paradoxes,
Philosophical Review 2010 Volume 119, Number 1: 1-30
http://philreview.dukejournals.org/content/119/1/1.abstract

2,9'Oct'13
Break due to travels

25'Sep'13
Tor presents (More) Efficient Reinforcement Learning via Posterior Sampling, NIPS'2013
Ian Osband, Daniel Russo and Benjamin Van Roy
http://arxiv.org/pdf/1306.0940v1.pdf

18'Sep'13
Mayank presents,
Incremental Basis Construction from Temporal Difference Error by Yi Sun, Faustino Gomez, Mark Ring, Jurgen Schmidhuber in ICML 2011.
Paper @ http://www.idsia.ch/~juergen/icml2011sun.pdf
Slides @ http://www.idsia.ch/~sun/doc/icml11-ftr-slides.pdf

11'Sep'13
Tor talks about best arm identification in bandits

4'Sep'13
Peter presents,
Temporal-Difference Search in Computer Go by Silver, D., Sutton, R. S., Mueller, M in ICAPS 2013 http://www.aaai.org/ocs/index.php/ICAPS/ICAPS13/paper/view/6037/6227
and in Machine Learning 87(2):183-219 2012
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/tdsearch.pdf

28'Aug'13
Mayank presents.
Bruno Scherrer. "Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view" in Proceedings of the 27th International Conference on Machine Learning (2010).
http://www.icml2010.org/papers/654.pdf
Slides available here,
http://www.loria.fr/~scherrer/presentations/tdbr.pdf

24'July'13
Tor presents things from conference travel to ICML/COLT.

17'July'13
Peter presents tutorial on Exploration vs Exploitation as practice before EWRL.
Probably downstairs in the seminar room

9'July'13 (note Tuesday!, 11:30)
Hadi presents his TPR

3'July'13
Scott presents (at 11)
S. Sanner, K. V. Delgado, and L. N. de Barros (2011). Symbolic Dynamic Programming for Discrete and Continuous State MDPs. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). Barcelona, Spain.
http://users.cecs.anu.edu.au/~ssanner/Papers/cont_mdp.pdf

26'June'13
Monitoring talk by Ehsan (NICTA)

19'June'13
Monitoring talks by David and Suvash downstairs RSISE seminar room at 12

12'June'13
Integrating Partial Model Knowledge in Model Free RL Algorithms
Aviv Tamar and Dotan Di Castro and Ron Meir
International Conference on Machine Learning (ICML), 2011
http://www.icml-2011.org/papers/222_icmlpaper.pdf
Mayank presents

5'June'13
S. Thiebaux, C. Gretton, J. Slaney, D. Price and F. Kabanza (2006) "Decision-Theoretic Planning with non-Markovian Rewards", Journal of Artificial Intelligence Research, Volume 25, pages 17-74
http://www.jair.org/papers/paper1676.html
Charles Gretton presents

29'May'13
Tor and Peter present

22'May'13
Monitoring

15'May'13
Monitoring

8'May'13
Peter presents "Online Feature Selection for Model-based Reinforcement Learning" ICML'2013 by Trung Thanh Nguyen, Zhuoru Li, Tomi Silander and Tze-Yun Leong http://jmlr.csail.mit.edu/proceedings/papers/v28/nguyen13.pdf

1'May'13
Marcus presents his COLT paper on sparse adaptive Dirichlet-multinomial-like Processes

24'April'13
Hadi and Tor give monitoring talks

17'April'13
Wen and Mayank give monitoring talks

10'April'13
Mayank continues from last time on over-estimation in Q-learning

3'April'13
Mayank presents Double-Q learning and associated paper
http://books.nips.cc/papers/files/nips23/NIPS2010_0208.pdf

27'March'13
Ian Hon continues the survey on large alphabet sources and compression based on
http://www.cs.technion.ac.il/~ronbeg/begleiter-papers/begleiter06a.pdf
http://www.sps.ele.tue.nl/members/f.m.j.willems/research_files/CTW/benelux94-tjalkens-willems-shtarkov.pdf

20'March'13
Wen presents a survey on text (large alphabet) modeling. Relevant papers are
http://www2.denizyuret.com/ref/goodman/chen-goodman-99.pdf
http://acl.ldc.upenn.edu/P/P06/P06-1124.pdf

18'March'13
Oscillation-free epsilon-random sequences, Ludwig Staiger

13'March'13
Tor presents Thompson Sampling: An Asymptotically Optimal Finite Time Analysis, ALT'2012
Emilie Kaufmann, Nathaniel Korda, Rémi Munos
http://arxiv.org/abs/1205.4217

6'March'13
Peter presents "A Dantzig Selector Approach to Temporal Difference Learning", Matthieu Geist, Bruno Scherrer, Alessandro Lazaric and Mohammad Ghavamzadeh, ICML 2012 http://icml.cc/2012/papers/703.pdf

27'February'13
Wen presents "Delusion, Survival, and Intelligent Agents" by Mark B. Ring, Laurent Orseau, AGI'2011
http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=28BAE7205B795D39B357E46822EB4A4D?doi=10.1.1.232.9313

13'February'13
Tor presents "Universal Knowledge-Seeking Agents" by Laurent Orseau , ALT'2011
http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/orseau-ALT-2011-knowledge-seeking.pdf

6'February'13
Peter presents "Space-Time Embedded Intelligence" by Laurent Orseau and Mark Ring, AGI'2012
http://agi-conference.org/2012/wp-content/uploads/2012/12/paper_76.pdf

30'January'13
Tom presents his (draft) Master's thesis about (No) Free Lunch theorems for optimization

23'January'13
Nam talks about learning theory

16'January'13
Hadi presents the Loewenheim Skolem Theorem & Proof
http://en.wikipedia.org/wiki/L%C3%B6wenheim%E2%80%93Skolem_theorem
and Marcus the Skolem Paradox and its resolution
http://en.wikipedia.org/wiki/Skolem%27s_paradox

9'January'13
Marco presents Wouter M. Koolen, Dimitri Adamskiy, Manfred K. Warmuth (NIPS 2012) Putting Bayes to sleep
http://www.cs.rhul.ac.uk/~wouter/Papers/sleep.pdf

19'December'12
End of year meeting. Marcus presents Fun with Bayesian & Decision & other paradoxes.

12'December'12
Summer scholars present their topics and informal question and answer session.

28'November'12
Marco leads readings on extensions of CTW
Volf, P., & Willems, F. (1997). A context-tree branch-weighting algorithm. SYMPOSIUM ON INFORMATION THEORY IN THE …. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.121.7873&rep=rep1&type=pdf
Willems, F. M. J. (1996). Context weighting for general finite-context sources. Information Theory, IEEE …, 42(5), 1514–1520. doi:10.1109/18.532891
Willems, F. M. J. (1998). The context-tree weighting method: extensions. IEEE Transactions on Information Theory, 44(2), 792–798. doi:10.1109/18.661523

21'November'12
Peter leads discussions on
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011) How to grow a mind: Statistics, structure, and abstraction. Science, 331, 1279-1285.
http://www.sciencemag.org/content/331/6022/1279.full.pdf

14'November'12
Monitoring: Wen and Mayank

7'November'12
Tom talks about his literature review on meta rationality

31'October'12
Xinjue gives practice talk on "Exploration in Bayesian Reinforcement Learning".

24'October'12
Monitoring talks

17'October'12
Peter presents "A Bayesian Sampling Approach to Exploration in Reinforcement Learning" by Asmuth, Li, Littman, Nouri, Wingate
http://web.mit.edu/~wingated/www/papers/boss.pdf

10'October'12
Hadi gives practice talk

3'October'12
Tor gives practice talk

26'September'12
Mayank gives practice talk

19'September'12
Marcus presents. See his email

12'September'12
Phuong gives practice talk for monitoring

5'September'12
Joel Veness gives a talk in the Seminar room downstairs on further developments of MC-AIXI.

29'August'12
Hadi talks about AGI papers

22'August'12
Phuong presents "TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration" by B. C. Silva and A. G. Barto
http://people.cs.umass.edu/~bsilva/deltaPi_aaai2012.pdf

15'August'12
Mayank presents Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. Available at http://webdocs.cs.ualberta.ca/~sutton/papers/horde-aamas-11.pdf .

8'August'12
Phuong summarizes two papers from AAAI12

  • D. Lee and W. B. Powell, An Intelligent Battery Controller Using Bias-Corrected Q-learning

http://energysystems.princeton.edu/Papers/Lee_Powell_AAAI2012_BiasCorrectedQLearning.pdf

  • W. Dabney and A. G. Barto, Adaptive Step-Size for Online Reinforcement Learning

http://people.cs.umass.edu/~wdabney/papers/alphaBounds.pdf

1'August'12

25'July'12

18 July'12

11 July'12

  • Phuong doing test run for AAAI presentation

27 June 12

20 June 12

  • Cancelled

13'June'12

  • Phuong presents his work on Regret bounds for feature reinforcement learning where he extends the work by Maillard, Munos and Ryabko to the countable case.

30'May'12

  • Mayank talks about Predictive State Representations
  • Littman, Michael L.; Richard S. Sutton; Satinder Singh (2002). "Predictive Representations of State". Advances in Neural Information Processing Systems 14 (NIPS). pp. 1555–1561.
  • Singh, Satinder; Michael R. James; Matthew R. Rudary (2004). "Predictive State Representations: A New Theory for Modeling Dynamical Systems". Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI). pp. 512–519.

10.May'12

  • Wen reports from DCC 2012 on three papers (emailed out)

2.May'12

18.April'12

  • On Nicod's condition and the black raven paradox
    The paper is available from Hadi or Peter by email

4.April'12

21.Mar'12

14.Mar'12

29.Feb'12

  • Wen will be presenting his TPR on compression

15.Feb'12

  • PAC bounds for Discounted MDPs, presented by Tor. Email tor.lattimore@anu.edu.au for a copy of the paper.

8.Feb'12

  • Some inequalities in probability theory, presented by Tor

7.Dec'11

30.Nov'11

  • Solomonoff Memorial conference in Melbourne. Tor, Ian, Peter and Wen presenting.

23.Nov'11

  • An approximation of the universal intelligence measure, Shane Legg and Joel Veness
    http://jveness.info/publications/rsmc2011%20-%20aiq.pdf
    presented by Wen as a practice talk for Solomonoff Memorial
    Further discussion of the paper follows the 20-minute presentation with slides

16.Nov'11

9.Nov'11

  • We are done with the book. Paper reading resumes next week.

Nov'11

  • Peter presents chapter 7 and Joseph Chapter 8.

Oct'11

  • Wen presents Chapter 6

Sep'11

  • After a break for the first two weeks, Phuong presents chapter 5

Aug'11

  • Tor presents chapter 4

July'11

  • Daniel finishes chapter 3

30.June'11

  • Daniel presents chapter 3 of "Neuro-dynamic programming"

8,15,23.June'11

  • Mayank presents chapter 2 of "Neuro-dynamic programming"

1.June'11

25.May'11

18.May'11

11.May'11

  • Meeting, discussing paper reviewing

4.May'11

27.April'11

20.April'11

13.April'11

6.April'11

  • Mayank presents [Hut09] M. Hutter, Feature dynamic Bayesian networks.
    In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.
    http://www.hutter1.net/ai/phidbn.pdf

30.Mar'11

23.Mar'11

16.Mar'11

  • Phuong presents

9.Mar'11

4.Mar'11
"Pascal" Workshop on RL and Planning at NICTA Level 3, Meeting Room D

  • 10:30 — 11:00 | Pascal Poupart: Explaining Automated Policies for Sequential Decision Making
  • 11:00 — 11:18 | Debdeep Banerjee: Partial Order Support Link Scheduling
  • 11:18 — 11:36 | Patrik Haslum: A Quick Overview of Factored (Classical) Planning
  • Break — 12 minutes
  • 11:48 — 12:06 | Scott Sanner: The Relational Dynamic Influence Diagram Language
  • 12:06 — 12:24 | Peter Sunehag: History-based Reinforcement Learning
  • 12:24 — 12:42 | Matt Robards: Model-Based Reinforcement Learning With Function Approximation
  • 12:42 — 13:00 | Will Uther: topic TBD

2.Mar'11

23.Feb'11

  • Tor and Hassan present …

16.Feb'11

  • Matthew and Peter present …

9.Feb.'11

15.Dec.'10

  • Chapters 6—8, Sridhar Mahadevan, "Learning Representation and Control in Markov Decision Processes: New Frontiers".
    Foundations and Trends in Machine Learning (editor: Michael Jordan), vol 1, No. 4, pp. 403-565 (163 pages), 2009.
    http://www.cs.umass.edu/~mahadeva/papers/ml-found-trend.pdf
    Presented by Scott

8.Dec.'10

  • Statistical physics of social dynamics
    Castellano, C., Fortunato, S., and Loreto, V. 2009. Reviews of Modern Physics 81, 2, 591
    Section IV. Cultural Dynamics, Parts A & B (Axelrod model and variants)
    http://dx.doi.org/10.1103/RevModPhys.81.591
    Presented by Ian Wood

1.Dec.'10

  • Constrained Complexity Generalized Context-Tree Algorithms, Robert J Drost and Andrew C Singer
    http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04301233
    Presented by Peter

24.Nov'10

  • Hopefully the following will be presented: Efficient real-time dynamic programming for factored MDPs.
    Honors thesis by Sotirios Diamand. NOTE: In N101.

17.Nov'10

10.Nov'10

  • Efficient real-time dynamic programming for factored MDPs.
    Honors thesis by Sotirios Diamand. CANCELED

3.Nov'10

  • Cancelled

27.Oct'10

20.Oct'10

13.Oct'10

22 .Sep'10, in Room 207 (only ours to 12:30, be on time)

15 .Sep'10

I list two more interesting papers that are more RL related but harder (I think)

And another making the case for pursuing robust estimators in general:

8 .Sep'10

  • No Free Lunch and Occam's Razor in Supervised Learning
    (Tor presents his work)

18,25 .Aug,1.Sep'10

  • Chapter 8 of Universal AI book
    (general discussion)

11 .Aug'10

  • New TD algorithms from Alberta
    (Matthew presents a survey of stuff by Maei, Sutton and their colleagues)

4 .Aug'10

  • MC-AIXI-CTW,
    (Joel Veness) NOTE LOCATION: A207

28 .July'10

  • End of Chapter 7 of Universal AI book
    (Tor presents)

21 .July'10

  • Meeting about Advanced AI course

Aug'10

9,16 .Jun'10

  • Chapter 7 of Universal AI book
    (Presented by Phuong)

2.Jun'10

  • Canceled

26.May'10

  • Matthew Robards and Peter Sunehag and Scott Sanner
    RKHS Temporal Difference Learning
    Tech Report, The Australian National University

28.Apr'&05,19.May'10

  • Chapter 6 of Universal AI book
    (Presented by Peter, Marcus, Zhara)

21.April'10

For more details,

14.April'10

  • L. Kocsis, Cs. Szepesvári
    Bandit Based Monte-Carlo Planning
    In Proceedings of the 17th European Conference on Machine Learning
    Springer-Verlag, Berlin, LNCS/LNAI 4212, September 18-22, pp. 282-293, 2006.
    http://www.sztaki.hu/~szcsaba/papers/ecml06.pdf
    (Presented by Peter)

7.April'10

31.Mar'10

17&24.Mar'10

3&10.Mar'10

  • Joel Veness and Kee Siong Ng and Marcus Hutter and David Silver
    A Monte Carlo AIXI Approximation
    Technical Report, arXiv 0909.0801 (2009) 1-42
    [implementation & application of the AIXI]
    http://www.hutter1.net/ai/aixictw.pdf
    (Presented by Sam)

3&10&17&24.Feb'10

27.Jan'10

20.Jan'10

  • Continuation of last week's paper + Summer Scholar presentation preview.

13.Jan'10

23&30.Dec'09

  • Break - (-: Christmas and New Years :-)

16.Dec'09

Scott will be presenting:

09.Dec'09

  • [RP08] S.Ross and J.Pineau.
    Model-based Bayesian reinforcement learning in large structured domains.
    In Proc. 24th Conference on Uncertainty in Artificial Intelligence
    (UAI'08), pages 476-483, Helsinki, 2008. AUAI Press.
    http://www.cs.mcgill.ca/~jpineau/files/sross-uai08.pdf
    (Presented by Peter)

02.Dec'09

  • Note: Changed from before.
    M. Rosencrantz, G. Gordon, and S. Thrun.
    Learning low dimensional predictive representations.
    In Proceedings of the Twenty-First International Conference on Machine Learning,
    Banff, Alberta, Canada, 2004.
    http://robots.stanford.edu/papers/Rosencrantz04a.pdf
    (Presented by Ian)

25.Nov'09

  • [SJR04] S.P. Singh, M.R. James, and M.R. Rudary.
    Predictive state representations: A new theory for modeling dynamical systems.
    In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI'04), pages 512-518, Banff, Canada, 2004. AUAI Press.
    (Presented by Hassan)

18.Nov'09

  • Matthew Robards will present his literature review on reinforcement learning in large, continuous spaces (focus on Part II).
    Literature Review

11.Nov'09

  • [SLL09] A.L. Strehl, L. Li, and M.L. Littman.
    Reinforcement learning in finite MDPs: PAC analysis.
    http://paul.rutgers.edu/~strehl/, 2009.
    (Presented by Marcus)

04.Nov'09

28.Oct'09

Scott will be talking about several nice methods for solving MDPs efficiently.
The 4 papers to be covered are summarized in the following slides:

http://sml.nicta.com.au/rlp08/RLP_MDP_Extensions.pdf

The papers themselves are as follows (it's recommended that people read the first one
and skim through the others).

  • Hierarchical Solution of Markov Decision Processes using Macro-actions.
    Milos Hauskrecht and Nicolas Meuleau and Leslie Pack Kaelbling and Thomas Dean and Craig Boutilier.
    UAI 1998.
    Note: this paper builds on the macro-action semi-MDP framework of Sutton & Precup, but makes some
    important changes that make things much cleaner (both in theory and in implementation).
    http://www.cs.toronto.edu/kr/papers/macros.pdf

21.Oct'09

Addendum: Policy gradient techniques from a robotics perspective:

14.Oct'09

  • [NCD04] A.Y. Ng, A.Coates, M.Diel, V.Ganapathi, J.Schulte, B.Tse, E.Berger, and E.Liang.
    Autonomous inverted helicopter flight via reinforcement learning.
    In ISER, volume 21 of Springer Tracts in Advanced Robotics, pages 363-372. Springer, 2004.
    (Presented by Phuong)

30.Sep'09&7.Oct'09

  • [RPPC08] S. Ross, J. Pineau, S. Paquet, B. Chaib-draa,
    Online planning algorithms for POMDPs,
    Journal of Artificial Intelligence Research, 32 (2008) 663-704.
    This paper compares the "online" "tree-search" planning approach, popular for games,
    with the "offline" "self-consistent" Bellman equation approach,
    popular in reinforcement learning (and described by Kaelbling et al. 1998).
    (Presented by Peter).

16&23.Sep'09

  • [KLC98] L.P. Kaelbling and M.L. Littman and A.R. Cassandra,
    Planning and Acting in Partially Observable Stochastic Domains
    Artificial Intelligence, 101 (1998) 99-134
    (Presented by Marcus/Hassan/Sarah)

Papers in Queue

General POMDPs

State Abstractions for RL

  • [GDG03] R.Givan, T.Dean, and M.Greig.
    Equivalence notions and model minimization in Markov decision processes.
    Artificial Intelligence, 147(1-2):163-223, 2003.
  • [Hut09a] M.Hutter.
    Feature dynamic Bayesian networks.
    In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.

General MDPs

  • [LLW08] Lihong Li, Michael L. Littman, Thomas J. Walsh: Knows what it knows: a framework for self-aware learning. ICML 2008: 568-575
    www.machinelearning.org/archive/icml2008/papers/627.pdf
  • [BBSE10] (Book), Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst (2010)
    "Reinforcement Learning and Dynamic Programming Using Function Approximators"
    in the Automation and Control Engineering series of Taylor & Francis CRC Press.
  • [Mah09] Sridhar Mahadevan (2009) Learning Representation and Control in Markov Decision Processes: New Frontiers
    Foundations and Trends in Machine Learning, Vol. 1, No. 4, pp. 403-565.
    http://dx.doi.org/10.1561/2200000003

Miscellaneous

  • [Gru04] P.D. Gruenwald.
    Tutorial on minimum description length.
    In Minimum Description Length: recent advances in theory and practice, Chapters 1 and 2. MIT Press, 2004.
  • [BLA02] B. Ng, L. Peshkin, and A. Pfeffer.
    Factored Particles for Scalable Monitoring.
    In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.

Background Reading

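The standard textbook on RL is:

  • [SB98] R.S. Sutton and A.G. Barto.
    Reinforcement Learning: An Introduction.
    MIT Press, Cambridge, MA, 1998.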

See also:

  • [Put94] M.L. Puterman.
    Markov Decision Processes - Discrete Stochastic Dynamic Programming.
    Wiley, New York, NY, 1994.
  • [KV86] P.R. Kumar and P.P. Varaiya.
    Stochastic Systems: Estimation, Identification, and Adaptive Control.
    Prentice Hall, Englewood Cliffs, NJ, 1986.
  • [PORL09] Partially Observable Reinforcement Learning
    Symposium at NIPS'09, December 10, Vancouver.
    See http://www.hutter1.net/ai/porlsymp.htm and
    http://grla.wikidot.com/nips for more details.

Contact

Sultan Javed Majeed <sultan.majeed@anu.edu.au> or
Tom Everitt <tom.everitt@anu.edu.au> or
Marcus Hutter <marcus.hutter@anu.edu.au>

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License