# General Information

Welcome to the Reinforcement Learning Reading Group at RSISE@ANU

**Who:** Everyone is welcome.

**When:** Every Wednesday, 10:45-12, with subsequent lunch. (If you want to attend, but the time does not suit you, please let me know.)

**Where:** Brian Anderson building (115), Meeting Room A206 (or sometimes seminar room A105), Australian National University.

**Assumed Background:** Basics in Reinforcement Learning (see below)

**Operation mode:** Discussing read papers. No email reminders.

The Reinforcement Learning Reading Group will concentrate on the background for and techniques pushing the frontier of *generic* reinforcement learning agents, in particular for partially observable domains (PORL). For many years, the reinforcement-learning community primarily focused on sequential decision making in fully observable but unknown domains, while the planning-under-uncertainty community focused on known but partially observable domains. Since most problems are both partially observable and (at least partially) unknown, recent years have seen a surge of interest in combining the related, but often different, algorithmic machineries developed in the two communities.

See, for instance:

PORL09: Partially Observable Reinforcement Learning

Symposium at NIPS'09 December 10, Vancouver

http://www.hutter1.net/ai/porlsymp.htm and

http://grla.wikidot.com/nips for more details.

Given the substantial interest in RL and Planning at RSISE@ANU and CRL@NICTA, the time seems ripe for a reading group that brings these two communities together and reviews recent relevant papers (see below).

Regular (Past&Current) Participants:

Mostly students and researchers from RSISE@ANU and CRL@NICTA

and other RL friends nearby.

Book Reading

From 1.June'11-9.Nov'11 we read

"Neuro-dynamic programming" by Dimitri P. Bertsekas and John Tsitsiklis

Athena Scientific 1996

http://www.amazon.com/Neuro-Dynamic-Programming-Optimization-Neural-Computation/dp/1886529108

Chapters (http://www.athenasc.com/ndpcontents.html): 1: Peter, 2: Mayank, 3: Daniel, 4: Tor, 5: Phuong, 6: Wen and Matthew, 7: Peter, 8: Joseph

# Reading List

The schedule for the reading group is given below and will be updated weekly.

**We are now meeting at 10:45**

**29.March'17**

Jarryd continues with generative adversarial networks for RL

**22.March'17**

Edward Barker presents Unsupervised Basis Function Adaptation for Reinforcement Learning

**15.March'17**

Jarryd presents generative adversarial networks for RL

**8.March'17**

Tor Lattimore talks about the adversarial/stochastic divide and some open problems there

**1.March'17**

Tom continues with The Delusionbox Problem.

**22.Feb'17**

Tom presents The Delusionbox Problem

**15.Feb'17**

Phuong Nguyen presents interesting experiences since leaving the group

**8.Feb'17**

Tor Lattimore talks about some open problems in online learning/statistics/RL.

**1.Feb'17**

Tor Lattimore presents The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

**14.Dec'16**

Jarryd presents - Learning to reinforcement learn

**07.Dec'16**

Boris gives update on his project

**30.Nov'16**

Tom gives AI-Safety talk

**23.Nov'16**

Suraj presents his thesis (mid-semester update)

**16.Nov'16**

[No reading group]

**9.Nov'16**

Tom gives monitoring talk

**2.Nov'16**

[No reading group]

**26.Oct'16** (at different time: 1pm)

James presents symmetry of algorithmic information

**26.Oct'16** (at different time: 12pm)

Farhana presents - Dimensionality of Spatio-Temporal Broadband Signals Observed Over Finite Spatial and Temporal Windows

**19.Oct'16**

Sultan presents his recent research insights

(thereafter group excursion, see email for details)

**12.Oct'16**

Jarryd presents exploration results (cont.)

**5.Oct'16**

John presents AIXIjs

**28.Sep'16**

Jarryd presents exploration results

**21.Sep'16**

Manlio presents On the Computability of AIXI

**14.Sep'16**

John speaks about visit to Bay Area, and Why does deep and cheap learning work so well?

**7.Sep'16**

Tom presents takeaways from US+UK trip: deep learning

**31.Aug'16**

Tom presents takeaways from US+UK trip: UC Berkeley and AGI

**24.Aug'16**

Tom presents takeaways from US+UK trip: New AI Safety research agendas (Google/OpenAI open safety problems and MIRI's machine learning agenda)

**17.Aug'16**

Tom presents takeaways from US+UK trip: Mainly Cooperative inverse reinforcement learning

**3.Aug'16**

Manlio Valenti from Trento introduces Upper-SemiComputable SemiMeasures

**27.July'16**

John and Sean present progress on Interactive GRL Demo

**20.July'16**

Break

**13.July'16**

Sultan presents 2 papers

**22.June'16**

Jarryd presents Unifying Count-Based Exploration and Intrinsic Motivation

**15.June'16**

Xian Wang presents his research on …

**6.June'16 (obs: Monday)**

Tom continues with AIXI tutorial

**1.June'16**

George Stamatescu presents KL Divergence and Reciprocal Chains

**31.May'16**

Tom presents AIXI tutorial

**25.May'16**

Gerhard Visser presents Interest-Relative Inductive Inference thesis draft (unpublished)

**11+18.May'16**

Break

**4.May'16**

John presents Pedro A. Ortega, Naftali Tishby (2016) Memory controls time perception and inter-temporal choices

**27.April'16**

Tom presents wireheading result

**20.April'16**

Sultan continues AGI reviews

**13.April'16**

Sultan AGI reviews

**6.April'16**

Tom and Sultan UAI reviews

**30.Mar'16**

Jan "defends" his thesis

**(in room A105)**

**23.Mar'16**

Discussion of UAI reviews

**16.Mar'16**

Jan continues to talk about conferences from 2015

**9.Mar'16**

Sultan presents State of the Art Control of Atari Games Using Shallow Reinforcement Learning

**4.Mar'16 11:30 EXTRA SESSION**

Adam Case presents

**2.Mar'16**

Jan presents Safely Interruptible Agents

**24.Feb'16**

Djallel Bouneffouf presents

**17.Feb'16**

Jan continues to talk about conferences from 2015

**10.Feb'16**

No reading group.

**3.Feb'16**

Jan continues to talk about conferences from 2015

**27.Jan'16**

Tom presents Owain Evans' paper Learning the Preferences of Ignorant, Inconsistent Agents

**20.Jan'16**

Jan talks about conferences from 2015

**16.Dec'15**

Tom summarises the Australasian AI conference, and maybe continues with preliminary results on the wireheading problem.

**9.Dec'15**

Jae Hee Lee presents his PhD thesis Qualitative Reasoning about Relative Directions: Computational Complexity and Practical Algorithm

**2.Dec'15**

Break for Australian AI conference

**25.Nov'15**

Tom presents summary of MIRIx workshop.

**18.Nov'15**

Tom presents preliminary results on the wireheading problem.

**11.Nov'15**

Daniel continues with Agents Using Speed Priors

**4.Nov'15**

Daniel presents Agents Using Speed Priors

**28.Oct'15**

David presents Practical Extreme State Aggregation

**21.Oct'15**

Matt Alger presents a project on Deep Inverse Reinforcement Learning

**14.Oct'15**

Aqua Zhu presents background on classical sequence prediction and related problems.

**7.Oct'15**

Tom presents Analytical Results on the BFS vs. DFS Algorithm Selection Problem, Part II: Graph Search

**23.Sep'15**

Tor Lattimore presents Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

**16.Sep'15**

Spring break

**9.Sep'15**

Spring break

**2.Sep'15**

Tom presents Analytical Results on the BFS vs. DFS Algorithm Selection Problem, Part I: Tree Search

**26.Aug'15**

Marcus continues presenting impressions from ICML/EWRL

**19.Aug'15**

Hadi Afshar presents Reflection, Refraction, and Hamiltonian Monte Carlo

Recommended (background) reading

**12.Aug'15**

Tom presents Sequential Extensions of Causal and Evidential Decision Theory

**5.Aug'15**

Reading group resumes. Marcus presents impressions from ICML/EWRL

**24.June'15 - 29.July'15**

Winter break

**17.June'15**

Yiyun presents *Modelling Causal Reasoning with Ambiguous Observations* and *Quantum Probability Model of "Zero-Sum" Beliefs*

**10.June'15**

Mayank presents Neural Turing Machines

**3.June'15**

Continue discussing reviews for ALT

**27.May'15**

Discussing reviews for ALT

**20.May'15**

Jan continues from last time

**13.May'15**

Jan talks about merging and predicting,

in particular the results from Merging and Learning and

On Sequence Prediction for Arbitrary Measures

**6.May'15**

Continued discussion of AGI reviews

**29.Apr'15**

Discussing reviews for AGI

**15. and 22.Apr'15**

No reading group

**8.Apr'15**

Jan presents Reflective Oracles: A Foundation for Classical Game Theory

**1.Apr'15**

Jan presents Reflective Variants of Solomonoff Induction and AIXI

**25.Mar'15**

Yiyun presents *Cognitive processes and mechanisms in causal reasoning with ambiguous observations*

**18.Mar'15**

Marcus presents Compress and Control

**11.Mar'15**

Daniel presents the current status of his work on the speed prior

The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions

**4.Mar'15**

Jan presents a journal paper under review

**18. and 25. Feb'15**

No Reading Group

**11.Feb'15**

Tom presents Can we measure the difficulty of an optimization problem?

**4.Feb'15**

Mayank presents selected papers from ACML 2014

**28.Jan'15**

Jan presents Corrigibility

https://intelligence.org/files/CorrigibilityTR.pdf

**21.Jan'15**

Mayank gives a tutorial on convex optimization

**10.Dec'14**

PhD Monitoring Hadi (Room RSISE B123)

**19'Nov'14**

Peter leads discussion on the new book (with a focus on chapter 7)

Ethical Artificial Intelligence by Bill Hibbard

http://arxiv.org/ftp/arxiv/papers/1411/1411.1373.pdf

Bill builds on UAI, decision theoretic rationality, space-time embedded agents etc. to formally study ethical AI.

**12'Nov'14**

Xi Li presents on Leibniz's program and its relation to UAI

**5.Nov'14**

Daniel Filan talks about Extreme state aggregation beyond MDPs

**29.Oct'14**

Neal Hughes (economics PhD student) presents on using RL for water management

Note: it's in B123

**22.Oct'14**

Tom Butler presents his honors thesis

Fuzzy Expert System Evolution: Increasing the accessibility of intelligent controllers

**15.Oct'14**

PhD Monitoring Mayank & Jan (Room RSISE B123)

**8.Oct'14**

Break

**1.Oct'14**

Break

**24.Sep'14**

Daniel Filan presents about the speed prior

http://link.springer.com/chapter/10.1007%2F3-540-45435-7_15

**17.Sep'14**

Jan presents Teleporting Universal Agents by Laurent Orseau AGI'2014

http://www.agroparistech.fr/mia/equipes:membres:page:laurent:teleport

**10.Sep'14**

Hadi presents his most recent work on symbolic Gibbs sampling

**3.Sep'14**

Peter reports from AAAI'2014

Integrating representation learning and temporal difference learning:

A matrix factorization approach by M. White

http://webdocs.cs.ualberta.ca/~whitem/publications/14aaaiw-crtd.pdf

with a closely related alternative

http://webdocs.cs.ualberta.ca/~whitem/publications/14aaaiw-frrl.pdf

Active Learning with Model Selection by A. Ali, R. Caruana and A. Kapoor

http://research.microsoft.com/en-us/um/people/akapoor/papers/AAAI2014.pdf

Natural Temporal Difference Learning by W. Dabney and P. Thomas

http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/viewFile/8568/8913

**27.Aug'14**

Peter reports from CogSci'2014.

Toward Boundedly Rational Analysis by Thomas Icard

http://web.stanford.edu/~icard/cogsci14.pdf

A Bounded Rationality Account of Wishful Thinking by R. Neumann, A. N. Rafferty, T. L. Griffiths

http://cocosci.berkeley.edu/anna/papers/WishfulThinking.pdf

The high availability of extreme events serves resource-rational decision-making by Lieder, Wills, Hsu, Griffiths

http://cocosci.berkeley.edu/falk/HighAvailabilityOfExtremeEvents.pdf

and a related recent journal paper providing the background for the above

One and Done? Optimal Decisions From Very Few Samples by Edward Vul, Noah Goodman, Thomas L. Griffiths and Joshua B. Tenenbaum

http://web.stanford.edu/~ngoodman/papers/VulGoodmanGriffithsTenenbaum-COGS-2014.pdf

Information vs Reward in a changing world by Navarro and Newell

http://health.adelaide.edu.au/psychology/ccs/docs/pubs/2014/NavarroNewell2014.pdf

Uncertainty and Exploration in a restless bandit task by Speekenbrink and Konstantinidis

http://www.psychol.ucl.ac.uk/m.speekenbrink/articles/cogsci2014.pdf

**16.July'14 — 20.Aug'14**

Currently no meetings planned, but check a day in advance or volunteer to present something.

**9.July'14**

Jan presents Eliezer Yudkowsky and Marcello Herreshoff:

Tiling Agents for Self-Modifying AI, and the Löbian Obstacle

https://intelligence.org/files/TilingAgents.pdf

and

Problems of self-reference in self-improving space-time embedded intelligence. Benja Fallenstein and Nate Soares. AGI 2014.

https://intelligence.org/wp-content/uploads/2014/05/Fallenstein-Soares-Problems-of-self-reference-in-self-improving-space-time-embedded-intelligence.pdf

**2.July'14**

Peter gives practice talk for Quebec conference.

Note B123 and we start on time since the room has other events at 12:20.

Please arrive no later than 11:30 (always applies but in particular this week).

**25.June'14**

Marcus presents his ALT paper on Offline to Online Conversion.

**20.June'14 Note, this is a Friday! Time 2pm**

Daniel Cotton presents his ASC project on Reinforcement learning in computer science and psychology

Followed by Tony Allard giving his monitoring talk at 3pm on Logistics Planning.

**18.June'14**

Jan presents overview of MIRI's recent research

**11.June'14**

Break

**4.June'14**

Daniel Filan presents his ASC project on AIXI convergence

**28.May'14**

Mayank talks about game playing competition and reports on his progress.

In particular a report to Marcus and Peter, but others welcome.

**21.May'14**

Marcus presents his ALT submission Extreme State Aggregation Beyond MDPs

**14.May'14**

Paper reviewing discussions

**7.May'14**

No reading group

**30.Apr'14**

Paper reviewing discussions

**23.Apr'14**

Tor (visiting 23.-25.Apr) presents "Memory Allocation Bandits"

**16.Apr'14**

Paper reviewing discussions

**9.Apr'14**

Monitoring in RSISE seminar room

11:30 Hadi

12:00 Mayank

12:30 Jan

13:00-14:00 feedback.

**2.Apr'14**

Jan presents

Marcus Hutter: Discrete MDL Predicts in Total Variation. NIPS'09

http://arxiv.org/abs/0909.4588

**26.Mar'14**

Mayank presents "Cover Tree Bayesian Reinforcement Learning" by Nikolaos Tziortziotis, Christos Dimitrakakis and Konstantinos Blekas.

http://arxiv.org/pdf/1305.1809v1

**12,19.Mar'14**

Break due to travels and deadlines

**5.Mar'14**

Marcus talks about new extension of the context tree weighting algorithm

**26.Feb'14**

Peter presents "Using Expectation-Maximization for Reinforcement Learning" by Dayan and Hinton 1997

http://www.gatsby.ucl.ac.uk/~dayan/papers/rpp97.pdf

with a discussion of what has happened afterwards which includes Bayesian MCMC alternatives to the original frequentist EM approach, e.g.

http://www.stanford.edu/~ngoodman/papers/WingateEtAl-PolicyPrios.pdf

This line of work, which includes many papers from the last 5 years, is often called planning as inference

http://ipvs.informatik.uni-stuttgart.de/mlr/marc/publications/12-botvinick-TICS.pdf

**19.Feb'14**

Alex presents "Changing tastes and Coherent Dynamic Choice" by Peter J. Hammond

http://www.jstor.org/stable/2296609

**12.Feb'14**

Mayank continues from last week

**5.Feb'14**

Mayank presents "Efficient Learning and Planning with Compressed Predictive States".

William Hamilton, Mahdi Milani Fard and Joelle Pineau.

http://arxiv.org/abs/1312.0286

**29.Jan'14**

Reading group restarts for 2014 with Peter talking about "Rationality, Optimism and Guarantees in General Reinforcement Learning" in the RSISE seminar room as an AI seminar. Please note 12:00-13:00 !

**18'Dec'13-**

MaxEnt, Xmas, New Year

**11'Dec'13**

Johannes presents his work on counter-examples in reinforcement learning

**6'Dec'13**

Tor's last day, at least here at ANU. Talk, farewell lunch etc. Details later

**4'Dec'13**

Hadi gives monitoring talk

**27'Nov'13**

Rachael continues from the 16th of Oct with the voting part of the paper.

Note, the talk will be in R214 in the Ian Ross building!

**20'Nov'13**

Ian Hon presents his honours thesis

**12'Nov'13**

Tony Allard monitoring talk. Note Tuesday! 3pm in the RSISE seminar room

**13'Nov'13**

ACML workshop (organized by Peter Sunehag, Marcus Hutter, Mark Reid) on theory and practice in Machine Learning at the Manning Clark Centre, ANU

https://sites.google.com/site/mltheoryandpractice/

**14,15'Nov'13**

ACML at ANU

**6'Nov'13**

Johannes presents "The Fixed Points of Off-Policy TD" by J. Zico Kolter NIPS 2011.

http://books.nips.cc/papers/files/nips24/NIPS2011_1200.pdf

**30'October'13**

Mayank gives monitoring talk

**23'October'13**

Peter talks about "Learning from human generated rewards", based on a sequence of papers making up the PhD thesis of Bradley Knox (http://www.bradknox.net/) supervised by Peter Stone, primarily: W. Bradley Knox and Peter Stone. Learning Non-Myopically from Human-Generated Reward. In Proceedings of the International Conference on Intelligent User Interfaces (IUI), March 2013.

http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/iui13-knox.pdf

**16'October'13**

Rachael Briggs presents her article Decision-Theoretic Paradoxes as Voting Paradoxes,

Philosophical Review 2010 Volume 119, Number 1: 1-30

http://philreview.dukejournals.org/content/119/1/1.abstract

**2,9'Oct'13**

Break due to travels

**25'Sep'13**

Tor presents (More) Efficient Reinforcement Learning via Posterior Sampling, NIPS'2013

Ian Osband, Daniel Russo and Benjamin Van Roy

http://arxiv.org/pdf/1306.0940v1.pdf

**18'Sep'13**

Mayank presents,

Incremental Basis Construction from Temporal Difference Error by Yi Sun, Faustino Gomez, Mark Ring, Jurgen Schmidhuber in ICML 2011.

Paper @ http://www.idsia.ch/~juergen/icml2011sun.pdf

Slides @ http://www.idsia.ch/~sun/doc/icml11-ftr-slides.pdf

**11'Sep'13**

Tor talks about best arm identification in bandits

**4'Sep'13**

Peter presents,

Temporal-Difference Search in Computer Go by Silver, D., Sutton, R. S., Mueller, M in ICAPS 2013 http://www.aaai.org/ocs/index.php/ICAPS/ICAPS13/paper/view/6037/6227

and in Machine Learning 87(2):183-219 2012

http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/tdsearch.pdf

**28'Aug'13**

Mayank presents.

Bruno Scherrer. "Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view" in Proceedings of the 27th International Conference on Machine Learning (2010).

http://www.icml2010.org/papers/654.pdf

Slides available here,

http://www.loria.fr/~scherrer/presentations/tdbr.pdf

**24'July'13**

Tor presents things from conference travel to ICML/COLT.

**17'July'13**

Peter presents tutorial on Exploration vs Exploitation as practice before EWRL.

Probably downstairs in the seminar room

**9'July'13 (note Tuesday!, 11:30)**

Hadi presents his TPR

**3'July'13**

Scott presents (at 11)

S. Sanner, K. V. Delgado, and L. N. de Barros (2011). Symbolic Dynamic Programming for Discrete and Continuous State MDPs. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). Barcelona, Spain.

http://users.cecs.anu.edu.au/~ssanner/Papers/cont_mdp.pdf

**26'June'13**

Monitoring talk by Ehsan (NICTA)

**19'June'13**

Monitoring talks by David and Suvash downstairs RSISE seminar room at 12

**12'June'13**

Integrating Partial Model Knowledge in Model Free RL Algorithms

Aviv Tamar and Dotan Di Castro and Ron Meir

International Conference on Machine Learning (ICML), 2011

http://www.icml-2011.org/papers/222_icmlpaper.pdf

Mayank presents

**5'June'13**

S. Thiebaux, C. Gretton, J. Slaney, D. Price and F. Kabanza (2006) "Decision-Theoretic Planning with non-Markovian Rewards", Volume 25, pages 17-74

http://www.jair.org/papers/paper1676.html

Charles Gretton presents

**29'May'13**

Tor and Peter present

**22'May'13**

Monitoring

**15'May'13**

Monitoring

**8'May'13**

Peter presents "Online Feature Selection for Model-based Reinforcement Learning" ICML'2013 by Trung Thanh Nguyen, Zhuoru Li, Tomi Silander and Tze-Yun Leong http://jmlr.csail.mit.edu/proceedings/papers/v28/nguyen13.pdf

**1'May'13**

Marcus presents his COLT paper on sparse adaptive Dirichlet-multinomial-like Processes

**24'April'13**

Hadi and Tor give monitoring talks

**17'April'13**

Wen and Mayank give monitoring talks

**10'April'13**

Mayank continues from last time on over-estimation in Q-learning

**3'April'13**

Mayank presents Double-Q learning and associated paper

http://books.nips.cc/papers/files/nips23/NIPS2010_0208.pdf

**27'March'13**

Ian Hon continues the survey on large alphabet sources and compression based on

http://www.cs.technion.ac.il/~ronbeg/begleiter-papers/begleiter06a.pdf

http://www.sps.ele.tue.nl/members/f.m.j.willems/research_files/CTW/benelux94-tjalkens-willems-shtarkov.pdf

**20'March'13**

Wen presents a survey on text (large alphabet) modeling. Relevant papers are

http://www2.denizyuret.com/ref/goodman/chen-goodman-99.pdf

http://acl.ldc.upenn.edu/P/P06/P06-1124.pdf

**18'March'13**

Oscillation-free epsilon-random sequences, Ludwig Staiger

**13'March'13**

Tor presents Thompson Sampling: An Asymptotically Optimal Finite Time Analysis, ALT'2012

Emilie Kaufmann, Nathaniel Korda, Rémi Munos

http://arxiv.org/abs/1205.4217

**6'March'13**

Peter presents "A Dantzig Selector Approach to Temporal Difference Learning", Matthieu Geist, Bruno Scherrer, Alessandro Lazaric and Mohammad Ghavamzadeh, ICML 2012 http://icml.cc/2012/papers/703.pdf

**27'February'13**

Wen presents "Delusion, Survival, and Intelligent Agents" by Mark B. Ring, Laurent Orseau, AGI'2011

http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=28BAE7205B795D39B357E46822EB4A4D?doi=10.1.1.232.9313

**13'February'13**

Tor presents "Universal Knowledge-Seeking Agents" by Laurent Orseau, ALT'2011

http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/orseau-ALT-2011-knowledge-seeking.pdf

**6'February'13**

Peter presents "Space-Time Embedded Intelligence" by Laurent Orseau and Mark Ring, AGI'2012

http://agi-conference.org/2012/wp-content/uploads/2012/12/paper_76.pdf

**30'January'13**

Tom presents his (draft) Master's thesis about (No) Free Lunch theorems for optimization

**23'January'13**

Nam talks about learning theory

**16'January'13**

Hadi presents the Loewenheim Skolem Theorem & Proof

http://en.wikipedia.org/wiki/L%C3%B6wenheim%E2%80%93Skolem_theorem

and Marcus the Skolem Paradox and its resolution

http://en.wikipedia.org/wiki/Skolem%27s_paradox

**9'January'13**

Marco presents Wouter M. Koolen, Dimitri Adamskiy, Manfred K. Warmuth (NIPS 2012) Putting Bayes to sleep

http://www.cs.rhul.ac.uk/~wouter/Papers/sleep.pdf

**19'December'12**

End of year meeting. Marcus presents Fun with Bayesian & Decision & other paradoxes.

**12'December'12**

Summer scholars present their topics and informal question and answer session.

**28'November'12**

Marco leads readings on extensions of CTW

Volf, P., & Willems, F. (1997). A context-tree branch-weighting algorithm. SYMPOSIUM ON INFORMATION THEORY IN THE …. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.121.7873&rep=rep1&type=pdf

Willems, F. M. J. (1996). Context weighting for general finite-context sources. Information Theory, IEEE …, 42(5), 1514–1520. doi:10.1109/18.532891

Willems, F. M. J. (1998). The context-tree weighting method: extensions. IEEE Transactions on Information Theory, 44(2), 792–798. doi:10.1109/18.661523

**21'November'12**

Peter leads discussions on

Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011) How to grow a mind: Statistics, structure, and abstraction. Science, 331, 1279-1285.

http://www.sciencemag.org/content/331/6022/1279.full.pdf

**14'November'12**

Monitoring: Wen and Mayank

**7'November'12**

Tom talks about his literature review on meta rationality

**31'October'12**

Xinjue gives practice talk on "Exploration in Bayesian Reinforcement Learning".

**24'October'12**

Monitoring talks

**17'October'12**

Peter presents "A Bayesian Sampling Approach to Exploration in Reinforcement Learning" by Asmuth, Li, Littman, Nouri, Wingate

http://web.mit.edu/~wingated/www/papers/boss.pdf

**10'October'12**

Hadi gives practice talk

**3'October'12**

Tor gives practice talk

**26'September'12**

Mayank gives practice talk

**19'September'12**

Marcus presents. See his email

**12'September'12**

Phuong gives practice talk for monitoring

**5'September'12**

Joel Veness gives a talk in the Seminar room downstairs on further developments of MC-AIXI.

**29'August'12**

Hadi talks about AGI papers

**22'August'12**

Phuong presents "TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration" by B. C. Silva and A. G. Barto

http://people.cs.umass.edu/~bsilva/deltaPi_aaai2012.pdf

**15'August'12**

Mayank presents Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. Available at http://webdocs.cs.ualberta.ca/~sutton/papers/horde-aamas-11.pdf .

**8'August'12**

Phuong summarizes two papers from AAAI12

- D. Lee and W. B. Powell, An Intelligent Battery Controller Using Bias-Corrected Q-learning

http://energysystems.princeton.edu/Papers/Lee_Powell_AAAI2012_BiasCorrectedQLearning.pdf

- W. Dabney and A. G. Barto, Adaptive Step-Size for Online Reinforcement Learning

http://people.cs.umass.edu/~wdabney/papers/alphaBounds.pdf

**1'August'12**

- Mayank presents "Safe exploration in Markov Decision Processes" by Teodor Mihai Moldovan and Pieter Abbeel, ICML'2012. [http://icml.cc/2012/papers/838.pdf]

**25'July'12**

- Wen presents "Efficient learning algorithms for changing environments" by Elad Hazan and C. Seshadhri, ICML'2009

http://dl.acm.org/citation.cfm?id=1553425

**18 July'12**

- Hadi presents "On Bayes Methods for On-line Boolean Prediction" by Nicolo Cesa-Bianchi and David P. Helmbold and Sandra Panizza, in NeuroColt 1997

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.631

**11 July'12**

- Phuong doing test run for AAAI presentation

**27 June 12**

- Tor presents: David Freedman, On the Asymptotic Behaviour of Bayes Estimates in the Discrete Case II, 1965.
- http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177700155

**20 June 12**

- Cancelled

**13'June'12**

- Phuong presents his work on Regret bounds for feature reinforcement learning where he extends the work by Maillard, Munos and Ryabko to the countable case.

**30'May'12**

- Mayank talks about Predictive State Representations
- Littman, Michael L.; Richard S. Sutton; Satinder Singh (2002). "Predictive Representations of State". Advances in Neural Information Processing Systems 14 (NIPS). pp. 1555–1561.
- Singh, Satinder; Michael R. James; Matthew R. Rudary (2004). "Predictive State Representations: A New Theory for Modeling Dynamical Systems". Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI). pp. 512–519.

**10.May'12**

- Wen reports from DCC 2012 on three papers (emailed out)

**2.May'12**

- Selecting the state representation in reinforcement learning Maillard, Munos and Ryabko

http://books.nips.cc/papers/files/nips24/NIPS2011_1427.pdf

**18.April'12**

- On Nicod's condition and the black raven paradox

The paper is available from Hadi or Peter by email

**4.April'12**

- Near-optimal Regret Bounds for Reinforcement Learning, Thomas Jaksch, Ronald Ortner and Peter Auer

http://jmlr.csail.mit.edu/papers/v11/jaksch10a.html

**21.Mar'12**

- Automatic discovery of ranking formulas for playing with multi-armed bandits, Francis Maes, Louis Wehenkel, and Damien Ernst, EWRL 2011

http://ewrl.files.wordpress.com/2011/08/ewrl2011_submission_15.pdf

**14.Mar'12**

- A theoretical analysis of model based interval estimation by A. Strehl and M. Littman, ICML 2005

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1496

presented by Peter

**29.Feb'12**

- Wen will be presenting his TPR on compression

**15.Feb'12**

- PAC bounds for Discounted MDPs, presented by Tor. Email Tor for a copy of the paper.

**8.Feb'12**

- Some inequalities in probability theory, presented by Tor

**7.Dec'11**

- Predictive State Temporal Difference Learning by Byron Boots and Geoff Gordon NIPS 2010

http://www.cs.cmu.edu/~ggordon/boots-gordon-PSTD.pdf

**30.Nov'11**

- Solomonoff Memorial conference in Melbourne. Tor, Ian, Peter and Wen presenting.

**23.Nov'11**

- An approximation of the universal intelligence measure, Shane Legg and Joel Veness

http://jveness.info/publications/rsmc2011%20-%20aiq.pdf

presented by Wen as a practice talk for Solomonoff Memorial

Further discussion of the paper follows the 20-minute presentation with slides

**16.Nov'11**

- Looping Suffix Tree-Based Inference of Partially Observable Hidden State, ICML 2006, Michael P. Holmes, Charles Lee Isbell, Jr.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.62.262

presented by Mayank

**9.Nov'11**

- We are done with the book. Paper reading resumes next week.

**Nov'11**

- Peter presents chapter 7 and Joseph Chapter 8.

**Oct'11**

- Wen presents Chapter 6

**Sep'11**

- After a break for the first two weeks, Phuong presents chapter 5

**Aug'11**

- Tor presents chapter 4

**July'11**

- Daniel finishes chapter 3

**30.June'11**

- Daniel presents chapter 3 of "Neuro-dynamic programming"

**8,15,23.June'11**

- Mayank presents chapter 2 of "Neuro-dynamic programming"

**1.June'11**

- We will start reading "Neuro-dynamic programming" by Dimitri P. Bertsekas and John Tsitsiklis

Athena Scientific 1996

http://www.amazon.com/Neuro-Dynamic-Programming-Optimization-Neural-Computation/dp/1886529108

We will go through the introduction this week and do some planning for the reading group.

This will be led by Peter

**25.May'11**

- Variable resolution discretization in optimal control

R. Munos, A.Moore

http://repository.cmu.edu/cgi/viewcontent.cgi?article=1259&context=robotics

Presented by Daniel

**18.May'11**

- [WNLL] Planning and Learning in Environments with Delayed Feedback

Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.677

Presented by Matthew

**11.May'11**

- Meeting, discussing paper reviewing

**4.May'11**

- Zahra Zamani (PhD Monitoring) An Agent Architecture for Structured Uncertain Environments

http://cecs.anu.edu.au/seminars/more/SID/2834

- Phuong Nguyen (PhD Monitoring) Feature Reinforcement Learning In Practice

http://cecs.anu.edu.au/seminars/more/SID/2833

**27.April'11**

- Tor Lattimore (PhD Monitoring) Asymptotically Optimal Agents

http://cecs.anu.edu.au/seminars/more/SID/2832

- Wen Shao (PhD Monitoring) AIXI in Formalisation of Turing Test

http://cecs.anu.edu.au/seminars/more/SID/2835

**20.April'11**

- Matthew Robards (PhD Monitoring) Function Approximation for Model Based Reinforcement Learning

http://cecs.anu.edu.au/seminars/more/SID/2830

- Mayank Daswani (PhD Monitoring) Feature Dynamic Bayesian Networks

http://cecs.anu.edu.au/seminars/more/SID/2831

**13.April'11**

- Wen presents, [Chu10] Evgeny Chutchev (2010), A Formalization of the Turing Test

http://arxiv.org/abs/1005.4989

**6.April'11**

- Mayank Presents [Hut09] M.Hutter, Feature dynamic Bayesian networks.

In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.

http://www.hutter1.net/ai/phidbn.pdf

**30.Mar'11**

- Daniel Presents [NL09] A Nouri, M. Littman, Multi-resolution Exploration in Continuous Spaces, NIPS 2009

http://books.nips.cc/papers/files/nips21/NIPS2008_0730.pdf

**23.Mar'11**

- Peter presents "Dynamic Policy Programming"

http://www.mbfys.ru.nl/staff/m.azar/poster_NIPS09.pdf

http://arxiv.org/abs/1004.2027

**16.Mar'11**

- Phuong presents

**9.Mar'11**

- Pascal presents

Finale Doshi-Velez: Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Domains

http://www.informatik.uni-trier.de/%7Eley/db/conf/aaai/aaai2010.html#Doshi-Velez10

and Matthew presents Model Based RL with Function Approximation

**4.Mar'11**

"Pascal" Workshop on RL and Planning at NICTA Level 3, Meeting Room D

- 10:30 — 11:00 | Pascal Poupart: Explaining Automated Policies for Sequential Decision Making
- 11:00 — 11:18 | Debdeep Banerjee: Partial Order Support Link Scheduling
- 11:18 — 11:36 | Patrik Haslum: A Quick Overview of Factored (Classical) Planning
- Break — 12 minutes
- 11:48 — 12:06 | Scott Sanner: The Relational Dynamic Influence Diagram Language
- 12:06 — 12:24 | Peter Sunehag: History-based Reinforcement Learning
- 12:24 — 12:42 | Matt Robards: Model-Based Reinforcement Learning With Function Approximation
- 12:42 — 13:00 | Will Uther: topic TBD

**2.Mar'11**

- PAC-Bayesian Model Selection for Reinforcement Learning

Mahdi Milani Fard, Joelle Pineau

http://books.nips.cc/papers/files/nips23/NIPS2010_0431.pdf

Presented by Pascal Poupart

**23.Feb'11**

- Tor and Hassan present …

**16.Feb'11**

- Matthew and Peter present …

**9.Feb.'11**

- Bruno C. da Silva, Eduardo W. Basso, Ana L. C. Bazzan, Paulo M. Engel,

Dealing with Non-Stationary Environments using Context Detection

ICML 2006

http://www.autonlab.org/icml_documents/camera-ready/028_Dealing_with_Non_Sta.pdf

Presented by Aaron Li

**15.Dec.'10**

- Chapters 6—8, Sridhar Mahadevan, "Learning Representation and Control in Markov Decision Processes: New Frontiers".

Foundations and Trends in Machine Learning (editor: Michael Jordan), vol. 1, no. 4, pp. 403-565 (163 pages), 2009.

http://www.cs.umass.edu/~mahadeva/papers/ml-found-trend.pdf

Presented by Scott

**8.Dec.'10**

- Statistical physics of social dynamics

Castellano, C., Fortunato, S., and Loreto, V. 2009. Reviews of Modern Physics 81, 2, 591

Section IV. Cultural Dynamics, Parts A & B (Axelrod model and variants)

http://dx.doi.org/10.1103/RevModPhys.81.591

Presented by Ian Wood

**1.Dec.'10**

- Constrained Complexity Generalized Context-Tree Algorithms, Robert J Drost and Andrew C Singer

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04301233

Presented by Peter

**24.Nov'10**

- Hopefully the following will be presented: Efficient real-time dynamic programming for factored MDPs.

Honors thesis by Sotirios Diamand. NOTE: In N101.

**17.Nov'10**

- Ofer Dekel and Shai Shalev-Shwartz and Yoram Singer

Power of Selective Memory: Self Bounded Learning of Prediction Suffix Trees

NIPS 2004

http://ttic.uchicago.edu/~shai/papers/DekelShSi04.pdf

(Presented by Hassan)

**10.Nov'10**

- Efficient real-time dynamic programming for factored MDPs.

Honors thesis by Sotirios Diamand. CANCELED

**3.Nov'10**

- Cancelled

**27.Oct'10**

- Optimality Issues of Universal Greedy Agents with Static Priors by Laurent Orseau

http://www.springerlink.com/content/p2780778k054411x/

Presented by Tor

**20.Oct'10**

- Never Ending Language Learning (from Tom Mitchell's group at CMU)

Scientific: http://rtw.ml.cmu.edu/papers/carlson-aaai10.pdf

News: http://www.nytimes.com/2010/10/05/science/05compute.html

Webpage: http://rtw.ml.cmu.edu/rtw/publications

**13.Oct'10**

- Frank Stephan will talk about Inductive Inference. He is a visitor from Singapore who was a PC chair at ALT and tutorial speaker. Webpage: http://www.comp.nus.edu.sg/~fstephan

**22.Sep'10, in Room 207 (only ours until 12:30, be on time)**

- A Complete Theory of Everything, http://arxiv.org/abs/0912.5434

(Marcus)

**15.Sep'10**

- Constantine Caramanis and Shie Mannor

Learning in the Limit with Adversarial Disturbances

In Proceedings of COLT 2008.

http://www.ece.mcgill.ca/~smanno1//public/C-CarmanisM-COLT2008.pdf

(Presented by Hassan)

I list two more interesting papers that are more RL-related but harder (I think):

- Huibert Kwakernaak, Robust control and H∞-optimization - Tutorial paper. Automatica, 29 (2), pp. 255-273, 1993.

http://doc.utwente.nl/29962/1/Kwakernaak93robust.pdf

- Jun Morimoto and Kenji Doya, Robust Reinforcement Learning. Neural Computation 2005.

http://mitpress.mit.edu/journals/pdf/neco_17_2_335_0.pdf

And another making the case for pursuing robust estimators in general:

- Peter J. Huber. On the non-optimality of optimal procedures. Optimality, the third Erich L. Lehmann Symposium. 2009.

http://projecteuclid.org/euclid.lnms/1249305323

**8.Sep'10**

- No Free Lunch and Occam's Razor in Supervised Learning

(Tor presents his work)

**18,25.Aug,1.Sep'10**

- Chapter 8 of Universal AI book

(general discussion)

**11.Aug'10**

- New TD algorithms from Alberta

(Matthew presents a survey of stuff by Maei, Sutton and their colleagues)

**4.Aug'10**

- MC-AIXI-CTW,

(Joel Veness) NOTE LOCATION: A207

**28.July'10**

- End of Chapter 7 of Universal AI book

(Tor presents)

**21.July'10**

- Meeting about Advanced AI course

**Aug'10**

- Hyeong Soo Chang and Michael C. Fu and Jiaqiao Hu and Steven I. Marcus

An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Operations Research, 53 (1), January–February 2005, pp. 126–139

http://www.rhsmith.umd.edu/faculty/mfu/fu_files/CFHM05.pdf

**9,16.Jun'10**

- Chapter 7 of Universal AI book

(Presented by Phuong)

**2.Jun'10**

- Canceled

**26.May'10**

- Matthew Robards and Peter Sunehag and Scott Sanner

RKHS Temporal Difference Learning

Tech Report, The Australian National University

**28.Apr&5,19.May'10**

- Chapter 6 of Universal AI book

(Presented by Peter, Marcus, Zahra)

**21.April'10**

- F. Willems and Y. Shtarkov and T. Tjalkens

Reflections on the Prize Paper: "The Context-Tree Weighting Method: Basic Properties"

IEEE Information Theory Society Newsletter (47) No 1, March 1997

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.1872&rep=rep1&type=pdf

For more details,

- F. Willems and Y. Shtarkov and T. Tjalkens

The context-tree weighting method: Basic properties

IEEE Transactions on Information Theory (41), 653 - 664, 1995

http://ieeexplore.ieee.org/iel1/18/8656/00382012.pdf?arnumber=382012

(the following is a more readable version of the same paper)

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.1819&rep=rep1&type=pdf

(Presented by Tor)

**14.April'10**

- L. Kocsis, Cs. Szepesvári

Bandit Based Monte-Carlo Planning

In Proceedings of the 17th European Conference on Machine Learning

Springer-Verlag, Berlin, LNCS/LNAI 4212, September 18-22, pp. 282-293, 2006.

http://www.sztaki.hu/~szcsaba/papers/ecml06.pdf

(Presented by Peter)

**7.April'10**

- P. Auer, N. Cesa-Bianchi, Y. Freund, and R.E. Schapire.

The nonstochastic multiarmed bandit problem.

SIAM Journal on Computing, 32:48-77, 2002.

http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/AuerCeFrSc01.ps

(Presented by Mark Reid)

**31.Mar'10**

- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer

Finite time analysis of the multiarmed bandit problem

Machine Learning, 47(2-3):235-256, 2002.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.9211&rep=rep1&type=pdf

(Presented by Hassan)

**17&24.Mar'10**

- M. Kearns, Y. Mansour, and A.Y. Ng.

A sparse sampling algorithm for near optimal planning in large Markovian decision processes.

In Proceedings of IJCAI'99, pages 1324-1331, 1999.

http://www.cis.upenn.edu/~mkearns/papers/sparseplan.pdf

(Presented by Zahra)

**3&10.Mar'10**

- Joel Veness and Kee Siong Ng and Marcus Hutter and David Silver

A Monte Carlo AIXI Approximation

Technical Report, arXiv 0909.0801 (2009) 1-42

[implementation & application of the AIXI]

http://www.hutter1.net/ai/aixictw.pdf

(Presented by Sam)

**3&10&17&24.Feb'10**

- C. Boutilier and T. Dean and S. Hanks

Decision-Theoretic Planning: Structural Assumptions and Computational Leverage

Journal of Artificial Intelligence Research, 11 (1999) 1—94.

http://www.eecs.harvard.edu/~avi/CS281r/F06/Papers/boutilier-et-al-mdp.pdf

(Presented 3rd by Scott and the group, 10th no meeting, 17th Phuong, 24th Zahra)

**27.Jan'10**

- Sebastian Thrun, Probabilistic Algorithms in Robotics

AI Magazine, 21:4 (2000) 93—109

http://www.cs.cmu.edu/~thrun/papers/thrun.probrob.pdf

(Presented by Marcus)

**20.Jan'10**

- Continuation of last week's paper + Summer Scholar presentation preview.

**13.Jan'10**

- Note: Changed from Before.

Beal, M.J., Ghahramani, Z. and Rasmussen, C.E.

The Infinite Hidden Markov Model

In Advances in Neural Information Processing Systems 2002.

http://www.cse.buffalo.edu/faculty/mbeal/papers/ihmm.pdf

(Presented by Hassan).

**23&30.Dec'09**

- Break - (-: Christmas and New Years :-)

**16.Dec'09**

Scott will be presenting:

- (1) Excerpts of Scott's Thesis on factored MDPs.

- (2) Stochastic Planning using Decision Diagrams (SPUDD).

Hoey, St. Aubin, Hu, Boutilier (UAI-99)

http://www.cs.toronto.edu/~cebly/Papers/spudd.ps

- (3) Approximate Policy Construction using Decision Diagrams (APRICODD).

St. Aubin, Hoey, Boutilier (NIPS-00)

http://www.cs.ubc.ca/nest/lci/papers/docs2000/hoey-apricodd.pdf

**09.Dec'09**

- [RP08] S.Ross and J.Pineau.

Model-based Bayesian reinforcement learning in large structured domains.

In Proc. 24th Conference in Uncertainty in Artificial Intelligence

(UAI'08), pages 476-483, Helsinki, 2008. AUAI Press.

http://www.cs.mcgill.ca/~jpineau/files/sross-uai08.pdf

(Presented by Peter)

**02.Dec'09**

*Note: Changed from before.*

M. Rosencrantz, G. Gordon, and S. Thrun.

Learning low dimensional predictive representations.

In Proceedings of the Twenty-First International Conference on Machine Learning,

Banff, Alberta, Canada, 2004.

http://robots.stanford.edu/papers/Rosencrantz04a.pdf

(Presented by Ian)

**25.Nov'09**

- [SJR04] S.P. Singh, M.R. James, and M.R. Rudary.

Predictive state representations: A new theory for modeling dynamical systems.

In Proc. 20th Conference in Uncertainty in Artificial Intelligence (UAI'04), pages 512-518, Banff, Canada, 2004. AUAI Press.

(Presented by Hassan)

**18.Nov'09**

- Matthew Robards will present his literature review on reinforcement learning in large, continuous spaces (focus on Part II).

Literature Review

**11.Nov'09**

- [SLL09] A.L. Strehl, L. Li, and Michael L. Littman.

Reinforcement learning in finite MDPs: PAC analysis.

http://paul.rutgers.edu/~strehl/, 2009.

(Presented by Marcus)

**04.Nov'09**

- [McC95] McCallum, R. Andrew.

Instance-Based Utile Distinctions for Reinforcement Learning.

The Proceedings of the Twelfth International Machine Learning Conference (ML'95).

Lake Tahoe, CA, 1995.

ftp://ftp.cs.rochester.edu/pub/papers/robotics/95.mccallum-ml.ps.Z

(Presented by Peter)

**28.Oct'09**

Scott will be talking about several nice methods for solving MDPs efficiently.

The 4 papers to be covered are summarized in the following slides:

http://sml.nicta.com.au/rlp08/RLP_MDP_Extensions.pdf

The papers themselves are as follows (it's recommended that people read the first one

and skim through the others).

- Algorithms for Inverse Reinforcement Learning.

Andrew Y. Ng and Stuart Russell.

ICML 2000.

http://robotics.stanford.edu/~ang/papers/icml00-irl.pdf

- Policy invariance under reward transformations: theory and application to reward shaping.

Andrew Y. Ng and Daishi Harada and Stuart Russell.

ICML 1999.

http://robotics.stanford.edu/~ang/papers/shaping-icml99.pdf

- Hierarchical Solution of Markov Decision Processes using Macro-actions.

Milos Hauskrecht and Nicolas Meuleau and Leslie Pack Kaelbling and Thomas Dean and Craig Boutilier.

UAI 1998.

Note: this paper builds on the macro action semi-MDP framework of Sutton & Precup, but makes some

important changes which make things much cleaner (theoretically and implementationally).

http://www.cs.toronto.edu/kr/papers/macros.pdf

- Reinforcement Learning with Hierarchies of Machines.

Ronald Parr and Stuart Russell.

NIPS 1998.

http://eprints.kfupm.edu.sa/61888/1/61888.pdf

**21.Oct'09**

- Andrew Y. Ng and Michael Jordan.

PEGASUS: A policy search method for large MDPs and POMDPs.

In Uncertainty in Artificial Intelligence, Proceedings of the Sixteenth Conference, 2000.

http://robotics.stanford.edu/~ang/papers/uai00-pegasus.pdf

(Presented by Matthew)

**Addendum:** Policy gradient techniques from a robotics perspective:

- Policy gradient methods for robotics.

J. Peters and S. Schaal.

IROS 2006

http://www-clmc.usc.edu/publications/P/peters-IROS2006.pdf

**14.Oct'09**

- [NCD04] A.Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang.

Autonomous inverted helicopter flight via reinforcement learning.

In ISER, volume 21 of Springer Tracts in Advanced Robotics, pages 363-372. Springer, 2004.

(Presented by Phuong)

**30.Sep'09&7.Oct'09**

- [RPPC08] S. Ross, J. Pineau, S. Paquet, B. Chaib-draa,

Online planning algorithms for POMDPs,

Journal of Artificial Intelligence Research, 32 (2008) 663—704.

This paper compares the "online" "tree-search" planning approach, popular for games

with the "offline" "self-consistent" Bellman equation approach,

popular in reinforcement learning (and described by Kaelbling et al. 1998).

(Presented by Peter).

**16&23.Sep'09**

- [KLC98] L.P. Kaelbling and M.L. Littman and A.R. Cassandra,

Planning and Acting in Partially Observable Stochastic Domains

Artificial Intelligence, 101 (1998) 99—134

(Presented by Marcus/Hassan/Sarah)

# Papers in Queue

**General POMDPs**

- Nishiyama, Y., Boularias, A., Gretton, A., and Fukumizu, K., Hilbert Space Embeddings of POMDPs, UAI, 2012

http://www.gatsby.ucl.ac.uk/~gretton/papers/NisBouGreFuk12.pdf

- Grunewalder, S., Lever, G., Baldassarre, L., Pontil, M., and Gretton, A., Modeling transition dynamics in MDPs with RKHS embeddings, ICML, 2012

http://www.gatsby.ucl.ac.uk/~gretton/papers/GruLevBalPonetal12.pdf

- Fukumizu, K., Song, L., and Gretton, A., Kernel Bayes' Rule, Advances in Neural Information Processing Systems 24, pp. 1737-1745, 2011

http://www.gatsby.ucl.ac.uk/~gretton/papers/FukSonGre11.pdf

- [Dim10] Christos Dimitrakakis (2010) Context MDPs

http://fias.uni-frankfurt.de/~dimitrakakis/papers/cmdp.pdf

**State Abstractions for RL**

- [LWL06] Lihong Li, Thomas J. Walsh, Michael L. Littman,

Towards a Unified Theory of State Abstraction for MDPs

In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.1229

- [GDG03] R. Givan, T. Dean, and M. Greig.

Equivalence notions and model minimization in Markov decision processes.

Artificial Intelligence, 147(1-2):163-223, 2003.

- [Hut09a] M. Hutter.

Feature dynamic Bayesian networks.

In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.

**General MDPs**

- [LLW08] Lihong Li, Michael L. Littman, Thomas J. Walsh: Knows what it knows: a framework for self-aware learning. ICML 2008: 568-575

www.machinelearning.org/archive/icml2008/papers/627.pdf

- [DLL09] The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning

Carlos Diuk, Lihong Li, Bethany Leffler ICML '09

http://dl.acm.org/citation.cfm?doid=1553374.1553406

- [LL10] Lihong Li and Michael L. Littman, Reducing reinforcement learning to KWIK online regression

Tenth International Symposium on Artificial Intelligence and Mathematics

http://www.springerlink.com/content/g25m74160311n665/fulltext.pdf

- [SL07] Alexander L. Strehl, Michael L. Littman, Online linear regression and its application to model-based reinforcement learning (NIPS 2007)

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.6591

- [WGL10] Thomas J. Walsh, Sergiu Goschin, Michael L. Littman: Integrating Sample-Based Planning and Model-Based Reinforcement Learning. AAAI 2010.

http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1880

- [WNLL07] Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman,

Planning and Learning in Environments with Delayed Feedback

Autonomous Agents and Multi-Agent Systems 18(1): 83-105 (2009)

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.677

- [JS10] Tobias Jung, Peter Stone: Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration.

ECML/PKDD (1) 2010: 601-616

http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ECML10-jung.pdf

- [JS09] Nicholas K. Jong, Peter Stone: Compositional Models for Reinforcement Learning.

ECML/PKDD (1) 2009: 644-659

http://www.springerlink.com/content/11460wl75p04493v/

- [GP10] M. Geist and O. Pietquin (2010) Kalman Temporal Differences

JAIR Volume 39, pages 483-532

http://www.jair.org/papers/paper3077.html

- [LT10] T. Lang and M. Toussaint (2010) Planning with Noisy Probabilistic Relational Rules

JAIR Volume 39, pages 1-49

http://www.jair.org/papers/paper3093.html

- [BBSE10] (Book), Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst (2010)

"Reinforcement Learning and Dynamic Programming Using Function Approximators"

in the Automation and Control Engineering series of Taylor & Francis CRC Press.

- [Mah09] Sridhar Mahadevan (2009) Learning Representation and Control in Markov Decision Processes: New Frontiers

Foundations and Trends in Machine Learning: Vol. 1: No 4, pp 403-565.

http://dx.doi.org/10.1561/2200000003

**Miscellaneous**

- [Gru04] P.D. Gruenwald.

Tutorial on minimum description length.

In Minimum Description Length: recent advances in theory and practice, Chapters 1 and 2. MIT Press, 2004.

- [BLA02] B. Ng, L. Peshkin, and A. Pfeffer.

Factored Particles for Scalable Monitoring.

In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.

# Background Reading

- L. P. Kaelbling and M. L. Littman and A. W. Moore,

Reinforcement learning: A Survey,

Journal of Artificial Intelligence Research, 4 (1996) 237—285

http://www.cs.cmu.edu/afs/cs.cmu.edu/project/jair/pub/volume4/kaelbling96a.pdf

- R. Sutton and A. Barto. Reinforcement learning: An introduction

Cambridge, MA, MIT Press (1998),

http://www.cs.ualberta.ca/~sutton/book/the-book.html

- [Put94] M.L. Puterman.

Markov Decision Processes - Discrete Stochastic Dynamic Programming.

Wiley, New York, NY, 1994.

- [KV86] P.R. Kumar and P.P. Varaiya.

Stochastic Systems: Estimation, Identification, and Adaptive Control.

Prentice Hall, Englewood Cliffs, NJ, 1986.

# Contact

Tom Everitt <ua.ude.una|ttireve.mot#ua.ude.una|ttireve.mot> or

Marcus Hutter <ua.ude.una|rettuh.sucram#ua.ude.una|rettuh.sucram>