Reinforcement Learning Reading Group

General Information

Welcome to the Reinforcement Learning Reading Group at RSCS@ANU

Regular (Past&Current) Participants:
Mostly students and researchers from RSCS@ANU and other RL friends nearby.

Reading List

The schedule for the reading group is given below and will be updated weekly.

Important: The reading group has become virtual. Please email len.du.public@gmail.com to join.

30.Nov.23 (01.Dec.23 AEST)
Casey S. Schroeder presents Decision as Pattern Recognition

The hypothesis is roughly that our actions are the result of recognizing a pattern with extension in time, with a certain 'choice' from a weighted probability distribution of movements, replacing 'active' components of the pattern, as they move to the present. It is a 'choice' in much the same way a quantum wave 'chooses' to collapse, perhaps not really a choice at all, but a probabilistic mechanics. This mechanism may account for most everyday practical reasoning.

23.Nov.23 (24.Nov.23 AEST)
Free-form discussion

09.Nov.23 (10.Nov.23)
Free-form discussion

02.Nov.23 (03.Nov.23)
Michael Bennett informally presents further content on previous work.

26.Oct.23 (27.Oct.23)
Extra free-form discussion

19.Oct.23 (20.Oct.23)
Free-form discussion

12.Oct.23 (13.Oct.23)
Cole Wyeth presents A Circuit Complexity Formulation of Algorithmic Information Theory

05.Oct.23 (06.Oct.23)
Free-form discussion

28.Sep.23 (29.Sep.23)
Vincent Abbott presents Neural Circuit Diagrams: Standardized Diagrams for Deep Learning Architectures

21.Sep.23 (22.Sep.23)
Free-form discussion

14.Sep.23 (15.Sep.23)
Michael Bennett presents A Unified Theory of Meaning, Consciousness and Artificial superintelligence (Best student paper at AGI-23)

07.Sep.23 (08.Sep.23)
Free-form discussion

31.Aug.23 (01.Sep.23)
Vincent Abbott presents Neural Circuit Diagrams: Standardized Diagrams for Deep Learning Architectures

24.Aug.23 (25.Aug.23)
Free-form discussion

17.Aug.23 (18.Aug.23)
Vincent Abbott presents Neural Circuit Diagrams: Standardized Diagrams for Deep Learning Architectures

10.Aug.23 (11.Aug.23)
Free-form discussion

03.Aug.23 (04.Aug.23)
No reading group

27.Jul.23 (28.Jul.23)
Free-form discussion

20.Jul.23 (21.Jul.23)
No reading group

13.Jul.23 (14.Jul.23)
Free-form discussion

6.Jul.23 (7.Jul.23)
No reading group

29.Jun.23 (30.Jun.23)
Free-form discussion

22.Jun.23 (23.Jun.23)
No reading group

15.Jun.23 (16.Jun.23)
Free-form discussion

8.Jun.23 (9.Jun.23)
No reading group

1.Jun.23
Free-form discussion

24.May.23
BREAK - in transition

17.May.23
Free-form discussion

10.May.23
Anna Winnicki presents A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

3.May.23
Free-form discussion

26.Apr.23
Ashish Jayant presents Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

19.Apr.23
Free-form discussion

12.Apr.23
Jakob Thumm presents Reducing Safety Interventions in Provably Safe Reinforcement Learning

5.Apr.23
Free-form discussion

29.Mar.23
TBA

22.Mar.23
Free-form discussion

15.Mar.23
David Quarel presents Git Re-Basin: Merging Models Modulo Permutation Symmetries

8.Mar.23
Free-form discussion

1.Mar.23
Arash Tavakoli presents Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

22.Feb.23
Free-form discussion

15.Feb.23
Samuel Alexander presents Extending Environments to Measure Self-reflection in Reinforcement Learning

8.Feb.23
Jinke He presents Online Planning in POMDPs with Self-Improving Simulators

1.Feb.23
Free-form discussion

25.Jan.23
Free-form discussion

18.Jan.23
Joar Skalse presents Defining and Characterizing Reward Hacking

11.Jan.23
BREAK

4.Jan.23
BREAK

28.Dec.22
BREAK

21.Dec.22
BREAK

14.Dec.22
Free-form discussion

7.Dec.22
(TBC) Joar Skalse presents Reinforcement learning in Newcomblike environments

30.Nov.22
Free-form discussion

23.Nov.22
Runze Tang presents his honors thesis on Procedural Content Generation using GANs

16.Nov.22
Free-form discussion

9.Nov.22
Samuel Alexander presents Agent mixtures and the genericness of non-deterministic intelligence

2.Nov.22
Free-form discussion

26.Oct.22
Samuel Yang-Zhao and Tianyu Wang present A Direct Approximation of AIXI Using Logical State Abstractions

19.Oct.22
Aram Ebtekar continues with Information dynamics and the arrow of time

12.Oct.22
Aram Ebtekar presents Information dynamics and the arrow of time

5.Oct.22
Free-form discussion

28.Sep.22
Steve Carr presents Safe Reinforcement Learning via Shielding for POMDPs

21.Sep.22
Free-form discussion

14.Sep.22
Matthew Aitchison presents DNA: Proximal Policy Optimization with a Dual Network Architecture

7.Sep.22
Free-form discussion

31.Aug.22
No reading group

24.Aug.22
Free-form discussion

17.Aug.22
Matthew Aitchison presents Is the Policy Gradient a Gradient?

10.Aug.22
Free-form discussion

3.Aug.22
No reading group (scheduled: Christopher Mingard continues with Neural networks are a priori biased towards Boolean functions with low entropy)

27.July.22
Free-form discussion

20.July.22
Samuel Alexander presents Can reinforcement learning learn itself? A reply to 'Reward is enough'

13.July.22
Free-form discussion

6.July.22
DavidQ discusses Context Tree Weighting

29.June.22
Free-form discussion

22.June.22
Michael Bennett presents Computable Artificial General Intelligence

15.June.22
Free-form discussion

8.June.22
Christopher Mingard presents Neural networks are a priori biased towards Boolean functions with low entropy

1.June.22
Free-form discussion

25.May.22
Samuel Alexander presents The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI

18.May.22
Free-form discussion

11.May.22
David Abel presents On the Expressivity of Markov Reward

4.May.22
Free-form discussion

27.Apr.22
Peter Vamplew presents Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton

20.Apr.22
Free-form discussion

13.Apr.22
Vembalagu "VJ" Vijendran discusses Quantum Algorithms for Reinforcement Learning

6.Apr.22
[no reading group]

30.Mar.22
Samuel Alexander presents Reward-Punishment Symmetric Universal Intelligence

23.Mar.22
Free-form discussion

16.Mar.22
Elliot discusses investigations into binarisation in reinforcement learning

9.Mar.22
Free-form discussion

2.Mar.22
Erik Rehn presents Free Will Belief as a consequence of Model-based Reinforcement Learning

23.Feb.22
Free-form discussion

16.Feb.22
Tomer Galanti presents On the Role of Neural Collapse in Transfer Learning

9.Feb.22
DavidQ continues presenting Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression

2.Feb.22
DavidQ presents Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression

26.Jan.22
Free-form discussion

19.Jan.22
DavidQ presents Memoryless policies: theoretical limitations and practical results

15.Dec.21
Preetum presents Turing-Universal Learners with Optimal Scaling Laws

10.Nov.21
Michael presents Shaking the foundations: delusions in sequence models for interaction and control

13.Oct.21
Elliot presents Reinforcement Learning with Information-Theoretic Actuation

29.Sept.21
Sultan presents ARENA

21.July'21
Matthew presents Muesli: Combining Improvements in Policy Optimization

26.May'21
Jonathon presents Monte-Carlo planning for Partially Observable Markov Games

28.Apr'21
Len presents Nondeterministic Turing machines as a practical pattern language for beyond-context-free patterns and replacement of (propositional) logic

14.Apr'21
Jaskirat presents Sparse Attention Guided Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning

31.Mar'21
Michael B discusses Defining Tasks, Intensional Solutions, and a Computational Theory of Meaning

17.Mar'21
Matthew presents Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

3.Mar'21
Elliot presents Universal Agents in Repeated Matrix Games

17.Feb'21
David Q presents Temporal Difference Updating without a Learning Rate

3.Feb'21
Michael C talks about Online Imitation Learning

16.Dec'20
David A continues presenting The Theory of Abstraction in Reinforcement Learning

9.Dec'20
David A presents The Theory of Abstraction in Reinforcement Learning

25.Nov'20
Marcus gives final presentation on Neural Network Approximation Theory

11.Nov'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory

28.Oct'20
Matthew continues to present Role-Based Deception in multi-agent games

21.Oct'20
Matthew presents Role-Based Deception in multi-agent games

14.Oct'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory

7.Oct'20
Elliot continues to present A Gentle Introduction to Quantum Computing Algorithms

30.Sept'20
Michael B discusses Fragility, Mimicry and Understanding: Why AI Lacks Human Adaptability, and How to Fix This

23.Sept'20
Jonathon presents A Distributional Perspective on Reinforcement Learning

16.Sept'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory

9.Sept'20
[no reading group]

2.Sept'20
Elliot continues to present The forget me not process

26.Aug'20
Elliot will present The forget me not process

19.Aug'20
Matthew presents Language Models are Few-Shot Learners

12.Aug'20
Michael C presents Quantilizers

5.Aug'20
[no reading group]

29.Jul'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory

15.Jul'20
Joel presents Gated Linear Networks

8.Jul'20
Michael B presents the Abstraction and Reasoning Corpus

1.Jul'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory

24.Jun'20
Elliot continues to present A Gentle Introduction to Quantum Computing Algorithms

10.Jun'20
Elliot presents A Gentle Introduction to Quantum Computing Algorithms

3.Jun'20
David J presents Decision theoretic foundations of causal modelling

27.May'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory

20.May'20
Michael B presents On the Measure of Intelligence

13.May'20
Michael C presents Pessimism About Unknown Unknowns Inspires Conservatism

06.May'20
Matthew presents Agent57: Outperforming the human Atari benchmark

29.Apr'20
Sultan presents A Neural Transfer Function for a Smooth and Differentiable Transition Between Additive and Multiplicative Interactions

22.Apr'20
Marcus presents an Introduction to Neural Network Approximation Theory

15.Apr'20
Meet and greet of new virtual reading group

28.Aug'19
Sam, Sultan and Elliot present experiences from IJCAI-2019

10.July'19
Matthew will present Large-Scale Study of Curiosity-Driven Learning

15.May'19
Elliot presents his proofs of Kolmogorov complexity theorems in HOL

24.Apr'19
James Parker presents progress on his Honours thesis on Feature Reinforcement Learning beyond Markov Decision Processes

17.Apr'19
Timothy presents progress on his Master's thesis on Non-Markovian State and State-Action Abstractions

20.Mar'19
Matthew will present Counterfactual Regret Minimization

20.Feb'19
Michael will continue to present his summer research

13.Feb'19
Michael will present his summer research

6.Feb'19
Elliot presents his experiences at AAAI-19

23.Jan'19
Matthew and Nikhil present their Summer Research Report

16.Jan'19
Matthew will present Learning to Navigate in Complex Environments

12.Dec'18
Sam will present Gradient Descent Finds Global Minima of Deep Neural Networks

5.Dec'18
[no reading group]

28.Nov'18
Nikhil and Matthew will present their recent research.

21.Nov'18
Omar will present A Generalized Representer Theorem

14.Nov'18
Samuel Yang-Zhao will present his honours thesis on Divergence of TD-like algorithms.

7.Nov'18
DavidQ presents his Masters thesis.

31.Oct'18
Advanced AI Group project presentation

24.Oct'18
Elliot will discuss Intelligence, Beyond Bounded Rationality, and Space-time embedded intelligence

17.Oct'18
[no reading group]

10.Oct'18
James Parker will Open the black box of Deep Neural Networks via Information

3.Oct'18
Omar Ghattas will talk about Multi-Agent Reinforcement Learning

26.Sept'18
Tor Lattimore will talk about Partial Monitoring and his new book

19.Sept'18
Christian Walder will present Neural Dynamic Programming for Musical Self Similarity

12.Sept'18
No reading group due to holidays

5.Sept'18
Michael discusses why local minima don't appear that often in high dimensions

29.Aug'18
Xavier presents Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

22.Aug'18
Sultan continues to present his experiences at conferences (ICML, IJCAI)

15.Aug'18
Sultan presents his experiences at conferences (ICML, IJCAI)

8.Aug'18
Elliot and Xavier present the results of OpenAI Five

1.Aug'18
Chamin presents Online Learning with Gated Linear Networks

25.Jul'18
DavidQ will present Computable Variants of AIXI which are More Powerful than AIXItl

18.Jul'18
Tianyu presents his honours thesis on Neural Causality Detection for Multi-dimensional Point Processes

11.Jul'18
[no reading group]

4.Jul'18
Sultan will present Q-Learning Beyond MDPs

27.Jun'18
[no reading group]

20.Jun'18
Thibaut presents An Outsider's Tour of Reinforcement Learning

13.Jun'18
[no reading group]

6.Jun'18
Michael shows us what Solomonoff's Universal Prior actually looks like

30.May'18
Elliot presents some papers about AGI

23.May'18
Thibaut presents Philosophy and the practice of Bayesian statistics

16.May'18
DavidJ talks about Causality

9.May'18
Michael presents his AGI safety research

2.May'18
Elliot talks about World Models

25.Apr'18
[public holiday, no reading group]

18.Apr'18
Visitor Thiebaux introduces himself and presents his PhD research

11.Apr'18
[no reading group]

4.Apr'18
[no reading group]

28.Mar'18
Showdown between One-boxers and Two-boxers, and Quining the survivors.

21.Mar'18
Xavier presents Functional Decision Theory

14.Mar'18
Tom presents AI Safety Gridworlds

7.Mar'18
Michael reports on his CHAI research internship experience at Berkeley

28.Feb'18
Elliot and Badri present some papers

21.Feb'18
Tom presents some papers

14.Feb'18
Elliot presents population based algorithms

7.Feb'18
Tom continues with Wireheading Taxonomy

31.Jan'18
Tom presents Wireheading Taxonomy

24.Jan'18
[no reading group]

17.Jan'18
Tom reports on his Google DeepMind internship experience.

6.Dec'17 - 10.Jan'18
[no reading group]

29.Nov'17
Elliot Catt presents his Thesis Progress on Quantum Computing

22.Nov'17
Group Excursion: Meet 13:00 at RSISE=BAB entrance (see email for details)

15.Nov'17
[no reading group]

8.Nov'17
[no reading group]

1.Nov'17
Owen presents his Honours Thesis on Universal Compression of Piecewise iid Sources

25.Oct'17
Xavier presents impressions from CFAR's AI Summer Fellows Program in San Francisco

18.Oct'17
[no reading group]

11.Oct'17
[no reading group]

4.Oct'17
Samuel presents Hindsight Experience Replay

27.Sep'17
Badri presents Convergence of Binarized CTW

20.Sep'17
Daoyi Dong from ADFA presents his research on Quantum RL and Quantum control theory

13.Sep'17
Elliot presents his formalization of TMs in HOL and proof of equivalence to PR functions.

6.Sep'17
[no reading group]

30.Aug'17
Sultan presents impressions from UAI

23.Aug'17
[no reading group]

16.Aug'17
[no reading group]

9.Aug'17
Arthur Franz presents incremental and hierarchical compression

2.Aug'17
Elliot practices conference talk

26.Jul'17
John practices conference talk

19.Jul'17
Tom presents Learning from Human Preferences

12.Jul'17
Tom presents applications of evidential semi-measures

5.Jul'17
John presents Deterministic Policy Gradient Algorithms

28.Jun'17
[No reading group]

21.Jun'17
Adam presents one of his recent research papers

14.Jun'17
Tom presents considerations on SARSA convergence: Gordon (1996) and Perkins and Precup (2003)

7.Jun'17
John presents FeUdal Networks for Hierarchical Reinforcement Learning in Room A105

31.May'17
Elliot presents The forget me not process

24.May'17
Sultan and Marcus present some AGI papers

17.May'17
Marcus presents Compress & Control

12.May'17 @12pm
Reading and discussing some UAI papers

10.May'17
Reading and discussing some UAI papers

3.May'17
Reading and discussing some UAI papers

26.April'17
Arie, Suraj and Elliot present MC-AIXI-CTW

19.April'17
Elliot presents Evolution Strategies as a Scalable Alternative to Reinforcement Learning

12.April'17
Tobias and Mikael continue

5.April'17
Tobias and Mikael present their Bachelor thesis on classifying games

29.March'17
Jarryd continues with generative adversarial networks for RL

22.March'17
Edward Barker presents Unsupervised Basis Function Adaptation for Reinforcement Learning

15.March'17
Jarryd presents generative adversarial networks for RL

8.March'17
Tor Lattimore talks about the adversarial/stochastic divide and some open problems there

1.March'17
Tom continues with The Delusionbox Problem.

22.Feb'17
Tom presents The Delusionbox Problem

15.Feb'17
Phuong Nguyen presents interesting experiences since leaving the group

8.Feb'17
Tor Lattimore talks about some open problems in online learning/statistics/RL.

1.Feb'17
Tor Lattimore presents The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

14.Dec'16
Jarryd presents Learning to reinforcement learn

07.Dec'16
Boris gives update on his project

30.Nov'16
Tom gives AI-Safety talk

23.Nov'16
Suraj presents his thesis (mid-semester update)

16.Nov'16
[No reading group]

9.Nov'16
Tom gives monitoring talk

2.Nov'16
[No reading group]

26.Oct'16 (at different time: 1pm)
James presents symmetry of algorithmic information

26.Oct'16 (at different time: 12pm)
Farhana presents Dimensionality of Spatio-Temporal Broadband Signals Observed Over Finite Spatial and Temporal Windows

19.Oct'16
Sultan presents his recent research insights
(thereafter group excursion, see email for details)

12.Oct'16
Jarryd presents exploration results (cont.)

5.Oct'16
John presents AIXIjs

28.Sep'16
Jarryd presents exploration results

21.Sep'16
Manlio presents On the Computability of AIXI

14.Sep'16
John speaks about visit to Bay Area, and Why does deep and cheap learning work so well?

7.Sep'16
Tom presents takeaways from US+UK trip: deep learning

31.Aug'16
Tom presents takeaways from US+UK trip: UC Berkeley and AGI

24.Aug'16
Tom presents takeaways from US+UK trip: New AI Safety research agendas: Google/OpenAI open safety problems and MIRI's machine learning agenda

17.Aug'16
Tom presents takeaways from US+UK trip: Mainly Cooperative inverse reinforcement learning

3.Aug'16
Manlio Valenti from Trento introduces Upper-SemiComputable SemiMeasures

27.July'16
John and Sean present progress on Interactive GRL Demo

20.July'16
Break

13.July'16
Sultan presents 2 papers

22.June'16
Jarryd presents Unifying Count-Based Exploration and Intrinsic Motivation

15.June'16
Xian Wang presents his research on …

6.June'16 (obs: Monday)
Tom continues with AIXI tutorial

1.June'16
George Stamatescu presents KL Divergence and Reciprocal Chains

31.May'16
Tom presents AIXI tutorial

25.May'16
Gerhard Visser presents Interest-Relative Inductive Inference
thesis draft (unpublished)

11+18.May'16
Break

4.May'16
John presents Pedro A. Ortega, Naftali Tishby (2016) Memory controls time perception and inter-temporal choices

27.April'16
Tom presents wireheading result

20.April'16
Sultan continues AGI reviews

13.April'16
Sultan AGI reviews

6.April'16
Tom and Sultan UAI reviews

30.Mar'16
Jan "defends" his thesis
(in room A105)

23.Mar'16
Discussion of UAI reviews

16.Mar'16
Jan continues to talk about conferences from 2015

9.Mar'16
Sultan presents State of the Art Control of Atari Games Using Shallow Reinforcement Learning

4.Mar'16 11:30 EXTRA SESSION
Adam Case presents

2.Mar'16
Jan presents Safely Interruptible Agents

24.Feb'16
Djallel Bouneffouf presents

17.Feb'16
Jan continues to talk about conferences from 2015

10.Feb'16
No reading group.

3.Feb'16
Jan continues to talk about conferences from 2015

27.Jan'16
Tom presents Owain Evans' paper Learning the Preferences of Ignorant, Inconsistent Agents

20.Jan'16
Jan talks about conferences from 2015

16.Dec'15
Tom summarises the Australasian AI conference, and maybe continues with preliminary results on the wireheading problem.

9.Dec'15
Jae Hee Lee presents his PhD thesis Qualitative Reasoning about Relative Directions: Computational Complexity and Practical Algorithm

2.Dec'15
Break for Australian AI conference

25.Nov'15
Tom presents summary of MIRIx workshop.

18.Nov'15
Tom presents preliminary results on the wireheading problem.

11.Nov'15
Daniel continues with Agents Using Speed Priors

4.Nov'15
Daniel presents Agents Using Speed Priors

28.Oct'15
David presents Practical Extreme State Aggregation

21.Oct'15
Matt Alger presents a project on Deep Inverse Reinforcement Learning

14.Oct'15
Aqua Zhu presents background on classical sequence prediction and related problems.

7.Oct'15
Tom presents Analytical Results on the BFS vs. DFS Algorithm Selection Problem, Part II: Graph Search

23.Sep'15
Tor Lattimore presents Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

16.Sep'15
Spring break

9.Sep'15
Spring break

2.Sep'15
Tom presents Analytical Results on the BFS vs. DFS Algorithm Selection Problem, Part I: Tree Search

26.Aug'15
Marcus continues presenting impressions from ICML/EWRL

19.Aug'15
Hadi Afshar presents Reflection, Refraction, and Hamiltonian Monte Carlo
Recommended (background) reading

12.Aug'15
Tom presents Sequential Extensions of Causal and Evidential Decision Theory

5.Aug'15
Reading group resumes. Marcus presents impressions from ICML/EWRL

24.June'15 - 29.July'15
Winter break

17.June'15
Yiyun presents Modelling Causal Reasoning with Ambiguous Observations and Quantum Probability Model of "Zero-Sum" Beliefs

10.June'15
Mayank presents Neural Turing Machines

3.June'15
Continue discussing reviews for ALT

27.May'15
Discussing reviews for ALT

20.May'15
Jan continues from last time

13.May'15
Jan talks about merging and predicting,
in particular the results from Merging and Learning and
On Sequence Prediction for Arbitrary Measures

6.May'15
Continued discussion of AGI reviews

29.Apr'15
Discussing reviews for AGI

15. and 22.Apr'15
No reading group

8.Apr'15
Jan presents Reflective Oracles: A Foundation for Classical Game Theory

1.Apr'15
Jan presents Reflective Variants of Solomonoff Induction and AIXI

25.Mar'15
Yiyun presents Cognitive processes and mechanisms in causal reasoning with ambiguous observations

18.Mar'15
Marcus presents Compress and Control

11.Mar'15
Daniel presents the current status of his work on the speed prior
The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions

4.Mar'15
Jan presents a journal paper under review

18. and 25. Feb'15
No Reading Group

11.Feb'15
Tom presents Can we measure the difficulty of an optimization problem?

4.Feb'15
Mayank presents selected papers from ACML 2014

28.Jan'15
Jan presents Corrigibility
https://intelligence.org/files/CorrigibilityTR.pdf

21.Jan'15
Mayank gives a tutorial on convex optimization

10.Dec'14
PhD Monitoring Hadi (Room RSISE B123)

19'Nov'14
Peter leads discussion on the new book (with a focus on chapter 7)
Ethical Artificial Intelligence by Bill Hibbard
http://arxiv.org/ftp/arxiv/papers/1411/1411.1373.pdf
Bill builds on UAI, decision theoretic rationality, space-time embedded agents etc. to formally study ethical AI.

12'Nov'14
Xi Li presents on Leibniz's program and its relation to UAI

5.Nov'14
Daniel Filan talks about Extreme state aggregation beyond MDPs

29.Oct'14
Neal Hughes (economics PhD student) presents on using RL for water management
Note: it's in B123

22.Oct'14
Tom Butler presents his honors thesis
Fuzzy Expert System Evolution: Increasing the accessibility of intelligent controllers

15.Oct'14
PhD Monitoring Mayank & Jan (Room RSISE B123)

8.Oct'14
Break

1.Oct'14
Break

24.Sep'14
Daniel Filan presents about the speed prior
http://link.springer.com/chapter/10.1007%2F3-540-45435-7_15

17.Sep'14
Jan presents Teleporting Universal Agents by Laurent Orseau AGI'2014
http://www.agroparistech.fr/mia/equipes:membres:page:laurent:teleport

10.Sep'14
Hadi presents his most recent work on symbolic Gibbs sampling

3.Sep'14
Peter reports from AAAI'2014

Integrating representation learning and temporal difference learning:
A matrix factorization approach by M. White
http://webdocs.cs.ualberta.ca/~whitem/publications/14aaaiw-crtd.pdf
with a closely related alternative
http://webdocs.cs.ualberta.ca/~whitem/publications/14aaaiw-frrl.pdf
Active Learning with Model Selection by A. Ali, R. Caruana and A. Kapoor
http://research.microsoft.com/en-us/um/people/akapoor/papers/AAAI2014.pdf
Natural Temporal Difference Learning by W. Dabney and P. Thomas
http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/viewFile/8568/8913

27.Aug'14
Peter reports from CogSci'2014.
Toward Boundedly Rational Analysis by Thomas Icard
http://web.stanford.edu/~icard/cogsci14.pdf
A Bounded Rationality Account of Wishful Thinking by R. Neumann, A. N. Rafferty, T. L. Griffiths
http://cocosci.berkeley.edu/anna/papers/WishfulThinking.pdf
The high availability of extreme events serves resource-rational decision-making by Lieder, Wills, Hsu, Griffiths
http://cocosci.berkeley.edu/falk/HighAvailabilityOfExtremeEvents.pdf
and a related recent journal paper providing the background for the above
One and Done? Optimal Decisions From Very Few Samples by Edward Vul, Noah Goodman, Thomas L. Griffiths and Joshua B. Tenenbaum
http://web.stanford.edu/~ngoodman/papers/VulGoodmanGriffithsTenenbaum-COGS-2014.pdf
Information vs Reward in a changing world by Navarro and Newell
http://health.adelaide.edu.au/psychology/ccs/docs/pubs/2014/NavarroNewell2014.pdf
Uncertainty and Exploration in a restless bandit task by Speekenbrink and Konstantinidis
http://www.psychol.ucl.ac.uk/m.speekenbrink/articles/cogsci2014.pdf

16.July'14 — 20.Aug'14
Currently no meetings planned, but check a day in advance or volunteer to present something.

9.July'14
Jan presents Yudkowsky, Eliezer and Herreshoff, Marcello.
Tiling Agents for Self-Modifying AI, and the Löbian Obstacle
https://intelligence.org/files/TilingAgents.pdf
and
Problems of self-reference in self-improving space-time embedded intelligence. Benja Fallenstein and Nate Soares. AGI 2014.
https://intelligence.org/wp-content/uploads/2014/05/Fallenstein-Soares-Problems-of-self-reference-in-self-improving-space-time-embedded-intelligence.pdf

2.July'14
Peter gives practice talk for Quebec conference.
Note B123 and we start on time since the room has other events at 12:20.
Please arrive no later than 11:30 (always applies but in particular this week).

25.June'14
Marcus presents his ALT paper on Offline to Online Conversion.

20.June'14 Note, this is a Friday! Time 2pm
Daniel Cotton presents his ASC project on Reinforcement learning in computer science and psychology
Followed by Tony Allard giving his monitoring talk at 3pm on Logistics Planning.

18.June'14
Jan presents overview of MIRI's recent research

11.June'14
Break

4.June'14
Daniel Filan presents his ASC project on AIXI convergence

28.May'14
Mayank talks about game playing competition and reports on his progress.
In particular a report to Marcus and Peter, but others welcome.

21.May'14
Marcus presents his ALT submission Extreme State Aggregation Beyond MDPs

14.May'14
Paper reviewing discussions

7.May'14
No reading group

30.Apr'14
Paper reviewing discussions

23.Apr'14
Tor (visiting 23.-25.Apr) presents "Memory Allocation Bandits"

16.Apr'14
Paper reviewing discussions

9.Apr'14
Monitoring in RSISE seminar room
11:30 Hadi
12:00 Mayank
12:30 Jan
13:00-14:00 feedback.

2.Apr'14
Jan presents
Marcus Hutter: Discrete MDL Predicts in Total Variation. NIPS'09
http://arxiv.org/abs/0909.4588

26.Mar'14
Mayank presents "Cover Tree Bayesian Reinforcement Learning" by Nikolaos Tziortziotis, Christos Dimitrakakis and Konstantinos Blekas.
http://arxiv.org/pdf/1305.1809v1

12,19.Mar'14
Break due to travels and deadlines

5.Mar'14
Marcus talks about new extension of the context tree weighting algorithm

26.Feb'14
Peter presents "Using Expectation-Maximization for Reinforcement Learning" by Dayan and Hinton 1997
http://www.gatsby.ucl.ac.uk/~dayan/papers/rpp97.pdf
with a discussion of what has happened afterwards which includes Bayesian MCMC alternatives to the original frequentist EM approach, e.g.
http://www.stanford.edu/~ngoodman/papers/WingateEtAl-PolicyPrios.pdf
This line of work, which includes many papers from the last 5 years, is often called planning as inference
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/publications/12-botvinick-TICS.pdf

19.Feb'14
Alex presents "Changing tastes and Coherent Dynamic Choice" by Peter J. Hammond
http://www.jstor.org/stable/2296609

12.Feb'14
Mayank continues from last week

5.Feb'14
Mayank presents "Efficient Learning and Planning with Compressed Predictive States".
William Hamilton, Mahdi Milani Fard and Joelle Pineau.
http://arxiv.org/abs/1312.0286

29.Jan'14
Reading group restarts for 2014 with Peter talking about "Rationality, Optimism and Guarantees in General Reinforcement Learning" in the RSISE seminar room as an AI seminar. Please note 12:00-13:00 !

18'Dec'13-
MaxEnt, Xmas, New Year

11'Dec'13
Johannes presents his work on counter-examples in reinforcement learning

6'Dec'13
Tor's last day, at least here at ANU. Talk, farewell lunch etc. Details later

4'Dec'13
Hadi gives monitoring talk

27'Nov'13
Rachael continues from the 16th of Oct with the voting part of the paper.
Note, the talk will be in R214 in the Ian Ross building!

20'Nov'13
Ian Hon presents his honours thesis

12'Nov'13
Tony Allard monitoring talk. Note Tuesday! 3pm in the RSISE seminar room

13'Nov'13
ACML workshop (organized by Peter Sunehag, Marcus Hutter, Mark Reid) on theory and practice in Machine Learning at the Manning Clark Centre, ANU
https://sites.google.com/site/mltheoryandpractice/

14,15'Nov'13
ACML at ANU

6'Nov'13
Johannes presents "The Fixed Points of Off-Policy TD" by J. Zico Kolter NIPS 2011.
http://books.nips.cc/papers/files/nips24/NIPS2011_1200.pdf

30'October'13
Mayank gives monitoring talk

23'October'13
Peter talks about "Learning from human generated rewards", based on a sequence of papers making up the PhD thesis of Bradley Knox (http://www.bradknox.net/) supervised by Peter Stone, primarily: W. Bradley Knox and Peter Stone. Learning Non-Myopically from Human-Generated Reward. In Proceedings of the International Conference on Intelligent User Interfaces (IUI), March 2013.
http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/iui13-knox.pdf

16'October'13
Rachael Briggs presents her article Decision-Theoretic Paradoxes as Voting Paradoxes,
Philosophical Review 2010 Volume 119, Number 1: 1-30
http://philreview.dukejournals.org/content/119/1/1.abstract

2,9'Oct'13
Break due to travels

25'Sep'13
Tor presents (More) Efficient Reinforcement Learning via Posterior Sampling, NIPS'2013
Ian Osband, Daniel Russo and Benjamin Van Roy
http://arxiv.org/pdf/1306.0940v1.pdf

18'Sep'13
Mayank presents,
Incremental Basis Construction from Temporal Difference Error by Yi Sun, Faustino Gomez, Mark Ring, Jurgen Schmidhuber in ICML 2011.
Paper @ http://www.idsia.ch/~juergen/icml2011sun.pdf
Slides @ http://www.idsia.ch/~sun/doc/icml11-ftr-slides.pdf

11'Sep'13
Tor talks about best arm identification in bandits

4'Sep'13
Peter presents,
Temporal-Difference Search in Computer Go by Silver, D., Sutton, R. S., Mueller, M in ICAPS 2013 http://www.aaai.org/ocs/index.php/ICAPS/ICAPS13/paper/view/6037/6227
and in Machine Learning 87(2):183-219 2012
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/tdsearch.pdf

28'Aug'13
Mayank presents.
Bruno Scherrer. "Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view" in Proceedings of the 27th International Conference on Machine Learning (2010).
http://www.icml2010.org/papers/654.pdf
Slides available here,
http://www.loria.fr/~scherrer/presentations/tdbr.pdf

24'July'13
Tor presents things from conference travel to ICML/COLT.

17'July'13
Peter presents tutorial on Exploration vs Exploitation as practice before EWRL.
Probably downstairs in the seminar room

9'July'13 (note Tuesday!, 11:30)
Hadi presents his TPR

3'July'13
Scott presents (at 11)
S. Sanner, K. V. Delgado, and L. N. de Barros (2011). Symbolic Dynamic Programming for Discrete and Continuous State MDPs. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). Barcelona, Spain.
http://users.cecs.anu.edu.au/~ssanner/Papers/cont_mdp.pdf

26'June'13
Monitoring talk by Ehsan (NICTA)

19'June'13
Monitoring talks by David and Suvash downstairs RSISE seminar room at 12

12'June'13
Integrating Partial Model Knowledge in Model Free RL Algorithms
Aviv Tamar and Dotan Di Castro and Ron Meir
International Conference on Machine Learning (ICML), 2011
http://www.icml-2011.org/papers/222_icmlpaper.pdf
Mayank presents

5'June'13
S. Thiebaux, C. Gretton, J. Slaney, D. Price and F. Kabanza (2006) "Decision-Theoretic Planning with non-Markovian Rewards", Journal of Artificial Intelligence Research, Volume 25, pages 17-74
http://www.jair.org/papers/paper1676.html
Charles Gretton presents

29'May'13
Tor and Peter present

22'May'13
Monitoring

15'May'13
Monitoring

8'May'13
Peter presents "Online Feature Selection for Model-based Reinforcement Learning" ICML'2013 by Trung Thanh Nguyen, Zhuoru Li, Tomi Silander and Tze-Yun Leong http://jmlr.csail.mit.edu/proceedings/papers/v28/nguyen13.pdf

1'May'13
Marcus presents his COLT paper on sparse adaptive Dirichlet-multinomial-like Processes

24'April'13
Hadi and Tor give monitoring talks

17'April'13
Wen and Mayank give monitoring talks

10'April'13
Mayank continues from last time on over-estimation in Q-learning

3'April'13
Mayank presents Double-Q learning and associated paper
http://books.nips.cc/papers/files/nips23/NIPS2010_0208.pdf

27'March'13
Ian Hon continues the survey on large alphabet sources and compression based on
http://www.cs.technion.ac.il/~ronbeg/begleiter-papers/begleiter06a.pdf
http://www.sps.ele.tue.nl/members/f.m.j.willems/research_files/CTW/benelux94-tjalkens-willems-shtarkov.pdf

20'March'13
Wen presents a survey on text (large alphabet) modeling. Relevant papers are
http://www2.denizyuret.com/ref/goodman/chen-goodman-99.pdf
http://acl.ldc.upenn.edu/P/P06/P06-1124.pdf

18'March'13
Oscillation-free epsilon-random sequences, Ludwig Staiger

13'March'13
Tor presents Thompson Sampling: An Asymptotically Optimal Finite Time Analysis, ALT'2012
Emilie Kaufmann, Nathaniel Korda, Rémi Munos
http://arxiv.org/abs/1205.4217

6'March'13
Peter presents "A Dantzig Selector Approach to Temporal Difference Learning", Matthieu Geist, Bruno Scherrer, Alessandro Lazaric and Mohammad Ghavamzadeh, ICML 2012 http://icml.cc/2012/papers/703.pdf

27'February'13
Wen presents "Delusion, Survival, and Intelligent Agents" by Mark B. Ring, Laurent Orseau, AGI'2011
http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=28BAE7205B795D39B357E46822EB4A4D?doi=10.1.1.232.9313

13'February'13
Tor presents "Universal Knowledge-Seeking Agents" by Laurent Orseau, ALT'2011
http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/orseau-ALT-2011-knowledge-seeking.pdf

6'February'13
Peter presents "Space-Time Embedded Intelligence" by Laurent Orseau and Mark Ring, AGI'2012
http://agi-conference.org/2012/wp-content/uploads/2012/12/paper_76.pdf

30'January'13
Tom presents his (draft) Master's thesis about (No) Free Lunch theorems for optimization

23'January'13
Nam talks about learning theory

16'January'13
Hadi presents the Loewenheim Skolem Theorem & Proof
http://en.wikipedia.org/wiki/L%C3%B6wenheim%E2%80%93Skolem_theorem
and Marcus the Skolem Paradox and its resolution
http://en.wikipedia.org/wiki/Skolem%27s_paradox

9'January'13
Marco presents Wouter M. Koolen, Dimitri Adamskiy, Manfred K. Warmuth (NIPS 2012) Putting Bayes to sleep
http://www.cs.rhul.ac.uk/~wouter/Papers/sleep.pdf

19'December'12
End of year meeting. Marcus presents Fun with Bayesian & Decision & other paradoxes.

12'December'12
Summer scholars present their topics and informal question and answer session.

28'November'12
Marco leads readings on extensions of CTW
Volf, P., & Willems, F. (1997). A context-tree branch-weighting algorithm. SYMPOSIUM ON INFORMATION THEORY IN THE …. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.121.7873&rep=rep1&type=pdf
Willems, F. M. J. (1996). Context weighting for general finite-context sources. Information Theory, IEEE …, 42(5), 1514–1520. doi:10.1109/18.532891
Willems, F. M. J. (1998). The context-tree weighting method: extensions. IEEE Transactions on Information Theory, 44(2), 792–798. doi:10.1109/18.661523

21'November'12
Peter leads discussions on
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011) How to grow a mind: Statistics, structure, and abstraction. Science, 331, 1279-1285.
http://www.sciencemag.org/content/331/6022/1279.full.pdf

14'November'12
Monitoring: Wen and Mayank

7'November'12
Tom talks about his literature review on meta rationality

31'October'12
Xinjue gives practice talk on "Exploration in Bayesian Reinforcement Learning".

24'October'12
Monitoring talks

17'October'12
Peter presents "A Bayesian Sampling Approach to Exploration in Reinforcement Learning" by Asmuth, Li, Littman, Nouri, Wingate
http://web.mit.edu/~wingated/www/papers/boss.pdf

10'October'12
Hadi gives practice talk

3'October'12
Tor gives practice talk

26'September'12
Mayank gives practice talk

19'September'12
Marcus presents. See his email

12'September'12
Phuong gives practice talk for monitoring

5'September'12
Joel Veness gives a talk in the Seminar room downstairs on further developments of MC-AIXI.

29'August'12
Hadi talks about AGI papers

22'August'12
Phuong presents "TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration" by B. C. Silva and A. G. Barto
http://people.cs.umass.edu/~bsilva/deltaPi_aaai2012.pdf

15'August'12
Mayank presents Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. Available at http://webdocs.cs.ualberta.ca/~sutton/papers/horde-aamas-11.pdf .

8'August'12
Phuong summarizes two papers from AAAI12

  • D. Lee and W. B. Powell, Intelligent Battery Controller Using Bias-Corrected Q-learning

http://energysystems.princeton.edu/Papers/Lee_Powell_AAAI2012_BiasCorrectedQLearning.pdf

  • W. Dabney and A. G. Barto, Adaptive Step-Size for Online Reinforcement Learning

http://people.cs.umass.edu/~wdabney/papers/alphaBounds.pdf

1'August'12

25'July'12

18 July'12

11 July'12

  • Phuong doing test run for AAAI presentation

27 June 12

20 June 12

  • Cancelled

13'June'12

  • Phuong presents his work on Regret bounds for feature reinforcement learning where he extends the work by Maillard, Munos and Ryabko to the countable case.

30'May'12

  • Mayank talks about Predictive State Representations
  • Littman, Michael L.; Richard S. Sutton; Satinder Singh (2002). "Predictive Representations of State". Advances in Neural Information Processing Systems 14 (NIPS). pp. 1555–1561.
  • Singh, Satinder; Michael R. James; Matthew R. Rudary (2004). "Predictive State Representations: A New Theory for Modeling Dynamical Systems". Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI). pp. 512–519.

10.May'12

  • Wen reports from DCC 2012 on three papers (emailed out)

2.May'12

18.April'12

  • On Nicod's condition and the black raven paradox
    The paper is available from Hadi or Peter by email

4.April'12

21.Mar'12

14.Mar'12

29.Feb'12

  • Wen will be presenting his TPR on compression

15.Feb'12

  • PAC bounds for Discounted MDPs, presented by Tor. Email ua.ude.una|eromittal.rot#ua.ude.una|eromittal.rot for a copy of the paper.

8.Feb'12

  • Some inequalities in probability theory, presented by Tor

7.Dec'11

30.Nov'11

  • Solomonoff Memorial conference in Melbourne. Tor, Ian, Peter and Wen presenting.

23.Nov'11

  • An approximation of the universal intelligence measure, Shane Legg and Joel Veness
    http://jveness.info/publications/rsmc2011%20-%20aiq.pdf
    presented by Wen as a practice talk for Solomonoff Memorial
    Further discussion of the paper follows the 20-minute presentation with slides

16.Nov'11

9.Nov'11

  • We are done with the book. Paper reading resumes next week.

Nov'11

  • Peter presents chapter 7 and Joseph Chapter 8.

Oct'11

  • Wen presents Chapter 6

Sep'11

  • After a break for the first two weeks, Phuong presents chapter 5

Aug'11

  • Tor presents chapter 4

July'11

  • Daniel finishes chapter 3

30.June'11

  • Daniel presents chapter 3 of "Neuro-dynamic programming"

8,15,23.June'11

  • Mayank presents chapter 2 of "Neuro-dynamic programming"

1.June'11

25.May'11

18.May'11

11.May'11

  • Meeting, discussing paper reviewing

4.May'11

27.April'11

20.April'11

13.April'11

6.April'11

  • Mayank presents [Hut09] M. Hutter, Feature dynamic Bayesian networks.
    In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.
    http://www.hutter1.net/ai/phidbn.pdf

30.Mar'11

23.Mar'11

16.Mar'11

  • Phuong presents

9.Mar'11

4.Mar'11
"Pascal" Workshop on RL and Planning at NICTA Level 3, Meeting Room D

  • 10:30 — 11:00 | Pascal Poupart: Explaining Automated Policies for Sequential Decision Making
  • 11:00 — 11:18 | Debdeep Banerjee: Partial Order Support Link Scheduling
  • 11:18 — 11:36 | Patrik Haslum: A Quick Overview of Factored (Classical) Planning
  • Break — 12 minutes
  • 11:48 — 12:06 | Scott Sanner: The Relational Dynamic Influence Diagram Language
  • 12:06 — 12:24 | Peter Sunehag: History-based Reinforcement Learning
  • 12:24 — 12:42 | Matt Robards: Model-Based Reinforcement Learning With Function Approximation
  • 12:42 — 13:00 | Will Uther: topic TBD

2.Mar'11

23.Feb'11

  • Tor and Hassan present …

16.Feb'11

  • Matthew and Peter present …

9.Feb.'11

15.Dec.'10

  • Chapters 6-8, Sridhar Mahadevan, "Learning Representation and Control in Markov Decision Processes: New Frontiers".
    Foundations and Trends in Machine Learning (editor: Michael Jordan), vol. 1, no. 4, pp. 403-565 (163 pages), 2009.
    http://www.cs.umass.edu/~mahadeva/papers/ml-found-trend.pdf
    Presented by Scott

8.Dec.'10

  • Statistical physics of social dynamics
    Castellano, C., Fortunato, S., and Loreto, V. 2009. Reviews of Modern Physics 81, 2, 591
    Section IV. Cultural Dynamics, Parts A & B (Axelrod model and variants)
    http://dx.doi.org/10.1103/RevModPhys.81.591
    Presented by Ian Wood

1.Dec.'10

  • Constrained Complexity Generalized Context-Tree Algorithms, Robert J Drost and Andrew C Singer
    http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04301233
    Presented by Peter

24.Nov'10

  • Hopefully the following will be presented: Efficient real-time dynamic programming for factored MDPs.
    Honors thesis by Sotirios Diamand. NOTE: In N101.

17.Nov'10

10.Nov'10

  • Efficient real-time dynamic programming for factored MDPs.
    Honors thesis by Sotirios Diamand. CANCELLED

3.Nov'10

  • Cancelled

27.Oct'10

20.Oct'10

13.Oct'10

22.Sep'10, in Room 207 (only ours to 12:30, be on time)

15.Sep'10

I list two more interesting papers that are more RL related but harder (I think)

And another making the case for pursuing robust estimators in general:

8.Sep'10

  • No Free Lunch and Occam's Razor in Supervised Learning
    (Tor presents his work)

18,25.Aug,1.Sep'10

  • Chapter 8 of Universal AI book
    (general discussion)

11.Aug'10

  • New TD algorithms from Alberta
    (Matthew presents a survey of work by Maei, Sutton and their colleagues)

4.Aug'10

  • MC-AIXI-CTW,
    (Joel Veness) NOTE LOCATION: A207

28.July'10

  • End of Chapter 7 of Universal AI book
    (Tor presents)

21.July'10

  • Meeting about Advanced AI course

Aug'10

9,16.Jun'10

  • Chapter 7 of Universal AI book
    (Presented by Phuong)

2.Jun'10

  • Cancelled

26.May'10

  • Matthew Robards and Peter Sunehag and Scott Sanner
    RKHS Temporal Difference Learning
    Tech Report, The Australian National University

28.Apr&05,19.May'10

  • Chapter 6 of Universal AI book
    (Presented by Peter, Marcus, Zhara)

21.April'10

For more details,

14.April'10

  • L. Kocsis, Cs. Szepesvári
    Bandit Based Monte-Carlo Planning
    In Proceedings of the 17th European Conference on Machine Learning
    Springer-Verlag, Berlin, LNCS/LNAI 4212, September 18-22, pp. 282-293, 2006.
    http://www.sztaki.hu/~szcsaba/papers/ecml06.pdf
    (Presented by Peter)

7.April'10

31.Mar'10

17&24.Mar'10

3&10.Mar'10

  • Joel Veness and Kee Siong Ng and Marcus Hutter and David Silver
    A Monte Carlo AIXI Approximation
    Technical Report, arXiv 0909.0801 (2009) 1-42
    [implementation & application of the AIXI]
    http://www.hutter1.net/ai/aixictw.pdf
    (Presented by Sam)

3&10&17&24.Feb'10

27.Jan'10

20.Jan'10

  • Continuation of last week's paper + Summer Scholar presentation preview.

13.Jan'10

23&30.Dec'09

  • Break - (-: Christmas and New Years :-)

16.Dec'09

Scott will be presenting:

09.Dec'09

  • [RP08] S. Ross and J. Pineau.
    Model-based Bayesian reinforcement learning in large structured domains.
    In Proc. 24th Conference in Uncertainty in Artificial Intelligence
    (UAI'08), pages 476-483, Helsinki, 2008. AUAI Press.
    http://www.cs.mcgill.ca/~jpineau/files/sross-uai08.pdf
    (Presented by Peter)

02.Dec'09

  • Note: Changed from before.
    M. Rosencrantz, G. Gordon, and S. Thrun.
    Learning low dimensional predictive representations.
    In Proceedings of the Twenty-First International Conference on Machine Learning,
    Banff, Alberta, Canada, 2004.
    http://robots.stanford.edu/papers/Rosencrantz04a.pdf
    (Presented by Ian)

25.Nov'09

  • [SJR04] S.P. Singh, M.R. James, and M.R. Rudary.
    Predictive state representations: A new theory for modeling dynamical systems.
    In Proc. 20th Conference in Uncertainty in Artificial Intelligence (UAI'04), pages 512-518, Banff, Canada, 2004. AUAI Press.
    (Presented by Hassan)

18.Nov'09

  • Matthew Robards will present his literature review on reinforcement learning in large, continuous spaces (focus on Part II).
    Literature Review

11.Nov'09

  • [SLL09] A.L. Strehl, L. Li, and Michael L. Littman.
    Reinforcement learning in finite MDPs: PAC analysis.
    http://paul.rutgers.edu/~strehl/, 2009.
    (Presented by Marcus)

04.Nov'09

28.Oct'09

Scott will be talking about several nice methods for solving MDPs efficiently.
The 4 papers to be covered are summarized in the following slides:

http://sml.nicta.com.au/rlp08/RLP_MDP_Extensions.pdf

The papers themselves are as follows (it's recommended that people read the first one
and skim through the others).

  • Hierarchical Solution of Markov Decision Processes using Macro-actions.
    Milos Hauskrecht and Nicolas Meuleau and Leslie Pack Kaelbling and Thomas Dean and Craig Boutilier.
    UAI 1998.
    Note: this paper builds on the macro action semi-MDP framework of Sutton & Precup, but makes some
    important changes which make things much cleaner (theoretically and implementationally).
    http://www.cs.toronto.edu/kr/papers/macros.pdf

21.Oct'09

Addendum: Policy gradient techniques from a robotics perspective:

14.Oct'09

  • [NCD04] A.Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang.
    Autonomous inverted helicopter flight via reinforcement learning.
    In ISER, volume 21 of Springer Tracts in Advanced Robotics, pages 363-372. Springer, 2004.
    (Presented by Phuong)

30.Sep'09&7.Oct'09

  • [RPPC08] S. Ross, J. Pineau, S. Paquet, B. Chaib-draa,
    Online planning algorithms for POMDPs,
    Journal of Artificial Intelligence Research, 32 (2008) 663-704.
    This paper compares the "online" "tree-search" planning approach, popular for games,
    with the "offline" "self-consistent" Bellman equation approach,
    popular in reinforcement learning (and described by Kaelbling et al. 1998).
    (Presented by Peter).

16&23.Sep'09

  • [KLC98] L.P. Kaelbling and M.L. Littman and A.R. Cassandra,
    Planning and Acting in Partially Observable Stochastic Domains
    Artificial Intelligence, 101 (1998) 99-134
    (Presented by Marcus/Hassan/Sarah)

Papers in Queue

Neural Network Papers

  • Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative Adversarial Networks. ArXiv:1406.2661 [Cs, Stat].
    http://arxiv.org/abs/1406.2661
  • Muthukumar, V., Vodrahalli, K., & Sahai, A. (2019). Harmless interpolation of noisy data in regression. ArXiv:1903.09139 [Cs, Stat].
    http://arxiv.org/abs/1903.09139
  • Du, S. S., Lee, J. D., Li, H., Wang, L., & Zhai, X. (2018). Gradient Descent Finds Global Minima of Deep Neural Networks. ArXiv:1811.03804 [Cs, Math, Stat].
    http://arxiv.org/abs/1811.03804
  • Arjovsky, M., & Bottou, L. (2017). Towards Principled Methods for Training Generative Adversarial Networks. ArXiv:1701.04862 [Cs, Stat].
    http://arxiv.org/abs/1701.04862
  • Graves, A. (2014). Differentiable neural computers. Deepmind. /blog/article/differentiable-neural-computers
  • Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing Machines. ArXiv:1410.5401 [Cs].
    http://arxiv.org/abs/1410.5401
  • Mazumdar, E. V., Jordan, M. I., & Sastry, S. S. (2019). On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games. ArXiv:1901.00838 [Cs, Math, Stat].
    http://arxiv.org/abs/1901.00838

Deepmind Papers

General POMDPs

State Abstractions for RL

  • [GDG03] R. Givan, T. Dean, and M. Greig.
    Equivalence notions and model minimization in Markov decision processes.
    Artificial Intelligence, 147(1-2):163-223, 2003.
  • [Hut09a] M. Hutter.
    Feature dynamic Bayesian networks.
    In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.

General MDPs

  • [LLW08] Lihong Li, Michael L. Littman, Thomas J. Walsh: Knows what it knows: a framework for self-aware learning. ICML 2008: 568-575
    www.machinelearning.org/archive/icml2008/papers/627.pdf
  • [BBSE10] (Book), Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst (2010)
    "Reinforcement Learning and Dynamic Programming Using Function Approximators"
    in the Automation and Control Engineering series of Taylor & Francis CRC Press.
  • [Mah09] Sridhar Mahadevan (2009) Learning Representation and Control in Markov Decision Processes: New Frontiers
    Foundations and Trends in Machine Learning: Vol. 1: No 4, pp 403-565.
    http://dx.doi.org/10.1561/2200000003

Miscellaneous

  • [Gru04] P.D. Gruenwald.
    Tutorial on minimum description length.
    In Minimum Description Length: recent advances in theory and practice, Chapters 1 and 2. MIT Press, 2004.
  • [BLA02] B. Ng, L. Peshkin, and A. Pfeffer.
    Factored Particles for Scalable Monitoring.
    In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.

Background Reading

The standard textbook on RL is:

See also:

  • [Put94] M.L. Puterman.
    Markov Decision Processes - Discrete Stochastic Dynamic Programming.
    Wiley, New York, NY, 1994.
  • [KV86] P.R. Kumar and P.P. Varaiya.
    Stochastic Systems: Estimation, Identification, and Adaptive Control.
    Prentice Hall, Englewood Cliffs, NJ, 1986.
  • [PORL09] Partially Observable Reinforcement Learning

Symposium at NIPS'09 December 10, Vancouver
http://www.hutter1.net/ai/porlsymp.htm and
http://grla.wikidot.com/nips for more details.

Contact

Len Du <moc.liamg|cilbup.ud.nel#moc.liamg|cilbup.ud.nel> or
David Quarel <ua.ude.una|lerauq.divad#ua.ude.una|lerauq.divad> or
Elliot Catt <ua.ude.una|ttacretneprac.toille#ua.ude.una|ttacretneprac.toille> or
Marcus Hutter <ua.ude.una|rettuh.sucram#ua.ude.una|rettuh.sucram>

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License