General Information
Welcome to the Reinforcement Learning Reading Group at RSCS@ANU
- Who: Everyone is welcome.
- When: Every second Thursday (Friday for Australia & New Zealand), 23:00-24:00 UTC (subject to micro-adjustments, which will be displayed if any). If you want to attend but the time does not suit you, please let me know.
- Where: Virtually on Google Meet
- Assumed Background: Basics in Reinforcement Learning (e.g. [SB98] see end of this page)
- Operation mode: Discussing read papers. No email reminders.
Regular (Past&Current) Participants:
Mostly students and researchers from RSCS@ANU and other RL friends nearby.
Reading List
The schedule for the reading group is given below and will be updated weekly.
Important: The reading group has become virtual. Please email len.du.public@gmail.com to join the reading group.
30.Nov.23 (01.Dec.23 AEST)
Casey S. Schroeder presents Decision as Pattern Recognition
The hypothesis is roughly that our actions are the result of recognizing a pattern with extension in time, with a certain 'choice' from a weighted probability distribution of movements, replacing 'active' components of the pattern, as they move to the present. It is a 'choice' in much the same way a quantum wave 'chooses' to collapse, perhaps not really a choice at all, but a probabilistic mechanics. This mechanism may account for most everyday practical reasoning.
23.Nov.23 (24.Nov.23 AEST)
Free-form discussion
09.Nov.23 (10.Nov.23)
Free-form discussion
02.Nov.23 (03.Nov.23)
Michael Bennett informally presents further contents on previous work.
26.Oct.23 (27.Oct.23)
Extra free-form discussion
19.Oct.23 (20.Oct.23)
Free-form discussion
12.Oct.23 (13.Oct.23)
Cole Wyeth presents A Circuit Complexity Formulation of Algorithmic Information Theory
05.Oct.23 (06.Oct.23)
Free-form discussion
28.Sep.23 (29.Sep.23)
Vincent Abbott presents Neural Circuit Diagrams: Standardized Diagrams for Deep Learning Architectures
21.Sep.23 (22.Sep.23)
Free-form discussion
14.Sep.23 (15.Sep.23)
Michael Bennett presents A Unified Theory of Meaning, Consciousness and Artificial superintelligence (Best student paper at AGI-23)
07.Sep.23 (08.Sep.23)
Free-form discussion
31.Aug.23 (01.Sep.23)
Vincent Abbott presents Neural Circuit Diagrams: Standardized Diagrams for Deep Learning Architectures
24.Aug.23 (25.Aug.23)
Free-form discussion
17.Aug.23 (18.Aug.23)
Vincent Abbott presents Neural Circuit Diagrams: Standardized Diagrams for Deep Learning Architectures
10.Aug.23 (11.Aug.23)
Free-form discussion
03.Aug.23 (04.Aug.23)
No reading group
27.Jul.23 (28.Jul.23)
Free-form discussion
20.Jul.23 (21.Jul.23)
No reading group
13.Jul.23 (14.Jul.23)
Free-form discussion
6.Jul.23 (7.Jul.23)
No reading group
29.Jun.23 (30.Jun.23)
Free-form discussion
22.Jun.23 (23.Jun.23)
No reading group
15.Jun.23 (16.Jun.23)
Free-form discussion
8.Jun.23 (9.Jun.23)
No reading group
1.Jun.23
Free-form discussion
24.May.23
BREAK - in transition
17.May.23
Free-form discussion
10.May.23
Anna Winnicki presents A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games
3.May.23
Free-form discussion
26.Apr.23
Ashish Jayant presents Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm
19.Apr.23
Free-form discussion
12.Apr.23
Jakob Thumm presents Reducing Safety Interventions in Provably Safe Reinforcement Learning
5.Apr.23
Free-form discussion
29.Mar.23
TBA
22.Mar.23
Free-form discussion
15.Mar.23
David Quarel presents Git Re-Basin: Merging Models Modulo Permutation Symmetries
8.Mar.23
Free-form discussion
1.Mar.23
Arash Tavakoli presents Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
22.Feb.23
Free-form discussion
15.Feb.23
Samuel Alexander presents Extending Environments to Measure Self-reflection in Reinforcement Learning
8.Feb.23
Jinke He presents Online Planning in POMDPs with Self-Improving Simulators
1.Feb.23
Free-form discussion
25.Jan.23
Free-form discussion
18.Jan.23
Joar Skalse presents Defining and Characterizing Reward Hacking
11.Jan.23
BREAK
4.Jan.23
BREAK
28.Dec.22
BREAK
21.Dec.22
BREAK
14.Dec.22
Free-form discussion
7.Dec.22
(TBC) Joar Skalse presents Reinforcement learning in Newcomblike environments
30.Nov.22
Free-form discussion
23.Nov.22
Runze Tang presents his honors thesis on Procedural Content Generation using GANs
16.Nov.22
Free-form discussion
9.Nov.22
Samuel Alexander presents Agent mixtures and the genericness of non-deterministic intelligence
2.Nov.22
Free-form discussion
26.Oct.22
Samuel Yang-Zhao and Tianyu Wang present A Direct Approximation of AIXI Using Logical State Abstractions
19.Oct.22
Aram Ebtekar continues with Information dynamics and the arrow of time
12.Oct.22
Aram Ebtekar presents Information dynamics and the arrow of time
5.Oct.22
Free-form discussion
28.Sep.22
Steve Carr presents Safe Reinforcement Learning via Shielding for POMDPs
21.Sep.22
Free-form discussion
14.Sep.22
Matthew Aitchison presents DNA: Proximal Policy Optimization with a Dual Network Architecture
7.Sep.22
Free-form discussion
31.Aug.22
No reading group
24.Aug.22
Free-form discussion
17.Aug.22
Matthew Aitchison presents Is the Policy Gradient a Gradient?
10.Aug.22
Free-form discussion
3.Aug.22
No reading group Christopher Mingard continues with Neural networks are a priori biased towards Boolean functions with low entropy
27.July.22
Free-form discussion
20.July.22
Samuel Alexander presents Can reinforcement learning learn itself? A reply to 'Reward is enough'
13.July.22
Free-form discussion
6.July.22
DavidQ discusses Context Tree Weighting
29.June.22
Free-form discussion
22.June.22
Michael Bennett presents Computable Artificial General Intelligence
15.June.22
Free-form discussion
8.June.22
Christopher Mingard presents Neural networks are a priori biased towards Boolean functions with low entropy
1.June.22
Free-form discussion
25.May.22
Samuel Alexander presents The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI
18.May.22
Free-form discussion
11.May.22
David Abel presents On the Expressivity of Markov Reward
4.May.22
Free-form discussion
27.Apr.22
Peter Vamplew presents Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton
20.Apr.22
Free-form discussion
13.Apr.22
Vembalagu "VJ" Vijendran discusses Quantum Algorithms for Reinforcement Learning
6.Apr.22
[no reading group]
30.Mar.22
Samuel Alexander presents Reward-Punishment Symmetric Universal Intelligence
23.Mar.22
Free-form discussion
16.Mar.22
Elliot discusses investigations into binarisation in reinforcement learning
9.Mar.22
Free-form discussion
2.Mar.22
Erik Rehn presents Free Will Belief as a consequence of Model-based Reinforcement Learning
23.Feb.22
Free-form discussion
16.Feb.22
Tomer Galanti presents On the Role of Neural Collapse in Transfer Learning
9.Feb.22
DavidQ continues presenting Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression
2.Feb.22
DavidQ presents Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression
26.Jan.22
Free-form discussion
19.Jan.22
DavidQ presents Memoryless policies: theoretical limitations and practical results
15.Dec.21
Preetum presents Turing-Universal Learners with Optimal Scaling Laws
10.Nov.21
Michael presents Shaking the foundations: delusions in sequence models for interaction and control
13.Oct.21
Elliot presents Reinforcement Learning with Information-Theoretic Actuation
29.Sept.21
Sultan presents ARENA
21.July'21
Matthew presents Muesli: Combining Improvements in Policy Optimization
26.May'21
Jonathon presents Monte-Carlo planning for Partially Observable Markov Games
28.Apr'21
Len presents Nondeterministic Turing machines as a practical pattern language for beyond-context-free patterns and replacement of (propositional) logic
14.Apr'21
Jaskirat presents Sparse Attention Guided Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning
31.Mar'21
Michael B discusses Defining Tasks, Intensional Solutions, and a Computational Theory of Meaning
17.Mar'21
Matthew presents Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning
3.Mar'21
Elliot presents Universal Agents in Repeated Matrix Games
17.Feb'21
David Q presents Temporal Difference Updating without a Learning Rate
3.Feb'21
Michael C talks about Online Imitation Learning
16.Dec'20
David A continues presenting The Theory of Abstraction in Reinforcement Learning
9.Dec'20
David A presents The Theory of Abstraction in Reinforcement Learning
25.Nov'20
Marcus gives final presentation on Neural Network Approximation Theory
11.Nov'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory
28.Oct'20
Matthew continues to present Role-Based Deception in multi-agent games
21.Oct'20
Matthew presents Role-Based Deception in multi-agent games
14.Oct'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory
7.Oct'20
Elliot continues to present A Gentle Introduction to Quantum Computing Algorithms
30.Sept'20
Michael B discusses Fragility, Mimicry and Understanding: Why AI Lacks Human Adaptability, and How to Fix This
23.Sept'20
Jonathon presents A Distributional Perspective on Reinforcement Learning
16.Sept'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory
9.Sept'20
[no reading group]
2.Sept'20
Elliot continues to present The forget me not process
26.Aug'20
Elliot will present The forget me not process
19.Aug'20
Matthew presents Language Models are Few-Shot Learners
12.Aug'20
Michael C presents Quantilizers
5.Aug'20
[no reading group]
29.Jul'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory
15.Jul'20
Joel presents Gated Linear Networks
8.Jul'20
Michael B presents the Abstraction and Reasoning Corpus
1.Jul'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory
24.Jun'20
Elliot continues to present A Gentle Introduction to Quantum Computing Algorithms
10.Jun'20
Elliot presents A Gentle Introduction to Quantum Computing Algorithms
3.Jun'20
David J presents Decision theoretic foundations of causal modelling
27.May'20
Marcus continues presenting an Introduction to Neural Network Approximation Theory
20.May'20
Michael B presents On the Measure of Intelligence
13.May'20
Michael C presents Pessimism About Unknown Unknowns Inspires Conservatism
06.May'20
Matthew presents Agent57: Outperforming the human Atari benchmark
29.Apr'20
Sultan presents A Neural Transfer Function for a Smooth and Differentiable Transition Between Additive and Multiplicative Interactions
22.Apr'20
Marcus presents an Introduction to Neural Network Approximation Theory
15.Apr'20
Meet and greet of new virtual reading group
28.Aug'19
Sam, Sultan and Elliot present experiences from IJCAI-2019
10.July'19
Matthew will present Large-Scale Study of Curiosity-Driven Learning
15.May'19
Elliot presents his proofs of Kolmogorov complexity theorems in HOL
24.Apr'19
James Parker presents progress on his Honours thesis on Feature Reinforcement Learning beyond Markov Decision Processes
17.Apr'19
Timothy presents progress on his Master's thesis on Non-Markovian State and State-Action Abstractions
20.Mar'19
Matthew will present Counterfactual Regret Minimization
20.Feb'19
Michael will continue to present his summer research
13.Feb'19
Michael will present his summer research
6.Feb'19
Elliot presents his experiences at AAAI-19
23.Jan'19
Matthew and Nikhil present their Summer Research Report
16.Jan'19
Matthew will present Learning to Navigate in Complex Environments
12.Dec'18
Sam will present Gradient Descent Finds Global Minima of Deep Neural Networks
5.Dec'18
[no reading group]
28.Nov'18
Nikhil and Matthew will present their recent research.
21.Nov'18
Omar will present A Generalized Representer Theorem
14.Nov'18
Samuel Yang-Zhao will present his honours thesis on Divergence of TD-like algorithms.
7.Nov'18
DavidQ presents his Masters thesis.
31.Oct'18
Advanced AI Group project presentation
24.Oct'18
Elliot will discuss Intelligence, Beyond Bounded Rationality, and Space-time embedded intelligence
17.Oct'18
[no reading group]
10.Oct'18
James Parker will Open the black box of Deep Neural Networks via Information
3.Oct'18
Omar Ghattas will talk about Multi-Agent Reinforcement Learning
26.Sept'18
Tor Lattimore will talk about Partial Monitoring and his new book
19.Sept'18
Christian Walder will present Neural Dynamic Programming for Musical Self Similarity
12.Sept'18
No reading group due to holidays
5.Sept'18
Michael discusses why local minima don't appear that often in high dimensions
29.Aug'18
Xavier presents Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
22.Aug'18
Sultan continues to present his experiences at conferences (ICML, IJCAI)
15.Aug'18
Sultan presents his experiences at conferences (ICML, IJCAI)
8.Aug'18
Elliot and Xavier present the results of OpenAI Five
1.Aug'18
Chamin presents Online Learning with Gated Linear Networks
25.Jul'18
DavidQ will present Computable Variants of AIXI which are More Powerful than AIXItl
18.Jul'18
Tianyu presents his honours thesis on Neural Causality Detection for Multi-dimensional Point Processes
11.Jul'18
[no reading group]
4.Jul'18
Sultan will present Q-Learning Beyond MDPs
27.Jun'18
[no reading group]
20.Jun'18
Thibaut presents An Outsider's Tour of Reinforcement Learning
13.Jun'18
[no reading group]
6.Jun'18
Michael shows us what Solomonoff's Universal Prior actually looks like
30.May'18
Elliot presents some papers about AGI
23.May'18
Thibaut presents Philosophy and the practice of Bayesian statistics
16.May'18
DavidJ talks about Causality
9.May'18
Michael presents his AGI safety research
2.May'18
Elliot talks about World Models
25.Apr'18
[public holiday, no reading group]
18.Apr'18
Visitor Thiebaux introduces himself and presents his PhD research
11.Apr'18
[no reading group]
4.Apr'18
[no reading group]
28.Mar'18
Showdown between One-boxers and Two-boxers, and Quining the survivors.
21.Mar'18
Xavier presents Functional Decision Theory
14.Mar'18
Tom presents AI Safety Gridworlds
7.Mar'18
Michael reports on his CHAI research internship experience at Berkeley
28.Feb'18
Elliot and Badri present some papers
21.Feb'18
Tom presents some papers
14.Feb'18
Elliot presents population based algorithms
7.Feb'18
Tom continues with Wireheading Taxonomy
31.Jan'18
Tom presents Wireheading Taxonomy
24.Jan'18
[no reading group]
17.Jan'18
Tom reports on his Google DeepMind internship experience.
6.Dec'17 - 10.Jan'18
[no reading group]
29.Nov'17
Elliot Catt presents his Thesis Progress on Quantum Computing
22.Nov'17
Group Excursion: Meet 13:00 at RSISE=BAB entrance (see email for details)
15.Nov'17
[no reading group]
8.Nov'17
[no reading group]
1.Nov'17
Owen presents his Honours Thesis on Universal Compression of Piecewise iid Sources
25.Oct'17
Xavier presents impressions from CFAR's AI Summer Fellows Program in San Francisco
18.Oct'17
[no reading group]
11.Oct'17
[no reading group]
4.Oct'17
Samuel presents Hindsight Experience Replay
27.Sep'17
Badri presents Convergence of Binarized CTW
20.Sep'17
Daoyi Dong from ADFA presents his research on Quantum RL and Quantum control theory
13.Sep'17
Elliot presents his formalization of TMs in HOL and proof of equivalence to PR functions.
6.Sep'17
[no reading group]
30.Aug'17
Sultan presents impressions from UAI
23.Aug'17
[no reading group]
16.Aug'17
[no reading group]
9.Aug'17
Arthur Franz presents incremental and hierarchical compression
2.Aug'17
Elliot practices conference talk
26.Jul'17
John practices conference talk
19.Jul'17
Tom presents Learning from Human Preferences
12.Jul'17
Tom presents applications of evidential semi-measures
5.Jul'17
John presents Deterministic Policy Gradient Algorithms
28.Jun'17
[No reading group]
21.Jun'17
Adam presents one of his recent research papers
14.Jun'17
Tom presents considerations on SARSA convergence: Gordon (1996) and Perkins and Precup (2003)
7.Jun'17
John presents FeUdal Networks for Hierarchical Reinforcement Learning in Room A105
31.May'17
Elliot presents The forget me not process
24.May'17
Sultan and Marcus present some AGI papers
17.May'17
Marcus presents Compress & Control
12.May'17 @12pm
Reading and discussing some UAI papers
10.May'17
Reading and discussing some UAI papers
3.May'17
Reading and discussing some UAI papers
26.April'17
Arie, Suraj and Elliot present MC-AIXI-CTW
19.April'17
Elliot presents Evolution Strategies as a Scalable Alternative to Reinforcement Learning
12.April'17
Tobias and Mikael continue
5.April'17
Tobias and Mikael present their Bachelor thesis on classifying games
29.March'17
Jarryd continues with generative adversarial networks for RL
22.March'17
Edward Barker presents Unsupervised Basis Function Adaptation for Reinforcement Learning
15.March'17
Jarryd presents generative adversarial networks for RL
8.March'17
Tor Lattimore talks about the adversarial/stochastic divide and some open problems there
1.March'17
Tom continues with The Delusionbox Problem.
22.Feb'17
Tom presents The Delusionbox Problem
15.Feb'17
Phuong Nguyen presents interesting experiences since leaving the group
8.Feb'17
Tor Lattimore talks about some open problems in online learning/statistics/RL.
1.Feb'17
Tor Lattimore presents The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
14.Dec'16
Jarryd presents Learning to reinforcement learn
07.Dec'16
Boris gives update on his project
30.Nov'16
Tom gives AI-Safety talk
23.Nov'16
Suraj presents his thesis (mid-semester update)
16.Nov'16
[No reading group]
9.Nov'16
Tom gives monitoring talk
2.Nov'16
[No reading group]
26.Oct'16 (at different time: 1pm)
James presents symmetry of algorithmic information
26.Oct'16 (at different time: 12pm)
Farhana presents Dimensionality of Spatio-Temporal Broadband Signals Observed Over Finite Spatial and Temporal Windows
19.Oct'16
Sultan presents his recent research insights
(thereafter group excursion, see email for details)
12.Oct'16
Jarryd presents exploration results (cont.)
5.Oct'16
John presents AIXIjs
28.Sep'16
Jarryd presents exploration results
21.Sep'16
Manlio presents On the Computability of AIXI
14.Sep'16
John speaks about visit to Bay Area, and Why does deep and cheap learning work so well?
7.Sep'16
Tom presents takeaways from US+UK trip: deep learning
31.Aug'16
Tom presents takeaways from US+UK trip: UC Berkeley and AGI
24.Aug'16
Tom presents takeaways from US+UK trip: New AI Safety research agendas (Google/OpenAI open safety problems and MIRI's machine learning agenda)
17.Aug'16
Tom presents takeaways from US+UK trip: Mainly Cooperative inverse reinforcement learning
3.Aug'16
Manlio Valenti from Trento introduces Upper-SemiComputable SemiMeasures
27.July'16
John and Sean present progress on Interactive GRL Demo
20.July'16
Break
13.July'16
Sultan presents 2 papers
22.June'16
Jarryd presents Unifying Count-Based Exploration and Intrinsic Motivation
15.June'16
Xian Wang presents his research on …
6.June'16 (obs: Monday)
Tom continues with AIXI tutorial
1.June'16
George Stamatescu presents KL Divergence and Reciprocal Chains
31.May'16
Tom presents AIXI tutorial
Gerhard Visser presents Interest-Relative Inductive Inference thesis draft (unpublished)
11+18.May'16
Break
4.May'16
John presents Pedro A. Ortega, Naftali Tishby (2016) Memory controls time perception and inter-temporal choices
27.April'16
Tom presents wireheading result
20.April'16
Sultan continues AGI reviews
13.April'16
Sultan AGI reviews
6.April'16
Tom and Sultan UAI reviews
30.Mar'16
Jan "defends" his thesis
(in room A105)
23.Mar'16
Discussion of UAI reviews
16.Mar'16
Jan continues to talk about conferences from 2015
9.Mar'16
Sultan presents State of the Art Control of Atari Games Using Shallow Reinforcement Learning
4.Mar'16 11:30 EXTRA SESSION
Adam Case presents
2.Mar'16
Jan presents Safely Interruptible Agents
24.Feb'16
Djallel Bouneffouf presents
17.Feb'16
Jan continues to talk about conferences from 2015
10.Feb'16
No reading group.
3.Feb'16
Jan continues to talk about conferences from 2015
27.Jan'16
Tom presents Owain Evans' paper Learning the Preferences of Ignorant, Inconsistent Agents
20.Jan'16
Jan talks about conferences from 2015
16.Dec'15
Tom summarises the Australasian AI conference, and maybe continues with preliminary results on the wireheading problem.
9.Dec'15
Jae Hee Lee presents his PhD thesis Qualitative Reasoning about Relative Directions: Computational Complexity and Practical Algorithm
2.Dec'15
Break for Australian AI conference
25.Nov'15
Tom presents summary of MIRIx workshop.
18.Nov'15
Tom presents preliminary results on the wireheading problem.
11.Nov'15
Daniel continues with Agents Using Speed Priors
4.Nov'15
Daniel presents Agents Using Speed Priors
28.Oct'15
David presents Practical Extreme State Aggregation
21.Oct'15
Matt Alger presents a project on Deep Inverse Reinforcement Learning
14.Oct'15
Aqua Zhu presents background on classical sequence prediction and related problems.
7.Oct'15
Tom presents Analytical Results on the BFS vs. DFS Algorithm Selection Problem, Part II: Graph Search
23.Sep'15
Tor Lattimore presents Optimally Confident UCB: Improved Regret for Finite-Armed Bandits
16.Sep'15
Spring break
9.Sep'15
Spring break
2.Sep'15
Tom presents Analytical Results on the BFS vs. DFS Algorithm Selection Problem, Part I: Tree Search
26.Aug'15
Marcus continues presenting impressions from ICML/EWRL
19.Aug'15
Hadi Afshar presents Reflection, Refraction, and Hamiltonian Monte Carlo
Recommended (background) reading
12.Aug'15
Tom presents Sequential Extensions of Causal and Evidential Decision Theory
5.Aug'15
Reading group resumes. Marcus presents impressions from ICML/EWRL
24.June'15 - 29.July'15
Winter break
17.June'15
Yiyun presents Modelling Causal Reasoning with Ambiguous Observations and Quantum Probability Model of "Zero-Sum" Beliefs
10.June'15
Mayank presents Neural Turing Machines
3.June'15
Continue discussing reviews for ALT
27.May'15
Discussing reviews for ALT
20.May'15
Jan continues from last time
13.May'15
Jan talks about merging and predicting,
in particular the results from Merging and Learning and
On Sequence Prediction for Arbitrary Measures
6.May'15
Continued discussion of AGI reviews
29.Apr'15
Discussing reviews for AGI
15. and 22.Apr'15
No reading group
8.Apr'15
Jan presents Reflective Oracles: A Foundation for Classical Game Theory
1.Apr'15
Jan presents Reflective Variants of Solomonoff Induction and AIXI
25.Mar'15
Yiyun presents Cognitive processes and mechanisms in causal reasoning with ambiguous observations
18.Mar'15
Marcus presents Compress and Control
11.Mar'15
Daniel presents the current status of his work on the speed prior
The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions
4.Mar'15
Jan presents a journal paper under review
18. and 25. Feb'15
No Reading Group
11.Feb'15
Tom presents Can we measure the difficulty of an optimization problem?
4.Feb'15
Mayank presents selected papers from ACML 2014
28.Jan'15
Jan presents Corrigibility
https://intelligence.org/files/CorrigibilityTR.pdf
21.Jan'15
Mayank gives a tutorial on convex optimization
10.Dec'14
PhD Monitoring Hadi (Room RSISE B123)
19'Nov'14
Peter leads discussion on the new book (with a focus on chapter 7)
Ethical Artificial Intelligence by Bill Hibbard
http://arxiv.org/ftp/arxiv/papers/1411/1411.1373.pdf
Bill builds on UAI, decision theoretic rationality, space-time embedded agents etc. to formally study ethical AI.
12'Nov'14
Xi Li presents on Leibniz's program and its
relation to UAI
5.Nov'14
Daniel Filan talks about Extreme state aggregation beyond MDPs
29.Oct'14
Neal Hughes (economics PhD student) presents on using RL for water management
Note: it's in B123
22.Oct'14
Tom Butler presents his honors thesis
Fuzzy Expert System Evolution: Increasing the accessibility of intelligent controllers
15.Oct'14
PhD Monitoring Mayank & Jan (Room RSISE B123)
8.Oct'14
Break
1.Oct'14
Break
24.Sep'14
Daniel Filan presents on the speed prior
http://link.springer.com/chapter/10.1007%2F3-540-45435-7_15
17.Sep'14
Jan presents Teleporting Universal Agents by Laurent Orseau AGI'2014
http://www.agroparistech.fr/mia/equipes:membres:page:laurent:teleport
10.Sep'14
Hadi presents his most recent work on symbolic Gibbs sampling
3.Sep'14
Peter reports from AAAI'2014
Integrating representation learning and temporal difference learning:
A matrix factorization approach by M. White
http://webdocs.cs.ualberta.ca/~whitem/publications/14aaaiw-crtd.pdf
with a closely related alternative
http://webdocs.cs.ualberta.ca/~whitem/publications/14aaaiw-frrl.pdf
Active Learning with Model Selection by A. Ali., R. Caruana and A. Kapoor
http://research.microsoft.com/en-us/um/people/akapoor/papers/AAAI2014.pdf
Natural Temporal Difference Learning by W. Dabney and P. Thomas
http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/viewFile/8568/8913
27.Aug'14
Peter reports from CogSci'2014.
Toward Boundedly Rational Analysis by Thomas Icard
http://web.stanford.edu/~icard/cogsci14.pdf
A Bounded Rationality Account of Wishful Thinking by R. Neumann, A. N. Rafferty, T. L. Griffiths
http://cocosci.berkeley.edu/anna/papers/WishfulThinking.pdf
The high availability of extreme events serves resource-rational decision-making by Lieder, Wills, Hsu, Griffiths
http://cocosci.berkeley.edu/falk/HighAvailabilityOfExtremeEvents.pdf
and a related recent journal paper providing the background for the above
One and Done? Optimal Decisions From Very Few Samples by Edward Vul, Noah Goodman, Thomas L. Griffiths and Joshua B. Tenenbaum
http://web.stanford.edu/~ngoodman/papers/VulGoodmanGriffithsTenenbaum-COGS-2014.pdf
Information vs Reward in a changing world by Navarro and Newell
http://health.adelaide.edu.au/psychology/ccs/docs/pubs/2014/NavarroNewell2014.pdf
Uncertainty and Exploration in a restless bandit task by Speekenbrink and Konstantinidis
http://www.psychol.ucl.ac.uk/m.speekenbrink/articles/cogsci2014.pdf
16.July'14 — 20.Aug'14
Currently no meetings planned, but check a day in advance or volunteer to present something.
9.July'14
Jan presents Eliezer Yudkowsky and Marcello Herreshoff,
Tiling Agents for Self-Modifying AI, and the Löbian Obstacle
https://intelligence.org/files/TilingAgents.pdf
and
Problems of self-reference in self-improving space-time embedded
intelligence. Benja Fallenstein and Nate Soares. AGI 2014.
https://intelligence.org/wp-content/uploads/2014/05/Fallenstein-Soares-Problems-of-self-reference-in-self-improving-space-time-embedded-intelligence.pdf
2.July'14
Peter gives practice talk for Quebec conference.
Note B123 and we start on time since the room has other events at 12:20.
Please arrive no later than 11:30 (always applies but in particular this week).
25.June'14
Marcus presents his ALT paper on Offline to Online Conversion.
20.June'14 Note, this is a Friday! Time 2pm
Daniel Cotton presents his ASC project on Reinforcement learning in computer science and psychology
Followed by Tony Allard giving his monitoring talk at 3pm on Logistics Planning.
18.June'14
Jan presents overview of MIRI's recent research
11.June'14
Break
4.June'14
Daniel Filan presents his ASC project on AIXI convergence
28.May'14
Mayank talks about game playing competition and reports on his progress.
In particular a report to Marcus and Peter, but others welcome.
21.May'14
Marcus presents his ALT submission Extreme State Aggregation Beyond MDPs
14.May'14
Paper reviewing discussions
7.May'14
No reading group
30.Apr'14
Paper reviewing discussions
23.Apr'14
Tor (visiting 23.-25.Apr) presents "Memory Allocation Bandits"
16.Apr'14
Paper reviewing discussions
9.Apr'14
Monitoring in RSISE seminar room
11:30 Hadi
12:00 Mayank
12:30 Jan
13:00-14:00 feedback.
2.Apr'14
Jan presents
Marcus Hutter: Discrete MDL Predicts in Total Variation. NIPS'09
http://arxiv.org/abs/0909.4588
26.Mar'14
Mayank presents "Cover Tree Bayesian Reinforcement Learning" by Nikolaos Tziortziotis, Christos Dimitrakakis and Konstantinos Blekas.
http://arxiv.org/pdf/1305.1809v1
12,19.Mar'14
Break due to travels and deadlines
5.Mar'14
Marcus talks about new extension of the context tree weighting algorithm
26.Feb'14
Peter presents "Using Expectation-Maximization for Reinforcement Learning" by Dayan and Hinton 1997
http://www.gatsby.ucl.ac.uk/~dayan/papers/rpp97.pdf
with a discussion of what has happened afterwards which includes Bayesian MCMC alternatives to the original frequentist EM approach, e.g.
http://www.stanford.edu/~ngoodman/papers/WingateEtAl-PolicyPrios.pdf
This line of work that includes many papers in the last 5 years is often called planning as inference
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/publications/12-botvinick-TICS.pdf
19.Feb'14
Alex presents "Changing tastes and Coherent Dynamic Choice" by Peter J. Hammond
http://www.jstor.org/stable/2296609
12.Feb'14
Mayank continues from last week
5.Feb'14
Mayank presents "Efficient Learning and Planning with Compressed Predictive States".
William Hamilton, Mahdi Milani Fard and Joelle Pineau.
http://arxiv.org/abs/1312.0286
29.Jan'14
Reading group restarts for 2014 with Peter talking about "Rationality, Optimism and Guarantees in General Reinforcement Learning" in the RSISE seminar room as an AI seminar. Please note 12:00-13:00 !
18'Dec'13-
MaxEnt, Xmas, New Year
11'Dec'13
Johannes presents his work on counter-examples in reinforcement learning
6'Dec'13
Tor's last day, at least here at ANU. Talk, farewell lunch etc. Details later
4'Dec'13
Hadi gives monitoring talk
27'Nov'13
Rachael continues from the 16th of Oct with the voting part of the paper.
Note, the talk will be in R214 in the Ian Ross building!
20'Nov'13
Ian Hon presents his honours thesis
12'Nov'13
Tony Allard monitoring talk. Note Tuesday! 3pm in the RSISE seminar room
13'Nov'13
ACML workshop (organized by Peter Sunehag, Marcus Hutter, Mark Reid) on theory and practice in Machine Learning at the Manning Clark Centre, ANU
https://sites.google.com/site/mltheoryandpractice/
14,15'Nov'13
ACML at ANU
6'Nov'13
Johannes presents "The Fixed Points of Off-Policy TD" by J. Zico Kolter NIPS 2011.
http://books.nips.cc/papers/files/nips24/NIPS2011_1200.pdf
30'October'13
Mayank gives monitoring talk
23'October'13
Peter talks about "Learning from human generated rewards", based on a sequence of papers making up the PhD thesis of Bradley Knox (http://www.bradknox.net/) supervised by Peter Stone, primarily: W. Bradley Knox and Peter Stone. Learning Non-Myopically from Human-Generated Reward. In Proceedings of the International Conference on Intelligent User Interfaces (IUI), March 2013.
http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/iui13-knox.pdf
16'October'13
Rachael Briggs presents her article Decision-Theoretic Paradoxes as Voting Paradoxes,
Philosophical Review 2010 Volume 119, Number 1: 1-30
http://philreview.dukejournals.org/content/119/1/1.abstract
2,9'Oct'13
Break due to travels
25'Sep'13
Tor presents (More) Efficient Reinforcement Learning via Posterior Sampling, NIPS'2013
Ian Osband, Daniel Russo and Benjamin Van Roy
http://arxiv.org/pdf/1306.0940v1.pdf
18'Sep'13
Mayank presents,
Incremental Basis Construction from Temporal Difference Error by Yi Sun, Faustino Gomez, Mark Ring, Jurgen Schmidhuber in ICML 2011.
Paper @ http://www.idsia.ch/~juergen/icml2011sun.pdf
Slides @ http://www.idsia.ch/~sun/doc/icml11-ftr-slides.pdf
11'Sep'13
Tor talks about best arm identification in bandits
4'Sep'13
Peter presents,
Temporal-Difference Search in Computer Go by Silver, D., Sutton, R. S., Mueller, M in ICAPS 2013 http://www.aaai.org/ocs/index.php/ICAPS/ICAPS13/paper/view/6037/6227
and in Machine Learning 87(2):183-219 2012
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/tdsearch.pdf
28'Aug'13
Mayank presents.
Bruno Scherrer. "Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view" in Proceedings of the 27th International Conference on Machine Learning (2010).
http://www.icml2010.org/papers/654.pdf
Slides available here,
http://www.loria.fr/~scherrer/presentations/tdbr.pdf
24'July'13
Tor presents things from conference travel to ICML/COLT.
17'July'13
Peter presents tutorial on Exploration vs Exploitation as practice before EWRL.
Probably downstairs in the seminar room
9'July'13 (note Tuesday!, 11:30)
Hadi presents his TPR
3'July'13
Scott presents (at 11)
S. Sanner, K. V. Delgado, and L. N. de Barros (2011). Symbolic Dynamic Programming for Discrete and Continuous State MDPs. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). Barcelona, Spain.
http://users.cecs.anu.edu.au/~ssanner/Papers/cont_mdp.pdf
26'June'13
Monitoring talk by Ehsan (NICTA)
19'June'13
Monitoring talks by David and Suvash downstairs RSISE seminar room at 12
12'June'13
Integrating Partial Model Knowledge in Model Free RL Algorithms
Aviv Tamar and Dotan Di Castro and Ron Meir
International Conference on Machine Learning (ICML), 2011
http://www.icml-2011.org/papers/222_icmlpaper.pdf
Mayank presents
5'June'13
S. Thiebaux, C. Gretton, J. Slaney, D. Price and F. Kabanza (2006) "Decision-Theoretic Planning with non-Markovian Rewards", Journal of Artificial Intelligence Research, Volume 25, pages 17-74
http://www.jair.org/papers/paper1676.html
Charles Gretton presents
29'May'13
Tor and Peter present
22'May'13
Monitoring
15'May'13
Monitoring
8'May'13
Peter presents "Online Feature Selection for Model-based Reinforcement Learning" ICML'2013 by Trung Thanh Nguyen, Zhuoru Li, Tomi Silander and Tze-Yun Leong http://jmlr.csail.mit.edu/proceedings/papers/v28/nguyen13.pdf
1'May'13
Marcus presents his COLT paper on sparse adaptive Dirichlet-multinomial-like Processes
24'April'13
Hadi and Tor give monitoring talks
17'April'13
Wen and Mayank give monitoring talks
10'April'13
Mayank continues from last time on over-estimation in Q-learning
3'April'13
Mayank presents Double-Q learning and associated paper
http://books.nips.cc/papers/files/nips23/NIPS2010_0208.pdf
27'March'13
Ian Hon continues the survey on large alphabet sources and compression based on
http://www.cs.technion.ac.il/~ronbeg/begleiter-papers/begleiter06a.pdf
http://www.sps.ele.tue.nl/members/f.m.j.willems/research_files/CTW/benelux94-tjalkens-willems-shtarkov.pdf
20'March'13
Wen presents a survey on text (large alphabet) modeling. Relevant papers are
http://www2.denizyuret.com/ref/goodman/chen-goodman-99.pdf
http://acl.ldc.upenn.edu/P/P06/P06-1124.pdf
18'March'13
Oscillation-free epsilon-random sequences, Ludwig Staiger
13'March'13
Tor presents Thompson Sampling: An Asymptotically Optimal Finite Time Analysis, ALT'2012
Emilie Kaufmann, Nathaniel Korda, Rémi Munos
http://arxiv.org/abs/1205.4217
6'March'13
Peter presents "A Dantzig Selector Approach to Temporal Difference Learning", Matthieu Geist, Bruno Scherrer, Alessandro Lazaric and Mohammad Ghavamzadeh, ICML 2012 http://icml.cc/2012/papers/703.pdf
27'February'13
Wen presents "Delusion, Survival, and Intelligent Agents" by Mark B. Ring, Laurent Orseau, AGI'2011
http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=28BAE7205B795D39B357E46822EB4A4D?doi=10.1.1.232.9313
13'February'13
Tor presents "Universal Knowledge-Seeking Agents" by Laurent Orseau , ALT'2011
http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/orseau-ALT-2011-knowledge-seeking.pdf
6'February'13
Peter presents "Space-Time Embedded Intelligence" by Laurent Orseau and Mark Ring, AGI'2012
http://agi-conference.org/2012/wp-content/uploads/2012/12/paper_76.pdf
30'January'13
Tom presents his (draft) Master's thesis about (No) Free Lunch theorems for optimization
23'January'13
Nam talks about learning theory
16'January'13
Hadi presents the Loewenheim Skolem Theorem & Proof
http://en.wikipedia.org/wiki/L%C3%B6wenheim%E2%80%93Skolem_theorem
and Marcus the Skolem Paradox and its resolution
http://en.wikipedia.org/wiki/Skolem%27s_paradox
9'January'13
Marco presents Wouter M. Koolen, Dimitri Adamskiy, Manfred K. Warmuth (NIPS 2012) Putting Bayes to sleep
http://www.cs.rhul.ac.uk/~wouter/Papers/sleep.pdf
19'December'12
End of year meeting. Marcus presents Fun with Bayesian & Decision & other paradoxes.
12'December'12
Summer scholars present their topics and informal question and answer session.
28'November'12
Marco leads readings on extensions of CTW
Volf, P., & Willems, F. (1997). A context-tree branch-weighting algorithm. SYMPOSIUM ON INFORMATION THEORY IN THE …. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.121.7873&rep=rep1&type=pdf
Willems, F. M. J. (1996). Context weighting for general finite-context sources. Information Theory, IEEE …, 42(5), 1514–1520. doi:10.1109/18.532891
Willems, F. M. J. (1998). The context-tree weighting method: extensions. IEEE Transactions on Information Theory, 44(2), 792–798. doi:10.1109/18.661523
21'November'12
Peter leads discussions on
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011) How to grow a mind: Statistics, structure, and abstraction. Science, 331, 1279-1285.
http://www.sciencemag.org/content/331/6022/1279.full.pdf
14'November'12
Monitoring: Wen and Mayank
7'November'12
Tom talks about his literature review on meta rationality
31'October'12
Xinjue gives practice talk on "Exploration in Bayesian Reinforcement Learning".
24'October'12
Monitoring talks
17'October'12
Peter presents "A Bayesian Sampling Approach to Exploration in Reinforcement Learning" by Asmuth, Li, Littman, Nouri, Wingate
http://web.mit.edu/~wingated/www/papers/boss.pdf
10'October'12
Hadi gives practice talk
3'October'12
Tor gives practice talk
26'September'12
Mayank gives practice talk
19'September'12
Marcus presents. See his email
12'September'12
Phuong gives practice talk for monitoring
5'September'12
Joel Veness gives a talk in the Seminar room downstairs on further developments of MC-AIXI.
29'August'12
Hadi talks about AGI papers
22'August'12
Phuong presents "TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration" by B. C. Silva and A. G. Barto
http://people.cs.umass.edu/~bsilva/deltaPi_aaai2012.pdf
15'August'12
Mayank presents Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. Available at http://webdocs.cs.ualberta.ca/~sutton/papers/horde-aamas-11.pdf .
8'August'12
Phuong summarizes two papers from AAAI12
- D. Lee and W. B. Powell, An Intelligent Battery Controller Using Bias-Corrected Q-learning
http://energysystems.princeton.edu/Papers/Lee_Powell_AAAI2012_BiasCorrectedQLearning.pdf
- W. Dabney and A. G. Barto, Adaptive Step-Size for Online Reinforcement Learning
http://people.cs.umass.edu/~wdabney/papers/alphaBounds.pdf
1'August'12
- Mayank presents "Safe exploration in Markov Decision Processes" by Teodor Mihai Moldovan and Pieter Abbeel, ICML'2012. [http://icml.cc/2012/papers/838.pdf]
25'July'12
- Wen presents "Efficient learning algorithms for changing environments" by Elad Hazan and C. Seshadhri, ICML'2009
http://dl.acm.org/citation.cfm?id=1553425
18 July'12
- Hadi presents "On Bayes Methods for On-line Boolean Prediction" by Nicolo Cesa-Bianchi and David P. Helmbold and Sandra Panizza, in NeuroColt 1997
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.631
11 July'12
- Phuong doing test run for AAAI presentation
27 June 12
- Tor presents: David Freedman, On the Asymptotic Behaviour of Bayes Estimates in the Discrete Case II, 1965.
- http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177700155
20 June 12
- Cancelled
13'June'12
- Phuong presents his work on Regret bounds for feature reinforcement learning where he extends the work by Maillard, Munos and Ryabko to the countable case.
30'May'12
- Mayank talks about Predictive State Representations
- Littman, Michael L.; Richard S. Sutton; Satinder Singh (2002). "Predictive Representations of State". Advances in Neural Information Processing Systems 14 (NIPS). pp. 1555–1561.
- Singh, Satinder; Michael R. James; Matthew R. Rudary (2004). "Predictive State Representations: A New Theory for Modeling Dynamical Systems". Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI). pp. 512–519.
10.May'12
- Wen reports from DCC 2012 on three papers (emailed out)
2.May'12
- Selecting the state representation in reinforcement learning Maillard, Munos and Ryabko
http://books.nips.cc/papers/files/nips24/NIPS2011_1427.pdf
18.April'12
- On Nicod's condition and the black raven paradox
The paper is available from Hadi or Peter by email
4.April'12
- Near-optimal Regret Bounds for Reinforcement Learning, Thomas Jaksch, Ronald Ortner and Peter Auer
http://jmlr.csail.mit.edu/papers/v11/jaksch10a.html
21.Mar'12
- Automatic discovery of ranking formulas for playing with multi-armed bandits, Francis Maes, Louis Wehenkel, and Damien Ernst, EWRL 2011
http://ewrl.files.wordpress.com/2011/08/ewrl2011_submission_15.pdf
14.Mar'12
- A theoretical analysis of model based interval estimation by A. Strehl and M. Littman, ICML 2005
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1496
presented by Peter
29.Feb'12
- Wen will be presenting his TPR on compression
15.Feb'12
- PAC bounds for Discounted MDPs, presented by Tor. Email ua.ude.una|eromittal.rot#ua.ude.una|eromittal.rot for a copy of the paper.
8.Feb'12
- Some inequalities in probability theory, presented by Tor
7.Dec'11
- Predictive State Temporal Difference Learning by Byron Boots and Geoff Gordon NIPS 2010
http://www.cs.cmu.edu/~ggordon/boots-gordon-PSTD.pdf
30.Nov'11
- Solomonoff Memorial conference in Melbourne. Tor, Ian, Peter and Wen presenting.
23.Nov'11
- An approximation of the universal intelligence measure, Shane Legg and Joel Veness
http://jveness.info/publications/rsmc2011%20-%20aiq.pdf
presented by Wen as a practice talk for Solomonoff Memorial
Further discussion of the paper follows the 20-minute presentation with slides
16.Nov'11
- Looping Suffix Tree-Based Inference of Partially Observable Hidden State, ICML 2006, Michael P. Holmes, Charles Lee Isbell, Jr.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.62.262
presented by Mayank
9.Nov'11
- We are done with the book. Paper reading resumes next week.
Nov'11
- Peter presents chapter 7 and Joseph Chapter 8.
Oct'11
- Wen presents Chapter 6
Sep'11
- After a break for the first two weeks, Phuong presents chapter 5
Aug'11
- Tor presents chapter 4
July'11
- Daniel finishes chapter 3
30.June'11
- Daniel presents chapter 3 of "Neuro-dynamic programming"
8,15,23.June'11
- Mayank presents chapter 2 of "Neuro-dynamic programming"
1.June'11
- We will start reading "Neuro-dynamic programming" by Dimitri P. Bertsekas and John Tsitsiklis
Athena Scientific 1996
http://www.amazon.com/Neuro-Dynamic-Programming-Optimization-Neural-Computation/dp/1886529108
We will go through the introduction this week and do some planning for the reading group.
This will be led by Peter
25.May'11
- Variable resolution discretization in optimal control
R. Munos, A.Moore
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1259&context=robotics
Presented by Daniel
18.May'11
- [WNLL] Planning and Learning in Environments with Delayed Feedback
Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.677
Presented by Matthew
11.May'11
- Meeting, discussing paper reviewing
4.May'11
- Zahra Zamani (PhD Monitoring) An Agent Architecture for Structured Uncertain Environments
http://cecs.anu.edu.au/seminars/more/SID/2834
- Phuong Nguyen (PhD Monitoring) Feature Reinforcement Learning In Practice
http://cecs.anu.edu.au/seminars/more/SID/2833
27.April'11
- Tor Lattimore (PhD Monitoring) Asymptotically Optimal Agents
http://cecs.anu.edu.au/seminars/more/SID/2832
- Wen Shao (PhD Monitoring) AIXI in Formalisation of Turing Test
http://cecs.anu.edu.au/seminars/more/SID/2835
20.April'11
- Matthew Robards (PhD Monitoring) Function Approximation for Model Based Reinforcement Learning
http://cecs.anu.edu.au/seminars/more/SID/2830
- Mayank Daswani (PhD Monitoring) Feature Dynamic Bayesian Networks
http://cecs.anu.edu.au/seminars/more/SID/2831
13.April'11
- Wen presents, [Chu10] Evgeny Chutchev (2010), A Formalization of the Turing Test
http://arxiv.org/abs/1005.4989
6.April'11
- Mayank Presents [Hut09] M.Hutter, Feature dynamic Bayesian networks.
In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.
http://www.hutter1.net/ai/phidbn.pdf
30.Mar'11
- Daniel Presents [NL09] A Nouri, M. Littman, Multi-resolution Exploration in Continuous Spaces, NIPS 2009
http://books.nips.cc/papers/files/nips21/NIPS2008_0730.pdf
23.Mar'11
- Peter presents "Dynamic Policy Programming"
http://www.mbfys.ru.nl/staff/m.azar/poster_NIPS09.pdf
http://arxiv.org/abs/1004.2027
16.Mar'11
- Phuong presents
9.Mar'11
- Pascal presents
Finale Doshi-Velez: Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Domains
http://www.informatik.uni-trier.de/%7Eley/db/conf/aaai/aaai2010.html#Doshi-Velez10
and Matthew presents Model Based RL with Function Approximation
4.Mar'11
"Pascal" Workshop on RL and Planning at NICTA Level 3, Meeting Room D
- 10:30 — 11:00 | Pascal Poupart: Explaining Automated Policies for Sequential Decision Making
- 11:00 — 11:18 | Debdeep Banerjee: Partial Order Support Link Scheduling
- 11:18 — 11:36 | Patrik Haslum: A Quick Overview of Factored (Classical) Planning
- Break — 12 minutes
- 11:48 — 12:06 | Scott Sanner: The Relational Dynamic Influence Diagram Language
- 12:06 — 12:24 | Peter Sunehag: History-based Reinforcement Learning
- 12:24 — 12:42 | Matt Robards: Model-Based Reinforcement Learning With Function Approximation
- 12:42 — 13:00 | Will Uther: topic TBD
2.Mar'11
- PAC-Bayesian Model Selection for Reinforcement Learning
Mahdi Milani Fard, Joelle Pineau
http://books.nips.cc/papers/files/nips23/NIPS2010_0431.pdf
Presented by Pascal Poupart
23.Feb'11
- Tor and Hassan present …
16.Feb'11
- Matthew and Peter present …
9.Feb.'11
- Bruno C. da Silva, Eduardo W. Basso, Ana L. C. Bazzan, Paulo M. Engel,
Dealing with Non-Stationary Environments using Context Detection
ICML 2006
http://www.autonlab.org/icml_documents/camera-ready/028_Dealing_with_Non_Sta.pdf
Presented by Aaron Li
15.Dec.'10
- Chapters 6—8, Sridhar Mahadevan, "Learning Representation and Control in Markov Decision Processes: New Frontiers".
Foundations and Trends in Machine Learning (editor, Michael Jordan), vol 1, No. 4, pp. 403-565 (163 pages), 2009.
http://www.cs.umass.edu/~mahadeva/papers/ml-found-trend.pdf
Presented by Scott
8.Dec.'10
- Statistical physics of social dynamics
Castellano, C., Fortunato, S., and Loreto, V. 2009. Reviews of Modern Physics 81, 2, 591
Section IV. Cultural Dynamics, Parts A & B (Axelrod model and variants)
http://dx.doi.org/10.1103/RevModPhys.81.591
Presented by Ian Wood
1.Dec.'10
- Constrained Complexity Generalized Context-Tree Algorithms, Robert J Drost and Andrew C Singer
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04301233
Presented by Peter
24.Nov'10
- Hopefully the following will be presented: Efficient real-time dynamic programming for factored MDPs.
Honors thesis by Sotirios Diamand. NOTE: In N101.
17.Nov'10
- Ofer Dekel and Shai Shalev Shwartz and Yoram Singer
Power of Selective Memory: Self Bounded Learning of Prediction Suffix Trees
NIPS 2004
http://ttic.uchicago.edu/~shai/papers/DekelShSi04.pdf
(Presented by Hassan)
10.Nov'10
- Efficient real-time dynamic programming for factored MDPs.
Honors thesis by Sotirios Diamand. CANCELED
3.Nov'10
- Cancelled
27.Oct'10
- Optimality Issues of Universal Greedy Agents with Static Priors by Laurent Orseau http://www.springerlink.com/content/p2780778k054411x/
Presented by Tor
20.Oct'10
- Never Ending Language Learning (from Tom Mitchell's group at CMU)
Scientific: http://rtw.ml.cmu.edu/papers/carlson-aaai10.pdf
News: http://www.nytimes.com/2010/10/05/science/05compute.html
Webpage: http://rtw.ml.cmu.edu/rtw/publications
13.Oct'10
- Frank Stephan will talk about Inductive Inference. He is a visitor from Singapore who was a PC chair at ALT and tutorial speaker. Webpage: http://www.comp.nus.edu.sg/~fstephan
22 .Sep'10, in Room 207 (only ours to 12:30, be on time)
- A Complete Theory of Everything, http://arxiv.org/abs/0912.5434
(Marcus)
15 .Sep'10
- Constantine Caramanis and Shie Mannor
Learning in the Limit with Adversarial Disturbances
In Proceedings of COLT 2008.
http://www.ece.mcgill.ca/~smanno1//public/C-CarmanisM-COLT2008.pdf
(Presented by Hassan)
I list two more interesting papers that are more RL related but harder (I think)
- Huibert Kwakernaak, Robust control and H∞-optimization - Tutorial paper. Automatica, 29 (2). pp. 255-273. 1993.
http://doc.utwente.nl/29962/1/Kwakernaak93robust.pdf
- Jun Morimoto and Kenji Doya, Robust Reinforcement Learning. Neural Computation 2005.
http://mitpress.mit.edu/journals/pdf/neco_17_2_335_0.pdf
And another making the case for pursuing robust estimators in general:
- Peter J. Huber. On the non-optimality of optimal procedures. Optimality, the third Erich L. Lehmann Symposium. 2009.
http://projecteuclid.org/euclid.lnms/1249305323
8 .Sep'10
- No Free Lunch and Occam's Razor in Supervised Learning
(Tor presents his work)
18,25 .Aug,1.Sep'10
- Chapter 8 of Universal AI book
(general discussion)
11 .Aug'10
- New TD algorithms from Alberta
(Matthew presents a survey of work by Maei, Sutton and their colleagues)
4 .Aug'10
- MC-AIXI-CTW,
(Joel Veness) NOTE LOCATION: A207
28 .July'10
- End of Chapter 7 of Universal AI book
(Tor presents)
21 .July'10
- Meeting about Advanced AI course
Aug'10
- Hyeong Soo Chang and Michael C. Fu and Jiaqiao Hu and Steven I. Marcus
An Adaptive Sampling Algorithm for Solving Markov Decision Processes
Operations Research, 53 (1), January–February 2005, pp. 126–139
http://www.rhsmith.umd.edu/faculty/mfu/fu_files/CFHM05.pdf
9,16 .Jun'10
- Chapter 7 of Universal AI book
(Presented by Phuong)
2.Jun'10
- Canceled
26.May'10
- Matthew Robards and Peter Sunehag and Scott Sanner
RKHS Temporal Difference Learning
Tech Report, The Australian National University
28.Apr'10&5,19.May'10
- Chapter 6 of Universal AI book
(Presented by Peter, Marcus, Zahra)
21.April'10
- F. Willems and Y. Shtarkov and T. Tjalkens
Reflections on the Prize Paper: "The Context-Tree Weighting Method: Basic Properties"
IEEE Information Theory Society Newsletter (47) No 1, March 1997
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.1872&rep=rep1&type=pdf
For more details,
- F. Willems and Y. Shtarkov and T. Tjalkens
The context-tree weighting method: Basic properties
IEEE Transactions on Information Theory (41), 653 - 664, 1995
http://ieeexplore.ieee.org/iel1/18/8656/00382012.pdf?arnumber=382012
(the following is a more readable version of the same paper)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.1819&rep=rep1&type=pdf
(Presented by Tor)
14.April'10
- L. Kocsis, Cs. Szepesvári
Bandit Based Monte-Carlo Planning
In Proceedings of the 17th European Conference on Machine Learning
Springer-Verlag, Berlin, LNCS/LNAI 4212, September 18-22, pp. 282-293, 2006.
http://www.sztaki.hu/~szcsaba/papers/ecml06.pdf
(Presented by Peter)
7.April'10
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R.E. Schapire.
The nonstochastic multiarmed bandit problem.
SIAM Journal on Computing, 32: 48- 77, 2002.
http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/AuerCeFrSc01.ps
(Presented by Mark Reid)
31.Mar'10
- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer
Finite time analysis of the multiarmed bandit problem
Machine Learning, 47(2-3):235-256, 2002.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.9211&rep=rep1&type=pdf
(Presented by Hassan)
17&24.Mar'10
- M. Kearns, Y. Mansour, and A.Y. Ng.
A sparse sampling algorithm for near optimal planning in large Markovian decision processes.
In Proceedings of IJCAI'99, pages 1324-1331, 1999.
http://www.cis.upenn.edu/~mkearns/papers/sparseplan.pdf
(Presented by Zahra)
3&10.Mar'10
- Joel Veness and Kee Siong Ng and Marcus Hutter and David Silver
A Monte Carlo AIXI Approximation
Technical Report, arXiv 0909.0801 (2009) 1-42
[implementation & application of the AIXI]
http://www.hutter1.net/ai/aixictw.pdf
(Presented by Sam)
3&10&17&24.Feb'10
- C. Boutilier and T. Dean and S. Hanks
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
Journal of Artificial Intelligence Research, 11 (1999) 1—94.
http://www.eecs.harvard.edu/~avi/CS281r/F06/Papers/boutilier-et-al-mdp.pdf
(Presented 3rd by Scott and the group, 10th no meeting, 17th Phuong, 24th Zahra)
27.Jan'10
- Sebastian Thrun, Probabilistic Algorithms in Robotics
AI Magazine, 21:4 (2000) 93—109
http://www.cs.cmu.edu/~thrun/papers/thrun.probrob.pdf
(Presented by Marcus)
20.Jan'10
- Continuation of last week's paper + Summer Scholar presentation preview.
13.Jan'10
- Note: Changed from Before.
Beal, M.J., Ghahramani, Z. and Rasmussen, C.E.
The Infinite Hidden Markov Model
In Advances in Neural Information Processing Systems 2002.
http://www.cse.buffalo.edu/faculty/mbeal/papers/ihmm.pdf
(Presented by Hassan).
23&30.Dec'09
- Break - (-: Christmas and New Years :-)
16.Dec'09
Scott will be presenting:
- (1) Excerpts of Scott's Thesis on factored MDPs.
- (2) Stochastic Planning using Decision Diagrams (SPUDD).
Hoey, St. Aubin, Hu, Boutilier (UAI-99)
http://www.cs.toronto.edu/~cebly/Papers/spudd.ps
- (3) Approximate Policy Construction using Decision Diagrams (APRICODD).
St. Aubin, Hoey, Boutilier (NIPS-00)
http://www.cs.ubc.ca/nest/lci/papers/docs2000/hoey-apricodd.pdf
09.Dec'09
- [RP08] S.Ross and J.Pineau.
Model-based Bayesian reinforcement learning in large structured domains.
In Proc. 24th Conference in Uncertainty in Artificial Intelligence
(UAI'08), pages 476-483, Helsinki, 2008. AUAI Press.
http://www.cs.mcgill.ca/~jpineau/files/sross-uai08.pdf
(Presented by Peter)
02.Dec'09
- Note: Changed from before.
M. Rosencrantz, G. Gordon, and S. Thrun.
Learning low dimensional predictive representations.
In Proceedings of the Twenty-First International Conference on Machine Learning,
Banff, Alberta, Canada, 2004.
http://robots.stanford.edu/papers/Rosencrantz04a.pdf
(Presented by Ian)
25.Nov'09
- [SJR04] S.P. Singh, M.R. James, and M.R. Rudary.
Predictive state representations: A new theory for modeling dynamical systems.
In Proc. 20th Conference in Uncertainty in Artificial Intelligence (UAI'04), pages 512-518, Banff, Canada, 2004. AUAI Press.
(Presented by Hassan)
18.Nov'09
- Matthew Robards will present his literature review on reinforcement learning in large, continuous spaces (focus on Part II).
Literature Review
11.Nov'09
- [SLL09] A.L. Strehl, L. Li, and Michael L. Littman.
Reinforcement learning in finite MDPs: PAC analysis.
http://paul.rutgers.edu/~strehl/, 2009.
(Presented by Marcus)
04.Nov'09
- [McC95] McCallum, R. Andrew.
Instance-Based Utile Distinctions for Reinforcement Learning.
The Proceedings of the Twelfth International Machine Learning Conference (ML'95).
Lake Tahoe, CA, 1995.
ftp://ftp.cs.rochester.edu/pub/papers/robotics/95.mccallum-ml.ps.Z
(Presented by Peter)
28.Oct'09
Scott will be talking about several nice methods for solving MDPs efficiently.
The 4 papers to be covered are summarized in the following slides:
http://sml.nicta.com.au/rlp08/RLP_MDP_Extensions.pdf
The papers themselves are as follows (it's recommended that people read the first one
and skim through the others).
- Algorithms for Inverse Reinforcement Learning.
Andrew Y. Ng and Stuart Russell.
ICML 2000.
http://robotics.stanford.edu/~ang/papers/icml00-irl.pdf
- Policy invariance under reward transformations: theory and application to reward shaping.
Andrew Y. Ng and Daishi Harada and Stuart Russell.
ICML 1999.
http://robotics.stanford.edu/~ang/papers/shaping-icml99.pdf
- Hierarchical Solution of Markov Decision Processes using Macro-actions.
Milos Hauskrecht and Nicolas Meuleau and Leslie Pack Kaelbling and Thomas Dean and Craig Boutilier.
UAI 1998.
Note: this paper builds on the macro action semi-MDP framework of Sutton & Precup, but makes some
important changes which make things much cleaner (theoretically and implementationally).
http://www.cs.toronto.edu/kr/papers/macros.pdf
- Reinforcement Learning with Hierarchies of Machines.
Ronald Parr and Stuart Russell.
NIPS 1998.
http://eprints.kfupm.edu.sa/61888/1/61888.pdf
21.Oct'09
- Andrew Y. Ng and Michael Jordan.
PEGASUS: A policy search method for large MDPs and POMDPs.
In Uncertainty in Artificial Intelligence, Proceedings of the Sixteenth Conference, 2000.
http://robotics.stanford.edu/~ang/papers/uai00-pegasus.pdf
(Presented by Matthew)
Addendum: Policy gradient techniques from a robotics perspective:
- Policy gradient methods for robotics.
J. Peters and S.Schaal.
IROS 2006
http://www-clmc.usc.edu/publications/P/peters-IROS2006.pdf
14.Oct'09
- [NCD04] A.Y. Ng, A.Coates, M.Diel, V.Ganapathi, J.Schulte, B.Tse, E.Berger, and E.Liang.
Autonomous inverted helicopter flight via reinforcement learning.
In ISER, volume 21 of Springer Tracts in Advanced Robotics, pages 363-372. Springer, 2004.
(Presented by Phuong)
30.Sep'09&7.Oct'09
- [RPPC08] S. Ross, J. Pineau, S. Paquet, B. Chaib-draa,
Online planning algorithms for POMDPs,
Journal of Artificial Intelligence Research, 32 (2008) 663—704.
This paper compares the "online" "tree-search" planning approach, popular for games
with the "offline" "self-consistent" Bellman equation approach,
popular in reinforcement learning (and described by Kaelbling et al. 1998).
(Presented by Peter).
16&23.Sep'09
- [KLC98] L.P. Kaelbling and M.L. Littman and A.R. Cassandra,
Planning and Acting in Partially Observable Stochastic Domains
Artificial Intelligence, 101 (1998) 99—134
(Presented by Marcus/Hassan/Sarah)
Papers in Queue
Neural Network Papers
- Malach, E. & Shalev-Shwartz, S. (2019). Is Deeper Better only when Shallow is Good?.
https://arxiv.org/pdf/1903.03488.pdf
- Pinkus, A. (1999). Approximation theory of the MLP model in neural networks. Acta Numerica, 8, 143–195.
https://doi.org/10.1017/S0962492900002919
- Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative Adversarial Networks. ArXiv:1406.2661 [Cs, Stat].
http://arxiv.org/abs/1406.2661
- Sutskever, I. (2015, January 13). A Brief Overview of Deep Learning [Random Ponderings]. Random Ponderings.
http://yyue.blogspot.com/2015/01/a-brief-overview-of-deep-learning.html
- Muthukumar, V., Vodrahalli, K., & Sahai, A. (2019). Harmless interpolation of noisy data in regression. ArXiv:1903.09139 [Cs, Stat].
http://arxiv.org/abs/1903.09139
- Du, S. S., Lee, J. D., Li, H., Wang, L., & Zhai, X. (2018). Gradient Descent Finds Global Minima of Deep Neural Networks. ArXiv:1811.03804 [Cs, Math, Stat].
http://arxiv.org/abs/1811.03804
- Arjovsky, M., & Bottou, L. (2017). Towards Principled Methods for Training Generative Adversarial Networks. ArXiv:1701.04862 [Cs, Stat].
http://arxiv.org/abs/1701.04862
- Alex Graves. (2014). Differentiable neural computers. Deepmind. /blog/article/differentiable-neural-computers
- Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing Machines. ArXiv:1410.5401 [Cs].
http://arxiv.org/abs/1410.5401
- Mazumdar, E. V., Jordan, M. I., & Sastry, S. S. (2019). On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games. ArXiv:1901.00838 [Cs, Math, Stat].
http://arxiv.org/abs/1901.00838
Deepmind Papers
- Representation Learning with Contrastive Predictive Coding
https://arxiv.org/pdf/1807.03748.pdf
- Variational Bayesian Reinforcement Learning with Regret Bounds
https://arxiv.org/pdf/1807.09647v1.pdf
- Maximum a Posteriori Policy Optimisation
https://arxiv.org/pdf/1806.06920.pdf
- Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
https://arxiv.org/pdf/1807.01281.pdf
- Modeling Friends and Foes
https://arxiv.org/pdf/1807.00196.pdf
- Agents and Devices: A Relative Definition of Agency
https://arxiv.org/pdf/1805.12387.pdf
- Progress & Compress: A scalable framework for continual learning
https://arxiv.org/pdf/1805.06370.pdf
- A Generalised Method for Empirical Game Theoretic Analysis
https://arxiv.org/pdf/1803.06376.pdf
- Learning to Search with MCTSnets
https://arxiv.org/pdf/1802.04697v2.pdf
General POMDPs
- Nishiyama, Y., Boularias, A., Gretton, A., and Fukumizu, K., Hilbert Space Embeddings of POMDPs, UAI, 2012
http://www.gatsby.ucl.ac.uk/~gretton/papers/NisBouGreFuk12.pdf
- Grunewalder, S., Lever, G., Baldassarre, L., Pontil, M., and Gretton, A., Modeling transition dynamics in MDPs with RKHS embeddings, ICML, 2012
http://www.gatsby.ucl.ac.uk/~gretton/papers/GruLevBalPonetal12.pdf
- Fukumizu, K., Song, L., and Gretton, A., Kernel Bayes' Rule, Advances in Neural Information Processing Systems 24, pp.1737-1745, 2011
http://www.gatsby.ucl.ac.uk/~gretton/papers/FukSonGre11.pdf
- [Dim10] Christos Dimitrakakis (2010) Context MDPs
http://fias.uni-frankfurt.de/~dimitrakakis/papers/cmdp.pdf
State Abstractions for RL
- [LWL06], Lihong Li, Thomas J. Walsh, Michael L. Littman,
Towards a Unified Theory of State Abstraction for MDPs
In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.1229
- [GDG03] R.Givan, T.Dean, and M.Greig.
Equivalence notions and model minimization in Markov decision processes.
Artificial Intelligence, 147(1-2):163-223, 2003.
- [Hut09a] M.Hutter.
Feature dynamic Bayesian networks.
In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, pages 67-73. Atlantis Press, 2009.
General MDPs
- [LLW08] Lihong Li, Michael L. Littman, Thomas J. Walsh: Knows what it knows: a framework for self-aware learning. ICML 2008: 568-575
www.machinelearning.org/archive/icml2008/papers/627.pdf
- [DLL09] The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning
Carlos Diuk, Lihong Li, Bethany Leffler ICML '09
http://dl.acm.org/citation.cfm?doid=1553374.1553406
- [LL10], Lihong Li and Michael L. Littman, Reducing reinforcement learning to KWIK online regression
Tenth International Symposium on Artificial Intelligence and Mathematics
http://www.springerlink.com/content/g25m74160311n665/fulltext.pdf
- [SL07] Alexander L. Strehl, Michael L. Littman, Online linear regression and its application to model-based reinforcement learning (NIPS 2007)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.6591
- [WGL10] Thomas J. Walsh, Sergiu Goschin, Michael L. Littman: Integrating Sample-Based Planning and Model-Based Reinforcement Learning. AAAI 2010,
http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1880
- [WNLL07], Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman,
Planning and Learning in Environments with Delayed Feedback
Autonomous Agents and Multi-Agent Systems 18(1): 83-105 (2009)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.677
- [JS10] Tobias Jung, Peter Stone: Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration.
ECML/PKDD (1) 2010: 601-616
http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ECML10-jung.pdf
- [JS09] Nicholas K. Jong, Peter Stone: Compositional Models for Reinforcement Learning.
ECML/PKDD (1) 2009: 644-659
http://www.springerlink.com/content/11460wl75p04493v/
- [GP10] M. Geist and O. Pietquin (2010) Kalman Temporal Differences
JAIR Volume 39, pages 483-532
http://www.jair.org/papers/paper3077.html
- [LT10] T. Lang and M. Toussaint (2010) Planning with Noisy Probabilistic Relational Rules
JAIR Volume 39, pages 1-49
http://www.jair.org/papers/paper3093.html
- [BBSE10] (Book), Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst (2010)
"Reinforcement Learning and Dynamic Programming Using Function Approximators"
in the Automation and Control Engineering series of Taylor & Francis CRC Press.
- [Mah09] Sridhar Mahadevan (2009) Learning Representation and Control in Markov Decision Processes: New Frontiers
Foundations and Trends in Machine Learning: Vol. 1: No 4, pp 403-565.
http://dx.doi.org/10.1561/2200000003
Miscellaneous
- [Gru04] P.D. Gruenwald.
Tutorial on minimum description length.
In Minimum Description Length: recent advances in theory and practice, Chapters 1 and 2. MIT Press, 2004.
- [BLA02] B. Ng, L. Peshkin, and A. Pfeffer.
Factored Particles for Scalable Monitoring.
In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002.
Background Reading
The standard textbook on RL is:
- [SB98/18] R. Sutton and A. Barto. Reinforcement learning: An introduction (1st/2nd ed)
Cambridge, MA, MIT Press (1998),
http://incompleteideas.net/book/the-book-2nd.html
At least read Chps.1,2,3,6,11/16.
See also:
- [KLM96] L. P. Kaelbling and M. L. Littman and A. W. Moore,
Reinforcement learning: A Survey,
Journal of Artificial Intelligence Research, 4 (1996) 237—285
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/jair/pub/volume4/kaelbling96a.pdf
- [Put94] M.L. Puterman.
Markov Decision Processes - Discrete Stochastic Dynamic Programming.
Wiley, New York, NY, 1994.
- [KV86] P.R. Kumar and P.P. Varaiya.
Stochastic Systems: Estimation, Identification, and Adaptive Control.
Prentice Hall, Englewood Cliffs, NJ, 1986.
- [PORL09] Partially Observable Reinforcement Learning
Symposium at NIPS'09 December 10, Vancouver
http://www.hutter1.net/ai/porlsymp.htm and
http://grla.wikidot.com/nips for more details.
Contact
Len Du <moc.liamg|cilbup.ud.nel#moc.liamg|cilbup.ud.nel> or
David Quarel <ua.ude.una|lerauq.divad#ua.ude.una|lerauq.divad> or
Elliot Catt <ua.ude.una|ttacretneprac.toille#ua.ude.una|ttacretneprac.toille> or
Marcus Hutter <ua.ude.una|rettuh.sucram#ua.ude.una|rettuh.sucram>