Fitted Q Iteration with CMACs...1
Stephan Timmer and Martin Riedmiller
Reinforcement-Learning-based Magneto-Hydrodynamic Control of Hypersonic Flows...9
Nilesh Kulkarni and Minh Phan
Nasser Sadati and Mohammad Mollaie Emamzadeh
Knowledge Transfer using Local Features...26
Martin Stolle and Christopher Atkeson
Particle Swarm Optimized Adaptive Dynamic Programming...32
Dongbin Zhao, Jianqiang Yi and Derong Liu
Discrete-time nonlinear HJB solution using Approximate dynamic programming: Convergence Proof...38
Asma Al-Tamimi and Frank Lewis
Dual Representations for Dynamic Programming and Reinforcement Learning...44
Tao Wang, Michael Bowling and Dale Schuurmans
An Optimal ADP Algorithm for a High-Dimensional Stochastic Control Problem...52
Juliana Nascimento and Warren Powell
Convergence of Model-Based Temporal Difference Learning for Control...60
Hado van Hasselt and Marco Wiering
On a Successful Application of Multi-Agent Reinforcement Learning to Operations Research Benchmarks...68
Thomas Gabel and Martin Riedmiller
The Effect of Bootstrapping in Multi-Automata Reinforcement Learning...76
Maarten Peeters, Katja Verbeeck and Ann Nowe
A Theoretical Analysis of Cooperative Behavior in Multi-agent Q-learning...84
Ludo Waltman and Uzay Kaymak
Hybrid Ant Colony Optimization Using Memetic Algorithm for Traveling Salesman Problem...92
Haibin Duan and Xiufen Yu
Model-Based Reinforcement Learning in Factored-State MDPs...103
Alexander Strehl
Leader-Follower semi-Markov Decision Problems: Theoretical Framework and Approximate Solution...111
Kurian Tharakunnel and Siddhartha Bhattacharyya
A Scalable Recurrent Neural Network Framework for Model-Free POMDPs...119
Zhenzhen Liu and Itamar Elhanany
Reinforcement learning by backpropagation through an LSTM model/critic...127
Bram Bakker
Discrete-Time Approximate Dynamic Programming using Wavelet Basis Function Neural Networks...135
Ning Jin, Derong Liu, Ting Huang and Zhongyu Pang
The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards...143
Peter Frazier and Warren Powell
A Recurrent Control Neural Network for Data Efficient Reinforcement Learning...151
Anton Schaefer, Steffen Udluft and Hans-Georg Zimmermann
Computing Optimal Stationary Policies for Multi-Objective Markov Decision Processes...158
Marco Wiering and Edwin De Jong
Strategy Generation with Cognitive Distance in Two-Player Games...166
Kosuke Sekiyama, Ricardo Carnieri and Toshio Fukuda
Identifying Trajectory Classes in Dynamic Tasks...172
Stuart Anderson and Siddhartha Srinivasa
A Dynamic Programming Approach to Viability Problems...178
Pierre-Arnaud Coquelin, Sophie Martin and Remi Munos
Randomly Sampling Actions In Dynamic Programming...185
Christopher Atkeson
SVM Viability Controller Active Learning: Application to Bike Control...193
Laetitia Chapel and Guillaume Deffuant
Jose Ramirez-Hernandez and Emmanuel Fernandez
Opposition-Based Reinforcement Learning in the Management of Water Resources...217
Hamid R. Tizhoosh, Masoud Mahootchi and K. Ponnambalam
Online Reinforcement Learning Neural Network Controller Design for Nanomanipulation...225
Qinmin Yang and Sarangapani Jagannathan
Short-term Stock Market Timing Prediction under Reinforcement Learning Schemes...233
Hailin Li, Cihan Dagli and David Enke
Dynamic optimization of the strength ratio during a terrestrial conflict...241
Alexandre Sztykgold, Gilles Coppin and Olivier Hudry
Continuous-Time ADP for Linear Systems with Partially Unknown Dynamics...247
Draguna Vrabie, Murad Abu-Khalaf, Frank L. Lewis and Youyi Wang
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark...254
Martin Riedmiller, Jan Peters and Stefan Schaal
Using Reward-weighted Regression for Reinforcement Learning of Task Space Control...262
Jan Peters and Stefan Schaal
Toward effective combination of off-line and on-line training in ADP framework...268
Danil Prokhorov
Reinforcement Learning in Continuous Action Spaces...272
Hado van Hasselt and Marco Wiering
Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods...280
Marco Wiering and Hado van Hasselt
Opposition-Based Q(λ) with Non-Markovian Update...288
Hamid R. Tizhoosh, Maryam Shokri and Mohamed S. Kamel
Coordinated Reinforcement Learning for Decentralized Optimal Control...296
Daniel Yagan and Chen-Khong Tham
DHP Adaptive Critic Motion Control of Autonomous Wheeled Mobile Robot...311
Wei-Song Lin and Ping-Chieh Yang
Daniel Patino, Julian Pucheta, Carlos Schugurensky, Rogelio Fullana and Benjamin Kuchen
Roman Ilin, Robert Kozma and Paul Werbos
Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory...330
Andras Antos, Csaba Szepesvari and Remi Munos
Kernelizing LSPE(λ)...338
Tobias Jung and Daniel Polani
Q-learning algorithm with continuous state spaces and finite decision set...346
Kengy Barty, Pierre Girardeau, Jean Sebastien Roy and Cyrille Strugarek
Sparse Temporal Difference Learning using LASSO...352
Manuel Loth, Manuel Davy and Philippe Preux