References
[1]
David H. Ackley and Michael L. Littman.
Generalization and scaling in reinforcement learning.
In D. S. Touretzky, editor, Advances in Neural Information
Processing Systems 2, pages 550-557, San Mateo, CA, 1990. Morgan Kaufmann.
[2]
J. S. Albus.
A new approach to manipulator control: Cerebellar model articulation
controller (CMAC).
Journal of Dynamic Systems, Measurement and Control,
97:220-227, 1975.
[3]
James S. Albus.
Brains, Behavior, and Robotics.
BYTE Books, Subsidiary of McGraw-Hill, Peterborough, New Hampshire,
1981.
[4]
Charles W. Anderson.
Learning and Problem Solving with Multilayer Connectionist
Systems.
PhD thesis, University of Massachusetts, Amherst, MA, 1986.
[5]
Rachita (Ronny) Ashar.
Hierarchical learning in stochastic domains.
Master's thesis, Brown University, Providence, Rhode Island, 1994.
[6]
Leemon Baird.
Residual algorithms: Reinforcement learning with function
approximation.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 30-37, San
Francisco, CA, 1995. Morgan Kaufmann.
[7]
Leemon C. Baird and A. H. Klopf.
Reinforcement learning with high-dimensional, continuous actions.
Technical Report WL-TR-93-1147, Wright Laboratory, Wright-Patterson Air
Force Base, OH, 1993.
[8]
Andrew G. Barto, S. J. Bradtke, and Satinder P. Singh.
Learning to act using real-time dynamic programming.
Artificial Intelligence, 72(1):81-138, 1995.
[9]
Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson.
Neuronlike adaptive elements that can solve difficult learning
control problems.
IEEE Transactions on Systems, Man, and Cybernetics,
SMC-13(5):834-846, 1983.
[10]
Richard Bellman.
Dynamic Programming.
Princeton University Press, Princeton, NJ, 1957.
[11]
Hamid R. Berenji.
Artificial neural networks and approximate reasoning for intelligent
control in space.
In American Control Conference, pages 1075-1080, 1991.
[12]
Donald A. Berry and Bert Fristedt.
Bandit Problems: Sequential Allocation of Experiments.
Chapman and Hall, London, UK, 1985.
[13]
Dimitri P. Bertsekas.
Dynamic Programming: Deterministic and Stochastic Models.
Prentice-Hall, Englewood Cliffs, NJ, 1987.
[14]
Dimitri P. Bertsekas.
Dynamic Programming and Optimal Control.
Athena Scientific, Belmont, Massachusetts, 1995.
Volumes 1 and 2.
[15]
Dimitri P. Bertsekas and D. A. Castañon.
Adaptive aggregation for infinite horizon dynamic programming.
IEEE Transactions on Automatic Control, 34(6):589-598, 1989.
[16]
Dimitri P. Bertsekas and John N. Tsitsiklis.
Parallel and Distributed Computation: Numerical Methods.
Prentice-Hall, Englewood Cliffs, NJ, 1989.
[17]
G. E. P. Box and N. R. Draper.
Empirical Model-Building and Response Surfaces.
Wiley, 1987.
[18]
Justin A. Boyan and Andrew W. Moore.
Generalization in reinforcement learning: Safely approximating the
value function.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995.
The MIT Press.
[19]
D. Burghes and A. Graham.
Introduction to Control Theory including Optimal Control.
Ellis Horwood, 1980.
[20]
Anthony R. Cassandra, Leslie Pack Kaelbling, and Michael L. Littman.
Acting optimally in partially observable stochastic domains.
In Proceedings of the Twelfth National Conference on Artificial
Intelligence, Seattle, WA, 1994.
[21]
David Chapman and Leslie Pack Kaelbling.
Input generalization in delayed reinforcement learning: An
algorithm and performance comparisons.
In Proceedings of the International Joint Conference on
Artificial Intelligence, Sydney, Australia, 1991.
[22]
Lonnie Chrisman.
Reinforcement learning with perceptual aliasing: The perceptual
distinctions approach.
In Proceedings of the Tenth National Conference on Artificial
Intelligence, pages 183-188, San Jose, CA, 1992. AAAI Press.
[23]
Lonnie Chrisman and Michael Littman.
Hidden state and short-term memory, 1993.
Presentation at Reinforcement Learning Workshop, Machine Learning
Conference.
[24]
Pawel Cichosz and Jan J. Mulawka.
Fast and efficient reinforcement learning with truncated temporal
differences.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 99-107, San
Francisco, CA, 1995. Morgan Kaufmann.
[25]
W. S. Cleveland and S. J. Devlin.
Locally weighted regression: An approach to regression analysis by
local fitting.
Journal of the American Statistical Association,
83(403):596-610, September 1988.
[26]
Dave Cliff and Susi Ross.
Adding temporary memory to ZCS.
Adaptive Behavior, 3(2):101-150, 1994.
[27]
Anne Condon.
The complexity of stochastic games.
Information and Computation, 96(2):203-224, February 1992.
[28]
Jonathan Connell and Sridhar Mahadevan.
Rapid task learning for real robots.
In Robot Learning. Kluwer Academic Publishers, 1993.
[29]
R. H. Crites and A. G. Barto.
Improving elevator performance using reinforcement learning.
In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in
Neural Information Processing Systems 8, 1996.
[30]
Peter Dayan.
The convergence of TD(λ) for general λ.
Machine Learning, 8(3):341-362, 1992.
[31]
Peter Dayan and Geoffrey E. Hinton.
Feudal reinforcement learning.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances
in Neural Information Processing Systems 5, San Mateo, CA, 1993. Morgan
Kaufmann.
[32]
Peter Dayan and Terrence J. Sejnowski.
TD(λ) converges with probability 1.
Machine Learning, 14(3), 1994.
[33]
Thomas Dean, Leslie Pack Kaelbling, Jak Kirman, and Ann Nicholson.
Planning with deadlines in stochastic domains.
In Proceedings of the Eleventh National Conference on Artificial
Intelligence, Washington, DC, 1993.
[34]
F. D'Epenoux.
A probabilistic production and inventory problem.
Management Science, 10:98-108, 1963.
[35]
Cyrus Derman.
Finite State Markovian Decision Processes.
Academic Press, New York, 1970.
[36]
M. Dorigo and H. Bersini.
A comparison of Q-learning and classifier systems.
In From Animals to Animats: Proceedings of the Third
International Conference on the Simulation of Adaptive Behavior, Brighton,
UK, 1994.
[37]
M. Dorigo and M. Colombetti.
Robot shaping: Developing autonomous agents through learning.
Artificial Intelligence, 71(2):321-370, December 1994.
[38]
Marco Dorigo.
Alecsys and the AutonoMouse: Learning to control a real robot by
distributed classifier systems.
Machine Learning, 19, 1995.
[39]
Claude-Nicolas Fiechter.
Efficient reinforcement learning.
In Proceedings of the Seventh Annual ACM Conference on
Computational Learning Theory, pages 88-97. Association for Computing
Machinery, 1994.
[40]
J. C. Gittins.
Multi-armed Bandit Allocation Indices.
Wiley-Interscience series in systems and optimization. Wiley,
Chichester, NY, 1989.
[41]
D. Goldberg.
Genetic Algorithms in Search, Optimization, and Machine
Learning.
Addison-Wesley, MA, 1989.
[42]
Geoffrey J. Gordon.
Stable function approximation in dynamic programming.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 261-268,
San Francisco, CA, 1995. Morgan Kaufmann.
[43]
Vijay Gullapalli.
A stochastic reinforcement learning algorithm for learning
real-valued functions.
Neural Networks, 3:671-692, 1990.
[44]
Vijay Gullapalli.
Reinforcement learning and its application to control.
PhD thesis, University of Massachusetts, Amherst, MA, 1992.
[45]
Ernest R. Hilgard and Gordon H. Bower.
Theories of Learning.
Prentice-Hall, Englewood Cliffs, NJ, fourth edition, 1975.
[46]
A. J. Hoffman and R. M. Karp.
On nonterminating stochastic games.
Management Science, 12:359-370, 1966.
[47]
John H. Holland.
Adaptation in Natural and Artificial Systems.
University of Michigan Press, Ann Arbor, MI, 1975.
[48]
Ronald A. Howard.
Dynamic Programming and Markov Processes.
The MIT Press, Cambridge, MA, 1960.
[49]
Tommi Jaakkola, Michael I. Jordan, and Satinder P. Singh.
On the convergence of stochastic iterative dynamic programming
algorithms.
Neural Computation, 6(6), November 1994.
[50]
Tommi Jaakkola, Satinder Pal Singh, and Michael I. Jordan.
Monte-Carlo reinforcement learning in non-Markovian decision
problems.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995.
The MIT Press.
[51]
Leslie Pack Kaelbling.
Hierarchical learning in stochastic domains: Preliminary results.
In Proceedings of the Tenth International Conference on Machine
Learning, Amherst, MA, 1993. Morgan Kaufmann.
[52]
Leslie Pack Kaelbling.
Learning in Embedded Systems.
The MIT Press, Cambridge, MA, 1993.
[53]
Leslie Pack Kaelbling.
Associative reinforcement learning: A generate and test algorithm.
Machine Learning, 15(3), 1994.
[54]
Leslie Pack Kaelbling.
Associative reinforcement learning: Functions in k-DNF.
Machine Learning, 15(3), 1994.
[55]
Jak Kirman.
Predicting Real-Time Planner Performance by Domain
Characterization.
PhD thesis, Department of Computer Science, Brown University, 1994.
[56]
Sven Koenig and Reid G. Simmons.
Complexity analysis of real-time reinforcement learning.
In Proceedings of the Eleventh National Conference on Artificial
Intelligence, pages 99-105, Menlo Park, California, 1993. AAAI Press/MIT
Press.
[57]
P. R. Kumar and P. P. Varaiya.
Stochastic Systems: Estimation, Identification, and Adaptive
Control.
Prentice Hall, Englewood Cliffs, New Jersey, 1986.
[58]
C. C. Lee.
A self learning rule-based controller employing approximate reasoning
and neural net concepts.
International Journal of Intelligent Systems, 6(1):71-93,
1991.
[59]
Long-Ji Lin.
Programming robots using reinforcement learning and teaching.
In Proceedings of the Ninth National Conference on Artificial
Intelligence, 1991.
[60]
Long-Ji Lin.
Hierarchical learning of robot skills by reinforcement.
In Proceedings of the International Conference on Neural
Networks, 1993.
[61]
Long-Ji Lin.
Reinforcement Learning for Robots Using Neural Networks.
PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 1993.
[62]
Long-Ji Lin and Tom M. Mitchell.
Memory approaches to reinforcement learning in non-Markovian
domains.
Technical Report CMU-CS-92-138, Carnegie Mellon University, School of
Computer Science, May 1992.
[63]
Michael L. Littman.
Markov games as a framework for multi-agent reinforcement learning.
In Proceedings of the Eleventh International Conference on
Machine Learning, pages 157-163, San Francisco, CA, 1994. Morgan Kaufmann.
[64]
Michael L. Littman.
Memoryless policies: Theoretical limitations and practical results.
In Dave Cliff, Philip Husbands, Jean-Arcady Meyer, and Stewart W.
Wilson, editors, From Animals to Animats 3: Proceedings of the Third
International Conference on Simulation of Adaptive Behavior, Cambridge, MA,
1994. The MIT Press.
[65]
Michael L. Littman, Anthony Cassandra, and Leslie Pack Kaelbling.
Learning policies for partially observable environments: Scaling
up.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 362-370,
San Francisco, CA, 1995. Morgan Kaufmann.
[66]
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling.
On the complexity of solving Markov decision problems.
In Proceedings of the Eleventh Annual Conference on Uncertainty
in Artificial Intelligence (UAI-95), Montreal, Québec, Canada, 1995.
[67]
William S. Lovejoy.
A survey of algorithmic methods for partially observable Markov
decision processes.
Annals of Operations Research, 28:47-66, 1991.
[68]
Pattie Maes and Rodney A. Brooks.
Learning to coordinate behaviors.
In Proceedings Eighth National Conference on Artificial
Intelligence, pages 796-802. Morgan Kaufmann, 1990.
[69]
Sridhar Mahadevan.
To discount or not to discount in reinforcement learning: A case
study comparing R learning and Q learning.
In Proceedings of the Eleventh International Conference on
Machine Learning, pages 164-172, San Francisco, CA, 1994. Morgan Kaufmann.
[70]
Sridhar Mahadevan.
Average reward reinforcement learning: Foundations, algorithms, and
empirical results.
Machine Learning, 22(1), 1996.
[71]
Sridhar Mahadevan and Jonathan Connell.
Automatic programming of behavior-based robots using reinforcement
learning.
In Proceedings of the Ninth National Conference on Artificial
Intelligence, Anaheim, CA, 1991.
[72]
Sridhar Mahadevan and Jonathan Connell.
Scaling reinforcement learning to robotics by exploiting the
subsumption architecture.
In Proceedings of the Eighth International Workshop on Machine
Learning, pages 328-332, 1991.
[73]
Maja J. Mataric.
Reward functions for accelerated learning.
In W. W. Cohen and H. Hirsh, editors, Proceedings of the
Eleventh International Conference on Machine Learning. Morgan Kaufmann,
1994.
[74]
Andrew Kachites McCallum.
Reinforcement Learning with Selective Perception and Hidden
State.
PhD thesis, Department of Computer Science, University of Rochester,
December 1995.
[75]
R. Andrew McCallum.
Overcoming incomplete perception with utile distinction memory.
In Proceedings of the Tenth International Conference on Machine
Learning, pages 190-196, Amherst, Massachusetts, 1993. Morgan Kaufmann.
[76]
R. Andrew McCallum.
Instance-based utile distinctions for reinforcement learning with
hidden state.
In Proceedings of the Twelfth International Conference on Machine
Learning, pages 387-395, San Francisco, CA, 1995. Morgan Kaufmann.
[77]
Lisa Meeden, G. McGraw, and D. Blank.
Emergent control and planning in an autonomous vehicle.
In D. S. Touretzky, editor, Proceedings of the Fifteenth Annual
Meeting of the Cognitive Science Society, pages 735-740. Lawrence Erlbaum
Associates, Hillsdale, NJ, 1993.
[78]
Jose del R. Millan.
Rapid, safe, and incremental learning of navigation strategies.
IEEE Transactions on Systems, Man, and Cybernetics, 26(3),
1996.
[79]
George E. Monahan.
A survey of partially observable Markov decision processes:
Theory, models, and algorithms.
Management Science, 28:1-16, January 1982.
[80]
Andrew W. Moore.
Variable resolution dynamic programming: Efficiently learning
action maps in multivariate real-valued spaces.
In Proceedings of the Eighth International Workshop on Machine
Learning, 1991.
[81]
Andrew W. Moore.
The parti-game algorithm for variable resolution reinforcement
learning in multidimensional state-spaces.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances
in Neural Information Processing Systems 6, pages 711-718, San Mateo, CA,
1994. Morgan Kaufmann.
[82]
Andrew W. Moore and Christopher G. Atkeson.
An investigation of memory-based function approximators for learning
control.
Technical report, MIT Artificial Intelligence Laboratory, Cambridge,
MA, 1992.
[83]
Andrew W. Moore and Christopher G. Atkeson.
Prioritized sweeping: Reinforcement learning with less data and
less real time.
Machine Learning, 13, 1993.
[84]
Andrew W. Moore, Christopher G. Atkeson, and S. Schaal.
Memory-based learning for control.
Technical Report CMU-RI-TR-95-18, CMU Robotics Institute, 1995.
[85]
Kumpati Narendra and M. A. L. Thathachar.
Learning Automata: An Introduction.
Prentice-Hall, Englewood Cliffs, NJ, 1989.
[86]
Kumpati S. Narendra and M. A. L. Thathachar.
Learning automata--a survey.
IEEE Transactions on Systems, Man, and Cybernetics,
4(4):323-334, July 1974.
[87]
Jing Peng and Ronald J. Williams.
Efficient learning and planning within the Dyna framework.
Adaptive Behavior, 1(4):437-454, 1993.
[88]
Jing Peng and Ronald J. Williams.
Incremental multi-step Q-learning.
In Proceedings of the Eleventh International Conference on
Machine Learning, pages 226-232, San Francisco, CA, 1994. Morgan Kaufmann.
[89]
Dean A. Pomerleau.
Neural Network Perception for Mobile Robot Guidance.
Kluwer Academic Publishers, 1993.
[90]
Martin L. Puterman.
Markov Decision Processes--Discrete Stochastic Dynamic
Programming.
John Wiley & Sons, Inc., New York, NY, 1994.
[91]
Martin L. Puterman and Moon Chirl Shin.
Modified policy iteration algorithms for discounted Markov decision
processes.
Management Science, 24:1127-1137, 1978.
[92]
M. B. Ring.
Continual Learning in Reinforcement Environments.
PhD thesis, University of Texas at Austin, Austin, Texas, August
1994.
[93]
Ulrich Rüde.
Mathematical and Computational Techniques for Multilevel
Adaptive Methods.
Society for Industrial and Applied Mathematics, Philadelphia,
Pennsylvania, 1993.
[94]
D. E. Rumelhart and J. L. McClelland, editors.
Parallel Distributed Processing: Explorations in the
Microstructure of Cognition. Volume 1: Foundations.
The MIT Press, Cambridge, MA, 1986.
[95]
G. A. Rummery and M. Niranjan.
On-line Q-learning using connectionist systems.
Technical Report CUED/F-INFENG/TR166, Cambridge University, 1994.
[96]
John Rust.
Numerical dynamic programming in economics.
In Handbook of Computational Economics. Elsevier, North
Holland, 1996.
[97]
A. P. Sage and C. C. White.
Optimum Systems Control.
Prentice Hall, 1977.
[98]
Marcos Salganicoff and Lyle H. Ungar.
Active exploration and learning in real-valued spaces using
multi-armed bandit allocation indices.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 480-487,
San Francisco, CA, 1995. Morgan Kaufmann.
[99]
A. L. Samuel.
Some studies in machine learning using the game of checkers.
IBM Journal of Research and Development, 3:211-229, 1959.
Reprinted in E. A. Feigenbaum and J. Feldman, editors, Computers
and Thought, McGraw-Hill, New York 1963.
[100]
S. Schaal and Christopher Atkeson.
Robot juggling: An implementation of memory-based learning.
Control Systems Magazine, 14, 1994.
[101]
J. Schmidhuber.
A general method for multi-agent learning and incremental
self-improvement in unrestricted environments.
In X. Yao, editor, Evolutionary Computation: Theory and
Applications. Scientific Publ. Co., Singapore, 1996.
[102]
J. H. Schmidhuber.
Curious model-building control systems.
In Proceedings of the International Joint Conference on Neural Networks,
Singapore, volume 2, pages 1458-1463. IEEE, 1991.
[103]
Jürgen H. Schmidhuber.
Reinforcement learning in Markovian and non-Markovian
environments.
In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors,
Advances in Neural Information Processing Systems 3, pages 500-506, San
Mateo, CA, 1991. Morgan Kaufmann.
[104]
Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski.
Temporal difference learning of position evaluation in the game of
Go.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances
in Neural Information Processing Systems 6, pages 817-824, San Mateo, CA,
1994. Morgan Kaufmann.
[105]
Alexander Schrijver.
Theory of Linear and Integer Programming.
Wiley-Interscience, New York, NY, 1986.
[106]
Anton Schwartz.
A reinforcement learning method for maximizing undiscounted rewards.
In Proceedings of the Tenth International Conference on Machine
Learning, pages 298-305, Amherst, Massachusetts, 1993. Morgan Kaufmann.
[107]
Satinder P. Singh, Andrew G. Barto, Roderic Grupen, and Christopher Connolly.
Robust reinforcement learning in motion planning.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances
in Neural Information Processing Systems 6, pages 655-662, San Mateo, CA,
1994. Morgan Kaufmann.
[108]
Satinder P. Singh and Richard S. Sutton.
Reinforcement learning with replacing eligibility traces.
Machine Learning, 22(1), 1996.
[109]
Satinder Pal Singh.
Reinforcement learning with a hierarchy of abstract models.
In Proceedings of the Tenth National Conference on Artificial
Intelligence, pages 202-207, San Jose, CA, 1992. AAAI Press.
[110]
Satinder Pal Singh.
Transfer of learning by composing solutions of elemental sequential
tasks.
Machine Learning, 8(3):323-340, 1992.
[111]
Satinder Pal Singh.
Learning to Solve Markovian Decision Processes.
PhD thesis, Department of Computer Science, University of
Massachusetts, 1993.
Also, CMPSCI Technical Report 93-77.
[112]
Robert F. Stengel.
Stochastic Optimal Control.
John Wiley and Sons, 1986.
[113]
R. S. Sutton.
Generalization in reinforcement learning: Successful examples using
sparse coarse coding.
In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in
Neural Information Processing Systems 8, 1996.
[114]
Richard S. Sutton.
Temporal Credit Assignment in Reinforcement Learning.
PhD thesis, University of Massachusetts, Amherst, MA, 1984.
[115]
Richard S. Sutton.
Learning to predict by the method of temporal differences.
Machine Learning, 3(1):9-44, 1988.
[116]
Richard S. Sutton.
Integrated architectures for learning, planning, and reacting based
on approximating dynamic programming.
In Proceedings of the Seventh International Conference on
Machine Learning, Austin, TX, 1990. Morgan Kaufmann.
[117]
Richard S. Sutton.
Planning by incremental dynamic programming.
In Proceedings of the Eighth International Workshop on Machine
Learning, pages 353-357. Morgan Kaufmann, 1991.
[118]
Gerald Tesauro.
Practical issues in temporal difference learning.
Machine Learning, 8:257-277, 1992.
[119]
Gerald Tesauro.
TD-Gammon, a self-teaching backgammon program, achieves
master-level play.
Neural Computation, 6(2):215-219, 1994.
[120]
Gerald Tesauro.
Temporal difference learning and TD-Gammon.
Communications of the ACM, 38(3):58-67, March 1995.
[121]
C-K. Tham and R. W. Prager.
A modular Q-learning architecture for manipulator task decomposition.
In Proceedings of the Eleventh International Conference on
Machine Learning, San Francisco, CA, 1994. Morgan Kaufmann.
[122]
Sebastian Thrun.
Learning to play the game of chess.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995.
The MIT Press.
[123]
Sebastian Thrun and Anton Schwartz.
Issues in using function approximation for reinforcement learning.
In M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend,
editors, Proceedings of the 1993 Connectionist Models Summer School,
Hillsdale, NJ, 1993. Lawrence Erlbaum.
[124]
Sebastian B. Thrun.
The role of exploration in learning control.
In David A. White and Donald A. Sofge, editors, Handbook of
Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand
Reinhold, New York, NY, 1992.
[125]
John N. Tsitsiklis.
Asynchronous stochastic approximation and Q-learning.
Machine Learning, 16(3), September 1994.
[126]
John N. Tsitsiklis and Ben Van Roy.
Feature-based methods for large scale dynamic programming.
Machine Learning, 22(1), 1996.
[127]
L. G. Valiant.
A theory of the learnable.
Communications of the ACM, 27(11):1134-1142, November 1984.
[128]
Christopher J. C. H. Watkins.
Learning from Delayed Rewards.
PhD thesis, King's College, Cambridge, UK, 1989.
[129]
Christopher J. C. H. Watkins and Peter Dayan.
Q-learning.
Machine Learning, 8(3):279-292, 1992.
[130]
Steven D. Whitehead.
Complexity and cooperation in Q-learning.
In Proceedings of the Eighth International Workshop on Machine
Learning, Evanston, IL, 1991. Morgan Kaufmann.
[131]
Ronald J. Williams.
A class of gradient-estimating algorithms for reinforcement learning
in neural networks.
In Proceedings of the IEEE First International Conference on
Neural Networks, San Diego, CA, 1987.
[132]
Ronald J. Williams.
Simple statistical gradient-following algorithms for connectionist
reinforcement learning.
Machine Learning, 8(3):229-256, 1992.
[133]
Ronald J. Williams and Leemon C. Baird, III.
Analysis of some incremental variants of policy iteration: First
steps toward understanding actor-critic learning systems.
Technical Report NU-CCS-93-11, Northeastern University, College of
Computer Science, Boston, MA, September 1993.
[134]
Ronald J. Williams and Leemon C. Baird, III.
Tight performance bounds on greedy policies based on imperfect value
functions.
Technical Report NU-CCS-93-14, Northeastern University, College of
Computer Science, Boston, MA, November 1993.
[135]
Stewart Wilson.
Classifier fitness based on accuracy.
Evolutionary Computation, 3(2):147-173, 1995.
[136]
W. Zhang and T. G. Dietterich.
A reinforcement learning approach to job-shop scheduling.
In Proceedings of the International Joint Conference on
Artificial Intelligence, 1995.