References
- Barto et al. 1989
-
A. Barto, R. Sutton, and C. Watkins.
Learning and sequential decision making.
Technical Report COINS 89-95, University of Massachusetts, 1989.
- Barto et al. 1995
-
A. G. Barto, S. J. Bradtke, and S. P. Singh.
Real-time learning and control using asynchronous dynamic
programming.
Artificial Intelligence, 1995.
- Bellman1957
-
Richard Bellman.
Dynamic Programming.
Princeton University Press, 1957.
- Bellman1978
-
R. Bellman.
An Introduction to Artificial Intelligence: Can Computers
Think?
Boyd & Fraser Publishing Company, 1978.
- Berry and Fristedt1985
-
D. A. Berry and B. Fristedt.
Bandit Problems: Sequential Allocation of Experiments.
Chapman and Hall, 1985.
- Bertsekas and Tsitsiklis1996
-
D. Bertsekas and J. Tsitsiklis.
Neuro-Dynamic Programming.
Athena Scientific, Belmont, MA, 1996.
- Bertsekas1995
-
D. Bertsekas.
A counterexample to temporal differences learning.
Neural Computation, 7:270-9, 1995.
- Boyan and Moore1995
-
J. A. Boyan and A. W. Moore.
Generalization in reinforcement learning: Safely approximating the
value function.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7. MIT Press, 1995.
- Boyan and Moore1996
-
J. A. Boyan and A. W. Moore.
Learning evaluation functions for large acyclic domains.
In L. Saitta, editor, Machine Learning: Proceedings of the
Thirteenth International Conference. Morgan Kaufmann, 1996.
(To appear).
- Boyan et al. 1995
-
J. A. Boyan, A. W. Moore, and R. S. Sutton, editors.
Proceedings of the Workshop on Value Function Approximation,
Machine Learning Conference, July 1995. CMU-CS-95-206.
Web: http://www.cs.cmu.edu/~reinf/ml95/.
- Boyan1992
-
J. A. Boyan.
Modular neural networks for learning context-dependent game
strategies.
Master's thesis, Engineering Department, Cambridge University, 1992.
- Censor et al. 1988
-
Y. Censor, M. D. Altschuler, and W. D. Powlis.
A computational solution of the inverse problem in radiation-therapy
treatment planning.
Applied Mathematics and Computation, 25:57-87, 1988.
- Chao and Harper1996
-
Heng-Yi Chao and Mary P. Harper.
An efficient lower bound algorithm for channel routing.
Integration: The VLSI Journal, 1996.
(To appear).
- Christensen1986
-
J. Christensen.
Learning static evaluation functions by linear regression.
In T. Mitchell, J. Carbonell, and R. Michalski, editors, Machine
Learning: A Guide to Current Research, pages 39-42. Kluwer, Boston, 1986.
- Cohn1992
-
J. M. Cohn.
Automatic Device Placement for Analog Cells in KOAN.
PhD thesis, Carnegie Mellon University Department of Electrical and
Computer Engineering, February 1992.
- Cormen et al. 1990
-
T. H. Cormen, C. E. Leiserson, and R. L. Rivest.
Introduction to Algorithms.
MIT Press, 1990.
- Crites and Barto1996
-
R. Crites and A. Barto.
Improving elevator performance using reinforcement learning.
In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in
Neural Information Processing Systems 8, 1996.
- Dayan1992
-
P. Dayan.
The convergence of TD(λ) for general λ.
Machine Learning, 8(3/4), May 1992.
- Denardo1982
-
E. Denardo.
Dynamic Programming: Models and Applications.
Prentice-Hall, Inc., 1982.
- Fahlman and Lebiere1990
-
S. Fahlman and C. Lebiere.
The Cascade-Correlation learning architecture.
In D. Touretzky, editor, Advances in Neural Information
Processing Systems 2. Morgan Kaufmann, 1990.
- Gordon1995
-
G. Gordon.
Stable function approximation in dynamic programming.
In Proceedings of the 12th International Conference on Machine
Learning. Morgan Kaufmann, 1995.
- Harmon et al. 1995
-
M. Harmon, L. Baird, and A. H. Klopf.
Advantage updating applied to a differential game.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7. MIT Press, 1995.
- Howard1960
-
R. Howard.
Dynamic Programming and Markov Processes.
MIT Press and John Wiley & Sons, 1960.
- Kaelbling1990
-
L. P. Kaelbling.
Learning in Embedded Systems.
PhD thesis, Stanford University, Department of Computer Science,
1990.
- Lee and Mahajan1988
-
K.-F. Lee and S. Mahajan.
A pattern classification approach to evaluation function learning.
Artificial Intelligence, 36, 1988.
- Lin and Kernighan1973
-
S. Lin and B. W. Kernighan.
An effective heuristic algorithm for the traveling salesman problem.
Operations Research, 21:498-516, 1973.
- Lin1993
-
L.-J. Lin.
Reinforcement Learning for Robots Using Neural Networks.
PhD thesis, Carnegie Mellon University, 1993.
- Littman and Szepesvári1996
-
M. L. Littman and C. Szepesvári.
A generalized reinforcement-learning model: Convergence and
applications.
In L. Saitta, editor, Machine Learning: Proceedings of the
Thirteenth International Conference. Morgan Kaufmann, 1996.
(To appear).
- Moore and Schneider1996
-
A. W. Moore and J. Schneider.
Memory-based stochastic optimization.
In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors,
Advances in Neural Information Processing Systems 8. MIT Press, 1996.
- Moriarty and Miikkulainen1995
-
D. Moriarty and R. Miikkulainen.
Discovering complex Othello strategies through evolutionary neural
networks.
Connection Science, 7(3-4):195-209, 1995.
- Nilsson1980
-
N.J. Nilsson.
Principles of Artificial Intelligence.
McGraw-Hill, 1980.
- Ochotta1994
-
E. Ochotta.
Synthesis of High-Performance Analog Cells in ASTRX/OBLX.
PhD thesis, Carnegie Mellon University Department of Electrical and
Computer Engineering, April 1994.
- Pollack et al. 1996
-
J. Pollack, A. Blair, and M. Land.
Coevolution of a backgammon player.
In C.G. Langton, editor, Proceedings of Artificial Life 5. MIT
Press, 1996.
(To appear).
- Pomerleau1991
-
D. Pomerleau.
Efficient training of artificial neural networks for autonomous
navigation.
Neural Computation, 3, January 1991.
- Press et al. 1992
-
W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery.
Numerical Recipes in C: The Art of Scientific Computing.
Cambridge University Press, second edition, 1992.
- Prieditis1993
-
A. Prieditis.
Machine discovery of effective admissible heuristics.
Machine Learning, 12:117-141, 1993.
- Samuel1959
-
A. L. Samuel.
Some studies in machine learning using the game of checkers.
IBM Journal of Research and Development, 3:211-229, 1959.
- Samuel1967
-
A. L. Samuel.
Some studies in machine learning using the game of checkers
II--Recent progress.
IBM Journal of Research and Development, 11(6):601-617, 1967.
- Sutton1987
-
R. S. Sutton.
Implementation details of the TD(λ) procedure for the case
of vector predictions and backpropagation.
Technical Note TN87-509.1, GTE Laboratories, May 1987.
- Sutton1988
-
R. S. Sutton.
Learning to predict by the methods of temporal differences.
Machine Learning, 3, 1988.
- Sutton1990
-
R. S. Sutton.
Integrated architectures for learning, planning, and reacting based
on approximating dynamic programming.
In Proceedings of the Seventh International Conference on
Machine Learning. Morgan Kaufmann, 1990.
- Szykman and Cagan1995
-
S. Szykman and J. Cagan.
A simulated annealing-based approach to three-dimensional component
packing.
ASME Journal of Mechanical Design, 117, June 1995.
- Tesauro and Sejnowski1989
-
G. Tesauro and T. J. Sejnowski.
A parallel network that learns to play backgammon.
Artificial Intelligence, 39, 1989.
- Tesauro1992
-
G. Tesauro.
Practical issues in temporal difference learning.
Machine Learning, 8(3/4), May 1992.
- Tesauro1994
-
G. Tesauro.
TD-Gammon, a self-teaching backgammon program, achieves
master-level play.
Neural Computation, 6(2):215-219, 1994.
- Thrun and Schwartz1993
-
S. Thrun and A. Schwartz.
Issues in using function approximation for reinforcement learning.
In Proceedings of the Fourth Connectionist Models Summer
School, 1993.
- Tsitsiklis and Roy1996
-
J. N. Tsitsiklis and B. Van Roy.
An analysis of temporal-difference learning with function
approximation.
Technical Report LIDS-P-2322, MIT, 1996.
- Tunstall-Pedoe1991
-
W. Tunstall-Pedoe.
Genetic algorithms optimizing evaluation functions.
ICCA Journal, 14(3):119-128, 1991.
- Utgoff and Clouse1991
-
P. Utgoff and J. Clouse.
Two kinds of training information for evaluation function learning.
In Proceedings of AAAI, 1991.
- van Laarhoven1987
-
P. van Laarhoven.
Simulated Annealing: Theory and Applications.
Kluwer Academic, 1987.
- Watkins1989
-
C. Watkins.
Learning From Delayed Rewards.
PhD thesis, Cambridge University, 1989.
- Wong et al. 1988
-
D. F. Wong, H.W. Leong, and C.L. Liu.
Simulated Annealing for VLSI Design.
Kluwer Academic, 1988.
- Zhang and Dietterich1995
-
W. Zhang and T. G. Dietterich.
A reinforcement learning approach to job-shop scheduling.
In Proceedings of IJCAI-95, pages 1114-1120, 1995.
- Zhang1996
-
W. Zhang.
Reinforcement Learning for Job-Shop Scheduling.
PhD thesis, Oregon State University, 1996.
- Zweben and Fox1994
-
M. Zweben and M.S. Fox.
Scheduling and Rescheduling with Iterative Repair.
Morgan Kaufmann, 1994.
Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996