
References

Barto et al. 1989
A. Barto, R. Sutton, and C. Watkins. Learning and sequential decision making. Technical Report COINS 89-95, University of Massachusetts, 1989.

Barto et al. 1995
A. G. Barto, S. J. Bradtke, and S. P. Singh. Real-time learning and control using asynchronous dynamic programming. Artificial Intelligence, 1995.

Bellman1957
Richard Bellman. Dynamic Programming. Princeton University Press, 1957.

Bellman1978
R. Bellman. An Introduction to Artificial Intelligence: Can Computers Think? Boyd & Fraser Publishing Company, 1978.

Berry and Fristedt1985
D. A. Berry and B. Fristedt. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, 1985.

Bertsekas and Tsitsiklis1996
D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, 1996.

Bertsekas1995
D. Bertsekas. A counterexample to temporal differences learning. Neural Computation, 7:270-279, 1995.

Boyan and Moore1995
J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances In Neural Information Processing Systems 7. MIT Press, 1995.

Boyan and Moore1996
J. A. Boyan and A. W. Moore. Learning evaluation functions for large acyclic domains. In L. Saitta, editor, Machine Learning: Proceedings of the Thirteenth International Conference. Morgan Kaufmann, 1996. (To appear).

Boyan et al. 1995
J. A. Boyan, A. W. Moore, and R. S. Sutton, editors. Proceedings of the Workshop on Value Function Approximation, Machine Learning Conference, July 1995. CMU-CS-95-206. Web: http://www.cs.cmu.edu/~reinf/ml95/.

Boyan1992
J. A. Boyan. Modular neural networks for learning context-dependent game strategies. Master's thesis, Engineering Department, Cambridge University, 1992.

Censor et al. 1988
Y. Censor, M. D. Altschuler, and W. D. Powlis. A computational solution of the inverse problem in radiation-therapy treatment planning. Applied Mathematics and Computation, 25:57-87, 1988.

Chao and Harper1996
Heng-Yi Chao and Mary P. Harper. An efficient lower bound algorithm for channel routing. Integration: The VLSI Journal, 1996. (To appear).

Christensen1986
J. Christensen. Learning static evaluation functions by linear regression. In T. Mitchell, J. Carbonell, and R. Michalski, editors, Machine learning: A guide to current research, pages 39-42. Kluwer, Boston, 1986.

Cohn1992
J. M. Cohn. Automatic Device Placement for Analog Cells in KOAN. PhD thesis, Carnegie Mellon University Department of Electrical and Computer Engineering, February 1992.

Cormen et al. 1990
T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990.

Crites and Barto1996
R. Crites and A. Barto. Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8. MIT Press, 1996.

Dayan1992
P. Dayan. The convergence of TD(λ) for general λ. Machine Learning, 8(3/4), May 1992.

Denardo1982
E. Denardo. Dynamic Programming: Models and Applications. Prentice-Hall, Inc., 1982.

Fahlman and Lebiere1990
S. Fahlman and C. Lebiere. The Cascade-Correlation learning architecture. In D. Touretzky, editor, Advances in Neural Information Processing Systems 2. Morgan Kaufmann, 1990.

Gordon1995
G. Gordon. Stable function approximation in dynamic programming. In Proceedings of the 12th International Conference on Machine Learning. Morgan Kaufmann, 1995.

Harmon et al. 1995
M. Harmon, L. Baird, and A. H. Klopf. Advantage updating applied to a differential game. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances In Neural Information Processing Systems 7. MIT Press, 1995.

Howard1960
R. Howard. Dynamic Programming and Markov Processes. MIT Press and John Wiley & Sons, 1960.

Kaelbling1990
L. P. Kaelbling. Learning in Embedded Systems. PhD thesis, Stanford University, Department of Computer Science, 1990.

Lee and Mahajan1988
K.-F. Lee and S. Mahajan. A pattern classification approach to evaluation function learning. Artificial Intelligence, 36, 1988.

Lin and Kernighan1973
S. Lin and B. W. Kernighan. An effective heuristic algorithm for the traveling salesman problem. Operations Research, 21:498-516, 1973.

Lin1993
L.-J. Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon University, 1993.

Littman and Szepesvári1996
M. L. Littman and C. Szepesvári. A generalized reinforcement-learning model: Convergence and applications. In L. Saitta, editor, Machine Learning: Proceedings of the Thirteenth International Conference. Morgan Kaufmann, 1996. (To appear).

Moore and Schneider1996
A. W. Moore and J. Schneider. Memory-based stochastic optimization. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8. MIT Press, 1996.

Moriarty and Miikkulainen1995
D. Moriarty and R. Miikkulainen. Discovering complex Othello strategies through evolutionary neural networks. Connection Science, 7(3-4):195-209, 1995.

Nilsson1980
N.J. Nilsson. Principles of Artificial Intelligence. McGraw-Hill, 1980.

Ochotta1994
E. Ochotta. Synthesis of High-Performance Analog Cells in ASTRX/OBLX. PhD thesis, Carnegie Mellon University Department of Electrical and Computer Engineering, April 1994.

Pollack et al. 1996
J. Pollack, A. Blair, and M. Land. Coevolution of a backgammon player. In C. G. Langton, editor, Proceedings of Artificial Life 5. MIT Press, 1996. (To appear).

Pomerleau1991
D. Pomerleau. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3, January 1991.

Press et al. 1992
W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, second edition, 1992.

Prieditis1993
A. Prieditis. Machine discovery of effective admissible heuristics. Machine Learning, 12:117-141, 1993.

Samuel1959
A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3:211-229, 1959.

Samuel1967
A. L. Samuel. Some studies in machine learning using the game of checkers II--Recent progress. IBM Journal of Research and Development, 11(6):601-617, 1967.

Sutton1987
R. S. Sutton. Implementation details of the TD(λ) procedure for the case of vector predictions and backpropagation. Technical Note TN87-509.1, GTE Laboratories, May 1987.

Sutton1988
R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3, 1988.

Sutton1990
R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning. Morgan Kaufmann, 1990.

Szykman and Cagan1995
S. Szykman and J. Cagan. A simulated annealing-based approach to three-dimensional component packing. ASME Journal of Mechanical Design, 117, June 1995.

Tesauro and Sejnowski1989
G. Tesauro and T. J. Sejnowski. A parallel network that learns to play backgammon. Artificial Intelligence, 39, 1989.

Tesauro1992
G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8(3/4), May 1992.

Tesauro1994
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.

Thrun and Schwartz1993
S. Thrun and A. Schwartz. Issues in using function approximation for reinforcement learning. In Proceedings of the Fourth Connectionist Models Summer School, 1993.

Tsitsiklis and Van Roy1996
J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. Technical Report LIDS-P-2322, MIT, 1996.

Tunstall-Pedoe1991
W. Tunstall-Pedoe. Genetic algorithms optimizing evaluation functions. ICCA Journal, 14(3):119-128, 1991.

Utgoff and Clouse1991
P. Utgoff and J. Clouse. Two kinds of training information for evaluation function learning. In Proceedings of AAAI, 1991.

van Laarhoven1987
P. van Laarhoven. Simulated Annealing: Theory and Applications. Kluwer Academic, 1987.

Watkins1989
C. Watkins. Learning From Delayed Rewards. PhD thesis, Cambridge University, 1989.

Wong et al. 1988
D. F. Wong, H.W. Leong, and C.L. Liu. Simulated Annealing for VLSI Design. Kluwer Academic, 1988.

Zhang and Dietterich1995
W. Zhang and T. G. Dietterich. A reinforcement learning approach to job-shop scheduling. In Proceedings of IJCAI-95, pages 1114-1120, 1995.

Zhang1996
W. Zhang. Reinforcement Learning for Job-Shop Scheduling. PhD thesis, Oregon State University, 1996.

Zweben and Fox1994
M. Zweben and M.S. Fox. Scheduling and Rescheduling with Iterative Repair. Morgan Kaufmann, 1994.



Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996