
References

Barto et al. 1989
A. Barto, R. Sutton, and C. Watkins. Learning and sequential decision making. Technical Report COINS 89-95, University of Massachusetts, 1989.

Barto et al. 1995
A. G. Barto, S. J. Bradtke, and S. P. Singh. Real-time learning and control using asynchronous dynamic programming. Artificial Intelligence, 1995.

Bellman1957
Richard Bellman. Dynamic Programming. Princeton University Press, 1957.

Bellman1978
R. Bellman. An Introduction to Artificial Intelligence: Can Computers Think? Boyd & Fraser Publishing Company, 1978.

Berry and Fristedt1985
D. A. Berry and B. Fristedt. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, 1985.

Bertsekas and Tsitsiklis1996
D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, 1996.

Bertsekas1995
D. Bertsekas. A counterexample to temporal differences learning. Neural Computation, 7:270-279, 1995.

Boyan and Moore1995
J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances In Neural Information Processing Systems 7. MIT Press, 1995.

Boyan and Moore1996
J. A. Boyan and A. W. Moore. Learning evaluation functions for large acyclic domains. In L. Saitta, editor, Machine Learning: Proceedings of the Thirteenth International Conference. Morgan Kaufmann, 1996. (To appear).

Boyan et al. 1995
J. A. Boyan, A. W. Moore, and R. S. Sutton, editors. Proceedings of the Workshop on Value Function Approximation, Machine Learning Conference, July 1995. CMU-CS-95-206. Web: http://www.cs.cmu.edu/~reinf/ml95/.

Boyan1992
J. A. Boyan. Modular neural networks for learning context-dependent game strategies. Master's thesis, Engineering Department, Cambridge University, 1992.

Censor et al. 1988
Y. Censor, M. D. Altschuler, and W. D. Powlis. A computational solution of the inverse problem in radiation-therapy treatment planning. Applied Mathematics and Computation, 25:57-87, 1988.

Chao and Harper1996
Heng-Yi Chao and Mary P. Harper. An efficient lower bound algorithm for channel routing. Integration: The VLSI Journal, 1996. (To appear).

Christensen1986
J. Christensen. Learning static evaluation functions by linear regression. In T. Mitchell, J. Carbonell, and R. Michalski, editors, Machine learning: A guide to current research, pages 39-42. Kluwer, Boston, 1986.

Cohn1992
J. M. Cohn. Automatic Device Placement for Analog Cells in KOAN. PhD thesis, Carnegie Mellon University Department of Electrical and Computer Engineering, February 1992.

Cormen et al. 1990
T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990.

Crites and Barto1996
R. Crites and A. Barto. Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8. MIT Press, 1996.

Dayan1992
P. Dayan. The convergence of TD(λ) for general λ. Machine Learning, 8(3/4), May 1992.

Denardo1982
E. Denardo. Dynamic Programming: Models and Applications. Prentice-Hall, Inc., 1982.

Fahlman and Lebiere1990
S. Fahlman and C. Lebiere. The Cascade-Correlation learning architecture. In D. Touretzky, editor, Advances in Neural Information Processing Systems 2. Morgan Kaufmann, 1990.

Gordon1995
G. Gordon. Stable function approximation in dynamic programming. In Proceedings of the 12th International Conference on Machine Learning. Morgan Kaufmann, 1995.

Harmon et al. 1995
M. Harmon, L. Baird, and A. H. Klopf. Advantage updating applied to a differential game. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances In Neural Information Processing Systems 7. MIT Press, 1995.

Howard1960
R. Howard. Dynamic Programming and Markov Processes. MIT Press and John Wiley & Sons, 1960.

Kaelbling1990
L. P. Kaelbling. Learning in Embedded Systems. PhD thesis, Stanford University, Department of Computer Science, 1990.

Lee and Mahajan1988
K.-F. Lee and S. Mahajan. A pattern classification approach to evaluation function learning. Artificial Intelligence, 36, 1988.

Lin and Kernighan1973
S. Lin and B. W. Kernighan. An effective heuristic algorithm for the traveling salesman problem. Operations Research, 21:498-516, 1973.

Lin1993
L.-J. Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon University, 1993.

Littman and Szepesvári1996
M. L. Littman and C. Szepesvári. A generalized reinforcement-learning model: Convergence and applications. In L. Saitta, editor, Machine Learning: Proceedings of the Thirteenth International Conference. Morgan Kaufmann, 1996. (To appear).

Moore and Schneider1996
A. W. Moore and J. Schneider. Memory-based stochastic optimization. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8. MIT Press, 1996.

Moriarty and Miikkulainen1995
D. Moriarty and R. Miikkulainen. Discovering complex Othello strategies through evolutionary neural networks. Connection Science, 7(3-4):195-209, 1995.

Nilsson1980
N.J. Nilsson. Principles of Artificial Intelligence. McGraw-Hill, 1980.

Ochotta1994
E. Ochotta. Synthesis of High-Performance Analog Cells in ASTRX/OBLX. PhD thesis, Carnegie Mellon University Department of Electrical and Computer Engineering, April 1994.

Pollack et al. 1996
J. Pollack, A. Blair, and M. Land. Coevolution of a backgammon player. In C. G. Langton, editor, Proceedings of Artificial Life 5. MIT Press, 1996. (To appear).

Pomerleau1991
D. Pomerleau. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3, January 1991.

Press et al. 1992
W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, second edition, 1992.

Prieditis1993
A. Prieditis. Machine discovery of effective admissible heuristics. Machine Learning, 12:117-141, 1993.

Samuel1959
A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3:211-229, 1959.

Samuel1967
A. L. Samuel. Some studies in machine learning using the game of checkers II--Recent progress. IBM Journal of Research and Development, 11(6):601-617, 1967.

Sutton1987
R. S. Sutton. Implementation details of the TD(λ) procedure for the case of vector predictions and backpropagation. Technical Note TN87-509.1, GTE Laboratories, May 1987.

Sutton1988
R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3, 1988.

Sutton1990
R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning. Morgan Kaufmann, 1990.

Szykman and Cagan1995
S. Szykman and J. Cagan. A simulated annealing-based approach to three-dimensional component packing. ASME Journal of Mechanical Design, 117, June 1995.

Tesauro and Sejnowski1989
G. Tesauro and T. J. Sejnowski. A parallel network that learns to play backgammon. Artificial Intelligence, 39, 1989.

Tesauro1992
G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8(3/4), May 1992.

Tesauro1994
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.

Thrun and Schwartz1993
S. Thrun and A. Schwartz. Issues in using function approximation for reinforcement learning. In Proceedings of the Fourth Connectionist Models Summer School, 1993.

Tsitsiklis and Van Roy1996
J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. Technical Report LIDS-P-2322, MIT, 1996.

Tunstall-Pedoe1991
W. Tunstall-Pedoe. Genetic algorithms optimizing evaluation functions. ICCA Journal, 14(3):119-128, 1991.

Utgoff and Clouse1991
P. Utgoff and J. Clouse. Two kinds of training information for evaluation function learning. In Proceedings of AAAI, 1991.

van Laarhoven1987
P. van Laarhoven. Simulated Annealing: Theory and Applications. Kluwer Academic, 1987.

Watkins1989
C. Watkins. Learning From Delayed Rewards. PhD thesis, Cambridge University, 1989.

Wong et al. 1988
D. F. Wong, H.W. Leong, and C.L. Liu. Simulated Annealing for VLSI Design. Kluwer Academic, 1988.

Zhang and Dietterich1995
W. Zhang and T. G. Dietterich. A reinforcement learning approach to job-shop scheduling. In Proceedings of IJCAI-95, pages 1114-1120, 1995.

Zhang1996
W. Zhang. Reinforcement Learning for Job-Shop Scheduling. PhD thesis, Oregon State University, 1996.

Zweben and Fox1994
M. Zweben and M.S. Fox. Scheduling and Rescheduling with Iterative Repair. Morgan Kaufmann, 1994.



Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996