References
[1]
David H. Ackley and Michael L. Littman.
Generalization and scaling in reinforcement learning.
In D. S. Touretzky, editor, Advances in Neural Information
Processing Systems 2, pages 550-557, San Mateo, CA, 1990. Morgan Kaufmann.
[2]
J. S. Albus.
A new approach to manipulator control: Cerebellar model articulation
controller (CMAC).
Journal of Dynamic Systems, Measurement and Control,
97:220-227, 1975.
[3]
James S. Albus.
Brains, Behavior, and Robotics.
BYTE Books, Subsidiary of McGraw-Hill, Peterborough, New Hampshire,
1981.
[4]
Charles W. Anderson.
Learning and Problem Solving with Multilayer Connectionist
Systems.
PhD thesis, University of Massachusetts, Amherst, MA, 1986.
[5]
Rachita (Ronny) Ashar.
Hierarchical learning in stochastic domains.
Master's thesis, Brown University, Providence, Rhode Island, 1994.
[6]
Leemon Baird.
Residual algorithms: Reinforcement learning with function
approximation.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 30-37, San
Francisco, CA, 1995. Morgan Kaufmann.
[7]
Leemon C. Baird and A. H. Klopf.
Reinforcement learning with high-dimensional, continuous actions.
Technical Report WL-TR-93-1147, Wright Laboratory, Wright-Patterson Air
Force Base, OH, 1993.
[8]
Andrew G. Barto, S. J. Bradtke, and Satinder P. Singh.
Learning to act using real-time dynamic programming.
Artificial Intelligence, 72(1):81-138, 1995.
[9]
Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson.
Neuronlike adaptive elements that can solve difficult learning
control problems.
IEEE Transactions on Systems, Man, and Cybernetics,
SMC-13(5):834-846, 1983.
[10]
Richard Bellman.
Dynamic Programming.
Princeton University Press, Princeton, NJ, 1957.
[11]
Hamid R. Berenji.
Artificial neural networks and approximate reasoning for intelligent
control in space.
In American Control Conference, pages 1075-1080, 1991.
[12]
Donald A. Berry and Bert Fristedt.
Bandit Problems: Sequential Allocation of Experiments.
Chapman and Hall, London, UK, 1985.
[13]
Dimitri P. Bertsekas.
Dynamic Programming: Deterministic and Stochastic Models.
Prentice-Hall, Englewood Cliffs, NJ, 1987.
[14]
Dimitri P. Bertsekas.
Dynamic Programming and Optimal Control.
Athena Scientific, Belmont, Massachusetts, 1995.
Volumes 1 and 2.
[15]
Dimitri P. Bertsekas and D. A. Castañon.
Adaptive aggregation for infinite horizon dynamic programming.
IEEE Transactions on Automatic Control, 34(6):589-598, 1989.
[16]
Dimitri P. Bertsekas and John N. Tsitsiklis.
Parallel and Distributed Computation: Numerical Methods.
Prentice-Hall, Englewood Cliffs, NJ, 1989.
[17]
G. E. P. Box and N. R. Draper.
Empirical Model-Building and Response Surfaces.
Wiley, 1987.
[18]
Justin A. Boyan and Andrew W. Moore.
Generalization in reinforcement learning: Safely approximating the
value function.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995.
The MIT Press.
[19]
D. Burghes and A. Graham.
Introduction to Control Theory including Optimal Control.
Ellis Horwood, 1980.
[20]
Anthony R. Cassandra, Leslie Pack Kaelbling, and Michael L. Littman.
Acting optimally in partially observable stochastic domains.
In Proceedings of the Twelfth National Conference on Artificial
Intelligence, Seattle, WA, 1994.
[21]
David Chapman and Leslie Pack Kaelbling.
Input generalization in delayed reinforcement learning: An
algorithm and performance comparisons.
In Proceedings of the International Joint Conference on
Artificial Intelligence, Sydney, Australia, 1991.
[22]
Lonnie Chrisman.
Reinforcement learning with perceptual aliasing: The perceptual
distinctions approach.
In Proceedings of the Tenth National Conference on Artificial
Intelligence, pages 183-188, San Jose, CA, 1992. AAAI Press.
[23]
Lonnie Chrisman and Michael Littman.
Hidden state and short-term memory, 1993.
Presentation at Reinforcement Learning Workshop, Machine Learning
Conference.
[24]
Pawel Cichosz and Jan J. Mulawka.
Fast and efficient reinforcement learning with truncated temporal
differences.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 99-107, San
Francisco, CA, 1995. Morgan Kaufmann.
[25]
W. S. Cleveland and S. J. Devlin.
Locally weighted regression: An approach to regression analysis by
local fitting.
Journal of the American Statistical Association,
83(403):596-610, September 1988.
[26]
Dave Cliff and Susi Ross.
Adding temporary memory to ZCS.
Adaptive Behavior, 3(2):101-150, 1994.
[27]
Anne Condon.
The complexity of stochastic games.
Information and Computation, 96(2):203-224, February 1992.
[28]
Jonathan Connell and Sridhar Mahadevan.
Rapid task learning for real robots.
In Robot Learning. Kluwer Academic Publishers, 1993.
[29]
R. H. Crites and A. G. Barto.
Improving elevator performance using reinforcement learning.
In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in
Neural Information Processing Systems 8, 1996.
[30]
Peter Dayan.
The convergence of TD(λ) for general λ.
Machine Learning, 8(3):341-362, 1992.
[31]
Peter Dayan and Geoffrey E. Hinton.
Feudal reinforcement learning.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances
in Neural Information Processing Systems 5, San Mateo, CA, 1993. Morgan
Kaufmann.
[32]
Peter Dayan and Terrence J. Sejnowski.
TD(λ) converges with probability 1.
Machine Learning, 14(3), 1994.
[33]
Thomas Dean, Leslie Pack Kaelbling, Jak Kirman, and Ann Nicholson.
Planning with deadlines in stochastic domains.
In Proceedings of the Eleventh National Conference on Artificial
Intelligence, Washington, DC, 1993.
[34]
F. D'Epenoux.
A probabilistic production and inventory problem.
Management Science, 10:98-108, 1963.
[35]
Cyrus Derman.
Finite State Markovian Decision Processes.
Academic Press, New York, 1970.
[36]
M. Dorigo and H. Bersini.
A comparison of Q-learning and classifier systems.
In From Animals to Animats: Proceedings of the Third
International Conference on the Simulation of Adaptive Behavior, Brighton,
UK, 1994.
[37]
M. Dorigo and M. Colombetti.
Robot shaping: Developing autonomous agents through learning.
Artificial Intelligence, 71(2):321-370, December 1994.
[38]
Marco Dorigo.
Alecsys and the AutonoMouse: Learning to control a real robot by
distributed classifier systems.
Machine Learning, 19, 1995.
[39]
Claude-Nicolas Fiechter.
Efficient reinforcement learning.
In Proceedings of the Seventh Annual ACM Conference on
Computational Learning Theory, pages 88-97. Association for Computing
Machinery, 1994.
[40]
J. C. Gittins.
Multi-armed Bandit Allocation Indices.
Wiley-Interscience series in systems and optimization. Wiley,
Chichester, NY, 1989.
[41]
D. Goldberg.
Genetic Algorithms in Search, Optimization, and Machine
Learning.
Addison-Wesley, MA, 1989.
[42]
Geoffrey J. Gordon.
Stable function approximation in dynamic programming.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 261-268,
San Francisco, CA, 1995. Morgan Kaufmann.
[43]
Vijay Gullapalli.
A stochastic reinforcement learning algorithm for learning
real-valued functions.
Neural Networks, 3:671-692, 1990.
[44]
Vijay Gullapalli.
Reinforcement learning and its application to control.
PhD thesis, University of Massachusetts, Amherst, MA, 1992.
[45]
Ernest R. Hilgard and Gordon H. Bower.
Theories of Learning.
Prentice-Hall, Englewood Cliffs, NJ, fourth edition, 1975.
[46]
A. J. Hoffman and R. M. Karp.
On nonterminating stochastic games.
Management Science, 12:359-370, 1966.
[47]
John H. Holland.
Adaptation in Natural and Artificial Systems.
University of Michigan Press, Ann Arbor, MI, 1975.
[48]
Ronald A. Howard.
Dynamic Programming and Markov Processes.
The MIT Press, Cambridge, MA, 1960.
[49]
Tommi Jaakkola, Michael I. Jordan, and Satinder P. Singh.
On the convergence of stochastic iterative dynamic programming
algorithms.
Neural Computation, 6(6), November 1994.
[50]
Tommi Jaakkola, Satinder Pal Singh, and Michael I. Jordan.
Monte-Carlo reinforcement learning in non-Markovian decision
problems.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995.
The MIT Press.
[51]
Leslie Pack Kaelbling.
Hierarchical learning in stochastic domains: Preliminary results.
In Proceedings of the Tenth International Conference on Machine
Learning, Amherst, MA, 1993. Morgan Kaufmann.
[52]
Leslie Pack Kaelbling.
Learning in Embedded Systems.
The MIT Press, Cambridge, MA, 1993.
[53]
Leslie Pack Kaelbling.
Associative reinforcement learning: A generate and test algorithm.
Machine Learning, 15(3), 1994.
[54]
Leslie Pack Kaelbling.
Associative reinforcement learning: Functions in k-DNF.
Machine Learning, 15(3), 1994.
[55]
Jak Kirman.
Predicting Real-Time Planner Performance by Domain
Characterization.
PhD thesis, Department of Computer Science, Brown University, 1994.
[56]
Sven Koenig and Reid G. Simmons.
Complexity analysis of real-time reinforcement learning.
In Proceedings of the Eleventh National Conference on Artificial
Intelligence, pages 99-105, Menlo Park, California, 1993. AAAI Press/MIT
Press.
[57]
P. R. Kumar and P. P. Varaiya.
Stochastic Systems: Estimation, Identification, and Adaptive
Control.
Prentice Hall, Englewood Cliffs, New Jersey, 1986.
[58]
C. C. Lee.
A self learning rule-based controller employing approximate reasoning
and neural net concepts.
International Journal of Intelligent Systems, 6(1):71-93,
1991.
[59]
Long-Ji Lin.
Programming robots using reinforcement learning and teaching.
In Proceedings of the Ninth National Conference on Artificial
Intelligence, 1991.
[60]
Long-Ji Lin.
Hierarchical learning of robot skills by reinforcement.
In Proceedings of the International Conference on Neural
Networks, 1993.
[61]
Long-Ji Lin.
Reinforcement Learning for Robots Using Neural Networks.
PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 1993.
[62]
Long-Ji Lin and Tom M. Mitchell.
Memory approaches to reinforcement learning in non-Markovian
domains.
Technical Report CMU-CS-92-138, Carnegie Mellon University, School of
Computer Science, May 1992.
[63]
Michael L. Littman.
Markov games as a framework for multi-agent reinforcement learning.
In Proceedings of the Eleventh International Conference on
Machine Learning, pages 157-163, San Francisco, CA, 1994. Morgan Kaufmann.
[64]
Michael L. Littman.
Memoryless policies: Theoretical limitations and practical results.
In Dave Cliff, Philip Husbands, Jean-Arcady Meyer, and Stewart W.
Wilson, editors, From Animals to Animats 3: Proceedings of the Third
International Conference on Simulation of Adaptive Behavior, Cambridge, MA,
1994. The MIT Press.
[65]
Michael L. Littman, Anthony Cassandra, and Leslie Pack Kaelbling.
Learning policies for partially observable environments: Scaling
up.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 362-370,
San Francisco, CA, 1995. Morgan Kaufmann.
[66]
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling.
On the complexity of solving Markov decision problems.
In Proceedings of the Eleventh Annual Conference on Uncertainty
in Artificial Intelligence (UAI-95), Montreal, Québec, Canada, 1995.
[67]
William S. Lovejoy.
A survey of algorithmic methods for partially observable Markov
decision processes.
Annals of Operations Research, 28:47-66, 1991.
[68]
Pattie Maes and Rodney A. Brooks.
Learning to coordinate behaviors.
In Proceedings Eighth National Conference on Artificial
Intelligence, pages 796-802. Morgan Kaufmann, 1990.
[69]
Sridhar Mahadevan.
To discount or not to discount in reinforcement learning: A case
study comparing R learning and Q learning.
In Proceedings of the Eleventh International Conference on
Machine Learning, pages 164-172, San Francisco, CA, 1994. Morgan Kaufmann.
[70]
Sridhar Mahadevan.
Average reward reinforcement learning: Foundations, algorithms, and
empirical results.
Machine Learning, 22(1), 1996.
[71]
Sridhar Mahadevan and Jonathan Connell.
Automatic programming of behavior-based robots using reinforcement
learning.
In Proceedings of the Ninth National Conference on Artificial
Intelligence, Anaheim, CA, 1991.
[72]
Sridhar Mahadevan and Jonathan Connell.
Scaling reinforcement learning to robotics by exploiting the
subsumption architecture.
In Proceedings of the Eighth International Workshop on Machine
Learning, pages 328-332, 1991.
[73]
Maja J. Mataric.
Reward functions for accelerated learning.
In W. W. Cohen and H. Hirsh, editors, Proceedings of the
Eleventh International Conference on Machine Learning. Morgan Kaufmann,
1994.
[74]
Andrew Kachites McCallum.
Reinforcement Learning with Selective Perception and Hidden
State.
PhD thesis, Department of Computer Science, University of Rochester,
December 1995.
[75]
R. Andrew McCallum.
Overcoming incomplete perception with utile distinction memory.
In Proceedings of the Tenth International Conference on Machine
Learning, pages 190-196, Amherst, Massachusetts, 1993. Morgan Kaufmann.
[76]
R. Andrew McCallum.
Instance-based utile distinctions for reinforcement learning with
hidden state.
In Proceedings of the Twelfth International Conference on Machine
Learning, pages 387-395, San Francisco, CA, 1995. Morgan Kaufmann.
[77]
Lisa Meeden, G. McGraw, and D. Blank.
Emergent control and planning in an autonomous vehicle.
In D. S. Touretzky, editor, Proceedings of the Fifteenth Annual
Meeting of the Cognitive Science Society, pages 735-740. Lawrence Erlbaum
Associates, Hillsdale, NJ, 1993.
[78]
Jose del R. Millan.
Rapid, safe, and incremental learning of navigation strategies.
IEEE Transactions on Systems, Man, and Cybernetics, 26(3),
1996.
[79]
George E. Monahan.
A survey of partially observable Markov decision processes:
Theory, models, and algorithms.
Management Science, 28:1-16, January 1982.
[80]
Andrew W. Moore.
Variable resolution dynamic programming: Efficiently learning
action maps in multivariate real-valued spaces.
In Proceedings of the Eighth International Workshop on Machine
Learning, 1991.
[81]
Andrew W. Moore.
The parti-game algorithm for variable resolution reinforcement
learning in multidimensional state-spaces.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances
in Neural Information Processing Systems 6, pages 711-718, San Mateo, CA,
1994. Morgan Kaufmann.
[82]
Andrew W. Moore and Christopher G. Atkeson.
An investigation of memory-based function approximators for learning
control.
Technical report, MIT Artificial Intelligence Laboratory, Cambridge,
MA, 1992.
[83]
Andrew W. Moore and Christopher G. Atkeson.
Prioritized sweeping: Reinforcement learning with less data and
less real time.
Machine Learning, 13, 1993.
[84]
Andrew W. Moore, Christopher G. Atkeson, and S. Schaal.
Memory-based learning for control.
Technical Report CMU-RI-TR-95-18, CMU Robotics Institute, 1995.
[85]
Kumpati Narendra and M. A. L. Thathachar.
Learning Automata: An Introduction.
Prentice-Hall, Englewood Cliffs, NJ, 1989.
[86]
Kumpati S. Narendra and M. A. L. Thathachar.
Learning automata--a survey.
IEEE Transactions on Systems, Man, and Cybernetics,
4(4):323-334, July 1974.
[87]
Jing Peng and Ronald J. Williams.
Efficient learning and planning within the Dyna framework.
Adaptive Behavior, 1(4):437-454, 1993.
[88]
Jing Peng and Ronald J. Williams.
Incremental multi-step Q-learning.
In Proceedings of the Eleventh International Conference on
Machine Learning, pages 226-232, San Francisco, CA, 1994. Morgan Kaufmann.
[89]
Dean A. Pomerleau.
Neural Network Perception for Mobile Robot Guidance.
Kluwer Academic Publishers, 1993.
[90]
Martin L. Puterman.
Markov Decision Processes--Discrete Stochastic Dynamic
Programming.
John Wiley & Sons, Inc., New York, NY, 1994.
[91]
Martin L. Puterman and Moon Chirl Shin.
Modified policy iteration algorithms for discounted Markov decision
processes.
Management Science, 24:1127-1137, 1978.
[92]
M. B. Ring.
Continual Learning in Reinforcement Environments.
PhD thesis, University of Texas at Austin, Austin, Texas, August
1994.
[93]
Ulrich Rüde.
Mathematical and Computational Techniques for Multilevel
Adaptive Methods.
Society for Industrial and Applied Mathematics, Philadelphia,
Pennsylvania, 1993.
[94]
D. E. Rumelhart and J. L. McClelland, editors.
Parallel Distributed Processing: Explorations in the
Microstructure of Cognition. Volume 1: Foundations.
The MIT Press, Cambridge, MA, 1986.
[95]
G. A. Rummery and M. Niranjan.
On-line Q-learning using connectionist systems.
Technical Report CUED/F-INFENG/TR166, Cambridge University, 1994.
[96]
John Rust.
Numerical dynamic programming in economics.
In Handbook of Computational Economics. Elsevier, North
Holland, 1996.
[97]
A. P. Sage and C. C. White.
Optimum Systems Control.
Prentice Hall, 1977.
[98]
Marcos Salganicoff and Lyle H. Ungar.
Active exploration and learning in real-valued spaces using
multi-armed bandit allocation indices.
In Armand Prieditis and Stuart Russell, editors, Proceedings of
the Twelfth International Conference on Machine Learning, pages 480-487,
San Francisco, CA, 1995. Morgan Kaufmann.
[99]
A. L. Samuel.
Some studies in machine learning using the game of checkers.
IBM Journal of Research and Development, 3:211-229, 1959.
Reprinted in E. A. Feigenbaum and J. Feldman, editors, Computers
and Thought, McGraw-Hill, New York 1963.
[100]
S. Schaal and Christopher Atkeson.
Robot juggling: An implementation of memory-based learning.
Control Systems Magazine, 14, 1994.
[101]
J. Schmidhuber.
A general method for multi-agent learning and incremental
self-improvement in unrestricted environments.
In X. Yao, editor, Evolutionary Computation: Theory and
Applications. Scientific Publ. Co., Singapore, 1996.
[102]
J. H. Schmidhuber.
Curious model-building control systems.
In Proceedings of the International Joint Conference on Neural Networks,
Singapore, volume 2, pages 1458-1463. IEEE, 1991.
[103]
Jürgen H. Schmidhuber.
Reinforcement learning in Markovian and non-Markovian
environments.
In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors,
Advances in Neural Information Processing Systems 3, pages 500-506, San
Mateo, CA, 1991. Morgan Kaufmann.
[104]
Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski.
Temporal difference learning of position evaluation in the game of
Go.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances
in Neural Information Processing Systems 6, pages 817-824, San Mateo, CA,
1994. Morgan Kaufmann.
[105]
Alexander Schrijver.
Theory of Linear and Integer Programming.
Wiley-Interscience, New York, NY, 1986.
[106]
Anton Schwartz.
A reinforcement learning method for maximizing undiscounted rewards.
In Proceedings of the Tenth International Conference on Machine
Learning, pages 298-305, Amherst, Massachusetts, 1993. Morgan Kaufmann.
[107]
Satinder P. Singh, Andrew G. Barto, Roderic Grupen, and Christopher Connolly.
Robust reinforcement learning in motion planning.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances
in Neural Information Processing Systems 6, pages 655-662, San Mateo, CA,
1994. Morgan Kaufmann.
[108]
Satinder P. Singh and Richard S. Sutton.
Reinforcement learning with replacing eligibility traces.
Machine Learning, 22(1), 1996.
[109]
Satinder Pal Singh.
Reinforcement learning with a hierarchy of abstract models.
In Proceedings of the Tenth National Conference on Artificial
Intelligence, pages 202-207, San Jose, CA, 1992. AAAI Press.
[110]
Satinder Pal Singh.
Transfer of learning by composing solutions of elemental sequential
tasks.
Machine Learning, 8(3):323-340, 1992.
[111]
Satinder Pal Singh.
Learning to Solve Markovian Decision Processes.
PhD thesis, Department of Computer Science, University of
Massachusetts, 1993.
Also, CMPSCI Technical Report 93-77.
[112]
Robert F. Stengel.
Stochastic Optimal Control.
John Wiley and Sons, 1986.
[113]
R. S. Sutton.
Generalization in reinforcement learning: Successful examples using
sparse coarse coding.
In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in
Neural Information Processing Systems 8, 1996.
[114]
Richard S. Sutton.
Temporal Credit Assignment in Reinforcement Learning.
PhD thesis, University of Massachusetts, Amherst, MA, 1984.
[115]
Richard S. Sutton.
Learning to predict by the method of temporal differences.
Machine Learning, 3(1):9-44, 1988.
[116]
Richard S. Sutton.
Integrated architectures for learning, planning, and reacting based
on approximating dynamic programming.
In Proceedings of the Seventh International Conference on
Machine Learning, Austin, TX, 1990. Morgan Kaufmann.
[117]
Richard S. Sutton.
Planning by incremental dynamic programming.
In Proceedings of the Eighth International Workshop on Machine
Learning, pages 353-357. Morgan Kaufmann, 1991.
[118]
Gerald Tesauro.
Practical issues in temporal difference learning.
Machine Learning, 8:257-277, 1992.
[119]
Gerald Tesauro.
TD-Gammon, a self-teaching backgammon program, achieves
master-level play.
Neural Computation, 6(2):215-219, 1994.
[120]
Gerald Tesauro.
Temporal difference learning and TD-Gammon.
Communications of the ACM, 38(3):58-67, March 1995.
[121]
C-K. Tham and R. W. Prager.
A modular Q-learning architecture for manipulator task decomposition.
In Proceedings of the Eleventh International Conference on
Machine Learning, San Francisco, CA, 1994. Morgan Kaufmann.
[122]
Sebastian Thrun.
Learning to play the game of chess.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995.
The MIT Press.
[123]
Sebastian Thrun and Anton Schwartz.
Issues in using function approximation for reinforcement learning.
In M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend,
editors, Proceedings of the 1993 Connectionist Models Summer School,
Hillsdale, NJ, 1993. Lawrence Erlbaum.
[124]
Sebastian B. Thrun.
The role of exploration in learning control.
In David A. White and Donald A. Sofge, editors, Handbook of
Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand
Reinhold, New York, NY, 1992.
[125]
John N. Tsitsiklis.
Asynchronous stochastic approximation and Q-learning.
Machine Learning, 16(3), September 1994.
[126]
John N. Tsitsiklis and Ben Van Roy.
Feature-based methods for large scale dynamic programming.
Machine Learning, 22(1), 1996.
[127]
L. G. Valiant.
A theory of the learnable.
Communications of the ACM, 27(11):1134-1142, November 1984.
[128]
Christopher J. C. H. Watkins.
Learning from Delayed Rewards.
PhD thesis, King's College, Cambridge, UK, 1989.
[129]
Christopher J. C. H. Watkins and Peter Dayan.
Q-learning.
Machine Learning, 8(3):279-292, 1992.
[130]
Steven D. Whitehead.
Complexity and cooperation in Q-learning.
In Proceedings of the Eighth International Workshop on Machine
Learning, Evanston, IL, 1991. Morgan Kaufmann.
[131]
Ronald J. Williams.
A class of gradient-estimating algorithms for reinforcement learning
in neural networks.
In Proceedings of the IEEE First International Conference on
Neural Networks, San Diego, CA, 1987.
[132]
Ronald J. Williams.
Simple statistical gradient-following algorithms for connectionist
reinforcement learning.
Machine Learning, 8(3):229-256, 1992.
[133]
Ronald J. Williams and Leemon C. Baird, III.
Analysis of some incremental variants of policy iteration: First
steps toward understanding actor-critic learning systems.
Technical Report NU-CCS-93-11, Northeastern University, College of
Computer Science, Boston, MA, September 1993.
[134]
Ronald J. Williams and Leemon C. Baird, III.
Tight performance bounds on greedy policies based on imperfect value
functions.
Technical Report NU-CCS-93-14, Northeastern University, College of
Computer Science, Boston, MA, November 1993.
[135]
Stewart Wilson.
Classifier fitness based on accuracy.
Evolutionary Computation, 3(2):147-173, 1995.
[136]
W. Zhang and T. G. Dietterich.
A reinforcement learning approach to job-shop scheduling.
In Proceedings of the International Joint Conference on
Artificial Intelligence, 1995.