15-887*: Planning, Execution, and Learning
Schedule, Notes, Readings
Fall 2016
- Wednesday, September 7: (LV) Introduction
- Monday, September 12: (V) Representation for higher-level planning
- Lecture Notes
- Points to remember from the discussion in class:
- There are many ways to represent the states and actions of a problem,
and the choice significantly affects the expressiveness and complexity of planning.
- Closed-world assumption: what is not explicitly true in the state is assumed to be false.
- STRIPS action representation: preconditions, add lists, and delete lists (a small sketch follows this list).
- There are many ways to find solutions to planning problems; they differ
in complexity and in how well they scale with the size of the search space.
- Humans use real-world knowledge when they solve planning problems;
most planners do not have that luxury.
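A minimal sketch of the STRIPS-style representation under the closed-world assumption; the blocks-world facts and the pickup action are illustrative, not taken from the lecture:

```python
# A state is a set of true facts (closed-world assumption: anything not in
# the set is false); an action has preconditions, an add list, and a delete list.
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]
    add_list: FrozenSet[str]
    delete_list: FrozenSet[str]

def applicable(state: FrozenSet[str], action: Action) -> bool:
    return action.preconditions <= state

def apply_action(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    assert applicable(state, action)
    return (state - action.delete_list) | action.add_list

# Illustrative blocks-world action: pick up block A from the table.
pickup_a = Action(
    name="pickup(A)",
    preconditions=frozenset({"on-table(A)", "clear(A)", "hand-empty"}),
    add_list=frozenset({"holding(A)"}),
    delete_list=frozenset({"on-table(A)", "clear(A)", "hand-empty"}),
)

state = frozenset({"on-table(A)", "clear(A)", "hand-empty", "on-table(B)", "clear(B)"})
print(apply_action(state, pickup_a))   # "holding(A)" becomes true; deleted facts become false
```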
- Readings:
- Wednesday, September 14: (V) Algorithms for classical planning
- Lecture Notes
- Points to remember from the discussion in class:
- Planners are characterized in terms of soundness, completeness,
optimality, and computational efficiency.
- Linear Planning: work on one goal until it is completely solved, and only then move to another goal (a toy sketch follows this list).
- GPS Algorithm: sound, but neither optimal nor complete.
- Prodigy Algorithm: a nonlinear planner; sound.
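A toy sketch of the linear-planning strategy in a made-up STRIPS-like domain (the facts and action names are illustrative); it achieves goals strictly one at a time, which is simple but inherits the known ordering problems of linear planning:

```python
from collections import deque

# Each action is (name, preconditions, add list, delete list); a state is a frozenset of facts.
ACTIONS = [
    ("pickup(A)",  {"hand-empty", "clear(A)"}, {"holding(A)"},            {"hand-empty"}),
    ("stack(A,B)", {"holding(A)", "clear(B)"}, {"on(A,B)", "hand-empty"}, {"holding(A)", "clear(B)"}),
]

def successor(state, action):
    _, pre, add, delete = action
    if not pre <= state:
        return None
    return (state - frozenset(delete)) | frozenset(add)

def plan_for_one_goal(state, goal_fact):
    """Breadth-first search for a sub-plan that achieves a single goal fact."""
    frontier, seen = deque([(state, [])]), {state}
    while frontier:
        s, plan = frontier.popleft()
        if goal_fact in s:
            return s, plan
        for action in ACTIONS:
            ns = successor(s, action)
            if ns is not None and ns not in seen:
                seen.add(ns)
                frontier.append((ns, plan + [action[0]]))
    return None, None

def linear_plan(state, goals):
    """Linear planning: achieve the goals one at a time, in the given order."""
    plan = []
    for goal_fact in goals:
        state, sub_plan = plan_for_one_goal(state, goal_fact)
        if sub_plan is None:
            return None   # this goal ordering fails; a nonlinear planner could interleave goals
        plan += sub_plan
    return plan

start = frozenset({"hand-empty", "clear(A)", "clear(B)"})
print(linear_plan(start, ["on(A,B)"]))   # ['pickup(A)', 'stack(A,B)']
```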
- Readings:
- Monday, September 19: (L) Graph representation for lower-level planning
- Lecture Notes
- Points to remember from the discussion in class:
- Interleaving of graph construction and search is critical.
- C-space transform: makes planning faster, but constructing the C-space can be expensive.
- Multiple methods for constructing the graph of the planning problem.
- State discretization typically reduces the computational complexity of motion planning at the expense of surrendering strict completeness (a grid example follows this list).
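As one concrete graph-construction method, a minimal sketch that discretizes a 2D workspace into an occupancy grid and builds an 8-connected graph over the free cells (the grid contents are made up):

```python
# Build an 8-connected graph over the free cells of a 2D occupancy grid.
# Discretization yields a finite, searchable graph, but a path that exists
# in the continuous space may have no equivalent on the grid.
GRID = [
    "....#",
    "..#.#",
    "..#..",
    "#....",
]

def free(r, c):
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == "."

def build_graph():
    """Return adjacency lists mapping each free cell to its free 8-neighbors."""
    graph = {}
    for r in range(len(GRID)):
        for c in range(len(GRID[0])):
            if not free(r, c):
                continue
            graph[(r, c)] = [(r + dr, c + dc)
                             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                             if (dr, dc) != (0, 0) and free(r + dr, c + dc)]
    return graph

graph = build_graph()
print(len(graph), "free cells;", sum(map(len, graph.values())) // 2, "undirected edges")
```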
- Readings:
- No assigned readings besides lecture notes.
- Wednesday, September 21: (V) Planning Graphs for Planning and Heuristic Search
- Lecture Notes
- Points to remember from the discussion in class:
- Graphplan has two phases: forward extension and backward plan search.
- Mutual exclusion (mutex) relations are key to the efficiency of Graphplan.
- Graphplan is optimal in terms of number of time steps.
- Readings:
- Monday, September 26: (L) Heuristic A*; Weighted A*
- Lecture Notes
- Points to remember from the discussion in class:
- A heuristic applied to a state returns an estimate of the cost of the cheapest path from that state to the goal.
- An admissible heuristic never overestimates the cost of reaching the goal. A consistent heuristic satisfies the triangle inequality.
- With an admissible heuristic, A* is guaranteed to return an optimal path.
- Using heuristics may reduce the number of expanded states.
- Weighted A* trades off optimality for speed: inflating the heuristic by a weight w typically reduces expansions, while the solution cost remains within a factor w of optimal (a sketch follows this list).
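A compact sketch of A* on a toy grid, with a weight w on the heuristic (w = 1 gives standard A*; w > 1 gives weighted A*); the grid and the Manhattan heuristic are illustrative:

```python
import heapq

GRID = ["....",
        ".##.",
        "....",
        ".#.."]

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == ".":
            yield (nr, nc)

def heuristic(cell, goal):
    # Manhattan distance: admissible and consistent on a 4-connected, unit-cost grid.
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

def weighted_astar(start, goal, w=1.0):
    """A* when w == 1; weighted A* (solution cost at most w times optimal) when w > 1."""
    g, parent, closed = {start: 0}, {start: None}, set()
    open_list = [(w * heuristic(start, goal), start)]
    while open_list:
        _, s = heapq.heappop(open_list)
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return list(reversed(path))
        if s in closed:
            continue
        closed.add(s)
        for n in neighbors(s):
            if g[s] + 1 < g.get(n, float("inf")):
                g[n], parent[n] = g[s] + 1, s
                heapq.heappush(open_list, (g[n] + w * heuristic(n, goal), n))
    return None

print(weighted_astar((0, 0), (3, 3), w=1.0))   # optimal path
print(weighted_astar((0, 0), (3, 3), w=5.0))   # possibly suboptimal, usually fewer expansions
```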
- No assigned readings besides lecture notes.
- Wednesday, September 28: (L) Heuristic functions, Multi-heuristic A*
- Lecture Notes
- Points to remember from the discussion in class:
- When dealing with high-dimensional problems, it is often a good idea to solve a lower-dimensional approximation of the problem and use its solution cost as a heuristic (see the sketch after this list).
- The design of heuristics is critical in heuristic search-based planning.
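A minimal sketch of the "solve a lower-dimensional problem as a heuristic" idea: for a robot planning in (row, column, heading), precompute 2D grid distances to the goal by breadth-first search that ignores heading, and look them up as the heuristic (the grid and state encoding are illustrative):

```python
from collections import deque

GRID = ["....",
        ".##.",
        "...."]

def free(r, c):
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == "."

def low_dim_heuristic(goal_cell):
    """BFS over the 2D grid from the goal; distances ignore the heading dimension."""
    dist, frontier = {goal_cell: 0}, deque([goal_cell])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if free(r + dr, c + dc) and (r + dr, c + dc) not in dist:
                dist[(r + dr, c + dc)] = dist[(r, c)] + 1
                frontier.append((r + dr, c + dc))
    return dist

# The full state is (row, col, heading); the heuristic simply drops the heading.
h2d = low_dim_heuristic((2, 3))
def heuristic(state):
    r, c, _heading = state
    return h2d.get((r, c), float("inf"))

print(heuristic((0, 0, "north")))   # 2D distance from cell (0, 0) to the goal cell
```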
- Readings:
- Monday, October 3: (V) Probabilistic path planning
- Lecture Notes
- Points to remember from the discussion in class:
- PRM "discretizes" a continuous space by sampling points in the space.
- RRT searches the space by growing a tree: it repeatedly samples a target (a random point, occasionally biased toward the goal) and extends the tree toward it.
- Replanning is needed when execution fails.
- Given a plan that failed at some step, the two basic alternatives for replanning
are: planning from scratch from the new state to the goal, or planning
from the new state to reach one of the steps of the past plan. Neither
option is ideal.
- ERRT addresses this by biasing the search toward the past plan:
- ERRT expands the search tree toward the goal with probability p, toward a waypoint of the past plan (for reuse) with probability q, and toward a random target with probability 1-p-q (a sketch follows this list).
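A sketch of just the ERRT target-selection step described above; the goal location, cached waypoints, and sampling bounds are illustrative placeholders:

```python
import random

# ERRT target selection: with probability p sample the goal, with probability q
# sample a waypoint cached from the previous plan, otherwise a random state.
GOAL = (9.0, 9.0)
old_plan_waypoints = [(2.0, 3.0), (5.0, 5.5), (7.5, 8.0)]   # cached from the failed plan

def choose_target(p=0.1, q=0.6, bounds=(0.0, 10.0)):
    r = random.random()
    if r < p:
        return GOAL
    if r < p + q and old_plan_waypoints:
        return random.choice(old_plan_waypoints)
    return (random.uniform(*bounds), random.uniform(*bounds))

# The RRT extension step then grows the tree from the nearest node toward this target.
print([choose_target() for _ in range(3)])
```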
- Readings:
- Wednesday, October 5: (V) Planning under uncertainty, re-planning and decision-theoretic planning, uncertainty of action
- Lecture Notes
- Points to remember from the discussion in class:
- Uncertainty arises from state uncertainty, non-deterministic effects,
and sensor noise.
- Conformant planning produces non-branching plans that deal with
uncertainty by forcing the world into known states.
- Planning models can change as a function of the planning "distance" to the state.
- Short-sighted probabilistic planning bridges the gap between policy planning and replanning.
- Readings:
- Monday, October 10: (L) Execution: heuristic search, anytime, re-planning, real-time heuristic search I
- Lecture Notes
- Points to remember from the discussion in class:
- Planning is a repeated process, so there is a need to re-plan fast.
- Anytime Heuristic Search: returns the best plan it can find within a given time budget (a simplified sketch follows this list).
- Incremental Heuristic Search: speeds up search by reusing previous search efforts.
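A simplified, restart-based sketch of the anytime idea (real anytime algorithms such as ARA* also reuse search effort between iterations): run weighted A* repeatedly with a decreasing weight, keeping the best solution found before the deadline. The tiny graph and heuristic values are made up:

```python
import heapq, time

# Tiny weighted graph and an admissible heuristic (made-up values).
EDGES = {"s": [("a", 2), ("b", 5)], "a": [("c", 2)], "b": [("g", 1)], "c": [("g", 6)], "g": []}
H = {"s": 3, "a": 4, "b": 1, "c": 5, "g": 0}

def weighted_astar(start, goal, w):
    """Return the cost of a path found with the heuristic inflated by w."""
    g, parent = {start: 0}, {start: None}
    open_list = [(w * H[start], start)]
    while open_list:
        _, u = heapq.heappop(open_list)
        if u == goal:
            return g[u]
        for v, cost in EDGES[u]:
            if g[u] + cost < g.get(v, float("inf")):
                g[v], parent[v] = g[u] + cost, u
                heapq.heappush(open_list, (g[v] + w * H[v], v))
    return None

def anytime_search(start, goal, deadline_s=0.05, w0=5.0):
    """Keep improving the solution until the deadline by tightening the weight."""
    best, w = None, w0
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        cost = weighted_astar(start, goal, w)
        if cost is not None and (best is None or cost < best):
            best = cost
        if w == 1.0:
            break               # the w = 1 search is already optimal
        w = max(1.0, w - 1.0)   # tighten the suboptimality bound and try again
    return best

print(anytime_search("s", "g"))   # 6 via s-b-g on the toy graph above
```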
- Readings:
- Wednesday, October 12: (L) Execution: heuristic search, anytime, re-planning, real-time heuristic search II
- Lecture Notes
- Points to remember from the discussion in class:
- Real-time Heuristic Search: plan only a few steps towards the goal, execute, and re-plan later.
This can reduce total planning time, but the resulting path may be highly suboptimal (a sketch follows this list).
- Freespace assumption: assume the path is clear unless known otherwise.
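A minimal sketch of a real-time search step in the spirit of LRTA*: look only one action ahead, update the stored heuristic of the current state, and move; repeating this interleaves planning and execution. The grid and unit costs are illustrative:

```python
# LRTA*-style real-time search on a 4-connected grid with unit edge costs:
# per-step planning is constant-time, but the executed path can be far from optimal.
GRID = ["....",
        ".##.",
        "...."]
GOAL = (2, 3)

def free(r, c):
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == "."

def neighbors(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if free(r + dr, c + dc)]

h = {}   # learned heuristic values, initialized lazily to the Manhattan distance
def h_val(cell):
    return h.get(cell, abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1]))

def lrta_star_step(current):
    """One real-time step: update h(current) and return the next cell to move to."""
    best_next = min(neighbors(current), key=lambda n: 1 + h_val(n))
    h[current] = max(h_val(current), 1 + h_val(best_next))   # learning update
    return best_next

cell, path = (0, 0), [(0, 0)]
for _ in range(20):            # execute while (re)planning one step at a time
    if cell == GOAL:
        break
    cell = lrta_star_step(cell)
    path.append(cell)
print(path)
```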
- Readings:
- Monday, October 17: (V) Planning under uncertainty: MDPs
- Lecture Notes
- Points to remember from the discussion in class:
- MDPs for planning under uncertainty
- MDPs reason about states, actions, and rewards.
- Rewards capture goal achievement.
- Solving an MDP is finding the optimal policy.
- A policy maps states to actions.
- An MDP with a specific policy is a Markov Chain.
- We have seen value iteration and policy iteration
to solve Markov chains and MDPs (a value iteration sketch follows this list).
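A minimal value iteration sketch on a tiny made-up MDP (states, transition probabilities, rewards, and discount are illustrative); policy iteration alternates a similar evaluation step with greedy policy improvement:

```python
# Value iteration on a toy MDP. T[s][a] is a list of (probability, next_state)
# pairs, R[s][a] is the immediate reward, and GAMMA is the discount factor.
T = {
    "s0": {"stay": [(1.0, "s0")], "go": [(0.8, "s1"), (0.2, "s0")]},
    "s1": {"stay": [(1.0, "s1")], "go": [(0.9, "goal"), (0.1, "s1")]},
    "goal": {"stay": [(1.0, "goal")]},
}
R = {
    "s0": {"stay": 0.0, "go": -1.0},
    "s1": {"stay": 0.0, "go": -1.0},
    "goal": {"stay": 10.0},
}
GAMMA = 0.9

def value_iteration(tol=1e-6):
    V = {s: 0.0 for s in T}
    while True:
        new_V = {s: max(R[s][a] + GAMMA * sum(p * V[ns] for p, ns in T[s][a])
                        for a in T[s])
                 for s in T}
        if max(abs(new_V[s] - V[s]) for s in T) < tol:
            return new_V
        V = new_V

V = value_iteration()
# The optimal policy maps each state to the action that maximizes the one-step lookahead.
policy = {s: max(T[s], key=lambda a: R[s][a] + GAMMA * sum(p * V[ns] for p, ns in T[s][a]))
          for s in T}
print(V)
print(policy)
```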
- No assigned readings besides lecture notes.
- Wednesday, October 19: (V) Planning under uncertainty: Reinforcement Learning
- Lecture Notes
- Points to remember from the discussion in class:
- Reinforcement learning is an online learning method that learns a policy to control an agent in a world that is assumed to be an MDP.
- Model-free RL learns a Q function that captures both the unknown reward and state-transition functions.
- An online RL agent balances exploitation and exploration (a tabular Q-learning sketch follows this list).
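A minimal tabular Q-learning sketch with epsilon-greedy exploration on a made-up chain world (the environment, rewards, and hyperparameters are illustrative):

```python
import random
from collections import defaultdict

# Toy episodic chain world: states 0..4, actions move left/right, reward 1.0
# only on reaching state 4. The agent never sees the transition model; it
# learns Q(s, a) online from its own experience.
N_STATES, ACTIONS, GOAL_STATE = 5, ("left", "right"), 4

def step(state, action):
    next_state = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == GOAL_STATE else 0.0
    return next_state, reward, next_state == GOAL_STATE

Q = defaultdict(float)
alpha, gamma, epsilon = 0.5, 0.95, 0.1

def choose_action(state):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit
    # (ties between equally valued actions are broken at random).
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(500):                                   # training episodes
    state = 0
    for _ in range(100):                               # cap on episode length
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = reward + gamma * max(Q[(next_state, a)] for a in ACTIONS) * (not done)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
        if done:
            break

print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])   # greedy policy per state
```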
- Readings:
- Chapter 13 - Machine Learning, Tom Mitchell
- Reinforcement Learning: A Survey, L. P. Kaelbling, M. Littman, and A. Moore. JAIR, vol. 4, pp. 237-285, 1996.
- Probabilistic Policy Reuse in a Reinforcement Learning Agent, Fernando Fernandez and Manuela Veloso. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, Hakodate, Japan, May 2006.
- Monday, October 24: (L) Learning in Planning: Experience graphs
- Lecture Notes
- Points to remember from the discussion in class:
- Experience Graphs are a method for biasing search towards reuse of previously planned paths and demonstrations.
- They are based on the idea of re-computing heuristics so that the search is guided toward a set of previously used paths rather than directly toward the goal (the heuristic is sketched after this list).
- E-graphs can be useful in domains where a robot has a similar workspace across planning instances.
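For reference, the Experience Graph heuristic (as described in the E-Graphs work of Phillips, Cohen, Chitta, and Likhachev) can be written roughly as follows, where h_G is the original heuristic, c^E is the cost of an edge on the experience graph, and the inflation factor epsilon^E >= 1 makes travel along experience-graph edges look comparatively cheap; the exact notation here is a paraphrase, not a quote from the lecture:

```latex
h^{E}(s_0) \;=\; \min_{\pi}\; \sum_{i} \min\!\Big( \varepsilon^{E}\, h_{G}(s_i, s_{i+1}),\; c^{E}(s_i, s_{i+1}) \Big),
```

where the minimization is over sequences of states \(\pi = (s_0, \ldots, s_{\mathrm{goal}})\) ending at the goal.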
- Readings:
- Wednesday, October 26: (L) Dependent variables, Markov Property, Dominance
- Lecture Notes
- Points to remember from the discussion in class:
- Markov Property: the cost and the successor state depend only on the current state (and the applied action), not on the path used to reach it (see the formula after this list).
- Dominance relationship can be used to speed up search.
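In its probabilistic form, the Markov property is commonly written as: the next state depends only on the current state and action, not on the earlier history:

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0) \;=\; P(s_{t+1} \mid s_t, a_t).
```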
- Readings:
- Monday, October 31: (V) Learning in Planning: Explanation-based, Analogy-based
- Lecture Notes
- Points to remember from the discussion in class:
- Generating "correct" explanations is hard, as it may require additional knowledge about the domain for the generalization part.
- Control learning "explains" a search episode and outputs an explanation that can be used by the planner "to speed up" its search for solutions to problems.
- Explanation-Based Generalization: assumes the world provides all relevant features; its purpose is to produce a provably correct generalization from a single example.
- HAMLET: learns control rules without proving that they are correct; it generalizes from a single example, but then inductively revises its set of rules from multiple incremental examples.
- Readings:
- Wednesday, November 2: (V) Deep Learning and Deep Reinforcement Learning
- Lecture Notes
- Points to remember from the discussion in class:
- When the state space is continuous or infinite, we use Q-learning with function approximation.
- Deep Q-Network (DQN): approximate Q function with a neural network.
- DQN: the network has one output node for each available action.
- Concept of replay memory: necessary for data efficiency and for avoiding overfitting (a sketch follows this list).
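A minimal sketch of the replay-memory idea and the DQN-style target computation, using a linear Q-function stand-in instead of a deep network so the example stays dependency-light; the environment details, feature sizes, and hyperparameters are made up:

```python
import random
from collections import deque
import numpy as np

# Replay memory: store transitions and sample random minibatches, which
# decorrelates consecutive experiences and lets each sample be reused.
class ReplayMemory:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))
    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
    def __len__(self):
        return len(self.buffer)

# Stand-in Q-function: one output per action (as in DQN), but linear in the
# state features rather than a deep network.
N_FEATURES, N_ACTIONS, GAMMA, LR = 4, 2, 0.99, 0.01
W = np.zeros((N_ACTIONS, N_FEATURES))

def q_values(state):
    return W @ state          # vector of Q(state, a) for every action

def train_step(memory, batch_size=32):
    if len(memory) < batch_size:
        return
    for state, action, reward, next_state, done in memory.sample(batch_size):
        # DQN-style TD target: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states.
        target = reward + (0.0 if done else GAMMA * np.max(q_values(next_state)))
        td_error = target - q_values(state)[action]
        W[action] += LR * td_error * state        # gradient step for a linear Q-function

# Fill the memory with random made-up transitions and run a few updates.
memory = ReplayMemory()
for _ in range(200):
    s, s2 = np.random.rand(N_FEATURES), np.random.rand(N_FEATURES)
    memory.push(s, random.randrange(N_ACTIONS), random.random(), s2, random.random() < 0.1)
for _ in range(50):
    train_step(memory)
print(q_values(np.random.rand(N_FEATURES)))
```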
- Readings:
- Playing Atari with Deep Reinforcement Learning, V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. 2013.
- Human-Level Control Through Deep Reinforcement Learning, V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. In Nature 518, 529-533, 2015.
- Monday, November 7: (L) Learning in Planning: learning a cost function
- Lecture Notes
- Points to remember from the discussion in class:
- Learning a cost function is one way of learning from demonstration.
- The goal is to learn a cost function under which the demonstrations are optimal solutions to the problem (one common formulation is sketched after this list).
- Performance depends on the basis functions used.
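One common way to make this concrete (not necessarily the exact formulation used in the lecture) is to write the cost of a path as a weighted sum of basis functions evaluated along the path, and to seek weights under which every demonstrated path costs no more than any alternative path between the same endpoints:

```latex
c_w(\pi) \;=\; \sum_{s \in \pi} \sum_{i} w_i \, \phi_i(s),
\qquad
\text{find } w \ \text{such that}\ \ c_w(\pi_{\mathrm{demo}}) \;\le\; c_w(\pi)
\ \ \text{for all alternative paths } \pi .
```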
- Readings:
- Wednesday, November 9: (L) Planning under uncertainty: POMDPs
- Lecture Notes
- Points to remember from the discussion in class:
- A POMDP is an extension of an MDP with uncertainty over the
state the agent is in.
- A POMDP can be transformed into a continuous-state MDP over the
belief space (the belief update is sketched after this list).
- POMDPs are highly intractable to solve; approximation techniques exist, but intractability is still a big issue.
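A minimal sketch of the belief-space view: the belief is a probability distribution over states, updated by a Bayes filter after each action and observation; the transition and observation matrices below are made up:

```python
import numpy as np

# Belief update for a tiny 2-state POMDP: b'(s') is proportional to
# O(o | s') * sum_s T(s' | s, a) * b(s), for the single action considered here.
T = np.array([[0.9, 0.1],      # T[s, s']: transition probabilities under the action
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],      # O[s', o]: probability of each observation in state s'
              [0.4, 0.6]])

def belief_update(belief, observation):
    predicted = T.T @ belief                          # predict: sum_s T(s'|s,a) b(s)
    unnormalized = O[:, observation] * predicted      # correct with the observation likelihood
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])                              # uniform initial belief over the two states
print(belief_update(b, observation=1))
```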
- Readings:
- Monday, November 14: (V) Multi-Robot Decision Making: State Estimation and Coordination
- Lecture Notes
- Points to remember from the discussion in class:
- Details on the CMDragons SSL team.
- The STP (Skills, Tactics, and Plays) architecture.
- Robots can communicate and share state in order to make their role assignments locally.
- Readings:
- Wednesday, November 16: (V) Short-Sighted Probabilistic Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Probabilistic planners: FF-Replan, UCT, and SSiPP.
- FF-Replan: determinizes the problem by removing probabilities from the actions, plans classically, and re-plans when execution deviates from the expected outcome.
- UCT: Limit the maximum number of actions that can be applied to reach the goal, and use sparse sampling to search for the solution.
- SSiPP: Prune states reachable only in the far future.
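For reference, the action-selection rule at the heart of UCT (the UCB1 rule) balances an action's estimated value against how rarely it has been tried in the current state; N(s) and N(s, a) are visit counts and c is an exploration constant:

```latex
a^{*} \;=\; \arg\max_{a} \left[ \hat{Q}(s, a) \;+\; c \sqrt{\frac{\ln N(s)}{N(s, a)}} \right].
```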
- Readings:
- Monday, November 21: (V) Review
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Wednesday, November 23: THANKSGIVING - NO CLASS
- Monday, November 28: (L) Application: planning for autonomous driving
- Lecture Notes
- Points to remember from the discussion in class:
- Planning at multiple levels
- A multi-resolution lattice in the motion planner is beneficial.
- An anytime re-planner is critical.
- Design of proper heuristics is key to efficiency.
- Readings:
- Wednesday, November 30: (L) Application: planning for mobile manipulation and articulated robots
- Lecture Notes
- Points to remember from the discussion in class:
- Multiple planners are used in both domains.
- Start and goal configurations are often the most constrained; the planners can exploit this.
- The cost function can be quite complex when planning for articulated robots.
- Designing proper heuristics is key to efficiency.
- Readings:
- Single- and Dual-Arm Motion Planning with Heuristic Search, Benjamin Cohen, Sachin Chitta, and Maxim Likhachev. In The International Journal of Robotics Research, 2013.
- Search-based Planning for a Legged Robot over Rough Terrain, Paul Vernaza, Maxim Likhachev, Subhrajit Bhattacharya, Sachin Chitta, Aleksandr Kushleyev, and Daniel D. Lee. In International Conference on Robotics and Automation, 2009.
- Optimization and Learning for Rough-Terrain Legged Locomotion, Matt Zucker et al. In The International Journal of Robotics Research, 30(2): 175-191, 2011.
- Monday, December 5: (L) Application: planning for perception
- Lecture Notes
- Points to remember from the discussion in class:
- Deliberative Perception: search for the best “explanation” of the observed scene.
- Discriminative guidance can come from any statistical learner, and from multiple learners at once.
- Multi-Heuristic A* (MHA*) for graph search with distinct hypotheses
- Readings:
- Wednesday, December 7: (LV) Project presentations
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
mmv@cs.cmu.edu
July 23, 2016