15-887*: Planning, Execution, and Learning
Schedule, Notes, Readings
Fall 2016
- Wednesday, September 7: (LV) Introduction
- Monday, September 12: (V) Representation for higher-level planning
- Lecture Notes
- Points to remember from the discussion in class:
- There are many ways to represent the states and actions of a problem,
and the choice significantly affects the expressiveness and complexity of planning.
- Closed-world assumption: what is not explicitly true in the state is assumed to be false.
- STRIPS action representation: preconditions, add lists, and delete lists (a small sketch follows this list).
- There are many ways to find solutions to planning problems; they differ
in complexity and in how well they scale with the size of the search space.
- Humans use real-world knowledge when they solve planning problems;
most planners do not have that luxury.
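A minimal sketch of the STRIPS-style representation under the closed-world assumption; the blocks-world facts and the pickup action are illustrative, not taken from the lecture:

```python
# A state is a set of true facts (closed-world assumption: anything not in
# the set is false); an action has preconditions, an add list, and a delete list.
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]
    add_list: FrozenSet[str]
    delete_list: FrozenSet[str]

def applicable(state: FrozenSet[str], action: Action) -> bool:
    return action.preconditions <= state

def apply_action(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    assert applicable(state, action)
    return (state - action.delete_list) | action.add_list

# Illustrative blocks-world action: pick up block A from the table.
pickup_a = Action(
    name="pickup(A)",
    preconditions=frozenset({"on-table(A)", "clear(A)", "hand-empty"}),
    add_list=frozenset({"holding(A)"}),
    delete_list=frozenset({"on-table(A)", "clear(A)", "hand-empty"}),
)

state = frozenset({"on-table(A)", "clear(A)", "hand-empty", "on-table(B)", "clear(B)"})
print(apply_action(state, pickup_a))   # "holding(A)" becomes true; deleted facts become false
```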
- Readings:
- Wednesday, September 14: (V) Algorithms for classical planning
- Lecture Notes
- Points to remember from the discussion in class:
- Planners are characterized in terms of soundness, completeness,
optimality, and computational efficiency.
- Linear Planning: work on one goal until it is completely solved, and only then move to another goal (a toy sketch follows this list).
- GPS Algorithm: sound, but neither optimal nor complete.
- Prodigy Algorithm: a nonlinear planner; sound.
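A toy sketch of the linear-planning strategy in a made-up STRIPS-like domain (the facts and action names are illustrative); it achieves goals strictly one at a time, which is simple but inherits the known ordering problems of linear planning:

```python
from collections import deque

# Each action is (name, preconditions, add list, delete list); a state is a frozenset of facts.
ACTIONS = [
    ("pickup(A)",  {"hand-empty", "clear(A)"}, {"holding(A)"},            {"hand-empty"}),
    ("stack(A,B)", {"holding(A)", "clear(B)"}, {"on(A,B)", "hand-empty"}, {"holding(A)", "clear(B)"}),
]

def successor(state, action):
    _, pre, add, delete = action
    if not pre <= state:
        return None
    return (state - frozenset(delete)) | frozenset(add)

def plan_for_one_goal(state, goal_fact):
    """Breadth-first search for a sub-plan that achieves a single goal fact."""
    frontier, seen = deque([(state, [])]), {state}
    while frontier:
        s, plan = frontier.popleft()
        if goal_fact in s:
            return s, plan
        for action in ACTIONS:
            ns = successor(s, action)
            if ns is not None and ns not in seen:
                seen.add(ns)
                frontier.append((ns, plan + [action[0]]))
    return None, None

def linear_plan(state, goals):
    """Linear planning: achieve the goals one at a time, in the given order."""
    plan = []
    for goal_fact in goals:
        state, sub_plan = plan_for_one_goal(state, goal_fact)
        if sub_plan is None:
            return None   # this goal ordering fails; a nonlinear planner could interleave goals
        plan += sub_plan
    return plan

start = frozenset({"hand-empty", "clear(A)", "clear(B)"})
print(linear_plan(start, ["on(A,B)"]))   # ['pickup(A)', 'stack(A,B)']
```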
- Readings:
- Monday, September 19: (L) Graph representation for lower-level planning
- Lecture Notes
- Points to remember from the discussion in class:
- Interleaving of graph construction and search is critical.
- C-space transform: makes planning faster, but constructing the C-space can be expensive.
- Multiple methods for constructing the graph of the planning problem.
- State discretization typically reduces the computational complexity of motion planning at the expense of surrendering strict completeness (a grid example follows this list).
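As one concrete graph-construction method, a minimal sketch that discretizes a 2D workspace into an occupancy grid and builds an 8-connected graph over the free cells (the grid contents are made up):

```python
# Build an 8-connected graph over the free cells of a 2D occupancy grid.
# Discretization yields a finite, searchable graph, but a path that exists
# in the continuous space may have no equivalent on the grid.
GRID = [
    "....#",
    "..#.#",
    "..#..",
    "#....",
]

def free(r, c):
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == "."

def build_graph():
    """Return adjacency lists mapping each free cell to its free 8-neighbors."""
    graph = {}
    for r in range(len(GRID)):
        for c in range(len(GRID[0])):
            if not free(r, c):
                continue
            graph[(r, c)] = [(r + dr, c + dc)
                             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                             if (dr, dc) != (0, 0) and free(r + dr, c + dc)]
    return graph

graph = build_graph()
print(len(graph), "free cells;", sum(map(len, graph.values())) // 2, "undirected edges")
```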
- Readings:
- No assigned readings besides lecture notes.
- Wednesday, September 21: (V) Planning Graphs for Planning and Heuristic Search
- Lecture Notes
- Points to remember from the discussion in class:
- Graphplan has two phases: forward extension and backward plan search.
- Mutual exclusion (mutex) relations are key to the efficiency of Graphplan.
- Graphplan is optimal in terms of number of time steps.
- Readings:
- Monday, September 26: (L) Heuristic A*; Weighted A*
- Lecture Notes
- Points to remember from the discussion in class:
- A heuristic applied to a state returns an estimate of the cost of the cheapest path from that state to the goal.
- An admissible heuristic never overestimates the cost of reaching the goal. A consistent heuristic satisfies the triangle inequality.
- With an admissible heuristic, A* is guaranteed to return an optimal path.
- Using heuristics may reduce the number of expanded states.
- Weighted A* trades off optimality for speed: inflating the heuristic by a weight w typically reduces expansions, while the solution cost remains within a factor w of optimal (a sketch follows this list).
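A compact sketch of A* on a toy grid, with a weight w on the heuristic (w = 1 gives standard A*; w > 1 gives weighted A*); the grid and the Manhattan heuristic are illustrative:

```python
import heapq

GRID = ["....",
        ".##.",
        "....",
        ".#.."]

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == ".":
            yield (nr, nc)

def heuristic(cell, goal):
    # Manhattan distance: admissible and consistent on a 4-connected, unit-cost grid.
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

def weighted_astar(start, goal, w=1.0):
    """A* when w == 1; weighted A* (solution cost at most w times optimal) when w > 1."""
    g, parent, closed = {start: 0}, {start: None}, set()
    open_list = [(w * heuristic(start, goal), start)]
    while open_list:
        _, s = heapq.heappop(open_list)
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return list(reversed(path))
        if s in closed:
            continue
        closed.add(s)
        for n in neighbors(s):
            if g[s] + 1 < g.get(n, float("inf")):
                g[n], parent[n] = g[s] + 1, s
                heapq.heappush(open_list, (g[n] + w * heuristic(n, goal), n))
    return None

print(weighted_astar((0, 0), (3, 3), w=1.0))   # optimal path
print(weighted_astar((0, 0), (3, 3), w=5.0))   # possibly suboptimal, usually fewer expansions
```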
- No assigned readings besides lecture notes.
- Wednesday, September 28: (L) Heuristic functions, Multi-heuristic A*
- Lecture Notes
- Points to remember from the discussion in class:
- When dealing with high-dimensional problems, it is often a good idea to solve a lower-dimensional approximation of the problem and use its solution cost as a heuristic (see the sketch after this list).
- The design of heuristics is critical in heuristic search-based planning.
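A minimal sketch of the "solve a lower-dimensional problem as a heuristic" idea: for a robot planning in (row, column, heading), precompute 2D grid distances to the goal by breadth-first search that ignores heading, and look them up as the heuristic (the grid and state encoding are illustrative):

```python
from collections import deque

GRID = ["....",
        ".##.",
        "...."]

def free(r, c):
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == "."

def low_dim_heuristic(goal_cell):
    """BFS over the 2D grid from the goal; distances ignore the heading dimension."""
    dist, frontier = {goal_cell: 0}, deque([goal_cell])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if free(r + dr, c + dc) and (r + dr, c + dc) not in dist:
                dist[(r + dr, c + dc)] = dist[(r, c)] + 1
                frontier.append((r + dr, c + dc))
    return dist

# The full state is (row, col, heading); the heuristic simply drops the heading.
h2d = low_dim_heuristic((2, 3))
def heuristic(state):
    r, c, _heading = state
    return h2d.get((r, c), float("inf"))

print(heuristic((0, 0, "north")))   # 2D distance from cell (0, 0) to the goal cell
```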
- Readings:
- Monday, October 3: (V) Probabilistic path planning
- Lecture Notes
- Points to remember from the discussion in class:
- PRM "discretizes" a continuous space by sampling points in the space.
- RRT searches the space by growing a tree: it repeatedly samples a target (a random point, occasionally biased toward the goal) and extends the tree toward it.
- Replanning is needed when execution fails.
- Given a plan that failed at some step, the two basic alternatives for replanning
are: planning from scratch from the new state to the goal, or planning
from the new state to reach one of the steps of the past plan. Neither
option is ideal.
- ERRT addresses this by biasing the search toward the past plan:
- ERRT expands the search tree toward the goal with probability p, toward a waypoint of the past plan (for reuse) with probability q, and toward a random target with probability 1-p-q (a sketch follows this list).
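A sketch of just the ERRT target-selection step described above; the goal location, cached waypoints, and sampling bounds are illustrative placeholders:

```python
import random

# ERRT target selection: with probability p sample the goal, with probability q
# sample a waypoint cached from the previous plan, otherwise a random state.
GOAL = (9.0, 9.0)
old_plan_waypoints = [(2.0, 3.0), (5.0, 5.5), (7.5, 8.0)]   # cached from the failed plan

def choose_target(p=0.1, q=0.6, bounds=(0.0, 10.0)):
    r = random.random()
    if r < p:
        return GOAL
    if r < p + q and old_plan_waypoints:
        return random.choice(old_plan_waypoints)
    return (random.uniform(*bounds), random.uniform(*bounds))

# The RRT extension step then grows the tree from the nearest node toward this target.
print([choose_target() for _ in range(3)])
```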
- Readings:
- Wednesday, October 5: (V) Planning under uncertainty, re-planning and decision-theoretic planning, uncertainty of action
- Lecture Notes
- Points to remember from the discussion in class:
- Uncertainty arises from state uncertainty, non-deterministic effects,
and sensor noise.
- Conformant planning produces non-branching plans that deal with
uncertainty by forcing the world into known states.
- Planning models can change as a function of the planning "distance" to the state.
- Short-sighted probabilistic planning bridges the gap between policy planning and replanning.
- Readings:
- Monday, October 10: (L) Execution: heuristic search, anytime, re-planning, real-time heuristic search I
- Lecture Notes
- Points to remember from the discussion in class:
- Planning is a repeated process, so there is a need to re-plan fast.
- Anytime Heuristic Search: returns the best plan it can find within a given time budget (a simplified sketch follows this list).
- Incremental Heuristic Search: speeds up search by reusing previous search efforts.
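A simplified, restart-based sketch of the anytime idea (real anytime algorithms such as ARA* also reuse search effort between iterations): run weighted A* repeatedly with a decreasing weight, keeping the best solution found before the deadline. The tiny graph and heuristic values are made up:

```python
import heapq, time

# Tiny weighted graph and an admissible heuristic (made-up values).
EDGES = {"s": [("a", 2), ("b", 5)], "a": [("c", 2)], "b": [("g", 1)], "c": [("g", 6)], "g": []}
H = {"s": 3, "a": 4, "b": 1, "c": 5, "g": 0}

def weighted_astar(start, goal, w):
    """Return the cost of a path found with the heuristic inflated by w."""
    g, parent = {start: 0}, {start: None}
    open_list = [(w * H[start], start)]
    while open_list:
        _, u = heapq.heappop(open_list)
        if u == goal:
            return g[u]
        for v, cost in EDGES[u]:
            if g[u] + cost < g.get(v, float("inf")):
                g[v], parent[v] = g[u] + cost, u
                heapq.heappush(open_list, (g[v] + w * H[v], v))
    return None

def anytime_search(start, goal, deadline_s=0.05, w0=5.0):
    """Keep improving the solution until the deadline by tightening the weight."""
    best, w = None, w0
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        cost = weighted_astar(start, goal, w)
        if cost is not None and (best is None or cost < best):
            best = cost
        if w == 1.0:
            break               # the w = 1 search is already optimal
        w = max(1.0, w - 1.0)   # tighten the suboptimality bound and try again
    return best

print(anytime_search("s", "g"))   # 6 via s-b-g on the toy graph above
```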
- Readings:
- Wednesday, October 12: (L) Execution: heuristic search, anytime, re-planning, real-time heuristic search II
- Lecture Notes
- Points to remember from the discussion in class:
- Real-time Heuristic Search: plan only a few steps towards the goal, execute, and re-plan later.
This can reduce total planning time, but the resulting path may be highly suboptimal (a sketch follows this list).
- Freespace assumption: assume the path is clear unless known otherwise.
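A minimal sketch of a real-time search step in the spirit of LRTA*: look only one action ahead, update the stored heuristic of the current state, and move; repeating this interleaves planning and execution. The grid and unit costs are illustrative:

```python
# LRTA*-style real-time search on a 4-connected grid with unit edge costs:
# per-step planning is constant-time, but the executed path can be far from optimal.
GRID = ["....",
        ".##.",
        "...."]
GOAL = (2, 3)

def free(r, c):
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == "."

def neighbors(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if free(r + dr, c + dc)]

h = {}   # learned heuristic values, initialized lazily to the Manhattan distance
def h_val(cell):
    return h.get(cell, abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1]))

def lrta_star_step(current):
    """One real-time step: update h(current) and return the next cell to move to."""
    best_next = min(neighbors(current), key=lambda n: 1 + h_val(n))
    h[current] = max(h_val(current), 1 + h_val(best_next))   # learning update
    return best_next

cell, path = (0, 0), [(0, 0)]
for _ in range(20):            # execute while (re)planning one step at a time
    if cell == GOAL:
        break
    cell = lrta_star_step(cell)
    path.append(cell)
print(path)
```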
- Readings:
- Monday, October 17: (V) Planning under uncertainty: MDPs
- Lecture Notes
- Points to remember from the discussion in class:
- MDPs for planning under uncertainty
- MDPs reason about states, actions, and rewards.
- Rewards capture goal achievement.
- Solving an MDP is finding the optimal policy.
- A policy maps states to actions.
- An MDP with a specific policy is a Markov Chain.
- We have seen value iteration and policy iteration
to solve Markov chains and MDPs (a value iteration sketch follows this list).
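A minimal value iteration sketch on a tiny made-up MDP (states, transition probabilities, rewards, and discount are illustrative); policy iteration alternates a similar evaluation step with greedy policy improvement:

```python
# Value iteration on a toy MDP. T[s][a] is a list of (probability, next_state)
# pairs, R[s][a] is the immediate reward, and GAMMA is the discount factor.
T = {
    "s0": {"stay": [(1.0, "s0")], "go": [(0.8, "s1"), (0.2, "s0")]},
    "s1": {"stay": [(1.0, "s1")], "go": [(0.9, "goal"), (0.1, "s1")]},
    "goal": {"stay": [(1.0, "goal")]},
}
R = {
    "s0": {"stay": 0.0, "go": -1.0},
    "s1": {"stay": 0.0, "go": -1.0},
    "goal": {"stay": 10.0},
}
GAMMA = 0.9

def value_iteration(tol=1e-6):
    V = {s: 0.0 for s in T}
    while True:
        new_V = {s: max(R[s][a] + GAMMA * sum(p * V[ns] for p, ns in T[s][a])
                        for a in T[s])
                 for s in T}
        if max(abs(new_V[s] - V[s]) for s in T) < tol:
            return new_V
        V = new_V

V = value_iteration()
# The optimal policy maps each state to the action that maximizes the one-step lookahead.
policy = {s: max(T[s], key=lambda a: R[s][a] + GAMMA * sum(p * V[ns] for p, ns in T[s][a]))
          for s in T}
print(V)
print(policy)
```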
- No assigned readings besides lecture notes.
- Wednesday, October 19: (V) Planning under uncertainty: Reinforcement Learning
- Lecture Notes
- Points to remember from the discussion in class:
- Reinforcement learning is an online learning method that learns a policy to control an agent in a world that is assumed to be an MDP.
- Model-free RL learns a Q function that captures both the unknown reward and state-transition functions.
- An online RL agent balances exploitation and exploration (a tabular Q-learning sketch follows this list).
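A minimal tabular Q-learning sketch with epsilon-greedy exploration on a made-up chain world (the environment, rewards, and hyperparameters are illustrative):

```python
import random
from collections import defaultdict

# Toy episodic chain world: states 0..4, actions move left/right, reward 1.0
# only on reaching state 4. The agent never sees the transition model; it
# learns Q(s, a) online from its own experience.
N_STATES, ACTIONS, GOAL_STATE = 5, ("left", "right"), 4

def step(state, action):
    next_state = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == GOAL_STATE else 0.0
    return next_state, reward, next_state == GOAL_STATE

Q = defaultdict(float)
alpha, gamma, epsilon = 0.5, 0.95, 0.1

def choose_action(state):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit
    # (ties between equally valued actions are broken at random).
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(500):                                   # training episodes
    state = 0
    for _ in range(100):                               # cap on episode length
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = reward + gamma * max(Q[(next_state, a)] for a in ACTIONS) * (not done)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
        if done:
            break

print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])   # greedy policy per state
```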
- Readings:
- Chapter 13 - Machine Learning, Tom Mitchell
- Reinforcement Learning: A Survey, L. P. Kaelbling, M. Littman, and A. Moore. JAIR, vol. 4, pp. 237-285, 1996.
- Probabilistic Policy Reuse in a Reinforcement Learning Agent, Fernando Fernandez and Manuela Veloso. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, Hakodate, Japan, May 2006.
- Monday, October 24: (L) Learning in Planning: Experience graphs
- Lecture Notes
- Points to remember from the discussion in class:
- Experience Graphs are a method for biasing search towards reuse of previously planned paths and demonstrations.
- They are based on the idea of re-computing heuristics so that the search is guided toward a set of previously used paths rather than directly toward the goal (the heuristic is sketched after this list).
- E-graphs can be useful in domains where a robot has a similar workspace across planning instances.
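For reference, the Experience Graph heuristic (as described in the E-Graphs work of Phillips, Cohen, Chitta, and Likhachev) can be written roughly as follows, where h_G is the original heuristic, c^E is the cost of an edge on the experience graph, and the inflation factor epsilon^E >= 1 makes travel along experience-graph edges look comparatively cheap; the exact notation here is a paraphrase, not a quote from the lecture:

```latex
h^{E}(s_0) \;=\; \min_{\pi}\; \sum_{i} \min\!\Big( \varepsilon^{E}\, h_{G}(s_i, s_{i+1}),\; c^{E}(s_i, s_{i+1}) \Big),
```

where the minimization is over sequences of states \(\pi = (s_0, \ldots, s_{\mathrm{goal}})\) ending at the goal.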
- Readings:
- Wednesday, October 26: (L) Dependent variables, Markov Property, Dominance
- Lecture Notes
- Points to remember from the discussion in class:
- Markov Property: the cost and the successor state depend only on the current state (and the applied action), not on the path used to reach it (see the formula after this list).
- Dominance relationship can be used to speed up search.
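In its probabilistic form, the Markov property is commonly written as: the next state depends only on the current state and action, not on the earlier history:

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0) \;=\; P(s_{t+1} \mid s_t, a_t).
```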
- Readings:
- Monday, October 31: (V) Learning in Planning: Explanation-based, Analogy-based
- Lecture Notes
- Points to remember from the discussion in class:
- Generating "correct" explanations is hard, as it may require additional knowledge about the domain for the generalization part.
- Control learning "explains" a search episode and outputs an explanation that can be used by the planner "to speed up" its search for solutions to problems.
- Explanation-Based Generalization: assumes the world provides all relevant features; its purpose is to produce a provably correct generalization from a single example.
- HAMLET: learns control rules without proving that they are correct; it generalizes from a single example, but then inductively revises its set of rules from multiple incremental examples.
- Readings:
- Wednesday, November 2: (V) Deep Learning and Deep Reinforcement Learning
- Lecture Notes
- Points to remember from the discussion in class:
- When the state space is continuous or infinite, we use Q-learning with function approximation.
- Deep Q-Network (DQN): approximate Q function with a neural network.
- DQN: the network has one output node for each available action.
- Concept of replay memory: necessary for data efficiency and for avoiding overfitting (a sketch follows this list).
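A minimal sketch of the replay-memory idea and the DQN-style target computation, using a linear Q-function stand-in instead of a deep network so the example stays dependency-light; the environment details, feature sizes, and hyperparameters are made up:

```python
import random
from collections import deque
import numpy as np

# Replay memory: store transitions and sample random minibatches, which
# decorrelates consecutive experiences and lets each sample be reused.
class ReplayMemory:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))
    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
    def __len__(self):
        return len(self.buffer)

# Stand-in Q-function: one output per action (as in DQN), but linear in the
# state features rather than a deep network.
N_FEATURES, N_ACTIONS, GAMMA, LR = 4, 2, 0.99, 0.01
W = np.zeros((N_ACTIONS, N_FEATURES))

def q_values(state):
    return W @ state          # vector of Q(state, a) for every action

def train_step(memory, batch_size=32):
    if len(memory) < batch_size:
        return
    for state, action, reward, next_state, done in memory.sample(batch_size):
        # DQN-style TD target: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states.
        target = reward + (0.0 if done else GAMMA * np.max(q_values(next_state)))
        td_error = target - q_values(state)[action]
        W[action] += LR * td_error * state        # gradient step for a linear Q-function

# Fill the memory with random made-up transitions and run a few updates.
memory = ReplayMemory()
for _ in range(200):
    s, s2 = np.random.rand(N_FEATURES), np.random.rand(N_FEATURES)
    memory.push(s, random.randrange(N_ACTIONS), random.random(), s2, random.random() < 0.1)
for _ in range(50):
    train_step(memory)
print(q_values(np.random.rand(N_FEATURES)))
```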
- Readings:
- Playing Atari with Deep Reinforcement Learning, V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. 2013.
- Human-Level Control Through Deep Reinforcement Learning, V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. In Nature 518, 529-533, 2015.
- Monday, November 7: (L) Learning in Planning: learning a cost function
- Lecture Notes
- Points to remember from the discussion in class:
- Learning a cost function is one way of learning from demonstration.
- The goal is to learn a cost function under which the demonstrations are optimal solutions to the problem (one common formulation is sketched after this list).
- Performance depends on the basis functions used.
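One common way to make this concrete (not necessarily the exact formulation used in the lecture) is to write the cost of a path as a weighted sum of basis functions evaluated along the path, and to seek weights under which every demonstrated path costs no more than any alternative path between the same endpoints:

```latex
c_w(\pi) \;=\; \sum_{s \in \pi} \sum_{i} w_i \, \phi_i(s),
\qquad
\text{find } w \ \text{such that}\ \ c_w(\pi_{\mathrm{demo}}) \;\le\; c_w(\pi)
\ \ \text{for all alternative paths } \pi .
```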
- Readings:
- Wednesday, November 9: (L) Planning under uncertainty: POMDPs
- Lecture Notes
- Points to remember from the discussion in class:
- A POMDP is an extension of an MDP with uncertainty over the
state the agent is in.
- A POMDP can be transformed into a continuous-state MDP over the
belief space (the belief update is sketched after this list).
- POMDPs are highly intractable to solve; approximation techniques exist, but intractability is still a big issue.
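A minimal sketch of the belief-space view: the belief is a probability distribution over states, updated by a Bayes filter after each action and observation; the transition and observation matrices below are made up:

```python
import numpy as np

# Belief update for a tiny 2-state POMDP: b'(s') is proportional to
# O(o | s') * sum_s T(s' | s, a) * b(s), for the single action considered here.
T = np.array([[0.9, 0.1],      # T[s, s']: transition probabilities under the action
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],      # O[s', o]: probability of each observation in state s'
              [0.4, 0.6]])

def belief_update(belief, observation):
    predicted = T.T @ belief                          # predict: sum_s T(s'|s,a) b(s)
    unnormalized = O[:, observation] * predicted      # correct with the observation likelihood
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])                              # uniform initial belief over the two states
print(belief_update(b, observation=1))
```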
- Readings:
- Monday, November 14: (V) Multi-Robot Decision Making: State Estimation and Coordination
- Lecture Notes
- Points to remember from the discussion in class:
- Details on the CMDragons SSL team.
- The STP (Skills, Tactics, and Plays) architecture.
- Robots can communicate and share state in order to make their role assignments locally.
- Readings:
- Wednesday, November 16: (V) Short-Sighted Probabilistic Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Probabilistic planners: FF-Replan, UCT, and SSiPP.
- FF-Replan: determinizes the problem by removing probabilities from the actions, plans classically, and re-plans when execution deviates from the expected outcome.
- UCT: Limit the maximum number of actions that can be applied to reach the goal, and use sparse sampling to search for the solution.
- SSiPP: Prune states reachable only in the far future.
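For reference, the action-selection rule at the heart of UCT (the UCB1 rule) balances an action's estimated value against how rarely it has been tried in the current state; N(s) and N(s, a) are visit counts and c is an exploration constant:

```latex
a^{*} \;=\; \arg\max_{a} \left[ \hat{Q}(s, a) \;+\; c \sqrt{\frac{\ln N(s)}{N(s, a)}} \right].
```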
- Readings:
- Monday, November 21: (V) Review
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Wednesday, November 23: THANKSGIVING - NO CLASS
- Monday, November 28: (L) Application: planning for autonomous driving
- Lecture Notes
- Points to remember from the discussion in class:
- Planning at multiple levels
- A multi-resolution lattice in the motion planner is beneficial.
- An anytime re-planner is critical.
- Design of proper heuristics is key to efficiency.
- Readings:
- Wednesday, November 30: (L) Application: planning for mobile manipulation and articulated robots
- Lecture Notes
- Points to remember from the discussion in class:
- Multiple planners are used in both domains.
- Start and goal configurations are often the most constrained; the planners can exploit this.
- The cost function can be quite complex when planning for articulated robots.
- Designing proper heuristics is key to efficiency.
- Readings:
- Single- and Dual-Arm Motion Planning with Heuristic Search, Benjamin Cohen, Sachin Chitta, and Maxim Likhachev. In The International Journal of Robotics Research, 2013.
- Search-based Planning for a Legged Robot over Rough Terrain, Paul Vernaza, Maxim Likhachev, Subhrajit Bhattacharya, Sachin Chitta, Aleksandr Kushleyev, and Daniel D. Lee. In International Conference on Robotics and Automation, 2009.
- Optimization and Learning for Rough-Terrain Legged Locomotion, Matt Zucker et al. In The International Journal of Robotics Research, 30(2): 175-191, 2011.
- Monday, December 5: (L) Application: planning for perception
- Lecture Notes
- Points to remember from the discussion in class:
- Deliberative Perception: search for the best “explanation” of the observed scene.
- Discriminative guidance can come from any statistical learner, and from multiple learners at once.
- Multi-Heuristic A* (MHA*) for graph search with distinct hypotheses
- Readings:
- Wednesday, December 7: (LV) Project presentations
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
mmv@cs.cmu.edu
July 23, 2016