Midterm 2

Learning Objectives

Logic

  1. Describe the definition of the (Boolean) Satisfiability Problem (SAT).
  2. Describe conjunctive normal form (CNF).
  3. Understand the DPLL algorithm for solving SAT problems (a minimal sketch appears after this list).
  4. Describe and create a Successor-State Axiom given a problem setup.
  5. Describe SATPlan (Planning as Satisfiability) and apply it to real-world problems.
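
To make objective 3 concrete, here is a minimal DPLL sketch in Python. It is an illustration rather than a reference implementation: clauses are lists of nonzero integers (DIMACS-style literals), and the pure-literal rule is omitted for brevity.

```python
def dpll(clauses, assignment=None):
    """Minimal DPLL. Clauses are lists of nonzero ints (DIMACS-style
    literals); returns a satisfying assignment dict or None."""
    if assignment is None:
        assignment = {}

    # Simplify: drop satisfied clauses, strip falsified literals.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue                      # clause already satisfied
        rest = [lit for lit in clause if abs(lit) not in assignment]
        if not rest:
            return None                   # empty clause: conflict
        simplified.append(rest)
    if not simplified:
        return assignment                 # every clause satisfied

    # Unit propagation: a one-literal clause forces its variable.
    for clause in simplified:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(simplified, {**assignment, abs(lit): lit > 0})

    # Splitting rule: branch on the first unassigned variable.
    var = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

For example, `dpll([[1, -2], [2], [-1, 2]])` returns a satisfying assignment such as `{2: True, 1: True}`, while `dpll([[1], [-1]])` returns `None`.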

Classical Planning

  1. Compare and contrast classical planning methods with planning via search or propositional logic.
  2. Execute linear planning on real-world problems.
  3. Identify properties of a given planning algorithm, namely whether it is sound, complete, and optimal.
  4. Encode and solve real-world problems given a GraphPlan solver.
  5. Create or extend layers of a GraphPlan graph (see the sketch after this list).
  6. Identify termination conditions from a GraphPlan graph.
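
As a concrete anchor for objectives 5 and 6, the sketch below extends a planning graph by one layer. It is deliberately simplified: `Action` is a hypothetical tuple type, only positive preconditions and effects are handled, and mutex links (a core part of real GraphPlan) are omitted.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

class Action(NamedTuple):
    name: str
    preconds: FrozenSet[str]   # positive literals only, for simplicity
    effects: FrozenSet[str]

def extend_layer(props: FrozenSet[str],
                 actions: List[Action]) -> Tuple[List[Action], FrozenSet[str]]:
    """Given proposition level S_i, build action level A_i and
    proposition level S_{i+1} (mutex bookkeeping omitted)."""
    # An action joins level A_i if all its preconditions hold in S_i.
    level_actions = [a for a in actions if a.preconds <= props]
    # No-op (persistence) actions carry every proposition forward,
    # so S_{i+1} contains S_i plus all new effects.
    next_props = props | frozenset(e for a in level_actions for e in a.effects)
    return level_actions, next_props
```

One termination condition (objective 6) falls out directly: if `extend_layer` returns `next_props == props`, the graph has leveled off, and if the goals are still missing (or, with mutexes tracked, mutually exclusive) at that level, GraphPlan can report failure.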

Markov Decision Processes (MDPs)

  1. Describe the definition of a Markov Decision Process.
  2. Compute the utility of a reward sequence given a discount factor.
  3. Define a policy and the optimal policy of an MDP.
  4. Define the state value and (true) state value of an MDP.
  5. Define the Q-value and (true) Q-value of an MDP.
  6. Derive the optimal policy from (true) state values or (true) Q-values.
  7. Write the Bellman equation for the state value and the Q-value, both for the optimal policy and for a given policy.
  8. Describe and implement the value iteration algorithm (through Bellman updates) for solving MDPs (a sketch appears after this list).
  9. Describe and implement the policy iteration algorithm (through policy evaluation and policy improvement) for solving MDPs.
  10. Understand convergence of value iteration and policy iteration.
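
For objectives 7 and 8, recall the Bellman optimality equations \(V^*(s) = \max_a \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V^*(s')]\) and \(Q^*(s,a) = \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma \max_{a'} Q^*(s',a')]\). The sketch below turns the first into a value iteration loop; the `T`, `R`, and `actions` interfaces are hypothetical stand-ins for this illustration, not a required API.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-6):
    """Value iteration via repeated Bellman updates.

    Assumed (hypothetical) interfaces: actions(s) lists legal actions,
    T[s][a] is a list of (next_state, prob) pairs, and R(s, a, s2)
    is the reward for that transition.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q_values = [
                sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[s][a])
                for a in actions(s)
            ]
            new_v = max(q_values, default=0.0)   # terminal states keep 0
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:          # values have (numerically) converged
            return V
```

Each Bellman update is a \(\gamma\)-contraction in the max norm, which is why value iteration converges (objective 10); acting greedily with respect to the converged \(V\) recovers an optimal policy (objective 6).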

Reinforcement Learning

  1. Understand the concepts of exploration, exploitation, and regret.
  2. Describe the relationships and differences between:
    1. Markov Decision Processes (MDP) vs Reinforcement Learning (RL)
    2. Model-based vs Model-free RL
    3. Temporal-Difference Value Learning (TD Value Learning) vs Q-Learning
    4. Passive vs Active RL
    5. Off-policy vs On-policy Learning
    6. Exploration vs Exploitation
  3. Describe and implement:
    1. Temporal difference learning
    2. Q-Learning
    3. \(\epsilon\)-greedy algorithm
    4. Approximate Q-learning (Feature-based)
  4. Derive the weight update for Approximate Q-learning (a combined sketch of the items above appears after this list).
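
As a combined sketch of objectives 3 and 4: the tabular Q-learning update is \(Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]\), and the approximate (feature-based) version applies the same TD error to each weight, \(w_i \leftarrow w_i + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]\,f_i(s,a)\). The class below is an illustrative tabular learner with \(\epsilon\)-greedy exploration; states and actions are assumed hashable, and the same action set is assumed legal everywhere.

```python
import random
from collections import defaultdict

class QLearner:
    """Tabular Q-learning with epsilon-greedy exploration (a sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)   # (state, action) -> value estimate
        self.actions = actions        # assumed: same actions in every state
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_action(self, s):
        # Explore with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s2):
        # Off-policy TD target: the greedy value of the next state,
        # regardless of which action is actually taken there.
        target = r + self.gamma * max(self.Q[(s2, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])
```

The `max` in the target is what makes Q-learning off-policy (item 2.5): it evaluates the greedy policy even while the agent acts \(\epsilon\)-greedily, whereas TD value learning bootstraps from the action the current policy actually takes.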

Bayes Nets

  1. Answer any query from a joint distribution.
  2. Construct a joint distribution from conditional probability tables using the chain rule (a sketch appears after this list).
  3. Construct a joint, conditional, or marginal distribution from a Bayes net and its conditional probability tables.
  4. Construct a Bayes net given conditional independence assumptions.
  5. Identify independence relationships between variables in a Bayes net.
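
Here is a minimal sketch of objectives 1 through 3 on a chain-structured net C → R → W, with made-up CPT numbers: the chain rule assembles the joint from the CPTs, and a conditional query is answered by summing the joint over the hidden variables.

```python
from itertools import product

# Hypothetical CPTs for the chain C -> R -> W (numbers are made up).
P_C = {True: 0.5, False: 0.5}            # P(C)
P_R_given_C = {True: 0.8, False: 0.2}    # P(R=true | C)
P_W_given_R = {True: 0.9, False: 0.1}    # P(W=true | R)

def joint(c, r, w):
    """Chain rule: P(C, R, W) = P(C) * P(R | C) * P(W | R)."""
    p = P_C[c]
    p *= P_R_given_C[c] if r else 1 - P_R_given_C[c]
    p *= P_W_given_R[r] if w else 1 - P_W_given_R[r]
    return p

# Query P(R=true | W=true) by enumeration: sum out the hidden variable C
# in the numerator and both hidden variables (C, R) in the denominator.
numer = sum(joint(c, True, True) for c in (True, False))
denom = sum(joint(c, r, True) for c, r in product((True, False), repeat=2))
print(numer / denom)   # 0.45 / 0.50 = 0.9
```

The same enumeration answers any query from the joint (objective 1); its cost is exponential in the number of hidden variables, which is what motivates exploiting the independence structure of the net (objective 5).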

Practice Exams

Please find practice midterms and solutions attached below. We have also included recordings of TAs walking through solutions to selected problems. If you have any questions, please feel free to post on Piazza or ask during Office Hours.

Practice Midterm 2A: blank/sol

Practice Midterm 2B: blank/sol