This assignment explores using LQR and DDP to do policy optimization. We suggest that you read this paper before attempting this assignment, to get an understanding of the system we are dealing with.
Here we consider multiple ways to parameterize an inverted pendulum walking robot with an offset flywheel, which is a simplified model of Atrias.
Dynamics: For each case all the mass is concentrated at a body whose center of mass is LL = 0.1 m above the hip. The body has mass M and moment of inertia around the center of mass I_com.
M = 80 kg, I_com = 5 kg*m^2, g = 9.81 m/s^2.
theta is the angle of the body with respect to vertical (measured with an IMU). psi is the angle of the hip with respect to the leg (measured with an encoder). The leg length is L. torque is the torque at the hip. force is the force along the leg. Fx_foot and Fy_foot are ground reaction forces in ground coordinates.
I_joint = I_com + M*LL^2
torque = I_joint*thetadd
x_hip = L*sin(theta - psi)
y_hip = L*cos(theta - psi)
x_com = x_hip + LL*sin(theta) = L*sin(theta - psi) + LL*sin(theta)
y_com = y_hip + LL*cos(theta) = L*cos(theta - psi) + LL*cos(theta)
so, treating the leg length L as constant,
x_comd = L*cos(theta - psi)*(thetad - psid) + LL*cos(theta)*thetad
y_comd = -L*sin(theta - psi)*(thetad - psid) - LL*sin(theta)*thetad
and
x_comdd = L*cos(theta - psi)*(thetadd - psidd) - L*sin(theta - psi)*(thetad - psid)^2
+ LL*cos(theta)*thetadd - LL*sin(theta)*thetad^2
y_comdd = -L*sin(theta - psi)*(thetadd - psidd) - L*cos(theta - psi)*(thetad - psid)^2
- LL*sin(theta)*thetadd - LL*cos(theta)*thetad^2
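The chain-rule accelerations can be checked against a finite difference of the position formulas for x_com and y_com. This is a minimal sketch, assuming a constant leg length L = 1 m (consistent with y_hip = 1 when theta = psi = 0); the function names and the test angles are illustrative, not part of the assignment.

```python
import math

L_LEG = 1.0   # leg length L in m (assumed constant for this check)
LL = 0.1      # distance from hip to body center of mass in m


def com_position(theta, psi):
    """x_com and y_com from the kinematic equations."""
    a = theta - psi
    return (L_LEG * math.sin(a) + LL * math.sin(theta),
            L_LEG * math.cos(a) + LL * math.cos(theta))


def com_acceleration(theta, psi, thetad, psid, thetadd, psidd):
    """Second time derivative of (x_com, y_com) by the chain rule."""
    a, ad, add = theta - psi, thetad - psid, thetadd - psidd
    x_comdd = (L_LEG * math.cos(a) * add - L_LEG * math.sin(a) * ad ** 2
               + LL * math.cos(theta) * thetadd
               - LL * math.sin(theta) * thetad ** 2)
    y_comdd = (-L_LEG * math.sin(a) * add - L_LEG * math.cos(a) * ad ** 2
               - LL * math.sin(theta) * thetadd
               - LL * math.cos(theta) * thetad ** 2)
    return x_comdd, y_comdd


def finite_difference_acceleration(theta, psi, thetad, psid, thetadd, psidd,
                                   h=1e-4):
    """Central second difference along a constant-angular-acceleration path."""
    def pos(t):
        return com_position(theta + thetad * t + 0.5 * thetadd * t * t,
                            psi + psid * t + 0.5 * psidd * t * t)
    (xm, ym), (x0, y0), (xp, yp) = pos(-h), pos(0.0), pos(h)
    return (xp - 2 * x0 + xm) / h ** 2, (yp - 2 * y0 + ym) / h ** 2


# Arbitrary test point: theta, psi, thetad, psid, thetadd, psidd
point = (0.3, -0.2, 0.7, -0.4, 1.5, 0.9)
analytic = com_acceleration(*point)
numeric = finite_difference_acceleration(*point)
```

The two results should agree to within the finite-difference error, which is a quick way to catch a sign or sin/cos slip in the centripetal terms.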
Fx_com = -force*sin(theta - psi) + torque*cos(theta-psi)/L
Fy_com = force*cos(theta - psi) - M*g + torque*sin(theta-psi)/L
Fx_com = M*x_comdd
Fy_com = M*y_comdd
The robot is controlled in terms of ground reaction forces Fx_foot and Fy_foot at the point foot. x_comdd, y_comdd and thetadd are functions of Fx_foot and Fy_foot.
M*x_comdd = Fx_foot
M*y_comdd = Fy_foot - M*g
I_com*thetadd = Fx_foot*y_com - Fy_foot*x_com
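The three ground-reaction-force equations translate directly into a function that returns the accelerations. This is a small sketch using the constants given above; the function name is illustrative, and x_com, y_com are taken relative to the foot, as in the kinematic equations.

```python
M = 80.0      # body mass in kg
I_COM = 5.0   # moment of inertia about the center of mass in kg*m^2
G = 9.81      # gravitational acceleration in m/s^2


def grf_accelerations(x_com, y_com, fx_foot, fy_foot):
    """Return (x_comdd, y_comdd, thetadd) for given ground reaction forces."""
    x_comdd = fx_foot / M
    y_comdd = fy_foot / M - G
    thetadd = (fx_foot * y_com - fy_foot * x_com) / I_COM
    return x_comdd, y_comdd, thetadd


# Standing still: COM directly above the foot, vertical force equal to weight.
standing = grf_accelerations(0.0, 1.1, 0.0, M * G)
```

At the standing point all three accelerations vanish (up to rounding), which identifies Fx_foot = 0, Fy_foot = M*g as the nominal input for this parameterization.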
The robot has the kinematics of Atrias. There is a four-bar mechanism, and two hip motors per leg. The dimensions of the four-bar mechanism, as well as details about the four-bar dynamics, can be found in this paper; see Section III in particular.
Linearize this system using each action parameterization about x_hip = 0, y_hip = 1, and theta = 0.
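One way to linearize without deriving Jacobians by hand is central finite differences. The sketch below does this for the ground-reaction-force parameterization only, and makes several assumptions that are not in the assignment: the state is ordered [x_com, y_com, theta, x_comd, y_comd, thetad], the operating point x_hip = 0, y_hip = 1, theta = 0 is expressed as x_com = 0, y_com = 1 + LL, and the nominal input is [0, M*g].

```python
import numpy as np

M, I_COM, G, LL = 80.0, 5.0, 9.81, 0.1


def f(state, u):
    """Continuous-time state derivative of the ground-reaction-force model."""
    x_com, y_com, theta, x_comd, y_comd, thetad = state
    fx, fy = u
    return np.array([x_comd, y_comd, thetad,
                     fx / M,
                     fy / M - G,
                     (fx * y_com - fy * x_com) / I_COM])


def linearize(f, x0, u0, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x0, u0)."""
    n, m = len(x0), len(u0)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        A[:, i] = (f(x0 + d, u0) - f(x0 - d, u0)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m)
        d[j] = eps
        B[:, j] = (f(x0, u0 + d) - f(x0, u0 - d)) / (2 * eps)
    return A, B


x0 = np.array([0.0, 1.0 + LL, 0.0, 0.0, 0.0, 0.0])  # standing point
u0 = np.array([0.0, M * G])                         # force that holds it
A, B = linearize(f, x0, u0)
```

The same linearize helper can be reused for the other parameterizations by swapping in a different f, provided (x0, u0) is an equilibrium of that model.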
Write an LQR controller that stabilizes the system about this point. Perturb the system and estimate the basin of attraction for disturbances in x, y, and theta.
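As a starting point, a discrete-time LQR gain can be computed by iterating the Riccati recursion until it converges. The sketch below uses a hand-linearized ground-reaction-force model about x_com = 0, y_com = 1.1, Fy_foot = M*g. The explicit-Euler discretization, the 0.01 s period, and the Q and R values are placeholder assumptions, not recommended answers.

```python
import numpy as np

M, I_COM, G = 80.0, 5.0, 9.81
Y0 = 1.1     # y_com at the standing point: y_hip = 1 plus LL = 0.1
DT = 0.01    # assumed control period in s

# State: deviations of [x_com, y_com, theta, x_comd, y_comd, thetad].
# Input: deviations of [Fx_foot, Fy_foot] from the standing forces [0, M*g].
A = np.zeros((6, 6))
A[0, 3] = A[1, 4] = A[2, 5] = 1.0
A[5, 0] = -M * G / I_COM          # d(thetadd)/d(x_com) = -Fy_foot / I_com
B = np.zeros((6, 2))
B[3, 0] = B[4, 1] = 1.0 / M
B[5, 0] = Y0 / I_COM              # d(thetadd)/d(Fx_foot) = y_com / I_com

Ad = np.eye(6) + DT * A           # explicit-Euler discretization
Bd = DT * B


def dlqr(Ad, Bd, Q, R, max_iter=20000, rtol=1e-8):
    """Iterate the discrete Riccati recursion; return gain K and cost P."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
        P_next = Q + Ad.T @ P @ (Ad - Bd @ K)
        P_next = 0.5 * (P_next + P_next.T)   # keep P symmetric
        done = np.allclose(P_next, P, rtol=rtol, atol=1e-12)
        P = P_next
        if done:
            break
    K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
    return K, P


Q = np.eye(6)                 # placeholder state weights
R = 1e-4 * np.eye(2)          # placeholder force weights
K, P = dlqr(Ad, Bd, Q, R)     # control law: u = u0 - K @ (x - x0)
spectral_radius = max(abs(np.linalg.eigvals(Ad - Bd @ K)))
```

A spectral radius below 1 means the linearized closed loop is stable; the basin of attraction still has to be estimated by simulating the nonlinear model from perturbed initial states.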
Question 1: What Q and R for LQR work best? What does work best mean? Can you come up with a way to automatically tune your Q and R matrices (for example, using CMA-ES)?
Question 2: Which action parameterization works best? What does work best mean?
Question 3: In which perturbation direction (x, y or theta) is the robot most vulnerable? What does vulnerable mean?
Bonus: Does parameterizing the state in terms of x_hip, y_hip or x_com, y_com improve LQR control?
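For the automatic tuning asked about in Question 1, the loop has the same shape whatever optimizer is used: propose weights, compute the LQR gain, simulate, score. The sketch below swaps in a simple (1+1) evolution strategy in place of CMA-ES, and tunes only the vertical axis (y_com driven by Fy_foot) to stay short. The figure of merit (integrated absolute error plus a peak-force penalty) is a hypothetical definition of "works best", and all names are illustrative.

```python
import numpy as np

M, DT = 80.0, 0.01
Ad = np.array([[1.0, DT], [0.0, 1.0]])   # Euler-discretized double integrator
Bd = np.array([[0.0], [DT / M]])


def dlqr_gain(q_diag, r, iters=1500):
    """Approximate LQR gain from a fixed number of Riccati iterations."""
    Q, R = np.diag(q_diag), np.array([[r]])
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
        P = Q + Ad.T @ P @ (Ad - Bd @ K)
    return K


def figure_of_merit(log_w):
    """Score weights by simulating recovery from a 0.1 m height error."""
    q1, q2, r = np.exp(log_w)            # log scale keeps weights positive
    K = dlqr_gain([q1, q2], r)
    x = np.array([0.1, 0.0])
    iae, peak = 0.0, 0.0
    for _ in range(500):                 # 5 s of simulated time
        u = -(K @ x)
        iae += abs(x[0]) * DT
        peak = max(peak, abs(u[0]))
        x = Ad @ x + Bd @ u
    return iae + 1e-3 * peak


def one_plus_one_es(cost, w0, sigma=0.5, generations=20, seed=0):
    """Elitist (1+1) evolution strategy: keep a mutation only if it is better."""
    rng = np.random.default_rng(seed)
    best_w = np.array(w0, dtype=float)
    best_c = cost(best_w)
    for _ in range(generations):
        cand = best_w + sigma * rng.standard_normal(best_w.size)
        c = cost(cand)
        if c < best_c:
            best_w, best_c = cand, c
    return best_w, best_c


w0 = np.log([1.0, 1.0, 1e-4])            # initial q1, q2, r
start_cost = figure_of_merit(w0)
best_w, best_cost = one_plus_one_es(figure_of_merit, w0)
```

Because the score is not itself quadratic, the tuned weights are not simply the starting weights; replacing one_plus_one_es with a CMA-ES implementation leaves figure_of_merit unchanged.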
Generate trajectories that optimize the sum of the squared magnitudes of the u vectors, using each of the three action parameterizations. The robot should start standing at rest, walk 10m in 10s, and come to rest standing. You need to decide what to optimize. You might want to look at this paper as a guideline for this part and the next.
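To see what minimizing the sum of squared u implies, it helps to solve a stripped-down version first. The sketch below treats the robot as an 80 kg point mass pushed by a horizontal force, ignoring body pitch, leg kinematics, and footsteps, and finds the force sequence with the smallest sum of squares that moves it 10 m in 10 s from rest to rest. The final state is linear in the forces, so the answer is the minimum-norm solution of a 2-by-n linear system. The discretization and function names are assumptions for illustration.

```python
def min_effort_forces(mass=80.0, distance=10.0, duration=10.0, n=1000):
    """Least-squares force sequence for a rest-to-rest move of a point mass."""
    dt = duration / n
    # From rest, with x[k+1] = x[k] + v[k]*dt and v[k+1] = v[k] + u[k]*dt/mass,
    # the final state is linear in u:
    #   x[n] = sum(gx[k] * u[k]),  v[n] = sum(gv[k] * u[k]).
    gx = [dt * dt / mass * (n - 1 - k) for k in range(n)]
    gv = [dt / mass] * n
    # Minimum-norm solution of G u = [distance, 0]: u = G^T (G G^T)^-1 b.
    a11 = sum(g * g for g in gx)
    a12 = sum(p * q for p, q in zip(gx, gv))
    a22 = sum(g * g for g in gv)
    det = a11 * a22 - a12 * a12
    lam_x = a22 * distance / det
    lam_v = -a12 * distance / det
    return [lam_x * p + lam_v * q for p, q in zip(gx, gv)], dt


def simulate(forces, dt, mass=80.0):
    """Roll the point mass forward with the same Euler update."""
    x, v = 0.0, 0.0
    for u in forces:
        x, v = x + v * dt, v + u * dt / mass
    return x, v


forces, dt = min_effort_forces()
x_final, v_final = simulate(forces, dt)
```

The resulting force is a straight line in time, pushing forward in the first half and braking in the second. The full problem adds pitch dynamics and foot placement, which is why you need to decide what to optimize.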
Generate optimal controllers that follow the footstep trajectories you generated in part 2, using each of the three action representations. You need to decide what to optimize, but the robot should not fall down, and it should use small forces so we can run it off a battery.
(A) First do this by linearizing about just the (x_hip, y_hip, theta) = (0, 1, 0) point. Use all the formulations for doing this, and compare performance.
(B) Now use DDP to do control. How does the performance differ from part A?
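The backward and forward passes of a DDP-style method can be sketched compactly. The code below is iLQR, the Gauss-Newton variant of DDP that drops the second derivatives of the dynamics, applied to the ground-reaction-force model with an explicit-Euler step. The horizon, cost weights, initial perturbation, and step-size list are placeholder assumptions; it regulates to the standing point rather than tracking a footstep trajectory.

```python
import numpy as np

M, I_COM, G, DT, N = 80.0, 5.0, 9.81, 0.01, 100
X_REF = np.array([0.0, 1.1, 0.0, 0.0, 0.0, 0.0])   # standing state
U_REF = np.array([0.0, M * G])                     # standing forces
Q, R, QF = DT * np.eye(6), DT * 1e-4 * np.eye(2), 100.0 * np.eye(6)


def step(x, u):
    """One Euler step; x = [x_com, y_com, theta, x_comd, y_comd, thetad]."""
    thdd = (u[0] * x[1] - u[1] * x[0]) / I_COM
    return x + DT * np.array([x[3], x[4], x[5], u[0] / M, u[1] / M - G, thdd])


def jacobians(x, u):
    """Analytic Jacobians of step() with respect to x and u."""
    A = np.eye(6)
    A[0, 3] = A[1, 4] = A[2, 5] = DT
    A[5, 0], A[5, 1] = -DT * u[1] / I_COM, DT * u[0] / I_COM
    B = np.zeros((6, 2))
    B[3, 0] = B[4, 1] = DT / M
    B[5, 0], B[5, 1] = DT * x[1] / I_COM, -DT * x[0] / I_COM
    return A, B


def rollout(x0, us, ks=None, Ks=None, xs_ref=None, alpha=1.0):
    """Simulate, optionally applying the iLQR update; return xs, us, cost."""
    xs, new_us, cost = [x0], [], 0.0
    for t in range(N):
        u = us[t]
        if ks is not None:
            u = u + alpha * ks[t] + Ks[t] @ (xs[t] - xs_ref[t])
        dx, du = xs[t] - X_REF, u - U_REF
        cost += 0.5 * (dx @ Q @ dx + du @ R @ du)
        new_us.append(u)
        xs.append(step(xs[t], u))
    dx = xs[-1] - X_REF
    return np.array(xs), np.array(new_us), cost + 0.5 * dx @ QF @ dx


def backward_pass(xs, us):
    """Propagate the quadratic value model backward; return k and K terms."""
    Vx, Vxx = QF @ (xs[-1] - X_REF), QF.copy()
    ks, Ks = [None] * N, [None] * N
    for t in reversed(range(N)):
        A, B = jacobians(xs[t], us[t])
        Qx = Q @ (xs[t] - X_REF) + A.T @ Vx
        Qu = R @ (us[t] - U_REF) + B.T @ Vx
        Qxx, Quu, Qux = Q + A.T @ Vxx @ A, R + B.T @ Vxx @ B, B.T @ Vxx @ A
        ks[t] = -np.linalg.solve(Quu, Qu)
        Ks[t] = -np.linalg.solve(Quu, Qux)
        Vx = Qx + Ks[t].T @ Quu @ ks[t] + Ks[t].T @ Qu + Qux.T @ ks[t]
        Vxx = Qxx + Ks[t].T @ Quu @ Ks[t] + Ks[t].T @ Qux + Qux.T @ Ks[t]
        Vxx = 0.5 * (Vxx + Vxx.T)
    return ks, Ks


def ilqr(x0, iterations=20):
    """Alternate backward passes and line-searched forward passes."""
    xs, us, cost = rollout(x0, np.tile(U_REF, (N, 1)))
    history = [cost]
    for _ in range(iterations):
        ks, Ks = backward_pass(xs, us)
        for alpha in (1.0, 0.5, 0.25, 0.1, 0.05, 0.01):
            xs_new, us_new, cost_new = rollout(x0, us, ks, Ks, xs, alpha)
            if cost_new < cost:          # accept only improving steps
                xs, us, cost = xs_new, us_new, cost_new
                break
        history.append(cost)
    return xs, us, history


x_start = X_REF + np.array([0.05, 0.0, 0.0, 0.0, 0.0, 0.0])  # 5 cm x offset
xs, us, history = ilqr(x_start)
```

Unlike the single linearization in part (A), the Jacobians here are re-evaluated along the current trajectory at every iteration, and the time-varying gains Ks come out of the same backward pass.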
Bonus: Use some form of policy optimization to do Parts 2 and 3.
You can work in groups or alone. Generate a web page describing what you did (one per group). Include links to your source and any compiled code in either .zip, .tar, or .tar.gz format. Be sure to list the names of all the members of your group. Mail the URL of your web page to cga@cmu.xxx and arai@andrew.xxx. [You complete the address, we are trying to avoid spam.] The writeup is more important than the code. What did you do? Why did it work? What didn’t work and why?