CMU RI 16-745: Dynamic Optimization: Assignment 2


Educational Goals

The goal of this assignment is to explore LQR, DDP, and nonlinear optimal control design. More importantly, it explores the fundamental role of problem formulation. In this assignment we assume we have a perfect model that is expensive to evaluate, and full measurement of the relevant quantities (positions, velocities, and forces). Future assignments will explore the effect of imperfect measurements and imperfect models.


The Task

The task is to design a policy/control law for a humanoid to balance on a board that is itself on top of a rolling cylinder (see the linked videos). The board is called a bongo board or balance board, and the task is also known as Rola Bola or Rolla Bolla. A historic survey of bongo boards is linked as well.

Why is this task interesting?

- It has the non-minimum phase characteristics of inverted pendulum and cart-pole systems (of which BallBots are an example).
- There are enough degrees of freedom that problem formulation becomes interesting.
- People can do it, and it is impressive.
- It is a challenging task for humanoid robots (not yet solved, to my knowledge).
- You can learn to do the task yourself, and use the insight you gain to guide your policy design (and possibly the design of learning algorithms).
- Unlike most other tasks involving legs, it does not involve intermittent contact (as in walking).
- It potentially engages all the human balance strategies (ankle strategy, hip strategy, righting reflex, squatting (control of body height), using the arms to grasp supports, ...).
- It takes time for humans to learn, so human learning can be studied.
- It is a challenging problem for robot learning (including learning from demonstration).
- It can be extended and made more difficult by changing the characteristics of the ground (friction), the board (usually its thickness), the amount the roller is allowed to move (putting the roller on a pedestal), and by using multiple rollers with pedestals in between them.

What is the optimization criterion for this task? Like most tasks, it is not obvious how to define what should be optimized. Falling down is bad: the board can't hit the ground or run off the roller. The humanoid can't hit the floor (or even bend over too much). The knees can't bend backwards. The contact forces must be positive (or contact is lost). Friction cones should not be violated (no slipping contacts). The maximum actuator torques cannot be exceeded. I provide a subroutine crash() that tests whether any of these conditions are violated.

In general, smaller actuator commands are better. This means we can use smaller motors, and if the humanoid is biological, it will get less tired.

Often in a task it is useful to define an optimization criterion not because it is really what you want to optimize, but because it is useful for computing or learning a policy. This is referred to as shaping. It is up to you to define specific optimization criteria for this task.
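As a purely illustrative example (none of these names or weights come from the starter code; x_eq, Q, and R are placeholders you would choose yourself), a shaped one-step criterion in Matlab might look like:

% One possible shaped one-step cost, purely as an illustration.
% x_eq is the balance equilibrium; Q and R are weighting matrices you tune.
% Penalizing torques (u'*R*u) encodes the "smaller actuator commands" goal.
function c = one_step_cost(x, u, x_eq, Q, R)
dx = x - x_eq;
c = dx' * Q * dx + u' * R * u;
end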


Starter Code

I give you sample C code for the dynamics. Matlab is useful for doing the LQR control design, and for plotting data. Feel free to translate the C code into other languages (AMPL, Matlab, Simulink, ...) and use other code (GNU Scientific Library, ...). If you translate the sample code into some other language (say translating dynamics.c into Matlab-speak), please give a copy of that (not your solution) to us (the TA) so everyone can use it.

(The x in bongox.zip is the version number.)

Here is a Matlab version of the bongo program, compliments of Ji Zhang (jizhang03@gmail.com), who translated the C code into Matlab. simulate.m runs the same way as simulate.c and saves a data file "d00001" in the same folder. The data file can be loaded for animation directly.

Hmmm, this C code is slow! Yes, it is expensive to do an accurate simulation of this task that also gives you the constraint forces. Ways to speed things up:


What to do?

1: Try to manually design a control law for this task. The point of this question is to give you some appreciation of the difficulty of manually designing multidimensional controllers. However, I do have to note that people can learn to do this task themselves, which means they are clearly designing controllers for the task somehow. How do they do it? Go ahead and try to do the task yourself.
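Just to fix ideas, question 1 is asking for something of the following shape. This is an entirely made-up Matlab sketch: the joint indexing, the gains, and the board-angle coupling term are placeholders, and it is certainly not a working controller.

% Hypothetical hand-designed law: PD servo each joint toward its
% equilibrium angle, plus one coupling term that leans against board tilt.
function u = hand_designed_policy(q, qd, q_des, Kp, Kd, k_board, board_angle)
u = -Kp .* (q - q_des) - Kd .* qd;      % basic per-joint PD
% Placeholder coupling: extra torque at joint 1 (say, an ankle) against
% board tilt. Discovering which couplings matter is the hard part.
u(1) = u(1) + k_board * board_angle;
end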

2: Try to design an LQR control law with the largest basin of attraction possible. See notes.txt in the bongox directory for an example of using Matlab to do LQR design, and linearize_dynamics.c for an example of numerically linearizing the dynamics at the equilibrium point. You can vary Q and R, the equilibrium point (how much the knees are bent), and the states you use. I arbitrarily chose knee angles as part of the state. I could just as well have chosen leg lengths, an x,y location of the hip relative to the board, or an x,y location of the hip relative to the roller and true vertical. The torso angle is with respect to true vertical; it could be relative to the board angle, or some other orientation. I could have used the center of mass location relative to the roller as part of the state, as that is a key quantity. The control signals (currently joint torques) could be linearly or nonlinearly transformed as well.
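As a concrete starting point, here is a minimal Matlab sketch of the design step, assuming you have exported the numerically linearized A and B (e.g., from linearize_dynamics.c) into a file. The file name, weights, and dimensions below are assumptions, not values from the starter code.

load linear_model                        % hypothetical .mat file containing A (10x10) and B
Q = diag([10*ones(1,5), ones(1,5)]);     % arbitrary: weight position-like states more than velocities
R = 0.1 * eye(size(B, 2));               % larger R penalizes torques more, giving smaller gains
K = lqr(A, B, Q, R);                     % continuous time; use dlqr if your model is discrete
% The resulting regulator about the equilibrium (x_eq, u_eq) is
%   u = u_eq - K * (x - x_eq);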

It is potentially expensive to estimate the size of the basin of attraction in the state space (which is 10 dimensional). In theory a simulation must be run for an adequate amount of time for each candidate initial state. The grid-based approach, where you estimate the volume of initial states that converge to the goal by counting the grid cells that do so, is challenging since the grid is 10 dimensional. An approach that randomly samples the state space and estimates the volume might be more useful. A crude measure of volume would do line searches in both directions along each coordinate axis starting from a known working initial state (an axis-aligned orthogonal search). The next step up would maintain orthogonal line searches, but allow flexibility in the orientation of the coordinate system. The next step up would try to fit an ellipsoid to the boundaries detected by many line searches in arbitrary directions.
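Here is a sketch of the crudest measure (the axis-aligned line searches), assuming a hypothetical predicate converges(x0) that runs your simulation from x0 and reports whether the controller stabilizes the system without crash() firing:

% Search outward along + and - each coordinate axis from a known working
% initial state x_good, recording the largest perturbation that still converges.
function r = basin_extents(x_good, converges, step, r_max)
n = numel(x_good);
r = zeros(n, 2);                  % column 1: +axis extent, column 2: -axis extent
for i = 1:n
  for s = 1:2
    sgn = 3 - 2*s;                % +1 for s = 1, -1 for s = 2
    a = 0;
    while a + step <= r_max
      x0 = x_good;
      x0(i) = x0(i) + sgn * (a + step);
      if ~converges(x0), break; end
      a = a + step;
    end
    r(i, s) = a;
  end
end
end

A crude volume estimate is then prod(r(:,1) + r(:,2)); the fancier measures above replace the axis-aligned directions with arbitrary ones and fit an ellipsoid to the detected boundary points.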

3: Design a nonlinear policy with the largest basin of attraction possible using the method of your choice. Options include Dynamic Programming, an optimal trajectory library, MPC/RHC (Model Predictive Control / Receding Horizon Control), policy optimization, ...
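As one illustration of the trajectory-library option, a policy could store states from optimized trajectories together with their controls and local feedback gains, and act like the nearest stored point at run time. Everything below (the lib fields, the use of per-point gains such as those from DDP backups) is a hypothetical sketch, not starter-code API.

% lib.X is n-by-m (m stored states), lib.U the matching torques, and
% lib.K a cell array of local feedback gains, one per stored state.
function u = library_policy(x, lib)
d2 = sum((lib.X - x).^2, 1);      % squared distances (scale your states first!)
[~, j] = min(d2);                 % nearest stored state
u = lib.U(:, j) - lib.K{j} * (x - lib.X(:, j));
end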

4: Design a controller that can move to having the roller almost under one foot and hold there while balancing.

5: EXTRA CREDIT: Design a controller that can start from a position where one end of the board is at floor level and the roller is almost under the opposite foot. This is the position humans often start from. You can add constraint forces to the dynamics to keep the board from going through the floor, so this start position is statically stable.

6: EXTRA CREDIT: What policy will be easiest for a robot? What formulations give the lowest K (sum of the squares of all the elements of the K matrix)? Minimizing K by redefining states needs to take into account any scaling of the magnitudes of the states, since K can be made arbitrarily small by scaling the states up. What formulations give a decoupled K (K as close to "double" diagonal/modal as possible)? A decoupled K can be handled by separate controllers. What control designs allow the "slowest" reaction (I suppose this means the approach that can tolerate the largest feedback delay)? What formulations can make use of the foot forces, or other "features" that are available but might not be part of the state? What formulations allow K to be specified with the least numerical precision (often two high gains need to be specified very precisely to get a desired low gain on the difference of the corresponding two states)? What formulations allow the model to be specified with the least precision (whatever that means)? What formulations allow the optimization criterion to be specified with the least precision?
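Two of these measures are easy to compute once you have a gain matrix K. The "double diagonal" mask below encodes my assumption that the state stacks positions on top of velocities and that there is one torque per position; adapt it to your own formulation.

gain_size = sum(K(:).^2);                % sum of squares of all elements of K
m = size(K, 1);                          % number of torques
mask = logical([eye(m), eye(m)]);        % torque i uses only position i and velocity i
coupling = sum(K(~mask).^2) / gain_size; % fraction of gain energy off that pattern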


What to turn in?

You can use any type of computer/OS/language you want. You can work in groups or alone. Generate a web page describing what you did (one per group). Include links to your source and any compiled code in either .zip, .tar, or .tar.gz format. Be sure to list the names of all the members of your group. Mail the URL of your web page to cga@cmu.xxx and xxinjile@andrew.cmu.xxx [You complete the address, we are trying to avoid spam.] The writeup is more important than the code. What did you do? Why did it work? What didn't work and why?


Questions


If I want to use LQR for the nonlinear control, should I linearize the system and compute the gain matrix K every few steps? I am wondering how to linearize the system at a point that is not a stable equilibrium to get the linear equation x_{k+1} = A x_k + B u_k.

If you are going to linearize at multiple points which are not equilibrium points, use DDP (Differential Dynamic Programming). LQR is only appropriate for a regulator design that linearizes about an equilibrium point. For DDP you can linearize a system in motion. Given a function:
x1 = F( x0, u0 )
you can linearize as
(x_out - x1) = A*(x_in - x0) + B*(u - u0)
or equivalently
(x_out - x1) = A*(x_in - x0) + B*u + c
where the constant c = -B*u0 simply absorbs the control offset. For DDP the linearization point does not need to be an equilibrium.
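For concreteness, here is a minimal Matlab sketch of computing A and B by centered finite differences about such a point. F is a handle to your (hypothetical) one-step simulation function, and delta is a perturbation size you must tune.

% Linearize the discrete-time step x1 = F(x0, u0) about (x0, u0),
% which need not be an equilibrium.
function [A, B, x1] = linearize_step(F, x0, u0, delta)
n = numel(x0); m = numel(u0);
x1 = F(x0, u0);                   % nominal next state
A = zeros(n, n); B = zeros(n, m);
for i = 1:n
  dx = zeros(n, 1); dx(i) = delta;
  A(:, i) = (F(x0 + dx, u0) - F(x0 - dx, u0)) / (2*delta);
end
for i = 1:m
  du = zeros(m, 1); du(i) = delta;
  B(:, i) = (F(x0, u0 + du) - F(x0, u0 - du)) / (2*delta);
end
end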


Simulate or Animate dies and I get no information.

Run these programs from a terminal window (or DOS window or command window or whatever it is called). That way you can see error messages and they persist after the program dies.


I have a 64-bit version of Windows or Linux and the animation program does not work.

You need 64-bit versions of the drawstuff libraries. I don't have access to a 64-bit Windows machine, so I can't make them. Try to get them from someone else in the class, or install ODE (www.ode.org) with the drawstuff libraries and then copy the files. For Linux: a working copy of bongo2 for 64-bit Linux, compliments of Bryan Hood (bhood@cs.cmu.edu).


I get an error message of the form
'simulate.exe': Loaded 'C:\Windows\SysWOW64\ntdll.dll', Cannot find or open the PDB file

One way to debug this kind of thing is to google the error message. It appears you are running into a new feature of Visual Studio that I don't see in my 2008 Express 32-bit or 2010 Express 32-bit versions. I believe you can just ignore the PDB messages. Check this out, or google the error message yourself to get more info.


We are using MATLAB/SimMechanics for our homework 2 solution. This is mostly going very nicely but we're having trouble comparing your code with ours. We'd like to be able to test your LQR gain matrix with our model as a sanity check. Could we maybe get a quick picture of how you've defined all of your local coordinate systems? Essentially, for all joints (including ball and board) we need to know what 0 is and what direction is positive.

X is to the viewer's right, and Y is up. (0,0) is on the ground in the center of the picture. The robot is facing the viewer, with its left hand pointing along positive X.

For the board, horizontal is zero, and the standard angle convention applies: counterclockwise from the viewer's point of view is positive.

In the Newton-Euler model, zero angles occur when the corresponding link is vertical. All link angles use the same convention: counterclockwise from the viewer's point of view is positive.

In the 10-state model, the knees measure how much each leg is bent. Positive knee angles point the knees away from the centerline. The torso angle is with respect to vertical, again with counterclockwise from the viewer's point of view positive.


I'm trying to implement something along the lines of a receding horizon controller or a model predictive controller which uses the dynamics as constraints, in a similar fashion to the two-link swingup problem. However, the current dynamics are written in Newton-Euler form, and I'm a bit confused about how to represent them as constraints. As far as I understand it, I want to get them into the form
Constraints = (Forces and Torques) - (Inertias and AngularInertias)
I could do this by just rearranging the dynamics equations as they are currently written, but I'm not sure exactly what's going on in the code. What do nr_a (which I assume stands for "numerical recipes A") and nr_b stand for? Am I going about this the right way?

This is a reasonable approach. The dynamics code works (in accelerations()) by:

1) filling matrix nr_a and vector nr_b with stuff (all the lines with
d->nr_a[row][z] = something;
d->nr_b[row] = something;
)

2) solving nr_a * unknown-vector = nr_b
ludcmp( d->nr_a_x2, N_X2_B, d->nr_indx_x2, &nr_d );
lubksb( d->nr_a_x2, N_X2_B, d->nr_indx_x2, d->nr_b_x2 );
The unknown vector is filled with the accelerations first, and then the forces.

You need to create acceleration and force variables (assignment 1 only created acceleration variables). Constraint equations come from each row of
0 = nr_a * vector - nr_b;
Part 1) would remain unchanged. Part 2) would be dropped.

nr does stand for numerical recipes, to remind me that its indexing starts with 1.
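In Matlab-speak, at each time step of your transcription you would end up with something like the sketch below, where A_ne, b_ne, and z are my names (not from the starter code) for the filled nr_a, the filled nr_b, and the stacked vector of acceleration and force decision variables:

% Equality constraints for the optimizer (e.g., the ceq output of an
% fmincon-style nonlinear constraint function): every row of the
% Newton-Euler system must balance, but nothing is solved explicitly.
ceq = A_ne * z - b_ne;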


For my simplified representation of the bongo board, I had to re-derive the dynamics. I got pretty tired of taking analytic derivatives, so I think I'm going to try to compute the matrices A and B in
x_dot = A x + B u
numerically, kind of like how you do it in linearize_dynamics.c.

From working out the matrix math on paper, it seems to me that if I change only the i-th component of my state vector by some dx_i and call my function
x_dot = get_derivatives(x, u)
and call it once more with
x_dot = get_derivatives(x + dx_i, u)
I'll get some measured change in each component of (column vector) x_dot. If I divide by the amount dx_i, this looks like it should give me the i-th column of the A matrix.

I've been reading your C code but don't fully understand what's going on; in particular, I can't really make sense of the factor of two you buried in the denominator:
"... a[j][i] = (next_state_1[j] - next_state_2[j])/(2*delta); ..."
Also, while I'd be comparing changes in x_dot, I think you're looking at changes in the state x itself?

Can you help me interpret how A is found numerically? Thanks!

1) I am doing a discrete-time model; you are doing a continuous-time model.

2) I used a centered difference:
state[i] = save_state[i] + delta;
(evaluate to get next_state_1)
state[i] = save_state[i] - delta;
(evaluate to get next_state_2)
So the state is perturbed by both + and - delta, a total separation of 2*delta, which is the denominator you are asking about.
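Translated to your continuous-time setting, a centered-difference estimate of A would look like this in Matlab (get_derivatives is the function named in your question; delta is something you have to tune):

% Estimate A = d(x_dot)/dx at (x, u) by centered finite differences.
function A = numerical_A(get_derivatives, x, u, delta)
n = numel(x);
A = zeros(n, n);
for i = 1:n
  dx = zeros(n, 1); dx(i) = delta;
  % perturb state i by +delta and -delta; their separation is 2*delta
  A(:, i) = (get_derivatives(x + dx, u) - get_derivatives(x - dx, u)) / (2*delta);
end
end

The only difference from the discrete-time version in the assignment code is what the evaluated function returns: x_dot here versus the next state there.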