CMU RI 16-745: Dynamic Optimization: Assignment 2


Educational Goals

The goal of this assignment is to explore LQR, DDP, and nonlinear optimal control design. More importantly, it explores the fundamental role of problem formulation. In this assignment we assume we have a perfect model that is expensive to evaluate, and full measurement of the relevant quantities (positions, velocities, and forces). Future assignments will explore the effect of imperfect measurements and imperfect models.


The Task

The task is to design a policy/control law for a humanoid to balance on a board that is itself on top of a rolling cylinder (see the linked videos). The board is called a bongo board or balance board, and the task is also known as Rola Bola or Rolla Bolla. A historic survey of bongo boards is linked as well.

Why is this task interesting?

- It has the non-minimum phase characteristics of inverted pendulum and cart-pole systems (of which BallBots are an example).
- There are enough degrees of freedom that problem formulation becomes interesting.
- People can do it, and it is impressive.
- It is a challenging task for humanoid robots (not yet solved, to my knowledge).
- You can learn to do the task yourself, and use the insight you gain to guide your policy design (and possibly the design of learning algorithms).
- Unlike most other tasks involving legs, it does not involve intermittent contact (as in walking).
- It potentially engages all the human balance strategies (ankle strategy, hip strategy, righting reflex, squatting (control of body height), using the arms to grasp supports, ...).
- It takes time for humans to learn, so human learning can be studied.
- It is a challenging problem for robot learning (including learning from demonstration).
- It can be extended and made more difficult by changing the characteristics of the ground (friction), the board (usually its thickness), the amount the roller is allowed to move (putting the roller on a pedestal), and by using multiple rollers with pedestals in between them.

What is the optimization criterion for this task? Like most tasks, it is not obvious how to define what should be optimized. Falling down is bad: the board can't hit the ground or run off the roller. The humanoid can't hit the floor (or even bend over too much). The knees can't bend backwards. The contact forces must be positive (or contact is lost). Friction cones should not be violated (no slipping contacts). The maximum actuator torques cannot be exceeded. I provide a subroutine crash() that tests whether any of these conditions are violated.

In general, smaller actuator commands are better. This means we can use smaller motors, and if the humanoid is biological, it will get less tired.

Often in a task it is useful to define an optimization criterion not because it is really what you want to optimize, but because it is useful for computing or learning a policy. This is referred to as shaping. It is up to you to define specific optimization criteria for this task.
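As a purely illustrative example (none of these names or weights come from the starter code; x_eq, Q, and R are placeholders you would choose yourself), a shaped one-step criterion in Matlab might look like:

% One possible shaped one-step cost, purely as an illustration.
% x_eq is the balance equilibrium; Q and R are weighting matrices you tune.
% Penalizing torques (u'*R*u) encodes the "smaller actuator commands" goal.
function c = one_step_cost(x, u, x_eq, Q, R)
dx = x - x_eq;
c = dx' * Q * dx + u' * R * u;
end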


Starter Code

I give you sample C code for the dynamics. Matlab is useful for doing the LQR control design, and for plotting data. Feel free to translate the C code into other languages (AMPL, Matlab, Simulink, ...) and use other code (GNU Scientific Library, ...). If you translate the sample code into some other language (say translating dynamics.c into Matlab-speak), please give a copy of that (not your solution) to us (the TA) so everyone can use it.

(The x in bongox.zip is the version number.)

Here is a Matlab version of the bongo program, compliments of Ji Zhang (jizhang03@gmail.com), who translated the C code into Matlab. simulate.m runs the same way as simulate.c and saves a data file "d00001" in the same folder. The data file can be loaded for animation directly.

Hmmm, this C code is slow! Yes, it is expensive to do an accurate simulation of this task that also gives you the constraint forces. Ways to speed things up:


What to do?

1: Try to manually design a control law for this task. The point of this question is to give you some appreciation of the difficulty of manually designing multidimensional controllers. However, I do have to note that people can learn to do this task themselves, which means they are clearly designing controllers for the task somehow. How do they do it? Go ahead and try to do the task yourself.
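Just to fix ideas, question 1 is asking for something of the following shape. This is an entirely made-up Matlab sketch: the joint indexing, the gains, and the board-angle coupling term are placeholders, and it is certainly not a working controller.

% Hypothetical hand-designed law: PD servo each joint toward its
% equilibrium angle, plus one coupling term that leans against board tilt.
function u = hand_designed_policy(q, qd, q_des, Kp, Kd, k_board, board_angle)
u = -Kp .* (q - q_des) - Kd .* qd;      % basic per-joint PD
% Placeholder coupling: extra torque at joint 1 (say, an ankle) against
% board tilt. Discovering which couplings matter is the hard part.
u(1) = u(1) + k_board * board_angle;
end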

2: Try to design an LQR control law with the largest basin of attraction possible. See notes.txt in the bongox directory for an example of using Matlab to do LQR design, and linearize_dynamics.c for an example of numerically linearizing the dynamics at the equilibrium point. You can vary Q and R, the equilibrium point (how much the knees are bent), and the states you use. I arbitrarily chose knee angles as part of the state. I could just as well have chosen leg lengths, an x,y location of the hip relative to the board, or an x,y location of the hip relative to the roller and true vertical. The torso angle is with respect to true vertical; it could be relative to the board angle, or some other orientation. I could have used the center of mass location relative to the roller as part of the state, as that is a key quantity. The control signals (currently joint torques) could be linearly or nonlinearly transformed as well.
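As a concrete starting point, here is a minimal Matlab sketch of the design step, assuming you have exported the numerically linearized A and B (e.g., from linearize_dynamics.c) into a file. The file name, weights, and dimensions below are assumptions, not values from the starter code.

load linear_model                        % hypothetical .mat file containing A (10x10) and B
Q = diag([10*ones(1,5), ones(1,5)]);     % arbitrary: weight position-like states more than velocities
R = 0.1 * eye(size(B, 2));               % larger R penalizes torques more, giving smaller gains
K = lqr(A, B, Q, R);                     % continuous time; use dlqr if your model is discrete
% The resulting regulator about the equilibrium (x_eq, u_eq) is
%   u = u_eq - K * (x - x_eq);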

It is potentially expensive to estimate the size of the basin of attraction in the state space (which is 10 dimensional). In theory a simulation must be run for an adequate amount of time for each candidate initial state. The grid-based approach, where you estimate the volume of initial states that converge to the goal by counting the grid cells that do so, is challenging since the grid is 10 dimensional. An approach that randomly samples the state space and estimates the volume might be more useful. A crude measure of volume would do line searches in both directions along each coordinate axis starting from a known working initial state (an axis-aligned orthogonal search). The next step up would maintain orthogonal line searches, but allow flexibility in the orientation of the coordinate system. The next step up would try to fit an ellipsoid to the boundaries detected by many line searches in arbitrary directions.
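Here is a sketch of the crudest measure (the axis-aligned line searches), assuming a hypothetical predicate converges(x0) that runs your simulation from x0 and reports whether the controller stabilizes the system without crash() firing:

% Search outward along + and - each coordinate axis from a known working
% initial state x_good, recording the largest perturbation that still converges.
function r = basin_extents(x_good, converges, step, r_max)
n = numel(x_good);
r = zeros(n, 2);                  % column 1: +axis extent, column 2: -axis extent
for i = 1:n
  for s = 1:2
    sgn = 3 - 2*s;                % +1 for s = 1, -1 for s = 2
    a = 0;
    while a + step <= r_max
      x0 = x_good;
      x0(i) = x0(i) + sgn * (a + step);
      if ~converges(x0), break; end
      a = a + step;
    end
    r(i, s) = a;
  end
end
end

A crude volume estimate is then prod(r(:,1) + r(:,2)); the fancier measures above replace the axis-aligned directions with arbitrary ones and fit an ellipsoid to the detected boundary points.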

3: Design a nonlinear policy with the largest basin of attraction possible using the method of your choice. Options include Dynamic Programming, an optimal trajectory library, MPC/RHC (Model Predictive Control / Receding Horizon Control), policy optimization, ...
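As one illustration of the trajectory-library option, a policy could store states from optimized trajectories together with their controls and local feedback gains, and act like the nearest stored point at run time. Everything below (the lib fields, the use of per-point gains such as those from DDP backups) is a hypothetical sketch, not starter-code API.

% lib.X is n-by-m (m stored states), lib.U the matching torques, and
% lib.K a cell array of local feedback gains, one per stored state.
function u = library_policy(x, lib)
d2 = sum((lib.X - x).^2, 1);      % squared distances (scale your states first!)
[~, j] = min(d2);                 % nearest stored state
u = lib.U(:, j) - lib.K{j} * (x - lib.X(:, j));
end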

4: Design a controller that can move to having the roller almost under one foot and hold there while balancing.

5: EXTRA CREDIT: Design a controller that can start from a position where one end of the board is at floor level and the roller is almost under the opposite foot. This is the position humans often start from. You can add constraint forces to the dynamics to keep the board from going through the floor, so this start position is statically stable.

6: EXTRA CREDIT: What policy will be easiest for a robot? What formulations give the lowest K (sum of the squares of all the elements of the K matrix)? Minimizing K by redefining states needs to take into account any scaling of the magnitudes of the states, since K can be made arbitrarily small by scaling the states up. What formulations give a decoupled K (K as close to "double" diagonal/modal as possible)? A decoupled K can be handled by separate controllers. What control designs allow the "slowest" reaction (I suppose this means the approach that can tolerate the largest feedback delay)? What formulations can make use of the foot forces, or other "features" that are available but might not be part of the state? What formulations allow K to be specified with the least numerical precision (often two high gains need to be specified very precisely to get a desired low gain on the difference of the corresponding two states)? What formulations allow the model to be specified with the least precision (whatever that means)? What formulations allow the optimization criterion to be specified with the least precision?
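Two of these measures are easy to compute once you have a gain matrix K. The "double diagonal" mask below encodes my assumption that the state stacks positions on top of velocities and that there is one torque per position; adapt it to your own formulation.

gain_size = sum(K(:).^2);                % sum of squares of all elements of K
m = size(K, 1);                          % number of torques
mask = logical([eye(m), eye(m)]);        % torque i uses only position i and velocity i
coupling = sum(K(~mask).^2) / gain_size; % fraction of gain energy off that pattern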


What to turn in?

You can use any type of computer/OS/language you want. You can work in groups or alone. Generate a web page describing what you did (one per group). Include links to your source and any compiled code in either .zip, .tar, or .tar.gz format. Be sure to list the names of all the members of your group. Mail the URL of your web page to cga@cmu.xxx and xxinjile@andrew.cmu.xxx [You complete the address, we are trying to avoid spam.] The writeup is more important than the code. What did you do? Why did it work? What didn't work and why?


Questions


If I want to use LQR for the nonlinear control, should I linearize the system and compute the gain matrix K every few steps? I am wondering how to linearize the system at a point that is not a stable equilibrium to get the linear equation x_{k+1} = A x_k + B u_k.

If you are going to linearize at multiple points which are not equilibrium points, use DDP (Differential Dynamic Programming). LQR is only appropriate for a regulator design that linearizes about an equilibrium point. For DDP you can linearize a system in motion. Given a function:
x1 = F( x0, u0 )
you can linearize as
(x_out - x1) = A*(x_in - x0) + B*(u - u0)
or equivalently
(x_out - x1) = A*(x_in - x0) + B*u + c
where the constant c = -B*u0 simply absorbs the control offset. For DDP the linearization point does not need to be an equilibrium.
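For concreteness, here is a minimal Matlab sketch of computing A and B by centered finite differences about such a point. F is a handle to your (hypothetical) one-step simulation function, and delta is a perturbation size you must tune.

% Linearize the discrete-time step x1 = F(x0, u0) about (x0, u0),
% which need not be an equilibrium.
function [A, B, x1] = linearize_step(F, x0, u0, delta)
n = numel(x0); m = numel(u0);
x1 = F(x0, u0);                   % nominal next state
A = zeros(n, n); B = zeros(n, m);
for i = 1:n
  dx = zeros(n, 1); dx(i) = delta;
  A(:, i) = (F(x0 + dx, u0) - F(x0 - dx, u0)) / (2*delta);
end
for i = 1:m
  du = zeros(m, 1); du(i) = delta;
  B(:, i) = (F(x0, u0 + du) - F(x0, u0 - du)) / (2*delta);
end
end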


Simulate or Animate dies and I get no information.

Run these programs from a terminal window (or DOS window or command window or whatever it is called). That way you can see error messages and they persist after the program dies.


I have a 64-bit version of Windows or Linux and the animation program does not work.

You need 64-bit versions of the drawstuff libraries. I don't have access to a 64-bit Windows machine, so I can't make them. Try to get them from someone else in the class, or install ODE (www.ode.org) with the drawstuff libraries and then copy the files. For Linux: a working copy of bongo2 for 64-bit Linux, compliments of Bryan Hood (bhood@cs.cmu.edu).


I get an error message of the form
'simulate.exe': Loaded 'C:\Windows\SysWOW64\ntdll.dll', Cannot find or open the PDB file

One way to debug this kind of thing is to google the error message. It appears you are running into a new feature of Visual Studio that I don't see in my 2008 Express 32-bit or 2010 Express 32-bit versions. I believe you can just ignore the PDB messages. Check this out, or google the error message yourself to get more info.


We are using MATLAB/SimMechanics for our homework 2 solution. This is mostly going very nicely but we're having trouble comparing your code with ours. We'd like to be able to test your LQR gain matrix with our model as a sanity check. Could we maybe get a quick picture of how you've defined all of your local coordinate systems? Essentially, for all joints (including ball and board) we need to know what 0 is and what direction is positive.

X is to the viewer's right, and Y is up. (0,0) is on the ground in the center of the picture. The robot is facing the viewer, with its left hand pointing along positive X.

For the board, horizontal is zero, and the standard angle convention applies: counterclockwise from the viewer's point of view is positive.

In the Newton-Euler model, zero angles occur when the corresponding link is vertical. All link angles use the same convention: counterclockwise from the viewer's point of view is positive.

In the 10-state model, the knees measure how much each leg is bent. Positive knee angles point the knees away from the centerline. The torso angle is with respect to vertical, again with counterclockwise from the viewer's point of view positive.


I'm trying to implement something along the lines of a receding horizon controller or a model predictive controller which uses the dynamics as constraints, in a similar fashion to the two-link swingup problem. However, the current dynamics are written in Newton-Euler form, and I'm a bit confused about how to represent them as constraints. As far as I understand it, I want to get them into the form
Constraints = (Forces and Torques) - (Inertias and AngularInertias)
I could do this by just rearranging the dynamics equations as they are currently written, but I'm not sure exactly what's going on in the code. What do nr_a (which I assume stands for "numerical recipes A") and nr_b stand for? Am I going about this the right way?

This is a reasonable approach. The dynamics code works (in accelerations()) by:

1) filling matrix nr_a and vector nr_b with stuff (all the lines with
d->nr_a[row][z] = something;
d->nr_b[row] = something;
)

2) solving nr_a * unknown-vector = nr_b
ludcmp( d->nr_a_x2, N_X2_B, d->nr_indx_x2, &nr_d );
lubksb( d->nr_a_x2, N_X2_B, d->nr_indx_x2, d->nr_b_x2 );
The unknown vector is filled with the accelerations first, and then the forces.

You need to create acceleration and force variables (assignment 1 only created acceleration variables). Constraint equations come from each row of
0 = nr_a * vector - nr_b;
Part 1) would remain unchanged. Part 2) would be dropped.

nr does stand for numerical recipes, to remind me that its indexing starts with 1.
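In Matlab-speak, at each time step of your transcription you would end up with something like the sketch below, where A_ne, b_ne, and z are my names (not from the starter code) for the filled nr_a, the filled nr_b, and the stacked vector of acceleration and force decision variables:

% Equality constraints for the optimizer (e.g., the ceq output of an
% fmincon-style nonlinear constraint function): every row of the
% Newton-Euler system must balance, but nothing is solved explicitly.
ceq = A_ne * z - b_ne;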


For my simplified representation of the bongo board, I had to re-derive the dynamics. I got pretty tired of taking analytic derivatives, so I think I'm going to try to compute the matrices A and B in
x_dot = A x + B u
numerically, kind of like how you do it in linearize_dynamics.c.

From working out the matrix math on paper, it seems to me that if I change only the i-th component of my state vector by some dx_i and call my function
x_dot = get_derivatives(x, u)
and call it once more with
x_dot = get_derivatives(x + dx_i, u)
I'll get some measured change in each component of (column vector) x_dot. If I divide by the amount dx_i, this looks like it should give me the i-th column of the A matrix.

I've been reading your C code but don't fully understand what's going on; in particular, I can't really make sense of the factor of two you buried in the denominator:
"... a[j][i] = (next_state_1[j] - next_state_2[j])/(2*delta); ..."
Also, while I'd be comparing changes in x_dot, I think you're looking at changes in the state x itself?

Can you help me interpret how A is found numerically? Thanks!

1) I am doing a discrete-time model; you are doing a continuous-time model.

2) I used a centered difference:
state[i] = save_state[i] + delta;
(evaluate to get next_state_1)
state[i] = save_state[i] - delta;
(evaluate to get next_state_2)
So the state is perturbed by both + and - delta, a total separation of 2*delta, which is the denominator you are asking about.
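Translated to your continuous-time setting, a centered-difference estimate of A would look like this in Matlab (get_derivatives is the function named in your question; delta is something you have to tune):

% Estimate A = d(x_dot)/dx at (x, u) by centered finite differences.
function A = numerical_A(get_derivatives, x, u, delta)
n = numel(x);
A = zeros(n, n);
for i = 1:n
  dx = zeros(n, 1); dx(i) = delta;
  % perturb state i by +delta and -delta; their separation is 2*delta
  A(:, i) = (get_derivatives(x + dx, u) - get_derivatives(x - dx, u)) / (2*delta);
end
end

The only difference from the discrete-time version in the assignment code is what the evaluated function returns: x_dot here versus the next state there.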