Here is Matlab code for a simple dual control example. There is no measurement noise, so the x(i) are known perfectly. The control gain b is not known perfectly but with variance var_b(i). There is process noise with variance var_w = 1. The first section below is the top level code, and the second file is a subroutine that computes the cost of an initial action.
Matlab code to plot cost of various initial actions on a 2 step problem:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % x(k+1) = x(k) + b(k)*u(k) + w % b(k+1) = b(k) % w is zero mean noise with variance var_w = 1 % x0 is first state x0 = -1; % nominal estimate of b remains the same throughout b = 1; % initial b parameter variance var_b0 = 1; % var_w: variance of process noise w var_w = 1 % u0 is first action N = 100; u0s(N) = 0; cs(N) = 0; for i = 1:N u0 = (i-1)/(N-1); c = cost( u0, x0, b, var_b0, var_w ); u0s(i) = u0; cs(i) = c; end plot( u0s, cs ); xlabel('u0') ylabel('cost') title('x0=-1,b=1,var_w=1') %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Matlab subroutine to compute cost of initial action
function c = cost( u0, x0, b, var_b0, var_w ) % c = cost( u0, x0, b, var_b0, var_w ): compute cost of initially taking % action u0 % x0 is the starting state, % b is the prior on the value of the b parameter, % var_b0 is the initial variance of b % x(k+1) = x(k) + b(k)*u(k) + w % b(k+1) = b(k) % w is zero mean noise with variance var_w % step indices are appended to variable names: % x1 = x(1), x2 = x(2), var_b1 = var_b(1), u1 = u(1), ... % variance of b after first step var_b1 = var_b0*var_w/(u0*u0*var_b0 + var_w); % expectation of x1 x1 = x0 + b*u0; % use cautious control on last step since further learning is not useful % since no further decisions will be made u1 = -b*x1/(var_b1 + b*b); % expectation of x2 x2 = x1 + b*u1; % Var(x1) var_x1 = var_b0*u0*u0 + var_w; % Var(x2) var_x2 = var_b1*u1*u1 + var_w; % E(x1^2) ex1_2 = var_x1 + x1*x1; % E(x2^2) ex2_2 = var_x2 + x2*x2; % cost which is sum of E(x1^2 + x2^2) c = ex1_2 + ex2_2;
In the figure, notice that the optimum is around 0.6. If this had been just a one step problem, the optimum u0 would have been 0.5 (using cautious control). Having more than one step makes it advantagous to use a bigger u0 to reduce the uncertainty about the parameter b.
Matlab code to plot optimal u versus other variables:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% clear uarray x0 = -1; b = 1; var_b0 = 1; var_w = 1; i = 1; for x0 = 0.0:-0.1:-2.0 j = 1; for var_b0 = 0.0: 0.1: 5.0 u = findbestu( x0, b, var_b0, var_w ); uarray(i,j) = u; j = j + 1; end i = i + 1; end figure(1) y = 0.0:-0.1:-2.0; x = 0.0: 0.1: 5.0; surf(x,y,uarray) view(-37.5+180,30) title('Optimal u0 when varying x0 and Var_b0') ylabel('x0') xlabel('Var(b)') %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Matlab subroutine to find best u
function u = findbestu( x0, b, var_b0, var_w ) % x0 is first state % b: nominal estimate of b remains the same throughout % var_b0: initial b parameter variance % var_w: variance of process noise % x(k+1) = x(k) + b(k)*u(k) + w % b(k+1) = b(k) % v is zero mean noise with variance var_w % u0 is first action % c = cost( u0, x0, b, var_b0 ) N = 1001; u0s(N) = 0; cs(N) = 0; u_best = 0; c_best = 1e30; for i = 1:N u0 = 10*(i-1)/(N-1); c = cost( u0, x0, b, var_b0, var_w ); if ( c < c_best ) u_best = u0; c_best = c; end u0s(i) = u0; cs(i) = c; end u = u_best;
Matlab code to plot u0 vs b and var_b0:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% clear uarray x0 = -1; b = 1; var_b0 = 1; var_w = 1; i = 1; for b = 0.1:0.1:2.0 j = 1; for var_b0 = 0: 0.1: 5.0 u = findbestu( x0, b, var_b0, var_w ); uarray(i,j) = u; j = j + 1; end i = i + 1; end figure(2) y = 0.1:0.1:2.0; x = 0: 0.1: 5.0; surf(x,y,uarray) title('Optimal u0 when varying b and Var(b)') ylabel('b') xlabel('Var(b)') view(-37.5+180,30) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Matlab code to plot u0 vs var_w and var_b0:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% clear uarray x0 = -1; b = 1; var_b0 = 1; var_w = 1; i = 1; for var_w = 0.1:0.1:2.5 j = 1; for var_b0 = 0: 0.1: 5.0 u = findbestu( x0, b, var_b0, var_w ); uarray(i,j) = u; j = j + 1; end i = i + 1; end figure(3) y = 0.1:0.1:2.5; x = 0: 0.1: 5.0; surf(x,y,uarray) title('Optimal u0 when varying var_w and Var(b)') ylabel('var_w') xlabel('Var(b)') view(-37.5+180,30) % var_w has no effect %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%