When linear generalization from the input features does not suffice to produce good performance, the cost of generating training data is likely to make STAGE inferior to non-learning optimization algorithms on any given problem instance. However, if the input features can be made suitably instance-independent, then the cost of training an evaluation function can be amortized over the whole family of applicable instances. This was exactly the tradeoff found to be profitable in [Zhang 1996], as discussed in Section 3.2.
To investigate how problem-dependent the learned evaluation functions are in channel routing, I re-ran experiment (G) on a suite of eight problems from the literature [Chao and Harper 1996]. Table 3 summarizes the results and gives the coefficients of the linear evaluation function learned (independently) for each problem.
Table 3: Results and learned coefficients on the benchmark suite of [Chao and Harper 1996].

| Problem instance | Lower bound | Best hillclimbing | Best STAGE | Learned coefficients <1, U, p, w> |
| --- | --- | --- | --- | --- |
| YK4 | 10 | 41 | 14 | <-7.55, -141.20, 7.94, 141.70> |
| HYC1 | 8 | 8 | 8 | <-0.19, -2.12, 0.86, 2.36> |
| HYC2 | 9 | 9 | 9 | <0.05, -2.78, 0.34, 2.66> |
| HYC3 | 11 | 14 | 12 | <-0.09, -5.06, 0.38, 5.02> |
| HYC4 | 20 | 49 | 30 | <-3.96, -49.62, 0.33, 49.14> |
| HYC5 | 35 | 47 | 41 | <-1.68, -7.30, 1.18, 7.08> |
| HYC6 | 50 | 75 | 59 | <-1.38, -9.59, 0.72, 9.48> |
| HYC7 | 39 | 87 | 83 | <-5.35, -49.26, -1.38, 48.03> |
| HYC8 | 21 | 55 | 33 | <-0.79, -32.96, 0.11, 32.78> |
The similarities among the learned evaluation functions are striking. Like the hand-tuned cost function C of [Wong et al. 1988] (Equation 7), all the STAGE-learned cost functions assigned a relatively small positive weight to feature p, except on instance HYC7 (the instance on which STAGE's learning helped least). Unlike function C, they all assigned a large negative weight to feature U. The behavior of a linear evaluation function is invariant under translation and positive rescaling of its coefficients, and after rescaling each vector so that the weight on w equals 1, the learned functions are nearly identical; this suggests that transfer between problem instances will be fruitful. As a first test of transfer, I ran STAGE on HYC7 using the value function learned from HYC6. This did indeed help, producing a solution of size 71 instead of 83.
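To make the similarity concrete, the following sketch rescales each coefficient vector from Table 3 so that the weight on w equals 1; the choice of normalization is mine, but since any positive rescaling leaves hillclimbing behavior unchanged, it is harmless. After rescaling, the weight on U lands within about 5% of -1 on every instance.

```python
# Rescale each learned coefficient vector from Table 3 so that the weight
# on w equals 1. Hillclimbing behavior is unchanged by positive rescaling,
# so vectors that agree after rescaling induce essentially the same search.
coeffs = {  # instance: (1, U, p, w) weights, copied from Table 3
    "YK4":  (-7.55, -141.20,  7.94, 141.70),
    "HYC1": (-0.19,   -2.12,  0.86,   2.36),
    "HYC2": ( 0.05,   -2.78,  0.34,   2.66),
    "HYC3": (-0.09,   -5.06,  0.38,   5.02),
    "HYC4": (-3.96,  -49.62,  0.33,  49.14),
    "HYC5": (-1.68,   -7.30,  1.18,   7.08),
    "HYC6": (-1.38,   -9.59,  0.72,   9.48),
    "HYC7": (-5.35,  -49.26, -1.38,  48.03),
    "HYC8": (-0.79,  -32.96,  0.11,  32.78),
}

for name, (bias, u, p, w) in coeffs.items():
    print(f"{name:5s} <{bias / w:+.3f}, {u / w:+.3f}, {p / w:+.3f}, +1.000>")
```

For example, YK4 rescales to approximately <-0.053, -0.997, +0.056, +1.000> and HYC6 to <-0.146, -1.012, +0.076, +1.000>: up to scale, these are very nearly the same function.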
The assignment of a negative coefficient to U is surprising, because U measures the sparsity of the horizontal tracks. U is strongly positively correlated with the objective function to be minimized, so a term of -U in the evaluation function ought to pull the search toward terrible solutions in which each subnet occupies its own track. However, the positive coefficient on w counterbalances this pull, and a proper balance between the two terms can be shown to bias the search toward solutions with an uneven distribution of track sparsities. Although unevenness is not itself the mark of a high-quality solution, it does help lead hillclimbing to high-quality solutions. STAGE successfully discovered and exploited this predictive combination of features.
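A toy calculation illustrates the effect. The sketch below assumes, purely for illustration, that U is a sum of squared per-track sparsities (so U is largest when slack is concentrated in a few tracks) and uses an evaluation of the form w - U; the paper's exact feature definitions and the learned magnitudes in Table 3 differ, but the signs match.

```python
# Toy illustration (the definitions of u_i and U here are assumptions for
# the sake of the example, not the paper's exact feature definitions):
#   w   = number of tracks in use
#   u_i = unused fraction of track i (that track's "sparsity")
#   U   = sum of u_i**2, which grows when slack concentrates in few tracks
# Evaluation with the learned signs (unit magnitudes, for illustration):
def evaluation(track_sparsity):
    w = len(track_sparsity)
    U = sum(u * u for u in track_sparsity)
    return w - U  # positive weight on w, negative weight on U

even   = [0.25, 0.25, 0.25, 0.25]   # 1.0 total slack, spread evenly
uneven = [0.97, 0.01, 0.01, 0.01]   # 1.0 total slack, concentrated

print(evaluation(even))    # 4 - 0.2500 = 3.7500
print(evaluation(uneven))  # 4 - 0.9412 = 3.0588  (lower, so preferred)
```

Both states occupy four tracks and contain the same total slack, but the evaluation prefers the concentrated one; from there, a single move that empties the nearly vacant track lets it be deleted, reducing w outright. Uneven sparsity is thus not itself quality, but it sits on hillclimbing's path to genuinely better solutions.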