To join consecutive candidate units from the clusters selected by the decision trees, we use an optimal coupling technique [4] to measure the concatenation cost between two units. This technique yields two results: the cost of a join and a position for the join. Allowing the join point to move is particularly important when our units are phones: the initial unit boundaries fall on phone-phone boundaries, which are probably the least stable part of the signal. Optimal coupling allows us to select more stable positions towards the center of the phone. In our implementation, if the previous phone in the database is of the same type as the selected phone, we use a search region that extends 60% into the previous phone; otherwise the search region is defined to be the phone boundaries of the current phone.
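The search described above can be illustrated with a minimal sketch. It is not the implementation used here: the function names, the frame-index representation of phone spans, and the choice to compare a single frame of the left-hand unit against each frame in the search region are all assumptions made for illustration. In particular, treating "the phone boundaries of the current phone" as the full span of the current phone is one reading of the text, and a plain unweighted Euclidean distance stands in for the join cost.

```python
import math

def search_region(prev_db_type, selected_type, prev_span, cur_span, extend=0.6):
    """Return (start, end) frame indices, end exclusive, where the join may fall.

    prev_span and cur_span are hypothetical (start, end) frame indices of the
    previous and current phones in the database recording.
    """
    prev_start, prev_end = prev_span
    cur_start, cur_end = cur_span
    if prev_db_type == selected_type:
        # Reach back 60% of the previous database phone's length (assumed
        # to be measured from the boundary between the two phones).
        reach = round(extend * (prev_end - prev_start))
        return cur_start - reach, cur_end
    # Otherwise stay within the current phone (assumed interpretation).
    return cur_start, cur_end

def optimal_join(left_frame, db_frames, region):
    """Return (cost, position): the lowest distance in the region and its index."""
    start, end = region
    best = min(range(start, end), key=lambda i: math.dist(left_frame, db_frames[i]))
    return math.dist(left_frame, db_frames[best]), best

# Toy data: ten one-dimensional frames; the previous phone covers frames 0-4.
db_frames = [[float(i)] for i in range(10)]
region = search_region("a", "a", (0, 5), (5, 10))
cost, position = optimal_join([3.2], db_frames, region)
```

With matching phone types the region reaches back into the previous phone, so the join can settle on a frame away from the original boundary. With mismatched types the same call is restricted to the current phone.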
Our actual measure of join cost is a frame-based Euclidean distance. The frame information includes F0, Mel frequency cepstrum coefficients, and power, together with their delta counterparts. Although this uses the same parameters as the acoustic measure used in clustering, it is now necessary to weight the F0 parameter to deter discontinuities in local F0, which can be particularly distracting in synthesized examples. Except for the delta features, this measure is similar to that used in [7].
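A weighted frame distance of this kind might be sketched as follows. The frame layout, the function name, and the weight values are hypothetical; the text does not give the weights actually used, only that the pitch parameter (F0) receives extra weight.

```python
import math

def weighted_frame_distance(frame_a, frame_b, weights):
    """Weighted Euclidean distance between two equal-length feature frames."""
    if not (len(frame_a) == len(frame_b) == len(weights)):
        raise ValueError("frames and weights must have the same length")
    return math.sqrt(
        sum(w * (a - b) ** 2 for a, b, w in zip(frame_a, frame_b, weights))
    )

# Hypothetical layout: [F0, one cepstral coefficient, power]; delta features
# would be appended in the same way. The F0 weight of 4.0 is illustrative only.
weights = [4.0, 1.0, 1.0]
left = [1.0, 0.5, 0.2]
right = [2.0, 0.5, 0.2]
cost = weighted_frame_distance(left, right, weights)
```

Here the two frames differ only in F0, and the larger weight doubles the cost relative to an unweighted distance, so joins with a pitch discontinuity are penalized more heavily than joins with an equal-sized spectral mismatch.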