Although the constant factors derived for the running time bound in Theorem 2.1 are superior to those proved by Upfal [29] (who achieved a bound of steps for d = 21), they are very far from what can be considered practical. Fortunately, the constant factors can be substantially improved if we allow each switch to process communications along each incident edge at the same time. In particular, we can handle 4d routing problems for the price of one by interleaving and pipelining the 4d problems in the standard way. (I.e., every edge will now be active at every step - edges with the jth color will be working on the ith problem during steps congruent to .) Similarly, we can partition a single problem into 4d batches, each of which can be handled simultaneously but independently. The number of waves in each batch is which means that we can simply reduce PL by a factor of 4d in the overall time bound. As long as PL > 4d, this means that we can route P permutations in
steps.
The most interesting application of this procedure is to routing permutations, for which the time bound is steps. For L=16, , , d = 11, , , , and , this gives an absolute bound of steps which is substantially better than the previous bound.
Recently, Leighton and Maggs [19] developed an alternative method for analyzing greedy routing algorithms on multibutterflies which allows for another order-of-magnitude improvement in the constant factors. In particular, they have shown that a simple greedy algorithm can routing a single permutation in at most steps using d = 10 and queues of size 2 at each edge. By increasing d further, the time bound can be decreased to about . Using a more complicated algorithm, and large d, the time bound can be decreased to nearly [2].
Even the best bounds for the constant factors are not very good, however. Fortunately, as we will see in Section 4, randomly-wired splitter networks (even those with d = 2) appear to perform much better in practice than would be indicated by the known theoretical upper bounds.