Routing is more complicated in the weak switch model. A problem
arises when several packets arrive on the same input bundle at the
same step. Since a switch can examine only one edge in the bundle, it
can forward only one of them, while the others must wait. The routing
algorithm must guarantee that the total delay experienced by each
packet is at most bit steps.
Our routing algorithm is a variant of Ranade's algorithm for routing
on the butterfly [18, 19, 23]. In
Ranade's algorithm, each packet is initially assigned an O(log N)-bit random rank, which it carries in front of its destination
header. The most important invariant maintained by the algorithm is
that the stream of packets leaving each switch is arranged in order of
non-decreasing rank. In order to keep the stream sorted, a switch
never sends a packet until it is sure that no other packets with
smaller rank will arrive later.
The problem with implementing Ranade's algorithm in the bit model is
that it takes too long to compare the ranks of two packets at the same
switch. In our algorithm, each packet is also assigned a random rank
initially, and that rank does not change throughout the course of the
routing. However, a packet does not carry its rank along with it.
Instead, we introduce a new type of packet called a token. A
token carries no information other than its type, and as a consequence
consists of only O(1) bits. The algorithm uses these tokens to
represent the ranks of the other packets. It maintains the invariant
that a packet of rank r passes through a bundle only after exactly
r tokens have passed through the bundle. This invariant guarantees
that the packets pass through the bundle in order sorted by rank. A
token can be considered to have a rank as well; a token of rank r
passes through a bundle only after exactly r tokens have passed
through the bundle. Note that the tokens are not the same as the
ghost packets used in Ranade's algorithm, which carry a rank. Our
algorithm does not use ghost packets.
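The invariant can be stated operationally: scanning the traffic on a bundle, the running count of tokens seen so far must equal the rank of each packet at the moment it appears. A minimal sketch (hypothetical encoding, not from the paper: a stream is a list of ("token",) and ("packet", rank) items):

```python
def respects_invariant(stream):
    """Check the rank invariant on one bundle's traffic: a packet of
    rank r may appear only after exactly r tokens have passed."""
    tokens_seen = 0
    for item in stream:
        if item[0] == "token":
            tokens_seen += 1
        elif item[1] != tokens_seen:   # ("packet", rank)
            return False
    return True

# A rank-1 packet must follow exactly one token.
ok = respects_invariant([("packet", 0), ("token",), ("packet", 1), ("token",)])
# Violation: a rank-0 packet arriving after a token.
bad = respects_invariant([("token",), ("packet", 0)])
```

Since the token count never decreases along the stream, any stream passing this check delivers its packets in non-decreasing rank order, which is the sortedness property claimed above.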
The mechanics of the algorithm are fairly simple. A switch maintains a pointer to an edge in each of its two input and two output bundles. We call these edges the current input and output edges, respectively. The behavior of the switch depends on whether the current input queues contain packets or tokens. There are four cases. If both of the current input edges have tokens in their queues, then the switch sends tokens on both of the current output edges and advances all four pointers. If one of the current input edges has a token in its queue and the other a packet, then the switch strips off the first bit of the packet's header, shunts the packet to the current edge of the output bundle specified by that bit, and advances the pointers for the input and output bundles used by the packet. If both of the current input edges have packets in their queues, then the switch forwards one of them, and advances the pointers for the bundles used by that one. If either of the current input edges has an empty queue, then the switch does nothing.
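The four cases can be sketched in code. This is an illustrative model, not the paper's implementation: each queue is a deque whose front stands for the current input or output edge, a packet is a ("packet", header_bits, payload) triple, and the tie-breaking rule when both inputs hold packets is left unspecified by the text, so the sketch arbitrarily favors the first input.

```python
from collections import deque

def switch_step(in0, in1, out0, out1):
    """One step of a weak switch, following the four cases: two tokens,
    token and packet, two packets, or an empty current input queue."""
    if not in0 or not in1:
        return False                  # case 4: an empty input stalls the switch
    tok0 = in0[0][0] == "token"
    tok1 = in1[0][0] == "token"
    if tok0 and tok1:                 # case 1: send tokens on both outputs
        in0.popleft(); in1.popleft()
        out0.append(("token",)); out1.append(("token",))
        return True
    # Cases 2 and 3: forward one packet.  Strip the first header bit and
    # use it to select the output bundle; a waiting token is not consumed.
    src = in1 if tok0 else in0
    _, header, payload = src.popleft()
    bit, rest = header[0], header[1:]
    (out1 if bit else out0).append(("packet", rest, payload))
    return True

# Example of case 2: a token on one input, a packet with header [1, 0]
# on the other.  The packet is shunted toward output 1; the token waits.
in0, in1 = deque([("token",)]), deque([("packet", [1, 0], "x")])
out0, out1 = deque(), deque()
switch_step(in0, in1, out0, out1)
```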
Initially, each packet p is assigned a random rank in the
range [0, w-1], where w will be specified
later. The input switches on level 0 send out a stream of packets
interspersed with tokens so that r tokens precede each packet with
rank r. A total of w tokens are sent on each bundle. The tokens
with rank w-1 follow all of the other packets and tokens in the
network. A routing phase terminates when every output has received a
rank w-1 token on both of its input bundles. Rank w-1 tokens can be
specially marked so that the outputs do not have to count the number
of tokens that they have received.
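The interspersing at the inputs can be sketched as follows (a hypothetical helper, assuming packets are given as (rank, payload) pairs with ranks in [0, w-1]):

```python
def input_stream(packets, w):
    """Build the stream sent on one input bundle: each rank-r packet is
    preceded by exactly r tokens, and w tokens are sent in total, so the
    final (rank w-1) token follows every packet."""
    stream, tokens_sent = [], 0
    for rank, payload in sorted(packets):
        while tokens_sent < rank:          # emit tokens up to this rank
            stream.append(("token",))
            tokens_sent += 1
        stream.append(("packet", rank, payload))
    while tokens_sent < w:                 # trailing tokens end the phase
        stream.append(("token",))
        tokens_sent += 1
    return stream

# Two packets of ranks 0 and 2, with w = 4 tokens in all.
s = input_stream([(2, "a"), (0, "b")], 4)
```

Sorting by rank before emitting is what lets a single pass place the right number of tokens in front of each packet.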
During the course of the algorithm, each edge is used by one
packet, by one token, or not at all. Because w tokens pass through
each bundle, we further dilate the butterfly by adding w
edges to each bundle. By using the same edges to send both packets
and tokens, it is possible to remove these extra w edges. However,
the algorithm and its analysis become slightly more complicated
because several tokens may be sent across the same edge, and each one
must wait to be sent until a space frees up in the queue at the end of
the edge.
The proof that the heads of the packets quickly reach their
destinations uses a delay sequence argument like that in
[18, 19, 23]. A k-delay
sequence consists of four components: a path backwards from an
output to an input, a sequence s_1, ..., s_k of (not necessarily distinct) switches
appearing on the path in the order listed, a sequence
p_1, ..., p_k of distinct packets, and a sequence
r_1, ..., r_k of non-increasing ranks. We say that a k-delay sequence
has occurred if packet p_i passes through s_i for 1 ≤ i ≤ k, and
rank(p_i) = r_i for 1 ≤ i ≤ k.
The following lemma is the crux of the delay sequence argument. It shows that if some packet is delayed, then a delay sequence must have occurred. For simplicity we analyze the first routing phase only.
Proof:
We will construct a delay sequence by spending lag
points. The lag of a packet on level l at time t is
t - l. If some packet arrives at its destination at step
t, then it has lag t - log N. We use these
points to build the sequence. For each
point spent, we will either insert a packet into the sequence, or
decrease the rank of the next packet in the sequence. Since there can
be at most w decreases in the rank, the sequence must contain at
least t - log N - w packets.
The construction of the delay sequence begins with the last packet to
arrive at its destination. Suppose a packet p arrives at its
destination d at time t. First, we insert p, d,
and rank(p) into the delay sequence. Then we follow p
back in time until it was last delayed. The lag remains t - log N at each
step on the path, until the step at which p failed to move
forward, where the lag drops to t - log N - 1.
If p was delayed at a switch because another packet q was
sent instead, then rank(q) ≤ rank(p), and we insert
q, the switch, and rank(q) into the sequence, and follow q back
in time. In this case, we have inserted one packet into the sequence
at a cost of one lag point. In the butterfly network, once the paths
of two packets diverge, they do not meet again. Thus, if q delays
p, then q remains ahead of p as long as their paths
intersect. As a consequence, no packet is inserted into the sequence
more than once.
If p was delayed because a token
was sent instead, then the rank of that token is less than
rank(p). In this case we do not insert anything
into the sequence, but instead follow the token back in time until it was
last delayed. Here we have lost one point in lag, but have
also decreased the rank of the next packet in the sequence by at least
one point.
The third possibility is that p was delayed because the current
edge on the other input bundle, c, had nothing to send. In this
case we do not insert anything into the sequence, but go back through
c to a switch s at the previous level at the previous time step.
At the previous time step, switch s must not have sent anything
through bundle c. This can happen for one of two reasons. Either
s sent a packet q on its other bundle, or one of the input
bundles c' to s had nothing to send. In the former case, we
insert q, s, and rank(q) into the
sequence, and follow q back until it was last delayed. In the
latter case, we go back through c' to a switch s' at the previous
level at the previous time step and continue. Continuing in this
fashion, we must eventually reach a switch that sent a packet.
Now that we have analyzed the ways in which a packet can be delayed,
we must consider the ways in which a token can be delayed. Suppose we
are following a token back in time until it was last delayed at
a switch s.
If the token was delayed because s sent a packet p instead, then we
insert p, s, and rank(p) into
the sequence.
Otherwise, if the token was delayed because s sent another token
instead, then that token has smaller rank, and we follow it back
until it was last delayed.
The last possibility is that the token was delayed because one of the
input bundles c to s had nothing to send. In this case, we go
back through c just as if a packet had been delayed for the same
reason.
The preceding case analysis shows that for each point of lag spent,
either a packet is inserted into the delay sequence, or the rank of
the next packet in the sequence decreases. Since the rank can decrease
at most w times, a delay of k steps forces the sequence to contain at
least k - w packets.
Proof:
To bound the probability that some packet is delayed for
k + w steps, we now need only to bound the probability that some
k-delay
sequence occurs. To do so, we enumerate all possible delay sequences,
and sum the probabilities that each occurs. The argument is similar
to the ones in [18, 19, 23]. There are
N^2 ways to choose the path, since there are N choices for
the output at which it starts, N choices for the input at which it
ends, and only one path between them. Once the path is fixed, there
are at most
C(log N + k, k) ways of choosing the k
switches s_1, ..., s_k
on the path. Let l_i denote
the level of switch s_i. Then there are at most
2^{l_1 + ... + l_k} ways to choose
the packets p_1, ..., p_k. Finally, there are at most
C(w + k - 1, k) ways of choosing a sequence
r_1, ..., r_k of non-increasing ranks in the range
[0, w-1].
A delay sequence occurs only if
p_i passes through s_i
and rank(p_i) = r_i, for
1 ≤ i ≤ k. For the delay
sequence constructed thus far, the probabilities that these two events
occur are at most 2^{-(l_1 + ... + l_k)} and
w^{-k}, respectively (the count of ways to choose the packets cancels
against the probability that each one passes through its switch). Thus,
the probability that any delay sequence occurs is at most

N^2 C(log N + k, k) C(w + k - 1, k) w^{-k}.
Using
the inequality C(a, b) ≤ (ea/b)^b for b ≤ a, it is
straightforward to show that for any
, there are constants
and
, such that for
, and
,
, the product is at most
.
The analysis for the second phase of routing is similar. If we
require the first phase to be completed before the second phase
begins, then each takes bit steps, and the total time is
bit steps. In fact, it is possible to pipeline the two
phases so that some packets begin the second phase before the first is
finished. The resulting algorithm is simpler and still requires only
bit steps. The analysis is a little more complicated
because a packet p can delay another packet q in the first
phase, and q can delay p in the second.
Once the head of a packet reaches its true destination, the M bits
of data following behind it trickle into the destination in the next M bit
steps. Since, with high probability, all of the heads reach their
destinations in
bit steps, the total time is also
with high probability.