If the paths are to be node-disjoint, then each path must avoid the
sources and sinks in the second and third parts as it passes from the
first part to the fourth part. To avoid these sources and sinks, we
declare them to be blocked. We then apply the technique of
[17] for tolerating faults in multibutterfly networks to
the second and third parts, treating blocked nodes as if they were
faulty. The technique of [17] can be summarized as
follows. First, any splitter (and all nodes that can be reached from
that splitter) that contains more than a fraction
of blocked inputs is erased, meaning that its nodes cannot be used
for routing, where
. Next, working backwards
from the outputs to the inputs, a node is declared to be blocked if
more than
of its up or down neighbors at the next level are
blocked (and not erased). (Note that it is not possible for all of a
node's up and down neighbors to be erased unless that node is also
erased.) Upon reaching the inputs of the network, all the blocked
nodes are erased. The switches that are not erased are said to be
working. The expansion property of the network of working switches is
reduced from
to
.
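As a rough illustration of the backward pass just described, the following sketch marks nodes blocked level by level and erases blocked inputs at the end. The graph representation, the ordering of levels, and the threshold fraction are hypothetical stand-ins; the actual fractions in [17] depend on the splitter expansion parameters, which are elided in this excerpt.

```python
def propagate_blocking(levels, up, down, blocked, erased, threshold):
    """Work backwards from the outputs toward the inputs, declaring a
    node blocked when more than a `threshold` fraction of its up
    neighbors, or of its down neighbors, at the next level are blocked
    (and not erased).  `levels` is ordered from the outputs back to the
    inputs; `up[v]` / `down[v]` list v's neighbors one level closer to
    the outputs.  The concrete threshold is a parameter here."""
    for level in levels[1:]:          # outputs themselves are not re-marked
        for v in level:
            for nbrs in (up[v], down[v]):
                bad = sum(1 for w in nbrs if blocked[w] and not erased[w])
                if nbrs and bad > threshold * len(nbrs):
                    blocked[v] = True
    # upon reaching the inputs, every blocked input is erased
    for v in levels[-1]:
        if blocked[v]:
            erased[v] = True
    return blocked, erased
```

The switches left unmarked by this pass are the working switches of the text.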
The following lemmas bound the number of inputs (on levels
and
) and outputs (on level 0) that are erased in the second
and third parts. Note that Lemma 5.1 bounds the
number of inputs that are erased, but are not themselves blocked.
(All the blocked inputs are erased.) Note also that since the two
parts share level 0, the number of erased nodes on that level may be
as large as twice the bound given in Lemma 5.2.
In both networks at least of the inputs and
of the
outputs are left working, where K is the number of sources (and
sinks). Suppose that
, where
is some
constant. By choosing
to be small, we can ensure that at
least K of the nodes on level 0 are not erased in either the
second or third parts. We call these K nodes the rendezvous
points. By making
(and hence d) large, we can also ensure
that the number of nodes on levels
and
that are
erased, but are not themselves sources or sinks, is
,
where
can be made to be an arbitrarily small constant.
The reconfiguration technique described in [17] requires
off-line computation to count the number of blocked inputs in each
splitter. In another paper, Goldberg, Maggs, and Plotkin
[12] describe a technique for reconfiguring a
multibutterfly on-line in word steps.
The next step is to mark some of the nodes in the first part as
blocked. We begin by declaring any node in the first part to be
reserved if it is a neighbor of a source in the second, third,
or fourth part via a cross edge. Now, working backwards from level
to
, a node is declared blocked if at
least
of its 2d neighbors at the next level are either
sources, sinks, blocked, reserved, or erased. We call a node that
is not a source or a sink, and is not reserved, blocked, or erased, a
working node.
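The marking of the first part differs from the earlier pass in that it counts one combined set of bad neighbors (sources, sinks, blocked, reserved, or erased) against an absolute threshold among the 2d next-level neighbors. A minimal sketch, with the elided threshold expression left as an explicit parameter:

```python
def mark_first_part(levels, nbrs, status, threshold_count):
    """Backward pass over the first part: a working node becomes blocked
    when at least `threshold_count` of its 2d neighbors at the next
    level are sources, sinks, blocked, reserved, or erased.
    `status[v]` is one of 'working', 'source', 'sink', 'reserved',
    'blocked', 'erased'.  The threshold in the text is an elided
    expression, so it is a parameter here."""
    bad_states = {'source', 'sink', 'blocked', 'reserved', 'erased'}
    for level in levels[1:]:            # work backwards, level by level
        for v in level:
            bad = sum(1 for w in nbrs[v] if status[w] in bad_states)
            if status[v] == 'working' and bad >= threshold_count:
                status[v] = 'blocked'
    return status
```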
Where did the bound on non-working neighbors come from? In
order to apply the routing algorithm of Section 4.3,
the subnetwork of working nodes must have an
unshared neighbor property. Let
be the largest value such
that the subnetwork of working nodes has an
expansion property (where
is the original expansion
property of the first part). To show that the subnetwork of working
nodes has an
unique neighbors property, we need
. If every working node has at most
non-working
neighbors, then the subnetwork of working nodes has expansion property
. (Recall that we multiply the
parameter
by 2 to get the actual expansion in each merger.) Thus
. If
, then
.
By restricting a working switch to have fewer non-working neighbors,
we could have reduced the required expansion from
down to
nearly
. As the following lemma shows, however, if a working
switch can have
non-working neighbors, then we also need
in order to ensure that there aren't too many blocked nodes.
If we were to allow a working switch to have fewer (or more) than
non-working neighbors, then one of the two ``
''
lower bounds would increase, and the network would require more
expansion.
An identical process is applied to the fourth part, with blocked nodes
propagating from level to level
, and a lemma
analogous to Lemma 5.3 can be proven, showing that
there are at most
blocked nodes in
this part.
Because each node in the first part may be reserved by one source in
each of the second, third, and fourth parts, it may not be possible
for all the sources to establish their paths. If several sources wish
to begin their paths at the same node, then one is locally and
arbitrarily selected to do so, and the others give up. Since at most
four paths start at any node in the first section, at least of
the sources are able to begin their paths. Each source then sends a
message to the corresponding sink. A message first routes across the
row of its source to level
(recall that in every merger
there is an edge from each input to the output in the same row), then
uses the multibutterfly store-and-forward packet routing algorithm
from
[17, 37] to route to the row of its sink on level 0, then
routes across that row in the third and fourth parts until it either
reaches its sink or reaches the cross edge to its sink. The entire
routing can be performed in
word steps. Note that we
can't use the circuit-switching algorithm of Section 4.3
here because there may be as many as
sinks in a single row.
The
or more sinks that receive messages then each pick one of
these messages (there are at most 4), and send an acknowledgement to
the source of that message. At least
sources receive
acknowledgements, and these sources are the ones that will establish
paths. A source that doesn't receive an acknowledgement gives up on
routing its path.
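Both rounds of local arbitration (at the contested starting nodes and at the sinks) follow the same pattern. Here is a minimal sketch of the sink-side round, with an arbitrary tie-breaking rule (smallest source id) standing in for whatever local choice a switch makes:

```python
def arbitrate(messages_at_sink):
    """messages_at_sink: dict mapping each sink to the list of source
    ids whose messages reached it (at most 4 per sink).  Each sink
    picks one message arbitrarily (here: smallest id) and sends an
    acknowledgement; sources holding an ack go on to establish paths,
    the rest give up."""
    acks = {}
    for sink, sources in messages_at_sink.items():
        if sources:
            acks[min(sources)] = sink   # arbitrary local choice
    return acks
```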
Some of the nodes at which the remaining sources and sinks wish to
begin or end their paths may have been declared blocked. None of
these nodes will be used. By making (and hence d) large,
however, the number of blocked nodes in the first and fourth parts,
, can be made small relative to
. Thus, we are left with
source-sink pairs.
The paths from the sources and the paths from the sinks are routed
independently through the first two and last two parts, respectively.
The path from a source then meets the path from the
corresponding sink
at a rendezvous point
on level 0.
The rendezvous points are selected as follows. First, the sources
route their paths to distinct nodes on level in
time using the algorithm from Section 4 on the
working switches. Then the sources are numbered according to the
order in which they appear on that level using a parallel prefix
computation. A parallel prefix computation can be performed in
word (or even bit) steps on an N-leaf complete binary
tree, and hence also on a butterfly. For a proof, see
[16, Section 1.2]. (Note that although we are treating
some of the nodes as if they were faulty, there are no actual faults
in the network, so non-working nodes can assist in performing prefix
computations.) The rendezvous points are also numbered according to
the order in which they appear on level 0 using another prefix
computation.
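The numbering step amounts to an exclusive prefix sum over indicator bits: a 1 in row i means a source (or rendezvous point) occupies that row, and the scan value at row i is its rank. The sketch below simulates the logarithmic-depth tree computation sequentially; the specific up-sweep/down-sweep schedule is an assumption here, since the excerpt only cites [16, Section 1.2].

```python
def exclusive_scan(flags):
    """Exclusive prefix sum via an up-sweep/down-sweep over an implicit
    binary tree -- the pattern an N-leaf complete binary tree (and
    hence a butterfly) performs in O(log N) steps.  Assumes len(flags)
    is a power of two."""
    n = len(flags)
    tree = list(flags)
    step = 1
    while step < n:                     # up-sweep: build partial sums
        for i in range(step * 2 - 1, n, step * 2):
            tree[i] += tree[i - step]
        step *= 2
    tree[n - 1] = 0                     # clear the root
    while step > 1:                     # down-sweep: distribute prefixes
        step //= 2
        for i in range(step * 2 - 1, n, step * 2):
            tree[i - step], tree[i] = tree[i], tree[i] + tree[i - step]
    return tree
```

For example, indicator bits [1, 0, 1, 1] scan to [0, 1, 1, 2], numbering the three marked rows 0, 1, and 2 in order of appearance.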
Next, using a packing operation, a packet representing the ith
rendezvous point is routed from
to the node in the ith
row of level 0. At the same time, a packet representing the ith
source
is routed from level
, where
's path has
reached so far, to the ith node of level 0. These two routings
can be implemented in
word (or bit) steps on a butterfly
[16, Section 3.4.3].
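The packing operation routes the i-th occupied row to row i (and unpacking inverts this map). A sketch of the destination map it computes, with the actual O(log N)-step butterfly routing of [16, Section 3.4.3] abstracted away:

```python
def pack(occupied):
    """Packing: given which rows of a level currently hold a packet,
    compute each packet's destination so that the i-th packet (in row
    order) lands in row i.  The ranks are exactly an exclusive prefix
    count of the occupied rows."""
    dest, rank = {}, 0
    for row, has_packet in enumerate(occupied):
        if has_packet:
            dest[row] = rank
            rank += 1
    return dest
```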
Once the packets for and
are paired up, a packet is sent
back to
's node on level
informing it of the position
of
on level 0. (This is an unpacking operation.) The path
for source
is then extended from level
to level 0
using the algorithm from Section 4.3 on the working
switches. Then a packet containing the location of
is sent from
's node on level
to the node on level 0 that is in
the same row that
lies on in the fourth part. This routing can
be performed in
word steps using the store-and-forward
multibutterfly routing algorithm of [17]. (We can't use
the circuit switching algorithm because there may be as many as
sinks in the same row.)
In time, the packet works its way across the row from level
0 to
, which lies somewhere between levels
and
. (Note that although there may be as many as
sinks in
the same row, the total time is still at most
.)
Finally, a path is extended from to any working node on level
and from there to
using the algorithm of
Section 4.3 on the working switches.