Theoretical Analysis

Next: Empirical Analysis Up: Comparison of Alternative Approaches Previous: Comparison of Alternative Approaches

Theoretical Analysis

In the literature we find theoretical analyses for the alternative approaches to one aspect of parallel search, namely task distribution. Kumar and Rao [19] provide an analysis of the speedup of distributed tree search, and Powley and Korf [25] provide an analysis of parallel window search. In this section we summarize these analyses with a unifying representation, and empirically compare the performance of the techniques using the derived equations.

These analyses assume that the average branching factor bremains constant throughout the search space and that the least-cost goal is located at a depth d. We also let b represent the heuristic branching factor, or the ratio of nodes generated during one iteration of IDA* to the number of nodes generated during the previous iteration of IDA*. Forcing the heuristic branching factor to be equal to the average branching allows the analysis to be the same as for incremental-deepening depth-first search.

For the distributed tree search analysis, we assume that a breadth-first expansion is used to generate enough nodes, n, to distribute one node to each of P processors. Since $n \; = \; b^x$ and $n \; \geq \; P$ , we can assume that the tree is expanded to depth x where $x \; \geq log_b P$ . We will assume a time of $2 b^x \approx 2P$ to perform the node distribution and to collect the solution results from each processor. The speedup of distributed tree search, measured as the run time of the serial algorithm divided by the run time of the parallel algorithm, can be computed here as the number of serial nodes generated (assuming a constant node expansion time) divided by the number of serial node expansions performed by the parallel algorithm. This is derived in the literature [19,35] as

$\begin{displaymath}S = P \left( \frac{b^{d} + b^{d-1} + b^{d-2} + \cdots + b^{2}... ...^{d-1} + b^{d-2} + \cdots + b^{x+1}} \right) + \frac{1}{2b^x}. \end{displaymath}$

(1)

As d increases, the leftmost fractional part of this equation approaches 1 and can be ignored. The 1/2b^x term contributes a minimal amount to the final value and can also be ignored. In this case, speedup approaches P, which represents linear speedup.

Figure 5 shows the performance of the distributed tree search algorithm based on these equations on a perfectly balanced tree and on a heavily imbalanced tree for P = 10, b = 3, and d = 10. In the imbalanced case, the size of the processors' search spaces varies as an exponential function where the first processor is assigned a majority of the work and the load decreases as the processor number increases. In this graph, the goal position ranges from the far left side of the tree (position = 0) to the far right side of the tree (position = 1). Performance of the search algorithm always peaks when the goal is on the far left side of a processor's portion of the search space. For the case of an imbalanced tree, much of the search space is assigned to a single processor, which increases the resulting amount of serial effort required.

**Figure 5:** Distributed Tree Search Contrasting Tree Balance
$\begin{figure}\medskip \centerline{\psfig{figure=figures/imbal_bal.ps,width=2.3in,height=2.0in}} \medskip \end{figure}$

We next consider the theoretical performance of the parallel window search algorithm. Recall that parallel window search operates by distributing the window sizes, or cost thresholds, to each available processor so each processor performs one iteration of IDA*. Since thresholds are not explored sequentially, the first solution found may not represent an optimal path. To ensure an optimal solution, all processors with a lower threshold must complete their current iteration of IDA*. In the worst case, this can make the performance of parallel window search equal to that of a serial version of IDA*.

In this analysis the assumption is made that a sufficient number of processors exists such that the goal iteration will start without delay. Speedup of parallel window search can be calculated as the ratio of the number of non-goal plus goal iteration nodes to the number of nodes generated by the processor performing the goal iteration. Powley and Korf generate this ratio using the notion of the left-to-right goal position a, defined as the fraction of the total number of nodes in the goal iteration that must be expanded before reaching the first goal node [25]. Speedup of parallel window search can thus be expressed as

$\begin{displaymath} S = \frac{b^{d-1}\left( \frac{b}{b-1} \right)^2 + ab^d\left(... ...d\left( \frac{b}{b-1} \right)} \;=\; 1 \;+\; \frac{1}{a(b-1)}. \end{displaymath}$

(2)

**Figure 6:** Distributed Tree Search vs. Parallel Window Search
$\begin{figure}\medskip \centerline{\psfig{figure=figures/dts_pws.ps,width=2.3in,height=2.0in}} \medskip \end{figure}$

Given this formula, we can empirically compare the performance of distributed tree search and parallel window search for P = 10, b = 6, and d = 10. From the graph in Figure 6 we can see that parallel window search will outperform distributed tree search only if the goal is located on the far left of the search space. We also observe that performance of distributed tree search peaks whenever the goal node is located on the far left of the subspace assigned to a particular processor.

Similar analyses have been provided to compare node ordering techniques and to determine the optimal number of clusters [10,25,35]. These analyses do indicate trends in the performance of alternative strategies based on features of the problem space, and can also be used to determine the theoretical performance of a particular technique for a given problem. However, the terms used to predict the performance in many of these analyses are not always measurable and many assumptions made are too constraining for real-world problems.

Next: Empirical Analysis Up: Comparison of Alternative Approaches Previous: Comparison of Alternative Approaches