We want to build multiple parallel implementations of the Quine-McCluskey boolean logic exact minimization algorithm. The goal of the project is to examine the tradeoffs among different parallel programming schemes.
The QM algorithm takes as input a boolean function of n variables and returns a minimized list of implicants which will implement the function. For example, if we are given the following function:
f = ¬a¬b¬c¬d + ¬ab¬c¬d + ¬a¬b¬cd + ¬ab¬cd + ab¬cd + ¬abcd + abcd + a¬bcd + abc¬d + a¬bc¬d
with the following Karnaugh map:
| ab\cd | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 00 | 1 | 1 | 0 | 0 |
| 01 | 1 | 1 | 1 | 0 |
| 11 | 0 | 1 | 1 | 1 |
| 10 | 0 | 0 | 1 | 1 |
Quine-McCluskey will tell us that the function can be implemented as: f = ¬a¬c + bd + ac
The algorithm is deceptively simple to run by hand, but implementing it efficiently is quite another matter, especially when considering the different ways of mapping it onto a parallel processor. The basic operation of the algorithm is this: construct a list of implicants; for a list of n implicants, perform Θ(n²) pairwise comparisons, producing a separate list of implicants; then repeat the operation on that new list. The number of lists is at most n, the number of input variables, since each pass replaces one more variable with a dash.
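As a sketch of the data structure and pairing test we have in mind (the type and helper names below are our own placeholders, not taken from any existing code), each implicant can be stored as a value/mask pair, where the mask records which positions have been reduced to a dash:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: 'value' holds the fixed bits (0 in dashed
 * positions), and a set bit in 'dashes' marks a position replaced by '-'. */
typedef struct {
    uint32_t value;   /* bit i = value of variable i, 0 where dashed       */
    uint32_t dashes;  /* bit i = 1 if variable i has been eliminated       */
    bool     used;    /* set once this implicant has been combined (the +) */
} implicant_t;

/* Two implicants can be combined iff they have the same dash pattern and
 * their fixed bits differ in exactly one position. */
static bool can_combine(implicant_t a, implicant_t b)
{
    if (a.dashes != b.dashes)
        return false;
    uint32_t diff = a.value ^ b.value;
    return diff != 0 && (diff & (diff - 1)) == 0;   /* exactly one bit set */
}

/* Merge a pair, turning the differing bit into a dash. */
static implicant_t combine(implicant_t a, implicant_t b)
{
    uint32_t diff = a.value ^ b.value;
    implicant_t r = { a.value & ~diff, a.dashes | diff, false };
    return r;
}
```

Keeping the dash positions in a bit mask makes the pairing test a handful of word operations, which matters since it sits inside the Θ(n²) inner loop.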
In more detail, the algorithm works as follows: First, generate a list of implicants. In our example from above, this is:
0000 0100 0001 0101 1101 0111 1111 1011 1110 1010
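Under the representation sketched above, building this initial list is trivial: each minterm simply becomes a dash-free implicant (from_minterm is again a name of our own invention):

```c
/* Build a dash-free implicant from a minterm, e.g. binary 0100 for ¬a b ¬c ¬d. */
static implicant_t from_minterm(uint32_t minterm)
{
    implicant_t r = { minterm, 0, false };
    return r;
}
```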
Next, find all pairs of implicants which differ in exactly one bit position. Compose a separate list of these, with one new implicant for each such pair, in which the bit where they differ is replaced by a -; keep track of the implicants that were used, marked with a + below:
| List 1 (minterms) | List 2 |
|---|---|
| 0000+ | 0-00 |
| 0100+ | 000- |
| 0001+ | 010- |
| 0101+ | 0-01 |
| 1101+ | -101 |
| 0111+ | 01-1 |
| 1111+ | 11-1 |
| 1011+ | -111 |
| 1110+ | 1-11 |
| 1010+ | 111- |
|       | 101- |
|       | 1-10 |
Note that implicants which have already been used can still be used in other pairs; this is essential to the correctness of the algorithm, and corresponds to overlapping implicants in a Karnaugh map. The next step is to generate another implicant list:
| List 1 (minterms) | List 2 | List 3 |
|---|---|---|
| 0000+ | 0-00+ | 0-0- |
| 0100+ | 000-+ | 0-0- |
| 0001+ | 010-+ | -1-1 |
| 0101+ | 0-01+ | -1-1 |
| 1101+ | -101+ | 1-1- |
| 0111+ | 01-1+ | 1-1- |
| 1111+ | 11-1+ |      |
| 1011+ | -111+ |      |
| 1110+ | 1-11+ |      |
| 1010+ | 111-+ |      |
|       | 101-+ |      |
|       | 1-10+ |      |
Note that (0-00, 0-01) and (000-, 010-) both reduce to 0-0-; the same is true of some other pairs of implicants. Two choices for handling collisions such as these are: only add an implicant to a list if it is not already present in that list, or discard any duplicate implicants in the final output.
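The second option (discarding duplicates after the fact) could be as simple as a sort followed by a linear scan. A rough sketch, again using the implicant_t representation assumed earlier:

```c
#include <stdlib.h>

/* Sort by (dashes, value) and keep the first element of each run of equals. */
static int cmp_implicant(const void *pa, const void *pb)
{
    const implicant_t *a = pa, *b = pb;
    if (a->dashes != b->dashes) return a->dashes < b->dashes ? -1 : 1;
    if (a->value  != b->value)  return a->value  < b->value  ? -1 : 1;
    return 0;
}

static size_t dedupe(implicant_t *list, size_t n)
{
    if (n == 0)
        return 0;
    qsort(list, n, sizeof list[0], cmp_implicant);
    size_t out = 1;
    for (size_t i = 1; i < n; i++)
        if (cmp_implicant(&list[i], &list[out - 1]) != 0)
            list[out++] = list[i];
    return out;   /* new length with duplicates removed */
}
```

The first option would instead check a shared lookup structure on every insertion, which would need locking in the parallel versions.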
The next step in the algorithm is to look once more for implicants which differ in exactly one position. However, there are no such implicants, so it is time to move on to the next phase of the algorithm: finding all the prime implicants, that is, the implicants which were never combined into a pair. These are the implicants which do not have a + beside them. In this example, the set of prime implicants happens to be exactly the set of elements in the rightmost list (List 3 above), but in general there may be prime implicants in any of the lists, and all of them are used for the output. The set of prime implicants in this example is (0-0-, -1-1, 1-1-), meaning that the function can be expressed as ¬a¬c + bd + ac, as can easily be verified against the Karnaugh map above.
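Putting these pieces together, one pass of the uniprocessor version might look roughly like the sketch below (it assumes the implicant_t type and the can_combine, combine, and dedupe helpers sketched above, and glosses over buffer sizing):

```c
/* One pass: compare every pair in 'cur', emit combined implicants into 'next',
 * and afterwards append the unmarked (prime) implicants to 'primes'.
 * Returns the length of the deduplicated next list. */
static size_t qm_pass(implicant_t *cur, size_t n,
                      implicant_t *next,     /* capacity >= n*(n-1)/2 */
                      implicant_t *primes, size_t *nprimes)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (can_combine(cur[i], cur[j])) {
                next[m++] = combine(cur[i], cur[j]);
                cur[i].used = cur[j].used = true;   /* the '+' marks above */
            }
    for (size_t i = 0; i < n; i++)   /* implicants never combined are prime */
        if (!cur[i].used)
            primes[(*nprimes)++] = cur[i];
    return dedupe(next, m);
}
```

Calling this repeatedly until it returns zero, accumulating the unmarked implicants into primes along the way, gives the uniprocessor baseline scheduled for the first milestone.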
The major opportunity for parallelism that we have seen so far is that the lists can be generated dynamically: if we assign a process to each list, each process can take a stream of implicants from the previous list and generate a stream of implicants for the next list. However, this is not sufficient to ensure good parallelism, for two main reasons. First, the initial lists are (usually) much larger than the later lists, so if a process is assigned to each list, the later ones will be starved for work. Second, if there are more than n processors, they will not all be used when we simply assign a processor to each list, since there are at most n lists. The solution to these problems will be interesting to investigate.
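As a first cut at the "static" scheme on the schedule below, the Θ(n²) comparisons of a single pass can be split by row across a fixed pool of pthreads; each thread writes into a private buffer and a private "used" map so that no locking is needed during the pass. This is only a sketch under the same assumptions as above; NTHREADS, the field names, and the buffer sizes are placeholders.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define NTHREADS 4           /* placeholder; would be a runtime parameter */

struct pass_args {
    implicant_t   *cur;      /* shared input list, read-only during the pass */
    size_t         n;
    size_t         row_lo, row_hi;   /* this thread's block of outer-loop rows */
    implicant_t   *out;      /* private output buffer                         */
    size_t         nout;
    unsigned char *used;     /* private per-implicant "was combined" map      */
};

static void *pass_worker(void *p)
{
    struct pass_args *a = p;
    for (size_t i = a->row_lo; i < a->row_hi; i++)
        for (size_t j = i + 1; j < a->n; j++)
            if (can_combine(a->cur[i], a->cur[j])) {
                a->out[a->nout++] = combine(a->cur[i], a->cur[j]);
                a->used[i] = a->used[j] = 1;
            }
    return NULL;
}

/* Divide the rows evenly, run the workers, then merge their results. */
static void parallel_pass(implicant_t *cur, size_t n)
{
    pthread_t tid[NTHREADS];
    struct pass_args args[NTHREADS];

    for (int t = 0; t < NTHREADS; t++) {
        args[t].cur    = cur;
        args[t].n      = n;
        args[t].row_lo = n * t / NTHREADS;
        args[t].row_hi = n * (t + 1) / NTHREADS;
        args[t].out    = malloc(n * n / 2 * sizeof(implicant_t)); /* generous bound */
        args[t].nout   = 0;
        args[t].used   = calloc(n, 1);
        pthread_create(&tid[t], NULL, pass_worker, &args[t]);
    }
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        for (size_t i = 0; i < n; i++)      /* merge the private 'used' marks */
            if (args[t].used[i])
                cur[i].used = true;
        /* a real version would append args[t].out to the next list here */
        free(args[t].out);
        free(args[t].used);
    }
}
```

Splitting the triangular loop into contiguous blocks of rows like this is deliberately naive: early rows do far more comparisons than late ones, so the load is unbalanced. That imbalance, and the cost of merging the private buffers, is exactly the kind of effect the static-versus-dynamic comparison in the schedule is meant to expose.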
|         | Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|---------|-----|-----|-----|-----|-----|-----|-----|
| Mar     |     |     | 19 Proposal due | 20 | 21 Implicant code (1) | 22* | 23* |
|         | 24* Table code (2) | 25 | 26 | 27 Uniprocessor code (3) | 28 Milestone #1: Uniprocessor timing (4) | 29* | 30* Uniprocessor writeup (5) |
| Mar-Apr | 31* | 1* | 2* Parallel data structures (6) | 3* | 4* Locking code (7) | 5* | 6* |
|         | 7*  | 8  | 9 Static code (8) | 10 Static timing (9) | 11 Milestone #2: Static writeup (10) | 12 | 13 |
|         | 14  | 15 Dynamic code (11) | 16 Dynamic timing (12) | 17 Dynamic writeup (13) | 18 | 19 | 20 |
|         | 21  | 22 | 23 | 24 | 25 Final code (14) | 26 Final timing (15) | 27 Final documentation (16) |
| Apr-May | 28  | 29 | 30 Final writeup (17) | 1 | 2 Poster session (18) |  |  |
For the second milestone, we will have a working statically partitioned parallel implementation, along with its timing results and writeup (items 8-10 in the schedule above).
Development will be done primarily on and for the NCSA SGI machines. We will also evaluate performance on machines with higher communication latencies (including the Intel cluster). Additionally, we will evaluate the performance on large numbers of processors on PSC machines, although it will be challenging to get good parallelism in this environment.
Individual runs will be kept to a few minutes at most. Hence, we should not require more than 40 hours of total compute time for the entire project (excluding development time in interactive mode).
No special commercial software will be required. Berkeley's `espresso` heuristic logic minimizer may be used for results verification and/or timing comparisons. We will write some Perl scripts and/or other tools to generate input files and test output.
No special resources beyond those already available will be required for this project.