Static parallel implementation notes
This document describes the state of the QM implementation and
datastructures as of Milestone 2. Since there are obvious
problems with the load-balancing in this version, we expect to go
to dynamic scheduling as the next step; that version may be
drastically different.
1 New datastructures
We continue to use the implicant and table
datastructures developed for the uniprocessor version, with a few
added operations that proved useful. We also use a (fairly
standard, uninteresting) queue datastructure for storing tasks.
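The implicant encoding can be sketched roughly as follows (illustrative only; the actual uniprocessor datastructure is not reproduced in these notes, and the names here are assumptions): each implicant pairs a bit pattern with a mask marking its don't-care positions.

```python
def make_implicant(bits, dont_cares=0):
    # Don't-care positions carry no information, so clear them in the
    # bit pattern to keep the representation canonical.
    return (bits & ~dont_cares, dont_cares)

def covers(implicant, minterm):
    """True if the minterm agrees with the implicant on every
    cared-about position."""
    bits, dc = implicant
    return (minterm & ~dc) == bits
```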
2 Parallelization
The idea is to build a table of tables. Each process gets one
column of the super-table to work on; there is no stealing from
other columns or dynamic scheduling of any kind. The difference
from the uniprocessor version is that as long as there are items
in a given queue, the process one column to the right can work on
those items, and so on from left to right across the super-table.
This means that work moves from left to right over the program's
lifetime, and processors sit idle both before their column's work
arrives and after it is exhausted.
A process is done when its queue is empty and the process "to its
left" (in the standard table representation) has signalled
completion. At that point, the process itself signals completion.
Until this happens, iterations of a process are as follows.
- Wait for work in the queue.
- Compare every newly arriving implicant against every previous
  implicant. If the two differ in exactly one position, replace
  that position with a don't-care and push the merged implicant
  onto the queue to your right.
- When there's nothing in the queue and the previous process
has signalled completion, add any prime implicants you've found
to the global prime implicant table.
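The steps above can be sketched as a column worker (a hedged, single-threaded simulation: the queue is just a list, the `(bits, mask)` implicant encoding and all names are illustrative assumptions, and the real version blocks on its queue rather than iterating over a list):

```python
def try_combine(a, b):
    """Merge two (bits, dont_care_mask) implicants that differ in
    exactly one cared-about bit; return None otherwise."""
    if a[1] != b[1]:
        return None
    diff = a[0] ^ b[0]
    if diff and (diff & (diff - 1)) == 0:   # exactly one differing bit
        return (a[0] & ~diff, a[1] | diff)
    return None

def run_column(in_items, out_items):
    """Check each arriving implicant against every previous one; push
    merges to the queue on the right; return this column's primes."""
    seen = []                       # [implicant, ever_combined] records
    for imp in in_items:            # "wait for work" becomes iteration
        combined = False
        for rec in seen:
            merged = try_combine(rec[0], imp)
            if merged is not None:
                out_items.append(merged)
                rec[1] = True
                combined = True
        seen.append([imp, combined])
    # Queue drained and left neighbour done: implicants that never
    # combined with anything are prime.
    return [imp for imp, ever in seen if not ever]
```

Combining two minterms such as 00 and 01 yields the implicant 0- for the next column; an implicant that survives a column uncombined is reported as prime.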
3 Timing results
Quantitative results are to follow when the batch machines are more
available (as of 4/10, the batch queues were the slowest I'd seen).
For now, here are some qualitative findings:
- Adding only non-duplicate implicants to tables speeds things
  up, even though the duplicate check makes each insertion more
  expensive. Duplicate-free tables mean less redundant
  communication and work, which more than pays for the check.
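The duplicate-free insertion might look like the following sketch (assumed helper names; the notes do not show the real table datastructure, which here is paired with a set index for the membership test):

```python
def add_if_new(table, index, implicant):
    """Insert only if not already present: the membership check costs
    one set lookup, but skipping duplicates avoids redundant
    downstream comparisons and communication."""
    if implicant in index:
        return False
    index.add(implicant)
    table.append(implicant)
    return True
```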
- Best speedups were on problems with a low ratio of terms to
  implicants (i.e., mostly-true datasets). Relatively speaking,
  each implicant then provides more work to be done without
  requiring more communication.
4 Possible improvements
The most obvious improvement is to the scheduling: work sweeps
through the processes from left to right rather than being in
constant supply, and in any case the workload shrinks as the
algorithm proceeds rightward.
Changing the scheduling will probably affect whether the algorithm
runs faster on some types of problems than on others. As yet, we
can't say how.