Reconfigurable Computing Seminar
Carneigie Mellon University
15-828/18-847 |
Spring 1998 |
February 4 |
Lab 2 |
Due: Friday 2/21/98 11:59PM
The purpose of this lab is to gain experience implementing an
algorithm on a reconfigurable computing device. For this lab you will
target the PipeRench architecture and optionally a Xilinx XC40xx
device. You will map and test your algorithm to PipeRench using the
assembler, dataflow compiler, and simulator that have been developed
for PipeRench.
You should first implement the algorithm in C, using the C
implementation as a reference specification. Them use DIC as an aid in
determining the structure and memory requirements of your design.
This tool will also allow high level simulation to debug the general
implementation you have chosen. Finally, you should implement the
design in CVHASM. Once implemented you can test the implementation
using CVHSIM. For extra credit, you can map the design to the Xilinx
architecture.
You should hand in a report, in HTML, that describes the problem, the
implementation approach, your final design, and simulation results.
While the time for the lab does not permit much exploration of the
design space, we do expect that you will evaluate the tradeoffs that
would be possible if you had more time. The report should also
evaluate the tools you used.
You should pick a kernel to implement by 2/6/97. There is a sign-up
sheet outside of Seth's office, 7122 Wean Hall. If you decide to pick
a kernel not listed in Section 5, you should speak with Herman or
Seth about it before Friday. By Monday you should turn in (By running
handin -lab 2 proposal.html. Use the
template
we provide.) a proposal which describes the
kernel and your initial ideas in how to map it to PipeRench.
You should next implement the kernel in C. There are no particular
guidelines for your C implementation, it is mainly to provide you with
a reference specification for debugging and guidance as you implement
the reconfigurable design.
Your next step should be to describe the algorithm to dic, the
dataflow intermediate compiler. dic allows the algorithm to be
described in an architecture neutral manner. dic will schedule
your algorithm onto an abstract pipeline reconfigurable architecture.
Its output of interest is a C program which can simulate the design
and shows the structure of the algorithm as it would be mapped onto a
pipelined reconfigurable architecture. The dic programming guide
will be published on the web before 2/10/98.
With the output of dic to guide you, you should map the algorithm
to PipeRench using CVHASM. Your goal should be to minimize the
width*height product. Some tricky things to keep in mind:
- stripe state and store/restore.
- pipelining adds across stripes requires routing the carry out
signal from one stripe to the next which require 1 PE in each stripe.
-
Test out your design using CVHSIM. You can use simwave to debug
the design-painful, but complete.
- Project proposal (Due on 2/6).
- Code for your kernel in C, DIC, and CVHASM.
- The necessary .data files to run the simulation.
- memory.data: the input memory file
- memconfig.data: the configuration of the memory controllers.
- A write-up of the lab which describes your design and
implementation. This should be in HTML and be about 1000 words with
some figures. We recommend the following structure for the report.
- Problem statement in a paragraph.
- Implementation overview in a couple of paragraphs. References
to papers and or figures should be included here as well.
- Describe what is interesting about your design in a few
paragraphs. This should include neat tricks, really hard things,
architectural changes that would make life easier, etc.
- Results in a paragraph. This should include things like maximum
number of registers used, stripes, width of largest stripe, avg width,
etc.
- Finally, other approaches that might be profitable.
If you implement your design on a Xilinx, hand in the
verilog/vhdl/schematics, etc. Also describe the Xilinx design and
compare it to the PipeRench design. Did you use features on the
Xilinx that were not available on PipeRench (feedback paths, local
memory, etc.)
We will pick a few of the projects to be presented in class. We will
notify the lucky winners shortly after the lab is turned in.
Here are some kernels that we think are of an appropriate complexity
for this lab. Many of these kernels are parameterized in some
fashion. While not required, think about how you would generate a
class of implementations depending on the parameter. For instance,
the parameter for nqueens is the board size, for the systolic priority
queue, the size of the queue, etc.
- Floating Point Addition/Subtraction
Implement a floating
point adder/subtracter for a 16-bit floating point
number [6]. The breakdown of the 16-bits is: 1 sign bit, 6
exponent bits, and 9 mantissa bits.
- Floating Point Multiply
Implement a floating
point multiplier for a 16-bit floating point
number [6]. The breakdown of the 16-bits is: 1 sign bit, 6
exponent bits, and 9 mantissa bits.
- Systolic Priority Queue
Implement a priority queue on PipeRench. This structure should handle
delete_min and insert operations [2]. It can have a constant size, but
should detect overflow. Essentially, it keeps k items sorted from
min to max.
- IDEA encryption
Implement one round of IDEA encryption [4]. In addition
to a general solution, you should create a solution which has a
non-trivial key compiled into the design.
- Nqueens Evaluator
Implement a pipeline that can evaluate
an nqueen position every cycle. It should return whether or not a
position is valid or not. This can then be embedded in a position
generator. Final implementation should handle a board at least 8x8.
- Huffman Decoding
Generate a Huffman trie for ASCII text
compression (not in hardware), and implement the encoder and decoder
in CVHASM. Compress and decompress this PostScript document. A good
reference for Huffman coding is [5].
- Integer Square Root
Implement 24 bit integer square
root. An FPGA-oriented pipelineable solution is presented
in [3].
- DNA Match
Implement the unidirectional matcher from the
Hoang paper [1]. Implement both a general matcher and one
which has a reasonable source string compiled in.
- 64x64 Multiplier
Implement a 64-bit by 64-bit integer
multiplier on a stripe of no more than 10 4-bit PEs.
- Vector-Matrix Multiply
Implement a vector matrix
multiply on PipeRench. You wide latitude in how you do this because
it requires not only work in CVHASM, but also in the data controller.
This is really only for someone very familiar with Verilog/
Remember, the tools are new and buggy. It will help everyone if you
post questions, bugs, comments, hints, etc. to the
class news group
.
- 1
-
Dzung T. Hoang.
Splash-2: FPGAs in a custom computing machine, chapter
Searching genetic databases on Splash 2, pages 97-109.
IEEE, 1996.
- 2
-
Charles Leiserson.
Systolic priority queues.
Technical Report CMU-CS-79-115, Carnegie-Mellon University,
Pittsbugh, Pennsylvania 15213, April 1979., Pittsbugh, Pennsylvania 15213,
April 1979.
- 3
-
Y. Li and W. Chu.
Implementation of single precision floating point square root on
FPGAs.
In Proceedings FPGAs for Custom Computing Machines, April 1997.
- 4
-
Bruce Schneier.
Applied Cryptography, chapter 13, pages 321-322,638-641.
?, 1996.
- 5
-
Robert Sedgewick.
Algorithms in C, chapter 22.
?, 1999.
- 6
-
N. Shirazi, A. Walters, and P. Athanas.
Quantitative analysis of floating point arithmetic on FPGA based
custom computing machines.
In Proceedings FPGAs for Custom Computing Machines, April 1995.
This document was generated using the
LaTeX2HTML translator Version 97.1 (release) (July 13th, 1997)
Copyright © 1993, 1994, 1995, 1996, 1997,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -split 0 -show_section_numbers -no_navigation -t Reconfigurable Computing Seminar Lab 2 lab2.tex.
The translation was initiated by Seth Copen Goldstein on 2/4/1998
Seth Copen Goldstein
2/4/1998