Reconfigurable Computing Seminar
Carnegie Mellon University
15-828/18-847, Spring 1998
January 12

Lab 1

Due: 11:59 PM January 30, 1998
Outline:
- Introduction
- Tool Flow and Setup
- Multiplier HDL and Test Harness
- Redesigning the Two-Operand Multiplier
- Designing a Constant Multiplier
- How You Should Hand In Your Work
1. Introduction:
The purpose of this lab is to introduce you to the design flow for
commercial FPGAs, including hardware description languages, simulation
tools, synthesis tools, and place-and-route tools. In addition, you will
explore different multiplier structures and see the benefits of pipelining
and constant propagation.
2. Tool Flow and Setup:
The basic tool flow for this lab has three stages: simulation, synthesis,
and physical design. Two tools are available for each stage, and they run
on two platforms: the rs_aix boxes in HH1000 and on many ECE grad students'
desks, and the SPARCs (running Solaris) in the Andrew clusters. The table
below lists the tools for each stage and the platforms each one is
supported on. (Note: we have not tested the lab on the Wintel boxes.)
Stage                | Tool            | Vendor     | AIX | Solaris | WINTEL
---------------------+-----------------+------------+-----+---------+-------
Simulation           | Verilog-XL      | Cadence    |  X  |    X    |
Simulation           | Leapfrog VHDL   | Cadence    |  X  |    X    |
Synthesis            | synplify        | Synplicity |     |    X    |   X
Synthesis            | design_analyzer | Synopsys   |  X  |         |
FPGA Physical Design | dsgnmgr         | Xilinx     |     |    X    |   X
FPGA Physical Design | XDM             | Xilinx     |  X  |         |
Why all the choices?
- To spread the compute load.
- To vary the results obtained.
- To allow people to do as much as possible on their own machine, if they
  have one.
We will describe how to use each one of these tools. Here
is a broad brush comparison of the compatible tools.
The Bottom Line: If you're new to this and you don't have an AIX box on
your desk, we recommend the Solaris tools for synthesis and physical design.
You should be able to swap back and forth between platforms: both Synopsys
and Synplify accept VHDL and Verilog, and both Xilinx tools should accept
the Xilinx Netlist Format (XNF) files that both synthesis tools output.
This SHOULD work, but in preparing this assignment we tested the interface
between the tools on only one platform. You're on your own if you play on
both platforms.
To set yourself up for the two platforms, we've written two shell scripts
that should set your paths and environment variables correctly to run all
these programs. Save these scripts from your browser into your home
directory.
For Solaris Machines:
setvar847.sun4_55
For AIX Machines:
setvar847.rs_aix41
If you plan to use Synopsys, copy the following file to your working directory:
.synopsys_dc.setup
Every time you start working, source the setvar file by typing
% source setvar847.`sys`
at your Unix prompt. (The AFS sys command prints your platform name, such
as sun4_55 or rs_aix41, so this picks the matching script.) If you're
running remotely, make sure you set DISPLAY with setenv and run xhost
properly so that X Windows interfaces work. If you don't know how to do
this, see Appendix B.
3. Multiplier HDL and Test Harness
In the first part of this lab, you will take a two-operand 12-bit
multiplier through the entire simulation and synthesis path. There is
minimal design in this section; it is primarily intended to teach the flow
through these tools to those of you who have never done this before.
A. Simulation: Circuit under test and test harness.
Most of the time when you create a simulation model for a design, you write
a description of the design as well as a description of a tester that makes
sure the design is working correctly. We will split these into two files:
Verilog:
harness.v
mult1.v
VHDL:
Mult_VHDL.tar.gz
Copy these files into your working directory. If you're using VHDL,
uncompress and untar the archive (gunzip Mult_VHDL.tar.gz, then
tar xvf Mult_VHDL.tar).
The interface between Harness and Mult1 is simple. Mult1 has four inputs:
a and b, which are twelve bits wide; valid, a single bit indicating (when
it is one) that the operands a and b are valid; and clk, the clock for the
design. Mult1 has one output: c, the 24-bit product of a and b. Harness
drives all the inputs of Mult1, so it has four outputs, and it receives
Mult1's single output as its input.
The harness has two parameters: latency and period. Latency is the number
of clock cycles the design takes to output the result of a multiplication,
and period is how often, in clock cycles, the multiplier can be expected to
accept new operands. Mult1 has a period of one and a latency of four. Every
"period" cycles, the harness asserts valid, indicating that the multiplier
should multiply the values on the a and b pins. "Latency" cycles later,
Harness checks that the value coming out of the multiplier on the c bus
equals the product of those earlier a and b values. Note that if latency
is greater than period, several multiplications are in progress in the
multiplier at the same time. For now, you don't have to modify latency and
period, but you will later in the lab.
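The timing relationship between period and latency can be made concrete with a small Python model of the harness schedule. This is a sketch under assumptions, not a transcription of harness.v: valid is taken to be asserted on cycles 0, PERIOD, 2*PERIOD, ..., and each product is taken to be checked exactly LATENCY cycles after its operands were issued. The function names are hypothetical.

```python
import math

def harness_schedule(period, latency, n_ops):
    """(issue_cycle, check_cycle) for each of n_ops multiplications.

    Assumed model: operands are issued every `period` cycles starting at
    cycle 0, and each product is checked `latency` cycles after issue.
    """
    return [(i * period, i * period + latency) for i in range(n_ops)]

def in_flight_at(schedule, cycle):
    """Number of multiplications issued but not yet checked at `cycle`."""
    return sum(1 for issue, check in schedule if issue <= cycle < check)

def max_in_flight(period, latency):
    """Steady-state overlap: at most ceil(latency / period) operations at once."""
    return math.ceil(latency / period)

# Mult1 settings: period 1, latency 4.
print(max_in_flight(1, 4))  # prints 4
```

With Mult1's settings, four multiplications are inside the pipeline on every cycle once it fills; with period equal to latency, only one is.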
The Mult1 file describes a multiply circuit with a latency of four.
This latency is implemented by having four cascaded registers that delay
the product. The multiplication is described using the (*) operator.
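A cycle-level Python sketch of Mult1 as described above may help when reading the simulation waveforms: the product is formed combinationally from a and b and then passes through four cascaded registers. The register names c1, c2, c3 and c follow the text; the class, its clock() method and the random-vector check are hypothetical, and this is a behavioral model rather than the contents of mult1.v.

```python
import random

class Mult1Model:
    """Behavioral sketch: a*b computed combinationally, then delayed by four registers."""
    LATENCY = 4

    def __init__(self):
        self.c1 = self.c2 = self.c3 = self.c = 0

    def clock(self, a, b):
        """One rising clock edge. Every register loads at the same time from the
        values present before the edge, as a hardware register chain does.
        Returns the new value of output c."""
        self.c1, self.c2, self.c3, self.c = a * b, self.c1, self.c2, self.c3
        return self.c

def check_mult1(n_vectors=300, seed=0):
    """Drive random 12-bit operands every cycle (period 1) and count mismatches."""
    rng = random.Random(seed)
    model = Mult1Model()
    ops, outputs = [], []
    for _ in range(n_vectors):
        a, b = rng.randrange(1 << 12), rng.randrange(1 << 12)
        ops.append((a, b))
        outputs.append(model.clock(a, b))
    # Operands applied before edge t reach c on edge t + LATENCY - 1,
    # i.e. on the fourth clock edge.
    lag = Mult1Model.LATENCY - 1
    return sum(1 for t in range(n_vectors - lag)
               if outputs[t + lag] != ops[t][0] * ops[t][1])

print(check_mult1())  # prints 0 (no mismatches)
```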
Running the simulator:
For Verilog
For VHDL
Questions:
- Currently, the multiplication takes place in the same cycle that the
  inputs are received. Rewrite this module so that the multiplication
  happens on the second cycle after the operands arrive. What effect should
  this modification have on the maximum clock speed of the implementation?
  Will it have any effect on the size of the implementation?
- Blocking and non-blocking assignments (answer only if you used Verilog):
  Change all the procedural assignments in the multiplier module to
  blocking ones (use the = operator rather than <=). Why doesn't the
  simulation work? Can you rewrite this description so that it works with
  blocking assignments? (Hint: no new code is needed.) Are there any risks
  to creating descriptions in this way? (See the on-line documentation for
  a discussion of non-blocking assignment.)
What you should hand in:
- The PostScript of the simwave output in simwave.ps
- Answers to question 1 in written.txt (head your answer as part 3A.1)
- Answers to question 2 in written.txt (head your answers as part 3A.2)
- A copy of the Verilog using blocking assignments in the file mult2.v
B. Synthesis:
How it works: The * operator in Mult1 is mapped to internal module
generators that generate the netlist for a multiplier; this mapping may
vary with the tool path used. The c1, c2 and c3 elements are mapped to
registers. Therefore, the structure of the implementation depends a great
deal on the structure of the specification. This may not be the right thing
for reconfigurable computing, but in the next section you will use this
fact to modify the structure of the multiplier.
More detailed instructions for synthesis depend on the tool you are
using.
Instructions for Synplify
Instructions for Synopsys
What you should hand in:
In the file written.txt as part 3B: report the estimated number of CLBs and
registers, the number of FMAPs and HMAPs if you ran Synplify, and the
estimated length of the critical path. The CLB count is an estimate because
the synthesis tool does not know what is inside a Xilinx CLB; instead it
maps the design to four-input LUTs (FMAPs) and three-input LUTs (HMAPs),
leaving the packing into CLBs to the Xilinx technology mapper.
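Because an XC4000-series CLB contains two four-input function generators (F and G), one three-input function generator (H) and two flip-flops, the synthesis counts give a quick lower bound on the CLB count before the Xilinx mapper packs anything. The Python sketch below assumes perfect packing and ignores connectivity restrictions (for example, two of H's three inputs come from the F and G outputs), so real packed counts will usually be higher. The function name and the example counts are hypothetical.

```python
import math

def clb_lower_bound(fmaps, hmaps, registers):
    """Minimum CLBs if every resource could be packed perfectly.

    Assumed XC4000E CLB contents: 2 four-input LUTs (F, G),
    1 three-input LUT (H), 2 flip-flops.
    """
    return max(math.ceil(fmaps / 2),      # two F/G generators per CLB
               hmaps,                     # one H generator per CLB
               math.ceil(registers / 2))  # two flip-flops per CLB

# Made-up example counts, not measured results:
print(clb_lower_bound(fmaps=150, hmaps=20, registers=96))  # prints 75
```

Comparing this bound with the packed CLB count reported in part 3C shows how much the mapper lost to packing constraints.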
C. Xilinx Physical Design:
In this final stage of design, the Xilinx tools do a final mapping of logic
to CLBs and place and route the design. You'll do this, and also run
analysis tools to determine the maximum clock frequency at which your
design can operate.
Instructions for dsgnmgr
Instructions for xmake and XDM
What you should hand in:
- In written.txt as part 3C, report the number of FMAPs, HMAPs, packed
  CLBs, and registers, and the length of the critical path in nanoseconds.
- Extra credit: Resynthesize, place, and route the description you wrote
  for question 1. How fast and how large is the new design? (Call this file
  mult3.v and place your answers in written.txt as part 3C-extra.)
4. Redesigning the Two-Operand Multiplier
Create one redesign of the two-operand multiplier. The multiplier you
design must run with the test harness from section 3, although you may
select the latency and period of the multiplier. You may target any Xilinx
4000E series device with speed grade -3 that is supported by the tools.
We want some variety of implementations. We plan to produce a graph of
throughput vs. area for every implementation generated by the class. To
encourage variety, the grading for this section will be based in part on
how far you are from the convex hull of the solutions generated by the
whole class. Therefore, if you're off in a lunatic portion of the design
space (super fast, super small), you'll get a better grade than if you do
something more conventional. Your design should be significantly faster or
smaller than the Mult1 built in the previous section.
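To place a design on the throughput-vs-area graph, the maximum clock frequency follows from the critical path (f_max = 1 / t_crit), and throughput is that frequency divided by the period, since the multiplier accepts a new operand pair only every PERIOD cycles. The Python sketch below assumes throughput in millions of multiplications per second and area in CLBs; the helper names are hypothetical, and the pairwise dominance test is only a quick sanity check, not the convex-hull comparison used for grading.

```python
def fmax_mhz(critical_path_ns):
    """Maximum clock frequency in MHz for a critical path given in ns."""
    return 1000.0 / critical_path_ns

def throughput_mops(critical_path_ns, period):
    """Millions of multiplications per second: one result every `period` cycles."""
    return fmax_mhz(critical_path_ns) / period

def dominates(p, q):
    """p and q are (area_clbs, throughput_mops) points. p dominates q if it is
    no larger and no slower, and strictly better in at least one of the two."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    better = p[0] < q[0] or p[1] > q[1]
    return no_worse and better

# Hypothetical designs: a 20 ns path with period 1 vs. a 10 ns path with period 4.
print(throughput_mops(20.0, 1), throughput_mops(10.0, 4))  # prints 50.0 25.0
```

Note that a shorter critical path does not guarantee higher throughput: in the example above, the faster clock loses because it produces a result only every fourth cycle.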
Suggestions:
- Highly pipelined array multiplier: In both synthesis tools, the +
  operator uses the fast carry logic present in Xilinx FPGAs. Build an
  array multiplier from a set of adders. You may want to experiment with
  the width of the adders that you use (12-bit adders might not be
  optimal).
- Pipelined Wallace tree multiplier: Generate the logic for a Wallace tree
  multiplier and pipeline it. Use AND gates for partial product generation,
  full adders for partial product reduction, and a large adder (using the +
  operator) for the final addition.
- Pipelined Ferrari-Stefanelli multiplier: This uses a larger base
  multiplier (say, 2x2) to generate partial products. You can then use a
  Wallace tree reduction and a final adder.
- Shift-and-add serial multiplier.
- Serial-serial multiplier.
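A bit-level Python sketch of the Wallace tree suggestion can serve as a reference model while writing the HDL: AND gates form the partial products, layers of full adders (3:2 counters) reduce each column until at most two bits remain, and a single carry-propagate addition (the + operator in the HDL) finishes the job. Each reduction layer is a natural place for a pipeline register. For simplicity this sketch passes leftover bits straight through instead of using half adders, so its layer count may differ from a textbook Wallace tree; all function names are hypothetical.

```python
import random

def partial_products(a, b, n=12):
    """AND-gate partial products grouped by weight: cols[k] holds bits worth 2**k."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols

def full_adder(x, y, z):
    """3:2 counter: x + y + z == s + 2 * c."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c

def wallace_reduce(cols):
    """Apply layers of full adders until every column holds at most two bits."""
    layers = 0
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            full = len(col) // 3 * 3
            for i in range(0, full, 3):
                s, c = full_adder(col[i], col[i + 1], col[i + 2])
                nxt[k].append(s)      # sum stays in this column
                nxt[k + 1].append(c)  # carry moves to the next column
            nxt[k].extend(col[full:])  # 0-2 leftover bits pass straight through
        cols = nxt
        layers += 1
    return cols, layers

def wallace_multiply(a, b, n=12):
    cols, layers = wallace_reduce(partial_products(a, b, n))
    row0 = sum((col[0] if len(col) > 0 else 0) << k for k, col in enumerate(cols))
    row1 = sum((col[1] if len(col) > 1 else 0) << k for k, col in enumerate(cols))
    return row0 + row1, layers  # final carry-propagate adder ("+" in the HDL)

# Check against ordinary multiplication, much as Harness does in simulation.
rng = random.Random(1)
for _ in range(300):
    a, b = rng.randrange(1 << 12), rng.randrange(1 << 12)
    assert wallace_multiply(a, b)[0] == a * b
print("reduction layers for 12x12:", wallace_multiply(4095, 4095)[1])
```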
The test criteria:
Your Verilog or VHDL MUST run with Harness.v or Harness.vhd. The only things
you may change in Harness are the two parameters PERIOD and LATENCY.
What you should hand in:
- Verilog (or VHDL) description of your multiplier in mult4.v. Include as
  a comment the latency and period settings for the test harness.
- PostScript output of the simwave timing diagram in simwave2.ps
- Describe any variations to the synthesis or P&R flow that you used in
  written.txt part 4. Also report the target FPGA, the number of CLBs and
  registers, and the length of the critical path. Basically, give us the
  results, plus enough information to duplicate the work that you did.
- Turn in any software that you wrote to assist your design for this task.
5. Designing a Constant Multiplier
If one operand to a multiplier is a constant, the logic required to perform
a multiplication is significantly reduced.
Design the Verilog for a multiplier for each of the following twelve-bit constants:
3171 (binary: 1100 0110 0011)
2426 (binary: 1001 0111 1010)
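Multiplying by a constant reduces to adding shifted copies of the variable operand, one copy per nonzero digit of the constant, and an n-term sum needs n - 1 adders or subtractors. Recoding the constant in canonical signed-digit (CSD) form, where digits are -1, 0 or +1 and no two nonzero digits are adjacent, can cut the number of terms because a run of ones becomes one addition and one subtraction. The Python sketch below is a reference model for checking an adder-based design, not a prescribed solution; CSD is one standard recoding among several, and the function names are hypothetical.

```python
import random

def csd_digits(k):
    """Canonical signed-digit (non-adjacent form) recoding of k >= 0.
    Returns digits in {-1, 0, +1}, least significant first."""
    digits = []
    while k:
        if k & 1:
            d = 2 - (k & 3)  # k % 4 == 1 -> +1, k % 4 == 3 -> -1
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

def shift_add_multiply(x, k):
    """x * k using only shifts and adds/subtracts, one term per nonzero digit."""
    return sum(d * (x << i) for i, d in enumerate(csd_digits(k)) if d)

rng = random.Random(3)
for k in (3171, 2426):
    for _ in range(300):
        x = rng.randrange(1 << 12)
        assert shift_add_multiply(x, k) == x * k
    binary_terms = bin(k).count("1")
    csd_terms = sum(1 for d in csd_digits(k) if d)
    print(k, "binary terms:", binary_terms, "CSD terms:", csd_terms)
# 3171 binary terms: 6 CSD terms: 6
# 2426 binary terms: 7 CSD terms: 5
```

For 2426, CSD saves two adders over the plain binary expansion; for 3171 it saves nothing, so other decompositions are worth exploring for that constant.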
You should rewrite both the harness and the multiplier files, and simulate
with at least 300 random vectors. As in the last section, you choose the
period, latency, and targeted FPGA (as long as it is in the Xilinx 4000E
family with -3 speed grade). We will again give better grades to more
extreme designs in the throughput and area space. Here are some
implementation suggestions:
- Use multiple adders, taking advantage of the fast-carry logic.
- Use look-up tables to create small (4-bit input) constant multipliers,
  and add the results up.
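The look-up-table suggestion can be modeled the same way: split the 12-bit operand into three 4-bit digits, look up the constant times each digit in a 16-entry table, and add the shifted results. Each output bit of such a table is a function of four inputs, so it fits one four-input function generator (FMAP); with 16 output bits per table and three tables, that suggests at most 48 FMAPs for the tables plus two adders, before synthesis simplifies any constant bits. This Python sketch only checks the arithmetic, and its function names are hypothetical.

```python
import random

def nibble_table(k):
    """16-entry table holding k * n for every 4-bit digit n.
    For any 12-bit k, each entry fits in 16 bits (4095 * 15 = 61425)."""
    return [k * n for n in range(16)]

def lut_multiply(x, table, width=12):
    """Multiply x by the table's constant: one lookup per 4-bit digit of x,
    with the looked-up values shifted into place and summed."""
    total = 0
    for shift in range(0, width, 4):
        total += table[(x >> shift) & 0xF] << shift
    return total

rng = random.Random(5)
for k in (3171, 2426):
    table = nibble_table(k)
    for _ in range(300):
        x = rng.randrange(1 << 12)
        assert lut_multiply(x, table) == x * k
```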
You will need to modify the test harness so that it tests for constant
multiplication.
What you should hand in:
- Verilog (or VHDL) description of your multiplier in mult4.v. Include as
  a comment the latency and period settings for the test harness.
- PostScript output of the simwave timing diagram in simwave2.ps
- Describe any variations to the synthesis or P&R flow that you used in
  written.txt part 4. Also report the target FPGA, the number of CLBs and
  registers, and the length of the critical path. Basically, give us the
  results, plus enough information to duplicate the work that you did.
- Turn in any software that you wrote to assist your design for this task.
6. How You Should Hand In Your Work
Run the program /afs/cs/academic/15828/bin/handin -lab 1 [path].
If [path] is a directory, the entire directory will be turned in. If it is
a file, the file will be turned in. You can run handin as many times as you
want; the last copy is the one we evaluate.
Appendices:
A. On-line Documentation:
For Verilog and VHDL information, run openbook. The following volumes may
be relevant:
- Verilog-XL Reference
- Verilog-XL Tutorial
- Verilog-XL Users Guide
- LeapFrog VHDL Simulator Reference
- LeapFrog VHDL Simulator User Guide
For Synopsys tools, run iview.
For help with dsgnmgr, run hyperhelp:
hyperhelp (xdsgn) /afs/ece/common/local/usr/supported/xilinx/M13/sol/usenglish/*.hlp
B. Xhosting
- Telnet to a Solaris box (machine1) from your X terminal (machine2):
  machine2% telnet far-sun4.andrew.cmu.edu
  To find out the name of the Solaris box (machine1):
  machine1% hostname
- Klog into ECE:
  machine1% klog UID@ece.cmu.edu
  where UID is your ECE user id.
- Source the setup file:
  machine1% source setvar847.sun4_55
- Set the DISPLAY variable:
  machine1% setenv DISPLAY machine2:0.0
  Remember, machine2 is the hostname of the machine that runs your X
  display. You may have to add the ".ece.cmu.edu" suffix.
- Enable xhosting:
  machine2% xhost +machine1
  where machine1 is the hostname of the Solaris box. You may have to add
  the ".andrew.cmu.edu" suffix to the name you get from hostname.