Scribe Notes -- Jan. 21, 1998
Slide 1: Admin
- People have been having problems with XDelay - especially on the AIX boxes. Try the Solaris boxes if you have problems on AIX.
Slide 2: Outline
Slide 3: Why Look at Arithmetic?
- Parallelization is important.
- There's the issue of throughput vs. area, although on many FPGAs you get pipelining for free because of the registers on the logic blocks.
Slide 4: Adders
- Xilinx claims their ripple carry chain delays aren't noticeable until 32 ripples, but experience suggests the number is closer to 16.
- A variety of adders work well in FPGAs - carry bypass, carry select, and carry look-ahead.
Slide 5: Adder Pipelining
- This is not really done in ASICs because registers are almost as expensive as adders.
- This is not really an issue in FPGAs because, as mentioned above, there are registers for free on the outputs of the logic blocks.
- For chaining, after the initial registers are allocated for delay, register cost is light.
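As a rough illustration of the pipelining idea above, here is a minimal Python sketch of a two-stage pipelined adder. The class name, the 8-bit width, and the even split into two halves are assumptions made for this example, not details from the lecture. The tuple held in stage_reg stands in for the pipeline registers, including the delay registers that hold the high operand bits until the carry from the low half is available.

```python
class PipelinedAdder:
    """Two-stage pipelined adder: low half in stage 1, high half in stage 2."""

    def __init__(self, width=8):
        self.half = width // 2
        self.mask = (1 << self.half) - 1
        self.stage_reg = None  # pipeline register between the two stages

    def clock(self, a=None, b=None):
        """Advance one cycle; returns the sum finishing this cycle, or None."""
        out = None
        if self.stage_reg is not None:
            # Stage 2: add the high halves plus the registered carry.
            low_sum, carry, a_hi, b_hi = self.stage_reg
            high = a_hi + b_hi + carry
            out = (high << self.half) | low_sum
        if a is None:
            self.stage_reg = None
        else:
            # Stage 1: add the low halves only.
            low = (a & self.mask) + (b & self.mask)
            # Delay registers: the high operand bits wait one cycle for the carry.
            self.stage_reg = (low & self.mask, low >> self.half,
                              a >> self.half, b >> self.half)
        return out
```

A new pair of operands can be issued every cycle, and each result appears one cycle after its operands were issued.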
Slide 6: Carry Bypass Adders
- If all the propagates are 1, a MUX passes Cin straight to Cout, skipping all the carry logic within the cells. This is inexpensive.
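The bypass rule above can be sketched in Python as follows. This is a behavioral model only, assuming LSB-first bit lists and a propagate signal defined as A xor B; the function names are hypothetical and do not come from the slides.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry add over LSB-first bit lists; returns (sum_bits, cout)."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def bypass_block(a_bits, b_bits, cin):
    """One carry-bypass block: if every bit propagates, Cout is just Cin."""
    sums, ripple_cout = ripple_block(a_bits, b_bits, cin)
    all_propagate = all(a ^ b for a, b in zip(a_bits, b_bits))
    # The MUX: select Cin directly when all propagates are 1.
    cout = cin if all_propagate else ripple_cout
    return sums, cout, all_propagate
```

In hardware the benefit is timing: when all_propagate is 1, Cout does not have to wait for the carry to ripple through the block. The software model only shows that the two carry values agree.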
Slide 7: Carry Bypass
Slide 8: Carry Select Adders
- This precomputes two answers - one for Cin=0 and one for Cin=1. A MUX is then used to select between the two when the carry arrives. This is expensive in terms of area because it nearly doubles the required number of adders.
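A minimal Python sketch of the select step described above. The 8-bit width, the 4-bit block size, and the function names are assumptions for illustration, and the width is assumed to be a multiple of the block size.

```python
def ripple_add(a, b, cin, width):
    """Add two width-bit integers; returns (sum mod 2**width, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin, width=8, block=4):
    """Carry-select addition: each block computes both answers, a MUX picks one."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Precompute both cases; in hardware these two adders run in parallel.
        sum0, cout0 = ripple_add(a_blk, b_blk, 0, block)
        sum1, cout1 = ripple_add(a_blk, b_blk, 1, block)
        # The MUX: the incoming carry selects one precomputed result.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << shift
    return result, carry
```

The two ripple_add calls per block are where the near-doubling of adder area comes from.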
Slide 9: Linear Carry-Select
- Assumption: Adder delays = width of adder. Mux delays = 1.
- The critical path here is clearly the path across the MUXs, and this can be reduced as seen in the next slide.
Slide 10: Square Root Carry-Select
- This ensures that each carry arrives at the MUX when the corresponding sum is ready.
- The amount of hardware is more than double that required for the ripple carry adder.
- The amount of hardware is not any more than that required for the linear carry-select adder.
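The delay comparison between the linear and square-root arrangements can be worked through with a small Python sketch, using the assumption from Slide 9 (adder delay = width of adder, MUX delay = 1). The function name and the 16-bit block widths below are hypothetical, chosen only to illustrate the model.

```python
def carry_select_delay(block_widths):
    """Delay of a carry-select adder under the unit-delay model of Slide 9."""
    # The first block is a plain ripple adder: its carry is ready after `width` units.
    carry_ready = block_widths[0]
    for width in block_widths[1:]:
        # Both precomputed sums are ready at `width`; the MUX then adds 1.
        carry_ready = max(carry_ready, width) + 1
    return carry_ready


ripple = carry_select_delay([16])              # one 16-bit ripple adder
linear = carry_select_delay([4, 4, 4, 4])      # equal-sized blocks
sqrt_like = carry_select_delay([2, 2, 3, 4, 5])  # blocks grow with carry arrival
```

Under this model the 16-bit ripple adder takes 16 units, four 4-bit blocks take 7, and blocks of 2, 2, 3, 4, and 5 bits take 6. All three cover the same 16 bits, so the graded block sizes change the timing without adding adder cells beyond the linear carry-select design.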
Slide 11: Brent-Kung Adder I
- The generate signal for the 2-bit adder is parenthesized like this: (G0 & P1) | G1
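The expression above is one half of the combining operator used to merge adjacent generate/propagate pairs. A minimal Python sketch follows, assuming per-bit G = A and B and P = A xor B; the function names are hypothetical.

```python
def combine(low, high):
    """Merge (G, P) of a low-order group with the adjacent high-order group."""
    g_low, p_low = low
    g_high, p_high = high
    # Group generate is (G_low & P_high) | G_high; group propagate is P_low & P_high.
    return ((g_low & p_high) | g_high, p_low & p_high)


def bit_gp(a, b, i):
    """Per-bit generate and propagate for bit i of a and b."""
    a_i, b_i = (a >> i) & 1, (b >> i) & 1
    return (a_i & b_i, a_i ^ b_i)


def two_bit_gp(a, b):
    """Group (G, P) for a 2-bit adder slice; a and b are in 0..3."""
    return combine(bit_gp(a, b, 0), bit_gp(a, b, 1))
```

Given the group pair, the carry out of the 2-bit slice is G | (P & Cin). Because combine is associative, the same operator can be applied to wider groups in a tree.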
Slide 12: Brent-Kung II
- By adding sums of sums, the speed of Brent-Kung can be improved further.
Slide 13: Brent-Kung
- One advantage of Brent-Kung is that it can be laid out very tightly.
Slide 14: Constant Addition
- Constant addition doesn't gain you a whole lot in terms of speed, although it can reduce some full adders to half adders.
- Additionally, trailing zeros can be turned into wires.
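Both simplifications can be seen in a bit-level Python sketch of adding a known constant k. The function name and the integer representation are assumptions for this example. Each cell is specialized on the constant's bit, so it only ever has two variable inputs (the x bit and the carry).

```python
def add_constant(x, k, width):
    """Add constant k to x bit by bit, specializing each cell on k's bit."""
    result, carry = 0, 0
    seen_one = False  # becomes True at the lowest set bit of k
    for i in range(width):
        x_bit = (x >> i) & 1
        k_bit = (k >> i) & 1
        seen_one = seen_one or bool(k_bit)
        if not seen_one:
            # Trailing zero of k: no carry can exist yet, so this bit is a wire.
            s = x_bit
        elif k_bit:
            # Constant bit 1: two-input cell, sum = XNOR, carry = OR.
            s = 1 ^ x_bit ^ carry
            carry = x_bit | carry
        else:
            # Constant bit 0: ordinary half adder on x_bit and the carry.
            s = x_bit ^ carry
            carry = x_bit & carry
        result |= s << i
    return result, carry
```

For example, with k = 12 (binary 1100) the two low bits of x pass straight through to the result.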
Slide 15: HHF Paper Issues
- It seems as if they threw hardware at speed.
- What's the best case? They don't say. It might be really good, depending on exactly where the carry chain was lined up in the logic block row.
- There aren't any comments on scalability up to, say, 128 bits.
- Is it really necessary to have the carry chain be a drop in replacement for all the original functionality? It seems as if hardware could be saved if some of the features were cut.
- One issue is that pipelining the Brent-Kung adder takes a lot of storage space.
- Most microprocessors use a Brent-Kung adder structure along with other carry select structures for large adder sizes.
- A note about the Chimera architecture - it has a shadow RFile in a CPU with the FPGA fabric above the carry chain.
Slide 16: Multiplication
Slide 17: The Array Multiplier
- You can't really optimize this design unless you optimize each path.
Slide 18: The Carry-Save Multiplier
- There's one critical path here - along the right-hand side and across the bottom.
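The carry-save idea can be sketched behaviorally in Python. This models each row as a bitwise 3:2 compression on whole words rather than the exact cell wiring on the slide, and the function name is hypothetical. Carries are saved and handed down to the next row instead of rippling sideways, so the only carry propagation across the word happens in the final adder along the bottom.

```python
def carry_save_multiply(a, b, width):
    """Unsigned multiply: carry-save rows, then one carry-propagate add."""
    sum_vec, carry_vec = 0, 0
    for i in range(width):
        # Partial product for bit i of b.
        partial = (a << i) if (b >> i) & 1 else 0
        # 3:2 compression, bitwise: no carry moves sideways within a row.
        new_sum = sum_vec ^ carry_vec ^ partial
        new_carry = ((sum_vec & carry_vec) | (sum_vec & partial)
                     | (carry_vec & partial)) << 1
        sum_vec, carry_vec = new_sum, new_carry
    # Vector-merging adder: the only place a carry ripples across the word.
    return sum_vec + carry_vec
```

At every row the invariant sum_vec + carry_vec = (sum of partial products so far) holds, which is why a single final addition recovers the product.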
Slide 19: Pipelining the Multiplier
Slide 20: Paper: PH96
- It's hard to believe that the manual synthesis was that bad.
- The point of reprogrammability of FPGAs was never really made.
- The authors didn't draw a lot of attention to the fact that their FIR was implemented on 5 FPGAs, while the DSP they used was just 1 chip.
Slide 21: Paper Problems
Slide 22: Summary
Scribed by Adrian Drury