Scribe Notes -- Jan. 21, 1998

People have been having problems with XDelay - especially on the AIX boxes. Try the Solaris boxes if you have problems on AIX.

Parallelization is important.
There's the issue of throughput vs. area, although on many FPGAs you get pipelining for free because of the registers on the logic blocks.

Xilinx claims their ripple carry chain delays aren't noticeable until 32 ripples, but from experience it seems more reasonable to peg that number around 16.
A variety of adders work well in FPGAs - carry bypass, carry select, and carry look-ahead.

This is not really done in ASICs because registers are almost as expensive as adders.
This is not really an issue in FPGAs because, as mentioned above, there are registers for free on the outputs of the logic blocks.
For chaining, after the initial registers are allocated for delay, register cost is light.

If all the propagates are 1, then Cin is passed to Cout with a MUX. Skip all the carry logic within the cells. Inexpensive.
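
A minimal Python sketch of the bypass idea, assuming LSB-first bit lists. The names ripple_block and carry_skip_block are illustrative and do not come from the slides. The block still ripples internally for the sum bits, but when every propagate is 1 the MUX forwards Cin straight to Cout.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry add of two equal-length, LSB-first bit lists."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_skip_block(a_bits, b_bits, cin):
    """One carry-bypass block: returns (sum bits, carry out, bypass taken)."""
    propagates = [a ^ b for a, b in zip(a_bits, b_bits)]
    sums, ripple_cout = ripple_block(a_bits, b_bits, cin)
    bypass = all(propagates)  # AND of all the propagate signals
    # The MUX: pick Cin when the whole block propagates, else the rippled carry.
    cout = cin if bypass else ripple_cout
    return sums, cout, bypass
```

In software both carries are computed anyway; in hardware the point is that the bypassed carry reaches the next block without waiting for the ripple inside this one.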

This precomputes two answers - one for Cin=0 and one for Cin=1. A MUX is then used to select between the two when the carry arrives. This is expensive in terms of area because it nearly doubles the required number of adders.
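
A rough Python sketch of the carry-select idea, assuming unsigned integer operands split into fixed-width blocks. The names add_block and carry_select_add and the block parameter are illustrative assumptions. Each block computes both answers up front, and the incoming carry only drives the selection.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values plus a carry in; return (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin, width, block):
    """Carry-select addition of two width-bit values using blocks of `block` bits."""
    result, carry = 0, cin
    for lo in range(0, width, block):
        w = min(block, width - lo)
        mask = (1 << w) - 1
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # Precompute both answers: this is the duplicated adder hardware.
        sum0, cout0 = add_block(a_blk, b_blk, 0, w)
        sum1, cout1 = add_block(a_blk, b_blk, 1, w)
        # The MUX: the arriving carry selects one of the two precomputed results.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << lo
    return result, carry
```

For example, carry_select_add(0xFFFF, 1, 0, 16, 4) returns (0, 1): the sum wraps to zero and the carry out is 1.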

Assumption: Adder delays = width of adder. Mux delays = 1.
The critical path here is clearly the path across the MUXes, and it can be reduced as shown on the next slide.
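
To make the delay assumption concrete, here is a small Python sketch of the model. It additionally assumes the first block is a plain adder and that every later block contributes one MUX; the function name carry_select_delay is hypothetical.

```python
def carry_select_delay(block_widths):
    """Arrival time of the final carry under: adder delay = width, MUX delay = 1."""
    # First block: its carry out is ready after its own width.
    arrival = block_widths[0]
    for w in block_widths[1:]:
        # The two precomputed results are ready at time w; the MUX has to wait
        # for whichever comes later, the incoming carry or the results.
        arrival = max(arrival, w) + 1
    return arrival


linear_16 = carry_select_delay([4, 4, 4, 4])  # 4 for the adders + 3 MUXes = 7
ripple_16 = 16                                # one 16-bit adder, delay = width
```

With four equal 4-bit blocks, the adders finish at time 4 and the remaining 3 units are all MUX delay, which is the path the note refers to.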

This ensures that each carry arrives at the MUX when the corresponding sum is ready.
The amount of hardware is more than double that required for the ripple carry adder.
The amount of hardware is not any more than that required for the linear carry select adder.
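
A sketch of how the staggered block widths could be chosen, under the assumption stated earlier (adder delay = width, MUX delay = 1). The function name staggered_widths and the choice of a 2-bit first block are illustrative assumptions, not taken from the slides.

```python
def staggered_widths(total_bits, first=2):
    """Pick block widths so each block's sums are ready just as its carry arrives.

    Returns (list of block widths, arrival time of the final carry).
    """
    widths = [first]
    arrival = first  # carry out of the first (plain) block
    while sum(widths) < total_bits:
        # Make the next block as wide as the time already spent waiting,
        # capped by the number of bits still left to cover.
        w = min(arrival, total_bits - sum(widths))
        widths.append(w)
        arrival = max(arrival, w) + 1  # one MUX delay per block
    return widths, arrival


widths_16, delay_16 = staggered_widths(16)  # ([2, 2, 3, 4, 5], 6)
```

For 16 bits this gives five blocks and a delay of 6, against 7 for four equal 4-bit blocks under the same model, with the same total number of duplicated adder bits.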

The parentheses for the 2-bit adder generate signal are like this: (G0 & P1) | G1
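
A short Python check of that grouping, assuming the usual per-bit definitions G = a & b and P = a ^ b (the notes do not spell these out). The names bit_gp and combine are illustrative.

```python
def bit_gp(a, b):
    """Per-bit (generate, propagate) signals."""
    return a & b, a ^ b


def combine(low, high):
    """Merge the (G, P) pair of a low bit with that of the next higher bit."""
    g0, p0 = low
    g1, p1 = high
    group_g = (g0 & p1) | g1  # the parenthesization from the note
    group_p = p0 & p1         # the group propagates only if both bits do
    return group_g, group_p


def two_bit_generate(a, b):
    """Group generate for 2-bit operands a and b."""
    low = bit_gp(a & 1, b & 1)
    high = bit_gp((a >> 1) & 1, (b >> 1) & 1)
    return combine(low, high)[0]
```

For every pair of 2-bit operands, two_bit_generate(a, b) equals the carry out of a + b with Cin = 0.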

Constant addition doesn't gain you a whole lot in terms of speed, although it can reduce some full adders to half adders.
Additionally, trailing zeros can be turned into wires.
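
A Python sketch of both simplifications, assuming LSB-first bit lists and a constant known at design time. The function add_constant and the cell labels are hypothetical names used only for illustration.

```python
def to_bits(value, width):
    """LSB-first list of the low `width` bits of value."""
    return [(value >> i) & 1 for i in range(width)]


def add_constant(a_bits, k_bits):
    """Add a fixed constant (k_bits) to a_bits; return (sums, carry out, cells)."""
    sums, cells, carry, seen_one = [], [], 0, False
    for a, k in zip(a_bits, k_bits):
        if k == 0 and not seen_one:
            # Trailing zero of the constant: no carry can exist yet, so the
            # sum bit is just the input bit, i.e. a wire.
            sums.append(a)
            cells.append("wire")
        elif k == 0:
            # Constant bit 0: the full adder collapses to a half adder.
            sums.append(a ^ carry)
            carry = a & carry
            cells.append("half adder")
        else:
            # Constant bit 1: still only two variable inputs (a and carry).
            seen_one = True
            sums.append(1 ^ a ^ carry)
            carry = a | carry
            cells.append("reduced (k=1)")
    return sums, carry, cells
```

For the 4-bit constant 4 (binary 0100), the cell list comes out as two wires, one reduced cell, and one half adder. The carry chain is still there from the first 1 bit upward, which fits the observation that the speed gain is small.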

It seems as if they threw hardware at speed.
What's the best case? They don't say. It might be really good, depending on exactly where the carry chain was lined up in the logic block row.
There aren't any comments on scalability up to, say, 128 bits.
Is it really necessary for the carry chain to be a drop-in replacement for all the original functionality? It seems as if hardware could be saved if some of the features were cut.
One issue is that pipelining the Brent-Kung adder takes a lot of storage space.
Most microprocessors use a Brent-Kung adder structure along with other carry select structures for large adder sizes.
A note about the Chimera architecture - it has a shadow register file in a CPU with the FPGA fabric above the carry chain.

There's one critical path here - along the right-hand side and across the bottom.

It's hard to believe that the manual synthesis was that bad.
The point of reprogrammability of FPGAs was never really made.
The authors didn't draw a lot of attention to the fact that their FIR was implemented on 5 FPGAs, while the DSP implementation used just 1 chip.