Optimize Outer Loop
Retime to Eliminate Broadcast
for i = 1 to n in place
    for j = 1 to k in parallel
        y[i] += w[j] * x[i+j-1]
[Figure: linear array of k cells labeled W1, W2, ..., Wk; the streams ". . . x2 x1" and ". . . y2 y1" pass through the cells.]
Similar to design W2 from [Kung82]
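
A minimal sketch of how the retimed array could behave, cycle by cycle. The slide does not give the timing, so the details here are assumptions: weights stay in the cells in reverse order, y advances one cell per cycle, x advances one cell every two cycles (two registers per cell), and the first y enters k-1 cycles after the first x. Indices are 0-based, and the function names are hypothetical. Each cell reads only its left neighbor, so no value is broadcast.

```python
def direct_convolution(w, x):
    """Reference loop nest: y[i] += w[j] * x[i+j] (0-based form of the slide's loop)."""
    k = len(w)
    n = len(x) - k + 1
    y = [0] * n
    for i in range(n):
        for j in range(k):
            y[i] += w[j] * x[i + j]
    return y


def systolic_convolution(w, x):
    """Simulate a broadcast-free linear array (assumed timing, see lead-in).

    Assumes len(x) >= len(w).
    """
    k = len(w)
    n = len(x) - k + 1
    cell_w = w[::-1]      # assumption: cell c holds w[k-1-c]
    xa = [0] * k          # first x register of each cell
    xb = [0] * k          # second x register (x moves at half the speed of y)
    y = [None] * k        # partial sums as (output index, value), or None if empty
    out = [0] * n

    for t in range(n + 2 * k - 2):
        # Values entering the leftmost cell this cycle.
        x_in = x[t] if t < len(x) else 0
        i = t - (k - 1)
        y_in = (i, 0) if 0 <= i < n else None

        # Synchronous update: every cell reads only its left neighbor.
        new_xa = [x_in] + xb[:-1]
        new_xb = xa[:]
        new_y = []
        for c in range(k):
            src = y_in if c == 0 else y[c - 1]
            if src is None:
                new_y.append(None)
            else:
                new_y.append((src[0], src[1] + cell_w[c] * new_xa[c]))
        xa, xb, y = new_xa, new_xb, new_y

        # A finished result leaves the rightmost cell.
        if y[k - 1] is not None:
            out[y[k - 1][0]] = y[k - 1][1]
    return out


if __name__ == "__main__":
    w = [1, 2, 3]
    x = [1, 2, 3, 4, 5]
    print(direct_convolution(w, x))    # [14, 20, 26]
    print(systolic_convolution(w, x))  # [14, 20, 26]
```

With this timing, y[i] sits in cell c at cycle i + (k-1) + c and x[m] sits there at cycle m + 2c, so the two meet exactly when m = i + (k-1-c), which is the operand the stored weight w[k-1-c] needs.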