Optimize Outer Loop: Broadcast Common Value
for i = 1 to n in place
  for j = 1 to k in parallel
    y_i += w_j * x_{i+j-1}
[Figure: linear array of cells holding weights W1, W2, ..., Wk, with input stream x1, x2, ... and output stream y1, y2, ...]
B1 From [Kung82]
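The loop nest above can be checked against a broadcast-style schedule with a small simulation. The following is a minimal Python sketch, not code from [Kung82]: it assumes 0-based lists, assumes the broadcast value is the input x (as in design B1, where the weights stay in the cells and the partial results move), and uses illustrative function names (convolve_direct, convolve_broadcast) that do not appear in the source.

```python
def convolve_direct(w, x):
    """Reference version of the slide's loop nest, shifted to 0-based indices:
    y[i] = sum over j of w[j] * x[i + j]."""
    k = len(w)
    n = len(x) - k + 1          # number of complete outputs
    y = [0] * n
    for i in range(n):          # outer loop over outputs
        for j in range(k):      # inner loop over weights
            y[i] += w[j] * x[i + j]
    return y


def convolve_broadcast(w, x):
    """Broadcast-style schedule (assumed reading of design B1):
    at step t, the single value x[t] is sent to every cell at once.
    Cell j keeps w[j] fixed and adds w[j] * x[t] to output y[t - j]."""
    k = len(w)
    n = len(x) - k + 1
    y = [0] * n
    for t, x_t in enumerate(x):     # one time step per input value
        for j in range(k):          # conceptually parallel across the k cells
            i = t - j               # output that cell j is updating at step t
            if 0 <= i < n:          # skip partial outputs at the edges
                y[i] += w[j] * x_t
    return y


if __name__ == "__main__":
    w = [1, 2, 3]
    x = [1, 2, 3, 4, 5]
    print(convolve_direct(w, x))     # [14, 20, 26]
    print(convolve_broadcast(w, x))  # [14, 20, 26]
```

Both functions perform the same multiply-accumulate operations; only the order differs. In the broadcast version, every cell uses the same x value during a step, which is why that value can be driven onto a common bus instead of being passed from cell to cell.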