As we saw earlier in Figure , WATER suffers the least from memory latency of all the SPLASH applications, spending only 7% of its time stalled for memory. Although there is little need for prefetching in this case, we nonetheless discovered that our algorithm is unable to cover the misses. The reason is that the key loop body is not in the same file as its surrounding loop. Since our prefetching algorithm does not perform interprocedural analysis (particularly not across separate files, which becomes very tricky given separate compilation), it fails to recognize the affine access patterns, and therefore does not insert any prefetches at all. With either interprocedural analysis or inlining across separate files, the compiler could easily prefetch the references and hide the memory latency. Since the solution to this problem is well understood, and since there is little performance gain to be had, we did not bother to insert the prefetches by hand for this case. WATER is thus an example of a case where strengthening the implementation of the existing algorithm would solve the problem.
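
To make the obstacle concrete, the following is a minimal sketch in C of the situation described above; it is not code from WATER. The names `body`, `sum_opaque`, `sum_inlined`, and `PREFETCH_DISTANCE` are hypothetical, and `__builtin_prefetch` is the GCC/Clang builtin, used here only as a stand-in for whatever prefetch instruction the compiler would emit. In `sum_opaque` the array reference is hidden behind a call, which we assume resolves to a function in another file; in `sum_inlined` the same reference appears directly in the loop, so its affine index is visible and a prefetch can be scheduled ahead of it.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. A real compiler
 * would derive this from the memory latency and the loop body cost. */
#define PREFETCH_DISTANCE 16

/* Stand-in for a loop body that would live in a separate source file.
 * It is defined here only so that the sketch is self-contained. */
double body(const double *pos, size_t i) {
    return pos[i] * 2.0;
}

/* The loop as seen without interprocedural analysis: the access to
 * pos[i] is hidden inside the call, so no affine reference is visible
 * in the loop and no prefetch is inserted. */
double sum_opaque(const double *pos, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += body(pos, i);
    return total;
}

/* The same loop after inlining: pos[i] is now plainly affine in i, so a
 * prefetch can be issued PREFETCH_DISTANCE iterations ahead. For
 * simplicity this sketch prefetches on every iteration instead of once
 * per cache line. */
double sum_inlined(const double *pos, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&pos[i + PREFETCH_DISTANCE], 0, 3);
        total += pos[i] * 2.0;
    }
    return total;
}
```

Both functions compute the same result; only the second exposes the access pattern in the loop where the prefetch must be scheduled.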