We have shown that profiling feedback can help the compiler generate more effective prefetching code by giving it more information about the dynamic behavior of the application. However, incorporating feedback into the compilation process can be time-consuming and cumbersome, and it raises the problem of finding representative input data sets. A potentially better way to exploit dynamic information is to generate code that adapts at runtime. We have also demonstrated that, by reading user-visible hardware miss counters, software can adapt dynamically to the observed cache behavior to achieve the best overall performance. Given the success of such miss counters in BCOPY and LU, processor designers should seriously consider adding user-visible miss counters to their architectures. Beyond their benefit to adaptive code, these counters make it possible to collect detailed memory feedback information with very little overhead. In our experience, this information is quite valuable for debugging memory performance, and it is very useful when inserting prefetches into irregular codes (e.g., BARNES and PTHOR), where static locality analysis is extremely difficult.
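To make the idea of runtime adaptation concrete, the following is a minimal sketch of a block copy that samples a miss counter over a short probe region and then chooses between a prefetching and a non-prefetching loop. It is an illustration only, not the code used in our experiments. The counter interface (`miss_counter_fn`), the two stub counters, the probe size, the prefetch distance, and the threshold of one miss per four cache lines are all assumptions made for this example; `__builtin_prefetch` is the GCC/Clang intrinsic, standing in for whatever prefetch instruction the target machine provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface: returns a running count of data-cache misses.
   On real hardware this would read a user-visible miss counter. */
typedef unsigned long (*miss_counter_fn)(void);

/* Assumed parameters, chosen only for illustration. */
enum { PROBE_WORDS = 256, LINE_WORDS = 8, PREFETCH_AHEAD = 64 };

/* Stub counters so the sketch can run without hardware support. */
static unsigned long stub_total;
static unsigned long stub_counter_cold(void)
{
    /* Pretends that every cache line touched by the probe missed. */
    stub_total += PROBE_WORDS / LINE_WORDS;
    return stub_total;
}
static unsigned long stub_counter_warm(void)
{
    /* Pretends that the data is already in the cache. */
    return 0;
}

/* Plain copy loop with no prefetches. */
static void copy_plain(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy loop that prefetches one line of src and dst a fixed distance ahead. */
static void copy_prefetch(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % LINE_WORDS == 0 && i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&src[i + PREFETCH_AHEAD], 0, 3); /* read  */
            __builtin_prefetch(&dst[i + PREFETCH_AHEAD], 1, 3); /* write */
        }
        dst[i] = src[i];
    }
}

/* Copies n words; returns 1 if the prefetching loop was selected, else 0. */
int adaptive_copy(double *dst, const double *src, size_t n,
                  miss_counter_fn read_misses)
{
    assert(read_misses != NULL);

    /* Copy a short probe region while sampling the miss counter. */
    size_t probe = n < (size_t)PROBE_WORDS ? n : (size_t)PROBE_WORDS;
    unsigned long before = read_misses();
    copy_plain(dst, src, probe);
    unsigned long misses = read_misses() - before;

    /* Assumed threshold: more than one miss per four lines touched. */
    int use_prefetch = misses * 4UL * LINE_WORDS > probe;

    /* Finish the copy with whichever loop the probe suggests. */
    if (use_prefetch)
        copy_prefetch(dst + probe, src + probe, n - probe);
    else
        copy_plain(dst + probe, src + probe, n - probe);
    return use_prefetch;
}
```

Calling `adaptive_copy(dst, src, n, stub_counter_cold)` selects the prefetching loop, while passing `stub_counter_warm` selects the plain loop; in both cases the destination ends up identical to the source. In a real setting the threshold and probe length would need to be tuned to the machine, and the probe could be repeated periodically so that the code keeps tracking changes in cache behavior.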