The deadline has been extended by 24 hours. Lab 4 is now due Friday, October 26 at 11:59pm.
Size | 64   | 128  | 256  | 512  | 1024 | Mean |
     | 1.00 | 3.50 | 3.50 | 3.37 | 2.92 | 2.61 |
Size | 64   | 128  | 256  | 512  | 1024 | Mean |
     | 1.84 | 4.81 | 3.14 | 3.09 | 3.86 | 3.19 |
Size | 32   | 64   | 128  | 256  | 512  | Mean |
     | 7.76 | 7.64 | 7.62 | 6.97 | 6.83 | 7.36 |
For example, consider a 1024 x 1024 matrix. Each row has 1K pixels, each 6 bytes wide, so each row of the matrix occupies 6 Kbytes. Now consider the pixels in column j: if src[i][j] is at address A, then src[i+1][j] is at address A+6K, src[i+2][j] is at address A+12K, and so on.
The L1 cache we're targeting is 16 KB and 4-way associative, so each way covers 16 KB / 4 = 4 KB. This means that addresses A and A+m*4K map to the same set, for any integer m.
In particular, src[i][j] at address A and src[i+2][j] at address A+12K map to the same set, since 12K is a multiple of 4K. A set can hold at most 4 cache lines, so if more than 4 blocks of the matrix map to the same set, the last one accessed is likely to evict the first. Accessing the first block again then misses and pays a performance penalty.
For smaller matrices this effect is less pronounced, because locations in the same column are closer together. For example, at size 64 each row is 64 * 6 = 384 bytes, so if src[i][j] is at address B, then src[i+2][j] is at address B+768.