We have carried a preliminary study to assess the properties of the spatial model of computation. Our study shows that this paradigm has some non-intuitive properties, which are different from the classical model of computation, in which a processor interprets a program in machine-code.
|
In this image we see the spatial layout of a program from the Mediabench benchmark suite; the layout was automatically generated by a placer tool, based on the profiled execution of the code. Each square is a cluster of computation and memory having 100 units of area, where 1 unit is roughly one 32-bit integer operation or one 32-bit memory word. Green indicates memory, white program code. The edges indicate communication: red is control-flow transfers, while blue is memory access. The thickness of an edge is proportional to the logarithm of the number of times the edge is used during the program execution. Communication along one edge takes time proportional to the length of the edge. |
|
In all the programs we have analyzed we noticed the existence of some very large "stars" in the layout: nodes which have a lot of neighbors. These nodes will impair the timing, because there is no 2D layout which can place all the neighbors close to the center.
By analyzing the program we discovered that one of the star centers it the memcpy library function. This function touches most of the memory of this program.
By applying a very simple classical optimization we have dramatically improved the program layout, and implicitly its performance. What we did is to inline the body of the memcpy function into some of its callers. This has effectively caused the star to break into several smaller stars, which can have a much better layout. |