Our analysis by itself does not yield any performance improvement. But when the result of the analysis is compiled by GCC, the compiled program may run faster, especially when it contains redundant function calls that can be optimized away. To measure this improvement, we wrote four microbenchmarks.
All the benchmarks were written in C. We created two versions of each benchmark: an original version without our analysis, and a modified version with it. To obtain the modified (analyzed) version, we first compiled the C source file of the benchmark to its SUIF representation using the c2s program that comes with a standard SUIF2 distribution, then applied our analysis pass to the resulting SUIF representation, and finally converted the result back to C using SUIF's s2c pass. To obtain the original version, we took the same steps except for running our analysis: we converted the C source file of the benchmark to SUIF using c2s and converted the SUIF representation back to C using s2c. One might ask why these steps are necessary for the original version instead of simply using the original C source file. We did this because the s2c pass might produce a C representation that differs, if only slightly, from the original source file, and we wanted to eliminate any variability this might cause.
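The round trip described above can be sketched as the following shell commands. The file names and the driver name for our analysis pass (do_attr here) are hypothetical, and the exact invocation of a SUIF2 pass may differ between installations:

```shell
# Modified (analyzed) version: C -> SUIF -> our analysis -> C
c2s bench1.c bench1.suif             # compile C to its SUIF representation
do_attr bench1.suif bench1-opt.suif  # hypothetical driver for our analysis pass
s2c bench1-opt.suif bench1-opt.c     # convert the analyzed SUIF back to C

# Original version: the same round trip, but without the analysis pass
c2s bench1.c bench1-orig.suif
s2c bench1-orig.suif bench1-orig.c
```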
Both C files produced by s2c were then compiled using GCC with the options -O3 and -fno-inline. The option -O3 allows GCC to optimize calls to functions with the const or pure attribute; however, -O3 also enables function inlining, so we disabled inlining with -fno-inline, because we are trying to measure the cost of the function calls themselves. We used GCC 3.2.2 on a Red Hat Linux 9 machine. This step produces executables for both the original and the modified version of each benchmark. We then ran each version of each benchmark ten times on a Pentium IV 2.4 GHz machine with 1 GB of RAM running Red Hat Linux 9. Table 2 contains the results of the benchmark runs, reporting the user time as measured by the UNIX utility time, averaged over the ten iterations.
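As an illustration (not one of our actual benchmarks), the following C fragment shows the kind of redundancy that -O3 can remove once a function carries the const attribute; the function names are made up for this sketch:

```c
/* A function whose result depends only on its arguments: marked with
   the const attribute, so GCC at -O3 may fold two identical calls
   into a single call. */
__attribute__((const))
double square_norm(double x, double y) {
    return x * x + y * y;
}

/* The second call to square_norm(a, b) is redundant; given the
   attribute, GCC is free to reuse the result of the first call. */
double sum_twice(double a, double b) {
    return square_norm(a, b) + square_norm(a, b);
}
```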
As the table shows, benchmarks 1, 3, and 4, all of which contain const functions, show speedups. However, for benchmark 2, which contains a pure function, we did not get any speedup. We were somewhat surprised by this result, so we inspected the assembly code GCC produced for the executable of benchmark 2. We found that although the second call to the pure function is redundant, because the global variable it reads is not changed between the calls, GCC does not optimize this case; GCC may simply not implement this optimization for pure functions yet. Regardless, at least for const functions, we have shown that optimizing redundant calls can yield a significant performance improvement. The improvement depends on the cost of the function being called, so the effect of the analysis and optimization will be more pronounced when there are many redundant calls to computationally expensive const functions (e.g., functions implementing common mathematical operations).
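To make the benchmark 2 situation concrete, the following sketch (with made-up names, not our actual benchmark code) shows the pattern that the GCC version we tested failed to optimize: a pure function reading global state that is not modified between two calls.

```c
/* Global state that is read, but never written, by the pure
   function below. */
static int table[4] = {1, 2, 3, 4};

/* pure: the result depends only on the arguments and on global
   memory, which the function promises not to modify. */
__attribute__((pure))
int table_sum(void) {
    int i, s = 0;
    for (i = 0; i < 4; i++)
        s += table[i];
    return s;
}

/* table is not written between the two calls, so the second call is
   redundant; still, the GCC 3.2.2 build we tested emitted both calls
   rather than reusing the first result. */
int double_sum(void) {
    return table_sum() + table_sum();
}
```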
One might also ask what improvement our analysis, combined with GCC, would yield when used to compile standard benchmark programs, for instance the programs in the applicable SPEC benchmark suite [8]. A limitation in our implementation prevents us from doing this. This limitation, the inability of our implementation to analyze across multiple compilation units, is discussed in the next section.