C++ Performance Tips
Table of Contents
Introduction
These tips are based mainly on ideas from the book
Efficient C++
by Dov Bulka and David Mayhew. For a more thorough treatment of performance
programming with C++, I highly recommend this book. This document, while
presenting many of the same ideas in the book, does not go into as much
detail as to why certain techniques are better than others. The book also
provides code examples to illustrate many of the points presented here.
Constructors and Destructors
- The performance of constructors and destructors is often poor due to the
fact that an object's constructor (destructor) may call the constructors
(destructors) of member objects and parent objects. This can result in
constructors (destructors) that take a long time to execute, especially with
objects in complex hierarchies or objects that contain several member objects.
As long as all of the computations are necessary, then there isn't really a way
around this. As a programmer, you should at least be aware of this "silent
execution".
If all of the computations mentioned above are not necessary, then they should
be avoided. This seems like an obvious statement, but you should be sure that
the computations performed by the constructor that you are using is doing
only what you need.
-
Objects should be only created when they are used. A good technique is to put
off object creation to the scope in which it is used. This prevents
unnecessary constructors and destructors from being called.
-
Using the initializer list functionality that C++ offers is very important
for efficiency. All member objects that are not in the initializer list are
by default created by the compiler using their respective default constructors.
By calling an object's constructor in the initializer list, you avoid having
to call an object's default constructor and the overhead from an assignment
operator inside the constructor. Also, using the initializer list may reduce
the number of temporaries needed to construct the object. See the
Temporaries section for more information on this.
Virtual Functions
-
Virtual functions negatively affect performance in 3 main ways:
- The constructor of an object containing virtual functions must initialize
the
vptr
table, which is the table of pointers to its member
functions.
- Virtual functions are called using pointer indirection, which results in
a few extra instructions per method invocation as compared to a non-virtual
method invocation.
- Virtual functions whose resolution is only known at run-time cannot be
inlined. (For more on inlining, see the Inlining
section.
-
Templates can be used to avoid the overhead of virtual functions by using
a templated class in place of inheritance. A templated class does not use
the
vptr
table because the type of class is known at compile-time
instead of having to be determined at run-time. Also, the non-virtual
methods in a templated class can be inlined.
- The cost of using virtual functions is usually not a factor in calling
methods that take a long time to execute since the call overhead is dominated
by the method itself. In smaller methods, for example accessor methods, the
cost of virtual functions is more important.
Return Value
Methods that must return an object usually have to create an object to return.
Since constructing this object takes time, we want to avoid it if
possible. There are several ways to accomplish this.
-
Instead of returning an object, add another parameter to the method which
allows the programmer to pass in the object in which the programmer
wants the result stored.
This way the method won't have to create an extra
object. It will simply use the parameter passed to the method.
This technique is called Return Value Optimization (RVO).
-
Whether or not RVO will result in an actual optimization is up to the compiler.
Different compilers handle this differently. One way to help the compiler
is to use a computational constructor. A computational constructor can be
used in place of a method that returns an object. The computational
constructor takes the same parameters as the method to be optimized, but
instead of returning an object based on the parameters, it initializes itself
based on the values of the parameters.
Temporaries
Temporaries are objects that are "by-products" of a computation. They are not
explicitly declared, and as their name implies, they are temporary. Still,
you should know when the compiler is creating a temporary object because it
is often possible to prevent this from happening.
-
The most common place for temporaries to occur is in passing an object to a
method by value. The formal argument is created on the stack. This can be
prevented by using pass by address or pass by reference.
-
Compilers may create a temporary object in assignment of an object. For example,
a constructor that takes an
int
as an argument may be assigned
an int
. The compiler will create a temporary object using the
int
as the parameter and then call the assignment operator
on the object. You can prevent the compiler from doing this behind your back
by using the explicit
keyword in the declaration of the
constructor.
-
When objects are returned by value, temporaries are often used. See the
Return Value section for more on this.
-
Temporaries can be avoided by using
<op>=
operators. For
example, the code
a = b + c;
could be written as
a=b;
a+=c;
.
Inlining
Inlining is one of the easiest optimizations to use in C++ and it can result
in the most dramatic improvements in execution speed. The main thing to
know when using inlining is when you should inline a method and when you
shouldn't inline.
-
There is always a trade-off between code size and execution speed when
inlining.
In general, small methods (for example, accessors) should
be inlined and large methods should not be inlined.
-
If you are not sure of whether or not a given method should be inlined, the
best way to decide is to profile the code. That is, run test samples of the
code, timing inlining and non-inlining versions.
-
Excessive inlining can drastically increase code size, which can result in
increased execution times because of a resulting lower cache hit rate.
-
Watch out for inlined methods that make calls to other inlined methods. This
can make the code size unexpectedly larger.
-
Singleton methods, methods that are only called from one place in a program,
are ideal for inlining. The code size does not get any bigger and execution
speed only gets better.
-
Using literal arguments with an inlined method allows the compiler to make
significant optimizations. (This is, however, compiler dependent.)
-
The compiler preprocessor can be used to implement conditional inlining. This
is useful so that during testing the code is easier to debug. But for
compiling production code, there are no changes to be made to the source
code. This is implemented by using a preprocessor macro called
INLINE
. Inlined code is defined within #ifdef INLINE ... #endif
code blocks. Similarly, non-inlined code is defined within #ifndef INLINE ... #endif
code blocks. Then to compile using inlined code, you tell the compiler to treat
INLINE
as defined. (-DINLINE
with g++
)
-
Sometimes it makes sense to inline a given method in some places, but to not
inline in other places within the same program.
This can be accomplished using a technique called selective
inlining. The implementation of this technique is not very convenient. For each
method that you want to selectively inline you have two methods,
where on one has "inline_" prepended to the method name and is of course inlined. The method without
the "inline_" prepended to its name simply calls the inlined version of the
method.
-
Recursive calls cannot be inlined, but there are two techniques to try to
improve performance in the case of recursive methods:
-
If the recursion is tail recursion, then the algorithm can be rewritten as an
iterative algorithm, eliminating the overhead of method invocations.
-
Recursive call unrolling basically allows the programmer to inline the first
steps of recursion in a recursive algorithm. For example, for a recursive
method
print()
we might do the following: print_unrolled()
calls print1()
which calls print2()
which
calls print3()
which calls print()
. All methods
except print()
are inlined. The number of recursive steps can
be made as high as desired, depending on the application.
Please send comments (complaints), suggestions (corrections), and additions (subtractions) to Andrew Gilpin.
Thanks to Andreas Franke for correcting an error.
Page last modified: October 5, 2002