C++ Performance Tips

Introduction
Constructors and Destructors
Virtual Functions
Return Value
Temporaries
Inlining

Introduction

These tips are based mainly on ideas from the book Efficient C++ by Dov Bulka and David Mayhew. For a more thorough treatment of performance programming with C++, I highly recommend this book. This document, while presenting many of the same ideas in the book, does not go into as much detail as to why certain techniques are better than others. The book also provides code examples to illustrate many of the points presented here.

Constructors and Destructors

The performance of constructors and destructors is often poor due to the fact that an object's constructor (destructor) may call the constructors (destructors) of member objects and parent objects. This can result in constructors (destructors) that take a long time to execute, especially with objects in complex hierarchies or objects that contain several member objects. As long as all of the computations are necessary, then there isn't really a way around this. As a programmer, you should at least be aware of this "silent execution".
If all of the computations mentioned above are not necessary, then they should be avoided. This seems like an obvious statement, but you should be sure that the computations performed by the constructor that you are using is doing only what you need.
Objects should be only created when they are used. A good technique is to put off object creation to the scope in which it is used. This prevents unnecessary constructors and destructors from being called.
Using the initializer list functionality that C++ offers is very important for efficiency. All member objects that are not in the initializer list are by default created by the compiler using their respective default constructors. By calling an object's constructor in the initializer list, you avoid having to call an object's default constructor and the overhead from an assignment operator inside the constructor. Also, using the initializer list may reduce the number of temporaries needed to construct the object. See the Temporaries section for more information on this.

Virtual Functions

Virtual functions negatively affect performance in 3 main ways:
1. The constructor of an object containing virtual functions must initialize the vptr table, which is the table of pointers to its member functions.
2. Virtual functions are called using pointer indirection, which results in a few extra instructions per method invocation as compared to a non-virtual method invocation.
3. Virtual functions whose resolution is only known at run-time cannot be inlined. (For more on inlining, see the Inlining section.
Templates can be used to avoid the overhead of virtual functions by using a templated class in place of inheritance. A templated class does not use the vptr table because the type of class is known at compile-time instead of having to be determined at run-time. Also, the non-virtual methods in a templated class can be inlined.
The cost of using virtual functions is usually not a factor in calling methods that take a long time to execute since the call overhead is dominated by the method itself. In smaller methods, for example accessor methods, the cost of virtual functions is more important.

Return Value

Methods that must return an object usually have to create an object to return. Since constructing this object takes time, we want to avoid it if possible. There are several ways to accomplish this.

Instead of returning an object, add another parameter to the method which allows the programmer to pass in the object in which the programmer wants the result stored. This way the method won't have to create an extra object. It will simply use the parameter passed to the method. This technique is called Return Value Optimization (RVO).
Whether or not RVO will result in an actual optimization is up to the compiler. Different compilers handle this differently. One way to help the compiler is to use a computational constructor. A computational constructor can be used in place of a method that returns an object. The computational constructor takes the same parameters as the method to be optimized, but instead of returning an object based on the parameters, it initializes itself based on the values of the parameters.

Temporaries

Temporaries are objects that are "by-products" of a computation. They are not explicitly declared, and as their name implies, they are temporary. Still, you should know when the compiler is creating a temporary object because it is often possible to prevent this from happening.

The most common place for temporaries to occur is in passing an object to a method by value. The formal argument is created on the stack. This can be prevented by using pass by address or pass by reference.
Compilers may create a temporary object in assignment of an object. For example, a constructor that takes an int as an argument may be assigned an int. The compiler will create a temporary object using the int as the parameter and then call the assignment operator on the object. You can prevent the compiler from doing this behind your back by using the explicit keyword in the declaration of the constructor.
When objects are returned by value, temporaries are often used. See the Return Value section for more on this.
Temporaries can be avoided by using <op>= operators. For example, the code
a = b + c;
could be written as
a=b; a+=c;.

Inlining

Inlining is one of the easiest optimizations to use in C++ and it can result in the most dramatic improvements in execution speed. The main thing to know when using inlining is when you should inline a method and when you shouldn't inline.

There is always a trade-off between code size and execution speed when inlining. In general, small methods (for example, accessors) should be inlined and large methods should not be inlined.
If you are not sure of whether or not a given method should be inlined, the best way to decide is to profile the code. That is, run test samples of the code, timing inlining and non-inlining versions.
Excessive inlining can drastically increase code size, which can result in increased execution times because of a resulting lower cache hit rate.
Watch out for inlined methods that make calls to other inlined methods. This can make the code size unexpectedly larger.
Singleton methods, methods that are only called from one place in a program, are ideal for inlining. The code size does not get any bigger and execution speed only gets better.
Using literal arguments with an inlined method allows the compiler to make significant optimizations. (This is, however, compiler dependent.)
The compiler preprocessor can be used to implement conditional inlining. This is useful so that during testing the code is easier to debug. But for compiling production code, there are no changes to be made to the source code. This is implemented by using a preprocessor macro called INLINE. Inlined code is defined within #ifdef INLINE ... #endif code blocks. Similarly, non-inlined code is defined within #ifndef INLINE ... #endif code blocks. Then to compile using inlined code, you tell the compiler to treat INLINE as defined. (-DINLINE with g++)
Sometimes it makes sense to inline a given method in some places, but to not inline in other places within the same program. This can be accomplished using a technique called selective inlining. The implementation of this technique is not very convenient. For each method that you want to selectively inline you have two methods, where on one has "inline_" prepended to the method name and is of course inlined. The method without the "inline_" prepended to its name simply calls the inlined version of the method.
Recursive calls cannot be inlined, but there are two techniques to try to improve performance in the case of recursive methods:
1. If the recursion is tail recursion, then the algorithm can be rewritten as an iterative algorithm, eliminating the overhead of method invocations.
2. Recursive call unrolling basically allows the programmer to inline the first steps of recursion in a recursive algorithm. For example, for a recursive method print() we might do the following: print_unrolled() calls print1() which calls print2() which calls print3() which calls print(). All methods except print() are inlined. The number of recursive steps can be made as high as desired, depending on the application.

Please send comments (complaints), suggestions (corrections), and additions (subtractions) to Andrew Gilpin.

Thanks to Andreas Franke for correcting an error.

Page last modified: October 5, 2002