Algorithm Library Design

2. The C++ Language


Introduction

A basic familiarity with C++ is assumed. We briefly review classes, member functions, constructors and destructors, derivation, virtual member functions, and static class members. We then continue with an introduction to templates. Recommended introductions to C++ are [Stroustrup97] (advanced), [Lippman98] (more basic), and [ISO-C++-98] (the actual standard ;-).


Classes and Member Functions

struct A {
    int i;
};
We call the type A a class. If we create a variable of type A, we call this variable an object of class A, which is also known as an instance of class A. Each object of type A has a member variable i which can be accessed with the dot notation:
int main() {
    A a;
    a.i = 5;
}
C++ provides access control for class members, which can be private, protected, or public. Private members can only be used within the class. Protected members can also be used by derived classes. Public members can be used from everywhere. The members of a struct are public by default, the members of a class are private by default. Thus the above definition of A is equivalent to:
class A {
public:
    int i;
};
We will use struct or class interchangeably, whichever default is more convenient.

A member function looks like a normal function declaration within a class definition. It is usually placed in a header file *.h.

class A {
    int i; // private
public:
    int  get_i();
    void set_i( int n);
};
A member function's definition is written outside of the class definition and uses the scope operator :: to name the function within the class. It is usually placed in a source file *.C. Member functions can directly access all other class members. To accomplish this, the C++ compiler automatically adds a hidden function parameter that points to the current object. This hidden function parameter is named this and is of type A*.
int A::get_i() {
    return i;
}
void A::set_i( int n) {
    i = n;
}
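The hidden parameter can also be written out explicitly. The following is only a small sketch; the class name Cell is made up for illustration. It uses this to tell a member variable apart from a function parameter of the same name:

```cpp
class Cell {
    int i; // private
public:
    int  get_i();
    void set_i( int i);
};

int Cell::get_i() {
    return this->i;   // same as "return i;"
}
void Cell::set_i( int i) {
    this->i = i;      // the parameter i hides the member i; this-> names the member
}
```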
The member variable i is now inaccessible from the outside and the object can only be manipulated through its member functions.
int main() {
    A a;
    a.set_i(5);
    int j = a.get_i();
}
For efficiency, functions and member functions (collectively called functions in the following) can be declared inline, which advises the compiler to replace a function call directly with the function body if possible, instead of creating a function call and a separately compiled function body. Small inline functions can lead to faster and smaller code. Larger inline functions can still lead to faster, but larger code. Inline functions have to be defined before their use, thus their implementation usually moves to the header file. If the member function is defined within the class, the inline declaration is automatically assumed.
class A {
    int i; // private
public:
    int  get_i()       { return i; }  // both inline
    void set_i( int n) { i = n; }
};
Variables can be declared const in C++. To check const correctness across classes and member functions, a member function has to be declared const if it does not change the member variables of the class. Otherwise, the member function cannot be called for an object that is declared const. So the complete definition of our example class A looks as follows: (See also const correctness.)
class A {
    int i; // private
public:
    int  get_i() const { return i; }  // inline and const
    void set_i( int n) { i = n; }     // inline
};

int main() {
    A a;
    a.set_i(5);
    int j = a.get_i();
    
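    const A a2 = a;  // constant copy of a; a plain "const A a2;" is rejected
                     // because A has no user-declared default constructor
    // a2.set_i(5);  // this is forbidden by the C++ type system
    int k = a2.get_i();  // fine, get_i is a const member function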
}

Constructors, Assignment, and Destructor

Constructors are special `member functions' of which exactly one is called automatically at creation time of an object. Which one depends on the form of the variable declaration. Symmetrically, there exists a destructor which is automatically called at destruction time of an object.

Three types of constructors are distinguished: default constructor, copy constructor, and all others (user defined). A constructor has the same name as the class and it has no return type. The parameter list for the default and the copy constructor are fixed. The user defined constructors can have arbitrary parameter lists following C++ overloading rules.

class A {
    int i; // private
public:
    A();             // default constructor
    A( const A& a);  // copy constructor
    A( int n);       // user defined
    ~A();            // destructor
};

int main() {
    A a1;  // calls the default constructor
    A a2 = a1;  // calls the copy constructor (not the assignment operator)
    A a3(a1);   // calls the copy constructor (usual constructor call syntax)
    A a4(1);    // calls the user defined constructor
}  // automatic destructor calls for a4, a3, a2, a1 at the end of the block
The compiler generates a missing default constructor, copy constructor, or destructor automatically. The default implementation of the default constructor calls the default constructors of all class member variables. The default constructor is not automatically generated if any other constructor is explicitly declared in the class (this includes an explicitly declared copy constructor). The default copy constructor calls the copy constructors of all class member variables, which performs a bitwise copy for built-in data types. The default destructor calls the destructors of all class member variables, which does nothing for built-in data types. All default implementations can be explicitly inhibited by declaring the respective constructor/destructor in the private part of the class.
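As an illustration of inhibiting the defaults, here is a minimal sketch of a class whose objects cannot be copied. The class Lock and the helper lock_demo are hypothetical names, not part of the course sources. The copy constructor and the assignment operator are declared private and never defined:

```cpp
class Lock {
    int id;
    Lock( const Lock&);             // private and undefined: no copying
    Lock& operator=( const Lock&);  // private and undefined: no assignment
public:
    explicit Lock( int n) : id(n) {}
    int get_id() const { return id; }
};

int lock_demo() {
    Lock a(7);
    // Lock b = a;   // error: the copy constructor is private
    // Lock c;       // error: no default constructor is generated
    return a.get_id();
}
```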

Constructors initialize member variables. A new syntax, which resembles constructor calls, allows us to call the respective constructors of the member variables (instead of assigning new values). The constructor call syntax also extends to built-in types. The order of the initializations should follow the order of the declarations, since members are always initialized in declaration order; multiple initializations are separated by commas.

class A {
    int i; // private
public:
    A() : i(0) {}               // default constructor
    A( const A& a) : i(a.i) {}  // copy constructor, equal to the compiler default
    A( int n) : i(n) {}         // user defined
    ~A() {}                     // destructor, equal to the compiler default
};
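The order of initialization matters as soon as one member is computed from another. A small sketch, with a made-up struct Range: lo is declared before size, so it is already initialized when size is computed.

```cpp
struct Range {
    int lo;    // declared first, therefore initialized first
    int size;  // declared second, may safely use lo
    Range( int a, int b) : lo(a), size(b - lo) {}
};
```

If the two declarations were swapped, size would be computed from a not yet initialized lo, no matter in which order the initializers are written.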
Usually, only the default constructor (if the semantics are reasonable), and some user defined constructors are defined for a class. As soon as the class manages some external resources, e.g., dynamically allocated memory, the following four implementations have to work together to avoid resource allocation errors: default constructor, copy constructor, assignment operator, and destructor. Note that the compiler would create a default implementation for the assignment operator if it is not defined explicitly. See the following example and [Item 11 and 17, Meyers97]. Note the use of this, the pointer to the current object. (see also Buffer.C)
#include <string.h>  // memcpy

class Buffer {
    char* p;
public:
    Buffer() : p( new char[100]) {}
    ~Buffer() { delete[] p; }

    Buffer( const Buffer& buf) : p( new char[100]) {
        memcpy( p, buf.p, 100);
    }
    void swap( Buffer& buf) {
        char* tmp = p;
        p = buf.p;
        buf.p = tmp; 
    }
    Buffer& operator=( const Buffer& buf) {
        // Check for self-assignment, but it's only an optimization
        if ( this != & buf) {
            // In general: perform copy constructor and destructor
            // Make sure that self-assignment is not harmful.
            Buffer newbuf( buf); // create the new copy
            swap( newbuf);       // exchange it with 'this'
            // the destructor of newbuf cleans up the data previously 
            // stored in 'this'.
        }
        return *this;
    }
};

Automatic Conversion and the explicit Keyword for Constructors

A user-defined constructor with a single argument defines a conversion between two types; the type of the argument and the class the constructor belongs to. The C++ compiler is allowed to perform this conversion automatically to find a matching function call.
struct Buffer {
    Buffer( int n);  // constructor to allocate n bytes for buffer
    // ...
};

void print( const Buffer& buf); // a function to print the buffer contents

int main() {
    print( 5); // oops, a temporary Buffer initialized with 5 will be created
}
These automatic conversions can be a source of errors that are difficult to spot. It is advised to forbid them with the new keyword explicit.
struct Buffer {
    explicit Buffer( int n);
    // ...
};
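With explicit in place, the conversion still works when it is spelled out. A minimal sketch with hypothetical names (Storage, bytes, explicit_demo):

```cpp
struct Storage {
    int size;
    explicit Storage( int n) : size(n) {}
};

int bytes( const Storage& s) { return s.size; }

int explicit_demo() {
    // return bytes( 5);         // error: no implicit conversion from int
    return bytes( Storage(5));   // fine: the conversion is requested explicitly
}
```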

Ambiguity between Function-Style Cast and Declaration

The constructor notation and its implied function-style cast give rise to a couple of ambiguities in C++, given the heavily overloaded use of parentheses. In case of an ambiguity between a statement and a declaration, the compiler chooses the declaration.
struct S {
    S(int);
};

void foo( double d) {
    S v( int(d));     // function declaration
    S w = S( int(d)); // object declaration
    S x(( int(d)));   // object declaration
}
v is a function declaration because the parentheses around d are redundant, so S v( int d); is obviously a function declaration and not an object of type S that gets initialized with d cast to int.

To get the second interpretation we have to disambiguate the declaration explicitly, either by using the other initializer notation as for w or by adding parentheses that exclude the function declaration as for x. (see also decl.C)


Derivation

If we derive a class B from a base class A, B inherits all member variables and all member functions from A, but it can access only those that are not private. We consider only public inheritance here (inheritance can also be qualified as private).
class B : public A {
    int j;
};
B now has two integer member variables. Objects of class B can be assigned to objects of class A. Doing so, they lose their additional member variable j.
int main() {
    B b;
    A a = b;
}
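This loss of the derived part is often called slicing. A self-contained sketch with stand-in classes Base and Derived (hypothetical names that mirror A and B):

```cpp
struct Base {
    int i;
    Base() : i(1) {}
};
struct Derived : public Base {
    int j;
    Derived() : j(2) {}
};

int slice_demo() {
    Derived d;
    d.i = 42;
    Base b = d;   // copies only the Base part of d; j is dropped
    return b.i;
}
```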
Constructors and destructor are not inherited. But the default implementations of the derived class call automatically the respective implementations of the base class. Only the additional, user defined constructors are missing and must be repeated. Calling a base class constructor explicitly follows the same syntax as the member variable initialization.
class B : public A {
    int j;
public:
    B( int n) : A(n) {}
    B( int n, int m) : A(n), j(m) {}
};
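The first constructor leaves the value of j uninitialized. Moreover, B has no default constructor any more, since the compiler does not generate one once other constructors are declared. We solve both in the following example and use default values to implement the three constructors in one.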
class B : public A {
    int j;
public:
    B( int n = 0, int m = 0) : A(n), j(m) {}
};

Virtual Member Functions

Virtual member functions and derivation provide the backbone of flexibility in C++ for the object-oriented paradigm. A base class defines an interface with virtual member functions, here a pure virtual member function.
struct Shape {
    virtual void draw() = 0;
};
We derive different concrete classes from Shape and implement the member function draw for each of them.
struct Circle : public Shape {
    void draw();
};
struct Square : public Shape {
    void draw();
};
We cannot create an object of a class that contains pure virtual member functions, but we can have pointers of this type and we can assign pointers of the derived types to them. If we call a member function through such a pointer, the program figures out at runtime which member function is meant, Circle::draw or Square::draw.
int main() {
    Shape* s1 = new Circle;
    Shape* s2 = new Square;
    s1->draw();             // calls Circle::draw
    s2->draw();             // calls Square::draw
}
This runtime flexibility is typically achieved with a virtual function table per class (a dispatch table with function pointers). Each object gets an additional pointer referring to this table. Thus, each object knows at runtime of which type it is, which is also used for the runtime type information in C++. These extra costs, an additional pointer and one more indirection for a function call, are only imposed on objects whose class or base classes use virtual member functions.

Since we don't know the size of the actually allocated objects any more, we also have to use a virtual member function to delete the objects properly. It is sufficient to define a virtual, empty destructor in the base class. (see also Shape.C)

struct Shape {
    virtual void draw() = 0;
    virtual ~Shape() {}
};

// ... the derived shape classes

int main() {
    Shape* s1 = new Circle;
    Shape* s2 = new Square;
    s1->draw();             // calls Circle::draw
    s2->draw();             // calls Square::draw
    delete s1;
    delete s2;
}
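The runtime type information mentioned above can be queried with dynamic_cast. The following sketch repeats the shape classes with empty draw functions so that it is self-contained; the helper is_circle is a hypothetical name:

```cpp
struct Shape {
    virtual void draw() = 0;
    virtual ~Shape() {}
};
struct Circle : public Shape {
    void draw() {}
};
struct Square : public Shape {
    void draw() {}
};

// dynamic_cast yields a null pointer if s does not point to a Circle
bool is_circle( Shape* s) {
    return dynamic_cast<Circle*>(s) != 0;
}
```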

Static Class Members

Static member variables belong to the class, not to the object. They can be accessed from outside the class with the scope operator. For example, a class that counts how many objects of its type have been created looks as follows: (see also Counter.C)
#include <iostream>

struct Counter {
    static int counter;
    Counter() { counter++; }
};

int Counter::counter = 0;

int main() {
    Counter a;
    Counter b;
    Counter c;
    std::cout << Counter::counter << std::endl;
}
Note that an explicit definition of the static member outside of the class is needed. This definition is supposed to show up in only one compilation unit (much like a global variable). Thus, the definition is usually in a *.C source file.

Static member variables are guaranteed to be initialized before main gets executed. The order of initialization for multiple static member variables is specified to be in the order of declaration in a single compilation unit, but the order is unspecified between different compilation units. This implies in particular that a class which relies on the proper initialization of a static member variable (such as our Counter example) cannot be used as a type of a static member variable in another compilation unit (see [Item 47, Meyers97] for how to get around this restriction with local static variables in global functions).
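The workaround with a local static variable could look roughly like the following sketch. SafeCounter is a hypothetical variant of the Counter class and is not taken from [Meyers97]. The counter lives inside a static member function and is initialized when that function is called for the first time, independent of the compilation unit:

```cpp
struct SafeCounter {
    static int& counter() {
        static int c = 0;   // initialized on the first call of counter()
        return c;
    }
    SafeCounter() { ++counter(); }
};
```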

A static variable can be used to initialize a library automatically, see Automatic Library Initialization and Housekeeping.

A member function already exists only once per class, but it has a pointer (this) to the current object hidden in its parameter list. A static member function omits this pointer. It is called like a normal function using the scope operator. A static member function cannot access member variables of the object. It can only access static member variables.

#include <assert.h>

struct A {
    static int i;
    int j;
    static void init();
};

int A::i = 0; // definition of the static member

void A::init() {
    i = 5; // fine
    // j = 6; // is not allowed
}

int main() {
    A::init();
    assert( A::i == 5);
}

Templates

Templates provide the backbone of flexibility in C++ for the generic-programming paradigm. Their flexibility is resolved at compile time thus retaining efficiency.

C++ supports two kinds of templates: class templates and function templates. Templates are incompletely specified components in which a few types are left open and represented by formal placeholders, the template arguments. The compiler generates a separate translation of the component with actual types replacing the formal placeholders wherever this template is used. This process is called template instantiation. The actual types for a function template are implicitly given by the types of the function arguments at instantiation time. Therefore, all template arguments that are to be deduced must be used somewhere in the function parameter list; otherwise they have to be specified explicitly at the call. An example is a swap function that exchanges the value of two variables of arbitrary types.

template <class T>
void swap( T& a, T& b) {
    T tmp = a; 
    a = b;
    b = tmp;
}

int main() {
    int i = 5;
    int j = 7;
    swap( i, j);  // uses "int" for T.
}
The actual types for a class template are explicitly provided by the programmer. An example is a generic list class for arbitrary item types.
template <class T>
struct list {
    void push_back( const T& t); // append t to list.
};

int main() {
    list<int> ls;  // uses "int" for T.
    ls.push_back(5);
}
Defining push_back outside of the list class requires the repetition of the template declaration template <class T>, and the name of the list for the scope, which is list<T>. For the naming convention, list is a class template, while list<T> is a template class, (in particular, list is a template, and list<T> is a class).
template <class T>
void list<T>::push_back( const T& t) { ... }

The C++ compiler uses pattern matching to automatically derive the template argument types for function templates. Consider for example a function template that works only for lists:

template <class T>
void foo( list<T>& ls);

Template arguments can have default values, e.g., a stack class built with our list. Note the space between the two > >. Otherwise this would be parsed as the right-shift operator (a restriction that was lifted with C++11).

template <class T, class Container = list<T> >
struct stack { ... };
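A filled-in version of such a stack might look like the following sketch. It is an assumption for illustration only: it is named Stack, it uses std::list from the standard library instead of our list, and it relies on the container providing push_back, pop_back, back, and empty.

```cpp
#include <list>
#include <vector>

template <class T, class Container = std::list<T> >
struct Stack {
    Container c;
    void push( const T& t) { c.push_back( t); }
    void pop()             { c.pop_back(); }
    const T& top() const   { return c.back(); }
    bool empty() const     { return c.empty(); }
};
```

Stack<int> then uses a list, while Stack<int, std::vector<int> > replaces the default with a vector.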

Besides type parameters, class templates can also have built-in integral types as template parameters. In that case, the template arguments must be constant expressions of that integral type. A useful example is points of constant dimension. The parameter is determined at compile time and thus constant. We can use it to declare a fixed size array of coordinates for the point. Features like default arguments and specialization (see below) work also on this kind of parameter.

template <int dim>
struct Point { 
    double coordinates[dim]; // coordinate array
    // ...
};

int main() {
    Point<3> p; // a point in 3d space
}
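Default arguments and specialization for an integral parameter could look like this sketch. The template Vec and its member dimension are hypothetical names chosen to avoid a clash with Point above.

```cpp
template <int dim = 2>
struct Vec {
    double coordinates[dim];
    static int dimension() { return dim; }
};

template <>
struct Vec<1> {             // specialization for one particular value
    double coordinate;      // a single number instead of an array
    static int dimension() { return 1; }
};
```

Vec<> is then the same type as Vec<2>.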

The C++ standard aims for separate compilation of templates, but this is not available in most compilers today. The current model for templates is that all source code including definitions goes into the header files.


"Lazy" Implicit Instantiation

As already mentioned above, the process of using a function template forces the compiler to instantiate the template, which is more precisely called implicit instantiation. There is also an explicit instantiation, which we do not use and do not mention any further.

Member functions of class templates are also instantiated implicitly.

Assume that we want to add a sort member function to our list template from above. Quicksort is not efficient on lists (needs an extra container), so our specialized member function realizes, for example, a more efficient merge sort. For a sort function, the item type of the list needs to be comparable. But lists in general are also useful for types that are not comparable. However, this causes no problem in C++, since the compiler is not allowed to instantiate the sort member function if it is not used somewhere. And the compiler is not allowed to complain about the missing comparison operator in the sort member function. In consequence, we can use the list class with a type that is not comparable, as long as we do not try to sort this list somewhere in our program. In the chapter on the STL we will see examples where this is useful to implement generic adapters for iterators.

This behavior is restricted to class templates.
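The effect can be tried out with a small sketch. All names in it (NoCompare, Bag, lazy_demo) are hypothetical. To keep it short it wraps a std::vector and sorts with std::sort rather than implementing a list with merge sort.

```cpp
#include <algorithm>
#include <vector>

struct NoCompare { int v; };   // deliberately has no operator<

template <class T>
struct Bag {
    std::vector<T> items;
    void add( const T& t) { items.push_back( t); }
    void sort() { std::sort( items.begin(), items.end()); } // needs operator<
    int  size() const { return static_cast<int>( items.size()); }
};

int lazy_demo() {
    Bag<NoCompare> b;   // fine: Bag<NoCompare>::sort is never instantiated
    NoCompare x = { 1 };
    b.add( x);
    // b.sort();        // only this call would trigger the compile error
    return b.size();
}
```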


Member Templates

Member functions are basically normal global functions (with some syntactic sugar and compiler support for the this pointer). Thus, function templates were easy to extend to member functions and even constructors (although late in the standardization process).

The small convenience class pair<T1,T2> from the STL makes use of a member template constructor. The class pair<T1,T2> contains a member variable of type T1 and a member variable of type T2, i.e., a tuple (first,second) with types (T1,T2). The default copy constructor allows the construction of a pair only if the types for the template arguments are exactly the same. However, several automatic conversions are possible in C++, for example mutable to const, pointer to derived class to pointer to base class, short to int, etc. It would be nice if we could create one pair from another pair whenever the types T1 and T2 of the new pair can be constructed from the corresponding types of the other pair. The solution is a template constructor accepting all pairs. The actual construction of its member variables compiles only if this construction is permitted.

template <class T1, class T2>
struct pair {
    T1 first;
    T2 second;
    pair() {}  // don't forget the default constructor if there are also others
    // template constructor
    template <class U1, class U2>
    pair( const pair<U1,U2>& p);
};
Now, let's assume we want to define this constructor not inline (however, inline would be preferable here). Here is how we nest the two template declarations. (see also pair.C)
template <class T1, class T2> 
template <class U1, class U2>
pair<T1,T2>::pair( const pair<U1,U2>& p) 
    : first( p.first), second( p.second) {}
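A usage sketch of such a converting constructor follows. The class is renamed to Pair to avoid a clash with std::pair, and it gets an additional two-argument constructor; both changes, as well as the helper convert_demo, are assumptions made for this illustration.

```cpp
template <class T1, class T2>
struct Pair {
    T1 first;
    T2 second;
    Pair() {}
    Pair( const T1& a, const T2& b) : first(a), second(b) {}
    // template constructor accepting all pairs with convertible types
    template <class U1, class U2>
    Pair( const Pair<U1,U2>& p) : first( p.first), second( p.second) {}
};

double convert_demo() {
    Pair<short,int>  a( 1, 2);
    Pair<int,double> b( a);   // short -> int and int -> double
    return b.first + b.second;
}
```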

Specialization and Partial Specialization

Let us assume we have a generic vector class vector<T> that is just fine for the general case, but for booleans we could do better with a bit vector. We can write a specialized class for booleans.
template <>
struct vector<bool> {
    // specialized implementation
};
The compiler matches vector<bool> automatically with this specialization. The empty template declaration template <> was previously superfluous, but is now mandatory.

Now suppose, vector has a second template argument for a memory allocator (which it does in the STL, but hidden by a default setting). The resulting partial specialization is still a template.

template <class Allocator>
struct vector<bool,Allocator> {
    // partially specialized implementation
};
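Partial specialization can also match a pattern of types rather than a fixed type. As a small sketch, the hypothetical class template IsPointer below has a general version and a partial specialization for all pointer types T*:

```cpp
template <class T>
struct IsPointer {          // general case
    enum { value = 0 };
};

template <class T>
struct IsPointer<T*> {      // partial specialization: matches any pointer type
    enum { value = 1 };
};
```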
Function templates can be explicitly specialized as well, but they cannot be partially specialized; the same effect is achieved by overloading function templates. Since we have already mentioned pattern matching for function templates, this might not be such a surprise. However, it needs to be clarified how the resulting overloading of the function name gets resolved. The general rule of thumb is that the compiler considers all function templates that can match the function call, and it chooses the `most specific' one. If there is more than one `most specific' candidate, it is reported as an ambiguity error. The bad news is that a sound type theory as known from functional languages is missing here.
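A minimal sketch of this rule of thumb, using a hypothetical function kind: a general function template, an explicit specialization of it for int, and an overloaded function template for pointers.

```cpp
template <class T>
int kind( T) { return 0; }       // general case

template <>
int kind<int>( int) { return 2; }  // explicit specialization for int

template <class T>
int kind( T*) { return 1; }      // overload, more specific for pointers
```

A call with a double selects the general template, a call with a pointer selects the pointer overload, and a call with an int selects the explicit specialization.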


Local Types and Keyword typename

Besides variables and member functions, classes can also contain enums and types. They are accessed with the scope operator `::'.
template <class T>
struct list {
    typedef T value_type;
};

int main() {
    list<int> ls;
    list<int>::value_type i; // is of type int
}
Let us assume a class X uses a container, such as list<T>. Now class X needs the value type T of the container, which we have already prepared with the typedef in the list class template. For convenience, we use a typedef and the same name in class X.
template <class Container>
struct X {
    typedef Container::value_type value_type; // not correct
    // ...
};
But how can the compiler know that Container::value_type is actually a type and not a static member variable without knowing the actual type for Container, i.e., before actually instantiating the template? It does not. The solution is the new keyword typename. By default, the compiler assumes that in case of such ambiguities the token is not a type. If it is a type, we can say so explicitly with the new keyword. Thus, the correct example is:
template <class Container>
struct X {
    typedef typename Container::value_type value_type;
    // ...
};
The keyword typename is used to indicate that the name following the keyword does in fact denote the name of a type. However, one cannot just liberally sprinkle code with typenames. More precisely, one must use the keyword typename in front of a name that:
  1. denotes a type; and
  2. is qualified: i.e., it contains a scope operator `::'; and
  3. appears in a template; and
  4. is not used in a list of base-classes or as an item to be initialized by a constructor initializer list; and
  5. has a component left of a scope resolution operator that depends on a template parameter.
Furthermore, one is only allowed to use the keyword in this sense if the first four apply. To illustrate this rule, consider this code fragment:
template<class T>
struct S : public X<T>::Base {     // no typename, because of 4
    S(): X<T>::Base(               // no typename, because of 4
        typename X<T>::Base(0)) {} // typename needed
    X<T> f() {                     // no typename, because of 2
        typename X<T>::C *p;       // declaration of pointer p, typename needed
        X<T>::D *q;                // no typename ==> multiplication!
    }
    X<int>::C *s_;                 // typename allowed but not needed
};

struct U {
    X<int>::C *pc_;                // no typename, because of 2
};
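The same rule applies in function templates. The following sketch, with the hypothetical function first_element, needs typename twice, for the return type and for the local iterator type, since both depend on the template parameter Container. It assumes a non-empty container with the usual STL typedefs.

```cpp
#include <vector>

template <class Container>
typename Container::value_type first_element( const Container& c) {
    typename Container::const_iterator it = c.begin();
    return *it;   // assumes that c is not empty
}
```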

Dynamic and Static Polymorphism

Polymorphism refers to the ability of a single piece of code to work with multiple types. C++ supports two kinds of polymorphism: dynamic (runtime) polymorphism through virtual functions and static (compile time) polymorphism through templates. Dynamic polymorphism plays a central role in object-oriented programming, while static polymorphism is at the heart of generic programming.

Let us look at the shape example we saw earlier:

struct Shape {
    virtual void draw() = 0;
    virtual ~Shape() {}
};

struct Circle : public Shape {
    void draw();
};
struct Square : public Shape {
    void draw();
};
Now we have two ways of writing a single function that works for circles, squares, and any other classes derived from Shape:
void display( Shape& s) {         // dynamic polymorphism
    s.draw();                     // draw is not const, so s cannot be const
}

template <class T>                // static polymorphism
void display( T& s) {             // T does not need to be derived from Shape
    s.draw();
}
In this case, dynamic polymorphism is more appropriate. The problems with static polymorphism are:

Templates have their advantages, too. Let us look at the swap example:
template <class T>
void swap( T& a, T& b) {
    T tmp = a; 
    a = b;
    b = tmp;
}
The basic operations used by swap are copy constructor and assignment. To do this with virtual functions, we need a base class with virtual versions of copy constructor and assignment. Because a constructor cannot be virtual and virtual assignment has its problems (see [Gillam98]), we use normal member functions:
struct Swappable {
    virtual Swappable* clone() const = 0;
    virtual Swappable& assign( const Swappable& rhs) = 0;
    virtual ~Swappable() {}
};

void swap (Swappable& a, Swappable& b) {
    Swappable* tmp = a.clone();
    a.assign(b);
    b.assign(*tmp);
    delete tmp;
}
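A concrete class might implement this interface roughly as in the following sketch. IntBox is a hypothetical example class, and the swap function is renamed to dyn_swap here to keep it apart from the swap template. Note that assign has to trust that rhs really is an IntBox; the dynamic_cast throws std::bad_cast otherwise.

```cpp
#include <typeinfo>   // std::bad_cast

struct Swappable {
    virtual Swappable* clone() const = 0;
    virtual Swappable& assign( const Swappable& rhs) = 0;
    virtual ~Swappable() {}
};

void dyn_swap( Swappable& a, Swappable& b) {
    Swappable* tmp = a.clone();
    a.assign( b);
    b.assign( *tmp);
    delete tmp;
}

struct IntBox : public Swappable {
    int value;
    explicit IntBox( int v) : value(v) {}
    Swappable* clone() const { return new IntBox( *this); }
    Swappable& assign( const Swappable& rhs) {
        // checked at runtime: throws std::bad_cast if rhs is not an IntBox
        value = dynamic_cast<const IntBox&>( rhs).value;
        return *this;
    }
};
```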
Now swap works with any type that is derived from Swappable and defines clone and assign appropriately. This is clearly more awkward to use than the swap template. Other problems with dynamic polymorphism are:

Name Spaces

As library developers we do not own the universe, for example, for identifier names. An application programmer most likely uses more than one library and creates additional identifier names. It has frequently happened that common names such as min, max, swap, or Byte were defined in more than one library, with surprising results when those libraries were used together. Assume a header file "a.h" contains the macro definition

#define Byte unsigned char

and another header file "b.h" defines

typedef unsigned char Byte;

What happens if we include both header files, does it depend on the order of inclusion?

A common way out of this dilemma was and still is the use of a common prefix for all identifiers of one library. The prefix is typically a short abbreviation such as std_, CGAL_, Q for the Qt GUI interface, and it is supposedly different from all other prefixes.

In C++ we have the new concept of name spaces. They act as a scope and group identifiers together. They can be extended anytime. An example:

namespace CGAL {
    int max( int a, int b);
    class A;
    void foo( const A& a);
} // ends the namespace CGAL

We could use the above declarations in the following way:

int main() {
    int i = CGAL::max( 3, 4);
    CGAL::A a; // assumes that we have also seen the full definition of A somewhere
    CGAL::foo( a);
}

So far, the name space scope CGAL:: with the so-called scope operator :: has just replaced the prefix in user code. However, within the name space itself we don't have to repeat the scope all the time. Name lookup happens following the name space scope from the inside to the outside. The global name space scope is just denoted :: and can be used to name identifiers in the global scope that also exist in the current local scope.
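The lookup from the inside to the outside can be seen in the following sketch. The namespace Lib and all function names in it are hypothetical.

```cpp
int answer() { return 1; }       // global scope

namespace Lib {
    int answer() { return 2; }   // hides the global answer within Lib

    int local_lookup()  { return answer(); }    // finds Lib::answer first
    int global_lookup() { return ::answer(); }  // names the global one
}
```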

There is an alternative possibility for calling the function foo, see this example instead:

int main() {
    CGAL::A a;
    foo( a);
}

We just omitted the scope, but now the compiler also examines the argument types and includes the scopes of the argument types for name lookup searches. This is well known in C++ under the name Koenig lookup (also called argument-dependent lookup).
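A complete, compilable variant of this example is sketched below, with a hypothetical namespace Geo in place of CGAL and a foo that simply returns a member:

```cpp
namespace Geo {
    struct A { int i; };
    int foo( const A& a) { return a.i; }
}

int adl_demo() {
    Geo::A a;
    a.i = 5;
    return foo( a);   // found in namespace Geo through the type of a
}
```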

The whole standard C and C++ library has been enclosed in the std namespace. One can import a whole namespace with a using directive, or just selected identifiers with a using declaration, into the current scope:

using namespace std;
using std::vector;

C Preprocessor: Include Guards and assert Macro

C++ has inherited from C its preprocessor, a separate phase of the compiler that processes its own language before the compiler looks at the (transformed) source code. We discuss only a few aspects of the preprocessor, largely because it has almost become superfluous with the new language elements of C++, namely constants and templates.

Symbolic Constants

One can define symbolic constants, such as:

#define M_PI 3.14159265358979323846
#define CGAL_CFG_NO_KOENIG_LOOKUP 1

Each of these definitions defines a replacement rule, where the identifier following the #define gets literally replaced with the text following it. Note that this replacement happens on a text processing basis only, the preprocessor does not know about classes, protection, namespaces and scopes!

Include Guards

The preprocessor has control structures that can control which parts of the source code are compiled and which parts are excluded from the compiler.
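The typical include guard can be sketched as follows. The header name point.h, the macro POINT_H, and the struct GuardedPoint are made up for illustration. To keep the sketch in one file, the header contents are simply written twice, which is what the compiler sees after the header has been included twice:

```cpp
// --- contents of a hypothetical header "point.h", first inclusion ---
#ifndef POINT_H
#define POINT_H
struct GuardedPoint { double x; double y; };
#endif // POINT_H

// --- second inclusion of the same header ---
#ifndef POINT_H   // POINT_H is already defined, so everything up to
#define POINT_H   // the matching #endif is skipped
struct GuardedPoint { double x; double y; };
#endif // POINT_H
```

Without the guard, the second definition of GuardedPoint would be a redefinition error.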

assert Macro


Lutz Kettner (<surname>@mpi-sb.mpg.de). Last modified on Tuesday, 17-Jan-2006 17:53:41 MET.