Algorithm Library Design

9. Large Scale C++ Software Design


Introduction

This section is based on material from [Lakos96], which contains many more examples that help in understanding the different techniques.

We have seen various implementation aspects for single classes. We have also seen designs with several classes, for example, design patterns and template metaprograms. If we develop a large application or library, we have to consider a new level of organization: how to distribute classes and functions over files, and over bigger units of organization.

We distinguish between the logical design and the physical design. The logical design describes how to write a class or a function and how to relate them to each other. Physical design describes how to organize the code in files.

In large systems, it is crucial to keep the complexity manageable. One measure of complexity is the dependency between different units. Cyclic dependencies in particular increase the complexity. Complex systems are hard to understand and hard to test. Compilation times and link times can grow unmanageably large for complex systems.

After some definitions, which set up a sane way of organizing source code into components, we see techniques for analysing physical dependencies between components and for breaking up cyclic dependencies between them.


Internal and External Linkage

A name in C++ has internal linkage if it is local to its translation unit and cannot collide with an identical name defined in another translation unit. Examples are type names and static variables.

A name in C++ has external linkage if, in a multi-file program, that name can interact with other translation units at link time. Examples are non-static function names, member function names, and non-static global variables.


Components and Dependency Relations

A component name consists of one header file, name.h, and one source file, name.C (or whatever suffix is appropriate for C++ source files). Components are not restricted to a single class or function; they usually contain a few closely related classes and functions. A couple of sanity rules apply:

We are interested in physical dependencies between components (the dependencies within a component are not of interest here). The sanity rules make it easy to see the physical dependencies from the header file inclusion graph.

A component y DependsOn a component x if x is needed in order to compile or link y. More specifically: Component y exhibits a compile-time dependency on x if x.h is needed in order to compile y.C. Component y exhibits a link-time dependency on x if the object file y.o contains undefined symbols for which x.o may be called upon either directly or indirectly to help resolve them at link time. Compile-time dependency almost always implies link-time dependency (see also [Page 127 ff., Lakos96]).

The IsA relation and the HasA relation from the logical design always imply compile-time dependencies.


Physical Hierarchy

The DependsOn relation forms a graph over components. The major design rule is: Avoid cycles in the dependency graph! Designs with cycles are hard to understand, and they can have much larger compile and link times for testing.

Let us take a closer look at testing. We assume a test-driver program for each component. The benefit of components (and the intended modularization) is hierarchical testing: we test each component in isolation before we test the components that depend on it. Of course, this does not work if we have cycles in the dependency graph. All components participating in a cycle have to be tested at once and together. However, this also shows a way out of cycles between components: reorganize the parts that participate in the cycle into one component (since we do not care what happens within one component). Some of the possible ways of reorganizing a design are covered in the next section.

Furthermore, the link time for building all test drivers increases. We compare two extreme cases for n components: in the first, each component depends on all other components; in the second, no component depends on any other component. If we assume unit cost for linking with one component, linking all test drivers costs O(n^2) in the first case and O(n) in the second. However, every non-trivial realistic system will have dependencies. A well-designed system will aim for a flat acyclic hierarchy, approximately shaped like a balanced tree, for which the total linking cost would be O(n log n).

The cost for linking all test-drivers can be captured in a useful metric.

The Cumulative Component Dependency, CCD, is the sum over all components Ci in a subsystem of the number of components needed in order to test each Ci incrementally.

Derived metrics are the average component dependency, ACD = CCD / n, and the normalized cumulative component dependency, NCCD, which is the CCD divided by the CCD of a perfectly balanced binary dependency tree with the same number of components. The CCD of a perfectly balanced binary dependency tree of n components is (n+1) * log2(n+1) - n.

The book [Lakos96] describes tools to analyse the dependencies of components and to compute these metrics automatically, assuming the above rules for components have been followed. The sources for the tools are available at ftp://ftp.aw.com/cp/lakos/.


Reducing Link-Time Dependencies: Levelization

We introduce several techniques for eliminating cyclic dependencies in the dependency graph. The underlying assumption is that an initial design is actually likely to be free of cycles, but as the design evolves over time cycles are introduced.

An example: We are given a number of geometric objects, among them a rectangle in a component of the same name:

// rectangle.h
#ifndef RECTANGLE_H
#define RECTANGLE_H 1

class Rectangle {
    // ...
public:
    Rectangle( int x1, int y1, int x2, int y2);
    // ...
};

#endif // RECTANGLE_H //

We also work with a graphical user interface and have a component with a class for a window:

// window.h
#ifndef WINDOW_H
#define WINDOW_H 1

class Window {
    // ...
public:
    Window( int xCenter, int yCenter, int width, int height);
    // ...
};

#endif // WINDOW_H //

We realize that both represent (among other things) a two-dimensional box, and we would like to be able to construct a rectangle from a window and vice versa. A first attempt might just add the respective constructors. But as a consequence, each header file has to include the other, and we have a cyclic dependency.

// rectangle.h
#ifndef RECTANGLE_H
#define RECTANGLE_H 1

#include "window.h"

class Rectangle {
    // ...
public:
    Rectangle( int x1, int y1, int x2, int y2);
    Rectangle( const Window& w);
    // ...
};

#endif // RECTANGLE_H //


// window.h
#ifndef WINDOW_H
#define WINDOW_H 1

#include "rectangle.h"

class Window {
    // ...
public:
    Window( int xCenter, int yCenter, int width, int height);
    Window( const Rectangle& r);
    // ...
};

#endif // WINDOW_H //

The dependency graph now contains a cycle: rectangle depends on window, and window depends on rectangle.

Actually, if we follow the include statements and the include guards, we find that this solution does not even compile, since each header includes the other header file before declaring its own class. We need to use forward declarations to solve this problem.

In fact, since we use the class Window only by reference in the Rectangle constructor, we do not need the full definition of the class Window, which we would get by including the header file; a declaration is sufficient. The same is true for the class Rectangle in the Window header file.

// rectangle.h
#ifndef RECTANGLE_H
#define RECTANGLE_H 1

class Window;

class Rectangle {
    // ...
public:
    Rectangle( int x1, int y1, int x2, int y2);
    Rectangle( const Window& w);
    // ...
};

#endif // RECTANGLE_H //


// window.h
#ifndef WINDOW_H
#define WINDOW_H 1

class Rectangle;

class Window {
    // ...
public:
    Window( int xCenter, int yCenter, int width, int height);
    Window( const Rectangle& r);
    // ...
};

#endif // WINDOW_H //

However, in order to implement the constructors in the source files rectangle.C and window.C, the respective other header file has to be included again. The resulting dependency graph shows that we still have the cyclic dependency between the components. We have only reduced the compile-time dependencies, not the link-time dependencies.

Definition: A subsystem is levelizable if it compiles and the graph implied by the include directives of the individual components (including the .C files) is acyclic.

Thus, our example so far is not levelizable. We will now see some techniques to break cycles and to make a design levelizable.

Escalation

Escalation breaks a cycle by lifting the interdependent functionality one level up into a new component. The interdependent functionality is supposed to be small compared to the components involved, so that it fits into a single component. In our example we introduce the component boxutil, which contains only the two conversion functions, here as static member functions of a class. The rectangle and window components remain untouched.

// boxutil.h
#ifndef BOXUTIL_H
#define BOXUTIL_H 1

class Rectangle;
class Window;

struct Boxutil {
    static Window    toWindow( const Rectangle& r);
    static Rectangle toRectangle( const Window&  w);
};

#endif // BOXUTIL_H //

Demotion

Demotion is similar to escalation. But instead of collecting the interdependent functionality in a component one level up, we collect it in a component one level down. It does not work nicely with our running example.

Factoring

Factoring is the general version of escalation and demotion. The interdependent functionality is isolated and repackaged in components, not necessarily in a single component. However, the goal is to reduce the complexity of the remaining cycle.

Opaque Pointers

Definition: A function f uses a type T in size if compiling the body of f requires having first seen the definition of T.

Definition: A function f uses a type T in name only if compiling f and any of the components on which f may depend does not require having first seen the definition of T.

Examples of in-name-only use are reference and pointer types. Both definitions extend naturally to components.

Components that use objects in name only can be thoroughly tested, independently of the named object. Examples are container classes, nodes, and handles that just pass their data around as pointers.

Redundancy

This is not necessarily a technique for breaking cycles, but for reducing coupling and dependencies in general. The idea is that whenever only a small fraction of a component is actually used in another component and causes the dependency, it might be worthwhile to reimplement this small fraction in the other component. Consider the example of a cell class that contains, among other things, a name. The name has been implemented as a string, but is still presented at the interface of the cell as an old C-style char* pointer.

// cell.h
#ifndef CELL_H
#define CELL_H 1

#include <string>

class Cell {
    std::string d_name;
    // ...
public:
    Cell( const char* name);
    const char* name() const;
    // ...
};
#endif // CELL_H //

Here it might be worthwhile to reimplement the small fraction we need from string, namely storing a dynamically allocated array of characters.

// cell.h
#ifndef CELL_H
#define CELL_H 1

class Cell {
    char* d_name;
    // ...
public:
    Cell( const char* name);
    Cell( const Cell& cell);
    ~Cell();
    Cell& operator=( const Cell& cell);
    const char* name() const;
    // ...
};
#endif // CELL_H //

However, it is arguable for this example that a dependency on strings is not a big issue, and that the old C-style interface is actually a bit clunky.

Callback

Callback functions make it possible to break a cycle. Typical examples are graphical user interfaces, or the simple qsort function in the standard C library:

NAME
       qsort - sorts an array

SYNOPSIS
       #include <stdlib.h>

       void qsort(void *base, size_t nmemb, size_t size,
              int (*compar)(const void *, const void *));

The function pointer compar is the callback function. However, callback functions are difficult to understand, debug, and maintain.


Reducing Compile-Time Dependencies: Insulation

We give a list of parts in a class that can create compile-time dependencies. A compile-time dependency exists for another component using this component if the other component has to be recompiled whenever one of the following parts of this component changes:

Insulation techniques eliminate the above dependencies. For example, compiler-generated member functions can be implemented explicitly, even if they do the same as the default implementation would. If a future revision needs to change their semantics, the change can then be made without recompiling dependent components.

Besides the obvious ones, we address two techniques for partial insulation in the next section. Two techniques for full insulation are covered in the section thereafter.

Sometimes, going from partial insulation to full insulation is very easy. But sometimes, the last 5 percent are the hardest and the most costly. Full insulation is usually not appropriate at the bottom layers of a library. Full insulation is appropriate at the higher layers of a library that are exposed to the users.

Major runtime costs for insulation can happen because inline functions are no longer possible, virtual dispatch tables can add another indirection, and dynamic allocation of memory is slow. Memory use can increase with dynamic memory or virtual tables.

Techniques for Partial Insulation

A HasA relationship can be changed to a HoldsA relationship. The cost for this is usually dynamic memory management.

A private member function can be changed to a static free function at file scope in the source file of the component. If the private member function can be implemented using the public interface of the class, we just need to pass the this pointer as an explicit function argument. If the member function needs access to private member variables, references to these variables can be added to the function signature. The price could be a runtime penalty: the longer parameter lists cost time when calling the function if the function is not inlined.

Techniques for Full Insulation

Definition: An abstract class is a protocol class if
  1. it neither contains nor inherits from classes that contain member data, non-virtual functions, or private (or protected) members of any kind,
  2. it has a non-inline virtual destructor defined with an empty implementation, and
  3. all member functions other than the destructor, including inherited functions, are declared pure virtual and left undefined.

A protocol class is a nearly perfect insulator. Protocol classes in C++ are similar to interfaces in Java. Several of the design patterns have protocol classes as part of their design, for example the adaptor pattern.

Another technique for full insulation is an opaque pointer. The insulated class contains only one opaque pointer to its private data and no other member variables.

// Insulated.h
#ifndef INSULATED_H
#define INSULATED_H 1

class Insulated_private;

class Insulated {
    Insulated_private* d_data; // opaque pointer
public:
    // ... constructors and member functions
};
#endif // INSULATED_H //

The .C file implements the private data type and all member functions and constructors. The .C file can be changed and recompiled without forcing any other component to recompile.

// Insulated.C
#include "Insulated.h"

class Insulated_private {
    // ...
};

// ....

Lutz Kettner (<surname>@mpi-sb.mpg.de). Last modified on Tuesday, 29-Jul-2003 12:26:25 MEST.