// prime.C
// Program by Erwin Unruh
template <int i>
struct D {
D(void*);
operator int();
};
template <int p, int i>
struct is_prime {
enum { prim = ((p%i) && is_prime< (i>2 ? p : 0), i-1>::prim) };
};
template <int i>
struct Prime_print {
Prime_print<i-1> a;
enum { prim = is_prime<i,i-1>::prim };
void f() { D<i> d = prim; }
};
struct is_prime<0,0> { enum { prim = 1}; };
struct is_prime<0,1> { enum { prim = 1}; };
struct Prime_print<2> {
enum { prim = 1};
void f() { D<2> d = prim; }
};
void foo() {
Prime_print<20> a;
}
Todd Veldhuizen uses these advanced template techniques with template
metaprograms [Veldhuizen95a] and
with expression templates [Veldhuizen95b],
which we are going to detail later.In [Veldhuizen95a] a template metaprogram for bubblesort is described. Instantiated for a constant number of elements, the metaprogram unrolls the loops of bubblesort and creates the decision tree to sort the elements. The second example is a compile time function for sinus and cosinus that can be used to implement a metaprogram for the FFT, resulting in a single unrolled function for a 256 FFT with all roots evaluated as constants.
In [Veldhuizen95b] templates are used to write code such as:
// Integrate a function from 0 to 10 DoublePlaceholder x; double result = integrate( x / (1.0 + x), 0.0, 10.0);The term x / (1.0 + x) is a template expression. Its benefit is that it can be expanded inline in the integrate function. Template expressions have also nicer syntax than functors. The same technique is used in the paper to implement vector arithmetic more efficiently, namely avoiding large temporary results in vector expressions.
Both techniques are used in Blitz++, a library for numerical algorithms with vectors and matrices, see http://oonumerics.org/blitz/.
int f(T t) {
return t.category();
}
int a = f(t); // function (f) : associates a value (a) to an object (t)
Now observe the similarity with the following function on types :
template < typename T >
struct F {
typedef typename T::Category type;
};
typedef F<T>::type A; // function (F) : associates a type (A) to the type (T)
Constant integral values can be associated to types and vice-versa, at compile
time.
template < typename T >
struct G {
enum { value = 3 };
};
template < int i >
struct H {
typedef H<i+1> type;
};
template < int i >
struct K {
enum { value = i+1 };
};
int a = G<T>::value; // (G) : associates an integral value (a) to the type (T)
typedef H<2>::type A; // (H) : associates a type (A) to an integral constant value (2)
int a = K<2>::value; // (K) : associates an integral value (a) to an integral constant value (2)
Of course, such "meta" functions are not restricted to only one argument, and
not even to one return entity.
template < int i, typename T, int j, typename U >
struct Z {
enum { value1 = i+1 };
enum { value2 = j+i };
typedef Y<T, U> type;
};
Built-in operators (+, -, *, /, %, ?:, ^, !, &, |, &&, ||,
<<, >>, <, >, <=, >=, ==, !=) can also be used
to generate constant integral values directly :
int a = K<2+3>::value; // (+) : associates an integral value (5) to two integral constant values (2 and 3)We can apply composition of meta functions :
int a = G<F<2>::value>::value; // Composition of F and G.To do real programming, we need to be able to perform branches. This can be achieved using (eventually partial) specialization.
template < int i >
struct D {
enum { value = D<i-1>::value };
};
template <>
struct D<0> {
enum { value = 0 };
};
// Which is equivalent to the run-time program :
int d(int i) {
if (i == 0)
return 0;
return d(i-1);
}
// Note that the following works as well in this particular case :
template < int i >
struct E {
enum { value = i==0 ? 0 : D<i-1>::value };
};
The is_prime example in the introduction illustrates some more
possibilities.
Another tool, which allows to connect compile-time programming to the powerful
function overload resolution mechanism of C++, is the sizeof()
operator :
struct LargeStruct { char c[2]; };
char f(...);
LargeStruct f(double);
template < int s >
struct Is_double {
enum { value = (s == 1) };
};
// x is some expression, e.g. "std::sqrt(2.0)"
int a = Is_double< sizeof(f(x)) >::value;
Note that sizeof() returns an integral constant value. It also
has the property that the expression to which it is applied needs not be
defined (being declared is enough).
Overload resolution is a complex mechanism, and we will see later how it allows
to extract some properties of types automatically.
f :
void f();
void f(int);
void f(double);
void f(...);
template < typename T >
void f(T const&);
int main () {
float x;
f(x); // which function f is going to be called ?
}
The compiler proceeds in 2 stages : it first selects all functions that
could match the argument (in this case, the last 3 declarations), then it
discards those which require a conversion which has the least "priority".
If there is not one unique best match, then this ambiguous case leads to an
error.
So conversion rules come into play, and this is an area of C++ which is
very complicated. In this case, the template version is going to be chosen
because there is no perfect match otherwise, the other possibilities
require conversions which have less priority, and the ellipses version is
the one with the least priority.There is a particular feature of the first processing stage performed by the compiler, when it tries to substitute the types of the actual arguments of the function call into each function declaration : this process does not produce an error if it fails, it just discards the considered function from the set of possible matching functions. This is called the "Substitution Failure Is Not An Error" (SFINAE) principle. Now let's add the following additional overloaded function to the previous program :
template < typename T >
struct G {};
template < typename T >
void f(T const&, typename G<T>::type = 0);
Given that G does not define a nested type type,
trying to substitute float in this overloaded function is going
to fail. The SFINAE principle implies that the program is perfectly valid,
and that the first template is still chosen.
Now what happens if we define a nested type type in
G ?
template < typename T >
struct G {
typedef int type;
};
Then the second template matches as well (there is no failure during the
substitution process), and we get an ambiguity error with the first template,
because it matches with equal priority.
Now this opens new horizons, because the template class G can
be specialized for some particular types, and can define or not a nested type
type. This allows to perform some selection on the templates,
to constrain them depending on some properties on types. However, the process
is not yet so nice, because it requires an additional argument to the function
(although we can specify a default value), and this is impossible for
overloaded operators.
The remark that is going to save this idea for operators is that the SFINAE principle applies to the whole function signature, including the return type.
template < typename T > typename G<T>::type f(T const&);Let us now have a look at how we could make use of this in a practical case : the CGAL library. There, we would like to define generic functions for adding geometric vectors :
template < typename Vector_2 >
Vector_2 operator+(Vector_2 const &v, Vector_2 const & w) {
return Vector_2(v.x() + w.x(), v.y() + w.y());
}
Obviously, this doesn't work, because this template is way too general
(it matches almost any type), and it would clash with, for example :
template < typename Vector_3 >
Vector_3 operator+(Vector_3 const &v, Vector_3 const & w) {
return Vector_3(v.x() + w.x(), v.y() + w.y(), v.z() + w.z());
}
So, let's consider concrete types provided by the user, to which we would
like to apply our generic operator :
class My_vector { // 2D vector type
int _x, _y;
public:
My_vector(int x, int y) : _x(x), _y(y) {}
int x() const { return _x; }
int y() const { return _y; }
};
class My_point { // 2D point type
int _x, _y;
public:
My_point(int x, int y) : _x(x), _y(y) {}
int x() const { return _x; }
int y() const { return _y; }
};
The types My_vector and My_point have identical
interfaces, and we would like that our generic operator applies only on
My_vector.
So the idea is to constrain the function template, in such a way that it will
be considered only for types that are 2D geometric vector types, and which
provide the needed requirements (.x() and .y() member
functions, correct constructor, adequate semantics...).
This kind of thing is typically a job for a traits class.
template < typename T >
struct IsVector_2 {
enum { value = false }; // By default, a type is not a 2D vector.
};
template <>
struct IsVector_2 <My_vector> {
enum { value = true }; // But My_vector is a 2D vector.
};
So now we can easily constrain the template using another accessory tool :
template < typename T, typename, bool = T::value >
struct Enable_if;
template < typename T, typename R >
struct Enable_if <T, R, true> {
typedef R type;
};
template < typename Vector_2 >
typename Enable_if< IsVector_2<Vector_2>, Vector_2 >::type
operator+(Vector_2 const &v, Vector_2 const &w) {
return Vector_2(v.x() + w.x(), v.y() + w.y());
}
The following main function illustrates what we finally get :
int main() {
My_vector v(1,2), w(3,4);
My_vector z = v + w; // OK
My_point p(1,2), q(3,4);
My_point r = p + q; // error :
// no match for `My_point& + My_point&' operator
}
The whole program can be found here : vector_2.C.
We have just seen an example of traits class to express a property of a type. We are now going to see ways to write traits classes to automatically extract fundamental properties of types.
Let's start by writing a traits class which determines if a type is a fundamental type or not. Given that there is a small finite number of such types, it is easy to enumerate them :
template < typename >
struct IsFundamental {
enum { value = false };
};
template <>
struct IsFundamental<int> { enum { value = true }; };
template <>
struct IsFundamental<short> { enum { value = true }; };
template <>
struct IsFundamental<double> { enum { value = true }; };
// similarly for bool, char, long, long double and the unsigned versions.
Now let's try to classify compound types, that is, types which are constructed
from other types : plain pointers, references, arrays... We can use partial
specialization to identify some of these categories. In this case, it is
also useful to extract which type(s) the compound is made of (e.g. the type
of each item for an array).
// Primary template
template < typename T >
struct CompoundType {
enum { IsPointer = false, IsReference = false, IsArray = false,
IsFunction = false, IsPointerToMember = false };
typedef T base_type;
};
// Partial specialization for references
template < typename T >
struct CompoundType<T&> {
enum { IsPointer = false, IsReference = true, IsArray = false,
IsFunction = false, IsPointerToMember = false };
typedef T base_type;
};
// Partial specialization for pointers
template < typename T >
struct CompoundType<T&> {
enum { IsPointer = true, IsReference = false, IsArray = false,
IsFunction = false, IsPointerToMember = false };
typedef T base_type;
};
// Partial specialization for arrays
template < typename T, size_t N >
struct CompoundType<T[N]> {
enum { IsPointer = false, IsReference = false, IsArray = true,
IsFunction = false, IsPointerToMember = false };
typedef T base_type;
};
// Partial specialization for empty arrays
template < typename T >
struct CompoundType<T[]> {
enum { IsPointer = false, IsReference = false, IsArray = true,
IsFunction = false, IsPointerToMember = false };
typedef T base_type;
};
// Partial specialization for pointers to members
template < typename T, typename C >
struct CompoundType<T C::*> {
enum { IsPointer = false, IsReference = false, IsArray = false,
IsFunction = false, IsPointerToMember = true };
typedef T base_type;
};
Unfortunately, function types cannot be recognized that easily. We are
therefore going to use the SFINAE principle, after making the following remark
: the only types which cannot be gathered in an array are function types, void
and references. We now exploit this fact by trying to make an array
of the given type, and if it fails, it means the type is one of these special
types. "Trying", here, is a synonym for using the SFINAE principle.
template < typename T >
class IsFunction {
struct LargeStruct { char c[2]; };
template < typename U > static char test(...);
template < typename U > static LargeStruct test(U (*)[1]);
public:
enum { value = sizeof( IsFunction<T>::test<T>(0) ) == 1 };
};
template < typename T >
struct IsFunction<T&> {
enum { value = false };
};
template <>
struct IsFunction<void> {
enum { value = false };
};
template <>
struct IsFunction<void const> {
enum { value = false };
};
How to determine enumerated types ? What are the features of this types that
we could exploit to detect them ? We know that they are convertible to
integral types. But we need something more, because a user defined class
could also define a conversion to an integral type.We are not going to describe the full code here, but the curious reader can find the solution in [Vandevoorde03]. Hint : two user defined conversions cannot be applied automatically consecutively, whereas type promotion between enums and integral types do not have this restriction.
To conclude : it is possible to write traits classes that determine almost
all basic properties of types. Libraries such as
Boost provide such mechanisms in an extensive
way, and the next revision of the standard library may contain such a
type_traits mechanism. However, there are properties which cannot
be determined using the current language : it is possible to determine if a
class defines a static data member of a given name using SFINAE
(static_member_detection.C), but it
is not possible to extract the list of all data members of a class. Such a
possibility would require changes in the core language. The following section
illustrates what could be done with such a feature.
The following is a correct program in the functional programming language PolyP (see http://www.cs.chalmers.se/~patrikj/poly/polyp/):
module Main where
import Flatten(flatten)
data Foo a = Bar | Cee a | Doo (Foo a) a (Foo a)
x = Doo (Doo (Cee 'P') 'a' (Doo Bar 't' Bar)) 'r' (Doo Bar 'i' (Doo Bar 'k' Bar))
main = print (flatten "Patrik", flatten x)
-- expected output: ("Patrik", "Patrik")
Foo a is a typical parameterized recursive type definition,
here a binary tree, where the nodes with constructor Doo and
the leaves with constructor Cee contain an element of type
a, while the leaves with constructor Bar are
empty. The variable x contains such a tree. The generic
function (called polytypic in PolyP) flatten performs an
in-order traversal and concatenates all elements of type a it
can find.The trick is the implementation of flatten without knowing the structure of Foo. Instead, PolyP allows to write functions over the structure of how types can be defined in the language in general. There is only a small set of possebilities how types can be defined in PolyP, for example, alternatives of types or Cartesian product of types.
Can we write such a flatten function in C++? No, we cannot, but almost. The missing information is what could be called self-inspection of types. We need meta information of a type how it is constructed. For example, for a struct we need to know its member types.
Let us assume we would have such information, and for the running example we provide this information just by hand (as a clever annotation to the type definition). Then, we could write the following program in C++ (see flatten.C for the full program):
// ----------------------------------- USER DATA TYPES
struct B;
struct A {
int i;
A* a;
B* b;
};
struct B {
A* a;
};
// ----------------------------------- EXAMPLE MAIN PROGRAM
int main() {
A a1; a1.i = 1;
A a2; a2.i = 2;
A a3; a3.i = 3;
B b;
a1.a = &a2;
a2.a = &a3;
a3.a = 0;
a1.b = &b;
a2.b = 0;
a3.b = 0;
b.a = &a2;
list<int> ls;
flatten(a1,ls);
copy( ls.begin(), ls.end(), ostream_iterator<int>(cout, " "));
cout << endl;
}
The program creates an acyclic graph rooted at a1. Nodes of type A can
point to nodes of type A and to nodes of type B. They also contain an
integer value, the value we want to concatenate during our flatten
call. Nodes of type B can just point to a node of type A. The graph
looks as follows, nodes are labelled with their types, and nodes of
type A contain their integer value:

The return value of the flatten call is stored in the list ls. The output of the program is the sequence [1,2,3,2,3].
We have to add some annotation for our type definitions of A and B to make this example work. For the flatten function, we can get away with a rather dense notation that gives us for each structure the types of the members that it contains (incl. how many), and a way to access the different members. We define a class template with two arguments. The first argument is the type we want to annotate. The second argument is the cardinal number of the current member we are describing. The general class template has an operator() that just returns an object of type UNKNOWN. For each member in each type we write a specialization of the class template. The specialization has an operator() that, given an object of that type, returns the member of the given cardinal number.
// ----------------------------------- ANNOTATE TYPES FOR SELF-INSPECTION
struct UNKNOWN {};
template <class X, int i> struct T {
UNKNOWN operator()( X&) { return UNKNOWN();}
};
template <> struct T<A,1> { int operator()(A& a) { return a.i; } };
template <> struct T<A,2> { A* operator()(A& a) { return a.a; } };
template <> struct T<A,3> { B* operator()(A& a) { return a.b; } };
template <> struct T<B,1> { A* operator()(B& b) { return b.a; } };
The flatten function unrolls along the way types can be
defined in C++. We have restricted this example to deal with pointers
and structures. We have omitted, for example, reference types, unions,
and arrays. We follow the flatten function call. First, we
distinguish between pointer types and non-pointer types. We use
partial specialization of a class template to implement this
distinction. For pointer types, we apply flatten recursively on the
dereferenced value of the pointer.
// ----------------------------------- VALUE / POINTER
template <class X, class I>
struct flatten_class {
void operator()( X x, list<I>& ls) {
flatten_items( T<X,1>()(x), T<X,1>(), x, ls);
}
};
template <class P, class I>
struct flatten_class<P*,I> {
void operator()( P* x, list<I>& ls) {
if ( x != NULL)
flatten_class<P,I>()(*x, ls);
}
};
template <class X, class I>
void flatten( X x, list<I>& ls) {
flatten_class<X,I>()( x, ls);
}
Second, we distinguish between structs and simple types. For structs,
we iterate through all members of the struct and call flatten
recursively. If the type is simple and equal to the value type of the
list, we append the value to the list. Thus, in this implementation of
the flatten function, the value type of the list actually
determines what gets collected. We use again a class template and
partial specialization to do the match. Note how the UNKNOWN return
type causes the recursive enumeration of all the members to stop. Note
also that in the case we want to flatten a struct that has a
meta description we get an ambiguous match between the second and
fourth specialization. The third specialization solves this
ambiguity.
// ----------------------------------- LEAF / STRUCT WITH SUBITEMS
template <class Item, class X, int n, class I>
struct flatten_items_class { // flatten all subitems
void operator()( Item i, X x, list<I>& ls) {
flatten( i, ls);
flatten_items( T<X,n+1>()(x), T<X,n+1>(), x, ls);
}
};
template <class Item,int n, class I>
struct flatten_items_class<Item,I,n,I> { // item of type I found
void operator()( Item, I i, list<I>& ls) {
ls.push_back(i);
}
};
template <int n, class I> // item of type I found
struct flatten_items_class<UNKNOWN,I,n,I> {
void operator()( UNKNOWN, I i, list<I>& ls) {
ls.push_back(i);
}
};
template <class X, int n, class I> // no (further) subitems
struct flatten_items_class<UNKNOWN,X,n,I> {
void operator()( UNKNOWN, X x, list<I>& ls) {}
};
template <class Item, class X, int n, class I>
void flatten_items( Item i, T<X,n>, X x, list<I>& ls) {
flatten_items_class<Item,X,n,I>()(i,x,ls);
}
This concludes our presentation of flatten.C.
The purpose of this example was to stretch the imagination with what would be
possible in C++ (and is possible in other languages), but how ugly and
unmaintainable it looks in C++.Expression templates have several uses, but basically the advantages of this technic are :
class SArray {
double * _a;
size_t _s;
public:
SArray (size_t n) : _a(new double[n]), _s(n) {}
double const& operator[](size_t i) const { return _a[i]; }
double & operator[](size_t i) { return _a[i]; }
size_t size() const { return _s; }
~SArray() { delete[] _a; }
};
SArray
operator+(SArray const& a, SArray const& b) {
assert(a.size() == b.size());
SArray tmp(a.size();
for (int i = 0; i < a.size(); ++i)
tmp[i] = a[i] + b[i];
return tmp;
}
Now, if we consider the typical use below, we can see that there is an
efficiency problem with the last line : a temporary array is created in order
to store the result of x + y. This does not increase the number
of additions to be performed, but it increases the amount of memory required by
the program, which makes it slower due to cache effects.
SArray x, y, z, t; ... // do something useful with x, y, z t = x + y + z;We can solve the problem by using a dedicated function that adds three arrays in one shot :
SArray
add_3(SArray const& a, SArray const& b, SArray const& c) {
assert(a.size() == b.size() && a.size() == c.size());
SArray tmp(a.size();
for (int i = 0; i < a.size(); ++i)
tmp[i] = a[i] + b[i] + c[i];
return tmp;
}
...
t = add_3(x, y, z);
This is inconvenient because the notation is cumbersome, and for any expression
based on arrays, we would like to benefit from the optimization, without
requiring to write a new function for, e.g. x+y*z-z*x, or whatever
the user of our array class wants to compute. This is where expression
templates come into play.The idea is to postpone the actual evaluation of the expression until the assignment operator is seen. This is done by creating an object that will encode the expression in the form of a tree :

Each node in this tree has a particular type representing the type of operation
(e.g. +) that it represents, together with references to its
operands.
template < typename OP1, typename OP2 >
class A_Add {
OP1 const& op1; // first operand
OP2 const& op2; // second operand
public:
A_Add (OP1 const& a, OP2 const& b)
: op1(a), op2(b) {}
// What it is contributing in the final computation
double operator[] (size_t i) const {
return op1[i] + op2[i];
}
};
Notice now how operator+ is going to create a node of the
expression tree, and no computation on the arrays at all. We first need to
write a wrapper around the concrete array SArray. This is the
assignment operator which triggers the recursive computation over the tree.
template < typename Rep = SArray >
class Array {
Rep expr_rep;
public:
// initialization with a size
explicit Array(size_t s)
: expr_rep(s) {}
// create an array from a possible representation
Array (Rep const& rb)
: expr_rep(rb) {}
// Assignment of arrays of the same type
Array& operator=(Array const& b) {
assert(size() == b.size());
for (size_t i = 0; i < b.size(); ++i)
expr_rep[i] = b[i];
return *this;
}
// Assignment of arrays of the different type
template < typename Rep2 >
Array& operator=(Array<Rep2> const& b) {
assert(size() == b.size());
for (size_t i = 0; i < b.size(); ++i)
expr_rep[i] = b[i];
return *this;
}
double operator[] (size_t i) {
assert(i < size());
return expr_rep[i];
}
Rep const& rep() const { return expr_rep; }
Rep & rep() { return expr_rep; }
};
template < typename R1, typename R2 >
Array<A_Add<R1, R2> >
operator+(Array<R1> const& a, Array<R2> const& b) {
return Array<A_Add<R1, R2> > (A_Add<R1, R2>(a.rep(), b.rep()));
}
Similarly for the other operations :
template < typename OP1, typename OP2 >
class A_Mul {
OP1 const& op1; // first operand
OP2 const& op2; // second operand
public:
A_Mul (OP1 const& a, OP2 const& b)
: op1(a), op2(b) {}
double operator[] (size_t i) const {
return op1[i] * op2[i];
}
};
template < typename R1, typename R2 >
Array<A_Mul<R1, R2> >
operator*(Array<R1> const& a, Array<R2> const& b) {
return Array<A_Mul<R1, R2> > (A_Mul<R1, R2>(a.rep(), b.rep()));
}
With this implementation, we have achieved what we have promised.
Note that a complete implementation would also have to take care of some corner
cases like proper handling of local variables (by copying them in the tree
nodes, not referencing them), also paying attention to cases where the variable
being assigned to is also in the arguments (e.g. x = y + x)...
Caveats : for the optimizations to apply, the compiler needs to perform two
kinds of particular optimizations itself : inlining all the small functions
which locally create a tree on the stack, and split the tree node structures
in small components. For example, the latter is not being performed by
g++ at the moment (version 3.3). As with some other template
technics, it also has the potential to increase compilation times considerably
depending on the program.
Another caveat of the expression templates is a bad interaction with generic libraries. Consider :
template < typename T >
T square(T const& t) {
return t*t;
}
Array<> x, y, z;
...
z = square(x); // OK
z = square(x+y); // error
The problem here is that x+y doesn't have the "base" type
Array<>, but it has the type of a tree node.
Inside the square function, the return value is therefore
going to be that particular type, but there is no possible conversion
between any two different "internal" tree node types. The only possible
conversion is from an internal tree node to the "base" type
Array<>. Therefore it has to be explicitely
converted before the call to the square function :
... z = square(Array<>(x+y)); // OK z = square<Array<> >(x+y); // OKThe other possibility is to partially overload the
square
function as shown below. The advantage is that the call sites do not
require any change, but the disadvantage is that the overloading has
to know about the R parameter of Array, which
otherwise never shows up in the usage of this class.
template < typename R >
Array<> square(Array<R> const& t) {
return t*t;
}
// or, taking further advantage of the "optimization" :
template < typename R >
Array<A_Mul<R, R> > square(Array<R> const& t) {
return t*t;
}
There are other useful applications of expression templates, for example in
the Boost Lambda library, which allows
to create functors "on the fly", starting from placeholders variables
_1, _2... provided by the library. This way you can write
functors on the fly using a natural syntax like :
std::vector<double*> V; ... std::sort(V.begin(), V.end(), *_1 > *_2);Another application can be found in the GMP library (GNU Multi Precision), which provides C++ types for computing with multi precision numbers (integers, rationals...). In this case, expression templates are used to avoid creating temporaries. We can also cite PETE (Portable Expression Template Engine) which is a tool which allows to create expression templates easily.