Algorithm Library Design

8. Template Metaprograms


Introduction

Template metaprogramming refers to a technique where the template instantiation mechanism of the C++ compiler is used to partially evaluate a program at compile time. At some point during the standardization process of C++ it was discovered that templates actually make the C++ compiler Turing complete at compile time. The first examples using the C++ compiler as an interpreter are credited to Erwin Unruh, among them the following program, which computes prime numbers at compile time. The result is communicated through the compiler's diagnostic messages, which contain the prime numbers. This program circulated among the members of the ANSI/ISO C++ standardization committee. It does not work anymore on current compilers; see prime.C for a modified version that compiles as promised on the current g++ compiler.

// prime.C
// Program by Erwin Unruh

template <int i> 
struct D {
    D(void*);
    operator int();
};

template <int p, int i>
struct is_prime {
    enum { prim = ((p%i) && is_prime< (i>2 ? p : 0), i-1>::prim) };
};

template <int i>
struct Prime_print {
    Prime_print<i-1> a;
    enum { prim = is_prime<i,i-1>::prim };
    void f() { D<i> d = prim; }
};

struct is_prime<0,0> { enum { prim = 1}; };
struct is_prime<0,1> { enum { prim = 1}; };
struct Prime_print<2> { 
    enum { prim = 1};
    void f() { D<2> d = prim; }
};

void foo() {
    Prime_print<20> a;
}
Todd Veldhuizen uses these advanced template techniques with template metaprograms [Veldhuizen95a] and with expression templates [Veldhuizen95b], which we are going to detail later.

In [Veldhuizen95a] a template metaprogram for bubblesort is described. Instantiated for a constant number of elements, the metaprogram unrolls the loops of bubblesort and creates the decision tree that sorts the elements. The second example is a compile-time function for sine and cosine that can be used to implement a metaprogram for the FFT, resulting in a single unrolled function for a 256-point FFT with all roots of unity evaluated as constants.
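To give a first idea of what such a metaprogram looks like, here is a small sketch in the same spirit (our own illustration, not Veldhuizen's code) : a hypothetical meta_dot template that unrolls the loop of a fixed-size dot product at compile time.

#include <iostream>

// meta_dot<N>::eval(a,b) expands, at compile time, into
// a[0]*b[0] + ... + a[N-1]*b[N-1] : there is no run-time loop left.
template < int N >
struct meta_dot {
    static double eval(const double* a, const double* b) {
        return a[N-1] * b[N-1] + meta_dot<N-1>::eval(a, b);
    }
};

template <>
struct meta_dot<0> {                 // recursion anchor : the empty sum
    static double eval(const double*, const double*) { return 0.0; }
};

int main() {
    double a[3] = { 1.0, 2.0, 3.0 };
    double b[3] = { 4.0, 5.0, 6.0 };
    // The compiler instantiates meta_dot<3>, meta_dot<2>, ... and can inline
    // the whole chain into a single expression.
    std::cout << meta_dot<3>::eval(a, b) << std::endl;   // prints 32
}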

In [Veldhuizen95b] templates are used to write code such as:

// Integrate a function from 0 to 10
DoublePlaceholder x;
double result = integrate( x / (1.0 + x), 0.0, 10.0);
The term x / (1.0 + x) is a template expression. Its benefit is that it can be expanded inline in the integrate function. Template expressions also have a nicer syntax than functors. The same technique is used in the paper to implement vector arithmetic more efficiently, namely by avoiding large temporary results in vector expressions.

Both techniques are used in Blitz++, a library for numerical algorithms with vectors and matrices, see http://oonumerics.org/blitz/.


Compile-time programming

The C++ language provides two kinds of entities which can be used to program at compile time : types and constant integral values. We now describe a list of basic operations on these entities. Let us begin by making an analogy with the following usual run-time function :
int f(T t) {
  return t.category();
}

int a = f(t);    // function (f) : associates a value (a) to an object (t)
Now observe the similarity with the following function on types :
template < typename T >
struct F {
  typedef typename T::Category  type;
};

typedef F<T>::type A; // function (F) : associates a type (A) to the type (T)
Constant integral values can be associated to types and vice-versa, at compile time.
template < typename T >
struct G {
  enum { value = 3 };
};

template < int i >
struct H {
  typedef H<i+1> type;
};

template < int i >
struct K {
  enum { value = i+1 };
};

int a = G<T>::value;  // (G) : associates an integral value (a) to the type (T)
typedef H<2>::type A; // (H) : associates a type (A) to an integral constant value (2)
int a = K<2>::value;  // (K) : associates an integral value (a) to an integral constant value (2)
Of course, such "meta" functions are not restricted to only one argument, and not even to one return entity.
template < int i, typename T, int j, typename U >
struct Z {
  enum { value1 = i+1 };
  enum { value2 = j+i };
  typedef Y<T, U>  type;
};
Built-in operators (+, -, *, /, %, ?:, ^, !, &, |, &&, ||, <<, >>, <, >, <=, >=, ==, !=) can also be used to generate constant integral values directly :
int a = K<2+3>::value; // (+) : associates an integral value (5) to two integral constant values (2 and 3)
We can apply composition of meta functions :
int a = G< F<T>::type >::value; // Composition of F and G : G applied to the type computed by F
To do real programming, we need to be able to perform branching. This can be achieved using (possibly partial) specialization.
template < int i >
struct D {
  enum { value = D<i-1>::value };
};

template <>
struct D<0> {
  enum { value = 0 };
};

// Which is equivalent to the run-time program :

int d(int i) {
  if (i == 0)
    return 0;
  return d(i-1);
}

// Note that the following also works, but only for i >= 1 : D<i-1> is
// instantiated whatever the value of the condition is, so E<0> would trigger
// the endless instantiation of D<-1>.
template < int i >
struct E {
  enum { value = i==0 ? 0 : D<i-1>::value };
};
The is_prime example in the introduction illustrates some more possibilities.
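As another small, self-contained illustration (our own addition, not part of the original notes), here is the classic compile-time factorial; its result can be used wherever an integral constant expression is required, for example as an array bound :

template < int n >
struct Factorial {
  enum { value = n * Factorial<n-1>::value };
};

template <>
struct Factorial<0> {                // terminating specialization
  enum { value = 1 };
};

int buffer[ Factorial<4>::value ];   // array of 24 ints, size computed at compile time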

Another tool, which allows us to connect compile-time programming to the powerful function overload resolution mechanism of C++, is the sizeof() operator :

struct LargeStruct { char c[2]; };

char        f(...);
LargeStruct f(double);

template < int s >
struct Is_double {
  enum { value = (s == sizeof(LargeStruct)) }; // true iff the f(double) overload was selected
};

// x is some expression, e.g. "std::sqrt(2.0)"
int a = Is_double< sizeof(f(x)) >::value;
Note that sizeof() returns an integral constant value. It also has the property that its operand is never evaluated, so the functions appearing in it need only be declared, not defined. Overload resolution is a complex mechanism, and we will see later how it allows us to extract some properties of types automatically.
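For instance, continuing the snippet above (the variables d and p below are our own illustration) :

double d = 1.0;
void*  p = 0;

enum { d_case = Is_double< sizeof(f(d)) >::value,   // 1 : f(double) is selected
       p_case = Is_double< sizeof(f(p)) >::value }; // 0 : only the ellipsis version is viable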

Constraining templates using the SFINAE principle

In C++, overload resolution is the process by which the compiler selects which specific function to use in a given call, when that function is overloaded. Consider the following overloaded function f :
void f();
void f(int);
void f(double);
void f(...);

template < typename T >
void f(T const&);

int main () {
  float x;
  f(x);    // which function f is going to be called ?
}
The compiler proceeds in two stages : it first selects all the functions that could be called with the given argument (in this case, the last four declarations), then it keeps the one whose required conversions have the highest "priority". If there is no unique best match, the call is ambiguous and leads to an error. So conversion rules come into play, and this is an area of C++ which is very complicated. In this case, the template version is going to be chosen : it is an exact match (with T = float), the other possibilities require conversions which have less priority, and the ellipsis version is the one with the least priority.
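For illustration (our own addition, continuing the example above inside main), here is what overload resolution selects for a few other calls :

  f();        // calls f()            : the only candidate with the right number of arguments
  f(1);       // calls f(int)         : exact match, and a non-template is preferred over the template
  f(1.5);     // calls f(double)      : exact match
  f(x);       // calls f(T const&) with T = float   : exact match via argument deduction
  f("hello"); // calls f(T const&) with T = char[6] : still better than the ellipsis version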

There is a particular feature of the first processing stage performed by the compiler, when it tries to substitute the types of the actual arguments of the function call into each function template declaration : this process does not produce an error if it fails, it just discards the function under consideration from the set of possible matches. This is called the "Substitution Failure Is Not An Error" (SFINAE) principle. Now let's add the following additional overloaded function to the previous program :

template < typename T >
struct G {};

template < typename T >
void f(T const&, typename G<T>::type = 0);
Given that G does not define a nested type type, trying to substitute float in this overloaded function is going to fail. The SFINAE principle implies that the program is perfectly valid, and that the first template is still chosen.

Now what happens if we define a nested type type in G ?

template < typename T >
struct G {
  typedef int type;
};
Then the second template matches as well (there is no failure during the substitution process), and we get an ambiguity error with the first template, because it matches with equal priority.

Now this opens new horizons : the class template G can be specialized for particular types, defining or not defining a nested type type. This allows us to perform a selection among templates, constraining them depending on properties of the types involved. However, the mechanism is not yet entirely satisfactory, because it requires an additional function argument (even though we can give it a default value), and this is impossible for overloaded operators, whose arity is fixed.

The remark that is going to save this idea for operators is that the SFINAE principle applies to the whole function signature, including the return type.

template < typename T >
typename G<T>::type f(T const&);
Let us now have a look at how we could make use of this in a practical case : the CGAL library. There, we would like to define generic functions for adding geometric vectors :
template < typename Vector_2 >
Vector_2 operator+(Vector_2 const &v, Vector_2 const & w) {
  return Vector_2(v.x() + w.x(), v.y() + w.y());
}
Obviously, this doesn't work, because this template is way too general (it matches almost any type), and it would clash with, for example :
template < typename Vector_3 >
Vector_3 operator+(Vector_3 const &v, Vector_3 const & w) {
  return Vector_3(v.x() + w.x(), v.y() + w.y(), v.z() + w.z());
}
So, let's consider concrete types provided by the user, to which we would like to apply our generic operator :
class My_vector {  // 2D vector type
  int _x, _y;
public:

  My_vector(int x, int y) : _x(x), _y(y) {}

  int x() const { return _x; }
  int y() const { return _y; }
};

class My_point {  // 2D point type
  int _x, _y;
public:

  My_point(int x, int y) : _x(x), _y(y) {}

  int x() const { return _x; }
  int y() const { return _y; }
};
The types My_vector and My_point have identical interfaces, and we would like our generic operator to apply only to My_vector.

So the idea is to constrain the function template, in such a way that it will be considered only for types that are 2D geometric vector types, and which provide the needed requirements (.x() and .y() member functions, correct constructor, adequate semantics...). This kind of thing is typically a job for a traits class.

template < typename T >
struct IsVector_2 {
  enum { value = false };  // By default, a type is not a 2D vector.
};

template <>
struct IsVector_2 <My_vector> {
  enum { value = true };   // But My_vector is a 2D vector.
};
So now we can easily constrain the template using another small helper tool :
template < typename T, typename, bool = T::value >
struct Enable_if;

template < typename T, typename R >
struct Enable_if <T, R, true> {
  typedef R type;
};

template < typename Vector_2 >
typename Enable_if< IsVector_2<Vector_2>, Vector_2 >::type
operator+(Vector_2 const &v, Vector_2 const &w) {
  return Vector_2(v.x() + w.x(), v.y() + w.y());
}
The following main function illustrates what we finally get :
int main() {
  My_vector v(1,2), w(3,4);
  My_vector z = v + w;   // OK

  My_point p(1,2), q(3,4);
  My_point r = p + q;    // error :
                         // no match for `My_point& + My_point&' operator
}
The whole program can be found here : vector_2.C.

Type Classification

(Note : this section is inspired by chapter 19 of [Vandevoorde03])

We have just seen an example of traits class to express a property of a type. We are now going to see ways to write traits classes to automatically extract fundamental properties of types.

Let's start by writing a traits class which determines if a type is a fundamental type or not. Given that there is a small finite number of such types, it is easy to enumerate them :

template < typename >
struct IsFundamental {
  enum { value = false };
};

template <>
struct IsFundamental<int>    { enum { value = true }; };

template <>
struct IsFundamental<short>  { enum { value = true }; };

template <>
struct IsFundamental<double> { enum { value = true }; };

// similarly for bool, char, long, long double and the unsigned versions.
Now let's try to classify compound types, that is, types which are constructed from other types : plain pointers, references, arrays... We can use partial specialization to identify some of these categories. In this case, it is also useful to extract which type(s) the compound is made of (e.g. the type of each item for an array).
// Primary template
template < typename T >
struct CompoundType {
  enum { IsPointer = false, IsReference = false, IsArray = false,
         IsFunction = false, IsPointerToMember = false };
  typedef T base_type;
};

// Partial specialization for references
template < typename T >
struct CompoundType<T&> {
  enum { IsPointer = false, IsReference = true, IsArray = false,
         IsFunction = false, IsPointerToMember = false };
  typedef T base_type;
};

// Partial specialization for pointers
template < typename T >
struct CompoundType<T*> {
  enum { IsPointer = true, IsReference = false, IsArray = false,
         IsFunction = false, IsPointerToMember = false };
  typedef T base_type;
};

// Partial specialization for arrays
template < typename T, size_t N >
struct CompoundType<T[N]> {
  enum { IsPointer = false, IsReference = false, IsArray = true,
         IsFunction = false, IsPointerToMember = false };
  typedef T base_type;
};

// Partial specialization for arrays of unknown bound
template < typename T >
struct CompoundType<T[]> {
  enum { IsPointer = false, IsReference = false, IsArray = true,
         IsFunction = false, IsPointerToMember = false };
  typedef T base_type;
};

// Partial specialization for pointers to members
template < typename T, typename C >
struct CompoundType<T C::*> {
  enum { IsPointer = false, IsReference = false, IsArray = false,
         IsFunction = false, IsPointerToMember = true };
  typedef T base_type;
};
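A small usage illustration (our own, not from [Vandevoorde03]) :

typedef CompoundType<int*>::base_type    T1;   // T1 is int
typedef CompoundType<double&>::base_type T2;   // T2 is double
int is_ptr = CompoundType<int*>::IsPointer;    // 1
int is_arr = CompoundType<float[10]>::IsArray; // 1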
Unfortunately, function types cannot be recognized that easily. We are therefore going to use the SFINAE principle, after making the following remark : the only types which cannot be used as the element type of an array are function types, void and references. We exploit this fact by trying to build an array of the given type : if that fails, the type is one of these special types. "Trying", here, means using the SFINAE principle.
template < typename T >
class IsFunction {
  struct LargeStruct { char c[2]; };
  template < typename U > static char        test(...);
  template < typename U > static LargeStruct test(U (*)[1]);
public:
  enum { value = sizeof( IsFunction<T>::test<T>(0) ) == 1 };
};

template < typename T >
struct IsFunction<T&> {
  enum { value = false };
};

template <>
struct IsFunction<void> {
  enum { value = false };
};

template <>
struct IsFunction<void const> {
  enum { value = false };
};
How can we determine enumerated types ? What features of these types could we exploit to detect them ? We know that they are convertible to integral types. But we need something more, because a user-defined class could also define a conversion to an integral type.

We are not going to describe the full code here, but the curious reader can find the solution in [Vandevoorde03]. Hint : two user-defined conversions cannot be applied consecutively within one implicit conversion, whereas the conversion from an enum to an integral type is a standard conversion and does not have this restriction.
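To give the flavour of the trick, here is a minimal sketch of the hinted idea (our own illustration, not the full solution of [Vandevoorde03], which also has to filter out the fundamental types and some corner cases such as void) :

struct LargeStruct { char c[2]; };      // as before

// Its constructor counts as one user-defined conversion from int.
struct FromInt { FromInt(int); };

LargeStruct enum_check(FromInt);        // reachable from an enum via int + one user conversion
char        enum_check(...);            // fallback for everything else

template < typename T >
struct MaybeEnum {
  static T make();                      // only declared : used inside sizeof() only
  // true for enums (and for arithmetic types, which still have to be filtered
  // out separately) : a class with an operator int() would need two
  // consecutive user-defined conversions, which the language forbids.
  enum { value = ( sizeof( enum_check( make() ) ) == sizeof(LargeStruct) ) };
};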

To conclude : it is possible to write traits classes that determine almost all basic properties of types. Libraries such as Boost provide such mechanisms in an extensive way, and the next revision of the standard library may contain such a type_traits mechanism. However, there are properties which cannot be determined using the current language : it is possible to determine if a class defines a static data member of a given name using SFINAE (static_member_detection.C), but it is not possible to extract the list of all data members of a class. Such a possibility would require changes in the core language. The following section illustrates what could be done with such a feature.


Self Reflection of Types in C++: The flatten.C Example

In this section I describe how far one could go with generic algorithms if types were first-class citizens in C++.

The following is a correct program in the functional programming language PolyP (see http://www.cs.chalmers.se/~patrikj/poly/polyp/):

module Main where 
import Flatten(flatten)

data Foo a = Bar | Cee a | Doo (Foo a) a (Foo a)

x = Doo (Doo (Cee 'P') 'a' (Doo Bar 't' Bar)) 'r' (Doo Bar 'i' (Doo Bar 'k' Bar))

main = print (flatten "Patrik", flatten x)
-- expected output: ("Patrik", "Patrik")
Foo a is a typical parameterized recursive type definition, here a binary tree, where the nodes with constructor Doo and the leaves with constructor Cee contain an element of type a, while the leaves with constructor Bar are empty. The variable x contains such a tree. The generic function (called polytypic in PolyP) flatten performs an in-order traversal and concatenates all elements of type a it can find.

The trick is the implementation of flatten without knowing the structure of Foo. Instead, PolyP allows one to write functions over the structure of how types can be defined in the language in general. There is only a small set of possibilities for how types can be defined in PolyP, for example, alternatives of types or Cartesian products of types.

Can we write such a flatten function in C++? No, we cannot, but almost. The missing information is what could be called self-inspection of types : we need meta-information about how a type is constructed. For example, for a struct we need to know its member types.

Let us assume we had such information; for the running example we provide it by hand (as a clever annotation of the type definitions). Then we could write the following program in C++ (see flatten.C for the full program):

// -----------------------------------  USER DATA TYPES
struct B;

struct A {
    int i;
    A* a;
    B* b;
};

struct B {
    A* a;
};

// -----------------------------------  EXAMPLE MAIN PROGRAM

int main() {
    A a1; a1.i = 1;
    A a2; a2.i = 2;
    A a3; a3.i = 3;
    B b;
    
    a1.a = &a2;
    a2.a = &a3;
    a3.a = 0;
    a1.b = &b;
    a2.b = 0;
    a3.b = 0;
    b.a = &a2;

    list<int> ls;
    flatten(a1,ls);
    copy( ls.begin(), ls.end(), ostream_iterator<int>(cout, " "));
    cout << endl;
}
The program creates an acyclic graph rooted at a1. Nodes of type A can point to nodes of type A and to nodes of type B. They also contain an integer value, the value we want to concatenate during our flatten call. Nodes of type B can just point to a node of type A.

The result of the flatten call is accumulated in the list ls. The output of the program is the sequence [1,2,3,2,3].

We have to add some annotation to our type definitions of A and B to make this example work. For the flatten function, we can get away with a rather dense notation that gives us, for each structure, the types of the members it contains (including how many there are), and a way to access the different members. We define a class template with two arguments. The first argument is the type we want to annotate. The second argument is the index (counted from 1) of the member we are describing. The general class template has an operator() that just returns an object of type UNKNOWN. For each member of each type we write a specialization of the class template; the specialization has an operator() that, given an object of that type, returns the member with the given index.

// -----------------------------------  ANNOTATE TYPES FOR SELF-INSPECTION
struct UNKNOWN {};

template <class X, int i> struct T { 
    UNKNOWN operator()( X&) { return UNKNOWN();} 
};

template <> struct T<A,1> {  int  operator()(A& a) { return a.i; } };
template <> struct T<A,2> {  A*   operator()(A& a) { return a.a; } };
template <> struct T<A,3> {  B*   operator()(A& a) { return a.b; } };

template <> struct T<B,1> {  A*   operator()(B& b) { return b.a; } };
The flatten function unrolls along the ways types can be defined in C++. We have restricted this example to deal with pointers and structures; we have omitted, for example, reference types, unions, and arrays. Let us follow the flatten function call. First, we distinguish between pointer types and non-pointer types, using partial specialization of a class template to implement this distinction. For pointer types, we apply flatten recursively to the dereferenced value of the pointer (unless the pointer is null).

// -----------------------------------  VALUE / POINTER
template <class X, class I>
struct flatten_class {
    void operator()( X x, list<I>& ls) {
        flatten_items( T<X,1>()(x), T<X,1>(), x, ls);
    }
};
template <class P, class I>
struct flatten_class<P*,I> {
    void operator()( P* x, list<I>& ls) { 
        if ( x != NULL)
            flatten_class<P,I>()(*x, ls);
    }
};

template <class X, class I>
void flatten( X x, list<I>& ls) {
    flatten_class<X,I>()( x, ls);
}
Second, we distinguish between structs and simple types. For structs, we iterate through all members of the struct and call flatten recursively. If the type is simple and equal to the value type of the list, we append the value to the list. Thus, in this implementation of the flatten function, the value type of the list actually determines what gets collected. We again use a class template and partial specialization to do the match. Note how the UNKNOWN return type causes the recursive enumeration of all the members to stop. Note also that when the item found has the value type I of the list but no meta description of its own (as is the case for an int), the second and the fourth specialization match ambiguously; the third specialization resolves this ambiguity.

// -----------------------------------  LEAF / STRUCT WITH SUBITEMS
template <class Item, class X, int n, class I>
struct flatten_items_class {                     // flatten all subitems
    void operator()( Item i, X x, list<I>& ls) {
        flatten( i, ls);
        flatten_items( T<X,n+1>()(x), T<X,n+1>(), x, ls);
    }
};
template <class Item,int n, class I>
struct flatten_items_class<Item,I,n,I> {         // item of type I found
    void operator()( Item, I i, list<I>& ls) {
        ls.push_back(i);
    }
};
template <int n, class I>                        // item of type I found
struct flatten_items_class<UNKNOWN,I,n,I> {
    void operator()( UNKNOWN, I i, list<I>& ls) {
        ls.push_back(i);
    }
};
template <class X, int n, class I>               // no (further) subitems
struct flatten_items_class<UNKNOWN,X,n,I> {
    void operator()( UNKNOWN, X x, list<I>& ls) {}
};

template <class Item, class X, int n, class I>
void flatten_items( Item i, T<X,n>, X x, list<I>& ls) {
    flatten_items_class<Item,X,n,I>()(i,x,ls);
}
This concludes our presentation of flatten.C. The purpose of this example was to stretch the imagination about what would be possible in C++ (and is possible in other languages), and to show how ugly and unmaintainable it looks in C++.


Expression Templates

(Note : this section is inspired by chapter 18 of [Vandevoorde03])

Expression templates have several uses, but basically the advantages of this technique are that expressions can be expanded inline, so that no large temporaries are needed, and that they offer a nicer syntax than hand-written functors.

The typical example of expression templates is the optimization of array operations. Let's consider :
class SArray {
  double * _a;
  size_t   _s;
public:
  SArray (size_t n) : _a(new double[n]), _s(n) {}

  // The class manages a raw pointer, so it also needs a copy constructor
  // and an assignment operator (rule of three).
  SArray (SArray const& rhs) : _a(new double[rhs._s]), _s(rhs._s) {
    for (size_t i = 0; i < _s; ++i) _a[i] = rhs._a[i];
  }

  SArray& operator=(SArray const& rhs) {
    assert(_s == rhs._s);
    for (size_t i = 0; i < _s; ++i) _a[i] = rhs._a[i];
    return *this;
  }

  double const& operator[](size_t i) const { return _a[i]; }
  double &      operator[](size_t i)       { return _a[i]; }

  size_t size() const { return _s; }

  ~SArray() { delete[] _a; }
};

SArray
operator+(SArray const& a, SArray const& b) {
  assert(a.size() == b.size());
  SArray tmp(a.size());
  for (size_t i = 0; i < a.size(); ++i)
    tmp[i] = a[i] + b[i];
  return tmp;
}
Now, if we consider the typical use below, we can see that there is an efficiency problem with the last line : temporary arrays are created in order to store the intermediate results (first x + y, then its sum with z). This does not increase the number of additions to be performed, but it increases the amount of memory required by the program, which makes it slower due to cache effects.
  SArray x, y, z, t;
  ... // do something useful with x, y, z
  t = x + y + z;
We can solve the problem by using a dedicated function that adds three arrays in one shot :
SArray
add_3(SArray const& a, SArray const& b, SArray const& c) {
  assert(a.size() == b.size() && a.size() == c.size());
  SArray tmp(a.size());
  for (size_t i = 0; i < a.size(); ++i)
    tmp[i] = a[i] + b[i] + c[i];
  return tmp;
}

  ...
  t = add_3(x, y, z);
This is inconvenient because the notation is cumbersome, and we would like every expression based on arrays to benefit from the optimization, without having to write a new function for, e.g., x+y*z-z*x, or whatever else the user of our array class wants to compute. This is where expression templates come into play.

The idea is to postpone the actual evaluation of the expression until the assignment operator is seen. This is done by creating an object that encodes the expression in the form of a tree.

Each node in this tree has a type that encodes the operation it represents (e.g. +), together with references to its operands.

template < typename OP1, typename OP2 >
class A_Add {
  OP1 const& op1;   // first  operand
  OP2 const& op2;   // second operand
public:
  A_Add (OP1 const& a, OP2 const& b)
    : op1(a), op2(b) {}

  // What it contributes to the final computation
  double operator[] (size_t i) const {
    return op1[i] + op2[i];
  }

  // Size of the represented result (needed by Array::operator=)
  size_t size() const { return op1.size(); }
};
Notice below how operator+ is only going to create a node of the expression tree, and perform no computation on the arrays at all. But first we need to write a wrapper around the concrete array SArray; its assignment operator is what triggers the recursive evaluation of the tree.
template < typename Rep = SArray >
class Array {
  Rep expr_rep;

public:

  // initialization with a size
  explicit Array(size_t s)
    : expr_rep(s) {}

  // create an array from a possible representation
  Array (Rep const& rb)
    : expr_rep(rb) {}

  // Assignment of arrays of the same type
  Array& operator=(Array const& b) {
    assert(size() == b.size());
    for (size_t i = 0; i < b.size(); ++i)
      expr_rep[i] = b[i];
    return *this;
  }

  // Assignment of arrays of a different type (e.g. an expression tree)
  template < typename Rep2 >
  Array& operator=(Array<Rep2> const& b) {
    assert(size() == b.size());
    for (size_t i = 0; i < b.size(); ++i)
      expr_rep[i] = b[i];
    return *this;
  }

  double operator[] (size_t i) const {  // const, so that it can be applied to the right-hand side
    assert(i < size());
    return expr_rep[i];
  }

  size_t size() const { return expr_rep.size(); }  // forwards to the representation

  Rep const& rep() const { return expr_rep; }
  Rep & rep() { return expr_rep; }
};

template < typename R1, typename R2 >
Array<A_Add<R1, R2> >
operator+(Array<R1> const& a, Array<R2> const& b) {
  return Array<A_Add<R1, R2> > (A_Add<R1, R2>(a.rep(), b.rep()));
}
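To see what this buys us for the earlier example t = x + y + z, here is a usage sketch (our own illustration, assuming the definitions above) :

  Array<> x(100), y(100), z(100), t(100);
  // ... fill x, y, z ...

  // The right-hand side builds, on the stack, an object of type
  //   Array< A_Add< A_Add<SArray, SArray>, SArray > >
  // which only holds references to its operands. The assignment operator
  // then runs a single loop computing t[i] = (x[i] + y[i]) + z[i],
  // without any temporary SArray.
  t = x + y + z;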
Similarly for the other operations :
template < typename OP1, typename OP2 >
class A_Mul {
  OP1 const& op1;   // first  operand
  OP2 const& op2;   // second operand
public:
  A_Mul (OP1 const& a, OP2 const& b)
    : op1(a), op2(b) {}

  double operator[] (size_t i) const {
    return op1[i] * op2[i];
  }

  // Size of the represented result (needed by Array::operator=)
  size_t size() const { return op1.size(); }
};

template < typename R1, typename R2 >
Array<A_Mul<R1, R2> >
operator*(Array<R1> const& a, Array<R2> const& b) {
  return Array<A_Mul<R1, R2> > (A_Mul<R1, R2>(a.rep(), b.rep()));
}
With this implementation, we have achieved what we promised. Note that a complete implementation would also have to take care of some corner cases, like the proper handling of scalar operands and temporaries (by copying them into the tree nodes instead of referencing them), and of cases where the variable being assigned to also appears in the arguments (e.g. x = y + x)...

Caveats : for the optimization to pay off, the compiler itself needs to perform two particular kinds of optimizations : inlining all the small functions which locally create tree nodes on the stack, and splitting the tree node structures into their scalar components. The latter, for example, is not performed by g++ at the moment (version 3.3). As with some other template techniques, expression templates also have the potential to increase compilation times considerably, depending on the program.

Another caveat of the expression templates is a bad interaction with generic libraries. Consider :

template < typename T >
T square(T const& t) {
  return t*t;
}

  Array<> x, y, z;
  ...
  z = square(x);    // OK
  z = square(x+y);  // error
The problem here is that x+y doesn't have the "base" type Array<>, but the type of a tree node. Inside the square function, the return value is therefore going to be of that particular type, and there is no possible conversion between two different "internal" tree node types. The only possible conversion is from an internal tree node to the "base" type Array<>. Therefore the expression has to be converted explicitly before the call to the square function :
  ...
  z = square(Array<>(x+y));   // OK
  z = square<Array<> >(x+y);  // OK
The other possibility is to overload the square function for Array types, as shown below. The advantage is that the call sites do not require any change, but the disadvantage is that the overload has to know about the representation parameter R of Array, which otherwise never shows up when using this class.
template < typename R >
Array<> square(Array<R> const& t) {
  return t*t;
}

// or, taking further advantage of the "optimization" :

template < typename R >
Array<A_Mul<R, R> > square(Array<R> const& t) {
  return t*t;
}
There are other useful applications of expression templates, for example in the Boost Lambda library, which allows one to create functors "on the fly", starting from the placeholder variables _1, _2... provided by the library. This way you can write functors using a natural syntax like :
  std::vector<double*> V;
  ...
  std::sort(V.begin(), V.end(), *_1 > *_2);
Another application can be found in the GMP library (GNU Multiple Precision), which provides C++ types for computing with multi-precision numbers (integers, rationals...). In this case, expression templates are used to avoid creating temporaries. We can also cite PETE (Portable Expression Template Engine), a tool which makes it easy to add expression templates to a library.
Lutz Kettner (<surname>@mpi-sb.mpg.de). Last modified on Tuesday, 29-Jul-2003 12:26:25 MEST.