
Since the 1990s, graphics cards
have developed quickly from primitive drawing devices into a major
computing resource in a PC. High-end graphics processing units (GPUs)
already have far more transistors than a typical CPU. Moreover, they devote
the majority of these transistors to computation, whereas a large
percentage of a CPU is occupied by caches.
We started research
on implementations of partial differential equation (PDE) solvers in
graphics hardware in 2000. At that time GPUs were severely restricted in
number-format precision and programmability. Their main
advantage was a much higher memory bandwidth than that of the PC's main memory. Many
image processing applications have exactly these requirements and tolerate
these restrictions. They involve large image data that must be
transferred quickly, and they do not need ultimate precision for exact
computations but rather a faithful reconstruction of the image
evolution known from the continuous PDE model. In the case of non-linear
diffusion, the characteristic features are the decreasing diffusivity in areas of large
gradients and the smoothing in image regions that are expected to lie
away from edges (Fig. 1). For the level-set evolution, they
are the fast front propagation in homogeneous regions and the
deceleration of the front at segment borders (Fig. 2).
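
The diffusion behaviour described above can be illustrated with a minimal CPU sketch. It is not the GPU implementation discussed here: it assumes a 1D signal, unit grid spacing, an explicit time step, and a Perona-Malik type diffusivity. The names `diffusivity`, `diffusion_step`, `tau` and `lam` are chosen for this illustration only.

```python
def diffusivity(grad, lam):
    """Assumed Perona-Malik type diffusivity: near 1 for small gradients,
    tending to 0 for gradients much larger than lam."""
    return 1.0 / (1.0 + (grad / lam) ** 2)


def diffusion_step(u, tau, lam):
    """One explicit time step of 1D non-linear diffusion u_t = (g(u_x) u_x)_x
    with zero-flux boundaries and unit grid spacing (tau <= 0.5 assumed)."""
    n = len(u)
    # Flux across the interface between samples i and i + 1.
    flux = []
    for i in range(n - 1):
        grad = u[i + 1] - u[i]
        flux.append(diffusivity(grad, lam) * grad)
    new_u = []
    for i in range(n):
        right_flux = flux[i] if i < n - 1 else 0.0   # zero flux at the right border
        left_flux = flux[i - 1] if i > 0 else 0.0    # zero flux at the left border
        new_u.append(u[i] + tau * (right_flux - left_flux))
    return new_u
```

Applied repeatedly to a noisy step signal with `tau = 0.25` and `lam = 1.0`, such a step smooths the small oscillations on either side of the step while the large jump is attenuated only slowly, because the diffusivity is small where the gradient is large.
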
With the advent of
floating point units in DirectX9 GPUs, not only the number format has
changed but also the balance between the bandwidth of the video memory
and the computational power of the GPUs. Previously, the bandwidth
was essentially sufficient to supply every processing element of the GPU with
its own data. Floating point computations on today's GPUs, however, face
a bandwidth shortage, as on microprocessors. As a consequence,
operations with higher computational intensity should be executed, i.e.
several operations should be performed on each data item read. The
greatly increased programmability also supports the implementation of more
complex algorithms and allows the incorporation of more advanced
numerical methods. The task of image registration requires finding a
deformation between two images that minimizes a certain energy, e.g.
one measuring intensity differences. We implemented a cascaded gradient flow PDE for
the minimization of the energy (Fig. 3). The algorithm operates on a
multi-scale representation, realized as a multi-grid hierarchy with several scales
per grid. Efficient
multi-grid solvers and an adaptive time-step control accelerate the
solution. Without high-level programming languages, this complexity
could hardly be realized on GPUs.
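
The idea of a gradient flow with adaptive time-step control can be sketched on a toy problem. This is a CPU illustration under strong assumptions, not the cascaded multi-grid scheme described above: it uses a 1D setting, an unregularized intensity-difference energy, and a simple rule that halves the step when the energy does not decrease and doubles it after a success. All names (`gradient_flow`, `template`, `reference`, `XS`) are hypothetical.

```python
import math


def gradient_flow(energy, gradient, u, tau=1.0, steps=100):
    """Descend along -gradient(u); the time step tau is halved until the
    energy decreases and doubled again after each accepted step."""
    e = energy(u)
    for _ in range(steps):
        g = gradient(u)
        while tau > 1e-12:
            trial = [ui - tau * gi for ui, gi in zip(u, g)]
            e_trial = energy(trial)
            if e_trial < e:
                u, e = trial, e_trial
                tau *= 2.0
                break
            tau *= 0.5
        else:
            break  # no step size reduced the energy: treat as converged
    return u, e


def template(y):
    """Hypothetical template image T: a bump centred at 0.5."""
    return math.exp(-(y - 0.5) ** 2)


def template_dx(y):
    """Derivative of the template with respect to position."""
    return -2.0 * (y - 0.5) * template(y)


def reference(x):
    """Hypothetical reference image R: the same bump centred at 0."""
    return math.exp(-x ** 2)


XS = [-2.0 + 0.4 * i for i in range(11)]  # sample positions


def registration_energy(u):
    """Intensity-difference energy 0.5 * sum (T(x + u) - R(x))^2."""
    return 0.5 * sum((template(x + ui) - reference(x)) ** 2
                     for x, ui in zip(XS, u))


def registration_gradient(u):
    """Derivative of the energy with respect to each displacement u_i."""
    return [(template(x + ui) - reference(x)) * template_dx(x + ui)
            for x, ui in zip(XS, u)]
```

Calling `gradient_flow(registration_energy, registration_gradient, [0.0] * len(XS))` returns a displacement field and its energy, which is lower than the energy of the zero displacement. Because only steps that decrease the energy are accepted, the energy sequence is monotone by construction, at the cost of extra energy evaluations per step.
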



Fig. 3. Elimination of a possible acquisition artifact computed in DirectX9 graphics hardware.


