Max Planck Center   (Max Planck Institut Informatik)
NVIDIA Research, Emerging Applications   (NVIDIA Corp.)


[Photograph of Robert Strzodka]
Dr. Robert Strzodka

Visiting Researcher
Head of CFD Group

NVIDIA Corp.
2701 San Tomas Expressway
Santa Clara, CA 95050
USA


Email:  strzodka@mpi-inf.mpg.de
URL:    www.mpi-inf.mpg.de/~strzodka/

Visitor information: NVIDIA Santa Clara Headquarters


Research Mission

Our research focuses on significantly improving performance and accuracy in scientific computing through a global optimization across the entire spectrum of continuous modeling, numerical analysis, algorithm design, software implementation, and hardware acceleration.

The concatenation of individually optimal solutions on each of these layers often performs poorly due to conflicting requirements at the interfaces. Consequently, the integration of individually suboptimal but inter-coordinated solutions from all layers can be far superior. Even when application complexity prevents a global optimization, an integrative consideration of several layers already proves beneficial.

Application areas of particular interest in this context are the solution of partial differential equations and real-time image processing.

Current topics
  • Heterogeneous coprocessor cluster computing
  • Large-scale SW-HW integration
  • Parallel adaptive data structures
  • Bandwidth reduction techniques
  • Global accuracy optimization
  • Real-time image processing pipelines



Projects

Highlights

The group has pioneered several innovative techniques in parallel processing on CMPs and FPGAs.

  • Mixing coarse-grained MPI parallelism at the cluster level with fine-grained co-processor parallelism, we contributed to a GPU-accelerated FEM package that features a minimally invasive HW-SW integration and has been tested for scalability up to 1 billion unknowns (Link).
  • Our co-development of mixed precision methods for parallel co-processors overcame their initial single precision limitation and still delivers results of equal accuracy faster than a direct double precision implementation (Link).
  • We took part in the design and development of the Glift library for random-access GPU data structures, which enabled higher-level programming and data parallel execution of complex data-adaptive algorithms on the GPU (Link).
  • At a time when GPUs still had a fixed-function pipeline and operated at 8-bit precision, we demonstrated their early potential for scientific computing by implementing the first iterative PDE solvers on them (Link). Comparisons to an FPGA and a tile-based CMP followed (Link).


2010-


Iterative Stencil Computations

Iterative stencil computations are ubiquitous in scientific computing, and the exponential growth of cores in current processors leads to a bandwidth wall: limited off-chip bandwidth severely restricts their performance. In this project we aim to overcome this problem with new algorithms that scale mainly with the aggregate cache bandwidth rather than with the system bandwidth.
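
A minimal sketch of the idea, assuming a 1D 3-point stencil with fixed boundary values (illustrative code, not the project's implementation): each block of the grid is advanced several time steps while it resides in cache, and a halo of width equal to the number of fused steps is recomputed redundantly so that blocks stay independent. Off-chip traffic then drops by roughly the number of fused steps.

    // Overlapped temporal blocking for a 1D 3-point stencil (sketch).
    // A naive code streams the whole grid from memory once per time step;
    // here each block is advanced T steps while it fits in cache.
    #include <algorithm>
    #include <vector>

    void blocked_sweeps(std::vector<double>& a, int T, int block) {
        const int n = static_cast<int>(a.size());
        std::vector<double> out(a);
        for (int lo = 1; lo < n - 1; lo += block) {
            const int hi  = std::min(lo + block, n - 1);   // owned points [lo, hi)
            const int wlo = std::max(0, lo - T);           // working set incl. halo
            const int whi = std::min(n - 1, hi - 1 + T);
            std::vector<double> cur(a.begin() + wlo, a.begin() + whi + 1);
            std::vector<double> nxt(cur);
            for (int t = 0; t < T; ++t) {                  // T sweeps, all in cache
                for (int j = 1; j + 1 < static_cast<int>(cur.size()); ++j)
                    nxt[j] = 0.25 * cur[j - 1] + 0.5 * cur[j] + 0.25 * cur[j + 1];
                std::swap(cur, nxt);
            }
            for (int g = lo; g < hi; ++g)                  // halo results are discarded
                out[g] = cur[g - wlo];
        }
        a.swap(out);
    }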

Sparse Format Specialization

The processing time of sparse representations of local discrete operators depends heavily on the storage format. The requirements of high accuracy, minimal memory footprint, high data locality, a parallelism-friendly layout, wide applicability, and easy modifiability contradict each other, so only case-specific choices lead to satisfactory results.
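
A compressed sparse row (CSR) kernel marks one concrete point in this design space (a minimal sketch for illustration; formats such as ELL or blocked variants shift the balance differently):

    // CSR: compact and widely applicable, but the indirect column accesses
    // limit data locality and SIMD friendliness -- one reason why no single
    // format satisfies all of the requirements listed above.
    #include <cstddef>
    #include <vector>

    struct Csr {
        std::vector<int>    row_ptr;  // size rows+1: start of each row in col/val
        std::vector<int>    col;      // column index of each nonzero
        std::vector<double> val;      // value of each nonzero
    };

    // y = A * x
    void spmv(const Csr& A, const std::vector<double>& x, std::vector<double>& y) {
        const std::size_t rows = A.row_ptr.size() - 1;
        for (std::size_t r = 0; r < rows; ++r) {
            double sum = 0.0;
            for (int k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k)
                sum += A.val[k] * x[A.col[k]];  // gather: locality depends on the pattern
            y[r] = sum;
        }
    }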

Parallel Adaptive Data Structures

While GPUs and other highly parallel devices excel at processing regularly structured data, their large SIMD width and high core count quickly lead to inefficiencies with fine-granular branching and complex synchronization. However, the adaptive data structures that cause such problems are indispensable for capturing multi-scale phenomena. We must rethink our data arrangement in order to reconcile parallel and adaptive requirements.
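
A minimal sketch of the kind of rearrangement meant here (an assumed illustration, not the group's actual structure): an adaptive tree flattened into level-wise arrays, so that wide SIMD units stream over all nodes of one level without pointer chasing or divergent recursion.

    // Adaptive tree in struct-of-arrays form: nodes of each level are
    // contiguous, children are reached through an index array, and a
    // traversal becomes one regular pass per level.
    #include <cstddef>
    #include <vector>

    struct FlatTree {
        // per level and node: index of the first child on the next level,
        // or -1 if the node is a leaf
        std::vector<std::vector<int>>    first_child;
        std::vector<std::vector<double>> value;  // payload per node
    };

    // Sum all leaf values level by level -- a loop nest instead of a
    // recursion, which maps naturally onto data-parallel hardware.
    double sum_leaves(const FlatTree& t) {
        double sum = 0.0;
        for (std::size_t lvl = 0; lvl < t.value.size(); ++lvl)
            for (std::size_t n = 0; n < t.value[lvl].size(); ++n)
                if (t.first_child[lvl][n] < 0)   // leaf
                    sum += t.value[lvl][n];
        return sum;
    }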

Balanced Multigrid Solvers

Neither the solvers with the best numerical convergence nor those with the best parallel efficiency are the fastest choice for solving PDE problems in practice. The fastest solvers require a delicate balance between numerical and hardware characteristics, and often different tradeoffs must be chosen for fine- and coarse-grained parallelism.
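
A compact 1D Poisson V-cycle shows where these tradeoffs enter (an illustrative sketch, not one of the group's solvers): damped Jacobi is numerically weaker than, say, Gauss-Seidel, but every point updates independently, which suits fine-grained parallelism; the number of sweeps nu is a further balance knob.

    // 1D Poisson V-cycle, A = tridiag(-1, 2, -1) / h^2 on n = 2^k - 1
    // interior points with homogeneous Dirichlet boundaries.
    #include <cstddef>
    #include <vector>

    using Vec = std::vector<double>;

    static void jacobi(Vec& u, const Vec& f, double h2, int nu) {
        const std::size_t n = u.size();
        Vec tmp(n);
        for (int s = 0; s < nu; ++s) {           // damped Jacobi, weight 2/3
            for (std::size_t i = 0; i < n; ++i) {
                const double l = (i > 0)     ? u[i - 1] : 0.0;
                const double r = (i + 1 < n) ? u[i + 1] : 0.0;
                tmp[i] = u[i] + (1.0 / 3.0) * (h2 * f[i] + l + r - 2.0 * u[i]);
            }
            u.swap(tmp);
        }
    }

    void v_cycle(Vec& u, const Vec& f, double h2, int nu) {
        if (u.size() == 1) { u[0] = 0.5 * h2 * f[0]; return; }  // exact coarsest solve
        jacobi(u, f, h2, nu);                                   // pre-smoothing
        const std::size_t n = u.size(), m = (n - 1) / 2;
        Vec r(n), rc(m), ec(m, 0.0);
        for (std::size_t i = 0; i < n; ++i) {                   // residual f - A*u
            const double l  = (i > 0)     ? u[i - 1] : 0.0;
            const double rr = (i + 1 < n) ? u[i + 1] : 0.0;
            r[i] = f[i] - (2.0 * u[i] - l - rr) / h2;
        }
        for (std::size_t j = 0; j < m; ++j)                     // full-weighting restriction
            rc[j] = 0.25 * (r[2 * j] + 2.0 * r[2 * j + 1] + r[2 * j + 2]);
        v_cycle(ec, rc, 4.0 * h2, nu);                          // coarse-grid correction
        for (std::size_t j = 0; j < m; ++j) {                   // linear prolongation
            u[2 * j + 1] += ec[j];
            u[2 * j]     += 0.5 * ec[j];
            u[2 * j + 2] += 0.5 * ec[j];
        }
        jacobi(u, f, h2, nu);                                   // post-smoothing
    }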

HPC Programming Patterns

Although in-hardware processing has always been parallel, traditional programming languages create the illusion of sequential execution. This hurts both the performance and the readability of the code; however, compatibility and maintainability concerns hinder the adoption of new languages. With a little thought, at least some of the desirable HPC programming patterns, such as switching between array-of-structs and struct-of-arrays layouts or passing run-time values as template parameters, can also be realized in traditional languages.
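
Both patterns mentioned above can be sketched in plain C++ (assumed illustrations, not the group's library code):

    #include <cstddef>
    #include <vector>

    // 1) Layout switching: the accessor hides whether the data lives as
    //    array-of-structs or struct-of-arrays, so kernels stay unchanged
    //    when the layout is swapped per platform.
    enum class Layout { AoS, SoA };

    template <Layout L> struct Particles;

    template <> struct Particles<Layout::AoS> {
        struct P { double x, y, z; };
        std::vector<P> p;
        double& x(std::size_t i) { return p[i].x; }
    };

    template <> struct Particles<Layout::SoA> {
        std::vector<double> xs, ys, zs;
        double& x(std::size_t i) { return xs[i]; }
    };

    // 2) Lifting a run-time value to a template parameter: one dispatch,
    //    then the kernel body sees a compile-time constant that the
    //    compiler can unroll and vectorize around.
    template <int Radius> void stencil_kernel() { /* Radius is constexpr here */ }

    void run_stencil(int radius) {
        switch (radius) {            // run-time -> compile-time dispatch
            case 1: stencil_kernel<1>(); break;
            case 2: stencil_kernel<2>(); break;
            case 3: stencil_kernel<3>(); break;
            default: /* generic fallback */ break;
        }
    }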


2005-2009


GPU-Cluster Computing

A single GPU already offers two levels of parallelism, but, as with CPUs, the demand for higher performance and larger problem sizes leads to GPU clusters, in which every cluster node is equipped with GPUs. This adds intra-node and inter-node parallelism as two coarser levels. The main challenge for these heterogeneous systems is the enormous discrepancy in bandwidth between the two finer and the two coarser levels of parallelism.
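
The standard remedy is to hide the slow levels behind the fast ones. A minimal 1D sketch with MPI (illustrative; buffer sizes and the compute kernel are placeholders): the halo exchange is posted first, the halo-independent interior is computed while the messages are in flight, and only the few boundary points wait for the network.

    #include <cstddef>
    #include <mpi.h>
    #include <vector>

    // 3-point average of 'in' into 'out' on indices [lo, hi); on a GPU
    // cluster this loop is where the device kernel would run
    static void relax(const std::vector<double>& in, std::vector<double>& out,
                      std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i)
            out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }

    // one time step; left/right are neighbor ranks (MPI_PROC_NULL at the
    // chain ends), g[0] and g[size-1] are the halo cells
    void step(std::vector<double>& g, int left, int right, MPI_Comm comm) {
        const std::size_t n = g.size();
        std::vector<double> next(g);
        MPI_Request req[4];
        MPI_Irecv(&g[0],     1, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Irecv(&g[n - 1], 1, MPI_DOUBLE, right, 1, comm, &req[1]);
        MPI_Isend(&g[1],     1, MPI_DOUBLE, left,  1, comm, &req[2]);
        MPI_Isend(&g[n - 2], 1, MPI_DOUBLE, right, 0, comm, &req[3]);
        relax(g, next, 2, n - 2);               // interior: overlaps communication
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        relax(g, next, 1, 2);                   // the two halo-dependent points
        relax(g, next, n - 2, n - 1);
        next[0] = g[0]; next[n - 1] = g[n - 1]; // carry received halos forward
        g.swap(next);
    }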

Mixed-Precision Methods

To obtain a result of high accuracy, it is not necessary to compute all intermediate results in high precision. Mixed precision methods apply high precision computations only where necessary and save space or time without decreasing the accuracy of the final solution.
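
The classic instance is iterative refinement, sketched here on a small tridiagonal system (a toy illustration; the project targets co-processor solvers for large systems). The inner solve runs entirely in float, only the residual and the update are computed in double, and the outer iteration still converges to double precision accuracy.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // y = A*v for A = tridiag(-1, 2, -1), in precision T
    template <typename T>
    std::vector<T> apply_a(const std::vector<T>& v) {
        const std::size_t n = v.size();
        std::vector<T> y(n);
        for (std::size_t i = 0; i < n; ++i)
            y[i] = T(2) * v[i] - (i > 0 ? v[i - 1] : T(0))
                               - (i + 1 < n ? v[i + 1] : T(0));
        return y;
    }

    // crude float-only inner solver: a fixed number of Jacobi sweeps
    static std::vector<float> solve_f(const std::vector<float>& r) {
        std::vector<float> c(r.size(), 0.0f), t(r.size());
        for (int s = 0; s < 200; ++s) {
            for (std::size_t i = 0; i < c.size(); ++i)
                t[i] = 0.5f * (r[i] + (i > 0 ? c[i - 1] : 0.0f)
                                    + (i + 1 < c.size() ? c[i + 1] : 0.0f));
            c.swap(t);
        }
        return c;
    }

    int main() {
        const std::size_t n = 50;
        std::vector<double> b(n, 1.0), x(n, 0.0);
        for (int it = 0; it < 10; ++it) {                     // outer refinement loop
            const std::vector<double> ax = apply_a(x);        // residual in double
            std::vector<float> r(n);
            for (std::size_t i = 0; i < n; ++i) r[i] = float(b[i] - ax[i]);
            const std::vector<float> c = solve_f(r);          // cheap float inner solve
            for (std::size_t i = 0; i < n; ++i) x[i] += c[i]; // update in double
        }
        std::printf("x[n/2] = %.15f\n", x[n / 2]);
    }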

GPGPU Geometric Refinement

GPUs process data of uniform resolution very quickly with massively data parallel execution. But even massive parallelism cannot compete with adaptive methods when the data size grows cubically under uniform refinement. This project develops parallel refinement strategies with grids and particles that introduce higher resolution in only parts of the computational domain.
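
A minimal sketch of the selection step behind such strategies (an assumed illustration): an error indicator, here a simple gradient magnitude, marks the cells to subdivide, so that resolution grows only where the solution demands it rather than cubically everywhere.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // return indices of cells whose local gradient exceeds 'tol'
    std::vector<int> cells_to_refine(const std::vector<double>& u, double tol) {
        std::vector<int> marked;
        for (std::size_t i = 1; i + 1 < u.size(); ++i)
            if (0.5 * std::fabs(u[i + 1] - u[i - 1]) > tol)  // central difference
                marked.push_back(static_cast<int>(i));
        return marked;
    }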

GPGPU Scientific Computing

Scientific simulations have higher accuracy requirements than multimedia processing applications. With the introduction of optimized floating point units in graphics processors and reconfigurable hardware, these devices have also become attractive as powerful scientific co-processors.


2000-2004


Reconfigurable Computing

This project investigates how the enormous parallelism of reconfigurable hardware can be harnessed to accelerate PDE solvers. Both fine- and coarse-grained architectures are examined. The performance is very convincing, but complex problems require higher-level programming languages for these devices.

GPGPU Computer Vision

Although graphics processing units (GPUs) are still very restricted in data handling, some strategies allow the processing to focus on data-dependent regions of interest. Thus computer vision algorithms that require computations on changing regions of interest can already benefit from the high GPU performance. Current implementations comprise the Generalized Hough Transform, skeleton computation, and motion estimation.

GPGPU Image Processing

The data parallelism in typical image processing algorithms is very well suited to data-stream-based architectures. PDE-based methods for image denoising, segmentation, and registration have thus been accelerated on graphics cards.
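
A single explicit step of Perona-Malik type nonlinear diffusion, the archetype of such PDE-based denoising, illustrates why these methods map so well onto graphics hardware (a CPU sketch for clarity): every output pixel depends only on its local neighborhood, so one thread per pixel suffices.

    #include <cmath>
    #include <vector>

    // one explicit nonlinear diffusion step; stable for dt <= 0.25
    void diffuse_step(std::vector<float>& img, int w, int h,
                      float dt, float lambda) {
        auto g = [lambda](float d) {             // edge-stopping function
            return 1.0f / (1.0f + (d * d) / (lambda * lambda));
        };
        std::vector<float> out(img);
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x) {
                const float c  = img[y * w + x];
                const float dn = img[(y - 1) * w + x] - c;
                const float ds = img[(y + 1) * w + x] - c;
                const float de = img[y * w + x + 1] - c;
                const float dw = img[y * w + x - 1] - c;
                // independent per-pixel update: ideal for stream hardware
                out[y * w + x] = c + dt * (g(dn) * dn + g(ds) * ds
                                         + g(de) * de + g(dw) * dw);
            }
        img.swap(out);
    }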


Visualization

The choice of visualization methods and parameters is already part of the interpretation of the data, as it emphasizes certain structures and subdues others. This can have positive effects, uncovering otherwise hidden relations in the data, but it may also produce false evidence. Combinations of multiple methods and data-based parameter controls try to limit this danger.


Teaching

Lectures

  • Robert Strzodka, Ross Walker, Eduardo Bringa, Ezequiel Ferrero, Carlos Bederián, and Nicolás Wolovick. GPGPU computing for scientific applications. http://www.famaf.unc.edu.ar/grupos/GPGPU/EscuelaGPGPU2011/, 2011. Lecture & course at the University of Córdoba, Argentina, SS 2011.
  • Timothy Lanfear, Hendrik Lensch, and Robert Strzodka. Scientific GPU computing. http://gpulab.imm.dtu.dk/PhDschool/, 2010. Lecture & course at the Technical University of Denmark, Lyngby, Denmark, SS 2010.
  • Hendrik Lensch and Robert Strzodka. Massively parallel computing with CUDA. http://www.mpi-inf.mpg.de/%7Estrzodka/lectures/ParCo08/, 2008. Lecture & course at the Saarland University, Saarbrücken, Germany, WS 2008/2009.

Tutorials/Workshops

  • Emmanuel Agullo, François Bodin, Denis Caromel, Jack Dongarra, Florent Duchaine, Luigi Genovese, Judith Gimenez, Dominik Göddeke, Michael Heroux, Manfred Liebmann, Raymond Namyst, Enrique Quintana Ortí, Robert Strzodka, Marc Tajchman, Tim Warburton, and Felix Wolf. Toward petaflop numerical simulation on parallel hybrid architectures. http://www-sop.inria.fr/manifestations/cea-edf-inria-2011/index_en.html, June 2011. CEA-EDF-INRIA summer school, Sophia-Antipolis, France.
  • Robert Strzodka, Dominik Göddeke, and Dominik Behr. GPUs, OpenCL and scientific computing. http://www.gpgpu.org/ppam2009/, September 2009. Tutorial at the International Conference on Parallel Processing and Applied Mathematics PPAM 2009, Wroclaw, Poland.
  • Dominik Göddeke, Robert Strzodka, and Christian Sigg. Practical GPU programming. http://www.speedup.ch/workshops/w38_2009/tutorial.html, September 2009. Tutorial at the SPEEDUP Workshop on High-Performance Computing 2009, Lausanne, Switzerland.
  • Dominik Göddeke, Simon Green, and Robert Strzodka. GPGPU and CUDA tutorials. http://www.mathematik.uni-dortmund.de/%7Egoeddeke/arcs2008/, February 2008. Tutorials at the International Conference on Architecture of Computing Systems ARCS 2008, Dresden, Germany.
  • B. Scott Michel, Ian Buck, Frederica Darema, Dominik Göddeke, Mary Hall, Allen McPherson, Dinesh Manocha, Matthew Papakipos, Michael Paolini, Ryan N. Schneider, Mark Segal, Burton Smith, Robert Strzodka, Marc Tremblay, and John Turner. General-purpose GPU computing: Practice and experience. http://www.gpgpu.org/sc2006/workshop/, November 2006. Workshop at IEEE/ACM Supercomputing 2006, Tampa, FL.
  • Maya Gokhale, Pat McCormick, Robert Strzodka, Zach Baker, Yang Liu, Paul Henning, Matthew Papakipos, Jeff Inman, Justin Tripp, Yuan Zhao, Matt Sottile, and Ron Minnich. Heterogeneous computing: Architectures, tools, applications. http://nis-www.lanl.gov/%7Emaya/papers/lacsi-06-heterogeneous-computing/abs.html, October 2006. Workshop at Los Alamos Computer Science Institute (LACSI) Symposium 2006, Santa Fe, NM.
  • Dominik Göddeke and Robert Strzodka. Scientific computing on graphics hardware. http://www.mathematik.uni-dortmund.de/%7Egoeddeke/iccs/, May 2006. Tutorial at the International Conference on Computational Science (ICCS) 2006, Reading, UK.
  • Aaron Lefohn, Ian Buck, Patrick McCormick, John D. Owens, Tim Purcell, and Robert Strzodka. GPGPU: General-purpose computation on graphics processors. http://www.gpgpu.org/vis2005/, October 2005. Tutorial in IEEE Visualization 2005, Minneapolis, MN.
  • Aaron Lefohn, Ian Buck, John D. Owens, and Robert Strzodka. GPGPU: General-purpose computation on graphics processors. http://www.gpgpu.org/vis2004/, October 2004. Tutorial in IEEE Visualization 2004, Austin, TX.


Publications

This list is sorted by type of publication; for a thematic sorting, please visit the project pages.

Extensive Articles: collections and journals

Articles: collections and journals

Articles: conference proceedings

Extended Abstracts

Thesis


