





Dr. Robert Strzodka
Visiting Researcher
Head of CFD Group

NVIDIA Corp.
2701 San Tomas Expressway
Santa Clara, CA 95050
USA

Email: dozrtska@mped.gpm.fnii
URL: www.mpi-inf.mpg.de/~strzodka/

Visitor information: NVIDIA Santa Clara Headquarters




Research Mission




Projects
Highlights
The group has pioneered several innovative techniques in parallel processing on CMPs and FPGAs.



By mixing coarse-grained MPI cluster-level parallelism with fine-grained coprocessor parallelism, we contributed to a GPU-accelerated FEM package that features a minimally invasive HW-SW integration and has been tested for scalability up to 1 billion unknowns (Link).


Our co-development of mixed precision methods for parallel coprocessors overcame their initial single precision limitation and still delivers results of equal accuracy faster than a direct double precision implementation (Link).


We took part in the design and development of the Glift library for random-access GPU data structures, which enabled higher-level programming and data-parallel execution of complex data-adaptive algorithms on the GPU (Link).


At a time when GPUs still had a fixed-function pipeline and operated in 8-bit precision, we demonstrated their early potential for scientific computing by implementing the first iterative solvers for PDEs (Link). Comparisons to an FPGA and a tile-based CMP followed (Link).


2010


Iterative Stencil Computations
Iterative stencil computations are ubiquitous in scientific computing. The exponential growth of cores in current processors leads to a bandwidth wall, where limited off-chip bandwidth severely restricts the performance of these computations. In this project we aim to overcome this problem with new algorithms that scale mainly with the aggregate cache bandwidth rather than the system bandwidth.
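
One way to picture the trade of system bandwidth for cache bandwidth is overlapped temporal tiling, sketched below in Python. This is a simplified stand-in chosen for illustration, not the algorithm developed in the project: the function names, the 3-point averaging stencil, and the fixed boundary values are all assumptions. Each tile is read once together with a halo of `steps` cells per side and then advanced `steps` time steps locally, so a tile that fits in cache would need to be streamed from main memory only once.

```python
def step(u):
    """One sweep of a 3-point averaging stencil; the two end values stay fixed."""
    if len(u) < 3:
        return list(u)
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def naive(u, steps):
    """Reference version: every time step sweeps over the whole array."""
    for _ in range(steps):
        u = step(u)
    return u


def tiled(u, steps, tile):
    """Overlapped temporal tiling (illustrative assumption, see lead-in).

    Each tile is copied with a halo of `steps` cells per side and advanced
    `steps` time steps on its own. Values computed from a cut-off halo edge
    become wrong one cell further inward per step, so after `steps` steps
    only the halo is contaminated and the tile interior is exact.
    """
    n = len(u)
    out = [0.0] * n
    for a in range(0, n, tile):
        b = min(a + tile, n)
        lo = max(a - steps, 0)   # halo start, clamped to the domain
        hi = min(b + steps, n)   # halo end, clamped to the domain
        block = u[lo:hi]
        for _ in range(steps):
            block = step(block)
        out[a:b] = block[a - lo:b - lo]
    return out
```

Under these assumptions `tiled(u, steps, tile)` returns the same values as `naive(u, steps)`; the price is the redundant work in the halos, which grows with the number of fused time steps.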


Sparse Format Specialization
The processing time of sparse representations of local discrete operators depends heavily on the storage format. The requirements of high accuracy, minimal memory footprint, high data locality, parallelism-friendly layout, wide applicability, and easy modifiability conflict with one another, so only case-specific choices lead to satisfactory results.
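
As a small illustration of how the storage format shapes the work, the Python sketch below applies the same constant tridiagonal operator twice: once from a general CSR representation and once from a format specialized to a constant 3-point stencil. The operator, the function names, and the choice of these two formats are assumptions made for the example and are not taken from the project.

```python
def tridiag_csr(n, lower, diag, upper):
    """Build CSR arrays (values, column indices, row pointers) for an
    n x n tridiagonal matrix with constant bands."""
    vals, cols, ptr = [], [], [0]
    for i in range(n):
        if i > 0:
            vals.append(lower)
            cols.append(i - 1)
        vals.append(diag)
        cols.append(i)
        if i < n - 1:
            vals.append(upper)
            cols.append(i + 1)
        ptr.append(len(vals))
    return vals, cols, ptr


def csr_matvec(vals, cols, ptr, x):
    """General format: one stored value and one indirect load per nonzero."""
    return [
        sum(vals[k] * x[cols[k]] for k in range(ptr[i], ptr[i + 1]))
        for i in range(len(ptr) - 1)
    ]


def stencil_matvec(lower, diag, upper, x):
    """Specialized format: only three coefficients are stored and the
    neighbours are addressed directly, without an index array."""
    n = len(x)
    y = []
    for i in range(n):
        s = diag * x[i]
        if i > 0:
            s += lower * x[i - 1]
        if i < n - 1:
            s += upper * x[i + 1]
        y.append(s)
    return y
```

The CSR version handles any sparsity pattern but stores `3n - 2` values plus index data for this operator, while the stencil version stores three numbers and only works for this one structure. That is the kind of trade-off between wide applicability and memory footprint that the paragraph above describes.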


Parallel Adaptive Data Structures
While GPUs and other highly parallel devices excel at processing regularly structured data, their large SIMD width and high number of cores quickly lead to inefficiencies in fine-granular branches and complex synchronization. However, the adaptive data structures that cause such problems are indispensable for capturing multiscale phenomena. We must rethink our data arrangement in order to reconcile parallel and adaptive requirements.


Balanced Multigrid Solvers
Neither the solvers with the best numerical convergence nor the solvers with the best parallel efficiency are the best choice for the fast solution of PDE problems in practice. The fastest solvers require a delicate balance between their numerical and hardware characteristics. Often, different tradeoffs must be chosen for fine- and coarse-grained parallelism.


HPC Programming Patterns
Although in-hardware processing has always been parallel, traditional programming languages create the illusion of sequential execution. This hurts the performance and readability of the code; however, compatibility and maintainability concerns hinder the adoption of new languages. With a little thought, at least some of the desirable HPC programming patterns, like switching between array-of-structs and struct-of-arrays layouts or passing runtime values as template parameters, can also be realized in traditional languages.
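
The layout-switching pattern can be sketched as a container whose accessors hide whether records are stored as an array of structs (AoS) or a struct of arrays (SoA). The Python version below is only an illustration under stated assumptions: the class name `Records`, the field names, and the runtime `layout` switch are hypothetical, and a compiled language could resolve the same index mapping at compile time.

```python
from array import array

FIELDS = ("x", "y", "z")  # hypothetical record fields


class Records:
    """n records of len(FIELDS) doubles in one flat buffer.

    layout="aos": x0 y0 z0 x1 y1 z1 ...  (fields of one record adjacent)
    layout="soa": x0 x1 ... y0 y1 ... z0 z1 ...  (one field contiguous)
    """

    def __init__(self, n, layout):
        if layout not in ("aos", "soa"):
            raise ValueError("layout must be 'aos' or 'soa'")
        self.n = n
        self.layout = layout
        self.data = array("d", [0.0] * (n * len(FIELDS)))

    def _index(self, i, field):
        # The only place that knows about the memory layout.
        f = FIELDS.index(field)
        if self.layout == "aos":
            return i * len(FIELDS) + f
        return f * self.n + i

    def get(self, i, field):
        return self.data[self._index(i, field)]

    def set(self, i, field, value):
        self.data[self._index(i, field)] = value


def fill(records):
    """Client code is written once and works with either layout."""
    for i in range(records.n):
        records.set(i, "x", float(i))
        records.set(i, "y", 10.0 + i)
        records.set(i, "z", 20.0 + i)
```

Because `fill` only goes through `get` and `set`, changing `Records(n, "aos")` to `Records(n, "soa")` changes the memory order of `data` without touching the client code.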

2005–2009


GPU-Cluster Computing
A single GPU already offers two levels of parallelism, but as with CPUs, demand for higher performance and larger problem sizes leads to the use of GPU clusters, in which every cluster node is equipped with GPUs. This adds intra-node and inter-node parallelism. The main challenge for these heterogeneous systems is the enormous discrepancy in bandwidth between the two finer and the two coarser levels of parallelism.


Mixed-Precision Methods
To obtain a result of high accuracy it is not necessary to compute all intermediate results with high precision. Mixed precision methods apply high precision computations only where necessary and save space or time without decreasing the accuracy of the final solution.
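
A common instance of this idea is iterative refinement: the residual and the solution update are kept in high precision, while the expensive correction solve runs in low precision. The Python sketch below is a minimal illustration under stated assumptions, not the project's implementation: single precision is emulated by rounding through `struct`, the inner solver is plain Jacobi, and all function names and default parameters are made up for the example.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_single(A, r, sweeps):
    """Approximately solve A c = r with Jacobi sweeps, rounding the result
    of every arithmetic operation to single precision."""
    n = len(r)
    c = [0.0] * n
    for _ in range(sweeps):
        nxt = []
        for i in range(n):
            s = f32(r[i])
            for j in range(n):
                if j != i:
                    s = f32(s - f32(f32(A[i][j]) * c[j]))
            nxt.append(f32(s / f32(A[i][i])))
        c = nxt
    return c


def residual(A, x, b):
    """Residual b - A x in double precision."""
    n = len(b)
    return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def mixed_precision_solve(A, b, outer=6, sweeps=10):
    """Iterative refinement: double precision residual and update,
    single precision correction solve."""
    x = [0.0] * len(b)
    for _ in range(outer):
        r = residual(A, x, b)              # high precision
        c = jacobi_single(A, r, sweeps)    # low precision
        x = [xi + ci for xi, ci in zip(x, c)]  # high precision
    return x
```

For a well-conditioned, diagonally dominant matrix, each outer iteration shrinks the error by roughly the accuracy of the low precision solve, so a few outer iterations are enough to approach double precision accuracy even though almost all arithmetic is done in single precision.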


GPGPU Geometric Refinement
GPUs process data of uniform resolution very quickly with massively data-parallel execution. But even massive parallelism cannot compete with adaptive methods when the data size grows cubically under uniform refinement. This project develops parallel refinement strategies with grids and particles that allow higher resolution to be introduced in only parts of the computational domain.


GPGPU Scientific Computing
Scientific simulations have higher accuracy requirements than multimedia processing applications. With the introduction of optimized floating point processing units in graphics processors and reconfigurable hardware, these devices are now also attractive as powerful scientific coprocessors.

2000–2004


Reconfigurable Computing
This project investigates how the enormous parallelism of reconfigurable hardware can be harnessed to accelerate PDE solvers. Both fine- and coarse-grained architectures are examined. The performance is very convincing, but for complex problems higher-level programming languages for these devices are required.


GPGPU Computer Vision
Although graphics processor units (GPUs) are still very restricted in data handling, some strategies allow processing to be focused on data-dependent regions of interest. Thus computer vision algorithms that require computations on changing regions of interest can already benefit from the high GPU performance. Current implementations comprise the Generalized Hough Transform, skeleton computation and motion estimation.


GPGPU Image Processing
The data parallelism in typical image processing algorithms is very well suited to data-stream-based architectures. PDE-based methods for image denoising, segmentation and registration have thus been accelerated on graphics cards.


Visualization
The choice of visualization methods and parameters is already a part of the interpretation process of the data, as it emphasizes certain structures and subdues others. This can have positive effects, uncovering otherwise imperceptible relations in the data, but may also produce false evidence. Combinations of multiple methods and data-based parameter controls try to limit this danger.




Teaching
Lectures
 Robert Strzodka, Ross Walker,
Eduardo Bringa, Ezequiel Ferrero, Carlos Bederián, and Nicolás Wolovick.
GPGPU
computing for scientific applications.
http://www.famaf.unc.edu.ar/grupos/GPGPU/EscuelaGPGPU2011/, 2011.
Lecture & course at the University of Córdoba, Argentina, SS 2011.
 Timothy Lanfear, Hendrik Lensch,
and Robert Strzodka.
Scientific GPU computing.
http://gpulab.imm.dtu.dk/PhDschool/, 2010.
Lecture & course at the Technical University of Denmark, Lyngby, Denmark, SS
2010.
 Hendrik Lensch and Robert
Strzodka.
Massively
parallel computing with CUDA.
http://www.mpi-inf.mpg.de/%7Estrzodka/lectures/ParCo08/, 2008.
Lecture & course at the Saarland University, Saarbrücken, Germany, WS
2008/2009.
Tutorials/Workshops
 Emmanuel Agullo, François
Bodin, Denis Caromel, Jack Dongarra, Florent Duchaine, Luigi Genovese, Judith
Gimenez, Dominik Göddeke, Michael Heroux, Manfred Liebmann, Raymond
Namyst, Enrique Quintana Ortí, Robert Strzodka, Marc Tajchman, Tim
Warburton, and Felix Wolf.
Toward petaflop numerical simulation on parallel hybrid architectures.
http://wwwsop.inria.fr/manifestations/ceaedfinria2011/index_en.html, June 2011.
CEA-EDF-INRIA summer school, Sophia-Antipolis, France.
 Robert Strzodka, Dominik
Göddeke, and Dominik Behr.
GPUs, OpenCL and scientific
computing.
http://www.gpgpu.org/ppam2009/, September 2009.
Tutorial at the International Conference on Parallel Processing and Applied
Mathematics PPAM 2009, Wroclaw, Poland.
 Dominik Göddeke, Robert
Strzodka, and Christian Sigg.
Practical
GPU programming.
http://www.speedup.ch/workshops/w38_2009/tutorial.html, September
2009.
Tutorial at the SPEEDUP Workshop on High-Performance Computing 2009, Lausanne,
Switzerland.
 Dominik Göddeke, Simon Green,
and Robert Strzodka.
GPGPU and CUDA tutorials.
http://www.mathematik.uni-dortmund.de/%7Egoeddeke/arcs2008/, February
2008.
Tutorials at the International Conference on Architecture of Computing Systems
ARCS 2008, Dresden, Germany.
 B. Scott Michel, Ian Buck, Frederica
Darema, Dominik Göddeke, Mary Hall, Allen McPherson, Dinesh Manocha,
Matthew Papakipos, Michael Paolini, Ryan N. Schneider, Mark Segal, Burton
Smith, Robert Strzodka, Marc Tremblay, and John Turner.
General-purpose GPU
computing: Practice and experience.
http://www.gpgpu.org/sc2006/workshop/, November 2006.
Workshop at IEEE/ACM Supercomputing 2006, Tampa, FL.
 Maya Gokhale, Pat McCormick, Robert
Strzodka, Zach Baker, Yang Liu, Paul Henning, Matthew Papakipos, Jeff Inman,
Justin Tripp, Yuan Zhao, Matt Sottile, and Ron Minnich.
Heterogeneous computing: Architectures, tools,
applications.
http://niswww.lanl.gov/%7Emaya/papers/lacsi06heterogeneouscomputing/abs.html, October 2006.
Workshop at Los Alamos Computer Science Institute (LACSI) Symposium 2006,
Santa Fe, NM.
 Dominik Göddeke and Robert
Strzodka.
Scientific
computing on graphics hardware.
http://www.mathematik.uni-dortmund.de/%7Egoeddeke/iccs/, May 2006.
Tutorial at the International Conference on Computational Science (ICCS)
2006, Reading, UK.
 Aaron Lefohn, Ian Buck, Patrick
McCormick, John D. Owens, Tim Purcell, and Robert Strzodka.
GPGPU: General-purpose computation on
graphics processors.
http://www.gpgpu.org/vis2005/, October 2005.
Tutorial in IEEE Visualization 2005, Minneapolis, MN.
 Aaron Lefohn, Ian Buck, John D.
Owens, and Robert Strzodka.
GPGPU: General-purpose computation on
graphics processors.
http://www.gpgpu.org/vis2004/, October 2004.
Tutorial in IEEE Visualization 2004, Austin, TX.




Publications
This list is sorted by type of publication; for a thematic sorting, please visit
the project pages.
Extensive Articles: collections and journals
 Dominik Göddeke, Robert
Strzodka, and Stefan Turek.
Performance and accuracy of hardware-oriented native, emulated and
mixed-precision solvers in FEM simulations.
International Journal of Parallel, Emergent and Distributed Systems
(IJPEDS), Special issue: Applied parallel computing, 22(4):221–256,
January 2007.
(PDF)
 Aaron E. Lefohn, Joe Kniss, Robert
Strzodka, Shubhabrata Sengupta, and John D. Owens.
Glift: An abstraction for generic, efficient GPU data structures.
ACM Transactions on Graphics, 25(1):1–37, Jan 2006.
(PDF)
 Martin Rumpf and Robert Strzodka.
Graphics processor units: New prospects for parallel computing.
In Are Magnus Bruaset and Aslak Tveito, editors, Numerical Solution of
Partial Differential Equations on Parallel Computers, volume 51 of
Lecture Notes in Computational Science and Engineering, pages
89–134. Springer, 2005.
(PDF)
Articles: collections and journals
 Robert Strzodka.
Abstraction for AoS and SoA layout in C++.
In Wen-mei W. Hwu, editor, GPU Computing Gems: Jade Edition.
Morgan Kaufmann, September 2011.
 Dominik Göddeke and Robert Strzodka.
Cyclic reduction tridiagonal solvers on GPUs applied to mixed precision
multigrid.
IEEE Transactions on Parallel and Distributed Systems (TPDS), Special
Issue: High Performance Computing with Accelerators, 22(1):22–32,
January 2011.
(PDF)
(doi:10.1109/TPDS.2010.61)
 Dominik Göddeke and Robert Strzodka.
Mixed precision GPU-multigrid solvers with strong smoothers.
In Jack J. Dongarra, David A. Bader, and Jakub Kurzak, editors,
Scientific Computing with Multicore and Accelerators, pages
131–147. CRC Press, December 2010.
(PDF)
 Dominik Göddeke, Hilmar
Wobker, Robert Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, and Stefan
Turek.
Coprocessor acceleration of an unmodified parallel solid mechanics
code with FEASTGPU.
International Journal of Computational Science and Engineering
(IJCSE), 4(4):254–269, November 2009.
(PDF)
 Nicolas Cuntz, Andreas Kolb, Robert
Strzodka, and Daniel Weiskopf.
Particle level set advection for the interactive visualization of unsteady
3D flow.
Computer Graphics Forum, 27(3):719–726, May 2008.
(PDF)
 Dominik Göddeke, Robert
Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, Hilmar Wobker, Christian
Becker, and Stefan Turek.
Using GPUs to improve multigrid solver performance on a cluster.
International Journal of Computational Science and Engineering
(IJCSE), 4(1):36–55, 2008.
(PDF)
 Dominik Göddeke, Robert
Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, Sven H.M. Buijssen,
Matthias Grajewski, and Stefan Turek.
Exploring weak scalability for FEM calculations on a GPU-enhanced
cluster.
Parallel Computing, Special issue: High-performance computing using
accelerators, 33(10–11):685–699, November 2007.
(PDF)
 Robert Strzodka, Michael Doggett, and
Andreas Kolb.
Scientific computation for simulations on programmable graphics hardware.
Simulation Modelling Practice and Theory, Special Issue: Programmable
Graphics Hardware, 13(8):667–680, Nov 2005.
(PDF)
 Robert Strzodka, Marc Droske,
and Martin Rumpf.
Image registration by a regularized gradient flow - a streaming
implementation in DX9 graphics hardware.
Computing, 73(4):373–389, November 2004.
(PDF)
 Robert Strzodka, Marc Droske,
and Martin Rumpf.
Fast image registration in DX9 graphics hardware.
Journal of Medical Informatics and Technologies, 6:43–49, Nov
2003.
(PDF)
 Udo Diewald, Tobias Preusser,
Martin Rumpf, and Robert Strzodka.
Diffusion models and their accelerated solution in computer vision
applications.
Acta Mathematica Universitatis Comenianae (AMUC), 70(1):15–31,
2001.
(PDF)
 J. Becker, D. Bürkle, R.T.
Happe, T. Preusser, M. Rumpf, M. Spielberg, and R. Strzodka.
Aspects on data analysis and visualization for complex dynamical
systems.
In Bernold Fiedler, editor, Ergodic Theory, Analysis, and Efficient
Simulation of Dynamical Systems, pages 417–430. Springer, 2000.
(PDF)
Articles: conference proceedings
 Robert Strzodka, Mohammed Shaheen,
Dawid Pajak, and Hans-Peter Seidel.
Cache accurate time skewing in iterative stencil computations.
In Proceedings of the International Conference on Parallel Processing
(ICPP), pages 571–581. IEEE Computer Society, September 2011.
(PDF)
(doi:10.1109/ICPP.2011.47)
 Robert Strzodka, Mohammed
Shaheen, Dawid Pajak, and Hans-Peter Seidel.
Cache oblivious parallelograms in iterative stencil computations.
In ICS '10: Proceedings of the 24th ACM International Conference on
Supercomputing, pages 49–59. ACM, June 2010.
(PDF)
(doi:10.1145/1810085.1810096)
 M. Shaheen, J. Gall,
R. Strzodka, L. Van Gool, and H.-P. Seidel.
A comparison of 3d model-based tracking approaches for human motion
capture in uncontrolled environments.
In IEEE Workshop on Applications of Computer Vision (WACV'09),
pages 1–8, December 2009.
(PDF)
 Robert Strzodka and Dominik
Göddeke.
Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE
solvers from low precision components.
In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM
2006), pages 259–268, April 2006.
(PDF)
 Alexandru Telea and Robert Strzodka.
Multiscale image based flow visualization.
In Proc. of SPIE-IS&T Electronic Imaging, Visualization and Data
Analysis (VDA) 2006, volume 6060, pages 1–11, Jan 2006.
(PDF)
 Dominik Göddeke, Robert
Strzodka, and Stefan Turek.
Accelerating double precision FEM simulations with GPUs.
In Proceedings of ASIM 2005 - 18th Symposium on Simulation
Technique, Sep 2005.
(PDF)
 Robert Strzodka and Christoph Garbe.
Real-time motion estimation and visualization on graphics cards.
In Proceedings IEEE Visualization 2004, pages 545–552, 2004.
(PDF)
 Robert Strzodka and Alexandru
Telea.
Generalized Distance Transforms and skeletons in graphics hardware.
In Proceedings of EG/IEEE TCVG Symposium on Visualization (VisSym
'04), pages 221–230, 2004.
(PDF)
 Robert Strzodka, Ivo Ihrke, and
Marcus Magnor.
A graphics hardware implementation of the Generalized Hough Transform for
fast object recognition, scale, and 3d pose detection.
In Proceedings of IEEE International Conference on Image Analysis and
Processing (ICIAP'03), pages 188–193, 2003.
(PDF)
 Robert Strzodka.
Virtual 16 bit precise operations on RGBA8 textures.
In Proceedings of Vision, Modeling, and Visualization (VMV'02),
pages 171–178, 2002.
(PDF)
 Steffen Klupsch, Markus Ernst,
Sorin A. Huss, Martin Rumpf, and Robert Strzodka.
Real time image processing based on reconfigurable hardware
acceleration.
In Proceedings of IEEE Workshop Heterogeneous reconfigurable Systems on
Chip, 2002.
(PDF)
 Martin Rumpf and Robert Strzodka.
Using graphics cards for quantized FEM computations.
In Proceedings of IASTED Visualization, Imaging and Image Processing
Conference (VIIP'01), pages 193–202, 2001.
(PDF)
 Martin Rumpf and Robert Strzodka.
Level set segmentation in graphics hardware.
In Proceedings of IEEE International Conference on Image Processing
(ICIP'01), volume 3, pages 1103–1106, 2001.
(PDF)
 Martin Rumpf and Robert
Strzodka.
Nonlinear diffusion in graphics hardware.
In Proceedings of EG/IEEE TCVG Symposium on Visualization (VisSym
'01), pages 75–84, 2001.
(PDF)
 Michael Dellnitz, Oliver Junge,
Martin Rumpf, and Robert Strzodka.
The computation of an unstable invariant set inside a cylinder containing a
knotted flow.
In B. Fiedler, K. Gröger, and J. Sprekels, editors, Proceedings of the
Equadiff'99, pages 1015–1020. World Scientific, 2000.
(PDF)
Extended Abstracts
 Robert Strzodka, Mohammed Shaheen,
and Dawid Pajak.
Time skewing made simple.
In Proceedings ACM symposium on principles and practice of parallel
programming, PPoPP '11, February 2011.
(PDF)
 Robert Strzodka, Mohammed Shaheen, and
Dawid Pajak.
Overcoming bandwidth limitations in visual computing.
In Proceedings of Visual Computing Research Conference,
Saarbrücken, Germany, December 2009.
(PDF)
 Robert Strzodka and Dominik
Göddeke.
Mixed precision methods for convergent iterative schemes.
In Proceedings of the 2006 Workshop on Edge Computing Using New Commodity
Architectures, pages D-59–60, May 2006.
(PDF)
 Aaron E. Lefohn, Shubhabrata
Sengupta, Joe Kniss, Robert Strzodka, and John D. Owens.
Glift: Generic data structures for the GPU.
In Proceedings of the 2006 Workshop on Edge Computing Using New Commodity
Architectures, pages D-15–16, May 2006.
(PDF)
 Aaron Lefohn, Shubhabrata Sengupta,
Joe Kniss, Robert Strzodka, and John D. Owens.
Dynamic adaptive shadow maps on graphics hardware.
In ACM SIGGRAPH 2005 Conference Abstracts and Applications, Aug
2005.
(PDF)
 Joe Kniss, Aaron Lefohn, Robert
Strzodka, Shubhabrata Sengupta, and John D. Owens.
Octree textures on graphics hardware.
In ACM SIGGRAPH 2005 Conference Abstracts and Applications, Aug
2005.
(PDF)
Thesis



