Recent GPU ray tracers already achieve performance competitive with that of their CPU counterparts. Nevertheless, these systems cannot yet fully exploit the capabilities of modern GPUs and can only handle medium-sized, static scenes.
In this paper we present a BVH-based GPU ray tracer with a parallel packet traversal algorithm using a shared stack. We also present a fast, CPU-based BVH construction algorithm which very accurately approximates the surface area heuristic using streamed binning while still being one order of magnitude faster than previously published results. Furthermore, using a BVH allows us to push the size limit of supported scenes on the GPU: We can now ray trace the 12.7 million triangle Power Plant at 1024×1024 image resolution at 3 fps, including shading and shadows.
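
To illustrate the general idea behind approximating the surface area heuristic (SAH) with binning, the following is a minimal Python sketch for a single axis. It is not the streamed construction algorithm presented in this paper: the names `AABB` and `binned_sah_split`, the default of 16 bins, and the simplified cost (primitive count times surface area per side, without traversal constants or normalization by the parent area) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AABB:
    """Axis-aligned bounding box (hypothetical helper type)."""
    lo: tuple
    hi: tuple

    @staticmethod
    def empty():
        inf = float("inf")
        return AABB((inf, inf, inf), (-inf, -inf, -inf))

    def union(self, other):
        return AABB(tuple(min(a, b) for a, b in zip(self.lo, other.lo)),
                    tuple(max(a, b) for a, b in zip(self.hi, other.hi)))

    def area(self):
        d = [h - l for l, h in zip(self.lo, self.hi)]
        if any(x < 0 for x in d):
            return 0.0  # empty box
        return 2.0 * (d[0] * d[1] + d[1] * d[2] + d[2] * d[0])


def binned_sah_split(boxes, axis, num_bins=16):
    """Return (cost, split_position) of the cheapest bin boundary on `axis`,
    or None if all centroids coincide. Illustrative sketch only."""
    cents = [0.5 * (b.lo[axis] + b.hi[axis]) for b in boxes]
    cmin, cmax = min(cents), max(cents)
    if cmax <= cmin:
        return None
    scale = num_bins / (cmax - cmin)

    # One pass over the primitives: accumulate a count and bounds per bin.
    counts = [0] * num_bins
    bounds = [AABB.empty() for _ in range(num_bins)]
    for box, c in zip(boxes, cents):
        i = min(int((c - cmin) * scale), num_bins - 1)
        counts[i] += 1
        bounds[i] = bounds[i].union(box)

    # Left-to-right sweep: area and count of bins 0..i.
    left_area = [0.0] * (num_bins - 1)
    left_count = [0] * (num_bins - 1)
    acc, n = AABB.empty(), 0
    for i in range(num_bins - 1):
        acc = acc.union(bounds[i])
        n += counts[i]
        left_area[i], left_count[i] = acc.area(), n

    # Right-to-left sweep: evaluate the cost at every bin boundary.
    best = None
    acc, n = AABB.empty(), 0
    for i in range(num_bins - 1, 0, -1):
        acc = acc.union(bounds[i])
        n += counts[i]
        if left_count[i - 1] == 0 or n == 0:
            continue  # skip splits that leave one side empty
        cost = left_count[i - 1] * left_area[i - 1] + n * acc.area()
        if best is None or cost < best[0]:
            best = (cost, cmin + i / scale)
    return best
```

Because the primitives are touched only once to fill the bins, the cost of evaluating all candidate planes depends on the number of bins rather than on the number of primitives. A full builder would repeat this for all three axes and recurse on the two resulting partitions.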