Ultra-fast 3D Filtered Backprojection on Commodity Graphics Hardware

Fang Xu, Klaus Mueller

Center for Visual Computing, Computer Science Department, Stony Brook University

Feldkamp Algorithm

for each projection proj_k
    // perform filtering
    weight by a/b
    ramp-filter each column (y_d direction)
    // perform backprojection
    for each grid voxel v_j
        project v_j onto image along rays
        interpolate voxel update dv_j
        weight dv_j by depth factor c_j: dv_j = dv_j · c_j
        add result to grid voxel: v_j = v_j + dv_j

$c_j = \frac{a^2}{\left(a + \sqrt{v_{jy}^2 + v_{jz}^2}\,\cos(\varphi - \varphi_k)\right)^2}$

[Figure: cone-beam geometry. Labels: axes X, Y, Z; detector coordinates x_d, y_d; voxel v_j; angle φ; distances a and b]
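The depth-weighting step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' GPU code. It assumes the depth factor has the form c_j = a² / (a + r·cos(φ − φ_k))², where r and φ are the polar coordinates of the voxel in the (y, z) plane and a is the source-to-center distance. The function names `depth_weight` and `update_voxel` are hypothetical.

```python
import math

def depth_weight(v_jy, v_jz, phi_k, a):
    """Depth factor c_j for one voxel at projection angle phi_k (radians).

    Assumptions: a is the source-to-rotation-center distance, and the voxel
    position is given by its (y, z) coordinates relative to that center.
    """
    r = math.hypot(v_jy, v_jz)        # distance of the voxel from the center
    phi = math.atan2(v_jz, v_jy)      # polar angle of the voxel
    depth = a + r * math.cos(phi - phi_k)
    return (a * a) / (depth * depth)

def update_voxel(v_j, dv_j, c_j):
    """Weight the interpolated update by c_j and add it to the voxel."""
    return v_j + dv_j * c_j

if __name__ == "__main__":
    a = 2.0
    print(depth_weight(0.0, 0.0, 0.0, a))   # voxel at the center: 1.0
    print(depth_weight(2.0, 0.0, 0.0, a))   # depth 4: 0.25
    print(update_voxel(1.0, 0.5, 0.25))     # 1.125
```

Under these assumptions a voxel at the rotation center always gets weight 1, and the weight falls off with the square of the depth.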

[Figure: rendering stages from 3D World Scene to 3D View Scene. Labels: Geometry Processing, Polygon Rasterization, Texture Mapping]

Graphics Pipeline

[Figure: graphics pipeline diagram. Labels: Vertex Data; Geometry Processing (Fixed Function Transform & Lighting; Programmable Vertex Shader); Triangle Setup; Rendering (Fixed Function Texturing & Filtering; Programmable Pixel Shader); Blending; 8-bit; 32-bit; Fog Blending; Visibility Testing; Clipping, Viewport Transform; Frame Buffer]

GPU as a Computation Platform

• Architecture: Single Instruction/Multiple Data (SIMD) + RGBA
• Vector or : Textures
• Extended programmability: Vertex and pixel shaders
• Up to 16- or 32-bit floating point precision
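Precision matters for backprojection because each voxel accumulates one small update per projection. The sketch below is a plain-Python illustration, not GPU code. It assumes an 8-bit channel stores values in [0, 1] at 256 levels and rounds after every accumulation step. The update size of 0.001 and the helper names are hypothetical.

```python
def quantize_8bit(x):
    """Clamp x to [0, 1] and round to the nearest of 256 levels."""
    x = min(max(x, 0.0), 1.0)
    return round(x * 255) / 255

def accumulate(contributions, quantize=None):
    """Sum contributions, optionally quantizing the running total each step."""
    total = 0.0
    for c in contributions:
        total += c
        if quantize is not None:
            total = quantize(total)
    return total

if __name__ == "__main__":
    # 160 projections, each adding an update smaller than half an 8-bit step.
    updates = [0.001] * 160
    print(f"{accumulate(updates):.3f}")        # floating point: 0.160
    print(accumulate(updates, quantize_8bit))  # 8-bit: 0.0, every update is lost
```

In this setup each update is below half of one 8-bit step (1/510), so the quantized total never leaves zero, while the floating point total reaches about 0.16.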

             Fixed Pipeline    Programmable Pipeline
Precision    8 bits            32 bits
Speed        Very Fast         Fast

Cutting Edge Graphics Hardware

                            GeForce FX 5900               GeForce FX 6800
Chip Technology             256-bit/130M transistors      256-bit/222M transistors
Processor                   0.13 Micron                   0.13 Micron
Memory Bus                  256-bit GDDR                  256-bit GDDR3
Memory Bandwidth            27.2 GB/s                     35.2 GB/s
Texel Fillrate              3.6 Gigatexels/s              Unknown
Triangle Transform Rate     315 M Triangles/s             600 M Triangles/s
Maximum Memory              256MB                         512MB
GPU clock                   450 MHz                       400 MHz
Pixel Pipelines             4 for Color&Z, 8 for Texture  16 for Color&Z, 32 for Texture
Textures per Texture Units  16                            Unknown
Internal Precision          32-bit Floating Point         32-bit Floating Point

(178M transistors on Pentium 4)

Backprojection via Projective Texture (Side View)

[Figure: side view. Labels: Texture Stack 1, Texture Stack 2]

Backprojection via Projective Texture (Top View)

[Figure: top view. Labels: texture-mapped polygons (volume slices); cone angle γ; rotation angle φ; image plane (detector); axes X and Z]

Texture slices are mapped onto the image plane and values are accumulated in the framebuffer.

Backprojection via Texture Spreading (Side View)

[Figure: side view. Labels: Source, Volume, Slice K, Relevant Pixels for Slice K, Image Plane]

Backprojection via Texture Spreading (Top View)

[Figure: top view. Parallel beam: a row of backprojected image. Cone beam: 3 rows of backprojected image = a volume slice. Label: Frame Buffer]

Texture Spreading Pipeline
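One way to read the parallel-beam case is that a single detector row is smeared across one volume slice along the ray direction, and the slice accumulates one such smear per view. The sketch below illustrates only that reading in plain Python. It assumes unit pixel spacing, a detector row centered on the rotation axis, linear interpolation, and zero outside the row. The function name `spread_row` is hypothetical, and this is not the authors' GPU pipeline.

```python
import math

def spread_row(row, size, phi):
    """Smear a 1D detector row across a size x size slice for view angle phi."""
    n = len(row)
    half = (size - 1) / 2.0      # slice center in pixel units
    center = (n - 1) / 2.0       # detector center in pixel units
    out = [[0.0] * size for _ in range(size)]
    for iz in range(size):
        for ix in range(size):
            x = ix - half
            z = iz - half
            # Detector coordinate hit by the parallel ray through (x, z).
            t = x * math.cos(phi) + z * math.sin(phi) + center
            i0 = math.floor(t)
            frac = t - i0
            # Linear interpolation; samples outside the row count as zero.
            lo = row[i0] if 0 <= i0 < n else 0.0
            hi = row[i0 + 1] if 0 <= i0 + 1 < n else 0.0
            out[iz][ix] = (1.0 - frac) * lo + frac * hi
    return out

if __name__ == "__main__":
    for line in spread_row([1.0, 2.0, 3.0], 3, 0.0):
        print(line)   # at phi = 0 every slice row repeats the detector row
```

Summing the returned slices over all view angles gives the accumulated slice. A cone-beam version would need several detector rows per slice and the depth weighting, which this sketch leaves out.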

[Figure: texture spreading pipeline. Labels: 2D Texture Tiles, 8-bit Rasterizers, Floating Point ALU]

Timings

CPU: Pentium 2.66 GHz, 512MB RAM
GPU: NVIDIA FX 5900, 256MB DDR
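The parenthesized values in the timing table that follows are consistent with speedup factors over the software implementation. A quick check in Python, assuming "1 min" means 60 s and "2 min" means 120 s (the dictionary layout and the `speedup` helper are illustrative only):

```python
def speedup(reference_s, accelerated_s):
    """Ratio of reference run time to accelerated run time."""
    return reference_s / accelerated_s

# Run times in seconds, transcribed from the timing table.
software = {"128^3": 60.0, "256^3": 16 * 60.0}
gpu = {
    "GPU, full floating point": {"128^3": 8.0, "256^3": 2 * 60.0},
    "GPU, 8-bit projective texture": {"128^3": 1.6, "256^3": 25.0},
    "GPU, 8-bit texture spreading": {"128^3": 1.6, "256^3": 25.0},
}

if __name__ == "__main__":
    for name, times in gpu.items():
        for volume, seconds in times.items():
            print(f"{name}, {volume}: {speedup(software[volume], seconds):.1f}x")
```

For the 128³ volume this gives 7.5x and 37.5x, which matches the table's (7.5) and (37) if the latter is truncated. The 256³ timings work out to 8.0x and 38.4x.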

Implementation                    Time, Volume = 128³,          Time, Volume = 256³,
                                  160 projections               160 projections
                                  (speedup over software)
Software – Full Floating Point    1 min                         16 min
GPU – Full Floating Point         8 s (7.5)                     2 min
GPU – 8-bit Projective Texture    1.6 s (37)                    25 s
GPU – 8-bit Texture Spreading     1.6 s (37)                    25 s

Results

[Figure panels: GPU-32, 0.5% contrast, 128³, 80 projections; GPU-32, 0.5% contrast, 128³, 160 projections]

[Figure panels: Original 128³ Shepp-Logan Phantom; GPU-8, 0.5% contrast, 128³, 160 projections; GPU-8, 0.5% contrast, 256³, 160 projections]

Reference

[1] B. Cabral, N. Cam, and J. Foran, “Accelerated volume rendering and tomographic reconstruction using hardware,” Symposium on Volume Visualization, pp. 91-98, 1994.
[2] L.A. Feldkamp, L.C. Davis, and J.W. Kress, “Practical cone beam algorithm,” J. Opt. Soc. Am., pp. 612-619, 1984.
[3] K. Chidlow and T. Möller, “Rapid emission volume reconstruction,” Proc. Volume Graphics Workshop 2003, pp. 15-26.
[4] K. Mueller and R. Yagel, “Rapid 3D cone-beam reconstruction with the Algebraic Reconstruction Technique (ART) by using texture mapping hardware,” IEEE Trans. on Medical Imaging, vol. 19, no. 12, pp. 1227-1237, 2000.