High-Quality Volume Rendering Using Hardware

Frank Dachille, Kevin Kreeger, Baoquan Chen, Ingmar Bitter and Arie Kaufman

Center for Visual Computing (CVC)
and Department of Computer Science
State University of New York at Stony Brook
Stony Brook, NY 11794-4400, USA

Abstract

We present a method for volume rendering of regular grids which takes advantage of 3D texture mapping hardware currently available on graphics workstations. Our method produces accurate shading for arbitrary and dynamically changing directional lights, viewing parameters, and transfer functions. This is achieved by interpolating the data values and gradients in hardware before classification and shading in software. The method works equally well for parallel and perspective projections. We present two approaches for our method: one which takes advantage of software ray casting optimizations and another which takes advantage of hardware blending acceleration.

CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture; I.3.3 [Computer Graphics]: Picture/Image Generation; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Color, shading, shadowing, and texture

Keywords: volume rendering, shading, ray casting, texture mapping, solid texture, hardware acceleration, parallel rendering

Authors' e-mail: fdachille|kkreeger|bao [email protected]; http://www.cvc.sunysb.edu

1 Introduction

Volumetric data is pervasive in many areas such as medical diagnosis, geophysical analysis, and computational fluid dynamics. Visualization by interactive, high-quality volume rendering enhances the usefulness of this data. To date, many volume rendering methods have been proposed on general and special purpose hardware, but most fail to achieve reasonable cost-performance ratios. We propose a high-quality volume rendering method suitable for implementation on machines with 3D texture mapping hardware.

Akeley [1] first mentioned the possibility of accelerating volume rendering using 3D texture mapping hardware, specifically on the SGI RealityEngine. The method is to store the volume as a solid texture on the graphics hardware, then to sample the texture using planes parallel to the image plane and composite them into the frame buffer using the blending hardware. This approach considers only ambient light and quickly produces unshaded images. The images could be improved by volumetric shading, which implements a full lighting equation for each volume sample.

Cabral et al. [3] rendered 512×512×64 volumes into a 512×512 window (presumably with 64 sampling planes) in 0.1 seconds on a four Raster Manager SGI RealityEngine Onyx with one 150 MHz CPU. Cullip and Neumann [4] also produced 512×512 images on the SGI RealityEngine (again presumably 64 sampling planes, since the volume is 128×128×64) in 0.1 seconds. All of these approaches keep time-critical computations inside the graphics pipeline at the expense of volumetric shading and image quality.

Van Gelder and Kim [6] proposed a method by which volumetric shading could be incorporated at the expense of interactivity. Their shaded renderings of 256×256×113 volumes into 600² images with 1000 samples along each ray took 13.4 seconds. Their method is slower than Cullip and Neumann's and Cabral et al.'s because they must re-shade the volume and reload the texture for every frame, since the colors in the texture memory are view dependent.

Cullip and Neumann also described a method utilizing the PixelFlow machine which pre-computes the x, y and z gradient components and uses the texture mapping to interpolate the density data and the three gradient components. The latter is implemented partially in hardware and partially in software on the 128² SIMD processors [5]. All four of these values are used to compute Phong shaded samples which are composited in the frame buffer. They predicted that a 256³ volume could be rendered at over 10 Hz into a 640×512 image with 400 sample planes. Although this is the first proposed solution to implement full Phong lighting functionality, it has never been realized as far as we know because it would require 43 processor cards, a number which can not easily fit into a standard workstation chassis [4].

Sommer et al. [13] described a method to render 128³ volumes at 400² resolution with 128 samples per ray in 2.71 seconds. They employ a full lighting equation by computing a smooth gradient from a second copy of the volume stored in main memory. Therefore, they do not have to reload the texture when viewing parameters change. However, this rendering rate is for isosurface extraction; if translucent projections are required, it takes 33.2 seconds for the same rendering. They were the first to propose to resample the texture volume in planes parallel to a row of image pixels so that a whole ray was in main memory at one time. They also mention the potential to interpolate gradients with the hardware.

Figure 1: Three architectures for texture map based volume rendering: (a) our architecture, (b) traditional architecture of Van Gelder and Kim, and (c) ideal architecture of Van Gelder and Kim. The thick lines are the operations which must be performed for every frame. [Diagram: in each pipeline, data and gradients in texture memory pass through classification and a 3-4 parameter map-to-colors LUT into the frame buffer; the LUT is a software texture LUT with the CPU in (a) and a hardware texture LUT in (c).]
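The plane-compositing step that all of these texture-mapping methods share (blend each image-parallel slice into the frame buffer with the "over" operator, back to front) can be modeled per pixel in a few lines of Python. This is a minimal sketch for illustration only; the function names are ours, and the real work happens in the blending hardware:

```python
def over(dst_rgb, dst_a, src_rgb, src_a):
    """Blend a nearer slice sample (src) over the accumulated pixel (dst)."""
    out_rgb = tuple(s * src_a + d * (1.0 - src_a) for s, d in zip(src_rgb, dst_rgb))
    out_a = src_a + dst_a * (1.0 - src_a)
    return out_rgb, out_a

def composite_slices(pixel_samples):
    """pixel_samples: per-pixel (rgb, alpha) from each textured plane, far to near."""
    rgb, alpha = (0.0, 0.0, 0.0), 0.0
    for src_rgb, src_a in pixel_samples:   # back-to-front, as the blend hardware does
        rgb, alpha = over(rgb, alpha, src_rgb, src_a)
    return rgb, alpha
```

An opaque near slice completely hides whatever was composited behind it, which is the property the early-ray-termination discussion in Sec. 3 exploits.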

All of these texture map based methods either non-interactively recompute direction-dependent shading each time any of the viewing parameters change, compute only direction-independent shading, or compute no shading at all. Our method shades every visible sample with view-dependent lighting at interactive rates.

We do not adapt the ray casting algorithm to fit within the existing graphics pipeline, which would compromise the image quality. Instead, we only utilize the hardware where it provides run time advantages, but maintain the integrity of the ray casting algorithm. For the portions of the volume rendering pipeline which can not be performed in graphics hardware (specifically shading) we use the CPU.

In volume rendering by ray casting, data values and gradients are estimated at evenly spaced intervals along rays emanating from pixels of the final image. Resampling these data values and gradients is often the most time consuming task in software implementations. The texture mapping hardware on high-end graphics workstations is designed to perform resampling of solid textures with very high throughput. We leverage this capability to implement high throughput density and gradient resampling.

Shading is the missing key in conventional texture map based volume rendering. This is one of the reasons that pure graphics hardware methods suffer from lower image quality than software implementations of ray casting. For high-quality images, our method implements full Phong shading using the estimated surface normal (gradient) of the density. We pre-compute the estimated gradient of the density and store it in texture memory. We also pre-compute a lookup table (LUT) to store the effect of an arbitrary number of light sources using full Phong shading.

The final step in volume rendering is the compositing, or blending, of the color samples along each ray into a final image color. Most graphics systems have a frame buffer with an opacity channel and efficient blending hardware which can be used for back-to-front compositing. In the next section we present our architecture, in Sec. 3 we present our rendering optimization techniques, in Sec. 4 we compare our method to existing methods, in Sec. 5 we present our parallel implementation, and finally in Sec. 6 we give our results and draw conclusions.

2 Architectural Overview

Fig. 1a shows our architecture, in which density and gradients are loaded into the texture memory once and resampled by the texture hardware along rays cast through the volume. The sample data for each ray (or slice) is then transferred to a buffer in main memory and shaded by the CPU. The shaded samples along a ray are composited and the final pixels are moved to the frame buffer for display. Alternatively, within the same architecture, the shaded samples can be composited by the frame buffer.

Fig. 1b shows the architecture that is traditionally used in texture map based shaded volume rendering. One of the disadvantages of this architecture is that the volume must be re-shaded and re-loaded every time any of the viewing parameters changes. Another problem with this method is that RGB values are interpolated by the texture hardware. Therefore, when non-linear mappings from density to RGB are used, the interpolated samples are incorrect. We present a more detailed comparison of the various methods in Sec. 4.

In Fig. 1c, Van Gelder and Kim's [6] ideal architecture is presented. In this architecture, the raw density and volume gradients are loaded into the texture memory one time only. The density and gradients are then interpolated by the texture hardware and passed to a post-texturing LUT. The density values and gradients are used as an index into the LUT to get the RGBα values for each sample. The LUT is based on the current view direction and can be created using any lighting model desired (e.g., Phong) for any level of desired image quality. This method solves the problems of the current architecture, including pre-shading the volume and interpolating RGB values. However, a post-texturing LUT would need to be indexed by the local gradient, which would require an infeasibly large LUT (see Sec. 2.2).

2.1 Sampling

Ray casting is an image-order algorithm, which has the drawback of multiple access of voxel data, since sampling within the dataset usually requires the access of eight or more neighboring data points [2, 11]. Ray casting using texture mapping hardware avoids the multiple voxel accesses by using the hardware to perform the resampling.

Graphics pipelines work on primitives, or geometric shapes defined by vertices. Traditionally, volume rendering has been achieved on texturing hardware systems by orienting polygons parallel to the image plane and then compositing these planes into the frame buffer, as in Fig. 2.

Because of the way that the texture hardware interpolates values, the size of the original volume does not adversely affect the rendering speed of texture map based volume renderers. Instead, the image size and number of samples along each ray dictate how many texture map resamplings are computed. This is true as long as the volume data fits in the texture memory of the graphics system. A typical high-end graphics system is equipped with 64 MBytes of texture memory, which holds volumes up to 256³ with 32 bits per voxel. Newer hardware supports fast paging between main and texture memory for higher virtual texture memory than is physically available [12, 8].

Figure 2: Polygon primitives for texture based volume rendering when the final image is oriented parallel to one of the faces of the volume. [Diagram: polygon slices parallel to the final image plane are resampled through the volume.]

2.2 Shading Options

The 3-4 parameter LUT present in all three architectures in Fig. 1 is used to optimize the computation of the lighting equation for shading of samples. The LUT summarizes the contribution of ambient, diffuse, and specular shading for every gradient direction in the LUT.

We present alternatives to compute the shading of the resampled points along the rays. Van Gelder and Kim implied that a 3-4 parameter LUT within the graphics pipeline could be used. Even if there were only four viewing parameters to consider, four 8-bit indices into a LUT would mean 256⁴ = 4 Gigaentries in the table. Since this is an RGBα table it would consume 16 GBytes of memory. Furthermore, it would require 4 Gigacalculations to compute the LUT. If the same calculations were used on the resampled data, then a 400×400×256 projection of a volume could be shaded with 40 Megacalculations, or two orders of magnitude less than computing the LUT. If the table is to be indexed by only four parameters (G_x, G_y, G_z, density value), then the table would need to be recomputed every time any light or viewing parameter changed, or every frame in the usual case. Trade-offs could occur to also use eye and light position as indices, but the table is already much too large. Reducing the precision brings the table down to a much more manageable size; however, that deteriorates the image quality. Fig. 3a shows a sphere generated with an 8-bit fixed-point Phong calculation and Fig. 3b with a 4-index Phong LUT with 5 bits per index and 8-bit values. Five bits is about the largest that can be considered for a manageable lookup table, since 32⁴ × 4 Bytes = 4 MBytes.

Fortunately, with the Phong lighting model it is possible to reduce the size of the LUT by first normalizing the gradient and using a reflectance map [14]. With this method, the Phong shading contribution for 6n² surface normals is computed. They are organized as six n² tables that map to the six sides of a cube, with each side divided into n² equal patches. Each sample gradient vector G_{x,y,z} is normalized by its maximum component to form G_{u,v,index}, where index enumerates the six major directions. A direct lookup returns RGB intensities which are modulated with the object color to form the shaded sample intensity.

Trade-offs in image quality and frame rate occur with the choice of shading implementation. We have chosen to implement reflectance map shading because it delivers good image quality with fast LUT creation and simple lookup.

Figure 3: Sphere rendered using (a) 8-bit fixed-point Phong shading calculations, and (b) a 5-bit, 4-index Phong LUT.

2.3 Pre-computation of Volume Gradients

To be able to compute accurately shaded volumes we pre-compute the G_x, G_y and G_z central difference gradient values at each voxel position. Our voxel data type is then four 8-bit values which we load into an RGBα type texture map, although the fields are really three gradient values and raw density. These gradient values are then interpolated along with the raw density values to the sample positions by the 3D texture mapping hardware. Assuming a piecewise linear gradient function, this method produces the same gradient values at the sample locations as if gradients themselves were computed at unit voxel distances from the sample point. The gradient computation needs to occur only once for any volume of data being rendered, regardless of changes in the viewing parameters. Since the gradients are processed off-line, we have chosen to compute high-quality Sobel gradients at the expense of speed. Computing gradients serially off-line for a 256³ volume takes 12 seconds on a 200 MHz CPU.

3 Rendering Optimization Techniques

Software ray casting algorithms enjoy speedup advantages from several different optimization techniques; we consider two of them. The first is space leaping, or skipping over areas of insignificant opacity. In this technique, the opacity of a given sample or area of samples is checked before any shading computations are performed. If the opacity is under some threshold, the shading and compositing calculations are skipped because the samples minimally contribute to the final pixel color for that ray.

A second optimization technique employed by software ray casting is the so-called early ray termination. In this technique, only possible in front-to-back traversal, the sampling, shading and compositing operations are terminated once the ray reaches full opacity. In other words, the ray has reached the point where everything behind it is obscured by other objects closer to the viewer.

Figure 4: Polygon primitives for texture based volume rendering with rows of rays using the Planes method. [Diagram: the resampling polygon is perpendicular to the final image plane and contains all rays for one row of final image pixels.]

  Load texture rotation matrix
  Resample first plane into frame buffer
  Read first plane from frame buffer into memory
  loop over all remaining scanlines
      Resample next plane into frame buffer
      loop over all columns in previous plane
          Initialize integration variables
          while within bounds and ray not opaque
              Lookup opacity in transfer function
              if sample opacity > threshold
                  Lookup shade in reflectance map
                  Composite ray color OVER sample
              end if
          end while
          Store ray color in previous row of image
      end loop
      Read next plane from frame buffer into memory
  end loop
  Shade and composite final plane as above

Algorithm 1: Planes method for texture map based volume rendering
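The per-ray inner loop of Algorithm 1 (front-to-back compositing with low-opacity skipping and early ray termination) might look like the following Python sketch. The names and the two thresholds are illustrative, not taken from the paper; the transfer function and shading function are passed in to stand for the LUT and reflectance-map lookups:

```python
def cast_ray(samples, opacity_tf, shade, skip_threshold=1.0 / 255.0, opaque=254.0 / 255.0):
    """Front-to-back compositing of one ray of resampled samples.

    samples:    iterable of (density, gradient) produced by the texture hardware
    opacity_tf: transfer function mapping density -> opacity in [0, 1]
    shade:      e.g. a reflectance-map lookup returning an RGB triple
    """
    rgb, alpha = [0.0, 0.0, 0.0], 0.0
    for density, gradient in samples:
        a = opacity_tf(density)
        if a <= skip_threshold:        # skip low-opacity samples: no shading work
            continue
        s = shade(density, gradient)   # Phong shade via reflectance map
        w = (1.0 - alpha) * a          # front-to-back "over" weight
        for i in range(3):
            rgb[i] += w * s[i]
        alpha += w
        if alpha >= opaque:            # early ray termination
            break
    return tuple(rgb), alpha
```

Both optimizations only elide CPU work (shading and compositing); the hardware has already resampled every sample on the plane, which is why the text above notes that interpolation cost cannot be eliminated this way.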

Since we are using the hardware to perform all interpolations, we can only eliminate shading and compositing operations. However, these operations typically dominate the rendering time. Below, we propose two methods to apply these optimization techniques to speed up the computation of accurately shaded volume rendering utilizing texture mapping hardware.

3.1 Planes: Compositing on the CPU

Previous texture map based volume rendering methods resample a volume by dispatching polygons down the graphics pipeline parallel to the image plane. The textured polygons are then blended into the frame buffer without ever leaving the graphics hardware [1, 7, 3, 4, 6]. Fig. 2 shows the common polygon resampling direction.

In contrast, since we propose to shade the samples in the CPU and take advantage of the two optimization techniques discussed earlier, we wish to have all the samples for a ray in the main memory at one time. For this reason we have chosen an alternative order for accessing the resampling functionality of the 3D texture map hardware. Polygons are forwarded to the graphics pipeline oriented in such a way that they are coplanar with the rays that would end up being a row of pixels in the final image plane. Fig. 4 shows the polygon orientation for this method.

Once the data has been loaded back into the main memory, the raw density value and three gradient components are extracted and used in a reflectance map computation to generate the Phong shaded RGBα for each sample. The samples are composited front-to-back, taking advantage of early ray termination and skipping over low opacity samples. Similar to the shear-warp approach [10], the composition is now an orthogonal projection with no more resampling. The ray composition section is therefore computed as quickly as in the shear-warp approach. In fact, this method can be viewed as resembling the shear-warp method, where we let the texture mapping hardware perform the shearing and perspective scaling. Furthermore, our method does not require a final warp since the planes are already resampled into image space. This not only speeds up the total processing over shear-warp, but removes a filtering step and thus results in higher image quality. Algorithm 1 renders a volume using the Planes method.

Notice that we interlace the CPU and graphics hardware computation by initiating the texture mapping calculations for scanline y+1 before doing the shading and compositing on scanline y in the CPU.

Table 1 presents rendering times for various volumes and image sizes. The Translucent Solid is a 64³ volume that is homogeneous and translucent with a 1/255 opacity. This establishes a baseline of how long it takes to process an entire volume. Since there are no transparent voxels, every sample is shaded (i.e., the low opacity skipping optimization is not utilized). Additionally, the rays do not reach full opacity for 191 samples inside this volume, so for most cases in the table, the early ray termination optimization does not take effect. The Translucent Sphere is a radius 32 sphere of 4/255 opacity in the center of a transparent 64³ volume. In this volume, the effect of the low opacity skipping optimization becomes apparent. The Opaque Sphere is the same sphere, but with a uniform opacity of 255/255. This is the first volume to take advantage of early ray termination and the rendering times reflect that. These first three volumes were created as theoretical test cases. The next three MRI and CT scanned volumes are representative of the typical workload of a volume rendering system. All three of these contain areas of translucent "gel" with other features inside or behind the first material encountered. Renderings of the Lobster, MRI Head, Silicon and CT Head datasets on a four processor SGI Onyx2 are shown in Figs. 5, 6, 7 and 8, respectively.

The image sizes cover a broad range; most are included for comparison to other methods (see Sec. 4). The number of samples along each ray is also included because the run time of image-order ray casting is typically proportional to the number of samples computed and not the size of the volume. To show this, we rendered the Opaque Sphere as a 32³ volume in 0.13 seconds, as a 64³ volume in 0.13 seconds, and as a 128³ volume also in 0.13 seconds; for all of these we rendered 100² images with 100 samples per ray using the Planes method.

3.2 Blend: Compositing in the Frame Buffer

When we tested and studied the performance of the system we noticed that, depending on the volume data and transfer function, there was still a substantial amount of time spent in the compositing portion of the algorithm.

  Image Size ×       Translucent   Translucent   Opaque        Lobster     Silicon     MRI Head    CT Head
  Samples per Ray    Solid (64³)   Sphere (64³)  Sphere (64³)  (128²×64)   (128²×32)   (256²×64)   (128²×113)

  128², 84           1.04          0.54          0.19          0.24        0.48        0.36        0.20
  200², VolDepth     1.90          0.99          0.35          0.31        0.52        1.27        0.48
  200², 200          5.55          2.69          0.73          1.22        2.21        1.76        0.67
  400², 128          13.19         6.10          1.84          2.78        5.98        4.53        1.84
  512², VolDepth     11.96         6.15          1.94          1.88        3.06        7.99        2.81

Table 1: Rendering rates in seconds for the Planes method

  Load texture rotation matrix
  Resample furthest slice into frame buffer
  Read furthest slice from frame buffer into memory
  loop over all remaining slices back-to-front
      Resample next slice into frame buffer
      loop over all samples in previous slice
          if sample opacity > threshold
              Lookup shade in reflectance map
              Write shade back into buffer
          else
              Write clear back into buffer
          end if
      end loop
      Blend slice buffer into frame buffer
      Read next slice from frame buffer into memory
  end loop
  Shade and Blend nearest slice as above

Algorithm 2: Blend method for texture map based volume rendering
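The "Lookup shade in reflectance map" step used by both algorithms follows the six-face indexing scheme of Sec. 2.2: the gradient is normalized by its maximum-magnitude component, which selects one of the six cube faces, and the two remaining components select a patch on that face. A minimal Python sketch of the index computation (function name and patch rounding are our own illustrative choices):

```python
def reflectance_map_index(gx, gy, gz, n):
    """Map a gradient to (face, u, v) patch indices on a 6-face, n-by-n
    reflectance map, as in Sec. 2.2."""
    comps = [gx, gy, gz]
    axis = max(range(3), key=lambda i: abs(comps[i]))   # major direction
    if comps[axis] == 0.0:
        raise ValueError("zero gradient has no direction")
    face = axis * 2 + (0 if comps[axis] > 0.0 else 1)   # enumerate the 6 faces
    m = abs(comps[axis])
    u, v = [c / m for i, c in enumerate(comps) if i != axis]   # now in [-1, 1]
    to_patch = lambda t: min(n - 1, int((t + 1.0) * 0.5 * n))  # [-1, 1] -> [0, n)
    return face, to_patch(u), to_patch(v)
```

A lookup at the returned indices yields RGB intensities that are then modulated with the object color, so rebuilding the 6n² table when the lights or view change is cheap compared to the 4-index LUT discussed earlier.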

Figure 5: CT scanned Lobster dataset with a translucent shell, rendered in 0.26 seconds at 200² resolution (also in the color section).

In fact, we found that the number of samples per ray before reaching full opacity and terminating is proportional to the time spent compositing. We propose to composite using the blending hardware of the graphics system by placing the shaded images back into the frame buffer and specifying the over operator. Of course, this requires that we return to using polygons that are parallel to the final image plane, as in Fig. 2. In this method, we can employ the optimization of skipping over low opacity samples by not shading empty samples. However, since the transparency values reside in the frame buffer's α channel and not in main memory, we can not easily tell when a given ray has reached full opacity and can not directly employ early ray termination without reading the frame buffer. Algorithm 2 renders a volume using this Blend method.

Notice that we now resample along slices rather than planes. Also, there are two frame buffers, one for the slices of samples and another for the blending of shaded images. Since compositing is not performed in software, it is quicker than the Planes algorithm. However, because of the added data transfer back to the graphics pipeline for blending into the frame buffer, and the fact that shading is performed for all voxels, this method does not always produce faster rendering rates.

Considering Table 2, the Blend method always produces better rendering rates for the first two columns, due to the fact that here the volumes are "fully translucent". In other words, since the rays never reach full opacity, the early termination optimization that the Planes method typically utilizes is unavailable.

Figure 6: MRI scanned Head dataset showing internal brain structures, rendered in 0.43 seconds at 200² resolution (also in the color section).

  Image Size ×       Translucent   Translucent   Opaque        Lobster     Silicon     MRI Head    CT Head
  Samples per Ray    Solid (64³)   Sphere (64³)  Sphere (64³)  (128²×64)   (128²×32)   (256²×64)   (128²×113)

  128², 84           0.83          0.48          0.48          0.26        0.46        0.35        0.29
  200², VolDepth     1.48          0.88          0.83          0.23        0.38        1.21        0.84
  200², 200          4.64          2.63          2.66          1.31        2.50        1.83        1.49
  400², 128          11.30         5.40          3.19          3.04        5.13        13.49       3.44
  512², VolDepth     9.29          5.46          5.14          1.32        2.36        7.26        4.68

Table 2: Rendering rates in seconds for the Blend method
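When neither optimization fires, the two methods accumulate exactly the same pixel color: back-to-front "over" blending in the frame buffer (Blend) is algebraically equivalent to front-to-back compositing on the CPU (Planes). A small check with scalar colors, written by us for illustration:

```python
def back_to_front(samples):
    """Frame-buffer blending: each (color, alpha) slice sample is blended
    'over' the buffer contents, far slice first (scalar color for brevity)."""
    c, a = 0.0, 0.0
    for cs, as_ in samples:            # far-to-near
        c = cs * as_ + c * (1.0 - as_)
        a = as_ + a * (1.0 - as_)
    return c, a

def front_to_back(samples):
    """CPU compositing as in the Planes method, walking the same slice
    samples from the eye outward."""
    c, a = 0.0, 0.0
    for cs, as_ in reversed(samples):  # near-to-far
        w = (1.0 - a) * as_
        c += w * cs
        a += w
    return c, a
```

The difference is therefore purely one of where the work happens and which optimizations become available, which is what the rate comparison between Tables 1 and 2 measures.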

Since both methods must shade the same number of voxels and composite every sample on every ray, letting the graphics hardware perform this compositing is the quickest. However, for the Opaque Sphere the Planes method is always faster. This is because 78.5% of the rays intersect the sphere, and the optimization from early ray termination is greater than the time gained from not performing compositing. We notice for the three "real" volumes, the Blend method is quicker when the number of samples along each ray is equal to the number of voxels in that dimension of the volume. When the sampling rate is close to the resolution of the image, the excessive slices that must be shaded and returned to the frame buffer again allow the early ray termination optimization in the Planes method to out-perform the Blend method.

In theory, which method would be optimal can be determined from the desired rendering parameters, volume density histogram, and transfer function. For a more opaque volume, the Planes method always produces better rendering rates. For transparent volumes, if there are many slices to render, the Planes method is usually quicker, while if there are few slices the Blend method is the better of the two. Yet in practice, we feel it may prove to be difficult to determine how "few" and "many" are defined. For this reason, we prefer the Planes method, since it is faster for all opaque volumes and for some of the translucent volumes.

Figure 7: Silicon dataset flythrough showing translucent surfaces, rendered in 0.29 seconds at 200² resolution (also in the color section).

4 Comparison to Other Methods

Here we compare the performance of our rendering algorithm to others presented in the literature, in terms of both image quality and rendering rates. The image quality comparisons point out quality trade-offs as they relate to lighting methods. We noticed in our testing on different volumes that the number of samples more accurately determined the run time of the algorithm than simply the volume size. For this reason we have included image sizes and sample counts in our runtime tables (see Tables 1 and 2). We also noticed that the volume data and transfer functions greatly influence rendering rates. For our method this is probably more of an effect because we are utilizing runtime optimization techniques whose performance directly relies on the volume data and transfer functions.

Our Planes method renders the Lobster at a sampling resolution of 512×512×64 in 1.88 seconds. Our method is 19 times slower than the method of Cabral et al. However, their method does not employ directional shading or even view independent diffuse shading. This is a major limitation of their method, since shading cues are highly regarded as essential to visual perception of shape and form. Our implementation with full Phong lighting with ambient, diffuse and specular components produces a much higher quality image.

Figure 8: CT scanned Head dataset showing bone structures, rendered in 0.44 seconds at 200² resolution (also in the color section).

In comparison to Cullip and Neumann [4], our metho d achieves the p o orest sp eedup b ecause it quickly approaches

is again slower. Cullip and Neumann achieve b etter image the limit for raw rendering time imp osed by the sequential

quality than Cabral et al. by computing a gradient co e- texture mapping. On the other end of the sp ectrum, the

cient that is used to simulate di use highlights. This still Translucent Sphere achieves the b est sp eedup p erformance

is not as high an image quality as our full Phong lighting, although it su ers from the slowest rendering rates. This is

and if the light geometry changes with resp ect to the vol- b ecause the CPU b ound raycasting p ortion of the compu-

ume, Cullip and Neumann's texture maps must b e recom- tation is the dominant p ercentage of the sequential time for

puted. Therefore, if the viewing geometry is dynamic, then this dataset. The Lobster dataset is representativeofvol-

our metho d obtains higher quality images, including sp ecu- ume rendering applications and shows results b etween the

lar highlights, at faster rates. two extremes.

Our metho d pro duces an image of Opaque Sphere in 1.84 Given enough pro cessors, any dataset will eventually b e

seconds with the Planes metho d, faster than Sommer et limited by the time to p erform the texture mapping. The

numb er of pro cessors required to reach this limit dep ends

al.'s [13] isosurface rendering. For a translucent rendering of

the Lobster our Planes metho d runs 12 times faster in 2.78 on the time it takes for the CPU p ortion raycasting of

the algorithm to run and the fact that that p ortion relies

seconds. The image quality of b oth metho ds is equivalent

since they b oth compute full lighting e ects. As Sommer heavily on software data dep endant optimizations. The limit

is reached when the numb er of pro cessors is equal to T =T ,

et al. p ointed out, storing the gradients in the texture map

c t

has the disadvantage of limiting the size of the volume that where T is the time to p erform the raycasting for one plane

c

on the CPU and T is the time to texture map one plane in

can b e rendered without texture paging, so our frame rate

t

is limited by the amountofavailable texture memory, like the graphics hardware.

all other texture map based metho ds.

Although Lacroute's shear-warp [9] is not a texture map based approach, we include a comparison, since it is one of the quickest methods for rendering with a full, accurate lighting model on a workstation-class machine. For example, shear-warp produces fully shaded monochrome renderings at a rate of 10 Hz, but this is a parallel version of shear-warp running on a 32-processor SGI Challenge. Lacroute reports that a 128×128×84 volume can be rendered in 0.24 seconds on one processor. Our Planes method renders a 128² image of the Opaque Sphere with 84 samples per ray in 0.19 seconds and the Lobster in 0.24 seconds. Our parallel implementation runs even faster (see Sec. 5). Since shear-warp must generate three copies of a compressed data structure per classification, interactive segmentation is not possible as it is with our method. Shear-warp performs classification before bilinear resampling, whereas our method performs trilinear interpolation followed by classification. Additionally, our method performs arbitrary parallel and perspective projections in the same time, while shear-warp takes up to four times longer for perspective projections.

5 Parallel Implementation

We have parallelized the Planes algorithm on a four processor Onyx2 workstation with InfiniteReality graphics. We constructed a master-slave model for the parallel processing, where the master process implements the texture mapping interface to the graphics hardware and, once a plane of orthogonal rays is resampled by the hardware, the work is farmed out to a slave process for the raycasting. We use the shared memory symmetric multi-processor (SMP) functionality of the Onyx2 and the IRIX 6.4 operating system. The best speedup we can achieve with the parallel version is bounded by the time it takes to perform the texture mapping for all the planes. This is because the texture mapping computation must be performed sequentially, since there is only one graphics pipeline.
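The master-slave structure described above can be sketched as a small producer-consumer program. This is only an illustrative sketch in Python, not the authors' IRIX SMP implementation; the `texture_map` and `raycast` callbacks are hypothetical stand-ins for the hardware plane resampling and the per-plane software raycasting.

```python
import threading
import queue

def render(num_planes, num_slaves, texture_map, raycast):
    """Master-slave pipeline: the master resamples one plane at a time
    (modeling the single, serial graphics pipeline), while slave threads
    consume finished planes and perform the software raycasting step."""
    work = queue.Queue()
    results = [None] * num_planes

    def slave():
        while True:
            item = work.get()
            if item is None:            # sentinel: no more planes
                return
            index, plane = item
            results[index] = raycast(plane)

    slaves = [threading.Thread(target=slave) for _ in range(num_slaves)]
    for t in slaves:
        t.start()
    for index in range(num_planes):     # master: sequential texture mapping
        work.put((index, texture_map(index)))
    for _ in slaves:
        work.put(None)                  # one sentinel per slave
    for t in slaves:
        t.join()
    return results
```

For example, `render(4, 2, lambda i: [i] * 3, sum)` farms four "planes" to two slaves and returns `[0, 3, 6, 9]`; the queue serializes hand-off from the single master just as the one graphics pipeline serializes resampling.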

Figure 9a shows the rendering rates for one to four processors for various volumes. For all cases we rendered 128² images with 84 samples per ray. The time to perform the texture mapping of 128 planes of 128×84 samples is 0.12 seconds, as shown on the graph. As can be seen, the rendering rates approach this theoretical best rendering time. Figure 9b presents speedup curves for the same datasets. The Opaque Sphere dataset is rendered the fastest. However, it also gains the least from additional processors: the achievable speedup depends on the time it takes for the CPU portion (raycasting) of the algorithm to run, and that portion relies heavily on software data-dependent optimizations. The limit is reached when the number of processors is equal to T_c/T_t, where T_c is the time to perform the raycasting for one plane on the CPU and T_t is the time to texture map one plane in the graphics hardware.

Figure 9: (a) Parallel rendering rates for 128×128×84 samples and (b) parallel speedup.
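This speedup limit can be checked with a toy cost model. Writing T_c for the per-plane CPU raycasting time and T_t for the per-plane texture mapping time, and assuming (our assumption, for illustration) that the serial texture mapping of all planes overlaps fully with raycasting spread across p slave processors, the run time is the larger of the two stage times, so processors beyond T_c/T_t buy nothing:

```python
def max_useful_processors(t_c, t_t):
    """Slaves beyond T_c / T_t cannot help: past that point the serial
    texture mapping stage dominates the total run time."""
    return t_c / t_t

def pipeline_time(num_planes, p, t_c, t_t):
    """Idealized run time when serial texture mapping overlaps with
    raycasting spread over p slave processors."""
    return max(num_planes * t_t, num_planes * t_c / p)
```

With the measured 0.12 seconds for texture mapping 128 planes (T_t ≈ 0.94 ms) and a hypothetical T_c = 3·T_t, more than three slaves cannot improve on the 0.12-second floor, matching the flattening of the speedup curves.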

6 Results and Conclusions

We have presented a method for high-quality rendering of volumetric datasets which utilizes the 3D texture mapping hardware currently available in graphics workstations. The method produces images whose quality is not only comparable to that of accurate software ray casting, the highest quality method currently available, but also achieved at a substantially faster frame rate than that of software ray casting. Other methods achieve higher frame rates than ours, but either lack shading, lack directional shading, or require multiple processors.

Our method is accelerated by multiple processors, although the speedup is limited by the throughput of the serial graphics pipeline. Although shear-warp achieves higher rendering rates on multiprocessor machines, our method is faster on typical graphics workstations with 3D texture mapping and also supports interactive classification.

7 Acknowledgements

This work has been supported by the National Science Foundation under grant MIP9527694, Office of Naval Research under grant N000149710402, Mitsubishi Electric Research Lab, Japan Radio Corp., Hewlett-Packard, and Intel Corp. The Lobster dataset is courtesy of AVS, Inc. The Silicon dataset is courtesy of Oak Ridge National Lab. The MRI Head dataset is courtesy of Siemens. The CT Head is from the UNC database.

References

[1] K. Akeley. RealityEngine Graphics. In Computer Graphics, SIGGRAPH '93, Anaheim, CA, August 1993. ACM.

[2] R. Avila, L. Sobierajski, and A. Kaufman. Towards a Comprehensive Volume Visualization System. In Proceedings of Visualization '92, Boston, MA, October 1992. IEEE.

[3] B. Cabral, N. Cam, and J. Foran. Accelerated Volume Rendering and Tomographic Reconstruction Using Texture Mapping Hardware. In Symposium on Volume Visualization, pages 91-98, Washington D.C., October 1994. ACM.

[4] T. J. Cullip and U. Neumann. Accelerating Volume Reconstruction with 3D Texture Hardware. Technical Report TR93-027, University of North Carolina, Chapel Hill, 1993.

[5] J. Eyles, S. Molnar, J. Poulton, T. Greer, A. Lastra, N. England, and L. Westover. PixelFlow: The Realization. In Proceedings of the 1997 SIGGRAPH/Eurographics Workshop on Graphics Hardware, pages 57-68, Los Angeles, CA, August 1997. Eurographics.

[6] A. Van Gelder and K. Kim. Direct Volume Rendering with Shading via Three-Dimensional Textures. In Symposium on Volume Visualization, pages 23-30, San Francisco, CA, October 1996. ACM.

[7] S.-Y. Guan and R. Lipes. Innovative volume rendering using 3D texture mapping. In Image Capture, Formatting and Display, Newport Beach, CA, February 1994. SPIE.

[8] M. J. Kilgard. Realizing OpenGL: Two Implementations of One Architecture. In Proceedings of the 1997 SIGGRAPH/Eurographics Workshop on Graphics Hardware, pages 45-56, Los Angeles, CA, August 1997. Eurographics.

[9] P. Lacroute. Analysis of a Parallel Volume Rendering System Based on the Shear-Warp Factorization. IEEE Transactions on Visualization and Computer Graphics, 2(3):218-231, September 1996.

[10] P. Lacroute and M. Levoy. Fast Volume Rendering using a Shear-warp Factorization of the Viewing Transform. In Computer Graphics, SIGGRAPH '94, pages 451-457, Orlando, FL, July 1994. ACM.

[11] M. Levoy. Display of Surfaces from Volume Data. IEEE Computer Graphics and Applications, 8(5):29-37, May 1988.

[12] J. S. Montrym, D. R. Baum, D. L. Dignam, and C. J. Migdal. InfiniteReality: A Real-Time Graphics System. In Computer Graphics, SIGGRAPH '97, pages 293-302, Los Angeles, CA, August 1997. ACM.

[13] O. Sommer, A. Dietz, R. Westermann, and T. Ertl. Tivor: An Interactive Visualization and Navigation Tool for Medical Volume Data. In The Sixth International Conference in Central Europe on Computer Graphics and Visualization '98, February 1998.

[14] J. van Scheltinga, J. Smit, and M. Bosma. Design of an On-Chip Reflectance Map. In Proceedings of the 10th Eurographics Workshop on Graphics Hardware '95, pages 51-55, Maastricht, The Netherlands, August 1995. Eurographics.