High-Quality Volume Rendering Using Texture Mapping Hardware

Frank Dachille, Kevin Kreeger, Baoquan Chen, Ingmar Bitter and Arie Kaufman

Center for Visual Computing (CVC)
and Department of Computer Science
State University of New York at Stony Brook
Stony Brook, NY 11794-4400, USA
{dachille|kkreeger|baoquan|ingmar|ari}@cs.sunysb.edu
http://www.cvc.sunysb.edu

Abstract

We present a method for volume rendering of regular grids which takes advantage of 3D texture mapping hardware currently available on graphics workstations. Our method produces accurate shading for arbitrary and dynamically changing directional lights, viewing parameters, and transfer functions. This is achieved by hardware interpolating the data values and gradients before software classification and shading. The method works equally well for parallel and perspective projections. We present two approaches for our method: one which takes advantage of software ray casting optimizations and another which takes advantage of hardware blending acceleration.

CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture; I.3.3 [Computer Graphics]: Picture/Image Generation; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism: Color, shading, shadowing, and texture

Keywords: volume rendering, shading, ray casting, texture mapping, solid texture, hardware acceleration, parallel rendering

1 Introduction

Volumetric data is pervasive in many areas such as medical diagnosis, geophysical analysis, and computational fluid dynamics. Visualization by interactive, high-quality volume rendering enhances the usefulness of this data. To date, many volume rendering methods have been proposed on general and special purpose hardware, but most fail to achieve reasonable cost-performance ratios. We propose a high-quality volume rendering method suitable for implementation on machines with 3D texture mapping hardware.

Akeley [1] first mentioned the possibility of accelerating volume rendering using 3D texture mapping hardware, specifically on the SGI RealityEngine. The method is to store the volume as a solid texture on the graphics hardware, then to sample the texture using planes parallel to the image plane and composite them into the frame buffer using the blending hardware. This approach considers only ambient light and quickly produces unshaded images. The images could be improved by volumetric shading, which implements a full lighting equation for each volume sample.

Cabral et al. [3] rendered 512×512×64 volumes into a 512×512 window (presumably with 64 sampling planes) in 0.1 seconds on a four Raster Manager SGI RealityEngine Onyx with one 150MHz CPU. Cullip and Neumann [4] also produced 512×512 images on the SGI RealityEngine (again presumably 64 sampling planes, since the volume is 128×128×64) in 0.1 seconds. All of these approaches keep time-critical computations inside the graphics pipeline at the expense of volumetric shading and image quality.

Van Gelder and Kim [6] proposed a method by which volumetric shading could be incorporated at the expense of interactivity. Their shaded renderings of 256×256×113 volumes into 600² images with 1000 samples along each ray took 13.4 seconds. Their method is slower than Cullip and Neumann's and Cabral et al.'s because they must re-shade the volume and reload the texture map for every frame, because the colors in the texture memory are view dependent.

Cullip and Neumann also described a method utilizing the PixelFlow machine which pre-computes the x, y and z gradient components and uses the texture mapping to interpolate the density data and the three gradient components. The latter is implemented partially in hardware and partially in software on the 128² SIMD pixel processors [5]. All four of these values are used to compute Phong shaded samples which are composited in the frame buffer. They predicted that a 256³ volume could be rendered at over 10Hz into a 640×512 image with 400 sample planes. Although this is the first proposed solution to implement full Phong lighting functionality, it has never been realized as far as we know because it would require 43 processor cards, a number which can not easily fit into a standard workstation chassis [4].

Figure 1: Three architectures for texture map based volume rendering: (a) our architecture, (b) traditional architecture of Van Gelder and Kim, and (c) ideal architecture of Van Gelder and Kim. The thick lines are the operations which must be performed for every frame.

Sommer et al. [13] described a method to render 128³ volumes at 400² resolution with 128 samples per ray in 2.71 seconds. They employ a full lighting equation by computing a smooth gradient from a second copy of the volume stored in main memory. Therefore, they do not have to reload the texture maps when viewing parameters change. However, this rendering rate is for isosurface extraction; if translucent projections are required, it takes 33.2 seconds for the same rendering. They were the first to propose to resample the texture volume in planes parallel to a row of image pixels so that a whole ray was in main memory at one time. They mention the potential to also interpolate gradients with the hardware. All of these texture map based methods either non-
interactively recompute direction-dependent shading each time any of the viewing parameters change, compute only direction-independent shading, or compute no shading at all. Our method shades every visible sample with view-dependent lighting at interactive rates.

We do not adapt the ray casting algorithm to fit within the existing graphics pipeline, which would compromise the image quality. Instead, we only utilize the hardware where it provides run time advantages, but maintain the integrity of the ray casting algorithm. For the portions of the volume rendering pipeline which can not be performed in graphics hardware (specifically shading) we use the CPU.

In volume rendering by ray casting, data values and gradients are estimated at evenly spaced intervals along rays emanating from pixels of the final image plane. Resampling these data values and gradients is often the most time consuming task in software implementations. The texture mapping hardware on high-end graphics workstations is designed to perform resampling of solid textures with very high throughput. We leverage this capability to implement high throughput density and gradient resampling.

Shading is the missing key in conventional texture map based volume rendering. This is one of the reasons that pure graphics hardware methods suffer from lower image quality than software implementations of ray casting. For high-quality images, our method implements full Phong shading using the estimated surface normal (the gradient of the density). We pre-compute the estimated gradient of the density and store it in texture memory. We also pre-compute a lookup table (LUT) to store the effect of an arbitrary number of light sources using full Phong shading.

The final step in volume rendering is the compositing, or blending, of the color samples along each ray into a final image color. Most graphics systems have a frame buffer with an opacity channel and efficient blending hardware which can be used for back-to-front compositing. In the next section we present our architecture, in Sec. 3 we present our rendering optimization techniques, in Sec. 4 we compare our method to existing methods, in Sec. 5 we present our parallel implementation, and finally in Sec. 6 we give our results and draw conclusions.

2 Architectural Overview

Fig. 1a shows our architecture, in which density and gradients are loaded into the texture memory once and resampled by the texture hardware along rays cast through the volume. The sample data for each ray (or slice) is then transferred to a buffer in main memory and shaded by the CPU. The shaded samples along a ray are composited and the final pixels are moved to the frame buffer for display. Alternatively, within the same architecture, the shaded voxels can be composited by the frame buffer.

Fig. 1b shows the architecture that is traditionally used in texture map based shaded volume rendering. One of the disadvantages of this architecture is that the volume must be re-shaded and re-loaded every time any of the viewing parameters changes. Another problem with this method is that RGBα values are interpolated by the texture hardware. Therefore, when non-linear mappings from density to RGBα are used, the interpolated samples are incorrect. We present a more detailed comparison of the various methods in Sec. 4.

In Fig. 1c, Van Gelder and Kim's [6] Ideal architecture is presented. In this architecture, the raw density and volume gradients are loaded into the texture memory one time only. The density and gradients are then interpolated by the texture hardware and passed to a post-texturing LUT. The density values and gradients are used as an index into the LUT to get the RGBα values for each sample. The LUT is based on the current view direction and can be created using any lighting model desired (e.g., Phong) for any level of desired image quality. This method solves the problems of the current architecture, including pre-shading the volume and interpolating RGBα values. However, a post-texturing LUT would need to be indexed by the local gradient, which would require an infeasibly large LUT (see Sec. 2.2).

2.1 Sampling

Ray casting is an image-order algorithm, which has the drawback of multiple access of voxel data, since sampling within the dataset usually requires the access of eight or more neighboring data points [2, 11]. Ray casting using texture mapping hardware avoids the multiple voxel accesses by using the hardware to perform the resampling.

Graphics pipelines work on primitives, or geometric shapes defined by vertices. Traditionally, volume rendering has been achieved on texturing hardware systems by orienting polygons parallel to the image plane and then compositing these planes into the frame buffer as in Fig. 2.

Because of the way that the texture hardware interpolates values, the size of the original volume does not adversely affect the rendering speed of texture map based volume ren-
derers. Instead, the image size and number of samples along each ray dictate how many texture map resamplings are computed. This is true as long as the volume data fits in the texture memory of the graphics system. A typical high-end graphics system is equipped with 64 MBytes of texture memory, which holds volumes up to 256³ with 32 bits per voxel. Newer hardware supports fast paging between main and texture memory for higher virtual texture memory than is physically available [12, 8].

Figure 2: Polygon primitives for texture based volume rendering when the final image is oriented parallel to one of the faces of the volume

2.2 Shading Options

The 3-4 parameter LUT presented by all three architectures in Fig. 1 is used to optimize the computation of the lighting equation for shading of samples. The LUT summarizes the contribution of ambient, diffuse, and specular shading for every gradient direction in the LUT.

We present alternatives to compute the shading of the re-sampled points along the rays. Van Gelder and Kim implied that a 3-4 parameter LUT within the graphics pipeline could be used. Even if there were only four viewing parameters to consider, four 8-bit indices into a LUT would mean 256⁴ = 4 Gigaentries in the table. Since this is an RGBα table, it would consume 16 GBytes of memory. Furthermore, it would require 4 Gigacalculations to compute the LUT. If the same calculations were used on the resampled data, then a 400×400×256 projection of a volume could be shaded with 40 Megacalculations, or two orders of magnitude less than computing the LUT. If the table is to be indexed by only four parameters (Gx, Gy, Gz, density value), then the table would need to be recomputed every time any light or viewing parameter changed, or every frame in the usual case. Trade-offs could occur to also use eye and light position as indices, but the table is already much too large. Reducing the precision brings the table down to a much more manageable size; however, that deteriorates the image quality. Fig. 3a shows a sphere generated with an 8-bit fixed-point Phong calculation and Fig. 3b with a 4-index Phong LUT with 5 bits per index and 8-bit values. Five bits is about the largest that can be considered for a manageable lookup table, since 32⁴ × 4 Bytes = 4 MBytes.

Fortunately, with the Phong lighting model it is possible to reduce the size of the LUT by first normalizing the gradient and using a reflectance map [14]. With this method, the Phong shading contribution for 6n² surface normals is computed. They are organized as six n² tables that map to the six sides of a cube, with each side divided into n² equal patches. Each sample gradient vector G_{x,y,z} is normalized by its maximum component to form G_{u,v,index}, where index enumerates the six major directions. A direct lookup returns RGB intensities which are modulated with the object color to form the shaded sample intensity.

Trade-offs in image quality and frame rate occur with the choice of shading implementation. We have chosen to implement reflectance map shading because it delivers good image quality with fast LUT creation and simple lookup.

Figure 3: Sphere rendered using (a) 8-bit fixed-point Phong shading calculations, and (b) with a 5-bit, 4-index LUT

2.3 Pre-computation of Volume Gradients

To be able to compute accurately shaded volumes, we pre-compute the Gx, Gy and Gz central difference gradient values at each voxel position. Our voxel data type is then four 8-bit values which we load into an RGBα type texture map, although the fields are really three gradient values and raw density. These gradient values are then interpolated along with the raw density values to the sample positions by the 3D texture mapping hardware. Assuming a piecewise linear gradient function, this method produces the same gradient values at the sample locations as if gradients themselves were computed at unit voxel distances from the sample point. The gradient computation needs to occur only once for any volume of data being rendered, regardless of changes in the viewing parameters. Since the gradients are processed off-line, we have chosen to compute high-quality Sobel gradients at the expense of speed. Computing gradients serially off-line for a 256³ volume takes 12 seconds on a 200MHz CPU.

3 Rendering Optimization Techniques

Software ray casting algorithms enjoy speedup advantages from several different optimization techniques; we consider two of them. The first is space leaping, or skipping over areas of insignificant opacity. In this technique, the opacity of a given sample (or area of samples) is checked before any shading computations are performed. If the opacity is under some threshold, the shading and compositing calculations are skipped because the samples minimally contribute to the final pixel color for that ray.

A second optimization technique employed by software ray casting is the so-called early ray termination. In this technique, only possible in front-to-back traversal, the sampling, shading and compositing operations are terminated once the ray reaches full opacity. In other words, the ray has reached the point where everything behind it is obscured by other objects closer to the viewer. Since we are using the hardware to perform all interpolations, we can only eliminate shading and compositing operations. However, these operations typically dominate the rendering time.

Below, we propose two methods to apply these optimization techniques to speed up the computation of accurately shaded volume rendering utilizing texture mapping hardware.

3.1 Planes: Compositing on the CPU

Previous texture map based volume rendering methods resample a volume by dispatching polygons down the graphics pipeline parallel to the image plane. The textured polygons are then blended into the frame buffer without ever leaving the graphics hardware [1, 7, 3, 4, 6]. Fig. 2 shows the common polygon resampling direction.

In contrast, since we propose to shade the samples in the CPU and take advantage of the two optimization techniques discussed earlier, we wish to have all the samples for a ray in the main memory at one time. For this reason we have chosen an alternative order for accessing the resampling functionality of the 3D texture map hardware. Polygons are forwarded to the graphics pipeline oriented in such a way that they are coplanar with the rays that would end up being a row of pixels in the final image plane. Fig. 4 shows the polygon orientation for this method.

Once the data has been loaded back into the main memory, the raw density value and three gradient components are extracted and used in a reflectance map computation to generate the Phong shaded RGB for each sample. The samples are composited front-to-back taking advantage of early ray termination and skipping over low opacity samples. Similar to the shear-warp approach [10], the composition is now an orthogonal projection with no more resampling. The ray composition section is therefore computed as quickly as in the shear-warp approach. In fact, this method can be viewed as resembling the shear-warp method where we let the texture mapping hardware perform the shearing and perspective scaling. Furthermore, our method does not require a final warp since the planes are already resampled into image space. This not only speeds up the total processing over shear-warp, but removes a filtering step and thus results in higher image quality. Algorithm 1 renders a volume using the Planes method.

Load texture rotation matrix
Resample first plane into frame buffer
Read first plane from frame buffer into memory
loop over all remaining scanlines
    Resample next plane into frame buffer
    loop over all columns in previous plane
        Initialize integration variables
        while within bounds and ray not opaque
            Lookup opacity in transfer function
            if sample opacity > threshold
                Lookup shade in reflectance map
                Composite ray color OVER sample
            end if
        end while
        Store ray color in previous row of image
    end loop
    Read next plane from frame buffer into memory
end loop
Shade and composite final plane as above

Algorithm 1: Planes method for texture map based volume rendering

Figure 4: Polygon primitives for texture based volume rendering with rows of rays using the Planes method; the resampling polygon is perpendicular to the final image plane and covers all rays for a row of final image pixels

Notice that we interlace the CPU and graphics hardware computation by initiating the texture mapping calculations for scanline y+1 before doing the shading and compositing on scanline y in the CPU.

Table 1 presents rendering times for various volumes and image sizes. The Translucent Solid is a 64³ volume that is homogeneous and translucent with a 1/255 opacity. This establishes a baseline of how long it takes to process an entire volume. Since there are no transparent voxels, every sample is shaded (i.e., the low opacity skipping optimization is not utilized). Additionally, the rays do not reach full opacity for 191 samples inside this volume, so for most cases in the table, the early ray termination optimization does not take effect. The Translucent Sphere is a radius 32 sphere of 4/255 opacity in the center of a transparent 64³ volume. In this volume, the effect of the low opacity skipping optimization becomes apparent. The Opaque Sphere is the same sphere, but with a uniform opacity of 255/255. This is the first volume to take advantage of early ray termination and the rendering times reflect that. These first three volumes were created as theoretical test cases. The next three MRI and CT scanned volumes are representative of the typical workload of a volume rendering system. All three of these contain areas of translucent "gel" with other features inside or behind the first material encountered. Renderings of the Lobster, MRI Head, Silicon and CT Head datasets on a four processor SGI Onyx2 are shown in Figs. 5, 6, 7 and 8, respectively.

The image sizes cover a broad range; most are included for comparison to other methods (see Sec. 4). The number of samples along each ray is also included because the run time of image-order ray casting is typically proportional to the number of samples computed and not the size of the volume. To show this, we rendered the Opaque Sphere as a 32³ volume in 0.13 seconds, as a 64³ volume in 0.13 seconds, and as a 128³ volume also in 0.13 seconds; for all of these we rendered 100² images with 100 samples per ray using the Planes method.

Image size,       Translucent   Translucent   Opaque        Lobster     Silicon     MRI Head    CT Head
samples per ray   Solid (64³)   Sphere (64³)  Sphere (64³)  (128²×64)   (128²×32)   (64²×256)   (128²×113)
128², 84          1.04          0.54          0.19          0.24        0.48        0.36        0.20
200², VolDepth    1.90          0.99          0.35          0.31        0.52        1.27        0.48
200², 200         5.55          2.69          0.73          1.22        2.21        1.76        0.67
400², 128         13.19         6.10          1.84          2.78        5.98        4.53        1.84
512², VolDepth    11.96         6.15          1.94          1.88        3.06        7.99        2.81

Table 1: Rendering rates in seconds for the Planes method

3.2 Blend: Compositing in the Frame Buffer

When we tested and studied the performance of the system,
we noticed that, depending on the volume data and transfer function, there was still a substantial amount of time spent in the compositing portion of the algorithm. In fact, we found that the number of samples per ray before reaching full opacity and terminating is proportional to the time spent compositing. We propose to composite using the blending hardware by placing the shaded images back into the frame buffer and specifying the over operator. Of course, this requires that we return to using polygons that are parallel to the final image plane as in Fig. 2. In this method, we can employ the optimization of skipping over low opacity samples by not shading empty samples. However, since the transparency values reside in the frame buffer's α channel and not in main memory, we can not easily tell when a given ray has reached full opacity and can not directly employ early ray termination without reading the frame buffer. Algorithm 2 renders a volume using this Blend method.

Load texture rotation matrix
Resample furthest slice into frame buffer
Read furthest slice from frame buffer into memory
loop over all remaining slices back-to-front
    Resample next slice into frame buffer
    loop over all samples in previous slice
        if sample opacity > threshold
            Lookup shade in reflectance map
            Write shade back into buffer
        else
            Write clear back into buffer
        end if
    end loop
    Blend slice buffer into frame buffer
    Read next slice from frame buffer into memory
end loop
Shade and Blend nearest slice as above

Algorithm 2: Blend method for texture map based volume rendering

Figure 5: CT scanned Lobster dataset with a translucent shell, rendered in 0.26 seconds at 200² resolution (also in the color section)

Notice that we now resample along slices rather than planes. Also, there are two frame buffers: one for the slices of samples and another for the blending of shaded images. Since compositing is not performed in software, it is quicker than the Planes algorithm. However, because of the added data transfer back to the graphics pipeline for blending into the frame buffer, and the fact that shading is performed for all voxels, this method does not always produce faster rendering rates.
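As an illustration only (a sketch, not the paper's code), the per-slice work of the Blend method can be written out in Python. Here `transfer` and `shade` are hypothetical stand-ins for the transfer-function and reflectance-map lookups, and colors are premultiplied-alpha RGBA tuples:

```python
def over(src, dst):
    # Premultiplied-alpha 'over' blend, as the frame buffer's blending
    # stage applies it: source plus what shows through (1 - src alpha).
    t = 1.0 - src[3]
    return tuple(s + t * d for s, d in zip(src, dst))

def blend_slices(slices, transfer, shade, threshold=1.0 / 255.0):
    # slices: back-to-front list of slices; each slice is a list of
    # (density, gradient) samples, one per pixel.  The CPU shades each
    # slice, writing 'clear' for low-opacity samples (the space-leaping
    # optimization), then the slice is blended over the frame buffer.
    fb = [(0.0, 0.0, 0.0, 0.0)] * len(slices[0])
    for sl in slices:
        shaded = []
        for density, grad in sl:
            a = transfer(density)
            if a > threshold:
                r, g, b = shade(grad)
                shaded.append((a * r, a * g, a * b, a))  # shaded, premultiplied
            else:
                shaded.append((0.0, 0.0, 0.0, 0.0))      # write clear
        fb = [over(s, d) for s, d in zip(shaded, fb)]
    return fb
```

Blending each shaded slice with the over operator is exactly what the frame buffer does in hardware; it is done in software here only to make the data flow explicit.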
Image size,       Translucent   Translucent   Opaque        Lobster     Silicon     MRI Head    CT Head
samples per ray   Solid (64³)   Sphere (64³)  Sphere (64³)  (128²×64)   (128²×32)   (64²×256)   (128²×113)
128², 84          0.83          0.48          0.48          0.26        0.46        0.35        0.29
200², VolDepth    1.48          0.88          0.83          0.23        0.38        1.21        0.84
200², 200         4.64          2.63          2.66          1.31        2.50        1.83        1.49
400², 128         11.30         5.40          3.19          3.04        5.13        13.49       3.44
512², VolDepth    9.29          5.46          5.14          1.32        2.36        7.26        4.68

Table 2: Rendering rates in seconds for the Blend method

Figure 6: MRI scanned Head dataset showing internal brain structures, rendered in 0.43 seconds at 200² resolution (also in the color section)

Considering Table 2, the Blend method always produces better rendering rates for the first two columns, due to the fact that here the volumes are "fully translucent". In other words, since the rays never reach full opacity, the early termination optimization that the Planes method typically utilizes is unavailable. Since both methods must shade the same number of voxels and composite every sample on every ray, letting the graphics hardware perform this compositing is the quickest. However, for the Opaque Sphere the Planes method is always faster. This is because 78.5% of the rays intersect the sphere and the optimization from early ray termination is greater than the time gained from not performing compositing. We notice for the three "real" volumes, the Blend method is quicker when the number of samples along each ray is equal to the number of voxels in that dimension of the volume. When the sampling rate is close to the resolution of the image, the excessive slices that must be shaded and returned to the frame buffer again allow the early ray termination optimization in the Planes method to out-perform the Blend method.

Figure 7: Silicon dataset flythrough showing translucent surfaces, rendered in 0.29 seconds at 200² resolution (also in the color section)

In theory, which method would be optimal can be determined from the desired rendering parameters, volume density histogram, and transfer function. For a more opaque volume, the Planes method always produces better rendering rates. For transparent volumes, if there are many slices to render, the Planes method is usually quicker, while if there are few slices the Blend method is the better of the two. Yet in practice, we feel it may prove to be difficult to determine how "few" and "many" are defined. For this reason, we prefer the Planes method, since it is faster for all opaque volumes and for some of the translucent volumes.
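The early ray termination that tips this trade-off toward the Planes method is easiest to see in a small sketch of its per-ray loop (Python; `transfer` and `shade` are again hypothetical stand-ins, and accumulation is front-to-back):

```python
def cast_ray(samples, transfer, shade, threshold=1.0 / 255.0, opaque=0.98):
    # samples: front-to-back (density, gradient) pairs, already resampled
    # (by the texture hardware, in the paper's setting).
    r = g = b = a = 0.0
    used = 0
    for density, grad in samples:
        used += 1
        alpha = transfer(density)
        if alpha > threshold:          # skip low-opacity samples
            sr, sg, sb = shade(grad)
            t = (1.0 - a) * alpha      # front-to-back 'over'
            r += t * sr
            g += t * sg
            b += t * sb
            a += t
        if a >= opaque:                # early ray termination
            break
    return (r, g, b, a), used
```

On a fully opaque ray the loop shades only the first sample; on an empty ray it visits every sample but shades none, which is the low-opacity skipping optimization.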
4 Comparison to Other Methods

Here we compare the performance of our rendering algorithm to others presented in the literature, in terms of both image quality and rendering rates. The image quality comparisons point out quality trade-offs as they relate to lighting methods. We noticed in our testing on different volumes that the number of samples more accurately determined the run time of the algorithm than simply the volume size. For this reason we have included image sizes and sample counts in our runtime tables (see Tables 1 and 2). We also noticed that the volume data and transfer functions greatly influence rendering rates. For our method this is probably more of an effect because we are utilizing runtime optimization techniques whose performance directly relies on the volume data and transfer functions.
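The claim that run time tracks the sample count rather than the volume size amounts to simple arithmetic; a sketch using two of Table 1's Translucent Solid configurations (Python, illustrative only):

```python
def total_samples(width, height, samples_per_ray):
    # Work done by image-order ray casting is proportional to this
    # product; the volume resolution does not appear in it.
    return width * height * samples_per_ray

small = total_samples(128, 128, 84)   # Table 1 row rendered in 1.04 s
large = total_samples(200, 200, 200)  # Table 1 row rendered in 5.55 s
```

The roughly 5.8× larger sample count lines up with the roughly 5.3× longer measured time.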
Figure 8: CT scanned Head dataset showing bone structures, rendered in 0.44 seconds at 200² resolution (also in the color section)

Our Planes method renders the Lobster at a sampling resolution of 512×512×64 in 1.88 seconds. Our method is 19 times slower than the method of Cabral et al. However, their method does not employ directional shading or even view independent diffuse shading. This is a major limitation of their method, since shading cues are highly regarded as essential to visual perception of shape and form. Our implementation with full Phong lighting, with ambient, diffuse and specular components, produces a much higher quality image.
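The full Phong shading credited here comes from the reflectance-map lookup of Sec. 2.2. A compact sketch follows (Python, diffuse term only; the function names and patch parameterization are illustrative assumptions rather than the paper's code, and a nonzero gradient is assumed):

```python
import math

def cube_coords(g, n):
    # Normalize a gradient by its largest-magnitude component; this picks
    # one of six cube faces and a (u, v) patch on an n x n grid.
    mags = [abs(c) for c in g]
    axis = mags.index(max(mags))
    face = 2 * axis + (0 if g[axis] > 0 else 1)
    u, v = [g[i] / mags[axis] for i in range(3) if i != axis]
    iu = min(int((u + 1.0) / 2.0 * n), n - 1)
    iv = min(int((v + 1.0) / 2.0 * n), n - 1)
    return face, iu, iv

def patch_normal(face, iu, iv, n):
    # Unit normal at the center of a face patch.
    axis, sign = face // 2, (1.0 if face % 2 == 0 else -1.0)
    vec = [0.0, 0.0, 0.0]
    vec[axis] = sign
    others = [i for i in range(3) if i != axis]
    vec[others[0]] = (iu + 0.5) / n * 2.0 - 1.0
    vec[others[1]] = (iv + 0.5) / n * 2.0 - 1.0
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]

def build_reflectance_map(n, light):
    # Six n x n tables of diffuse intensity; rebuilt only when the light
    # (or, with specular terms, the view) changes -- never per sample.
    return [[[max(0.0, sum(p * l for p, l in
                           zip(patch_normal(f, i, j, n), light)))
              for j in range(n)]
             for i in range(n)]
            for f in range(6)]

def shade(refl_map, g, n):
    f, i, j = cube_coords(g, n)  # per sample: one normalize, one lookup
    return refl_map[f][i][j]
```

Rebuilding the six tables touches 6n² entries once per light change, while each resampled gradient pays only a normalization and a single table read.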
In comparison to Cullip and Neumann [4], our method is again slower. Cullip and Neumann achieve better image quality than Cabral et al. by computing a gradient coefficient that is used to simulate diffuse highlights. This still is not as high an image quality as our full Phong lighting, and if the light geometry changes with respect to the volume, Cullip and Neumann's texture maps must be recomputed. Therefore, if the viewing geometry is dynamic, then our method obtains higher quality images, including specular highlights, at faster rates.

Our method produces an image of the Opaque Sphere in 1.84 seconds with the Planes method, faster than Sommer et al.'s [13] isosurface rendering. For a translucent rendering of the Lobster, our Planes method runs 12 times faster, in 2.78 seconds. The image quality of both methods is equivalent since they both compute full lighting effects. As Sommer et al. pointed out, storing the gradients in the texture map has the disadvantage of limiting the size of the volume that can be rendered without texture paging, so our frame rate is limited by the amount of available texture memory, like all other texture map based methods.

Although Lacroute's shear-warp [9] is not a texture map based approach, we include a comparison, since it is one of the quickest methods for rendering with a full accurate lighting model on a workstation class machine. For example, shear-warp produces fully shaded monochrome renderings at a rate of 10 Hz, but this is a parallel version of shear-warp running on a 32 processor SGI Challenge. Lacroute reports that a 128×128×84 volume can be rendered in 0.24 seconds on one processor. Our Planes method renders a 128² image of the Opaque Sphere with 84 samples per ray in 0.19 seconds and the Lobster in 0.24 seconds. Our parallel implementation runs even faster (see Sec. 5). Since shear-warp must generate three copies of a compressed data structure per classification, interactive segmentation is not possible as it is with our method. Shear-warp performs classification before bilinear resampling, whereas our method performs trilinear interpolation followed by classification. Additionally, our method performs arbitrary parallel and perspective projections in the same time, while shear-warp takes up to four times longer for perspective projections.

5 Parallel Implementation

We have parallelized the Planes algorithm on a four processor Onyx2 workstation with InfiniteReality graphics. We constructed a master-slave model for the parallel processing where the master process implements the texture mapping interface to the graphics hardware and, once a plane of orthogonal rays is resampled by the hardware, the work is farmed to a slave process for the ray casting. We use the shared memory symmetric multi-processor (SMP) functionality of the Onyx2 and IRIX 6.4 operating system. The best speedup we can achieve with the parallel version is bound by the time it takes to perform the texture mapping for all the planes. This is because the texture mapping computation must be performed sequentially, since there is only one graphics pipeline.

Figure 9a shows the rendering rates for one to four processors. The Opaque Sphere achieves the poorest speedup because it quickly approaches the limit for raw rendering time imposed by the sequential texture mapping. On the other end of the spectrum, the Translucent Sphere achieves the best speedup performance although it suffers from the slowest rendering rates. This is because the CPU bound ray casting portion of the computation is the dominant percentage of the sequential time for this dataset. The Lobster dataset is representative of volume rendering applications and shows results between the two extremes.

Given enough processors, any dataset will eventually be limited by the time to perform the texture mapping. The number of processors required to reach this limit depends on the time it takes for the CPU portion (ray casting) of the algorithm to run, and the fact that that portion relies heavily on software data dependent optimizations. The limit is reached when the number of processors is equal to T_c/T_t, where T_c is the time to perform the ray casting for one plane on the CPU and T_t is the time to texture map one plane in the graphics hardware.

6 Results and Conclusions

We have presented a method for high-quality rendering of volumetric datasets which utilizes the 3D texture map hardware currently available in graphics workstations. The method produces images whose quality is comparable to that of accurate software ray casting, the highest quality method currently available, at a substantially faster frame rate than that of software ray casting. Other methods achieve higher frame rates than ours, but either lack shading, lack directional shading, or require multiple processors.

Our method is accelerated by multiple processors, although the speedup is limited by the throughput of the serial graphics pipeline. Although shear-warp achieves higher rendering rates for multiprocessor machines, our method is faster on typical graphics workstations with 3D texture mapping and also supports interactive classification.

7 Acknowledgements

This work has been supported by the National Science Foundation under grant MIP9527694, Office of Naval Research under grant N000149710402, Mitsubishi Electric Research Lab, Japan Radio Corp., Hewlett-Packard, and Intel Corp. The Lobster dataset is courtesy of AVS, Inc. The Silicon dataset is courtesy of Oak Ridge National Lab. The MRI Head dataset is courtesy of Siemens. The CT Head is from the UNC database.

References

[1] K. Akeley. RealityEngine Graphics. In Computer Graphics, SIGGRAPH '93, Anaheim, CA, August 1993. ACM.

[2] R. Avila, L. Sobierajski, and A. Kaufman. Towards a
2
cessors for various volumes. For all cases we rendered 128
Comprehensivevolume Visualization System. In Pro-
images with 84 samples p er ray. The time to p erform the
ceedings of Visualization '92, Boston, MA, Octob er
texture mapping of 128 12884 planes is 0.12 seconds as
1992. IEEE.
shown on the graph. As can b e seen, the rendering rates
approach this theoretical b est rendering time. Figure 9b [3] B. Cabral, N. Cam, and J. Foran. Accelerated Vol-
presents sp eedup curves for the same datasets. The Opaque ume Rendering and Tomographic Reconstruction Using
Sphere dataset is the rendered the fastest. However, it also Texture Mapping Hardware. In Symposium on Volume 0.5 4 Optimal Linear Transparent Sphere Transparent Sphere Lobster Lobster 0.4 Opaque Sphere Opaque Sphere Texture Mapping
3 0.3
0.2 Speedup Runtime (sec.) 2
0.1
0 1 12341234 Number of Processors Number of Processors
(a) (b)
Figure 9: a Paral lel rendering rates for 12812884 samples and b Paral lel speedup
[12] J. S. Montrym, D. R. Baum, D. L. Dignam, and C. J. Visualization, pages 91{98, Washington D.C., Octob er
Migdal. In niteReali ty: A Real-Time Graphics Sys- 1994. ACM.
tem. In Computer Graphics, SIGGRAPH '97, pages
[4] T. J. Cullip and U. Neumann. Accelerating Volume
293{302, Los Angeles, CA, August 1997. ACM.
Reconstruction with 3D Texture Hardware. Technical
Rep ort TR93-027, University of North Carolina, Chap el
[13] O. Sommer, A. Dietz, R. Westermann, and T. Ertl.
Hill, 1993.
Tivor: An Interactive Visualizati on and Navigation
To ol for Medical Volume Data. In The Sixth Inter-
[5] J. Eyles, S. Molnar, J. Poulton, T. Greer, A. Las-
national Conference in Central Europe on Computer
tra, N. England, and L. Westover. PixelFlow:
Graphics and Visualization '98,February 1998.
The Realization. In Proceedings of the 1997 SIG-
GRAPH/Eurographics Workshop on Graphics Hard-
[14] J. van Scheltinga, J. Smit, and M. Bosma. Design
ware, pages 57{68, Los Angeles, CA, August 1997. Eu-
of an On-Chip Re ectance Map. In Proceedings of
rographics.
the 10th Eurographics Workshop on Graphics Hardware
'95, pages 51{55, Maastricht, The Netherlands, August
[6] A. Van Gelder and K. Kim. Direct Volume Render-
1995. Eurographics.
ing with Shading via Three-Dimensional Textures. In
Symposium on Volume Visualization, pages 23{30, San
Francisco, CA, Octob er 1996. ACM.
[7] S.-Y. Guan and R. Lip es. Innovativevolume rendering
using 3D texture mapping. In Image Capture, Format-
ting and Display, Newp ort Beach, CA, February 1994.
SPIE.
[8] M. J. Kilgard. Realizing Op enGL: Two Implemen-
tations of One Architecture. In Proceedings of the
1997 SIGGRAPH/Eurographics Workshop on Graphics
Hardware, pages 45{56, Los Angeles, CA, August 1997.
Eurographics.
[9] P. Lacroute. Analysis of a Parallel Volume Rendering
System Based on the Shear-Warp Factorization. IEEE
Transactions on Visualization and Computer Graphics,
23:218{231, Septemb er 1996.
[10] P. Lacroute and M. Levoy.Fast Volume Rendering us-
ing a Shear-warp Factorization of the Viewing Trans-
form. In Computer Graphics, SIGGRAPH '94, pages
451{457, Orlando, FL, July 1994. ACM.
[11] M. Levoy. Display of Surfaces from Volume Data. IEEE
Computer Graphics and Applications, 85:29{37, May 1988.