Acceleration techniques
Acceleration
Digggyital Image Synthesis Yung-Yu Chuang
with slides by Mario Costa Sousa, Gordon Stoll and Pat Hanrahan
Bounding volume hierarchy Bounding volume hierarchy
1) Find bounding box of objects Bounding volume hierarchy Bounding volume hierarchy
1) Find bounding box of objects 1) Find bounding box of objects 2) SliSplit objects into two groups 2) SliSplit objects into two groups 3) Recurse
Bounding volume hierarchy Bounding volume hierarchy
1) Find bounding box of objects 1) Find bounding box of objects 2) SliSplit objects into two groups 2) SliSplit objects into two groups 3) Recurse 3) Recurse Bounding volume hierarchy Where to split?
1) Find bounding box of objects • At midpoint 2) SliSplit objects into two groups • Sort, and put half of the objects on each side 3) Recurse • Use modeling hierarchy
BVH traversal BVH traversal • If hit parent, then check all children • Don't return intersection immediately because the other subvolumes may have a closer intersection Bounding volume hierarchy Bounding volume hierarchy
Space subdivision approaches Space subdivision approaches
Unifrom grid Quadtree (2D) KD tree BSP tree Octree (3D) Uniform grid Uniform grid
Preprocess scene 1. Find bounding box
Uniform grid Uniform grid
Preprocess scene Preprocess scene 1. Find bounding box 1. Find bounding box 2. Determine grid resolution 2. Determine grid resolution 3. Place object in cell if its bounding box overlaps the cell Uniform grid Uniform grid traversal
Preprocess scene Preprocess scene 1. Find bounding box Traverse grid 2. Determine grid resolution 3D line = 3D-DDA 3. Place object in cell if its (Digital Differential bounding box overlaps the cell Analyzer) 4. Check that object overlaps cell (expensive!) y y y mx b m 2 1 x2 x1
xi1 xi 1
yi1 mxi1 b naive
yi1 yi m DDA
octree Octree K-d tree
A
A
Leaf nodes correspond to unique regions in space K-d tree K-d tree
A A
B
B B
A A
Leaf nodes correspond to unique regions in space Leaf nodes correspond to unique regions in space
K-d tree K-d tree
C A C A
B B
C B B
A A K-d tree K-d tree
C A C A
D B D B
C C B B D A A
K-d tree K-d tree traversal
A A
D B D B
B C C B C C
D D
A A
Leaf nodes correspond to unique regions in space Leaf nodes correspond to unique regions in space BSP tree BSP tree
6 6 5 5 1
9 9 inside outside ones ones 7 7 10 8 10 8 1 1 11 11 2 2
4 4
3 3
BSP tree BSP tree
6 6 5 1 5 1
2 5 5 9 9b 9 3 6 4 7 7 6 8 7 10 10 11b 8 8 8 9a 7 9b 1 9 1 9a 11b 11 10 11 10 11a 2 11 2 11a
4 4
3 3 BSP tree BSP tree traversal
6 5 6 5 1 1
9b 9b 9 2 5 9 2 5 7 7 10 11b 8 8 10 11b11b 8 9a 3 6 9a9a 3 6 8 1 1 11 9b 11 9b 11a 4 7 11a 4 7 2 2
4 9a 11b 4 9a 11b poitint 3 3 10 10
11a 11a
BSP tree traversal BSP tree traversal
6 5 6 5 1 1
9b 9b 9 2 5 9 2 5 7 11b 7 10 11b11b 10 11b 8 8 3 8 9a9a 3 6 8 9a9a 6 1 1 11 9b 11 9b 11a 4 7 11a 4 7 2 2
4 9a 11b 4 9a 11b poitint point 3 3 10 10
11a 11a Classes Hierarchy
• Primitive (in core/primitive.*) Primitive – Geome tiPiititricPrimitive – InstancePrimitive – AtAggregate • Three types of accelerators are provided (in Geometric Transformed Aggregate accelerators/*.cpp) PiPrimiti ve PiPrimiti ve – GridAccel T – BVHAccel – KdTreeAccel Material Shape Support both Store intersectable Instancing and primitives, call animation Refine If necessary
Primitive Interface class Primitive : public ReferenceCounted { BBox WorldBound(); geometry
GeometricPrimitive Object instancing bool Intersect(Ray &r,Intersection *isect) { float thit, rayEpsilon; if (!shape->Intersect(r, &thit, &rayypEpsilon , &isect->dg)) return false; isect->primitive = this; isect->WorldToObject = *shape->WorldToObject; isect->ObjectToWorld = *shape->ObjectToWorld; isect->shapeId = shape->shapeId; isect->primitiveId = primitiveId; isect->rayEpsilon = rayEpsilon; r.maxt = thit; 61 uniqqpue plants , 4000 individual plants , 19.5M trian gles return true; With instancing, store only 1.1M triangles, 11GB->600MB } TransformedPrimitive TransformedPrimitive
bool Intersect(Ray &r, Intersection *isect){ Re ference
TransformedPrimitive Aggregates // Transform instance's differential geometry to world space • Acceleration is a heart component of a ray Transform PrimitiveToWorld = Inverse((p);w2p); tracer because ray/scene intersection accounts isect->dg.p = PrimitiveToWorld(isect->dg.p); for the majority of execution time isect->dg.nn = Normalize( PrimitiveToWorld(isect- >dg.nn)); • GlGoal: reduce the num ber of ray/iiti/primitive isect->dg.dpdu=PrimitiveToWorld(isect->dg.dpdu); intersections by quick simultaneous rejection of isect->dg.dpdv=PrimitiveToWorld(isect->dg.dpdv); groups of pr im itives and the fact tha t near by isect->dg.dndu=PrimitiveToWorld(isect->dg.dndu); intersections are likely to be found first isect->dg.dndv=PrimitiveToWorld(isect->dg.dndv); • Two main approaches: spatial subdivision, } object subdivision return true; • No clear winner } Ray-Box intersections Ray-Box intersections • Almost all acclerators require it 1 • QikQuick rejecti on, use enter and exit point to traverse the hierarchy Dx
• AABB is the intersection of three slabs t1 x1 Ox t1 Dx x1 Ox
x=x0 x=x1
Ray-Box intersections Grid accelerator bool BBox::IntersectP(const Ray &ray, • Uniform grid float *hitt0, float *hitt1) { float t0 = ray.mint, t1 = ray.maxt; for (int i = 0; i < 3; ++i) { float invRayDir = 1.f / ray.d[i]; float tNear = (pMin[i] - ray.o[i]) * invRayDir; float tFar = (pMax [i ] -rayy[]).o[i]) * invRa yDir ;
if (tNear > tFar) swap(tNear, tFar); t0 = tNear > t0 ? tNear : t0; segment iiintersection t1 = tFar < t1 ? tFar : t1; if (t0 > t1) return false; intersection is empty } if (hitt0) *hitt0 = t0; if (hitt1) *hitt1 = t1; return true; } Teapot in a stadium problem GridAccel
• Not adaptive to distribution of primitives. Class GridAccel:public Aggregate { • Have to determine the number of voxels.
mailbox GridAccel struct MailboxPrim { GridAccel(vector
Conversion between voxel and position Add primitives into voxels int posToVoxel(const Point &P, int axis) { for (u_int i=0; i
BBox pb = prims[i]->WorldBound(); for (int z = vmin[2]; z <= vmax[2]; ++z) for (int y = vmin[1]; y <= vmax[1]; ++y) int vmin[3] , vmax[3]; for (int x = vmin[0]; x <= vmax[0]; ++x) { for (int axis = 0; axis < 3; ++axis) { int o = offset(x, y, z); if (!voxels[o]) { vmin[axis] = posToVoxel(pb .pMin , axis); voxels[o] = voxelArena.Alloc
Voxel structure GridAccel traversal struct Voxel { bool GridAccel::Intersect( p ) { primitives.push_back(p); } Check against overall bound Set up 3D DDA (Digital Differential Analyzer) float rayT; • Similar to Bresenhm’s line drawing algorithm if (bounds. Inside(ray(ray. mint))) rayT = ray.mint; else if (!bounds. IntersectP(ray, &rayT)) return false; Point gridIntersect = ray(rayT); Set up 3D DDA (Digital Differential Analyzer) Set up 3D DDA for (int axis=0; axis<3; ++axis) { blue values changgges along the traversal Pos[axis]=posToVoxel(gridIntersect, axis); if (ray.d[axis]>=0) { Out NextCrossingT[1] NextCrossingT[axis] = rayT+ voxel (voxelToPos(Pos[axis]+1, axis)-gridIntersect[axis]) index /ray.d[axis]; DeltaT[axis] = width[axis] / ray.d[axis]; Step[axis] = 1; 1 DeltaT[0] Out[axis] = nVoxels[axis]; Step[0]=1 } else { Dx rayT ... Step[axis] = -1; Pos NextCrossingT[0] Out[axis] = -1; } DeltaT: the distance change when voxel changes 1 in that direction } width[0] Walk through grid Advance to next voxel for (;;) { int bits=((NextCrossingT[0] if (Pos[stepAxis] == Out[stepAxis]) break; Do not return; cut tmax instead Return when entering a voxel NextCrossingT[stepAxis] += DeltaT[stepAxis]; that is beyond the closest found intersection. conditions Bounding volume hierarchies • Object subdivision. Each primitive appears in the hierarchy exactly once. Additionally, the x BVHAccel construction Initialize buildData array MemoryArena buildArena; struct BVHBuildNode { uint32_; t totalNodes = 0; void InitLeaf((,,){int first, int n, BBox &b) { vector recursiveBuild Choose axis • Given n primitives, there are in general 2n-2 BBox cBounds; possible ways to partition them into two non- for (int i = start; i < end; ++i) empty groups. In practice, one considers cBounds=Union(cBounds, buildData[i].centroid); int dim = centroidBounds.MaximumExtent(); partitions along a coordinate axis, resulting in If cBounds has zreo volume, create a leaf 6n candidate partitions. 1. Choose axis 2. Choose split 3. Interior(dim, recursiildiveBuild(id)(.., start, mid, ..), recursiveBuild(.., mid, end, ..) ) Choose split (split_middle) Choose split (split_equal_count) float pmid = .5f * (centroidBounds.pMin[dim] + mid = (start + end) / 2; centroidBounds.pMax[dim]); std::nthstd::nth_element(&buildData[start], element(&buildData[start], BVHPrimitiveInfo *midPtr = std::partition( &buildData[mid], &buildData[end-1]+1, &buildData[start], &buildData[end-1]+1, Compp());arePoints(dim)); CompareToMid(dim, pmid)); mid = midPtr - &buildData[0]; It orders the array so that the middle pointer has median, Return true if the given primitive’s bound’s the first half is smaller and the second half is larger in O(n). centroid is below the given midpoint Choose split Choose split (split_SAH) Do not split N tisect (i) i1 split N A N B c(A, B) ttrav pA tisect (ai ) pB tisect (bi ) both heuristics work well both are sub-optimal i1 i1 s sA B B p pB A s sC C A C better solution Choose split (split_SAH) Choose split (split_SAH) • If there are no more than 4 primitives, use equal size const int nBuckets = 12; heuristics instead. struct BucketInfo { • Instead of testing 2n candidates, the extend is divided int count; BBox bounds; into a small number (()12) of buckets of e qual extent. }; Only buck boundaries are considered. BucketInfo buckets[nBuckets]; for (int i=start; i Choose split (split_SAH) Choose split (split_SAH) float cost[nBuckets-1]; float minCost = cost[0]; uint32_t minCostSplit = 0; for (int i = 0; i < nBuckets-1; ++i) { for ((;int i = 1; i < nBuckets-1;;){ ++i) { BBox b0, b1; if (cost[i] < minCost) { int count0 = 0,,; count1 = 0; minCost = cost[i]; minCostSplit = i; for (int j = 0; j <= i; ++j) { } b0 = Union(b0, buckets[j].bounds); } count0 += buckets[j].count; } if (nPrimitives > maxPrimsInNode || for (int j = i+1; j < nBuckets; ++j) { minCost < nPrimitives)){ { b1 = Union(b1, buckets[j].bounds); BVHPrimitiveInfo *pmid = count1 += buckets[j].count; } std::partition(&buildData[start],&buildData[end- cost[i] = .125f + (count0*b0.SurfaceArea() + 1]+1, Compare To Buc ke t(m in Cos tSp lit, n Buc ke ts, dim, count1*b1.SurfaceArea())/bbox.SurfaceArea(); centroidBounds)); mid = pmid - &buildData[0]; } Traverse cost : Intersection cost = 1 : 8 } else uin t32_ t no de Num = 0; offset into the nodes array to be visited uint32_t todo[64]; nodes to be visited; acts like a stack uint32_ t todoOffset = 0; next free element in the stack BVHAccel traversal BVHAccel traversal while (true) { else { interior node const LinearBVHNode *node = &nodes[[];nodeNum]; if ((g[dirIsNeg[node->axis]) { if (::IntersectP(node->bounds,ray,invDir,dirIsNeg)){ todo[todoOffset++] = nodeNum + 1; if (node->nPrimitives > 0) { leaf node nodeNum = node->secondChildOffset; // Intersect ray with primitives in leaf BVH node } for (uint32_t i = 0; i < node->nPrimitives; ++i){ else { if (primitives[node->primitivesOffset+i] todo[todoOffset++] = node->secondChildOffset; ->Intersect(ray, isect)) nodeNum = nodeNum + 1; hit = true; } } } if (todoOffset == 0) break; } nodeNum = todo[--todoOffset]; else { Do not hit the bounding box; retrieve the next one if any } if (todoOffset == 0) break; nodeNum = todo[--todoOffset]; } KD-Tree accelerator Spatial hierarchies • Non-uniform space subdivision (for example, kd-tree and octree) is better than uniform grid A if the scene is irregularly distributed. A Letters correspond to planes (A) Point Location by recursive search Spatial hierarchies Spatial hierarchies A A B D B B B C C D A A Letters correspond to planes (A, B) Letters correspond to planes (A, B, C, D) Point Location by recursive search Point Location by recursive search Variations “Hack” kd-tree building • Split axis – RdRound-robin; largest ex ten t • Split location – Middle of extent; median of geometry (balanced tree) • Termination – Target # of primitives, limited tree depth • All of these techniques stink. kd-tree octree bsp-tree Building good kd-trees Splitting with cost in mind • What split do we really want? – Clever Idea: the one tha t ma kes ray trac ing c heap – Write down an expression of cost and minimize it – Gree dy cost optim iza tion • What is the cost of tracing a ray through a cell? Cost(cell) = C_trav + Prob(hit L) * Cost(L) + Prob(hit R) * Cost(R) Split in the middle Split at the median To get through this part of empty space, you need to test all triangles on the right. • Makes the L & R probabilities equal • Makes the L & R costs equal • Pays no a tten tion to the L & R cos ts • Pays no a tten tion to the L & R pro ba bilities Cost-optimized split Building good kd-trees Since Cost(R) is much higher, make it as small as possible • Need the probabilities – Turns out to be propor tiona l to sur face area • Need the child cell costs – Simple triangle count works great (very rough approx.) –Empty cell “boost ” Cost(cell) = C_trav + Prob(hit L) * Cost(L) + Prob(hit R) * Cost(R) = C_trav + SA(L) * TriCount(L) + SA(R) * TriCount(R) C_trav is the ra tio o f the cos t to traverse to the cos t to • Automatically and rapidly isolates complexity intersect • PdProduces large c hun ks of emp ty space C_trav = 1:80 in pbrt (found by experiments) Surface area heuristic Termination criteria • When should we stop splitting? – BdBad: dep th liitlimit, num ber o f titriang les – Good: when split does not help any more. •Thhldfhreshold of cost improvement 2n splits; must coincides – Stretch over multiple levels with object – For example, if cost does not go down after three boundary. Why? splits in a row, terminate a b • Threshold of cell size – Absolute probability SA(node)/SA(scene) small Sa Sb pa p S b S Basic building algorithm Ray traversal algorithm 1. Pick an axis, or optimize across all three • Recursive inorder traversal 2. Build a set of can didate sp lit locat ions (cost extrema must be at bbox vertices) tmax t * 3. Sort or bin the triangles 4. Sweeppy to incrementally track L/R counts , cost t * 5. Output position of minimum cost split t Running time: T (N) N log N 2T (N / 2) min t * T (N) N log2 N tt * tttmin* max tt* min • Characteristics of highly optimized tree max Intersect(L,tmin,tmax) Intersect(L,tmin,t*) Intersect(R,tmin,tmax) – very deep, very small leaves, big empty cells Intersect((,R,t* ,tmax ) a video for kdtree Tree representation Tree representation 8-byte (reduced from 16-byte, 20% gain) float is irrelevant in pbrt2 interior 18 23 struct KdAccelNode { SE M ... union { float split; // Interior n flags u_int onePrimitive; // Leaf leaf 2 uu_int int *primitives; // Leaf n }; Flag: 0,1,2 (interior x, y, z) 3 (leaf) union { u_int flags; // Both u_int nPrims; // Leaf u_int aboveChild; // Interior }; } KdTreeAccel construction Interior node • Recursive top-down algorithm • Choose split axis position •max dhdepth = 8 1.3l(log(N) – MdMedpoi itnt – Medium cut If (nPrims <= maxPrims || depth==0) { – Area hhitieuristic N Start from the axis with maximum extent, sort cost of no split: ti (k) k 1 N B N A all edge events and process them in order cost of split: tt PB ti (bk ) PA ti (ak ) k 1 k 1 assumptions: A 1. ti is the same for all primitives C 2. ti : tt = 80 : 1 (determined by experiments, B main factor for the pp)erformance) cost of no split: ti N cost of split: tt ti (1 be )( pB N B pA N A ) a0 b0 a1 b1 c0 c1 sB A B C p(B | A) sA Choose split axis position KdTreeAccel traversal If there is no split along this axis, try other axes. When all fail, create a leaf. KdTreeAccel traversal KdTreeAccel traversal tmax ToDo stack ToDo stack tplane tmax tmin tmin far near far near KdTreeAccel traversal KdTreeAccel traversal far tmax tplane tmax ToDo stack ToDo stack tmin tmin near KdTreeAccel traversal KdTreeAccel traversal bool KdTreeAccel::Intersect (const Ray &ray, Intersection *isect) { tmax if (!bounds.IntersectP (y,,))(ray, &tmin, &tmax)) tmin return false; ToDo stack KdAccelNode *node=&nodes[0]; while (node!=NULL) { if (ray.maxt Leaf node Interior node 1. Check whether ray intersects primitive(s) 1. Determine near and far (by testing which side inside the node; update ray’ s maxt O is) 2. Grab next node from ToDo queue below above node+1 &(nodes[node->aboveChild]) 2. Determine whether we can skip a node t t max tplane tplane min tmin tmax near far near far tplane Acceleration techniques Best efficiency scheme References • J. Goldsmith and J. Salmon, Automatic Creation of Object Hierarchies for Ray Tracing, IEEE CG&A, 1987. •Brian Smits, Efficiency Issues for Ray Tracing, Journal of Grapp,hics Tools, 1998. • K. Klimaszewski and T. Sederberg, Faster Ray Tracing Using Adaptive Grids, IEEE CG&A Jan/Feb 1999. • Whang et. al., Octree-R: An Adaptive Octree for efficient ray tracing, IEEE TVCG 1(4), 1995. • A. Glassner, Space subdivision for fast ray tracing. IEEE CG&A, 4(10), 1984