Optimizing Mobile Games with Gameloft And
Total Page:16
File Type:pdf, Size:1020Kb
Optimizing Mobile Games with Gameloft and ARM Stacy Smith Senior Software Engineer, ARM Adrian Voinea World Android Technical Lead, Gameloft Victor Bernot Lead Visual Effects Developer, Gameloft 1 ARM Ecosystem . My first role in ARM was in Developer Relations . Developers came to us to ask for help . We couldn’t share their problems with the world . We couldn’t help everyone on a one to one basis 2 Developer Education . Developer Education addresses that need . Since 2012 we’ve been sharing advice for graphical development . Developers working with Developer Relations have remained separate . Until now! 3 4 Today’s Agenda . Gameloft . Batching (Iron Man 3) . Culling and LOD (Gangstar Vegas) . Texture compression (Asphalt 8) . ARM (That’s me) . Entry level implementations . How to achieve similar results 5 Iron Man 3 Improving Draw Calls and Rendering Techniques 6 Sorting Objects Before Rendering . There is no good or bad way to determine which sorting method works best for a game . Sorting methods reduce overdraw and material changes at the cost of CPU . Sorting algorithms used for Iron Man 3: . Sorting by material . Sorting by distance 7 What Happens When No Sorting Is Applied? . Mid-range device: average 18FPS, constant micro-freezes . Over 35 program changes per frame . The skybox is rendered in the 27th draw call / 150 8 Sorting Objects Before Rendering . Every shader program change is costly . It will depend a lot on the number of programs your game has . Using a texture atlas will create the same materials, thus allowing better material sorting . Sorting front to back will reduce the overdraw 9 Material Sorting Results . Reduced program changes to an average of 16 . Micro-freezes are reduced. Average 22FPS, smoother gameplay 10 Material Sorting Results . But we still have a lot of overdraw… 11 Front to Back Sorting . And this is the desired result… 12 Front to Back Sorting . Sorting by distance . Front to back sorting will reduce the overdraw, skipping fragment shader execution for all the dropped fragments . The transparent objects must be drawn back to front for a correct ordering. The blending is done with what is already drawn in the color buffer . For OpenGL® ES 3.0, avoid modifying the depth buffer from the fragment shader or you will not benefit from the early Z testing, making the front to back sorting useless 13 Front to Back Sorting Results . Sorting first by material, objects with the same material then sorted front to back . Constant 24FPS . The skybox is rendered in the 82nd draw call / 135, being the last opaque object 14 Dynamic Batching . Objects that are not static batched and by nature require dynamic movement can be batched at runtime . Objects need to share the same material to be batched into one single draw call . Dynamic batched objects require new geometry and the duplication of all the vertex attributes, adding a constraint on the number of vertices 15 Dynamic Batching • Should be applied to dynamic objects that share the same material, with a constraint on number of vertices, like particles 16 Dynamic Batching • In order for objects to be batchable at runtime: . Objects need to use the same program . Objects require identical textures . Because of draw order constraint, objects require the same transparency property . Because of duplicating vertex attributes, dynamic batching on objects requires a lower number of vertices 17 Sorting Objects Before Rendering . Good approach to render your scene: . Sort all opaque objects based on material and distance from camera . This reduces scene overdraw . Reducing material change will help a bit on FPS, depending on the hardware, but also helps with micro-freezes . For every consecutive draw call that has the same material, dynamic batching is simple and easy to achieve . Reduces the actual number of draw calls . Reduced micro-freezes and gained ~6FPS by applying sorting algorithms 18 The Importance of Batching 19 Batching . Deferred immediate mode rendering . glDrawElements and glDrawArrays have an overhead . Less draw calls, less overhead . DrawCall class stitches multiple objects into one draw call . Macro functions in shaders make batching as simple as: vec4 pos=transform[getInstance()]*getPosition(); 20 Batching 21 Batching uniform mat4 transforms[4]; 22 Mixing up a Batch Mesh 1 [(1,1),(0,1),(0,0),(1,0)] Mesh 2 [(1,2),(0,2),(0,0)] Mesh 3 [(2,2),(2,1),(1,1)] Index1: [0,1,2,0,2,3] Index2: [0,1,2] Index3: [0,1,2] Attrib1:[(1,1),(0,1),(0,0),(1,1),(1,2),(0,0),(1,0),(2,2),(2,1),(1,1)] Attrib2:[0,0,0,0,1,1,1,2,2,2] Index: [0,1,2,0,2,3,4,5,6,7,8,9] 23 Serving up a Batch Vertex shader: uniform mat4 transforms[3]; attribute vec4 pos; attribute float id; void main(){ mat4 trans=transforms[(int)id]; glPosition=trans*pos; } GL Code: float transforms[16*instanceCount]; . /* Load matrices into float array */ . glUniformMatrix4fv(transID,4,false,transforms); 24 Scene Batching . Multiple geometries drawn in a single draw call: . drawbuilder.addGeometry(geo1); . drawbuilder.addGeometry(geo2); . drawbuilder.addGeometry(geo3); . drawbuilder.Build(); . Always draws in the same order from a start and end index. Unused objects can be scaled out to zero . Can lead to inefficiency of vertex shaders 25 Remixing a Batch Mesh 1 [(1,1),(0,1),(0,0),(1,0)] Mesh 2 [(1,2),(0,2),(0,0)] Mesh 3 [(2,2),(2,1),(1,1)] Index1: [0,1,2,0,2,3] Index2: [0,1,2] Index3: [0,1,2] Attrib1:[(1,1),(0,1),(0,0),(1,1),(1,2),(0,0),(1,0),(2,2),(2,1),(1,1)] Attrib2:[0,0,0,0,1,1,1,2,2,2] Index: [0,1,2,0,2,3,4,5,6,7,8,9] 26 Remixing a Batch Attrib1:[(1,1),(0,1),(0,0),(1,1),(1,2),(0,0),(1,0),(2,2),(2,1),(1,1)] Attrib2:[0,0,0,0,1,1,1,2,2,2] These IDs are the same Index: [0,1,2,0,2,3,4,5,6,7,8,9] But now the draw order is different Index: [7,8,9,4,5,6,0,1,2,0,2,3] 27 Remixing a Batch Attrib1:[(1,1),(0,1),(0,0),(1,1),(1,2),(0,0),(1,0),(2,2),(2,1),(1,1)] Attrib2:[0,0,0,0,1,1,1,2,2,2] Index: [0,1,2,0,2,3,4,5,6,7,8,9] These vertices are still processed Index: [0,1,2,0,2,3,7,8,9] …But never get drawn 28 Object Instancing . Multiple geometries or single object instances: for(int i=0;i<50;i++) drawbuilder.addGeometry(geo1); drawbuilder.Build() . Can implement LOD switching when objects are sorted front to back and correctly culled . So let’s talk about culling 29 Batching / Draw Call Reduction Case Study: Gangstar Vegas 30 Challenges . Big map: ~16 km² . Move by foot / by car / by plane . Interior/exterior environments 31 Streaming . Constant streaming is needed : . Meshes / LODs . Streaming thread with second OpenGL® context . Textures . Same streaming thread as meshes . Lower mipmap level always available . Also streamed : . Animations . Sounds . Physics data 32 Reducing Draw Call Count & Cost . Draw calls are one of the major bottlenecks on mobiles . On a very powerful device, stay under 300 draw calls . On most devices => 100 to 200 at max . Techniques used to reduce draw call count: . Static batching . 2D grid culling . Near/far city . To reduce draw call cost: . Index & Vertex Buffer Objects (IBO / VBO) for static geometry 33 2D Grid Culling . Objects are linked to the cells of a grid . A simple 2D grid is used: . 128x128 cells . Cell size : 30m x 30m . An object can belong to one or more cells . Culling process: . Get cells visible by camera’s frustum 2D projection on grid . Middle cells: fully in frustum . Border cells: partially in frustum . Middle cell objects: directly added to render list . Border cell objects: AABBoxes are tested vs camera frustum . Grid is also used for streaming: . Radius Add (175m) / Radius Safe (205m) 34 Frustum & Grid 35 Middle Cells 36 Border Cells 37 Objects will be drawn directly 38 AABOX culling => culled 39 Outside frustum => directly culled 40 AABOX culling => visible Near/Far Cities . Two different city meshes are used: 1. Near city: . High resolution . Streamed 2. Far city . Low resolution . Not streamed 41 Cities Depth Slices . Each city version has its own « depth slice » . Using glDepthRange . Little overlapping zone between the two (20m). glDepthRange(0.0, 0.08); CamZnear = 0.2; CamZFar = 160; RenderHighresCity(); glDepthRange(0.08, 1); CamZnear = 140; CamZFar = 2000; RenderLowresCity(); 42 Full city: 77 draw calls 43 Near City : 48 draw calls 44 45 Top view, near city slice Near City Materials . Atlasing of albedo textures (batching & memory) . Lightmaps . Complex shaders (e.g. specular / reflection) 46 Full City: 77 draw calls 47 Far City: 29 draw calls 48 49 Top view: far city and full frustum Far City Materials . Heavy atlasing . Premultiplied Lightmaps . Low cost shaders 50 LODs and Culling 51 LOD in Batches 52 LOD in Batches 3 1 5 9 2 6 X 4 8 X 7 X 53 LOD in Batches 54 LOD in Batches X X 2 1 6 3 8 5 X 4 X 7 55 LOD in Batches X X X X 4 1 6 3 X 2 X 5 56 LOD in Batches 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 57 LOD in Batches X X 2 1 6 3 8 5 X 4 X 7 58 LOD in Batches X X X X 4 1 6 3 X 2 X 5 59 A View to a Cull 60 Scene Batching Timbuktu has 22 objects per environment layer S= Skipped R=Rendered X=Scaled out 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 R R R R R R R S S R R R X X X X X X R R X X X X X R X R S < - - - - - - - - - - - - - - - - - > S S R R R S S S S S S R R S S S S S R S R S < - > < > <> <> S S R R R S S S S S S R R S S S S S R X R S < - > < > < - > 61 Imposterable .