Real-Time Large Crowd Rendering with Efficient Character and Instance Management on GPU

Total Page:16

File Type:pdf, Size:1020Kb

Real-Time Large Crowd Rendering with Efficient Character and Instance Management on GPU Hindawi International Journal of Computer Games Technology Volume 2019, Article ID 1792304, 15 pages https://doi.org/10.1155/2019/1792304 Research Article Real-Time Large Crowd Rendering with Efficient Character and Instance Management on GPU Yangzi Dong and Chao Peng Department of Computer Science, University of Alabama in Huntsville, Huntsville, AL 35899, USA Correspondence should be addressed to Yangzi Dong; [email protected] Received 20 October 2018; Revised 26 January 2019; Accepted 3 March 2019; Published 26 March 2019 Academic Editor: Ali Arya Copyright © 2019 Yangzi Dong and Chao Peng. Tis is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Achieving the efcient rendering of a large animated crowd with realistic visual appearance is a challenging task when players interact with a complex game scene. We present a real-time crowd rendering system that efciently manages multiple types of character data on the GPU and integrates seamlessly with level-of-detail and visibility culling techniques. Te character data, including vertices, triangles, vertex normals, texture coordinates, skeletons, and skinning weights, are stored as either bufer objects or textures in accordance with their access requirements at the rendering stage. Our system preserves the view-dependent visual appearance of individual character instances in the crowd and is executed with a fne-grained parallelization scheme. We compare our approach with the existing crowd rendering techniques. Te experimental results show that our approach achieves better rendering performance and visual quality. Our approach is able to render a large crowd composed of tens of thousands of animated instances in real time by managing each type of character data in a single bufer object. 1. Introduction speed and reduce memory consumption while preserving the crowd’s visual fdelity. Crowd rendering is an important form of visual efects. In To improve the diversity of character appearances in a video games, thousands of computer-articulated polygonal crowd, a common method is duplicating a character’s mesh characters with a variety of appearances can be generated many times and then assigning each duplication with a to inhabit in a virtual scene like a village, a city, or a forest. diferent texture and a varied animation. Some advanced Movements of the crowd are usually programmed through a methods allow developers to modify the shape proportion crowd simulator [1–4] with given goals. To achieve a realistic of duplications and then retarget rigs and animations to visual approximation of the crowd, each character is usually the modifed meshes [10, 11] or synthesize new motions tessellated with tessellation algorithms [5], which increases [12, 13]. With the support of hardware-accelerated geometry- the character’s mesh complexity to a sufcient level, so that instancing and pseudo-instancing techniques [9, 14–16], fne geometric details and smooth mesh deformations can multiple data of a character, including vertices, triangles, be preserved in the virtual scene. As a result, the virtual textures, skeletons, skinning weights, and animations, can scene may end up with a composition of millions of, or even be cached in the memory of a graphics processing unit hundreds of millions of, vertices and triangles. Rasterizing (GPU). At each time when the virtual scene needs to be such massive amount of vertices and triangles into pixels rendered, the renderer will alter and assemble those data is a high computational cost. Also, when storing them in dynamically without the need of fetching them from CPU memory, the required amount of memory may be beyond main memory. However, storing the duplications on the the storage capability of a graphic hardware. Tus, in the GPU consumes a large amount of memory and limits the production of video games [6–9], advanced crowd rendering number of instances that can be rendered. Furthermore, technologies are needed in order to increase the rendering even though the instancing technique reduces the CPU-GPU 2 International Journal of Computer Games Technology communication overhead, it may sufer the lack of dynamic status of a character at a given time frame. McKenzie et al. mesh adaption (e.g., continuous level-of-detail). [28] developed a crowd simulator to generate noncombatant In this work, we present a rendering system, which civilian behaviors which is interoperable with a simulation achieves a real-time rendering rate for a crowd composed of modern military operations. Zhou et al. [29] classifed the of tens of thousands of animated characters. Te system existing crowd modeling and simulation technologies based ensuresafullyutilizationofGPUmemoryandcomputational on the size and time scale of simulated crowds and evaluated power through the integration with continuous level-of- them based on their fexibility, extensibility, execution ef- detail (LOD) and View-Frustum Culling techniques. Te ciency, and scalability. Zhang et al. [30] presented a unifed size of memory allocated for each character is adjusted interaction framework on GPU to simulate the behavior of dynamically in response to the change of levels of detail, as a crowd at interactive frame rates in a fne-grained parallel the camera’s viewing parameters change. Te scene of the fashion. Malinowski et al. [31] were able to perform large scale crowd may end up with more than one hundred million simulations that resulted in tens of thousands of simulated triangles. Diferent from existing instancing techniques, our agents. approach is capable of rendering all diferent characters Visualizing a large number of simulated agents using through a single bufer object for each type of data. Te animated characters is a challenging computing task and system encapsulates multiple data of each unique source worth an in-depth study. Beacco et al. [9] surveyed previous characters into bufer objects and textures, which can then approaches for real-time crowd rendering. Tey reviewed be accessed quickly by shader programs on the GPU as and examined existing acceleration techniques and pointed well as maintained efciently by a general-purpose GPU out that LOD techniques have been used widely in order programming framework. to achieve high rendering performance, where a far-away Te rest of the paper is organized as follows. Section 2 character can be represented with a coarse version of the char- reviews the previous works about crowd simulation and acter as the alternative for rendering. A well-known approach crowd rendering. Section 3 gives an overview of our system’s is using discrete LOD representations, which are a set of rendering pipeline. In Section 4, we describe fundamentals ofine simplifed versions of a mesh. At the rendering stage, of continues LOD and animation techniques and discuss the renderer selects a desired version and renders it without their parallelization on the GPU. Section 5 describes how to any additional processing cost at runtime. However, Discrete process and store the source character’s multiple data and LODs require too much memory for storing all simplifed how to manage instances on the GPU. Section 6 presents our versions of the mesh. Also, as mentioned by Cleju and Saupe experimental results and compares our approach with the [32], discrete LODs could cause “popping” visual artifacts existing crowd rendering techniques. We conclude our work because of the unsmooth shape transition between simplifed in Section 7. versions. Dobbyn et al. [33] introduced a hybrid rendering approach that combines image-based and geometry-based rendering techniques. Tey evaluated the rendering quality 2. Related Work and performance in an urban crowd simulation. In their approach, the characters in the distance were rendered with Simulation and rendering are two primary computing com- image-based LOD representations, and they were switched ponents in a crowd application. Tey are ofen tightly to geometric representations when they were within a closer integrated as an entity to enable a special form of in situ distance. Although the visual quality seemed better than visualization, which in general means data is rendered and using discrete LOD representations, popping artifacts also displayed by a renderer in real time while a simulation is occurred when the renderer switches content between image running and generating new data [17–19]. One example is representations and geometric representations. Ulicny et al. the work presented by Hernandez et al. [20] that simulated a [34] presented an authoring tool to create crowd scenes of wandering crowd behavior and visualized it using animated thousands of characters. To provide users immediate visual 3D virtual characters on GPU clusters. Another example is feedback, they used low-poly meshes for source characters. the work presented by Perez et al. [21] that simulated and visu- Te meshes were kept in OpenGL’s display lists on GPU for alized crowds in a virtual city. In this section, we frst briefy fast rendering. review some previous work contributing to crowd simulation. Characters in a crowd are polygonal meshes. Te mesh Ten, more related to our work, we focus on acceleration is rigged by a skeleton. Rotations of the skeleton’s joints techniques contributing to crowd rendering, including level- transform surrounding vertices, and subsequently the mesh of-detail (LOD), visibility culling, and instancing
Recommended publications
  • Graphics Development for Mobile Systems Marco Agus, KAUST & CRS4
    Visual Computing Group Part 3 Graphics development for mobile systems Marco Agus, KAUST & CRS4 1 Mobile Graphics Course – Siggraph Asia 2017 Mobile Graphics Heterogeneity OS programming languages architecture 3D APIs IDEs 2 Mobile Graphics Course – Siggraph Asia 2017 Mobile Graphics Heterogeneity OS programming languages IDEs 3D APIs 3 Mobile Graphics Course – Siggraph Asia 2017 Mobile Graphics – Android • OS – iOS – Windows Phone – Firefox OS, Ubuntu Phone, Tizen… • Programming Languages – C++ – Obj-C / Swift – Java – C# / Silverlight – Html5/JS/CSS • Architectures – X86 (x86_64): Intel / AMD – ARM (32/64bit): ARM + (Qualcomm, Samsung, Apple, NVIDIA,…) – MIPS (32/64 bit): Ingenics, Imagination. • 3D APIs – OpenGL / GL ES – D3D / ANGLE – Metal / Mantle / Vulkan (GL Next) • Cross-development – Qt – Marmalade / Xamarin / – Muio – Monogame / Shiva3D / Unity / UDK4 / Cocos2d-x 4 Mobile Graphics Course – Siggraph Asia 2017 Operating Systems 5 Mobile Graphics Course – Siggraph Asia 2017 Operating Systems • Linux based (Qt…) – Ubuntu, Tizen, BBOS… • Web based (Cloud OS) – ChromeOS, FirefoxOS, WebOS • Windows Phone • iOS (~unix + COCOA) • Android (JAVA VM) 6 Mobile Graphics Course – Siggraph Asia 2017 Development trends • Hard to follow the trends – software does not follow hardware evolution – strong market oriented field where finance has strong impact on evolution • In general, for – Mobile phones • Market drive towards Android, iOS – Tablets • Android, iOS, Windows 10 – Embedded devices • Heterogenous (beyond the scopes of this course) • Here
    [Show full text]
  • NVIDIA's Opengl Functionality
    NVIDIANVIDIA ’’ss OpenGLOpenGL FunctionalityFunctionality Session 2127 | Room A5 | Monday, September, 20th, 16:00 - 17:20 San Jose Convention Center, San Jose, California Mark J. Kilgard • Principal System Software Engineer – OpenGL driver – Cg (“C for graphics”) shading language • OpenGL Utility Toolkit (GLUT) implementer • Author of OpenGL for the X Window System • Co-author of Cg Tutorial Outline • OpenGL’s importance to NVIDIA • OpenGL 3.3 and 4.0 • OpenGL 4.1 • Loose ends: deprecation, Cg, further extensions OpenGL Leverage Cg Parallel Nsight SceniX CompleX OptiX Example of Hybrid Rendering with OptiX OpenGL (Rasterization) OptiX (Ray tracing) Parallel Nsight Provides OpenGL Profiling Configure Application Trace Settings Parallel Nsight Provides OpenGL Profiling Magnified trace options shows specific OpenGL (and Cg) tracing options Parallel Nsight Provides OpenGL Profiling Parallel Nsight Provides OpenGL Profiling Trace of mix of OpenGL and CUDA shows glFinish & OpenGL draw calls OpenGL In Every NVIDIA Business OpenGL on Quadro – World class OpenGL 4 drivers – 18 years of uninterrupted API compatibility – Workstation application certifications – Workstation application profiles – Display list optimizations – Fast antialiased lines – Largest memory configurations: 6 gigabytes – GPU affinity – Enhanced interop with CUDA and multi-GPU OpenGL – Advanced multi-GPU rendering – Overlays – Genlock – Unified Back Buffer for less framebuffer memory usage – Cross-platform • Windows XP, Vista, Win7, Linux, Mac, FreeBSD, Solaris – SLI Mosaic –
    [Show full text]
  • Opengl® ES™ 3.0 Programming Guide, Second Edition
    Praise for OpenGL® ES™ 3.0 Programming Guide, Second Edition “As a graphics technologist and intense OpenGL ES developer, I can honestly say that if you buy only one book on OpenGL ES 3.0 programming, then this should be the book. Dan and Budirijanto have written a book clearly by programmers for programmers. It is simply required reading for anyone interested in OpenGL ES 3.0. It is informative, well organized, and comprehensive, but best of all practical. You will find yourself reaching for this book over and over again instead of the actual OpenGL ES specification during your programming sessions. I give it my highest recommendation.” —Rick Tewell, Graphics Technology Architect, Freescale “This book provides outstanding coverage of the latest version of OpenGL ES, with clear, comprehensive explanations and extensive examples. It belongs on the desk of anyone developing mobile applications.” —Dave Astle, Graphics Tools Lead, Qualcomm Technologies, Inc., and Founder, GameDev.net “The second edition of OpenGL® ES™ 3.0 Programming Guide provides a solid introduction to OpenGL ES 3.0 specifications, along with a wealth of practical information and examples to help any level of developer begin programming immediately. We’d recommend this guide as a primer on OpenGL ES 3.0 to any of the thousands of developers creating apps for the many mobile and embedded products using our PowerVR Rogue graphics.” —Kristof Beets, Business Development, Imagination Technologies “This is a solid OpenGL ES 3.0 reference book. It covers all aspects of the API and will help any developer get familiar with and understand the API, including specifically the new ES 3.0 functionality.” —Jed Fisher, Managing Partner, 4D Pipeline “This is a clear and thorough reference for OpenGL ES 3.0, and an excellent presentation of the concepts present in all modern OpenGL programming.
    [Show full text]
  • Mobile Graphics Siggraph Asia 2017 Course Marco Agus, KAUST & CRS4 Enrico Gobbetti, CRS4 Fabio Marton, CRS4 Giovanni Pintore, CRS4 Pere-Pau Vázquez, UPC
    Visual Computing Group Mobile Graphics Siggraph Asia 2017 course Marco Agus, KAUST & CRS4 Enrico Gobbetti, CRS4 Fabio Marton, CRS4 Giovanni Pintore, CRS4 Pere-Pau Vázquez, UPC November 2017 1 Mobile Graphics Course – Siggraph Asia 2017 WELCOME TO THIS HALF-DAY COURSE! 2 Mobile Graphics Course – Siggraph Asia 2017 Subject: Mobile Graphics • All you need to know to get an introduction to the field of mobile graphics: – Scope and definition of “mobile graphics” – Brief overview of current trends in terms of available hardware architectures and research apps built of top of them – Quick overview of development environments – Rendering, with focus on rendering massive/complex surface and volume models – Capture, with focus on data fusion techniques 3 Mobile Graphics Course – Siggraph Asia 2017 Speakers (in alphabetical order) • Marco Agus (1,2) – Research Engineer at KAUST (Saudi Arabia) (1) www.crs4.it/vic/ – Researcher at CRS4 (Italy) • Enrico Gobbetti (1) - Organizer – Director of Visual Computing at CRS4 (Italy) (2) https://vcc.kaust.edu.sa • Fabio Marton (1) Researcher at CRS4 – (3) http://www.virvig.eu/ • Giovanni Pintore (1) – Researcher at CRS4 • Pere-Pau Vázquez (3) – Professor at UPC, Spain 4 Mobile Graphics Course – Siggraph Asia 2017 Funding Center for Research, King Abdullah University Polytechnic University of Development, and Advanced of Science & Technology, Catalonia, Studies in Sardinia, Italy Saudi Arabia Spain Project TDM Spanish MINECO Ministry RAS - POR FESR 2014-2020 FEDER funds Projects VIGEC / VIDEOLAB Grant No. TIN2014-52211-C2-1-R 5 Mobile Graphics Course – Siggraph Asia 2017 Schedule 5’ 0. Introduction and outline Enrico 15’ 1. Evolution of Mobile Graphics Marco 20’ 2.1 Mobile Graphics Trends / Hardware Pere-Pau 15’ 2.2 Mobile Graphics Trends / Applications Marco 15’ 3.
    [Show full text]
  • Real-Time Voxel Rendering Algorithm Based on Screen Space Billboard Voxel Buffer with Sparse Lookup Textures
    ISSN 2464-4617 (print) WSCG 2016 - 24th Conference on Computer Graphics, Visualization and Computer Vision 2016 ISSN 2464-4625 (CD-ROM) Real-time voxel rendering algorithm based on Screen Space Billboard Voxel Buffer with Sparse Lookup Textures Szymon Jabłonski´ Tomasz Martyn Institute of Computer Science Institute of Computer Science Warsaw University of Technology Warsaw University of Technology ul. Nowowiejska 15/19 ul. Nowowiejska 15/19 00-665 Warsaw, Poland 00-665 Warsaw, Poland [email protected] [email protected] ABSTRACT In this paper, we present a novel approach to efficient real-time rendering of numerous high-resolution voxelized objects. We present a voxel rendering algorithm based on triangle rasterization pipeline with screen space rendering computational complexity. In order to limit the number of vertex shader invocations, voxel filtering algorithm with fixed size voxel data buffer was developed. Voxelized objects are represented by sparse voxel octree (SVO) structure. Using sparse texture available in modern graphics APIs, we create a 3D lookup table for voxel ids. Voxel filtering algorithm is based on 3D sparse texture ray marching approach. Screen Space Billboard Voxel Buffer is filled by voxels from visible voxels point cloud. Thanks to using 3D sparse textures, we are able to store high-resolution objects in VRAM memory. Moreover, sparse texture mipmaps can be used to control object level of detail (LOD). The geometry of a voxelized object is represented by a collection of points extracted from object SVO. Each point is defined by position, normal vector and texture coordinates. We also show how to take advantage of programmable geometry shaders in order to store voxel objects with extremely low memory requirements and to perform real-time visualization.
    [Show full text]
  • Efficient Tile-Based Deferred Shading Pipeline
    ©Copyright 2013 DigiPen Institute of Technology and DigiPen (USA) Corporation. All rights reserved. Efficient Tile-Based Deferred Shading Pipeline BY Denis Ishmukhametov Bachelor of Science, Computer Science Ufa State Aviation Technical University, June 2011 THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in the graduate studies program of DigiPen Institute Of Technology Redmond, Washington United States of America Spring 2013 Thesis Advisor: Dr. Gary Herron DIGIPEN INSTITUTE OF TECHNOLOGY GRADUATE STUDY PROGRAM DEFENSE OF THESIS THE UNDERSIGNED VERIFY THAT THE FINAL ORAL DEFENSE OF THE MASTER OF SCIENCE THESIS OF DENIS ISHMUKHAMETOV HAS BEEN SUCCESSFULLY COMPLETED ON TITLE OF THESIS: EFFICIENT TILE-BASED DEFERRED SHADING PIPELINE MAJOR FILED OF STUDY: COMPUTER SCIENCE. COMMITTEE: Gary Herron, Chair Xin Li Pushpak Karnick Antonie Boerkoel APPROVED : date date Graduate Program Director Associate Dean date date Department of Computer Science Dean The material presented within this document does not necessarily reflect the opinion of the Committee, the Graduate Study Program, or DigiPen Institute of Technology. INSTITUTE OF DIGIPEN INSTITUTE OF TECHNOLOGY PROGRAM OF MASTER’S DEGREE THESIS APPROVAL DATE: ______DATE____________ BASED ON THE CANDIDATE’S SUCCESSFUL ORAL DEFENSE, IT IS RECOMMENDED THAT THE THESIS PREPARED BY Denis Ishmukhametov ENTITLED Efficient Tile-Based Deferred Shading Pipeline BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF COMPUTER SCIENCE FROM THE PROGRAM OF MASTER’S DEGREE AT DIGIPEN INSTITUTE OF TECHNOLOGY. Thesis Advisory Committee Chair Director of Graduate Study Program Dean of Faculty The material presented within this document does not necessarily reflect the opinion of the Committee, the Graduate Study Program, or DigiPen Institute Of Technology.
    [Show full text]
  • Opengl 4.0 Review 23 March 2010, Christophe Riccio
    OpenGL 4.0 review 23 March 2010, Christophe Riccio Copyright © 2005–2011, G-Truc Creation Introduction OpenGL 4.0 is the new specification for GeForce GTX 400 and Radeon HD 5000 hardware series: aka Direct3D 11 hardware level. In this review we go through all the new features and explore their details. 1. Tessellation OpenGL 4.0 extend the rendering pipeline with three new stages that take place between the vertex shader and geometry shader: Control shader (Known as Hull shader in Direct3D 11) Primitive generator Evaluation shader (Known as Domain shader in Direct3D 11) In a way, the tessellation stages replace the vertex shader stage in the graphics pipeline. Most of the vertex shader tasks will be dispathed in the control shader and the evalution shader. So far, the vertex shader stage is still required but the control shader and the evaluation shader are both optional. Control shaders work on 'patches', a set of vertices. Their output per-vertex data and per- patch data used by the primitive generator and available as read only in the evaluation shader. Input per-vertex data are stored in an array called gl_in which maximum size is gl_MaxPatchVertices. The elements of gl_in contain the variables gl_Position, gl_PointSize, gl_ClipDistance and gl_ClipVertex. The per-patch variables are gl_PatchVerticesIn (number of vertices in the patch), gl_PrimitiveID (number of primitives of the draw call) and gl_InvocationID (Invocation number). The control shaders have a gl_out array of per-vertex data which members are gl_Position, gl_PointSize and gl_ClipDistance. They also output per-patch data with the variables gl_TessLevelOuter and gl_TesslevelInner to control the tessellation level.
    [Show full text]