Retained Mode Parallel Rendering for Scalable Tiled Displays
Total Pages: 16
File Type: pdf, Size: 1020Kb
Recommended publications
Parallel Particle Rendering: a Performance Comparison Between Chromium and Aura
Parallel particle rendering: a performance comparison between Chromium and Aura
Tom van der Schaaf (1), Michal Koutek (1, 2) and Henri Bal (1)
(1) Faculty of Sciences - Vrije Universiteit, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; (2) Faculty of Information Technology and Systems - Delft University of Technology
Eurographics Symposium on Parallel Graphics and Visualization (2006), Alan Heirich, Bruno Raffin, and Luis Paulo dos Santos (Editors). Conference Paper · January 2006 · DOI: 10.2312/EGPGV/EGPGV06/137-144 · Source: DBLP · https://www.researchgate.net/publication/221357191

Abstract: In the fields of high performance computing and distributed rendering, there is a great need for a flexible and scalable architecture that supports coupling of parallel simulations to commodity visualization clusters. The most popular architecture that allows such flexibility, called Chromium, is a parallel implementation of OpenGL. It has sufficient performance on applications with static scenes, but in case of more dynamic content this approach often fails. We have developed Aura, a distributed scene graph library, which allows optimized performance for both static and more dynamic scenes. In this paper we compare the performance of Chromium and Aura.
Depth Buffer Test
Computer Graphics – OpenGL – Philipp Slusallek

History: Graphics Hardware
• Graphics in the '80s
  – Framebuffer was a designated memory in RAM
  – "HW": Set individual pixels directly via memory access
    • Peek & poke, getpixel & putpixel, …
    • MDA ('81: text only but 720x350 resolution, monochrome, 4 kB of RAM!)
      – Character code was index into bit pattern in ROM for each character
    • CGA ('81: 160x200: 16 colors w/ tricks; 320x200: 4 col; 640x200: 2 col)
    • EGA ('85: 640x350: 16 from 64 col, CGA mode)
    • VGA ('90: 640x480: 16 col @ table with 2^18 col, 320x200: 256 col), with BIOS extension
  – Everything done on CPU
    • Except for driving the display output

History: Graphics Hardware (II)
• Today (Nvidia Ampere Flagship GA 102, RTX 3090)
  – Discrete graphics card via high-speed link
    • e.g. PCIe-4.0 x16: up to 64 GB/s transfer rate
  – Autonomous, high performance GPU (more powerful than CPU)
    • 10,496 SIMD processors
    • Up to 24GB of local GDDR6X RAM (A6000: 48 GB, x2 via NVLink)
    • 936 GB/s memory bandwidth
    • 35.6 TFLOPS 16bit floats
    • 35.6 TFLOPS single precision (SP) + ? TFLOPS doubles (DP)
    • 35.6/142/284/568 TFLOPS in FP32/16/Int8/4 via 328 Tensor Cores (RTX)
    • 20? GigaRays/s, 84 RT Cores
    • Dedicated ray tracing HW unit (BVH traversal & tri. interpol & intersect)
    • Total of 28.3 Billion transistors at 350 Watt
  – Performs all low-level tasks & a lot of high-level tasks
    • Clipping, rasterization, hidden surface removal, … + Ray Tracing
    • Procedural geometry, shading, texturing, animation, simulation, …
    • Video rendering, de- and encoding, deinterlacing, …
    • Full programmability at several pipeline stages
    • Deep Learning & Matrix-Multiply (sparse x2): Training and Inference
[Figure: Nvidia GA102 GPU]

History: Graphics APIs
• Brief history of graphics APIs
  – Initially every company had its own 3D-graphics API
  – Many early standardization efforts
    • CORE, GKS/GKS-3D, PHIGS/PHIGS-PLUS, ..
A Load-Balancing Strategy for Sort-First Distributed Rendering
A load-balancing strategy for sort-first distributed rendering
Frederico Abraham, Waldemar Celes, Renato Cerqueira, João Luiz Campos
Tecgraf - Computer Science Department, PUC-Rio, Rua Marquês de São Vicente 225, 22450-900 Rio de Janeiro, RJ, Brasil

Abstract: In this paper, we present a multi-threaded sort-first distributed rendering system. In order to achieve load balance among the rendering nodes, we propose a new partitioning scheme based on the rendering time of the previous frame. The proposed load-balancing algorithm is very simple to implement and works well for both geometry- and rasterization-bound models. We also propose a strategy to assign tiles to rendering nodes that effectively uses the available graphics resources, thus improving rendering performance.

1. … such algorithms, when applied to complex scenes, still demand graphics power that a single processor may not be able to deliver. Examples of such scenarios include scenes described by a dense and highly tessellated geometry set, sophisticated per-pixel lighting algorithms and volume rendering. The purpose of our research is to investigate the use of PC-based clusters for improving the rendering performance of such complex scenes, delivering the frame rate usually required by virtual-reality applications. This goal can be achieved by combining the graphics power of a set of PCs equipped with graphics accelerators. Although PC clusters have also been used for supporting high resolution multi-display rendering systems [2, 12, 14, 13], in this paper we focus on the use of a PC cluster for improving rendering …
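The abstract above describes repartitioning the screen from the rendering time of the previous frame. The following is only a rough sketch of that general idea, not the authors' algorithm: the function name `rebalance_strips`, the vertical-strip layout, and the assumption that cost is spread uniformly inside each old strip are all hypothetical simplifications.

```python
def rebalance_strips(boundaries, frame_times):
    """Move strip boundaries so every node gets an equal share of estimated cost.

    boundaries  -- strictly increasing x positions, one more than the node count
                   (hypothetical vertical-strip layout)
    frame_times -- previous-frame render time measured for each strip
    Assumption: cost is uniformly distributed inside each old strip.
    """
    nodes = len(frame_times)
    total = sum(frame_times)
    if total <= 0:
        return list(boundaries)  # nothing measured yet, keep the old layout
    target = total / nodes       # cost each node should receive
    new_bounds = [boundaries[0]]
    strip, pos = 0, boundaries[0]  # old strip being consumed, current x position
    for _ in range(nodes - 1):
        need = target
        while strip < nodes:
            old_width = boundaries[strip + 1] - boundaries[strip]
            density = frame_times[strip] / old_width        # cost per unit width
            avail = density * (boundaries[strip + 1] - pos)  # cost left in strip
            if density > 0 and avail >= need:
                pos += need / density  # cut inside this old strip
                break
            need -= avail              # take the rest of this strip, move on
            strip += 1
            pos = boundaries[strip]
        new_bounds.append(pos)
    new_bounds.append(boundaries[-1])
    return new_bounds
```

For example, with two nodes, boundaries `[0, 50, 100]` and measured times `[30, 10]`, the cut moves left of the midpoint so that the slower left strip shrinks. A real system would also have to account for the cost of moving work between nodes, which this sketch ignores.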
Introduction to Computer Graphics with WebGL
Introduction to Computer Graphics with WebGL
Ed Angel, Professor Emeritus of Computer Science, Founding Director, Arts, Research, Technology and Science Laboratory, University of New Mexico
Angel and Shreiner: Interactive Computer Graphics 7E © Addison-Wesley 2015

Models and Architectures

Objectives
• Learn the basic design of a graphics system
• Introduce pipeline architecture
• Examine software components for an interactive graphics system

Image Formation Revisited
• Can we mimic the synthetic camera model to design graphics hardware and software?
• Application Programmer Interface (API)
  - Need only specify
    • Objects
    • Materials
    • Viewer
    • Lights
• But how is the API implemented?

Physical Approaches
• Ray tracing: follow rays of light from center of projection until they either are absorbed by objects or go off to infinity
  - Can handle global effects
    • Multiple reflections
    • Translucent objects
  - Slow
  - Must have whole database available at all times
• Radiosity: Energy based approach
  - Very slow

Practical Approach
• Process objects one at a time in the order they are generated by the application
  - Can consider only local lighting
• Pipeline architecture
  [Figure: application program → display]
• All steps can be implemented in hardware on the graphics …
Cross-Segment Load Balancing in Parallel Rendering
EUROGRAPHICS Symposium on Parallel Graphics and Visualization (2011), pp. 1–10, N.N. and N.N. (Editors)
Cross-Segment Load Balancing in Parallel Rendering
Fatih Erol (1), Stefan Eilemann (1, 2) and Renato Pajarola (1)
(1) Visualization and MultiMedia Lab, Department of Informatics, University of Zurich, Switzerland; (2) Eyescale Software GmbH, Switzerland

Abstract: With faster graphics hardware comes the possibility to realize even more complicated applications that require more detailed data and provide better presentation. The processors keep being challenged with bigger amounts of data and higher resolution outputs, requiring more research in the parallel/distributed rendering domain. Optimizing resource usage to improve throughput is one important topic, which we address in this article for multi-display applications, using the Equalizer parallel rendering framework. This paper introduces and analyzes cross-segment load balancing, which efficiently assigns all available shared graphics resources to all display output segments with dynamical task partitioning to improve performance in parallel rendering.

Categories and Subject Descriptors (according to ACM CCS): I.3.2 [Computer Graphics]: Graphics Systems - Distributed/network graphics; I.3.m [Computer Graphics]: Miscellaneous - Parallel Rendering
Keywords: Dynamic load balancing, multi-display systems

1. Introduction
As CPU and GPU processing power improves steadily, so, and even more increasingly, does the amount of data to be processed and displayed interactively, which necessitates new methods to improve performance of interactive massive …
… of massive pixel amounts to the final display destinations at interactive frame rates. The graphics pipes (GPUs) driving the display array, however, are typically faced with highly uneven workloads as the vertex and fragment processing cost is often very concentrated in a few specific areas of the data and on the display screen.
Programming with OpenGL Part 1: Background
Programming with OpenGL Part 1: Background
CS 537 Interactive Computer Graphics, Prof. David E. Breen, Department of Computer Science
E. Angel and D. Shreiner: Interactive Computer Graphics 6E © Addison-Wesley 2012

Objectives
• Development of the OpenGL API
• OpenGL Architecture
  - OpenGL as a state machine
  - OpenGL as a data flow machine
• Functions
  - Types
  - Formats
• Simple program

Retained vs. Immediate Mode Graphics
• Immediate
  - Geometry is drawn when CPU sends it to GPU
  - All data needs to be resent even if nothing changes
  - Once drawn, geometry on GPU is discarded
  - Requires major bandwidth between CPU and GPU
  - Minimizes memory requirements on GPU
• Retained
  - Geometry is sent to GPU and stored
  - It is displayed when directed by CPU
  - CPU may send transformations to move geometry
  - Minimizes data transfers, but GPU now needs enough memory to store geometry

Early History of APIs
• IFIPS (1973) formed two committees to come up with a standard graphics API
  - Graphical Kernel System (GKS)
    • 2D but contained good workstation model
  - Core
    • Both 2D and 3D
  - GKS adopted as ISO and later ANSI standard (1980s)
• GKS not easily extended to 3D (GKS-3D)
  - Far behind hardware development

PHIGS and X
• Programmers Hierarchical Graphics System (PHIGS)
  - Arose from CAD community
  - Database model with retained graphics (structures)
• X Window System
  - DEC/MIT effort
  - Client-server architecture with graphics
• PEX combined the two
  - Not easy to use (all the defects of each)
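The retained vs. immediate mode trade-off listed in the slides above can be made concrete with a toy traffic model. This is a minimal sketch under stated assumptions, not real OpenGL: `ToyGPU`, `bus_traffic`, the 12-byte vertex (three 4-byte floats) and the 64-byte transform (a 4x4 float matrix) are all hypothetical names and sizes chosen for illustration.

```python
VERTEX_BYTES = 12     # assumed: 3 floats of 4 bytes each per vertex
TRANSFORM_BYTES = 64  # assumed: one 4x4 matrix of 4-byte floats


class ToyGPU:
    """Toy stand-in for a GPU that only counts bytes received from the CPU."""

    def __init__(self):
        self.bytes_received = 0
        self.stored = {}  # retained geometry, keyed by a handle

    def draw_immediate(self, vertices):
        # Immediate mode: geometry crosses the bus, is drawn, then discarded.
        self.bytes_received += len(vertices) * VERTEX_BYTES

    def upload(self, handle, vertices):
        # Retained mode: geometry crosses the bus once and stays in GPU memory.
        self.bytes_received += len(vertices) * VERTEX_BYTES
        self.stored[handle] = list(vertices)

    def draw_retained(self, handle):
        # Per frame, only a transform is sent; the stored geometry is reused.
        self.bytes_received += TRANSFORM_BYTES
        return len(self.stored[handle])


def bus_traffic(mode, vertex_count, frames):
    """Total CPU-to-GPU bytes for drawing one static mesh over several frames."""
    gpu = ToyGPU()
    mesh = [(0.0, 0.0, 0.0)] * vertex_count
    if mode == "retained":
        gpu.upload("mesh", mesh)
        for _ in range(frames):
            gpu.draw_retained("mesh")
    else:
        for _ in range(frames):
            gpu.draw_immediate(mesh)
    return gpu.bytes_received
```

With these assumed sizes, a 1000-vertex mesh drawn for 100 frames costs 1,200,000 bytes in immediate mode and 18,400 bytes in retained mode, at the price of keeping the mesh resident in the toy GPU's `stored` dictionary.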
OpenGL Pipeline Architecture • Understand Immediate Mode Graphics vs Retained Mode Graphics • OpenGL Primitives
Lecture 3: Pipeline Architecture
CITS 3003 Graphics & Animation
Slides: E. Angel and D. Shreiner: Interactive Computer Graphics 6E © Addison-Wesley 2012

Breakdown of Lectures
1. Introduction & Image Formation
2. Programming with OpenGL
3. OpenGL: Pipeline Architecture
4. OpenGL: An Example Program
5. Vertex and Fragment Shaders 1
6. Vertex and Fragment Shaders 2
7. Representation and Coordinate Systems
8. Coordinate Frame Transformations
9. Transformations and Homogeneous Coordinates
10. Input, Interaction and Callbacks
11. More on Callbacks
12. Mid-semester Test
Study break
13. 3D Hidden Surface Removal
14. Mid-term test solution and project discussion
15. Computer Viewing
16. Shading
17. Shading Models
18. Shading in OpenGL
19. Texture Mapping
20. Texture Mapping in OpenGL
21. Hierarchical Modelling
22. 3D Modelling: Subdivision Surfaces
23. Animation Fundamentals and Quaternions
24. Skinning

Content
• Expanding on primitives
• Vertex attributes
• OpenGL pipeline architecture
• Understand immediate mode graphics vs retained mode graphics

OpenGL Primitives
Recall from a previous lecture…
GL_POINTS, GL_LINES, GL_LINE_STRIP, GL_LINE_LOOP, GL_TRIANGLES (important), GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN

Polygon Issues
• Graphics systems like triangles because triangles are:
  o Simple: edges cannot cross
  o Convex: All points on a line segment between two points in a polygon are also in that polygon
  o Flat: all vertices are in the same plane
[Figure: nonsimple polygon; nonconvex polygon]

Polygon Issues (cont.)
• If other polygons are used, they are tessellated
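The convexity property and the tessellation step mentioned in the slides above can be sketched for the easy case. The code below is an illustration only: `is_convex` and `fan_triangulate` are hypothetical helper names, the polygon is assumed to be 2D and simple (no crossing edges), and only convex input is handled; non-convex polygons need a more general method such as ear clipping.

```python
def is_convex(polygon):
    """True if every corner of a simple 2D polygon turns the same way.

    polygon -- list of (x, y) vertices in order. Simplicity is assumed and
    not checked; a self-crossing star shape could pass this test.
    """
    n = len(polygon)
    sign = 0
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        cx, cy = polygon[(i + 2) % n]
        # z component of the cross product of edge a->b with edge b->c
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            if sign != 0 and (cross > 0) != (sign > 0):
                return False  # turn direction flipped: not convex
            sign = cross
    return True


def fan_triangulate(polygon):
    """Split a convex polygon into triangles sharing the first vertex."""
    if len(polygon) < 3:
        raise ValueError("need at least three vertices")
    if not is_convex(polygon):
        raise ValueError("fan triangulation is only valid for convex polygons")
    return [(polygon[0], polygon[i], polygon[i + 1])
            for i in range(1, len(polygon) - 1)]
```

A polygon with n vertices yields n - 2 triangles, which is the same vertex ordering that a triangle-fan primitive consumes.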
Chromium Renderserver: Scalable and Open Remote Rendering Infrastructure
Chromium Renderserver: Scalable and Open Remote Rendering Infrastructure
Brian Paul, Member, IEEE, Sean Ahern, Member, IEEE, E. Wes Bethel, Member, IEEE, Eric Brugger, Rich Cook, Jamison Daniel, Ken Lewis, Jens Owen, and Dale Southard

Abstract: Chromium Renderserver (CRRS) is software infrastructure that provides the ability for one or more users to run and view image output from unmodified, interactive OpenGL and X11 applications on a remote, parallel computational platform equipped with graphics hardware accelerators via industry-standard Layer 7 network protocols and client viewers. The new contributions of this work include a solution to the problem of synchronizing X11 and OpenGL command streams, remote delivery of parallel hardware-accelerated rendering, and a performance analysis of several different optimizations that are generally applicable to a variety of rendering architectures. CRRS is fully operational, Open Source software.

Index Terms: remote visualization, remote rendering, parallel rendering, virtual network computer, collaborative visualization, distance visualization

Fig. 1. An unmodified molecular docking application run in parallel on a distributed memory system using CRRS. Here, the cluster is configured for a 3x2 tiled display setup. The monitor for the "remote user machine" in this image is the one on the right. While this example has "remote user" and "central facility" connected via LAN, the typical use model is where the remote user is connected to the central facility via a low-bandwidth, high-latency link. Here, we see the complete 4800x2400 full-resolution image from the application …

I. INTRODUCTION
The Chromium Renderserver (CRRS) is software infrastructure that provides access to the virtual desktop on a remote computing system.
Rapidmind.Pdf
Rapidmind: Background on GPGPU

GPU Tutorial
• Goal: 3-D image -> 2-D image
• 2 main stages:
  – Convert 3-D coordinates to 2-D windows
    • Vertex processing
  – Fill in 2-D windows
    • Fragment processing
[Figure: GPU hardware pipeline. Application (CPU) → Vertices (3D) → Transform & Light (vertex processor) → Xformed, lit vertices (2D) → Assemble primitives → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Shade (fragment processor) → Final pixels (color, depth); also shown: graphics state, video memory (textures), render-to-texture]

GPU Parallelism
• Parallelism @ vertex and fragment calculations
[Figure: vertex processors (vp) → rasterizer → fragment processors (fp) → frame buffer]

GPU Programmability
• Vertex and fragment processors can be programmed
• Shaders = programs written for vertex and fragment calculations
• Vertex shaders = transformation, lighting
• Fragment shaders = texture, color, fog

GPU SIMD
• Vertex processors all run SAME shader program
• Fragment processors all run SAME shader program
[Figure: vertex processors (vp) → rasterizer → fragment processors (fp) → frame buffer]

GPU Drawbacks
• No integer data operands
• No integer operations
  – e.g. bit shift, AND, OR, XOR, NOT
• No double precision arithmetic
• Unusual programming model

GPU Improvement
• NVIDIA GeForce G80 – unified pipeline and shader
• CUDA – Compute Unified Device Architecture
• Unified …
Master Thesis
Charles University in Prague, Faculty of Mathematics and Physics
MASTER THESIS
Petr Koupý: Graphics Stack for HelenOS
Department of Distributed and Dependable Systems
Supervisor of the master thesis: Mgr. Martin Děcký
Study programme: Informatics
Specialization: Software Systems
Prague 2013

I would like to thank my supervisor, Martin Děcký, not only for giving me an idea on how to approach this thesis but also for his suggestions, numerous pieces of advice and significant help with code integration. Next, I would like to express my gratitude to all members of HelenOS developer community for their swift feedback and for making HelenOS such a good platform for works like this. Finally, I am very grateful to my parents and close family members for supporting me during my studies.

I declare that I carried out this master thesis independently, and only with the cited sources, literature and other professional sources. I understand that my work relates to the rights and obligations under the Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that the Charles University in Prague has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 paragraph 1 of the Copyright Act.
In Prague, March 27, 2013, Petr Koupý

Title: Graphics Stack for HelenOS
Author: Petr Koupý
Department: Department of Distributed and Dependable Systems
Supervisor of the master thesis: Mgr. Martin Děcký
Abstract: HelenOS is an experimental operating system based on a microkernel and multiserver architecture. Before this work began, HelenOS already contained a sizable set of subsystems of modern design that handle various tasks within the system.
OpenGL Programming
Computer Graphics - OpenGL - Hendrik Lensch
Computer Graphics WS07/08 – Rendering with Rasterization

Overview
• Last lecture:
  – Rasterization
  – Clipping
• Today:
  – OpenGL

Ray Tracing vs. Rasterization
• Ray tracing
  – For every pixel
    • Locate first object visible in a certain direction
  – Requires spatial index structure to be fast
• Rasterization
  – For every object
    • Locate all covered pixels
  – Uses 2D image coherence but not necessarily an index structure

History
• Graphics in the '80s
  – Designated memory in RAM
  – Set individual pixels directly via memory access
    • peek & poke, getpixel & putpixel, …
  – Everything done on CPU, except for driving the display
  – Dumb "frame buffer"
• Today
  – Separate graphics card connected via high-speed link (e.g. PCIe)
    • Autonomous, high performance GPU (much more powerful than CPU)
    • Up to 128 SIMD processors, >>80 GB/s memory access
    • Up to 1GB of local RAM plus virtual memory
  – Performs all low-level tasks & a lot of high-level tasks
    • Clipping, rasterization, hidden surface removal, …
    • Procedural shading, texturing, animation, simulation, …
    • Video rendering, de- and encoding, deinterlacing, ...
    • Full programmability at several pipeline stages

Introduction to OpenGL
• Brief history of graphics APIs
  – Initially every company had its own 3D-graphics API
  – Many early standardization efforts
    • CORE, GKS/GKS-3D, PHIGS/PHIGS-PLUS, ..
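The "Ray Tracing vs. Rasterization" slide above contrasts two loop orders: for every pixel versus for every object. The sketch below shows only that difference on a deliberately tiny scene and is not a real renderer: the scene of axis-aligned rectangles, the dictionary keys (`x0`, `x1`, `y0`, `y1`, `depth`, `name`) and both function names are assumptions made for illustration. The object-order version uses a depth buffer test to resolve visibility.

```python
def render_image_order(rects, width, height):
    """Ray-tracing loop order: for every pixel, find the nearest covering object."""
    image = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            nearest = None
            for r in rects:  # linear scan; a spatial index would speed this up
                if r["x0"] <= x < r["x1"] and r["y0"] <= y < r["y1"]:
                    if nearest is None or r["depth"] < nearest["depth"]:
                        nearest = r
            image[y][x] = nearest["name"] if nearest else None
    return image


def render_object_order(rects, width, height):
    """Rasterization loop order: for every object, fill the pixels it covers."""
    image = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]  # depth buffer
    for r in rects:
        for y in range(max(r["y0"], 0), min(r["y1"], height)):
            for x in range(max(r["x0"], 0), min(r["x1"], width)):
                if r["depth"] < depth[y][x]:  # depth test: keep the nearer one
                    depth[y][x] = r["depth"]
                    image[y][x] = r["name"]
    return image
```

Both functions produce the same image for the same scene; what differs is which loop is outermost and therefore which data structure (spatial index or depth buffer) each approach needs.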
Chapter 7: Shading Through Multi-Pass Rendering
Chapter 7: Shading Through Multi-Pass Rendering
Marc Olano

Interactive Multi-Pass Programmable Shading
Mark S. Peercy, Marc Olano, John Airey, P. Jeffrey Ungar, SGI

Abstract: Programmable shading is a common technique for production animation, but interactive programmable shading is not yet widely available. We support interactive programmable shading on virtually any 3D graphics hardware using a scene graph library on top of OpenGL. We treat the OpenGL architecture as a general SIMD computer, and translate the high-level shading description into OpenGL rendering passes. While our system uses OpenGL, the techniques described are applicable to any retained mode interface with appropriate extension mechanisms and hardware API with provisions for recirculating data through the graphics pipeline. We present two demonstrations of the method. The first is a constrained shading language that runs on graphics hardware supporting OpenGL 1.2 with a subset of the ARB imaging extensions …

… constructs familiar from general purpose programming languages such as C, including loops, conditionals, and functions. The most common is the RenderMan Shading Language [32]. The power of shading languages for describing intricate lighting and shading computations has been widely recognized since Cook's seminal shade tree research [7]. Programmable shading has played a fundamental role in digital content creation for motion pictures and television for over a decade. The high level of abstraction in programmable shading enables artists, storytellers, and their technical collaborators to translate their creative visions into images more easily. Shading languages are also used for visualization of scientific data. Special data shaders have been developed to support the depiction of volume data [3, 8], and a texture synthesis language has been used for visualizing data fields on surfaces [9].