A Survey and Performance Analysis of Software Platforms for Interactive Cluster-Based Multi-Screen Rendering

Ninth Eurographics Workshop on Virtual Environments (2003)
A. Kunz, J. Deisinger (Editors)

Oliver Staadt, Justin Walker, Christof Nuber, and Bernd Hamann
Center for Image Processing and Integrated Computing (CIPIC) and Department of Computer Science
University of California, Davis

© The Eurographics Association 2003

Abstract

We present a survey of different software architectures designed to render on a tiled display. We provide an in-depth analysis of three selected systems, including their implementation of data distribution, sort-first rendering, and overall usability. We use various test cases to analyze the performance of these three systems.

Categories and Subject Descriptors (according to ACM CCS): C.4 [Performance of Systems]: Performance Attributes, I.3.2 [Computer Graphics]: Distributed/Network Graphics, I.3.4 [Computer Graphics]: Graphics Packages, I.3.7 [Computer Graphics]: Virtual Reality.

1. Introduction

Traditionally, multi-screen display environments have been driven primarily by powerful graphics supercomputers, such as SGI’s Onyx systems. With features including shared-memory multi-processing and multiple synchronized graphics pipelines, they provided a stable and flexible development platform for high-performance virtual reality and visual simulation applications. Unfortunately, these features come at high cost. Hence, the use of multi-screen projection environments has been limited to a small number of users.

During the past several years, high-performance and feature-rich PC graphics interfaces have become available at low cost. This development enables us to build clusters of high-performance graphics PCs at reasonable cost. An important issue, however, is that the programming models for shared-memory systems and clusters differ significantly. In shared-memory graphics systems, the programmer does not have to worry about issues such as sharing data amongst different processors or distributing rendering information to different graphics engines. In cluster environments, it is necessary to deal with these issues explicitly. The use of clusters for computationally intensive simulations and applications has led to the development of interface standards such as the Message Passing Interface (http://www.mpi-forum.org) and OpenPBS (http://www.openpbs.org). We focus on rendering in a multi-display environment. There are two important application areas where multi-display environments are used [14]:

• displaying images at very high resolutions exceeding those of available monitors and/or graphics cards and
• providing a larger field-of-view and better immersion into the scenery.

A larger image can be obtained by using special-purpose video processors that split the incoming video signal and distribute it to the connected display systems. This approach, however, increases the area covered by a single pixel, which is not always desired. Increasing the resolution of a displayed image requires the combination of several display devices into a single display environment, providing a higher resolution by combining several images. In the past, high-performance computers, such as SGI’s InfiniteReality with multiple graphics pipes, have been used to drive multi-tiled displays. With the availability of affordable PC-based high-performance graphics cards like the NVidia GeForce or the 3DLabs Wildcat series, high-quality rendering is available at relatively low cost. Using a cluster-based approach requires the solution of problems like data management and distribution, output synchronization, and event handling. Solving these problems for an application can be very tedious, time-consuming, and error-prone, so the use of libraries providing the necessary support should be considered.

The design and development of platforms for cluster-based multi-screen rendering has become increasingly popular during the past few years [1,4,5,6,7,10,15,16], but a standard solution has yet to be found. Nevertheless, many developers are eager to port existing applications to cluster environments or to develop new ones. Although development of those platforms has only begun recently, various architectures have been proposed, some of which are available as open-source software [1,7,10,15,16].

We provide a survey of different systems designed to render on a tiled display and discuss potential implications for application development as well as advantages and disadvantages of these designs. We present detailed descriptions of three systems followed by a performance analysis. We defined a set of test cases and conducted a quantitative evaluation of the usefulness of these systems for different kinds of application scenarios. Our goal is to help developers with the selection of a software platform that is appropriate for their particular application requirements. We do not discuss hardware-related issues, which are also important for building commodity clusters for rendering. We refer the reader to [19] for an overview of different hardware architectures.

The remainder of this paper is structured as follows: After discussing different applications for cluster-based rendering environments in Section 2, we present the survey of different software platforms in Section 3. Three selected platforms are evaluated in detail in Section 4. Section 5 contains the results and interpretations of our performance analysis, followed by conclusions in Section 6.

2. Cluster-based Rendering

Cluster-based rendering in general can be described as the use of a set of computers connected via a network for rendering purposes, ranging from distributed non-photorealistic volume rendering, through ray tracing and radiosity-based rendering [17], to interactive rendering using application programming interfaces (APIs) like OpenGL [18] or DirectX [11]. We use the same terminology as the X Window System: a client runs the application, while the server renders on the local display.

Most of the recent research on cluster-based rendering focuses on different algorithms to distribute the rendering of polygonal geometry across the cluster. Molnar et al. [9] classified these algorithms into three general classes based on where the sorting of the primitives occurs in the transition from object to screen space. The three classes are

• sort-first,
• sort-middle, and
• sort-last.

In sort-first algorithms, the display is partitioned into discrete, disjoint tiles. Each rendering node of the cluster is then assigned one or more of these tiles and is responsible for the complete rendering of only those primitives that lie within one of its tiles. To accomplish this, primitives are usually pre-transformed to determine their screen-space extents and then sent only to those tiles they overlap. The required network bandwidth can be high when sending primitives to the appropriate render server, but exploiting the frame-to-frame coherence of the primitives can reduce the amount of network traffic significantly. Sort-first algorithms suffer from load imbalance caused by primitive clustering. Samanta et al. [13] investigated methods to improve load balancing by dynamically changing the tiling. Sort-first algorithms also do not scale well when the number of nodes in the cluster increases. Every primitive that lies on the border of two tiles must be rendered by both tiles. As the number of tiles increases, the number of these primitives increases. Samanta et al. [12] solved this problem by using a hybrid sort-first, sort-last approach.

Sort-middle algorithms begin by distributing each graphics primitive to exactly one processor† for geometry processing. After the primitive has been transformed into screen space, it is forwarded to another processor for rendering. Similar to the sort-first approach, the screen space is divided into tiles, but each processor is only responsible for rasterization of primitives within its tile. This approach requires a separation of rasterization engine and rendering pipeline, so that primitives can be redistributed. Currently this approach can only be implemented using specialized hardware, such as SGI’s InfiniteReality engine.

†The term processor is used in a more general sense, not restricted to CPUs or GPUs.

In sort-last approaches, each primitive is sent to exactly one node for rendering. After all primitives have been rendered, the nodes must composite the images to form the final image. This usually requires a large amount of bandwidth because each node must send the entire image to a compositor.

Tiled displays lead naturally toward a sort-first approach. The screen is already partitioned into tiles, with each tile being driven by a single cluster node. Other approaches require distributing the primitives to the rendering nodes and then distributing the final image to the nodes responsible for tile rendering. For these reasons the majority of software systems designed for rendering on a tiled display implement a sort-first algorithm.

3. Systems Survey

In our survey we analyzed systems designed to support rendering on a tiled display. All systems evaluated implement a sort-first method to some extent. They vary widely with respect to the way data is distributed among the cluster nodes. Chen et al. [2,3] first looked at the problem of data distribution. Two general models have emerged:

• client–server and
• master–slave.

In the client–server model a user interacts with a single instance of the application that runs on a client node. This client is responsible for generating the geometry and distributing it to the render servers (see Figure 1a). We can distinguish between two rendering modes: immediate mode and retained mode. In immediate mode the client sends the primitives over the network every frame. In retained mode each render server locally stores the primitives it has already received for re-use. The client then needs to send only changes to the geometry. This method is usually accomplished through the use of a scene graph.
