Project Plan for May08-38 - Interactive Ray Tracer on the PlayStation 3

Brendan Campbell, Daniel Risse, Aaron Westphal, Sean Godinez

November 25, 2007

Figure 1: IBM's iRT produces beautiful results

Abstract

The PlayStation 3 uses a new processor called the Cell Broadband Engine, developed by IBM. This report discusses a software application for the PlayStation 3 that uses the performance-enhancing features of the Cell Broadband Engine for the purposes of interactive ray tracing.

Terms:

CBE: Cell Broadband Engine. A processor developed by IBM that is the CPU of Sony's PlayStation 3 gaming console. It is also referred to as the Cell Processor.

SIMD: Single Instruction, Multiple Data. A SIMD instruction tells a processor to apply the same operation to several data items at once. This is also sometimes known as a vector operation.

PPE: Power Processing Element. This is the main core of the Cell Processor.

SPE: Synergistic Processing Element. This is a co-processor that specializes in SIMD instructions and is a supplemental part of the Cell Processor.

IBM's iRT: A team at IBM produced an interactive ray tracer to showcase the capabilities of the Cell Broadband Engine. Their ray tracer is known as iRT.

Contents

1 Design and Requirements Specification
  1.1 Problem Statement
  1.2 Solution Concept
  1.3 System Description
  1.4 Operating Environment
  1.5 User Interface Description
  1.6 Market and Literature Survey
    1.6.1 Desirable Features for an Interactive Ray Tracer
      1.6.1.1 High Frame Rate
      1.6.1.2 Low-latency from user input
      1.6.1.3 Color
      1.6.1.4 Diffuse Shading
      1.6.1.5 Specular Shading
      1.6.1.6 Shadows
      1.6.1.7 Reflections
      1.6.1.8 Refraction
      1.6.1.9 Anti-aliasing
      1.6.1.10 Textures
      1.6.1.11 Bump-mapping
      1.6.1.12 Ambient Occlusion
      1.6.1.13 Global Illumination
    1.6.2 Means to Achieve Features
      1.6.2.1 Hierarchies
      1.6.2.2 Grouping of Similar Tasks
      1.6.2.3 Iterative vs. Recursive Design
    1.6.3 Impact for This Project
    1.6.4 Originality
  1.7 Requirements
    1.7.1 Functional Requirements
    1.7.2 Non-Functional Requirements
  1.8 Deliverables

2 Project Plan
  2.1 Work Breakdown Structure
  2.2 Budget
  2.3 Project Schedule

Chapter 1

Design and Requirements Specification

1.1 Problem Statement

Our client, Dr. Joseph Zambreno, is interested in achieving a real-time Ray Tracing graphics application on a PlayStation 3 (PS3). Traditionally, Ray Tracing is a very computationally intensive process, often requiring highly parallelized supercomputers or clusters for real-time interactivity. The PlayStation 3 uses a parallel processing architecture of IBM's design known as the Cell Broadband Engine. Implementing interactive ray tracing on the Cell architecture is an area of only limited research so far, and it is believed that there is room for improvement in

• the detail of renders and the number of features implemented,

• the speed of interaction and frame-rate of image generation, and

• the scalability of such applications through clustering.

There is also room for taking greater advantage of the parallelism inherent in the Cell's multiple SPE (Synergistic Processing Element) design and its SIMD (Single Instruction, Multiple Data) vector handling.

Our client's goal for us is to implement a well-featured interactive Ray Tracer for Linux on the PS3. This includes producing a working end program, efficient and readable source code for future development, and complete documentation. The client would also like the application to scale to multiple PS3s, and the work may be published in an academic journal. We will approach the problem by taking the following steps:

1 Research thoroughly, studying the previous implementations and the Cell Broadband Engine Architecture.

2 Implement a prototype that uses modular code for a Ray Tracer.

3 Iteratively add features.

4 Test, optimize, and document.

1.2 Solution Concept

Ray Tracing is a set of algorithms that simulate the interactions of light rays with objects within a given, possibly dynamic, scene. It is based on the inverse of how actual light rays behave: instead of tracing all rays from luminous sources, it traces a ray through each pixel of a camera window to the objects visible in the scene, and from each of those backwards, almost recursively (but implemented more efficiently), until some termination point or recursion-depth limit is reached. Typical Ray Tracing applications simulate many physical phenomena, including reflection, refraction, shading, shadows, textured and/or bump-mapped surfaces, and others. Scenes to render can be specified using natural shapes such as spheres, and not just triangles as in the common but faster rasterization algorithms. This can result in far more realistic-looking models than rasterization; however, the time to compute the render is typically orders of magnitude greater for Ray Tracing. Nonetheless, with parallel architectures, fast processors, and efficient, optimized code, real-time renders with low-latency user input to move the camera or objects in the scene are close at hand.
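The backward-tracing idea described above can be sketched in a few lines. This is only an illustrative sketch, not the project's implementation: the function names, the square pinhole camera at the origin looking down the negative z axis, and the single-sphere scene are all assumptions made for the example.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance along a unit-length ray, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))   # half of the linear coefficient
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c                                # discriminant of the quadratic in t
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):                # try the nearer root first
        if t > 1e-6:
            return t
    return None

def primary_ray(px, py, width, height):
    """Unit direction from a camera at the origin through the center of pixel (px, py).

    Assumes a square image plane at z = -1 spanning [-1, 1] in x and y.
    """
    x = (px + 0.5) / width * 2.0 - 1.0
    y = 1.0 - (py + 0.5) / height * 2.0
    length = math.sqrt(x * x + y * y + 1.0)
    return (x / length, y / length, -1.0 / length)

def render_mask(width, height, center, radius):
    """Trace one primary ray per pixel; '#' marks pixels whose ray hits the sphere."""
    rows = []
    for py in range(height):
        row = ""
        for px in range(width):
            direction = primary_ray(px, py, width, height)
            hit = intersect_sphere((0.0, 0.0, 0.0), direction, center, radius)
            row += "#" if hit is not None else "."
        rows.append(row)
    return rows
```

Calling `render_mask(5, 5, (0.0, 0.0, -3.0), 1.0)` yields a small text image with the sphere in the middle. A full tracer would go on to spawn shadow, reflection, and refraction rays from each hit point.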

1.3 System Description

The purpose of the program is to load a model file into the PPE and divide its ray tracing work among the SPEs so that they run side by side. The PPE and SPEs use real-time interactive ray tracing algorithms to generate output, which is sent to the frame buffer for viewing. When a new model file is loaded or the camera angle is changed, the algorithm must be run again with a new origin for the tracing rays. The generated image uses shadows, refraction, reflection, and illumination to simulate life-like scenes.

Figure 1.1: System Block Diagram. Shows the flow of actions and data as the software runs.

[Diagram components: Camera Input Device; Scene File and File Load; PS3 PPE; six SPEs, each running a Raytrace task; Memory Bus; Memory; Output.]

The program loads a model file into the PPE, and the rendering work is then divided into tasks to be traced by the individual SPEs. The PPE and each SPE have access to a shared memory bus, as well as the output buffer, to which each SPE sends its results. The input controller sends a new origin for the rays to the PPE, which then updates each SPE as necessary.
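One simple way to divide a frame into per-SPE tasks is to hand each worker a contiguous band of scanlines. The sketch below is an assumption about how such a split could look, not the design the project settled on: Python threads stand in for the SPEs, `trace_band` is a hypothetical placeholder for the real per-SPE kernel, and a Python list stands in for the output buffer.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 6  # stands in for the SPEs available to Linux applications

def split_rows(height, workers):
    """Split rows 0..height-1 into contiguous bands whose sizes differ by at most one."""
    base, extra = divmod(height, workers)
    bands = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        bands.append(range(start, start + size))
        start += size
    return bands

def trace_band(rows, width, camera_origin):
    """Hypothetical placeholder for per-SPE work: returns (row, pixels) pairs for one band.

    A real kernel would trace rays from camera_origin; here a dummy gradient is produced.
    """
    return [(y, [(y + x) % 256 for x in range(width)]) for y in rows]

def render_frame(width, height, camera_origin):
    """PPE-side loop: dispatch one band per worker, then gather results into the frame buffer."""
    framebuffer = [None] * height
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        futures = [pool.submit(trace_band, band, width, camera_origin)
                   for band in split_rows(height, NUM_WORKERS)]
        for future in futures:
            for y, pixels in future.result():
                framebuffer[y] = pixels
    return framebuffer
```

When the input controller reports a new camera origin, the PPE would simply call `render_frame` again with the updated value. Static bands are the simplest split; smaller tiles handed out dynamically would balance load better when some regions of the image are more expensive than others.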

1.4 Operating Environment

The operating environment for our application will be a PS3 console connected to some type of display and having a USB keyboard input. There will be a distribution of Linux already installed and running on the PS3, and this will be the software environment in which our application will run. Hardware availability is constrained within the Linux environment on the PS3; for instance, many dedicated graphics capabilities of the hardware are crippled when not using the PS3's native OS. However, Ray Tracing is not well suited to this kind of processing hardware and will instead run on the Cell Processor's Power Processing Element (PPE) and the 6 available SPEs. The application will be written to take advantage of these units by running sequences in parallel, in addition to using SIMD instructions to parallelize vector and array operations.

1.5 User Interface Description

The user interface for our application will be very simple. The display will consist entirely of the scene to be rendered, sent through the PS3's video outputs to a compatible display. Interactivity will be achieved through a USB keyboard, for which Linux on the PS3 already has support. Key controls will enable such actions as moving the camera window around, and potentially manipulating the scene being rendered. The interface will not allow editing of the original scene data. Scene files will have to be created and modified with existing design tools.

1.6 Market and Literature Survey

1.6.1 Desirable Features for an Interactive Ray Tracer

1.6.1.1 High Frame Rate

Interactive Ray Tracers, by definition, have to be able to display an ever-changing 3D scene at a frame-rate that is fast enough to seem smooth (20+ fps). The greatest obstacle to achieving such frame-rates is the amount of computation required to render a single frame. At a normal resolution (1280x720, also known as 720p)[2], nearly one million primary rays (921,600) are generated initially, and each of these rays has to perform several intersection tests. Furthermore, each ray must spawn more rays in order to calculate shadows and reflections. The number of calculations is therefore on the order of several million per frame.
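A back-of-the-envelope estimate makes the workload concrete. The model below is an assumption chosen for illustration: every ray hits something, each hit spawns one shadow ray per light, and each pixel follows a single chain of reflection bounces. The light count and bounce depth are hypothetical parameters, not project figures.

```python
WIDTH, HEIGHT = 1280, 720      # 720p
TARGET_FPS = 20                # minimum frame rate named in the text

def rays_per_frame(primary, lights, bounces):
    """Estimate rays per frame under the simplifying assumptions stated above.

    Each pixel has (1 + bounces) hit points; each hit point costs the ray that
    reached it plus one shadow ray per light.
    """
    return primary * (1 + bounces) * (1 + lights)

primary_rays = WIDTH * HEIGHT                                  # 921,600
per_frame = rays_per_frame(primary_rays, lights=2, bounces=1)  # 5,529,600
per_second = per_frame * TARGET_FPS                            # 110,592,000
```

Even this modest configuration (two lights, one reflection bounce) needs over one hundred million rays per second at 20 fps, before counting the several intersection tests each ray performs.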

1.6.1.2 Low-latency from user input

An Interactive Ray Tracer must be able to provide near-immediate response to user input. Otherwise, both the utility of the input and the pace at which a user manipulates the scene decrease substantially. The sort of changes based on user input can include:

• Changing the geometries (shape/distances of objects in the scene)

• Changing lights

• Moving the camera

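Of these changes, changing the geometry is generally the most difficult to do quickly. This is due to the techniques that are traditionally used to speed up ray tracing: spatial subdivision hierarchies (Section 1.6.2.1) must be rebuilt or updated whenever the geometry changes.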

1.6.1.3 Color

Full-color renders (24-bit, usually) are generally expected from Ray Tracers. Interactive Ray Tracers are no exception.

1.6.1.4 Diffuse Shading

Diffuse shading is calculated based on the angle at which a light source hits a surface. The more directly the surface is hit, the more intensely the surface is lit.
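A common way to express this is Lambert's cosine law, where the intensity is the cosine of the angle between the surface normal and the direction to the light. The sketch below assumes that model; the function and parameter names are illustrative only.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def diffuse_intensity(normal, to_light):
    """Lambertian term: cosine of the angle between normal and light direction, clamped at 0."""
    return max(0.0, dot(normalize(normal), normalize(to_light)))
```

A light directly overhead gives an intensity of 1, a light at 45 degrees gives about 0.707, and a light behind the surface gives 0.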

1.6.1.5 Specular Shading

Specular shading is calculated like a reflection. It determines how the camera sees a light source as a reflection. This means that it takes into account the angle at which the camera ray hits a surface, as well as the angle at which a light ray hits the surface. It is aesthetically pleasing, and scenes look odd without it.
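One widely used formulation of this idea is the Phong specular term, sketched below. Choosing Phong is an assumption made for illustration, since the text does not name a specific model; the `shininess` exponent is likewise a hypothetical material parameter.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def specular_intensity(normal, to_light, to_camera, shininess=32):
    """Phong specular term: how closely the mirrored light direction lines up with the view."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_camera)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0                      # light is behind the surface
    # Mirror the light direction about the normal: r = 2(n.l)n - l
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    return max(0.0, dot(r, v)) ** shininess
```

The highlight is strongest when the camera sits exactly along the mirrored light direction and falls off quickly as it moves away; a larger `shininess` gives a tighter highlight.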

1.6.1.6 Shadows

Shadows are important features to produce realism in renders. Shadows are calculated (if they are optically correct) by determining whether or not light from a source reaches a given point.
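In a ray tracer this test is usually a "shadow ray" cast from the surface point toward the light. The sketch below assumes a point light and sphere occluders given as hypothetical `(center, radius)` pairs; it is an illustration rather than the project's code.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance along a unit-length ray, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:                    # small epsilon avoids re-hitting the starting surface
            return t
    return None

def in_shadow(point, light_pos, occluders):
    """True if any occluder lies between the point and the light."""
    to_light = tuple(l - p for l, p in zip(light_pos, point))
    distance = math.sqrt(sum(c * c for c in to_light))
    direction = tuple(c / distance for c in to_light)
    for center, radius in occluders:
        t = intersect_sphere(point, direction, center, radius)
        if t is not None and t < distance:   # hits beyond the light do not block it
            return True
    return False
```

One such ray is needed per light source at every shaded point, which is why shadows multiply the ray count noticeably.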

1.6.1.7 Reflections

Reflections are an outstanding feature that puts Ray Tracing ahead of other rendering methods (scan-line, specifically). Tracing rays is a sure-fire way to get realistic reflections. For this reason, it is a major feature that allows a ray tracer to stand out by producing better images.
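The direction of a reflection ray follows from mirroring the incoming direction about the surface normal. A minimal sketch, assuming the normal has unit length (the function name is illustrative):

```python
def reflect(direction, normal):
    """Mirror an incoming direction about a unit surface normal: r = d - 2(d.n)n."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2 * d_dot_n * n for d, n in zip(direction, normal))
```

For example, a ray travelling down and to the right, `(1, -1, 0)`, bounces off a floor with normal `(0, 1, 0)` as `(1, 1, 0)`. The reflected ray is then traced like any other ray, and its color is blended into the parent pixel.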

1.6.1.8 Refraction

Refraction is the optical effect produced by light traveling through materials of differing indices of refraction. It is the effect that allows lenses to magnify. Refraction in ray tracing involves more computation than reflection, but is another major feature that makes ray tracers stand out, because it isn't possible in scan-line rendering.
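The bending of a ray at a material boundary follows Snell's law. The sketch below uses the standard vector form and assumes unit-length inputs with the normal pointing against the incoming ray; the names `n1` and `n2` for the two indices of refraction are illustrative.

```python
import math

def refract(direction, normal, n1, n2):
    """Refracted unit direction by Snell's law, or None on total internal reflection.

    direction: unit vector of the incoming ray
    normal:    unit surface normal pointing against the incoming ray
    n1, n2:    indices of refraction of the material being left and entered
    """
    eta = n1 / n2
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)      # squared sine of the refracted angle
    if sin2_t > 1.0:
        return None                                 # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + (eta * cos_i - cos_t) * n
                 for d, n in zip(direction, normal))
```

The extra square root and the total-internal-reflection branch are part of why refraction costs more than reflection. A ray entering glass (index 1.5) from air at 45 degrees bends toward the normal, while a ray leaving glass at 60 degrees is reflected back inside.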

1.6.1.9 Anti-aliasing

Anti-aliasing is the technique that gets rid of jagged edges on objects. The image is constrained to being rendered on pixels, but anti-aliasing can dampen the effect of pixelation. Anti-aliasing is usually accomplished by sending out additional rays and blending the results.
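A simple form of this is regular-grid supersampling: several rays are sent through each pixel and their results are averaged. In the sketch below, `shade` is a hypothetical callback standing in for tracing one ray through image-plane coordinates `(u, v)`, and the grid size is an assumed parameter.

```python
def supersample(shade, px, py, grid=2):
    """Average grid x grid sub-pixel samples for pixel (px, py).

    shade(u, v) is assumed to return the brightness seen along a ray through
    image-plane position (u, v), measured in pixel units.
    """
    total = 0.0
    for sy in range(grid):
        for sx in range(grid):
            u = px + (sx + 0.5) / grid   # sample at the center of each sub-cell
            v = py + (sy + 0.5) / grid
            total += shade(u, v)
    return total / (grid * grid)
```

With `grid=2`, a pixel straddling a hard edge receives an intermediate value instead of a jagged step, at the cost of four times as many primary rays.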

1.6.1.10 Textures

A texture is a graphic that is applied to a surface for rendering. It allows for detailed and more realistic images. Allowing for textures creates a significant boost in the aesthetic appeal of the render.

1.6.1.11 Bump-mapping

Bump-mapping is a technique where a texture is used, but instead of using it for determining color, it is used for determining the way shading is applied to the surface. This allows for nice-looking rough surfaces. The technique can also be used to distort reflections and refractions.

1.6.1.12 Ambient Occlusion

Ambient Occlusion is a technique that helps a scene look like it is in a real environment. It is somewhat like a cheaper version of Global Illumination. Ambient Occlusion creates an effect where it seems that more light hits areas that are more exposed to the environment, and less light hits areas that are more concealed. It is a very nice-looking effect, even though it isn't as realistic as true global illumination techniques. It is worth noting that Ambient Occlusion is a feature of IBM's Interactive Ray Tracer for Cell Broadband Engine.

1.6.1.13 Global Illumination

Global Illumination is a technique that produces realistic effects for the dispersion of light. That is, light is simulated and bounced off of objects, thus lighting the scene indirectly from the light source. This generally involves a technique called photon simulation, where new light sources are generated at varying intensities based on the way the simulated photons bounce off of the surfaces. Only by Global Illumination is it possible to realistically render effects called caustics. An example of a caustic effect is when light passes through a glass of water: some of the light gets focused onto the surface behind the glass, creating a bright spot. So far, it has not been possible to do real-time ray tracing with global illumination on anything but supercomputers.

1.6.2 Means to Achieve Features

1.6.2.1 Hierarchies

A major help to fast rendering is the use of spatial subdivision hierarchies. There are several data structures that are useful for this. One common one is an Octree. An Octree is a spatial subdivision that divides a cube into eight equal sub-cubes at every level of the tree. Another common scheme is a kd-tree. A kd-tree divides the space into two (usually non-equal) parts at each level, along an axis. Other schemes include Bounding Volume Hierarchies and Binary Space Partitioning. Octrees generally don't perform quite as well as the other structures, especially when the scene is not well-balanced (for example, a teapot in the middle of a stadium). However, Octrees are very fast to construct, which makes them appealing if the geometry is going to change often. The most expensive of the hierarchies to construct is the Bounding Volume Hierarchy, but such a structure has more potential to outperform all the other techniques.
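Part of why Octrees are quick to build is that finding the sub-cube for a point takes only three comparisons. The sketch below illustrates that step; the bit layout of the child index (x in bit 0, y in bit 1, z in bit 2) is an arbitrary convention assumed for this example.

```python
def child_index(point, center):
    """Octant 0..7 of a point relative to a node center.

    Bit 0 is set if x >= center x, bit 1 for y, bit 2 for z (an assumed convention).
    """
    index = 0
    for axis in range(3):
        if point[axis] >= center[axis]:
            index |= 1 << axis
    return index

def child_center(center, half_size, index):
    """Center of the sub-cube with the given index; each child has half the parent's half-size."""
    quarter = half_size / 2.0
    return tuple(
        c + quarter if (index >> axis) & 1 else c - quarter
        for axis, c in enumerate(center)
    )
```

Inserting an object repeats these two steps down the tree until a depth or occupancy limit is reached. A kd-tree or Bounding Volume Hierarchy must instead choose split positions from the geometry itself, which is where their higher construction cost comes from.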

1.6.2.2 Grouping of Similar Tasks

In order to be scalable, it must be possible to do significant, useful computation without having the entire description of the scene at immediate access. Grouping tasks based on their data dependencies is a way to significantly reduce the frequency of external memory accesses. Memory latency is a significant bottleneck for almost any real-time application, and Ray Tracing is no exception.

1.6.2.3 Iterative vs. Recursive Design

The Cell processor gets its speedup through the utilization of special processing units called Synergistic Processing Elements (SPEs). The PS3 has 6 SPEs available for use by Linux applications (which is what a ray tracer on a PS3 would be) [2]. SPEs do not have any form of branch prediction [1]. This means that every time a branch is taken, there is a penalty paid for the time it takes to refill the instruction pipeline. Recursion adds further cost: calling a function within itself incurs the penalty of allocating space on the stack and saving registers (perhaps unnecessarily). An iterative design can allow for more effective use of registers. Used in tandem with the in-lining of arithmetic methods, iterative design could allow for a noticeable performance increase.
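The difference between the two designs can be illustrated with a single chain of reflection bounces. In the sketch below, `scene_hit` is a hypothetical callback that returns `None` on a miss or a `(local_color, reflectivity, next_ray)` triple on a hit, and `THRESHOLD` is an assumed cut-off below which no reflection ray is spawned. Python is used only to show the control flow; the register and stack savings discussed above apply to the compiled SPE code.

```python
THRESHOLD = 0.05  # assumed reflectivity cut-off below which no reflection ray is spawned

def trace_recursive(ray, scene_hit, depth):
    """Recursive form: each bounce is a nested call that returns its contribution."""
    hit = scene_hit(ray)
    if hit is None:
        return 0.0
    local, reflectivity, next_ray = hit
    if depth == 0 or reflectivity <= THRESHOLD:
        return local
    return local + reflectivity * trace_recursive(next_ray, scene_hit, depth - 1)

def trace_iterative(ray, scene_hit, depth):
    """Iterative form: a running weight replaces the call stack."""
    color = 0.0
    weight = 1.0
    while True:
        hit = scene_hit(ray)
        if hit is None:
            break
        local, reflectivity, next_ray = hit
        color += weight * local
        if depth == 0 or reflectivity <= THRESHOLD:
            break
        weight *= reflectivity          # later bounces contribute proportionally less
        ray = next_ray
        depth -= 1
    return color
```

Both functions produce the same result for a single reflection chain. When a hit can spawn both a reflection and a refraction ray, the iterative version needs an explicit stack or queue of pending rays with their weights, which still avoids function-call overhead.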

1.6.3 Impact for This Project

The only features that stand out from the others as being infeasible for an interactive ray tracer are:

• Global Illumination - has far too many computations for a single PS3 to do in real-time.

• Ambient Occlusion - has the same problem as Global Illumination unless it is pre-calculated.

• Anti-Aliasing - can easily more than triple the number of calculations, but if the scene is simple enough, this might be feasible.

On the other hand, most of the general-purpose techniques for improving ray tracer performance would be applicable to this project.

1.6.4 Originality

There are already implementations of real-time, interactive ray tracers for the PlayStation 3. A group of students at MIT has produced a simple one, and IBM has produced a very impressive ray tracer. A significant factor in this project is the fact that IBM has not released the source code for their ray tracer. This project is intended to be an improvement over the MIT implementation in its features and performance. Additionally, this project is for academic advancement, and will not be closed-source. These qualities are improvements over the predecessors in PlayStation 3 ray tracing.

1.7 Requirements

In the following requirements, the term "system" describes this interactive ray tracer.

1.7.1 Functional Requirements

1 System shall provide 24-bit RGB color output.

2 System shall provide a mechanism by which the camera position can be altered.

3 System shall render scenes composed of simple geometries, by computing intersections of rays and surfaces.

3.1 System shall render triangles when needed.

3.2 System shall render spheres when needed.

4 System shall render shadow simulations.

4.1 System shall spawn rays to calculate the degree of shadow cover, one ray per light source.

5 System shall render reflection simulations.

5.1 System shall spawn one or more reflection rays when a material's reflective factor is above a particular threshold, to determine an additional color factor to apply to the pixel of the parent ray.

6 System shall render refraction simulations.

6.1 System shall spawn one or more refraction rays when a material's transmissive factor is above a particular threshold, to determine an additional color factor to apply to the pixel of the parent ray.

7 System shall render diffuse shading.

8 System shall render specular shading.

9 System shall render textures on geometries.

9.1 System shall allow for a graphic to be mapped onto a simple geometry, such that a color/material property can be determined on a per-coordinate basis.

9.2 System should allow for a graphic to determine parameters such as reflectance, transmittance, specular level, bump-map, and index of refraction.

1.7.2 Non-Functional Requirements

1 Frame rate must be faster than 20 fps.

2 Work must be possible to perform without the complete scene description readily available.

3 Must scale easily to multiple PS3s working on the same frame.

4 Must have low latency from user input.

1.8 Deliverables

The end-products in this endeavor will be an application that runs on a PlayStation 3, as well as several documents that detail the results of performance-related experiments that will be run to determine the best way to implement the application.

Chapter 2

Project Plan

2.1 Work Breakdown Structure

The development is going to consist of several iterative phases, in which new features will be added at each iteration. The distribution of work is intended to be very even, with each member handling different aspects at each phase, in an attempt to distribute expertise.

Figure 2.1: Org Chart

[Org chart for PP07-04a: Ray Tracing on the Playstation 3. Boxes: Dr. Joe Zambreno (Faculty Advisor / Client); Dr. Greg Smith (Course Coordinator); Unnamed (Management Consultant); Unnamed (Design Consultant); Mike Steffen (Ray Tracing Consultant); team members Brendan Campbell, Daniel Risse, Sean Godinez, and Aaron Westphal, with the roles Team Leader and Communications Coordinator.]

2.2 Budget

No budget is needed for this project, beyond calculating a theoretical compensation for the labor involved. At a rate of ten dollars per hour and working roughly six hours per week, the cost of the project comes to roughly $7200.

2.3 Project Schedule

The following table shows the start and end-times for the various phases of this project.

Figure 2.2: Project Schedule

Figure 2.3:

Figure 2.4:

Bibliography

[1] IBM's CBE Website. http://www.ibm.com/developerworks/power/cell/

[2] MIT's Undergrad PS3 Programming Course Website. http://cag.csail.mit.edu/ps3/
