2006:265 CIV MASTER'S THESIS

Implementation of a COLLADA scene-graph

Johan Lindbergh

Luleå University of Technology
MSc Programmes in Engineering
Computer Science and Engineering
Department of Computer Science and Electrical Engineering
Division of Media Technology

2006:265 CIV - ISSN: 1402-1617 - ISRN: LTU-EX--06/265--SE

Master's thesis

by Johan Lindbergh

September 11, 2006

Supervisors: Tomas Karlsson, Mikael Drugge

Luleå University of Technology Systemteknik

Preface

This thesis was done at Agency 9 AB for Luleå University of Technology, Computer Science and Engineering, during late spring and summer 2006. Agency 9 wanted to see if the digital content pipeline could be made more efficient by designing a 3D engine around the open COLLADA format. Tomas Karlsson at Agency 9 had a rough draft for how this could be done using a client-server architecture. The basic idea of the architecture is that the client reads and parses data and creates rendering instructions, while the server takes those instructions and resources and uses them to render the scene. The focus of this thesis is on the client part, although data structures relevant to the rendering server will be covered, as well as the communication with the server.

I would like to thank Tomas Karlsson and Khashayar Farmanbar at Agency 9 for all their help, and of course the other guys at Agency 9 during the period: Micke, Lasse, Johan, Tompa K and Andreas. Special thanks go to Åsa Lindvall. You know why...

Abstract

The need for more graphical content in real-time applications is rapidly increasing. As hardware becomes more powerful, the limits, the possibilities and thereby also the room for detail in a scene or model increase. The work of creating these large digital worlds is done using different digital content creation tools (DCC tools), for example Maya, 3dsMax or Milkshape. This makes it valuable to have a versatile non-proprietary format that can handle large data sets, and that is what makes COLLADA interesting for this thesis. A COLLADA document can contain and support almost every feature that a modern content creation tool could need. More importantly, it also contains a scene-graph.

Agency 9's current 3D engine, AgentFX, has been using a scene-graph structure for several years. This scene-graph has to be built manually, by adding and removing nodes and their children, grandchildren and so on. But is it possible to directly use the scene-graph contained in a COLLADA document for a real-time 3D engine?

The main goal of the thesis is to implement and evaluate a client-server architecture for a 3D engine with COLLADA as the base format, continuing on Agency 9's path of supporting COLLADA and using a scene-graph structure. This thesis mainly covers the client part of the 3D engine. It describes the COLLADA format in more detail, and how to parse and store the COLLADA data structures. The issue of how to communicate with the rendering server is also addressed. The main conclusion is that the COLLADA structure can be modified and used as the base of a scene-graph based 3D engine.

Contents

1 Introduction
    1.1 Problem formulation
    1.2 Purpose
    1.3 Background
        1.3.1 COLLADA & the Khronos group
        1.3.2 Agency 9
    1.4 Other work
    1.5 Limitations

2 Methods and theory
    2.1 XML
        2.1.1 Data binding
    2.2 Scene-graphs
    2.3 COLLADA
        2.3.1 Summary
    2.4 The COLLADA format
        2.4.1 Header
        2.4.2 Library
        2.4.3 Scene
    2.5 Rendering
        2.5.1 Resources and instructions

3 Implementation
    3.1 JAXB
        3.1.1 JAXB customization
        3.1.2 JAXB usage
    3.2 COLLADA Client
        3.2.1 Resources and instructions revisited
        3.2.2 Animating
    3.3 Client parser classes
        3.3.1 COLLADAParser
        3.3.2 AnimationParser
        3.3.3 CameraParser
        3.3.4 ControllerParser
        3.3.5 EffectParser
        3.3.6 GeometryParser
        3.3.7 LightParser
    3.4 Rendering server
    3.5 Data structures
        3.5.1 Bind package
        3.5.2 Vector math library
    3.6 Communication

4 Evaluation
    4.1 Functionality
        4.1.1 Client-server architecture
        4.1.2 Parsing COLLADA libraries
        4.1.3 Camera
        4.1.4 Animations
        4.1.5 Scene-graph
    4.2 Performance
        4.2.1 Load time examples
        4.2.2 Space costs

5 Discussion
    5.1 Conclusion
        5.1.1 COLLADA as a 3D engine format
    5.2 Future work

Bibliography

List of Figures

List of Tables

A Unmarshalling example

B Sample COLLADA file

Chapter 1

Introduction

1.1 Problem formulation

The concept of using a scene-graph has been around for quite some time. IRIS Inventor was one of the earliest, presented in a paper in 1993 [1]. Nowadays, a scene-graph is almost the de facto standard data structure for representing a virtual world. Current 3D graphics toolkits using some form of scene-graph include Open Scenegraph¹, Java3D², Gizmondo³, Renderware⁴ and of course AgentFX⁵. For a more thorough explanation of what a scene-graph is, please refer to section 2.2.

Until recently, AgentFX has put the work of building the scene-graph on the programmer. Manually creating and connecting all nodes in a content tree can be tedious and quite hard to manage. It can also be quite time consuming, especially when creating large virtual worlds. Agency 9 thought that it was possible to use COLLADA's own scene-graph representation to build the scene-graph automatically. Using the information in a COLLADA document, the programmer would no longer have to bother with the initial structuring of the graph. This was implemented and tested in Mikael Lagré's bachelor's thesis [2] (see 1.4) in 2005.

Since COLLADA is very well defined, and its goal is to incorporate as many features of modern content creation tools as possible, it would be interesting to see if one could build an engine that can read and parse an entire COLLADA document. Then, depending on machine capabilities and implemented features, the engine should make its best effort to render the scene.

Tomas Karlsson, CTO at Agency 9, had an idea of a client-server architecture for a 3D engine. The client should be responsible for reading and parsing the COLLADA data. The data should then be packaged and sent to a rendering server. The server [3] (see 1.4) is then responsible for rendering whatever it can handle.

The COLLADA documentation [4] quite clearly states that COLLADA is not a format as such, rather that it is beneficial in the content production pipeline. The documentation does however not point out any obvious problems with using COLLADA as an engine format, just that applications will use proprietary, size-optimized binary files. Whether there is some substance in those statements remains to be seen.

¹ http://www.openscenegraph.org/
² https://java3d.dev.java.net/
³ http://www.gizmosdk.com/
⁴ http://www.renderware.com/
⁵ http://www.agency9.se/

1.2 Purpose

Agency 9 feels that it is worth researching the possibility of using COLLADA more or less as an engine format. Using the client-server architecture, the client should be able to supply the server with all data needed for rendering, and probably more than that. The goal is to be able to use the client and server as separate entities. Different rendering kernels could be used by the server, and depending on the kernel, different amounts of the data sent could be used in the rendering process.

The purpose of this thesis is to learn as much as possible about XML and COLLADA, and to see if it is possible to take the whole COLLADA structure, including both resources and scene-graph, and map it to a real-time rendering engine. Most of the work will be to understand the COLLADA structure and decide what to do with the data contained within a .dae⁶ file. Another big part of the work will be learning about XML and data binding, and implementing resource parsers and key-frame interpolation. Yet another part of the thesis project is to decide, together with Tomas Karlsson and Lasse Wedin [3], how to talk to the rendering server.

Some problems in Lagré [2] arose because of the requirement to fit the current version of AgentFX. These problems might be avoided by redesigning the 3D engine from scratch. Another, higher purpose is thus to lay the foundation for a new iteration of AgentFX.

1.3 Background

1.3.1 COLLADA & the Khronos group

COLLADA, from COLLAborative Design Activity, is basically an open Digital Asset Exchange schema for the interactive 3D industry. COLLADA is a standard of the Khronos group⁷.

The Khronos group was founded in January 2000 by a number of media-centric companies including ATI, Creative, Google, Intel, nVidia, Ericsson and Sun Microsystems. The group is dedicated to creating APIs that enable the authoring and playback of rich media on a wide variety of platforms and devices. Current media APIs and technologies include OpenGL®, OpenGL® ES, OpenVG™ and COLLADA™, among others. All Khronos members are able to contribute to the development of Khronos API specifications, and are empowered to vote at various stages before public deployment.

COLLADA provides a neutral zone where companies can work together towards the same goal: a common specification of the format. It is then up to the companies themselves to support the format by writing their own exporters and importers, and as this is done, the workflow for end users is made more efficient. The major benefit for the companies is that they get early access to specification drafts and conformance tests, which helps them in developing their own 3D platforms and applications. More detailed information about COLLADA and how it is developed can be found in sections 2.3 and 2.4.

⁶ Digital Asset Exchange
⁷ http://www.khronos.org/

1.3.2 Agency 9

Agency 9 is a small company based in Luleå. Starting in June 2001, the business quickly moved from creating web based games to developing an advanced platform for real-time rendering of 3D graphics. The platform, AgentFX™, is basically a platform-independent 3D engine. As mentioned in 1.1, the engine uses a hierarchical structure, a scene-graph, for visualizing and modifying graphic content. It is also one of the most advanced 3D engines written in Java™.

1.4 Other work

Mikael Lagré [2] made a reader for COLLADA 1.3 and adapted it for the existing AgentFX. It is a good reference for this thesis project. The main difference between this work and [2] is that this COLLADA implementation should not be customized to a specific engine. It should instead be used as a guide when designing the next generation of Agency 9's graphics engine.

Lasse Wedin, a fellow student from Campus Skellefteå, has as a 10p thesis project implemented a rendering server that is the base of the server to be used by the final product [3].

1.5 Limitations

The work is limited to using the most recent COLLADA 1.4.0 schema⁸ and Java 1.5 (J2SE 5.0). Since the project might result in a new Java based 3D engine API, it is reasonable to use the latest version of Java. J2SE 5.0 also has support for many new features such as generics, typesafe enumerations, enhanced for-loops and static imports, all of which make programs both safer and easier to write, read and maintain.

There are a lot of exporters for COLLADA from many different DCC tools, some of which are not very stable or even fully compliant with the format. The one mainly used when creating content for testing in this project is Feeling Software's⁹ COLLADAMaya 0.90 for Maya 7.0. Lagré [2] also mostly used COLLADAMaya, although in an earlier version. It worked satisfactorily then as a COLLADA 1.3.1 exporter, and has now been improved and upgraded to export COLLADA 1.4.0 documents. The purpose of this thesis is not to test COLLADA exporters for DCC tools, but as the prototype evolves towards a real product, extensive compliance testing will have to be done.

⁸ http://www.collada.org/2005/11/COLLADASchema.xsd
⁹ http://www.feelingsoftware.com/

Chapter 2

Methods and theory

The following chapter deals with the tools and techniques needed and used for the thesis work. Most of the theory is a presentation of COLLADA, and everything the format can contain.

2.1 XML

XML is a general-purpose markup language proposed by the W3C (World Wide Web Consortium) [5]. It is used for creating special-purpose markup languages that can describe many different kinds of data. XML can be used to both describe and contain data.

To create a specific document type using XML, a schema language is used to put constraints on the contents and structure of that document type. The schema of a document is like a set of rules to which the document must conform in order to be considered valid according to that schema.

In general, for an XML document to be valid it has to conform to the rules of a schema as mentioned above. It also has to be well-formed according to the XML standard. That means a document must obey the following rules, among others:

• Have one and only one root element

• Comply with a character set definition (UTF-8 is the default)

• Give every non-empty element both a start and an end tag

• Nest tags properly; tags may be nested, but cannot overlap

This is not an exhaustive list; only rules somewhat relevant to this thesis are listed. A complete list of the rules that apply to XML documents can be found in [6] (under Extensible Markup Language), and in 2.1 of [5].
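The rules above can be tried out with the XML parser that ships with Java. The following is only a minimal sketch: the class name WellFormedCheck and the sample documents are made up for illustration and are not part of the thesis code. It checks well-formedness only, not validity against a schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of the thesis code.
class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML (no schema validation). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // a well-formedness rule was violated
        } catch (Exception e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<scene><node><mesh/></node></scene>")); // true
        System.out.println(isWellFormed("<scene><node></scene></node>"));        // false: overlapping tags
        System.out.println(isWellFormed("<scene/><scene/>"));                    // false: two root elements
    }
}
```

A validating parser would additionally need the schema; that step is left out here on purpose.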

2.1.1 Data binding

To be able to use XML data easily in a program, an XML data binding has to be made. Data binding is the process of taking the XML schema and binding it to classes usable by the program. An XML schema describes the structure and meaning of an XML document in such detail that a package of classes can be derived from it. This fact is used by XML compilers to generate a set of classes, a bind package, from a source schema. This way, elements of the XML document are easily represented as objects in a program.

The process of converting an XML document into the data objects generated by the XML compiler is called unmarshalling. The opposite, converting objects into a valid XML document, is called marshalling.

There are many XML parsers for Java, but according to [2] and some studies of my own, JAXB¹ (Java Architecture for XML Binding) is the most powerful and flexible data binder, and it is suitable for this work.
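To make the two terms concrete, the sketch below shows a hand-written stand-in for the kind of class an XML compiler generates. It does not use JAXB. The class CameraBinding and the simplified <camera>/<yfov> layout are assumptions made for this illustration and do not follow the real COLLADA schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hand-written stand-in for a class that an XML compiler would generate.
// The <camera>/<yfov> layout is simplified and is NOT the real COLLADA schema.
class CameraBinding {
    String id;
    double yfov;

    /** Unmarshalling: XML text to object. */
    static CameraBinding unmarshal(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            CameraBinding camera = new CameraBinding();
            camera.id = root.getAttribute("id");
            camera.yfov = Double.parseDouble(
                    root.getElementsByTagName("yfov").item(0).getTextContent().trim());
            return camera;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid camera document", e);
        }
    }

    /** Marshalling: object to XML text. */
    String marshal() {
        return "<camera id=\"" + id + "\"><yfov>" + yfov + "</yfov></camera>";
    }

    public static void main(String[] args) {
        CameraBinding camera = unmarshal("<camera id=\"cam0\"><yfov>45.0</yfov></camera>");
        System.out.println(camera.id + " " + camera.yfov);
        System.out.println(camera.marshal());
    }
}
```

The point of a data binder is that code like this does not have to be written by hand for every element type in the schema.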

2.2 Scene-graphs

The scene-graph is an extremely useful structure for arranging the logical and spatial representation of a graphical scene. The fact that programmers implement scene-graphs to fit their particular needs in an application makes the definition of what a scene-graph really is quite fuzzy. There is, in other words, no hard rule for what a scene-graph is or is not.

A general description of a scene-graph is that it is a collection of nodes organized in a graph or tree structure. This means that any node may have many children but often only a single parent. An operation applied to a parent node propagates to all of its children in the way implemented by the traversal algorithm (see 3.2.1 for implementation details). In many cases, associating a geometrical transformation matrix (see 2.4.3) with a node, and concatenating the matrices together, is an efficient and natural way to process those transformations [6]. A common feature is to group related objects into a compound object which can then be transformed as a single object.

A small example of a scene-graph can be viewed in fig. 2.1, where the computerTable is the parent of the table and the computerMonitor. The table and monitor groups in turn have children of their own. When a user wants to move the whole table, only the computerTable node has to be transformed. The screen node, for example, inherits the transforms of the screenStand, the computerMonitor and the whole computerTable.
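The inheritance of transforms in the computerTable example can be sketched in a few lines. This is a simplified illustration, not the AgentFX or COLLADA node type: the class SceneNode is hypothetical, and each node carries only a translation, whereas a real engine would concatenate full transformation matrices.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative node type; not the AgentFX or COLLADA node class.
class SceneNode {
    final String name;
    final double[] local;                          // translation relative to the parent
    final List<SceneNode> children = new ArrayList<SceneNode>();
    SceneNode parent;

    SceneNode(String name, double x, double y, double z) {
        this.name = name;
        this.local = new double[] {x, y, z};
    }

    /** Attaches a child and returns it, so that chains can be built inline. */
    SceneNode add(SceneNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    /** Accumulates the translations of this node and all of its ancestors. */
    double[] world() {
        double[] w = local.clone();
        for (SceneNode n = parent; n != null; n = n.parent) {
            for (int i = 0; i < 3; i++) {
                w[i] += n.local[i];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        SceneNode computerTable = new SceneNode("computerTable", 1, 0, 0);
        computerTable.add(new SceneNode("table", 0, 0, 0));
        SceneNode monitor = computerTable.add(new SceneNode("computerMonitor", 0, 1, 0));
        SceneNode stand = monitor.add(new SceneNode("screenStand", 0, 0.5, 0));
        SceneNode screen = stand.add(new SceneNode("screen", 0, 0.25, 0));

        computerTable.local[0] = 3;   // move the whole table; no child is touched
        double[] w = screen.world();
        System.out.println(w[0] + " " + w[1] + " " + w[2]);   // prints "3.0 1.75 0.0"
    }
}
```

Only the root node is modified, yet the world position of the screen follows, which is the property that makes compound objects convenient to handle.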

An example of the difference between scene-graph definitions is the representation of a COLLADA scene versus the scene-graph structure of Agency 9's engine AgentFX. The scene-graph of COLLADA is a graph containing nodes that describe how and where in world-space to put objects, while AgentFX works with nodes that, in addition to describing child-parent relations and content, also implement their own traversal method. For lack of better terms, the COLLADA scene-graph can be called a describing or non-interactive scene-graph, while the current AgentFX is a more complete scene-graph based engine.

¹ https://jaxb.dev.java.net/


Figure 2.1: computerTable scene-graph.

Figure 2.2: Table scene.

2.3 COLLADA

The development of the COLLADA Digital Asset schema has involved, and still involves, a lot of people from many different companies. It is therefore important to have well-defined goals for this so-called COLLAborative Design Activity. Some of the goals are presented here, including the effect these goals have on the design of the format.

Non-proprietary format

One ever-important goal of COLLADA is to be able to store digital assets in a form not proprietary to any company.

Digital assets are the biggest part of most 3D applications, and large investments have to be made in software development for every new proprietary tool used in the production chain. Middleware vendors, such as Agency 9, have to integrate with as many tool chains as possible to be an attractive choice for developers. Many middleware companies provide their own toolkit, and have to convince developers to adopt it. This makes it hard or even impossible for developers to use several middleware tools in the same project. It might also be just as difficult to use several DCC tools in the same project.

This goal led to COLLADA using XML, thanks to its well-defined framework and its being an open standard. XML is also internationally usable, due to its character set definitions and the fact that there exist XML parsers for almost every language on every platform. Discussions around this goal also led to the need for a standard profile for defining digital asset data.


Provide a standard common format

An important property of a format meant to be used as widely as COLLADA is that it must be able to contain a very wide range of data; a digital asset is a very broad concept. This led to the COLLADA Common Profile, which is the basis for all content in a digital asset. The idea is that if a tool can interact with the Common Profile, then the tool should work with COLLADA.

Included in this goal is that the Common Profile should work as a more general base for digital (3D) data exchange. The Khronos group acknowledges that the Common Profile will be an ongoing exercise. It currently covers polygon models, materials, animations and more. The work of defining a common way of describing NURBS, subdivision surfaces and other more complex data types is ongoing [4].

Easy integration

There is also a need to facilitate integration into a wider range of tool chains. One way to accommodate developers' need to add their own functionality to the format is to make COLLADA fully extensible. Extensions allow vendors to define their own profile for data not defined in the Common Profile. For example, there is still content from Maya or 3dsMAX that cannot be described using the COLLADA Common Profile. Exporters for these DCC tools therefore define a Maya or MAX profile to keep all information exported from that specific tool. This thesis concentrates only on data contained in the Common Profile.

2.3.1 Summary

To sum up the most important features of COLLADA, we can say that it...

• is built using open standards (UTF8, URLs, XML, XPath etc.)

• is itself an open standard, designed by the industry consortium the Khronos group.

• encodes a scene using a scene-graph representation.

• is a lossless format.

• is an interchange format usable by many 3D applications.

One example of a company that uses COLLADA is Sony, which incorporates the format into the Playstation 3 developer's kit.

2.4 The COLLADA format

A COLLADA document is described with a header, a number of libraries and a set of scenes. Those elements reside directly under the <COLLADA> root element².

² This visual style of writing XML element names as <element> is taken directly from the COLLADA documentation [4], and will be used throughout this thesis when needed.


One of the most important properties of the resource elements in the COLLADA specification is that they contain an id attribute. This id is a string value that uniquely identifies the element within the current document. The standard file suffix for a COLLADA file is .dae, which stands for Digital Asset Exchange. Following is a more detailed description of each part of the document. For a sample .dae file, refer to Appendix B.

2.4.1 Header

The mandatory header, in the form of an <asset> tag, contains information about the COLLADA version used, the author of the document, the DCC tool and/or exporter used, et cetera.

2.4.2 Library

The library part of the document is where all resources are described. A resource can be considered a building block of a scene. There are several different types of libraries, and the ones listed here are the ones parsed by the reader implemented for this thesis. In fact, only the libraries pertaining to COLLADA Physics are omitted. The data contained in each COLLADA library is presented in table 2.1, and the library types requiring a more detailed description follow below.

Library        Contains information about...
Animations     Keyframes and interpolation type.
Cameras        Field of view, projection mode, near and far clipping planes.
Controllers    Generic geometry controllers. Currently mesh skinning and morphing are supported by the COLLADA specification.
Geometries     The visual shape and appearance of an object in the scene.
Images         Raster image data (non-vector textures).
Lights         Light sources; type, color, attenuation etc.
Materials      The visual appearance of a geometry object.
Nodes          Node data to be referenced from some scene-graph.
VisualScenes   Complete scenes, one of which is later referenced from the <scene> element.

Table 2.1: COLLADA Libraries

Animation

At its most basic, animation describes a transformation of an object or value over time. It is most often used to give an illusion of motion. A common technique, which is also used in COLLADA, is key-frame animation. A key-frame is a two-dimensional sample of data, like a point in an x-y plot. The first dimension (think x-axis) in COLLADA is called the INPUT. The input is usually time, but can theoretically be any real value. The second dimension is the OUTPUT, and represents the value being animated.

Given a set of key-frames and the corresponding interpolation information (including tangent values), output values for times between key-frames can be calculated. A set of key-frames and the interpolation between them defines a 2D function called an animation curve. An example of an animation curve is presented in figure 2.3.

Figure 2.3: An animation curve with visible curve tangents.

A COLLADA <animation> element declares an animation hierarchy describing one or more animation curves. The element contains a number of <source>, <sampler> and <channel> elements. Also, if the animation has children, nested <animation> elements may exist.

<source>s describe the actual animation data in the form of arrays, while a <sampler> uses at least one <input> to define the meaning of the arrays. An <input> element has a semantic and a pointer to a data source. The semantic tells a user how the data source should be interpreted. There are five predefined Enum types for animation semantics in the COLLADA common profile. The semantic is however not limited to these Enums, in order to maintain the extensibility of the COLLADA element.

A <sampler> with its <input>s is said to define a sample point, which is really quite like a key-frame, but with some additional information. <sampler>s typically contain five <input>s to define a sample point. Their semantics are the five Enum types of the COLLADA common profile: INPUT, OUTPUT, IN_TANGENT, OUT_TANGENT and INTERPOLATION. As stated above, having in/out-values in conjunction with the tangent values and interpolation type makes it possible to compute an output for any time in between key-frames. Since the <source>s contain arrays, the combination of <source>s and a <sampler> together defines a complete animation curve. Finally, the <channel>(s) connect the now well-defined animation curve with the object to animate using a URL expression.
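How an output value between two key-frames can be computed is sketched below for the simplest case, linear interpolation. The class AnimationCurve is hypothetical and is not the interpolator implemented for this thesis; it uses only the INPUT and OUTPUT arrays and ignores the tangent data that curve-based interpolation types would need. Clamping outside the key-frame range is also an assumption of the sketch.

```java
// Hypothetical sketch of a sampled animation curve with linear interpolation only.
class AnimationCurve {
    private final double[] input;    // INPUT: key-frame times, strictly increasing
    private final double[] output;   // OUTPUT: animated value at each key-frame

    AnimationCurve(double[] input, double[] output) {
        if (input.length == 0 || input.length != output.length) {
            throw new IllegalArgumentException("need equally many inputs and outputs");
        }
        this.input = input.clone();
        this.output = output.clone();
    }

    /** Returns the interpolated output at time t, clamped to the first/last key-frame. */
    double sample(double t) {
        int last = input.length - 1;
        if (t <= input[0]) {
            return output[0];
        }
        if (t >= input[last]) {
            return output[last];
        }
        int i = 0;
        while (t > input[i + 1]) {   // find the segment [input[i], input[i + 1]] containing t
            i++;
        }
        double s = (t - input[i]) / (input[i + 1] - input[i]);   // 0..1 within the segment
        return output[i] + s * (output[i + 1] - output[i]);
    }

    public static void main(String[] args) {
        AnimationCurve curve = new AnimationCurve(
                new double[] {0, 1, 3}, new double[] {0, 10, 20});
        System.out.println(curve.sample(2.0));   // prints "15.0"
    }
}
```

For long curves a binary search over the input array would be the natural replacement for the linear scan.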

Controllers

The COLLADA controller element contains declarations of generic control information. A controller is defined as a device or mechanism that manages and directs the operations of another object. In its current state, the controller can contain only two kinds of control elements: a <skin> or a <morph> element.


Skinning is a way to make a polygonal mesh deform smoothly, following an underlying skeleton. The technique of defining bones and having a skin react to changes in their transforms is a well-known concept in computer animation. Skinning is sometimes also referred to as skeleton-subspace deformation, enveloping or vertex blending. The vertices of a mesh should be transformed by several different joints³, and the result of each transformed vertex should be averaged using weighted scalar values. It is the hierarchy of joints (fig. 2.4) that makes up a skeleton (fig. 2.5), and the resulting transformed mesh is called the skin (fig. 2.6).

Figure 2.4: Skeleton hierarchy.

Figure 2.5: Corresponding humanoid skeleton.

The skinning algorithm works in two steps: a preprocessing step, followed by the continuous updating of the skin as the skeleton pose changes.

Preprocessing is done by the modeling software and results in a bind-pose and a list of joint-weight pairs for each vertex in the mesh to be skinned. The weights for each vertex are usually computed by some fall-off function, and the maximum number of joints influencing a vertex is often limited. Most DCC tools that support skinning also have ways to manually paint the weights of each joint onto the base mesh. The weights should also be normalized, that is, the sum of all weights for a certain vertex should add up to 1.

The bind-pose is the hierarchy of joints in the position they had at the time of binding the skin to the skeleton. In the skinning algorithm, all transformations are computed relative to the bind-pose. Multiplying the current local-to-world matrix of a joint with the inverse of that joint's bind-pose matrix cancels out the bind-pose transform and leaves us in the object space of the skin, that is, relative to the bind-pose. It is therefore more useful to store the inverse bind-pose matrices than the actual pose.

Figure 2.6: Resulting skinned mesh, striking a pose.

³ The terms joint, node, bone etc. are used interchangeably in the literature, but they all refer to a transformation matrix.

This can be expressed mathematically with equation 2.1. $v(t)$ is the transformed vertex as a function of time and $w_i$ is the weight of joint $i$ for vertex $p$. The matrix $M_i^{-1}$ is the inverse bind-pose matrix for skeleton joint $i$ mentioned above, and $B_i(t)$ is the transformation matrix of the same joint at time $t$. The sum is computed for all $n$ nodes influencing the vertex.

$$v(t) = \sum_{i=0}^{n-1} w_i \, B_i(t) \, M_i^{-1} \, p, \quad \text{where} \quad \sum_{i=0}^{n-1} w_i = 1, \quad w_i \geq 0. \tag{2.1}$$

Another way of looking at what is happening is to think of the vertex as being transformed to a number of positions, then interpolated among them. The final blended position of the vertex will be inside the convex hull of the set of un-weighted points $B_i(t) M_i^{-1} p$ for all $i = 0 \ldots n-1$ ($t$ fixed).
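Equation 2.1 translates almost directly into code. The sketch below is an illustration only: the class LinearBlendSkinning and its helper methods are hypothetical, matrices are stored as row-major 4x4 arrays (an assumption of the sketch, not a statement about the COLLADA or AgentFX layout), and the joints in the demo are pure translations to keep the numbers easy to follow.

```java
// Hypothetical sketch of equation 2.1; matrices are row-major 4x4 arrays.
class LinearBlendSkinning {

    /** Applies a row-major 4x4 matrix to the point (x, y, z, 1). */
    static double[] transformPoint(double[] m, double[] p) {
        double[] r = new double[3];
        for (int row = 0; row < 3; row++) {
            r[row] = m[4 * row] * p[0] + m[4 * row + 1] * p[1]
                   + m[4 * row + 2] * p[2] + m[4 * row + 3];
        }
        return r;
    }

    /** Computes v = sum_i w[i] * B[i] * Minv[i] * p for one vertex p. */
    static double[] skin(double[] p, double[] w, double[][] jointNow, double[][] invBind) {
        double[] v = new double[3];
        for (int i = 0; i < w.length; i++) {
            double[] relativeToBind = transformPoint(invBind[i], p);    // M_i^-1 p
            double[] posed = transformPoint(jointNow[i], relativeToBind); // B_i(t) M_i^-1 p
            for (int k = 0; k < 3; k++) {
                v[k] += w[i] * posed[k];
            }
        }
        return v;
    }

    /** Builds a translation matrix, enough for the small demo below. */
    static double[] translation(double x, double y, double z) {
        return new double[] {1, 0, 0, x,  0, 1, 0, y,  0, 0, 1, z,  0, 0, 0, 1};
    }

    public static void main(String[] args) {
        double[] p = {1, 0, 0};
        double[] w = {0.5, 0.5};
        // Joint 0 is bound at the origin, joint 1 at (1, 0, 0).
        double[][] invBind = {translation(0, 0, 0), translation(-1, 0, 0)};
        // Joint 0 stays put while joint 1 has moved two units up.
        double[][] now = {translation(0, 0, 0), translation(1, 2, 0)};
        double[] v = skin(p, w, now, invBind);
        System.out.println(v[0] + " " + v[1] + " " + v[2]);   // prints "1.0 1.0 0.0"
    }
}
```

With equal weights the vertex ends up halfway between the two candidate positions (1, 0, 0) and (1, 2, 0), which is the convex-hull interpretation given above.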

The COLLADA <skin> element encapsulates all the information needed for skinning a polygonal mesh. It has as child elements a <joints> element, which holds the per-joint data needed for the skin, and a <vertex_weights> element, which describes a per-vertex combination of joint and weight data.

Morphing is a way of blending or combining two or more static meshes. Blending is done by linearly interpolating between the vertices in a set of geometries using a corresponding set of weights. This of course requires that the morph geometries have the same set of vertices for the operation to make sense.

The <morph> element contains information about which morph method to use, and points to a base mesh. The base mesh is the reference for the blending operation. The morph method can be either RELATIVE or NORMALIZED, and decides which blending equation to use. Basically the normalized version just states that the weights should add up to 1, while the relative method does not have that constraint.

<morph> child elements specify a <targets> element and at least two <source>s of the morph operation. The <targets> element works in a similar fashion to the <sampler> element, using <input>s with semantics to give meaning to the data in the arrays. The two required semantics are MORPH_TARGET and MORPH_WEIGHT. This way the <targets> element defines a list of morph targets and their corresponding weights, which is all that is needed to complete the blend operation.
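The difference between the two morph methods is easiest to see side by side. The blending equations used below are an assumption based on the COLLADA 1.4 specification [4] and should be checked against it: in the NORMALIZED case the base mesh receives whatever weight the targets leave over, so that all weights sum to 1, while in the RELATIVE case the weighted targets are added on top of the base mesh. The class MorphBlend is hypothetical, and each mesh is reduced to a flat array of coordinates.

```java
// Hypothetical sketch of the two morph methods on flat coordinate arrays.
class MorphBlend {

    /** RELATIVE (assumed): base + sum_i w[i] * target[i]. */
    static double[] relative(double[] base, double[][] targets, double[] w) {
        double[] out = base.clone();
        for (int i = 0; i < targets.length; i++) {
            for (int k = 0; k < out.length; k++) {
                out[k] += w[i] * targets[i][k];
            }
        }
        return out;
    }

    /** NORMALIZED (assumed): (1 - sum_i w[i]) * base + sum_i w[i] * target[i]. */
    static double[] normalized(double[] base, double[][] targets, double[] w) {
        double sum = 0;
        for (double wi : w) {
            sum += wi;
        }
        double[] out = new double[base.length];
        for (int k = 0; k < out.length; k++) {
            out[k] = (1 - sum) * base[k];   // base gets the remaining weight
        }
        for (int i = 0; i < targets.length; i++) {
            for (int k = 0; k < out.length; k++) {
                out[k] += w[i] * targets[i][k];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] base = {4, 4};
        double[][] targets = {{2, 0}, {0, 4}};
        double[] w = {0.5, 0.25};
        double[] r = relative(base, targets, w);
        double[] n = normalized(base, targets, w);
        System.out.println(r[0] + " " + r[1]);   // prints "5.0 5.0"
        System.out.println(n[0] + " " + n[1]);   // prints "2.0 2.0"
    }
}
```

Both methods require every target to have the same number of coordinates as the base mesh, which is the vertex-count requirement mentioned above.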

Geometry

COLLADA is designed to accommodate every conceivable type of geometry description. There are many ways to represent geometry information in software. Commonly used are polygonal meshes, Bézier curves, NURBS and other curve patches.

Consumer graphics hardware today is very good at handling vertex positions along with additional attributes such as normals and color. The ways of describing geometry data mentioned above provide this vertex information with varying efficiency.

Currently COLLADA only supports polygon meshes and splines. These geometry types are represented as child elements directly under the <geometry> element.

A <spline> contains information about a curve described with control vertex (CV) positions, the interpolation type between CVs, and tangents that control the shape of the curve segments preceding and following each CV. It can also describe continuity constraints at a CV and the number of piece-wise linear approximation steps to be used when drawing the following segment (tessellation).

The <mesh> element should contain vertex information together with polygon primitive data to sufficiently describe a polygonal mesh. A vertex is a point on the mesh, carrying an additional set of attributes. These attributes are typically the vertex normal, texture coordinate(s) and color.

The <mesh> also carries information about how the vertices are organized to form the geometric shape of the mesh. This is done using the following primitive types: <lines>, <linestrips>, <polygons>, <polylist>, <triangles>, <trifans> and <tristrips>. More details of the <mesh> element will be covered in the Implementation chapter (chapter 3) as needed.

Images

An <image> contains a link to an external image using its <init_from> element, or it contains an embedded image using the <data> element. The COLLADA documentation states that it was a design choice not to use binary data in COLLADA files. This is mostly because not all languages easily support binary data in XML files. The <data> element is the only exception to this design choice, since it uses hexadecimal-encoded binary octets to represent the image data. The more commonly used <init_from> element contains a common URL expression linking to an image file.
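Turning hexadecimal-encoded octets back into bytes is a small exercise. The sketch below is illustrative only: the class HexImageData is hypothetical, and it assumes that the text may contain whitespace between the hex digits, which should be verified against the schema before relying on it.

```java
// Hypothetical decoder for hexadecimal-encoded binary octets.
class HexImageData {

    /** Decodes text such as "89 50 4E 47" into the bytes 0x89, 0x50, 0x4E, 0x47. */
    static byte[] decode(String hex) {
        String digits = hex.replaceAll("\\s+", "");   // assumption: whitespace is insignificant
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Two hex digits form one octet; parseInt rejects non-hex characters.
            out[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = decode("89 50 4E 47");
        System.out.println(bytes.length);   // prints "4"
    }
}
```

The decoded byte array could then be handed to an ordinary image loader, just like the contents of an external image file.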


Materials/Effects

Geometry objects can have lots of parameters describing their material properties. In computer graphics, the material properties are what give an object its visual appearance after the rendering computations are made. Nowadays there are two common graphics rendering systems in use: the fixed-function pipeline and the programmable graphics pipeline.

In the fixed-function pipeline, the hardware requires certain parameters to compute a predefined illumination model, for example Phong illumination. The parameters in that case include ambient, diffuse, emissive and specular colors.

In programmable pipelines, the programmer decides which material parameters to use, and also supplies an appropriate rendering algorithm to the vertex and/or pixel shader programs.

COLLADA has to accommodate both these pipeline types, and this led to what the author would like to call the material-effect architecture. In this architecture each <material> instantiates an <effect>. This is done using an aptly named <instance_effect> element. The <instance_effect> has as an attribute a URL pointing to the <effect> to instantiate. It also has parameters as child elements, which should be sent to the effect, which in turn, depending on platform capabilities, should try to use them.

An <effect> can have a number of child elements depending on what kind of graphics pipeline or rendering environment the effect should be used in, and the appropriate child should be used depending on the rendering environment (see footnote 4). For the fixed-function pipeline, the <profile_COMMON> is used. It can encapsulate all values and declarations for a platform-independent fixed-function shader, including constant, blinn, lambert and phong.

2.4.3 Scene

The <scene> acts as the root node for the scene-graph. It can instantiate any number of physics or visual scenes by referencing scenes residing in <library_physics_scenes> or <library_visual_scenes>. COLLADA physics is out of the scope of this thesis, so only the <visual_scene> element will be covered. A <visual_scene> is the root of a hierarchical structure of <node>s organized into a scene-graph. A scene-graph is a Directed Acyclic Graph (DAG), or a tree data structure, see also 2.2. The <visual_scene> thereby represents the whole scene, while a <node> can be said to be the root of a sub-graph of the <visual_scene>.

The <node>s can have an arbitrary number of other <node>s as children. Each node defines its position in world space using a series of transform elements (see Transformations below). It also inherits the transforms of its parent. Since the <node> element is the basis of the entire scene-graph structure, it can have a wide range of other child elements aside from other <node>s. These other child elements let the <node> instantiate resources residing in the Library structure (2.4.2). The types of elements that can be instantiated are listed in table 2.2.

4 Programmable rendering pipelines supported in COLLADA are CG, GLES and GLSL.

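Element                  Description
<instance_camera>        Allows the <node> to instantiate cameras.
<instance_controller>    Allows the <node> to instantiate controllers.
<instance_geometry>      Allows the <node> to instantiate geometry objects.
<instance_light>         Allows the <node> to instantiate lights.
<instance_node>          Allows the <node> to instantiate other <node>s residing in the <library_nodes>.
<node>                   For recursive <node> elements.

Table 2.2: Important <node> child elements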

Having a <node> instantiate an object effectively places that object in world coordinates, since that object will be transformed by the accumulated transforms of all <node>s above it.

Transformations

A transformation, as used in a scene-graph node, is essentially just a 4x4 matrix. But a matrix is not always the best way to represent a transformation, and probably not the most intuitive one. The transformation elements in COLLADA are <lookat>, <matrix>, <rotate>, <scale>, <skew> and <translate>. All these elements can be converted to 4x4-matrix form. To compute the final transform of a <node> element, the transformations should be converted to matrices and post-multiplied in the order they are specified. This is sometimes referred to as baking the transformations.
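The baking step can be sketched as follows. This is a minimal illustration and not the thesis code: the class name TransformBaker, the row-major float[16] layout and the column-vector convention are all assumptions made for the example.

```java
import java.util.List;

/** Hypothetical helper: bakes a list of 4x4 transforms into one matrix. */
final class TransformBaker {

    /** Returns a new 4x4 identity matrix in row-major order. */
    static float[] identity() {
        float[] m = new float[16];
        m[0] = m[5] = m[10] = m[15] = 1f;
        return m;
    }

    /** Builds a translation matrix (column-vector convention, assumed). */
    static float[] translate(float x, float y, float z) {
        float[] m = identity();
        m[3] = x; m[7] = y; m[11] = z;
        return m;
    }

    /** Builds a non-uniform scale matrix. */
    static float[] scale(float x, float y, float z) {
        float[] m = identity();
        m[0] = x; m[5] = y; m[10] = z;
        return m;
    }

    /** Computes the matrix product a * b. */
    static float[] multiply(float[] a, float[] b) {
        float[] r = new float[16];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += a[row * 4 + k] * b[k * 4 + col];
                }
                r[row * 4 + col] = sum;
            }
        }
        return r;
    }

    /** Post-multiplies the transforms in the order they are specified. */
    static float[] bake(List<float[]> transforms) {
        float[] result = identity();
        for (float[] t : transforms) {
            result = multiply(result, t);
        }
        return result;
    }

    /** Transforms the point (x, y, z, 1) and returns the resulting xyz. */
    static float[] transformPoint(float[] m, float x, float y, float z) {
        return new float[] {
            m[0] * x + m[1] * y + m[2] * z + m[3],
            m[4] * x + m[5] * y + m[6] * z + m[7],
            m[8] * x + m[9] * y + m[10] * z + m[11]
        };
    }
}
```

Under these conventions, baking a translation followed by a scale yields the matrix T * S, so a point is scaled first and then translated.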

2.5 Rendering

Rendering is the process of taking the data representation of a scene or object and converting it to a viewable image. In a real-time rendering system this is done many times per second, preferably at least 30, although the required rate depends on the application. Rendering a scene-graph involves traversing the graph. The transforms leading up to the current graph node are accumulated, and the instance elements in each node are interpreted and rendered by the underlying hardware in a best-effort manner. See brief implementation details in 3.4.

2.5.1 Resources and instructions

When examining the COLLADA format, it was found that the data encountered in the libraries and scenes could be divided into two categories: resources and instructions. This conclusion was drawn through collaborative work between the author of this thesis, Tomas Karlsson and Lasse Wedin [3]. Resources are basically everything that lies in a library, or everything that can be instantiated in a <node> element: geometries, cameras, lights and so on. An exception to this is the node, which is treated partly as a resource, and partly as an instruction.


While nodes can exist in a library, they can also exist only in the visual_scene. Furthermore, it is only the transformation in the node that is treated as a resource. Instructions, on the other hand, are what give the information about what to do in a <node> when traversing the scene graph. They tell us what to instantiate and, together with the transform, where to put it. Generally, rendering a COLLADA scene means following a set of instructions and drawing the resources referenced by them.

Chapter 3

Implementation

The following chapter deals with the specific methods used in the implementation of the COLLADA client.

3.1 JAXB

JAXB works by first compiling a source schema into a package of Java classes called the bind package (see 2.1.1). The compiler used is the XML Java Compiler (XJC). Secondly, the JAXB API is used to load the XML content into the in-memory object representation described by the bind package. A schematic picture of this is presented in figure 3.1.

Figure 3.1: The left part of the figure shows how the XJC compiler generates the bind package classes. The right part shows how a COLLADA document can be loaded into a Java class representation, or created from a set of bind package class instances. The dotted arrow symbolizes that the bind package is used in the Marshalling and Unmarshalling processes.

The COLLADA XML schema is the formal definition of the format, and is the schema used in this project for compiling a bind package. The schema is just over 10 000 lines long, and takes some customization to be compiled by the XJC compiler to yield the desired results.

3.1.1 JAXB customization

XJC works by looking through the XML schema definition and trying to bind the components of the source schema to Java classes. This sometimes results in conflicts between XML elements with the same name, or in unwieldy generated class names. Since some customization had to be made, a bind file was supplied to the XJC compiler when compiling the schema into classes. The types of custom bindings are listed and explained in the four following sections, and the code examples are taken from the bind file used when compiling the schema.

Global bindings

Custom bindings done globally for the whole schema include altering the binding of the XML types long and unsignedLong to Java's int, and double to float. Any COLLADA element using the COLLADA float type (which is really mapped to an XML double according to the COLLADA schema, see footnote 1) would otherwise be represented as a double, so this binding reduces the memory usage considerably. This is especially true for geometry, which uses the float type for representing all vertex data. Furthermore, graphics cards only work with 32 bits internally, so the 64-bit precision would be lost in the rendering pipeline anyway. The long and unsignedLong bindings are also there to conserve memory. Since these types are most often used as indices into arrays, an int, which can index up to 2^31 - 1 (about 2.1 billion) array elements, was deemed fully sufficient.

Another parameter to customize in the global scope is which Java collection type is used for XML list data types. This is implementation-specific, but in the case of JAXB, the chosen type must implement java.util.List. The bind file for this project uses java.util.ArrayList.

Local bindings

The local bindings are mostly used to create shorter class and method names in the generated bind package. While some class and method names were customized for more or less aesthetic reasons, one name collision in the <controller> element really had to be solved for the schema to compile: both the <skin> and <morph> elements had name conflicts, since these elements both have <source> elements and a source attribute. This was fixed by addressing the node using XPath syntax, and then setting the property name to resolve the conflict. Following is the code that does just that.

1 http://www.collada.org/2005/11/COLLADASchema.xsd


Primitive type bindings

Some of the COLLADA primitive types were bound so as to make use of the Vecmath library (see 3.5.2). For instance, the COLLADA float3 type extends a ListOfFloats in the schema. That would result in any COLLADA element using a float3 containing a List, when what is really desired is to use the Float3 interface of the Vecmath library. Since the COLLADA float type was bound to Java floats, the float3 was bound to the Vector3f type (see footnote 2) in the Vecmath library. The following code example is for binding the type float3 of the COLLADA schema to a Vector3f. The methods parseFloat3 and parseString have been implemented in the class COLLADAParser. These methods have to be specified in the binding file, as they are used in the unmarshalling and marshalling processes respectively.

In this manner, all 2-, 3- and 4-vectors were bound to make use of their corresponding FloatNf interface in the Vecmath library. Also, the float4x4 type was bound to be represented by the Matrix4x4f, which is a much more useful form than the list of 16 floats it would otherwise have been.
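A parse/print pair of the kind referenced from a bind file could look roughly like the sketch below. It is not the COLLADAParser code: the Vec3 class is a stand-in for the Vecmath Vector3f, and the names Float3Codec and printFloat3 are invented for the illustration.

```java
/** Stand-in for the Vecmath Vector3f; the real class is not shown in the text. */
final class Vec3 {
    final float x, y, z;

    Vec3(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

/** Hypothetical sketch of a parse/print pair for the float3 type. */
final class Float3Codec {

    /** Parses a whitespace-separated XML list of exactly three floats. */
    static Vec3 parseFloat3(String text) {
        String[] parts = text.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 floats, got " + parts.length);
        }
        return new Vec3(Float.parseFloat(parts[0]),
                        Float.parseFloat(parts[1]),
                        Float.parseFloat(parts[2]));
    }

    /** Prints the vector back into XML list form, as needed for marshalling. */
    static String printFloat3(Vec3 v) {
        return v.x + " " + v.y + " " + v.z;
    }
}
```

The parse method would be used when unmarshalling and the print method when marshalling, so a value survives a round trip through both.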

Enum bindings

Two special cases of COLLADA enumeration types also had to be given custom names. This was to resolve the conflict of them having illegal characters in the enum name or, in the example given, an illegal starting character for a Java enum identifier.

3.1.2 JAXB usage

The now customized bind package created by the XJC compiler is used by JAXB as the data structure in which to store the parsed XML (COLLADA) data. The bind package is supplied when creating a new instance of a JAXBContext. That way, JAXB knows how and where to store the parsed XML data. A JAXBContext is what is needed for creating a Marshaller and/or an Unmarshaller instance for the marshalling and unmarshalling processes described in 2.1.1.

2 Where f stands for float


Using the unmarshal() method of an Unmarshaller returns an instance of the bind package root element, which of course corresponds to the root element of a COLLADA document, a <COLLADA> element. For marshalling, a root element (an instance of a COLLADA object) is supplied to the Marshaller's marshal() method, which marshals the element and all its children into a valid XML file as defined by the schema used when compiling the bind package. For a full example of unmarshalling see Appendix A. It is also possible to validate the document to unmarshal against a source schema (see 2.1 for schema validity). This is done by specifying a schema to the unmarshaller before unmarshalling. See the comment in the code of Appendix A.

3.2 COLLADA Client

The COLLADA client package is the programmer's interface for reading and parsing COLLADA files. The following is a description of the architecture and inner workings of the client.

The COLLADAIo read() method takes an InputStream as argument and returns a COLLADA root element, that is, an instance of the bind package class COLLADA. That instance is supplied to the constructor of the COLLADAHandler, which immediately loads the libraries in the COLLADA class. Loading the libraries accomplishes two things. Firstly, a list of resources is created, containing every resource in the scene, ready to be sent to the rendering server. The resources are in the form of a GenericResource, described further in 3.6. Secondly, the elements residing in the COLLADA libraries are stored in hashmaps declared as HashMap<String, resourcetype>, where resourcetype is one of the types described in 2.4.2, and the String is the unique id tag (see 2.4).

The hashmaps give the user easy access to all library resources for editing their content. The resources still exist in the libraries of the COLLADA root element, but a hashmap lookup on the unique id immediately returns the resource. There is a general (and quite slow) method for searching through all hashmaps, but also get() methods for each map that can be used if the type of resource is known. Some of the resource hashmaps are also used in certain private methods. One example is when the material library has to be traversed to initialize the effects. Since the values() method of a hashmap returns a Collection, the enhanced for-loop of Java 5 makes it very simple to traverse the map.

There are three main methods in the COLLADAHandler: getResources(), traverseScene() and evaluateScene(), covered in 3.2.1 and 3.2.2.
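The id-keyed maps might be organized along the lines of the following sketch. The element classes and the LibraryIndex class are hypothetical stand-ins for the bind package types and the COLLADAHandler internals, which are not shown in the text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins for two bind-package element types. */
final class LightElement {
    final String id;
    LightElement(String id) { this.id = id; }
}

final class CameraElement {
    final String id;
    CameraElement(String id) { this.id = id; }
}

/** Sketch of id-keyed library maps, one map per resource type. */
final class LibraryIndex {
    private final Map<String, LightElement> lights =
        new HashMap<String, LightElement>();
    private final Map<String, CameraElement> cameras =
        new HashMap<String, CameraElement>();

    void add(LightElement light) { lights.put(light.id, light); }
    void add(CameraElement camera) { cameras.put(camera.id, camera); }

    /** Typed lookup: a single hash lookup when the resource type is known. */
    LightElement getLight(String id) { return lights.get(id); }
    CameraElement getCamera(String id) { return cameras.get(id); }

    /** General lookup: tries each map in turn, so it does more work. */
    Object find(String id) {
        Object hit = lights.get(id);
        return hit != null ? hit : cameras.get(id);
    }

    /** Traverses one map with the enhanced for-loop over values(). */
    int countLights() {
        int n = 0;
        for (LightElement light : lights.values()) {
            if (light != null) {
                n++;
            }
        }
        return n;
    }
}
```

With many resource types, the general find() has to probe every map, which is one plausible reason to prefer the typed getters whenever the type is known.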

3.2.1 Resources and instructions revisited

The methods getResources() and traverseScene() are used for retrieving the information needed by the rendering server. The getResources() method returns the list of resources generated when the libraries are read. That list should then be passed on to the rendering server, where the resources should reside. Worth noting again is that the transforms also have to be treated as resources (see 2.5.1). A complete list of what has to be sent to the rendering server is given in table 3.1. The table also distinguishes between non-animatable (static) and animatable resources, since the animatable resources have to be re-sent to the server each time the scene is evaluated.

Resource Animatable Non-animatable x x x x x x -transforms x

Table 3.1: Resources to be sent to the rendering server

The traverseScene() method traverses the current visual scene, and creates an instruction for every instance element it finds in a <node>. The instructions always contain references to the unique id of a resource. A standard instruction contains a list of transform ids, and the ids of the resource(s) to be rendered at that position in world space. The list of transform ids is a result of the graph traversal. As mentioned in 2.4.3, a node should inherit the transform of its parent, and that is what happens with the id list: the id of the current <node> is appended to the list of ids leading to it. The list is in effect a representation of a transformation stack. An example of a typical rendering instruction for a mesh is one that contains the transform id stack, the geometry id and the material id to be used on the geometry. Another example is a lighting instruction, which simply supplies the transform id stack and a light id.
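The way the transform id stack grows during traversal can be sketched as below. The SceneNode, Instruction and SceneTraverser classes are hypothetical simplifications: a real instruction for a mesh would, for instance, also carry a material id, and the actual traversal works on bind package classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scene node: a transform id, instanced resource ids, children. */
final class SceneNode {
    final String transformId;
    final List<String> instanceIds = new ArrayList<String>();
    final List<SceneNode> children = new ArrayList<SceneNode>();

    SceneNode(String transformId) { this.transformId = transformId; }
}

/** Hypothetical instruction: the transform id stack plus one resource id. */
final class Instruction {
    final List<String> transformIds;
    final String resourceId;

    Instruction(List<String> transformIds, String resourceId) {
        this.transformIds = transformIds;
        this.resourceId = resourceId;
    }
}

final class SceneTraverser {

    /** Collects one instruction per instanced resource found in the graph. */
    static List<Instruction> traverse(SceneNode root) {
        List<Instruction> out = new ArrayList<Instruction>();
        visit(root, new ArrayList<String>(), out);
        return out;
    }

    private static void visit(SceneNode node, List<String> parentStack,
                              List<Instruction> out) {
        // Copy the ids leading to this node and append the node's own id.
        List<String> stack = new ArrayList<String>(parentStack);
        stack.add(node.transformId);
        for (String id : node.instanceIds) {
            out.add(new Instruction(stack, id));
        }
        for (SceneNode child : node.children) {
            visit(child, stack, out);
        }
    }
}
```

Copying the list at each level keeps sibling sub-graphs from seeing each other's transform ids, at the cost of some extra allocation.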

3.2.2 Animating

The third method in the COLLADAHandler, evaluateScene(t), is for evaluating the animations in the scene at a given time t. Evaluating the scene is done by taking the animation elements (i.e. animation curves) in the animations library hashmap and evaluating them at the time specified. Then the objects that the curves are connected to are updated. This also requires the updated resources to be re-sent to the server. See table 3.1 for the resources that have to be updated after evaluation of the scene.

The COLLADA documentation states that COLLADA recognizes the following interpolation types: linear, cardinal, bezier, hermite, bspline and step. The documentation does not, however, state anywhere how the different interpolation schemes are to be used. This makes it hard for developers of exporters and importers to interpret a DCC tool's representation of an animation keyframe. For example, it might not be clear how the tangent information should be used. Linear and bezier interpolation of keyframes are implemented, but the bezier tangent values exported from the COLLADAMaya exporter used are seemingly strange. Several methods of bezier tangent interpretation have been tested, including ones mentioned in [7] using Bernstein polynomials, and the one in [8]. This is discussed further in 4.1.4.
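For reference, the two implemented interpolation types can be sketched in one dimension as below. This is an illustration only: the class and method names are invented, the Bezier evaluation is the standard cubic Bernstein form and not necessarily the variant from [7] or [8], and the mapping from the time t to the curve parameter s is left out.

```java
/** Hypothetical one-dimensional keyframe interpolation helpers. */
final class KeyframeInterpolation {

    /** Linear interpolation between two key values, s in [0, 1]. */
    static float linear(float p0, float p1, float s) {
        return (1f - s) * p0 + s * p1;
    }

    /**
     * Cubic Bezier in Bernstein form. p0 and p3 are the key values,
     * p1 and p2 the control values derived from the tangents; s in [0, 1].
     */
    static float bezier(float p0, float p1, float p2, float p3, float s) {
        float u = 1f - s;
        return u * u * u * p0
             + 3f * u * u * s * p1
             + 3f * u * s * s * p2
             + s * s * s * p3;
    }
}
```

When the four control values are evenly spaced, the cubic reduces to the straight line between the two keys, so the outcome depends heavily on how exported tangents are turned into the control values p1 and p2.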

3.3 Client parser classes

This is an overview of the parser classes of the COLLADA client. They are all used by the COLLADAHandler when loading the libraries. The basic functionality and some details are explained for each class. Additionally, the parsers for animatable elements (see table 3.1) have helper methods that are used for fast access to the parameter(s) to animate.

3.3.1 COLLADAParser

This is the only parser used outside the COLLADAHandler, since it is the main parser used by JAXB. It contains parse and print methods as needed by JAXB (see Primitive bindings, page 18). It also has methods for baking the transformation elements of the COLLADA <node>.

3.3.2 AnimationParser

The AnimationParser reads and stores animation data (animation curves) in special hashmaps for easy access by the evaluateScene() method of the COLLADAHandler.

3.3.3 CameraParser

A <camera> element contains information about projections, field of view (fov) and near and far clipping planes. These attributes are parsed and stored as a generic resource. The variables in this GenericResource can be used by the server to compute the projection matrix for the scene. The transformation stack leading up to the camera in the node tree is used to compute the viewing matrix. There are some problems that arise when using the camera element(s), but they are left for discussion in 4.1.3.

3.3.4 ControllerParser

The name says it all, but the parser is currently only a shell for parsing controller information. Getting it to a working state was not prioritized because the existing server kernels do not implement skinning or morphing yet. The parser shell is in place though, ready to be filled in when needed, and it is easy to put the information in a GenericResource.

3.3.5 EffectParser

This parser is where the effects in the libraries are read. Currently, only the common technique is parsed. This means that only the fixed-function pipeline components contained in the Common Profile are read: ambient, emissive, diffuse and specular colors/textures, as well as shininess and reflectivity components.


All these components are put into a GenericResource. As mentioned, many of the components can be either a color or a texture map, which means that the type of the GenericResource has to be carefully specified. In the ResourceConstants class, a base type is declared. Depending on the content of, say, the diffuse component, a constant is selected and appended to declare the correct type of the GenericResource. For instance, ResourceConstants.DIFFUSE + ResourceConstants.MAP as type in a GenericResource would hint that the resource contains a texture map for the diffuse component, the + of course being a String concatenation operation. A diffuse color component would instead have the type ResourceConstants.DIFFUSE + ResourceConstants.COLOR.
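The selection of a concatenated type descriptor could be sketched as follows. The constant values, the class names and the rule that a String payload denotes a texture reference are all assumptions for this example; the actual ResourceConstants values are not given in the text.

```java
/** Hypothetical constants; the real string values are not given in the text. */
final class EffectTypeConstants {
    static final String DIFFUSE = "diffuse";
    static final String SPECULAR = "specular";
    static final String MAP = ".map";
    static final String COLOR = ".color";
}

final class EffectTypeSelector {

    /** Picks the type descriptor based on what the component actually holds. */
    static String typeFor(String component, Object value) {
        // Assumption: a texture is referenced by an image id (a String),
        // while a color is stored as an array of floats.
        boolean isTexture = value instanceof String;
        return component
            + (isTexture ? EffectTypeConstants.MAP : EffectTypeConstants.COLOR);
    }
}
```

A server kernel can then compare the descriptor against the same concatenation to decide whether to bind a texture or set a color.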

3.3.6 GeometryParser

The GeometryParser has a method for parsing a <geometry> element and packaging it into a GenericResource. Only polygon meshes are parsed; splines were left out since they are very rarely used. The meshes are put into GeometryResources, a special data container for carrying mesh information. A valid mesh can be made out of <triangles> or <polygons>. This was an implementation design choice, not a limitation of COLLADA (refer to 2.4.2 on page 12 for a complete list of mesh primitives).

<triangles> primitives were trivial to parse and send to the server, since all rendering APIs use triangles in some form, and since the triangle is the ultimate polygon primitive: its vertices unambiguously define a plane, and it is always convex. The polygon primitive type was harder to parse, partly because a polygon is not limited to a certain number of vertices, but also because a polygon can contain holes and is not guaranteed to be convex. This left some choices of how to interpret polygon information, and what to allow. The options were:

1. Triangulate on the client side.

2. Demand triangulated data.

3. Create a GeometryResource container that exactly maps the <polygons> element of COLLADA to a serializable resource, and do what is needed on the server side.

4. Allow quads and triangles, but require each mesh to be made homogeneously out of one of the two primitive types.

Since there was neither time nor any immediate need to implement a triangulation algorithm, the last alternative was chosen. The second option was used in the beginning of the project, and is quite reasonable, since triangulation is easily done in the DCC tool. The final choice fell on the last alternative because most real-time rendering APIs (OpenGL and DirectX) can render both quads and triangles. Exceptions are thrown to inform the user about the limitations imposed on the polygon primitive.
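The homogeneity requirement of the chosen alternative can be illustrated with a small check like the one below. It is a sketch under assumptions: the class name, the use of an int array of per-polygon vertex counts and the exception type are all invented here and need not match the GeometryParser.

```java
/** Hypothetical validation of a polygon mesh before packaging it. */
final class PolygonCheck {

    /**
     * Returns 3 or 4 if every polygon has exactly that many vertices.
     * Throws if the mesh is empty, mixes primitive sizes, or contains
     * polygons that are neither triangles nor quads.
     */
    static int primitiveSize(int[] vertexCounts) {
        if (vertexCounts.length == 0) {
            throw new IllegalArgumentException("mesh contains no polygons");
        }
        int size = vertexCounts[0];
        if (size != 3 && size != 4) {
            throw new IllegalArgumentException(
                "only triangles and quads are supported, found a polygon with "
                + size + " vertices");
        }
        for (int count : vertexCounts) {
            if (count != size) {
                throw new IllegalArgumentException(
                    "mesh mixes polygons with " + size + " and "
                    + count + " vertices");
            }
        }
        return size;
    }
}
```

The returned size could then select between a triangle and a quad draw mode on the server side.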


3.3.7 LightParser

The LightParser parses any <light> element and puts the lighting properties in a GenericResource. Light properties include color, attenuation and light cone angle (for spotlights).

3.4 Rendering server

Working on the rendering server was not initially a part of this work. It was developed as a shorter thesis project by Lasse Wedin [3]. When that project was finished, it matched the COLLADA client at the time, but since the client went into further development, the server had to be worked on as well. Most of that work consisted of cleaning up the existing code and keeping just the essentials, and then, together with Tomas Karlsson, implementing a working set of generic resource containers (as described in 3.2) to transfer data from the client to the server.

The server is described in more detail in [3], but basically works by creating a Renderer and loading different Kernels. A Kernel is what interprets the instructions and resources, so it can be seen as the rendering engine. It takes the instructions it gets, looks up the appropriate resource(s) and uses the chosen graphics API to render the instruction. The server has been tested both with an OpenGL kernel via JOGL (see footnote 3) and with a DirectX kernel (see footnote 4). By implementing other Kernels, different rendering results can be achieved, which is also described in [3].

3.5 Data structures

The client, and the server too for that matter, relies heavily on the java.util.Map interface, more specifically the HashMap implementation. As mentioned in 2.4, all COLLADA elements have unique id attributes, which are used as the key when inserting an element/resource into a hashmap. A hashmap makes it possible to look up a specific value in O(1) expected time, provided the hash function distributes the elements evenly over the buckets. Other important structures are described in the sections below.

3.5.1 Bind package

The bind package generated by the XJC compiler is the main data structure used by the client. It consists of 194 classes plus 3 extra Adapter classes due to the global rebind of the primitive types done in 3.1.1. The classes in the bind package are used to represent the XML elements of the COLLADA document. The bind package is thus the structure in which all unmarshalled COLLADA content will be stored.

3 https://jogl.dev.java.net/
4 Thanks to some late night ninja programming by Tomas Karlsson.


3.5.2 Vector math library

Tomas Karlsson re-designed the vector math library used in the current AgentFX, to get a fresh start with the new engine and to take advantage of the static import feature of Java 5. The library is used for the basic storage of graphics primitives, i.e. vectors and matrices. It contains math classes and basic functions needed when working with 3D graphics, such as vector multiplication, cross and dot products and so on. It is programmed using an interface pattern, with base interfaces for types like Float2, Float3, Float4 and Float4x4. Those interfaces can in turn be implemented as, for example, Vector2f or Vector2d while retaining the same functionality. When using the library, the interfaces are always used as return or input types in methods or functions. The concrete implementation of the interface is decided only when a new instance of the object is created.
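The interface pattern can be reduced to the following sketch. None of these names come from the actual library: Triple plays the role of an interface such as Float3, and TripleF and TripleD stand in for single- and double-precision implementations.

```java
/** Hypothetical reduction of the interface pattern; not the actual library API. */
interface Triple {
    float x();
    float y();
    float z();
}

/** Single-precision storage. */
final class TripleF implements Triple {
    private final float x, y, z;

    TripleF(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    public float x() { return x; }
    public float y() { return y; }
    public float z() { return z; }
}

/** Double-precision storage behind the same interface. */
final class TripleD implements Triple {
    private final double x, y, z;

    TripleD(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }

    public float x() { return (float) x; }
    public float y() { return (float) y; }
    public float z() { return (float) z; }
}

/** Functions take and return the interface, never a concrete class. */
final class TripleMath {

    static float dot(Triple a, Triple b) {
        return a.x() * b.x() + a.y() * b.y() + a.z() * b.z();
    }

    static Triple cross(Triple a, Triple b) {
        return new TripleF(a.y() * b.z() - a.z() * b.y(),
                           a.z() * b.x() - a.x() * b.z(),
                           a.x() * b.y() - a.y() * b.x());
    }
}
```

Because the functions only see the interface, the two implementations can be mixed freely in one call; the choice of storage is made solely where the object is constructed.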

The library also contains basic 3D graphics transformation operations, such as setting translation, scale and rotation of a matrix. The author has implemented methods for converting from other representations of transformations to matrices. More specifically, there are converter methods from all of the COLLADA transform-type elements mentioned on page 14.

3.6 Communication

Currently, to get a working demo application, both client and server are running locally, so the client communicates information to the server directly via class objects. The resource class, GenericResource.java, is designed to be a generic container of resource data, not strictly bound to the current COLLADA version, which might very well change. It is actually not bound to any digital asset format. It could in theory be used as a resource container for data parsed from an .obj file, for instance, as long as the resource type is well defined in the ResourceConstants and there exists a rendering kernel that can handle that resource.

The GenericResource has an id String, a String type descriptor, and room for a data Object. The type descriptor is what defines the content in the data field. Legal data types are described by resource constants. They are defined in ResourceConstants.java as public static final Strings, and are used both by the client when packaging data into a GenericResource, and by the server when interpreting the GenericResource. Enums are not used for ResourceConstants since an enum class cannot be extended, while an ordinary class can. The extensibility is desired if someone wants to use the engine, but with other resource types.
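A container with this shape, together with an extended constants class, might look like the sketch below. All class names, constant values and the payload layout are assumptions made for the illustration; they are not taken from GenericResource.java or ResourceConstants.java.

```java
/** Hypothetical base constants; the values are illustrative only. */
class CoreResourceConstants {
    public static final String LIGHT = "light";
    public static final String MESH = "mesh";
}

/** An ordinary class can be extended with new resource types; an enum cannot. */
class TerrainResourceConstants extends CoreResourceConstants {
    public static final String HEIGHTMAP = "heightmap";
}

/** Sketch of a generic container: unique id, type descriptor, opaque payload. */
final class ResourceSketch {
    private final String id;
    private final String type;
    private final Object data;

    ResourceSketch(String id, String type, Object data) {
        this.id = id;
        this.type = type;
        this.data = data;
    }

    String getId() { return id; }
    String getType() { return type; }
    Object getData() { return data; }
}

/** Server side: the type descriptor decides how the payload is interpreted. */
final class KernelSketch {

    static String describe(ResourceSketch r) {
        if (CoreResourceConstants.MESH.equals(r.getType())) {
            // Assumption: a mesh payload is a flat xyz float array.
            float[] vertices = (float[]) r.getData();
            return "mesh with " + (vertices.length / 3) + " vertices";
        }
        return "unhandled resource type: " + r.getType();
    }
}
```

The cast in the kernel is only safe because client and server agree on what each type descriptor means, which is why the constants class has to be kept well documented.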

As stated above, the client and server are currently run locally. The plan with the client-server architecture is to be able to have a client on one machine and the rendering server running on a remote machine. The class objects then have to be serializable, for packaging and sending over a network layer. This problem was looked into briefly, but creating a resource so generic that any resource could be serialized and deserialized easily was deemed too big a task, and out of scope for this thesis.

Chapter 4

Evaluation

In this chapter, the functionality and performance of the thesis implementation are evaluated. Some test cases have been run, and they are analyzed and discussed below.

4.1 Functionality

4.1.1 Client-server architecture

The client-server architecture is solid. It is up to the programmer of the rendering server kernel to implement the desired rendering functionality. The important thing is that as much of the COLLADA data as possible is provided to the server. The system with resources and instructions solves the communication between client and server. The most difficult problem is to find as generic a way as possible to transfer the resources. Using a class (ResourceConstants) which defines the name and type of the data contained in each resource works, but it has to be maintained and referenced frequently when implementing a kernel. Earlier on, a system with a larger set of more specific resource containers was used. The resources were sent in a number of classes like LightContainer, MeshContainer, TransformContainer and so on. That solution was discarded as it did not allow arbitrary resources to be sent, only the ones currently implemented. The same limitation applies to the ResourceConstants system used in the current version, but there, implementing a new resource type is just a matter of declaring a new constant and documenting in a comment which type of data it should contain.

4.1.2 Parsing COLLADA libraries

Generally, the hardest thing about parsing a COLLADA library is to accommodate the intrinsic structure of the format. The input semantics and the sometimes seemingly endless variation of child elements are there for one purpose: to allow the format to contain as many and as varied digital assets as possible. Correctly traversing the XML hierarchy, doing it as generally as possible, and handling every possible instance of child elements is needed to go from the very general COLLADA format to something that is understandable by a rendering kernel. This is a reason why the work was limited to only the Common Profile, and partly why the mesh parsing was limited to <triangles> and <polygons>. The design of the parser structure is in place though, and as much compliance as possible or needed will be implemented in a new AgentFX.

4.1.3 Camera

One problem with using the camera attributes is that the projection matrix used in the server is computed from the height and width of the current viewport. The viewport does not have to match the x-fov (see footnote 1) and/or y-fov and/or aspect ratio of the camera data. A camera does not have to supply both x and y fov or an aspect ratio, but together with the dimensions of the current viewport, an aspect ratio can be computed. Without going into too much detail about the rendering server, it can be mentioned that the server is designed not to know which context (or viewport) it renders to. That is what makes it difficult to know which aspect ratio to use. It is of course also possible that there is no camera element, or that there are several cameras in a scene. Since the product of this thesis is not a complete scene-graph based engine, there is no easy way to move a camera around in a scene, and there is currently no way to choose an active camera among the cameras that might exist in a scene.
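When the viewport dimensions are known, a missing field of view can be derived from the other one. The sketch below uses the usual perspective relation tan(yfov/2) = tan(xfov/2) / aspect with aspect = width / height; this is an assumption about how a server might fill in the gap, not a description of the existing server, and the class and method names are invented.

```java
/** Hypothetical helpers for completing camera data from the viewport. */
final class CameraMath {

    /** Aspect ratio of the viewport, defined here as width / height. */
    static double aspectRatio(int viewportWidth, int viewportHeight) {
        return (double) viewportWidth / viewportHeight;
    }

    /** Derives the vertical fov in degrees from the horizontal fov and aspect. */
    static double yfovFromXfov(double xfovDegrees, double aspect) {
        double halfX = Math.toRadians(xfovDegrees) / 2.0;
        return Math.toDegrees(2.0 * Math.atan(Math.tan(halfX) / aspect));
    }
}
```

If the server does not know its viewport, as described above, this computation would have to happen wherever the viewport size is available.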

4.1.4 Animations

Animations are implemented and tested. The test scene uses a bezier interpolation method, but as mentioned in 3.2.2, it is not quite clear how the exported tangents should be used. No hints are given in either the COLLADA documentation or the COLLADAMaya exporter documentation. Currently, a bezier curve that results in a linear movement in the DCC tool used (Maya 7.0) appears to make the animation accelerate at the start and slow down near the next keyframe, when the speed should be constant all the way through. This is illustrated in figure 4.1.

4.1.5 Scene-graph

The scene-graph is fully functional, but the graph traversal is not made through recursion in a Node, but rather in the COLLADAHandler, which makes it impossible for an end user to make changes in the traversal of the tree. What is desired in the long run is for the recursion method to exist in the Node class. This would allow a customized traversal method to be implemented directly in the node, extending the default traversal behavior. Since the Node class belongs to the bind package, and all other bind package classes were left untouched, a choice was made not to modify the Node class either. This is discussed in 5.2, Future work.

1 Short for field of view


Figure 4.1: Testing of bezier interpolation between keyframes. The first and second bezier curves are taken from two of the tested interpolation methods.

So, as it is now, the graph traversal is done as an outer recursion: the COLLADAHandler has a method that takes a node and parses it, then calls itself recursively with any child nodes it finds in the current node.

4.2 Performance

Where are the bottlenecks when loading a COLLADA document? What takes the most time? All benchmarks in this section were run on a Dell Dimension 9150 with a P4 running at 3.0 GHz and 1 GB of RAM. The graphics card is an nVidia GeForce 6800 with 256 MB of memory.

4.2.1 Load time examples

Tables 4.1 and 4.2 show the output of two timed runs of COLLADAIo.read(), loading two different documents. The percentage is the share of the total time spent in that part of the reading process. Creating an instance of a JAXBContext usually takes around 4 seconds, but the context is declared as a singleton, which means that if another file is to be read, the context (and Unmarshaller) is already there. The size of the input file has no influence on the context creation time, which depends only on the size of the bind package. The creation time of the Unmarshaller instance varies slightly, but can also be considered constant.
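The effect of keeping the context as a singleton can be sketched with a stand-in class. CostlyContext and ContextHolder are hypothetical; the real code would hold a JAXBContext, and how the thesis implementation declares its singleton is not shown in the text.

```java
/** Stand-in for an object that is expensive to create, such as a JAXBContext. */
final class CostlyContext {
    static int creations = 0;

    CostlyContext() {
        // In the real case this is where the multi-second setup would happen.
        creations++;
    }
}

/** Lazily creates the context once and hands out the same instance afterwards. */
final class ContextHolder {
    private static CostlyContext instance;

    static synchronized CostlyContext get() {
        if (instance == null) {
            instance = new CostlyContext();
        }
        return instance;
    }
}
```

With this arrangement only the first read pays the creation cost; later reads are dominated by the unmarshalling step.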


Step                    Time        Share
Create JAXBContext      4000 ms     76.98%
Create unmarshaller      120 ms      2.31%
Unmarshal               1076 ms     20.71%
Total                   5196 ms    100%

Table 4.1: Read times for medieLab.dae (1.3 MB); a view of the rendered scene can be seen in fig. 4.2

Create JAXBContext:     4172 ms     14.604775%
Create unmarshaller:     150 ms     0.52509975%
Unmarshal:             24244 ms     84.870125%
Total:                 28566 ms     100%

Table 4.2: Read times for islandComplete.dae (73 MB). A view of the rendered scene can be seen in fig. 4.3.
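The percentages in the two tables are simply each phase's share of the total time. As a small worked example, the sketch below recomputes the shares for table 4.1. PhaseShares is a hypothetical helper written for this illustration and is not part of the thesis code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that turns per-phase times into shares of the total.
class PhaseShares {
    static Map<String, Double> shares(Map<String, Long> millis) {
        long total = 0;
        for (long ms : millis.values()) {
            total += ms;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : millis.entrySet()) {
            result.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Times taken from table 4.1 (medieLab.dae)
        Map<String, Long> medieLab = new LinkedHashMap<>();
        medieLab.put("Create JAXBContext", 4000L);
        medieLab.put("Create unmarshaller", 120L);
        medieLab.put("Unmarshal", 1076L);
        shares(medieLab).forEach((phase, pct) ->
                System.out.printf("%-20s %6.2f %%%n", phase, pct));
    }
}
```

For the small document the roughly constant context creation dominates, while for the large document the unmarshalling share grows with the file size.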

Figure 4.2: Screenshot of medieLab.dae

Figure 4.3: Screenshot of islandComplete.dae

Loading and parsing the libraries in islandComplete.dae takes just under 3 s, which is quite fast for that amount of data. (Note that the difference in file size from the islandComplete.dae used when testing the space costs is due to the included tangent and binormal vectors for each vertex. These vectors are not always used, but can be precalculated and exported by the COLLADAMaya exporter to relieve the CPU/GPU from computing them at runtime.) The island contains ∼365 000 triangles and ∼184 000 vertices distributed over the 55 geometries in the library. There are also 134 animation channels, 55 materials/effects (one for each geometry) and 88 images (of which 59 are actually referenced and read by the TextureIO, see below). The scene-graph furthermore contains 153 nodes with a corresponding transform, which are all baked into 4x4 matrices. The library loading process for the island results in 356 resources.

Afterward, the island scene is traversed, resulting in 59 rendering instructions. The traversal takes around 4-5 ms.
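Baking a node's transforms into a single 4x4 matrix can be sketched as below. This is an illustration under stated assumptions, not the thesis implementation: BakedTransform is a hypothetical helper, matrices are stored row-major, points are treated as column vectors, and the transforms are assumed to be composed by post-multiplication in the order they appear in the node. Only translation, scaling and rotation about the Z axis are included.

```java
import java.util.Arrays;

// Hypothetical helper for baking a node's transform elements into one
// row-major 4x4 matrix (column-vector convention: p' = M * p).
class BakedTransform {
    static double[] identity() {
        return new double[] {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1};
    }

    static double[] translate(double x, double y, double z) {
        return new double[] {1, 0, 0, x,  0, 1, 0, y,  0, 0, 1, z,  0, 0, 0, 1};
    }

    static double[] scale(double x, double y, double z) {
        return new double[] {x, 0, 0, 0,  0, y, 0, 0,  0, 0, z, 0,  0, 0, 0, 1};
    }

    static double[] rotateZ(double degrees) {
        double r = Math.toRadians(degrees);
        double c = Math.cos(r), s = Math.sin(r);
        return new double[] {c, -s, 0, 0,  s, c, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1};
    }

    // Matrix product a * b.
    static double[] multiply(double[] a, double[] b) {
        double[] m = new double[16];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                double sum = 0;
                for (int k = 0; k < 4; k++) {
                    sum += a[row * 4 + k] * b[k * 4 + col];
                }
                m[row * 4 + col] = sum;
            }
        }
        return m;
    }

    // Bakes the transforms in the given order by post-multiplying each one.
    static double[] bake(double[]... transforms) {
        double[] m = identity();
        for (double[] t : transforms) {
            m = multiply(m, t);
        }
        return m;
    }

    // Applies the matrix to the point (x, y, z, 1).
    static double[] apply(double[] m, double x, double y, double z) {
        return new double[] {
            m[0] * x + m[1] * y + m[2] * z + m[3],
            m[4] * x + m[5] * y + m[6] * z + m[7],
            m[8] * x + m[9] * y + m[10] * z + m[11]
        };
    }

    public static void main(String[] args) {
        // Scale first (innermost), then translate.
        double[] baked = bake(translate(1, 2, 3), scale(2, 2, 2));
        System.out.println(Arrays.toString(apply(baked, 1, 0, 0))); // prints [3.0, 2.0, 3.0]
    }
}
```

With the matrix baked once at load time, the traversal only has to pass one matrix per node on to the rendering instructions.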

Evidently, for a large document most of the load time goes into the unmarshalling process, since that is where the actual reading of the InputStream takes place.

Texture read times are not included in the unmarshalling, the library loading or the scene traversal. A texture is read by the rendering server the first time an image resource is referenced by an instruction, which is typically when an effect has a texture map assigned. The texture then resides as a resource on the server. Typical texture read times are presented in table 4.3. The textures in this case are 24-bit PNG files, read using the TextureIO library in JOGL. These times are only meant to give a feel for the order of magnitude of the time it takes to read a texture. They are not in any way exact figures.
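The read-on-first-reference behavior can be sketched as a small cache. The class below is hypothetical and only illustrates the idea; the loader function is a placeholder for the actual texture reading on the server, which in the thesis is done with TextureIO.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical server-side cache: a texture is loaded the first time an
// instruction references its image, then kept as a resource.
class LazyTextureCache<T> {
    private final Map<String, T> loaded = new HashMap<>();
    private final Function<String, T> loader;
    private int loads = 0;

    LazyTextureCache(Function<String, T> loader) {
        this.loader = loader;
    }

    T get(String imageId) {
        T texture = loaded.get(imageId);
        if (texture == null) {
            texture = loader.apply(imageId);   // the expensive read happens only here
            loaded.put(imageId, texture);
            loads++;
        }
        return texture;
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        // Placeholder loader; a real one would decode the image file.
        LazyTextureCache<String> cache = new LazyTextureCache<>(id -> "texture(" + id + ")");
        cache.get("crate.jpg");
        cache.get("crate.jpg");
        System.out.println("loads: " + cache.loads()); // prints loads: 1
    }
}
```

A consequence of this design is that images which are never referenced by any instruction are never read, which matches the 59 of 88 images read for the island scene.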

.png size (px)    Time (ms)
1024 x 1024       500
512 x 512         100
256 x 256         20
128 x 128         5

Table 4.3: Approximate texture read times

4.2.2 Space costs

One argument against using COLLADA as a format for distribution is that it uses a lot of space, being a human-readable XML format. On the other hand, common data compression algorithms like the ones used in .zip are very good at compressing text. To test the difference in size between formats, the islandComplete scene was exported into three formats: .dae, .fbx and .mb. While .dae is a text-based format, both .fbx and .mb (Maya Binary) are binary formats. Their sizes were compared, after which they were all zip-compressed using WinRAR 3.20 with the highest compression setting, to compare the compression ratios. The results are presented in table 4.4.

File format    Original    Compressed    Ratio
.dae           57 643 K    15 831 K      27 %
.fbx           53 321 K    16 171 K      30 %
.mb            55 352 K    21 646 K      39 %

Table 4.4: File format size comparison

Granted, the .dae file is not much larger than the others to begin with, but standard compression reduces the difference significantly, to the extent that the compressed COLLADA file becomes smaller than the others. Taking into account that the scene also includes 22 MB of textures regardless of the file format used, the difference in distribution size between the formats is relatively small. And since COLLADA is such a flexible format, data that is not actually needed for a scene can be excluded from the export if further size savings are desired.
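The ratio column of table 4.4 is the compressed size divided by the original size. A comparable measurement can be sketched with the DEFLATE implementation in java.util.zip. This is not the WinRAR setup used for the measurements above, so absolute numbers will differ; CompressionRatio is a hypothetical helper, and the sample input merely imitates the repetitive numeric text found in a .dae file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper measuring how well a byte array compresses.
class CompressionRatio {
    // Returns compressed size / original size, using DEFLATE at best compression.
    static double ratio(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer, deflater)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end();   // release native resources of the custom deflater
        }
        return (double) buffer.size() / data.length;
    }

    public static void main(String[] args) {
        // Stand-in for the float arrays of a text-based geometry source
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            text.append("0.500000 -0.500000 ");
        }
        byte[] bytes = text.toString().getBytes(StandardCharsets.US_ASCII);
        System.out.printf("ratio: %.1f %%%n", 100.0 * ratio(bytes));
    }
}
```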

Chapter 5

Discussion

The purpose of this chapter is to discuss the results obtained from the implementation, and to see whether the purposes stated in 1.2 are fulfilled. A section discussing the future of the COLLADA engine is also presented.

5.1 Conclusion

The author has gone from no prior knowledge to a fairly good understanding of XML and data binding techniques. Working with the COLLADA format and the data it contains has also been a great experience, giving an understanding of a very broad set of topics in 3D graphics.

The less personal purpose of the thesis was to see if it was possible to take the COLLADA structure, with libraries and scene-graph, and map it to a real-time rendering engine.

The COLLADA library data can be parsed for direct usage with relative ease. The biggest problem was choosing a general way to represent all resources. The COLLADA library resources are very generally specified, but different amounts of data are needed depending on the rendering kernel. The current solution, in which data objects carry a corresponding type and the server uses them at its own discretion, seems to be the most durable one.

A strength of the implementation, thanks to the use of JAXB and to COLLADA being an XML format, is that a user can choose to validate any file to be read against the COLLADA schema.

The scene-graph can also be used. Everything that is needed for placing objects and instantiating geometries, lights, materials and other objects is there. The system of generating rendering instructions by scene traversal and sending them to the server works very well. For now, the problem lies in interacting with the elements: adding and removing children, changing transforms and so on. It can be done, but the API structure of the bind package is not very user friendly. What can be said, though, is that the final product is a great non-interactive scene-graph, meaning that it can take an exported COLLADA document, including the scene-graph representation, and render it to look just like it does inside the DCC tool. But taking control of the nodes is so hard that it cannot be called a true scene-graph based engine as described in the scene-graph section (2.2). At least not yet.

This thesis work is going to be used as the basis for developing the next generation of AgentFX. Thus, the higher purpose of the thesis can also be considered fulfilled.

Animation is working, although the curve interpolation data seems incorrect. FeelingSoftware¹ came out with 4(!) new releases of their COLLADAMaya exporter during the summer. One of the things they have looked into is the export of animation curve tangents, so key-frame interpolation will probably work better with the new exporter.
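To show why the exported tangents matter, the sketch below evaluates one animation segment as a standard cubic Bézier curve in the (time, value) plane. It is a generic formulation and not necessarily one of the interpolation methods tested in this thesis. BezierKeyframes and its parameter names are hypothetical, and the control points are assumed to have time coordinates between the two keys so that time increases monotonically along the curve.

```java
// Hypothetical helper: interpolates between two keyframes with a cubic
// Bezier segment. Every point is given as {time, value}.
class BezierKeyframes {
    // Cubic Bezier polynomial for one coordinate, s in [0, 1].
    static double bezier(double p0, double c0, double c1, double p1, double s) {
        double u = 1.0 - s;
        return u * u * u * p0 + 3 * u * u * s * c0 + 3 * u * s * s * c1 + s * s * s * p1;
    }

    // key0/key1 are the keyframes, out0 is the out-tangent control point of
    // key0 and in1 is the in-tangent control point of key1.
    static double interpolate(double[] key0, double[] out0,
                              double[] in1, double[] key1, double time) {
        if (time <= key0[0]) return key0[1];
        if (time >= key1[0]) return key1[1];
        // Find the curve parameter s whose time coordinate equals 'time'
        // by bisection (valid because time is assumed monotonic in s).
        double lo = 0.0, hi = 1.0;
        for (int i = 0; i < 50; i++) {
            double mid = 0.5 * (lo + hi);
            if (bezier(key0[0], out0[0], in1[0], key1[0], mid) < time) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        double s = 0.5 * (lo + hi);
        return bezier(key0[1], out0[1], in1[1], key1[1], s);
    }

    public static void main(String[] args) {
        double[] key0 = {0.0, 0.0}, out0 = {0.5, 0.0};
        double[] in1 = {0.5, 1.0}, key1 = {1.0, 1.0};
        // An ease-in/ease-out segment sampled at a quarter of the interval
        System.out.println(interpolate(key0, out0, in1, key1, 0.25));
    }
}
```

If the exporter writes incorrect tangent control points, the keys themselves are still hit exactly, but the motion between them is distorted.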

5.1.1 COLLADA as a 3D engine format

COLLADA is an extremely broad container of data, which is good in a way, but it leaves a huge amount of work to implement rendering capabilities for all that content. Choosing carefully what to implement, and implementing things so that they can be extended in the future, is the key to making an engine based on COLLADA. Everything needed for representing all kinds of 3D data is there, and the format will continue to evolve. This makes the future very interesting for the new AgentFX.

5.2 Future work

Priority one towards a COLLADA compliant rendering engine is the development of a rendering server that is as complete as possible, supporting as many features of the format as possible. The first step is to include the CommonProfile, i.e. implementing complete fixed-function pipeline rendering capability. There should also be support for reading profiles other than the CommonProfile. There should then also exist kernels for each DCC tool that uses COLLADA in an extended form, namely using another profile. Those kernels should just be loaded into the renderer and used depending on the content supplied. For example, a company that has chosen to use COLLADA as a format only has to implement its own rendering kernel, able to render the content specific to its developer tools.
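One way to picture kernels that are loaded and selected by content is a simple registry keyed by profile name. The sketch below is purely illustrative: RenderKernelSketch and KernelRegistry are hypothetical names, "profile_CUSTOM" is a made-up vendor profile, and falling back to the common profile for unknown profiles is an assumption of this example rather than something the thesis prescribes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical rendering kernel interface; a real kernel would issue
// rendering calls instead of returning a string.
interface RenderKernelSketch {
    String render(String content);
}

// Hypothetical registry selecting a kernel from the profile of the content.
class KernelRegistry {
    static final String COMMON = "profile_COMMON";
    private final Map<String, RenderKernelSketch> kernels = new HashMap<>();

    void register(String profile, RenderKernelSketch kernel) {
        kernels.put(profile, kernel);
    }

    // Assumption: unknown profiles fall back to the common profile kernel.
    RenderKernelSketch forProfile(String profile) {
        RenderKernelSketch kernel = kernels.getOrDefault(profile, kernels.get(COMMON));
        if (kernel == null) {
            throw new IllegalStateException("no kernel registered for " + profile);
        }
        return kernel;
    }

    public static void main(String[] args) {
        KernelRegistry registry = new KernelRegistry();
        registry.register(COMMON, content -> "fixed-function: " + content);
        registry.register("profile_CUSTOM", content -> "vendor shader: " + content);
        System.out.println(registry.forProfile("profile_CUSTOM").render("crate"));
        System.out.println(registry.forProfile("profile_OTHER").render("crate"));
    }
}
```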

Some classes of the bind package, mainly the Node class, should be modified to contain a traversal method. The traversal method could then easily be overridden to let programmers implement custom scene-graph traversal. The node should also be rewritten to make interaction with the scene-graph easier, for example by implementing methods that make it easier to set transformations. That would make the COLLADA scene-graph work like a real scene-graph based engine, and like the scene-graph in the current version of AgentFX. And perhaps this little change in node functionality would punch a hole in the statement from the COLLADA documentation mentioned in the problem formulation (1.1): that COLLADA is not a game engine format.
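A traversal method that lives in the node and can be customized by subclasses might look like the sketch below. TraversableNode and SwitchNode are hypothetical classes, not the bind package Node, and the switch behavior (visiting only one selected child) is just one example of a custom traversal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node with a built-in, overridable traversal method.
class TraversableNode {
    final String id;
    final List<TraversableNode> children = new ArrayList<>();

    TraversableNode(String id) {
        this.id = id;
    }

    TraversableNode add(TraversableNode child) {
        children.add(child);
        return this;
    }

    // Default behavior: visit this node, then every child in order.
    void traverse(List<String> visited) {
        visited.add(id);
        for (TraversableNode child : children) {
            child.traverse(visited);
        }
    }
}

// Example customization: only the currently active child is traversed.
class SwitchNode extends TraversableNode {
    int active = 0;

    SwitchNode(String id) {
        super(id);
    }

    @Override
    void traverse(List<String> visited) {
        visited.add(id);
        if (active >= 0 && active < children.size()) {
            children.get(active).traverse(visited);
        }
    }

    public static void main(String[] args) {
        SwitchNode lod = new SwitchNode("lod");
        lod.add(new TraversableNode("high")).add(new TraversableNode("low"));
        lod.active = 1;
        TraversableNode root = new TraversableNode("root").add(lod);
        List<String> visited = new ArrayList<>();
        root.traverse(visited);
        System.out.println(visited); // prints [root, lod, low]
    }
}
```

The caller only ever invokes traverse on the root; each node type decides for itself how its children are visited.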

¹ http://www.feelingsoftware.com/


List of Figures

2.1 computerTable scene-graph
2.2 Table scene
2.3 An animation curve with visible curve tangents
2.4 Skeleton hierarchy
2.5 Corresponding humanoid skeleton
2.6 Resulting skinned mesh, striking a pose

3.1 The left part of the figure shows how the XJC compiler generates the bind package classes. The right part shows how a COLLADA document can be loaded into a Java class representation, or created from a set of bind package class instances. The dotted arrow symbolizes that the bind package is used in the Marshalling and Unmarshalling processes.

4.1 Testing of Bézier interpolation between keyframes. The first and second Bézier curves are taken from two of the tested interpolation methods.
4.2 Screenshot of medieLab.dae
4.3 Screenshot of islandComplete.dae

List of Tables

2.1 COLLADA Libraries
2.2 Important child elements

3.1 Resources to be sent to the rendering server

4.1 Read times for medieLab.dae (1.3 MB). A view of the rendered scene can be seen in fig. 4.2.
4.2 Read times for islandComplete.dae (73 MB). A view of the rendered scene can be seen in fig. 4.3.
4.3 Approximate texture read times
4.4 File format size comparison

Appendix A

Unmarshalling example

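    import java.io.InputStream;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import javax.xml.bind.Unmarshaller;

    // jc (the shared JAXBContext) and BINDING_PACKAGE are static members
    // of the enclosing class.
    public static COLLADA read( InputStream inputStream ) {
        COLLADA collada = null;

        try {
            // Creating the context is expensive, so it is done only once
            if ( jc == null ) {
                jc = JAXBContext.newInstance( BINDING_PACKAGE );
            }
            Unmarshaller u = jc.createUnmarshaller();
            // Validate against a null schema, that is, do not validate
            u.setSchema( null );
            collada = (COLLADA) u.unmarshal( inputStream );
        } catch ( JAXBException e ) {
            // Also reached if the context could not be created
            e.printStackTrace();
        }
        return collada;
    }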
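The setSchema() call above takes a javax.xml.validation.Schema, so passing a real Schema object instead of null would turn validation on during unmarshalling. How such an object is built and what validation does can be sketched with the JDK's own validation API, without JAXB. The schema in the sketch is a tiny toy schema invented for this example, not the COLLADA schema, and SchemaValidationSketch is a hypothetical class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaValidationSketch {
    // Toy schema (an assumption of this example): an <asset> element
    // that must contain exactly one <up_axis> child.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='asset'><xs:complexType><xs:sequence>"
        + "<xs:element name='up_axis' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Builds the Schema object; the same type is accepted by setSchema().
    static Schema toySchema() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(TOY_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("toy schema is malformed", e);
        }
    }

    // Returns true if the document is well-formed and valid.
    static boolean isValid(Schema schema, String xml) {
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // validation or parse error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = toySchema();
        System.out.println(isValid(schema, "<asset><up_axis>Y_UP</up_axis></asset>"));
        System.out.println(isValid(schema, "<asset/>"));
    }
}
```

For real documents the Schema would instead be created from the COLLADA schema file, at the cost of extra time during reading.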

Appendix B

Sample COLLADA file

This appendix contains a .dae file describing a simple textured cube located at the origin. Note that the cube is triangulated, but the geometry is described with the element.

Agency 9 Maya 7.0 | ColladaMaya v0.90 May 4 2006 at 09:20:32 Collada Maya Export Options: bakeTransforms=0;exportPolygonMeshes=1; bakeLighting=0;isSampling=0;curveConstrainSampling=0;exportCameraAsLookat=0; relativePaths=1;exportLights=1;exportCameras=1;exportJointsAndSkin=1; exportAnimations=1;exportTriangles=0;exportInvisibleNodes=0;exportNormals=1; exportTexCoords=1;exportVertexColors=1;exportTangents=0;exportTexTangents=0; exportConstraints=1;exportPhysics=1;exportXRefs=1;dereferenceXRefs=0; cameraXFov=0;cameraYFov=1 file://C|/workspace/xproject/colladaclient/data/cubeCrate.dae 2006-06-01T16:54:51Z 2006-06-01T16:54:51Z Y_UP crate.jpg 0 0 0 1.000000 0 0 0 1.000000

true true false false 1.000000 1.000000 0 0 0 false false 1.000000 1.000000 0 0 0 0 0 NONE 0 0 0 1.000000 1 -0.500000 -0.500000 0.500000 0.500000 -0.500000 0.500000 -0.500000 0.500000 0.500000 0.500000 0.500000 0.500000 -0.500000 0.500000 -0.500000 0.500000 0.500000 -0.500000 -0.500000 -0.500000 -0.500000 0.500000 -0.500000 -0.500000

0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 -1.000000 0 0 true 0 0 1.000000 0 0 1.000000 1.000000 1.000000 0 2.000000 1.000000 2.000000 0 3.000000 1.000000 3.000000 0 4.000000 1.000000 4.000000 2.000000 0

2.000000 1.000000 -1.000000 0 -1.000000 1.000000

0 0 0 1 1 1 2 2 2

1 3 1 3 4 3 2 5 2

2 6 2 3 7 3 4 8 4

3 9 3 5 10 5 4 11 4

4 12 4 5 13 5 6 14 6

5 15 5 7 16 7 6 17 6

6 18 6 7 19 7 0 20 8

7 21 7 1 22 9 0 23 8

1 24 1 7 25 10 3 26 3

7 27 10 5 28 11 3 29 3

6 30 12 0 31 0 4 32 13

0 33 0 2 34 2 4 35 13

0 0 1 0 0 1 0 0 1 0 0 0 0.041667 2.000000
