CR-PLAY "Capture-Reconstruct-Play: an innovative mixed pipeline for videogames development"
Grant Agreement: ICT-611089-CR-PLAY
Start Date: 01/11/2013 – End Date: 31/10/2016

Deliverable 4.2

Functional specifications, system definition, development approach and architecture


Document Information

Deliverable number: 4.2
Deliverable title: Functional specifications, system definition, development approach and architecture
Deliverable due date: 31/05/2014
Actual date of delivery: 31/05/2014
Main author(s): Ivan Orvieto, Matteo Romagnoli (TL)
Main contributor(s): George Drettakis, Jerome Esnault (INRIA), Jens Ackermann, Fabian Langguth, Michael Goesele (TUD), Corneliu Ilisescu, Gabriel Brostow (UCL)
Version: 1.0

Versions Information
Version  Date        Description of changes
0.1      02/04/2014  Structure and contributors
0.2      24/04/2014  First draft of state of the art of game development tools
0.3      08/05/2014  Update with requirements and first draft of architecture
0.4      12/05/2014  Capture, Reconstruction, IBR, VBR contributions
0.5      17/05/2014  Pre-final version produced and sent to partners for comments
1.0      28/05/2014  Final version

Dissemination Level
PU  Public  X
PP  Restricted to other programme participants (including the Commission Services)
RE  Restricted to a group specified by the consortium (including the Commission Services)
CO  Confidential, only for members of the consortium (including the Commission Services)

Deliverable Nature

R  Report  X
P  Prototype
D  Demonstrator
O  Other

CR-PLAY Project Information

The CR-PLAY project is funded by the European Commission, Directorate General for Communications Networks, Content and Technology, under the FP7-ICT programme. The CR-PLAY Consortium consists of:

Participant Number  Participant Organisation Name  Participant Short Name  Country
Coordinator
1  Testaluna S.R.L.  TL  Italy
Other Beneficiaries
2  Institut National de Recherche en Informatique et en Automatique  INRIA  France


3  University College London  UCL  UK
4  Technische Universitaet Darmstadt  TUD  Germany
5  Miniclip UK Limited  MC  UK
6  University of Patras  UPAT  Greece
7  Cursor Oy  CUR  Finland

Summary

This is the second deliverable from Work Package 4: "Design and Development (Requirements, Functional Specifications, and Prototypes)". The leader of this work package is TL, with involvement of all other partners. The objective of this work package is to gather end-user needs, form these into functional specifications and create the prototypes of the mixed pipeline for videogame development. This WP puts into practice the user-centred design approach adopted by CR-PLAY, ensuring that the technologies developed will result in tools that are effective and usable for professional and semi-professional use.

This deliverable - "D4.2 Functional specifications, system definition, development approach and architecture" - describes the results of Task 4.2 "Functional specifications for a Mixed Pipeline" and acts as a reference document for the development activities performed (in particular) in WP4. D4.2 is meant to be a living document that will follow the evolution of the project in terms of updated requirements, advancements in research activities, new and unexpected possibilities, and risks with related contingency actions.

The structure of this deliverable is as follows:

Section 1 presents the user requirements (from D4.1) with a prioritization based on an analysis of their relevance for the project, technical constraints and possibilities, and cost-benefit implications.

Section 2 is dedicated to the analysis of state-of-the-art game development tools, towards the identification of the best candidate to be adopted as the base technology for CR-PLAY. The aim is to make the capture-reconstruct process fully compatible with the selected tool, and to integrate the IBR and VBR technologies in it.

Based on the above, Section 3 describes the architecture and functional specifications of the innovative mixed pipeline that will be developed in CR-PLAY, with specific sections dedicated to the main phases: Capture, Reconstruct, Play.

Section 4 presents the main risks and related contingency actions that are foreseeable at this stage of the project. They will be constantly updated as development activities progress during the project lifetime.

Finally, Section 5 draws conclusions and describes next steps of development activities in WP4.


Table of Contents

Summary
Table of Contents
Abbreviations and Acronyms
Introduction
1. Requirements: analysis, categorization and prioritization
2. Analysis of state-of-the-art game development tools
  2.1 Game development tools categorization
    AAA tools
    All-in-one tools
    Open Source tools
  2.2 Analysis result: Unity 4 as preferred choice
3. Mixed pipeline: architecture and functional specifications
  3.1 General Architecture
  3.2 Capture
    General Description
    Specific Architecture
  3.3 Reconstruct
    General description
    Architecture
    Regularization
    User intervention
    Data
    Asset management/exchange
    Use cases
    Open issues
  3.4 (Edit and) Play
    Image Based Rendering
    Video Based Rendering
    Edit&Play Workflow within the Game Development Tool
4. Risks and contingency actions
5. Conclusions
  5.1 Plan for next period
References
Table of Figures
Table of Tables

Functional specifications, system definition, CR-PLAY Project no. 661089 Deliverable 4.2 development approach and architecture

Abbreviations and Acronyms

• IBR: Image Based Rendering
• VBR: Video Based Rendering
• DoW: Description of Work
• WP: Work Package
• GPS: Global Positioning System
• UCD: User Centred Design
• WYSIWYG: What You See Is What You Get
• EaaS: Engine-as-a-Service
• SDK: Software Development Kit
• FPS: First Person Shooter
• GPU: Graphics Processing Unit
• API: Application Programming Interface
• COLLADA: COLLAborative Design Activity
• XML: Extensible Markup Language
• MVS: Multi-View Stereo
• SfM: Structure from Motion
• ULR: Unstructured Lumigraph Rendering

Introduction

The development of the mixed pipeline for the creation of high quality videogame assets is a complex and somewhat unexplored process. It involves several different areas of research, integration of separate-but-related technologies, participation of final users (game developers) both for collecting requirements and performing evaluation activities, design of innovative activities (and related user interfaces) and, critically, the production of game prototypes that serve as proof-of-concept of the validity of the proposed approach. In the first six months of the project, the focus of WP4 has been on the collection of user requirements and the adoption of the most appropriate game development tool to be used as a technological hub during the project. In subsequent phases, the effort will be on creation, improvement, and integration of the different technologies that form the pipeline: guidance for capture, reconstruction tools, IBR and VBR techniques, and specific solutions to be developed within the chosen game development tool. These will allow game developers to personally experience and benefit from the resulting innovative approach to content creation. The UCD approach, together with the novelty that characterizes CR-PLAY, does not allow a fully comprehensive list of specifications to be formed at this stage; this deliverable is therefore a living document that will accompany development activities during the project lifetime. In the first half of the project, effort is mainly on integration and production of incremental prototypes. In the second half, the prototypes will improve gradually, with more attention to user interfaces and specific functionalities, based mainly on results of evaluation with game developers.

1. Requirements: analysis, categorization and prioritization

The CR-PLAY mixed pipeline is targeted mainly at game developers, enabling the creation of high quality videogames with reduced effort thanks to an innovative approach to content creation. The User Centred Design (UCD) approach followed in this project considers final users (game developers) a fundamental element of the equation; the user requirements collected (see D4.1) are therefore central to the technological choices, the approach to development, the features to be implemented, and the architecture of the system.


The following table presents a collection, discussion, categorization and prioritization of the user requirements from D4.1. It also provides comments to help readers understand the rationale behind the decisions on their relevance for the project and their prioritization.

User requirements table

Id | Requirement | Relevance (low, med, high) – Comment | Priority (low, med, high) – Comment

MOTIVATION IN USING CR-PLAY

M1 – To use CR-PLAY as a fast prototype methodological approach that would allow them to examine various assets on game design ideas, aiming to decide which asset best fits for a certain video game
Relevance: High – this is one of the possible ways to use CR-PLAY's technology
Priority: High – it will be naturally fulfilled when the pipeline is ready

M2 – To use the CR-PLAY approach like an "automated translation tool" so that they can edit the produced outcome and create video games with localized content
Relevance: Low – the pipeline is not intended to translate 3D file formats; it is rather an innovative approach to content development
Priority: Low – developers will be able to create games with localized content but not use it as a translation tool

CAPTURE

HOW TO SUPPORT CR-PLAY's TEAM MEMBER COMMUNICATION DURING THE CAPTURE PHASE

C1 – The need to be able to share (in real time) the captured assets with team members
Relevance: High – this is not completely linked, but important
Priority: Med – asset sharing will be provided. Real time sharing will be considered, but it involves many complex factors (networks, computer speed etc.)

C2 – Sharing the assets via dropbox or via email but also through a CR-PLAY repository (that would allow the logical grouping of assets according to a game design)
Relevance: High – sharing of assets is fundamental to allow game developers to best use the pipeline
Priority: High – different approaches will be studied

C3 – A quality control on captured and reconstructed assets according to the game design specifications is required
Relevance: Med – this is relevant but quality standards have to be defined
Priority: Med – we will approach this in Capture, IBR and VBR in various stages of the project (e.g. in Task 2.5)

C4 – A crowd sourcing approach for CR-PLAY allowing many users to capture photos from assets and putting them into the repository for later use
Relevance: Low – this is interesting but out of the scope of the project
Priority: Low – this requirement is opportunity driven (e.g. with external partners after the end of the project)


WHICH ASSETS ARE MORE IMPORTANT TO CAPTURE WHEN USING CR-PLAY

C5 – The most useful would be outdoor complex assets (like big buildings, complex architectures, terrains, a whole street etc.), as these are usually difficult to create by the 3D-modellers and usually there are libraries for indoor assets. Animated elements are very strictly related to the game design
Relevance: High – this confirms what has been written in the DoW about the three categories (outdoor, indoor, animated elements)
Priority: High – CR-PLAY will start with outdoor assets and animated elements (VBR). Indoor assets will be treated later (e.g. Task 2.4)

OTHER CAPTURE REQUIREMENTS

C6 – The participants stated the need to be able to align captured assets to a particular game design, to organize and share results with appropriate team members
Relevance: Med – this is feasible to a certain extent, but not completely (due to the very nature of IBR and VBR)
Priority: Med – different functionalities will be implemented to help game developers align IBR/VBR assets with traditional content

C7 – The participants stated the need to be able to capture assets by simply using the mobile phone or the tablet; they stated that the need for special equipment would work as an obstacle for them to easily adopt the CR-PLAY approach
Relevance: Med – we envision three possible scenarios. The first one allows image capture, initial reconstruction and quality feedback; it requires no special hardware (standard DSLR + laptop). The second one adds user guidance and its implementation will depend heavily on the capacities of future mobile devices. The third one is for 'advanced capturing', e.g. surface normals, which will require special add-ons. These will be designed to pose only minor obstacles in terms of adoption
Priority: Med – capturing with mobile devices is a feature planned in the DoW. Computational power, technical specs of cameras etc. will influence the task

C8 – The participants stated the need to have intelligent guidance that will inform the user about the next best view position, the correct angle or whether enough pictures have been taken, […] [Also requested] use of GPS coordinates and to download pictures already taken from other users from the CR-PLAY repository
Relevance: High – capture guidance is very relevant and already planned in the DoW
Priority: High – we are already working on intelligent guidance systems. GPS is not reliable enough to improve capture and does not provide directional information

C9 – […] all the participants stated as an important requirement to have a quick preview on the mobile phone with regards to the captured assets to perform a quick first hand quality control […] important requirements were related with regards to the interrelation among the desired quality of the representation and the pictures needed to be taken in the capture phase for a particular asset and for a certain game design
Relevance: High – this is relevant but difficult to be obtained
Priority: High – the consortium is already studying techniques to make this happen. First tests will be performed after Year 1, when capture technology will be more mature

RECONSTRUCT

R1 – Almost all interviewees stated the reconstruction phase is very important and both 3D artists and game designers agreed about the need to be able to modify/edit and transform the produced outcome of the reconstruction phase in case the game design needs some modification on the produced assets
Relevance: Low – the produced outcome of reconstruction will not be modifiable after the reconstruction is done (except for relighting)
Priority: Low – the focus of the project is not on producing 3D models, rather to render captured assets via IBR and VBR techniques

R2 – To be able to set up the quality of the resolution in the reconstruction phase and the need for a fast reconstruction process
Relevance: Med – the quality of rendering (IBR/VBR) is the important factor, rather than the quality of the reconstruction per se
Priority: Med – the quality of the reconstruction will be controllable, and linked to final image/video quality

R3 – Answers varied when asked where they would prefer to use the reconstruction tool; possibilities included the capture device, a standalone tool and within the game development engine
Relevance: Low – this is in line with expectations
Priority: Low – reconstruction currently happens with a standalone tool

PLAY

P1 – The need to interact with the reconstructed assets as they usually do with the assets created within the traditional pipeline
Relevance: Low – captured assets are not standard 3D content and thus are manipulated differently
Priority: Low – translate, rotate and scale will be possible to a certain extent

CR-PLAY's INTEROPERABILITY WITH INDUSTRY STANDARD FORMATS AND ENGINES

I1 – "I want to use the CR-PLAY assets with MAYA, RHINO, 3DSMAX and other tools."
Relevance: Low – see P1 above
Priority: Low – see P1 above

I2 – "Ideally the CR-PLAY should be compatible with the majority of game engines like Unity, CryEngine etc."
Relevance: High – this is very relevant, the outcome of the project should be usable by most modern game engines
Priority: High – the CR-PLAY pipeline will be integrated in Unity, as described later in this deliverable. The integration with other game engines is not prevented by the approach chosen in CR-PLAY

Table 1 – User requirements table


2. Analysis of state-of-the-art game development tools

Modern videogame development often involves ready-to-use solutions that allow developers to take advantage of modern rendering and programming techniques, and focus their work on building new and sophisticated gameplay and game experiences. This scenario is the natural evolution of a process that began in the mid-1990s when, for the first time, games were architected with a well-defined separation between the core software components (the game engine) and the art assets and rules of play. The first game that had this clear distinction was Doom by id Software (http://www.idsoftware.com/).

The data-driven architecture is the main concept behind the birth of modern game engines. The idea is to decouple the data that describes specific game behaviours and aesthetics from the engine logic. This results in software applications that are composed of smaller and independent modules (modularity), with specific interfaces that allow new features to be added quickly (extensibility) and reused in different projects or software products (reusability). This approach requires more initial work for game developers (hardcoding a game would be easier), but brings many benefits: adding new content and new behaviours is much easier, artists' and designers' dependency on programmers is reduced, and programmers, in turn, are able to work on source code that is more readable and easier to maintain.
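To make the idea concrete, the following minimal sketch (written in C#, with purely illustrative file and class names) configures a game entity from an external data file instead of hardcoding its parameters. It is an assumption-based example, not CR-PLAY or engine code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical example of data-driven design: the behaviour of an enemy is
// described by data, not hardcoded. Designers edit "enemy_drone.cfg"
// (speed=4.5, health=30, ...) without touching or recompiling engine code.
class EnemyDefinition
{
    public readonly Dictionary<string, float> Parameters = new Dictionary<string, float>();

    public static EnemyDefinition LoadFrom(string path)
    {
        var def = new EnemyDefinition();
        foreach (var line in File.ReadAllLines(path))
        {
            var parts = line.Split('=');
            if (parts.Length == 2 && float.TryParse(parts[1], out var value))
                def.Parameters[parts[0].Trim()] = value;   // e.g. "speed" -> 4.5
        }
        return def;
    }
}

class Program
{
    static void Main()
    {
        // Adding a new enemy type is a data change only: no code change needed.
        var drone = EnemyDefinition.LoadFrom("enemy_drone.cfg");
        Console.WriteLine("Drone speed: " + drone.Parameters["speed"]);
    }
}
```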

3D and 2D converters, level editors, script compilers, and special effects editors are only some of the tools used in game development. Originally, they were separate and independent software applications that allowed programmers, artists and designers to achieve specific tasks during the development phase. The problem was that heterogeneous tools were handling many different types of data in several different ways, resulting in games that were more prone to errors, less optimized, etc. Natural evolution led to the current approach, where a single game development tool shares the same data formats and data management, allowing data to be exchanged between different components in a more seamless way.

In this section of the deliverable, we present an analysis of the state of the art of game development tools that led to the selection of the one that will be adopted in CR-PLAY. They have been divided into three categories (AAA, All-in-One and Open Source) and, for each category, one or more representatives have been analysed, taking into consideration the following features:

• The general system requirements needed by the game development tools.
• The level of graphics features provided.
• The number and variety of individual development tools.
• The extensibility of development tools.
• The level of support provided through the official communication channels.
• The possibility and ease of deploying for multiplatform and mobile devices.
• The available licensing options.

2.1 Game development tools categorization

AAA tools

These are engines and tools developed by professional game studios during the production of big AAA titles and then sold separately to be used in other videogames. They allow designers, artists and programmers to create games that target very high performance platforms such as next generation consoles and PCs. The flexibility of their engines makes them able to fit almost any development situation, but often requires highly specialized personnel. They are usually distributed full-source (providing the entire source code of engine and tools) to let game developers modify them according to their particular needs. But since they can be changed at their lowest level, they need very highly skilled programmers to do that. They target big studios that mainly work on long term projects with very high budgets. The main representative examples are Unreal Engine 4 from Epic Games Inc. and CRYENGINE from Crytek GmbH.

Unreal Engine 4

The Unreal Engine 4 (https://www.unrealengine.com) is the latest of four game engines developed by Epic Games Inc. (http://epicgames.com).

Epic Games Inc. is based in Cary, North Carolina. The company is well known for Unreal Tournament 1, 2 and 3 and for Gears of War 1, 2 and 3, all developed with the Unreal Engine technology. Epic also worked on the iOS game Infinity Blade. It has several subsidiaries in North America, Europe and Asia.

Unreal Engine is a suite of tools and technologies used for building high-quality games, real-time 3D simulations and visualizations across a range of platforms. In order to run a game and the editor, or to develop with the engine, there are specific hardware and software requirements. The recommended hardware includes (from the official documentation): quad-core Intel or AMD, 2.5 GHz or faster processor, 8 GB RAM memory and NVIDIA GeForce 470 GTX or AMD Radeon 6870 series or higher video card. In order to run and develop with the Unreal Engine on Windows machines additional software such as DirectX End-User Runtimes, .NET 4.0 and Visual Studio 2013 for Windows Desktop is also required.

Unreal Engine supports DirectX 11, allowing programmers and artists to take advantage of the latest rendering features such as full-scene HDR reflections, thousands of dynamic lights per scene, tessellation shaders, physics-based shading and much more.

In the Unreal Engine, each game is a self-contained project that holds every asset (2D and 3D content, code, shaders, etc.). The Unreal Editor is the scene/level editor included in the Unreal Engine; it provides a WYSIWYG user interface that allows direct manipulation of game entities (a.k.a. Actors) and game assets.

The Unreal Editor includes a complete gameplay scripting system called Blueprint visual scripting. Blueprint is based on the concept of using a node-based interface to create gameplay elements from within the Unreal Editor. This system provides the ability for designers to use virtually the full range of concepts and tools generally only available to programmers.

Unreal Engine 4 is currently distributed using a monthly subscription plan of $19/month that includes all features and tools, full C++ source code access via GitHub, official documentation, access to all community resources and regular updates. Epic Games requests developers to pay 5% of gross revenue from games made with Unreal Engine.

Having access to the full C++ source code allows maximum freedom of customization and allows extension of the Unreal Editor tools and Unreal Engine subsystems. It is possible to modify existing tools or create new tools from scratch depending on the needs of the development team and, at the same time, to change the low-level rendering modules, allowing the engine to work with new devices and new technologies not yet supported.

The Unreal Engine is multiplatform due to the extreme flexibility of the engine and the availability of the entire source code. It does not, however, provide a one-click deploy option like other game development tools: the development team needs to manually intervene in the game code to manage different deployment options. This usually requires very good programming skills that are not always available in small or independent game development teams.


In conclusion, Unreal Engine 4 is one of the best options on the market in terms of features and flexibility, but its high-end system requirements, its licensing options and the need for highly skilled programmers can represent a limitation for small and medium development teams.

CRYENGINE

CRYENGINE (http://cryengine.com) is the current version of CryEngine 3, the latest of three engines developed by the German company Crytek GmbH (http://www.crytek.com).

Crytek GmbH is an independent company based in Frankfurt am Main (Germany) with additional studios in Europe, Asia and North America. Their most famous titles are Far Cry, Crysis, Crysis 2 and 3, Ryse: Son of Rome and the free-to-play FPS WarFace.

CRYENGINE has one of the fastest high-end renderers on the market. Like other game development tools, it requires specific hardware and software to be run. CRYENGINE has the following requirements for developer machines (from the official documentation): Intel Core 2 Duo 2 GHz or AMD Athlon 64 X2 2 GHz or better processor (a multi-core processor is recommended), 2 GB RAM (4 GB recommended), NVIDIA 8800GT 512 MB RAM or ATI 3850HD 512 MB RAM or higher. It supports development on Windows only and needs the DirectX Package and Visual Studio 2010 in order to work properly.

CRYENGINE uses all the latest graphics and rendering features such as dynamic soft shadows, irradiance volumes, real-time dynamic global illumination, light propagation volumes, particle shading, tessellation shaders, parallax occlusion mapping and many other more specific features such as parametric skeletal animation, a facial animation editor, integrated multi-threaded high performance physics, etc.

CRYENGINE Sandbox is a WYSIWYG editor that includes a set of different tools specifically designed in order to support the game development team. The following list will briefly describe some of the more interesting ones.

• Material Editor: a tool used to interact with and modify materials in the scene. Users can apply textures or shaders and adjust materials' parameters on the fly.
• Flow Graph: a tool that allows designers to create and control events, triggers, game logic, effects and sounds without dealing with scripts or writing code.
• Track View Editor: an embedded cutscene editing tool for making interactive movie sequences with time-dependent control over objects and events in the scene.
• Dedicated vehicle creator: a toolset that allows easy creation of any type of vehicle, including component damage and effects, passenger positions, weapons and physics parameters.

CRYENGINE is distributed with an "Engine-as-a-Service" (EaaS) program that allows a developer to get the engine by paying a royalty-free monthly subscription of €/$9.90 per user. The CRYENGINE Free SDK will continue to be available to users, but new cutting-edge features will be available only to subscribers. The EaaS program includes a full source license that provides the entire engine source code. CRYENGINE is developed in C++ and is fully extensible and modifiable. CryScriptSystem is a high level scripting engine based on Lua 5. It is possible to call C++ functions from Lua and vice versa, in order to decouple core functionalities and game logic.

Like Unreal Engine, CRYENGINE is multiplatform, but needs very accurate and delicate interventions in order to target different deployment platforms. Such interventions normally require very good programming skills that are not always available in small or independent game development teams.


In summary, CRYENGINE represents the best option in terms of graphics features. It has a good community and gives maximum freedom to developers but, at the same time, requires very skilled programmers and very powerful hardware, which could represent a problem for independent and small game development studios.

All-in-one tools

These tools usually target small and independent game studios. They are very general purpose and allow the creation of many different types of games, such as 2D puzzles, 3D platformers, FPS, role-playing games, and so on. Such games are typically deployable on several different platforms with reduced effort, making multiplatform development cheaper and more accessible for a large number of game development teams. This category of tools is usually presented as closed-source (no source code is distributed) in order to protect the core technology and keep it stable. They have strategic limitations in order to make them work smoothly with low performance hardware, such as certain mobile devices, without the need for dedicated intervention from the development team. These tools try to target the biggest audience possible, providing accessible tools and keeping a state-of-the-art level in terms of graphical quality. The most popular example of this category is Unity 3D from Unity Technologies. Other all-in-one tools are, for example, ShiVa3D from ShiVa Technologies SAS and GameMaker Studio from YoYo Games Ltd. For the purpose of this deliverable we will analyse Unity 3D because of its widespread use and importance in the independent market and its huge community of developers.

Unity 4

Unity 4 (http://unity3d.com) is the latest version of the game development system developed by Unity Technologies (http://unity3d.com/company), but, at the time of writing, Unity 5 has already been announced.

Unity Technologies is based in Copenhagen, Denmark with additional offices in North and South America, Europe and Asia. Unity has been used to create top selling titles like Bad Piggies by Rovio Mobile Ltd. (creators of Angry Birds), Call of Duty®: Strike Team (iOS version of the next generation console game) and Temple Run 2 (the sequel of the mobile top-selling title Temple Run)

The first main difference with respect to the AAA tools is the target audience: while Unreal Engine and CRYENGINE have been created for big game studios, and are only now changing their licensing model to also target small and independent studios, Unity has always targeted the independent market, offering a cheaper product with more accessible system requirements, while minimizing quality and performance degradation. As described on the official website, the only system requirement is a graphics card supporting DirectX 9 (shader model 2.0). Using Occlusion Culling requires a GPU with Occlusion Query support.

Unity is defined as a game development ecosystem: a powerful rendering engine integrated with a large set of tools and rapid workflows. It supports the Windows DirectX 11 graphics API and shader model 5.0 for Windows desktop deployment. It can use different rendering engines: a deferred lighting renderer for the highest light and shadow fidelity, a forward renderer for the most common games and a vertex-lit renderer to be compliant with old video cards and low performance devices. Unity natively works for 3D and 2D games and allows game developers to mix the two approaches with maximum freedom. It is powered by NVIDIA PhysX for 3D physics simulation and by Box2D for 2D physics simulation.

Game development tools are so strongly integrated into the rendering engine and the authoring system, that it is not possible to separate them. Unity is an all-in-one solution that includes all the most important features for game development.


Integrated editor: a WYSIWYG authoring system that allows direct scene interaction and manipulation; different asset formats (such as images, 3D models, and audio files) are automatically imported and made available to the game development team; programming is supported by a built-in scripting system (C#).

Mecanim: a powerful and highly integrated animation system that includes all the tools and workflows needed to create and build muscle clips, blend trees, state machines and controllers directly in Unity.

ShaderLab: shaders in Unity can be written with different languages. They are automatically compiled and optimized in order to fit deployment platform features.

Moreover, Unity includes a material editor, a special effects editor, a terrain editor, an artificial intelligence module with navigation paths, a network module, and more.

Unity comes with two main licensing options: Unity Free and Unity Pro. The first one is free and has limitations in terms of advanced features and deployment options. The second one is the full-featured version and it can be bought for $1,500 (single seat) or with a $75/month subscription plan.

A very important element in the Unity world is the community. The Unity community is very large (more than 1.5 million registered users) and very active giving technical support and producing software modules available on the Unity Assets Store, the proprietary marketplace integrated in Unity.

Unity is developed in C++, but it is distributed closed-source (the source code of the engine and the editor can be purchased, but only on a per-case and per-title basis, typically via special arrangements made by the Unity business development team with big development studios). Extensibility is guaranteed by the integrated scripting system based on Mono (the open-source port of the Microsoft .NET infrastructure), which allows programmers to use C# as the main language for scripting gameplay and advanced editor features during game development. Unity also allows native code to be run and controlled by the scripting module in order to extend compatibility with third-party software modules.
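The snippet below is a minimal, assumption-based sketch of this native-code interoperability: a C# script binds to a hypothetical native library ("ExampleNativeModule") through Unity's plugin mechanism. The library and function names are placeholders, not actual CR-PLAY components.

```csharp
using System.Runtime.InteropServices;
using UnityEngine;

// Hypothetical native-plugin binding: "ExampleNativeModule" stands in for any
// third-party native library compiled for the target platform and placed in
// the project's Plugins folder.
public class NativeModuleExample : MonoBehaviour
{
    // P/Invoke entry point exposed by the native library (illustrative name).
    [DllImport("ExampleNativeModule")]
    private static extern int GetNativeVersion();

    void Start()
    {
        // The C# scripting layer drives the native code at runtime.
        Debug.Log("Native module version: " + GetNativeVersion());
    }
}
```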

In contrast to the AAA tools described before, Unity provides an easy one-click multiplatform deployment option instead of leaving this task to developers. The advantage of this choice is that going multiplatform is easier and faster. The disadvantage is that Unity imposes some limitations, in terms of graphics features, in order to be able to support as many platforms and devices as possible.

In conclusion, Unity is a state-of-the-art game development tool and provides very good graphics quality with accessible hardware and team skills. Moreover, it has a very big and active community that provides official support and information through several communication channels.

Open Source tools

Open Source software is the family of applications that complies with the rules defined in the Open Source Definition by the Open Source Initiative (http://opensource.org).

The undisputed success of Open Source software such as Linux and Firefox, to cite just two famous examples, seems to imply that the open source model is a good and reliable choice that can also be applied in the videogame world, but the market tells a different story. Open Source game development tools are not the preferred solution of videogame developers, and the current selection is not competitive enough with commercial solutions. Open source development roadmaps are normally slower than commercial ones, especially in a world where big industrial investors define the trends, and this often leaves the Open Source tools a step behind with respect to the state of the art.


On the other hand, the Open Source model plays a more decisive role in the core products category, like rendering, physics or network engines, which are often created and maintained by different communities with specific expertise. Ogre3D (http://www.ogre3d.org) and Bullet Physics (http://bulletphysics.org) are among the most famous Open Source engines used in both the academic and commercial world. One of the most famous Open Source game development tools is Torque 3D from GarageGames, which was born as a commercial product and has recently moved to open source in order to gain the community support needed to keep the tools under continuous development and upgrade.

Torque 3D

Torque 3D is developed by GarageGames (http://www.garagegames.com/). GarageGames is based in Las Vegas, Nevada. The Torque Game Engine was the original technology behind the Tribes series of games. Torque 3D has also been used on titles such as Penny Arcade Adventures, Rokkitball and Marble Blast.

Similar to Unity, Torque 3D targets small and independent game studios, providing an accessible game development tool. Torque 3D needs at least a 2 GHz dual-core Intel or AMD processor, 2 GB RAM, a 100% DirectX compatible NVIDIA-based video card with 1 GB or more video RAM, and DirectX 9.0c.

Torque 3D is the GarageGames flagship engine built on the core strengths of the Torque Game Engine Advanced. Torque 3D has been re-engineered for maximum flexibility and performance. It comes with a full suite of tools that helps the game development team to produce high-quality games and simulations. The world editor is the central hub for working with Torque 3D that provides a central interface to all different editors used to put together a game level. The shape editor, terrain editor, material editor and particle editor are just some of the most important tools it provides. It uses a powerful rendering engine that includes shader features such as per-pixel dynamic lighting, normal and parallax occlusion mapping, screen space ambient occlusion and so on.

The Torque 3D assets pipeline is based on COLLADA, an XML-based schema used to make it easy to transport 3D assets between applications, enabling 3D authoring and content processing tools to be combined into a production pipeline. It includes the NVIDIA PhysX physics engine out of the box for Windows users, but game developers can implement other physics engines as well.

Due to its Open Source licensing model, Torque 3D comes with full access to source code, giving the possibility to access low-level engine source code. It is written in C++ and can be easily extended using third-party libraries and external software modules. TorqueScript is an object-oriented C++ like scripting system that is used to program gameplay functionalities and ties the various elements of a project together. It can be deployed on Desktop and Web platforms only, whereas mobile deployment is limited to Torque 2D, which is a separate game development tool that supports only 2D game development.

In conclusion, Torque 3D is a light tool that does not need powerful hardware to work smoothly. It supports good graphics features and the level of the tools available is in line with the market average. However, since the community is not very big and the tool has limitations on multiplatform and mobile support, it may not be the best choice for independent and small game development studios.


2.2 Analysis result: Unity 4 as preferred choice

Table 2 below summarizes the results of the analysis in visual terms, thus clearly presenting the strengths and weaknesses of each of the tools. Each column represents a game development tool, while each row represents the particular parameter that has been taken into account during the analysis.

For each of these parameters, game development tools get a certain number of stars.

• System requirements: indicates the general system requirements needed by the development environment in order to allow developers to run the game and the development tools smoothly. More stars means lower system requirements needed by a game development tool, which means cheaper hardware (easily available for small development studios).
• Graphics features: indicates the level of the graphics features implemented and available in the game development tools. More stars means a higher level of graphics quality.
• Number of development tools: indicates the number and variety of individual modules contained in the game development tool. More stars means more tools.
• Extensibility: indicates the general possibility of extending the engine and the game development tools. More stars means greater extensibility.
• Support: indicates the level of support provided by official and unofficial channels. More stars means better support, bigger community, etc.
• Multiplatform and mobile: indicates the possibility and ease of deploying on different platforms and mobile devices. More stars means better support for multiplatform deployment.
• Licensing options: indicates the licensing possibilities available to the users. More stars means more affordable licensing options.

Tools compared (columns): Unreal Engine 4, CRYENGINE, Unity 4, Torque 3D

Parameters rated (rows, with stars per tool): system requirements, graphics features, number of development tools, extensibility, support, multiplatform and mobile, licensing options

TOTAL stars: Unreal Engine 4: 19 | CRYENGINE: 19 | Unity 4: 22 | Torque 3D: 17

Table 2 - Summary of analysis results


Even though the analysis does not present a clear and undisputed winner, Unity 4 has been chosen as the game development tool for CR-PLAY, mainly because of its huge support community, its extended multiplatform capabilities, its affordable licensing options and the integrated Asset Store.

As a general purpose game development tool, Unity allows the creation of any kind of game, from very small 2D mobile games to very big and complex 3D console titles, meeting the needs of a wide variety of game developers.

Figure 1 - Screenshot from Unity 4 Editor

Unity focuses its workflows on multiplatform capabilities, allowing easy and fast deployment on major available platforms: PC, Mac, Linux, iOS, Android, Windows Phone 8, Blackberry, PS3, XBox 360, Wii U and web browsers by means of the external plugin Unity Web Player (the next version of Unity will support WebGL deployment in order to publish on browsers without using any additional plugin). The multiplatform publishing system of Unity allows developers to port their application on different platforms with very small interventions in the game source code, allowing fast iterations and testing during the game development phase.

It is possible to easily extend the C# scripting layer, and the rendering pipeline is fully modifiable using ShaderLab, the internal shader system that automatically cross-compiles and optimizes shaders depending on the target platform. Adding third-party native code is possible as well, by using the Unity plugin system, with the disadvantage that cross-compilation is needed for the different target platforms. Last, but not least, Unity Technologies provides the Unity Assets Store, a virtual marketplace where developers can sell, buy and share source code packages, feature-specific software modules, 3D model bundles, special effects or any asset, extension or third-party component that can be used within Unity.
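As a small illustration of how the editor itself can be extended through C# scripts, the sketch below adds a custom menu entry to the Unity Editor; the menu path and the action it performs are hypothetical examples only, not planned CR-PLAY functionality.

```csharp
using UnityEditor;
using UnityEngine;

// Hypothetical editor extension: placing this script in an "Editor" folder adds
// a new menu entry to the Unity Editor without modifying Unity itself.
public static class ExampleEditorExtension
{
    [MenuItem("Tools/Example/Log Selected Objects")]
    private static void LogSelection()
    {
        // Reports whatever is currently selected in the Hierarchy or Project views.
        foreach (Object obj in Selection.objects)
            Debug.Log("Selected: " + obj.name);
    }
}
```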

In addition to the above, Testaluna has very good experience in using Unity. This represents a clear advantage towards the integration of IBR and VBR. Other game developers involved in CR-PLAY (from requirements provision to evaluation activities) also use Unity 4 in their current production pipelines.


3. Mixed pipeline: architecture and functional specifications

This section describes the general architecture and functional specifications of the CR-PLAY mixed pipeline, firstly from a general point of view, and then presenting each individual pipeline phase: capture, reconstruct and play.

3.1 General Architecture

The general architecture of the CR-PLAY mixed pipeline is represented in Figure 2. From a high level point of view, the pipeline is composed of software modules that handle image- and video-based rendering data from the capture to the (edit and) play phase, passing through reconstruction.

Figure 2 - CR-PLAY General Architecture

Before describing the various components in detail, we provide a summary of the data flow, from the capture device to the screen of a developer’s computer.


Images and videos are captured from the real world using common devices, such as digital cameras or smartphones, but also from specialized devices that will be developed during later phases of CR-PLAY. The capture phase is guided by software that assists users during capture, by providing real-time indications on how to perform or correct the capturing process. For example, indications can be given on how to take the next picture, or feedback on the need to re-capture images/videos.

The whole data package resulting from the capture phase is then sent to the Reconstruction Tool, a software module that takes the captured data and provides input for the IBR/VBR methods in the form of images, parameters, and geometrical assets that are then packed and stored on a shared repository in order to make them available for integration.
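The exact package format exchanged through the repository is still to be defined; the sketch below is only a hypothetical illustration, in C#, of the kind of information such a package could contain (one entry per captured view plus proxy geometry for IBR). All class and field names are assumptions.

```csharp
using System.Collections.Generic;

// Hypothetical description of a packed capture/reconstruction asset as stored on
// the shared repository; the real CR-PLAY package format is still to be defined.
public class CapturedAssetPackage
{
    public string AssetName;                 // e.g. "facade_example"
    public List<CapturedView> Views;         // one entry per input image
    public string ProxyMeshFile;             // e.g. "proxy.ply", used by IBR
}

public class CapturedView
{
    public string ImageFile;                 // undistorted input image
    public string DepthMapFile;              // optional per-view depth map
    public float[] CameraPosition;           // 3 values: x, y, z
    public float[] CameraRotation;           // 3x3 rotation matrix, row-major (9 values)
    public float FocalLength;                // estimated in the SfM step
}
```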

Once data is retrieved from the shared repository, it is made available to the Unity game engine and the additional tools that handle image- and video-based rendering techniques, so that it can be visualized together with traditional assets.

The Unity assets manager will be extended to understand the new data formats and unpack the entire data package into its internal data structures. Unity Editor scripts will provide the data handling extension that allows developers to modify or add new features to Unity's interface and makes the new data available to IBR and VBR Behaviors (sets of scripts defining the high-level features that describe how a game object must behave). IBR and VBR Behaviors will then communicate with the rendering sub-system using dedicated native-code modules (IBR/VBR Plugins) responsible for communication with the low-level rendering pipeline (fully accessible and extendable using ShaderLab).

3.2 Capture

General Description:

The Capture phase of CR-PLAY is intended to assist the user in capturing the input data (images). This is necessary because the user cannot predict without help whether the captured data meets the requirements that allow for a final rendering of the desired quality. In an iterative process, the capture tools will analyze the already captured data, predict the positions of missing images, and guide the user towards these positions in order to capture the missing data. Any new user will receive specific capture instructions in the form of a tutorial video before the actual application starts. From there on the pipeline will be as follows:

First, the user will select which type of environment and quality he wants to capture. Different environments need to be captured differently, and high quality renderings require more images. For example, one needs to distinguish between a close-up view of an old painting, which is captured inside a building and has to be rendered in very high quality, versus a far away view of a façade, which is captured outside and only needs to be rendered in a coarse resolution (CR-PLAY will start with outdoor assets and animated elements (VBR); indoor assets will be treated later). After the user has made this selection, he will start the process by capturing just a few images of the scene. The capture tool will then immediately start to analyze these images and make a first prediction whether reconstruction is possible. This feedback will be available quickly.

The tool will then try to predict whether the rendering would be able to achieve a quality that is adequate for the selected environment. If the input data is found to be insufficient, the tool will suggest the positions where images are missing and will guide the user to these positions. For this guiding, a specific interface will indicate (for example using arrows or an overview of the already captured data) in which direction the user has to move the capture device in order to navigate to the missing position. If the data is sufficient, the view planning will stop the capturing process or suggest expanding the scene by capturing novel viewpoints outside the already captured scene volume.
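The iterative guidance loop described above can be summarised schematically as follows. This C# sketch is purely illustrative: the interface and all method names are assumptions, not the actual CR-PLAY capture tools.

```csharp
using System.Collections.Generic;

// Schematic outline of the iterative capture-guidance loop described above.
// ICaptureSession and its members are illustrative placeholders.
public interface ICaptureSession
{
    void CaptureImage();
    bool ReconstructionLikely();                 // fast, approximate on-device check
    List<string> PredictMissingViews();          // where additional photos are needed
    void GuideUserTo(string viewDescription);    // e.g. show an arrow on screen
    void UploadToServer();                       // compute-intensive analysis runs remotely
}

public static class GuidedCapture
{
    public static void Run(ICaptureSession session, int seedImages = 5)
    {
        for (int i = 0; i < seedImages; i++)     // a few initial photos of the scene
            session.CaptureImage();

        while (true)
        {
            if (!session.ReconstructionLikely())
            {
                // Immediate feedback: the current images cannot be reconstructed.
                session.GuideUserTo("recapture the scene from a different position");
                session.CaptureImage();
                continue;
            }

            var missing = session.PredictMissingViews();
            if (missing.Count == 0)
                break;                           // predicted quality is sufficient

            session.GuideUserTo(missing[0]);     // direct the user to the next view
            session.CaptureImage();
        }

        session.UploadToServer();
    }
}
```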

The capture stage of animated assets is performed by one of the content-creator artists, and involves filming video footage of a desired dynamic phenomenon. For example, to make a realistic looking flag element that can be used and replicated later in the game, a real flag should be set up in front of a video camera, and wind applied, e.g. using a fan or natural outdoor wind. To make segmentation easier, it is advisable to film the flag against a simple controlled background, such as a green-screen. The captured video should be fairly long, especially if the moving element exhibits highly variable motion. The resulting video can then be treated as a "bucket" of unordered images, where each image has its associated alpha matte as a result of the chroma-key segmentation.

Specific Architecture:

The CR-PLAY capture tools will run on common consumer hardware. This includes a digital camera to capture the actual images and a connected computing device to provide the guidance. Specific scenarios could include a DSLR camera connected to a notebook and, later in the project, combined devices such as a smartphone or tablet.

Since mobile platforms are limited in their compute power, the guidance cannot rely on a complete reconstruction, which would take too long to calculate. The system will instead work with approximations for the most immediate feedback. After the capture session, the data is sent to a server where compute-intensive analysis can be performed.

Team Member Co-Operation:

We envision two scenarios for team member co-operation during capture. The first allows co-workers at the office to look at the data once it has been processed on the server as described above. They can then comment on the reconstruction and coordinate next steps, e.g. recapturing some parts of the scene, with the personnel in the field. This scenario requires a time span of several hours. It is intended to save travel expenses in cases where the required assets are far away.

Another scenario that we consider is support for manual planning before the capture session. This might be implemented by letting the team choose certain view points and camera positions on a map that are taken into account during guidance. This scenario is, however, of lower priority since the task can also be solved with traditional methods. It might be approached once all other challenges with automated view planning are overcome within CR-PLAY.


Figure 3: Capture architecture and work flow.

3.3 Reconstruct

General description

The Reconstruct phase takes images (i.e., photos or video frames) and metadata acquired in the Capture phase and provides input data for the Image/Video Based Rendering phase. The pipeline consists of several steps shown in Figure 4. First, camera parameters (i.e. position, viewing direction, focal length etc.) are estimated. Then, a multi-view stereo (MVS) reconstruction algorithm recovers the depth (distance of a scene point from the camera) at each pixel where this is possible with sufficient reliability. Finally, these per-image depth maps are fused to obtain a 3D model. The modular design allows the replacement of individual steps if better techniques become available during the course of the project.


Figure 4: Reconstruction architecture and data flow.
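Conceptually, the three steps form a simple sequential pipeline. The following C# sketch only illustrates this orchestration; the interfaces and types are assumptions that stand in for the third-party SfM, MVS and fusion components described in the Architecture subsection below.

```csharp
using System.Collections.Generic;

// Illustrative orchestration of the three Reconstruct steps (SfM -> MVS -> fusion).
// The interfaces are placeholders; in practice these steps are performed by
// third-party tools (e.g. Bundler/VisualSFM for SfM, libMVE for MVS).
public interface IStructureFromMotion { List<CameraPose> EstimateCameras(List<string> imageFiles); }
public interface IMultiViewStereo     { DepthMap ComputeDepthMap(string imageFile, List<CameraPose> cameras); }
public interface IDepthMapFusion      { Mesh FuseDepthMaps(List<DepthMap> depthMaps); }

public class CameraPose { public float[] Position; public float[] Rotation; public float FocalLength; }
public class DepthMap   { public int Width, Height; public float[] Depth; public float[] Confidence; }
public class Mesh       { /* fused global 3D model */ }

public static class ReconstructionPipeline
{
    public static Mesh Run(List<string> imageFiles,
                           IStructureFromMotion sfm, IMultiViewStereo mvs, IDepthMapFusion fusion)
    {
        // 1. Estimate camera parameters (position, orientation, focal length).
        var cameras = sfm.EstimateCameras(imageFiles);

        // 2. Per-image depth maps; each image is processed independently,
        //    so this step can also run as a batch job.
        var depthMaps = new List<DepthMap>();
        foreach (var image in imageFiles)
            depthMaps.Add(mvs.ComputeDepthMap(image, cameras));

        // 3. Fuse the (possibly noisy, incomplete) depth maps into one global model.
        return fusion.FuseDepthMaps(depthMaps);
    }
}
```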

For the reconstruct stage related to VBR, every pair of images in the collection is analysed to measure their compatibility. Naturally, images that were originally sequential are highly compatible with each other, but other, non-sequential images will typically also be similar, so that the original sequence of the images can be modified to introduce loops and variations that were not originally present in the linear video. This is a one-time process for analysing the footage, and the resulting cost matrix is retained to allow sequences of various lengths to be generated from the available frames. Technically, the images are compared using the L2 distance, and a look-ahead optimization avoids playing back sequences that dead-end in frames for which no loop-back is available. The input video may contain sub-sequences where the dynamic element is doing something specific, such as a flag fluttering to the right vs. to the left, or a water fountain streaming high/medium/low. At present, the content-creating user must locate the relevant sub-sequences themselves, and should assign them names for organizational purposes.

Architecture

The current architecture relies on open source [Bundler] or freely available [Wu] third party software for structure from motion (SfM), i.e. finding the input camera positions and orientations. The output formats of those are documented and well supported by later stages. This step also applies a distortion correction to all images. Images, camera parameters and metadata are then passed on to the MVS phase.

The MVS reconstruction is based on the approach by Goesele et al. [Goesele 2007], which has been implemented by TUD and is publicly available [libMVE]. If necessary, the reconstruction can be controlled with several parameters, e.g. the amount of downscaling applied to the input images. Sensible choices are set by default and allow automatic reconstruction. Batch processing is also possible since each image is, at this stage, independent of the others. The results of this approach are per-image depth and confidence maps. These are stored along with each image and further information in a 'view' data structure that can be accessed using libMVE. Tools exist to export depth maps as float images or triangulated meshes. Other data can be exported in various image (tiff, tiff16, jpg, png, pfm), geometry (bnpts, off, ply, pbrt, synth.out), and plain text formats.


Depth maps are computed for each image separately and might not be globally consistent. They also tend to be incomplete and noisy. Combining the information from multiple images in a global model helps to address all these issues. The surface extraction step merges depth maps and produces a global model. The code is provided by TUD [Fuhrmann 2014] and well integrated into the reconstruction pipeline sketched in Figure 4. A conversion routine to transform the resulting triangle mesh into a point cloud with normals and visibility information (PMVS patch file) is available. Other options to merge multiple depth maps, e.g. [Kazhdan], can easily be integrated for comparison purposes.

The merging phase is also the place where post-processing options, e.g. mesh simplification, can be implemented.

Regularization

The current algorithms in the MVS reconstruction and surface extraction phases employ very little regularization. To better reduce noise and fill undesired holes, regularization mechanisms will be considered. If it proves to be beneficial, this might also lead to the integration of MVS reconstruction and surface extraction into a single step. The amount of regularization should be adapted according to the user's settings, and will depend on the specifics of the IBR or VBR algorithms, as well as the quality measures developed in CR-PLAY.

User intervention

The entire pipeline requires very little user interaction. Future extensions might change this because user input is expected to improve and speed up the SfM and MVS parts. Graphical user interfaces for the first two stages are available and can be extended to also encompass the surface reconstruction.

Functionality has to be implemented that lets a user mark parts of the scene or object in the image or on the final mesh. The marked regions can then be annotated with desired quality settings or guidance information, e.g. foreground vs. background. These attributes can then be taken into account during surface reconstruction, e.g. to adapt the vertex count or to exclude clutter.

Data

Figure 5 shows the output data that we expect from each stage. Some attributes are optional and marked in square brackets.


Figure 5: Output data in each of the pipeline stages.

The data can be stored in different formats:

 SfM: synth.out [Bundler]; .nvm [Wu]
 MVS: .pfm, .ply (depth maps); .mve [libMVE]; .txt, synth.out (camera parameters)
 Surface reconstruction: .ply, .off, .pbrt (mesh); .pmvs, .ply (dense point cloud)

Third party importers and converters for the final ply mesh are available in Unity, 3DS Max, and other packages.
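For quick inspection or preview purposes, reading vertex positions from an ASCII .ply file is straightforward. The following minimal C# sketch is not part of the CR-PLAY tools and ignores normals, colours and binary PLY variants; production use would rely on the third-party importers mentioned above.

```csharp
// Minimal sketch: read vertex positions (x, y, z) from an ASCII .ply file.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class PlyReader
{
    public static List<float[]> ReadVertices(string path)
    {
        var vertices = new List<float[]>();
        using (var reader = new StreamReader(path))
        {
            int vertexCount = 0;
            string line;
            // Parse the header to find the vertex count and the end of the header.
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith("element vertex"))
                    vertexCount = int.Parse(line.Split(' ')[2]);
                if (line.Trim() == "end_header")
                    break;
            }
            // The first three columns of each vertex line are x, y, z.
            for (int i = 0; i < vertexCount && (line = reader.ReadLine()) != null; i++)
            {
                string[] p = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                vertices.Add(new[]
                {
                    float.Parse(p[0], CultureInfo.InvariantCulture),
                    float.Parse(p[1], CultureInfo.InvariantCulture),
                    float.Parse(p[2], CultureInfo.InvariantCulture)
                });
            }
        }
        return vertices;
    }
}
```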

Asset management/exchange

Any asset management system that is able to operate with binary data can be used to store the intermediary files (images, views) and final output (meshes, point clouds). An example workflow (1-4) is shown in Figure 6. Some computations can also be performed locally, e.g. on a laptop during capture, and uploaded to the server at a later time.

Depending on the scene content, a 20 megapixel input image requires about 10MB of disk space. More aggressive compression leads to artefacts in the MVS reconstruction stage. Depending on the image size and scene content, a depth map typically has 1-15 megapixels that are active, each requiring four bytes (uncompressed). Thus, at least 15-70MB are required per view. Typical scenes contain 50-500 images. If multiple versions at different scales are required, this number is even larger. The size of the resulting meshes or point clouds after the merging step depends on various parameters and ranges from 50MB to 500MB.

Figure 6: Example workflow (from 1 to 4) demonstrating synchronization points with a local or remote asset storage.

Use cases

 Providing spatial relationships between images (‘this one was taken to the left of the other’)
 Estimating from where a photo was taken and then combining the image with rendered objects from the correct perspective
 Providing proxy geometry as a starting point for 3D artists
 Providing proxy geometry (i.e., point clouds or meshes) for image based rendering

Open issues

Some design decisions on the functional level are difficult to make without a more rigorous definition of ‘mesh quality’. Quality was a concern in the end user study but has not been quantified. IBR should work with proxy geometry that is not very detailed. Once questions such as “What quality measures are suitable for such impostors?” and “What quality score is needed for successful IBR?” are clarified, deviations from the initial functional specification might be necessary.


3.4 (Edit and) Play

This is the last phase of the CR-PLAY mixed pipeline and consists of the integration of IBR and VBR rendering in the game development tool. It can be divided into two distinct sub-phases, both very important during videogame development: Edit and Play. Edit covers the activities and tasks performed by developers once all videogame assets are ready to be integrated in the game. It includes operations such as loading assets, creating game objects, assigning behaviours, programming game logic, etc. Play starts immediately after Edit and includes all situations where the game is actually played, including testing by the development team and the real gaming situations experienced by game players.

In this section, both the IBR and VBR roles will be explained and contextualized in the Edit and Play phases, as specific independent modules integrated in the game development tool.

Image Based Rendering Algorithms

The image-based rendering algorithms we will adopt in CR-PLAY fall into two main categories: geometry-based methods, such as unstructured lumigraph rendering (ULR) [Buehler 2001] with visibility computation enhancements and per-pixel processing as in [Eisemann 2008], and oversegmentation/warp-based IBR [Chen 2011, Chaurasia 2013].

In a nutshell, the former class of methods assumes that sufficiently good geometry is available: the algorithm chooses the images used to render a novel view, reprojects the geometry back into the images, performs a texture lookup, takes visibility into account using the geometry, and then blends the result. If the geometry captured all the details of silhouettes and depth differences, this result would look perfect. In practice this does not usually happen, since geometry is never complete and rarely accurate enough. Even if it were possible to capture accurate geometry everywhere, the size of the geometric representation would be prohibitive (imagine modelling every single leaf of a tree with polygons).
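The following is a minimal sketch, in the spirit of [Buehler 2001], of how such a geometry-based blend might weight candidate input cameras for a surface point seen from a novel viewpoint. It is not the project's implementation: the penalty terms, constants and names are illustrative assumptions.

```csharp
// Sketch of ULR-style camera weighting: penalize angular deviation (and a simple
// distance term), keep the k best cameras, and normalize the resulting weights.
using System;
using System.Linq;
using System.Numerics;

public static class UlrBlending
{
    public static (int index, float weight)[] Weights(
        Vector3 surfacePoint, Vector3 novelCameraPos, Vector3[] inputCameraPos, int k)
    {
        var scored = new (int index, float penalty)[inputCameraPos.Length];
        Vector3 toNovel = Vector3.Normalize(novelCameraPos - surfacePoint);

        for (int i = 0; i < inputCameraPos.Length; i++)
        {
            Vector3 toInput = Vector3.Normalize(inputCameraPos[i] - surfacePoint);
            // Angular penalty: small when the input camera sees the point from a
            // direction close to the novel view direction.
            float dot = Math.Min(1f, Math.Max(-1f, Vector3.Dot(toNovel, toInput)));
            float angular = (float)Math.Acos(dot);
            // Simple distance penalty to prefer cameras of similar scale/resolution.
            float distance = Math.Abs(
                Vector3.Distance(inputCameraPos[i], surfacePoint) -
                Vector3.Distance(novelCameraPos, surfacePoint));
            scored[i] = (i, angular + 0.1f * distance);
        }

        // Keep the k best cameras; the worst of them receives weight ~0, as in ULR.
        var best = scored.OrderBy(s => s.penalty).Take(k).ToArray();
        float worst = best.Last().penalty + 1e-6f;
        var weights = best.Select(s => (s.index, w: 1f - s.penalty / worst)).ToArray();
        float sum = weights.Sum(x => x.w) + 1e-6f;
        return weights.Select(x => (x.index, x.w / sum)).ToArray();
    }
}
```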

The latter approach foregoes the need for detailed and accurate geometry, by using an oversegmentation (decomposition into superpixels) of the original images, which accurately capture depth boundaries. If sufficient depth is available, superpixels can be simply reprojected [Chen 2011]. Otherwise [Chaurasia 2013], a sparse set of depth values is projected within each superpixel of the oversegmentation, where such depth exists. In the cases where there is no depth, we synthesize it by using a geodesic traversal of the graph of superpixels and interpolating from closest neighbours [Chaurasia 2013]. The superpixels are then warped using the depth as a hard constraint, together with a shape-preserving constraint that results in a consistent image, even in the presence of inaccurate depth.

Position of IBR in the Pipeline

Image-based rendering will mainly be used to render captured assets in the overall CR-PLAY pipeline. As described in the DoW, IBR will be used to capture and render backdrops, rather than characters and objects that players will manipulate.

After capture and reconstruction, we have a set of calibrated cameras and 3D information in the form of a point cloud and a – usually inaccurate and incomplete – mesh. If a geometry-based approach is used, there is no further processing; images, cameras and geometry are loaded and rendering can begin.

If an oversegmentation/warping approach is used, a set of additional steps is required. In particular, the superpixel oversegmentation and depth synthesis (if we use the warping method) need to be run, and the superpixel correspondences computed as a graph. This preprocessing is done once per dataset and stored with the model, similar to geometry and textures in a traditional content management pipeline.

At runtime, all this data needs to be loaded: superpixels, depth and the superpixel correspondence graph. As explained below, we develop in two phases: first a research prototype, then the integrated platform. In the research prototype implementation, this loading step is quite slow; in the integrated platform, the data will be efficiently stored and retrieved using binary encoding and fast access.

At this stage, the actual image-based rendering (“Play”) can begin. The details of the render passes vary depending on the algorithm. For geometry-based approaches, there is generally a single pass to render depth, and a blending pass to reproject the images from a set of candidate cameras. For oversegmentation approaches, we render the mesh grids around each super-pixel. The method of [Chen 2011] is done with simple reprojection of the vertices, while for the warping approach, it is done by first warping the vertices of each mesh using the shape-preserving warp. The warped images are then blended together in a blending pass. A final hole-filling pass can be applied to complete uncovered regions.
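As a rough illustration of this pass structure on the Unity side, the sketch below chains a depth pass, a blend pass and a hole-filling pass using temporary render targets. The materials ("DepthPass", "BlendPass", "HoleFillPass") and the `_ProxyDepth` property are hypothetical placeholders; the actual CR-PLAY renderers differ in detail per algorithm.

```csharp
// Rough Unity-side outline of the render passes described above (illustrative only).
using UnityEngine;

public class IbrPassOutline : MonoBehaviour
{
    public Material depthPass;    // renders proxy depth from the novel view
    public Material blendPass;    // reprojects and blends the candidate input images
    public Material holeFillPass; // fills regions not covered by any input image

    void OnRenderImage(RenderTexture source, RenderTexture destination)
    {
        // 1. Depth of the proxy geometry as seen from the current (novel) camera.
        RenderTexture depth = RenderTexture.GetTemporary(source.width, source.height, 24);
        Graphics.Blit(source, depth, depthPass);

        // 2. Blend the reprojected candidate images, guided by the proxy depth.
        RenderTexture blended = RenderTexture.GetTemporary(source.width, source.height, 0);
        blendPass.SetTexture("_ProxyDepth", depth);
        Graphics.Blit(source, blended, blendPass);

        // 3. Optional hole filling for pixels left uncovered by all input images.
        Graphics.Blit(blended, destination, holeFillPass);

        RenderTexture.ReleaseTemporary(depth);
        RenderTexture.ReleaseTemporary(blended);
    }
}
```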

Image-based rendering will typically occur before the rendering of traditional content. During IBR, the z-buffer is also written, providing depth values that can then be used to render traditional assets consistently. This is important since it allows virtual objects to be hidden by captured content (e.g., a character walking behind a captured column) and vice versa (e.g., a moving car passing in front of a captured wall).

Software Platform and Integration

A key element in the development of our software tools is the need to both support integration within the common platform (Unity) and maintain the flexibility required for the development of research prototypes of experimental renderers. We have thus adopted a two-tier approach.

The first-tier platform is used for research work. It involves the development of a common library with functionality for the treatment of point clouds, meshes and calibrated cameras; this library remains in C++ and builds on the standard tools (OpenGL, Qt) used in the development of research prototypes. We have refactored our original research code, and currently have an implementation with ULR, videomesh, superpixel warp, and ambient point clouds [Goesele 2010]. We are in the process of integrating combined IBR-virtual character rendering (this last item is part of the VERVE project). In the next 6-8 months, we will also integrate the viewer for interactive relighting and the possibility to relight the input images of a given captured IBR scene. Some of the preprocessing functionality for image-based rendering and relighting will also be linked to this library if the need becomes apparent.

The second tier is the integration into the common platform. This work is under way and involves porting the original C++/OpenGL code of superpixel warp rendering to Unity. We have taken a progressive approach to this task, to ensure that we understand the difficulties and potential problems involved. We have thus opted to keep some of the functionality in C++ for now, as a plugin to Unity, in particular the superpixel mesh warping step. This choice is justified for reasons of performance, since C# versions of numerical libraries are not very efficient. The rest of the input and renderer preparation code is being progressively ported to C#. As a first step, we will use GLSL shaders to allow the development of a complete working prototype, which initially will only be available as a standalone workstation application. In future steps, we will port the shaders to the ShaderLab language of Unity, which will allow the use of image-based rendering on all platforms except the browser (which does not include support for plugins). Additional difficulties involve the older shader language version supported by Unity (equivalent to OpenGL 2.0) and the performance of the warping operation, which involves quite expensive numerical computations. Shader language versions should not be a big problem, since only minor features of recent OpenGL enhancements are being used, and they can be changed easily.
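The sketch below shows, purely as an assumption, how such a C++ warping step could be exposed to Unity through a native plugin binding. The plugin name, function signature and data layout are hypothetical, not the actual CR-PLAY plugin interface.

```csharp
// Hypothetical native-plugin binding for the superpixel mesh warping step.
using System.Runtime.InteropServices;
using UnityEngine;

public class SuperpixelWarpPlugin : MonoBehaviour
{
    // Assumed native entry point performing the shape-preserving warp of the
    // superpixel mesh vertices for one input image (names are placeholders).
    [DllImport("CRPlayIBRPlugin")]
    private static extern int WarpSuperpixelMesh(
        float[] inputVertices, int vertexCount,
        float[] depthConstraints, int constraintCount,
        float[] outputVertices);

    public float[] Warp(float[] vertices, float[] constraints)
    {
        var warped = new float[vertices.Length];
        // The expensive numerical optimization runs in C++; C# only marshals data.
        WarpSuperpixelMesh(vertices, vertices.Length / 2,
                           constraints, constraints.Length / 3, warped);
        return warped;
    }
}
```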

To overcome the issue of the warp performance, we will investigate several fall-back solutions, which may be appropriate for web-based or mobile platforms where the quality requirements are not as high as for workstations. This may involve a combined solution with ULR, videomesh, and superpixel warp rendering, depending on the content and the target platform.

Video Based Rendering Algorithms

CR-PLAY assets that are dynamic, i.e. moving over time, bear some resemblance to IBR assets. Full 3D reconstruction of geometry remains difficult, and may even be impossible in some cases. For example, high quality surface normals can be captured by experts with complex Photometric Stereo setups [Woodham 1980], provided the subject stands perfectly still to allow multiplexing of the light direction. Progress on alignment [Wilson 2010] and our own multi-spectral techniques [Brostow 2012] means that some dynamic objects can now be captured, though only in a very constrained studio setting.

Also similar to IBR, video elements have lighting- and view-dependent shading effects that look flat and fake if texture is simply baked onto good or bad geometry that is just re-rendered in a game. This is especially noticeable on objects that are reflective, such as automobiles or even shiny plastic. Much of the existing literature on 3D surface-acquisition assumes Lambertian, i.e. non-reflective surfaces. Obviously, we must cope with a much broader range of surfaces for general game content-creation. Very little attention has been paid to view-dependent lighting effects in previous video-sprite literature [Schoedl 2000], [Schoedl 2002].

Further, VBR faces the additional challenges of i) spatial segmentation, ii) temporal coherence, and iii) causality. Segmentation is needed to isolate a moving element from its original setting, so it can be inserted into a new environment. Some elements, like moving vehicles, are largely rigid, so the segmentation challenge is mostly about overcoming viewing-angle dependencies. Other objects, like palm-trees, are highly deformable, so both viewing-angle and dynamic pose make segmentation difficult. Temporal coherence is important because any artifacts in segmentation or rendering, even minor ones, become obvious as attention-grabbing flickering during playback. Finally, dynamic elements are often tied to physical phenomena, and need to be consistent. For example, a fluttering flag reveals the direction of blowing wind, and should move similarly to nearby objects subject to the same forces.

Position of VBR in the Pipeline

Video-Based Rendering spans the entire CR-PLAY pipeline, from capture to play. Video elements can be used passively in the background, or as controllable/interactive elements in the foreground, as needed.

For brevity, we omit the view-dependent side of reconstruction and playback for VBR elements, because they bear substantial resemblance to IBR, and focus instead on the parts that are unique to creating animated assets.

For playback of single-view VBR elements, the matted video sprites undergo very few changes, except for being played back out-of-order compared to the original sequence(s). If the data was labelled (by the user or automatically) into named sub-sequences, the rendering stage should specify which sub-sequence is desired, i.e. should the about-to-be-rendered flag be billowing to the left or the right? If sub-sequence naming is not available, any of the input frames can be selected. Of course, the actual probability that a frame will be selected depends on its characteristics in the previously computed cost matrix, i.e. how compatible that frame is with non-sequential frames. The renderer simply generates new image sequences by selecting compatible chains of frames of the desired length.

Software Platform and Integration

Initial prototyping of the VBR system is implemented using an opportunistic mix of programming languages and libraries. This is helpful, especially in the context of memory-hungry video, and where different algorithms are created and studied for feasibility. In this development stage, rapid experimentation is more important than overall code efficiency or integration with the general CR-PLAY pipeline.
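As an illustration of the kind of playback logic prototyped at this stage, the following is a minimal sketch of selecting a chain of compatible frames from the pre-computed cost matrix, in the spirit of video textures [Schoedl 2000]. Thresholds, names and the fallback rule are illustrative assumptions, not the project's implementation.

```csharp
// Minimal sketch: generate a new frame sequence by repeatedly jumping to a
// compatible frame according to the pre-computed cost matrix (lower = better).
using System;
using System.Collections.Generic;

public static class VbrChainSelector
{
    // costMatrix[i, j] = visual cost of showing frame j directly after frame i.
    public static List<int> SelectChain(float[,] costMatrix, int startFrame, int length,
                                        float maxJumpCost, Random rng)
    {
        int frameCount = costMatrix.GetLength(0);
        var chain = new List<int> { startFrame };

        for (int step = 1; step < length; step++)
        {
            int current = chain[chain.Count - 1];

            // Collect all frames compatible enough to follow the current one.
            var candidates = new List<int>();
            for (int next = 0; next < frameCount; next++)
                if (costMatrix[current, next] <= maxJumpCost)
                    candidates.Add(next);

            // Fall back to the original successor when no loop-back is available.
            int chosen = candidates.Count > 0
                ? candidates[rng.Next(candidates.Count)]
                : Math.Min(current + 1, frameCount - 1);
            chain.Add(chosen);
        }
        return chain;
    }
}
```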

After prototyping, the second stage of development targets the common CR-PLAY platform, Unity. The filming and reconstruction stages of VBR are part of content creation, so these are being integrated into Unity mostly as a matter of convenience for the users, who can then better preview the assets they are preparing. The final stage of VBR, where the video sprites showing the dynamic element are rendered into the game, is necessarily tightly coupled with the game engine and the current state of game-play. Specifically, the VBR rendering function must know from the game which named video element is desired (if labelling is being used) and how many frames of the sequence are required. As output, the function returns a sequence of image buffers of that length, along with their alpha mattes, to allow compositing. The calling function uses each image buffer in turn as a texture map for a simple billboard. In the flag example, the sprites showing the flag blowing to the right appear on an otherwise invisible rectangular billboard, positioned at the end of a traditional-content flag pole, which is modelled and rendered as conventional 3D geometry and texture. As game-play calls for the flag pole to move left or right, the VBR function will be asked for further, context-appropriate sequences.
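A hedged sketch of the game-facing side of this contract is given below. The interface and component names are hypothetical; a provider returns a labelled sequence of RGBA frames (colour plus alpha matte), and a billboard behaviour plays them back as the texture of an otherwise invisible quad.

```csharp
// Illustrative only: not the actual CR-PLAY VBR interface.
using UnityEngine;

public interface IVbrSequenceProvider
{
    // Returns 'frameCount' RGBA textures for the named sub-sequence
    // (e.g. "flag_blowing_right"); alpha encodes the matte for compositing.
    Texture2D[] RequestSequence(string subSequenceName, int frameCount);
}

public class VbrBillboard : MonoBehaviour
{
    public string subSequenceName = "flag_blowing_right"; // label assigned during capture
    public int framesPerRequest = 60;
    public float framesPerSecond = 25f;

    private IVbrSequenceProvider provider; // assigned by the VBR plugin wrapper
    private Texture2D[] frames;
    private float elapsed;

    public void SetProvider(IVbrSequenceProvider p) { provider = p; }

    void Update()
    {
        if (provider == null) return;
        if (frames == null || frames.Length == 0)
            frames = provider.RequestSequence(subSequenceName, framesPerRequest);

        // Advance playback and apply the current frame as the billboard texture.
        elapsed += Time.deltaTime;
        int index = (int)(elapsed * framesPerSecond) % frames.Length;
        GetComponent<Renderer>().material.mainTexture = frames[index];
    }
}
```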

Edit&Play Workflow within the Game Development Tool

In modern game development tools like Unity, one of the most important tools is the level editor, a powerful visual interface that allows direct manipulation of the game scene. The Unity Editor is a fully integrated and extendable interface, mainly composed of an advanced level editor combined with the Unity asset management system and other Unity-specific tools. The Unity Editor provides five main views (as shown in Figure 7):

 Project: this is the interface to the asset management system and visualizes the list of assets included in the project, which are quickly searchable and sortable.
 Scene: this is the sandbox visual 3D editor that provides tools to manipulate the game scene, allowing the user to create, add, remove or modify game objects and build game levels and scenes.
 Hierarchy: shows all the game objects and their hierarchies in the scene.
 Inspector: provides the interface to view and edit specific parameters of game objects and assets in both scene and project.
 Game: a WYSIWYG view that enables preview and testing of the game, including how it will look and run on a target device.


Figure 7 - Main views of the Unity Editor

When a new asset has to be imported into the game scene, developers need to follow a precise workflow. For each step, the Unity Editor provides a dedicated user interface.

Before considering the integration of IBR and VBR assets, we briefly describe how the Unity Editor handles loading, placement and interaction of traditional assets inside a game level. For this example, we consider an animated 3D character model that is first placed inside the Unity Assets folder, in order to make it available from the Project view. When selecting the 3D character in the Project view, all its parameters are exposed in the Inspector view (scale factor, mesh compression, normals, materials, textures, animation clips, etc.), allowing developers to manipulate their values as needed.

Once the 3D character is loaded, it can be dragged into the Scene or Hierarchy view in order to instantiate the corresponding game object in the game scene and start working on it. Selecting the newly created game object will show its properties in the Inspector view, but this time these properties will be related to the game object and will contain specific information such as its 3D transform and behaviours.

Thanks to the data-driven approach, a game object is an abstract entity with transform properties that define its position, scale and orientation in the scene, and it can "contain" one or more behaviours (scripts) describing its logic and aesthetics. In essence, a game object is completely defined by its behaviours.

At this point, the developer can add new behaviours to the game object by dragging the corresponding scripts into the Inspector view, manipulate their exposed parameters, or move, rotate and scale the game object directly in the Scene view. Once everything is ready, the developer can run the scene to test the work done and check that each game object behaves as designed.
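A minimal example of the kind of behaviour script referred to above is shown below: a public field is exposed in the Inspector view, and the script drives the game object's transform every frame. It is illustrative only and not part of the CR-PLAY tools.

```csharp
// Minimal Unity behaviour: exposes a parameter and updates the game object.
using UnityEngine;

public class Rotator : MonoBehaviour
{
    public float degreesPerSecond = 45f; // editable in the Inspector view

    void Update()
    {
        // Rotate the game object this behaviour is attached to.
        transform.Rotate(Vector3.up, degreesPerSecond * Time.deltaTime);
    }
}
```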

Management of IBR and VBR assets will follow the same approach, but with the addition of dedicated tools, behaviours and general code that will extend the Unity Editor in order to let developers handle and manipulate them.


IBR and VBR asset files will be placed inside the Unity Assets folder, making them automatically available to the game engine and immediately accessible from the Project view. Initial setup operations can be performed as soon as these assets are loaded; by selecting them in the Project view, all import properties are shown in the Inspector view, where they can be adjusted by the development team.

Unity allows developers to interact with its user interface and controllers in the same way they interact with the game objects contained in a scene, thanks to the Unity Editor API, a set of libraries and functionalities provided by Unity and accessible from C# scripts. Using this API, developers can create and customize editor tools such as, in this case, a new tool associated with the IBR/VBR asset data format that reads all parameters and automatically sets up a prefab (a reusable game object stored in the Project view) ready to be placed and used in the scene.
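A hedged sketch of such an editor extension is shown below: a menu command reads the selected asset and builds a prefab carrying a placeholder component. The component, menu path and file names are hypothetical, and the prefab-creation call shown matches the Unity 4.x editor API of the time; in a real project the runtime component would live in a separate, non-Editor script.

```csharp
// Illustrative editor tool (not the actual CR-PLAY importer).
using UnityEditor;
using UnityEngine;

public static class IbrAssetImporterTool
{
    [MenuItem("CR-PLAY/Create IBR Prefab From Selected Asset")]
    private static void CreateIbrPrefab()
    {
        // Path of the asset currently selected in the Project view.
        string assetPath = AssetDatabase.GetAssetPath(Selection.activeObject);
        if (string.IsNullOrEmpty(assetPath))
        {
            Debug.LogWarning("Select an IBR/VBR asset in the Project view first.");
            return;
        }

        // Build a game object carrying a placeholder IBR renderer behaviour
        // and remember where its data lives.
        var go = new GameObject("IBR Scene");
        var renderer = go.AddComponent<IbrSceneRenderer>();
        renderer.dataPath = assetPath;

        // Store it as a reusable prefab in the Project view (Unity 4.x-era API).
        PrefabUtility.CreatePrefab("Assets/IBRScene.prefab", go);
        Object.DestroyImmediate(go);
    }
}

// Placeholder runtime component referenced by the tool above.
public class IbrSceneRenderer : MonoBehaviour
{
    public string dataPath;
}
```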

Once the prefab is ready, the developer can drag it into the Scene view to place the IBR/VBR game object in the game scene and start interacting with it. This game object can be visualized as a proxy of the reconstructed scene (see Figure 8) to give an idea of its approximate size in the game scene and to help place other traditional assets accordingly. Other visualizations may be more appropriate, and will be investigated once the integration is completed. Additional features, such as physics proxies for collision handling, particle emitters for traditional special effects, light probes for modifying and harmonizing the general scene illumination, and so on, can be attached to the IBR/VBR game object to improve its integration with other traditional assets in the game scene.

Figure 8 - Fictional example of a possible proxy for a reconstructed IBR scene

Among all the behaviours that characterize IBR/VBR game objects, the most significant will be those that provide the IBR/VBR features. More precisely, the key is the communication between the game object and the IBR/VBR Plugins, which are responsible for part of the IBR/VBR technology integration. Rendering integration is performed using both the IBR/VBR Plugins, which prepare data for runtime execution, and dedicated IBR/VBR Unity Shaders, which allow direct manipulation of Unity's rendering pipeline.


Figure 9 shows the different components described above and their interaction in the CR-PLAY mixed pipeline in Unity.

Figure 9 - The CR-PLAY mixed pipeline integrated into Unity


4. Risks and contingency actions

Despite the adoption of a UCD approach that involves final users in every phase of the project, the experience of the partners in their respective domains, and the adoption of state-of-the-art technologies in their respective fields, there is a risk that project activities will face problems and unforeseeable issues. The consortium is well aware of this, and a first list of risks is already present in the DoW. Nevertheless, now that initial user requirements have been collected, the game development tool (i.e. Unity) has been chosen, and the architecture and functional specifications have been defined, additional risks have emerged. We present below a table summarizing additional risks related to the development activities of CR-PLAY, with an estimate of how likely they are to occur and the related contingency actions.

Risks and contingency actions table

Risk 1
Description: View planning and guidance for an unknown scene are inherently difficult. Finding a suitable stopping criterion (sufficient images to capture the whole scene in sufficient quality) is an unsolved problem because the algorithm cannot know how big the ‘whole scene’ is. For single objects that are captured in a 360 degree circle this is less of a problem than for large outdoor scenes.
Probability: Med – this is a known problem, but we are already working on mitigating it.
Contingency action: When the user selects the type of environment at the beginning of the capture session, it might be necessary to also ask for additional information such as scene extent.

Risk 2
Description: It is yet unclear how well a preview computed on a mobile device relates to the quality of a final reconstruction integrated into the game. This affects predictions during capture and during reconstruction.
Probability: Low – cases might occur where the initially predicted quality deviates from the quality of the full reconstruction.
Contingency action: In the worst case, and if capturing additional pictures is not possible, an artist will have to manually adjust the reconstruction.

Risk 3
Description: Some scenes, or parts thereof, might not yield a sufficiently accurate and complete reconstruction even with regularisation applied, e.g. translucent objects. The goal is to detect such cases during the capture guidance step and inform the user.
Probability: Low – first results tend to restrict these issues to very specific cases.
Contingency action: In the worst case, an artist will have to manually adjust the reconstruction.

Risk 4
Description: If the compute power on a device is too low, computations have to be performed on a server. If bandwidth is limited, this might not be possible in a short amount of time.
Probability: Low – computing power is not likely to be an issue when the project’s outcomes are exploited on the market.
Contingency action: In that case, additional simplifications and compression schemes have to be considered, which might reduce the overall quality.

Risk 5
Description: Integration of IBR in Unity: the older shader language version supported by Unity (equivalent of OpenGL 2.0) implies adapting the algorithms to be backwards compatible.
Probability: Low – we are confident that it is possible to write equivalent shaders in an OpenGL 2.0 compatible version.
Contingency action: We can possibly adapt the algorithm with additional passes to overcome the limitation.

Risk 6
Description: Integration of IBR in Unity: the performance of the warping operation, which involves quite expensive numerical computations and is slow in C# compared with the C++ version.
Probability: Med – first tests have shown possible performance issues.
Contingency action: We are investigating several options to circumvent this, including the development of new algorithms that avoid the warp, with a possible trade-off in quality.

Risk 7
Description: A practical risk for integrating VBR into the Unity game-creation pipeline centers on hardware constraints. VBR content, such as a moving ribbon, is likely to have a medium-to-high texture memory cost and a medium-to-low computational cost. If a game level requires only one animated ribbon, then it is very likely that an in-game physics simulation would cost much less memory and a comparable amount of CPU. VBR would be an inferior choice in that case.
Probability: High – this risk is a property of the VBR technology of CR-PLAY.
Contingency action: VBR should best be used for elements that appear repeatedly. For example, game designers should use VBR for levels that have hundreds of ribbons, as VBR scales very efficiently with the number of instances (no extra texture memory and trivial extra CPU usage).

Risk 8
Description: A second risk for VBR centers on the division between pre-production and game content-tuning. When all dynamic elements are synthetic geometry, textured and physically simulated, tuning their parameters is difficult (especially if realism is sought), but possible at any time. In contrast, VBR content will be very realistic, but must be planned out in advance, or else video footage will have to be re-filmed.
Probability: High – this risk is a property of the VBR technology of CR-PLAY.
Contingency action: Rather than filming at the outset, content-creators will be instructed to develop much of the game functionality early. This will help them establish a useful "grammar" of what the ribbon should do. Knowing this, they can then film (and label) the appropriate sequences, for example soft wind vs. gusty, or 0-360 degrees. Planning the game-play this way will lock down what footage needs to be collected in-camera.

Table 3 – Risks and contingency actions table


5. Conclusions

This deliverable sets the technological framework, the development approach and the specifications that will guide activities in WP4 (and indirectly also in WP1, WP2 and WP3). As stated above, it should be regarded as a starting point that will live and grow as the project progresses and results from evaluation start to appear. All partners involved in the development WPs have a specific role, and the work will be performed according to the plan depicted in the DoW. More detailed activities are defined within sub-groups, depending on specific needs, within the main tasks of the WPs and towards the achievement of the main objectives of the project.

5.1 Plan for next period

Development tasks in WP4 have been designed to follow a three-tier schema (Low-fidelity, High-fidelity and Final prototypes of the mixed pipeline). The next step towards the Year 1 goal is the creation of a pre-release of the Low-fidelity prototype. It is intended to be a first working version in which every piece of the pipeline is in place (possibly with incomplete or still-to-be-integrated code), and it will allow TL (in its role of WP4 leader and game developer) to start experimenting with the innovative technology and to produce feedback, further requirements and useful insights to better steer development activities.


References

[Brostow 2012] Brostow, G. J., Hernandez, C., Vogiatzis, G., Stenger, B., & Cipolla, R. (2012). Video Normals from Colored Lights. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[Buehler 2001] Buehler, C., Bosse, M., McMillan, L., Gortler, S., & Cohen, M. (2001). Unstructured lumigraph rendering. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques (SIGGRAPH) (pp. 425-432). ACM.
[Bundler] Snavely, N. Bundler: Structure from Motion (SfM) for Unordered Image Collections. http://www.cs.cornell.edu/~snavely/bundler/
[Chaurasia 2013] Chaurasia, G., Duchene, S., Sorkine-Hornung, O., & Drettakis, G. (2013). Depth synthesis and local warps for plausible image-based navigation. ACM Transactions on Graphics (TOG), 32(3), 30.
[Chen 2011] Chen, J., Paris, S., Wang, J., Matusik, W., Cohen, M., & Durand, F. (2011). The video mesh: A data structure for image-based three-dimensional video editing. In IEEE International Conference on Computational Photography (ICCP) (pp. 1-8). IEEE.
[Eisemann 2008] Eisemann, M., De Decker, B., Magnor, M., Bekaert, P., De Aguiar, E., Ahmed, N., ... & Sellent, A. (2008). Floating textures. Computer Graphics Forum, 27(2), 409-418. Blackwell Publishing Ltd.
[Fuhrmann 2014] Fuhrmann, S., & Goesele, M. (2014). Floating Scale Surface Reconstruction. ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH 2014).
[Goesele 2007] Goesele, M., Snavely, N., Curless, B., Hoppe, H., & Seitz, S. M. (2007). Multi-View Stereo for Community Photo Collections. ICCV 2007.
[Goesele 2010] Goesele, M., Ackermann, J., Fuhrmann, S., Haubold, C., & Klowsky, R. (2010). Ambient point clouds for view interpolation. ACM Transactions on Graphics (TOG), 29(4), 95.
[Kazhdan] Kazhdan, M., Bolitho, M., & Hoppe, H. (2006). Poisson surface reconstruction. SGP 2006.
[libMVE] GCC, TU Darmstadt. Multi-View Environment. http://www.gris.informatik.tu-darmstadt.de/projects/multiview-environment/
[Schoedl 2000] Schoedl, A., Szeliski, R., Salesin, D., & Essa, I. A. (2000). Video Textures. SIGGRAPH 2000.
[Schoedl 2002] Schoedl, A., & Essa, I. A. (2002). Controlled Animation of Video Sprites. Symposium on Computer Animation.
[Wilson 2010] Wilson, C., Ghosh, A., Peers, P., Chiang, J., Busch, J., & Debevec, P. E. (2010). Temporal upsampling of performance geometry using photometric alignment. ACM Transactions on Graphics.
[Woodham 1980] Woodham, R. (1980). Photometric Method for Determining Surface Orientation from Multiple Images. Optical Engineering, 19(1), 139-144.
[Wu] Wu, C. VisualSFM: A Visual Structure from Motion System. http://ccwu.me/vsfm/


Table of Figures
Figure 1 - Screenshot from Unity 4 Editor ...... 16
Figure 2 - CR-PLAY General Architecture ...... 17
Figure 3: Capture architecture and work flow ...... 20
Figure 4: Reconstruction architecture and data flow ...... 21
Figure 5: Output data in each of the pipeline stages ...... 23
Figure 6: Example workflow (from 1 to 4) demonstrating synchronization points with a local or remote asset storage ...... 24
Figure 7 - Main views of the Unity Editor ...... 29
Figure 8 - Fictional example of a possible proxy for a reconstructed IBR scene ...... 30
Figure 9 - The CR-PLAY mixed pipeline integrated into Unity ...... 31

Table of Tables
Table 1 – User requirements table ...... 8
Table 2 - Summary of analysis results ...... 15
Table 3 – Risks and contingency actions table ...... 33
