Toolkit for Social Experiments in VR

by

Unnar Kristjánsson

Thesis of 60 ECTS credits submitted to the School of Computer Science at Reykjavík University in partial fulfillment of the requirements for the degree of Master of Science (M.Sc.) in Computer Science

May 2019

Examining Committee:

Dr. Hannes Högni Vilhjálmsson, Supervisor
Associate Professor, Reykjavík University, Iceland

Dr. David Thue, Examiner
Assistant Professor, Reykjavík University, Iceland

Claudio Pedica, Examiner
Affiliated Researcher, Reykjavík University, Iceland

Copyright Unnar Kristjánsson, May 2019

Toolkit for Social Experiments in VR

Unnar Kristjánsson

May 2019

Abstract

Researchers tend to develop their own applications when it comes to running social behaviour studies in Virtual Reality. These applications are often single use and become obsolete between projects or studies, despite those projects often sharing similar goals or needs. This thesis argues that, given a set of principles influenced by modular design patterns found in embedded systems architecture, reusable toolkit applications could be constructed that serve as a shared foundation for social experiments in VR. What follows is a partial implementation of those principles and an example study enabled by that implementation. The conclusions suggest that, although refinements are needed, particularly to the presented implementation, which only partially satisfies the principles, the principles show merit as a basis for building a reusable software toolkit for social experiments in VR.

Hugbúnaðartól Fyrir Félagsrannsóknir í Sýndarveruleika

Unnar Kristjánsson

May 2019

Útdráttur (Abstract)

Researchers in the social sciences tend to develop their own software for running social studies in virtual reality. Such software is usually specialised and only usable in one particular context, and is therefore not reused between projects in the same field, even though projects mostly share goals and needs. This thesis argues that, by adhering to certain principles aimed at reusability, a software toolkit can be built whose capabilities can serve as a foundation for experiments in virtual reality. What follows is a partial implementation of these principles and an example of a user study built on that implementation. The results suggest that, despite the need for changes, particularly to the implementation presented, which only partially follows the principles, the principles show promise with regard to their utility for social experiments in virtual reality and for implementing software that serves those needs.

Toolkit for Social Experiments in VR

Unnar Kristjánsson

Thesis of 60 ECTS credits submitted to the School of Computer Science at Reykjavík University in partial fulfillment of the requirements for the degree of Master of Science (M.Sc.) in Computer Science

May 2019

Student:

Unnar Kristjánsson

Examining Committee:

Dr. Hannes Högni Vilhjálmsson

Dr. David Thue

Claudio Pedica

The undersigned hereby grants permission to the Reykjavík University Library to reproduce single copies of this Thesis entitled Toolkit for Social Experiments in VR and to lend or sell such copies for private, scholarly or scientific research purposes only. The author reserves all other publication and other rights in association with the copyright in the Thesis, and except as herein before provided, neither the Thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author’s prior written permission.

date

Unnar Kristjánsson Master of Science

Contents

List of Figures
List of Tables
List of Abbreviations

1 Introduction
   1.1 Motivation
   1.2 Social VR Toolkit
   1.3 Problem Statement
   1.4 Contributions
   1.5 Structure

2 Related Work
   2.1 Game Engines
   2.2 User Friendly Abstractions
      2.2.1 Visual Programming
      2.2.2 Natural Language Understanding
      2.2.3 Native Format Conversion
   2.3 Social Agent Research Tools
      2.3.1 SAIBA
      2.3.2 SAIBA Compliance

3 Approach
   3.1 Design Principles
   3.2 Architecture
   3.3 Supporting Custom Content
   3.4 Agent Behaviour
   3.5 VR Considerations
      3.5.1 Design Guidelines
      3.5.2 User Interaction
      3.5.3 Interface
   3.6 Use Case
      3.6.1 The Scenario
      3.6.2 The Tools
      3.6.3 The Procedure

4 Implementation
   4.1 Implementation Principles
   4.2 Architecture
      4.2.1 Network Layer
      4.2.2 The Command Manager
      4.2.3 Networked Commands
      4.2.4 Command Handlers
      4.2.5 Command Terminals
      4.2.6 Native Command Terminal
      4.2.7 External Web Interface
      4.2.8 Command Macros
   4.3 Adding Content
   4.4 Creating Agents
      4.4.1 Humanoids in Unity
      4.4.2 Unity’s Animator Component
      4.4.3 Skinned Meshes in Unity
      4.4.4 Human Animator
      4.4.5 Unity’s Timeline and Sequence Editing
      4.4.6 Extending Unity’s Timeline
         4.4.6.1 Pose Track
         4.4.6.2 Bone Track
         4.4.6.3 Gaze Track
         4.4.6.4 Expression Track
      4.4.7 Behaviour Annotator
   4.5 VR Features
      4.5.1 VR Interface
   4.6 Use Case
      4.6.1 The Scenario
      4.6.2 Content Creation
      4.6.3 Visual Aesthetics
      4.6.4 Characters
      4.6.5 Characters Aesthetics
      4.6.6 Character Dialogue
      4.6.7 Behaviour Sequences
      4.6.8 User Interaction and Interface
      4.6.9 Configuration

5 Results
   5.1 Principles and Architecture Flexibility
   5.2 Content Support
   5.3 Usability of Agent Tools
   5.4 Use Case Reviewed
      5.4.1 Preparation
      5.4.2 Running the Study
      5.4.3 Scenario Specific Commentary

6 Future Work
   6.1 Content Pipeline
   6.2 Dedicated Scripting Language
   6.3 Scripting Abstractions
   6.4 Speech Generation
   6.5 Reactive Animation System

7 Conclusion
   7.1 Contributions
      7.1.1 Guidelines
      7.1.2 Command Line Interface
      7.1.3 Animation Tools
      7.1.4 Content Pipeline

Bibliography

List of Figures

2.1 Modded Doom
2.2 Developer Consoles
2.3 Addons in WoW
2.4 Visual Programming
3.1 Toolkit Usage
3.2 Application Content Pipeline
3.3 Example Study Composition
4.1 Command Based Architecture
4.2 Command Handler Inspector
4.3 Command Terminal Execution
4.4 Native Terminal
4.5 Networked Command Terminal Execution
4.6 Web App - Terminal
4.7 Web App - UI Widgets
4.8 Timeline Animator Control
4.9 Unity Timeline Editor
4.10 Custom Timeline Tracks
4.11 Pose Track
4.12 Bone Track
4.13 Gaze Track
4.14 Expression Track
4.15 Behaviour Annotator
4.16 In-VR Interface
4.17 Pilot Study Procedure
4.18 Cartoon Object Shading
4.19 Cafe Environment
4.20 Cafe Characters
4.21 Behaviour Variation
4.22 Character Prototypes
5.1 Procedure Setup

List of Tables

4.1 Custom Animation Tracks
5.1 Pre-exposure Simulation Sickness
5.2 Post-exposure Simulation Sickness

List of Abbreviations

VR   Virtual Reality
CLI  Command Line Interface
HA   Human Animator

Chapter 1

Introduction

1.1 Motivation

The current wave of Virtual Reality technologies has introduced a variety of applications for VR, and social researchers, among others, have been drawn to its qualities. Because VR lets a person take part in fantastical scenarios, the technology is appealing for scientific study: the conditions and variables of the virtual world are easy to manipulate, and the person placed in VR can be made to assume a role, such as a bystander in a recreation of a real-life location. The recent surge in VR follows improvements to hardware that allow minimum requirements for comfort to be met at a consumer level. These hardware advances also continue to enable progress in other fields that may indirectly benefit VR, such as computer simulation. The basic view of a simulation is that it attempts to computationally replicate some specific real-world behaviour, or a category of real-world behaviours. For example, a crowd simulation attempts to replicate the behaviour of a crowd of pedestrians moving in realistic-looking patterns. Such simulations are usually developed in isolation, even though coexistence with other types of simulations is the intended goal: a crowd simulation could be used inside a larger city simulation composed of multiple single-purpose simulations that together form a coherent whole. Coexistence is tricky to achieve, since the requirements of each simulation cannot always be satisfied simultaneously. Simulations may race each other for system resources, and how freely they can do so can produce chaotic results. Some degrade gracefully in proportion to the resources they receive, while for others it is all or nothing: a crowd simulation may function adequately on little processing power, while a physics simulation may break down if not given ample resources for its calculations.

VR games are not too dissimilar from their traditional counterparts. The most obvious differences concern hardware and input, which are easy to outline because they concern basic usability and comfort. In other respects, if something is applicable to traditional games, it may reasonably be assumed to apply to VR games as well, though to what degree is a matter of scrutiny. Certain flaws can become glaringly obvious when viewed from the more intimate first-person perspective used in VR. For example, viewers observing the conversational characteristics of virtual characters may disregard certain nuances when watching from the detached perspective of a 2D monitor. VR may in such cases prove more suitable for certain types of scenarios, although this remains a subject for study in each case.

1.2 Social VR Toolkit

Social researchers working with VR often develop their own set of software tools to aid them in some specific inquiry. These are single-purpose applications tailored to an experiment of their design and the questions they want answered. After producing a study that uses the tools, the tools become obsolete. Any future experiment design will require that the old tools be refitted to a new purpose, or that a new set be made from scratch. Either option requires a level of software proficiency, and the first also requires technical familiarity with the old tools. So it is not just that creating new software tools is a tedious process; it also puts software-illiterate researchers at a disadvantage. The development overhead and tedium, and the fact that the tools are so similar, beg the question: why are they not reused? Is it really more practical for tools to be hardwired to fit certain experiments rather than be configurable? Social scientists, for example, must be prepared to recruit software specialists and spend a great deal of time communicating their experiment design. The developers need to become intimately familiar with the topic to prevent requirements from being lost in translation, and the researchers need to become intimately familiar with the technical limitations to keep requirements realistic. This process is a lot of busywork, much like the busywork the previous group of social scientists had to go through to get their study off the ground.

1.3 Problem Statement

In many commercial video games, the underlying engine provides access to technical features that allow user modification of game content. Experienced users can take this even further: through an engine's more developer-centric interfaces they can modify the game's execution to suit different purposes. How constrained user modification is varies, and usually depends on whether such usage is accounted for by the game's architecture. Researchers seeking demonstrative environments for simulation models frequently make use of the modding capabilities of commercial games, minimising or even removing the need to develop specialised software from scratch. Furthermore, the simulation model being demonstrated may have been developed with such a game environment in mind, so embedding it into one from the start can serve as a useful test of its practical viability. Conversely, the modding approach is inherently limited by its dependence on the game content being modified. While it benefits from the feature richness of an existing game's environment, it is limited by the extent to which the game's execution can be modified, and major workarounds may have to be implemented to get around such limitations, if that is possible at all. Secondly, since very game-specific content is being modified, and that game runs on a very specific version of its engine, the modification will not benefit from any features added in newer versions. This is potentially undesirable if the creators would like to use a new feature that should by all appearances work well with their mod, perhaps even a feature added to address limitations they previously could not solve with what they had. The mod-derived simulation therefore remains entirely fixed in capability and tied to whatever features that version of the engine offers, for better or worse.

Social researchers may instead opt to develop their own software tailored to the study they have in mind. This allows them to work within their own boundaries, but requires more development effort as a result. The product is unlikely to be concerned with reusability and will therefore be geared towards a single study. Future studies will then require their own software, which makes it difficult for direct follow-up studies to guarantee that previous conditions remain unchanged on a technical level while safely altering others.

Perhaps a mixed approach would prove more sustainable: one that is part moddable game environment and part specialised research software. Reformulated: what if there were a generally applicable set of software tools for social researchers with which virtual social scenarios could be put together; a toolkit for rapid realisation, demonstration, and evaluation of ideas derived from theoretical social models? Is it possible to determine whether such software would hinder the expansion of ideas, or whether it would provide a stable platform for implementing social scenarios and studies thereof?

1.4 Contributions

The work presented here provides groundwork for the further development of a general-purpose software toolkit within the realm of social simulation. It proposes several general criteria that such a toolkit should adhere to and capabilities it should possess; namely, those that benefit researchers in their study of social simulations involving human subjects and artificial actors.

1.5 Structure

Chapter 2 (Related Work) explores relevant background material and motivations pertaining to social simulations. Chapter 3 (Approach) outlines a set of principles and capabilities that a social toolkit should possess if implemented in practice; additionally, it presents an outline of an example study that would benefit from the existence of such a set of tools. Chapter 4 (Implementation) presents a concrete implementation of a social toolkit that aims to provide a practical example of the ideals from chapter 3. An implementation strategy for the example study from that chapter is also presented, using the capabilities provided by the tools. Chapter 5 (Results) contrasts the ideals from chapter 3 with the implementations of chapter 4, reviews to what degree the initial criteria were met, and notes drawbacks of the chosen form of implementation. Feedback from the example study is also presented with the same goal in mind. Chapter 6 (Future Work) draws from chapter 5 and lists improvements that could be made to overcome the shortcomings encountered, as well as proposing new capabilities that a toolkit might ideally include that were not previously considered. Chapter 7 (Conclusion) summarises the work and outlines the contributions of the thesis.

Chapter 2

Related Work

2.1 Game Engines

A game engine is a form of software development environment that provides a set of standard solutions for game development, such as for graphical rendering or physics simulation. Engines allow developers to concentrate their work on what is unique to their own game content without having to worry about reimplementing common solutions such as hardware or operating system interaction.

The modern understanding of game engines grew alongside mainstream use of 3D graphics. id Software is largely credited for laying groundwork in both areas with their games Wolfenstein 3D (1992) [1] and the later Doom (1993) [2]. The transition from Wolfenstein to Doom marked id’s first use of an engine in their games. Before Doom, and exemplified by Wolfenstein, games were developed as single-purpose units. id’s first engine, id Tech 1, would eventually be used in the studio’s own Doom and its sequel Doom II (1994) [3]. Bundling the engine in this manner also allowed id to license its use to other studios, such as to Raven Software for their game Heretic (1994) [4] - a practice that has become common for engines.

id Tech 1 also officially supported the concept of user modding as part of its architecture, giving users the ability to modify how content is loaded in (see figure 2.1). Previously, in Wolfenstein, advanced users had managed to find ways to modify the application through unofficial means, swapping content around and creating different experiences. The desire of users to modify content was therefore accounted for explicitly in id Tech 1, which exposes portions of its architecture to the user, allowing them to rewire it to a certain extent. Content could be swapped out, and as long as mods stayed inside the boundaries of the game running the engine, new games could essentially be created from the same ingredients. User mods would continue to be popular for later engines such as Epic Games’ Unreal Engine (1998) and Valve’s Source Engine (2004), and the latter’s developer would often seek to recruit mod creators, owing to the fact that the modding tools made available were the same content creation tools available to developers.

Figure 2.1: Total conversion mods for Doom (left) [5] and Doom II (right) [6]

The type of mod support offered has varied between engines. For id Tech 1 games, mods were largely kept within the bounds of whatever features the games shipped with. Unreal Engine 1 (1998) opened the door to additional flexibility by providing support for custom script modules, allowing modders to expand on the existing set of features rather than merely rewiring them or swapping out assets. It is not uncommon for engines to include a form of backdoor access to games at runtime, whether they support custom user scripts or not, although this form of access is often indicative of such flexibility being available. Developer consoles or terminals are usually the form such access takes, and they can be summoned in-game (see figure 2.2). The output of these console windows tends to expose the current activity of the engine: what is currently happening behind the scenes. By examining the output, one may get a sense of how the engine manages a game's content at runtime.

Figure 2.2: Examples of developer consoles in games: Portal 2 (left) [7] and Fallout 3 (right) [8]

Blizzard Entertainment’s online multiplayer game World of Warcraft (WoW) [9] incorporates a fully moddable approach to its interfaces, allowing users to fully customise the game’s UI with Addons, shown in figure 2.3. Modders have taken advantage of this extensive access, and dedicated websites exist that host catalogues of different types of addons. For instance, one of the game’s most popular addons is a set of visual aids for enemy encounters, which replaces the game’s more basic defaults [10].

Figure 2.3: Left: A customised interface in World of Warcraft through addons. Right: A web site listing for WoW addons.

Game engines and modding support are not used just for entertainment purposes. They have also provided a foundation for many scientific projects where researchers have opted to use them as a development platform to test out their ideas and prototypes [11]. Researchers either use an engine, in which case they build their own application around the idea, or mod a game, in which case an existing application is rewired to fit the idea [12]. Building an application from scratch lets researchers pick and choose the available set of features more freely, but generally this also requires greater technical involvement and insight into the engine architecture. Modding an existing application limits researchers to whatever features are already available, but the modding interface may be simpler for programming novices to work with, especially if the application comes with dedicated modding tools, some of which are entirely visual rather than programmatic. The game modding approach is particularly appealing for research models that are intended to be integrated into games, since it simultaneously tests their practical feasibility and their viability for game environments, especially if the game being modded is an established commercial product. Conversely, if the idea is implemented inside an entirely custom application, feasibility may be proven by way of demonstration, but viability is less obvious and harder to argue for, as the result exists entirely inside the researcher's own domain: "...it works in the demo you made. Now what?". A custom application may be the only option if the feature set available to mods is either too limited, or the exact nature of what is exposed is incompatible with the design researchers have in mind. In full control, however, researchers can expand the set of features as needed and tune the application to the desired fit, though they need greater software proficiency to do so. Such applications are usually single purpose and, depending on the engine, are limited in how they can be modified. Similar issues exist for both approaches to research applications. Both require a level of software proficiency that researchers cannot be assumed to possess. Behaviour researchers may, for example, want to implement a theoretical behaviour model but cannot directly do so without either recruiting software specialists or acquiring the skills themselves. Either option is costly, and possibly unreliable if either group (assuming they are distinct for the sake of example) is too unfamiliar with the other's perspective; the result might be that the application becomes a poor representation of the original design.

2.2 User Friendly Abstractions

Software used by social researchers is often designed to accommodate their lack of software proficiency. The ideal in such cases is to provide them with some means of manipulating the software that is not predicated on knowledge of programmer-centric terminology or skills. A user friendly abstraction is a high-level abstraction layer whose use is defined in terms familiar to some non-technical audience. A piece of software may define several types of such abstraction layers, each aimed at a particular type of user. For example, a game engine may provide an interface geared towards artists as well as an interface geared towards writers, such that each respective group can work in terms they are familiar with.

2.2.1 Visual Programming

Visual programming employs graphical interfaces layered on top of programmatic software to make it easier for technical novices to use. Rather than writing textual instructions to feed to the software, the researcher manipulates graphical elements, the output of which is a program in a language that the software can interpret but that the researcher does not have to deal with explicitly (see figure 2.4). For example, stories can be authored using a visual language that converts visual semantic constructs into a 3D story scene representation [13].

Figure 2.4: Programmatic control versus visual programmatic control. Users of the left process write textual programs that are sent to the software (direct control). Users of the right process manipulate a visual interface that generates a textual program which is then sent to the software (indirect control).

2.2.2 Natural Language Understanding

As an interface for constructing complex visual scenarios, natural language can be considered an effective means of describing them intuitively. For example, systems have been developed that take natural language as input and attempt to output a visual representation of the scene being described [14]. The input is typically converted into some kind of internal representation through methods of natural language understanding. Natural language gives an illusion of freedom, while the output is inherently constrained by the system: there are only so many possible outputs.

This can partially be addressed by limiting the domain of use. For example, taking a textual description of car accidents and converting it to a visual representation [15].

2.2.3 Native Format Conversion

Visual programming can be thought of as one implementation of the idea of converting between user and software, but its visual aspect is generally tuned to hiding complexity rather than to serving a particular type of user, who might still prefer to work in a familiar format. For example, a writer producing a story script for a game may be compelled to learn a new visual programming language used by the game designers. Although the new language may be relatively easy to learn compared to a programming language, it may lack literary constructs familiar to the writer. One possible solution is to allow the writer to work in their native format and algorithmically convert that format into something native to the software.

2.3 Social Agent Research Tools

Game engines often provide basic ingredients for working with character behaviour, i.e. virtual humans or agents, but these are geared towards the needs of games. Because the needs of games and the needs of research tend to differ with regard to agents, researchers have developed specialised frameworks for agent behaviour. These frameworks focus to a greater degree on the expressive characteristics and communicative capabilities of embodied agents, and describe their behaviour in high-level physical terms. Defining behaviour in such an abstract fashion allows agent systems to be integrated into a variety of applications more easily than if the behaviour were concretely specified. The unifying assumption is that human physical affordances can be accounted for by the notation without specifying how they are to be realised.

2.3.1 SAIBA

SAIBA [16] is a design framework aimed at the generation of communicative behaviour for virtual agents. It defines layers of abstraction for representing communicative behaviour, which is visible in two languages defined within the scope of SAIBA: BML (Behaviour Markup Language) [17] and FML (Function Markup Language) [18]. BML defines a high-level syntax for describing behaviour in abstract physical terms. FML is conversely concerned with describing an agent's communicative intent: the motivating factors that underlie a particular arrangement of the agent's outputted behaviour. The ideas underlying SAIBA promote a certain notion of universality in describing human behaviour. The degree of universality varies and depends on the extent to which researchers can agree that certain features are generally applicable across domains. FML, for example, has been noted by contributing researchers as being more difficult to define in terms of a shared structure than BML, which concerns itself with more tangible modalities such as physical gestures [19]. To address such ambiguities, these languages attempt, to varying degrees, to leave certain portions of their definitions extensible.

2.3.2 SAIBA Compliance

A SAIBA-compliant system adheres to the specifications defined within SAIBA; BML realizers are one example. A BML realizer is an interpreter capable of generating synchronised behaviour from BML syntax. While BML provides behaviour sequences and synchronisation constraints, a realizer resolves this more abstract notation into a concrete executable form. A realizer implementation is inherently specific and serves as a bridge between BML and a particular type of animation system. Various examples exist of BML realizers made for different contexts [20], [21]. Some realizer implementations strive to be more generic and provide a solution that sits between the abstract notation of BML and a specific application, providing the framework for parsing the notation while leaving application-specific portions of the architecture open for extension [22].

BEAT (Behaviour Expression Animation Toolkit) [23] is an animation tool used to derive human animation appropriately synchronised with a given dialogue. A text utterance is fed into the tool, and a behaviour script is output, which describes in abstract terms a sequence of behaviours with synchronisation constraints. This script can then be fed into an interpreter that converts its descriptions into executable animations. Since the tool operates on a strictly descriptive level, it can be integrated into different contexts. BEAT's output is configured by specifying rules that govern how linguistic characteristics relate to behaviour, and how certain behaviours can and should be applied in response to linguistic patterns. BEAT can be effectively utilised in applications that require the presence of a responsive communicative agent [24].

What these tools notably have in common is that their output strives to be application agnostic; the tools expect specific implementations to give that output executable form. It is not entirely clear, however, whether such an implementation should be tuned to a specific context, such as a specific type of simulation, or whether it should also strive to be context agnostic, so that it can be plugged as a fully functional unit into various applications.

Chapter 3

Approach

This chapter gives an overview of the minimal features that a toolkit should provide in order to support studies of social behaviour in VR.

3.1 Design Principles

A toolkit should ideally restrain researchers as little as possible; it should not force them to adhere to strict design patterns. A researcher should be able to modify the software's execution to as large a degree as possible, and the toolkit should therefore avoid enforcing broad design standards. This position is not taken in opposition to standards, but rather to minimise degradation of features in the face of changing or newly introduced standards. The responsibility for structure instead falls to the researcher, who makes sure the most appropriate standards of use are adhered to.

In other words, the toolkit should be composed of features, but how they wire together is left to the researcher. For example, a virtual reality manager and an AI manager could be provided as features of the toolkit, and the researcher might specify that some action of one should always follow some action of the other (a virtual reality participant lifts their hand, which causes an AI character to take notice).

Figure 3.1: Toolkit execution being customised by the researcher. A researcher's design is composed of calls to features in the toolkit.

Essentially, the software should provide researchers with a set of general-use building blocks rather than a building kit for something specific. This is also important for the software's longevity. If features are presented to the user already bound together in some fashion, it becomes more difficult to verify whether the application behaves as before with regard to a feature, because of needlessly introduced relations to newer additions. Put simply: define features, but leave relations to the researcher. It should be clarified that this mainly applies to the toolkit as a whole. For individual features, certain patterns of use may be promoted, but these are not intended to be uniform across the application. Rather, this reflects the possibility of the toolkit being composed of a multitude of different types of subsystems, each with its own unique form of input. Researchers may also shy away from using features if the implication is that those features may change their structure arbitrarily, requiring new standards to be learned. It may also encourage them to stick to an older version of the software, one that supports the workflow they have become accustomed to, to the detriment of having more options to build with. In summary, a social toolkit should be conceptually comparable to a play chest of building blocks. New types of blocks can always be added; they may not fit with the old ones, but previous arrangements are not invalidated by their addition. A system introduced into the toolkit should essentially exist independently, in a sort of void, where it can be called upon by researchers: a building block taken from the toy chest, or an addon installed on demand.

3.2 Architecture

The architecture of the software refers to the interface it provides to the researcher, allowing them to essentially define their own program composed of references to features.

For that purpose, the toolkit should possess scripting capabilities: an interpreted language in which the researcher can invoke features on demand and specify relations between their inputs and outputs. Referring back to the previous section on principles, the toolkit should essentially function as an engine, a well of outlets that can be plugged into a context drawn by the researcher who uses it. In summary, the toolkit should provide the researcher with a scripting interface that allows them to compose how the toolkit operates in the form of an interpreted program.

3.3 Supporting Custom Content

The term content refers to items such as 3D models, image textures, or sound files. To be regarded as universal to even a minimum degree, a toolkit should not include any specific content within its core. This ensures that the same software can be reused by multiple researchers. For example, the toolkit might be used in two different studies, each using a completely different set of 3D models. Either the toolkit includes a larger set of assets, supporting possibly many studies, or it remains free of them completely. The former cannot be sustained for very long, while the latter can be accommodated indefinitely if the software supports the import of new assets. The toolkit should therefore provide a content pipeline: a structure or process that allows researchers to append new content to the toolkit that it can reference during its execution (see figure 3.2).

Figure 3.2: Relation between application and custom content. The application supports content that adheres to a specific format.

Ideally, the formats fed into this pipeline (e.g. image formats such as .png) should not be specific to the toolkit. While it may define its own proprietary formats for certain purposes, it should at a minimum support existing formats whose structure is clearly defined and widely used. In summary, the toolkit should not come with any type of content built into it. To allow researchers to add content, it should instead provide a content pipeline that lets them extend the toolkit with new resources. This allows them not only to customise the execution of the program using the features provided (see previous sections) but also to have this program interact with new content, which guarantees to a greater degree that the toolkit remains generally applicable across studies. The alternatives are that the toolkit grows in size by bundling a large set of general-use assets, or that a new version of the toolkit is created, with its own assets, every time a new study is proposed. Neither approach is considered in this thesis, and the latter has already been criticised as being too "single use".

3.4 Agent Behaviour

An agent, or character, as referred to throughout this thesis, is a virtual human whose behaviour can be described in terms of universal, physical, human attributes. This section partially overlaps with the previous one on content, in so far as agent behaviour and content descriptions thereof are considered a type of user-derived content format. Because the tools aim mainly to support characters with basic human characteristics, it is sensible to derive the minimal requirements from a pre-existing, proven standard. BML (Behaviour Markup Language), for example, is one such standard: it provides a unified syntax for describing multi-modal human behaviour independent of an animation system. As such, support for BML, or something equivalent in purpose, is a recommended minimum. Beyond the more abstract terminology of behaviour notation, there should be support for at least one type of concrete animation system, one that can interpret behaviour notation into a realised, animated output. This does mean that certain choices must be made with regard to how a character is physically composed for it to be supported by the animation system. Most desirably, this design decision should be placed outside the toolkit, giving researchers the ability to integrate their animation system of choice, or even to assemble one, if the tools provided are small and flexible enough to allow it. Minimally, such an animation system should provide a way to target individual portions of a human character to a precise degree, down to unrestricted animation of individual bones if required. To summarise agent behaviour: the toolkit should include a paradigm for working with virtual agents, one that derives its terminology for describing character behaviour from existing notation that takes real human modalities into account. Additionally, it should include an animation system that can at a minimum realise the types of behaviour supported by the notation.

3.5 VR Considerations

One of the toolkit's primary aims is to support virtual reality, such that researchers can define procedures revolving around human subjects inside VR.

3.5.1 Design Guidelines

To adequately call itself VR-ready, a toolkit should take into account some minimal guidelines when implementing VR-specific features. These guidelines can be roughly divided into two categories: minimal restrictions and ideal aesthetics. Minimal restrictions, the more critical category, pertain to strict avoidance of elements that are linked to user discomfort. Certain modes of navigation, for example, have been noted to contribute less to simulator sickness than others [25], [26]. Graphical stuttering as a result of poor system performance can introduce visual artifacts that compound discomfort.

Aesthetic concerns conversely refer to more subjective ideals; the questions involved pertain more to taste than to literal physical comfort. Colour schemes, sound design, or the contextual "appropriateness" of objects fall into this category. Some exceptions to the former rule may also fall here, for example if some discomfort is considered unavoidable by the designer in pursuit of a specific creative goal (though within reasonable tolerance limits). To sum up, by stating these guidelines it is not recommended that toolkit designers spend effort policing users who break them. Rather, the guidelines can inform good design of individual VR-specific features, such as those concerning interfaces or features that are highly performance demanding. The software might inform the researcher of how optimal their design is with respect to performance where possible, but this should be secondary and considered only to the extent that it can be implemented while the toolkit remains as uncoupled as recommended. As before, the researcher should not be barred from making the attempt.

3.5.2 User Interaction

First, VR subjects should be able to interact with elements in the environment physically. This might involve any of a multitude of currently popular controllers, such as those paired with the HTC Vive. Second, there should be a method available that allows interactions by the subject to be described in terms of actions, and handlers that execute logic. These requirements refer to some degree back to the previous section on architecture and its stated criterion of supporting an embedded scripting language.

3.5.3 Interface

There should be support for researchers to define user interfaces that are specially tuned to interaction in VR. Such interfaces would allow the researcher to present in-VR UI to a subject, in the form of information or questions; surveys could be conducted, for example, without having the user break out of the currently engaged scenario. Such a system may be addressed as part of a larger scheme that concerns user interface features in general, but VR support should then be included as a guarantee.

3.6 Use Case

What follows is a typical use case, for the sake of example. Let us imagine that a researcher wants to understand the effect of certain behaviour on empathy. He might want tools that let him construct a procedure in VR that can produce those qualities.

3.6.1 The Scenario

The researcher wants to define his scenario in VR. This scenario includes expressive characters that behave in a certain way and should play out a sequence that involves them both. A questionnaire should be put to the subject during the scenario, letting him give feedback on the experience without exiting VR.

3.6.2 The Tools

The scenario might be composed of the following features, all of which are defined in a toolkit of the proposed sort.

• Content - to allow the researcher to add new environments and characters

• Agent - to allow the researcher to define character behaviour

• VR interfaces - to gather subject feedback inside VR

• Runtime scripting - to allow the researcher to configure the scenario and set parameters

3.6.3 The Procedure

The procedure should be configured such that when the application starts, an environment is shown. The subject is led through the procedure, which involves characters acting out social behaviour. The subject is asked to give feedback, and the procedure then ends. A rough outline of steps is shown in figure 3.3.

Figure 3.3: An example study. The study is first configured, and then executed using that configuration.

Chapter 4

Implementation

This chapter follows chapter 3 and views it through a technical lens. Previously, some abstract principles were discussed in terms of features that a social study toolkit for VR ought to possess. The implementation was developed with the Unity [27] game engine, and most of the features that were implemented are embedded in its architecture. The sections that follow contrast with chapter 3 and discuss implementations of features informed by its stated ideals: practical design principles, runtime architecture, agent design tools, and practical VR considerations. Those sections are then referred to in the final section, which discusses how the use case outlined in the previous chapter has been implemented.

4.1 Implementation Principles

This section addresses the general design principles from the previous chapter in practical terms. One of the stated goals was that individual features should be self-contained. This is achieved by designing each feature of the application to function as a separate system, such that the internal architecture of each feature does not attempt to account for how it might be used in relation to any other specific feature. Neither does its application interface use complex signatures that refer to other components. Rather than coupling components together directly, strategies such as dependency injection or middle components are used, both enabled through Unity's native inspector. Indeed, for these reasons, Unity's script architecture is commonly described as component based. The only exception to the dependency rule are components native to Unity: features may refer to these, and thus their only requirement is that they be used within Unity's framework.
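As a minimal sketch of this component-based, inspector-wired style, the following Unity script exposes a feature through a UnityEvent so that listeners can be attached in the editor rather than in code. The ScreenFade name is borrowed from figure 4.2; the rest of the script is illustrative and is not taken from the actual implementation.

```csharp
using UnityEngine;
using UnityEngine.Events;

// A self-contained feature: it exposes what it can do (FadeOut) and an event
// other components may subscribe to in the inspector, but it holds no direct
// reference to any other feature of the toolkit.
public class ScreenFade : MonoBehaviour
{
    [SerializeField] private float duration = 1f;

    // Listeners are wired up in the Unity inspector (dependency injection),
    // not in code, so this component never names the systems that react to it.
    [SerializeField] private UnityEvent onFadeComplete = new UnityEvent();

    public void FadeOut()
    {
        // ...drive the actual fade over 'duration' here...
        onFadeComplete.Invoke();
    }
}
```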

4.2 Architecture

This section discusses the strategies taken to provide users of the application with a way to configure its execution. The toolkit employs a command-based backend that delegates instructions to various features. This system is divided into three parts: terminal, manager, and handlers (further elaborated on below; component diagram in figure 4.1). Simplified, features are registered with the manager and can then be invoked by messages sent directly to it, a process that the terminal provides a graphical representation of. The features that follow all share one goal in some part: to make the application controllable through an easily accessible interface. The ideal is to be able to control the application externally and allow researchers to craft their own interface without needing to modify the core software itself. The command-like interface is built around the idea of executable strings sent to its core manager, and as far as the command system is concerned these can be sent either internally or externally in the same manner.

Figure 4.1: Command backend

4.2.1 Network Layer The application supports a network interface that allows it to be communicated with through web sockets. Component methods can be bound to names which comprise the available socket API. More strictly speaking, the application uses an implementation of Socket.IO, an abstraction layer which simplifies the use of web sockets. The key benefit of providing such an interface is that the application as a result can be externally interacted with by any application that supports web sockets. This in- cludes those written in most major programming languages, such as web pages running JavaScript, its web socket implementation being supported in all major browsers.

4.2.2 The Command Manager The command manager provides centralised access for executing commands, and adding new ones. Internally, it is implemented as an associated map between command names, and a list of handlers (see later section on handlers). A command is invoked by sending a command string to the manager for execution, which it validates, before extracting from it a command name and arguments, and invokes corresponding handlers. Consider the following example when the string "do_something(0)" is sent to the manager for execution:

1. "do_something(0)" is validated.

2. The command name "do_something" and argument "0" are extracted from the input.

3. Handlers registered to "do_something" are invoked with the argument "0"

Note: on invalid input, the manager will return an error.
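The following is an illustrative C# sketch of that parse-validate-dispatch flow; the class layout and names are assumptions and do not mirror the actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative sketch of the command manager: an associative map from command
// names to handlers, plus parsing of strings such as "do_something(0)".
public class CommandManager
{
    private readonly Dictionary<string, List<Action<string[]>>> handlers =
        new Dictionary<string, List<Action<string[]>>>();

    public void Register(string commandName, Action<string[]> handler)
    {
        if (!handlers.TryGetValue(commandName, out var list))
            handlers[commandName] = list = new List<Action<string[]>>();
        list.Add(handler);
    }

    public bool Execute(string commandString)
    {
        // 1. Validate the raw string, e.g. "do_something(0)" or "clock.set_time(12)".
        var match = Regex.Match(commandString.Trim(), @"^([\w.]+)\((.*)\)$");
        if (!match.Success) return false; // invalid input -> report an error

        // 2. Extract the command name and the comma-separated arguments.
        string name = match.Groups[1].Value;
        string[] args = match.Groups[2].Value == string.Empty
            ? Array.Empty<string>()
            : match.Groups[2].Value.Split(',');

        // 3. Invoke every handler registered to that name.
        if (!handlers.TryGetValue(name, out var list)) return false;
        foreach (var handler in list) handler(args);
        return true;
    }
}
```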

4.2.3 Networked Commands

By combining the network interface using web sockets with the command manager, commands can be invoked externally.

4.2.4 Command Handlers

Command handlers are used to register features with the command manager. They register themselves to a command name and are called by the manager when that command is executed.

A component representing a feature is attached to a handler by adding it as a listener to the execution event. Unity enables this form of generic dependency injection natively in its editor interface, which removes the need to create middle-man components to add features to the manager. See an example of a handler in figure 4.2.

For commands that take arguments it is often necessary to set the handler to not pass calls directly to features, but to components which convert the arguments into appropriate types or constrain their value. Several parsing components exist for this reason. A parser has its own listeners, which would include the feature corresponding to the command. A string-to-int parser for example will attempt to parse a string argument to an integer value, and if successful, passes the result to int listeners.

Parsers are not explicitly accounted for in the command architecture, but they allow more to be achieved with fewer dedicated components.

Listeners are registered to handlers either with or without arguments. Additionally, arguments can be set to be strictly validated against a given pattern. This can be used to constrain arguments to a certain range, or even to a specific value, such that multiple handlers are registered to the same command but set to handle different argument values.

Figure 4.2: Command Handler settings in the Unity Editor. 1) Command Handler script. 2) Command name and pattern validation 3) Registered listeners invoked without arguments (string parser). 4) Registered listeners invoked with arguments (none). 5) Parsing component (string to integer). 6) Registered features (Screen Fade).

It should be noted that handlers exist on a per-scene basis, where each scene can be thought of as a different context offering different functions. A scene with characters, for example, might contain command handlers for functions pertaining to their behaviour.
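A parsing component of the kind described above could look roughly as follows; this is a hedged sketch using Unity's UnityEvent mechanism, with illustrative names rather than the toolkit's actual classes.

```csharp
using UnityEngine;
using UnityEngine.Events;

// A UnityEvent carrying an integer payload, so that int listeners (the features)
// can be wired up in the inspector.
[System.Serializable] public class IntEvent : UnityEvent<int> { }

// Middle component of the kind described above: a command handler passes the raw
// string argument here, and only on a successful parse are int listeners invoked.
public class StringToIntParser : MonoBehaviour
{
    [SerializeField] private IntEvent onParsed = new IntEvent();

    public void Parse(string argument)
    {
        if (int.TryParse(argument, out int value))
            onParsed.Invoke(value);
        else
            Debug.LogWarning($"Could not parse '{argument}' as an integer.");
    }
}
```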

4.2.5 Command Terminals

A command terminal is a simple console interface that lets the user communicate directly with available features (see the execution flow in figure 4.3). Because terminals do little other than relay messages between the backend and the user, different types of terminals can be, and have been, implemented in this example.

Figure 4.3: Execution flow starting with terminal input and ending with a call to a custom component (script) 20 CHAPTER 4. IMPLEMENTATION

One of these is included natively and another has been implemented externally as a web app (both discussed below). As an aside, terminals can also be used to merely display raw text, such as to report the status of an active process.

4.2.6 Native Command Terminal

A native terminal (seen in figure 4.4) comes included with the application and can be opened with a keyboard press.

Figure 4.4: Native terminal

The downside of the built-in terminal is its incompatibility with VR at a performance level. While the interface is open, the active scene is rendered twice: first for the main display (the researcher's view), and then for the VR headset (the subject's). Keeping the interface open during an active VR session may not be necessary, but assuming it were, opening and closing the interface might cause performance spikes perceptible to the headset wearer, introducing a needless, unlogged variation.

4.2.7 External Web Interface

The web interface is not strictly a part of the core application. Rather, it is a separate web application that communicates with the command manager through a network layer (see the previous section on networked commands, and the execution flow in figure 4.5).

Figure 4.5: Execution of commands, with added network layer

This interface runs in a browser alongside the application. Since it is not tied to the application's rendering pipeline, it does not incur significant graphical performance cost, which makes it preferable to the native terminal as a realtime interface.

Figure 4.6: Terminal inside the web app

In addition to providing a simple terminal that functions identically to the built-in one (see figure 4.6), the web interface can very easily define more user-friendly input widgets that are mapped to commands (see figure 4.7), for example to conduct Wizard-of-Oz studies controlled by the researcher during execution.

Figure 4.7: User friendly representation of command interface.

4.2.8 Command Macros

Macros can be used as a shorthand to execute multiple commands at once. To add new macros, the application can be made to load them from external files at startup. A macro definition includes its name, the number of arguments required, and the set of command strings it is composed of, which include markers for arguments. When a macro is executed, the command strings are formatted to include the supplied arguments, and each formatted command string is then sent in turn to the command manager for execution.

{
    "name": "set_time",
    "args": 1,
    "script": [
        "clock.set_mode(1)",
        "clock.set_time($1)"
    ]
}

Listing 4.1: Example macro "set_time". Executes two command strings in order, with one formatted to include an argument. These commands in particular target a clock component, change its time mode to manual, and set the current time of day. The second command has no effect unless the former is also called, hence the need for two commands.

Macros are currently the only way to create a form of command that accepts multiple arguments. They’re ideal in that case for grouping a set of commands into reusable blocks.
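A sketch of the expansion step might look as follows; the field and method names are assumptions chosen to match listing 4.1, and the command manager is stood in for by a plain delegate.

```csharp
using System;
using System.Collections.Generic;

// Sketch of macro expansion matching listing 4.1: every stored command string is
// formatted with the supplied arguments ($1, $2, ...) and executed in order.
public class CommandMacro
{
    public string Name = "set_time";               // macro name
    public int Args = 1;                           // number of required arguments
    public List<string> Script = new List<string>  // command strings with argument markers
    {
        "clock.set_mode(1)",
        "clock.set_time($1)"
    };

    // 'executeCommand' stands in for the command manager's execute method.
    public void Run(Action<string> executeCommand, params string[] arguments)
    {
        if (arguments.Length != Args)
            throw new ArgumentException($"Macro '{Name}' expects {Args} argument(s).");

        foreach (var template in Script)
        {
            string command = template;
            for (int i = 0; i < arguments.Length; i++)
                command = command.Replace("$" + (i + 1), arguments[i]);
            executeCommand(command);               // e.g. "clock.set_time(18:00)"
        }
    }
}
```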

4.3 Adding Content

Scene features primarily rely on Unity's native support. This means that scenes are designed inside the Unity editor and come coupled with the application. This workflow runs counter to the recommended practices from chapter 3, as specific scenes are integrally bound to the tools by being bundled with the application, and new additions require it (and the tools) to be recompiled. It does yield benefits in the form of optimisation, as resource loading and unloading and asset compression are delegated to Unity. This will in any case be scrutinised further in a later chapter.
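Assuming scenes are bundled with the build as described, loading one could be exposed as a feature roughly like this; the component and the "Cafe" scene name are hypothetical.

```csharp
using UnityEngine;
using UnityEngine.SceneManagement;

// Minimal sketch: exposing Unity's native scene loading as a feature that a
// command handler could invoke, e.g. through a hypothetical "load_scene(Cafe)" command.
public class SceneLoader : MonoBehaviour
{
    public void Load(string sceneName)
    {
        // Scenes must already be bundled with the build (added to Build Settings).
        SceneManager.LoadSceneAsync(sceneName, LoadSceneMode.Single);
    }
}
```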

4.4 Creating Agents

The agent architecture of the system is composed of features that build on Unity’s definition of humanoid characters.

4.4.1 Humanoids in Unity

One of the native features of Unity is the ability to import character models and have Unity generate a skeletal definition that is addressable at runtime. Any system that requires direct access to bones can query Unity by way of bone identifiers corresponding to Unity's standard skeletal definition for humanoids. This is useful, as systems are absolved of the need to store concrete references to bones and can instead resolve them dynamically using an identifier and a skeletal definition. For example, a single frame of a character animation can be defined as a set of pairs of bone identifiers and transformation matrices. An animation frame defined in this manner can be reused for different characters, assuming they adhere to a humanoid skeletal definition as Unity defines it.
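In Unity terms, this lookup corresponds to the Animator component and the HumanBodyBones identifiers; a minimal sketch follows (the component name is illustrative).

```csharp
using UnityEngine;

// Resolving a bone dynamically from Unity's standard humanoid skeletal definition,
// instead of storing a concrete reference to the bone transform.
public class BoneLookupExample : MonoBehaviour
{
    [SerializeField] private Animator animator; // character imported as a Humanoid rig

    void Start()
    {
        Transform head = animator.GetBoneTransform(HumanBodyBones.Head);
        Debug.Log($"Head bone of {name}: {head.name}");

        // A reusable animation frame can then be stored as identifier/transform pairs,
        // e.g. a Dictionary<HumanBodyBones, Matrix4x4>, independent of any one character.
    }
}
```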

4.4.2 Unity’s Animator Component

Unity defines its own animation system, and the main point of entry is the Animator component, which serves as an abstraction between concrete characters in scenes and reusable animations. If the character definition is that of a humanoid, this component contains its skeletal definition: the mapping between concrete bones and identifiers.

4.4.3 Skinned Meshes in Unity

Unity provides a way to render skinned meshes in the form of its Skinned Mesh Renderer component. Skinned meshes are usually deformed by way of bones, where each bone affects a set of vertices to a weighted degree. More direct means of deformation are also available: the Skinned Mesh Renderer interface provides access to Blend Shapes (also called Shape Keys) defined in the mesh, which enable simple linear transformations of vertices. For example, a blend shape named "Smile" could define transformation offsets for vertices around a character model's mouth, whose interpolations deform it into a smiling position. By weighting a blend shape, the vertices shift between their neutral positions and the transformation offsets. The Skinned Mesh Renderer is integral to controlling a character's body as well as its expression. Body animation traditionally relies on bone transformations, while facial expressions are commonly achieved using either blend shapes, bones (facial muscles), or both.
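Driving the "Smile" blend shape mentioned above goes through the SkinnedMeshRenderer API; a minimal sketch, with illustrative component and field names.

```csharp
using UnityEngine;

// Weighting the "Smile" blend shape described above: 0 keeps the neutral mesh,
// 100 applies the full vertex offsets authored for the shape.
public class SmileExample : MonoBehaviour
{
    [SerializeField] private SkinnedMeshRenderer face;
    [Range(0f, 100f)] public float smileWeight = 50f;

    void Update()
    {
        int index = face.sharedMesh.GetBlendShapeIndex("Smile");
        if (index >= 0)
            face.SetBlendShapeWeight(index, smileWeight);
    }
}
```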

4.4.4 Human Animator

This new component (not to be confused with Unity's own Animator component) provides access to a character's bones and an interface to control their transformation. The HA (Human Animator) is the main entry point to character animation. A Bone Frame is a data structure containing a set of transformation matrices applicable to a humanoid skeleton (as Unity defines it). Transformations can be applied on a per-bone basis, and additional settings tweak the behaviour of these operations, namely how they are combined with the current transformation state of the skeleton: either as an offset value (added) or an absolute value (overridden). The second option is mainly useful for initialising a default transformation for a character, such as a neutral starting pose. See applications of bone frames to the HA in figure 4.8. Part of the HA's interface is a set of methods specifically geared to work with instances of bone frames and apply their values to the character's bones. A bone frame can in a sense be thought of as a single, or partial, animation frame, since it specifies a full skeletal transformation at a single interval. Because bone frames only apply transformations to those bones set to receive them, they can be used to mask out certain portions of the body that are to be managed separately (e.g. head movements), either by adding a different bone frame instance for those portions, or by allowing them to be controlled by different animation components.

Figure 4.8: Bone Frames applied to the animator, scaled by weight values.

When bone frames are added to the HA they are not immediately applied. Rather, they are queued up, and a separate method applies them in the order in which they were supplied. This allows frames to be added and applied in a user-defined order.
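The thesis does not reproduce the Bone Frame type itself; the following is a minimal sketch, with assumed names and with rotations standing in for the full transformation matrices, of what such a structure and the HA's queue-and-apply behaviour might look like:

    using System.Collections.Generic;
    using UnityEngine;

    // Illustrative bone frame: per-bone transforms plus a flag deciding
    // whether they are added to, or override, the current pose.
    public class BoneFrame
    {
        public bool Additive = true;   // offset (added) vs. absolute (overridden)
        public float Weight = 1f;      // scales the contribution of this frame
        public Dictionary<HumanBodyBones, Quaternion> Rotations =
            new Dictionary<HumanBodyBones, Quaternion>();
    }

    // Hypothetical stand-in for the Human Animator's queue-and-apply behaviour.
    public class HumanAnimatorSketch : MonoBehaviour
    {
        private readonly Queue<BoneFrame> pending = new Queue<BoneFrame>();
        private Animator animator;

        void Awake() { animator = GetComponent<Animator>(); }

        public void AddFrame(BoneFrame frame) { pending.Enqueue(frame); }

        // Applies queued frames in the order they were supplied; typically
        // called from LateUpdate so it runs after Unity's own animation pass.
        public void ApplyQueuedFrames()
        {
            while (pending.Count > 0)
            {
                var frame = pending.Dequeue();
                foreach (var pair in frame.Rotations)
                {
                    Transform bone = animator.GetBoneTransform(pair.Key);
                    if (bone == null) continue;

                    Quaternion target = frame.Additive
                        ? bone.localRotation * pair.Value   // offset on top of current pose
                        : pair.Value;                       // absolute (local) value
                    bone.localRotation =
                        Quaternion.Slerp(bone.localRotation, target, frame.Weight);
                }
            }
        }
    }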

4.4.5 Unity's Timeline and Sequence Editing

Fairly recently, Unity introduced a sequencing tool called Timeline [28]. Timeline is fundamentally similar to video editing tools with respect to workflow (see the Timeline interface in figure 4.9). It can be used to assemble pre-scripted gameplay events or one-shot animation sequences.

Figure 4.9: Timeline’s editor interface, showing two built-in track types: Animation and Audio, playing an animation clip and an audio clip respectively.

When using Timeline, two default components are provided: a Timeline asset and a Playable Director. The first is a blueprint object that describes how a sequence plays out in terms of roles of a certain type. The director component then combines the blueprint object with concrete instances of actors that fill the roles in the sequence.

The Timeline asset stores a set of timed clips. During playback, the Timeline evaluates the current time frame, then computes and sends weight values to clips based on their start time and duration; if the current time falls outside the clip duration the weight is 0, and otherwise it lies in the 0-1 range. Clips can also be blended together such that overlapping clips fade into each other in terms of weight value. Custom clip types can be defined which control different functions, and these serve as the main point of entry for extending Timeline with new features. Unity already provides a few such clip types for interacting with built-in components.

A Timeline asset is meant to be reusable, and as such it does not contain any references to concrete scene objects; a special resolver, such as the Playable Director, supplies those. Furthermore, clips are grouped into sub-components called Tracks. Tracks run in parallel but are evaluated from top to bottom, and each is bound to a control component. An Animator Track, for example, is provided by Unity and binds an instance of an Animator to animation data played on the track. This also means that a track cannot contain more than one type of clip, nor multiple binding types (animation clips for Animator tracks, etc.).

The Playable Director, along with binding objects to events in the timeline, also provides playback controls. Play and pause methods are defined here, along with a method that evaluates the play state at a given time, which allows the timeline to be scrubbed through manually.

On a simplified, conceptual level, the Timeline system is not unlike a film production in its workflow. A screenplay is written containing a set of roles and an order of events; it is handed to a director, along with actors ascribed to the roles, who then directs the order of events according to the blueprint (the screenplay).
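These playback controls map directly onto Unity's PlayableDirector API; a minimal example (not toolkit code) of playing, pausing, and manually scrubbing a sequence:

    using UnityEngine;
    using UnityEngine.Playables;

    public class SequencePlaybackExample : MonoBehaviour
    {
        public PlayableDirector director;   // references the Timeline asset and its bindings

        public void PlaySequence()  { director.Play(); }
        public void PauseSequence() { director.Pause(); }

        // Scrub to an arbitrary time (in seconds) and force the timeline
        // to evaluate its clips at that point, without normal playback.
        public void Scrub(double time)
        {
            director.time = time;
            director.Evaluate();
        }
    }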

4.4.6 Extending Unity's Timeline

This section discusses a set of extensions to Unity's Timeline which, when combined, provide an animation interface to the previously discussed Unity and custom components. These extensions are implemented as custom Tracks, with the Human Animator used as the binding type for most of them. The available extension types are listed in table 4.1, with an example in figure 4.10. Note: the actual components, as seen in figures, are prefixed with "Human" (e.g., "Human Pose Track") to make their membership in this set of components more obvious, but the prefix is removed in written descriptions for simplicity's sake.

Figure 4.10: Animations controlled using timeline extensions

The following track types are defined.

Pose         Sets the transformation of every bone to an absolute value
Bone         Adds to or overrides transformation matrices of one or more bones
Gaze         Controls the transform orientation of the eyes
Expression   Controls facial expressions defined as Blend Shapes

Table 4.1: Custom timeline tracks used to animate characters

All of these are bound to the Human Animator component and work with Bone Frames in one way or another, with the exception of the Expression track, which is bound to an instance of a Skinned Mesh Renderer in order to manage the weights of its Blend Shapes.
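As an illustration of how such a track can be declared against Unity's Timeline API, the sketch below defines a simplified expression-style track bound to a Skinned Mesh Renderer. The class names and the single shape/weight field are assumptions, not the toolkit's actual types:

    using UnityEngine;
    using UnityEngine.Playables;
    using UnityEngine.Timeline;

    // A custom track accepts one clip type and one binding type.
    [TrackClipType(typeof(ExpressionClip))]
    [TrackBindingType(typeof(SkinnedMeshRenderer))]
    public class ExpressionTrack : TrackAsset { }

    // The clip asset carries the authored data (here: a single shape/weight pair).
    public class ExpressionClip : PlayableAsset
    {
        public string blendShapeName = "Smile";
        [Range(0f, 100f)] public float targetWeight = 100f;

        public override Playable CreatePlayable(PlayableGraph graph, GameObject owner)
        {
            var playable = ScriptPlayable<ExpressionBehaviour>.Create(graph);
            var behaviour = playable.GetBehaviour();
            behaviour.blendShapeName = blendShapeName;
            behaviour.targetWeight = targetWeight;
            return playable;
        }
    }

    // The behaviour applies the data to the bound renderer during playback.
    public class ExpressionBehaviour : PlayableBehaviour
    {
        public string blendShapeName;
        public float targetWeight;

        public override void ProcessFrame(Playable playable, FrameData info, object playerData)
        {
            var renderer = playerData as SkinnedMeshRenderer;   // the track binding
            if (renderer == null) return;

            int index = renderer.sharedMesh.GetBlendShapeIndex(blendShapeName);
            if (index >= 0)
                renderer.SetBlendShapeWeight(index, targetWeight * info.weight);
        }
    }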

4.4.6.1 Pose Track

A pose track (see figure 4.11) is used primarily to set a character to a base pose before further transform modifiers are applied. As such, only one pose track instance should appear in a timeline, and it should come before any other track type that uses the Human Animator. This prevents pose tracks from overriding one another. For use with the pose track, an additional construct is defined: a Humanoid Pose, a snapshot of a skeletal definition with a specific transformation for each bone. A pose clip placed on the track is given a reference to an instance of such data. There is no restriction on the number of clips in a pose track; they can be placed in sequence on the track, and even overlapped to blend poses together.

Figure 4.11: A Pose Track along with a supplied instance of a Humanoid Pose

Because the Humanoid Pose data merely consist of a set of transform matrices, they can be used to store animation data imported from other sources. For example, a different animation system can be set to play out an animation, the animation then paused, and the skeletal transformation copied at that animation frame. Some animation data even consists of just one frame to begin with, containing merely a pose that perfectly aligns with some form of geometry. Such clips can be imported without loss or a need to pause to find exactly the right frame.

4.4.6.2 Bone Track

Bone tracks (see figure 4.12) can be used to transform individual bones. Transformations are applied through bone clips, which take a set of bones and transform operations and combine these into a bone frame. This frame is then sent to the Human Animator instance bound to the track.

Figure 4.12: Bone clips, and transform operations

The extent of transformation of a single bone clip can be scaled with a weight value, and each transform operation declared in the clip can be further scaled with an animation curve (the output being a function of the linear weight). It is also possible to set a single animation curve that is uniformly applied to all operations, overriding individual ones. Transform operations can be set either as an offset (additive) or as an absolute value (override). An absolute value in this case does not mean absolute in terms of global space, but rather local, i.e. relative to the parent bone. Additive mode is the main envisioned use, but the latter can be useful to guarantee that bones are set to a specific transformation before further operations are applied to them.

4.4.6.3 Gaze Track

Using a gaze track (see figure 4.13), characters can be made to direct their eyes to a given target in space. To assign targets to the gaze track, clips are added and given references to proxy targets. Targets are thus not absolute coordinates, but references to proxy objects whose exact positions are resolved when the clip is evaluated during playback. This allows targets to be moved without the need to update coordinate data in the clip. Overlapping clips allows one target to be faded into another.

Figure 4.13: Transitioning between gaze targets, along with clip settings, showing a reference to a target, and relative offset

If no clip is active during playback of a gaze track, the look direction defaults to forward. One of the benefits of using proxy objects for targets is the potential to cycle between different objects for the same target proxy. If a clip is looped, this can give the impression of a character looking around at different positions in the environment. For example, a character might be standing in a gallery with its target set to "painting", whose actual coordinate cycles between the coordinates of the different paintings in the room.
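A minimal sketch of the proxy-target idea, with illustrative names rather than the toolkit's: the clip stores a reference to a proxy Transform instead of coordinates, and the eye orientation is resolved from the proxy's current position each time it is evaluated:

    using UnityEngine;

    public class GazeProxyExample : MonoBehaviour
    {
        public Transform leftEye;
        public Transform rightEye;
        public Transform target;      // proxy object; can be moved or re-targeted freely
        public Vector3 offset;        // relative offset applied to the resolved position
        [Range(0f, 1f)] public float weight = 1f;   // clip weight; 0 leaves the default look

        void LateUpdate()
        {
            if (target == null) return;

            // Resolve the proxy's position at evaluation time, not at authoring time.
            Vector3 resolved = target.position + offset;
            AimEye(leftEye, resolved);
            AimEye(rightEye, resolved);
        }

        void AimEye(Transform eye, Vector3 point)
        {
            if (eye == null) return;
            Quaternion look = Quaternion.LookRotation(point - eye.position, Vector3.up);
            // Blend from the eye's current orientation towards the target by the clip weight.
            eye.rotation = Quaternion.Slerp(eye.rotation, look, weight);
        }
    }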

4.4.6.4 Expression Track

An expression track (see figure 4.14) diverges from the other track types in that it doesn't work with bones. Instead, it manages the weight values of blend shapes in the Skinned Mesh Renderer instance bound to it. Multiple tracks can co-exist, though it is possible for them to conflict if they're set to control the same blend shapes. Expression clips define a set of desired weight values for different blend shapes. During playback, these weight values, after being scaled by the input weight of the clip, are applied to the renderer. Overlapping clips allows expressions to be blended together and faded between. For example, one clip could define a smiling expression with eyebrows turned fully upwards, and another a frowning expression with inwards-turned eyebrows, both supplying different weight values for mouth and eyebrow deformations. When partially overlapped from left to right, the former expression will smoothly transition into the latter, giving the impression of a happy disposition gradually turning angry.

Figure 4.14: Expressions being mixed together, along with accompanying settings for a single clip

The input weight supplied to the clip by playback is not used directly to scale blend shape weights; instead, it is supplied to an animation curve, the output of which is then used to scale the shape weights. Additionally, a clip provides an iterator count value, which allows its deformations to be looped internally. Combined with the uniform animation curve, this feature is useful for adding subtle repetitive motions that give an impression of liveliness, such as chest movements to simulate breathing, eyebrow ticks, or lip movements. Note that during playback, any blend shape not assigned a weight value by an active clip defaults to 0. To avoid abrupt changes in expression, empty clips can be added to any gaps in the track; by overlapping them slightly with clips that have clearly defined expressions, smooth transitions are created rather than abrupt changes from neutral to defined, or vice versa.
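A minimal sketch of the weight computation described above, in which the clip's input weight is remapped by an animation curve and optionally looped by the iterator count before scaling the authored shape weights; the names and the exact looping rule are assumptions, not the toolkit's code:

    using UnityEngine;

    public static class ExpressionWeighting
    {
        // inputWeight: the 0-1 weight supplied to the clip by timeline playback.
        // curve:       remaps that weight (e.g. an ease or a pulse shape).
        // iterations:  loops the curve within the clip, useful for breathing-like motion.
        public static float ScaledShapeWeight(float authoredWeight, float inputWeight,
                                              AnimationCurve curve, int iterations)
        {
            // Loop the normalised weight 'iterations' times before sampling the curve.
            float t = Mathf.Repeat(inputWeight * iterations, 1f);
            return authoredWeight * curve.Evaluate(t);
        }
    }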

4.4.7 Behaviour Annotator

The application comes with an annotation tool meant to help researchers validate their study by providing means to carefully review and comment on behaviours they create, and ask their peers for feedback. It supports the need of the researcher to make certain that experimental behavioural stimuli are correct. The annotator can be set to scrub through a specific sequence, and its interface displays simple playback controls that allow the time interval to be set. It also has comment editing controls for adding comments to specific points in time, or editing existing ones. The interface is visible in figure 4.15.

Figure 4.15: Annotator interface, with playback and comment controls. Left: playback controls, populated with comments. Right: comment viewer, seen when a comment is selected. The camera is also set to the orientation attached to the comment.

If a camera is attached to the annotator, it will additionally save the camera's orientation with each comment added, such that viewing a comment brings the camera back to the same orientation. This allows commenters to highlight the specific viewpoint that a comment pertains to. At any point, the annotator can export the current set of comments to text form for the commenter to share with the original researcher. The data can subsequently be re-imported and made viewable in the interface.
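The data a comment needs is small: a time stamp, the comment text, and optionally the camera orientation. A minimal sketch of such a record and its text export using Unity's JsonUtility (the tool's actual export format is not specified here):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using UnityEngine;

    [Serializable]
    public class AnnotationComment
    {
        public double time;              // position in the reviewed sequence, in seconds
        public string text;              // the reviewer's comment
        public Vector3 cameraPosition;   // saved so the viewpoint can be restored
        public Quaternion cameraRotation;
    }

    [Serializable]
    public class AnnotationDocument
    {
        public List<AnnotationComment> comments = new List<AnnotationComment>();

        // Export to a text file that can be shared with the original researcher...
        public void Export(string path) =>
            File.WriteAllText(path, JsonUtility.ToJson(this, true));

        // ...and re-imported on the designer's end.
        public static AnnotationDocument Import(string path) =>
            JsonUtility.FromJson<AnnotationDocument>(File.ReadAllText(path));
    }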

4.5 VR Features


4.5.1 VR Interface

The one type of VR interface currently implemented is a holographic display used to present information to the user and to collect their input. A holographic display, seen in figure 4.16, combines both flat and three-dimensional interface elements. Interactable elements such as buttons are three-dimensional, while static elements such as text and background overlays are either partially or entirely flat. The currently supported types of elements are the following:

• Range Input

• Multiple Option Input

• Text

• Overlay

A questionnaire display is a specific arrangement of the element types provided, consisting of a series of slides, each of which is either a piece of information or a question.

Figure 4.16: A questionnaire slide containing interactable elements

Interactables are designed to behave similarly to traditional 2D interface elements, with the VR controller acting as a cursor. This is to minimise the need to explain to users how to operate the interface, assuming they are familiar with regular interfaces. The following slide templates are supported:

• Information: Text with navigation buttons

• Information: Text only

• Question: Slider (draggable handle)

• Question: Options (buttons)

Whenever the user submits an input, such as by pressing a confirmation button below a range slider, the value of the submit event (the slider value in this case) can be piped to a data container, mapped to a key. The contents of this container can be exported to a data file, such as at the end of a procedure, with the container storing every value submitted by the user.
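A minimal sketch of such a data container, with assumed names and a simple key;value output format: submit events record values under a key, and the accumulated contents are written out at the end of the procedure:

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    public class ResponseContainer
    {
        private readonly Dictionary<string, string> values = new Dictionary<string, string>();

        // Called by a submit event, e.g. Record("q3_rapport", sliderValue.ToString()).
        public void Record(string key, string value) => values[key] = value;

        // Writes every recorded value to a simple key;value data file.
        public void Export(string path) =>
            File.WriteAllLines(path, values.Select(p => p.Key + ";" + p.Value));
    }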

4.6 Use Case

This section describes the implementation details of the typical use case proposed in chapter 3, which puts the framework to the test. The procedure involves a scenario in virtual reality, within which a human subject fills the role of an observer of virtual agents, and provides within-VR feedback.

Figure 4.17: Example study execution flow. Note that the initial "continue" prompt has no explicit alternative inside VR.

4.6.1 The Scenario

The scenario consists of a virtual Cafe setting (see figure 4.19), two characters sitting at a table opposite each other, and the subject positioned at a table facing them. The characters play out a behaviour sequence, after which the subject takes a short survey to rate their experience.

4.6.2 Content Creation

The scenario uses custom content created by the author of this thesis specifically for use in the study at hand. This was partly done to have assets custom-fitted to the study, but mostly to develop a better understanding of the content creation workflow involved and how the software should accommodate it.

4.6.3 Visual Aesthetics

The environmental assets are heavily stylised and made to appear cartoon-like. This form of stylisation, particularly with regard to the characters, was preferred on the assumption that viewers are less sensitive to violations of realism when an exaggerated, cartoon-like styling is used [29]. The overall cartoon effect for most objects (visible in figure 4.18) is achieved with a combined application of texture and shading. The ink-like outlines visible on an object are embedded in its texture, which is made with a technique involving both the modeling software the object was created with and an image editor from which the texture is ultimately exported. A specialised shader is then used to apply toon lighting effects when the object is rendered. The same shader is also capable of achieving a realtime outline effect similar to that of the texture, but the quality varies based on the angle between surface normals of the object. The realtime outline is therefore reserved for objects with minimal hard angles in their profile, such as characters.

Figure 4.18: Outline texture applied to object with toon shading

All of the assets used in the scenario are custom-made, either for prototyping purposes during development of the application or specifically for this use case.

Figure 4.19: Cafe environment. Assets use a combination of outline textures and realtime shading to produce the toon like effects visible.

4.6.4 Characters

Two characters are included in the scenario, named Al and Bob, shown in figure 4.20. The characters can be in either of two states, idling or conversing. When idling, the characters emit no dialogue and idle in their seats on a loop. When conversing, the characters play out a conversation with audible dialogue, and then return to idling.

Figure 4.20: Cafe characters, Al and Bob

Bob has two possible variations to his behaviour, shown in figure 4.21. In a variation dubbed "attentive", Bob's posture is aimed towards Al, and while conversing he frequently gives Al his attention. In the variation dubbed "aloof", Bob's posture is directed away from the table, and while conversing he frequently shifts his gaze away, signaling disinterest.

Figure 4.21: Variations of Bob's behaviour, attentive (left) and aloof (right). Attentive Bob faces Al, while aloof Bob faces partially towards the exit.

Aside from Bob’s variation, the conversation plays out exactly the same.

4.6.5 Character Aesthetics

The characters adhere to the same type of toon aesthetics described previously. Their model had several prototype stages, which eventually resulted in a version slightly closer to realistic proportions and expressiveness (see figure 4.22).

Figure 4.22: Stages of character prototype, with the model currently used on the far right

4.6.6 Character Dialogue

For their voiced dialogue, the characters use pre-recorded audio clips, generated with the text-to-speech tool Polly [30] from a written script.

4.6.7 Behaviour Sequences

All the behaviours used were created with the animation tools described in the previous section (see the sections on Timeline and the Timeline extensions). Each sequence contains the behaviour of one character. These are:

• Al Idling

• Al Conversing

• Bob Idling (attentive)

• Bob Conversing (attentive)

• Bob Idling (aloof)

• Bob Conversing (aloof)

These are then combined into four meta-sequences that comprise both characters:

• Idling (Al idling | Bob idling, attentive)

• Conversing (Al Conversing | Bob conversing, attentive)

• Idling (Al idling | Bob idling, aloof)

• Conversing (Al Conversing | Bob conversing, aloof)

Furthermore, these meta-sequences are divided into two groups, one for each version of the scenario (attentive, aloof).

4.6.8 User Interaction and Interface

During the session, the subject is able to initiate the procedure, i.e., the conversation between the characters, and is able to repeat this process as often as they like. Afterwards, they give feedback inside VR pertaining to their experience and their evaluation of the characters. The interface shown to the subject for giving feedback and answering questions was created using the VR interface capabilities provided (see the previous section on VR interfaces).

This holographic display contains all information and questions presented to the subject during the procedure. The way buttons respond to input attempts to replicate the behaviour of buttons in traditional interfaces, with the VR controller functioning as a mouse cursor. This leverages existing expectations and minimises the need to explain how the interface is used. The elements are spread out facing the user at different angles to give a greater impression of depth and physical presence.

4.6.9 Configuration

To set execution parameters, a configuration file is read when the application starts up. The file is read line by line, and each line is sent directly to the command backend for execution. The path to the file is fixed, relative to the project root directory.
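A minimal sketch of this start-up step, assuming a hypothetical CommandBackend entry point (stubbed below); the actual configuration file used for the study is shown in Listing 4.2:

    using System.IO;
    using UnityEngine;

    // Stand-in for the toolkit's command execution entry point.
    public static class CommandBackend
    {
        public static void Execute(string command) => Debug.Log("exec: " + command);
    }

    public class StartupConfigLoader : MonoBehaviour
    {
        // Fixed path, relative to the project root.
        private const string ConfigPath = "cafe.init";

        void Start()
        {
            if (!File.Exists(ConfigPath)) return;

            // Each line of the file is passed to the command backend as-is,
            // exactly as if it had been typed into the command line interface.
            foreach (var line in File.ReadAllLines(ConfigPath))
            {
                var command = line.Trim();
                if (command.Length == 0) continue;
                CommandBackend.Execute(command);
            }
        }
    }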

screen.set_fade(1)
_load_macros(Init/macros.json)
_load_macros(Init/debug_macros.json)
_load_macros(Init/shortcut_macros.json)
_load_macros(Init/vr_macros.json)
#set_al_name AL
#set_bob_name BOB
_delay(1|#fade_in 3)
#set_time 14:40
vr_camera.set_enabled(1)
chat.set_delay(3)
chat.set_version(1)
survey.show()

Listing 4.2: cafe.init: The configuration file used to initialise the application and the procedure. The characters' names are set, the screen is set to fade, the clock on the wall has its time set, the VR (subject) camera is enabled, a version of the conversation is set (1=aloof), and the user questionnaire interface shown.

The configuration file essentially provides a way to supply command line arguments to the application.

Chapter 5

Results

This chapter reflects on the implementation discussed in chapter 4 and contrasts it with the ideals outlined in chapter 3 (the approach), as well as assessing how well the intended outcome was achieved with the use case.

5.1 Principles and Architecture Flexibility

Components in the application as implemented are self-contained and lend themselves well to being swapped out. However, because the current implementation is built with Unity, and opts to make use of many of Unity's native capabilities such as scene loading and in-editor dependency injection, it is largely inflexible at runtime.

To add a new environment, its assets must first be imported into the application's Unity project, and then configured to make its controllable aspects accessible at runtime. For example, the Cafe scene used in the example study (see the use case in chapter 4) had to be imported into Unity. Then all parameterised objects, such as the characters and the VR interface, had to be inserted into the scene and command handlers connected to them for runtime use. Generally speaking, the same issue exists for anything that relies on the Unity editor interface for configuration, such as the character animation tools and command handlers.

Modifying runtime behaviour is limited, and relies on the scripting syntax used to execute commands. This syntax is limited and cannot currently be used to achieve anything more intelligent than a sequential execution of commands.

5.2 Content Support

This was briefly touched upon in the previous section. To add new content, a researcher must first import assets through the Unity editor interface and manually set up the connections which make the content accessible for runtime control. Character models need to be configured by Unity to derive the skeletal definitions required for use by the animation tools. Environments may require even more extensive preparation: light data cannot be generated at runtime, and producing it requires Unity's lighting features in the editor. Some realtime effects can compensate if pre-generated lighting is not used, but they draw extra performance and may not achieve a pleasing result, especially in VR.

5.3 Usability of Agent Tools

Designing agent behaviour with the current suite of tools is very rigid and requires meticulous adjustments when designing a new recipe. Bones need to be moved by hand to each desired position, much of which is incremental and tedious. Synchronisation with various events needs to be managed manually, so that, for example, character dialogue syncs up with mouth movements. The synchronisation issues become even more glaring, and their solutions more tedious, when setting up a behaviour that directly responds to another character. Since no specialised overview is provided to aid synchronisation for multiple characters, behaviours involving multiple characters must either be merged into a single recipe, or split apart and each carefully synchronised in turn.

Being able to specify gaze directions by targets ended up highly useful, albeit limited. A target along with an offset could be specified, but no variation capabilities were provided. It was not possible, for example, to specify variations to the gaze direction as a function of time, such that a character could be made to look along a trajectory around its target. The closest such functionality is to have gaze targets overlap and fade into one another.

There was no built-in way to specify head look targets, nor one for posture orientation. Head look in particular ended up requiring many different variations for the behaviours implemented for the use case, all of which had to be manually arranged by tweaking the neck and head bones. Posture was similarly tedious; though less frequently required, it requires a larger number of bones to be transformed to achieve a natural result (spine bones, neck, legs, etc.).

5.4 Use Case Reviewed

Chapter 4 presented an implementation of a use case proposed in chapter 3. A subject pilot was executed using this implementation to get a sense of how well the tools supported such a study. The following sections discuss the setup and execution of said study, and assess how the implementation performed with regard to designing an experiment.

5.4.1 Preparation

The HTC Vive Pro VR headset was used for the procedure, as well as a single handheld Vive controller, used for user interaction. Subjects were positioned in the virtual environment in a seated perspective. In the real environment they were centered in the middle of a 3x3 space and seated (see figure 5.1).

Figure 5.1: Subject position inside VR and outside VR. In the real environment, the subject sat in the middle of a 3x3 space - though spacious, the subject was not intended to move from the center.

The process of creating character behaviour for the scenario quickly became tedious, as it involved a lot of manual refinement of animation. While the tools didn't constrain precision (bones could be freely manipulated), there were few shorthands available to speed up the process. Setting up a hand gesture meant adjusting each bone manually to compose it, while also manually making sure the gestures made contextual sense (weaker hand gestures when listening, for example).

While adding final polish to the characters' behaviours, the annotator tool was used for peer feedback. Gathering comments was tiresome because it involved building a separate application containing just the behaviours, having the commenter send back an annotated document, loading the document into the annotator on the designer end, and then, after viewing the comments, going back to the behaviour interface and tweaking the behaviours based on the feedback. It was therefore considerably difficult to regularly integrate feedback in this manner. The annotator did turn out to be useful when used sparingly, especially for broader issues such as pointing out a lack or excess of gestures at certain points.

The in-VR interface used to gather subject feedback was not initialised procedurally at runtime, but instead wired manually in the Unity editor. Composing new information or questions did not require any new scripts or features, and used very general components linked together by way of events. A press of a specific button, for example, would trigger the current text being hidden and the next text being shown. Not having the interface procedurally initialised (researcher-scripted) was, as expected, inconvenient. Changing a question text, for example, wasn't supported at runtime. Although it technically could have been, by registering configurable UI elements to the command backend, adding one for every such element manually would quickly have become difficult to manage.

5.4.2 Running the Study

There were 7 subjects in total who participated in the procedure. None of the subjects directly reported symptoms of motion sickness, though this was entirely expected given the short length of the procedure (<5 minutes). Subjects filled out a standard simulator sickness questionnaire [31] before (table 5.1) and after entering VR (table 5.2), with similar results both times, indicating little discomfort. The slight differences could also mean that subjects' interpretation of the question texts altered slightly after having entered VR, leading to a more honest answer the second time around. This seems plausible, as some subjects weren't sure what some questions referred to, and this could have become clearer to them during the procedure.

Tables 5.1 and 5.2 list the sickness scores of both the before and after questionnaires for each subject. The sickness questionnaire that was used contains 16 questions in total, each of which is ranked on the discrete range [0,3]. These are divided into three representative subscores: nausea-related (N), oculomotor-related (O), and disorientation-related (D). Each question is weighted in one or more of these categories.

#   N (%)   O (%)   D (%)   Total (%)
1   0       0       0       0
2   0       0       0       0
3   0       0       0       0
4   0       5       0       12
5   24      29      14      20
6   24      29      14      20
7   0       0       0       0

Table 5.1: Sickness before VR. (N=Nausea, O=Oculomotor, D=Disorientation)

#   N (%)   O (%)   D (%)   Total (%)
1   0       0       0       0
2   0       10      5       4
3   0       0       0       0
4   0       5       0       1
5   14      19      5       11
6   10      19      0       8
7   0       0       0       0

Table 5.2: Sickness after VR. Note that decrease in sickness scores may indicate that subjects interpreted questions differently after entering VR.

The framerate throughout the procedure was not logged, but it appeared to sit consistently around 90-100 frames per second. There were no noticeable dips in framerate (sudden jumps of more than 10 frames), which tend to correlate with discomfort; accordingly, discomfort was neither reported by users nor significantly indicated by the questionnaires.

The subjects used a VR controller to interact with the interface. None of the subjects experienced major difficulties figuring out how to interact with the interface, nor with using the controller. Beyond explaining which button on the controller confirmed a selection, and how selection worked in the interface, subjects required no explanations. The first interaction makes sure the subject is ready to continue, and once a subject had figured out how to confirm that selection, they did not require further explanations during the remainder of the procedure.

At the beginning of the procedure, the parameters used in the scenario were configured through a web site, which made use of the network features that allow commands to be sent to the application through its network interface. Having this feature was extremely useful, as it allowed the procedure to be configured without a significant performance overhead. Although it was only used at the beginning to reset conditions, it could be kept open throughout without issue.

5.4.3 Scenario-Specific Commentary

The characters in the scenario played out a conversation with voiced dialogue (pre-generated text-to-speech audio). The topic of the conversation was chosen for its uninteresting qualities, such that the subject would be less inclined to form an opinion on it. This would allow the subject to rate the characters more on the basis of their expressed behaviour, rather than on the characters' viewpoints during the conversation.

Comments from subjects, however, seem to indicate that the topic was too dull or irrelevant, such that character A, who opened the conversation with the topic, was viewed unfavourably or as too random. This wasn't expected, since character A stayed the same between groups while his counterpart, character B, swapped behaviours. It was presumed that one of these behaviours would get character B rated less favourably than character A, but the results were instead rather inconclusive.

Chapter 6

Future Work

The biggest limitation to the implementation described in chapter 4 is its over-reliance on a programmer-centric approach for researchers, and this became especially evident during the process of setting up the example study. It flouts the principles from chapter 3 with regard to separating the programmer and researcher environments. But it did enable quicker prototyping, and perhaps gave a better sense of the sort of workflow that researchers might require.

6.1 Content Pipeline

There was no way to import custom content at runtime in the implementation presented; new content had to be added through the Unity editor offline. This is not ideal with respect to the principles outlined in chapter 3, and should be addressed in a proper implementation by adding an abstraction layer between the application and its content, allowing new assets to be registered for runtime use. Generally speaking, per the principles of chapter 3, assets should be bundled together and loaded in at runtime.

Unity currently supports a dedicated asset format called AssetBundles. This system allows content to be archived separately from the main application and then loaded at runtime. It is designed primarily to lighten install sizes for game applications by separating application from content (ideal for downloadable content on mobile platforms), where usually the same developer is responsible for both, but it can just as well be used to support custom content from arbitrary sources.
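A minimal sketch of what runtime loading through AssetBundles looks like with Unity's standard API; the bundle and asset names are placeholders:

    using UnityEngine;

    public class BundleLoadingExample : MonoBehaviour
    {
        void Start()
        {
            // Load a bundle that was built separately from the main application.
            var bundle = AssetBundle.LoadFromFile(
                System.IO.Path.Combine(Application.streamingAssetsPath, "cafe_environment"));
            if (bundle == null)
            {
                Debug.LogError("Failed to load AssetBundle.");
                return;
            }

            // Instantiate a prefab contained in the bundle at runtime.
            var prefab = bundle.LoadAsset<GameObject>("CafeScenePrefab");
            if (prefab != null)
                Instantiate(prefab);
        }
    }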

6.2 Dedicated Scripting Language

The simple scripting syntax used to execute commands and macros that modify runtime behaviour should ideally be replaced by an established scripting language. The one currently used only supports simple function statements, which prevents more complex logic from being assembled, such as conditional or branching execution. Either the application adds more control constructs, which commits it to continuously updating its core whenever new constructs are needed, or an outside language is supported by integrating a dedicated script interpreter.

Lua or Python, for example, might be considered, as they can be integrated fairly easily and have been widely proven effective for embedded use. Lua is in fact referred to as a glue language due to it commonly being used to wire together application components at runtime, and it is extremely lightweight as the language consists of very minimal constructs. Python is interesting for the opposite qualities - an immense variety of built-in features.

6.3 Scripting Abstractions

The implementation contains a simple scripting language, and as suggested previously, it should be replaced by an established language. But the utility of such scripting features hasn't been properly argued for with regard to researchers. Ultimately, a scripting language should only be used as an intermediary language between the user and the software. Some sort of abstraction, such as a visual interface, should sit on top of the scripting layer. For example, a visual node editor interface might allow researchers to put together a procedure by chaining together content nodes, with the output of the editor being a file written in the scripting language supported by the software. Whatever form a scripting abstraction well suited to social researchers may take, such abstractions should be looked into as a next step towards making the software more accessible to non-programmers.

6.4 Speech Generation

The characters used in the example study had fully "voiced" dialogue, generated with a text-to-speech tool. The clips were entirely pre-generated, and it took some manual effort to synchronise them properly with movements. It might therefore prove useful to support some kind of text-to-speech synthesis natively. If a smarter animation system were also supported, one that included speech markers, synchronisation could perhaps be done entirely automatically.

It might also be interesting to consider "speech" in more abstract terms from the application's point of view, such that different types of speech generators could be swapped in and out. For example, in The Sims series of video games, character speech is uttered in a made-up language known as "Simlish". The utterances are incomprehensible, but they allow the characters' mood to be expressed through tone and inflection without words.

Mouth animations, like audio, had to be done entirely by hand, and were made to appear lively rather than realistic (random movements). A dedicated lip sync system could be included in the future for this reason, one that is able to generate phoneme markers to match the dialogue.

6.5 Reactive Animation System

The animation tools are able to specify very exactly how poses are blended together, but not dynamically. The resulting animation isn't reusable outside the context it was created for, and cannot, with the current implementation, be made to react to unpredictable elements such as the movements of the VR user. The only exception is gaze behaviour, which can be set to track dynamic targets (although this looks anomalous when paired with fixed head movements).

A more sophisticated animation system might be included, in which pre-scripted animations are combined with some manner of dynamic limb motion, such as by using inverse kinematics algorithms. These could be used to derive animations that make characters appear more adaptive to their context. It would be ideal if established agent systems like SmartBody or BEAT (see chapter 2) could be integrated into the core application to enrich the agent portion of the toolkit, or, even better, if the application supported their integration without wiring them specifically into the core - i.e., left the application extensible at runtime to the extent that varied system integration is possible.

Chapter 7

Conclusion

The work presented here can be viewed as having contributed steps towards implementing a reusable VR-based social study application. The reusable qualities of such an application owe to the fact that, in principle, its core includes a set of standard behaviours common to similar applications that have had a more specific focus. Its emphasis would be on integrating toolkits, and on extensions thereof, which allow users to develop new behaviours and scenarios and quickly bring them into a form ready for evaluation. What has been developed so far is a rough realisation of such an idea, but given steady improvements it has the potential to speed up the time it takes to evaluate new scenarios, by eliminating the need to build an application unique to each scenario every single time.

7.1 Contributions

7.1.1 Guidelines

The steps taken towards developing a toolkit application that might aid more general use by social researchers in the future are considered the primary contribution of this thesis.

7.1.2 Command Line Interface

Command line interfaces aren't rare, nor are they a new idea in themselves; they remain widely used in commercial games. Their potential seems less explored when it comes to applications developed specifically to be test beds for social simulation. The implementation discussed here is relatively self-contained and could be integrated into other projects with minimal adjustment.

7.1.3 Animation Tools

The Human Animator and its custom Timeline extensions depend on Unity's Timeline component. Assuming that Timeline remains a permanent fixture in Unity's set of tools, the HA will continue to be available for Timeline-supported versions of Unity. The custom timeline tracks could, with some minor adjustments, be made to play well with other systems, such that they could be used to add additional animation tweaks on top of another system's output.

The behaviour annotation tool was added specifically to work with behaviour made using the custom timeline features and the Human Animator component, but the implementation is only loosely coupled to those components and could easily be separated completely. If it cannot be reused in its current form, the idea at least retains its utility in principle: giving researchers a tool to evaluate their work.

7.1.4 Content Pipeline

Custom content was created for the study by the author of this thesis to gain a better understanding of the workflow involved. Content creation wasn't specifically outlined as a requirement to consider, but supporting a content pipeline was. It is advised, however, to consider both aspects if a content pipeline is to be made: it is important that no content pipeline be enforced unless done so with broad regard to the type of content considered typical for use.

Bibliography

[1] John Carmack and John Romero, Wolfenstein 3D, 1992. [Online]. Available: https://github.com/id-Software/wolf3d.
[2] John Carmack, John Romero, and Dave Taylor, Doom, 1993. [Online]. Available: https://www.mobygames.com/game/doom.
[3] ——, Doom II: Hell on Earth, 1994. [Online]. Available: https://www.mobygames.com/game/doom-ii_.
[4] Ben Gokey and Chris Rhinehart, Heretic, 1994. [Online]. Available: https://www.mobygames.com/game/heretic.
[5] Stephen Browning. (1998). Doom: Ghostbusters Mod, Ghostbusters mod for Doom, [Online]. Available: https://www.doomworld.com/gbd2/index2.htm (visited on 05/23/2019).
[6] Ace Team Software. (1998). Doom II: Batman Mod, Batman mod for Doom II, [Online]. Available: https://www.doomworld.com/batman/main.shtml (visited on 05/23/2019).
[7] Joshua Weier, Portal 2, 2011. [Online]. Available: http://www.thinkwithportals.com/.
[8] Guy Carver and Steve Meister, Fallout 3, 2008. [Online]. Available: https://fallout.bethesda.net/games/fallout-3.
[9] Rob Pardo, Jeff Kaplan, and Tom Chilton, World of Warcraft, 2004. [Online]. Available: https://worldofwarcraft.com/.
[10] Adam Williams. (2019). WoW: Deadly Boss Mods, Deadly Boss Mods addon page, [Online]. Available: https://www.curseforge.com/wow/addons/deadly-boss-mods (visited on 05/23/2019).
[11] M. Lewis and J. Jacobson, “Game engines in scientific research”, Communications of the ACM, vol. 45, no. 1, Jan. 1, 2002, issn: 00010782. doi: 10.1145/502269.502288. [Online]. Available: http://portal.acm.org/citation.cfm?doid=502269.502288 (visited on 05/21/2019).
[12] D. Thue, V. Bulitko, M. Spetch, and T. Romanuik, “A computational model of perceived agency in video games”, p. 6.
[13] S. Poulakos, M. Kapadia, G. M. Maiga, F. Zünd, M. Gross, and R. W. Sumner, “Evaluating accessible graphical interfaces for building story worlds”, in Interactive Storytelling, F. Nack and A. S. Gordon, Eds., vol. 10045, Cham: Springer International Publishing, 2016, pp. 184–196, isbn: 978-3-319-48278-1 978-3-319-48279-8. doi: 10.1007/978-3-319-48279-8_17. [Online]. Available: http://link.springer.com/10.1007/978-3-319-48279-8_17 (visited on 06/03/2019).

[14] L. M. Seversky and L. Yin, “Real-time automatic 3d scene generation from natural language voice and text descriptions”, in Proceedings of the 14th annual ACM international conference on Multimedia - MULTIMEDIA ’06, Santa Barbara, CA, USA: ACM Press, 2006, p. 61, isbn: 978-1-59593-447-5. doi: 10.1145/1180639.1180660. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1180639.1180660 (visited on 06/03/2019).
[15] S. Dupuy, A. Egges, V. Legendre, and P. Nugues, “Generating a 3d simulation of a car accident from a written description in natural language: The CarSim system”, p. 8, 2001.
[16] S. Kopp, B. Krenn, S. Marsella, A. N. Marshall, C. Pelachaud, H. Pirker, K. R. Thórisson, and H. H. Vilhjálmsson, “Towards a Common Framework for Multimodal Generation: The Behavior Markup Language”, in Intelligent Virtual Agents: 6th International Conference, Marina Del Rey, CA, USA, August 21-23, 2006. Proceedings, ser. IVA ’06, J. Gratch, R. M. Young, R. Aylett, D. Ballin, and P. Olivier, Eds., Berlin, Heidelberg: Springer, 2006, pp. 205–217, isbn: 978-3-540-37594-4. doi: 10.1007/11821830_17. [Online]. Available: https://doi.org/10.1007/11821830_17.
[17] H. H. Vilhjálmsson, N. Cantelmo, J. Cassell, N. E. Chafai, M. Kipp, S. Kopp, M. Mancini, S. Marsella, A. N. Marshall, C. Pelachaud, Z. Ruttkay, K. R. Thórisson, H. v. Welbergsen, and R. J. v. d. Werf, “The Behavior Markup Language: Recent Developments and Challenges”, in Intelligent Virtual Agents: 7th International Conference, IVA 2007 Paris, France, September 17-19, 2007 Proceedings, C. Pelachaud, J.-C. Martin, E. André, G. Chollet, K. Karpouzis, and D. Pelé, Eds., Berlin, Heidelberg: Springer, 2007, pp. 99–111, isbn: 978-3-540-74997-4. doi: 10.1007/978-3-540-74997-4_10. [Online]. Available: https://doi.org/10.1007/978-3-540-74997-4_10.
[18] H. H. Vilhjálmsson, “Representing communicative function and behavior in multimodal communication”, in Multimodal Signals: Cognitive and Algorithmic Issues, A. Esposito, A. Hussain, M. Marinaro, and R. Martone, Eds., vol. 5398, Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 47–59, isbn: 978-3-642-00524-4 978-3-642-00525-1. doi: 10.1007/978-3-642-00525-1_4. [Online]. Available: http://link.springer.com/10.1007/978-3-642-00525-1_4 (visited on 05/21/2019).
[19] D. Heylen, S. Kopp, S. C. Marsella, C. Pelachaud, and H. Vilhjálmsson, “The Next Step Towards a Functional Markup Language”, in Intelligent Virtual Agents, H. Prendinger, J. Lester, and M. Ishizuka, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 270–280, isbn: 978-3-540-85483-8. doi: 10.1007/978-3-540-85483-8_28. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-85483-8_28.
[20] J. Kolkmeier, M. Bruijnes, D. Reidsma, and D. Heylen, “An ASAP realizer-unity3d bridge for virtual and mixed reality applications”, in Intelligent Virtual Agents, J. Beskow, C. Peters, G. Castellano, C. O’Sullivan, I. Leite, and S. Kopp, Eds., vol. 10498, Cham: Springer International Publishing, 2017, pp. 227–230, isbn: 978-3-319-67400-1 978-3-319-67401-8. doi: 10.1007/978-3-319-67401-8_27. [Online]. Available: http://link.springer.com/10.1007/978-3-319-67401-8_27 (visited on 05/23/2019).

[21] D. Reidsma and H. van Welbergen, “AsapRealizer in practice – a modular and extensible architecture for a BML realizer”, Entertainment Computing, vol. 4, no. 3, pp. 157–169, Aug. 2013, issn: 18759521. doi: 10.1016/j.entcom.2013.05.001. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1875952113000050 (visited on 05/23/2019).
[22] M. Thiebaux and A. N. Marshall, “SmartBody: Behavior realization for embodied conversational agents”, p. 8.
[23] J. Cassell, H. H. Vilhjálmsson, and T. Bickmore, “BEAT: The Behavior Expression Animation Toolkit”, in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’01, New York, NY, USA: ACM, 2001, pp. 477–486, isbn: 1-58113-374-X. doi: 10.1145/383259.383315. [Online]. Available: http://doi.acm.org/10.1145/383259.383315.
[24] T. Bickmore, A. Gruber, and R. Picard, “Establishing the computer–patient working alliance in automated health behavior change interventions”, Patient Education and Counseling, vol. 59, no. 1, pp. 21–30, Oct. 2005, issn: 07383991. doi: 10.1016/j.pec.2004.09.008. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0738399104003076 (visited on 05/23/2019).
[25] P. J. Lindal, K. R. Johannsdottir, U. Kristjansson, N. Lensing, A. Stuhmeier, A. Wohlan, and H. H. Vilhjalmsson, “Comparison of teleportation and fixed track driving in VR”, in 2018 10th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), Wurzburg: IEEE, Sep. 2018, pp. 1–7, isbn: 978-1-5386-7123-8. doi: 10.1109/VS-Games.2018.8493414. [Online]. Available: https://ieeexplore.ieee.org/document/8493414/ (visited on 05/25/2019).
[26] M. P. Jacob Habgood, D. Moore, D. Wilson, and S. Alapont, “Rapid, continuous movement between nodes as an accessible virtual reality locomotion technique”, in 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), Tuebingen/Reutlingen, Germany: IEEE, Mar. 2018, pp. 371–378, isbn: 978-1-5386-3365-6. doi: 10.1109/VR.2018.8446130. [Online]. Available: https://ieeexplore.ieee.org/document/8446130/ (visited on 05/25/2019).
[27] Unity Technologies, Unity, 2019. [Online]. Available: https://unity.com/.
[28] ——, (Jul. 4, 2017). Timeline Manual, Unity3D Documentation, [Online]. Available: https://docs.unity3d.com/Manual/TimelineSection.html (visited on 01/23/2019).
[29] K. Zibrek and R. McDonnell, “Does Render Style Affect Perception of Personality in Virtual Humans?”, in Proceedings of the ACM Symposium on Applied Perception, ser. SAP ’14, New York, NY, USA: ACM, 2014, pp. 111–115, isbn: 978-1-4503-3009-1. doi: 10.1145/2628257.2628270. [Online]. Available: http://doi.acm.org/10.1145/2628257.2628270.
[30] Amazon. (Jan. 17, 2019). AWS Polly Manual, Polly Manual, [Online]. Available: https://docs.aws.amazon.com/polly/index.html#lang/en_us (visited on 01/30/2019).

[31] R. S. Kennedy, N. E. Lane, K. S. Berbaum, and M. G. Lilienthal, “Simulator Sickness Questionnaire: An Enhanced Method for Quantifying Simulator Sickness”, The International Journal of Aviation Psychology, vol. 3, no. 3, pp. 203–220, 1993. doi: 10.1207/s15327108ijap0303_3. [Online]. Available: https://doi.org/10.1207/s15327108ijap0303_3.