
Automatic 2.5D Cartoon Modelling

Fengqi An School of Computer Science and Engineering University of New South Wales

A dissertation submitted for the degree of Master of Science, 2012

THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet

Surname or Family name: AN

First name: Fengqi  Other name/s: Zane

Abbreviation for degree as given in the University calendar: MSc

School: Computer Science & Engineering Faculty: Engineering

Title: Automatic 2.5D Cartoon Modelling


Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature                Date: 24/09/2012

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances.


ORIGINALITY STATEMENT

'I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.'

Signed

Date: 24/09/2012

COPYRIGHT STATEMENT

'I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed

Date: 24/09/2012

AUTHENTICITY STATEMENT

'I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.'

Signed

Date: 24/09/2012

Abstract

Non-photorealistic arts have been an invaluable form of media for tens of thousands of years, and are widely used in animations and games today, motivating research into this field. Recently, novel 2.5D models have emerged, targeting the limitations of both the 2D and 3D forms of cartoons. The most recent development is the 2.5D Cartoon Model. The manual building process for such models is labour intensive, and no automatic building method for 2.5D models currently exists. This dissertation proposes a novel approach to the problem of automatic creation of 2.5D Cartoon Models, termed Auto-2CM in this thesis, which is the first attempt at a solution to the problem.

The proposed approach aims to build 2.5D models from real-world objects. Auto-2CM collects 3D information on the candidate object using 3D reconstruction methods from Computer Vision, then partitions it into meaningful parts using segmentation methods from Computer Graphics. A novel 3D-2.5D conversion method, the first of its kind, is introduced to create the final 2.5D model. The Auto-2CM framework does not mandate specific reconstruction or segmentation algorithms, so different algorithms may be used for different kinds of objects.

The effect of different algorithms on the final 2.5D model has so far been unknown. A perceptual evaluation of Auto-2CM is performed, which shows that by using different combinations of algorithms within Auto-2CM for specific kinds of objects, the performance of the system may be increased significantly. The approach can produce acceptable models both for manual sketching and for direct use. It is also the first experimental study of the problem.

Acknowledgements

Of the many people who deserve thanks, some are particularly prominent:

My supervisors, Prof. Arcot Sowmya and Dr. Xiongcai Cai, for their advice, guidance and support. This thesis would not have been possible without their selfless contribution.

My parents, Shuli Zhan and Yuli An, for their support and understanding. The thanks and appreciation I owe to them cannot be described by any words.

My fellow students in the group: Roger Chen, for his selfless help during my study; Dimitri Semenovich and Anuraag Sridhar, for their guidance and help as seniors.

Shy Shalom, co-author of SDF, and Alec Rivers, co-author of 2.5D Cartoon Models, for their patience in answering questions and for helpful discussions.

Contents

1. Introduction 3
   1.1. Cartoon, Model and Modelling ...... 4
   1.2. 2D, 3D and 2.5D ...... 4
        1.2.1. 2D ...... 4
        1.2.2. 3D ...... 5
        1.2.3. 2.5D ...... 5
   1.3. Artists at Work ...... 6
        1.3.1. 2D programs and devices ...... 7
        1.3.2. Automatic 2D generation ...... 7
        1.3.3. 3D programs and devices ...... 8
        1.3.4. Automatic 3D modelling ...... 8
        1.3.5. 2.5D programs ...... 9
   1.4. Motivation ...... 9
   1.5. Overview ...... 10
   1.6. Scope ...... 11
   1.7. Contribution ...... 11
   1.8. Organization ...... 12

2. Background and Literature Survey 13
   2.1. Cartoons ...... 14
        2.1.1. History ...... 14
        2.1.2. Modern Cartoons ...... 16
        2.1.3. The 2D-3D argument ...... 17
   2.2. Early 2.5D Cartoons ...... 17
        2.2.1. Automatic In-betweening for Animation ...... 18
        2.2.2. Drawing for Illustration and Annotation in 3D ...... 19


   2.3. The 2.5D Cartoon Models ...... 20
        2.3.1. Advantages ...... 20
        2.3.2. Limitations ...... 21
        2.3.3. The Manual Creation Process ...... 23
   2.4. 3D Reconstruction ...... 24
        2.4.1. Feature Detection, Stereo Matching and Visual Hull ...... 24
        2.4.2. Image-Based 3D Reconstruction ...... 26
   2.5. 3D Segmentation ...... 27
        2.5.1. Manual 3D Segmentation ...... 27
        2.5.2. Automatic 3D Segmentation ...... 27
        2.5.3. Shape Diameter Function ...... 28
        2.5.4. Fitting Primitives ...... 29
        2.5.5. Protrusion-oriented Segmentation ...... 29
   2.6. 3D Projection ...... 30
   2.7. Summary ...... 31

3. Automatic 2.5D Cartoon Modelling 33
   3.1. 3D Reconstruction ...... 34
   3.2. 3D Segmentation ...... 35
   3.3. 2.5D Stroke Creation ...... 38
        3.3.1. Parts Refinement ...... 38
        3.3.2. Shape Extraction ...... 39
        3.3.3. Stroke Control Points Creation ...... 40
        3.3.4. Other Attributes ...... 41
   3.4. 2.5D Model Assembly ...... 42
        3.4.1. Key-View Selection ...... 42
        3.4.2. Single Part Assembly ...... 43
        3.4.3. Model Assembly ...... 43
   3.5. Manual Modifications ...... 44
        3.5.1. Stylistic Strokes Modification ...... 45
        3.5.2. Colouring ...... 45
   3.6. Algorithm ...... 47
   3.7. Experiments and Results ...... 47
   3.8. Limitations ...... 52
   3.9. Remarks ...... 53

4. Perceptual Evaluation and Algorithm Selection 55
   4.1. Data Sets ...... 56
        4.1.1. Scientific Models ...... 56
        4.1.2. Industry Models ...... 58
   4.2. ...... 61
   4.3. Software Interface and Development ...... 62
   4.4. Results and Analysis ...... 63
        4.4.1. Segmentation Results and Analysis ...... 63
        4.4.2. 2.5D Models and Analysis ...... 65
   4.5. Recommendations ...... 76
        4.5.1. Simple Models ...... 76
        4.5.2. Reconstructed Models ...... 76
        4.5.3. Industry Models ...... 76
        4.5.4. Summary ...... 77
   4.6. Remarks ...... 77

5. Conclusion 79
   5.1. Thesis Overview ...... 79
   5.2. Contributions ...... 80
   5.3. Limitations and Future Work ...... 81
   5.4. Concluding Remarks ...... 82

A. Publications Arising from Thesis 83

B. Other Tools, Engines and Frameworks 85
   B.1. ...... 85
   B.2. Unity ...... 85
   B.3. OpenCvSharp ...... 86

C. Acronyms and Abbreviations 87

Bibliography 91

List of Figures 97

“I think the idea of a traditional story being told using traditional animation is likely a thing of the past.”
— Jeffrey Katzenberg, CEO of DreamWorks Animation

“I think 2D animation disappeared from Disney because they made so many uninteresting films. They became very conservative in the way they created them. It’s too bad. I thought 2D and 3D could coexist happily.”
— Hayao Miyazaki, Co-founder of Studio Ghibli

Chapter 1

Introduction

Cartoons are an important form of media, and they are widely used in animations and video games. Since the Family Computer was released in 1983, cartoons have become the most common type of game artwork, and most top games in the Apple AppStore today use stylistic cartoon artwork. New tools and technologies continue to be invented and provide relief from labour-intensive work in creating cartoons, so that humans can focus on the more creative aspects. Today there are many computer-aided applications1 for the cartoon industry. Sprites [45] in games use code to control the animation of characters at run-time. 3D models and editors are available, and moreover there are automatic methods [28,66] to build them. However, both the 2D and 3D forms of cartoons have certain limits, and in order to address them, a novel structure known as the 2.5D model [5,22,55] was introduced. 2.5D models preserve geometric cartoon elements just like 2D, while being able to rotate like 3D. Yet this is not enough, because drawing the key frames for 2.5D models is still labour intensive. Currently, there is no automatic way to build 2.5D models, unlike 2D images and 3D models, and this problem is the focus of the research in this thesis.

In this chapter, concepts related to this thesis, such as Cartoon, 2.5D and 2.5D Cartoon Modelling, are explained. The motivation for and significance of building 2.5D Cartoon Models automatically are discussed. The first part of this chapter clarifies the concepts of Cartoon, Model and Modelling in Section 1.1. As 2.5D is a relatively new concept, it is necessary to first have a basic understanding of 2D and 3D, and this is discussed in Section 1.2. Artists currently have various choices to create different forms of artworks, with manual editors available for 2D, 3D and 2.5D models, and both 2D and

1 Inkscape (http://inkscape.org/), GIMP (http://www.gimp.org/)


3D artworks have approaches for automatic creation, but there is currently no method for building 2.5D models automatically. This situation is discussed in Section 1.3.

1.1. Cartoon, Model and Modelling

Cartoon refers to a typically non-realistic or semi-realistic drawing or painting [48]. The term is also widely used for a film made by photographing a series of drawings [69]. Almost any artefact that is not photorealistic could be called a cartoon.

A Model is a computer representation or scientific description of something [69]. In this thesis, the term means a computer representation of an object or artefact.

Modelling is the activity of making models of objects [69]. In 3D computer graphics, 3D modelling is the process of developing a mathematical, wireframe representation of any 3D object using specialized software; the product is called a 3D model [2]. Similarly, 2.5D Modelling is the process of 2.5D model creation. The term 2.5D is explained in Section 1.2.3.

Thus a cartoon model is a computer representation of a cartoon object, and cartoon modelling is the process of building such a model.

1.2. 2D, 3D and 2.5D

To understand 2.5D, basic knowledge of the two more common concepts, 2D (Section 1.2.1) and 3D (Section 1.2.2), is necessary. The 2.5D concept is discussed in Section 1.2.3. There are also visual arts in only one dimension, but they are neither as important nor as commonly seen, and are not included in this discussion.

1.2.1. 2D

The two-dimensional (2D) space is a spherical surface, where the geodesic lies on the surface and is defined by any two of its points [11]. The real world is 3D, but when the optics of the eye create an image of the visual world on the retina, that image is two-dimensional. Humans understand 2D artworks very well, and the human brain is able to extract 3D information from a 2D painting when it is in perspective [19]. Traditional

cartoon animation, where each frame is drawn by hand, is also 2D. Some 2D content can present non-realistic information, some of which is geometrically impossible in 3D. Therefore, although there are more and more 3D cartoons today, 2D is still the best choice for artists to express their imagination.

1.2.2. 3D

A three-dimensional (3D) space is a space that has, or appears to have, length, depth and height [69]. Although “3D” technologies have become very popular recently, 3D artworks have a very long history, even longer than that of 2D. The earliest European sculpture is a real-world 3D object which portrays a female form, estimated to have been created about 35,000 years ago [16]. Today, however, the term 3D often refers to 3D computer graphics, including games, animations and visual effects used in movies.

A widely used rendering technique called Toon-Shading or Cel-Shading can make 3D models look like cartoons. 3D cartoons often have more realistic effects and faster creation pipelines. However, when creating 3D cartoons artists are not as free as when creating 2D cartoons, because some stylistic elements are difficult to present in 3D [55]; see Fig. 2.8 for an example. Current workarounds interpolate 3D models at run-time, which is computing-resource intensive, and require artists to create multiple 3D models and define their relative positions, which is very labour intensive. It is easy to add these elements in 2D cartoons, but 2D cartoons cannot rotate, which means artists have to draw every frame. Arguments between 2D and 3D artists have lasted for decades, as discussed in Section 2.1.3. 2.5D models are a novel solution to this problem.
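The quantization idea behind toon/cel shading can be illustrated in a few lines. This sketch is not part of the thesis; the function name and the three-band default are illustrative assumptions.

```python
import math

def toon_shade(normal, light_dir, bands=3):
    """Quantize the Lambertian diffuse term into a few flat bands,
    which produces the characteristic cartoon look."""
    nx, ny, nz = normal
    lx, ly, lz = light_dir
    nlen = math.sqrt(nx * nx + ny * ny + nz * nz)
    llen = math.sqrt(lx * lx + ly * ly + lz * lz)
    # Standard Lambert term: cosine of the angle between normal and light.
    diffuse = max((nx * lx + ny * ly + nz * lz) / (nlen * llen), 0.0)
    # Snap the smooth gradient to one of `bands` discrete intensity levels.
    return math.ceil(diffuse * bands) / bands

# A surface facing the light gets the brightest band; a grazing one gets none.
print(toon_shade((0, 0, 1), (0, 0, 1)))  # 1.0
print(toon_shade((0, 0, 1), (1, 0, 0)))  # 0.0
```

In a real renderer this quantization replaces the smooth diffuse term per pixel, usually combined with a dark outline pass to mimic ink strokes.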

1.2.3. 2.5D

The term two-and-a-half-dimensional (2.5D) emphasizes that the representation is of distal surfaces, not shapes, from the perspective of the viewer [65]. The term 2.5D is often used to describe techniques used in video games and animations to simulate 3D scenes using only 2D elements. These include games based on isometric tiles, where the viewing angles are often fixed [41], as in Figure 1.1. 2.5D is mostly seen in games, and many game engines support the 2.5D style. The term 45-degree describes 2.5D games that give the player a 45-degree third-person viewpoint. Other

2.5D techniques like side-scrolling are also widely used [6]. For example, Super Mario2, Rayman3 and Worms4 are such games.

Figure 1.1.: An isometric tiles example using Cocos2d engine [41].

Some other 2.5D methods [22], [5] may provide viewpoint changes, but these methods are relatively limited as they rely heavily on information manually provided by the artists, and are thus inefficient and labour intensive. Rivers et al. [55] proposed a novel solution for full 3D rotation of 2D drawings, called “2.5D Cartoon Models”. It is a new model structure that combines interpolation and rendering methods. More importantly, it retains 2D stylistic strokes while rotating in 3D and enables the presentation of elements that are impossible to define using current 3D techniques.
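The key idea of such a model, 2D stroke shapes stored at a few key views and interpolated for unseen angles, can be sketched as follows. The class names, the yaw-only parameterization and the linear interpolation are simplifying assumptions for illustration, not the actual structure used by Rivers et al.

```python
from dataclasses import dataclass

@dataclass
class KeyView:
    yaw: float    # viewing angle (degrees) at which this shape was drawn
    shape: list   # 2D stroke outline as [(x, y), ...] control points

@dataclass
class Stroke25D:
    """One stroke of a 2.5D model: 2D outlines stored at a few key views."""
    key_views: list  # list of KeyView, sorted by yaw

    def shape_at(self, yaw):
        """Interpolate the stroke outline for a viewing angle never drawn."""
        views = self.key_views
        if yaw <= views[0].yaw:
            return views[0].shape
        if yaw >= views[-1].yaw:
            return views[-1].shape
        for a, b in zip(views, views[1:]):
            if a.yaw <= yaw <= b.yaw:
                t = (yaw - a.yaw) / (b.yaw - a.yaw)
                # Blend matching control points of the two key outlines.
                return [((1 - t) * ax + t * bx, (1 - t) * ay + t * by)
                        for (ax, ay), (bx, by) in zip(a.shape, b.shape)]

# A square drawn at 0 degrees that narrows by 90 degrees reads as a side view:
s = Stroke25D([KeyView(0.0, [(0, 0), (1, 0), (1, 1), (0, 1)]),
               KeyView(90.0, [(0, 0), (0.2, 0), (0.2, 1), (0, 1)])])
print(s.shape_at(45.0))  # outline halfway between the two key views
```

The point of the structure is that only a handful of hand-drawn key views are stored, while every in-between view is synthesized, which is exactly what makes the manual drawing of those key views the bottleneck this thesis targets.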

1.3. Artists at Work

Today visual artists have many options to express their imagination, and there are various kinds of computer-aided programs and systems that they can choose from. These include manual editors and automatic systems. Manual editors are available for 2D (Section 1.3.1), 3D (Section 1.3.3) and 2.5D models (Section 1.3.5). Both 2D and 3D artworks have automatic creation approaches, known as Photography (Section 1.3.2) and 3D Reconstruction (Section 1.3.4) respectively. But unlike for 2D or 3D, there is currently no method for building 2.5D models automatically, which is the problem that this thesis focuses on.

2 http://mario.nintendo.com
3 http://raymanzone.ubi.com/
4 http://worms.team17.com/

Traditionally, artists create drawings using pen, pencil or brush on paper, walls or the surfaces of other objects. In 3D, the most common type of art is sculpture. These artworks are handmade. Traditional arts are the basis of modern arts; however, they are not strongly related to the topic of this thesis. Thus, the remainder of this section is mainly about computer artworks.

Today, as personal computers have become much cheaper and faster, most artists have their own computer and many use it as a tool to create visual arts. Computers are so widely used that the term 3D modelling now normally refers to modelling 3D objects in computers rather than sculpting real objects. Moreover, a large number of 2D drawings have been created using computer-aided programs. Various kinds of input devices have been developed for artists to use with computers. It is now very hard to find artists who do not use computers at all and rely only on their brushes and chisels.

1.3.1. 2D programs and devices

Today, drawing with computers is very common. For example Pixiv, the Japanese online community for artists, has more than 4 million users and 24.5 million submitted works as of January 2012 5. Famous programs such as Photoshop6, Inkscape7 and GIMP8 are widely used by artists. Input devices such as pen tablets are not very expensive either: the mid-range tablet Wacom Intuos49 normally costs a few hundred dollars, and entry-level ones may cost even less.

1.3.2. Automatic 2D generation

In earlier times, when people wished to record objects on a 2D surface, they had to hand-draw everything. The first camera that could produce permanent photographs was made in 1826 by Joseph Niépce [57]. Today many 2D graphics editors rely on various kinds of algorithms that can generate many kinds of effects automatically. Computers can also help to generate in-between frames for animations. Automatic 2D image creation devices, such as cameras, have been used so widely that photos taken by them are more commonly seen than hand-drawn paintings today.

5 http://www.pixiv.net/info.php?id=901&lang=en
6 http://www.adobe.com/Photoshop/
7 http://inkscape.org/
8 http://www.gimp.org/
9 http://wacom.com/en/Products/Intuos.aspx/

1.3.3. 3D programs and devices

3D computer-generated imagery was first used in the 1976 film Futureworld [37], and 3D computer graphics have been widely used since the 1990s in films, animations and video games. A large number of 3D modelling programs are available, such as the commercial products Autodesk 3ds Max and Autodesk Maya10, and the open-source software Blender11. Devices for automatically acquiring depth information are now cheap enough to enter the family entertainment market, for example the Microsoft Kinect12.

1.3.4. Automatic 3D modelling

Although today’s 3D industry relies mainly on artists to manually create 3D models, there are techniques that can build models automatically. One way is to generate new models from existing ones saved in databases, or to generate a specific type, such as terrain, from preset parameters, as shown in Figure 1.2.

Figure 1.2.: A scene generated using Terragen2, by Giovanni Mezzadri (GioMez) from the Terragen community13.

Another way to automatically build 3D models is known as 3D reconstruction, which is the process of capturing the 3D shape and appearance of real objects. 3D reconstruction can be accomplished by active or passive methods. Examples of active methods include time-of-flight lasers, microwaves and ultrasound, also known as 3D Scanning [18].

10 http://www.autodesk.com
11 http://www.blender.org/
12 http://www.xbox.com/en-US/Kinect
13 http://www.planetside.co.uk/gallery/f/tg2/GioMez-Isola.jpg.html

Passive methods do not interfere with the object. Typically, image sensors or cameras are used as input devices, and the process is called Image-Based Reconstruction.

1.3.5. 2.5D programs

In games that utilize 2.5D techniques, it is mostly the programmer's responsibility to make the characters behave as if in a 3D environment. Artists who draw assets for such games often only need to know the view angle of the scene to create the 2D images.

Currently, there are only two programs that can be used for 2.5D cartoon model editing. Both of them have been developed by researchers for their experiments, one is by Di Fiore et al. [22] and the other is by Rivers et al. [55].

Currently, the building process for such models requires the user to have sufficient drawing abilities and consumes a great amount of time and effort as the user actually needs to manually draw a number of 2D cartoons from different views. Defining a curve in 2.5D normally requires much more work than in 2D, because a regular 2.5D cartoon model is often defined in three or more different 2D planes. The manual process makes the creation of 2.5D cartoon models not only inefficient but also labour intensive. Therefore, an automatic system that allows users to quickly build up a 2.5D model is essential for improving the usability of such a model.

Although there are many 2D and 3D automatic building and reconstruction applications and devices, there is no automatic modelling application for 2.5D models. This thesis proposes an Automatic 2.5D Cartoon Modelling system, which is introduced in Chapter 3.

1.4. Motivation

Cartoons are an important form of media. Both 2D and 3D cartoons have certain limitations. While 2D is more “free”, as it may contain geometrically non-realistic elements, every frame needs to be redrawn when the object rotates. 3D has a much faster production pipeline, as 3D objects can rotate freely and the frames are rendered by computers, but it cannot easily convey geometrically non-realistic elements and restricts artists to a conservative method of creation. The arguments between 2D and 3D artists have lasted for

decades. As a solution to this problem, 2.5D Models have emerged. 2.5D cartoons can rotate in 3D and at the same time maintain stylistic strokes as in 2D, which overcomes the limitations of both 2D and 3D.

Computer-aided automatic creation approaches are very useful for artists, because the building process is often very labour intensive. Current methods include Photography for 2D and 3D Reconstruction for 3D, but no such approach exists for 2.5D. There are a few manual 2.5D programs [22, 55], and currently artists have no choice but to create 2.5D artworks manually. Thus, an automatic approach for building 2.5D cartoon models is required, and this thesis aims to create an approach that automates this process.

1.5. Overview

Figure 1.3.: Proposed approach, Auto-2CM.

The goal of this research is to create an automatic approach for 2.5D cartoon modelling, called Auto-2CM.

As shown in Figure 1.3, the system is designed as follows:

(i) 3D information, which is required by the system, can be collected from real world objects using 3D Reconstruction techniques. Existing 3D reconstruction approaches include 3D Scanning and Image-Based Reconstruction, both of which can be used for the system.

(ii) Meaningful parts of the candidate object are calculated using 3D Segmentation methods. Several existing methods of 3D segmentation are available.

(iii) A novel approach for converting 3D segments to 2.5D models is developed in this thesis.
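The three steps above can be summarized as a pipeline in which each stage is a pluggable component. The sketch below is illustrative; the function name and the toy stand-in components are assumptions, not the thesis implementation.

```python
# Hypothetical stage interfaces; the thesis does not prescribe specific algorithms.

def auto_2cm(images, reconstruct, segment, convert):
    """Sketch of the three-stage Auto-2CM pipeline.

    reconstruct: images -> 3D mesh       (e.g. image-based reconstruction or a 3D scan)
    segment:     mesh   -> list of parts (e.g. SDF, FP or PBS segmentation)
    convert:     parts  -> 2.5D model    (the 3D-to-2.5D conversion step)
    """
    mesh = reconstruct(images)   # (i)   collect 3D information
    parts = segment(mesh)        # (ii)  partition into meaningful parts
    return convert(parts)        # (iii) build the final 2.5D model

# Any combination of components can be plugged in; here, trivial stand-ins:
model = auto_2cm(["front.jpg", "side.jpg"],
                 reconstruct=lambda imgs: {"faces": len(imgs)},
                 segment=lambda mesh: [mesh],
                 convert=lambda parts: {"strokes": len(parts)})
print(model)  # {'strokes': 1}
```

Keeping the stages behind plain function interfaces is what allows different reconstruction and segmentation algorithms to be swapped per object type, which is precisely what the evaluation in Chapter 4 exploits.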

The quality of models built by this automatic system varies significantly according to the type of the candidate object and the components used in the system. To find the best practice, an evaluation is performed on an implementation of the system.

1.6. Scope

The focus of this thesis is the automation of 2.5D Cartoon Model creation. This is most closely related to the following areas:

(i) 3D-impossible cartoons are stylistic 2D cartoon elements that cannot be presented in 3D space, also known as the rabbit ears problem [55]. For example, Bugs Bunny's ears should always face the camera no matter the viewing angle [55], and the mouths of Shin-chan [74] and Dexter [71] often change appearance and position according to the viewing angle. Such stylistic elements are impossible to present as part of a 3D model. To solve this problem, and make these elements rotatable in 3D space, 2.5D models [5,22,55] were developed.

(ii) 3D Reconstruction is the process of acquiring the 3D information of real-world objects; this information is required when building 2.5D models. There are mainly two types of 3D reconstruction. The first is Image-Based 3D reconstruction [28,62,66], which uses several images of the object to construct a 3D model. The second, 3D scanning [18], is simpler and normally more accurate than image-based reconstruction, but requires expensive equipment.

(iii) 3D Segmentation is the process of segmenting a 3D object into smaller parts. In this research, methods that are particularly good at simulating manual segmentation are considered. These include SDF [63], FP [4] and PBS [3].

(iv) 3D Projection [26] is the projection of a 3D object onto a 2D plane. It can be perspective or parallel.
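The difference between the two projection types reduces to whether coordinates are divided by depth. A minimal sketch (function names are illustrative):

```python
def perspective_project(point, focal=1.0):
    """Perspective projection: divide by depth, so distant points shrink."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def parallel_project(point):
    """Parallel (orthographic) projection: simply drop the depth coordinate."""
    x, y, z = point
    return (x, y)

p = (2.0, 1.0, 4.0)
print(perspective_project(p))  # (0.5, 0.25)
print(parallel_project(p))     # (2.0, 1.0)
```

Parallel projection preserves apparent size regardless of depth, which is why fixed-angle isometric 2.5D games favour it, while perspective projection matches what a camera sees.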

1.7. Contribution

The main contributions of this thesis are threefold:

(i) This is the first system to automate the building process for 2.5D cartoon models. It aims to free artists from labour intensive work so they can focus on the more creative aspects. The system is built with Image-Based Reconstruction, 3D Segmentation, 2.5D Shape Creation and 2.5D Model Assembly methods.

(ii) A novel method, the first automatic method to convert 3D segmented meshes to 2.5D cartoon models, is proposed; it is an important part of the automatic 2.5D cartoon modelling system. It includes two steps: 2.5D Shape Creation and 2.5D Model Assembly.

(iii) This work provides an evaluation of the 2.5D modelling process with different algorithms and different datasets. The aim is to find the best practice for artists who use the system.

1.8. Organization

The thesis begins with a survey of the areas related to automated 2.5D cartoon modelling in Chapter 2. It includes an overview of traditional and digital cartoons, and reviews the current state of 3D modelling and segmentation, as well as 2.5D models and modelling approaches.

The automatic 2.5D Cartoon Modelling system Auto-2CM, and the 2.5D shape creation and assembly approach, are presented in Chapter 3.

Different algorithms and datasets are evaluated in Chapter 4, which aims to find the best practice for artists modelling different kinds of objects. An implementation of the system is also presented.

Finally, Chapter 5 presents the conclusion and summarises the thesis. Possible future directions of research in this field are also discussed.

Chapter 2

Background and Literature Survey

Cartoons are an important form of art, widely used in animations and games. It would be hard to understand why 2.5D cartoons are important without knowledge of the history of cartoons. The ability to create and understand non-realistic, 3D-impossible shapes developed back in prehistoric times, before writing began. The use of cartoons never stopped throughout history. This is presented in Section 2.1.

2D cartoons give artists the most freedom to express their imagination, but require artists to control every frame and are therefore more labour intensive. 3D cartoons can be rotated and reused, with frames rendered by computers, but artists are often restricted to a conservative approach because they cannot create any geometrically impossible content. Arguments between 2D and 3D in the animation and games community have lasted for decades. Previous research [5,22] provides the ability to view 3D-impossible shapes from different angles, known as 2.5D models. This is a relatively new technique that mixes 2D freedom with 3D rotation, and these methods are discussed in Section 2.2. A more recent approach called 2.5D Cartoon Models was introduced by Rivers et al. in 2010 [55], and is discussed in Section 2.3.

Current automatic modelling systems are mostly for building 3D models, and use cameras or radar to capture information about real-world objects. This technique is known as 3D Reconstruction, which aims to produce 3D models automatically from real-world objects. More details of 3D reconstruction are given in Section 2.4. To automatically segment 3D models into smaller parts, 3D Segmentation is introduced. It is a very important part of the automatic 2.5D cartoon modelling (Auto-2CM) approach of this thesis. The


results of segmentation directly affect the final 2.5D model. Techniques for 3D segmentation are discussed in Section 2.5.

Auto-2CM also requires methods to map 3D objects back to 2D space. Techniques for this purpose are known as 3D Projection, which is discussed in Section 2.6.

2.1. Cartoons

A cartoon is a line drawing representing visible edges or intersecting planes of objects [17]. Cartoons range from simple line drawings to computer simulations [24]. Any typically non-realistic or semi-realistic drawing could be termed a cartoon.

A basic knowledge of cartoons is necessary for understanding the topic of this thesis. Cartoons have a longer history than writing, and many animations and games use cartoon artworks. The history of cartoons is discussed in Section 2.1.1. Modern cartoons are discussed separately in Section 2.1.2. Both 2D and 3D cartoons have certain limitations, and the arguments between 2D and 3D artists are discussed in Section 2.1.3.

2.1.1. History

Figure 2.1.: A pig drawn with eight legs to show movement [25].

Stone-age cartoons are prehistoric cave paintings, made on cave walls. The earliest cave paintings date back 32,000 years [14]. At that time, humans used abstract shapes to express their thoughts and imagination, and others who saw those paintings could understand such expression. The shapes of birds, wolves or humans were created without high-level intelligence. Nonrealistic representations are as easily recognized as realistic ones [20]. This is a low-level, basic and deeply ingrained ability

of humans. It is not surprising that today abstract shapes and stylistic strokes are still an important part of art.

Not all stone-age drawings are static pictures; more precisely, some artists of the prehistoric era were trying to give expression to the essence of movement. For example, the pig in Figure 2.1 is depicted with multiple legs in superimposed positions, the artist clearly attempting to convey a perception of motion.

In the classical period, there were changes in the style and function of sculpture [50]. Appearances became more realistic and poses more naturalistic; see Figure 2.2.

Figure 2.2.: Bronze sculpture, thought to be either Poseidon or Zeus, c. 460 B.C., National Archaeological Museum, Athens.1

Such realistic styles survived throughout the Middle Ages in the Byzantine Empire, while Western European art mixed with the vigorous “Barbarian” culture of Northern Europe.

Medieval art spans over a thousand years. The term “cartoon” originated in this period [72]. It first described a preparatory drawing for a piece of art, such as a stained glass window. The glass painting style is still in use in modern times, as shown in Figure 2.3.

Figure 2.3.: Glass painting in the game “Zelda”.2

1 http://www.namuseum.gr
2 http://www.zelda.com

Semi-realistic paintings dominated art until cameras were invented and came into wide use in the 19th century.

Figure 2.4.: Base of the brain, by Andreas Vesalius, 1543.3

Renaissance art was a “rebirth” of the classical arts, emerging in Italy in about 1400 [36]. Yet it was not only a continuation of the classical arts but was also transformed by the mixing of medieval art with new scientific knowledge, as in Figure 2.4. Most Renaissance art is not as realistic as classical art, but rather semi-realistic. Developments in this period greatly influenced later times, including modern cartoons [30].

Modern cartoons are influenced by these earlier arts, and many features and techniques developed in these periods are still being used.

2.1.2. Modern Cartoons

In modern times, a cartoon may be a single-picture drawing, as mostly seen in newspapers, or a series of drawings that tells a story, the most famous forms being comics and animation. Another type of cartoon widely seen today appears in video games.

Will Eisner, one of the most important contributors to the development of the medium, defined comics as “the printed arrangement of art and balloons in sequence, particularly in comic books” [23]. Comics usually use words to contribute to the meaning of the pictures. Cartoons have certainly been a significant information carrier since early ages, but they are sometimes better combined with other types of information, such as words, to describe a story.

3 Public domain image from Wikimedia Commons.

Animation is the frame-by-frame control of images in time [68]. Japanese animations, also known as “anime”, are mainly 2D. On the other hand, 3D techniques are more widely used in Western animations.

There used to be many 2D games with realistic images before the early 2000s, but with graphics hardware becoming more and more powerful, game developers now often turn to 3D when they want realistic graphics. Today even mobile devices can provide sufficient power for realistic 3D effects in games, for example Infinity Blade4 and Real Racing5 targeting iOS devices. On the other hand, since realistic games have turned to 3D, 2D games are now dominated by cartoon artwork.

2.1.3. The 2D-3D argument

Since 3D cartoons emerged, the argument about 2D versus 3D cartoons has lasted for decades. Those who prefer 3D argue that “2D is dead”, and Hollywood has largely abandoned 2D feature animation following the success of 3D animations [12]. On the other hand, those who prefer 2D believe that 3D techniques restrict the imagination of artists. Many animations now use both 2D and 3D graphics to balance their advantages and disadvantages.

2.5D cartoons, which combine the rotation ability and reusability of 3D with the 3D-impossible shapes of 2D, were introduced recently as a novel direction for this art [55].

2.2. Early 2.5D Cartoons

Some recent applications of cartoons provide 3D rotation for 2D elements. Di Fiore et al. [22] proposed an approach to animation production that generates in-between frames by interpolating hand-drawn key views. Bourguignon et al. [5] presented another approach using 2D strokes manually drawn on 3D planes. These two 2.5D methods are discussed in Sections 2.2.1 and 2.2.2 respectively.

4 http://epicgames.com/infinityblade/
5 http://firemint.com/real-racing/

2.2.1. Automatic In-betweening for Animation

In 2001, Di Fiore et al. introduced a new method for automatic in-betweening in computer-assisted traditional animation [22]. The method aims to animate characters by transforming them outside the drawing plane, especially rotations around an axis other than the z-axis (i.e. not only scaling). It is a multi-level approach, with the following levels:

• Level 0 holds the basic building primitives, which are sets of attributed 2D curves. At this level, the approach uses subdivision curves to represent sub-objects. Each curve can be closed or open, and the thickness of curves can also be defined.

• Level 1a is fundamental to the whole approach. In this level, several key frames (called “extreme frames” by Di Fiore et al.) are created manually by artists using the system developed for the approach. See Figure 2.5.

• Level 1b is where the rendering takes place. It uses the extreme frames created in Level 1a to render animated sequences that are covered by these pre-defined key frames.

• Level 2 incorporates 3D information by means of 3D skeletons. For animating background elements and non-deformable objects undergoing only affine transformations, Level 1 functionality suffices. However, for animating live characters that are deformable, Level 2 functionality is needed. Thus a 3D skeleton system is introduced to maintain more 3D information in the 2.5D model. This 3D skeleton must also be created manually by the user.

• Level 3 introduces higher-level tools such as non-affine deformations and facial expressions. These are very handy but are not core functionalities of the method.

Di Fiore’s approach is the first that tries to provide a geometric non-realistic solution for rotatable cartoons. It relies heavily on 3D skeletons, much like a technique used for animating 3D models. Building a model using this method is also very labour-intensive, sometimes more so than building a 3D model of the same object, as it requires resetting many vertices and bones6 in different frames. Nevertheless, it is a great idea and definitely a significant milestone in this area.

6 A “bone” is an element of Skeletal Animation, a 3D animation technique [47].

Figure 2.5.: Level 1 functionality, where control points are drawn to shape the character [22].

2.2.2. Drawing for Illustration and Annotation in 3D

In order to represent 2D strokes in 3D space, Bourguignon et al. presented a novel system [5].

Figure 2.6.: A circular stroke from Bourguignon et al. [5]. From left to right: front view; side view rotated by 30 degrees; side view rotated by 90 degrees; top view.

Bourguignon et al. argue that strokes are an excellent way to indicate the presence of a surface silhouette, even in 3D. They believe several neighbouring strokes can reinforce the presence of a surface in the viewer’s mind, while attenuated strokes may indicate imprecise contours or even hidden parts. The main idea is to draw strokes on 3D planes, and to show or hide strokes as the viewpoint rotates, as shown in Figure 2.6.

This structure is easier to create than Di Fiore’s, and it requires fewer computational resources during rendering. However, the use of this method is limited to static shapes, i.e. shapes that do not change. Also, the shape is not clear when looking between key views.

The methods of Di Fiore et al. [22] and Bourguignon et al. [5] were the first to research 2.5D cartoons. Dating from the beginning of the new millennium, they opened a new direction, namely 2.5D, for cartoons. The most recent research in this direction is the 2.5D Cartoon Models of Rivers et al. [55], which is discussed in Section 2.3.

2.3. The 2.5D Cartoon Models

The two 2.5D methods described in Section 2.2 give artists less freedom than the 2.5D Cartoon Models recently presented by Rivers et al. [55]. The 2.5D cartoon model is a novel approach to rendering 2D cartoons in 3D. Rivers’ 2.5D model, which is created purely by manual means, is the target model that this research aims to build automatically. Normally, a 2.5D cartoon model contains several billboards, each of which is defined in one or more views using one stroke per view, and also has one 3D anchor position, as shown in Fig. 2.7. The shape of a billboard in a new view is then determined by simple 2D interpolation between the corresponding user-drawn strokes in existing views.
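Rivers et al. describe the new-view shape as a simple 2D interpolation between corresponding strokes; the sketch below is a minimal illustration of that idea (the function name, stroke representation and angle-based blend are illustrative assumptions, not Rivers' published code):

```python
import numpy as np

def interpolate_stroke(stroke_a, stroke_b, angle_a, angle_b, angle):
    """Blend two key-view strokes (N x 2 arrays of control points) for a
    camera angle lying between the two key-view angles. Assumes the two
    strokes have corresponding control points, as in simple 2D interpolation."""
    t = (angle - angle_a) / (angle_b - angle_a)  # 0 at view A, 1 at view B
    return (1.0 - t) * np.asarray(stroke_a) + t * np.asarray(stroke_b)

# A stroke drawn in the 0-degree key view and in the 90-degree key view.
front = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
side = np.array([[0.0, 0.0], [0.5, 0.0], [0.5, 1.0]])
mid = interpolate_stroke(front, side, 0.0, 90.0, 45.0)
print(mid)  # halfway shape: [[0., 0.], [0.75, 0.], [0.75, 1.]]
```

This linear blend is exactly what produces the "strange shapes" discussed in Section 2.3.2 when key views are far apart.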

Figure 2.7.: A picture demonstrating the principle of Rivers’ 2.5D cartoon models. This simple head model is composed of 9 parts; each part contains one stroke as its boundary line, a 3D anchor position and a fill colour [55].

2.3.1. Advantages

As in other 2.5D models, the main advantage of Rivers’ model is that it preserves geometric cartoon stylistic elements that cannot be provided by 3D. For example, no matter what the viewing angle, Bugs Bunny’s ears always face the camera [55]; this is called the rabbit ear problem, and a model has been built in this thesis to demonstrate it, as shown in Figure 2.8. To present this geometric element in a 2.5D cartoon model, one simply draws the ears facing the camera in all views. This is achieved by defining the shape of the model in each view separately, so editing a shape in one view will not affect its appearance in other user-drawn views. See Figure 2.9.

(a) (b) (c)

(d) (e) (f)

Figure 2.8.: (a)(b)(c): The front, 45-degree and right views of the 3D model of ’Ruby’. The 3D model is rendered using toon shading. (d)(e)(f): In order to show the advantages of Rivers’ structure, the 2.5D Ruby has been manually edited based on the automatically generated sketch (by changing the positions of the eyes and ears). The eyes can still be seen even when facing 90 degrees sideways, and the ears always appear to face the camera.

2.3.2. Limitations

There are several limitations to Rivers’ 2.5D models. The one that causes the most issues is the simple interpolation technique used to determine stroke shapes in views that are not key views. The simple interpolation of Rivers et al. can lead to strange shapes in interpolated views, which may also cause different parts of a model to detach from each other during rotation. Currently the only way to solve this problem is to define more key views. In Rivers’ work, three more key views are defined to make the

Figure 2.9.: Top: given the camera position and a 2.5D “professor” model structure, the final rendered image; Middle: a new view generated from three key views; Bottom: the manual editor interface of Rivers’ system [55].

Alien’s arm look right, as shown in Figure 2.10. The more views that need to be defined, the more work artists have to do. This research aims to develop a system which allows the user to simply add another line to the commands to get a new 3D-consistent key view generated, without any extra work.

(a) (b)

(c) (d)

Figure 2.10.: (a)(b): Notice the ears of the ’Dog’ model in Rivers’ work. (c)(d): Three more key views are pre-defined to make the arms of the ’Alien’ look right.

2.3.3. The Manual Creation Process

Normally the user starts from one view position, draws one part, changes to another view and draws the shape for that part, then keeps changing to different views until that part is finished. Even for a simple bear, this process can cost an experienced artist a large amount of time and effort, as shown in a video by Rivers et al. [55]. Thus a method to automatically generate 2.5D models would be very beneficial.

Rivers’ 2.5D cartoon model relies on artists to:

i) analyse the spatial and spectral characteristics of the object

ii) segment the object into parts based on those characteristics, and

iii) create a 2D stroke for each part in each view.

This is time consuming and challenging. An automatic approach to 2.5D modelling is required. By simulating the manual creation process, this research aims at the development of such an automatic approach. To achieve this, 3D information of the candidate object is needed. Thus, 3D Reconstruction is introduced into this research for this purpose.

2.4. 3D Reconstruction

3D reconstruction includes image-based reconstruction and 3D scanning. Because 3D scanning relies on expensive devices, this research focuses on image-based methods.

The 3D reconstruction of scenes from images taken from multiple cameras, known as Image-Based 3D Reconstruction, is a fundamental problem in Computer Vision. In this section, Feature Detection [35], a fundamental technique underlying nearly all image-based reconstruction, is discussed. Stereo Matching [59] is a method to compute the 3D positions of the features found by feature detection. Visual Hull [43] is another classical method of 3D object reconstruction that is more accurate but has more restrictions. Feature Detection, Stereo Matching and Visual Hull are discussed in Section 2.4.1. Visual Hull can be combined with Stereo Matching for better reconstruction performance, one example being the method of Furukawa et al. [28]. Furukawa’s method, as well as other reconstruction methods, is discussed in Section 2.4.2.

2.4.1. Feature Detection, Stereo Matching and Visual Hull

Feature detection and matching is fundamental to many computer vision applications [7,10,28,31,52,75]. Geometric relations between features and cameras can be computed from detected and matched features in different images. Feature detection and matching have been used since the early days of stereo matching [34,35,49] and remain popular in image stitching applications [8] and 3D reconstruction approaches [28].

Stereo matching is used to find the 3D positions of points on 2D images, and it is one of the most active research areas in computer vision, with a large number of algorithms for stereo correspondence [59]. Some methods use several images to generate a depth map. A depth map is a 2D array where the x and y distance information corresponds to the rows and columns of the array, as in an ordinary image, and the corresponding depth readings (z values) are stored as the array’s elements (pixels). A survey accompanied by data sets is available [59], and an up-to-date comparison is maintained on the associated website [58].
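For a rectified stereo pair, the depth values stored in such a map follow from triangulation: z = fB/d, where f is the focal length in pixels, B the camera baseline, and d the disparity between matched pixels. A minimal sketch (the function name and the toy numbers are illustrative):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (in pixels) to a depth map (in metres)
    for a rectified stereo pair, using z = f * B / d."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full_like(disparity, np.inf)  # zero disparity => point at infinity
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# 700 px focal length and a 12 cm baseline: a 42 px disparity lies 2 m away.
d = np.array([[42.0, 0.0], [84.0, 21.0]])
print(disparity_to_depth(d, 700.0, 0.12))
```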

Figure 2.11.: Visual hull, the volume intersection approach to object reconstruction [43].

The visual hull is a geometric entity created by the shape-from-silhouette 3D reconstruction technique [43]. This technique assumes that the foreground object in an image can be separated from the background. Under this assumption, the original image can be thresholded into a silhouette image, which is a foreground/background binary image; see Figure 2.11. The foreground mask, known as a silhouette, is the 2D projection of the corresponding 3D foreground object. A back-projected cone can be computed from the silhouette and the camera viewing parameters. This cone is called a silhouette cone [43]. Cones produced from multiple silhouettes taken from different viewpoints will intersect. The intersection of the cones is called a visual hull, which bounds the actual 3D object. With a sufficient number of cones, a sufficiently accurate model can be built. A main limitation of the visual hull is that it cannot provide geometric information about concave surfaces.
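The silhouette-intersection idea can be illustrated with a toy voxel-carving sketch, using orthographic projections along the coordinate axes as stand-ins for real calibrated cameras, so this is a schematic of the technique rather than the algorithm of [43]:

```python
import numpy as np

def carve(silhouettes, n):
    """Carve an n^3 voxel grid inside the unit cube. `silhouettes` maps a
    dropped axis (0, 1 or 2) to an n x n boolean mask; a voxel survives
    only if its projection lies inside every silhouette."""
    kept = np.ones((n, n, n), dtype=bool)
    for axis, mask in silhouettes.items():
        for idx in np.ndindex(n, n, n):
            uv = tuple(c for a, c in enumerate(idx) if a != axis)
            if not mask[uv]:
                kept[idx] = False  # outside this view's silhouette cone
    return kept

n = 4
square = np.zeros((n, n), dtype=bool)
square[1:3, 1:3] = True  # a 2x2 square silhouette seen from every axis
hull = carve({0: square, 1: square, 2: square}, n)
print(hull.sum())  # 8: the central 2x2x2 block survives the carving
```

With real cameras the projection would use the calibrated camera matrices instead of dropping a coordinate, but the intersection logic is the same.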

2.4.2. Image-Based 3D Reconstruction

Most current stereo reconstruction methods produce 3D polygonal mesh models [56]. Polygonal modelling is the process of modelling objects by representing their surfaces in terms of polygons [40]. In 2006, Seitz et al. [62] compared several such methods. Among these methods, Furukawa and Ponce [27] achieved the best results.

Since 2007, the method of Furukawa and Ponce [28] has remained one of the top methods on the Middlebury benchmark [58]. In this method, a model called the patch model is used to produce the final meshes. A patch p is a rectangle with centre c(p) and unit normal vector n(p) oriented toward the cameras observing it. The first step of their algorithm is to perform feature detection in each image. After features have been found in each image, they are matched across multiple images to reconstruct a sparse set of patches, which are then stored in a grid of cells C(i, j) overlaid on each image. At the expansion stage, the method iteratively adds new neighbours to existing patches until they cover the surfaces visible in the scene. Two filtering steps are applied to the reconstructed patches to further enforce visibility consistency and remove erroneous matches. Finally, the polygonal surface is reconstructed; Figure 2.12 shows this process.

Figure 2.12.: Overall approach of Furukawa and Ponce’s method [28]. From left to right: a sample input image; detected features; reconstructed patches after initial matching; final patches after expansion and filtering; polygonal surface extracted from reconstructed patches.

Among the methods in the Middlebury benchmark [58], Bradley et al. [7] also provide good results when given fewer images (16 views). Methods by Vogiatzis et al. [75], Habbecke and Kobbelt [31] and Lambert et al. [52] perform well if more images are provided (312 views). Campbell et al.’s work [10] also achieves good results across all ranges of data. A survey on image-based modelling has been published recently [51].

Feature Detection, Visual Hull and Stereo Matching are all methods for reconstructing 3D geometry from 2D images. Auto-2CM in this thesis aims to produce 2.5D cartoon models using 3D models reconstructed from real-world objects. In addition, to reach the goal of automating the 2.5D cartoon creation process, the manual process by which human artists segment objects must also be simulated. 3D segmentation is the solution to this problem.

2.5. 3D Segmentation

Automatic 3D segmentation is important for automatic 2.5D modelling; its goal is to provide results similar to human segmentation. Some segmentation methods achieve performance similar to human segmentation [13]. Three such methods, SDF [63] (Section 2.5.3), FP [4] (Section 2.5.4) and PBS [3] (Section 2.5.5), are discussed.

2.5.1. Manual 3D Segmentation

There are many ways to segment a 3D model manually. An obvious solution is to use 3D editing software, such as the open-source Blender7 or commercial products such as Autodesk 3ds Max8. But these require the user to have experience with such tools.

Another way is to use software developed specifically for 3D mesh segmentation. For example, software that semi-automatically segments 3D meshes has been provided by Shapira et al. [63]. To segment a 3D mesh with that software, the user clicks several points at segmentation positions on the candidate object. The algorithm then automatically links these points and finds a cut of the object.

2.5.2. Automatic 3D Segmentation

Automatic mesh segmentation (also called mesh partitioning or mesh decomposition) automatically segments 3D meshes into smaller parts and is a classical problem and key ingredient of Computer Graphics. Many automatic mesh segmentation algorithms have been developed over the last several years. These algorithms can be categorized into

7 http://www.blender.org/
8 http://www.autodesk.com

different groups according to their underlying methods; a survey is provided by Agathos et al. [1].

The purpose of segmentation in Auto-2CM is to simulate the segmentation of an object by a human artist creating a 2.5D cartoon model. As this research concentrates on methods that provide results similar to human segmentation, methods that have been evaluated against human segmentation are considered. In 2009, a benchmark for quantitative analysis of how people manually decompose objects into parts, and for comparison of automatic mesh segmentation algorithms, was published by Chen et al. [13].

Chen et al. [13] compared different automatic segmentation methods against manual segmentation. A standard dataset of 3D object segmentations was built by recording segmentations made by humans. The Shape Diameter Function [63] is one of the best among the evaluated algorithms.

2.5.3. Shape Diameter Function

The Shape Diameter Function method was presented by Shapira et al. [63] in 2008. The algorithm is based on a volume-based shape function, which maintains similar values in analogous parts of different objects. The algorithm consists of three main steps:

(i) compute a value called the “SDF value” for every triangle at the triangle’s centre. This is achieved by shooting a cone of rays from each triangle centre toward the opposite side of its normal, letting the rays hit other triangles of the object, and computing the average distance of the hits.

(ii) fit a Gaussian Mixture Model to the histogram of all SDF values in order to find the probability of each triangle being assigned to each of the SDF clusters.

(iii) after the probabilities have been computed by the Gaussian Mixture Model, a graph is built where each triangle is a node; neighbouring information is also encoded in the graph. The graph-cut algorithm is then used on this graph to compute the segmentation results.
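Step (i) can be sketched with a standard ray-triangle test (Möller-Trumbore). The toy mesh, cone parameters and helper names below are illustrative only; the real method [63] additionally weights the rays and discards outliers:

```python
import math
import numpy as np

def ray_tri(orig, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle test; returns hit distance or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(d, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:
        return None
    inv = 1.0 / det
    s = orig - v0
    u = np.dot(s, p) * inv
    if u < -eps or u > 1 + eps:
        return None
    q = np.cross(s, e1)
    v = np.dot(d, q) * inv
    if v < -eps or u + v > 1 + eps:
        return None
    t = np.dot(e2, q) * inv
    return t if t > eps else None

def sdf_value(centre, normal, triangles, half_angle=0.25, n_rays=8):
    """Average hit distance of rays shot in a cone opposite the normal."""
    axis = -normal / np.linalg.norm(normal)  # anti-normal, into the volume
    tmp = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(axis, tmp); u /= np.linalg.norm(u)
    w = np.cross(axis, u)
    dirs = [axis] + [math.cos(half_angle) * axis
                     + math.sin(half_angle) * (math.cos(a) * u + math.sin(a) * w)
                     for a in np.linspace(0, 2 * math.pi, n_rays, endpoint=False)]
    dists = []
    for d in dirs:
        hits = [t for tri in triangles if (t := ray_tri(centre, d, *tri)) is not None]
        if hits:
            dists.append(min(hits))
    return float(np.mean(dists))

# Toy mesh: the bottom face (z = 0) of a unit cube, probed from the top face.
bottom = [(np.array([0., 0., 0.]), np.array([1., 0., 0.]), np.array([1., 1., 0.])),
          (np.array([0., 0., 0.]), np.array([1., 1., 0.]), np.array([0., 1., 0.]))]
val = sdf_value(np.array([0.5, 0.5, 1.0]), np.array([0., 0., 1.]), bottom)
print(round(val, 3))  # close to 1.0, the local "diameter" of the cube
```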

SDF is one of the best segmentation methods at mimicking human segmentation [13]. It performs well on various kinds of objects. However, its performance starts to decline when the triangle count of the object is low (less than 1000 in experiments).

Figure 2.13.: Examples of cones of rays sent to the opposite side of the mesh [63].

2.5.4. Fitting Primitives

Fitting Primitives Segmentation was proposed by Attene et al. [4] in 2006. It is a hierarchical clustering algorithm based on fitting primitives. The algorithm requires the user to pre-set the number of segments. It starts with every triangle of the object as its own cluster, computes the best-fitting geometric primitive (plane, cylinder, etc.) for adjacent clusters, and merges the best pair. It iterates this step until the pre-set number of segments is reached.
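The greedy merge loop can be sketched as follows, using planes as the only candidate primitive and ignoring mesh adjacency for brevity (Attene et al. also fit cylinders and other primitives and merge only adjacent clusters):

```python
import numpy as np

def plane_fit_error(points):
    """Sum of squared distances of points to their best-fit plane:
    the squared smallest singular value of the centred point cloud."""
    p = np.asarray(points, dtype=float)
    p = p - p.mean(axis=0)
    return float(np.linalg.svd(p, compute_uv=False)[-1] ** 2)

def fit_primitives(triangles, n_segments):
    """Hierarchically merge triangle clusters, always taking the pair whose
    union is best fit by a single plane, until n_segments clusters remain."""
    clusters = [np.asarray(t, dtype=float) for t in triangles]
    while len(clusters) > n_segments:
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: plane_fit_error(
                       np.vstack((clusters[ab[0]], clusters[ab[1]]))))
        clusters[i] = np.vstack((clusters[i], clusters.pop(j)))
    return clusters

# Two triangles on z = 0 and two on x = 0: expect two planar segments.
tris = [[(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(1, 0, 0), (1, 1, 0), (0, 1, 0)],
        [(0, 0, 0), (0, 1, 0), (0, 0, 1)], [(0, 1, 0), (0, 1, 1), (0, 0, 1)]]
segs = fit_primitives(tris, 2)
print([len(s) for s in segs])  # two clusters of 6 vertices each
```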

In experiments, Fitting Primitives performed well on sharp edges, but had lower performance than the SDF and PBS (Section 2.5.5) methods in most other cases.

2.5.5. Protrusion-oriented Segmentation

Protrusion-oriented 3D Mesh Segmentation [3] was proposed by Agathos et al. in 2010. This method is based on prominent feature extraction and core approximation, aligned with a general framework introduced by Lin et al. [44]. The algorithm is based on the premise that a 3D object consists of a core body and its constituent protrusible parts (for example, arms, legs etc.). After initial segmentation, the partitioning boundaries are refined using a minimum cut algorithm.

Agathos’ approach consists of five steps:

(i) Salient point extraction: Salient points are points at the extremities of the object. The Continuous Function for Topology Matching [39], also known as the Protrusion Function [1], which has the property of having low values at the centre of a 3D object

and high values at its protrusions, is used for finding the salient points of the 3D object;

(ii) Salient points grouping: Extracted salient points are grouped according to their distance. Half of the mean of geodesic distances between salient points is defined as a threshold, and salient points for which the geodesic distance is less than the threshold are grouped together.

(iii) Core approximation: The core approximation is addressed by using the minimum cost paths between the representative salient points. Its purpose is to approximate the main body of the object.

(iv) Partitioning Boundary Detection: This is based on the assumption that, in the area that is enclosed by the boundary between the protrusion and the main body, an abrupt change in the volume of the 3D object should occur. By detecting this change the boundary can be found.

(v) Partitioning boundary refinement: The minimum cut algorithm is used in this step to refine the boundary.
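Step (ii) amounts to single-linkage grouping under a data-derived threshold; the sketch below uses Euclidean distance as a stand-in for the geodesic distance of the actual method:

```python
import itertools
import numpy as np

def group_salient_points(points):
    """Group points whose pairwise distance is below half the mean pairwise
    distance, returning connected components of the threshold graph."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    pairs = list(itertools.combinations(range(n), 2))
    threshold = np.mean([dist[i, j] for i, j in pairs]) / 2.0
    # Union-find over edges shorter than the threshold.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in pairs:
        if dist[i, j] < threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Two tips of a model far apart, each detected as a pair of nearby points.
tips = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 0.0), (5.1, 5.0, 0.0)]
print(group_salient_points(tips))  # [[0, 1], [2, 3]]
```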

To sum up, the three segmentation methods are able to mimic human segmentation, but they differ at the methodology level. SDF is an algorithm that both segments the object and optimizes the segments globally. Fitting Primitives only grows each segment locally and then refines it using the min-cut algorithm. Protrusion-oriented segmentation finds the core body at the global level and then optimizes locally.

The last step of Auto-2CM involves 2D shape extraction from 3D objects, using a method for mapping 3D to 2D known as 3D Projection.

2.6. 3D Projection

While image-based 3D reconstruction aims to build 3D objects from 2D information (images), the purpose of 3D projection is just the opposite: it projects 3D objects into 2D space. The projection of a 3D object is defined by straight projection rays (called projectors) emanating from a centre of projection, passing through each point of the object, and intersecting a projection plane to form the projection [26]. In Auto-2CM, 3D projection is used to create 2.5D shapes.

(a) (b)

Figure 2.14.: (a) Perspective projection. (b) Parallel projection [26].

There are two types of 3D projection, namely perspective projection and parallel projection. Under perspective projection, any set of parallel lines that are not parallel to the projection plane converges to a point [26], as shown in Figure 2.14 (a). Parallel projection projects points on the object surface along parallel lines onto the display plane [26], as shown in Figure 2.14 (b). Either method may be used in Auto-2CM.
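The two projections differ only in whether the result is divided by the depth coordinate. A minimal sketch (the focal length and axis conventions are illustrative):

```python
def perspective(point, focal=1.0):
    """Project a 3D point through a pinhole at the origin onto the plane
    z = focal; parallel lines converge toward a vanishing point."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def parallel(point):
    """Orthographic (parallel) projection: simply drop the depth coordinate."""
    x, y, z = point
    return (x, y)

# Two points on a line parallel to the z-axis:
near, far = (1.0, 1.0, 2.0), (1.0, 1.0, 10.0)
print(perspective(near), perspective(far))  # (0.5, 0.5) (0.1, 0.1) -- converging
print(parallel(near), parallel(far))        # (1.0, 1.0) (1.0, 1.0) -- unchanged
```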

2.7. Summary

In this chapter the significance of cartoons was discussed. Previous research [22,55] introduced solutions for rotating 3D-impossible cartoonish elements, called 2.5D models. However, the cost of creating such models is high, and the process is labour intensive. Creating 3D models is also a labour-intensive task, but there are many methods [28,38,62] for building them automatically. Although there are no existing automatic 2.5D building approaches, this thesis argues that such an approach is necessary and will show that it is possible.

3D reconstruction methods [7,10,28,31,38,52,75] are able to produce good-quality 3D models from images. 3D mesh segmentation methods [1,63] can provide results similar to those achieved by manual means. 3D projection, which is the mapping from 3D to 2D, was also discussed.

Chapter 3

Automatic 2.5D Cartoon Modelling

Automatic 2.5D cartoon modelling (Auto-2CM) aims to build 2.5D cartoon models automatically from real-world objects, so that human artists can be freed from labour-intensive and repetitive work and can focus on more creative tasks.

The 2.5D cartoon model consists of several 2.5D parts. To build it automatically, a method to automatically segment the object into meaningful parts, as well as an approach to create the 2.5D parts automatically, is required. Every 2.5D part has one stroke for each key view; therefore a method to create strokes from different viewing angles is required for building 2.5D parts. Since the strokes are created from different viewing angles, the appearance of the object from these angles must be available. In other words, it must be possible to capture shapes of the object from different viewpoints in 3D space. Moreover, in order to segment the candidate object, its 3D geometric information must be available.

In computer graphics, 3D information is normally structured as 3D models. Automatic methods of building 3D models from real-world objects are known as automatic 3D reconstruction, and include Image-Based 3D Reconstruction and 3D scanning. Among the different types of 3D models, the 3D polygonal model is often used in these methods for its simplicity and efficiency. In Auto-2CM the 3D information carrier is the 3D polygonal mesh. Both methods provide good enough models for the needs of Auto-2CM. 3D reconstruction in the Auto-2CM approach is discussed in Section 3.1. Once the 3D information is collected, the problem of segmenting the candidate object into meaningful parts can be solved by 3D segmentation methods. These methods can segment a 3D polygonal model into smaller parts, with results similar to those created manually. Segmentation is discussed in Section 3.2. There is no existing 2.5D part creation method, so in order to complete Auto-2CM, a new method for this problem is required. Assuming both 3D information and segments are available, the shape of a stroke may be created by capturing the contour of that part from its projection, which is discussed in Section 3.3 in detail. A 2.5D stroke is rendered from a series of control points, i.e. it is a vector curve, not a raster curve. Thus, in order to create the 2.5D stroke, control points should be created after the shape has been extracted. This is discussed in Section 3.4.
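One straightforward way to obtain control points from an extracted contour is uniform arc-length resampling of the contour polyline; a hedged sketch (the actual system could equally use curve fitting to place control points):

```python
import numpy as np

def resample_contour(contour, n_points):
    """Place n_points control points at equal arc-length intervals along a
    closed contour given as an (N, 2) array of boundary pixels."""
    contour = np.asarray(contour, dtype=float)
    closed = np.vstack((contour, contour[:1]))       # close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate(([0.0], np.cumsum(seg)))    # arc length at each vertex
    targets = np.linspace(0.0, cum[-1], n_points, endpoint=False)
    control = []
    for t in targets:
        i = np.searchsorted(cum, t, side='right') - 1
        frac = (t - cum[i]) / seg[i] if seg[i] > 0 else 0.0
        control.append(closed[i] + frac * (closed[i + 1] - closed[i]))
    return np.array(control)

# A unit-square contour resampled into 8 evenly spaced control points.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
pts = resample_contour(square, 8)
print(pts)
```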

The proposed system consists of three main stages: 3D reconstruction, 3D segmentation, and 2.5D shape creation and assembly, as shown in Figure 3.1.

Figure 3.1.: The process flow of Auto-2CM.

3.1. 3D Reconstruction

The goal of this stage is to provide 3D information about the candidate object, which is required for building 2.5D models. Segmentation of 3D objects into parts needs 3D information, and moreover 2.5D parts are created by capturing shapes of the object from different 3D viewpoints. When an artist observes an object from different angles, the 3D structure of that object is intuitively understood. The computer technique for obtaining this 3D information from real-world objects is called 3D reconstruction.

In computer graphics, the structure that contains 3D information is the 3D model. Among the different types of models, the 3D polygonal model, which models objects by using polygons to represent their surfaces [40], is the most widely used for its simplicity and efficiency, so this structure is selected as the 3D information carrier for the approach. There are two main approaches to 3D reconstruction, namely 3D scanning and Image-Based 3D Reconstruction. 3D scanning simply uses special devices, such as laser scanners, but such equipment is often very expensive, so its usage is limited. This research focuses on the image-based approach, though either approach will work.

In order to proceed to the next stage, the 3D information collected in this stage must be dense enough that 3D completion methods are able to fix all the incomplete holes, which is required by 3D segmentation in the next stage. Among all current image-based reconstruction methods, models built by Furukawa et al. [28] have the highest accuracy and completeness [62]. This method is chosen in this research, though it may be replaced by any other reconstruction approach.

The kind of input data required at this stage is important, because it is also the input of the whole system. 3D scanning devices do not require special input data. Image-based methods, on the other hand, require images as input, and optionally the camera parameters. The appearance of the candidate object also affects the quality of the reconstruction, and affects the performance of segmentation in the next stage. After the data has been collected, a self-calibration step is necessary if the calibration parameters are not pre-calculated. The reconstruction algorithm takes the images and the parameters of each camera as input. The output is a 3D polygonal mesh of the scene. After reconstruction, a refinement step is needed to separate the candidate object from the background, because the algorithm reconstructs everything in the scene. This can be achieved using a 3D editing program. A polygonal mesh completion algorithm [53] is also necessary to close holes in the mesh.

In this stage the necessary 3D information is collected and stored in the form of a 3D polygonal mesh model. The next step is separation of the object into meaningful parts.

3.2. 3D Segmentation

3D segmentation is a very important part of the approach, because it decides how to separate the candidate object into parts, and these parts are later converted into the basic units of the cartoon model. A slightly modified version of SDF [63] is used in the approach.

3D information has been collected in the previous stage. In the manual modelling process, the artist would now decide how to divide the object into smaller meaningful parts. For example, an apple would probably be divided into stem and body; a humanoid character most likely into body, head and limbs. A more experienced artist would divide a humanoid character in more

detail, and Figure 3.2 shows a typical segmentation where the human body consists of 24 parts.

Figure 3.2.: A typical segmentation of the human body for cartoon drawing. From Pixiv lecture by Kyachi [42]

Thus, the task for this stage is the separation of the object into meaningful parts automatically. An approach which can segment 3D models automatically and at the same time gives results similar to meaningful human segments is required. Previous research has introduced several approaches for this task [3,4,63].

Data for this stage is collected in the previous reconstruction stage. However, some preparation is necessary. The number of triangles in the mesh should be reduced to a reasonable amount, because later steps of the approach involve several processes with O(n²) time complexity, where n is the number of triangles, and a large triangle count requires correspondingly more computing resources. Triangle reduction can be done automatically by mesh decimation methods, without much effect on the final result. In

the experiments, the appearance of a mesh did not change significantly unless more than 66% of the triangles were decimated; see Figure 3.3 for an example.

(a) Original bear model wireframe (b) Decimated bear model wireframe

(c) Original bear model (d) Decimated bear model

Figure 3.3.: The bear model from Chen’s benchmark [13]. The original model contains 24996 triangles; after decimation to 5%, 1248 triangles are left. The silhouette has not changed much.

The SDF volume computation shoots a cone of rays from each triangle toward the inner side of the mesh, and the resulting values are used for triangle clustering. In the triangle clustering step, an energy minimization model is built: the SDF values calculated in the previous step are used as samples to train the model, and a cluster assignment is then computed for each triangle. This step does not take geometric factors into account; it uses only local SDF values. Adjusting the partitioning by considering geometric distances is therefore necessary, which is the task of the k-way min-cut.
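The ray-shooting part of the SDF computation can be sketched as follows. This is a simplified illustration, not the implementation of [63]: it uses Möller–Trumbore ray/triangle intersection, averages the nearest-hit distances over a small cone of rays, and all helper names (`sdf_value`, `ray_triangle`) are invented for this sketch.

```python
import math

def sub(a, b): return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
def add3(a, b): return (a[0]+b[0], a[1]+b[1], a[2]+b[2])
def mul(a, s): return (a[0]*s, a[1]*s, a[2]*s)
def dot(a, b): return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
def cross(a, b): return (a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0])

def ray_triangle(orig, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore: distance from orig along d to the triangle, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None
    inv = 1.0 / det
    t0 = sub(orig, v0)
    u = dot(t0, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t0, e1)
    v = dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > 1e-6 else None  # ignore self-hits at t ~ 0

def sdf_value(center, inward, mesh, cone_deg=20.0, n_rays=8):
    """Mean nearest-hit distance of a cone of rays shot from 'center'
    toward the inside of the mesh -- a rough local thickness measure."""
    # two tangent vectors orthogonal to the inward normal
    t1 = cross(inward, (1.0, 0.0, 0.0))
    if dot(t1, t1) < 1e-12:
        t1 = cross(inward, (0.0, 1.0, 0.0))
    n1 = math.sqrt(dot(t1, t1)); t1 = mul(t1, 1.0 / n1)
    t2 = cross(inward, t1)
    a = math.radians(cone_deg)
    dirs = [inward]
    for i in range(n_rays):
        phi = 2.0 * math.pi * i / n_rays
        d = add3(mul(inward, math.cos(a)),
                 add3(mul(t1, math.sin(a) * math.cos(phi)),
                      mul(t2, math.sin(a) * math.sin(phi))))
        dirs.append(d)
    dists = []
    for d in dirs:
        hits = [t for tri in mesh
                for t in [ray_triangle(center, d, *tri)] if t is not None]
        if hits:
            dists.append(min(hits))  # nearest opposite surface
    return sum(dists) / len(dists)
```

On a unit cube, a ray cast from the centre of one face straight through the interior hits the opposite face at distance 1, so the cone average lands slightly above 1, which is the local "shape diameter" at that triangle.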

After the segmentation stage, all information needed for creating 2.5D shapes is ready. The next stages form the automatic drawing phase, which includes 2.5D shape creation and model assembly.

3.3. 2.5D Stroke Creation

This stage is the actual drawing phase. In the manual process, the artist decides the shape of each part in different views and draws control points for it. To simulate this, the automatic process consists of two steps: the first decides the shape, the second generates control points based on that shape. For the first step, the contour of the current part is captured as its 2.5D stroke. This is appropriate for modelling 2.5D cartoon models, because any curve other than the contour, such as a nose on a face, should be a separate part in the 2.5D cartoon model structure: each part consists of only one stroke, and in order to interpolate between key views, a part cannot contain more than one stroke. Thus in this step, the contours of the parts are extracted from different angles. For the second step, control points are generated according to the shape extracted in the first step. The 2.5D strokes are vector art defined by control points, not continuous curves; when the artist draws a stroke of a 2.5D cartoon model, what is actually drawn are the control points of that stroke. Thus, in this second step, control points for each shape are generated.

The previous steps provide segmented parts, and this information is all that is required for this stage. However, the segmented parts need refinement before entering the drawing phase, which is discussed in Section 3.3.1. The process of capturing the shape of the candidate object, the shape extraction step, is discussed in Section 3.3.2. The final control points are then generated based on these shapes, as discussed in Section 3.3.3. Some other attributes provided by the structure are discussed in Section 3.3.4.

3.3.1. Parts Refinement

Before proceeding to the shape extraction step, refinement of the segmented parts is necessary. When a part has multiple cuts, it is possible that from some viewing angles two holes overlap and the background becomes visible, as shown in Figure 3.4. This causes the contour extraction step to extract false shapes. Therefore, before the next step, the holes in every part should be filled using automatic mesh completion approaches. MeshLab is used for this purpose and appears adequate for the task. After this step, the data is ready for the next step, which is shape extraction.

(a) (b) (c) (d)

Figure 3.4.: (a) Two holes where the legs are separated from the cow’s body; this case causes the 2D hole-filling method to give an incorrect result. (b) Model based on the incorrect result. (c) The body part of the cow before 3D hole filling. (d) The body part of the cow after 3D hole filling.

3.3.2. Shape extraction

The goal of shape extraction is to obtain the shape of each segmented part in preparation for control point generation later on. Because the segmented parts are 3D, they must first be projected onto a 2D plane before being drawn. There are two ways to do this: the first projects the 3D vertices of the mesh directly to 2D positions; the second projects the surface of the mesh to a 2D area.

(i) Extract directly from vertices of mesh (DV)

When observing an object through a virtual camera, what is seen on screen is the projection of that object from virtual 3D space onto the camera plane. A 3D mesh, after projection, becomes a 2D shape covering an area of the camera plane. Because the surfaces of a 3D polygonal mesh are triangles whose corners are vertices of the mesh, no point inside the mesh can exceed the boundary linked by the mesh vertices. If the vertices of the mesh are projected onto the camera plane, they become a cluster of 2D points, and any point inside the projected area of the mesh cannot exceed the boundary linked by the outermost projected vertices. Thus the contour is formed by the links between each pair of neighbouring outermost vertices.

Although these outermost vertices can in general be found by building a concave hull of all the vertices, this is a very hard task: current automatic concave hull methods are not very reliable and require a large amount of computational resources. A convex hull, on the other hand, is easy and fast to extract. It does not suit concave shapes, but for 2.5D models most segmented parts are convex. For concave shapes, a different method is available that extracts the contour from projected 2D areas.
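The DV path can be sketched as follows. This is a minimal illustration, not the thesis implementation: dropping one coordinate stands in for the full camera projection, and Andrew's monotone chain algorithm finds the outermost projected vertices; the function names are invented for this sketch.

```python
def project_orthographic(points3d, axis=2):
    """Drop one coordinate -- an orthographic projection onto a camera
    plane perpendicular to 'axis' (a simplification; a real camera
    would apply a full view/projection transform)."""
    return [tuple(c for i, c in enumerate(p) if i != axis) for p in points3d]

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def turn(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and turn(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and turn(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

For concave parts this hull would cut across the concavities, which is why the projected-area method exists.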

(ii) Extract from projected area (PA)

Compared to extracting directly from mesh vertices, getting contours from projected areas is much more reliable. However, this method requires more computational resources, and the generated control points are not as accurate as those from direct extraction, because a control point may not lie exactly at a vertex position and may thus slightly change the shape. In this case the system needs to generate control points from a continuous curve, and choosing the location of each control point is a concern. When control points are generated at equal spacing along the contour, some of them may be redundant: on a relatively long straight line, where only two points are actually needed, points generated in the middle are unnecessary. Therefore direct extraction should be the first choice except for concave shapes, which are less common than convex ones anyway.

To find the contour of the projected area, the projected area of the candidate mesh is first extracted from the background, and a binarization method is applied to the camera plane image. The result is a binary image, which is in fact a two-dimensional matrix in which pixels belonging to the object’s projection area (foreground) have value one and background pixels have value zero. Once this data structure is ready, the algorithm introduced by Suzuki et al. [70] is applied to find the boundary between the zeros and ones; this boundary is the contour of the projected area. Putting the projected area into a matrix loses some information; how much depends on the size of the matrix. A bigger matrix preserves more information but increases the computing resources needed for both contour calculation and control point generation.
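As an illustration of the binary-matrix representation, the sketch below marks the border pixels of a foreground region. It is a simplified stand-in for the border-following algorithm of Suzuki et al. [70], which additionally links the border pixels into ordered contour chains.

```python
def border_pixels(binary):
    """Return the set of foreground pixels that touch the background.
    A simplified stand-in for Suzuki-style border following, which
    would also order these pixels into contour chains."""
    h, w = len(binary), len(binary[0])
    border = set()
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                # pixels outside the matrix count as background
                if not (0 <= ny < h and 0 <= nx < w) or not binary[ny][nx]:
                    border.add((y, x))
                    break
    return border
```

A 4×4 solid block, for instance, has 12 border pixels (its 2×2 interior is excluded), matching the intuition that the boundary is the set of ones adjacent to zeros.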

3.3.3. Stroke Control Points Creation

After the shape extraction step, the shapes required for control point generation are available. The goal of this stage is to use this information to create the control points. Depending on the shape extraction method used, DV or PA, there are two ways to create control points at this stage.

(i) Control points from vertices of mesh (for DV)

Assuming the convex hull of the full vertex set has been calculated, so that the subset of outermost vertices is available, control points can simply be placed at the location of each border vertex. A better and more practical way, however, includes a reduction process for these points. Strictly speaking, no directly extracted point is redundant, and deleting any of them lowers the accuracy of the shape; but some points may be so close to each other that removing them barely affects the shape. Removing such points makes the final stroke smoother and increases the efficiency of interpolation during rendering.

(ii) Control points from contour of projected area (for PA)

After contour extraction using the binary image, a closed line consisting of a series of pixels is available. These pixels are the border pixels between the object’s projection area and the background. One pixel is picked at random and its location is used to generate the first control point, called the head point of the stroke. From the head point, the line of border pixels is walked, counting the pixels that have been passed. Each time a certain number of pixels has been passed, the current pixel is picked and a control point is generated at its location. The number used here may differ from part to part, depending on the size of the current part relative to the whole object and on the resolution of the binary matrix.
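Given an ordered, closed contour (as the Suzuki-style border following would produce), this step reduces to sampling every k-th pixel. A minimal sketch, with the step size left as a parameter to be tuned per part:

```python
def sample_control_points(contour, step):
    """Walk an ordered, closed contour (a list of border pixels) from
    the head point and emit a control point every 'step' pixels."""
    return [contour[i] for i in range(0, len(contour), step)]
```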

3.3.4. Other Attributes

A 2.5D cartoon model stroke has attributes other than control point locations, as defined by Rivers et al. [55]. Two of them greatly affect the appearance of the model: the first is thickness and the second is colour. Control points also have additional attributes, the most important being the smoothness type, which can be smooth or sharp, as shown in Figure 3.5. In the current Auto-2CM approach, these attributes are set to default values and remain available for later manual modification.

The control points of each part in each view are now available. To build the final model only one step is left: 2.5D model assembly.

(a) Smooth (b) Sharp

Figure 3.5.: Stroke by control points with different smoothness types.

3.4. 2.5D Model Assembly

Assembly is the last stage of the automatic modelling process. Its goal is to fit all information and products of the previous stages into a 2.5D cartoon model structure. In the manual process, human users decide where each shape should be placed and how to organise the shapes into a model. At this point the system has only the control points of the shapes and the viewing angles; fitting this information into the 2.5D cartoon model structure is the task of this stage.

Section 3.4.1 discusses the selection of key views. The assembly of a single part and of the whole model is discussed in Sections 3.4.2 and 3.4.3, respectively.

3.4.1. Key-View Selection

It is important to know how many key views a 2.5D cartoon model needs. The number of key views should be sufficient to contain all necessary 3D information, but more key views require more computation resources for rendering and storage. For a 3D object, three 2D views are enough to store all its information; for example, scale modelers often use orthographic projections, informally called three-view images, as a guide for building scale models (see Figure 3.6). Thus three views are normally enough for a 2.5D model. In some cases, however, such as when interpolation errors occur, more key views may be necessary. By default, Auto-2CM uses only three views.

Figure 3.6.: An orthographic projection of the Lockheed Martin/Boeing F-22 Raptor fighter aircraft. Image from Wikipedia1.

3.4.2. Single Part Assembly

Assembly of one part is the basis of assembling the whole model: for example, a single part such as a leg of the camel in Figure 3.7 is assembled from the control point locations created in the previous stages. First, for every set of control points, a 2.5D view is picked according to the viewing angle of the virtual camera that captured that contour. Then, in that view, starting from the head point, every control point is pushed into the 2.5D cartoon stroke structure. After doing this for every camera position, the 2.5D strokes, which are drawings of the original 3D part, are complete.

3.4.3. Model Assembly

Only one step is left: assembling the final model. The 2.5D positions of the strokes are determined by the control points in all views; this value is calculated automatically during rendering, so no extra action is needed to put the strokes in the right position. The task of this step is to put all 2.5D strokes into the 2.5D model structure in the right order. There are attributes for defining the relationships between strokes, such as overlapping and union, but these functionalities are not essential and are rarely used, and the complexity of assigning them automatically is relatively high. Thus these attributes are left to the user.

1http://en.wikipedia.org/wiki/Lockheed_Martin_F-22_Raptor

After this stage, the automatic 2.5D modelling process is completed. This stage takes control points as input and creates a fully functional 2.5D model. In experiments three key views were enough for most objects. Possible future improvements include an approach that automatically chooses additional attributes, such as overlapping and union.

(a) (b) (c) (d)

Figure 3.7.: The process of part assembly: (a) Segmentation. (b) The right-front leg part from different views. (c) 2D control points of the part. (d) Final 2.5D model.

3.5. Manual Modifications

One of the most useful features of 2.5D models is the ability to preserve 3D-impossible drawings. This feature does not appear in automatically built 2.5D models, however, because they are built from 3D meshes. This is inevitable: 3D-impossible drawings are creative, unreal content that cannot be found in real-world objects but is created by humans [61]. Yet this is the most fun part of the 2.5D building process. As stated at the beginning of this thesis, one goal of this research is an approach that releases artists from labour-intensive work that machines can do, so that artists may concentrate on creative work. This stage is not a part of the system but an optional step of the approach. The goal of this section is to show what is left to the artists after the labour-intensive work is done, and how easy it can be. This includes stroke modification and colouring.

3.5.1. Stylistic Strokes Modification

One action an artist may take after a model is built is editing some strokes to obtain 3D-impossible cartoon styles. This is the fun part of 2.5D cartoon modelling, which the approach leaves to the artist. With all other parts of the model available as reference, it should be much easier than doing it from scratch; see Figure 3.8 for an example.

3.5.2. Colouring

In the 2.5D cartoon model structure, every part has one colour. A part does not need more than one colour; if it did, it should be separated into two different parts. Assigning a colour to each part is not heavy work. The colour assigned depends on the visual effect the artist wants; for example, a red bird character in a dark environment (such as at night) might be coloured purple instead of red.

The manual modification step is where artists do the fun part of the 2.5D modelling process. It is fun and easy; an example is demonstrated in Figure 3.8.

Figure 3.8.: Actions an artist normally takes after the model is automatically built. Panels: original automatically built model; editing the tail (9 points edited); editing colour (one action per part); test render. Total time: 2 min 41 sec.

3.6. Algorithm

The fully automatic 3D-to-2.5D conversion process is summarised in Algorithm 1.

Algorithm 1 3D to 2.5D Conversion(I)
  Initialise B ← ∅, P ← ∅, S ← 3DModel
  for all s ∈ S do
    for all v ∈ Views do
      c ← CreateContour(s, v)
      p ← SamplePoints(c)
      P ← P ∪ p
    end for
    b ← CreateBillboard(P)
    B ← B ∪ b
  end for
  return B
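Algorithm 1 maps directly onto code. The sketch below is a hypothetical rendition: the helper callables (`create_contour`, `sample_points`, `create_billboard`) are injected stubs standing in for the system-specific stages described earlier, not real APIs.

```python
def convert_3d_to_25d(parts, views, create_contour, sample_points,
                      create_billboard):
    """Algorithm 1: for every segmented part, capture its contour in
    every key view, sample control points, and build one 2.5D
    billboard (stroke set) per part."""
    billboards = []                    # B in Algorithm 1
    for part in parts:                 # s in S
        part_points = []               # P, per part
        for view in views:             # v in Views
            contour = create_contour(part, view)
            part_points.append((view, sample_points(contour)))
        billboards.append(create_billboard(part_points))
    return billboards
```

One design note: keeping the stage functions as parameters mirrors the thesis' point that Auto-2CM works with interchangeable components at each stage.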

3.7. Experiments and Results

A comparison between the automatically created 2.5D models and manually built ones is shown in Figure 3.9. The 3D models used as inputs of the system are also shown.

Another comparison is shown in Figure 3.10, which also demonstrates the auto-built 2.5D model before and after some simple manual modification.

Two scenes are shown in Figure 3.11, one using pure 2D elements and the other mixing 2D and 2.5D elements. To make them easier to compare, the differing parts (characters) are highlighted.

A model of a rabbit named Ruby was also built, to demonstrate the 3D-impossible problem, also known as the rabbit ear problem; see Figure 2.8.

(a) Simple head, 3D model (b) Simple head, automatic model (c) Simple head, manual model by Rivers et al.

(d) Alien, 3D model (e) Alien, automatic model (f) Alien, manual model by Rivers et al.

Figure 3.9.: Example 2.5D cartoon models built using our system, the 3D models used, and the fully manually created 2.5D models by Rivers et al. [55].

(a) (b) (c)

(d) (e) (f)

(g) (h) (i)

Figure 3.10.: Top row: a manually created bear model [55]. Middle row: a bear model created by Auto-2CM; original 3D model from [13]. Bottom row: with only a few manual changes to the automatically built bear, a koala model is created.

(a) 2D Scene

(b) 2.5D Scene

Figure 3.11.: 2D and 2.5D scenes of Angry Birds. The 2.5D cartoon birds were built using Auto-2CM.

(a) 2D elements highlighted

(b) 2.5D elements highlighted

Figure 3.12.: 2D and 2.5D scenes of Angry Birds, comparing pure 2D and 2.5D elements; the 2.5D elements are highlighted.

3.8. Limitations

Auto-2CM uses Rivers’ structure, which is good for representing abstract cartoons but not suitable for models with long thin shapes, such as tables and chairs, because of the limitations of Rivers’ model discussed in Section 2.3.2. Complex objects that are not composed of round shapes normally need more key views to look right while rotating, and having many key views makes a 2.5D cartoon model inefficient to render. So converting complex 3D models to 2.5D may not be useful in practice.

The system cannot automatically convey a shape using a partially surrounding stroke (an open stroke). In some cases an artist may choose among different ways to convey a 3D shape, as in Figure 3.13. But in cases where a shape obviously should be left open, such as fingers on a palm, the system cannot automatically keep that stroke open: the system by itself cannot tell the difference between a cigar and the fingers holding it, because it is a general system that treats all objects the same. This part is left to the users for now. Future improvements could help the system learn different kinds of objects.

Figure 3.13.: In some cases, whether the nose stroke is open or closed depends on the style that the artist wants to present [32]

In future, other non-photorealistic rendering methods (such as [15] and [21]) might be introduced into the system. These 3D-model-based line drawing techniques may help decide where to pick control points. Because some strokes rendered by other methods [21] may not fit the existing strokes, it is possible to define new stroke creation rules based on these extra strokes.

3.9. Remarks

In this research, a novel approach for automatic 2.5D modelling is presented. The system is able to produce 2.5D models from real-world objects, through 3D meshes, with minimal human monitoring. It is helpful to artists who want to preserve the stylistic 2D elements provided by the 2.5D cartoon model while reducing the manual labour of creation. There is no previous work on automatic 2.5D cartoon modelling; this approach is the first solution to the problem. Though able to create fully functional models, the system should be considered a sketching system that saves artists from basic labour-intensive work and allows them to focus on the more creative parts, such as colourization and stylization.

One possible future work is to improve Rivers’ structure. In particular the interpolation, which is currently only simple interpolation, could be replaced by other interpolation methods, with correspondence information added to the strokes automatically when building the model, as assigning correspondences between different views manually might be cumbersome for human users.

It is also possible to separate parts directly in images in the future. This may require the ability to find correspondences between segmented regions in multiple images, or to perform the segmentation simultaneously in multiple images. Investigating image-based segmentation methods for automatic 2.5D modelling may improve the accuracy and efficiency of the proposed method.

Chapter 4

Perceptual Evaluation and Algorithm Selection

When building 2.5D models using the Automatic 2.5D Cartoon Modelling approach, the performance of different algorithm combinations may vary across objects and applications. The aim of perceptual evaluation is to investigate algorithm selection, i.e. selecting specific algorithm components for specific kinds of objects in order to improve the performance of Auto-2CM. The goal of this evaluation is to find best practice for Auto-2CM, and the motivations are:

(i) evaluation of different component combinations of Auto-2CM

(ii) assessment of advantages and limitations of different combinations

(iii) deeper understanding of the examined field

Currently, no previous experimental study of this area exists.

Perceptual evaluation is widely used to evaluate approaches that are not suitable for quantitative evaluation, such as those related to visual art or audio [9,29,33,54]. In this chapter, a perceptual evaluation of the Auto-2CM approach is presented through a series of experiments. The hypotheses of this evaluation are: (i) that certain component combinations perform differently on different kinds of objects; (ii) that with this information the approach can produce better results by selecting different combinations for specific kinds of objects. This is facilitated by the fact that Auto-2CM does not require a specific segmentation method; the system still works when the algorithm at each stage is changed.


Different configurations of the approach on different types of objects are tested, see Figure 4.1, and their advantages and disadvantages discussed. Recommendations on suitable configurations for specific kinds of objects are also provided.

This chapter is organized as follows. The data used in the experiments, and the reasons for selecting it, are discussed in Section 4.1. The design of the experiments is discussed in Section 4.2. Results and analysis are in Section 4.4. Recommendations and suggestions for best practice with Auto-2CM are in Section 4.5. Finally, the conclusion is in Section 4.6.

4.1. Data Sets

Since Auto-2CM can start from existing 3D models, the data sets used in this evaluation come from several different sources and fall into two main categories. The first category contains models from 3D mesh segmentation benchmarks currently in use, called “scientific models”; these provide information on how 3D segmentation affects 2.5D modelling, and are discussed in more detail in Section 4.1.1. The second category includes handmade models suitable for 3D video games, called “industry models”. Some of them were created using Rivers’ models as reference; others are from games such as “Angry Birds” and “Ruby Run”. These models are the most suitable as 2.5D models and the best models for Auto-2CM evaluation; they are discussed in more detail in Section 4.1.2.

4.1.1. Scientific Models

The reasons to include these models are, firstly, to test the performance of different segmentation approaches: the relationship between the results of general 3D segmentation and 2.5D modelling can be examined, to see how general 3D segmentation influences the performance of 2.5D modelling. Secondly, the models picked for this experiment are those that are better candidates for 2.5D modelling than others in the benchmarks; models that greatly violate Rivers’ limitations no matter which segmentation method is used, such as the Temple model of the Middlebury dataset [58], are avoided.

Figure 4.1.: Process and Components of this Experiment

This category contains models from public shape benchmarks (see Figure 4.2). These benchmarks are designed for evaluating different segmentation methods, and include: (i) SHREC [73], (ii) the McGill 3D Shape Benchmark [76], (iii) the Princeton Shape Benchmark [64], (iv) the 3D Shape Segmentation Benchmark [13].

4.1.2. Industry Models

These models came from three different sources. The first two, SimpleHead (Fig 4.3(a)) and Alien (Fig 4.3(b)), are built based on Rivers’ work and are included to make comparison with the original manually created 2.5D cartoon models easy; those 2.5D models were built to demonstrate the usability of Rivers’ work for industrial animation. The next two characters, Bird (Fig 4.3(c)) and Pig (Fig 4.3(d)), are from the popular video game Angry Birds [46], and the last model, Ruby (Fig 4.3(e)), is from another game, Ruby Run [67]. These models are good for testing the performance of the different processes on real industry content. Current 3D video game models are normally “low-poly” models, meaning there is a limit on the number of polygons in the model; normal maps and pre-rendered baking techniques are used, instead of increasing the number of triangles of a mesh, to obtain better visual effects. For current gaming models, 300-1500 triangles on mobile platforms and 500-6000 triangles on desktop platforms are reasonable; next-gen AAA games running on PS3 or Xbox 360 usually have characters with 5000-7000 triangles. The game models used here target mobile platforms and thus have around 1000 triangles each, but models from 3D segmentation benchmarks usually have more than 10000 triangles, which is not practical for the industry. Both these factors affect the segmentation results and 2.5D modelling, as discussed in Section 4.4.

Models in this category are designed to evaluate the usability of the Auto-2CM approach for industry. They were created so that they do not violate Rivers’ limitations, thus guaranteeing that any error in the final 2.5D cartoon models is not caused by these 3D models. See Figure 4.3.

(a) sunglasses (b) table (c) bear

(d) dolphin (e) gull (f) teapot

(g) cow (h) camel

Figure 4.2.: Models from public shape benchmarks [13]

(a) Simple Head (b) Alien (c) Bird

(d) Pig (e) Ruby

Figure 4.3.: Models for 2.5D Cartoon Modelling

4.2. Design

The experiments were designed to test the performance of different algorithm combinations of Auto-2CM on different kinds of objects. To achieve this, three segmentation algorithms (SDF, FP and PBS) and two categories of models are used.

The three segmentation algorithms deployed are:

(i) Shape Diameter Function (SDF)

(ii) Fitting Primitive (FP)

(iii) PortShapeSeg (PBS)

The two categories of models are:

(i) Scientific models, from 3D segmentation benchmarks (Sci).

(ii) Industrial models, from animation and games, etc (Ind).

Results and analysis of these experiments are in Section 4.4.

The 2.5D model building system Auto-2CM was used in the experiments. This system takes segmented 3D meshes as input. As it just captures shapes of segmented parts from views, no elements that violate Rivers’ limitations will be created at this step.

Following the perceptual evaluation methods of previous works, especially the cartoon-related research by Garcia et al. [29], the evaluation of this research examines the following two aspects:

(i) Errors: shapes of the final 2.5D model that lead to incorrect interpolation results.

(ii) Appearance: perceptual judgment of the appearance.

Some models from the public shape benchmarks have parts that almost always fail Rivers’ conditions. This is unavoidable and can only be fixed by improving the 2.5D cartoon model structure itself, a task beyond the scope of this thesis. Such errors are ignored in this experiment since they are caused by the rendering system, which is outside the modelling system.

4.3. Software Interface and Development

Figure 4.4.: User interface of the Auto-2CM prototype

Auto-2CM is implemented using the Unity3D1 game engine with the OpenCvSharp2 library. The purpose is to create a prototype of Auto-2CM, which is also used in this evaluation. This section discusses the details of the implementation.

The software interface consists mainly of two parts: the control panel and the 3D view point matrix. On the control panel there are scrollers and buttons for actions such as segmentation and file outputs. The view point matrix surrounds the candidate object, and each node indicates a 2.5D view point. See Figure 4.4.

Currently the software supports Wavefront OBJ format. Either original meshes or segmented meshes can be loaded as candidate objects. If an original mesh is loaded then a segmentation step is required, otherwise the user may directly proceed to the 2.5D

1 http://unity3d.com
2 http://code.google.com/p/opencvsharp/

shape building step. SDF segmentation has been integrated into the software; other methods, such as FP and PBS, may be run on external consoles.

SDF requires the OpenCV library, so to integrate SDF into Unity, a .Net wrapper of OpenCV known as OpenCvSharp is used. However, some functions of OpenCvSharp are not compatible with Unity, so the library needed modification and recompilation.

To calculate the SDF value of a triangle, the distances of a cone of rays from the centre of that triangle to the inner sides of the mesh are required. This is computed by a process based on the Nvidia PhysX engine3. First, a mesh collider is built for the mesh and reversed. Then, a cone of rays is shot toward the opposite side of the normal. Lastly, the mean distance of the hits is calculated.

The next step is initial clustering. All SDF values are used to train an Energy Minimization model, and the cluster ID, mean and probability of each triangle are calculated. The last step is k-way min-cut adjustment. The cluster IDs and probabilities are used to build a graph, and a k-way min-cut algorithm cuts the graph to obtain the final segmentation result.
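The cone-of-rays computation described above can be sketched in plain Python. This is a toy stand-in for the PhysX-based raycasts: the unit-cube mesh, ray count and cone angle are illustrative choices, and the inward direction is supplied explicitly rather than derived from mesh winding.

```python
import math
import random

def sub(a, b): return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
def add(a, b): return (a[0]+b[0], a[1]+b[1], a[2]+b[2])
def mul(a, s): return (a[0]*s, a[1]*s, a[2]*s)
def dot(a, b): return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
def cross(a, b):
    return (a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0])

def ray_triangle(orig, d, tri, eps=1e-9):
    """Moller-Trumbore ray/triangle intersection; returns distance t or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None
    inv = 1.0 / det
    tv = sub(orig, v0)
    u = dot(tv, p) * inv
    if u < 0 or u > 1:
        return None
    q = cross(tv, e1)
    v = dot(d, q) * inv
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def sdf_value(center, inward, tris, n_rays=64, cone_deg=30.0, rng=None):
    """Mean distance of a cone of rays shot from a triangle centre toward
    the inside of the mesh (opposite the surface normal)."""
    rng = rng or random.Random(0)
    # Build an orthonormal basis (u, w) perpendicular to the inward direction.
    a = (1.0, 0.0, 0.0) if abs(inward[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = cross(inward, a)
    ul = math.sqrt(dot(u, u))
    u = mul(u, 1.0 / ul)
    w = cross(inward, u)
    orig = add(center, mul(inward, 1e-6))  # small offset to avoid self-hits
    dists = []
    for _ in range(n_rays):
        # Sample a direction within the cone around the inward axis.
        theta = math.radians(cone_deg) * math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        d = add(mul(inward, math.cos(theta)),
                add(mul(u, math.sin(theta) * math.cos(phi)),
                    mul(w, math.sin(theta) * math.sin(phi))))
        hits = [t for t in (ray_triangle(orig, d, tri) for tri in tris)
                if t is not None and t > 1e-4]
        if hits:
            dists.append(min(hits))  # nearest surface on the opposite side
    return sum(dists) / len(dists)

def cube_tris():
    """Unit cube as 12 triangles (winding is ignored in this sketch)."""
    V = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
    quads = [(0, 1, 3, 2), (4, 5, 7, 6), (0, 1, 5, 4),
             (2, 3, 7, 6), (0, 2, 6, 4), (1, 3, 7, 5)]
    tris = []
    for a, b, c, d in quads:
        tris += [(V[a], V[b], V[c]), (V[a], V[c], V[d])]
    return tris
```

For a triangle on the bottom face of the unit cube, the rays cross the interior and hit the opposite or side faces, so the mean distance comes out close to the cube's thickness of 1.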

A virtual camera is placed at each view point node that the user has selected. By projecting the mesh part onto the camera plane a 2D shape is calculated. This shape is then used to create the stroke for that view point. The last step assembles all the strokes and outputs the 2.5D cartoon model structure for later manual modification if necessary.
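The projection-and-outline step can be sketched as follows, assuming an axis-aligned orthographic camera, and using a convex hull of the projected points as a simplified stand-in for full contour extraction (the prototype relies on OpenCvSharp's convex hull; this plain-Python version uses Andrew's monotone chain):

```python
def project(points3d, view):
    """Orthographic projection: drop the coordinate along the view axis.
    `view` is 0, 1 or 2 (camera looking along x, y or z)."""
    keep = [i for i in range(3) if i != view]
    return [(p[keep[0]], p[keep[1]]) for p in points3d]

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cr(o, a, b):  # 2D cross product of (a-o) and (b-o)
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cr(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cr(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

For example, projecting a cube's vertices along the z axis and hulling the result yields the four corners of the unit square, which would become the control points of that view's stroke.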

4.4. Results and Analysis

The final results of the experiments are the 2.5D Cartoon models built by different algorithm combinations. Before evaluating the quality of 2.5D Modelling, it is necessary to evaluate the intermediate product, namely the 3D segmented meshes, to determine how segmentation influences the whole process. The first half of this section is analysis of the 3D segmentation results of different segmentation methods. The second half is analysis of the 2.5D results.

4.4.1. Segmentation Results and Analysis

Three segmentation approaches, PBS, SDF and FP were tested, as listed in Section 4.2.

3 http://developer.nvidia.com/physx

Sunglasses: The three methods provide similar results. SDF unnecessarily separated the tips of the frames. See Figure 4.5 (a)(b)(c).

Table: PBS and FP provide the best results, as in Figure 4.5(d)(f). SDF focused on details, however this will not cause errors in the final modelling. See Figure 4.5(e).

Bear: PBS gives a fair result, but it ignores the details on the head (the ears), which will lead to a faulty 2.5D shape, shown in Figure 4.9(e). PBS also produced a meaningless part at the bottom of the bear. SDF is the best. FP divided the body into two parts with an uneven border, which will lead to meaningless 2.5D shapes.

Dolphin: SDF was better than PBS in this case, as it successfully separated the fins. FP failed. See Figure 4.5(j)(k)(l).

Gull: Both SDF and PBS are acceptable. FP successfully segmented the wings, but failed at the body part. See Figure 4.6(a)(b)(c).

Teapot: The result of SDF is the only one that can produce a good 2.5D model, as in Figure 4.10(d). The PBS result in Figure 4.6(d) will lead to unrecognisable shapes, as shown in Figure 4.10(c).

Cow: SDF is the best. PBS is acceptable except for an unsegmented leg. See Figure 4.6(g)(h).

Camel: The problem of PBS in this case is the same as for the Cow model, namely failure to segment a main part of the model that will lead to errors in a later step. In this case, the head of Camel will be missing in final 2.5D models, as shown in Figure 4.10(g). SDF provides acceptable results, as in Figure 4.6(k).

To conclude on the performances of the three segmentation algorithms on public shape benchmark models, shown in Figures 4.5 and 4.6: FP and PBS are good at simpler and more obvious cases, such as the first two models, but FP can hardly provide any useful results beyond that. PBS generally produced acceptable segments except for the more challenging models, such as the teapot, cow and camel.

The results for models built specifically for testing Auto-2CM are shown in Figures 4.7 and 4.8. These models share common features: they have neither long stick shapes nor complex concave parts. Neither PBS nor FP can provide acceptable results for these models, while SDF achieves almost perfect results.

      Glasses  Table  Bear  Dolphin  Gull  Teapot  Cow  Camel
PBS   ✓        ✓      ✓     ×        ✓     ×       ×    ×
SDF   ✓        ×      ✓     ×        ✓     ✓       ×    ✓
FP    ✓        ✓      ×     ×        ×     ×       ×    ×

      Head  Alien  Bird  Pig  Ruby
PBS   ×     ×      ×     ×    ×
SDF   ✓     ✓      ✓     ✓    ✓
FP    ×     ×      ×     ×    ×

Table 4.1.: Acceptable segmentation results (✓) and those that may lead to 2.5D errors (×)

4.4.2. 2.5D Models and Analysis

The final 2.5D models are the most intuitive results for evaluation. FP was not tested for 2.5D model building, because its segmentation results are much worse than those of the other two methods and are unlikely to lead to meaningful 2.5D models. Only the other two segmentation algorithms, PBS and SDF, were tested in this step.

As in the previous step, the results are organized in two parts: results and analysis for benchmark models, and for industry models.

Segmentation benchmark models

2.5D models built from shape segmentation benchmark datasets are shown in Figures 4.9 and 4.10.

For the first two models, Sunglasses and Table, PBS and SDF give almost the same results. Both look good, but wrong shapes appear at certain angles because of the long thin stick parts of the original 3D models; as explained earlier, this is caused not by 3D segmentation but by the simple interpolation rendering method of Rivers' approach.

Bear, Dolphin and Gull show differences caused by the 3D segmentation methods. The Bear built from PBS did not have its ears separated, causing a bad shape at in-between angles; however, the extra part at the bottom of the Bear did not lead to serious faults. The Dolphin from SDF has its fins separated as shapes, and thus can be recognised when rotating. The Dolphin from PBS, on the other hand, leaves the fins on the body, which causes the fins to disappear at certain view angles and leads to wrong interpolation of the body part.

It is still recognizable at most angles, though. The two segmentation methods show little difference when dealing with the Gull; both give good results.

The last group contains three models that are more difficult for automatic segmentation: Teapot, Cow and Camel. All 2.5D models built from PBS segmentations have at least one serious error: the handle of the Teapot, the right-rear leg of the Cow and the head of the Camel are not recognizable. These errors arise because the segmentation results are not suitable for 2.5D modelling, and also differ from how manual segmentation would cut the mesh. For example, manual segmentation would not cut the teapot handle from the middle of the body, but from the end of the handle, as SDF does. It would cut the leg as a separate part from the body of the Cow, similar to SDF, and cutting the Camel's head from the neck, as SDF does, is more reasonable than from the shoulder, as PBS does.

Thus, to conclude, a segmentation method that performs better on a general 3D segmentation benchmark, when compared to manual segmentation, will also perform better in the 2.5D Cartoon Modelling process. Because manually created 2.5D cartoon shapes rely on human artists to separate the object, it is reasonable that the automatic segmentation method most similar to human segmentation gives the best result.

2.5D testing models

The results for the 2.5D testing models are shown in Figures 4.11 and 4.12. Referring to the segmentation results shown in Figures 4.7 and 4.8, it is clear that PBS did not perform well on industry models, whereas SDF is almost perfect for them. The good performance of SDF leaves the artist less manual work in modifying the model and makes it practically useful. The reason that PBS and SDF perform so differently on these models might be the difference between industry models and scientific benchmark models.

One obvious difference between these industry models and the more research-oriented benchmark models is that the industry models need to consider real-time rendering with limited computing power. Thus industry models are often 'low-poly' models, having fewer triangles than research benchmark models. Low-poly industry models will affect the performance of different segmentation methods. For example, research benchmark meshes may have hundreds of triangles for a wrist, while the wrist of a game character may have only tens of triangles, which is sufficient for animation deformation but leaves less shape information for the segmentation algorithms. However, research benchmark models often have redundant information caused by automatic 3D reconstruction, while industry models are often handmade and smoother. Both factors affect the segmentation results, further affecting the 2.5D modelling performance.

      Glasses  Table  Bear  Dolphin  Gull  Teapot  Cow  Camel
PBS   ✓        ✓      ✓     ×        ✓     ×       ×    ×
SDF   ✓        ×      ✓     ×        ✓     ✓       ×    ✓

      Head  Alien  Bird  Pig  Ruby
PBS   ×     ×      ×     ×    ×
SDF   ✓     ✓      ✓     ✓    ✓

Table 4.2.: Acceptable (✓) and erroneous (×) 2.5D results

(a) PBS (b) SDF (c) FP

(d) PBS (e) SDF (f) FP

(g) PBS (h) SDF (i) FP

(j) PBS (k) SDF (l) FP

Figure 4.5.: Segmentation of models from shape segmentation benchmark.

(a) PBS (b) SDF (c) FP

(d) PBS (e) SDF (f) FP

(g) PBS (h) SDF (i) FP

(j) PBS (k) SDF (l) FP

Figure 4.6.: Segmentation of models from shape segmentation benchmark.

(a) PBS (b) SDF (c) FP

(d) PBS (e) SDF (f) FP

Figure 4.7.: Simple Head and Alien

(a) PBS (b) SDF (c) FP

(d) PBS (e) SDF (f) FP

(g) PBS (h) SDF (i) FP

Figure 4.8.: Bird, Pig and Ruby

(a) PBS (b) SDF (c) PBS (d) SDF (e) PBS (f) SDF (g) PBS (h) SDF

Figure 4.9.: 2.5D Models from shape segmentation benchmark 3D models.

(a) PBS (b) SDF (c) PBS (d) SDF (e) PBS (f) SDF (g) PBS (h) SDF

Figure 4.10.: 2.5D Models from shape segmentation benchmark 3D models.

(a) PBS (b) SDF

(c) PBS (d) SDF

Figure 4.11.: 2.5D models built from 2.5D modelling test models. Based on Rivers' models as reference.

(a) PBS (b) SDF

(c) PBS (d) SDF

(e) PBS (f) SDF

Figure 4.12.: 2.5D models built from 2.5D modelling test models. Game characters.

4.5. Recommendations

This section discusses the best practice for 2.5D cartoon modelling, which is the purpose of this evaluation. Based on the experiments, different kinds of models should be segmented using different methods to get the best results.

4.5.1. Simple Models

For most simple shapes, such as the sunglasses and the table used in the experiments, where the segmentation boundaries are clear shape edges, Fitting Primitives (FP) gives the best results, followed by PBS, which performs similarly to FP. SDF tends to segment simple models into more parts, some of which are unnecessary and meaningless. However, models with long thin shapes and sharp edges are not well suited to being presented as 2.5D cartoon models.

4.5.2. Reconstructed Models

The reconstructed models used in this experiment are not image-based reconstructions but scanned models. Both methods produce similar meshes, with high polygon counts (usually more than 10k triangles per model). Moreover, automatically reconstructed meshes have irregular vertex positions, but the distances between vertices are often even. For example, a rectangle may consist not of just two triangles, but of many triangles with similar areas.

When building from these reconstructed 3D models, SDF will provide the best quality results. In some cases PBS may have similar results, but overall SDF is the best choice.

4.5.3. Industry Models

3D models made for games are low-poly models (around 1k triangles if targeting mobile devices, 5-7k triangles if targeting next-gen game consoles). They are often handmade and normally no triangle is wasted, i.e. a rectangular plane consists of only two triangles. Another difference between handmade and reconstructed models is that the latter always use one mesh per object, while handmade models may have multiple meshes per object.

Based on the experiments, when building from low-poly game models, SDF is the best choice for the segmentation step. Segments of SDF are almost perfect, while in contrast, results from the other two methods are not acceptable.

4.5.4. Summary

When building simple models with sharp edges, even though they are not well suited to 2.5D cartoon models, FP should be used in the segmentation step; when building reconstructed models, both PBS and SDF could be considered, and the user can run both and pick the better one for the specific model she is building; when creating from low-poly game models, SDF should be selected, as it has the best performance on such models.
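The recommendations above amount to a small lookup from model category to segmentation algorithm. A sketch of such a dispatch follows; the function name, category labels and return convention are illustrative, not part of the thesis.

```python
def recommend_segmentation(model_kind):
    """Suggested segmentation algorithm(s) per model category (Section 4.5)."""
    table = {
        "simple":        ["FP"],          # sharp-edged simple shapes
        "reconstructed": ["PBS", "SDF"],  # run both, keep the better result
        "industry":      ["SDF"],         # low-poly game models
    }
    return table[model_kind]
```

A modelling pipeline could call this before the segmentation step and, for the "reconstructed" case, run every returned algorithm and let the user pick the better output.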

4.6. Remarks

This research has tried to show that, in a task that requires the system to build a cartoon model, choosing specific algorithms for the type of object can improve the performance.

Based on the experiments, FP is good at segmenting simpler models that have long stick shapes, but does not give acceptable results on round shapes. Such models are not suited to 2.5D in any case. SDF is slightly worse than FP and PBS on simpler shapes, but good for almost all shapes that follow Rivers' conditions. Therefore: (i) for the simplest models with sharp edges, FP is the best choice; (ii) for other reconstructed (scientific) models, both PBS and SDF may be used; (iii) SDF is currently the best segmentation method for industry models, and it leads to the 2.5D models with the fewest errors and best appearance.

A prototype has been implemented to demonstrate the approach, using the Unity engine and the OpenCvSharp library. It provides a 3D user interface and is easy to use. The user is able to create 2.5D cartoon models with only a few drags and clicks.

Chapter 5

Conclusion

This thesis focused on the problems of automatic 2.5D cartoon modelling (Auto-2CM), 3D-to-2.5D conversion, and the evaluation of Auto-2CM.

The contributions include an approach that automates the building of 2.5D cartoon models, the first solution to this problem, and the first automatic approach to convert 3D meshes to 2.5D cartoon shapes, which could also benefit future automatic approaches for other 2.5D methods. An evaluation of the 2.5D modelling process was also performed, aiming to find the best practice for artists using the system.

In this chapter, an overview and the contributions of this thesis are presented. Limitations and future work are also discussed.

5.1. Thesis Overview

Cartoons are a very important form of media, widely used today in games and animation. Some 2D stylistic elements of cartoons cannot be presented in 3D space; in this research this is called the rabbit ear problem. In 2010, Rivers et al. presented a novel approach to solve this problem, known as 2.5D Cartoon Models [55]. However, manual 2.5D cartoon modelling is labour-intensive work. An approach that can build such models automatically is required, but unlike 3D modelling, 2.5D has no existing automatic modelling approaches. This research aims to create a new approach to overcome this problem.


The Automatic 2.5D Cartoon Modelling approach (Auto-2CM) has been developed, including a 3D-to-2.5D conversion method. To evaluate this approach, it was tested with experiments designed to find the best practice. The main contributions of the developed techniques are discussed in the next section.

5.2. Contributions

Contributions of this research include the automatic 2.5D cartoon modelling approach, the 2.5D shape creation method, and an evaluation of this modelling approach.

The Auto-2CM approach presented in this thesis is the first solution to the problem of automating the 2.5D cartoon modelling process. It aims to release human artists from repetitive work, so that they can focus on more creative activities. The approach can also significantly speed up the 2.5D cartoon creation process, as its time and labour requirements are much lower than those of the manual process. Auto-2CM is a practical method that builds 2.5D cartoon models from images or 3D models, and it connects the automation problem with computer vision techniques such as 3D reconstruction and computer graphics problems such as 3D segmentation. Auto-2CM handles the problem in a divide-and-conquer way. The method uses images of real-world objects as inputs, and it can also use existing 3D models as inputs. A prototype software has been implemented for demonstration purposes.

2.5D shape creation is the key to the Auto-2CM approach. It is the first and currently the only method that converts 3D meshes to 2.5D shapes. Use of this method is not limited to Auto-2CM, and may be applied to all vector art 2.5D methods, for example the one presented by Di Fiore et al. [22]. It formulates the conversion problem as 3D-2D conversion and vector control point creation, which can be solved by 3D projection and contour extraction. As it is the first functional solution to the problem, there are still many possibilities for future improvements.

In Chapter 4 an evaluation of Auto-2CM is presented, aimed at finding the best practice for Auto-2CM. When modelling different kinds of objects, different algorithms may be used to achieve the best performance. Previously, the best combinations for modelling different kinds of objects were unknown, and there was no experimental study of this question. The evaluation consists of several experiments, and follows the methods of previous research into perceptual evaluation for the visual arts. Recommendations and suggestions are given at the end of the evaluation.

In summary, the contributions of this thesis include:

(i) an approach to automate the building process for 2.5D cartoon models, called Auto-2CM. For the first time, artists are freed from the labour-intensive work of building 2.5D cartoon models. It has initiated and explored a new research direction in 2.5D model creation.

(ii) the first automatic approach to convert 3D meshes to 2.5D cartoon shapes, which is an important part of an automatic 2.5D cartoon modelling system. It links the huge number of existing 3D models and real-world objects with 2.5D model creation.

(iii) an evaluation of the 2.5D modelling process with different algorithms and different datasets. Aimed at finding the best practice for Auto-2CM and significantly improving its performance, this is practically useful.

5.3. Limitations and Future Work

Objects suitable for presentation as 2.5D Cartoon Models are limited. Rivers' structure is good for presenting abstract cartoons, but not suitable for objects that have long thin shapes and sharp changes, such as tables and chairs, because of the limitations of Rivers' model discussed in Section 2.3.2. The interpolation method used in Rivers' model should be improved to overcome this limitation. Although this is beyond the scope of this thesis, improvements to the interpolation method might significantly increase the performance of 2.5D models and may lead to practical industrial usage.

Partially closed shapes cannot be built by Auto-2CM automatically, because it currently does not distinguish between different types of objects by their semantics, but only by their geometry. The 3D segmentation methods used in Auto-2CM are general methods that do not consider this issue. Currently, open shapes are left to the manual modification step, where changing a closed shape to an open one is a very simple task. Future improvements could help the system learn different kinds of objects, but this improvement is less important than other possible future work.

In future, some 3D non-photorealistic rendering methods [15, 21] may be introduced into the system. These 3D-model-based line drawing techniques may help to decide where to pick control points. Because some strokes rendered by these methods do not fit the existing strokes, it is possible to define new stroke creation rules based on these extra strokes. Currently the segmentation algorithms used consider only geometric information, not texture and colour. Considering this information during segmentation is worth pursuing.

5.4. Concluding Remarks

The arguments between artists about 2D and 3D cartoons have lasted for decades: 2D gives artists the largest space for their imagination, while 3D significantly eases production by providing the ability to rotate, allowing models to be reused across frames, whereas 2D needs to be redrawn. 2.5D cartoons, which mix 2D imaginative elements with 3D rotation, seem to be the closest solution; however, this relatively new theory has been introduced only in the past few years, with only a few approaches published. "I thought 2D and 3D could coexist happily," said Hayao Miyazaki1 of Studio Ghibli, and 2.5D techniques might prove him right.

Both 2D and 3D cartoons have many automatic creation approaches. This thesis has developed methods to fill the gap of automation in 2.5D cartoon creation. Hope- fully, future research in this field will finally lead to 2.5D cartoons in practical industry. Providing better tools for artists that help them more easily to express their imagina- tion, and inventing new information carriers that make such expression possible, should always be considered in future work in this area.

1 Wikipedia, Hayao Miyazaki

Appendix A

Publications Arising from Thesis

F. An, X. Cai and A. Sowmya. Automatic 2.5D Cartoon Modelling. Image and Vision Computing New Zealand, pp. 149-154, 2011.

F. An, X. Cai and A. Sowmya. Perceptual Evaluation of Automatic 2.5D Cartoon Modelling. Knowledge Management and Acquisition for Intelligent Systems, Lecture Notes in Computer Science, Volume 7457, pp. 28-42, 2012.

F. An, X. Cai and A. Sowmya. Automatic 2.5D Cartoon Modelling. (Journal paper to be submitted)

Appendix B

Other Tools, Engines and Frameworks

Some tools, engines and frameworks have been used in this thesis. The Rivers and Angry Birds 3D datasets used in this research were built manually using Blender. The prototype software was developed with Unity3D, and some of the algorithms used, such as the Gaussian Mixture Model and the convex hull of discrete points, are based on OpenCvSharp. These are now briefly discussed for future reference.

B.1. Blender

Blender1 is an open-source, cross-platform suite of tools for 3D creation. It also has a built-in game engine. It is a mature product with a history stretching back to 1989. The famous model "Suzanne" (the monkey head), now widely used as a quick and easy way to test materials, textures and lighting, was created in Blender.

B.2. Unity

Unity3D2 is a game engine and game development environment. It supports deployment to multiple platforms: Microsoft Windows or Mac OS X executables; the web (via the Unity Web Player plugin or Flash); Mac OS X Dashboard widgets; Nintendo Wii; iPhone/iPad; Google Android; Microsoft Xbox 360; and Sony PlayStation 3.

1www.blender.org 2www.unity3d.com


Easy multiplatform deployment is one of the reasons Unity was chosen for developing the 2.5D cartoon modelling system. It also provides functionalities that allow a 3D application to be developed quickly.

B.3. OpenCvSharp

OpenCvSharp is a cross platform wrapper of OpenCV for .NET Framework written in C#, which provides many popular image processing and computer vision algorithms in C#, VB.NET, etc. [60].

OpenCvSharp has the following features [60]:

(i) Many classes of OpenCvSharp implement IDisposable, so resources can be released cleanly with the "using" statement.

(ii) OpenCvSharp does not force an object-oriented programming style; native-style OpenCV functions can also be called.

(iii) OpenCvSharp provides functions for converting from IplImage to Bitmap (GDI+) or WriteableBitmap (WPF).

(iv) OpenCvSharp works on Mono, and is therefore able to run on any platform that Mono supports (e.g. Linux, BSD and Mac OS X).

These features make OpenCvSharp a very useful and powerful tool when developing computer vision and machine learning products for the .Net framework.

Appendix C

Acronyms and Abbreviations

Auto-2CM  Automatic 2.5D Cartoon Modelling
SDF       Shape Diameter Function
FP        Fitting Primitives
PBS       Protrusion-oriented Segmentation


Bibliography

[1] A. Agathos, I. Pratikakis, S. Perantonis, N. Sapidis, and P. Azariadis. 3d mesh segmentation methodologies for cad applications. Computer-Aided Design and Applications, 4(6):827–841, 2007.

[2] D. Akoumianakis. Virtual Community Practices and Social Interactive Media: Technology Lifecycle and Workflow Analysis. Premier Reference Source. IGI Global, 2009.

[3] Alexander Agathos, Ioannis Pratikakis, Stavros Perantonis, and Nickolas S. Sapidis. Protrusion-oriented 3d mesh segmentation. The Visual Computer, 26(1):63–81, 2010.

[4] M. Attene, B. Falcidieno, and M. Spagnuolo. Hierarchical mesh segmentation based on fitting primitives. The Visual Computer, 22(3):181–193, 2006.

[5] D. Bourguignon, M.P. Cani, and G. Drettakis. Drawing for illustration and annotation in 3d. In Computer Graphics Forum, volume 20, pages 114–123. Wiley Online Library, 2001.

[6] D. Brackeen, B. Barker, and L. Vanhelsuw´e. Developing Games in Java. New Riders games. New Riders, 2004.

[7] D. Bradley, T. Boubekeur, and W. Heidrich. Accurate multi-view reconstruction using robust binocular stereo and surface meshing. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.

[8] M. Brown and D.G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73, 2007.

[9] M. Čadík. Perceptual evaluation of color-to-grayscale image conversions. In Computer Graphics Forum, volume 27, pages 1745–1754. Wiley Online Library, 2008.


[10] N.D.F. Campbell, G. Vogiatzis, C. Hernández, and R. Cipolla. Using multiple hypotheses to improve depth-maps for multi-view stereo. ECCV08, 5302:766–779, 2008.

[11] M. Čapek. The Concepts of space and time: their structure and their development. Boston studies in the philosophy of science. Reidel, 1976.

[12] D. Cavallaro. The animé art of Hayao Miyazaki. McFarland & Co., 2006.

[13] X. Chen, A. Golovinskiy, and T. Funkhouser. A benchmark for 3d mesh segmentation. ACM Transactions on Graphics (Proc. SIGGRAPH), 28(3), aug 2009.

[14] J. Clottes, P.G. Bahn, and M. Arnold. Chauvet Cave: the art of earliest times. University of Utah Press, 2003.

[15] F. Cole, A. Golovinskiy, A. Limpaecher, H.S. Barros, A. Finkelstein, T. Funkhouser, and S. Rusinkiewicz. Where do people draw lines? In SIGGRAPH. ACM, 2008.

[16] Steve Connor. After 35,000 years, erotic art for cavemen discovered. http://www.independent.co.uk/arts-entertainment/art/news/after-35000-years-erotic-art-for-cavemen-discovered-1684569.html, 2006.

[17] S. Crabtree and P. Beudert. Scenic Art for the Theatre: History, Tools and Tech- niques. Elsevier Science, 2011.

[18] B. Curless. From range scans to 3d models. ACM SIGGRAPH Computer Graphics, 33(4):38–41, 1999.

[19] J. D’Amelio and S. Hohauser. Perspective Drawing Handbook. Dover Art Instruc- tion. Dover Publications, 2004.

[20] W. Damon and R.M. Lerner. Child and Adolescent Development: An Advanced Course. John Wiley & Sons, 2008.

[21] D. DeCarlo, A. Finkelstein, S. Rusinkiewicz, and A. Santella. Suggestive contours for conveying shape. In ACM Transactions on Graphics, volume 22, pages 848–855. ACM, 2003.

[22] F. Di Fiore, P. Schaeken, K. Elens, and F. Van Reeth. Automatic in-betweening in computer assisted animation by exploiting 2.5d modelling techniques. In , pages 192–200. IEEE, 2001.

[23] W. Eisner. Graphic storytelling. Poorhouse Press, 1996.

[24] I. Ekeland. Au hasard. University of Chicago Press, 1993.

[25] Correo Electronico. First optical toys. http://animatedcartoons.blogspot.com.au/, 2009.

[26] J.D. Foley. Computer graphics: principles and practice. The Systems Programming Series. Addison-Wesley, 1996.

[27] Y. Furukawa. High-fidelity image-based modeling. ProQuest, 2008.

[28] Y. Furukawa and J. Ponce. Accurate, dense, and robust multiview stereopsis. IEEE transactions on Pattern Analysis and Machine Intelligence, pages 1362–1376, 2009.

[29] M. Garcia, J. Dingliana, and C. O'Sullivan. Perceptual evaluation of : accuracy, attention, appeal. In Proceedings of the 5th symposium on Applied perception in graphics and visualization, pages 107–114. ACM, 2008.

[30] W.H. Goodyear. Renaissance and modern art. The Macmillan company, 1913.

[31] M. Habbecke and L. Kobbelt. A surface-growing approach to multi-view stereo reconstruction. In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, pages 1–8. IEEE, 2007.

[32] Harry Hamernik. Cartoon 360: Secrets to Drawing Cartoon People. IMPACT, 2010.

[33] J. Hamill, R. McDonnell, S. Dobbyn, and C. O’Sullivan. Perceptual evaluation of impostor representations for virtual humans and buildings. In Computer Graphics Forum, volume 24, pages 623–633. Wiley Online Library, 2005.

[34] M. Hannah. Test results from SRI's stereo system. In Science Applications International Corp, Proceedings: Image Understanding Workshop, volume 2, 1988.

[35] M.J. Hannah. Computer matching of areas in stereo images. Technical report, DTIC Document, 1974.

[36] F. Hartt. A history of Italian Renaissance art: painting, sculpture, architecture. Thames and Hudson, 1970.

[37] Richard T. Heffron. Futureworld, 1976.

[38] C. Hernández Esteban and F. Schmitt. Silhouette and stereo fusion for 3d object modeling. Computer Vision and Image Understanding, 96(3):367–392, 2004.

[39] M. Hilaga, Y. Shinagawa, T. Kohmura, and T.L. Kunii. Topology matching for fully automatic similarity estimation of 3d shapes. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 203–212. ACM, 2001.

[40] K.S. Inc. 3Ds Max 2008 In Simple Steps. Dreamtech Press, 2008.

[41] S. Itterheim. Learn iPhone and iPad Cocos2D Game Development: The Leading Framework for Building 2D Graphical and Interactive Applications. Apress, 2010.

[42] Kyachi. When drawing a pause in motion. http://www.pixiv.net/member_ illust.php?mode=medium&illust_id=12192382, Jul 2010.

[43] A. Laurentini. The visual hull concept for silhouette-based image understanding. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(2):150–162, 1994.

[44] H. Lin, H.Y.M. Liao, and J.C. Lin. Visual salience-guided mesh decomposition. Multimedia, IEEE Transactions on, 9(1):46–57, 2007.

[45] B. Loguidice and M. Barton. Vintage games: an insider look at the history of Grand Theft Auto, Super Mario, and the most influential games of all time. Focal Press. Elsevier Science, 2009.

[46] Rovio Entertainment Ltd. Angry birds. http://www.rovio.com/angrybirds, 2009.

[47] R. Marucchi-Foino. Game and Graphics Programming for iOS and Android with OpenGL ES 2.0. John Wiley & Sons, 2012.

[48] Merriam-Webster. The Merriam-Webster Dictionary. Merriam-Webster, 2005.

[49] H.P. Moravec. The stanford cart and the cmu rover. Proceedings of the IEEE, 71(7):872–884, 1983.

[50] R.T. Neer. The Emergence of the Classical Style in Greek Sculpture. University of Chicago Press, 2010.

[51] H.M. Nguyen, B. Wünsche, P. Delmas, and C. Lutteroth. 3d models from the black box: Investigating the current state of image-based modeling. Proceedings of the International Conferences in Central Europe on Computer Graphics, Visualization and Computer Vision, 2012.

[52] Jean-Daniel Deschênes, Philippe Lambert, and Patrick Hébert. Robust RBF: Application to multi-view stereo surface reconstruction. In CVIU, 2010.

[53] J. Podolak and S. Rusinkiewicz. Atomic volumes for mesh completion. In Proceedings of the third Eurographics symposium on Geometry Processing, page 33. Eurographics Association, 2005.

[54] M. Pražák, L. Hoyet, and C. O’Sullivan. Perceptual evaluation of footskate cleanup. In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 287–294. ACM, 2011.

[55] A. Rivers, T. Igarashi, and F. Durand. 2.5d cartoon models. ACM Transactions on Graphics, 29(4), 2010.

[56] M. Russo. Polygonal Modeling: Basic And Advanced Techniques. Wordware Game and Graphics Library. Wordware Pub., 2006.

[57] M.W. Sandler. Photography: an illustrated history. Oxford illustrated histories. Oxford University Press, 2002.

[58] D. Scharstein and R. Szeliski. Multi-view stereo evaluation. http://vision.middlebury.edu/mview/eval/.

[59] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision, 47(1):7–42, 2002.

[60] schima et al. OpenCvSharp: OpenCV wrapper for .NET Framework.

[61] A. Sears and J.A. Jacko. The human-computer interaction handbook: fundamentals, evolving technologies, and emerging applications. Human factors and ergonomics. Lawrence Erlbaum Associates, 2008.

[62] S.M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. Computer Vision and Pattern Recognition, 2006.

[63] L. Shapira, A. Shamir, and D. Cohen-Or. Consistent mesh partitioning and skeletonisation using the shape diameter function. The Visual Computer, 24(4):249–259, 2008.

[64] P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser. The princeton shape benchmark. In Shape Modeling Applications, 2004. Proceedings, pages 167–178. IEEE, 2004.

[65] S. Silvers. Rerepresentation: readings in the philosophy of mental representation. Philosophical studies series. Kluwer Academic Publishers, 1989.

[66] C. Strecha, R. Fransens, and L. Van Gool. Combined depth and outlier estimation in multi-view stereo. In Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, volume 2, pages 2394–2401. IEEE, 2006.

[67] Zhan Studio. Ruby run. http://www.zhanstudio.com/ruby-run.html.

[68] S. Subotnick. Animation in the home digital studio: creation to distribution. Focal Press visual effects and animation series. Focal Press, 2003.

[69] D. Summers. Longman dictionary of contemporary English. Longman Dictionary of Contemporary English Series. Pearson Education, 2005.

[70] S. Suzuki et al. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing, 30(1):32–46, 1985.

[71] Genndy Tartakovsky. Dexter’s Laboratory. http://en.wikipedia.org/wiki/Dexter’s_Laboratory, 1996-2003.

[72] Parramón Ediciones Editorial Team. Cartoon. Barron’s Art Handbooks. Barron’s Educational Series, 2003.

[73] Frank ter Haar, Remco Veltkamp, and Tristan Whitmarsh. SHREC’07. www.aimatshape.net/event/SHREC/, 2007.

[74] Yoshito Usui. Crayon shin-chan. http://en.wikipedia.org/wiki/Crayon_Shin-chan, 1990-2010.

[75] G. Vogiatzis, P.H.S. Torr, and R. Cipolla. Multi-view stereo via volumetric graph- cuts. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Com- puter Society Conference on, volume 2, pages 391–398. IEEE, 2005.

[76] J. Zhang, K. Siddiqi, D. Macrini, A. Shokoufandeh, and S. Dickinson. Retrieving articulated 3-d models using medial surfaces and their graph spectra. In Energy minimization methods in computer vision and pattern recognition, pages 285–300. Springer, 2005.

List of Figures

2.7. A picture demonstrating the principle of Rivers’ 2.5D cartoon models. This simple head model is composed of 9 parts; each part contains one stroke as its boundary line, a 3D anchor position, and a filling colour [55].

2.8. The front, 45-degree, and right views of the 3D model of ’Ruby’

2.10. (a)(b): Notice the ears of the ’Dog’ model in Rivers’ work. (c)(d): Three more key views are pre-defined to make the arms of the ’Alien’ look right.

2.14. (a) Perspective projection. (b) Parallel projection [26].

3.1. The process flow of Auto-2CM.

3.4. Hole filling

3.5. Strokes defined by control points with different smoothness types

3.6. An orthographically projected diagram of the Lockheed Martin/Boeing F-22 Raptor fighter aircraft. Image from Wikipedia.

3.7. The process of part assembly

3.8. Actions an artist normally takes after the model is automatically built

3.10. The bear and koala

3.13. In some cases, whether the nose stroke is open or closed depends on the style that the artist wants to present [32]

4.1. Process and Components of this Experiment
