ITE Trans. on MTA Vol. 1, No. 1, pp. 20-26 (2013) Copyright © 2013 by ITE Transactions on Media Technology and Applications (MTA)

Intelligent CG Making Technology and Intelligent Media

Masayuki Nakajima† (member)

Received September 19, 2012; Revised October 4, 2012; Accepted October 12, 2012
† Gotland University (Sweden) and Director of the Center for the Study of World Civilization, Tokyo Institute of Technology

Abstract In this invited research paper, I describe the Intelligent CG Making Technology (ICGMT) production methodology and Intelligent Media (IM). I begin with an explanation of the key aspects of the ICGMT and a definition of IM. Thereafter I explain the three approaches of the ICGMT: the reuse of data, the making of animation from text, and the making of animation from natural spoken language. Finally, I explain the current approaches of the ICGMT under development by the Nakajima Laboratory.

Keywords: Intelligent, CG, reuse, agent, language understanding

1. Introduction

Computer Graphics (CG) is a technique used to create images and animation using computers and graphics peripheral devices. CG has become a popular medium for science, industrial manufacturing, animation, games, communication and medical visualization. Formerly, large-scale investments in technology, skilled personnel and large time allotments were necessary to produce CG animation. However, due to developments in motion capture systems, the KINECT and other motion acquisition devices, the production of CG animation has become faster, easier and less expensive.

Furthermore, as a result of recent advances in rendering technologies, it is now possible to produce high quality CG animation that is difficult to distinguish from live-action footage. We can conclude that CG animation is nearing technical completion as it becomes indistinguishable from reality.

In this post-realism era, CG engineers are struggling to find new ways to develop the field further. This paper proposes that the key aspects for the future development and facilitation of CG production are AI (Artificial Intelligence) and Pattern Recognition. This invited paper introduces a new AI-like trial as proposed primarily by the Nakajima Laboratory.

2. What is Intelligence in CG

In this chapter, I introduce the technology supporting Intelligence in CG and thereafter define Intelligent Media (IM).

2.1 Intelligent CG Making Technology (ICGMT)

The term "Artificial Intelligence" (AI) is often used to describe the intelligence of machines and the branch of computer science that aims to create intelligence in ways that emulate some functions of the human brain. However, "Intelligence" in CG animation differs from this. Intelligence in CG means that the computer aids production efficiently, comfortably and practically1). Furthermore, intelligence in CG includes the domain of KANSEI, which adds sensitivity and feelings. The following five aspects provide a definition of the intelligent CG image production concept.

(1) Production is more readily available to wider user groups without previous animation skills, giving them the ability to make high quality CG.

(2) Animation can be generated in more ergonomically designed environments, providing pleasant workflows and generally a more pleasant experience, thus allowing users to concentrate on other elements such as directing.

(3) Correspondence with diversification. The system corresponds with 2D, 3D and four-dimensional aspects of representation on both mobile devices and large high-resolution screens.

(4) Delivery of high resolution graphics, providing increased artistic freedom with high fidelity animation for abstraction or realism.


(5) Optimized and semi-automated production provides shorter production periods at less cost. This allows non-skilled users to make CG animation cheaply and easily.

The demands on hardware and software specifications include faster computer processing, user-friendly graphics and high-level peripheral devices. In software technology, the following should be taken into consideration:
(1) Advanced image information processing technologies.
(2) Advanced image expression and direction technology incorporating KANSEI processing (aesthetic expression and entertainment).

2.2 Intelligent CG Production Technology

The following technologies constitute the important image information processing:

(1) Database Technology. Database technology fast enough to retrieve adequate image information.
(2) Computer Vision Technology. Computer Vision (CV) technology is effective for modeling real human and animal movement and for the generation of corresponding animations.
(3) Automatic Language Understanding Technology. The technical capability of generating animation effectively from both written and spoken language.
(4) Human Interface Technology. The development of user-friendly CG making systems.
(5) Standardization Technology. The use of standardization technology allows us to operate equally between multiple platforms and projects.
(7) The Expert System. We can make animation by reusing animation assets which are designed and made by experts in the field.

We can state that "Intelligent Media" (IM) is created through the development of the Intelligent CG Making Technology (ICGMT) production methodology. These technologies offer new advances towards the development of future trends of media production, providing optimized high-level animations.

3. Trial of the Intelligent CG Making System

In this chapter, I describe examples of intelligent CG (image and animation) making research. Section 3.1 describes the reuse of animation data, Section 3.2 the making of animation from text, and Section 3.3 the making of animation through the use of natural language. It should be pointed out that the reuse of animation data is low-level intelligence, while the use of natural language is high-level intelligence due to the use of language recognition technologies.
In this ment of future trends of media production providing paper, we can also add exaggeration of movement in optimized high-level animations. selected key frames of the animation to improve the expressive quality. 3. Trial of the Intelligent CG Making System (2) Reuse of Animation Scenes and Image Data. In this chapter, I will describe examples of the intelli- A widely used method in production for weekly televi- gent CG (image and animation) making research papers. sion programs is the pre-vis checking of the character 3.1 describes the reuse of animation data, 3.2 describes motion in the early stages of production. This allows for making animation from Text and 3.3 describes the mak- adjustments to be made early on, preventing irregulari- ing of animation through the use of natural language. ties later in production. In consideration of this, it is It should be pointed out that the reuse of animation important for the industry to be able to reuse existing


(2) Reuse of Animation Scenes and Image Data.

A widely used method in production for weekly television programs is the pre-vis checking of character motion in the early stages of production. This allows adjustments to be made early on, preventing irregularities later in production. In consideration of this, it is important for the industry to be able to reuse existing animation sequences to produce new animations directly and effectively, without irregularities of motion.

(a) Method of Making Sequences Reusable. To make existing sequences reusable, it is necessary to extract the motion at the same time as the extraction of the model shape from the animation. To achieve this, we used pattern matching and structural matching algorithms on previous animation sequences2). The matching process is as follows:

(I) Binarization and Line Approximation. The bitmap image from a frame of a 2D sequence is binarized and the lines of the model shape are thinned to filaments. Because the image is drawn using lines, these lines can be transferred and reformed into several characters.

(II) Extracting Lines. The lines of a sequence are extracted by detecting intersection points and start and end points.

(III) Matching the Correspondence Points. The correspondences of the cross and terminal points in the target image to those in the original image are set automatically.

(IV) Adjustment of the Amount of Transformation and Correlation Between Frames.
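As a rough illustration of steps (I)-(III), and not the actual matching algorithm of reference 2), the following sketch binarizes a frame, assumes the line art has been thinned to one-pixel-wide strokes, detects endpoint and intersection pixels by counting line neighbors, and pairs points between two frames greedily by distance. OpenCV and NumPy are assumed; all function names are illustrative.

import cv2
import numpy as np

def line_keypoints(frame_gray):
    # (I) Binarization: dark drawn lines become foreground (1).
    # Assumes strokes have already been thinned to 1-pixel width.
    _, binary = cv2.threshold(frame_gray, 128, 1, cv2.THRESH_BINARY_INV)
    points = []
    h, w = binary.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not binary[y, x]:
                continue
            # (II) Count 8-connected line neighbors: 1 neighbor marks
            # an endpoint, 3 or more marks an intersection of strokes.
            n = binary[y-1:y+2, x-1:x+2].sum() - 1
            if n == 1 or n >= 3:
                points.append((x, y))
    return points

def match_keypoints(src_pts, dst_pts):
    # (III) Greedy nearest-neighbor correspondence between frames.
    pairs = []
    used = set()
    for sx, sy in src_pts:
        best, best_d = None, float("inf")
        for k, (dx, dy) in enumerate(dst_pts):
            d = (sx - dx) ** 2 + (sy - dy) ** 2
            if k not in used and d < best_d:
                best, best_d = k, d
        if best is not None:
            used.add(best)
            pairs.append(((sx, sy), dst_pts[best]))
    return pairs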
(b) Reusable Animation Database. It is necessary to create reusable animation sequences as a production standard. We create reusable animation sequences and store them in a database. Our proposed animation reuse method is then ready to apply stored character motion sequences to the user's character. The developed animation database system has the following functionality:
(I) Database Registration
(II) Animation Generation
(III) Animation Retrieval
The database allows for the production of a wide variety of different animations2).

3.2 Automatic Animation Making from Text

We consider automatic and intelligent animation making methods from a text scenario to be a more advanced approach than reuse.

(1) TVML

Hayashi8) has proposed and developed TVML (TV program Making Language) over several years. TVML has the ability to make TV programs from simple natural language describing the script of the TV program. The animation for the TV program is generated immediately when we enter the program script language into the TVML player. A real-time CG character then speaks by the use of a synthetic voice. In addition, all camera positioning and movement, as well as all lighting, can be controlled. All postproduction processing and compositing used in TV programs, BGM, and movie productions are also facilitated in the TVML player in real-time.

(2) T2V System

We proposed T2V (Text-To-Vision) technology capable of generating animation from simple text. We applied T2V to an Automatic and Intelligent News Broadcasting System9).

We have constructed functioning software that generates TV news shows from Internet news sites using TVML and T2V. Fig.1 shows the structure of the test system implemented with the use of the 'Reuters' website. This system extracts HTML data from the top page of the site, analyzes it, and divides it into the corresponding number of news articles. It then extracts the title, the main body and the main JPEG image for each article. The system then creates a T2V script from the HTML news text. The script is then converted to a TVML script, which includes and formats visual and audio effects such as the CG announcer, the news show setup, sound effects, superimposed graphics etc. Finally the TVML engine plays back the TVML script and delivers the full-CG news show animation without pre-render waiting. The system also supports multi-language operation capable of speaking virtually every language.

Fig.1 The structure of the system implemented with the use of the 'Reuters' website.
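The following sketch shows only the shape of such a pipeline, fetch a top page, split it into articles, and emit an announcer script; the selectors are placeholders rather than the actual 'Reuters' page structure, and the pseudo-directives stand in for real T2V/TVML syntax, which is not reproduced here.

import requests
from bs4 import BeautifulSoup

def fetch_articles(url):
    # Scrape title/body pairs from a news top page. The CSS selectors
    # below are site-specific assumptions, not the real page layout.
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    articles = []
    for item in soup.select("article"):
        title = item.find("h2")
        body = item.find("p")
        if title and body:
            articles.append((title.get_text(strip=True),
                             body.get_text(strip=True)))
    return articles

def to_show_script(articles):
    # Render articles as a simple announcer script; a real system would
    # emit a T2V script and then TVML with camera, set, audio and
    # superimpose directives.
    lines = ["OPEN NEWS SHOW"]
    for title, body in articles:
        lines.append(f"ANNOUNCER SAYS: {title}")
        lines.append(f"ANNOUNCER SAYS: {body}")
    lines.append("CLOSE NEWS SHOW")
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_show_script(fetch_articles("https://www.reuters.com/")))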


3.3 Automatic Animation Making from Natural Language

There is much research on making 3D scenes from texts written in natural language, and on animated agent systems which can interact with users through natural language10)11). There are two approaches: one is generating still images and the other is generating CG animation. When we produce a still image from natural language, it is necessary to generate a depiction of a scene precisely; namely, the identification of the nouns that point to the objects of the sentences and the analysis of the predicate expressions, including position relations, become more important. On the other hand, in the case of generating animation from natural language, analysis of the verbs and adverbs in the sentence becomes more important for the automatic generation of real action in the character.

The University of Pennsylvania group ("The Center for Human Modeling & Simulation") has written several papers proposing animation production instructed by natural languages, which can be found at this URL: http://www.cis.upenn.edu/~hms/publications.html

(1) SPRINT

SPRINT (SPatial Representation INTerpreter) is a system that makes 3D scenes from natural language10). This system focuses on the spatial constraints in a text that describe a scene and determines the locations of objects. A potential model is used to express the vagueness of spatial constraints. The potential model used in SPRINT becomes very complicated when several spatial constraints are combined; it can treat several spatial constraints at the same time, like a Boolean expression.
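A hedged sketch of the potential-model idea follows; the penalty shapes and weights are assumptions, not those of SPRINT10). Each spatial constraint contributes a potential over candidate positions, the potentials are summed, much as constraints combine in a Boolean expression, and the object is placed at the minimum.

import numpy as np

def near(target, weight=1.0):
    # Penalty grows with squared distance from the target.
    return lambda p: weight * np.sum((p - target) ** 2)

def in_front_of(target, facing, margin=1.0):
    # Penalty when p is not sufficiently ahead of `target` along `facing`.
    return lambda p: max(0.0, margin - np.dot(p - target, facing)) ** 2

def place(constraints, candidates):
    # Combine constraint potentials and return the candidate position
    # with minimal total potential.
    scores = [sum(c(p) for c in constraints) for p in candidates]
    return candidates[int(np.argmin(scores))]

table = np.array([0.0, 0.0])
cs = [near(table), in_front_of(table, facing=np.array([0.0, 1.0]))]
grid = [np.array([x, y]) for x in np.linspace(-3, 3, 13)
                         for y in np.linspace(-3, 3, 13)]
print(place(cs, grid))   # a position near and in front of the table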
(2) WordsEye System

WordsEye, proposed by the AT&T laboratory, makes 3D scenes from natural language11). Namely, WordsEye generates 3-dimensional animations according to English text entered into the system. First, the input text is grammatically and semantically analyzed using "The Natural Language Analyzer". Next, 3D animation is generated using prepared, tagged 3D polygon objects according to the analyzed text. WordsEye also provides spatial tags, which assign a function to a part of an object. For example, the "top surface" tag is assigned to the seat of a chair. The spatial tags are used to determine the locations of the objects.

(3) Smart Avatars

A natural language interface should be powerful enough to express conditional instructions and hypothetical situations. Smart Avatars are virtual human representations controlled by real people12). Given instructions interactively, Smart Avatars can act as autonomous or reactive agents. During a real-time simulation, a user should be able to dynamically refine his or her avatar's behavior in reaction to simulated stimuli without having to undertake a lengthy off-line programming session. One promising and relatively unexplored option for giving runtime instructions to virtual humans is a natural language based interface; after all, instructions for real humans are given in natural language, augmented with graphical diagrams and, occasionally, animations. In the paper12), the Badler group introduce an architecture which allows users to input immediate or persistent instructions using natural language and then see the agents' resulting behavioral changes in the graphical output of the simulation. These instructions can range from specific instantaneous commands, like "Sit down," to very general standing orders, like "Drive abandoned vehicles to the parking lot," affording various degrees of autonomy to the avatar/agent.

(4) Animated Agent System

We are also developing an animated agent system which can interact through Japanese natural language13). In recent years, there has been considerable interest in simulating human behavior in both real and virtual world scenarios. If simulated agents or robots could understand and carry out instructions expressed in natural language, they could vastly improve their utility and extend their area of application. However, in general, linguistic expressions have ambiguity and vagueness, and it is often hard to resolve this ambiguity in an automatic manner. In this work, we focus on the problem of using natural language for command-driven motion generation14). Our aims are, first, to combine the constraints specified explicitly by the user with those implied by the virtual character's body and the surrounding environment into a uniform representation; and second, to develop a system that uses this representation to generate smooth agent animation consistent with the constraints.

(a) System Overview. Using speech, a user can command the agents to manipulate objects in the space. The current system accepts simple Japanese commands, such as "Tsukue no mae ni ike" (Walk to the table) or "Motto" (Further). The agent's behavior and the subsequent changes in the virtual world are displayed to the user as a three-dimensional animation.

Fig.2 illustrates the architecture of the system. The speech recognition module receives the user's speech input and translates it into a sequence of words. The text/discourse analysis module analyzes the word sequence to extract a case frame, thus extracting the user's goal and passing it over to the planning modules, which then build a plan to generate the appropriate animation. We separate the planning into two stages, macro and micro planning, to account for the differences in representation.
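The following toy sketch traces that module chain for one command: a word sequence is reduced to a case frame, and a macro planner resolves the qualitative goal ("in front of the table") to a target position. The grammar pattern, slot names and world format are illustrative assumptions, not the system's actual implementation13)14).

import re

# A toy case-frame extractor for commands of the form
# "<object> no mae ni ike" (go in front of <object>).
def extract_case_frame(words):
    text = " ".join(words)
    m = re.match(r"(?:(\w+) ha )?(\w+) no mae ni ike", text)
    if not m:
        return None
    return {"predicate": "iku",            # verb: go
            "agent": m.group(1) or "agent",
            "goal": ("front_of", m.group(2))}

# Macro planning: resolve the qualitative goal to a target position;
# micro planning would then generate the actual walking motion.
def macro_plan(frame, world):
    relation, obj = frame["goal"]
    x, y, facing_x, facing_y = world[obj]
    if relation == "front_of":
        return (x + facing_x, y + facing_y)  # one unit ahead of the object

world = {"tsukue": (2.0, 3.0, 0.0, -1.0)}    # table at (2,3), facing -y
frame = extract_case_frame("tsukue no mae ni ike".split())
print(frame, "->", macro_plan(frame, world))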


Fig.2 System architecture.

Fig.3 Experimental result of "Natchan ha aoi tsukue no mae ni ike" (Natchan, go in front of the blue table).

During the macro planning, the planner needs to know the qualitative properties of the involved objects, depending on their size, location and so on.

(b) Experimental Result. Using our character agent system, we generated a variety of different user command-driven animations. Fig.3 shows the result when "Natchan ha aoi tsukue no mae ni ike" (Natchan, go in front of the blue table) is given as the user's input. The gradation coloring of the floor shows the value of the potential field and the line signifies the generated trajectory.

4. Current Approaches in the Nakajima Laboratory

Finally, novel approaches using the Intelligent CG Making Technology (ICGMT) production methodology, as proposed by the Nakajima Laboratory, are reported.

(1) Agent Movement in Accordance with Social Relationships.

The analysis of the non-verbal communication between people via the management of their Personal Spaces (PS), shown in Fig.4, gives an idea about the nature of their relationship. We propose a mathematical model for the concept of Personal Space and demonstrate its application in simulating non-verbal communication between agents in virtual worlds and also in Human Computer Interaction15). We focus on the communication between two virtual agents. We assume three different types of relationships: business relationships, friendly relationships and relationships between strangers. Two virtual agents behave under our proposed PS model, and our method can simulate the behavior of virtual agents according to the relationship between them. We use the Personal Space model to: 1) automatically control the speed of an agent when it is moving towards another agent which it is going to meet, and 2) automatically find a natural stopping distance in front of the target. The proposed method enables the modeling of the agent's mobile territory and its relationship with others.


Fig.4 Definition of the Personal Space.
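As a hedged illustration of how such a model can drive behavior, the sketch below assigns a personal-space radius per relationship type, stops an approaching agent at the PS boundary, and slows it linearly once inside twice that radius. The radii and the linear rule are assumptions, not the actual model of reference 15).

# Illustrative personal-space radii (meters) per relationship type;
# the actual values and field shape in reference 15) may differ.
PS_RADIUS = {"business": 1.2, "friendly": 0.6, "strangers": 2.0}

def approach_speed(distance, relationship, max_speed=1.4):
    # Slow down linearly once inside twice the personal-space radius.
    r = PS_RADIUS[relationship]
    if distance <= r:
        return 0.0                      # stop at the PS boundary
    if distance < 2 * r:
        return max_speed * (distance - r) / r
    return max_speed

def stopping_distance(relationship):
    # The natural stopping point is the personal-space boundary itself.
    return PS_RADIUS[relationship]

for rel in ("business", "friendly", "strangers"):
    print(rel, stopping_distance(rel), approach_speed(1.5, rel))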

Results of this work can be applied to modeling the behavior of autonomous virtual agents and avatars in virtual worlds, as well as individual movement in social groups and crowds.

(2) Learning System Approach in NPR

We proposed a highly practical framework in Non-Photorealistic Rendering (NPR) for painterly rendering that is an automatic and intelligent approach to creating artistic paintings16). We mainly focused on creating oriental ink painting ("Sumi-e"), which is one of the oldest artistic brushwork styles and is particularly popular in Asian countries. The main research challenge in oriental ink painting is stroke placement: how to distribute strokes with realistic brush textures in desired shapes. This process tends to cause unsatisfactory defects such as unnatural stretching in textures and undesirable folds or creases appearing inside corners or curves. To address these classic problems, we introduced an intelligent learning agent theory for the art of painting. This work contains the design of a brush agent and the development of two learning algorithms for the agent for automatic stroke drawing.

(a) Design of Brush Agent

We designed the brush as an intelligent agent that decides the behaviors of drawing strokes in the framework of reinforcement learning (RL), and formulated this sequential decision-making problem as a Markov decision process (MDP)17). We then provided an elaborate design of an environment, actions, states, and rewards specifically tailored to the Sumi-e agent. Under this framework, stroke generation is formulated as an optimization problem: finding an optimal policy for controlling the brush so as to maximize a cumulative reward during the process of drawing strokes. RL methods help to build an artificial agent that learns how to optimize its behavior in an unknown environment, without requiring prior knowledge.

(b) Model-based Learning of the Sumi-e Agent

Model-based methods require an explicit model of the Markov decision process, including the transition dynamics and the reward function. Model-based methods work offline to produce a policy, which is then used to control the process. The transition dynamics is the model of the environment, which guides the agent in how to move inside the desired shapes. To construct the transition dynamics, we begin by defining the space around the shape of a desired stroke by sampling locations. An optimal policy describes a mapping from states to actions that forms the optimal brush trajectory, obtaining the maximum cumulative reward. Since this model-based method requires the transition dynamics of a specific shape, a limitation is that the optimal policy for a desired shape may not be directly applied to new shapes.
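To make the MDP formulation concrete, here is a deliberately tiny model-based sketch, not the method of reference 17): states are cells along a one-dimensional "stroke", actions keep or advance the brush, the reward favors progress inside the desired shape, and value iteration recovers the optimal policy offline, mirroring the model-based route described above. The grid size, reward values and discount factor are assumptions.

import numpy as np

# 1-D toy "stroke": states 0..4 along the desired shape, with the
# stroke's end at state 4. Actions: 0 = stay, 1 = advance.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)
GAMMA = 0.9

def step(state, action):
    # Transition dynamics: advancing moves the brush one cell forward.
    nxt = min(state + action, GOAL)
    # Reward: progress along the stroke is rewarded,
    # dithering in place is mildly penalized.
    reward = 1.0 if (action == 1 and nxt != state) else -0.1
    return nxt, reward

def value_iteration(eps=1e-6):
    v = np.zeros(N_STATES)
    while True:
        q = np.zeros((N_STATES, len(ACTIONS)))
        for s in range(N_STATES):
            for a in ACTIONS:
                s2, r = step(s, a)
                q[s, a] = r + GAMMA * v[s2]
        new_v = q.max(axis=1)
        if np.abs(new_v - v).max() < eps:
            return q.argmax(axis=1)   # optimal action per state
        v = new_v

print(value_iteration())   # expect [1 1 1 1 0]: advance until the end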


The effectiveness of our proposed learning approaches was demonstrated through simulated Sumi-e experiments, shown in Fig.5. A statistical comparison with mainstream commercial software through a user study showed that the performance of our methods is closer to real paintings than the commercial software.

Fig.5 Results of automatic photo conversion into an oriental ink style.

5. Conclusion

In this invited paper, I have introduced the Intelligent CG Making Technology (ICGMT) system as a new trend in the CG field. There are other papers related to "Intelligent CG media" which are not included in this paper due to format constraints; I hope to introduce these in coming papers. Furthermore, I aim to develop our Agent System to provide increased AI capabilities in animated characters for future use in applications such as robotics.

I strongly hope many researchers and engineers will be active in the development of the Intelligent Media field. Thanks to Prof. Steven Bachelder and Prof. Masaki Hayashi of Gotland University for assistance and support with this paper.

[References]

1) M. Nakajima: "Computer and AI", Journal of Artificial Intelligence Society, 19, 1, pp.10-14 (2004)
2) F. Sumi, M. Nakajima: "A Production Method of Reusing Existing 2D Animation Sequences", CGI 2003, pp.282-287 (2003)
3) A. Witkin and Z. Popovic: "Motion Warping", Proceedings of SIGGRAPH (1995)
4) J. Lee and S.Y. Shin: "A Hierarchical Approach to Interactive Motion Editing for Human-Like Figures", Proceedings of SIGGRAPH (1999)
5) H. Yasuda, R. Kaihara, S. Saito, M. Nakajima: "Motion Belts: Visualization of Human Motion Data on a Timeline", IEICE Transactions, 91-D, 4, pp.1159-1167 (2008)
6) L. Kovar, M. Gleicher and F. Pighin: "Motion Graphs", Proceedings of SIGGRAPH 2002 (2002)
7) K. Pullen and C. Bregler: "Motion Capture Assisted Animation: Texturing and Synthesis", Proceedings of SIGGRAPH 2002 (2002)
8) M. Hayashi, H. Ueda, T. Kurihara and M. Yasumura: "TVML (TV program Making Language) - Automatic TV Program Generation from Text-based Script -", Imagina '99 proceedings (1999)
9) M. Hayashi, M. Nakajima and S. Bachelder: "International Standard of Automatic and Intelligent News Broadcasting System", Nicograph International in Indonesia (2012)
10) Yamada and Nishita: "The Analysis of Spatial Descriptions in Natural Language and the Reconstruction of the Scene", IPSJ, 31, 5, pp.660-672 (1990)
11) B. Coyne, R. Sproat: "WordsEye: An Automatic Text-to-Scene Conversion System", SIGGRAPH 2001 proceedings, pp.487-496 (2001)
12) A. Bindiganavale, W. Schuler, J.M. Allbeck, N.I. Badler and A.K. Joshi: "Dynamically Altering Agent Behaviors Using Natural Language Instructions", 4th International Conference on Autonomous Agents (AGENTS 2000), pp.293-300 (2000)
13) M. Nakajima: "Autonomous Agent Action and Semantics", 1st International Symposium on Shape and Semantics, pp.1-5 (2006)
14) S. Funatsu, T. Koyama, S. Saito, T. Tokunaga and M. Nakajima: "Action Generation from Natural Language", Advances in Multimedia Information Processing - PCM 2004, pp.15-22 (2004)
15) T. Amaoka, H. Laga, M. Yoshie, M. Nakajima: "Personal Space-based Simulation of Non-Verbal Communications", Entertainment Computing, 14, pp.1-36 (2011)
16) N. Xie, H. Laga, S. Saito, M. Nakajima: "Contour-driven Sumi-e Rendering of Real Photos", Computers & Graphics, 35, 1, pp.122-134 (2011)
17) N. Xie, H. Hachiya and M. Sugiyama: "Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting", Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pp.153-160 (2012)

Masayuki Nakajima received his Dr. Eng. degree from the Tokyo Institute of Technology, Tokyo, Japan in 1975 and was Professor at the Department of Computer Science, Graduate School of Information Science & Engineering, Tokyo Institute of Technology from 1997 to March 2012. He began working at the Department of Game Design, Technology and Learning Processing, Gotland University (Sweden) in April 2012.
