
Virtual Studios

Image Compositing Based on Virtual Cameras

Masaki Hayashi
NHK Science and Technical Research Laboratories

The virtual camera method presented here integrates computer graphics and real-time video processing techniques in a new type of virtual studio. At NHK we developed several image compositing systems based on this method. First we look at the virtual camera method, then some of the systems developed with it. Then we discuss future directions and possibilities not only for image compositing but also for the development of sophisticated systems for editing and producing television programs.

Today, actors in television programs and movies commonly appear within a landscape wholly created by computer graphics. Computer graphics evolved for the most part as a computer-oriented technology, which makes it possible to create scenes from nothing by numerical calculation. Image compositing technology, on the other hand, has a different tradition. It evolved in the field of video production as a technique for synthesizing multiple images taken by a real camera. Computer graphics technology has made remarkable progress, and today the computer algorithms that produce the final image have reached a very high level of sophistication. By contrast, image compositing technology relies on already filmed video footage, and the methods that produce the final image still depend on expert human skill and experience.

Technology for integrating computer graphics and image compositing involves the convergence of these two cultural traditions, which have already influenced one another over a long period of time. The recent popularity of image-based rendering in computer graphics on the one hand and the virtual studio approach in image compositing on the other testify to this ongoing mutual interaction. However, because convergence has only just begun, a comprehensive system that integrates computer graphics and image compositing does not yet exist. Efforts to achieve such a system will have to continue at the academic level as well as the practical level.

Over the last few years, virtual studio technology has become very popular in television program production. NHK developed probably the first practical virtual studio system1,2 in the world and in 1991 used it in a science program, "Nanospace" (see Figure 1). Since then we have continued to study the integration of computer graphics and image compositing systems.

I'll begin with an overview of image compositing, then clarify its direction and research targets. Then, focusing on camerawork-related technology, I'll explain the concept of a virtual camera before describing several image compositing systems NHK developed that exploit virtual camera technology. Finally, I'll address future issues regarding other aspects of the system besides camerawork.

Image compositing

Basically, image compositing involves adjusting and matching various conditions in separately filmed or generated images, and synthesizing these images to create a completely different scene. The conditions might include such things as camerawork, lighting, resolution of pickups, atmosphere, semantic content, and so on. Which conditions to match for picture synthesis depends on the purpose of the image compositing.

Generally, image compositing is done for the following reasons:

❚ To create a scene in which it appears that the synthesized objects occupy a particular place (realism).

❚ For artistic effect, often with little or no concern for realism (artwork).

Examples of the former include the virtual studio approach and the image compositing effects seen in so many recent special-effects movies (such as Jurassic Park). Examples of the latter encompass videos created for the title and credit segments of TV shows, and music promotion videos, which combine special video effects with various types of image components such as camera images, text images, computer graphics, and so on.

Of these two types of application, the former probably lends itself more readily to realization by technology. The latter depends more on human artistic sensibility and therefore would be hard to reduce to a technological process.


Yet it would be of great technological interest to make a machine capable of automatically generating artistic synthesized images based on, for example, a semantic understanding of the video content.

Here, we will focus on the pursuit of realism. The objective in this case is achieved by creating a new realistic image out of multiple image components, matching their attribute conditions as closely as possible. Specific types of conditions include

1. Camerawork—pan, tilt, roll, 3D positioning (x, y, z), zoom, focus

2. Lighting—color temperature, intensity, direction of light source, and so forth

3. Filming conditions—resolution of pickups, recording characteristics (for example, video or film?)

4. Environmental conditions—fog, shadows, and so on

5. Interaction between objects

Figure 1. Scene from the TV program "Nanospace."

In the computer graphics field, where a scene is literally created out of nothing, conditions 1 through 4 have been studied thoroughly. You can readily obtain the desired computer graphics images simply by entering matching values for these conditions into a computer. However, the image compositing field has produced no clear-cut answers—despite the ability to use many aspects of computer vision—because the conditions are generally determined by the ambient physical conditions at the time of shooting.

Turning to condition 5, we can think of many cases where interactions between objects occur. For example, two people filmed on different occasions appear to shake hands, rain falls and objects get drenched, someone walks along leaving footprints behind him, and so on. Effects such as these are quite difficult to achieve automatically, so in the actual production of TV programs and movies today, special-effects professionals still do most of these jobs manually.

In this work, we at NHK propose a filming system that outputs the desired image when fed external data for filming conditions 1 through 5. We refer to this as a virtual camera. Both computer graphics techniques and image processing techniques may contribute to a virtual camera. Figure 2 shows a general schematic of an image compositing system based on the virtual camera concept. In a conventional virtual studio, for example, the virtual camera for the background is a real-time graphics workstation, and the virtual camera for the foreground is a television camera. Output data from a sensor mounted on the television camera serves as the filming conditions.

Figure 2. Image compositing based on the virtual camera concept: several virtual cameras (computer graphics, a real TV camera with control port, and a TV camera plus image processor) share the same filming conditions and feed a multilayered image compositor that produces the output scene.

The virtual studio creates a realistic composite scene synchronized with foreground camerawork by driving computer graphics using camera operation data. In other words, although matching camerawork (condition 1) achieves the main purpose in the virtual studio, matching conditions 2 through 5 currently relies mostly on experience and trial and error.
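To make the virtual camera interface of Figure 2 concrete, here is a minimal sketch in Python. It is an illustration of the concept, not NHK's implementation; every name and default value below is hypothetical, and condition 5 (object interaction) is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class FilmingConditions:
    """External data fed to a virtual camera (conditions 1 through 4)."""
    # 1. Camerawork
    pan: float = 0.0                        # degrees
    tilt: float = 0.0
    roll: float = 0.0
    position: tuple = (0.0, 0.0, 0.0)       # x, y, z in meters
    zoom: float = 1.0
    focus: float = 1.0
    # 2. Lighting
    color_temperature_k: float = 5600.0
    light_intensity: float = 1.0
    light_direction: tuple = (0.0, -1.0, 0.0)
    # 3. Filming conditions
    resolution: tuple = (720, 486)
    medium: str = "video"                   # "video" or "film"
    # 4. Environmental conditions
    fog_density: float = 0.0
    shadows: bool = True

class VirtualCamera:
    """Anything that outputs an image for given filming conditions: a
    graphics workstation, a sensor-equipped TV camera, or a TV camera
    plus image processor (the three boxes of Figure 2)."""
    def render(self, conditions: FilmingConditions):
        raise NotImplementedError
```

A conventional virtual studio then amounts to two such cameras, one rendering the background and one supplying the foreground, driven by a shared conditions record.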


Figure 3. Virtual shooting process: (a) a real object in world coordinates; (b) the real camera films the object, and its image is mapped onto a 2D plate; (c, d) the virtual object (the plate) and a virtual camera are placed in world coordinates; (e) the obtained image.

Virtual camera

Two fundamental approaches realize the virtual camera:

❚ The first approach generates images by using computer graphics to set the filming parameters (model-based rendering).

❚ The second approach obtains images by processing the original images taken with a real camera to match the filming parameters (image-based rendering).

The first approach imposes few constraints on the filming parameters, but it's difficult to replace all the real objects by computer graphics when rendering naturalistic landscapes. Furthermore, this approach requires immense computational resources to produce images that closely simulate the filming parameters. Turning to the second approach for realizing the virtual camera, let's take camerawork as an example. We first film a real scene using a wide angle and high resolution and store the image data in a computer. Then we get panning, tilting, and zooming effects by altering the position and size of the area we extract. Although the second approach can certainly handle photorealistic images, it imposes major constraints on the filming parameters.

In computer graphics, the first approach is generally called model-based rendering, and the second is called image-based rendering. Model-based rendering has a long history, and fully developed commercial systems are available. By contrast, image-based rendering has a much shorter history. It needs special techniques whenever different filming parameters are attempted for video materials already filmed. This is obviously a serious constraint, and the issue has been studied intensively in recent years to find a solution.3,4

As mentioned, this article primarily concerns camerawork as the filming parameter. Here, based on an image-based rendering approach, a method called virtual shooting provides a different kind of camerawork from that of the actual camera. At NHK we have developed a number of image compositing systems that apply this virtual shooting method to a virtual studio.

Virtual shooting creates the impression of different camerawork in an image already shot. The virtual shooting process described in this article appears in Figure 3. Once a 3D object is filmed to produce a 2D image, this image is mapped onto a 2D surface and then refilmed using different camerawork. This process requires the following data:

1. Real camera data—the direction, position, and angle of view of the real camera when the object was filmed

2. Real object data—the orientation and position of the real object

3. Virtual object data—the orientation, position, and scale of the surface on which the previously filmed 2D image is mapped when placed in a virtual world

4. Virtual camera data—the direction, position, and angle of view of a virtual camera placed in the virtual world

Table 1. Parameters for the virtual shooting procedure.

Data type             Parameters
Real camera data      Pan, tilt, rotation; x, y, z; angle of view
Real object data      x, y, z
Virtual camera data   Pan, tilt, rotation; x, y, z; angle of view
Virtual object data   Pitch, yaw, roll; x, y, z; scale


Table 1 summarizes these four parameters. Transforming the input image to the desired output image is a perspective transformation. If all the above parameters are known, then the object image on the output screen can be numerically calculated. In other words, by applying virtual camera data that differs from the original real camera data, we can obtain a pseudo image of the object just as if we had applied camerawork completely different from that used when the object was actually filmed. For example, by placing the virtual camera farther from the object than the real camera was, we create the impression that the object was shot from farther away. In this virtual shooting process, the image diminishes in size.
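To make the geometry concrete, the following sketch composes that perspective transformation as a single 3 x 3 homography from real-image pixels to virtual-image pixels. This is an illustrative pinhole-camera construction, not NHK's real-time hardware; for simplicity the plate's pose and scale are given directly (in the full method they derive from the real camera, real object, and virtual object data), and all function names are ours.

```python
import numpy as np

def intrinsics(angle_of_view_deg, width, height):
    """Pinhole camera matrix from a horizontal angle of view."""
    f = (width / 2.0) / np.tan(np.radians(angle_of_view_deg) / 2.0)
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])

def virtual_shooting_homography(K_real, s, R_plate, t_plate,
                                K_virt, R_virt, t_virt):
    """3x3 map from real-image pixels to virtual-image pixels.

    Pixel (u, v) is pasted onto the plate as the plane point
    s * K_real^-1 (u, v, 1); the plate is posed in the world by
    (R_plate, t_plate); the virtual camera (K_virt, R_virt, t_virt)
    then refilms the plane."""
    # Plane-local (x, y, 1) -> world point: plate axes and origin as columns.
    M = np.column_stack((s * R_plate[:, 0], s * R_plate[:, 1], t_plate))
    # Fold world -> camera into a 3x3: the plane point is homogeneous
    # with its last coordinate fixed at 1.
    P = R_virt @ M + np.outer(t_virt, [0.0, 0.0, 1.0])
    return K_virt @ P @ np.linalg.inv(K_real)

# Example: refilm a frame from twice the distance; image points move
# halfway toward the image center, so the subject shrinks as expected.
K = intrinsics(40.0, 720, 486)
H = virtual_shooting_homography(
    K_real=K, s=1.0,
    R_plate=np.eye(3), t_plate=np.array([0.0, 0.0, 1.0]),  # plate 1 m ahead
    K_virt=K, R_virt=np.eye(3),
    t_virt=np.array([0.0, 0.0, 1.0]))                      # camera 1 m back
u, v, w = H @ np.array([100.0, 100.0, 1.0])
print(u / w, v / w)   # -> 230.0 171.5
```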

This type of principle has been used for various aspects of image compositing work. However, so far few have attempted to use a geometrically consistent image-processing approach that embraces all four parameters in Table 1. Most compositing work therefore remains a manual endeavor in which the synthesized image is gradually created while the technician watches and assesses the effects.

This article provides an overview of image compositing systems capable of using all four parameters listed in Table 1. We also examine the key features of a system provisionally called a virtual camera system.

Virtual camera systems

We went through several stages in developing a virtual camera system. Here we look at the different features of each stage and how the four parameters of Table 1 were realized.

A desktop virtual camera system5

We developed version 1 of the virtual camera as part of a larger research project to design a desktop program production (DTPP) system.6 The DTPP system's principal purpose is to create an environment that supports managing an entire program production process from planning to final editing. The desktop virtual camera system provides the studio part of the entire DTPP process.

Let me explain the desktop virtual camera system briefly. Real-time computer graphics create the studio set of a virtual studio. Actors are filmed full size in front of a chromakey blue screen, and the images are recorded on laser disk. Then the playback images from the laser disk are processed using the virtual shooting method. Thus, the images from two virtual cameras are synthesized in the system—one provides model-based computer graphics and the other provides an image-based video of the actors. The four parameters in Table 1 are realized as follows:

❚ Real camera data—As a person walks around, the camera pans to maintain a full-size video of the person. A rotary encoder mounted on the real camera head measures this panning data, and the real camera data is recorded in a computer along with the laser disk time code.

❚ Real object data—Here, the distance between the person and the real camera stays fixed, and the person is restricted to horizontal movement. This permits the person's position to be calculated from that distance and the real camera data mentioned above (see the sketch following this list). These calculations serve as the real object data.

❚ Virtual camera data—Virtual camera data is provided by a user operating a "camerawork tool."

❚ Virtual object data—The user can change the virtual actor's position (discussed in detail below) by using a mouse on a workstation.

Virtual shooting is achieved by using the above data and by image processing the image from the laser disk.
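For the real object data, the geometry in the second bullet reduces to simple trigonometry: with the camera-to-actor distance held fixed, the measured pan angle determines where the actor stands. A minimal sketch, with an assumed axis convention (y up, zero pan looking down +z) and illustrative numbers:

```python
import math

def real_object_position(camera_xyz, pan_rad, distance_m):
    """Actor position implied by a fixed camera-to-actor distance and
    the pan angle measured by the rotary encoder."""
    cx, cy, cz = camera_xyz
    return (cx + distance_m * math.sin(pan_rad),
            cy,
            cz + distance_m * math.cos(pan_rad))

# A pan of +10 degrees with the actor held 4 m from the camera:
print(real_object_position((0.0, 1.5, 0.0), math.radians(10.0), 4.0))
```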


Figure 4. Schematic of the desktop virtual camera system: a virtual studio set generator (real-time computer graphics with two changeable sets modeled in advance), two laser disk players (each holding six acts shot in advance) feeding scroll/zoom image processors, and a multilayered image synthesizer producing the output image. A top view of the floor lets actor positions be changed with the mouse, and the virtual camera is operated by panning, tilting, and zooming.

Figure 4 shows the system hardware configuration. The graphics workstation generates the set image in real time. Meanwhile, the playback images of the actor—stored on laser disks—are processed by real-time image processors. In this system, a maximum of two actors can appear on the computer graphics studio set. Finally, the computer graphics image and the processed images of the actors are synthesized by multilayer compositing to produce the final output image (see Figure 5). A regular camera head for studio work with the TV camera removed serves as the camerawork tool. Figure 6 shows the operating environment. Using this system, a user can shoot the virtual studio space in real time by operating the camera head on a desktop work space. Figure 7 shows a graphical user interface screen on a workstation. Using this interface, the user can select actors, move actors to different locations, adjust camera settings, modify computer graphics, and perform a host of other actions simply by manipulating the mouse.

Figure 5. Output images of the desktop virtual studio.
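The per-frame data flow of Figure 4 can be summarized in a few lines of Python. This is a toy structural sketch, not the broadcast hardware: the "images" are strings, and every function is a hypothetical stand-in for one of the units above (the scroll/zoom image processors and the chromakey stages of the multilayer synthesizer).

```python
import math

def virtual_shoot(frame, pan_deg, actor_pos, cam_pos):
    """Stand-in for the scroll/zoom image processors: tag the laser disk
    frame with the pan recorded at shooting time and a size scale that
    falls off with the virtual viewing distance (toy model)."""
    scale = 1.0 / max(math.dist(actor_pos, cam_pos), 1e-6)
    return f"{frame}(pan={pan_deg:+.0f}deg, scale={scale:.2f})"

def key_over(background, foreground):
    """Stand-in for one chromakey stage of the multilayer synthesizer."""
    return f"[{background} < {foreground}]"

def compose_frame(n, cam_pos, set_image, actors):
    """One output frame: the CG set plus up to two virtually shot
    actors, painted from the farthest layer to the nearest."""
    layers = [(math.inf, set_image)]
    for frames, pan_log, actor_pos in actors:
        shot = virtual_shoot(frames[n], pan_log[n], actor_pos, cam_pos)
        layers.append((math.dist(actor_pos, cam_pos), shot))
    out = None
    for _, image in sorted(layers, key=lambda t: -t[0]):
        out = image if out is None else key_over(out, image)
    return out

# Two actors played back from "laser disks", three frames each:
actors = [(["A0", "A1", "A2"], [0, 2, 4], (0.0, 0.0, 3.0)),
          (["B0", "B1", "B2"], [0, -1, -2], (1.0, 0.0, 5.0))]
print(compose_frame(1, (0.0, 1.5, 0.0), "CG set", actors))
```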

A virtual camera manipulator

In version 1 of the system, the user could pan, tilt, and zoom the virtual camera as if operating a real camera. Although the user could move the camera to the appropriate spot using the mouse, as if setting up in a real studio, the version 1 system could not accommodate continuous tracking shots.


Figure 6. Operating environment for the desktop virtual studio.

Figure 7. User interface screen for the desktop virtual studio.

The basic problem regarding 3D movement of the camera derives from the fact that the images of the actor are simple 2D surfaces. Consequently, when the virtual camera moves in 3D space, it cannot obtain proper images. Taking an extreme example, consider what happens if the virtual camera wraps around the actor from the side to the back. The actor's image is just a flat planar surface—the actor's side and back cannot be recreated if they weren't filmed originally.

Also, since the operating tool for the virtual camera is the camera head itself, the user cannot manipulate any camerawork data except pan, tilt, and zoom. This particular problem led us to develop a tool called a virtual camera manipulator, with six degrees of freedom of movement, which we connected to the virtual studio. This enables the user to perform camerawork in real time with full degrees of freedom, such as pan, tilt, roll, x, y, and z (see Figure 8).

Figure 8. Virtual camera manipulator: arm pan, link #1 and link #2 tilts, and camera pan, tilt, and roll set the camera view direction, with zoom and focus operated at the grip.

One effective solution to the basic problem of the camera's 3D movement is to use a motion control camera for the virtual camera. We will discuss this solution in the next section. Here we will address treating the actor as a 2D surface, as in version 1 of the system. Let's next consider the virtual camera system featuring the virtual camera manipulator.


We made a manipulator that enables actual camerawork for computer graphics to be performed by human hands. The manipulator has six-degrees-of-freedom movement and permits the operator to pan, tilt, roll, zoom, and position the virtual camera at any x, y, z coordinate in real time from the desktop.

Implementing such a manipulator can follow one of two basic approaches:

❚ Build a scaled-down version of a six-degrees-of-freedom camerawork mechanism on the desktop that the operator can manipulate by hand.

❚ Implement six-degrees-of-freedom camerawork with mechanisms such as those used to control a vehicle, namely a brake and accelerator that can control the manipulator's speed and direction.

In the work described here we adopted the first approach, employing the link structure illustrated in Figure 8. (We later combined these two methods; see the next section.) This six-degrees-of-freedom robot arm enables the operator to manually pan, tilt, and roll the virtual camera and position it at any x, y, z coordinate with one hand. Potentiometers mounted at each joint of the arm detect the angle of rotation, and the virtual camera manipulator uses the angle data at each joint to calculate and output six-degrees-of-freedom camera data in real time.
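That joint-to-camera calculation is ordinary forward kinematics. The sketch below assumes a simplified version of the Figure 8 geometry (two links swinging in the vertical plane selected by the arm pan, with illustrative link lengths); the actual arm's dimensions and joint layout differ.

```python
import math

def manipulator_camera_data(arm_pan, link1_tilt, link2_tilt,
                            cam_pan, cam_tilt, cam_roll,
                            l1=0.3, l2=0.3):
    """Six-degrees-of-freedom camera data from the six joint angles
    (radians) read off the potentiometers; l1, l2 are assumed link
    lengths in meters."""
    # Both links swing in the vertical plane selected by the arm pan.
    r = l1 * math.cos(link1_tilt) + l2 * math.cos(link1_tilt + link2_tilt)
    y = l1 * math.sin(link1_tilt) + l2 * math.sin(link1_tilt + link2_tilt)
    x, z = r * math.sin(arm_pan), r * math.cos(arm_pan)
    # The wrist pan adds to the arm pan to give the view direction.
    return (x, y, z), (arm_pan + cam_pan, cam_tilt, cam_roll)

pos, ptr = manipulator_camera_data(math.radians(15), math.radians(30),
                                   math.radians(-20), 0.0,
                                   math.radians(-5), 0.0)
print(pos, [math.degrees(a) for a in ptr])
```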

Figure 9. Overview of the operating environment for the virtual studio with manipulator.

By integrating the camera manipulator with the desktop virtual camera system, we developed an advanced system enabling the user to perform six-degrees-of-freedom camerawork from the desktop to create composite worlds synthesizing natural video and computer graphics. Figure 9 shows an overview of the operating environment, and Figure 10 shows an example filmed using the system.

Our experience with actually operating the system demonstrated its practicality and ease of use in filming, but nevertheless exposed a number of problems. First and foremost, in the compositing process people are represented as 2D surfaces. However, actually moving the camera position up, down, right, and left using the manipulator produced virtually no sense of unnaturalness in the composite if the range of movement was not too great. In particular, when following an actor as he walked toward the left or toward the right, the system produced a natural-looking composite. If the user comes to understand and appreciate how best to use the system (for example, by using wraparound camerawork sparingly), the system proves very practical. The system is also extremely effective for depth-direction tracking shots because almost no sense of unnaturalness is associated with movement in this direction.

Figure 10. Output images of the virtual studio with manipulator.

Next, let us consider some of the problems associated with the camera manipulator:

❚ When moving the camera in a particular direction using the articulated robot arm, the amount of resistance differs depending on the direction of movement, so a smooth transition is not always possible.

❚ Professional camera people require the camera head to have an appropriate "friction" and a "sticky" feeling for smooth camerawork. However, this is difficult to realize with the manipulator because the mechanism would be too big and complicated.

Several commercial versions of this type of manipulator are now available in the field of virtual reality, but to our knowledge they all have these same problems.


Figure 11. Principle of the motion control camera-based virtual camera system: (a) the distance between the camera and the object is less than 3 m, so the real camera and the virtual camera are at the same position; (b) the virtual camera moves out beyond 3 m, and the real-time video processor makes up the difference.


A virtual camera system using a motion control camera7

This section describes the final version of the system, in terms of supporting the maximum degrees of freedom of camera motion. Let me highlight the new virtual camera system's three most significant features:

1. The system can film an actor from any direction—from 360 degrees around the actor—while the camera pans, tilts, rolls, and moves to an x-y-z position.

2. The system can handle continuous camerawork over a wide range, from very close up to super long distance shots, in real time. Exploiting this feature, the virtual camera can move to a position far beyond the physical limitations of studio space.

3. By operating the new manipulator, the user can perform camerawork in real time.

Let me briefly describe how these features were implemented:

1. To implement the first feature, we constructed a motion-control camera (MC camera) capable of filming an actor with six-degrees-of-freedom movement (pan, tilt, roll, and x-y-z position) in real time. Placing the actor on a turntable lets the system easily film the actor from any direction.

2. Figure 11 illustrates the principle behind the second feature. When the virtual camera lies within the approximately three-meter area where the MC camera can move physically, the positions of the virtual camera and the real camera coincide, and no video processing is performed (Figure 11a). However, when the virtual camera moves outside this area, the image processor applies a geometrical transformation (that is, virtual shooting) to the real camera image to obtain an equivalent effect (Figure 11b).

3. Finally, to implement the third feature, we constructed a new type of manipulator that supports a full range of camerawork while the cameraperson observes the composite video.

Figure 12. Schematic of the motion-control camera-based virtual camera system: a graphics workstation (SGI Onyx) renders the background, the MC camera films the foreground actor on a turntable, real-time video processors apply virtual shooting, and a chromakey unit produces the output video; the virtual camera manipulator supplies the virtual camera data.

Figure 12 shows a schematic of the system.


When the distance between the camera and the object is less than 3 meters (Figure 11a), the system controls the MC camera according to the virtual camera data. When the virtual camera moves beyond 3 meters (Figure 11b), the system moves the MC camera to the position where a straight line linking the virtual camera and the actor intersects a 3-meter half sphere surrounding the actor. With this arrangement, the virtual camera does not move at a right angle to the line connecting the actor with the real camera. Consequently, occlusion does not occur because of the wraparound effect discussed earlier. To be more specific, occlusion occurs in the direction of depth: the thicker the photographed object in the depth direction, the more severe it becomes. We calculated7 that if the thickness of the object is 50 centimeters and the distance between the object and the real camera exceeds about 3.5 meters, the average occlusion becomes less than 1 pixel. In the present system, since the MC camera operates within a range of around 3 meters, occlusion in the depth direction does not become a serious problem.

Regarding the four sets of parameters required for the virtual shooting process listed in Table 1, the real camera and real object data are derived from actual spatial measurements, while the virtual camera and virtual object data come from outside the system and are provided by the user. These parameters are entered into a specially designed real-time image processor7 capable of performing perspective transformation based on the virtual shooting method.

To implement the approach illustrated in Figure 11, the system must be capable of switching between real shooting, when the virtual camera is within the movable range, and virtual shooting, when the virtual camera moves outside the movable range. Adopting the setup shown in Figure 12, just controlling the MC camera makes this switching automatic. In other words, when the virtual camera is in the movable range, the parameters of the real camera and the virtual camera become the same; the virtual shooting processing then becomes 1:1 and is identical to the actual shooting.
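The camera-control rule just described fits in a few lines. The following is a minimal sketch, not the actual control software: it clamps the requested virtual camera position to a 3 m sphere around the actor and reports whether virtual shooting must make up the difference (the real system uses a half sphere above the floor and adds safety constraints).

```python
import numpy as np

RADIUS = 3.0   # physical reach of the MC camera around the actor, meters

def mc_camera_target(virtual_cam, actor):
    """Where to drive the MC camera for a requested virtual camera
    position. Inside the sphere: go there (real shooting, no video
    processing). Outside: park on the sphere along the line of sight
    and let virtual shooting make up the remaining distance."""
    virtual_cam = np.asarray(virtual_cam, dtype=float)
    actor = np.asarray(actor, dtype=float)
    offset = virtual_cam - actor
    dist = float(np.linalg.norm(offset))
    if dist <= RADIUS:
        return virtual_cam, False                     # pure real shooting
    return actor + offset * (RADIUS / dist), True     # plus virtual shooting

# A virtual camera 12 m out is served by the MC camera parked at 3 m
# on the same line of sight:
print(mc_camera_target((0.0, 1.5, 12.0), (0.0, 1.5, 0.0)))
```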


In this system users need to perform continuous camerawork, from close-up shots to super long shots. This led us to construct a new type of manipulator, quite different from the one described earlier. Figure 13 shows what this new manipulator system looks like. High-precision rotary encoders are mounted on each axis. Data from these encoders is used to calculate pan, tilt, roll, and the x-y-z location, and the tool outputs camera data at the video rate.

Figure 13. Virtual camera manipulator, with an LCD monitor and grips providing zoom and distance zoom controls.

Another powerful capability incorporated in this tool is distance zooming. With just a flick of a rocker switch, the user can move in for close-ups or out for distance shots along the current line of sight. This lets the user position the camera at any x-y-z coordinate in the close space around the actor using the manipulator, then use the distance zooming function for more distant camerawork. With controls somewhat like operating an airplane, the system supports continuous camerawork covering the entire range from very close to long distance shots.
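Distance zooming amounts to integrating the camera position along the current view direction. A minimal sketch, with assumed units and names:

```python
import numpy as np

def distance_zoom(cam_pos, view_dir, rate_m_per_s, dt_s):
    """Slide the virtual camera along its current line of sight.
    Positive rate moves in for a close-up; negative pulls out."""
    d = np.asarray(view_dir, dtype=float)
    d = d / np.linalg.norm(d)
    return np.asarray(cam_pos, dtype=float) + rate_m_per_s * dt_s * d

# Hold the rocker switch for one second at 5 m/s toward the subject:
pos = np.array([0.0, 1.6, 10.0])
for _ in range(30):                          # thirty 1/30 s video frames
    pos = distance_zoom(pos, (0.0, 0.0, -1.0), 5.0, 1.0 / 30.0)
print(pos)                                   # -> [0.  1.6  5.]
```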

Figure 14 shows the operating environment for the system, and Figure 15 provides an example of the system's video output. For this example, we used a mannequin as the actor on a 200 x 200-meter piece of computer graphics-created ground. Naturally, the system could cover a much more extensive area.

Figure 14. Operating environment for the virtual studio using the motion control camera.

Figure 15. Output images of the virtual studio using the motion control camera.

When the system was used in real studio work, other practical problems arose. We had to construct a special infrared sensor around the performer to avoid accidentally hitting him with the camera. We also had to consider the rotation speed of the turntable in terms of safety and its inertia.


Figure 16. Video components with camerawork represented in an icon-like format: a normal virtual studio (left) and the desktop virtual camera system (right), built from laser disk, virtual shooting, computer graphics, manipulator, and composite icons.

The shape of things to come

Up to this point we have surveyed a number of systems for producing TV programs that are either available now or soon will be. This section highlights the major issues likely to emerge in the years ahead and suggests some possibilities that might open up beyond that.

Unified handling of virtual camera video materials

The virtual camera concept will undoubtedly become much more pervasive in the years ahead. All sorts of video components could be made available for image compositing applications. From the standpoint of camerawork, it's critical to develop a robust software environment that can handle these components in a simple, unified way. One approach would be to set up an icon-like format representing the camerawork attached to an image component. Figure 16 shows this format. The shaded boxes represent camerawork attached to a video component. Black triangles show the ability to output or accept camerawork. The computer graphics box, for example, accepts camerawork but has no output. The virtual shooting module works with real and virtual camerawork data. The manipulator simply outputs camerawork. This example describes a normal virtual studio and a desktop virtual camera system in the icon format.

Where the camerawork is unknown, it could be analyzed over a period of time using techniques developed in the area of computer vision. Reasonable estimates of camerawork could then be attached to the video components. With this system in effect, composite images could be created automatically with matching camerawork simply by linking the proper camerawork icons and assigning the image components to the composite layer. To change the camerawork, the user would employ a virtual shooting icon, providing it with real and virtual camerawork data.

Certainly there are limitations to the camerawork available to produce composite scenes (for example, rough picture quality due to video processing, or an unnatural image from wrapping around the subject), but you could readily conceive a scheme for creating the final video through interactive correction using a graphical user interface. This would definitely facilitate the incorporation of the virtual camera concept in commercial offline image compositing software applications (such as Flame8). It would also be very significant in terms of supporting real-time applications.
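A sketch of how such an icon format might be wired up in software, purely illustrative (the names and the two-port model are ours, loosely following Figure 16):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Component:
    """A video component icon; the flags mirror the black triangles in
    Figure 16 (can it accept and/or output camerawork?)."""
    name: str
    accepts_camerawork: bool = False
    outputs_camerawork: bool = False
    source: Optional["Component"] = None  # where its camerawork comes from

def link(consumer: Component, producer: Component) -> None:
    """Wire a camerawork output into a camerawork input."""
    assert producer.outputs_camerawork and consumer.accepts_camerawork
    consumer.source = producer

# The desktop virtual camera system, in icon form:
manipulator = Component("manipulator", outputs_camerawork=True)
cg_set = Component("computer graphics", accepts_camerawork=True)
shooting = Component("virtual shooting", accepts_camerawork=True)

link(cg_set, manipulator)    # the CG set is rendered with manipulator data
link(shooting, manipulator)  # laser disk footage is refilmed with it, too
print(cg_set.source.name, shooting.source.name)
```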

Extending the virtual camera beyond camerawork

As noted earlier regarding the examples in the "Image compositing" section, other conditions besides camerawork must be matched for the virtual camera approach to be fully usable. Certainly much more study must focus on technologies for processing original video materials to make them virtual in other conditions, such as lighting and filming conditions.

One example would be a technique for altering the direction of light. In the model-based world of computer graphics (CG), light, filming conditions, and other parameters have been studied intensively with the aim of producing photorealistic CG images. The image-based approach remains at the initial research stage in addressing such issues, but one possible scheme would be to estimate the shape of the object to be filmed, then alter the lighting conditions based on the shape data. The interaction between photography and computer graphics images will also continue to be a major issue.


Conversion from image-based to model-based images

Although this is an obvious point, there is a limit to how much can be accomplished by further processing video once filmed. Therefore, the idea is naturally starting to emerge of replacing a person, for example, with a perfect computer graphics-generated replica of that person. In fact, the technique—called substituting a virtual actor—has already begun to see use. The widespread application of range sensors and motion-capture techniques in video production today essentially equals this virtual actor approach. However, when a computer graphics-generated image is substituted for an actually filmed object today, the method involves producing only the video images needed, and only the required parts are rendered in detail. This method requires more thorough research.

Analyzing and using professional camerawork

Dedicated manipulators use virtual camera data that human operators produce through manual methods. But there is a limit to how much camerawork human professionals can do manually. Using the motion-control camera system described earlier, for example, the range of camerawork is so extensive that it's nearly impossible to finish the camerawork in one session in real time. It's therefore essential to adopt some of the methods used in available computer graphics tools for producing camerawork, particularly the approach of producing interconnected camerawork by setting multiple control points and connecting the points with a spline curve, as the sketch below illustrates. However, let me also caution that the human touch is starting to disappear from much of the camerawork you see in the computer graphics world today. It's therefore imperative that we continue to analyze the camerawork of human professionals and try to identify the rules inherent in their work. More research should focus on exploiting the rules of good professional camerawork to develop a semiautomatic camerawork production system.
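Spline-interpolated camerawork of the kind CG animation tools provide can be sketched in a few lines. Here is an illustrative Catmull-Rom interpolation of camera positions through control points (real tools also interpolate orientation, zoom, and timing):

```python
import numpy as np

def catmull_rom(points, samples_per_span=30):
    """Smooth camera path through the given control points."""
    pts = np.asarray(points, dtype=float)
    # Duplicate the end points so the curve passes through them.
    pts = np.vstack([pts[0], pts, pts[-1]])
    path = []
    for i in range(1, len(pts) - 2):
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[i + 1], pts[i + 2]
        for t in np.linspace(0.0, 1.0, samples_per_span, endpoint=False):
            path.append(0.5 * ((2 * p1)
                               + (-p0 + p2) * t
                               + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t**2
                               + (-p0 + 3 * p1 - 3 * p2 + p3) * t**3))
    path.append(pts[-2])
    return np.array(path)

# A dolly move through four key positions (x, y, z):
keys = [(0, 1.5, 8), (2, 1.6, 5), (2, 1.8, 2), (0, 2.0, 1)]
print(catmull_rom(keys).shape)   # (91, 3): ~30 path samples per span
```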
This imperative that we continue to analyze the cam- suggests that it might be possible to implement a erawork of human professionals and try to iden- semiautomatic compositing and editing method tify the rules inherent in their work. More by attaching attribute data to video materials and research should focus on exploiting the rules of then searching for links between the attributes. good professional camerawork to develop a semi- This approach raises a host of interesting issues automatic camerawork production system. calling for further research: What attributes would

Integration of spatial image compositing and temporal image editing, and automatic program production

As noted earlier, in addition to camera data, a virtual camera must also handle lighting conditions, filming conditions, and other parameters. However, broadening our interpretation, let's consider what might be achieved by extending this approach to the semantic content of video materials. Image compositing generally involves arranging video materials in the spatial direction, while video editing clearly involves arranging video materials temporally. You might say that producing a program involves arranging video materials along both a spatial and a time dimension. The issues addressed in this article focused narrowly on camerawork in the spatial direction, but a number of very interesting themes emerge if we broaden our interpretation. For instance, if you consider video editing along the time axis, the significance of lining up one particular clip next to another depends not only on the continuity of semantic meaning between the two clips; it also depends on, for example, the pattern of camerawork between the clips and the continuity of sound between them. In other words, data inherent in the material contribute to the editing process. This suggests that it might be possible to implement a semiautomatic compositing and editing method by attaching attribute data to video materials and then searching for links between the attributes. This approach raises a host of interesting issues calling for further research: What attributes would be selected? How would the attributes be attached to the video? How would the attributes actually be used? Eventually, this approach might even enable us to describe the structure of TV programs. Further, if we could identify the structures of programs, this would introduce the possibility of a largely automated process of program production by creating programs with analogous structures but different contents.9
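One way to picture attribute-driven editing is a continuity score over attached attributes, maximized when choosing the next clip. This is a deliberately naive sketch of the idea, with invented attribute fields and weights:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    """A video clip with attached attribute data (fields illustrative)."""
    name: str
    subject: str       # semantic content
    camerawork: str    # e.g., "pan-left", "static", "zoom-in"
    sound: str         # e.g., "street", "silence", "music"

def cut_score(a: Clip, b: Clip) -> int:
    """Continuity score for cutting from clip a to clip b: reward
    matching attributes, as a semiautomatic editor might."""
    return (2 * (a.subject == b.subject)
            + (a.camerawork == b.camerawork)
            + (a.sound == b.sound))

def best_next(current: Clip, candidates: list) -> Clip:
    return max(candidates, key=lambda c: cut_score(current, c))

clips = [Clip("c1", "runner", "pan-left", "street"),
         Clip("c2", "crowd", "static", "street"),
         Clip("c3", "runner", "static", "street")]
print(best_next(clips[0], clips[1:]).name)  # -> c3: subject and sound carry over
```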


Conclusion

By matching the camerawork across various video components and compositing the images, virtual camera systems can create realistic synthetic scenes that look as if they had actually been filmed on location. From a hardware point of view, a virtual camera system is achieved by combining a real-time computer graphics technique and a real-time digital video processing technique. In addition to camerawork, other elements include lighting, filming conditions, and the semantic content of video. Matching or harmonizing these parameters across different video materials could open up a greatly expanded role for virtual camera systems in the future. By managing the various types of attribute data attached to video materials, the full implementation of such a method would facilitate image compositing and promote the development of a sophisticated system for editing and producing programs.

Regarding the quality of synthesized images that integrate computer graphics and real images, many recent special-effects films are really quite incredible. However, movies are not produced in real time, and a major portion of this work still involves laborious frame-by-frame manual effort. A number of good software applications on the market today (such as Flame) cater to these high-end requirements, vastly improving production efficiency by applying semiautomatic functions to this offline frame-by-frame manual work.

The three types of virtual camera systems profiled in this article all assume real-time performance capability. Broadcasting is one obvious area where real-time performance would be a significant sales point. The end goal of virtual camera systems is to seamlessly integrate computer graphics and video to produce a single integrated virtual world in real time. Numerous challenges have yet to be overcome before this capability will be freely accessible. We at NHK hope that this field will continue to prosper and expand in the years ahead. MM

Acknowledgment

Thanks go to the members of the Program Production Technology Research Division and the Recording and Mechanical Engineering Research Division of NHK research laboratories.

References

1. T. Yamashita and M. Hayashi, "From Synthevision to an Electronic Set," Proc. Digimedia Conf., Digimedia Secretariat, Geneva, Apr. 1995, pp. 51-57.

2. K. Haseba et al., "Real-Time Compositing System of a Real Camera and a Computer Graphic Image," Proc. Int'l Broadcasting Convention, Amsterdam, Sept. 1994.

3. L. McMillan and G. Bishop, "Plenoptic Modeling: An Image-Based Rendering System," Computer Graphics (Proc. Siggraph 95), Aug. 1995, pp. 39-46.

4. S.E. Chen, "QuickTime VR—An Image-Based Approach to Virtual Environment Navigation," Computer Graphics (Proc. Siggraph 95), Aug. 1995, pp. 29-38.

5. M. Hayashi et al., "Desktop Virtual Studio System," IEEE Trans. on Broadcasting, Vol. 42, No. 4, Sept. 1996, pp. 278-284.

6. K. Enami, K. Fukui, and N. Yagi, "Program Production in the Age of Multimedia—DTPP: Desktop Program Production," IEICE Trans. Inf. & Syst., Vol. E79-D, No. 6, June 1996.

7. M. Hayashi, K. Fukui, and Y. Ito, "Image Compositing System Capable of Long-Range Camera Movement," Proc. ACM Multimedia 96, ACM Press, New York, Nov. 1996, pp. 153-161.

8. Flame, Discreet Logic, http://www.future.com.au/flame.html.

9. M. Hayashi, "Automatic Production of TV Program from Script—A Proposal of TVML," S4-3, Proc. ITE Annual Convention, Institute of Image Information and Television Engineers, Tokyo, July 1996, pp. 589-592.

Masaki Hayashi works for NHK Science and Technical Research Laboratories in the Multimedia Services Research Division. He has engaged in research on image processing, computer graphics, image compositing systems, and virtual studios. He is currently involved in research on automatic TV program generation from text-based scripts written using TVML (TV Program Making Language) by integrating real-time computer graphics, voice synthesizing, and multimedia computing techniques (http://www.strl.nhk.or.jp/TVML/index.html). He graduated from Tokyo Institute of Technology in 1981 in electronics engineering, earning his Masters in 1983.

Readers may contact Hayashi at Multimedia Services Research Division, NHK Science and Technical Research Laboratories, 1-10-11, Kinuta, Setagaya-ku, Tokyo, Japan, 157, e-mail [email protected].
