Augmented reality tangible interfaces for CAD design review

by

Ronald Sidharta

A thesis submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Major: Human Computer Interaction

Program of Study Committee: Adrian Sannier, Co-major Professor; Carolina Cruz-Neira, Co-major Professor; Dirk Reiners; Ying Cai

Iowa State University

Ames, Iowa

2005

Copyright © Ronald Sidharta, 2005. All rights reserved.

Graduate College Iowa State University

This is to certify that the master's thesis of

Ronald Sidharta has met the thesis requirements of Iowa State University

Signatures have been redacted for privacy

TABLE OF CONTENTS

ABSTRACT
MOTIVATION
CAD'S ADVANTAGE AND EVOLUTION
RESEARCH PROBLEM
RELATED RESEARCH
TANGIBLE USER INTERFACE
    Advantages of Tangible User Interface (TUI)
AUGMENTED REALITY
    Augmented Reality For Collaborative Work
AUGMENTED TANGIBLE INTERFACE
IMPLEMENTATION
STATEMENT OF PURPOSE
GENERAL SYSTEM OVERVIEW
    Computer
    HMD
    Camera and Microphone
IMPLEMENTATION DETAILS
    Image Analyzer
    Speech Module
    Scene Graph Module
    Interaction Module
    Inter-ATI Interactions
DISCUSSION
    3D Browsing
    3D Positioning/Orientation
    3D Assembly/Disassembly
HCI CONSIDERATIONS
    Augmented Reality Feasibility
    Tangible User Interface Feasibility
FUTURE WORKS
REFERENCES
ACKNOWLEDGEMENTS

ABSTRACT

Today's engineering design is an iterative process. It is a process with many cascading effects; a change in one facet of a design often has effects that ripple throughout the rest of the process. It is also a difficult process, one that requires the close cooperation of people from a variety of different backgrounds and training. Creating designs digitally has allowed engineers and designers to harness the processing power of computers to help them create, manipulate, update, and distribute designs faster and more efficiently than traditional, paper-based processes. Today's designers and engineers rely on Computer Aided Design (CAD) programs to help them finish a design efficiently and accurately, reducing cost and increasing productivity in the midst of increasing competition.

CAD programs have been in development for almost forty years. In the beginning, CAD was synonymous with the electronic creation and storage of 2D drawings, a replacement for the traditional draftsmen's table. Though CAD has evolved over the ensuing years to a primarily 3D tool, its 2D roots are clearly evident in the user interface. The creation of 2D drawings maps naturally to the interfaces provided by 2D desktop computers, but as the role of CAD programs has become increasingly three dimensional, the 2D analogy is extended beyond the breaking point. Unlike 2D drawing, the desktop metaphor does not provide an intuitive mechanism for the creation and manipulation of 3D objects. In order to adapt the 2D desktop to create and manage 3D objects, new interface methodologies and special purpose widgets were invented to map 2D actions into 3D, mappings that require significant training for users and increase their cognitive load.

Collaborative work has become an increasingly crucial aspect of engineering design, as teams grow larger and more diverse, spread across different locations, even around the world. This increasing emphasis on collaboration requires new tools to facilitate design reviews among team members with differing backgrounds. The diversity of training requires that these new collaborative tools be transparent, with little or no learning curve required of their casual users.

Consider the members of a modern design team as they meet to determine if there is a conflict between two mechanical parts that join together as part of a design. In this typical design review scenario, with today's tools, a group of several users might find themselves crowded together in front of a computer desktop, while a designated, trained operator runs the CAD system to show the group the design of the mating parts. The complexity of the CAD system's interface forms a barrier to the design, preventing the active participation of the team members. Communication is bottlenecked by the operator, restricting the flow of ideas.

The goal of the research presented in this thesis is to consider an alternative to 2D desktop based interfaces for design review, an alternative that reduces the users' cognitive load while selecting and manipulating 3D objects during the design review process. In this thesis, we identify three specific interaction challenges common to design reviews: 3D browsing, 3D positioning/orientation, and 3D assembly/disassembly. We then describe a new set of Augmented Tangible Interfaces (ATI) designed to more naturally support these three tasks. ATI uses augmented reality techniques to allow a computer to recognize a set of tangible objects and generate virtual graphics that integrate with a user's vision, letting those users "see" and "handle" those virtual 3D objects naturally, as if they were real, physical objects.

The ATI interface set we developed includes three new interface items: the Card Stack, the Cube, and the Table. We used these items to demonstrate a more natural interface for the viewing and assembly of a set of 3D parts. The Card Stack interface is a stack of three cards that, used together, allow a user to easily browse through any number of 3D objects, just as though the user were looking through a stack of family photos. The Cube allows a user to easily pick up and "handle" a virtual object with six degrees of freedom. The Table interface allows a user to place multiple objects on a common surface, facilitating object exchange between participants and allowing for consistent collaborative viewing. These interface items combine to form a natural environment for examining the relationships between the parts of a prospective design -- creating a simple, transparent environment for discussion and review.

In this thesis, we discuss the research problem and the related research that motivated us to develop this new ATI. We then discuss the detailed implementation of our system, both hardware and software. In the HCI considerations chapter we discuss the usability of the system, and we conclude with future work.

Based on our experience, we believe that augmented reality based tangible interfaces are a promising area of further study, particularly in their application to the manipulation and evaluation of 3D designs. Furthermore, we conclude that augmented reality has a powerful potential for Computer Supported Collaborative Work (CSCW) despite some limitations that exist today.

MOTIVATION

CAD's Advantage and Evolution

Computer Aided Design (CAD) is the application of computer-based tools to assist engineers and architects in the creation, modification, review, and preservation of product designs. It is widely used in architectural and engineering projects, including mechanical, architectural, and electrical drafting, as well as the 3D modeling of complex assemblies.

Today's CAD systems are the result of nearly 40 years of research and commercial development. Most agree that the origin of CAD can be traced to Ivan Sutherland's work at MIT in the early 1960s. In 1961, Sutherland began the PhD research that would ultimately be known as SKETCHPAD, arguably the first electronic 2D drawing program [11].

SKETCHPAD used a novel device, a light pen combined with a panel of buttons, to allow a designer to create 2D shapes directly on a computer display. Sutherland's light pen was an early predecessor of today's mouse; however, unlike today's mice, the SKETCHPAD light pen was used to draw directly on the computer screen, allowing a user to create and interact with 2D drawings as though drawing with light. Because of these features, SKETCHPAD is often cited as the first computer program to use a graphical user interface (GUI).

In his thesis, Sutherland pointed out two of the principal advantages of digital drawing programs, i.e., that such programs make it "easy to draw highly repetitive or highly accurate drawings and to change drawings previously drawn with it" [31]. Like today's CAD programs, SKETCHPAD was able to store previous drawings for later retrieval, using magnetic tape as the recording medium. SKETCHPAD let users specify the exact dimensions of a drawing, enabling them to generate accurate plots of the designs using a digital plotter. A user of SKETCHPAD could also specify relationships between objects in a drawing, so that a change in one object's property would affect every object in the drawing. Sutherland demonstrated this feature with a study of a mechanical linkage.

Though only an early prototype of today's CAD systems, Sutherland's SKETCHPAD already demonstrated all of the basic benefits that a CAD system has over traditional paper-based drafting.

1. Ease of Creation

Object creation is easier with CAD because CAD facilitates reuse. Basic shapes can be easily recreated and modified, and the many tools and libraries of objects available in CAD programs help design engineers utilize previous designs as the basis for new ones. For example, a design engineer can start with a 2D outline, and then extrude that outline along a 3D space curve to easily create a new 3D object. Using traditional drafting tools to create such complex shapes requires significantly greater time and skill.

2. Increased Precision

Precision in object dimension is a critical aspect of the design of mechanical and electrical components. Unlike traditional drawings, CAD designs allow a part to be created with precise dimensions that can drive the creation of a series of drawings, at varying scales and views, all of which maintain the precision of the original definition.

3. Quicker and easier revision

Using traditional methods, design revision means redrafting. When a design change has a cascading effect, every single design drawing may require redrafting. With CAD, design changes can be made once, in place. The CAD system can then automatically update other objects affected by the design change, eliminating the need for manual updating. CAD thus allows many more design alternatives to be examined in a shorter period of time.

4. Ability to analyze and perform virtual tests

Unlike traditional methods, CAD systems allow designers to specify object properties and define relationships between objects that can then be used to drive automated analysis and simulation. For example, a designer can evaluate the performance of a mechanical linkage design virtually, without having to produce the physical objects. The ability to do detailed analysis and simulation without the use of physical prototypes decreases the cost of evaluation, allows more detailed design iteration, and produces better results.

5. Electronic storage and distribution

In order to drive production, designs must be communicated. Today's designs are typically exchanged between many different teams throughout an extended enterprise, across different locations, often around the globe. These teams include designers, engineers, sales/marketing professionals, manufacturers, and customers, all with different backgrounds, training, even language. Unlike physical drawings, digital designs created with CAD systems can be easily distributed electronically, in a variety of formats and languages.

SKETCHPAD was revolutionary in its time, and in its design we can see that many of its features shaped not only today's CAD programs, but the very concept of a computer desktop. This evolution has shaped not only the features these systems provide, but also the interfaces used to access these features. At their core, even today's most sophisticated 3D CAD systems have interfaces that evolved from the fundamentally 2D, drawing-oriented interfaces derived from Sutherland's SKETCHPAD. This evolution -- driven by changes in computer hardware, operating systems, and interface devices -- has shaped the CAD systems we know today.

At the same time as SKETCHPAD, in the early 1960s, large aerospace and automobile companies also began to develop internal CAD software. For example, at General Motors, Hanratty et al. were responsible for the development of DAC (Designs Automated by Computer), a 2D CAD system used to automate repetitive drafting chores [10]. The light pen and drawing tablet were the principal means of geometric data entry in these early CAD systems. This would ultimately change due to the work of Engelbart [32], who first demonstrated the mouse at the Stanford Research Institute in 1968. In his demonstration, the mouse was used along with the keyboard to directly interact with a word processor prototype. Mouse-like devices did not begin to replace pens in CAD until the late 1980s, but they have since become the standard device for geometric input as well as for selection.

In the 1970s, companies such as Applicon and MCS had begun to offer commercial CAD systems. These CAD systems were still generally used as electronic 2D drafting tools; however, as computer hardware costs declined and computing power increased, workstations capable of real time computer graphics became more affordable. The spread of these workstations increased access to the electronic designs, making the electronic form of the design as important as the paper drawings they could create. The CAD system interface was also influenced by the development of the now ubiquitous WIMP (Windows, Icons, Menus, and Pull-downs) paradigm. WIMP made its debut in 1973, when researchers at Xerox PARC introduced the Xerox Alto computer. This system demonstrated most of Engelbart's ideas about direct manipulation, and also featured a bit-mapped display with graphical windows. Interactions were mediated through a three-button mouse and a keyboard.

By the 1980s, Personal Computers (PCs) had become economically viable, and with them came AutoCAD v.1, by Autodesk. The first CAD system to run on a PC rather than a mainframe, AutoCAD showed that distributed computing would be the future of CAD. On the user interface side, the Apple Macintosh's commercial success in 1984 ensured the success of the WIMP paradigm. Pro/Engineer, released by Parametric Technology Corporation (PTC) in 1987, was the first CAD system to combine 2D and 3D modeling capability with a WIMP interface. All the other CAD systems quickly followed. "Literally overnight Pro/Engineer made the user-interfaces of the other vendors' CAD software programs obsolete" [11].

In the 1990s, Microsoft's Windows OS extended the ubiquity of the WIMP paradigm; consequently, the majority of CAD developers adapted their systems to run on PCs and invented a variety of extensions and interface widgets to allow users to use the 2D mouse and keyboard to indirectly control and position 3D objects.

The standard 2D interface, a combination of keyboard, mouse, and computer monitor, has a strong correlation with the natural interfaces for 2D drafting, photo editing, and document editing. Similarly, to the extent that 3D object creation is driven by the creation of 2D templates and cross sections, the 2D interface is a natural choice for 3D object creation and editing. However, we contend that the application of the 2D interface to 3D object manipulation is an artifact of CAD's evolution rather than a deliberate and considered design choice [16]. In order to adapt the 2D interface for 3D interaction, many new interface metaphors (e.g., skitters and jacks [2], snap-dragging [3]) have been proposed and implemented. While many of these metaphors are in use today, none of them correlate to the natural metaphors of object manipulation. As a result, the present generation of 3D object positioning and orientation interfaces is not intuitive and has a steep learning curve.

In the next section, we discuss the challenges of using 2D interfaces to interact with 3D objects during a design review.

Research Problem

As concurrent design becomes increasingly common, and design teams grow larger and more diverse, today's design process requires teams from different backgrounds to collaborate and to communicate about designs collectively through design review meetings. These teams may include design engineers, marketing representatives, manufacturing engineers, and even potential customers. At one time, such meetings were conducted exclusively using a combination of paper drawings and physical prototypes. Even in the age of CAD, many such meetings still include these elements. But with the advent of CAD systems, a design review can now include interactive visualization of 3D models on computer screens or projection displays.

The first step when using these displays for review is model selection. The group must browse through a database containing a set of 3D models, in order to isolate the particular models they need to review. If this is done using a typical CAD or 3D visualization program, this browsing process will involve opening up some sort of file navigator or database browser. The group then chooses the desired models from a list of files that may or may not be accompanied by their corresponding thumbnail photographs. Browsing by filename alone can be extremely difficult, since there are typically a large number of part files, each with several revisions. Thumbnails can simplify this process by providing a 2D preview of the model. However, thumbnails have their limitations as well. A static 2D picture has limited degrees of freedom when representing a 3D object.

Selecting objects in these ways is unique to the 2D computer desktop. It does not have a natural analogy in the real world. If the parts existed, or had physical prototypes, they would likely be spread out on a table for people to pick up and look at. Object selection mediated by names and pictures is an artifact of the CAD process. It is not intrinsically desirable. Rather, this form of access is an outgrowth of a limitation of the 2D computer interfaces developed to create and store the designs.

The next step after successfully loading the 3D parts is to allow team members to interact with those parts -- pointing out particular features, orienting and positioning them relative to one another. In the past, physical prototypes of the designed parts were often created, to allow team members to review and evaluate them. Interaction with these physical prototypes is quite natural. No one has to teach people how to interact with physical objects. The team members naturally handle the objects, turning them about in their hands, handing them back and forth to one another or arranging or assembling them together to evaluate their worthiness.

But creating physical prototypes is expensive; one of the principal arguments for the use of CAD has been the concept of "virtual prototyping", the use of virtual objects in place of physical prototypes. However, interacting with virtual objects that exist only in the CAD system is by no means natural. It is bounded by the limitations of an indirect, 2D interface.

Unlike the direct manipulation of physical objects, using CAD to "manipulate" virtual 3D objects requires a trained, skilled operator to use the mouse and keyboard to interact with the interface items that then cause the 3D objects to appear to move and rotate behind the monitor glass. This indirect, non-intuitive operation can sometimes create confusion or ambiguity. Typically only one team member at a time can manipulate the CAD display, and many of the members of the team may not have the skills necessary to operate the system.

The unnatural interface creates a barrier, limiting the flow of ideas and the ability of the team to completely examine the design.

Interacting with virtual 3D objects with 2D interfaces is cumbersome, and this is exacerbated when the user must interact with more than one object. For example in the assembly process, several 3D parts, perhaps designed by different teams, must be put together to form a single assembly. Performing assembly using a 2D interface bears little resemblance to an actual physical assembly process. In place of the natural positioning of objects relative to one another using motions of the hands and arms, a complex sequence of learned mouse and keyboard operations must be performed to adjust viewpoints, position virtual objects in the desired positions, join them, and then view the partially or fully assembled object for interference inspection.

This thesis proposes an alternative to the traditional, 2D CAD system interface for purposes of design review. This alternative is a break with the traditional mouse and keyboard paradigm, an attempt to create an interface for virtual designs that simulates the way a group of designers naturally interact with a set of physical prototypes. In place of indirect 2D interface widgets, we want to create something akin to a physical object, an interface device that fuses input and output, that can be handled, passed back and forth, inspected visually and positioned relative to other objects. At the same time, we want to retain the advantages that CAD models have over physical prototypes -- the decreased cost, decreased creation time, and increased flexibility that CAD systems provide. We want a system that allows virtual objects to be handled as much like real objects as possible, allowing people to work with them naturally with an interface that has almost no learning curve.

In considering this natural interface, we will focus on the several tasks identified above that characterize a typical design review. These three critical tasks are:

1. 3D browsing: the selection of one or more virtual objects for detailed consideration from among a larger set

2. 3D positioning and orientation: the six-degree-of-freedom translation and orientation of virtual parts relative to one another

3. 3D assembly/disassembly: the joining of virtual parts to form complex assemblies, or the deconstruction of more complicated parts into their constituent sub-assemblies.

The remainder of this thesis provides a detailed description of the design and implementation of one such "natural" interface, based on a coordinated set of Augmented Tangible Interfaces (ATI) that allow users to easily accomplish the three tasks listed above. In the next chapter, we provide some background on tangible interfaces and present some related research using augmented reality to create malleable but tangible object manipulation interfaces in several other application areas. In chapter three, we describe the implementation details of the ATI for design review that we have developed. Finally, in chapter four, we discuss the evaluation of this system, and we conclude with an outline of future work in chapter five.

RELATED RESEARCH

Tangible User Interface

Ishii described TUIs as interfaces that couple digital information to physical objects and environments [18]. Furthermore, he explained that tangible interfaces link physical objects with digital information, such as graphics or audio, to yield interactive systems that are mediated by computers, but generally not identified as "computers" per se [17]. In other words, a TUI provides a seamless integration between representation and control in interface design. To further explain this, let us consider the abacus as an example.

Ishii described the abacus as the ideal prototype of a tangible interface. An abacus is a traditional artifact used in Eastern Asian cultures to assist in arithmetic tasks. It uses beads on frames to represent numerical values and operations. From the HCI point of view, the abacus does not distinguish between input and output. The beads are the physical representation of numerical values, but simultaneously, they are also the physical control to manipulate the numerical values. The beads are the input as well as the output.

In contrast, today's computer interfaces typically separate the input domain from the output domain. For example, the result of manipulating the mouse is only observable in a separate domain, the monitor. The TUI paradigm is an attempt to bring the input and output domains closer to one another. The closer association between the input and output domains decreases the cognitive load of operations; therefore, it is more intuitive [13]. So, the ideal TUI should represent both input and output [17].

In the following sections, we consider some examples of early TUIs to gain insight into their development.

Bricks

In 1995, Fitzmaurice and Ishii introduced the concept of graspable user interfaces. To illustrate their concept, they described a physical interface called a brick, a LEGO-sized cube that was coupled with virtual objects in a computer application in order to control those virtual objects [14].

They illustrated the use of these bricks as an interface in a floor planning application. Placing a physical brick on top of a horizontal display surface caused the virtual furniture in the display to become "attached" to the brick (see figure 1.a). To arrange the virtual furniture's location and orientation, the user manipulated the position and the orientation of the brick, and the attached virtual furniture moved accordingly. Furthermore, because the bricks are physical objects, the user could use both of his/her arms to manipulate several bricks at the same time in a rapid fashion.

Figure 1: Floor planner application with bricks (left); spline curve creation tool (right). (Image is reprinted with permission from [14])

They also described the use of the brick as a drawing and transformation tool. For example, bricks were used to represent the control points of a spline curve. Physical manipulation of the bricks allowed a user to physically control the shape of the curve, much like manipulating a malleable metal rod (see figure 1.b).

In applying their concept, Fitzmaurice et al. implemented "GraspDraw", a simple drawing application that uses two bricks (Ascension Flock of Birds magnetic trackers) as input devices. The application allows the user to create simple 2D primitives such as rectangles, lines, and circles by utilizing one brick as an anchor and the other brick as an actuator. For example, to create a square of a certain size, the user anchors one brick at the bottom left corner of the square, while the other brick acts as an actuator to control the upper right corner of the square. With this two-handed interaction, the user can both position and orient the shape simultaneously. To assign color to the virtual object, the user uses the inkwell metaphor, dunking a brick in a colored physical compartment to assign that color to the brick.

Figure 2: GraspDraw uses a rear-projection system as a drawing table and two magnetic trackers for input. Physical compartments providing various colors are located above the table. (Image is reprinted with permission from [14])

ToonTown

ToonTown is a system that utilizes toy action figures' positions on a tray to control the audio level of each participant in a teleconference application [29]. Each action figure's position represents a remote participant's position relative to the local user. By changing an action figure's physical position on the tray, the audio levels of the remote participant represented by that figure are adjusted. For example, if an action figure representing a remote user, John, is put farther away on the tray, the audio level of John in the local user's system will be lowered. If the action figure is positioned to the left, the audio of John will be panned to the left.

The initial purpose of ToonTown was to investigate improvements to the traditional 2D audio interface, which used knobs and faders to control the audio level of each remote participant. With the traditional interface, called Somewire, the local user had to change the volume level and the pan level of each remote participant separately. ToonTown was a successful alternative to Somewire's interface. Using ToonTown's system, the local user could use both hands to simultaneously adjust the volume and pan levels of several remote participants.

Figure 3: ToonTown (left) is a more intuitive audio-level control system compared to the previous 2D GUI interface (right). (Image is reprinted with permission from [29])

Triangles

The Triangles system [15] is a set of identical flat triangular tiles that can be connected together to convey meaning. Each plastic triangle has a microprocessor and a magnetic edge connector. The connectors let electric current flow, allowing the triangles to communicate with each other and relay the constructed configuration back to the host computer. Specific configurations of the triangles are used to trigger application events.

A triangular shape was chosen for the tiles because it facilitates the construction of generic objects, ranging from 2-dimensional objects (such as squares, L-shaped objects, and trapezoids) to 3-dimensional objects (such as cubes).

This system has been applied to several non-linear story-telling applications [15]. We will take the Galapagos! project as an example (see figure 4). Each triangular tile in the Galapagos! project is painted with half of an image of the characters, places, or events related to the story. The user must connect two corresponding triangles to complete an image. Upon a correct connection of two triangles, the host computer's screen shows a webpage that displays the story corresponding to the image on the connected halves. The progression of the story is determined by the order and the shape of the triangles constructed by the user.

Figure 4: Galapagos!, a tangible story-telling system. (Image is reprinted with permission from [15])

Advantages of Tangible User Interface (TUI)

Bricks, ToonTown, and Triangles, described in the previous sections, demonstrate the use of tangible interfaces; furthermore, they show that in many applications it is more appropriate to use a TUI than a 2D interface. This is because TUIs use physical objects, which have affordances that cannot be replaced by 2D interfaces. In this section, we will examine the advantages of using TUIs.

Fitzmaurice et al. categorized input devices as being space-multiplexed or time-multiplexed [14]. Space-multiplexed inputs, he explained, have a dedicated transducer for each function. For example, an automobile has a brake, a clutch, and a steering wheel, each of which controls a single specific task. Time-multiplexed inputs, on the other hand, use a single transducer to control multiple different tasks. The mouse is an example of a time-multiplexed input. Unlike space-multiplexed input, time-multiplexed input allows only one task to be controlled at a time, resulting in interactions that are sequential and mutually exclusive in nature. TUIs are powerful because they provide space-multiplexed input devices, allowing several interaction tasks to be performed in parallel.

In 2000, Ishii outlined four major advantages that interfaces based on physical objects have over traditional user interfaces [17]:

1. Spatial

Physical objects are spatial in nature, allowing interface designers to leverage the orientations and positions of physical objects as interface parameters. For example, the GraspDraw system uses two physical cubes as the control devices. By manipulating the cubes' spatial locations, the user manipulates the virtual drawing associated with the cubes. Also, by leveraging spatial persistence, users benefit from spatial reasoning and muscle memory.

2. Relational

Physical objects or tokens [17] can have relational connections. A relational connection maps the logical relationship between tokens onto more abstract computational interpretations. For instance, in the GraspDraw system, there is an "inkwell" for the bricks to "pick up" a color from. If a brick is dunked in that inkwell, the brick will paint that color onto the virtual drawings. The relationship defined here is that an inkwell is a place to get a color.

3. Constructive

Physical objects can be connected or assembled together to provide meaning. An example of a constructive system is the Triangles project [15] described in the previous section. In that system, the structure of the triangular tiles is relayed to the host computer as the input for the application.

4. Hybrid

The spatial, relational, and constructive properties of physical objects are not mutually exclusive. TUIs can leverage one or more of these properties to better map user interactions. An application of the Triangles system, the Galapagos! story-telling application [15], encourages the user to construct two related triangular tiles before presenting a corresponding story on the computer screen.

Based on these advantages, TUIs show promise as interfaces in a variety of application domains. However, we believe that TUIs are particularly valuable as interfaces in a design review context. Fundamentally, design reviews are discussions about objects and their relationships with one another. Discussions about the relationships between physical objects are natural -- they feel unmediated. This is one of the rationales behind the use of physical prototypes to propel design discussions. Real, physical objects play a crucial role in collaborative meetings, beyond simply providing an image of a particular design. Billinghurst [6] found that real objects helped collaborative activity by creating reference frames for communication, and by altering the dynamics of interactions.

A TUI for design review promises to provide this unmediated feeling that real objects give to a discussion, but for objects which are still in the design stage, and are yet to be physically realized. Unlike today's "virtual prototyping", the goal for our TUI is to bring the virtual objects out from behind the monitor glass, to make them behave like real objects, but without the expense of fashioning them from physical materials.

Augmented Reality

We have discussed the advantages of physical objects as interfaces -- their relational and spatial relationships, and their constructive meanings. Despite these benefits, there is a drawback. Physical objects lack the infinite malleability of virtual ones. When the objects being manipulated are static, as in ToonTown, or confined to a 2D domain, as in GraspDraw, physical interfaces can be easily realized. In design, where the objects to be manipulated have not yet been created, or are subject to change, associating particular physical objects with them is problematic. Further, we ideally want to be able to handle the objects in true 3-space, away from the artificially constraining 2D glass of the monitor. Some other form of display is required to allow us to easily depict the state of digital data through visual cues directly associated with the physical interface objects.

Augmented reality (AR) provides a method for overcoming this limitation, by enhancing the display possibilities through the use of virtual image overlays. We call the combination of TUI and AR an Augmented Tangible Interface (ATI).

The term Augmented Reality (AR) was first coined in 1992 by researchers at Boeing [12], who developed an AR system to help workers in a Boeing factory assemble wiring harnesses. Unlike users of Virtual Reality (VR) systems, who are completely immersed in a virtual environment, AR users see the virtual objects and the real world coexisting in the same space. It is the goal of AR to supplement reality rather than to replace it [1].

Milgram [23] described the relationship between VR and AR in his Reality-Virtuality Continuum. In the continuum, the real world lies at the left pole, VR lies at the right pole, while AR lies in the middle. We can think of AR as the middle ground between the real and the virtual.

Figure 5: Milgram's Reality-Virtuality Continuum, spanning from the Real Environment through Augmented Reality (AR) and Augmented Virtuality (AV) to the Virtual Environment.

According to Azuma [1], one of the aspects that characterize an AR system is that AR combines the real and the virtual. There are many ways to combine the real and virtual, and it is not strictly limited to visual augmentation. For example, audio augmentation can also be considered augmented reality. But in this thesis we will focus on visual augmentation.

In order to augment the human visual sense, we need to use a physical display device that allows us to combine real and virtual imagery and present it to our user. Many forms of video display suffice, from Head Mounted Displays (HMDs) and portable displays (such as PDAs) to monitors and projectors. The HMD is a common choice for AR because it is portable and mobile, and because it is placed directly in the user's visual range to provide continuous display feedback. In this thesis, we focus solely on using an HMD.

There are two basic HMD-based augmentation techniques in AR: video see-through and optical see-through.

A user wearing a video see-through HMD does not have a direct view of the real world. Instead, the user must use one or two head mounted video cameras to provide views of the real world. The video frames captured from the cameras are combined with virtual objects generated by the computer before being presented back to the user, thus combining the real and the virtual. Our system is a video see-through AR system with one camera.

A system with an optical see-through HMD, on the other hand, allows the user to view the real world directly, without mediation from video cameras. A combiner in the HMD allows the user to see light coming from the real world along with the virtual images generated by the computer. This is similar to the approach commonly found in a military aircraft's Heads-Up Display (HUD).

Augmented Reality For Collaborative Work

In a face-to-face collaboration like a design review, non-verbal cues such as gesture, eye contact, and body language are important in order to achieve successful communication. Non-verbal cues allow the meeting participants to know where the focus of attention is; they also facilitate systematic communication. The lack of visual cues in communication, for example in an audio-only teleconference, often results in communication overlap and difficulty in determining conversational turn-taking [7].

Research in virtual reality conferencing has shown some positive results in restoring visual cues; however, virtual environments separate users from their physical environment, and users often report that it is more difficult to interact with the virtual environment [6].

Billinghurst [5] and Schmalstieg [28] argued that AR can further improve face-to-face collaboration by enhancing the physical environment and the interaction possibilities. Schmalstieg et al.'s Studierstube project showed the potential of using AR for face-to-face collaborative work by using HMDs and 3D magnetic trackers to allow users to collaboratively view 3D virtual objects overlaid on the real world (see Figure 6).

Figure 6: Two users collaborate using AR. Each user has an independent view of the virtual object. (Image is reprinted with permission from [28])

The Studierstube project showed that AR supports collaborative work by letting each user work from his/her own viewpoint; furthermore, collaborative work was enhanced with virtual data augmentation. Schmalstieg et al. identified five benefits of collaborative AR:

1. Virtuality: the ability to view and examine objects that do not exist in the real world

2. Augmentation: real objects can be augmented with virtual annotations

3. Cooperation: multiple users can work together in a natural way

4. Independence: each user has control over his/her own viewpoint

5. Individuality: each user can see different data

By providing these benefits, AR allows the tangible interface approach to be used in a broader range of applications. An Augmented Tangible Interface (ATI) facilitates more natural interactions with virtual objects, and can be used to enhance the interpersonal exchanges that characterize face-to-face meetings such as design reviews.

In the next section, we will discuss some existing work on ATIs which provided the basis for the interface we designed.

Augmented Tangible Interface

ATIs are tangible interfaces that are enhanced with augmented reality technology. Kato and Billinghurst argued that the combination of TUI and AR is more powerful than TUI or AR on its own [20], because it combines the intuitive power of physical input devices with the flexibility of AR's display capability. Furthermore, they argued, ATI supports collaborative work well. It is helpful to use physical objects in collaborative work because physical objects have affordances, they have spatial relationships, they have semantic representations, and they help to focus attention. With ATI, we extend the capabilities of physical objects by enhancing their display capabilities and interaction possibilities. In this section we will examine some examples of ATIs.

VOMAR

Billinghurst's VOMAR (Virtual Object Manipulation in Augmented Reality) project demonstrated a collaborative tabletop AR system that uses ATI to arrange furniture in a miniature room [20]. VOMAR users wore an HMD-camera pair to view the world. The VOMAR ATI consisted of three components. First, a physical book served as the catalog for all the pieces of virtual furniture. Second, a paddle served as the primary interaction device, allowing users to pick pieces up from the catalog and move them around in three-space. Third, a large piece of paper served as the workspace or miniature room.

Users pick up pieces of virtual furniture from the catalog book using the paddle. The gesture is similar to picking up food using a spatula. Users then place the virtual furniture in the miniature room with a pouring gesture (See figure 7).

The paddle was a specially patterned piece of cardboard designed to support various gesture interactions. Using gestures such as shaking, hitting, and pushing, the user can use the paddle as a selection tool, a deletion tool, and a displacement tool to interact with the virtual furniture in the miniature room. Billinghurst argued that using the paddle as an interaction tool is more intuitive than using a mouse and keyboard.

Figure 7: VOMAR. The user uses a physical paddle as an interface to interact with virtual furniture. (Image is reprinted with permission from [20])

AR City Planner

Kato demonstrated a city-planning system that uses ATI to lay out the various components of a city in a map-like system [21]. In this system, there is a table outfitted with 35 square markers to display the 3D map, and a physical cup that is used for picking up and placing virtual objects. The user's task is to put the city's various virtual components (e.g., different types of buildings, parks, and trees) onto the map to plan the city's development, much like the game SimCity. Instead of using the mouse and keyboard to place the virtual objects onto the map, the user utilizes the cup interface to accomplish the task.

The manipulations of virtual objects using the cup interface are as follows. To pick up a virtual object, the user covers the virtual object with the cup for a few seconds, and the virtual object will remain in the cup. To put an object onto the map, the cup with a virtual object attached is positioned and oriented somewhere on the map, and the virtual object will stay on the map. To delete the object from the cup, the user simply shakes the cup, and the object disappears.

Figure 8 shows the AR City Planner being used to design a park. Using a physical cup as an interface, the user picks up a slider from the park (see figure 8, right).

Figure 8: Designing a park (left). The user is using the cup interface to interact with sliders on the park (right). (Image is reprinted with permission from [21])

IMPLEMENTATION

Statement of Purpose

This research seeks to address the challenge of interacting with virtual objects during a face-to-face collaborative design review by creating a set of tangible user interfaces (TUI) that support design review using currently available and implementable technology. We consider three main interaction challenges as part of the design review process: 3D browsing, 3D positioning/orientation, and 3D assembly/disassembly. Using TUI and AR principles, we have designed and implemented a new set of augmented tangible interface (ATI) devices to support these interactions for design review participants. Our goal is to provide interfaces that facilitate more natural interactions with virtual objects than mouse, monitor, and keyboard based methods, allowing participants to focus on the design discussion instead of the mechanics of interactions.

General System Overview

The system we created serves multiple participants engaged in a collaborative design review of a set of CAD models. Each participant wears an HMD with a video camera attached in front of it. Each user's world view is mediated through the camera, allowing the world view to be augmented with virtual objects before being displayed in the HMD. Through the HMD-camera pair, each user views the real world and shares a custom set of physical interfaces -- a table, a set of cards, and a set of cubes -- with the other participants. These interfaces are equipped with black and white markers with distinct patterns that allow our system to identify the interfaces and their positions.

A computer vision algorithm analyzes the images coming from the camera. It looks for black square markers on the physical objects to figure out their 3D positions and orientations. Using these data, an image fusion algorithm places the virtual objects in the scene, on top of the physical objects, for display in the HMD. This creates an augmented reality environment that combines virtual objects and real world objects in a common scene. The user interacts with the scene by manipulating the marked objects and by giving speech-based interaction commands. The speech commands are captured by a microphone, recognized by the speech recognition engine, and used to control the scene.

The following sections describe the hardware components in detail.

Figure 9: General system overview. Each user's system is equipped with a camera, HMD, and microphone. The user's view of the ATIs (such as the Card Stack, the Cube, and the Table) is mediated through the camera.

Computer

Our system runs on a VPR-Matrix laptop with a 2.20 GHz Pentium processor, a GeForce4 graphics card with 32 MB of memory, and 512 MB of RAM. A laptop was chosen as it is more portable than desktop systems, but more powerful than present generation wearable computers. The laptop's processor is sufficient to generate representative 3D graphics in real time while simultaneously running the computer vision and speech recognition algorithms. The laptop also provides the VGA output for the HMD, and the USB 2.0 input for the camera.

HMD

For the user's display, we chose an Olympus Eye-Trek F-700 HMD. This HMD provides a maximum resolution of 800x600 with a 30 degree Field Of View (FOV). The Olympus HMD is connected directly to the laptop via a 15-pin VGA input.

In order to see in stereo, each eye of the user must see the same scene but from slightly different angles. There are two ways to achieve this using an HMD. The first way requires two VGA inputs into the HMD; this way, the left LCD can display the image appropriate for the user's left eye, and vice versa, resulting in a stereoscopic view for the user. There are also HMDs that have only one VGA input and are capable of displaying in stereo. This kind of HMD has LCDs that are capable of displaying at 120 Hz. For this type of HMD, a special video card is required to deliver separate views for each LCD at a 60 Hz refresh rate each. The Olympus HMD has only one VGA input and a 60 Hz refresh rate for the LCDs; thus it is not capable of displaying in stereo. We will discuss the drawbacks of the monoscopic view in chapter four.

We chose the Olympus display as it represents a nearly commodity level device, a device at the low end of today's display technology.

Figure 10: Olympus Eye-Trek 700 HMD with Logitech Quickcam Notebook Pro Webcam attached using Velcro tape. Front view (left), and side view (right).

Camera and Microphone

For visual and audio input, we selected another low-end, commodity level device, the Logitech Quickcam Notebook Pro webcam, with 640x480 video resolution and microphone input. We attached the camera to the laptop through a USB 2.0 connection. The camera is attached to the HMD using Velcro tape so it tracks the "gaze" of the user (see figure 10). Since the HMD can display up to 800x600 resolution, the images from the camera are scaled from 640x480 to 800x600. The camera is capable of capturing video at 30 frames/second; however, our system runs at an average of 14.5 frames/second.

Implementation Details

Figure 11: The software components in our system

Figure 11 illustrates the details of the ATI system. The video camera captures the real world; the video frames coming from the camera are processed by the Image Analyzer module. The Image Analyzer looks for physical objects with square markers on them in the video frames, and based on the square markers, the Image Analyzer computes the 3D position and orientation of the identified objects. These data are then forwarded to the Interaction module.

The Interaction module combines the markers' positional and orientation data, as given by the Image Analyzer, and the user's speech input, forwarded from the microphone. Based on these inputs, the Interaction module modifies the state of the virtual objects in the Scene Graph module on a frame-by-frame basis.

The Scene Graph module manages the drawing subroutines of each virtual object in the system. In the scene graph, the virtual objects, represented by object nodes, are positioned by a hierarchical cascade of linear transformations that encode the objects' positions and the positions of the markers in the scene, obtained from the Interaction module. Speech commands from the Interaction module modify the nodes in the scene graph to either move the object nodes to different roots, or to delete them.

The Image Fusion module takes the images created from the scene graph and combines them with the live video stream from the camera to create an augmented reality view for the user, in which the real world and the digital world coexist in the HMD. The following sections describe the algorithms used in each module of the system.
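Before turning to the individual modules, the following minimal sketch illustrates the data handed between them each frame and the order in which they run. It is an illustration of the architecture described above, not the thesis source code; all type and field names here are hypothetical.

    // Hypothetical per-frame data exchanged between modules (names are illustrative).
    #include <string>
    #include <vector>

    struct MarkerPose {
        int    id;                // which marker (and therefore which ATI) was seen
        double transform[3][4];   // marker-to-camera transform reported by the Image Analyzer
    };

    struct FrameInput {
        std::vector<MarkerPose>  visibleMarkers;  // from the Image Analyzer
        std::vector<std::string> speechCommands;  // from the Speech module, e.g. "computer, transfer"
    };

    // Per-frame order of operations implied by Figure 11:
    //   1. capture a video frame from the camera
    //   2. Image Analyzer fills FrameInput::visibleMarkers
    //   3. Speech module fills FrameInput::speechCommands
    //   4. Interaction module consumes the FrameInput and edits the scene graph
    //   5. Scene Graph module draws the virtual objects
    //   6. Image Fusion overlays the rendering on the video frame shown in the HMD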

Image Analyzer

In order to maintain the illusion that the real world and the virtual objects coexist in the same space, an AR system must perform view registration, recognizing the real world's structure in order to accurately place the virtual objects in their intended place in the real world. The Image Analyzer module performs this view registration in our AR system.

There are many ways to do view registration in AR [8], e.g.:

• Magnetic. For example: Ascension Flock of Birds, Polhemus 3SPACE FastTrack

• Mechanical. For example: SensAble Phantom

• Ultrasound. For example: Intersense IS-600

• Inertial. For example: Intersense Cube2

• Light waves. For example: 3rd Tech HiBall-300

• Optical. For example: ARToolkit

• GPS for outdoor AR applications

• Hybrid: combining two or more methods to produce a more robust one

In our work, since the virtual objects are always associated with or attached to physical objects, we realized that we could confine the view registration problem to the tracking of the physical objects in the scene that serve as the user interfaces. Since the user's eyes focus on the physical objects while interacting with them, we decided to use the optical tracking method.

Square markers were attached to the physical objects that make up the interface; computer vision techniques were then used to analyze each video frame to identify markers in the frame and to determine the 3D position and orientation of any identified markers. The code we wrote to perform this process is based on an open source library called ARToolkit [19]. Video frames from the camera are processed by two critical ARToolkit routines, which:

• identify every visible marker's unique pattern through a template matching algorithm

• compute the 3D position and orientation of every visible marker

Figure 12 shows the ARToolkit process in detail. First, the video stream from the camera is normalized into black and white, simplifying the identification of the square-shaped black frames of the markers. Because the markers' real world sizes are known, ARToolkit can estimate the markers' 3D positions and orientations relative to the camera, represented as a 4x4 matrix. ARToolkit's identification routine is able to distinguish among the different markers through template matching of the unique symbol inside each marker. The identification process yields an identification number for each marker. The identification numbers of the markers visible in the frame are forwarded to the Interaction module along with their orientation matrices.

Figure 12: ARToolkit's process. The image is converted to a binary (black and white) image and the markers' black frames are identified; the symbol inside each marker is matched against the templates in memory; each marker's 3D position and orientation relative to the camera are estimated; the results are passed to the Interaction module (not part of ARToolkit).
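A minimal sketch of how these two ARToolkit routines are typically invoked on a captured frame is shown below. It assumes the camera parameters and marker patterns have already been loaded with ARToolkit's setup calls; the pattern table, threshold value, and function name are illustrative assumptions, not a listing of the thesis code.

    #include <AR/ar.h>   // ARToolkit core: arDetectMarker, arGetTransMat

    // One entry per trained marker (pattern id returned by arLoadPatt, physical width in mm).
    struct TrackedPattern { int pattId; double width; double center[2]; };

    // Detect markers in one video frame and estimate each visible marker's
    // 3x4 transform (position and orientation relative to the camera).
    int processFrame(ARUint8* frame, TrackedPattern* patterns, int numPatterns,
                     double poses[][3][4], int foundIds[])
    {
        ARMarkerInfo* markerInfo;
        int           markerNum;
        const int     threshold = 100;   // binarization threshold (0-255), tuned to lighting

        // Template-matching identification of every marker visible in the frame.
        if (arDetectMarker(frame, threshold, &markerInfo, &markerNum) < 0)
            return 0;

        int found = 0;
        for (int p = 0; p < numPatterns; p++) {
            // Keep the detection with the highest confidence for this pattern, if any.
            int best = -1;
            for (int m = 0; m < markerNum; m++) {
                if (markerInfo[m].id != patterns[p].pattId) continue;
                if (best < 0 || markerInfo[m].cf > markerInfo[best].cf) best = m;
            }
            if (best < 0) continue;

            // Estimate the marker-to-camera transform Tcm (a 3x4 matrix).
            arGetTransMat(&markerInfo[best], patterns[p].center,
                          patterns[p].width, poses[found]);
            foundIds[found++] = patterns[p].pattId;
        }
        return found;   // these poses and ids are what we forward to the Interaction module
    }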

Marker Identification

Each marker in the system has a black border and a unique symbol inside the black frame. ARToolkit uses the unique symbol to identify the markers. Before a marker can be used with ARToolkit, it must be trained using the ARToolkit marker training application. This training process generates a template image which ARToolkit uses for comparison.

Each marker found in the video stream is normalized to obtain a normalized symbol that is compared against the list of templates. The normalized symbol is compared against every template image four times, once for each of the four possible orientations. In this way ARToolkit obtains the orientation information as well as the marker's id. Attention must be given when choosing a symbol for a marker to prevent orientation ambiguity. For example, "A" is a good symbol, while "%" is not, because a 180 degree rotation of the % symbol produces an image identical to the initial orientation.

Optical tracking of this kind is subject to several tradeoffs. Each marker visible in the view has to be compared against every marker template in the database. This O(n) comparison means that the more markers are used in the system, the slower the marker identification process becomes.

Figure 13: ARToolkit normalizes the image from the video frame. (Image is taken with permission from [19])
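The cost of this identification step can be illustrated with a small sketch: every normalized symbol is compared against every stored template in each of the four orientations, so the work grows linearly with the number of trained markers. The resolution, similarity measure, and data layout below are simplified stand-ins, not ARToolkit's internal implementation.

    const int PATT_SIZE = 16;                       // normalized symbol resolution (illustrative)
    typedef double Pattern[PATT_SIZE * PATT_SIZE];  // one normalized grey-level symbol

    // Compare one normalized symbol against every template in all four orientations.
    // Returns the index of the best-matching template and reports its orientation (0-3).
    int identifyMarker(const Pattern symbol, const Pattern rotatedTemplates[][4],
                       int numTemplates, int* bestDir, double* bestScore)
    {
        int bestId = -1;
        *bestScore = -1.0;
        for (int t = 0; t < numTemplates; t++) {        // O(n) in the number of trained markers
            for (int dir = 0; dir < 4; dir++) {         // four 90-degree orientations
                double score = 0.0;
                for (int i = 0; i < PATT_SIZE * PATT_SIZE; i++)
                    score += symbol[i] * rotatedTemplates[t][dir][i];   // simple correlation
                if (score > *bestScore) { *bestScore = score; bestId = t; *bestDir = dir; }
            }
        }
        return bestId;   // the orientation in *bestDir resolves the marker's rotation as well
    }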

3D position and orientation estimation

[X_c, Y_c, Z_c, 1]^T = T_cm [X_m, Y_m, Z_m, 1]^T,  with T_cm = [ R_3x3 | t_3x1 ; 0 0 0 1 ]     (eq. 1)

where (X_m, Y_m, Z_m) are coordinates in the marker's frame and (X_c, Y_c, Z_c) are the corresponding coordinates in the camera's frame.

Figure 14: The relationship between marker coordinates and the camera coordinates. (Image is taken with permission from [19])

Figure 14 shows the relationship between the camera and the marker in the scene.

The transformation matrix from the marker coordinates to the camera coordinates, Tcm, is represented in equation 1. The ARToolkit library computes the matrix Tcm, and we forward it, along with the marker's identification number, to the Interaction module. (The details of the Tcm computation can be found in [19].)

Speech Module

One of the identified benefits of tangible interfaces is that they encourage two-handed, direct manipulation; however, two-handed interaction makes it impractical to also use a mouse or keyboard to give additional commands. As was discussed in chapter one, it is our goal to provide the user with more natural 3D interactions than the ubiquitous mouse and keyboard. In their stead we chose speech as the interface with which to give commands to the computer. The basis for our speech recognition and synthesis system was the Microsoft Speech API 5.1 (MSAPI) [22]. The MSAPI speech recognition engine is initialized with a grammar file, a file that defines all the possible phrases that can be used as commands. Upon recognizing speech that matches a phrase listed in that file, the system executes the corresponding interaction command.

We defined several speech command phrases in the grammar file. They are:

• "Computer, transfer": to transfer virtual objects between tangible interfaces

• "Computer, join": to assemble two virtual objects together

• "Computer, break": to disassemble an assembled virtual object

• "Computer, delete": to delete the virtual object in view

• "Computer, put": to put the virtual object in view on the table

• "Computer, pick up": to pick up a virtual object from the table to an interface

The system's microphone is constantly listening during the lifetime of the application.

To prevent the computer from misinterpreting the users' conversation as a command, we intentionally required the user to start with the keyword "computer" before continuing with the command. This keyword-command pairing lessens the possibility of the Speech module misinterpreting ordinary conversation as a command.

An acknowledgement from the computer was necessary to indicate whether the user's speech command was successful. If the computer recognized the user's speech input, it replied to the user with a spoken response. We used the MSAPI speech synthesis engine to create these responses.
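As an illustration, the sketch below shows one way the recognition and synthesis engines can be set up with the Microsoft Speech API 5.1. The grammar file name ("commands.xml"), the handleCommand dispatch, and the acknowledgement phrase are assumptions for the example, not the thesis code.

```cpp
// Sketch: initializing MSAPI 5.1 recognition from a grammar file and
// speaking an acknowledgement. File name, dispatch, and phrasing are hypothetical.
#include <windows.h>
#include <atlbase.h>
#include <sapi.h>
#include <sphelper.h>

// Hypothetical dispatch into the Interaction module.
void handleCommand(const WCHAR* phrase);

bool initSpeech(CComPtr<ISpRecoContext>& context,
                CComPtr<ISpRecoGrammar>& grammar,
                CComPtr<ISpVoice>& voice)
{
    CComPtr<ISpRecognizer> recognizer;
    if (FAILED(recognizer.CoCreateInstance(CLSID_SpSharedRecognizer))) return false;
    if (FAILED(recognizer->CreateRecoContext(&context)))               return false;

    // Load the command-and-control grammar ("Computer, transfer", "Computer, join", ...).
    if (FAILED(context->CreateGrammar(1, &grammar)))                    return false;
    if (FAILED(grammar->LoadCmdFromFile(L"commands.xml", SPLO_STATIC))) return false;
    grammar->SetRuleState(NULL, NULL, SPRS_ACTIVE);   // activate all top-level rules

    // Only wake up on full recognitions.
    context->SetInterest(SPFEI(SPEI_RECOGNITION), SPFEI(SPEI_RECOGNITION));

    // Synthesis engine used for spoken acknowledgements.
    return SUCCEEDED(voice.CoCreateInstance(CLSID_SpVoice));
}

// Called from the application's SAPI event loop when a recognition event fires.
void onRecognition(ISpRecoResult* result, ISpVoice* voice)
{
    WCHAR* text = NULL;
    if (SUCCEEDED(result->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE,
                                  TRUE, &text, NULL))) {
        handleCommand(text);                                      // e.g. "computer transfer"
        voice->Speak(L"Command acknowledged", SPF_ASYNC, NULL);   // spoken acknowledgement
        ::CoTaskMemFree(text);
    }
}
```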

Scene Graph Module

A scene graph is a tree-like data structure used to represent the relationships of virtual objects in a 3D scene. The graph defines a parent-child relationship between nodes at different levels. When the graph is used for drawing, a depth-first traversal allows a parent node's transformation to affect its children's transformations. In our system, the Scene Graph module is responsible for managing and drawing the virtual objects.

We developed our own simple scene graph using the OpenGL library for rendering.

Figure 15 illustrates the scene graph in detail. The oval nodes, or parent nodes, represent the various ATIs in our system; we call these the ATI nodes. Each ATI has its own position and orientation, determined by the marker's position in the scene. These markers' transformations determine where the virtual objects associated with the markers will be drawn. The objects associated with each ATI are members of the ATI's Object Group.

An Object Group may be assigned a transformation matrix to control the transformation of its collection of virtual objects. An Object Group may also own another Object Group as its child. Finally, the terminal node is an object node, which is responsible for rendering the virtual object. In figure 15, the object nodes are the ones drawn as circles. The system can load and render various CAD file formats, such as GLUT solid objects, Alias Wavefront files (.obj), and 3D Studio Max files (.3ds). Objects from other file formats can also be rendered in our system once they are converted into either a .3ds or .obj file through a third party converter such as Polytrans [24].
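A minimal sketch of such a scene graph is shown below. The class names (ATINode, ObjectGroup, ObjectNode) follow the thesis's terminology, but the member layout, the Matrix4 type, and the draw interface are assumptions made for illustration; drawGeometry() stands in for the actual OpenGL rendering of a loaded model.

```cpp
// Minimal scene graph sketch: an ATI node owns an Object Group, which owns
// child groups and object nodes; a depth-first draw() concatenates transforms.
#include <memory>
#include <vector>

struct Matrix4 {                                   // 4x4 transform (illustrative)
    double m[16];
    Matrix4 operator*(const Matrix4& rhs) const;   // matrix concatenation (assumed helper)
    static Matrix4 identity();
};

class Node {
public:
    virtual ~Node() = default;
    virtual void draw(const Matrix4& parentTransform) const = 0;
};

class ObjectNode : public Node {        // terminal node: renders one virtual object
public:
    void draw(const Matrix4& parentTransform) const override {
        drawGeometry(parentTransform * localTransform);   // hypothetical GL call
    }
    Matrix4 localTransform = Matrix4::identity();         // object modifier ("O" matrix)
private:
    void drawGeometry(const Matrix4& modelMatrix) const;
};

class ObjectGroup : public Node {       // group node: "G" matrix plus children
public:
    void draw(const Matrix4& parentTransform) const override {
        Matrix4 t = parentTransform * groupTransform;
        for (const auto& child : children) child->draw(t);  // depth-first traversal
    }
    Matrix4 groupTransform = Matrix4::identity();
    std::vector<std::unique_ptr<Node>> children;             // groups or object nodes
};

class ATINode : public Node {           // root per interface: marker pose ("C" matrix)
public:
    void draw(const Matrix4&) const override {
        if (group) group->draw(markerTransform);              // pose comes from tracking
    }
    Matrix4 markerTransform = Matrix4::identity();
    std::unique_ptr<ObjectGroup> group;
};
```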

The scene graph receives commands from the Interaction module: to delete a 3D part, to join two 3D parts together, or to transfer a 3D part from one ATI node to another (e.g., to transfer a .3ds object from the Card Stack UI to the Cube UI). For example, when a user transfers a 3D part from an ATI, let's call it A, onto another ATI, say B, we make sure that the transferred 3D part on B maintains the orientation it had on A. In order to do this, upon receiving the transfer command, the Interaction module computes the necessary transformation matrix for B. So in this case, the Scene Graph module just has to transfer the object node from A to B and apply the correct transformation matrix given to B by the Interaction module.

Figure 15: The Scene Graph module in our system

Interaction Module

The Interaction module is the central module in our system. It manages the behavior of the ATIs in our system and computes the transformation matrices that are necessary for inter-ATI interactions.

Based on the markers' data from the Image Analyzer, the Interaction module controls the ATIs in our system. There are three types of ATI, namely the Card Stack interface, the Cube interface, and the Table interface. More than one ATI type may be present at a given time, and each ATI can interact with the others. In the next sections, we will discuss these ATIs in detail, followed by their interaction possibilities.

The Card Stack Interface

The first task we identified in a design review is model selection. The group must browse through a database containing a set of 3D models in order to isolate the particular models that need to be reviewed. Using a typical 3D visualization program, this involves opening some sort of file navigator or database browser and then choosing the correct 3D model from a list of files that may or may not be accompanied by their corresponding thumbnail photographs. Thumbnails have their limitations as well: a static 2D picture has limited degrees of freedom when representing a 3D object.

Figure 16: Open File dialog of 3D Studio MAX

Consider figures 16 and 17; they are screenshots of 3D Studio MAX's Open File dialog, each showing a thumbnail photograph of a 3D model file. Both files are 3D models of the same-looking monster; however, the monster's right eye in figure 16 has not been modeled yet. Because the two thumbnails show the monsters from the same angle, we cannot see that the right eye is missing in figure 16.

Figure 17: The same Open File dialog showing a different file from the same angle

Interaction mediated by names and thumbnails is an artifact of the evolution of CAD

interfaces. It does not model how people work with real objects. If physical prototypes of

designs existed, they would likely be spread out on a table for the participants to pick up and

look at. The Card Stack interface is designed to reduce 3D browsing complexity by

simulating the selection of physical objects. The stack can be picked up, and manipulated,

providing direct control of the virtual objects as if the physical prototypes existed.

The Card Stack interface consists of three cards with a square marker on each of the card faces. The three cards are stacked on top of each other, so the user can see only one card at a time. Associated with the interface is a 3D object database. When a user looks at the Card Stack, a virtual object is displayed in the place of the marker. By swapping a card forward, the user tells the computer to display the next object in the database; by swapping a card backward, the user tells the computer to show the object previously seen (see figure 18). By turning the Card Stack over, the user can see the backside of the virtual object (see figure 19).

Figure 18: The user browses to the next 3D object in the database by swapping a card forward

Figure 19: The Card Stack interface consists of three cards (left). The user can see the 3D model from a different angle (center), and even the back side, by rotating the card physically (right)

We use three cards for the Card Stack interface to facilitate index counting. We determine the order of the cards at the start of the application by associating the markers' identification numbers with the ordering. Figure 20 illustrates the indexing mechanism for the Card Stack interface. Taking the topmost card and putting it at the bottom of the pile is a swapping-forward gesture. When the user does this, the index counter is incremented, telling the computer to display the next 3D object in the database. In contrast, taking the bottom card and putting it on top is a swapping-backward gesture. When the user does this, the counter is decremented, telling the computer to display the previous 3D object in the database. A sketch of this index update follows figure 20.

Figure 20: Indexing mechanism for the Card Stack interface. Initially displayIndex = 0; after swapping forward, displayIndex = 1; swapping backward restores the previous index.
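The sketch below illustrates the index update logic. How the swap gesture is inferred from which marker is currently visible is an assumption made for illustration; the thesis only states that the markers' identification numbers are associated with the card ordering.

```cpp
// Sketch: updating the Card Stack display index from the currently visible card.
// cardOrder maps marker ids to stack positions (0, 1, 2); the gesture rule is illustrative.
#include <utility>
#include <vector>

class CardStack {
public:
    CardStack(std::vector<int> markerIdsInOrder, int databaseSize)
        : cardOrder(std::move(markerIdsInOrder)), dbSize(databaseSize) {}

    // Called once per frame with the marker id of the card currently on top.
    void onVisibleMarker(int markerId) {
        int pos = positionOf(markerId);
        if (pos < 0 || pos == lastPos) return;          // unknown card or no change

        int n = static_cast<int>(cardOrder.size());
        if (pos == (lastPos + 1) % n)                    // next card surfaced: swap forward
            displayIndex = (displayIndex + 1) % dbSize;
        else if (pos == (lastPos + n - 1) % n)           // previous card surfaced: swap backward
            displayIndex = (displayIndex + dbSize - 1) % dbSize;

        lastPos = pos;
    }

    int currentObject() const { return displayIndex; }  // index into the 3D object database

private:
    int positionOf(int markerId) const {
        for (int i = 0; i < static_cast<int>(cardOrder.size()); ++i)
            if (cardOrder[i] == markerId) return i;
        return -1;
    }
    std::vector<int> cardOrder;
    int dbSize;
    int lastPos = 0;
    int displayIndex = 0;
};
```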

The Cube Interface

After selecting 3D parts from the database, the design review group needs to interact with the parts, point out particular features, manipulate them, and position them relative to one another.

To position and orient 3D parts, the group is limited to a typical CAD system's 2D GUI, instead of having the ability to handle the 3D parts directly. Usually it takes a skilled operator using the mouse and keyboard to interact with interface items that make the virtual objects rotate and translate behind the monitor glass or the projector screen. With this approach, only one team member at a time can manipulate the CAD display.

Different CAD programs facilitate the translation and orientation tasks in different ways. For example, in 3DS MAX, the user first has to click the "translation" icon to set the mouse as a translation tool, and then drag the 3D model along one of the three independent translation axes, or at most on two axes simultaneously. To set the orientation of the 3D model, the user clicks the "rotation" icon to set the mouse as a rotation tool, selects a rotation axis, and finally rotates the model about that axis.

Figure 21: 3DS MAX's translation tool (left) and rotation tool (right). In both cases, the user first must set the mode by pressing the icons above. Translating or rotating with the mouse, the user can manipulate at most two axes.

SGI Creator, on the other hand, solves the translation and orientation task differently. Leveraging the mouse's X-axis and Y-axis, SGI Creator lets the user rotate a 3D object about two axes at the same time by pressing a button on the keyboard while manipulating the mouse. This turns out to be a more confusing and counter-intuitive method than 3DS MAX's, because the mapping between the keyboard button and the axes is not obvious. The lack of uniformity of CAD systems' interfaces limits the ability of the team to completely examine the design, because some members of the team may not have the necessary skills to operate the system.

Moreover, as previously discussed in section two, the 2D GUI and mouse combination is a time-multiplexed input. Using the mouse, the user has at most two degrees of freedom when manipulating a 3D object; it is not possible to rotate and translate simultaneously.

The ideal interface for the 3D position and orientation task is one that has a low learning curve, is intuitive, and allows simultaneous translation and orientation. The Cube interface is designed to fit those criteria. The Cube interface is a space-multiplexed input device, and it supports six degrees of freedom. A user manipulating virtual objects via the Cube interface is able to translate and rotate the virtual objects simultaneously and intuitively without having to learn a CAD system's interface items.

Figure 22: The Cube Interface

The Cube interface is a wooden cube with six markers attached to its six sides. A Cube interface controls a group of 3D objects; the 3D objects' transformation corresponds to the Cube's transformation. The markers on the Cube are ordered intentionally so that each one indicates a rotation matrix for the 3D object: each side of the cube signifies a 90 degree rotation of the object. As seen in figure 22 (left), a 3D model of a handle starts out in a neutral orientation. Figure 22 (right) shows the cube rotated toward the user; with the cube's top-side marker in view, the 3D model is also viewed from the top.
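One way to realize this is to compose the tracked pose of whichever face marker is visible with a fixed 90-degree offset that maps that face back into the cube's reference frame; the sketch below illustrates the idea. The face indexing, offset table, and Matrix4 helpers are assumptions, not the thesis implementation.

```cpp
// Sketch: deriving the cube's (and hence the virtual object's) pose from
// whichever of the six face markers is currently tracked.
#include <array>

struct Matrix4 { double m[16]; };                   // 4x4 transform (illustrative)
Matrix4 identity();                                 // assumed helpers
Matrix4 rotationX(double degrees);                  // rotation about the cube's local X axis
Matrix4 rotationY(double degrees);                  // rotation about the cube's local Y axis
Matrix4 operator*(const Matrix4& a, const Matrix4& b);

// Fixed offsets that rotate each face's marker frame back into the cube frame.
std::array<Matrix4, 6> makeFaceOffsets() {
    return { identity(),            // front face
             rotationY(90.0),       // right face
             rotationY(180.0),      // back face
             rotationY(-90.0),      // left face
             rotationX(90.0),       // top face
             rotationX(-90.0) };    // bottom face
}

// markerPose comes from ARToolkit for the visible face; faceIndex identifies that face.
Matrix4 cubePose(const Matrix4& markerPose, int faceIndex,
                 const std::array<Matrix4, 6>& faceOffsets) {
    // The cube's object group is drawn with this single 6-DOF transform.
    return markerPose * faceOffsets[faceIndex];
}
```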

The Table interface

A table plays an important role in a collaborative work setting such as a design review. Kato and Billinghurst [20] summarized three roles of a table during collaborative work:

• People typically gather around a table in face-to-face meetings

• A table provides a centralized location for placing objects and items related to the meeting

• A table also functions as a surface for content creation

Furthermore, gathering around a table enables the participants to see each other's interpersonal communication space. A connected interpersonal communication space encourages communication cues such as gaze, gesture, and other non-verbal behaviors [4].

We use the Table interface as a centerpiece on which to put virtual objects for group viewing, and to focus attention.

The Table interface is a large piece of paper or a cardboard surface equipped with multiple markers. Our previous ATIs have a single tracking marker per surface; thus, if the marker is covered by the user's hand, the tracking algorithm fails. The Table interface, on the other hand, utilizes multiple markers on a single surface. With this approach, as long as one marker is visible, the tracking algorithm still works. There are two interaction operations possible with the Table: placing objects on the table and picking objects up from it. These interactions are described in the following sections.
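ARToolkit ships a multi-marker helper that fits a single pose to a configured set of markers, which is one way such a surface can be tracked; the sketch below assumes that standard arMulti API and an illustrative configuration file name — the thesis does not state which routine its Table tracking used.

```cpp
// Sketch: tracking a multi-marker surface with ARToolkit's arMulti helper.
// The configuration file lists each marker's pattern and its offset on the
// table; "Data/table_markers.dat" and the error handling are illustrative.
#include <AR/ar.h>
#include <AR/arMulti.h>

ARMultiMarkerInfoT* loadTableConfig()
{
    // The config file describes all markers attached to the table surface.
    return arMultiReadConfigFile("Data/table_markers.dat");
}

// Returns true and fills tableTrans when at least one table marker is visible.
bool trackTable(ARMarkerInfo* markerInfo, int markerNum,
                ARMultiMarkerInfoT* tableConfig, double tableTrans[3][4])
{
    // Fits one table pose from whichever configured markers are currently visible.
    double err = arMultiGetTransMat(markerInfo, markerNum, tableConfig);
    if (err < 0.0) return false;                     // no configured marker detected

    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c)
            tableTrans[r][c] = tableConfig->trans[r][c];   // copy the fitted pose
    return true;
}
```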

Inter-ATI Interactions

In our system, we allow interactions between different ATI types. For example, the user is able to perform an assembly operation easily via two Cube interfaces, because the Cube interface supports natural positioning and rotation of two virtual objects.

To execute an inter-ATI interaction, the user tells the computer what to do via speech input. The Speech module translates the user's speech and notifies the Interaction module, and the Interaction module performs the action by modifying the scene graph structure. There are four inter-ATI interactions: deletion, transfer, assembly, and table interactions.

Deleting Virtual Object

The user is able to delete virtual objects associated with the Cube and Table interfaces. (This operation is not mapped for the Card Stack interface because the Card Stack is a catalog of virtual objects.)

To perform a delete operation, the user simply puts the ATI with the virtual object to be deleted in view and says "Computer, delete". The Interaction module then signals the Scene Graph module to remove the object node from that particular ATI.

Transferring Virtual Object

After browsing the 3D parts database via the Card Stack interface, the user may transfer the selected 3D part from the Card Stack interface to the Cube interface or to the Table interface.

To perform a transfer operation, the user simply puts both ATIs in view and says "Computer, transfer". This command transfers the virtual object from the filled ATI to the empty ATI.

With the transfer operation, we want to make sure that the virtual object maintains its pose. The process works as follows: upon receiving the transfer command from the Speech module, the Interaction module computes the rotation matrix for the transferred virtual object so that it shows up with the same orientation as the source virtual object.

Let's consider some examples. Figure 23 illustrates a transfer operation from the Card Stack interface to a Cube interface. Figure 23 (left) is the condition before the transfer operation; the 3D part has a 90 degree orientation. Figure 23 (right) shows the part already transferred, or copied, from the Card Stack interface to the Cube interface. The orientation is properly maintained.

Figure 24 illustrates a transfer operation from the left Cube to the right Cube. The figure on the left shows the condition before the model is transferred, and the right figure shows the condition after the model is transferred. The model maintains its orientation.

Figure 23: Transfer Operation from the Card Stack interface to the Cube interface

Figure 24: Transfer operation from a Cube interface to another Cube interface

Figure 25 shows the detail of the transfer operation, using a Card Stack to Cube transfer as an example. Before the transfer, the Cube interface does not have an object node; thus, even when it is visible in the user's view, it does not display any virtual object (see figure 24 above). The Card Stack interface currently displays the model with a certain orientation, defined by the matrix M. The matrix M is the product of the position and orientation of the Card Stack interface (matrix C1), the Object Group modifier (matrix G1), and the object modifier itself (such as scaling), matrix O1. So, M = C1 * G1 * O1. To transfer the model to the Cube interface, we copy the object node from the Card Stack node to the Cube node. This copies all the information about the object, such as the vertices and the transformation matrix. When the model is transferred to the Cube interface, we still want to maintain the orientation M. In the Cube tree, the object is drawn with C2 * G2 * O2, with O2 = O1 initially (because it was copied). C2 and G2 are already defined, so to obtain M in the Cube tree we need to modify O2 so that C2 * G2 * O2 = M; that is, O2 = (C2 * G2)^-1 * M. We want the virtual object to appear exactly on top of the Cube interface, so we reset the translation part of the O2 matrix to zeroes.
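A sketch of this computation is shown below; the Matrix4 type with its inverse() helper, and the convention that the fourth column holds the translation, are illustrative assumptions consistent with the description above.

```cpp
// Sketch: computing the new object modifier O2 so that the transferred object
// keeps the world pose M it had on the source interface.
struct Matrix4 {
    double m[4][4];                           // column 3 holds the translation (assumed layout)
    Matrix4 operator*(const Matrix4& rhs) const;   // assumed helpers
    Matrix4 inverse() const;
};

Matrix4 computeTransferredModifier(const Matrix4& C1, const Matrix4& G1, const Matrix4& O1,
                                   const Matrix4& C2, const Matrix4& G2)
{
    Matrix4 M  = C1 * G1 * O1;              // world pose on the source ATI
    Matrix4 O2 = (C2 * G2).inverse() * M;   // choose O2 so that C2*G2*O2 == M

    // Drop the translation so the object sits exactly on top of the Cube marker.
    O2.m[0][3] = 0.0;
    O2.m[1][3] = 0.0;
    O2.m[2][3] = 0.0;
    return O2;
}
```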

Figure 25: Transfer operation detail illustrated by the scene graph. Before the transfer, M = C1 * G1 * O1 on the Card Stack tree; after the transfer, M = C2 * G2 * O2 on the Cube tree, where C2, G2, and M are known.

Placing and Picking Objects on the Table Interface

There are two interactions possible with the Table interface: placing virtual objects onto the Table, and picking up virtual objects from it. Figure 26 (left) shows the Cube interface with virtual objects attached. To put the objects onto the Table, the user says "Computer, put". The virtual objects are placed exactly on the spot the user intends while maintaining their rotation. Figure 26 (right) shows the Table with the virtual objects.

There may be multiple objects placed on the Table interface. To pick an object up from the Table, the user places the Cube interface near the object to be picked up. Then the user says "Computer, pick up" to instruct the computer to attach the object that is nearest to the Cube interface.
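Selecting the nearest object can be done by comparing the Cube marker's position with each object's position on the Table; the short sketch below is a straightforward illustration using an assumed Vec3 type and object list.

```cpp
// Sketch: picking the Table object closest to the Cube interface.
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

double squaredDistance(const Vec3& a, const Vec3& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the index of the table object nearest to the cube, or -1 if none.
int nearestObject(const Vec3& cubePosition, const std::vector<Vec3>& objectPositions) {
    int best = -1;
    double bestDist = 0.0;
    for (std::size_t i = 0; i < objectPositions.size(); ++i) {
        double d = squaredDistance(cubePosition, objectPositions[i]);
        if (best < 0 || d < bestDist) { best = static_cast<int>(i); bestDist = d; }
    }
    return best;
}
```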

Figure 26: The user placed the virtual fishing reel on the Table interface

Figure 27: Scene graph illustrating the placement operation (before and after placing on the Table)

Similar to a transfer operation, placing a virtual object on the Table interface is accomplished by transferring object nodes from the Cube's scene graph to the Table's scene graph. Taking the operation described above as an example, figure 27 illustrates the process of putting the assembled fishing reel base and handle onto the Table. Before being placed on the Table, the base and handle parts are in the Cube's Object Group. When the user signals the computer with the speech command "Computer, put", the Interaction module looks for the Table interface in the view, and for another interface that has the virtual objects, in our example the Cube interface. The Interaction module then transfers the Cube's virtual objects by moving its Object Group to the Table's scene graph. The Interaction module also computes the new matrix necessary for the virtual objects to show up correctly on the Table interface; the matrix is assigned to the new Object Group node.

Assembly/Disassembly Operations

During a design review meeting, different design teams present several 3D models

which must be put together to form a single assembly. Performing assembly using a 2D

interface bears little resemblance to an actual, physical assembly process. In place of the

natural positioning of objects relative to one another using motions of the hands and arms, a

complex sequence of learned mouse and keyboard operations must be performed to adjust

viewpoints, position virtual objects in the desired positions, join them, and then view the

partially or fully assembled object for interference inspection.

Consider 3DS MAX's interface for assembly (see figure 28). For each object that is to be put together, the operator must choose the appropriate axes and other align-selection options from the options dialog; however, the explanations in the options dialog box are not obvious. During a design review meeting, this complexity may cause frustration and ineffective communication.

Figure 28: Joining two parts together using 3DS MAX's Align Selection option

Figure 29: The user assembles two parts together

Using two Cube interfaces, our users are able to position and orient 3D models easily, with a full six degrees of freedom. Then, using the speech command "Computer, join", the Interaction module joins the two parts correctly. Figure 29 illustrates the process. With two Cube interfaces, the user positions and orients the fishing reel base and handle so that both objects' snap surfaces are in close proximity to each other. When the user signals the computer to join the two models together, the Interaction module computes the orientation and translation of the 3D model on the left Cube, transfers the object node from the left Cube's scene graph to the right Cube's scene graph, and assigns the transformation matrix so that it attaches correctly (see figure 29, right).

The 3D modelers have to define one or more attachment points in each 3D model in order to assist in the assembly process. An attachment point is a location where other 3D parts can attach. An attachment point thus has a 3D location relative to the model itself, a normal vector, and an identification number. The identification number is used to match 3D parts to their correct locations. The normal vector is defined on the attachment point to help the application orient the parts precisely during the join process.
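A minimal data structure for such an attachment point might look like the following; the field names and the Vec3 type are illustrative.

```cpp
// Sketch: attachment point data carried by each 3D model.
struct Vec3 { double x, y, z; };

struct AttachmentPoint {
    int  id;        // matching id: parts join only where the ids agree
    Vec3 position;  // location relative to the model's own coordinate frame
    Vec3 normal;    // used to orient the parts precisely during the join
};
```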

Figure 30 illustrates the assembly process. Box 1 is located at position A, and it has two attachment points, identified in our example by the colors red and green. Attachment point red has a normal vector, vector u. Box 2, located at position B, has an attachment point red with normal vector v. To join box 1 onto box 2, we first compute the translation necessary to move an object from position A to B, T = B - A. Our algorithm checks for matching attachment point ids, in this case red with red, and makes sure that the two attachment points are within a certain proximity of each other before performing the join operation.

After obtaining the translation, we need to compute the rotation for box 1 so that it orients itself properly with respect to box 2. To do that, we first take vector u and negate it to yield -u. Our goal is to determine the transform that will line up vector -u with vector v. The dot product of -u and v yields the cosine of the angle between them; call that angle θ. We choose the axis of rotation, the cross-axis, by crossing vector -u with vector v (see figure 31). With this information, we can create a rotation matrix R, which is a rotation of -θ degrees about the cross-axis. With the rotation matrix R and the translation vector T, we can create a matrix that joins box 1 onto box 2 correctly. We call that matrix M.
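The sketch below illustrates this alignment computation with ordinary vector algebra. The Vec3 helpers and the axis-angle (Rodrigues) matrix construction are assumptions for the example, since the thesis does not specify its matrix library, and the sign convention may differ from the thesis figures.

```cpp
// Sketch: rotation that lines up vector a = -u with vector b = v, as used when
// joining two attachment points.
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(const Vec3& a) {
    double len = std::sqrt(dot(a, a));
    return { a.x / len, a.y / len, a.z / len };
}

struct Mat3 { double m[3][3]; };

// Rodrigues' formula: rotation by 'angle' radians about the unit axis 'k'.
Mat3 axisAngle(const Vec3& k, double angle) {
    double c = std::cos(angle), s = std::sin(angle), t = 1.0 - c;
    return {{{ t*k.x*k.x + c,     t*k.x*k.y - s*k.z, t*k.x*k.z + s*k.y },
             { t*k.x*k.y + s*k.z, t*k.y*k.y + c,     t*k.y*k.z - s*k.x },
             { t*k.x*k.z - s*k.y, t*k.y*k.z + s*k.x, t*k.z*k.z + c     }}};
}

// Rotation R such that R * (-u) is parallel to v (normals lined up for the join).
Mat3 alignNormals(const Vec3& u, const Vec3& v) {
    Vec3 a = normalize({ -u.x, -u.y, -u.z });
    Vec3 b = normalize(v);
    double angle = std::acos(dot(a, b));        // theta from the dot product
    Vec3   axis  = normalize(cross(a, b));      // cross-axis (undefined if a and b are parallel)
    return axisAngle(axis, angle);
}
```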

The rest of the process proceeds as usual. First we transfer the object node from one Cube's scene graph to the other's; in our case, we choose to transfer from Cube 2 to Cube 1, and apply the new matrix M so that the transferred object shows up attached to the object on Cube 1.

Figure 30: The assembly algorithm (before and after the assembly operation)

Figure 31: Dot product and cross product of vector -u and vector v

Disassembly Operation

After successfully creating an assembled part, the design review group might want to disassemble it, for example with the intention of putting another version of a part in that spot for comparison.

To disassemble, we need two Cube interfaces. The first Cube has the assembled part attached, and the second Cube has no model attached to it yet (see figure 32). Through the speech command "Computer, break" or "Computer, release", the Interaction module breaks the assembled part by obtaining the last object in the assembled part's scene graph and putting it on the empty Cube interface (see figure 32).

The order of the objects in the assembled part is determined by the assembly process. In the example, the rotor combination of the fishing reel was assembled last, so it came off first during the disassembly process.

Figure 32: Disassembly Operation

DISCUSSION

We have described three critical tasks that characterize a typical design review, and the use of our Augmented Reality Tangible Interfaces (ATIs) to help users accomplish them easily. The three challenges are:

1. 3D browsing

2. 3D positioning/orientation

3. 3D assembly/disassembly

3D Browsing

The first challenge in design review is model selection. The typical Open File dialog, even when augmented by 2D thumbnail photographs, doesn't give enough degrees of freedom to represent the nuances of a 3D model file, and it can create confusion if the thumbnails portray similar-looking 3D models from the same angle. The Card Stack interface provides a natural way to browse through a database of 3D objects. Using this interface, a user is able to "handle" the objects in the browsing set, viewing them from different angles to obtain more accurate views during the selection process. Going forward and backward in the database is achieved by swapping a card forward or backward, just like browsing through a set of family photos. Furthermore, browsing a 3D model database with the Card Stack interface simulates the existence of physical prototypes of the 3D models.

3D Positioning/Orientation

3D positioning and orientation, our second challenge, is a task that is constantly performed during design review. During a design review, the team interacts with various 3D parts, pointing out particular features and positioning the parts relative to one another in order to highlight certain relationships. The 2D interface approach to this positioning is indirect. Using the mouse and keyboard, the user must learn a complex set of UI sequences or mouse-keyboard combinations that indirectly translate and rotate a 3D model behind the monitor glass. Furthermore, since the mouse and keyboard are time-multiplexed devices, the user cannot perform positioning and orientation tasks simultaneously.

The Cube interface, by contrast, provides a mechanism to "handle" virtual objects, providing a more natural way to position and orient them. To translate virtual objects, the user simply moves the Cube interface through space to a new location; to orient an object, the user rotates the Cube as if he/she were rotating the virtual object itself. Because the Cube allows virtual objects to behave like real ones, the learning curve is low. Moreover, since the Cube interface is a space-multiplexed input device, translating and orienting an object can be done simultaneously.

3D Assembly/Disassembly

Today's engineering firms are creating more complex products, and they need the power of CAD systems to help them finish the products economically and accurately. This

complexity drives CAD beyond part creation and display. For example, for complex designs,

composed of multiple parts, each with its own design teams, verifying assembly is an

important task. Eventually the different teams have to gather together to verify that a design

can be effectively assembled, and if this verification can occur without physical prototypes it

speeds the design process and reduces cost. Much research has been devoted to methods to

address this task virtually [27] [30] [3].

To assemble two parts, the user must first position and orient the two parts so that the

points that are to attach together are in close proximity. With the Cube interface, the user can

easily place and rotate each part in six degrees of freedom. Then, using our speech based

assemble command, the user asks the system to find two matching attachment points in close

proximity to each other, and attach the two parts automatically using our docking algorithm.

We have previously discussed the limitations that CAD's 2D interface paradigm imposes on the user's interaction with 3D objects. Because the 2D desktop lacks the degrees of freedom needed to fully manipulate 3D objects naturally, various indirect 2D paradigms have been created as stand-ins for more natural 3D manipulation. These paradigms do not facilitate simultaneous rotation and translation (they are time-multiplexed), and they have high learning curves [17].

By contrast, applying ATIs to the problem of 3D interaction eliminates the need for these intermediate constructs and allows the user to easily and naturally "handle" virtual objects. The ATI devices we implemented combine the natural interfaces of real objects with the infinite, instantaneous malleability of virtual objects. Our ATIs give the user greater degrees of freedom, facilitate simultaneous operation, and have a low learning curve.

HCI Considerations

Augmented Reality Feasibility

AR is not a new research area. In 1965 Ivan Sutherland developed a technology that enabled virtual objects to coexist with the real world when he created the first HMD by attaching two head-worn miniature cathode ray tubes to a mechanical tracker [5]. Since those early days, augmented reality has emerged from the lab and found its way into everyday life in limited ways. For example, the virtual first-down marker in American football, "the yellow line", has become a fixture of fall telecasts. However, despite consistent research effort in computer graphics, virtual reality, and related fields over the past 40 years, AR technologies are far from ubiquitous. Several aspects of the technology need to improve before AR can join the mainstream.

Tracking Requirements

View registration is a critical step in AR; it maintains the illusion that the virtual objects and the real world coexist in the same space. There are three important criteria for AR tracking:

• Accuracy of alignment. Human vision is an extremely acute sense, and as a result an AR system requires a high degree of accuracy to maintain the illusion that virtual objects coexist in the real world. Azuma stated that the angular accuracy required for AR is a small fraction of a degree [1].

• Latency. In AR, tracking must occur in real time. Even small delays in tracking result in the appearance of virtual objects "catching up" to the real world.

• Range. The perfect AR system would be able to track a user anywhere in the world, regardless of their environment. However, with today's AR systems, users are bound to certain areas to ensure that the tracking can be performed reliably.

There are several tracking systems that have high accuracy and low latency, suitable for AR systems; however, they pay for these qualities with limited range. Consider UNC's HiBall tracking system as an example [32]. The HiBall tracker provides sub-millimeter positional accuracy and better than 2/100 of a degree angular accuracy. With its update rate of 1500 Hz, it is capable of reducing latency to one millisecond. However, because the HiBall tracker takes its cues from LED-embedded ceiling tiles to extrapolate its position and orientation, it only works in a room that has been specially prepared with LED ceilings.

Our system employed vision-based tracking built on the marker recognition algorithms implemented in the ARToolkit library. ARToolkit requires that every tracked object have a distinct marker attached to it. This approach provides reasonable accuracy because the virtual objects show up directly on the markers. Even though our implementation provided a convincing illusion, there was still noticeable "shake" of the virtual objects at times due to tracking instabilities; however, as camera resolution increases, this shake should be reduced to insignificant levels. The latency in our system is also noticeable, a function of the frame rate of the camera: images become "blurry" if the head rotation rate is more than 30 degrees per second. However, as camera frame rates increase, this motion blur should naturally be reduced to insignificant levels. Thus marker-based tracking is a promising avenue for meeting the accuracy and latency requirements of AR.

However, using markers for tracking limits its range to marked domains. The markers must never be too far from view, and large domains require much larger markers to support viewing from a distance. And obviously it is not possible to attach markers to every object in the world if we want a ubiquitous AR system.

Aside from the technical challenges, there are other considerations that pertain to the usability of AR's tracking technology. Tracking must be:

• Unbounded: It must not restrict the user's movement.

• Robust: It must be robust against user error and vandalism.

• Safe: It must not pose a danger to users, even with prolonged use.

• Stable: It must work well without repair for over 90% of the time.

• Easy to operate: Easy to set up, and easy to operate.

• Low cost: Initial cost and operating cost must be affordable.

We have discussed our system's technical limitations and promise in the tracking area. However, we feel that our tracking fared better from the HCI point of view. The vision tracking algorithm used in our system allows the user to move freely, and it proved both stable and robust. We have also shown that such a system is both very affordable and easy to operate.

Safety, however, is another issue. Our system's HMD only offers monocular vision with a 30 degree field of view (FOV) and a low resolution of 800x600. While this is sufficient for desktop work, it is certainly not sufficient for daily activities such as reading, much less walking or driving. With less than 30 degrees of FOV, a user wearing the HMD is very nearly legally blind [35]. Clearly, to extend this approach beyond desktop work would

require improvements in camera field of view, resolution and frame rate, as well as significant improvements in display weight and resolution.

AR For Collaborative Work

Billinghurst et al. conducted a user study in which they compared participants' performance on a collaborative task in a face-to-face setting, an AR setting, and a projection setting [6]. In the experiment, a pair of users was assigned to solve an urban design task. To complete the task, the users had to place nine real or virtual building models on a 3x3 street grid on a table. The users had to work together to place the buildings so as to satisfy a set of placement rules, for example: "The Church is next to the Theater", "The Fire House and the Bank are across the street from one another".

In the face-to-face setting, the two users stood on opposite sides of the table, and they had to arrange paper models of the buildings according to the given rules (see figure 33). In the AR condition, the two participants again stood on opposite sides of the table, but instead of using paper models, each user wore an HMD-camera pair that allowed them to see a set of virtual buildings rendered on top of marker cards (see figure 33). To position the buildings, the users moved the corresponding marker cards. In the third, projective condition, the two users sat side by side in front of a projective display (see figure 33). A 3D mouse was used to select, position, and orient the virtual buildings.

Fourteen pairs of adults, six pairs of women and eight pairs of men, ranging in age from twenty-one to thirty-eight years old, were tested. In the experiment, both objective and subjective measures were taken into consideration. For the objective measures, the experimenters recorded the average solving time and the users' communication methods.

Figure 34 shows the average solving time for each condition. Surprisingly, the AR condition had the longest solving time, not the projection condition; it was slower by more than a minute on average compared to the projection condition. However, as figure 35 shows, the percentage of deictic phrases in the AR condition is much closer to that of the face-to-face condition. Deictic phrases are phrases that include words such as "this", "that", or "there", which cannot be fully understood without some extra explanation such as a pointing gesture.

Figure 33: The three conditions for the experiment: face-to-face (left), AR (middle), projection (right). (Image is reprinted with permission from [6])

Figure 34: Solving time for each condition. (Image is reprinted with permission from [6])

Figure 35: Deictic phrase percentage for each condition. (Image is reprinted with permission from [6])

Figure 36: Pick-and-move survey feedback. (Image is reprinted with permission from [6])

The users were also asked to fill out a survey on how they felt about the interface and the experience. Regarding picking and moving buildings (with 0 being hardest and 8 easiest), the users rated the face-to-face condition almost the same as the AR condition (see figure 36); in both cases, the users felt these tasks were harder in the projection condition. However, for collaborative work, the AR condition again received a low rating. On a scale of one to three, with one being the easiest and three the hardest, AR was rated the hardest condition for collaborative work. The AR condition was also the least liked and was rated lowest for ease of understanding one's partner. However, the users felt that the AR interface was more intuitive than the projection interface.

Figure 37: Users' subjective experience feedback: mean ratings for "ease of working together", "liked the best", "understanding your partner", and "most intuitive interface" across the face-to-face, projection, and AR conditions, with ANOVA F(2,66) and p values. (Image is reprinted with permission from [6])

The purpose of using AR in collaborative work is to supplement face-to-face meetings with computing power. We believed that the AR condition should at least have performed better than the projection condition; however, working in the AR setting produced the longest completion times. In interviews with the study participants, most of them mentioned perceptual problems with the AR interface, such as the limited field of view, low resolution, and blurry images. Two users commented that the AR condition created a form of tunnel vision. In addition, the users commented on the limitations of vision-based tracking: pointing at a virtual building on a card could make the building disappear if the marker on the card was occluded by the hand. There were also cases in which different buildings appeared on the same card due to misidentification of the marker pattern. Some participants also mentioned that the bulky HMD-camera pair made it impossible for AR users to see the other users' gaze direction.

However, there was also positive feedback from the users of the AR condition. One user said, "AR is more real since you are actually holding something just like the real thing." In contrast, eight out of ten projective-setting participants mentioned the difficulty of interacting with the virtual objects; some said this was because their hands kept bumping into their partner's hand. One AR user said that the AR condition was better because she could point and also see where her partner pointed.

Billinghurst's experiment showed partial success in using ATIs to interact with 3D objects, especially in collaborative work. Although the users were hindered by the limitations of the AR affordances, they felt that they could interact with the objects more easily than in the projection condition. This was further confirmed by the percentage of deictic speech patterns used in the AR condition being comparable to the face-to-face condition.

We need a better AR display interface, with stereo viewing capability, a wider field of view, and higher resolution. We also need a tracking technology that is unencumbering, intuitive, and safe for the users. In conclusion, we believe that improvements in AR technology will produce results comparable to unmediated face-to-face collaboration. Certainly this is not wishful thinking, because AR research is still actively pursued by scientists and engineers around the world.

Tangible User Interface Feasibility

To analyze the usability of our tangible interfaces, we use two measures. The first is Kato and Billinghurst's ATI design principles. The second is Fishkin's TUI taxonomy.

Tangible User Interface Design Principles

Kato and Billinghurst described six ATI design principles [4], and we believe that these six principles outline the benefits of using tangible interfaces. They are:

1. Match the physical constraints of the object to the requirements of the task.

2. Support parallel activity, allowing multiple objects or interface elements to be manipulated simultaneously.

3. Support physically based interaction techniques (such as using object proximity or spatial relations).

4. Support both time-multiplexed and space-multiplexed interactions.

5. Support multi-handed interaction.

6. Support multi-participant collaboration.

We followed these tangible design principles when designing the Card Stack interface. For example, following principle one, which states that the physical object should match the requirements of the task, we specified the requirements of the task as the ability to browse through a database of objects while being able to view a 3D object with more degrees of freedom. Gesturing forward and backward is a natural way to browse through a database of objects; the familiarity of this gesture is like browsing through a stack of family pictures (principle three). The Card Stack has markers on both its front and back sides, so the user is able to flip it over and see the back side of the 3D model. The Card Stack interface also supports multi-handed interaction (principle five).

The Cube interface also follows the TUI design principles. The user uses the Cube's location and pose to control the translation and orientation of virtual objects (principle three). Rotation and translation can be performed simultaneously, adhering to principle four. Leveraging spatial relations, the user is able to use two Cubes to perform interactions such as assembly or disassembly (principle two).

The Table interface allows virtual objects to be put on it as if they were physical objects being put on a physical table (adhering to principles one and three). In a multi-participant work setting, the Table interface facilitates viewing virtual objects collectively but from each participant's independent point of view (principle six).

Fishkin's TUI Taxonomy

Fishkin [13] presented a taxonomy that reveals how "tangible" a system is. He proposed a 2D taxonomy with embodiment on one dimension and metaphor on the other. He explained that a TUI is more "tangible" if it uses full embodiment and full metaphor; however, he stated that a more "tangible" TUI is not necessarily better, but that using a more "tangible" UI results in less cognitive load than using a less "tangible" UI.

Fishkin defined embodiment as the closeness of input and output. There are four levels of embodiment, namely: distant, environmental, nearby, and full. Distant embodiment refers to a condition in which the output is far from the input, or as he described it, "over there". An example of distant embodiment is the TV remote control.

Environmental embodiment is a condition in which the output is around the user; for example, environmental embodiment is apparent in most sound editing applications. Among TUIs, ToonTown was categorized as having environmental embodiment.

In the nearby embodiment condition, the output takes place near the input object. For example, in the GraspDraw project, the user manipulates two bricks, and the result appears on the augmented workbench.

In full embodiment, the output device is the input device. This is the ideal embodiment for a tangible interface.

Using augmented reality technology, we made it possible for our ATIs to have full embodiment. The Card Stack interface has full embodiment because the virtual objects show up on the card directly, and by manipulating the card, the virtual objects are also manipulated. This is also true for the Cube interface: by rotating or translating the cube physically, the virtual object follows the cube's behavior as if it were the cube. Interaction with the Table also happens directly, as if the users were interacting with a real table. There is no intermediary between the ATI (the input device) and the output.

Metaphor in a TUI refers to how analogous the user's action in the system is to a similar action in the real world. Fishkin divided a TUI's metaphor characteristic into four levels, namely: none, noun or verb, noun and verb, and full.

The first level is the condition in which no metaphor is used at all, for example the command line interface; the command line has no analogy in the real world.

In a system that uses the noun analogy, the physical shape/look/sound of the interface object is important, but the actions employed with that object are not. An example in conventional HCI is the Windows desktop system: although a window is analogous to a document on a real desktop, most operations on physical documents (such as crumpling, stapling, or burning) are not mapped to virtual windows. In the verb analogy, the shape of the object is irrelevant; instead, the gesture or action performed with the object is what matters, for example the paddle in the VOMAR project.

Stronger metaphors are used at the "noun and verb" level and the full metaphor level. The ToonTown project utilizes the noun and verb metaphor because both the shape of the object and the action performed on it are related to their real-world counterparts. In the full metaphor, the user does not need to make an analogy at all because the interaction is direct; for example, using the stylus on a tablet PC.

We have established that all of the TUIs in our system have full embodiment. However, to analyze the metaphor variable, we need to break down each TUI into its separate interaction capabilities. The deletion, transfer, and join operations have no metaphor or analogy to the real world, because we use a speech interface to accomplish those tasks. The positioning and orientation task with the Cube has a verb metaphor, and so do putting objects onto and picking them up from the Table. The browsing gesture with the Card Stack interface has the noun and verb metaphor, because the shape of the cards and the act of swapping are closely related.

Metaphor ->     None               Noun or Verb                   Noun and Verb    Full
Embodiment
Full            Deletion,          Positioning and orientation,   Browsing
                Transfer, Join     put, pick up
Nearby
Environmental
None

Figure 38: TUI taxonomy applied to our ATIs

In conclusion, we have successfully created tangible interfaces that have full embodiment [13]. This was made possible through augmented reality technology. Interactions between the various ATI items are accomplished through a speech interface, which allows for two-handed interaction but receives a low metaphor score because it lacks a real-world analogy. Interactions with low metaphor are generally not very intuitive; in our case, the user must be taught the speech commands beforehand. However, with a gesture interface analogous to the real world, the user might be able to intuit the gestures associated with the supported interactions based on the properties of the virtual and physical objects.

FUTURE WORKS

The discussion above and Billinghurst's user study showed partial success of ATIs for interacting with CAD models in collaborative settings. There are a number of ways to improve and expand our work.

We like the optical tracking strategy because it is cheap and doesn't hamper the users or the environment with cables or boxes. However, as discussed in the user study section, a partially covered marker ruins the tracking. We can alleviate this problem by using multiple markers on all tracked surfaces. As with the Table interface, the multi-marker technique still works when some markers are partially covered. However, in order to successfully use multiple markers on each tracked surface, we need a camera that can capture at a higher resolution, because the individual markers must shrink as their count increases. We believe that a faster computer will also be necessary, because processing many markers requires more processing time.

We plan to expand the interaction possibilities of the Table interface. When discussing an object in the center of the table, the users may want to annotate a particular part of the object, or look at a cross-section, similar to the work by Regenbrecht et al. [26]. We can use a marker with a virtual laser pointer to point to the specific part that is to be annotated. We think that the user could annotate the part using speech instead of typing descriptions with the keyboard.

We also plan to investigate how to give inter-ATI interactions more real-world analogy. As discussed in the previous section, inter-ATI interactions in our system are primarily performed through speech commands. Following Fishkin's TUI metaphor principle, we think that using gestures for these interactions would be more intuitive. For example, to delete an item from an ATI, we could put that ATI into a garbage-can type area. To transfer and to assemble, we could leverage the spatial locations of the two Cube interfaces: if they are in close proximity to each other for a specified period of time, the computer can decide to transfer the virtual objects to an empty Cube, or to assemble them if the other Cube is not empty.

Finally, we think that remote collaboration will be a necessary feature in the future, because design teams are increasingly spread across different locations. For example, PTC's Wildfire CAD software already supports remote collaboration [25]. Adding a remote collaboration feature is not an easy task; there are many underlying concepts that must be thought through. Brave and Ishii [9] discussed the extension of tangible interfaces to remote collaboration settings. In such a setting, what mechanism should we, the designers, use to let the local user manipulate physical objects in remote spaces? How can we simulate the remote users' presence for the local user? These are questions that need further research and user studies. As an early prototype investigating this, we think we could use the Table interface extensively. The local user could use a marked card on top of the Table to represent each remote collaborator. The video image of each remote collaborator's face would show up on top of the corresponding marked card. By determining the physical positions of the remote collaborators (using the physical cards), the local user takes advantage of the collaborators' spatial positions just as in real face-to-face collaboration. Each participant could browse and manipulate virtual objects at their own location and put them on the Table interface for group discussion. Using a tangible pointer tool, a local user could point to a specific part of the 3D model, while the remote users would see the pointed-at part highlighted or marked with a colored dot. The user could also annotate parts using speech and do cross-section analysis for the remote collaborators. Extending from here, we can imagine using a magnetically controlled workbench that allows the physical interfaces to move to represent the movement of remote users.

REFERENCES

1 Azuma, R. (1997). A Survey of Augmented Reality. Presence, 6(4), pp. 355-385.
2 Bier, E. Skitters and Jacks: Interactive 3D Positioning Tools. In Proceedings of the 1986 Workshop on Interactive 3D Graphics, October 1986, pp. 237-249.
3 Bier, E. Snap-Dragging: Interactive Geometric Design in Two and Three Dimensions. Technical Report EDL-89-2, Xerox Palo Alto Research Center, 1989.
4 Billinghurst, M. "Crossing the Chasm." Proceedings of the International Conference on Augmented Tele-Existence (ICAT 2001), Tokyo, Japan, December 2001.
5 Billinghurst, M., Kato, H. (2002). Collaborative Augmented Reality. Communications of the ACM, 45(7), pp. 64-70.
6 Billinghurst, M., Kato, H., Kiyokawa, K., Belcher, D., Poupyrev, I. (2002). Experiments with Face to Face Collaborative AR Interfaces. Virtual Reality Journal, Vol. 4, No. 2, 2002.
7 Billinghurst, M., Kato, H. Out and About: Real-World Teleconferencing. British Telecom Technology Journal, 18(1), January 2000, pp. 80-82.
8 Bimber, O., Brunner, S., Madder, S. (2003). Report on Tracking Technology. Report for European Information Society Technologies, Project Number IST-2000-28610.
9 Brave, S., Ishii, H., Dahley, A. Tangible Interfaces for Remote Collaboration and Communication. Proceedings of CSCW '98.
10 "CAD History 1960's", http://mbinfo.mbdesign.net/CAD1960.htm, visited February 1, 2005.
11 "CAD Software History", http://www.cadazz.com/cad-software-history.htm, visited February 1, 2005.
12 Caudell, T., Mizell, D. "Augmented Reality: An Application of Heads-Up Display Technology to Manual Manufacturing Processes." Proceedings of the Hawaii International Conference on Systems Sciences, Maui, Hawaii, IEEE Press, January 1992, pp. 659-669.
13 Fishkin, K. A Taxonomy for and Analysis of Tangible Interfaces. Personal and Ubiquitous Computing, 8:347-358, 2004.
14 Fitzmaurice, G., Ishii, H., Buxton, W. Bricks: Laying the Foundations for Graspable User Interfaces. Proceedings of CHI 1995, May 7-11, 1995.
15 Gorbet, M., Orth, M., Ishii, H. "Triangles: Tangible Interface for Manipulation and Exploration of Digital Information Topography." In Proceedings of CHI '98, pp. 49-56.
16 "Graphical User Interface. Time for a Paradigm Shift?", http://www.sensomatic.com/chz/gui/Alternative.html, visited March 2005.

17 Ishii, H., Ullmer, B. (2001). Emerging Frameworks for Tangible User Interfaces. In "Human-Computer Interaction in the New Millennium," John M. Carroll, ed.; Addison-Wesley, August 2001, pp. 579-601.
18 Ishii, H., Ullmer, B. (1997). Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 234-241.
19 Kato, H., Billinghurst, M. (1999). Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System. Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality '99, pp. 85-94.
20 Kato, H., Poupyrev, I. (2000). Virtual Object Manipulation on a Table-Top Environment. Proceedings of ISAR 2000, pp. 111-119.
21 Kato, H., Tachibana, K. (2003). A City-Planning System Based on Augmented Reality with a Tangible Interface. Proceedings of ISMAR '03.
22 "Microsoft Speech SDK 5.1 for Windows Applications", http://www.microsoft.com/speech/download/sdk51/, visited December 10, 2004.
23 Milgram, P., Takemura, H. (1994). Augmented Reality: A Class of Displays on the Reality-Virtuality Continuum. SPIE Vol. 2351, Telemanipulator and Telepresence Technologies, 1994.
24 Polytrans 3D File Converter, http://www.okino.com, visited April 1st, 2005.
25 "PTC Wildfire 2.0", http://www.ptc.com/community/proewf2/newtools/, visited April 1st, 2005.
26 Regenbrecht, H., Wagner, M. (2002). Interaction in a Collaborative Augmented Reality Environment. Proceedings of CHI 2002, April 20-25, 2002.
27 Ritter, F., Strothotte, T., Deussen, O., Preim, B. "Virtual 3D Puzzles: A New Method for Exploring Geometric Models in VR." IEEE Computer Graphics and Applications, 21(5):11-13, Sept./Oct. 2001.
28 Schmalstieg, D., Fuhrmann, A., Szalavari, Z., Gervautz, M. (1996). Studierstube: An Environment for Collaboration in Augmented Reality. In CVE '96 Workshop Proceedings, September 1996.
29 Singer, A., Hindus, D., Stifelman, L., White, S. "Tangible Progress: Less is More in Somewire Audio Spaces." In Proceedings of CHI '99, pp. 104-111.
30 Stork, A., Maidhof, M. Efficient and Precise Solid Modelling Using a 3D Input Device. In Proceedings of the 4th ACM Symposium on Solid Modeling and Applications, 1997.
31 Sutherland, I. (1963). Sketchpad: A Man-Machine Graphical Communication System. Proceedings of the AFIPS Spring Joint Computer Conference, 1963.
32 "UNC HiBall Tracking System", http://www.cs.unc.edu/~tracker/media/html/hiball.html, visited April 1st, 2005.
33 "User Interface: Timeline", http://www.cne.gmu.edu/itcore/userinterface/timeline.html, visited March 1st, 2005.

34 Watanabe, R., Itoh, Y., Asai, M., Kitamura, Y. "The Soul of ActiveCube: Implementing a Flexible, Multimodal, Three-Dimensional Spatial Tangible Interface." Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (ACE 2004), pp. 173-180.
35 "Yahoo: Ask, Legally Blind?", http://ask.yahoo.com/ask/20021031.html, visited March 2005.

ACKNOWLEDGEMENTS

"Yesterday is eternal". These words came to my mind, first just as a cool band name.

But it actually summarizes the unconscious reality that my sometimes-sentimental self thinks about. I like to remember yesterday. Yesterday is mostly sweet. Yesterday is eternal.

First, I want to thank God: Jesus Christ, for his love and mercy in my life. Thank you

for all the opportunities and talent you've blessed me with. I return all the glory to you.

I want to thank my parents, Peter and Margaret Sidharta. Thank you for the love and

the support you guys have given me. Also, love and thanks for my brother and sister:

Raymond and Ramona. Being able to live with you guys all through college was a blessing.

Thanks to Julie, for being there always, and believing in me. Also thanks to my

friends in VRAC for all their help, and fun during slacking time.

Thank you to my committee members for taking their time to review this work. To Dr.

Reiners for sparing a lot of his time to help me with matrices. Dr. Cruz-Neira, and Dr. Oliver

for their support, and advice.

Finally, but not least, I want to thank Dr. Adrian Sannier. Boss, thanks for believing

in me. You've taught me so many things, and fueled my ambitions. Thanks for the support,

the ideas, and for editing my thesis.