EyeTap Devices for Augmented, Deliberately Diminished, or Otherwise Altered Visual Perception of Rigid Planar Patches of Real-World Scenes

Steve Mann
[email protected]
James Fung
[email protected]
University of Toronto
10 King's College Road
Toronto, Canada

Presence, Vol. 11, No. 2, April 2002, 158-175
© 2002 by the Massachusetts Institute of Technology

Abstract

Diminished reality is as important as augmented reality, and both are possible with a device called the Reality Mediator. Over the past two decades, we have designed, built, worn, and tested many different embodiments of this device in the context of wearable computing. Incorporated into the Reality Mediator is an "EyeTap" system, which is a device that quantifies and resynthesizes light that would otherwise pass through one or both lenses of the eye(s) of a wearer. The functional principles of EyeTap devices are discussed in detail. The EyeTap diverts into a spatial measurement system at least a portion of light that would otherwise pass through the center of projection of at least one lens of an eye of a wearer. The Reality Mediator has at least one mode of operation in which it reconstructs these rays of light, under the control of a computer system. The computer system then uses new results in algebraic projective geometry and comparametric equations to perform head tracking, as well as to track motion of rigid planar patches present in the scene. We describe how our tracking algorithm allows an EyeTap to alter the light from a particular portion of the scene to give rise to a computer-controlled, selectively mediated reality. An important difference between mediated reality and augmented reality is the ability not just to augment but also to deliberately diminish or otherwise alter the visual perception of reality. For example, diminished reality allows additional information to be inserted without causing the user to experience information overload. Our tracking algorithm also takes into account the effects of automatic gain control, by performing motion estimation in both spatial as well as tonal motion coordinates.

1 Introduction

Ivan Sutherland, a pioneer in the field of computer graphics, described a head-mounted display with half-silvered mirrors so that the wearer could see a virtual object superimposed on reality (Earnshaw, Gigante, & Jones, 1993; Sutherland, 1968), giving rise to augmented reality (AR). Others have adopted Sutherland's concept of a head-mounted display (HMD) but generally without the see-through capability.

An artificial environment in which the user cannot see through the display is generally referred to as a virtual reality (VR) environment.


One of the reasons that Sutherland's approach was not more ubiquitously adopted is that he did not merge the virtual object (a simple cube) with the real world in a meaningful way. Feiner's group was responsible for demonstrating the viability of AR as a field of research, using sonar (Logitech 3-D trackers) to track the real world so that the real and virtual worlds could be registered (Feiner, MacIntyre, & Seligmann, 1993a, 1993b). Other research groups (Fuchs, Bajura, & Ohbuchi; Caudell & Mizell, 1992) also contributed to this development. Some research in AR also arises from related work by Drascic and Milgram (1996).

However, the concept of the Reality Mediator, which arises from the field of humanistic intelligence (HI) (Mann, 1997a, 2001a, 2001b), differs from augmented reality, which has its origins in the field of virtual reality. HI is defined as intelligence that arises from the human being in the feedback loop of a computational process in which the human and computer are inextricably intertwined. Wearable computing has emerged as the perfect tool for embodying HI. When a wearable computer functions as a successful embodiment of HI, the computer uses the human's mind and body as one of its peripherals, just as the human uses the computer as a peripheral. This reciprocal relationship, in which each uses the other in its feedback loop, is at the heart of HI. Within an HI framework, the wearable computer is worn constantly to assist the user in a variety of day-to-day situations.

An important observation arising from this constant use is that, unlike handheld devices, laptop computers, and PDAs, the wearable computer can encapsulate us (Mann, 1998). It can function as an information filter and allow us to block out material we might not wish to experience (such as offensive advertising) or simply replace existing media with different media. Thus, the wearable computer acts to mediate one's experience with the world. The mediating role of EyeTap devices and wearable computers can be better understood by examining the signal flow paths between the human, computer, and external world, as illustrated in figure 1.

Figure 1. (a) The wearable computer can be used like clothing to encapsulate the user and function as a protective shell, whether to protect us from cold or physical attack (as traditionally facilitated by armor), or to provide privacy (by concealing personal information and personal attributes from others). In terms of signal flow, this encapsulation facilitates the possible mediation of incoming information to permit solitude and the possible mediation of outgoing information to permit privacy. It is not so much the absolute blocking of these information channels that is important; it is the fact that the wearer can control to what extent, and when, these channels are blocked, modified, attenuated, or amplified, in various degrees, that makes wearable computing much more empowering to the user than other similar forms of portable computing. (b) An equivalent depiction of encapsulation (mediation) redrawn where the encapsulation is understood to comprise a separate protective shell.

There exist well-known email and Web browser filters that replace or remove unwanted advertising, mediating one's use of the media. Diminished reality extends this mediation to the visual domain.

2 EyeTap Devices

Just as half-silvered mirrors are used to create augmented reality, EyeTap devices are used to mediate one's perception of reality. EyeTap devices have three main components:

• a measurement system, typically consisting of a camera system or sensor array with appropriate optics;
• a diverter system, for diverting eyeward-bound light into the measurement system and therefore causing the eye of the user of the device to behave, in effect, as if it were a camera; and
• an aremac, for reconstructing at least some of the diverted rays of eyeward-bound light. (Thus, the aremac does the opposite of what the camera does and is, in many ways, a camera in reverse. The etymology of the word aremac itself arises from spelling the word camera backwards (Mann, 1997c).)

A number of such EyeTap devices, together with wearable computers, were designed, built, and worn by the authors for many years in a wide variety of settings and situations, both inside the lab as well as in ordinary day-to-day life (such as while shopping, riding a bicycle, going through airport customs, attending weddings, and so on). This broad base of practical real-life experience helped us better understand the fundamental issues of mediated reality.

Although the apparatus is for altering our vision, in most of the practical embodiments that we built we provided at least one mode of operation that can preserve our vision unaltered. This one mode, which we call the "identity mode," serves as a baseline that forms a point of departure for when certain changes are desired. To achieve the identity mode requirement, the EyeTap must satisfy three criteria:

• focus: The subject matter viewed through the EyeTap must be displayed at the appropriate depth of focus.
• orthospatiality: The rays of light created by the aremac must be collinear with the rays of light entering the EyeTap, such that the scene viewed through the EyeTap appears the same as if viewed in the absence of the EyeTap.
• orthotonality: In addition to preserving the spatial relationship of light entering the eye, we desire that the EyeTap device also preserve the tonal relationships of light entering the eye.

2.1 Focus and Orthospatiality in EyeTap Systems

The aremac has two embodiments: one in which a focuser (such as an electronically focusable lens) tracks the focus of the camera to reconstruct rays of diverted light in the same depth plane as imaged by the camera, and another in which the aremac has extended or infinite depth of focus so that the eye itself can focus on different objects in a scene viewed through the apparatus. Although we have designed, built, and tested many of each of these two kinds of systems, this paper describes only the systems that use focus tracking.

In the focus-tracking embodiments, the aremac has its focus linked to the measurement system (for example, "camera") focus, so that objects seen depicted on the aremac of the device appear to be at the same distance from the user of the device as the real objects so depicted. In manual focus systems, the user of the device is given a focus control that simultaneously adjusts both the aremac focus and the "camera" focus. In automatic focus embodiments, the camera focus also controls the aremac focus. Such a linked focus gives rise to a more natural viewfinder experience. It reduces eyestrain as well. Reduced eyestrain is important because these devices are intended to be worn continually. The operation of the depth-tracking aremac is shown in figure 2.

Because the eye's own lens (L3) experiences what it would have experienced in the absence of the apparatus, the apparatus, in effect, taps in to and out of the eye, causing the eye to become both the camera and the viewfinder (display). Therefore, the device is called an EyeTap device.

Often, lens L1 is a varifocal lens or otherwise has a variable field of view (such as a "zoom" functionality). In this case, it is desired that the aremac also have a variable field of view. In particular, field-of-view control mechanisms (whether mechanical, electronic, or hybrid) are linked in such a way that the aremac image magnification is reduced as the camera magnification is increased. Through this appropriate linkage, any increase in magnification by the camera is negated exactly by decreasing the apparent size of the viewfinder image. The operation of the aremac focus and zoom tracking is shown in figure 3.

Stereo effects are well known in virtual reality systems (Ellis, Bucher, & Menges, 1995), wherein two information channels are often found to create a better sense of realism. Likewise, in stereo embodiments of the devices that we built, there were two cameras or measurement systems and two aremacs that each regenerated the respective outputs of the camera or measurement systems. The apparatus is usually concealed in dark sunglasses that wholly or partially obstruct vision except for what the apparatus allows to pass through.
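The focus and field-of-view linkages just described amount to two simple control invariants: the aremac always focuses to the same depth plane as the camera, and the aremac magnification is always the reciprocal of the camera magnification, so any camera zoom is negated exactly. The sketch below is ours, not the authors' controller firmware; the servo setters are hypothetical stand-ins for whatever focus-controller hardware is present.

```python
# Minimal sketch of the camera-to-aremac linkage (hypothetical interfaces).

def update_aremac(camera_focus_m: float, camera_magnification: float,
                  set_aremac_focus, set_aremac_magnification) -> None:
    """Slave the aremac to the camera so the tapped eye sees the scene
    as it would in the absence of the apparatus (the "identity mode")."""
    # Invariant 1: same depth plane, so displayed subject matter appears
    # at the same distance as the real subject matter it replaces.
    set_aremac_focus(camera_focus_m)
    # Invariant 2: reciprocal magnification, so any increase in camera
    # magnification is negated by shrinking the apparent viewfinder image.
    set_aremac_magnification(1.0 / camera_magnification)
```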

Figure 2. Focus-tracking aremac: (a) With a nearby subject, a point P0 that would otherwise be imaged at P3 in the eye of a user of the device is instead imaged to point P1 on the image sensor, because the diverter diverts eyeward-bound light to lens L1. When subject matter is nearby, the L1 focuser moves objective lens L1 out away from the sensor automatically, as an automatic-focus camera would. A signal from the L1 focuser directs the L2 focuser, by way of the focus controller, to move lens L2 outward away from the light synthesizer. In many of the embodiments of the system that we built, the functionality of the focus controller was implemented within a wearable computer that also processed the images. We designed and built a focus controller card as a printed circuit board for use with the Industry Standards Association (ISA) bus standard that was popular at the time of our original design. We also designed and built a PC104 version of the focus controller board. Our focus controller printed circuit layout (available in PCB format) is released under the GNU GPL, and is downloadable from http://wearcam.org/eyetap_focus_controller, along with a FIFO implementation of a serial select servo controller. (Later, we built some other embodiments that use a serial port of the wearable computer to drive a separate focus controller module.) The focus controller drives up to four servos to adjust the position of lenses L2 of a stereo rig, as well as the two lenses L1. In other embodiments, we used automatic-focus cameras and derived the signal controlling the servo position for lens L2 by extracting the similar servo positioning signal from the focus adjustment of the autofocus camera. At the same time, an image from the sensor is directed through an image processor (PROC) into the light synthesizer (SYNTH). Point P2 of the display element is responsive to point P1 of the sensor. Likewise, other points on the light synthesizer are each responsive to corresponding points on the sensor, so that the synthesizer produces a complete image for viewing through lens L2 by the eye, after reflection off the backside of the diverter. The position of L2 is such that the eye's own lens L3 will focus to the same distance as it would have focused in the absence of the entire device. (b) With distant subject matter, rays of parallel light are diverted toward the sensor, where lens L1 automatically retracts to focus these rays at point P1. When lens L1 retracts, so does lens L2, and the light synthesizer ends up generating parallel rays of light that bounce off the backside of the diverter. These parallel rays of light enter the eye and cause its own lens L3 to relax to infinity, as it would have in the absence of the entire device.

Figure 3. Focus of the right camera and both aremacs (as well as vergence) is controlled by the autofocus camera on the left side. In a two-eyed system, it is preferable that both cameras and both aremacs focus to the same distance. Therefore, one of the cameras is a focus master and the other camera is a focus slave. Alternatively, a focus combiner is used to average the focus distance of both cameras and then make the two cameras focus at equal distance. The two aremacs, as well as the vergence of both systems, also track this same depth plane as defined by camera autofocus.

2.2 Importance of the Orthospatial Criterion

Registration, which is important to augmented-reality systems (You, Neumann, & Azuma, 1999; Azuma, 2001; Behringer, 1998), is also important in mediated reality.

Of the three registration criteria (focus, orthospatiality, orthotonality), an important one is the orthospatial criterion, for mitigation of any resulting mismatch between the viewfinder image and the real world that would otherwise create an unnatural mapping. Indeed, anyone who has walked around holding a small camcorder up to his or her eye for several hours a day will obtain an understanding of the ill psychophysical effects that result.

The diverter system in an EyeTap allows the center of projection of the camera to optically coincide with the center of projection of the eye. This placement of the camera makes the EyeTap different from other head-mounted camera systems that place the camera only "near to" the eye's center of projection. We will now discuss how the camera placement of the EyeTap allows it to work without parallax in a variety of situations, without the limitations experienced by head-mounted camera systems.

It is easy to imagine a camera connected to a television screen and carefully arranged in such a way that, when viewed from a particular viewpoint, the television screen displays exactly what is blocked by the screen, so that an illusory transparency results. This illusory transparency would hold only so long as the television is viewed from this particular viewpoint. Moreover, it is easy to imagine a portable miniature device that accomplishes this situation, especially given the proliferation of consumer camcorder systems (such as portable cameras with built-in displays). We could attempt to achieve this condition with a handheld camcorder, perhaps miniaturized to fit into a helmet-mounted apparatus, but it is impossible to align the images exactly with what would appear in the absence of the apparatus. We can better understand this problem by referring to figure 4.

Figure 4. The small lens (22) shown in solid lines collects a cone of light bounded by rays 1C and 2C. Consider, for example, eyeward-bound ray of light 1E, which may be imagined to be collected by a large fictional lens 22F (when in fact ray 1C is captured by the actual lens 22), and focused to point 24A. The sensor element collecting light at point 24A is displayed as point 32A on the camcorder viewfinder, which is then viewed by a magnifying lens and emerges as ray 1D into the eye (39). It should be noted that the top of the nearby subject matter (23N) also images to point 24A and is displayed at point 32A, emerging as ray 1D as well. Thus, nearby subject matter 23N will appear as shown in the dotted line denoted 23F, with the top point appearing as 23FA even though the actual point should appear as 23NA (that is, it would appear as point 23NA in the absence of the apparatus). Thus, a camcorder cannot properly function as a true EyeTap device.

Figure 5 shows, in detail, how, in figure 4, we imagine that the objective lens of the camera, placed directly in front of the eye, is much larger than it really is, so that it captures all eyeward-bound rays of light, for which we can imagine that it processes these rays in a collinear fashion. However, this reasoning is pure fiction, and it breaks down as soon as we consider a scene that has some depth of field.

Thus, the setup of figures 4 and 5 works for only a particular viewpoint and for subject matter in a particular depth plane. Although the same kind of system could obviously be miniaturized and concealed in ordinary-appearing sunglasses, in which case the limitation to a particular viewpoint is not a problem (because the sunglasses could be anchored to a fixed viewpoint with respect to at least one eye of a user), the other important limitation, that such systems work only for subject matter in the same depth plane, remains.

This problem exists whether the camera is right in front of the display or off to one side. Some real-world examples, having the camera to the left of the display, are shown in figure 6. In these setups, subject matter moved closer to the apparatus will show as being not properly aligned. Consider a person standing right in front of the camera but not in front of the TV in figure 6. Clearly, this person will not be behind the television but yet will appear on the television. Likewise, a person standing directly behind the television will not necessarily be seen by the camera, which is located to the left of the television. Thus, subject matter that exists at a variety of different depths and is not confined to a plane may be impossible to align in all areas with its image on the screen.

3 VideoOrbits Head Tracking and Motion Estimation for EyeTap Reality Mediation

Because the device absorbs, quantifies, processes, and reconstructs light passing through it, there are extensive applications in creating a mediated version of reality. The computer-generated information or virtual light as seen through the display must be properly registered and aligned with the real-world objects within the user's field of view. To achieve this, a method of camera-based head tracking is now described.

3.1 Why Camera-Based Head Tracking?

A goal of personal imaging (Mann, 1997b) is to facilitate the use of the Reality Mediator in ordinary everyday situations, not just on a factory assembly-line "workcell" or other restricted space. Thus, it is desired that the apparatus have a head tracker that need not rely on any special apparatus being installed in the environment. Therefore, we need a new method of head tracking based on the use of the camera capability of the apparatus (Mann, 1997b) and on the VideoOrbits algorithm (Mann & Picard, 1995).

Figure 5. Suppose the camera portion of the camcorder, denoted by reference numeral 10C, were fitted with a very large objective lens (22F). This lens would collect eyeward-bound rays of light 1E and 2E. It would also collect rays of light coming toward the center of projection of lens 22. Rays of light coming toward this camera center of projection are denoted 1C and 2C. Lens 22 converges rays 1E and 1C to point 24A on the camera sensor element. Likewise, rays of light 2C and 2E are focused to point 24B. Ordinarily, the image (denoted by reference numeral 24) is upside down in a camera, but cameras and displays are designed so that, when the signal from a camera is fed to a display (such as a TV set), it shows right side up. Thus, the image appears with point 32A of the display creating rays of light such as the one denoted 1D. Ray 1D is responsive to, and collinear with, ray 1E that would have entered the eye in the absence of the apparatus. Likewise, by similar reasoning, ray 2D is responsive to, and collinear with, eyeward-bound ray 2E. It should be noted, however, that the large lens (22F) is just an element of fiction. Thus, lens 22F is a fictional lens, because a true lens should be represented by its center of projection; that is, its behavior should not change, other than by depth of focus, diffraction, and amount of light passed, when its iris is opened or closed. Therefore, we could replace lens 22F with a pinhole lens and simply imagine lens 22 to have captured rays 1E and 2E, when it actually, in fact, captures only rays 1C and 2C.

3.2 Algebraic Projective Geometry

The VideoOrbits algorithm performs head tracking, visually, based on a natural environment, and it works without the need for object recognition. Instead, it is based on algebraic projective geometry and a direct featureless means of estimating the change in spatial coordinates of successive frames of EyeTap video arising from movement of the wearer's head, as illustrated in figure 7. This change in spatial coordinates is characterized by eight parameters of an "exact" projective (homographic) coordinate transformation that registers pairs of images of scene content.

Figure 6. Illusory transparency: Examples of a camera supplying a television with an image of subject matter blocked by the television. (a) A television camera on a tripod at left supplies an Apple "Studio" television display with an image of the lower portion of Niagara Falls blocked by the television display (resting on an easel to the right of the camera tripod). The camera and display were carefully arranged, along with a second camera to capture this picture of the apparatus. Only when viewed from the special location of the second camera does the illusion of transparency exist. (b) Various cameras with television outputs were set up on the walkway, but none of them can re-create the subject matter behind the television display in a manner that creates a perfect illusion of transparency, because the subject matter does not exist in one single depth plane. There exists no choice of camera orientation, zoom setting, and viewer location that creates an exact illusion of transparency for the portion of the Brooklyn Bridge blocked by the television screen. Notice how the railings don't quite line up correctly because they vary in depth with respect to the first support tower of the bridge.

These eight parameters are "exact" for two cases of static scenes: (i) images taken from the same location of an arbitrary 3-D scene, with a camera that is free to pan, tilt, rotate about its optical axis, and zoom (such as when the user stands still and moves his or her head); or (ii) images of a flat scene taken from arbitrary locations (such as when it is desired to track a planar patch, viewed by a user who is free to move about). Thus, the algorithm is well suited for tracking planar patches under arbitrary view motion, a situation that commonly arises, for instance, when a sign or billboard is to be tracked.

It is stressed here that the algorithm presented is used to track image motion arising from arbitrary relative motion of the user's head with respect to rigid planar patches. Initial placement of the computer-generated information (for instance, at the four corners of the rigid planar rectangular patch) is assumed to have been completed by another method. Once placed, however, no further model or knowledge of the scene is required to track the location of the computer-generated information.
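To make the eight-parameter transformation concrete, the sketch below (ours, not the authors' implementation) applies a projective coordinate transformation x' = (Ax + b)/(cᵀx + 1) to the four corners of a tracked patch; the parameter values are made up for illustration.

```python
import numpy as np

def project(A, b, c, points):
    """Apply the eight-parameter projective map x' = (A x + b) / (c^T x + 1)
    to an (N, 2) array of image points."""
    num = points @ A.T + b            # numerators A x + b, shape (N, 2)
    den = points @ c + 1.0            # denominators c^T x + 1, shape (N,)
    return num / den[:, None]

# Four corners of a tracked unit-square patch, and an illustrative
# (made-up) parameter set; c != 0 is what distinguishes this from affine.
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
A = np.array([[1.02, 0.01], [-0.02, 0.98]])
b = np.array([0.05, -0.03])
c = np.array([0.10, 0.05])
print(project(A, b, c, corners))
```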

The featureless projective approach generalizes interframe camera motion estimation methods that have previously used an affine model (which lacks the degrees of freedom to "exactly" characterize such phenomena as camera pan and tilt) and/or that have relied upon finding points of correspondence between the image frames. The featureless projective approach, which operates directly on the image pixels, is shown to be superior in accuracy and ability to enhance resolution. The proposed methods work well on image data collected from both good-quality and poor-quality video under a wide variety of conditions (sunny, cloudy, day, night). These fully automatic methods are also shown to be robust to deviations from the assumptions of static scene and no parallax. The primary application here is in filtering out or replacing subject matter appearing on flat surfaces within a scene (for example, rigid planar patches such as advertising billboards).

Figure 7. The VideoOrbits head-tracking algorithm: The new head-tracking algorithm requires no special devices installed in the environment. The camera in the personal imaging system simply tracks itself based on its view of objects in the environment. The algorithm is based on algebraic projective geometry and provides an estimate of the true projective coordinate transformation, which, for successive image pairs, is composed using the projective group (Mann & Picard, 1995). Successive pairs of images may be estimated in the neighborhood of the identity coordinate transformation of the group, whereas absolute head tracking is done using the exact group by relating the approximate parameters q to the exact parameters p in the innermost loop of the process. The algorithm typically runs at five to ten frames per second on a general-purpose computer, but the simple structure of the algorithm makes it easy to implement in hardware for the higher frame rates needed for full-motion video.

The most common assumption (especially in motion estimation for coding and optical flow in computer vision) is that the coordinate transformation between frames is translation. Tekalp, Ozkan, and Sezan (1992) have applied this assumption to high-resolution image reconstruction. Although translation is the least constraining and simplest to implement of the seven coordinate transformations in table 1, it is poor at handling large changes due to camera zoom, rotation, pan, and tilt. Zheng and Chellappa (1993) considered the image registration problem using a subset of the affine model: translation, rotation, and scale. Other researchers (Irani & Peleg, 1991; Teodosio & Bender, 1993) have assumed affine motion (six parameters) between frames. Behringer (1998) considered features of a silhouette. For the assumptions of static scene and no parallax, the affine model exactly describes rotation about the optical axis of the camera, zoom of the camera, and pure shear, which the camera does not do, except in the limit as the lens focal length approaches infinity. The affine model cannot capture camera pan and tilt and therefore cannot properly express the "keystoning" and "chirping" we see in the real world. (Chirping refers to the effect of increasing or decreasing spatial frequency with respect to spatial location, as illustrated in figure 8.)

This chirping phenomenon is implicit in the proposed system, whether or not there is periodicity in the subject matter. The only requirement is that there be some distinct texture upon a flat surface in the scene.
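The chirping effect is easy to reproduce numerically: sampling a pure sinusoid through a 1-D projective coordinate transformation yields the "periodicity in perspective" of figure 8. A small illustration of ours, with arbitrarily chosen a, b, c:

```python
import numpy as np

def projective_1d(x1, a, b, c):
    """1-D projective coordinate transformation x2 = (a*x1 + b)/(c*x1 + 1)."""
    return (a * x1 + b) / (c * x1 + 1.0)

x1 = np.linspace(0.0, 4.0, 2000)
chirp = np.sin(2 * np.pi * projective_1d(x1, a=2.0, b=-2.0, c=1.0))
# The local frequency is proportional to the derivative of the map,
# 4/(x1 + 1)**2 here, so equally spaced cycles in the world appear
# unequally spaced ("chirped") in the image.
```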

Table 1. Image Coordinate Transformations Discussed in This Paper

Model | Coordinate transformation from x to x' | Parameters
Translation | x' = x + b | b ∈ ℝ²
Affine | x' = Ax + b | A ∈ ℝ^{2×2}, b ∈ ℝ²
Bilinear | x' = q_{x'xy}xy + q_{x'x}x + q_{x'y}y + q_{x'}; y' = q_{y'xy}xy + q_{y'x}x + q_{y'y}y + q_{y'} | q_* ∈ ℝ
Projective | x' = (Ax + b)/(cᵀx + 1) | A ∈ ℝ^{2×2}; b, c ∈ ℝ²
Relative-projective | x' = (Ax + b)/(cᵀx + 1) + x | A ∈ ℝ^{2×2}; b, c ∈ ℝ²
Pseudo-perspective | x' = q_{x'x}x + q_{x'y}y + q_{x'} + q_α x² + q_β xy; y' = q_{y'x}x + q_{y'y}y + q_{y'} + q_α xy + q_β y² | q_* ∈ ℝ
Biquadratic | x' = q_{x'x²}x² + q_{x'xy}xy + q_{x'y²}y² + q_{x'x}x + q_{x'y}y + q_{x'}; y' = q_{y'x²}x² + q_{y'xy}xy + q_{y'y²}y² + q_{y'x}x + q_{y'y}y + q_{y'} | q_* ∈ ℝ
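As a small numerical companion to table 1 (ours, not from the paper), the sketch below evaluates four of the models on the same points; note that the affine model is exactly the c = 0 special case of the projective model.

```python
import numpy as np

def translation(x, b):                      # 2 parameters
    return x + b

def affine(x, A, b):                        # 6 parameters
    return x @ A.T + b

def bilinear(x, qx, qy):                    # 8 parameters (approximate model)
    basis = np.stack([x[:, 0] * x[:, 1], x[:, 0], x[:, 1],
                      np.ones(len(x))], axis=1)
    return np.stack([basis @ qx, basis @ qy], axis=1)

def projective(x, A, b, c):                 # 8 parameters ("exact" model)
    return (x @ A.T + b) / (x @ c + 1.0)[:, None]

pts = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
A, b = np.array([[1.1, 0.0], [0.1, 0.9]]), np.array([0.2, -0.1])
# The affine model is the c = 0 special case of the projective model:
assert np.allclose(affine(pts, A, b), projective(pts, A, b, np.zeros(2)))
```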

Figure 8. The projective chirping phenomenon. (a) A real-world object that exhibits periodicity generates a projection (image) with "chirping" ("periodicity in perspective"). (b) Center raster of image. (c) Best-fit projective chirp of the form sin(2π((ax + b)/(cx + 1))). (d) Graphical depiction of an exemplar 1-D projective coordinate transformation of sin(2πx₁) into a "projective chirp" function, sin(2πx₂) = sin(2π((2x₁ − 2)/(x₁ + 1))). The range coordinate as a function of the domain coordinate forms a rectangular hyperbola with asymptotes shifted to center at the vanishing point x₁ = −1/c = −1 and exploding point x₂ = a/c = 2, and with chirpiness c' = (bc − a)/c² = −4.

3.3 Video Orbits

Tsai and Huang (1981) pointed out that the elements of the projective group give the true camera motions with respect to a planar surface. They explored the group structure associated with images of a 3-D rigid planar patch, as well as the associated Lie algebra, although they assume that the correspondence problem has been solved. The solution presented in this paper (which does not require prior solution of correspondence) also relies on projective group theory.

3.3.1 Projective Flow: A New Technique for Tracking a Rigid Planar Patch. A method for tracking a rigid planar patch is now presented. Consider first one-dimensional systems, because they are easier to explain and understand. For a 1-D affine coordinate transformation, the graph of the range coordinate as a function of the domain coordinate is a straight line; for the projective coordinate transformation, the graph of the range coordinate as a function of the domain coordinate is a rectangular hyperbola (figure 8(d)).

Whether or not there is periodicity in the scene, the method still works, in the sense that it is based on the projective flow across the texture or pattern, at all various spatial frequency components of a rigid planar patch. The method is called projective flow (p-flow), which we will now describe in 1-D.

We begin with the well-known Horn and Schunck brightness change constraint equation (Horn & Schunck, 1981):

    u_f E_x + E_t \approx 0,    (1)

where E_x and E_t are the spatial and temporal derivatives, respectively, of the image E(x), and u_f is the optical flow velocity, assuming pure translation. Typically, we determine the u_m that minimizes the error of equation (1):

    \epsilon_{\text{flow}} = \sum_x (u_m E_x + E_t)^2.    (2)

Projective flow (p-flow) arises from substituting u_m = (ax + b)/(cx + 1) - x in place of u_f in equation (1). A judicious weighting by (cx + 1) simplifies the calculation, giving

    \epsilon_w = \sum \left( a x E_x + b E_x + c(x E_t - x^2 E_x) + E_t - x E_x \right)^2.    (3)

Differentiating and setting the derivative to zero (the subscript w denotes that weighting has taken place) results in a linear system of equations for the parameters, which can be written compactly as

    \left( \sum \phi_w \phi_w^T \right) [a, b, c]^T = \sum (x E_x - E_t)\, \phi_w,    (4)

where the regressor is \phi_w = [x E_x, E_x, x E_t - x^2 E_x]^T. The notation and derivations used in this paper are as described by Mann (1998, p. 2139). The reader is invited to refer to that work for a more in-depth treatment of the matter.

3.3.2 The Unweighted Projectivity Estimator. If we do not wish to apply the ad hoc weighting scheme, we may still estimate the parameters of projectivity in a simple manner, still based on solving a linear system of equations. To do this, we write the Taylor series of u_m:

    u_m + x = b + (a - bc)x + (bc - a)c x^2 + (a - bc)c^2 x^3 + \cdots    (5)

and use only the first three terms, obtaining enough degrees of freedom to account for the three parameters being estimated. Letting \epsilon = \sum ((b + (a - bc - 1)x + (bc - a)c x^2) E_x + E_t)^2, q_2 = (bc - a)c, q_1 = a - bc - 1, and q_0 = b, differentiating with respect to each of the three parameters of q, setting the derivatives equal to zero, and verifying with the second derivatives gives the linear system of equations for unweighted projective flow:

    \begin{bmatrix}
    \sum x^4 E_x^2 & \sum x^3 E_x^2 & \sum x^2 E_x^2 \\
    \sum x^3 E_x^2 & \sum x^2 E_x^2 & \sum x E_x^2 \\
    \sum x^2 E_x^2 & \sum x E_x^2 & \sum E_x^2
    \end{bmatrix}
    \begin{bmatrix} q_2 \\ q_1 \\ q_0 \end{bmatrix}
    = -\begin{bmatrix} \sum x^2 E_x E_t \\ \sum x E_x E_t \\ \sum E_x E_t \end{bmatrix}    (6)
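Equation (6) is only a 3 × 3 linear system, so the 1-D estimator fits in a few lines. The following is our sketch, not the authors' code; it assumes two registered 1-D signals and simple finite-difference derivatives. The exact parameters a, b, and c can then be recovered from the definitions q₂ = (bc − a)c, q₁ = a − bc − 1, q₀ = b.

```python
import numpy as np

def pflow_1d(E1: np.ndarray, E2: np.ndarray) -> np.ndarray:
    """Unweighted 1-D projective-flow estimate (equations (5)-(6)).

    Returns q = (q2, q1, q0) for the truncated-Taylor flow model
    u_m = q2*x**2 + q1*x + q0.
    """
    x = np.arange(E1.size, dtype=float)
    Ex = np.gradient((E1 + E2) / 2.0)       # spatial derivative estimate
    Et = E2 - E1                            # temporal derivative estimate
    phi = np.stack([x**2 * Ex, x * Ex, Ex]) # regressor, shape (3, N)
    M = phi @ phi.T                         # 3x3 normal matrix of eq. (6)
    v = -(phi @ Et)                         # right-hand side of eq. (6)
    return np.linalg.solve(M, v)
```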

computethe optical ow over some neighborhood, 3.5 Unweighted Projective Flows which mustbe at least two pixels, but is generally taken As with the 1-D images,we makesimilar assump- over asmall block —333, 535,or sometimes larger (for tions in expanding equation (8) in its own Taylor series, example,the entire patchof subject matter to be Žltered analogous to equation (5). By appropriately constrain- out,such as abillboard orsign). ing the twelve parameters ofthe biquadratic model,we Ourtask is not todeal with the 2-D translational obtain avariety ofeight-parameter approximate models. ow, but with the 2-D projective ow, estimating the Inestimating the “exact unweighted ” projective group eight parameters in the coordinate transformation: parameters,one ofthese approximate models is used in an intermediate step. 2 x9 A@x, y#T 1 b Ax 1 b x9 5 5 5 (8) The Taylor series for the bilinear case gives F y9G cT @x, y#T 1 1 cTx 1 1

um 1 x 5 qx 9xy xy 1 ~qx 9x 1 1!x 1 qx 9yy 1 qx 9 The desired eight scalar parameters are denoted by p 5 232 231 231 (12) [A, b; c, 1], A [ R , b [ R , and c [ R . vm 1 y 5 qy 9 xy xy 1 qy 9 x x 1 ~qy 9 y 1 1!y 1 qy 9 Wehave,in the 2-D case: Incorporating these into the ow criteria yields asimple set ofeight linear equations in eight unknowns: Ax 1 b T 2 e 5 ~uT E 1 E !2 5 2 x E 1 E , flow O m x t O S S cTx 1 1 D x tD ~f ~ !f T~ !! 5 2 f ~ ! O x, y x, y q O Et x, y (13) (9) S x,y D x,y

where f T 5 [E (xy, x, y, 1), E (xy, x, y, 1)]. where the sumcan be weighted asit was in the 1-D x y For the relative-projective model, f is given by case: T f 5 @Ex~x, y, 1!, Ey~x, y, 1!, Et~x, y!#, (14) 2 e 5 ~ 1 2 ~ T 1 ! !T 1 ~ T 1 ! w O S Ax b c x 1 x Ex c x 1 EtD . and,for the pseudo-perspective model, f is given by T (10) f 5 @Ex~x, y, 1!, Ey~x, y, 1!, (15) ~x 2E 1 xyE , xyE 1 y 2E !#. Differentiating with respect to the free parameters A, b, x y x y and c,and setting the result to zero gives alinear solu- 3.5.1 Four-Point Method forRelating Ap- tion: proximate Modelto Exact Model. Any ofthe pre- ceding approximations, after being related to the exact T T projective model,tend to behave well in the neighbor- S O f f D @a11, a12, b1, a 21, a 22, b2, c1, c2# (11) hood ofthe identity, A 5 I, b 5 0, c 5 0. In 1-D, the 5 T 2 f O ~x Ex Et! model Taylor series about the identity was explicitly ex- panded;here, although this is not done explicitly, it is where assumed that the terms ofthe Taylor series ofthe model correspond to those takenabout the identity. Inthe T f 5 @Ex~x, y, 1!, Ey~x, y, 1!, 1-D case,we solve the three linear equations in three

2 2 xEt 2 x Ex 2 xyEy, yEt 2 xyEx 2 y Ey# 2.Use ofan approximate modelthat doesn ’tcapture chirping or preservestraight lines can still lead tothe true projective parameters as For amore in-depth treatment ofprojective ow, the longas themodel captures at least eightmeaningful degreesof free- reader is invited torefer to Mann(1998). dom. Mannand Fung 169
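As a sketch (ours, with finite-difference derivatives) of how equation (13) is solved for the bilinear case: build the eight-component regressor φ at every pixel, accumulate the 8 × 8 normal equations, and solve.

```python
import numpy as np

def bilinear_flow(E1: np.ndarray, E2: np.ndarray) -> np.ndarray:
    """Solve equation (13) for the eight bilinear flow parameters q."""
    Ey, Ex = np.gradient((E1 + E2) / 2.0)   # axis 0 is y, axis 1 is x
    Et = E2 - E1
    h, w = E1.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    basis = np.stack([x * y, x, y, np.ones_like(x)])   # (4, h, w)
    phi = np.concatenate([Ex * basis, Ey * basis])     # regressor, (8, h, w)
    phi = phi.reshape(8, -1)                           # one column per pixel
    M = phi @ phi.T                                    # 8x8 normal matrix
    v = -(phi @ Et.ravel())                            # right-hand side
    return np.linalg.solve(M, v)
```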

3.5.1 Four-Point Method for Relating Approximate Model to Exact Model. Any of the preceding approximations, after being related to the exact projective model, tend to behave well in the neighborhood of the identity, A = I, b = 0, c = 0. In 1-D, the model Taylor series about the identity was explicitly expanded; here, although this is not done explicitly, it is assumed that the terms of the Taylor series of the model correspond to those taken about the identity. In the 1-D case, we solve the three linear equations in three unknowns to estimate the parameters of the approximate motion model, and then relate the terms in this Taylor series to the exact parameters, a, b, and c (which involves solving another set of three equations in three unknowns, the second set being nonlinear, although very easy to solve).

In the extension to 2-D, the estimate step is straightforward, but the relate step is more difficult, because we now have eight nonlinear equations in eight unknowns, relating the terms in the Taylor series of the approximate model to the desired exact model parameters. Instead of solving these equations directly, a simple procedure, called the "four-point method," is used for relating the parameters of the approximate model to those of the exact model:

1. Select four ordered pairs (for example, the four corners of the bounding box containing the region under analysis, or the four corners of the image if the whole image is under analysis). Here, suppose, for simplicity, that these points are the corners of the unit square: s = [s_1, s_2, s_3, s_4] = [(0, 0)^T, (0, 1)^T, (1, 0)^T, (1, 1)^T].
2. Apply the coordinate transformation using the Taylor series for the approximate model (such as equation (12)) to these points: r = u_m(s).
3. Finally, the correspondences between r and s are treated just like features. This results in four easy-to-solve pairs of equations:

    \begin{bmatrix} x'_k \\ y'_k \end{bmatrix} =
    \begin{bmatrix} x_k & y_k & 1 & 0 & 0 & 0 & -x_k x'_k & -y_k x'_k \\
                    0 & 0 & 0 & x_k & y_k & 1 & -x_k y'_k & -y_k y'_k \end{bmatrix}
    [a_{x'x}, a_{x'y}, b_{x'}, a_{y'x}, a_{y'y}, b_{y'}, c_x, c_y]^T,    (16)

where 1 ≤ k ≤ 4. This results in the exact eight parameters, p.

We remind the reader that the four corners are not feature correspondences as used in the feature-based methods but, rather, are used so that the two featureless models (approximate and exact) can be related to one another.

It is important to realize the full benefit of finding the exact parameters. Although the approximate model is sufficient for small deviations from the identity, it is not adequate to describe large changes in perspective. However, if we use it to track small changes incrementally, and each time relate these small changes to the exact model (8), then we can accumulate these small changes using the law of composition afforded by the group structure. This is an especially favorable contribution of the group framework. For example, with a video sequence, we can accommodate very large accumulated changes in perspective in this manner. The problems with cumulative error can be eliminated, for the most part, by constantly propagating forward the true values, computing the residual using the approximate model, and each time relating this to the exact model to obtain a goodness-of-fit estimate.
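Step 3 above stacks equation (16) for the four correspondences into an 8 × 8 linear system. A sketch of ours, using the unit-square corners of step 1 and an illustrative, made-up r:

```python
import numpy as np

def homography_from_4pts(s: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Solve equation (16): exact projective parameters from the four
    correspondences s -> r.  Returns
    p = [a_x'x, a_x'y, b_x', a_y'x, a_y'y, b_y', c_x, c_y]."""
    M, v = [], []
    for (x, y), (xp, yp) in zip(s, r):
        M.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); v.append(xp)
        M.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); v.append(yp)
    return np.linalg.solve(np.array(M, float), np.array(v, float))

s = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # step 1: unit square
r = s + 0.05 * np.array([[1, 1], [1, -1], [-1, 1], [1, 1]])  # step 2 (made up)
p = homography_from_4pts(s, r)  # step 3: the exact eight parameters
```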

3.5.2 Overview of the Algorithm for Unweighted Projective Flow. Frames from an image sequence are compared pairwise to test whether or not they lie in the same orbit:

1. A Gaussian pyramid of three or four levels is constructed for each frame in the sequence.
2. The parameters p are estimated at the top of the pyramid, between the two lowest-resolution images of a frame pair, g and h, using the repetitive method depicted in figure 7.
3. The estimated p is applied to the next-higher-resolution (finer) image in the pyramid, p ∘ g, to make the two images at that level of the pyramid nearly congruent before estimating the p between them.
4. The process continues down the pyramid until the highest-resolution image in the pyramid is reached.

4 Reality Mediation in Variable-Gain Image Sequences

Until now, we have assumed fixed-gain image sequences. In practice, however, camera gain varies to compensate for varying quantity of light, by way of automatic gain control (AGC), automatic level control, or some similar form of automatic exposure. In fact, almost all modern cameras incorporate some form of automatic exposure control.

Figure 9. Automatic exposure is the cause of differently exposed pictures of the same (overlapping) subject matter, creating the need for comparametric imaging in intelligent vision systems (Mann, 2001a). (a) Looking from inside Hart House Soldier's Tower, out through an open doorway, when the sky is dominant in the picture, the exposure is automatically reduced, and the wearer of the apparatus can see the texture (such as clouds) in the sky. He can also see University College and the CN Tower to the left. (b) As he looks up and to the right to take in subject matter not so well illuminated, the exposure automatically increases somewhat. The wearer can no longer see detail in the sky, but new architectural details inside the doorway start to become visible. (c) As he looks further up and to the right, the dimly lit interior dominates the scene, and the exposure is automatically increased dramatically. He can no longer see any detail in the sky, and even the University College building, outside, is washed out (overexposed). However, the inscriptions on the wall (names of soldiers killed in the war) now become visible.

Moreover, next-generation imaging systems such as the EyeTap eyeglasses also feature an automatic exposure control system to make possible a hands-free, gaze-activated wearable system that is operable without conscious thought or effort. Indeed, the human eye itself incorporates many features akin to the automatic exposure or AGC of modern cameras.

Figure 9 illustrates how the Reality Mediator (or nearly any camera, for that matter) takes in a typical scene. As the wearer looks straight ahead, he sees mostly sky, and the exposure is quite small. Looking to the right at darker subject matter, the exposure is automatically increased. Because the differently exposed pictures depict overlapping subject matter, we have (once the images are registered, in regions of overlap) differently exposed pictures of identical subject matter. In this example, we have three very differently exposed pictures depicting parts of the University College building and surroundings.

4.1 Variable-Gain Problem Formulation

Differently exposed images (such as individual frames of video) of the same subject matter are denoted as vectors: f_0, f_1, ..., f_i, ..., f_{I-1}, ∀i, 0 ≤ i < I.

Each video frame is some unknown function, f(·), of the actual quantity of light, q(x), falling on the image sensor:

    f_i = f\left( k_i\, q\left( \frac{A_i x + b_i}{c_i x + d_i} \right) \right),    (17)

where x = (x, y) denotes the spatial coordinates of the image, k_i is a single unknown scalar exposure constant, and parameters A_i, b_i, c_i, and d_i denote the projective coordinate transformation between successive pairs of images.

For simplicity, this coordinate transformation is assumed to be able to be independently recovered (for example, using the methods of the previous section). Therefore, without loss of generality, images considered in this section will be taken as having the identity coordinate transformation, which corresponds to the special case of images differing only in exposure.

Without loss of generality, k_0 will be called the reference exposure and will be set to unity, and frame zero will be called the reference frame, so that f_0 = f(q). Thus, we have

    \frac{1}{k_i} f^{-1}(f_i) = f^{-1}(f_0), \quad \forall i,\ 0 < i < I.    (18)

The existence of an inverse for f follows from a semimonotonicity assumption. Semimonotonicity follows from the fact that we expect pixel values to either increase or stay the same with increasing quantity of illumination, q.

Photographic film is traditionally characterized by the so-called "density versus log exposure" characteristic curve (Wyckoff, 1961, 1962). Similarly, in the case of electronic imaging, we may also use logarithmic exposure units, Q = log(q), so that one image will be K = log(k) units darker than the other:

    \log(f^{-1}(f_1(x))) = Q = \log(f^{-1}(f_2(x))) - K.    (19)

Because the logarithm function is also monotonic, the problem comes down to estimating the semimonotonic function F(·) = log(f^{-1}(·)) and the scalar constant K. There are a variety of techniques for solving for F and K directly (Mann, 2000). In this paper, we choose to use a method involving comparametric equations.

4.1.1 Using Comparametric Equations. Variable-gain image sequences, f_i, are created by the response, f, of the imaging device to light, q. Each of these images provides us with an estimate of f differing only by exposure, k. Pairs of images can be compared by plotting (f(q), f(kq)), and the resulting relationship can be expressed as the monotonic function g(f(q)) = f(kq), not involving q. Equations of this form are called comparametric equations (Mann, 2000). Comparametric equations are a special case of a more general class of equations called functional equations (Aczél, 1966).

A comparametric equation that is particularly useful for mediated-reality applications will now be introduced, first by its solution (from which the comparametric equation itself will be derived). (It is generally easier to construct comparametric equations from their solutions than it is to solve comparametric equations.) The solution is

    f(q) = \left( \frac{e^b q^a}{e^b q^a + 1} \right)^c,    (20)

which has only three parameters (of which only two are meaningful parameters, because b is indeterminable and may be fixed to b = 0 without loss of generality). Equation (20) is useful because it describes the shape of the curve that characterizes the response of many cameras to light, f(q), called the response curve. The constants a and c are specific to the camera.

This model accurately captures the essence of the so-called toe and shoulder regions of the response curve. In traditional photography, these regions are ignored; all that is of interest is the linear mid-portion of the density versus log exposure curve. This interest in only the midtones arises because, in traditional photography, areas outside this region are considered to be incorrectly exposed. However, in practice, in input images to the reality mediator, many of the objects we look at will be massively underexposed and overexposed, because not everything in life is necessarily a well-composed picture. Therefore, these qualities of the model (20) are of great value in capturing the essence of these extreme exposures, in which exposure into both the toe and shoulder regions is often the norm rather than an aberration.
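A small sketch of this machinery (ours, assuming the model of equation (20) with b = 0 and natural logarithms): invert the response to recover quantities of light, and estimate K per equation (19) with a robust statistic.

```python
import numpy as np

def f_response(q, a, c):
    """Equation (20) with b = 0: f(q) = (q**a / (q**a + 1))**c."""
    qa = np.power(q, a)
    return np.power(qa / (qa + 1.0), c)

def f_inverse(f, a, c, eps=1e-6):
    """Invert equation (20) to recover the quantity of light q from f."""
    r = np.clip(np.power(f, 1.0 / c), eps, 1.0 - eps)  # r = q**a / (q**a + 1)
    return np.power(r / (1.0 - r), 1.0 / a)

def estimate_K(f1, f2, a, c):
    """Equation (19): the log-quantities of two registered frames differ by
    the constant K = log(k); the median makes the estimate robust to noise
    and to clipped (badly over- or underexposed) pixels."""
    return float(np.median(np.log(f_inverse(f2, a, c))
                           - np.log(f_inverse(f1, a, c))))
```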

Figure 10. (a) One of nineteen differently exposed pictures of a test pattern. (b) Each of the nineteen exposures produced eleven ordered pairs in a plot of f(Q) as a function of Q. (c) Shifting these nineteen plots left or right by the appropriate K_i allowed them all to align to produce the ground-truth known-response function f(Q).

Furthermore, equation (20) has the advantage of being bounded in normalized units between 0 and 1.

The comparametric equation of which the proposed photographic response function (20) is a solution is given by

    g(f) = \frac{\sqrt[c]{f}}{\left( \sqrt[c]{f} + e^{-aK} \right)^c},    (21)

where K = log_2(k_2/k_1) is the logarithm of the ratio of the two exposures.

To validate this model, we:

• estimate the parameters a, c, and k of g(f(q)) that best fit a plot (f(q), f(kq)) derived from differently exposed pictures (as, for example, shown in figure 9), and
• verify our estimate of f by using lab instruments.

Although there exist methods for automatically determining a and c and the relative gain k from pairs of differently exposed images by using comparametric equations, these methods are beyond the scope of this paper, and the reader is invited to refer to Mann (2000) for a full discussion of these methods and of comparametric equations.

A CamAlign-CGH test chart from DSC Laboratories, Toronto, Canada (Serial No. S009494), as shown in figure 10(a), was used to verify the response function recovered using the method. The individual bars were segmented automatically by differentiation to find the transition regions, and then robust statistics were used to determine an estimate of f(q) for each of the eleven steps, as well as the black regions of the test pattern. Using the known reflectivity of each of these twelve regions, a set of twelve ordered pairs (q, f(q)) was determined for each of the nineteen exposures, as shown in figure 10(b). Shifting these results appropriately (by the K_i values) to line them up gives the ground-truth, known response function, f, shown in figure 10(c).

Thus, equation (21) gives us a recipe for lightening or darkening an image in a way that looks natural and is also based on this proven theoretical framework. For instance, given a pair of images taken with a camera with a known response function (which is to say that a and c are known for the camera), the relative gain between the images is estimated, and either of the pair is lightened or darkened to bring it into the same exposure as the other image. Similarly, any computer-generated information in a mediated or augmented scene is brought into the appropriate exposure of the scene.
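Continuing the earlier sketch (f_response and f_inverse as defined there; ours, not the authors' code), the lighten/darken recipe is most simply computed through the solution (20) rather than through g directly: recover q, scale it by k, and reapply f. Pixel for pixel, this is equivalent to applying the comparametric function g.

```python
import numpy as np

def comparametric_adjust(image, k, a, c):
    """Lighten (k > 1) or darken (k < 1) an image naturally, per the recipe
    of equation (21): g(f) = f(k * f^{-1}(f)), computed here through the
    solution (20) rather than through g directly."""
    return f_response(k * f_inverse(image, a, c), a, c)

# Example: bring frame2 into the exposure of frame1, with a and c known
# for the camera.  estimate_K (above) gives K = log(k), so k = exp(-K)
# undoes the gain difference:
#   frame2_matched = comparametric_adjust(frame2, np.exp(-K), a, c)
```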

Figure 11. Mediated reality as a photographic/videographic memory prosthesis: (a) Wearable face recognizer with virtual "name tag" (and grocery list) appears to stay attached to the cashier (b), even when the cashier is no longer within the field of view of the tapped eye and transmitter (c).

4.2 Mediated Reality as a Form of Communication

The mathematical framework for mediated reality arose through the process of marking a reference frame (Mann & Picard, 1995) with text or simple graphics; it was noted that, by calculating and matching homographies of the plane, an illusory rigid planar patch appeared to hover upon objects in the real world, giving rise to a form of computer-mediated collaboration (Mann, 1997b). Figure 11 shows images processed in real time by VideoOrbits.

4.3 Diminished Reality

Diminished reality deliberately removes parts of a real-world scene or replaces them with computer-generated information (Mann & Fung, 2001). For instance, deliberately diminished reality has application in construction. Klinker, Stricker, and Reiners (2001) discuss a number of techniques for interpolating the pixels behind a diminished object: "Many construction projects require that existing structures be removed before new ones are built. Thus, just as important as augmenting reality is technology to diminish it" (p. 416).

Real-world "spam" (unwanted and unsolicited advertising) typically occurs on planar surfaces, such as billboards. The VideoOrbits algorithm presented here is well suited toward diminishing these unwanted and intrusive real-world planar objects.

Because the camera response function and exposure values can be computed automatically in a self-calibrating system, the computer-mediated reality can take form by combining gain estimation (using the camera response function) and estimation of the coordinate transformation between frames, within the VideoOrbits methodology for computer-mediated reality.

Figure 12(a, b) shows a nice view of the Empire State Building spoiled by an offensive jeans advertisement (a billboard depicting a man pulling off a woman's clothes). The computer-mediated reality environment allows the billboard to be automatically replaced with a picture of vintage (original 1985) mediated-reality sunglasses. (See figure 12(c, d).) By removing the billboard, a deliberately diminished version of the scene is created. The information of the advertisement is now removed, and computer-generated information is inserted in its place, helping to avoid information overload.
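A high-level sketch of this pipeline (ours; `warp` is a placeholder for any standard homography warper, and comparametric_adjust is the sketch from section 4): register the replacement content spatially with the tracked homography and tonally with the estimated gain, then composite.

```python
import numpy as np

def mediate_frame(frame, insert, H, mask, gain_k, a, c, warp):
    """Replace a tracked planar patch with comparametrically matched content.

    frame   -- current EyeTap frame as a float image in [0, 1]
    insert  -- replacement content (for example, the sunglasses picture)
    H       -- 3x3 homography mapping insert coordinates into frame
               coordinates, maintained per frame by the tracker (section 3)
    mask    -- 1 inside the warped patch, 0 elsewhere
    gain_k  -- relative exposure of this frame (section 4), so the inserted
               matter tracks the scene exposure, as in figure 12
    a, c    -- camera response parameters of equation (20)
    warp    -- callable that warps an image by H into the frame's geometry
    """
    matched = comparametric_adjust(insert, gain_k, a, c)  # tonal registration
    overlay = warp(matched, H, frame.shape)               # spatial registration
    return np.where(mask > 0, overlay, frame)
```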

Figure 12. (a, b) Two frames from a video sequence in New York City, showing how a nice view of the Empire State Building is spoiled by an offensive jeans advertisement (a billboard depicting a man pulling off a woman's clothes). Notice the effect of AGC being similar to that depicted in figure 9. (a) Because a large proportion of sky is included in the image, the overall exposure is quite low, so the image is darker. (b) Because the darker billboard begins to enter the center portion of view, the gain increases and the entire image is lighter. (c, d) Two frames from a video sequence in New York City, showing how the same visual reality can be diminished. Our ability to see the offensive advertisement is reduced. The diminished reality is then augmented with a view of the vintage 1985 smart sunglasses. Now, the resulting image sequence is an example of mediated reality. Notice how the exposure of the new matter, introduced into the visual field of view, tracks the exposure of the offensive advertising material originally present. The result is a visually pleasing, mediated-reality experience extending over a very wide dynamic range. (c) Because a large proportion of sky is included in the image, the overall exposure is quite low, and so the image is darker. The additional material inserted into the image is thus automatically made darker, comparametrically, to match. (d) Because the original image was lighter, the new matter introduced into the visual reality stream is also made lighter, comparametrically, to match.

5 Conclusion

Because wearable computers and EyeTap devices encapsulate users, these technologies mediate a user's experience with the world. Having designed, built, worn, and tested dozens of different embodiments of these devices for use in ordinary day-to-day life provided us with much in the way of valuable insight into the concepts of mediated reality. The resulting Reality Mediators alter the user's visual perception of their environment. The user's head motion is tracked by the VideoOrbits algorithm, and the camera gain is tracked using comparametric equations. This allows computer-generated information to be registered both spatially and tonally with the real world. An extension of the concept of mediated reality is the replacement of unwanted information, such as advertising, with computer-generated information, giving rise to the notion of a deliberately diminished reality.

Acknowledgments

This work was funded in part by Xilinx and Altera.

References

Aczél, J. (1966). Lectures on functional equations and their applications (Vol. 19). New York and London: Academic Press.

Azuma, R. T. (2001). Augmented reality: Approaches and technical challenges. In W. Barfield & T. Caudell (Eds.), Fundamentals of wearable computers and augmented reality (pp. 27-63). New Jersey: Lawrence Erlbaum Press.

Behringer, R. (1998). Improving the precision of registration for augmented reality in an outdoor scenario by visual horizon silhouette matching. Proceedings of First IEEE Workshop on Augmented Reality (IWAR '98), 225-230.

Caudell, T., & Mizell, D. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proc. Hawaii International Conf. on Systems Science, 2, 659-669.

Drascic, D., & Milgram, P. (1996). Perceptual issues in augmented reality. SPIE Volume 2653: Stereoscopic Displays and Virtual Reality Systems III, 123-134.

Earnshaw, R. A., Gigante, M. A., & Jones, H. (1993). Virtual reality systems. London: Academic Press.

Ellis, S. R., Bucher, U. J., & Menges, B. M. (1995). The relationship of binocular convergence and errors in judged distance to virtual objects. Proceedings of the International Federation of Automatic Control, 297-301.

Feiner, S., MacIntyre, B., & Seligmann, D. (1993a). KARMA (knowledge-based augmented reality for maintenance assistance). Available online at: http://www.cs.columbia.edu/graphics/projects/karma/karma.html.

———. (1993b). Knowledge-based augmented reality. Communications of the ACM, 36(7), 52-62.

Fuchs, H., Bajura, M., & Ohbuchi, R. Teaming ultrasound data with virtual reality in obstetrics. Available online at: http://www.ncsa.uiuc.edu/Pubs/MetaCenter/SciHi93/1c.Highlights-BiologyC.html.

Horn, B., & Schunck, B. (1981). Determining optical flow. Artificial Intelligence, 17, 185-203.

Irani, M., & Peleg, S. (1991). Improving resolution by image registration. CVGIP, 53, 231-239.

Klinker, G., Stricker, D., & Reiners, D. (2001). Augmented reality for exterior construction applications. In W. Barfield & T. Caudell (Eds.), Fundamentals of wearable computers and augmented reality (pp. 397-427). New Jersey: Lawrence Erlbaum Press.

Mann, S. (1997a). Humanistic intelligence. Proceedings of Ars Electronica, 217-231. (Available online at: http://wearcam.org/ars/ and http://www.aec.at/fleshfactor.)

———. (1997b). Wearable computing: A first step toward personal imaging. IEEE Computer, 30(2), 25-32.

———. (1997c). An historical account of the 'WearComp' and 'WearCam' projects developed for 'personal imaging.' International Symposium on Wearable Computing, 66-73.

———. (1998). Humanistic intelligence/humanistic computing: 'Wearcomp' as a new framework for intelligent signal processing. Proceedings of the IEEE, 86(11), 2123-2151.

———. (2000). Comparametric equations with practical applications in quantigraphic image processing. IEEE Trans. Image Proc., 9(8), 1389-1406.

———. (2001a). Intelligent image processing. New York: John Wiley and Sons.

———. (2001b). Wearable computing: Toward humanistic intelligence. IEEE Intelligent Systems, 16(3), 10-15.

Mann, S., & Fung, J. (2001). VideoOrbits on EyeTap devices for deliberately diminished reality or altering the visual perception of rigid planar patches of a real-world scene. International Symposium on Mixed Reality (ISMR 2001), 48-55.

Mann, S., & Picard, R. W. (1995). Video orbits of the projective group; a simple approach to featureless estimation of parameters (Tech. Rep. No. 338). Cambridge, MA: Massachusetts Institute of Technology. (Also appears in IEEE Trans. Image Proc., (1997), 6(9), 1281-1295.)

Sutherland, I. (1968). A head-mounted three dimensional display. Proc. Fall Joint Computer Conference, 757-764.

Tekalp, A., Ozkan, M., & Sezan, M. (1992). High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration. Proc. of the Int. Conf. on Acoust., Speech and Sig. Proc., III-169.

Teodosio, L., & Bender, W. (1993). Salient video stills: Content and context preserved. Proc. ACM Multimedia Conf., 39-46.

Tsai, R. Y., & Huang, T. S. (1981). Estimating three-dimensional motion parameters of a rigid planar patch. IEEE Trans. Acoust., Speech, and Sig. Proc., ASSP-29, 1147-1152.

Wyckoff, C. W. (1961). An experimental extended response film (Tech. Rep. No. B-321). Boston, MA: Edgerton, Germeshausen & Grier, Inc.

———. (1962, June-July). An experimental extended response film. S.P.I.E. Newsletter, 16-20.

You, S., Neumann, U., & Azuma, R. (1999). Hybrid inertial and vision tracking for augmented reality registration. Proceedings of IEEE VR, 260-267.

Zheng, Q., & Chellappa, R. (1993). A computational vision approach to image registration. IEEE Transactions on Image Processing, 2(3), 311-325.