EyeTap Devices for Augmented, Deliberately Diminished, or Otherwise Altered Visual Perception of Rigid Planar Patches of Real-World Scenes

Steve Mann
[email protected]
James Fung
[email protected]
University of Toronto
10 King's College Road
Toronto, Canada
Abstract
Diminished reality is as important as augmented reality, and both are possible with a device called the Reality Mediator. Over the past two decades, we have designed, built, worn, and tested many different embodiments of this device in the context of wearable computing. Incorporated into the Reality Mediator is an "EyeTap" system, which is a device that quantifies and resynthesizes light that would otherwise pass through one or both lenses of the eye(s) of a wearer. The functional principles of EyeTap devices are discussed in detail. The EyeTap diverts into a spatial measurement system at least a portion of light that would otherwise pass through the center of projection of at least one lens of an eye of a wearer. The Reality Mediator has at least one mode of operation in which it reconstructs these rays of light, under the control of a wearable computer system. The computer system then uses new results in algebraic projective geometry and comparametric equations to perform head tracking, as well as to track motion of rigid planar patches present in the scene. We describe how our tracking algorithm allows an EyeTap to alter the light from a particular portion of the scene to give rise to a computer-controlled, selectively mediated reality. An important difference between mediated reality and augmented reality is the ability to not just augment but also deliberately diminish or otherwise alter the visual perception of reality. For example, diminished reality allows additional information to be inserted without causing the user to experience information overload. Our tracking algorithm also takes into account the effects of automatic gain control, by performing motion estimation in both spatial as well as tonal motion coordinates.
1 Introduction
Ivan Sutherland, a pioneer in the field of computer graphics, described a head-mounted display with half-silvered mirrors so that the wearer could see a virtual world superimposed on reality (Earnshaw, Gigante, & Jones, 1993; Sutherland, 1968), giving rise to augmented reality (AR). Others have adopted Sutherland's concept of a head-mounted display (HMD) but generally without the see-through capability. An artificial environment in which the user cannot see through the display is generally referred to as a virtual reality (VR) environment. One of the reasons that Sutherland's approach was not more ubiquitously adopted is that he did not merge the virtual object (a simple cube) with the real world in a meaningful way.

Presence, Vol. 11, No. 2, April 2002, 158-175. © 2002 by the Massachusetts Institute of Technology.

Feiner's group was responsible for demonstrating the viability of AR as a field of research, using sonar (Logitech 3-D trackers) to track the real world so that the real and virtual worlds could be registered (Feiner, MacIntyre, & Seligmann, 1993a, 1993b). Other research groups (Fuchs, Bajura, & Ohbuchi; Caudell & Mizell, 1992) also contributed to this development. Some research in AR also arises from work in telepresence (Drascic & Milgram, 1996). However, the concept of the Reality Mediator, which arises from the field of humanistic intelligence (HI) (Mann, 1997a, 2001a, 2001b), differs from augmented reality, which has its origins in the field of virtual reality. HI is defined as intelligence that arises from the human being in the feedback loop of a computational process in which the human and computer are inextricably intertwined.

Wearable computing has emerged as the perfect tool for embodying HI. When a wearable computer functions as a successful embodiment of HI, the computer uses the human's mind and body as one of its peripherals, just as the human uses the computer as a peripheral. This reciprocal relationship, in which each uses the other in its feedback loop, is at the heart of HI. Within an HI framework, the wearable computer is worn constantly to assist the user in a variety of day-to-day situations.

An important observation arising from this constant use is that, unlike handheld devices, laptop computers, and PDAs, the wearable computer can encapsulate us (Mann, 1998). It can function as an information filter and allow us to block out material we might not wish to experience (such as offensive advertising) or simply replace existing media with different media. (There exist well-known email and Web browser filters that replace or remove unwanted advertising, mediating one's use of the media. Diminished reality extends this mediation to the visual domain.) Thus, the wearable computer acts to mediate one's experience with the world. The mediating role of EyeTap and wearable computers can be better understood by examining the signal flow paths between the human, computer, and external world as illustrated in figure 1.

Figure 1. (a) The wearable computer can be used like clothing to encapsulate the user and function as a protective shell, whether to protect us from cold or physical attack (as traditionally facilitated by armor), or to provide privacy (by concealing personal information and personal attributes from others). In terms of signal flow, this encapsulation facilitates the possible mediation of incoming information to permit solitude and the possible mediation of outgoing information to permit privacy. It is not so much the absolute blocking of these information channels that is important; it is the fact that the wearer can control to what extent, and when, these channels are blocked, modified, attenuated, or amplified, in various degrees, that makes wearable computing much more empowering to the user than other similar forms of portable computing. (b) An equivalent depiction of encapsulation (mediation), redrawn so that the encapsulation is understood to comprise a separate protective shell.

2 EyeTap Devices

Just as half-silvered mirrors are used to create augmented reality, EyeTap devices are used to mediate one's perception of reality. EyeTap devices have three main components:

- a measurement system, typically consisting of a camera system or sensor array with appropriate optics;
- a diverter system, for diverting eyeward bound light into the measurement system and therefore causing the eye of the user of the device to behave, in effect, as if it were a camera; and
- an aremac, for reconstructing at least some of the diverted rays of eyeward bound light. (Thus, the aremac does the opposite of what the camera does and is, in many ways, a camera in reverse. The etymology of the word aremac itself arises from spelling the word camera backwards (Mann, 1997c).)
A number of such EyeTap devices, together with wearable computers, were designed, built, and worn by the authors for many years in a wide variety of settings and situations, both inside the lab as well as in ordinary day-to-day life (such as while shopping, riding a bicycle, going through airport customs, attending weddings, and so on). This broad base of practical real-life experience helped us better understand the fundamental issues of mediated reality.

Although the apparatus is for altering our vision, in most of the practical embodiments that we built we provided at least one mode of operation that can preserve our vision unaltered. This one mode, which we call the "identity mode," serves as a baseline that forms a point of departure for when certain changes are desired. To achieve the identity mode requirement, the EyeTap must satisfy three criteria:

- focus: The subject matter viewed through the EyeTap must be displayed at the appropriate depth of focus.
- orthospatiality: The rays of light created by the aremac must be collinear with the rays of light entering the EyeTap, such that the scene viewed through the EyeTap appears the same as if viewed in the absence of the EyeTap.
- orthotonality: In addition to preserving the spatial relationship of light entering the eye, we desire that the EyeTap device also preserve the tonal relationships of light entering the eye.

2.1 Focus and Orthospatiality in EyeTap Systems

The aremac has two embodiments: one in which a focuser (such as an electronically focusable lens) tracks the focus of the camera to reconstruct rays of diverted light in the same depth plane as imaged by the camera, and another in which the aremac has extended or infinite depth of focus so that the eye itself can focus on different objects in a scene viewed through the apparatus. Although we have designed, built, and tested many of each of these two kinds of systems, this paper describes only the systems that use focus tracking.

In the focus-tracking embodiments, the aremac has focus linked to the measurement system (for example, "camera") focus, so that objects seen depicted on the aremac of the device appear to be at the same distance from the user of the device as the real objects so depicted. In manual focus systems, the user of the device is given a focus control that simultaneously adjusts both the aremac focus and the "camera" focus. In automatic focus embodiments, the camera focus also controls the aremac focus. Such a linked focus gives rise to a more natural viewfinder experience. It also reduces eyestrain, which is important because these devices are intended to be worn continually. The operation of the depth-tracking aremac is shown in figure 2.

Because the eye's own lens (L3) experiences what it would have experienced in the absence of the apparatus, the apparatus, in effect, taps in to and out of the eye, causing the eye to become both the camera and the viewfinder (display). Therefore, the device is called an EyeTap device.

Often, lens L1 is a varifocal lens or otherwise has a variable field of view (such as a "zoom" functionality). In this case, it is desired that the aremac also have a variable field of view. In particular, field-of-view control mechanisms (whether mechanical, electronic, or hybrid) are linked in such a way that the aremac image magnification is reduced as the camera magnification is increased. Through this appropriate linkage, any increase in magnification by the camera is negated exactly by decreasing the apparent size of the viewfinder image. The operation of the aremac focus and zoom tracking is shown in figure 3.

Stereo effects are well known in virtual reality systems (Ellis, Bucher, & Menges, 1995), wherein two information channels are often found to create a better sense of realism. Likewise, in stereo embodiments of the devices that we built, there were two cameras or measurement systems and two aremacs that each regenerated the respective outputs of the cameras or measurement systems. The apparatus is usually concealed in dark sunglasses that wholly or partially obstruct vision except for what the apparatus allows to pass through.
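The linked focus and zoom control described above can be sketched in software. The following is a minimal illustrative sketch, not the authors' servo firmware: the class name, method names, and the reciprocal-magnification law are our own assumptions. The aremac focus is slaved to the camera focus distance, and the aremac (viewfinder) magnification is set to the reciprocal of the camera magnification, so that the net angular magnification seen by the eye stays at unity, as the identity mode requires.

```python
class LinkedAremacControl:
    """Illustrative focus/zoom linkage for an EyeTap-style system.

    Hypothetical sketch: the real systems drove servo-based focus
    controllers; here we only model the control law.
    """

    def __init__(self):
        self.camera_focus_m = float("inf")   # focus distance of camera lens L1
        self.camera_magnification = 1.0      # zoom factor of camera lens L1

    def on_camera_autofocus(self, subject_distance_m):
        # Focus tracking: the aremac (lens L2) focuses to the same depth
        # plane as the camera, so the eye's own lens L3 accommodates as
        # it would in the absence of the apparatus.
        self.camera_focus_m = subject_distance_m
        return {"aremac_focus_m": subject_distance_m}

    def on_camera_zoom(self, magnification):
        # Zoom linkage: any increase in camera magnification is negated
        # by shrinking the apparent size of the viewfinder image.
        self.camera_magnification = magnification
        aremac_mag = 1.0 / magnification
        # Net magnification experienced by the eye stays at unity.
        assert abs(magnification * aremac_mag - 1.0) < 1e-12
        return {"aremac_magnification": aremac_mag}
```

In a manual-focus embodiment, the same `on_camera_autofocus` entry point would simply be driven by the user's focus control instead of the autofocus signal.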
Figure 2. Focus-tracking aremac. (a) With a nearby subject, a point P0 that would otherwise be imaged at P3 in the eye of a user of the device is instead imaged to point P1 on the image sensor, because the diverter diverts eyeward bound light to lens L1. When subject matter is nearby, the L1 focuser moves objective lens L1 out away from the sensor automatically, as an automatic focus camera would. A signal from the L1 focuser directs the L2 focuser, by way of the focus controller, to move lens L2 outward away from the light synthesizer. In many of the embodiments of the system that we built, the functionality of the focus controller was implemented within a wearable computer that also processed the images. We designed and built a focus controller card as a printed circuit board for use with the Industry Standards Association (ISA) bus standard that was popular at the time of our original design. We also designed and built a PC104 version of the focus controller board. Our focus controller printed circuit layout (available in PCB format) is released under the GNU GPL and is downloadable from http://wearcam.org/eyetap_focus_controller, along with a FIFO implementation of a serial select servo controller. (Later, we built some other embodiments that use a serial port of the wearable computer to drive a separate focus controller module.) The focus controller drives up to four servos to adjust the position of lenses L2 of a stereo rig, as well as the two lenses L1. In other embodiments, we used automatic-focus cameras and derived the signal controlling the servo position for lens L2 by extracting the similar servo positioning signal from the focus adjustment of the autofocus camera. At the same time, an image from the sensor is directed through an image processor (PROC) into the light synthesizer (SYNTH). Point P2 of the display element is responsive to point P1 of the sensor. Likewise, other points on the light synthesizer are each responsive to corresponding points on the sensor, so that the synthesizer produces a complete image for viewing through lens L2 by the eye, after reflection off the backside of the diverter. The position of L2 is such that the eye's own lens L3 will focus to the same distance as it would have focused in the absence of the entire device. (b) With distant subject matter, rays of parallel light are diverted toward the sensor, where lens L1 automatically retracts to focus these rays at point P1. When lens L1 retracts, so does lens L2, and the light synthesizer ends up generating parallel rays of light that bounce off the backside of the diverter. These parallel rays of light enter the eye and cause its own lens L3 to relax to infinity, as it would have in the absence of the entire device.

Figure 3. Focus of the right camera and both aremacs (as well as vergence) controlled by the autofocus camera on the left side. In a two-eyed system, it is preferable that both cameras and both aremacs focus to the same distance. Therefore, one of the cameras is a focus master and the other camera is a focus slave. Alternatively, a focus combiner is used to average the focus distance of both cameras and then make the two cameras focus at equal distance. The two aremacs, as well as the vergence of both systems, also track this same depth plane as defined by camera autofocus.

2.2 Importance of the Orthospatial Criterion

Registration, which is important to augmented-reality systems (You, Neumann, & Azuma, 1999; Azuma, 2001; Behringer, 1998), is also important in mediated reality. Of the three registration criteria (focus, orthospatiality, orthotonality), an important one is the orthospatial criterion, for mitigation of any resulting mismatch between the viewfinder image and the real world that would otherwise create an unnatural mapping. Indeed, anyone who has walked around holding a small camcorder up to his or her eye for several hours a day will obtain an understanding of the ill psychophysical effects that result.

The diverter system in an EyeTap allows the center of projection of the camera to optically coincide with the center of projection of the eye. This placement of the camera makes the EyeTap different from other head-mounted camera systems, which place the camera only "near to" the eye's center of projection. We will now discuss how the camera placement of the EyeTap allows it to work without parallax in a variety of situations, without the limitations experienced by head-mounted camera systems.

It is easy to imagine a camera connected to a television screen and carefully arranged in such a way that, when viewed from a particular viewpoint, the television screen displays exactly what is blocked by the screen, so that an illusory transparency results. This illusory transparency would hold only so long as the television is viewed from this particular viewpoint. Moreover, it is easy to imagine a portable miniature device that accomplishes this situation, especially given the proliferation of consumer camcorder systems (such as portable cameras with built-in displays). We could attempt to achieve this condition with a handheld camcorder, perhaps miniaturized to fit into a helmet-mounted apparatus, but it is impossible to align the images exactly with what would appear in the absence of the apparatus. We can better understand this problem by referring to figure 4.

Figure 4. The small lens (22) shown in solid lines collects a cone of light bounded by rays 1C and 2C. Consider, for example, eyeward-bound ray of light 1E, which may be imagined to be collected by a large fictional lens 22F (when in fact ray 1C is captured by the actual lens 22), and focused to point 24A. The sensor element collecting light at point 24A is displayed as point 32A on the camcorder viewfinder, which is then viewed by a magnifying lens and emerges as ray 1D into the eye (39). It should be noted that the top of the nearby subject matter (23N) also images to point 24A and is displayed at point 32A, emerging as ray 1D as well. Thus, nearby subject matter 23N will appear as shown in the dotted line denoted 23F, with the top point appearing as 23FA even though the actual point should appear as 23NA (that is, it would appear as point 23NA in the absence of the apparatus). Thus, a camcorder cannot properly function as a true EyeTap device.

Figure 5 shows, in detail, how, in figure 4, we imagine that the objective lens of the camera, placed directly in front of the eye, is much larger than it really is, so that it captures all eyeward bound rays of light, for which we can imagine that it processes these rays in a collinear fashion. However, this reasoning is pure fiction and breaks down as soon as we consider a scene that has some depth of field. Thus, the setup of figures 4 and 5 works for only a particular viewpoint and for subject matter in a particular depth plane. Although the same kind of system could obviously be miniaturized and concealed in ordinary-appearing sunglasses, in which case the limitation to a particular viewpoint is not a problem (because the sunglasses could be anchored to a fixed viewpoint with respect to at least one eye of a user), the other important limitation, that such systems work for only subject matter in the same depth plane, remains.

This problem exists whether the camera is right in front of the display or off to one side. Some real-world examples, having the camera to the left of the display, are shown in figure 6. In these setups, subject matter moved closer to the apparatus will show as being not properly aligned. Consider a person standing right in front of the camera but not in front of the TV in figure 6. Clearly, this person will not be behind the television but yet will appear on the television. Likewise, a person standing directly behind the television will not necessarily be seen by the camera, which is located to the left of the television. Thus, subject matter that exists at a variety of different depths and is not confined to a plane may be impossible to align in all areas with its image on the screen.
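The depth dependence of this misalignment can be quantified with a simple pinhole model. The following sketch uses our own illustrative numbers (not values from the paper): it computes the bearing to a subject as seen from the eye and from a camera displaced 3 cm to the left. The two bearings disagree at every finite depth, and the disagreement varies with depth, so no single display mapping can align subjects at all depths; a camera whose center of projection coincides with the eye's has zero parallax at every depth.

```python
import math

def direction_deg(observer_x_m, subject_x_m, subject_depth_m):
    """Bearing (degrees from straight ahead) from a pinhole at
    (observer_x_m, 0) to a subject at (subject_x_m, subject_depth_m)."""
    return math.degrees(math.atan2(subject_x_m - observer_x_m, subject_depth_m))

EYE_X = 0.0      # eye's center of projection
CAM_X = -0.03    # camera displaced 3 cm to the left (hypothetical offset)

for depth in (0.5, 2.0, 10.0):
    eye_angle = direction_deg(EYE_X, 0.0, depth)   # subject straight ahead of the eye
    cam_angle = direction_deg(CAM_X, 0.0, depth)
    parallax = cam_angle - eye_angle
    print(f"depth {depth:5.1f} m: parallax {parallax:+.2f} deg")
```

The parallax error shrinks with depth but never vanishes, which is why a display mapping calibrated for one depth plane misaligns subject matter at every other depth.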
3 VideoOrbits Head Tracking and Motion Estimation for EyeTap Reality Mediation
Because the device absorbs, quantifies, processes, and reconstructs light passing through it, there are extensive applications in creating a mediated version of reality. The computer-generated information or virtual light as seen through the display must be properly registered and aligned with the real-world objects within the user's field of view. To achieve this, a method of camera-based head tracking is now described.
3.1 Why Camera-BasedHead Tracking?
A goal of personal imaging (Mann, 1997b) is to facilitate the use of the Reality Mediator in ordinary everyday situations, not just on a factory assembly line "workcell" or other restricted space. Thus, it is desired that the apparatus have a head tracker that need not rely on any special apparatus being installed in the environment. Therefore, we need a new method of head tracking based on the use of the camera capability of the apparatus (Mann, 1997b) and on the VideoOrbits algorithm (Mann & Picard, 1995).

Figure 5. Suppose the camera portion of the camcorder, denoted by reference numeral 10C, were fitted with a very large objective lens (22F). This lens would collect eyeward-bound rays of light 1E and 2E. It would also collect rays of light coming toward the center of projection of lens 22. Rays of light coming toward this camera center of projection are denoted 1C and 2C. Lens 22 converges rays 1E and 1C to point 24A on the camera sensor element. Likewise, rays of light 2C and 2E are focused to point 24B. Ordinarily, the image (denoted by reference numeral 24) is upside down in a camera, but cameras and displays are designed so that, when the signal from a camera is fed to a display (such as a TV set), it shows right side up. Thus, the image appears with point 32A of the display creating rays of light such as the one denoted 1D. Ray 1D is responsive to, and collinear with, ray 1E that would have entered the eye in the absence of the apparatus. Likewise, by similar reasoning, ray 2D is responsive to, and collinear with, eyeward-bound ray 2E. It should be noted, however, that the large lens (22F) is just an element of fiction. Thus, lens 22F is a fictional lens, because a true lens should be represented by its center of projection; that is, its behavior should not change, other than by depth of focus, diffraction, and amount of light passed, when its iris is opened or closed. Therefore, we could replace lens 22F with a pinhole lens and simply imagine lens 22 to have captured rays 1E and 2E, when it actually, in fact, captures only rays 1C and 2C.

3.2 Algebraic Projective Geometry

The VideoOrbits algorithm performs head tracking, visually, based on a natural environment, and it works without the need for object recognition. Instead, it is based on algebraic projective geometry and a direct featureless means of estimating the change in spatial coordinates of successive frames of EyeTap video arising from movement of the wearer's head, as illustrated in figure 7. This change in spatial coordinates is characterized by eight parameters of an "exact" projective (homographic) coordinate transformation that registers pairs of images of scene content. These eight parameters are "exact" for two cases of static scenes: (i) images taken from the same location of an arbitrary 3-D scene, with a camera that is free to pan, tilt, rotate about its optical axis, and zoom (such as when the user stands still and moves their head); or (ii) images of a flat scene
Figure 6. Illusory transparency: examples of a camera supplying a television with an image of subject matter blocked by the television. (a) A television camera on a tripod at left supplies an Apple "Studio" television display with an image of the lower portion of Niagara Falls blocked by the television display (resting on an easel to the right of the camera tripod). The camera and display were carefully arranged, along with a second camera to capture this picture of the apparatus. Only when viewed from the special location of the second camera does the illusion of transparency exist. (b) Various cameras with television outputs were set up on the walkway, but none of them can re-create the subject matter behind the television display in a manner that creates a perfect illusion of transparency, because the subject matter does not exist in one single depth plane. There exists no choice of camera orientation, zoom setting, and viewer location that creates an exact illusion of transparency for the portion of the Brooklyn Bridge blocked by the television screen. Notice how the railings don't quite line up correctly because they vary in depth with respect to the first support tower of the bridge.
taken from arbitrary locations (such as when it is desired to track a planar patch, viewed by a user who is free to move about). Thus, it is well suited for tracking planar patches with arbitrary view motion, a situation that commonly arises, for instance, when a sign or billboard is to be tracked.

It is stressed here that the algorithm presented is used to track image motion arising from arbitrary relative motion of the user's head with respect to rigid planar patches. Initial placement of the computer-generated information (for instance, the four corners of the rigid planar rectangular patch) is assumed to have been completed by another method. Once placed, however, no further model or knowledge of the scene is required to track the location of the computer-generated information.

The featureless projective approach generalizes interframe camera motion estimation methods that have previously used an affine model (which lacks the degrees of freedom to "exactly" characterize such phenomena as camera pan and tilt) and/or that have relied upon finding points of correspondence between the image frames. The featureless projective approach, which operates directly on the image pixels, is shown to be superior in accuracy and ability to enhance resolution. The proposed methods work well on image data collected from both good-quality and poor-quality video under a wide variety of conditions (sunny, cloudy, day, night). These fully automatic methods are also shown to be robust to deviations from the assumptions of static scene and no parallax. The primary application here is in filtering out or replacing subject matter appearing on flat surfaces within a scene (for example, rigid planar patches such as advertising billboards).

The most common assumption (especially in motion estimation for coding and optical flow for computer vision) is that the coordinate transformation between frames is translation. Tekalp, Ozkan, and Sezan (1992) have applied this assumption to high-resolution image reconstruction. Although translation is the least constraining and simplest to implement of the seven coordinate transformations in table 1, it is poor at handling large changes due to camera zoom, rotation, pan, and tilt. Zheng and Chellappa (1993) considered the image registration problem using a subset of the affine model: translation, rotation, and scale. Other researchers (Irani & Peleg, 1991; Teodosio & Bender, 1993) have assumed affine motion (six parameters) between frames. Behringer (1998) considered features of a silhouette.

For the assumptions of static scene and no parallax, the affine model exactly describes rotation about the optical axis of the camera, zoom of the camera, and pure shear, which the camera does not do, except in the limit as the lens focal length approaches infinity. The affine model cannot capture camera pan and tilt and therefore cannot properly express the "keystoning" and "chirping" we see in the real world. (Chirping refers to the effect of increasing or decreasing spatial frequency with respect to spatial location, as illustrated in figure 8.) This chirping phenomenon is implicit in the proposed system, whether or not there is periodicity in the subject matter. The only requirement is that there be some distinct texture upon a flat surface in the scene.

Figure 7. The VideoOrbits head-tracking algorithm. The new head-tracking algorithm requires no special devices installed in the environment. The camera in the personal imaging system simply tracks itself based on its view of objects in the environment. The algorithm is based on algebraic projective geometry and provides an estimate of the true projective coordinate transformation, which, for successive image pairs, is composed using the projective group (Mann & Picard, 1995). Successive pairs of images may be estimated in the neighborhood of the identity coordinate transformation of the group, whereas absolute head tracking is done using the exact group by relating the approximate parameters q to the exact parameters p in the innermost loop of the process. The algorithm typically runs at five to ten frames per second on a general purpose computer, but the simple structure of the algorithm makes it easy to implement in hardware for the higher frame rates needed for full-motion video.

3.3 Video Orbits

Tsai and Huang (1981) pointed out that the elements of the projective group give the true camera motions with respect to a planar surface. They explored the
Table 1. Image Coordinate Transformations Discussed in this Paper

Model                 Coordinate transformation from x to x′                                Parameters
Translation           x′ = x + b                                                            b ∈ R²
Affine                x′ = Ax + b                                                           A ∈ R^{2×2}, b ∈ R²
Bilinear              x′ = q_{x′xy} xy + q_{x′x} x + q_{x′y} y + q_{x′}
                      y′ = q_{y′xy} xy + q_{y′x} x + q_{y′y} y + q_{y′}                     q* ∈ R
Projective            x′ = (Ax + b)/(cᵀx + 1)                                               A ∈ R^{2×2}, b, c ∈ R²
Relative-projective   x′ = (Ax + b)/(cᵀx + 1) + x                                           A ∈ R^{2×2}, b, c ∈ R²
Pseudo-perspective    x′ = q_{x′x} x + q_{x′y} y + q_{x′} + q_α x² + q_β xy
                      y′ = q_{y′x} x + q_{y′y} y + q_{y′} + q_α xy + q_β y²                 q* ∈ R
Biquadratic           x′ = q_{x′x²} x² + q_{x′xy} xy + q_{x′y²} y² + q_{x′x} x + q_{x′y} y + q_{x′}
                      y′ = q_{y′x²} x² + q_{y′xy} xy + q_{y′y²} y² + q_{y′x} x + q_{y′y} y + q_{y′}   q* ∈ R
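The distinctions in table 1 can be made concrete by applying the simplest models to the same point. In this sketch (our own illustrative helpers and parameter values), only the projective model carries the cᵀx + 1 denominator that lets it express pan and tilt:

```python
def translation(x, y, b):
    return x + b[0], y + b[1]

def affine(x, y, A, b):
    return (A[0][0]*x + A[0][1]*y + b[0],
            A[1][0]*x + A[1][1]*y + b[1])

def projective(x, y, A, b, c):
    # The denominator c^T x + 1 is what lets this model express pan and
    # tilt ("keystoning" and "chirping"), which the affine model cannot.
    d = c[0]*x + c[1]*y + 1.0
    xa, ya = affine(x, y, A, b)
    return xa / d, ya / d
```

Setting c = (0, 0) reduces the projective model to the affine model, and setting A to the identity reduces the affine model to translation, mirroring the nesting of the models in the table.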
Figure 8. The projective chirping phenomenon. (a) A real-world object that exhibits periodicity generates a projection (image) with "chirping" ("periodicity in perspective"). (b) Center raster of image. (c) Best-fit projective chirp of form sin(2π((ax + b)/(cx + 1))). (d) Graphical depiction of an exemplar 1-D projective coordinate transformation of sin(2πx₁) into a "projective chirp" function, sin(2πx₂) = sin(2π((2x₁ − 2)/(x₁ + 1))). The range coordinate as a function of the domain coordinate forms a rectangular hyperbola with asymptotes shifted to center at the vanishing point, x₁ = −1/c = −1, and exploding point, x₂ = a/c = 2, and with chirpiness c′ = (bc − a)/c² = −4.
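The hyperbola in figure 8(d) can be checked numerically. This sketch evaluates the 1-D projective coordinate transformation x₂ = (2x₁ − 2)/(x₁ + 1) from the caption (a = 2, b = −2, c = 1) and verifies that the product of the coordinates shifted by the two asymptotes, (x₁ + 1/c)(x₂ − a/c), is constant and equal to the chirpiness (bc − a)/c² = −4:

```python
def projective_1d(x, a=2.0, b=-2.0, c=1.0):
    """1-D projective coordinate transformation x2 = (a*x + b)/(c*x + 1)."""
    return (a * x + b) / (c * x + 1.0)

# The vanishing point x1 = -1/c and exploding point x2 = a/c are the
# asymptotes of a rectangular hyperbola; the product of the shifted
# coordinates equals the chirpiness (b*c - a)/c**2 = -4.
for x1 in (0.0, 0.5, 3.0, 10.0):
    x2 = projective_1d(x1)
    assert abs((x1 + 1.0) * (x2 - 2.0) - (-4.0)) < 1e-9
```

The constancy of this product is just the statement that the graph of range versus domain coordinate is a rectangular hyperbola, which is what distinguishes the projective case from the straight-line graph of the 1-D affine case.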
group structure associated with images of a 3-D rigid planar patch, as well as the associated Lie algebra, although they assume that the correspondence problem has been solved. The solution presented in this paper (which does not require prior solution of correspondence) also relies on projective group theory.
PRESENCE: VOLUME 11, NUMBER 2

3.3.1 Projective Flow: A New Technique for Tracking a Rigid Planar Patch. A method for tracking a rigid planar patch is now presented. Consider first one-dimensional systems, because they are easier to explain and understand. For a 1-D affine coordinate transformation, the graph of the range coordinate as a function of the domain coordinate is a straight line; for the projective coordinate transformation, the graph of the range coordinate as a function of the domain coordinate is a rectangular hyperbola (figure 8(d)).

Whether or not there is periodicity in the scene, the method still works, in the sense that it is based on the projective flow across the texture or pattern, at all various spatial frequency components of a rigid planar patch. The method is called projective flow (p-flow), which we will now describe in 1-D.

We begin with the well-known Horn and Schunk brightness change constraint equation (Horn & Schunk, 1981):

\[ u_f E_x + E_t \approx 0, \tag{1} \]

where \(E_x\) and \(E_t\) are the spatial and temporal derivatives, respectively, of the image \(E(x)\), and \(u_f\) is the optical flow velocity, assuming pure translation. Typically, we determine the \(u_m\) that minimizes the error of equation (1) as

\[ \varepsilon_{\text{flow}} = \sum (u_m E_x + E_t)^2. \tag{2} \]

Projective flow arises from substitution of \(u_m = (ax + b)/(cx + 1) - x\) in place of \(u_f\) in equation (1). A judicious weighting by \((cx + 1)\) simplifies the calculation, giving

\[ \varepsilon_w = \sum \big( a x E_x + b E_x + c (x E_t - x^2 E_x) + E_t - x E_x \big)^2. \tag{3} \]

Differentiating and setting the derivative to zero (the subscript \(w\) denotes that weighting has taken place) results in a linear system of equations for the parameters, which can be written compactly as

\[ \Big( \sum \phi_w \phi_w^T \Big) [a, b, c]^T = \sum (x E_x - E_t)\, \phi_w, \tag{4} \]

where the regressor is \(\phi_w = [x E_x,\ E_x,\ x E_t - x^2 E_x]^T\). The notation and derivations used in this paper are as described by Mann (1998, p. 2139). The reader is invited to refer to that work for a more in-depth treatment of the matter.

3.3.2 The Unweighted Projectivity Estimator. If we do not wish to apply the ad hoc weighting scheme, we may still estimate the parameters of projectivity in a simple manner, still based on solving a linear system of equations. To do this, we write the Taylor series of \(u_m\),

\[ u_m + x = b + (a - bc)x + (bc - a)c\,x^2 + (a - bc)c^2 x^3 + \cdots, \tag{5} \]

and use only the first three terms, obtaining enough degrees of freedom to account for the three parameters being estimated. Letting \(\varepsilon = \sum \big( (b + (a - bc - 1)x + (bc - a)c\,x^2) E_x + E_t \big)^2\), \(q_2 = (bc - a)c\), \(q_1 = a - bc - 1\), and \(q_0 = b\), and differentiating with respect to each of the three parameters of \(q\), setting the derivatives equal to zero, and verifying with the second derivatives, gives the linear system of equations for unweighted projective flow:

\[
\begin{bmatrix}
\sum x^4 E_x^2 & \sum x^3 E_x^2 & \sum x^2 E_x^2 \\
\sum x^3 E_x^2 & \sum x^2 E_x^2 & \sum x E_x^2 \\
\sum x^2 E_x^2 & \sum x E_x^2 & \sum E_x^2
\end{bmatrix}
\begin{bmatrix} q_2 \\ q_1 \\ q_0 \end{bmatrix}
= -\begin{bmatrix} \sum x^2 E_x E_t \\ \sum x E_x E_t \\ \sum E_x E_t \end{bmatrix}. \tag{6}
\]

3.4 Plane Tracker in 2-D

We now discuss the 2-D formulation. We begin again with the brightness constancy constraint equation, this time for 2-D images (Horn & Schunk, 1981), which gives the flow velocity components in both the x and y directions:

\[ \mathbf{u}_f^T \mathbf{E}_x + E_t \approx 0. \tag{7} \]

As is well known, the optical flow field in 2-D is underconstrained. (Optical flow in 1-D did not suffer from this problem.) The model of pure translation at every point has two parameters, but there is only one equation (7) to solve.
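Before continuing in 2-D, the 1-D estimator of section 3.3.2 can be made concrete. The sketch below is our own illustrative code, not the authors' implementation, and all names in it are ours: it accumulates the moments of equation (6) from two 1-D frames and solves the resulting 3×3 system for \(q = (q_2, q_1, q_0)\). The helper `solve_linear` is plain Gaussian elimination with partial pivoting, and the spatial derivative is averaged over both frames to reduce bias.

```python
def solve_linear(A, b):
    # Gaussian elimination with partial pivoting; A is n x n, b has length n.
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def unweighted_projective_flow_1d(f1, f2, xs):
    # Estimate q = (q2, q1, q0) of equation (6) from two 1-D frames.
    # Ex is a central difference averaged over both frames; Et = f2 - f1,
    # as in the brightness change constraint, equation (1).
    S = [0.0] * 5    # S[k] = sum of x^k * Ex^2, k = 0..4
    rhs = [0.0] * 3  # rhs[k] = -(sum of x^k * Ex * Et), k = 0..2
    for i in range(1, len(xs) - 1):
        dx = xs[i + 1] - xs[i - 1]
        Ex = (f1[i + 1] - f1[i - 1] + f2[i + 1] - f2[i - 1]) / (2.0 * dx)
        Et = f2[i] - f1[i]
        for k in range(5):
            S[k] += xs[i] ** k * Ex * Ex
        for k in range(3):
            rhs[k] -= xs[i] ** k * Ex * Et
    A = [[S[4], S[3], S[2]],
         [S[3], S[2], S[1]],
         [S[2], S[1], S[0]]]
    return solve_linear(A, [rhs[2], rhs[1], rhs[0]])  # (q2, q1, q0)
```

For a pure translation by \(d\) (that is, \(a = 1\), \(c = 0\)), equation (5) predicts \(q_0 \approx d\) and \(q_1 \approx q_2 \approx 0\), which provides a simple sanity check.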
Thus, it is common practice to compute the optical flow over some neighborhood, which must be at least two pixels, but is generally taken over a small block (3×3, 5×5, or sometimes larger; for example, the entire patch of subject matter to be filtered out, such as a billboard or sign).

Our task is not to deal with the 2-D translational flow, but with the 2-D projective flow, estimating the eight parameters in the coordinate transformation:

\[ \mathbf{x}' = \begin{bmatrix} x' \\ y' \end{bmatrix} = \frac{A [x, y]^T + \mathbf{b}}{\mathbf{c}^T [x, y]^T + 1} = \frac{A \mathbf{x} + \mathbf{b}}{\mathbf{c}^T \mathbf{x} + 1}. \tag{8} \]

The desired eight scalar parameters are denoted by \(p = [A, \mathbf{b};\ \mathbf{c}, 1]\), \(A \in \mathbb{R}^{2\times 2}\), \(\mathbf{b} \in \mathbb{R}^{2\times 1}\), and \(\mathbf{c} \in \mathbb{R}^{2\times 1}\). We have, in the 2-D case,

\[ \varepsilon_{\text{flow}} = \sum (\mathbf{u}_m^T \mathbf{E}_x + E_t)^2 = \sum \left( \left( \frac{A\mathbf{x} + \mathbf{b}}{\mathbf{c}^T \mathbf{x} + 1} - \mathbf{x} \right)^T \mathbf{E}_x + E_t \right)^2, \tag{9} \]

where the sum can be weighted as it was in the 1-D case:

\[ \varepsilon_w = \sum \big( (A\mathbf{x} + \mathbf{b} - (\mathbf{c}^T\mathbf{x} + 1)\,\mathbf{x})^T \mathbf{E}_x + (\mathbf{c}^T\mathbf{x} + 1)\, E_t \big)^2. \tag{10} \]

Differentiating with respect to the free parameters \(A\), \(\mathbf{b}\), and \(\mathbf{c}\), and setting the result to zero, gives a linear solution:

\[ \Big( \sum \phi \phi^T \Big) [a_{11}, a_{12}, b_1, a_{21}, a_{22}, b_2, c_1, c_2]^T = \sum (\mathbf{x}^T \mathbf{E}_x - E_t)\, \phi, \tag{11} \]

where

\[ \phi^T = \big[ E_x (x, y, 1),\ E_y (x, y, 1),\ x E_t - x^2 E_x - xy E_y,\ y E_t - xy E_x - y^2 E_y \big]. \]

For a more in-depth treatment of projective flow, the reader is invited to refer to Mann (1998).

3.5 Unweighted Projective Flow

As with the 1-D images, we make similar assumptions in expanding equation (8) in its own Taylor series, analogous to equation (5). By appropriately constraining the twelve parameters of the biquadratic model, we obtain a variety of eight-parameter approximate models. (Use of an approximate model that does not capture chirping or preserve straight lines can still lead to the true projective parameters, as long as the model captures at least eight meaningful degrees of freedom.) In estimating the "exact unweighted" projective group parameters, one of these approximate models is used in an intermediate step.

The Taylor series for the bilinear case gives

\[
\begin{aligned}
u_m + x &= q_{x'xy}\, xy + (q_{x'x} + 1)\,x + q_{x'y}\, y + q_{x'}, \\
v_m + y &= q_{y'xy}\, xy + q_{y'x}\, x + (q_{y'y} + 1)\,y + q_{y'}.
\end{aligned} \tag{12}
\]

Incorporating these into the flow criteria yields a simple set of eight linear equations in eight unknowns:

\[ \Big( \sum_{x,y} \phi(x,y)\, \phi^T(x,y) \Big)\, \mathbf{q} = -\sum_{x,y} E_t(x,y)\, \phi(x,y), \tag{13} \]

where \(\phi^T = [E_x (xy, x, y, 1),\ E_y (xy, x, y, 1)]\).

For the relative-projective model, \(\phi\) is given by

\[ \phi^T = [E_x (x, y, 1),\ E_y (x, y, 1),\ E_t (x, y)], \tag{14} \]

and, for the pseudo-perspective model, \(\phi\) is given by

\[ \phi^T = [E_x (x, y, 1),\ E_y (x, y, 1),\ (x^2 E_x + xy E_y,\ xy E_x + y^2 E_y)]. \tag{15} \]

3.5.1 Four-Point Method for Relating Approximate Model to Exact Model. Any of the preceding approximations, after being related to the exact projective model, tend to behave well in the neighborhood of the identity, \(A = I\), \(\mathbf{b} = 0\), \(\mathbf{c} = 0\). In 1-D, the model Taylor series about the identity was explicitly expanded; here, although this is not done explicitly, it is assumed that the terms of the Taylor series of the model correspond to those taken about the identity.
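The linear estimation step for the bilinear model, equation (13), can be sketched as follows. This is our own illustrative code, not the authors' implementation: per pixel it forms the regressor \(\phi\) of equation (13), accumulates \(\sum \phi\phi^T\) and \(-\sum E_t \phi\), and solves the resulting 8×8 system with a plain Gaussian-elimination helper.

```python
def solve_linear(A, b):
    # Gaussian elimination with partial pivoting.
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def bilinear_flow(f1, f2, xs, ys):
    # Estimate the eight bilinear parameters of equations (12)-(13):
    # q = [qx'xy, qx'x, qx'y, qx', qy'xy, qy'x, qy'y, qy'].
    # f1, f2 are 2-D arrays indexed [row][col], i.e. [y][x].
    n = 8
    A = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for i in range(1, len(ys) - 1):
        for j in range(1, len(xs) - 1):
            x, y = xs[j], ys[i]
            # Spatial derivatives averaged over both frames; Et = f2 - f1.
            Ex = (f1[i][j + 1] - f1[i][j - 1]
                  + f2[i][j + 1] - f2[i][j - 1]) / (2.0 * (xs[j + 1] - xs[j - 1]))
            Ey = (f1[i + 1][j] - f1[i - 1][j]
                  + f2[i + 1][j] - f2[i - 1][j]) / (2.0 * (ys[i + 1] - ys[i - 1]))
            Et = f2[i][j] - f1[i][j]
            phi = [Ex * x * y, Ex * x, Ex * y, Ex,
                   Ey * x * y, Ey * x, Ey * y, Ey]
            for r in range(n):
                rhs[r] -= Et * phi[r]
                for c in range(n):
                    A[r][c] += phi[r] * phi[c]
    return solve_linear(A, rhs)
```

For a pure translation \((d_x, d_y)\), the constant terms \(q_{x'}\) and \(q_{y'}\) should come out near \(d_x\) and \(d_y\), with the remaining six parameters near zero.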
In the 1-D case, we solve the three linear equations in three unknowns to estimate the parameters of the approximate motion model, and then relate the terms in this Taylor series to the exact parameters, \(a\), \(b\), and \(c\) (which involves solving another set of three equations in three unknowns, the second set being nonlinear, although very easy to solve).

In the extension to 2-D, the estimate step is straightforward, but the relate step is more difficult, because we now have eight nonlinear equations in eight unknowns, relating the terms in the Taylor series of the approximate model to the desired exact model parameters. Instead of solving these equations directly, a simple procedure is used for relating the parameters of the approximate model to those of the exact model, which is called the "four-point method":

1. Select four ordered pairs (for example, the four corners of the bounding box containing the region under analysis, or the four corners of the image if the whole image is under analysis). Here, suppose, for simplicity, that these points are the corners of the unit square: \(s = [s_1, s_2, s_3, s_4] = [(0, 0)^T, (0, 1)^T, (1, 0)^T, (1, 1)^T]\).
2. Apply the coordinate transformation using the Taylor series for the approximate model (such as equation (12)) to these points: \(r = u_m(s)\).
3. Finally, the correspondences between \(r\) and \(s\) are treated just like features. This results in four easy-to-solve equations:

\[
\begin{bmatrix} x'_k \\ y'_k \end{bmatrix} =
\begin{bmatrix}
x_k & y_k & 1 & 0 & 0 & 0 & -x_k x'_k & -y_k x'_k \\
0 & 0 & 0 & x_k & y_k & 1 & -x_k y'_k & -y_k y'_k
\end{bmatrix}
[a_{x'x}, a_{x'y}, b_{x'}, a_{y'x}, a_{y'y}, b_{y'}, c_x, c_y]^T, \tag{16}
\]

where \(1 \le k \le 4\). This results in the exact eight parameters, \(p\).

We remind the reader that the four corners are not feature correspondences as used in the feature-based methods but, rather, are used so that the two featureless models (approximate and exact) can be related to one another.

It is important to realize the full benefit of finding the exact parameters. Although the approximate model is sufficient for small deviations from the identity, it is not adequate to describe large changes in perspective. However, if we use it to track small changes incrementally, and each time relate these small changes to the exact model (8), then we can accumulate these small changes using the law of composition afforded by the group structure. This is an especially favorable contribution of the group framework. For example, with a video sequence, we can accommodate very large accumulated changes in perspective in this manner. The problems with cumulative error can be eliminated, for the most part, by constantly propagating forward the true values, computing the residual using the approximate model, and each time relating this to the exact model to obtain a goodness-of-fit estimate.

3.5.2 Overview of the Algorithm for Unweighted Projective Flow. Frames from an image sequence are compared pairwise to test whether or not they lie in the same orbit:

1. A Gaussian pyramid of three or four levels is constructed for each frame in the sequence.
2. The parameters \(p\) are estimated at the top of the pyramid, between the two lowest-resolution images of a frame pair, \(g\) and \(h\), using the repetitive method depicted in figure 7.
3. The estimated \(p\) is applied to the next-higher-resolution (finer) image in the pyramid, \(p \circ g\), to make the two images at that level of the pyramid nearly congruent before estimating the \(p\) between them.
4. The process continues down the pyramid until the highest-resolution image in the pyramid is reached.

4 Reality Mediation in Variable-Gain Image Sequences

Until now, we have assumed fixed-gain image sequences. In practice, however, camera gain varies to compensate for the varying quantity of light, by way of automatic gain control (AGC), automatic level control, or some similar form of automatic exposure. In fact, almost all modern cameras incorporate some form of automatic exposure control. Moreover, next-generation imaging systems such as the EyeTap eyeglasses also feature an automatic exposure control system, to make possible a hands-free, gaze-activated wearable system that is operable without conscious thought or effort. Indeed, the human eye itself incorporates many features akin to the automatic exposure or AGC of modern cameras.

Figure 9 illustrates how the Reality Mediator (or nearly any camera, for that matter) takes in a typical scene. As the wearer looks straight ahead, he sees mostly sky, and the exposure is quite small. Looking to the right at darker subject matter, the exposure is automatically increased. Because the differently exposed pictures depict overlapping subject matter, we have (once the images are registered, in regions of overlap) differently exposed pictures of identical subject matter. In this example, we have three very differently exposed pictures depicting parts of the University College building and surroundings.

Figure 9. Automatic exposure is the cause of differently exposed pictures of the same (overlapping) subject matter, creating the need for comparametric imaging in intelligent vision systems (Mann, 2001a). (a) Looking from inside Hart House Soldier's Tower, out through an open doorway: when the sky is dominant in the picture, the exposure is automatically reduced, and the wearer of the apparatus can see the texture (such as clouds) in the sky. He can also see University College and the CN Tower to the left. (b) As he looks up and to the right to take in subject matter not so well illuminated, the exposure automatically increases somewhat. The wearer can no longer see detail in the sky, but new architectural details inside the doorway start to become visible. (c) As he looks further up and to the right, the dimly lit interior dominates the scene, and the exposure is automatically increased dramatically. He can no longer see any detail in the sky, and even the University College building, outside, is washed out (overexposed). However, the inscriptions on the wall (names of soldiers killed in the war) now become visible.

4.1 Variable-Gain Problem Formulation

Differently exposed images (such as individual frames of video) of the same subject matter are denoted as vectors: \(f_0, f_1, \ldots, f_i, \ldots, f_{I-1},\ \forall i,\ 0 \le i < I\).
Each video frame is some unknown function, \(f(\cdot)\), of the actual quantity of light, \(q(\mathbf{x})\), falling on the image sensor:

\[ f_i = f\!\left( k_i\, q\!\left( \frac{A_i \mathbf{x} + \mathbf{b}_i}{\mathbf{c}_i^T \mathbf{x} + d_i} \right) \right), \tag{17} \]

where \(\mathbf{x} = (x, y)\) denotes the spatial coordinates of the image, \(k_i\) is a single unknown scalar exposure constant, and the parameters \(A_i\), \(\mathbf{b}_i\), \(\mathbf{c}_i\), and \(d_i\) denote the projective coordinate transformation between successive pairs of images.

For simplicity, this coordinate transformation is assumed to be independently recoverable (for example, using the methods of the previous section). Therefore, without loss of generality, images considered in this section will be taken as having the identity coordinate transformation, which corresponds to the special case of images differing only in exposure.

Without loss of generality, \(k_0\) will be called the reference exposure and will be set to unity, and frame zero will be called the reference frame, so that \(f_0 = f(q)\). Thus, we have

\[ \frac{1}{k_i}\, f^{-1}(f_i) = f^{-1}(f_0), \quad \forall i,\ 0 < i < I. \tag{18} \]

The existence of an inverse for \(f\) follows from a semimonotonicity assumption. Semimonotonicity follows from the fact that we expect pixel values to either increase or stay the same with increasing quantity of illumination, \(q\).

Photographic film is traditionally characterized by the so-called "density versus log exposure" characteristic curve (Wyckoff, 1961, 1962). Similarly, in the case of electronic imaging, we may also use logarithmic exposure units, \(Q = \log(q)\), so that one image will be \(K = \log(k)\) units darker than the other:

\[ \log\big( f^{-1}( f_1(\mathbf{x}) ) \big) = Q = \log\big( f^{-1}( f_2(\mathbf{x}) ) \big) - K. \tag{19} \]

Because the logarithm function is also monotonic, the problem comes down to estimating the semimonotonic function \(F(\cdot) = \log( f^{-1}(\cdot) )\) and the scalar constant \(K\). There are a variety of techniques for solving for \(F\) and \(K\) directly (Mann, 2000). In this paper, we choose to use a method involving comparametric equations.

4.1.1 Using Comparametric Equations. Variable-gain image sequences, \(f_i\), are created by the response, \(f\), of the imaging device to light, \(q\). Each of these images provides us with an estimate of \(f\), differing only by exposure, \(k\). Pairs of images can be compared by plotting \((f(q), f(kq))\), and the resulting relationship can be expressed as the monotonic function \(g(f(q)) = f(kq)\), not involving \(q\). Equations of this form are called comparametric equations (Mann, 2000). Comparametric equations are a special case of a more general class of equations called functional equations (Aczél, 1966).

A comparametric equation that is particularly useful for mediated-reality applications will now be introduced, first by its solution (from which the comparametric equation itself will be derived). (It is generally easier to construct comparametric equations from their solutions than it is to solve them.) The solution is

\[ f(q) = \left( \frac{e^b q^a}{e^b q^a + 1} \right)^c, \tag{20} \]

which has only three parameters (of which only two are meaningful, because \(b\) is indeterminable and may be fixed to \(b = 0\) without loss of generality). Equation (20) is useful because it describes the shape of the curve that characterizes the response of many cameras to light, \(f(q)\), called the response curve. The constants \(a\) and \(c\) are specific to the camera.

This model accurately captures the essence of the so-called toe and shoulder regions of the response curve. In traditional photography, these regions are ignored; all that is of interest is the linear mid-portion of the density versus log exposure curve. This interest in only the midtones arises because, in traditional photography, areas outside this region are considered to be incorrectly exposed. However, in practice, many of the objects in the input images to the reality mediator will be massively underexposed or overexposed, because not everything in life is necessarily a well-composed picture. These qualities of the model (20) are therefore of great value in capturing the essence of such extreme exposures, for which exposure into the toe and shoulder regions is often the norm rather than an aberration.
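The model (20) can be checked numerically. In the sketch below (our own illustrative code, with \(b\) fixed to 0), the function `comparametric` was obtained by eliminating \(q\) from equation (20); its defining comparametric property, \(g(f(q)) = f(kq)\) with \(K = \log(k)\), holds to machine precision.

```python
import math

def response(q, a, c):
    # Equation (20) with b = 0: f(q) = (q^a / (q^a + 1))^c.
    u = q ** a
    return (u / (u + 1.0)) ** c

def comparametric(fval, a, c, K):
    # g such that g(f(q)) = f(kq) for the response above, with K = log(k).
    # Note g(0) = 0 and g(1) = 1: the toe and shoulder are preserved.
    root = fval ** (1.0 / c)
    e = math.exp(-a * K)
    return (root / ((1.0 - e) * root + e)) ** c
```

Because \(g\) fixes 0 and 1, fully underexposed and fully saturated pixels map to themselves, which is exactly the behavior required in the toe and shoulder regions.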
Figure 10. (a) One of nineteen differently exposed pictures of a test pattern. (b) Each of the nineteen exposures produced eleven ordered pairs in a plot of f(Q) as a function of Q. (c) Shifting these nineteen plots left or right by the appropriate K_i allowed them all to align to produce the ground-truth, known response function f(Q).

Furthermore, equation (20) has the advantage of being bounded in normalized units between 0 and 1.

The comparametric equation of which the proposed photographic response function (20) is a solution is given by

\[ g(f) = \left( \frac{ f^{1/c} }{ (1 - e^{-aK})\, f^{1/c} + e^{-aK} } \right)^{c}, \tag{21} \]

where \(K = \log(k_2 / k_1)\) is the logarithm of the ratio of the two exposures.

To validate this model, we:
• estimate the parameters \(a\), \(c\), and \(k\) of \(g(f(q))\) that best fit a plot \((f(q), f(kq))\) derived from differently exposed pictures (as, for example, shown in figure 9), and
• verify our estimate of \(f\) by using lab instruments.

Although there exist methods for automatically determining \(a\), \(c\), and the relative gain \(k\) from pairs of differently exposed images by using comparametric equations, these methods are beyond the scope of this paper, and the reader is invited to refer to Mann (2000) for a full discussion of these methods and of comparametric equations.

A CamAlign-CGH test chart from DSC Laboratories, Toronto, Canada (Serial No. S009494), as shown in figure 10(a), was used to verify the response function recovered using the method. The individual bars were segmented automatically by differentiation to find the transition regions, and then robust statistics were used to determine an estimate of \(f(q)\) for each of the eleven steps, as well as for the black regions of the test pattern. Using the known reflectivity of each of these twelve regions, a set of twelve ordered pairs \((q, f(q))\) was determined for each of the nineteen exposures, as shown in figure 10(b). Shifting these results appropriately (by the \(K_i\) values) to line them up gives the ground-truth, known response function, \(f\), shown in figure 10(c).

Thus, equation (21) gives us a recipe for lightening or darkening an image in a way that looks natural and is also based on this proven theoretical framework. For instance, given a pair of images taken with a camera with a known response function (which is to say that \(a\) and \(c\) are known for the camera), the relative gain between the images is estimated, and either of the pair is lightened or darkened to bring it into the same exposure as the other image. Similarly, any computer-generated information in a mediated or augmented scene is brought into the appropriate exposure of the scene.
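The recipe just described, estimating the relative gain and then lightening or darkening one image to match the other, can be sketched as follows. This is our own illustrative code: a brute-force grid search over \(K\) stands in for the estimation methods of Mann (2000), and the response model assumes \(b = 0\).

```python
import math

def response(q, a, c):
    # Equation (20) with b = 0.
    u = q ** a
    return (u / (u + 1.0)) ** c

def comparametric(fval, a, c, K):
    # Maps f(q) to f(kq), with K = log(k), for the response above.
    root = fval ** (1.0 / c)
    e = math.exp(-a * K)
    return (root / ((1.0 - e) * root + e)) ** c

def estimate_K(pix1, pix2, a, c, lo=-3.0, hi=3.0, steps=1201):
    # Brute-force search for the K that best maps image 1 onto image 2,
    # minimizing the sum of squared differences over registered pixels.
    best_K, best_err = lo, float("inf")
    for s in range(steps):
        K = lo + (hi - lo) * s / (steps - 1)
        err = sum((comparametric(v, a, c, K) - w) ** 2
                  for v, w in zip(pix1, pix2))
        if err < best_err:
            best_K, best_err = K, err
    return best_K
```

Once \(K\) is known, `comparametric(v, a, c, K)` comparametrically lightens (\(K > 0\)) or darkens (\(K < 0\)) each pixel value \(v\), and the same mapping can be applied to computer-generated matter before it is inserted into the scene.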
Figure 11. Mediated reality as a photographic/videographic memory prosthesis: (a) a wearable face recognizer with virtual "name tag" (and grocery list) appears to stay attached to the cashier (b), even when the cashier is no longer within the field of view of the tapped eye and transmitter (c).

4.2 Mediated Reality as a Form of Communication

The mathematical framework for mediated reality arose through the process of marking a reference frame (Mann & Picard, 1995) with text or simple graphics, in which it was noted that, by calculating and matching homographies of the plane, an illusory rigid planar patch appeared to hover upon objects in the real world, giving rise to a form of computer-mediated collaboration (Mann, 1997b). Figure 11 shows images processed in real time by VideoOrbits.

4.3 Diminished Reality

Diminished reality deliberately removes parts of a real-world scene or replaces them with computer-generated information (Mann & Fung, 2001). For instance, deliberately diminished reality has application in construction. Klinker, Stricker, and Reiners (2001) discuss a number of techniques for interpolating the pixels behind a diminished object: "Many construction projects require that existing structures be removed before new ones are built. Thus, just as important as augmenting reality is technology to diminish it" (p. 416).

Real-world "spam" (unwanted and unsolicited advertising) typically occurs on planar surfaces, such as billboards. The VideoOrbits algorithm presented here is well suited to diminishing these unwanted and intrusive real-world planar objects. Because the camera response function and exposure values can be computed automatically in a self-calibrating system, the computer-mediated reality can take form by combining the estimate of the gain, obtained using the camera response function, with the estimate of the coordinate transformation between frames, using the VideoOrbits methodology for computer-mediated reality.

Figure 12(a, b) shows a nice view of the Empire State Building spoiled by an offensive jeans advertisement (a billboard depicting a man pulling off a woman's clothes). The computer-mediated reality environment allows the billboard to be automatically replaced with a picture of vintage (original 1985) mediated-reality sunglasses. (See figure 12(c, d).) By removing the billboard, a deliberately diminished version of the scene is created. The information of the advertisement is now removed, and computer-generated information is inserted in its place, helping to avoid information overload.
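Because a billboard is planar, replacing it amounts to mapping a replacement image onto the tracked quadrilateral with the exact projective model of equation (8), whose eight parameters follow from four corner correspondences exactly as in equation (16). A minimal sketch (our own illustrative code; the parameter values are made up, and `solve_linear` is plain Gaussian elimination):

```python
def solve_linear(A, b):
    # Gaussian elimination with partial pivoting.
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def homography_from_four_points(src, dst):
    # Stack the two rows of equation (16) for each of the four
    # correspondences and solve for p = [a11, a12, b1, a21, a22, b2, c1, c2].
    A, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp])
        rhs.append(xp)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp])
        rhs.append(yp)
    return solve_linear(A, rhs)

def apply_homography(p, x, y):
    # The exact projective coordinate transformation, equation (8).
    d = p[6] * x + p[7] * y + 1.0
    return ((p[0] * x + p[1] * y + p[2]) / d,
            (p[3] * x + p[4] * y + p[5]) / d)
```

Warping each pixel of the replacement image through `apply_homography`, and comparametrically matching its exposure as in section 4, yields a tonally registered insertion of the kind shown in figure 12.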
Figure 12. (a, b) Two frames from a video sequence in New York City, showing how a nice view of the Empire State Building is spoiled by an offensive jeans advertisement (a billboard depicting a man pulling off a woman's clothes). Notice the effect of AGC, similar to that depicted in figure 9. (a) Because a large proportion of sky is included in the image, the overall exposure is quite low, so the image is darker. (b) Because the darker billboard begins to enter the center portion of the view, the gain increases and the entire image is lighter. (c, d) Two frames from a video sequence in New York City, showing how the same visual reality can be diminished. Our ability to see the offensive advertisement is reduced. The diminished reality is then augmented with a view of the vintage 1985 smart sunglasses. Now, the resulting image sequence is an example of mediated reality. Notice how the exposure of the new matter, introduced into the visual field of view, tracks the exposure of the offensive advertising material originally present. The result is a visually pleasing, mediated-reality experience extending over a very wide dynamic range. (c) Because a large proportion of sky is included in the image, the overall exposure is quite low, and so the image is darker. The additional material inserted into the image is thus automatically made darker, comparametrically, to match. (d) Because the original image was lighter, the new matter introduced into the visual reality stream is also made lighter, comparametrically, to match.
5 Conclusion

Because wearable computers and EyeTap devices encapsulate their users, these technologies mediate the user's experience of the world. Having designed, built, worn, and tested dozens of different embodiments of these devices for use in ordinary day-to-day life has provided us with much valuable insight into the concepts of mediated reality. The resulting Reality Mediators alter the user's visual perception of their environment. The user's head motion is tracked by the VideoOrbits algorithm, and the camera gain is tracked using comparametric equations. This allows computer-generated information to be registered both spatially and tonally with the real world. An extension of the concept of mediated reality is the replacement of unwanted information, such as advertising, with computer-generated information, giving rise to the notion of a deliberately diminished reality.

Acknowledgments

This work was funded in part by Xilinx and Altera.

References

Aczél, J. (1966). Lectures on functional equations and their applications (Vol. 19). New York and London: Academic Press.
Azuma, R. T. (2001). Augmented reality: Approaches and technical challenges. In W. Barfield & T. Caudell (Eds.), Fundamentals of wearable computers and augmented reality (pp. 27–63). New Jersey: Lawrence Erlbaum Press.
Behringer, R. (1998). Improving the precision of registration for augmented reality in an outdoor scenario by visual horizon silhouette matching. Proceedings of First IEEE Workshop on Augmented Reality (IWAR98), 225–230.
Caudell, T., & Mizell, D. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proc. Hawaii International Conf. on Systems Science, 2, 659–669.
Drascic, D., & Milgram, P. (1996). Perceptual issues in augmented reality. SPIE Volume 2653: Stereoscopic Displays and Virtual Reality Systems III, 123–134.
Earnshaw, R. A., Gigante, M. A., & Jones, H. (1993). Virtual reality systems. London: Academic Press.
Ellis, S. R., Bucher, U. J., & Menges, B. M. (1995). The relationship of binocular convergence and errors in judged distance to virtual objects. Proceedings of the International Federation of Automatic Control, 297–301.
Feiner, S., MacIntyre, B., & Seligmann, D. (1993a). Karma (knowledge-based augmented reality for maintenance assistance). Available online at: http://www.cs.columbia.edu/graphics/projects/karma/karma.html.
———. (1993b). Knowledge-based augmented reality. Communications of the ACM, 36(7), 52–62.
Fuchs, H., Bajura, M., & Ohbuchi, R. Teaming ultrasound data with virtual reality in obstetrics. Available online at: http://www.ncsa.uiuc.edu/Pubs/MetaCenter/SciHi93/1c.Highlights-BiologyC.html.
Horn, B., & Schunk, B. (1981). Determining optical flow. Artificial Intelligence, 17, 185–203.
Irani, M., & Peleg, S. (1991). Improving resolution by image registration. CVGIP, 53, 231–239.
Klinker, G., Stricker, D., & Reiners, D. (2001). Augmented reality for exterior construction applications. In W. Barfield & T. Caudell (Eds.), Fundamentals of wearable computers and augmented reality (pp. 397–427). New Jersey: Lawrence Erlbaum Press.
Mann, S. (1997a). Humanistic intelligence. Proceedings of Ars Electronica, 217–231. (Available online at: http://wearcam.org/ars/ and http://www.aec.at/fleshfactor.)
———. (1997b). Wearable computing: A first step toward personal imaging. IEEE Computer, 30(2), 25–32.
———. (1997c). An historical account of the 'WearComp' and 'WearCam' projects developed for 'personal imaging.' International Symposium on Wearable Computing, 66–73.
———. (1998). Humanistic intelligence/humanistic computing: 'Wearcomp' as a new framework for intelligent signal processing. Proceedings of the IEEE, 86(11), 2123–2151.
———. (2000). Comparametric equations with practical applications in quantigraphic image processing. IEEE Trans. Image Proc., 9(8), 1389–1406.
———. (2001a). Intelligent image processing. New York: John Wiley and Sons.
———. (2001b). Wearable computing: Toward humanistic intelligence. IEEE Intelligent Systems, 16(3), 10–15.
Mann, S., & Fung, J. (2001). VideoOrbits on EyeTap devices for deliberately diminished reality or altering the visual perception of rigid planar patches of a real world scene. International Symposium on Mixed Reality (ISMR2001), 48–55.
Mann, S., & Picard, R. W. (1995). Video orbits of the projective group; a simple approach to featureless estimation of parameters (Tech. Rep. No. 338). Cambridge, MA: Massachusetts Institute of Technology. (Also appears in IEEE Trans. Image Proc., (1997), 6(9), 1281–1295.)
Sutherland, I. (1968). A head-mounted three dimensional display. Proc. Fall Joint Computer Conference, 757–764.
Tekalp, A., Ozkan, M., & Sezan, M. (1992). High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration. Proc. of the Int. Conf. on Acoust., Speech and Sig. Proc., III-169.
Teodosio, L., & Bender, W. (1993). Salient video stills: Content and context preserved. Proc. ACM Multimedia Conf., 39–46.
Tsai, R. Y., & Huang, T. S. (1981). Estimating three-dimensional motion parameters of a rigid planar patch. IEEE Trans. Acoust., Speech, and Sig. Proc., ASSP(29), 1147–1152.
Wyckoff, C. W. (1961). An experimental extended response film (Tech. Rep. No. B-321). Boston, MA: Edgerton, Germeshausen & Grier, Inc.
Wyckoff, C. W. (1962, June–July). An experimental extended response film. S.P.I.E. Newsletter, 16–20.
You, S., Neumann, U., & Azuma, R. (1999). Hybrid inertial and vision tracking for augmented reality registration. Proceedings of IEEE VR, 260–267.
Zheng, Q., & Chellappa, R. (1993). A computational vision approach to image registration. IEEE Transactions Image Processing, 2(3), 311–325.