US007543758B2

(12) United States Patent (10) Patent N0.: US 7,543,758 B2 Dymetman et a1. (45) Date of Patent: Jun. 9, 2009

(54) DOCUMENT LOCALIZATION OF POINTING 6,963,845 B1 11/2005 Lapstun et al. ACTIONS USING DISAMBIGUATED VISUAL REGIONS

(75) Inventors: Marc Dymetman, Grenoble (FR); OTHER PUBLICATIONS Gabriela Csurka, Crolles (FR) Adnan M. Alattar, “Smart images using Digimarc’s Watermarking (73) Assignee: Xerox Corporation, NorWalk, CT (US) technology”, Proceedings of the IS&T/SPII International Sympo sium on Electronic Imaging, vol. 3971, No. 25, pp. 264-273, (2000). (*) Notice: Subject to any disclaimer, the term of this (Chen) Yu-Chee Tseng, Y.Y. Chen, H. K. Pan, “A secure Data Hiding patent is extended or adjusted under 35 Scheme for Binary Images”, IEEE Transactions on Communications, U.S.C. 154(b) by 619 days. vol. 50, No. 8, Aug. 2002, pp. 1227-1231. Appl. No.: 11/312,057 (Csurka) G. Csurka, F. Deguillaume, J. K. Ruanaidh and T. Pun, “A (21) Bayesian Approach to Af?ne Transformation Resistant Image and (22) Filed: Dec. 20, 2005 Video Watermarking”, Lecture Notes in COmputer Science, vol. 1768 (1999). (65) Prior Publication Data (Continued) US 2007/0138283 A1 Jun. 21, 2007 Primary ExamineriKarl D. Frech (51) Int. Cl. (74) Attorney, Agent, or FirmiFay Sharpe LLP G06K 19/06 (2006.01) (52) US. Cl...... 235/494; 235/487; 235/375 (57) ABSTRACT (58) Field of Classi?cation Search ...... 235/487, 235/494, 375 See application ?le for complete search history. A system for document internal localization includes an opti (56) References Cited cal input device for retrieving an observed region of a hard copy document and a processing component Which provides U.S. PATENT DOCUMENTS a location identi?er for actionable observed regions by matching the retrieved region With a region of an electroni 5,949,055 A 9/1999 Fleet et al. _ 6,330,976 B1 12/2001 Dymetman et a1‘ cally stored version of the hardcopy document. The hardcopy 6,585,163 B1 7/2003 Meunier et a1, document and electronically stored version of the document 6,594,406 B1 7/2003 Hecht include machine readable disambiguating information Which 6,604,875 B2 8/2003 Meunier et al. has been added to actionable regions of the document Which 6,647,130 B2 1 1/2003 Rhoads are indistinguishable from other actionable regions, based on 6,690,821 B2 2/2004 Goldberg et al' their original markings, Whereby a region is distinguishable 6,694,042 B2 2/2004 Seder et al' from similar regions of the document. The disambiguating 6,752,317 B2 6/ 2004 Dymetman et a1...... 9/2004 S1 b k t l mformatlon 1s local1Zed 1n reglons of the document Whose 6,788,293 B1 1 ver TOO e a ...... 6,820,815 B2 1 H2004 Meunier et a1‘ mirkmgf~ are not d1st1ngu1shable W1thout the drsamblguatmg 6,847,883 B1 1/2005 Walmsley et al. In Orma 1On' 6,946,672 B1 9/2005 Lapstun et al. 6,959,298 B1 10/2005 Silverbrook et al. 19 Claims, 4 Drawing Sheets ,1 50~ A WORK \~ 0%? om \ smnoucvu DlSPLAY \ 44 ‘54 \ 5B_\ DOCUMENT mm 40 )0 so \~~uEwGRAN LDC/111011 __ COMPONENT \ M2 22 W 102,105 COMM/110R / 54 0F- mino g 46- \\mauoav Q2 , COMPONENT L 42\‘_ 1s\ DAIFBASE ACTION BITMAP DEVICE CENTRAL \ \ PROCESSOR 56 6a 30 US 7,543,758 B2 2

OTHER PUBLICATIONS F. A. P. Petitcolas, R. J. Anderson and M. G. Kuhn, “Information Hiding iA Survey”, Proceedings of the IEEEE, special issue on M. Dymetman and M. Copperman, “Intelligent Paper”, Proceedings protection ofmultimedia content, 87(7): Jul. 1999, pp. 1062-1078. of the 7th International Conference on Electronic Published, Held S. Roy and E. Chang, “Watermarking With Retrieval Systems”, in jointly With the 4th international Conference on Raster Imaging and Multimedia Systems 9:433-440 (2004). Digital : Electronic Publishing, Artistic Imaging, and C. Schmid, R. Mohr and C. Bauckhage, “Evaluation of Interest Digital Typography, Mar. 30-Apr. 3, 1998R.D. Hersch, J. Andre and Detectors”, International Journal of Computer Vision, 37(2), 2000, H. Brown, Eds. Lecture Notes in Computer Science, vol. 1375. pp. 151-172. Springer-Verlag, London, pp. 392-406. D. Sellars, “An Introduction to Steganography”, http://WWWtotse. com/en/privacy/encryption/ 163947.html Nov. 14, 2005. A. Gupta and R. Jain, “Visual Information Retrieval”, Communica C. Shoemaker, “Hidden Bits: A Survey of Techniques for Digital tions ofthe ACM, May 1997, vol. 40, No. 5, pp. 71-79. Watermaking”, Independent Study, EER-290, Spring 2002. K. Mikolajczyk and C. Schmid, “A performance evaluation of local A. Natsev, R. Rastogi, K. Shim, “WALRUS: A Similarity Retrieval descriptors”, IEEE Transactions on Pattern Analysis and Machine Algorithm for Image Databases”, IEEE Transactions on Knowledge Intelligence, vol. 27, No. 10 (Oct. 2005). and Data Engineering, 16-3 (2004). P. Nillius and J. O. Eklundh, “Fast Block Matching With Normalized M. Wu and B. Liu, “Data Hiding in Binary Image for Authentication Cross-Correlation using Walsh Transforms”, Report, ISRN KTRH/ and Annotation”, IEEE Transactions on Multimedia, vol. 6, No. 4, NNP -02/11-SE (2002). Aug. 2004, pp. 528-538.

US. Patent Jun. 9, 2009 Sheet 2 of4 US 7,543,758 B2

FIG.3

72

SEEFIGURE3 FIG.2

32

US. Patent Jun. 9, 2009 Sheet 4 of4 US 7,543,758 B2

9 @{QWAQQQQE1' I '- n $6M“ Q *1 FIG. 5 US 7,543,758 B2 1 2 DOCUMENT LOCALIZATION OF POINTING as by printer driver softWare, by a Postscript engine in a ACTIONS USING DISAMBIGUATED VISUAL printer. The indicia can encode data about the document, or REGIONS can encode an identi?er that references a database record containing such data. By shoWing the printed document to a BACKGROUND computer device With a suitable optical input device (e.g., a Webcam), an electronic version of the document can be The exemplary embodiment relates to localization of a recalled for editing, or other responsive action can be taken. pointing action on a document for providing information US. Pat. No. 6,647,130 entitled “PRINTABLE INTER associated With a pointer location. It ?nds particular applica FACES AND DIGITAL LINKING WITH EMBEDDED tion in conjunction With images on paper Which are mini CODES,” by Rhoads discloses a physical medium Which is mally altered to enable unique localiZation by a camera encoded With machine readable information that provides a enabled pointer pen and Will be described With reference human interface to a computer system. The information thereto. encoded into the medium indicates a computer implemented Machine readable information in the form of Watermarks, process, and is encoded according to a spectral encoding barcodes, and the like has been embedded into images on scheme, such as encoding by modifying color values of a paper for a variety of applications, such as document identi graphic or other image printed on the medium. For example, ?cation and precise document-internal pointing localiZation. a digital Watermark or other stegano graphic data hidden in the The code is generally invisible or visually unobstructive and image indicates a Web page. In response to the user selecting may be decoded by a camera-enabled pointer-pen. For the encoded information area, the machine readable informa example, presenting the document to the camera permits the 20 tion is decoded, and used to invoke a computer implemented user to recover a document identi?er encoded in the Water process. mark Which can then be used to retrieve the document from a Sujoy Roy and Ee-Chien Chang, “Watermarking With database Which is accessed With the identi?er. The image Retrieval Systems,” in Multimedia Systems 9: 433-440 itself is not used for retrieval purposes, but only the Water (2004), relates to the association of messages With multime mark-embedded code. 25 dia content for the purpose of identifying them. The reference The encoded information is used to associate various notes that Watermarking systems Which embed associated actions With the pointing gesture, such as displaying touristic messages into the multimedia content tend to distort the con information about a monument pointed on a map, ordering a tent during embedding. Retrieval system are said to avoid mechanical part from a paper catalog, retrieving the stored undesirable distortion but searching in large databases is fun translation of a selected piece of text, playing a recorded 30 damentally dif?cult. interpretation from the position indicated by the pointer on a musical partition, or generally any action that can be triggered BRIEF DESCRIPTION by using a mouse With a computer screen. Early Work on What is often referred to as “intelligent Aspects of the exemplary embodiment relate to a system paper” focused on the use of invisible inks. HoWever, this 35 and a method for document internal localiZation, to a system requires special printing inks to be installed in the printer. for generating such a localiZation system and to an article, More recently, visually unobstructive marks, such as datag such as a hardcopy document. lyphs, have been printed With conventional inks in locations In one aspect, a method for document internal localiZation of the image associated With information retrieval. HoWever, includes identifying, for a multiplicity of regions of an elec these marks do modify the visual appearance of the docu 40 tronically stored document, if there are regions Which are ment, and moreover are generally limited to areas of the indistinguishable, by an automated processing component, document, such as text regions, Where there is su?icient sur from other regions in the multiplicity of regions. In the event rounding White space. that regions Which are indistinguishable are identi?ed, for a set of regions Which are indistinguishable from each other, INCORPORATION BY REFERENCE 45 machine readable disambiguating information is adding to at least one of the regions in the set Whereby the regions in the The folloWing references, the disclosures of Which are set of regions are distinguishable by the automated process expressly incorporated herein in their entireties by reference, ing component. The method further includes associating each are mentioned: of the regions in the multiplicity of regions With a correspond US. Pat. Nos. 6,752,317 and 6,330,976 entitled “MARK 50 ing location in the electronic document. A hardcopy docu ING MEDIUM AREA WITH ENCODED IDENTIFIER ment is generated from the electronic document. An observed FOR PRODUCING ACTION THROUGH NETWORK,” by region of the hardcopy document is compared With a ?rst Dymetman, et al. disclose obtaining automatic actions region in the multiplicity of regions With the automated pro through a netWork using an area of marking medium With cessing component and, Where the observed region of hard machine-readable markings that encode an action/medium 55 copy document is indistinguishable from the ?rst region, the identi?er. The action/medium identi?er identi?es an action location associated With the ?rst region is retrieved. that can be produced through the netWork, and also identi?es In another aspect, a system for generating a document With the area of marking medium. For example, it may include a internal localiZation capability includes a processing compo globally unique or netWork-Wide page identi?er as Well as an nent Which determines Whether actionable regions of an elec action identi?er that can be used to produce an action 60 tronically stored document associated With a pointing action described by data associated With a counterpart digital page. are distinguishable, the processing component adding su?i US. Pat. No. 6,694,042, entitled “METHODS FOR cient machine readable disambiguating information to the DETERMINING CONTENTS OF MEDIA,” by Seder, et al. electronically stored document to render the actionable discloses printing documents and other objects With machine regions associated With a pointing action Which are indistin readable indicia, such as steganographic digital Watermarks 65 guishable Without the disambiguating information distin or barcodes, for enabling document management functions. guishable. A memory stores the electronically stored docu The indicia can be added as part of the printing process, such ment With the machine readable disambiguating information. US 7,543,758 B2 3 4 In another aspect, a system for document internal localiZa FIG. 5 is a schematic vieW of a grid superimposed on a tion includes an optical input device for retrieving an digital document, illustrating an observation region of an observed region of a hardcopy document and a processing optical input device. component Which determines a location of the observed region by matching the retrieved region With a region of an DETAILED DESCRIPTION electronically stored version of the hardcopy document. The hardcopy document and electronically stored version of the The exemplary embodiment relates to a system and a document include machine readable disambiguating infor method for document-intemal localiZation of a pointing ges mation Whereby a region is distinguished from similar ture. Rather than relying on a uniform localiZation-code layer regions of the document. The disambiguating information is over the surface of the document, intrinsic visual character localiZed in regions of the document Which are not distin istics of the document is used as the primary localiZation key. guishable Without the disambiguating information. In addition, Whenever the intrinsic visual characteristic of tWo In another aspect, an article of manufacture for obtaining different locations are indistinguishable to an optical input automatic actions through a netWork using a processing com device, the document is modi?ed by adding disambiguating ponent for connecting to the netWork and an optical input information, for example, by adding a small amount of visual device for providing input signals to the processing compo “noise” or a Watermark, Which is su?icient for discriminating nent. The automatic actions are provided by an action device betWeen the tWo locations. In this Way, the original document connected to the network. The article of manufacture includes is modi?ed in an almost unnoticeable Way While acquiring print medium With visually recogniZable machine-readable localiZation capabilities. markings on the print medium. Visually unobtrusive machine 20 An image or electronic document generally may include readable markings are localiZed in regions of the print information in electronic form Which is to be rendered on medium Which are indistinguishable from other regions by print media and may include text, graphics, pictures, and the the visually recogniZable machine-readable markings alone. like. At least a portion of the information, When rendered, is The visually unobtrusive machine readable markings are visually recognizable, Which users of the document having absent from at least some regions Which are distinguishable 25 normal vision can identify Without resort to a magnifying by the visually recogniZable machine-readable markings glass. alone. The visually recogniZable markings combine With the A physical page or page is a print medium that includes visually unobtrusive markings to render regions that are asso only one sheet; a card, a poster, and an ordinary sheet are all ciated With an automatic action distinguishable from other pages Within this meaning. A hardcopy document, or simply regions that are associated With a respective automatic action. 30 “a document,” is a collection of one or more pages, With at The associated processing component associates a region of least one page including markings indicating information. the article observed by the optical input device With a location The physical document can be formed by marking a physi identi?er and the associated action device provides the action cal medium, such as a usually ?imsy physical sheet of paper, automatically in response to the location identi?er. plastic, or other suitable physical print media substrate for In another aspect, a system for document internal localiZa 35 images With a marking material, such as ink or toner, gener tion of a pointing gesture is provided. The system includes an ally referred to as printing. The document may be rendered on optical input device Which captures machine readable mark a single sheet or multiple sheets by a standard o?ice printer ings in a region of an associated hardcopy document. A pro (e.g., ink-jet, laser, etc.) or a large clustered on-demand docu cessing component associates the captured machine readable ment printer. markings With a location identi?er. The location identi?er 40 In various aspect of the exemplary embodiment, local visu automatically implements a computer implemented process. ally recogniZable regions of the document are used for deter The hardcopy document includes disambiguating machine mining a pointing location of a suitable optical input device readable markings Whereby otherWise indistinguishable such as a Webcam or pen-camera, relative to the document. regions are distinguishable by the processing component. The The document includes a plurality of discrete pointing loca processing component is adapted to utiliZe visually recogniZ 45 tions, each of Which is associated With accessible informa able machine readable markings for associating the region tion. The document includes a plurality of discrete pointing With the identi?er for those regions Which are distinguishable locations, each of Which may be associated With accessible Without disambiguating markings and to utiliZe a combina information Which is stored in a location Which is remote tion of machine readable visually recogniZable markings With from the document. Each of the discrete pointing locations is machine readable disambiguating markings for at least some 50 distinguishable from the other discrete pointing locations of the regions Which are indistinguishable Without the disam When the document or portion thereof is presented to the biguating information. optical input device. The document includes disambiguating information Which is added to the visually recogniZable por BRIEF DESCRIPTION OF THE DRAWINGS tions of the image for supporting unique location identi?ca 55 tion of a pointing location. The disambiguating visual infor mation comprises machine readable information, such as a FIG. 1 is a schematic vieW of a system for document marking or a collection of markings, Which is visually unob internal localiZation of a pointing gesture according to one trusive to a human reader, but Which is su?icient to alloW each aspect of the exemplary embodiment; discrete pointing location to be distinguished from all other FIG. 2 demonstrates the addition of visual noise to tWo 60 discrete pointing locations in the document by a machine. A occurrences of a character trigram (the Word “neW”) in a marking or collection of markings, even if visible, is consid textual portion of a document; ered to be visually unobtrusive if it does not obstruct a FIG. 3 is an enlarged vieW of one character (the letter “e”) human’s visual perception of other visible markings printed in the ?rst occurrence of the trigram of FIG. 2; in the same area of the print medium. The disambiguating FIG. 4 is a How diagram of a process for creating a hard 65 information can be in the form of random noise or a Water copy document With internal localiZation and for initiating a mark, and may include modi?cations to localiZed regions of computer implemented process With the document; and the image by changing a color or gray level of bit of the image US 7,543,758 B2 5 6 associated With the pointing locations. In one embodiment, The optical input device 10 defects markings, e. g., by this is achieved With a minimal amount of noise added, and taking a digital snapshot of the observed region 12, and pro only if disambiguation is necessary for the region under con vides signals that include information about the markings. sideration. The information is considered to be noise When it Suitable optical input devices 10 are described, for example, aids in localization but does not of itself provide localizing in US. Pat. No. 6,330,976, the disclosure of Which is incor information. A Watermark serves as disambiguating informa porated herein in its entirety, by reference. tion Which can be used in localiZation but also provides infor The document 14 may include one or more machine read mation Which can be independently used to identify a particu able identi?ers 30 Which alloW the computer device 18 to lar pointing region (“intelligent information”). The identify the document or a portion thereof, such as a page of disambiguating information includes machine-readable the document. Unique document identi?cation need not pro markings Which can be invisible or can be visually unobtru vided locally, such as at each pointer location, but can be sive markings. provided globally. For example, a global identi?er 30, such as The disambiguating information can be added to the hard a Dataglyph identi?er is provided on a cover page of the copy document as part of the printing process, such as by document. The global identi?er 30 can be read once and printer driver software, by a Postscript engine in a printer. thereafter provides the context for the subsequent pointing actions. An identi?er 30 is considered to be a global identi?er Markings are considered to be machine readable if tech if it maps to at most one digitally stored document. niques are available for automatically obtaining the informa As illustrated in FIGS. 2 and 3, the document 14 includes tion from signals that include information about the mark disambiguating information 32, Which is added to the original ings. Markings are considered to be human readable if 20 document from Which the hardcopy document 14 is derived, humans can perceive the markings and extract information generally at the time of printing the document. An election to therefrom. provide a document With localiZation capabilities in the By shoWing the printed document to a computer device printed output can be made by the user. This election can be With a suitable associated optical input device (e.g., a pen made once, e.g., at the time of printer driver installation, and camera or Webcam), the machine-readable information is 25 consistently applied thereafter. Or it can be made on a print decoded, and used to invoke a computer implemented pro job by print-job or page by page basis, e.g., through a print cess. Speci?cally, the machine-readable markings Within an dialog presented by the application program When the user observed region of the printed document encode a location requests printed output. Or the localiZation capability can be identi?er. The machine-readable markings are decodable to added all the time, Without any user election or involvement. obtain the location identi?er by a processing component in 30 A user may manipulate (e.g. position or move With a Wip the computer device. The optical input device may include ing action) the optical input device 10 to capture local images detection circuitry. Input signals from the detection circuitry, of regions 12 of the document 14 and provide input signals including information de?ning the machine-readable mark de?ning the captured images. Data derived from the captured ings are input to the processing component. The location images is used by the computer device to identify an associ identi?er may specify an action device using a globally 35 ated computer implemented process. unique identi?er that identi?es the document and an action As illustrated in FIG. 4, a method for storing a document using a location identi?er that identi?es a region of the hard and implementing a computer implemented process may copy document. The processing component may be adapted include the folloWing steps. It Will be appreciated that the to provide the location identi?er through a netWork to an steps may be performed in a different order from that illus action device to produce the action. The action may relate to 40 trated or that feWer or additional steps may be performed. At the region of the document. The action device may provide step S100, an original electronic document to be rendered on the action automatically in response to the location identi?er. the print-medium is provided. At step S102, su?icient disam With reference to FIG. 1, an exemplary system for invoking biguating information (e.g., noise) is added to the document a computer implemented process, such as the provision of to ensure that matching ambiguities do not occur, thereby information associated With pointer locations in a document, 45 generating an electronic localiZable document. At step S104, includes an optical input device 10 capable of capturing a a physical localiZable document is generated, e. g., a hardcopy localiZed region 12 of a document 14 embodied on a print of the disambiguated localiZed document is printed (it Will be medium such as a sheet 16 of paper or multiple sheets of appreciated that this step may be performed later). At step paper. In the illustrated embodiment, the localiZed region 12 S106, the disambiguated localiZed electronic document cre is circular although other shapes are contemplated. The opti 50 ated in step S102 is vieWed as one large sheet of paper (that is, cal input device 10 is associated With a computer device 18, Which displays all the pages next to each other). At step S108, Which is con?gured for invoking a computer implemented the image of this sheet is digitally stored. At step S110, a user process. The computer implemented process may include provided With the physical document, moves the pen camera displaying information associated With the pointing location to a location of interest on the document. The current region on a screen 20 or other suitable visual or audible information 55 seen by the camera (such as a centimeter-square area or a output device, such as a landline telephone, cell phone, PDA, circle Which encompasses such an area) is considered by the MP3 player, radio, or the like. Other computer implemented computer device 18. At step S112, this region is matched With processes are also contemplated, such as ordering a part from a corresponding region in the stored image. The amount of a catalog, doWnloading music, or providing other information visual noise that Was added to the original document is su?i associated With the pointer location to a user. The computer 60 cient to ensure that matching ambiguities do not occur. device 18 may be linked to the optical input device 10 by a The essence of the exemplary embodiment may be sum suitable Wired or Wireless link 22, Whereby information mariZed in tWo “equations”: extracted from the document by the optical input device 10 is localizable docuInenForiginal document+disambigu— communicated to the computer device. In an alternative ating visual noise embodiment, at least one of the computer device and the 65 output device 20 is incorporated into a body (not shoWn) of pointer IOcaIiZatiOnIcamera the optical input device 10. region<—match—>document region US 7,543,758 B2 7 8 The approach can be used for all areas of the document, comparator is capable of distinguishing relatively small dif Whether text or images. In an exemplary embodiment, a ferences betWeen tWo trigrams of the same type. Multiple method Which can be applied to text as Well as to images is occurrences of the data are distinguishable by the disambigu described; the speci?c techniques for both cases Will be ating information Which is captured along With the markings. described in turn. An embodiment suitable for textual regions The comparator 64 identi?es the closest match and retrieves is described ?rst. Then a speci?c embodiment for image the associated location identi?er. The location identi?er is regions is described. The latter embodiment is conceptually used to initiate a computer implemented process. For simple but tends to incur higher computational cost than the example, the processing component 52 communicates With textual regions. an action processor 68 Which provides an automatic action, In one embodiment, the method is applied to textual such as retrieving Web pages, ordering parts form a catalog, regions of a document. Textual regions include regions in providing geographical information, or the like. The interac Which characters from a ?nite set of recogniZable characters tions betWeen the components and alternative con?gurations are arranged, generally in horiZontal strings of characters. of the system Will be appreciated from a description of the The characters may comprise letters, numbers, mathematical method. symbols, marks, combinations thereof, and the The method includes tWo distinct stages: the ?rst step is like. The characters in the text are generally of the same siZe creation of a localiZable document and the second step or selected from a ?nite set of character siZes. In one embodi includes retrieval of a location identi?er using the localiZed ment, each character is uniquely recogniZable, for example, document. At document construction time, the folloWing using optical character recognition (OCR). However, as Will steps may be implemented, as illustrated in FIG. 4. At step be appreciated, each character may be repeated in the original 20 S100, a bitmap image of the document is constructed. Step document a plurality of times. An Ngram comprising a S102, Which is the step of adding disambiguating informa sequence of N characters may also be repeated in the original tion, may include the folloWing substeps. At substep S102A, document multiple times. Thus, unique localiZation relying a bitmap image of the document is created. At substep S102B, on identi?cation of individual characters or Ngrams is not collision lists are built. In the case of textual regions, this may generally possible for most documents. 25 comprise building an a priori table 58 (FIG. 1) of all character The object is to alloW a user to point the pen-camera to a trigrams 60 in the document Which are recogniZable by the certain textual region of the hardcopy document 14 and alloW OCR, With one entry per trigram type (e. g., the tri gram “new”, the position of the pen-camera to be recovered, relative to the comprising the three characters n, e, and W, illustrated in FIG. document. The pen-camera generally has a vieWing diameter 2), and With an associated list of occurrences of this trigram that alloWs it to take in at least N characters, Where N is an 30 type in the document. Trigrams may be scaled to a common integer. In the illustrated embodiment, N:3 characters (a siZe prior to forming the table. For each such occurrence of a trigram), With at most one of the characters being a space. It trigram type, the position 62 at Which it occurs in the bitmap Will be appreciated that N can be feWer or greater than 3, such image is recorded in the table. The position may include x, y as for example, from 1 to 10, depending on the siZe of the coordinates of a pixel in one of the characters, such as the ?rst camera snapshot and the character siZe. The electronic docu 35 character in the trigram 60. For each trigram type, the docu ment may be represented in PDF or other suitable format. ment generation softWare (not illustrated) checks that each As illustrated in FIG. 1, an exemplary embodiment of occurrence of the trigram type in the bitmap is visually dis computer device 18 is shoWn. The computer device 18 may be tinguishable by the camera (Substep S102C). The document entirely local to the pen camera 10 or may be distributed such generation softWare comprehends the capabilities of the com that parts of the computer device, such as some of the pro 40 parator 50, Which provides ?ner recognition processing of the cessing, memory, and retrieval components, are located trigram than the OCR. Initially, tWo or more occurrences of remote from the pen camera 10 and are accessed via a net the trigram “neW” Will generally be considered identical and Work connection, intemet connection or the like. For pur thus visually indistinguishable. Where the tWo or more tri poses of discussion of the exemplary embodiment, local pro grams are not visually distinguishable, a small increment of cessing is handled by a Workstation CPU 40 or other local 45 “random visual noise” 32 or other disambiguating informa device Which communicates With a central processor 42 via a tion is added to each occurrence of the trigram 60 in the Wired or Wireless link 44. The central processor 42 includes a bitmap (Substep 102D). The processor then returns to substep memory component 46 Which temporarily stores the S102C and checks if the trigrams 60 of the same type are observed region 12 of the document in electronic form 50 and visually distinguishable (by the camera/comparator). Sub a processing component 52, Which matches the observed 50 steps S102C and S102D may be repeated tWo or more times, region 12 With a corresponding region of the electronically adding incrementally more noise 32 until all the trigrams 60 stored document and generates a location identi?er corre of the same type are distinguishable. In this Way, a minimum sponding to the location of the observed region in the docu amount of noise 32 needed to distinguish them is added. Once ment. The illustrated processing component, Which is par all the trigrams 60 are distinguishable, the next trigram type is ticularly suited to text recognition, includes a raW processor 55 subjected to substeps S102C and S102D, until all trigram 54, such as an optical character recognition processor (OCR) types in the document have been processed (Substep S102E). Which is capable of identifying, raW, unre?ned markings, At localiZation time (Step S110) the bitmap region 12 seen such as characters, in the captured region 48. The processing by the camera is captured. Step S112 may include the folloW component 52 may be in communication With a database ing substeps: First, OCR the captured region to retrieve three memory 56 Which stores a bitmap of the document. The 60 consecutive characters, the observed trigram (Substep memory component 46 also stores a collision list, e. g., a table S112A). It Will be appreciated that the region seen by the 58 comprising trigrams 60 recogniZable by the OCR, and camera may encompass more than three characters, in Which associated localiZation information 62, such as x, y coordi case, a trigram is selected, such as the ?rst trigram capable of nates of the trigram location in the document. The processing OCR. At Substep S112B, the trigram table 58 may be con component 52 also includes a comparator 64, Which serves as 65 sulted With the observed trigram, and all occurrences of the a ?ne processor for comparing the observed region 12 With trigram type are identi?ed. The observed trigram may be each occurrence of the OCR’d trigram in the document. The scaled to the common siZe used in the table prior to consulting US 7,543,758 B2 10 the table. This OCR step may involve a relatively unre?ned Vol. 6, No. 4 (August 2004). These techniques embed infor character comparison, Where the disambiguating noise is mation in a redundant Way that alloWs a more robust detec ignored. A proximity match of the captured bitmap region tion. With the bitmap regions associated With the different occur Another alternative is to ‘Watermark’ each trigram e.g., rences of this trigram in the document table is then performed With the trigram’s position in the trigram table. For example, With the comparator 64 (Substep S112C). In this step, the the third occurrence of the trigram “neW” in the table may be visual noise added to repeat occurrences of the trigrams is accompanied by a watermarked “3” in the document, adja used to ?nd the trigram occurrence Which most closely cent the trigram, to accelerate the retrieval. The Watermark is matches the observed trigram. The document location of the suf?ciently small to be visually unobtrusive and alloW OCR of the characters. Microprinting techniques exist for adding closest occurrence is then output (Substep S112D). The out numbers or other characters as small print Which is almost put location identi?er can be used to determine the exact unnoticeable to the eye, for example, using high-end laser position of the camera-pen or to implement a computer printers. Such techniques may be employed to add the disam implemented process (Step S114). biguating information in appropriate locations. In one embodiment, if the OCR is unable to identify a It should be noted, hoWever, that While the purpose of single trigram type With certainty (at Substep S112A), the steganography is traditionally to hide information in the OCR may output a small number of candidates, such as tWo to document, unnoticed by the reader, for purposes such as ?ve candidates, and the union of the corresponding set of authentication, copyright control, or covert communication, occurrences can be used for evaluation at Substeps S112B the purpose here is not hiding information per se, but rather and S112C. 20 adding disambiguating information in a Way Which is visually It Will be appreciated that the table of trigrams 60 may be minimally disturbing, for localiZation purposes. stored locally in the pen body, or on some external device In this approach to pointer localiZation, it is possible to rely on a very convenient property of textual regions: the charac such as Workstation 40 or central processor 42, With Which the ter-name is a poWerful invariant feature of the character pen communicates. 25 bitmap, and such a feature can be used as an ef?cient search An exemplary process of adding visual noise to bitmap key for matching textual regions. For image regions, i.e., regions associated to the trigram occurrences Will noW be those portions of the document Which cannot readily be dis described in greater detail. As illustrated in FIG. 2, at each tinguished using optical character recognition techniques, iteration of Substep S102C, a feW, e. g., tWo or more contigu such as graphics and pictures, other approaches may be used, ous pixels 32 are randomly selected in a region 70 Which does 30 as described beloW. As With the textual recognition method not impair OCR of the characters 72, such as in the White described above, the image recognition method also adds regions surrounding the characters 72. The selected pixels 32 disambiguating information to regions of the image to alloW are given a certain value of grey (or, in a variant, just given a localiZation of the pen camera. black value). The iterations are stopped When the level of It Will be appreciated that localiZation of the pointer in noise (pixels 32) added in this Way is su?icient for the camera 35 image regions is more demanding in terms of resources (stor to distinguish all the occurrences of the given trigram. As can age space, processing time) than When Working With textual be seen for example in FIG. 2, each character 72 may have one regions. The folloWing approach is conceptually simple, but or more associated visual noise regions 32 proximate thereto. of relatively high computational cost. As With the textual The visual noise associated With one character of the trigram method, the steps include constructing a bitmap of the appears in a different location in the next occurrence of the 40 document (Step S102A), adding su?icient disambiguating trigram, and so forth. For example, looking at the letter “e”, information to the document to ensure that matching ambi visual noise 32 is added in the ?rst trigram occurrence proxi guities do not occur (step S102), creation of a physical local mate the loWer right of the character, While in the second iZable document (Step S104), vieWing disambiguated local occurrence, visual noise is added in the enclosed White space iZed electronic document as one large sheet of paper (Step near the top of the character. It Will be appreciated that 45 S106) and digitally storing the image of this sheet (step although every character 72 may be associated With one or S108). At localiZation time, the process includes movement more visual noise regions 32, one or tWo of the characters 72 of a pen camera over the physical document (Step S110), in the trigram may have no associated visual noise. matching the observed region With a corresponding region in Many other techniques for adding disambiguating infor the document (Step S112). mation are possible, some inspired by “steganography” as 50 At substep S102B, collision lists 60 are built. With refer disclosed, for example, in D. Sellars, “An Introduction to ence to FIG. 5, an exemplary image portion 76, such as a page Steganography,” at of a document in the form of a map, is shoWn. First, the http://WWW.totse.com/en/privacy/encryption/ collection of images 76 in the document are considered, l63947.html. Which are assumed for purposes of discussion to be rectan 55 gular. On each image, a square grid comprising 78 cells c of For example, one method of adding disambiguating informa equal Width and height a is de?ned. The cell Width a is taken tion is to randomly shift the characters 72 in the trigram at a suf?ciently small value that the folloWing property holds: relative to each other (“Word-shift coding”), or to alter certain for any snapshot taken by the camera 10, the snapshot con features of the characters (“feature coding”, e.g. altering the tains at least a cell of the grid. For example, if snapshots 12 are length of vertical endlines in characters such as b, d and h). 60 assumed to be circular of radius r, then Another method of adding disambiguating information is to perform binary image Watermarking, as described, for air/J2. example, in Y.-Y. Chen, H.-K. Pan, and Y.-C. Tseng, “A In addition to the grid cells shoWn in FIG. 5, additional Secure Data Hiding Scheme for Binary Images,” IEEE Trans. cells (square regions of Width a) are de?ned by incrementally on Communications, Vol. 50, No. 8, (August 2002) and Min 65 shifting the grid cells in x and y directions by ?xed amounts, Wu and Bede Liu, “Data Hiding in Binary Image Authenti such as in increments of from 1 to 10 pixels. For each grid cell cation and Annotation,” IEEE Transactions on Multimedia, c in the ?nite collection of grid cells cc for all images of the US 7,543,758 B2 11 12 document, a “collision list” 58 is determined. The collision 87(7) (July 1999). Unlike many applications in these ?elds, list 58 is a set of “collision candidates,” that is square regions the present application does not require that the disambigu of Width a (not necessarily aligned With grid cells shoWn in ating information be recovered and exploited for its oWn sake, FIG. 5, but of the same siZe), Which occur in one image of the rather, that it is only necessary to add random information for document, and for Which the visual difference With the cell c purpose of image disambiguation. is smaller than a certain threshold. Some of the techniques for adding disambiguating noise For example, in order to determine the collision list, the Which may be used in the present application include altering grid cell c is moved by increments of one pixel at a time over the values of loW-signi?cance bits (LSB) in the RGB repre the Whole surface of the different images, registering the sentation of pixels (e.g., in the White background areas of the difference at each move, and registering a hit each time this map illustrated in FIG. 5); correlation-based Watermarking difference is smaller than the threshold. Only candidates (adding to images pseudo-random noise generated by a key Which are at a certain minimum offset from each other are and retrieving this key by auto-correlation techniques); fre maintained in the collision list, so as not to list tWice some quency-domain transform (transformation on a discrete Fou region Which corresponds to undistinguishable pointing loca rier transformation of the image Which When inverted modi tions. This avoids spurious effects such as listing tWice tWo ?es the image globally but in a non-noticeable Way); and regions Which are offset by one pixel and are visually similar, Wavelet transform (a similar idea, but using Wavelets). The but actually correspond to the same pointing action. For advantage of many of these techniques (With the exception of example, the minimum offset may beia. the LSB approach) is that they are designed to be resistant to The time taken to construct the collision list is obviously some transformations such as scaling, rotation, and lighting quite large: for each cell c, on the order of N matches are 20 variations. Such transformations may arise When an image performed, Where N is the total number of pixels in the portion is captured With a camera. As With the textual method, document images. the embedded information can correspond to the cell’s posi Improvements of this simplistic matching procedure are tion in the collision list. possible by using more sophisticated image-correlation tech Retrieving the location may proceed as folloWs: Once a niques. For example, matching techniques described by Nil 25 snapshot 12 is taken by the camera, one Way to retrieve its lius and J. O. Eklundh, “Fast Block Matching With Normal location is through classical image correlation techniques. iZed Cross-Correlation using Walsh Transforms,” Report, For example, the best match of the snapshot in the reference ISRN KTH/NA/P-02/11-SE (2002); Natsev, R Rastogi and document image (or list of images) is found. The match K. Shim, “WALRUS: A Similarity Retrieval Algorithm for contains at least one grid-cell region c. Because of the pre Image Databases”, IEEE Transactions on Knowledge and 30 disambiguation that Was performed on the grid-cell regions Data Engineering, 16-3 (2004); and Gupta and R. Jain, through addition of visual noise, this region corresponds to a “Visual Information Retrieval,” CACM (1997). For example, unique grid cell location, Which alloWs the grid to be super given a certain cell c, it is possible to use these techniques to imposed over the snapshot and therefore permits a precise ?nd regions of the document images Which are highly corre determination of the pointer’s location (e.g., at the center of lated With a given cell c more ef?ciently than through one 35 the snapshot 12) relative to the document, as shoWn in FIG. 5. pixel-at-a-time iterations. These high-correlation regions can The snapshot can only match at one location on the grid, then be used as the elements of the collision list. otherWise some grid cell region Would appear tWice in the For each grid cell c, a collision list [srl, . . . , srk] 58 is built document. The center of the pointing action (center of circle) Which includes all square regions (cells) srl, . . . , srk that are can then be precisely located relative to the document. indistinguishable from cell c. As With the textual method, the 40 An alternative to the above mentioned grid cell method is a next step is to add disambiguating information to distinguish descriptor-based approach. At decoding time, instead of betWeen confusable regions. For example, one or more of the attempting to match a camera snapshot over the Whole docu regions in the set S:{c, srl, . . . , srk} is modi?ed slightly in ment, only the snapshot 12 need be matched With document such a Way that they become pairWise distinguishable. regions Which share certain visual descriptors With the snap In one method, an operation analogous to that used in the 45 shot. At location time, the same technique can be used to case of textual regions is repeated a number of times: Check locate quickly regions that may collide With a given grid cell. that any tWo elements of S are visually distinguishable by the As Will be appreciated, the OCR method described above for camera (Substep S102C). If so, go to next pair. If not, add a text documents and text regions of documents is a specialiZed small increment of “random visual noise” to each element of application of a descriptor-based approach. S, and go back to Substep S102C. 50 The computational cost can be signi?cantly reduced by There are many Ways to add incremental random noise to replacing a simple matching approach With the use of some the regions of S. A simple option is take a feW random pixels speci?c location descriptors. For example, if the relevant in each region, and alter their RGB values in some Way (for actionable regions are rich in texture information, a comer instance replacing a light pixel by a dark one or vice-versa). based detector may be used. Several exemplary detectors of The noise added is suf?cient for the pen camera snapshot to be 55 this type are described, for example, in C. Schmid, R. Mohr distinguished from the other similar occurrences. For more and C. Bauckhage, “Evaluation of Interest Point Detectors,” sophisticated approaches, techniques developed in the ?elds in International Journal of Computer Vision, 37(2), 151-172 of steganography, Watermarking, and more generally “infor (2000). The comer based detector can extract all the corners mation hiding” may be employed, as described for example, in the image and consider only cells centered on those points. in Shoemaker, “Hidden Bits: A Survey of Techniques for 60 The collision list 58 is then built only for those cells as Digital Watermarking,” Independent Study, EER-290 (2002); described above. The advantage of tho se points is that they are G. Csurka, F. Deguillaume, J. Ruanaidh and T. Pun, “A Baye resistant to scale, rotation and illumination changes so that sian Approach to A?ine Transformation Resistant Image and during the location step, a determination the center of the cell Video. Watermarking,” in Lecture Notes In Computer Sci can be ascertained. ence, Vol. 1768 (1999); and F. Peticolas, R. Anderson, and M. 65 Another optimiZation is possible if speci?c region descrip Kuhn, “Information HidingiA Survey,” Proceedings of the tors are use to describe the cells, as described, for example, in IEEE, Special Issue on Protection of Multimedia Content, K. MikolajcZyk and C. Schmid, “A Performance Evaluation US 7,543,758 B2 13 14 Of Local Descriptors,” in IEEE Transactions on Pattern automated processing component and, Where the observed Analysis and Machine Intelligence, Vol. 27, No. 10 (October region of hardcopy document is indistinguishable from the 2005). ?rst region, retrieving the location associated With the ?rst The advantages of using local descriptors are not only in region. loWer cost but also in resistance to scale, rotation, and illumi 2. The method of claim 1, further comprising: nation changes. To disambiguate cells in the collision list automatically initiating a computer implemented process noise, e.g., random noise or Watermarks can be added. In this associated With the retrieved location. case, the correlation is done only betWeen the query cell and 3. The method of claim 1, further comprising: the list of cells having the same descriptor. These descriptors identifying, from among the multiplicity of regions of the can be used to build cell tables (collision lists) in a Way similar electronically stored document, regions Which are dis to the trigrams recogniZed by OCR in text. tinguishable from all the other regions in the multiplicity In some applications, only certain regions in the image are of regions, by the automated processing component, actionable (i.e., are associated With a location identi?er Which Without addition of disambiguating information. initiates a computer implemented process). As a result, the 4. The method of claim 3, further comprising: regions that need to be disambiguated are further limited. adding no disambiguating information to the regions Only su?icient disambiguating information need be added to Which are distinguishable from all the other regions of the document such that every active region is distinguishable. the electronically stored document. Once the localiZation of the pointing action is determined, 5. The method of claim 1, Wherein electronically stored a computer implemented process may be invoked by the document includes text characters and Wherein the identify processor. Exemplary actions Which may be associated With 20 ing of the set of regions includes comparing a group of N text the pointing gesture include the display of touristic informa characters in a ?rst of the regions With a group of text char tion about a monument pointed to on a map, ordering a acters in the second region, Wherein N is at least tWo. mechanical part from a paper catalog, retrieving the stored 6. The method of claim 5, Wherein the comparing of the translation of a selected piece of text, playing a recorded observed region of the hardcopy document includes: interpretation from the position indicated by the pointer on a 25 identifying a group of N text characters in the region With musical partition, or generally any action that can be triggered optical character recognition; by using a mouse With a computer screen. Other examples of retrieving regions of the electronically stored document uses for intelligent paper Which could be implemented With Which include the same group of N text characters; and the exemplary embodiment are described, for example, in M. comparing the retrieved regions With the observed regions Dymetman and M. Copperman, “Intelligent Paper,” in Elec 30 to identify a retrieved region having disambiguating tronic Publishing, Artistic Imaging, and Digital Typography, information Which matches the disambiguating infor pp. 392-406 (R. Hersch, J. Andr, H. Brown, Eds, Springer mation of the observed region. Verlag, 1998) and in Us. Pat. No. 6,752,317 to Dymetman, et 7. The method of claim 1, Wherein the adding of disam al.; U.S. Pat. No. 6,959,298 to Silverbrook, et al. (effecting biguating information includes at least one of adding random travel services); U.S. Pat. No. 6,963,845 to Lapstun, et al. 35 noise and Watermarking. (enabling authority to transmit electronic mail using a busi 8. The method of claim 1, Wherein the identifying of the set ness card); U.S. Pat. No. 6,847,883 to Walmsley, et al. (iden of indistinguishable regions and the adding of disambiguat tifying a geographic location on a map, the disclosures of ing information are performed by an automated processing Which are incorporated herein by reference in their entireties. component. It Will be appreciated that various of the above-disclosed 40 9. The method of claim 1, Wherein each of the multiplicity and other features and functions, or alternatives thereof, may of regions is associated With a corresponding location in the be desirably combined into many other different systems or electronic document. applications. Also that various presently unforeseen or unan 10. The method of claim 1, Wherein the generation of the ticipated alternatives, modi?cations, variations or improve hardcopy document from the electronic document comprises ments therein may be subsequently made by those skilled in 45 printing the disambiguating information on print media. the art Which are also intended to be encompassed by the 11. The method of claim 1, Wherein the identifying of the folloWing claims. set of the regions Which are indistinguishable includes gen The invention claimed is: erating a collision list Which lists the regions in the set 1. A method for document internal localiZation compris together With their locations in the electronically stored docu ing: 50 ment. identifying, for a multiplicity of regions of an electroni 12. The method of claim 11, Wherein the generating of the cally stored document, if there are regions Which are collision list includes generating a plurality of collision lists, indistinguishable, by an automated processing compo each list including locations of a plurality of regions Which nent, from other regions in the multiplicity of regions; are indistinguishable from other regions in the set but Which in the event that regions Which are indistinguishable are 55 are distinguishable from regions in other sets. identi?ed, for a set of regions Which are indistinguish 13 . A system for generating a document With internal local able from each other, adding machine readable disam iZation capability comprising: biguating information to at least one of the regions in the a processing component Which determines Whether action set Whereby the regions in the set of regions are distin able regions of an electronically stored document asso guishable by the automated processing component; 60 ciated With a pointing action are distinguishable, associating each of the regions in the multiplicity of the processing component adding su?icient machine read regions With a corresponding location in the electronic able disambiguating information to the electronically document; stored document to render the actionable regions asso generating a hardcopy document from the electronic docu ciated With a pointing action Which are indistinguishable ment; 65 Without the disambiguating information distinguish comparing an observed region of the hardcopy document able, the processing component being con?gured for With a ?rst region in the multiplicity of regions With the generating a collision list Which lists regions in a set of US 7,543,758 B2 15 16 regions Which are indistinguishable Without disambigu able from other regions that are associated With a respec ating information together With their locations in the tive automatic action, Whereby the processing compo electronically stored document; and nent associates a region of the article observed by the a memory Which stores the electronically stored document optical input device With a location identi?er, the action With the machine readable disambiguating information. device providing the action automatically in response to 14. The system of claim 13, Wherein the memory stores a the location identi?er. location identi?er for each actionable region of the document. 17. The combination of claim 16, Wherein the article com 15. A system for document internal localiZation compris prises a hardcopy document. ing: 18. The combination of claim 16, further comprising a a user-positionable optical input device for capturing a machine readable global identi?er Which identi?es the local image of a user-observed region of a hardcopy article. document; 19. A system for document internal localiZation of a point a processing component Which determines a location of the ing gesture comprising: observed region Within the document by matching the an optical input device Which captures machine readable captured image of the region With a corresponding 15 markings in a region of an associated hardcopy docu region of an electronically stored version of the hard ment; copy document, the hardcopy document and electroni memory Which stores an electronic copy of the document; cally stored version of the document including machine a processing component Which associates the captured readable disambiguating information Whereby a region machine readable markings With a location identi?er is distinguished from similar regions Within the docu 20 corresponding to the location of the observed region in ment, the disambiguating information being localiZed in the hardcopy document and the corresponding elec regions of the document Which are not distinguishable tronic document, the location identi?er automatically Without the disambiguating information. implementing a computer implemented process, the 16. In combination, An an article of manufacture for hardcopy document including disambiguating machine obtaining automatic actions through a netWork, and the sys 25 readable markings Whereby otherWise indistinguishable tem of claim 15, the article of manufacture comprising: regions of the hardcopy document are distinguishable by print medium; the processing component, the processing component visually recogniZable machine-readable markings on the being adapted to utiliZe visually recogniZable machine print medium; and readable markings for associating the region With the visually unobtrusive machine readable markings localiZed 30 respective location identi?er for those regions Which are in regions of the print medium Which are indistinguish distinguishable Without disambiguating markings and to able from other regions by the visually recognizable utilize a combination of machine readable visually rec machine-readable markings alone, the visually unobtru ogniZable markings With machine readable disambigu sive machine readable markings being absent from at ating markings for associating the region With the least some regions Which are distinguishable by the 35 respective location identi?er for at least some of the visually recogniZable machine-readable markings regions Which are indistinguishable Without the disam alone, the visually recogniZable markings combining biguating information. With the visually unobtrusive markings to render regions that are associated With an automatic action distinguish * * * * *