US 20130273968A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2013/0273968 A1 Rhoads et al. (43) Pub. Date: Oct. 17, 2013

(54) METHODS AND SYSTEMS FOR CONTENT Publication Classification PROCESSING (51) Int. Cl. (71) Applicant: Digimarc Corporation, (US) H04M I/02 (2006.01) (52) U.S. Cl. (72) Inventors: Geoffrey B. Rhoads, West Linn, OR CPC ...... H04M I/0264 (2013.01) (US); Tony F. Rodriguez, Portland, OR USPC ...... 455/556.1; 382/305; 382/103:455/566 (US) (57) ABSTRACT Cell phones and other portable devices are equipped with a (21) Appl. No.: 13/774,056 variety oftechnologies by which existing functionality can be improved, and new functionality can be provided. Some (22) Filed: Feb. 22, 2013 aspects relate to data driven imaging architectures, in which a 9 cell phone's image sensor is one in a chain of stages that Successively act on packetized instructions/data, to capture Related U.S. Application Data and later process imagery. Other aspects relate to distribution (63) Continuation of application No. 12/484.115, filed on of processing tasks between the device and remote resources Jun. 12, 2009, now Pat. No. s 385 97 which is a (“the cloud'). Elemental Image processing, Such as filtering COtinuation-in-p art of application No. 12A271,692 and edge detection and even some simpler template match filed on Nov. 14, 2008, now Pat. No. 8,520,979 sw- as ing operations—may be performed on the cell phone. Other • u is s • L vs. 8 Y-sa-1 sys- - • operations are referred out to remote service providers. The (60) Provisional application No. 61/090,083, filed on Aug. remote service providers can be identified using techniques 19, 2008, provisional application No. 61/096,703, Such as a reverse auction, though which they compete for filed on Sep. 12, 2008, provisional application No. processing tasks. Other aspects of the disclosed technologies 61/100,643, filed on Sep. 26, 2008, provisional appli relate to visual search capabilities, and determining appropri cation No. 61/103,907, filed on Oct. 8, 2008, provi ate actions responsive to different image inputs. Still others sional application No. 61/110,490, filed on Oct. 31, concern metadata generation, processing, and representation. 2008, provisional application No. 61/176.739, filed on Yet others relate to coping with fixed focus limitations of cell May 8, 2009, provisional application No. 61/174,822, phone cameras, e.g., in reading digital watermark data. Still filed on May 1, 2009, provisional application No. others concern user interface improvements. A great number 61/169.266, filed on Apr. 14, 2009. of other features and arrangements are also detailed.

CELL NETWORK TO COMM SERVICES CHANNEL INTERFACE WIFI TO INTERNET

TRAFFIC MONITORING 8. BILLING SYSTEM Patent Application Publication Oct. 17, 2013 Sheet 1 of 52 US 2013/0273968A1

O|X|\,|ONALENTITEO

WELSÅSSONITTIE

Patent Application Publication Oct. 17, 2013 Sheet 2 of 52 US 2013/0273968A1

EENAN\/WITH HOSEEHEDEC]) (NOILVHV?ES

LOETEO

ERHT,LSE|5) Patent Application Publication Oct. 17, 2013 Sheet 3 of 52 US 2013/0273968A1

LINEW/\\/c) SY-HOSSEOORHCH

SSEITERHIWA ETSOOOSO NESTEIN Patent Application Publication Oct. 17, 2013 Sheet 4 of 52 US 2013/0273968A1

KEYVECTOR DATA (PIXELS & DERIVATIVES) FIG. 4 / N

RESULTA RESULT B RESULT C

KEYVECTOR FIG. 4A

ADDRESSING, ETC / PXEL PACKET Patent Application Publication Oct. 17, 2013 Sheet 5 of 52 US 2013/0273968A1

FIG. 4B

HEADER DATA KEYVECTOR DATA

TASKA TASK B TASK C TASKD

VISUALTASK COMMONALITY PE CHARTS H FFT % RE-SAMPLING 1

PDF417 BARCODE READING FIG. 5 Patent Application Publication Oct. 17, 2013 Sheet 6 of 52 US 2013/0273968A1

RESIDENT CALL-UP VISUAL PROCESSING SERVICES

ON ON OFF OFF ON OFF

COMMON SERVICES SORTER

LEVEL PXEL PRO CESSING LIBRARY

FLOW GATE CONFIGURATION, SOFTWARE PROGRAMMING EXTERNAL

PXEL KEYVECTOR SENSOR Msier HARDWARE processess-ROUTINGDATA OTHER)

FIG. 6 INTERNAL Patent Application Publication Oct. 17, 2013 Sheet 7 of 52 US 2013/0273968A1

LNALN OTENNWHOWWOO

STWOO Patent Application Publication Oct. 17, 2013 Sheet 8 of 52 US 2013/0273968A1

BNAS LN

Sng WoO Patent Application Publication Oct. 17, 2013 Sheet 9 of 52 US 2013/0273968A1

SOAO Patent Application Publication Oct. 17, 2013 Sheet 10 of 52 US 2013/0273968A1

\/LVCI·HEYA‘HOSSEOOHd

ENOHc]TTEO Patent Application Publication Oct. 17, 2013 Sheet 11 of 52 US 2013/0273968A1

EOLAECI/TV/OOT

Patent Application Publication Oct. 17, 2013 Sheet 12 of 52 US 2013/0273968A1

MY PROFILE, INCLUDING WHO IAM, AND WHAT DO GENERALLY LIKEIWANT

MY CONTEXT, INCLUDING WHERE IAM, AND WHAT ARE MY CURRENT DESRES

MY VISUAL OUERY PIXELS THEMSELVES

PACKETS RESULTS

VISUAL CRUERY PACKET

PACKETS RESULTS PACKETS RESULTS

CONFIGURABLE ARRAY OF OFFEREDATAUCTION TO AN SERVICE PROVIDERS ARRAY OF SERVICE PROVIDERS

FIG. 11 Patent Application Publication Oct. 17, 2013 Sheet 13 of 52 US 2013/0273968A1

HØIH‘XOITTO ESNOCHSENHETTV/A

S_LITTISENH EOLAECIRJEST

Patent Application Publication Oct. 17, 2013 Sheet 14 of 52 US 2013/0273968A1

12 SENSOR

PROCESSING FOR HUMAN VISUAL | SYSTEM (e.g., JPEG compression)

FIG. 13

16

56

CAMERA STAGE 1 STAGE 2 STAGE Y INSTRUCTIONS INSTRUCTIONS INSTRUCTIONS ' ' ' INSTRUCTIONS 55 58a 58b.

STAGEZ INSTRUCTIONS DATA

56 59

FIG. 17 Patent Application Publication Oct. 17, 2013 Sheet 15 of 52 US 2013/0273968A1

CELL PHONE

FIG. 14 ADDRESSING

15

APPLICATION IN CELL PROCESSING PHONE EXTERNAL FROM CELL PHONE APPLICATION IN CELL l PHONE APPLICATION ------EXTERNAL FROM 1O CELL PHONE

APPLICATION EXTERNAL FROM CELL PHONE Patent Application Publication Oct. 17, 2013 Sheet 16 of 52 US 2013/0273968A1

S. Patent Application Publication Oct. 17, 2013 Sheet 17 of 52 US 2013/0273968A1

0 9 .|y=evis(??|c??vis

999

|9

Patent Application Publication Oct. 17, 2013 Sheet 18 of 52 US 2013/0273968A1

ERHVWAL-IOS 8),"SOIH

(ZZ

ERHV/AACIN-IV/H

Patent Application Publication Oct. 17, 2013 Sheet 20 of 52 US 2013/0273968A1

MICRO MIRROR PROJECTOR Patent Application Publication Oct. 17, 2013 Sheet 21 of 52 US 2013/0273968A1

Patent Application Publication Oct. 17, 2013 Sheet 22 of 52 US 2013/0273968A1

CAPTURE OR RECEIVE IMAGE IMAGE AND/OR METADATA

PRE-PROCESS IMAGE, SUBMIT TO GOOGLE, RECEIVE e.g., IMAGE ENHANCEMENT, AUXLIARY INFORMATION OBJECT SEGMENTATION/ L- EXTRACTION, etc.

SUBMIT TO SERVICE

FIG. 24

CAPTURE OR RECEIVE IMAGE

FEATURE EXTRACTION

SEARCH FOR IMAGES WITH AUGMENTENHANCE | SIMILAR METRICS METADATA

USE SIMLARMAGE DATAN SUBMIT TO SERVICE LIEU OF USER IMAGE DATA FIG. 25 FIG. 23 Patent Application Publication Oct. 17, 2013 Sheet 23 of 52 US 2013/0273968A1

CAPTURE OR RECEIVE IMAGE CAPTURE OR RECEIVE IMAGE

FEATURE EXTRACTION FEATURE EXTRACTION

SEARCH FOR IMAGES WITH SEARCH FOR IMAGES WITH SIMILAR METRICS SIMILAR METRICS

SUBMIT PLURAL SETS OF COMPOSITE OR MERGE IMAGE DATA TO SERVICE SEVERAL SETS OF IMAGE DATA PROVIDER

SUBMIT TO SERVICE PROVIDER SERVICE PROVIDER, OR USER DEVICE, DETERMINES ENHANCED RESPONSE BASED ON PLURAL SETS OF DATA FIG. 27

FIG. 26 CAPTURE OR RECEIVE USER IMAGE

FEATURE EXTRACTION

SEARCH FOR IMAGES WITH SIMILAR METRICS

USE INFO FROM SIMILAR IMAGE(S) TO ENHANCE USER IMAGE

FIG. 28 Patent Application Publication Oct. 17, 2013 Sheet 24 of 52 US 2013/0273968A1

CAPTURE OR RECEIVE IMAGE CAPTURE OR RECEIVE IMAGE OF SUBJECT WITH GEOLOCATION INFO FEATURE EXTRACTION

INPUT GEOLOCATION INFO TO SEARCH FOR IMAGES WITH DB; GET TEXTUAL SIMILAR METRICS DESCRIPTORS

HARVEST METADATA SEARCH IMAGE DATABASE USING TEXTUAL DESCRIPTORS, IDENTIFY OTHER IMAGE(S) OF SUBJECT SEARCH FOR IMAGES WITH SIMILARMETADATA

SUBMIT TO SERVICE

SUBMIT TO SERVICE FIG. 30 FIG. 28A

CAPTURE OR RECEIVE IMAGE OF SUBJECT WITH GEOLOCATION INFO

SEARCH IMAGE DATABASE USING GEOLOCATION DATA IDENTIFY OTHER IMAGE(S) OF SUBJECT

SUBMIT OTHER IMAGE(S) TO SERVICE

FIG. 31 Patent Application Publication Oct. 17, 2013 Sheet 25 of 52 US 2013/0273968A1

FIG. 36 FIG. 37

FIG. 38 Patent Application Publication Oct. 17, 2013 Sheet 26 of 52 US 2013/0273968A1

CAPTURE OR RECEIVE IMAGE CAPTURE OR RECEIVE IMAGE OF SUBJECT WITH

OF SUBJECT WITH GEOLOCATION INFO GEOLOCATION INFO

SEARCH FOR IMAGES WITH SEARCH FOR IMAGES WITH SIMILAR IMAGE METRICS SIMILAR IMAGE METRICS

ADD GEOLOCATION INFO TO HARVEST METADATA MATCHING IMAGES

FIG. 32 SEARCH FOR IMAGES WITH SIMILAR METADATA

ADD GEOLOCATION INFO TO CAPTURE OR RECEIVE IMAGE MATCHING IMAGES OF SUBJECT WITH GEOLOCATION INFO FIG. 33 SEARCH FOR IMAGES WITH MATCHING GEOLOCATION

HARVEST METADATA

SEARCH FOR IMAGES WITH SIMILAR METADATA

ADD GEOLOCATION INFO TO MATCHING IMAGES

FIG. 34 Patent Application Publication Oct. 17, 2013 Sheet 27 of 52 US 2013/0273968A1

USER SNAPS IMAGE

URBANIRURAL FAMILYIFRIEND/ PRSSSFFER ANALYSIS STRANGER ANALYSIS ANALYSIS

RESPONSE RESPONSE RESPONSE ENGINEE ENGINE #2 ENGINE #3

FIG. 49 CONSOLIDATOR

CELL PHONE USER INTERFACE

ROUTER

RESPONSE RESPONSE ENGINEA ENGINEE

FIG. 37A FIG. 37B Patent Application Publication Oct. 17, 2013 Sheet 28 of 52 US 2013/0273968A1

USER-IMAGE -o "IMAGE JUICER" (algorithmic, Crowd-sourced, etc.)

USER IMAGE-RELATED FACTORS (First Tier) e.g., data about one or more of: Color histogram, Pattern, Shape, Texture, Facial recognition, Eigenvalue,

Object recognition, Orientation, Text, Numbers, OCR, Barcode, Watermark, Emotion, Location, Transform data, etc

DATABASE OF PUBLIC IMAGES, SEARCHENGINE --> IMAGE-RELATED FACTORS, AND OTHER METADATA

OTHER IMAGE-RELATED OTHER FACTORS AND/OR IMAGESICONTENT OTHER METADATA (Second Tier) (Second Tier)

------DATABASE OF PUBLIC IMAGES, SEARCHENGINE ---> IMAGE-RELATED FACTORS, AND - L- OTHER METADATA

FINAL FINAL OTHER IMAGESICONTENT INFORMATION (Nth Tier) (Nth Tier)

USER FIG. 39 Patent Application Publication Oct. 17, 2013 Sheet 29 of 52 US 2013/0273968A1

USER-IMAGE -o "IMAGE JUICER" (algorithmic, Crowd-sourced, etc.)

USER IMAGE-RELATED FACTORS (First Tier)

DATABASE OF PUBLIC IMAGES, SEARCHENGINE - e IMAGE-RELATED FACTORS, AND OTHER METADATA

OTHER IMAGE-RELATED OTHER FACTORS AND/OR IMAGESICONTENT OTHER METADATA (Second Tier) (Second Tier)

GOOGLETEXT, SEARCHENGINE Ho- WEB DATABASE

------DATABASE OF PUBLIC IMAGES, SEARCHENGINE ---> IMAGE-RELATED FACTORS, AND ------OTHER - METADATA- - - - - FINAL FINAL OTHER IMAGESICONTENT INFORMATION (Nth Tier) (Nth Tier)

FIG. 40 Patent Application Publication Oct. 17, 2013 Sheet 30 of 52 US 2013/0273968A1

"IMAGE JUICER" (algorithmic, Crowd-sourced, etc.)

..Image eigenvalues, Image classifier concludes "drill".

DATABASE OF PUBLIC IMAGES, SEARCHENGINE IMAGE-RELATED FACTORS, AND OTHER METADATA

Similar Text metadata for looking similar-looking pix pi

PROCESSOR Ranked metadata, incl. "Drill," "Black & Decker" and "DR250B'

DATABASE OF PUBLIC IMAGES, SES IMAGE-RELATED ONE OR MORE FACTORS, AND IMAGE-BASED OTHER METADATA

ROUTING SERVICES (may Pix with similar metadata employ user, preference data), E.G., SNAPNOW Cached web pages FIG. 41

USER INTERFACE -He

Options presented may include "Similar looks," "Similar descriptors,” “Web results," "Buy," "Sell," "Manual," "More," "SnapNow," etc. Patent Application Publication Oct. 17, 2013 Sheet 31 of 52 US 2013/0273968A1

...Geolocation coordinates, Data restraight edgelarced edge density...

DATABASE OF PUBLIC IMAGES, SEARCHENGINE IMAGE-RELATED FACTORS, AND OTHER METADATA Pix that are similarly- Text metadata for pix that are similarly-located, located, with similar edge? with similar edge/arc density arc density PROCESSOR Ranked metadata, incl. "Eiffel Tower," and "Paris' FIG. 42

DATABASE OF SEARCH PUBLIC IMAGES, IMAGE-RELATED ENGINE ONE OR MORE FACTORS, AND OTHER METADATA

IMAGE-BASED ROUTING SERVICES (may BROWSER Pix with similar metadata employ user, preference data), Cached E.G., SNAPNOW web UI options presented may include y pages "Similar looks," "Similar descriptors," "Similar location," "Web results." "SnapNow," Map, Directions, USER INTERFACE History, Search nearby, etc. Patent Application Publication Oct. 17, 2013 Sheet 32 of 52 US 2013/0273968A1

"IMAGE JUICER" (algorithmic, Crowd-sourced, etc.)

...Geolocation coordinates, Eigenvectors....

DATABASE OF MLS DATABASE OF PUBLIC IMAGES, IMAGES, IMAGE-RELATED -o- SEARCHENGINE -o- IMAGE-RELATED FACTORS, AND FACTORS, AND OTHER METADATA OTHER METADATA

Neighborhood Address, Zip code, school district, scenes, text Square footage, price, bedrooms/ metadata from baths, acreage, time on market, similarly-located data/imagery re similarly-located images houses, similarly-priced houses, similarly-featured houses, etc

GOOGLE, GOOGLE EARTH

USER INTERFACE

FIG. 43 Patent Application Publication Oct. 17, 2013 Sheet 33 of 52 US 2013/0273968A1

Patent Application Publication Oct. 17, 2013 Sheet 34 of 52 US 2013/0273968A1

134

V Ya 132 1. 116 Y 130 m FIG. 45A

SOD2N S. Y 4. 116 Y. FIG. 45B Patent Application Publication Oct. 17, 2013 Sheet 35 of 52 US 2013/0273968A1

Rockefeller Center (21) New York (18) NYC (10) GET GEOCOORDINATES FROM IMAGE Manhattan (8) Empire State Building (4) Midtown (4) SEARCH FLICKR, ETC, FOR SIMILARLY Top of the rock (4) LOCATED IMAGES (AND/OR SIMILAR IN Nueva York (3) OTHER WAYS): "SET 1"IMAGES USA (3) 30 Rock (2) August (2) HARVEST, CLEAN-UP AND CLUSTER Atlas (2) METADATA FROM SET 1 IMAGES Christmas (2) CityScape (2)

Fifth Avenue (2) FROMMETADATA, DETERMINE Me (2) LIKELIHOOD THAT EACH IMAGE IS Prometheus (2) PLACE-CENTRIC (CLASS 1) Skating rink (2) Skyline (2) Skyscraper (2) FROMMETADATA AND FACE-FINDING, Clouds (1) DETERMINE LIKELIHOOD THAT EACH General Motors Building (1) IMAGE IS PERSON-CENTRIC (CLASS 2) Gertrude (1) Jerry (1) Lights (1) r FROMMETADATA, AND ABOVE Statue (1) A SCORES, DETERMINE LIKELIHOOD THAT EACH IMAGE IS THING-CENTRIC (CLASS 3)

DETERMINE WHICH IMAGE METRICS RELIABLY GROUP LIKE-CLASSED IMAGES TOGETHER, AND DISTINGUISH DIFFERENTLY-CLASSED IMAGES

FIG. 46A Patent Application Publication Oct. 17, 2013 Sheet 36 of 52 US 2013/0273968A1

APPLY TESTS TO INPUT IMAGE DETERMINE FROM METRICS ITS SCORE IN DIFFERENT CLASSES

PERFORM SIMLARITY TESTING BETWEEN INPUT IMAGE AND EACH IMAGE IN SET 1 B

WEIGHT METADATA FROM EACH 19 Rockefeller Center IMAGENACCORDANCE WITH 12 Prometheus SIMILARITY SCORE: COMBINE 5 Skating rink

SEARCH IMAGE DATABASE FOR New York (17) ADDL SIMILARLY-LOCATED Rockefeller Center (12) IMAGES, THIS TIME ALSO USING Manhattan (6) C INFERRED POSSIBLE METADATA Prometheus (5) AS SEARCH CRITERA. "SET 2" Fountain (4) 1. Christmas (3)

Paul Manship (3) Sculpture (3) PERFORM SIMLARITY TESTING Skating Rink (3)

BETWEEN INPUT IMAGE AND Jerry (1)

EACH IMAGE IN SET 2 National register of historical places (2)

Nikon (2) RCA Building (2) WEIGHT METADATA FROM EACH Greek mythology (1) IMAGENACCORDANCE WITH SIMILARITY SCORE: COMBINE D 5 Prometheus 1. 4 Fountain FURTHERWEIGHT METADATA IN 3 Paul Manship ACCORDANCE WITH 3 Rockefeller Center UNUSUALNESS 3 Sculpture 2 National register of historical places 2 RCA Building 1 Greek mythology FIG. 46B Patent Application Publication Oct. 17, 2013 Sheet 37 of 52 US 2013/0273968A1

USER SNAPS IMAGE

GEOINFO PROFILE DATA DATABASE

PLACE-TERM GOOGLE, ETC GLOSSARY

IMAGE

PICASA ANALYSIS, ICKR DATA PROCESSING ENGINE

WORD FREQUENCY GLOSSARY DATABASE ri/ FACEBOOK DATA

GOOGLE

FIG. 47

RESPONSE ENGINE

CELL PHONE USER INTERFACE Patent Application Publication Oct. 17, 2013 Sheet 38 of 52 US 2013/0273968A1

USER SNAPS IMAGE

PERSONAPLACE/ THING ANALYSIS

FAMILYIFRIEND/ CHILDIADULT STRANGER ANALYSIS ANALYSIS

RESPONSE ENGINE

CELL PHONE USER INTERFACE FIG. 48A

USER SNAPS IMAGE

FAMILYIFRIEND/ PERSONAPLACE/ CHILDIADULT STRANGER ANALYSIS THING ANALYSIS ANALYSIS

RESPONSE ENGINE

CELL PHONE USER INTERFACE FIG. 48B Patent Application Publication Oct. 17, 2013 Sheet 39 of 52 US 2013/0273968A1

USER SNAPS IMAGE

FACIAL DCT FFT WAVELET RECOGNITION TRANSFORM TRANSFORM TRANSFORM U

EDGE EIGENVALUE SPECTRUM OCR D DETECTION ANALYSIS ANALYSIS U D Of t PATTERN FOURIER-MELLIN TEXTURE COLOR (f) EXTRACTION TRANSFORM CLASSIFIER ANALYSIS t X - t GIST EMOTION BARCODE AGE U Z PROCESSING CLASSIFIER DECODER DETECTION D U WATERMARK ORIENTATION I SIMLARITY GENDER t Of DECODER DETECTION PROCESSOR DETECTION O C U OBJECT METADATA GEOLOCATION | ETC O t SEGMENTATION ANALYSIS PROCESSING Of

ROUTER

SERVICE SERVICE PROVIDER 1 PROVIDER 2 / CELL PHONE USER INTERFACE

FIG. 50 Patent Application Publication Oct. 17, 2013 Sheet 40 of 52 US 2013/0273968A1

INPUT IMAGE, OR INFORMATIONABOUT THE IMAGE

ON U 7 ere T “We

Q- O& O.a A O S8 2. Xa 2 stO d53 to2 aS 6g 22. $9 2a gCS C 2st \ Cy O O T G 9 s s s s 3 g a- S 2 -i. g 3 S S 2 2235 (a) Post annotated photo to user's Facebook page 3. (b) Start a text message to depicted person

(a) Show estimations of who this may be (b) Send "Friend" invite on MySpace (c) Who are the degrees of separation between us (a) Buy online (b) Show nutrition information (c) lodentify local stores that sell

FIG. 51 Patent Application Publication Oct. 17, 2013 Sheet 41 of 52 US 2013/0273968A1

COMPUTER A

CONTENT SETA ROUTER

ROUTER A COMPUTER B

RESPONSE RESPONSE CONTENT SET B ENGINE 1A ENGINE 2A

ROUTER B

COMPUTERC RESPONSE ENGENE B CONTEN SET C

RESPONSE COMPUTERD ENG NEC

CONTENT SET O

COMPUTERE ROUTERD

CONTENT SETE 1 50

RESPONSE ENGINE

F.G. 52 Patent Application Publication Oct. 17, 2013 Sheet 42 of 52 US 2013/0273968A1

satise Axexists, &s

BABY CAM O VECTORsFEATURE NAME (URLs.)ACTION www.SMITH HOME. 1E 5G 18D 238F BABY CAM COM BABYCAM.HTM TRAFFIC MAP SEATTLE

6C 78 TRAFFIC.MAP, HTML

FIG. 55 SIGN GLOSSARY FIG. 54 Patent Application Publication Oct. 17, 2013 Sheet 43 of 52 US 2013/0273968A1

MAGE SENSORS

AMPFY ANAOG SGNAS

CONVERT TO DIGITA NATIVE MAGE DATA

BAYER NTERPOLATION

WHTE BALANCE CORRECTON

GAMMA CORRECTION

EDGE ENHANCEMENT

JPEG COMPRESSION

STORE N BUFFER MEMORY

OSPLAY CAPTUREO MAGE ON SCREEN

ONUSER COMMAND, STORE IN CARD AND/OR TRANSMT

FIG. 56 (Prior Art) Patent Application Publication Oct. 17, 2013 Sheet 44 of 52 US 2013/0273968A1

MAGE SENSORS

AMPFY ANAOG FIG. 57 SGNALS

CONVERT TO DIGITAL NATWE IMAGE DATA

- BAYER FOURER ETC EGENFACE ETC NTERPOLATON TRANSFORM CACULATION - -

Etc (as in

Fig. 56)

MATCH MATCH FACE EMPATE Yes BARCODE Yes TEMPAE?

SEND DGTAL NATIVE MEN MAGE DATA AND/OR TRANSFORM FOURIER DOMAN DATA TO BARCODE SERVICE

SEND DGTALNATIVE

MAGE DATA AND/OR F-V MATCH TEXT DOMAN DATA TO OCR TEMPLATE SERVICE Yes SEND DGETA, NATIVE ACHW MAGE DATA AND/OR F-M ORENATON DOMAIN OATA TO WM Yes TEMPATEP SERVICE SEND DIGITANAVE MATCH FACE MAGE DATA AND/OR F-M ye\ TEMPLATE? OOMAN OAA AND/OR EGENFACE DATA TO FACA RECOGNETON SERVICE Patent Application Publication Oct. 17, 2013 Sheet 45 of 52 US 2013/0273968A1

FROM APPARENT 12 degrees right -ss HORIZON:

ROTATON SNCE 17 degrees left START

SCALE CHANGE SNCE 30% closer START

MAGE CAPTURE/ PROCESSING

ANA YSS, MEMORY

OUTPUT sax assess is a F.G. 59 DEPENDENT ON MAGE DAA Patent Application Publication Oct. 17, 2013 Sheet 46 of 52 US 2013/0273968A1

FIG. 60

S

FIG. 65

Patent Application Publication Oct. 17, 2013 Sheet 47 of 52 US 2013/0273968A1

FOURIER TRANSFORM - DATA : BARCODE NFORMATON FOURER-MEN TRANSFORM DATA "

OCR NFORMATON -->

WATERMARK NFORMATON

DCT DATA ------D-

MOTON INFORMATION --> > METADATA EGENFACE DATA -->

FACE INFORMATION -b-

SOBE FLTER DATA -->

WAVEET TRANSFORM DATA

COOR-STOGRAM DATA F.G. 61 Patent Application Publication Oct. 17, 2013 Sheet 48 of 52 US 2013/0273968A1

FIG 67

as

FIG. 68 Patent Application Publication Oct. 17, 2013 Sheet 49 of 52 US 2013/0273968A1

F.G. 69

Patent Application Publication Oct. 17, 2013 Sheet 50 of 52 US 2013/0273968A1

S ial Frequency f ycles/D egree

e 1 1. O ,01 22 |No. WS O, Threshold Sensi Y (fThresh old Contrast Contrast)

10

O. 10 OO Spatial frequency FIG. 72 (Cycles/MM On Retina )

Patent Application Publication Oct. 17, 2013 Sheet 52 of 52 US 2013/0273968A1

START START

ROUGH OUT NES ARGE OETAS INTERMEDATE OETAS FACA FNE DETAS EGENWA UES

R, G, B Data Alpha Channel Data

FIG. 77A FIG. 77B US 2013/0273968 A1 Oct. 17, 2013

METHODS AND SYSTEMIS FOR CONTENT BRIEF DESCRIPTION OF THE DRAWINGS PROCESSING 0012 FIG. 1 is a high level view of an embodiment incor porating aspects of the present technology. RELATED APPLICATION DATA 0013 FIG. 2 shows some of the applications that a user 0001. This application is a continuation of application Ser. may request a camera-equipped cell phone to perform. No. 12/484,115, filed Jun. 12, 2009 (now U.S. Pat. No. 8,385, 0014 FIG. 3 identifies some of the commercial entities in 971), which is a continuation-in-part of application Ser. No. an embodiment incorporating aspects of the present technol 12/271,692, filed Nov. 14, 2008 (published as 20100046842), Ogy. which claims priority to the following provisional applica (0015 FIGS. 4, 4A and 4B conceptually illustrate how tions: 61/090,083, filed Aug. 19, 2008: 61/096,703, filed Sep. pixel data, and derivatives, is applied in different tasks, and 12, 2008: 61/100,643, filed Sep. 26, 2008: 61/103,907, filed packaged into packet form. Oct. 8, 2008; and 61/110,490, filed Oct. 31, 2008. Application 0016 FIG. 5 shows how different tasks may have certain Ser. No. 12/484,115 also claims priority to provisional appli image processing operations in common. cations 61/176.739, filed May 8, 2009: 61/174,822, filed May 0017 FIG. 6 is a diagram illustrating how common image 1, 2009; and 61/169,266, filed Apr. 14, 2009. processing operations can be identified, and used to configure 0002 The present technology builds on, and extends, cellphone processing hardware to perform these operations. technology disclosed in other patent applications by the 0018 FIG. 7 is a diagram showing how a cell phone can present assignee. The reader is thus directed to the following send certain pixel-related data across an internal bus for local applications that serve to detail arrangements in which appli processing, and send other pixel-related data across a com cants intend the present technology to be applied, and that munications channel for processing in the cloud. technically Supplement the present disclosure: (0019 FIG. 8 shows how the cloud processing in FIG. 7 allows tremendously more “intelligence' to be applied to a 0003) Application Ser. No. 12/271,772, filed Nov. 14, task desired by a user. 2008 (published as 201001 19208); (0020 FIG. 9 details how key vector data is distributed to 0004. Application Ser. No. 61/150,235, filed Feb. 5, 2009: different external service providers, who perform services in 0005. Application Ser. No. 61/157,153, filed Mar. 3, 2009: exchange for compensation, which is handled in consolidated fashion for the user. 0006. Application Ser. No. 61/167,828, filed Apr. 8, 2009: 0021 FIG. 10 shows an embodiment incorporating 0007. Application Ser. No. 12/468.402, filed May 19, aspects of the present technology, noting how cell phone 2009 (now U.S. Pat. No. 8,004,576). based processing is Suited for simple object identification 0008. The disclosures of all these applications are incor tasks—such as template matching, whereas cloud-based pro porated herein by reference. cessing is Suited for complex tasks—such as data association. 0022 FIG. 10A shows an embodiment incorporating BACKGROUND aspects of the present technology, noting that the user expe rience is optimized by performing visual key vector process 0009 Digimarc's U.S. Pat. No. 6,947,571 shows a system ing as close to a sensor as possible, and administering traffic in which a cell phone camera captures content (e.g., image to the cloud as low in a communications stack as possible. data), and processes same to derive information related to the 0023 FIG. 11 illustrates that tasks referred for external imagery. This derived information is Submitted to a data processing can be routed to a first group of service providers structure (e.g., a remote database), which indicates corre who routinely perform certain tasks for the cellphone, or can sponding data or actions. The cell phone then displays be routed to a second group of service providers who compete responsive information, or takes responsive action. Such on a dynamic basis for processing tasks from the cellphone. sequence of operations is sometimes referred to as “visual 0024 FIG. 12 further expands on concepts of FIG. 11, e.g., search.” showing how a bid filter and broadcast agent software module 0010 Related technologies are shown in patent publica may oversee a reverse auction process. tions 20080300011 (Digimarc), U.S. Pat. No. 7,283,983 and 0025 FIG. 13 is a high level block diagram of a processing WO07/130,688 (Evolution Robotics), 20070175998 and arrangement incorporating aspects of the present technology. 20020102966 (DSPV), 20060012677, 20060240862 and 0026 FIG. 14 is a high level block diagram of another 20050185060 (Google), 20060056707 and 20050227674 processing arrangement incorporating aspects of the present (Nokia), 20060026140 (ExBiblio), U.S. Pat. No. 6,491,217, technology. 20020152388, 200201784.10 and 20050144455 (Philips), 0027 FIG. 15 shows an illustrative range of image types 20020072982 and 20040199387 (Shazam), 20030083098 that may be captured by a cell phone camera. (Canon), 20010055391 (Qualcomm), 20010001854 (Air 0028 FIG. 16 shows a particular hardware implementa Clic), U.S. Pat. Nos. 7.251,475 (Sony), 7,174.293 (Iceberg), tion incorporating aspects of the present technology. 7,065,559 (Organnon Wireless), 7,016,532 (Evryx Technolo 0029 FIG. 17 illustrates aspects of a packet used in an gies), 6,993,573 and 6,199,048 (Neomedia), 6,941,275 (Tune exemplary embodiment. Hunter), 6,788.293 (Silverbrook Research), 6,766,363 and 0030 FIG. 18 is a block diagram illustrating an implemen 6,675,165 (BarPoint), 6,389,055 (Alcatel-Lucent), 6,121,530 tation of the SIFT technique. (Sonoda), and 6,002,946 (Reber/Motorola). 0031 FIG. 19 is a block diagram illustrating, e.g., how 0011. The presently-detailed technology concerns packet header data can be changed during processing, improvements to Such technologies—moving towards the through use of a memory. goal of intuitive computing: devices that can see and/or hear, 0032 FIG. 20 shows an arrangement by which a cell and infer the user's desire in that sensed context. phone camera and a cell phone projector share a lens. US 2013/0273968 A1 Oct. 17, 2013

0033 FIG. 21 shows an image of a desktop telephone 0059 FIGS. 63-66 detail how the image of FIG. 62 can be captured by a cell phone camera. processed to convey semantic metadata. 0034 FIG.22 shows a collection of similar images found 0060 FIG. 67 shows another image that may be captured in a repository of public images, by reference to characteris by a cell phone camera user. tics discerned from the image of FIG. 21. 0061 FIGS. 68 and 69 detailhow the image of FIG. 67 can 0035 FIGS. 23-28A, and 30-34 are flow diagrams detail be processed to convey semantic metadata. ing methods incorporating aspects of the present technology. 0062 FIG. 70 shows an image that may be captured by a 0036 FIG. 29 is an arty shot of the Eiffel Tower, captured cellphone camera user. by a cell phone user. 0063 FIG. 71 details how the image of FIG. 70 can be 0037 FIG. 35 is another image captured by a cell phone processed to convey semantic metadata. USC. 0064 FIG. 72 is a chart showing aspects of the human 0038 FIG. 36 is an image of an underside of a telephone, visual system. discovered using methods according to aspects of the present 0065 FIG. 73 shows different low, mid and high fre technology. quency components of an image. 0039 FIG. 37 shows part of the physical user interface of 0.066 FIG. 74 shows a newspaper page. one style of cell phone. 0067 FIG. 75 shows the layout of the FIG. 74 page, as set 0040 FIGS. 37A and 37B illustrate different linking by layout software. topologies. 0068 FIG. 76 details how user interaction with imagery 0041 FIG. 38 is an image captured by a cell phone user, captured from printed text may be enhanced. depicting an Appalachian Trail trail marker. 0069 FIGS. 77A and 77B illustrate how semantic convey 0042 FIGS. 39-43 detail methods incorporating aspects ance of metadata can have a progressive aspect, akin to of the present technology. JPEG2000 and the like. 0043 FIG. 44 shows the user interface of one style of cell phone. DETAILED DESCRIPTION 0044 FIGS. 45A and 45B illustrate how different dimen 0070 The present specification details a collection of sions of commonality may be explored through use of a user interrelated work, including a great variety of technologies. A interface control of a cell phone. few of these include image processing architectures for cell 0045 FIGS. 46A and 46B detail a particular method incor phones, cloud-based computing, reverse auction-based deliv porating aspects of the present technology, by which key ery of services, metadata processing, image conveyance of words such as Prometheus and Paul Manship are automati semantic information, etc., etc., etc. Each portion of the speci cally determined from a cellphone image. fication details technology that desirably incorporates tech 0046 FIG. 47 shows some of the different data sources nical features detailed in other portions. Thus, it is difficult to that may be consulted in processing imagery according to identify “a beginning from which this disclosure should aspects of the present technology. logically begin. That said, we simply dive in. 0047 FIGS. 48A, 48B and 49 show different processing methods according to aspects of the present technology. Mobile Device Object Recognition and Interaction Using 0048 FIG.50 identifies some of the different processing Distributed Network Services that may be performed on image data, in accordance with aspects of the present technology. 0071. There is presently a huge disconnect between the 0049 FIG. 51 shows an illustrative tree structure that can unfathomable Volume of information that is contained in high be employed in accordance with certain aspects of the present quality image data streaming from a mobile device camera technology. (e.g., in a cell phone), and the ability of that mobile device to 0050 FIG. 52 shows a network of wearable computers process this data to whatever end. “Off device' processing of (e.g., cellphones) that can cooperate with each other, e.g., in visual data can help handle this fire hose of data, especially a peer-to-peer network. when a multitude of visual processing tasks may be desired. 0051 FIGS. 53-55 detail how a glossary of signs can be These issues become even more critical once “real time object identified by a cell phone, and used to trigger different recognition and interaction' is contemplated, where a user of actions. the mobile device expects virtually instantaneous results and 0052 FIG. 56 illustrates aspects of prior art digital camera augmented reality graphic feedback on the mobile device technology. screen, as that user points the camera at a scene or object. 0072. In accordance with one aspect of the present tech 0053 FIG. 57 details an embodiment incorporating nology, a distributed network of pixel processing engines aspects of the present technology. serve such mobile device users and meet most qualitative 0054 FIG.58 shows how a cellphone can be used to sense "human real time interactivity” requirements, generally with and display affine parameters. feedback in much less than one second. Implementation 0055 FIG. 59 illustrates certain state machine aspects of desirably provides certain basic features on the mobile the present technology. device, including a rather intimate relationship between the 0056 FIG. 60 illustrates how even “still imagery can image sensor's output pixels and the native communications include temporal, or motion, aspects. channel available to the mobile device. Certain levels of basic 0057 FIG. 61 shows some metadata that may be involved “content filtering and classification of the pixel data on the in an implementation incorporating aspects of the present local device, followed by attaching routing instructions to the technology. pixel data as specified by the user's intentions and Subscrip 0058 FIG. 62 shows an image that may be captured by a tions, leads to an interactive session between a mobile device cellphone camera user. and one or more "cloud based pixel processing services. The US 2013/0273968 A1 Oct. 17, 2013

key word “session further indicates fast responses transmit This aspect of the technology relates to how similarly effi ted back to the mobile device, where for some services mar cient and broadly enabling elements can be built into mobile keted as “real time' or “interactive,” a session essentially devices, mobile device connections and network services, all represents a duplex, generally packet-based, communication, with the goal of serving the applications depicted in FIG. 2, as where several outgoing pixel packets' and several incoming well as those new applications which may show up as the response packets (which may be pixel packets updated with technology dance continues. the processed data) may occur every second. 0080 Perhaps the central difference between the human 0073 Business factors and good old competition are at the analogy and mobile device networks must Surely revolve heart of the distributed network. Users can subscribe to or around the basic concept of “the marketplace,” where buyers otherwise tap into any external services they choose. The buy better and better things So long as businesses know how local device itself and/or the carrier service provider to that to profit accordingly. Any technology which aims to serve the device can be configured as the user chooses, routing filtered applications listed in FIG. 2 must necessarily assume that and pertinent pixel data to specified object interaction ser hundreds if not thousands of business entities will be devel vices. Billing mechanisms for Such services can directly plug oping the nitty gritty details of specific commercial offerings, into existing cell and/or mobile device billing networks, with the expectation of one way or another profiting from wherein users get billed and service providers get paid. those offerings. Yes, a few behemoths will dominate main 0074 But let's back up a bit. The addition of camera lines of cash flows in the overall mobile industry, but an equal systems to mobile devices has ignited an explosion of appli certainty will be that niche players will be continually devel cations. The primordial application certainly must be folks oping niche applications and services. Thus, this disclosure simply Snapping quick visual aspects of their environment describes how a marketplace for visual processing services and sharing Such pictures with friends and family. can develop, whereby business interests across the spectrum 0075. The fanning out of applications from that starting have something to gain. FIG. 3 attempts a crude categoriza point arguably hinges on a set of core plumbing features tion of some of the business interests applicable to the global inherent in mobile cameras. In short (and non-exhaustive of business ecosystem operative in the era of this filing. course). Such features include: a) higher quality pixel capture I0081 FIG. 4 sprints toward the abstract in the introduction and low level processing; b) better local device CPU and GPU of the technology aspect now being considered. Here we find resources for on-device pixel processing with Subsequent a highly abstracted bit of information derived from some user feedback; c) structured connectivity into “the cloud: batch of photons that impinged on Some form of electronic and importantly, d) a maturing traffic monitoring and billing image sensor, with a universe of waiting consumers of that infrastructure. FIG. 1 is but one graphic perspective on some lowly bit. FIG. 4A then quickly introduces the intuitively of these plumbing features of what might be called a visually well-known concept that singular bits of visual information intelligent network. (Conventional details of a cell phone, arent worth much outside of their role in both spatial and Such as the microphone, A/D converter, modulation and temporal groupings. This core concept is well exploited in demodulation systems, IF stages, cellular transceiver, etc., are modern video compression standards such as MPEG7 and not shown for clarity of illustration.) H.264. 0076. It is all well and good to get better CPUs and GPUs, I0082. The “visual character of the bits may be pretty far and more memory, on mobile devices. However, cost, weight removed from the visual domain by certain of the processing and power considerations seem to favor getting “the cloud to (consider, e.g., the vector strings representing eigenface do as much of the “intelligence' heavy lifting as possible. data). Thus, we sometimes use the term "keyvector data” (or 0077 Relatedly, it seems that there should be a common “key vector strings') to refer collectively to pixel data, denominator set of “device-side' operations performed on together with associated information/derivatives. visual data that will serve all cloud processes, including cer I0083 FIGS. 4A and 4B also introduce a central player in tain formatting, elemental graphic processing, and other rote this disclosure: the packaged and address-labeled pixel operations. Similarly, it seems there should be a standardized packet, into a body of which key vector data is inserted. The basic header and addressing scheme for the resulting com key vector data may be a single patch, or a collection of munication traffic (typically packetized) back and forth with patches, or a time-series of patches/collections. A pixel the cloud. packet may be less than a kilobyte, or its size can be much 0078. This conceptualization is akin to the human visual much larger. It may convey information about an isolated system. The eye performs baseline operations, such as chro patch of pixels excerpted from a larger image, or it may maticity groupings, and it optimizes necessary information convey a massive of Notre Dame cathedral. for transmission along the optic nerve to the brain. The brain I0084 (As presently conceived, a pixel packet is an appli does the real cognitive work. And there's feedback the other cation layer construct. When actually pushed around a net way too—with the brain sending information controlling work, however, it may be broken into Smaller portions—as muscle movement—where to point eyes, scanning lines of a transport layer constraints in a network may require.) book, controlling the iris (lighting), etc. I0085 FIG. 5 is a segue diagram still at an abstract level but 0079 FIG. 2 depicts a non-exhaustive but illustrative list pointing toward the concrete. A list of user-defined applica of visual processing applications for mobile devices. Again, it tions, such as illustrated in FIG. 2, will map to a state-of-the is hard not to see analogies between this list and the funda art inventory of pixel processing methods and approaches mentals of how the human visual system and the human brain which can accomplish each and every application. These operate. It is a well studied academic area that deals with how pixel processing methods break down into common and not “optimized the human visual system is relative to any given So-common component Sub-tasks. Object recognition text object recognition task, where a general consensus is that the books are filled with a wide variety of approaches and termi eye-retina-optic nerve-cortex system is pretty darn wonderful nologies which bring a sense of order into what at first glance in how efficiently it serves a vast array of cognitive demands. might appear to be a bewildering array of “unique require US 2013/0273968 A1 Oct. 17, 2013

ments' relative to the applications shown in FIG. 2. But FIG. cellphone first reports detection of the specially-marked six 5 attempts to show that there are indeed a set of common steps pack. In the FIG. 6 arrangement, some of the component tasks and processes shared between visual processing applications. in the SIFT pattern matching operation are performed by the The differently shaded pie slices attempt to illustrate that elemental image processing in the configured hardware; oth certain pixel operations are of a specific class and may simply ers are referred for more specialized processing—eitherinter have differences in low level variables or optimizations. The nal or external.) size of the overall pie (thought of in a logarithmic sense, I0089 FIG. 7 up-levels the picture to a generic distributed where a pie twice the size of another may represent 10 times pixel services network view, where local device pixel services more Flops, for example), and the percentage size of the slice, and "cloud based pixel services have a kind of symmetry in represent degrees of commonality. how they operate. The router in FIG. 7 takes care of how any I0086 FIG. 6 takes a major step toward the concrete, sac given packaged pixel packet gets sent to the appropriate pixel rificing simplicity in the process. Here we see a top portion processing location, whether local or remote (with the style of labeled “Resident Call-Up Visual Processing Services.” fill pattern denoting different component processing func which represents all of the possible list of applications from tions; only a few of the processing functions required by the FIG. 2 that a given mobile device may be aware of, or down enabled visual processing services are depicted). Some of the right enabled to perform. The idea is that not all of these data shipped to cloud-based pixel services may have been first applications have to be active all of the time, and hence some processed by local device pixel services. The circles indicate Sub-set of services is actually “turned on at any given that the routing functionality may have components in the moment. The turned on applications, as a one-time configu cloud—nodes that serve to distribute tasks to active service ration activity, negotiate to identify their common component providers, and collect results for transmission back to the tasks, labeled the “Common Processes Sorter first gener device. In some implementations these functions may be ating an overall common list of pixel processing routines performed at the edge of the wireless network, e.g., by mod available for on-device processing, chosen from a library of ules at wireless service towers, so as to ensure the fastest these elemental image processing routines (e.g., FFT filter action. Results collected from the active external service pro ing, edge detection, resampling, color histogramming, log viders, and the active local processing stages, are fed back to polar transform, etc.). Generation of corresponding Flow Pixel Service Manager software, which then interacts with Gate Configuration/Software Programming information fol the device user interface. lows, which literally loads library elements into properly 0090 FIG. 8 is an expanded view of the lower right portion ordered places in a field programmable gate array set-up, or of FIG. 7 and represents the moment where Dorothy's shoes otherwise configures a Suitable processor to perform the turn red and why distributed pixel services provided by the required component tasks. cloud—as opposed to the local device—will probably trump 0087 FIG. 6 also includes depictions of the image sensor, all but the most mundane object recognition tasks. followed by a universal pixel segmenter. This pixel segmenter 0091. Object recognition in its richer form is based on breaks down the massive stream of imagery from the sensor visual association rather than strict template matching rules. into manageable spatial and/or temporal blobs (e.g., akin to If we all were taught that the capital letter “A” will always be MPEG macroblocks, wavelet transform blocks, 64x64 pixel strictly following some pre-historic form never to change, a blocks, etc.). After the torrent of pixels has been broken down universal template image if you will, then pretty clean and into chewable chunks, they are fed into the newly pro locally prescriptive methods can be placed into a mobile grammed gate array (or other hardware), which performs the imaging device in order to get it to reliably read a capital A elemental image processing tasks associated with the selected any time that ordained form 'A' is presented to the camera. applications. (Such arrangements are further detailed below, 2D and even three 3D barcodes in many ways follow this in an exemplary system employing 'pixel packets.) Various template-like approach to object recognition, where for con output products are sent to a routing engine, which refers the tained applications involving Such objects, local processing elementally-processed data (e.g., key vector data) to other services can largely get the job done. But even in the barcode resources (internal and/or external) for further processing. example, flexibility in the growth and evolution of overt This further processing typically is more complex that that visual coding targets begs for an architecture which doesn't already performed. Examples include making associations, force “code upgrades' to a gaZillion devices every time there deriving inferences, pattern and template matching, etc. This is some advance in the overt symbology art. At the other end further processing can be highly application-specific. of the spectrum, arbitrarily complex tasks can be imagined, 0088 (Consider a promotional game from Pepsi, inviting e.g., referring to a network of Supercomputers the task of the public to participate in a treasure hunt in a state park. predicting the apocryphal typhoon resulting from the flutter Based on internet-distributed clues, people try to find a hid ing of a butterfly's wings halfway around the world if the den six-pack of soda to earn a S500 prize. Participants must application requires it. OZ beckons. download a special application from the Pepsi-dot-com web 0092 FIG. 8 attempts to illustrate this radical extra dimen site (or the Apple AppStore), which serves to distribute the sionality of pixel processing in the cloud as opposed to the clues (which may also be published to Twitter). The down local device. This virtually goes without saying (or without a loaded application also has a prize verification component, picture), but FIG. 8 is also a segue figure to FIG. 9, where which processes image data captured by the users cell Dorothy gets back to Kansas and is happy about it. phones to identify a special pattern with which the hidden 0093 FIG. 9 is all about cash, cash flow, and happy six-pack is uniquely marked. SIFT object recognition is used humans using cameras on their mobile devices and getting (discussed below), with the SIFT feature descriptors for the highly meaningful results back from their visual queries, all special package conveyed with the downloaded application. the while paying one monthly bill. It turns out the Google When an image match is found, the cell phone immediately “AdWords' auction genie is out of the bottle. Behind the reports same wirelessly to Pepsi. The winner is the user whose scenes of the moment-by-moment visual scans from a mobile US 2013/0273968 A1 Oct. 17, 2013

user of their immediate visual environment are hundreds and reverse auction. In many implementations, these latter pro thousands of micro-decisions, pixel routings, results com viders compete each time a packet is available for processing. parisons and micro-auctioned channels back to the mobile 0.098 Considera user who snaps a cellphone picture of an device user for the hard good they are “truly” looking for, unfamiliar car, wanting to learn the make and model. Various whether they know it or not. This last point is deliberately service providers may compete for this business. A startup cheeky, in that searching of any kind is inherently open ended vendor may offer to perform recognition for free—to build its and magical at Some level, and part of the fun of searching in brand or collect data. Imagery submitted to this service the first place is that Surprisingly new associations are part of returns information simply indicating the car's make and the results. The search user knows after the fact what they model. Consumer Reports may offer an alternative service— were truly looking for. The system, represented in FIG.9 as which provides make and model data, but also provides tech the carrier-based financial tracking server, now sees the addi nical specifications for the car. However, they may charge 2 tion of our networked pixel services module and its role in cents for the service (or the cost may be bandwidth based, facilitating pertinent results being sent back to a user, all the e.g., 1 cent per megapixel). Edmunds, or JD Powers, may while monitoring the uses of the services in order to populate offer still another service, which provides data like Consumer the monthly bill and send the proceeds to the proper entities. Reports, but pays the user for the privilege of providing data. 0094 FIG. 10 focuses on functional division of process In exchange, the vendor is given the right to have one of its ing illustrating how tasks in the nature oftemplate matching partners send a text message to the user promoting goods or can be performed on the cell phone itself, whereas more services. The payment may take the form of a credit on the Sophisticated tasks (in the nature of data association) desir user's monthly cell phone voice/data service billing. ably are referred to the cloud for processing. 0099. Using criteria specified by the user, stored prefer 0095 Elements of the foregoing are distilled in FIG. 10A, ences, context, and other rules/heuristics, a query router and showing an implementation of aspects of the technology as a response manager (in the cellphone, in the cloud, distributed, physical matter of (usually) software components. The two etc.) determines whether the packet of data needing process ovals in the figure highlight the symmetric pair of Software ing should be handled by one of the service providers in the components which are involved in setting up a "human real stable of static standbys, or whether it should be offered to time' visual recognition session between a mobile device and providers on an auction basis—in which case it arbitrates the the generic cloud or service providers, data associations and outcome of the auction. visual query results. The oval on the left refers to “keyvec 0100. The static standby service providers may be identi tors' and more specifically “visual key vectors. As noted, this fied when the phone is initially programmed, and only recon term can encompass everything from simple JPEG com figured when the phone is reprogrammed. (For example, Veri pressed blocks all the way through log-polar transformed Zon may specify that all FFT operations on its phones be facial feature vectors and anything in between and beyond. routed to a server that it provides for this purpose.) Or, the user The point of a key vector is that the essential raw information may be able to periodically identify preferred providers for of Some given visual recognition task has been optimally certain tasks, as through a configuration menu, or specify that pre-processed and packaged (possibly compressed). The oval certain tasks should be referred for auction. Some applica on the left assembles these packets, and typically inserts some tions may emerge where static service providers are favored; addressing information by which they will be routed. (Final the task may be so mundane, or one provider's services may addressing may not be possible, as the packet may ultimately be so un-paralleled, that competition for the provision of be routed to remote service providers—the details of which may not yet be known.) Desirably, this processing is per services isn't warranted. formed as close to the raw sensor data as possible. Such as by 0101. In the case of services referred to auction, some processing circuitry integrated on the same Substrate as the users may exalt price above all other considerations. Others image sensor, which is responsive to software instructions may insist on domestic data processing. Others may want to stored in memory or provided from another stage in packet stick to service providers that meet “green.” “ethical.” or other form. standards of corporate practice. Others may preferricher data 0096. The oval on the right administers the remote pro output. Weightings of different criteria can be applied by the cessing of key Vector data, e.g., attending to arranging appro query router and response manager in making the decision. priate services, directing traffic flow, etc. Desirably, this soft 0102. In some circumstances, one input to the query router ware process is implemented as low down on a and response manager may be the user's location, so that a communications Stack as possible, generally on a "cloud different service provider may be selected when the user is at side device, access point, or cell tower. (When real-time home in Oregon, than when she is vacationing in Mexico. In visual key vector packets stream over a communications other instances, the required turnaround time is specified, channel, the lower down in the communications stack they are which may disqualify some vendors, and make others more identified and routed, the smoother the “human real-time' competitive. In some instances the query router and response look and feel a given visual recognition task will be.) Remain manager need not decide at all, e.g., if cached results identi ing high level processing needed to Support this arrangement fying a service provider selected in a previous auction are still is included in FIG. 10A for context, and can generally be available and not beyond a “freshness” threshold. performed through native mobile and remote hardware capa 0103 Pricing offered by the vendors may change with bilities. processing load, bandwidth, time of day, and other consider 0097 FIGS. 11 and 12 illustrate the concept that some ations. In some embodiments the providers may be informed providers of some cloud-based pixel processing services may of offers submitted by competitors (using known trust be established inadvance, in a pseudo-static fashion, whereas arrangements assuring data integrity), and given the opportu other providers may periodically vie for the privilege of pro nity to make their offers more enticing. Such a bidding war cessing a user's key vector data, through participation in a may continue until no bidder is willing to change the offered US 2013/0273968 A1 Oct. 17, 2013 terms. The query router and response manager (or in some router and response manager) by presenting only partial implementations, the user) then makes a selection. results. With a taste of what’s available, the user (or the query 0104 For expository convenience and visual clarity, FIG. router and response manager) may be induced to make a 12 shows a software module labeled “Bid Filter and Broad different choice than relevant criteria/heuristics would other cast Agent. In most implementations this forms part of the wise indicate. query router and response manager module. The bid filter 0112 The function calls sent to external service providers, module decides which vendors—from a universe of possible of course, do not have to provide the ultimate result sought by Vendors—should be given a chance to bid on a processing a consumer (e.g., identifying a car, or translating a menu task. (The user's preference data, or historical experience, listing from French to English). They can be component may indicate that certain service providers be disqualified.) operations, such as calculating an FFT, or performing a SIFT The broadcast agent module then communicates with the procedure or a log-polar transform, or computing a histogram selected bidders to inform them of a user task for processing, or eigenvectors, or identifying edges, etc. and provides information needed for them to make a bid. 0113. In time, it is expected that a rich ecosystem of expert 0105. Desirably, the bid filter and broadcast agent do at processors will emerge—serving myriad processing requests least some their work in advance of data being available for from cell phones and other thin client devices. processing. That is, as soon as a prediction can be made as to an operation that the user may likely soon request, these More on a Particular Hardware Arrangement modules start working to identify a provider to perform a service expected to be required. A few hundred milliseconds 0114. As noted above and in the cited patent documents, later the user key vector data may actually be available for there is a need for generic object recognition by a mobile processing (if the prediction turns out to be accurate). device. Some approaches to specialized object recognition 0106 Sometimes, as with Google's present AdWords sys have emerged, and these have given rise to specific data tem, the service providers are not consulted at each user processing approaches. However, no architecture has been transaction. Instead, each provides bidding parameters, proposed that goes beyond specialized object recognition which are stored and consulted whenever a transaction is toward generic object recognition. considered, to determine which service provider wins. These 0115 Visually, a generic object recognition arrangement stored parameters may be updated occasionally. In some requires access to good raw visual data preferably free of implementations the service provider pushes updated param device quirks, Scene quirks, user quirks, etc. Developers of eters to the bid filter and broadcast agent whenever available. systems built around object identification will best prosper (The bid filter and broadcast agent may serve a large popula and serve their users by concentrating on the object identifi tion of users, such as all Verizon subscribers in area code 503, cation task at hand, and not the myriad existing roadblocks, or all subscribers to an ISP in a community, or all users at the resource sinks, and third party dependencies that currently domain well-dot-com, etc.; or more localized agents may be must be confronted. employed, such as one for each cell phone tower.) 0116. As noted, virtually all object identification tech 0107 If there is a lull in traffic, a service provider may niques can make use of or even rely upon a pipe to “the discount its services for the next minute. The service provider cloud.” (“Cloud can include anything external to the cell may thus transmit (or post) a message stating that it will phone.) perform eigenvector extraction on an image file of up to 10 0117 Desirably, image-responsive techniques should pro megabytes for 2 cents until 1244754176 Coordinated Univer duce a short term “result or answer,” which generally requires sal Time in the Unix epoch, after which time the price will some level of interactivity with a user hopefully measured return to 3 cents. The bid filter and broadcast agent updates a in fractions of a second for truly interactive applications, or a table with stored bidding parameters accordingly. few seconds or fractions of a minute for nearer-term “I’m 0108 (Information about the Google reverse auction, used patient to wait' applications. to place sponsored advertising on web search results page, is 0118. As for the objects in question, they can break down reproduced at the end of this specification. This information into various categories, including (1) generic passive (clues to was published in Wired magazine on May 22, 2009, in an basic searches), (2) geographic passive (at least you know article by Stephen Levy.) where you are, and may hook into geographic-specific 0109. In other implementations, the broadcast agent polls resources), (3) “cloud supported passive, as with “identified/ the bidders—communicating relevant parameters, and solic enumerated objects and their associated sites, and (4) active/ iting bid responses whenever a transaction is offered for pro controllable, a la ThingPipe (a reference to technology cessing. detailed in application 61/169.266, such as WiFi-equipped 0110. Once a prevailing bidder is decided, and data is thermostats and parking meters). available for processing, the broadcast agent transmits the 0119) An object recognition platform should not, it seems, key vector data (and other parameters as may be appropriate to be conceived in the classic “local device and local resources a particular task) to the winning bidder. The bidder then only' software mentality. However, it may be conceived as a performs the requested operation, and returns the processed local device optimization problem. That is, the software on data to the query router and response manager. This module the local device, and its processing hardware, should be logs the processed data, and attends to any necessary account designed in contemplation of their interaction with off-device ing (e.g., crediting the service provider with the appropriate software and hardware. Ditto the balance and interplay of fee). The response data is then forwarded back to the user both control functionality, pixel crunching functionality, and device. application software/GUI provided on the device, versus off 0111. In a variant arrangement, one or more of the com the device. peting service providers actually performs some or all of the I0120 Applicants believe this platform should employ requested processing, but "teases the user (or the query image processing near the sensor—optimally on the same US 2013/0273968 A1 Oct. 17, 2013 chip, with at least Some processing tasks desirably performed post the image to Flickr (using Flickr's cell phone upload by dedicated, special purpose hardware. functionality), and it will soon be found and processed by 0121 Consider FIG. 13, which shows an architecture of a Google and .) cellphone 10 in which an image sensor 12 feeds two process I0128 Image 44 is a snapshot of friends. Facial recognition ing paths. One, 13, is tailored for the human visual system, may be employed (both to indicate that there are faces in the and includes processing such as JPEG compression. Another, image, and to identify particular faces, e.g., by reference to 14, is tailored for object recognition. As discussed. Some of user-associated data maintained by Apple's iPhoto service, this processing may be performed by the mobile device, while Google's Picasa Service, Facebook, etc.) Geolocation and other processing may be referred to the cloud 16. date/time information from the cell phone may also provide 0122 FIG. 14 takes an application-centric view of the useful information. object recognition processing path. Some applications reside wholly in the cell phone. Other applications reside wholly 0129. The persons wearing Sunglasses pose a challenge outside the cell phone—simply taking key vector data as for Some facial recognition algorithms. Identification of those stimulus. More common are hybrids, Such as where some individuals may be aided by their association with persons processing is done in the cellphone, other processing is done whose identities can more easily be determined (e.g., by externally, and the application Software orchestrating the pro conventional facial recognition). That is, by identifying other cess resides in the cell phone. group pictures in iPhoto/Picasa/Facebook/etc. that include 0123 To illustrate further discussion, FIG. 15 shows a one or more of the latter individuals, the other individuals range 40 of some of the different types of images 41-46 that depicted in Such photographs may also be present in the may be captured by a particular user's cellphone. A few brief Subject image. These candidate persons form a much smaller (and incomplete) comments about some of the processing universe of possibilities than is normally provided by that may be applied to each image are provided in the follow unbounded iPhoto/Picasa/Facebook/etc data. The facial vec ing paragraphs. tors discernable from the Sunglass-wearing faces in the Sub ject image can then be compared against this Smaller universe 0124 Image 41 depicts a thermostat. A Steganographic of possibilities in order to determine a best match. If, in the digital watermark 47 is textured or printed on the thermostats usual case of recognizing a face, a score of 90 is required to be case. (The watermark is shown as visible in FIG. 15, but is considered a match (out of an arbitrary top match score of typically imperceptible to the viewer). The watermark con 100), in searching such a group-constrained set of images a veys information intended for the cell phone, allowing it to score of 70 or 80 might suffice. (Where, as in image 44, there presentagraphic user interface by which the user can interact are two persons depicted without Sunglasses, the occurrence with the thermostat. A bar code or other data carrier can of both of these individuals in a photo with one or more other alternatively be used. Such technology is further detailed individuals may increase its relevance to Such an analysis— below, and in patent application 61/169.266, filed Apr. 14, implemented, e.g., by increasing a weighting factor in a 2009. matching algorithm.) 0.125 Image 42 depicts an item including a barcode 48. 0.130 Image 45 shows part of the statue of Prometheus in This barcode conveys Universal Product Code (UPC) data. Rockefeller Center, NY. Its identification can follow teach Other barcodes may convey other information. The barcode ings detailed elsewhere in this specification. payload is not primarily intended for reading by a user cell phone (in contrast to watermark 47), but it nonetheless may be I0131 Image 46 is a landscape, depicting the Maroon Bells used by the cell phone to help determine an appropriate mountain range in Colorado. This image subject may be response for the user. recognized by reference to geolocation data from the cell phone, in conjunction with geographic information services 0126 Image 43 shows a product that may be identified such as GeoNames or Yahoo!'s GeoPlanet. without reference to any express machine readable informa tion (such as a bar code or watermark). A segmentation algo I0132 (It should be understood that techniques noted rithm may be applied to edge-detected image data to distin above in connection with processing of one of the images guish the apparent image subject from the apparent 41-46 in FIG. 15 can likewise be applied to others of the background. The image subject may be identified through its images. Moreover, it should be understood that while in some shape, color and texture. Image fingerprinting may be used to respects the depicted images are ordered according to ease of identify reference images having similar labels, and metadata identifying the Subject and formulating a response, in other associated with those other images may be harvested. SIFT respects they are not. For example, although landscape image techniques (discussed below) may be employed for Such pat 46 is depicted to the far right, its geolocation data is strongly tern-based recognition tasks. Specular reflections in low tex correlated with the metadata “Maroon Bells.” Thus, this par ture regions may tend to indicate the image subject is made of ticular image presents an easier case than that presented by glass. Optical character recognition can be applied for further many other images.) information (reading the visible text). All of these clues can 0133. In one embodiment, such processing of imagery be employed to identify the depicted item, and help determine occurs automatically—without express user instruction each an appropriate response for the user. time. Subject to network connectivity and power constraints, 0127. Additionally (or alternatively), similar-image information can be gleaned continuously from Such process search systems, such as Google Similar Images, and ing, and may be used in processing Subsequently-captured Microsoft Live Search, can be employed to find similar images. For example, an earlier image in a sequence that images, and their metadata can then be harvested. (As of this includes photograph 44 may show members of the depicted writing, these services do not directly supportupload of a user group without Sunglasses—simplifying identification of the picture to find similar web pictures. However, the user can persons later depicted with Sunglasses. US 2013/0273968 A1 Oct. 17, 2013

FIG. 16 Implementation header field 55 in a manner permitting it to be disregarded by 0134 FIG. 16 gets into the nifty gritty of a particular Subsequent processing stages). implementation incorporating certain of the features earlier (0.139. When the packet 57 was authored by setup module discussed. (The other discussed features can be implemented 34 it also included a series of further header fields 58, each by the artisan within this architecture, based on the provided specifying how a corresponding, Successive post-sensor stage disclosure.) In this data driven arrangement 30, operation of a 38 should process the captured data. As shown in FIG. 16, cellphone camera 32 is dynamically controlled in accordance there are several Such post-sensor processing stages 38. with packet data sent by a setup module 34, which in turn is 0140 Camera 32 outputs the image-stuffed packet pro controlled by a control processor module 36. (Control pro duced by the camera (a pixel packet) onto a bus or other data cessor module 36 may be the cellphone's primary processor, channel 61, which conveys it to a first processing stage 38. oran auxiliary processor, or this function may be distributed.) 0141 Stage 38 examines the header of the packet. Since The packet data further specifies operations to be performed the camera deleted the instruction field 55 that conveyed by an ensuing chain of processing stages 38. camera instructions (or marked it to be disregarded), the first 0135) In one particular implementation, setup module 34 header field encountered by a control portion of stage 38 is dictates—on a frame by frame basis—the parameters that are field 58a. This field details parameters of an operation to be to be employed by camera 32 in gathering an exposure. Setup applied by stage 38 to data in the body of the packet. module 34 also specifies the type of data the camera is to 0.142 For example, field 58a may specify parameters of an output. These instructional parameters are conveyed in a first edge detection algorithm to be applied by stage 38 to the field 55 of a header portion 56 of a data packet 57 correspond packets image data (or simply that Such an algorithm should ing to that frame (FIG. 17). be applied). It may further specify that stage 38 is to substitute 0.136 For example, for each frame, the setup module 34 the resulting edge-detected set of data for the original image may issue a packet 57 whose first field 55 instructs the camera data in the body of the packet. Stage 38 performs the about, e.g., the length of the exposure, the aperture size, and requested operation (which may involve configuring pro the lens focus. Module 34 may further author the field 55 to grammable hardware in certain implementations). First stage specify that the sensor is to Sum sensor charges to reduce 38 then deletes instructions 58a from the packet header 56 (or resolution (e.g., producing a frame of 640x480 data from a marks them to be disregarded) and outputs the processed sensor capable of 1280x960), output data only from red pixel packet for action by a next processing stage. filtered sensor cells, output data only from a horizontal line of 0.143 A control portion of a next processing stage (which cells across the middle of the sensor, output data only from a here comprises stages 38a and 38b, discussed later) examines 128x128 patch of cells in the center of the pixel array, etc. The the header of the packet. Since field 58a was deleted (or camera instruction field 55 may further specify the exact time marked to be disregarded), the first field encountered is field that the camera is to capture data—so as to allow, e.g., desired 58b. In this particular packet, field 58b may instruct the sec synchronization with ambient lighting (as detailed later). ond stage not to perform any processing on the data in the 0.137 Each packet 56 issued by setup module 34 may body of the packet, but instead simply delete field 58b from include different camera parameters in the first header field the packet header and pass the pixel packet to the next stage. 55. Thus, a first packet may cause camera 32 to capture a full 0144. A next field of the packet header may instruct the frame image with an exposure time of 1 millisecond. A next third stage 38c to perform 2D FFT operations on the image packet may cause the camera to capture a full frame image data found in the packet body, based on 16x16 blocks. It may with an exposure time of 10 milliseconds, and a third may further direct the stage to hand-off the processed FFT data to dictate an exposure time of 100 milliseconds. (Such frames a wireless interface, for internet transmission to address 216. may later be processed in combination to yield a high 239.32.10, accompanied by specified data (detailing, e.g., the dynamic range image.) A fourth packet may instruct the cam task to be performed on the received FFT data by the com era to down-sample data from the image sensor, and combine puter at that address, such as texture classification). It may signals from differently color-filtered sensor cells, so as to further direct the stage to hand off a single 16x16 block of output a 4x3 array of grayscale luminance values. A fifth FFT data, corresponding to the center of the captured image, packet may instruct the camera to output data only from an to the same or a different wireless interface for transmission 8x8 patch of pixels at the center of the frame. A sixth packet to address 12.232.235.27 —again accompanied by corre may instruct the camera to output only five lines of image sponding instructions about its use (e.g., search an archive of data, from the top, bottom, middle, and mid-upper and mid stored FFT data for a match, and return information if a match lower rows of the sensor. A seventh packet may instruct the is found; also, store this 16x16 block in the archive with an camera to output only data from blue-filtered sensor cells. An associated identifier). Finally, the header authored by setup eighth packet may instruct the camera to disregard any auto module 34 may instruct stage 38c to replace the body of the focus instructions but instead capture a full frame at infinity packet with the single 16x16 block of FFT data dispatched to focus. And so on. the wireless interface. As before, the stage also edits the 0138 Each such packet 57 is provided from setup module packet header to delete (or mark) the instructions to which it 34 across abus or other data channel 60 to a camera controller responded, so that a header instruction field for the next module associated with the camera. (The details of a digital processing stage is the first to be encountered. camera—including an array of photosensor cells, associated 0145. In other arrangements, the addresses of the remote analog-digital converters and control circuitry, etc., are well computers are not hard-coded. Instead, stage 38c may be known to artisans and so are not belabored.) Camera 32 directed to hand-off the processed pixel packet to the Query captures digital image data in accordance with instructions in Router and Response Manager. This module examines the the header field 55 of the packet and stuffs the resulting image pixel packet to determine what type of processing is next data into a body 59 of the packet. It also deletes the camera required, and it routes it to an appropriate provider (which instructions 55 from the packet header (or otherwise marks may be in the cellphone if resources permit, or in the cloud— US 2013/0273968 A1 Oct. 17, 2013 among the stable of static providers, or to a provideridentified simply point to memory locations from which corresponding through an auction). The provider returns the requested out instructions are retrieved, e.g., by the corresponding process put data (e.g., texture classification information, and infor ing stage 38. mation about any matching FFT in the archive), and process 0153 Memory 79 can also facilitate adaptation of process ing continues per the next item of instruction in the pixel ing flow even if conditional branching is not employed. For packet header. example, a processing stage may yield output data that deter 0146 The data flow continues through as many functions mines parameters of a filter or other algorithm to be applied as a particular operation may require. by a later stage (e.g., a convolution kernel, a time delay, a pixel 0147 In the particular arrangement illustrated, each pro mask, etc). Such parameters may be identified by the former cessing stage 38 strips-out, from the packet header, the processing stage in memory (e.g., determined/calculated, and instructions on which it acted. The instructions are ordered in stored), and recalled for use by the later stage. In FIG. 19, for the header in the sequence of processing stages, so this example, processing stage 38 produces parameters that are removal allows each stage to look to the first instructions stored in memory 79. A subsequent processing stage 38c later remaining in the header for direction. Other arrangements, of retrieves these parameters, and uses them in execution of its course, can alternatively be employed. (For example, a mod assigned operation. (The information in memory can be ule may insert new information into the header—at the front, labeled to identify the module/provider from which they tail, or elsewhere in the sequence—based on processing originated, or to which they are destined

account, access facial recognition data for the users friends (e.g., send all FFT operations to the Fraunhofer Institute from that account, compute facial recognition features from service), or they can be specified by other attributes. For image data conveyed with the packet, determine a best match, example, a cellphone user may direct that all remote service and return result information (e.g., a name of a depicted requests are to be routed to providers that are ranked as fastest individual) back to the originating device. in a periodically updated Survey of providers (e.g., by Con 0173 At the IP address for the Google service, a server Sumers Union). The cell phone can periodically check the would undertake similar operations, but would refer to the published results for this information, or it can be checked user's Picasa account. Ditto for Facebook. dynamically when a service is requested. Another user may 0.174 Identifying a face from among faces for dozens or specify that service requests are to be routed to service pro hundreds of known friends is easier than identifying faces of viders that have highest customersatisfaction scores—again strangers. Other vendors may offer services of the latter sort. by reference to an online rating resource. Still another user For example, L-1 IdentitySolutions, Inc. maintains databases may specify that requests should be routed to the providers of images from government-issued credentials—such as driv having highest customer satisfaction scores—but only if the ers licenses. With appropriate permissions, it may offer service is provided for free; else route to the lowest cost facial recognition services drawing from Such databases. provider. Combinations of these arrangements, and others, 0175 Other processing operations can similarly be oper are of course possible. The user may, in a particular case, ated remotely. One is a barcode processor, which would take specify a particular service provider—trumping any selection processed image data sent from the mobile phone, apply a that would be made by the stored profile data. (See, also, the decoding algorithm particular to the type of barcode present. earlier discussion of remote service providers, including auc A service may support one, a few, or dozens of different types tion-based services, e.g., in connection with FIGS. 7-12.) ofbarcode. The decoded data may be returned to the phone, or the service provider can access further data indexed by the Pipe Manager decoded data, such as product information, instructions, pur 0183 In the FIG. 16 implementation, communications to chase options, etc., and return Such further data to the phone. and from the cloud are facilitated by a pipe manager 51. This (Or both can be provided.) module (which may be realized as the cellphone-side portion 0176 Another service is digital watermark reading. of the query router and response manager of FIG. 7) performs Another is optical character recognition (OCR). An OCR a variety of functions relating to communicating across a data service provider may further offer translation services, e.g., pipe 52. (It will be recognized that pipe 52 is a data construct converting processed image data into ASCII symbols, and that may comprise a variety of communication channels.) then submitting the ASCII words to a translation engine to 0.184 One function performed by pipe manager 51 is to render them in a different language. Other services are negotiate for needed communication resources. The cell sampled in FIG. 2. (Practicality prevents enumeration of the phone can employ a variety of communication networks and myriad other services, and component operations, that may commercial data carriers, e.g., cellular data, WiFi, Bluetooth, also be provided.) etc.—any or all of which may be utilized. Each may have its 0177. The output from the remote service provider is com own protocol stack. In one respect the pipe manager 51 inter monly returned to the cell phone. In many instances the acts with respective interfaces for these data channels—de remote service provider will return processed image data. In termining the availability of bandwidth for different data some cases it may return ASCII or other such data. Some payloads. times, however, the remote service provider may produce 0185. For example, the pipe manager may alert the cellular other forms of output, including audio (e.g., MP3) and/or data carrier local interface and network that there will be a video (e.g., MPEG4 and Adobe Flash). payload ready for transmission starting in about 450 millisec 0.178 Video returned to the cell phone from the remote onds. It may further specify the size of the payload (e.g., two provider may be presented on the cellphone display. In some megabits), its character (e.g., block data), and a needed qual implementations such video presents a user interface screen, ity of service (e.g., data throughput rate). It may also specify inviting the user to touch or gesture within the displayed a priority level for the transmission, so that the interface and presentation to select information oran operation, or issue an network can service such transmission ahead of lower-prior instruction. Software in the cell phone can receive such user ity data exchanges, in the event of a conflict. input and undertake responsive operations, or present respon 0186 The pipe manager knows the expected size of the sive information. payload due to information provided by the control processor 0179 Instill other arrangements, the data provided back to module 36. (In the illustrated embodiment, the control pro the cell phone from the remote service provider can include cessor module specifies the particular processing that will JavaScript or other such instructions. When run by the cell yield the payload, and so it can estimate the size of the phone, the JavaScript provides a response associated with the resulting data). The control processor module can also predict processed data referred out to the remote provider. the character of the data, e.g., whether it will be available as a 0180 Remote processing services can be provided under a fixed block or intermittently in bursts, the rate at which it will variety of different financial models. An Apple iPhone service be provided for transmission, etc. The control processor mod plan may be bundled with a variety of remote services at no ule 36 can also predict the time at which the data will be ready additional cost, e.g., iPhoto-based facial recognition. Other for transmission. The priority information, too, is known by services may bill on a per-use, monthly Subscription, or other the control processor module. In some instances the control usage plans. processor module autonomously sets the priority level. In 0181 Some services will doubtless behighly branded, and other instances the priority level is dictated by the user, or by marketed. Others may compete on quality; others on price. the particular application being serviced. 0182. As noted, stored data may indicate preferred provid 0187. For example, the user may expressly signal— ers for different services. These may be explicitly identified through the cellphone's graphical user interface, or a particu US 2013/0273968 A1 Oct. 17, 2013 lar application may regularly require, that an image-based load may be generated as originally scheduled, and the results action is to be processed immediately. This may be the case, may be locally buffered for transmission when the carrier and for example, where further action from the user is expected interface are able to do so. Or other action may be taken. based on the results of the image processing. In other cases the 0194 Consider a business gathering in which a participant user may expressly signal, or a particular application may gathers a group for a photo before dinner. The user may want normally permit, that an image-based action can be per all faces in the photo to be recognized immediately, so that formed whenever convenient (e.g., when needed resources they can be quickly reviewed to avoid the embarrassment of have low or nil utilization). This may be the case, for example, not recalling a colleague's name. Even before the user oper if a user is posting a Snapshot to a social networking site Such ates the cellphone's user-shutter button the control processor as Facebook, and would like the image annotated with names module causes the system to process frames of image data, of depicted individuals—through facial recognition process and is identifying apparent faces in the field of view (e.g., oval ing. Intermediate prioritization (expressed by the user, or by shapes, with two seeming eyes in expected positions). These the application) can also be employed, e.g., process within a may be highlighted by rectangles on the cell phone's view minute, ten minutes, an hour, a day, etc. finder (screen) display. 0188 In the illustrated arrangement, the control processor 0.195 While current cameras have picture-taking modes module 36 informs the pipe manager of the expected data based on lens/exposure profiles (e.g., close-up, night-time, size, character, timing, and priority, so that the pipe manager beach, landscape, Snow scenes, etc), imaging devices can use same in negotiating for the desired service. (In other employing principles of the present technology may addition embodiments, less or more information can be provided.) ally (or alternatively) have different image-processing 0189 If the carrier and interface can meet the pipe man modes. One mode may be selected by the user to obtain ager's request, further data exchange may ensue to prepare names of people depicted in a photo (e.g., through facial for the data transmission and ready the remote system for the recognition). Another mode may be selected to perform opti expected operation. For example, the pipe manager may cal character recognition of text found in an image frame. establish a secure Socket connection with a particular com Another may trigger operations relating to purchasing a puter in the cloud that is to receive that particular data pay depicted item. Ditto for selling a depicted item. Ditto for load, and identify the user. If the cloud computer is to perform obtaining information about a depicted object, Scene or per a facial recognition operation, it may prepare for the opera son (e.g., from Wikipedia, a social network, a manufacturers tion by retrieving from Apple/Google/Facebook the facial web site), etc. Ditto for establishing a ThinkPipe session with recognition features associated with friends of the specified the item, or a related system. Etc. USC. 0196. These modes may be selected by the user in advance 0190. Thus, in addition to preparing a channel for the of operating a shutter control, or after. In other arrangements, external communication, the pipe manager enables pre plural shutter controls (physical or GUI) are provided for the warming of the remote computer, to ready it for the expected user respectively invoking different of the available opera service request. (The service may request may not follow. In tions. (In still other embodiments, the device infers what Some instances the user may operate the shutter button, and operation(s) is/are possibly desired, rather than having the the cellphone may not know what operation will follow. Will user expressly indicate same.) the user request a facial recognition operation? A barcode 0197) If the user at the business gathering takes a group decoding operation? Posting of the image to Flickr or Face shot depicting twelve individuals, and requests the names on book? In some cases the pipe manager—or control processor an immediate basis, the pipeline manager 51 may report back module—may pre-warm several processes. Orit may predict, to the control processor module (or to application Software) based on past experience, what operation will be undertaken, that the requested service cannot be provided. Due to a bottle and warm appropriate resources. (E.g., if the user performed neck or other constraint, the manager 51 may report that facial recognition operations following the last three shutter identification of only three of the depicted faces can be operations, there’s a good chance the user will request facial accommodated within service quality parameters considered recognition again.) to constitute an “immediate' basis. Another three faces may 0191 Pre-warming can also include resources within the be recognized within two seconds, and recognition of the full cellphone, configuring processors, loading caches, etc. set of faces may be expected in five seconds. (This may be due 0.192 The situation just reviewed contemplates that to a constraint by the remote service provider, rather than the desired resources are ready to handle the expected traffic. In carrier, perse.) another situation the pipe manager may report that the carrier 0198 The control processor module 36 (or application is unavailable (e.g., due to the user being in a region of Software) may respond to this report in accordance with an impaired radio service). This information is reported to con algorithm, or by reference to a rule set stored in a local or trol processor module 36, which may change the schedule of remote data structure. The algorithm or rule set may conclude image processing, buffer results, or take other responsive that for facial recognition operations, delayed service should action. be accepted on whatever terms are available, and the user 0193 If other, conflicting, data transfers are underway, the should be alerted (through the device GUI) that there will be carrier or interface may respond to the pipe manager that the a delay of about N seconds before full results are available. requested transmission cannot be accommodated, e.g., at the Optionally, the reported cause of the expected delay may also requested time or with the requested quality of service. In this be exposed to the user. Other service exceptions may be case the pipe manager may report same to the control proces handled differently in some cases with the operation sor module 36. The control processor module may abort the aborted or rescheduled or routed to a less-preferred provider, process that was to result in the two megabit data service and/or with the user not alerted. requirement and reschedule it for later. Alternatively, the con 0199. In addition to considering the ability of the local trol processor module may decide that the two megabit pay device interface to the network, and the ability of the network/ US 2013/0273968 A1 Oct. 17, 2013

carrier, to handle the forecast data traffic (within specified 0208 If the user's position corresponds to a downtown parameters), the pipeline manager may also query resources street, and magnetometer and other position data indicates out in the cloud to ensure that they are able to perform she is looking north, inclined up from the horizontal, what’s whatever services are requested (within specified param likely to be of interest? Even without image data, a quick eters). These cloud resources can include, e.g., data networks reference to online resources such as Google Streetview can and remote computers. If any responds in the negative, or with suggest she's looking at business signage along 5" Avenue. a service level qualification, this too can be reported back to Maybe feature recognition reference data for this geography the control processor module 36, so that appropriate action should be downloaded into the cache for rapid matching can be taken. against to-be-acquired image data. 0200. In response to any communication from the pipe 0209 To speed performance, the cache should be loaded manager 51 indicating possible trouble servicing the in a rational fashion—So that the most likely object is con expected data flow, the control process 36 may issue corre sidered first. Google Streetview for that location includes sponding instructions to the pipe manager and/or other mod metadata indicating 5" Avenue has signs for a Starbucks, a ules, as necessary. Nordstrom store, and a That restaurant. Stored profile data for 0201 In addition to the just-detailed tasks of negotiating the user reveals she visits Starbucks daily (she has their in advance for needed services, and setting up appropriate branded loyalty card); she is a frequent clothes shopper (albeit data connections, the pipe manager can also act as a flow control manager—orchestrating the transfer of data from the with a Macy's, rather than a Nordstrom's charge card); and different modules out of the cell phone, resolving conflicts, she's never eaten at a That restaurant. Perhaps the cache and reporting errors back to the control processor module 36. should be loaded so as to most quickly identify the Starbucks 0202 While the foregoing discussion has focused on out sign, followed by Nordstrom, followed by the That restaurant. bound data traffic, there is a similar flow inbound, back to the 0210 Low resolution imagery captured for presentation cellphone. The pipe manager (and control processor module) on the viewfinder fails to trigger the camera's feature high can help administer this traffic as well providing services lighting probable faces (e.g., for exposure optimization pur complementary to those discussed in connection with out poses). That helps. There's no need to pre-warm the complex bound traffic. processing associated with facial recognition. 0203. In some embodiments, there may be a pipe manager 0211 She touches the virtual shutter button, capturing a counterpart module 53 out in the cloud—cooperating with frame of high resolution imagery, and image analysis gets pipe manager 51 in the cell phone in performance of the underway—trying to recognize what’s in the field of view, so detailed functionality. that the camera application can overlay a ranked ordering of graphical links related to objects in the captured frame. Branch Prediction; Commercial Incentives Unlike Google web search—which ranks search results in an 0204 The technology of branch prediction arose to meet order based on aggregate user data, the camera application the needs of increasingly complex processor hardware; it attempts a ranking customized to the user's profile. If a Star allows processors with lengthy pipelines to fetch data and bucks sign or logo is found in the frame, the Starbucks link instructions (and in some cases, execute the instructions), gets top position for this user. without waiting for conditional branches to be resolved. 0212. If signs for Starbucks, Nordstrom, and the That res 0205. A similar science can be applied in the present con taurant are all found, links would normally be presented in text predicting what action a human user will take. For that order (per the user's preferences inferred from profile example, as discussed above, the just-detailed system may data). However, the cell phone application has a capitalistic “pre-warm certain processors, or communication channels, bent and is willing to promote a link by a position or two in anticipation that certain data or processing operations will (although perhaps not to the top position) if circumstances be forthcoming. warrant. In the present case, the cell phone routinely sent IP 0206 When a user removes an iPhone from her purse packets to the web servers at addresses associated with each (exposing the sensor to increased light) and lifts it to eye level of the links, alerting them that an iPhone user had recognized (as sensed by accelerometers), what is she about to do? Ref their corporate signage from a particular latitude/longitude. erence can be made to past behavior to make a prediction. (Other user data may also be provided, if privacy consider Particularly relevant may include what the user did with the ations and user permissions allow.) The That restaurant server phone camera the last time it was used; what the user did with responds back in an instant—offering to the next two custom the phone camera at about the same time yesterday (and at the ers 25% offany one item (the restaurant's point of sale system same time a week ago); what the user last did at about the indicates only four tables are occupied and no order is pend same location; etc. Corresponding actions can be taken in ing; the cook is idle). The restaurant server offers three cents anticipation. if the phone will present the discount offer to the user in its 0207. If her latitude/longitude correspond to a location presentation of search results, or five cents if it will also within a video rental store, that helps. Expect to maybe per promote the link to second place in the ranked list, or ten cents form image recognition on artwork from a DVD box. To if it will do that and be the only discount offer presented in the speed possible recognition, perhaps SIFT or other feature results list. (Starbucks also responded with an incentive, but recognition reference data should be downloaded for candi not as attractive). The cell phone quickly accepts the restau date DVDs and stored in a cell phone cache. Recent releases rants offer, and payments are quickly made—either to the are good prospects (except those rated G, or rated high for user (e.g., defraying the monthly phone bill) or more likely to violence—stored profile data indicates the user just doesn't the phone carrier (e.g., AT&T). Links are presented to Star have a history of watching those). So are movies that she's bucks, the That restaurant, and Nordstrom, in that order, with watched in the past (as indicated by historical rental the restaurants link noting the discount for the next two records—also available to the phone). CuStOmerS. US 2013/0273968 A1 Oct. 17, 2013

0213 Google's AdWord technology has already been 0218. Sometimes alleviating a bandwidth bottleneck noted. It decides, based on factors including a reverse-auction requires opening a bandwidth throttle on a cellphone end of determined payment, which ads to present as Sponsored the wireless link Or the bandwidth service change must be Links adjacent the results of a Google web search. Google has requested, or authorized, by the cell phone. In Such case adapted this technology to presentads on third party web sites MySpace can tell the cell phone application to take needed and blogs, based on the particular contents of those sites, steps for higher bandwidth service, and MySpace will rebate terming the service AdSense. to the user (or to the carrier, for benefit of the user's account) 0214. In accordance with another aspect of the present the extra associated costs. technology, the AdWord/AdSense technology is extended to 0219. In some arrangements, the quality of service (e.g., visual image search on cellphones. bandwidth) is managed by pipe manager 51. Instructions 0215 Consider a user located in a small bookstore who from MySpace may request that the pipe manager start snaps a picture of the Warren Buffet biography Snowball. The requesting augmented service quality, and setting up the book is quickly recognized, but rather than presenting a cor expected high bandwidth session, even before the user selects responding Amazon link atop the list (as may occur with a the MySpace link regular Google search), the cell phone recognizes that the 0220. In some scenarios, Vendors may negotiate preferen user is located in an independent bookstore. Context-based tial bandwidth for its content. MySpace may make a deal with rules consequently dictate that it present a non-commercial AT&T, for example, that all MySpace content delivered to link first. Top ranked of this type is a Wall Street Journal AT&T phone subscribers be delivered at 10 megabits per review of the book, which goes to the top of the presented list second—eventhough most Subscribers normally only receive of links Decorum, however, only goes so far. The cell phone 3 megabits per second service. The higher quality service may passes the book title or ISBN (or the image itself) to Google be highlighted to the user in the presented links AdSense or AdWords, which identifies sponsored links to be associated with that object. (Google may independently per Ambient Light form its own image analysis on any provided imagery. In 0221 Many artificial light sources do not provide a con Some cases it may pay for Such cell phone-submitted imag sistent illumination. Most exhibit a temporal variation in ery—since Google has a knack for exploiting data from intensity (luminance) and/or color. These variations com diverse sources) Per Google, Barnes and Noble has the top monly track the AC power frequency (50/60 or 100/120 Hz). sponsored position, followed by alldiscountbooks-dot-net. but sometimes do not. For example, fluorescent tubes can give The cellphone application may present these sponsored links of infrared illumination that varies at a ~40 KHZ rate. The in a graphically distinct manner to indicate their origin (e.g., emitted spectra depend on the particular lighting technology. in a different part of the display, or presented in a different Organic LEDs for domestic and industrial lighting sometimes color), or it may insert them alternately with non-commercial can use distinct color mixtures (e.g., blue and amber) to make search results, i.e., at positions two and four. The AdSense white. Others employ more traditional red/green/blue clus revenue collected by Google can again be shared with the ters, or blue/UV LEDs with phosphors. user, or with the user's carrier. 0222. In one particular implementation, a processing stage 0216. In some embodiments, the cell phone (or Google) 38 monitors, e.g., the average intensity, redness or greenness again pings the servers of companies for whom links will be of the image data contained in the bodies of packets. This presented—helping them track their physical world-based intensity data can be applied to an output 33 of that stage. online visibility. The pings can include the location of the With the image data, each packet can convey a timestamp user, and an identification of the object that prompted the indicating the particular time (absolute, or based on a local ping. When alldiscountbooks-dot-net receives the ping, it clock) at which the image data was captured. This time data, may check inventory and find it has a significant overstock of too, can be provided on output 33. Snowball. As in the example earlier given, it may offer an 0223) A synchronization processor 35 coupled to such an extra payment for some extra promotion (e.g., including “We output 33 can examine the variation in frame-to-frame inten have 732 copies—cheap” in the presented link) sity (or color), as a function of timestamp data, to discern its 0217. In addition to offering an incentive for a more periodicity. Moreover, this module can predict the next time prominent search listing (e.g., higher in the list, or augmented instant at which the intensity (or color) will have a maxima, with additional information), a company may also offer addi minima, or other particular state. A phase-locked loop may tional bandwidth to serve information to a customer. For control an oscillator that is synced to mirror the periodicity of example, a user may capture video imagery from an elec an aspect of the illumination. More typically, a digital filter tronic billboard, and want to download a copy to show to computes a time interval that is used to set or compare against friends. The user's cellphone identifies the content as a popu timers—optionally with software interrupts. A digital lar clip of user generated content (e.g., by reference to an phased-locked loop or delay-locked loop can also be used. (A encoded watermark), and finds that the clip is available from Kalman filter is commonly used for this type of phase lock several sites—the most popular of which is YouTube, fol ing.) lowed by MySpace. To induce the user to link to MySpace, 0224 Control processor module 36 can poll the synchro MySpace may offer to upgrade the user's baseline wireless nization module 35 to determine when a lighting condition is service from 3 megabits per second to 10 megabits per sec expected to have a desired state. With this information, con ond, so the video will download in a third of the time. This trol processor module 36 can direct setup module 34 to cap upgraded service can be only for the video download, or it can ture a frame of data under favorable lighting conditions for a be longer. The link presented on the screen of the user's cell particular purpose. For example, if the camera is imaging an phone can be amended to highlight the availability of the object Suspected of having a digital watermark encoded in a faster service. (Again, MySpace may make an associated green color channel, processor 36 may direct camera 32 to payment.) capture a frame of imagery at an instant that green illumina US 2013/0273968 A1 Oct. 17, 2013 tion is expected to be at a maximum, and direct processing camera is unable to distinguishahuman face from a picture of stages 38 to process that frame for detection of such a water a face (e.g., as may be found in a magazine, on a billboard, or mark. on an electronic display screen). With spaced-apart sensors, in contrast, the 3D aspect of the picture can readily be dis Other Notes: Projectors cerned, allowing a picture to be distinguished from a person. 0225. While a packet-based, data driven architecture is (Depending on the implementation, it may be the 3D aspect of shown in FIG. 16, a variety of other implementations are of the person that is actually discerned.) course possible. Such alternative architectures are straight 0235 Another function that benefits from multiple cam forward to the artisan, based on the detailed embodiment. eras is refinement of geolocation. From differences between 0226. The artisan will appreciate that the arrangements two images, a processor can determine the device's distance and details noted above are arbitrary. Actual choices of from landmarks whose location may be precisely known. arrangement and detail will depend on the particular applica This allows refinement of other geolocation data available to tion being served, and most likely will be different than those the device (e.g., by WiFi node identification, GPS, etc.) noted. (To cite but a trivial example, FFTs need not be per 0236. Just as a cell phone may have one, two (or more) formed on 16x16 blocks, but can be done on 64x64,256x256, sensors, such a device may also have one, two (or more) the whole image, etc.) projectors. Individual projectors are being deployed in cell 0227 Similarly, it will be recognized that the body of a phones by CKing (the N70 model, distributed by ChinaVi packet can convey an entire frame of data, or just excerpts sion) and Samsung. LG and others have shown prototypes. (e.g., a 128x128 block). Image data from a single captured (These projectors are understood to use Texas Instruments frame may thus span a series of several packets. Different electronically-steerable digital micro-mirror arrays, in con excerpts within a common frame may be processed differ junction with LED or laser illumination.) Microvision offers ently, depending on the packet with which they are conveyed. the PicoP Display Engine, which can be integrated into a 0228. Moreover, a processing stage 38 may be instructed variety of devices to yield projector capability, using a micro to break a packet into multiple packets—such as by splitting electro-mechanical scanning mirror (in conjunction with image data into 16 tiled Smaller Sub-images. Thus, more laser Sources and an optical combiner). Other Suitable pro packets may be present at the end of the system than were jection technologies include 3M's liquid crystal on silicon produced at the beginning. (LCOS) and Displaytech's ferroelectric LCOS systems. 0229. In like fashion, a single packet may contain a col 0237 If a reference pattern (e.g., a grid) is projected onto lection of data from a series of different images (e.g., images a surface, the shape of the surface is revealed by distortions of taken sequentially—with different focus, aperture, or shutter the pattern. The FIG. 16 architecture can be expanded to settings; a particular example is a set of focus regions from include a projector, which projects a pattern onto an object, five images taken with focus bracketing.) This set of data may for capture by the camera system. (Operation of the projector then be processed by later stages—either as a set, or through can be synchronized with operation of the camera, e.g., by a process that selects one or more excerpts of the packet control processor module 36 with the projector activated payload that meet specified criteria (e.g., a focus sharpness only as necessary, since it imposes a significant battery drain.) metric). Processing of the resulting image by modules 38 (local or 0230. In the particular example detailed, each processing remote) provides information about the Surface topology of stage 38 substituted the result of its processing for the data the object. This topology information can be an important originally received in the body of the packet. In other arrange clue in identifying the object. ments this need not be the case. For example, a stage may 0238. In addition to providing information about the shape output a result of its processing to a module outside the of a surface, the shape information allows the surface to be depicted processing chain, e.g., on an output 33. In another virtually re-mapped to any other configuration, e.g., flat. Such case, a stage may maintain in the body of the output remapping serves as a sort of normalization operation. packet—the data originally received, and augment it with 0239. In one particular arrangement, system 30 operates a further data—such as the result(s) of its processing. projector to project a reference pattern into the camera's field 0231 Reference was made to determining focus by refer of view. While the pattern is being projected, the camera ence to DCT frequency spectra, or edge detected data. Many captures a frame of image data. The resulting image is pro consumer cameras perform a simpler form of focus check— cessed to detect the reference pattern, and therefrom charac simply by determining the intensity difference (contrast) terize the shape of an imaged object. Subsequent processing between pairs of adjacent pixels. This difference peaks with then follows, based on the shape data. correct focus. Such an arrangement can naturally be used in 0240. In connection with such arrangements, the reader is the detailed arrangements. referred to the Google book-scanning U.S. Pat. No. 7,508, 0232 Each stage typically conducts a handshaking 978, which employs related principles. (That patent details a exchange with an adjoining stage—each time data is passed particularly useful reference pattern, among other relevant to or received from the adjoining stage. Such handshaking is disclosures.) routine to the artisan familiar with digital system design, so is 0241 The lens arrangement employed in the cell phone's not belabored here. projector system can also be used for the cellphone's camera 0233. The detailed arrangements contemplated a single system. A mirror can be controllably moved to steer the image sensor. However, in other embodiments, multiple camera or the projector to the lens. Or a beam-splitter image sensors can be used. In addition to enabling conven arrangement 80 can be used (FIG. 20). Here the body of a cell tional stereoscopic processing, two or more image sensors phone 81 incorporates a lens 82, which provides a light to a enable or enhance many other operations. beam-splitter 84. Part of the illumination is routed to the 0234 One function that benefits from multiple cameras is camera sensor 12. The otherpart of the optical pathgoes to the distinguishing objects. To cite a simple example, a single micro-mirror projector system 86. US 2013/0273968 A1 Oct. 17, 2013

0242 Lenses used in cell phone projectors typically are Router and Response Manager functionality detailed ear larger aperture than those used for cellphone cameras, so the lier—including both static and auction-based service provid camera may gain significant performance advantages (e.g., CS. enabling shorter exposures) by use of such a shared lens. Or, 0251 Still another hardware architecture makes use of a reciprocally, the beam splitter 84 can be asymmetrical—not field programmable object array (FPOA) arrangement, in equally favoring both optical paths. For example, the beam which hundreds of diverse 16-bit “objects’ are arrayed in a splitter can be a partially-silvered element that couples a gridded node fashion, with each being able to exchange data smaller fraction (e.g., 2%, 8%, or 25%) of externally incident with neighboring devices through very high bandwidth chan light to the sensor path 83. The beam-splitter may thus serve nels. The functionality of each can be reprogrammed, as with to couple a larger fraction (e.g., 98%, 92%, or 75%) of illu FPGAs. Again different of the image processing tasks can be mination from the micro-mirror projector externally, for pro performed by different of the FPOA objects. These tasks can jection. By this arrangement the camera sensor 12 receives be redefined on the fly, as needed (e.g., an object may perform light of a conventional—for a cell phone camera—intensity SIFT processing in one state; FFT processing in another state; (notwithstanding the larger aperture lens), while the light log-polar processing in a further state, etc.). output from the projector is only slightly dimmed by the lens 0252) (While many grid arrangements of logic devices are sharing arrangement. based on “nearest neighbor’ interconnects, additional flex 0243 (Those skilled in optical system design will recog ibility can be achieved by use of a “partial crossbar intercon nize a number of alternatives to the arrangements particularly nect. See, e.g., U.S. Pat. No. 5,448,496 (Quickturn Design noted.) Systems).) 0244. Some of the advantages that accrue from having two cameras can be realized by having two projectors (with a 0253 Also in the realm of hardware, certain embodiments single camera). For example, the two projectors can project of the present technology employ “extended depth of field” alternating or otherwise distinguishable patterns (e.g., simul imaging systems (see, e.g., U.S. Pat. Nos. 7,218,448, 7.031, taneous, but of differing color, pattern, polarization, etc) into 054 and 5,748,371). Such arrangements include a mask in the the camera's field of view. By noting how the two patterns— imaging path that modifies the optical transfer function of the system so as to be insensitive to the distance between the projected from different points—differ when presented on an object and the imaging system. The image quality is then object and viewed by the camera, stereoscopic information uniformly poor over the depth offield. Digital post processing can again be discerned. of the image compensates for the mask modifications, restor Selected Other Hardware Arrangements ing image quality, but retaining the increased depth of field. Using Such technology, the cell phone camera can capture 0245. In addition to the arrangements earlier detailed, imagery having both nearer and further Subjects all in focus another hardware arrangement suitable for use with the (i.e., with greater high frequency detail), without requiring present technology uses the Mali-400 ARM graphics multi longer exposures—as would normally be required. (Longer processor architecture, which includes plural fragment pro exposures exacerbate problems such as hand-jitter, and mov cessors that can be devoted to the different types of image ing Subjects.) In the arrangements detailed here, shorter expo processing tasks referenced in this document. Sures allow higher quality imagery to be provided to image 0246 The standards group Khronos has issued OpenGL processing functions without enduring the temporal delay ES2.0, which defines hundreds of standardized graphics created by optical/mechanical focusing elements, or requir function calls for systems that include multiple CPUs and ing input from the user as to which elements of the image multiple GPUs (a direction in which cell phones are increas should be in focus. This provides for a much more intuitive ingly migrating). OpenGL ES2.0 attends to routing of differ experience, as the user can simply point the imaging device at ent operations to different of the processing units—with Such the desired target without worrying about focus or depth of details being transparent to the application Software. It thus field settings. Similarly, the image processing functions are provides a consistent software API usable with all manner of able to leverage all the pixels included in the image/frame GPU/CPU hardware. captured, as all are expected to be in-focus. In addition, new 0247. In accordance with another aspect of the present metadata regarding identified objects or groupings of pixels technology, OpenGL ES2. standard is extended to provide a related to depth within the frame can produce simple “depth standardized graphics processing library not just across dif map.” Setting the stage for 3D video capture and storage of ferent CPU/GPU hardware, but also across different cloud Video streams using emerging standards on transmission of processing hardware—again with Such details being trans depth information. parent to the calling Software. 0248 Increasingly, Java service requests (JSRs) have been Leveraging Existing Image Collections, E.g., for Metadata defined to standardize certain Java-implemented tasks. JSRs increasingly are designed for efficient implementations on 0254 Collections of publicly-available imagery and other top of OpenGL ES2.0 class hardware. content are becoming more prevalent. Flickr, YouTube, Pho 0249. In accordance with a still further aspect of the tobucket (MySpace), Picasa, Zooomr, FaceBook, Webshots present technology, Some or all of the image processing and Google Images are just a few. Often, these resources can operations noted in this specification (facial recognition, also serve as sources of metadata—either expressly identified SIFT processing, watermark detection, histogram process as such, or inferred from data Such as file names, descriptions, ing, etc.) can be implemented as JSRs—providing standard etc. Sometimes geo-location data is also available. ized implementations that are Suitable across diverse hard 0255. An illustrative embodiment according to one aspect ware platforms. of the present technology works as follows. A captures a cell 0250 In addition to supporting cloud-based JSRs, the phone picture of an object, or scene—perhaps a desk tele extended Standards specification can also support the Query phone, as shown in FIG. 21. (The image may be acquired in US 2013/0273968 A1 Oct. 17, 2013

other manners as well. Such as transmitted from another user, 0279 Telecommunications (1) or downloaded from a remote computer.) (0280 Uninett (1) 0256 As a preliminary operation, known image process (0281 Work (1) ing operations may be applied, e.g., to correct color or con 0282. From this aggregated set of inferred metadata, it trast, to perform ortho-normalization, etc. on the captured may be assumed that those terms with the highest count image. Known image object segmentation or classification values (e.g., those terms occurring most frequently) are the techniques may also be used to identify an apparent Subject terms that most accurately characterize the user's FIG. 21 region of the image, and isolate same for further processing. image. 0257 The image data is then processed to determine char 0283. The inferred metadata can be augmented or acterizing features that are useful in pattern matching and enhanced, if desired, by known image recognition/classifica recognition. Color, shape, and texture metrics are commonly tion techniques. Such technology seeks to provide automatic used for this purpose. Images may also be grouped based on recognition of objects depicted in images. For example, by layout and eigenvectors (the latter being particularly popular recognizing a TouchTone keypad layout, and a coiled cord, for facial recognition). Many other technologies can of course Such a classifier may label the FIG. 21 image using the terms be employed, as noted elsewhere in this specification. Telephone and Facsimile Machine. 0258 (Uses of vector characterizations/classifications and 0284. If not already present in the inferred metadata, the other image/video/audio metrics in recognizing faces, imag terms returned by the image classifier can be added to the list ery, video, audio and other patterns are well known and Suited and given a count value. (An arbitrary value, e.g. 2, may be for use in connection with the present technology. See, e.g., used, or a value dependent on the classifiers reported confi patent publications 20060020630 and 20040243567 (Digi dence in the discerned identification can be employed.) marc), 20070239756 and 20020037083 (Microsoft), 0285 If the classifier yields one or more terms that are 20070237364 (Fuji Photo Film), U.S. Pat. Nos. 7,359,889 already present, the position of the term(s) in the list may be and 6,990,453 (Shazam), 20050180635 (Corel), U.S. Pat. elevated. One way to elevate a terms position is by increasing Nos. 6,430.306, 6,681,032 and 20030059124 (L-1 Corp.), its count value by a percentage (e.g., 30%). Another way is to U.S. Pat. Nos. 7,194,752 and 7,174.293 (Iceberg), 7,130,466 increase its count value to one greater than the next-above (Cobion), 6,553,136 (Hewlett-Packard), and 6,430.307 (Mat term that is not discerned by the image classifier. (Since the sushita), and the journal references cited at the end of this classifier returned the term “Telephone' but not the term disclosure. When used in conjunction with recognition of “Cisco, this latter approach could rank the term Telephone entertainment content such as audio and video, such features with a count value of "19" one above Cisco.) A variety of are sometimes termed content “fingerprints' or “hashes.) other techniques for augmenting/enhancing the inferred 0259. After feature metrics for the image are determined, metadata with that resulting from the image classifier are a search is conducted through one or more publicly-acces straightforward to implement. sible image repositories for images with similar metrics, 0286 A revised listing of metadata, resulting from the thereby identifying apparently similar images. (As part of its foregoing, may be as follows: image ingest process, Flickr and other Such repositories may calculate eigenvectors, color histograms, keypoint descrip (0287 Telephone (19) tors, FFTs, or other classification data on images at the time (0288 Cisco (18) they are uploaded by users, and collect same in an index for 0289 Phone (10) public search.) The search may yield the collection of appar 0290 VOIP (7) ently similar telephone images found in Flickr, depicted in 0291 IP (5) FIG 22. 0292 7941 (3) 0260 Metadata is then harvested from Flickr for each of 0293 Phones (3) these images, and the descriptive terms are parsed and ranked 0294 Technology (3) by frequency of occurrence. In the depicted set of images, for 0295 7960 (2) example, the descriptors harvested from Such operation, and 0296 Facsimile Machine (2) their incidence of occurrence, may be as follows: 0297. 7920 (1) 0261) Cisco (18) 0298 7950 (1) 0262 Phone (10) 0299 Best Buy (1) 0263 Telephone (7) 0300. Desk (1) 0264 VOIP (7) (0301 Ethernet (1) 0265 IP (5) (0302) IP-phone (1) 0266 7941 (3) 0303 Office (1) 0267 Phones (3) (0304 Pricey (1) 0268 Technology (3) 0305 Sprint (1) 0269 7960 (2) 0306 Telecommunications (1) (0270 7920 (1) 0307 Uninett (1) (0271 7950 (1) 0308 Work (1) (0272 Best Buy (1) 0309 The list of inferred metadata can be restricted to 0273. Desk (1) those terms that have the highest apparent reliability, e.g., (0274 Ethernet (1) count values. A Subset of the list comprising, e.g., the top N 0275 IP-phone (1) terms, or the terms in the top Mth percentile of the ranked 0276. Office (1) listing, may be used. This Subset can be associated with the (0277 Pricey (1) FIG. 21 image in a metadata repository for that image, as 0278 Sprint (1) inferred metadata. US 2013/0273968 A1 Oct. 17, 2013

0310. In the present example, if N=4, the terms Telephone, user's image. Examples include color correction/matching, Cisco, Phone and VOIP are associated with the FIG. 21 contrast correction, glare reduction, removing foreground/ image. background objects, etc. By Such arrangement, for example, 0311. Once a list of metadata is assembled for the FIG. 21 Such a system may discern that the FIG. 21 image has fore image (by the foregoing procedure, or others), a variety of ground components (apparently Post-It notes) on the tele operations can be undertaken. phone that should be masked or disregarded. The user's 0312 One option is to submit the metadata, along with the image data can be enhanced accordingly, and the enhanced captured content or data derived from the captured content image data used thereafter. (e.g., the FIG. 21 image, image feature data Such as eigen 0318 Relatedly, the user's image may suffer some vectors, color histograms, keypoint descriptors, FFTs. impediment, e.g., such as depicting its Subject from an odd machine readable data decoded from the image, etc), to a perspective, or with poor lighting, etc. This impediment may service provider that acts on the Submitted data, and provides cause the user's image not to be recognized by the service a response to the user. Shazam, Snapnow, ClusterMedia Labs, provider (i.e., the image data Submitted by the user does not Snaptell, Mobot, Mobile Acuity, Nokia Point & Find, seem to match any image data in the database being Kooaba, idée TinEye, and Digimarc Mobile, are a few of searched). Either in response to such a failure, or proactively, several commercially available services that capture media data from similar images identified from Flickr may be sub content, and provide a corresponding response; others are mitted to the service provider as alternatives—hoping they detailed in the earlier-cited patent publications. By accompa might work better. nying the content data with the metadata, the service provider 0319. Another approach—one that opens up many further can make a more informed judgment as to how it should possibilities—is to search Flickr for one or more images with respond to the user's Submission. similar image metrics, and collect metadata as described 0313 The service provider—or the user's device—can herein (e.g., Telephone, Cisco, Phone, VOIP). Flickr is then Submit the metadata descriptors to one or more other services, searched a second time, based on metadata. Plural images e.g., a web search engine Such as Google, to obtain a richerset with similar metadata canthereby be identified. Data for these of auxiliary information that may help better discern/infer/ further images (including images with a variety of different intuit an appropriate desired by the user. Or the information perspectives, different lighting, etc.) can then be submitted to obtained from Google (or other Such database resource) can the service provider—notwithstanding that they may “look” be used to augment/refine the response delivered by the ser different than the user's cellphone image. vice provider to the user. (In some cases, the metadata— 0320 When doing metadata-based searches, identity of possibly accompanied by the auxiliary information received metadata may not be required. For example, in the second from Google—can allow the service provider to produce an search of Flickr just-referenced, four terms of metadata may appropriate response to the user, without even requiring the have been associated with the user's image: Telephone, image data.) Cisco, Phone and VOIP. A match may be regarded as an 0314. In some cases, one or more images obtained from instance in which a Subset (e.g., three) of these terms is found. Flickr may be substituted for the user's image. This may be 0321) Another approach is to rank matches based on the done, for example, if a Flickr image appears to be of higher rankings of shared metadata terms. An image tagged with quality (using sharpness, illumination histogram, or other Telephone and Cisco would thus be ranked as a better match measures), and if the image metrics are Sufficiently similar. than an image tagged with Phone and VOIP. One adaptive (Similarity can be judged by a distance measure appropriate way to rank a “match is to sum the counts for the metadata to the metrics being used. One embodiment checks whether descriptors for the user's image (e.g., 19+18+10+7=54), and the distance measure is below a threshold. If several alternate then tally the count values for shared terms in a Flickr image images pass this screen, then the closest image is used.) Or (e.g., 35, if the Flickr image is tagged with Cisco, Phone and Substitution may be used in other circumstances. The Substi VOIP). The ratio can then be computed (35/54) and compared tuted image can then be used instead of (or in addition to) the to a threshold (e.g., 60%). In this case, a “match’ is found. A captured image in the arrangements detailed herein. variety of other adaptive matching techniques can be devised 0315. In one such arrangement, the substitute image data by the artisan. is submitted to the service provider. In another, data for sev 0322 The above examples searched Flickr for images eral Substitute images are Submitted. In another, the original based on similarity of image metrics, and optionally on simi image data—together with one or more alternative sets of larity of textual (semantic) metadata. Geolocation data (e.g., image data—are Submitted. In the latter two cases, the service GPS tags) can also be used to get a metadata toe-hold. provider can use the redundancy to help reduce the chance of 0323 If the user captures an arty, abstract shot of the Eiffel error—assuring an appropriate response is provided to the tower from amid the metalwork or another unusual vantage user. (Or the service provider can treat each submitted set of point (e.g., FIG. 29), it may not be recognized—from image image data individually, and provide plural responses to the metrics—as the Eiffel tower. But GPS info captured with the user. The client software on the cellphone can then assess the image identifies the location of the image subject. Public different responses, and pick between them (e.g., by a Voting databases (including Flickr) can be employed to retrieve tex arrangement), or combine the responses, to help provide the tual metadata based on GPS descriptors. Inputting GPS user an enhanced response.) descriptors for the photograph yields the textual descriptors 0316. Instead of substitution, one or more related public Paris and Eiffel. image(s) may be composited or merged with the user's cell 0324 Google Images, or another database, can be queried phone image. The resulting hybrid image can then be used in with the terms Eiffel and Paris to retrieve other, more perhaps the different contexts detailed in this disclosure. conventional images of the Eiffel tower. One or more of those 0317. A still further option is to use apparently-similar images can be submitted to the service provider to drive its images gleaned from Flickr to inform enhancement of the process. (Alternatively, the GPS information from the user's US 2013/0273968 A1 Oct. 17, 2013

image can be used to search Flickr for images from the same 0333 Each uncertain descriptor may be given a confi location; yielding imagery of the Eiffel Tower that can be dence metric or rank. This data may be determined by the submitted to the service provider.) public, either expressly or inferentially. An example is the 0325 Although GPS is gaining in camera-metadata-de case when a user sees a Flickr picture she believes to be from ployment, most imagery presently in Flickr and other public Yellowstone, and adds a “Yellowstone location tag, together databases is missing geolocation info. But GPS info can be with a "95% confidence tag (her estimation of certainty automatically propagated across a collection of imagery that about the contributed location metadata). She may add an share visible features (by image metrics Such as eigenvectors, alternate location metatag, indicating "Montana, together color histograms, keypoint descriptors, FFTs, or other clas with a corresponding 50% confidence tag. (The confidence sification techniques), or that have a metadata match. tags needn't sum to 100%. Just one tag can be contributed— 0326 To illustrate, if the user takes a cellphone picture of with a confidence less than 100%. Or several tags can be a city fountain, and the image is tagged with GPS informa contributed possibly overlapping, as in the case with Yel tion, it can be submitted to a process that identifies matching lowstone and Montana). Flickr/Google images of that fountain on a feature-recogni 0334. If several users contribute metadata of the same type tion basis. To each of those images the process can add GPS to an image (e.g., location metadata), the combined contribu information from the user's image. tions can be assessed to generate aggregate information. Such 0327. A second level of searching can also be employed. information may indicate, for example, that 5 of 6 users who From the set of fountain images identified from the first contributed metadata tagged the image as Yellowstone, with search based on similarity of appearance, metadata can be an average 93% confidence; that 1 of 6 users tagged the image harvested and ranked, as above. Flickr can then be searched a as Montana, with a 50% confidence, and 2 of 6 users tagged second time, for images having metadata that matches within the image as Glacier National park, with a 15% confidence, a specified threshold (e.g., as reviewed above). To those etc images, too, GPS information from the user's image can be 0335) Inferential determination of metadata reliability can added. be performed, either when express estimates made by con tributors are not available, or routinely. An example of this is 0328. Alternatively, or in addition, a first set of images in the FIG. 21 photo case, in which metadata occurrence counts Flickr/Google similar to the user's image of the fountain can are used to judge the relative merit of each item of metadata be identified—not by pattern matching, but by GPS-matching (e.g., Telephone-19 or 7, depending on the methodology (or both). Metadata can be harvested and ranked from these used). Similar methods can be used to rank reliability when GPS-matched images. Flickr can be searched a second time several metadata contributors offer descriptors for a given for a second set of images with similar metadata. To this image. second set of images, GPS information from the user's image 0336 Crowd-sourcing techniques are known to parcel can be added. image-identification tasks to online workers, and collect the 0329. Another approach to geolocating imagery is by results. However, prior art arrangements are understood to searching Flickr for images having similar image character seek simple, short-term consensus on identification. Better, it istics (e.g., gist, eigenvectors, color histograms, keypoint seems, is to quantify the diversity of opinion collected about descriptors, FFTs, etc.), and assessing geolocation data in the image contents (and optionally its variation over time, and identified images to infer the probable location of the original information about the sources relied-on), and use that richer image. See, e.g., Hays, etal, IM2GPS: Estimating geographic data to enable automated systems to make more nuanced information from a single image, Proc. of the IEEE Conf. on decisions about imagery, its value, its relevance, its use, etc. ComputerVision and Pattern Recognition, 2008. Techniques 0337 To illustrate, known crowd-sourcing image identi detailed in the Hays paper are Suited for use in conjunction fication techniques may identify the FIG. 35 image with the with the present technology (including use of probability identifiers “soccer ball and “dog. These are the consensus functions as quantizing the uncertainty of inferential tech terms from one or several viewers. Disregarded, however, niques). may be information about the long tail of alternative descrip 0330. When geolocation data is captured by the camera, it tors, e.g., Summer, Labrador, football, tongue, afternoon, is highly reliable. Also generally reliable is metadata (loca evening, morning, fescue, etc. Also disregarded may be tion or otherwise) that is authored by the proprietor of the demographic and other information about the persons (or image. However, when metadata descriptors (geolocation or processes) that served as metadata identifiers, or the circum semantic) are inferred or estimated, or authored by a stranger stances of their assessments. A richer set of metadata may to the image, uncertainty and other issues arise. associate with each descriptor a set of sub-metadata detailing 0331 Desirably, such intrinsic uncertainty should be this further information. memorialized in some fashion so that later users thereof (hu 0338. The sub-metadata may indicate, for example, that man or machine) can take this uncertainty into account. the tag “football was contributed by a 21 year old male in 0332 One approach is to segregate uncertain metadata Brazil on Jun. 18, 2008. It may further indicate that the tags from device-authored or creator-authored metadata. For “afternoon.” “evening and “morning were contributed by example, different data structures can be used. Or different an automated image classifier at the University of Texas that tags can be used to distinguish Such classes of information. Or made these judgments on Jul. 2, 2008 based, e.g., on the angle each metadata descriptor can have its own Sub-metadata, of illumination on the subjects. Those three descriptors may indicating the author, creation date, and source of the data. also have associated probabilities assigned by the classifier, The author or source field of the sub-metadata may have a e.g., 50% for afternoon, 30% for evening, and 20% for morn data String indicating that the descriptor was inferred, esti ing (each of these percentages may be stored as a Sub mated, deduced, etc., or Such information may be a separate metatag). One or more of the metadata terms contributed by Sub-metadata tag. the classifier may have a further sub-tag pointing to an on-line US 2013/0273968 A1 Oct. 17, 2013 20 glossary that aids in understanding the assigned terms. For Flickr can be searched based on image metrics to obtain a example, such as Sub-tag may give the URL of a computer collection of Subject-similar images (e.g., as detailed above). resource that associates the term “afternoon” with a defini A data extraction process (e.g., watermark decoding, finger tion, or synonyms, indicating that the term means noon to 7 print calculation, barcode- or OCR-reading) can be applied to pm. The glossary may further indicate a probability density Some or all of the resulting images, and information gleaned function, indicating that the mean time meant by “afternoon” thereby can be added to the metadata for the FIG. 21 image, is 3:30 pm, the median time is 4:15 pm, and the term has a and/or submitted to a service provider with image data (either Gaussian function of meaning spanning the noon to 7pm time for the FIG. 21 image, and/or for related images). interval. 0345 From the collection of images found in the first 0339 Expertise of the metadata contributors may also be search, text or GPS metadata can be harvested, and a second reflected in sub-metadata. The term “fescue' may have sub search can be conducted for similarly-tagged images. From metadata indicating it was contributed by a 45 year old grass the text tags Cisco and VOIP for example, a search of Flickr seed farmer in Oregon. An automated system can conclude may find a photo of the underside of the user's phone with that this metadata term was contributed by a person having OCR-readable data—as shown in FIG. 36. Again, the unusual expertise in a relevant knowledge domain, and may extracted information can be added to the metadata for the therefore treat the descriptor as highly reliable (albeit maybe FIG. 21 image, and/or submitted to a service provider to not highly relevant). This reliability determination can be enhance the response it is able to provide to the user. added to the metadata collection, so that other reviewers of 0346. As just shown, a cell phone user may be given the the metadata can benefit from the automated systems assess ability to look around corners and under objects—by using ment. one image as a portal to a large collection of related images. 0340 Assessment of the contributor’s expertise can also be self-made by the contributor. Or it can be made otherwise, User Interface e.g., by reputational rankings using collected third party assessments of the contributor's metadata contributions. (0347 Referring to FIGS. 44 and 45A, cell phones and (Such reputational rankings are known, e.g., from public related portable devices 110 typically include a display 111 assessments of sellers on EBay, and of book reviewers on and a keypad 112. In addition to a numeric (or alphanumeric) Amazon.) Assessments may be field-specific, so a person keypad there is often a multi-function controller 114. One may be judged (or self-judged) to be knowledgeable about popular controller has a centerbutton 118, and four surround grass types, but not about dog breeds. Again, all such infor ing buttons 116a, 116b, 116c and 116d (also shown in FIG. mation is desirably memorialized in Sub-metatags (including 37). Sub-Sub-metatags, when the information is about a Sub 0348 An illustrative usage model is as follows. A system metatag). responds to an image 128 (either optically captured or wire 0341 More information about crowd-sourcing, including lessly received) by displaying a collection of related images use of contributor expertise, etc., is found in Digimarc's pub to the user, on the cell phone display. For example, the user lished patent application 20070162761. captures an image and Submits it to a remote service. The 0342. Returning to the case of geolocation descriptors service determines image metrics for the Submitted image (which may be numeric, e.g., latitude/longitude, or textual), (possibly after pre-processing, as detailed above), and an image may accumulate—over time—a lengthy catalog of searches (e.g., Flickr) for visually similar images. These contributed geographic descriptors. An automated system images are transmitted to the cellphone (e.g., by the service, (e.g., a server at Flickr) may periodically review the contrib or directly from Flickr), and they are buffered for display. The uted geotag information, and distill it to facilitate public use. service can prompt the user, e.g., by instructions presented on For numeric information, the process can apply known clus the display, to repeatedly press the right-arrow button 116b on tering algorithms to identify clusters of similar coordinates, the four-way controller (or press-and-hold) to view a and average same to generate a mean location for each cluster. sequence of pattern-similar images (130, FIG. 45A). Each For example, a photo of a geyser may be tagged by some time the button is pressed, another one of the buffered appar people with latitude/longitude coordinates in Yellowstone, ently-similar images is displayed. and by others with latitude/longitude coordinates of Hells 0349. By techniques like those earlier described, or other Gate Park in New Zealand. These coordinates thus form wise, the remote service can also search for images that are distinct two clusters that would be separately averaged. If similar in geolocation to the Submitted image. These too can 70% of the contributors placed the coordinates in Yellow be sent to and buffered at the cellphone. The instructions may stone, the distilled (averaged) value may be given a confi advise that the user can press the left-arrow button 116d of the dence of 70%. Outlier data can be maintained, but given a low controller to review these GPS-similar images (132, FIG. probability commensurate with its outlier status. Such distil 45A). lation of the data by a proprietor can be stored in metadata 0350 Similarly, the service can search for images that are fields that are readable by the public, but not writable. similar in metadata to the Submitted image (e.g., based on 0343. The same or other approach can be used with added textual metadata inferred from other images, identified by textual metadata—e.g., it can be accumulated and ranked pattern matching or GPS matching). Again, these images can based on frequency of occurrence, to give a sense of relative be sent to the phone and buffered for immediate display. The confidence. instructions may advise that the user can press the up-arrow 0344) The technology detailed in this specification finds button 116a of the controller to view these metadata-similar numerous applications in contexts involving watermarking, images (134, FIG. 45A). bar-coding, fingerprinting, OCR-decoding, and other 0351. Thus, by pressing the right, left, and up buttons, the approaches for obtaining information from imagery. Con user can review images that are similar to the captured image sider again the FIG. 21 cell phone photo of a desk phone. in appearance, location, or metadata descriptors. US 2013/0273968 A1 Oct. 17, 2013

0352. Whenever such review reveals apicture of particular taining facial recognition parameters for images on those interest, the user can press the down button 116c. This action sites, for similar-appearing faces. (The service may provide identifies the currently-viewed picture to the service provider, the user's sign-on credentials to the sites, allowing searching which then can repeat the process with the currently-viewed of information that is not otherwise publicly accessible.) picture as the base image. The process then repeats with the Names and other information about similar-appearing per user-selected image as the base, and with button presses Sons located via the searching are returned to the user's cell enabling review of images that are similar to that base image phone—to help refresh the user's memory. in appearance (16b), location (16d), or metadata (16a). 0358 Various UI procedures are contemplated. When data 0353. This process can continue indefinitely. At some is returned from the remote service, the user may pushbutton point the user can press the centerbutton 118 of the four-way 116b to scroll thru matches in order of closest-similarity— controller. This action Submits the then-displayed image to a regardless of geography. Thumbnails of the matched indi service provider for further action (e.g., triggering a corre viduals with associated name and other profile information sponding response, as disclosed, e.g., in earlier-cited docu can be displayed, or just full screen images of the person can ments). This action may involve a different service provider be presented with the name overlaid. When the familiar than the one that provided all the alternative imagery, or they person is recognized, the user may press button 118 to load can be the same. (In the latter case the finally-selected image the full FaceBook/MySpace/Linked-In page for that person. need not be sent to the service provider, since that service Alternatively, instead of presenting images with names, just a provider knows all the images buffered by the cellphone, and textual list of names may be presented, e.g., all on a single may track which image is currently being displayed.) screen—ordered by similarity of face-match; SMS text mes 0354. The dimensions of information browsing just-de saging can Suffice for this last arrangement. tailed (similar-appearance images; similar-location images; 0359 Pushing button 116d may scroll thru matches in similar-metadata images) can be different in other embodi order of closest-similarity, of people who list their residence ments. Consider, for example, an embodiment that takes an as within a certain geographical proximity (e.g., same met image of a house as input (or latitude/longitude), and returns ropolitan area, same state, same campus, etc.) of the user's the following sequences of images: (a) the houses for sale present location or the user's reference location (e.g., home). nearest in location to the input-imaged house; (b) the houses Pushing button 116a may yield a similar display, but limited for sale nearest in price to the input-imaged house; and (c) the to persons who are “Friends of the user within a social houses for sale nearest in features (e.g., bedrooms/baths) to network (or who are Friends of Friends, or who are within the input-imaged house. (The universe of houses displayed another specified degree of separation of the user). can be constrained, e.g., by Zip-code, metropolitan area, 0360 A related arrangement is a law enforcement tool in school district, or other qualifier.) which an officer captures an image of a person and Submits 0355 Another example of this user interface technique is same to a database containing facial portrait/eigenvalue infor presentation of search results from EBay for auctions listing mation from government driver license records and/or other 360 game consoles. One dimension can be price (e.g., Sources. Pushing button 116b causes the screen to display a pushing button 116b yields a sequence of Screens showing sequence of images/biographical dossiers about persons Xbox 360 auctions, starting with the lowest-priced ones); nationwide having the closest facial matches. Pushing button another can be seller's geographical proximity to user (clos 116d causes the screen to display a similar sequence, but est to furthest, shown by pushing button 116d); another can be limited to persons within the officer's state. Button 116a time until end of auction (shortest to longest, presented by yields such a sequence, but limited to persons within the pushing button 116a). Pressing the middle button 118 can metropolitan area in which the officer is working. load the full web page of the auction being displayed. 0361 Instead of three dimensions of information brows 0356. A related example is a system that responds to a ing (buttons 116b, 11.6d, 116a, e.g., for similar-appearing user-captured image of a car by identifying the car (using images/similarly located images/similar metadata-tagged image features and associated database(s)), searching EBay images), more or less dimensions can be employed. FIG. 45B and Craigslist for similar cars, and presenting the results on shows browsing screens injust two dimensions. (Pressing the the screen. Pressing button 116b presents screens of informa right button yields a first sequence 140 of information tion about cars offered for sale (e.g., including image, seller screens; pressing the left button yields a different sequence location, and price) based on similarity to the input image 142 of information screens.) (same model year? same color first, and then nearest model 0362 Instead of two or more distinct buttons, a single UI years/colors), nationwide. Pressing button 116dyields such a control can be employed to navigate in the available dimen sequence of screens, but limited to the user's State (or metro sions of information. Ajoystick is one such device. Another is politan region, or a 50 mile radius of the user's location, etc). a roller wheel (or scroll wheel). Portable device 110 of FIG. Pressing button 116a yields such a sequence of screens, again 44 has a roller wheel 124 on its side, which can be rolled-up limited geographically, but this time presented in order of or rolled-down. It can also be pressed-in to make a selection ascending price (rather than closest model year/color). Again, (e.g., akin to buttons 116c or 118 of the earlier-discussed pressing the middle button loads the full web page (EBay or controller). Similar controls are available on many mice. Craigslist) of the car last-displayed. 0363. In most user interfaces, opposing buttons (e.g., left 0357 Another embodiment is an application that helps button 116b, and right button 116d) navigate the same dimen people recall names. A user sees a familiar person at a party, sion of information just in opposite directions (e.g., for but can’t remember his name. Surreptitiously the user Snaps a ward/reverse). In the particular interface discussed above, it picture of the person, and the image is forwarded to a remote will be recognized that this is not the case (although in other service provider. The service provider extracts facial recog implementations, it may be so). Pressing the right button nition parameters and searches social networking sites (e.g., 116b, and then pressing the left button 116d, does not return FaceBook, MySpace, Linked-In), or a separate database con the system to its original state. Instead, pressing the right US 2013/0273968 A1 Oct. 17, 2013 22 button gives, e.g., a first similar-appearing image, and press present a rightward-scrolling series of image frames 130 of ing the left button gives the first similarly-located image. imagery having similar visual content (with the initial speed 0364 Sometimes it is desirable to navigate through the of scrolling dependent on the speed of the user gesture, and same sequence of Screens, but in reverse of the order just with the Scrolling speed decelerating or not—over time). A reviewed. Various interface controls can be employed to do brushing gesture to the left may present a similar leftward this. scrolling display of imagery 132 having similar GPS infor 0365. One is a “Reverse' button. The device 110 in FIG. mation. A brushing gesture upward may present images an 44 includes a variety of buttons not-yet discussed (e.g., but upward-scrolling display of imagery 134 similar in metadata. tons 120a-120f around the periphery of the controller 114). At any point the user can tap one of the displayed images to Any of these if pressed—can serve to reverse the scrolling make it the base image, with the process repeating. order. By pressing, e.g., button 120a, the Scrolling (presenta 0372 Other gestures can invoke still other actions. One tion) direction associated with nearby button 116b can be Such action is displaying overhead imagery corresponding to reversed. So if button 116b normally presents items in order the GPS location associated with a selected image. The imag of increasing cost, activation of button 120a can cause the ery can be Zoomed in/out with other gestures. The user can function of button 116b to Switch, e.g., to presenting items in select for display photographic imagery, map data, data from order of decreasing cost. If, in reviewing screens resulting different times of day or different dates/seasons, and/or vari from use of button 116b, the user "overshoots' and wants to ous overlays (topographic, places of interest, and other data, reverse direction, she can push button 120a, and then push as is known from Google Earth), etc. Icons or other graphics button 116b again. The screen(s) earlier presented would then may be presented on the display depending on contents of appear in reverse order—starting from the present screen. particular imagery. One such arrangement is detailed in Digi 0366. Or, operation of such a button (e.g., 120a or 120f) marc's published application 20080300011. can cause the opposite button 116d to scroll back thru the 0373). “Curbside' or “street-level” imagery rather than screens presented by activation of button 116b, in reverse overhead imagery—can be also displayed. order. 0374. It will be recognized that certain embodiments of 0367. A textual or symbolic prompt can be overlaid on the the present technology include a shared general structure. An display screen in all these embodiments—informing the user initial set of data (e.g., an image, or metadata Such as descrip of the dimension of information that is being browsed, and the tors or geocode information, or image metrics such as eigen direction (e.g., browsing by cost: increasing). values) is presented. From this, a second set of data (e.g., 0368. In still other arrangements, a single button can per images, or image metrics, or metadata) are obtained. From form multiple functions. For example, pressing button 116b that second set of data, a third set of data is compiled (e.g., can cause the system to start presenting a sequence of screens, images with similar image metrics or similar metadata, or e.g., showing pictures of houses for sale nearest the user's image metrics, or metadata). Items from the third set of data location presenting each for 800 milliseconds (an interval can be used as a result of the process, or the process may set by preference data entered by the user). Pressing button continue, e.g., by using the third set of data in determining 116b a second time can cause the system to stop the fourth data (e.g., a set of descriptive metadata can be com sequence—displaying a static screen of a house for sale. piled from the images of the third set). This can continue, e.g., Pressing button 116b a third time can cause the system to determining a fifth set of data from the fourth (e.g., identify present the sequence in reverse order, starting with the static ing a collection of images that have metadata terms from the screen and going backwards thru the screens earlier pre fourth data set). A sixth set of data can be obtained from the sented. Repeated operation of buttons 116a, 116b, etc., can fifth (e.g., identifying clusters of GPS data with which images operate likewise (but control different sequences of informa in the fifth set are tagged), and so on. tion, e.g., houses closest in price, and houses closest in fea 0375. The sets of data can be images, or they can be other tures). forms of data (e.g., image metrics, textual metadata, geolo 0369. In arrangements in which the presented information cation data, decoded OCR-, barcode-, watermark-data, etc). stems from a process applied to a base image (e.g., a picture 0376 Any data can serve as the seed. The process can start Snapped by a user), this base image may be presented with image data, or with other information, such as image throughout the display—e.g., as a thumbnail in a corner of the metrics, textual metadata (aka semantic metadata), geoloca display. Or a button on the device (e.g., 126a, or 120b) can be tion information (e.g., GPS coordinates), decoded OCR/bar operated to immediately summon the base image back to the code/watermark data, etc. From a first type of information display. (image metrics, semantic metadata, GPS info, decoded info). 0370 Touch interfaces are gaining in popularity, such as in a first set of information-similar images can be obtained. products available from Apple and Microsoft (detailed, e.g., From that first set, a second, different type of information in Apple's patent publications 20060026535, 20060026536, (image metrics/semantic metadata/GPS/decoded info, etc.) 20060250377, 200802.11766, 20080158169, 20080158172, can be gathered. From that second type of information, a 20080204426, 20080174570, and Microsoft's patent publi second set of information-similar images can be obtained. cations 20060033701, 20070236470 and 20080001924). From that second set, a third, different type of information Such technologies can be employed to enhance and extend (image metrics/semantic metadata/GPS/decoded info, etc.) the just-reviewed user interface concepts—allowing greater can be gathered. From that third type of information, a third degrees of flexibility and control. Each button press noted set of information-similar images can be obtained. Etc. above can have a counterpart gesture in the Vocabulary of the 0377 Thus, while the illustrated embodiments generally touch screen system. start with animage, and then proceed by reference to its image 0371 For example, different touch-screen gestures can metrics, and so on, entirely different combinations of acts are invoke display of the different types of image feeds just also possible. The seed can be the payload from a product reviewed. A brushing gesture to the right, for example, may barcode. This can generate a first collection of images depict US 2013/0273968 A1 Oct. 17, 2013

ing the same barcode. This can lead to a set of common Processing a Set of Sample Images metadata. That can lead to a second collection of images based on that metadata. Image metrics may be computed from 0387 Assume a tourist snaps a photo of the Prometheus this second collection, and the most prevalent metrics can be statue at Rockefeller Center in NewYork using a cellphone or used to search and identify a third collection of images. The other mobile device. Initially, it is just a bunch of pixels. What images thus identified can be presented to the user using the to do? arrangements noted above. 0388 Assume the image is geocoded with location infor mation (e.g., latitude/longitude in XMP- or EXIF-metadata). 0378. In some embodiments, the present technology may 0389. From the geocode data, a search of Flickr can be be regarded as employing an iterative, recursive process by undertaken for a first set of images—taken from the same (or which information about one set of images (a single image in nearby) location. Perhaps there are 5 or 500 images in this first many initial cases) is used to identify a second set of images, Set. which may be used to identify a third set of images, etc. The 0390 Metadata from this set of images is collected. The function by which each set of images is related to the next metadata can be of various types. One is words/phrases from relates to a particular class of image information, e.g., image a title given to an image. Another is information in metatags metrics, semantic metadata, GPS, decoded info, etc. assigned to the image—usually by the photographer (e.g., 0379. In other contexts, the relation between one set of naming the photo subject and certain attributes/keywords), images and the next is a function not just of one class of but additionally by the capture device (e.g., identifying the information, but two or more. For example, a seed user image camera model, the date/time of the photo, the location, etc). may be examined for both image metrics and GPS data. From Another is words/phrases in a narrative description of the these two classes of information a collection of images can be photo authored by the photographer. determined images that are similar in both some aspect of 0391 Some metadata terms may be repeated across dif visual appearance and location. Other pairings, triplets, etc., ferent images. Descriptors common to two or more images of relationships can naturally be employed in the determi can be identified (clustered), and the most popular terms may nation of any of the Successive sets of images. be ranked. (Such as listing is shown at 'A' in FIG. 46A. Here, and in other metadata listings, only partial results are given Further Discussion for expository convenience.) 0380 Some embodiments of the present technology ana 0392. From the metadata, and from other analysis, it may lyze a consumer cell phone picture, and heuristically deter be possible to determine which images in the first set are mine information about the picture's Subject. For example, is likely person-centric, which are place-centric, and which are it a person, place, or thing? From this high level determina thing-centric. tion, the system can better formulate what type of response 0393 Consider the metadata with which a set of 50 images might be sought by the consumer—making operation more may be tagged. Some of the terms relate to place. Some relate intuitive. to persons depicted in the images. Some relate to things. 0381 For example, if the subject of the photo is a person, the consumer might be interested in adding the depicted per Place-Centric Processing son as a FaceBook“friend. Or sending a text message to that 0394 Terms that relate to place can be identified using person. Or publishing an annotated version of the photo to a various techniques. One is to use a database with geographi web page. Or simply learning who the person is. cal information to look-up location descriptors near a given 0382. If the subject is a place (e.g., Times Square), the geographical position. Yahoo’s GeoPlanet service, for consumer might be interested in the local geography, maps, example, returns a hierarchy of descriptors such as "Rock and nearby attractions. efeller Center,” “10024” (a zip code), “Midtown Manhattan.” “New York,” “Manhattan.” “New York, and “United States.” 0383. If the subject is a thing (e.g., the Liberty Bell or a when queried with the latitude/longitude of the Rockefeller bottle of beer), the consumer may be interested in information Center. about the object (e.g., its history, others who use it), or in 0395. The same service can return names of adjoining/ buying or selling the object, etc. sibling neighborhoods/features on request, e.g., “10017. 0384 Based on the image type, an illustrative system/ “10020.” “10036,” “Theater District,” “Carnegie Hall.” service can identify one or more actions that it expects the “Grand Central Station,” “Museum of American Folk Art.” consumer will find most appropriately responsive to the cell etc., etc. phone image. One or all of these can be undertaken, and 0396 Nearby street names can be harvested from a variety cached on the consumer's cellphone for review. For example, of mapping programs, given a set of latitude/longitude coor scrolling a thumbwheel on the side of the cell phone may dinates or other location info. presenta succession of different screens—each with different 0397. A glossary of nearby place-descriptors can be com information responsive to the image subject. (Ora Screen may piled in such manner. The metadata harvested from the set of be presented that queries the consumer as to which of a few Flickr images can then be analyzed, by reference to the glos possible actions is desired.) sary, to identify the terms that relate to place (e.g., that match 0385. In use, the system can monitor which of the avail terms in the glossary). able actions is chosen by the consumer. The consumer's 0398 Consideration then turns to use of these place-re usage history can be employed to refine a Bayesian model of lated metadata in the reference set of images collected from the consumers interests and desires, so that future responses Flickr. can be better customized to the user. 0399. Some images may have no place-related metadata. 0386 These concepts will be clearer by example (aspects These images are likely person-centric orthing-centric, rather of which are depicted, e.g., in FIGS. 46 and 47). than place-centric. US 2013/0273968 A1 Oct. 17, 2013 24

0400 Other images may have metadata that is exclusively myself, we, they, them, mine, their). Nouns identifying place-related. These images are likely place-centric, rather people and personal relationships can also be included (e.g., than person-centric or thing-centric. uncle, sister, daughter, gramps, boss, student, employee, wed 04.01. In between are images that have both place-related ding, etc) metadata, and other metadata. Various rules can be devised 0413 Adjectives and adverbs that are usually applied to and utilized to assign the relative relevance of the image to people may also be included in the person-term glossary (e.g., place. happy, boring, blonde, etc), as can the names of objects and 0402 One rule looks at the number of metadata descrip attributes that are usually associated with people (e.g., t-shirt, tors associated with an image, and determines the fraction backpack, Sunglasses, tanned, etc.). Verbs associated with that is found in the glossary of place-related terms. This is one people can also be employed (e.g., Surfing, drinking) metric. 0414. In this last group, as in Some others, there are some 0403. Another looks at where in the metadata the place terms that could also apply to thing-centric images (rather related descriptors appear. If they appear in an image title, than person-centric). The term "sunglasses' may appear in they are likely more relevant than if they appear at the end of metadata for an image depicting Sunglasses, alone; "happy” a long narrative description about the photograph. Placement may appear in metadata for an image depicting a dog. There of the placement-related metadata is another metric. are also some cases where a person-term may also be a place 0404 Consideration can also be given to the particularity term (e.g., Boring, Oregon). In more Sophisticated embodi of the place-related descriptor. A descriptor “New York’ or ments, glossary terms can be associated with respective con “USA” may be less indicative that an image is place-centric fidence metrics, by which any results based on Such terms than a more particular descriptor, such as “Rockefeller Cen may be discounted or otherwise acknowledged to have dif ter' or “Grand Central Station.” This can yield a third metric. ferent degrees of uncertainty.) 0405. A related, fourth metric considers the frequency of 0415. As before, if an image is not associated with any occurrence (or improbability) of a term—either just within person-related metadata, then the image can be adjudged the collected metadata, or within a Superset of that data. likely not person-centric. Conversely, if all of the metadata is “RCA Building is more relevant, from this standpoint, than person-related, the image is likely person-centric. “Rockefeller Center” because it is used much less frequently. 0406. These and other metrics can be combined to assign 0416 For other cases, metrics like those reviewed above each image in the set with a place score indicating its potential can be assessed and combined to yield a score indicating the place-centric-ness. relative person-centric-ness of each image, e.g., based on the 0407. The combination can be a straight sum of four fac number, placement, particularity and/or frequency/improb tors, each ranging from 0 to 100. More likely, however, some ability of the person-related metadata associated with the metrics will be weighted more heavily. The following equa image. tion employing metrics M1, M2, M2 and M4 can be 0417. While analysis of metadata gives useful information employed to yield a score S, with the factors A, B, C, D and about whether an image is person-centric, other techniques exponents W, X, Y and Z determined experimentally, or by can also be employed—either alternatively, or in conjunction Bayesian techniques: with metadata analysis. 0418. One technique is to analyze the image looking for continuous areas of skin-tone colors. Such features charac terize many features of person-centric images, but are less Person-Centric Processing frequently found in images of places and things. 0408. A different analysis can be employed to estimate the 0419 A related technique is facial recognition. This sci person-centric-ness of each image in the set obtained from ence has advanced to the point where even inexpensive point Flickr. and-shoot digital cameras can quickly and reliably identify 04.09. As in the example just-given, a glossary of relevant faces within an image frame (e.g., to focus or expose the terms can be compiled—this time terms associated with image based on Such subjects). people. In contrast to the place name glossary, the person 0420 (Face finding technology is detailed, e.g., in U.S. name glossary can be global rather than associated with a Pat. Nos. 5,781,650 (Univ. of Central Florida), 6,633,655 particular locale. (However, different glossaries may be (Sharp), 6,597,801 (Hewlett-Packard) and 6,430.306 (L-1 appropriate in different countries.) Corp.), and in Yang et al. Detecting Faces in Images: A Sur 0410. Such a glossary can be compiled from various vey, IEEE Transactions on Pattern Analysis and Machine Sources, including telephone directories, lists of most popular Intelligence, Vol. 24, No. 1, January 2002, pp. 34-58, and names, and other reference works where names appear. The Zhao, et al. Face Recognition: A Literature Survey, ACM list may start, "Aaron, Abigail, Adam, Addison, Adrian, Computing Surveys, 2003, pp. 399-458. Additional papers Aidan, Aiden, Alex, Alexa, Alexander, Alexandra, Alexis, about facial recognition technologies are noted in a bibliog Allison, Alyssa, Amelia, Andrea, Andrew, Angel, Angelina, raphy at the end of this specification.) Anna, Anthony, Antonio, Ariana, Arianna, Ashley, Aubrey, 0421 Facial recognition algorithms can be applied to the Audrey, Austin, Autumn, Ava, Avery . . . .” set of reference images obtained from Flickr, to identify those 0411 First names alone can be considered, or last names that have evident faces, and identify the portions of the can be considered too. (Some names may be a place name or images corresponding to the faces. aperson name. Searching for adjoining first/last names and/or 0422. Of course, many photos have faces depicted inci adjoining place names can help distinguish ambiguous cases. dentally within the image frame. While all images having E.g., Elizabeth Smith is a person; Elizabeth N.J. is a place.) faces could be identified as person-centric, most embodi 0412 Personal pronouns and the like can also be included ments employ further processing to provide a more refined in Such a glossary (e.g., he, she, him, her, his, our, her, I, me, aSSeSSment. US 2013/0273968 A1 Oct. 17, 2013

0423. One form of further processing is to determine the the present example. There are various techniques by which a percentage area of the image frame occupied by the identified thing-centric score for an image can be determined. face(s). The higher the percentage, the higher the likelihood 0432 One technique relies on metadata analysis, using that the image is person-centric. This is another metric than principles like those detailed above. A glossary of nouns can can be used in determining an image's person-centric score. be compiled—either from the universe of Flickr metadata or 0424. Another form of further processing is to look for the Some other corpus (e.g., WordNet), and ranked by frequency existence of (1) one or more faces in the image, together with of occurrence. Nouns associated with places and persons can (2) person-descriptors in the metadata associated with the be removed from the glossary. The glossary can be used in the image. In this case, the facial recognition data can be used as manners identified above to conduct analyses of the images a “plus’ factor to increase a person-centric score of an image metadata, to yield a score for each. based on metadata or other analysis. (The “plus’ can take various forms. E.g., a score (in a 0-100 scale) can be increased 0433) Another approach uses pattern matching to identify by 10, or increased by 10%. Or increased by half the remain thing-centric images—matching each against a library of ing distance to 100, etc.) known thing-related images. 0425 Thus, for example, a photo tagged with “Elizabeth' 0434 Still another approach is based on earlier-deter metadata is more likely a person-centric photo if the facial mined scores for person-centric and place-centric. A thing recognition algorithm finds a face within the image than if no centric score may be assigned in inverse relationship to the face is found. other two scores (i.e., if an image scores low for being person 0426 (Conversely, the absence of any face in an image can centric, and low for being place-centric, then it can be be used as a “plus’ factor to increase the confidence that the assigned a high score for being thing-centric). image Subject is of a different type, e.g., a place or a thing. 0435 Such techniques may be combined, or used indi Thus, an image tagged with Elizabeth as metadata, but lack vidually. In any event, a score is produced for each image— ing any face, increases the likelihood that the image relates to tending to indicate whether the image is more- or less-likely a place named Elizabeth, or a thing named Elizabeth—Such to be thing-centric. as a pet.) 0427 Still more confidence in the determination can be assumed if the facial recognition algorithm identifies a face as Further Processing of Sample Set of Images a female, and the metadata includes a female name. Such an 0436 Data produced by the foregoing techniques can pro arrangement, of course, requires that the glossary—or other duce three scores for each image in the set, indicating rough data structure—have data that associates genders with at least confidence/probability/likelihood that the image is (1) per SO alS. Son-centric, (2) place-centric, or (3) thing-centric. These 0428 (Still more sophisticated arrangements can be scores needn't add to 100% (although they may). Sometimes implemented. For example, the age of the depicted person(s) an image may score high in two or more categories. In Such can be estimated using automated techniques (e.g., as detailed case the image may be regarded as having multiple relevance, in U.S. Pat. No. 5,781,650, to Univ. of Central Florida). e.g., as both depicting a person and a thing. Names found in the image metadata can also be processed to estimate the age of the thus-named person(s). This can be 0437. The set of images downloaded from Flickr may next done using public domain information about the statistical be segregated into groups, e.g., A, B and C, depending on distribution of a name as a function of age (e.g., from pub whether identified as primarily person-centric, place-centric, lished Social Security Administration data, and web sites that or thing-centric. However, since some images may have split detail most popular names from birth records). Thus, names probabilities (e.g., an image may have some indicia of being Mildred and Gertrude may be associated with an age distri place-centric, and some indicia of being person-centric), bution that peaks at age 80, whereas Madison and Alexis may identifying an image wholly by its high score ignores useful be associated with an age distribution that peaks at age 8. information. Preferable is to calculate a weighted score for Finding statistically-likely correspondence between meta the set of images—taking each image's respective scores in data name and estimated person age can further increase the the three categories into account. person-centric score for an image. Statistically unlikely cor 0438 A sample of images from Flickr all taken near respondence can be used to decrease the person-centric score. Rockefeller Center—may suggest that 60% are place-centric, (Estimated information about the age of a subject in the 25% are person-centric, and 15% are thing-centric. consumer's image can also be used to tailor the intuited 0439. This information gives useful insight into the tour response(s), as may information about the Subject's gender)) ist’s cellphone image—even without regard to the contents of 0429 Just as detection of a face in an image can be used as the image itself (except its geocoding). That is, chances are a “plus’ factor in a score based on metadata, the existence of good that the image is place-centric, with less likelihood it is person-centric metadata can be used as a “plus’ factor to person-centric, and still less probability it is thing centric. increase a person-centric score based on facial recognition (This ordering can be used to determine the order of subse data. quent steps in the process—allowing the system to more 0430. Of course, if no face is found in an image, this quickly gives responses that are most likely to be appropri information can be used to reduce a person-centric score for ate.) the image (perhaps down to Zero). 0440 This type-assessment of the cellphone photo can be Thing-Centric Processing used—alone—to help determine an automated action pro vided to the touristin response to the image. However, further 0431. A thing-centered image is the third type of image processing can better assess the image's contents, and thereby that may be found in the set of images obtained from Flickr in allow a more particularly-tailored action to be intuited. US 2013/0273968 A1 Oct. 17, 2013 26

Similarity Assessments and Metadata Weighting content for yellow that peaks at lower image frequencies, is 0441. Within the set of co-located images collected from Somewhat associated with images that score high as thing Flickr, images that are place-centric will tend to have a dif centric. ferent appearance than images that are person-centric or 0448. The analysis techniques found most useful ingroup thing-centric, yet tend to have some similarity within the ing/distinguishing the different categories of images can then place-centric group. Place-centric images may be character be applied to the user's cellphone image. The results can then ized by Straight lines (e.g., architectural edges). Or repetitive be analyzed for proximity—in a distance measure sense (e.g., patterns (windows). Or large areas of uniform texture and multi-dimensional space)—with the characterizing features similar color near the top of the image (sky). associated with different categories of images. (This is the 0442. Images that are person-centric will also tend to have first time that the cellphone image has been processed in this different appearances than the other two classes of image, yet particular embodiment.) have common attributes within the person-centric class. For 0449. Using Such techniques, the cell phone image may example, person-centric images will usually have faces— score a 60 for thing-centric, a 15 for place-centric, and a 0 for generally characterized by an ovoid shape with two eyes and person-centric (on scale of 0-100). This is a second, better set a nose, areas of flesh tones, etc. of scores that can be used to classify the cellphone image (the 0443 Although thing-centric images are perhaps the most first being the statistical distribution of co-located photos diverse, images from any given geography may tend to have found in Flickr). unifying attributes or features. Photos geocoded at a horse 0450. The similarity of the user's cell phone image may track will depict horses with some frequency; photos geo next be compared with individual images in the reference set. coded from Independence National Historical Park in Phila Similarity metrics identified earlier can be used, or different delphia will tend to depict the Liberty Bell regularly, etc. measures can be applied. The time or processing devoted to 0444. By determining whether the cell phone image is this task can be apportioned across the three different image more similar to place-centric, or person-centric, or thing categories based on the just-determined scores. E.g., the pro centric images in the set of Flickr images, more confidence in cess may spend no time judging similarity with reference the Subject of the cell phone image can be achieved (and a images classed as 100% person-centric, but instead concen more accurate response can be intuited and provided to the trate on judging similarity with reference images classed as consumer). thing- or place-centric (with more effort—e.g., four times as 0445. A fixed set of image assessment criteria can be much effort being applied to the former than the latter). A applied to distinguish images in the three categories. How similarity score is generated for most of the images in the ever, the detailed embodiment determines such criteria adap reference set (excluding those that are assessed as 100% tively. In particular, this embodiment examines the set of person-centric). images and determines which image features/characteristics/ 0451 Consideration then returns to metadata. Metadata metrics most reliably (1) group like-categorized images from the reference images are again assembled—this time together (similarity); and (2) distinguish differently-catego weighted in accordance with each image's respective simi rized images from each other (difference). Among the larity to the cellphone image. (The weighting can be linear or attributes that may be measured and checked for similarity/ exponential.) Since metadata from similarimages is weighted difference behavior within the set of images are dominant more than metadata from dissimilar images, the resulting set color, color diversity; color histogram; dominant texture; tex of metadata is tailored to more likely correspond to the cell ture diversity; texture histogram; edginess; wavelet-domain phone image. transform coefficient histograms, and dominant wavelet coef 0452 From the resulting set, the top N (e.g., 3) metadata ficients; frequency domain transfer coefficient histograms descriptors may be used. Or descriptors that—on a weighted and dominant frequency coefficients (which may be calcu basis—comprise an aggregate M% of the metadata set. lated in different color channels); eigenvalues; keypoint 0453. In the example given, the thus-identified metadata descriptors; geometric class probabilities; symmetry; per may comprise “Rockefeller Center,” “Prometheus, and centage of image area identified as facial; image autocorre “Skating rink with respective scores of 19, 12 and 5 (see “B” lation; low-dimensional "gists of image; etc. (Combinations in FIG. 46B). of such metrics may be more reliable than the characteristics 0454. With this weighted set of metadata, the system can individually.) begin determining what responses may be most appropriate 0446. One way to determine which metrics are most for the consumer. In the exemplary embodiment, however, the salient for these purposes is to compute a variety of different system continues by further refining its assessment of the cell image metrics for the reference images. If the results within a phone image. (The system may begin determining appropri category of images for a particular metric are clustered (e.g., ate responses while also undertaking the further processing.) if, for place-centric images, the color histogram results are clustered around particular output values), and if images in Processing a Second Set of Reference Images other categories have few or no output values near that clus tered result, then that metric would appear well suited for use 0455. At this point the system is better informed about the as an image assessment criteria. (Clustering is commonly cell phone image. Not only is its location known; so is its performed using an implementation of a k-means algorithm.) likely type (thing-centric) and some of its most-probably 0447. In the set of images from Rockefeller Center, the relevant metadata. This metadata can be used in obtaining a system may determine that an edginess score of >40 is reli second set of reference images from Flickr. ably associated with images that score high as place-centric; 0456. In the illustrative embodiment, Flickr is queried for a facial area score of>15% is reliably associated with images images having the identified metadata. The query can be that score high as person-centric; and a color histogram that geographically limited to the cell phone's geolocation, or a has a local peak in the gold tones—together with a frequency broader (or unlimited) geography may be searched. (Or the US 2013/0273968 A1 Oct. 17, 2013 27 query may run twice, so that half of the images are co-located importantly for the present application, however, it can help a with the cell phone image, and the others are remote, etc.) service provider decide how best to respond to submission of 0457. The search may first look for images that are tagged the user's image. with all of the identified metadata. In this case, 60 images are found. If more images are desired, Flickr may be searched for Determining Appropriate Responses for Consumer the metadata terms in different pairings, or individually. (In 0465 Referring to FIG.50, the system just-described can these latter cases, the distribution of selected images may be be viewed as one particular application of an "image juicer chosen so that the metadata occurrence in the results corre that receives image data from a user, and applies different sponds to the respective scores of the different metadata forms of processing so as to gather, compute, and/or infer terms, i.e., 19/12/5.) information that can be associated with the image. 0458 Metadata from this second set of images can be 0466. As the information is discerned, it can be forwarded harvested, clustered, and may be ranked (“C” in FIG. 46B). by a router to different service providers. These providers (Noise words (“and, of, or etc.) can be eliminated. Words may be arranged to handle different types of information descriptive only of the camera or the type of photography may (e.g., semantic descriptors, image texture data, keypoint also be disregarded (e.g., “Nikon,” “D80, “HDR,” “black descriptors, eigenvalues, color histograms, etc) or to different and white, etc.). Month names may also be removed.) classes of images (e.g., photo offriend, photo of a can of soda, 0459. The analysis performed earlier by which each etc). Outputs from these service providers are sent to one or image in the first set of images was classified as person more devices (e.g., the user's cellphone) for presentation or centric, place-centric or thing-centric—can be repeated on later reference. The present discussion now considers how images in the second set of images. Appropriate image met these service providers decide what responses may be appro rics for determining similarity/difference within and between priate for a given set of input information. classes of this second image set can be identified (or the 0467. One approach is to establish a taxonomy of image earlier measures can be employed). These measures are then Subjects and corresponding responses. A tree structure can be applied, as before, to generate refined scores for the user's cell used, with an image first being classed into one of a few high phone image, as being person-centric, place-centric, and level groupings (e.g., person/place/thing), and then each thing-centric. By reference to the images of the second set, the group being divided into further subgroups. In use, an image cell phone image may score a 65 for thing-centric, 12 for is assessed through different branches of the tree until the place-centric, and 0 for person-centric. (These scores may be limits of available information allow no further progress to be combined with the earlier-determined scores, e.g., by averag made. Actions associated with the terminal leafor node of the ing, if desired.) tree are then taken. 0468 Part of a simple tree structure is shown in FIG. 51. 0460. As before, similarity between the user's cell phone (Each node spawns three branches, but this is for illustration image and each image in the second set can be determined. only; more or less branches can of course be used.) Metadata from each image can then be weighted in accor 0469 If the subject of the image is inferred to be an item of dance with the corresponding similarity measure. The results food (e.g., if the image is associated with food-related meta can then be combined to yield a set of metadata weighted in data), three different screens of information can be cached on accordance image similarity. the user's phone. One starts an online purchase of the 0461 Some of the metadata—often including some depicted item at an online vendor. (The choice of vendor, and highly ranked terms—will be of relatively low value in deter payment/shipping details, can be obtained from user profile mining image-appropriate responses for presentation to the data.) The second screen shows nutritional information about consumer. “New York.” “Manhattan.” are a few examples. the product. The third presents a map of the local area— Generally more useful will be metadata descriptors that are identifying stores that sell the depicted product. The user relatively unusual. Switches among these responses using a roller wheel 124 on 0462. A measure of “unusualness’ can be computed by the side of the phone (FIG. 44). determining the frequency of different metadata terms within 0470. If the subject is inferred to be a photo of a family a relevant corpus, such as Flickr image tags (globally, or member or friend, one screen presented to the user gives the within a geolocated region), or image tags by photographers option of posting a copy of the photo to the user's FaceBook from whom the respective images were Submitted, or words page, annotated with the person(s)'s likely name(s). (Deter in an encyclopedia, or in Google's index of the web, etc. The mining the names of persons depicted in a photo can be done terms in the weighted metadata list can be further weighted in by Submitting the photo to the user's account at Picasa. Picasa accordance with their unusualness (i.e., a second weighting). performs facial recognition operations on Submitted user images, and correlates facial eigenvectors with individual 0463. The result of such successive processing may yield names provided by the user, thereby compiling a user-specific the list of metadata shown at “D’ in FIG. 46B (each shown database of facial recognition information for friends and with its respective score). This information (optionally in others depicted in the user's prior images. Picasa's facial conjunction with a tag indicating the person/place/thing recognition is understood to be based on technology detailed determination) allows responses to the consumer to be well in U.S. Pat. No. 6,356,659 to Google. Apple's iPhoto soft correlated with the cell phone photo. ware and Facebook's Photo Finder software include similar 0464. It will be recognized that this set of inferred meta facial recognition functionality.) Another screen starts a text data for the user's cellphone photo was compiled entirely by message to the individual, with the addressing information automated processing of other images, obtained from public having been obtained from the user's address book, indexed Sources such as Flickr, in conjunction with other public by the Picasa-determined identity. The user can pursue any or resources (e.g., listings of names, places). The inferred meta all of the presented options by Switching between the associ data can naturally be associated with the user's image. More ated Screens. US 2013/0273968 A1 Oct. 17, 2013 28

0471) If the subject appears to be a stranger (e.g., not 0477. In some embodiments, the system response may recognized by Picasa), the system will have earlier under depend on people with whom the user has a “friend' relation taken an attempted recognition of the person using publicly ship in a social network, or some other indicia of trust. For available facial recognition information. (Such information example, if little is known about user Ted, but there is a rich can be extracted from photos of known persons. VideoSurf is set of information available about Ted's friend Alice, that rich one vendor with a database of facial recognition features for set of information may be employed in determining how to actors and other persons. L-1 Corp. maintains databases of respond to Ted, in connection with a given content stimulus. driver's licenses photos and associated data which may— 0478 Similarly, if user Ted is a friend of user Alice, and with appropriate safeguards—be employed for facial recog Bob is a friend of Alice, then information relating to Bob may nition purposes.) The screen(s) presented to the user can show be used in determining an appropriate response to Ted. reference photos of the persons matched (together with a 0479. The same principles can be employed even if Ted “match” score), as well as dossiers of associated information and Alice are strangers, provided there is another basis for compiled from the web and other databases. A further screen implicit trust. While basic profile similarity is one possible gives the user the option of sending a “Friend' invite to the basis, a better one is the sharing an unusual attribute (or, recognized person on MySpace, or another social networking better, several). Thus, for example, if both Ted and Alice share site where the recognized person is found to have a presence. the traits of being fervent supporters of Dennis Kucinich for A still further screen details the degree of separation between president, and being devotees of pickledginger, then infor the user and the recognized person. (E.g., my brother David mation relating to one might be used in determining an appro has a classmate Steve, who has a friend Matt, who has a friend priate response to present to the other. Tom, who is the son of the depicted person.) Such relation 0480. The arrangements just-described provides powerful ships can be determined from association information pub new functionality. However, the “intuiting of the responses lished on Social networking sites. likely desired by the user rely largely on the system designers. 0472. Of course, the responsive options contemplated for They consider the different types of images that may be the different Sub-groups of image subjects may meet most encountered, and dictate responses (or selections of user desires, but some users will want something different. responses) that they believe will best satisfy the users’ likely Thus, at least one alternative response to each image may be desires. open-ended—allowing the user to navigate to different infor 0481. In this respect the above-described arrangements mation, or specify a desired response—making use of what are akin to early indexes of the web—such as Yahoo! Teams ever image/metadata processed information is available. of humans generated taxonomies of information for which 0473. One such open-ended approach is to submit the people might search, and then manually located web twice-weighted metadata noted above (e.g., “D” in FIG. 46B) resources that could satisfy the different search requests. to a general purpose search engine. Google, per se, is not 0482 Eventually the web overwhelmed such manual necessarily best for this function, because current Google efforts at organization. Google's founders were among those searches require that all search terms be found in the results. that recognized that an untapped wealth of information about Better is a search engine that does fuZZy searching, and is the web could be obtained from examining links between the responsive to differently-weighted keywords—not all of pages, and actions of users in navigating these links Under which need be found. The results can indicate different seem standing of the system thus came from data within the system, ing relevance, depending on which keywords are found, rather than from an external perspective. In like fashion, where they are found, etc. (A result including “Prometheus' manually crafted trees of image classifications/responses will but lacking “RCA Building would be ranked more relevant probably someday be seen as an early stage in the develop than a result including the latter but lacking the former.) ment of image-responsive technologies. Eventually Such 0474 The results from such a search can be clustered by approaches will be eclipsed by arrangements that rely on other concepts. For example, Some of the results may be machine understanding derived from the system itself, and its clustered because they share the theme “art deco. Others may US be clustered because they deal with corporate history of RCA 0483. One such technique simply examines which respon and GE. Others may be clustered because they concern the sive screen(s) are selected by users in particular contexts. As works of the architect Raymond Hood. Others may be clus Such usage patterns become evident, the most popular tered as relating to 20" century American sculpture, or Paul responses can be moved earlier in the sequence of Screens Manship. Other concepts found to produce distinct clusters presented to the user. may include John Rockefeller. The Mitsubishi Group, 0484. Likewise, if patterns become evident in use of the Colombia University, Radio City Music Hall. The Rainbow open-ended search query option, Such action can become a Room Restaurant, etc. standard response, and moved higher in the presentation 0475 Information from these clusters can be presented to queue. the user on Successive UI Screens, e.g., after the screens on 0485 The usage patterns can be tailored in various dimen which prescribed information/actions are presented. The sions of context. Males between 40 and 60 years of age, in order of these screens can be determined by the sizes of the New York, may demonstrate interest in different responses information clusters, or the keyword-determined relevance. following capture of a snapshot of a statue by a 20" century 0476 Still a further response is to present to the user a sculptor, than females between 13 and 16 years of age in Google search screen pre-populated with the twice Beijing. Most persons Snapping a photo of a food processor in weighted metadata as search terms. The user can then delete the weeks before Christmas may be interested in finding the terms that arent relevant to his/her interest, and add other cheapest online vendor of the product; most persons Snapping terms, so as to quickly execute a web search leading to the a photo of the same object the week following Christmas may information or action desired by the user. be interested in listing the item for sale on E-Bay or Craigslist. US 2013/0273968 A1 Oct. 17, 2013 29

Etc. Desirably, usage patterns are tracked with as many demo understanding can be used to dynamically decide what infor graphic and other descriptors as possible, so as to be most mation should be presented, or what action should be under predictive of user behavior. taken, responsive to a particular user capturing a particular 0486 More sophisticated techniques can also be applied, image (or to other stimulus). Truly intuitive computing will drawing from the rich Sources of expressly- and inferentially then have arrived. linked data sources now available. These include not only the web and personal profile information, but all manner of other Other Comments digital data we touch and in which we leave traces, e.g., cell 0493 While the image/metadata processing detailed phone billing statements, credit card Statements, shopping above takes many words to describe, it need not take much data from Amazon and EBay, Google search history, brows time to perform. Indeed, much of the processing of reference ing history, cached web pages, cookies, email archives, phone data, compilation of glossaries, etc., can be done off-line— message archives from Google Voice, travel reservations on before any input image is presented to the system. Flickr, Expedia and Orbitz, music collections on iTunes, cable tele Yahoo! or other service providers can periodically compile vision subscriptions, Netflix movie choices, GPS tracking and pre-process reference sets of data for various locales, to information, Social network data and activities, activities and be quickly available when needed to respond to an image postings on photo sites such as Flickr and Picasa and video query. sites such as YouTube, the times of day memorialized in these 0494. In some embodiments, other processing activities records, etc. (our “digital life log”). Moreover, this informa will be started in parallel with those detailed. For example, if tion is potentially available not just for the user, but also for initial processing of the first set of reference images suggests the users friends/family, for others having demographic that the Snapped image is place-centric, the system can similarities with the user, and ultimately everyone else (with request likely-useful information from other resources before appropriate anonymization and/or privacy safeguards). processing of the user image is finished. To illustrate, the 0487. The network of interrelationships between these system may immediately request a street map of the nearby data sources is smaller than the network of web links analyzed area, together with a satellite view, a street view, a mass transit by Google, but is perhaps richer in the diversity and types of map, etc. Likewise, a page of information about nearby res links From it can be mined a wealth of inferences and taurants can be compiled, together with another page detail insights, which can help inform what aparticular user is likely ing nearby movies and show-times, and a further page with a to want done with a particular Snapped image. local weather forecast. These can all be sent to the user's 0488 Artificial intelligence techniques can be applied to phone and cached for later display (e.g., by scrolling a thumb the data-mining task. One class of Such techniques is natural wheel on the side of the phone). language processing (NLP), a science that has made signifi 0495. These actions can likewise be undertaken before any cant advancements recently. image processing occurs—simply based on the geocode data 0489. One example is the Semantic Map compiled by accompanying the cellphone image. Cognition Technologies, Inc., a database that can be used to 0496 While geocoding data accompanying the cellphone analyze words in context, in order to discern their meaning image was used in the arrangement particularly described, This functionality can be used, e.g., to resolve homonym this is not necessary. Other embodiments can select sets of ambiguity in analysis of image metadata (e.g., does "bow” reference images based on other criteria, Such as image simi refer to a part of a ship, a ribbon adornment, a performer's larity. (This may be determined by various metrics, as indi thank-you, or a complement to an arrow? ProXimity to terms cated above and also detailed below. Known image classifi such as “Carnival cruise.” “satin. “Carnegie Hall' or “hunt cation techniques can also be used to determine one of several ing can provide the likely answer). U.S. Pat. No. 5,794,050 classes of images into which the input image falls, so that (FRCD Corp.) details underlying technologies. similarly-classed images can then be retrieved.) Another cri 0490 The understanding of meaning gained through NLP teria is the IP address from which the input image is uploaded. techniques can also be used to augment image metadata with Other images uploaded from the same-or geographically other relevant descriptors—which can be used as additional proximate IP addresses, can be sampled to form the refer metadata in the embodiments detailed herein. For example, a ence SetS. close-up image tagged with the descriptor “hibiscus stamens' 0497 Even in the absence of geocode data for the input can—through NLP techniques—be further tagged with the image, the reference sets of imagery may nonetheless be term “flower” (As of this writing, Flickr has 460 images compiled based on location. Location information for the tagged with “hibiscus” and “stamen.” but omitting “flower.”) input image can be inferred from various indirect techniques. 0491 U.S. Pat. No. 7,383,169 (Microsoft) details how A wireless service provider thru which a cell phone image is dictionaries and other large works of language can be pro relayed may identify the particular cell tower from which the cessed by NLP techniques to compile lexical knowledge tourist's transmission was received. (If the transmission bases that serve as formidable sources of such "common originated through another wireless link, such as WiFi, its sense' information about the world. This common sense location may also be known.) The tourist may have used his knowledge can be applied in the metadata processing detailed credit card an hour earlier at a Manhattan hotel, allowing the herein. (Wikipedia is another reference source that can serve system (with appropriate privacy safeguards) to infer that the as the basis for Such a knowledge base. Our digital life log is picture was taken somewhere near Manhattan. Sometimes yet another—one that yields insights unique to us as individu features depicted in an image are so iconic that a quick Search als.) for similar images in Flickr can locate the user (e.g., as being 0492. When applied to our digital life log, NLP techniques at the Eiffel Tower, or at the Statue of Liberty). can reach nuanced understandings about our historical inter 0498 GeoPlanet was cited as one source of geographic ests and actions—information that can be used to model information. However, a number of other geoinformation (predict) our present interests and forthcoming actions. This databases can alternatively be used. GeoNames-dot-org is US 2013/0273968 A1 Oct. 17, 2013 30 one. (It will be recognized that the “-dot-' convention, and 0506. It will be recognized that much of the foregoing omission of the usual http preamble, is used to prevent the processing is fuZZy. Much of the data may be in terms of reproduction of this text by the Patent Office from being metrics that have no absolute meaning, but are relevant only indicated as a live hyperlink) In addition to providing place to the extent different from other metrics. Many such different names for a given latitude/longitude (at levels of neighbor probabilistic factors can be assessed and then combined—a hood, city, state, country), and providing parent, child, and statistical stew. Artisans will recognize that the particular sibling information for geographic divisions, GeoNames implementation Suitable for a given situation may be largely free data (available as a web service) also provides functions arbitrary. However, through experience and Bayesian tech Such as finding the nearest intersection, finding the nearest niques, more informed manners of weighting and using the post office, finding the surface elevation, etc. Still another different factors can be identified and eventually used. option is Google's GeoSearch API, which allows retrieval of 0507 If the Flickr archive is large enough, the first set of and interaction with data from Google Earth and Google images in the arrangement detailed above may be selectively Maps. chosen to more likely be similar to the subject image. For 0499. It will be recognized that archives of aerial imagery example, Flickr can be searched for images taken at about the are growing exponentially. Part of Such imagery is from a same time of day. Lighting conditions will be roughly similar, straight-down perspective, but off-axis the imagery increas e.g., so that matching a night scene to a daylight scene is ingly becomes oblique. From two or more different oblique avoided, and shadow/shading conditions might be similar. views of a location, a 3D model can be created. As the reso Likewise, Flickr can be searched for images taken in the same lution of such imagery increases, Sufficiently rich sets of data season/month. Issues such as seasonal disappearance of the are available that—for some locations—a view of a scene as ice skating rink at Rockefeller Center, and Snow on a winter if taken from ground level may be synthesized. Such views landscape, can thus be mitigated. Similarly, if the camera/ can be matched with street level photos, and metadata from phone is equipped with a magnetometer, inertial sensor, or one can augment metadata for the other. other technology permitting its bearing (and/or azimuth/el (0500. As shown in FIG. 47, the embodiment particularly evation) to be determined, then Flickr can be searched for described above made use of various resources, including shots with this degree of similarity too. Flickr, a database of person names, a word frequency data 0508 Moreover, the sets of reference images collected base, etc. These are just a few of the many different informa from Flickr desirably comprise images from many different tion sources that might be employed in Such arrangements. Sources (photographers)—so they don't tend towards use of Other social networking sites, shopping sites (e.g., Amazon, the same metadata descriptors. EBay), weather and traffic sites, online thesauruses, caches of 0509) Images collected from Flickr may be screened for recently-visited web pages, browsing history, cookie collec adequate metadata. For example, images with no metadata tions, Google, other digital repositories (as detailed herein), (except, perhaps, an arbitrary image number) may be etc., can all provide a wealth of additional information that removed from the reference set(s). Likewise, images with less can be applied to the intended tasks. Some of this data reveals than 2 (or 20) metadata terms, or without a narrative descrip information about the users interests, habits and prefer tion, may be disregarded. ences—data that can be used to better infer the contents of the 0510 Flickr is often mentioned in this specification, but Snapped picture, and to better tailor the intuited response(s). other collections of content can of course be used. Images in 0501. Likewise, while FIG.47 shows a few lines intercon Flickr commonly have specified license rights for each necting the different items, these are illustrative only. Differ image. These include “all rights reserved as well as a variety ent interconnections can naturally be employed. of Creative Commons licenses, through which the public can 0502. The arrangements detailed in this specification area make use of the imagery on different terms. Systems detailed particular few out of myriad that may be employed. Most herein can limit their searches through Flickr for imagery embodiments will be different than the ones detailed. Some meeting specified license criteria (e.g., disregard images actions will be omitted, some will performed in different marked “all rights reserved'). orders, some will be performed in parallel rather than serially 0511. Other image collections are in some respects pref (and vice versa). Some additional actions may be included, erable. For example, the database at images.google-dot-com etc seems better at ranking images based on metadata-relevance 0503. One additional action is to refine the just-detailed than Flickr. process by receiving user-related input, e.g., after the process 0512 Flickr and Google maintain image archives that are ing of the first set of Flickr images. For example, the system publicly accessible. Many other image archives are private. identified “Rockefeller Center,” “Prometheus,” and “Skating The present technology finds application with both includ rink” as relevant metadata to the user-Snapped image. The ing some hybrid contexts in which both public and propri system may query the user as to which of these terms is most etary image collections are used (e.g., Flickr is used to find an relevant (or least relevant) to his/her particular interest. The image based on a user image, and the Flickr image is Submit further processing (e.g., further search, etc.) can be focused ted to a private database to find a match and determine a accordingly. corresponding response for the user). 0504 Within an image presented on a touch screen, the 0513. Similarly, while reference was made to services user may touch a region to indicate an object of particular Such as Flickr for providing data (e.g., images and metadata), relevance within the image frame. Image analysis and Subse other sources can of course be used. quent acts can then focus on the identified object. 0514. One alternative source is an ad hoc peer-to-peer 0505. Some of the database searches can be iterative/re (P2P) network. In one such P2P arrangement, there may cursive. For example, results from one database search can be optionally be a central index, with which peers can commu combined with the original search inputs and used as inputs nicate in searching for desired content, and detailing the for a further search. content they have available for sharing. The index may US 2013/0273968 A1 Oct. 17, 2013

include metadata and metrics for images, together with point used in XMP- and EXIF-camera metainformation. Flickr ers to the nodes at which the images themselves are stored. uses a syntax established by Geobloggers, e.g. 0515. The peers may include cameras, PDAs, and other geotagged portable devices, from which image information may be geo:lat=57.64911 available nearly instantly after it has been captured. geo:lon=10.40744 0516. In the course of the methods detailed herein, certain 0524. In processing metadata, it is sometimes helpful to relationships are discovered between imagery (e.g., similar clean-up the data prior to analysis, as referenced above. The geolocation; similar image metrics; similar metadata, etc). metadata may also be examined for dominant language, and These data are generally reciprocal, so if the system discov if not English (or other particular language of the implemen ers—during processing of Image A, that its color histogram is tation), the metadata and the associated image may be similar to that of Image B, then this information can be stored removed from consideration. for later use. If a later process involves Image B, the earlier 0525. While the earlier-detailed embodiment sought to stored information can be consulted to discover that Image A identify the image Subject as being one of a person/place? has a similar histogram without analyzing Image A. Such thing so that a correspondingly-different action can be taken, relationships are akin to virtual links between the images. analysis/identification of the image within other classes can 0517 For such relationship information to maintain its naturally be employed. A few examples of the countless other utility over time, it is desirable that the images be identified in class/type groupings include animal/vegetable/mineral; golf a persistent manner. If a relationship is discovered while tennis/football/baseball; male/female; wedding-ring-de Image A is on a user's PDA, and Image B is on a desktop tected/wedding-ring-not-detected; urban/rural; rainy/clear; Somewhere, a means should be provided to identify Image A day/night; child/adult: Summer/autumn/winter/spring; car/ even after it has been transferred to the user's MySpace truck; consumer product/non-consumer product; can/box/ account, and to track Image B after it has been archived to an bag; natural/man-made; Suitable for all ages/parental advi anonymous computer in a cloud network. sory for children 13 and below/parental advisory for children 0518) Images can be assigned Digital Object Identifiers 17 and below/adult only; etc. (DOI) for this purpose. The International DOI Foundation has 0526. Sometimes different analysis engines may be implemented the CNRI Handle System so that such resources applied to the user's image data. These engines can operate can be resolved to their current location through the web site sequentially, or in parallel. For example, FIG. 48A shows an at doi-dot-org. Another alternative is for the images to be arrangement in which if an image is identified as person assigned and digitally watermarked with identifiers tracked centric—it is next referred to two other engines. One identi by Digimarc For Images service. fies the person as family, friend or stranger. The other identi 0519 If several different repositories are being searched fies the person as child or adult. The latter two engines work for imagery or other information, it is often desirable to adapt in parallel, after the first has completed its work. the query to the particular databases being used. For example, 0527. Sometimes engines can be employed without any different facial recognition databases may use different facial certainty that they are applicable. For example, FIG. 48B recognition parameters. To search across multiple databases, shows engines performing family/friend/stranger and child/ technologies such as detailed in Digimarc's published patent adult analyses—at the same time the person/place/thing applications 20040243567 and 20060020630 can be engine is undertaking its analysis. If the latter engine deter employed to ensure that each database is probed with an mines the image is likely a place or thing, the results of the appropriately-tailored query. first two engines will likely not be used. 0528 (Specialized online services can be used for certain 0520 Frequent reference has been made to images, but in types of image discrimination/identification. For example, many cases other information may be used in lieu of image one web site may provide an airplane recognition service: information itself. In different applications image identifiers, when an image of an aircraft is uploaded to the site, it returns characterizing eigenvectors, color histograms, keypoint an identification of the plane by make and model. (Such descriptors, FFTs, associated metadata, decoded barcode or technology can follow teachings, e.g., of Sun, The Features watermark data, etc., may be used instead of imagery, perse Vector Research on Target Recognition of Airplane, JCIS (e.g., as a data proxy). 2008 Proceedings; and Tien, Using Invariants to Recognize 0521. While the earlier example spoke of geocoding by Airplanes in Inverse Synthetic Aperture Radar Images, Opti latitude/longitude data, in other arrangements the cellphone/ cal Engineering, Vol. 42, No. 1, 2003.) The arrangements camera may provide location data in one or more other ref detailed herein can refer imagery that appears to be of aircraft erence systems, such as Yahoo’s GeoPlanet ID—the Where to such a site, and use the returned identification information. on Earth ID (WOEID). Orall input imagery can be referred to such a site; most of the 0522 Location metadata can be used for identifying other returned results will be ambiguous and will not be used.) resources in addition to similarly-located imagery. Web 0529 FIG. 49 shows that different analysis engines may pages, for example, can have geographical associations (e.g., provide their outputs to different response engines. Often the a blog may concern the author's neighborhood; a restaurants different analysis engines and response engines may be oper web page is associated with a particular physical address). ated by different service providers. The outputs from these The web service GeoURL-dot-org is a location-to-URL response engines can then be consolidated/coordinated for reverse directory that can be used to identify web sites asso presentation to the consumer. (This consolidation may be ciated with particular geographies. performed by the user's cellphone—assembling inputs from 0523 GeoURL supports a variety of location tags, includ different data sources; or such task can be performed by a ing their own ICMB meta tags, as well as Geo Tags. Other processor elsewhere.) systems that Support geotagging include RDF, Geo microfor 0530 One example of the technology detailed herein is a mat, and the GPSLongitude/GPSLatitude tags commonly homebuilder who takes a cellphone image of a drill that needs US 2013/0273968 A1 Oct. 17, 2013 32 a spare part. The image is analyzed, the drill is identified by other UI controls on the device. For example, repeated press the system as a Black and Decker DR250B, and the user is ing of a Function Select button can cause different intended provided various info/action options. These include review operations to be displayed on the screen of the UI (just as ing photos of drills with similar appearance, reviewing photos familiar consumer cameras have different photo modes. Such of drills with similar descriptors/features, reviewing the as Close-up, Beach, Nighttime, Portrait, etc.). When the user user's manual for the drill, seeing a parts list for the drill, then presses the shutter button, the selected operation is buying the drill new from Amazon or used from EBay, listing invoked. the builder's drill on EBay, buying parts for the drill, etc. The 0538. One common response (which may need no confir builder chooses the “buying parts' option and proceeds to mation) is to post the image on Flickr or Social network order the necessary part. (FIG. 41.) site(s). Metadata inferred by the processes detailed herein can 0531. Another example is a person shopping for a home. be saved in conjunction with the imagery (qualified, perhaps, She Snaps a photo of the house. The system refers the image as to its confidence). both to a private database of MLS information, and a public 0539. In the past, the "click” of a mouse served to trigger database Such as Google. The system responds with a variety a user-desired action. That action identified an X-Y-location of options, including reviewing photos of the nearest houses coordinate on a virtual landscape (e.g., a desktop screen) that offered for sale; reviewing photos of houses listed for sale that indicated the user's express intention. Going forward, this are closest in value to the pictured home, and within the same role will increasingly be served by the “snap' of a shutter— Zip-code; reviewing photos of houses listed for sale that are capturing a real landscape from which a users intention will most similar in features to the pictured home, and within the be inferred. same Zip-code; neighborhood and School information, etc. 0540 Business rules can dictate a response appropriate to (FIG. 43.) a given situation. These rules and responses may be deter 0532. In another example, a first user snaps an image of mined by reference to data collected by web indexers, such as Paul Simon at a concert. The system automatically posts the Google, etc., using intelligent routing. image to the user's Flickr account—together with metadata 0541 Crowdsourcing is not generally suitable for real inferred by the procedures detailed above. (The name of the time implementations. However, inputs that stymie the sys artist may have been found in a search of Google for the user's tem and fail to yield a corresponding action (or yield actions geolocation; e.g., a Ticketmaster web page revealed that Paul from which user selects none) can be referred offline for Simon was playing that venue that night.) The first user's crowdsource analysis—so that next time its presented, it can picture, a moment later, is encountered by a system process be handled better. ing a second concert-goer's photo of the same event, from a 0542. Image-based navigation systems present a different different vantage. The second user is shown the first user's topology than is familiar from web page-based navigation photo as one of the system's responses to the second photo. system. FIG. 57A shows that web pages on the internet relate The system may also alert the first user that another picture of in a point-to-point fashion. For example, web page 1 may link the same event from a different viewpoint is available for to web pages 2 and 3. Web page 3 may link to page 2. Web review on his cell phone, if he'll press a certain button twice. page 2 may link to page 4. Etc. FIG. 57B shows the contrast 0533. In many such arrangements, it will be recognized ing network associated with image-based navigation. The that “the content is the network.’ Associated with each photo, individual images are linked a central node (e.g., a router), or each Subject depicted in a photo (or any other item of digital which then links to further nodes (e.g., response engines) in content or information expressed therein), is a set of data and accordance with the image information. attributes that serve as implicit- or express-links to actions 0543. The “router here does not simply route an input and other content. The user can navigate from one to the packet to a destination determined by address information next navigating between nodes on a network. conveyed with the packet—as in the familiar case with inter 0534 Television shows are rated by the number of view net traffic routers. Rather, the router takes image information ers, and academic papers are judged by the number of later and decides what to do with it, e.g., to which responsive citations. Abstracted to a higher level, it will be recognized system should the image information be referred. that such “audience measurement' for physical- or virtual 0544 Routers can be stand-alone nodes on a network, or content is the census of links that associate it with other they can be integrated with other devices. (Or their function physical- or virtual-content. ality can be distributed between such locations.) A wearable 0535 While Google is limited to analysis and exploitation computer may have a router portion (e.g., a set of Software of links between digital content, the technology detailed instructions)—which takes image information from the com herein allows the analysis and exploitation of links between puter, and decides how it should be handled. (For example, if physical content as well (and between physical and electronic it recognizes the image information as being an image of a content). business card, it may OCR name, phone number, and other 0536. Known cell phone cameras and other imaging data, and enter it into a contacts database.) The particular devices typically have a single "shutter' button. However, the response for different types of input image information can be device may be provided with different actuator buttons— determined by a registry database, e.g., of the sort maintained each invoking a different operation with the captured image by a computer's , or otherwise. information. By this arrangement, the user can indicate—at 0545. Likewise, while response engines can be stand the outset—the type of action intended (e.g., identify faces in alone nodes on a network, they can also be integrated with image per Picasa or VideoSurf information, and post to my other devices (or their functions distributed). A wearable FaceBook page; or try and identify the depicted person, and computer may have one or several different response engines send a “friend request' to that person's MySpace account). that take action on information provided by the router portion. 0537. Rather than multiple actuator buttons, the function 0546 FIG. 52 shows an arrangement employing several of a sole actuator button can be controlled in accordance with computers (A-E). Some of which may be wearable computers US 2013/0273968 A1 Oct. 17, 2013

(e.g., cellphones). The computers include the usual comple Store, Proceedings of The First ACM Workshop on Continu ment of processor, memory, storage, input/output, etc. The ous Archival and Retrieval of Personal Experiences (CARPE storage or memory can contain content, Such as images, audio 04), pp. 48-55; Wilkinson, Remember This, The New Yorker, and video. The computers can also include one or more rout May 27, 2007. See also the other references cited at Gordon's ers and/or response engines. Standalone routers and response Bell's web page, and the ACM Special engines may also be coupled to the network Interest Group web page for CARPE (Capture, Archival & 0547. The computers are networked, shown schematically Retrieval of Personal Experiences).) by link 150. This connection can be by any known networking 0554. The present technology is well suited for use with arrangement, including the internet and/or wireless links Such experiential digital content—either as input to a system (WiFi, WiMax, Bluetooth, etc), Software in at least certain of (i.e., the system responds to the user's present experience), or the computers includes a peer-to-peer (P2P) client, which as a resource from which metadata, habits, and other makes at least Some of that computer's resources available to attributes can be mined (including service in the role of the other computers on the network, and reciprocally enables that Flickr archive in the embodiments earlier detailed). computer to employ certain resources of the other computers. 0555. In embodiments that employ personal experience as 0548 Though the P2P client, computer A may obtain an input, it is initially desirable to have the system trigger and image, video and audio content from computer B. Sharing respond only when desired by the user—rather than being parameters on computer B can be set to determine which constantly free-running (which is currently prohibitive from content is shared, and with whom. Data on computer B may the standpoint of processing, memory and bandwidth issues). specify, for example, that Some content is to be kept private; 0556. The user's desire can be expressed by a deliberate Some may be shared with known parties (e.g., a tier of social action by the user, e.g., pushing a button, or making a gesture network “Friends”); and other may be freely shared. (Other with head or hand. The system takes data from the current information, Such as geographic position information, may experiential environment, and provides candidate responses. also be shared—subject to such parameters.) 0557. More interesting, perhaps, are systems that deter 0549. In addition to setting sharing parameters based on mine the users interest through biological sensors. Electro party, the sharing parameters may also specify sharing based encephalography, for example, can be used to generate a on the content age. For example, content/information older signal that triggers the system's response (or triggers one of than a year might be shared freely, and content older than a several different responses, e.g., responsive to different month might be shared with a tier of friends (or in accordance stimuli in the current environment). Skin conductivity, pupil with other rule-based restrictions). In other arrangements, dilation, and other autonomous physiological responses can fresher content might be the type most liberally shared. E.g., also be optically or electrically sensed, and provide a trigger content captured or stored within the past hour, day or week ing signal to the system. might be shared freely, and content from within the past 0558 Eye tracking technology can be employed to iden month or year might be shared with friends. tify which object in a field of view captured by an experien 0550 An exception list can identify content—or one or tial-video sensor is of interest to the user. If Tony is sitting in more classes of content—that is treated differently than the a bar, and his eye falls on a bottle of unusual beer in front of above-detailed rules (e.g., never shared or always shared). a nearby woman, the system can identify his point of focal 0551. In addition to sharing content, the computers can attention, and focus its own processing efforts on pixels cor also share their respective router and response engine responding to that bottle. With a signal from Tony, Such as two resources across the network. Thus, for example, if computer quick eye-blinks, the system can launch an effort to provide A does not have a response engine Suitable for a certain type candidate responses based on that beer bottle perhaps also of image information, it can pass the information to computer informed by other information gleaned from the environment B for handling by its response engine. (time of day, date, ambient audio, etc.) as well as Tony's own 0552. It will be recognized that such a distributed archi personal profile data. (Gaze recognition and related technol tecture has a number of advantages, in terms of reduced cost ogy is disclosed, e.g., in Apple's patent publication and increased reliability. Additionally, the "peer groupings 200802.11766.) can be defined geographically, e.g., computers that find them 0559 The system may quickly identify the beer as Dop selves within a particular spatial environment (e.g., an area pelbock, e.g., by pattern matching from the image (and/or served by a particular WiFi system). The peers canthus estab OCR). With that identifier it finds other resources indicating lish dynamic, ad hoc subscriptions to content and services the beer originates from Bavaria, where it is brewed by monks from nearby computers. When the computer leaves that envi of St. Francis of Paula. Its 9% alcohol content also is distinc ronment, the session ends. tive. 0553 Some researchers foresee the day when all of our 0560. By checking personal experiential archives that experiences are captured in digital form. Indeed, Gordon Bell friends have made available to Tony, the system learns that his at Microsoft has compiled a digital archive of his recent buddy Geoff is fond of Doppelbock, and most recently drank existence through his technologies CyberAll, SenseCam and a bottle in a pub in Dublin. Tony's glancing encounter with MyLifeBits. Included in Bell's archive are recordings of all the bottle is logged in his own experiential archive, where telephone calls, video of daily life, captures of all TV and Geoff may later encounter same. The fact of the encounter radio consumed, archive of all web pages visited, map data of may also be real-time-relayed to Geoff in Prague, helping all places visited, polysomnograms for his sleep apnea, etc., populate an on-going data feed about his friends' activities. etc., etc. (For further information see, e.g., at Bell, A Digital 0561. The bar may also provide an experiential data Life, Scientific American, March, 2007: Gemmell, MyLife server, to which Tony is wirelessly granted access. The server Bits: A Personal Database for Everything, Microsoft maintains an archive of digital data captured in the bar, and Research Technical Report MSR-TR-2006-23; Gemmell, contributed by patrons. The server may also be primed with Passive Capture and Ensuing Issues for a Personal Lifetime related metadata & information the management might con US 2013/0273968 A1 Oct. 17, 2013 34 sider of interest to its patrons, such as the Wikipedia page on with shared interests, singles, etc. or in the form of gaming by the brewing methods of the monks of St Paul, what bands allowing people to opt-in to theme based games, where might be playing in weeks to come, or what the nights patrons piece together clues to find the true identity of some specials are. (Per user preference, some users require that one in the bar or unravel a mystery (similar to the board game their data be cleared when they leave the bar; others permit the Clue). Finally, the demographic information as it relates to data to be retained.) Tony's system may routinely check the audience measurement is of material value to proprietors as local environment’s experiential data server to see what odd they consider which beers to stock next, where to advertise, bits of information might be found. This time it shows that the etc. woman at barstool 3 (who might employ a range privacy heuristics to know where and with whom to share her infor Still Further Discussion mation; in this example she might screen her identity from 0567 Certain portable devices, such as the Apple iPhone, Strangers)—the woman with the Doppelbock has, among offer single-button access to pre-defined functions. Among her friends, a Tom

0576. The control that launches this glossary of signs than manipulating a keyboard to enter, e.g., choice #3, the can itself be an image. One image Suitable for this func user may capture an image of three fingers—visually sym tion is a generally featureless frame. An all-dark frame can be bolizing the selection. Software recognizes the three finger achieved by operating the shutter with the lens covered. An symbol as meaning the digit 3, and inputs that value to the all-light frame can be achieved by operating the shutter with process. the lens pointing at a light source. Another Substantially fea 0584) If desired, visual signs can form part of authentica tureless frame (of intermediate density) may be achieved by tion procedures, e.g., to access a bank or social-networking imaging a patch of skin, or wall, or sky. (To be substantially web site. For example, after entering a sign-on name or pass featureless, the frame should be closer to featureless than word at a site, the user may be shown a stored image (to matching one of the other stored visual signs. In other confirm that the site is authentic) and then be prompted to embodiments, “featureless' can be concluded if the image Submit an image of a particular visual type (earlier defined by has a texture metric below a threshold value.) the user, but not now specifically prompted by the site). The 0577 (The concept of triggering an operation by capturing web site checks features extracted from the just-captured an all-light frame can be extended to any device function. In image for correspondence with an expected response, before Some embodiments, repeated all-light exposures alternatively permitting the user to access the web site. toggle the function on and off. Likewise with all-dark and 0585. Other embodiments can respond to a sequence of intermediate density frames. A threshold can be set by the Snapshots within a certain period (e.g., 10 seconds)—a gram user with a UI control, or by the manufacturer to establish mar of imagery. An image sequence of "wristwatch.” “four how “light” or “dark” such a frame must be in order to be fingers’ “three fingers' can set an alarm clock function on the interpreted as a command. For example, 8-bit (0-255) pixel portable device to chime at 7am. values from a million pixel sensor can be summed. If the Sum 0586. In still other embodiments, the visual signs may be is less than 900,000, the frame may be regarded as all-dark. If gestures that include motion—captured as a sequence of greater than 254 million, the frame may be regarded as all frames (e.g., video) by the portable device. light. Etc.) 0587 Context data (e.g., indicating the user's geographic 0578. One of the other featureless frames can trigger location, time of day, month, etc.) can also be used to tailor the another special response. It can cause the portable device to response. For example, when a user is at work, the response to launchall of the stored functions/URLs (or, e.g., a certain five a certain visual sign may be to fetch an image from a security or ten) in the glossary. The device can cache the resulting camera from the user's home. At home, the response to the frames of information, and present them successively when same sign may be to fetch an image from a security camera at the user operates one of the phone controls, such as button work. 116b or scroll wheel 124 in FIG. 44, or makes a certain 0588. In this embodiment, as in others, the response gesture on a touch screen. (This function can be invoked by needn't be visual. Audio or other output (e.g., tactile, Smell, other controls as well.) etc.) can of course be employed. 0579. The third of the featureless frames (i.e., dark, white, 0589. The just-described technology allows a user to or mid-density) can send the device's location to a map server, define a glossary of visual signs and corresponding custom which can then transmit back multiple map views of the ized responses. An intended response can be quickly invoked user's location. These views may include aerial views and by imaging a readily-available subject. The captured image street map views at different Zoom levels, together with can be of low quality (e.g., overexposed, blurry), since it only nearby street-level imagery. Each of these frames can be needs to be classified among, and distinguished from, a rela cached at the device, and quickly reviewed by turning a scroll tively small universe of alternatives. wheel or other UI control. 0580. The user interface desirably includes controls for Visual Intelligence Pre-Processing deleting visual signs, and editing the name/functionality 0590 Another aspect of the present technology is to per assigned to each. The URLS can be defined by typing on a form one or more visual intelligence pre-processing opera keypad, or by navigating otherwise to a desired destination tions on image information captured by a camera sensor. and then saving that destination as the response correspond These operations may be performed without user request, and ing to a particular image. before other image processing operations that the camera 0581 Training of the pattern recognition engine can con customarily performs. tinue through use, with Successive images of the different 0591 FIG. 56 is a simplified diagram showing certain of visual signs each serving to refine the template model by the processing performed in an exemplary camera, Such as a which that visual sign is defined. cellphone camera. Light impinges on an image sensor com 0582. It will be recognized that a great variety of different prising an array of photodiodes. (CCD or CMOS sensor tech visual signs can be defined, using resources that are com nologies are commonly used.) The resulting analog electrical monly available to the user. A hand can define many different signals are amplified, and converted to digital form by D/A signs, with fingers arranged in different positions (fist, one converters. The outputs of these D/A converters provide through five-fingers, thumb-forefinger OK sign, open palm, image data in its most raw, or “native form. thumbs-up, American sign language signs, etc). Apparel and 0592. The foregoing operations are typically performed its components (e.g., shoes, buttons) can also be used, as can by circuitry formed on a common Substrate, i.e., "on-chip. jewelry. Features from common Surroundings (e.g., tele Before other processes can access the image data, one or more phone) may also be used. other processes are commonly performed. 0583. In addition to launching particular favorite opera 0593. One such further operation is Bayer interpolation tions, such techniques can be used as a user interface tech (de-mosaicing). The photodiodes of the sensor array typically nique in other situations. For example, a software program or each captures only a single color of light: red, green or blue web service may present a list of options to the user. Rather (R/G/B), due to a color filter array. This array is comprised of US 2013/0273968 A1 Oct. 17, 2013 36 a tiled 2x2 pattern of filter elements: one red, a diagonally routed for further processing (e.g., sent from the cellphone to opposite one blue, and the other two green. Bayer interpola a barcode-responsive service). This information can com tion effectively “fills in the blanks' of the sensor's resulting prise the native image data, and/or the Fourier transform R/G/B mosaic pattern, e.g., providing ared signal where there information derived from the image data. is a blue filter, etc. 0606. In the former case, the full image data needn't be 0594. Another common operation is white balance correc sent. In some embodiments a down-sampled version of the tion. This process adjusts the intensities of the component image data, e.g., one-fourth the resolution in both the hori R/G/B colors in order to render certain colors (especially Zontal and vertical directions, can be sent. Or just patches of neutral colors) correctly. the image data having the highest likelihood of depicting part 0595. Other operations that may be performed include ofa barcode pattern can be sent. Or, conversely, patches of the gamma correction and edge enhancement. image data having the lowest likelihood of depicting a bar 0596 Finally, the processed image data is typically com code can not be sent. (These may be patches having no peak pressed to reduce storage requirements. JPEG compression is at the characteristic frequency, or having a lower amplitude most commonly used. there than nearby.) 0597. The processed, compressed image data is then 0607. The transmission can be prompted by the user. For stored in a buffer memory. Only at this point is the image example, the camera UI may ask the user if information information commonly available to other processes and Ser should be directed for barcode processing. In other arrange vices of the cell phone (e.g., by calling a system API). ments, the transmission is dispatched immediately upon a 0598. One such process that is commonly invoked with determination that the image frame matches the template, this processed image data is to present the image to the user on indicating possible barcode data. No user action is involved. the screen of the camera. The user can then assess the image 0608. The Fourier transform data can be tested for signs of and decide, e.g., whether (1) to save it to the camera's other image Subjects as well. A 1D barcode, for example, is memory card, (2) to transmit it in a picture message, (3) to characterized by a significant amplitude component at a high delete it, etc. frequency—(going 'across the pickets.” and another signifi 0599. Until the user instructs the camera (e.g., through a cant amplitude spike at a low frequency—going along the control in a graphical or button-based user interface), the pickets. (Significant again means two-or-more times the image stays in the buffer memory. Without further instruc amplitude of nearby frequencies, as noted above.) Other tion, the only use made of the processed image data is to image contents can also be characterized by reference to their display same on the screen of the cell phone. Fourier domain representation, and corresponding templates 0600 FIG. 57 shows an exemplary embodiment of the can be devised. Fourier transform data is also commonly used presently-discussed aspect of the technology. After convert in computing fingerprints used for automated recognition of ing the analog signals into digital native form, one or more media content. other processes are performed. 0609. The Fourier-Mellin (F-M) transform is also useful 0601 One such process is to perform a Fourier transfor in characterizing various image subjects/components—in mation (e.g., an FFT) on the native image data. This converts cluding the barcodes noted above. The F-M transform has the the spatial-domain representation of the image into a fre advantage of being robust to Scale and rotation of the image quency-domain representation. Subject (Scale/rotation invariance). In an exemplary embodi 0602. A Fourier-domain representation of the native ment, if the scale of the Subject increases (as by moving the image data can be useful in various ways. One is to screen the camera closer), the F-M transform pattern shifts up; if the image for likely barcode data. scale decreases, the F-M pattern shifts down. Similarly, if the 0603 One familiar 2D barcode is a checkerboard-like subject is rotated clockwise, the F-M pattern shifts right; if array of light- and dark-squares. The size of the component rotated counter-clockwise, the F-M pattern shifts left. (The squares, and thus their repetition spacing, gives a pair of particular directions of the shifts can be tailored depending on notable peaks in the Fourier-domain representation of the the implementation.) These attributes make F-M data impor image at a corresponding frequency. (The peaks may be tant in recognizing patterns that may be affine-transformed, phase-spaced ninety degrees in the UV plane, if the pattern Such as facial recognition, character recognition, object rec recurs in equal frequency in both the vertical and horizontal ognition, etc. directions.) These peaks extend significantly above other 0610 The arrangement shown in FIG. 57 applies a Mellin image components at nearby image frequencies—with the transform to the output of the Fourier transform process, to peaks often having a magnitude twice- to five- or ten-times yield F-M data. The F-M can then be screened for attributes (or more) that of nearby image frequencies. If the Fourier associated with different image subjects. transformation is done on tiled patches from the image (e.g., 0611 For example, text is characterized by plural symbols patches of 16x16 pixels, or 128x128 pixels, etc), it may be of approximately similar size, composed of strokes in a fore found that certain patches that are wholly within a barcode ground color that contrast with a larger background field. portion of the image frame have essentially no signal energy Vertical edges tend to dominate (albeit slightly inclined with except at this characteristic frequency. italics), with significant energy also being found in the hori 0604. As shown in FIG. 57, Fourier transform information Zontal direction. Spacings between strokes usually fall within can be analyzed for telltale signs associated with an image of a fairly narrow range. a barcode. A template-like approach can be used. The tem 0612. These attributes manifest themselves as character plate can comprise a set of parameters against which the istics that tend to reliably fall within certain boundaries in the Fourier transform information is tested—to see if the data has F-M transform space. Again, a template can define tests by indicia associated with a barcode-like pattern. which the F-M data is screened to indicate the likely presence 0605. If the Fourier data is consistent with an image of text in the captured native image data. If the image is depicting a 2D barcode, corresponding information can be determined to include likely-text, it can be dispatched to a US 2013/0273968 A1 Oct. 17, 2013 37 service that handles this type of data (e.g., an optical character circuitry integrated on the same Substrate as the image sen recognition, or OCR, engine). Again, the image (or a variant sors. (Some of the operations may be performed by program of the image) can be sent, or the transform data can be sent, or mable hardware—either on the substrate or off responsive Some other data. to software instructions.) 0613 Just as text manifests itself with a certain set of 0622. While the foregoing operations are described as characteristic attributes in the F-M, so do faces. The F-M data immediately following conversion of the analog sensor sig output from the Mellin transform can be tested against a nals to digital form, in other embodiments such operations different template to determine the likely presence of a face can be performed after other processing operations (e.g., within the captured image. Bayer interpolation, white balance correction, JPEG com 0614 Likewise, the F-M data can be examined for tell-tale pression, etc.). signs that the image data conveys a watermark. A watermark 0623. Some of the services to which information is sent orientation signal is a distinctive signal present in some water may be provided locally in the cell phone. Or they can be marks that can serve as a sign that a watermark is present. provided by a remote device, with which the cellphone estab 0615. In the examples just given, as in others, the tem lishes a link that is at least partly wireless. Or Such processing plates may be compiled by testing with known images (e.g., can be distributed among various devices. “training”). By capturing images of many different text pre 0624 (While described in the context of conventional sentations, the resulting transform data can be examined for CCD and CMOS sensors, this technology is applicable attributes that are consistent across the sample set, or (more regardless of sensor type. Thus, for example, Foveon and likely) that fall within bounded ranges. These attributes can panchromatic image sensors can alternately be used. So can then be used as the template by which images containing high dynamic range sensors, and sensors using Kodak's True likely-text are identified. (Likewise for faces, barcodes, and sense Color Filter Pattern (which add panchromatic sensor other types of image subjects.) pixels to the usual Bayer array of red/green/blue sensor pix 0616 FIG. 57 shows that a variety of different transforms els). Sensors with infrared output data can also advanta can be applied to the image data. These are generally shown geously be used. For example, sensors that output infrared as being performed in parallel, although one or more can be image data (in addition to visible image data, or not) can be performed sequentially—either all operating on the same used to identify faces and other image Subjects with tempera input image data, or one transform using an output of a ture differentials—aiding in segmenting image Subjects previous transform (as is the case with the Mellin transform). within the frame.) Although not all shown (for clarity of illustration), outputs 0625. It will be recognized that devices employing the from each of the other transform processes can be examined FIG.57 architecture have, essentially, two parallel processing for characteristics that suggest the presence of a certain image chains. One processing chain produces data to be rendered type. If found, related data is then sent to a service appropriate into perceptual form for use by human viewers. This chain to that type of image information. typically includes at least one of a de-mosaic processor, a 0617. In addition to Fourier transform and Mellin trans white balance module, and a JPEG image compressor, etc. form processes, processes such as eigenface (eigenvector) The second processing chain produces data to be analyzed by calculation, image compression, cropping, affine distortion, one or more machine-implemented algorithms, and in the filtering, DCT transform, wavelet transform, Gabor trans illustrative example includes a Fourier transform processor, form, and other signal processing operations can be applied an eigenface processor, etc. (all are regarded as transforms). Others are noted elsewhere in 0626. Such processing architectures are further detailed in this specification, and in the documents incorporated by ref application 61/176.739, cited earlier. erence. Outputs from these processes are then tested for char 0627 By arrangements such as the foregoing, one or more acteristics indicating that the chance the image depicts a appropriate image-responsive services can begin formulating certain class of information, is greater than a random chance. candidate responses to the visual stimuli before the user has 0618. The outputs from some processes may be input to even decided what to do with the captured image. other processes. For example, an output from one of the boxes labeled ETC in FIG.57 is provided as an input to the Fourier Further Comments on Visual Intelligence Pre-Processing transform process. This ETC box can be, for example, a 0628 While static image pre-processing was discussed in filtering operation. Sample filtering operations include connection with FIG. 57 (and FIG. 50), such processing can median, Laplacian, Wiener, Sobel, high-pass, low-pass, also include temporal aspects, such as motion. bandpass, Gabor, signum, etc. (Digimarc's U.S. Pat. Nos. 0629 Motion is most commonly associated with video, 6,442,284, 6,483,927, 6,516,079, 6,614,914, 6,631, 198, and the techniques detailed herein can be used when captur 6,724,914, 6,988,202, 7,013,021 and 7,076,082 show various ing video content. However, motion/temporal implications such filters.) are also present with “still imagery. 0619. Sometimes a single service may handle different 0630 For example, some image sensors are read sequen data types, or data that passes different screens. In FIG.57, for tially, top row to bottom row. During the reading operation, example, a facial recognition service may receive F-M trans the image Subject may move within the image frame (i.e., due form data, or eigenface data. Or it may receive image infor to camera movement or subject movement). An exaggerated mation that has passed one of several different screens (e.g., view of this effect is shown in FIG. 60, depicting an imaged its F-M transform passed one screen, or its eigenface repre “E” captured as the sensor is moved to the left. The vertical sentation passed a different screen). stroke of the letter is further from the left edge of the image 0620. In some cases, data can be sent to two or more frame at the bottom than the top, due to movement of the different services. sensor while the pixel data is being clocked-out. 0621. Although not essential, it is desirable that some or 06.31 The phenomenon also arises when the camera all of the processing shown in FIG. 57 be performed by assembles data from several frames to generate a single 'still US 2013/0273968 A1 Oct. 17, 2013 image. Often unknown to the user, many consumer imaging by processing in the cell phone, or external services can be devices rapidly capture plural frames of image data, and employed (e.g., the OCR recognition service shown in FIG. composite different aspects of the data together (using soft 57 can be within the cellphone, or can be a remote server, etc.; ware provided, e.g., by FotoNation, Inc., now Tessera Tech similarly with the operations shown in FIG. 50.). nologies, Inc.). For example, the device may take three expo 0637. How can this information be packaged to facilitate Sures—one exposed to optimize appearance of faces detected Subsequent processing? One alternative is to convey it in the in the image frame, another exposed in accordance with the “alpha' channel of common image formats. background, and other exposed in accordance with the fore 0638 Most image formats represent imagery by data con ground. These are melded together to create a pleasing mon veyed in plural channels, or byte-planes. In RGB, for tage. (In another example, the camera captures a burst of example, one channel conveys red luminance, a second con frames and, in each, determines whether persons are Smiling veys green luminance, and a third conveys blue luminance. or blinking. It may then select different faces from different Similarly with CMYK (the channels respectively conveying frames to yield a final image.) cyan, magenta, yellow, and black information) Ditto with 0632. Thus, the distinction between video and still imag YUV commonly used with video (a luma, or brightness, ery is no longer simply a device modality, but rather is becom channel: Y, and two color channels: U and V), and LAB (also ing a user modality. brightness, with two color channels). 0633 Detection of motion can be accomplished in the 0639. These imaging constructs are commonly extended spatial domain (e.g., by reference to movement of feature to include an additional channel: alpha. The alpha channel is pixels between frames), or in a transform domain. Fourier provided to convey opacity information indicating the transform and DCT data are exemplary. The system may extent to which background subjects are visible through the extract the transform domain signature of an image compo imagery. nent, and track its movement across different frames iden 0640 While commonly supported by image processing tifying its motion. One illustrative technique deletes, e.g., the file structures, software and systems, the alpha channel is not lowest N frequency coefficients—leaving just high frequency much used (except, most notably, in computer generated edges, etc. (The highest M frequency coefficients may be imagery and radiology). Certain implementations of the disregarded as well.) A thresholding operation is performed present technology use the alpha channel to transmit infor on the magnitudes of the remaining coefficients—Zeroing mation derived from image data. those below a value (such as 30% of the mean). The resulting 0641. The different channels of image formats commonly coefficients serve as the signature for that image region. (The have the same size and bit-depth. In RGB, for example, the transform may be based, e.g., on tiles of 8x8 pixels.) When a red channel may convey 8-bit data (allowing values of 0-255 pattern corresponding to this signature is found at a nearby to be represented), for each pixel in a 640x480 array. Like location within another (or the same) image frame (using wise with the green and blue channels. The alpha channel in known similarity testing, Such as correlation), movement of Such arrangements is also commonly 8 bits, and co-extensive that image region can be identified. with the image size (e.g., 8 bitsx640x480). Every pixel thus has ared value, agreen value, a blue value, and an alpha value. Image Conveyance of Semantic Information (The composite image representation is commonly known as 0634. In many systems it is desirable to perform a set of RGBA) processing steps (like those detailed above) that extract infor 0642 A few of the many ways the alpha channel can be mation about the incoming content (e.g., image data) in a used to convey information derived from the image data are Scalable (e.g., distributed) manner. This extracted informa shown in FIGS. 62-71, and discussed below. tion (metadata) is then desirably packaged to facilitate Sub 0643 FIG. 62 shows a picture that a user may snap with a sequent processing (which may be application specific, or cell phone. A processor in the cell phone (on the sensor more computationally intense, and can be performed within Substrate or elsewhere) may apply an edge detection filter the originating device or by a remote system). (e.g., a Sobel filter) to the image data, yielding an edge map. 0635 A rough analogy is user interaction with Google. Each pixel of the image is either determined to be part of an Bare search terms aren't sent to a Google mainframe, as if edge, or not. So this edge information can be conveyed injust from a dumb terminal. Instead, the user's computer formats a one bit plane of the eight bit planes available in the alpha query as an HTTP request, including the internet protocol channel. Such an alpha channel payload is shown in FIG. 63. address of the originating computer (indicative of location), 0644. The cell phone camera may also apply known tech and makes available cookie information by which user lan niques to identify faces within the image frame. The red, guage preferences, desired safe search filtering, etc., can be green and blue image data from pixels corresponding to facial discerned. This structuring of relevant information serves as a regions can be combined to yield a grey-scale representation, precursor to Google's search process, allowing Google to and this representation can be included in the alpha channel— perform the search process more intelligently providing e.g., in aligned correspondence with the identified faces in the faster and better results to the user. RGB image data. An alpha channel conveying both edge 0636 FIG. 61 shows some of the metadata that may be information and greyscale faces is shown in FIG. 64. (An involved in an exemplary system. The left-most column of 8-bit greyscale is used for faces in the illustrated embodiment, information types may be computed directly from the native although a shallower bit-depth, such as 6- or 7-bits, can be image data signals taken from the image sensor. (As noted, used in other arrangements—freeing other bit planes for other Some or all of these can be computed using processing information.) arrangements integrated with the sensor on a common Sub 0645. The camera may also perform operations to locate strate.) Additional information may be derived by reference the positions of the eyes and mouth in each detected face. to these basic data types, as shown by the second column of Markers can be transmitted in the alpha channel—indicating information types. This further information may be produced the scale and positions of these detected features. A simple US 2013/0273968 A1 Oct. 17, 2013 39 form of marker is a “smiley face' bit-mapped icon, with the yearandcolor of the truck, recognize the text on the truck grill eyes and mouth of the icon located at the positions of the and the owner's t-shirt, recognize the owner's face, and rec detected eyes and mouth. The scale of the face can be indi ognize areas of grass and sky. cated by the length of the iconic mouth, or by the size of a 0654 The sky was recognized by its position at the top of Surrounding oval (or the space between the eye markers). The the frame, its color histogram within a threshold distance of tilt of the face can be indicated by the angle of the mouth (or expected norms, and a spectral composition weak in certain the angle of the line between the eyes, or the tilt of a surround frequency coefficients (e.g., a Substantially "flat” region). The ing oval). grass was recognized by its texture and color. (Other tech 0646. If the cell phone processing yields a determination niques for recognizing these features are taught, e.g., in of the genders of persons depicted in the image, this too can be Batlle, "A review on strategies for recognizing natural objects represented in the extra image channel. For example, an oval in colour images of outdoor scenes. Image and Vision Com line circumscribing the detected face of a female may be puting, Volume 18, Issues 6-7, 1 May 2000, pp. 515-530; made dashed or otherwise patterned. The eyes may be repre Hayashi, “Fast Labelling of Natural Scenes Using Enhanced sented as cross-hairs or XS instead of blackened circles, etc. Knowledge. Pattern Analysis & Applications, Volume 4, Ages of depicted persons may also be approximated, and Number 1, March, 2001, pp. 20-27; and Boutell, “Improved indicated similarly. The processing may also classify each semantic region labeling based on scene context.” IEEE Int’l person’s emotional state by visual facial clues, and an indi Conf. on Multimedia and Expo, July, 2005. See also patent cation Such as Surprise/happiness/sadness/anger/neutral can publications 20050105776 and 20050105775 (Kodak).) The be represented. (See, e.g., Su, "A simple approach to facial trees could have been similarly recognized. expression recognition.” Proceedings of the 2007 Int’l Conf 0655 The human face in the image was detected using on Computer Engineering and Applications, Queensland, arrangements like those commonly employed in consumer Australia, 2007, pp. 456-461. See also patent publications cameras. Optical character recognition was performed on a 20080218472 (Emotiv Systems, Pty), and 20040207720 data set resulting from application of an edge detection algo (NTT DoCoMo)). rithm to the input image, followed by Fourier and Mellin 0647. When a determination has some uncertainty (such transforms. (While finding the text GMC and LSU TIGERS, as guessing gender, age range, or emotion), a confidence the algorithm failed to identify other text on the t-shirt, and metric output by the analysis process can also be represented text on the tires. With additional processing time, some of this in an iconic fashion, such as by the width of the line, or the missing text may have been decoded.) scale or selection of pattern elements. 0648 FIG. 65 shows different pattern elements that can be 0656. The truck was first classed as a vehicle, and then as used to denote different information, including gender and a truck, and then finally identified as a Dark Crimson Metallic confidence, in an auxiliary image plane. 2007 GMC Sierra Z-71 with extended cab, by pattern match 0649. The portable device may also perform operations ing. (This detailed identification was obtained through use of culminating in optical character recognition of alphanumeric known reference truck images, from resources such as the symbols and strings depicted in the image data. In the illus GM trucks web site, Flickr, and a fan site devoted to identi trated example, the device may recognize the string "LAS fying vehicles in Hollywood motion pictures: IMCDB-dot VEGAS in the picture. This determination can be memori com.) alized by a PDF417 2D barcode added to the alpha channel. 0657 FIG. 68 shows an illustrative graphical, bitonal rep The barcode can be in the position of the OCR'd text in the resentation of the discerned information, as added to the alpha image frame, or elsewhere. channel of the FIG. 67 image. (FIG. 69 shows the different 0650 (PDF417 is exemplary only. Other barcodes such planes of the composite image: red, green, blue, and alpha.) as 1D, Aztec, Datamatrix, High Capacity Color Barcode, 0658. The portion of the image area detected as depicting Maxicode, QR Code, Semacode, and ShotCode—or other grass is indicated by a uniform array of dots. The image area machine-readable data symbologies—such as OCR fonts and depicting sky is represented as a grid of lines. (If trees had data glyphs—can naturally be used. Glyphs can be used both been particularly identified, they could have been labeled to convey arbitrary data, and also to form halftone image using one of the same patterns, but with different size/spac depictions. See in this regard Xerox's U.S. Pat. No. 6,419. ing/etc. Oran entirely different pattern could have been used.) 162, and Hecht, “Printed Embedded Data Graphical User 0659. The identification of the truck as a Dark Crimson Interfaces.” IEEE Computer Magazine, Vol. 34, No. 3, 2001, Metallic 2007 GMC Sierra Z-71 with extended cab is pp. 47-55.) encoded in a PDF417 2D barcode—scaled to the size of the 0651 FIG. 66 shows an alpha channel representation of truck and masked by its shape. Because PDF417 encodes some of the information determined by the device. All of this information redundantly, with error-correction features, the information is structured in a manner that allows it to be portions of the rectangular barcode that are missing do not conveyed withinjust a single bit plane (of the eight bit planes) prevent the encoded information from being recovered. of the alpha channel. Information resulting from other of the 0660. The face information is encoded in a second processing operations (e.g., the analyses shown in FIGS. 50 PDF417 barcode. This second barcode is oriented at 90 and 61) may be conveyed in this same bit plane, or in others. degrees relative to the truck barcode, and is scaled differently, 0652) While FIGS. 62-66 showed a variety of information to help distinguish the two distinct symbols to downstream that can be conveyed in the alpha channel, and different decoders. (Other different orientations could be used, and in representations of same, still more are shown in the example Some cases are preferable, e.g., 30 degrees, 45 degrees, etc.) of FIGS. 67-69. These involve a cell phone picture of a new 0661 The facial barcode is oval in shape, and may be GMC truck and its owner. outlined with an oval border (although this is not depicted). 0653 Among other processing, the cell phone in this The center of the barcode is placed at the mid-point of the example processed the image data to recognize the model, person’s eyes. The width of the barcode is twice the distance US 2013/0273968 A1 Oct. 17, 2013 40 between the eyes. The height of the oval barcode is four times for the person in different contexts. For example, some PDAs the distance between the mouth and a line joining the eyes. store contact lists that include facial images of the contacts. 0662. The payload of the facial barcode conveys informa The user (or the contacts) provides facial images that are tion discerned from the face. In rudimentary embodiments, easily recognized iconic. These iconic facial images can be the barcode simply indicates the apparent presence of a face. scaled to match the head of the person depicted in an image, In more Sophisticated embodiments, eigenvectors computed and added to the alpha channel at the corresponding facial from the facial image can be encoded. If a particular face is location. recognized, information identifying the person can be 0670 Also included in the alpha channel depicted in FIG. encoded. If the processor makes a judgment about the likely 71 is a 2D barcode. This barcode can convey other of the gender of the Subject, this information can be conveyed in the information discerned from processing of the image data or barcode too. otherwise available (e.g., the child's name, a color histogram, 0663 Persons appearing in imagery captured by consumer exposure metadata, how many faces were detected in the cameras and cell phones are not random: a significant per picture, the ten largest DCT or other transform coefficients, centage are of recurring Subjects, e.g., the owner's children, etc.). spouse, friends, the user himself/herself, etc. There are often 0671 To make the 2D barcode as robust as possible to multiple previous images of these recurring Subjects distrib compression and other image processing operations, its size uted among devices owned or used by the owner, e.g., PDA, may not be fixed, but rather is dynamically scaled based on cell phone, home computer, network storage, etc. Many of circumstances—such as image characteristics. In the these images are annotated with names of the persons depicted embodiment, the processor analyzes the edge map to depicted. From Such reference images, sets of characterizing identify regions with uniform edginess (i.e., within a thresh facial vectors can be computed, and used to identify Subjects olded range). The largest Such region is selected. The barcode in new photos. (As noted, Google's Picasa service works on is then scaled and placed to occupy a central area of this this principle to identify persons in a user's photo collection; region. (In Subsequent processing, the edginess where the Facebook and iPhoto do likewise.) Such a library of reference barcode was Substituted can be largely recovered by averag facial vectors can be checked to try and identify the person ing the edginess at the center points adjoining the four bar depicted in the FIG. 67 photograph, and the identification can be represented in the barcode. (The identification can com code sides.) prise the person's name, and/or other identifier(s) by which 0672. In another embodiment, region size is tempered the matched face is known, e.g., an index number in a data with edginess in determining where to place a barcode: low base or contact list, a telephone number, a FaceBook user edginess is preferred. In this alternative embodiment, a name, etc.) Smaller region of lower edginess may be chosen over a larger 0664 Text recognized from regions of the FIG. 67 image region of higher edginess. The size of each candidate region, is added to corresponding regions of the alpha channel frame, minus a scaled value of edginess in the region, can serve as a presented in a reliably decodable OCR font. (OCR-A is metric to determine which region should host the barcode. depicted although other fonts may be used.) This is the arrangement used in FIG. 71, resulting in place 0665 A variety of further information may be included in ment of the barcode in a region to the left of Matthew's the FIG. 68 alpha channel. For example, locations in the head—rather than in a larger, but edgier, region to the right. frame where a processor Suspects text is present, but OCRing 0673 Although the FIG. 70 photo is relatively "edgy” (as did not successfully decode alphanumeric symbols (on the contrasted, e.g., with the FIG. 62 photo), much of the edginess tires perhaps, or other characters on the person's shirt), can be may be irrelevant. In some embodiments the edge data is identified by adding a corresponding visual clue (e.g., a pat filtered to preserve only the principal edges (e.g., those indi tern of diagonal lines). An outline of the person (rather than cated by continuous line contours). Within otherwise vacant just an indication of his face) can also be detected by a regions of the resulting filtered edge map a processor can processor, and indicated by a corresponding border or fill convey additional data. In one arrangement the processor pattern. inserts a pattern to indicate a particular color histogram bin 0666. While the examples of FIGS. 62-66 and FIGS. into which that region's image colors fall. (In a 64-bin histo 67-69 show various different ways of representing semantic gram, requiring 64 different patterns, bin 2 may encompass metadata in the alpha channel, still more techniques are colors in which the red channel has values of 0-63, the green shown in the example of FIGS. 70-71. Here a user has cap channel has values of 0-63, and the blue channel has a values tured a snapshot of a child at play (FIG.70). of 64-127, etc.) Other image metrics can similarly be con 0667 The child's face is turned away from the camera, and veyed. is captured with poor contrast. However, even with this lim 0674) Instead of using different patterns to indicate differ ited information, the processor makes a likely identification ent data, vacant regions in a filtered edge map can be filled by referring to the user's previous images: the user's firstborn with a noise-like signal—steganographically encoded to con child Matthew Doe (who seems to be found in countless of the vey histogram (or other information) as digital watermark users archived photos). data. (A Suitable watermarking technology is detailed in Digi 0668. As shown in FIG. 71, the alpha channel in this marc's U.S. Pat. No. 6,590,996.) example conveys an edge-detected version of the user's 0675. It will be recognized that some of the information in image. Superimposed over the child's head is a Substitute the alpha channel—if visually presented to a human in a image of the child's face. This Substitute image can be graphical form, conveys useful information. From FIG. 63 a selected for its composition (e.g., depicting two eyes, nose human can distinguish a man embracing a woman, in front of and mouth) and better contrast. a sign stating “WELCOME TO Fabulous LAS VEGAS 0669. In some embodiments, each person known to the NEVADA. From FIG. 64 the human can see greyscale faces, system has an iconic facial image that serves as a visual proxy and an outline of the scene. From FIG. 66 the person can US 2013/0273968 A1 Oct. 17, 2013

additionally identify a barcode conveying some information, access this information, and contribute still more. All this and can identify two Smiley face icons showing the positions within the existing workflow channels and constraints of of faces. long-established file formats. 0676. Likewise, a viewer to whom the frame of graphical 0684. In some embodiments, the provenance of some or information in FIG. 68 is rendered can identify an outline of all of the discerned/inferred data is indicated. For example, a person, can read the LSU TIGERS from the person’s shirt, stored data may indicate that OCRing which yielded certain and make out what appears to be the outline of a truck (aided text was performed by a Verizon server having a unique by the clue of the GMC text where the truck's grill would be). identifier, such as MAC address of 01-50-F3-83-AB-CC or 0677. From presentation of the FIG.71 alpha channel data network identifier PDX-LA002290.corp. verizon-dot-com, a human can identify a child sitting on the floor, playing with on Aug. 28, 2008, 8:35pm. Such information can be stored in toys. the alpha channel, in header data, in a remote repository to 0678. The barcode in FIG. 71, like the barcode in FIG. 66, which a pointer is provided, etc. conspicuously indicates to an inspecting human the presence 0685 Different processors may contribute to different bit of information, albeit not its content. planes of the alpha channel. A capture device may write its 0679. Other of the graphical content in the alpha channel information to bit plane #1. An intermediate node may store may not be informative to a human upon inspection. For its contributions in bit plane #2. Etc. Certain bit planes may be example, if the child's name is steganographically encoded as available for shared use. a digital watermark in a noise-like signal in FIG. 71, even the 0686 Ordifferent bit planes may be allocated for different presence of information in that noise may go undetected by classes or types of semantic information. Information relating the person. to faces or persons in the image may always be written to bit plane #1. Information relating to places may always be writ 0680 The foregoing examples detail some of the diversity ten to bit plane #2. Edge map data may always be found in bit of semantic information that can be stuffed into the alpha plane #3, together with color histogram data (e.g., repre channel, and the diversity of representation constructs that sented in 2D barcode form). Other content labeling (e.g., can be employed. Of course, this is just a small sampling; the grass, sand, sky) may be found in bit plane #4, together with artisan can quickly adapt these teachings to the needs of OCR'd text. Textual information, such as related links or particular applications, yielding many other, different textual content obtained from the web may be found in bit embodiments. Thus, for example, any of the information that plane #5. (ASCII symbols may be included as bit patterns, can be extracted from an image can be memorialized in the e.g., with each symbol taking 8 bits in the plane. Robustness alpha channel using arrangements akin to those disclosed to Subsequent processing can be enhanced by allocating 2 or herein. more bits in the image plane for each bit of ASCII data. 0681. It will be recognized that information relating to the Convolutional coding and other error correcting technologies image can be added to the alpha channel at different times, by can also be employed for Some or all of the image plan different processors, at different locations. For example, the information. So, too, can error correcting barcodes.) sensor chip in a portable device may have on-chip processing 0687. An index to the information conveyed in the alpha that performs certain analyses, and adds resulting data to the channel can be compiled, e.g., in an EXIF header associated alpha channel. The device may have another processor that with the image, allowing Subsequent systems to speed their performs further processing—on the image data and/or on the interpretation and processing of Such data. The index can results of the earlier analyses—and adds a representation of employ XML-like tags, specifying the types of data conveyed those further results to the alpha channel. (These further in the alpha channel, and optionally other information (e.g., results may be based, in part, on data acquired wirelessly from their locations). a remote source. For example, a consumer camera may link 0688 Locations can be specified as the location of the by Bluetooth to the user's PDA, to obtain facial information upper-most bit (or upper-left-most bit) in the bit-plane array, from the user's contact files.) e.g., by X-Y-coordinates. Or a rectangular bounding box can 0682. The composite image file may be transmitted from be specified by reference to two corner points (e.g., specified the portable device to an intermediate network node (e.g., at by X,Y coordinates)—detailing the region whereinformation a carrier such as Verizon, AT&T, or T-Mobile, or at another is represented. service provider), which performs additional processing, and adds its results to the alpha channel. (With its more capable 0689. In the example of FIG. 66, the index may convey processing hardware, such an intermediate network node can information Such as perform more complex, resource-intensive processing Such (0690 AlphaEitflanel (637,938) ing. With its higher-bandwidth network access. Such a node (0691 AlphabitPlanel (750,1012) alpha channel with additional data, e.g., links to Wikipedia 0692 AlphabitPlane1 (75.450)- entries—or Wikipedia content itself, information from tele (1425,980) phone database and image database lookups, etc.) The thus (0693) AlphaEitFlane1 Supplemented image may then be forwarded to an image 0694. This index thus indicates that a male face is found in query service provider (e.g., SnapNow, Mobile Acuity, etc.), bit plane #1 of the alpha channel, with a top pixel at location which can continue the process and/or instruct a responsive (637.938); a female face is similarly present with a top pixel action based on the information thus-provided. located at (750,1012); OCR'd text encoded as a PDF417 0683. The alpha channel may thus convey an iconic view barcode is found in bit plane #1 in the rectangular area with of what all preceding processing has discerned or learned corner points (75.450) and (1425,980), and that bit plane #1 about the image. Each Subsequent processor can readily also includes an edge map of the image. US 2013/0273968 A1 Oct. 17, 2013 42

0695 More or less information can naturally be provided. rate image planes devoted to (1) red, (2) green, (3) blue, (4) A different form of index, with less information, may specify, near infrared, (5) mid-infrared, (6) far infrared, and (7) ther C.2. mal infrared. The techniques detailed above can convey (0696) auxiliary data planes available in Such formats. 0697. This form of index simply indicates that bit plane #1 0704. As an image moves between processing nodes, of the alpha channel includes 2 faces, a PDF417 barcode, and some of the nodes may overwrite data inserted by earlier an edge map. processing. Although not essential, the overwriting processor 0698. An index with more information may specify data may copy the overwritten information into remote storage, including the rotation angle and scale factor for each face, the and include a link or other reference to it in the alpha channel, LAS VEGAS payload of the PDF417 barcode, the angle of or index, or image—in case same later is needed. the PDF417 barcode, the confidence factors for subjective 0705. When representing information in the alpha chan determinations, names of recognized persons, a lexicon or nel, consideration may be given to degradations to which this glossary detailing the semantic significance of each pattern channel may be subjected. JPEG compression, for example, used in the alpha channels (e.g., the patterns of FIG. 65, and commonly discards high frequency details that do not mean the graphical labels used for sky and grass in FIG. 68), the ingfully contribute to a human’s perception of an image. Such Sources of auxiliary data (e.g., of the Superimposed child’s discarding of information based on the human visual system, face in FIG.71, or the remote reference image data that served however, can work to disadvantage when applied to informa as basis for the conclusion that the truck in FIG. 67 is a Sierra tion that is present for other purposes (although human view Z71), etc. ing of the alpha channel is certainly possible and, in some 0699. As can be seen, the index can convey information cases, useful). that is also conveyed in the bit planes of the alpha channel. 0706 To combat such degradation, the information in the Generally, different forms of representation are used in the alpha channel can be represented by features that would not alpha channel's graphical representations, Versus the index. likely be regarded as visually irrelevant. Different types of For example, in the alpha channel the femaleness of the information may be represented by different features, so that second face is represented by the '+'s to represent the eyes; in the most important persist through even severe compression. the index the femaleness is represented by the XML tag Thus, for example, the presence of faces in FIG. 66 are sig . Redundant representation of information nified by bold ovals. The locations of the eyes may be less can serve as a check on data integrity. relevant, so are represented by smaller features. Patterns 0700 Sometimes header information, such as EXIF data, shown in FIG. 65 may not be reliably distinguished after becomes separated from the image data (e.g., when the image compression, and so might be reserved to represent secondary is converted to a differentformat). Instead of conveying index information where loss is less important. With JPEG com information in a header, a bit plane of the alpha channel can pression, the most-significant bit-plane is best preserved, serve to convey the index information, e.g., bit plane #1. One whereas lesser-significant bit-planes are increasingly cor Such arrangement encodes the index information as a 2D rupted. Thus, the most important metadata should be con barcode. The barcode may be scaled to fill the frame, to veyed in the most-significant bit planes of the alpha chan provide maximum robustness to possible image degradation. nel—to enhance Survivability. 0701. In some embodiments, some or all of the index (0707. If technology of the sort illustrated by FIGS. 62-71 information is replicated in different data stores. For example, becomes a lingua franca for conveying metadata, image com it may be conveyed both in EXIF header form, and as a pression might evolve to take its presence into account. For barcode in bit plane #1. Some or all of the data may also be example, JPEG compression may be applied to the red, green maintained remotely, such as by Google, or other web storage and blue image channels, but lossless (or less lossy) compres “in the cloud.” Address information conveyed by the image sion may be applied to the alpha channel. Since the various bit can serve as a pointer to this remote storage. The pointer planes of the alpha channel may convey different informa (which can be a URL, but more commonly is a UID or index tion, they may be compressed separately—rather than as into a database which when queried—returns the current bytes of 8-bit depth. (If compressed separately, lossy com address of the sought-for data) can be included within the pression may be more acceptable.) With each bit-plane con index, and/or in one or more of the bit planes of the alpha veying only bitonal information, compression schemes channel. Or the pointer can be steganographically encoded known from facsimile technology can be used, including within the pixels of the image data (in some or all of the Modified Huffman, Modified READ, run length encoding, composite image planes) using digital watermarking technol and ITU-T T.6. Hybrid compression techniques are thus well Ogy. suited for such files. 0702. In still other embodiments, some or all the informa 0708 Alpha channel conveyance of metadata can be tion described above as stored in the alpha channel can addi arranged to progressively transmit and decode in general tionally, or alternatively, be stored remotely, or encoded correspondence with associated imagery features, when within the image pixels as a digital watermark. (The picture using compression arrangements such as JPEG 2000. That is, itself, with or without the alpha channel, can also be repli since the alpha channel is presenting semantic information in cated in remote storage, by any device in the processing the visual domain (e.g., iconically), it can be represented So chain.) that layers of semantic detail decompress at the same rate as 0703. Some image formats include more than the four the image. planes detailed above. Geospatial imagery and other mapping (0709. In JPEG 2000, a wavelet transform is used togen technologies commonly represent data with formats that erate data representing the image. JPEG 2000 packages and extend to a half-dozen or more information planes. For processes this transform data in a manner yielding progres example, multispectral space-based imagery may have sepa sive transmission and decoding. For example, when render US 2013/0273968 A1 Oct. 17, 2013 ing a JPEG 2000 image, the gross details of the image appear 0717 Receiving nodes can also use the conveyed data to first, with successively finer details following. Similarly with enhance stored profile information relating to the user. A node transmission. receiving the FIG. 66 metadata can note Las Vegas as a 0710 Consider the truck & man image of FIG. 67. Ren location of potential interest. A system receiving the FIG. 68 dering a JPEG 2000 version of this image would first present metadata can infer that GMC Z71 trucks are relevant to the the low frequency, bold form of the truck. Thereafter the user, and/or to the person depicted in that photo. Such asso shape of the man would appear. Next, features such as the ciations can serve as launch points for tailored user experi GMC lettering on the truck grill, and the logo on the man’s CCCS, t-shirt would be distinguished. Finally, the detail of the man’s 0718 The metadata also allows images with certain facial features, the grass, the trees, and other high frequency attributes to be identified quickly, in response to user queries. minutiae would complete the rendering of the image. Simi (E.g., find pictures showing GMC Sierra Z71 trucks.) Desir larly with transmission. ably, web-indexing crawlers can check the alpha channels of 0711. This progression is shown in the pyramid of FIG. images they find on the web, and add information from the 77A. Initially a relatively small amount of information is alpha channel to the compiled index to make the image more presented—giving gross shape details. Progressively the readily identifiable to searchers. image fills in ultimately ending with a relatively large 0719. As noted, an alpha channel-based approach is not amount of Small detail data. essential for use of the technologies detailed in this specifi 0712. The information in the alpha channel can be cation. Another alternative is a data structure indexed by arranged similarly (FIG. 77B). Information about the truck coordinates of image pixels. The data structure can be con can be represented with a large, low frequency (shape-domi veyed with the image file (e.g., as EXIF header data), or stored nated) symbology. Information indicating the presence and at a remote Server. location of the man can be encoded with a next-most-domi 0720 For example, one entry in the data structure corre nant representation. Information corresponding to the GMC sponding to pixel (637.938) in FIG. 66 may indicate that the lettering on the truck grill, and lettering on the man’s shirt, pixel forms part of a male's face. A second entry for this pixel can be represented in the alpha channel with a finer degree of may point to a shared sub-data structure at which eigenface detail. The finest level of salient detail in the image, e.g., the values for this face are stored. (The shared sub-data structure minutiae of the man’s face, can be represented with the finest may also list all the pixels associated with that face.) A data degree of detail in the alpha channel. (AS may be noted, the record corresponding to pixel (622,970) may indicate the illustrative alpha channel of FIG. 68 doesn't quite follow this pixel corresponds to the left-side eye of the male's face. A model.) data record indexed by pixel (155.780) may indicate that the 0713) If the alpha channel conveys its information in the pixel forms part of text recognized (by OCRing) as the letter form of machine-readable symbologies (e.g., barcodes, digi “L’, and also falls within color histogram bin 49, etc. The tal watermarks, glyphs, etc.), the order of alpha channel provenance of each datum of information may also be decoding can be deterministically controlled. Features with recorded. the largest features are decoded first; those with the finest 0721 (Instead of identifying each pixel by X- and Y-coor features are decoded last. Thus, the alpha channel can convey dinates, each pixel may be assigned a sequential number by barcodes at several different sizes (all in the same bit frame, which it is referenced.) e.g., located side-by-side, or distributed among bit frames). 0722. Instead of several pointers pointing to a common Or the alpha channel can convey plural digital watermark sub-data structure from data records of different pixels, the signals, e.g., one at a gross resolution (e.g., corresponding to entries may form a linked list, in which each pixel includes a 10 watermark elements, or “waxels' to the inch), and others at pointer to a next pixel with a common attribute (e.g., associ successively finer resolutions (e.g., 50, 100, 150 and 300 ated with the same face). A record for a pixel may include waxels per inch). Likewise with data glyphs: a range of larger pointers to plural different sub-data structures, or to plural and Smaller sizes of glyphs can be used, and they will decode other pixels—to associate the pixel with plural different relatively earlier or later. image features or data. 0714 (JPEG2000 is the most common of the compression 0723 If the data structure is stored remotely, a pointer to schemes exhibiting progressive behavior, but there are others. the remote store can be included with the image file, e.g., JPEG, with some effort, can behave similarly. The present Steganographically encoded in the image data, expressed with concepts are applicable whenever Such progressivity exists.) EXIF data, etc. If any watermarking arrangement is used, the 0715. By such arrangements, as image features are origin of the watermark (see Digimarc's U.S. Pat. No. 6,307. decoded for presentation—or transmitted (e.g., by streaming 949) can be used as a base from which pixel references are media delivery), the corresponding metadata becomes avail specified as offsets (instead of using, e.g., the upper left able. corner of the image). Such an arrangementallows pixels to be 0716. It will be recognized that results contributed to the correctly identified despite corruptions such as cropping, or alpha channel by the various distributed processing nodes are rotation. immediately available to each Subsequent recipient of the 0724. As with alpha channel data, the metadata written to image. A service provider receiving a processed image, for a remote store is desirably available for search. A web crawler example, thus quickly understands that FIG. 62 depicts a man encountering the image can use the pointer in the EXIF data and a woman in Las Vegas; that FIG. 63 shows a man and his or the Steganographically encoded watermark to identify a GMC truck; and that the FIG.70 image shows a child named corresponding repository of metadata, and add metadata from Matthew Doe. Edge map, color histogram, and other infor that repository to its index terms for the image (despite being mation conveyed with these images gives the service provider found at different locations). aheadstartin its processing of the imagery, e.g., to segmentit; 0725 By the foregoing arrangements it will be appreci recognize its content, initiating an appropriate response, etc. ated that existing imagery standards, workflows, and ecosys US 2013/0273968 A1 Oct. 17, 2013 44 tems—originally designed to Support just pixel image data, alone, may be displayed. (Or, the UI can be presented as an are here employed in Support of metadata as well. overlay on the background image captured by the camera.) 0726 (Of course, the alpha channel and other approaches 0733. In camera-based embodiments, as with embodi detailed in this section are not essential to other aspects of the ments employing physical sensors, another dimension of present technology. For example, information derived or motion may also be sensed: up/down. This can provide an inferred from processes such as those shown in FIGS. 50, 57 additional degree of control (e.g., shifting to capital letters, or and 61 can be sent by other transmission arrangements, e.g., shifting from characters to numbers, or selecting the current dispatched as packetized data using WiFi or WiMax, trans symbol, etc). mitted from the device using Bluetooth, sent as an SMS short 0734. In some embodiments, the device has several text or MMS multimedia messages, shared to another node in modes: one for entering text; another for entering numbers; a low power peer-to-peer wireless network, conveyed with another for symbols; etc. The user can switch between these wireless cellular transmission or wireless data service, etc.) modes by using mechanical controls (e.g., buttons), or through controls of a user interface (e.g., touches or gestures Texting, Etc. or voice commands). For example, while tapping a first region of the screen may select the currently-displayed symbol, tap 0727 U.S. Pat. Nos. 5,602,566 (Hitachi), 6,115,028 (Sili ping a second region of the screen may toggle the mode con Graphics), 6,201,554 (Ericsson), 6,466,198 (Innoven between character-entry and numeric-entry. Or one tap in this tions), 6,573,883 (Hewlett-Packard), 6,624,824 (Sun) and second region can Switch to character-entry (the default); two 6.956,564 (British Telecom), and published PCT application taps in this region can Switch to numeric-entry; and three taps WO9814863 (Philips), teach that portable computers can be in this region can Switch to entry of other symbols. equipped with devices by which tilting can be sensed, and 0735. Instead of selecting between individual symbols, used for different purposes (e.g., Scrolling through menus). Such an interface can also include common words or phrases 0728. In accordance with another aspect of the present (e.g., signature blocks) to which the user can tip/tilt navigate, technology, a tip/tilt interface is used in connection with a and then select. There may be several lists of words/phrases. typing operation, such as for composing text messages sent For example, a first list may be standardized (pre-pro by a Simple Message Service (SMS) protocol from a PDA, a grammed by the device vendor), and include statistically cell phone, or other portable wireless device. common words. A second list may comprise words and/or 0729. In one embodiment, a user activates a tip/tilt text phrases that are associated with a particular user (or a par entry mode using any of various known means (e.g., pushing ticular class of users). The user may enter these words into a button, entering a gesture, etc.). A scrollable user interface Such a list, or the device can compile the list during opera appears on the device screen, presenting a series of icons. tion—determining which words are most commonly entered Each icon has the appearance of a cell phone key, Such as a by the user. (The second list may exclude words found on the button depicting the numeral “2 and the letters “abc.” The first list, or not.) Again, the user can Switch between these lists user tilts the device left or right to scroll backwards or for as described above. wards thru the series of icons, to reach a desired button. The 0736. Desirably, the sensitivity of the tip/tilt interface is user then tips the device towards or away from themselves to adjustable by the user, to accommodate different user prefer navigate between the three letters associated with that icon ences and skills (e.g., tipping away navigates to “a; no tipping corresponds to 0737. While the foregoing embodiments contemplated a “b; and tipping towards navigates to “c'). After navigating to limited grammar of tilts/tips, more expansive grammars can the desired letter, the user takes an action to select that letter. be devised. For example, while relative slow tilting of the This action may be pressing a button on the device (e.g., with screen to the left may cause the icons to scroll in a given the users thumb), or another action may signal the selection. direction (left, or right, depending on the implementation), a The user then proceeds as described to select Subsequent sudden tilt of the screen in that direction can effect a different letters. By this arrangement, the user enters a series of text operation—such as inserting a line (or paragraph) break in the without the constraints of big fingers on tiny buttons or UI text. A sharp tilt in the other direction can cause the device to features. send the message. 0730 Many variations are, of course, possible. The device 0738. Instead of the speed of tilt, the degree of tilt can needn't be a phone; it may be a wristwatch, keyfob, or have correspond to different actions. For example, tilting the another small form factor. device between 5 and 25 degrees can cause the icons to Scroll, 0731. The device may have a touch-screen. After navigat but tilting the device beyond 30 degrees can insert a line break ing to a desired character, the user may tap the touchscreen to (if to the left) or can cause the message to be sent (if to the effect the selection. When tipping/tilting the device, the cor right). responding letter can be displayed on the screen in an 0739. Different tip gestures can likewise trigger different enlarged fashion (e.g., on the icon representing the button, or actions. overlaid elsewhere) to indicate the user's progress in naviga 0740 The arrangements just described are necessarily tion. only a few of the many different possibilities. Artisans adopt 0732. While accelerometers or other physical sensors are ing Such technology are expected to modify and adapt these employed in certain embodiments, others use a 2D optical teachings as Suited for particular applications. sensor (e.g., a camera). The user can point the sensor to the floor, to a knee, or to another subject, and the device can then Affine Capture Parameters sense relative physical motion by sensing movement of fea 0741. In accordance with another aspect of the present tures within the image frame (up/down; left right). In Such technology, a portable device captures—and may present— embodiments the image frame captured by the camera need geometric information relating to the device's position (or not be presented on the screen; the symbol selection UI, that of a subject). US 2013/0273968 A1 Oct. 17, 2013

0742 Digimarc's published patent application graphically. The graphical presentation can comprise two 20080300011 teaches various arrangements by which a cell lines: a reference line, and a second, parallel line whose phone can be made responsive to what it "sees including length changes in real time in accordance with the scale overlaying graphical features atop certain imaged objects. change (larger than the reference line for movement of the The overlay can be warped in accordance with the objects camera closer to the Subject, and Smaller for movement perceived affine distortion. away). 0743 Steganographic calibration signals by which affine 0750 Although not particularly shown in the exemplary distortion of an imaged object can be accurately quantified embodiment of FIG.58, other such geometric data can also be are detailed, e.g., in Digimarc's U.S. Pat. Nos. 6,614,914 and derived and presented, e.g., translation, differential scaling, 6,580,809; and in patent publications 20040105569, tip angle (i.e., forward/backward), etc. The determinations 20040101157, and 2006003 1684. Digimarc's U.S. Pat. No. detailed above can be simplified if the camera field of view 6.959,098 teaches how distortion can be characterized by includes a digital watermark having Steganographic calibra Such watermark calibration signals in conjunction with vis tion? orientation data of the sort detailed in the referenced ible image features (e.g., edges of a rectilinear object). From patent documents. However, the information can also be such affine distortion information, the 6D location of a water derived from other features in the imagery. marked object relative to the imager of a cell phone can be 0751. Of course, in still other embodiments, data from one determined. or more accelerometers or other position sensing arrange 0744. There are various ways 6D location can be ments in the device—either alone or in conjunction with described. One is by three location parameters: X, y, Z, and image data—can be used to generate the presented informa three angle parameters: tip, tilt, rotation. Another is by rota tion. tion and scale parameters, together with a 2D matrix of 4 0752. In addition to presenting such geometric informa elements that defines a linear transformation (e.g., shear map tion on the device Screen, such information can also be used, ping, translation, etc.). The matrix transforms the location of e.g., in sensing gestures made with the device by a user, in any pixel X.y to a resultant location after a linear transform has providing context by which remote system responses can be occurred. (The reader is referred to references on shear map ping, e.g., Wikipedia, for information on the matrix math, customized, etc. etc.) Camera-Based Environmental and Behavioral State Machine 0745 FIG. 58 shows how a cell phone can display affine parameters (e.g., derived from imagery or otherwise). The (0753. In accordance with a further aspect of the present camera can be placed in this mode through a UI control (e.g., technology, a cell phone functions as a state machine, e.g., tapping a physical button, making a touchscreen gesture, changing aspects of its functioning based on image-related etc.). information previously acquired. The image-related informa 0746. In the depicted arrangement, the device's rotation tion can be focused on the natural behavior of the camerauser, from (an apparent) horizontal orientation is presented at the typical environments in which the camera is operated, innate top of the cell phone screen. The cell phone processor can physical characteristics of the camera itself, the structure and make this determination by analyzing the image data for one dynamic properties of scenes being imaged by the camera, or more generally parallel elongated Straight edge features, and many other Such categories of information. The resulting averaging them to determine a mean, and assuming that this is changes in the camera's function can be directed toward the horizon. If the camera is conventionally aligned with the improving image analysis programs resident on a camera horizon, this mean line will be horizontal. Divergence of this device or remotely located at Some image-analysis server. line from horizontal indicates the camera's rotation. This Image analysis is construed very broadly, covering a range of information can be presented textually (e.g., "12 degrees analysis from digital watermark reading, to object and facial right'), and/oragraphical representation showing divergence recognition, to 2-D and 3-D barcode reading and optical from horizontal can be presented. character recognition, all the way through scene categoriza 0747 (Other means for sensing angular orientation can be tion analysis and more. employed. For example, many cell phones include acceler 0754. A few simple examples will illustrate what is ometers, or other tilt detectors, which output data from which expected to become an important aspect of future mobile the cell phone processor can discern the devices angular devices. orientation. 0755 Consider the problem of object recognition. Most 0748. In the illustrated embodiment, the camera captures a objects have different appearances, depending on the angle sequence of image frames (e.g., video) when in this mode of from which they are viewed. If a machine vision object operation. A second datum indicates the angle by which fea recognition algorithm is given some information about the tures in the image frame have been rotated since image cap perspective from which an object is viewed, it can make a ture began. Again, this information can be gleaned by analysis more accurate (or faster) guess of what the object is. of the image data, and can be presented in text form, and/or 0756 People are creatures of habit, including in their use graphically. (The graphic can comprise a circle, with a line— of cellphone cameras. This extends to the hand in which they or arrow—through the center showing real-time angular typically hold the phone, and how they incline it during pic movement of the camera to the left or right.) ture taking. After a user has established a history with a 0749. In similar fashion, the device can track changes in phone, usage patterns may be discerned from the images the apparent size of edges, objects, and/or other features in the captured. For example, the user may tend to take photos of image, to determine the amount by which scale has changed Subjects not straight-on, but slightly from the right. Such a since image capture started. This indicates whether the cam right-oblique tendency in perspective may be due to the fact era has moved towards or away from the subject, and by how that the user routinely holds the camera in the right hand, so much. Again, the information can be presented textually and exposures are taken from a bit right-of-center. US 2013/0273968 A1 Oct. 17, 2013 46

0757 (Right-obliqueness can be sensed in various ways, 0766 Common aspects of the subject-matter or “scenes e.g., by lengths of vertical paralleledges within image frames. being imaged' is another rich source of information for Sub If edges tend to be longer on the right sides of the images, this sequent image analysis routines, or at least early-stage image tends to indicate that the images were taken from a right processing steps which assist later stage image analysis rou oblique view. Differences in illumination across foreground times by optimally filtering and/or transforming the pixel data. Subjects can also be used brighter illumination on the right For example, it may become clear over days and weeks of side of Subjects suggest the right side was closer to the lens. camera usage that a given user only uses their cameras for Etc.) three basic interests: digital watermark reading, barcode read 0758 Similarly, in order to comfortably operate the shut ing, and visual logging of experimental set-ups in a labora terbutton of the phone while holding the device, this particu tory. A histogram can be developed over time showing which lar user may habitually adopt a grip of the phone that inclines “end result” operation Some given camera usage led toward, the top of the camera five degrees towards the user (i.e., to the followed by an increase in processing cycles devoted to early left). This results in the captured image subjects generally detections of both watermark and barcode basic characteris being skewed with an apparent rotation of five degrees. tics. Drilling a bit deeper here, a Fourier-transformed set of 0759. Such recurring biases can be discerned by examin image data may be preferentially routed to a quick 2-D bar ing a collection of images captured by that user with that cell code detection function which may otherwise have been de phone. Once identified, data memorializing these idiosyncra prioritized. Likewise on digital watermark reading, where sies can be stored in a memory, and used to optimize image Fourier transformed data may be shipped to a specialized recognition processes performed by the device. pattern recognition routine. A partially abstract way to view 0760 Thus, the device may generate a first output (e.g., a this state-machine change is that there is only a fixed amount tentative object identification) from a given image frame at of CPU and image-processing cycles available to a camera one time, but generate a second, different output (e.g., a device, and choices need to be made on which modes of different object identification) from the same image frame at analysis get what portions of those cycles. a later time—due to intervening use of the camera. 0767. An over-simplified representation of such embodi 0761. A characteristic pattern of the user's handjitter may ments is shown in FIG. 59. also be inferred by examination of plural images. For 0768. By arrangements such as just-discussed, operation example, by examining pictures of different exposure peri of an imager-equipped device evolves through its continued ods, it may be found that the user has a jitter with a frequency operation. of about 4 Hertz, which is predominantly in the left-right (horizontal) direction. Sharpening filters tailored to that jitter Focus Issues, Enhanced Print-to-Web Linking Based on Page behavior (and also dependent on the length of the exposure) Layout can then be applied to enhance the resulting imagery. 0769 Cameras currently provided with most cell phones, 0762. In similar fashion, through use, the device may and other portable PDA-like devices, do not generally have notice that the images captured by the user during weekday adjustable focus. Rather, the optics are arranged in compro hours of 9:00-5:00 are routinely illuminated with a spectrum mise fashion-aiming to get a decent image under typical characteristic of fluorescent lighting, to which a rather portrait Snapshot and landscape circumstances. Imaging at extreme white-balancing operation needs to be applied to try close distances generally yields inferior results—losing high and compensate. With a priori knowledge of this tendency, the frequency detail. (This is ameliorated by just-discussed device can expose photos captured during those hours differ “extended depth of field' image sensors, but widespread ently than with its baseline exposure parameters—anticipat deployment of Such devices has not yet occurred.) ing the fluorescent illumination, and allowing a better white 0770. The human visual system has different sensitivity to balance to be achieved. imagery at different spectral frequencies. Different image 0763. Over time the device derives information that mod frequencies convey different impressions. Low frequencies els some aspect of the user's customary behavior or environ give global information about an image. Such as its orienta mental variables. The device then adapts some aspect of its tion and general shape. High frequencies give fine details and operation accordingly. edges. As shown in FIG. 72, the sensitivity of the human 0764. The device may also adapt to its own peculiarities or vision system peaks at frequencies of about 10 cycles/mm on degradations. These include non-uniformities in the photo the retina, and falls away steeply on either side. (Perception diodes of the image sensor, dust on the image sensor, mars on also depends on contrast between features sought to be dis the lens, etc. tinguished—the vertical axis.) Image features with spatial 0765. Again, over time, the device may detect a recurring frequencies and contrastin the cross-hatched Zone are usually pattern, e.g.: (a) that one pixel gives a 2% lower average not perceivable by humans. FIG. 73 shows an image with the output signal than adjoining pixels; (b) that a contiguous low and high frequencies depicted separately (on the left and group of pixels tends to output signals that are about 3 digital right). numbers lower than averages would otherwise indicate; (c) 0771 Digital watermarking of print media, such as news that a certain region of the photosensor does not seem to papers, can be effected by tinting the page (before, during or capture high frequency detail—imagery in that region is con after printing) with an inoffensive background pattern that sistently a bit blurry, etc. From Such recurring phenomena, the Steganographically conveys auxiliary payload data. Different device can deduce, e.g., that (a) the gain for the amplifier columns of text can be encoded with different payload data, serving this pixel is low; (b) dust or other foreign object is e.g., permitting each news story to link to a different elec occluding these pixels; and (c) a lens flaw prevents light tronic resource (see, e.g., Digimarc's U.S. Pat. Nos. 6,985, falling in this region of the photosensor from being properly 600, 6,947,571 and 6,724,912). focused, etc. Appropriate compensations can then be applied 0772. In accordance with another aspect of the present to mitigate these shortcomings. technology, the close-focus shortcoming of portable imaging US 2013/0273968 A1 Oct. 17, 2013 47 devices is overcome by embedding a lower frequency digital ments appear in what spaces on each printed page. These watermark (e.g., with a spectral composition centered on the same Software tools, or others, can be adapted to take this left side of FIG. 72, above the curve). Instead of encoding layout map information, associate corresponding links or different watermarks in different columns, the page is marked other data for each story/advertisement, and store the result with a single watermark that spans the page—encoding an ing data structure in a web-accessible server from which identifier for that page. portable devices can access same. 0773) When a user snaps a picture of a newspaper story of 0781. The layout of newspaper and magazine pages offers interest (which picture may capture just text/graphics from orientation information that can be useful in watermark the desired story/advertisement, or may span other content as decoding. Columns are vertical. Headlines and lines of text well), the watermark of that page is decoded (either locally by are horizontal. Even at very low spatial image frequencies, the device, remotely by a different device, or in distributed Such shape orientation can be distinguished. A user capturing fashion). an image of a printed page may not capture the content 0774. The decoded watermark serves to index a data struc “squarely.” However, these strong vertical and horizontal ture that returns information to the device, to be presented on components of the image are readily determined by algorith its display Screen. The display presents a map of the newspa mic analysis of the captured image data, and allow the rota per page layout, with different articles/advertisements shown tion of the captured image to be discerned. This knowledge in different colors. simplifies and speeds the watermark decoding process (since (0775 FIGS. 74 and 75 illustrate one particular embodi a first step in many watermark decoding operations is to ment. The original page is shown in FIG.74. The layout map discern the rotation of the image from its originally-encoded displayed on the user device screen in shown in FIG. 75. state). 0776 To link to additional information about any of the 0782. In another embodiment, delivery of a page map to stories, the user simply touches the portion of the displayed the user device from a remote server is not required. Again, a map corresponding to the story of interest. (If the device is not region of a page spanning several items of content is encoded equipped with a touch screen, the map of FIG. 75 can be with a single watermark payload. Again, the user captures an presented with indicia identifying the different map Zones, image including content of interest. The watermark identify e.g., 1, 2,3 ... or A, B, C . . . . The user can then operate the ing the page is decoded. device's numeric or alphanumeric user interface (e.g., key pad) to identify the article of interest.) 0783. In this embodiment, the captured image is displayed 0777. The user's selectionistransmitted to a remote server on the device screen, and the user touches the content region (which may be the same one that served the layout map data of particular interest. The coordinates of the user's selection to the portable device, or another one), which then consults within the captured image data are recorded. with stored data to identify information responsive to the 0784 FIG. 76 is illustrative. The user has used an Apple user's selection. For example, if the user touches the region in iPhone, a T-Mobile Android phone, or the like to capture an the lower right of the page map, the remote system may image from an excerpt from a watermarked newspaper page, instructs a server at buick-dot-com to transmit a page for and then touches an article of interest (indicated by the oval). presentation on the user device, with more information the The location of the touch within the image frame is known to about the Buick Lucerne. Or the remote system can send the the touchscreen Software, e.g., as an offset from the upper left user device a link to that page, and the device can then load the corner, measured in pixels. (The display may have a resolu page. Or the remote system can cause the user device to tion of 480x320 pixels). The touch may beat pixel position present a menu of options, e.g., for a news article the user may (200,160). be given options to: listen to a related podcast; see earlier 0785. The watermark spans the page, and is shown in FIG. stories on the same topic; order reprints; download the article 76 by the dashed diagonal lines. The watermark (e.g., as as a Word file, etc. Or the remote system can send the user a described in Digimarc's U.S. Pat. No. 6,590.996) has an link to a web page or menu page by email, so that the user can origin, but the origin point is not within the image frame review same at a later time. (A variety of such different captured by the user. However, from the watermark, the responses to user-expressed selections can be provided, as are watermark decoder Software knows the scale of the image and known from the art cited herein.) its rotation. It also knows the offset of the captured image (0778 Instead of the map of FIG. 75, the system may cause frame from the watermark's origin. Based on this informa the user device to display a screen showing a reduced scale tion, and information about the scale at which the original version of the newspaper page itself like that shown in FIG. watermark was encoded (which information can be conveyed 74. Again, the user can simply touch the article of interest to with the watermark, accessed from a remote repository, hard trigger an associated response. coded into the detector, etc.), the software can determine that 0779. Or instead of a presenting a graphical layout of the the upper left corner of the captured image frame corresponds page, the remote system can return titles of all the content on to a point 1.6 inches below, and 2.3 inches to the right, of the the page (e.g., “Banks Owe Billions . . . . "McCain Pins top left corner of the originally printed page (assuming the Hopes . . . . "Buick Lucerne'). These titles are presented in watermark origin is at the top left corner of the page). From menu form on the device screen, and the user touches the the decoded scale information, the Software can discern that desired item (or enters a corresponding number/letter selec the 480 pixel width of the captured image corresponds to an tion). area of the originally printed page 12 inches in width. 0780. The layout map for each printed newspaper and 0786 The software finally determines the position of the magazine page is typically generated by the publishing com users touch, as an offset from the upper left corner of the pany as part of its layout process, e.g., using automated Soft originally-printed page. It knows the corner of the captured ware from Vendors such as Quark, Impress and Adobe, etc. image is offset (1.6", 2.3") from the upper left corner of the Existing Software thus knows what articles and advertise printed page, and that the touch is a further 5" to the right (200