
(12) United States Patent — Rhoads et al.
(10) Patent No.: US 8,805,110 B2
(45) Date of Patent: Aug. 12, 2014

(54) METHODS AND SYSTEMS FOR CONTENT PROCESSING

(75) Inventors: Geoffrey B. Rhoads, West Linn, OR (US); Tony F. Rodriguez, Portland, OR (US); John D. Lord, West Linn, OR (US); Brian T. MacIntosh, Lake Oswego, OR (US); Nicole Rhoads, West Linn, OR (US); William Y. Conwell, Portland, OR (US)

(73) Assignee: Digimarc Corporation, Beaverton, OR (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 515 days.

(21) Appl. No.: 13/011,618

(22) Filed: Jan. 21, 2011

(65) Prior Publication Data: US 2011/0212717 A1, Sep. 1, 2011

Related U.S. Application Data

(63) Continuation of application No. PCT/US2009/054358, filed on Aug. 19, 2009, and a (Continued)

(51) Int. Cl.:
G06K 9/00 (2006.01)
H04M 3/00 (2006.01)
H04N 13/00 (2006.01)
G06T 1/20 (2006.01)

(52) U.S. Cl.:
CPC: G06K 9/00 (2013.01); H04N 2013/0074 (2013.01); G06T 1/20 (2013.01)
USPC: 382/255; 455/418; 455/420

(58) Field of Classification Search:
CPC: G06K 9/00; G06K 9/0021; G06T 1/0007; G06T 1/20; H04N 2013/0074
USPC: 382/100, 103, 106, 107, 115, 117, 118, 190, 199, 209, 255, 276; 348/61, 143-160, 273, 376, 208.15, 333.01-333.04, 333.11, 240.99, 240.03, 208.4; 455/403, 418, 420; 356/3; 726/17; 707/687, 690, 769, 770, 771
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,835,616 A * 11/1998 Lobo et al. ................ 382/118
6,301,370 B1 * 10/2001 Steffens et al. ............ 382/103
(Continued)

OTHER PUBLICATIONS
Owens, John D., Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips, "GPU Computing: Graphics processing units—powerful, programmable, and highly parallel—are increasingly targeting general-purpose computing applications," Proceedings of the IEEE, vol. 96, no. 5, pp. 279-299, May 2008, ISSN 0018-9219.*
(Continued)

Primary Examiner — Hadi Akhavannik
Assistant Examiner — Mehdi Rashidian
(74) Attorney, Agent, or Firm — Digimarc Corporation

(57) ABSTRACT

Mobile phones and other portable devices are equipped with a variety of technologies by which existing functionality can be improved, and new functionality can be provided. Some aspects relate to visual search capabilities, and determining appropriate actions responsive to different image inputs. Others relate to processing of image data. Still others concern metadata generation, processing, and representation. Yet others concern user interface improvements. Other aspects relate to imaging architectures, in which a mobile phone's image sensor is one in a chain of stages that successively act on packetized instructions/data, to capture and later process imagery. Still other aspects relate to distribution of processing tasks between the mobile device and remote resources ("the cloud"). Elemental image processing (e.g., simple filtering and edge detection) can be performed on the mobile phone, while other operations can be referred out to remote service providers. The remote service providers can be selected using techniques such as reverse auctions, through which they compete for processing tasks. A great number of other features and arrangements are also detailed.

9 Claims, 60 Drawing Sheets

[Cover-page illustration (FIG. 1 thumbnail): drawing not reproduced. Legible labels include: Retrieve Manual; Gesture; Facial Recognition; Object Recognition; Nearby?; Post to Facebook; Barcode; List for Trading/Sale on Craigslist; Translate; Discover History (Degrees of Separation); Human Web Command; Access/Authentication.]

US 8,805,110 B2 — Page 2

Related U.S. Application Data

(63) ...continuation-in-part of application No. 12/271,692, filed on Nov. 14, 2008, now Pat. No. 8,520,979, and a continuation-in-part of application No. 12/484,115, filed on Jun. 12, 2009, now Pat. No. 8,385,971, and a continuation-in-part of application No. 12/498,709, filed on Jul. 7, 2009.

(60) Provisional application No. 61/090,083, filed on Aug. 19, 2008; provisional application No. 61/096,703, filed on Sep. 12, 2008; provisional application No. 61/100,643, filed on Sep. 26, 2008; provisional application No. 61/103,907, filed on Oct. 8, 2008; provisional application No. 61/110,490, filed on Oct. 31, 2008; provisional application No. 61/169,266, filed on Apr. 14, 2009; provisional application No. 61/174,822, filed on May 1, 2009; provisional application No. 61/176,739, filed on May 8, 2009; provisional application No. 61/226,195, filed on Jul. 16, 2009; provisional application No. 61/234,542, filed on Aug. 17, 2009.

(56) References Cited

U.S. PATENT DOCUMENTS (continued)
7,742,624 B2 * 6/2010 Super et al. ............... 382/106
8,467,627 B2 6/2013 Gwak et al. ............... 382/275
2002/0019819 A1 * 2/2002 Sekiguchi et al. .......... 707/3
2004/0263663 A1 * 12/2004 Lee et al. ................. 348/333.11
2006/0012677 A1 1/2006 Neven et al. .............. 348/61
2007/0162971 A1 * 7/2007 Blom et al. ............... 726/17
2007/0248281 A1 * 10/2007 Super et al. .............. 382/275
2008/0007620 A1 * 1/2008 Wang et al. ............... 348/154
2009/0080698 A1 3/2009 Mihara et al. ............. 382/103
2011/0116720 A1 5/2011 Gwak et al. ............... 382/224

OTHER PUBLICATIONS
Yang et al., "Detecting Faces in Images: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, Jan. 2002.

* cited by examiner

[Drawing sheets 1-60 (FIGS. 1-87): the figures are not reproduced here, and their OCR text is omitted. Each figure is summarized in the Brief Description of the Drawings.]
METHODS AND SYSTEMS FOR CONTENT PROCESSING

RELATED APPLICATION DATA

This application is a continuation of co-pending PCT application PCT/US09/54358, filed Aug. 19, 2009 (published as WO2010022185). Application PCT/US09/54358 claims priority benefit to each of the following provisional applications:

61/090,083, filed 19 Aug. 2008;
61/096,703, filed 12 Sep. 2008;
61/100,643, filed 26 Sep. 2008;
61/103,907, filed 8 Oct. 2008;
61/110,490, filed 31 Oct. 2008;
61/169,266, filed 14 Apr. 2009;
61/174,822, filed 1 May 2009;
61/176,739, filed 8 May 2009;
61/226,195, filed 16 Jul. 2009; and
61/234,542, filed 17 Aug. 2009.

Application PCT/US09/54358 also claims priority to, and may be regarded as a continuation-in-part of, each of the following applications:

Ser. No. 12/271,692, filed 14 Nov. 2008 (published as 20100046842);
Ser. No. 12/484,115, filed 12 Jun. 2009 (published as 20100048242); and
Ser. No. 12/498,709, filed 7 Jul. 2009 (published as 20100261465).

Priority is claimed to each of the foregoing applications. The foregoing applications are incorporated herein by reference, in their entireties.

INTRODUCTION

Certain aspects of the technology detailed herein are introduced in FIG. 1. A user's mobile phone captures imagery (either in response to user command, or autonomously), and objects within the scene are recognized. Information associated with each object is identified, and made available to the user through a scene-registered interactive visual "bauble" that is graphically overlaid on the imagery. The bauble may itself present information, or may simply be an indicia that the user can tap at the indicated location to obtain a lengthier listing of related information, or launch a related function/application.

In the illustrated scene, the camera has recognized the face in the foreground as "Bob" and annotated the image accordingly. A billboard promoting the Godzilla movie has been recognized, and a bauble saying "Show Times" has been blitted onto the display, inviting the user to tap for screening information.

The phone has recognized the user's car from the scene, and has also identified, by make and year, another vehicle in the picture. Both are noted by overlaid text. A restaurant has also been identified, and an initial review from a collection of reviews ("Jane's review: Pretty Good") is shown. Tapping brings up more reviews.

In one particular arrangement, this scenario is implemented as a cloud-side service assisted by local device object recognition core services. Users may leave notes on both fixed and mobile objects. Tapped baubles can trigger other applications. Social networks can keep track of object relationships, forming a virtual "web of objects."

In early roll-out, the class of recognizable objects will be limited but useful. Object identification events will primarily fetch and associate public domain information and social web connections to the baubles. Applications employing barcodes, digital watermarks, facial recognition, OCR, etc., can help support initial deployment of the technology.

Later, the arrangement is expected to evolve into an auction market, in which paying enterprises want to place their own baubles (or associated information) onto highly targeted demographic user screens. User profiles, in conjunction with the input visual stimuli (aided, in some cases, by GPS/magnetometer data), are fed into a Google-esque mix-master in the cloud, matching buyers of mobile device-screen real estate to users requesting the baubles.

Eventually, such functionality may become so ubiquitous as to enter into the common lexicon, as in "I'll try to get a Bauble on that" or "See what happens if you Viewgle that scene."

BACKGROUND

Digimarc's U.S. Pat. No. 6,947,571 shows a system in which a cell phone camera captures content (e.g., image data), and processes same to derive an identifier related to the imagery. This derived identifier is submitted to a data structure (e.g., a database), which indicates corresponding data or actions. The cell phone then displays responsive information, or takes responsive action. Such sequence of operations is sometimes referred to as "visual search."

Related technologies are shown in patent publications 20080300011 (Digimarc), U.S. Pat. No. 7,283,983 and WO07/130,688 (Evolution Robotics), 20070175998 and 20020102966 (DSPV), 20060012677, 20060240862 and 20050185060 (Google), 20060056707 and 20050227674 (Nokia), 20060026140 (ExBiblio), U.S. Pat. No. 6,491,217, 20020152388, 20020178410 and 20050144455 (Philips), 20020072982 and 20040199387 (Shazam), 20030083098 (Canon), 20010055391 (Qualcomm), 20010001854 (AirClic), U.S. Pat. No. 7,251,475 (Sony), U.S. Pat. No. 7,174,293 (Iceberg), U.S. Pat. No. 7,065,559 (Organnon Wireless), U.S. Pat. No. 7,016,532 (Evryx Technologies), U.S. Pat. Nos. 6,993,573 and 6,199,048 (Neomedia), U.S. Pat. No. 6,941,275 (Tune Hunter), U.S. Pat. No. 6,788,293 (Silverbrook Research), U.S. Pat. Nos. 6,766,363 and 6,675,165 (BarPoint), U.S. Pat. No. 6,389,055 (Alcatel Lucent), U.S. Pat. No. 6,121,530 (Sonoda), and U.S. Pat. No. 6,002,946 (Reber/Motorola).

Aspects of the presently-detailed technology concern improvements to such technologies, moving towards the goal of intuitive computing: devices that can see and/or hear, and infer the user's desire in that sensed context.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing an exemplary embodiment incorporating certain aspects of the technology detailed herein.

FIG. 1A is a high level view of an embodiment incorporating aspects of the present technology.

FIG. 2 shows some of the applications that a user may request a camera-equipped cell phone to perform.

FIG. 3 identifies some of the commercial entities in an embodiment incorporating aspects of the present technology.

FIGS. 4, 4A and 4B conceptually illustrate how pixel data, and derivatives, are applied in different tasks, and packaged into packet form.

FIG. 5 shows how different tasks may have certain image processing operations in common.

FIG. 6 is a diagram illustrating how common image processing operations can be identified, and used to configure cell phone processing hardware to perform these operations.

FIG. 7 is a diagram showing how a cell phone can send certain pixel-related data across an internal bus for local processing, and send other pixel-related data across a communications channel for processing in the cloud.

FIG. 8 shows how the cloud processing in FIG. 7 allows tremendously more "intelligence" to be applied to a task desired by a user.

FIG. 9 details how keyvector data is distributed to different external service providers, who perform services in exchange for compensation, which is handled in consolidated fashion for the user.

FIG. 10 shows an embodiment incorporating aspects of the present technology, noting how cell phone-based processing is suited for simple object identification tasks, such as template matching, whereas cloud-based processing is suited for complex tasks, such as data association.

FIG. 10A shows an embodiment incorporating aspects of the present technology, noting that the user experience is optimized by performing visual keyvector processing as close to a sensor as possible, and administering traffic to the cloud as low in a communications stack as possible.

FIG. 11 illustrates that tasks referred for external processing can be routed to a first group of service providers who routinely perform certain tasks for the cell phone, or can be routed to a second group of service providers who compete on a dynamic basis for processing tasks from the cell phone.

FIG. 12 further expands on concepts of FIG. 11, e.g., showing how a bid filter and broadcast agent software module may oversee a reverse auction process.

FIG. 13 is a high level block diagram of a processing arrangement incorporating aspects of the present technology.

FIG. 14 is a high level block diagram of another processing arrangement incorporating aspects of the present technology.

FIG. 15 shows an illustrative range of image types that may be captured by a cell phone camera.

FIG. 16 shows a particular hardware implementation incorporating aspects of the present technology.

FIG. 17 illustrates aspects of a packet used in an exemplary embodiment.

FIG. 18 is a block diagram illustrating an implementation of the SIFT technique.

FIG. 19 is a block diagram illustrating, e.g., how packet header data can be changed during processing, through use of a memory.

FIG. 19A shows a prior art architecture from the robotic Player Project.

FIG. 19B shows how various factors can influence how different operations may be handled.

FIG. 20 shows an arrangement by which a cell phone camera and a cell phone projector share a lens.

FIG. 20A shows a reference platform architecture that can be used in embodiments of the present technology.

FIG. 21 shows an image of a desktop telephone captured by a cell phone camera.

FIG. 22 shows a collection of similar images found in a repository of public images, by reference to characteristics discerned from the image of FIG. 21.

FIGS. 23-28A, and 30-34 are flow diagrams detailing methods incorporating aspects of the present technology.

FIG. 29 is an arty shot of the Eiffel Tower, captured by a cell phone user.

FIG. 35 is another image captured by a cell phone user.

FIG. 36 is an image of an underside of a telephone, discovered using methods according to aspects of the present technology.

FIG. 37 shows part of the physical user interface of one style of cell phone.

FIGS. 37A and 37B illustrate different linking topologies.

FIG. 38 is an image captured by a cell phone user, depicting an Appalachian Trail trail marker.

FIGS. 39-43 detail methods incorporating aspects of the present technology.

FIG. 44 shows the user interface of one style of cell phone.

FIGS. 45A and 45B illustrate how different dimensions of commonality may be explored through use of a user interface control of a cell phone.

FIGS. 46A and 46B detail a particular method incorporating aspects of the present technology, by which keywords such as Prometheus and Paul Manship are automatically determined from a cell phone image.

FIG. 47 shows some of the different data sources that may be consulted in processing imagery according to aspects of the present technology.

FIGS. 48A, 48B and 49 show different processing methods according to aspects of the present technology.

FIG. 50 identifies some of the different processing that may be performed on image data, in accordance with aspects of the present technology.

FIG. 51 shows an illustrative tree structure that can be employed in accordance with certain aspects of the present technology.

FIG. 52 shows a network of wearable computers (e.g., cell phones) that can cooperate with each other, e.g., in a peer-to-peer network.

FIGS. 53-55 detail how a glossary of signs can be identified by a cell phone, and used to trigger different actions.

FIG. 56 illustrates aspects of prior art digital camera technology.

FIG. 57 details an embodiment incorporating aspects of the present technology.

FIG. 58 shows how a cell phone can be used to sense and display affine parameters.

FIG. 59 illustrates certain state machine aspects of the present technology.

FIG. 60 illustrates how even "still" imagery can include temporal, or motion, aspects.

FIG. 61 shows some metadata that may be involved in an implementation incorporating aspects of the present technology.

FIG. 62 shows an image that may be captured by a cell phone camera user.

FIGS. 63-66 detail how the image of FIG. 62 can be processed to convey semantic metadata.

FIG. 67 shows another image that may be captured by a cell phone camera user.

FIGS. 68 and 69 detail how the image of FIG. 67 can be processed to convey semantic metadata.

FIG. 70 shows an image that may be captured by a cell phone camera user.

FIG. 71 details how the image of FIG. 70 can be processed to convey semantic metadata.

FIG. 72 is a chart showing aspects of the human visual system.

FIG. 73 shows different low, mid and high frequency components of an image.

FIG. 74 shows a newspaper page.

FIG. 75 shows the layout of the FIG. 74 page, as set by layout software.

FIG. 76 details how user interaction with imagery captured from printed text may be enhanced.

FIG. 77 illustrates how semantic conveyance of metadata can have a progressive aspect, akin to JPEG2000 and the like.

FIG. 78 is a block diagram of a prior art thermostat.

FIG. 79 is an exterior view of the thermostat of FIG. 78.

FIG. 80 is a block diagram of a thermostat employing certain aspects of the present technology ("ThingPipe").

FIG. 81 is a block diagram of a cell phone embodying certain aspects of the present technology.

…lowed by attaching routing instructions to the pixel data as specified by the user's intentions and subscriptions, leads to an interactive session between a mobile device and one or more "cloud"-based pixel processing services. The key word "session" further indicates fast responses transmitted back to the mobile device, where for some services marketed as "real time" or "interactive," a session essentially represents a duplex, generally packet-based, communication, where several outgoing "pixel packets" and several incoming response…
10 packets (which may be pixel packets updated with the pro FIG. 82 is a block diagram by which certain operations of cessed data) may occur every second. the thermostat of FIG.80 are explained. Business factors and good old competition are at the heart FIG. 83 shows a cell phone display depicting an image of the distributed network. Users can subscribe to or other captured from a thermostat, onto which is overlaid certain wise tap into any external services they choose. The local touch-screen targets that the user can touch to increment or 15 device itself and/or the carrier service provider to that device decrement the thermostattemperature. can be configured as the user chooses, routing filtered and FIG. 84 is similar to FIG. 83, but shows a graphical user pertinent pixel data to specified object interaction services. interface for use on a phone without a touch-screen. Billing mechanisms for Such services can directly plug into FIG. 85 is a block diagram of an alarm clock employing existing cell and/or mobile device billing networks, wherein aspects of the present technology. users get billed and service providers get paid. FIG.86 shows a screenofanalarm clock user interface that Butlets back up a bit. The addition of camera systems to may be presented on a cell phone, in accordance with one mobile devices has ignited an explosion of applications. The aspect of the technology. primordial application certainly must be folks simply Snap FIG. 87 shows a screen of a user interface, detailing nearby ping quick visual aspects of their environment and sharing devices that may be controlled through use of the cellphone. 25 such pictures with friends and family. The fanning out of applications from that starting point DETAILED DESCRIPTION arguably hinges on a set of core plumbing features inherent in mobile cameras. 
In short (and non-exhaustive of course), The present specification details a diversity of technolo Such features include: a) higher quality pixel capture and low gies, assembled over an extended period of time, to serve a 30 level processing; b) better local device CPU and GPU variety of different objectives. Yet they relate together in resources for on-device pixel processing with Subsequent various ways, and so are presented collectively in this single user feedback; c) structured connectivity into “the cloud.” document. and importantly, d) a maturing traffic monitoring and billing This varied, interrelated subject matter does not lend itself infrastructure. FIG. 1A is but one graphic perspective on to a straightforward presentation. Thus, the reader's indul 35 Some of these plumbing features of what might be called a gence is solicited as this narrative occasionally proceeds in visually intelligent network. (Conventional details of a cell nonlinear fashion among the assorted topics and technolo phone. Such as the microphone, A/D converter, modulation gies. and demodulation systems, IF stages, cellular transceiver, Each portion of this specification details technology that etc., are not shown for clarity of illustration.) desirably incorporates technical features detailed in other 40 It is all well and good to get better CPUs and GPUs, and portions. Thus, it is difficult to identify “a beginning from more memory, on mobile devices. However, cost, weight and which this disclosure should logically begin. That said, we power considerations seem to favor getting “the cloud' to do simply dive in. as much of the “intelligence' heavy lifting as possible. 
Mobile Device Object Recognition and Interaction Using Relatedly, it seems that there should be a common denomi Distributed Network Services 45 nator set of “device-side' operations performed on visual data There is presently a huge disconnect between the unfath that will serve all cloud processes, including certain format omable Volume of information that is contained in high qual ting, elemental graphic processing, and other rote operations. ity image data streaming from a mobile device camera (e.g., Similarly, it seems there should be a standardized basic in a cell phone), and the ability of that mobile device to header and addressing scheme for the resulting communica process this data to whatever end. “Off device' processing of 50 tion traffic (typically packetized) back and forth with the visual data can help handle this fire hose of data, especially cloud. when a multitude of visual processing tasks may be desired. This conceptualization is akin to the human visual system. These issues become even more critical once “real time object The eye performs baseline operations, such as chromaticity recognition and interaction' is contemplated, where a user of groupings, and it optimizes necessary information for trans the mobile device expects virtually instantaneous results and 55 mission along the optic nerve to the brain. The brain does the augmented reality graphic feedback on the mobile device real cognitive work. And there's feedback the other way screen, as that user points the camera at a scene or object. too—with the brain sending information controlling muscle In accordance with one aspect of the present technology, a movement—where to point eyes, Scanning lines of a book, distributed network of pixel processing engines serve Such controlling the iris (lighting), etc. 
mobile device users and meet most qualitative "human real 60 FIG.2 depicts a non-exhaustive but illustrative list of visual time interactivity” requirements, generally with feedback in processing applications for mobile devices. Again, it is hard much less than one second. Implementation desirably pro not to see analogies between this list and the fundamentals of vides certain basic features on the mobile device, including a how the human visual system and the human brain operate. It rather intimate relationship between the image sensor's out is a well studied academic area that deals with how “opti put pixels and the native communications channel available to 65 mized the human visual system is relative to any given object the mobile device. Certain levels of basic “content filtering recognition task, where a general consensus is that the eye and classification of the pixel data on the local device, fol retina-optic nerve-cortex system is pretty darn wonderful in US 8,805,110 B2 7 8 how efficiently it serves a vast array of cognitive demands. FIG. 5 is a segue diagram—still at an abstract level, but This aspect of the technology relates to how similarly effi pointing toward the concrete. A list of user-defined applica cient and broadly enabling elements can be built into mobile tions, such as illustrated in FIG. 2, will map to a state-of-the devices, mobile device connections and network services, all art inventory of pixel processing methods and approaches with the goal of serving the applications depicted in FIG. 2, as which can accomplish each and every application. These well as those new applications which may show up as the pixel processing methods break down into common and not technology dance continues. So-common component Sub-tasks. 
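The decomposition just described—each enabled application mapping to a set of elemental pixel-processing sub-tasks, some of them widely shared—can be sketched in a few lines of code. The application names and task inventories below are illustrative assumptions, not drawn from the specification:

```python
# Illustrative sketch of the "common component sub-tasks" idea: each
# enabled application needs certain elemental pixel-processing routines,
# and routines shared by several applications are candidates for
# one-time configuration in shared hardware. All names are hypothetical.

from collections import Counter

APP_TASKS = {
    "barcode_reading":    {"grayscale", "edge_detect", "threshold"},
    "face_recognition":   {"grayscale", "fft", "eigenvector_extract"},
    "watermark_decoding": {"grayscale", "fft", "log_polar"},
}

def common_task_inventory(enabled_apps):
    """Count how many of the enabled applications need each task."""
    counts = Counter()
    for app in enabled_apps:
        counts.update(APP_TASKS[app])
    return counts

def shared_tasks(enabled_apps, min_shared=2):
    """Tasks needed by at least min_shared applications."""
    counts = common_task_inventory(enabled_apps)
    return {task for task, n in counts.items() if n >= min_shared}

# "grayscale" is needed by all three applications; "fft" by two.
print(shared_tasks(["barcode_reading", "face_recognition", "watermark_decoding"]))
```

In this spirit, the intersection would be computed once, as a configuration activity when the set of "turned on" applications changes, rather than per frame.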
Object recognition textbooks are filled with a wide variety of approaches and terminologies which bring a sense of order into what at first glance might appear to be a bewildering array of "unique requirements" relative to the applications shown in FIG. 2. (In addition, multiple computer vision and image processing libraries, such as OpenCV and CMVision—discussed below—have been created that identify and render functional operations, which can be considered "atomic" functions within object recognition paradigms.) But FIG. 5 attempts to show that there are indeed a set of common steps and processes shared between visual processing applications. The differently shaded pie slices attempt to illustrate that certain pixel operations are of a specific class and may simply have differences in low level variables or optimizations. The size of the overall pie (thought of in a logarithmic sense, where a pie twice the size of another may represent 10 times more Flops, for example), and the percentage size of the slice, represent degrees of commonality.

Perhaps the central difference between the human analogy and mobile device networks must surely revolve around the basic concept of "the marketplace," where buyers buy better and better things so long as businesses know how to profit accordingly. Any technology which aims to serve the applications listed in FIG. 2 must necessarily assume that hundreds if not thousands of business entities will be developing the nitty gritty details of specific commercial offerings, with the expectation of one way or another profiting from those offerings. Yes, a few behemoths will dominate main lines of cash flows in the overall mobile industry, but an equal certainty will be that niche players will be continually developing niche applications and services. Thus, this disclosure describes how a marketplace for visual processing services can develop, whereby business interests across the spectrum have something to gain.

FIG. 3 attempts a crude categorization of some of the business interests applicable to the global business ecosystem operative in the era of this filing.

FIG. 4 sprints toward the abstract in the introduction of the technology aspect now being considered. Here we find a highly abstracted bit of information derived from some batch of photons that impinged on some form of electronic image sensor, with a universe of waiting consumers of that lowly bit. FIG. 4A then quickly introduces the intuitively well-known concept that singular bits of visual information aren't worth much outside of their role in both spatial and temporal groupings. This core concept is well exploited in modern video compression standards such as MPEG-4 and H.264.

The "visual" character of the bits may be pretty far removed from the visual domain by certain of the processing (consider, e.g., the vector strings representing eigenface data). Thus, we sometimes use the term "key vector data" (or "key vector strings") to refer collectively to raw sensor/stimulus data (e.g., pixel data), and/or to processed information and associated derivatives. A key vector may take the form of a container in which such information is conveyed (e.g., a data structure such as a packet). A tag or other data can be included to identify the type of information (e.g., JPEG image data, eigenface data), or the data type may be otherwise evident from the data or from context. One or more instructions, or operations, may be associated with key vector data—either expressly detailed in the key vector, or implied. An operation may be implied in default fashion, for key vector data of certain types (e.g., for JPEG data it may be "store the image;" for eigenface data it may be "match this eigenface template"). Or an implied operation may be dependent on context.

FIGS. 4A and 4B also introduce a central player in this disclosure: the packaged and address-labeled pixel packet, into a body of which key vector data is inserted. The key vector data may be a single patch, or a collection of patches, or a time-series of patches/collections. A pixel packet may be less than a kilobyte, or its size can be much, much larger. It may convey information about an isolated patch of pixels excerpted from a larger image, or it may convey a massive image of Notre Dame cathedral.

(As presently conceived, a pixel packet is an application layer construct. When actually pushed around a network, however, it may be broken into smaller portions—as transport layer constraints in a network may require.)

FIG. 6 takes a major step toward the concrete, sacrificing simplicity in the process. Here we see a top portion labeled "Resident Call-Up Visual Processing Services," which represents all of the possible list of applications from FIG. 2 that a given mobile device may be aware of, or downright enabled to perform. The idea is that not all of these applications have to be active all of the time, and hence some sub-set of services is actually "turned on" at any given moment. The turned on applications, as a one-time configuration activity, negotiate to identify their common component tasks, labeled the "Common Processes Sorter"—first generating an overall common list of pixel processing routines available for on-device processing, chosen from a library of these elemental image processing routines (e.g., FFT, filtering, edge detection, resampling, color histogramming, log-polar transform, etc.). Generation of corresponding Flow Gate Configuration/Software Programming information follows, which literally loads library elements into properly ordered places in a field programmable gate array set-up, or otherwise configures a suitable processor to perform the required component tasks.

FIG. 6 also includes depictions of the image sensor, followed by a universal pixel segmenter. This pixel segmenter breaks down the massive stream of imagery from the sensor into manageable spatial and/or temporal blobs (e.g., akin to MPEG macroblocks, wavelet transform blocks, 64x64 pixel blocks, etc.). After the torrent of pixels has been broken down into chewable chunks, they are fed into the newly programmed gate array (or other hardware), which performs the elemental image processing tasks associated with the selected applications. (Such arrangements are further detailed below, in an exemplary system employing pixel packets.) Various output products are sent to a routing engine, which refers the elementally-processed data (e.g., key vector data) to other resources (internal and/or external) for further processing. This further processing typically is more complex than that already performed. Examples include making associations, deriving inferences, pattern and template matching, etc. This further processing can be highly application-specific.

(Consider a promotional game from Pepsi, inviting the public to participate in a treasure hunt in a state park. Based on internet-distributed clues, people try to find a hidden six-pack of soda to earn a $500 prize. Participants must download a special application from the Pepsi-dot-com web site (or the Apple AppStore), which serves to distribute the clues (which may also be published to Twitter). The downloaded application also has a prize verification component, which processes image data captured by the users' cell phones to identify a special pattern with which the hidden six-pack is uniquely marked. SIFT object recognition is used (discussed below), with the SIFT feature descriptors for the special package conveyed with the downloaded application. When an image match is found, the cell phone immediately reports same wirelessly to Pepsi. The winner is the user whose cell phone first reports detection of the specially-marked six-pack. In the FIG. 6 arrangement, some of the component tasks in the SIFT pattern matching operation are performed by the elemental image processing in the configured hardware; others are referred for more specialized processing—either internal or external.)

FIG. 7 up-levels the picture to a generic distributed pixel services network view, where local device pixel services and "cloud based" pixel services have a kind of symmetry in how they operate. The router in FIG. 7 takes care of how any given packaged pixel packet gets sent to the appropriate pixel processing location, whether local or remote (with the style of fill pattern denoting different component processing functions; only a few of the processing functions required by the enabled visual processing services are depicted). Some of the data shipped to cloud-based pixel services may have been first processed by local device pixel services. The circles indicate that the routing functionality may have components in the cloud—nodes that serve to distribute tasks to active service providers, and collect results for transmission back to the device. In some implementations these functions may be performed at the edge of the wireless network, e.g., by modules at wireless service towers, so as to ensure the fastest action. Results collected from the active external service providers, and the active local processing stages, are fed back to Pixel Service Manager software, which then interacts with the device user interface.

FIG. 8 is an expanded view of the lower right portion of FIG. 7 and represents the moment where Dorothy's shoes turn red, and why distributed pixel services provided by the cloud—as opposed to the local device—will probably trump all but the most mundane object recognition tasks.

Object recognition in its richer form is based on visual association rather than strict template matching rules. If we all were taught that the capital letter "A" will always be strictly following some pre-historic form never to change, a universal template image if you will, then pretty clean and locally prescriptive methods can be placed into a mobile imaging device in order to get it to reliably read a capital A any time that ordained form "A" is presented to the camera. 2D and even 3D barcodes in many ways follow this template-like approach to object recognition, where for contained applications involving such objects, local processing services can largely get the job done. But even in the barcode example, flexibility in the growth and evolution of overt visual coding targets begs for an architecture which doesn't force "code upgrades" to a gazillion devices every time there is some advance in the overt symbology art.

At the other end of the spectrum, arbitrarily complex tasks can be imagined, e.g., referring to a network of supercomputers the task of predicting the apocryphal typhoon resulting from the fluttering of a butterfly's wings halfway around the world—if the application requires it. Oz beckons.

FIG. 8 attempts to illustrate this radical extra dimensionality of pixel processing in the cloud as opposed to the local device. This virtually goes without saying (or without a picture), but FIG. 8 is also a segue figure to FIG. 9, where Dorothy gets back to Kansas and is happy about it.

FIG. 9 is all about cash, cash flow, and happy humans using cameras on their mobile devices and getting highly meaningful results back from their visual queries, all the while paying one monthly bill. It turns out the Google AdWords auction genie is out of the bottle. Behind the scenes of the moment-by-moment visual scans from a mobile user of their immediate visual environment are hundreds and thousands of micro-decisions, pixel routings, results comparisons and micro-auctioned channels back to the mobile device user for the hard good they are "truly" looking for, whether they know it or not. This last point is deliberately cheeky, in that searching of any kind is inherently open ended and magical at some level, and part of the fun of searching in the first place is that surprisingly new associations are part of the results. The search user knows after the fact what they were truly looking for. The system, represented in FIG. 9 as the carrier-based financial tracking server, now sees the addition of our networked pixel services module and its role in facilitating pertinent results being sent back to a user, all the while monitoring the uses of the services in order to populate the monthly bill and send the proceeds to the proper entities.

(As detailed further elsewhere, the money flow may not exclusively be to remote service providers. Other money flows can arise, such as to users or other parties, e.g., to induce or reward certain actions.)

FIG. 10 focuses on functional division of processing, illustrating how tasks in the nature of template matching can be performed on the cell phone itself, whereas more sophisticated tasks (in the nature of data association) desirably are referred to the cloud for processing.

Elements of the foregoing are distilled in FIG. 10A, showing an implementation of aspects of the technology as a physical matter of (usually) software components. The two ovals in the figure highlight the symmetric pair of software components which are involved in setting up a "human real-time" visual recognition session between a mobile device and the generic cloud of service providers, data associations and visual query results.

The oval on the left refers to "keyvectors," and more specifically "visual key vectors." As noted, this term can encompass everything from simple JPEG compressed blocks all the way through log-polar transformed facial feature vectors, and anything in between and beyond. The point of a key vector is that the essential raw information of some given visual recognition task has been optimally pre-processed and packaged (possibly compressed). The oval on the left assembles these packets, and typically inserts some addressing information by which they will be routed. (Final addressing may not be possible, as the packet may ultimately be routed to remote service providers—the details of which may not yet be known.) Desirably, this processing is performed as close to the raw sensor data as possible, such as by processing circuitry integrated on the same substrate as the image sensor, which is responsive to software instructions stored in memory or provided from another stage in packet form.

The oval on the right administers the remote processing of key vector data, e.g., attending to arranging appropriate services, directing traffic flow, etc. Desirably, this software process is implemented as low down on a communications stack as possible, generally on a "cloud side" device, access point, or cell tower. (When real-time visual key vector packets stream over a communications channel, the lower down in the communications stack they are identified and routed, the smoother the "human real-time" look and feel a given visual recognition task will be.) Remaining high level processing needed to support this arrangement is included in FIG. 10A for context, and can generally be performed through native mobile and remote hardware capabilities.
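A minimal sketch of the pixel packet construct described above—a typed container of key vector data, with an optional explicit operation and routing information that may still be incomplete at assembly time—might look as follows. Field names and the default-operation table are assumptions for illustration, not the specification's actual format:

```python
# Minimal sketch of a "pixel packet": a container for key vector data,
# tagged with a data type, an optional explicit operation, and routing
# information that may not yet be final. All names are assumptions.

from dataclasses import dataclass
from typing import Optional

# Operations implied in default fashion for certain data types,
# per the discussion above (JPEG -> store; eigenface -> match).
DEFAULT_OPERATIONS = {
    "jpeg": "store_image",
    "eigenface": "match_eigenface_template",
}

@dataclass
class PixelPacket:
    data_type: str                   # tag identifying the key vector type
    key_vector: bytes                # raw or derived sensor data (the body)
    operation: Optional[str] = None  # expressly detailed instruction, if any
    route: Optional[str] = None      # final addressing may not yet be known

    def effective_operation(self) -> Optional[str]:
        """Explicit operation if present; else the type's implied default."""
        return self.operation or DEFAULT_OPERATIONS.get(self.data_type)

pkt = PixelPacket(data_type="jpeg", key_vector=b"\xff\xd8...")
print(pkt.effective_operation())  # store_image
```

Consistent with the application-layer framing noted above, a transport layer would remain free to fragment such a packet in transit.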
The of some cloud-based pixel processing services may be estab query router and response manager (or in some implementa lished in advance, in a pseudo-static fashion, whereas other tions, the user) then makes a selection. providers may periodically vie for the privilege of processing For expository convenience and visual clarity, FIG. 12 a user's key vector data, through participation in a reverse shows a software module labeled “Bid Filter and Broadcast auction. In many implementations, these latter providers Agent. In most implementations this forms part of the query compete each time a packet is available for processing. 10 router and response manager module. The bid filter module Consider a user who Snaps a cell phone picture of an decides which vendors—from a universe of possible ven unfamiliar car, wanting to learn the make and model. Various dors—should be given a chance to bid on a processing task. service providers may compete for this business. A startup (The user's preference data, or historical experience, may vendor may offer to perform recognition for free to build its indicate that certain service providers be disqualified.) The brand or collect data. Imagery submitted to this service 15 broadcast agent module then communicates with the selected returns information simply indicating the car's make and bidders to inform them of a user task for processing, and model. Consumer Reports may offer an alternative service— provides information needed for them to make a bid. which provides make and model data, but also provides tech Desirably, the bid filter and broadcast agent do at least nical specifications for the car. However, they may charge 2 some their work in advance of data being available for pro cents for the service (or the cost may be bandwidth based, cessing. That is, as soon as a prediction can be made as to an e.g., 1 cent per megapixel). 
Edmunds, or JD Powers, may offer still another service, which provides data like Consumer Reports, but pays the user for the privilege of providing data. In exchange, the vendor is given the right to have one of its partners send a text message to the user promoting goods or services. The payment may take the form of a credit on the user's monthly cell phone voice/data service billing.

Using criteria specified by the user, stored preferences, context, and other rules/heuristics, a query router and response manager (in the cell phone, in the cloud, distributed, etc.) determines whether the packet of data needing processing should be handled by one of the service providers in the stable of static standbys, or whether it should be offered to providers on an auction basis—in which case it arbitrates the outcome of the auction.

The static standby service providers may be identified when the phone is initially programmed, and only reconfigured when the phone is reprogrammed. (For example, Verizon may specify that all FFT operations on its phones be routed to a server that it provides for this purpose.) Or, the user may be able to periodically identify preferred providers for certain tasks, as through a configuration menu, or specify that certain tasks should be referred for auction. Some applications may emerge where static service providers are favored; the task may be so mundane, or one provider's services may be so unparalleled, that competition for the provision of services isn't warranted.

In the case of services referred to auction, some users may exalt price above all other considerations. Others may insist on domestic data processing. Others may want to stick to service providers that meet "green," "ethical," or other standards of corporate practice. Others may prefer richer data output. Weightings of different criteria can be applied by the query router and response manager in making the decision.

In some circumstances, one input to the query router and response manager may be the user's location, so that a different service provider may be selected when the user is at home in Oregon, than when she is vacationing in Mexico. In other instances, the required turnaround time is specified, which may disqualify some vendors, and make others more competitive. In some instances the query router and response manager need not decide at all, e.g., if cached results identifying a service provider selected in a previous auction are still available and not beyond a "freshness" threshold.

Pricing offered by the vendors may change with processing load, bandwidth, time of day, and other considerations. In some embodiments the providers may be informed of offers … operation that the user may likely soon request, these modules start working to identify a provider to perform a service expected to be required. A few hundred milliseconds later the user key vector data may actually be available for processing (if the prediction turns out to be accurate).

Sometimes, as with Google's present AdWords system, the service providers are not consulted at each user transaction. Instead, each provides bidding parameters, which are stored and consulted whenever a transaction is considered, to determine which service provider wins. These stored parameters may be updated occasionally. In some implementations the service provider pushes updated parameters to the bid filter and broadcast agent whenever available. (The bid filter and broadcast agent may serve a large population of users, such as all Verizon subscribers in area code 503, or all subscribers to an ISP in a community, or all users at the domain well-dot-com, etc.; or more localized agents may be employed, such as one for each cell phone tower.)

If there is a lull in traffic, a service provider may discount its services for the next minute. The service provider may thus transmit (or post) a message stating that it will perform eigenvector extraction on an image file of up to 10 megabytes for 2 cents until 1244754176 Coordinated Universal Time in the Unix epoch, after which time the price will return to 3 cents. The bid filter and broadcast agent updates a table with stored bidding parameters accordingly.

(The reader is presumed to be familiar with the reverse auction arrangements used by Google to place sponsored advertising on web search results pages. An illustrative description is provided in Levy, "Secret of Googlenomics: Data-Fueled Recipe Brews Profitability," Wired Magazine, May 22, 2009.)

In other implementations, the broadcast agent polls the bidders—communicating relevant parameters, and soliciting bid responses—whenever a transaction is offered for processing.

Once a prevailing bidder is decided, and data is available for processing, the broadcast agent transmits the key vector data (and other parameters as may be appropriate to a particular task) to the winning bidder. The bidder then performs the requested operation, and returns the processed data to the query router and response manager. This module logs the processed data, and attends to any necessary accounting (e.g., crediting the service provider with the appropriate fee). The response data is then forwarded back to the user device.

In a variant arrangement, one or more of the competing service providers actually performs some or all of the requested processing, but "teases" the user (or the query router and response manager) by presenting only partial results. With a taste of what's available, the user (or the query router and response manager) may be induced to make a different choice than relevant criteria/heuristics would otherwise indicate.

The function calls sent to external service providers, of course, do not have to provide the ultimate result sought by a consumer (e.g., identifying a car, or translating a menu listing from French to English). They can be component operations, such as calculating an FFT, or performing a SIFT procedure or a log-polar transform, or computing a histogram or eigenvectors, or identifying edges, etc.

In time, it is expected that a rich ecosystem of expert processors will emerge—serving myriad processing requests from cell phones and other thin client devices.

More on Monetary Flow

Additional business models can be enabled, involving the subsidization of consumed remote services by the service providers themselves in exchange for user information (e.g., for audience measurement), or in exchange for action taken by the user, such as completing a survey, visiting specific sites, locations in a store, etc.

Services may be subsidized by third parties as well, such as a coffee shop that derives value by providing a differentiating service to its customers in the form of free/discounted usage of remote services while they are seated in the shop.

In one arrangement an economy is enabled wherein a currency of remote processing credits is created and exchanged between users and remote service providers. This may be entirely transparent to the user and managed as part of a service plan, e.g., with the user's cell phone or data service provider. Or it can be exposed as a very explicit aspect of certain embodiments of the present technology. Service providers and others may award credits to users for taking actions or being part of a frequent-user program to build allegiance with specific providers. As with other currencies, users may choose to explicitly donate, save, exchange or generally barter credits as needed.

Considering these points in further detail, a service may pay a user for opting-in to an audience measurement panel. E.g., The Nielsen Company may provide services to the public—such as identification of television programming from audio or video samples submitted by consumers. These services may be provided free to consumers who agree to share some of their media consumption data with Nielsen (such as by serving as an anonymous member for a city's audience measurement panel), and provided on a fee basis to others. Nielsen may offer, for example, 100 units of credit—micropayments or other value—to participating consumers each month, or may provide credit each time the user submits information to Nielsen.

In another example, a consumer may be rewarded for accepting commercials, or commercial impressions, from a company. If a consumer goes into the Pepsi Center in Denver, she may receive a reward for each Pepsi-branded experience she encounters. The amount of micropayment may scale with the amount of time that she interacts with the different Pepsi-branded objects (including audio and imagery) in the venue.

Not just large brand owners can provide credits to individuals. Credits can be routed to friends and social/business acquaintances. To illustrate, a user of Facebook may share credit (redeemable for goods/services, or exchangeable for cash) from his Facebook page—enticing others to visit, or linger. In some cases, the credit can be made available only to people who navigate to the Facebook page in a certain manner—such as by linking to the page from the user's business card, or from another launch page.

As another example, consider a Facebook user who has earned, or paid for, or otherwise received credit that can be applied to certain services—such as for downloading songs from iTunes, or for music recognition services, or for identifying clothes that go with particular shoes (for which an image has been submitted), etc. These services may be associated with the particular Facebook page, so that friends can invoke the services from that page—essentially spending the host's credit (again, with suitable authorization or invitation by that hosting user). Likewise, friends may submit images to a facial recognition service accessible through an application associated with the user's Facebook page. Images submitted in such fashion are analyzed for faces of the host's friends, and identification information is returned to the submitter, e.g., through a user interface presented on the originating Facebook page. Again, the host may be assessed a fee for each such operation, but may allow authorized friends to avail themselves of such service at no cost.

Credits, and payments, can also be routed to charities. A viewer exiting a theatre after a particularly poignant movie about poverty in Bangladesh may capture an image of an associated movie poster, which serves as a portal for donations for a charity that serves the poor in Bangladesh. Upon recognizing the movie poster, the cell phone can present a graphical/touch user interface through which the user spins dials to specify an amount of a charitable donation, which at the conclusion of the transaction is transferred from a financial account associated with the user, to one associated with the charity.

More on a Particular Hardware Arrangement

As noted above and in the cited patent documents, there is a need for generic object recognition by a mobile device. Some approaches to specialized object recognition have emerged, and these have given rise to specific data processing approaches. However, no architecture has been proposed that goes beyond specialized object recognition toward generic object recognition.

Visually, a generic object recognition arrangement requires access to good raw visual data—preferably free of device quirks, scene quirks, user quirks, etc. Developers of systems built around object identification will best prosper and serve their users by concentrating on the object identification task at hand, and not the myriad existing roadblocks, resource sinks, and third party dependencies that currently must be confronted.

As noted, virtually all object identification techniques can make use of—or even rely upon—a pipe to "the cloud." "Cloud" can include anything external to the cell phone. An example is a nearby cell phone, or plural phones on a distributed network. Unused processing power on such other phone devices can be made available for hire (or for free) to call upon as needed. The cell phones of the implementations detailed herein can scavenge processing power from such other cell phones.

Such a cloud may be ad hoc, e.g., other cell phones within Bluetooth range of the user's phone. The ad hoc network can be extended by having such other phones also extend the local cloud to further phones that they can reach by Bluetooth, but the user cannot.

The "cloud" can also comprise other computational platforms, such as set-top boxes; processors in automobiles, thermostats, HVAC systems, wireless routers, local cell phone towers and other wireless network edges (including the processing hardware for their software-defined radio equipment), etc. Such processors can be used in conjunction with more traditional cloud computing resources—as are offered by Google, Amazon, etc.

(In view of concerns of certain users about privacy, the phone desirably has a user-configurable option indicating whether the phone can refer data to cloud resources for processing. In one arrangement, this option has a default value of "No," limiting functionality and impairing battery life, but also limiting privacy concerns. In another arrangement, this option has a default value of "Yes.")

Desirably, image-responsive techniques should produce a short term "result or answer," which generally requires some level of interactivity with a user—hopefully measured in fractions of a second for truly interactive applications, or a few seconds or fractions of a minute for nearer-term "I'm patient to wait" applications.

As for the objects in question, they can break down into various categories, including (1) generic passive (clues to basic searches), (2) geographic passive (at least you know where you are, and may hook into geographic-specific resources), (3) "cloud supported" passive, as with "identified/enumerated" objects and their associated sites, and (4) active/controllable, a la ThingPipe (a reference to technology detailed below, such as WiFi-equipped thermostats and parking meters).

An object recognition platform should not, it seems, be conceived in the classic "local device and local resources only" software mentality. However, it may be conceived as a local device optimization problem. That is, the software on the local device, and its processing hardware, should be designed in contemplation of their interaction with off-device software and hardware. Ditto the balance and interplay of both control functionality, pixel crunching functionality, and application software/GUI provided on the device, versus off the device. (In many implementations, certain databases useful for object identification/recognition will reside remote from the device.)

In a particularly preferred arrangement, such a processing platform employs image processing near the sensor—optimally on the same chip, with at least some processing tasks desirably performed by dedicated, special purpose hardware.

Consider FIG. 13, which shows an architecture of a cell phone 10 in which an image sensor 12 feeds two processing paths. One, 13, is tailored for the human visual system, and includes processing such as JPEG compression. Another, 14, is tailored for object recognition. As discussed, some of this processing may be performed by the mobile device, while other processing may be referred to the cloud 16.

FIG. 14 takes an application-centric view of the object recognition processing path. Some applications reside wholly in the cell phone. Other applications reside wholly outside the cell phone—e.g., simply taking key vector data as stimulus. More common are hybrids, such as where some processing is done in the cell phone, other processing is done externally, and the application software orchestrating the process resides in the cell phone.

To illustrate further discussion, FIG. 15 shows a range 40 of some of the different types of images 41-46 that may be captured by a particular user's cell phone. A few brief (and incomplete) comments about some of the processing that may be applied to each image are provided in the following paragraphs.

Image 41 depicts a thermostat. A steganographic digital watermark 47 is textured or printed on the thermostat's case. (The watermark is shown as visible in FIG. 15, but is typically imperceptible to the viewer.) The watermark conveys information intended for the cell phone, allowing it to present a graphic user interface by which the user can interact with the thermostat. A bar code or other data carrier can alternatively be used. Such technology is further detailed below.

Image 42 depicts an item including a barcode 48. This barcode conveys Universal Product Code (UPC) data. Other barcodes may convey other information. The barcode payload is not primarily intended for reading by a user cell phone (in contrast to watermark 47), but it nonetheless may be used by the cell phone to help determine an appropriate response for the user.

Image 43 shows a product that may be identified without reference to any express machine readable information (such as a bar code or watermark). A segmentation algorithm may be applied to edge-detected image data to distinguish the apparent image subject from the apparent background. The image subject may be identified through its shape, color and texture. Image fingerprinting may be used to identify reference images having similar labels, and metadata associated with those other images may be harvested. SIFT techniques (discussed below) may be employed for such pattern-based recognition tasks. Specular reflections in low texture regions may tend to indicate the image subject is made of glass. Optical character recognition can be applied for further information (reading the visible text). All of these clues can be employed to identify the depicted item, and help determine an appropriate response for the user.

Additionally (or alternatively), similar-image search systems, such as Google Similar Images, and Microsoft Live Search, can be employed to find similar images, and their metadata can then be harvested. (As of this writing, these services do not directly support upload of a user picture to find similar web pictures. However, the user can post the image to Flickr (using Flickr's cell phone upload functionality), and it will soon be found and processed by Google and Microsoft.)

Image 44 is a snapshot of friends. Facial detection and recognition may be employed (i.e., to indicate that there are faces in the image, and to identify particular faces and annotate the image with metadata accordingly, e.g., by reference to user-associated data maintained by Apple's iPhoto service, Google's Picasa service, Facebook, etc.). Some facial recognition applications can be trained for non-human faces, e.g., cats, dogs, animated characters including avatars, etc. Geolocation and date/time information from the cell phone may also provide useful information.

The persons wearing sunglasses pose a challenge for some facial recognition algorithms. Identification of those individuals may be aided by their association with persons whose identities can more easily be determined (e.g., by conventional facial recognition). That is, by identifying other group pictures in iPhoto/Picasa/Facebook/etc. that include one or more of the latter individuals, the other individuals depicted in such photographs may also be present in the subject image. These candidate persons form a much smaller universe of possibilities than is normally provided by unbounded iPhoto/Picasa/Facebook/etc. data. The facial vectors discernable from the sunglass-wearing faces in the subject image can then be compared against this smaller universe of possibilities in order to determine a best match. If, in the usual case of recognizing a face, a score of 90 is required to be considered a match (out of an arbitrary top match score of 100), in searching such a group-constrained set of images a score of 70 or 80 might suffice. (Where, as in image 44, there are two persons depicted without sunglasses, the occurrence of both of these individuals in a photo with one or more other individuals may increase its relevance to such an analysis—implemented, e.g., by increasing a weighting factor in a matching algorithm.)

Image 45 shows part of the statue of Prometheus in Rockefeller Center, N.Y. Its identification can follow teachings detailed elsewhere in this specification.

Image 46 is a landscape, depicting the Maroon Bells mountain range in Colorado. This image subject may be recognized by reference to geolocation data from the cell phone, in conjunction with geographic information services such as GeoNames or Yahoo!'s GeoPlanet.

(It should be understood that techniques noted above in connection with processing of one of the images 41-46 in FIG. 15 can likewise be applied to others of the images. Moreover, it should be understood that while in some respects the depicted images are ordered according to ease of identifying the subject and formulating a response, in other respects they are not. For example, although landscape image 46 is depicted to the far right, its geolocation data is strongly correlated with the metadata "Maroon Bells." Thus, this particular image presents an easier case than that presented by many other images.)

In one embodiment, such processing of imagery occurs automatically—without express user instruction each time. Subject to network connectivity and power constraints, information can be gleaned continuously from such processing, and may be used in processing subsequently-captured images. For example, an earlier image in a sequence that includes photograph 44 may show members of the depicted group without sunglasses—simplifying identification of the persons later depicted with sunglasses.

FIG. 16, Etc., Implementation

FIG. 16 gets into the nitty-gritty of a particular implementation incorporating certain of the features earlier discussed. (The other discussed features can be implemented by the artisan within this architecture, based on the provided disclosure.) In this data driven arrangement 30, operation of a cell phone camera 32 is dynamically controlled in accordance with packet data sent by a setup module 34, which in turn is controlled by a control processor module 36. (Control processor module 36 may be the cell phone's primary processor, or an auxiliary processor, or this function may be distributed.) The packet data further specifies operations to be performed by an ensuing chain of processing stages 38.

In one particular implementation, setup module 34 dictates—on a frame by frame basis—the parameters that are to be employed by camera 32 in gathering an exposure. Setup module 34 also specifies the type of data the camera is to output. These instructional parameters are conveyed in a first field 55 of a header portion 56 of a data packet 57 corresponding to that frame (FIG. 17).

For example, for each frame, the setup module 34 may issue a packet 57 whose first field 55 instructs the camera about, e.g., the length of the exposure, the aperture size, the lens focus, the depth of field, etc. Module 34 may further author the field 55 to specify that the sensor is to sum sensor charges to reduce resolution (e.g., producing a frame of 640x480 data from a sensor capable of 1280x960), output data only from red-filtered sensor cells, output data only from a horizontal line of cells across the middle of the sensor, output data only from a 128x128 patch of cells in the center of the pixel array, etc. The camera instruction field 55 may further specify the exact time that the camera is to capture data—so as to allow, e.g., desired synchronization with ambient lighting (as detailed later).

Each packet 57 issued by setup module 34 may include different camera parameters in the first header field 55. Thus, a first packet may cause camera 32 to capture a full frame image with an exposure time of 1 millisecond. A next packet may cause the camera to capture a full frame image with an exposure time of 10 milliseconds, and a third may dictate an exposure time of 100 milliseconds. (Such frames may later be processed in combination to yield a high dynamic range image.) A fourth packet may instruct the camera to down-sample data from the image sensor, and combine signals from differently color-filtered sensor cells, so as to output a 4x3 array of grayscale luminance values. A fifth packet may instruct the camera to output data only from an 8x8 patch of pixels at the center of the frame. A sixth packet may instruct the camera to output only five lines of image data, from the top, bottom, middle, and mid-upper and mid-lower rows of the sensor. A seventh packet may instruct the camera to output only data from blue-filtered sensor cells. An eighth packet may instruct the camera to disregard any auto-focus instructions but instead capture a full frame at infinity focus. And so on.

Each such packet 57 is provided from setup module 34 across a bus or other data channel 60 to a camera controller module associated with the camera. (The details of a digital camera—including an array of photosensor cells, associated analog-digital converters and control circuitry, etc.—are well known to artisans and so are not belabored.) Camera 32 captures digital image data in accordance with instructions in the header field 55 of the packet and stuffs the resulting image data into a body 59 of the packet. It also deletes the camera instructions 55 from the packet header (or otherwise marks header field 55 in a manner permitting it to be disregarded by subsequent processing stages).

When the packet 57 was authored by setup module 34 it also included a series of further header fields 58, each specifying how a corresponding, successive post-sensor stage 38 should process the captured data. As shown in FIG. 16, there are several such post-sensor processing stages 38.

Camera 32 outputs the image-stuffed packet produced by the camera (a pixel packet) onto a bus or other data channel 61, which conveys it to a first processing stage 38.

Stage 38 examines the header of the packet. Since the camera deleted the instruction field 55 that conveyed camera instructions (or marked it to be disregarded), the first header field encountered by a control portion of stage 38 is field 58a. This field details parameters of an operation to be applied by stage 38 to data in the body of the packet.

For example, field 58a may specify parameters of an edge detection algorithm to be applied by stage 38 to the packet's image data (or simply that such an algorithm should be applied). It may further specify that stage 38 is to substitute the resulting edge-detected set of data for the original image data in the body of the packet. (Substituting of data, rather than appending, may be indicated by the value of a single bit flag in the packet header.) Stage 38 performs the requested operation (which may involve configuring programmable hardware in certain implementations). First stage 38 then deletes instructions 58a from the packet header 56 (or marks them to be disregarded) and outputs the processed pixel packet for action by a next processing stage.

A control portion of a next processing stage (which here comprises stages 38a and 38b, discussed later) examines the header of the packet. Since field 58a was deleted (or marked to be disregarded), the first field encountered is field 58b. In this particular packet, field 58b may instruct the second stage not to perform any processing on the data in the body of the packet, but instead simply delete field 58b from the packet header and pass the pixel packet to the next stage.

A next field of the packet header may instruct the third stage 38c to perform 2D FFT operations on the image data found in the packet body, based on 16x16 blocks. It may further direct the stage to hand-off the processed FFT data to a wireless interface, for internet transmission to address 216.239.32.10, accompanied by specified data (detailing, e.g., the task to be performed on the received FFT data by the computer at that address, such as texture classification). It may further direct the stage to hand off a single 16x16 block of FFT data, corresponding to the center of the captured image, to the same or a different wireless interface for transmission to address 12.232.235.27—again accompanied by corresponding instructions about its use (e.g., search an archive of stored FFT data for a match, and return information if a match is found; also, store this 16x16 block in the archive with an associated identifier). Finally, the header authored by setup module 34 may instruct stage 38c to replace the body of the packet with the single 16x16 block of FFT data dispatched to the wireless interface. As before, the stage also edits the packet header to delete (or mark) the instructions to which it responded, so that a header instruction field for the next processing stage is the first to be encountered.

In other arrangements, the addresses of the remote computers are not hard-coded. For example, the packet may include a pointer to a database record or memory location (in the phone or in the cloud), which contains the destination address. Or, stage 38c may be directed to hand-off the processed pixel packet to the Query Router and Response Manager (e.g., FIG. 7). This module examines the pixel packet to determine what type of processing is next required, and it routes it to an appropriate provider (which may be in the cell phone if resources permit, or in the cloud—among the stable of static providers, or to a provider identified through an auction). The provider returns the requested output data (e.g., texture classification information, and information about any matching FFT in the archive), and processing continues per the next item of instruction in the pixel packet header.

The data flow continues through as many functions as a particular operation may require.

In the particular arrangement illustrated, each processing stage 38 strips-out, from the packet header, the instructions on which it acted. The instructions are ordered in the header in the sequence of processing stages, so this removal allows each stage to look to the first instructions remaining in the header for direction. Other arrangements, of course, can alternatively be employed. (For example, a module may insert …

For example, a processing module 38 may make a data flow selection based on some result of processing it performs. E.g., if an edge detection stage discerns a sharp contrast image, then an outgoing packet may be routed to an external service provider for FFT processing. That provider may return the resultant FFT data to other stages. However, if the image has poor edges (such as being out of focus), then the system may not want FFT- and following processing to be performed on the data. Thus, the processing stages can cause branches in the data flow, dependent on parameters of the processing (such as discerned image characteristics).

Instructions specifying such conditional branching can be included in the header of packet 57, or they can be provided otherwise. FIG. 19 shows one arrangement. Instructions 58d originally in packet 57 specify a condition, and specify a location in a memory 79 from which replacement instructions (58e'-58g') can be read, and substituted into the packet header, if the condition is met. If the condition is not met, execution proceeds in accordance with header instructions already in the packet.

In other arrangements, other variations can be employed. For example, all of the possible conditional instructions can be provided in the packet. In another arrangement, a packet architecture is still used, but one or more of the header fields do not include explicit instructions. Rather, they simply point to memory locations from which corresponding instructions (or data) are retrieved, e.g., by the corresponding processing stage 38.

Memory 79 (which can include a cloud component) can also facilitate adaptation of processing flow even if conditional branching is not employed. For example, a processing stage may yield output data that determines parameters of a filter or other algorithm to be applied by a later stage (e.g., a convolution kernel, a time delay, a pixel mask, etc.). Such parameters may be identified by the former processing stage in memory (e.g., determined/calculated, and stored), and recalled for use by the later stage. In FIG. 19, for example, processing stage 38 produces parameters that are stored in memory 79. A subsequent processing stage 38c later retrieves these parameters, and uses them in execution of its assigned operation. (The information in memory can be labeled to identify the module/provider from which they originated, or to which they are destined
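The stored-parameter auction handled by the bid filter can be sketched as follows. The provider names, prices, and table layout are illustrative assumptions; only the mechanism—standing bids with a time-limited discount, consulted at each transaction without polling the providers—comes from the text.

```python
# Hypothetical bid table: provider -> (base price in cents, optional
# discounted price, Unix time at which the discount expires).
BIDS = {
    "provider_a": (3, 2, 1244754176),
    "provider_b": (4, None, None),
}

def effective_price(provider, now):
    """Price a provider would charge at Unix time `now`."""
    base, discount, expires = BIDS[provider]
    if discount is not None and now < expires:
        return discount
    return base

def select_provider(now):
    # Lowest effective price wins; a real arbiter could weight further
    # criteria (turnaround, "green" standards, location, etc.).
    return min(BIDS, key=lambda p: effective_price(p, now))
```

When a provider posts a discount message, the bid filter would simply update its entry in the table; every later transaction is then arbitrated against the stored parameters.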
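One hypothetical way to combine the recognition clues listed for image 43 (segmentation-derived shape/color/texture, fingerprint-matched metadata, OCR'd text, etc.) is a weighted vote, with each available clue contributing toward the candidate identity it suggests. The clue names and weights below are assumptions for illustration, not values from the specification.

```python
# Assumed relative weights: express machine-readable carriers are decisive,
# appearance-based clues contribute less individually.
CLUE_WEIGHTS = {
    "barcode": 1.0,
    "watermark": 1.0,
    "fingerprint": 0.6,          # metadata harvested from similar reference images
    "shape_color_texture": 0.4,  # segmentation-derived appearance cues
    "ocr_text": 0.5,             # visible text read by OCR
}

def identify(clues):
    """clues: mapping of clue name -> candidate identity that clue suggests.
    Returns the highest-scoring candidate, or None if no clues are available."""
    scores = {}
    for clue, candidate in clues.items():
        scores[candidate] = scores.get(candidate, 0.0) + CLUE_WEIGHTS.get(clue, 0.1)
    return max(scores, key=scores.get) if scores else None
```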
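The group-constrained face matching idea—requiring 90/100 for an unconstrained match, but only 70-80 when the candidate set is limited to known companions—can be sketched as below. The exact relaxation applied per already-identified companion is an assumed weighting, not something the text specifies.

```python
def match_threshold(group_constrained, known_companions=0):
    """Minimum score (out of 100) to accept a face match."""
    if not group_constrained:
        return 90
    # 80 for a group-constrained search, relaxed toward 70 as more
    # already-identified companions appear in the candidate photos
    # (standing in for the text's increased weighting factor).
    return max(70, 80 - 5 * known_companions)

def is_match(score, group_constrained, known_companions=0):
    return score >= match_threshold(group_constrained, known_companions)
```

A score of 85 would thus fail an unconstrained search but succeed once the candidate universe is constrained to the subject's likely companions.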
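The pixel-packet flow—a header queue holding the camera instruction followed by one field per post-sensor stage, with each consumer removing its own field—can be sketched as below. The class and field names are illustrative stand-ins for elements 55, 57, 58, and 59, not an implementation from the patent.

```python
from collections import deque

class PixelPacket:
    def __init__(self, header_fields):
        self.header = deque(header_fields)  # camera field first, then one per stage
        self.body = None                    # image data, filled in by the camera

def camera(packet, sensor_data):
    instr = packet.header.popleft()         # consume the camera instruction field
    # Stand-in for exposure/windowing control: emit only the requested pixels.
    packet.body = sensor_data[: instr["pixels"]]
    return packet

def stage(packet, ops):
    instr = packet.header.popleft()         # consume this stage's header field
    op = instr.get("op")
    if op is not None:                      # an empty field means "pass through"
        result = ops[op](packet.body)
        if instr.get("substitute", True):   # substitute vs. append, per a header flag
            packet.body = result
        else:
            packet.body = packet.body + result
    return packet
```

Deleting a consumed field (here, popping it) is what lets each stage treat the first remaining header field as its own instruction, as the text describes.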
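The conditional-branch arrangement of FIG. 19—a header field naming a condition and a memory location whose replacement instructions are spliced into the header when the condition holds—can be sketched as follows. The instruction names and the choice to splice the replacements ahead of the remaining fields are assumptions for illustration.

```python
# Stand-in for memory 79; in the text this may include a cloud component.
MEMORY = {"alt_instrs": ["fft_16x16", "transmit_result"]}

def maybe_branch(header, condition_met, memory=MEMORY, key="alt_instrs"):
    """header[0] is the conditional field (cf. 58d); returns the header the
    remaining stages should execute."""
    rest = header[1:]                     # the conditional field is consumed
    if condition_met:
        # Splice in the replacement instructions read from memory (cf. 58e'-58g').
        return list(memory[key]) + rest
    # Condition not met: proceed with the instructions already in the packet.
    return rest
```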