USOO8768313B2

(12) United States Patent (10) Patent No.: US 8,768,313 B2 Rodriguez (45) Date of Patent: Jul. 1, 2014

(54) METHODS AND SYSTEMS FOR IMAGE OR (56) References Cited AUDIO RECOGNITION PROCESSING U.S. PATENT DOCUMENTS (75) Inventor: Tony F. Rodriguez, Portland, OR (US) 7,272.407 B2 * 9/2007 Strittmatter et al...... 455,500 7.925,265 B2 * 4/2011 Souissi ...... 455,445 (73) Assignee: Digimarc Corporation, Beaverton, OR 8,229, 160 B2 7/2012 Rosenblatt (US) 8,358,811 B2 * 1/2013 Adams et al...... 382,118 2002/0102966 A1 8, 2002 Lev et al. (*) Notice: Subject to any disclaimer, the term of this 2004/0212630 A1 10/2004 Hobgood et al. patent is extended or adjusted under 35 2006/00316842005/0091604 A1 4/20052/2006 SharmaDavis et al. U.S.C. 154(b) by 819 days. (Continued) (21) Appl. No.: 12/855,996 OTHER PUBLICATIONS (22) Filed: Aug. 13, 2010 Ahmed, et al. MACE—Adaptive Component Management Middleware for Ubiquitous Systems, Proc. 4th Int'l Workshop on (65) Prior Publication Data Middleware for Pervasive and Ad-Hoc Computing, 2006. US 2011 FO143811 A1 Jun. 16, 2011 (Continued) Primary Examiner — Blane JJackson Related U.S. Application Data (74) Attorney, Agent, or Firm — Digimarc Corporation (60) Provisional application No. 61/234,542, filed on Aug. 17, 2009. (57) ABSTRACT Many of the detailed technologies are useful in enabling a (51) Int. Cl. Smartphone to respond to a user's environment, e.g., so it can H04M 3/42 (2006.01) serve as an intuitive hearing and seeing device. A few of the G06K 9/00 (2006.01) detailed arrangements involve optimizing division of shared H04N I/00 (2006.01) processing tasks between the phone and remote devices; H04L 29/08 (2006.01) using a phone GPU for exhaustive speculative execution and G06K 9/22 (2006.01) machine vision purposes (including facial recognition); novel (52) U.S. Cl. device architectures involving abstraction layers that facili CPC ...... H04N I/00127 (2013.01); H04N I/00244 tate substitution of different local and remote services; inter (2013.01); H04N I/00331 (2013.01); H04N actions with private networks as they relate to audio/image 1/00328 (2013.01); H04L 67/16 (2013.01); processing; adapting the orders in which operations are G06K 9/228 (2013.01); G06K 9/00986 executed, and the types of data that are exchanged with (2013.01); H04N2201/0084 (2013.01); G06K remote servers, in accordance with current context; reconfig 9/00288 (2013.01); H04N I/00336 (2013.01) uring networks based on sensed social affiliations among USPC ...... 455/4.14.1; 455/415; 382/276; 382/118 users and in accordance with predictive models of user behav (58) Field of Classification Search ior, etc. A great variety of other features and arrangements are USPC ...... 455/4.1.2, 414.1, 66.1, 556.1, 557, 415; also detailed. 382/118, 173, 190, 276 See application file for complete search history. 16 Claims, 55 Drawing Sheets

USERSNAPS IMAGE

FACIAL oct FFT WAWELT RCCGNTION TRANSFORTRANSFORMITRANSFORM t EDGE EIGENWALUE SPECTRUM OCR ETECTON ANALYSIS ANALYSIS g es Atter FOURIER-MELLIN TEXTURE COLOR 2. XTRACTION TRANSFORM CLASSFR ANALYSS g GIST EMOTION BARCODE AGE se g PROCESSNS CLASSIFIER DECODER DETECTION s

WATERMARK ORIENTATION I SIMILARITY GENDER ECOER ETECTION PROCESSORETECTON 3 OBJECT METADATA GEOLOCATION | ETC 3. SEGMENTATON ANAYSIS PROCESSENG Cs ROUTER

SRWC SRWC SERVICE Row}ER PRWR2 PROWIRN

CELLPHONE USERNTERFACE US 8,768,313 B2 Page 2

(56) References Cited neering for Self-Adaptive Systems, Springer Berlin Heidelberg, Jun. 19, 2009, pp. 146-163. U.S. PATENT DOCUMENTS Gu, Adaptive offloading inference for delivering applications in per 2006,0047584 A1 3, 2006 Vaschillo et al. vasive computing environments, Proc IEEE Int’l Confon Pervasive 2006/0217199 A1 9, 2006 Adcox et al. Computing and Communication, 2003. 2007/010.0480 A1 5, 2007 Sinclair et al. Kirsch-Pinheiro, , et al. Context-Aware Service Selection Using 2007/O159522 A1 7, 2007 Neven Graph Matching, 2nd Non Functional Properties and Service Level 2009 OO31381 A1 1/2009 Cohen et al. Agreements in Service Oriented Computing Workshop (NFPSLA 2009/0237546 A1 9, 2009 Bloebaum et al. SOCO8), CEUR Workshop Proceedings, vol. 411, 2008. 2009,0252383 A1 10/2009 Adam et al. 2009,0279794 A1 11/2009 Brucher et al. Noble, etal, Agile Application-Aware Adaptation for Mobility, Proc. 2009/0285492 A1 1 1/2009 Ramanujapuram et al. of the ACM Symposium on Principles (SOSP), 2010/0260426 A1 10/2010 Huang et al. 1997. Osman, et al. The Design and Implementation of Zap: A System for OTHER PUBLICATIONS Migrating Computing Environments, Proc of the Fifth Symposium Balan, Tactics-Based Remote Execution for Mobile Computing, on Operating Systems Design and Implementation (OSDI), 2002. MobiSys 2003. Reichle, et al. A Comprehensive Context Modeling Framework for Chun, Augmented Smartphone applications through clone cloud Pervasive Computing Systems, Distributed Applications and execution, Proc. of the 8th Workshop on Hot Topics in Operating Interoperable Systems, Springer Berlin Heidelberg, 2008. Systems, May 2009. Zachariadis, et al. Adaptable Mobile Applications: Exploiting Logi Flinn et al. Balancing Performance, Energy, and Quality in Pervasive cal Mobility in Mobile Computing, in Mobile Agents for Telecom Computing, Proc. of the 22nd International Conference on Distrib munication Applications, Springer Berlin Heidelberg, 2003, pp. 170 uted Computing Systems (ICDCS), Jul. 2002. 179. Flinn, et al. Self-Tuned Remote Execution for Pervasive Computing, Zachariadis, et al. Building Adaptable Mobile Middleware Services Proc. of the 8th Workshop on Hot Topics in Operating Systems Using Logical Mobility Techniques, in Contributions to Ubiquitous (HotOS), May 2001. Computing, Springer Berlin Heidelberg, 2007, pp. 3-26. Geihs, et al. Modeling of Context-Aware Self-Adaptive Applications in Ubiquitous and Service-Oriented Environments, Engi * cited by examiner U.S. Patent US 8,768,313 B2

U.S. Patent Jul. 1, 2014 Sheet 2 of 55 US 8,768,313 B2

Z"SOIH

LOETEO

U.S. Patent US 8,768,313 B2

XIÈHOWALEN

SSETTE?-HIAA

U.S. Patent Jul. 1, 2014 Sheet 4 of 55 US 8,768,313 B2

KEYVECTOR DATA (PIXELS & DERIVATIVES)

FIG. 4

RESULTA RESULT B RESULT C

KEYVECTOR FG. 4A

ADDRESSING, ETC

/ PXEL PACKET U.S. Patent Jul. 1, 2014 Sheet 5 of 55 US 8,768,313 B2

F.G. 4B

TASKA TASK B TASK C TASK D

WSUA TASK COMMONALY PIE CHARTS I FFT RE-SAMPLNG

PDF417 BARCOOE READING F.G. 5 U.S. Patent Jul. 1, 2014 Sheet 6 of 55 US 8,768,313 B2

RESDEN CAL-UP VSUAL PROCESSING SERVICES

ON ON OFF OFF ON OFF

COMMON SERVICES SORTER

OW

LEVE PXEL PRO- 8. CESSENG LIBRARY

FOW GATE CONFIGURATION, SOFTWARE PROGRAMMING EXERNAL

PXEL KEYVECTOR DATA SENSOR milier HARDWARE (PROCD PIXELS & ROUTING OTHER)

F.G. 6 NERNA U.S. Patent US 8,768,313 B2

U.S. Patent US 8,768,313 B2

U.S. Patent Jul. 1, 2014 Sheet 9 Of 55 US 8,768,313 B2

SOAC U.S. Patent Jul. 1, 2014 Sheet 10 Of 55 US 8,768,313 B2

>JOSSEOORHd ?JOSSEOORHd

ENOHdTTHO 9NISSEOO}}d U.S. Patent US 8,768,313 B2

U.S. Patent Jul. 1, 2014 Sheet 12 Of 55 US 8,768,313 B2

MY PROFILE, INCLUDING WHOI AM, AND WHAT DO GENERALLY LIKE/WANT

MY CONTEXT, INCLUDING WHERE AM, AND WHAT ARE MY CURRENT DESRES

MY WISUAL OUERYPXELS THEMSELVES

PACKETS RESULTS

VISUAL QUERY PACKET

PACKETS RESULTS PACKETS RESULTS

CONFIGURABLE ARRAY OF OFFEREDATAUCTION TO AN SERVICE PROVIDERS ARRAY OF SERVICE PROVIDERS

FIG. 11 U.S. Patent US 8,768,313 B2

EOLAECI>JESÍTI

U.S. Patent Jul. 1, 2014 Sheet 14 of 55 US 8,768,313 B2

12 SENSOR 14

PROCESSING FOR OBJECT DENTIFICATION

PROCESSING FOR HUMAN WISUAL ADDRESSING SYSTEM (e.g., JPEG compression) 15

FIG. 13

16

56

CAMERA STAGE 1 STAGE 2 STAGE Y INSTRUCTIONS INSTRUCTIONS INSTRUCTIONS ' ' ' INSTRUCTIONS 55 58a 58b.

STAGE Z NSTRUCTIONS DATA

56 59

FIG. 17 U.S. Patent Jul. 1, 2014 Sheet 15 Of 55 US 8,768,313 B2

CEL PHONE

FIG. 14

APPLICATION NCELL PROCESSING C) PHONE EXTERNAL FROM CELL PHONE APPLICATION l PHONENCELL

APPLICATION ------a EXTERNAL FROM 10 CELL PHONE

APPLICATION EXTERNAL FROM CELL PHONE U.S. Patent Jul. 1, 2014 Sheet 16 of 55 US 8,768,313 B2

0

U.S. Patent Jul. 1, 2014 Sheet 17 Of 55 US 8,768,313 B2

L CD a. O

-NO(HONAS NO||W/Z1 U.S. Patent Jul. 1, 2014 Sheet 18 Of 55 US 8,768,313 B2

8/

01 ------;;;;;;;.

U.S. Patent Jul. 1, 2014 Sheet 20 Of 55 US 8,768,313 B2

NOISIATWO ?JEHLO SEKOLARJES

CIEX1 ?JEAN-JES -----Z___------Z___---- TOOOLO>jc)

U.S. Patent Jul. 1, 2014 Sheet 21 Of 55 US 8,768,313 B2

RESPONSE TIME ROUTING NEEDED CONSTRANTS OPERATION DETALS USER HARDWARE • High level PREFERENCES o What special purpose operations hardware is local?

O LOW level o What is Current operations hardware utilization? o Common operations

o CPU pipeline length o Operations that are preconditions to o Pipeline stall risks others CONNECTWTY o Power consumption STATUS

GEOGRAPHICAL CONSIDERATIONS RULES, HEURISTICS NFORE REMOTE PROVIDER(S), E.G., Readiness Speed Cost Attributes of importance to user

OPERATIONS FOR OPERATIONS FOR LOCAL PROCESSING REMOTE PROCESSENG

FIG. 19B U.S. Patent Jul. 1, 2014 Sheet 22 Of 55 US 8,768,313 B2

MICRO MIRROR PROJECTOR U.S. Patent Jul. 1, 2014 US 8,768,313 B2

U.S. Patent Jul. 1, 2014 Sheet 24 Of 55 US 8,768,313 B2

U.S. Patent Jul. 1, 2014 Sheet 25 Of 55 US 8,768,313 B2

CAPTURE OR RECEIVE IMAGE MAGE AND/OR METADATA

------PRE-PROCESS IMAGE, SUBMIT TO GOOGLE, RECEIVE e.g., IMAGE ENHANCEMENT, AUXLIARY INFORMATION : OBJECT SEGMENTATION/ : EXTRACTION, etc.

pre- repre- vers v are per a very w w w w w w w w SUBMIT TO SERVICE FEATURE EXTRACTION FIG. 24

CAPTURE OR RECEIVE MAGE

FEATURE EXTRACTION

------SEARCH FOR MAGES WITH AUGMENTIENHANCE SMAR METRICS METADATA

USE SIMLARMAGE DATAN SUBMIT TO SERVICE EU OF USER MAGE DATA FIG. 25 FIG. 23 U.S. Patent Jul. 1, 2014 Sheet 26 of 55 US 8,768,313 B2

CAPTURE OR RECEIVE IMAGE CAPTURE OR RECEIVE IMAGE

FEATURE EXTRACTION FEATURE EXTRACTION

SEARCH FOR MAGES WITH SEARCH FOR IMAGES WITH SMILAR METRICS SMILAR METRICS

SUBMIT PLURAL SETS OF COMPOSITE OR MERGE IMAGE DATA TO SERVICE SEVERAL SETS OF IMAGE DATA PROVIDER

SUBMIT TO SERVICE PROVIDER SERVICE PROVIDER, OR USER DEVICE, DETERMINES ENHANCED RESPONSE BASED ON PLURAL SETS OF DATA FIG. 27

FIG. 26 CAPTURE OR RECEIVE USER MAGE

FEATURE EXTRACTION

SEARCH FOR IMAGES WITH SMILAR METRICS

USE INFO FROM SIMILAR IMAGE(S) TO ENHANCE USER IMAGE

FIG. 28 U.S. Patent Jul. 1, 2014 Sheet 27 of 55 US 8,768,313 B2

CAPTURE OR RECEIVE IMAGE CAPTURE OR RECEIVE IMAGE OF SUBJECT WITH

GEOLOCATION INFO FEATURE EXTRACTION

INPUT GEOLOCATION INFO TO SEARCH FOR MAGES WITH DB; GET TEXTUAL SMAR METRICS DESCRIPTORS

HARVEST METADATA SEARCH MAGE DATABASE USING TEXTUAL DESCRIPTORS:

IDENTIFY OTHER IMAGE(S) OF SUBJECT SEARCH FOR MAGES WITH

SMAR METADATA SUBMT TO SERVICE SUBMIT TO SERVICE FIG. 30 FIG. 28A

CAPTURE OR RECEIVE IMAGE OF SUBJECT WITH GEOLOCATION INFO

SEARCH IMAGE DATABASE USING GEOLOCATION DATA; IDENTIFY OTHER IMAGE(S) OF SUBJECT

SUBMIT OTHER IMAGE(S) TO SERVICE

FIG. 31 U.S. Patent Jul. 1, 2014 Sheet 28 Of 55 US 8,768,313 B2

U.S. Patent Jul. 1, 2014 Sheet 29 of 55 US 8,768,313 B2

CAPTURE OR RECEIVE IMAGE CAPTURE OR RECEIVE IMAGE OF SUBJECT WITH

OF SUBJECT WITH GEOLOCATION INFO GEOLOCATION INFO

SEARCH FOR MAGES WITH SEARCH FOR MAGES WITH SMILAR IMAGE METRICS SMAR MAGE METRICS

ADD GEOLOCATION INFO TO HARVEST METADATA MATCHING MAGES

FIG. 32 SEARCH FOR MAGES WITH SMLAR METADATA

ADD GEOLOCATION INFO TO CAPTURE OR RECEIVE IMAGE MATCHING MAGES OF SUBJECT WITH GEOLOCATION INFO FIG. 33 SEARCH FOR IMAGES WITH MATCHING GEOLOCATION

HARVEST METADATA

SEARCH FOR MAGES WITH SMAR METADATA

ADD GEOLOCATION INFO TO MATCHING MAGES

FIG. 34 U.S. Patent Jul. 1, 2014 Sheet 30 Of 55 US 8,768,313 B2

USER SNAPS IMAGE

CONSUMER URBAN/RURAL FAMILY/FRIEND/ PRODUCTIOTHER ANALYSS STRANGER ANALYSS ANALYSS

RESPONSE RESPONSE RESPONSE ENGINE if ENGINE #2 ENGINE #3

FIG. 49 CONSOLIDATOR

CELL PHONE USER INTERFACE

ROUTER

RESPONSE RESPONSE ENGINEA ENGINEB

FIG. 37A FIG. 37B U.S. Patent Jul. 1, 2014 Sheet 31 Of 55 US 8,768,313 B2

USER-IMAGE -o- "IMAGE JUCER (algorithmic, crowd-sourced, etc.)

USER MAGE-RELATED FACTORS

(First Tier) e.g., data about one or more of:

Color histogram, Pattern, Shape, Texture, Facial recognition, Eigenvalue,

Object recognition, Orientation, Text, Numbers, OCR, Barcode, Watermark, Emotion, Location, Transform data, etc

DATABASE OF PUBLIC IMAGES, SEARCHENGINE --> IMAGE-RELATED FACTORS, AND OTHER METADATA

OTHER MAGE-RELATED OTHER FACTORS AND/OR MSESin N' OTHER METADATA (Second Tier)

------r : DATABASE OF PUBLIC IMAGES, SEARCHENGINE ---> IMAGE-RELATED FACTORS, AND ------OTHER METADATA ------

FINAL FINAL OTHER IMAGES/CONTENT NFORMATION (Nth Tier) (Nth Tier)

FIG. 39 U.S. Patent Jul. 1, 2014 Sheet 32 Of 55 US 8,768,313 B2

USER-IMAGE mos (algorithmic,"IMAGE crowd-sourced, JUICER etc.)

USER IMAGE-RELATED FACTORS (First Tier)

DATABASE OF PUBLIC IMAGES, SEARCHENGINE --> IMAGE-RELATED FACTORS, AND OTHER METADATA

OTHER MAGE-RELATED OTHER FACTORS AND/OR MSEN,N' OTHER METADATA (Second Tier)

GOOGLETEXT, SEARCH ENGINE -o-o- WEB DATABASE

------DATABASE OF PUBLIC IMAGES, SEARCHENGINE ---> IMAGE-RELATED FACTORS, AND ki w w w w w w w w w OTHER MEADATA ------FINA FINAL OTHER MAGES.CONTENT INFORMATION (Nth Tier) (Nth Tier)

FIG. 40 U.S. Patent Jul. 1, 2014 Sheet 33 Of 55 US 8,768,313 B2

"MAGE JUCER” (algorithmic, crowd-sourced, etc.)

g ...Image eigenvalues; image classifier concludes "drill".

DATABASE OF PUBLIC IMAGES, SEARCH ENGINE MAGE-RELATED FACTORS, AND OHER MEADAA

Similar- ext metadata for }ooking similar-looking pix

pi

PROCESSOR Ranked metadata, incl. "Dril," "Black & Decker" and "DR250B'

DATABASE OF SEARCH PUBLIC MAGES, IMAGE-REATED ENGINE ONE OR MORE FACTORS, AND MAGE-BASED OTHER MEADATA ROUTING

SERVICES (may BROWSER Pix with similar metadata employ user, preference data),

Cached E.G., SNAPNOW web pageS FG. 41

USER INTERFACE -->

Options presented may include "Similar looks," "Similar descriptors," "Web results," "Buy," "Sell," "Manual," "More," "Snapnow," etc. U.S. Patent Jul. 1, 2014 Sheet 34 of 55 US 8,768,313 B2

"MAGE UCER" (algorithmic, Crowd-sourced, etc.)

Geolocation Coordinates, Data restraight edgelarced edge density...

DATABASE OF PUBLIC MAGES, SEARCHENGINE IMAGE-RELATED FACTORS, AND OTHER METADAA Pix that are similarly located, with Text metadata for pix that are similarly-located, similar edge? with similar edgelarc density arc density PROCESSOR

Ranked metadata, incl. "Eiffe Tower," and "Paris” FG. 42

DATABASE OF SEARCH PUBLIC IMAGES, MAGE-REATED

ENGINE ONE OR MORE FACTORS, AND MAGE-BASED OHER METAOATA ROUTING SERVICES (may Pix With Similar metadata employ user, preference data),

E.G., SNAPNOW Cached Web U options presented may include pages "Similar looks," "Similar descriptors," "Similar location," "Web results," "Snapnow." Map, Directions, USER NERFACE History, Search nearby, etc. U.S. Patent Jul. 1, 2014 Sheet 35 of 55 US 8,768,313 B2

"IMAGE JUCER" (algorithmic, crowd-sourced, etc.)

...Geolocation Coordinates, Eigenvectors.

DATABASE OF \ MS DATABASE OF PUBLIC IMAGES, MAGES, MAGE-RELAED --> SEARCH ENGINE --> MAGE-REATED FACTORS, AND FACTORS, AND OTHER METADATA OHER MEADAA

Neighborhood Address, zip code, school district, scenes, text square footage, price, bedrooms/ metadata from baths, acreage, time on market, similarly-located datafimagery re similarly-located images houses, similarly-priced houses, similarly-featured houses, etc

GOOGLE, GOOGLE EARTH

USER INTERFACE

FG. 43 U.S. Patent Jul. 1, 2014 Sheet 36 of 55 US 8,768,313 B2

as- a Fissessis a

3 -ississs & &w SS s S. SSs S. e a i : is

s FG. 44 t

DOOOOOOOOO11 k U.S. Patent Jul. 1, 2014 Sheet 37 of 55 US 8,768,313 B2

FG. 45B U.S. Patent Jul. 1, 2014 Sheet 38 of 55 US 8,768,313 B2

Rockefeller Center (21) New York (18) NYC (10) GE GEOCOORDNATES FROM MAGE Manhattan (8) Empire State Building (4) Midtown (4) SEARCH FLICKR, ETC, FOR SIMLARLY Top of the rock (4) LOCATED IMAGES (AND/OR SIMILAR IN Nueva York (3) OTHER WAYS): "SET 1" IMAGES USA (3) 30 Rock (2) August (2) HARVEST, CLEAN-UP AND CLUSTER Atlas (2) METADAA FROM SET 1 MAGES Christmas (2) CityScape (2)

Fifth Avenue (2) FROM METADATA, DETERMINE Me (2) LKELIHOOD THAT EACH MAGE IS Prometheus (2) PLACE-CENTRIC (CLASS 1) Skating rink (2) Skyline (2)

Skyscraper (2) FROMMETADATA AND FACE-FINDING, Clouds (1) DEERMINE KELIHOOD THAT EACH General Motors Building (1) IMAGE IS PERSON-CENTRIC (CLASS 2) Gertrude (1) Jerry (1)

Lights (1) Y FROM METADATA, AND ABOVE Statue (1) A SCORES, DETERMINE LIKELIHOOD HAT EACH MAGE IS THING-CENTRIC (CLASS 3)

DEERMNE WHICH MAGE METRICS REABLY GROUP LIKE-CASSED IMAGES TOGETHER, AND DISTINGUISH DFFERENTLY-CLASSED IMAGES

FG. 46A U.S. Patent Jul. 1, 2014 Sheet 39 of 55 US 8,768,313 B2

APPLY TESTS TO INPUT IMAGE: DETERMINE FROM METRICS ITS SCORE IN DIFFERENT CLASSES

PERFORMSMLARITY TESTING BETWEEN INPUT IMAGE AND EACH MAGE IN SET 1 B 1. WEIGHT METADATA FROM EACH 19 Rockefeller Center IMAGE IN ACCORDANCE WITH 12 Prometheus SIMILARITY SCORE: COMBINE 5 Skating rink

SEARCH IMAGE DATABASE FOR New York (17) ADDL SIMLARLY-LOCATED Rockefeller Center (12) IMAGES, THIS TIME ALSO USING Manhattan (6) NFERRED POSSIBLE METADATA Prometheus (5) C AS SEARCH CRITERA. "SET 2" Fountain (4) 11 Christmas (3) Paul Manship (3) Sculpture (3) PERFORMSMLARITY TESTING Skating Rink (3) BETWEEN INPUT IMAGE AND Jerry (1) EACH MAGE IN SET 2 National register of historical places (2) Nikon (2) RCA Building (2) WEIGHT METADATA FROM EACH Greek mythology (1) MAGE IN ACCORDANCE WITH SIMILARITY SCORE: COMBINE D 5 Prometheus 1. 4 Fountain FURTHERWEIGHT METADATA IN 3 Paul Manship ACCORDANCE WITH 3 Rockefeller Center UNUSUALNESS 3 Sculpture 2 National register of historical places 2 RCA Building 1 Greek mythology FIG. 46B U.S. Patent Jul. 1, 2014 Sheet 40 of 55 US 8,768,313 B2

USER SNAPS IMAGE

GEOINFO PROFE DATA DATABASE

PLACE-TERM GOOGLE, ETC GLOSSARY

MAGE PICASA ANALYSS, FLICKR DATA PROCESSING

WORD PERSON-NAME FREQUENCY GLOSSARY DATABASE FACEBOOK DATA

GOOGLE

FIG. 47

RESPONSE ENGINE

CELL PHONE USER INTERFACE U.S. Patent Jul. 1, 2014 Sheet 41 of 55 US 8,768,313 B2

USER SNAPS IMAGE

PERSON/PLACE/ THING ANALYSS

FAMILY/FRIEND/ CHILD/ADULT STRANGER ANALYSS ANALYSS

RESPONSE ENGINE

CELL PHONE USER INTERFACE FIG. 48A

USER SNAPS IMAGE

FAMILY/FRIEND/ PERSON/PLACE/ CHILD/ADULT STRANGER ANALYSS THING ANALYSS ANALYSIS

RESPONSE ENGINE

CELL PHONE USER INTERFACE FIG. 48B U.S. Patent Jul. 1, 2014 Sheet 42 of 55 US 8,768,313 B2

USER SNAPS IMAGE

FACA DCT FFT WAVELET RECOGNITION TRANSFORM TRANSFORM TRANSFORM O D - EDGE EGENVALUE SPECTRUM OCR D DETECTION ANALYSS ANALYSS U D CO T PATTERN FOURIER-MELLIN TEXTURE COLOR (? EXTRACTION TRANSFORM CLASSIFIER ANALYSS T X r GIST EMOTION BARCODE AGE T 2 PROCESSING CLASSIFIER DECODER DETECTION D - WATERMARK ORIENTATION SMILARITY GENDER T Of DECODER DETECTION PROCESSOR DETECTION O

OBJECT MEADATA GEOLOCATION | ETC c T SEGMENTATION ANALYSIS PROCESSING Of

ROUTER

SERVICE SERVICE SERVICE PROVIDER 1 PROVIDER 2 PROVIDERN

CELL PHONE USER INTERFACE

FIG. 50 U.S. Patent Jul. 1, 2014 Sheet 43 of 55 US 8,768,313 B2

INPUT IMAGE, OR NFORMATION ABOUT THE MAGE

s 3 ASO O st (3A rio3 O S.S g 2a gyg 2.U O2. Od5 C 2.st CaA 6 A. s & Oy & C Q T 2.G)

8 2 O - U re ; 332 8 33 g : 27 3: (a) Post annotated photo to user's Facebook page 3. (b) Start a text message to depicted person

V (a) Show estimations of who this may be (b) Send "Friend" invite on MySpace w (c) Who are the degrees of separation between us (a) Buy online (b) Show nutrition information (c) dentify local stores that sell

FIG. 51 U.S. Patent Jul. 1, 2014 Sheet 44 of 55 US 8,768,313 B2

COMPUTERA

CONTENT SETA ROUTER

ROUTERA COMPUTER B

RESPONSE RESPONSE CONTENT SETB ENGINE 1A ENGINE 2A

ROUTER B

COMPUTER C RESPONSE ENGINEB CONTENT SET C

RESPONSE COMPUTERD ENGINE C

CONTENT SETD

COMPUTERE ROUTERD

CONTENT SET E 1 50

RESPONSE ENGINE

FIG. 52 U.S. Patent Jul. 1, 2014 Sheet 45 of 55 US 8,768,313 B2

Sge setts 2

FEATURE VECTORS

WWW.SMTH.HOME. COM/8ABYCAM.HTM WWW.TRAFF.C.COM 52828A2796C 7B TRAFFIC MAP fSEAE

F.G. 55 SIGN GEOSSARY FIG. 54 U.S. Patent Jul. 1, 2014 Sheet 46 of 55 US 8,768,313 B2

MAGE SENSORS

AMPLFY ANALOG SIGNALS

CONVERT TO DIGITAL NATIVE MAGE DATA

BAYER INTERPOLATION

WHTE BALANCE CORRECTION

GAMMA CORRECTION

EDGE ENHANCEMENT

JPEG COMPRESSION

STORE IN BUFFER MEMORY

DISPLAY CAPTURED IMAGE ON SCREEN

ONUSER COMMAND, STORE IN CARD AND/OR TRANSMIT

FIG. 56 (Prior Art) U.S. Patent Jul. 1, 2014 Sheet 47 of 55 US 8,768,313 B2

IMAGE SENSORS

AMPFY ANALOG F.G. 57 SIGNALS

CONVERT TO DIGITAL NATIVE IMAGE DATA

or or " BAYER FOURIER ETC EGENFACE ETC INTERPOLATION TRANSFORM CALCULATION - - Etc (as in Fig. 56)

MATCH MATCH FACE TEMPLATE BARCODE Yes Yes TEMPLATE

SEND DIGITA NATIVE MELLIN MAGE DATA AND/OR TRANSFORM FOURIER DOMAN DATA TO BARCODE SERVICE

SEND DIGITAL NATIVE MAGE DATA AND/OR F-M MATCH TEXT DOMAN DATA TO OCR TEMPLATEP SERVICE Yes SEND DIGITAL NATIVE MAGE DATA AND/OR F-M ORIENTATION DOMAN DATA TO WM Yes TEMPLATET SERVICE SEND DIGITAL NATIVE MATCH FACE TEMPLATE MAGE DATA AND/OR F-M Yes DOMAN DATA AND/OR EGEN FACE DATA TO FACA RECOGNITION SERVICE U.S. Patent Jul. 1, 2014 Sheet 48 of 55 US 8,768,313 B2

FROM APPARENT is

HORIZON:

ROATON

SNCE 17 degrees left STAR

SCALE CHANGE SNCE 30% closer STAR

IMAGE CAPTURE! PROCESSING

ANALYSS, MEMORY

OUTPUT FIG. 59 DEPENDENT ON IMAGE DATA U.S. Patent Jul. 1, 2014 Sheet 49 of 55 US 8,768,313 B2

U.S. Patent Jul. 1, 2014 Sheet 50 of 55 US 8,768,313 B2

FOURIER TRANSFORM DATA b BARCODE NFORMATION FOURIER-MELLIN TRANSFORM DATA b

OCR INFORMATION wo

WATERMARK NFORMATION

DCT DATA e

MOTON INFORMATION ...-e- - METADATA EGENFACE DATA b

FACE INFORMATION -e-

SOBEL FILTER DATA e

WAVELET b TRANSFORM DATA

COLOR HISTOGRAM b DATA FIG. 61 U.S. Patent Jul. 1, 2014 Sheet 51 of 55 US 8,768,313 B2

FG 67

...r.t. --

*

F.G. 68 U.S. Patent Jul. 1, 2014 Sheet 52 of 55 US 8,768,313 B2

F.G. 69

U.S. Patent Jul. 1, 2014 Sheet 53 of 55 US 8,768,313 B2

Spatial frequency Cycles/Degree)

, 1,000

O.O. Sensitivity s 1: Threshold Wisible Contrast)

O. Ot FG. 72 Spatial Frequency (Cyclesi MMOn Retina)

F.G. 73 U.S. Patent Jul. 1, 2014 Sheet 54 Of 55 US 8,768,313 B2

THE WALISTRET OURNA

&X: : X----- H : 83 x H x:&

3.x:x

Captured image Frane

Offset Of DeCoded Watermark

FG. 76 U.S. Patent Jul. 1, 2014 Sheet 55 of 55 US 8,768,313 B2

START START

ROUGH OUTLINES LARGE DETAS INTERMEDIATE DETAS FACA FNE DETALS EGENVALUES

R, G, B Data Alpha Channel Data

FIG. 77A FIG. 77B US 8,768,313 B2 1. 2 METHODS AND SYSTEMIS FOR MAGE OR No. 6,389,055 (Alcatel-Lucent), U.S. Pat. No. 6,121,530 AUDIO RECOGNITION PROCESSING (Sonoda), and U.S. Pat. No. 6,002,946 (Reber/Motorola). The presently-detailed technology concerns improve RELATED APPLICATION DATA ments to Such technologies—moving towards the goal of intuitive computing: devices that can see and/or hear, and This application is a non-provisional of, and claims priority infer the user's desire in that sensed context. to, provisional application 61/234,542, filed Aug. 17, 2009. The present technology builds on, and extends, technology SELECTED FEATURES disclosed in other patent applications by the present assignee. The reader is thus directed to the following applications that 10 As will be apparent, the present specification details a serve to detail arrangements in which applicants intend the wealth of novel technologies. To give the reader an introduc present technology to be applied, and that technically supple tory overview, a few such arrangements are reviewed in the ment the present disclosure: following paragraphs: application Ser. No. 12/271,772, filed Nov. 14, 2008 (pub 15 (A) A portable device that receives input from one or more lished as 201001 19208); physical sensors, employs processing by one or more local application Ser. No. 61/150,235, filed Feb. 5, 2009: services, and also employs processing by one or more remote application Ser. No. 61/157,153, filed Mar. 3, 2009: services, wherein software in the device includes one or more application Ser. No. 61/167.828, filed Apr. 8, 2009: abstraction layers through which said sensors, local services, application Ser. No. 12/468,402, filed May 19, 2009 now and remote services are interfaced to the device architecture, U.S. Pat. No. 8,004,576): facilitating Substitution. application Ser. No. 12/484,115, filed Jun. 12, 2009 now (B) A portable device that receives input from one or more U.S. Pat. No. 8,385,971); and physical sensors, processes the input and packages the result application Ser. No. 12/498,709, filed Jul. 7, 2009 (now into key vector form, and transmits the key vector form from published as 20100261465) 25 the device. Also, Such an arrangement in which the device The disclosures of all the above-identified documents are receives a further-processed counterpart to the key vector incorporated herein by reference. back from a remote resource to which the key vector was transmitted. Also, Such an arrangement in which the key vec INTRODUCTION tor form is processed—on the portable device or a remote 30 device in accordance with one or more instructions that are The present specification details a diversity of technolo implied in accord with context. gies, assembled over an extended period of time, to serve a (C) A distributed processing architecture for responding to variety of different objectives. Yet they relate together in physical stimulus sensed by a cellphone (aka 'smartphone'), various ways, and can be used in conjunction, and so are the architecture employing a local process on the cellphone, presented collectively in this single document. 35 and a remote process on a remote computer, the two processes This varied, interrelated subject matter does not lend itself to a straightforward presentation. Thus, the reader's indul being linked by a packet network and an inter-process com gence is solicited as this narrative occasionally proceeds in munication construct, the architecture also including a pro nonlinear fashion among the assorted topics and technolo tocol by which different processes may communicate, this 40 protocol including a message passing paradigm with either a g1eS. message queue, or a collision handling arrangement. Also, BACKGROUND Such an arrangement in which driver Software for one or more physical sensor components provides sensor data in packet Digimarc's U.S. Pat. No. 6,947,571 shows a system in form and places the packet on an output queue, either which a cell phone camera captures content (e.g., image 45 uniquely associated with that sensor or common to plural data), and processes same to derive information related to the components; wherein local processes operate on the packets imagery. This derived information is Submitted to a data and place resultant packets back on the queue, unless the structure (e.g., a remote database), which indicates corre packet is to be processed remotely, in which case it is directed sponding data or actions. The cell phone then displays to a remote process by a router arrangement. responsive information, or takes responsive action. Such 50 (D) An arrangement in which a network associated with a sequence of operations is sometimes referred to as “visual particular physical venue is adapted to automatically discern search.” whether a set of visitors to the venue have a social connection, Related technologies are shown in patent publications by reference to traffic on the network. Also, such an arrange 20080300011 (Digimarc), U.S. Pat. No. 7,283,983 and ment that also includes discerning a demographic character WO07/130,688 (Evolution Robotics), 20070175998 and 55 istic of the group. Also, such an arrangement in which the 20020102966 (DSPV), 20060012677, 20060240862 and network facilitates adhoc networking among visitors who are 20050185060 (Google), 20060056707 and 20050227674 discerned to have a social connection. (Nokia), 20060026140 (ExBiblio), U.S. Pat. No. 6,491,217, (E) Anarrangement whereina network including computer 20020152388, 200201784.10 and 20050144455 (Philips), resources at a public venue is dynamically reconfigured in 20020072982 and 20040199387 (Shazam), 20030083098 60 accordance with a predictive model of behavior of users vis (Canon), 20010055391 (Qualcomm), 20010001854 (Air iting said venue. Also, such an arrangement in which the Clic), U.S. Pat. No. 7.251,475 (Sony), 7,174.293 (Iceberg), network reconfiguration is based, in part, on context. Also, U.S. Pat. No. 7,065,559 (Organnon Wireless), U.S. Pat. No. Such an arrangement wherein the network reconfiguration 7,016,532 (Evryx Technologies), U.S. Pat. Nos. 6,993,573 includes caching certain content. Also, such an arrangement and 6,199,048 (Neomedia), U.S. Pat. No. 6,941,275 (Tune 65 in which the reconfiguration includes rendering synthesized Hunter), U.S. Pat. No. 6,788.293 (Silverbrook Research), content and storing in one or more of the computer resources U.S. Pat. Nos. 6,766,363 and 6,675,165 (BarEoint), U.S. Pat. to make same more rapidly available. Also, such an arrange US 8,768,313 B2 3 4 ment that includes throttling back time-insensitive network and other processing tasks can be performed on a processor— traffic in anticipation of a temporal increase in traffic from the or plural processors—remote from the device, and in which USCS. packets are employed to convey data between processing (F) An arrangement in which advertising is associated with tasks, and the contents of the packets are determined in auto real world content, and a chargetherefore is assessed based on mated fashion based on consideration of two or more differ Surveys of exposure to said content—as indicated by sensors ent factors drawn from a set that includes at least: mobile in users cellphones. Also, such an arrangement in which the device power considerations; response time needed; routing charged is set through use of an automated auction arrange constraints; state of hardware resources within the mobile ment. device; connectivity status; geographical considerations; risk (G) An arrangement including two subjects in a public 10 venue, wherein illumination on said Subjects is changed dif of pipeline stall; information about the remote processor ferently—based on an attribute of a person proximate to the including its readiness, processing speed, cost, and attributes Subjects. of importance to a user of the mobile device; and the relation (H) An arrangement in which content is presented to per of the task to other processing tasks; wherein in Some circum Sons in a public venue, and there is a link between the pre 15 stances the packets may include data of a first form, and in sented content and auxiliary content, wherein the linked aux other circumstances the packets may include data of a second iliary content is changed in accordance with a demographic form. Also, such an arrangement in which the decision is attribute of a person to whom the content is presented. based on a score dependent on a combination of parameters (I) An arrangement wherein a temporary electronic license related to at least some of the listed considerations. to certain content is granted to a person in connection with the (M) An arrangement wherein a venue provides data ser person’s visit to a public venue. vices to users through a network, and the network is arranged (J) An arrangement for recognition processing of stimuli to deter use of electronic imaging by users while in the venue. captured by a sensor of a user's mobile device, in which some Also, Such an arrangement in which the deterrence is effected processing tasks can be performed on processing hardware in by restricting transmission of data from user devices to cer the device, and other processing tasks can be performed on a 25 tain data processing providers external to the network. processor—or plural—remote from the device, and in which (N) An arrangement in which a mobile communications a decision regarding whether a first task should be performed device with image capture capability includes a pipelined on the device hardware or on a remote processor is made in processing chain for performing a first operation, and a con automated fashion based on consideration of at least one trol system that has a mode in which it tests image data by factor drawn from both of the following groups: (1) band 30 performing a second operation thereon, the second operation width costs, external service provider costs, power costs to being computationally simpler than the first operation, and the cell phone battery, intangible costs in consumer (dis-) the control system applies image data to the pipelined pro satisfaction by delaying processing, available processing cessing chain only if the second operation produces an output capacity of the remote processor(s), distance to the remote of a first type. processor(s); and (2) routing constraints, geographical con 35 (O) An arrangement in which a cellphone is equipped with siderations other than distance to the remote processor(s), risk a GPU to facilitate rendering of graphics for display on a cell of pipeline stall, and the relation of the first task to other phone screen, e.g., for gaming, and the GPU is also employed processing tasks; wherein in some circumstances the first task for machine vision purposes. Also, such an arrangement in is performed on the device hardware, and in other circum which the machine vision purpose includes facial detection. stances the first task is performed on the remote processor. 40 (P) An arrangement in which plural socially-affiliated Also, such an arrangement in which the decision is based on mobile devices, maintained by different individuals, cooper a score dependent on a combination of parameters related to ate in performing a machine vision operation. Also, such an at least some of the listed considerations. arrangement wherein a first of the devices performs an opera (K) An arrangement for processing stimuli captured by a tion to extract facial features from an image, and a second of sensor of a user's mobile device, in which some processing 45 the devices performs template matching on the extracted tasks can be performed on processing hardware in the device, facial features produced by the first device. and other processing tasks can be performed on a processor— (Q) An arrangement in which a voice recognition operation or plural processors—remote from the device, and in which a is performed on audio from an incoming video or phone call sequence in which a set of tasks should be performed is made to identify a caller. Also, Such an arrangement in which the in automated fashion based on consideration of two or more 50 Voice recognition operation is performed only if the incoming different factors drawn from a set that includes at least: call is not identified by CallerID data. Also, such an arrange mobile device power considerations; response time needed; ment in which the Voice recognition operation includes ref routing constraints; state of hardware resources within the erence to data corresponding to one or more earlier-stored mobile device; connectivity status; geographical consider Voice messages. ations; risk of pipeline stall; information about the remote 55 (R) An arrangement in which speech from an incoming processor including its readiness, processing speed, cost, and Video orphone call is recognized, and text data corresponding attributes of importance to a user of the mobile device; and the thereto is generated as the call is in process. Also, such an relation of the task to other processing tasks; wherein in some arrangement in which the incoming call is associated with a circumstances the set of tasks is performed in a first sequence, particular geography, and Such geography is taken into and in other circumstances the set of tasks is performed in a 60 account in recognizing the speech. Also, Such an arrangement second, different, sequence. Also, such an arrangement in in which the text data is used to query a data structure for which the decision is based on a score dependent on a com auxiliary information. bination of parameters related to at least some of the listed (S) An arrangement for populating overlay baubles onto a considerations. mobile device screen, derived from both local and cloud (L) An arrangement for processing stimuli captured by a 65 processing. Also, Such an arrangement in which the overlay sensor of a user's mobile device, in which some processing baubles are tuned in accordance with user preference infor tasks can be performed on processing hardware in the device, mation. US 8,768,313 B2 5 6 (T) An arrangement wherein a user may (1) be charged by FIG. 11 illustrates that tasks referred for external process a vendor for a data processing service, or alternatively (2) ing can be routed to a first group of service providers who may be provided the service free or even receive credit from routinely perform certain tasks for the cell phone, or can be the vendor if the user takes certain action in relation thereto. routed to a second group of service providers who compete on (U) An arrangement in which a user receives a commercial a dynamic basis for processing tasks from the cellphone. benefit in exchange for being presented with promotional FIG. 12 further expands on concepts of FIG. 11, e.g., content—as sensed by a mobile device conveyed by the user. showing how a bid filter and broadcast agent software module (V) An arrangement in which a first user allows a second may oversee a reverse auction process. party to expend credits of the first user, or incur expenses to be FIG. 13 is a high level block diagram of a processing 10 arrangement incorporating aspects of the present technology. borne by the first user, by reason of a Social networking FIG. 14 is a high level block diagram of another processing connection between the first user and the second party. Also, arrangement incorporating aspects of the present technology. Such an arrangement in which a social networking web page FIG.15 shows an illustrative range of image types that may is a construct with which the second party must interact in be captured by a cell phone camera. expending Such credits, or incurring such expenses. 15 FIG. 16 shows a particular hardware implementation (W) An arrangement for charitable fundraising, in which a incorporating aspects of the present technology. user interacts with a physical object associated with a chari FIG. 17 illustrates aspects of a packet used in an exemplary table cause, to trigger a computer-related process facilitating embodiment. a user donation to a charity. FIG. 18 is a block diagram illustrating an implementation (X) An arrangement wherein visual query data is processed of the SIFT technique. in distributed fashion between a user's mobile device and FIG. 19 is a block diagram illustrating, e.g., how packet cloud resources, to generate a response, and wherein related header data can be changed during processing, through use of information is archived in the cloud and processed so that a memory. Subsequent visual query data can generate a more intuitive FIG. 19A shows a prior art architecture from the robotic response. 25 Player Project. The foregoing and many other features and advantages of FIG. 19B shows how various factors can influence how the present technology will be further apparent from the fol different operations may be handled. lowing detailed description, which proceeds with reference to FIG. 20 shows an arrangement by which a cell phone the accompanying drawings. camera and a cell phone projector share a lens. 30 FIG. 20A shows a reference platform architecture that can BRIEF DESCRIPTION OF THE DRAWINGS be used in embodiments of the present technology. FIG.21 shows an image of a desktop telephone captured by FIG. 1 is a high level view of an embodiment incorporating a cellphone camera. aspects of the present technology. FIG. 22 shows a collection of similar images found in a FIG. 2 shows some of the applications that a user may 35 repository of public images, by reference to characteristics request a camera-equipped cell phone to perform. discerned from the image of FIG. 21. FIG. 3 identifies some of the commercial entities in an FIGS. 23-28A, and 30-34 are flow diagrams detailing embodiment incorporating aspects of the present technology. methods incorporating aspects of the present technology. FIGS. 4, 4A and 4B conceptually illustrate how pixel data, FIG.29 is an arty shot of the EiffelTower, captured by a cell and derivatives, is applied in different tasks, and packaged 40 phone user. into packet form. FIG. 35 is another image captured by a cell phone user. FIG. 5 shows how different tasks may have certain image FIG. 36 is an image of an underside of a telephone, discov processing operations in common. ered using methods according to aspects of the present tech FIG. 6 is a diagram illustrating how common image pro nology. cessing operations can be identified, and used to configure 45 FIG. 37 shows part of the physical user interface of one cellphone processing hardware to perform these operations. style of cell phone. FIG. 7 is a diagram showing how a cell phone can send FIGS. 37A and 38B illustrate different linking topologies. certain pixel-related data across an internal bus for local pro FIG.38 is an image captured by a cellphone user, depicting cessing, and send other pixel-related data across a communi an Appalachian Trail trail marker. cations channel for processing in the cloud. 50 FIGS. 39-43 detail methods incorporating aspects of the FIG. 8 shows how the cloud processing in FIG. 7 allows present technology. tremendously more “intelligence' to be applied to a task FIG. 44 shows the user interface of one style of cellphone. desired by a user. FIGS. 45A and 45B illustrate how different dimensions of FIG.9 details how key vector data is distributed to different commonality may be explored through use of a user interface external service providers, who perform services in exchange 55 control of a cell phone. for compensation, which is handled in consolidated fashion FIGS. 46A and 46B detail a particular method incorporat for the user. ing aspects of the present technology, by which keywords FIG. 10 shows an embodiment incorporating aspects of the such as Prometheus and Paul Manship are automatically present technology, noting how cellphone-based processing determined from a cell phone image. is Suited for simple object identification tasks—such as tem 60 FIG. 47 shows some of the different data sources that may plate matching, whereas cloud-based processing is Suited for be consulted in processing imagery according to aspects of complex tasks—such as data association. the present technology. FIG. 10A shows an embodiment incorporating aspects of FIGS. 48A, 48B and 49 show different processing methods the present technology, noting that the user experience is according to aspects of the present technology. optimized by performing visual key vector processing as close 65 FIG.50 identifies some of the different processing that may to a sensor as possible, and administering traffic to the cloud be performed on image data, inaccordance with aspects of the as low in a communications Stack as possible. present technology. US 8,768,313 B2 7 8 FIG. 51 shows an illustrative tree structure that can be recognition and interaction' is contemplated, where a user of employed in accordance with certain aspects of the present the mobile device expects virtually instantaneous results and technology. augmented reality graphic feedback on the mobile device FIG. 52 shows a network of wearable computers (e.g., cell screen, as that user points the camera at a scene or object. phones) that can cooperate with each other, e.g., in a peer-to In accordance with one aspect of the present technology, a peer network. distributed network of pixel processing engines serve Such FIGS. 53-55 detailhow a glossary of signs can be identified mobile device users and meet most qualitative "human real by a cell phone, and used to trigger different actions. time interactivity” requirements, generally with feedback in FIG. 56 illustrates aspects of prior art digital camera tech much less than one second. Implementation desirably pro nology. 10 vides certain basic features on the mobile device, including a FIG. 57 details an embodiment incorporating aspects of the rather intimate relationship between the image sensors out present technology. put pixels and the native communications channel available to FIG. 58 shows how a cell phone can be used to sense and the mobile device. Certain levels of basic “content filtering display affine parameters. and classification of the pixel data on the local device, fol FIG. 59 illustrates certain state machine aspects of the 15 lowed by attaching routing instructions to the pixel data as present technology. specified by the user's intentions and Subscriptions, leads to FIG. 60 illustrates how even “still imagery can include an interactive session between a mobile device and one or temporal, or motion, aspects. more “cloud based pixel processing services. The key word FIG. 61 shows some metadata that may be involved in an “session further indicates fast responses transmitted back to implementation incorporating aspects of the present technol the mobile device, where for some services marketed as “real Ogy. time' or “interactive. a session essentially represents a FIG. 62 shows an image that may be captured by a cell duplex, generally packet-based, communication, where sev phone camera user. eral outgoing 'pixel packets' and several incoming response FIGS. 63-66 detail how the image of FIG. 62 can be pro packets (which may be pixel packets updated with the pro cessed to convey semantic metadata. 25 cessed data) may occur every second. FIG. 67 shows another image that may be captured by a cell Business factors and good old competition are at the heart phone camera user. of the distributed network. Users can subscribe to or other FIGS. 68 and 69 detail how the image of FIG. 67 can be wise tap into any external services they choose. The local processed to convey semantic metadata. device itself and/or the carrier service provider to that device FIG. 70 shows an image that may be captured by a cell 30 can be configured as the user chooses, routing filtered and phone camera user. pertinent pixel data to specified object interaction services. FIG.71 details how the image of FIG. 70 can be processed Billing mechanisms for such services can directly plug into to convey semantic metadata. existing cell and/or mobile device billing networks, wherein FIG. 72 is a chart showing aspects of the human visual users get billed and service providers get paid. system. 35 Butlets back up a bit. The addition of camera systems to FIG. 73 shows different low, mid and high frequency com mobile devices has ignited an explosion of applications. The ponents of an image. primordial application certainly must be folks simply Snap FIG. 74 shows a newspaper page. ping quick visual aspects of their environment and sharing FIG. 75 shows the layout of the FIG. 74 page, as set by such pictures with friends and family. layout software. 40 The fanning out of applications from that starting point FIG. 76 details how user interaction with imagery captured arguably hinges on a set of core plumbing features inherent in from printed text may be enhanced. mobile cameras. In short (and non-exhaustive of course), FIG. 77 illustrates how semantic conveyance of metadata Such features include: a) higher quality pixel capture and low can have a progressive aspect, akin to JPEG2000 and the like. level processing; b) better local device CPU and GPU 45 resources for on-device pixel processing with Subsequent DETAILED DESCRIPTION user feedback; c) structured connectivity into “the cloud: and importantly, d) a maturing traffic monitoring and billing The present specification details a collection of interrelated infrastructure. FIG. 1 is but one graphic perspective on some work, including a great variety oftechnologies. A few of these of these plumbing features of what might be called a visually include image processing architectures for cell phones, 50 intelligent network. (Conventional details of a cell phone, cloud-based computing, reverse auction-based delivery of Such as the microphone, A/D converter, modulation and services, metadata processing, image conveyance of semantic demodulation systems, IF stages, cellular transceiver, etc., are information, etc., etc., etc. Each portion of the specification not shown for clarity of illustration.) details technology that desirably incorporates technical fea It is all well and good to get better CPUs and GPUs, and tures detailed in other portions. Thus, it is difficult to identify 55 more memory, on mobile devices. However, cost, weight and “a beginning from which this disclosure should logically power considerations seem to favor getting “the cloud' to do begin. That said, we simply dive in. as much of the “intelligence' heavy lifting as possible. Mobile Device Object Recognition and Interaction Using Relatedly, it seems that there should be a common denomi Distributed Network Services nator set of “device-side' operations performed on visual data There is presently a huge disconnect between the unfath 60 that will serve all cloud processes, including certain format omable Volume of information that is contained in high qual ting, elemental graphic processing, and other rote operations. ity image data streaming from a mobile device camera (e.g., Similarly, it seems there should be a standardized basic in a cell phone), and the ability of that mobile device to header and addressing scheme for the resulting communica process this data to whatever end. “Off device' processing of tion traffic (typically packetized) back and forth with the visual data can help handle this fire hose of data, especially 65 cloud. when a multitude of visual processing tasks may be desired. This conceptualization is akin to the human visual system. These issues become even more critical once “real time object The eye performs baseline operations, such as chromaticity US 8,768,313 B2 9 10 groupings, and it optimizes necessary information for trans FIGS. 4A and 4B also introduce a central player in this mission along the optic nerve to the brain. The brain does the disclosure: the packaged and address-labeled pixel packet, real cognitive work. And there's feedback the other way into a body of which key vector data is inserted. The key vector too—with the brain sending information controlling muscle data may be a single patch, or a collection of patches, or a movement where to point eyes, Scanning lines of a book, time-series of patches/collections. A pixel packet may be less controlling the iris (lighting), etc. than a kilobyte, or its size can be much much larger. It may FIG.2 depicts a non-exhaustive but illustrative list of visual convey information about an isolated patch of pixels processing applications for mobile devices. Again, it is hard excerpted from a larger image, or it may convey a massive not to see analogies between this list and the fundamentals of of Notre Dame cathedral. how the human visual system and the human brain operate. It 10 (AS presently conceived, a pixel packet is an application is a well studied academic area that deals with how “opti layer construct. When actually pushed around a network, mized the human visual system is relative to any given object however, it may be broken into Smaller portions—as transport recognition task, where a general consensus is that the eye layer constraints in a network may require.) retina-optic nerve-cortex system is pretty darn wonderful in FIG. 5 is a segue diagram—still at an abstract level, but how efficiently it serves a vast array of cognitive demands. 15 pointing toward the concrete. A list of user-defined applica This aspect of the technology relates to how similarly effi tions, such as illustrated in FIG. 2, will map to a state-of-the cient and broadly enabling elements can be built into mobile art inventory of pixel processing methods and approaches devices, mobile device connections and network services, all which can accomplish each and every application. These with the goal of serving the applications depicted in FIG. 2, as pixel processing methods break down into common and not well as those new applications which may show up as the So-common component Sub-tasks. Object recognition text technology dance continues. books are filled with a wide variety of approaches and termi Perhaps the central difference between the human analogy nologies which bring a sense of order into what at first glance and mobile device networks must surely revolve around the might appear to be a bewildering array of “unique require basic concept of “the marketplace where buyers buy better ments' relative to the applications shown in FIG. 2. (In addi and better things So long as businesses know how to profit 25 tion, multiple computer vision and image processing librar accordingly. Any technology which aims to serve the appli ies, such as OpenCV and CMVision—discussed below, have cations listed in FIG.2 must necessarily assume that hundreds been created that identify and render functional operations, if not thousands of business entities will be developing the which can be considered "atomic' functions within object nitty gritty details of specific commercial offerings, with the recognition paradigms.) But FIG. 5 attempts to show that expectation of one way or another profiting from those offer 30 there are indeed a set of common steps and processes shared ings. Yes, a few behemoths will dominate main lines of cash between visual processing applications. The differently flows in the overall mobile industry, but an equal certainty shaded pie slices attempt to illustrate that certain pixel opera will be that niche players will be continually developing niche tions are of a specific class and may simply have differences applications and services. Thus, this disclosure describes how in low level variables or optimizations. The size of the overall a marketplace for visual processing services can develop, 35 pie (thought of in a logarithmic sense, where a pie twice the whereby business interests across the spectrum have some size of another may represent 10 times more Flops, for thing to gain. FIG.3 attempts a crude categorization of some example), and the percentage size of the slice, represent of the business interests applicable to the global business degrees of commonality. ecosystem operative in the era of this filing. FIG. 6 takes a major step toward the concrete, sacrificing FIG. 4 sprints toward the abstract in the introduction of the 40 simplicity in the process. Here we see a top portion labeled technology aspect now being considered. Here we find a “Resident Call-Up Visual Processing Services, which repre highly abstracted bit of information derived from some batch sents all of the possible list of applications from FIG. 2 that a of photons that impinged on Some form of electronic image given mobile device may be aware of, or downright enabled to sensor, with a universe of waiting consumers of that lowly bit. perform. The idea is that not all of these applications have to FIG. 4A then quickly introduces the intuitively well-known 45 beactive all of the time, and hence some sub-set of services is concept that singular bits of visual information arent worth actually “turned on at any given moment. The turned on much outside of their role in both spatial and temporal group applications, as a one-time configuration activity, negotiate to ings. This core concept is well exploited in modern video identify their common component tasks, labeled the “Com compression standards such as MPEG7 and H.264. mon Processes Sorter' first generating an overall common The “visual character of the bits may be pretty far 50 list of pixel processing routines available for on-device pro removed from the visual domain by certain of the processing cessing, chosen from a library of these elemental image pro (consider, e.g., the vector strings representing eigenface cessing routines (e.g., FFT, filtering, edge detection, resam data). Thus, we sometimes use the term "key vector data” (or pling, color histogramming, log-polar transform, etc.). “key vector strings') to refer collectively to raw sensor/stimu Generation of corresponding Flow Gate Configuration/Soft lus data (e.g., pixel data), and/or to processed information and 55 ware Programming information follows, which literally loads associated derivatives. A key vector may take the form of a library elements into properly ordered places in a field pro container in which Such information is conveyed (e.g., a data grammable gate array set-up, or otherwise configures a Suit structure such as a packet). A tag or other data can be included able processor to perform the required component tasks. to identify the type of information (e.g., JPEG image data, FIG. 6 also includes depictions of the image sensor, fol eigenface data), or the data type may be otherwise evident 60 lowed by a universal pixel segmenter. This pixel segmenter from the data or from context. One or more instructions, or breaks down the massive stream of imagery from the sensor operations, may be associated with key vector data—either into manageable spatial and/or temporal blobs (e.g., akin to expressly detailed in the key vector, or implied. An operation MPEG macroblocks, wavelet transform blocks, 64x64 pixel may be implied in default fashion, for key vector data of blocks, etc.). After the torrent of pixels has been broken down certain types (e.g., for JPEG data it may be “store the image: 65 into chewable chunks, they are fed into the newly pro for eigenface data is may be “match this eigenface template”). grammed gate array (or other hardware), which performs the Or an implied operation may be dependent on context. elemental image processing tasks associated with the selected US 8,768,313 B2 11 12 applications. (Such arrangements are further detailed below, services can largely get the job done. But even in the barcode in an exemplary system employing 'pixel packets.) Various example, flexibility in the growth and evolution of overt output products are sent to a routing engine, which refers the visual coding targets begs for an architecture which doesn't elementally-processed data (e.g., key vector data) to other force “code upgrades' to a gaZillion devices every time there resources (internal and/or external) for further processing. is some advance in the overt symbology art. This further processing typically is more complex that that At the other end of the spectrum, arbitrarily complex tasks already performed. Examples include making associations, can be imagined, e.g., referring to a network of Supercomput deriving inferences, pattern and template matching, etc. This ers the task of predicting the apocryphal typhoon resulting further processing can be highly application-specific. from the fluttering of a butterfly's wings halfway around the (Consider a promotional game from Pepsi, inviting the 10 world—if the application requires it. OZ beckons. public to participate in a treasure huntina State park. Based on FIG. 8 attempts to illustrate this radical extra dimension internet-distributed clues, people try to find a hidden six-pack ality of pixel processing in the cloud as opposed to the local of soda to earn a S500 prize. Participants must download a device. This virtually goes without saying (or without a pic special application from the Pepsi-dot-com web site (or the ture), but FIG. 8 is also a segue figure to FIG. 9, where Apple AppStore), which serves to distribute the clues (which 15 Dorothy gets back to Kansas and is happy about it. may also be published to Twitter). The downloaded applica FIG.9 is all about cash, cash flow, and happy humans using tion also has a prize verification component, which processes cameras on their mobile devices and getting highly meaning image data captured by the users cell phones to identify a ful results back from their visual queries, all the while paying special pattern with which the hidden six-pack is uniquely one monthly bill. It turns out the Google AdWords' auction marked. SIFT object recognition is used (discussed below), genie is out of the bottle. Behind the scenes of the moment with the SIFT feature descriptors for the special package by-moment visual scans from a mobile user of their immedi conveyed with the downloaded application. When an image ate visual environment are hundreds and thousands of micro match is found, the cell phone immediately reports same decisions, pixel routings, results comparisons and micro wirelessly to Pepsi. The winner is the user whose cell phone auctioned channels back to the mobile device user for the hard first reports detection of the specially-marked six-pack. In the 25 good they are “truly” looking for, whether they know it or not. FIG. 6 arrangement, some of the component tasks in the SIFT This last point is deliberately cheeky, in that searching of any pattern matching operation are performed by the elemental kind is inherently open ended and magical at Some level, and image processing in the configured hardware; others are part of the fun of searching in the first place is that Surpris referred for more specialized processing—either internal or ingly new associations are part of the results. The search user external.) 30 knows after the fact what they were truly looking for. The FIG. 7 up-levels the picture to a generic distributed pixel system, represented in FIG. 9 as the carrier-based financial services network view, where local device pixel services and tracking server, now sees the addition of our networked pixel "cloud based pixel services have a kind of symmetry in how services module and its role in facilitating pertinent results they operate. The router in FIG.7 takes care of how any given being sent back to a user, all the while monitoring the uses of packaged pixel packet gets sent to the appropriate pixel pro 35 the services in order to populate the monthly bill and send the cessing location, whether local or remote (with the style of fill proceeds to the proper entities. pattern denoting different component processing functions; (As detailed further elsewhere, the money flow may not only a few of the processing functions required by the enabled exclusively be to remote service providers. Other money visual processing services are depicted). Some of the data flows can arise. Such as to users or other parties, e.g., to induce shipped to cloud-based pixel services may have been first 40 or reward certain actions.) processed by local device pixel services. The circles indicate FIG. 10 focuses on functional division of processing that the routing functionality may have components in the illustrating how tasks in the nature of template matching can cloud—nodes that serve to distribute tasks to active service be performed on the cell phone itself, whereas more sophis providers, and collect results for transmission back to the ticated tasks (in the nature of data association) desirably are device. In some implementations these functions may be 45 referred to the cloud for processing. performed at the edge of the wireless network, e.g., by mod Elements of the foregoing are distilled in FIG. 10A, show ules at wireless service towers, so as to ensure the fastest ing an implementation of aspects of the technology as a action. Results collected from the active external service pro physical matter of (usually) software components. The two viders, and the active local processing stages, are fed back to ovals in the figure highlight the symmetric pair of Software Pixel Service Manager software, which then interacts with 50 components which are involved in setting up a "human real the device user interface. time' visual recognition session between a mobile device and FIG. 8 is an expanded view of the lower right portion of the generic cloud or service providers, data associations and FIG. 7 and represents the moment where Dorothy's shoes visual query results. The oval on the left refers to “keyvec turn red and why distributed pixel services provided by the tors' and more specifically “visual key vectors.” As noted, this cloud—as opposed to the local device—will probably trump 55 term can encompass everything from simple JPEG com all but the most mundane object recognition tasks. pressed blocks all the way through log-polar transformed Object recognition in its richer form is based on visual facial feature vectors and anything in between and beyond. association rather than strict template matching rules. If we The point of a key vector is that the essential raw information all were taught that the capital letter “A” will always be of some given visual recognition task has been optimally strictly following some pre-historic form never to change, a 60 pre-processed and packaged (possibly compressed). The oval universal template image if you will, then pretty clean and on the left assembles these packets, and typically inserts some locally prescriptive methods can be placed into a mobile addressing information by which they will be routed. (Final imaging device in order to get it to reliably read a capital A addressing may not be possible, as the packet may ultimately any time that ordained form 'A' is presented to the camera. be routed to remote service providers—the details of which 2D and even three 3D barcodes in many ways follow this 65 may not yet be known.) Desirably, this processing is per template-like approach to object recognition, where for con formed as close to the raw sensor data as possible. Such as by tained applications involving Such objects, local processing processing circuitry integrated on the same Substrate as the US 8,768,313 B2 13 14 image sensor, which is responsive to software instructions In some circumstances, one input to the query router and stored in memory or provided from another stage in packet response manager may be the user's location, so that a differ form. ent service provider may be selected when the user is at home The oval on the right administers the remote processing of in Oregon, than when she is vacationing in Mexico. In other key Vector data, e.g., attending to arranging appropriate Ser 5 instances, the required turnaround time is specified, which vices, directing traffic flow, etc. Desirably, this software pro may disqualify some vendors, and make others more com cess is implemented as low down on a communications stack petitive. In some instances the query router and response as possible, generally on a "cloud side' device, access point, manager need not decide at all, e.g., if cached results identi or cell tower. (When real-time visual key vector packets fying a service provider selected in a previous auction are still stream over a communications channel, the lower down in the 10 communications Stack they are identified and routed, the available and not beyond a “freshness” threshold. smoother the “human real-time' look and feel a given visual Pricing offered by the vendors may change with processing recognition task will be.) Remaining high level processing load, bandwidth, time of day, and other considerations. In needed to support this arrangement is included in FIG. 10A some embodiments the providers may be informed of offers for context, and can generally be performed through native 15 Submitted by competitors (using known trust arrangements mobile and remote hardware capabilities. assuring data integrity), and given the opportunity to make FIGS. 11 and 12 illustrate the concept that some providers their offers more enticing. Such a bidding war may continue of some cloud-based pixel processing services may be estab until no bidder is willing to change the offered terms. The lished in advance, in a pseudo-static fashion, whereas other query router and response manager (or in some implementa providers may periodically vie for the privilege of processing tions, the user) then makes a selection. a user's key vector data, through participation in a reverse For expository convenience and visual clarity, FIG. 12 auction. In many implementations, these latter providers shows a software module labeled “Bid Filter and Broadcast compete each time a packet is available for processing. Agent. In most implementations this forms part of the query Consider a user who Snaps a cell phone picture of an router and response manager module. The bid filter module unfamiliar car, wanting to learn the make and model. Various 25 decides which vendors—from a universe of possible ven service providers may compete for this business. A startup dors—should be given a chance to bid on a processing task. vendor may offer to perform recognition for free to build its (The user's preference data, or historical experience, may brand or collect data. Imagery submitted to this service indicate that certain service providers be disqualified.) The returns information simply indicating the car's make and broadcast agent module then communicates with the selected model. Consumer Reports may offer an alternative service— 30 bidders to inform them of a user task for processing, and which provides make and model data, but also provides tech provides information needed for them to make a bid. nical specifications for the car. However, they may charge 2 Desirably, the bid filter and broadcast agent do at least cents for the service (or the cost may be bandwidth based, some their work in advance of data being available for pro e.g., 1 cent per megapixel). Edmunds, or JD Powers, may cessing. That is, as soon as a prediction can be made as to an offer still another service, which provides data like Consumer 35 operation that the user may likely soon request, these modules Reports, but pays the user for the privilege of providing data. start working to identify a provider to perform a service In exchange, the vendor is given the right to have one of its expected to be required. A few hundred milliseconds later the partners send a text message to the user promoting goods or user key vector data may actually be available for processing services. The payment may take the form of a credit on the (if the prediction turns out to be accurate). user's monthly cell phone voice/data service billing. 40 Sometimes, as with Google's present AdWords system, the Using criteria specified by the user, Stored preferences, service providers are not consulted at each user transaction. context, and other rules/heuristics, a query router and Instead, each provides bidding parameters, which are stored response manager (in the cellphone, in the cloud, distributed, and consulted whenever a transaction is considered, to deter etc.) determines whether the packet of data needing process mine which service provider wins. These stored parameters ing should be handled by one of the service providers in the 45 may be updated occasionally. In some implementations the stable of static standbys, or whether it should be offered to service provider pushes updated parameters to the bid filter providers on an auction basis—in which case it arbitrates the and broadcast agent whenever available. (The bid filter and outcome of the auction. broadcast agent may serve a large population of users, such as The static standby service providers may be identified all Verizon subscribers in area code 503, or all subscribers to when the phone is initially programmed, and only reconfig 50 an ISP in a community, or all users at the domain well-dot ured when the phone is reprogrammed. (For example, Verizon com, etc.; or more localized agents may be employed. Such as may specify that all FFT operations on its phones be routed to one for each cell phone tower.) a server that it provides for this purpose.) Or, the user may be If there is a lull in traffic, a service provider may discount its able to periodically identify preferred providers for certain services for the next minute. The service provider may thus tasks, as through a configuration menu, or specify that certain 55 transmit (or post) a message stating that it will perform eigen tasks should be referred for auction. Some applications may vector extraction on an image file of up to 10 megabytes for 2 emerge where static service providers are favored; the task cents until 1244754176 Coordinated Universal Time in the may be so mundane, or one provider's services may be so Unix epoch, after which time the price will return to 3 cents. un-paralleled, that competition for the provision of services The bid filter and broadcast agent updates a table with stored isn't warranted. 60 bidding parameters accordingly. In the case of services referred to auction, Some users may (Information about the Google reverse auction, used to exalt price above all other considerations. Others may insist place sponsored advertising on web search results page, is on domestic data processing. Others may want to Stick to reproduced at the end of the provisional specification to service providers that meet “green.” “ethical or other stan which this application claims priority. This information was dards of corporate practice. Others may prefer richer data 65 published in Wired magazine on May 22, 2009, in an article output. Weightings of different criteria can be applied by the by Stephen Levy, entitled “Secret of Googlenomics: Data query router and response manager in making the decision. Fueled Recipe Brews Profitability.”) US 8,768,313 B2 15 16 In other implementations, the broadcast agent polls the In another example, a consumer may be rewarded for bidders—communicating relevant parameters, and soliciting accepting commercials, or commercial impressions, from a bid responses whenever a transaction is offered for process company. If a consumer goes into the Pepsi Center in Denver, ing. she may receive a reward for each Pepsi-branded experience Once a prevailing bidder is decided, and data is available she encounters. The amount of micropayment may scale with for processing, the broadcast agent transmits the key vector the amount of time that she interacts with the different Pepsi data (and other parameters as may be appropriate to a par branded objects (including audio and imagery) in the venue. ticular task) to the winning bidder. The bidder then performs Notjust large brand owners can provide credits to individu the requested operation, and returns the processed data to the als. Credits can be routed to friends and social/business 10 acquaintances. To illustrate, a user of Facebook may share query router and response manager. This module logs the credit (redeemable for goods/services, or exchangeable for processed data, and attends to any necessary accounting (e.g., cash) from his Facebook page—enticing others to visit, or crediting the service provider with the appropriate fee). The linger. In some cases, the credit can be made available only to response data is then forwarded back to the user device. people who navigate to the Facebook page in a certain man In a variant arrangement, one or more of the competing 15 ner—such as by linking to the page from the user's business service providers actually performs some or all of the card, or from another launch page. requested processing, but "teases the user (or the query As another example, consider a Facebook user who has router and response manager) by presenting only partial earned, or paid for, or otherwise received credit that can be results. With a taste of what’s available, the user (or the query applied to certain services—such as for downloading Songs router and response manager) may be induced to make a from iTunes, or for music recognition services, or for identi different choice than relevant criteria/heuristics would other fying clothes that go with particular shoes (for which an wise indicate. image has been Submitted), etc. These services may be asso The function calls sent to external service providers, of ciated with the particular Facebook page, so that friends can course, do not have to provide the ultimate result sought by a invoke the services from that page—essentially spending the consumer (e.g., identifying a car, or translating a menu listing 25 host’s credit (again, with Suitable authorization or invitation from French to English). They can be component operations, by that hosting user). Likewise, friends may Submit images to such as calculating an FFT, or performing a SIFT procedure a facial recognition service accessible through an application or a log-polar transform, or computing a histogram or eigen associated with the user's Facebook page. Images Submitted vectors, or identifying edges, etc. in such fashion are analyzed for faces of the hosts friends, In time, it is expected that a rich ecosystem of expert 30 and identification information is returned to the submitter, processors will emerge—serving myriad processing requests e.g., through a user interface presented on the originating from cell phones and other thin client devices. Facebook page. Again, the host may be assessed a fee for each More on Monetary Flow Such operation, but may allow authorized friends to avail Additional business models can be enabled, involving the themselves of Such service at no cost. subsidization of consumed remote services by the service 35 Credits, and payments, can also be routed to charities. A providers themselves in exchange for user information (e.g., viewer exiting a theatre after a particularly poignant movie for audience measurement), or in exchange for action taken about poverty in Bangladesh may capture an image of an by the user, such as completing a Survey, visiting specific associated movie poster, which serves as a portal for dona sites, locations in store, etc. tions for a charity that serves the poor in Bangladesh. Upon Services may be subsidized by third parties as well, such a 40 recognizing the movie poster, the cell phone can present a coffee shop that derives value by providing a differentiating graphical/touch user interface through which the user spins service to its customers in the form of free/discounted usage dials to specify an amount of a charitable donation, which at of remote services while they are seated in the shop. the conclusion of the transaction is transferred from a finan In one arrangement an economy is enabled wherein a cur cial account associated with the user, to one associated with rency of remote processing credits is created and exchanged 45 the charity. between users and remote service providers. This may be More on a Particular Hardware Arrangement entirely transparent to the user and managed as part of a As noted above and in the cited patent documents, there is service plan, e.g., with the user's cell phone or data service a need for generic object recognition by a mobile device. provider. Or it can be exposed as a very explicit aspect of the Some approaches to specialized object recognition have present technology. Service providers and others may award 50 emerged, and these have given rise to specific data processing credits to users for taking actions or being part of a frequent approaches. However, no architecture has been proposed that user program to build allegiance with specific providers. goes beyond specialized object recognition toward generic As with other currencies, users may choose to explicitly object recognition. donate, save, exchange or generally barter credits as needed. Visually, a generic object recognition arrangement Considering these points in further detail, a service may 55 requires access to good raw visual data preferably free of pay a user for opting-in to an audience measurement panel. device quirks, Scene quirks, user quirks, etc. Developers of E.g., The Nielsen Company may provide services to the pub systems built around object identification will best prosper lic—Such as identification of television programming from and serve their users by concentrating on the object identifi audio or video samples Submitted by consumers. These ser cation task at hand, and not the myriad existing roadblocks, vices may be provided free to consumers who agree to share 60 resource sinks, and third party dependencies that currently Some of their media consumption data with Nielsen (Such as must be confronted. by serving as an anonymous member for a city's audience As noted, virtually all object identification techniques can measurement panel), and provided on a fee basis to others. make use of or even rely upon a pipe to “the cloud.” Nielsen may offer, for example, 100 units of credit micro “Cloud can include anything external to the cell phone. An payments or other value—to participating consumers each 65 example is a nearby cellphone, or plural phones on a distrib month, or may provide credit each time the user Submits uted network. Unused processing power on Such other phone information to Nielsen. devices can be made available for hire (or for free) to call upon US 8,768,313 B2 17 18 as needed. The cell phones of the implementations detailed done in the cell phone, other processing is done externally, herein can Scavenge processing power from Such other cell and the application Software orchestrating the process resides phones. in the cell phone. Such a cloud may be ad hoc, e.g., other cellphones within To illustrate further discussion, FIG. 15 shows a range 40 of Bluetooth range of the user's phone. The ad hoc network can some of the different types of images 41-46 that may be be extended by having such other phones also extend the local captured by a particular user's cell phone. A few brief (and cloud to further phones that they can reach by Bluetooth, but incomplete) comments about some of the processing that may the user cannot. be applied to each image are provided in the following para The "cloud can also comprise other computational plat graphs. 10 Image 41 depicts a thermostat. A steganographic digital forms, such as set-top boxes; processors in automobiles, ther watermark 47 is textured or printed on the thermostats case. mostats, HVAC systems, wireless routers, local cell phone (The watermark is shown as visible in FIG. 15, but is typically towers and other wireless network edges (including the pro imperceptible to the viewer). The watermark conveys infor cessing hardware for their software-defined radio equip mation intended for the cell phone, allowing it to present a ment), etc. Such processors can be used in conjunction with 15 graphic user interface by which the user can interact with the more traditional cloud computing resources—as are offered thermostat. A bar code or other data carrier can alternatively by Google, Amazon, etc. be used. Such technology is further detailed below, and in (In View of concerns of certain users about privacy, the patent application Ser. No. 12/498,709, filed Apr. 14, 2009. phone desirably has a user-configurable option indicating Image 42 depicts an item including a barcode 48. This whether the phone can refer data to cloud resources for pro barcode conveys Universal Product Code (UPC) data. Other cessing. In one arrangement, this option has a default value of barcodes may convey other information. The barcode pay “No, limiting functionality and impairing battery life, but load is not primarily intended for reading by a user cellphone also limiting privacy concerns. In another arrangement, this (in contrast to watermark 47), but it nonetheless may be used option has a default value of “Yes”) by the cell phone to help determine an appropriate response Desirably, image-responsive techniques should produce a 25 for the user. short term “result or answer,” which generally requires some Image 43 shows a product that may be identified without level of interactivity with a user hopefully measured in reference to any express machine readable information (Such fractions of a second for truly interactive applications, or a as a bar code or watermark). A segmentation algorithm may few seconds or fractions of a minute for nearer-term “I’m be applied to edge-detected image data to distinguish the patient to wait' applications. 30 apparent image subject from the apparent background. The As for the objects in question, they can break down into image Subject may be identified through its shape, color and various categories, including (1) generic passive (clues to texture. Image fingerprinting may be used to identify refer basic searches), (2) geographic passive (at least you know ence images having similar labels, and metadata associated where you are, and may hook into geographic-specific with those other images may be harvested. SIFT techniques resources), (3) “cloud supported passive, as with “identified/ 35 (discussed below) may be employed for such pattern-based enumerated objects and their associated sites, and (4) active/ recognition tasks. Specular reflections in low texture regions controllable, a la ThingPipe (a reference to technology may tend to indicate the image Subject is made of glass. detailed in application Ser. No. 12/498,709, such as WiFi Optical character recognition can be applied for further infor equipped thermostats and parking meters). mation (reading the visible text). All of these clues can be An object recognition platform should not, it seems, be 40 employed to identify the depicted item, and help determine an conceived in the classic “local device and local resources appropriate response for the user. only' software mentality. However, it may be conceived as a Additionally (or alternatively), similar-image search sys local device optimization problem. That is, the software on tems, such as Google Similar Images, and Microsoft Live the local device, and its processing hardware, should be Search, can be employed to find similar images, and their designed in contemplation of their interaction with off-device 45 metadata can then be harvested. (As of this writing, these software and hardware. Ditto the balance and interplay of services do not directly Supportupload of a user picture to find both control functionality, pixel crunching functionality, and similar web pictures. However, the user can post the image to application software/GUI provided on the device, versus off Flickr (using Flickr's cellphone upload functionality), and it the device. (In many implementations, certain databases use will soon be found and processed by Google and Microsoft.) ful for object identification/recognition will reside remote 50 Image 44 is a Snapshot of friends. Facial detection and from the device.) recognition may be employed (i.e., to indicate that there are In a particularly preferred arrangement, such a processing faces in the image, and to identify particular faces and anno platform employs image processing near the sensor—opti tate the image with metadata accordingly, e.g., by reference to mally on the same chip, with at least Some processing tasks user-associated data maintained by Apple's iPhoto service, desirably performed by dedicated, special purpose hardware. 55 Google's Picasa service, Facebook, etc.) Some facial recog Consider FIG. 13, which shows an architecture of a cell nition applications can be trained for non-human faces, e.g., phone 10 in which an image sensor 12 feeds two processing cats, dogs animated characters including avatars, etc. Geolo paths. One, 13, is tailored for the human visual system, and cation and date/time information from the cell phone may includes processing Such as JPEG compression. Another, 14. also provide useful information. is tailored for object recognition. As discussed, some of this 60 The persons wearing Sunglasses pose a challenge for some processing may be performed by the mobile device, while facial recognition algorithms. Identification of those indi other processing may be referred to the cloud 16. viduals may be aided by their association with persons whose FIG. 14 takes an application-centric view of the object identities can more easily be determined (e.g., by conven recognition processing path. Some applications reside wholly tional facial recognition). That is, by identifying other group in the cellphone. Other applications reside wholly outside the 65 pictures in iPhoto/Picasa/Facebook/etc. that include one or cell phone—e.g., simply taking key vector data as stimulus. more of the latter individuals, the other individuals depicted More common are hybrids, Such as where some processing is in Such photographs may also be present in the Subject image. US 8,768,313 B2 19 20 These candidate persons form a much smaller universe of author the field 55 to specify that the sensor is to sum sensor possibilities than is normally provided by unbounded iPhoto/ charges to reduce resolution (e.g., producing a frame of 640x Picasa/Facebook/etc data. The facial vectors discernable 480 data from a sensor capable of 1280x960), output data from the Sunglass-wearing faces in the Subject image can then only from red-filtered sensor cells, output data only from a be compared against this Smaller universe of possibilities in 5 horizontal line of cells across the middle of the sensor, output order to determine a best match. If, in the usual case of data only from a 128x128 patch of cells in the center of the recognizing a face, a score of 90 is required to be considered pixel array, etc. The camera instruction field 55 may further a match (out of an arbitrary top match score of 100), in specify the exact time that the camera is to capture data—so searching Such a group-constrained set of images a score of as to allow, e.g., desired synchronization with ambient light 70 or 80 might suffice. (Where, as in image 44, there are two 10 ing (as detailed later). persons depicted without Sunglasses, the occurrence of both Each packet 56 issued by setup module 34 may include of these individuals in a photo with one or more other indi different camera parameters in the first header field 55. Thus, viduals may increase its relevance to Such an analysis— a first packet may cause camera 32 to capture a full frame implemented, e.g., by increasing a weighting factor in a image with an exposure time of 1 millisecond. A next packet matching algorithm.) 15 may cause the camera to capture a full frame image with an Image 45 shows part of the statue of Prometheus in Rock exposure time of 10 milliseconds, and a third may dictate an efeller Center, NY. Its identification can follow teachings exposure time of 100 milliseconds. (Such frames may later be detailed elsewhere in this specification. processed in combination to yield a high dynamic range Image 46 is a landscape, depicting the Maroon Bells moun image.) A fourth packet may instruct the camera to down tain range in Colorado. This image subject may be recognized sample data from the image sensor, and combine signals from by reference to geolocation data from the cellphone, in con differently color-filtered sensor cells, so as to output a 4x3 junction with geographic information services such as array of grayscale luminance values. A fifth packet may GeoNames or Yahoo!'s GeoPlanet. instruct the camera to output data only from an 8x8 patch of (It should be understood that techniques noted above in pixels at the center of the frame. A sixth packet may instruct connection with processing of one of the images 41-46 in 25 the camera to output only five lines of image data, from the FIG. 15 can likewise be applied to others of the images. top, bottom, middle, and mid-upper and mid-lower rows of Moreover, it should be understood that while in some respects the sensor. A seventh packet may instruct the camera to output the depicted images are ordered according to ease of identi only data from blue-filtered sensor cells. An eighth packet fying the Subject and formulating a response, in other respects may instruct the camera to disregard any auto-focus instruc they are not. For example, although landscape image 46 is 30 tions but instead capture a full frame at infinity focus. And so depicted to the far right, its geolocation data is strongly cor O. related with the metadata"Maroon Bells.” Thus, this particu Each such packet 57 is provided from setup module 34 lar image presents an easier case than that presented by many across a bus or other data channel 60 to a camera controller other images.) module associated with the camera. (The details of a digital In one embodiment, Such processing of imagery occurs 35 camera—including an array of photosensor cells, associated automatically—without express user instruction each time. analog-digital converters and control circuitry, etc., are well Subject to network connectivity and power constraints, infor known to artisans and so are not belabored.) Camera 32 mation can be gleaned continuously from Such processing, captures digital image data in accordance with instructions in and may be used in processing Subsequently-captured the header field 55 of the packet and stuffs the resulting image images. For example, an earlier image in a sequence that 40 data into a body 59 of the packet. It also deletes the camera includes photograph 44 may show members of the depicted instructions 55 from the packet header (or otherwise marks group without Sunglasses—simplifying identification of the header field 55 in a manner permitting it to be disregarded by persons later depicted with Sunglasses. Subsequent processing stages). FIG. 16, Etc., Implementation When the packet 57 was authored by setup module 34 it FIG. 16 gets into the nifty gritty of a particular implemen 45 also included a series of further header fields 58, each speci tation incorporating certain of the features earlier dis fying how a corresponding. Successive post-sensor stage 38 cussed. (The other discussed features can be implemented by should process the captured data. As shown in FIG. 16, there the artisan within this architecture, based on the provided are several Such post-sensor processing stages 38. disclosure.) In this data driven arrangement 30, operation of a Camera 32 outputs the image-stuffed packet produced by cellphone camera 32 is dynamically controlled in accordance 50 the camera (a pixel packet) onto a bus or other data channel with packet data sent by a setup module 34, which in turn is 61, which conveys it to a first processing stage 38. controlled by a control processor module 36. (Control pro Stage 38 examines the header of the packet. Since the cessor module 36 may be the cellphone's primary processor, camera deleted the instruction field 55 that conveyed camera oran auxiliary processor, or this function may be distributed.) instructions (or marked it to be disregarded), the first header The packet data further specifies operations to be performed 55 field encountered by a control portion of stage 38 is field 58a. by an ensuing chain of processing stages 38. This field details parameters of an operation to be applied by In one particular implementation, setup module 34 dic stage 38 to data in the body of the packet. tates—on a frame by frame basis—the parameters that are to For example, field 58a may specify parameters of an edge be employed by camera 32 in gathering an exposure. Setup detection algorithm to be applied by stage 38 to the packets module 34 also specifies the type of data the camera is to 60 image data (or simply that such an algorithm should be output. These instructional parameters are conveyed in a first applied). It may further specify that stage 38 is to substitute field 55 of a header portion 56 of a data packet 57 correspond the resulting edge-detected set of data for the original image ing to that frame (FIG. 17). data in the body of the packet. (Substituting of data, rather For example, for each frame, the setup module 34 may than appending, may be indicated by the value of a single bit issue a packet 57 whose first field 55 instructs the camera 65 flag in the packet header.) Stage 38 performs the requested about, e.g., the length of the exposure, the aperture size, the operation (which may involve configuring programmable lens focus, the depth of field, etc. Module 34 may further hardware in certain implementations). First stage 38 then US 8,768,313 B2 21 22 deletes instructions 58a from the packet header 56 (or marks particular type of processing (e.g., object identification). This them to be disregarded) and outputs the processed pixel focus/exposure information can be used as predictive setup packet for action by a next processing stage. data for the camera the next time a frame of the same or A control portion of a next processing stage (which here similar type is captured. The control processor module 36 can comprises stages 38a and 38b, discussed later) examines the set up a frame request using a filtered or time-series prediction header of the packet. Since field 58a was deleted (or marked sequence of focus information from previous frames, or a to be disregarded), the first field encountered is field 58b. In sub-set of those frames. this particular packet, field 58b may instruct the second stage Error and status reporting functions may also be accom not to perform any processing on the data in the body of the plished using outputs 31. Each stage may also have one or packet, but instead simply delete field 58b from the packet 10 more other outputs 33 for providing data to other processes or header and pass the pixel packet to the next stage. modules—locally within the cell phone, or remote (“in the A next field of the packet header may instruct the third cloud'). Data (in packet form, or in other format) may be stage 38c to perform 2D FFT operations on the image data directed to such outputs in accordance with instructions in found in the packet body, based on 16x16 blocks. It may packet 57, or otherwise. further direct the stage to hand-off the processed FFT data to 15 For example, a processing module 38 may make a data flow a wireless interface, for internet transmission to address selection based on Some result of processing it performs. E.g., 216.239.32.10, accompanied by specified data (detailing, if an edge detection stage discerns a sharp contrast image, e.g., the task to be performed on the received FFT data by the then an outgoing packet may be routed to an external service computer at that address, such as texture classification). It provider for FFT processing. That provider may return the may further direct the stage to hand off a single 16x16 block resultant FFT data to other stages. However, if the image has of FFT data, corresponding to the center of the captured poor edges (such as being out of focus), then the system may image, to the same or a different wireless interface for trans not want FFT and following processing to be performed on mission to address 12.232.235.27—again accompanied by the data. Thus, the processing stages can cause branches in the corresponding instructions about its use (e.g., search an data flow, dependent on parameters of the processing (such as archive of stored FFT data for a match, and return information 25 discerned image characteristics). if a match is found; also, store this 16x16 block in the archive Instructions specifying such conditional branching can be with an associated identifier). Finally, the header authored by included in the header of packet 57, or they can be provided setup module 34 may instruct stage 38c to replace the body of otherwise. FIG. 19 shows one arrangement. Instructions 58d the packet with the single 16x16 block of FFT data dispatched originally in packet 57 specify a condition, and specify a to the wireless interface. As before, the stage also edits the 30 location in a memory 79 from which replacement subsequent packet header to delete (or mark) the instructions to which it instructions (58e'-58g) can be read, and substituted into the responded, so that a header instruction field for the next packet header, if the condition is met. If the condition is not processing stage is the first to be encountered. met, execution proceeds in accordance with header instruc In other arrangements, the addresses of the remote com tions already in the packet. puters are not hard-coded. For example, the packet may 35 In other arrangements, other variations can be employed. include a pointer to a database record or memory location (in For example, all of the possible conditional instructions can the phone or in the cloud), which contains the destination be provided in the packet. In another arrangement, a packet address. Or, stage 38c may be directed to hand-off the pro architecture is still used, but one or more of the header fields cessed pixel packet to the Query Router and Response Man do not include explicit instructions. Rather, they simply point ager (e.g., FIG. 7). This module examines the pixel packet to 40 to memory locations from which corresponding instructions determine what type of processing is next required, and it (or data) are retrieved, e.g., by the corresponding processing routes it to an appropriate provider (which may be in the cell stage 38. phone if resources permit, or in the cloud—among the stable Memory 79 (which can include a cloud component) can of static providers, or to a provider identified through an also facilitate adaptation of processing flow even if condi auction). The provider returns the requested output data (e.g., 45 tional branching is not employed. For example, a processing texture classification information, and information about any stage may yield output data that determines parameters of a matching FFT in the archive), and processing continues per filter or other algorithm to be applied by a later stage (e.g., a the next item of instruction in the pixel packet header. convolution kernel, a time delay, a pixel mask, etc). Such The data flow continues through as many functions as a parameters may be identified by the former processing stage particular operation may require. 50 in memory (e.g., determined/calculated, and stored), and In the particular arrangement illustrated, each processing recalled for use by the later stage. In FIG. 19, for example, stage 38 strips-out, from the packet header, the instructions on processing stage 38 produces parameters that are stored in which it acted. The instructions are ordered in the header in memory 79. A subsequent processing stage 38c later retrieves the sequence of processing stages, so this removal allows these parameters, and uses them in execution of its assigned each stage to look to the first instructions remaining in the 55 operation. (The information in memory can be labeled to header for direction. Other arrangements, of course, can alter identify the module/provider from which they originated, or natively be employed. (For example, a module may insert to which they are destined