  Attila Tanács

Department of Image Processing and Computer Graphics, University of Szeged

 Mobile platforms
  o Modern
 Mobile imaging
  o Physical constraints, software and hardware tricks
 Mobile image processing
  o Software packages
  o Use cases

 Penetration
  o Billions of potential users
  o More SIM cards than people

 Feature phone
  o Basic functions
    • Voice calls, SMS
  o Closed system
  o Limited developer possibilities

 Smartphone
  o Internet access
  o Multitude of communication channels
  o Expandable "knowledge"
  o Open environment for developers
    • App stores

 Usage
  o Communication
    • Basic: voice calls, SMS
    • Internet-based: Facebook, messengers, email, …
  o Source of information
    • Weather, stock indices, exchange rates, news, navigation, …
  o Entertainment
    • Games, e-books, music, multimedia

 First mobile phone call (1973)
  o Motorola (Martin Cooper → Joel S. Engel)
 Manager calculators (1980s, 1990s)
  o EPOC
  o Personal digital assistant, no phone!
 Apple Newton platform (1987-1998)
  o Handwriting recognition, digital assistant, no phone!
 PDA era (1996-2008)
  o Palm, Windows CE, Windows Mobile
  o Still no phone, multimedia push
 Modern smartphones (2007-)
  o Convergence of mobile phones and PDAs
  o Nokia communicators (phone with data capabilities)
  o RIM BlackBerry, Symbian
  o Apple iPhone, Android, Windows Phone, …
    • Radical market change from 2007!
 Tablets
  o Decline of PCs

Market shares since 2013:

Android ~80%, iOS ~20%, others ~0%

 CPU
  o Mainly ARM architecture (32- and 64-bit)
  o 16 MHz → 2.3 GHz (quad- and octa-core)
  o Dedicated GPU
 Memory
  o 1 MB → 1–3 GB
 Display
  o Resistive: stylus – old technology!
  o Capacitive: multi-finger taps
  o Resolution: 160x160 black-and-white → 4K; 2.55''–7''
  o Autostereoscopic 3D displays
 New sensors
  o Camera, GPS, accelerometer, magnetic compass, gyroscope, microphone, magnetic field, light sensor, barometer, …

 1997: Sharp, Kyocera: initial camera-phone experiments (in Japan)
 2000: First camera phone
  o Sharp J-SH04, 0.1 MP resolution
 2000: Eyemodule Digital Camera Springboard Module for Handspring Visor
  o 320x240 resolution
 2002: Sony Ericsson T68i
  o First MMS-capable phone (with externally attachable camera)
 2002: Sanyo SCP-5300
  o First in the USA, VGA (0.3 MP) resolution
 2003: More camera phones sold than digital cameras
 2004: Nokia sells the most digital cameras
 2006: Half of all mobile phones have a camera
 2008: Nokia is the biggest camera manufacturer (analogue and digital combined!)

 2010: Almost all phones have cameras

 Huge social impact!
  o Family
    • Sending photos and videos over any distance, instantly
  o Politics
    • Broadcast events
    • Inform news channels
  o Science
    • Meteorology
    • Tsunami
    • …

 Taking notes
  o Opening hours, advertisements
 Visual codes
  o Bar codes, QR codes
 Photo processing and filtering
  o Montage, HDR, time lapse, removing moving objects, …
 Photo sharing and upload
  o Instagram, Twitter
 Content-based image search, augmented reality
  o Client-server applications
 Image analysis
  o Face detection, OCR, …
 Video calls
 …

 Visual codes
  o Product identification
  o Business card scanning
  o URL
  o General text
 Complex photo functions
  o Smile/blink detection
  o Post processing
  o HDR imaging
  o Panorama creation (even using sensor information)
  o Taking 3D photos

 Google Goggles (Android)
  o Buildings, paintings
  o Visual codes
  o Business cards
  o Books, products
  o Solving Sudoku

 Google Cloud Vision API
  o See later in this presentation
(Image source: Google System BlogSpot)

 Optical character recognition, scanning
  o Scanning and recognizing text on the go
  o OfficeLens
 Translation
  o Translating words and phrases in the camera live view
  o (Word Lens, CamTranslator), Google Translate
 Augmented driving (iPhone)
  o Lane detection, approaching-vehicle warning
  o iOnRoad
 Analyzing skin pigments
  o Doctor Mole
 Motion detection
  o Security applications

 …

 Imaging sensor
  o Sensitivity (ISO)
  o Resolution (MP)
  o Size, type
 Lens
  o Shutter speed
  o Focal length
    • Zoom
  o Aperture
 Illuminance
  o Amount, evenness
    • → HDR (High Dynamic Range)
  o Color temperature
(Image source: Stanford University)

 Larger sensor size →
  o Less noise
  o Better depth of field
  o Better photos
  o But a larger lens is needed!

 A few example sensor sizes
  o 35 mm: DSLR
  o APS-C: DSLR, MILC
  o 4/3'': MILC
  o 1/1.8'': top compact
  o 1/2.3'': compact, better mobile
  o 1/3.2'': ordinary mobile
(Image source: Engadget)

 In mobile devices
  o Usually small-sized sensors
  o Comparable to average P&S cameras

(Charts: sensor-size comparison of Nokia smartphone sensors, mobile devices, average point-and-shoot cameras, MILC and DSLR. Source: Camera Image Sensor Comparison)

 Brightness
  o Amount of light getting through the lens
  o Aperture (F-stop) and shutter speed play a role
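The aperture/shutter-speed interplay above is usually summarized by the standard exposure-value formula, EV = log2(N²/t), where N is the F-stop and t the shutter time in seconds. A minimal sketch (the function name is mine):

```python
import math

def exposure_value(f_number, shutter_s):
    """Standard exposure value: EV = log2(N^2 / t).

    Opening the aperture by one stop (N / sqrt(2)) or doubling the
    shutter time both lower the EV by exactly 1."""
    return math.log2(f_number ** 2 / shutter_s)

print(exposure_value(2.0, 1 / 30))   # f/2 at 1/30 s -> ~6.91
```

This is why a "weak" (high F-number) mobile lens forces longer shutter times: to keep the same EV with a smaller aperture, t must grow.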

 Focal length
  o Determines the angle of view
  o Fixed
    • Mobile or "pancake" lens (MILC, DSLR)
    • Simpler buildup → smaller lens
  o Zoom
    • 3x-5x: compact, MILC, DSLR
    • 5x-40x: ultrazoom (compact) cameras
    • Only very few smartphones have optical zoom!

(Chart: focal lengths of smartphone cameras. Source: Camera Lenses Research Project)

(Image source: http://www.digital-photography-student.com/lens-focal-length-explained/)

High enough megapixel count, but:
 Cheap and small-sized sensors
  o Low power consumption and cheap price are dominant
 Weak lens
  o Longer shutter speeds
    • Blurry photos, especially in low-light situations
  o Weak flash or lack thereof
    • Works acceptably only in good illuminance
  o Miniaturized size
    • Geometric distortions
 Limited parameter settings
  o Automatism, slow!

Megapixel count alone is not a good indicator of imaging quality!

 Advantages
  o Small, always at hand, easily available
    • "The best camera is the one that's with you!" (Chase Jarvis)
  o Might replace low-end compact cameras
  o Photos can be edited and shared instantly
  o Good quality video

 Distinguishing factor for manufacturers
  o OS and other phone features are mainly similar
  o Manufacturers pay attention to camera capabilities
    • Especially in high-end devices

 Better sensors
  o BSI, Exmor, ISOCELL, UltraPixel, …
  o The megapixel war is (mainly) over!
  o Dedicated processing chip
  o OIS (optical image stabilization)
 Camera functions
  o Zoom lens (thicker device)
  o Faster and more precise focusing using laser light
  o Dual-lens systems (with different parameters)
  o Better front-side camera module ("selfie" camera)
 Software photo processing
  o HDR
  o Simulated depth of field
  o Sensor-based panorama
(Image sources: PixInfo, iPhone Informer)

 Panorama creation
  o Horizontal and sphere (StreetView-like) panoramas

 High Dynamic Range photos
  o Handling uneven lighting conditions

 Artificial depth-of-field simulation ("lens blur")
  o Highlight what is important in the scene
  o Refocus: interactively select the sharp area after image capture

 Google Camera
  o Taking photos using sensory help (gyroscope)
  o Blending overlapping photos to eliminate blockiness

(Image source: downloadol.com)

 HDR (High Dynamic Range)
  o Difficult illumination situations (dark and bright regions in the image)
  o Software solution
    • Series of photos with different exposure values (e.g., ±2 EV, 3-5 photos)
    • Fixed position, still scene
    • Photo fusion
      • Divide the photos into little squares
      • Select the square with the highest information content (entropy)
      • Blend to get smooth boundaries
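The square-selection step of the fusion described above can be sketched in a few lines: split each exposure into tiles, measure each tile's entropy, and keep the most informative one. This is a minimal sketch with names of my choosing; a real pipeline would also align the exposures and blend tile boundaries smoothly.

```python
import numpy as np

def tile_entropy(tile, bins=32):
    """Shannon entropy of a tile's intensity histogram (values in [0, 1])."""
    hist, _ = np.histogram(tile, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def fuse_by_entropy(exposures, tile=8):
    """For each tile, keep the exposure whose tile has the highest entropy."""
    h, w = exposures[0].shape
    out = np.empty((h, w))
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            blocks = [img[y:y + tile, x:x + tile] for img in exposures]
            best = max(range(len(exposures)),
                       key=lambda i: tile_entropy(blocks[i]))
            out[y:y + tile, x:x + tile] = blocks[best]
    return out
```

A blown-out (all-white) or crushed (all-black) tile has near-zero entropy, so the correctly exposed version of each region wins automatically.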

(Images: A. A. Goshtasby, András Osztroluczki and Erik Zajác)

 "Lens blur"
  o Wide DoF: (almost) everything is sharp in the scene
    • "Documentary" style
  o Shallow DoF: only objects within a certain distance interval are sharp, the rest is blurred
    • The photographer can show what he/she thinks is important in the scene
    • Hard to obtain with mobile camera modules due to physical constraints

(Image source: Google Research Blog)

 Software solution using one camera (Google Camera app)
  o Camera movement during exposure (a small video)
    • See the YouTube video
  o Estimating the distance of object points (works for objects close to us)
  o Follow-up interactive refocus using the depth map and a chosen blur strength

 Two identical lenses
  o 3D stereo – disappeared from the market
 Lenses with two different focal lengths
  o "Dual-lens smartphone cameras are coming, and this is why we want one"
  o Can switch lenses to magnify more distant subjects without resorting to digital zoom
 Lenses with different apertures
  o F/2 and F/2.4 apertures
    • Uses both cameras at the same time to capture several photos at different apertures
    • Lets you adjust the focal point and aperture after you've already taken the photo, with a choice of apertures from f/0.95 to f/16
 Different sensors
  o RGB + black&white: better low-light performance, since the BW sensor has no Bayer filter; hardware-based DoF simulation

 RGB-D sensor
  o Color + distance
    • Kinect-like sensor
  o Google Project Tango
    • Mapping the 3D interior of rooms
    • Small distances (a few meters)
    • No direct (sun)light
    • Development kit is available (in the USA)
    • Phab 2 Pro is coming (the first commercial Tango device)
    • Official YouTube video
  o "The next mobile imaging war won't be waged over megapixels"
    • "Though Mountain View is focused on 3D mapping, so-called depth camera tech could dramatically improve all the pictures you take with your smartphone."

 Native APIs available
  o Camera handling (photo, live view), bitmap I/O and display
  o Accessing data → mainly do-it-yourself processing!
    • Warning: do not reinvent the wheel!

 Dedicated image processing libraries
  o OpenCV
    • http://www.opencv.org
    • Open source (BSD licence) and multiplatform
    • Computer vision and machine learning functionality
  o FastCV
    • https://developer.qualcomm.com/docs/fastcv/api/index.html
    • From Qualcomm (the chip maker), mobile specific

 >2500 optimized algorithms
  o Basic algorithms
    • Filtering, edge detection, histograms, Hough transform, …
  o Complex functionality
    • Face detection and recognition, movement classification, object tracking, image stitching, content-based image search, …

 Multiple platforms and programming languages
  o >10 years of development, numerous contributors
  o C++ template interface
  o Python, Matlab, Java bindings
  o Windows, Linux, Mac OS, Android, iOS versions
    • Architecture-specific pre-built libraries
 On the smartphone camera live view
  o Fast processing is necessary

 Applications
  o Face detection (and recognition)
  o Content-based image search (client-server)
  o Object classification (client only)

 Usage
  o Anonymize photos (e.g., StreetView)
  o Tagging photos: detecting and recognizing participants
  o Security functions
  o Marketing (age, man/woman, …)
  o Detecting emotions

 Main steps
  o Detecting faces
  o Searching for the face in a database (recognition)

 Viola-Jones detector (2001)
  o Very fast (real time), high accuracy
  o Strong classifier built from weak classifiers (AdaBoost)
    • Weak classifier ~ somewhat better than random
  o Cascading
    • Fast rejection of trivial cases
  o Haar-like features
    • Intensity sums and differences inside rectangular regions
 Publications
  o Viola, P. and Jones, M. (2001). Rapid Object Detection using a Boosted Cascade of Simple Features. Conference on Computer Vision and Pattern Recognition.
  o Viola, P. and Jones, M. (2004). Robust Real-Time Face Detection. International Journal of Computer Vision 57(2), 137-154.

 Haar-like features
  o Rectangular regions of different sizes (windows)
  o Differences of intensity sums in windows ("blacks" – "whites")
    • Eyes-face (dark-bright); eyes-nose (dark-bright-dark); …
  o Can be computed efficiently
    • Using the integral image representation
  o Overwhelmingly many possible features. Which are the good ones?

 Integral image representation
  o At position (x, y) we find the sum of the intensity values to the "left" and "up":
    • ii(x, y) = Σ_{x' ≤ x, y' ≤ y} i(x', y')
    • Can be computed in one pass over the image
  o The sum of the intensity values inside rectangle "D" can be computed by indexing 4 array values:
    • D = P4 + P1 − (P2 + P3)
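The two formulas above translate directly into code: a double cumulative sum builds ii(x, y) in one pass, and any rectangle sum then needs only four array lookups. A minimal sketch (function names are mine):

```python
import numpy as np

def integral_image(img):
    """ii(y, x) = sum of i(y', x') for y' <= y, x' <= x, via two cumsums."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum inside [top..bottom] x [left..right] from 4 lookups:
    D = P4 + P1 - (P2 + P3), where
    P4 = ii[bottom, right], P1 = ii[top-1, left-1],
    P2 = ii[top-1, right],  P3 = ii[bottom, left-1]."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```

A Haar-like feature value is then just the difference of two or three such `rect_sum` calls, which is what makes evaluating hundreds of thousands of candidate features feasible.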

 Learning phase with AdaBoost
  o Positive and negative annotated examples (24x24 pixels)
  o Long learning time: can take hours or even days
  o Cascaded: fast rejection of regions where there is no face

 Evaluation
  o Hundreds of thousands of features
  o Tasks
    • Determine the cascades of features
    • Determine the feature weights inside the cascades

o(x1, x2, …, xd) = 1 if w1·x1 + … + wd·xd + w(d+1) > 0, and −1 otherwise
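The thresholded linear form above can be sketched directly; the weight vector carries the bias w(d+1) as its last entry (names are mine):

```python
def weak_output(x, w):
    """o(x1..xd): 1 if w1*x1 + ... + wd*xd + w_{d+1} > 0, else -1.

    The last entry of w is the bias term w_{d+1}."""
    s = sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]
    return 1 if s > 0 else -1

# A tiny AND-like classifier over two binary features:
print(weak_output([1, 1], [1.0, 1.0, -1.5]))   # fires only if both are on -> 1
```

AdaBoost then combines many such weak outputs, each weighted by how well it performed on the training set.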

 Training set examples

Source: University of Cambridge Digital Technology Group. Available online at http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip

(Source: Prof. Padhraic Smyth, University of California, Irvine)

 Detection in images
  o Detector size of 24x24 pixels; 6060 features in 38 layers of cascade
  o Scaling
    • Change the size of the detector (not the image) by a factor of 1.25
  o Position
    • Moving window
  o Final detection
    • Requires overlapping detections (at least e.g., 3)
    • Post processing: merge overlaps into a single detection
  o Computing time
    • In 2003, using a 700 MHz CPU: 0.067 secs for images of 384x288 pixels
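The scan strategy above (grow the 24x24 detector by 1.25 per round and slide it over the fixed-size image) can be sketched as an enumeration of candidate windows; the stride handling is my own simplification:

```python
def sliding_windows(img_w, img_h, base=24, scale=1.25, step=2):
    """Enumerate (x, y, size) detector placements.

    The detector grows by `scale` each round (the image is not resized);
    the slide step is scaled along with the window."""
    windows = []
    size = float(base)
    while int(size) <= min(img_w, img_h):
        s = int(size)
        stride = max(1, round(step * s / base))
        for y in range(0, img_h - s + 1, stride):
            for x in range(0, img_w - s + 1, stride):
                windows.append((x, y, s))
        size *= scale
    return windows
```

Each window would be fed to the cascade, which rejects most of them after the first cheap stage; only windows passing all stages become detections, which are then merged.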

 Results

(Source: Prof. Padhraic Smyth, University of California, Irvine)
  o http://makematics.com/research/viola-jones/

 Object detection approaches
  o opencv_haartraining (older versions)
  o opencv_traincascade (newer versions)
    • Haar-like features
    • Local binary pattern (LBP) features
  o You should provide negative and positive examples
  o Can be used to classify other types of objects too
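To illustrate the LBP features mentioned above, here is the basic 3x3 operator: each of the eight neighbors that is at least as bright as the center contributes one bit, giving an 8-bit code per pixel (the bit ordering below is my choice; implementations differ):

```python
def lbp_code(patch):
    """8-bit local binary pattern code of a 3x3 patch.

    Neighbors are visited clockwise from the top-left corner; a neighbor
    >= the center pixel sets its bit."""
    c = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbors):
        if n >= c:
            code |= 1 << bit
    return code
```

Histograms of these codes over image regions serve both as cascade features and as the descriptor behind the LBP histogram face recognizer.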

 Face recognition
  o Eigenfaces, Fisherfaces, and LBP histogram approaches

 Image source
  o Photo or video

 for Android SDK
  o Processing the detected face
    • Blinking, smile
    • Face orientation
    • Face recognition (from version 2.1)
    • Blog

 facereclib
  o https://pypi.python.org/pypi/facereclib
  o Comparing face recognition algorithms using standard databases

 Many applications in Google Play
  o Search for "face detection"

 Huge amount of image data
  o Cheap digital imaging
    • Digital cameras, mobile devices with cameras
  o Internet
    • Easy sharing of photos
→ Need for automatic image retrieval methods

 Decades of research
  o CBIR (Content-Based Image Retrieval)
  o QBE (Query by Example)
  o Paper describing earlier approaches
    • Ying Liu, Dengsheng Zhang, Guojun Lu, Wei-Ying Ma. A survey of content-based image retrieval with high-level semantics. Pattern Recognition 40 (2007), pp. 262–282.

 Problem
  o Usually there is no direct connection between low-level features and higher-order, abstract descriptions (the "semantic gap")

 Image database types
  o Specialized
    • Photos from a well-defined topic
    • E.g., plant leaves, flowers, cars, …
  o Wide range of photos
    • Heterogeneous
    • E.g., photos uploaded to social networks (Flickr, Facebook, Google Photos, …)

 Three levels of search
  o Level 1
    • Search based on low-level features
    • "Find images similar to this one"
    • The user should provide a photo or sketch
  o Level 2
    • Inference based on features
    • "Find photos of flowers!"
  o Level 3
    • Search based on abstract attributes
    • "Find photos of happy people!"
 At levels 2 and 3 it is important to compensate for the semantic gap!
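A Level-1 "find images similar to this one" query can be as simple as comparing global feature histograms of the query and each database image. A minimal grayscale sketch with my own function names (real systems use color, texture, and local descriptors):

```python
import numpy as np

def gray_histogram(img, bins=16):
    """Normalized intensity histogram of an image with values in [0, 1]."""
    h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return h / h.sum()

def hist_intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions,
    smaller for dissimilar ones."""
    return float(np.minimum(h1, h2).sum())
```

Ranking the database by this score returns the most similar-looking images; the semantic gap shows up precisely when such low-level similarity fails to match what the user meant.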

 Approaches
  o Image content + surrounding text (in web pages)
  o Object ontologies
  o Machine learning
  o Providing relevance feedback
  o Semantic templates
  o Bag of visual words

 Deep learning
  o Popular buzzword today: gives way better results than other approaches
  o Multilayer nonlinear processing
  o Can be supervised or unsupervised
  o Can learn multiple levels of abstraction (concept hierarchy)
  o Hierarchical representation: higher levels are built on lower-level concepts
  o Needs a really large amount of data!

 Usually client-server solutions
  o The mobile uploads a photo, the server returns the result
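The multilayer nonlinear processing described above, in its simplest form: each layer applies an affine map followed by a nonlinearity, so every layer operates on the previous layer's features. A toy forward pass (not any specific network architecture; names are mine):

```python
import numpy as np

def forward(x, layers):
    """Forward pass of a small multilayer network.

    `layers` is a list of (weight, bias) pairs; every layer but the
    last is followed by a ReLU nonlinearity."""
    for w, b in layers[:-1]:
        x = np.maximum(0.0, x @ w + b)   # nonlinearity between layers
    w, b = layers[-1]
    return x @ w + b                     # raw scores from the final layer
```

Training (finding the weights) is the expensive part that needs the large labeled datasets mentioned above; the forward pass itself is cheap enough to run on a phone.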

 Android applications
  o PlinkArt: search for paintings (acquired by Google)
  o Google Goggles
    • Famous buildings, paintings
    • Decoding visual codes
    • Business card scanning
    • Image-based identification of books and other consumer products
    • Sudoku solver

 Analysing photo content
  o What is visible in the image?
    • Can give a list of object classes
    • Can caption photos
    • "Google's AI is getting really good at captioning photos" https://www.engadget.com/2016/09/23/googles-ai-is-getting-really-good-at-captioning-photos/
 ImageNet
  o Large database of annotated photos
    • Over 10 million annotations provided by the community
    • Categorized into thousands of fine-grained categories (e.g., 120 categories of dog breeds)
  o Challenge
    • Detect objects in photos from 1000 categories

    • Earlier approaches: 25% error rate; deep learning: a few percent!

 Google Cloud Vision API
  o Made available in 2016
    • "Google opens its Vision API in public beta to give any app the power of Google Photos"
  o CLOUD VISION API BETA
  o Quickstart
 Sample applications
  o Label Tagging Using Kubernetes; Landmark Detection Using Google Cloud Storage; Text Detection Using the Vision API
  o Object Detection Using Android Device Photos
 Main features of usage
  o Needs an internet connection
  o Can be tested for free (for larger-scale use you have to pay a fee)

 Usage
  o Accessible from smartphone apps too (client-server solution)
  o Gives a description of the image contents
    • »It quickly classifies images into thousands of categories (e.g., "sailboat", "lion", "Eiffel Tower"), detects individual objects and faces within images, and finds and reads printed words contained within images.«
  o Try Google Photos!
    • Search for photos containing dogs, rivers, … in your own photo library
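As a sketch of the client side of this client-server setup: a label-detection call to the Vision API is a POST of a JSON body with the base64-encoded image to the v1 `images:annotate` endpoint. Roughly like this (check the current API reference before relying on exact field names; the function name is mine):

```python
import base64
import json

def label_request(image_bytes, max_results=5):
    """JSON body for POST https://vision.googleapis.com/v1/images:annotate
    asking for label detection on one base64-encoded image."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION",
                          "maxResults": max_results}],
        }]
    }

# The app would send json.dumps(label_request(jpeg_bytes)) with its API key
# and parse the returned labelAnnotations list.
body = json.dumps(label_request(b"...raw JPEG bytes..."))
```

Swapping `"LABEL_DETECTION"` for other feature types (face, landmark, text detection) selects the other capabilities listed above.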

 TensorFlow
  o Google's open source AI package
    • Available on GitHub
  o C++, Python API + bindings to other languages
  o Desktop PC + mobile environments
    • https://www.tensorflow.org/mobile/
    • Android, iOS, Raspberry Pi
 Features
  o After the learning stage (which can take a long time on a PC, making massive use of the GPU),
  o the model can be used locally on the mobile camera live view (really fast)
 Android examples
  o https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android/
  o An APK is available (based on Google's Inception model)

    • Provides TF Classify, TF Detect, TF Stylize
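On-device classification demos of this kind end with the same post-processing step: a softmax over the model's output logits, then the top-k most probable labels are shown on the live view. A library-free sketch of that final step (names are mine; the actual demos do this inside the TensorFlow Android code):

```python
import numpy as np

def top_k_labels(logits, labels, k=3):
    """Softmax over the logits, then the k most probable (label, prob) pairs."""
    z = np.asarray(logits, dtype=float)
    p = np.exp(z - z.max())      # subtract max for numerical stability
    p /= p.sum()
    order = np.argsort(p)[::-1][:k]
    return [(labels[i], float(p[i])) for i in order]

print(top_k_labels([2.0, 0.5, 1.0], ["cat", "dog", "boat"], k=2))
```

For an Inception-style model the logits vector has one entry per ImageNet class, so `labels` would be the 1000-entry class list shipped with the model.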