Attila Tanács
Department of Image Processing and Computer Graphics University of Szeged Mobile platforms o Modern smartphones
Mobile imaging o Physical constraints, software and hardware tricks
Mobile image processing o Software packages o Use cases
2 3 Penetration Feature phone o Billions of potentional users o Basic functions o More SIM cards than people • Voice calls, SMS o Closed system o Limited developer Usage possibilities o Communication • Basic: voice, SMS • Internet-based: Facebook, Smartphone messengers, email, … o Internet access o Source of information o Multitude of communication • Weather, stock indices, channels exchange rate, news, o navigation, … Expandable „knowledge” o Entertainment o Open environment for developers • Games, e-books, music, multimedia o App stores
4 First mobile phone call (1973) o Motorola (Martin Cooper → Joel S. Engel) Manager calculators (1980s, 1990s) o EPOC o Personal digital assistant, no phone! Apple Newton platform (1987-1998) o Handwriting recognition, digital assistant, no phone! PDA era (1996-2008) o Palm, Windows CE, Windows Mobile o Still no phone, multimedia push Modern smartphones (2007-) o Convergence of mobile phones and PDAs o Nokia communicators (phone with data capabilities) o RIM Blackberry, Symbian o Apple iPhone, Android, Windows Phone, … • Radical market change from 2007! Tablets, phablets o Decline of PCs 5 Since 2013:
Android ~80% iOS ~20% Others ~ 0%
6 CPU o Mainly ARM architecture (32 and 64-bits) o 16 MHz → 2.3 GHz (quad and octa core) o Dedicated GPU Memory o 1 MB → 1–3 GB Display o Resistive: stylus – old technology! o Capacitive: multi-finger taps o Resolution: 160x160 black-white → 4K, 2.55’’–7’’ o Autostereoscopic 3D displays New sensors o Camera, GPS, accelerometer, magnetic compass, gyroscope, microphone, magnetic field, light sensor, barometer,, … 7 1997: Sharp, Kyocera: initial experiments (in Japan) 2000: First phone camera o Sharp J-SH04, 0.1 MP resolution 2000: Eyemodule Digital Camera Springboard Module for Handspring Visor o 320x240 resolution 2002: Sony Ericsson T68i o First MMS-capable phone (with externally attachable camera) 2002: Sanyo SCP-5300 o First camera phone in USA, VGA (0.3 MP) resolution 2003: more camera phones than digital cameras 2004: Nokia sells the most digital cameras 2006: half of the mobile phones have a camera 2008: Nokia is the biggest camera manufacturer (analogue and digital all together!)
2010: almost all phones have cameras 8 Huge social impact! Taking notes o Family o Opening hours, • Sending photos and videos to advertisements any distances Visual codes • Instantly o Bar codes, QR codes o Politics • Broadcast events Photo processing and filtering • Inform news channels o Montage, HDR, time lapse, o Science removing moving objects, … • Meteorology Photo sharing and upload • Tsunami o Instagram, Twitter • … Content-based image search, augmented reality o Client-server applications Image analysis o Face detection, OCR, … Video calls … 9 Visual codes o Product identification o Business card scanning o URL o General text Complex photo functions o Smile/blink detection o Post processing o HDR imaging o Panorama creation (even using sensor information) o Taking 3D photos
10 Google Goggles (Android) o Buildings, paintings o Visual codes o Business card o Books, products o Solving Sudoku
Google Cloud Vision API Source: Google System BlogSpot o See later in this presentation
11 Optical character recognition, scanning o Scanning and recognizing text on the go o OfficeLens Translation o Translating words and phrases in camera live view o (Word Lens, CamTranslator,) Google Translate Augmented Driving (iPhone) o Lane detection, approaching vehicle warning o iOnRoad Analyzing skin pigments o Doctor Mole Motion detection o Security applications
… 12 Imaging sensor o Sensitivity (ISO) o Resolution (MP) o Size, type Lens o Shutter speed o Focal length • Zoom o Aperture Illuminance o Amount, evenness • → HDR (High Dynamic Range) o Color temperature Source: Stanford University Larger sensor size → o Less noise o Better depth of field o Better photos o But larger lens needed!
Few examples o 35 mm: DSLR o APS-C: DSLR, MILC o 4/3’’: MILC o 1/1.8’’: top compact Source: Engadget o 1/2.3’’: compact, better mobile o 1/3.2’’: ordinary mobile In mobile devices o Usually small sized sensors o Comparable to average P&S cameras
Nokia smartphone sensors Mobile devices P&S cameras Smartphones
Mobile devices MILC DSLR
Average Point&Shoot cameras Sources: Camera Image Sensor Comparison Brightness o Amount of light getting through the lens o Aperture (F-stop) and shutter speed play role
Focal length o Determines the angle of view o Fixed • Mobile or „pancake” lens (MILC, DSLR) • Simpler buildup → smaller lens o Zoom • 3x-5x: compact, MILC, DSLR • 5x-40x: ultrazoom (compact) cameras • Only very few smartphones have zoom!
Source: Camera Lenses Research Project Smartphone cameras
Source: http://www.digital-photography-student.com/lens-focal-length-explained/ High enough megapixel count, but Cheap and small sized sensors o Low power consumption and cheap price is dominant Weak lens o Longer shutter speeds Megapixel count alone is not a good • Blurry photos, especially in low light situations indicator of o Weak flash or lack thereof imaging quality! • Works acceptably only in good illuminance o Miniaturized size • Geometric distortions Limited parameter settings o Automatism, slow! Advantages o Small, always at hand, easily available • „The best camera is the one that’s with you!” (Chase Jarvis) o Might replace low-end compact cameras o Photos can be edited and shared instantly o Good quality video
Distinguishing factor for manufacturers o OS and other phone features are mainly similar o Manufacturers pay attention to camera capabilities • Especially in high-end devices
19 Better sensors o BSI, Exmor, IsoCell, Ultrapixel, … o Megapixel war is (mainly) over! o Dedicated processing chip o OIS (optical image stabilization) Camera function o Zoom lens (thicker device) Source: PixInfo o Faster and more precise focusing using laser light o Dual-lens systems (with different parameters) o Better front side camera module („selfie” camera) Software photo processing o HDR o Simulate depth-of-field o Sensor-based panorama 20 Source: iPhone Informer Panorama creation o Horizontal and sphere (StreetView-like) panoramas
High Dynamic Range photos o Handling uneven lighting conditions
Artificial depth-of-field simulation (”lens blur”) o Highlight what is important in the scene o Refocus: interactively select area after image capture
22 Google Camera o Taking photos using sensory help (gyroscope) o Blending overlapping photos to eliminate blockiness
Source: downloadol.com 23 HDR (High Dynamic Range) o Difficult illumination situations (dark and bright regions in the image) o Software solution • Series of photos with different exposure values (e.g., ±2 EV, 3-5 photos) • Fixed position, stand still scene • Photo fusion • Divide the photos to little squares • Select the one with highest information content • Entropy • Blending to get smooth boundaries
Images: A.A. Ghostasby, András Osztroluczki and Erik Zajác ”Lens blur” o Wide DoF: (almost) everything is sharp in the scene • „Documentary” o Shallow DoF: only objects at a certain distance interval are sharp, others are blurred • Photographer can show what he/she thinks is important in the scene • Hard to obtain using mobile camera modules due to physical constraints
Source: Google Research BlogSpot Software solution using 1 camera (Google Camera app) o Camera movement during exposition (small video) • See YouTube video o Estimating distance of object points (works for objects close to us) o Follow-up interactive refocus using the depth map and blur strength Source: Google Research BlogSpot Source: Google Research BlogSpot 27 Two identical lenses o 3D stereo – disappeared from markets Lenses with two different focal lengths o Dual-lens smartphone cameras are coming, and this is why we want one o Can switch lenses to magnify more distant subjects without resorting to digital zoom Lenses with different aperture o Honor 6 Plus: F/2 and F/2.4 apertures • It uses both cameras at the same time to capture several photos at different apertures • Lets adjust the focal point and aperture after you've already taken a photo, as well as a choice of apertures — from f/0.95 to f/16 Different sensors o RGB + Black&White: better low light performance since BW is without Bayer filtering; hardware based DoF simulation 28 RGB-D sensor o Color + distance • Kinect-like sensor o Google Project Tango • Mapping 3D interior of rooms • Small distances (few meters) • No direct (sun)light • Development kit is available (in the USA) • Lenovo Phab 2 Pro phablet is coming (first commercial Tango device) o Official YouTube Video o The next mobile imaging war won't be waged over megapixels • „Though Mountain View is focused on 3D mapping, so-called depth camera tech could dramatically improve all the pictures you take with your smartphone.” Native APIs available o Camera handling (photo, live view), bitmap I/O and display o Accessing pixel data → mainly do-it-yourself processing! • Warning: Do not reinvent the wheel!
Dedicated image processing libraries o OpenCV • http://www.opencv.org • Open source (BSD licence) and multiplatform • Computer vision, machine learning functionality o FastCV • https://developer.qualcomm.com/docs/fastcv/api/index.html • From Qualcomm (the chip maker), mobile specific 30 >2500 optimized algorithms o Basic algorithms • Filtering, edge detection, histogram, Hough-transform, … o Complex functionality • Face detection and recognition, movement classification, object tracking, image stitching, content-based image search, …
Multiple platforms and programming languages o >10 years of development, numerous contributors o C++ template interface o Python, Matlab, Java bindings o Windows, Linux, Mac OS, Android, iOS versions • Architecture-specific pre-built libraries On smartphone camera live view o Fast processing is necessary
Applications o Face detection (and recognition) o Content-based image search (client-server) o Object classification (client only)
32 Usage o Anonimize photos (e.g., StreetView) o Tagging photos: detecting and recognizing participants o Security functions o Marketing (age, man/woman, …) o Detecting emotions
Main steps o Detecting faces o Search face in database (recognition)
33 Viola-Jones detector (2001) o Very fast (real time), high accuracy o Strong classifier built from weak classifiers (AdaBoost) • Weak classifier ~ Somewhat better than random o Cascading • Fast rejection of trivial cases o Haar-like features • Intensity sums and differences inside rectangular regions Publication o Viola, P. and Jones, M. (2001). Rapid Object Detection using a Boosted Cascade of Simple Filters. Conference on Computer Vision and Pattern Recognition. o Viola, P. and Jones, M. (2004). Robust Real-Time Face Detection. International Journal of Computer Vision 57(2), 137-154. 34 Haar-like features o Rectangular regions of different sizes (windows) o Differences of intensity sums in windows („blacks” – „whites”) • Eyes-face (dark-bright); eyes-nose (dark-bright-dark); … o Can be computed efficiently • Using integral image representation • Overwhelmingly many possible features. Which are the good ones?
35 Integral image representation o At position (x,y) we can find the sum of the intensity values to the „left” and „up” ′ • 푖푖 푥, 푦 = 푥′≤푥,푦′≤푦 푖(푥 , 푦′) • Can be computed by one pass over the image o The sum of intensity values inside rectangle ”D” can be computed by indexing 4 array values
• 퐷 = 푃4 + 푃1 − 푃2 + 푃3
36 Learning phase with AdaBoost o Positive and negative annotated examples (24x24 pixel) o Long learning time: can take even hours/days o Cascaded: fast rejection of regions where there is no face
37 Evaluation o Hundreds of thousands of features o Tasks • Determine cascades of features • Determine feature weights inside cascades
= 1 (if w1x1 +… wd xd + wd+1 > 0) o(x1, x2,…, xd-1, xd)
= -1 (otherwise)
38 Training set examples
Source: University of Cambridge Digital Technology Group. Available online at http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_f aces.zip.
39 Source: Prof. Padhraic Smyth, University of California, Irvine Detection in images o Size of 24x24pixels; 6060 features in 38 layers of cascade o Scaling • Change size of detector (not the image) 1.25x o Position • In moving window o Final detection • At overlapping detections (at least e.g., 3) • Post processing: merge overlaps to single detection o Computing time • In 2003 using 700 MHz CPU: 0.067 secs for image sizes of 384 x 288 pixels
40 Results
41 Source: Prof. Padhraic Smyth, University of California, Irvine o http://makematics.com/research/viola-jones/
42 Object detection approaches o opencv_haartraining (older versions) o opencv_traincascade (newer versions) • Haar-like features • Local binary pattern (LBP) features o Should provide negative and positive examples o Can be used to classify other type of objects too
Face recognition o Eigenface, Fisherfaces, and LBP histogram approaches
Image source
o Photo or video 43 Qualcomm Snapdragon for Android SDK o Processing detected face • Blinking, smile • Face orientation • Face recognition (from version 2.1) • Blog
facereclib o https://pypi.python.org/pypi/facereclib o Comparing face recognition algorithms using standard databases
Many applications in Google Play o Search for ”face detection” 44 Huge amount of image data o Cheap digital imaging • Digital cameras, mobile devices with camera o Internet • Easy sharing of photos → Need for automatic image retrieval methods
Decades of research o CBIR (Content-based image retrieval) o QBE (Query by Example) o Paper describing earlier approaches • Ying Liua, Dengsheng Zhanga, Guojun Lua,Wei-Ying Mab. A survey of content-based image retrieval with high-level semantics. Pattern Recognition 40 (2007) pp. 262–282. 45 Problem o Usually there is no direct connection between low level features and higher order, abstract descriptions
Image database types o Specialized • Photos from a well defined topic • E.g., plant leaves, flowers, cars, … o Wide range of photos • Heterogeneous • E.g., photos uploaded to social networks (Flickr, FaceBook, Google Photos, …)
46 Three levels of search o Level 1 • Search based on low level features • ”Find similar images to this one” • Should provide a photo or sketch o Level 2 • Inference based on features • ”Find photos of flowers!” o Level 3 • Search based on abstract attributes • ”Find photos of happy people!” At levels 2 and 3 it is important the compensate for the semantic gap!
47 Approaches o Image content + surrounding text (in web pages) o Object antologies o Machine learning o Providing relevance feedback o Semanic templates o Bag of visual words
Deep learning o Popular buzzword today: gives way better results than other approaches o Multilayer nonlinear processing o Can be supervised or non supervised o Can learn multiple levels of abstraction (conception hierarchy) o Hierarchical representation: Higher levels are based on lower level concepts o Needs really large amuont of data! 48 Usually client-server solutions o Mobile uploads photo, server gives result
Android applications o PlinkArt: search for paintings (Google acquired) o Google Goggles • Famous buildings, paintings • Decoding visual codes • Business card scanning • Image based identification of books and other comsumer products • Sudoku solver
49 Analysing photo content o What is visible in the image? • Can give a list of object classes • Can caption photos • Google's AI is getting really good at captioning photos https://www.engadget.com/2016/09/23/googles-ai-is-getting-really-good- at-captioning-photos/ ImageNet o Large database of annotated photos • Over 10 million annotations provided by the community • Categorized into thousands of fine-grained categories (e.g., 120 categories of dog breeds) o Challenge • Detect objects in photos from 1000 categories
• Earlier approaches: 25% error rate; deep learning: few percent! 50 Google Cloud Vision API o Made available in 2016 • Google opens its Vision API in public beta to give any app the power of Google Photos o CLOUD VISION API BETA o Quickstart Sample Applications o Label Tagging Using Kubernetes; Landmark Detection Using Google Cloud Storage; Text Detection Using the Vision API o Object Detection Using Android Device Photos Main features of usage o Needs internet connection o Can be tested for free (for larger scale use you have to pay a fee) 51 Usage o Accessible from smartphone apps also (client-server solution) o Gives description of image contents • »It quickly classifies images into thousands of categories (e.g., "sailboat", "lion", "Eiffel Tower"),detects individual objects and faces within images, and finds and reads printed words contained within images.« o Try Google Photos! • Search for photos containing dogs, river, … in your own photo library
52 TensorFlow o Google’s open source AI package • Available on Github o C++, Python API + bindings to other languages o Desktop PC + mobile environment • https://www.tensorflow.org/mobile/ • Android, iOS, Raspberry Features o After the learning stage (that can take a loooong time on PC massively using GPU architecture), o Model can be used locally on mobile camera live view (really fast) Android examples o https://github.com/tensorflow/tensorflow/tree/master/tensorflow/exa mples/android/ o APK is available (based on Google’s Inception model)
• Provides TF Classify, TF Detect, TF Stylize 53