ENERGY EFFICIENT ENABLING TECHNOLOGIES FOR SEMANTIC VIDEO PROCESSING ON MOBILE DEVICES

by

Daniel Larkin, B.Eng.

Submitted in partial fulfilment of the requirements

for the Degree of Doctor of Philosophy

Dublin City University
School of Electronic Engineering

Supervisor: Prof. Noel E. O’Connor

September, 2008

I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Doctor of Philosophy, is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work.

Signed: Daniel Larkin (Candidate)

ID:

Date:

Table of Contents

List of Figures
List of Tables
Abstract
List of Acronyms
List of Peer-Reviewed Publications
Acknowledgements

1 Introduction
1.1 The Emergence of Mobile Multimedia
1.2 Grand Challenges Facing Next Generation Mobile Multimedia
1.2.1 Semantic Multimedia Processing
1.2.2 Multimedia Content Delivery to Mobile Devices
1.2.3 Multimedia Processing on Resource Constrained Mobile Devices
1.3 Research Motivation and Work Programme
1.3.1 Novel Face Detection and associated hardware acceleration
1.3.2 Hardware Acceleration of Motion Estimation
1.4 Research Objectives
1.5 Thesis Structure
1.6 Summary
2 Technical Background
2.1 Digital Video Compression
2.1.1 A Generic Video Compression System
2.1.2 Semantic Video Object based Compression
2.1.3 Image & Video Compression Standards
2.2 Semantic Video Object Segmentation
2.3 Thesis Contributions in the Context of Video Object Processing
2.4 Implementation Options
2.4.1 Summary of Implementation Options
2.5 Energy Efficient Design Principles
2.5.1 Low Power design techniques
2.5.2 Summary of Low Power Design
2.6 Conclusions
3 Face Detection: A Review of Popular Approaches
3.1 Face Detection State of the Art Review
3.1.1 Feature Extraction based face detection approaches
3.1.2 Appearance based face detection approaches
3.1.3 Prior art in hardware acceleration of face detection
3.2 Discussion: Suitable Algorithms for a Mobile Device
3.3 Fundamentals of Evolutionary Artificial Neural Networks
3.3.1 Artificial Neural Networks
3.3.2 Genetic Algorithms
3.3.3 Evolutionary Artificial Neural Networks
3.3.4 Neuro Evolution of Augmenting Topologies
3.4 Conclusions
4 A Novel Face Detection Training Algorithm
4.1 Training data preparation
4.2 NEAT based face detection training
4.3 Training Runs and Parameter Selection
4.3.1 Non-Seeded Topology Training
4.3.2 Seeded Topology Training
4.4 Conclusions on Training
4.5 Future Work
4.5.1 Automatic Input/Feature Selection
4.5.2 Using Alternative Low-Level Features
4.5.3 Detection of Side Profile and Rotated Faces
4.6 Summary of Contributions
5 Software Implementation of Trained Face Detection EANN
5.1 Normal Mode Face Detection Operation
5.1.1 Merging detection results
5.2 Profiling of Normal Mode Face Detection Operation
5.2.1 Software Power & Energy Consumption
5.3 Software Implementation & Algorithmic Optimisations
5.4 Results
5.4.1 Evaluation on the BioID face database
5.4.2 Failure Analysis on the CMU/MIT face database
5.5 Future Work
5.6 Summary of Contributions
6 EANN Hardware Accelerator
6.1 State of the Art Hardware EANN Review
6.1.1 Efficient Multiply Accumulation
6.1.2 Activation Function Generator
6.1.3 Number System and Precision Requirement Considerations
6.2 Proposed Hardware Architecture
6.3 Implementation of Proposed Architecture
6.3.1 Activation Function Generator
6.4 Results
6.4.1 Activation Function Results
6.4.2 Hardware Implementation Results
6.4.3 Benchmarking of Proposed Hardware Architecture against Prior Art
6.5 Future Work
6.6 Summary of Contributions
7 Binary Motion Estimation Hardware Accelerator
7.1 State of the Art Review
7.1.1 Binary Motion Estimation
7.2 Proposed Binary Motion Estimation Architecture
7.2.1 Reformulation of the Binary SAD Metric
7.2.2 4xPE Block Matching Architecture
7.3 Results
7.4 Future Work
7.4.1 16xPE Block-Matching Architecture
7.4.2 Possible PE improvements
7.5 Summary of Contributions
8 Conclusions & Future Work
8.1 Motivation for Research – A Summary
8.2 Summary of Thesis contributions
8.3 A Vision for the Future
Bibliography

List of Figures

1.1 Video scene decomposed into constituent semantic objects
1.2 Example of multiple applications leveraging a hardware ANN accelerator
2.1 An example of a digital still image and video sequence
2.2 Examples of YCbCr sampling modes
2.3 A Generic Video Encoder and Decoder
2.4 Temporal Redundancy in Video Sequences
2.5 Motion estimation taxonomy
2.6 Block sizes used in H.264 motion estimation
2.7 Probability of DCT coefficients being nonzero
2.8 Zig-zag scanning of quantised DCT coefficients
2.9 MPEG-4 Video Objects
2.10 MPEG-4 Binary Shape Encoder
2.11 Motion Vector difference for Shape
2.12 Contributions of this Thesis in context of Video object-based functionality
2.13 Dynamic power loss in a CMOS Inverter
3.1 Basic taxonomy of face detection algorithms
3.2 Block diagram of a generic feature extraction based face detection algorithm
3.3 Algorithmic options for appearance based face detection approaches
3.4 Rowley’s ANN based face detection algorithm
3.5 Haar-like Features used in Viola & Jones Cascaded Adaboost algorithm
3.6 Additional Haar features for rotation-invariant face detection
3.7 System Architecture of Face Detection Algorithm proposed by Nguyen et al.
3.8 Face Detection Hardware Architecture Proposed by Yang et al.
3.9 Basic Terminology associated with Artificial Neural Networks
3.10 Selection of popular activation functions
3.11 Examples of genetic algorithm binary crossover techniques
3.12 Taxonomy of EANN algorithms
3.13 Issues associated with topology genome representation & the crossover operator
3.14 Example of a NEAT genome and resultant phenotype
3.15 Use of innovation numbers in the mutation and crossover operators in NEAT
4.1 Preprocessing of FERET face training dataset
4.2 Example scenery image devoid of faces which was used during training
4.3 Starting topology seeded with localised features
4.4 Proposed NEAT-based face detection training
4.5 Simple example of over-training
4.6 Proposed Evolutionary Training Strategy
4.7 Non-seeded topology training run – iteration 1
4.8 Non-seeded topology training run – iteration 2
4.9 Non-seeded topology training run – iteration 3
4.10 Non-seeded topology training run – iteration 4
4.11 Non-seeded topology training run – iteration 5
4.12 Non-seeded topology training run – iteration 6
4.13 Seeded topology training run – iteration 1
4.14 Seeded topology training run – iteration 2
4.15 Seeded topology training run – iteration 3
4.16 Seeded topology training run – iteration 4
4.17 Seeded topology training run – iteration 5
4.18 Seeded topology training run – iteration 6
4.19 DCT Energy exploration
4.20 Energy versus the number of SA-DCT coefficients of each of the face samples
5.1 Example image pyramid
5.2 Software Profiling of Proposed Face Detection Algorithm
5.3 Power Profiling of Multiply Accumulate Operations on an ARM-920T processor
5.4 Power Profiling of a Sigmoid Activation Function on an ARM-920T processor
5.5 Efficient calculation of block skin totals
5.6 Effect of changing the face detection threshold
5.7 Representative results from the BioID face evaluation dataset
5.8 ROC graph generated from proposed algorithm when using the BioID dataset
5.9 Representative results from the CMU face evaluation dataset
5.10 ROC graph generated from proposed algorithm when using the CMU dataset
6.1 NEAT multimedia hardware accelerator
6.2 Neuron connection types
6.3 NEAT phenotype to hardware memory mapping
6.4 Simplified block diagram of the proposed EANN hardware accelerator
6.5 Simplified block diagram of the Neuron PE
6.6 Error of Minimax Spline versus a Taylor Series Spline
6.7 Genetic algorithm used in optimising location of spline knot points
6.8 Implementation of Activation function generator
6.9 Elements of the Efficient Multiply Accumulate Hardware
6.10 Neuron PE timing diagram
7.1 Generic hardware architecture for motion estimation
7.2 Systolic array architecture and PE proposed by Natajaran et al.
7.3 SAD Reformulation
7.4 Reformulated binary SAD Examples
7.5 Example of Run Length Coding
7.6 Regular and Inverse RLC pixel addressing
7.7 Search strategies
7.8 TOTAL_ref_blk Update
7.9 Macroblock repartitioning for 4xPE Architecture
7.10 RLC SAD Processing Element
7.11 4xPE Update logic
7.12 Memory Subsampling for 16xPE Architecture

List of Tables

1.1 Novel semantic visual object based applications
4.1 Tunable parameters in proposed face detection training scheme
4.2 Parameters for iteration 1 of non-seeded starting topology training run
4.3 Parameters for iteration 2 of non-seeded starting topology training run
4.4 Parameters for iteration 3 of non-seeded starting topology training run
4.5 Parameters for iteration 4 of non-seeded starting topology training run
4.6 Parameters for iteration 5 of non-seeded starting topology training run
4.7 Parameters for iteration 6 of non-seeded starting topology training run
4.8 Parameters for iteration 1 of seeded starting topology training run
4.9 Parameters for iteration 2 of seeded starting topology training run
4.10 Parameters for iteration 3 of seeded starting topology training run
4.11 Parameters for iteration 4 of seeded starting topology training run
4.12 Parameters for iteration 5 of seeded starting topology training run
4.13 Parameters for iteration 6 of seeded starting topology training run
4.14 Number of SA-DCT coefficients required to achieve specific energy targets
5.1 Clock Cycles Required to Classify a Single 20 × 20 Pixel Block
5.2 Benchmarking results for proposed algorithm against the BioID face database
5.3 Benchmarking results for proposed algorithm against the CMU face database
6.1 Maximum and average Sigmoid Approximation errors
6.2 Max and Mean Error approximations for other functions
6.3 Benchmarking of Proposed Architecture against Prior Art
6.4 Normalisation of power consumption from related research
7.1 RLC BME 4xPE versus Conventional Systolic Array BME
7.2 BME Synthesis Results and Benchmarking

ABSTRACT

Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery powered mobile computing devices, the additional algorithmic complexity of semantic object based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed which, from the outset, was designed to be amenable to being offloaded from the host to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.

List of Acronyms

1G First Generation

2G Second Generation

3G Third Generation

4G Fourth Generation

AI Artificial Intelligence

ANN Artificial Neural Network

BAB Binary Alpha Block

BAP Binary Alpha Plane

BME Binary Motion Estimation

CGM Constrained Generative Model

CIF Common Intermediate Format

CISC Complex Instruction Set Computer

CRT Cathode Ray Tube

DCT Discrete Cosine Transform

DPCM Differential Pulse Code Modulation

DWT Discrete Wavelet Transform

EA Evolutionary Algorithm

EANN Evolutionary Artificial Neural Network

FPGA Field Programmable Gate Array

FPS Frames Per Second

FSM Finite State Machine

GA Genetic Algorithm

GPU Graphics Processing Unit

GSM Global System for Mobile

HDTV High Definition Television

HMM Hidden Markov Model

HSI Hue Saturation Intensity

IDCT Inverse Discrete Cosine Transform

Kbps Kilobits per second

KLT Karhunen-Loève Transform

LCD Liquid Crystal Display

LUT Look Up Table

Mbps Megabits per second

ME Motion Estimation

MLP Multi-Layered Perceptron

MVPS Motion Vector Predictor for Shape

MVS Motion Vector for Shape

NEAT Neuro Evolution of Augmenting Topologies

PCA Principal Component Analysis

PDA Personal Digital Assistant

PE Processing Element

QCIF Quarter Common Intermediate Format

RGB Red, Green and Blue

RISC Reduced Instruction Set Computer

ROC Receiver Operator Characteristics

ROM Read Only Memory

RSST Recursive Shortest Spanning Tree

SAD Sum of Absolute Differences

SA-DCT Shape Adaptive Discrete Cosine Transform

SA-IDCT Shape Adaptive Inverse Discrete Cosine Transform

SDTV Standard Definition Television

SMS Short Messaging Service

SNoW Sparse Network of Winnows

SoC System on Chip

SOM Self Organising Maps

SVD Singular Value Decomposition

SVM Support Vector Machine

TWEANN Topology and Weight Evolving Artificial Neural Networks

VLC Variable Length Code

VoIP Voice over Internet Protocol

VOP Video Object Plane

WAP Wireless Application Protocol

List of Peer-Reviewed Publications

• Patents

1. “Early Exit Techniques for Digital Video Motion Estimation”, filed 2nd May 2005, Application number PCT/IE2005/000023.

• Journals

1. Andrew Kinane, Daniel Larkin and Noel O’Connor, “Energy-Efficient Acceleration of an MPEG-4 Codec”, EURASIP Journal on Embedded Systems, January 2007.

• Conferences

1. Daniel Larkin, Andrew Kinane and Noel O’Connor, “Towards Hardware Acceleration of Neuroevolution for Multimedia Processing Applications on Mobile Devices”, in the proceedings of the International Conference on Neural Information Processing, Hong Kong, China, October 3rd–6th 2006.

2. Daniel Larkin, Andrew Kinane and Noel O’Connor, “An Efficient Hardware Architecture for a Neural Network Activation Function Generator”, in the proceedings of the International Symposium on Neural Networks, Chengdu, P.R. China, May 29th–31st 2006.

3. Daniel Larkin, Valentin Muresan and Noel O’Connor, “A Low Complexity Hardware Architecture for Motion Estimation”, in the proceedings of the IEEE International Symposium on Circuits and Systems, Kos, Greece, 21st–24th May 2006.

4. Daniel Larkin, Valentin Muresan and Noel O’Connor, “An Efficient Motion Estimation Hardware Architecture for MPEG-4 Binary Shape Coding”, in the proceedings of the Irish Signals and Systems Conference, Dublin, Ireland, 1st–2nd September 2005.

5. Daniel Larkin, Andrew Kinane, Valentin Muresan and Noel O’Connor, “Efficient Hardware Architectures for MPEG-4 Core Profile”, in the proceedings of the Irish Machine Vision and Image Processing Conference, Belfast, Northern Ireland, August 30th–31st 2005.

6. Noel O’Connor, Valentin Muresan, Andrew Kinane, Daniel Larkin, Sean Marlow and Noel Murphy, “Hardware Acceleration Architectures for MPEG-Based Mobile Video Platforms: A Brief Overview”, in the proceedings of the 4th Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), London, England, 9th–11th April 2003.

• Contributions to MPEG-4 (Non peer reviewed)

1. Daniel Larkin, Andrew Kinane and Noel O’Connor, “Conformance testing of Binary Motion Estimation for MPEG-4 Binary Shape Coding”, Contribution 13182 to AHG on MPEG-4 Part-9: Reference Hardware, ISO/IEC JTC1/SC29/WG11, Montreux, Switzerland, April 2006.

2. Daniel Larkin, Valentin Muresan and Noel O’Connor, “Updated Status and documentation of the Shape Coding Binary Motion Estimation Hardware Acceleration Module”, Contribution N6759 to AHG on MPEG-4 Part-9: Reference Hardware, ISO/IEC JTC1/SC29/WG11, Palma de Mallorca, Spain, October 2004.

3. Daniel Larkin, Valentin Muresan and Noel O’Connor, “Hardware Acceleration Module for MPEG-4 binary shape coding motion estimation”, Contribution M11092 to AHG on MPEG-4 Part-9: Reference Hardware, ISO/IEC JTC1/SC29/WG11, Redmond, USA, July 2004.

ACKNOWLEDGEMENTS

It is a pleasure to thank the many people who have contributed in making this thesis possible.

Firstly, I would like to thank my supervisor, Prof. Noel O’Connor, for his guidance and support during this research. In particular, I’m very grateful for his advice and invaluable suggestions throughout the thesis writing period. I’d also like to thank all the past and present members of the hardware group, particularly Andrew Kinane, Kealan McCusker and Val Muresan, for all their technical assistance. I owe a great debt of gratitude to my mother and sisters for their support and encouragement over the years. I am also tempted to individually thank all of my friends. However, there are too many to mention, so I will simply say thank you very much to you all. Lastly, and most importantly, I wish to thank Fiona for putting up with the late hours and shortened weekends.

Without her constant encouragement and love, this research would never have been completed.

CHAPTER 1

Introduction

Relentless progress in semiconductor technology is a key enabling factor in the continuous evolution of modern connected computing solutions [1][2]. Both now and historically, this evolution is driven by the diverse needs of users in academic, military, business and home environments. For example, the emergence of key computing platforms (mainframe, PC, etc.) and different application families (databases, spreadsheets, data networking, computer gaming, etc.) can be directly attributed to these diverse needs. Moreover, whereas once computer technology was only considered as a tool to assist in scientific and business calculations, it is now also deeply ingrained in many fundamental facets of our daily lives, such as entertainment (digital films, digital TV, etc.) and communications (Voice over Internet Protocol (VoIP), email, instant messaging, etc.).

With such ubiquity, the choice of which platform (server, PC, laptop, mobile device, etc.) a user employs is typically a trade-off between cost and/or requirements (computational resources, networking resources, interface, mobility, etc.). Increasingly, mobile devices such as mobile phones and Personal Digital Assistants (PDAs) are becoming a viable and popular platform for a range of applications. In 2005 alone there were over 800 million mobile phones sold [3][4]. This compares to approximately 200 million desktops, notebooks and servers shipped in the same year [5]. In 2006 there were over 1 billion mobile phones sold [6].

There are many contributing factors to the popularity and success of such mobile devices. A fundamental desire for untethered voice communications fuelled the success of the initial First Generation (1G) and Second Generation (2G) mobile phones. Emerging Third Generation (3G) mobile devices have a much broader appeal due to improved computational and telecommunications capabilities. These mobile devices allow multiple forms of effective and immediate communication. In addition, the improving general purpose computing performance of these devices is enabling application convergence (e.g. telephone, video conferencing, Short Messaging Service (SMS), email, internet browsing, gaming, digital camera, digital audio/video player, mobile TV, etc.) [7][8][9]. These additional applications are beneficial in both personal and corporate use scenarios, and allow “dead time” (e.g. commuting, queueing, etc.) to be used more productively. This is a particularly valuable feature for today’s society, which places such a high value on the commodity of time. By augmenting the technical capabilities and improving mass market appeal, mobile devices are now frequently designed and marketed as highly desirable, premium priced style icons [10][11]. It is not surprising then to find users who feel a strong sense of attachment to their mobile device. The growing market for device personalisation (ring tones, wallpapers, coloured fascias), which was worth over $3 billion worldwide in 2005, is strong evidence of the desire of a subset of users to extend their personality and individualism to their mobile device [12]. This is a clear indication of how deeply ingrained mobile devices have become in today’s social fabric. Coupled with the technical advances, this demonstrates how increasingly important the ubiquitous mobile device will become as a general purpose computing platform in the future.

1.1 The Emergence of Mobile Multimedia

The widespread availability of low cost digital capture and playback devices with ever increasing storage capacity, coupled with rapid advances in signal processing, have now made digital multimedia universal [13][14][11]. These technological advances, alongside changing human behaviour, have affected all aspects of how multimedia is now created, stored, distributed and consumed. For example, traditional sources of video content (TV, film, etc.) are being complemented by user created content (e.g. mobile video calls, video messaging, video blogs, etc.) [15]. In this new paradigm, users are frequently empowered by the ubiquitous battery powered mobile device and motivated by factors such as a desire for graphically enriched communications, or for filling “dead time” by consuming multimedia content. These devices have the potential to offer an engaging mobile multimedia experience for the user. Furthermore, reflecting a symbiotic relationship, increased multimedia processing capability is considered vital for future mobile devices [16].

[Figure 1.1 shows a video scene segmented into its background, water, small boat and large boat objects.]

Figure 1.1: Video scene decomposed into constituent semantic objects

However, numerous technical challenges remain to be solved before next generation mobile multimedia can become a reality.

1.2 Grand Challenges Facing Next Generation Mobile Multimedia

The phenomenal growth in digital multimedia, in conjunction with the emergence of convergent mobile devices, has highlighted a number of open research challenges facing next generation mobile multimedia. Amongst others, these challenges include semantic multimedia processing, multimedia content delivery and the issues arising as a result of the constrained resources available on a mobile device. This section briefly outlines these challenges in order to give context for the research undertaken in this thesis. A more thorough discussion of these issues is presented in Chapter 2.

1.2.1 Semantic Multimedia Processing

Rapidly increasing volumes of digital multimedia content emphasise a need for improved content structure awareness in multimedia processing algorithms. For example, this requirement becomes apparent by witnessing the difficulties encountered whilst searching large personal digital image or video libraries devoid of meta-data. Moreover, bridging the gap between a correlated collection of pixels and the overall semantics of these pixels is of considerable benefit to the entire end to end life cycle of digital visual data. The same conclusion can also be drawn about audio content; however, this research focuses on visual data only.

Table 1.1: Novel semantic visual object based applications

Phase                   Example Applications
Creation                Video object reuse, virtual scene creation, intelligent capture technology
Storage & Compression   Region of interest coding, spatial/temporal scalable visual object coding, very low bit rate object model based coding
Consumption & Viewing   Semantic transcoding, video summarisation, content based interaction, intelligent searching & browsing
Transmission            Increased error protection for semantically important objects, event triggering based on detection of semantic objects

By processing visual data in terms of the encapsulated semantic objects (see Fig. 1.1), improvements can be made to a plethora of applications within the realm of video/image processing and analysis. These include browsing, searching, video summarisation, transcoding, region of interest compression and scalable compression. This is already evident to users of on-line digital image and video libraries, where the growing use of manual annotation (popularised as “tagging”) of the semantics of the content considerably aids searching and browsing [17][18]. However, such manual annotation is costly, tedious and inconsistent. Unfortunately, automatic semantic segmentation (the process of decomposing the scene into unique semantic visual objects) and annotation is an impossible task to solve in a generic fashion. These could be considered the principal reasons why semantic object based processing is not pervasive. Fundamentally, automatic segmentation and annotation is an ill-posed problem. Even from a human perspective, different observers will describe the same scene in different ways. In fact, even the same observer will view the same scene with different levels of granularity based upon their motives for viewing. As will be shown in Chapter 2, the problem can only be made tractable by focusing on specific application scenarios, e.g. sports games, head and shoulder sequences, etc. Despite these restrictions, and the likelihood of non-ideal automatic segmentation and annotation, semantic processing still offers the potential to assist and/or make possible novel applications throughout all phases of visual data processing, albeit in constrained domains. Examples of such applications are listed in Table 1.1. The proposed research aims to contribute enabling technologies in this area.

Detecting objects at the point of creation on a mobile device, rather than offline on a less computationally constrained computing resource, is an attractive proposition as it allows semantics to be introduced earlier in the digital visual data “life cycle”. This has the potential to permit a greater range of novel applications (see Table 1.1). For example, during a mobile video call, the detection of the most important semantic visual objects (e.g. human faces) would allow higher quality encoding or increased error robustness for these objects. This is not possible if the object detection does not take place on the mobile device. Unfortunately, even when the solution space is constrained, semantic object segmentation algorithms are inherently computationally complex. On a mobile device this increased computational complexity diminishes real-time performance and increases power consumption. This is highly undesirable, as mobile devices suffer from a variety of limitations, as will be described in Section 1.2.3. There is a clear conflict between the mobile device hardware limitations and the requirement for robust segmentation algorithms for semantic video processing. This provides strong motivation for innovative research in this domain.

1.2.2 Multimedia Content Delivery to Mobile Devices

The evolution from the analogue wireless infrastructure to 2G systems was principally driven by a need for improved mobile personal communications. Widespread advances in areas including voice quality, hand-off speed, spectrum usage efficiency and error robustness allowed 2G mobile communications to be very successful in handling voice and low end data. Furthermore, the move to 2G systems also enabled new services such as SMS, fax, Wireless Application Protocol (WAP), etc. [19]. These additional services provided extra non-voice based revenue streams for network operators and useful services for end users. However, with bandwidth limited to 9.6 Kilobits per second (Kbps), bandwidth intense services such as video conferencing or video on demand were not feasible. The emerging 3G standard, which uses digital packet switched technology and can achieve data transfer rates of up to 2 Megabits per second (Mbps), is an attempt to address this issue.

Whilst high data rates are important, they are only one of the issues involved in carrying multimedia content on very lossy wireless communication channels. Multimedia content can tolerate some loss; however, the time sensitivity of delivery presents unique challenges. Fast hand-off, low latency and minimal packet loss are fundamental to the success of streaming multimedia [20], and as such underpin current research in the area. Active research topics include quality of service, mobility support, signalling protocols, network survivability, etc., and the impact these have on power consumption on the mobile device [21][22][20][23][24]. In the longer term, research has begun on Fourth Generation (4G) systems, a principal feature of which is the seamless integration of multiple communication technologies including Global System for Mobile (GSM), wireless LAN, Bluetooth, etc. Another important feature of 4G systems is the provision of services based upon contextual awareness (location, environment, health, activity). Clearly, progress in wireless infrastructure is vital for the successful delivery of next generation high resolution multimedia content; however, this is outside the scope of this research and will not be considered further in this thesis.

1.2.3 Multimedia Processing on Resource Constrained Mobile Devices

Multimedia processing is generally considered one of the most computationally demanding tasks, due to elaborate, resource intensive algorithms coupled with a requirement for high throughput performance, frequently under a real-time constraint. Yet a mobile device suffers from a variety of limitations, including low computational capabilities, low memory capacity, short battery life and strict miniaturisation requirements (size, weight, screen size, etc.). Therefore, the computational demands associated with multimedia processing on a mobile device have the highly undesirable effects of reducing real-time performance and increasing power consumption on the device. This issue is further exacerbated by the trend for the mobile device to act as a point of convergence for multiple microprocessor intensive applications. Furthermore, as was demonstrated by the failure of initial WAP services, users are unwilling to tolerate degradation in the overall application experience just for mobile convenience [25][26].

The system designer of a mobile device is now presented with the unenviable challenge of implementing high resource usage algorithms on low throughput and resource limited hardware with short battery life. In particular, as further elaborated in Chapter 2, the short battery life places a major constraint on mobility, since the rate of battery discharge in a mobile device governs the operating time of the device between recharges. In addition, the battery also has a direct impact on the overall size and weight of the device. Unfortunately, the intuitive approach to tackling short battery life by improving battery technology is providing limited results, with battery capacity only improving by 5% each year [7]. Therefore, a more fundamental approach is required, which tackles power consumption as the cause of the short battery life. These factors, coupled with the overarching impact of power consumption on circuit timing performance, silicon resource usage and electronic component reliability, have brought energy efficiency to the forefront of design in recent times. Energy efficient design is discussed in detail in Chapter 2.


1.3 Research Motivation and Work Programme

Considering that there is huge consumer demand for novel multimedia applications on mobile devices, the issues outlined in Section 1.2 are unfortunate, particularly as semantic object based processing could be considered an enabling technology toward that goal. This provides the motivation for this research to explore viable implementation options for semantic object processing on mobile devices. The proposed segmentation solution avoids attempting deep automatic interpretation of digital visual data, and focuses instead on mid-level semantic objects. This approach is taken as there is an emerging belief that mapping low level pixels to (frequently well defined) semantic objects is a considerably easier task than a direct mapping from pixels to diverse user semantics [27][28][29].

1.3.1 Novel Face Detection and associated hardware acceleration

In order to demonstrate a robust semantic video object segmentation solution, the problem is constrained and focused on the segmentation of the human face as a fundamental semantic object of benefit to users of mobile devices. As Chapter 2 will demonstrate, there are many approaches to face detection. A fundamental goal of this research is to find solutions which are suitable for deployment on a resource constrained mobile device, and this has a major influence on the choice of algorithm. The proposed novel solution (see Chapter 5) employs an Evolutionary Artificial Neural Network (EANN). During regular face detection, the computational complexity associated with the Artificial Neural Network (ANN) phenotype evaluation is offloaded from the host processor to a dedicated hardware accelerator. As will be demonstrated in Chapter 8, this improves energy efficiency and real-time performance. This approach also offers the scope to be retargeted to classify extracted features for other semantic processing tasks. In addition, owing to the widespread deployment of ANNs in a broad spectrum of classification, perception, association and control applications [30], the core could be re-deployed for a number of applications on a mobile device.

For example, by modifying the software, the same core that accelerates face detection could be reused as an accelerator for advanced Artificial Intelligence (AI) for computer gaming [31]. This scenario, along with an ARM microprocessor based hardware integration framework, is depicted in Fig. 1.2. However, applications other than EANN-based face detection are outside the scope of this research.

[Figure 1.2 block diagram: an evolvable-aware ANN hardware accelerator integrated into an ARM922T-based AHB/APB system, shared by two example applications — AI for computer gaming (e.g. NERO, training computer agents for combat) and real-time face detection in video sequences.]

Figure 1.2: Example of multiple applications leveraging a hardware ANN accelerator

1.3.2 Hardware Acceleration of Motion Estimation

Motion Estimation (ME) is a fundamental enabling technology widely used in video compression and video processing tasks. One such video processing task is tracking semantic video objects (such as faces) in video sequences. In the case of face detection, such tracking can be used to improve detection rates and/or reduce computational expense in multi-resolution face detection algorithms. Furthermore, it is generally accepted that ME consumes 40%–80% (depending on the configuration) of the total computational resources required in modern video encoders [32][33].

The re-usability potential, coupled with the high computational expense, makes ME a very suitable candidate for low power hardware acceleration on a semantic video processing enabled mobile device. Chapter 7 describes the proposed ME architecture, which in this research is principally employed to reduce the computational expense of the face detection algorithm described in Chapter 5. With the target deployment of a mobile platform, the architecture is designed with low power operation as the principal design constraint. This goal is achieved by exploiting inherent algorithmic and data redundancies.


1.4 Research Objectives

The research goals of this thesis can be summarised as follows:

1. To design a face detection algorithm suitable for hardware acceleration.

2. To evaluate the proposed face detection algorithm against prior art proposed in the literature.

3. To design and implement an energy efficient hardware accelerator for EANNs.

4. To design and implement a flexible motion estimation core which can be used in the encoding of the binary shape associated with the segmented face.

5. To evaluate both hardware architectures against prior art within a fair benchmarking and normalisation framework.

1.5 Thesis Structure

The thesis is structured as follows:

• Chapter 1 – Introduction
The introduction (this chapter) outlines the context and motivation for the research conducted in this thesis.

• Chapter 2 – Technical Background
The technical background chapter elaborates in more detail the technical context for the research. The general topics covered are digital video processing (digital video compression & semantic video object segmentation) and low power design.

• Chapter 3 – Face Detection: A Review of Popular Approaches
This chapter firstly presents a thorough review of the prior art in face detection algorithms. Conclusions are drawn from this review and a novel algorithm is proposed.

• Chapter 4 – A Novel Face Detection Training Algorithm
The proposed novel face detection training algorithm is described in detail. Comprehensive details are presented for various training runs.

• Chapter 5 – Software Implementation of Trained Face Detection Algorithm
This chapter describes the implementation of the trained EANN face detection algorithm in software. Software profiling and optimisations are discussed. Face detection performance benchmarking comparisons against prior art are then presented.

• Chapter 6 – Hardware Acceleration of EANN
A review of ANNs and EANNs is firstly provided. The proposed hardware architecture for the EANN accelerator is subsequently described in detail. This is followed by evaluation against the prior art.

• Chapter 7 – Hardware Acceleration of Motion Tracking
This chapter initially reviews related research in the field of motion tracking. A detailed description of the proposed motion estimation/tracking hardware architecture is presented. This is followed by evaluation against the prior art.

• Chapter 8 – Conclusions & Future Work
This chapter provides a summary of Chapters 3, 4, 5, 6 & 7, outlining the key contributions. This is followed by a discussion which suggests avenues for future expansion of the research.

1.6 Summary

The growing market for mobile phones is a reflection of how ubiquitous battery powered mobile devices have become in today’s society. Multimedia functionality is seen as a vital component in the current and future success of these devices. However, inherently resource intensive multimedia processing algorithms present numerous implementation challenges due to the limited computational resources available. In the future these challenges will be exacerbated by consumer demand for semantic content aware multimedia processing algorithms. This research aims to help in this regard with contributions to the fields of semantic video object segmentation and dedicated video processing hardware. Specifically, a novel face detection algorithm and energy efficient hardware architectures for ANN and ME acceleration are proposed in this research.

CHAPTER 2

Technical Background

This chapter reviews issues in digital video compression, semantic video object segmentation, implementation options and energy efficient design principles in order to provide context for the research carried out. Previously, in Chapter 1, it was highlighted that the rapid progress in digital video compression algorithms is a fundamental underpinning technology partly responsible for the ubiquity of high quality digital video on both mobile and tethered computing platforms. An overview of these video compression algorithms/tools, along with their associated deployment in industry standards, is provided in Section 2.1. Section 2.1.2 discusses how these video compression algorithms and tools can be extended to process content in terms of encapsulated semantic objects, which, as was shown in Chapter 1, can facilitate applications allowing a plethora of benefits for the end user. A review of algorithms for the extremely challenging problem of object segmentation is given in Section 2.2. The contributions of this thesis within a semantic video processing context are outlined in Section 2.3. A selection of implementation options for advanced video processing on computationally constrained mobile devices is presented in Section 2.4. A vital consideration for implementation of advanced algorithms on a mobile device is energy efficiency. As such, a detailed review of energy efficient design principles is given in Section 2.5. This includes an overview of the sources of power consumption, as well as common techniques used to reduce power consumption.


2.1 Digital Video Compression

Before considering digital video¹ compression techniques or semantic video object processing algorithms, the fundamental properties and characteristics of raw unencoded digital video require explanation. A raw unencoded digital still image can be considered to be a “snap shot” of a natural real world scene, which results in a two dimensional rectangular or square matrix of discrete digital samples capturing the colours present. It is assumed that the real world scene contains smooth continuous colour tones with bandwidth limited edges (these are characteristics which can be exploited in compression algorithms). Each resultant digital sample is a representation of the colour in the scene at a particular location. The colour can be represented in a variety of different schemes (i.e. different colour spaces). In the most direct scheme each sample consists of Red, Green and Blue (RGB) components. Typically, 24 bits per RGB triplet is considered to give adequate dynamic range, although it is not uncommon for a greater number of bits to be used. For example, in some High Definition Television (HDTV) encoder/decoder and medical imaging storage applications ten or twelve bits are used. As with all analogue to digital sampling, Nyquist sampling theory must be considered, although the human psychovisual system is quite tolerant to many aliasing effects² [34]. The size of the rectangular sampling matrix or frame is application and transmission bandwidth dependent. For example, on mobile devices with limited screen size and limited transmission bandwidth, standardised resolutions such as 352 × 288 (referred to as Common Intermediate Format (CIF)) or 176 × 144 (referred to as Quarter Common Intermediate Format (QCIF)) samples or pixels are popular, whilst at the other end of the consumer market, European PAL based Standard Definition Television (SDTV) and HDTV have typical active resolutions of 720 × 576 and 1920 × 1080 respectively.

Fig. 2.1). The temporal sampling resolution is typically quoted as the number of frames per second. It should be noted that the visual quality of the captured video is heavily influenced by the spatial and temporal resolution. A high horizontal and vertical sampling allows a natural scene to be captured with more fine detail and thus creates a greater sense of realism. A high

¹ In this thesis, unless explicitly stated otherwise, digital video is frequently used to refer to both digital video and digital still images. This generalisation is made as digital video could be considered to be a superset of digital still images. That is, digital video is an extension of digital still images in the temporal domain.
² This is not always the case. For example, a familiar aliasing phenomenon (present particularly in older films) that is noticeable and quite objectionable occurs when the spokes of a fast moving waggon wheel appear to move backward. This is caused by temporal aliasing [34].


[Figure 2.1 shows a digital still image alongside five frames of a digital video sequence, separated by the temporal sampling period.]

Figure 2.1: An example of a digital still image and video sequence

[Figure 2.2 illustrates the luminance and chrominance sample positions for 4:4:4, 4:2:2 and 4:2:0 sampling.]

Figure 2.2: Examples of YCbCr sampling modes

A high frame rate (i.e. temporal sampling rate) allows objects within a scene that have high levels of motion to be perceived as having smooth continuous trajectories, whereas if the sampling rate were lower the objects could appear to have a disconcerting jerky movement. Increasing the frame rate obviously has a proportional increase in the unencoded bit-rate of the video and this is not desirable in many applications. For example, low bit-rate communications (e.g. webcams, mobile video calls) sometimes use a frame rate as low as 5–15 frames per second. However, this can result in jerky motion. The use of 25 to 30 frames per second is generally sufficient to capture moving sequences for viewing on a mobile device, whilst a higher frame rate of 50 to 60 frames per second is generally required for larger display sizes (e.g. SDTV) to allow fast moving objects to have smooth motion. When the trade-off between sampling resolution (both spatial and temporal) and data transmission rates was encountered during the design of early TV transmission standards, a solution termed interlacing was proposed³. In this scheme, only half of the frame data (even or odd lines) is transmitted every frame. The reduced data frame is termed an even or odd field depending on the line type that was sent. The scheme doubles the perceived frame rate without doubling the bandwidth, although this is not without disadvantages, as distortions are introduced. For the remainder of this thesis it can be assumed that non-interlaced video (also known as progressive video) is being used.

As previously mentioned, the RGB colour space is typically used during the sampling of a natural scene into the digital domain. In addition, it is also used for outputting digital images on Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) units. In the RGB colour space each component has an equal weighting. However, the human psychovisual system is more sensitive to brightness (luminance) than colour (chrominance). As a result, a common processing step is to convert from the RGB colour space into the YCbCr colour space. The Y component represents luminance, and Cb and Cr are the chrominance components. The benefit of this colour space conversion is that the chrominance components can be subsampled, thus saving storage and/or transmission bandwidth. Video compression algorithms usually support 4:4:4, 4:2:2 and 4:2:0 luminance/chrominance sampling (see Fig. 2.2 for more details). The 4:2:0 sampling scheme is the most popular for video compression algorithms; it results in little or no discernible loss in visual quality and has the benefit of reducing the data by a factor of two.
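To make the preceding discussion concrete, the sketch below converts an interleaved RGB frame to planar 4:2:0 YCbCr. It is a minimal illustration only, assuming the common BT.601 integer coefficient approximation and a simple 2x2 chroma average; the routine names and the rounding behaviour are illustrative assumptions rather than anything prescribed by this thesis.

    #include <stdint.h>

    /* Clamp an intermediate result into the 8-bit sample range. */
    static uint8_t clamp8(int v)
    {
        return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
    }

    /* Convert one RGB pixel to YCbCr using a BT.601-style integer
     * approximation (coefficients pre-scaled by 256). */
    static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                             uint8_t *y, uint8_t *cb, uint8_t *cr)
    {
        *y  = clamp8((( 66 * r + 129 * g +  25 * b + 128) / 256) + 16);
        *cb = clamp8(((-38 * r -  74 * g + 112 * b + 128) / 256) + 128);
        *cr = clamp8(((112 * r -  94 * g -  18 * b + 128) / 256) + 128);
    }

    /* Convert a w x h interleaved RGB frame (w and h even) to planar
     * 4:2:0 YCbCr: Y is kept at full resolution, while each 2x2
     * neighbourhood of chroma samples is averaged to one Cb and Cr. */
    void frame_to_420(const uint8_t *rgb, int w, int h,
                      uint8_t *y_plane, uint8_t *cb_plane, uint8_t *cr_plane)
    {
        for (int j = 0; j < h; j += 2) {
            for (int i = 0; i < w; i += 2) {
                int cb_sum = 0, cr_sum = 0;
                for (int dj = 0; dj < 2; dj++) {
                    for (int di = 0; di < 2; di++) {
                        const uint8_t *p = rgb + 3 * ((j + dj) * w + (i + di));
                        uint8_t y, cb, cr;
                        rgb_to_ycbcr(p[0], p[1], p[2], &y, &cb, &cr);
                        y_plane[(j + dj) * w + (i + di)] = y;
                        cb_sum += cb;
                        cr_sum += cr;
                    }
                }
                cb_plane[(j / 2) * (w / 2) + (i / 2)] = (uint8_t)(cb_sum / 4);
                cr_plane[(j / 2) * (w / 2) + (i / 2)] = (uint8_t)(cr_sum / 4);
            }
        }
    }

With 4:2:0, four pixels cost four Y samples plus one Cb and one Cr sample (6 bytes) instead of the 12 bytes needed at 4:4:4, which is the factor-of-two data reduction referred to above.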

Regardless of the format (resolution, sampling scheme, etc.) chosen, raw unencoded digital video requires enormous storage capacity and/or transmission bandwidth.

³ Another motivating factor for interlacing was the limited refresh speed of cathode ray tube based TVs of the era.

For example, using the CIF format with 4:2:0 colour sampling at 25 frames per second requires 352 × 288 × 1.5 bytes × 25 = 3.6255 MB/s. Putting this into context, this low resolution format, which is only suitable for a mobile device, requires in excess of 13 gigabytes of storage for one hour of video. This bandwidth requirement is at odds with the characteristics of the target application, i.e. the extremely limited storage space available on mobile devices, limited bandwidth mobile communications and the relatively high cost of mobile communication data transmission services. As a further example, SDTV has a bandwidth requirement of 216 Mbps, so without further processing a DVD could hold little more than a few minutes of SDTV video. This clearly illustrates the need for efficient compression of video data. Fortunately, this can be achieved by exploiting the many inherent redundancies in the video data representation. The tools used in a video compression system are explained in the following subsections.
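The arithmetic above generalises to any frame size and frame rate. The short program below is a hypothetical back-of-envelope check (not code from this thesis) that reproduces the quoted CIF figure and the approximate one-hour storage requirement.

    #include <stdio.h>

    int main(void)
    {
        const double bytes_per_pixel = 1.5;                /* 4:2:0 sampling   */
        const double frame_bytes = 352 * 288 * bytes_per_pixel;
        const double rate = frame_bytes * 25.0;            /* bytes per second */

        /* Prints 3.6255 MB/s, matching the figure above (MB = 2^20 bytes). */
        printf("CIF 4:2:0 at 25 fps: %.4f MB/s\n", rate / (1024.0 * 1024.0));

        /* Prints roughly 13.7 GB for one hour (GB = 10^9 bytes). */
        printf("One hour: %.1f GB\n", rate * 3600.0 / 1e9);
        return 0;
    }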

2.1.1 A Generic Video Compression System

There are a number of broad categories of inherent redundancy within a raw unencoded video data representation which can be exploited to achieve compression. These inherent data redundancies can be summarised as follows:

• Spatial Redundancies:
There is considerable redundancy between adjacent pixels, such as in large homogeneous coloured regions like a blue sky. In addition, colour transitions are typically gentle, reflecting the way they occur in the natural world.

• Temporal Redundancies:
There is frequently little difference between consecutive frames of video unless considerable motion and/or a scene change has occurred. This can be exploited by only coding and transmitting those regions within the frame which have changed.

• Statistical Data Redundancies:
Some symbols used within the video compression system have a higher probability of occurring than others. Rather than employ an equal word length for all the symbols, if a shorter word length is used for more frequently occurring symbols, an overall reduction in data can be achieved. Morse code is a classical non-video example of where statistical redundancy in a data distribution is used to reduce the bandwidth (a toy numerical sketch follows this list).


• Properties of the human visual system:
As previously mentioned, the human visual system is more sensitive to luminance than colour. This is typically exploited by subsampling the chrominance component. In addition, the human visual system is less sensitive to areas of high frequency, e.g. sharp colour transitions.
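As a toy illustration of the statistical redundancy mentioned above, consider a four-symbol source with the made-up probabilities below. A fixed-length code spends 2 bits on every symbol, whereas a variable-length code with Huffman-style lengths approaches the entropy of the distribution. The probabilities and code lengths are illustrative assumptions, not data from this thesis.

    #include <math.h>
    #include <stdio.h>

    /* Compare fixed-length coding against a variable-length code for a
     * skewed four-symbol source. Link with -lm. */
    int main(void)
    {
        const double p[4]   = { 0.60, 0.25, 0.10, 0.05 }; /* assumed probabilities */
        const int    len[4] = { 1, 2, 3, 3 };             /* Huffman code lengths  */
        double entropy = 0.0, avg_vlc = 0.0;

        for (int i = 0; i < 4; i++) {
            entropy += -p[i] * log2(p[i]);
            avg_vlc += p[i] * len[i];
        }
        printf("fixed-length code:    2.00 bits/symbol\n");
        printf("variable-length code: %.2f bits/symbol\n", avg_vlc);  /* 1.55  */
        printf("entropy lower bound:  %.2f bits/symbol\n", entropy);  /* ~1.49 */
        return 0;
    }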

Ideally the compression algorithm applied to the inherent video data redundancies at the encoder would be perfectly reversible at the decoder. This type of compression is known as Lossless. Lossless compression is vital, for example, in computer file compression algorithms, where any loss/corruption would destroy the overall message. Whilst lossless video compression algorithms⁴ exist, the compression achievable is small, typically in the order of 2–4× [35]. With the exception of a small number of applications in niche areas⁵, this rate of compression is insufficient due to the sheer quantity of video data. Fortunately, from a compression perspective, a perfect reconstruction at the decoder is not always necessary. In many applications the eye can tolerate a loss. This form of compression is termed Lossy, since information is lost during the compression process. The challenge then becomes one of balancing compression efficiency (i.e. the size of the resultant bitstream) against acceptable visual quality for a given application. In the case of mobile applications, it may also be tolerable to accept reduced visual quality in return for reduced computational complexity, in order to improve real-time performance and/or power efficiency.

Mainstream video compression algorithms use a combination of interframe and intraframe coding for both lossless and lossy compression (see Fig. 2.3(a)). Intraframe coding exploits spatial and perceptual redundancies and is coded without reference to other frames. Interframe coding exploits temporal redundancies by using previous (and sometimes future) frames. In both cases, further processing, known as entropy coding, is used to exploit the statistical data redundancies.

This general video compression framework is typically known as a hybrid video codec and is the basis of all modern video compression systems. Fig. 2.3(b) and Fig. 2.3(c) show the principal constituent elements of a generic hybrid encoder and decoder in greater detail. These elements are explained in the subsequent subsections.

⁴ Zero mathematical loss of visual quality at the decoder.
⁵ Such niche area applications include compression of video data transfers and intermediate storage of video in production studios where quality is imperative.


[Figure 2.3 comprises three block diagrams: (a) A Simplified Model of a Video Encoder; (b) A Generic Hybrid Video Encoder; (c) A Generic Hybrid Video Decoder.]

Figure 2.3: A Generic Video Encoder and Decoder


2.1.1.1 Motion Estimation & Compensation

There is typically only a very slight change between successive frames in video. For example, Fig. 2.4(a) and Fig. 2.4(b) show temporally adjacent frames from a test sequence. A temporal prediction model at the encoder and decoder can exploit this redundancy, and will provide compression provided the model parameters and any correction terms require fewer bits than the raw pixel information.

As can be seen in Fig. 2.4, if frame one is subtracted from frame two, the residual energy (shown scaled in Fig. 2.4(d) for ease of viewing) contains considerably less detail. It is reasonably intuitive that the residual frame will require less data bandwidth than the original frame two. In fact, the entropy6 of the original frame two is 7.15 bits/pixel, whereas that of the temporal prediction residual is 4.38 bits/pixel. In this example the temporal prediction model was the most basic available, namely simple frame differencing. More complex temporal models can reduce the entropy further by more accurately capturing the interframe temporal behaviour, which can be attributed to a combination of camera motion and object motion. However, for practical implementation purposes, the choice of temporal prediction model is also a trade off between complexity, memory requirements and prediction performance.
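Since the entropy figures quoted above are first-order estimates computed from a pixel histogram, the calculation is straightforward to reproduce. The following Python sketch is purely illustrative (it is not the code used for the measurements quoted above) and assumes 8-bit greyscale frames held as NumPy arrays.

    import numpy as np

    def entropy_bits_per_pixel(frame):
        # First-order entropy of the pixel value distribution (bits/pixel).
        hist = np.bincount(frame.ravel(), minlength=256).astype(np.float64)
        p = hist[hist > 0] / hist.sum()
        return float(-np.sum(p * np.log2(p)))

    def frame_difference(frame1, frame2):
        # Simplest temporal prediction: subtract frame1 from frame2, then
        # offset and clip the residual back into an 8-bit range.
        diff = frame2.astype(np.int16) - frame1.astype(np.int16)
        return np.clip(diff + 128, 0, 255).astype(np.uint8)

    # Comparing entropy_bits_per_pixel(frame2) with
    # entropy_bits_per_pixel(frame_difference(frame1, frame2)) reproduces the
    # kind of comparison made above (7.15 versus 4.38 bits/pixel for Fig. 2.4).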

A generic temporal prediction model can be considered to consist of motion estimation and motion compensation. Motion estimation attempts to establish the motion that has occurred between frames, whilst motion compensation uses the motion vectors generated by the motion estimation process to retrieve the predicted block. There are two main approaches commonly used for motion estimation: gradient descent based algorithms and block matching algorithms (see Fig. 2.5). Of these, the block matching approach gives acceptable prediction performance whilst using fewer computational resources when implemented in either software or hardware. A block matching algorithm examines each M × N block in the current frame and finds the associated best matching block within a predetermined or adaptive ±S pel search range in one or more reference frames. Motion compensation uses the motion vector associated with the best match to retrieve the appropriate block, which is then subtracted from the current block to generate a prediction residual. The prediction residual is then used in subsequent processing. A subtle point that should be noted is that, since differences will likely exist between the "real" previous frame and the decoded previous frame at the decoder (due to the lossy process), the motion estimation/compensation process does not use the real previous frame, but rather the decoded frame in the encoder (see Fig. 2.3(b)).

It is also worth emphasising that block based algorithms estimate the motion of a group of pixels.

6 Entropy is a measure of the amount of information content in an information source.


[Figure 2.4: Temporal Redundancy in Video Sequences: (a) Frame 1; (b) Frame 2; (c) Frame Difference; (d) Scaled Frame Difference]


[Figure 2.5: Motion estimation taxonomy. Time domain approaches divide into gradient descent algorithms (pel recursive and block recursive) and matching algorithms (block matching and feature matching), whilst frequency domain approaches include phase correlation, wavelet based matching and DCT based matching. Block matching algorithms are further characterised by their search strategy (e.g. fast heuristic search strategies, search space reduction, hierarchical approaches), their matching criteria (e.g. mean squared error, mean absolute error, sum of absolute differences and its variants, binary block matching, reduced bit mean absolute difference, minimised maximum error, pixel difference classification, different pixel count) and other issues such as fixed versus variable block size and the number of reference frames.]

This generally works quite well: semantic objects are generally much larger than the typical range of block sizes, so the block based motion estimation algorithm frequently generates a motion field that is relatively consistent with the trajectory of an object. The block size leads to certain characteristics of the resultant motion vectors and prediction residual. A small block size can give very good prediction results, but is also likely to get caught in local minima which are not consistent with the true motion of the object. In video compression, however, establishing the true motion is a secondary concern compared to minimising the energy of the prediction residual.

In contrast, in other video processing applications7 the true motion is of principal importance. A further issue with a smaller block size is the increased computational complexity of the motion estimation algorithm. However, it could be argued that the greatest issue with smaller block sizes for video compression is that they require a greater number of motion vectors to be added to the bitstream. So although the prediction residual may have less energy, this is balanced by the increased bits required to store the motion vectors. The alternative option of using a large block size is also not without disadvantages. For example, on occasion it lacks the resolution necessary for complex motion fields in highly detailed blocks (e.g. a block containing the edges of two or more objects moving in differing directions). This issue of block size is addressed in newer video compression standards (MPEG-4 and, to a greater extent, MPEG-4 part 10 / H.264) through the use of variable block sizes.

7 For example, in frame rate conversion for HDTV decoders (i.e. 50/60 Frames Per Second (FPS) to 100/120 FPS), inaccurate motion vectors lead to poor interpolation results with highly objectionable artifacts.


[Figure 2.6: Block sizes used in H.264 motion estimation: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 blocks]

H.264 uses a Lagrangian optimisation to establish the optimum trade off between the range of block sizes (see Fig. 2.6), the quality of the prediction residual and the cost of the motion vectors.

There are principally two constituent elements in a block matching algorithm: the block matching routine and the search strategy. The search strategy selects candidate blocks within a search window, whilst the block matching evaluates a distortion metric (level of similarity) between each candidate block in the search window and the current block in the current frame. The search strategy and block matching typically operate on just the luminance component, with the resultant motion vectors scaled for the chrominance blocks according to the chrominance subsampling strategy employed. A wide variety of matching criteria can be used, including Mean Squared Error (Eqn. 2.1), Mean Absolute Difference (Eqn. 2.2), Sum of Absolute Differences (SAD) (Eqn. 2.3), Binary Block Match (Eqn. 2.4), SAD summation truncation, SAD estimation, Reduced Bit Mean Absolute Difference and the Minimised Maximum Error function [32]. The choice of matching criterion is a complexity/prediction performance trade off and represents a vital computational complexity decision, as the distortion metric is evaluated inside the deep inner loops of the search strategy.

For example, in the case of binary valued pixels (single bit representation as opposed to 8 bits) the SAD calculation is simplified. The subtraction operation in Equation 2.3 can be replaced by a single bit XOR, since the difference between two pixels will be either 0 or 1. The absolute function is also implicit in the XOR function, because the XOR cannot give a negative result. The operation reduction, coupled with less data (1 bit instead of 8 bits per pixel, generated through a binarisation/quantisation process), is beneficial from the point of view of computational complexity. However, the lack of pixel value granularity typically leads to degradation in the quality of the motion vectors, which causes the prediction residual to have more energy. Many distortion metrics were evaluated by Kuhn in terms of image quality (PSNR) and implementation results (area, throughput, power) [32]. He found that the SAD metric gave the optimum trade off between complexity and efficiency/quality.
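For concreteness, both metrics can be expressed in a few lines of Python. The sketch below is a minimal illustration of Eqn. 2.3 and Eqn. 2.4 (the function names are illustrative), assuming 8-bit and 1-bit pixel blocks held as NumPy arrays.

    import numpy as np

    def sad(block_curr, block_ref):
        # Eqn. 2.3: sum of absolute pixel differences over an M x N block.
        diff = block_curr.astype(np.int32) - block_ref.astype(np.int32)
        return int(np.abs(diff).sum())

    def binary_sad(bab_curr, bab_ref):
        # Eqn. 2.4: for 1-bit pixels a single XOR replaces the subtraction,
        # and the absolute value is implicit, so the distortion is simply
        # the count of differing pixels.
        return int(np.count_nonzero(bab_curr ^ bab_ref))

    # Example: compare two random 16x16 blocks and their binarised versions.
    rng = np.random.default_rng(0)
    b1 = rng.integers(0, 256, (16, 16), dtype=np.uint8)
    b2 = rng.integers(0, 256, (16, 16), dtype=np.uint8)
    print(sad(b1, b2), binary_sad(b1 > 127, b2 > 127))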

MSE(B_{curr}, B_{ref}) = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( B_{curr}(i,j) - B_{ref}(i,j) \right)^{2}    (2.1)

MAD(B_{curr}, B_{ref}) = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| B_{curr}(i,j) - B_{ref}(i,j) \right|    (2.2)

SAD(B_{curr}, B_{ref}) = \sum_{i=1}^{M} \sum_{j=1}^{N} \left| B_{curr}(i,j) - B_{ref}(i,j) \right|    (2.3)

Binary\,SAD(B_{curr}, B_{ref}) = \sum_{i=1}^{M} \sum_{j=1}^{N} \left( B_{curr}(i,j) \otimes B_{ref}(i,j) \right)    (2.4)

In Eqn. 2.1 to Eqn. 2.4, B_{curr} is the block under consideration in the current frame and B_{ref} is the block at the current search location in the search frame. The block size is M × N, and ⊗ denotes the single bit XOR operation. Once the distortion metric is calculated between the current block and the reference block, the process repeats until all of the search positions defined by the search strategy within the search window have been examined. The block match which gives the minimum distortion metric is deemed the most suitable match and is used for subsequent processing. There are a wide variety of search strategies that can be used. The choice is typically a complexity/performance trade off, with the following being the broad options available [32]:

• Exhaustive full search algorithm
This searches every position within the search window and as such always gives the best results, but it is also very computationally expensive. However, the regularity of the search positions is well suited to hardware implementation.

• Fast exhaustive search
This search strategy eliminates non-optimal search positions, while maintaining optimal motion vectors, through early termination of the distortion metric calculations. This process is sometimes called a SAD step cancellation search strategy.

• Fast heuristic / logarithmic search strategies
This approach reduces the number of search positions by assuming that the distortion metric increases monotonically moving away from the minimum point. However, there is a distinct possibility of this search strategy getting stuck in a local minimum, which in turn yields a higher energy prediction residual. The three step search is an example of this approach (a sketch of this search is given after the list).

• Hierarchical or multi-resolution search strategies
An initial search takes place in a low resolution version of the image, and progressively higher resolution searches take place in the parts which show the most promise. Again this approach reduces the number of search positions relative to the full search and, compared to the logarithmic search strategies, the hierarchical nature can help to avoid local minima due to a low pass filtering effect. However, local minima issues are still possible for small regions which disappear during the subsampling process.

• Zone based search strategies
These search strategies employ numerous stopping thresholds, which can be advantageous in a rate/distortion sense. For example, a spiral search coupled with multiple thresholds defined for different search radii could terminate calculations early if one of the predetermined thresholds is reached.

• Motion vector prediction and dynamic search window size
These two techniques can be applied to most search strategies. A prediction of the motion vector can be used to seed the search strategy. In the ideal case, if the prediction is accurate enough (i.e. below a specified threshold), no further processing is required. Otherwise a conventional search is carried out around the motion vector predictor. If, for example, a logarithmic or zone based search strategy is then used, the motion vector prediction is still likely to reduce the overall number of operations necessary to give a satisfactory result and/or improve the quality of the prediction residual. In a similar fashion, a dynamic search window can be used to reduce the number of operations by constraining the search window in areas of predicted low motion. Typically spatial and/or temporal correlation methods are used to estimate the motion vector predictor and the search window size.
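To illustrate the fast heuristic family, the following Python sketch implements a basic three step search. It is a simplified illustration rather than a production implementation: it assumes the candidate blocks never leave the reference frame, and it reuses the SAD metric of Eqn. 2.3. With an initial step of 4, the effective search range is ±7 pels.

    import numpy as np

    def sad(a, b):
        return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

    def three_step_search(curr, ref, bx, by, n=16, step=4):
        # Evaluate the centre and its eight neighbours at the current step
        # size, recentre on the best match, halve the step and repeat.
        block = curr[by:by + n, bx:bx + n]
        mvx, mvy = 0, 0
        while step >= 1:
            best = None
            for dy in (-step, 0, step):
                for dx in (-step, 0, step):
                    cx, cy = bx + mvx + dx, by + mvy + dy
                    cost = sad(block, ref[cy:cy + n, cx:cx + n])
                    if best is None or cost < best[0]:
                        best = (cost, mvx + dx, mvy + dy)
            _, mvx, mvy = best
            step //= 2
        return mvx, mvy  # motion vector of the best match found

Because the search relies on the monotonic distortion assumption discussed above, it can converge on a local minimum that a full search would avoid.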


2.1.1.2 Transform Coding

Most mainstream video compression systems use transform based algorithms, which transform the spatial information (i.e. the pixels or a motion compensated prediction residual) into an alternative representation that is more amenable to compression. This alternative representation is almost always in the frequency domain. Whilst the spatial and transform domains are equivalent representations, spatial redundancies and the sensitivities of the human visual system can be more easily exploited in the frequency domain. This is because energy in a natural scene is concentrated in the low frequencies, and in addition the eye has greater sensitivity to lower frequency content. Therefore, once the spatial data is transformed into the frequency domain, subsequent processing can direct greater coding resources to those frequencies which are the more perceptually important.

Many transforms have been suggested for image and video compression, including the Karhunen-Loeve Transform (KLT), Singular Value Decomposition (SVD), the Discrete Wavelet Transform (DWT), the Fourier transform and the Discrete Cosine Transform (DCT). The ideal transform compacts the largest amount of energy into the smallest number of coefficients. The KLT is optimal in terms of packing the greatest amount of energy into the fewest transform coefficients. A contributing factor to this performance is that the KLT calculates the optimal transform matrix on a block by block basis. However, for practical implementation purposes the choice of transform is a trade off between energy compaction, reversibility and computational complexity. The KLT is too computationally demanding to be practical and, in addition, requires considerable overhead to be transmitted to the decoder. The DCT is a close approximation to the KLT in terms of performance and is reversible8, but requires considerably less computational resources and no overhead communication to the decoder. For these reasons the DCT is frequently used in image and video compression systems.

F(k,l) = \frac{2}{N} C(k) C(l) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)k\pi}{2N}\right] \cos\left[\frac{(2y+1)l\pi}{2N}\right]    (2.5)

8 In theory the DCT is perfectly reversible by the IDCT, but due to issues such as finite precision arithmetic, mismatches can occur. MPEG-2 and MPEG-4 tackle this issue using techniques such as oddification and LSB toggling [35]. H.264 avoids the problem by using an integer based transform, which is exactly reversible.


[Figure 2.7: Probability of DCT coefficients being nonzero (Source [36])]

f(x,y) = \frac{2}{N} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} C(k) C(l) F(k,l) \cos\left[\frac{(2x+1)k\pi}{2N}\right] \cos\left[\frac{(2y+1)l\pi}{2N}\right]    (2.6)

The generalised forms of the 2D DCT transform and the Inverse Discrete Cosine Transform (IDCT) are given in Eqn. 2.5 and Eqn. 2.6 respectively. In both cases k, l = 0, 1, ..., N-1, and F(k,l) denotes the DCT coefficient at coordinate (k,l), whilst f(x,y) is the input pixel at (x,y), with C(0) = 1/\sqrt{2} and C(k) = C(l) = 1 otherwise [37]. The DCT can be calculated for any rectangular array of pixels, though in video compression an 8 × 8 block is generally used. This results in 64 coefficients, each of which can be considered to give a measure of how much content is contained at that frequency. For example, the top left corner of the coefficient matrix represents the average (or DC) value of the block, whilst the bottom right hand corner contains content having the highest horizontal and vertical frequency present in the block. As natural scenes tend to change slowly, it is intuitive that the energy is concentrated in the DC value and the lower frequency coefficients, with the higher frequency coefficients typically having much smaller values. This characteristic is demonstrated in terms of the probability of non-zero coefficient values in Figure 2.7 [36]. For a more thorough mathematical treatment of the DCT the reader is referred to the extensive literature [37]. Extensions and variants of the DCT are also common, such as the integer based transform found in MPEG-4 part 10 / H.264.
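For illustration, Eqn. 2.5 can be evaluated directly as in the Python sketch below. This is a deliberately naive O(N^4) formulation for clarity only; practical codecs use fast factorisations or, in the case of H.264, an integer approximation.

    import numpy as np

    def dct2(block):
        # Direct evaluation of Eqn. 2.5 for an N x N block of pixels.
        n = block.shape[0]
        c = lambda k: 1.0 / np.sqrt(2.0) if k == 0 else 1.0
        out = np.zeros((n, n))
        for k in range(n):
            for l in range(n):
                s = 0.0
                for x in range(n):
                    for y in range(n):
                        s += (block[x, y]
                              * np.cos((2 * x + 1) * k * np.pi / (2 * n))
                              * np.cos((2 * y + 1) * l * np.pi / (2 * n)))
                out[k, l] = (2.0 / n) * c(k) * c(l) * s
        return out

    # For an 8x8 block from a natural image, out[0, 0] (the DC coefficient)
    # typically dominates, with the high frequency coefficients near zero.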


2.1.1.3 Quantisation

As discussed previously, the DCT (or indeed any other transform) does not itself compress the source image/frame data; rather, the compression is achieved through the use of quantisation and entropy coding. The purpose of the quantiser is to remove transform coefficients which make less perceptual contribution to the quality of the reconstructed block. This is an inherently lossy process: data discarded through coarse quantisation at this phase cannot be recovered later at the decoder, and as such the compression achieved is a trade off with the quality of the reconstruction at the decoder. There are principally two types of quantiser: scalar quantisers, which operate on a single coefficient, and vector quantisers, which process multiple coefficients. Vector quantisers use codebooks and can be used to quantise transform coefficients or raw pixel values. For example, in the GIF standard a vector quantiser is used to reduce the number of colours in a block. However, by far the most frequently used quantiser in video codecs is the scalar quantiser. Whilst nonlinear scalar quantisers exist (e.g. the Lloyd-Max quantiser, entropy constrained quantisers, etc.), the uniform linear scalar quantiser (which is also the simplest) is frequently found to be the most effective. A uniform linear scalar quantiser has equal step sizes and the reconstruction value is set to the centroid of the step. A larger step size reduces the precision of the resultant quantised coefficient and thus also reduces the number of bits necessary to store it. In cases where a coefficient has a low initial value, quantisation can drive its value to zero. Normally the quantised coefficient is scaled by the step size and rounded before further processing in order to reduce the entropy. A simple extension of the uniform scalar quantiser is to have a "deadzone" around zero (i.e. an input region whose output quantised value is zero). This is useful as it can force a greater number of small valued high frequency coefficients to zero. A quantiser with a deadzone typically has a variable step size, whereas the step size is generally fixed for a quantiser without a deadzone. A fixed step size is generally used for DC coefficients, whereas a variable step size is useful for AC (i.e. non DC) coefficients. Different codecs employ different configurations of step size and deadzone, and this is covered extensively in the literature [38][35][39].
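The following Python sketch illustrates a uniform scalar quantiser with and without a deadzone. The step size and deadzone values are illustrative; as noted above, the actual configurations are codec specific.

    import numpy as np

    def quantise_uniform(coeff, step):
        # Uniform scalar quantiser: scale by the step size and round.
        return int(np.rint(coeff / step))

    def quantise_deadzone(coeff, step, deadzone):
        # Inputs inside the +/- deadzone map to level 0, forcing small
        # valued high frequency coefficients to zero.
        if abs(coeff) <= deadzone:
            return 0
        return int(np.sign(coeff)) * int((abs(coeff) - deadzone) // step + 1)

    def dequantise(level, step):
        # Reconstruction: the quantised level scaled back by the step size.
        return level * step

    # e.g. quantise_deadzone(3, step=8, deadzone=4) gives 0, whereas
    #      quantise_deadzone(17, step=8, deadzone=4) gives level 2.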

2.1.1.4 Entropy Coding

In the case of an inter frame, motion estimation/compensation is used to make a temporal prediction, generating motion vectors and a prediction residual. The motion compensated residual, or in the case of an intra frame the raw pixels, undergoes DCT transformation, and the resulting DCT coefficients are subsequently quantised. The motion vectors (for inter frames) and quantised coefficients, along with other encoder configuration settings (e.g. resynchronisation points, macroblock headers, etc.), must be added to the final bitstream in order for the decoder to be able to reconstruct the frame.


[Figure 2.8: Zig-zag scanning of quantised DCT coefficients]

It is feasible to think that each of these information units, or symbols (as they are referred to in information theory), could be added directly in byte format to the bitstream without further processing. However, by exploiting the statistical probabilities of the occurrence of these symbols, greater compression can be achieved. This is essentially the purpose of entropy coding: to assign shorter code words to the more frequently occurring symbols. It is a lossless process, so no further degradation is introduced. The theoretical minimum number of bits needed to code an information source (termed the entropy) is given in Equation 2.7. This equation demonstrates that the higher the probability (shown as P in Eqn. 2.7) an event has of occurring, the lower the number of bits required to store the symbol.

H(x) = \sum_{i} P_{i} \log_{2} \frac{1}{P_{i}}    (2.7)

Prior to entropy coding, the quantised DCT coefficients are reorganised into a format that lends itself more readily to compression. After quantisation many of the DCT coefficients are zero. Instead of each zero coefficient undergoing entropy coding, a simple technique known as Run Length Coding is used, which replaces a run of consecutive identical values (e.g. zeros) with the value and the number of times it repeats. For the greatest compression from run length coding it is imperative to maximise the number of consecutive zeros. Whilst the quantised coefficients could be read out in raster scan order, the runs of zero valued high frequency coefficients would then frequently be interrupted by non-zero low frequency coefficient values. As a result, a zig-zag scan is used to maximise the length of consecutive zero runs. This scanning order is shown in Fig. 2.8.
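A compact illustration of the zig-zag scan and the subsequent run length coding is given below. The Python sketch is illustrative only; the traversal alternates direction along the anti-diagonals, matching the ordering of Fig. 2.8.

    import numpy as np

    def zigzag_scan(block):
        # Read an N x N block in zig-zag order by walking the anti-diagonals,
        # alternating direction so that runs of zeros are kept together.
        n = block.shape[0]
        coords = [(x, y) for x in range(n) for y in range(n)]
        coords.sort(key=lambda p: (p[0] + p[1],
                                   p[0] if (p[0] + p[1]) % 2 else -p[0]))
        return [int(block[x, y]) for x, y in coords]

    def run_length(values):
        # Replace consecutive repeats with (value, run) pairs.
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return [tuple(r) for r in runs]

    # After quantisation most high frequency coefficients are zero, so the
    # tail of the zig-zag ordered list collapses into one long zero run.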

Following zig-zag scanning and run length coding, the DCT coefficients, along with the motion vectors and other configuration parameters, undergo entropy coding. The two most widely used entropy coding algorithms in image/video compression are Huffman coding and arithmetic coding. In its simplest form, Huffman coding can be considered to consist of two phases: source reduction and codeword construction. Source reduction is an iterative process of combining the symbols of lowest probability, until only two combined symbols remain. Starting from the final two symbols and working backward to the original symbols, codeword construction allocates codes of increasing length at each symbol combination node in such a manner that each symbol is uniquely decodeable [34]. In practice, in video codecs, the Huffman codes are precalculated for generic video material (e.g. differentially coded motion vectors, transform coefficients, etc.) and stored in look up tables [35]. Huffman coding is optimal if the symbol probabilities are negative powers of two; if not, the theoretical minimum number of bits cannot be reached. One way of improving the performance in such cases is through the use of arithmetic coding. This codes groups of symbols together, so that overall a fractional number of bits can be assigned per symbol. Essentially, arithmetic coding works by continuously updating probability intervals and outputting bits when the most significant bits will not change further [38]. Performance improvements of 5%-15% have been observed when using arithmetic coding compared to Huffman coding, although the computational complexity does increase [38].
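The two Huffman phases, together with the entropy bound of Eqn. 2.7, can be sketched as follows. This is illustrative Python built on repeated merging of the two least probable entries; as noted above, real video codecs use precalculated code tables rather than constructing codes at run time.

    import heapq
    import math
    from collections import Counter

    def entropy(symbols):
        # Eqn. 2.7: the theoretical minimum average codeword length.
        freq = Counter(symbols)
        total = sum(freq.values())
        return sum((n / total) * math.log2(total / n) for n in freq.values())

    def huffman_codes(symbols):
        freq = Counter(symbols)
        if len(freq) == 1:  # degenerate single-symbol source
            return {next(iter(freq)): "0"}
        # Heap entries: [weight, tie breaker, {symbol: partial codeword}].
        heap = [[n, i, {s: ""}] for i, (s, n) in enumerate(freq.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            merged = {s: "0" + c for s, c in lo[2].items()}
            merged.update({s: "1" + c for s, c in hi[2].items()})
            heapq.heappush(heap, [lo[0] + hi[0], tie, merged])
            tie += 1
        return heap[0][2]

    # e.g. huffman_codes("aaaabbc") assigns 'a' a shorter codeword than 'c',
    # and entropy("aaaabbc") lower-bounds the achievable bits per symbol.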

2.1.2 Semantic Video Object based Compression

Traditional video compression standards such as MPEG-1, MPEG-2, etc. treat visual data as a sequence of rectangular frames [38][34]. Whilst advances in exploiting spatial and temporal correlation in image and video data have led to ever increasing compression rates and improved reconstructed quality, the approach is inherently unaware of the semantic structure of the visual data [40][41]. In contrast, once a segmentation mask is provided for each semantic object, the ISO/IEC MPEG-4 Core profile provides a complete semantic multimedia compression framework [42]. This allows a video sequence to be compressed in terms of one or more semantic video objects, each with associated texture and alpha (i.e. shape) information.


[Figure 2.9: MPEG-4 Video Objects: original frame, texture of the object and binary shape associated with the object (for clarity, only the "Foreman" object is shown)]

The object based compression paradigm is much more powerful than previous generations of MPEG standards because, as was noted in Chapter 1, it potentially allows improvements in a wide variety of applications within the realm of video/image processing and analysis, including browsing, searching, video summarisation, transcoding, region of interest compression, error protection and scalable compression.

Shown in Fig. 2.9 is a snapshot of the "Foreman" test sequence with the associated segmented texture and alpha information. Similar to the way that a snapshot of video is termed a frame, a snapshot of a video object is called a Video Object Plane (VOP). The associated shape information can be either binary or greyscale (it is shown in binary form in Fig. 2.9). A binary shape mask indicates whether each pixel is part of an object, whereas greyscale shape information allows an object to have varying levels of transparency, which is useful for blending an object (particularly its edges) onto a background. A snapshot of the binary alpha information is termed a Binary Alpha Plane (BAP), whilst a coding unit or block within the BAP is frequently referred to as a Binary Alpha Block (BAB) and is normally sized 16 × 16 in MPEG-4. There are three distinct types of BAB: transparent, opaque and boundary. An opaque BAB is fully inside the object and appears as white in Fig. 2.9. A transparent BAB is completely outside the object. A boundary BAB is, as the name suggests, a BAB on the edge of the object and incorporates pixels which are both inside and outside the object.

To facilitate the added object based functionality, modifications to the conventional hybrid codec (see Fig. 2.3(b)) are necessary. In particular, variants of the traditional frame based DCT and motion estimation tools are used [43]. The video object shape information is compressed in a dedicated shape encoder, which principally consists of binary motion estimation, a subsampling unit and context arithmetic encoding [44]. The shape coder is explained in more depth in the following section. The DCT transformation is replaced by the Shape Adaptive Discrete Cosine Transform (SA-DCT). Using the alpha information, the SA-DCT transforms only pixels which are part of the object. Transparent BABs are ignored, while the SA-DCT reverts to a regular 8 × 8 DCT transform for fully opaque BABs. The shape adaptiveness of the transform is used on boundary BABs, where pixel shifting is used to facilitate a variable point DCT. Motion estimation must also be modified for object based processing. The most direct approach simply uses pixel padding for boundary or opaque BAB texture blocks. The alternative approach, polygon matching, only evaluates the distortion metric on those pixels which are part of the object. Although mentioned in Chapter 1, it is worth reiterating that the segmentation of the objects is not defined by MPEG-4. Nevertheless, the availability of MPEG-4 highlights that if semantic video objects are available, efficient compression and transportation of those objects is already possible. MPEG-4 object based processing does, however, introduce an additional computational cost over the non-object based MPEG-4 simple profile.

2.1.2.1 MPEG-4 Shape coding

The alpha information represents additional overhead which must be communicated to the decoder; it is therefore logical that this information should also be compressed. A binary shape encoder is used regardless of whether the alpha information is binary or greyscale: the shape of the greyscale alpha map is encoded using the binary alpha encoding flow, whilst the greyscale transparency information is coded in a transform based process. Only binary shape encoding is discussed here.

The simplest encoding scenario occurs when the BAB is fully opaque or fully transparent. In these cases only a short Variable Length Code (VLC) header is added to the bitstream. However, boundary BABs require more involved processing. To encode a binary alpha pixel within a boundary BAB, the shape encoder uses neighbouring pixels (referred to as a context map/template) to generate a context number. The context number is then used as an index into a look up table. The indexed entry gives the probability that the current alpha pixel is opaque given its neighbouring pixels9. The retrieved probability is then used to drive a binary arithmetic encoder. All alpha pixels in the boundary BAB are encoded in this manner before the output of the arithmetic encoder is added to the bitstream. The context is created either from within the same BAP (i.e. intra coding) or from a combination of the current BAP and a reference BAP (i.e. inter coding).

9 If the pixel is transparent, the retrieved probability is inverted.


[Figure 2.10: MPEG-4 Binary Shape Encoder. The encoder comprises mode decision logic, binary motion estimation and compensation with a BAB frame store, an optional subsampling unit, configurable scan order generation and a context arithmetic encoder, together with VLC coding of headers, conv_ratio, scan_type and the motion vector differences for shape. The seven shape coding modes are: 0. No update & Motion Vector Difference = 0; 1. No update & Motion Vector Difference != 0; 2. Transparent; 3. Opaque; 4. IntraCAE; 5. InterCAE without Motion Vector Difference; 6. InterCAE with Motion Vector Difference]

In total there are seven distinct modes with which a binary shape encoder (see Fig. 2.10) can choose to optimally encode a BAB. The entire process is lossless, although lossy compression is supported through an optional BAB subsampling scheme. An in-depth review of binary shape coding is presented by Brady [44].
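To illustrate the context mechanism described above, the sketch below packs neighbouring alpha pixels into a context number which then indexes a probability table. The template offsets shown are illustrative assumptions only; the normative template layouts and probability tables are defined by the MPEG-4 standard and are not reproduced here.

    def context_number(bap, x, y, template):
        # Pack the template's neighbouring alpha pixels into an integer.
        # Assumes (x, y) is interior or that the BAP is suitably padded.
        ctx = 0
        for dx, dy in template:
            ctx = (ctx << 1) | (bap[y + dy][x + dx] & 1)
        return ctx

    # Illustrative 10-pixel intra-style template (NOT the normative MPEG-4
    # layout): pixels from the two rows above and to the left of (x, y).
    INTRA_TEMPLATE = [(-1, -2), (0, -2), (1, -2),
                      (-2, -1), (-1, -1), (0, -1), (1, -1), (2, -1),
                      (-2, 0), (-1, 0)]

    # probability = PROB_TABLE[context_number(bap, x, y, INTRA_TEMPLATE)]
    # would then drive the binary arithmetic encoder, with the probability
    # inverted for transparent pixels (see footnote 9).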

In three of the encoding modes, when the BAB belongs to an Inter-VOP, temporal redundancy can be exploited through the use of motion estimation. The binary motion estimation process is worthy of further explanation, as it has been shown to consume over 90% of the total resources required for binary shape encoding [45]. As shown in Fig. 2.11, a Motion Vector Predictor for Shape (MVPS) is first found by examining neighbouring shape and texture macroblocks (16 × 16 blocks). The first valid motion vector in the sequence [MVS1, MVS2, MVS3, MV1, MV2, MV3] is chosen as the predictor (where MVS is a motion vector for shape and MV is a motion vector associated with the texture). Once the MVPS motion compensated macroblock is retrieved, it is compared against the current macroblock. If the error between each 4x4 subblock of the MVPS motion compensated BAB and the current BAB is less than a specified threshold, the motion vector predictor can be used directly. As the shape decoder uses the same routine to select a motion vector predictor, it will select exactly the same predictor; thus there is no need to send the MVPS to the decoder, reducing the bandwidth overhead. If the MVPS motion compensated error is not less than the threshold, a Motion Vector for Shape (MVS) is required. The search window is typically ±16 pixels around the MVPS BAB. Any search strategy may be used to find the best match within the search window, and sub-pel search strategies are not necessary due to the lack of granularity of the alpha information. The designer is also free to choose any distortion metric, with the binary SAD (see Eqn. 2.4) being the obvious choice. The bottleneck in binary shape coding for semantic objects within MPEG-4 is the calculation of the MVS. Chapter 7 of this thesis addresses this issue with a dedicated hardware architecture.
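The predictor selection itself reduces to a simple priority scan, as the Python sketch below illustrates. Candidates that are invalid or unavailable are represented as None; the zero vector fallback is an assumption for illustration rather than a normative detail.

    def motion_vector_predictor(shape_mvs, texture_mvs):
        # First valid vector in the order [MVS1, MVS2, MVS3, MV1, MV2, MV3].
        for mv in list(shape_mvs) + list(texture_mvs):
            if mv is not None:
                return mv
        return (0, 0)  # assumed fallback when no candidate is valid

    # e.g. motion_vector_predictor([None, (2, -1), None], [(0, 3), None, None])
    # returns (2, -1), the first valid candidate in priority order.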

2.1.3 Image & Video Compression Standards

Standardisation plays a pivotal role in the success of modern image and video compression codecs. With the plethora of manufacturers and the many possible platforms to support, codec standardisation ensures interoperability. This can also help to accelerate consumer acceptance of a particular standard. Over the last two decades, groups within the International Standardisation Organisation (ISO) and the International Telecommunications Union (ITU-T) have been the driving force behind video codec standards. The standardisation process is open to all willing participants, with contributions made by both academic and industrial partners. This is facilitated through delegate meetings every two to three months throughout the world, where technical details and time-frames for deliverables are agreed. The following sections give a brief overview of the video codecs that these groups have produced. For completeness, the JPEG still image codec is also discussed.

2.1.3.1 JPEG

In the early 1980s, study groups in the ISO and the ITU-T were working on compression schemes for digital still images. The separate groups collaborated to form the Joint Photographic Experts Group, or JPEG, which became the name of the new image codec. Baseline JPEG first applies the DCT transform to the pixels of the input image (colour conversion and normally chroma subsampling are carried out beforehand). This is followed by quantisation of the DCT coefficients, zig-zag scanning, run length encoding, descriptor generation and finally entropy coding. Extensions to the JPEG standard include progressive encoding/decoding, whereby an image can be decoded with progressively greater detail. This is useful for Internet applications, as it allows an image to be viewed (albeit at lower quality) before downloading is complete. A non-standardised JPEG extension catering for video is known as Motion JPEG (MJPEG). This effectively treats each frame as a single picture and encodes it as such.


[Figure 2.11: Motion Vector Difference for Shape. The upper part illustrates calculating the MVDS: within the search window around the co-located BAB in the search frame, the BAB with the minimum SAD is found, and the MVDS is the difference between its motion vector for shape (MVS) and the motion vector predictor. The lower part illustrates finding the motion vector predictor from the candidate shape MVs (MVS1, MVS2, MVS3) and texture MVs (MV1, MV2, MV3) neighbouring the current BAB; the first valid vector in the order [MVS1, MVS2, MVS3, MV1, MV2, MV3] is chosen as the predictor.]

In many respects this could be considered similar to an intra-only video codec. The latest incarnation of the JPEG standard is known as JPEG 2000. The fundamental difference from the original standard is that the DCT transform is replaced by a wavelet transform. JPEG 2000 achieves greater compression with fewer objectionable artifacts (due to the properties of the wavelet transform). Additional features of JPEG 2000 include region of interest coding and digital watermarking.

2.1.3.2 H.261 and H.263

The demand for low bit rate video telephony and video conferencing in the mid to late 1980s was the motivating factor for the ITU-T H.261 video standard. The standard principally targeted ISDN networks and as such operates at multiples of 64 kbps, with support for CIF and QCIF resolutions. It is a hybrid Differential Pulse Code Modulation (DPCM)/DCT codec (see Fig. 2.3(b)), and bears much similarity to JPEG but adds integer-pel motion estimation to exploit temporal redundancies.

The goal of the follow-on ITU-T standard, H.263, was to improve coding efficiency, particularly for very low bit rate (sub-30 kbps) applications such as wireless video conferencing. To achieve this goal, a number of new technical features were added. Half pel resolution motion estimation and multiple block sizes are used to generate a temporal prediction residual with less energy; the half pel resolution is generated using bilinear interpolation. Furthermore, the concept of unrestricted motion compensation was introduced. This pads boundary blocks to allow the search window to extend beyond frame boundaries, which can be useful for objects moving into or out of a frame. Blocking artifacts were tackled through the use of overlapped motion compensation.

2.1.3.3 MPEG-1, MPEG-2 and MPEG-4

MPEG-1 was the first video standard from the ISO Moving Picture Experts Group (MPEG). During development the standard had a combined video and audio bitrate target of 1.5 Mb/s, which was the rate supported by CD-ROM drives of the time. This was important, as it was envisaged that MPEG-1 would support the Video CD, a format intended to compete with VHS. MPEG-1 used H.261 as a starting point for development and as such is very similar to the generic hybrid DPCM/DCT structure of Fig. 2.3(b). Although MPEG-1 supports resolutions up to SDTV, it does not support enough frames per second at these resolutions for SDTV applications. In line with the goal of creating a standard to compete with VHS, MPEG-1 only supports SIF/CIF resolutions at 25/30 Frames Per Second (FPS). Furthermore, none of the MPEG-1 tools could explicitly handle interlaced video. Such content must first be deinterlaced for MPEG-1; however, as advanced deinterlacing algorithms (i.e. needing accurate motion estimation) are required to minimise high frequency artifacts, a loss in quality and/or compression rate is inevitable. These two aspects were an issue for supporting the digital TV broadcasting and DVD markets, but were addressed in the subsequent MPEG-2 standard. That standard was actually a collaboration between MPEG and the ITU-T, and is officially called H.262/MPEG-2, although it is more frequently referred to simply as MPEG-2. The range of supported resolutions and frame rates was greatly extended, with target bitrates of 4-15 Mb/s. Furthermore, the coding efficiency of MPEG-2 is improved by approximately 50% over MPEG-1. Scalability was also introduced, and this increased flexibility allows different clients to decode different spatial and/or temporal resolutions. A further novel aspect of MPEG-2 was the use of profiles and levels, which aided implementation and interoperability without compromising the standard.

The aim of MPEG-4 was to further improve compression efficiency and flexibility. Initial development leveraged H.263, but the scope grew considerably to encompass compression and manipulation of a vast array of digital media content. Bit rate targets ranged from very low (suitable for video streaming to mobile devices) to very high (studio quality). Potential target applications varied from mobile multimedia, streaming Internet video, networked video games and interactive digital TV to the virtual TV studio and studio production [46]. To support these applications and the coding of both natural and synthetic content, a variety of new coding tools were introduced, including object based compression (see Section 2.1.2), 2D/3D mesh coding, animated content coding and sprite coding. Clearly, with such a wide scope, not all tools are appropriate for all application scenarios. As a result, 19 profiles were introduced with differing supported resolutions and coding tools. The fundamental goal of improving compression efficiency (particularly for low bit rate applications) was achieved through techniques such as quarter pel motion estimation, global motion estimation, unrestricted motion vectors, up to four motion vectors per block and intra prediction of the DC DCT coefficients. Additionally, data partitioning and reversible VLCs were introduced for transmission robustness and error resilience.

2.1.3.4 MPEG-4 Part 10 / H.264

The most recent international multimedia standard is H.264 / MPEG-4 Part 10 Advanced Video Coding (AVC). As is obvious from the name of the standard, this is a further collaboration between ISO MPEG and the ITU-T. Initial development work used the extensions to H.263 and the MPEG-4 simple profile as a starting point. Though H.264 is much narrower in scope than MPEG-4, it focuses on improved coding efficiency and error resilience for rectangular frames of video, and also improves the flexibility for usage in different network types and application domains. Whilst H.264 retains the DPCM/DCT hybrid codec structure, there are a number of advances in terms of video compression efficiency. In part, progress in the available computational resources, as dictated by Moore's Law, allows more computationally demanding algorithms to be considered.

It is claimed that H.264 improves coding efficiency by 50% over MPEG-2, and it is suggested that the coding efficiency gains are due to a collection of many small improvements rather than a fundamental algorithmic shift from prior standards [41]. These evolutionary improvements include variable block size motion estimation, quarter pel motion estimation, multiple reference frames for motion estimation, improved intra prediction, a smaller block size integer based transform, context adaptive VLC (CAVLC) and context adaptive binary arithmetic coding (CABAC). Furthermore, a new deblocking filter contributes to perceptual improvements.

In contrast to MPEG-4, which has 19 profiles, H.264 initially had only three profiles: Baseline, Main and Extended. The Baseline profile allows inter and intra coding along with CAVLC entropy coding. The Main profile adds support for interlaced content, allows inter coding with B-slices10 and uses CABAC. The Extended profile does not permit interlaced content or CABAC, but allows efficient switching between different types of slices and improved error resilience. There is considerable flexibility within the profiles, yet taking a somewhat simplistic view of profile target application domains, the Baseline profile is considered appropriate for mobile video type applications, the Main profile is suited to HDTV, broadcasting and storage applications, while the Extended profile suits streaming media applications [35]. Since initial standardisation, additional profiles have been added to H.264, including the Fidelity Range and Scalability extensions. It is also worth noting that, unlike MPEG-4 Part 2, H.264 does not have support for handling arbitrarily shaped semantic objects.

2.1.3.5 MPEG-7 and MPEG-21

MPEG-7 and MPEG-21 are not focused on compression, but are rather frameworks for the management of digital media content. For example, MPEG-7 Visual specifies four areas for description: colour, texture, shape and motion. Within each category there are simple and complex descriptors [34]. Example descriptors for colour include dominant colour, colour histogram and colour quantisation. These descriptors can then be used for further analysis and/or retrieval applications.

10 A slice is defined as a collection of one or more macroblocks, with minimal interaction between slices to improve error robustness.

2.2 Semantic Video Object Segmentation

Algorithms which segment semantic objects and regions can be categorised as either fully automatic, unsupervised approaches, or alternatively as semi-automatic approaches whereby the user can interact with and guide the segmentation process. In both categories, spatial analysis is frequently used to merge regions of homogeneous colour [47]. By their very nature, semantic objects often consist of differently coloured regions, and as a result colour information alone will typically lead to over-segmentation11. Motion analysis based approaches have also been proposed [48]; however, object boundaries are difficult to extract using motion information alone. The combination of colour and motion information can also fail when the spatial and/or temporal information is not homogeneous within the object [49][50]. Therefore, other features and tools are frequently used to aid the process, including edge extraction [51][52], active contours [53], gradient extraction [54], texture filters, change detection, depth information, etc.

It has been suggested that semantics can be decomposed into a hierarchy of four general levels: visual (e.g. oval), generic objective (e.g. face), instantiated objective (e.g. Mary's face) and abstract/emotional (e.g. important) [55]. Therefore, even when using a combination of the outlined segmentation techniques, it can still be very difficult to map low level pixels and features to higher level semantics. To capture higher level semantics, spatial and temporal segmentation methods are frequently augmented with domain knowledge. For example, the combination of visual, audio and domain knowledge has had success in detecting abstract semantics (such as finding exciting moments) in constrained domains like sports games and films [56][57].

To identify instances of general semantic object classes (e.g. face, car, etc.), specific object detectors are typically required. The detected objects can prove useful by themselves in generic objective tasks, for example in searching for faces in an image database. Moreover, some believe that identifying and combining a reduced set of mid level semantic objects and concepts (e.g. "face", "sky", "beach", "building", "vehicle", etc.) represents one of the more promising approaches to bridging the so called semantic gap [27][28][29]. This belief is based on the fact

11 Over-segmentation is the detection of an excessive number of objects, which may or may not even be consistent with an actual object.

that mapping low level pixels to (frequently well defined) semantic objects is a considerably easier task than a direct mapping from pixels to diverse user semantics. In this approach, a possible technique for mapping the objects to higher level user semantics involves combining and merging the objects through a taxonomy or ontology [29]. As the range of possible objects to detect is vast, for this approach to be viable it is only practical to detect a smaller subset of domain specific objects.

The approach adopted in this research uses the mid level semantic object paradigm. Furthermore, the author believes that detecting objects at the point of creation on a mobile device, rather than offline on a less computationally constrained computing resource, is an attractive proposition as it allows semantics to be introduced earlier in the digital visual data "life cycle". This has the potential to enable a greater range of novel applications (see Chapter 1). For example, during a mobile video call, the detection of the most important semantic visual objects would allow higher quality encoding or increased error robustness for these objects. Clearly this is not possible if the object detection does not take place on the mobile device. To constrain the solution space in order to build a robust solution, the presence of a human face in video sequences and images is considered by the author as a fundamental object of interest, particularly to users of mobile devices. The proposed face detection algorithm is described in Chapter 5, and although this method primarily focuses on the detection of "face objects", the approach offers the scope to be retargeted to detect other well defined objects.

2.3 Thesis Contributions in the Context of Video Object Processing

As outlined in Chapter 1, and again in Section 2.2 of this chapter, one of the principal challenges in object based processing is the initial segmentation of the semantic objects. Section 2.2 concluded that the segmentation of mid-level objects offers a promising route to tackling this problem. As a result, this approach has been chosen for further investigation within this thesis. Due to the limited computational resources, battery life and screen size of mobile devices, it is reasonable to further constrain the domain. Consequently, segmentation of the human face is chosen as a primary object of interest to users of mobile devices. Subsequent processing of the segmented face can manifest itself in numerous useful applications for the end user, which, in the highly competitive mobile device market, could be considered vital product differentiators. A direct use of the segmented face on a mobile device is to allow intelligent camera auto-focus. This application scenario (see Fig. 2.12(a)) could be considered a highly self-contained use of the face object, since an object based compression platform (e.g. MPEG-4) is not required to process or transmit the face object.


[Figure 2.12: Contributions of this thesis in the context of video object based functionality: (a) using the segmented face directly, e.g. for camera autofocus; (b) using the segmented face in a block based encoder; (c) a video object based codec leveraging the segmented face, with face segmentation and motion estimation for shape coding as the contributions of this thesis; (d) a video object based codec using generic objects, with the evolutionary artificial neural network and motion estimation for shape coding as the contributions of this thesis]

The contribution of this thesis toward this type of application scenario is the novel face detection algorithm and associated hardware accelerator proposed in Chapter 5 and Chapter 6 respectively. A similar application scenario is shown in Fig. 2.12(b), where the segmented face is used to increase error protection or ensure higher quality in a semantically important region of an image/frame. Whilst an object based video codec is ideal for this type of application, a non-object based codec such as MPEG-2 or H.264 could also be readily altered to use the segmented face mask to adapt the encoding (e.g. the quantisation parameter, to improve quality) on a per block basis. The most obvious use of the segmented face is within a video object based compression framework. Although a segmented face can be used by a block based codec, using a video object based codec allows a greater range of applications, such as the ability to offer interactive services. Another scenario, shown in Fig. 2.12(c), is where the face segmentation is used as a preprocessing step for subsequent object segmentation. This is also an example of where a mid level object (i.e. a face) can be used in the segmentation process of another mid to high level object, i.e. a person. Again, the contribution of this thesis for this type of application is the novel face detection algorithm and associated hardware.

When using a video object based codec, such as MPEG-4, the segmented object must be processed by the SA-DCT and polygon matching based motion estimation. The shape of the segmented face also requires separate encoding in the binary shape encoder. Binary shape coding was shown to be the second most computationally demanding sub-block of MPEG-4 coding [32], and within the shape encoder, binary motion estimation consumes 90% of the resources [45]. As shape coding is not carried out in regular frame based codecs, it represents a completely new computational complexity overhead for an object based video codec such as MPEG-4. This is in contrast to the DCT and motion estimation, which do not suffer a dramatic additional cost for supporting objects. It is reasonable to conclude, then, that for a computationally constrained device, once a segmentation mask is generated, one of the principal subsequent challenges in adopting MPEG-4 object based processing is the computational cost of shape coding. For the same reasons as offloading the computational complexity hot-spot in the proposed face detection algorithm, the author proposes offloading the computational complexity hot-spot in shape coding, namely binary motion estimation, to a dedicated hardware accelerator. This thesis contribution is shown in Fig. 2.12(c). Details of the proposed binary motion estimation architecture are given in Chapter 7.

A further motivating factor for a dedicated binary motion estimation architecture is that it could be reused for object tracking and possibly in semantic object temporal segmentation algorithms. It should be noted, though, that for the general case where generic video object segmentation and processing is required (as shown in Fig. 2.12(d)), the principal contribution of this thesis is the hardware acceleration of the shape coding. Additionally, if the EANN hardware accelerator is retrained, it has the potential to classify other input features. This could be leveraged in a generic object segmentation algorithm.

A key common aspect of each of the example applications mentioned in this section is the requirement for the processing to be done online on the device itself. In this way, the author believes that the proposed binary motion estimation along with face detection constitute important tools for implementing object based functionality on a mobile device. As a consequence, one can think of the output of this research programme as the beginnings of a toolkit for low power object based processing on mobile devices.

2.4 Implementation Options

Multimedia processing is inherently computationally expensive, even for "simple" algorithms, due to the continuous stream of data in high volumes (e.g. real time video). Whilst new and emerging multimedia algorithms (e.g. semantic video compression, video object segmentation, etc.) are enabling technologies for a plethora of new applications (see Chapter 1), the cost of manipulating high volume, high speed video data only serves to compound the computational complexity issue. This creates a situation where careful consideration must be given to the implementation of video processing algorithms, particularly on computationally constrained mobile devices. In the broadest sense, multimedia algorithms can be implemented on a dedicated device (ASIC or FPGA) or a programmable device (general purpose CPU, DSP, etc.) [58]. For large complex algorithms (e.g. video codecs) this is not a mutually exclusive decision, as hybrid solutions consisting of both dedicated and programmable elements are also possible [58]. The implementation choice of a dedicated or programmable solution (or indeed the algorithmic partitioning for hybrid solutions) should consider detailed analysis of the computational complexity (through profiling), the real time constraints, the scope for pipelining and parallelism, the memory and bus communications requirements, and finally the trade off between power consumption, throughput and silicon area [59]. The conclusions of this design space exploration heavily impact issues such as time to market, unit cost, battery life and the overall weight and size of the final system, all of which are key factors in the price sensitive mobile consumer electronics market.

Dedicated device implementations, with the options consisting of full custom ASIC, standard cell ASIC and FPGA, are recognised to give better performance in terms of throughput and power consumption than programmable implementation options. This can be attributed to the fine granularity of the algorithmic architecture implementation, allowing precise trade offs between power consumption, area and throughput. However, the disadvantages of a dedicated implementation are generally considered to be a lack of algorithmic adaptiveness, a lack of flexibility and the additional development time [58][59]. Tseng et al. note that while regular memory addressing and operations are easiest to map to hardware, algorithmic adaptiveness can be handled, but with an increase in development time and possibly a reduction in hardware utilisation [45]. Flexibility in the context of implementation options normally refers to the ease of making changes to a final solution. For example, a flexible solution would allow late changes to a product specification or "in the field" updates for new standards or features [60]. Such flexibility is not possible with an ASIC implementation. However, fast retargetting of FPGAs through a software bit file does allow a certain degree of flexibility with a dedicated solution. FPGA flexibility comes at the cost of reduced throughput and increased power consumption compared to an ASIC solution. In general, there is an inverse relationship between flexibility and performance (throughput and energy efficiency), which is even more pronounced for programmable solutions [45].

For higher levels of flexibility and a shorter development cycle, programmable solutions, such as general purpose Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors, are required. RISC processors are a highly popular option for the varying application demands on mobile devices. But as Pirsch observed, general purpose processors do not efficiently process multimedia data, as exploiting the special characteristics of the associated algorithms is not possible, resulting in poor hardware utilisation and excessive clock cycles for frequently occurring, yet simple, operations [58][60]. Recent advances in general purpose processor architectures, such as Intel's MMX/SSE extensions, are attempts to address this by exploiting the inherent parallelism in video and image data using a SIMD datapath [61]. Nevertheless, general purpose processors are considered to lack the performance required for complex video processing applications [58][60]. Specialised architectures, such as DSPs with a focus on arithmetic performance, are required for higher throughput, lower clock frequency processing. However, development can be more difficult (e.g. due to issues concerning high level language compilers) than for general purpose processors, and complete operating systems typically cannot be run on DSPs [60][62]. Higher throughput again can be achieved using special media processors [60]. These are essentially a DSP or general purpose CPU with additional functional units for video and audio processing.

Alternative, less mainstream, programmable solutions for a mobile device include configurable processors, reconfigurable processors, multicore general purpose CPUs and Graphics Processing Units (GPUs). The instruction set of a configurable processor is customised prior to fabrication, and in this way the processor can be optimised (instruction set, size, coprocessors, etc.) for a particular application domain [63]. Reconfigurable processors are similar, but allow run time reconfiguration. Multicore processors are becoming commonplace on desktop and server platforms, and are emerging for mobile devices, although compiler technology needs further improvement before the potential of multicore processing can be fully exploited. A more exotic implementation solution is the use of a GPU for multimedia processing (although general purpose GPU processing is currently restricted to desktop/server platforms). Due to multiple wide datapaths, GPUs were shown by Cope et al to outperform a CPU, while operating at lower clock frequencies [64]. However, when the relative performance of FPGAs was compared against GPUs, it was found that an FPGA implementation gave a comprehensively better performance for two test algorithms (colour correction and convolution). This was attributed to the flexible pipelining and parallelism possible in the FPGA. Cope et al concluded that for high memory usage applications an FPGA implementation will considerably outperform a GPU implementation.

2.4.1 Summary of Implementation Options

The implementation target for video processing algorithms is a fundamental system level decision, and is a complex trade off between throughput, energy efficiency, flexibility, cost and development time. Semantic processing on a mobile device creates extra challenges, due to the additional computational complexity imposed on an already limited embedded (typically RISC) general purpose processor. This section highlighted that general purpose processors trade off performance (both throughput and power consumption) for flexibility and ease of development, although DSP solutions and media processors can give improved performance. Whilst a fully software based solution on a general purpose processor requires less design effort and has flexibility benefits, it is widely acknowledged that a hardware implementation offers a higher throughput and lower power solution [65][45]. This superior throughput and power consumption can be attributed to the dedicated nature of the hardware and the possibility for fine grain optimisation of the implementation. Furthermore, exploiting algorithmic parallelism allows reduced execution time, which can result in a requirement for a reduced clock frequency and/or voltage. Whilst not all problems can justify the increased design effort and reduced flexibility of a hardware implementation, it is a particularly attractive solution in scenarios where a software based approach on a programmable platform struggles to meet power and performance requirements for algorithms with regular computations operating on parallel amenable data (e.g. real time video processing). Motivated by this throughput and energy efficiency benefit, the proposed face detection solution, which is discussed in Chapter 3, is designed to leverage a hardware accelerator (described in Chapter 6) for the algorithmic computational complexity hot-spots.

2.5 Energy Efficient Design Principles

The rate of battery discharge in a mobile device governs the operating time of the device between recharges. In addition, the battery also has a direct impact on the overall size and weight of the device. These factors, coupled with the overarching impact of power consumption on circuit timing performance, silicon resource usage and electronic component reliability, have brought energy efficiency to the forefront of design in recent times. In such a design methodology, the usual goal is to minimise the power consumption within an overall budget of performance, silicon resources and energy. Minimising power consumption involves tackling the mechanisms which cause it. These mechanisms, within the context of the approximate average power (Pavg) consumed in a CMOS circuit, are shown in Eqn. 2.8.

$$P_{avg} \approx \underbrace{\underbrace{\alpha C V_{dd}^{2} f}_{\text{Switching Power}} + \underbrace{V_{dd} I_{SC}}_{\text{Short Circuit Power}}}_{\text{Dynamic Power}} + \underbrace{V_{dd}\left(I_{sub} + I_{reverse} + I_{gate}\right)}_{\text{Static Power (main leakage currents)}} \qquad (2.8)$$

Where:

• α is the node switching activity

• C is the capacitance of the load, attributable to the logic and the interconnection routing (wiring)

• Vdd is the supply voltage

• f is the frequency of the clock


[Figure 2.13: Dynamic power loss in a CMOS inverter. The schematic shows an inverter with input VIN and output VOUT, supply Vdd, currents Ipmos, Isc and Inmos, and the circuit load modelled as a single capacitor CL.]

• ISC is the short circuit current, which flows whilst the PMOS & NMOS transistors are both active during switching

• Isub is the sub-threshold leakage current which flows when a transistor is in an off state

• Ireverse is the reverse bias source/drain junction current

• Igate is the gate leakage current

Historically, dynamic power, which is consumed when useful work (i.e. switching) is done in the transistors, was the dominant component in Eqn. 2.8. Dynamic power can be more easily understood via the CMOS inverter shown in Fig. 2.13, in which the circuit load and parasitic capacitance are modelled as a single capacitor. When the input to the inverter changes from a logic '0' value to a logic '1' value, the NMOS transistor has a low resistance whilst the PMOS transistor has a high resistance. This allows current to drain towards ground and results in an output of logic '0'. Conversely, when the input to the inverter changes back to a logic '0' value, the NMOS transistor has a high resistance while the PMOS transistor has a low resistance, which connects the supply voltage to the output and gives a logic '1' output. The power consumed by the CMOS inverter can be modelled as [66]:

$$P_{dynamic} = \frac{dE}{dt} = V_{dd} \times I_{dd}(t) \qquad (2.9)$$


Assuming a step input voltage and negligible leakage current:

$$I_{dd}(t) = C_L \frac{dV_{out}}{dt} \qquad (2.10)$$

$$E = \int_{0}^{t} P(t)\,dt = \int_{0}^{V_{dd}} C_L V_{dd}\, dV_{OUT} = C_L V_{dd}^{2} \qquad (2.11)$$

In a synchronous digital system, the rate of change of the input is related to the clock frequency (f) and the switching activity (α); consequently, the power consumed in a synchronous digital system with a voltage swing of 0 → Vdd is:

$$P_{dynamic} = \alpha C V_{dd}^{2} f \qquad (2.12)$$

Eqn. 2.12 is the classical equation for (simplified) dynamic power consumption. However, there are additional dynamic power losses, as shown in Eqn. 2.8. For a brief period during the switching of the input, both the NMOS and PMOS transistors are partially on, and this causes a short circuit current to flow from Vdd to ground. The associated power loss is termed the short circuit power consumption. It is generally accepted that, by carefully balancing the NMOS and PMOS transistors, this figure can be kept to 10-20% of the total dynamic power consumption, and for this reason it is frequently ignored [66][67][68].
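To put representative numbers on Eqn. 2.12, the short Python sketch below evaluates the dynamic switching power for a set of purely illustrative circuit parameters (the values of α, C and f are assumptions chosen for illustration, not measurements from any design in this thesis):

```python
# Illustrative evaluation of Eqn. 2.12: P_dynamic = alpha * C * Vdd^2 * f
# All parameter values below are assumptions chosen for illustration only.

def dynamic_power(alpha, c_load, vdd, freq):
    """Simplified dynamic switching power of a CMOS circuit (Watts)."""
    return alpha * c_load * vdd**2 * freq

alpha = 0.1          # average node switching activity (assumed)
c_load = 10e-12      # total switched capacitance in Farads (assumed 10 pF)
freq = 100e6         # clock frequency in Hz (assumed 100 MHz)

for vdd in (5.0, 3.3, 1.2):
    p = dynamic_power(alpha, c_load, vdd, freq)
    print(f"Vdd = {vdd:>3} V -> P_dynamic = {p * 1e3:.3f} mW")

# The quadratic Vdd dependence means that scaling from 5 V to 1.2 V alone
# reduces dynamic power by a factor of (5 / 1.2)**2, i.e. roughly 17x.
```

At these assumed values the quadratic Vdd term dominates the achievable savings, which is precisely the effect exploited by the voltage scaling techniques discussed in Section 2.5.1.1.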

Typically, for older semiconductor process technologies (e.g. 180nm and above), dynamic switching power represented above 90% of the total power consumed. However, in sub-100nm semiconductor process technologies, static power is rapidly increasing. Unlike dynamic power, static power could be considered wasted energy, since its expenditure is not accompanied by a useful contribution to the functionality/processing within the design. The increase in static power can be attributed principally to the vigorous growth in the Igate and (to a lesser extent) Isub currents in Eqn. 2.8 with each subsequent process node reduction below 100nm [69]. The gate leakage current is increasing because, as semiconductor processes shrink to smaller geometries, the gate oxide layers become thinner. It is now estimated that in 90nm technologies up to 40% of the total power consumption budget can be attributed to static power [69]. A further important consideration is that static power has an exponential relationship with temperature. This is an important design consideration for applications in the mobile domain, which may have packaging & heatsink restrictions. In extreme situations, an increased temperature will lead to an increased static current, which further increases the temperature. In this situation, thermal runaway and device failure are possible [70]. It should be noted that as this research is based upon video processing algorithmic investigation on general purpose processors, standard cell ASIC libraries and FPGAs, there is limited scope to address static power consumption, other than powering down logic when not in use. An in-depth discussion of the underlying semiconductor physics behind static power consumption, and methods for reducing its contribution, is presented in [66][69].

2.5.1 Low Power design techniques

Minimising both dynamic and static power is possible at all levels of design abstraction, although it is widely accepted that the greatest scope for power optimisation occurs at the higher levels of design abstraction, as subsequent design phases reduce the degrees of freedom [71][67][66]. For example, it is not uncommon for design space exploration at a system/algorithmic level to lead to solutions with a factor of 10 to 20 improvement, whereas optimisations at a circuit/process level are likely to yield improvements of only 10% to 30% [72]. Numerous guidelines and techniques can be employed at all design phases to optimise power consumption in a hardware solution [71][67][66]. Minimising redundant operations, avoiding over-design and exploring trade-offs in the power/area/throughput relationship could be considered the general themes, across all levels of design abstraction, for power minimisation. In the following sections, various low power design techniques are discussed which are appropriate at each design phase.

2.5.1.1 System and Algorithmic level power minimisation techniques

Many power optimisations at the system/algorithmic level are application dependent; nevertheless, general guidelines can be applied and are discussed in this section. Due to the quadratic relationship between dynamic switching power and supply voltage (see Eqn. 2.8), voltage reduction is an obvious and common approach to power minimisation. For example, in recent times the supply voltage for general purpose processors has reduced incrementally from 5 Volts down to 1.2 Volts and below. This represents a power saving of approximately a factor of 17. However, there are bounds on the level to which the supply voltage can be reduced, typically two to three times the threshold voltage (Vt) [66], otherwise the resultant performance of the circuit is degraded too severely. Decreasing Vt increases leakage current, which increases the static power contribution. Thus, trade offs are required between speed and power. A partial workaround is to use voltage islands and threshold voltage scaling. In this way, the parts of the design which are timing critical can use a high supply voltage and/or a low Vt, whereas in regions of the design with plenty of timing slack a low supply voltage and/or a high Vt can be used.

From inspecting Eqn. 2.8, another obvious way of tackling dynamic power is to reduce the clock frequency, which has a linear relationship with power consumption. Historically, it was recommended that, for a given supply voltage, the lowest clock frequency should be used which allowed the design to meet its throughput constraints. However, as reducing the clock frequency reduces the throughput, the device will be powered on for a longer period than would be necessary if a higher clock frequency was used. This can be an issue with the emergence of static power as a growing concern. In many future applications, a more energy efficient approach may be found by using a higher clock frequency in order to finish the processing faster and allow the logic to be powered down, thereby reducing the static power contribution. By considering the throughput requirements of a device, a more holistic approach to voltage and clock frequency selection can be employed. In particular, if the throughput requirements vary, it is possible to dynamically vary the voltage and frequency in response to these requirements. One mainstream commercial example of this is Intel's SpeedStep technology [73]. In the extreme scenario, when no processing is being carried out, the logic can be fully powered down. The degree of "power down" is usually related to how fast the logic needs to "wake up" and whether intermediate data needs to be stored. It is clear from Eqn. 2.8 that reducing the voltage and/or clock frequency will lead to considerable power savings.
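The "run slower" versus "race to idle" trade off can be illustrated with a first-order per-frame energy model. In the sketch below, all parameter values (cycle count, leakage power, switched capacitance, supply voltages) are assumptions chosen only to demonstrate the cross-over effect described above:

```python
# First-order comparison of "run slow" versus "race to idle".
# All parameter values are illustrative assumptions, not measurements.

CYCLES_PER_FRAME = 2_000_000   # work per video frame (assumed)
FRAME_PERIOD = 0.04            # 25 fps real-time deadline (seconds)
ALPHA_C = 0.1 * 10e-12         # activity * switched capacitance (assumed)
P_STATIC = 2e-3                # leakage power while powered on (assumed 2 mW)

def frame_energy(vdd, freq, power_gate_when_idle):
    busy = CYCLES_PER_FRAME / freq          # seconds spent computing
    e_dyn = ALPHA_C * vdd**2 * freq * busy  # dynamic energy while busy
    on_time = busy if power_gate_when_idle else FRAME_PERIOD
    return e_dyn + P_STATIC * on_time       # add leakage for powered-on time

# Strategy A: 50 MHz just meets the deadline; logic stays powered all frame.
e_slow = frame_energy(vdd=1.0, freq=50e6, power_gate_when_idle=False)
# Strategy B: faster clock (higher Vdd for timing), power gated when idle.
e_race = frame_energy(vdd=1.2, freq=200e6, power_gate_when_idle=True)

print(f"run-slow  : {e_slow * 1e6:.1f} uJ per frame")   # ~82 uJ
print(f"race-idle : {e_race * 1e6:.1f} uJ per frame")   # ~23 uJ
```

Under these assumed leakage figures, racing to finish and then power gating wins, illustrating why the historical "lowest adequate clock frequency" guideline no longer automatically holds.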

A fundamental system level task regularly encountered in System on a Chip (SoC) designs, which can have a major influence on power consumption, is the partitioning of the final solution into software and hardware subsystems. Although a fully software based solution requires less design effort and has flexibility benefits, it is widely acknowledged that a hardware implementation offers a lower power solution [65][45]. This superior power consumption can be attributed to the dedicated nature of the hardware, which effectively reduces the switching activity in Eqn. 2.8. However, if a processor is required in a system, the instruction set architecture plays a vital role in determining the overall power consumption. The options are principally a RISC or a CISC architecture. For low power applications, typically a RISC architecture such as an ARM core is chosen [74]. In contrast to a CISC core, a RISC core offers simpler instructions with less complex instruction decoding and less switching on the instruction bus. ARM processors also offer a reduced instruction wordlength mode known as THUMB mode, which can further reduce the power consumption, but at the cost of possibly more instructions. Another vital technique for a low power processor is dynamic voltage and/or frequency scaling. Furthermore, reducing the amount of off-chip memory accessing can lead to significant power savings, since considerable switching is required in the decoding of the address and in the associated activity on the memory and data buses. Memory access reduction is a technique that is used for both dedicated and programmable solutions. In a microprocessor environment, a cache is frequently used to reduce the number of off-chip memory accesses, improving both throughput and power consumption. A more recent trend in microprocessor architectures is a shift from single core to multiple core architectures, with each core having a much lower clock frequency than if a single core was used to achieve the same throughput [61][75][74]. Another technique for dealing with memory power consumption is to use banks of smaller memories where possible, since memory power consumption does not scale linearly with memory size.

At the algorithmic level, it is sometimes possible to reduce the word size of the datapath for mathematical operations, thereby sacrificing the accuracy/precision of the final result for less switching activity (lower dynamic power) and less complex logic (reduced area & static power). In addition, it may also be possible to terminate a computation earlier than normal if some condition exists, saving further processing cycles and power consumption. A classical example of this can be observed in some multiplication circuitry, which terminates the processing early if the multiplicand has a number of leading zeros, as these will not contribute to the final answer. One potential drawback of the technique is that early termination opportunities might only present themselves if a more serial approach is adopted, and this may cause difficulties with the throughput requirements. In this thesis, the proposed binary motion estimation architecture (see Chapter 7) uses early termination of block level SAD calculations, whilst also exploiting data-parallelism to improve the throughput. Finally, whilst the obvious choice for data representation in video processing algorithms is regular binary or twos complement notation, it may also be beneficial to explore different numbering systems. These may exhibit attractive properties such as fewer bit toggles, greater low-range precision, etc. [76].
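As a software analogy of the early termination principle (a sketch only, not the binary motion estimation architecture of Chapter 7, which operates on binary data with data-parallel hardware), the following Python fragment abandons a block-level Sum of Absolute Differences (SAD) accumulation as soon as the running total exceeds the best SAD found so far, since the candidate can no longer win:

```python
# Illustrative early-terminating SAD for block matching motion estimation.
# A candidate block is abandoned as soon as its partial SAD exceeds the
# best (lowest) SAD seen so far, saving the remaining computation.

def sad_early_terminate(current, candidate, best_so_far):
    """Return the SAD of two equal-sized blocks, or None if the running
    total exceeds best_so_far before the block is fully processed."""
    total = 0
    for row_a, row_b in zip(current, candidate):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
        if total >= best_so_far:     # early termination check per row
            return None
    return total

def best_match(current, candidates):
    """Exhaustively test candidate blocks, keeping the lowest SAD."""
    best_sad, best_idx = float("inf"), None
    for idx, cand in enumerate(candidates):
        sad = sad_early_terminate(current, cand, best_sad)
        if sad is not None:          # candidate completed and improved
            best_sad, best_idx = sad, idx
    return best_idx, best_sad
```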

2.5.1.2 RTL power minimisation techniques

At the RTL design capture phase, power is principally minimised by reducing switching activity, minimising area and using clock-gating strategies. This section outlines these techniques in more detail. It should also be noted that, for some of the more straightforward techniques mentioned (e.g. common sub-expression factoring), modern synthesis tools are capable of automatically identifying the scope for low power circuitry. Nevertheless, it is frequently desirable for the designer to capture the RTL in a manner such that these optimisations occur.


Although yielding less dramatic power savings, opportunities still exist to reduce switching activity in both the data path and the control path, even when the overall algorithm details are fixed. For example, by carefully choosing an appropriate state encoding scheme (e.g. Gray coding, one-hot), control path switching can be minimised in a Finite State Machine (FSM). This is particularly true for an FSM with many states. A fundamental step in reducing data path switching is the careful selection of the architecture for arithmetic components (adders, multipliers, division, etc.). For example, a multiple clock cycle Booth multiplier is likely to meet tight timing constraints, as well as providing opportunities for early termination. Another frequently used technique is the sharing of common sub-expressions. By factoring out common sub-expressions, fewer operations need to be completed. This is beneficial from a power and area perspective, although it will create nets with a higher fanout. The use of don't care conditions can also be employed to simplify equations/algorithms and thus reduce switching activity. Another example of where switching can be reduced is input operand isolation/gating to parallel logic clouds in which a single result is multiplexed from the multiple logic paths. Glitch minimisation techniques are also used to reduce dynamic power losses. For example, unbalanced logic paths create additional glitching, whereas balancing the logic paths reduces dynamic power as well as potentially improving timing.
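The effect of state encoding on switching activity can be quantified by counting register bit toggles. The sketch below compares natural binary counting with Gray coding for a hypothetical 16-state sequential FSM; for a counter-like state sequence, Gray coding guarantees exactly one bit toggle per transition:

```python
# Compare register bit toggles for binary versus Gray coded FSM states.
# For a counter-like FSM that steps through its states in order, Gray
# coding guarantees exactly one bit toggle per state transition.

def gray(n):
    """Standard binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def toggles(seq):
    """Total Hamming distance between consecutive codes in a sequence."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))

states = list(range(16))                 # a 16-state sequential FSM
binary_toggles = toggles(states)
gray_toggles = toggles([gray(s) for s in states])

print(f"binary encoding: {binary_toggles} bit toggles")   # 26 toggles
print(f"gray encoding  : {gray_toggles} bit toggles")     # 15 toggles
```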

Pipelining and parallelism are ubiquitous techniques in hardware design to improve throughput. Moreover, these techniques are also beneficial for power consumption. The throughput improvements due to the introduction of additional pipelining/parallelism can be used to reduce the clock frequency and/or voltage. However, as exploiting parallelism and using additional pipeline stages requires increased area, the balance between area and power requires careful investigation.

The flip flops in the register stages (or in filter taps) of a design are a major contributor to dynamic power consumption. A fundamental reason for this is that, even when the inputs to a flip flop do not change, power is still consumed due to the internal master-slave latch structure. This contribution can be reduced by as much as 30% by clock gating [72]. Clock gating, as the name suggests, disables the clock input to a flip flop, typically via a simple AND gate or latch-based gating cell. The clock is gated during periods of inactivity and the gating signal is generated at a higher level. An alternative approach (although it can also be used in a complementary fashion) to disabling the clock at the flip flop is to disable a branch of a clock tree.


2.5.1.3 Circuit and Process level power minimisation techniques

From a power consumption perspective, the principal goal at the low level circuit or process implementation phase is to minimise the capacitance in the circuit. In this way, dynamic power consumption can be reduced (see Eqn. 2.8). However, it is generally recognised that there is much less scope to reduce dynamic power at this level - reduction figures of 10% to 30% are typically quoted [72].

However, circuit level optimisation could be considered to facilitate system/algorithmic techniques such as voltage islands (e.g. through the use of voltage level shifters). A primary way of reducing the capacitance of the circuit is through the selection of the technology library during synthesis. For energy efficiency, a library characterised for low power operation is typically selected, although this may impact timing closure, in which case transistors with stronger drive strengths must be selected. During layout, the IO pin positioning on the die will affect the internal routing and could consequently increase wire losses. Minimising this is beneficial for power and timing closure; however, there may be a requirement to maintain backward compatibility with previous generations of the chip.

As noted previously, the supply voltage Vdd can be reduced to two to three times Vt whilst still preserving adequate speed in the circuit. Therefore, lowering Vt is very attractive from a low power perspective, since it allows Vdd to be reduced. However, the drawback is that lowering Vt increases the sub-threshold leakage current and hence the static power contribution, whilst raising Vt to contain leakage slows the transistors and consequently negatively affects timing closure. Multiple Vt "islands" address this tension by grouping the logic into domains and assigning to each domain the Vt appropriate to achieve timing closure. Finally, the circuit/process implementation phase is also the most appropriate place to tackle leakage currents and thus minimise static power. This can be achieved using techniques such as transistor sizing, adjusting gate oxide thickness, usage of high-K dielectrics, Silicon on Insulator (SOI), multiple voltage threshold islands and MV-CMOS technologies [65][69][72].

2.5.2 Summary of Low Power Design

This section discussed the sources of power consumption, along with typical ways of addressing the mechanisms which cause it. As the target of this work is mobile devices, energy efficiency is a key requirement. It was concluded that the greatest power savings are available at the higher levels of abstraction. As a result, this research will focus primarily on algorithmic and design capture level (RTL coding) power optimisations. This provides the greatest opportunity for power savings, whilst still being appropriate for a single block whose ultimate deployment is part of a complete system. This is reflected in the choice of the face detection algorithm, the associated hardware architecture and the binary motion estimation hardware architecture. The particular power optimisation techniques used will be described during the discussion of the proposed hardware architectures in Chapter 6 and Chapter 7.

2.6 Conclusions

This chapter presented a comprehensive review of the relevant technical background in order to provide a context and motivation for the subsequent research reported in this thesis. Section 2.1 reviewed digital image and video compression theory, which is a fundamental facilitator for the phenomenal growth in digital multimedia and multimedia equipped mobile devices, as was alluded to in Chapter 1. This review of compression theory included an overview of object based video compression in Section 2.1.2, which concluded that, with the existence of MPEG-4, a complete framework for object based compression and transport is available. The benefits of object based processing are undeniable, having the potential to enable a variety of new applications. An overview of video object segmentation algorithms was presented in Section 2.2. Following the reviews in Section 2.1 and Section 2.2, the contributions of this thesis were more clearly defined in Section 2.3 in terms of different types of video object applications. Section 2.4 discussed video processing system level implementation options, whilst the final section in this chapter discussed energy efficient design principles. It was concluded from these two final sections that dedicated hardware implementations offer a higher throughput and lower power solution than a software implementation, and that the greatest opportunity for power savings occurs at higher levels of design abstraction. These observations provide the motivation for many of the design decisions made later throughout the thesis.

CHAPTER 3

Face Detection: A Review of Popular Approaches

Face detection in images and video sequences is an essential step in intelligent human computer interaction. Robust face detection can be considered an assisting or enabling technology for a plethora of applications, including object based video coding, video indexing, event detection, emotion awareness, people tracking, etc. When viewed within this wider context, the applications mentioned can be considered as instances of the more general semantic object processing class of applications, where the object of interest in these scenarios is the human face. As such, face detection can be considered a vital component in a semantic object processing "toolkit". With a relentless demand for mobile multimedia (see Chapter 1), it also seems reasonable to expect that robust face detection processing, as part of such a "toolkit", will become commonplace on next generation mobile devices. In fact, this trend is already emerging. Automatic face detection, albeit constrained to single images with a fixed maximum number of faces detected, has recently become a highly sought after feature in premium-priced consumer digital cameras [77][78][79]. In this instance, face detection is used to automatically adjust lens focusing and camera settings. This frequently results in considerable visual improvements to photographs where people are present. Furthermore, using face detection in such a scenario also provides an opportunity for automatic semantic markup (e.g. indicating the number of people present, etc.), which could be used later in semantic image retrieval applications. As described in Section 1.2.1, there are many more novel applications which could leverage robust face detection to enrich a user's mobile multimedia experience.

However, before face detection processing gains widespread adoption on mobile devices, a number of technical issues need to be addressed. A key challenge facing designers in this regard is to find a face detection algorithm which will give good performance within the constraint of the limited processing capabilities available on the device. Additionally, the algorithm should be energy efficient to maximise the short battery life. The fundamental problem with these requirements is that, despite being a well researched topic, face detection remains a computationally challenging task [80][81]. There are a variety of technical issues which contribute to the complexity of the problem. These include environmental conditions (illumination issues, complex backgrounds, etc.), scale variations, orientation variations (upright, rotated, side profile, etc.), vastly differing facial expressions which contort the physical facial structure, occlusion and the presence of objects (glasses, beards, etc.) [80]. Furthermore, on a mobile device, the associated computational complexity is highly undesirable for both real time operation and energy efficiency. For example, current state of the art face detection algorithms typically have processing times ranging from 0.055 seconds to 4 seconds for a single CIF sized image when using a desktop processor [82][83][84]. Whilst 0.055 seconds equates to a face detection rate of 18 CIF-sized frames per second, such performance would not be possible on a computationally resource constrained mobile device¹. This significantly challenges mobile device applications that require real-time face detection in live video sequences. In addition, for many of the applications previously mentioned, face detection is only one tool in an overall semantic object processing algorithm. Therefore, face detection impacts the proportion of the real-time budget available for subsequent processing.

One possible solution to address the computational complexity is to offload the face detection processing from the host mobile processor to dedicated hardware, thereby reducing the overall system power consumption (as discussed in Chapter 2) and improving the throughput. However, porting existing software based algorithms to hardware may not necessarily lead to an efficient usage of the dedicated hardware resources. For example, as Section 3.1.3 will show, there are approaches proposed for face detection processing in hardware; however, the face detection rates of these approaches have suffered due to over constraining the problem, the choice of algorithm and hardware porting issues arising from modifying an existing algorithm. These observations provide the motivation to design an efficient face detection algorithm which, from the outset, is designed with a hardware implementation in mind. The goal is to offer a good trade off between the quality of face detection results and amenability for hardware implementation, for deployment on a power constrained mobile computing platform. This chapter principally reviews current state of the art face detection algorithms with a view to identifying suitable approaches for hardware implementation. Current hardware implementations of face detection are presented in Section 3.1.3. In Section 3.2, a discussion of the prior art, coupled with a description of the constraints and opportunities offered by a dedicated hardware implementation, leads to convergence on the proposed approach. Fundamental background theory on the proposed approach is given in Section 3.3, which provides a context for the algorithmic details in Chapter 4 and Chapter 5.

¹To determine the exact performance degradation, features of the embedded processor such as instruction set architecture, cache size, clock frequency, memory access times, etc., should be considered. However, based solely on a naive comparison of typical current clock frequencies for both platforms, it is reasonable to surmise a performance degradation of more than one order of magnitude.

3.1 Face Detection State of the Art Review

Face detection algorithms can be broadly categorised into feature extraction based approaches and appearance based approaches (see Fig. 3.1) [80][81]. Yang et al also describe further categories for less commonly used approaches [80]. These include template matching, which could be considered similar to appearance based approaches, except that the template is defined manually rather than through a learning algorithm. Furthermore, elements from both approaches are frequently combined in the pursuit of a final robust solution. Feature extraction based approaches typically use a combination of facial features as cues to determine whether a region contains a face(s). Appearance-based approaches could be considered to tackle the face detection problem by predominantly using statistical analysis and machine learning methods. The prior art in these two approaches is described in Section 3.1.1 and Section 3.1.2. Section 3.1.3 reviews current VLSI implementations of state of the art face detection algorithms. These sections provide a context for the discussion of the author's preferred approach in Section 3.2.

3.1.1 Feature Extraction based face detection approaches

Feature extraction based methods use facial features as cues to determine whether a region contains a face(s). A generic feature extraction based model, representative of prior research in this area, is shown in Fig. 3.2. This generic model consists of three principal steps: feature detection, feature combination and face verification. With the possible exception of approaches which only use a single feature and have simpler implementations, these steps can be identified in most feature extraction based face detection algorithms. Once the features are detected, they are typically clustered/combined. If multiple features are successfully combined, a face verification stage is usually employed. These three steps are described in more detail in Section 3.1.1.1, Section 3.1.1.2 and Section 3.1.1.3. Feature extraction based approaches have the advantage that features are commonly detected using scale invariant properties such as colour. This avoids searching across multiple resolutions, as is frequently the case with appearance based face detection frameworks. The resultant reduced search is advantageous from a computational complexity perspective. However, it can be difficult to extract features in the presence of illumination changes, noise and occlusion [80].

[Figure 3.1: Basic taxonomy of face detection algorithms. Feature extraction approaches (feature detection, feature combination and face verification, using cues such as skin, eyes, mouth, hair and eyebrows with geometric rules, shape fitting and machine learning) versus appearance-based approaches (PCA/Eigenfaces, distribution based, ANN, SVM, Bayesian, SNoW, HMM and boosting).]

[Figure 3.2: Block diagram of a generic feature extraction based face detection algorithm. An input image is pre-processed and passed to feature detection (colour models such as RGB, normalised RGB, YCrCb, HSV and YIQ, plus edge extraction, for skin, eyes, mouth, nose and lips), feature combination (e.g. eye-mouth triangle, binary OR'ing, clustering) and face verification (e.g. geometric rules, shape fitting, luma variations, template matching, fuzzy logic, Bayesian colour models, ANN, Eigenface).]


3.1.1.1 Feature Detection

There are numerous structural features of the human face which can be used in a feature extraction based face detection algorithm. These include skin, eyes, eyebrows, hairline, nose, mouth, etc. Typically, feature detection is the first phase in a feature extraction based face detection algorithm, though in some prior research pre-processing of the input image is also carried out. For example, Hsu et al use simple rescaling to correct difficult illumination conditions (e.g. shadows, etc.) [85] and Ikeda uses change detection pre-processing to reduce the search space in a video face detection application [86]. Depending on the orientation, the pose and whether any objects are present on the face, not all of the features (with the exception of skin) may be visible at a given instant. For this reason, and because skin detection algorithms are computationally efficient, the detection of skin pixels is commonly used in both feature-based and appearance-based face detection algorithms.

Regardless of the ethnic origin of a person, it has been found that the chrominance values of skin are constrained to a narrow region in the colour space. As such, there is a considerable body of research pertaining to skin detection, which explores a wide variety of colour spaces and classification techniques [87][88]. Commonly used colour spaces include RGB [89][90], normalised RGB [91][92], YCrCb [93][94][95][96][97][98][85][99], HSV [86], YIQ, etc. Detection methods within these colour spaces include thresholding [91][96], statistical modelling [99], elliptical skin models [85][98], Gaussian models [100][92][97], clustering models such as the Recursive Shortest Spanning Tree (RSST) [101], etc. Simple CrCb thresholding assumes there is no relationship between luminance and skin chrominance; however, many authors have shown that there is in fact a complex nonlinear relationship, particularly at very low and very high luminance values [99][85]. Phung et al use three Gaussian clusters that correspond to low, medium and high luminance [99]. Hsu et al use elliptical transformations to approximate the effects [85]. With all skin detection algorithms there will be a certain amount of non-skin pixels incorrectly labelled as skin pixels. Frequently these incorrect classifications occur in small isolated pixel groups, which can be removed using techniques such as morphological filtering [94].

Rarely, even under ideal conditions, will skin detection in isolation give an accurate face detection result. Therefore, other features are typically used. The detection of eyes, both as a cue for possible candidate face locations and for face verification, is common in the literature [85][98][102][95]. A variety of techniques have been proposed for eye detection, including colour eye map models [85], intensity variance coupled with geometric and structural rules [102], thresholded edge extraction [95], Principal Component Analysis (PCA) of edge directions [98], morphological operator based methods [103] and template based methods [104]. Furthermore, oscillatory eye movements have also been used as a feature in video sequences [105][106][107]. Similar to the techniques used for eye detection, the mouth/lips can be located via colour models which detect the redness property of the lips [93][85][102]. Other mouth detection methods include edge density with geometric rules [95]. In prior research, noses have also been used. They have been detected using a variety of techniques, including geometric rules after locating the eyes, relative brightness/darkness ratios, edge maps, etc. Face models which extract features in a more holistic fashion are also relatively common. Active contours are a more advanced version of a holistic approach [108].

3.1.1.2 Feature Combination

If multiple features are used, they must be combined in some way in order to determine whether a location contains a face. This improves the confidence that a particular area contains a face. Hsu et al combine eye and mouth features using an eye-mouth triangle [85]. The eye-mouth triangle contains inherent properties which can be exploited in a computationally efficient manner for face verification. For example, geometric rules can be applied to determine if the distance between the eyes and the mouth is reasonable, based upon the separation of the eyes. Furthermore, the orientation of the triangle can also be used. Connected component analysis or clustering techniques are also frequently used [109].
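A geometric plausibility rule of the kind applied to an eye-mouth triangle might require, for example, that the eye-to-mouth distance be commensurate with the eye separation and that the inter-eye axis be roughly horizontal. The following sketch is a hypothetical rule of this type with assumed bounds; the specific rules and thresholds used by Hsu et al [85] differ:

```python
import math

# Hypothetical eye-mouth triangle plausibility check. The ratio and tilt
# bounds below are illustrative assumptions, not the rules used in [85].

def plausible_face_triangle(left_eye, right_eye, mouth,
                            ratio_bounds=(0.6, 1.8), max_tilt_deg=30.0):
    ex = (left_eye[0] + right_eye[0]) / 2    # midpoint between the eyes
    ey = (left_eye[1] + right_eye[1]) / 2
    eye_sep = math.dist(left_eye, right_eye)
    if eye_sep == 0:
        return False
    # Rule 1: eyes-to-mouth distance proportional to eye separation.
    ratio = math.dist((ex, ey), mouth) / eye_sep
    if not (ratio_bounds[0] <= ratio <= ratio_bounds[1]):
        return False
    # Rule 2: the inter-eye axis should be roughly horizontal.
    tilt = math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                   right_eye[0] - left_eye[0]))
    return abs(tilt) <= max_tilt_deg

print(plausible_face_triangle((40, 50), (80, 50), (60, 95)))  # True
```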

3.1.1.3 Face Verification

The simplest way to verify a face from a group of detected features is to use geometric rules which describe the structure of the human face. Examples include the proportions of the face, the position of the eyes relative to the nose, circularity [94][99], aspect ratio [99], symmetry about the nose, the distance of the face from the bounding edge of the face [85], etc. The geometric rules can be applied by themselves, or in a combined manner, as a verification step for a candidate face region which has been generated from clustering lower level features. Other verification techniques used include luma variations of eye and mouth blobs [85], template matching [94], the orientation of the eye-mouth triangle [85], elliptic curve fitting [85][99], using other features themselves as verification [93], horizontal edge checking [95], Eigenfaces, PCA [90], fuzzy theory [110], ANNs [91], etc.


[Figure 3.3: Block diagram of the algorithmic options for appearance based face detection approaches. A greyscale input image is pre-processed (integral image, image pyramid, illumination correction, histogram equalisation, oval mask), low level features are extracted (intensity/luminance, KLT, DCT or wavelet coefficients, Haar-like transform, spectral histograms, sparse granular features), then a single, ensemble or cascaded classifier is applied (Eigenfaces, ANN, SVM, HMM, naive Bayes, SNoW, boosting algorithms, maximum likelihood, Gaussian mixture models, higher order statistics, similarity measures such as Euclidean or Mahalanobis distance), followed by a scale invariant search and post-processing to merge overlapping detections.]

3.1.2 Appearance based face detection approaches

Appearance based approaches treat face detection as a pattern recognition problem, which can be tackled using statistical analysis and/or via learning a discriminatory function. Prior research in this area has principally focused on face detection in full frontal views, though the more complex challenge of side profiles combined with multiple viewing angles is addressed in a more comprehensive manner in recent research efforts [111][112]. The major algorithmic options for an appearance based approach can be seen in Fig. 3.3. The input is generally a greyscale image from which low level features are extracted. The most direct low level feature that can be used is the pixel intensity values. More complex features include ratio templates, KLT coefficients, DCT coefficients and wavelets (with the simplified Haar transform proving popular in recent research [113]). These more complex features can be useful for classifier dimensionality reduction purposes, and in some cases can be considered to have better illumination invariance and rotation invariance properties compared to using the pixel intensity values directly. In contrast to the feature extraction algorithms, which were outlined in Section 3.1.1, high level semantic features such as the eyes, nose, mouth, etc. are not extracted. This is beneficial in scenarios in which high level feature detection can prove challenging (e.g. difficult conditions caused by noise, occlusion, change of expression, etc.). Such scenarios are likely to cause feature extraction based face detection algorithms to fail. For this reason, many consider appearance based algorithms more robust than feature extraction algorithms. However, frequently this increased robustness comes at a cost of increased computational complexity relative to a feature extraction based algorithm. A fundamental source of this computational cost is that the problem is tackled from a pattern recognition perspective which, when coupled with a fixed sized spatial classifier, means that to achieve scale invariance an exhaustive overlapping block search across multiple resolutions is generally required for each image/frame. A broad range of models, classifiers and associated training schemes have been explored for face detection. Appearance based approaches can be further sub-categorised into Eigenfaces methods [114], distribution based approaches [115], ANN approaches [84][116][83], Hidden Markov Model approaches [117][118], boosting algorithms [113][119][111][112], naive Bayes classifiers [120][121][122] and Support Vector Machines (SVM) [123][124][125]. The principal representative works in each category are now discussed in greater detail.

3.1.2.1 Eigenfaces

Using the KLT transformation (also known as PCA), Kirby and Sirovich showed that any face can be approximated using a linear combination of a small number of eigenpictures [126]. They found that as few as fifty eigenfaces captured 95% of the variance of a training dataset of one hundred 91 × 51 face images. Turk and Pentland used this principle to detect faces by examining the residual when a sample is projected into the face space [114]. The residual then becomes a measure of "faceness". The Eigenfaces concept is still widely used in face detection and face recognition research [120][127].
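The residual-based "faceness" measure can be expressed compactly: a candidate window is projected onto the top eigenfaces and reconstructed, and the reconstruction error acts as the distance from face space. The sketch below assumes a matrix of vectorised, aligned face samples and uses random data purely as a stand-in:

```python
import numpy as np

# Sketch of the Eigenfaces residual ("distance from face space") measure.
# `faces` is an assumed (num_samples x num_pixels) matrix of vectorised,
# aligned face images; a candidate window is vectorised the same way.

def build_face_space(faces, k=50):
    mean = faces.mean(axis=0)
    centred = faces - mean
    # Principal components of the training set via SVD (rows of vt).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:k]                     # top-k eigenfaces

def faceness_residual(window, mean, eigenfaces):
    """Reconstruction error of `window` in the k-dimensional face space.
    A small residual means the window is well explained by the eigenfaces."""
    centred = window - mean
    coeffs = eigenfaces @ centred           # project into face space
    reconstruction = eigenfaces.T @ coeffs  # back-project
    return np.linalg.norm(centred - reconstruction)

# Example with assumed stand-in data: 100 random "faces" of 361 pixels.
faces = np.random.rand(100, 361)
mean, efaces = build_face_space(faces, k=50)
print(faceness_residual(np.random.rand(361), mean, efaces))
```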

3.1.2.2 Distribution based approaches

Sung and Poggio used a distribution based scheme for face detection [115]. This principally consisted of six face and six non-face distribution models, and a Multi-Layer Perceptron (MLP) ANN to classify samples using distance metrics from a sample to each of the face and non-face distribution models. Each model is a multidimensional Gaussian cluster, whose centroid and covariance matrix were found using a modified k-means clustering algorithm. The modified k-means clustering operates on 19 × 19 pixel training samples, which have firstly undergone illumination correction and histogram equalisation, and have an oval mask fitted to eliminate background structure. To detect faces in an arbitrary image, once the 19 × 19 candidate blocks are extracted and pre-processed (mimicking the pre-processing steps used in the model training), the normalised Mahalanobis distance and normalised Euclidean distance between the candidate and each cluster centroid are evaluated. An MLP ANN with 24 hidden neurons is trained with back propagation to classify the twelve pairs of distance metrics to give an overall face or non-face decision. To detect faces larger than 19 × 19, the original image is rescaled in increments of 1.2 to generate what is popularly known in appearance-based face detection algorithms as an image pyramid (see Fig. 3.4). Using the MIT face dataset, a recall rate between 80% and 96% was achieved with an acceptably low number of false detections. As well as the overall face detection algorithm, the work by Sung and Poggio is noteworthy for introducing a number of techniques which were used extensively in subsequent face detection research. These include sample preprocessing for normalisation purposes and the bootstrapping technique used in the generation of representative non-face samples. Rajagopalan et al also use a distribution based approach for face detection but, rather than just modelling the faces and non-faces with multi-dimensional Gaussian clusters, they use higher order moments [128]. They claim improved detection performance as a result.

[Figure 3.4: Rowley's ANN based face detection algorithm (Source: Rowley [84]).]
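The exhaustive multi-resolution search implied by a fixed-size classifier can be made concrete by counting candidate windows. The sketch below repeatedly downscales the image by the 1.2 pyramid factor and slides a 19 × 19 window over each level (a unit step is assumed for illustration), showing why this search dominates the computational cost:

```python
# Sketch of an image pyramid search with a fixed 19x19 classifier window.
# Counts the candidate windows that an exhaustive search must classify.

def pyramid_window_count(width, height, window=19, scale=1.2, step=1):
    total, levels = 0, 0
    while width >= window and height >= window:
        cols = (width - window) // step + 1
        rows = (height - window) // step + 1
        total += rows * cols
        levels += 1
        # Downscale for the next pyramid level (nearest-size approximation).
        width, height = int(width / scale), int(height / scale)
    return total, levels

windows, levels = pyramid_window_count(320, 240)
print(f"{windows} candidate windows over {levels} pyramid levels")
```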

3.1.2.3 Artificial Neural Networks

The work by Rowley et al is generally considered one of the first advanced uses of ANNs for face detection, coupled with a large and difficult test dataset [84]. An MLP ANN trained via back propagation is used. The network is trained to detect faces using input pixel intensities and spatial relationships (see Fig. 3.4). Adopting an approach similar to Sung and Poggio, the input pixels undergo illumination correction and histogram equalisation prior to being used in the ANN [115]. Knowledge of the structure of the human face is implicitly embedded in the topology of the ANN. This is achieved by clustering the input pixels into overlapping regions (i.e. four 10 × 10 pixel regions, sixteen 5 × 5 pixel regions and six 20 × 5 horizontal stripes) to detect localised features. This also has the benefit that the ANN is not fully connected, thus reducing the computational cost of the additional weighted summations of extra connections. The size and location of the regions were hand selected through trial and error. For the training phase, 1050 faces from the FERET face database were used [129]. The face training dataset was extended by manipulating each sample (i.e. mirroring, multiple rotations, scaling, translation), generating in excess of 15000 face samples. The bootstrapping approach proposed by Sung and Poggio was used to generate the non-face training samples. Multiple ANNs were trained, containing either 2905 or 4357 connections with 52 or 78 hidden neurons. As the output results of the ANNs were combined during normal mode operation, each was trained using different initial randomised weights to ensure diversity in the ANNs. During normal mode operation, each ANN evaluates the candidate face pixel regions in an exhaustive search of a multi-resolution image pyramid, similar to Sung and Poggio [115]. Following this, if one detection overlaps with another, the detection with the lower output value is removed. When examining overlapping pixel regions, there is a high probability of multiple positive detection results in the vicinity of a true face. Rowley exploits this characteristic to reduce the false positive rate, since false positives have a lower probability of exhibiting this behaviour. He achieved this by employing a threshold for the minimum number of overlapping detections in a region before it is classified as a positive face result. Finally, the results from the individual ANNs are combined using Boolean logic. Their approach achieved between 77.9% and 90.3% recall with an acceptable false detection rate when using their large and difficult dataset, which has since been used extensively in face detection research. Rowley's approach requires 50 times fewer floating point operations to classify a window compared to Sung and Poggio's approach [115]. To further reduce the computational cost of the algorithm, Rowley used a 30 × 30 pixel window and an ANN trained with 20 × 20 centred and off-centred (up to 5 pixels either side) faces. The pixel window advanced in increments of ten pixels in both the horizontal and vertical directions. This ANN introduced considerably more false positives, so a more accurate 20 × 20 classifier then examined the promising locations from the initial 30 × 30 search. This reduced the processing time to 7.2 seconds for a 320 × 240 image using a 200 MHz R4400 SGI Indigo 2 machine, but at the cost of a reduction in the recall and precision rates.
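Rowley's overlap-based heuristics can be approximated in two post-processing steps: reject detections without sufficient nearby support, then keep only the strongest of any overlapping survivors. The sketch below is an illustrative approximation of this idea, with an assumed neighbourhood radius and support threshold, rather than Rowley's exact procedure:

```python
# Illustrative post-processing in the spirit of Rowley's heuristics:
# (1) require a minimum number of nearby raw detections, then
# (2) keep only the strongest of any overlapping survivors.

def near(a, b, radius=10):
    """Detections (x, y, score) whose centres lie within `radius` pixels."""
    return abs(a[0] - b[0]) <= radius and abs(a[1] - b[1]) <= radius

def merge_detections(dets, min_support=3):
    # Step 1: threshold on the number of overlapping raw detections.
    supported = [d for d in dets
                 if sum(near(d, o) for o in dets) >= min_support]
    # Step 2: greedily keep the highest-scoring detection in each cluster.
    survivors = []
    for d in sorted(supported, key=lambda d: d[2], reverse=True):
        if all(not near(d, s) for s in survivors):
            survivors.append(d)
    return survivors

raw = [(100, 80, 0.9), (102, 81, 0.8), (99, 79, 0.7), (300, 200, 0.6)]
print(merge_detections(raw))   # the isolated (300, 200) detection is culled
```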

Rowley's initial work could only detect faces in a full frontal view with small rotations. Later, he extended this work to detect full frontal faces in arbitrarily rotated orientations [130]. This method used an ANN to detect the orientation of a sample. Based upon this value, the sample was automatically rotated back to an upright full frontal orientation and applied to an ensemble of classifiers similar to his previous work. Non-face samples returned meaningless orientation values, but a non-face rotated by an arbitrary amount should still remain a non-face. However, in some instances the rotated non-face appeared similar to a face. This caused a slight reduction in the detection performance.

Feraud et al claim that learning a discriminant function for face detection is troublesome due to the difficulty of acquiring representative non-face samples [116]. Instead, they propose a generative model. This uses a Constrained Generative Model (CGM) ANN, which could be considered to be an auto-associative ANN performing a nonlinear PCA on 15 × 20 pixel regions. However, as this transformation has a high computational cost, they use pre-filtering, which improves the run-time performance by up to a factor of 25. In the first stage of pre-filtering, if the input is a video sequence, they use a simple thresholded frame difference motion filter. Following this, if the original input is a colour image/frame, a skin filter is used to further reduce the number of pixel regions to classify. The final stage before the CGM ANN is a back-propagation trained, fully connected MLP ANN with 300 input pixel intensity values, 20 hidden neurons and a single output neuron. Overall, the approach achieved detection performance similar to that achieved by Rowley et al [84].

Garcia and Delakis identified the commonly used sample pre-processing techniques (i.e. illumination correction, histogram equalisation, etc.) proposed by Sung and Poggio as a computational cost that could be avoided by using a convolutional ANN [83]. Taking pixel intensity values as inputs to the convolutional ANN, the first four layers in the network topology perform convolution and subsampling, whilst the last two sigmoidal based layers perform classification. The training of the convolutional ANN uses 25212 face samples, generated from manipulating (i.e. mirroring, rotating, smoothing, edge enhancing, adding random noise) 3702 randomly collected faces from various sources. The non-face bootstrapping approach proposed by Sung and Poggio was used to generate 25087 non-faces for training. Performance during normal mode operation is 4 frames per second for CIF sized images on a 1.6 GHz Pentium-4. Their approach performed well with rotated faces (rotated up to ±20 degrees in the image plane and turned up to ±60 degrees) and overall has a detection rate marginally better than Rowley et al, with a lower false positive rate [84]. This lower false positive rate could be partially attributed to their detection merging scheme. After a coarse merging of detections across resolutions, using the size and centroid of each detection as merging criteria, a fine search (with a localised image pyramid) is conducted in the vicinity of the coarsely merged detection result.

One generally accepted disadvantage of ANNs is the need for extensive hand tuning of the network topology [80]. One possible solution to this issue is to use a Genetic Algorithm (GA) to evolve an optimum topology and/or weights. This approach is known by many names, including Topology and Weight Evolving Artificial Neural Networks (TWEANN), EANN or Neuroevolution [131][132][133], though from this point forward EANN will be used in this thesis. As well as reducing the requirement for trial and error design exploration of the ANN, the approach is better equipped to avoid local minima and has the scope for finding a minimal topology [131]. In recent work, Wiegand et al combine EANN techniques to evolve a topology with gradient descent optimisations to adjust the weight values [134]. Starting with an initial topology of 52 neurons, the algorithm uses genetic mutation operators to add/remove neurons, add/remove links, toggle weight values and add/remove receptive fields. The receptive fields are connections to input pixels which form groups similar to those suggested by Rowley et al [84]. Their goal was to improve run-time operation. The evolutionary search achieved this goal, improving the run-time by 50% relative to a reference topology. This was achieved by the automatic pruning of hidden nodes, whilst still retaining the same classification performance. Unfortunately, little further discussion is presented by Wiegand et al about the detection performance or whether standard test datasets were used. Montgomery also recently investigated using a number of Evolutionary Algorithm (EA) based systems with the principal goal of evolving a ratio/template based face detection scheme [135]; however, his evolved solutions had high false positive rates.

3.1.2.4 Support Vector Machines

Osuna et al proposed a face detection algorithm which uses an SVM classifier [123]. SVMs can be considered a way of training classifiers in which the decision surfaces are found by solving a linearly constrained quadratic programming problem. The general framework used by Osuna et al shares much commonality with the approaches of Sung and Poggio, and Rowley et al [115][84], in that overlapping pre-processed 19 × 19 pixel blocks from an image pyramid are exhaustively classified. Similar to Sung and Poggio, bootstrapping is used to generate non-face training data [115]. Detection performance is marginally better than Sung and Poggio, and slightly inferior to the results achieved by Rowley et al [115][84]. Romdhani et al explored using a reduced support vector set for a sequential SVM approach to face detection [124]. This system was 30 times faster than Osuna's algorithm. Waring et al combine spectral histograms and an SVM in a recent face detection algorithm [125]. The spectral histogram is generated using a combination of 33 filters (i.e. Gabor filters, gradient filters and Laplacian of Gaussian filters). This algorithm achieves slightly better results than Sung and Poggio, Rowley et al and Yang et al [115][84][136]. However, the cost of evaluating the spectral histogram for each candidate location leads to a solution which requires several minutes to classify a single image.

3.1.2.5 Hidden Markov Models

A Hidden Markov Model (HMM) can be used for face detection by learning the face to non-face and non-face to face transitions through a sequence of observations. Samaria and Young first proposed using a HMM for face detection and recognition [137]. The HMM approach proposed by Rajagopalan et al achieves marginally better detection performance compared to Rowley et al, but with a higher false positive rate [128][84]. Nefian and Hayes combined lower order KLT coefficients as input features with a HMM. This considerably reduced the size of the observation vector and also improved the training and detection run-times. In subsequent work, Nefian and Hayes proposed a similar system, though using a pseudo 2-D HMM and maximum likelihood training [118]. However, comprehensive comparison with other face detection works is omitted by Nefian et al.

3.1.2.6 Bayesian based approaches

Schneiderman and Kanade proposed a face detection scheme in which they attempt to model the intractably large and complex posterior probability (i.e. the probability of the presence of an object in an image) through a series of constraints, assumptions and learning [120]. Firstly, the size of the object is fixed, with the consequence that overlapping windows must be examined and multiple resolutions must be searched for scale invariance. Using Bayes' theorem, the posterior probability is decomposed into class conditional probabilities. Four 16 × 16 pixel subregions are used and a simplifying assumption is made that no statistical interdependencies between subregions exist. To reduce the dimensionality of the problem further, a linear projection using PCA is used. The class conditional probabilities are then decomposed into the product of two distributions. Training is used to estimate parameters within the resultant Bayes decision rule. For full frontal images they achieve marginally better detection performance with lower false positive rates compared to Rowley et al [84]. Schneiderman and Kanade later extended this work to more robustly detect side profile faces as well as cars [121]. In this version the statistical information was gathered in the form of multiple histograms of wavelet coefficients. Overall, this approach improved side profile face detection, although full frontal face detection was slightly inferior to their prior PCA based approach [120][121].


Figure 3.5: The Haar-like features used in the Viola & Jones cascaded AdaBoost face detection algorithm: (a) two, three and four element Haar-like features; (b) two and three element features overlaid on a face training sample (Source: Viola & Jones [113])

Liu also proposed a Bayesian face detection scheme, which integrates feature analysis, mod- elling and a Bayes classifier [122]. The novel discriminating feature analysis combines the input image, an associated 1D Haar transform of the image and amplitude projections. Liu claims the resultant feature vector improves the discriminatory power for face detection. The face class is modelled as a multivariate normal distribution, as are a subset of non-faces which lie closest to the face class. The model parameters are learnt from training images. After modelling, a Bayes classifier is used to detect faces in unseen images. For full frontal faces, the scheme achieves comparable detection performance to the method proposed by Schneiderman and Kanade but with a lower false positive rate [120].

3.1.2.7 Cascaded Boosting trained detectors

A recent trend in face detection algorithms is to use coarse to fine grain searches, thus eliminating unlikely candidate locations quickly and focusing computational resources on promising locations.

The robust face detection algorithm proposed by Viola and Jones is an example of this strategy and one of the first successful robust real-time face detection algorithms [113]. This algorithm uses cascaded AdaBoost trained classifiers. Boosting algorithms can be used with many different classifiers and work by combining weak classifiers to form strong classifiers. Each classifier in the cascade is trained on the false positives from the classifier in the previous stage. A candidate face sample progresses through the increasingly computationally demanding cascade of classifiers until the sample is classified as a face or until a negative result is found. In this way, processing for obvious non-face samples finishes quickly, as the processing terminates in the early stages of the cascade. This dramatically reduces computational run-time and is one of the principal contributions of the approach. To facilitate the increasingly complex classifiers, an increasing number of input features are used within the classifiers. Simple Haar-like transformations are used as the input features. These are generated very efficiently using the integral image technique proposed by Viola and Jones. The features, which are shown in Fig. 3.5(a) and overlaid on a training sample in Fig. 3.5(b), are calculated using only simple operations. For example, the two rectangle feature is the difference between the sums of pixel intensities in two rectangular blocks, whilst the three rectangle feature sums the outer blocks and subtracts this from the inner block. The integral image is beneficial as it avoids the computational cost of image pyramid generation by allowing the detection window to resize. However, for face scale invariance a full exhaustive search across multiple resolutions is still necessary. The final classifier has 32 layers using a total of 4297 features and achieves slightly better detection performance compared to Rowley [84]. However, the run-time is on average 15 times faster than Rowley's method [84]. It achieves 14 frames per second with 320 × 240 sized images. A variant of Viola and Jones' algorithm was proposed by Šochman [138], which differs in that it uses a faster version of boosting. This eliminates candidates sooner in the cascade and typically improves run-time performance by 20%, although detection performance degrades somewhat.
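To clarify why the integral image makes these features so cheap, the following sketch implements the technique described by Viola and Jones; the function and variable names are illustrative.

    # Minimal sketch of the integral image and a two-rectangle Haar-like feature.
    def integral_image(img):
        """img: 2-D list of pixel intensities. The returned table has one extra
        row and column of zeros so rectangle sums need no boundary checks."""
        h, w = len(img), len(img[0])
        ii = [[0] * (w + 1) for _ in range(h + 1)]
        for y in range(h):
            row_sum = 0
            for x in range(w):
                row_sum += img[y][x]
                ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
        return ii

    def rect_sum(ii, x, y, w, h):
        # Any rectangle sum costs four array references, independent of its size
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

    def two_rect_feature(ii, x, y, w, h):
        # Difference between the sums of two horizontally adjacent rectangles
        return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

Once the integral image is built in a single pass, every feature evaluation reduces to a handful of additions and subtractions, which is what makes the early cascade stages so fast.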

Figure 3.6: Additional Haar features for rotation-invariant face detection (Source: Viola [119])

Viola and Jones later extended their initial system to also allow detection of rotated faces [119]. A decision tree was used to estimate the pose of a sample and a trained cascade of AdaBoost classifiers, similar to their prior research, was used to classify the window. Four new diagonal classifier features (shown in Fig. 3.6) were introduced to improve the detection accuracy of non-upright faces. These diagonal features were carefully chosen to allow fast generation from the integral image. Again, similar performance was achieved compared to Rowley's rotation invariant work [130]; however, the run-time is considerably better. A side-profile face detector using additional decision trees and detectors gave similar performance to the system proposed by Schneiderman and Kanade, but again the run-time was considerably improved. Wu et al proposed a solution which could be considered a variant of Viola and Jones' rotation invariant approach, but used the Real AdaBoost algorithm [139][119].

Extensive improvements to the AdaBoost face detection framework were proposed by Li and Zhang and, more recently, by Huang et al [111][112]. Both approaches focus on improving multi-view and rotation invariance in addition to improving the boosting algorithm. Li and Zhang use a novel training scheme called FloatBoost, which automatically removes weak classifiers from the cascade that are deemed ineffective [111]. This leads to a better performing strong classifier with an improved run-time. Multiple detectors organised in a coarse to fine grain pyramid structure are used to detect faces in different views (i.e. from side profile through to full frontal in increments of 10 degrees). Using a full frontal view test dataset (i.e. the CMU dataset), the approach marginally outperforms Rowley's ANN approach for a given number of false positives. It also achieves better performance than Viola and Jones' original work and uses up to 58% fewer weak classifiers. The multi-view capabilities are further extended by Huang et al [112]. They propose a new cascade structure called width first search, which attempts to tackle the deficiencies of the decision tree approach and the pyramid approach [119][111]. Furthermore, a novel generalised boosting algorithm termed Vector Boosting is used in the training phase. Rather than use simple Haar-like features, novel sparse granular features are proposed, which they claim capture irregular features better whilst having similar computational costs. These combined contributions allow the algorithm to outperform the methods proposed by Rowley et al, Viola and Jones, and Li and Zhang in terms of detection rate [84][113][111]. Performance is 10 frames per second on a 3 GHz Pentium-4 machine when using 320 × 240 sized images.

3.1.2.8 Other Appearance-based Approaches

Yang et al use the Sparse Network of Winnows (SNoW) training methodology [136]. This is specifically targeted at learning in domains in which there is a large number of features, but only a small number are active at any time. The input feature used is the rather unusual combination of pixel intensity multiplied by position. Preprocessing of samples occurs in a similar manner to that proposed by Sung and Poggio [115]. The approach is computationally efficient and achieves better results than Sung and Poggio, and Rowley et al [115][84].

3.1.3 Prior art in hardware acceleration of face detection

There are a small number of face detection algorithms implemented in hardware. Etienne-Cummings et al describe a low complexity analog design for Hue Saturation Intensity (HSI) histogram-based skin detection followed by simple user defined template matching [140]. An analog design approach is also used by Boussaid et al, who proposed a threshold-based skin detection filter which uses the normalised red-green chrominance subspace [141]. A digital design for skin detection is proposed by Krips et al, in which a three input (i.e. RGB) ANN is used [142]. The ANN is trained offline with back-propagation. The trained topology is a fully connected feed-forward MLP with three hidden neurons. Owing to the small and fixed topology size, the ANN is implemented directly, with dedicated logic for each structural element. The activation function in each neuron is implemented as a Look Up Table (LUT).
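A LUT-based activation function of this kind can be sketched as follows: the sigmoid is precomputed over a clamped, uniformly quantised input range, so run-time evaluation reduces to a single table lookup. The table size and input range here are illustrative assumptions, not the parameters of any of the designs above.

    # Sketch of a LUT-based sigmoid activation (sizes are illustrative).
    import math

    LUT_BITS = 8                      # 256-entry table
    X_MIN, X_MAX = -8.0, 8.0          # inputs outside this range saturate
    STEP = (X_MAX - X_MIN) / (2 ** LUT_BITS - 1)
    SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(X_MIN + i * STEP)))
                   for i in range(2 ** LUT_BITS)]

    def sigmoid_lut(x):
        # Clamp, quantise to a table index, then look up the stored value
        x = max(X_MIN, min(X_MAX, x))
        return SIGMOID_LUT[int(round((x - X_MIN) / STEP))]

The quantisation step bounds the approximation error, which is the precision/area trade-off that recurs throughout the hardware implementations discussed in this section.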

Unfortunately, skin detection without further processing is not sufficient for robust face detec- tion (see Section 3.1.1.1). However, a hardware architecture for skin detection followed by such post-processing was proposed by Paschalakis et al [109]. This design, which is targeted for a Field

Programmable Gate Array (FPGA), uses a combination of adaptive statistical histogram based skin detection, connected component analysis and a simple tracking algorithm. The input QCIF size frame is firstly subsampled to a very low resolution (11 9). Further processing occurs at this × resolution and thus explains the high throughput of the design, which they claim can achieve over

400 frames per seconds using a 33 MHz clock. The skin detection is a threshold-based scheme in the LogRG subspace and is implemented via a series of memory lookups. A simplified connected component analysis algorithm then tags and clusters detected skin pixels. The tracking algorithm uses distance and mass metrics calculated using each skin cluster and the detected face(s) in the previous frame. Unfortunately, there is no evaluation by Paschalakis et al against other approaches using standard test face datasets, thus making comparisons difficult. Furthermore, as this approach is heavily dependent on the initial skin detection process it is unclear how well (if at all) this algo- rithm will function in presence of skin-like coloured background regions. Non-face skin regions

(arms, legs) can also be classified as faces, although there is less opportunity for this to occur in

69 3.1. FACE DETECTION STATE OF THE ART REVIEW the proposed application of frontal head and shoulders type mobile video conferencing.

The more common approach for implementations of face detection in hardware is to use appearance based algorithms (see Section 3.1.2). Possible reasons for this design decision will be discussed further in Section 3.2. Implementations of appearance-based face detection algorithms, which include at least some aspect in dedicated hardware, include ANN based approaches [143][144][145], a scheme using a naive Bayes classifier [146], a HMM approach [147], PCA/eigenface based face recognition [127], and a recent implementation using the AdaBoost algorithm [148]. The hardware ANN implementation proposed by Theocharides et al is based upon the rotation invariant face detection ANN algorithm proposed by Rowley et al [130][143]. Theocharides et al implement dedicated hardware for the image pyramid generation, histogram equalisation, illumination correction, rotation detection ANN, and the retinally connected ANN classifier [143]. A word-length of 9 bits (1 sign bit, 3 integer bits, and 5 fractional bits) is used within the ANN elements. The Tanh activation function is implemented in a LUT. The design is synthesised for 0.16 µm ASIC technology and achieves a frequency of 75 MHz. However, the detection rates are considerably lower than reported in [130]. This is most likely a consequence of the LUT based activation function and/or insufficient precision in the fixed point arithmetic.

Furthermore, the power consumption of 7 watts is unsuitable for mobile deployment [143].
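To illustrate the precision loss such a 9-bit format implies, the following sketch converts values to and from the reported representation (1 sign bit, 3 integer bits, 5 fractional bits); the saturation policy and helper names are assumptions.

    # Sketch of a 9-bit fixed-point (sign.3.5) quantiser, as reported above.
    FRAC_BITS = 5
    SCALE = 1 << FRAC_BITS             # 2^5 = 32 quantisation steps per unit
    MAX_RAW, MIN_RAW = 255, -256       # 9-bit two's-complement limits

    def to_q3_5(value):
        raw = int(round(value * SCALE))
        return max(MIN_RAW, min(MAX_RAW, raw))   # saturate instead of wrapping

    def from_q3_5(raw):
        return raw / SCALE

    # e.g. to_q3_5(0.7143) -> 23; from_q3_5(23) -> 0.71875
    # i.e. a quantisation error of roughly 0.004 per weight

With only 32 steps per unit, errors of this magnitude accumulate across every weighted connection, which is consistent with the degraded detection rates reported for the design.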

A hardware/software ANN implementation is proposed by Sadri et al [144]. Numerous functions, including image pyramid generation, rotation invariance and the MLP ANN classifier, are processed in dedicated logic on an FPGA. The remaining tasks (e.g. detection merging, etc.) are carried out in software on a PowerPC which is part of the fabric of the FPGA used. In contrast to the approach of Rowley et al, after pre-processing a sample an edge filter is applied, resulting in a binary quantisation of the input pixels. This has the beneficial effect of eliminating the multiplication operations required in the first layer of the ANN topology, although it is also likely that the quantised pixels will cause a reduction in the overall face detection performance. However, as the performance of the system is not compared against others or indeed evaluated on a standard test dataset, this cannot be verified. Similar to Theocharides et al, a word-length of 9 bits and a LUT based activation function are used. After the first layer in the ANN, embedded multipliers on the FPGA are used for the weighted sum calculations of ANN connections. A throughput of 9 frames (800 × 600 pixel resolution) per second is claimed with a 200 MHz clock frequency. Smach et al also describe an ANN based hardware implementation for face detection [145]. A linear spline based approximation of the activation function was used, although this logic was present in each of the 16 Processing Elements (PE). Had a systolic array architecture been employed, this duplication could easily have been avoided. No discussion is presented by Smach et al concerning the face detection performance of the resultant system.

Figure 3.7: System architecture of the face detection algorithm proposed by Nguyen et al (Source: Nguyen [146])

A hardware PCA based face recognition system is proposed by Prasanna et al [127]. An ANN is used to calculate the principal components after training with generalised Hebbian learning. The design is implemented using 0.13 µm ASIC technology and achieved a maximum clock frequency of 117.5 MHz. Using the Yale database, 75 images were processed in 12 seconds. They claim a speed up in the order of 10^5 is possible when using the dedicated hardware over a PC implementation, although this comparison is not strictly fair, as the PC implementation runs within Matlab, where its execution speed is constrained by the interpreted environment.

Nguyen et al present a face detection scheme which is used in an algorithm to extract lip features for a combined audio-visual speech recognition application [146]. Their proposed face detection algorithm combines image gradient information with a naive Bayes classifier. An exhaustive search using 20 × 20 pixel windows over multiple resolutions is used. For each extracted window, histogram equalisation is carried out. The magnitude and direction of the edges present in the window, which are extracted using a Sobel filter, are subsequently calculated. A quantised version (7 levels) of the edge direction is then used to form a vector which is classified using a naive Bayes classifier. A block diagram of the system architecture is shown in Fig. 3.7. The score computing unit, which effectively encapsulates the face model, uses the quantised edge direction vectors to retrieve 16-bit (i.e. using 1.1.14 representation) likelihood values from LUTs in Read Only Memories (ROMs) on the Altera Stratix FPGA. The retrieved values are then summed using an adder tree. This design was targeted at an FPGA, where it used 15,050 logic elements and achieved a maximum clock frequency of 41 MHz. The algorithm was evaluated using the Yale dataset, where it achieved 86.6% detection with no false positives.
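The essence of such a score computing unit can be sketched in a few lines: each window position indexes a per-position log-likelihood table (the ROM LUTs in the hardware) with its quantised edge direction, and the window score is the sum of the retrieved values. The table contents below are random placeholders standing in for values learnt offline; names and the decision threshold are assumptions.

    # Sketch of LUT-based naive Bayes scoring over quantised edge directions.
    import math, random

    NUM_LEVELS = 7                     # quantised edge-direction levels
    WIN = 20 * 20                      # positions in a 20x20 candidate window
    # face_llr[p][d]: log-likelihood ratio for direction d at position p;
    # placeholder values here, learnt from training data in the real system
    face_llr = [[math.log(random.uniform(0.5, 2.0)) for _ in range(NUM_LEVELS)]
                for _ in range(WIN)]

    def window_score(directions):
        """directions: WIN quantised edge directions (0..6) for one window."""
        return sum(face_llr[p][d] for p, d in enumerate(directions))

    def is_face(directions, threshold=0.0):
        return window_score(directions) > threshold

Because the per-position factors are assumed independent, the product of likelihoods becomes a sum of table lookups, which is why an adder tree suffices in hardware.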

The hardware architecture proposed by Yang et al is based on Viola and Jones' AdaBoost face detection algorithm [148][113]. A number of modifications and approximations are firstly made to Viola and Jones' algorithm for the intended application scenario of this work, whereby face detection would be used for automatic exposure control in handheld cameras. As such, strict real-time constraints must be placed upon the original algorithm. This is particularly troublesome for the variable run-time of the cascaded classifiers. In the extreme case, rather than truncate processing to ensure a specific run-time performance, which they show can lead to a considerable number of missed detections, they propose a scheme exploiting spatio-temporal properties of the cascade and source images. Furthermore, rather than rescale the detection window, they rescale the integral image. They claim adopting this strategy is beneficial because, on embedded processors with very small cache sizes, the cost of rescaling is less than the cost associated with the loss of data in the cache due to the larger detection window size. This is especially true as the integral image rescaler can run in parallel and does not require substantial logic. The hardware architecture proposed by Yang et al is shown in Fig. 3.8(a). The face detection engine, shown in Fig. 3.8(b), is principally responsible for retrieving the features, computing the classifiers and rescaling the image. The elements of the integral image are stored using a word-length of 32 bits. The classifier pipeline consists of rectangular summation, a four cycle multiplier and a threshold comparison. The latency of the classifier pipeline is five clock cycles, but once filled one feature is processed every two cycles. The design is targeted at an Altera Cyclone II FPGA and uses 6394 logic elements and 22 kilobytes of storage. The implementation is evaluated using a custom test dataset and achieves over 75% detection with 13 false positives.

A number of other prior works in the face detection literature also feature low levels of hardware acceleration. These include a resistive fuse network approach proposed by Nakano [149] and an edge feature extraction VLSI chip used in a 2-D HMM approach by Suzuki and Shibata [147]. An interesting design space exploration for face detection, which is applicable to hardware or multi-core implementations, is presented by Chellappa et al [150].


Figure 3.8: Face detection hardware architecture proposed by Yang et al: (a) system architecture; (b) face detection engine (Source: Yang et al [148])

3.2 Discussion: Suitable Algorithms for a Mobile Device

The goal of this section is to draw conclusions from the state of the art review of face detection algorithms presented in Section 3.1 and to identify a suitable energy efficient algorithm for deployment on a mobile device. Along with the goal of energy efficiency, the other principal motivating factors in the choice of algorithm are the potential quality of the face detection results and the run-time performance. Although it was established in Chapter 2 that dedicated hardware offers a considerable energy efficiency benefit over a software implementation, a fundamental design decision still remains as to whether it would be appropriate and cost effective to offload all or some of the face detection processing to dedicated hardware. In general, dedicated hardware brings faster execution, improved energy efficiency and reduced manufacturing costs (provided the number of units exceeds a sufficient volume). However, the cost of these benefits is increased design time (and, as a result, higher non-recurring engineering costs) and a reduction in reconfigurability.

For any algorithm these trade-offs should be carefully explored within the context of the intended deployment target. For example, a dedicated hardware block for face detection which could be reconfigured for an artificial intelligence gaming application is potentially very attractive and valuable for a multi-function mobile device, but the benefit is less so for a dedicated video sensor node in a security application. It should also be noted that a dedicated hardware implementation is not necessarily faster than software for all algorithmic structures. For example, a single multiplication in software most probably uses a dedicated multiplier in the datapath of a full custom designed microprocessor, which is likely to be at least as fast as the same word-length sized multiplier in the pipeline of a hardware accelerator designed using a standard cell ASIC design flow. The possible benefit for the dedicated “hardware accelerator” in this rather contrived example is that the datapath width could be tuned to the algorithm to allow reduced bit switching (lowering power consumption) and/or multiple multiplications could be carried out in parallel, thus reducing run-time. Unlike a software implementation, the dedicated hardware also has the power consumption advantage of not requiring an instruction to be fetched from memory and decoded.

If a hardware design route is chosen (most likely due to real-time performance and/or power consumption issues), consideration should be given to how easily the function can be pipelined and whether opportunities exist to exploit parallelism in the function to achieve a specific performance level. In addition, for image and video processing algorithms, careful consideration must be given to memory access and structuring. Otherwise, a dedicated hardware solution could transform an initially compute-bound problem into a memory bandwidth bound problem, which will fail to maximise the potential of the dedicated hardware resources. Furthermore, memory accesses are generally considered very power hungry due to the address decode logic. With these general considerations in mind, the author concludes that a hardware implementation of the inherently computationally demanding face detection processing on a mobile device makes sense from an energy efficiency and run-time performance perspective, provided the specific face detection algorithm can be easily pipelined, offers scope to exploit parallelism and uses regular memory addressing. A more likely scenario for a given face detection algorithm is that there are computational complexity hotspots within the algorithm which are more suitable for hardware offload than implementing the entire algorithm in hardware. The remaining operations can reside in software and use the host processor available on the mobile device. This approach also opens up the possibility that, with different software control logic, the dedicated hardware accelerator could be reconfigured for other tasks.

The algorithms discussed in Section 3.1 will now be re-evaluated from a hardware offload suitability perspective. With the exception of trivial skin detection filters, few feature extraction based face detection algorithms have been implemented in digital hardware [109]. This may be explained by the fact that, in general, feature extraction based approaches offer limited opportunity to exploit hardware parallelism, involve complex memory addressing schemes, may require multiple passes and are difficult to pipeline efficiently. In contrast, appearance based approaches have many desirable characteristics which lend themselves to viable hardware implementation. Moreover, many consider appearance-based algorithms to be more robust than feature extraction based methods (see Section 3.1). As can be seen in Fig. 3.3, a trained appearance based approach principally consists of an exhaustive search across multiple resolutions, where each candidate location may undergo normalisation processing before a classification phase². As such, memory addressing is regular and many of the proposed classifiers have already been implemented in hardware. Of the many classifiers reviewed in Section 3.1, ANNs, SVMs, naive Bayes classifiers and Haar-like feature classifiers trained with boosting algorithms have resulted in the best detection performance, and all have been shown to have viable hardware implementations [143][146][151][152][148]. Since all these approaches are generally accepted to give similar face detection performance [84][123][120][113], and boosting is a technique which can be applied to most training schemes, other factors should also be considered, particularly from a hardware implementation perspective.

Reusability of a hardware core is important on a constrained computing platform, to avoid relatively infrequently used logic consuming excessive silicon area. This becomes even more important in deep sub-micron technology, where static power can contribute up to 40% of the total power consumption in 90 nm semiconductor process technologies [69], although in this scenario the static power can be addressed by powering down the block. Nevertheless, when using reusability as a design criterion, ANNs have a distinct advantage over the other approaches, due to their more established application base, a result of having been deployed in a broad spectrum of classification, perception, association and control applications [30]. Furthermore, within the semantic image/video processing domain, ANNs are frequently used for segmentation, automatic labelling and image understanding tasks [153]. As such, an ANN can be considered a highly versatile fundamental tool in a semantic object processing “toolkit”, one that may be combined with other lower-level basic video processing building blocks (e.g. illumination correction, histogram equalisation, etc.) to achieve a solution for higher level abstracted semantic tasks.

A difficulty when using ANNs is that considerable manual fine tuning of the topology and associated parameters is required to achieve excellent results [80]. Finding an appropriate network topology and an optimal set of weights represents a difficult multidimensional optimisation problem. Ideally, the topology should be as small as possible, but large enough to accurately fit the training data. Failure to find a suitable configuration will cause poor generalisation ability on unseen data and/or excessive execution time. As was mentioned in Section 3.1.2.3, using EANN algorithms, in which a GA automatically searches for optimum topologies, is one potential solution. The EANN paradigm will be explained more thoroughly in Section 3.3.3. However, a number of the potential benefits offered by EANNs are now outlined.

² An optional model similarity check is sometimes evaluated prior to classification, such as with the approach proposed by Sung and Poggio [115].

It has been shown that an ANN based face detection system with local receptive fields outperforms a fully connected ANN [154]. Local receptive fields allow the ANN to detect localised features [84]. However, as the size and connections of these receptive fields are determined manually, an EANN approach should be able to evolve at least as good a solution. Additionally, unlike a feed forward ANN, such as that used by Rowley et al [84], a modern EANN framework can evolve any type of topology, potentially with a mixture of forward connections, recurrent connections and self-looped recurrent connections. Combined, these factors have the potential to allow non-trivial, creative topological solutions to evolve via the GA.

EANNs also offer the possibility of finding a minimal topology, which is hugely beneficial since fewer neurons and connections lead to fewer computations. This in turn means higher throughput and lower power consumption. EANN based training also offers benefits from a hardware perspective relative to an ANN trained via on-chip back propagation, as the GA optimisation search should be less sensitive to reduced word-length implementation issues. For reusability purposes, the ANN evaluation of the EANN can be easily offloaded to dedicated hardware. As was also observed by Reyneri, this approach to ANN hardware design combines the scalability and flexibility of software with the speed and energy efficiency of dedicated hardware [155]. For example, by modifying the software, the same core that accelerates face detection could be reused as an EANN accelerator for advanced AI in computer gaming [31]. In the face detection scenario, EANN based training occurs once. After training, during normal mode face detection operation (e.g. face detection during a mobile video call), the proposed algorithm will leverage the accelerated processing performance of the dedicated core. In the AI application scenario, constant GA based evolution with hardware accelerated phenotype evaluation would occur, to alter the AI of computer controlled agents in response to the playing characteristics of the human player [31]. As will be shown in Chapter 5, profiling revealed that the computational burden of the ANN evaluation is suitable for hardware offload, whilst the GA, which uses moderate computational resources, resides in software.

A recent trend in face detection algorithms, in particular the Haar feature boosting trained algorithms [113], is to use a cascaded structure of increasingly complex classifiers. This directs computational resources to more promising face locations. During EANN training, the evolutionary process generally creates increasingly complex topological structures. This is especially true if the EANN algorithm evolves from a minimal topology rather than an initial starting topology with considerable structure present. The Neuro Evolution of Augmenting Topologies (NEAT) open source library is one such EANN algorithm which achieves this behaviour [156]. As such, EANN training within NEAT naturally generates increasingly complex classifiers, which can be extracted and used to create a cascaded structure.

For the combination of the reasons outlined, the author has chosen to adopt an EANN approach for face detection. The training of the EANN is detailed in Chapter 4, whilst the software and hardware implementations for a mobile device are described in Chapter 5. The algorithm focuses on full frontal detection, with the simplifying assumption that multi-view face detection is a matter of using multiple detectors trained for different orientations [111][112]. This will be discussed further in Section 4.5. The final remaining decision is the choice of input features to use in conjunction with the classifier. Recently, Haar-like features have become popular for desktop face detection processing [113][119][111][112]. However, on a constrained computing platform, a number of issues make their choice less attractive. In particular, due to the potential for large accumulation values in the integral image, considerable memory storage is required. For example, Yang et al used 32 bits per element in the integral image. Viola uses 25 features in the first 2 layers of the cascade, which removes 90% of the potential candidates. Nevertheless, the calculation of these 25 features uses more memory data bandwidth than retrieving the pixel intensity values directly. Therefore, pixel intensity values are used directly in this work, although it is acknowledged that Haar-like features simplify the complexity of the classifier. Further investigation of this matter is targeted as future work.

3.3 Fundamentals of Evolutionary Artificial Neural Networks

Having decided to use EANNs as the basis of a face detection algorithm in this research, this section provides a context for the training and implementation details in Chapter 4 and Chapter 5 respectively. The fundamentals of ANNs are outlined in Section 3.3.1. This section, coupled with the brief overview of GAs in Section 3.3.2, describes the fundamental elements of EANNs and as such provides a context for the state of the art review of EANNs in Section 3.3.3. The novel features of NEAT, the EANN open source library used in the proposed face detection algorithm, are described in Section 3.3.4.


3.3.1 Artificial Neural Networks

ANNs are a versatile non-linear statistical data modelling technique, which takes inspiration from the massively parallel biological network of neurons (consisting principally of axons, dendrites and synapses) in the human brain. An ANN consists of an interconnected network of simple neurons (sometimes also referred to as nodes). Each neuron typically has one or more input synaptic connections (also simply referred to as connections or links) and a single output. The connections originate from the outputs of other neurons and, similar to the biological behaviour, the output value of a neuron is altered by a weighting in the connection before it reaches the input of the subsequent neuron. Neurons can be classified as input, hidden or output depending on their position within the network. Input neurons (sometimes known as sensors) receive values from the external world, but do not manipulate these values in any way. The values of the output neuron(s) represent the result of the neural network processing and are returned to the system in which the ANN is operating. Neurons whose inputs and outputs are not accessible from the outside system are termed hidden neurons and are typically necessary to solve non-trivial problems. Traditionally the neurons are organised into layers (i.e. an input layer containing input neurons, hidden layer(s) with hidden neurons and, similarly, an output layer containing output neurons), although as will be shown in Section 3.3.3 this is less applicable to EANNs. Basic terminology associated with ANNs is shown in Fig. 3.9.

The neurons can be organised into many different configurations or topologies. The simplest topology is termed a single layer neural network, in which input neurons feed directly to output neurons. This is, however, limited in terms of the types of systems it can approximate [157][158]. The notion of information flowing in a single direction from the input to the output is known as a feed forward network. Furthermore, if all the neurons in a layer are connected to all the neurons in the adjacent forward layer, the network is said to be a fully connected multilayer feed forward network. This is a popular topology for image classification problems, including face detection [84][116]. It is also possible to have feedback between neuron layers; such a network is said to be a recurrent ANN. An example of this is the Hopfield ANN [157][158].

In the classical model of a neuron³, the output of neuron j is a function (known as the activation function and denoted f_act() in Eqn. 3.1) of the summation of the weighted inputs to neuron j plus a bias value (see Eqn. 3.1) [157][158].

³ More advanced neuron models can have more complex relationships between the inputs and output. For example, the output of a spiking neuron also has a dependency on the timing of the inputs.


Figure 3.9: Basic terminology associated with Artificial Neural Networks (input, hidden and output neurons organised into layers, with weighted connections forming a fully connected feed forward ANN)

The bias can be considered a means of changing the activation threshold. Commonly employed activation functions include threshold based (see Eqn. 3.2), ramp based (see Eqn. 3.3), sigmoid (also known as the logistic function, see Eqn. 3.4) and Tanh (see Eqn. 3.5). These activation functions are shown in Fig. 3.10. Sigmoid functions are very commonly used and, unlike the threshold function, the sigmoid is differentiable, which makes it amenable to a wider variety of training algorithms. The output of the neuron then propagates to the inputs of other neurons and the process repeats until the final output layer of neurons is reached.

\[ y_j = f_{act}\left( \sum_i w_{ij} x_{ij} + b_j \right) \tag{3.1} \]

\[ y_j = \begin{cases} 1 & \text{if } y_{in_j} \geq T \\ 0 & \text{if } y_{in_j} < T \end{cases} \tag{3.2} \]

\[ y_j = \begin{cases} 0 & y_{in_j} < -\frac{z_1}{2} \\ \frac{y_{in_j}}{z_1} + \frac{1}{2} & -\frac{z_1}{2} < y_{in_j} < \frac{z_1}{2} \\ 1 & y_{in_j} \geq \frac{z_1}{2} \end{cases} \tag{3.3} \]

\[ f(x) = \frac{1}{1 + e^{-\sigma x}} \tag{3.4} \]

\[ f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{3.5} \]
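The following is a direct transcription of Eqn. 3.1 together with the sigmoid (Eqn. 3.4) and Tanh (Eqn. 3.5) activation functions; function names and the worked example values are illustrative.

    # Single-neuron forward pass implementing Eqn. 3.1 with Eqns. 3.4 and 3.5.
    import math

    def sigmoid(x, sigma=1.0):
        return 1.0 / (1.0 + math.exp(-sigma * x))               # Eqn. 3.4

    def tanh(x):
        return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))  # Eqn. 3.5

    def neuron_output(inputs, weights, bias, f_act=sigmoid):
        # Eqn. 3.1: weighted sum of inputs plus bias, through the activation
        y_in = sum(w * x for w, x in zip(weights, inputs)) + bias
        return f_act(y_in)

    # e.g. neuron_output([0.5, -1.0], [0.8, 0.2], bias=0.1) ~= 0.574

Evaluating a full feed forward network is then simply a matter of applying neuron_output layer by layer, feeding each layer's outputs forward as the next layer's inputs.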

Prior to use, an ANN must undergo a training phase. The training attempts to extract an underlying relationship from a set of observations (e.g. the input training samples and the associated output(s)). If this process is successful, the ANN is able to generalise the relationship from the training data to give a correct result for future data samples, even if these samples were not part of the initial training data. There are many different training algorithms, but these can all be broadly classified into the following families: supervised learning, reinforcement learning and unsupervised learning.

Normally any topology can be used in conjunction with each type of training algorithm.

Figure 3.10: A selection of popular activation functions: (a) threshold-based (T = 1); (b) ramp-based (z1 = −4, z2 = 4); (c) sigmoid; (d) Tanh

Supervised learning attempts to minimise a cost function for a given set of example pairs (i.e. inputs and the associated correct output values). Based upon the cost function (e.g. the mean squared error between the generated output value of the ANN and the expected output value), the ANN is adjusted. Typically this adjustment is to the weight values, though other free parameters associated with the ANN could also be adjusted depending on the training algorithm. Supervised training has been shown to be applicable to many application domains, including image pattern recognition/classification and function approximation. Back-propagation is probably the most well known supervised training algorithm. This minimises the cost function using gradient descent and a generalisation of the least mean square rule. However, due to the gradient descent based approach, there is a potential for the training to get stuck at local minima. A further issue in general with supervised learning is that it is only suitable for problems where a precise target output value is available. However, in many cases this information is not available. In such situations reinforcement learning can be used. This approach learns via trial and error interactions with a dynamic environment and the associated consequences of actions, rather than explicit teaching (e.g. with marked up outputs). Reinforcement learning has been used successfully in control applications and artificial intelligence for computer gaming [31]. Finally, unsupervised learning attempts to automatically map input points to an output space without explicit correct output results available to guide the training process. It is frequently used in clustering-type applications, including image segmentation [153]. The popular Kohonen Self Organising Map (SOM) algorithm uses unsupervised training. In general, the most suitable learning paradigm becomes clear from the type of application. For example, in the case of the proposed face detection algorithm in Chapter 5, due to the readily available large collection of training samples and the associated correct target values, a supervised training approach was the obvious learning paradigm to adopt.

3.3.2 Genetic Algorithms

A GA can be considered a type of search or optimisation algorithm inspired by biological evolution. It is one member of the wider family of EAs, which encompasses genetic algorithms, genetic programming, evolutionary programming and evolution strategies. The key feature of a GA is that the search space is explored in multiple locations using a population of genomes (frequently referred to by other names, including individuals, structures, etc.) and these genomes are allowed to reproduce, which ideally creates offspring that converge toward the global solution. Each genome is an abstract representation of a candidate solution (typically termed a phenotype). Traditionally the genomes are represented using a binary encoding scheme, but other formats such as integer and floating point are now also popular. Simple pseudo-code for a generic GA is shown in Algorithm 1; one iteration of the while loop is termed a generation.

Algorithm 1: Genetic Algorithm pseudo-code

    initialise population;
    evaluate fitness of population();
    while termination condition != true do
        select parents from population;
        crossover();
        mutation();
        evaluate fitness of population();

Each generation consists of a reproduction process, in which the GA uses the biologically inspired operators of selection, crossover and mutation (see Algorithm 1). Selection is used to choose the parents for a new offspring. There are many different selection algorithms, including tournament selection, roulette selection, elitist selection, etc. [159]. Roulette selection was used in the EANN for the proposed face detection algorithm in Chapter 5. This scheme assigns a higher reproduction probability to fitter parents, but also deliberately allows unfit solutions through in order to preserve diversity in the population. After the selection process, crossover (sometimes known as recombination or mating) is the mechanism by which two parents⁴ reproduce to create an offspring. There are many possible crossover schemes, principally involving swapping some elements of data between the parents. Algorithms include single-point crossover (see Fig. 3.11(a)), two-point crossover (see Fig. 3.11(b)), multi-point crossover (see Fig. 3.11(c)), uniform crossover, etc. [159].

⁴ Some crossover mechanisms use more than two parents. For example, the simplex binary crossover scheme uses two “fit” parents and one “unfit” parent. When the fit parents agree on the value of a bit, the offspring takes this value. However, when there is disagreement on a bit value, the offspring takes the opposite value of the “unfit” parent.

After crossover (or instead of crossover, depending on parameter probabilities), mutations are used to create random perturbations within the offspring. The classical example of mutation is bit flipping in a binary genetic sequence. Mutation maintains diversity in a population by preventing the individuals from becoming too similar to each other. The benefit of this is that it avoids local minima. However, if the frequency of mutation is set too high, the search effectively becomes a random search, which can make it difficult to converge to a solution.
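A minimal, self-contained GA embodying Algorithm 1 with roulette selection, single-point crossover and bit-flip mutation is sketched below. The fitness function is a placeholder (maximise the number of ones) and all parameter values are illustrative assumptions.

    # Minimal GA matching Algorithm 1 on binary genomes.
    import random

    GENOME_LEN, POP_SIZE, P_MUT = 16, 20, 0.02

    def fitness(genome):                 # placeholder fitness: count of 1 bits
        return sum(genome) + 1e-9        # small offset keeps roulette weights > 0

    def roulette_select(population):
        # Fitter genomes are proportionally more likely, but unfit genomes
        # may still be chosen, preserving diversity
        weights = [fitness(g) for g in population]
        return random.choices(population, weights=weights, k=1)[0]

    def crossover(p1, p2):
        point = random.randint(1, GENOME_LEN - 1)   # arbitrary crossover point
        return p1[:point] + p2[point:]

    def mutate(genome):
        return [bit ^ 1 if random.random() < P_MUT else bit for bit in genome]

    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(POP_SIZE)]
    for _ in range(50):                  # fixed generation count as termination
        population = [mutate(crossover(roulette_select(population),
                                       roulette_select(population)))
                      for _ in range(POP_SIZE)]
    best = max(population, key=fitness)

With a low mutation probability this converges rapidly on the all-ones genome; raising P_MUT towards 0.5 degenerates the search into a random walk, exactly the failure mode described above.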

3.3.3 Evolutionary Artificial Neural Networks

An alternative ANN training approach is to use an EA to evolve a solution within the ANN architecture search space. An ANN trained in this manner is usually referred to as an EANN. Typically a GA is used in the evolution, but other EAs can be used, including genetic programming [160]. EANN algorithms differ in terms of the evolution target within the ANN. The simplest EANN approaches evolve only the weights in a fixed topology, whereas the more advanced approaches evolve the architecture (i.e. the connectivity, number of neurons, neuron activation functions, etc.) and provide the scope to adapt the learning rules [161][156]. The EANN training paradigm exploits the inherent ability of EAs to search large, complex, non-differentiable and multi-modal search spaces [161]. In the absence of such an automated approach to optimum topology selection, this extremely challenging task is traditionally approached in an ad-hoc manner of human trial and error. As well as being time consuming, such an ad-hoc approach has a high probability of selecting a non-optimum topology. There are other approaches for selecting topologies, including destructive training algorithms. These algorithms start with an over-sized ANN and progressively prune the structure backward. However, it has been shown that these approaches typically give inferior results, as only a very limited subspace within the potentially infinitely large search space is explored [162][163]. Aside from the considerable benefit of automatic architecture selection, EANNs have advantages over popular gradient descent based learning algorithms, such as back propagation and its many variants. Gradient descent based training has a tendency to get trapped at local minima, cannot be used for applications where the error function is non-differentiable and is unsuitable for sparse reinforcement type learning problems [161][156]. As such, EANNs could be said to be suitable for a wider range of application domains than ANNs trained via more traditional learning algorithms. This feature is particularly attractive from a dedicated hardware architecture perspective, as it maximises the reusability of such a hardware core.


Figure 3.11: Examples of genetic algorithm binary crossover techniques: (a) single point crossover; (b) two point crossover; (c) multi point crossover

Figure 3.12: Taxonomy of EANN algorithms (fixed topology weight-only evolution versus TWEANN; direct versus indirect genome encoding)

A taxonomy of the varied approaches to EANNs is shown in Fig. 3.12. The most straightforward EANN algorithms use a fixed topology and only evolve the values of the weights. In this case the genomic representation is a relatively straightforward decision between a binary representation and real numbers. Using a binary representation is easier to implement, as it allows classical crossover and mutation operators. However, careful evaluation and trade-offs are required to establish the number of bits needed to achieve adequate weight precision without having an excessively long genome representation, which affects the evolutionary speed. The alternative is to use real numbers, which minimises this issue, though the crossover and mutation operators become more complex.

The genome representation is considerably more difficult if the topology as well as the weights is allowed to evolve. The two principal genome encoding schemes for this approach are direct encoding and indirect encoding. These can be considered to differ in terms of the amount of structural information that is captured in the genome. Direct encoding schemes contain the information about every connection and every node in the genome. As a result, the genome representation can be quite large, although there is a one to one mapping between the genome and phenotype. Within direct encoding, a number of broad approaches have been suggested, including genitor, binary matrix, node based, etc. [164]. Node based schemes have a number of desirable characteristics, including ease of scalability, and are preferable for these reasons. The subsequent evolution of the population of direct encoded genomes can proceed in one of two ways. The first and easier of the two methods is to use incremental evolution of the topology, followed at each stage by weight training (EA-based or otherwise). The alternative is to simultaneously evolve both the weights and topology. The incremental approach is considerably slower and the separate training can introduce effects such as randomness resulting in a “noisy fitness”, which can have a negative impact on the evolutionary process [160]. For these reasons, the simultaneous evolution of weights and topology is considered the better approach. The alternative to direct encoding schemes is indirect encoding. In these schemes only the most important details are stored in the genome representation. This produces a more compact genome. There are two main options for indirect genome representation. One option is to encode the main parameters, such as the number of nodes in a hidden layer, etc., following which these parameters are evolved. The detailed connectivity in this case is produced via explicit growth rules. The other option uses fewer parameters and directly evolves the growth/development rules. However, question marks remain over the ability of indirect encoding schemes to evolve a solution which gives good generalisation [161].

Regardless of whether direct or indirect encoding schemes are chosen, one of the most difficult issues to overcome when evolving a topology is the problem of competing conventions (also known as the permutation problem). The competing conventions problem stems from the fact that there may be more than one way to express a topology. An example situation is shown in Fig. 3.13(a): although the positions of neurons A and C are swapped, the networks are identical. This causes considerable trouble for the crossover operator; since identical parents are being crossed over, there is a high risk that functionality within the topology will be destroyed [156]. The effects can be so detrimental that some researchers choose not to use the topology crossover operator and instead rely on mutation. However, for many applications crossover has been shown to be vital [161]. Even without the complication of competing conventions, topology crossover is a difficult problem. How two topologies should be recombined is frequently not obvious (see Fig. 3.13(b)). There is also the possibility that crossover could give rise to an offspring with an invalid structure, for example a neuron with no input or output connections. Topology analysis can be used to minimise this effect.

The problem of competing conventions is less of an issue for indirect encoded genome schemes with evolutionary growth rules, though the next section outlines a recent direct encoded node based scheme which solves the problem. This scheme also allows meaningful topology crossover to occur without the computational overhead of topology analysis.


Figure 3.13: Issues associated with topology genome representation and the crossover operator: (a) the competing conventions problem; (b) difficulty of the topology crossover operator (Source: Stanley [156])

Figure 3.14: Example of a NEAT genome (neuron genes and link genes carrying enable flags, weights and innovation numbers) and the resultant phenotype


3.3.4 Neuro Evolution of Augmenting Topologies

NEAT is an example of a direct encoded node based EANN approach, which is deployed in a broad spectrum of varied applications [156]. Each genome consists of a number of link and neuron genes as shown in Fig. 3.14. It uses a genome historical marking scheme, referred to as innovation numbers, which is the enabling factor for much of the novelty within NEAT and avoids many of the problems associated with other EANN approaches. Stanley found that recording the evolutionary history of each gene identifies the common structure between two genomes [156].

Tracking the evolutionary history is simply a matter of incrementing a global innovation number whenever a structural mutation (the addition of a connection or neuron) occurs. A unique innovation number is then assigned to the new gene. This simple scheme allows meaningful topology crossover to occur, whilst avoiding any computational overhead of topology analysis.

An example of using innovation numbers in the structural mutation operators is shown in Fig.

3.15(a). The top number in each box corresponds to the innovation number and the bottom numbers show the source neuron and the destination neuron for each connection. An add connection mutation between neurons three and five is simply recorded and the innovation number increments to seven. If this mutation was followed by an add node mutation between neurons three and four on the original topology, the connection between neurons three and four (innovation number 3) is firstly disabled. Two new genes are subsequently added, representing the new connections and neuron. This structural mutation is recorded via the addition of innovation numbers eight and nine. If during mutation the same structure arises as previously encountered, the innovation numbers are not incremented. In this way all similar topologies are uniquely identified, which avoids the issue of competing conventions during crossover.
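In code, this bookkeeping reduces to a global counter plus a table of structural innovations already seen, so that a repeated structure receives its existing numbers rather than new ones. The sketch below (Python; dictionary-based genes and all names are illustrative, not the NEAT library API) shows both structural mutations:

    # Global innovation state, shared by the whole population.
    innovation_db = {}       # structure key -> innovation number(s)
    global_innovation = 6    # continuing the example of Fig. 3.15(a)

    def next_innovation():
        global global_innovation
        global_innovation += 1
        return global_innovation

    def add_connection(genome, src, dst, weight):
        key = ("link", src, dst)
        if key not in innovation_db:            # genuinely new structure
            innovation_db[key] = next_innovation()
        genome["links"].append({"src": src, "dst": dst, "weight": weight,
                                "enabled": True,
                                "innovation": innovation_db[key]})

    def add_node(genome, link):
        link["enabled"] = False                 # old connection is disabled
        key = ("node", link["src"], link["dst"])
        if key not in innovation_db:            # two new genes, two numbers
            innovation_db[key] = (next_innovation(), next_innovation())
        in_innov, out_innov = innovation_db[key]
        # (the real library also keeps neuron ids globally consistent;
        # a local maximum is used here for brevity)
        new_id = max(n["id"] for n in genome["neurons"]) + 1
        genome["neurons"].append({"id": new_id, "type": "hidden"})
        # The incoming link takes weight 1.0 and the outgoing link inherits
        # the old weight, so the new neuron initially preserves behaviour.
        genome["links"].append({"src": link["src"], "dst": new_id,
                                "weight": 1.0, "enabled": True,
                                "innovation": in_innov})
        genome["links"].append({"src": new_id, "dst": link["dst"],
                                "weight": link["weight"], "enabled": True,
                                "innovation": out_innov})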

The NEAT crossover operator is shown in Fig. 3.15(b). The two parents are aligned according to their innovation numbers. Common genes are simply inherited in the offspring. There are two methods to crossover the weights in the parents to generate the connection weight in the offspring: the connection weight is either selected at random from the two parents, or the average weight from the connections in the two parents is used.

Genes that are present in one parent but not in the other are labelled as disjoint and excess genes.

As will be described shortly, the numbers of disjoint and excess genes are used when establishing the structural similarity between genomes.
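Alignment by innovation number makes crossover little more than a dictionary merge. A hedged sketch (Python; the rule for inheriting unmatched genes, which NEAT takes from the fitter parent, is omitted for brevity):

    import random

    def crossover(parent_a, parent_b, average_weights=False):
        """parent_a, parent_b: lists of link-gene dicts, each carrying an
        'innovation' and a 'weight' field."""
        a = {g["innovation"]: g for g in parent_a}
        b = {g["innovation"]: g for g in parent_b}
        cutoff = min(max(a), max(b))         # end of the shorter genome
        offspring, disjoint, excess = [], 0, 0
        for innov in sorted(set(a) | set(b)):
            if innov in a and innov in b:    # matching gene: inherit it
                gene = dict(a[innov])
                if average_weights:
                    gene["weight"] = (a[innov]["weight"]
                                      + b[innov]["weight"]) / 2.0
                else:
                    gene["weight"] = random.choice(
                        [a[innov]["weight"], b[innov]["weight"]])
                offspring.append(gene)
            elif innov > cutoff:
                excess += 1                  # beyond the other parent's range
            else:
                disjoint += 1                # within the other parent's range
        return offspring, disjoint, excess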

Adding structure to the topology will typically reduce fitness initially, as smaller structures tend to optimise faster. Consequently, there is little opportunity for the new topology to survive [156]. Taking inspiration from biological evolution, where an innovation needs time to


(a) Mutation in NEAT. (Source: Stanley [156])

(b) Crossover in NEAT. (Source: Stanley [156])

Figure 3.15: Use of innovation numbers in the mutation and crossover operators in NEAT

reach its potential, NEAT uses a process of speciation (also known as niching within the context of GAs). This allows genomes to compete only against their own species and gives new offspring a better chance of survival. The similarity between two genomes is given by Eqn. 3.6, where E is the number of excess genes, D is the number of disjoint genes, W is the average weight difference between genes and N is the number of genes in the larger genome. The coefficients c1, c2 & c3 weight the importance of each element (the standard values in the NEAT library have been used unchanged in this research). After crossover and/or mutation, each offspring is compared against a representative member (chosen at random) from the parent's species in the previous generation using Eqn. 3.6. If the result is less than a predetermined threshold (i.e. considered similar), the new offspring is placed into the parent's species, otherwise a new species is created.

\delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3 W \qquad (3.6)
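Eqn. 3.6 and the species assignment rule translate directly into code. In this sketch (Python) the coefficient and threshold values are the stock NEAT defaults only by assumption, and E, D and W are taken from a genome alignment such as the one sketched above for crossover:

    def compatibility(E, D, W, N, c1=1.0, c2=1.0, c3=0.4):
        # Eqn. 3.6: delta = c1*E/N + c2*D/N + c3*W
        return (c1 * E) / N + (c2 * D) / N + c3 * W

    # Offspring compared against a randomly chosen representative of its
    # parent's species: 2 excess genes, 3 disjoint genes, average weight
    # difference 0.5, and 40 genes in the larger genome.
    delta = compatibility(E=2, D=3, W=0.5, N=40)
    stays_in_parent_species = delta < 3.0   # otherwise a new species is created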

Another important concept in NEAT, which also takes inspiration from biological evolution, is complexification. Using this scheme, evolution starts with a minimal topology with no hidden nodes, in which each input is connected to an output node. Successive generations systematically elaborate the complexity by adding neurons and/or connections. This is attractive in the context of mobile devices since minimal topologies are favoured, thus reducing computational complexity and power consumption for a given problem. It is during complexification that topology innovation occurs. The combination of the features outlined allows NEAT to outperform other EANN approaches [133], and for this reason it was chosen as the EANN library for the proposed face detection algorithm.

3.4 Conclusions

This chapter firstly presented a comprehensive review of face detection algorithms. A taxonomy of face detection algorithms was presented and particular emphasis was placed upon appearance-based algorithms and current hardware implementations. It was concluded from the review that appearance based algorithms were inherently more suitable for implementation in hardware compared to feature extraction based methods. The potential benefits of using an EANN based approach for face detection were outlined. Possibly the greatest benefit offered by an EANN for a mobile device face detection implementation is the scope for using a minimal topology, which will reduce the overall power consumption. Having made the design decision to adopt an EANN

based approach, a thorough explanation of the underlying theory behind EANNs was presented.

This provides context for the algorithmic implementation details in subsequent chapters.

CHAPTER 4

A Novel Face Detection Training Algorithm

This chapter outlines the training phase of the proposed Evolutionary Artificial Neural Network

(EANN) based face detection algorithm. The goal of the training routine is to find an optimum network topology and weight values which will give acceptable face detection performance, whilst also striving for a minimal topology so as to reduce the ultimate computational cost. The training process occurs once, after which the final evolved solution can be deployed for face detection on the chosen platform. Since the training phase is not required during normal operation, the process can leverage all available computational resources. As such, this eliminates the hardware restrictions of the mobile device during the training phase. During this research, a server with two dual-core AMD processors and a Pentium-4 based laptop were used. This chapter firstly describes the training data preparation. This is followed by an overview of the training routine and explanations for the various design decisions taken. A detailed account is then given of the training run variants and the rationale used for parameter selections. The chapter concludes with a review of the training results, as well as observations and general recommendations for training an EANN for a face detection application.

4.1 Training data preparation

Four freely available and commonly used face databases (FERET [129], CMU [84], Yale [115] &

BioID [165][166]) were used in the generation of the proposed face detection training algorithm.


[Figure: extract face using ground truth → bi-cubic subsample to 20x20 window → illumination correction & histogram equalisation → addition of oval mask.]

Figure 4.1: Preprocessing of FERET face training dataset

These face databases were separated into 3 datasets: a training dataset, a validation dataset and the actual testing dataset. The training dataset consists of 942 greyscale face images in a frontal view taken from the FERET face database [129]. Each face has a marked up location for the eyes, nose and mouth. These coordinates are used to generate a new 20×20 pixel image, which is a cropped and subsampled (bi-cubic) version of the original image. Prior to being used as a positive training sample, the new image undergoes illumination correction, histogram equalisation and has an oval mask fitted, in a manner similar to that proposed by Sung et al [115]. The purpose of the oval mask is to eliminate any unnecessary background structure. Additionally, it also helps in reducing the dimensionality of the Artificial Neural Network (ANN) due to fewer inputs. These processing steps can be seen in Fig. 4.1. A general guide when using ANNs is that the total number of positive and negative training samples should be 5 to 30 times the number of connections [167]. A typical number of connections for a face detection ANN is 3,000 – 4,000 [84]. Therefore, to increase the positive training set size, rotated (±5 degrees with bi-cubic interpolation) and mirrored versions are also generated for each sample.
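The preprocessing chain of Fig. 4.1 can be prototyped in a few lines. The sketch below (Python with OpenCV and NumPy; the oval axes, rotation centre and the linear-plane illumination correction are illustrative assumptions rather than the exact settings used in this work) covers the bi-cubic subsampling, illumination correction, histogram equalisation, oval masking and the mirror/±5 degree augmentation:

    import cv2
    import numpy as np

    def illumination_correct(win):
        # Fit an intensity plane a*x + b*y + c to the window and subtract
        # it, compensating for one-sided lighting (in the spirit of Sung et al).
        h, w = win.shape
        ys, xs = np.mgrid[0:h, 0:w]
        A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
        coeffs, *_ = np.linalg.lstsq(A, win.ravel().astype(np.float64),
                                     rcond=None)
        plane = (A @ coeffs).reshape(h, w)
        return np.clip(win - plane + plane.mean(), 0, 255).astype(np.uint8)

    def oval_mask(size=20):
        mask = np.zeros((size, size), np.uint8)
        cv2.ellipse(mask, (size // 2, size // 2), (size // 2 - 2, size // 2),
                    0, 0, 360, 255, -1)
        return mask

    def make_sample(face_crop):
        # face_crop: greyscale face region cut out using the ground-truth
        # eye/nose/mouth coordinates.
        win = cv2.resize(face_crop, (20, 20), interpolation=cv2.INTER_CUBIC)
        win = cv2.equalizeHist(illumination_correct(win))
        return cv2.bitwise_and(win, oval_mask())

    def augment(sample):
        # Mirrored and +/-5 degree rotated (bi-cubic) variants.
        variants = [cv2.flip(sample, 1)]
        for angle in (-5, 5):
            M = cv2.getRotationMatrix2D((10, 10), angle, 1.0)
            variants.append(cv2.warpAffine(sample, M, (20, 20),
                                           flags=cv2.INTER_CUBIC))
        return variants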

Generating representative non-face training data is generally accepted to be more troublesome

[115]. As a consequence, a bootstrapping scheme similar to that used by many authors is employed in this work [115][84][116][83]. This leverages randomised data for initial non-face samples.

Then after a period of training, the best solution at that point is evaluated with a dataset that has no faces present (e.g. a scenery image such as that shown in Fig. 4.2). The resulting false positives (i.e. non-faces classified as faces) are added to the training dataset as examples of non-face data samples. This is used to increase the size of the non-face dataset, which in turn improves the precision of the face detection results. These steps are outlined in Algorithm 2. It should also be noted that the bootstrapping process can also be used to add faces to potentially improve the recall rate (i.e. the percentage of correct faces detected). The frequency of bootstrapping and the number of non-face samples added per iteration are described in further detail in Section 4.3. As with the positive face samples, each non-face sample undergoes illumination correction, histogram equalisation and has an oval mask added (for consistency purposes).

Figure 4.2: Example scenery image devoid of faces which was used during training

Algorithm 2: Non-face bootstrapping pseudo-code

    open image file(s) devoid of faces;
    while (bootstrap_count < bootstrap_max && !(all 20x20 pixel blocks in images examined)) do
        extract 20x20 pixel window;
        illumination_correction(window);
        histogram_equalisation(window);
        add_oval_mask(window);
        classification_result = 0;
        for (i = 0; i <= ensemble_max; i++) do
            network(i)→load_inputs(window);
            network(i)→activate();
            classification_result += network(i)→output;
        if (classification_result / ensemble_max) >= faceDet_threshold then
            save 20x20 window as "bs_tdata_" + bootstrap_count + ".pgm";
            add "bs_tdata_" + bootstrap_count + ".pgm" to non-face-training-data.txt;
            bootstrap_count++;


[Figure: a 20x20 pixel training sample feeding the seeded starting topology. All pixels in the mouth region are inputs to hidden neuron HN1, all pixels in the eyes region are inputs to hidden neuron HN2, and all pixels (excluding oval mask pixels) are inputs to the third hidden neuron HN3; the three hidden neurons connect to a single face/non-face output neuron.]

Figure 4.3: Starting topology seeded with localised features

4.2 NEAT based face detection training

During the initialisation of the algorithm, the Genetic Algorithm (GA), which controls the evolutionary process, spawns an initial population of Npop genomes from an input topology file containing a single genome. Npop is chosen to be as large as possible within the constraint of acceptable run times; therefore Npop values ranging from 50 to 200 were explored during training. The input topology file typically represents the minimal topology for the given problem, usually meaning that all input neurons are directly connected to one or more output neurons without the presence of hidden neurons. Neuro Evolution of Augmenting Topologies (NEAT)-based complexification progressively evolves the solution from this minimal starting point. For experimentation purposes, an additional seeded topology with hidden neurons present was also used as an initial starting topology in a separate training stream. Details of the two separate seeded and non-seeded training streams are presented in Section 4.3. The seeded starting topology used had three hidden neurons and one output neuron. One hidden neuron was fully connected to all inputs, whilst the remaining two hidden neurons used input pixels from regions around the eyes and mouth respectively (see Fig. 4.3). This configuration was chosen to investigate if seeding the starting topology with localised features leads to the evolutionary process developing a more robust solution.
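A seeded starting genome of this form could be constructed as in the sketch below (Python; the rectangular mouth and eye regions and the neuron numbering are assumptions for illustration — the actual regions used are those indicated in Fig. 4.3):

    import numpy as np

    def seeded_links(mask):
        """mask: 20x20 boolean array, True for the 292 pixels inside the
        oval; returns (src, dst) connection pairs for the seeded genome."""
        input_id = {(r, c): i + 1
                    for i, (r, c) in enumerate(map(tuple, np.argwhere(mask)))}
        HN1, HN2, HN3, OUT = 293, 294, 295, 296   # assumed neuron numbering
        links = []
        for (r, c), nid in input_id.items():
            if 13 <= r <= 18 and 4 <= c <= 15:    # mouth region -> HN1
                links.append((nid, HN1))
            if 4 <= r <= 8 and 2 <= c <= 17:      # eyes region -> HN2
                links.append((nid, HN2))
            links.append((nid, HN3))              # HN3 is fully connected
        links += [(HN1, OUT), (HN2, OUT), (HN3, OUT)]
        return links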


Once the initial population is spawned, face detection training starts by randomly selecting

Xface and Ynonface training samples from the training dataset. The values of Xface and Ynonface represent a trade-off. With infinite computational resources, all samples within the face and non-face training datasets could be applied each generation. However, it was found that even when using moderate size populations (100) on a high specification server (equipped with two dual-core 2.2 GHz AMD 64-bit processors and 10 GB of RAM) this approach leads to excessive run times (in the order of a week for 500 generations). Consequently, a randomised subset of the training dataset is applied each generation. This approach is intuitively reasonable as there should be a high degree of correlation between all samples in the face training dataset. The application of a subset of training samples is also explicitly mentioned in the work of Garcia et al [83]. The randomised nature of the selection avoids introducing any bias from applying the training data samples in a

fixed sequence. Furthermore, whilst the randomised selection of faces and non-faces is fixed for a generation, the order of these training samples is randomised each time they are applied to a genome, to further ensure that no bias is introduced. Values ranging from 5 to 5000 were explored for Xface and Ynonface. Smaller values provided less granularity of fitness between genomes, and as such the evolution had a tendency to get stuck in local minima, whereas large values impacted the run time. It was found that values in the range of 50 to 300 offered an adequate trade-off. This range is similar to the value of 50 used in the convolutional neural network based face detection work of Garcia [83]. The exact selection of Xface and Ynonface used during each training run is discussed in Section 4.3.
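The per-generation sampling just described amounts to two random draws and a per-genome shuffle (a minimal Python sketch, with illustrative names):

    import random

    def generation_subset(face_data, nonface_data, x_face, y_nonface):
        # Fixed for the whole generation: one random subset of each class.
        return (random.sample(face_data, x_face),
                random.sample(nonface_data, y_nonface))

    def samples_for_genome(faces, nonfaces):
        # Re-shuffled for every genome, so that no genome sees the
        # training samples in a fixed presentation order.
        batch = ([(s, "face") for s in faces]
                 + [(s, "nonface") for s in nonfaces])
        random.shuffle(batch)
        return batch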

The subset of training samples is then iteratively applied to each genome in the population.

For each sample, only the non-oval-mask pixels are used. This reduces the number of inputs from 400 (i.e. a 20×20 training sample) to 292. When each sample is applied, the genome is activated and the value of the output neuron is calculated. Using this value and knowledge of whether the training sample was a face or non-face, an error is calculated, squared and accumulated. This process repeats until all of the samples are applied to this genome. Then, using the accumulated error, a mean squared error based fitness value is calculated for the genome. This process repeats for all genomes in the population. These processing steps can be seen within the two while-loops in the pseudo-code in Algorithm

3 makes use of an optional weight decay penalty term. This penalises the fitness of networks with larger absolute valued weights, thus favouring networks with smaller absolute valued weights.

It has been claimed that the size of weights is more important than the number of weights, and

that larger weights can actually impair the generalisation ability of a network [168]. This has not been previously explored in the context of the Neuro Evolution of Augmenting Topologies

(NEAT) library, and so was deemed a useful experiment. As can be seen in Algorithm 3, the different types of connections (input to output, input to hidden, hidden to output & hidden to hidden) in the network require different weight decay parameters. The choice of these parameters, as well as conclusions about the overall usefulness of combining weight decay with the NEAT library, is discussed in Section 4.4.
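As a hedged illustration of how such a penalty can be attached to the mean squared error based fitness (Python; the 1/(1+MSE) scaling and the grouping of connections by type are assumptions consistent with the description above, not the exact formulation in the modified library):

    def fitness_with_weight_decay(sq_error_sum, n_samples, links,
                                  alpha, beta, gamma, delta_wd):
        """links: iterable of (connection_type, weight) pairs, with types
        'in_out', 'in_hid', 'hid_out' and 'hid_hid'."""
        mse = sq_error_sum / n_samples
        decay = {"in_out": alpha, "in_hid": beta,
                 "hid_out": gamma, "hid_hid": delta_wd}
        penalty = sum(decay[t] * w * w for t, w in links)
        return 1.0 / (1.0 + mse) - penalty      # assumed fitness scaling

    # Example: a three-connection network evaluated on 400 samples.
    links = [("in_out", 0.8), ("in_hid", -1.3), ("hid_out", 2.1)]
    f = fitness_with_weight_decay(36.0, 400, links, alpha=1e-5,
                                  beta=1e-6, gamma=1e-6, delta_wd=1e-6)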

The GA in the NEAT library then evolves the population using mating and/or mutation. It should be noted that the functionality of the GA has not been altered from the standard NEAT library and as such there is no contribution in this area. However, for completeness a brief overview of how the GA functions in NEAT is now described. Using the fitness of each species in the population, NEAT decides on an appropriate number of offspring in each species for the new population. A check is then carried out to ensure population level stagnation is not occurring.

If detected, delta coding is performed; this reinitialises the population but attempts to preserve the evolutionary progress to date [169]. After this, based principally upon input GA parameter settings (MUTATE_ONLY_PROB & MATE_ONLY_PROB), mating and/or mutation is performed. During mating, the choice of parent genomes within a species is decided using roulette selection. Randomised selection within the fittest species was also explored and gave comparable results due to the relatively small species size; this observation was also noted by Stanley [156]. Single point crossover, multi-point crossover and multi-point average crossover are the available crossover mechanisms; again, input GA parameters (MATE_MULTIPOINT_PROB, MATE_MULTIPOINT_AVG_PROB & MATE_SINGLEPOINT_PROB) decide upon which mechanism to use at any instance in the evolution. Interspecies mating is also allowed, but typically the probability (INTERSPECIES_MATE_RATE) is set very low. After mating (or instead of mating, based upon the MUTATE_ONLY_PROB parameter) the offspring can be mutated. The supported mutation operators and associated input probability parameters are: add neuron (MUTATE_ADD_NODE_PROB), add connection (MUTATE_ADD_LINK_PROB), disable connection (MUTATE_TOGGLE_ENABLE_PROB), re-enable connection (MUTATE_GENE_REENABLE_PROB) and perturb weights (MUTATE_LINK_WEIGHTS_PROB). After mutation, the offspring is placed into the correct species, or a new species is created if none are within the input parameter threshold (COMPAT_THRESH) bounds. The mating and/or mutation continues until the new population is complete.


Algorithm 3: Simplified EANN-based training algorithm for face detection

    pre_process all training samples;
    initialise Xface; initialise Ynonface;
    initialise weight decay constants α, β, γ, δ;
    if continue_training then
        load_population(partially trained population input file);
    else
        spawn_initial_population(starter genome file, Npop);
    while (stopping_condition() != true) do
        randomly select Xface faces from full face training dataset;
        randomly select Ynonface non-faces from full non-face training dataset;
        for all genomes in population do
            error_accum = 0;
            while not all Xface face and Ynonface non-face samples are applied do
                randomly select face or non-face training sample;
                apply training sample to inputs of genome→network;
                for (i = 0; i <= network_depth; i++) do
                    genome→network→activate();
                if ground_truth(training sample) == "face" then
                    if genome→network→output >= face_threshold then
                        true_positive++;
                        error = 1 - genome→network→output;
                    else
                        false_negative++;
                        error = (1 - genome→network→output) × false_negative_penalty;
                else
                    if genome→network→output < face_threshold then
                        true_negative++;
                        error = genome→network→output;
                    else
                        false_positive++;
                        error = genome→network→output × false_positive_penalty;
                error_accum += error²;
            genome→fitness = mse_fitness(error_accum) - weight_decay_penalty(α, β, γ, δ);
        evolve population via mating and/or mutation;


[Figure: flowchart of the proposed training. Step 0 (START TRAINING) spawns an initial population from an input starter genome, or step 0a (CONTINUE TRAINING) loads a partially trained population from file. Steps 1 to 9 constitute one iteration of the face detection training: (1) randomly select a subset of face and non-face training samples; (2) after each sample is applied, calculate and accumulate the associated error; (3) once all samples are applied, use the accumulated error to generate the genome fitness; (4) reset the accumulated error and apply the samples randomly to the next genome; (5) when all genomes are processed, if a new record fitness has been found, save the generation and evaluate it on the validation dataset; (6) mate/crossover to generate a new population; (7) mutate the new population; (8) test whether the stopping condition has been reached; (9) test whether training has improved from the previous iteration. If not, training is finished (10); otherwise additional non-face data is bootstrapped (11) and training continues.]

Figure 4.4: Proposed NEAT-based face detection training

The full training algorithm is illustrated in Fig. 4.4. This diagram shows the NEAT functionality relative to the overall proposed face detection training. Whilst the GA within NEAT was not modified, considerable influence can be exerted on the evolutionary process through the choice of GA parameters. Furthermore, using techniques such as changing the relative values for Xface and Ynonface and unevenly penalising false positive and false negative classification errors can greatly influence how the search space is explored. For example, by applying many more non-faces compared to faces, initial phases of the evolutionary process typically label all samples as non-faces before gradually classifying faces correctly. Additionally, if a false positive is penalised more heavily than a false negative, the areas of the search space yielding higher precision are explored more thoroughly, albeit at a potential cost of reduced recall. The selection of values for the GA parameters, Xface, Ynonface and the incorrect classification penalty terms, which were used during each training run, is discussed in detail in Section 4.3.

The process of population face detection fitness evaluation and population reproduction keeps repeating until a stopping condition is satisfied (step number 8 in Fig. 4.4). The simplest stopping condition is to enforce an upper limit on the number of generations in which evolution is possible. However, deciding this in advance is difficult and indeed somewhat futile due to the inherent randomness within the GA search, which naturally leads to varying training durations if run multiple times. Not training for long enough leads to under-fitting of the training data and consequently poor generalisation performance with unseen data, whilst training for too long leads to the over-training phenomenon [158][157][167]. Another obvious (although problematic) approach is to keep training until an arbitrarily small error is recorded in the training dataset. The problem with this approach is that beyond a point, the training process risks over-fitting the training dataset and any noise present in the data. The consequence of this is a loss of generalisation ability with unseen data. A simple demonstration of over-training is shown in Fig. 4.5. Clearly, more advanced techniques are needed for the face detection training stopping condition.

[Figure: two decision boundaries separating two binary classes in training and test data. Despite decision boundary 0 having a lower fitness on the training dataset, it generalised better with the unseen test data compared to decision boundary 1, which was over-trained with respect to the training data.]

Figure 4.5: Simple example of over-training

For good generalisation performance (avoiding under- and over-fitting of the training data) it is generally recommended to at least remove statistical outliers from the training samples and use a large enough training dataset (both techniques have been employed in this work) [167]. Furthermore, there are a variety of common methods employed for achieving good generalisation performance.

These include model selection, jittering, early stopping, weight decay, Bayesian learning, combining networks, etc. [167]. Weight decay and the combination of networks have been investigated in this work and are discussed in Section 4.3 and Chapter 5 respectively. In addition, early stopping was investigated to see if it was a suitable candidate for the stopping condition for the face detection training. The early stopping technique periodically evaluates the best interim solution on a separate dataset called a validation dataset. When the error associated with the validation dataset starts to increase relative to the previous evaluation of the validation set, the GA stopping criterion is reached and training stops, regardless of whether the training error is still reducing on the training dataset. The increasing error on the validation dataset indicates the training process is beginning to over-fit the training data. This explanation of over-training assumes the error function is monotonically decreasing. Unfortunately, as will be clear from the training recall and precision graphs in Section 4.3, the EANN face detection training error does not exhibit monotonically decreasing behaviour.

Due to this characteristic, the author explored using a smoothed version of the validation error as an indicator of over-training. Whilst this improved performance, it was still prone to terminating the evolutionary process too early. There are many reasons for this, the most obvious being that it is difficult to predict the duration of the smoothing “window”. For example, if the evolutionary process explored local minima in the subset of the training dataset which were not present in the validation dataset, this could lead to an extended period of increasing fitness on the training dataset but a reduction of fitness on the validation dataset. This scenario would trigger termination of the training if the smoothing window duration was not long enough to capture the effect of a possible mutation in a later generation moving the training away from the local minimum and toward the global minimum. To give a clearer demonstration of this, consider the following (albeit slightly contrived) scenario. If a high percentage of the generational non-face training subset was particularly difficult (almost face-like), the fittest genome solution would most likely sacrifice recall for precision, particularly if Ynonface was greater than Xface and/or false positives were more heavily penalised. If this leads to a new record fitness, the drop in the recall ability of this genome could cause a reduction in fitness on the validation dataset. If subsequent record fitnesses are found around this local minimum in the training dataset, this could cause a sustained drop in fitness on the validation dataset. This scenario could then trigger the early stopping.

As a consequence of the outlined issues, early stopping (i.e. in the sense of stopping when the validation dataset error increases) was not used in the final face detection training. Instead, the stopping condition favoured in this work used a combination of a maximum generation count and a target generalisation error. This generalisation error was estimated by evaluating the fittest genome on the validation dataset whenever a genome with a new record fitness was found in the training dataset. There are numerous approaches to generating a validation dataset; the method used is commonly referred to as split-sample or hold-out validation. This reserves part of the training set to estimate the generalisation error. In order to get a good estimate of the generalisation error, it is vital that the reserved subset of the training set is not used in any way during the training. One disadvantage of split-sample validation is that it reduces the size of the training set. However, due to the availability of multiple face databases, the author considered this was less of an issue for the chosen application. Had it been a concern, other approaches such as k-fold cross-validation could be considered [167]. The face samples used to generate the split-sample validation dataset were taken from the FERET database (samples different from those used in the training dataset) [129].

The non-face validation samples were taken from a scenery image without faces present.
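The adopted stopping logic is compact enough to sketch directly (Python; names and the smoothing window are illustrative, and the moving-average variant is the early stopping indicator that was investigated and ultimately rejected):

    def stopping_condition(generation, max_gen, new_record_found,
                           validation_error, target_error):
        # Adopted condition: stop at the generation cap, or as soon as a
        # new record-fitness genome reaches the target generalisation
        # error on the held-out validation dataset.
        if generation >= max_gen:
            return True
        return new_record_found and validation_error <= target_error

    def smoothed_validation_error(errors, window=10):
        # Moving average over recent validation errors; explored as an
        # early stopping indicator for the non-monotonic error curve.
        recent = errors[-window:]
        return sum(recent) / len(recent)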

During the training process, if a new winning genome is found, the entire population from that generation is stored so that it can be used later for regular mode face detection or as a partially-trained starting point to recommence training. When the stopping condition is reached, it is then assumed that the EANN will have found the best topology during the evolutionary search process for that given training dataset and set of configuration parameters. To improve the detection performance, it is possible to use further iterations of the training. For example, the next iteration uses the data samples from the initial training run and augments those with targeted training samples created from misclassification errors generated using the current best evolved solution. In this way the detection performance is iteratively improved by learning from its mistakes. This process is called bootstrapping and was described in Section 4.1; it is shown as step 11 in Fig. 4.4.

Previous ANN-based face detection research has iteratively trained for a predetermined number of epochs and then increased the number of non-face data samples via a bootstrapping technique [84][83][115]. Whilst this approach was investigated, the author found that, within the EANN framework, better performance was generally observed when the bootstrapping occurred after the stopping condition was reached. This is most likely caused by the inherent variability of the GA, which makes it difficult to predict in advance the number of generations required for training to reach an adequate performance level for a given dataset. In the proposed approach, after bootstrapping, a continue training function facilitates additional training with the extra bootstrapped non-face training samples by allowing training to recommence from any previous generation (see step 0a in Fig. 4.4 and also Algorithm 3). The frequency of bootstrapping and the number of non-face samples added per bootstrapping iteration are features of the distinct training strategies explored and are detailed next in Section 4.3.

4.3 Training Runs and Parameter Selection

It should be clear from the description in Section 4.2 that there are a large number of parameters that can be adjusted. These are a combination of NEAT GA parameter settings and general algorithm parameters (see Table 4.1 for a complete listing). General rules of thumb can assist in


Table 4.1: Tunable parameters in proposed face detection training scheme

Code Location       Parameter                   Description
NEAT GA             weigh_mut_power             Weight mutation power
                    recur_prob                  Probability of recurrent connections
                    compat_thresh               Species compatibility threshold
                    mutate_only_prob            Mutate only probability
                    mutate_link_weights_prob    Probability of mutating weights
                    mutate_toggle_enable_prob   Probability of disabling a connection
                    mutate_gene_reenable_prob   Probability of re-enabling a disabled connection
                    mutate_add_node_prob        Add new neuron mutation probability
                    mutate_add_link_prob        Add new link mutation probability
                    interspecies_mate_rate      Probability of mating between species
                    mate_multipoint_prob        Multipoint mating probability
                    mate_multipoint_avg_prob    Multi-point average probability
                    mate_singlepoint_prob       Single point mating probability
                    mate_only_prob              Mate only probability
                    recur_only_prob             Recurrent only connection probability
                    pop_size                    Population size
                    dropoff_age                 Number of generations before downward pressure is exerted
General Parameters  Xface                       Number of faces applied per generation
                    Ynonface                    Number of non-faces applied per generation
                    false_positive_penalty      False positive penalty term
                    false_negative_penalty      False negative penalty term
                    α                           Weight decay constant – input to output
                    β                           Weight decay constant – input to hidden
                    γ                           Weight decay constant – hidden to output
                    δ                           Weight decay constant – hidden to hidden
                    —                           Number of training iterations
                    starting_genome             Minimal topology or seeded
                    bootstrap_max               Maximum non-face images added per bootstrapping iteration
                    max_gen                     Maximum number of training generations
                    —                           Target generalisation error

selecting these parameters [170]. For example, crossover is generally recommended to be high

(> 0.8), as it encourages good solutions to breed and so create a potential for improved offspring. A large population maintains diversity, which is particularly useful if the search space is vast, as is the case with the chosen problem of face detection. The cost associated with a large population is a reduction in run-time performance. Diversity in the population can also be achieved using mutation, although good solutions can be lost if the value is set too high. Therefore mutation is generally set to a very low value (< 0.05). Observations derived from prior ANN-based face detection research can give an insight into the structure (number of neurons and connections) of the final evolved solution. For example, Rowley et al examined multiple networks ranging in size from 52 hidden neurons with 2,905 connections to 78 hidden neurons with 4,375 connections

[84]. This also provides guidance for choosing the relative values for the add neuron mutation

(MUTATE_ADD_NODE_PROB) and add connection mutation (MUTATE_ADD_LINK_PROB) parameters.

In the absence of an ideal solution with 100% recall and 100% precision under all circumstances, the question arises of whether recall or precision is more important. This is compounded by the fact that attempting to improve one typically has a negative impact on the other. The considered opinion with regard to a final solution is that the answer to this question is application dependent. However, the author found that when evaluating the performance of a phenotype for suitability for normal mode operation (see Chapter 5), phenotypes with lower recall and higher precision (using figures generated from the validation dataset) gave improved performance compared to phenotypes with higher recall and lower precision. This could be attributed to the nature of the algorithm, which when in normal mode provides multiple opportunities to detect a face

(neighbouring positions and neighbouring resolutions). Furthermore, because the number of non-face test windows greatly outnumbers the number of face test windows for any typical image, even a small improvement in precision will have a dramatic impact on reducing the number of false positives. Fewer detections also reduce the computational expense of the merging process (see

Section 5.1.1). From these observations, the author chose to more heavily penalise false positives than false negatives during the training process.

Many of these rules of thumb can give an indication of sensible values for the parameters. However, further parameter optimisation is necessary, because not only do these parameters greatly influence the quality of the face detection performance, they also have a direct impact on the computational complexity and thus the power consumption of the final evolved solution. In particular,

the neuron mutation and connection mutation, which both add structure to the genomes, increase the computational expense of the algorithm. Unfortunately, there are too many parameters to exhaustively optimise, especially since a single training run can last in the order of days. Therefore training must be approached in a structured, methodical manner. The block diagram in Fig. 4.6 shows the chosen methodical approach for optimising the large number of tunable parameters.

[Figure: the evolutionary training strategy. Starting from either a minimal starting topology or a seeded starting topology, one training iteration comprises four parallel training runs (low mutation, high mutation, with/without weight decay, and experimental parameters); the winning population then undergoes weight & link optimisation, weight-only optimisation and bootstrapping before the next iteration commences or training finishes.]

Figure 4.6: Proposed Evolutionary Training Strategy

Whereas previously Fig. 4.4 demonstrated the principal steps involved in one iteration of training,

Fig. 4.6 shows the training at a higher level of abstraction with multiple iterations of multiple parallel training runs, which are used to investigate different parameter settings. As can be seen in

Fig. 4.6 the training strategy diverges based on whether the initial starting topology for the evolutionary process is a seeded topology or a minimal structure topology (see Section 4.2). Apart from the different starting topologies, the steps involved in the training iterations for both the seeded and non-seeded training are identical.

The training strategy shown in Fig. 4.6 evaluates a number of parameters through parallel training runs. The winning population is selected from these (based upon the performances on the validation dataset) and provided the solution is still improving, a further iteration of parallel

training is carried out. Prior to this commencing, the extracted population undergoes two further optimising training runs. As can be seen in Fig. 4.6, the first optimisation is trained with the add neuron mutation having a value of zero (i.e. weight perturbation mutation and add link mutation only). The goal is to optimise the weight values and connections between the given neurons at this point. This is followed by training with no add neuron mutation or add link mutation. This focuses training on optimising the weight values for the given neurons and connections. The overall goal of these two training runs is to optimise the topology before adding further neuron structure. This is beneficial because, as was observed by Stanley, when structure is added to promising solutions in a lower dimensional space, the genome will already be in a promising position in the higher dimensional space [133]. The final step before commencing the next training iteration is to generate additional training data by bootstrapping this population. This work has used this bootstrapping process to only generate additional non-face data samples, though it could also be used to generate additional face samples in order to improve the recall rate.

The multiple parallel training runs allow exploration of GA and algorithm parameter settings.

As the mutation parameters are responsible for creating distinctly different topological solutions, particular emphasis is placed on investigating suitable values for these parameters. Therefore two of the parallel runs are dedicated to investigating mutation parameter values (typically low values and high values respectively). A further training run is used to explore weight decay as a means of improving generalisation (see Section 4.2). The final parallel training run is used to make judicious adjustments to many of the parameters (e.g. investigating the effect of increasing Xface and Ynonface values as the training progresses). Of course, further parallel training runs could also be used; however, in this work four parallel runs offered the best trade-off between parameter exploration and computational resources.

4.3.1 Non-Seeded Topology Training

This section describes the face detection training which was launched with an initial population spawned from a minimal topology (i.e. all the inputs are connected to a single output and there are no hidden neurons present). To increase the robustness of the final solution, multiple training iterations (steps 1 to 9 in Fig. 4.4 correspond to one iteration) were run. This progressively increased the training dataset, whilst building upon the best topology and weight values from the previous iteration. Training iterations were carried out until 98% (chosen arbitrarily) precision was achieved. This corresponded to six iterations in the case of the non-seeded topology training.


To allow fair comparisons, the same number of iterations was also carried out for the seeded topology training. It should be noted that further training iterations are possible, which could further improve the generalisation ability of the evolved solution.

4.3.1.1 First Training Iteration

The parameters used for the first iteration of the parallel training are shown in Table 4.2. In each case training begins for the non-seeded topology with only randomised non-face data samples.

Subsequent training runs gradually increase the non-face dataset through bootstrapping. For this and subsequent training runs, unless otherwise stated, the false_negative_penalty is 1.0, training was run for up to 4,000 generations, and the end point of each graph is the last generation where a record fitness was found. The validation dataset contained 1,594 faces (from the FERET dataset) and 64,000 non-face samples (taken from a scenery image). The parameters explored during the

first training iteration can be broadly characterised as follows: tr_ns_inc1_A uses low mutation, tr_ns_inc1_B explores high mutation, tr_ns_inc1_C investigates low mutation with weight decay, whilst very low mutation with weight decay and a high drop-off age is explored in tr_ns_inc1_D. The resulting performance of each training run on the validation dataset is shown in Fig. 4.7. The most striking aspect of these graphs is the very low precision, which is caused by the high number of false positives. For example, the highest precision recorded was just over 20% (generation 418 in tr_ns_inc1_B), and whilst this phenotype correctly classified 1,411 of 1,594 faces, it also gave 5,487 false positives. However, considering that only randomised non-face data was used during training, this is understandable. Encouragingly, this phenotype contains only 4 hidden neurons and 324 connections, yet still managed to correctly classify 58,513 of 64,000 non-faces in the validation dataset. To improve the precision, clearly more representative non-face data samples were required; this was achieved by bootstrapping (using generation 418 from training run tr_ns_inc1_B) an additional 1,500 non-face data samples.

4.3.1.2 Second Training Iteration

The second training iteration increased the false positive penalty to 1.75 for each run. The performance of tr_ns_inc1_B provided the motivation for this. For this reason also, the lower valued mutation parameters (in tr_ns_inc2_A and tr_ns_inc2_C) were also increased. Similar to the previous training iteration, the first three training runs have the theme of low mutation, higher mutation and low mutation with weight decay (weight decay parameters reduced relative to the previous

run). The final training run does not include randomised non-faces and applies more non-faces than faces to each genome, the goal of which is to investigate whether stressing the importance of representative non-faces will improve the precision. All the parameters which have changed from the previous iteration are shown in bold in Table 4.3 (and similarly for subsequent training iterations). The results of the second training iteration are shown in Fig. 4.8. On this occasion, the training runs with the lower mutation values (tr_ns_inc2_A & tr_ns_inc2_C) considerably outperformed the high mutation training run (tr_ns_inc2_B). A recall rate of almost 87% with a precision of over 55% was recorded for generation 657 in tr_ns_inc2_A. This is a 30% increase in precision compared to the previous training iteration and can be attributed to the bootstrapped non-face data samples.

The associated network topology remains small with only 5 hidden neurons and 347 connections.

It can be seen in Fig. 4.8 that tr_ns_inc2_C gave comparable results to tr_ns_inc2_A, which is to be expected due to the very small values for the weight decay constants. Unfortunately, the additional emphasis on the non-face data samples in tr_ns_inc2_D had a detrimental effect on the generalisation capabilities of the evolved solution. The optimising training runs (see Fig. 4.6) were carried out on generation 657 from tr_ns_inc2_A. This led to a further improvement in recall (92%) and precision (57%).

4.3.1.3 Third Training Iteration

The third training iteration continued from the weight-optimised tr_ns_inc2_A population. For each run the bootstrap data was increased to 3,500 images. For tr_ns_inc3_D, 8,000 randomised images were added to the bootstrap data to form the total non-face training dataset; for the other three runs, 4,000 randomised images were used. More randomised images were used in tr_ns_inc3_D to investigate any possible effects on the generalisation ability of the evolved phenotypes. With the exception of tr_ns_inc3_D, mutation values were the same as the previous training iteration.

Lower mutation values, a higher false positive penalty and a higher drop-off age were used for training run tr_ns_inc3_D. The resultant evolution revealed that training run tr_ns_inc3_A gave the best performance on the validation dataset, with a recall rate of 70% and a precision of 86%. As can be seen in Fig. 4.9(a), this was achieved during generation 49 of tr_ns_inc3_A. The fittest phenotype in this generation used 5 hidden nodes and 348 connections. The two subsequent optimising training runs did not improve this performance. It can be seen in Fig. 4.9 that, in each separate training run of the third training iteration, few new winning genomes were found after a brief initial period.

This could potentially be explained by the effect of starting the evolution with a weight-optimised

population. This, coupled with low mutation values for add node and add link, could have caused the search to get stuck in local minima at the start of the training iteration. Evidently the large values for the weight perturbation did not help during this training iteration. Therefore, the subsequent training iteration investigated alternative weight mutation perturbation levels (weigh_mut_power) and also increased the mutation values for add node and add link. In light of these observations, it is difficult to draw any conclusions about the experimental parameters used in tr_ns_inc3_D.

4.3.1.4 Fourth Training Iteration

Bootstrap data was increased to 6,000 samples for the fourth training iteration, which used generation 49 from tr_ns_inc3_A as the starting population. However, in the case of training run tr_ns_inc4_D, rather than continuing the evolution, it was restarted from a minimal topology to investigate if a completely new evolution using the current dataset could improve performance. The structural mutation values (mutate_add_node_prob and mutate_add_link_prob) were increased and alternative values were explored for the weight mutation power, for the reasons outlined previously. The values for Xface & Ynonface were also increased so that more representative samples were applied to each genome within a population per generation. The values of the weight decay parameters were also increased in tr_ns_inc4_C relative to the training run tr_ns_inc3_C. After 1,500 generations the recall and precision were considerably below those achieved via the incremental evolution, and the fittest genome at this point contained more hidden neurons (11) and connections (362). The fittest genome found during this training iteration occurred in the weight decay training run tr_ns_inc4_C. This was optimised further and a genome with a recall of almost 75% and a precision of 93% was found. This genome contained 9 hidden neurons and 352 connections.

4.3.1.5 Fifth Training Iteration

The fifth training iteration continued from generation 669 of tr_ns_inc4_A_opt2. Bootstrapped non-face data was increased to 10,000 samples for tr_ns_inc5_C and 9,000 samples for the other training runs. Due to the improvements observed in the previous training iteration, the weight mutation power was considerably decreased for each training run. Training runs tr_ns_inc5_B, tr_ns_inc5_C & tr_ns_inc5_D found few new fit genomes. In particular, training run tr_ns_inc5_B performed poorly; this could be attributed to too high a value for the false positive penalty term on this occasion and too few representative training samples per generation. Fortunately, many new fit genomes were found in training run tr_ns_inc5_A, including a genome in generation 1,378

which gave a recall rate of 78% with a precision of 93% when operating on the validation dataset.

This genome contained 14 hidden neurons and 408 connections.

4.3.1.6 Sixth Training Iteration

The sixth iteration continued from generation 1,378 of tr_ns_inc5_A. As previously discussed, the best resulting performance occurred during generation 704 of tr_ns_inc6_A and gave a recall of over 71% with a precision of just under 97%. Although the recall rate is reduced relative to the previous training iteration, this still represents an improvement, as the precision, which is more important, has improved. This was further improved in the weight-only mutation optimising training run, resulting in a genome which had a recall rate of 70.6% with a precision of 98%. This genome used 15 hidden neurons and 370 connections. Using unequal values for Xface and Ynonface, a higher weight decay constant or an increased drop-off age for training run tr_ns_inc6_C did not assist the evolutionary search during this training iteration. The performance of this population on the testing dataset is described further in Section 4.4.

4.3.2 Seeded Topology Training

A common approach when using ANNs for face detection is to use a “semantic receptive field” type topology, whereby the hidden nodes are connected to inputs in order to create local receptive

fields for semantic regions (eyes, nose, mouth, etc.) in the face [84]. Whilst it should be clear that

NEAT provides the opportunity for discovering novel topologies, the question remains whether

NEAT can be guided toward a more structured “semantic receptive field” type topology. Furthermore, would such a topology outperform a topology generated from a conventional NEAT evolution? It is difficult to predict the answer to these questions. There is a possibility that any potential benefit of a topology seeded with semantic receptive fields may be offset by the challenge for the evolutionary search in finding a solution in a search space with a larger initial size.

Therefore, this section details an alternative training phase which attempts to provide experimental insight into the outlined questions. This training differs from the previous training in that a seeded topology is used in the initial training iteration. The seeded topology has two hidden neurons which are connected to the eyes and mouth regions respectively, in addition to a fully connected hidden neuron (see Section 4.2 for more details). A discussion regarding the performance and characteristics of the non-seeded and seeded training is provided in Section 4.4.


Table 4.2: Parameters for iteration 1 of non-seeded starting topology training run

Parameter                    tr_ns_inc1_A  tr_ns_inc1_B  tr_ns_inc1_C  tr_ns_inc1_D
Population                   150           150           150           150
Total faces                  1884          1884          1884          1884
Total random nonfaces        4000          4000          4000          4000
Total bootstrapped nonfaces  0             0             0             0
Xface                        200           100           200           300
Ynonface                     200           100           200           300
false_positive_penalty       1.50          1.75          1.25          1.00
weigh_mut_power              2.5           2.5           2.5           2.5
mutate_add_node_prob         0.0001        0.001         0.0001        0.00001
mutate_add_link_prob         0.005         0.05          0.005         0.001
α                            0             0             0.00000075    0.00001
β                            0             0             0.00000001    0.00001
γ                            0             0             0.00000001    0.00001
δ                            0             0             0.00000001    0.00001
dropoff_age                  100           50            100           200

[Figure: recall and precision versus generation, evaluated on the validation dataset: (a) tr_ns_inc1_A, (b) tr_ns_inc1_B, (c) tr_ns_inc1_C, (d) tr_ns_inc1_D.]

Figure 4.7: Non-seeded topology training run – iteration 1


Table 4.3: Parameters for iteration 2 of non-seeded starting topology training run

Parameter                    tr_ns_inc2_A  tr_ns_inc2_B  tr_ns_inc2_C  tr_ns_inc2_D
Population                   150           150           150           150
Total faces                  1884          1884          1884          1884
Total random nonfaces        4000          4000          4000          0
Total bootstrapped nonfaces  1500          1500          1500          1500
Xface                        200           100           200           150
Ynonface                     200           100           200           200
false_positive_penalty       1.75          1.75          1.75          1.75
weigh_mut_power              2.5           2.5           2.5           2.5
mutate_add_node_prob         0.0005        0.001         0.0005        0.0005
mutate_add_link_prob         0.01          0.05          0.01          0.01
α                            0             0             0.00000001    0
β                            0             0             0.00000001    0
γ                            0             0             0.00000001    0
δ                            0             0             0.00000001    0
dropoff_age                  100           50            100           75

[Figure: recall and precision versus generation, evaluated on the validation dataset: (a) tr_ns_inc2_A, (b) tr_ns_inc2_B, (c) tr_ns_inc2_C, (d) tr_ns_inc2_D.]

Figure 4.8: Non-seeded topology training run – iteration 2


Table 4.4: Parameters for iteration 3 of non-seeded starting topology training run

Parameter                    tr_ns_inc3_A  tr_ns_inc3_B  tr_ns_inc3_C  tr_ns_inc3_D
Population                   150           150           150           150
Total faces                  1884          1884          1884          1884
Total random nonfaces        4000          4000          4000          8000
Total bootstrapped nonfaces  3500          3500          3500          3500
Xface                        200           125           200           150
Ynonface                     200           125           200           150
false_positive_penalty       1.75          1.75          1.75          2.00
weigh_mut_power              2.5           2.5           2.5           2.5
mutate_add_node_prob         0.0005        0.001         0.0005        0.00025
mutate_add_link_prob         0.01          0.05          0.01          0.005
α                            0             0             0.00000100    0
β                            0             0             0.00000010    0
γ                            0             0             0.00000010    0
δ                            0             0             0.00000010    0
dropoff_age                  100           50            100           120

[Figure: recall and precision versus generation, evaluated on the validation dataset: (a) tr_ns_inc3_A, (b) tr_ns_inc3_B, (c) tr_ns_inc3_C, (d) tr_ns_inc3_D.]

Figure 4.9: Non-seeded topology training run – iteration 3


Table 4.5: Parameters for iteration 4 of non-seeded starting topology training run

Parameter                    tr_ns_inc4_A  tr_ns_inc4_B  tr_ns_inc4_C  tr_ns_inc4_D
Population                   150           150           150           150
Total faces                  1884          1884          1884          1884
Total random nonfaces        4000          4000          4000          4000
Total bootstrapped nonfaces  6000          6000          6000          6000
Xface                        250           200           200           1000
Ynonface                     250           200           200           1000
false_positive_penalty       1.75          1.75          1.75          1.75
weigh_mut_power              1.5           3.5           2.5           2.0
mutate_add_node_prob         0.005         0.02          0.001         0.001
mutate_add_link_prob         0.01          0.08          0.01          0.01
α                            0             0             0.0000100     0
β                            0             0             0.0000010     0
γ                            0             0             0.0000010     0
δ                            0             0             0.0000010     0
dropoff_age                  100           100           100           40

[Figure: recall and precision versus generation, evaluated on the validation dataset: (a) tr_ns_inc4_A, (b) tr_ns_inc4_B, (c) tr_ns_inc4_C, (d) tr_ns_inc4_D.]

Figure 4.10: Non-seeded topology training run – iteration 4


Table 4.6: Parameters for iteration 5 of non-seeded starting topology training run

Parameter                    tr_ns_inc5_A  tr_ns_inc5_B  tr_ns_inc5_C  tr_ns_inc5_D
Population                   150           150           150           150
Total faces                  1884          1884          1884          1884
Total random nonfaces        4000          4000          1000          0
Total bootstrapped nonfaces  9000          9000          10000         9000
Xface                        250           100           200           200
Ynonface                     250           100           200           200
false_positive_penalty       1.75          2.00          1.75          1.75
weigh_mut_power              0.1           0.25          0.05          0.5
mutate_add_node_prob         0.0005        0.02          0.001         0.01
mutate_add_link_prob         0.01          0.08          0.01          0.01
α                            0             0             0.0000100     0
β                            0             0             0.0000010     0
γ                            0             0             0.0000010     0
δ                            0             0             0.0000010     0
dropoff_age                  100           100           100           150

[Figure: recall and precision versus generation, evaluated on the validation dataset: (a) tr_ns_inc5_A, (b) tr_ns_inc5_B, (c) tr_ns_inc5_C, (d) tr_ns_inc5_D.]

Figure 4.11: Non-seeded topology training run – iteration 5


Table 4.7: Parameters for iteration 6 of non-seeded starting topology training run

Parameter                     tr ns inc6 A   tr ns inc6 B   tr ns inc6 C   tr ns inc6 D
Population                    150            150            150            150
Total faces                   1884           1884           1884           1884
Total random nonfaces         1000           4000           8000           0
Total bootstrapped nonfaces   12000          12000          12000          13800
Xface                         250            200            100            200
Ynonface                      250            300            300            300
false positive penalty        1.75           1.6            1.75           1.75
weigh mut power               0.1            0.05           0.25           0.5
mutate add node prob          0.0005         0.01           0.00005        0.001
mutate add link prob          0.01           0.04           0.001          0.001
α                             0              0              0.0000200      0
β                             0              0              0.0000010      0
γ                             0              0              0.0000010      0
δ                             0              0              0.0000010      0
dropoff age                   100            100            200            150

[Figure: recall and precision versus generation on the validation dataset for (a) tr ns inc6 A, (b) tr ns inc6 B, (c) tr ns inc6 C and (d) tr ns inc6 D.]

Figure 4.12: Non-seeded topology training run – iteration 6


4.3.2.1 First Training Iteration

The parameters used for each training run in the first iteration of the seeded training are shown in Table 4.8. With the exception of the initial starting topology, this training phase is identical to that presented in Fig. 4.4. With regard to the mutation operators, the author initially felt that creating an add group of links type mutation operator, similar to that proposed by Wiegand et al., could be beneficial [134]. With such a mutation operator, instead of a single connection being created between an input neuron and a hidden neuron, links from a single hidden neuron are created to multiple input neurons, with the goal of grouping pixels into semantic units. However, it is not obvious how many connections should be created, or whether differently shaped connection regions (e.g. circular, square, rectangular) would be more appropriate. Therefore, the add group of links type mutation operator proposed by Wiegand et al. was not used. As an alternative, a higher probability for the add link mutation was investigated for all seeded training runs. This was chosen so as to create more connections to semantically important groups of pixels, although it also increased the number of connections between hidden neurons. However, the increased mutation probability led to the search becoming less structured, which had a detrimental effect on the overall progress of the training. Therefore, a conventional NEAT strategy was employed to handle connection mutation.

In the first iteration, as there are already hidden neurons and a considerable number of links present in the starting topology, the principal goal was to optimise the weights of those connections before adding further structure during subsequent training. As a result, the add node mutation probability is set to zero. Similarly, the add link mutation is set to a low value (tr seeded inc1 A and tr seeded inc1 B) or to zero (tr seeded inc1 C and tr seeded inc1 D). The number of faces and non-faces applied to each genome per generation ranged from 100 to 300. In each case 4000 randomised nonface images were used. Due to the observations regarding suitable ranges for the weight mutation power in the non-seeded training, this initial training iteration had considerably lower values for this parameter. Similarly, a common value of 1.75 was used for the false positive penalty, based upon the performance in the non-seeded training. Non-zero weight decay values were used in tr seeded inc1 C and tr seeded inc1 D.

The recall and precision of training runs tr seeded inc1 A and tr seeded inc1 B were better than those of training runs tr seeded inc1 C and tr seeded inc1 D. This could be attributed to the non-zero add link mutation in tr seeded inc1 A and tr seeded inc1 B, along with their zero-valued weight decay constants.

The fittest genome found during this training iteration occurred in generation 48 of training run tr seeded inc1 A (see Fig. 4.13(a)). This genome gave 88% recall and 15% precision when using the validation dataset. As in the non-seeded training, the precision is very low. This is understandable due to the use of only random non-face data. As well as having the three initial hidden neurons, the winning genome contained 479 links. As no add node mutation was used, no further optimisation was carried out on this genome.

4.3.2.2 Second Training Iteration

To improve the precision from the first training iteration, 1000 bootstrapped non-face images were added to each training run in the second iteration. In addition, the add node mutation values were changed to nonzero values and the add link mutation values were increased. For experimentation purposes, the add node mutation was set to a higher value than the add link mutation in tr seeded inc2 D. The values of Xface and Ynonface were also changed to a consistent value of 200 to make comparison easier in this training iteration. Lower weight decay constants were explored in tr seeded inc2 D. With these changes, the fittest phenotype found in this iteration occurred in generation 97 of tr seeded inc2 D and gave a recall of over 77% with a precision of just under 50% (see Fig. 4.14(d)). As this phenotype did not contain additional structure (i.e. three hidden nodes and 479 connections) over the fittest phenotype in the previous iteration, the contributing factors in the improved performance were the additional bootstrapped data coupled with mutations of the weight values. The two subsequent optimising training runs did not improve the performance.

4.3.2.3 Third Training Iteration

An increase in the total quantity of non-face training samples was the principal change in the third training iteration. Bootstrapped non-face data was increased in all training runs, although reduced numbers of randomised samples were explored in three of the parallel training runs. Considerably larger γ and δ weight decay constants were also investigated in tr seeded inc3 C. The resultant fittest phenotype found in this training iteration occurred in generation 918 of tr seeded inc3 A (see Fig. 4.15(a)), achieving a recall rate of over 86% with a precision of just under 44%. As there were few changes to the parameters in this training run relative to the previous iteration, it is reasonable to conclude that this improvement was directly related to the additional non-face data and extra evolved structure. The performance was further improved to 83% recall with a precision of 49% using a weight-only optimisation training run. This phenotype contained ten hidden nodes and 503 connections.

4.3.2.4 Fourth Training Iteration

The fourth iteration increased the amount of bootstrapped data to 7000 non-face samples for each of the parallel training runs. Based upon the observations and general trends in the previous training iteration, the add node mutation probability value in both tr seeded inc4 B and tr seeded inc4 D was reduced. For similar reasons, the weight mutation value in these two training runs was also reduced. A lowered penalty term for false positives was explored in tr seeded inc4 D, with the goal of investigating whether the recall rate could be improved. The fittest phenotype found in this iteration occurred during generation 53 of tr seeded inc4 D and gave a recall rate of over 71% with a precision of just under 69% (see Fig. 4.16(d)). The two subsequent optimising training runs improved this performance to a 77% recall rate and over 70% precision. The phenotype contained no additional structure over the previous iteration (i.e. 10 hidden nodes and 503 connections). This is not surprising, since the phenotype occurred early in the training iteration. The performance improvement could be attributed to a combination of weight optimisation, the reduction in the false positive penalty term and the additional non-face training data.

4.3.2.5 Fifth Training Iteration

The bootstrapped non-face data was increased to 10,000 samples for the fifth training iteration. This was coupled with a small increase in the number of randomised samples. The add node and add link mutation probability values were maintained. Based upon the performances in prior training iterations, the value of the weight mutation was lowered. In addition, a further reduction in the false positive penalty term was explored in tr seeded inc5 C. However, as Fig. 4.17(c) shows, precision was heavily traded off for improved recall, which is undesirable in the chosen application. The best performance was recorded in generation 208 of tr seeded inc5 B; this phenotype gave a recall of 67% with a precision of 89% (see Fig. 4.17(b)). Subsequent optimising training runs did not improve this performance. The phenotype contained 15 hidden nodes and 522 connections.

4.3.2.6 Sixth Training Iteration

In the sixth training iteration, the bootstrapped data was increased to 13,660 images and the random data was increased to 7000 images. The Xface and Ynonface values were also increased to 400. The mutation values used in the previous iteration were maintained, with the exception that tr seeded inc6 D used a larger false positive penalty term, in order to explore whether this could improve the precision of the detection results. Weight decay constants were also adjusted in tr seeded inc6 C and tr seeded inc6 D relative to the previous training iteration. The best performing phenotype was found in generation 416 of tr seeded inc6 A (see Fig. 4.18(a)). This phenotype gave a recall of 66% with a precision of 96%. Further optimising training runs marginally improved this performance to 67% recall and 96% precision. The phenotype was actually smaller than the best phenotype from the previous training iteration, having only 13 hidden nodes and 520 connections.

4.4 Conclusions on Training

The non-seeded and seeded training schemes were each run for six iterations. At the end of these iterations a comparable number of non-face images had been added through a bootstrapping process. In addition, the number of face images used in each was identical. The generalisation ability of interim solutions for both schemes was evaluated using an identical validation dataset. Overall, this provides a platform for a fair comparison and evaluation of the merits of both schemes, whilst acknowledging some inherent randomness in the training algorithm (e.g. random non-face training samples, GA operation). The best performing phenotype from the non-seeded training achieved a recall rate of 70.6% and a precision of 98%. This compares to 67% recall and 96% precision for the best performing phenotype in the seeded training scheme. Furthermore, the non-seeded trained phenotype contained 370 connections, as opposed to the 520 connections in the seeded trained phenotype, and also contained two fewer neurons (13 as opposed to 15). Therefore, it is reasonable to claim that the non-seeded training scheme generated a better performing solution with a smaller overall topology. The poorer performance of the seeded training could be attributed to the fact that the initial search space was considerably larger and therefore harder to optimise. Both evolved solutions compare favourably to the ANN based face detection scheme of Rowley et al., which had a minimum size of 52 neurons and 2905 connections [84]. As will be shown in the benchmarking in the next chapter, the trained solution gives good performance for evaluation datasets that are representative of what is likely to be generated on a mobile device. The smaller topologies generated from the proposed training scheme are particularly attractive for a mobile device, as they lead to a computational cost reduction.

This in turn will allow improvements in the run-time performance and power consumption when evaluating the proposed topology compared to the topology proposed by Rowley et al. [84].
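For reference, the recall and precision figures quoted throughout these comparisons follow what are assumed here to be the standard definitions, where TP, FP and FN denote true positives, false positives and false negatives respectively:

    recall = TP / (TP + FN)        precision = TP / (TP + FP)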

An extensive exploration of GA and algorithm parameters was carried out (see Section 4.3). In some cases low mutation worked best, and on other occasions high mutation gave better results. Similar behaviour was noted for the majority of the parameters, including the weight decay constants. As such, the author concludes that it is highly unlikely that a fixed set of ideal parameters exists. Rather, the parameters should ideally be dynamic, responding to the progress of the evolutionary search. For example, as the topology grows in size it is quite likely that the relative values of mutation should change in response to the existence of a greater number of local minima. In some respects, using multiple training iterations in which different parameters are explored could be considered a manual implementation of this concept. Furthermore, using multiple parallel training runs gives greater credence to the general trends extracted from the training runs. The incremental approach to training was also shown to find a smaller topology in the fourth iteration of the non-seeded training relative to completely restarting the training process (see Fig. 4.10(d)). This could be explained by the finer granularity of the parameter exploration. A further contributing factor to the improved performance is that the problem becomes progressively more difficult in the incremental training approach. This may actually help to ease the task of optimising the earlier topologies and place genomes in promising locations in the solution subspace before the dataset becomes more difficult due to the addition of bootstrapping data. This is similar to the strategy successfully adopted by Stanley, in which solutions were evolved for smaller problems before building on these for more complex problems, such as his demonstration of using NEAT to play the popular board game Go [156].
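As a rough illustration of the kind of dynamic parameter control suggested above (a sketch only: the structure and the adaptation rule are hypothetical and do not correspond to the NEAT library used in this work), the structural mutation probabilities could be scaled down as the champion topology grows, while the weight perturbation power is widened when the search stagnates:

    #include <algorithm>

    // Hypothetical GA parameter set; the names mirror the tables in this
    // chapter but the adaptation rule below is purely illustrative.
    struct GAParams {
        double add_node_prob;
        double add_link_prob;
        double weight_mut_power;
    };

    // Shrink structural mutation as the champion topology grows (more local
    // minima), and widen weight perturbations after prolonged stagnation.
    void adapt_params(GAParams& p, int champion_connections,
                      int generations_without_improvement) {
        const double size_factor = 100.0 / std::max(100, champion_connections);
        p.add_node_prob = 0.001 * size_factor;
        p.add_link_prob = 0.01 * size_factor;

        if (generations_without_improvement > 50)
            p.weight_mut_power = std::min(2.5, p.weight_mut_power * 1.5);
        else
            p.weight_mut_power = std::max(0.05, p.weight_mut_power * 0.95);
    }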

4.5 Future Work

The proposed training scheme could be improved in terms of speed, detection performance (i.e. recall and precision) and robustness (e.g. allowing detection of different face orientations). From these observations, the following were identified as potential ways of improving the algorithm: automatic input/feature selection, using alternative low-level features, cascaded classifier operation, and detection of side profile and rotated faces. A detailed description is presented in the following sections.


Table 4.8: Parameters for iteration 1 of seeded starting topology training run

Parameter                     tr seeded inc1 A   tr seeded inc1 B   tr seeded inc1 C   tr seeded inc1 D
Population                    150                150                150                150
Total faces                   1884               1884               1884               1884
Total random nonfaces         4000               4000               4000               4000
Total bootstrapped nonfaces   0                  0                  0                  0
Xface                         200                100                200                300
Ynonface                      200                100                200                300
false positive penalty        1.75               1.75               1.75               1.75
weigh mut power               0.25               1.5                0.5                1.0
mutate add node prob          0.0                0.0                0.0                0.0
mutate add link prob          0.005              0.05               0.0                0.0
α                             0                  0                  0.00000075         0.00001
β                             0                  0                  0.00000001         0.00001
γ                             0                  0                  0.00000001         0.00001
δ                             0                  0                  0.00000001         0.00001
dropoff age                   100                50                 100                200

[Figure: recall and precision versus generation on the validation dataset for (a) tr seeded inc1 A, (b) tr seeded inc1 B, (c) tr seeded inc1 C and (d) tr seeded inc1 D.]

Figure 4.13: Seeded topology training run – iteration 1


Table 4.9: Parameters for iteration 2 of seeded starting topology training run

Parameter                     tr seeded inc2 A   tr seeded inc2 B   tr seeded inc2 C   tr seeded inc2 D
Population                    150                150                150                150
Total faces                   1884               1884               1884               1884
Total random nonfaces         4000               4000               4000               4000
Total bootstrapped nonfaces   1000               1000               1000               1000
Xface                         200                200                200                200
Ynonface                      200                200                200                200
false positive penalty        1.75               1.75               1.75               1.75
weigh mut power               0.25               1.5                0.5                1.0
mutate add node prob          0.0005             0.025              0.001              0.01
mutate add link prob          0.001              0.05               0.01               0.005
α                             0                  0                  0.00000075         0.0000001
β                             0                  0                  0.00000001         0.0000001
γ                             0                  0                  0.00000001         0.0000001
δ                             0                  0                  0.00000001         0.0000001
dropoff age                   100                50                 100                200

[Figure: recall and precision versus generation on the validation dataset for (a) tr seeded inc2 A, (b) tr seeded inc2 B, (c) tr seeded inc2 C and (d) tr seeded inc2 D.]

Figure 4.14: Seeded topology training run – iteration 2


Table 4.10: Parameters for iteration 3 of seeded starting topology training run

Parameter                     tr seeded inc3 A   tr seeded inc3 B   tr seeded inc3 C   tr seeded inc3 D
Population                    150                150                150                150
Total faces                   1884               1884               1884               1884
Total random nonfaces         4000               3000               3000               3000
Total bootstrapped nonfaces   3500               3000               3000               3000
Xface                         200                300                300                300
Ynonface                      200                300                300                300
false positive penalty        1.75               1.75               1.75               1.75
weigh mut power               0.25               1.5                0.5                1.0
mutate add node prob          0.0005             0.025              0.001              0.01
mutate add link prob          0.001              0.05               0.01               0.005
α                             0                  0                  0.0000005          0.0000001
β                             0                  0                  0.0000005          0.0000001
γ                             0                  0                  0.000005           0.0000001
δ                             0                  0                  0.000005           0.0000001
dropoff age                   100                50                 100                200

[Figure: recall and precision versus generation on the validation dataset for (a) tr seeded inc3 A, (b) tr seeded inc3 B, (c) tr seeded inc3 C and (d) tr seeded inc3 D.]

Figure 4.15: Seeded topology training run – iteration 3


Table 4.11: Parameters for iteration 4 of seeded starting topology training run

Parameter                     tr seeded inc4 A   tr seeded inc4 B   tr seeded inc4 C   tr seeded inc4 D
Population                    150                150                150                150
Total faces                   1884               1884               1884               1884
Total random nonfaces         3000               3000               3000               3000
Total bootstrapped nonfaces   7000               7000               7000               7000
Xface                         300                300                300                300
Ynonface                      300                300                300                300
false positive penalty        1.75               1.75               1.75               1.5
weigh mut power               0.25               1.0                0.5                0.1
mutate add node prob          0.0005             0.005              0.001              0.0025
mutate add link prob          0.001              0.05               0.01               0.005
α                             0                  0                  0.0000005          0.0000001
β                             0                  0                  0.0000005          0.0000001
γ                             0                  0                  0.000005           0.0000001
δ                             0                  0                  0.000005           0.0000001
dropoff age                   100                50                 100                200

[Figure: recall and precision versus generation on the validation dataset for (a) tr seeded inc4 A, (b) tr seeded inc4 B, (c) tr seeded inc4 C and (d) tr seeded inc4 D.]

Figure 4.16: Seeded topology training run – iteration 4


Table 4.12: Parameters for iteration 5 of seeded starting topology training run

Parameter                     tr seeded inc5 A   tr seeded inc5 B   tr seeded inc5 C   tr seeded inc5 D
Population                    150                150                150                150
Total faces                   1884               1884               1884               1884
Total random nonfaces         5000               5000               5000               5000
Total bootstrapped nonfaces   10000              10000              10000              10000
Xface                         300                300                300                300
Ynonface                      300                300                300                300
false positive penalty        1.75               1.75               1.25               1.5
weigh mut power               0.01               0.5                0.25               0.1
mutate add node prob          0.0005             0.005              0.001              0.0025
mutate add link prob          0.001              0.05               0.01               0.005
α                             0                  0                  0.0000005          0.0000001
β                             0                  0                  0.0000005          0.0000001
γ                             0                  0                  0.000005           0.0000001
δ                             0                  0                  0.000005           0.0000001
dropoff age                   100                50                 100                200

[Figure: recall and precision versus generation on the validation dataset for (a) tr seeded inc5 A, (b) tr seeded inc5 B, (c) tr seeded inc5 C and (d) tr seeded inc5 D.]

Figure 4.17: Seeded topology training run – iteration 5


Table 4.13: Parameters for iteration 6 of seeded starting topology training run

Parameter                     tr seeded inc6 A   tr seeded inc6 B   tr seeded inc6 C   tr seeded inc6 D
Population                    150                150                150                150
Total faces                   1884               1884               1884               1884
Total random nonfaces         7000               7000               7000               7000
Total bootstrapped nonfaces   13660              13660              13660              13660
Xface                         400                400                400                400
Ynonface                      400                400                400                400
false positive penalty        1.75               1.75               1.25               2.0
weigh mut power               0.01               0.5                0.25               0.1
mutate add node prob          0.0005             0.005              0.001              0.0025
mutate add link prob          0.001              0.05               0.01               0.005
α                             0                  0                  0.00000005         0.0000005
β                             0                  0                  0.00000005         0.0000005
γ                             0                  0                  0.00000001         0.0000005
δ                             0                  0                  0.00000001         0.0000005
dropoff age                   100                50                 100                200

[Figure: recall and precision versus generation on the validation dataset for (a) tr seeded inc6 A, (b) tr seeded inc6 B, (c) tr seeded inc6 C and (d) tr seeded inc6 D.]

Figure 4.18: Seeded topology training run – iteration 6


4.5.1 Automatic Input/Feature Selection

The starting topology for the proposed training scheme (see Section 4.2) has a minimum of 292 inputs connected to the output. This means that there is already considerable structure (in terms of connection weights) for the GA to optimise. An alternative approach worthy of investigation is to connect far fewer input features (e.g. pixel intensity values) in the initial starting topology and rely on the “add link” mutation operator to make connections to the input features. This is attractive because it presents NEAT with a simpler starting point from which to evolve. Furthermore, as it is reasonable to imagine that perhaps not all inputs contribute to the information content of the face sample (particularly those pixels in the vicinity of the oval mask), evolving from an initial topology with unconnected inputs allows the GA to automatically select only the input features which contribute to the final solution. This has the potential to further reduce the size of the topology.

Unfortunately, using an initial topology with unconnected inputs was not possible using the version of the NEAT library which was used during the training phase of this research. However, the latest release of the NEAT software library allows for unconnected inputs in the starting topologies. For the reasons outlined above, this facility should be investigated in future work.

4.5.2 Using Alternative Low-Level Features

As was discussed in Chapter 3, a wide variety of low level input features have been investigated in prior appearance-based face detection research. The different low level features have varying characteristics, which have consequences for classifier training, rotation invariance, illumination invariance and computational cost. The proposed training algorithm uses the pixel intensity values directly as the input features to classify. An alternative approach worth investigating is to use frequency domain input features. Such features could be used in the proposed NEAT hardware accelerator without any further modification other than a frequency coefficient normalisation step in software. Transforming pixel intensity values into the frequency domain preserves the original signal but redistributes the energy contained in the signal. For similar reasons to those motivating the use of frequency transformations in video compression (see Chapter 2), frequency based input features can be exploited to reduce the number of input features to NEAT. That is, high frequency coefficients, which typically contribute only a small amount to the overall energy contained in the sample, can be ignored. Alternatively, an incrementally increasing number of coefficients could be used in multiple classifiers in order to provide a coarse-to-fine-grained cascaded classifier solution. A reduction in the number of inputs will greatly help to reduce the size of the evolved topology. Whilst a smaller network is very attractive from a real time performance and low power perspective, arguably the greater benefit in the context of NEAT is that the GA has a considerably reduced search space to explore during training in order to find an appropriate solution. This could lead to a more robust face detection solution. An additional benefit of using frequency-based input features is the potential for a very computationally efficient means of sample illumination correction, by simply ignoring the lower frequency coefficients. For example, the DCT DC value and the next two coefficients (in zig-zag scan order) could be considered to represent the average and the slowly changing horizontal and vertical gradients within a sample. Therefore, by ignoring these coefficients during the training phase, approximations to illumination conditions such as shadows are also being ignored.
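A minimal sketch of this coefficient selection follows (illustrative only: the zig-zag index table and the parameter choices are assumptions, not part of the trained system). Skipping the first few zig-zag coefficients approximates illumination correction, while capping the count discards high frequency detail:

    #include <cstddef>
    #include <vector>

    // Select transform coefficients in zig-zag order, skipping the first
    // skip_low (DC and slow gradients, approximating illumination effects)
    // and keeping at most keep of the remainder as classifier inputs.
    // zigzag maps scan position -> raster index and is assumed precomputed.
    std::vector<double> select_features(const double* coeffs, const int* zigzag,
                                        std::size_t n, std::size_t skip_low,
                                        std::size_t keep) {
        std::vector<double> features;
        features.reserve(keep);
        for (std::size_t s = skip_low; s < n && features.size() < keep; ++s)
            features.push_back(coeffs[zigzag[s]]);
        return features;
    }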

To ensure frequency-based input features were viable within the current hardware/software NEAT framework, preliminary investigations were carried out. The first design option encountered in these investigations was which of the many possible frequency transform based features to use. The options considered were wavelet coefficients (in particular the simplified Haar transform [113]), 20×20 DCT coefficients (with or without the oval mask) or 20×20 SA-DCT coefficients (see Chapter 2 for more details on the SA-DCT). In comparison to the simplified Haar transform values, DCT and SA-DCT coefficients were considered more straightforward to integrate quickly within the current framework. This was a considerable benefit owing to the nature of the preliminary evaluation, although it could be argued that the DCT and SA-DCT are considerably more computationally demanding than the Haar transform. However, run-time was not a primary focus of these preliminary investigations. Furthermore, energy efficient hardware architectures for these transforms already exist [37]. The next decision was whether or not to use an oval mask on each sample. The disadvantage of the oval mask is that redundant information is captured in the transform if pixels outside the mask are used when analysing frequency coefficients. This adds considerable energy to the higher frequency coefficients, due to the sharp transition from the black masking oval to the pixels of the face sample. Arguably, the redundant masking oval information, which is distributed throughout multiple coefficients, makes a direct comparison between the frequency based approach with a maximum of 400 coefficient inputs and the pixel based approach with 292 input values unfair without further processing. The issue of fair comparison is exacerbated when not using the masking oval, as it potentially allows background structure to creep into the samples. Such background structure could also be detrimental to the training process. One possible solution to these issues is to use a SA-DCT where the oval mask acts as the alpha plane. The SA-DCT will then only generate coefficients for the face pixels inside the oval mask.

A direct comparison between the intensity based training and frequency based training is possible when using 292 input pixels and 292 coefficients respectively, although completely different topologies should be expected, since the SA-DCT coefficients have a fundamentally different energy distribution than the pixel intensity values. Perhaps the more interesting question to explore, though, is whether the detection performance degrades rapidly when using fewer coefficients, or whether fewer coefficients can in fact improve the detection performance due to the smaller search space for the GA to explore. Rather than select a somewhat arbitrary percentage of coefficients to investigate, a systematic approach was adopted, which tried to establish the average number of coefficients required to achieve a fixed level of sample energy. To achieve this, a 20×20 SA-DCT was carried out on ten randomly selected faces (see Fig. 4.19) from the face training dataset. Plots of the accumulated energy versus the number of coefficients (in zig-zag scan order) for each of the ten faces are shown in Fig. 4.20. From these plots it can be deduced that, on average, 89 coefficients and 221 coefficients are sufficient to capture 95% and 99% of the energy respectively (see Table 4.14 for more details). Using these numbers of coefficients, the ten random faces which underwent the SA-DCT are reconstructed using the Shape Adaptive Inverse Discrete Cosine Transform (SA-IDCT). The reconstructed faces are shown in Fig. 4.19(b) and Fig. 4.19(c). The smoothing effect caused by the loss of high frequency coefficients is quite noticeable in Fig. 4.19(c); nevertheless each sample can be easily recognised as a face.
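The energy accounting behind Fig. 4.20 and Table 4.14 amounts to a cumulative sum of squared coefficients in zig-zag order; a simplified reconstruction of that calculation is sketched below (assuming the coefficients are already supplied in zig-zag order):

    #include <cstddef>

    // Return how many leading zig-zag-ordered coefficients are needed to
    // accumulate target (e.g. 0.95 or 0.99) of the total squared-coefficient
    // energy of one transformed sample.
    std::size_t coeffs_for_energy(const double* zz_coeffs, std::size_t n,
                                  double target) {
        double total = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            total += zz_coeffs[i] * zz_coeffs[i];

        double acc = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            acc += zz_coeffs[i] * zz_coeffs[i];
            if (acc >= target * total)
                return i + 1;
        }
        return n;
    }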

4.5.3 Detection of Side Profile and Rotated Faces

An obvious extension to improve the robustness of the face detection process is to include detection of in-plane rotated and side profile faces. An approach similar to that presented by Rowley could easily be used to achieve this goal [130].

4.6 Summary of Contributions

This chapter presented a novel EANN based face detection training algorithm. A detailed explanation was given in Section 4.2, which focused on the training data preparation, an overview of the NEAT based library in the context of the face detection training and finally the overall face detection training framework. The design decisions made in the training algorithm (e.g. the stopping condition) were elaborated and discussed. Details of all the training runs were presented in Section 4.3.


Table 4.14: Number of SA-DCT coefficients required to achieve specific energy targets

            Number of coefficients required to reconstruct:
Image       95% of the energy    99% of the energy
Face 0      113                  254
Face 1      88                   209
Face 2      56                   220
Face 3      75                   206
Face 4      112                  251
Face 5      103                  233
Face 6      95                   218
Face 7      64                   203
Face 8      128                  238
Face 9      56                   174
Average     89                   221

(a) 10 randomly selected faces from the face training dataset

(b) Faces reconstructed to the 99% energy level target using 221 coefficients

(c) Faces reconstructed to the 95% energy level target using 89 coefficients

Figure 4.19: DCT Energy exploration


[Figure: percentage accumulated energy versus number of SA-DCT coefficients (zig-zag scan order) for each of Face 0 through Face 9.]

Figure 4.20: Energy versus the number of SA-DCT coefficients of each of the face samples

An iterative approach to training was adopted which investigated two different starting topologies: a minimal non-seeded topology and a seeded starting topology. Parameter selection was extensively investigated during the training runs. It was found that the non-seeded training gave better generalisation performance with a smaller topology. It was concluded that the smaller initial search space of the non-seeded starting topology was the principal contributing factor for the improved performance. Using a large validation dataset, the trained EANN achieved over 70% recall with 98% precision. The trained topology was compared to the generally accepted best performing ANN approach in the literature and was found to be considerably smaller [84]. This is advantageous for the intended target of a mobile device due to the reduced computational cost. Dynamic parameter selection in response to the progress of the evolutionary search was highlighted as a potential area of future research. This would also further automate the training process. Further avenues of future research include adopting alternative low level features and improving the robustness to rotation.

CHAPTER 5

Software Implementation of Trained Face Detection EANN

This chapter describes the normal mode operation (e.g. detecting faces during a video call) of the EANN based face detection algorithm which was trained in the previous chapter. When the training phase is complete, the best evolved solution is used in either software or hardware on the chosen platform. Section 5.1 describes the general algorithmic details of the normal mode operation. Profiling results are presented in Section 5.2, allowing design effort to be focused on computational complexity hotspots within the algorithm. Optimisations suitable for a software implementation are then described in Section 5.3. The profiling results also provide the motivation for the design of an energy efficient hardware accelerator. This hardware accelerator is described in detail in Chapter 6. The chapter concludes with a thorough comparison of the proposed algorithm against other state of the art face detection algorithms using standard test datasets.

5.1 Normal Mode Face Detection Operation

To detect faces in an unseen image or video frame, overlapping 20×20 windows are extracted from the image. As with training, each candidate face window undergoes illumination correction, histogram equalisation and normalisation. The windows are selected from the image in a raster scan order, which from a hardware perspective results in an attractive regular addressing scheme.

Similar to many other appearance-based face detection algorithms, in order to be able to detect faces larger than 20×20 pixels, the image is subsampled to produce an image pyramid (see Fig. 5.1) [84][116].


[Figure: an image pyramid (only four resolutions shown for clarity) with sample locations of the 20×20 classification window. At the highest resolutions the window covers only part of the face; at an appropriate lower resolution it fully encapsulates the face and a correct result can be expected; at the lowest resolution the face is much smaller than the window. Each 20×20 candidate location is passed to the face classifier for a face/non-face result.]

Figure 5.1: Example image pyramid

Bi-cubic subsampling with a reduction factor of 1.2 is used [84][116]. On a computationally constrained platform, nearest-neighbour subsampling would also represent a viable alternative (see Section 5.3). Each 20×20 pixel window extracted from the image pyramid is subsequently applied to the evolved network.
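As a concrete illustration of this scanning strategy, the following sketch (illustrative only: nearest-neighbour subsampling is used for brevity, and classify() stands in for the per-window pre-processing and ensemble evaluation) walks the pyramid and raster-scans every overlapping 20×20 window:

    #include <cstddef>
    #include <vector>

    struct Image {
        int w, h;
        std::vector<unsigned char> pix;  // row-major greyscale
        unsigned char at(int x, int y) const { return pix[y * w + x]; }
    };

    // Nearest-neighbour subsample by the pyramid reduction factor of 1.2.
    Image subsample(const Image& src) {
        Image dst{static_cast<int>(src.w / 1.2),
                  static_cast<int>(src.h / 1.2), {}};
        dst.pix.resize(static_cast<std::size_t>(dst.w) * dst.h);
        for (int y = 0; y < dst.h; ++y)
            for (int x = 0; x < dst.w; ++x)
                dst.pix[y * dst.w + x] = src.at(static_cast<int>(x * 1.2),
                                                static_cast<int>(y * 1.2));
        return dst;
    }

    // Raster-scan every overlapping 20x20 window at every pyramid level.
    template <typename Classifier>
    void scan_pyramid(Image img, Classifier classify) {
        while (img.w >= 20 && img.h >= 20) {
            for (int y = 0; y + 20 <= img.h; ++y)
                for (int x = 0; x + 20 <= img.w; ++x)
                    classify(img, x, y);  // illumination correction etc.
            img = subsample(img);         // next (smaller) pyramid level
        }
    }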

Upon activation of the network, the output neuron gives a value between zero and one. The output is within these bounds due to the output range of the sigmoid activation function. The value of the output neuron can be considered to be the confidence level of the network that the candidate 20×20 pixel window contains a face. In order to improve the robustness of the final solution, rather than just using the single best evolved phenotype from the training, an ensemble of the fittest phenotypes is used instead. This improves performance because, whilst multiple well trained networks tend to agree regarding correct classifications, they agree much less frequently about incorrect classifications, and this can be used to reduce the number of incorrect detections. This property holds provided there is some diversity between the distinct networks. For example, the diversity could be introduced by using different initial randomised weights or different topologies during training [84]. The NEAT library in the proposed approach provides inherent diversity in an evolved population through the presence of species. Therefore, the ensemble used in this work consisted of the fittest member from each of the three fittest species. Different ways of combining the results from each phenotype to generate a final classification result were then investigated. These included Boolean combinations, averaging and a linear combination using precision estimates (generated from a validation dataset). In general, the linear combination approach gave the best performance. If the result of the combination of the phenotypes is above a specified threshold (faceDet threshold), the 20×20 pixel window is considered to contain a face. Reducing the threshold value increases the recall rate; however, precision is then typically reduced due to the higher incidence of false positives. The choice of value for this threshold is discussed further in Section 5.4. The pseudo-code for the face detection routine is shown in Algorithm 4. Results from software profiling are discussed extensively in Section 5.3. In summary, the profiling reveals that the computational bottleneck (and by extension power consumption) within the algorithm is the repeated evaluation of the phenotype (shown as species(i)→network→activate in Algorithm 4).

5.1.1 Merging detection results

Even if the classification process always correctly identifies faces and non-faces, the same face is likely to be detected at multiple locations due to the overlapping detection windows. In addition, due to the image pyramid, the same face is also likely to be detected at multiple resolutions. Therefore, before the final detection result for an image is available, all locations classified as a face must be examined to see if merging is possible. This property can also be exploited to reduce the number of false positives, as these are less likely to have the same occurrence pattern as correct detection results. For these reasons, face detection merging is commonly found in the appearance-based face detection literature. For example, Rowley et al. only considered a positive window result to be a face if the number of positive results within a local neighbourhood was above a threshold. Furthermore, if a face is detected across multiple resolutions, the overlapping faces at lower resolutions are eliminated. Similar heuristics were employed by Viola and Jones, Feraud et al. and Garcia et al. [113][116][83].

The approach taken in this work for merging face detection results firstly compares the address (if necessary upsampled to the original image dimensions) and bounding box size (which can be larger than 20×20 pixels if the detection is upsampled) of each detection against all other detections. It is then possible to establish whether there is any overlap in the bounding boxes of each detection. As the number of detections in a single image is relatively low (e.g. typically 5–20 detections for a single face in a QCIF image), this iterative step does not create an overly excessive computational burden.


Algorithm 4: Normal Face Detection Operation

load train population(input population file);
sp1 precision = estimate generalisation(input population→fittest species1);
sp2 precision = estimate generalisation(input population→fittest species2);
sp3 precision = estimate generalisation(input population→fittest species3);
total = sp1 precision + sp2 precision + sp3 precision;
w1 = sp1 precision/total; w2 = sp2 precision/total; w3 = sp3 precision/total;
open file containing test file pathnames;
while (test files remain to be examined) do
    open test file;
    load input data from file;
    while (image subsampling still possible) do
        for (each candidate 20×20 window) do
            for (i=0; i<3; i++) do
                result[i] = species(i)→network→activate;
            face classification result = w1*result[0] + w2*result[1] + w3*result[2];
            if face classification result >= faceDet threshold then
                upsample address of detection and store;
                store confidence of detection(facedet) = face classification result;
                faceDet++;
    merge detections();
    write out resulting image();
    testfile++;

If overlapping is detected, the challenge is then to merge the detections in terms of a new address, bounding box size and confidence level. A straightforward approach to this merging is to simply average the properties of the two detections. However, the author found that taking account of the relative confidence levels and bounding box sizes of the two detections gave improved results for the bounding box size in the presence of non-ideal detections. The pseudocode for the proposed merging algorithm is shown in Algorithm 5. This scheme also helped to lessen the influence of false positives, which typically had a lower confidence level. To be considered a valid face detection result, the proposed merging algorithm requires the number of overlapping face detections to be above a specified threshold. This is a variant of the technique used by Rowley et al., and is used to reduce false positives [84]. The value of the threshold was derived experimentally.


Algorithm 5: Simplified pseudo-code for merging overlapping face detection results

for (i=0; i<=total number of detections; i++) do
    for (j=0; j<=total number of detections; j++) do
        if (overlapping(detection[i], detection[j]) && i!=j) then
            alpha ratio = confidence[i]*(size horz[i]/(size horz[i]+size horz[j]));
            beta ratio = confidence[j]*(size horz[j]/(size horz[i]+size horz[j]));
            new block vert size = (int)((alpha ratio*size vert[i]) + (beta ratio*size vert[j]) + 0.5);
            new block horz size = (int)((alpha ratio*size horz[i]) + (beta ratio*size horz[j]) + 0.5);
            new top left vert = (top left vert[i] + top left vert[j])/2;
            new top left horz = (top left horz[i] + top left horz[j])/2;
            new confidence = (confidence[i] + confidence[j])/2;
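The overlapping() predicate in Algorithm 5 is not expanded in the listing; a conventional axis-aligned bounding box intersection test of the kind presumably intended is sketched below (the field names mirror the pseudo-code but are otherwise illustrative):

    struct Detection {
        int top_left_horz, top_left_vert;  // bounding box origin
        int size_horz, size_vert;          // bounding box dimensions
        double confidence;
    };

    // True if the two axis-aligned bounding boxes intersect.
    bool overlapping(const Detection& a, const Detection& b) {
        return a.top_left_horz < b.top_left_horz + b.size_horz &&
               b.top_left_horz < a.top_left_horz + a.size_horz &&
               a.top_left_vert < b.top_left_vert + b.size_vert &&
               b.top_left_vert < a.top_left_vert + a.size_vert;
    }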

5.2 Profiling of Normal Mode Face Detection Operation

This section attempts to establish the computational complexity hotspots in the proposed algorithm. Identifying these allows optimisation effort to be focused on the computationally critical aspects of the algorithm. The profiling is also useful in confirming the design decisions made regarding the suitability of off-loading the EANN evaluation to dedicated hardware. The algorithm was modelled using C++ and compiled using GCC 3.4 on an AMD-based server equipped with two dual-core 2.4 GHz processors (though only a single core was used). The profiling tool used was GNU gprof 2.15. In the interests of processor independence and fairness, each 20×20 pixel block was evaluated using a single phenotype rather than an ensemble of phenotypes. In this way, single and multi-core processors (which could evaluate multiple phenotypes in parallel) will generate similar profiling results. The profiling results are shown in Fig. 5.2, where it can be seen that the EANN activate function, EANN flush (i.e. resetting the EANN) and load sensors (i.e. loading the input pixel values into the EANN) account for approximately 95% of the total computational cost of the proposed algorithm. However, as highlighted during the algorithm design decision phase in Chapter 3, this functionality can be readily offloaded to dedicated hardware.

The time taken by the algorithm is principally a function of the processor architecture and processor clock frequency. To achieve independence from the processor clock frequency, the number of instructions and clock cycles required on the ARM-920T processor is quoted in Table 5.1. In the author's opinion, the ARM-920T is an appropriate choice as it is commonly found on mobile devices and is also representative of other 5-stage pipeline RISC processors found on mobile devices. The processing duration was recorded for the EANN classification functionality necessary for a single 20×20 pixel classification. This allows a fairer comparison later between comparable functionality in both software and hardware (see Section 6.4). As the EANN classification functionality was shown to consume 95% of the resources, this simplification can still be considered to give an accurate reflection of the processing duration of the algorithm on a mobile device.

[Figure: gprof profile of the proposed face detection algorithm, showing that the EANN activation, flush and sensor loading functions dominate the run time.]

Figure 5.2: Software Profiling of Proposed Face Detection Algorithm

The clock cycles for three different sized topologies are shown in Table 5.1. These topologies were extracted from the training phase (see Chapter 4). Classifying a single 20×20 pixel block required from 527,032 to 662,260 clock cycles. It should be noted that the average number of clock cycles per instruction from the data in Table 5.1 is 1.57. This is relatively high for an ARM-920T (typical performance is 1.3 clock cycles per instruction [74]). This could be attributed to pipeline stalls originating from floating point calculations in the NEAT library, due to the lack of a dedicated floating point processor. The increasing number of clock cycles can be attributed to the similarly increasing structure (i.e. hidden nodes and connections) present in the topologies. Even when using the smallest (and least accurate) topology, the evaluation time for a single 20×20 pixel evaluation when using a 120 MHz clock frequency is 527,032 × (1/120 MHz) ≈ 4.4 ms. This means only 227 locations can be examined in a second. To put this into context, examining all overlapping locations in a single QCIF image, even ignoring the lower layers of the image pyramid necessary for scale invariance, still requires (176 − 20) × (144 − 20) = 19,344 classifications. Clearly, a software only implementation of the proposed algorithm will struggle to achieve real-time performance on a mobile device. This is a common issue when attempting to implement appearance-based face detection algorithms on computationally constrained platforms. However, as already discussed earlier in this section, as well as in Chapter 3, the bottleneck constituting almost 95% of the total cost of the proposed algorithm is highly amenable to off-load to dedicated hardware. The design of such a hardware accelerator is the focus of Chapter 6.
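The real-time estimate above can be reproduced with a few lines of arithmetic (the constants are taken from the text and Table 5.1; as in the text, the per-frame window count ignores the lower pyramid levels):

    #include <cstdio>

    int main() {
        const double clock_hz = 120e6;    // assumed ARM-920T clock frequency
        const double cycles = 527032.0;   // smallest topology, Table 5.1

        const double ms_per_window = cycles / clock_hz * 1e3;  // ~4.4 ms
        const double windows_per_s = clock_hz / cycles;        // ~227
        const int qcif_windows = (176 - 20) * (144 - 20);      // 19,344

        std::printf("%.2f ms/window, %.0f windows/s, %d windows per frame\n",
                    ms_per_window, windows_per_s, qcif_windows);
        return 0;
    }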


Table 5.1: Clock Cycles Required to Classify a Single 20×20 Pixel Block

Topology         A         B         C
Hidden Neurons   12        15        17
Connections      341       374       415
Instructions     332,639   374,871   418,655
Clock Cycles     527,032   593,914   662,260

5.2.1 Software Power & Energy Consumption

Having established that the ANN evaluation is the most computationally demanding aspect of the proposed algorithm, the power consumption of this functionality when implemented in software was calculated. This allows direct comparison of the energy efficiency of software and hardware implementations (this comparison is detailed in Section 6.4). The power consumed by a software program can be principally attributed to the logic switching in the host processor. In this research, the current drawn by an ARM920T processor was recorded whilst it was running the software program, using the methodology proposed in [171][37]. In this way, instantaneous power can be easily calculated, assuming the voltage to the processor is fixed (e.g. the development board used in this research used a fixed voltage of 2.5 volts). To generate meaningful power consumption figures, which allow a fair comparison between a hardware and software implementation, the issue of non-optimised code clearly needs to be addressed. The NEAT library makes heavy use of C++ standard template library functionality, which is only partially supported by the development tools for an OS-less implementation on the chosen platform. As a result, power profiling of the primitive functions (multiply-accumulate and the non-linear activation function) within an ANN was considered to give a fairer reflection of the software power consumption. Furthermore, as can be seen in Table 5.1, the more structure (i.e. neurons and connections) in the topology, the greater the number of clock cycles required to complete a single 20×20 pixel evaluation. Therefore, measuring the software power consumption of the ANN primitives can be used to predict the overall power consumption of different topology sizes.

The power consumed during ten evaluations of multiply accumulate operations is shown in Fig. 5.3. Similarly, Fig. 5.4 shows the power consumption for a sigmoid activation function. In the case of both primitives, input data representative of that found in a NEAT ANN for face detection was used for the ten evaluations. The average power for the multiply accumulate operation is approximately 41 mW, whilst 49 mW was the average power consumed for the activation function. The higher power for the activation function is as expected, due to the more complex instructions required for the e^x operation in the sigmoid function. Comparison with the power consumption of dedicated hardware is presented in Section 6.4.

[Figure: ARM920T power consumption trace for 10 multiply accumulate operations (Power [W] versus Execution Time [s]).]

Figure 5.3: Power Profiling of Multiply Accumulate Operations on an ARM-920T processor

[Figure: ARM920T power consumption trace for 10 sigmoid activation function evaluations (Power [W] versus Execution Time [s]).]

Figure 5.4: Power Profiling of a Sigmoid Activation Function on an ARM-920T processor

5.3 Software Implementation & Algorithmic Optimisations

Whilst the majority of operations associated with the proposed (normal mode) face detection algorithm are arithmetic primitives (principally multiply accumulates in the ANN ensemble), the highly repetitive nature of examining the large number of candidate face locations in the image pyramid can greatly affect real-time performance. This is particularly true for computationally constrained mobile microprocessors. The issue is further exacerbated by using an ensemble of classifiers. Although this improves the reliability of the final solution, it also causes a linear increase (proportional to the ensemble size) in the run-time on a single processor. Profiling revealed that over 90% of the computational resources required by the proposed algorithm are consumed by the ensemble of ANN classifiers. Fortunately, as was established in Chapter 1 and Chapter 2, a dedicated hardware accelerator offers a particularly attractive energy efficient solution for face detection on a mobile device, considerably reducing the significance of the outlined computational expense. As will be shown in Chapter 6, when using dedicated hardware it is possible to exploit the inherent parallelism in each individual ANN classifier to accelerate the classification process in an energy efficient manner. In addition, as there are no interdependencies between the ANNs in the ensemble, a parallel implementation allows each ANN to operate concurrently, further reducing the run time, albeit at a cost of increased silicon area. Furthermore, input pixels can be reused, thus reducing accesses to memory, which is a considerable source of power consumption (see Chapter 2). Although a dedicated hardware accelerator is advocated, algorithmic flexibility is still preserved by implementing the non-speed-critical elements and the control logic of the proposed face detection algorithm (i.e. image pyramid creation, extracting 20×20 pixel candidate windows, etc.) in software. This also increases the reusability of the ANN accelerator, since it is then not hard-wired to a single algorithm.

The dedicated hardware improves the power consumption required to classify a single location

(see Chapter 6). To further improve energy efficiency it is necessary to reduce the number of clas- sification locations. The cause of the high number of possible classifications is the requirement of the proposed algorithm (in common with all typical appearance-based face detection algorithms) to classify overlapping locations across multiple resolutions. This step is necessary for scale in- variance, i.e. to be able to make correct classifications irregardless of the size of the face relative to the size of the image. However, the vast majority of overlapping classification locations will not contain a face. This is true even when considering an image with a large number of faces present

(e.g. a crowd scene). Exploiting this property at an algorithmic level reduces computational ex- pense, and by extension reduces the power consumption. The general theme in the literature to

142 5.3. SOFTWARE IMPLEMENTATION & ALGORITHMIC OPTIMISATIONS

(Figure 5.5 diagram: an initial image of skin/non-skin pixels alongside its partial skin accumulation array. Using the bottom right hand corner position of each coloured box in the initial image, the corresponding values are retrieved from the accumulation array; the number of skin pixels in the red box is then TOTAL_red = 18 - 5 - 11 + 5 = 7, i.e. one addition and two subtractions.)

Figure 5.5: Efficient calculation of block skin totals

The optimisations to the proposed algorithm, in terms of the modifications to the software control logic, are now described.

As it is reasonable to expect colour source images (e.g. from a camera), it is possible to use skin detection as a preliminary filtering stage. This is similar to the approach taken by Feraud [116]. The skin detection algorithm used is the computationally efficient chrominance thresholding scheme described in Section 3.1.1.1. If, after the skin detection process, the total number of detected skin pixels for the block is below a predetermined threshold, no further processing is carried out and processing moves to the next location. As well as a direct implementation of the skin detection pre-processing, an optimised implementation was investigated, which used a modified version of the integral image scheme of Viola and Jones adapted for skin detection [113].

In this implementation, rather than calculating the skin pixel total for all overlapping blocks at all resolutions, an intermediate array is used to store the sum of the skin pixels to the left of and above each pixel (inclusive of that pixel) at the highest resolution (i.e. the original image).

The advantage of generating the partial accumulation array is that the number of skin pixels in any subsequent block (at any resolution) can be calculated with one addition and two subtractions (see the explanation in Fig. 5.5). This compares favourably to the 100 threshold operations and up to 100 accumulations (assuming subsampled 10×10 chroma blocks) required by a direct implementation.

Therefore, with the optimised implementation, following a comparison with the skin threshold value, a candidate location can potentially be skipped in four operations. The skin detection pre-processing was further extended to dynamically alter the horizontal address step size in proportion to the number of skin pixels found at the current location. For example, if the skin pixel threshold for a 10×10 chrominance block was 50, and no skin pixels were found at the current location, it is reasonable to advance the chrominance horizontal address by five rather than just one. In the ideal scenario this reduces the computational burden associated with processing overlapping locations. An alternative approach to reducing the burden of overlapping locations is the method used by Rowley et al, whereby a 30×30 window is used for classification, but trained with off-centred (±5) 20×20 pixel face samples. The 30×30 window is then advanced 10 pixels at a time. A finer-grained search using a more accurate 20×20 pixel classifier then inspects only the more promising locations from the coarse-grained search.
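The partial skin accumulation array is the familiar summed-area (integral image) construction applied to a binary skin mask. The following C sketch illustrates the construction and the constant-time block query; the function and variable names are illustrative assumptions, not those of the thesis implementation.

#include <stdint.h>

/* Build the partial skin accumulation array (a summed-area table) over
 * a binary skin mask: acc[y*w + x] holds the number of skin pixels to
 * the left of and above (x, y), inclusive. Names and the flat array
 * layout are illustrative. */
void build_skin_accum(const uint8_t *skin, uint32_t *acc, int w, int h)
{
    for (int y = 0; y < h; y++) {
        uint32_t row_sum = 0;
        for (int x = 0; x < w; x++) {
            row_sum += skin[y * w + x];                /* skin[] is 0 or 1 */
            acc[y * w + x] = row_sum + (y > 0 ? acc[(y - 1) * w + x] : 0);
        }
    }
}

/* Skin total for the block with top-left (x0, y0) and bottom-right
 * (x1, y1), inclusive: one addition and two subtractions, as in
 * Fig. 5.5, regardless of the block size or pyramid resolution. */
uint32_t block_skin_total(const uint32_t *acc, int w,
                          int x0, int y0, int x1, int y1)
{
    uint32_t total = acc[y1 * w + x1];
    if (x0 > 0)           total -= acc[y1 * w + (x0 - 1)];
    if (y0 > 0)           total -= acc[(y0 - 1) * w + x1];
    if (x0 > 0 && y0 > 0) total += acc[(y0 - 1) * w + (x0 - 1)];
    return total;
}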

The smallest sized face that can be detected by the algorithm is 20×20 pixels. This allows very small faces to be detected, and up to approximately 238 faces to be detected in a CIF-sized image (assuming non-overlapping faces).

However, detecting faces this small, or detecting this many faces, would be atypical of the type of applications which could benefit from face detection on a mobile device. Therefore a reasonable restriction is to enforce a larger minimum detectable face size. To achieve this, rather than increase the classification window size, it is preferable to skip resolutions where the upsampled 20×20 classification window is smaller than the minimum detectable face size. This is attractive from a computational perspective, as it greatly reduces the number of locations to classify.
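The resolution-skipping rule can be made concrete with a short sketch: given the enforced minimum face size, the first pyramid level worth classifying is computed directly. The per-level scale factor is an assumed parameter, as its value is not stated here.

/* At pyramid level i the image is subsampled by scale^i, so the fixed
 * 20x20 classification window corresponds to a face of roughly
 * 20 * scale^i pixels in the original image. Levels whose effective
 * window is smaller than the enforced minimum face size are skipped. */
int first_level_to_classify(int min_face_size, double scale)
{
    int level = 0;
    double effective_window = 20.0;    /* window size at full resolution */
    while (effective_window < (double)min_face_size) {
        effective_window *= scale;
        level++;
    }
    return level;                      /* begin classification here */
}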

To further reduce power consumption, the subsampling mechanism is altered from bicubic to nearest neighbour subsampling. Not only does this result in fewer computations, nearest neighbour subsampling does not require a local context of pixels for the generation of each subsampled pixel. This leads to a reduction in highly power-consumptive memory accesses. Finally, functions that profiling identified as suitable for further optimisation were targeted with low level optimisation techniques appropriate for a mobile processor such as an ARM.

These techniques included loop unrolling, using appropriate data types for an ARM processor (e.g. INT instead of SHORT INT) and, where possible, avoiding the use of floating point arithmetic. For example, floating point operators were replaced with integer arithmetic approximations in functions such as colour space conversion (YUV→RGB, RGB→YUV). These combined optimisations allow a software-only version of the algorithm to classify a QCIF-sized image in less than 10 seconds on a 120 MHz ARM 920T microprocessor which was concurrently running Linux [172]. Detection results and performance throughput are discussed further in Section 5.4. Other possible optimisations for reducing power are left as future work, but are discussed in Section 5.5.
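As an illustration of this integer approximation style, the following sketch performs a YUV to RGB conversion with coefficients pre-scaled by 256. The constants are the standard ITU-R BT.601 values; the exact constants of the thesis implementation are not given, so these are indicative only.

#include <stdint.h>

static inline uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Integer-only YUV to RGB conversion: the floating point coefficients
 * 1.402, 0.344, 0.714 and 1.772 are pre-scaled by 256, so the
 * conversion needs only integer multiplies, shifts and adds. */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                uint8_t *r, uint8_t *g, uint8_t *b)
{
    int d = u - 128;
    int e = v - 128;

    *r = clamp_u8(y + ((359 * e) >> 8));            /* 1.402 * 256 ~ 359 */
    *g = clamp_u8(y - ((88 * d + 183 * e) >> 8));   /* 0.344, 0.714      */
    *b = clamp_u8(y + ((454 * d) >> 8));            /* 1.772 * 256 ~ 454 */
}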

5.4 Results

To verify correct operation and to assess the quality of the proposed solution, this section benchmarks the proposed face detection system against prior related research. The proposed algorithm is compared against state of the art face detection approaches in software, thus allowing evaluation of the algorithmic performance in terms of recall accuracy and precision. Although the FERET database was used in the generation of the positive face training dataset, it is not appropriate for face detection benchmarking, as this could give a biased (i.e. excessively positive) reflection of the detection performance, due to using evaluation samples very similar to the training dataset.

Instead, alternative datasets which are also representative of the “real world” problem of face detection on a mobile device are used. Examples include the CMU/MIT face database, the Yale face database and the BioID face database [84][115][165][166]. The Yale dataset is most frequently used for face recognition benchmarking, and as such is excluded from the benchmarking process of this section. The CMU/MIT dataset is often used for face detection benchmarking; however, in the author’s opinion, the low resolution of the majority of its test images (many having a “newspaper” photograph quality) makes it unrepresentative of the medium to high resolution images/videos generated by current mobile devices equipped with cameras. For this reason the BioID dataset is used as the principal means of evaluating the performance of the proposed face detection algorithm. It should also be noted that a further benchmarking complication arises where prior related research efforts have evaluated their algorithms using custom testing datasets. As well as making fair benchmarking very challenging, there is a risk of testing on the training set and/or tweaking the algorithm for the custom evaluation dataset. This can lead to performance distortion, masking the algorithm’s ability for “real world” generalised face detection.


(a) Low Threshold → four correct detections, zero missed faces and one false positive
(b) High Threshold → three correct detections, one missed face and zero false positives
(c) Low Threshold → six correct detections, two missed faces and twelve false positives
(d) High Threshold → five correct detections, three missed faces and three false positives

Figure 5.6: Effect of changing the face detection threshold (images taken from the CMU/MIT dataset)

A further tool used in the face detection performance evaluation is the Receiver Operating Characteristic (ROC) graph. These are commonly used in the evaluation of face detection algorithms, as the performance can be made to vary with operational parameters. For example, in the proposed algorithm, by reducing the face threshold value (i.e. the EANN result for a 20×20 pixel block must be above the face threshold value for the block to be classified as a face), more faces will be identified.

However, an increasing number of non-faces will also be incorrectly classified as faces. This characteristic is demonstrated in Fig. 5.6. In addition, this property can also affect the performance of the merging algorithm. In many cases, fewer detections in the vicinity of a face will result in a more correctly centred bounding box. In contrast, a lower threshold will probably produce more correct detections in the neighbourhood of a face, but may also produce incorrect detections close by, which could distort the final result from the merging algorithm (an incorrect bounding box size or an off-centred bounding box). The optimal trade-off point is application dependent, where a cost can be associated with a false positive.


Table 5.2: Benchmarking results for the proposed algorithm against the BioID face database

BioID Evaluation Dataset
Algorithm            Recall (%)   False Detections
Ramirez & Fuentes    95.13        5240
Huang et al          90.00        n/a
Froba & Kublbeck     95.67        64
Froba & Ernst        97.75        25
Wu & Zhou            94.50        n/a
Jesorsky et al       91.80        n/a

5.4.1 Evaluation on the BioID face database

A selection of representative results generated using the proposed algorithm with the BioID face dataset is shown in Fig. 5.7. A recall rate of 93% with 12 false positives from 100 images¹ was recorded (see Fig. 5.8 for the ROC graph). Interestingly, increasing the face detection threshold reduces the number of false detections as expected; however, somewhat unexpectedly, the number of correct detections also increases. This could be explained by the fact that false positives in the vicinity of correct detections sometimes reduce the performance of the merging algorithm, in that on occasion the merged result is outside the boundaries of the face. For example, if numerous false positives that could be deemed weak true positives (containing only a small fraction of a face) happened to overlap a single true positive, the merging algorithm could easily give a final detection result that did not include the face. With fewer false positives surrounding a correct detection, the merging algorithm appears to perform better.

One disadvantage of using the BioID face dataset is that there is less prior research to compare against. For example, the AdaBoost based face detection algorithm by Viola and Jones, which is considered one of the leading approaches of recent times (see Section 3.2 for a discussion of the proposed algorithm compared to this approach), was not evaluated on the BioID dataset. Therefore, for completeness, the author used the OpenCV/Blepo implementation of the AdaBoost algorithm to generate BioID results appropriate for comparison [173]. A secondary benefit of doing this was that it gave a controlled testing environment in which to compare the face detection results. It was found that Viola and Jones’ algorithm detected 95% of the faces with no false positives. The proposed algorithm could be considered to give comparable performance (i.e. 93% detection with 12 false positives) to Viola and Jones’ algorithm. Moreover, as will be discussed shortly, the proposed scheme is also competitive with other face detection algorithms that used the BioID dataset (see Table 5.2).

¹Only the first 100 images of the 1200+ images in the BioID dataset were used, due to run-time performance issues.



Figure 5.7: Representative results from the BioID face evaluation dataset


(Figure 5.8 plot: percentage recall rises from 87% to 93% as the number of false detections increases from 10 to 40.)

Figure 5.8: ROC graph generated by the proposed algorithm when using the BioID dataset


Whilst the BioID dataset is used less frequently than some other face datasets, there is still a substantial number of research publications in which it is used. A brief overview of a representative sample of these publications will now be given. Ramirez and Fuentes used a two stage approach, in which the first stage uses rules regarding the position and contrast of the eye regions along with analysis of histograms [174]. The first stage yields a binary map of promising candidates, which is then used in the second stage by a combination of classifiers including an SVM, voted perceptrons, an ANN and a naive Bayes classifier. A detection rate of 95.13% is achieved, though the false positives number in the several thousands. Huang et al use a radial symmetry based transform coupled with eye position checking [175]. This approach achieves a detection rate of 90%. Froba and Kublbeck use edge orientation features to detect faces, followed by a SNoW classifier for verification [176]. This achieves a detection rate of almost 96% with a low number of false detections. An alternative approach by Froba and Ernst, which uses a census transform, achieves a higher detection rate of 97% with a lower false positive rate. Wu and Zhou use a selective attentional mechanism [177]. This feature based approach gives a detection rate of 94.5%.

A Hausdorff distance based approach by Jesorsky et al resulted in an almost 92% detection rate [166]. Overall, it is clear that the proposed algorithm is competitive with prior research when benchmarking against the BioID face dataset.


5.4.2 Failure Analysis on the CMU/MIT face database

A comparison of the performance of the proposed algorithm against typical state of the art face detection algorithms using the CMU dataset is shown in Table 5.3, with representative results shown in Fig. 5.9. Unfortunately, it must be concluded that the current revision of the proposed algorithm does not perform as well as the leading prior research in the area. The recall rate of approximately 50% is considerably lower than that of prior related research.

In the author’s opinion, this could be attributed to a number of factors. One of the principal motivations for the proposed algorithm (as outlined in Section 3.2) was the scope to minimise computational complexity through the novel evolutionary training algorithm. Reduced computational complexity could be considered vital for mobile device operation, but there does appear to be some trade-off in the general face detection performance associated with the proposed algorithm. However, it could be argued that increased battery life at the cost of reduced face detection performance is an acceptable trade-off on a mobile device. Furthermore, it should also be noted that the CMU dataset is not typical of the type of content generated by current generation mobile devices. Many of the test images have low resolution and difficult environmental conditions (i.e. illumination variance).

The reduction in the performance of the algorithm could also be attributed to an insufficient number of training images of faces under different rotations. In addition, the results highlighted a need for a revised exploration of the modelling of illumination variance, as well as a need for a more robust merging algorithm. In particular, the merging algorithm can on occasion (e.g. in crowd scenes with faces located close together) be overly aggressive. Also worth noting is that, as expected, when the face threshold increases the number of detections decreases, but there is also a dramatic reduction in the number of false positives (see Fig. 5.10).

5.5 Future Work

The deployment of the trained EANN topologies can be improved in terms of speed (and power consumption), detection performance (i.e. recall and precision) and algorithmic robustness (i.e. allowing detection of different face orientations). These broad categories of optimisation should focus on the distinct logical boundaries within the algorithm, namely the input stage, the NEAT classifier and the search strategy used during normal mode operation.



Figure 5.9: Representative results from the CMU face evaluation dataset


Table 5.3: Benchmarking results for the proposed algorithm against the CMU face database

CMU Evaluation Dataset
Algorithm                 Recall    False Detections
Rowley et al [84]         86.2%     23
                          84.5%     10
                          77.0%     8
Feraud et al [116]        86.0%     8
                          84.0%     4
                          83.0%     3
                          81.0%     1
Viola and Jones [113]     88.4%     31
                          76.1%     10
Li and Zhang [111]        90.2%     31
                          83.6%     10
Garcia et al [83]         93.3%     197
                          90.3%     8
                          88.8%     0
Proposed Solution         49.4%     240
                          46.0%     167
                          46.0%     159
                          42.3%     160

In the input stage, the use of the integral image technique proposed by Viola and Jones would be worth investigating [113]. Compared to the image pyramid technique, the integral image may lead to improved speed.


As highlighted in Section 4.5.2, alternative frequency based input features should also be explored, so as to potentially reduce the number of EANN inputs and consequently the overall size of the trained topologies. Frequency based input features would also be inherently suitable for a cascade of increasingly complex and more robust classifier stages. For example, a small number of lower order DCT coefficients could be used in a very small and very fast topology. Such a classifier is likely to have a very high false positive rate; therefore increasingly computationally complex classifiers (using an associated increasing number of DCT coefficients) would be required to refine the result. The overall benefit of such an approach is that computational resources are directed to promising face candidate locations, whilst obvious non-face candidate locations are rejected quickly.

As highlighted in Section 5.4, the detection performance of the algorithm is a critical area which needs to be improved. This could be achieved through additional training iterations using an increased number of training samples (e.g. considerably more rotated training samples, and samples with the addition of random noise). Further bootstrapped training data could also be used. It is reasonable to expect that such additional training iterations would lead to improvements in both recall and precision.


(Figure 5.10 plot: percentage recall rises from approximately 45.5% to 49.5% as the number of false detections increases from 150 to 250.)

Figure 5.10: ROC graph generated by the proposed algorithm when using the CMU dataset

It should also be noted that, as the current ANN topologies are 50% to 75% smaller than those used by Rowley et al, there is considerable scope for additional training to generate more robust solutions which could still offer computational complexity benefits. Another obvious extension to improve the robustness of the algorithm is to include detection of in-plane rotated and side profile faces. An approach similar to that presented by Rowley et al could be used to achieve this goal [130].


A further interesting avenue of research is to apply an “active vision” type face detection mode. In such a mode, rather than raster scanning every pixel block location at multiple resolutions, as is typical of appearance-based face detection algorithms, the large search space could be explored using GA principles. Each candidate in the population would consist of a candidate pixel window location address, a scale/zooming factor (for face scale invariance purposes) and a rotation factor. The genome fitness would be a combination of the EANN response and/or factors such as skin count. As such a GA driven scheme is likely to discover inherent properties of a particular system, a reduction in the overall number of locations to inspect is quite likely. This would greatly benefit throughput and power consumption. As an example of the type of properties the GA could exploit, consider a driver of a car, who typically scans locations in their visual field; not all locations, however, have the same level of importance. Similarly, in a mobile device video conferencing application, the face is likely to be centred and within a relatively narrow range of scales.


Finally, for optimum performance on a computationally constrained mobile platform, the NEAT library should be rewritten with such a target platform in mind (although this would require substantial effort).

5.6 Summary of Contributions

This chapter firstly presented a thorough description of the functioning of the proposed face detection algorithm during normal operation (e.g. detecting faces during a video call). This description included a discussion of the design choices made in selecting the classifier ensemble and the face detection merging scheme. Profiling was then carried out to identify the most computationally complex elements of the algorithm. This identified the EANN evaluation (and the associated tasks of loading the inputs and resetting the topology) as requiring almost 95% of the total computational load of the algorithm. This computational load characteristic is as expected, since the face detection algorithm was selected based upon the potential to offload its most computationally demanding aspect to dedicated hardware (see Chapter 3). In addition to function-based profiling, run-time evaluation and power profiling results were presented for an ARM-920T microprocessor. These identified that a software implementation of the proposed algorithm on a typical mobile processor would struggle to achieve real-time operation. This provides strong motivation for the design of a dedicated hardware EANN accelerator, which is the focus of the subsequent chapter.

In addition to the dedicated hardware, the other complementary method of reducing the computational complexity of the algorithm is to reduce the number of candidate face locations classified using the EANN. This was addressed by using skin detection pre-filtering of each candidate face location. Further software optimisations appropriate for a mobile device implementation were discussed in Section 5.3.

The performance of the proposed algorithm was then compared against state of the art face detection algorithms. The algorithm performed competitively with related research on the BioID dataset, which could be considered representative of the image quality and facial pose found in the target mobile device application. Furthermore, a number of potential ways of improving the performance were highlighted and identified as future avenues of research.

CHAPTER 6

EANN Hardware Accelerator

It is vital for mobile computing platforms that fundamental enabling technologies, such as an ANN, can be deployed in a manner that meets the strict requirements of energy efficiency and real-time performance. Unfortunately, when processing high dimensional datasets, for example digital video, an ANN may not meet system requirements. In the case of digital video, poor performance is likely to be a consequence of the unavoidable combination of a requirement for extremely high throughput for real time processing and large, complex ANN topologies. As the review in Chapter 3 showed, typical ANN based face detection systems use topologies with hundreds of inputs, thousands of connections and often as many as 100 hidden neurons [84][116][83]. This in itself would not pose a problem if only a single classification result was required; however, the ANN processing becomes a major computational expense when it is repeated for overlapping pixel blocks across multiple resolutions, as is commonly the case for face detection. The computational expense is further exacerbated by requirements for an ensemble of ANN classifiers. The associated computational complexity is highly undesirable on constrained computing platforms (i.e. mobile devices) from both real time operation and low power consumption perspectives.

As discussed in Chapter 3, one method of reducing the complexity is to use a minimal topology. This is beneficial since fewer neurons and connections lead to reduced computation, which in turn means higher throughput and lower power consumption. However, finding even a non-minimal network topology and an appropriate set of weights represents a difficult multidimensional optimisation problem. Ideally, the topology should be as small as possible, but large enough to accurately fit the training data. Failure to find a suitable configuration will cause poor generalisation and/or excessive execution time. As Chapter 3 highlighted, one possible solution to this issue is to use an EANN (see Section 3.3.3 for more details), in which a genetic algorithm evolves an optimum topology and/or weights. As well as reducing the requirement for trial and error design exploration for the ANN, the approach is better equipped to avoid local minima and has scope for finding a minimal topology [131].

However, even with minimal topologies, when carrying out EANN pattern classification (after EANN training) on a mobile device, real-time performance still may not be achievable. This was demonstrated by the profiling in Chapter 5, where even after considerable software optimisations, the proposed EANN-based face detection algorithm still required in excess of 10 seconds to process a single QCIF sized frame on a 120MHz ARM 920T processor. Whilst it may be reasonable to expect a higher clock frequency on mobile devices of the future, real-time software based face detection in video sequences with such an algorithm will remain extremely challenging. Moreover, whilst in the future a faster processor may to some extent tackle the real-time performance issue, the problem of energy efficiency due to the computational expense will remain. As alluded to during the description of the proposed face detection algorithm in Chapter 5, one possible solution to these issues is to offload the computational burden from the host processor to a dedicated hardware accelerator.

The design of such a hardware accelerator is the focus of this chapter. The goal of this hardware core is to accelerate the face detection algorithm described in Chapter 5 in an energy efficient manner. Furthermore, this core should ideally be flexible enough to accommodate other ANN tasks appropriate to a mobile device. To achieve this flexibility a hardware/software ANN co-design methodology is adopted. As was also observed by Reyneri, this approach to ANN hardware design combines the scalability and flexibility of software with the speed and energy efficiency of dedicated hardware [155]. For example, by modifying the software, the same core that accelerates face detection could be reused as an accelerator for advanced AI for computer gaming [31]. A block diagram of a hardware integration framework based on an ARM microprocessor (commonly found on power constrained platforms) suitable for this example is shown in Fig. 6.1 (repeated here from Chapter 1 for the sake of clarity). In the face detection scenario, EANN based training would occur once. After training, during normal mode face detection operation (e.g. face detection during a mobile video call), the proposed algorithm will leverage the accelerated processing performance of the dedicated core. In the AI application scenario, constant GA based evolution with hardware accelerated phenotype evaluation would occur, altering the AI of computer controlled agents in response to the playing characteristics of the human player [31]. As was discussed in Chapter 3, the computational burden of the ANN evaluation is suitable for hardware offload, whilst the GA, which uses moderate computational resources, resides in software.

(Figure 6.1 diagram: two software applications, namely AI for mobile gaming via NEAT and real-time face detection in video sequences (image pyramid, 20x20 candidate block selection, oval mask, lighting correction and histogram equalisation, with the ANN classification hardware accelerated), integrated around an ARM 922T host, AHB bus infrastructure, SRAM and the virtual component artificial neural network accelerator.)

Figure 6.1: NEAT multimedia hardware accelerator

However, to fully exploit the possibilities offered by the EANN paradigm, the unique features of EANNs must be accommodated, and these have a considerable impact on the hardware implementation compared to a traditional ANN implementation. Certainly the EANN approach is attractive from a hardware perspective relative to an ANN trained via on-chip back propagation, as the GA optimisation search is less sensitive to reduced word length implementation issues. This and other implementation issues are described further in Section 6.2.

This chapter will firstly present a detailed review of ANN and EANN hardware architectures in Section 6.1. As hardware implementations of EANNs are limited, the focus of the hardware review will instead be on ANNs, under the assumption that EANN training can take place in-the-loop. A detailed description of the proposed hardware architecture is given in Section 6.2.


A comparison against prior related research, in terms of synthesis results and, where possible, power consumption, is carried out in Section 6.4.

6.1 State of the Art Hardware EANN Review

There is a considerable body of research in analogue, digital and hybrid hardware ANN implementations [178][179][180][181][182]. Analogue implementations typically have the benefit of high speed operation, a smaller silicon footprint and lower power consumption compared to a digital implementation. However, an analogue design has a number of drawbacks, including susceptibility to component process variation, electrical noise and environmental conditions, along with resolution issues for the weights and activation function. A digital hardware implementation, on the other hand, does not suffer from these drawbacks, and furthermore allows ease of design and computational accuracy. For these reasons a digital design approach is adopted in this research.

A digital ANN implementation typically stores the network topology and/or weights in memory. These values are then retrieved and processed in discrete chunks by parallel processing elements (PEs). This is advantageous because any network size, and potentially any topology, can be handled. A PE usually consists of multiply accumulate circuitry [183][184][185] and sometimes, depending on the architecture, an activation function generator. The number of processing elements implemented is a design space trade off between area, power and performance. This design space extends from a single PE to a PE for each neuron, or even a PE for each connection calculation. A single PE could be compared to a conventional ANN software program operating on a single desktop or mobile processor. However, such an implementation is likely to suffer from throughput issues with high dimensional ANN data tasks. On the other hand, using one PE per neuron or connection will give excellent throughput, but at the cost of prohibitive area and power consumption. A discussion of PE implementation challenges is given in Section 6.2.

One of the principal challenges for ANN acceleration using time shared PEs is how to get the ANN input data from slow, bandwidth limited memories to the high speed PEs in a fast, bandwidth efficient manner which maximally exploits the throughput of the PEs. Systolic array architectures have been the pervasive choice for bridging this gap [186][187][188][189][184][190][191][192][193][185]. It is well established that systolic arrays offer many benefits for ANNs with regard to using memory bandwidth effectively by maximising data reuse, in addition to permitting highly regular PE control logic [194]. However, as will be discussed in Section 6.2, topologies with sparse connections considerably reduce the efficiency of systolic array approaches, in addition to increasing memory requirements. Sparse topologies are attractive from a low power consumption perspective, as obviously fewer connection computations are required. Despite this, the literature contains very few hardware implementations which attempt to exploit topology sparseness [195]; it focuses instead on densely connected neural algorithms [196].

6.1.1 Efficient Multiply Accumulation

In a system where the ANN weights are static, canonic signed digit representation and multiplier-less distributed arithmetic architectures can be employed in the generation of the weighted sum [197]. However, static weights are clearly not suitable for EANN-based training. Even if EANN training takes place offline, such an implementation of the resultant fittest phenotype would greatly restrict reusability, since the weights are optimised for a single application. Fortunately, owing to the prevalence of the sum of products operation in digital signal processing algorithms and matrix arithmetic, there is considerable research on efficient hardware implementations of this arithmetic primitive [198][199][200][201][202]. The approach chosen in this work is a fused multiply add architecture [76]. This will be described further in Section 6.2.

6.1.2 Activation Function Generator

Given its desirable non-linear characteristics and ease of differentiability, a sigmoid based activation function, such as that defined in Eqn. 6.1, is commonly used in ANNs [158]. However, a direct hardware implementation is not practical, as it requires excessive logic and results in significant power consumption.

y(x) = \frac{1}{1 + e^{-x}}    (6.1)

Consequently, a number of approximations amenable to hardware implementation have been developed. For ease of implementation, LUT based schemes are frequently used; however, the approach is costly in terms of area and power relative to the approximation precision achieved. Alternative approaches typically fall into the following broad categories: CORDIC, piecewise linear approximations [203][204][205][206][207][208], piecewise second order approximations [206] and combinatorial input/output mappings [209]. Furthermore, there is considerable variance within each category. For example, an A-Law companding technique is used in [203], a sum of steps approximation is used in [204], a multiplier-less piecewise approximation is presented in [205] and a recursive piecewise multiplier-less approximation is presented in [208]. An elementary function generator capable of multiple activation functions, using first and second order polynomial approximations, is detailed in [206]. Recently, a combinatorial approach has been suggested that considerably reduces the approximation error [209]. The approach chosen in this research, described in detail in Section 6.2, uses a first order minimax polynomial approximation. The use of a minimax polynomial has been suggested before in the context of a floating point activation function approximation [207]; however, the proposed approach further minimises the maximum error and implements an area and power efficient architecture. As will be seen from the benchmarks in Section 6.4, this approach produces the lowest approximation error, whilst being suitable for the implementation of multiple activation functions.

6.1.3 Number System and Precision Requirement Considerations

A fundamental design decision for any digital hardware architecture, particularly in the absence of a standard, is the choice of a number system and the associated word length. These have important consequences for the achievable range, precision and resolution of the final system. Furthermore, the word length impacts the critical path of the digital design and the power consumption. Low complexity, reduced word length ANN approaches have the benefit of allowing reduced area, but this typically comes at the expense of output performance and reusability. The consequence of an excessively reduced word length is that the ANN outputs can be completely different from those of a similar topology with a longer word length [210]. It is generally accepted that, as the resolution and precision of the inputs, weights and activation function are reduced, so too is the ability of the ANN to act as a universal approximator [211][212]. On the other hand, a long word length format, such as 32-bit IEEE single precision floating point, may be appropriate for representing the dynamic range of possible weight values; however, not only is it excessive relative to tolerable system performance [212], it also results in additional logic switching and area requirements. Related to word length selection, input/output standardising (sometimes called scaling) is advised in most cases [167]. Standardising the inputs results in a smaller integer range.

Wust et al used a 5-bit reduced floating point format for a number of ANN applications [213]. This format used one sign bit, one mantissa bit and three bits for the exponent, and was coupled with a hybrid floating point / fixed point arithmetic core. However, they acknowledged that more bits would be necessary to support the higher precision required for on-chip training. Holt et al showed that an ANN trained via back propagation required 16 bits (configured as one sign bit, 3 integer bits and 12 bits of fractional precision) for training and forward evaluation on difficult convergence problems [212]. Floating point and fixed point hardware ANN implementations were investigated from an area and throughput perspective by Moussa et al [182]. They concluded that the fixed point implementation was 12 times faster and 13 times smaller than the equivalent floating point implementation. However, it is debatable whether this was a fair comparison, as they used a 16-bit fixed point format and compared it to a 32-bit IEEE compliant single precision floating point format. It is no great surprise that the larger word length format required more area and had a lower maximum frequency. A more enlightening comparison of floating point and fixed point implementations was recently carried out by Savich [210]. This work explored 25 different word length sized fixed point and floating point representations. It found that for comparable fixed point and floating point formats (i.e. comparable in the sense that the two formats gave similar precision), a fixed point representation typically required half the area of the floating point format, achieved a higher maximum clock frequency and had a shorter latency. Based upon these results, it is reasonable to conclude that the energy consumed in a datapath dominated ANN architecture will also be smaller for a fixed point representation than for a floating point format of comparable precision.

A stochastic representation is a further alternative number system sometimes used for hardware ANNs. Using this system, the value of a signal in a digital ANN implementation is encoded as the probability of a given bit in a stochastic pulse stream being a logic one [214]. This has the benefit that many common arithmetic operations require very simple logic [215]. Whilst beneficial from an area perspective, there are issues concerning representation variance. Furthermore, due to the reduced throughput of the inherently serial operation, a higher clock frequency is required to match the throughput of a more parallel fixed point implementation. The substantially increased clock frequency is of considerable concern from a power consumption perspective, since the clock tree network can account for a considerable proportion of the total power consumption, particularly in deep sub-micron technology processes.

From the observation of prior research, 16 bits in a 1.3.12 fixed point representation (one sign bit, three integer bits and twelve fractional bits) are used throughout the proposed design, with the input data standardised to a range of -1 to 1. The additional integer bits allow the EANN accumulation values to grow to levels which maximally exploit the resolution achievable from the proposed activation function generator.
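To make the 1.3.12 format concrete, the sketch below shows how such values might be manipulated in software. The helper names and the choice to saturate on overflow are illustrative assumptions rather than details taken from the design.

#include <stdint.h>

/* 1.3.12 fixed point: 1 sign bit, 3 integer bits, 12 fractional bits in
 * a signed 16-bit word, giving a range of [-8, 8) at a resolution of
 * 2^-12. */
typedef int16_t q3_12_t;
#define Q_FRAC_BITS 12

static q3_12_t q_from_double(double v)        /* encode, e.g. a weight */
{
    return (q3_12_t)(v * (1 << Q_FRAC_BITS));
}

static q3_12_t q_saturate(int32_t v)          /* clamp to the 16-bit range */
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (q3_12_t)v;
}

/* Weighted sum of neuron inputs: each 32-bit product carries 24
 * fractional bits, so it is shifted right by 12 to renormalise, and
 * the accumulation is kept in 32 bits so that partial sums can grow
 * beyond the 16-bit range, as described above. */
static q3_12_t weighted_sum(const q3_12_t *w, const q3_12_t *x, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += ((int32_t)w[i] * (int32_t)x[i]) >> Q_FRAC_BITS;
    return q_saturate(acc);
}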


(a) Forward connections only
(b) Mixed connections: a recurrent connection exists between node 3 and node 2
(c) Mixed connections: a looped recurrent connection exists at node 3

Figure 6.2: Neuron connection types

6.2 Proposed Hardware Architecture

Unlike a feed forward ANN, an evolved network can have any topology, potentially with a mixture of forward connections (see Fig. 6.2(a)), recurrent connections (see Fig. 6.2(b)) and looped recurrent connections (see Fig. 6.2(c)). Furthermore, depending on the application and evolutionary process, the neuron connections could vary from fully connected to sparsely connected. Owing to the complexification process (see Section 3.3.4), NEAT will naturally favour sparse topologies. These factors have important consequences for the hardware architecture. The efficiency of systolic array architectures is dramatically reduced when operating on sparse neural topologies, because the data flow through the systolic array is frequently interrupted and thus the throughput benefit of multiple PEs is not achieved. To overcome this, a PE can either be disabled for that weighted sum calculation or be given a weight of zero. However, both solutions are undesirable. Disabling the PE leads to additional control logic for each individual PE, whilst storing zero-valued weights leads to increased memory sizes, which further exacerbates power consumption issues, particularly as memory power consumption disproportionately increases with size. Systolic array efficiency is further reduced by the presence of recurrent connections, which cause unpredictable feedback from other neurons and thus further data flow interruptions. More importantly, as systolic array architectures favour a layered ANN, where the inputs to each PE layer are well defined, it is not a trivial matter to dynamically reconfigure the PE inputs for the evaluation of alternative topologies which contain recurrent links. This would be necessary for an application where NEAT is evolving in real-time, such as artificial intelligence for computer gaming [31]. Whilst it is possible to modify the genetic algorithm so that only forward connections are added, this compromises the quality of the evolved solution.

These factors have provided motivation for the research in this thesis to explore alternative architectures rather than a systolic array approach.


(Figure 6.3 diagram: an example phenotype with inputs 1 to 8, hidden neurons 11 to 13 and outputs 9 and 10, mapped to two memories. The MAIN LINK SRAM holds (TO, FROM, WEIGHT) entries such as (9, 2, W9,2) through (10, 7, W10,7); the MAIN NEURON SRAM holds (ID, TYPE, OUTPUT) entries such as (1, sensor, in1) through (6, sensor, in6).)

Figure 6.3: NEAT phenotype to hardware memory mapping

Examining the NEAT phenotype data structures, it is clear that the essential information in the link and neuron genes can be mapped easily to hardware memories, as is demonstrated by the simple phenotype in Fig. 6.3. Essentially, the proposed approach “parses” through the LINK memory to retrieve the relevant weights, using the “LINK→FROM” field as the index to retrieve the output of the appropriate neuron in the NEURON memory. Once the values are retrieved, the multiply accumulate operation is performed.

This operation is repeated for all connections associated with that neuron, before the activation function is calculated. The process then repeats for all neurons in the topology. This approach is analogous to how a given topology would be evaluated on a single-core processor in software.
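This single-processor analogue can be sketched in C as follows. The struct layouts, the placeholder activation() helper and the assumption that links are sorted by destination neuron are illustrative, not the actual data structures of the implementation.

#include <stdint.h>

/* In-memory form of a NEAT phenotype, mirroring Fig. 6.3: link entries
 * hold (TO, FROM, WEIGHT) and neuron entries hold an output slot, all
 * in Q3.12 fixed point. */
typedef struct { int to; int from; int16_t weight; } link_t;
typedef struct { int16_t output; } neuron_t;

/* Placeholder activation: the real design uses the minimax sigmoid
 * approximation of Section 6.3.1; here the sum is simply clamped so
 * the sketch is self-contained. */
static int16_t activation(int32_t s)
{
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Evaluate one topology: for each neuron, walk its incoming links, use
 * the FROM field to index the source neuron's output, and multiply-
 * accumulate. Assumes links[] is sorted by destination, neurons appear
 * in evaluation order, and sensor neurons (which have no incoming
 * links) have their outputs preloaded. */
static void evaluate_topology(neuron_t *neurons, int n_neurons,
                              const link_t *links, int n_links)
{
    int l = 0;
    for (int n = 0; n < n_neurons; n++) {
        int32_t acc = 0;
        int has_links = 0;
        while (l < n_links && links[l].to == n) {
            acc += (int32_t)links[l].weight *
                   (int32_t)neurons[links[l].from].output;
            l++;
            has_links = 1;
        }
        if (has_links)                            /* skip sensor neurons */
            neurons[n].output = activation(acc >> 12);  /* Q3.12 renorm */
    }
}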

However, the proposed hardware architecture exploits the inherent parallelism that exists in the topology to increase throughput. To achieve this goal, two PEs operating in parallel are used, as can be seen from the proposed architecture in Fig. 6.4. The PE datapath is 64 bits and each PE has access to the memory (via a ). Two PEs were chosen so as to give a good trade off between throughput and bus addressing complexity. Each PE is equipped with a local

“LINK” SRAM. This is loaded with the incoming connections for that neuron. To reduce stalling, the connections for neuron N + 1 are prefetched whilst, in parallel, the PE processes neuron N.


(Figure 6.4 diagram: eight NEURON SRAM banks holding (ID, TYPE, OUTPUT) entries, fronted by a direct-mapped data cache and a neuron memory controller, with 64-bit datapaths connecting two neuron PEs to the main NEURON and LINK memories.)

Figure 6.4: Simplified block diagram of the proposed EANN hardware accelerator datapath

The PE datapath is 64 bits wide because four 16-bit entries from the local LINK SRAM are retrieved in a burst and processed in parallel in each PE. This parallel processing is possible since the “LINK→WEIGHT” values are processed sequentially and have no interdependencies. The decision to use four parallel arithmetic units per PE represents a trade off between throughput, bus width, and the complexity of retrieving multiple data units per clock cycle from the hierarchical memory system.

Unless a fully connected topology is being evaluated, the four “NEURON→ID” addresses decoded from “LINK→FROM” will not necessarily be contiguous. Therefore, if a single “NEURON” SRAM is used (such as in Fig. 6.3), only one “NEURON” address could be processed per clock cycle. However, to maximise the performance of the multiple arithmetic units in the PE datapath, valid input data needs to be available on each clock cycle. To overcome this issue, repartitioning the neuron memory into eight smaller SRAMs is proposed. This memory partitioning can be seen in Fig. 6.4. Eight SRAMs have been chosen as a consequence of selecting two PEs which each request four data values in parallel. To further increase the probability that all data units can be retrieved within one clock cycle, thus maximising the parallel processing potential, a data cache is used to maximise data reuse. The cache provides backup should two or more data requests occur for the same “NEURON” SRAM bank. This could occur depending on the connectivity of the topology, in particular if the sparse synapses were located modulo eight apart.


When the memory control logic receives a request for eight new values from the two PEs, the cache is firstly examined to see if it can fulfil these requests. Should cache misses occur, the “NEURON→ID” is decoded to indicate the relevant SRAM bank. If two or more requests attempt to access the same SRAM bank, the data must be retrieved over multiple clock cycles; otherwise all requests can be handled in one clock cycle. In the worst case scenario, when none of the data is present in the cache and all requested data is located in the same “NEURON” SRAM bank, one multiply accumulate operation occurs per clock cycle. On the other hand, if all the requested data is either in the cache or in different NEURON SRAM banks, eight multiply accumulates occur per clock cycle. For a fully connected feed forward ANN, the throughput of the proposed solution will be identical to that of a similarly sized systolic array (assuming the pipeline of the systolic array is full), although the memory bandwidth used by the systolic array will be smaller. As discussed earlier in this section, the throughput of the proposed solution will outperform a systolic array architecture when processing a mixed forward and recurrent sparse connection topology.
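The cycle cost of servicing one such batch of requests can be sketched behaviourally as follows; the cache_hit() helper and the request layout are hypothetical, and the real control logic is of course implemented in hardware rather than C.

#define N_BANKS 8   /* eight NEURON SRAM banks, as in Fig. 6.4 */

/* Hypothetical cache lookup: returns non-zero when the neuron output
 * is already held in the direct-mapped data cache. Stubbed out here
 * so the sketch is self-contained. */
static int cache_hit(int neuron_id)
{
    (void)neuron_id;
    return 0;
}

/* Schedule the eight neuron-output fetches issued by the two PEs in
 * one batch. Requests hitting the cache, or falling in distinct banks
 * (NEURON ID modulo 8), are serviced together; requests that collide
 * on a bank are serialised over extra cycles. Returns the clock
 * cycles needed for the batch. */
static int cycles_for_fetch(const int neuron_id[8])
{
    int pending[N_BANKS] = {0};
    int worst = 1;                      /* at least one cycle per batch */

    for (int i = 0; i < 8; i++) {
        if (cache_hit(neuron_id[i]))
            continue;                   /* served by the data cache */
        int bank = neuron_id[i] % N_BANKS;
        pending[bank]++;
        if (pending[bank] > worst)
            worst = pending[bank];      /* serialised bank accesses */
    }
    return worst;
}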

6.3 Implementation of Proposed Architecture

This section describes the implementation details of the proposed architecture. As shown in Fig. 6.4, the proposed architecture can be considered to consist of multiple PEs, control logic and a memory subsystem. The function of the PE is to generate the weighted sum of inputs to the neuron. The author has also chosen to add the activation function generator to each PE (see Fig. 6.5). An alternative approach is to time share a single activation function generator between multiple PEs. This is typically the approach adopted when using a systolic array architecture, due to the highly regular processing. If such a time-sharing scheme were used in the proposed architecture, control logic would have to be designed to ensure only a single PE has control of the activation function generator in any clock cycle. However, as there are only two PEs and the area overhead of the activation generator is not considerable (< 2,000 gates), a design decision was made to integrate the activation function logic into each PE.

The following subsections describe the implementation of the activation function generator, the PE control logic and the cache structure. It should be noted that the memory banks described in Section 6.2 are not elaborated upon further, principally due to the direct implementation that was used. Extra emphasis is placed upon the description of the activation function generator. In part this is because the implementation of the chosen activation function approximation scheme relies heavily on a multiply accumulate operation; this multiply accumulate sub-block can therefore be reused (albeit with slight modifications) to generate the weighted sum of inputs to the neuron.


(Figure 6.5 diagram: a local LINK SRAM and control logic feed the processing element datapath, comprising modified Booth radix-4 recoding, partial product generation, Wallace tree summation and accumulation, and the activation function generator.)

Figure 6.5: Simplified block diagram of the Neuron PE

6.3.1 Activation Function Generator

A review of common function approximation schemes, including LUTs, CORDIC, approximating polynomials and splines, was given in Section 6.1.2. With the goal of achieving a good trade off in terms of area, approximation accuracy and ease of supporting multiple activation functions (desirable in an EANN if the selection of the activation function is under the control of the evolutionary process), a polynomial spline based approach was chosen by the author for further investigation.

Polynomial approximating functions, the best known of which is the Taylor series, can be used to represent any arbitrary continuous function. Whilst the error in a Taylor series is very small at the expansion point, it rapidly increases towards the boundaries of the interval (see Fig. 6.6). Therefore, when using a finite order polynomial, a more evenly distributed error is often preferable. Consequently, other approximating polynomials, such as least squares, have been employed [206]. This thesis proposes using a minimax approximation polynomial for the generation of the activation function output [216]. A minimax polynomial exists for every approximation and has the characteristic of minimising the maximum error, by evenly distributing the error across the entire approximation range. To reduce the order of the approximating polynomial, the input domain of the function can be sub-divided into smaller intervals. This allows a polynomial of much lower order to approximate each of the sub-intervals. The resulting composite function is known as a piecewise polynomial or spline. Using a spline-based activation function approximation offers the benefit that multiple activation functions can be accommodated with ease, by merely changing the coefficients of the approximating polynomial. This also means that minimal extra hardware is required to support the additional functions.


(Figure 6.6 plot: approximation error across three segments for a Taylor series spline versus a minimax spline.)

Figure 6.6: Error of Minimax Spline versus a Taylor Series Spline

Using a Remez exchange algorithm within Matlab, appropriate coefficients were found to generate minimax polynomials on discrete intervals for each activation function [217]. The Remez exchange algorithm finds these coefficients by iteratively solving sets of linear equations. A first order minimax polynomial was used, as this has the benefit of avoiding higher order x^n operations. The disadvantage of using a lower order polynomial is that a greater number of splines is required to achieve the same approximation accuracy as a higher order polynomial. An increased number of splines results in a need for additional storage of the coefficients associated with each spline.

Fortunately, in the intended application within this research, this property is not an issue, as the minimal error representable with 10 bits of precision (see Section 6.1.3) is reached when using only a small number of first order minimax approximating polynomials. To further reduce the number of polynomials required to achieve a specified accuracy, the common approach of range reduction is leveraged. Range reduction exploits inherent function redundancies, such as symmetry and periodic behaviour, to allow fewer polynomial segments to represent the function.


(Figure 6.7 flowchart: start with an initial population of randomly located segments; evaluate fitness, select parents and create a new population by mating and/or mutation until the stopping criterion is reached, whereupon the coefficient and segment location values of the finished solution are used in the hardware implementation.)

Figure 6.7: Genetic algorithm used in optimising location of spline knot points

The placement of the approximating polynomials on the input range clearly has a major bearing on the overall approximation error. The simplest approach is to distribute the polynomials evenly over the approximating range, although astute placement can reduce the approximating error. The challenge of placing the polynomials in non-evenly distributed locations is that the potential search space is large. For example, when using 5 polynomials on a 0 to 8 range with a precision of 10^{-3}, there are of the order of 10^{17} possible combinations for the locations of the polynomials. Due to this large search space, a GA was used to find the optimum locations of the approximating polynomials. Unlike an exhaustive search, this solution is scalable even if the input range becomes extremely large, for example if using a double precision floating point representation.

The GA was implemented using the Genetic Algorithm Optimisation Toolbox (GAOT) for Matlab

[217]. The fitness function firstly uses the Remez exchange algorithm to generate the minimax polynomial coefficients for each candidate in the seed population. Using these coefficients, the minimax spline approximation for the chosen activation function (e.g. sigmoid) is constructed. The mean and maximum errors are then calculated from this approximation (using at least 10^{6} samples) and a proportionate fitness is assigned to each candidate solution in the population. The GA selects suitable candidates for further evolution and generates a new population from these using crossover and/or mutation.

As shown in Fig. 6.7, the process repeats until a stopping condition (e.g. a target maximum and/or mean error) is reached. As is the norm, the GA needed extensive tuning through trial and error exploration of the different input parameters. Initial population sizes of 10 to 1,000 were considered, along with extensive investigation into different crossover and mutation functions. The GA achieved best results using arithmetic crossover and a multi-point non-uniform mutation with 5 mutation points. The GA typically improved the approximation error by 30% to 60% relative to an even distribution of the approximating polynomials. A full comparison of the proposed approximation scheme against prior art is given in Section 6.4.

The outputs of the GA optimisation phase are a pair of A and B coefficients for each linear spline-based polynomial, as well as the optimal location of each spline on the input range. This fixed information is then used in the hardware implementation. A simplified block diagram of an associated datapath to calculate the activation function approximation can be seen in Fig. 6.8(a).

The range reduction and range reconstruction blocks for a sigmoid activation function have a trivial implementation, because only symmetry around the Y-axis can be exploited. Nevertheless, range reduction allows the coefficient storage space to be halved. Figure 6.8(b) shows a direct implementation of the linear polynomial (AX + B) calculation, in which the value of the input, X, is used as an index into a LUT storing the A and B coefficients.
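A behavioural sketch of this datapath is given below, assuming hypothetical parameters: the knot locations and (A, B) pairs shown are placeholders, not the optimised values produced by the GA stage, and the range reduction uses the sigmoid identity sigmoid(-x) = 1 - sigmoid(x) noted above.

import bisect

# Placeholder GA outputs for sigmoid on [0, 8): segment boundaries
# (knots) and first order (A, B) coefficient pairs per segment.
KNOTS = [0.0, 1.0, 2.5, 4.5, 8.0]
COEFFS = [(0.23, 0.50), (0.15, 0.58), (0.05, 0.83), (0.01, 0.97)]

def sigmoid_approx(x):
    # Range-reduce, index the coefficient LUT, evaluate AX + B, reconstruct.
    neg = x < 0
    x = min(abs(x), KNOTS[-1] - 1e-9)            # fold input onto [0, 8)
    seg = bisect.bisect_right(KNOTS, x) - 1      # LUT index from input X
    A, B = COEFFS[seg]
    y = A * x + B                                # linear spline segment
    return 1.0 - y if neg else y                 # sigmoid(-x) = 1 - sigmoid(x)

Only the non-negative half of the input range needs stored coefficients, which is exactly why range reduction halves the LUT storage.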

A direct implementation of the polynomial calculation uses a separate multiplier and adder. However, as there is redundancy in the multiply accumulate operation, the two operations can be combined to reduce area and switching. This architecture is commonly known as a fused multiply add and can be seen in Fig. 6.8(c) [76]. The number of partial products has been halved by using modified Booth radix-4 coding. The addition of the partial products is calculated using (7;2) column compressors (see Fig. 6.9(a)) [76][218]. As the carry out of the (7;2) compressor does not depend on the carry in, the critical path is improved and a higher clock frequency is supported (see Fig. 6.9(b)). Furthermore, the generation of the two's complement for the modified Booth algorithm uses a simple inversion for the one's complement and delays adding the additional one until the partial product accumulation, thereby reducing the critical path. The resultant carry and save are added to the B coefficient using a (3:2) counter. The final sum and carry are then added using an efficient carry propagate adder. Employing these techniques provides the proposed activation function architecture (see Fig. 6.8(c)) with a good trade off between speed, area and power consumption. The architecture is easily modified to include additional pipeline stages should a higher throughput be required. Power reduction can be further tackled at the gate level by employing clock gating and using tristate buffers on the inputs; these two techniques help minimise power when the block is not active.


Figure 6.8: Implementation of the activation function generator: (a) simplified datapath of the activation function generator; (b) polynomial evaluation; (c) proposed hardware architecture for the activation function generator


Figure 6.9: Elements of the efficient multiply accumulate hardware: (a) partial product summing tree (six partial product rows, with a delayed +1 operation for the two's complement and an adjustment to avoid sign extension); (b) (7;2) column compressor built from (3:2) counters


6.3.1.1 Remaining Neuron PE logic

The weighted sum of inputs to each neuron is implemented using a similar architecture to that used for the multiply accumulate in the activation function generator (see Fig. 6.8(c)). Saturated arithmetic is used to prevent the arithmetic from wrapping around and giving incorrect answers should input values produce a result outside the bounds representable with the chosen wordlength (a sketch of this saturating behaviour follows Algorithm 6 below). Each PE has ≈3KB of local SRAM to store "LINK→FROM/WEIGHT" data, providing enough storage for the details of over 1000 neuron connections. Obviously the amount of memory can be adjusted based upon the timing constraints of the main memory and the connectivity characteristics of a particular application. The PE control logic, which governs access to the local SRAM and drives the control signals for the PE datapath, is outlined in Algorithm 6 below.

Algorithm 6: Neuron PE datapath control flow

1. MEM SETUP stage: load the PE SRAM with the "LINK→FROM/WEIGHT" data associated with the first "LINK→TO" neuron. Regular processing starts once loaded. In parallel with processing, prefetch the "LINK→FROM/WEIGHT" data for the next neuron and load it into the PE SRAM.
2. LINK DECODE stage: the PE requests 4 "LINK→FROM/WEIGHT" entries from local SRAM.
3. INPUT FETCH stage: using the 4 retrieved "FROM" addresses, the PE requests these values from the "NEURON" SRAM memory banks. The "WEIGHT" values are set up on the inputs of the partial product generation logic.
4. PP GEN stage: with the input values returned from the "NEURON" SRAM banks, the partial product generation logic is enabled.
5. ACCUM stage: the partial products and the previous accumulation value are added in the Wallace tree compressor.
6. ACT FN stage: if necessary, the activation function generator is enabled.
7. WRITE BACK stage: if the activation function was enabled, the output is written to memory during this clock cycle.
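As a minimal sketch of the saturated arithmetic mentioned above, the following models a single multiply accumulate step that clamps rather than wraps; the 24-bit accumulator width is an illustrative assumption, not the wordlength used in the actual datapath.

def saturating_mac(acc, x, w, acc_bits=24):
    # One signed fixed-point multiply accumulate step with saturation.
    hi = (1 << (acc_bits - 1)) - 1    # largest representable value
    lo = -(1 << (acc_bits - 1))       # smallest representable value
    acc += x * w
    return max(lo, min(hi, acc))      # clamp instead of wrapping around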

6.4 Results

This section benchmarks the performance of the proposed activation function approximation scheme and discusses synthesis results and power consumption estimates for the overall EANN architecture. The architecture implementation results are principally compared against prior ANN hardware implementations. It should be noted that a fair comparison is challenging, as no other approach from the outset attempts to accelerate the ANN phenotype evaluation within EANNs and exploit the associated sparse topologies. In as much as possible, the benchmarking metrics are normalised to attempt to account for this issue and for other implementation features of each benchmarked architecture (e.g. different maximum clock frequencies, different semiconductor technology processes). As power consumption estimates are rarely given in the hardware ANN literature, for the purposes of comparison the generated power estimates are compared against the power consumption required by a software implementation; the software power consumption estimates were generated and discussed in Section 5.2.1.


Figure 6.10: Neuron PE timing diagram

6.4.1 Activation Function Results

Due to its widespread usage, the sigmoid activation function was used to compare the proposed activation function approximation scheme against prior research in the literature. The proposed approach was evaluated using 2 to 8 approximating polynomial segments. The maximum and average error results for these tests can be seen in Table 6.1, which allows a fair comparison of approximation error between approaches that share the same range and number of segments (relevant for piecewise approximations). In the cases where a direct comparison is possible, the proposed approach clearly outperforms related research. Tommiska uses a direct input/output combinatorial mapping, so the number of segments is not applicable [209]; in this case it is fair to claim that the proposed approach outperforms Tommiska's approach on a ±8 range when using 5 or more segments.


Table 6.1: Maximum and average sigmoid approximation errors

Design                                   Range    Segments   Maximum Error   Average Error
Myers et al [203]                        [-8,8)   N/A        0.0490          0.0247
Alippi et al [204]                       [-8,8)   N/A        0.0189          0.0087
Amin et al [205]                         [-8,8)   3          0.0189          0.0059
Vassiliadis et al (First Order) [206]    [-4,4)   4          0.0180          0.0035
Vassiliadis et al (Second Order) [206]   [-4,4)   4          0.0180          0.0026
Faiedh et al [207]                       [-5,5]   5          0.0050          n/a
Basterretxea et al (q=3) [208]           [-8,8)   N/A        0.0222          0.0077
Tommiska (337) [209]                     [-8,8)   N/A        0.0039          0.0017
Tommiska (336) [209]                     [-8,8)   N/A        0.0077          0.0033
Tommiska (236) [209]                     [-4,4)   N/A        0.0077          0.0040
Tommiska (235) [209]                     [-4,4)   N/A        0.0151          0.0069
Proposed approach                        [-8,8)   2          0.0158          0.0068
Proposed approach                        [-8,8)   3          0.0078          0.0038
Proposed approach                        [-8,8)   4          0.0047          0.0024
Proposed approach                        [-8,8)   5          0.0032          0.0017
Proposed approach                        [-8,8)   6          0.0023          0.0012
Proposed approach                        [-8,8)   7          0.0017          0.0009
Proposed approach                        [-8,8)   8          0.0013          0.0009

The improvement achieved by leveraging the effective search capabilities of genetic programming is apparent from comparing the five segment evaluation of the proposed approach with the five segment evaluation of Faiedh et al [207], who employ a minimax polynomial with a floating point data representation. The GA can be considered predominantly responsible for the 36% improvement in maximum error. Table 6.2 shows maximum and average approximation errors for additional functions, including tanh and the sigmoid derivative. Tanh is another commonly used activation function, whilst the sigmoid derivative is important if on-chip training (using back propagation) is used. The proposed approach gives a better maximum error than both the first and second order approximations of Vassiliadis et al [206]. It is reasonable to conclude that the proposed activation function approximation scheme compares favourably with the state of the art. It achieves high precision and has the benefit of easily supporting multiple activation functions, should these be required during the evolutionary search process.

6.4.2 Hardware Implementation Results

The hardware design was captured in Verilog HDL and synthesised using Synopsys Physical Compiler for a 90nm TSMC ASIC library. Physical Compiler was also used to generate an automatically placed and routed implementation (using minimal physical constraints). The netlist was back annotated with delay information (an SDF file generated from Synopsys PrimeTime) and layout parasitic information (an SPEF file generated by Physical Compiler during the automatic layout flow). A gate-level simulation was then run (in the Synopsys VCS 7.2 simulator) to generate realistic gate switching. The switching information was saved in a VCD file, which along with the gate-level netlist was used as input to Synopsys PrimePower to generate a power consumption estimate using a 1.2V voltage source. The design flow adopted assumed conservative worst case operating conditions. It should be noted that memory synthesis and layout were excluded from the ASIC design flow due to the lack of appropriate memory compiler tools; however, this memory was prototyped in a Xilinx FPGA design flow.

Table 6.2: Maximum and average approximation errors for other functions

                                    Sigmoid Derivative                     Tanh
Design                          Range   Max. Error   Avg. Error     Range   Max. Error   Avg. Error
Vassiliadis et al (1st Order)   [0,8)   8.8×10^-3    2.6×10^-3      [0,8)   5.7×10^-2    5.0×10^-3
Vassiliadis et al (2nd Order)   [0,8)   4.6×10^-3    5.0×10^-4      [0,8)   1.6×10^-2    1.6×10^-3
Our approach (4 segments)       [0,8)   3.1×10^-3    1.7×10^-3      [0,8)   9.5×10^-3    2.4×10^-3

The target clock frequency chosen was 250 MHz; the motivation for this was driven by the following throughput calculations. Firstly, an assumption was made that a minimum sized face should be 24x24 pixels (meaning the top layer of the image pyramid is skipped) and that the input image/frame is CIF-sized (352x288). Inspecting all locations in the resulting image pyramid (assuming a subsampling factor of 1.2) requires 174,041 20x20 pixel block sized ANN classifications (see Eqn. 6.2). Using the assumption that real-time video performance requires 25 frames per second, this means 4,351,025 ANN classifications per second, so the time budget for a single ANN evaluation is approximately 230ns. As the ANN processing time is dominated by the multiply accumulate operation, it is reasonable to conclude that the clock frequency will be proportional to the maximum number of connections in the trained topology that must be processed within the 230ns time budget. The largest trained topology in Chapter 4 required 411 connections and the proposed architecture allows 8 multiply accumulates per clock cycle (under best case conditions). Therefore, the minimum clock period to allow real-time video performance without skipping locations (e.g. via skin detection etc.) is 230ns / (411/8) = 4.47ns. Allowing a 10% margin of safety (e.g. to permit pipeline stalling if the output of a neuron is required during the clock cycle in which it is being updated), the clock period used was 4ns, i.e. a frequency of 250 MHz. To process ensembles of NEAT based classifiers within the same time period, it is simply a matter of creating multiple instantiations of the proposed architecture.

(293 − 20) × (240 − 20) = 60,060   (layer 2 of image pyramid)
(244 − 20) × (200 − 20) = 40,320   (layer 3)
(204 − 20) × (166 − 20) = 26,864   (layer 4)
(170 − 20) × (138 − 20) = 17,700   (layer 5)
(141 − 20) × (115 − 20) = 11,495   (layer 6)
(117 − 20) × (95 − 20) = 7,275   (layer 7)
(97 − 20) × (79 − 20) = 4,543   (layer 8)
(80 − 20) × (65 − 20) = 2,700   (layer 9)
(66 − 20) × (54 − 20) = 1,564   (layer 10)
(55 − 20) × (45 − 20) = 875   (layer 11)
(45 − 20) × (37 − 20) = 425   (layer 12)
(37 − 20) × (30 − 20) = 170   (layer 13)
(30 − 20) × (25 − 20) = 50   (layer 14)
Total: 174,041 candidate face locations    (6.2)
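The budget behind the 250 MHz choice can be checked with a short calculation, reproducing the figures above from the layer dimensions quoted in Eqn. 6.2 (a sketch; the 25 fps target, 411 connections and 8 MACs per cycle are the values from the text).

dims = [(293, 240), (244, 200), (204, 166), (170, 138), (141, 115),
        (117, 95), (97, 79), (80, 65), (66, 54), (55, 45),
        (45, 37), (37, 30), (30, 25)]
win = 20
total = sum((w - win) * (h - win) for w, h in dims)   # 174,041 locations
per_sec = total * 25                                  # 25 fps real-time video
budget_ns = 1e9 / per_sec                             # ~230 ns per ANN evaluation
min_period_ns = budget_ns / (411 / 8.0)               # ~4.47 ns clock period
print(total, round(budget_ns, 1), round(min_period_ns, 2))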

Using a clock frequency of 250MHz, synthesis of the proposed architecture (excluding SRAM memories) gave a gate count of 79,607 gates. For purposes of comparison with prior art, the design was also targeted to a Xilinx Virtex-2 FPGA. Using the Xilinx ISE 8.2 design flow, the design required 1,650 slices and had a maximum clock frequency of 47MHz. It should be noted, however, that no FPGA optimisations were made to the RTL code, nor were the on-chip multiplier resources used; the FPGA slice count therefore offers considerable scope for optimisation.

Using the ASIC design flow, the power consumption was estimated at 12mW. For the reference trained topology with 411 connections and 17 neurons, processing required 57 clock cycles, resulting in 3.6nJ of energy dissipation. It was shown in Chapter 5 that a software implementation of the algorithm required 662,260 clock cycles on a 120MHz ARM920T processor. This figure included the overhead of the NEAT software library; therefore, for a fairer comparison between a mobile device software implementation and the proposed hardware architecture, the number of clock cycles required on the ARM920T processor for the individual multiply accumulates and activation functions is used instead. Using the reference topology (411 connections and 17 neurons), this results in the following best case (i.e. control overhead is ignored) software energy dissipation:

(411 × 4µs) × 41mW + (17 × 45µs) × 49mW = 104.89µJ    (6.3)

This figure cannot be compared directly with the energy dissipation of the hardware architecture due to the differing clock frequencies (120MHz for the ARM920T as opposed to 250MHz for the proposed hardware architecture). Furthermore, as different supply voltages were used in the generation of each set of results, this must also be taken into consideration. The voltage and frequency of the proposed hardware architecture are therefore adjusted theoretically to match those of the ARM920T processor, allowing a fairer comparison¹. By reducing the clock frequency to 120MHz, the power consumption will decrease by approximately a factor of two (see Chapter 2 for further details), but the processing time will also increase by a factor of two, leaving the energy unchanged. Adjusting for the voltage difference (2.5V on the ARM920T development board and 1.2V for the proposed hardware architecture), the power is increased by a factor of (2.5V/1.2V)² ≈ 4 (see Chapter 2 for further details). Finally, as both implementations used 90nm semiconductor technology processes, there is no need to compensate for differing semiconductor process sizes. Therefore the scaled energy dissipation of the proposed hardware is:

(2.5V/1.2V)² × 3.6nJ = 15.63nJ    (6.4)

This means the proposed hardware architecture is 104.89µJ / 15.63nJ = 6,711 times, or roughly three orders of magnitude, more energy efficient than a direct implementation of the proposed algorithm in software on an ARM920T processor. Whilst it is acknowledged that further optimisation of the ARM code could reduce this figure, it is reasonable to surmise that the proposed hardware architecture would remain at least two orders of magnitude more energy efficient. This order of magnitude advantage of the dedicated hardware architecture over a software implementation is consistent with that reported in similar hardware/software energy efficiency benchmarking [37]. Overall, it is reasonable to conclude that the proposed hardware architecture compares very favourably against a software implementation in terms of throughput and energy efficiency.

¹When using a lower clock frequency and a higher voltage, the proposed hardware architecture will still meet the revised timing constraints. The alternative scaling approach, theoretically decreasing the clock period and voltage of the ARM920T solution, would not be guaranteed to meet the timing constraints and as such could create an unstable system.
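The energy comparison above can be reproduced directly (a sketch; all constants are the figures quoted in the text and in Eqns 6.3 and 6.4).

# Software energy on the ARM920T (2.5 V, 120 MHz): Eqn 6.3
sw_energy_uj = (411 * 4e-6) * 41e-3 * 1e6 + (17 * 45e-6) * 49e-3 * 1e6
# = 67.40 + 37.49, i.e. ~104.89 uJ

# Hardware energy (1.2 V, 250 MHz), voltage-scaled to 2.5 V: Eqn 6.4
hw_energy_nj = 3.6
scaled_hw_nj = (2.5 / 1.2) ** 2 * hw_energy_nj        # ~15.63 nJ
ratio = (sw_energy_uj * 1e3) / scaled_hw_nj           # ~6,711x advantage
print(round(sw_energy_uj, 2), round(scaled_hw_nj, 2), round(ratio))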

6.4.3 Benchmarking of Proposed Hardware Architecture against Prior Art

Comparing the proposed hardware architecture to prior art is challenging for multiple reasons.

Firstly, the author is not aware of any other architecture which specifically attempts to accelerate the EANN phenotype calculation. However, as the evolutionary training is processed offline in software, it is reasonable to compare the proposed hardware architecture against the significant amount of prior art in the field of hardware implemented ANNs. Even when considering only the digital subset of hardware ANN research, direct unbiased comparisons are difficult. As well as the usual challenges encountered when attempting to compare hardware designs with different implementation environments and constraints (e.g. tool flow, ASIC/FPGA, different semiconductor process technology, latency, clock frequency, etc.), ANNs present further complications due to differing ANN architectures (e.g. topologies, activation functions, training algorithms) and vastly differing hardware architectural trade-offs (e.g. systolic arrays, SIMD-like datapaths, number of PEs, datapath wordlength). Furthermore, due to the diversity of applications in which ANNs are deployed, an optimum solution is likely to require considerable design trade-offs for a target application. For example, a highly parallel, high throughput ANN accelerator will be of little use on a mobile device if it consumes excessive amounts of power. Similarly, a low power, lower throughput ANN accelerator may also be of limited usefulness on a mobile device due to insufficient throughput. As previously stated, the goal of the proposed hardware architecture is to offer an energy efficient solution for (principally) semantic object image/video processing applications (in particular face detection) on mobile devices. The remainder of this section demonstrates that the design choices made result in an architecture which compares favourably with prior art and offers appropriate trade-offs for the intended mobile device deployment target.

The proposed hardware architecture is compared to the prior art in terms of throughput, silicon resource usage and, where possible, energy efficiency. This comparison is made against the small number of hardware ANNs for face detection, such as the architectures proposed by Theocharides et al, Sadri et al and Smach et al [143][144][145]. To allow a more thorough comparison, a selection of generic ANN hardware architectures is also used in the benchmarking process [184][219][220][213][221][197][222][192][185]. In the literature, the speed/throughput of a hardware ANN architecture is frequently measured using the connections processed per second (CPS) metric for the recall phase and the connection updates per second (CUPS) metric for the learning phase.


Table 6.3: Benchmarking of proposed architecture against prior art

Design               Architecture        Word length   Implementation Process   Max. Clock Frequency   Silicon Resources         CPS        Power
Proposed             2 PEs, each         16            90nm ASIC                250 MHz                79,607 gates / 0.123mm²   2×10⁹      12mW
                     with 4 MACs                       Xilinx Virtex-2          47 MHz                 1,650 slices              376×10⁶    n/a
Kondo [184]          12-PE SIMD          24            0.5µm ASIC               50 MHz                 97mm²                     300×10⁶    4W
Theocharides [143]   Fixed logic         8             0.16µm ASIC              75.5MHz                30.4mm²                   n/a        7.35W
Schoenuer [219]      SIMD                16            0.35µm ASIC              109MHz                 100K gates                n/a        2.5W
Sadri [144]          Fixed logic         9             Xilinx XC2VP20           200MHz                 5×10³ LUTs                n/a        n/a
Smach [145]          Fixed logic         8             Xilinx XCV1000           52 MHz                 12.248×10³ slices         n/a        n/a
Gadea [220]          10-PE SA            16            Xilinx XCV400            100 MHz                3,473 slices              81×10⁶     n/a
Wust [213]           SIMD                5             Xilinx XC4020E           8 MHz                  n/a                       32×10⁶     n/a
Lettnin [221]        Fixed logic         20            Xilinx XCV2000E          33 MHz                 19,200 slices             n/a        n/a
Szabo [197]          Fixed logic         12            Xilinx XC4000            50 MHz                 n/a                       n/a        n/a
Popescu [222]        Fixed logic         8             Altera Flex10K           27 MHz                 744 cells                 n/a        n/a
Amin [192]           12                  16            n/a                      n/a                    n/a                       n/a        n/a
Denby [185]          SA                  16            Xilinx XC2V8000          120 MHz                70% utilisation           n/a        n/a

In the proposed architecture, the CPS metric is the more appropriate choice for evaluation, as the evolutionary training will lead to a CUPS metric that has the same value as the CPS metric. As shown in Table 6.3, the proposed architecture compares favourably with the prior hardware implementations which quote the CPS metric. For example, the CPS achieved by the proposed architecture is over 6 times greater than that of the architecture proposed by Kondo et al [184], although different clock frequencies, datapath wordlengths and semiconductor technology processes are used in the two cases. In this respect, both the CPS and CUPS metrics can be a little misleading, as they are not normalised across designs. For example, it is obviously easier to process connections that use single bit inputs/weights than connections using 32-bit inputs/weights, which will have a far longer critical timing path. Furthermore, these metrics do not take account of the number of PEs or the clock frequency. However, fully normalising these metrics across two designs would show little distinction between them, since such a normalisation reduces to a measure of the speed of a single multiply accumulate implemented in the same sized semiconductor process technology. The best that can be concluded is that the proposed architecture delivers a throughput appropriate for image/video processing on a mobile device and compares very favourably to prior art, whilst acknowledging that a reimplementation of the prior art using a similar semiconductor technology process would also yield a considerable speed improvement for those designs.

Ideally, to compare the silicon resource usage of the proposed architecture against prior art, a technology independent metric such as total gate count should be used, particularly for an ASIC implementation. In some instances in the literature, die size is quoted rather than gate count, and as such, for completeness, Table 6.3 lists both the gate count and the silicon die size recorded for the proposed architecture. Using both metrics, the proposed design compares favourably to the selection of digital ASIC solutions shown in Table 6.3. It should be noted that a completely fair comparison of the die areas of two designs is only possible if the same semiconductor technology process is used for both. To a first order, it is reasonable to normalise these die area figures by the square of the ratio of the channel width relative to the proposed design (i.e. this assumes the channel length was scaled by the same ratio). With this scaling, the architecture proposed by Kondo et al has an area of 2.9mm² and the design by Theocharides et al has a die size of 9.6mm². It should also be noted that, unlike the die sizes quoted by Kondo et al and Theocharides et al, the die area of 0.123mm² quoted for the proposed architecture does not include memory. Furthermore, the datapath width is wider in the case of the design by Kondo et al, and the figure quoted by Theocharides et al contains additional functionality for their face detection system (e.g. rotation detection). Nevertheless, it is reasonable to conclude that the proposed design compares favourably in terms of silicon area to prior art implemented using ASIC technology.
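The first order die area normalisation used above can be expressed as a one-line scaling rule (a sketch; the small difference from the 2.9mm² quoted for Kondo et al suggests a slightly different effective scaling assumption was used in the original calculation).

def scale_area(area_mm2, process_nm, target_nm=90):
    # Scale die area by the square of the feature size ratio.
    return area_mm2 * (target_nm / process_nm) ** 2

print(round(scale_area(97.0, 500), 1))    # Kondo, 0.5 um ASIC: ~3.1 mm^2
print(round(scale_area(30.4, 160), 1))    # Theocharides, 0.16 um: ~9.6 mm^2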

For comparison against FPGA implementations, the proposed design was implemented on a Xilinx Virtex-2 FPGA. In this case the number of slices used by the design can be used for benchmarking purposes. It should be noted that the exact amount of logic contained within a slice is known to vary between Xilinx FPGA product families; nevertheless, it is a reasonable first order comparison metric. With the exception of Popescu, the proposed design is smaller than the other Xilinx FPGA implementations listed in Table 6.3. Compared to Popescu, more parallelism is exploited in the proposed architecture for higher throughput, at a cost of increased FPGA resource usage. Again, the issue of differing amounts of implemented functionality may lead to some distortion of the FPGA resource usage figures. Overall, however, it is reasonable to conclude that the proposed architecture compares very favourably with prior Xilinx FPGA implementations, particularly considering the proposed design was not optimised for an FPGA architecture. In light of multiple FPGA vendors with differing amounts of logic in a "primitive cell", a potentially better way of comparing FPGA resource usage is via an "equivalent gate count" generated by each vendor's tool flow. Unfortunately, this figure is not quoted for prior FPGA hardware ANN implementations.

For the intended image/video processing application on mobile devices, energy efficiency is vitally important. Table 6.3 shows that the proposed architecture compares very favourably against the absolute power consumption estimates quoted in the prior art.


Table 6.4: Normalisation of power consumption from related research

Design                  Recorded Power   Power normalised to 90nm process
Kondo [184]             4W               47mW
Theocharides [143]      7.35W            1.55W
Schoenuer [219]         2.5W             79mW
Proposed architecture   12mW             12mW

Ideally, for purposes of fair comparison, the power figures should be normalised for differing clock frequencies, supply voltages and semiconductor technology processes. For example, Kinane showed that for low power design, the power P in a given process L with a voltage V and a clock frequency f can be normalised to a reference process L′, voltage V′ and frequency f′ using the following formula [37]:

P′ = P × (L′/L) × (V′/V)²    (6.5)
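Eqn 6.5 translates directly into code (a sketch: the supply voltages of the older processes are not given in the table, e.g. 5 V is assumed here for the 0.5 µm Kondo design, which yields roughly 41 mW against the 47 mW quoted in Table 6.4).

def normalise_power(p_watts, l_nm, v_volts, l_ref_nm=90, v_ref=1.2):
    # Eqn 6.5: linear in the feature size ratio, quadratic in the voltage ratio.
    return p_watts * (l_ref_nm / l_nm) * (v_ref / v_volts) ** 2

print(normalise_power(4.0, 500, 5.0))     # Kondo [184]: ~0.041 W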

However, unlike the situations discussed by Kinane, the proposed design has an operating frequency equal to its maximum frequency. As there is little slack in the worst case timing path, the proposed design would inevitably fail timing constraints when using a larger process technology, making normalisation in that direction invalid. Scaling the power consumption estimates from the prior art to the 90nm process used in the proposed architecture is feasible, however, and is shown in Table 6.4. It should be noted that the clock frequency has not been scaled to match that of the proposed architecture, as this would be likely to cause timing path issues in a real design.

Whilst it is difficult to make completely fair comparisons due to differing datapath wordlengths, clock frequencies and exact amounts of implemented logic, it is reasonable to conclude that the proposed design compares favourably to the prior art in terms of power consumption. Ideally, energy efficiency should also be compared. As well as allowing estimates of battery life, it would enable fairer benchmarking, as the final energy figure naturally takes account of the level of parallelism exploited in a design (since this is reflected in the execution time). Unfortunately, none of the related research in the literature quotes energy estimates.

6.5 Future Work

The proposed EANN hardware architecture could be improved in a number of ways. An obvious improvement is to share a single activation function generator among the PEs; in this way the overall silicon area could be reduced. In situations where multiple PEs require access to the activation function generator logic, arbitration logic would be required to decide which PE should have access. Datapath stalling would most likely be necessary, and the impact of this on throughput would require characterisation for different sized topologies. It would also be worth further investigating reduced floating point and stochastic digital data representation formats; it may be possible to trade off an acceptable level of precision for reduced area and/or power.

To improve throughput, the obvious solution is to increase the number of PEs. However, the challenge with this approach is how to retrieve data from bandwidth limited memories so as to maximise the processing potential of a high speed, highly parallel datapath, without the need for excessively expensive memories, and to do so in a power efficient manner. This is an open research problem which would require exploration of alternative micro-architectures. As part of this investigation, the cache module also warrants further study, in particular the effects of different cache sizes and architectures when using different sized input topologies.

6.6 Summary of Contributions

The computational complexity associated with ANN/EANN processing for multimedia applications on mobile devices is highly undesirable from a throughput and power consumption perspective. This chapter has presented a viable hardware architecture to accelerate this processing, whilst also improving energy efficiency compared to a software only implementation. Firstly, an overview of related hardware architectures for ANNs was presented. This included a discussion of activation function implementations and the available options for the overall data representation within a dedicated ANN architecture. In addition, the issues encountered when using a systolic array for ANNs with sparse topologies were elaborated. A detailed description of the proposed hardware architecture was then given. The architecture is flexible enough to process any ANN topology, whilst still providing a good trade off between accuracy, area, power and throughput. The architecture contains a novel activation function generator scheme which was optimised with a genetic algorithm; this scheme was shown in the results section to be an improvement upon related research. Efforts were made to compare the proposed architecture fairly against the prior art in terms of speed, area and power. Whilst this is a difficult and challenging task due to the diversity of the related research, the proposed architecture was shown to compare favourably. Furthermore, the proposed architecture was demonstrated to be considerably more energy efficient than a software only implementation on an ARM920 processor. Finally, a number of possible avenues of future research were outlined.

CHAPTER 7

Binary Motion Estimation Hardware Accelerator

In previous chapters, an EANN based face detection algorithm and an associated hardware accelerator were described. One obvious way of using such a face detection segmentation mask is to encode that region with a higher bitrate or with improved error robustness. Whilst this is very useful in itself, if the output of the processing is considered more broadly in terms of a semantic video object, it has the potential to facilitate a wider range of applications. For example, Chapter 1 highlighted that a semantic video object framework has the potential to enable novel applications throughout all phases of visual data processing, i.e. video creation, storage, intelligent viewing/searching and transmission. As previously discussed in Chapter 2, MPEG-4 already provides an object based framework for many of these applications, in addition to providing a base on which to develop novel video object based applications. One of the key aspects of MPEG-4 object based video coding (or indeed any future object based framework) is the separate processing of the video object shape information. From a run-time perspective, it was shown by Tseng et al that over 90% of the total resources required for MPEG-4 binary shape coding is consumed by binary motion estimation [45]. Although this task relates to the compression of the binary shape object, it is reasonable to conclude that such binary motion estimation for shape could also be useful or required for other novel semantic video object tasks and applications (e.g. shape tracking, automatic temporal upsampling of the binary alpha plane, etc.). As a consequence, the author believes that binary motion estimation is a vital reusable tool in a semantic video object processing "toolkit", and as such requires efficient implementation for throughput and power consumption reasons. The goal of this chapter is to address this need by proposing a novel hardware architecture for binary motion estimation. Firstly, a thorough review of hardware architectures for motion estimation is presented in Section 7.1. The proposed architecture is then described in Section 7.2, with associated implementation results outlined in Section 7.3. A discussion of potential avenues for future work is given in Section 7.4 and, finally, conclusions are presented in Section 7.5.

7.1 State of the Art Review

Motion estimation is acknowledged as the most computationally demanding task within video codecs. Depending on the codec configuration, motion estimation can use between 40% and 80% of the total resources required by the encoder. For this reason it is frequently implemented in hardware, particularly on computationally constrained platforms. A generic hardware architecture for motion estimation (similar to that suggested by Kuhn [32]) is shown in Fig. 7.1. Local memory for the current macroblock and the reference search window is used to reduce activity on the main system bus and minimise the number of accesses to main memory. When implementing a hardware architecture for motion estimation, the principal design space considerations involve throughput, regularity, hardware efficiency, power consumption, I/O bandwidth and the quality of the prediction residual. In the past, with larger semiconductor technology process sizes, the number of I/O pins was also a concern, due to the likelihood of a motion estimation core taking up the full die, which itself was likely to have strict pad constraints. For the most part, with modern deep submicron technologies this is no longer an issue, particularly as a motion estimation core is likely to be just a single element within a larger System on a Chip (SoC) design. Hardware implementations of the block matching algorithm have principally focused on systolic array architectures, as these maximally exploit the full search algorithm, which is attractive from a hardware implementation perspective because of its regular data flow and low control overhead. As a result, many hardware implementations use one dimensional (1-D) and two dimensional (2-D) systolic array architectures.

An early research effort by Yang et al used a 1-D systolic array architecture [223]. This architecture uses 16 PEs; on each clock cycle, one pixel from the current block and two pixels from the reference block are retrieved from memory. The pixel from the current block propagates through the design whilst the reference pixel is broadcast to each PE. The second reference pixel is required when one or more PEs are operating on a different row of pixels.


Figure 7.1: Generic hardware architecture for motion estimation (address generation unit, control unit and datapath, with local search window and current block memories connected to the system address and data buses)

After 16 clock cycles the systolic array is filled, and the first block match is produced after 256 clock cycles. A block match is then generated on every clock cycle for the subsequent 15 clock cycles. For a −7/+8 vertical search window this process repeats 16 times in total; thus this architecture generates a motion vector every 4,111 clock cycles (16 × 256 + 15). In the same era, Komarek and Pirsch presented several architectures for 1-D and 2-D systolic array implementations of motion estimation [224].

The I/O bandwidth of the 1-D architecture proposed by Komarek and Pirsch is, however, higher than that of the 1-D architecture proposed by Yang et al. An input pin efficient 2-D systolic array architecture was proposed by Hsieh and Lin [225]. This architecture uses 3 pipeline stages: the first is the systolic array combined with shift registers, followed by a parallel adder and finally a "best match" selection unit. Chan and Panchanathan made a slight improvement on this architecture by removing the need for the parallel adder [226]. Jehng et al devised a tree based architecture [227] which, whilst giving 100% PE efficiency, has lower throughput than the architecture proposed by Hsieh and Lin, although it has the benefit of being reusable for the three step search strategy.

It is worth noting that hardware architectures for fast heuristic search strategies are possible, but they introduce additional control logic and may reduce the efficiency of the PEs. Additionally, a reduction in the PSNR of the resultant motion compensated residual is to be expected, as with any fast search strategy. Swamy et al implemented a hardware architecture for a 1-D hierarchical search strategy [228]. This architecture is approximately one fifth the size of a conventional 2-D systolic array but achieves marginally faster processing for a range of sequences. Nakayama et al use a fast scene adaptive search strategy along with adaptive thresholds and quantisation of each pixel to its upper four bits [229]. They claim a processing time and power reduction of approximately 90% relative to a conventional 1-D systolic array approach. Huang et al proposed a parallel global elimination architecture, having processing time comparable to a 2-D systolic array but an area less than that of a 1-D systolic array architecture [230].

Although general purpose processors have vastly increased computational throughput since the first hardware architectures for motion estimation were proposed, advances in the underlying motion estimation algorithms to reduce the energy in the motion compensated prediction residual have increased the overall computational cost. Techniques such as unrestricted motion estimation, variable block size motion estimation, sub-pel motion estimation and multiple reference frames have all contributed to this higher computational cost. This means that dedicated hardware architectures for motion estimation remain a highly active research area. Recent research has extended the classical 1-D and 2-D systolic array architectures to handle variable block sizes. Yap and McCanny modified the classical 1-D systolic array full search architecture to generate all 41 combinations of motion vectors in an H.264 16×16 macroblock [231]. In this architecture, the SAD value for each 4×4 subblock is combined in a merging circuit to generate SAD values for all possible block configurations. Variants of the 2-D systolic array architecture were also proposed for H.264 by many, including Kao et al and Deng et al [232][233]. Architectures for variable block size fast search strategies have also been proposed: Wei et al presented a number of architectures capable of handling the three step search, four step search, logarithmic 2-D search and full search strategies [234][235][236]. To reduce the energy in the prediction residual, Rahman and Badawy proposed a quarter-pel full search systolic array architecture [33]. This uses eight processing units, where each processing unit is a 1-D 16-PE systolic array.

Another important consideration for dedicated hardware architectures is the vast quantity of pixels required by the block matching algorithm; as such, the memory architecture is commonly investigated for hardware implementations. Lai and Chen proposed a data interlacing technique to exploit the overlapped search area in a 2-D systolic array architecture [237]. The approach reduces the number of external memory accesses as well as potentially improving PE efficiency. Tuan et al also examined memory issues, analysing the redundancies inherent in different full search systolic array architectures caused by retrieving the same pixel multiple times [238]. Local memory organisation within the context of H.264 was addressed by Song et al [239].

With the emergence of mobile multimedia, energy efficiency has come to the forefront of design space considerations for the development of battery powered mobile devices. As motion estimation is the most computationally demanding function in a modern video codec, it is no surprise that low power architectures for motion estimation have been proposed to address the need for improved energy efficiency. At a high level of abstraction, power minimisation techniques for the motion estimation block matching algorithm can be considered to reduce switching activity in the search strategy and/or in the block matching. Reducing the number of search points in the search strategy is a common technique employed in software implementations to improve throughput, and it could also be considered as a means of reducing power consumption. The disadvantage of this approach is that it is widely acknowledged to reduce the quality of the prediction residual¹ [32]. This means reduced video quality for a given fixed bitrate, or a higher bitrate for similar quality. A higher bitrate will create additional switching activity throughout the codec and, in a mobile video telephony application scenario, will cause extra switching in the wireless RF circuitry on the mobile device. An alternative approach, sometimes known as a fast exhaustive search or SAD summation truncation [32], uses all positions defined in the full search strategy, but where possible terminates the calculation of block matches early, thus saving processing cycles and power consumption. The early termination condition is met if the partial SAD² accumulated for the block match is larger than the minimum SAD found so far within the search window. The reason it is possible to terminate processing early under this condition is that further processing will only make the SAD result larger, and thus the final SAD result will also be greater than the minimum SAD. A result larger than the minimum SAD is of no use, since the motion vectors will be chosen on the basis of the block with the minimum SAD value.

Early termination of block matching is frequently used in software implementations and, to date, in a relatively small number of hardware implementations [240][241][242][243][244]. In dedicated hardware, Do and Yun highlighted that, provided the cost of computing whether the minimum SAD has been exceeded is negligible compared to the full SAD calculation, considerable power savings are possible.

¹A motion compensated prediction residual found using a reduced number of search positions has a high probability of having greater energy than if the prediction residual had been found using a full search strategy.
²The technique applies equally well to other distortion metrics, including mean absolute difference, mean square error, etc.

Do and Yun proposed a modification to a 2-D systolic array architecture that uses a preliminary conservative SAD estimate prior to the block match. This conservative SAD estimate is always less than the true SAD value, which means that if the conservative SAD estimate is greater than the minimum SAD found so far, computations for the remainder of the block match can be halted. Due to the systolic array architecture, it is not viable to skip to the next block match and reduce the number of processing clock cycles, as doing so would cause excessive disruption to the data flow within the systolic array. Nevertheless, by gating off the absolute difference logic when the minimum SAD is exceeded, Do and Yun reported power savings of approximately 50%. Lam and Tsui also proposed a 2-D systolic array with early termination of the SAD processing, though the granularity of termination is not as fine, so the reported power savings are lower, at 30% to 40% [242]. Sousa and Roma proposed a low power linear systolic array architecture in which the processing is disabled (via blocking registers) if the currently calculated SAD exceeds the minimum SAD [241]. The check for the disabling condition is carried out at the end of each line. As this architecture is a linear systolic array, the required clock frequency is obviously higher than that required by the 2-D systolic array proposed by Do and Yun, but the silicon area is also smaller. Sousa and Roma also report power savings of approximately 50%. Takahashi et al proposed a 1-D systolic array with eight PEs in conjunction with early SAD termination [244]. When all eight PEs terminate processing early, the systolic array is cleared and new block matches commence. This architecture is reported to achieve 45% to 51% savings in power consumption. Richmond and Ha use the early termination technique with a fast four step search strategy [243]. Similar to the other architectures discussed, if the minimum SAD is exceeded during a block match, the inputs to the PE are gated off. The early termination coupled with the fast search resulted in power savings of 75% to 85% compared to a conventional full search (though it is not clear whether this relates to a 1-D or 2-D systolic array). However, due to the fast search there was an associated reduction in the PSNR of the motion compensated residual of over 0.5dB for high motion sequences. Lopez et al adapted an early SAD termination 1-D systolic array architecture for H.264 variable block size motion estimation [245]. However, the scope for early termination is reduced due to the need for different configurations of block sizes; the silicon resources also increased by a factor of 2.5 and the maximum clock frequency reduced by approximately 30%.


7.1.1 Binary Motion Estimation

Another method of reducing the computational cost associated with the block matching algorithm is to minimise the complexity of the matching criterion using pixel quantisation techniques. Many of these techniques manipulate the pixels into a binary form so that simple XOR logic can replace the absolute difference between pixels. Whilst this may initially seem like only a slight modification, because the matching criterion is the innermost loop of the deeply nested loops in the block matching algorithm, such a modification can dramatically reduce the computational cost. As such, many different approaches to binary block matching have been proposed. This prior research can be broadly categorised as either bit-plane matching based motion estimation or the more popular fully binary quantised pixel based motion estimation.

Ko et al proposed a bit plane matching approach in which each bit of each pixel is treated independently from the other bits of the pixel [246]. In this way the absolute difference can be calculated using only an XOR between the bits of each bit plane. To improve performance, Ko et al use a correlation index to combine the results from each bit plane, and they also suggested a 2-D systolic array hardware architecture to implement the scheme. Celebi et al also proposed a hardware architecture for bit plane matching [247]; in addition, they add redundancy to each pixel via a pre-coding stage in order to improve the quality of the motion vectors.

An alternative approach to bit-plane matching, which can reduce the computational cost even further, is to fully quantise each pixel to a binary value prior to motion estimation. The block matching then uses a single bit XOR to evaluate the absolute difference between pixels. This method does require a pre-processing filter to reduce each eight bit pixel value to a single bit binary representation. Nevertheless, the savings can be substantial: it was observed by Mizuki et al that, even after including the binary filtering stage, the overall silicon area was reduced by a factor of more than five compared to motion estimation using the same full search strategy with 8-bit resolution pixels [248]. Mizuki et al used an edge filter to generate the binary pixels. Natarajan et al proposed a 1-D systolic array architecture using binary quantised pixels [249]. The architecture uses 16 PEs, with each PE operating on 16 1-bit pixels. Once the pixel values are XORed, a LUT is used to generate the summation. Using this architecture along with a ±8 search window, a full search strategy and a block size of 16×16, one set of motion vectors is generated every 271 clock cycles. This compares to in excess of 4,000 clock cycles for a conventional 8-bit pixel 1-D systolic array architecture such as that proposed by Yang et al [223]. A detailed view of the processing element is shown in Fig. 7.2. Natarajan et al generated the binary pixels via the convolution of a 17×17 multi-bandpass kernel with the original pixel data.


Figure 7.2: Systolic array architecture and PE proposed by Natarajan et al [249]

Binary motion estimation schemes can be used with any search strategy. In a software implementation by Feng et al, initial searching is seeded using a local context of previously generated motion vectors [250]. The local context of motion vectors is also compared to threshold values to dynamically alter the search window size. This adaptiveness is used in the binary motion estimation phase, which discounts bad matches prior to an 8-bit resolution full search strategy. In another software implementation, Wong et al couple binary motion estimation with an adaptive two step search strategy and an optional full search refinement stage [251]. The binary filtering used by Wong et al is the same as that proposed by Natarajan et al. Song et al proposed a hierarchical binary motion estimation scheme [252], in which the pixels in the lowest layer (i.e. smallest resolution) of the hierarchical pyramid are in 8-bit format whilst the pixels in the remaining layers are in binary format. Luo et al also proposed a hierarchical binary motion estimation algorithm, but removed the requirement for the pixels in the lowest layer to have 8-bit resolution [253]. A full search with a ±3 search window is firstly conducted at the lowest layer. The best candidate from this search is compared to five other spatio-temporal candidates, and the best match from these six candidates is used to seed the starting location at the next layer. At the third and final layer, a refinement search with a ±2 search window is conducted around the location of the best match from the second layer.

In prior research applying binary motion estimation to 8-bit texture pixels, much of the innovation occurs in the quantisation of the pixels to binary values, as this impacts the quality of the motion compensated result. For example, a good binary quantisation filter accurately captures both local detail and edge information, allowing the binary motion estimation phase to find a motion vector which gives a close approximation to the optimum motion vector found using full resolution pixels. Within the context of binary motion estimation for MPEG-4 binary shape coding, binary quantisation of the pixels is not necessary, as the alpha pixels are already binary in nature. Binary motion estimation research for MPEG-4 shape coding has principally focused on fast heuristic search strategies. Yu et al outline a software implementation of motion estimation for shape which uses a number of intermediate thresholds in a heuristic search strategy to reduce the computational complexity [254]. The author does not consider this approach viable for a hardware implementation, owing to the irregular memory addressing, in addition to its limited scope for exploiting parallelism. A shape boundary mask can also be employed in a preprocessing manner to reduce the number of search positions [255][256]. However, the generation of the boundary mask by Panusopone et al may be considered computationally intensive owing to the block loop process [255]. Tsai et al use a more efficient approach for generating the boundary mask [256]; furthermore, they use heuristics to further reduce the search positions and additionally discuss a possible hardware architecture. Chang et al use a 1-D systolic array architecture coupled with a full search strategy for their Binary Motion Estimation (BME) implementation [257]. Improving memory access performance is a common optimisation in MPEG-4 binary shape encoders [258][259]. Lee et al suggest a run length coding scheme to minimise on-chip data transfer and reduce memory requirements; however, the run length codes still need to be decoded prior to BME [259]. The proposed solution, which is described in the subsequent section, leverages early SAD operation termination and attempts to avoid unnecessary operations by exploiting redundancies in the binary shape information. This is in contrast to a systolic array approach, where unnecessary calculations are unavoidable due to the data flow in the systolic structure. Unlike the approach of Tsai et al, the proposed approach uses an exhaustive search to guarantee finding the best block match within the search range [256].

7.2 Proposed Binary Motion Estimation Architecture

In this thesis, motion estimation is considered to follow object segmentation; it follows that the input pixels to the motion estimation block are in binary form. With this assumption, this research attempts to reduce the power consumption of motion estimation using two techniques which do not jeopardise the quality of the prediction residual. The first technique employed is early SAD termination, which was described in Section 7.1. This uses all positions defined in the full search strategy, but terminates processing when the minimum SAD is exceeded, thus saving processing cycles and power consumption. The challenge with implementing early termination in conventional systolic array hardware architectures is that finishing one block match early in one PE and subsequently loading the next block match would cause too much disruption to the overall data flow. Whilst the author accepts that systolic arrays offer many attractive features, this research explores whether alternative architectures would make it possible to save power consumption elsewhere. In the proposed architecture (which is not systolic array based), to exploit the potential of early termination, during each block match the partial SAD calculated to date is subtracted from a register which is initially loaded with the minimum SAD value calculated up to that point. If a sign change occurs during this de-accumulation step, there is no need to continue further processing, since the current minimum SAD has already been exceeded. This method reduces the number of calculations required to find the motion vectors by discounting bad matches early on, and thus power is saved. The process is inherently serial, so to improve throughput, pixels in the block match are repartitioned to allow parallelism to be exploited. Furthermore, the pixels are repartitioned in such a manner as to increase the probability that all parallel PEs will terminate at the same point. The features used to achieve this in the proposed architecture are described in detail in Section 7.2.2.
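The de-accumulation idea can be captured in a minimal sketch (binary blocks as nested 0/1 sequences; the serial loop shown here is what the repartitioning then parallelises across PEs).

def block_sad_early_exit(cur, ref, best_sad):
    # Start from the minimum SAD found so far and subtract each pixel
    # difference; a sign change means this candidate cannot win.
    budget = best_sad
    for c_row, r_row in zip(cur, ref):
        for c, r in zip(c_row, r_row):
            budget -= c ^ r          # XOR is the absolute difference for bits
            if budget < 0:
                return None          # minimum SAD exceeded: abandon the match
    return best_sad - budget         # the actual SAD for this candidate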

In previous binary motion estimation research, no attempt has been made to optimise the SAD PE datapath. However, the unique characteristics of the SAD calculation for binary data mean further redundancies can be exploited to reduce datapath switching activity. From Eqn. 7.1 it is clear that there are unnecessary memory accesses and operations when the B_cur and B_ref pixels have the same value, since the XOR will give a zero result. As such, the second algorithmic-level power minimisation technique proposed in this thesis is based on exploiting these redundancies. To achieve this, a scheme is proposed which reformulates the conventional binary SAD equation into a format amenable to a pixel addressing scheme that minimises these redundancies. This is described in detail in the next section.

SAD(B_cur, B_ref) = \sum_{i=1}^{M} \sum_{j=1}^{N} B_cur(i,j) ⊗ B_ref(i,j)    (7.1)

7.2.1 Reformulation of the Binary SAD Metric

The conventional equation for calculating the binary SAD is given in Eqn. 7.1. The first step in reformulating this equation is to observe the following properties from Fig. 7.3:

TOTALcur_blk = COMMON + UNIQUEcur_blk    (7.2)

TOTALref_blk = COMMON + UNIQUEref_blk    (7.3)

where

• TOTALcur_blk is the total number of white pixels in the current macroblock.

• TOTALref_blk is the total number of white pixels in the reference macroblock.

• COMMON is the number of white pixels that are common to both the reference macroblock and the current macroblock.

• UNIQUEcur_blk is the number of white pixels in the current macroblock but not in the reference macroblock.

• UNIQUEref_blk is the number of white pixels in the reference macroblock but not in the current macroblock.

It is also clear from Fig. 7.3 that, because the pixels are in binary form, the SAD value between the current and reference macroblocks can be represented as:

SAD = UNIQUEcur_blk + UNIQUEref_blk    (7.4)

194 7.2. PROPOSED BINARY MOTION ESTIMATION ARCHITECTURE

[Figure 7.3: SAD Reformulation – colour-coded reference and current blocks illustrating COMMON, UNIQUEcur_blk, UNIQUEref_blk, TOTALref_blk and TOTALcur_blk.]


Using the identities in Eqn. 7.2 and Eqn. 7.3, it follows that:

UNIQUEref_blk = TOTALref_blk − (TOTALcur_blk − UNIQUEcur_blk)    (7.5)

Substituting Eqn. 7.5 into Eqn. 7.4 gives the following:

SAD = TOTALref_blk − TOTALcur_blk + 2 × UNIQUEcur_blk    (7.6)

To intuitively understand Eqn. 7.6, TOTALref_blk − TOTALcur_blk can be considered a conservative estimate of the SAD value, whilst 2 × UNIQUEcur_blk is an adjustment to this conservative estimate that gives the correct final SAD result. To aid clarity, three different macroblock scenarios are shown in Fig. 7.4; in each case the SAD value is calculated using Eqn. 7.6. The reason Eqn. 7.6 is beneficial (a software cross-check of the reformulation is sketched after the list below) is because:

• TOTALcur_blk is calculated only once per search window.

• It is possible to update TOTALref_blk in 1 clock cycle (see Section 7.2.1.1 for more details).

• If TOTALcur_blk = 0, the SAD value is TOTALref_blk and is calculated in a minimal number of clock cycles.

• Similarly, if TOTALref_blk = 0, the SAD value is TOTALcur_blk and is calculated in a minimal number of clock cycles.

• If TOTALref_blk = 255, all the white pixels in the current block will overlap with white pixels in the reference block. This means UNIQUEcur_blk is zero, which allows the SAD to be calculated in a minimal number of clock cycles.

• Incremental addition of UNIQUEcur_blk allows early termination if the current minimum SAD is exceeded.

• The use of UNIQUEcur_blk eliminates unnecessary memory accesses and operations.
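As a quick software cross-check of the reformulation (illustrative names; a sketch rather than the hardware implementation), the following C fragment confirms that Eqn. 7.6 produces the same value as the direct XOR SAD of Eqn. 7.1 for a 16 × 16 binary macroblock pair:

    #include <stdint.h>
    #include <assert.h>

    /* Verify Eqn. 7.6 against Eqn. 7.1 for one pair of 16x16 binary
     * macroblocks (256 pixels, each 0 or 1). */
    void check_sad_reformulation(const uint8_t cur[256], const uint8_t ref[256])
    {
        int direct = 0, unique_cur = 0, total_cur = 0, total_ref = 0;
        for (int i = 0; i < 256; i++) {
            direct += cur[i] ^ ref[i];       /* Eqn. 7.1: XOR-based SAD    */
            unique_cur += cur[i] & !ref[i];  /* white in cur, black in ref */
            total_cur += cur[i];
            total_ref += ref[i];
        }
        /* Eqn. 7.6: conservative estimate plus the 2x adjustment factor */
        int reformulated = total_ref - total_cur + 2 * unique_cur;
        assert(direct == reformulated);
    }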

For practical implementation purposes, it is not possible to know UNIQUEcur_blk in Eqn. 7.6 in advance of a block match. By definition, UNIQUEcur_blk is the summation of the white pixels in the current block for which the collocated pixel in the reference block is black. Fortunately, it is possible to process only the white pixels in the current macroblock; these are incrementally XORed with the collocated pixels in the reference macroblock to finally give UNIQUEcur_blk.³ This minimises access to irrelevant data and reduces switching activity.

[Figure 7.4: Reformulated binary SAD examples, each evaluated using Eqn. 7.6 – (a) Example 1: UNIQUEcur_blk = 0, giving SAD = TOTALref_blk − TOTALcur_blk; (b) Example 2: UNIQUEref_blk = 0, so UNIQUEcur_blk = TOTALcur_blk − TOTALref_blk, giving SAD = TOTALcur_blk − TOTALref_blk; (c) Example 3: UNIQUEcur_blk = TOTALcur_blk, giving SAD = TOTALcur_blk + TOTALref_blk.]


[Figure 7.5: Example of Run Length Coding – the locations of the white pixels are given by RL1(48,1) RL2(15,3) RL3(13,4) RL4(12,5) RL5(11,32) RL6(112,0); the summation of the run length pairs equals 256.]

To access only the white pixels in the current macroblock, the author proposes that the pixels undergo run length coding. Each run length code consists of two elements: an offset and a run. The offset represents the number of pixels until the next pixel of interest (in this case white) is encountered, whilst the run element is the number of consecutive pixels of interest. A further point worth noting is that in an N × M macroblock, the summation of the offset and run components equals N × M. As will be described in Section 7.2.1.1, this property is exploited in generating an alternative mode for the calculation of the binary SAD. An example run length encoding of an arbitrary macroblock is shown in Figure 7.5. In this case the first run length code is RL1 = (48,1), meaning there are 48 black pixels before a white pixel is encountered; the value of one means that there is only one white pixel in this “group”. In the proposed architecture, the run length codes are generated in parallel with the first match of the search step. It is possible to do this because no minimum SAD exists for the initial block, causing an arbitrarily large value to be loaded into the SAD de-accumulation register, which makes early termination of SAD processing impossible. As the first match always takes N × N cycles to complete (where N is the block size), this provides ample time for the run length encoding process to operate in parallel. After the RLC encoding, the logic can be powered down or clock gated until the next current block is processed.

³ If the reference pixel is white, a redundant operation will still occur.
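A minimal C sketch of the (offset, run) encoding described above (a hypothetical helper, not the hardware encoder) is:

    #include <stdint.h>

    typedef struct { uint16_t offset, run; } rl_pair;

    /* Run length encode the white pixels of an n-pixel binary block into
     * (offset, run) pairs: offset counts the black pixels before the next
     * white group, run counts the consecutive white pixels. The offsets
     * and runs sum to n, matching the property noted in the text.
     * Returns the number of pairs written to out. */
    int rlc_encode_white(const uint8_t *blk, int n, rl_pair *out)
    {
        int i = 0, npairs = 0;
        while (i < n) {
            int off = 0, run = 0;
            while (i < n && blk[i] == 0) { off++; i++; }  /* black run */
            while (i < n && blk[i] == 1) { run++; i++; }  /* white run */
            out[npairs].offset = (uint16_t)off;
            out[npairs].run = (uint16_t)run;
            npairs++;
        }
        return npairs;
    }

Applied to the macroblock of Figure 7.5, this should reproduce the pairs RL1 = (48,1) through RL6 = (112,0), with the final pair carrying a run of zero so that the offsets and runs still sum to 256.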


[Figure 7.6: Regular and Inverse RLC pixel addressing – the locations of the white pixels are given by RL1(1,1) RL2(15,3) RL3(13,4) RL4(12,5) RL5(11,32) RL6(160,0); the locations of the black pixels are given by RL0(0,1) RL1(1,15) RL3(3,13) RL4(4,12) RL5(5,11) RL6(32,160).]

7.2.1.1 Using inverse run length codes

The discussion involving Eqn. 7.6 has so far focused on the incremental addition of the UNIQUEcur_blk SAD adjustment factor. However, by using a different combination of Eqn. 7.2, Eqn. 7.3 and Eqn. 7.4, the SAD calculation can be expressed in terms of UNIQUEref_blk:

SAD = TOTALcur_blk − TOTALref_blk + 2 × UNIQUEref_blk    (7.7)

This is shown in Eqn. 7.7. The reason this is beneficial will be explained shortly, after first examining the properties of UNIQUEref_blk. To generate UNIQUEref_blk, it is not practical to run length encode the reference macroblock in the manner used for the white pixels of the current macroblock in Eqn. 7.6, because the run length code would change for each new search position. For example, in a linear search, when a new reference block is selected the new position is offset by one row/column from the previous search position. Since a run length code is always relative to a fixed position (i.e. the first pixel in the block), the run length code for each new reference block would need to be regenerated, which is not computationally attractive for every search position. However, UNIQUEref_blk can also be generated by using the black pixels in the current macroblock, since a white pixel in the reference block can only be “unique” if the collocated pixel in the current block is black. The locations of the black pixels can be derived much more easily by manipulating the run length codes for the white pixels. In general,⁴ it can be seen in Fig. 7.6 that the “inverse” run length pairs have the pattern of being offset by one component from the regular run length pairs. For example, the second inverse run length pair is IRL1 = (1, 15), the third pair is IRL2 = (3, 13), and so on. By reusing the run length code associated with the white pixels, additional memory is not required; furthermore, the same SAD datapath can be reused with minimal additional logic.
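The derivation of the inverse pairs can be sketched in a few lines of C (an illustrative helper reusing the rl_pair type from the earlier sketch); apart from the first pair, inverse pair k is simply (run of white pair k, offset of white pair k + 1):

    /* Derive the inverse run length pairs, which address the black pixels,
     * from the regular pairs for the white pixels (cf. Fig. 7.6). No extra
     * pixel memory is needed: the components of the white-pixel pairs are
     * simply re-read with an offset of one. */
    int rlc_invert(const rl_pair *white, int n, rl_pair *black)
    {
        black[0].offset = 0;             /* first pair is the exception  */
        black[0].run = white[0].offset;  /* leading black pixels, if any */
        for (int k = 1; k < n; k++) {
            black[k].offset = white[k - 1].run;
            black[k].run = white[k].offset;
        }
        return n;
    }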

The motivation for using Eqn. 7.7 instead of Eqn. 7.6 is that under certain conditions it reduces the number of operations and memory accesses. This can be understood by considering that in the worst case Eqn. 7.6 requires a maximum of TOTALcur_blk incremental additions of pixels to generate the final UNIQUEcur_blk, whereas using Eqn. 7.7 the maximum number of operations is N × N − TOTALcur_blk. Therefore, calculating the macroblock SAD value using Eqn. 7.7 rather than Eqn. 7.6 is useful for macroblocks in which there are fewer black pixels than white pixels. This means the appropriate SAD equation can be selected based on the most significant bit of TOTALcur_blk, and only minimal modification to the hardware architecture is needed, since supporting the second mode requires only a small amount of extra logic.
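In software terms the mode decision reduces to a single bit test; a sketch, assuming the white-pixel count of a 16 × 16 macroblock is held in an 8-bit register so that a set MSB indicates 128 or more white pixels:

    /* Select the SAD formulation: returns 1 (use Eqn. 7.7 with the
     * inverse run length codes) when at least half the 256 pixels are
     * white, otherwise 0 (use Eqn. 7.6 with the regular codes). */
    int use_inverse_mode(unsigned total_cur_blk)
    {
        return (total_cur_blk >> 7) & 1u;   /* MSB of the 8-bit count */
    }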

7.2.1.2 TOTALref_blk Update

Similar to TOTALcur_blk, the initial TOTALref_blk calculation is carried out during the first block match, when no early termination is possible. When a block match SAD is fully calculated or terminated early, the address generation unit moves the reference block to a new position. Provided a linear search is used (such as the circular search or full search strategies shown in Fig. 7.7), TOTALref_blk can be updated in one clock cycle. This is done by subtracting the previous row or column (depending on the search window movement) from TOTALref_blk and adding the new row or column (see Fig. 7.8). A simple adder tree (such as that shown in Fig. 7.8) implements the update within one clock cycle; alternatively, a carry save structure could be used.
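A software model of the single-cycle update (illustrative names; in hardware the two column sums feed the adder tree of Fig. 7.8) might look like:

    #include <stdint.h>

    /* Update TOTALref_blk as the reference block slides one column to the
     * right within the search window. frame is a binary image with the
     * given stride; (x, y) is the top-left corner of the new block
     * position and N is the block size. */
    int update_total_ref(const uint8_t *frame, int stride,
                         int x, int y, int N, int total_ref)
    {
        int old_col = 0, new_col = 0;
        for (int r = 0; r < N; r++) {
            old_col += frame[(y + r) * stride + (x - 1)];     /* leaving  */
            new_col += frame[(y + r) * stride + (x + N - 1)]; /* entering */
        }
        return total_ref - old_col + new_col;
    }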

7.2.2 4xPE Block Matching Architecture

The proposed architecture consists of two stages, a block matching stage and a minimum SAD update stage, both of which run in parallel. The update stage is by default idle and can be either powered down or clock gated.

⁴ The exception is the first inverse run length pair, which requires minimal additional processing.


[Figure 7.7: Search strategies – (a) Circular Search: plot of the (0,0) point of the reference block as it moves through the search window using a circular search strategy; (b) Modified Full Search: plot of the (0,0) point of the reference block.]


[Figure 7.8: TOTALref_blk Update – as the search moves one step to the right, TOTALref_blk = TOTALref_blk − “old” column + “new” column.]

In the block matching stage, in order to exploit early termination in the SAD calculation, an intermediate partial SAD must be generated. Gaining the greatest benefit from early SAD termination requires the SAD calculation to proceed in a serial manner.

However, such serial processing reduces encoding throughput, which is not desirable for real time applications. To increase throughput, parallelism must be exploited. To achieve this, the proposed architecture repartitions each 16 × 16 macroblock into four smaller 8 × 8 blocks. A dedicated PE then calculates the SAD for each 8 × 8 block. This implies a non-systolic array type architecture, since data cannot be shared between PEs. With conventional 8-bit pixel motion estimation this would most likely lead to excessively high memory bandwidth. However, due to the nature of the binary data in binary motion estimation for shape coding, there is a much higher probability of early termination of SAD processing, which the author proposes will compensate for concerns over memory bandwidth. This will be discussed further in Section 7.3.

At any given clock cycle, the control logic uses the partially accumulated SAD values in each PE to make a decision on whether early termination of the macroblock SAD calculation is possible.

The proposed memory repartitioning scheme (used for both the current and reference frames) is shown in Figure 7.9. Rather than using a memory repartitioning scheme based on the four quadrants of the macroblock, the proposed scheme results in each PE operating on what appears to be a subsampled version of the original macroblock. In this way, if early termination occurs, there is a high probability that it will happen at approximately the same time in each PE. The goal is to minimise situations in which one or more PEs have terminated early but must wait considerably longer for the final PE to complete, which is quite likely to occur if memory partitioning based on quadrants is used.
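The subsampled repartitioning can be expressed compactly: the row and column parity of each pixel selects the PE, so each PE receives every second pixel in each direction. A C sketch (illustrative only, following the pattern of Fig. 7.9):

    #include <stdint.h>

    /* Repartition a 16x16 macroblock into four 8x8 sub-blocks by 2x2
     * subsampling: pixel (r, c) goes to PE ((r % 2) * 2 + (c % 2)), so
     * each PE sees a subsampled version of the whole macroblock and the
     * four partial SADs tend to terminate at similar times. */
    void repartition_4xpe(const uint8_t mb[16][16], uint8_t sub[4][8][8])
    {
        for (int r = 0; r < 16; r++)
            for (int c = 0; c < 16; c++) {
                int pe = (r & 1) * 2 + (c & 1);      /* PE index 0..3 */
                sub[pe][r >> 1][c >> 1] = mb[r][c];
            }
    }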

Fig. 7.10 shows a detailed view of the SAD PE. In the first clock cycle, the minimum SAD encountered so far for that PE is loaded into the DACC_REG register. Subsequent processing decrements from this value, with the control logic monitoring the behaviour of the sign bit for indications that early termination is allowed. In the next clock cycle, TOTALref_blk or TOTALcur_blk is subtracted from DACC_REG (depending on whether TOTALcur_blk[MSB] is 0 or 1 respectively). In the subsequent clock cycle, TOTALcur_blk or TOTALref_blk (again depending on whether TOTALcur_blk[MSB] is 0 or 1 respectively) is added to DACC_REG. If a sign change occurs at this point, the minimum SAD has already been exceeded and no further processing is required. If a sign change has not occurred, the address generation unit retrieves the next run length code from memory. If TOTALcur_blk[MSB] = 0, the run length code is processed unmodified.


[Figure 7.9: Macroblock repartitioning for the 4xPE Architecture – each 16×16 macroblock is repartitioned into four 8×8 sub-blocks (Sub_Block 1–4) according to the row/column parity pattern shown, so that each sub-block fed to the datapath is a subsampled version of the original macroblock.]


[Figure 7.10: RLC SAD Processing Element – cur_pixel and ref_pixel are XORed and scaled (×2); together with TOTALcur_blk, TOTALref_blk, local_minsad and prev_dacc_val they feed the DACC_REG de-accumulator under pe_ctrl, and the sign of dacc_reg is used for the early termination decision.]


On the other hand, if TOTALcur_blk[MSB] = 1, the inverse run length code is processed. In either case the run length code is decoded to give a horizontal and vertical macroblock address. This address is used to retrieve the relevant pixel from the reference macroblock and the current macroblock. The pixel values are XORed and the result is left shifted by one place and then subtracted from DACC_REG. If a sign change occurs, early termination is possible. If a sign change does not occur, the remaining pixels in the current run length code are processed. If an early SAD termination signal has still not been generated, subsequent run length codes are fetched from memory and the processing repeats. As the block matching progresses, early termination of processing can occur at any point. The early termination can be configured to operate in two different modes. In the default mode of exhaustive search, all four PEs must produce an early termination signal before processing of the current block match is terminated. In fast mode, not all PEs need to generate an early termination signal before processing finishes for the current block match; the control logic assumes that the subsampling process will lead to very similar images in each PE, so early termination signals will occur shortly afterwards in each of the remaining PEs. This is especially useful if extra low power operation is required and the data displays a high degree of correlation (i.e. low motion), although it risks reducing the quality of the motion vectors for scenes with high motion characteristics.
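Pulling the whole PE sequence together, the following C model (an illustrative software rendering of Fig. 7.10, not the RTL) processes one block match: the initial TOTAL-based estimate, then the run length driven 2× adjustments, with a sign check after every subtraction. It reuses the rl_pair type from the earlier sketch.

    /* One PE block match. codes holds the regular run length pairs when
     * mode == 0 (Eqn. 7.6) or the inverse pairs when mode == 1 (Eqn. 7.7).
     * cur and ref are the linearised binary blocks. Returns the SAD, or
     * -1 on early termination. */
    int pe_block_match(const rl_pair *codes, int ncodes,
                       const uint8_t *cur, const uint8_t *ref,
                       int total_cur, int total_ref, int mode, int min_sad)
    {
        int dacc = min_sad;                    /* cycle 1: load min SAD     */
        dacc -= mode ? total_cur : total_ref;  /* cycle 2: conservative     */
        dacc += mode ? total_ref : total_cur;  /* cycle 3: SAD estimate     */
        if (dacc < 0) return -1;               /* estimate already too big  */

        int addr = 0;
        for (int k = 0; k < ncodes; k++) {
            addr += codes[k].offset;           /* skip to pixels of interest */
            for (int p = 0; p < codes[k].run; p++, addr++) {
                int diff = cur[addr] ^ ref[addr];  /* 1 only if unique    */
                dacc -= diff << 1;             /* subtract 2x adjustment  */
                if (dacc < 0) return -1;       /* sign change: terminate  */
            }
        }
        return min_sad - dacc;                 /* SAD for this position   */
    }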

If the block matching progresses to the point where all pixels in the block have been processed, the update stage is invoked. This checks whether the total SAD from the four PEs is less than the total minimum SAD value encountered in previous block matches. If so, a new block level minimum SAD has been found and the current value is updated. The update logic consists of a single adder/subtracter and an accumulator (see Fig. 7.11). Since the PE calculation can take up to TOTALcur_blk (or inverse TOTALcur_blk) + 2 steps to complete, it is possible to run the update stage in parallel with a new block match. The update is handled in a sequential manner in order to reduce the hardware area overhead, and takes at most 11 clock cycles to complete. Further SAD cancellations can occur in the next match without affecting the performance of the update logic: if the new block match is cancelled against the old minimum SAD, it would also be cancelled against the new updated minimum SAD, because the partial SAD value is already bigger than the old minimum SAD and the new minimum SAD is smaller. The update process firstly accumulates the PE SAD values in the TOTAL_DACC_REG. If the summation of the PE level SAD values is negative, a new minimum SAD has not been found and the update logic can be disabled; otherwise, if the result of this processing is positive, a new minimum SAD has been found.


[Figure 7.11: 4xPE Update logic – the de-accumulator outputs of BME_PE 0–3 (pe0_dacc–pe3_dacc, gated by upd_en) feed total_dacc_reg via BLK_SAD_MUX; the per-block minimum SADs (blk0_sad–blk3_sad) and total_minSAD are then updated.]

The PE level minimum SAD values and the total minimum SAD value are then updated. If at any point during an update a SAD cancellation occurs in the new parallel block match, DACC_REG will need to be updated with the new minimum PE level SAD value once it is available. Obviously, stalling the block match until the update is available is not desirable. In fact, it is not necessary to stall the block match for this lengthy period; instead, when the updated value is available, each PE is updated with the relative difference between the old and new PE minimum SAD values.

7.3 Results

Unlike the constant throughput systolic array approaches, the processing latency to generate one set of motion vectors for the proposed architecture is data dependent. The worst and best case processing latencies are 65,535 and 3,133 clock cycles respectively. The clock frequency includes a margin to cover below average early termination. As reported in prior work by the author [260], on average a 93% early termination rate is achieved using common test sequences (see Table 7.1). This figure compares favourably to the early SAD termination hardware designs presented in [240][241][244], which typically achieve around a 50% reduction in operations. The improvement in the proposed design can be attributed to the subsampling, the use of the run length coded addressing scheme and the nature of the binary data in binary shape coding. This early termination figure is used in the subsequent calculation of metrics for benchmarking. It should also be noted that benchmarking is difficult owing to a lack of information in the prior art; this includes binary motion estimation architectures used in MPEG-4 binary shape coding and binary motion estimation architectures used in low complexity approaches for texture motion estimation [256][258][257][248][249].

With 93% average cancellation, for a CIF sized video sequence, a total of 65535 × ((100 − 93)/100) × (352/16 × 288/16) × 25 fps = 45.416 × 10^6 clock cycles is required for 1 second of video. Therefore the minimum clock period to achieve this throughput should be 22 ns.
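The arithmetic can be sanity checked with a few lines of C (values taken from the figures quoted above):

    #include <stdio.h>

    int main(void)
    {
        double worst = 65535.0;                        /* worst-case cycles per MB  */
        double avg = worst * (100.0 - 93.0) / 100.0;   /* 93% average cancellation  */
        double mbs = (352.0 / 16.0) * (288.0 / 16.0);  /* macroblocks per CIF frame */
        double cps = avg * mbs * 25.0;                 /* clock cycles per second   */
        printf("average cycles per macroblock: %.0f\n", avg);  /* ~4587    */
        printf("cycles per second: %.3e\n", cps);              /* 4.542e+7 */
        printf("minimum clock period: %.1f ns\n", 1e9 / cps);  /* ~22 ns   */
        return 0;
    }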

Table 7.2 summarises the synthesis results for the proposed architecture. Synthesising the design with Synopsys Design Compiler, targeting the TSMC 0.09 µm TCBN90LP technology library, yields a gate count of 10,117 and a maximum operating frequency f_max of 700 MHz. The systolic array binary motion estimation architecture proposed by Natarajan et al. is leveraged in the designs proposed by Chang et al. and Lee et al.; consequently, similar cycle counts can be observed in each implementation [249][257][258]. The average cycle count (4,588 cycles) for the proposed architecture is longer than that of the architectures proposed by Chang et al. and Lee et al. [257][258]. This is due to the architectural design decision to trade off throughput for reduced SAD operations and consequently reduced power consumption. As a consequence of the longer latency, the product of the gate count and clock cycles (PGCC, a commonly used metric) for the proposed architecture will be inferior to that of the architectures proposed by Chang et al. and Lee et al. [257][258]. However, the PGCC metric does not take into account the non-uniform switching in the proposed architecture. For example, after the first block match the run length encoder associated with each PE is not active; in addition, the linear pixel addressing of the first block match is replaced by the run length decoded pixel scheme for subsequent blocks within the search window. The power and energy figures do take account of this non-uniform, data-dependent processing; however, benchmarking against prior art using these metrics is not possible owing to a lack of information in the literature. Overall, it is reasonable to say that the proposed design is comparable to the prior art, trading off throughput for lower power consumption.

Table 7.1: RLC BME 4xPE versus Conventional Systolic Array BME

                Memory Accesses (1-bit pixels)          Operations (1-bit XOR & addition)
Sequence        1-D SA [249]      BME 4xPE              1-D SA [249]      BME 4xPE
Akiyo           1.5206 × 10^9     4.5298 × 10^8         4.0170 × 10^9     2.6105 × 10^8
Hall Monitor    1.5206 × 10^9     4.8202 × 10^8         4.0170 × 10^9     2.7847 × 10^8
Foreman         1.5206 × 10^9     4.7137 × 10^8         4.0170 × 10^9     2.6942 × 10^8
Average                           69.17% reduction                        93.29% reduction

Table 7.2: BME Synthesis Results and Benchmarking

Architecture      Tech [µm]   Cycle Count (Max / Min / Average)   Gates    f [MHz]   Power [mW]   Energy [nJ]
Mizuki [248]      n/a         n/a / n/a / n/a                     n/a      n/a       n/a          n/a
Natarajan [249]   n/a         1039 / 1039 / 1039                  n/a      n/a       n/a          n/a
Tsai [256]        n/a         n/a / n/a / n/a                     n/a      n/a       n/a          n/a
Lee [258]         n/a         1056 / 1056 / 1056                  n/a      n/a       n/a          n/a
Chang [257]       0.35        1039 / 1039 / 1039                  9666     40        n/a          n/a
Proposed          0.09        65535 / 3133 / 4588                 10117    100       1.22         80

7.4 Future Work

This section outlines possible avenues for future work. These improvements principally focus on reducing the long run time of the proposed architecture and, where possible, reducing the memory bandwidth.


7.4.1 16xPE Block-Matching Architecture

The architecture proposed in Section 7.2 uses four parallel processing elements. To improve throughput it is possible to use 16 processing elements, although this reduces the granularity of the early termination and as such fewer redundant operations are saved. A further consequence of the 16xPE architecture is that more operations are required for the update stage: an additional 16 BLK_BC_i registers are required to hold the value of the current minimum SAD for each of the 16 PEs. Therefore, whilst the PE datapath structure remains identical, major modifications are required in the update logic for a possible 16xPE architecture. In the 4xPE architecture the update proceeds in a sequential manner requiring at most 11 cycles; if the same structure were adopted for the 16xPE architecture, the minimum update time would increase to 16 cycles. However, the block size has now been reduced to 4 × 4, meaning that the maximum SAD calculation time is also 16 cycles, so the update logic would run for as long as the SAD calculation itself. To prevent excessive stalling in the datapath, the sequential update could be replaced by an adder tree structure. The other major change compared to the 4xPE architecture is that the memory needs to be remapped from four blocks of 8 × 8 to sixteen blocks of 4 × 4. The same pixel subsampling technique is used and the structure can be seen in Fig. 7.12.

7.4.2 Possible PE Improvements

While the datapath and update stages of the BME architecture are area efficient, the memory required to store the run length codes and the overhead of generating and decoding the run length codes increase the overall area significantly. If a design is area sensitive this is far from desirable, and in deep sub-micron technologies the additional area will only serve to increase static power consumption. The principal reason for the large memory footprint is that worst-case scenarios must be handled. For example, in the 4xPE architecture each PE processes an 8 × 8 sub-block and potentially all 64 pixels could be black or white, thus 7 bits are required to store this run length. At the other extreme, the sub-block could consist of a checkerboard pattern of alternating single black/white pixels; in this scenario the memory must have storage capacity for 32 run length pairs. As these two situations represent the extreme cases, a normal sub-block is likely to use considerably less memory. There are a number of possible workarounds to reduce the memory requirement. Instead of storing an offset and run length pair, if the colour of the first pixel of the block is stored along with the transition points, the memory space and area can effectively be halved.


[Figure 7.12: Memory Subsampling for the 16xPE Architecture – the original macroblock memory is remapped into sixteen 4×4 sub-blocks (Block0–Block15) using the same parity-based pixel subsampling, which then feed the block matching datapath.]


Another alternative is not to use run length coding at all. In each clock cycle, 4, 8 or 16 binary pixels could be retrieved from memory. Equation 7.6 could still be applied; however, the scope for early termination is reduced and a greater number of redundant operations will occur. The area is reduced, since the RLC memory and the associated encoding and decoding logic are not required. Furthermore, the throughput is increased, since a single processing element retrieves more than a single pixel at a time. This approach could potentially be integrated with a systolic array type architecture for memory bandwidth improvements, though at the cost of reduced early termination and less scope to exploit redundant operations.

Finally, an alternative approach for updating TOTALref_blk is to pre-process the frame and store the number of white pixels for each sub-block. In this way, the first three operations could be combined in a single clock cycle, which would reduce the processing time by 66% for poor candidate block matches. A disadvantage of this approach, however, is that the latency of the design is increased by one frame.

7.5 Summary of Contributions

Binary motion estimation is a fundamental tool for semantic video object shape processing. It is also widely acknowledged that motion estimation is the most computationally demanding block in a video codec. Whilst the computational cost of binary motion estimation is lessened by the reduction in bits per pixel, prior research has shown that binary motion estimation consumes over 90% of the total resources required for MPEG-4 shape coding, which is itself the second most demanding block in an MPEG-4 object based encoder [45]. It is reasonable to conclude that a computationally constrained platform will struggle to meet the necessary throughput and energy efficiency requirements of binary motion estimation for shape processing. This provides the motivation for the dedicated energy efficient hardware architecture that was presented in this chapter.

A comprehensive overview of hardware motion estimation architectures was provided in Section 7.1, with a particular focus on binary motion estimation. A detailed description of the proposed architecture was then given in Section 7.2. Algorithmic level power savings were suggested in the form of SAD reformulation and early SAD termination. The proposed architecture was compared to the related research in terms of silicon area, throughput and switching activity in Section 7.3, and was shown to be competitive with the prior related research. Finally, a number of possible ways of improving and extending the architecture were discussed in Section 7.4.

CHAPTER 8

Conclusions & Future Work

This chapter summarises the research goals of this thesis and discusses the results achieved in the previous chapters. In particular, Section 8.1 reviews the motivations for this research, Section 8.2 provides a summary of the key contributions of this thesis and finally, Section 8.3 summarises the possible avenues of future research and discusses open research questions.

8.1 Motivation for Research – A Summary

There is an overwhelming human desire for untethered communications, instant entertainment (music, video, pictures, news, information) and improved personal and professional productivity. These human factors, coupled with advances in semiconductor technology and signal processing, have contributed heavily toward the tremendous growth in sales of mobile devices. In addition to their ubiquity, the frequent design and marketing of mobile devices as aspirational, premium priced style icons, along with device personalisation, are clear indications of how deeply ingrained mobile devices are in today's society [10][11]. Furthermore, improving computational power is also enabling application convergence (e.g. telephone, SMS, email, video conferencing, Internet browsing, gaming, digital camera, digital audio/video player, mobile TV, etc.) on these devices [7][8][9]. Multimedia applications feature strongly on these convergent devices. This is not surprising, since users are frequently motivated by factors such as a desire for graphically enriched communications, filling "dead time" by consuming multimedia content, or creating multimedia content (pictures and video) by spontaneously capturing moments of personal significance.

Whilst mobile devices have the potential to offer an engaging multimedia experience for the user, numerous technical challenges remain to be solved before next generation mobile multimedia can become a reality. In terms of image and video data, the principal challenges could be considered to include a need for more advanced processing to bridge the so-called semantic gap between pixels and the higher order structural meaning of collections of pixels. As was discussed in Chapter 1 and Chapter 2, by processing pixel data in terms of the semantic objects present in the image/video, improvements can be made to all applications in the "life-cycle" of pixel data.

There is also a need to overcome the throughput and battery life issues imposed by the constrained resources available on mobile devices. This is particularly relevant for image/video processing algorithms, as these are generally considered among the most computationally demanding tasks, owing to elaborate, resource-intensive algorithms coupled with a requirement for high throughput performance, frequently with a real time constraint. Finally, improvements to the efficiency and robustness of wireless delivery of multimedia content are also desirable; however, this is outside the scope of this thesis.

Unfortunately, the greatest obstacle to widespread adoption of semantic object based processing is the fact that ideal semantic object segmentation is an inherently ill-posed problem. It can be made tractable by constraining the solution space and focusing on specific application scenarios. However, even when the solution space is constrained, semantic object segmentation algorithms are inherently computationally complex. On a mobile device this increase in computational complexity has the highly undesirable effect of diminishing real time performance and increasing power consumption.

The combination of the outlined issues is unfortunate, particularly as semantic object based processing could be considered an enabling technology for a plethora of novel multimedia applications on mobile devices. This provided the research motivation for this thesis to explore viable implementation options for specific enabling technologies for semantic object segmentation and processing on mobile devices. The segmentation solution proposed deliberately focused on mid-level semantic objects, and in particular the segmentation of the human face as a fundamental semantic object of benefit to users of mobile devices. Subsequent processing of the segmented object was then considered in order to identify other bottlenecks which hinder the adoption of object based frameworks such as MPEG-4. From this analysis, it was concluded that motion estimation for shape is a fundamental task which requires considerable computational resources and as such warranted further research. A summary of the research results achieved for face segmentation/detection and motion estimation is given in the following section.

8.2 Summary of Thesis Contributions

Chapter 1 gives a broad context for the research work described in this thesis. This context, which is summarised in Section 8.1, has the primary theme that although semantic video object based processing has many undeniable benefits, the principal issues that must be tackled before it can become viable on mobile devices are the object segmentation challenge and the increased power consumption due to the additional algorithmic processing of object based video. The goals and objectives of the thesis are also outlined in Chapter 1, followed by a description of the thesis structure. A more technical context for the research work is given in Chapter 2, where a detailed overview of image and video compression theory is provided along with the extensions necessary to support an object based paradigm. This highlighted that MPEG-4 provides a complete framework for video object based compression and transport. It was also established that, within MPEG-4, binary motion estimation for shape is the most computationally demanding task required to support object based compression. Implementation options for image/video processing algorithms were then discussed, and it was concluded that dedicated hardware offers the highest throughput and best energy efficiency, but at the cost of reduced flexibility and increased development time. The sources of power consumption were then described along with common power consumption minimisation techniques, and it was highlighted that the greatest scope for reducing power consumption occurs at the system/algorithmic level, where there are more degrees of freedom to impact the energy efficiency.

Chapter 3 provided a comprehensive review of face detection algorithms. This guided the selection of an appropriate face detection algorithm for this research and also gave insights into possible modifications to improve the characteristics of the chosen algorithm. From this review, it was also concluded that a software only implementation would struggle to achieve real-time performance on a mobile device. This observation was factored into the algorithm selection decision, in that algorithms suited to dedicated hardware offload were deemed to offer a more attractive overall solution, with higher throughput and lower power consumption. This led to the selection of a novel algorithm employing an EANN. This approach could be considered a variant of the widely used high performance ANN face detection algorithms, but with a GA controlling the learning phase [84]. The GA offers the potential to find novel non-trivial topologies in the overall topology design space. Furthermore, as NEAT (the chosen EANN library) uses a minimal topology as the evolutionary search starting point, there is scope to find a minimum sized topology for any given problem, including face detection. Such a topology is highly beneficial from an energy efficiency perspective, as it reduces the number of operations and thus also reduces the power consumption. This was the primary reason for selecting the NEAT library for the face detection training.

Comprehensive details of the face detection training algorithm were presented in Chapter 4. These included information on the preprocessing of the training data, an overview of the NEAT library in the context of the face detection training, a discussion of the design decisions made and, finally, the overall face detection training framework. An iterative approach to training was then taken, allowing constant exploration of the GA parameters. Details were given for two distinct versions of training: in the first approach the starting topology used the default minimal topology, whilst a seeded starting topology with hidden neurons present was used in the second approach. The non-seeded topology was found to perform best and gave a final topology size considerably smaller than the seeded approach. The size of the best trained topology compared very favourably with conventional ANN topologies.

A thorough description of the proposed face detection algorithm during normal operation was given in Chapter 5. This included a discussion of the design decisions made in selecting the classifier ensemble and face detection merging. Profiling was then carried out; this verified the design goal that the hardware-amenable EANN evaluation (and the associated tasks of loading the inputs and resetting the topology) is the most computationally demanding part, requiring almost 95% of the total computational load of the algorithm. In addition to function-based profiling, run-time evaluation and power profiling results were presented for an ARM-920T microprocessor. This identified that a software only implementation of the proposed algorithm on a typical mobile processor would struggle to achieve appropriate performance, providing strong motivation for a dedicated hardware accelerator. Prior to this, the software was optimised as much as possible by reducing the number of candidate face locations classified using the EANN; this was addressed by using minimum sized faces and skin detection pre-filtering of each candidate face location. Further software optimisations appropriate for a mobile device implementation were also discussed. The trained face detection algorithm was shown to perform well on a dataset which could be considered representative of the image quality and facial pose found in the target mobile device application.


Following the conclusion in Chapter 5 that a software only implementation of the proposed face detection algorithm would struggle to achieve acceptable performance, Chapter 6 presented the design of a dedicated hardware architecture for EANN acceleration. Rather than a systolic array architecture, a SIMD datapath type structure was chosen. The motivation for this decision was that the efficiency of a systolic array is reduced by sparse EANN topologies, which are exactly the type of evolved topology generated by NEAT. In addition, the proposed architecture is flexible enough to handle any evolved topology. The dedicated hardware architecture used a novel activation function generator, based upon a first order minimax polynomial and further optimised using a genetic algorithm; the activation function generator was implemented using efficient fused multiply-add circuitry. The overall architecture was shown to compare favourably to the prior art in terms of throughput, silicon resources and power consumption. Furthermore, the proposed architecture was shown to be greatly superior to a software implementation on an ARM 920T processor (commonly found in mobile devices) in terms of run time and energy efficiency.

Assuming a video object such as a face has been segmented, possible bottlenecks in subsequent processing were then considered, and motion estimation was identified as a primary bottleneck in the adoption of a video object based framework such as MPEG-4. Consequently, a novel architecture for binary motion estimation was proposed. The proposed binary motion estimation architecture reduces power consumption through early termination of the SAD calculation and by minimising access to redundant binary data, facilitated by a reformulation of the block matching SAD distortion metric. The architecture was shown to be competitive with the prior art, trading off throughput for reduced power consumption.

8.3 A Vision for the Future

Short term techniques for improving the research presented in this thesis were discussed at the conclusion of each relevant chapter. Taking a longer term viewpoint, there are two principal areas that warrant prolonged future research. The first is a need for greater robustness in face detection performance, particularly invariance to the pose and viewpoint of the face. Research to date in the literature has focused on using multiple classifiers trained for different poses/viewpoints; the challenge for a computationally constrained device is how this can be done in an energy efficient manner. Secondly, the memory architectures for both the EANN and binary motion estimation hardware architectures are worth investigating further in order to increase throughput. As with many image and video processing algorithms, the challenge is frequently how to overcome the data/memory bound problem in an energy efficient manner. This is also an issue faced by designers of emerging highly parallel multiple core and array processors.

Looking outside of the author's core research, EANN algorithms have excellent potential for discovering novel solutions to many difficult real world problems. However, before they can be used more widely for image and video processing, the issue of searching for and finding suitable topologies in the very large dimensional search spaces inherent in image/video data needs to be further addressed. Another challenge for the research community is accurate and fair benchmarking. A fundamental issue encountered in this thesis was a lack of power consumption and energy results to compare against. This challenge extends beyond merely recording additional results; there is a requirement for technology independent metrics. For example, in the absence of reimplementing face detection algorithms (which would be excessively time consuming), benchmarking performance on mobile devices is challenging, and this situation will only become worse in the future when multiple core processors could further hide the complexity of one algorithm relative to another. Comparing run time throughput performance is not particularly useful, since no indication of the processor clock frequency or architecture is given. Quoting clock frequency is marginally better, but requires scaling, which is imprecise at best. Ideally what is required are metrics that are independent of design tool performance, processor architecture, clock frequency, silicon process and any operating system loading.

219 Bibliography

[1] G. E. Moore, “Cramming More Components onto Integrated Circuits,” Electronics, vol. 38,

no. 8, Apr. 19 1965.

[2] Intel Corporation. Moore’s Law. [Online]. Available: http://www.intel.com/technology/

mooreslaw/index.htm

[3] Gartner. Gartner Says Top Six Vendors Drive Worldwide Mobile Phone Sales to

21% Growth in 2005. [Online]. Available: http://www.gartner.com/press releases/

asset 145891 11.html

[4] J. Walko. (2006, Jan.2,) Record year for mobile phone shipments – analysts. [Online].

Available: http://www.eet.com/news/latest/showArticle.jhtml?articleID=177104183

[5] M. Kanellos. (2005, Oct.) Pc shipments leap past expectations. [Online]. Avail-

able: http://news.com.com/PC+shipments+leap+past+expectations/2100-1003 3-5897839.

html?tag=st.bp.story

[6] J. Walko. (2007, Jan.) After 23 years, mobile phone sales breach billion mark. [Online].

Available: http://www.eetimes.eu/uk/197000427

[7] T. Austin, D. Blaauw, S. Mahlke, T. Mudge, C. Chakrabarti, and W. Wolf, “Mobile super-

computers,” IEEE Computer, vol. 37, pp. 81–83, May 2004.

[8] P. Mannion. (2003, 26th March) Advanced features set to drive handset replacements,

chipsets. [Online]. Available: http://www.commsdesign.com/showArticle.jhtml?articleID=

16500574

220 BIBLIOGRAPHY

[9] R. Wilson. (2004, July) That sucking sound? it’s your cell. [Online]. Available: http:

//www.eetimes.com/news/latest/showArticle.jhtml?articleID=25600132&printable=true

[10] EE Times. (2006, Jul. 10,) Nokia’s 7280 puts fashion first. [Online]. Available:

http://www.eetimes.com/news/design/showArticle.jhtml?articleID=190300355

[11] Apple. (2006, Sep.12,) Apple unveils the new ipod. [Online]. Available: http:

//www.apple.com/pr/library/2006/sep/12ipod.html

[12] Reuters. (2006, May 17,) Ringtone spin-offs boost mobile music. [Online]. Available:

http://news.cnet.co.uk/mobiles/0,39029678,49272725,00.htm

[13] S. Shim. (2006, Aug.22,) Samsung offers mobile phone with 8-gig hard disk. [Online].

Available: www.eetimes.com/news/latest/showArticle.jhtml?articleID=192202809

[14] BBC. (2003, Feb.) Video recorders enter the digital age. [Online]. Available:

http://news.bbc.co.uk/1/hi/technology/2762579.stm

[15] J. Twist. (2006, Jan.) The year of the digital citizen. [Online]. Available: http:

//news.bbc.co.uk/1/hi/technology/4566712.stm

[16] D. Molta. (2006, Sep.29,) Mobile Video: The Next Killer App. [Online]. Available:

http://www.mobilehandsetdesignline.com/news/193101121

[17] Flickr online photo sharing. [Online]. Available: http://www.flickr.com

[18] You Tube - online video sharing. [Online]. Available: http://www.youtube.com

[19] J. Arreymbi and M. Dastbaz, “Issues in delivering multimedia content to mobile devices,”

in Information Visualisation, 2002. Proceedings. Sixth International Conference on, Jul.

2002, pp. 622–626.

[20] N. Banerjee, A. Acharya, and S. Das, “Seamless SIP-based mobility for multimedia appli-

cations,” IEEE Network, vol. 20, no. 2, pp. 6–13, Mar./Apr. 2006.

[21] C. Cicconetti, L. Lenzini, E. Mingozzi, and C. Eklund, “Qality of service support in IEEE

802.16 networks,” IEEE Network, vol. 20, no. 2, pp. 50–55, Mar./Apr. 2006.

[22] J. Li and H.-H. Chen, “Mobility support for IP-based networks,” IEEE Communications

Magazine, vol. 43, no. 10, pp. 127–132, Oct. 2005.

221 BIBLIOGRAPHY

[23] X. Fu, H. Schulzrinne, A. Bader, D. Hogrefe, C. Kappler, G. Karagiannis, H. Tschofenig,

and S. Van den Bosch, “NSIS: a new extensible IP signaling protocol suite,” IEEE Commu-

nications Magazine, vol. 43, no. 10, pp. 133–141, Oct. 2005.

[24] S. Rai, B. Mukherjee, and O. Deshpande, “IP resilience within an autonomous system:

current approaches, challenges, and future directions,” IEEE Communications Magazine,

vol. 43, no. 10, pp. 142–149, Oct. 2005.

[25] (2000, August) A wap on the knuckles. [Online]. Available: http://www.uidesign.net/2000/

opinion/wapknuckles.html

[26] (2001, June) Wap usage nosedives in the nordic countries. [Online]. Available:

http://wbt.sys-con.com/read/40845.htm

[27] A. Albiol, C. A. Bouman, and E. J. Delp, “Face detection for pseudo-semantic labeling

in video databases,” in Image Processing, 1999. ICIP 99. Proceedings. 1999 International

Conference on, vol. 3, Kobe, Oct. 1999, pp. 607–611.

[28] M. Roach, J. Mason, L. Xu, and F. Stentiford, “Recent trends in video analysis : a taxonomy

of video classification problems,” in In Proceedings of the International Conference on

Internet and Multimedia Systems and Applications, Aug. 2002.

[29] A. G. Hauptmann, “Lessons for the Future from a Decode of Informedia Video Analysis Re-

search,” in Proceedings of the 4th International Conference on Image and Video Retrieval,

Jul. 2005, pp. 1–10.

[30] B. Widrow, D. E. Rumelhart, and M. A. Lehr, “Neural Networks: Applications in Industry,

Business and Science,” Communications of the ACM, vol. 37, no. 3, pp. 93–105, Mar. 1994.

[31] K. Stanley, B. Bryant, and R. Miikkulainen, “Real-time neuroevolution in the NERO video

game,” IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, pp. 653–668, Dec.

2005.

[32] P. M. Kuhn, Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion

Estimation. Springer, Jun. 1999.

[33] C. A. Rahman and W. Badawy, “A quarter pel full search block motion estimation archi-

tecture for h.264/AVC,” in Multimedia and Expo, 2005. ICME 2005. IEEE International

Conference on, Jul. 2005.

222 BIBLIOGRAPHY

[34] P. Symes, Video Compression Demystified. McGraw-Hill Education, February 2001.

[35] I. E. Richardson, H.264 and MPEG-4 Video Compression. Wiley, 2004.

[36] T. Xanthopoulos and A. P. Chandrakasan, “A Low-Power IDCT Macrocell for MPEG-2

MP@ML Exploiting Data Distribution Properties for Minimal Activity,” IEEE Journal of

Solid-State Circuits, vol. 34, no. 9, pp. 693–703, May 1999.

[37] A. Kinane, “Energy Efficient Hardware Acceleration of Multimedia Processing Tools,”

Ph.D. dissertation, School of Electronic Engineering, Dublin City University, Apr. 2006.

[Online]. Available: http://www.eeng.dcu.ie/∼kinanea/thesis/kinane final.pdf

[38] M. Ghanbari, Video Coding, an Introduction to standard codecs. IEE, 1999.

[39] K. Jack, Video Demystified. HighText Publications, 2005.

[40] Coding of audio-visual objects – Part 10: Advanced Video Coding, ISO/IEC 14496-10:2005

Std., Rev. 3rd Edition, 2007. [Online]. Available: http://www.iso.org/iso/iso catalogue/

catalogue tc/catalogue detail.htm?csnumber=43058

[41] G. Sullivan and T. Wiegand, “Video compression - from concepts to the h.264/avc stan-

dard,” Proceedings of the IEEE, vol. 93, no. 1, pp. 18 – 31, Jan. 2005.

[42] MPEG-4: Information Technology – Coding of Audio Visual Objects – Part 2: Visual,

ISO/IEC 14496-2, ISO/IEC Std., Rev. Amendment 1, Jul. 2000.

[43] T. Sikora and B. Makai, “Shape-Adaptive DCT for Generic Coding of Video,” IEEE Trans-

actions on Circuits and Systems for Video Technology, vol. 5, no. 1, pp. 59–62, Feb. 1995.

[44] N. Brady, “MPEG-4 Standardized Methods for the Compression of Artibitarily Shaped

Video Objects,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9,

no. 8, Dec. 1999.

[45] P.-C. Tseng, Y.-C. Chang, Y.-W. Huang, H.-C. Fang, C.-T. Huang, and L.-G. Chen, “Ad-

vances in Hardware Architectures for Image and Video Coding – A Survey,” Proceedings

of the IEEE, vol. 93, no. 1, pp. 184–197, Jan. 2005.

[46] N. O’Connor and N. Murphy. (2003) Image and Video Compression – EE554

Taught M.Eng. Course Notes. Dublin City University. [Online]. Available: http:

//www.eeng.dcu.ie/∼oconnorn

223 BIBLIOGRAPHY

[47] O. J. Morris, M. J. Lee, and A. G. Constantinides, “Graph theory for image analysis: an approach based on the shortest spanning tree,” IEE Proceedings, vol. 133, pp. 146–152, Apr. 1986.

[48] T. Meier and K. N. Ngan, “Automatic Segmentation of Moving Objects for Video Object Plane Generation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 5, Sep. 1998.

[49] J. Fan, D. K. Yau, A. K. Elmagarmid, and W. G. Aref, “Automatic Image Segmentation by Integrating Color-Edge Extraction and Seeded Region Growing,” IEEE Transactions on Image Processing, vol. 10, no. 10, Oct. 2001.

[50] C. Gu and M.-C. Lee, “Semiautomatic Segmentation and Tracking of Semantic Video Objects,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 5, Sep. 1998.

[51] T. Meier and K. N. Ngan, “Video Segmentation for Content-Based Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, Dec. 1999.

[52] C. Kim and J.-N. Hwang, “Fast and Automatic Video Object Segmentation and Tracking for Content-Based Applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 2, Feb. 2002.

[53] S. Sun, D. R. Haynor, and Y. Kim, “Semiautomatic Video Object Segmentation Using VSnakes,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 1, Jan. 2003.

[54] M. Meribout and M. Nakanishi, “A New Real Time Object Segmentation and Tracking Algorithm and its Parallel Hardware Architecture,” Springer Journal of VLSI Signal Processing, vol. 39, no. 3, Mar. 2005.

[55] J. Stauder, G. Gouzien, B. Chupeau, J. Vigouroux, and E. Kijak, “Semantic image browsing using hidden categories and confidence values,” in Electronic Imaging 2003, Jan. 2003.

[56] D. A. Sadlier and N. E. O’Connor, “Event detection in field sports video using audio-visual features and a support vector machine,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 10, pp. 1225–1233, Oct. 2005.

[57] B. Lehane, N. O’Connor, and H. Lee, “Searching movies based on user defined semantic events,” in SIGMAP 2006 – International Conference on Signal Processing and Multimedia Applications, Aug. 7–10, 2006, pp. 232–239.

[58] P. Pirsch and H.-J. Stolberg, “VLSI Implementations of Image and Video Multimedia Processing Systems,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 7, pp. 878–891, Nov. 1998.

[59] A. Dasu and S. Panchanathan, “A Survey of Media Processing Approaches,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 8, pp. 633–645, Aug. 2002.

[60] B. Hori and J. Eyre. (2005, Mar. 14) Inside DSP on Digital Video: Processors for Video. [Online]. Available: http://insidedsp.eetimes.com/features/showArticle.jhtml?articleID=159401710

[61] Intel Corporation. [Online]. Available: http://www.intel.com/

[62] S. W. Smith, Digital Signal Processing – A Practical Guide for Engineers and Scientists, 1st ed. Newnes, 2003.

[63] Xtensa Product Overview. Tensilica. [Online]. Available: http://www.tensilica.com/html/products.html

[64] B. Cope, P. Y. K. Cheung, W. Luk, and S. Witt, “Have GPUs made FPGAs redundant in the field of video processing?” in Field-Programmable Technology, 2005. Proceedings. 2005 IEEE International Conference on, Dec. 11–14, 2005, pp. 111–118.

[65] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, 1st ed. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1995.

[66] K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design. John Wiley and Sons, 2000.

[67] J. M. Rabaey and M. Pedram, Low Power Design Methodologies. Kluwer Academic Publishers, 1996.

[68] M. Pedram and J. M. Rabaey, Power Aware Design Methodologies. Kluwer Academic Publishers, 2002.

[69] W. Elgharbawy and M. Bayoumi, “Leakage Sources and Possible Solutions in Nanometer CMOS Technologies,” IEEE Circuits and Systems Magazine, vol. 5, pp. 6–17, Dec. 2005.

[70] K. Morris. (2005, Apr. 26) Power – Suddenly, We Care. [Online]. Available: http://www.fpgajournal.com/articles_2005/20050426_power.htm

[71] A. Chandrakasan and R. Brodersen, “Minimizing power consumption in digital CMOS circuits,” Proceedings of the IEEE, vol. 83, pp. 498–523, Apr. 1995.

[72] (2004, Jul. 15) Minimize IC power without sacrificing performance. [Online]. Available: http://www.eedesign.com/article/showArticle.jhtml?articleId=23901143

[73] The Intel Pentium M Processor: Microarchitecture and Performance – Enhanced Intel SpeedStep Technology. [Online]. Available: http://www.intel.com/technology/itj/2003/volume07issue02/art03_pentiumm/p10_speedstep.htm

[74] ARM Homepage. [Online]. Available: http://www.arm.com/

[75] Advanced Micro Devices, AMD – Homepage. [Online]. Available: http://www.amd.com/

[76] I. Koren, Computer Arithmetic Algorithms. A K Peters Ltd, 2001.

[77] Digital Photography Review. (2006, Sep. 14) Canon’s first wide angle Digital IXUS includes DIGIC III and Face Detection. [Online]. Available: http://www.dpreview.com/news/0609/06091403 canon sd800is.asp

[78] Fujifilm. (2006, Sep. 25) Fujifilm brings face detection to the compact digital camera category with the FinePix F31fd. [Online]. Available: http://www.cameratown.com/news/news.cfm/hurl/id%7C3106

[79] FotoNation. FotoNation Face Tracker. [Online]. Available: http://www.fotonation.com/index.php?module=products&id=55

[80] M.-H. Yang, D. Kriegman, and N. Ahuja, “Detecting Faces in Images: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, pp. 34–58, Jan. 2002.

[81] E. Hjelmas and B. K. Low, “Face Detection: A Survey,” Computer Vision and Image Understanding, vol. 83, no. 3, pp. 236–274, Sep. 2001.

[82] M. J. Er, W. Chen, and S. Wu, “High-speed face recognition based on discrete cosine transform and RBF neural network,” IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 679–691, May 2005.

[83] C. Garcia and M. Delakis, “Convolutional face finder: a neural architecture for fast and robust face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1408–1423, Nov. 2004.

[84] H. Rowley, S. Baluja, and T. Kanade, “Neural network-based face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23–38, Jan. 1998.

[85] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, “Face detection in color images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696–706, May 2002.

[86] O. Ikeda, “Segmentation of faces in video footage using HSV color for face detection and image retrieval,” in Proceedings of the 2003 International Conference on Image Processing (ICIP 2003), vol. 3, Sep. 2003, pp. 913–916.

[87] J. C. Terrillon, M. N. Shirazi, H. Fukamachi, and S. Akamatsu, “Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images,” in Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, Grenoble, France, 2000, pp. 54–61.

[88] M. J. Jones and J. M. Rehg, “Statistical color models with application to skin detection,” in Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on, vol. 1, Fort Collins, CO, USA, 1999.

[89] J. Kovac, P. Peer, and F. Solina, “Human skin color clustering for face detection,” in EUROCON 2003. Computer as a Tool. The IEEE Region 8, vol. 2, Sep. 2003, pp. 144–148.

[90] S. Cooray and N. O’Connor, “Facial features and appearance-based classification for face detection in color images,” in Proceedings of the 11th International Workshop on Systems, Signals and Image Processing, IWSSIP’04, Poznan, Poland, Sep. 2004.

[91] M. Hunke and A. Waibel, “Face locating and tracking for human-computer interaction,” in Signals, Systems and Computers, 1994. Conference Record of the Twenty-Eighth Asilomar Conference on, vol. 2, Pacific Grove, CA, USA, Oct./Nov. 1994, pp. 1277–1281.

[92] J. Yang and A. Waibel, “A real-time face tracker,” in Applications of Computer Vision, 1996. WACV ’96. Proceedings 3rd IEEE Workshop on, Sarasota, FL, Dec. 1996, pp. 142–147.

[93] D.-J. Niu, Y.-Z. Zhan, and S.-L. Song, “Research and implementation of real-time face detection, tracking and protection,” in Machine Learning and Cybernetics, 2003 International Conference on, vol. 5, Nov. 2003, pp. 2765–2770.

[94] Y.-X. Lv, Z.-Q. Liu, and X.-H. Zhu, “Real-time face detection based on skin-color model and morphology filters,” in Machine Learning and Cybernetics, 2003 International Conference on, vol. 5, Nov. 2003, pp. 3203–3207.

[95] N. A. Khan and A. Z. Muhammad Hassan Khan, “Reliable face detection in natural scenes,” in Proceedings of the 11th International Workshop on Systems, Signals and Image Processing, IWSSIP’04, Poznan, Poland, Sep. 2004.

[96] D. Chai and K. N. Ngan, “Face segmentation using skin-color map in videophone applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 551–564, Jun. 1999.

[97] M.-J. Chen, M.-C. Chi, C.-T. Hsu, and J.-W. Chen, “ROI video coding based on H.263+ with robust skin-color detection technique,” in Consumer Electronics, 2003. ICCE. 2003 IEEE International Conference on, Jun. 2003, pp. 44–45.

[98] L. Zhi-fang, Y. Zhi-sheng, A. K. Jain, and W. Yun-qiong, “Face detection and facial feature extraction in color image,” in Computational Intelligence and Multimedia Applications, 2003. ICCIMA 2003. Proceedings. Fifth International Conference on, Sep. 2003, pp. 126–130.

[99] S. L. Phung, A. Bouzerdoum, and D. Chai, “A novel skin color model in YCbCr color space and its application to human face detection,” in Image Processing. 2002. Proceedings. 2002 International Conference on, vol. 1, 2002.

[100] C.-W. Lin, Y.-J. Chang, and Y.-C. Chen, “Low-complexity face-assisted video coding,” in Image Processing, 2000. Proceedings. 2000 International Conference on, vol. 2, Vancouver, BC, Canada, 2000, pp. 207–210.

[101] N. Tsapatsoulis, Y. Avrithis, and S. Kollias, “Efficient face detection for multimedia applications,” in Proceedings of the IEEE International Conference on Image Processing, vol. 2, Vancouver, BC, Canada, Sep. 2000, pp. 247–250.

[102] S. Cooray and N. O’Connor, “Facial feature extraction and principal component analysis for face detection in color images,” in ICIAR 2004 – International Conference on Image Analysis and Recognition, 2004.

[103] C.-C. Han, H.-Y. M. Liao, G.-J. Yu, and L.-H. Chen, “Fast face detection via morphology-based pre-processing,” in ICIAP ’97: Proceedings of the 9th International Conference on Image Analysis and Processing – Volume II. London, UK: Springer-Verlag, 1997, pp. 469–476.

[104] W. Huang, Q. Sun, C.-P. Lam, and J.-K. Wu, “A robust approach to face and eyes detection from images with cluttered background,” in Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on, vol. 1, Brisbane, Qld., Aug. 1998, pp. 110–113.

[105] B. Takacs and H. Wechsler, “Face location using a dynamic model of retinal feature extraction,” in International Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, 1995, pp. 243–247.

[106] R. Herpers, M. Michaelis, K. H. Lichtenauer, and G. Sommer, “Edge and keypoint detection in facial regions,” in Automatic Face and Gesture Recognition, 1996. Proceedings of the Second International Conference on, Killington, VT, Oct. 1996, pp. 212–217.

[107] F. Smeraldi, O. Carmona, and J. Bigun, “Saccadic search with Gabor features applied to eye detection and real-time head tracking,” Elsevier Image and Vision Computing, no. 18, pp. 323–329, 2000.

[108] T. Yokoyama, Y. Yagi, and M. Yachida, “Active contour model for extracting human faces,” in Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on, vol. 1, Brisbane, Qld., Aug. 1998, pp. 673–676.

[109] S. Paschalakis and M. Bober, “A low cost FPGA system for high speed face detection and tracking,” in Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT), Tokyo, Japan, Dec. 2003, pp. 214–221.

[110] H. Wu, Q. Chen, and M. Yachida, “Face detection from color images using a fuzzy pattern matching method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 6, pp. 557–563, Jun. 1999.

[111] S. Z. Li and Z. Zhang, “FloatBoost learning and statistical face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1112–1123, Sep. 2004.

[112] C. Huang, H. Ai, Y. Li, and S. Lao, “High-performance rotation invariant multiview face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671–686, Apr. 2007.

[113] P. Viola and M. Jones, “Robust real-time object detection,” in Proceedings of the Second International Workshop on Statistical and Computational Theories of Vision – Modeling, Learning, Computing and Sampling, Jul. 13, 2001.

[114] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 72–86, 1991.

[115] K.-K. Sung and T. Poggio, “Example-Based Learning for View-Based Human Face Detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, pp. 39–51, Jan. 1998.

[116] R. Feraud, O. J. Bernier, J.-E. Viallet, and M. Collobert, “A fast and accurate face detector based on neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 1, pp. 42–53, Jan. 2001.

[117] A. V. Nefian and M. H. Hayes III, “Face detection and recognition using hidden Markov models,” in Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on, vol. 1, Chicago, IL, USA, Oct. 1998, pp. 141–145.

[118] A. V. Nefian and M. H. Hayes III, “Maximum likelihood training of the embedded HMM for face detection and recognition,” in Proceedings of the IEEE International Conference on Image Processing, vol. 1, Sep. 2000, pp. 33–36.

[119] P. Viola and M. Jones, “Fast multi-view face detection,” Mitsubishi Electric Research Laboratories, Tech. Rep., Jul. 2003.

[120] H. Schneiderman and T. Kanade, “Probabilistic Modelling of Local Appearance and Spatial Relationships for Object Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Jul. 1998, pp. 45–51.

[121] H. Schneiderman and T. Kanade, “A statistical method for 3D object detection applied to faces and cars,” in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, vol. 1, Hilton Head Island, SC, Jun. 2000, pp. 746–751.

[122] C. Liu, “A Bayesian discriminating features method for face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 6, pp. 725–740, Jun. 2003.

[123] E. Osuna, R. Freund, and F. Girosi, “Training support vector machines: An application to face detection,” in IEEE Proc. of Int. Conf. on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, Jun. 1997, pp. 130–136.

[124] S. Romdhani, P. Torr, B. Scholkopf, and A. Blake, “Computationally efficient face detection,” in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2, Vancouver, BC, Canada, 2001, pp. 695–700.

[125] C. A. Waring and X. Liu, “Face detection using spectral histograms and SVMs,” Systems, Man and Cybernetics, Part B, IEEE Transactions on, vol. 35, no. 3, pp. 467–476, Jun. 2005.

[126] M. Kirby and L. Sirovich, “Application of the Karhunen-Loeve procedure for the characterization of human faces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 103–108, Jan. 1990.

[127] C. S. S. Prasanna, N. Sudha, and V. Kamakoti, “A principal component neural network-based face recognition system and ASIC implementation,” in Proceedings of the 18th International Conference on VLSI Design, Jan. 2005, pp. 795–798.

[128] A. N. Rajagopalan, K. S. Kumar, J. Karlekar, R. Manivasakan, M. M. Patil, U. B. Desai, P. G. Poonacha, and S. Chaudhuri, “Finding faces in photographs,” in Computer Vision, 1998. Sixth International Conference on, Bombay, India, Jan. 1998, pp. 640–645.

[129] P. Phillips, H. Moon, S. Rizvi, and P. Rauss, “The FERET evaluation methodology for face-recognition algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090–1104, Oct. 2000.

[130] H. Rowley, S. Baluja, and T. Kanade, “Rotation invariant neural network-based face detection,” in Proceedings of the 1998 IEEE International Conference on Computer Vision and Pattern Recognition, Jun. 1998, pp. 38–44.

[131] D. B. Fogel, L. J. Fogel, and V. W. Porto, “Evolving neural networks,” Biological Cybernetics, vol. 63, no. 6, pp. 487–493, 1990.

[132] X. Yao, “Evolving artificial neural networks,” Proceedings of the IEEE, vol. 87, no. 9, pp. 1423–1447, Sep. 1999.

[133] K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” Evolutionary Computation, vol. 10, no. 2, pp. 99–127, 2002.

[134] S. Wiegand and C. Igel, “Evolutionary multi-objective optimization of neural networks for face detection,” Journal of Computational Intelligence and Applications, vol. 4, no. 3, pp. 1–15, 2004.

[135] J. Montgomery, “Detecting faces using evolution,” Master’s thesis, Dept. of Computer Science, Birmingham University, UK, 2004.

[136] M. Yang, D. Roth, and N. Ahuja, “A SNoW-based face detector,” in Advances in Neural Information Processing Systems, vol. 12, 2000.

[137] F. Samaria and S. Young, “HMM-based architecture for face identification,” Image and Vision Computing, vol. 12, pp. 537–543, Oct. 1994.

[138] J. Šochman and J. Matas, “AdaBoost with totally corrective updates for fast face detection,” in Automatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International Conference on, May 2004, pp. 445–450.

[139] B. Wu, H. Ai, C. Huang, and S. Lao, “Fast rotation invariant multi-view face detection based on real AdaBoost,” in Automatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International Conference on, May 2004, pp. 79–84.

[140] R. Etienne-Cummings, P. Pouliquen, and M. A. Lewis, “A vision chip for color segmentation and pattern matching,” EURASIP Journal on Applied Signal Processing, pp. 700–712, Jan. 2003.

[141] F. Boussaid, D. Chai, and A. Bouzerdoum, “On-chip skin detection for color CMOS imagers,” in Proceedings of the International Conference on MEMS, NANO and Smart Systems, 2003, Banff, Alberta, Canada, Jul. 2003, pp. 357–361.

[142] M. Krips, T. Lammert, and A. Kummert, “FPGA implementation of a neural network for a real-time hand tracking system,” in Electronic Design, Test and Applications, 2002. Proceedings. The First IEEE International Workshop on, Christchurch, Jan. 2002, pp. 313–317.

[143] T. Theocharides, G. Link, N. Vijaykrishnan, M. Irwin, and W. Wolf, “Embedded hardware face detection,” in Proceedings. 17th International Conference on VLSI Design, 2004, Mumbai, India, Jan. 2004, pp. 133–138.

[144] M. S. Sadri, N. Shams, M. Rahmaty, I. Hosseini, R. Changiz, S. Mortazavian, S. Kheradmand, and R. Jafari, “An FPGA based fast face detector,” in Global Signal Processing Expo and Conference (GSPX’04), Santa Clara, CA, Jul. 2004.

[145] F. Smach, M. Atri, J. Mitéran, and M. Abid, “Design of a neural networks classifier for face detection,” Journal of Computer Science, vol. 2, no. 3, pp. 257–260, 2006.

[146] D. Nguyen, D. Halupka, P. Aarabi, and A. Sheikholeslami, “Real-time face detection and lip feature extraction using field-programmable gate arrays,” Systems, Man and Cybernetics, Part B, IEEE Transactions on, vol. 36, no. 4, pp. 902–912, Aug. 2006.

[147] Y. Suzuki and T. Shibata, “Illumination-invariant face identification using edge-based feature vectors in pseudo-2D hidden Markov models,” in European Signal Processing Conference, Florence, Italy, Sep. 14, 2006.

[148] M. Yang, Y. Wu, J. Crenshaw, B. Augustine, and R. Mareachen, “Face detection for automatic exposure control in handheld camera,” in Computer Vision Systems, 2006. ICVS ’06. IEEE International Conference on, Jan. 2006, pp. 17–17.

[149] T. Nakano, T. Morie, and A. Iwata, “A face/object recognition system using FPGA implementation of coarse region segmentation,” in SICE 2003 Annual Conference, vol. 2, Aug. 2003, pp. 1552–1557.

[150] R. Chellappa, S. S. Bhattacharyya, S. Saha, W. Wolf, G. Aggarwal, J. Schlessman, and V. Kianzad, “An architectural level design methodology for embedded face detection,” in Hardware/Software Codesign and System Synthesis, 2005. CODES+ISSS ’05. Third IEEE/ACM/IFIP International Conference on, Jersey City, NJ, USA, Sep. 2005, pp. 136–141.

[151] R. Genov and G. Cauwenberghs, “Kerneltron: Support vector machine in silicon,” IEEE Transactions on Neural Networks, vol. 14, no. 5, pp. 1426–1434, Sep. 2003.

[152] F. Khan, M. Arnold, and W. Pottenger, “Hardware-based support vector machine classification in logarithmic number systems,” in IEEE International Symposium on Circuits and Systems 2005, May 2005, pp. 5124–5127.

[153] M. Egmont-Petersen, D. De Ridder, and H. Handels, “Image processing with neural networks – a review,” Pattern Recognition, vol. 35, pp. 2279–2301, 2002.

[154] I. R. Fasel and J. R. Movellan, “A comparison of face detection algorithms,” in International Conference on Artificial Neural Networks, Madrid, Spain, Aug. 2002, pp. 1325–1330.

[155] L. Reyneri, “Implementation issues of neuro-fuzzy hardware: going toward hw/sw codesign,” IEEE Transactions on Neural Networks, vol. 14, no. 1, pp. 176–194, Jan. 2003.

[156] K. O. Stanley, “Efficient evolution of neural networks through complexification,” Ph.D. dissertation, The University of Texas at Austin, USA, Aug. 2004.

[157] L. Fausett, Fundamentals of Neural Networks. Prentice-Hall, 1994.

[158] S. Haykin, Neural Networks: A Comprehensive Foundation. Prentice-Hall, 1999.

[159] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed. New York: Springer, 1998.

[160] X. Yao and Y. Liu, “A new evolutionary system for evolving artificial neural networks,” IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694–713, May 1997.

[161] X. Yao, “Evolving artificial neural networks,” Proceedings of the IEEE, vol. 87, no. 9, pp. 1423–1447, Sep. 1999.

[162] G. F. Miller, P. M. Todd, and S. U. Hegde, “Designing neural networks using genetic algorithms,” in Proceedings of the Third International Conference on Genetic Algorithms. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1989, pp. 379–384.

[163] X. Yao, “Evolutionary artificial neural networks,” International Journal of Neural Systems (IJNS), vol. 4, no. 3, pp. 203–222, Oct. 1993.

[164] M. Buckland, AI Techniques for Game Programming, 1st ed. Premier Press, 2002.

[165] R. W. Frischholz and U. Dieckmann, “BioID: a multimodal biometric identification system,” Computer, vol. 33, no. 2, pp. 64–68, Feb. 2000.

[166] O. Jesorsky, K. J. Kirchberg, and R. W. Frischholz, “Robust Face Detection Using the Hausdorff Distance,” in Lecture Notes in Computer Science, vol. 2091. Springer, 2001.

[167] comp.ai.neural-nets FAQ. Should I normalize / standardize / rescale the data? [Online]. Available: ftp://ftp.sas.com/pub/neural/FAQ2.html#A_std

[168] P. L. Bartlett, “For valid generalization the size of the weights is more important than the size of the network,” in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds., vol. 9. The MIT Press, 1997.

[169] D. Whitley, K. Mathias, and P. Fitzhorn, “Delta coding: An iterative search strategy for genetic algorithms,” in Proceedings of the Fourth International Conference on Genetic Algorithms, R. Belew and L. Booker, Eds. San Mateo, CA: Morgan Kaufmann, 1991, pp. 77–84.

[170] J. Bruton. (2005) Intelligent Systems – EE551 Taught M.Eng. Course Notes. Dublin City University. [Online]. Available: http://www.eeng.dcu.ie/~brutonj

[171] M. Casas-Sanchez, J. Rizo-Morente, and C. J. Bleakley, “Power Consumption Characterisation of the Texas Instruments TMS320VC5510 DSP,” in Proc. International Workshop on Power and Timing Modelling, Optimization and Simulation (PATMOS), Leuven, Belgium, Sep. 20–23, 2005.

[172] ARM Integrator Compact Platform. ARM Ltd., Cambridge, UK. [Online]. Available: http://www.arm.com/products/DevTools/IntegratorCP.html

[173] (2008) Blepo computer vision library. [Online]. Available: http://www.ces.clemson.edu/~stb/blepo/

[174] G. A. Ramirez and O. Fuentes, “Face detection using combinations of classifiers,” in Computer and Robot Vision, 2005. Proceedings. The 2nd Canadian Conference on, May 2005, pp. 610–615.

[175] Y.-S. Huang, H.-Y. Cheng, P.-F. Cheng, and C.-Y. Tang, “Face detection with high precision based on radial-symmetry transform and eye-pair checking,” in Video and Signal Based Surveillance, 2006. AVSS ’06. IEEE International Conference on, Sydney, Australia, Nov. 2006, pp. 62–62.

[176] B. Froba and C. Kublbeck, “Robust face detection at video frame rate based on edge orientation features,” in Automatic Face and Gesture Recognition, 2002. Proceedings. Fifth IEEE International Conference on, Washington, DC, May 2002, pp. 327–332.

[177] J. Wu and Z.-H. Zhou, “Efficient face candidates selector for face detection,” Elsevier Journal on Pattern Recognition, vol. 36, no. 5, pp. 1175–1186, May 2003.

[178] A. F. Murray, D. del Corso, and L. Tarassenko, “Pulse-stream VLSI neural networks mixing analog and digital techniques,” IEEE Transactions on Neural Networks, vol. 2, pp. 193–204, Mar. 1991.

[179] G. Cauwenberghs and M. Bayoumi, Learning on Silicon – Adaptive VLSI Neural Systems. Kluwer Academic, 1999.

[180] D. Zhang and S. K. Pal, Neural Networks and Systolic Array Design. New Jersey: World Scientific, 2002.

[181] U. Ruckert, “ULSI architectures for artificial neural networks,” IEEE Micro, vol. 22, no. 3, pp. 10–19, May 2002.

[182] A. R. Omondi and J. C. Rajapakse, FPGA Implementations of Neural Networks, 1st ed. Dordrecht, Netherlands: Springer, 2006.

[183] S. Kung and J. Hwang, “Digital VLSI Architectures for Neural Networks,” in IEEE International Symposium on Circuits and Systems, vol. 1, May 1989, pp. 445–448.

[184] Y. Kondo, Y. Koshiba, Y. Arima, M. Murasaki, T. Yamada, H. Amishiro, H. Mori, and K. Kyuma, “A 1.2 GFLOPS Neural Network Chip for High-Speed Neural Network Servers,” IEEE Journal of Solid-State Circuits, vol. 31, no. 6, pp. 860–864, Jun. 1996.

[185] B. Denby, P. Garda, B. Granado, C. Kiesling, J. C. Prevotet, and A. Wassatsch, “Fast Triggering in High-Energy Physics Experiments Using Hardware Neural Networks,” IEEE Transactions on Neural Networks, vol. 14, no. 5, pp. 1010–1027, Sep. 2003.

[186] S. Kung and J. Hwang, “A Unified Systolic Architecture for Artificial Neural Networks,” Journal of Parallel and Distributed Computing, vol. 6, no. 2, pp. 358–387, Apr. 1989.

[187] C. Lehmann, M. Viredaz, and F. Blayo, “A Generic Systolic Array Building Block For Neural Networks with On-Chip Learning,” IEEE Transactions on Neural Networks, vol. 4, pp. 400–407, May 1993.

[188] M. Viredaz and P. Ienne, “MANTRA I: a systolic neuro-computer,” in Proceedings of 1993 International Joint Conference on Neural Networks, vol. 3, Nagoya, Japan, Oct. 1993, pp. 3054–3057.

[189] J.-S. Ker, Y.-H. Kuo, and B.-D. Liu, “Design of a color reproduction neural network chip with on-chip learning capability,” in Proceedings, International Conference on Image Processing, 1996, vol. 2, Sep. 1996, pp. 1023–1026.

[190] J. Cloutier, E. Cosatto, S. Pigeon, F. R. Boyer, and P. Y. Simard, “VIP: an FPGA-based processor for image processing and neural networks,” in Proceedings of IEEE MicroNeuro ’96, 1996, pp. 330–336.

[191] A. Schmid, Y. Leblebici, and D. Mlynek, “Hardware realization of a Hamming neural network with on-chip learning,” in Proceedings of the IEEE International Symposium on Circuits and Systems, ISCAS ’98, vol. 3, May 31, 1998, pp. 191–194.

[192] H. Amin, K. Curtis, and B. Hayes-Gill, “Two-ring systolic array network for artificial neural networks,” IEE Proceedings Circuits, Devices and Systems, vol. 146, pp. 225–230, Oct. 1999.

[193] J. L. Ayala, A. G. Lomena, M. L. Lopez-Vallejo, and A. F. Herrero, “Design of a Pipelined Hardware Architecture for Real-Time Neural Network Computations,” in IEEE International Midwest Symposium on Circuits and Systems, Tulsa, Oklahoma, USA, Aug. 4–7, 2002.

[194] H. Kung, “Why systolic architectures?” IEEE Computer, vol. 15, pp. 37–46, Jan. 1982.

[195] G. G. Lendaris, R. M. Pap, R. E. Saeks, C. R. Thomas, and R. M. Akita, “Hardware Neural Network Implementation of Tracking System,” in Proceedings of IEEE Workshop on Neural Networks for Signal Processing, Ermioni, Greece, Sep. 1994, pp. 451–460.

[196] M. A. Arbib, The Handbook of Brain Theory and Neural Networks, 2nd ed. Cambridge, MA: MIT Press, 2002.

[197] T. Szabo, B. Feher, and G. Horvath, “Neural network implementation using distributed arithmetic,” in International Conference on Knowledge-Based Intelligent Electronic Systems, vol. 3, Apr. 1998, pp. 510–518.

[198] F. Elguibaly, “A fast parallel multiplier-accumulator using the modified Booth algorithm,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 9, pp. 902–908, Sep. 2000.

[199] Y. Liao and D. Roberts, “A high-performance and low-power 32-bit multiply-accumulate unit with single-instruction-multiple-data (SIMD) feature,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 926–931, Jul. 2002.

[200] D. Tan, A. Danysh, and M. Liebelt, “Multiple-precision fixed-point vector multiply-accumulator using shared segmentation,” in ARITH ’03: Proceedings of the 16th IEEE Symposium on Computer Arithmetic (ARITH-16’03). Washington, DC, USA: IEEE Computer Society, 2003, p. 12.

[201] A. Fayed, W. Elgharbawy, and M. Bayoumi, “A data merging technique for high-speed low-power multiply accumulate units,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004, vol. 5, May 2004, pp. 145–148.

[202] O. Kwon, K. Nowka, and E. E. Swartzlander, Jr., “A 16-bit by 16-bit MAC design using fast 5:3 compressor cells,” Journal of VLSI Signal Processing Systems, vol. 31, no. 2, pp. 77–89, 2002.

[203] D. Myers and R. Hutchinson, “Efficient implementation of piecewise linear activation function for digital VLSI neural networks,” Electronics Letters, vol. 25, no. 24, pp. 1662–1663, Nov. 1989.

[204] C. Alippi and G. Storti-Gajani, “Simple approximation of sigmoid functions: Realistic design of digital VLSI neural networks,” in Proceedings of the IEEE Int’l Symp. Circuits and Systems, Jun. 1991, pp. 1505–1508.

[205] H. Amin, K. Curtis, and B. Hayes-Gill, “Piecewise linear approximation applied to nonlinear function of a neural network,” IEE Proceedings Circuits, Devices and Systems, vol. 144, pp. 313–317, Dec. 1997.

[206] S. Vassiliadis, M. Zhang, and J. Delgado-Frias, “Elementary function generators for neural-network emulators,” IEEE Transactions on Neural Networks, vol. 11, no. 6, pp. 1438–1449, Nov. 2000.

[207] H. Faiedh, Z. Gafsi, and K. Besbes, “Digital hardware implementation of sigmoid function and its derivative for artificial neural networks,” in Proceedings of the 13th International Conference on Microelectronics, Rabat, Morocco, Oct. 2001, pp. 189–192.

[208] K. Basterretxea, J. Tarela, and I. del Campo, “Approximation of sigmoid function and the derivative for hardware implementation of artificial neurons,” IEE Proceedings of Circuits, Devices and Systems, vol. 151, no. 1, pp. 18–24, Feb. 2004.

[209] M. Tommiska, “Efficient digital implementation of the sigmoid function for reprogrammable logic,” IEE Proceedings Computers and Digital Techniques, vol. 150, pp. 403–411, Nov. 2003.

[210] A. W. Savich, M. Moussa, and S. Areibi, “The impact of arithmetic representation on implementing MLP-BP on FPGAs: A study,” IEEE Transactions on Neural Networks, vol. 18, no. 1, pp. 240–252, Jan. 2007.

[211] S. Draghici, “On the capabilities of neural networks using limited precision weights,” Neural Networks, vol. 15, no. 3, pp. 395–414, Apr. 2002.

[212] J. Holt and J. Hwang, “Finite precision error analysis of neural network hardware implementations,” IEEE Transactions on Computers, vol. 42, no. 3, pp. 280–291, Mar. 1993.

[213] H. Wust, K. Kasper, and H. Reininger, “Hybrid number representation for the FPGA-realization of a versatile neuro-processor,” in Euromicro Conference, 1998. Proceedings. 24th, vol. 2, Vasteras, Aug. 1998, pp. 694–701.

[214] B. Gaines, “Stochastic Computing Systems,” in Advances in Information Systems Science. New York: Plenum Press, 1969.

[215] B. D. Brown and H. C. Card, “Stochastic Neural Computation I: Computational Elements,” IEEE Transactions on Computers, vol. 50, no. 9, pp. 891–905, Sep. 2001.

[216] D. Larkin, A. Kinane, V. Muresan, and N. O’Connor, “An Efficient Hardware Architecture for a Neural Network Activation Function Generator,” in International Symposium on Neural Networks, Chengdu, China, May 2006.

[217] The MathWorks – MATLAB and Simulink for Technical Computing. [Online]. Available: http://www.mathworks.com/

[218] A. Omondi, Computer Arithmetic Systems. Prentice Hall, 1994.

[219] T. Schoenauer, S. Atasoy, N. Mehrtash, and H. Klar, “NeuroPipe-Chip: A digital neuro-processor for spiking neural networks,” IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 205–213, Jan. 2002.

[220] R. Gadea, J. Cerda, F. Ballester, and A. Macholi, “Artificial neural network implementation on a single FPGA of a pipelined on-line backpropagation,” in Proceedings of the 13th International Symposium on System Synthesis, Sep. 2000, pp. 225–230.

[221] D. Lettnin, A. Braun, M. Bogdan, J. Gerlach, and W. Rosenstiel, “Synthesis of embedded SystemC design: a case study of digital neural networks,” in Design, Automation and Test in Europe Conference and Exhibition, 2004. Proceedings, vol. 3, Feb. 2004, pp. 248–253.

[222] S. Popescu, “Hardware implementation of fast neural networks using CPLD,” in Neural Network Applications in Electrical Engineering, 2000. NEUREL 2000. Proceedings of the 5th Seminar on, Belgrade, Sep. 2000, pp. 121–124.

[223] K. M. Yang, M. T. Sun, and L. Wu, “A family of VLSI designs for the motion compensation block-matching algorithm,” IEEE Transactions on Circuits and Systems, vol. 36, no. 10, pp. 1317–1325, Oct. 1989.

[224] T. Komarek and P. Pirsch, “Array architectures for block matching algorithms,” IEEE Transactions on Circuits and Systems, vol. 36, no. 10, pp. 1301–1308, Oct. 1989.

[225] C. H. Hsieh and T. P. Lin, “VLSI architecture for block-matching motion estimation algorithm,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, no. 2, pp. 169–175, Jun. 1992.

[226] E. Chan and S. Panchanathan, “Motion Estimation Architecture for Video Compression,” IEEE Transactions on Consumer Electronics, vol. 39, no. 3, pp. 292–297, Aug. 1993.

[227] Y.-S. Jehng, L.-G. Chen, and T.-D. Chiueh, “An efficient and simple VLSI tree architecture for motion estimation algorithms,” IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 889–900, Feb. 1993.

[228] P. N. Swamy, I. Chakrabarti, and D. Ghosh, “Architecture for motion estimation using the one-dimensional hierarchical search block-matching algorithm,” in Computers and Digital Techniques, IEE Proceedings, vol. 149, no. 5, Sep. 2002, pp. 229–239.

[229] H. Nakayama, T. Yoshitake, H. Komazaki, Y. Watanabe, H. Araki, K. Morioka, J. Li, L. Peilin, S. Lee, H. Kubosawa, and Y. Otobe, “An MPEG-4 Video LSI with an Error Resilient Codec Core Based on a Fast Motion Estimation Algorithm,” in Proc. of IEEE ISSCC, vol. 2, Feb. 2002, p. 296.

[230] Y.-W. Huang, S.-Y. Chien, B.-Y. Hsieh, and L.-G. Chen, “Global elimination algorithm and architecture design for fast block matching motion estimation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 6, pp. 898–907, Jun. 2004.

[231] S. Y. Yap and J. V. McCanny, “A VLSI architecture for variable block size video motion estimation,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 51, no. 7, pp. 384–389, Jul. 2004.

[232] C.-Y. Kao and Y.-L. Lin, “An AMBA-compliant motion estimation for H.264 advanced video coding,” in Proceedings of 2004 International Conference on SOC Design, 2004.

[233] L. Deng, W. Gao, M. Z. Hu, and Z. Z. Ji, “An efficient hardware implementation for motion estimation of the AVC standard,” IEEE Transactions on Consumer Electronics, vol. 51, no. 4, pp. 1360–1366, Nov. 2005.

[234] C. Wei, M. Z. Gang, L. Z. Qiang, and Z. Yan, “VLSI architecture design for variable-size block motion estimation in MPEG-4 AVC/H.264,” in Circuits and Systems, 2004. Proceedings. The 2004 IEEE Asia-Pacific Conference on, vol. 1, Dec. 2004, pp. 617–620.

[235] C. Wei and M. Z. Gang, “Reconfigurable VLSI architecture for VBSME in MPEG-4 AVC/H.264,” in ASIC, 2005. ASICON 2005. 6th International Conference On, vol. 1, Oct. 2005, pp. 265–269.

[236] C. Wei and M. Z. Gang, “A novel VLSI architecture for VBSME in MPEG-4 AVC/H.264,” in Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on, May 2005, pp. 1794–1797.

[237] Y.-K. Lai and L.-G. Chen, “A data-interlacing architecture with two-dimensional data-reuse for full-search block-matching algorithm,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 2, pp. 124–127, Apr. 1998.

[238] J.-C. Tuan, T.-S. Chang, and C.-W. Jen, “On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 1, pp. 61–72, Jan. 2002.

[239] Y. Song, Z. Liu, S. Goto, and T. Ikenaga, “A scalable VLSI architecture for variable block size integer motion estimation in H.264/AVC,” IEICE Transactions on Fundamentals, vol. 89, no. 4, pp. 979–987, Apr. 2006.

[240] V. Do et al., “A Low-Power VLSI Architecture for Full-Search Block-Matching Motion Estimation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 4, pp. 393–398, Aug. 1998.

[241] L. Sousa and N. Roma, “Low-power array architectures for motion estimation,” in Multimedia Signal Processing, 1999 IEEE 3rd Workshop on, Copenhagen, Denmark, 1999, pp. 679–684.

[242] K.-H. Lam and C.-Y. Tsui, “Low power 2-D array VLSI architecture for block matching motion estimation using computation suspension,” in Signal Processing Systems, 2000. SiPS 2000. 2000 IEEE Workshop on, Lafayette, LA, USA, 2000, pp. 60–69.

[243] R. Richmond and D. Ha, “Low-power motion estimation block for low bit-rate wireless video,” in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), CA, USA, Aug. 2001, pp. 60–63.

[244] M. Takahashi, T. Nishikawa, M. Hamada, T. Takayanagi, H. Arakida, N. Machida, H. Yamamoto, T. Fujiyoshi, Y. Ohashi, O. Yamagishi, T. Samata, A. Asano, T. Terazawa, K. Ohmori, Y. Watanabe, H. Nakamura, S. Minami, T. Kuroda, and T. Furuyama, “A 60-MHz 240-mW MPEG-4 Videophone LSI with 16-Mb Embedded RAM,” IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1713–1721, Nov. 2000.

[245] S. Lopez, F. Tobajas, A. Villar, V. de Armas, J. F. Lopez, and R. Sarmiento, “Low cost efficient architecture for H.264 motion estimation,” in Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on, May 2005, pp. 412–415.

[246] Y.-K. Ko, H.-G. Kim, J.-W. Lee, Y.-R. Kim, H.-C. Oh, and S.-J. Ko, “New motion estimation algorithm based on bit-plane matching and its VLSI implementation,” in TENCON 99. Proceedings of the IEEE Region 10 Conference, vol. 2, Cheju Island, Sep. 1999, pp. 848–851.

[247] A. Celebi, S. Erturk, and A. Tangel, “A VLSI implementation for fast all-Boolean motion estimation based on pre-coded image planes matching,” in Conference on Electrical and Electronics Engineering (ELECO 2003), vol. 1, 2003, pp. 36–40.

[248] M. Mizuki, U. Desai, I. Masaki, and A. Chandrakasan, “A binary block matching architecture with reduced power consumption and silicon area,” in Proc. IEEE ICASSP-96, vol. 6, Atlanta, USA, 1996, pp. 3248–3251.

[249] B. Natarajan, V. Bhaskaran, and K. Konstantinides, “Low complexity block based motion estimation via one bit transform,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 4, pp. 702–706, Aug. 1997.

[250] J. Feng, K. T. Lo, H. Mehrpour, and A. E. Karbowiak, “Adaptive block matching motion estimation algorithm using bit plane matching,” in IEEE Int. Conf. Image Processing, Washington, DC, USA, 1995, pp. 496–499.

[251] P. H. W. Wong and O. C. Au, “Modified one-bit transform for motion estimation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 7, pp. 1020–1024, Oct. 1999.

[252] X. Song, T. Chiang, X. Lee, and Y.-Q. Zhang, “New fast binary pyramid motion estimation for MPEG2 and HDTV encoding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 7, pp. 1015–1028, Oct. 2000.

[253] J. H. Luo, C. N. Wang, and T. Chiang, “A Novel All-Binary Motion Estimation (ABME) with Optimized Hardware Architectures,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 8, pp. 700–712, Aug. 2002.

[254] D. Yu, S. K. Jang, and J. B. Ra, “Fast motion estimation for shape coding in MPEG-4,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 4, pp. 358–363, Apr. 2003.

[255] K. Panusopone and X. Chen, “A fast motion estimation method for MPEG-4 arbitrarily shaped objects,” in Proc. of IEEE Int. Conf. Image Processing, vol. 3, 2000, pp. 624–627.

[256] T.-H. Tsai and C.-P. Chen, “A fast binary motion estimation algorithm for MPEG-4 shape coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 6, pp. 908–913, Jun. 2004.

[257] H.-C. Chang, Y.-C. Chang, Y.-C. Wang, W.-M. Chao, and L.-G. Chen, “VLSI Architecture Design of MPEG-4 Shape Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 9, Sep. 2002.

[258] K.-B. Lee, H.-Y. Chin, N. Y.-C. Chang, H.-J. Hsu, and C.-W. Jen, “A Memory-Efficient Binary Motion Estimation Architecture for MPEG-4 Shape Coding,” in Proceedings of the European Conference on Circuit Theory and Design, vol. 2, Sep. 2003, pp. 93–96.

[259] K.-B. Lee, H.-Y. Chin, N. Y.-C. Chang, H.-C. Hsu, and C.-W. Jen, “Optimal frame memory and data transfer scheme for MPEG-4 shape coding,” IEEE Transactions on Consumer Electronics, vol. 50, no. 1, pp. 342–348, Feb. 2004.

[260] D. Larkin, V. Muresan, and N. O’Connor, “A Low Complexity Hardware Architecture for Motion Estimation,” in Proc. IEEE International Symposium on Circuits and Systems (ISCAS), Kos, Greece, May 21–24, 2006, pp. 2677–2680.
