
USING COMPUTER VISION IN RETAIL ANALYTICS

MARCUS NORRGÅRD STRATACACHE OY

Master of Science Thesis
Supervisors: Annamari Soini, Johan Lilius
Advisor: Niclas Jern
Software Engineering
Department of Information Technology
Åbo Akademi University
2020


ABSTRACT

This thesis comprises the creation of a computer vision-based analytics platform for making age- and gender-based inferences for retail analytics. Computer vision in retail spaces is an emerging field with interesting opportunities for research. The thesis utilizes modern technologies to create a pipeline for training two convolutional neural networks for classifying age and gender. Furthermore, it examines deployment and inference computing of the trained neural network models on a Raspberry Pi single-board computer. The results presented in this thesis demonstrate the feasibility of the created solution.


ACKNOWLEDGEMENTS

I would like to thank Professor Johan Lilius and my supervisor Annamari Soini for their help and guidance during the writing of this thesis. I would also like to thank my advisor Niclas Jern for his support and my employer Stratacache OY for giving me the opportunity to make this project a reality.

Marcus Norrgård

Turku, April 30, 2020


Table of Contents

ABSTRACT ...... i

ACKNOWLEDGEMENTS ...... ii

1 INTRODUCTION ...... 1

1.1 Aim ...... 1

1.2 Stakeholders ...... 1

1.3 Problem ...... 2

1.4 Relevance ...... 3

1.5 Solution criteria ...... 3

1.6 Tools ...... 4

1.7 Ethical discussion ...... 5

1.8 Structure of the thesis ...... 7

2 BACKGROUND AND TERMINOLOGY...... 8

2.1 Computer vision ...... 8

2.2 Digital image fundamentals ...... 9

2.3 Transforming digital images and convolutions ...... 10

2.4 Image classification and object detection ...... 11

2.5 Machine learning ...... 12

2.6 Parameterized learning ...... 13

2.6.1 Training data ...... 14

2.6.2 Scoring function ...... 15

2.6.3 Weights and bias ...... 16

2.6.4 Loss functions ...... 16

2.7 Optimization ...... 16

2.8 Neural networks and deep learning ...... 18

2.9 Convolutional neural networks ...... 20


2.10 Training process ...... 22

2.11 Underfitting and overfitting ...... 23

2.12 Haar-cascade classifiers ...... 24

2.13 Transfer learning and fine-tuning...... 25

3 CURRENT TECHNOLOGIES AND OTHER PLAYERS ...... 26

3.1 Adience ...... 26

3.2 TensorFlow and Keras ...... 27

3.3 ImageNet ...... 27

3.4 MxNet ...... 27

3.5 Neural compute devices ...... 28

3.6 Google Coral ...... 28

3.7 Intel Movidius NCS2 ...... 29

4 DESIGNING THE COMPUTER VISION SOLUTION ...... 30

4.1 Requirements...... 30

4.2 Convolutional neural network ...... 33

4.3 Machine learning pipeline ...... 34

4.3.1 Splitting and preparing the dataset ...... 35

4.3.2 Pre-processing ...... 36

4.3.3 Training, packaging, and validating the model ...... 39

4.4 Computer vision device...... 40

5 DESCRIPTION OF THE MACHINE LEARNING PIPELINE ...... 42

5.1 Training process ...... 42

5.2 Training results...... 43

5.3 Discussion ...... 49

6 EVALUATION AND TESTING OF THE COMPUTER VISION DEVICE ...... 52

6.1 Performing inference on the computer vision device ...... 52

6.1.1 Performance test 1: Frames per second ...... 52


6.1.2 Performance test 2: Inference time ...... 53

6.1.3 Performance test 3: Model loading time ...... 53

6.1.4 Example output ...... 53

6.2 Discussion ...... 54

7 CONCLUSION AND FUTURE WORK ...... 57

7.1 Conclusion ...... 57

7.2 Future implementation ...... 58

SUMMARY IN SWEDISH – SAMMANFATTNING PÅ SVENSKA ...... 59

BIBLIOGRAPHY ...... 62


1 INTRODUCTION

The purpose of this thesis is to design and implement a proof-of-concept computer vision (CV) solution for gathering retail analytics insights for Stratacache OY. Further, the thesis tests and benchmarks how well the developed computer vision solution performs. The thesis starts by explaining core concepts and engineering technologies that are commonly used in computer vision, image recognition, and object detection.

1.1 Aim

The thesis focuses on the technical aspects of CV. First, it briefly introduces and compares so-called traditional computer vision techniques with modern state-of-the-art technologies based on convolutional neural networks. It presents the technologies that are most popular and relevant today and accounts for the development of the system. Secondly, it examines the trade-off between hardware limitations, accuracy, and performance and presents the findings.

1.2 Stakeholders

The author of this thesis is a software engineer at Stratacache OY. This thesis documents work done for Stratacache OY, a company that specializes in retail analytics. The thesis serves as a research project to evaluate the possible suitability of a computer vision-based analytics solution for Stratacache OY.

The company aims to give retailers a broader understanding of the behavior of their customers. Insights are produced by combining different data sources, such as Wi-Fi, Bluetooth, people counters, and point-of-sale data. The generated insight allows Stratacache’s customers to optimize sales. New technologies that could help retailers streamline the experience for their customers are interesting to an analytics company like Stratacache OY. Recently, the processing capabilities of small single-board computers have increased significantly. This development has opened the doors for more robust and affordable computer vision solutions.


1.3 Problem

Utilizing computer vision in a retail environment could function as an interesting data source for Stratacache OY. As an analytics company, Stratacache OY is interested in technologies that could enable insights from people flow, e.g., classifying faces of the people inside a store. A computer vision solution with face detection could allow for demographic analytics of people flow and could even capture attention span, mood, or age in a retail environment. The goal of this thesis is to create a computer vision-based solution that can infer useful age and gender results from a video feed.

Stratacache’s customers are retailers, and thus the product is required to minimize any impact it has on the customer concerning space required or network bandwidth. The solution should preferably be both affordable and small in size. The less of a visible impact it makes, the better. Further, a small-sized device makes installation easier. Today, there are multiple small single-board computers available that can handle computer vision tasks. For this thesis, it was decided that the third generation of the Raspberry Pi single-board computer, together with the Raspberry Pi camera module, would function as a useful reference platform. Simultaneously, the Raspberry Pi’s modest hardware specifications will work as a minimum hardware requirement for a final computer vision solution.

Two different problems need to be solved to obtain analytics data from the video stream. Firstly, a technique that finds the faces in each frame is required. Secondly, a way to make classifications on the faces that are found is needed. The design has to take into consideration the usually resource-heavy parts of computing classifications. Looking at the problem from a programming standpoint, the task boils down to being able to capture as many frames as possible from the camera video feed and simultaneously find and classify each face in the frame. There must be a balance between the choice of techniques and the acquired frame rate. A more accurate detector and classifier will most likely demand more from the hardware, which results in a lower frame rate. A low frame rate, in turn, results in missed faces, which means poor analytics. As resources will be scarce, we will need to find techniques that allow for lightweight face detection and analytical classification, without sacrificing too much accuracy or losing too many frames.


A possible solution to this problem would be to use the computational power of a data center for the analytics part. The device could send each detected face for inference. There are some issues with this solution, apart from the ones already mentioned. A cloud-based system would require sufficient bandwidth to maintain an acceptable frame rate for the classification algorithms to work. This does not sit well with the requirement of trying to minimize the footprint on the customer’s infrastructure, as it is essential to keep any bandwidth consumption as low as possible. Sending large amounts of image data over a customer’s network would be both expensive and would put an unnecessary load on the system.

1.4 Relevance

Computer vision-based analytics is an emerging field and is interesting for an analytics company such as Stratacache. Many competing analytics companies provide computer vision-based products and services that can produce insight into the physical world. Staying up to date with the industry is naturally important.

Furthermore, the type of computations needed for making analytical classifications of image data has become increasingly accessible, as multiple neural-network-specialized platforms and computing devices have become available. Therefore, it is relevant for Stratacache OY to research whether the development of an in-house computer vision solution for retail analytics is viable.

1.5 Solution criteria

The criteria for the computer vision solution are simple. The device itself needs to be small in size. The small size facilitates the installation and overall management of the device. The computer vision solution is required to run on a device that has similar hardware specifications as the rest of the company’s devices. The device should have a camera attached to it, and lastly, use the neural network model to make classifications locally. This means that the device has to be able to make classifications, called inference, of the age and gender of the faces found in the video feed.


1.6 Tools

The book series Deep Learning for Computer Vision with Python [1] [2] [3], by Adrian Rosebrock, has been useful for this project. The books contain tutorials and suggestions on how to solve many of the problems related to training and designing neural networks. Due to the experimental nature of training neural network models, many pitfalls could be avoided thanks to the information in the books. Inspiration for both the computer vision device and the training pipeline was found in the guides and tutorials provided by the book series.

The created solution will be designed to run on the single-board computer Raspberry Pi 3B+ with an attached Raspberry Pi Camera Module V2. The software that is written for the implemented solution will be written in Python. The Python programming language is popular when it comes to working with scientific computing, such as neural networks, and it has a wide selection of libraries and frameworks related to CV.

The frameworks and libraries handle tasks related to the creation of a neural network, object detection, and parallelization. Object detection utilizes the Open Source Computer Vision Library [4], OpenCV, one of the most popular computer vision libraries. OpenCV has existed since the late 1990s, and thanks to its popularity, it has an extensive set of bindings for the most popular programming languages, including Python. OpenCV is written in C/C++, making it highly optimized and efficient.

The creation of the neural network is done with the Keras [5] neural network library, which uses the TensorFlow [6] machine learning framework as a backend. Both are popular tools for training machine learning models.

Imutils [7] is an image processing library for OpenCV and Python. It is maintained by computer vision and machine learning researcher Adrian Rosebrock. Imutils contains methods for parallelizing I/O tasks related to reading video feeds from a camera.

In many parts of this project, the data is ordered in array form, which comes in handy when working with digital pictures. The NumPy [8] Python library is used for efficient numerical computing. It allows data to be stored in large array structures and contains convenient tools for manipulating them.


The training of the neural networks produced will be performed on a graphics processing unit (GPU). The PC that executes the training process is fitted with an Nvidia GTX 980 Ti GPU.

1.7 Ethical discussion

Later chapters and sections present techniques for performing age and gender prediction from a technical standpoint. However, it is equally important to analyze the impact of computer vision in a public space from a legal and ethical perspective. Face detection is a controversial subject, and this section will briefly discuss problem areas and limitations that a hypothetical real-world solution could encounter.

The goal of this thesis is to produce a solution for predicting and classifying both age and gender. Hence, it is crucial to clarify that age and gender classification metrics are considered to be solely indicative. Detecting and classifying people based on their looks is a difficult task, even with the latest technologies. Every person is created differently, and predicting the age or the gender of a person based purely on visual features will always be error-prone, not to mention that gender is not binary. Further, a person’s appearance can vary due to random factors in the environment. Light conditions, weather, and the placement of the camera can all have a drastic impact on how a person would be classified from the camera’s point of view. Not only does the environment introduce its own bias, but the limitations of the technologies in use also have an impact on the classifications. This is why it is important to remember that such a system cannot and should not be used with the intention of producing perfect age or gender classifications. Classifications will be purely indicative. Nevertheless, they still provide valuable insight into the retail world.

Legislation sets its own limits on what is possible in regard to the problem this thesis is trying to solve. Stratacache OY has customers worldwide, and data protection laws vary from country to country. Much has been done regarding data practices within the European Union. The EU has arguably the strictest data protection laws in the world. These laws specify what is considered personal and biometric information. An image of a person’s face is considered biometric data, and thus highly sensitive. The European Union gives all member states some room for interpretation, allowing them to introduce conditions to their data protection laws. However, the recently implemented General Data

Protection Regulation [9], the GDPR, states in Article 9 that all biometric data requires the consent of the person whose information is collected. This makes it difficult, if not impossible, for any company to use computer vision analytics in a retail environment.

Camera-based technologies such as the one developed in this thesis are, understandably, met with caution by the general public. The situation is not improved by totalitarian regimes outside of the EU ignoring any sense of morality and employing computer vision systems for Orwellian surveillance. Furthermore, there should be no doubt about the responsibilities placed on the companies utilizing computer vision systems in public areas. That said, EU citizens can feel very safe when it comes to these kinds of situations.

Stratacache OY has been working with a wide range of analytics data, such as Wi-Fi data, for a long time, and proper handling of sensitive data has always been a cornerstone for the company. Herein lies the reason for developing this system in-house instead of relying on a third-party service: to be able to verify and control the correct and ethical handling of the data and the product itself. The developed face detection software would be expected to follow the same rigorous privacy pattern as the rest of Stratacache’s products and would need to exercise equal levels of data anonymization. This means that under no circumstances would the system store or otherwise spread identifiable data, and the data that is collected would be transformed in such a way that it would only have statistical value for the company’s analytics platform. This way, we can ensure that the system cannot contain features that would allow it to be exploited for harmful intent, either by a third party or someone within the company. There are several ways that a system can ensure the security of the data that it collects, from encryption to physical locks on the device itself. However, the specifications set by the task of this thesis allow for a fairly basic approach that will ensure that the data is handled correctly.

The goal of the system is to be able to handle all of the detection onboard the device itself. This strategy will have an impact on the security of the system itself. Technically, the problem could be solved locally or remotely. A computer vision device could offload some, or all, of the classifying and detection work, the so-called inference part, by uploading the stream of images to a data center for further analysis. However, one could argue that a local image detection solution, ensuring that no image data ever leaves the device, is more secure compared to one that relies on the computational power of a

data center upstream. A cloud-based approach could, hypothetically, leave sensitive biometric data open to various attacks and increase the risk of data falling into the wrong hands. A local system that ensures that no image data is stored during inference is vital to guarantee that no sensitive data can be misused. Even in the unlikely scenario that the device itself would end up in the hands of a third party, it would be difficult to obtain any sensitive data.

The system developed in this thesis will never be able to produce any metrics other than the accuracies for the two classes specified in this thesis: age and gender. All classifications are produced in the form of a two-dimensional array. The only produced metric is the numerical prediction probability in each element of the array. In conclusion, this means that none of the steps taken to produce a successful classification will leave identifiable image data on the device. Further, a classification can never be tied back to a person. Later chapters will discuss the structure of the outputted classes in depth and define how classification is done.

1.8 Structure of the thesis

The structure of this thesis is as follows. Chapter 2 presents terminology and gives a technical explanation of the background of core concepts. Chapter 3 presents technologies that are used in this thesis. Chapter 4 examines the steps required to produce the computer vision solution and presents the requirements for it. Chapter 5 describes the training process of the two neural network models, age and gender, and presents the results acquired during training. Chapter 6 presents the results of running the neural network models on the computer vision device. Chapter 7 draws conclusions and discusses future work.


2 BACKGROUND AND TERMINOLOGY

2.1 Computer vision

Computer vision, CV, is a broad field and includes everything from acquiring, processing, and analyzing digital images to enable computers to gain a level of understanding of the visual world. Computer vision is widely used in automation processes such as manufacturing and logistics but is frequently found in everyday products such as cars or consumer electronics too.

A computer can build an understanding of the visual world with the help of digital cameras. The camera relays a constant feed of frames in the form of arrays of pixel intensities. From these frames or arrays, the pixel intensities can be combined, and together with object detection algorithms, the computer establishes a higher-level understanding of the image.

In this thesis, the algorithms for creating an understanding of the input frames are called detectors and neural networks. These will be explained in detail in later sections. Generally, the flow for these types of algorithms is to first find the object, or objects, in the frame and then react to the objects in one way or another. In manufacturing processes, computer vision can be used to sort out faulty products from a production line. In consumer electronics, computer vision can be used to set the autofocus target of a camera when it finds the face of a person in the viewfinder.

This thesis utilizes the OpenCV library thanks to many of its convenience functions for transforming and processing image data, both in the training phase and later when deployed on the edge device.


2.2 Digital image fundamentals

Computer vision techniques consume image data as input. Images are usually represented as multi-dimensional arrays to make them understandable for a computer. The smallest building block of a digital image is the pixel. A digital image consists of several pixels forming a grid-like structure. Programmatically, this structure is used to translate the image into a multi-dimensional array, where each pixel is placed in its corresponding array element.

Colored images are expressed in the so-called RGB color space, RGB being an abbreviation for red, green, and blue. Each color is represented by a value ranging from 0 to 255, where the integer value dictates the intensity of each color, 255 for the most intense gradient and 0 for none. Together, the three values form a triplet of integer numbers that can form any 24-bit color. Consequently, each pixel of an image can be represented in a three-dimensional coordinate system. Figure 1 visualizes how the colors of an RGB image can be split into each separate channel.

The size of an image is represented by its width and height, e.g., 1920 × 1080 pixels. An image in the RGB color space could, therefore, be described as a three-dimensional matrix of the size 1920 × 1080 × 3, where the third dimension consists of the RGB values of each pixel. The array coordinate system is used for transforming digital images. Each pixel of a grayscale image is represented by a single value ranging from 0 to 255.


Figure 1. Each of the RGB color channels combined form the original image. Pixel intensities for the first row of pixels are represented in array form above each of the images. Original image [10]
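As a concrete illustration, the following sketch shows how such an image is represented as a NumPy array when loaded with OpenCV. The file name is only a placeholder; note also that OpenCV orders the channels as BGR rather than RGB.

```python
import cv2

# Load an image as a NumPy array; the file name is only an example.
# OpenCV stores channels in BGR order rather than RGB.
image = cv2.imread("example.jpg")

print(image.shape)   # e.g., (1080, 1920, 3): height x width x channels
print(image.dtype)   # uint8, i.e., intensities in the range 0-255

# Access the pixel at row 0, column 0 and split it into its channels.
b, g, r = image[0, 0]
print(r, g, b)

# Convert to grayscale: each pixel becomes a single 0-255 intensity.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
print(gray.shape)    # (1080, 1920)
```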

2.3 Transforming digital images and convolutions

Image transformations utilize the multidimensional matrix structure of the digital image. The positional coordinates enable us to use functions that “slide” over the picture, applying different transformations on the values. Image transformation is, in its simplest form, mathematical calculations done on the values in the image matrix, element by element. Regular computer graphics functions such as sharpening, blurring, or rotating are all examples of image transformations. These image transformations are usually called convolutions or filters.

A convolution is calculated within the area of a smaller matrix, a so-called kernel. Kernels are small in size, smaller than the whole image itself, and they slide over the image surface. The kernel applies a convolution to the area it covers by taking the pixel values

of the original image and applying the kernel convolution. Kernels can be used to blur, sharpen, or shift the values of a digital image. Figure 2 visualizes the effect of applying the so-called Gaussian blur, with a 3×3 kernel, on an image. Gaussian blur produces a smoothing effect, which is often utilized in computer graphics to reduce the noise in images. Smoothing is commonly applied before edge detection, as suppressing noise first prevents spurious edges from being detected; edge detection methods therefore often make use of the Gaussian blur function. Fundamentally, a convolution is an element-wise multiplication of two matrices followed by a sum. Kernels performing convolutions are one of the building blocks of convolutional neural networks, which will be discussed in later sections.

Figure 2. An example of the result of a convolutional kernel applying blur to an input image [11] and its resulting output [12].
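The element-wise multiply-and-sum can be made explicit with a small sketch. The following applies a 3×3 Gaussian kernel to a grayscale image, first with a naive sliding-window loop and then with OpenCV's optimized equivalent; the file name is a placeholder.

```python
import cv2
import numpy as np

# A 3x3 Gaussian kernel: the weights sum to 1, so overall brightness
# is preserved while each pixel is averaged with its neighbors.
kernel = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]], dtype=np.float32) / 16.0

gray = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Manual convolution: slide the kernel over the image and, at each
# position, take the element-wise product followed by a sum.
h, w = gray.shape
output = np.zeros((h - 2, w - 2), dtype=np.float32)
for y in range(h - 2):
    for x in range(w - 2):
        region = gray[y:y + 3, x:x + 3]
        output[y, x] = np.sum(region * kernel)

# OpenCV's optimized equivalent (which also handles image borders):
blurred = cv2.filter2D(gray, -1, kernel)
```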

2.4 Image classification and object detection

Image classification is the task of finding the correct label for an input image from a set of labels. There are different approaches to accomplishing image classification. The classifiers in this thesis are convolutional neural networks, known as CNNs.

The classification itself is a result of the input image being passed through the layers of the neural network. The network accepts the image as an array, and the network returns a class label, usually together with a metric such as a probability for the classified label. As an example, the output of a neural network that distinguishes between cats and dogs could return 90% Cat and 10% Dog for an input image of a Cat.


To be able to make valuable classifications, the region of interest needs to be found and extracted. In other words, the system needs to find where in the image the subject to be classified is located. This is called object detection. Figure 3 contains an example of how an object detector has located each object in the frame and then classified them. The detector finds the x- and y-coordinates of the bounding box that contains the object. This area is then supplied to the classifier. Both classification and detection are topics that will be explored further later in the thesis.

Figure 3. An image with objects localized and classified. Bounding boxes are drawn around each of the objects and their corresponding classes have been labeled. [13]

2.5 Machine learning

The previous section presented how objects are found in an image and how classification is done. To achieve the actual classification metrics, the classifier algorithm needs knowledge of what it is looking for. The process of teaching the computer is called machine learning. This is the process of making the algorithm learn from a set of training data to enable it to make classifications automatically. This thesis focuses on machine learning that is specific to making classifications of image data.

Manually trying to teach an algorithm to classify an input image would be an enormous, if not an impossible, task. Machine learning is not a program with hundreds of if and else

statements. Previously it was mentioned that convolutional filters could be used as building blocks for making object classifications. Instead of defining which convolutional filters are needed in order to make a successful classification, a machine learning system should be able to learn these automatically. The learning process of machine learning is called training. During training, the system is presented with a set of training data, and it recognizes differences between the inputs by learning which convolutional filters produce the correct classifications. Training and how to achieve good results during machine learning are subjects that will be discussed in later chapters.

This is where a so-called machine learning model is useful. The problem can be reduced to a set of parameters that will be learned through training. Together, these parameters form a function that can be used to classify any input data. A machine learning model makes the system more versatile, and since it only requires a few parameters, the size of the model will be greatly reduced.

2.6 Parameterized learning

The goal of parameterized machine learning is to abstract the method of finding the correct output for an input into a machine learning model. Parameterization is usually split into four parts: the training data, the scoring function, weights and biases, and the loss function. These four parts, put together, make it possible to create the model and to utilize it for classification.

There are many benefits to parameterized models. One could naively approach a classification problem by comparing an input image to every image in a dataset containing the possible classes. This way, one could find which class resembles the input image the most. However, the class dataset, with possibly thousands of example images, would have to be stored and quickly accessible at all times. A parameterized model instead tries to find a function that solves the classification problem during training. This reduces the amount of information that is required and makes packaging and reusing the model quick and convenient. The following sections present each of the four parts.


2.6.1 Training data

Training data could be considered the most fundamental building block when training a machine learning model. The quality of the training data impacts the outcome of the trained model. There is no clear definition of how big a dataset should be for an optimal result. A dataset containing training data can contain thousands or even tens of thousands of images for every class label. As a rule of thumb, the more data, the better. Finding or creating a good dataset suited to the problem one is trying to solve can be half the battle.

Data consists of a collection of images together with their class labels. The model requires training data of each class to be able to learn. During training, each image from the dataset will be fed to the network together with its corresponding label. The trained model learns the characteristic features of the images in the dataset. This means that a model that is trained on images of cars from the front will not be able to make sensible classifications on images of cars from any other angle. It is important to realize that the data itself drives how the classifier can be used in the end. Further, this is why it is essential to thoroughly verify what type of images a training dataset contains before starting the training of a model.

Labeling a dataset can be a tedious task, but fortunately, many organizations and educational institutes make labeled datasets available online. Datasets can be found for most problems, but one should keep in mind that these datasets may have licenses that limit how a model trained on them can be used. Building a custom dataset and labeling it takes time. Many search engines have image search functionality that could help with the building of a custom dataset, but the subject of building datasets is outside of the scope of this thesis.

One data point, or one image, from the dataset is called a feature vector and is usually denoted $x_i$. In this thesis, the data consists of a dataset that is popular for age and gender recognition. The dataset is called Adience [14] and consists of over 20,000 images split into two classes for gender and eight classes for age. The structure of the Adience dataset is presented in chapter 3.


2.6.2 Scoring function

The scoring function outputs class labels from input data, creating predictions from the inputs. For the sake of simplicity, this section presents the concepts of the scoring function from the viewpoint of a linear classification problem. More advanced machine learning models have more complex scoring functions and nonlinear classification. The linear classifier is, however, still a fundamental building block for most machine learning algorithms. The goal of the linear classifier is to find a hyperplane that can be fitted to separate the two classes on an n-dimensional surface.

The scoring function of a linear classifier consists of three parts: $W$, representing the weights matrix, $b$, the bias vector, and $x_i$, representing a feature vector, or an input image.

The feature vector $x_i$ is flattened into a single column vector whose length is the product of the image's dimensions, e.g., $32 \times 32 \times 3 = 3072$.

Figure 4. Linear classification of a problem with two classes


2.6.3 Weights and bias

The weights matrix $W$ contains the parameters of the model. The weights matrix, together with the bias vector, can be altered, and the task is to find the optimal values, such that the model achieves the best accuracy. The weights matrix has the shape of the number of classes times the dimensionality of the feature vector. The bias vector, $b$, is the last element of the scoring function. The function of the bias vector is to serve as a way to steer classification without changing the weights matrix. The size of the bias vector is the number of classes times one. Predictions are made by computing the dot product of the weights $W$ and the feature vector $x_i$ and adding the bias $b$:

$f(x_i, W, b) = Wx_i + b$

Figure 5. The scoring function for a linear classifier
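As an illustration of the scoring function described above, a minimal NumPy sketch (with randomly initialized parameters and a made-up input, not the thesis's trained models) might look as follows:

```python
import numpy as np

# A linear scoring function f(x, W, b) = Wx + b, sketched in NumPy.
# Assume a 32x32 RGB input flattened to 3072 values and two classes.
num_classes, dim = 2, 32 * 32 * 3

rng = np.random.default_rng(42)
W = rng.normal(size=(num_classes, dim))  # weights: classes x dimensionality
b = np.zeros(num_classes)                # bias: one entry per class
x_i = rng.random(dim)                    # a flattened input image

scores = W @ x_i + b                     # one raw score per class
print(scores.argmax())                   # index of the predicted class
```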

2.6.4 Loss functions

To achieve learning, the scoring function has to find the most suitable hyperplane separating the classes. This requires optimization, which steers training towards higher accuracies. A loss function is used to calculate the loss of the scoring function. The loss function measures how well the model is performing: the lower the calculated loss, the better the model is at predicting class labels from an input. The loss guides how the weights and biases are tweaked so that the optimal loss can be reached. This is achieved by minimizing the loss function, thus achieving the best score possible.
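The thesis does not fix a particular loss function at this point; as an illustrative example, the widely used cross-entropy loss can be sketched in a few lines of NumPy:

```python
import numpy as np

def softmax(scores):
    # Shift by the max score for numerical stability before exponentiating.
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

def cross_entropy_loss(scores, true_class):
    # The loss is the negative log-probability assigned to the true class:
    # near 0 for a confident correct prediction, growing as it gets worse.
    probs = softmax(scores)
    return -np.log(probs[true_class])

print(cross_entropy_loss(np.array([2.0, 0.5]), true_class=0))  # small loss
print(cross_entropy_loss(np.array([0.5, 2.0]), true_class=0))  # larger loss
```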

2.7 Optimization

Optimization methods are used to find the optimal parameters, weights, and biases for the scoring function. An optimizer works over a loss landscape. The loss landscape is an n-dimensional surface with maxima and minima. The optimal accuracy is found at the global minimum.


One of the most popular optimization algorithms for convolutional neural networks is based on gradient descent. The idea behind gradient descent is to evaluate the parameters in the scoring function and compute the loss. When the loss is obtained, the algorithm takes a step in a direction that minimizes the loss, thus taking a step along the slope of the loss landscape towards a local minimum. The step size is specified by the user and is called the learning rate. When a better value is found, the algorithm updates the weights and bias parameters. This algorithm is called stochastic gradient descent.

The values for the weight parameters are usually randomly initialized before the start of the training. Random initialization of the starting parameters can help to overcome the issue of getting stuck in local minima. The learning rate controls the size of the step taken by the optimizer.

Consider the figure below, where a robot is randomly placed somewhere on the loss landscape. It computes the gradient of the plane and takes a step towards the bottom of the minimum. The gradient of the optimization surface is the only thing that the robot is aware of in regard to its surroundings. If the learning rate is too large, the robot overshoots the minimum and will have problems ever finding the bottom. If the learning rate is too low, the robot takes a long time to reach the bottom and risks getting stuck in a local minimum.

Figure 6. The loss landscape visualized. A robot has to take a step in the correct direction to reach the minimum.


Stochastic gradient descent may be one of the most popular optimizers, but it is not the only optimizer used for training neural networks. Other, more advanced methods utilize different techniques for handling the learning rate and may have different weight parameters. One of these optimizers is called Adam [15]. The Adam optimizer can help when training models on challenging datasets. The optimizer is designed to work well on large, noisy datasets. A noisy dataset means that the loss landscape contains few and sparse minima, making them especially challenging to find. The Adam optimizer is outside the scope of this thesis but is presented here as it is suggested in the last chapter as a possible future improvement.

The machine learning model that is created in this thesis uses a stochastic gradient descent optimizer. The stochastic gradient descent optimizer is efficient since it only requires a subset of the training data, a so-called mini-batch, to estimate the gradient and update the weights. This approach makes training substantially faster.
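A minimal sketch of a mini-batch SGD update, continuing the NumPy notation from the previous sections, could look as follows; the gradients here are stand-ins for values that backpropagation would compute.

```python
import numpy as np

def sample_minibatch(X, y, batch_size=32):
    # SGD estimates the gradient from a small random subset of the
    # training data, which is what makes each update cheap.
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    return X[idx], y[idx]

def sgd_step(W, b, grad_W, grad_b, learning_rate=0.01):
    # Step against the gradient, i.e., downhill on the loss landscape.
    # The learning rate controls how large a step is taken.
    W = W - learning_rate * grad_W
    b = b - learning_rate * grad_b
    return W, b
```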

2.8 Neural networks and deep learning

Neural networks have been mentioned earlier in this thesis. This section presents neural networks and, in particular, the type relevant for this thesis, convolutional neural networks.

Neural networks are inspired by and loosely based on the human nervous system. There are many similarities when it comes to naming and patterns. A neural network is built up of nodes that interact with each other via links, or so-called edges. Depending on its function, the neural network can consist of an arbitrary number of layers. From the inputs to the outputs, the network forms a treelike structure. Similar to how the human nervous system works, each node in a neural network performs a computation. This function is called an activation function, and it determines whether or not the node should activate and to which connected node the signal should be sent. When the node activates, the edge will carry the signal onward to the chosen node.

When designing neural networks, layers are specified to have different properties. For example, the number of edges carrying the signal to the next layer can be altered randomly. Learning is a complicated process, and the structure and the properties of the layers and nodes directly influence how well the network learns.


The signal in a neural network is carried in one direction from the input to the output. Each layer in the network architecture has a predefined role where every node in the layer performs a task depending on what the type of layer it is. Some of the most common layer types will be presented in the next section.

The figure below presents the neural network structure of the so-called Perceptron [16], developed in the 1960s. The network is very rudimentary by modern standards, but it is still an excellent example of the structure of a neural network. The network follows the same principles as networks today. It has a set of inputs followed by a layer of weights. These weights are connected to a net input function that gathers the signals and relays them to the activation function. The activation function contains a threshold that determines whether it activates or not. And lastly, there is an output node.

Figure 7. The Perceptron was one of the earliest implementations of a neural network. [17]
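The Perceptron's flow can be expressed as a minimal Python sketch; the hand-picked weights below, computing a logical AND of two binary inputs, are illustrative only and not the original implementation.

```python
import numpy as np

def perceptron_forward(inputs, weights, bias):
    # Net input function: a weighted sum of the inputs plus a bias.
    net_input = np.dot(weights, inputs) + bias
    # Step activation: the node "fires" (outputs 1) only if the net
    # input exceeds the threshold of 0.
    return 1 if net_input > 0 else 0

# A perceptron computing a logical AND of two binary inputs.
weights = np.array([1.0, 1.0])
bias = -1.5
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, perceptron_forward(np.array([a, b]), weights, bias))
```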

The term deep learning is commonly used when referring to the field of machine learning methods used to train neural networks. Despite what the name suggests, the depth of a neural network is not its essential feature. The structure of a neural network can vary based on its function, and usually, deeper networks can learn more features, but there are no guidelines that say how deep a network has to be to learn more. More important is the combination of different layer types and the order in which they are presented in the

network topology. Many neural network architectures can share common structural traits, especially the combinations of layers that have been proven beneficial when trying to achieve better performance of some sort. Different layer combinations can improve the efficiency of the network. How layers are structured can, for example, directly improve its feature detection capabilities and reduce the need for computational power. The next section gives an example of different layer types and how they enhance a network.

Deeper neural network architectures tend to be more time-consuming to train and require more computational power when calculating predictions. This means that the structure of a network should be designed depending on the intended use case. Complex and deep networks are suitable for platforms with sufficient processing power. This thesis will focus on finding and implementing a neural network optimized for use on mobile embedded platforms.

2.9 Convolutional neural networks

Convolutional neural networks are structurally similar to “normal” neural networks. However, CNNs take only images as inputs, and they contain layers specifically for computing convolutions.

Generally, CNNs have recurring architectural structures. A CNN commonly consists of combinations of recurring patterns of layers. Common layers are convolutional, activation, pooling, fully connected, batch normalization, and dropout layers. A typical architectural trait of CNNs is that they usually have only one fully connected layer at the very end. This section briefly introduces the different layer types.

The most characteristic layer type of CNNs is the convolutional layer. A CNN has at least one convolutional layer. The term convolution refers to the image processing operation in which a kernel performing image transformations slides over the input image. Common image editing actions, such as blurring, smoothing, or sharpening in any image editing software, are all examples of convolutions. These convolutional actions are split up into an arbitrary number of filters. A filter handles one image manipulation action and is essentially a small matrix of a chosen size in pixels. These filters are then applied, or convolved, onto the input volume. Similar to basic image editing techniques, the image manipulation is applied via a sliding window algorithm, where the filter is moved from

the left to the right of the input area. At each stop, it computes the value of each pixel intensity, and the result is transferred to a new image that serves as the output of the layer.

The goal of applying these image-transforming convolutions over the input image is to produce so-called feature maps. A feature map is the output of one of the convolution filters and contains the features, such as edges, of the image. The feature maps produced by the convolutional functions get smaller and smaller the deeper into the network you get. The idea is that, by using this method, every class in a classification problem can be reduced into a set of smaller features. If this is possible, the set of convolutions can be used to determine which class the input image belongs to. For the model to figure out if an input image contains a cat or a dog, it will reduce the input into smaller and smaller features. The same logic applies to the training, where the correct layers that produce features are learned, and an understanding of what the input image is built up of is achieved.

Every convolutional layer is followed by an activation layer. The activation layer contains a function that decides whether or not the edge should activate, meaning that it determines if the signal should be carried forward in the network. Different types of activation layers can be used to improve the performance of the model. These attributes will be described in later chapters.

The produced feature map will contain different features depending on their position in the input image. This means that certain class objects have recurring positional features. This is a problem as, preferably, the model should be able to generalize and find features from images containing shifts, rotations, and flips as well. This is where pooling layers come in. Pooling layers reduce the size of the feature map and, at the same time, remove the positional variance that the model has. A pooling function is often small, only 2×2 pixels in size. Reducing the layer input size means that it simultaneously reduces the number of parameters in the network, thus improving performance.

Fully connected layers are, as their name suggests, fully connected. That means that the activation layers of the previous layers have connections to each of the nodes in the fully connected layer. In a CNN, the fully connected layers are usually placed at the end of the network structure, before the output layer, which produces the classification.
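To make the layer pattern concrete, the following Keras sketch stacks the layer types described above into a small example network; the input size and class count are placeholders rather than the models trained in this thesis.

```python
from tensorflow.keras import layers, models

# A minimal CNN illustrating the recurring pattern described above:
# convolution -> activation -> pooling, repeated, ending in one
# fully connected layer before the classifying output layer.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # the fully connected layer
    layers.Dense(2, activation="softmax"),  # e.g., two gender classes
])
model.summary()
```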


2.10 Training process

The training process of a neural network is the process where input data is fed forward through the network, followed by the calculation of the loss function, and finally, the updating of the network’s parameters. This process is repeated over and over again. After each training pass, called an epoch, the algorithm stops to tweak the weights of the network. By repeating these steps, the weights and biases of the neural network are brought closer and closer to values that minimize the loss function, thus producing the smallest error.

As mentioned in previous sections, a neural network has several parameters that need to be set. Some of these parameters are picked randomly, and some are found by trial and error. In general, a few tests are needed to verify what values work and which do not.

Backpropagation is the algorithm that computes the gradient of the loss function with respect to the weights. The two steps taken are called the forward pass and the backward pass. The forward pass feeds the inputs through the network until the predictions are produced at the last output layer. The backward pass consists of computing the loss function and updating the weights of the network.

Training is a computationally heavy task. The network has to convey the input through all its layers and compute the convolutions and other calculations at every step. The training time depends on the network’s depth and the hardware the training process is being executed on. Training time can be reduced with more processing power, especially graphics processing units (GPUs), which are well suited for training neural networks thanks to their high bandwidth and thread parallelism. The neural networks trained in this thesis will be trained on an Nvidia GTX 980 Ti GPU.

Most of the training steps mentioned above are abstracted away when using a machine learning API. In this thesis, the Keras machine learning API will be used together with TensorFlow’s GPU module, which is optimized for GPU training. Chapter 5 describes the steps required to train a CNN with these tools successfully.
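A minimal sketch of what such a Keras training run looks like is shown below, continuing the example network from the previous section; the optimizer settings, epoch count, and randomly generated stand-in data are illustrative only.

```python
import numpy as np
from tensorflow.keras.optimizers import SGD

# Stand-in data in place of a prepared dataset: 100 random 64x64 RGB
# "images" with one-hot labels for two classes.
train_images = np.random.rand(100, 64, 64, 3)
train_labels = np.eye(2)[np.random.randint(0, 2, 100)]

# Compile the model sketched in the previous section and train it
# with stochastic gradient descent.
model.compile(optimizer=SGD(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_images, train_labels,
                    validation_split=0.2, batch_size=32, epochs=25)
```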


2.11 Underfitting and overfitting

Even with state-of-the-art hardware, training a deep convolutional neural network can be a time-consuming task. Large networks can take hours, days, or even weeks to finish training.

Due to the time it takes to finish the task of training a neural network, it is crucial to have insight into how the network is performing during the training process. This is beneficial since it allows for a quick response by the user if the training seems to take steps in the wrong direction. Most machine learning APIs contain functions that output metrics that help with establishing the fitness of the model. These metrics are outputted during training and serve to verify how well the model is performing at each epoch. The performance can, for example, be plotted as a graph, where components such as loss and accuracy are drawn as separate curves. Under- and overfitting are terms that describe how well the network is performing during training, and they are directly tied to the components mentioned above.

Underfitting occurs when the model shows signs of low variance and high bias. That is, the model generalizes so heavily that it never learns the structure of the data. Underfitted models tend not to have enough complexity, for example, due to an insufficient number of layers.

A more common issue faced when training neural networks is called overfitting. Overfitting is the opposite of underfitting. It means that the model has become too good at classifying the training data and, as such, it has problems generalizing. A model that is overfitting will have a difficult time making any accurate predictions on images outside of the training dataset. Overfitting usually occurs when the dataset that the model is training on is too small. There are ways to overcome overfitting, and these techniques will be discussed in chapter 4.
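One common way to monitor for under- and overfitting is to plot the metrics Keras collects during training; a minimal sketch follows (the exact history keys vary slightly between Keras versions).

```python
import matplotlib.pyplot as plt

# Plot the training history from the fit() call shown earlier.
# A training loss that keeps falling while the validation loss levels
# off or rises again is a typical sign of overfitting.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.legend()
plt.show()
```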


2.12 Haar-cascade classifiers

Before the introduction of CNNs, most image detection methods were based on hand-crafted classifiers, such as Haar-based classifiers. A Haar classifier has many similarities to the filter kernels in a CNN. However, the filters in a Haar classifier are not learned through training but have been chosen by hand. The Viola and Jones [18] paper on rapid object detection with Haar classifiers is considered to present the first machine learning classifier that could provide meaningful results in a real-world setting. The technique uses the sliding window method, similar to how filters are used in CNNs, to extract the facial landmarks, also known as the region of interest, from a picture. The Haar classifier contains filters that detect the eye, nose, and forehead regions in an image of a person’s face. Figure 8 demonstrates how these filters are used on a face to find the facial landmarks.

OpenCV provides a Haar classifier for object detection, which is based on the original Haar classifier by Viola and Jones. In this thesis, the face classifier in OpenCV will be used to find the region of interest from the video feed. These simple detection solutions are usually computationally more efficient than a CNN.

Figure 8. Two filters [19] used by the OpenCV Haar-cascade classifier to find the eye, nose and forehead region on a face. These anchor points are used to determine the borders for the whole face.
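A minimal sketch of using OpenCV's bundled Haar cascade for face detection, extracting the region of interest as described above, could look as follows; the input file name is a placeholder, and in the deployed solution the frames would come from the camera feed rather than a file.

```python
import cv2

# Load the frontal-face cascade that ships with the opencv-python
# package; cv2.data.haarcascades points to the bundled cascade files.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("example.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns one (x, y, w, h) bounding box per face.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                  minSize=(30, 30))

for (x, y, w, h) in faces:
    roi = frame[y:y + h, x:x + w]  # the region of interest for inference
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```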


2.13 Transfer learning and fine-tuning

Transfer learning is a method where a pre-trained model is used as the foundation for the training of a new model. Transfer learning is commonly used when a pre-trained model is employed as a feature extractor for a new model. The outputs from the pre-trained model are then inputted to the new model and passed through the network. In this thesis, a type of transfer learning is used, called fine-tuning. When fine-tuning a neural network, the pre-trained model has to be modified. Figure 9 describes the steps of fine-tuning. A pre-trained network has its output layers removed, and a new model is formed by adding new output layers to it. The new model, consisting of the old pre-trained layers and the new set of output layers, will serve as the basis for the training. Models that have been pre-trained on datasets that are challenging and contain a wide array of classes excel at feature extraction. The rich feature extraction capabilities can be used to an advantage when training the new model. The pre-trained layers make training the new set of outputs faster and more accurate. It has been shown that fine-tuning is a highly efficient method for training machine learning models [2]. Before the training starts, the old layers are first frozen. This means that they are ignored during the backward pass so that their weights do not change. Apart from these steps, training is done the same way as one would during “normal” training.

Figure 9. The image presents the flow of fine-tuning a pre-trained model.
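As an illustration of these steps, a minimal Keras sketch of fine-tuning a MobileNet pre-trained on ImageNet could look as follows; the head layers and class count are examples, not the exact models trained in this thesis.

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras import layers, models

# Load MobileNet without its original output layers (include_top=False).
base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))

# Freeze the pre-trained layers: their weights are ignored during the
# backward pass and do not change while the new head is trained.
for layer in base.layers:
    layer.trainable = False

# Attach a new set of output layers on top of the pre-trained base.
head = layers.GlobalAveragePooling2D()(base.output)
head = layers.Dense(128, activation="relu")(head)
head = layers.Dense(8, activation="softmax")(head)  # e.g., eight age classes

model = models.Model(inputs=base.input, outputs=head)
```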


3 CURRENT TECHNOLOGIES AND OTHER PLAYERS

This chapter briefly presents technologies that are used in this thesis, including two datasets and two machine learning frameworks.

3.1 Adience

The Adience dataset by the Open University of Israel is an age and gender benchmark dataset. The dataset consists of 26,580 images of unfiltered faces. The goal of the dataset is to provide images for age and gender recognition that are as true as possible to real-world imaging conditions. The dataset contains face pictures from varying angles, taken in situations with different light and noise conditions. Among its 2,284 subjects, the data is organized into both age and gender labels. The gender labels indicate either male or female, whereas the age labels are split into eight different groups: 0–2, 4–6, 8–13, 15–20, 25–32, 38–43, 48–53, and 60+ years. The figure below shows example images from the gender and age datasets.

Figure 10. Example images from the Adience dataset representing every class [20]


3.2 TensorFlow and Keras

TensorFlow and Keras are both open-source projects maintained by researchers at Google. Both can be used with Python but are very different in both function and purpose.

TensorFlow is a machine learning platform and backend, and contains pre-trained models that can be downloaded and customized by the user. Both training and inference can be done without the use of any other machine learning tools. However, Keras serves as an alternative high-level machine learning API and library on top of TensorFlow. Keras simplifies many of TensorFlow’s operations into consistent and simple APIs, which makes development both faster and easier. Keras was initially developed as a tool for Google researchers, but gradually the project caught the interest of the machine learning community, and it has grown substantially ever since.

3.3 ImageNet

The ImageNet [21] project is one of the more extensive image databases, with over 14 million hand-labeled images covering over 20,000 classes of objects that can be found in everyday situations. ImageNet is maintained by the Vision Lab at Stanford University and uses crowdsourcing for its labeling process. ImageNet is popular as a benchmarking dataset for various deep learning purposes. As the whole dataset is very large, it is common for models to be trained on a subset of ImageNet. For instance, the models Google provides via its frameworks are often trained on an ImageNet subset containing 1000 classes.

3.4 MxNet

The MxNet machine learning framework is developed by Amazon and is provided as a part of their cloud services. The MxNet machine learning library was initially considered as the foundation of this thesis. A short discussion of MxNet’s so-called SqueezeNet model follows in a later chapter.


3.5 Neural compute devices

Inference with a neural network model can usually be done on the CPU of most devices. Most frameworks provide functionality for exporting the model into a suitable format and for performing inference. These methods work well for desktop or server-based approaches that have more computational power. However, deploying the model on a mobile or embedded platform commonly requires both structural changes to the model and specialized hardware. Similar to how training time is improved with GPUs, inference time can be improved by using dedicated hardware that contains routines for neural computing. The two following sections present two options for specialized neural computing on embedded devices.

3.6 Google Coral

Google Coral is the name of Google’s line of AI platforms for accelerated computing. It contains both development boards and separate USB compute units. All the products contain Google’s Edge TPU (tensor processing unit), a separate coprocessor for neural network inferencing. The USB compute unit is especially interesting as it allows parallel neural computing to be added to any device that has a USB port.

Google Coral devices are compatible with modified versions of TensorFlow models called TensorFlow Lite. TensorFlow Lite models are optimized for running on embedded processors such as the Edge TPU. Creating a TensorFlow Lite model is done with the TensorFlow Lite converter tool. One cannot train models directly to TensorFlow Lite format, and a regular TensorFlow model has to serve as the basis for the Lite model. TensorFlow Lite is an optimized version of the regular TensorFlow model. This means that not all of TensorFlow’s layer operations are supported in TensorFlow Lite. A model might require modification to its layers before the conversion can be done.
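A minimal sketch of such a conversion, using the TensorFlow 2.x converter API, is shown below; version details vary, and models containing unsupported layer operations would fail at this step.

```python
import tensorflow as tf

# Convert a trained Keras model to TensorFlow Lite format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the converted model to disk for deployment.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

Deploying on the Edge TPU would additionally require quantizing the model and compiling it with Google's Edge TPU compiler.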


3.7 Intel Movidius NCS2

Intel offers a USB accessory device called the Intel Movidius NCS2. Similar to Google’s Coral TPU, it is a small computing device for neural network computing on edge devices, containing a Myriad X vision processing unit. The Movidius compute stick supports a wide range of machine learning frameworks, including TensorFlow. Trained models need to be converted with Intel’s so-called OpenVINO toolkit to make them compatible for inference. The Intel Movidius NCS2 is suitable for use with Raspberry Pi devices.


4 DESIGNING THE COMPUTER VISION SOLUTION

This chapter presents what is needed to create a fully functioning computer vision solution. It gives an overview of different design options and describes the design decisions made during the design process. The chapter begins with a presentation of the requirements for the design.

4.1 Requirements

The requirements for the computer vision device are as follows: find a face in an image from a video feed and classify the found face based on age and gender as accurately and efficiently as possible. The classification is done with a neural network model. To create such a model a machine learning pipeline has to be developed.

The solution must take into consideration the hardware limitations of the device it will be deployed on. These limitations impact each part of the project. Therefore, the hardware will be presented first.

Stratacache utilizes a range of Linux-based computers that serve as platforms for its products. The solution produced in this thesis is designed with these devices in mind. The chosen device should match the company's current hardware stack while also serving as a minimum hardware requirement for the computer vision solution: if the solution works on the Raspberry Pi, it should work on all of the company's devices. For these reasons, the Raspberry Pi 3B+ was deemed a suitable computer vision platform for this thesis project.

The first step towards analytics results is to find the face, or the facial landmarks, in an image from a video feed. Finding the facial landmarks is done with a face detector.

First, the detector has to locate where the face is in the frame. After that it makes more precise detections to find the area that makes up the so-called region of interest. The region of interest consists only of the sample of pixels containing the face area required to make a classification. The detected region of interest is then used for inference.

There are multiple face detector methods that could be used for finding the region of interest. The biggest difference between them is how much computational power they would need in order to make a detection. A more sophisticated detector, such as a machine learning-based detector, would possibly allow for wider detection ranges, meaning that the detector could detect a face that is partly turned away from the camera or even partly hidden. However, sophistication comes with drawbacks in the form of higher CPU load. Taking the hardware limitations into account, it was decided that a simple and lightweight detector would suffice for this project. The Haar cascade detector was chosen as it is fast and forgiving on the CPU and, as such, fitting for the computer vision platform. The Haar cascade detector is convenient as it comes built into OpenCV, which means that the detector itself does not require any further development. However, the Haar cascade has a few drawbacks. Haar-based detectors tend to be prone to false positives, meaning that the detector reports face regions in images that do not contain a face. Another drawback is the Haar detector's limited detection angle: a person who is facing the camera at an angle might not be detected. These drawbacks are still considered negligible, as the overall goal is to find a baseline for classification, and future implementations could use more advanced detectors. The solution in this thesis does not account for the detection faults; however, a future product should quite easily be able to filter them out. Classified false positives should stand out among the rest of the classifications, as their class probabilities are assumed to be close to zero.

The next step in producing an analytical result is to compute the class predictions. The region of interest is supplied to the classifier for inference. A CNN that is optimized for embedded or mobile devices is needed, as the Raspberry Pi would not be able to run a complex CNN due to the large memory footprint of such neural networks. Two shallow and computationally light CNNs were considered. First, the SqueezeNet model available through MxNet was considered as an alternative for the machine learning classifier. However, focus quickly shifted to the MobileNet [22] machine learning model. There were two reasons for this. The original MobileNet paper [22] points out that MobileNet outperforms SqueezeNet in accuracy on the ImageNet dataset while keeping the network structure shallow and still able to produce competitive results compared to deeper models. Further, compatibility is important considering future work. A TensorFlow-based machine learning model, which MobileNet is, would facilitate development on both the Google Coral development board and Intel's Movidius compute stick; the Google Coral computing device is only compatible with TensorFlow models. Future work is discussed in the last chapter of this thesis. Having chosen the type of the classifier, the next step is training.

The machine learning classifier could either be trained manually from start to finish, or one could try to find a pre-trained model online that meets the project requirements. It was decided that instead of trying to find a suitable pre-trained model, which would most likely carry a license limiting it to educational use only, a model would be created. Creating a custom model is an interesting approach as it gives insight into how to implement a training backend and how to train and deploy the model. However, re-implementing the network structure of the MobileNetV2 [23] network would be unnecessary work. TensorFlow lets the user instantiate pre-trained MobileNet networks. These models have been trained on a subset of the ImageNet dataset, can classify 1000 classes, and are suitable for transfer learning. The reason for this is that the ImageNet dataset contains a wide variety of objects and, thus, a network pre-trained on it will likely excel at generalizing. A fresh set of output layers, two for the gender classifier and eight for the age classifier, is added to the instantiated base model. Through fine-tuning, the new output layers learn to exploit the rich feature extraction capabilities that the original model acquired during pre-training.
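A minimal sketch of this construction, assuming the Keras API bundled with TensorFlow, is shown below; the pooling layer and the class count of two are illustrative (the age model would use eight outputs).

    from tensorflow.keras.applications import MobileNetV2
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.models import Model

    # Instantiate the pre-trained base without its 1000-class ImageNet head.
    base = MobileNetV2(weights="imagenet", include_top=False,
                       input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False  # freeze the pre-trained feature extractor

    # Add a fresh classification head: 2 outputs for gender, 8 for age.
    x = GlobalAveragePooling2D()(base.output)
    outputs = Dense(2, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=outputs)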

Figure 11 visualizes the entire system, including the machine learning pipeline and the computer vision platform, starting from the pipeline and the training and ending with the model deployed on the computer vision device performing inference.


Figure 11. Visualization of both the required steps for the training pipeline and the computer vision device.

To summarize, a substantial part of the design, research, and implementation work will be put into the training pipeline. The machine learning pipeline consists of modules that handle the preparing, pre-processing, and loading of data. Additionally, it performs and monitors the training and validation of the models. A robust and reusable machine learning pipeline can be extended in the future; it could, for example, handle tasks such as automatic deployment of models to edge devices. The pipeline could also serve as a platform for other machine learning projects and can be reused for any improvements related to the created model. The following sections account for each part of the pipeline and the computer vision device more thoroughly.

4.2 Convolutional neural network

Neural networks suitable for computer vision tasks are called convolutional neural networks. The required model has to be specifically designed with embedded devices in mind. CNNs are often presented together with their parameter count. The number of parameters refers to the number of learnable weights, and thus the number of possible filters, the network can learn. CNNs usually have tens of millions of parameters. The MobileNetV2 paper [23] presents a shallow network structure containing around 3 million parameters, depending on the network configuration.

Google has implemented a series of neural networks that it has released under open source licenses and that are included in its neural network framework, TensorFlow. As the name suggests, the MobileNet family of neural networks is designed and optimized for mobile devices. TensorFlow contains both the original MobileNet and the improved second version, denoted MobileNetV2. Additionally, TensorFlow contains different flavors of the models with varying input sizes and pre-trained weights. The pre-trained weights have been acquired by training the networks on a subset of the ImageNet benchmark dataset containing 1000 classes. With the help of the pre-trained model weights, a new model will be trained on the Adience dataset through fine-tuning. The next section describes how the training of the models is done in the machine learning pipeline.

4.3 Machine learning pipeline

The training pipeline consists of Python modules that produce and train the MobileNetV2 models. The modules use the Keras machine learning API for most of the parts required for training. Figure 12 below visualizes the required steps of a machine learning pipeline: preparing the data, training the model, packaging the model, validating the model and, lastly, deploying the model. This section is split into subsections containing the motivation behind each part.


Figure 12. The image shows each of the steps of the machine learning pipeline. Inspiration taken from Microsoft Azure's web page [24].

4.3.1 Splitting and preparing the dataset

Preparing data is a crucial step for any machine learning task. The data used for training a machine learning model is seldom structured in a suitable format. Depending on what framework one is working with and what kind of task one is trying to solve, the data usually requires some re-structuring. The Adience dataset is no exception. Many machine learning libraries use special file formats to be able to quickly read and handle large amounts of training data, which means that the data may have to be repackaged into the correct file structure or format before training. TensorFlow supports a wide range of file formats, and luckily the Adience dataset fits into memory, so it can be loaded directly from disk. Each data point can be accessed from text files containing the paths to each image on disk.

The Adience dataset is labeled by its maintainers; however, it contains some faulty labels, and some data points are completely missing labels. This means that all faulty data points have to be removed before the dataset can be used for training. It is important to verify the integrity of the whole dataset, since faulty or missing labels could have an unforeseen impact during the training phase and result in odd behavior. The next task when preparing a dataset is to organize the data into classes. The preparation step requires organizing the data in a way that makes classification of both age and gender possible. This could be done in two different ways. One could create a single model that classifies both age and gender at the same time. The second alternative is to create two separate models, one for age classification and one for gender classification. From a product as well as a business perspective, it might be interesting to provide a separation between age and gender classification. A separation of the two makes it possible to offer classification of either one of the cases on its own. Furthermore, splitting the model makes it possible to run inference in parallel. This is why it was decided that the data should be structured in such a way that two different models are created.

A supervised machine learning algorithm requires validation data during training. This means that the dataset that is going to be used for training has to be split up into at least two, preferably three, parts, called the training, validation, and testing datasets. As a rule of thumb, the data is split so that 70% ends up in the training dataset and the remaining 30% is split equally between the validation and testing datasets. To make unbiased evaluation possible, the datasets cannot share any data with each other and need to be kept separate. The supervised machine learning algorithm uses the training dataset as its source of data during training, and after each training pass the validation dataset is used to compute validation metrics, which in turn guide the tuning of the model's parameters. The testing dataset is used to perform a final evaluation of the trained model when the training is finished.

With these requirements in mind, a module that creates list files of the datasets is needed. First it filters out every faulty data point from the dataset, and then it builds the prepared list files for both age and gender training. Further, the data is split into training, validation, and testing datasets according to the rules mentioned above, as sketched below. These lists can then be used during the training of the two classifiers.
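A minimal sketch of such a split, assuming the faulty data points have already been filtered out and each sample is an (image path, label) pair, could look as follows; the helper name and seed are illustrative.

    import random

    def split_dataset(samples, train_fraction=0.7, seed=42):
        """Split (image_path, label) pairs into train/validation/test lists."""
        rng = random.Random(seed)   # fixed seed keeps the split reproducible
        shuffled = samples[:]       # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        n_train = int(len(shuffled) * train_fraction)
        n_val = (len(shuffled) - n_train) // 2  # remaining 30% split in half
        return (shuffled[:n_train],
                shuffled[n_train:n_train + n_val],
                shuffled[n_train + n_val:])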

4.3.2 Pre-processing

This section discusses the pre-processing that is needed to turn the data points into the form required by the neural network. Further, it presents normalization and regularization, methods that help the models perform better during training. Even though these topics are not directly linked to pre-processing, they are discussed here, as they are conveniently performed at the same time as the rest of the pre-processing.

Pre-processing of individual images is required to make them suitable for the model during training. To make pre-processing quick and convenient, a dataset loader module should be created. Keras contains three different methods for starting the training, which are in charge of supplying the model with data during training: fit, fit_generator, and train_on_batch. They all accomplish the same task; however, they function very differently. The fit method is suitable for training on small and simple datasets and does not allow for much control of the data. The train_on_batch method, on the other hand, lets the user control every aspect of how data is loaded and pre-processed and is typically used when one is training on state-of-the-art datasets. Considering the structure of the Adience dataset and the need for pre-processing, together with the possibility to perform extra regularization, the fit_generator method was chosen. The fit_generator method accepts the training and validation data in parallel batch jobs in the form of Python generator objects. Loading data with the help of a batch generator is convenient: the data is provided in smaller batches, and thus only parts of the dataset are loaded into memory at one time. Figure 13 below describes the pre-processing that is done by the batch generator on an image. An image has to be loaded from the list, re-sized, and turned into a NumPy array suitable for training.

Figure 13. The image describes how data is loaded and pre-processed in the batch generator.

The Python generator should take the list files containing the paths to every data point in the dataset, load the images, and turn them into NumPy arrays; a minimal sketch of such a generator is given after this paragraph. Another benefit of using this approach is that it makes the pre-processing scalable: pre-processing is handled on the CPU while training is done in parallel on the GPU. Further, the dataset could easily be extended in the future. Data can be stored in unstructured folder structures as long as the pipeline gets the lists containing the paths to each data point. Extending the dataset with other file formats, such as HDF5 datasets, which can contain collections of data elements, raw images, or even pure arrays of pixel data, could also be possible. Finally, the module should resize all images to the size required by the input of the neural network. These are the steps necessary to begin training; however, there are still steps that can be done to make training more efficient.
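The sketch below assumes that each sample is an (image path, one-hot label) pair read from the list files, and it leaves out the normalization and augmentation discussed next.

    import cv2
    import numpy as np

    def batch_generator(samples, batch_size, target_size=(224, 224)):
        """Yield (images, labels) batches indefinitely, as fit_generator expects."""
        while True:  # loop over the data until training is stopped
            for start in range(0, len(samples), batch_size):
                batch = samples[start:start + batch_size]
                images, labels = [], []
                for path, label in batch:
                    image = cv2.imread(path)                # load from disk
                    image = cv2.resize(image, target_size)  # match model input
                    images.append(image.astype("float32"))
                    labels.append(label)                    # assumed one-hot
                yield np.array(images), np.array(labels)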

Here two different methods are presented: mean value subtraction and data augmentation. They are not pre-processing steps per definition, as they are not required for training to work. But since they can significantly improve training results and work directly on the images in the dataset, they are introduced at the same time as the rest of the pre-processing. Mean value subtraction is a form of normalization that helps to even out the data before training. By performing normalization, the data points' pixel intensities are smoothed. The gradient calculated by the optimizer behaves in a more stable manner with mean subtraction; without it, the dataset contains data points with varying pixel intensities, which lead to unstable gradients and unstable training results. Mean value subtraction is done by subtracting the mean RGB values from every data point in the dataset.
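As a sketch, mean value subtraction reduces to one array operation per image. The channel means shown below are the widely used ImageNet means, a reasonable assumption when fine-tuning an ImageNet pre-trained model, although means computed over the Adience training set would serve the same purpose.

    import numpy as np

    # Per-channel RGB means of the ImageNet dataset (a common choice when
    # fine-tuning an ImageNet pre-trained network; an assumption here).
    CHANNEL_MEANS = np.array([123.68, 116.779, 103.939], dtype="float32")

    def subtract_mean(image):
        """Subtract the per-channel mean from an RGB image array."""
        return image.astype("float32") - CHANNEL_MEANS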

The second method is data augmentation. Keras contains a class for performing data augmentation as the data is supplied to the model from the generator during training. During training, a model can, due to either the structure of the dataset or the chosen parameters, behave poorly. A common problem is overfitting, which means that the model becomes very good at classifying the data in its training dataset but loses the ability to classify data outside of it. More regularization should be applied to remedy possible overfitting, and data augmentation can significantly improve training performance. The augmentation adds random rotations, shifts, and horizontal flips to every image in the dataset before training. This adds randomness to the dataset, which is desirable to combat overfitting and helps the model to generalize.
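In Keras this class is ImageDataGenerator. A sketch of an augmentation configuration is shown below; the exact ranges are illustrative, not the values used in the experiments.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Random augmentation applied on the fly as batches are produced.
    augmenter = ImageDataGenerator(
        rotation_range=20,       # random rotations, in degrees
        width_shift_range=0.1,   # random horizontal shifts
        height_shift_range=0.1,  # random vertical shifts
        shear_range=0.1,         # random shears
        horizontal_flip=True)    # random horizontal flips

    # Inside the batch generator, a prepared batch can then be transformed:
    # images = next(augmenter.flow(images, batch_size=len(images), shuffle=False))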

With these pre-processing steps combined into one generator module, the training can be started with the required pre-processing and augmentation. The generator class supplies batches of image and label pairs, looping over the data indefinitely until training is completed.

4.3.3 Training, packaging, and validating the model

The training itself is a task that is repeated over and over again until satisfactory results are reached; each training pass is called an epoch. Two separate models should be created, one for classifying age and one for classifying gender. As the training is performed through fine-tuning, the pre-trained base models first have to be downloaded with Keras. Keras supports the option to leave out the last output layers from the pre-trained MobileNetV2 models. After the models are downloaded, their layers should be frozen so that the strong generalization capabilities that the network has acquired from pre-training are preserved when training continues. After this, the custom class outputs for the age and gender classification are added. The models have a different number of outputs depending on what class type is being trained: two outputs for gender and eight for age.

The training task is started in Keras with the fit_generator method. The method expects two instances of the custom Python generator class to be supplied, one for the training data and one for the validation data. The last two requirements for training are the training monitor and checkpointing callbacks. These callbacks build on Keras's callback functionality and need to be added to the fit_generator call to provide extra functionality.

The training monitor creates plotted graphs of the training metrics and can output metrics to the console during the training. Training metrics are essential and enable visual validation of the training progress and serve as a method to determine which of the created models is the best one. The charts are updated after each epoch. The model performance is calculated by computing accuracy metrics based on the supplied validation and training datasets. By keeping an eye on these metrics, poor model performance can be spotted early in the training, and the process can be canceled. This way, unnecessary time spent on training unsuccessful models is avoided.

The second Keras callback required is the checkpointing callback. The checkpointing callback continuously saves a new copy of the model at every epoch of training. This ensures that training results are not lost in an unforeseen situation or if an error occurs. A sketch combining the training call with the checkpointing callback is shown below.
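The sketch assumes the model and generators from the previous sections; the custom training monitor is omitted, and file names, epoch counts, and variable names are illustrative.

    from tensorflow.keras.callbacks import ModelCheckpoint

    # Save a copy of the model after every epoch so no progress is lost.
    checkpoint = ModelCheckpoint("checkpoints/model-{epoch:02d}.h5",
                                 save_best_only=False)

    model.compile(optimizer="sgd", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit_generator(train_generator,
                        steps_per_epoch=len(train_samples) // batch_size,
                        validation_data=val_generator,
                        validation_steps=len(val_samples) // batch_size,
                        epochs=25,
                        callbacks=[checkpoint])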


Training is finally stopped by the user when acceptable results have been reached. When the training process finishes, the model is saved one last time, and a final validation of the finished model is performed. The trained models can then be deployed onto the computer vision device for real-life testing.

4.4 Computer vision device

This section accounts for the design choices and requirements for the computer vision device. The process contains three steps: polling for frames, finding the facial region, and performing inference.

Polling for frames from the camera is done with OpenCV's read method. This method is a blocking I/O method and should therefore be parallelized to reduce its impact on the rest of the system. Parallelization is done by utilizing the threads of the CPU to separate the frame polling from the object detection and inference tasks. Threading could be implemented from scratch, but during the development of the project a convenient library containing various computer vision tools, called Imutils, was found. The Imutils package contains a built-in class for parallel polling of frames from the Raspberry Pi camera module, as sketched below. Therefore, this package is included to handle the parallelization tasks of the computer vision device.
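A minimal sketch of the threaded polling with Imutils, assuming the Raspberry Pi camera module; the warm-up delay is a common convention rather than a requirement.

    import time
    from imutils.video import VideoStream

    # Start a background thread that continuously polls the Pi camera,
    # so that read() returns the most recent frame without blocking.
    stream = VideoStream(usePiCamera=True).start()
    time.sleep(2.0)  # give the camera sensor time to warm up

    frame = stream.read()  # returns immediately with the latest frame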

The computer vision software on the Raspberry Pi uses OpenCV's Haar cascade detector to find faces in the video stream coming from the camera module. The Haar cascade classifier was chosen as it is conveniently included in OpenCV and is one of the most computationally light detectors available. There are other possibilities for object detection, such as neural network-based detectors, but they were not considered due to the processing power they would require.

After that, the device should use the trained models for computing classifications of age and gender. The saved and packaged models outputted by the pipeline are in Keras format, so deploying them to the computer vision device requires Keras to be used for inference. Keras performs the predictions on the found facial regions and produces the required metrics. The inference could be done using tools other than Keras. As an example, OpenCV contains a method for loading a neural network model directly and utilizing it for inference. To enable compatibility with OpenCV, an exported Keras model has to be converted to TensorFlow format. Exporting and converting models is a topic that requires further research and is, therefore, left outside the scope of this thesis. Possibilities of other inference methods are discussed further in chapter 7.

For testing purposes, the device is required to visually output the acquired frames and draw a bounding box around the detected face together with its classified labels. Finally, it should measure the classification times and count the frames processed per second; a sketch of such a detection-and-inference loop is shown below. These metrics are presented in the next chapter.
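The sketch ties the steps together, assuming the threaded stream from the previous section and a trained Keras gender model loaded as gender_model; the label list, cascade parameters, and the omission of mean subtraction are simplifications.

    import cv2
    import numpy as np

    # Haar cascade face detector shipped with OpenCV.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    GENDERS = ["female", "male"]  # illustrative class order

    while True:
        frame = stream.read()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            # Crop and resize the region of interest to the model input size.
            roi = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
            roi = np.expand_dims(roi.astype("float32"), axis=0)
            label = GENDERS[int(np.argmax(gender_model.predict(roi)))]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("output", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break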


5 DESCRIPTION OF THE MACHINE LEARNING PIPELINE

This chapter explains how the training was performed and what steps were required to produce the age and gender models. The chapter ends by presenting the training results for both the age and gender models separately.

5.1 Training process

The training is started by executing a training script. The pipeline fetches the pre-trained network and performs the required modifications to it depending on what type of model is being trained. The pre-processing Python generator resizes images from the disk to the chosen model input size of 224 × 224 × 3 pixels. After that, the images are converted into NumPy arrays and yielded to the training process together with their labels. Random horizontal flips, shears, and rotations are applied. After this, the actual training begins.

The most crucial hyperparameter when starting training is the learning rate. The learning rate controls the size of the steps taken by the optimizer. Even though Keras can automatically adjust the optimizer slightly during training, it is essential to find a suitable starting value. The learning rate is a small positive number between 0.0 and 1.0; typical learning rates are 0.1, 0.01, and 0.001. Finding a good learning rate requires a few test runs so that a solid baseline for the training can be found.
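In Keras the starting value is set when the optimizer is created. As a sketch, using the stochastic gradient descent optimizer and the 0.001 value that section 5.2 reports worked well for the gender model:

    from tensorflow.keras.optimizers import SGD

    # Explicit initial learning rate for the optimizer.
    optimizer = SGD(lr=0.001)
    model.compile(optimizer=optimizer, loss="categorical_crossentropy",
                  metrics=["accuracy"])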

During the training, the training monitor callback outputs the accuracy metrics for the model so that the performance of the training can easily be followed by analyzing the console output or by looking at the training graph. The pipeline outputs a training graph containing loss and accuracy for both the training and validation datasets for every epoch, as is commonly done when training neural networks. From these graphs, one can quickly determine whether the training is achieving desirable results, and stop it if it is not. During a well-performing training process, the curves of the accuracy and loss graphs follow a recurring, roughly exponential pattern: the loss starts high and decreases quickly, whereas the accuracy starts low and increases rapidly during the first epochs. After a certain number of epochs, both curves flatten out. A flat curve is usually a sign of the model reaching its maximum accuracy and minimum loss. At that point, the training should be stopped, and one can suppose that the best possible result has been found, or one can try changing the learning rate once more and restart training from the last checkpoint.

If the curves deviate from the uniform increase or decrease, it is likely a sign of a problem. Overfitting and underfitting are both quickly noticeable in the plotted graphs. An underfitting model produces oscillating curves, where the loss and accuracy increase and decrease from epoch to epoch. Overfitting models tend to have a high training accuracy, which would suggest that the model is performing well; however, the validation accuracy is then significantly lower and the validation loss high, as revealed by the validation step at the end of each epoch, where the model's output is cross-referenced with the actual labels.

The training process itself is experimental in nature, as it requires multiple training runs to find suitable parameters and to discover possible flaws in the design. The training was carried out over a month, during which different parameters, pre-processing methods, and modifications to the model were tested. The training process was performed step by step. It started with training models without data augmentation to build an understanding of the training process. This way, each part of the pipeline was tested, and an understanding of which combinations produced desirable results was reached. This process helped to identify any weaknesses the pipeline might have. Gradually, hyperparameters were tweaked, data augmentation was added, and possible flaws in the design were identified and fixed. The benefits of utilizing TensorFlow's GPU-accelerated training capabilities became apparent during the experimentation phase. Training one model on either of the datasets was surprisingly fast. These steps are discussed in more detail in the following section. The time for one epoch to finish varied slightly, but generally one epoch took around 2 minutes. As the next section will show, most models quickly reached acceptable training results. This meant that training one model took roughly half an hour.

5.2 Training results

This section presents the results of the training of the age and gender MobileNetV2 models. It accounts for the adjustments that were made to the model hyperparameters to enable the models to start training. Further, it presents the discovered weaknesses of the pipeline and mentions what measures were taken to assist training performance.


The training was started on the gender dataset. It took several attempts at tuning the hyperparameters before acceptable results were achieved. Most of the initial attempts ended with the model not being able to learn at all. Training validation was noisy, which resulted in an oscillating validation curve. The learning rate was too high, and the optimizer was taking too large steps over the loss landscape, sometimes finding a minimum only to step out of it during the next epoch. The initial learning rate was lowered, and the model started to produce more stable results. It was found that an initial learning rate of 0.001 worked well for training the gender model. Even with this learning rate, the model started rapidly overfitting after the first few epochs. It was clear that something was wrong, as the model should fairly quickly stabilize instead of overfitting. The pipeline required improvements. Data augmentation was included in the pre-processing step. Data augmentation rotates and flips the training data before training and helps the model to generalize. Another improvement was the addition of mean value subtraction, performed on the dataset before it is loaded. Mean value subtraction normalizes the pixel intensities of each image in the dataset. This makes the optimizer more stable, as the dataset is more centered and outliers are reduced by the normalization step. The data augmentation helped to alleviate the overfitting, and the normalization made the training more stable. One could see that the model started to find a baseline. Figure 14 shows that during the first few epochs, the loss quickly dropped and the accuracy improved.


Figure 14. The loss/accuracy chart shows the beginning of the training, as the curves take on their characteristic form.

In Figure 15, one can see that after epoch eight both the loss and the accuracies started to show signs of overfitting. The validation loss curve was steadily increasing from epoch eight onward. The reason for this could be the learning rate still being too high. The training was continued for a few more epochs, and at epoch fourteen still no signs of improving results could be seen. The training was manually paused, and the initial learning rate was lowered once more, down to 0.0001, to see if the model would improve further.


Figure 15. The loss/accuracy chart shows how the model is slightly overfitting around epoch 8.

Figure 16 shows the training plotted from epoch eight onward with the new learning rate. The model stabilized and was left training until epoch twenty-four. By then, it seemed that the best possible result had already been found. The final gender model achieved an accuracy of 74.5% on the gender dataset, a result that was reached at epoch three.


Figure 16. The loss/accuracy chart shows how training is continued from the checkpoint at epoch 6 with a new learning rate.

Figure 17 shows the summary of the model parameters after training and the ratio of frozen to non-frozen weights. One can see that the final model has a total of 2.5 million parameters, of which roughly 320 000 belong to the added custom gender outputs. The saved model had a size of 12.1 megabytes.

Figure 17. The parameter summary of the trained gender model


The age dataset turned out to be more challenging. The training was started with the same learning rate of 0.001 as in the previous experiment. However, it proved difficult to get the model to perform well. Even with the added pre-processing and normalization steps, the model had trouble finding a good baseline. The reason for this is still uncertain, but a detailed discussion is presented at the end of this chapter. Figure 18 shows the loss and accuracy plot for the age model. The curves do not show their characteristic form; it is as if they were stuck at a local minimum from the beginning.

Figure 18. The loss/accuracy plot for the age model

The model stopped improving after epoch 16 and achieved an accuracy of 31% on the age dataset. Figure 19 presents the parameter summary. Similarly to the gender model, the age model contains 2.5 million parameters, of which 329 000 were trained for the added age outputs. The age model is identical in size to the gender model at 12.1 megabytes.


Figure 19. The parameter summary of the trained age model

5.3 Discussion

The experiments discussed in this chapter accounted for the training of the age and gender MobileNetV2 models. Training a neural network model is a complicated task, and, as expected, it took a large number of experiments until an acceptable model was produced. Over roughly a month, different parameters and methods for supporting training were tested. This development phase pinpointed weaknesses and flaws in the design. Thanks to this, the pipeline was improved, and stable results for the gender model were produced.

The gender model turned out to be easier to train than the age model. This is most likely due to gender classification being only a two-class problem. Accuracies as high as 74% were reached. However, more work is required to find and solve the issues related to training the age classifier. The results indicate that there is an underlying issue with the training, related either to the data or to the structure of the network. Future research will have to find which modifications will enable the model to learn. Thankfully, during the research process multiple options for solving the issues were identified, and these are presented in detail later in this section.

The training process itself was faster than anticipated. Thanks to TensorFlow's GPU training module, the time to finish one epoch was roughly two minutes, and the time to train one model was approximately half an hour. This is beneficial, as the time for prototyping and overall development is significantly reduced. Additionally, if future work involves training a model from scratch instead of fine-tuning, the GPU-assisted training pipeline will have a significant impact on the overall training time.


The designed pipeline facilitated and supported training significantly. However, some limitations were found and addressed during training, such as the initial lack of data augmentation and normalization. The amount of data in the dataset is considered sufficient, as the Adience dataset is widely used as a benchmark for measuring how well models perform. However, spending time gathering and labeling additional data would not be a bad idea and could help in achieving even better training results.

Training the age classifier turned out to be significantly harder than expected. Multiple factors may lie behind the poor training results. As a note, the training itself could have been left running for longer; the longest training experiment was just under 40 epochs, but even then the model did not show any signs of improving.

Firstly, there might be some fundamental problem with the training data itself that makes it unsuitable for fine-tuning the pre-trained model, as fine-tuning relies on similarities between the new dataset and the one used for pre-training in order to achieve results. There is a possibility that the structure of the data, or the images themselves in the Adience dataset, are so different from the pre-training data that the network cannot make clear classifications on the training data during fine-tuning. This, however, is unlikely, due to the broad range of classes in the ImageNet dataset. As the ImageNet dataset contains classes such as person, one would assume there are enough similarities to the Adience dataset to make training feasible. If this should turn out to be the reason for the poor performance, one should try to train a MobileNetV2 model completely from scratch without any transferred weights.

Another possible reason for the model’s poor performance is linked to the structure of the network’s layers. It is possible that to classify the age groups the network would require a more complex output structure. Adding more untrained layers before the outputs could be required so that the network would have enough complexity to learn the features found in the dataset. Adding complexity would increase the number of parameters of the model, but would probably not impact inference times.

The last identified issue with the age classifier ties together the dataset and the optimizer. The model was trained with a stochastic gradient descent optimizer. The Adience dataset for age classification is clearly more challenging than the gender dataset. The model did have trouble finding a minimum, and this might demonstrate the limitations of standard stochastic gradient descent. To help the model find a minimum, a more advanced optimizer, such as Adam, should be considered. The Adam optimizer is designed to achieve results on large datasets with challenging loss landscapes. Fine-tuning the network with Adam should not be too different from how it was done with the stochastic gradient descent optimizer.
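The proposed change is small in code. As a sketch, compiling the age model with Adam instead of stochastic gradient descent could look as follows; the learning rate is only a starting point.

    from tensorflow.keras.optimizers import Adam

    # Swap stochastic gradient descent for Adam when compiling the model.
    model.compile(optimizer=Adam(lr=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])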

To conclude, both the gender and the age models have parts that could be improved upon. Future research is needed to discover the reason for the poor performance of the age classifier. Extending the monitoring callback to output images with their corresponding labels for visual inspection of the training output could help when debugging. As both models are based on the same dataset, it is likely that any solutions found for the age model would improve the performance of the gender model as well.


6 EVALUATION AND TESTING OF THE COMPUTER VISION DEVICE

This chapter contains the tests and the evaluation of the computer vision device. The tests account for the computer vision device's performance in real-life situations. The first test is a frame rate test, where the device's potential performance is measured with and without a face in the frame. The second test measures how long it takes the computer vision solution to make a prediction, from the moment a face is found until a prediction is made. The third test measures how long it takes to load the models, and the last test contains an example image of what a successful classification looks like visually.

6.1 Performing inference on the computer vision device

The platform for the computer vision device is a Raspberry Pi 3B+ single-board computer. The Raspberry Pi comes with a 1.4 GHz 64-bit quad-core ARM Cortex-A53 processor and 1 GB of LPDDR2 SDRAM. The device has the trained models saved on disk and a Raspberry Pi camera module attached to it. The detection and classification are done in parallel with the reading of camera frames to improve the frame rate. Parallelization is done with the Imutils VideoStream class, which handles the blocking I/O methods.

6.1.1 Performance test 1: Frames per second

To measure a baseline for the maximum possible frame rate, the camera view is left without a face in the frame. During this test, the device achieves 2 fps on average. When a face enters the frame, the frame rate drops to just below 1 fps on average.


6.1.2 Performance test 2: Inference time

The time it takes for the model to compute the inference is measured from when a face is found to when a classification is outputted. The device outputs the class labels both visually, drawn on the camera feed, and to the command line. On average, the inference takes 1.8 seconds.
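As a sketch, the measurement itself can be as simple as wrapping the prediction call, assuming a pre-processed region of interest roi and a loaded Keras model.

    import time

    start = time.time()
    prediction = model.predict(roi)  # classify the detected face region
    print("inference took %.2f seconds" % (time.time() - start))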

6.1.3 Performance test 3: Model loading time

The process for age and gender classification is started by executing a shell script. The startup includes loading both neural networks, and their loading time is measured at startup. On average, it takes 140 seconds for the device to load both neural networks.

6.1.4 Example output

Two random images, one from the female class and one from the male class, were picked from the testing dataset. As they belong to the testing dataset, one can be sure that the models have not seen them during training. The images were classified by the computer vision device the same way a frame from the camera would be. A bounding box highlights the region of interest, and the inferred labels are displayed above the face. Interestingly, the poor accuracy of the age model expresses itself as a recurring faulty classification: the classifier consistently labels an input as belonging to the third class, 8–12 years. Poor performance in the form of incorrect labels was expected, but the reason why the third class dominates among the classifications is unknown.


Figure 20. Screenshots with visualized classifications produced by the computer vision device.

6.2 Discussion

A computer vision device based on the Raspberry Pi 3B+ was created to perform age and gender classification from the input of a video stream. The tests demonstrated that inference on age and gender is possible on the chosen platform, even though the hardware limitations are obvious.

Both the platform’s hardware, and the models themselves do introduce limitations on the performance. The models are now loaded as pure Keras models, which are not optimal for use on embedded devices. The Keras models contain extra metadata regarding the neural network itself, which is not needed when computing inference. TensorFlow allows exporting Keras models into a more optimized format, which removes the additional data from the model. This would require freezing the models and converting them to the so- called protobuf format. For the scope of this thesis, the optimizations are left out as they require more research and possibly modifications to the computer vision stack.


Computing inference is a costly process, and changing platform or adding a USB-based neural compute unit, such as the Google Coral or the Movidius NCS2, to the Raspberry Pi would speed up inference times significantly. The table in Figure 21 shows benchmarking results for inference with MobileNetV1 and MobileNetV2 on various platforms. These models have been downloaded directly from TensorFlow and classify 1000 ImageNet classes. It is interesting to see that the Coral USB accelerator offers substantial speedups compared to the Raspberry Pi's CPU: there is an 11x speedup in inference time from the Raspberry Pi 3, at 654 ms, to the Google Coral compute unit, at 58 ms. Comparably, the Intel Movidius NCS2 provides a 5x speedup over the base TensorFlow model running on the Raspberry Pi CPU. It can be assumed that the age and gender models trained in this thesis would see similar speedups, as they fundamentally are MobileNetV2 models.

Figure 21. The table [25] contains benchmark results of inference times of TensorFlow’s MobileNetV1 and MobileNetV2 models across different platforms.


Optimizing the models is interesting both from the point of view of the hardware and of the models themselves. OpenCV supports loading protobuf models, which would make the model loading faster. The optimization step leaves out unwanted metadata from the model and simplifies some of the network's layers. It is uncertain whether the current model structure could be optimized directly or whether the network would require topological changes before optimization can be done.

Additionally, optimizing the models is required to make them compatible with other platforms such as the Google Coral or Movidius NCS2. Google Coral devices only support TensorFlow Lite models. TensorFlow Lite was an area that was not explored in this thesis and would require familiarization.


7 CONCLUSION AND FUTURE WORK

7.1 Conclusion

This thesis set out to research the possibilities of computer vision-based age and gender classification for the retail environment. The task was to produce age and gender analytics from a video stream as accurately as possible while still respecting the individual's integrity.

A machine learning pipeline was created by utilizing the latest technologies and frameworks. This pipeline performs the tasks related to training machine learning models. Two convolutional neural network-based age and gender classifiers were trained and validated. The models were deployed on the computer vision platform for inference and testing.

Firstly, the proof-of-concept product meets the specified requirements. The computer vision solution proves that execution on the chosen single board computer platform is possible, and it can perform inference on age and gender. Despite this, the age classifier’s results leave much to be desired. The age classifier did not reach satisfactory results during training and, as such, the classifier does not produce accurate results.

Secondly, the developed machine learning platform is a suitable base to create other future machine learning-based projects. Choosing a pipeline structure for training was vital as it supports easy modifications and extensions. The modularity of the pipeline enables easy change of pre-processors or training of other types of neural networks. Theoretically, in the future, the pipeline could even be extended to handle automatic deployment to edge devices.

Finally, the chosen neural network library is considered a successful choice. TensorFlow models support future work well and fit development platforms such as the Google Coral TPU.


7.2 Future implementation

The solution developed within this thesis serves as a good ground for further research, if considered as a basis for future products for the company.

The models produced in this thesis are loaded as pure Keras models. Within the scope of this thesis, the subject of optimization was not studied. The raw models contain metadata that is not needed for inference, and TensorFlow includes tools for optimizing a model by removing this extra metadata. This thesis recommends investigating the suitability of optimized TensorFlow models for inference with OpenCV.

Both classifiers were trained by fine-tuning. Fine-tuning on the age dataset turned out to be especially demanding. One explanation for the poor performance could be that the combination of data and chosen model is unsuitable for fine-tuning. Training the classifiers from scratch could result in better accuracies. Alternatively, fine-tuning could be retried after adding complexity to the model, adding more training data, or changing optimizers.

At the time of writing, version two of TensorFlow has been released. Version two contains significant changes compared to the previous version and forms a complete ecosystem with Keras natively integrated. This means that going forward, Keras should be used within TensorFlow itself via the so-called tf.keras package. This integration will make development more manageable, as the two are incorporated and issues related to version mismatches are less likely to occur. The new TensorFlow version includes better multi-GPU training, provides finer control when building custom models, and includes the TensorFlow Lite package for optimizing models for embedded platforms. With these improvements in mind, the pipeline should be updated to the new TensorFlow version.

As this thesis demonstrates, hardware has a significant impact on the performance of computing inference. Future work could research utilizing specialized hardware for computing inference, either by changing the single-board platform or by adding a USB accessory for neural computing. Additionally, it is pointed out that at the time of writing, the latest model of the Raspberry Pi single-board computer is the model 4, with improvements in the form of faster hardware.


SUMMARY IN SWEDISH – SAMMANFATTNING PÅ SVENSKA

COMPUTER VISION AS AN ANALYTICS TOOL IN A RETAIL ENVIRONMENT

This Master's thesis was carried out by an employee of the company Stratacache Oy. The project comprises the development of a concept for a computer vision product.

Stratacache Oy operates in the retail analytics industry. Its customers are found in sales-oriented businesses such as retail and the restaurant industry. Stratacache Oy delivers tailored analytics tools that give its customers comprehensive information about their customers' behavior. These data are assembled by combining several different data sources, including wireless internet traffic, point-of-sale transactions, and data from door counters. By combining these, Stratacache Oy can provide its customers with a detailed analysis of human behavior, which in turn can make sales more effective. Staying well informed about developments in the industry is important for the company. The company is therefore interested in investigating the use of computer vision as a data source for retail analytics.

The development of embedded computer systems has advanced considerably in recent times. This development enables the use of more sophisticated methods for computer vision. Computer vision comprises techniques in which a computer is used to form an understanding of the visible world. The computer analyzes the images in a video sequence to try to form an understanding of what the images contain.

Artificial neural networks and Haar detectors are the techniques responsible for the computer vision in this thesis. To perform a classification, the computer vision system first needs to find the part of the image containing the region to be analyzed, after which the actual analysis can be carried out. The process is thus two-part.

The Haar detector is used to find the part of an image where the object to be analyzed is located. This part is then passed to a neural network for classification.

Artificial neural networks are networks that attempt to imitate the network of synapses and neurons found in the human brain. The structure of an artificial neural network is tree-like and is built up of an arbitrary number of layers of nodes from the network's inputs to its outputs. The artificial neural network requires training before it can be used for computer vision applications. Based on example data, training teaches the network to find patterns in the data presented to it. The trained neural network can later be used to programmatically recognize the constituent parts of an image.

This work comprises the development of a machine learning platform for producing an artificial neural network. The neural network developed in the thesis is a convolutional neural network, a type of neural network specialized in image recognition. The machine learning platform serves as a tool that enables complete training of the artificial neural network. In addition, the platform handles the preparatory steps and supporting processes required to complete the training of the artificial neural network. Finally, the trained neural networks are deployed on a Raspberry Pi 3B+ single-board computer.

The Python programming language and the Keras and TensorFlow libraries are used for the development of the systems in this thesis. TensorFlow and Keras are popular libraries used for developing machine learning algorithms.

With the help of the developed machine learning platform, two different convolutional neural networks are trained. The first is responsible for the classification, or inference, of gender and the second for the inference of age.

The neural network for gender classification reaches an accuracy of 74.5% during training, while the neural network for age classification reaches an accuracy of 31%.

Both neural networks are deployed on a single-board computer to carry out tests in real time. The first test measures how many frame updates per second are possible, both with and without a face in the frame. The second test measures the time it takes to load the neural networks, while the third test measures the time it takes to perform a complete classification.

During the frame rate test, an average rate of 2 frames per second is reached without a face in the frame. With a face in the frame, a rate of 1 frame per second is measured.

The inference time for a completed classification is on average 1.8 seconds, and the time it takes to load the networks is 140 seconds.


Overall, the proof-of-concept product meets its stated requirements. The computer vision solution is executable on the chosen single-board computer and is capable of performing inference on gender and age in a video sequence. Despite this, the accuracy of the neural networks leaves room for improvement. As future work, this thesis leaves room for improving the trained neural networks. Improvements can be made in the form of optimizations to the networks' file format, but actual restructuring of the network architecture may also be needed. For example, the file format in which the artificial neural networks are exported contains a great deal of metadata. This metadata is needed during the training process but is unnecessary for performing inference. In the future, one should investigate what kind of improvements are best suited to optimizing the model for inference. A definite cause of the age classifier's low accuracy cannot be determined. However, it is noted that the structure of the neural network and the nature of the training data are possible reasons why the age classifier reaches only a modest accuracy.

Furthermore, it is noted that the developed machine learning platform can support possible future work on the neural networks presented here, as well as serve as support for the company's other computer vision projects. It is also pointed out that the choice of neural network allows continued development on a wide selection of single-board computer platforms.


BIBLIOGRAPHY

1. Rosebrock A. Deep Learning for Computer Vision with Python: Starter Bundle. PyImageSearch; 2017. (Deep learning for computer vision with Python).

2. Rosebrock A. Deep Learning for Computer Vision with Python: Practitioner Bundle. PyImageSearch; 2017. (Deep learning for computer vision with Python).

3. Rosebrock A. Deep Learning for Computer Vision with Python: ImageNet Bundle. PyImageSearch; 2017. (Deep learning for computer vision with Python).

4. Bradski G. The OpenCV Library. Dr Dobbs J Softw Tools. 2000;

5. Chollet F, others. Keras [Internet]. 2015. Available from: https://keras.io

6. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems [Internet]. 2015. Available from: https://www.tensorflow.org/

7. Rosebrock A. jrosebr1/imutils [Internet]. 2020 [cited 2020 Mar 25]. Available from: https://github.com/jrosebr1/imutils

8. van der Walt S, Colbert SC, Varoquaux G. The NumPy Array: A Structure for Efficient Numerical Computation. Comput Sci Eng. 2011 Mar;13(2):22–30.

9. EUR-Lex - 32016R0679 - EN - EUR-Lex [Internet]. [cited 2020 Mar 25]. Available from: https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng

10. da Vinci L. Ritratto di Monna Lisa del Giocondo (Portrait of Mona Lisa del Giocondo) [Internet]. 1503 [cited 2020 Mar 10]. Available from: https://commons.wikimedia.org/wiki/File:Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg

11. Plotke M. Before any kernel convolution [Internet]. 2013 [cited 2020 Mar 17]. Available from: https://commons.wikimedia.org/wiki/File:Vd-Orig.png

12. MortenZdk. Gaussian blur 5x5 of related image [Internet]. 2016 [cited 2020 Mar 17]. Available from: https://commons.wikimedia.org/wiki/File:Vd-Blur_Gaussian_5x5.png

13. ILSVRC2012_val_00004465.png (473×253) [Internet]. [cited 2020 Mar 10]. Available from: http://www.image-net.org/challenges/LSVRC/2014/ILSVRC2012_val_00004465.png

14. Eidinger E, Enbar R, Hassner T. Age and Gender Estimation of Unfiltered Faces. IEEE Trans Inf Forensics Secur. 2014 Dec;9(12):2170–9.


15. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. ArXiv14126980 Cs [Internet]. 2017 Jan 29 [cited 2020 Apr 6]; Available from: http://arxiv.org/abs/1412.6980

16. Block HD. The Perceptron: A Model for Brain Functioning. I. Rev Mod Phys. 1962 Jan 1;34(1):123–35.

17. general-diagram-of-perceptron-for-supervised-learning.jpg (667×306) [Internet]. [cited 2020 Mar 10]. Available from: https://www.simplilearn.com/ice9/free_resources_article_thumb/general-diagram-of-perceptron-for-supervised-learning.jpg

18. Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001). 2001. p. I–I.

19. haar.png (334×203) [Internet]. [cited 2020 Apr 6]. Available from: https://docs.opencv.org/3.4/haar.png

20. Rodriguez P, Cucurull G, Gonfaus J, Roca X, Gonzàlez J. Age and Gender Recognition in the Wild with Deep Attention. Pattern Recognit. 2017 Jul 1;

21. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: A Large-Scale Hierarchical Image Database. In: CVPR09. 2009.

22. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. ArXiv170404861 Cs [Internet]. 2017 Apr 16 [cited 2020 Mar 23]; Available from: http://arxiv.org/abs/1704.04861

23. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. ArXiv180104381 Cs [Internet]. 2019 Mar 21 [cited 2020 Mar 23]; Available from: http://arxiv.org/abs/1801.04381

24. pipeline-flow.png (1733×360) [Internet]. [cited 2020 Mar 20]. Available from: https://docs.microsoft.com/en-us/azure/machine-learning/media/concept-ml-pipelines/pipeline-flow.png

25. Benchmarking Machine Learning on the New Raspberry Pi 4, Model B [Internet]. Hackster.io. [cited 2020 Apr 5]. Available from: https://www.hackster.io/news/benchmarking-machine-learning-on-the-new-raspberry-pi-4-model-b-88db9304ce4
