DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

Using Random Forest model to predict image engagement rate

FELIX EDER

MARKO LAZIC

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Computer Science
Date: June 4, 2018
Supervisor: Jens Lagergren
Examiner: Pawel Herman
Swedish title: Användning av Random Forest model för att förutspå bildengagemangsfrekvens
School of Electrical Engineering and Computer Science


Abstract

The purpose of this research is to investigate whether the Google Cloud Vision API combined with the Random Forest machine learning algorithm is advanced enough to build software that can evaluate how much an Instagram photo contributes to the image of a brand. The data set contains images scraped from the public Instagram feed filtered by #Nike, together with the metadata of each post. Each image was processed by the Google Cloud Vision API in order to obtain a set of descriptive labels for the content of the image. The data set was then sent to the Random Forest algorithm in order to train the predictor. The results of the research show that the predictor guesses the correct score in only about 4% of cases. The results are not very accurate, mostly because of the limiting factors of the Google Cloud Vision API. The conclusion drawn is that it is not possible to create software that can accurately predict the engagement rate of an image with the technology that is publicly available today.

Sammanfattning

The purpose of this research is to investigate whether the Google Cloud Vision API combined with Random Forest machine learning algorithms is sufficiently advanced to create software that can reliably evaluate how much an Instagram post contributes to the image of a brand. The data set contains images fetched from Instagram's public feed filtered by #Nike, together with the metadata of each post. Every image was processed by the Google Cloud Vision API in order to obtain a set of descriptive labels for the content of the image. The data set was then sent to the Random Forest algorithm to train its model. The results of the study are not very accurate, which is mainly due to the limiting factors of the Google Cloud Vision API. The conclusion drawn is that it is not possible to reliably predict the quality of an image with the technology that is publicly available today.

Contents

1 Introduction
  1.1 Problem statement
  1.2 Scope
  1.3 Thesis overview

2 Background
  2.1 Terminology
  2.2 Machine Learning
    2.2.1 Decision tree
    2.2.2 Bootstrap aggregating
    2.2.3 Random Forests
  2.3 Image API and classification
    2.3.1 Cloud Vision API
    2.3.2 Clarifai Predict
    2.3.3 IBM Watson Visual Recognition
    2.3.4 Amazon Rekognition
  2.4 Related work

3 Methods
  3.1 Points formula
  3.2 Image Source and Scraping
  3.3 Image Classification API
  3.4 Machine Learning Run
    3.4.1 Comparing image scores
    3.4.2 Noise reduction
    3.4.3 Algorithm parameters

4 Results
  4.1 Regression models
    4.1.1 R1
    4.1.2 R2
    4.1.3 R3
    4.1.4 R4
    4.1.5 R5

5 Discussion
  5.1 Result analysis
  5.2 Limitations
  5.3 Future research

6 Conclusion

Bibliography

A Source Code

Chapter 1

Introduction

Our methods of communication throughout history have been ever changing, from developing our first languages to writing letters and telegrams. But the signature method of communication in the twenty-first century is undoubtedly social media. It allows us to express our opinions and beliefs as well as keep in touch with our loved ones [15]. Social media has also had a significant impact on companies and brands, which now have a tool for open and direct communication with their customers, in both directions. Websites such as Twitter, Facebook and LinkedIn offer a sense of community and connection not only between companies and people, but also between customers and fans themselves. If Facebook were its own country, it would be the world's third most populous one (after China and India), which means that good communication between people and brands is all but mandatory in today's ever-changing world. But this also means that companies have to be more careful with their online marketing and overall behavior, as customer backlash can quickly damage a brand's image [15].

For all these reasons social media has become an important aspect for all types of brands, as it is viewed as a great channel for communication and customer satisfaction. This area is however still quite young, and there are not many tools for brands to directly reward their influencers for the marketing they do for them. A lot of people upload images connected to certain brands out of loyalty or love for a product, but there are few tools for brands to find the specific people who contribute the most to their online appearance and community. For example, the hash tag #Nike has reached over 33 million people on social media right now (according to the Keyhole hash-tag tracker for 2018-03-12, http://keyhole.co/), but there is no way for Nike to actually go through all these posts to see which images relate to the brand appearance they want on social media and which images detract from it. It is impossible to manually go through all this data, and since social media user bases only continue to grow, this problem will only get bigger for brands that try to establish their online image.

The purpose of this thesis is to see if it is possible to build a software program that uses Machine Learning to decide which photos will be an advantage for a brand to associate with their online image and which photos will detract from it. Such software would save a lot of time and money and could be an invaluable tool for brands trying to build their business in the modern world of social media.

1.1 Problem statement

This thesis aims to investigate whether it is possible to combine existing Image Classification APIs with Machine Learning in order to determine if a photo will contribute to the desired online image of a brand. This would be an invaluable tool for brands trying to build their image. The question that this thesis will address is:

• Are random forest algorithms combined with the Google Cloud Vision API sufficiently advanced in order to determine the engagement rate of images based on a list of requirements?

1.2 Scope

The Random Forest Machine Learning algorithm will be used. The training data for the algorithm will be 80 000 different images posted under a specific hash tag on Instagram, and 25 000 images will be used as test data for the trained algorithm. One Image Classification API will be used in order to classify all images and give them a set of labels for the Machine Learning algorithm.

1.3 Thesis overview

Chapter 2 will introduce the Random Forest Machine Learning algorithm as well as related work. It will also describe the most popular commonly accessible Image Classification APIs and compare their results by sending in the same test image. Chapter 3 motivates the choice of Machine Learning algorithm as well as which Image Classification API to use, and describes how the test data was sampled. It will also explain how the research was conducted. Chapter 4 will present the results from the research, and in chapter 5 the results will be analyzed. In chapter 6 conclusions will be drawn based on the previous discussions.

Chapter 2

Background

2.1 Terminology

The following are a few new concepts used throughout the report:

Ghost followers, also referred to as ghosts, ghost accounts or lurkers, are users on social media platforms who remain inactive or do not engage in activity. They are usually created by bots in order to boost follower counts, but they can also be created by people [4].

2.2 Machine Learning

Machine learning is a form of AI that enables a system to learn from data rather than through explicit programming [14]. A more formal definition of machine learning algorithms is provided by Tom M. Mitchell in his book from 1997: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E" [13]. Machine learning algorithms use data to learn to make predictions about unseen data in the future. Machine learning algorithms can roughly be categorized into two types depending on the learning technique that is used:

• Supervised learning: the data presented to the algorithm is labeled with the desired output.

• Unsupervised learning: no labels are available with the learning data.


We will focus on analyzing results from a Random Decision Forest supervised learning algorithm.

2.2.1 Decision tree

The basic idea of decision trees is to test attributes sequentially, asking a question about the target at each step. Decision trees can be used for both regression and classification problems, but for our purposes we will use only regression. There are two main steps in building a decision tree:

1. Divide the set of possible values $X_1, X_2, \ldots, X_p$ into $J$ distinct and non-overlapping regions $R_1, R_2, \ldots, R_J$.

2. For every observation in region $R_j$, make the same prediction, namely the mean of the response values for the training observations in $R_j$.

The goal is to find regions $R_1, R_2, \ldots, R_J$ that minimize the residual sum of squares (RSS), which is given by the formula:

$$\sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2$$

where $\hat{y}_{R_j}$ is the mean response for the training observations within the $j$th region. Because it is infeasible to consider each possible partition of the feature space into $J$ regions, a top-down, greedy approach is used to split the tree. It starts at the top of the tree and then successively splits the predictor space. It takes the best split at each particular step, rather than looking ahead and picking the split that will lead to the best tree in the future [17].
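To make the greedy splitting step concrete, here is a minimal sketch (ours, not from the thesis) in Python with NumPy that finds the single split point on one feature minimizing the RSS of the two resulting regions:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on feature x that minimizes the RSS
    of the two regions {x <= t} and {x > t} (one greedy step)."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_t, best_rss = None, np.inf
    # Candidate thresholds: midpoints between consecutive distinct values
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue
        t = (x_sorted[i] + x_sorted[i - 1]) / 2
        left, right = y_sorted[:i], y_sorted[i:]
        # RSS of a region is the sum of squared deviations from its mean
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_t, best_rss = t, rss
    return best_t, best_rss

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.0])
print(best_split(x, y))  # splits at x = 6.5, separating the two clusters
```

A full tree simply applies this step recursively to each resulting region until a stopping criterion (such as a minimum leaf size) is met.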

2.2.2 Bootstrap aggregating

Decision trees usually suffer from high variance. This means that if we split the training data into two parts at random and fit a decision tree to each half, the results we get could be quite different. Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the variance of a statistical learning method, and it is particularly useful in the context of decision trees. Given a set of $n$ independent observations $Z_1, \ldots, Z_n$, each with variance $\sigma^2$, the variance of the mean $\bar{Z}$ of the observations is given by $\sigma^2/n$, which means that averaging a set of observations reduces variance. So we generate $B$ different bootstrapped training data sets from our original data set, make $B$ predictions, and finally average them to get our final prediction [17]:

$$\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$$
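As an illustration of the averaging step (our sketch, not the thesis's code; it assumes NumPy arrays and scikit-learn, which the thesis uses later):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, B=100, seed=0):
    """Average the predictions of B trees, each fit on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.zeros((B, len(X_test)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)      # sample n rows with replacement
        tree = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
        preds[b] = tree.predict(X_test)
    return preds.mean(axis=0)                 # f_bag(x) = (1/B) * sum_b f*b(x)
```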

2.2.3 Random Forests

Random forests bring one improvement over bagged trees by decorrelating the trees. Many different decision trees are built on bootstrapped training samples in the same way as in bagging, but this time, each time a split in a tree is considered, a random sample of $m$ predictors is chosen as split candidates from the full set of $p$ predictors. Typically we choose $m \approx \sqrt{p}$. This means that while building the random forest, at each split in the tree, the algorithm is not allowed to consider the majority of the available predictors. This method overcomes the problem that can occur if we have one strong predictor, which would make the trees highly correlated [17].
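In scikit-learn, which is used later in this thesis, this restriction corresponds to the max_features parameter. A minimal sketch with synthetic data of our own:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))            # 200 samples, p = 16 features
y = X[:, 0] * 3 + rng.normal(size=200)    # one strong predictor plus noise

# max_features="sqrt" means each split considers only ~sqrt(p) = 4 of the
# 16 predictors, so not every tree can lean on the single strong feature.
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.predict(X[:3]))
```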

2.3 Image API and classification

Our software solution in this project will utilize an Image API in order to analyze many different images quickly. Therefore we will look into the most popular openly available Image APIs today in order to assess which would be most suitable for our project and report. As a small test, the same selected image will be sent to each of the APIs and their returned labels will be compared.

2.3.1 Cloud Vision API

Google's Cloud Vision API was released to the public in February of 2016, and its main draw is that it can recognize individual objects in an image and return a set of labels. For example, if you submit an image of a hike through the woods, you might get a result such as:

• Walking: 90%

• Woods: 85%

• Spring: 87%

It also supports advanced face recognition, and Google claims that it has trained its algorithm to detect over a thousand different objects [9].
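For reference, a minimal label-detection call. This sketch is ours and assumes the google-cloud-vision Python client library (v2 or later) with service-account credentials configured; the file name is hypothetical:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("test_image.jpg", "rb") as f:   # hypothetical local file
    image = vision.Image(content=f.read())

# Ask the API for descriptive labels and their confidence scores
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.0%}")
```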

The selected test image was sent to the API and a set of labels was retrieved.

As can be seen from the labels, everything the API labeled is in the picture, but some labels are still missing, such as "people", "man", etc., that could have been included as well. None of the labels has a really high confidence percentage, which does not make the API seem very precise in its labeling.

2.3.2 Clarifai Predict

Clarifai has an open Image API called Clarifai Predict which works similarly to Google's Vision API. You input an image and get a list of labels, each with a probability score that it is present in the image. The API also has a feature called models, where a model is a category of images. Say you work with this API and are only interested in the colors of the images you input. Then you could choose a color model, which only returns labels like contrast, dominant colors and other color-related labels. This is useful when you do not want all the available labels, but only a select few. You can also define your own custom model with the labels you are interested in, in order to only get the data you need. Clarifai also supports video classification, which classifies one frame per second and returns a set of labels for each classification [3].

The selected test image was sent to the Clarifai API and a set of labels was returned.

The Clarifai API returns many labels that are accurate and present in the picture (e.g. "man", "outdoors", etc.), both more numerous and more accurate than those of the Cloud Vision API. It does however make a few assumptions about the image that might not be accurate ("family", "couple"), as well as some that are not present in the image at all ("woman", for example).

2.3.3 IBM Watson Visual Recognition

IBM has a number of AI services called Watson, one of which classifies images: Watson Visual Recognition. This API works in much the same way as the two previous APIs; you enter an image and get back a list of labels with corresponding percentages. The API also includes face detection, which according to IBM should be able to distinguish age and gender. As with the Clarifai Predict API, Watson also supports models, which can either be chosen from a list of pre-made models or custom-made [5].

Our previously selected test image was sent to this API as well and a set of labels was returned.

The Watson Visual Recognition API seems to focus a lot on smaller details with lower accuracy, such as "rice" and "trouser". The only high-accuracy label is for a color, and simple labels such as "person" have a fairly low accuracy value.

2.3.4 Amazon Rekognition

Amazon has its own image recognition API, named simply Amazon Rekognition. It offers mainly the same features as the previous APIs: input images and get a list of labels and percentages back. Like Clarifai, it supports both image and video recognition, and Amazon boasts more advanced facial analysis than its competitors, with support for things like recognition across multiple images, facial comparison (how similar do two faces look?) as well as beard detection [6].

Amazon Rekognition returned a set of labels for our test image.

The API manages to detect some simple labels like "human" and "person" with very high accuracy, as well as minor things like "Sunglasses" with fairly good accuracy. Some labels, such as "Nature" and "Outdoors", have a fairly low accuracy even though they make up a big portion of the image.

2.4 Related work

There is not a lot of work and research done in the field of image classification regarding social media. A study from 2012 at National Taiwan University experimented with image classification using data sets from social media. They wanted to find out if using image resources from social media could help increase the accuracy of the classification itself. Their data set of images came from Flickr, where they manually labeled 873 images. They then randomly selected 10 000 images to serve as backgrounds for the labeled images and merged the labeled images with the backgrounds to create a data set of 13 000 images. They found that by using this crowd-sourced image data set, the accuracy of the classified images improved by 27% compared to using state-of-the-art image training data. They concluded that this increased accuracy was due to the more visually diverse training images that were possible with a data set from Flickr [18].

Another study on image classification was conducted at Stanford University, which used crowd-sourced images as a data set in order to assess whether the metadata collected from the selected social media platform (in this paper Flickr was used) can be harnessed to label and classify images. It also aimed to answer the question of what types of metadata are useful for labeling images. They used three different types of metadata to predict images: image labels, tags and groups. Labels on Flickr are decided entirely by humans and can be set outside of Flickr. Tags are a bit less structured and can be set by both humans and computers. Tags do not need to be related to the content of the image itself, but could be information like the brand of the camera the image was taken with. The group of the image is an optional parameter set by the uploader of the image. In order to evaluate the results of their research, they compared images both visually and with their metadata. In regards to the labels, they noticed a 7% improvement in the Mean Average Precision (MAP) of classifying images' labels compared to the best visual methods. Similar results were encountered when the same tests were performed but the images were classified based on their tags and groups [12].

A similar study done at Kitware Inc. in New York wanted to see if they could use Convolutional Neural Networks (CNN) with the metadata of images found on social media (Flickr specifically was used in this paper) in order to boost the quality of image labeling. Since most programs that label an image go through it pixel by pixel in order to determine its contents, this study wanted to see if it was possible to label the image based on metadata such as comments, groups, tags and other images posted by the same user. They conducted their experiment with a subset of images under Flickr's Creative Commons licenses, which contained 6000 images for training their algorithm and 3182 testing images. In the end they found that their own CNN algorithm trained with metadata from Flickr outperformed the current state-of-the-art methods [11].

Chapter 3

Methods

105 000 images with the hash tag #Nike were scraped from the public Instagram feed, and the images with their accompanying metadata were stored locally. The images were sent to the Google Cloud Vision API, which was the API chosen, and the returned labels were stored locally, with an average of slightly more than 8 labels per image. The metadata and set of labels for 80 000 of these images were then sent to our machine learning algorithm.

3.1 Points formula

The number of points is determined by the post's metadata and is calculated by the following formula:

$$l \cdot w_1(f) \cdot w_2(l, c) \cdot w_3(l, f) \qquad (3.1)$$

where $l$ is the number of likes, $f$ the number of followers, $c$ the number of comments, and:

$$w_1(f) = \begin{cases} 2, & \text{if } f > 100000 \\ 1.8, & \text{if } 10000 \le f < 100000 \\ 1.6, & \text{if } 5000 \le f < 10000 \\ 1.4, & \text{if } 1000 \le f < 5000 \\ 1.4, & \text{otherwise} \end{cases}$$

$$w_2(l, c) = \begin{cases} 0.6, & \text{if } c = 0 \\ 1.2, & \text{if } l/c > 0.2 \\ 0.8, & \text{otherwise} \end{cases}$$

$$w_3(l, f) = \begin{cases} 0, & \text{if } f = 0 \\ 0.8, & \text{if } 0 \le l/f < 0.03 \\ 1, & \text{if } 0.03 \le l/f < 0.05 \\ 1.2, & \text{if } 0.05 \le l/f < 0.1 \\ 1.4, & \text{if } 0.1 \le l/f < 0.2 \\ 1.6, & \text{otherwise} \end{cases}$$

This formula was derived by testing several different ways to calculate the points, and the main idea behind it is to measure how big an impact the photo made in the user's community. This is done by taking into account the number of followers the user has, the likes/followers ratio and the likes/comments ratio. The number of likes is the most important factor in deciding how many points a photo should receive, but it is not decisive, because one like is weighted differently depending on these three factors.

After all images had been input into the machine learning algorithm, 20 000 new images that went through the same process through the Google Vision API were sent to the Random Forest algorithm in order to predict how many points each particular photo would get on Instagram.
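Formula 3.1 transcribes directly into Python. This is our sketch, not the thesis's implementation (which lives in the appendix repository); in particular, we read the garbled $w_3$ cases as thresholds on the likes/followers ratio $l/f$, matching its signature and the prose above:

```python
def w1(f):
    # Follower-count weight; the source lists 1.4 both for the
    # 1000-5000 band and for everything below it
    if f > 100_000: return 2.0
    if f >= 10_000: return 1.8
    if f >= 5_000:  return 1.6
    return 1.4

def w2(l, c):
    # Likes/comments weight
    if c == 0:      return 0.6
    if l / c > 0.2: return 1.2
    return 0.8

def w3(l, f):
    # Likes/followers weight (reconstructed as l/f thresholds)
    if f == 0:   return 0.0
    r = l / f
    if r < 0.03: return 0.8
    if r < 0.05: return 1.0
    if r < 0.1:  return 1.2
    if r < 0.2:  return 1.4
    return 1.6

def points(likes, followers, comments):
    """Engagement score per formula 3.1."""
    return likes * w1(followers) * w2(likes, comments) * w3(likes, followers)

print(points(likes=500, followers=8000, comments=40))  # 500*1.6*1.2*1.2 = 1152.0
```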

3.2 Image Source and Scraping

The images used by the machine learning algorithm as input data for both learning and testing were scraped from the public Instagram feed, sorted by #Nike. Instagram was chosen since it is currently the biggest photo-sharing social platform [7], where posts mainly consist of personal images. The hash tag was chosen since it represents one of the largest brands on the platform, with over 60 million associated posts [8]. As of 10/1/2017, the Instagram API only allows basic permissions regarding fetching data [10]. This means that you can only get information about your own profile and media with an API key.

In order to get enough data to perform the required research, Instagram posts were scraped for images and metadata. Public feeds on Instagram are open and can easily be queried for specific hash tags, which made web scraping a possible solution. The Instagram hash tag search feed provides photos in chronological order with the latest posts first, so a minimum time interval of 7 days was imposed between the first and second page in order to give users enough time to like and comment on a photo, yielding data that is as relevant as possible.
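To illustrate the 7-day cutoff, a sketch of the scraping loop. fetch_hashtag_page is a hypothetical helper standing in for the actual scraper, and the post fields are our assumptions; the real code is in the repository linked in appendix A:

```python
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # minimum age of a post, in seconds

def scrape(fetch_hashtag_page, tag="nike", pages=100):
    """Collect posts old enough for likes and comments to have accumulated.

    fetch_hashtag_page(tag, cursor) is assumed to return (posts, cursor),
    where each post dict has 'taken_at' (unix time), an image URL and metadata.
    """
    cutoff = time.time() - SEVEN_DAYS
    kept, cursor = [], None
    for _ in range(pages):
        posts, cursor = fetch_hashtag_page(tag, cursor)
        # The feed is newest-first: keep only posts at least 7 days old
        kept.extend(p for p in posts if p["taken_at"] <= cutoff)
        if cursor is None:
            break
    return kept
```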

3.3 Image Classification API

All Image Classification APIs discussed previously in the background section are for commercial use and therefore need to be paid for. Google, Clarifai and Amazon offer free credits on sign-up, but only Google's and Clarifai's offers were sufficient for the amount of data used. Google's Vision API was ultimately chosen because it provides more code examples and much clearer documentation than Clarifai. As for the APIs' label detection, all of them except IBM Watson gave highly accurate labels, so choosing Google over Clarifai and Amazon is not a big problem, since those APIs are all roughly equivalent.

3.4 Machine Learning Run

The Random Forest machine learning algorithm was mainly chosen because of its high execution speed [2]. Because of the amount of data used, relative to the limited computational power that was available, speed was crucial in the choice of machine learning algorithm. The library used is scikit-learn, a free machine learning library for the Python programming language [16].
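A sketch of how such a pipeline can be wired together with scikit-learn (our illustration, not the thesis's exact code; the label sets and scores below are made up): the label sets returned by the Vision API are turned into binary feature vectors, which the forest is trained on.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training data: one label set (from the Vision API) and one
# points score (from formula 3.1) per image.
label_sets = [["shoe", "footwear", "grass"], ["t-shirt", "product"], ["shoe", "product"]]
scores = [1152.0, 96.8, 23.0]

# One binary column per distinct label seen in training
binarizer = MultiLabelBinarizer()
X = binarizer.fit_transform(label_sets)

model = RandomForestRegressor(n_estimators=10, max_features="sqrt", random_state=0)
model.fit(X, scores)

# Predict the score of an unseen image from its labels; labels never seen
# during training are ignored by transform()
X_new = binarizer.transform([["shoe", "grass", "sneakers"]])
print(model.predict(X_new))
```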

3.4.1 Comparing image scores

After the algorithm has been trained with the provided data, the labels of 20 000 test images are sent to the algorithm, which predicts what score each image would get. After the algorithm has run, the actual score, calculated from the points formula stated above, is compared with the prediction of the algorithm. Since an image can get several thousands of points, it would be near impossible to predict the exact number of points for an image. Therefore a range of points is used: if the predicted value is within ±10% of the actual value, the prediction is deemed correct. All predictions outside this range are considered incorrect, but multiple error ranges are used in order to determine the magnitude of the error. These error ranges are ±20%, ±30%, ±40% and ±50%. These ranges are then used to compare the prediction quality of the different runs of the algorithm.
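Reading G1-G5 as disjoint bands (within 10%, within 20% but outside 10%, and so on, which matches the counts in table 4.1), the grouping can be expressed compactly; a minimal sketch of ours:

```python
def error_group(predicted, actual):
    """Return 1..5 for relative errors within 10%..50% of the actual
    score (G1-G5), or 6 (G6) for predictions that are further off."""
    if actual == 0:
        return 6
    relative_error = abs(predicted - actual) / abs(actual)
    for group, bound in enumerate((0.1, 0.2, 0.3, 0.4, 0.5), start=1):
        if relative_error <= bound:
            return group
    return 6

print(error_group(96.8, 100.0))     # 1: within 10% of the actual score
print(error_group(392852.9, 96.8))  # 6: off by orders of magnitude
```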

3.4.2 Noise reduction

In one of the machine learning runs, non-representative data was removed prior to the run. All images with zero likes or zero followers were filtered out of the data set, as well as images whose likes amounted to less than 1% of followers. Some of the posts collected from Instagram were just stock images coming from fake accounts with no followers at all, or from accounts with ghost followers. These images would get zero points by formula 3.1 despite the fact that they are high-quality stock images. The set of labels belonging to such an image would then be characterized as low quality, which could affect the performance of the machine learning algorithm.
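A sketch of that filter (ours; we read the 1% threshold as likes relative to followers):

```python
def is_representative(post):
    """Drop likely fake or ghost-follower posts before training."""
    likes, followers = post["likes"], post["followers"]
    if likes == 0 or followers == 0:
        return False
    return likes / followers >= 0.01   # at least 1% of followers liked it

posts = [{"likes": 500, "followers": 8000}, {"likes": 3, "followers": 90000}]
filtered = [p for p in posts if is_representative(p)]
print(len(filtered))  # 1: the second post falls under the 1% threshold
```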

3.4.3 Algorithm parameters

A total of five Random Forest regressors were made by testing out different values for these three parameters [1], as shown in the sketch after this list:

• max_features: the number of features to consider when looking for the best split.

• n_estimators: the number of trees in the forest.

• min_samples_leaf: the minimum number of samples required to be at a leaf node.
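In scikit-learn these map directly onto the RandomForestRegressor constructor; for example, the configurations in the table below translate roughly as follows (our sketch):

```python
from sklearn.ensemble import RandomForestRegressor

# R1: max_features=sqrt(n_features), 10 trees, leaves may hold a single sample
r1 = RandomForestRegressor(max_features="sqrt", n_estimators=10, min_samples_leaf=1)

# R3 swaps the split-candidate rule, R4 the leaf size, R5 the tree count:
r3 = RandomForestRegressor(max_features="log2", n_estimators=10, min_samples_leaf=1)
r4 = RandomForestRegressor(max_features="sqrt", n_estimators=10, min_samples_leaf=50)
r5 = RandomForestRegressor(max_features="sqrt", n_estimators=20, min_samples_leaf=1)
```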

All machine learning runs are presented in the following table:

Regressor  max_features      n_estimators  min_samples_leaf  filtered
R1         sqrt(n_features)  10            1                 no
R2         sqrt(n_features)  10            1                 yes
R3         log2(n_features)  10            1                 yes
R4         sqrt(n_features)  10            50                yes
R5         sqrt(n_features)  20            1                 yes

Chapter 4

Results

In this section the results of the machine learning runs will be presented, one regressor at a time.

4.1 Regression models

This table displays the resulting information for the five different regressors created in the experiment. The unit in the table is number of images. The columns G1-G6 represent relative error ranges: G1 holds predictions within 10% of the actual score, G2 those within 20% but outside 10%, and so forth. G6 holds predictions off by more than 50%, which are deemed incorrect. Each value is the number of images predicted to have a score within that specific range relative to the actual score of the image.

Regressor  G1    G2    G3    G4    G5    G6
R1         1001  1083  1065  1024  1061  17468
R2         1005  1036  1064  1063  983   17551
R3         991   1040  1026  998   1017  17630
R4         634   617   602   614   636   19599
R5         997   977   1073  1051  977   17627

Table 4.1: Results of regressions


Figure 4.1: This graph shows the percentages of the groups of R1

4.1.1 R1

Table 4.1 shows that the regressor R1 only predicted the scores of about 1000 images correctly (within the ten percent range), and then fairly evenly predicted about 1000 images for every following percentage boundary. The majority of the predictions were however incorrect (over the 50 percent range from the actual value). As can be seen in figure 4.1, the distribution of prediction accuracy is very even for the first 50%, but G6 sticks out with almost 80%, about 20 times as high as the other groups.

The highest rated image predicted by R1 is shown in figure 4.2. It has a predicted score of 392852.864 points, but the actual calculated score of the image is 96.768. The set of labels retrieved for this image was: orange, food, product, t-shirt, cuisine, shoe. The lowest rated image predicted by R1 is shown in figure 4.3. It has a predicted score of 0.86073436, but the actual calculated score of the image is 1.344, and the set of labels is: comics, cartoon, facial expression, nose, text, fictional character, fiction, male, emotion, comic book.

Figure 4.2: This is the highest rated image predicted by R1

Figure 4.3: This is the lowest rated image predicted by R1

4.1.2 R2

As can be seen in Table 4.1, R2 has very similar results to R1, with about 1000 images in each of groups 1-5 and G6 holding the clear majority of the images. Figure 4.4 shows that R2's accuracy distribution looks very similar to that of R1, with G6 being almost 20 times as high as the other groups. The image predicted by R2 to have the highest score is the same as the highest predicted image of R1, figure 4.2, but with a predicted score of 422180.9216 points. The lowest rated image predicted by R2 is shown in figure 4.5, with a predicted total of 1.152 points. Its actual score is 12.096, with the set of labels: footwear, pink, shoe, purple, magenta, outdoor shoe, athletic shoe, sneakers, grass, product.

4.1.3 R3

R3 predicted about 1000 images in each of groups 1-5, with G1 and G4 being a bit lower than for regressors 1 and 2. The majority of the predictions were however outside the 50% boundary.

Figure 4.4: This graph shows the percentages of the groups of R2

Figure 4.5: This is the image with the worst score predicted by R2

Figure 4.6: This graph shows the percentages of the groups of R3

Figure 4.7: This image has the lowest prediction by R3

As can be seen in figure 4.6, groups 1-5 have very similar results, all being around 4-5% of the images, with G6 holding almost 78% of all images. The image predicted by R3 to have the highest score is the same image as the highest rated image for R1 and R2, but with a predicted score of 765349.5936 points. The lowest rated image predicted by R3 is shown in figure 4.7, scoring 1.344 points. Its actual score is 7.68, with the set of labels: fashion accessory, wallet, product, selling, bag, product, product design, coin purse, brand, font.

Figure 4.8: This graph shows the percentages of the groups of R4

4.1.4 R4

R4 overall has a lower accuracy than any of the other regressors, with an average of about 600 images in each of groups 1-5 and G6 holding almost 20 000 images. The lower average for groups 1-5 can be seen clearly in figure 4.8, as well as the overwhelming majority of incorrect predictions, represented by group 6.

The highest rated image predicted by R4 is shown in figure 4.9, with a predicted score of 12961.21818094 points. Its actual calculated value is 23.04, with the set of labels: close up, finger, hand, textile, wood, peach, material, font. The lowest rated image predicted by R4 is shown in figure 4.10, with a predicted score of 5.77996294 points. Its actual score is 6.72, with the set of labels: handbag, bag, fashion accessory, product, shoulder bag, product, leather, strap, selling, brand.

Figure 4.9: This image has the highest prediction by R4

Figure 4.10: This image has the lowest prediction by R4

Figure 4.11: This graph shows the percentages of the groups of R5

4.1.5 R5

R5's results are very similar to those of regressors 1-3, with about 1000 images in each of groups 1-5. G6 has a clear majority with almost 18 000 images. Figure 4.11 shows that groups 1-5 have very similar results, with G3 and G4 standing out a bit over the others, but all between 4-5%. G6 holds most of the images, with almost 80% of all images. The image predicted by R5 to have the highest score is the same as the highest scoring image for R1, R2 and R3, but with a predicted score of 441765.5616 points. The lowest scored image by R5 is the same as the lowest scored image by R1, but with a predicted score of 1.55674667 points.

Chapter 5

Discussion

5.1 Result analysis

The results for regressor R1 show that it has a very similar distribution of images over the first groups (G1-G5), with around 1000 images per percentage range. This contrasts greatly with G6, which has a clear majority with almost 17 500 images. Looking at figure 4.1, groups 1-5 are all between 4-5%, while G6 has almost 80% of all the images. A clear majority of the predictions are therefore deemed incorrect, which is inadequate. The images predicted by the algorithm to be within 10% of the actual score make up only a little more than 4% of the total number of images. Less than 25% of the images are within 50% accuracy, which is remarkably low.

Tuning the parameters of the random forest algorithm for regression models 2-5 yielded similar results. Regressors 2, 3 and 5 have extremely similar results to R1, with about 1000 images per percentage group and about the same number of incorrect predictions. R4 sticks out with an overall lower accuracy, with only about 2.79% within the 10% range and 86.33% incorrect predictions.

The difference between regressors 1 and 2 is that the first run is the only one that did not filter out the ghost accounts. This indicates that filtering out such accounts had no impact on the results of the algorithm run. The similar results between R1 and R3 indicate that changing the number of features considered at each split in the decision tree has no significant impact on the results. Increasing the number of trees in the forest by 100% did not change the result in any meaningful way.


The regressor that sticks out from the others is R4, which has considerably worse accuracy. This shows that using a more generalized model, which does not allow leaf nodes to have fewer than 50 samples, results in a more inaccurate prediction. Together, these regressors show that changing the parameters of the Random Forest algorithm does not have a big impact on the final results.

The highest scored image for R1 is figure 4.2, with a predicted score of 392852.86 points, but the actual calculated score of the image is 96.77 points. This is visually a very pleasing image which would be of benefit for Nike, but there is an issue with the difference between the predicted score and the actual score. This could be the best image of the test data, but according to the calculated points it should not be a very good image. This shows a big disconnect between our points formula and our predicted score. Looking at the labels for figure 4.2, it can be seen that they are not very specific, especially broad words like "orange", "product", etc. Some labels are also incorrect, like "food" and "cuisine".

This difference in score could also be explained by duplicated images on Instagram. For example, if a person takes an image from the official Nike account and posts it as their own, the official picture would get many likes and therefore a lot of points according to our formula. The re-posted image would probably not get as many likes, despite the fact that both have the same set of labels. If the original picture was part of the training data, it would get a high score, and all the re-posted images that are part of the test data would therefore get a high predicted score from our algorithm. This would cause the big difference between the predicted and calculated scores in the research.

The worst rated image by R1 is figure 4.3, with a predicted score of 0.86 points but a calculated value of 1.34 points. Although the prediction was within half a point of the actual value, the difference in percent is greater than 50, which puts it in group 6. This is the same group as the best prediction for R1, whose absolute point difference is almost 400 000 points. This phenomenon occurs because groups 1-6 use a difference in percentage, which can be problematic for images with low scores, since the error margin becomes very small. This makes it very hard to predict a score with reasonable accuracy for low values, but easier for greater values.

5.2 Limitations

This research has been limited by a number of different factors, the biggest being that the labels returned from the Cloud Vision API are too few and too general to differentiate all images. As can be seen in the labels returned for figure 4.2 and figure 4.3, they are very general and do not do a good job of describing a unique image; the same labels could describe a lot of different images. Since the Cloud Vision API only returns about 8 labels on average, that is simply too few to get a precise enough classification of the images, which is vital for the research. This limitation stems not from the research itself, but from the technology of today: the Google Cloud Vision API is among the best image classification technologies publicly available.

Another limiting factor is the duplicated images on Instagram. A single image can be re-posted multiple times, which gives it the same labels from the API but different metadata, resulting in different numbers of points calculated by the points formula. Similar issues apply to stock images, which are usually posted from fake accounts. Such images could get a low number of points although they could be good.

A third limitation of the research is that the metadata of an image is not a reliable source of information regarding the engagement rate of an image. That a photo has a lot of likes does not guarantee that it is a positive contribution to the image of a brand. The same applies to comments, which could be negative, and to the number of followers, which has no real impact on the engagement rate of an image. Even though the points formula derived in this research was supposed to calculate a score determining the actual engagement rate of an image, the number of likes is still the most heavily weighted factor, and the image with more likes will almost always get a higher score.

A critical limitation of this research was the size of the data set, which was limited by the resources and computational power available.

5.3 Future research

Future research in the field of image classification would need to filter out the duplicated images and stock images of Instagram. A solution to this could be that every image in the data set is searched for in Google's image search engine in order to see if the same photo can be found. If it is found but has no direct connection to the original poster, the image would not be included in the data set, as it is most likely a re-posted or stock image.

Instead of using the metadata and deriving a formula in order to determine the engagement rate of an image, a complete data set of images where every image is already scored according to brand standards could be used. In future research the data set should also be much larger, which would require more resources and computational power in order to complete the experiment in a reasonable time.

Chapter 6

Conclusion

The answer to the question stated in the problem statement, whether it is possible to determine the engagement rate of images based on a list of requirements with random forest algorithms and the Google Cloud Vision API, is no, at least with the technology available to us today. The limitations of the Image API are too great to provide data with enough distinction for the machine learning algorithms used.

Bibliography

[1] sklearn.ensemble.RandomForestRegressor - scikit-learn 0.19.1 documentation. http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed on 04/26/2018).

[2] Leo Breiman. Classification and regression trees. Routledge, 2017.

[3] Clarifai. Clarifai Predict. 2018. URL: https://www.clarifai.com/products (visited on 03/12/2018).

[4] Ghost followers - Wikipedia. https://en.wikipedia.org/wiki/Ghost_followers (accessed on 04/25/2018).

[5] IBM. Watson Visual Recognition. 2018. URL: https://www.ibm.com/watson/services/visual-recognition (visited on 03/12/2018).

[6] Amazon.com Inc. Amazon Rekognition - Video and Image - AWS. 2018. URL: https://aws.amazon.com/rekognition (visited on 03/12/2018).

[7] Top 15 Most Popular Photo Sharing Sites | July 2017. URL: http://www.ebizmba.com/articles/photo-sharing-sites (visited on 03/30/2018).

[8] Top 200 HashTags on Instagram. URL: https://top-hashtags.com/instagram/101 (visited on 03/30/2018).

[9] Google Inc. Cloud Vision API. 2017. URL: https://cloud.google.com/vision/ (visited on 03/12/2018).

[10] Instagram Developer Documentation. https://www.instagram.com/developer/authorization/ (accessed on 03/30/2018).

[11] Chengjiang Long et al. "Deep Neural Networks In Fully Connected CRF For Image Labeling With Social Network Metadata". In: arXiv preprint arXiv:1801.09108 (2018).


[12] Julian McAuley and Jure Leskovec. "Image labeling on a network: using social-network metadata for image classification". In: European conference on computer vision. Springer, 2012, pp. 828-841.

[13] Tom M. Mitchell. Machine learning. McGraw-Hill Book Company, 1997.

[14] John Paul Mueller and Luca Massaron. Machine Learning for Dummies. John Wiley & Sons, 2016.

[15] M Saravanakumar and T SuganthaLakshmi. "Social media marketing". In: Life Science Journal 9.4 (2012), pp. 4444-4451.

[16] scikit-learn: Machine Learning in Python. http://scikit-learn.org/stable/.

[17] Robert Tibshirani et al. An Introduction to Statistical Learning with Applications in R. 2013.

[18] Sheng-Yuan Wang et al. "Learning by expansion: Exploiting social media for image classification with few training examples". In: Neurocomputing 95 (2012), pp. 117-125.

Appendix A

Source Code

All source code written for this research can be found in the git repository: https://gits-15.sys.kth.se/mlazic/photoScraper
