
Mining of User Profiles in Online Social Networks for Improved Personalized Recommendations

SHATHA JARADAT

Doctoral Thesis in Information and Communication Technology
KTH Royal Institute of Technology
Stockholm, Sweden 2020
www.kth.se

Academic Dissertation which, with due permission of the KTH Royal Institute of Technology, is submitted for public defence for the Degree of Doctor of Philosophy on Thursday the 3rd of December 2020 at 4:00 p.m. in Sal C, Kistagången 16, Kista.

© Shatha Jaradat

ISBN 978-91-7873-688-1
TRITA-EECS-AVL-2020:59

Printed by: Universitetsservice US-AB, Sweden 2020

To all the people who supported me during my PhD journey

Abstract

We have focused on influencer-based marketing in online social networks as a source of implicit learning about the preferences of users. Those users who use social networks on a daily basis are also the online shoppers who are confronted with huge information overload and a wide variety of online products and brands to choose from. The role of digital influencers in promoting products and spreading information to a large number of followers, who engage with the influencers’ posts and interact with them, is our key to a better understanding of these followers’ tastes and future purchase intentions. Hence, the analysis and extraction of fine-grained details (which we refer to as user profiling) from digital influencers’ media content helps in collecting more information about the implicit preferences of their followers. With this knowledge, the chances of offering social media users better personalized services are enhanced. In this thesis, we empower cross-domain recommendations through the development of novel methods and algorithms for improving personalization through the effective mining of user profiles in online social networks. We developed a semantic information extraction framework for social media textual content that is able to capture fine-grained attributes with respect to the defined online shop taxonomy. Results from the aforementioned framework have been applied as input to the approaches we proposed for incorporating extracted textual hints to support the visual fine-grained classification of social media images in a dynamic way. Our methods have improved the classification accuracy when compared to state-of-the-art approaches. Moreover, we suggested solutions for incorporating the extracted products’ meta-data in embedding-based personalized recommendation architectures, where our strategies improved the recommendations’ quality.

In order to speed up the process of preparing large-scale social media image datasets for deep learning image analysis, we developed a complete framework for detailed annotation, object localization and semantic segmentation. As our focus is also directed towards the analysis of interactions between social media users, we proposed a neural reinforcement learning approach that is based on estimating the established trust levels between social media users for controlling the amount of recommended updates they get from each other. Moreover, we proposed an enhanced topic modelling algorithm for supporting interpretable yet dynamic summarizations of large social media contents.

Sammanfattning

Vi har fokuserat på influenserbaserad marknadsföring i sociala nätverk online som en källa till implicit lärande om sociala medianvändares preferenser. De användare som använder sociala nätverk dagligen är också online-shoppare som står inför enorm informationsöverbelastning och ett brett utbud av onlineprodukter och varumärken att välja mellan. Rollen hos digitala influenser när det gäller att marknadsföra produkter och sprida information till en stor skala av anhängare som engagerar sig i influencers inlägg och interagerar med dem är vår nyckel till bättre förståelse för dessa anhängares smak och framtida köpintentioner. Analysen och utvinningen av finkorniga detaljer (som vi kallar user profiling) från medieinnehåll för digitala influenser tjänar därför till att samla in mer information om deras implicita preferenser. Med denna kunskap tillämpad för att berika användarprofiler för sociala medier förbättras chanserna att erbjuda dem bättre anpassade tjänster. I denna avhandling ger vi rekommendationer över gränserna genom utveckling av nya metoder och algoritmer för att förbättra personalisering genom effektiv utvinning av användarprofiler i sociala nätverk online. Vi utvecklade en semantisk ram för informationsextraktion från textinnehåll i sociala medier som kan fånga finkorniga attribut med avseende på den definierade onlinebutikens taxonomi. Resultat från ovannämnda ramverk har använts som input till de tillvägagångssätt som vi föreslog för att införliva extraherade texttips för att stödja den visuella finkorniga klassificeringen av sociala mediebilder på ett dynamiskt sätt. Våra metoder har förbättrat klassificeringsnoggrannheten jämfört med toppmoderna metoder. Dessutom föreslog vi lösningar för att integrera de extraherade produkternas metadata i inbäddningsbaserade personliga rekommendationsarkitekturer där våra strategier förbättrade rekommendationernas kvalitet.

För att påskynda processen att förbereda storskaliga bildmängder för sociala medier för djupinlärningsbildanalys utvecklade vi en komplett ram för detaljerad kommentar, objektlokalisering och semantisk segmentering. Eftersom vårt fokus också riktas mot analysen av interaktioner mellan användare av sociala medier, föreslog vi en neurologisk förstärkningsinlärningsmetod som baseras på att uppskatta de etablerade tillitsnivåerna mellan användare av sociala medier för att kontrollera mängden rekommenderade uppdateringar de får från varandra. Dessutom föreslog vi förbättrad ämnesmodelleringsalgoritm för att stödja tolkbara men dynamiska sammanfattningar av stora sociala medieinnehåll.

Acknowledgements

I start by thanking my supervisor Professor Mihhail Matskin for guiding and supporting me throughout the past years. I would never be able to thank him enough for all the opportunities that he provided to me. I am very grateful to him and to his patience and wisdom that directed me in this journey.

I am very thankful to my industrial supervisor Dr. Nima Dokoohaki. I have been very lucky to have had the chance to collaborate with him during my doctoral projects. I learnt a lot from him at the technical, managerial and personal levels. His continuous support has helped me to grow and to explore many opportunities.

I would like to express my gratitude to Professor Vladimir Vlassov for his support of my work and all of his valuable advice. I would also like to thank Professor Henrik Boström and Assoc. Professor Sarunas Girdzijauskas for their feedback during the evaluation of my thesis. I am also very thankful to Alf Thomas Sjöland for his kind words and support.

Special thanks to my colleague and dearest friend Leila Bahri for all her support and advice. I would also like to thank my colleagues Kamal Hakimzadeh, Saranya Natarajan, Amira Soliman, Petar Mrazović and Zainab Abbas for all the nice memories we had together. I am also very proud to have worked with Kim Hammar, Mallu Goswami, and Ummul Wara during their Master’s thesis projects. They added great value to my doctoral thesis during our collaboration.

Finally, I would like to thank all the members of my family: my parents, my sister Bashira, and my brothers Fady and Naser for their encouragement and support throughout the duration of my PhD studies. Special thanks to my sister Bashira who always believed in me and gave me all the positive energy I needed during tough times.

Contents

List of Figures

List of Tables

1 Introduction
1.1 Research Motivation
1.2 Research Questions
1.3 Thesis Objectives
1.4 Research Methodology
1.5 Contributions
1.6 Methodological Challenges
1.7 Application Areas
1.8 Thesis Organization

2 Background
2.1 Product Recommendation Algorithms
2.2 Profile Mining in Social Media
2.3 Advances in Fashion Analysis (Application Area)

3 Detailed Contributions
3.1 Development of a Supervised and Dynamic Topic Mining Framework
3.2 Development of a Fine-Grained & Semantic Information Extraction Framework from Social Media
3.3 Development of a Methodology for Incorporating Hierarchical MetaData in Embedding-based Recommendations
3.4 Development of an Adaptive Privacy-Preserving Knowledge Sharing Solution for OSNs
3.5 Development of Framework for Image segmentation, object localization, and detailed annotation
3.6 Development of Dynamic & Scalable Image Classifiers

4 Conclusions
4.1 Conclusions
4.2 Future Work

Bibliography

A OLLDA - A Supervised and Dynamic Topic Mining Framework in Twitter

B Deep Text Mining of Instagram Data without Strong Supervision

C Outfit2Vec: Incorporating Clothing Hierarchical MetaData into Outfits’ Recommendation

D Learning What to Share in Online Social Networks using Deep Reinforcement Learning

E Deep Cross-Domain Fashion Recommendation

F TALS: A Framework for Text Analysis, Fine-Grained Annotation, Localisation and Semantic Segmentation

G Dynamic CNN Models for Fashion Recommendation in Instagram

List of Figures

1.1 Illustration of the thesis objectives and the way they are mapped to the contributions

1.2 Illustration of the general methodology that is followed in our work to extract clothing knowledge from social influencers’ media for the purpose of generating personalized recommendations. Social influencers’ posts receive interactions from their followers in the form of likes, comments and shares. These interactions, when analysed, can be used to estimate the level of trust that the followers have towards the influencers, which can be reflected in their generated recommendations.

2.1 Example of Possible Outfit’s Extracted Information from an Instagram Post

2.2 Graphical representations for Paragraph Vector models: (a) Distributed Memory (b) Distributed Bag of Words

2.3 Graphical representations for (a) original image, (b) object localization applied to the image to decide the bounding box where the object exists, (c) semantic segmentation applied to the image to decide the exact pixels belonging to each class.

3.1 Graphical representations for (a) LDA model - Source: [1] (b) Variational distribution used to approximate the posterior distribution in LDA - Source: [2] (c) Labeled LDA model - Source: [3] (d) Modified variational distribution used to approximate the posterior distribution in OLLDA

List of Tables

2.1 Example of an Instagram fashion post along with the received comments

4.1 Correlating the research questions and the thesis contributions

Chapter 1

Introduction

"A brand is no longer what we tell the consumer it is – it is what consumers tell each other it is"
Scott Cook - Businessman, director at eBay, board member at Procter & Gamble and co-founder of Intuit

1.1 Research Motivation

Social media marketing has dramatically affected how people search for and receive information. Recent statistics revealed that around 45% of the world population use social media in their daily life, and around 73% of businesses invest in social media for their marketing campaigns [4]. A consequence of this is an overwhelming amount of conflicting brand messages that people encounter every day, which makes it necessary to rely on reviews and recommendations from friends and other trusted connections [5]. With such high competition, brands cannot convince consumers that their product is better than others with advertisements alone. Motivated by that, influencer-based marketing has emerged as a modern field that focuses on growing engagement between brands and social media users through channels that the latter have learned to trust: social influencers [6].

Social influencers can be described as third-party endorsers who cultivate a sizable number of followers by speaking people’s language and regularly interacting with them through their daily experiences [7]. A Twitter study suggested that consumers may accord social media influencers a similar level of trust as they hold for their friends [8]. Trustworthiness, along with attractiveness, the informative value of the content that influencers generate, and their similarity to the followers, positively affects the followers’ trust in the influencers’ branded posts, which subsequently affects their purchase intentions [5, 9, 10, 11]. The reasons behind the effectiveness of social influencer-based marketing can be summarized as: information overload, trendy content, and a huge variety of products and services, which make efficient recommendations a necessity.

As business brands and online retailers are increasingly investing in social media, influencer-based marketing has grown into a multibillion-dollar industry [12]. Many followers browse the product that the influencer is advertising and search for it, or for a similar one, in online shops. This power of social influencers can be effectively employed in generating personalized recommendations for the followers. Personalization in recommendation systems aims at exploiting the differences among users by adapting their recommended content to their personal tastes. It is most efficient when it presents the right product to the right consumers and, importantly, at the right time. This is achieved through a process of constructing knowledge about the users’ preferences based on a combined analysis of their interactions and contents, which is referred to as user profiling [13]. The main statement of this thesis is focused on developing methods and algorithms for improving personalization in cross-domain recommendation systems through the effective mining of user profiles in online social networks. The domains that we refer to in cross-domain recommendations are online social networks and online retail shops, where recommendations can be provided to users based on the enhanced understanding of their preferences in online social networks.

"A recommendation system calculates and provides relevant content to the user based on knowledge of the user, content, and interactions between the user and the item" [14]. Traditionally, the research on recommendation systems is categorized into: (a) Collaborative Filtering (CF), in which the known preferences of a group of users are employed to generate predictions of other users’ preferences; CF-based methods can be further classified into latent matrix factorization [15, 16] and neighborhood-based approaches [17, 18]; (b) Content-Based Filtering (CB), in which recommendations are based on the analysis of the content and meta-data of the products with which the users interact [19, 20]; and (c) hybrid methods, which combine different types of filtering algorithms to overcome some common problems that are associated with recommendations, such as cold start and sparsity [21].

On the other hand, previous research on digital social influence has focused on: (a) analysing the persuasive power of digital influencers on customers’ choices [22, 11, 23, 5], (b) analysing the structure of social influence and its propagation in social networks [24], and (c) leveraging social influence to design collaborative filtering recommendation algorithms [25, 26, 27, 28, 29, 30] (more details in subsection 1.1.1). However, due to the novelty of the digital influencers domain and its constantly evolving nature, the topic of extracting knowledge (user profiling) from social digital influencers for enhancing consumers’ personalization has not been sufficiently discussed in the research literature. In this thesis, we identified the advantages of considering the content provided by social media influencers as a source of implicit feedback about their followers. The influencers’ media content usually holds the following properties: (a) it is frequent enough to capture the trendy products and brands, (b) it is large in total quantity, which helps in generating rich profiles and minimising the cold start problem, (c) it is usually rich in hints about the advertised products (e.g. mentions), and (d) it can usually be mapped to online shops or brands.
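As a minimal sketch of the neighborhood-based collaborative filtering category mentioned earlier in this section (and not the method of any cited work or of this thesis), the following Python fragment scores unseen items for a user by the similarity-weighted interactions of other users. The toy interaction data and the function names `cosine` and `recommend` are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical implicit-feedback data: user -> {item: interaction strength}.
ratings = {
    "alice": {"dress": 1.0, "boots": 1.0},
    "bob": {"dress": 1.0, "boots": 1.0, "scarf": 1.0},
    "carol": {"jeans": 1.0, "sneakers": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user, ratings, k=2):
    """Rank items the user has not seen by similarity-weighted votes."""
    seen = ratings[user]
    scores = {}
    for other, items in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, items)
        if sim <= 0.0:
            continue  # users with no overlap contribute nothing
        for item, value in items.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * value
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

In this toy setting, a user with no overlapping neighbours receives an empty list, which is a small-scale picture of the cold start and sparsity problems that hybrid methods try to mitigate.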

We employed the aforementioned properties in the exploration of fine-grained user profiling of social influencers’ media content in a way that can facilitate mapping of the identified products to similar or exact products in online shops, where recommendations can be generated for the users. Fine-grained knowledge in the fashion domain, as an example, can be the details of an outfit, including the materials used in the fabric of the outfit and the design patterns. Such details, when identified in a social media post, can make the identification of similar or identical items in online shops easier. Moreover, these extractions can serve as implicit feedback signals about the followers’ preferences in outfits. Hence, such extractions can be used to generate personalized recommendations in online shops for the followers based on their level of interaction with social media influencers. Nevertheless, fine-grained user profiling from social media textual and visual content presents multiple challenges. With the invention of social media, a new language has been created: the use of abbreviations, emoticons, hashtags, and media slang has changed how societies communicate over online platforms. These changes in the structure and vocabulary of languages have presented advanced complications for the computational processing of human language. Moreover, social media posts are not always described with enough detail about their contents. Some posts that represent sponsored campaigns contain information about the product’s name and brand, while many others do not. Describing the product(s) with enough detail to find an exact or similar match to retrieve or recommend depends on the amount of information that the influencer provides. Therefore, in this research we analyze whether the posts’ images and text, including the followers’ comments, can be a source of detailed information that can be used to describe an identified product or group of products from a single post and map them to online shops. We also analyse effective methods for extracting this knowledge from social media text, which has few or no tags, in an unsupervised manner and with efficient performance.

1.1.1 Open Challenges on the Application of Influencer-based Marketing in Personalized Recommendation Systems

Generally, the major aspects influencing consumers’ purchasing behaviour are word of mouth communications (e.g. marketing and sales representatives) and consumers sharing their experiences with other consumers [31]. However, online “word of mouth” (digital influence) is nowadays one of the fastest means of spreading information to a large number of followers [31]. In this section, we present some examples from research that focused on the effect of digital influencers on customers’ purchase intentions and on their role in recommendation systems.

In [22], the authors analyzed the persuasive power of a sample of digital influencers by studying the responses of a selected sample of their followers. The results of the study confirmed that the influential power increased the followers’ behavioral intention regarding the recommended brands. In [28], the authors extract the interactions between followers to find the influential nodes and construct an endorsement graph. They then apply a random walk mechanism for item recommendations based on targeted influencers in the network. In [29, 30], the authors identify the global and local influential nodes (celebrities and trusted friends) as an implicit source of user feedback and embed them in a matrix factorization-based item recommendation model.
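To give a rough intuition for random-walk-based recommendation over an endorsement graph, the sketch below runs a generic random walk with restart on a tiny hand-made graph. It is an illustrative assumption and not the algorithm of [28]: the graph, the node names, the restart probability, the rule that dead-end nodes send the walker back to the start, and the function names are all hypothetical.

```python
# Hypothetical endorsement graph: an edge u -> v means "u endorses or interacts with v".
graph = {
    "follower": ["influencer_a", "influencer_b"],
    "influencer_a": ["item_dress", "item_boots"],
    "influencer_b": ["item_dress"],
    "item_dress": [],
    "item_boots": [],
}


def random_walk_with_restart(graph, start, restart=0.15, iterations=50):
    """Approximate visit probabilities of a walker that restarts at `start`."""
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in prob.items():
            neighbours = graph[node]
            if not neighbours:
                # Assumption: a dead end returns the walker to the start node.
                nxt[start] += mass
                continue
            nxt[start] += restart * mass
            share = (1.0 - restart) * mass / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        prob = nxt
    return prob


def rank_items(graph, start, prefix="item_"):
    """Order item nodes by how often the walker visits them."""
    prob = random_walk_with_restart(graph, start)
    items = [node for node in graph if node.startswith(prefix)]
    return sorted(items, key=lambda node: (-prob[node], node))
```

With this toy graph, `rank_items(graph, "follower")` places `item_dress` ahead of `item_boots`, because the dress is reachable through both influencers the follower interacts with.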

We differentiate our research from existing research as follows:

• We focus on the extraction of fine-grained details from online social network accounts for the purpose of constructing well-defined profiles that can be used to enhance personalized recommendations.

• We facilitate cross-domain recommendations by mapping the profiles of online social network accounts to the related domain taxonomy, for example a fashion product taxonomy that can be easily linked to the meta-data of products in online shops.

• We target complex scenarios in which multiple items are identified from a single social media post and mapped to online shops. The extraction of fine-grained details facilitates the modelling and mapping of the multiple items.

• We focus on the extraction of knowledge from any content, and not only from sponsored or already tagged products. This highlights the value of our tools for knowledge extraction, especially given the high level of noise in social media images and text.

1.2 Research Questions

The problem that we consider in this work is the effective mining of user profiles in online social networks for the purpose of enhancing personalization and cross-domain recommendations. Following this formulation of the problem, the research questions that summarize this dissertation are:

Q1: What are the methods and algorithms that need to be developed to effectively extract the fine-grained details of the online social media users’ contents in an unsupervised manner and with efficient performance to facilitate cross-domain recommendations?

Q2: How to build tools and frameworks for the construction of fine-grained user profiles that can be used in recommendations?

As defined before, a recommendation system calculates and provides relevant content to the user based on knowledge of the user, content, and interactions between the user and the item. Following this definition, we defined further sub-questions with respect to these aspects:

• [Content Analysis] How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications?

• [Interactions Analysis] With the huge information overload in social media, how can the interactions between social media users and their connections be used effectively as a source of implicit feedback about their preferences?

• [Content Analysis] How can we enhance and speed up the process of preparing datasets that can be used to train deep learning models for fine-grained products’ classifications?

• [Content Analysis] How can we enhance topic modelling in social media to facilitate recommendations?

1.3 Thesis Objectives

To address the proposed research questions, we have planned and achieved the following objectives:

• Objective I: Development of fine-grained information extraction methodologies from social accounts.

• Objective II: Development of personalized recommendation algorithms based on social influence.

• Objective III: Development of tools and frameworks for enabling the preparation and preprocessing of fine-grained datasets.

Figure 1.1 illustrates how the thesis objectives are mapped to the contributions, which are explained in section 1.5.

Figure 1.1: Illustration of the thesis objectives and the way they are mapped to the contributions

1.4 Research Methodology

We followed a quantitative and empirical research strategy in this dissertation, as our methods and algorithms are machine learning based. An extensive literature review was carried out to identify the open challenges in product recommendation algorithms (section 2.1) and in the analysis of social media textual and visual contents (subsection 2.2.1, subsection 2.2.2). Then, we identified the areas where our methods can be applied to improve user profiling in the online social networks context for the purpose of generating personalized recommendations. We proceeded with the design and implementation of the algorithms and methods, as illustrated in the thesis objectives, for achieving the thesis goal and answering the research questions that are proposed in this dissertation.

1.5 Contributions

The main contributions of this dissertation, along with the papers they appear in, are listed below:

• Contribution I: Development of a Supervised and Dynamic Topic Mining Framework

Paper I (Appendix A): OLLDA - A Supervised and Dynamic Topic Mining Framework in Twitter. Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 1354-1359. IEEE, 2015. Extended with additional experiments in: On Dynamic Topic Models for Mining Social Media. Shatha Jaradat and Mihhail Matskin. In Emerging Research Challenges and Opportunities in Computational Analysis and Mining, pp. 209-230. Springer, Cham, 2019.

I am the main contributor in these publications, where I was responsible for the research literature review and the formulation of the idea, designing and implementing experiments, and writing the final paper. The co-authors’ contribution was in guiding the research direction, providing ideas related to the evaluation and review/support during writing.

In these works we deal with the following research sub-question: How can we enhance topic modelling in social media to facilitate recommendations? The purpose of these publications is to provide interpretable summarizations from the continuous streams of social media content. This serves our objective of developing fine-grained information extraction methodologies from the dynamic media content. We identified the need for an algorithm that achieves supervision and online learning jointly and efficiently. More details are provided in subsubsection 2.2.1.1 and section 3.1.

• Contribution II: Development of a Fine-Grained & Semantic Information Extraction Framework from Social Media

Paper II (Appendix B): Deep Text Mining of Instagram Data without Strong Supervision. Kim Hammar, Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin - In 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 158-165. IEEE, 2018.

Extended with additional experiments in: Deep text classification of Instagram data using word embeddings and weak supervision. Kim Hammar, Shatha Jaradat, Nima Dokoohaki, and Mihhail Matskin. In Web Intelligence, no. Preprint, pp. 1-15. IOS Press, 2020.

I am the second author in these publications; the part that is related to my thesis is the design of the SemCluster component. The other proposed method belongs to the first author. The implementation and paper writing were done mainly by the second author. The other co-authors’ contribution was in guiding the research direction, providing ideas related to the evaluation and review of writing.

In these works we deal with the following research sub-question: How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications? We identified the need for an information extraction system that handles noisy social media text in an unsupervised manner and that is suitable for fine-grained and hierarchical data (as explained in detail in subsubsection 2.2.1.2 and section 3.2). In this contribution, we provided an information extraction system (SemCluster) that: (a) has no requirement of supervision, (b) is able to capture fine-grained attributes with respect to the defined taxonomy, (c) is able to handle noisy unstructured text due to its support for semantic knowledge, and (d) is applicable to different tasks and environments with minimum requirements.

• Contribution III: Development of a Framework for Incorporating Hierarchical MetaData in Embedding-based Recommendations

Paper III (Appendix C): Outfit2Vec: Incorporating Clothing Hierarchical MetaData into Outfits’ Recommendation. Shatha Jaradat, Nima Dokoohaki, and Mihhail Matskin. To appear in a special issue (Fashion in Recommender Systems) of Lecture Notes in Social Networks, Springer, 2020.

I am the main contributor in this publication, where I was responsible for the research literature review and the formulation of the idea, designing and implementing experiments, and writing the final paper. The co-authors’ contribution was in guiding the research direction, providing ideas related to the evaluation and review/support during writing.

In this work we deal with the following research sub-question: How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications? We identified the need for a methodology for generating representative vectors for hierarchical and complex-structured documents while preserving their semantic meanings (explained in more detail in subsubsection 2.2.1.3 and section 3.3). The focus of the related manuscript was on the usage of neural embedding text processing approaches for the representation and learning of relations between fine-grained details in datasets. The proposed approach has been evaluated in whole and partial recommendation experiments.

• Contribution IV: Development of an Adaptive Privacy-Preserving Knowledge Sharing Solution for OSNs

Paper IV (Appendix D): Learning What to Share in Online Social Networks using Deep Reinforcement Learning. In Machine Learning Techniques for Online Social Networks, pp. 115-133. Springer, Cham, 2018.

Short paper: Trust and privacy correlations in social networks: a deep learning framework. Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin, and Elena Ferrari. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 203-206. IEEE, 2016.

I am the main contributor in these publications, where I was responsible for the research literature review and the formulation of the idea, designing and implementing experiments, and writing the final paper. The co-authors’ contribution was in guiding the research direction, providing ideas related to the evaluation and review/support during writing.

In these works we deal with the following research sub-question: With the huge information overload in social media, how can the interactions between social media users and their connections be used efficiently as a source of implicit feedback about their preferences? We proposed a model-free neural-based reinforcement learning framework for predicting the amount of information that a social network user would share with their friends based on the frequency of interactions (e.g. like and share). The purpose is a natural control of what the user wants to share with others. More details are available in section 3.4.

• Contribution V: Development of a Framework for Image Segmentation, Object Localization, and Detailed Annotation

Paper V (Appendix E): Deep Cross-Domain Fashion Recommendation. Shatha Jaradat. In Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys, pp. 407-410. 2017.

I am the main contributor in this publication, where I was responsible for the research literature review and the formulation of the idea, designing and implementing experiments, and writing the final paper.

Paper VI (Appendix F): TALS: A Framework for Text Analysis, Fine-Grained Annotation, Localization and Semantic Segmentation. Shatha Jaradat, Nima Dokoohaki, Ummul Wara, Mallu Goswami, Kim Hammar, Mihhail Matskin. In 2019 IEEE 43rd Annual Software and Applications Conference (COMPSAC), vol. 2, pp. 201-206. IEEE, 2019.

I was responsible for the detailed design of the framework and for writing the paper. The implementation was done by some of the co-authors. The other co-authors’ contribution was in guiding the research direction and reviewing the writing.

In these works we deal with the following research sub-question: How can we enhance and speed up the process of preparing datasets that can be used to train deep learning models for fine-grained products’ classifications? We identified the need for a unified solution for preparing image datasets with detailed annotations, localization, and pixel-by-pixel segmentation meta-data. In this work, we design and develop a framework that speeds up the process of preparing a dataset with detailed annotations (fine-grained details), localization (bounding boxes) and pixel-by-pixel semantic segmentation class information. More details are available in subsection 2.2.2 and section 3.5.

• Contribution VI: Development of Dynamic Convolutional Neural Network Classifiers.

Paper VII (Appendix G) Dynamic CNN Models for Fashion Recommendation in Instagram. Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin - In 2018 IEEE Intl Conf on Parallel Distributed Processing with Applications, Ubiquitous Computing Communications, Big Data Cloud Computing, Social Computing Networking, Sustainable Computing Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pp. 1144-1151. IEEE, 2018.

I am the main contributor to this publication, where I was responsible for the research literature review and the formulation of the idea, designing and implementing experiments, and writing the final paper. The co-authors’ contribution was in guiding the research direction, providing ideas related to the evaluation, and reviewing the writing.

In this work we deal with the following research sub-question: How can visual and textual deep learning technologies be used to handle large-scale fine-grained representations and classifications? We identified the need for dynamic image classifiers to handle the large scale of classifications in products’ attributes. We propose two models: DynamicPrunning, in which the product detected from the text analysis of social media posts is used to activate the possible range of connections of the relevant attributes in the classifier, and DynamicLayers, a dynamic framework in which multiple attribute classification layers exist and a suitable attribute classifier layer is activated dynamically based upon the text mined for the image. More details are available in subsection 2.2.2 and section 3.6.

1.6 Methodological Challenges

1.6.1 Personalized Recommendations versus Online Privacy

Achieving a balance between using consumers’ data to offer personalized services and protecting their online privacy can be a very challenging aspect of developing recommendation systems. We live in an online digital world where our activities can be tracked, analyzed and used by search engine providers and social networks to provide customized advertisements and search results. Personalized services save consumers from cumbersome navigation through oceans of information by tailoring their searches for products or news to their personal preferences [32]. However, it can be hard to decide whether the filtered information reflects the consumers’ real interests and does not hide valuable information from them. With the increasing attention to the General Data Protection Regulation (GDPR), consumers have become more aware of their rights to know what kind of data is gathered from their online activities and for what reasons. However, they still seem willing to trade their privacy and data for convenience and better services [33, 34]. According to [34], consumers are interested in personalization but they are curious about how the service provider handles their personal data. In a survey [35], the authors tried to evaluate whether the growing consciousness about how digital providers use personal data is having an impact on consumers’ disclosure behaviour. They found that the participants’ attitude towards disclosure is developed from an overall assessment of privacy concerns and perceived benefits. A similar result has been confirmed in [36], where the authors aimed to understand consumers’ information disclosure behavior by identifying the trade-offs that consumers think about before making a decision. The identified trade-offs were privacy versus the potential risks of exposing their data.

Privacy can be defined as "the claim of an individual to determine what information about himself or herself should be known to others" [37]. So, a practical way of resolving the trade-off is to give consumers explicit control over how their information is used. A study [38] found that companies offering their customers individual-level control over their privacy settings reduce their privacy concerns. In this thesis, we analyse the privacy aspects as follows:

• Data Collection: the datasets that have been collected in this thesis during 2017 were from public Instagram influencer accounts. The datasets have been used solely by the author and her co-authors and have not been shared outside the group. The focus of our experiments has been on the identification of clothing details from the raw text (users’ posts and comments), with the final results being the identified clothes from the image. Efforts to blur the faces in the collected images and to anonymise the users’ ids are ongoing, in case the data is requested by other researchers. The data to be shared will be the images and the information produced by our text analysis framework, which includes only clothing details. Some examples of academic research that has collected data from online shops and social networks are: [39, 40, 41]. An option that we tried to achieve at the beginning of the research was to develop an application through the Instagram API to collect data from real users. However, there were multiple difficulties in reaching users whose amounts of posted data match our needs in this research. Other datasets that have been used in this thesis were from: Twitter [42], [43], and a public fashion dataset [39]. These datasets have been collected by other authors.

• Data Usage in Recommendations: the experiments that have been performed in this thesis were on the collected samples of data and were not used in any live service. For future scenarios of applying the algorithms that are developed in this thesis, we intend to provide them as a service that can be connected to the user’s social network account and their preferred online shop. Users will be asked for explicit and informed consent before their data is assembled, with a statement of the purposes of the gathering and of how their data will be used.

• Data Filtering: to avoid hiding the real interests from the users, we believe that combining data from different sources such as: their social media interactions, their purchasing history, and their geolocation can provide better predictions. Our potential services can be provided as extra services that consumers can find without affecting their ability to search in the online shop directly.

Our datasets have been collected with a specific research purpose from public accounts and have not been shared or manipulated in any way. Our suggested future approach for providing personalization services is based on users’ explicit consent.

1.7 Application Areas

In this thesis we focused on fashion recommendation as the primary application area due to the large amounts of fashion style images in social media and to the fine-grained style details available in outfits, where we can show the value of our knowledge extraction approaches for cross-domain recommendations. However, our methodologies can be applied in similar applications for products’ recommendations such as: furniture interior design and electronics shopping. We describe in Figure 1.2 the general methodology we suggest to extract knowledge from social influencers’ media content for the purpose of generating personalized recommendations. As shown in the figure, the details that are extracted include: patterns, materials, styles, brands and clothing categories and sub-categories. These extractions were chosen to facilitate mapping them to the general taxonomy used in online shops for fashion outfits. Hence, for another application area the taxonomy can be easily changed to reflect the related products’ meta-data. The approach we suggest for analysing the interactions between social media users and applying it to enhance their recommendations was evaluated on general datasets in our contributions (C1 and C4), but it can be easily adapted to the fashion application scenario that we worked on.

Figure 1.2: Illustration of the general methodology that is followed in our work to extract clothing knowledge from social influencers’ media for the purpose of generating personalized recommendations. Social influencers’ posts receive interactions from their followers in the form of likes, comments and shares. These interactions, when analysed, can be used to estimate the level of trust that the followers have towards the influencers, which can be reflected in their generated recommendations.

1.8 Thesis Organization

The rest of this dissertation is organized as follows: in chapter 2 we give a background with respect to the topics of the proposed contributions. Each background section is concluded by the research gaps that are identified to justify the aim of the work. The detailed contributions are presented in chapter 3, each followed by discussion, and then finally we present the thesis conclusions and future work in chapter 4. The manuscripts of the published work are presented in the final part of the thesis.

Chapter 2

Background

In this chapter, we present the background overview required to cover the technical parts in the rest of this dissertation. The chapter begins with an overview of the state-of-the-art recommendation algorithms with a specific focus on products’ recommendation. Then, we focus on user profiling in social media through textual and visual knowledge extraction. First, we explore the field of Natural Language Processing (NLP) from our research perspective, with respect to its applications and challenges in profile mining in online social networks. We then describe the recent advances in image processing covering: object localization, semantic segmentation and image classification. We explore the methods that are commonly used for analyzing the relations between online social media users and their role in enriching user profiling in social media platforms. Finally, an overview of the recent advances in fashion recommendation is presented, as it is the major application area in our work. The focus of the background is mostly on neural network models as they are the dominant tools and techniques used in our proposed solutions.

2.1 Product Recommendation Algorithms

Recommendation systems are essential tools for facilitating a better user experience and promoting services for many applications [44]. The traditional recommendation algorithms are categorized into: (a) Collaborative Filtering algorithms, in which recommendations are generated by learning from the historical user-item interactions, either through explicit (users’ ratings) or implicit (browsing history, clicks, views) feedback, (b) Content-based algorithms, in which recommendations are based on comparisons across items and users’ auxiliary/side information (text, images, videos), (c) Hybrid algorithms, which integrate two or more types of recommendation strategies. Most of these algorithms were based on machine learning methods such as Matrix/Tensor factorization. However, the recent advances in deep learning have been changing the recommendation systems field in a significant way. Recently, deep learning started to become very popular in a wide range of academic and industrial recommendation systems [45, 46, 47, 48]. This is explained by the ability of deep learning to overcome the obstacles of the conventional recommendation methods and to outperform them. In fact, the use of deep learning methods in recommendation systems can be thought of as a natural evolution of the aforementioned traditional methods. The reason is that the traditional recommendation algorithms (matrix factorization, factorization machines, etc.) can be expressed as neural/differentiable architectures [45].
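As a concrete illustration of the matrix factorization family mentioned above, the following is a minimal sketch, not a method from this thesis. It assumes a small hypothetical rating matrix in which 0 marks an unobserved entry, and it fits two low-rank factor matrices by plain gradient descent on the observed entries only. The function name, the hyper-parameters and the data are all illustrative assumptions.

```python
import numpy as np

def factorize(ratings, mask, rank=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Approximate ratings ~ P @ Q.T using only the observed entries (mask == 1)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = 0.1 * rng.standard_normal((n_users, rank))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, rank))  # item factors
    for _ in range(epochs):
        err = mask * (ratings - P @ Q.T)            # error on observed entries only
        P_step = err @ Q - reg * P                  # descent direction for users
        Q_step = err.T @ P - reg * Q                # descent direction for items
        P += lr * P_step
        Q += lr * Q_step
    return P, Q

# Hypothetical explicit feedback: rows are users, columns are items, 0 = unknown.
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 1, 5, 4]], dtype=float)
mask = (ratings > 0).astype(float)

P, Q = factorize(ratings, mask)
predicted = P @ Q.T                                 # includes scores for unknown cells

observed = mask > 0
rmse = float(np.sqrt(np.mean((ratings[observed] - predicted[observed]) ** 2)))
baseline = float(np.sqrt(np.mean(ratings[observed] ** 2)))  # error of predicting 0
```

The entries of `predicted` at the unobserved positions are the scores a collaborative filtering recommender would rank. The same objective can be written as a one-layer neural model with user and item embedding tables, which is the sense in which such methods can be expressed as differentiable architectures.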

The power of using deep learning techniques in recommendation systems stems from the fact that they allow for enhanced feature extraction from items (such as image, video, text, audio) in uni- and multi-modal learning settings. This consequently allows for a more accurate modelling of items and can strongly affect the content-based and hybrid methods’ performance when compared to the traditional techniques. Considering the representation learning from multiple sources (multi-modal learning), the ability to represent images, text, and interactions in a unified joint framework was not possible without these recent advances in deep learning [44]. There has been a significant increase in the deployment of deep learning models with content-based methods due to the enhanced quality of features that can be extracted with these methods. The advantages of deep learning in the recommendation systems area are not just limited to content-based methods, but they also include collaborative-filtering techniques where the standard techniques (e.g. MF) often treat the user-item interactions as a flat matrix structure which ignores the temporal structure of the data [45]. However, deep learning techniques like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) allow the modeling of temporal structure in data which leads to improved performance. Moreover, deep learning is able to effectively capture the non-linear and nontrivial user/item relationships and facilitate the transformation of more complex abstractions in its higher network layers. Deep neural networks are specifically justified and required for collaborative-filtering when there is a huge amount of complexity in the nature of data or when there is a large number of training instances.

According to [45], the deep learning-based recommendation systems are classified into: (a) Embedding systems: which are inspired by the widely popular neural word embedding model (Word2Vec), where the model learns word representations in a low-dimensional continuous vector space using the surrounding context of the word in a sentence. As a result, semantically similar words will be close to each other in the resulting embedding space. Similarly, users and items can be modelled using these embedding techniques, and the semantic relationships learnt from them can be used either directly or as an input to supervised learning methods to provide recommendations [49, 50, 51]. Examples of embedding methods for product recommendations are: Prod2Vec [52], Item2Vec [15], MetaProd2Vec [50], Search2Vec [53], Content2Vec [54]. (b) Neural Collaborative Filtering: this category usually applies either Feedforward Neural Networks or AutoEncoders to the user-item interactions in order to build collaborative filtering models [55, 56]. Examples of this category are: NNMF [57], NCF [58], DeepMF [59], CFN [60], NTF [61], CDAE [56]. (c) Deep Feature Extraction: methods which use deep neural networks to efficiently extract the items’ features. The extracted features can then be used in hybrid collaborative filtering methods. Examples of this category include Convolutional Neural Networks for feature extraction from different modalities (image, text, video, audio) and RNNs for text-based recommendations. (d) Session-based: methods in which user interactions with the content are represented as sequential sessions. Typically, recurrent models such as RNNs, GRUs and LSTMs are effective methods for session-based recommendations [62, 63, 64].
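To make the idea of using learnt embeddings "directly" more tangible, here is a minimal sketch of nearest-neighbour retrieval in an item embedding space. The item names and the three-dimensional vectors are hypothetical and hand-written for illustration; in an embedding system such as those cited above they would be learnt from interaction data.

```python
import numpy as np

# Hypothetical item embeddings (assumed values, not learnt from data).
item_vectors = {
    "blue_blouse":    np.array([0.9, 0.1, 0.0]),
    "satin_top":      np.array([0.8, 0.2, 0.1]),
    "black_trousers": np.array([0.1, 0.9, 0.0]),
    "velvet_shoes":   np.array([0.0, 0.2, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_similar(query, vectors, top_k=2):
    """Return the top_k items closest to `query`, excluding the query itself."""
    scores = [(name, cosine(vectors[query], vec))
              for name, vec in vectors.items() if name != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]

neighbours = most_similar("blue_blouse", item_vectors)
```

With these assumed vectors, the closest item to `blue_blouse` is `satin_top`, because the two vectors point in almost the same direction. A recommender can return such neighbours as candidates or pass the vectors on as features to a supervised model.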

We add to the above categorization the following categories: (e) Deep Reinforcement Learning for recommendation: the combination of deep neural networks and reinforcement learning has achieved enhanced performance in multiple domains such as games and self-driving cars, and recently its applications started to include social networking and recommendation systems. We explored this area in our work in section 3.4. (f) Cross-domain recommendations: in which recommendations in a target domain can be enhanced using knowledge learned from another domain (source domain). These types of recommendations can be helpful to address the cold-start and sparsity issues related to new users and items in a certain domain through knowledge transfer. According to [65], cross-domain recommendations are classified into: (1) factorization-based approaches for extracting common features of users and items from both domains, (2) semantic-based approaches in which knowledge maps are generated using attributes from source domain to be applied in target domain, (3) graph-based approaches to measure and identify the connections between users and items in both domains, (4) tag-based approaches in which meta-data association between participating domains is generated. For deep-learning cross-domain recommendation approaches, the most commonly used approach is transfer learning which aims to improve learning tasks in a domain by using knowledge transferred from another domain to efficiently train models on different data. The category that is relevant to our work in this thesis is tag-based where we identify the common meta-data in online shop fashion clothing and use it to extract knowledge in the social media context using neural approaches as we show in subsubsection 2.2.1.2. Hence, the source domain in our context is the online shop and the target domain is the social media platform where recommendations from online shops can be shown to the users.
Our focus in this thesis was on: embedding methods (as we explain in section 3.2 and section 3.3), deep feature extraction methods (section 3.6) and cross-domain recommendations (section 3.2). We have also explored deep reinforcement learning for social interactions recommendations (section 3.4).

2.2 Profile Mining in Social Media

2.2.1 Natural Language Processing in Online Social Networks

"Today getting people to hear your story on social media, and then act on it, requires using a platform’s native language, paying attention to context, understanding the nuances and subtle differences that make each platform unique, and adapting your content to match." - Entrepreneur, author, speaker, and personality.

Over the past years, online social networks have revolutionized the way people communicate and have affected multiple aspects of their daily practices. The unprecedented volume of user-generated content and the variety in the forms of users’ interactions have introduced new challenges for the computational processing algorithms of human language known as Natural Language Processing (NLP) [66]. A new language has been created in social media that uses emojis, emoticons, hashtags, abbreviations and media slang. Social media text is usually informal and in the form of conversations or comments that don’t always adhere to standard grammar rules and might contain spelling errors. For example, a phrase such as “gorg booties” is equivalent to “gorgeous boots”. The traditional NLP techniques were dominated by symbolic approaches that are based on logic, grammatical rules and ontologies. However, the evolving nature of social media language, which is highly ambiguous and highly variable, has called for more statistical machine learning approaches to understanding text [67]. We identify the following as the common tasks in case studies of NLP: (a) text normalization, (b) features extraction and representation, (c) learning models, and (d) applications. We define these steps from the research perspective of analysing social media and the challenges that are introduced in that regard.

(a) Challenges of Normalizing Social Media Text: Text normalization is generally the first step that is performed in text processing for the purpose of improving the effectiveness of the learning model.
Commonly, text normalization includes some of the following stages, which are chosen based on the task: lemmatization, a stage in which words are converted to their lemma to reduce morphological variation; stemming, a variant of lemmatization that converts words to their stem; stop-word removal, a stage where the most frequent words are removed to reduce their effect on statistical models that consider co-occurrence; tokenization, where the text is divided into tokens including emojis, URLs, and user handles, depending on their need in the task; spell-checking, where misspelled words are corrected; and word segmentation, the task of dividing a string of composite text or hashtags into a list of words. Although the aforementioned normalization strategies have been used frequently in multiple research works, there is no clear agreement on whether normalization has the negative effect of removing important information or degrading the system’s performance. Usually, the system’s performance has to be evaluated individually for the chosen normalization strategies.

(b) Efficient Features Extraction and Representation of Noisy Input: Generally, natural language exhibits a set of properties that make its representation and analysis complicated. For instance, language is symbolic; words are distinct symbols whose meaning and relation to other words cannot be inferred from the symbols themselves. For example, there is no inherent linguistic relation between the words "coat" and "jacket". Moreover, there are many ways in which words can be combined to form meaningful phrases and sentences. These properties lead to data sparseness, as there is no clear way of defining similarity rules between words and sentences or generalizing from one to another. However, in the past years deep learning approaches proved to be very appealing for use in NLP due to their efficiency in feature engineering and learning processes.
A central component in neural networks for feature engineering is the embedding layer, in which the discrete language symbols are mapped to continuous vectors in a relatively low-dimensional space. This in turn transforms the symbols into mathematical vectors whose distance from each other can be taken to represent the semantic similarity between their related words. The idea of word embeddings is conceptually related to the distributional hypothesis, which states that the meaning of a word is dictated by the context in which it occurs, described as "You shall know a word by the company it keeps" [68]. By incorporating this idea in the practice of deriving word representations, words that occur in similar contexts will obtain similar representations [49]. Traditionally, one-hot encoding was commonly used, where each input dimension corresponds to a unique feature and the resulting feature vector can be thought of as a high-dimensional vector in which a single dimension has a value of 1 and all others are 0. As the dimension of a one-hot vector is the same as the number of distinct features, one-hot vectors are not preferred computationally. In dense encodings, by contrast, each core feature is embedded into a d-dimensional space and represented as a vector in that space. Consequently, similar features have similar vectors, as information is shared between similar features. This enhances the generalization ability of the model. These capabilities address to some extent the data sparseness problems.

(c) Deep Learning Models for Learning Representations and Predictions: Learning word embeddings based on neural predictive context-window models has become a dominant approach in deriving text representations [67].
Language window-based methods were first introduced in [69], where the authors proposed a neural network model that learns word vector representations for language modelling; it takes as input a k-gram of words and produces as output a probability distribution over the next word. The idea was further developed in the Skip-gram and Continuous Bag of Words (CBOW) models, which became the foundation of the modern algorithms for deriving word embeddings [70]. The widely popular Word2Vec algorithm [70] is a state-of-the-art neural language model that implements the CBOW and Skip-gram context representations and has two different optimization objectives (Negative Sampling and Hierarchical Softmax). However, Word2Vec has some shortcomings, as it learns only from local context windows of words and therefore makes limited use of the global statistics of the whole corpus. These shortcomings motivated the development of other models such as GloVe [71], which combines the local context windows with global corpus statistics. In both Word2Vec and GloVe, each word in the corpus is assigned a distinct word embedding. This, however, leaves the models unable to represent out-of-vocabulary words. FastText [72] addresses this problem by describing each word as a bag of character n-grams, with an embedding representation associated with each n-gram. Word2Vec, GloVe, and FastText have achieved unbeaten results on several tasks and metrics [73, 74]. This has led to integrating word embeddings in several neural classification models that have achieved superior classification accuracy and succeeded in major breakthroughs in multiple applications such as: language modelling [75, 76] and automatic machine translation [77, 78].
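The context-window and subword ideas described above can be sketched in a few lines. The following is an illustrative sketch only, with hypothetical helper names. It shows a simple tokenizer that keeps hashtags intact, the (centre, context) training pairs that a Skip-gram style model would consume, and FastText-style character n-grams with `<` and `>` as word-boundary markers. It does not train any embeddings, and it omits details of the real tools such as sub-sampling and the whole-word token that FastText adds alongside the n-grams.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping hashtags whole."""
    return re.findall(r"#?\w+", text.lower())

def skipgram_pairs(tokens, window=2):
    """Pair every token with each neighbour at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def char_ngrams(word, n_min=3, n_max=4):
    """Character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

tokens = tokenize("Love that blue blouse")
pairs = skipgram_pairs(tokens, window=1)
ngrams = char_ngrams("bag", n_min=3, n_max=3)
```

Because an unseen word such as a misspelling still shares most of its character n-grams with the correct spelling, a vector can be composed for it from the n-gram vectors, which is why subword models cope better with noisy social media text.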

(d) NLP Applications in Social Media Platforms: There are multiple case studies of NLP that are applicable to social media such as: information extraction, document’s language classification, named entity recognition, topic modelling, part-of-speech tagging and many others. However, the focus of this thesis was mainly on two areas which are: topic modelling and information extraction. Specifically our research was on: (a) topic modelling in Microblogging online services (subsubsection 2.2.1.1), (b) the application of word embeddings on fact extraction from online media content (subsubsection 2.2.1.2), (c) leveraging fine-grained details of media into documents’ embeddings (subsubsection 2.2.1.3). We provide the background on those specific applications in the coming sections, and we identify the gaps in existing research that motivated our work.

2.2.1.1 Topic Modelling in Microblogging Online Services

With an average of 500 million tweets sent per day in 2019 [79], 350 million photos uploaded to Facebook [80] and over 100 million photos uploaded to Instagram [81], the pressure of online content on users is increasing dramatically, beyond what they can possibly filter. This causes serious information overload in online social media platforms. Within this context, analyzing media in real-time is of great importance in order to digest the content and perceive the topics of interest to users. Topic models are statistical models that are used to find and analyze the hidden semantic relationships between collections of documents. There are various methods for topic modelling such as: Latent Semantic Analysis (LSA) [82], Probabilistic Latent Semantic Analysis (PLSA) [83], and Latent Dirichlet Allocation (LDA) [1]. However, LDA, introduced by Blei, Ng and Jordan, has gained most of the attention and multiple extensions have been proposed for this model [84]. LDA is an unsupervised generative probabilistic method for modelling a collection of documents, which can be tweets for example. The model assumes that each document can be represented as a probabilistic distribution over a set of topics. Each topic in the model is also represented as a probabilistic distribution over words. The major approaches used for estimating the model parameters and variables are Markov Chain Monte-Carlo (MCMC) Gibbs sampling techniques, Expectation-Maximization (EM), and Variational Bayes Inference (VB) [84]. We refer the reader to the original LDA paper [1] for the technical details of the model, and to a recent survey [84] for examples of extensions and applications of the main model. From our perspective, we have classified LDA extensions into three categories: (a) parallelism and scalability, (b) supervised learning and (c) online learning (streaming support) extensions.
This categorization is based on the availability of LDA extensions that have addressed some of the identified shortcomings of the main model, which are: (a) limited performance on huge corpora, (b) difficulty of topics’ interpretation and (c) limitation of batch processing. The original LDA algorithm is unsupervised and it models each document as a mixture of topics that are hard to interpret. This makes it difficult to directly apply this technique to multi-labeled corpora of documents. A remedy to this problem is incorporating supervision, which has been proposed in multiple research works such as Labeled LDA [3], Supervised LDA [85], DiscLDA [86], and TweetLDA [87]. Another shortcoming of LDA is the batch execution, which makes it difficult to adapt to the streaming nature of microblogging services such as Twitter. Multiple efficient solutions have been proposed for dynamic extensions of LDA such as TM-LDA [88], OLDA [89], and Online VB LDA [2]. Focusing on shortcomings (b) and (c), we identified the need for an algorithm that improves LDA with respect to both in a joint manner. We further elaborate this approach in the next subsection.

Need for an algorithm that achieves supervision and online learning jointly and efficiently

All the above models try to address one of the challenges in topic modelling. However, at the time of writing our paper [90] (2015), we identified the need for a single algorithm that achieves both objectives: analysing huge amounts of documents, including new documents arriving in the stream, and at the same time achieving high quality of topics’ detection. We join the positive aspects of two popular algorithms, Online VB for LDA and Labeled LDA, by modifying the variational distribution used to approximate the posterior distribution in the online VB for LDA algorithm, thus constraining the variational parameters to a specific set of meaningful labels.
We refer the reader to section 3.1 for a summary of the contributions of this paper.
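The generative assumptions behind LDA described above can be made concrete by sampling from them. The sketch below is illustrative only and is not the algorithm proposed in this thesis. It draws a topic mixture per document from a Dirichlet prior, then a topic per word slot, then a word from that topic's distribution. The vocabulary, the two topic-word distributions and the value of `alpha` are assumed for the example. Inference methods such as Gibbs sampling or variational Bayes solve the reverse problem of recovering these distributions from observed documents.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, topic_word, alpha, seed=0):
    """Sample documents from the LDA generative process.

    topic_word: array of shape (K, V); row k is topic k's distribution over words.
    alpha: symmetric Dirichlet concentration for the per-document topic mixture.
    """
    rng = np.random.default_rng(seed)
    n_topics, vocab_size = topic_word.shape
    docs, doc_topic = [], []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha * np.ones(n_topics))    # topic mixture of the document
        z = rng.choice(n_topics, size=doc_len, p=theta)      # one topic per word slot
        words = [int(rng.choice(vocab_size, p=topic_word[k])) for k in z]
        docs.append(words)
        doc_topic.append(theta)
    return docs, np.array(doc_topic)

# Hypothetical vocabulary and two hand-written topics ("fashion" and "football").
vocab = ["blouse", "bag", "shoes", "goal", "match", "team"]
topic_word = np.array([[0.5, 0.25, 0.25, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.25, 0.25, 0.5]])

docs, doc_topic = generate_corpus(n_docs=3, doc_len=8, topic_word=topic_word, alpha=0.5)
readable = [[vocab[w] for w in doc] for doc in docs]
```

In a labeled variant, the topic mixture of a document would be restricted to the topics that correspond to its labels, which is the kind of constraint the supervised extensions above introduce.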

2.2.1.2 Semantic Words Embeddings for Challenging Scenarios in Information Extraction

Information extraction is an NLP area that is focused on identifying a predefined set of concepts and their relevant properties in a specific domain such as an online social network by analysing free-text data and discovering valuable knowledge in the form of structured information [91]. The information to be extracted is usually pre-specified in a user-defined structure that we can refer to as a template or taxonomy. Information extraction on social media data can be significantly complex due to several reasons such as the: (a) noisy nature and informal style of text that usually contains misspellings, non-standard abbreviations and emojis, (b) multi-lingual content, (c) large scale of data, which often involves millions of documents that require efficient processing systems in terms of speed and computational consumption, (d) richness in fine-grained subjects (sports, politics, fashion, etc.) [92, 93, 94, 95]. An example of information extraction from social media content that is relevant to this thesis is the identification of the clothing items and their attributes in an outfit from a fashion Instagram post. Table 2.1 shows a post from Instagram where the post’s author explains her outfit details in the form of text and hashtags and receives comments about the details of the outfit. Figure 2.1 then shows the structured information derived from the post’s text and comments that can be useful to describe the outfit in detail. The identification of this information was based on defining a taxonomy of clothing items and properties.
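As a rough illustration of template-based extraction from hashtags, the sketch below segments composite hashtags against a tiny hand-written taxonomy and groups the recognised words by taxonomy slot. This is a deliberately simplified stand-in that uses exact lexicon lookup rather than semantic matching with word embeddings, so it cannot handle misspellings or synonyms. The taxonomy, the slot names and the function names are assumptions made for the example.

```python
# Hypothetical, tiny taxonomy: slot name -> known terms.
TAXONOMY = {
    "colour": {"blue", "black"},
    "material": {"satin", "velvet"},
    "category": {"blouse", "top", "pants", "trousers", "shoes", "bag"},
    "brand": {"prada"},
}
# Reverse index: term -> slot.
VOCAB = {term: slot for slot, terms in TAXONOMY.items() for term in terms}

def segment(hashtag, vocab):
    """Greedy longest-match segmentation of a hashtag; unknown characters are skipped."""
    text = hashtag.lstrip("#").lower()
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):    # try the longest candidate first
            if text[i:j] in vocab:
                words.append(text[i:j])
                i = j
                break
        else:
            i += 1                           # no known term starts here
    return words

def extract(post, vocab):
    """Map every recognised term in the post to its taxonomy slot."""
    found = {}
    for tag in post.split():
        for word in segment(tag, vocab):
            found.setdefault(vocab[word], set()).add(word)
    return found

post = "#blueblouse #satinblouse #blacktrousers #pradabag #velvetshoes"
info = extract(post, VOCAB)
```

Exact matching of this kind also produces false positives on longer strings (for instance, a short term such as "top" occurring inside an unrelated word), which is one reason embedding-based similarity is attractive for noisy text.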

State-of-the-art techniques for information extraction can be classified into: (a) supervised techniques, which are further classified into ontology-based and machine-learning based (e.g. Decision Trees, Genetic Algorithms, Deep Learning algorithms), and (b) unsupervised techniques such as hierarchical clustering, ontology-based clustering and semantic word matching using word embeddings [96, 95]. We focused on unsupervised approaches for information extraction, specifically word embeddings, which have proven to be a great asset for information extraction. They are particularly useful in the social media domain due to the high syntactic variety in the text (multiple languages, misspellings, non-standard abbreviations), which makes applying syntactic word similarity on this type of text infeasible. As the thesis is focused on extracting fine-grained details from noisy media content, we reviewed the existing work on information extraction from social media and found that most of the research was focused on general and high-level extraction of concepts, while in a task such as outfit detection our goal is to handle a complex multi-label extraction.

Table 2.1: Example of an Instagram fashion post along with the received comments

Post Text: #blueblouse #coldshouldertop #shouldercutout #satinblouse #highwaistpants #trompetpants #blacktrousers #pradabag #velvetshoes #statementbag #elegantstyle #timelessfashion #outfitinspo

Examples of Comments:
C1: That blouse is so gorgeous!
C2: Great bag
C3: Love the blue
C4: Love that handbag
C5: This bag is on the list now
C6: Your style is amazing! Classy!
C7: Beautiful top
C8: This looks so classy and chic
C9: I am in love with this outfit! That blouse is a stunner

Figure 2.1: Example of Possible Outfit’s Extracted Information from an Instagram Post

For the aforementioned reasons, we identify the following need:

Need for an Information Extraction System that handles noisy social media text in an unsupervised manner and is suitable for fine-grained and hierarchical data

Our objective behind the identification of fine-grained outfit details from Instagram images is to enhance cross-domain retrievals and achieve a better understanding of consumers’ preferences. The acquisition of annotated data that is accurate and at the level of detail that we aim for when training text mining models is expensive and difficult to achieve. This motivated the need to develop a system that handles the noisy social media text with the least supervision and whose output predicts the fine details about products from unlabelled posts. Results from previous work were not enough to achieve our goals for the following reasons: (a) most of the existing research relies on access to massive quantities of annotated data, and (b) no prior assessment of complex, multi-label, hierarchical extraction and classification on social media has been made. We refer the reader to section 3.2 for a summary of the contributions of our work on this need.

Figure 2.2: Graphical representations for Paragraph Vector models: (a) Distributed Memory (b) Distributed Bag of Words

2.2.1.3 Document Embedding Models for Handling Hierarchical Datasets

Paragraph Vector [97] is an unsupervised model for constructing representations of input sequences of variable length such as paragraphs. It can be described as an extension of the Word2Vec model to document embeddings. While the Word2Vec model learns to project words into a latent d-dimensional space, Paragraph Vector (commonly referred to as Doc2Vec) learns to project documents. Paragraph Vector supports two models: distributed memory (DM) and distributed bag of words (DBOW). In the distributed memory model of Paragraph Vector, every word is mapped to a unique vector, and every paragraph is also mapped to a unique vector that acts as a memory that remembers what is missing from the context of the paragraph. The paragraph vector is then concatenated with several word vectors from the paragraph to predict the following word in the given context. This model considers the order of words, while in the DBOW model the context of words is ignored and the model predicts words randomly sampled from the input paragraph. Figure 2.2 shows a graphical illustration of the two models. Both previously described models have focused on simple, flat structured inputs rather than hierarchical complex structures such as sequences of products where each product has a hierarchical meta-data structure.

Need for a methodology for generating representative vectors for hierarchical and complex-structured documents that preserve the semantic meanings

In our work, we have sequences of outfits’ descriptions such that each outfit is composed of multiple clothing items. Furthermore, each item consists of a clothing category and subcategory, with material and pattern details. Style labels are attached to the whole outfit. Moreover, brand names and hashtags are also available at the outfit’s level and they can be used as additional contexts to enhance the outfits’ recommendations.
The clothing attributes describe the items at the instance level rather than the whole sequence. We proposed a methodology that can be followed to generate representative vectors of hierarchically-composed items such as outfits. The vocabulary in the model can then be the sequence of outfit details rather than just clothing categories. We refer the reader to section 3.3 for a summary of the contributions of our work on this need.
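The difference between the two Paragraph Vector models described in this subsection can be made concrete by looking at the training examples each one would see. The following is a minimal sketch in plain Python, not the implementation used in the thesis; the function names, the window size and the toy outfit tokens are assumptions chosen for illustration only.

```python
import random


def pv_dm_examples(doc_id, tokens, window=2):
    """PV-DM: the document id plus the preceding `window` words predict the next word."""
    examples = []
    for i in range(window, len(tokens)):
        context = tuple(tokens[i - window:i])
        examples.append(((doc_id, context), tokens[i]))
    return examples


def pv_dbow_examples(doc_id, tokens, n_samples=3, seed=0):
    """PV-DBOW: the document id alone predicts words sampled from the document."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [(doc_id, rng.choice(tokens)) for _ in range(n_samples)]


# Hypothetical flat token sequence describing one outfit.
outfit = ["blouse", "satin", "blue", "trousers", "black"]
dm = pv_dm_examples("outfit_1", outfit)
dbow = pv_dbow_examples("outfit_1", outfit)
```

In the DM examples word order matters because each target is paired with the words that precede it, whereas the DBOW examples carry no context beyond the document identifier.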

2.2.2 Modern Approaches for Image Processing

We identify the following topics as the major areas contributing to image processing: object detection and localization, semantic segmentation, human pose estimation and image classification. This section provides a high-level overview of the state-of-the-art in each of these areas, and concentrates on the recent improvements in these fields. Research efforts that are closely related to these topics are examined in more detail in section 3.5 and section 3.6. When it comes to analysing the content of an image, computer vision techniques together with machine learning algorithms have provided pioneering solutions with interesting results. However, deep learning has recently changed the scenario: while in traditional computer vision methods the image has to be preprocessed to extract features and relevant information, which can be a non-trivial challenge, no such preprocessing has to be done in deep learning, as features are extracted inside the network through training. Some traditional computer vision techniques for extracting features are: Scale Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), Histogram of Oriented Gradients (HOG), Discrete Cosine Transform (DCT), Color Moment and Color Histogram. These techniques are used as feature descriptors, that is, ways to describe a concrete region or set of points in the image, expressed as vectors that gather important information about the neighbourhood around the studied point.

2.2.2.1 Object Detection & Localization

The detection and localization of the elements that appear in an image have always been useful and challenging techniques in image processing. Object localization can be defined as the task of predicting the bounding box (characterized by its center, width and height) in which the object exists. More generally, object detection involves the localization of multiple objects that do not have to be from the same class. The process of localization can be split into two general components: the region proposal step and the classification step. We describe four major works in object localization in computer vision that have applied deep CNNs in their architectures: R-CNN [98], SPPNET [99], Fast R-CNN [100], and Faster R-CNN [101]. R-CNN (2014) is considered one of the most impactful advancements in localization tasks in computer vision. SPPNET (2014) enhanced the computational efficiency by sharing computation (a feature map is created for the entire image rather than for each of the region proposals, as was done in R-CNN). Fast R-CNN (2015) and Faster R-CNN (2015) were proposed to make the model faster and better suited for modern object localization and detection tasks.
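To make the bounding-box representation concrete, the sketch below converts a box given by its center, width and height into corner coordinates and compares two boxes with intersection over union (IoU). IoU is not discussed in the text above; it is brought in here as a commonly used way of comparing a predicted box with a reference box, and the function names are hypothetical.

```python
def to_corners(cx, cy, w, h):
    """Convert a (center x, center y, width, height) box to (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(box_a, box_b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = to_corners(*box_a)
    bx1, by1, bx2, by2 = to_corners(*box_b)
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


predicted = (10.0, 5.0, 10.0, 10.0)   # hypothetical predicted box
reference = (5.0, 5.0, 10.0, 10.0)    # hypothetical reference box
overlap = iou(predicted, reference)
```

For the two hypothetical boxes above, half of each box overlaps the other, which gives an IoU of one third.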

2.2.2.2 Semantic Segmentation

Segmentation consists of taking an image and separately localizing and classifying a particular number of regions. Segmentation is sometimes referred to as pixel classification, and this is what differentiates it from localization and object detection tasks: it is more precise. In effect, a segmented region will only include pixels from that region, which is not the case in localization, where the proposed boxes contain the target object but also part of the background. This process has historically been performed with very different methods: from Region Merging (merging the adjacent regions that meet a similarity criterion such as pixel intensity) to Region Growing (methods for refinement of the selected regions).

2.2.2.3 Image Classification

Deep Convolutional Neural Networks (CNNs) have recently achieved dramatic accuracy improvements in image classification [102]. AlexNet (2012) [103] is one of the most influential CNN architectures and was the starting point for many other improvements in this field; it lowered the classification error rate to 15.4% in the ILSVRC image classification competition. It was followed by ZF Net (2013) [104], VGG Net (2014) [105], GoogLeNet (2015) [106], and Microsoft Residual Nets (ResNet) (2015) [107], with which the classification error rate dropped to 3.6% in the same competition.

2.2.2.4 Human Pose Estimation

One of the most important problems that appears when comparing human images is the great variability that usually exists between them. In images, occlusion, deformation and pose variation are often present. Human Parts Alignment is an example of a popular method that deals with the issue of pose variation when comparing different human images. The idea is that, after applying this method, the human beings that appear in the images share a similar feature representation, making them more easily comparable. The process consists of extracting several features using HOG, LBP, Color Moment, Color Histogram and Skin Descriptor. Figure 2.3 shows a graphical illustration of the described tasks within image processing.

Figure 2.3: Graphical representations for (a) original image, (b) object localization applied to the image to decide the bounding box where the object exists, (c) semantic segmentation is applied to the image to decide the exact pixels belonging to each class.

With respect to the complex scenarios of identification of outfit details from media content that has been the focus of this thesis, we identified the following needs:

Need for Dynamic Image Classifiers to Handle the Large Scale of Clothing Attributes and Brands

The direct application of the existing image classification frameworks becomes more complex when the classification task covers a large scale of multi-class, multi-label clothing categories, attributes and brands. The complexity lies in the time needed to perform the task and in the required memory, especially since the fine-grained outputs require sufficient training. We proposed novel strategies to address these complexities in section 3.6.

Need for a Unified Solution for Preparing Image Datasets with Detailed Annotations, Localization, and Pixel-by-Pixel Segmentation Meta-Data

Object localization and pixel-by-pixel semantic segmentation in image analysis neural architectures provide an enhanced level of information that can be needed for specific tasks such as the detection of outfits’ details. However, to train such architectures in an end-to-end manner, detailed datasets with specific meta-data are required. Such datasets require annotation and identification of extensive attributes to lead to a better partition of the clothing feature space and to facilitate the recognition and retrieval of cross-domain clothes images [39]. Compared to object detection and localization, the evaluation and training of segmentation Convolutional Neural Network (CNN) architectures are more complex and expensive, as obtaining pixel-level classification is challenging. We surveyed the existing solutions for localization and pixel-by-pixel segmentation and found that most of them require large memory and further changes to the class labels. For this reason, we identified the need for a flexible and extensible unified framework where researchers can find all the tools that are required for detailed annotation, localization and segmentation in one place and with few required changes. We refer the reader to section 3.5 for the full details of this contribution.

2.3 Advances in Fashion Analysis (Application Area)

We identify the following general categories in Fashion Analysis: (a) clothing recognition and retrieval, which is the process of matching a query fashion image with similar or identical clothes from online shops, (b) clothing segmentation and parsing, which is the decomposition of a fashion outfit into its specific clothing items, (c) clothing style and feature extraction, which is the identification and analysis of the major clothing features (e.g. categories, materials), usually achieved using advanced CNN architectures, and which can be useful input for fashion recommendation algorithms, (d) clothing recommendation, which can be achieved in different ways based on the available knowledge about users, items and their interactions, and can serve different purposes such as recommending outfits, single clothing items, similar-style outfits and sizing suggestions, and (e) fashion compatibility, which is the process of matching clothing pieces together in an outfit to match a person’s preferred style. Some of these categories complement each other. For example, the segmentation of clothing can be beneficial for constructing the user’s fashion profile by analysing their preferred clothing items, which consequently helps in producing better recommendations. In this thesis, we have worked in many of these areas, as we will show in the contributions chapters. These tasks are all considered to be challenging due to the large number of possible garment items to match, variations in photos’ configurations and lighting, occlusions from other humans or objects, similarity between items, and garment appearance and layering [108, 109, 110].

Most of the early works on fashion analysis relied on traditional computer vision techniques and hand-crafted features such as SIFT, HOG, LBP, and color histogram. However, with the advances in deep learning in recent years, a large number of deep CNN models have been introduced to learn more discriminative representations of clothing items, to efficiently handle the cross-scenario variations and to enhance performance [39, 111, 112, 113].

The Clothing Recognition and Retrieval category is usually referred to as "Street-to-Shop" or cross-domain retrieval. In clothing recognition and retrieval, the clothing item(s) have to be detected first in a given image (a consumer image for which we are interested in finding similar or identical items from online shops). The item detection is usually achieved through an advanced feature extraction stage. Then, a search algorithm is applied to match the detected features with the online shop images for a top-k retrieval of items based on their similarity to the detected items in the original image. Match R-CNN [40] is an example of a clothing retrieval architecture in which a feature extractor component retrieves the highly discriminative features with respect to clothing attributes of both query and input images, and the features are then fed to a similarity learning network to obtain the similarity score between the detected clothing items from both images. Such images are already annotated with bounding boxes of the clothing items and their labels. Given the large amount of noise that is usually present in the input images (background, different objects and poses of people), the more specific the search, the better the performance. Hence, using (a) landmarks and bounding boxes (e.g. [114, 115, 39]) and (b) clothing segments (e.g. [116, 117, 118, 119]) to identify the crucial parts of the image and then perform a visual search of their features can enhance the retrieval performance. Visual neural attention mechanisms have been applied in many recent works on clothing retrieval to reduce the need for strong supervision with landmarks and segmentation labels, and to extract attention maps from given images (e.g. [120]). The attention mechanism is motivated by human visual attention, where people only need to focus on specific parts of visual inputs to understand or recognize them. Hence, the attention mechanism filters out the uninformative features from raw inputs and reduces the side effects of noisy input [44]. Recently, multimodal approaches have been proposed for clothing retrieval, such as [121, 122]. For example, in DeepStyle [121], a joint neural network architecture is proposed to model features of different modalities such as text and images for clothing retrieval tasks.

We classify the fashion compatibility category into two areas based on their purpose: (a) learning the user’s matching style with respect to their body measurements, and (b) matching clothing items to compose an outfit. Dress with Style [123] is an example of a framework that aims to learn the compatibility of body shapes and clothing styles based on the body attributes through a joint embedding multimodal representation model. The key to a proper outfit is usually in the harmonious matching of clothing items. NeuroStylist [124] is an example of a clothing matching framework that benefits from outfit compositions provided by fashion experts, which are rich in visual and textual content, to train models that jointly learn the visual and textual features of clothing outfits.

Building a successful fashion recommendation system depends on a deep understanding of items’ aesthetic attributes, user preferences, contexts (e.g. weather) and the dynamics and evolution of trends and users’ needs. Generally, embedding methods, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are the common techniques used to model text, images and sequential data in recommendation systems. Some approaches combine several of these techniques to effectively integrate multiple dimensions of user and item details into the recommendation system. For example, the approach in [125] combines the high-level visual features of clothing images that are extracted using CNNs with users’ past feedback in a Collaborative-Filtering approach. Session-based sequential recommendation systems consider the order of the user’s interactions in a session when recommending items, by examining the similarity and complementary relationships between the items that the consumer is adding to the cart, viewing or favoriting. This type can be linked to the aforementioned fashion compatibility category, as the suggestions can be complementary to the products that the user is interacting with. DeepStyle and Context-Aware GRU (CA-GRU) [126] are examples of proposed methods to model the user’s selection behavior in a session based on their elicited style and the general context (e.g. weather). DeepStyle is the component that learns the visual style features of the products that the consumer showed interest in, and the CA-GRU integrates this contextual information into the sequential prediction. Research that integrates textual (metadata) information of clothing images usually applies multimodal joint or coordinated learning approaches in which the distances between items in the latent space representations decide their similarity to other items. However, this line of research is still open for more contributions.

Chapter 3

Detailed Contributions

3.1 Development of a Supervised and Dynamic Topic Mining Framework

The ability of social media platforms to publish content and broadcast it to millions of people instantaneously has affected multiple areas in social life and Twitter is undeniably one of the major influential social media platforms. Hence, the development of algorithms that analyze media at run time and detect their topics in a meaningful way can essentially affect how information can be recommended to users based on their interests. This has been our focus in [90] where we proposed OLLDA as a supervised and dynamic topic mining framework in Twitter.

Summary of Contributions: Most of the research attention was focused on extending the LDA algorithm [1] to serve a single objective. For example, scalability and dynamicity were addressed in online VB for LDA [2], and topics’ interpretability was addressed in the Labeled LDA model [3]. However, less attention has been paid to providing a single model that is capable of addressing the identified shortcomings of LDA (subsubsection 2.2.1.1) in a joint and efficient manner. Figure 3.1 (a) shows a graphical representation of the original model of LDA. As explained in subsubsection 2.2.1.1, the posterior distribution of the model’s hidden variables given the documents can be approximated using variational inference. This is achieved by transforming the original model into a simpler model, as shown in Figure 3.1 (b). With the new simplified distribution, the optimization goal is finding the settings of the variational parameters (γ and λ) that make the simplified model close to the true posterior distribution. In online VB for LDA, the original algorithm was extended to support streaming of new documents in a dynamic fashion through the update step of the model, which is executed for the newly observed documents (tweets). On the other hand, supervision in the Labeled LDA model is achieved by having the topic distribution over documents θ (as shown in Figure 3.1 (c)) depend not only on the hyperparameter α but also on Λ, which is the set of observed labels for a document.

Figure 3.1: Graphical representations for (a) LDA model - Source: [1] (b) Variational distribution used to approximate the posterior distribution in LDA - Source: [2] (c) Labeled LDA model - Source: [3] (d) Modified variational distribution used to approximate the posterior distribution in OLLDA

By coupling the mechanism for handling streaming in Online VB for LDA with the supervision in Labeled LDA, we presented OLLDA (Figure 3.1 (d)), in which the variational parameters of the simplified model are constrained to a specific set of meaningful labels Λ, while the model’s update mechanism is performed in a similar way to Online VB for LDA. This in turn influences the assignment of words to topics according to an interpretable classification mechanism. This constraint makes the approach supervised, so that it remains useful for streaming scenarios while maintaining accuracy. At the same time, this approach supports a dynamic vocabulary rather than a static one. This is extremely useful for handling the noisy nature of tweets, which contain hashtags, abbreviations, and names of people that are not part of the general vocabulary. Our model achieves this by extracting words from tweets and assigning them to topics, taking into consideration the prior topics’ distribution score, which we have estimated through the classification service as we explained in [90]. We conducted experiments on a one-year crawl of Twitter data that was gathered in [42], with results demonstrating the enhanced performance and topical inference quality of our algorithm when compared to the state-of-the-art baselines. To evaluate our approach, we used information retrieval metrics such as precision and recall for measuring the accuracy of the inferred topics.
We can summarize the contributions of paper [90] as follows:

C1.1: Proposing a modified variational distribution to approximate the posterior distribution of the LDA model.

C1.2: Introducing a dynamic vocabulary framework for preparing labelled and translated tweets as input to the LDA algorithm.

We relate the contributions of this work to the proposed research questions as follows: This contribution specifically answers the research question: How can we enhance topic modelling in social media to facilitate recommendations? The purpose of this work is to provide interpretable summarizations from the continuous streams of social media content. This serves our goal of developing fine-grained information extraction methodologies for dynamic media content. The classification mechanism of the individual tweets participates in handling the social media vocabulary to estimate the prior topics’ distribution scores, which later affect the assignment of words to topics. The effective summarizations of topics can enhance recommendations to users based on their identified topic interests.

3.2 Development of a Fine-Grained & Semantic Information Extraction Framework from Social Media

Personalized recommendation algorithms typically rely on building knowledge about users from their profile information and their history of interactions with the system. While these interactions can be efficient metrics for measuring users’ preferences, better conclusions can be achieved through a deeper analysis of users’ shared content (e.g. posts’ text and comments). Most of the previous research has focused on high-level extraction methods. In our paper [127], we presented an assessment and a practical methodology of a complex, multi-label, hierarchical extraction system that can easily adapt to social media content. The application area of this work was extracting fashion attributes from raw text in Instagram (posts’ text and comments). However, the method is applicable to different domains by changing the taxonomy used and by generating embeddings for the new application domain.

Summary of Contributions: The task of extracting fine-grained details from social media text (Instagram in our study) can be described as a mixture of information extraction and ranking. The high syntactic variety in the text (e.g. spelling and grammatical errors) makes it difficult to use syntactic word similarity for information extraction in such a domain. For this reason, we have made use of word embeddings as a central component to extract hierarchical information from Instagram posts. Our extraction process can be divided into three stages:

• Linguistic and Syntactic Text Normalization which includes: lower case, tokenization, removal of stop words and urls, lemmatization, and sorting tokens into different categories such as Caption, Comments, Tags, and Emojis

• Text Mapping to a Taxonomy (for example: fashion concepts) by using word embeddings and the cosine similarity metric. The word embeddings have been trained on a large corpus of Instagram text and can semantically relate both grammatically correct sentences and noisy tokens, such as misspellings, emojis, and hashtags, to the taxonomy. Other factors that affect the mapping are: (a) the frequency of occurrence measured by Term Frequency-Inverse Document Frequency (TF-IDF), (b) the ambiguity of tokens (based on lookups with the Probase API), (c) the location of the token (in the image caption or in a user comment). For words that have no retrieved embeddings, a syntactic similarity using Levenshtein distance is performed.

• Ranking of the terms in the taxonomy based on their accumulated mapping to the input text
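The mapping and ranking stages above can be sketched as follows. This is a simplified illustration under stated assumptions, not the system described in [127]: the two-dimensional embeddings are toy values invented for the example, the additional weighting factors (TF-IDF, token ambiguity, token location) are left out, and the way the Levenshtein distance is turned into a similarity score is an assumption.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def levenshtein(s, t):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def rank_taxonomy(tokens, taxonomy, embeddings):
    """Accumulate token-to-term similarities and rank the taxonomy terms."""
    scores = defaultdict(float)
    for tok in tokens:
        for term in taxonomy:
            if tok in embeddings and term in embeddings:
                sim = cosine(embeddings[tok], embeddings[term])
            else:
                # Fallback for tokens without an embedding: normalised edit similarity.
                dist = levenshtein(tok, term)
                sim = 1.0 - dist / max(len(tok), len(term), 1)
            scores[term] += max(sim, 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy embeddings (assumed values, for illustration only).
embeddings = {
    "blouse": [1.0, 0.0], "top": [0.9, 0.1],
    "bag": [0.0, 1.0], "handbag": [0.1, 0.9],
}
tokens = ["top", "handbag", "blouze"]      # "blouze" is a misspelling with no embedding
ranked = rank_taxonomy(tokens, ["blouse", "bag"], embeddings)
```

In this toy run the misspelled token is matched to "blouse" through the edit-distance fallback, so "blouse" accumulates the higher score and is ranked first.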

We have applied this framework to the task of extracting clothing items, brands, materials, patterns, and styles from unstructured text associated with Instagram posts. We refer the reader to [127] for the complete details on the approach and the experimental pipeline.

Value of Research on Instagram Text: Little prior work has paid attention to the promising applications of text mining on Instagram. Instagram hosts large volumes of unstructured user-generated text. Specifically, an Instagram post can be associated with an image caption written by the author of the post, which can contain hashtags and tags of other people or items. Each post can receive comments written by other users. In an empirical study that we performed on an Instagram dataset focused on fashion, we observed that text associated with Instagram posts includes both information about obvious image attributes, such as "Where did you buy that coat from?", and more nuanced image details that are harder to infer from the image alone, such as the brand in this example: "this coat is from Zara, it is sold out though". Mining such details can be beneficial to multiple applications such as automatic image tagging, personalized recommendations, trend detection, and cross-domain retrievals.

We can summarize the contributions of paper [127] as follows:

C2.1: Providing an information extraction system that: (a) has no requirement of supervision, (b) is able to capture fine-grained attributes with respect to the defined taxonomy, (c) is able to handle noisy unstructured text due to its support for semantic knowledge, (d) is applicable to different tasks and environments with minimum requirements.

C2.2: Providing a technical assessment of the crucial value of word embeddings in extracting information from noisy and dynamic social media text.
C2.3: Providing a hierarchically annotated and fine-grained attributed dataset to the research community that can be useful for case studies in NLP, image processing and multimodal learning.

C2.4: Presenting the first empirical study of Instagram text (at the time of the paper’s publication) to highlight its value in personalized recommendation and trend detection.

We relate the contributions of this work to the proposed research questions as follows: This contribution answers the research question: How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications? The detailed explanation of the list of contributions above illustrates the relation to the research question. Cross-domain recommendations are facilitated through the taxonomy used, which represents the common taxonomy of online fashion clothing shops. This serves to facilitate retrieval and recommendations between the related domains (e.g. social media and online shops).

3.3 Development of a Methodology for Incorporating Hierarchical Meta-Data in Embedding-based Recommendations

Nowadays, customers expect personalized content in their online services, which can be in multiple formats such as recommended music, movies, news, and fashion brands. This saves them from information overload and presents them with suggestions relevant to what they are interested in. However, with the increasing volume of content, it becomes more competitive to get users’ attention. One personalization strategy that we suggest in [128] is incorporating fine-grained metadata of items in embedding-based recommendations to reflect users’ preferences in a more detailed way.

Summary of Contributions: Most of the recent service recommendation models are based on neural embedding approaches for learning the latent relationships between users and products. For example, Prod2Vec [52], Item2Vec [15] and MetaProd2Vec [50] are all inspired by the neural embedding approach Word2Vec [70], where the main concept is learning the semantic relationship between items from the similarities of their contexts. As our focus is on sequences of items with their fine-grained metadata descriptions, we proposed a model inspired by Paragraph Vector [129] along with a strategy for learning representations of complex items that are provided as input to the model. We applied this approach to the task of learning representations of clothing outfits, where each outfit is composed of multiple clothing instances and each instance has many levels of meta-data (e.g. patterns, materials, and style). The input to our model is taken from the SemCluster framework that we have explained in section 3.2, where we extract fine-grained clothing details from social media text. We summarize the major steps in our methodology (which can be generalized to similar cases of complex content) as follows:

• Entities’ Generation, which includes: (a) mapping raw text to a defined taxonomy from which the instances can be defined, and (b) defining a structure of the instances’ descriptions to generate unique entities.

• Projection of Entities into Learning Vectors, which defines a unified order of the sequences to provide a consistent way of describing the predictions, especially for partial-sequence recommendations.

• Learning Embeddings, which provides the generated sequences as training input to the embedding models.
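The first two steps above can be illustrated with a minimal sketch. It is not the implementation from [128]: the attribute names, the item ordering, the `unk` placeholder, and the helper names `make_entity` and `outfit_to_sequence` are all assumptions made for illustration. The idea shown is only that each clothing instance is flattened into one unique token and that tokens are emitted in a fixed order, so that every outfit yields a consistently ordered sequence.

```python
from typing import Dict, List

# Hypothetical unified attribute order (the thesis does not list the exact fields).
ATTRIBUTE_ORDER = ["category", "material", "pattern", "style"]
# Hypothetical unified order of items inside an outfit.
CATEGORY_ORDER = ["top", "bottom", "shoes"]


def make_entity(item: Dict[str, str]) -> str:
    """Join an item's attributes in a fixed order into one unique token."""
    return "_".join(item.get(attr, "unk") for attr in ATTRIBUTE_ORDER)


def outfit_to_sequence(items: List[Dict[str, str]]) -> List[str]:
    """Order the items consistently and map each one to its entity token."""
    def rank(item: Dict[str, str]) -> int:
        category = item.get("category")
        if category in CATEGORY_ORDER:
            return CATEGORY_ORDER.index(category)
        return len(CATEGORY_ORDER)  # unknown categories go last

    return [make_entity(item) for item in sorted(items, key=rank)]


outfit = [
    {"category": "shoes", "material": "leather"},
    {"category": "top", "material": "cotton", "pattern": "striped", "style": "casual"},
]
sequence = outfit_to_sequence(outfit)
print(sequence)
```

Sequences of this kind could then be passed, one per outfit, to a paragraph-vector style model for the third step.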

We proposed Outfit2Vec and PartialOutfit2Vec as outfit embedding models that learn representations of users’ outfits using the Paragraph Vector Distributed Memory (PV-DM) and Paragraph Vector Distributed Bag of Words (PV-DBOW) models, following our suggested strategy for learning complex item-metadata structures. We evaluated our models using multiple ranking metrics, such as NDCG, MRR, and MPR, on an annotated Instagram dataset that we gathered in [130]. Our models achieved better results, especially in whole-outfit recommendation evaluations, with an average increase of 22%. We can summarize the contributions of our paper [128] as follows:
C3.1: Providing a methodology to generate representative vectors of hierarchically composed items with multiple levels of meta-data.
C3.2: Providing an improved personalization strategy for recommendations.

We relate the contributions of this work to the proposed research questions as follows: This contribution partially answers the research question: How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications? The focus of this work was on the use of neural embedding text processing approaches for representing and learning the relations between fine-grained details in datasets. The proposed approach has been evaluated in whole and partial recommendation experiments.
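For readers unfamiliar with the ranking metrics named above, the following sketch shows the commonly used binary-relevance forms of reciprocal rank and NDCG@k. It is an assumption that these forms match the exact variants used in [128]; MPR is omitted, and the function names are illustrative.

```python
import math
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/position of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean(values: Sequence[float]) -> float:
    """Average a per-user metric (e.g. reciprocal ranks give MRR)."""
    return sum(values) / len(values) if values else 0.0


recommended = ["a", "b", "c"]
print(reciprocal_rank(recommended, {"b"}), ndcg_at_k(recommended, {"b"}, 3))
```

Averaging `reciprocal_rank` over all evaluated users with `mean` gives MRR; the same averaging applies to NDCG@k.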

3.4 Development of an Adaptive Privacy-Preserving Knowledge Sharing Solution for OSNs

Our activities in online social networks can often be propagated to people who are not in our direct circle of friends. For example, on Twitter, users can find tweets in their news feed from stranger accounts with which their connections have recently interacted (e.g. Your friend X likes this Tweet from Stranger Y). This can be useful, as it introduces users to new accounts to follow if they share similar interests. However, in other cases, it can come with unintended consequences. One disadvantage is overloading users’ news feeds with content from sources other than the ones they originally chose to follow. Another disadvantage is the negative effect of exposing users’ interests and interactions, as they have no control over the information they want to share about their preferences. We addressed this problem in our paper [131] as we describe below.

Summary of Contributions: We proposed a model-free neural-based reinforcement learning framework for predicting the amount of information that a social network user would share with her friends. In reinforcement learning, the agent learns through experience and interactions over time how to behave in a certain environment. In a similar way, our framework can dynamically learn how to propagate information amongst social network users based on the estimated level of trust between them. We model trust by the amount and frequency of their interactions as well as their profile similarity. A high level of trust between users leads to higher rewards, which are expressed as more information propagation. In a timely fashion, the framework is able to decide the best data propagation policy that distinguishes the highly trusted users. With that, the user needs to spend less effort managing her privacy preferences, and at the same time gains more control over her own data.
In our framework, we use the Neural Fitted Q Iteration algorithm [132] with a sequential time-series neural architecture to approximate the action values in the reinforcement learning setting. We evaluated our framework on a one-year crawl of Twitter data (gathered in [42]) on different tasks of top users’ detection and correlation measurements. We can summarize the contributions of our paper [131] as follows:
C4.1: Proposing an algorithm for dynamic generation of privacy labels with minimum user effort.
C4.2: Proposing an online trust calculation mechanism based on social behavior.
C4.3: Introducing a new application for reinforcement learning algorithms in a social networking context.

We relate the contributions of this work to the proposed research questions as follows: This contribution answers the research question: With the huge information overload in social media, how can the interactions between social media users and their connections be used effectively as a source of implicit feedback about their preferences? The suggested approach to estimating the trust level between users can be easily adapted to the general scenario we handle in this thesis, in which we analyze the interactions between digital influencers and their followers. This estimation can be used to control the amount of recommendations that followers receive from their favorite influencers.
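As a rough illustration of the ingredients described in this section, the sketch below combines interaction amount, interaction frequency, and profile similarity into a single trust value, and computes the standard fitted Q-iteration regression target in its reward form. The weighting scheme, the normalisation, the Jaccard similarity, and all names are assumptions made for this example; the actual trust model and neural architecture are those described in [131].

```python
from typing import Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two profile keyword sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def trust_score(
    interaction_count: int,
    max_count: int,
    active_days: int,
    window_days: int,
    profile_a: Set[str],
    profile_b: Set[str],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # hypothetical weights
) -> float:
    """Weighted mix of interaction amount, frequency, and profile similarity."""
    amount = min(interaction_count / max_count, 1.0) if max_count > 0 else 0.0
    frequency = min(active_days / window_days, 1.0) if window_days > 0 else 0.0
    similarity = jaccard(profile_a, profile_b)
    w_amount, w_frequency, w_similarity = weights
    return w_amount * amount + w_frequency * frequency + w_similarity * similarity


def fitted_q_target(reward: float, next_action_values: Sequence[float], discount: float = 0.9) -> float:
    """Regression target r + discount * max_a' Q(s', a') used in fitted Q iteration."""
    best_next = max(next_action_values) if next_action_values else 0.0
    return reward + discount * best_next


trust = trust_score(5, 10, 15, 30, {"fashion", "music"}, {"fashion", "travel"})
print(round(trust, 4), fitted_q_target(trust, [0.2, 0.5]))
```

In this sketch the trust value doubles as the reward, reflecting the statement that higher trust leads to higher rewards; a function approximator would then be fitted to the targets produced by `fitted_q_target`.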

3.5 Development of Framework for Image Segmentation, Object Localization, and Detailed Annotation

The motivation behind this thesis was the development of a set of tools and algorithms that facilitate the interpretation of social network users’ behaviour into implicit recommendations that reflect their true tastes. Such advanced analytics require architectures that capture the fine details from different sources of users’ input, such as images and text. In our paper [133] we laid out the roadmap for the different components that we provide in this thesis and that we have described in the previous sections (papers [127], [134], [130], [128]). Furthermore, we described a domain adaptation methodology that can be used to facilitate cross-domain retrievals between social networks and online shops.

A high-level description of the methodology described in [133] can be summarized as follows:

• Development of an information extraction methodology for social media text that can be used to generate meta-data of the items to aid in the retrieval of similar or identical items from target domains.

• Capturing the similarity between images from source and target domains using a combination of textual and visual feature similarity. This step is used for an initial retrieval of a set of similar or identical items from the target domain.

• Detection and localization of the elements of interest that appear in the source image to decide the regions upon which we will perform more analysis. This step enhances the filtering process of the retrieved results.

• Performing pixel-wise semantic segmentation on the localized regions where an outline of every object in the input image is decided and classified. For this purpose, we suggested developing a model that merges and modifies two advanced semantic segmentation architectures: Multi-task Network Cascades (MNC) [135] and Human Parsing using CNN [136] (more details in [133]).

• Using extracted information from text analysis as a correction factor for the semantic segmentation process.
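The initial retrieval step in the list above, which combines textual and visual similarity, can be sketched as follows. This is only an illustration under stated assumptions: the `Item` structure, the use of cosine similarity for visual features and tag overlap for text, the mixing weight `alpha`, and the function names are hypothetical and not taken from [133].

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple


@dataclass
class Item:
    visual: List[float]  # visual feature vector, e.g. from an image encoder
    tags: Set[str]       # attributes extracted from the accompanying text


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equally sized vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def tag_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(source: Item, target: Item, alpha: float = 0.5) -> float:
    """Mix visual and textual similarity; alpha is a hypothetical weight."""
    return alpha * cosine(source.visual, target.visual) + (1 - alpha) * tag_overlap(source.tags, target.tags)


def retrieve(source: Item, catalog: Dict[str, Item], top_k: int = 5) -> List[Tuple[str, float]]:
    """Return the top_k catalog entries ranked by combined similarity."""
    scored = [(name, combined_similarity(source, item)) for name, item in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


post = Item(visual=[1.0, 0.0], tags={"dress", "floral"})
shop = {
    "shop_dress": Item(visual=[1.0, 0.0], tags={"dress", "floral"}),
    "shop_jeans": Item(visual=[0.0, 1.0], tags={"jeans"}),
}
results = retrieve(post, shop, top_k=1)
print(results)
```

The later steps (localization, segmentation, and text-based correction) would then operate on the candidates returned by such a retrieval.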

To train and evaluate such detailed architectures, in which localization and pixel-by-pixel semantic segmentation are performed in an end-to-end manner, detailed datasets with specific meta-data are required. The meta-data covers: (a) the hierarchical annotations that are used to classify the items of interest, (b) localization bounding boxes and labels, and (c) semantic segmentation classes. Another requirement for the dataset is that it should be collected from a social network to reflect users’ behavior. To the best of our knowledge, none of the available data sources contains either the level of meta-data or the added value of user interaction analysis that we aimed for. At the same time, the available tools for preparing detailed image meta-data such as semantic segmentation require substantial computation and storage. For the aforementioned reasons:

• We collected a dataset of 70K posts from Instagram, as it is an effective social network platform that is rich in images and user interactions.

• We designed and developed a framework that can be used to tag images in a hierarchical fashion, and to perform object localization and semantic segmentation in a fast and efficient way.
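To make the three kinds of meta-data listed earlier in this section concrete, the sketch below shows one possible record layout for a single annotated post. The field names, the `validate` helper, and the example values are assumptions made for illustration; the actual schema of the dataset is described in [130] and [133].

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BoundingBox:
    label: str   # item the box localizes, e.g. "dress"
    x: int
    y: int
    width: int
    height: int


@dataclass
class PostAnnotation:
    post_id: str
    # (a) hierarchical annotations: item -> fine-grained attributes
    hierarchy: Dict[str, List[str]] = field(default_factory=dict)
    # (b) localization bounding boxes with labels
    boxes: List[BoundingBox] = field(default_factory=list)
    # (c) semantic segmentation classes present in the image
    segmentation_classes: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the record is consistent."""
        problems = []
        for box in self.boxes:
            if box.label not in self.hierarchy:
                problems.append(f"box label '{box.label}' missing from hierarchy")
            if box.width <= 0 or box.height <= 0:
                problems.append(f"box for '{box.label}' has non-positive size")
        return problems


record = PostAnnotation(
    post_id="example-post",
    hierarchy={"dress": ["material:silk", "pattern:floral"]},
    boxes=[BoundingBox("dress", 10, 20, 100, 200), BoundingBox("dress", 0, 0, 0, 50)],
    segmentation_classes=["dress", "background"],
)
print(record.validate())
```

Keeping the three annotation types in one record makes simple consistency checks, such as the one in `validate`, straightforward.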

We refer the reader to [133] and [130] for the full details of the framework and the dataset. The dataset has been enriched with the output of our text mining framework that was described in section 3.2. We can summarize the contributions of our papers [133] and [130] as follows:
C5.1: Providing a rich hierarchical dataset that can be used for training advanced deep learning architectures for research purposes.
C5.2: Designing and developing a complete, extensible and easy-to-use framework to hierarchically tag and perform object localization and semantic segmentation of images.
C5.3: Providing a description of a general domain adaptation methodology for cross-domain retrievals.

We relate the contributions of this work to the proposed research questions as follows: The contributions in our papers [133] and [130] answer the research question: How can we enhance and speed up the process of preparing datasets that can be used to train deep learning models for fine-grained products’ classifications?

3.6 Development of Dynamic & Scalable Image Classifiers

A strategy that can take personalized marketing a step ahead is known as Hyper-Personalization [137]. Hyper-Personalization leverages real-time consumer data to deliver more relevant recommendations. This goes beyond basic consumer information (e.g. name, location, and shopping history), as it makes use of refined consumer information that is detected from their online behaviour. In the context of online fashion shops, hyper-personalization can be achieved through the detection of the details of the outfits in which consumers have shown interest (view, favorite). Examples of these details are materials, patterns, brands and styles. However, the detection of such details from fashion outfits in social media images requires training classification models with enough fine-grained annotated data. With the continuous increase of clothing brands and styles, the need for scalable methods of advanced image classification in machine learning is becoming increasingly important. This has been the focus of our work in [134], where we used our results in [127] on textual analysis of social media content to support the visual classification.

Summary of Contributions: We proposed two novel techniques to incorporate social media textual content to support the visual classification in neural networks in a dynamic way. An outfit image usually contains multiple clothing items, and each item can be classified into further specific categories. Furthermore, each clothing item has a material, a pattern, and a brand. This complicates the classification space. Recently, some deep learning frameworks introduced the concept of dynamic computational graphs (define-by-run), where a deep learning architecture can be defined at runtime. We benefit from this in our first technique, DynamicCNN, where we define a dynamic framework in which multiple classification layers exist and are activated dynamically based upon the text mined for the image. The other technique we proposed is DynamicPruning, which, as the name suggests, is similar to the neural pruning concept that aims to control the number of redundant network parameters or feature maps that contribute less to the prediction output. However, our approach dynamically deactivates the parameter connections that are irrelevant to the category in the image, based on the category detected from text analysis as shown in our work in [127]. We evaluated our models on two large fashion-specialized datasets. Our experiments demonstrated the improvements achieved by both models in enhancing classification accuracy. We can summarize the contributions of paper [134] as follows:
C6.1: Proposing two novel techniques (DynamicPruning and DynamicCNN) to incorporate social media textual content to support the visual classification in neural networks in a dynamic way.
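The general idea behind both techniques, letting categories mined from text decide which parts of the classifier stay active, can be illustrated at the output level with a small sketch. This is not the DynamicCNN or DynamicPruning implementation from [134], which operate on layers and parameter connections inside the network; the taxonomy, the class names, and the functions `active_classes` and `masked_softmax` are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical taxonomy: coarse category detected in text -> fine-grained classes.
TAXONOMY: Dict[str, List[str]] = {
    "dress": ["maxi_dress", "mini_dress"],
    "shoes": ["sneakers", "boots"],
}


def active_classes(text_categories: List[str], logits: Dict[str, float]) -> List[str]:
    """Select the fine-grained classes consistent with the text hints."""
    active = [cls for cat in text_categories for cls in TAXONOMY.get(cat, []) if cls in logits]
    # Without a usable hint, fall back to the full classification space.
    return active or list(logits)


def masked_softmax(logits: Dict[str, float], active: List[str]) -> Dict[str, float]:
    """Softmax over the active classes only; inactive classes get probability 0."""
    peak = max(logits[cls] for cls in active)
    exps = {cls: math.exp(logits[cls] - peak) for cls in active}
    total = sum(exps.values())
    return {cls: (exps[cls] / total if cls in exps else 0.0) for cls in logits}


visual_logits = {"maxi_dress": 1.0, "mini_dress": 1.0, "sneakers": 5.0, "boots": 0.0}
probs = masked_softmax(visual_logits, active_classes(["dress"], visual_logits))
print(probs)
```

In the example, the visually strongest class ("sneakers") is suppressed because the text indicates a dress, which narrows the classification space to the two dress classes.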
We relate the contributions of this work to the proposed research questions as follows: The contributions in our paper [134] answer the research question: How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications?

Chapter 4

Conclusions

4.1 Conclusions

Within the course of this thesis, we focused on proposing solutions to effectively extract knowledge about the preferences of social media users and connect them to recommendation sources such as online shops. The key to building a solid understanding of an online user is a detailed analysis of their profile, history and behaviour. We analysed the interactions between online social media users as a source of implicit feedback about their preferences in topics and services. We focused on digital influencers as a rich source of information that can describe the followers’ potential future purchases in a specific domain such as fashion. We explored the possibility of identifying products from social media images and text that can represent items in online shops. This in turn can be useful for generating cross-domain recommendations and can act as a mediator between social media and online shops. The development of such fine-grained knowledge extraction tools required exploring solutions for large-space image classifiers, which we have addressed in our work. We proposed strategies for incorporating the extracted social media products’ meta-data into recommendation models. Furthermore, we developed a complete framework for preparing datasets that are used in training advanced models with the required localization, annotation and pixel-by-pixel segmentation. Finally, we proposed efficient and interpretable topic summarization approaches for social media. Table 4.1 shows how the contributions of this thesis are correlated to the research questions that we introduced in chapter 1.


4.2 Future Work

As part of future work, we aim to explore multi-modal cross-domain recommendations through the development of unified multimodal neural representations from both social media platforms, such as Instagram, and online shops. The main challenges in this direction can be defined as follows: 1) Finding multimodal representations for text and images of Online Social Network (OSN) users that can be mapped to multimodal representations of items’/products’ images and metadata in online shops; 2) Applying user interactions with other users and digital influencers to filter the retrievals from multiple online shops; 3) Fusing the modalities from OSNs and online shops into the recommendation architecture.

Furthermore, a complementary part to the frameworks described in this thesis would be to add users’ real interactions with actual retailers, where we can best evaluate the value of the described work in real use cases. This can best be achieved by developing a web or mobile application that, with the users’ approval, connects to their Instagram profiles, where their interactions with digital influencers can be traced and analyzed to evaluate the described knowledge extraction methods. Finally, further experimentation on the described components using the latest attention-based neural architectures could enhance the quality of our frameworks.

Table 4.1: Correlating the research questions and the thesis contributions

| Contribution | Question |
| --- | --- |
| C1: Development of a Supervised and Dynamic Topic Mining Framework [90] | Q: How can we enhance topic modelling in social media to facilitate recommendations? |
| C2: Development of a Fine-Grained & Semantic Information Extraction Framework from Social Media [127] | Q: How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications? |
| C3: Development of a Framework for Incorporating Hierarchical MetaData in Embedding-based Recommendations [128] | Q: How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications? |
| C4: Development of an Adaptive Privacy-Preserving Knowledge Sharing Solution for OSNs [131] | Q: With the huge information overload in social media, how can the interactions between social media users and their connections be used effectively as a source of implicit feedback about their preferences? |
| C5: Development of Framework for Image segmentation, object localization, and detailed annotation [133] and [130] | Q: How can we enhance and speed up the process of preparing datasets that can be used to train deep learning models for fine-grained products’ classifications? |
| C6: Development of Dynamic Convolutional Neural Network Classifiers [134] | Q: How can visual and textual deep learning technologies be used to handle the large-scale fine-grained representations and classifications? |

Bibliography

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” Journal of machine learning research, vol. 3, no. Jan, pp. 993–1022, 2003.

[2] M. Hoffman, F. R. Bach, and D. M. Blei, “Online learning for latent dirichlet allocation,” in advances in neural information processing systems, 2010, pp. 856–864.

[3] D. Ramage, D. Hall, R. Nallapati, and C. D. Manning, “Labeled lda: A supervised topic model for credit attribution in multi-labeled corpora,” in Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1. Association for Computational Linguistics, 2009, pp. 248–256.

[4] M. Mohsin, “10 social media statistics you need to know in 2020,” Retrieved February, vol. 19, 2019. [Online]. Available: https://www.oberlo.com/blog/social-media-marketing-statistics

[5] C. Lou and S. Yuan, “Influencer marketing: how message value and credibility affect consumer trust of branded content on social media,” Journal of Interactive Advertising, vol. 19, no. 1, pp. 58–73, 2019.

[6] S. S. Singh, K. Singh, A. Kumar, H. K. Shakya, and B. Biswas, “A survey on information diffusion models in social networks,” in International Conference on Advanced Informatics for Computing Research. Springer, 2018, pp. 426–439.

[7] K. Freberg, K. Graham, K. McGaughey, and L. A. Freberg, “Who are the social media influencers? a study of public perceptions of personality,” Public Relations Review, vol. 37, no. 1, pp. 90–92, 2011.

[8] M. Swant, “Twitter says users now trust influencers nearly as much as their friends,” Retrieved February, vol. 19, p. 2018, 2016.

[9] J. Grafström, L. Jakobsson, and P. Wiede, “The impact of influencer marketing on consumers’ attitudes,” 2018.

[10] X. J. Lim, J.-H. Cheah, and M. W. Wong, “The impact of social media influencers on purchase intention and the mediation effect of customer attitude,” Asian Journal of Business Research, vol. 7, no. 2, pp. 19–36, 2017.

[11] T. Baramidze et al., “The effect of influencer marketing on customer behavior. the case of" " influencers in makeup industry,” 2018.


[12] P. Martineau. (2018) Inside the pricy war to influence your instagram feed. [Online]. Available: https://www.wired.com/story/pricey-war-influence-your-instagram-feed/

[13] N. Dokoohaki, “Trust-based user profiling,” Ph.D. dissertation, KTH Royal Institute of Technology, 2013.

[14] K. Falk, Practical Recommender Systems. Manning Publications, 2019.

[15] O. Barkan and N. Koenigstein, “Item2vec: neural item embedding for collaborative filtering,” in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2016, pp. 1–6.

[16] M. Jamali and M. Ester, “A matrix factorization technique with trust propagation for recommendation in social networks,” in Proceedings of the fourth ACM conference on Recommender systems, 2010, pp. 135–142.

[17] Y. Koren, “Factorization meets the neighborhood: a multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008, pp. 426–434.

[18] R. M. Bell and Y. Koren, “Improved neighborhood-based collaborative filtering,” in KDD cup and workshop at the 13th ACM SIGKDD international conference on knowledge discovery and data mining. Citeseer, 2007, pp. 7–14.

[19] R. Van Meteren and M. Van Someren, “Using content-based filtering for recommendation,” in Proceedings of the Machine Learning in the New Information Age: MLnet/ECML2000 Workshop, vol. 30, 2000, pp. 47–56.

[20] P. Aggarwal, V. Tomar, and A. Kathuria, “Comparing content based and collaborative filtering in recommender systems,” International Journal of New Technology and Research, vol. 3, no. 4, 2017.

[21] P. B. Thorat, R. Goudar, and S. Barve, “Survey on collaborative filtering, content-based filtering and hybrid recommendation system,” International Journal of Computer Applications, vol. 110, no. 4, pp. 31–36, 2015.

[22] D. Jiménez-Castillo and R. Sánchez-Fernández, “The role of digital influencers in brand recommendation: Examining their impact on engagement, expected value and purchase intention,” International Journal of Information Management, vol. 49, pp. 366–376, 2019.

[23] M. Sudha and K. Sheena, “Impact of influencers in consumer decision process: the fashion industry,” SCMS Journal of Indian Management, vol. 14, no. 3, pp. 14–30, 2017.

[24] P. P. Analytis, D. Barkoczi, P. Lorenz-Spreen, and S. M. Herzog, “The structure of social influence in recommender networks,” in WWW’20. International World Wide Web Conferences Committee, 2020.

[25] P. Domingos and M. Richardson, “Mining the network value of customers,” in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, 2001, pp. 57–66.

[26] F. Eskandanian, N. Sonboli, and B. Mobasher, “Power of the few: Analyzing the impact of influential users in collaborative recommender systems,” in Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, 2019, pp. 225–233.

[27] N. Rubens and M. Sugiyama, “Influence-based collaborative active learning,” in Proceedings of the 2007 ACM conference on Recommender systems, 2007, pp. 145–148.

[28] C.-T. Li, S.-D. Lin, and M.-K. Shan, “Exploiting endorsement information and social influence for item recommendation,” in Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, 2011, pp. 1131–1132.

[29] Q. Zhang, J. Wu, H. Yang, W. Lu, G. Long, and C. Zhang, “Global and local influence-based social recommendation,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, pp. 1917–1920.

[30] Q. Zhang, J. Wu, Q. Zhang, P. Zhang, G. Long, and C. Zhang, “Dual influence embedded social recommendation,” World Wide Web, vol. 21, no. 4, pp. 849–874, 2018.

[31] S. A. Hasan, M. I. Subhani, M. Osman et al., “Effect of trust factors on consumer’s acceptance of word of mouth recommendation,” 2012.

[32] R. K. Chellappa and R. G. Sin, “Personalization versus privacy: An empirical examination of the online consumer’s dilemma,” Information technology and management, vol. 6, no. 2-3, pp. 181–202, 2005.

[33] S. Garcia-Rivadulla, “Personalization vs. privacy: An inevitable trade-off?” IFLA journal, vol. 42, no. 3, pp. 227–238, 2016.

[34] Y. Jeong and Y. Kim, “Privacy concerns on social networking sites: Interplay among posting types, content, and audiences,” Computers in Human Behavior, vol. 69, pp. 302–310, 2017.

[35] J. L. Gómez-Barroso, C. Feijóo, and J. F. Palacios, “Acceptance of personalised services and privacy disclosure decisions: Results from a representative survey of internet users in spain,” in Organizing for Digital Innovation. Springer, 2019, pp. 63–76.

[36] Y. Li, “Theories in online information privacy research: A critical review and an integrated framework,” Decision support systems, vol. 54, no. 1, pp. 471–481, 2012.

[37] A. F. Westin, “Social and political dimensions of privacy,” Journal of social issues, vol. 59, no. 2, pp. 431–453, 2003.

[38] C. E. Tucker, “The economics of advertising and privacy,” International journal of Industrial organization, vol. 30, no. 3, pp. 326–329, 2012.

[39] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang, “Deepfashion: Powering robust clothes recognition and retrieval with rich annotations,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1096–1104.

[40] Y. Ge, R. Zhang, X. Wang, X. Tang, and P. Luo, “Deepfashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5337–5345.

[41] K.-H. Liu, F. Wang, and T.-J. Liu, “A clothing recommendation dataset for online shopping,” in 2019 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW). IEEE, 2019, pp. 1–2.

[42] N. Dokoohaki, F. Zikou, D. Gillblad, and M. Matskin, “Predicting swedish elections with twitter: A case for stochastic link structure analysis,” in 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2015, pp. 1269–1276.

[43] C. Akcora, B. Carminati, and E. Ferrari, “Privacy in social networks: How risky is your social graph?” in 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012, pp. 9–19.

[44] S. Zhang, L. Yao, A. Sun, and Y. Tay, “Deep learning based recommender system: A survey and new perspectives,” ACM Computing Surveys (CSUR), vol. 52, no. 1, pp. 1–38, 2019.

[45] A. Karatzoglou and B. Hidasi, “Deep learning for recommender systems,” in Proceedings of the eleventh ACM conference on recommender systems, 2017, pp. 396–397.

[46] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir et al., “Wide & deep learning for recommender systems,” in Proceedings of the 1st workshop on deep learning for recommender systems, 2016, pp. 7–10.

[47] P. Covington, J. Adams, and E. Sargin, “Deep neural networks for youtube recommendations,” in Proceedings of the 10th ACM conference on recommender systems, 2016, pp. 191–198.

[48] S. Okura, Y. Tagami, S. Ono, and A. Tajima, “Embedding-based news recommendation for millions of users,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 1933–1942.

[49] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in neural information processing systems, 2013, pp. 3111–3119.

[50] F. Vasile, E. Smirnova, and A. Conneau, “Meta-prod2vec: Product embeddings using side-information for recommendation,” in Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016, pp. 225–232.

[51] J. B. Vuurens, M. Larson, and A. P. de Vries, “Exploring deep space: Learning personalized ranking in a semantic space,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016, pp. 23–28.

[52] M. Grbovic, V. Radosavljevic, N. Djuric, N. Bhamidipati, J. Savla, V. Bhagwan, and D. Sharp, “E-commerce in your inbox: Product recommendations at scale,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 1809–1818.

[53] M. Grbovic, N. Djuric, V. Radosavljevic, F. Silvestri, R. Baeza-Yates, A. Feng, E. Ordentlich, L. Yang, and G. Owens, “Scalable semantic matching of queries to ads in sponsored search advertising,” in Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016, pp. 375–384.

[54] T. Nedelec, E. Smirnova, and F. Vasile, “Specializing joint representations for the task of product recommendation,” in Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems, 2017, pp. 10–18.

[55] S. Sedhain, A. K. Menon, S. Sanner, and L. Xie, “Autorec: Autoencoders meet collaborative filtering,” in Proceedings of the 24th international conference on World Wide Web, 2015, pp. 111–112.

[56] Y. Wu, C. DuBois, A. X. Zheng, and M. Ester, “Collaborative denoising auto-encoders for top-n recommender systems,” in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 2016, pp. 153–162.

[57] G. K. Dziugaite and D. M. Roy, “Neural network matrix factorization,” arXiv preprint arXiv:1511.06443, 2015.

[58] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th international conference on world wide web, 2017, pp. 173–182.

[59] J. Lian, X. Zhou, F. Zhang, Z. Chen, X. Xie, and G. Sun, “xdeepfm: Combining explicit and implicit feature interactions for recommender systems,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1754–1763.

[60] F. Strub, R. Gaudel, and J. Mary, “Hybrid recommender system based on autoencoders,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016, pp. 11–16.

[61] X. Wu, B. Shi, Y. Dong, C. Huang, and N. Chawla, “Neural tensor factorization,” arXiv preprint arXiv:1802.04416, 2018.

[62] B. Hidasi and A. Karatzoglou, “Recurrent neural networks with top-k gains for session-based recommendations,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 843–852.

[63] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, “Session-based recommendations with recurrent neural networks,” arXiv preprint arXiv:1511.06939, 2015.

[64] B. Hidasi, M. Quadrana, A. Karatzoglou, and D. Tikk, “Parallel recurrent neural network architectures for feature-rich session-based recommendations,” in Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 241–248.

[65] M. M. Khan, R. Ibrahim, and I. Ghani, “Cross domain recommender systems: a systematic literature review,” ACM Computing Surveys (CSUR), vol. 50, no. 3, pp. 1–34, 2017.

[66] A. Farzindar and D. Inkpen, “Natural language processing for social media,” Synthesis Lectures on Human Language Technologies, vol. 8, no. 2, pp. 1–166, 2015.

[67] Y. Goldberg, “Neural network methods for natural language processing,” Synthesis Lectures on Human Language Technologies, vol. 10, no. 1, pp. 1–309, 2017.

[68] J. R. Firth, “A synopsis of linguistic theory, 1930-1955,” Studies in linguistic analysis, 1957.

[69] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural probabilistic language model,” Journal of machine learning research, vol. 3, no. Feb, pp. 1137–1155, 2003.

[70] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.

[71] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.

[72] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017.

[73] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin, “Advances in pre-training distributed word representations,” arXiv preprint arXiv:1712.09405, 2017.

[74] Z. Bairong, W. Wenbo, L. Zhiyu, Z. Chonghui, and T. Shinozaki, “Comparative analysis of word embedding methods for dstc6 end-to-end conversation modeling track,” in Proceedings of the 6th Dialog System Technology Challenges (DSTC6) Workshop, 2017.

[75] D. Ganguly, D. Roy, M. Mitra, and G. J. Jones, “Word embedding based generalized language model for information retrieval,” in Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, 2015, pp. 795–798.

[76] G. Zuccon, B. Koopman, P. Bruza, and L. Azzopardi, “Integrating and evaluating neural word embeddings in information retrieval,” in Proceedings of the 20th Australasian document computing symposium, 2015, pp. 1–8.

[77] T. Mikolov, Q. V. Le, and I. Sutskever, “Exploiting similarities among languages for machine translation,” arXiv preprint arXiv:1309.4168, 2013.

[78] W. Y. Zou, R. Socher, D. Cer, and C. D. Manning, “Bilingual word embeddings for phrase-based machine translation,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 1393–1398.

[79] Y. Lin. (2019) 10 twitter statistics every marketer should know in 2020. [Online]. Available: https://www.oberlo.com/blog/twitter-statistics

[80] S. Aslam. (2020) Facebook by the numbers: Stats, demographics fun facts. [Online]. Available: https://www.omnicoreagency.com/facebook-statistics/

[81] S. Aslam. (2020) Instagram by the numbers: Stats, demographics fun facts. [Online]. Available: https://www.omnicoreagency.com/instagram-statistics/

[82] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American society for information science, vol. 41, no. 6, pp. 391–407, 1990.

[83] T. Hofmann, “Probabilistic latent semantic analysis,” in Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 1999, pp. 289–296.

[84] H. Jelodar, Y. Wang, C. Yuan, X. Feng, X. Jiang, Y. Li, and L. Zhao, “Latent dirichlet allocation (lda) and topic modeling: models, applications, a survey,” Multimedia Tools and Applications, vol. 78, no. 11, pp. 15 169–15 211, 2019.

[85] J. D. Mcauliffe and D. M. Blei, “Supervised topic models,” in Advances in neural information processing systems, 2008, pp. 121–128.

[86] S. Lacoste-Julien, F. Sha, and M. I. Jordan, “Disclda: Discriminative learning for dimensionality reduction and classification,” in Advances in neural information processing systems, 2009, pp. 897–904.

[87] D. Quercia, H. Askham, and J. Crowcroft, “Tweetlda: supervised topic classification and link prediction in twitter,” in Proceedings of the 4th Annual ACM Web Science Conference. ACM, 2012, pp. 247–250.

[88] Y. Wang, E. Agichtein, and M. Benzi, “Tm-lda: efficient online modeling of latent topic transitions in social media,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012, pp. 123–131.

[89] L. AlSumait, D. Barbará, and C. Domeniconi, “On-line lda: Adaptive topic models for mining text streams with applications to topic detection and tracking,” in 2008 eighth IEEE international conference on data mining. IEEE, 2008, pp. 3–12.

[90] S. Jaradat, N. Dokoohaki, and M. Matskin, “Ollda: a supervised and dynamic topic mining framework in twitter,” in 2015 IEEE International Conference on Data Mining Workshop (ICDMW). IEEE, 2015, pp. 1354–1359.

[91] J. Piskorski and R. Yangarber, “Information extraction: Past, present and future,” in Multi-source, multilingual information extraction and summarization. Springer, 2013, pp. 23–49.

[92] P. Bhargava, N. Spasojevic, and G. Hu, “Lithium nlp: A system for rich information extraction from noisy user generated text on social media,” arXiv preprint arXiv:1707.04244, 2017.

[93] A. Ritter, O. Etzioni, and S. Clark, “Open domain event extraction from twitter,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp. 1104–1112.

[94] M. B. H. Morgan and M. Van Keulen, “Information extraction for social media,” in Proceedings of the Third Workshop on Semantic Web and Information Extraction, 2014, pp. 9–16.

[95] C. Cherry and H. Guo, “The unreasonable effectiveness of word representations for twitter named entity recognition,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, pp. 735–745.

[96] R. Irfan, C. K. King, D. Grages, S. Ewen, S. U. Khan, S. A. Madani, J. Kolodziej, L. Wang, D. Chen, A. Rayes et al., “A survey on text mining in social networks,” The Knowledge Engineering Review, vol. 30, no. 2, pp. 157–170, 2015.

[97] Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in International conference on machine learning, 2014, pp. 1188–1196.

[98] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587.

[99] P. Purkait, C. Zhao, and C. Zach, “Spp-net: Deep absolute pose regression with synthetic views,” arXiv preprint arXiv:1712.03452, 2017.

[100] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448.

[101] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.

[102] Q. Chen, J. Huang, R. Feris, L. M. Brown, J. Dong, and S. Yan, “Deep domain adaptation for describing people based on fine-grained clothing attributes,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5315–5324.

[103] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[104] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European conference on computer vision. Springer, 2014, pp. 818–833.

[105] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

[106] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.

[107] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

[108] K. Nogueira, A. Veloso, and J. A. dos Santos, “Statistical and deep learning algorithms for annotating and parsing clothing items in fashion photographs,” Dissertation, 2015.

[109] K. Yamaguchi, M. H. Kiapour, L. E. Ortiz, and T. L. Berg, “Parsing clothing in fashion photographs,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 3570–3577.

[110] M. Hadi Kiapour, X. Han, S. Lazebnik, A. C. Berg, and T. L. Berg, “Where to buy it: Matching street clothing photos in online shops,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 3343–3351.

[111] N. Khosla and V. Venkataraman, “Building image-based shoe search using convolutional neural networks,” CS231n course project reports, 2015.

[112] J.-C. Chen and C.-F. Liu, “Visual-based deep learning for clothing from large database,” in Proceedings of the ASE BigData & SocialInformatics 2015, 2015, pp. 1–10.

[113] Z. Li, Y. Sun, F. Wang, and Q. Liu, “Convolutional neural networks for clothes categories,” in CCF Chinese Conference on Computer Vision. Springer, 2015, pp. 120–129.

[114] S. Jain, “Landmarks for clothing retrieval,” Ph.D. dissertation, 2019.

[115] Z. Liu, S. Yan, P. Luo, X. Wang, and X. Tang, “Fashion landmark detection in the wild,” in European Conference on Computer Vision. Springer, 2016, pp. 229–245.

[116] G. Shi, “Search by image: clothing segmentation and retrieval,” 2019.

[117] X. Liang, L. Lin, W. Yang, P. Luo, J. Huang, and S. Yan, “Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval,” IEEE Transactions on Multimedia, vol. 18, no. 6, pp. 1175–1186, 2016.

[118] L. Zhou, Z. Zhou, and L. Zhang, “Deep part-based image feature for clothing retrieval,” in International Conference on Neural Information Processing. Springer, 2017, pp. 340–347.

[119] Z. Li, Y. Li, W. Tian, Y. Pang, and Y. Liu, “Cross-scenario clothing retrieval and fine-grained style recognition,” in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 2912–2917.

[120] Z. Wang, Y. Gu, Y. Zhang, J. Zhou, and X. Gu, “Clothing retrieval with visual attention model,” in 2017 IEEE Visual Communications and Image Processing (VCIP). IEEE, 2017, pp. 1–4.

[121] I. Tautkute, T. Trzciński, A. P. Skorupa, Ł. Brocki, and K. Marasek, “Deepstyle: Multimodal search engine for fashion and interior design,” IEEE Access, vol. 7, pp. 84 613–84 628, 2019.

[122] Y. Jo, J. Wi, M. Kim, and J. Y. Lee, “Flexible fashion product retrieval using multimodality-based deep learning,” Applied Sciences, vol. 10, no. 5, p. 1569, 2020.

[123] S. C. Hidayati, T. W. Goh, J.-S. G. Chan, C.-C. Hsu, J. See, W. L. Kuan, K.-L. Hua, Y. Tsao, and W.-H. Cheng, “Dress with style: Learning style from joint deep embedding of clothing styles and body shapes,” IEEE Transactions on Multimedia, 2020.

[124] X. Song, F. Feng, J. Liu, Z. Li, L. Nie, and J. Ma, “Neurostylist: Neural compatibility modeling for clothing matching,” in Proceedings of the 25th ACM international conference on Multimedia, 2017, pp. 753–761.

[125] R. He and J. McAuley, “Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering,” in Proceedings of the 25th international conference on world wide web, 2016, pp. 507–517.

[126] Q. Liu, S. Wu, and L. Wang, “Learning preferences and demands in visual recommendation,” arXiv preprint arXiv:1911.04229, 2019.

[127] K. Hammar, S. Jaradat, N. Dokoohaki, and M. Matskin, “Deep text mining of instagram data without strong supervision,” in 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI). IEEE, 2018, pp. 158–165.

[128] S. Jaradat, “Outfit2vec: Incorporating clothing hierarchical metadata into outfits’ recommendation,” in Fashion Recommender Systems Book, to appear in Lecture Notes in Social Networks Springer, 2020.

[129] Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in International conference on machine learning, 2014, pp. 1188–1196.

[130] S. Jaradat, N. Dokoohaki, U. Wara, M. Goswami, K. Hammar, and M. Matskin, “Tals: A framework for text analysis, fine-grained annotation, localisation and semantic segmentation,” in 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 2. IEEE, 2019, pp. 201–206.

[131] S. Jaradat, N. Dokoohaki, M. Matskin, and E. Ferrari, “Trust and privacy correlations in social networks: a deep learning framework,” in 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2016, pp. 203–206.

[132] M. Riedmiller, “Neural fitted q iteration–first experiences with a data efficient neural reinforcement learning method,” in European Conference on Machine Learning. Springer, 2005, pp. 317–328.

[133] S. Jaradat, “Deep cross-domain fashion recommendation,” in Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 407–410.

[134] S. Jaradat, N. Dokoohaki, K. Hammar, U. Wara, and M. Matskin, “Dynamic cnn models for fashion recommendation in instagram,” in 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). IEEE, 2018, pp. 1144–1151.

[135] J. Dai, K. He, and J. Sun, “Instance-aware semantic segmentation via multi-task network cascades,” in CVPR, 2016.

[136] X. Liang, “Human parsing with contextualized convolutional neural network,” in Proceedings IEEE Int. Conf. Comput. Vis., 2015, pp. 136–139.

[137] A. Barcelona. (2018) What is hyper-personalization? [Online]. Available: https://dotcms.com/blog/post/what-is-hyper-personalization-

appendix A

OLLDA - A Supervised and Dynamic Topic Mining Framework in Twitter

(Included Publication) Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin. OLLDA - A Supervised and Dynamic Topic Mining Framework in Twitter. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 1354-1359. IEEE, 2015.

Extended with additional experiments in: Shatha Jaradat and Mihhail Matskin. On Dynamic Topic Models for Mining Social Media. In Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining, pp. 209-230. Springer, Cham, 2019.

appendix B

Deep Text Mining of Instagram Data without Strong Supervision

(Included Publication) Kim Hammar, Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin. Deep Text Mining of Instagram Data without Strong Supervision. In 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 158-165. IEEE, 2018.

Extended with additional experiments in: Kim Hammar, Shatha Jaradat, Nima Dokoohaki, and Mihhail Matskin. Deep text classification of Instagram data using word embeddings and weak supervision. In Web Intelligence, no. Preprint, pp. 1-15. IOS Press, 2020.

appendix C

Outfit2Vec: Incorporating Clothing Hierarchical MetaData into Outfits’ Recommendation

Shatha Jaradat, Nima Dokoohaki, and Mihhail Matskin. Outfit2Vec: Incorporating Clothing Hierarchical MetaData into Outfits’ Recommendation. To appear in a special issue (Fashion in Recommender Systems) of Lecture Notes in Social Networks, Springer, December 2020.

appendix D

Learning What to Share in Online Social Networks using Deep Reinforcement Learning

(Included Publication) Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin, and Elena Ferrari. Learning What to Share in Online Social Networks using Deep Reinforcement Learning. In Machine Learning Techniques for Online Social Networks, pp. 115-133. Springer, Cham, 2018.

Short paper: Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin, and Elena Ferrari. Trust and privacy correlations in social networks: a deep learning framework. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 203-206. IEEE, 2016.

appendix E

Deep Cross-Domain Fashion Recommendation

Shatha Jaradat. Deep Cross-Domain Fashion Recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys), pp. 407-410. 2017.

appendix F

TALS: A Framework for Text Analysis, Fine-Grained Annotation, Localisation and Semantic Segmentation

Shatha Jaradat, Nima Dokoohaki, Ummal Wara, Mallu Goswami, Kim Hammar, Mihhail Matskin. TALS: A Framework for Text Analysis, Fine-Grained Annotation, Localisation and Semantic Segmentation. In 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 201-206. IEEE, 2019.

appendix G

Dynamic CNN Models for Fashion Recommendation in Instagram

Shatha Jaradat, Nima Dokoohaki, Mihhail Matskin. Dynamic CNN Models for Fashion Recommendation in Instagram. In 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pp. 1144-1151. IEEE, 2018.
