
ARISTOTLE UNIVERSITY OF THESSALONIKI
JOINT POSTGRADUATE COURSE ON «INFORMATICS AND MANAGEMENT»
DEPARTMENTS OF INFORMATICS AND ECONOMIC SCIENCES

Link prediction based on Multi-modal Social Networks

Master's Thesis of Christos Perentis (297)

Examination Committee
Supervisor: Panagiotis Symeonidis, PhD
Members: Athena Vakali, Associate Professor
Christina Boutsouki, Assistant Professor

THESSALONIKI
JUNE 2011

To my parents Fryni and George

Acknowledgements

This thesis would not have been possible without the support of Dr. Panagiotis Symeonidis, whose guidance was crucial to its formation and completion. I am utterly convinced that Dr. Symeonidis will continue to conduct primary research in the broad and emerging area of Data Mining with the same dedication. I hope that we will cooperate again in the future, addressing further challenging issues.

Last but not least, I would like to thank my parents, my family and my friends for their selfless support, not only during this thesis but throughout my studies and my life in general.

Abstract

Online social networks (OSNs) such as Facebook and MySpace enjoy wide acceptance, since they enable users to share digital content (links, pictures, videos), to express or share opinions, and to expand their social circle by making new friends. All these interactions lead to the evolution and expansion of the social network over time.
OSNs support users by providing them with friend recommendations based on the explicit friendship network they gradually build. This task corresponds to the Link Prediction problem: given a snapshot of a social network, infer which new interactions among its members are likely to occur in the near future. Most related work focuses on the structural properties of a single type of network to provide user recommendations. However, users also form several implicit networks through their interactions with items, such as co-membership in a group, co-commenting on posts, or co-tagging photos. In addition, most earlier work uses such auxiliary (user-item) sources only to recommend items to users. The main aim of this thesis is to exploit these implicit user-item interactions, which form within social networks alongside the explicit friendship network, either to enhance user recommendations or to be used on their own when information from the explicit friendship network is absent. The Katz status index, a global approach, is adopted to compute the "proximity" of users in a social graph. Several variants of the Katz measure are considered, ranging from the use of a single source to combinations with auxiliary user-item relationships. The premise that also considering auxiliary sources yields more accurate user recommendations is verified experimentally.

Keywords: Link prediction, Data mining, Online Social Networks, Recommender Systems, Multi-Modal Networks, Bipartite Networks, Experimentation, Web 2.0, Katz status index.

Abstract (Greek version)

Online Social Network services such as Facebook and MySpace have gained wide acceptance among Web users in recent years. Through them, users can engage in a multitude of different interactions.
Such interactions include easily exchanging digital content, finding old friends and making new ones by expanding one's social circle, exchanging opinions, and participating in online groups, among others. In this context, there is a need for mechanisms that can recommend relevant people or items to users out of the large volume of heterogeneous data available on the Internet. Online social networking services provide users with recommendations of other users or items they are likely to be associated with, based on their existing common friendships or common preferences for certain items. Recommender systems address this need by relying on the explicit and implicit relationships that users maintain with each other or with items. In particular, the research area of Link Prediction takes these existing relationships into account and attempts to predict associations that are likely to arise in the near future. Most research studies already carried out in this field mainly exploit a single source of information to generate recommendations for users. The main goal of this thesis is to examine a unified framework that exploits more than one source of information, relying both on a friendship network and on a network of common items shared by users, in order to produce the best possible user recommendations. Adopting a well-known measure (the Katz status index), we compute the association between users as well as between users and items. We then produce user recommendations and assess their accuracy both when only information from the friendship network is used and when it is combined with user-item networks. Finally, we evaluate our method experimentally, showing that combining several association networks significantly improves user recommendation.
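The abstract scores user proximity with the Katz status index, first on the friendship network alone and then with auxiliary user-item relationships added. The following is a minimal sketch of that idea under stated assumptions, not the thesis's implementation. It uses the truncated form of Katz, K = sum over l = 1..L of beta^l * A^l, where (A^l)[x][y] counts walks of length l between x and y; the untruncated index equals (I - beta*A)^-1 - I and converges only for beta < 1/lambda_max(A), the reciprocal of the largest eigenvalue of A. As a further assumption, the two sources are merged by treating items as extra nodes of a single graph with block adjacency [[F, B], [B^T, 0]], where F is the user-user friendship matrix and B is the user-item matrix. The helper names, the toy matrices and the parameter values (beta = 0.1, L = 3) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product (no external dependencies)."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def katz(adj: Matrix, beta: float, max_len: int) -> Matrix:
    """Truncated Katz index: sum_{l=1..max_len} beta^l * A^l.

    Entry [x][y] adds up the walks of each length l between x and y,
    damped by beta^l so that longer walks contribute less.
    """
    n = len(adj)
    score = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # current power A^l, starting at A^1
    weight = beta                    # current damping factor beta^l
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                score[i][j] += weight * power[i][j]
        power = matmul(power, adj)
        weight *= beta
    return score


def multimodal_adjacency(friends: Matrix, user_item: Matrix) -> Matrix:
    """Hypothetical merge: users and items as nodes of one graph, [[F, B], [B^T, 0]]."""
    n_users, n_items = len(friends), len(user_item[0])
    top = [friends[u] + user_item[u] for u in range(n_users)]
    bottom = [[user_item[u][i] for u in range(n_users)] + [0] * n_items
              for i in range(n_items)]
    return top + bottom


def recommend(scores: Matrix, friends: Matrix, user: int, k: int) -> List[int]:
    """Top-k users who are not the user or an existing friend (ties by index).

    Only the first len(friends) rows/columns (the user block) are ranked.
    """
    candidates = [v for v in range(len(friends))
                  if v != user and not friends[user][v]]
    candidates.sort(key=lambda v: (-scores[user][v], v))
    return candidates[:k]


# Toy data (hypothetical): 4 users, 2 items.
# Friendships 0-1 and 1-2; user 3 has no friends at all.
F = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
# User-item links: users 0 and 3 share item 0; user 2 has item 1.
B = [[1, 0], [0, 0], [0, 1], [1, 0]]

friend_only = katz(F, beta=0.1, max_len=3)
combined = katz(multimodal_adjacency(F, B), beta=0.1, max_len=3)
print(friend_only[3])                  # -> [0.0, 0.0, 0.0, 0.0]
print(recommend(combined, F, 3, k=2))  # -> [0, 1]
```

With friendship data only, user 3 has no links and every score in its row is zero, so no ranking is possible. In the merged graph the walk 3, item 0, user 0 has length 2 and the walk 3, item 0, user 0, user 1 has length 3, so user 3 receives candidates 0 and 1 in that order. On a toy scale, this mirrors the abstract's point that user-item sources can stand in for a missing friendship network.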
Contents

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Outline
2 Social Networking in the WWW
  2.1 Web 2.0 Technologies
    2.1.1 Blogs
    2.1.2 Wikis
    2.1.3 Mash-ups
    2.1.4 Social Tagging Systems
  2.2 Social Networking and Social Networks
  2.3 Online Social Networks Sites
    2.3.1 Facebook
    2.3.2 Myspace
    2.3.3 Twitter
  2.4 Social Rating Networks
    2.4.1 Epinions
    2.4.2 Flixster
    2.4.3 Digg
3 Data mining in OSNs
  3.1 Social Network Analysis (SNA)
  3.2 Basic Graph Theory
    3.2.1 Unipartite Graph
    3.2.2 Bipartite Graph
    3.2.3 Multi-Modal Graph
    3.2.4 Measures and Metrics
  3.3 Community Detection
  3.4 Topological Properties
    3.4.1 Small World Networks
    3.4.2 Scale free Networks
  3.5 Visualization and Tools
  3.6 Personalization and Recommender Systems
    3.6.1 Content-Based
    3.6.2 Collaborative Filtering
    3.6.3 Hybrid
  3.7 OSNs and Recommender Systems
4 Link Prediction
  4.1 Mathematical Problem Formulation
  4.2 Related Work
    4.2.1 Different Link Prediction Tasks
      4.2.1.1 Link Prediction
      4.2.1.2 Rating Prediction
  4.3 Classification of Topological Measures for Link Prediction
    4.3.1 Node Based Methods
    4.3.2 Path Based Methods
    4.3.3 Higher Level Approaches
  4.4 Experimental Configuration
5 Methodology
  5.1 Proposed Approach
  5.2 Link Prediction based on User-User Adjacency Matrix
  5.3 Link Prediction based on User-Item Bi-Adjacency Matrix
  5.4 Link Prediction based on Multi-modal Graph
6 Experimental Evaluation and Results
  6.1 xSocial Synthetic Data Set
  6.2 Experimental Setup
    6.2.1 Evaluation Method
    6.2.2 Performance Measures
  6.3 Results and Discussion
7 Conclusion and Future Work
A Appendix
  A.1 xSocial Generator
  A.2 Matlab Code
  A.3 Visual Basic Code
  A.4 Topological Properties Computation in C++

List of Figures

2.1 Overview of the Facebook Website
2.2 Overview