Similarity Measure for Social Networks – A Brief Survey

Ahmad Rawashdeh and Anca L. Ralescu EECS Department, ML 0030 University of Cincinnati Cincinnati OH 45221-0030, USA [email protected], [email protected]

Abstract al. 2012), (Huang and Lai 2006), (Pan et al. 2010), (Syme- onidis, Tiakas, and Manolopoulos 2010). Automating this Social networks play an increasing role in many areas of com- task may help when browsing large collections of data: in- puter science applications. An important aspect of these ap- plications relies on similarity measures between nodes in the stead of searching through a large network to find candidate network. Several similarity measures, described in the litera- profiles, a similarity aware browser can suggest them by ture are surveyed here with the goal of providing a guide to considering similarities along some features. Applications their selection in various applications. of such browsers include social networks (e.g., , LinkedIn), as well as other networks (e.g., recommending systems) Introduction For many people, day-to-day interaction has been re- Social networks represent a particular domain as a collection placed by instant messages, likes (or favorite), and share of nodes/profiles and links between them. Common opera- (retweet) through social networking websites such as Face- tions in social networks, such as link prediction, community book, MySpace, , YouTube, and 1. In particu- formation, browing, are driven by a similarity measure be- lar, by the end of 2010, Facebook had in excess of 1.2 billion tween nodes. Node similarity can be viewed as similarity users (Facebook 2010). Many people have turned to such between strings, whose definition/ evaluation can be traced websites to communicate with friends or make new connec- to work on information retrieval (Findler and Van Leeuwen tions. This increase in internet usage has raised many ques- 1979). tions concerning the privacy of these users, since they up- Often similarity measures are defined as decreasing func- load their personal media content (photos and videos) and tions of a distance metric. For example, two of the string they share their personal opinions on various topics (D´ıaz metrics used most often are editDistance (Lin 1998) and tri- and Ralescu 2012). grams (Bahl, Jelinek, and Mercer 1983). For finite strings x The motives for participating in online social networks and y the edit distance is defined as could be understood from the study of psychology. See for example (Heidemann, Klier, and Probst 2012) where the d (x, y) = min{γ(S)|S is a en edit sequence taking x to y} edit definition of social networks, their characteristics, as well (1) as what motivates participation in them is presented. where γ denotes the cost of an edit operation (deletion, insertion, replacement), and for the sequence of edit opera- Formally, a can be represented as a graph Pn (D´ıaz and Ralescu 2012), that is a collection of nodes, or tions S = {s1, . . . , sn}, γ(S) = i=1 γ(si). The trigram distance for two sequences x and y is defined as: profiles. Similarity between nodes could be based on node attributes(textual) and/or edges/links(structure). |tri(x) ∩ tri(y)| Some similarity measures consider the common neigh- d (x, y) = (2) tri |tri(x) ∪ tri(y)| bors of nodes (Jeh and Widom 2002), while others allow nodes to be similar even when they do not have common where tri(x) denotes the collection of trigrams (ordered neighbors (Leicht, Holme, and Newman 2006). Some sim- substrings of length 3) of x, and |tri(x)| denotes the number ilarity measures only consider the link similarity of length of trigrams of x. Then the similarity measures correspond- two, others define similarity based on longer paths (Leicht, ing to (1) and (2) are defined as in equations (3) respectively Holme, and Newman 2006), while others are defined as the (Lin 1998). number of paths of varying length between them (Papadim- 1 itriou, Symeonidis, and Manolopoulos 2012). Applications sima(x, y) = (3) of node similarity are different and they have inspired re- 1 + da(x, y) searchers to explore different approaches for evaluating it. where a ∈ {edit, tri}. Given a profile of a network node, finding similar pro- 1www.facebook.com, www.myspace.com, www.twitter.com, files has been investigated by many researchers (Yang et www.youtube.com, www.orkut.com For example, some work combines similarity from Wordnet different similarity measures, three of which are based on with a vector cosine similarity (Rawashdeh et al. 2014) to information content and the remaining three are edge based find similarity of profiles in Facebook. similarity measures. Several similarity measures have been introduced includ- The work described in (Mabotuwana, Lee, and Cohen- ing, Jaccard (biology) (Jaccard 1912), cosine, min (Le- Solal 2013) uses cosine similarity in conjunction with icht, Holme, and Newman 2006), Sorensen, Adamic Adar the SNOMED CT ontology to evaluate similarity between (Adamic and Adar 2003), and resource allocation (Zhang words. Similarly, in (12 ) cosine similarity is also used, how- et al. 2010). Also, PageSim, a method to measure the simi- ever, in conjunction with Wordnet which is a more general larity between web documents was proposed in (Lin, King, ontology than the SNOMED CT ontology. In addition, the and Lyu 2006), based on PageRank score propagation. Pa- approach described in (12 ) finds the similarity between sen- geSim was evaluated against standard information retrieval tences not just words. Therefore tools of natural language similarities TF/IDF, which were considered to be the ground processing(NLP) are considered. truth. Most of the similarity measures described in the lit- erature are knowledge dependent. However, the authors in (Lin 1998) describe an independent definition of similarity Problem description and evaluation metrics in terms of information theory. A list of similarity properties The problem of finding node similarity can be concisely (axioms) was included in (Burkhard and Richter 2001). stated as follows: given a node with attributes and possibly a set of structural attributes represented as connections find Semantic Similarity the set of nodes which are similar to it. Using the formal- ism of graph theory, a social network is defined as a graph Research in finding the semantic similarity between con- G = (V,E), where V , the set of vertices represents nodes cepts using knowledge such as Wordnet or between words in the the network, and E the set of edges, represents the in the semantic web has been reported in several papers (Li, links in the network. Thus the similarity problem is to find Bandar, and McLean 2003), (Ilakiya, Sumathi, and Karthik all pairs of similar vertices (vi, vj), vi, vj ∈ V , based either 2012). Semantic similarity measures have been classified on the node profiles (node attributes) or the set of edges E. into (1) feature based, (2) information content (which re- lies on counting the number of occurrences of a word in corpora for instance), (3) hybrid, and (4) /ontology mea- Motivation for finding similarity sures (which counts the number of edges/nodes between two The problem of finding similar objects has its root in clus- concepts) (Elavarasi and Menaga 2014). tering, collaborative filtering, and search engines(Ganesan, The path similarity measure is based on the structure of Garcia-Molina, and Widom 2003). Finding similar objects the taxonomy of the conceptual relationships (ontology hi- can be used to predict links in data networks (Lu¨ and Zhou erarchy) and it is sensitive to the quality of the taxonomy of 2011). There are two approaches for link prediction ei- concepts. This determines how the semantic similarity mea- ther local or global link structure (overall path). Also, find- sure is quantified. Edge counting methods suffer from irreg- ing similar objects may be used to recommend items for a ularities in path lengths between different concepts so one customer or friends for a particular person based on com- must proceed with caution when using them. monality between the objects attributes (Yang et al. 2014). Approaches based on information content combine cor- Prior to recommender systems, the problem of finding sim- pus statistics and taxonomy structure (Jiang and Conrath ilar objects was also studied in information retrieval (Lin, 1997). The results report that the information content mea- King, and Lyu 2006), similarity is used to cluster docu- sures perform better than edge only based measures. It ments (Zhou, Cheng, and Yu 2009). Measures of similar- is worth noting that most studies that use Wordnet only ity used for this purpose include content-based, title-based, consider the is-a relationship (hyponymy/hypernymy ) (Li, and keyword-based (Xiao 2012) measures. An example of Yang, and Park 2012). using similarity for clustering is collaborative filtering (Jeh A comparison between the three different similarity mea- and Widom 2002). Zhou, Cheng, and Yu proposed an al- sures was discussed in the paper (Pirro´ 2009). The au- gorithm to clustering objects using attributes and structure thors have pointed out that approaches that rely on statistics where the attribute of a node and the structure are seem- of word occurrences, within the corpora, require intensive ingly conflicting or at least independent (Zhou, Cheng, and computations, and thus are not practical when the corpora Yu 2009). Different similarities measures have been used is large or is different from the one used to find information in biology, ethnology, taxonomy, image retrieval, geology, content. and chemistry (Choi, Cha, and Tappert 2010), as well as Wordnet is a free lexical database that organizes English in the biomedical field (Mabotuwana, Lee, and Cohen-Solal words into concepts and relations between them. English 2013). Applications of finding similarity in data include (Li nouns, verbs, adjectives, and adverbs form hierarchies of et al. 2010) neighborhood search, centrality analysis, link synsets with relations connecting them. A synset is the hi- prediction, graph clustering, multimedia captioning, related erarchy determined by the hypernym (is-a) relationship. pages suggestion in search engines, identifying web commu- Wordnet-Similarity is a Perl package for calculating the nities, friends suggestion in friendship network (Facebook similarity between concepts using Wordnet (Pedersen, Pat- or MySpace), movies suggestion, item recommendation in wardhan, and Michelizzi 2004).The package implements six retail service, scientific and web domains in general. Table 1: Two Facebook profiles Table 2: Two Facebook profiles Profile ID Profile Friends IDs Dataset Facebook

xxxxxxx773.html Comedy, Action films American, EL EL 1, 2, 3, 10 Profile-1 ID 100000060663828.html xxxxxx432.html Haunted 3D, Saw, Transformers, Movies Interest Captain Jack Sparrow, Pirates of the Caribbean, Mind Hunter 3, 4, 5, 9, 10 Meet The Spartans, Ice Age Movie, Spider-Man Profile-2 ID 100000067167795.html Node and other similarity measurements Movies Interest Clash of the Titans, Ratatouille, When the graph structure is considered, the similarity is Independence Day, Mr. Nice Guy, based on node and edge properties. When general ontolo- The Lord of the Rings Trilogy (Official Page) gies or domain knowledge are used, then semantic similar- ity measures are used. Furthermore, depending on the con- text, similarity between words, documents, or between pro- Table 3: Human judgement on the Facebook profiles shown in files (nodes) are used (Symeonidis, Tiakas, and Manolopou- Table 2 los 2010), (Naderi and Rumpler 2007). According to their Person Similarity score types, similarity measures in networks can be classified as: in [−2, 2] User 1 1 • Structural similarity (link-based). In this type of simi- User 2 0 larity, the links between the nodes in the graph are exam- User 3 2 ined; the links can represent: co-authorship, friendship, User 4 1 payment, etc. It has been shown that when compared with User 5 2 respect to the human judgment they are better than text User 6 0.8 similarities (content)(Li et al. 2010). An example of struc- Average 1.13 tural similarity, which takes into account the neighbors of the pair of vertices under consideration is defined in (Le- icht, Holme, and Newman 2006). Table 4 shows a list of structural similarity measures. and Set similarities. For the Wordnet-Cosine measure, a • Content similarity (text-based). In this type of similar- node profile X is represented by the vector DX = ity, the attributes of the node in the graph are examined. [Dx1,...,Dxn], where Dxi denotes the distance, in the hi- Content similarity of a friendship website could possibly erarchy of concepts, between the ith word in the user profile be based on birth date, hobbies, movies interest, and age. X and the top concept entity, obtained by using Wordnet. One way to capture content is by the use of user-defined The Wordnet-Cosine similarity is then defined as shown in tags (e.g., tags were considered to represent the content equation (4) of a movie of interest to the user while building a group SimW (X,Y ) = cos(DX ,DY ), (4) profile). Based on tag similarity, a recommendation algo- rithm can be developed (Pera and Ng 2013). For the WFV similarity measure, a node profile X is rep- • Keyword similarity (word-based). Like for tag similar- resented by the vector FX = [Fx1,...,Fxn], where Fxi ity, node similarity may be defined based on the similar- denotes the denotes the frequency of the ith word in the ity between node representing collections of words: key- dataset. The Word Frequency Vector similarity is then de- words. An example of keyword similarity is the forest fined as shown in equation (5) model described in (Bhattacharyya, Garg, and Wu 2011), SimWFV (X,Y ) = cos(VX ,VY ), (5) where the keywords were arranged in a hierarchical struc- ture to form trees of different heights. Wordnet was then The Symantic Category similarity measure is defined as used to find the semantic relationship between the key- shown in equation (6). words. Tables 2 and 3 show an example of two Facebook profiles SimSC (X,Y ) = cos(SCX ,SCY ), (6) and their similarities as evaluated by a group of six users. In where table 1 for each profile ID, the movies interest, and the list of friends, are included. This data is a Facebook snaphot, where SCX = [fA(X)|A ∈ {NN,NNS,NNP,NNPS}], the friend IDs are synthetic data. The scores in table 3 range and fA(X) denotes the frequency of A in X. from [−2, 2] with negative score indicating dissimilarity and Finally, the Set similarity is defined as shown in equation positive scores indicating similarity. (7).

|SX ∩ SY | SimS(X,Y ) = , (7) Node similarity |SX ∪ SY |

The similarity measures compared in (12 ) are Wordnet- where SX = {Sxi|i = 1, . . . , n} is the set of parents for the Cosine, Word Frequency Vector, Symantic Categories, ith word in the user profile X obtained by using Wordnet. similarity measures, some of which are listed below, aim to Table 4: Node and Link similarities evaluate the similarity between two nodes in the context of Node Similarity the whole network. Wordnet Cosine Set Semantic Word Frequency SimRank is a general approach for finding similarity be- Vector tween objects, based on structural features (Jeh and Widom 0.862795963 0.0659340066 0.877526909 0.74900588 2002). Two objects are considered to be similar if they are Link Similarity related to similar objects. The authors state that performance Slaton Jaccard Hub Promoted Index Hub Depressed Index was out of scope in their experiment. SimF usion (Xi et 0.423 0.285 0.5 0.4 al. 2005) finds the similarity between two objects by con- sidering evidence from multiple sources (data ). One of the differences between SimRank and SimF usion is Edge similarity that SimF usion uses two random walker models while Several structural similarity measures, based on edges are SimRank uses a random Surfer-Pairs model (Xi et al. shown in equations (8) - (11), where Γ(X) denotes the set 2005). A non-iterative version of SimRank, was shown to of neighbors of X, and KX is the degree of node X: have improved performance (Li et al. 2010) . P −Rank (Zhao, Han, and Sun 2009) extends SimRank |Γ(X) ∩ Γ(Y )| by taking into consideration the in-links and out-links re- SimSalton(X,Y ) = √ (8) KX × KY lationship when calculating the similarity. According to P − Rank, which expands the definition of SimRank, |Γ(X) ∩ Γ(Y )| SimJaccard(X,Y ) = (9) “two entities a and b are similar, if they are referenced by |Γ(X) ∪ Γ(Y )| similar entities”and “if they also reference similar entities”. |Γ(X) ∩ Γ(Y )| E − rank of two nodes, measures probability of two ran- SimHPI (X,Y ) = (10) dom walkers each starting from one of the nodes considered, min{K ,K } X Y along paths of possibly unequal length (SimRank) (Zhang |Γ(X) ∩ Γ(Y )| et al. 2012). SimHDI (X,Y ) = (11) max{KX ,KY } As already mentioned in the previous section, when the objects under consideration are represented in a hierarchical Table 4 shows the similarities between the two Face- manner, set intersectional similarity measures cannot cap- book profiles (shown in Table 1). The top portion of the ture this aspect. It can result in 0 similarity value between table, shows the node similarities computed according to the objects of different heights in the hierarchy of concepts equations (4)-(7), while the bottom portion shows the edge- even though they might actually be similar. similarities computed according to equations (8)-(11). It can Other types of similarity measures are vector space meth- be seen from table 4 that, with the exception of set simi- ods which include cosine similarity and Pearson Correlation larity, node/profile semantic similarity measures exceed the Coefficient. A user study was conducted to evaluate these measures based on links. The maximum node similarity is similarity measures and it was found that the similarity mea- attained by Wordnet Cosine, which exceeds 0.86, while the sure introduced by the authors gives results that are very lowest similarities are attained by set similarity and Jaccard close to human judgment. A performance-based comparison similarity respectively. Note that this is (or it should not be) of six structural collaborative measures of similarity with surprising, for the Wordnet Cosine captures the similarity Cosine Index and Pearson Correlation Coefficient is detailed of meanings based on the Wordnet hierarchy. The Jaccard in (Zhang et al. 2010). The results on two datasets: Movie- similarity is an index of intersection of the set neighbors, Lens and Netflix indicates that Salton Index, Jaccard Index, without any semantic analysis of their meaning. The spirit and Sorensen Index always have good performance. Cosine of set similarity is actually quite close to that of the Jaccard similarity produces good results as well. However, its com- similarity, as it provides the index of intersection of node putational complexity is very high to be applied to very large parents (which are, of course among the neighbors) of the data. nodes being compared. In a browsing application, setting A simple group-based similarity measure, GroupRem, a threshold, α, on the similarity of items returned given a defined on movie tags and popularity was defined in (Pera query, if α ≥ 0.5 three of the node similarity measures will and Ng 2013). When compared with three most popu- output the two profiles as similar. And in fact, the same re- lar collaborative filtering techniques, GroupRem outper- sult would hold when α = 0.74. By contrast, with α = 0.5 formed them with respect to the Discounted Cumulative only one of the link similarity measures would output them Gain (Croft, Metzler, and Strohman 2010). as similar. A comparative study of similarity measures between bi- nary vectors, which the authors call binary similarity mea- sures, is described in (Choi, Cha, and Tappert 2010), where Global Structural Similarities both negative and positive matches have been studied. Sev- Structural similarity can be classified according to three enty six binary similarity measures are clustered (using hier- perspectives: (i) local vs. global, (ii) parameter-free vs. archical clustering) and evaluated according to the relation- parameter-dependent, and (iii) node-dependent vs. path- ships between them. dependent (Lu¨ and Zhou 2011). In general, global structural Inspired by P ageRank, P ageSim (Lin, King, and Lyu Table 5: Comparison between similarity measures Table 6: Comparison between similarity measures Similarity Measure Time Space Similarity Mea- Compared with Dataset used SimRank O(Kn2d2) O(n2) sure Improved SimRank O(k4n2)(k ≤ n) k2 × n2 SimRank Co-citation Research Index1 PageSim O(C2), C = kr O(Cn), C = kr (Small 1973) 2 E-Rank O(n3), but more ex- future work Improved Sim- SimRank DBLP ; Image data (query- tensive evaluation to Rank ing Google Image Search); 3 be considered in future Wikipedia 4 work PageSim SimRank; Co- crawled Webpages SimFusion O(Kn2d), where d is O(n2), where sine TFIDF as a the number of iterations n is the total ground truth 5 number of ob- E-Rank Enriches Enron Email dataset , Cita- 6 7 jects P-Rank by tion Network , DBLP P-Rank O(Kn2d2) O(n2) considering FriendTNS 0.012sec, forN=1000, both in- and k=10 out-links SimFusion SimRank Search click through log (detailed de- scription) and 2006) is a method for finding similar web pages in domains tf × idf such as search engines or web document classifications, and P-Rank Extends Sim- Synthetic2 it was evaluated against Cosine TF/IDF. Rank The Facebook “People you may know”friends recom- Vertex similar- Cosine sim- AddHealth data: study as mender uses friends of friends, paths of length two, as a sim- ity 2 ilarity and part of the National Longi- ilarity measure and global graph properties (as local graph SimRank tudinal Study of Adolescent information) are used to recommend friends. Precision and Health recall were used to measure the performance of friend FriendTNS RWR, Short- Facebook, Hi5, Epinion recommendation (Symeonidis, Tiakas, and Manolopou- est Path, los 2010), (Papadimitriou, Symeonidis, and Manolopoulos Adamic/Adar, 2012). FOAF 1http://www.researchindex.com Transcript of 1050 stu- Conclusion dents at Stanford University; 2http://kdl.cs.umass.edu/data/dblp/dblp-info.html; Similarity measures play an important role in information 3 porcessing. When used in conjunction with social networks http://www.wikipedia.org/ 4http://www.cse.cuhk.edu.hk (or more generally, complex networks) two main issues 5 arise, structural, that is, the link pattern of the network, http://www.cs.cmu.edu/enron/ 6http://snap.stanford.edu/data/ and semantic, that is, the meaning of nodes (the informa- 7 tion stored in them). These issues led researchers to de- http://www.informatik.uni-trier.de/ley/db/ velop structural, semantic, and hybrid structural and seman- tic measures of similarity for such networks. This brief sur- vey illustrates the variety of similarity measures developed Bahl, L. R.; Jelinek, F.; and Mercer, R. 1983. A maximum for social networks and highlights the difficulty of selecting likelihood approach to continuous speech recognition. Pat- a similarity measure for problems such as link prediction tern Analysis and Machine Intelligence, IEEE Transactions on (2):179–190. or community detection. Tables 5 and 6, list the differences between several similarity measures. Table 5 compares the Bhattacharyya, P.; Garg, A.; and Wu, S. F. 2011. Analysis of user keyword similarity in online social networks. Social selected similarity measures based on time and space com- network analysis and mining 1(3):143–158. plexities, while Table 6 compares the selected similarity Burkhard, H.-D., and Richter, M. M. 2001. On the notion of measures with respect to whom they compared their work similarity in case based reasoning and fuzzy theory. In Soft with, dataset used, and performance. For a comparison be- computing in case based reasoning. Springer. 29–45. tween similarities from other perspectives the reader is re- Choi, S.-S.; Cha, S.-H.; and Tappert, C. C. 2010. A survey ferred to (Lin 1998) and (Choi, Cha, and Tappert 2010) and of binary similarity and distance measures. Journal of Sys- references therein. temics, Cybernetics and Informatics 8(1):43–48. Croft, W. B.; Metzler, D.; and Strohman, T. 2010. Search References engines: Information retrieval in practice. Addison-Wesley Reading. D´ıaz, I., and Ralescu, A. 2012. Privacy issues in social net- Adamic, L. A., and Adar, E. 2003. Friends and neighbors on works: a brief survey. In Advances in Computational Intelli- the web. Social networks 25(3):211–230. gence. Springer. 509–518. Elavarasi, S Anitha, A. J., and Menaga, K. 2014. A survey SITIS’07. Third International IEEE Conference on, 239–245. on semantic similarity measure. International Journal of Re- IEEE. search in Advent Technology 2(4):389–398. Pan, Y.; Li, D.-H.; Liu, J.-G.; and Liang, J.-Z. 2010. Detect- Facebook. 2010. Facebook: 10 years of social networking, in ing community structure in complex networks via node simi- numbers. [Online; accessed 19-July-2014]. larity. Physica A: Statistical Mechanics and its Applications Findler, N. V., and Van Leeuwen, J. 1979. A family of sim- 389(14):2849–2857. ilarity measures between two strings. Pattern Analysis and Papadimitriou, A.; Symeonidis, P.; and Manolopoulos, Y. Machine Intelligence, IEEE Transactions on (1):116–118. 2012. Fast and accurate link prediction in social networking Ganesan, P.; Garcia-Molina, H.; and Widom, J. 2003. Ex- systems. Journal of Systems and Software 85(9):2119–2132. ploiting hierarchical domain structure to compute similarity. Pedersen, T.; Patwardhan, S.; and Michelizzi, J. 2004. Word- ACM Transactions on Information Systems (TOIS) 21(1):64– net:: Similarity: measuring the relatedness of concepts. In 93. Demonstration Papers at HLT-NAACL 2004, 38–41. Associ- Heidemann, J.; Klier, M.; and Probst, F. 2012. Online so- ation for Computational Linguistics. cial networks: A survey of a global phenomenon. Computer Pera, M. S., and Ng, Y.-K. 2013. A group recommender for Networks 56(18):3866–3878. movies based on content similarity and popularity. Informa- Huang, X., and Lai, W. 2006. Clustering graphs for visual- tion Processing & Management 49(3):673–687. ization via node similarities. Journal of Visual Languages & Pirro,´ G. 2009. A semantic similarity metric combining fea- Computing 17(3):225–253. tures and intrinsic information content. Data & Knowledge Ilakiya, P.; Sumathi, M.; and Karthik, S. 2012. A survey Engineering 68(11):1289–1308. on semantic similarity between words in semantic web. In Rawashdeh, A.; Rawashdeh, M.; D´ıaz, I.; and Ralescu, A. Radar, Communication and Computing (ICRCC), 2012 In- 2014. Measures of semantic similarity of nodes in a social ternational Conference on, 213–216. IEEE. network. In Information Processing and Management of Un- Jaccard, P. 1912. The distribution of the flora in the alpine certainty in Knowledge-Based Systems, 76–85. Springer. zone. 1. New phytologist 11(2):37–50. Small, H. 1973. Co-citation in the scientific literature: A new Jeh, G., and Widom, J. 2002. Simrank: a measure of measure of the relationship between two documents. Journal structural-context similarity. In Proceedings of the eighth of the American Society for information Science 24(4):265– ACM SIGKDD international conference on Knowledge dis- 269. covery and data mining, 538–543. ACM. Symeonidis, P.; Tiakas, E.; and Manolopoulos, Y. 2010. Tran- Jiang, J. J., and Conrath, D. W. 1997. Semantic similar- sitive node similarity for link prediction in social networks ity based on corpus statistics and lexical taxonomy. arXiv with positive and negative links. In Proceedings of the fourth preprint cmp-lg/9709008. ACM conference on Recommender systems, 183–190. ACM. Leicht, E.; Holme, P.; and Newman, M. E. 2006. Vertex Xi, W.; Fox, E. A.; Fan, W.; Zhang, B.; Chen, Z.; Yan, J.; similarity in networks. Physical Review E 73(2):026120. and Zhuang, D. 2005. Simfusion: measuring similarity using Li, Y.; Bandar, Z. A.; and McLean, D. 2003. An approach unified relationship matrix. In Proceedings of the 28th an- for measuring semantic similarity between words using mul- nual international ACM SIGIR conference on Research and tiple information sources. Knowledge and Data Engineering, development in information retrieval, 130–137. ACM. IEEE Transactions on 15(4):871–882. Xiao, J.-T. 2012. An efficient web document clustering al- Li, C.; Han, J.; He, G.; Jin, X.; Sun, Y.; Yu, Y.; and Wu, T. gorithm for building dynamic similarity profile in similarity- 2010. Fast computation of simrank for static and dynamic aware web caching. In Machine Learning and Cybernet- information networks. In Proceedings of the 13th Interna- ics (ICMLC), 2012 International Conference on, volume 4, tional Conference on Extending Database Technology, 465– 1268–1273. IEEE. 476. ACM. Yang, X.; Tian, Z.; Cui, H.; and Zhang, Z. 2012. Link pre- Li, C. H.; Yang, J. C.; and Park, S. C. 2012. Text catego- diction on evolving network using tensor-based node similar- rization algorithms using semantic approaches, corpus-based ity. In Cloud Computing and Intelligent Systems (CCIS), 2012 thesaurus and wordnet. Expert Systems with Applications IEEE 2nd International Conference on, volume 1, 154–158. 39(1):765–772. IEEE. Lin, Z.; King, I.; and Lyu, M. R. 2006. Pagesim: A novel Yang, X.; Guo, Y.; Liu, Y.; and Steck, H. 2014. A survey link-based similarity measure for the world wide web. In Pro- of collaborative filtering based social recommender systems. ceedings of the 2006 IEEE/WIC/ACM International Confer- Computer Communications 41:1–10. ence on Web Intelligence, 687–693. IEEE Computer Society. Zhang, Q.-M.; Shang, M.-S.; Zeng, W.; Chen, Y.; and Lu,¨ L. Lin, D. 1998. An information-theoretic definition of similar- 2010. Empirical comparison of local structural similarity in- ity. In ICML, volume 98, 296–304. dices for collaborative-filtering-based recommender systems. Lu,¨ L., and Zhou, T. 2011. Link prediction in complex net- Physics Procedia 3(5):1887–1896. works: A survey. Physica A: Statistical Mechanics and its Zhang, M.; He, Z.; Hu, H.; and Wang, W. 2012. E-rank: Applications 390(6):1150–1170. A structural-based similarity measure in social networks. Mabotuwana, T.; Lee, M. C.; and Cohen-Solal, E. V. 2013. In Web Intelligence and Intelligent Agent Technology (WI- An ontology-based similarity measure for biomedical data– IAT), 2012 IEEE/WIC/ACM International Conferences on, application to radiology reports. Journal of biomedical infor- volume 1, 415–422. IEEE. matics 46(5):857–868. Zhao, P.; Han, J.; and Sun, Y. 2009. P-rank: a comprehensive Naderi, H., and Rumpler, B. 2007. Three user profile sim- structural similarity measure over information networks. In ilarity calculation (upsc) methods and their evaluation. In Proceedings of the 18th ACM conference on Information and Signal-Image Technologies and Internet-Based System, 2007. knowledge management, 553–562. ACM. Zhou, Y.; Cheng, H.; and Yu, J. X. 2009. Graph clustering VLDB Endowment 2(1):718–729. based on structural/attribute similarities. Proceedings of the