An Overview of Social Network Analysis Marcia´ Oliveira and Joao˜ Gama∗

Overview An overview of social network analysis Marcia´ Oliveira and Joao˜ Gama∗ Data mining is being increasingly applied to social networks. Two relevant rea- sons are the growing availability of large volumes of relational data, boosted by the proliferation of social media web sites, and the intuition that an individual’s connections can yield richer information than his/her isolate attributes. This syn- ergistic combination can show to be germane to a variety of applications such as churn prediction, fraud detection and marketing campaigns. This paper attempts to provide a general and succinct overview of the essentials of social network analysis for those interested in taking a first look at this area and oriented to use data mining in social networks. C 2012 Wiley Periodicals, Inc. How to cite this article: WIREs Data Mining Knowl Discov 2012, 2: 99–115 doi: 10.1002/widm.1048 INTRODUCTION bers of a group were depicted using the so-called so- he world is a complex system of interconnected ciograms, which can be defined as charts where in- T parts. Each part itself constitutes a smaller sys- dividuals are represented as nodes and the relations tem whose networked structure can be, most of the among them are represented by lines. Such diagrams times, analyzed through the lens of social network revealed to be very useful in uncovering the hidden analysis (SNA). structures of groups, by means of the identification SNA is an interdisciplinary methodology re- of, for instance, stars, alliances, and subgroups. search area with contributions from Sociology, So- In a broader sense, a social network is con- cial Psychology, Anthropology, Physics, Mathemat- structed from relational data and can be defined as ics, Computer Science, among others, being a rich a set of social entities, such as people, groups, and scientific field that has significantly benefited from organizations, with some pattern of relationships or the collaborative efforts of researchers from different interactions between them. These networks are usu- scientific areas. Because networks were studied inde- ally modeled by graphs, where vertices represent the pendently by distinct disciplines, for a considerable social entities and edges represent the ties established amount of time, each one developed its own jargon. between them. The underlying structure of such net- To avoid ambiguity and clarify the adopted language, works is the object of study of SNA. SNA methods in Table 1 we present the network terminology used and techniques were thus designed to discover pat- in different fields. Throughout this document, we will terns of interaction between social actors in social use these terms interchangeably. networks. The origins of SNA, as a basis for developing Hence, the focus of SNA is on the relationships useful sociological concepts, can be traced back to the established between social entities rather in the social early 1930s, when Moreno1 developed the sociomet- entities themselves. In fact, the main goal of this tech- ric approach as a way to conceptualize the structure nique is to examine both the contents and patterns of the social relations established among small groups of relationships in social networks to understand the of individuals. These interpersonal ties between mem- relations among actors and the implications of these relationships. ∗ Common tasks of SNA involve the identification Correspondence to: [email protected] of the most influential, prestigious, or central actors, Faculty of Economics, University of Porto, Porto, Portugal; The Laboratory of Artificial Intelligence and Decision Support, Insti- using statistical measures; the identification of hubs tute for Systems and Computer Engineering of Porto, University of and authorities, using link analysis algorithms, and Porto, Porto, Portugal. the discovery of communities, using community de- DOI: 10.1002/widm.1048 tection techniques. These tasks are extremely useful Volume 2, March/April 2012 c 2012 John Wiley & Sons, Inc. 99 Overview wires.wiley.com/widm TABLE 1 Network Terminology for Different Fields of Knowl- In turn, information networks are based upon edge the exchange of information among entities usually aiming to enhance knowledge diffusion, business, or Mathematics Computer Science Sociology Physics Others social aims. Examples include networks of citations between academic papers, commonly represented by Vertex/vertices Node Actor/agent Site Dot an acyclic-directed graph where vertices represent pa- Edge Link/connection Relational tie Bond Arc pers and there is a direct edge if paper A cites paper B; and preference networks, which are generally mod- in the process of extracting knowledge from networks eled through bipartite graphs and represent individu- and, consequently, in the process of problem solving. als’ consumption preferences for a given commercial Because of the appealing nature of such tasks and to product13 (e.g., books). Another important example the high potential opened by this kind of analyses, of an information network is the World Wide Web, SNA has become a popular approach in a myriad of which can be represented as a directed graph, in which fields, from Biology to Business. For instance, some vertices represent static Web pages and edges corre- companies use SNA to maximize positive word of spond to the hyperlinks between them.14 mouth of their products by targeting the customers Technological networks are man-made net- with higher network value (those with higher influ- works designed for distribution of some commodity ence and support).2–4 Other companies, such as the or resource (e.g., electricity, information). Some ex- ones operating in the sector of mobile telecommu- amples are networks of roads and railways, networks nications, apply SNA techniques to the phone call of airline routes, and networks of physical connec- networks and use them to identify customer’s pro- tions between computers (Internet). files and to recommend personalized mobile phone The last type of networks are the so-called bio- tariffs, according to these profiles. These companies logical networks15 and, as the name implies, are those also use SNA for churn prediction, i.e., to detect cus- that arise from biological processes, such as networks tomers who may potentially switch to another mo- of chemical reactions among metabolites, protein in- bile operator by detecting changes in the patterns teraction networks, genetic regulatory networks, real of phone contacts.5,6 Another interesting application neural networks, and food webs or predator–prey net- emerges from the domain of fraud detection. For in- works. stance, SNA can be applied to networks of organiza- Despite the fact the origins of network stud- tional communications (e.g., Enron company dataset) ies go back a few centuries ago, in recent years we to perform an analysis of the frequency and direc- witnessed an impressive advance in network-related tion of formal/informal email communication, which fields, mainly because of the growing interest in so- can reveal communication patterns among employees cial networks, which became a ‘hot’ topic and a focus and managers. These patterns can help identify peo- of considerable attention. For this reason, a lot of ple engaged in fraudulent activities, thus promoting students, practitioners and researchers are willing to the adoption of more efficient forms of acting toward enter the field and explore, even superficially, the po- the eradication of crime.7,8 tential of SNA techniques for the study of their prob- Besides social networks, there are other types lems. Bearing this in mind, in this paper our aim is of real-world structures that can be represented by to provide a general and succinct overview of the es- networks. According to Newman,9 real-world net- sentials of SNA for those interested in knowing more works can be categorized into four main types: social about the area and strongly oriented to use SNA in networks, information networks (or knowledge net- practical problems. works), technological networks, and biological net- The remainder of this document is organized works. as follows. We begin by pointing out some types of As previously mentioned, social networks are representations that can be used to model social net- the ones that arise as a result of human and so- works. Then, we introduce the best known statistical cial interactions and encompass studies of friend- measures to analyze them, according to two levels of ship networks,10 informal communication networks analysis: the actor level and the network level. Af- within companies,11 collaboration networks12 (e.g., terward, we talk a little about the link analysis task networks of coappearance in movies by actors, in and explain how it can be used to identify influential which two actors are connected if they appeared and authoritative nodes. Then, we distinguish two together in a movie, and networks of coauthorship important network models and introduce the main among academics, in which individuals are linked if properties of real-world networks. Later, we devote they coauthored one or more papers), among others. a section to the problem of finding communities in 100 c 2012 John Wiley & Sons, Inc. Volume 2, March/April 2012 WIREs Data Mining and Knowledge Discovery Social network analysis networks. After introducing the main concepts, we ces (contains both adjacency and degree information), provide a list of the most popular SNA software and and distance matrices (identical to the adjacency ma- tools for those readers interested in applying network trices with the difference that the entries of the matrix analysis in professional or academic problems. This are the lengths of the shortest paths between pairs of overview ends with the identification of the current vertices) are appropriate to represent full matrices. trends arising in the field of SNA. Several types of graphs can be used to model different kinds of social networks. For instance, graphs can be classified according to the direction of their REPRESENTATION OF SOCIAL links. This leads us to the differentiation between NETWORKS undirected and directed graphs.

Load more