Introduction to Social Network and Link Analysis
David Loshin
Knowledge Integrity, Inc.
TDWI 2007 Spring Conference, Boston, MA

© 2006 Knowledge Integrity Incorporated www.knowledge-integrity.com (301) 754-6350

Half-Day Agenda
• Introduction to Networks, social and otherwise
• Network Connectivity Basics
• Link and Network Analysis
• Issues and Considerations for BI

In this talk, we will discuss the notion of connectivity and why models for analyzing connections can add value to a business intelligence initiative. By reviewing the ways that objects interact through networks, we will explore whether the results of this analysis can enhance profiles, predictive analytics, and general business intelligence.

Objectives:
• Understand network connectivity basics
• Explore ways to represent networks
• Understand the types of analysis that can be performed
• Envision network analytics, data extraction, and preparation

Introduction to Networks, Social and Otherwise

Networks, Links, and Coincidences?
• How many people do you know?
  • Family, friends, co-workers, conference attendees
  • Dozens? Hundreds? Thousands?
• How well do you know them?
  • Very close, know them well, acquaintances, “just met”
• Concepts of Connectivity?
  • You know 1,000 people
  • They each know 1,000 people
  • Therefore, you are potentially connected to 1,000,000 people through just one intermediary
  • Through two intermediaries, the network could potentially extend to 1,000,000,000 people!
• “Small World Theory”: we are all connected through a very small number of links (See Milgram, Bacon)

The notion of connectivity is intriguing, especially when considering individuals, other types of parties, and the knowledge that can be derived through the analysis of connections.
For example, consider the scenario in this slide. Let’s assume that we each know about 1,000 people. But is it really true that each individual is therefore linked to 1,000,000 people? Conceptually, that would be true only if the 1,000 people I know were completely different from the 1,000 people that you know. In reality, we all tend to move in similar circles, so it is very likely that many of the people I know are the same people that you know. The consequence is the effective “self-organization” of communities (as well as sub-communities and sub-sub-communities). By examining the relationships that exist among groups of people, we can learn who the influencers are, who is influenced, who spans critical communication boundaries, and how information (or commerce, or viruses, etc.) flows through the selected community.

Euler’s Insight
Bridges of Konigsberg

One pastime of the residents of Konigsberg was to walk around town over the bridges between the different land areas of town. One game was to see whether one could start at one location, walk over every bridge exactly once, and end up at the starting point. Mathematician Leonhard Euler abstracted the problem into a “graph”: a collection of nodes and links between them. By examining the graph, he was able to determine from the degree of each node (the number of links that meet at it) that the challenge of the bridges of Konigsberg was actually impossible. A walk that crosses every link once and returns to its start must leave each node as often as it enters it, so every node needs an even degree, and every land area in Konigsberg was touched by an odd number of bridges. This insight created the branch of mathematics referred to as graph theory, which is the fundamental basis of network (and consequently, social network) analysis.
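The degree argument in the notes above can be sketched in a few lines of Python. This is only an illustration: the land-area labels A through D and the function names are assumptions, and the bridge list follows the commonly cited seven-bridge layout, in which one land area is touched by five bridges and the other three by three each.

```python
from collections import Counter

# Land areas are labeled A-D (hypothetical labels); each pair is one bridge.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def node_degrees(edges):
    """Count how many link endpoints touch each node."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def is_connected(edges):
    """Check that every node that has a link is reachable from any other."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbors)


def has_closed_walk_over_every_link(edges):
    """True when a walk can cross every link once and return to its start."""
    degrees = node_degrees(edges)
    return is_connected(edges) and all(d % 2 == 0 for d in degrees.values())


print(dict(node_degrees(BRIDGES)))
print(has_closed_walk_over_every_link(BRIDGES))
```

Because every land area has an odd degree, the check fails for the bridges, while a simple triangle of three links passes it.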
Network and Link Analysis
• Linkages exist everywhere
  • Between individuals (“MCI Friends and Family”)
  • Between locations (“Bridges of Konigsberg”)
  • Between other types of objects (“Telephone network”)
  • Between individuals and other kinds of objects (“Purchasing Preferences”)
  • Between businesses (“D&B corporate hierarchies”)
• There are different kinds of links
• Each link has some sort of attribution
• Analyzing networks can provide insight for evaluating behavior patterns for different intelligence activities

There are many applications that rely on the power of the network. Each of these networks represents some attempt to exploit the different kinds of connections that exist among small groups of individuals and larger groups of individuals, as well as the ways the groups themselves interact. Applications may be designed to seek out some interesting pattern within the network or to exploit the communication and information exchanges provided by the network. Every node and each of its corresponding links carries certain characteristics. Each node represents an entity, while each link carries attributes that describe the nature of the relationship.
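One possible way to hold nodes and attributed links in memory is sketched below. The class names, field names, and sample entities (Alice, Bob, Acme Corp) are all hypothetical and chosen only to illustrate the idea that a node stands for an entity while a link carries the attributes of a relationship.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """An entity in the network (a person, place, business, and so on)."""
    name: str
    kind: str


@dataclass
class Link:
    """A connection between two nodes, with attributes describing it."""
    source: Node
    target: Node
    relation: str
    attributes: dict = field(default_factory=dict)


class Network:
    """A small container for nodes and the links between them."""

    def __init__(self):
        self.nodes = set()
        self.links = []

    def add_link(self, source, target, relation, **attributes):
        """Register both endpoints and record the attributed link."""
        self.nodes.update((source, target))
        link = Link(source, target, relation, attributes)
        self.links.append(link)
        return link

    def links_of(self, node):
        """Return every link that touches the given node."""
        return [l for l in self.links if node in (l.source, l.target)]


# Hypothetical sample data.
alice = Node("Alice", "person")
bob = Node("Bob", "person")
acme = Node("Acme Corp", "business")

net = Network()
net.add_link(alice, bob, "calls", calls_per_month=12)
net.add_link(alice, acme, "purchases from", strength="occasional")

for link in net.links_of(alice):
    print(link.source.name, link.relation, link.target.name, link.attributes)
```

Keeping the relation type and its attributes on the link, rather than on the node, lets the same pair of entities be connected by several different kinds of relationship.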
Applications of Network/Link Analysis
• Enforcement: Criminal analysis, money laundering
• Fraud detection: Spambot detection, call pattern analysis
• Marketing: Customer behavior analysis, segmentation, collaborative filtering
• Community analyses: Account proxy (account used by more than one individual, many accounts used by one individual), research collaboration, communities of interest

More Applications…
• Health care: Contagion, disease control
• Physical: Supply chain analysis
• Transfer/Communications: Spheres of influence, information flows, business partnerships
• Formal Relationships: Working relationships, influential individuals, ownership, accountability, corporate structure
• Informal Relationships: Friendship networks, extended families, social interactions, insider networks

Evidence is All Around…
• Databases: Transaction systems, logs, data warehouses
• Semi-structured data: Email, web pages, public records, filings
• Unstructured data: News items, prospectuses, filings

Typical business intelligence applications focus on the ability to organize information for reporting and analysis across one or more dimensions, but are not usually configured to enable network analysis. Yet data warehouses contain significant amounts of connectivity information that is suitable for network and link analysis. Other sources of information also provide connectivity data: transaction systems, database logs, and software activity logs, as well as less structured sources such as emails, web logs, electronic public data filings, and other public records (e.g., real estate transactions, Uniform Commercial Code filings). In addition, text analysis applications can extract individual data out of unstructured data to establish connections.
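As a rough illustration of how connectivity can be pulled out of ordinary log or transaction rows, the sketch below links two accounts whenever they appear with the same device, which is one way to approach the "account proxy" question. The row layout, the account and device identifiers, and the function name are all assumptions made for this example, not a description of any particular system.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log rows: (account_id, device_id). Names are illustrative only.
LOG_ROWS = [
    ("acct-1", "dev-A"),
    ("acct-2", "dev-A"),
    ("acct-3", "dev-B"),
    ("acct-2", "dev-C"),
    ("acct-4", "dev-C"),
    ("acct-1", "dev-A"),  # repeated activity adds no new link
]


def shared_attribute_links(rows):
    """Link two accounts whenever both appear with the same device."""
    accounts_by_device = defaultdict(set)
    for account, device in rows:
        accounts_by_device[device].add(account)

    links = defaultdict(set)  # (account, account) -> devices they share
    for device, accounts in accounts_by_device.items():
        for a, b in combinations(sorted(accounts), 2):
            links[(a, b)].add(device)
    return dict(links)


links = shared_attribute_links(LOG_ROWS)
for (a, b), devices in sorted(links.items()):
    print(a, "<->", b, "via", sorted(devices))
```

The set of shared devices kept on each link is an example of link attribution: it records why two accounts are considered connected.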
Example: Death Notice

Richard J. Palaima, Of Mattapan, Suddenly, Nov. 18, 2002. Beloved husband of Jonadee (Badayos) Palaima. Devoted son of Madelyn L. (George) Palaima of Braintree, and the late Richard A. Palaima. Devoted brother of John A. Palaima of Braintree, nephew of Catherine Cunningham of Rockland, Cousin & Godson of Robert Cunningham of Rockland. Funeral from the Mortimer N. Peck-Russell Peck Funeral Home, 516 Washington St., BRAINTREE, on Saturday at 9 a.m. Funeral Mass in St. Gregory's Church, Dorchester, at 10 a.m. Relatives and friends invited. Visiting hours Friday 2-4 & 7-9 p.m. Memorial donations may be sent to St. Gregory's Church, 2215 Dorchester Ave., Dorchester 02124.

This example was taken from the Boston Globe and was available online from 11/20/2002 to 11/21/2002. Death, birth, engagement, and wedding notices are good examples of publicly available information (published in the newspaper) in semi-structured form that provides a lot of data about connections. In this example, we have a description of one individual and his immediate family, his location, and his religious affiliation.

Example: Extracted Entities and Their Links
[Figure: network diagram of the entities extracted from the death notice and the links between them.]
Entities shown: Richard J. Palaima; Jonadee (Badayos) Palaima; Madelyn L. (George) Palaima; Richard A. Palaima; John A. Palaima; Catherine Cunningham; Robert Cunningham; Mattapan, MA; Braintree, MA; Rockland, MA; Dorchester, MA; St. Gregory's Church; Deceased
Link labels shown: lives in; married to; father of; mother of; brother of; sister of; is cousin of; is godson/godfather of; is religiously affiliated with; located in; has living state of

Good SNA Resource
Many examples taken from Robert A. Hanneman and Mark Riddle’s online text, Introduction to social network methods
http://faculty.ucr.edu/~hanneman/nettext/

Integrating Network/Link Analysis with BI
• Network information may be embedded within the data warehouse
• However:
  • Representations may not be appropriate for analysis
  • Data may need to be transformed and managed using non-relational data structures
  • Analysis lends itself to visual representation
  • Must understand concepts associated with networks, connectivity, and qualification of linkage
• Objective: Gain a conceptual understanding of
  • Network data and its representation
  • Characteristics of network relationships
  • Types of analysis performed
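To make the point about non-relational data structures concrete, the sketch below takes relationships stated in the death notice example, written as simple (subject, relation, object) rows, and turns them into an adjacency structure that can be walked. The row layout, the "(inverse)" suffix, and the function names are assumptions for illustration, and "lives in" is used loosely for the "of <town>" phrasing in the notice.

```python
from collections import defaultdict, deque

# Relationships stated in the death notice, as (subject, relation, object) rows.
ROWS = [
    ("Richard J. Palaima", "husband of", "Jonadee (Badayos) Palaima"),
    ("Richard J. Palaima", "son of", "Madelyn L. (George) Palaima"),
    ("Richard J. Palaima", "son of", "Richard A. Palaima"),
    ("Richard J. Palaima", "brother of", "John A. Palaima"),
    ("Richard J. Palaima", "nephew of", "Catherine Cunningham"),
    ("Richard J. Palaima", "cousin of", "Robert Cunningham"),
    ("Richard J. Palaima", "lives in", "Mattapan"),
    ("Madelyn L. (George) Palaima", "lives in", "Braintree"),
    ("John A. Palaima", "lives in", "Braintree"),
    ("Catherine Cunningham", "lives in", "Rockland"),
    ("Robert Cunningham", "lives in", "Rockland"),
]


def build_adjacency(rows):
    """Turn flat rows into a node -> [(relation, neighbor)] mapping."""
    adjacency = defaultdict(list)
    for subject, relation, obj in rows:
        adjacency[subject].append((relation, obj))
        # Store the reverse direction so links can be followed both ways.
        adjacency[obj].append((relation + " (inverse)", subject))
    return adjacency


def link_distance(adjacency, start, goal):
    """Fewest links between two entities, or None if they are unconnected."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for _, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


adjacency = build_adjacency(ROWS)
print(link_distance(adjacency, "Jonadee (Badayos) Palaima", "Rockland"))
```

A question such as "how many links separate two entities" is awkward to express against flat rows but becomes a short breadth-first search once the data is held as an adjacency structure.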