Social Networks and the Semantic Web SEMANTIC WEB and BEYOND Computing for Human Experience

Series Editors:

Ramesh Jain, University of California, Irvine, http://ngs.ics.uci.edu/
Amit Sheth, University of Georgia, http://lsdis.cs.uga.edu/~amit

As computing becomes ubiquitous and pervasive, it is increasingly becoming an extension of the human, modifying or enhancing human experience. Today’s car reacts to human perception of danger with a series of computers participating in how to handle the vehicle for human command and environmental conditions. Proliferating sensors help with observations, decision making as well as sensory modifications. The emergent semantic web will lead to machine understanding of data and help exploit heterogeneous, multi-source digital media. Emerging applications in situation monitoring and entertainment are resulting in the development of experiential environments.

SEMANTIC WEB AND BEYOND Computing for Human Experience addresses the following goals:

➢ brings together forward looking research and technology that will shape our world more intimately than ever before as computing becomes an extension of human experience;
➢ covers all aspects of computing that are very closely tied to human perception, understanding and experience;
➢ brings together computing that deals with semantics, perception and experience;
➢ serves as the platform for exchange of both practical technologies and far reaching research.

Additional information about this series can be obtained from http://www.springer.com

Additional Titles in the Series:

Ontology Alignment: Bridging the Semantic Gap by Marc Ehrig, ISBN 0-387-32805-X
Semantic Web Services: Processes and Applications edited by Jorge Cardoso, Amit P. Sheth, ISBN 0-387-30239-5
Canadian Semantic Web edited by Mamadou T. Koné, Daniel Lemire, ISBN 0-387-29815-0
Semantic Management of Middleware by Daniel Oberle, ISBN 0-387-27630-0

Social Networks and the Semantic Web

Peter Mika
Yahoo! Research Barcelona
Barcelona, Spain

Peter Mika
Yahoo! Research Barcelona
Ocata 1, 1st floor
08003 Barcelona
Spain
[email protected]

ISBN-13: 978-0-387-71000-6 e-ISBN-13: 978-0-387-71001-3

Library of Congress Control Number: 2007926707

Social Networks and the Semantic Web by Peter Mika

© 2007 Springer Science+Business Media, LLC. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.

9 8 7 6 5 4 3 2 1

springer.com

Foreword

Science is like a tree: contrary to popular belief, both trees and science grow at their edges, not at their core. For science, this means that most of the fruitful and exciting developments are not happening at the core of established fields, but are instead happening at the boundaries between such fields.

This has been particularly true for Computer Science. The most interesting database developments don’t happen inside the Database community, but rather where databases hit Biology. Similarly, the most interesting developments in Artificial Intelligence in recent years have happened where AI met the Web.

The young field of Semantic Web research has been one of the results of a number of different subfields of Computer Science being exposed to the challenges of the Web: databases, computational linguistics, knowledge representation, knowledge-based systems and service-oriented computing are just some of the subfields that are all making contributions to the Semantic Web vision, namely a Web that consists not only of links between web-pages full of pictures and text, but of a Semantic Web that consists of links between computer-interpretable data. Such a Semantic Web would make it possible to query and reason over integrated data-sets consisting of separate pieces of information that were never intended to be linked together, but that can nevertheless be fruitfully combined and integrated, sometimes even by a third party who neither wrote nor owns any of the original pieces of data (just as is possible with web-pages on the current Web).

The current book on Social Networks and the Semantic Web is a fine example to illustrate that most exciting things in science do indeed happen on the overlapping boundaries of separate fields. Peter Mika has taken the bold step of investigating the mutual benefit of these two fields in both directions.
In one direction, the Web offers a fantastic observatory for scientists interested in social behaviour, since it combines both huge numbers of participants and electronic traceability of social relations between these participants. In the other direction, Semantic Web applications can greatly benefit by knowing in which social context certain pieces of information were generated or used, helping to provide much more socially aware information systems.

Besides investigating this bi-directional fertilisation of the two fields of Social Networking studies and Semantic Web, Peter Mika has done one more double-take, namely by using the Semantic Web community itself as the subject of some of the social networking studies which are themselves aided by the use of Semantic Web technology. Besides being an amusing double-take, this choice actually has some scientific justification: because the community was well known to Peter and his fellow researchers, they could evaluate the quality of the networking results they obtained.

In this book, Peter certainly shows us that this meeting of Social Networking studies and Semantic Web research is a fruitful one. By deploying Semantic Web techniques, he manages to obtain social networking data-sets which are an order of magnitude larger than what is typical in such studies. And by using Social Networking techniques, he has developed an award winning Semantic Web use-case¹ of a kind that was not envisaged in any of the typical Semantic Web use-case catalogues, which mention search, personalisation, information-integration and service-discovery, but did not until Peter’s work mention social network analysis.

Besides its cross-disciplinary nature, the book is remarkable for another reason: the author is as much a thinker as he is a hacker (and I mean this as a compliment!). The book contains clear conceptual analyses, e.g. in the well-written introductions to both fields in the early chapters, and in the quantitative analysis of its findings in the later chapters. But it also contains fine examples of solid software design and engineering, resulting in the systems Flink and openacademia.

Of course, a foreword should not end without a cautionary remark. It’s well known that in every exploration of new fertile scientific ground, early progress is quick, and fruits are waiting to be found.
This book certainly shows that the overlapping area of Semantic Web research and Social Networking studies is such a fertile area. I can only hope that other researchers will be inspired by this work to enter into this fertile area, and to show us that still other fruits grow in that place.

Frank van Harmelen, Amsterdam, April 2007.

¹ http://www.flink.org

Preface

Whether we changed the Web or the Web has changed us is difficult to distil even when equipped with the wisdom of hindsight. While the process is a mystery, the changes are a fact. A recent large scale study on the Internet use of Americans has recorded the dramatic shift in the way that we approach the online world [BHWR06]. If we think of the Web as a giant billboard, the early days were spent with some affixing a note, but most merely passing by, carelessly surfing from one note to the next. These days we don’t just ‘surf’ anymore: the Web has become an extension of our self that allows us to reach others. We have learned to use the billboard to actively seek out others and made it a gathering place. Around the board we discuss matters, ask and give advice, share our tastes, experiences and ideas with friends and unknowns and build relationships in the process. The end result is clear from the survey: we activate our new, online ties even in those situations that we used to solve exclusively with the help of our closest ties, our intimate friends and family. The Net has changed the size and composition of our social networks: in particular, our networks have grown with an array of weak ties, a common name for those familiar faces on our blog rolls, buddy lists, chat groups, fora, mailing lists and the myriad other forums of our interactions.

Needless to say, the billboard had to change to adapt to its new function. What we now call Web 2.0 is a collective name for these evolutionary changes. First, the Web has finally become the read/write Web that its inventor originally intended it to be: the popular ‘places’ of today are created by and for the people. The wisdom of the crowd is used to build and manage large repositories of knowledge such as Wikipedia, the online encyclopedia. But ‘the crowd’ is not only editing encyclopedias, but also sharing photos and music, hunting for books and news together, designing software, writing stories and much more.
Technologically, the Web needed to adapt as well. AJAX-based development has significantly improved the user experience while interacting with websites, while RSS feeds and other technologies improved the connectivity between users and the content of the Web. The intense popularity of scripting languages hints at the way programming is democratized and turned into a form of art, rather than engineering. Lastly, the sense of (collective) ownership continues to inspire creative experimentation with web content in the form of mash-ups, web applications that combine user generated content from multiple sources.

In contrast to Web 2.0, the Semantic Web is a more conscious effort on behalf of the World Wide Web Consortium (the standards organization behind the Web) to make the Web friendlier to machines. While at the moment most of the content in the online world is only accessible to human readers, the Semantic Web would provide additional layers of Web architecture for describing content using shared vocabularies called ontologies. This would allow computers to reason with the knowledge expressed in Web resources, e.g. to aggregate relevant information from multiple sources and come to conclusions in a manner that resembles human logic. Although it is an infrastructure for machines, the knowledge that fills the Semantic Web and the rules of reasoning will in fact be provided by humans. In short, there is no semantics without humans and this makes the Semantic Web as much a social system as a technological one.

These developments are of interest to researchers in both the Social and Information Sciences, as well as to practitioners developing social-semantic software for the Web. On the one hand, the emergence of the Social Web opens up previously unforeseen opportunities for observing social behavior by tracing social interaction on the Web.
On the other hand, user generated content and metadata in social software requires a different treatment than other content and metadata. In particular, user generated knowledge comes with additional information about the social context in which it is conceived and this information (in particular, the social networks of users) is also accessible for our machines to reason with. This provides unprecedented opportunities for building socially-aware information systems. For the Semantic Web in particular this means building intelligent applications that are aware of the social embeddedness of semantics.

In this book we provide two major case studies to demonstrate each of these opportunities. The first case study shows the possibilities of tracking a research community over the Web, combining the information obtained from the Web with other data sources (publications, emails). The results are analyzed and correlated with performance measures, trying to predict what kind of social networks help researchers succeed (Chapter 8). Social network mining from the Web plays an important role in this case study for obtaining large scale, dynamic network data beyond the possibilities of survey methods. In turn semantic technology is the key to the representation and aggregation of information from multiple heterogeneous information sources (Chapters 4 and 5).

As the methods we are proposing are more generally applicable than the context of our scientometric study, most of this volume is spent on describing our methods rather than discussing the results. We summarize the possibilities for (re)using electronic data for network analysis in Chapter 3 and evaluate two methods of social network mining from the Web in a separate study described in Chapter 7. We discuss semantic technology for social network data aggregation in Chapters 4 and 5. Lastly, we describe the implementation of our methods in the award-winning Flink system in Chapter 6.
In fact these descriptions should not only allow the reader to reproduce our work, but also to apply our methods in a wide range of settings. This includes adapting our methods to other social settings and other kinds of information sources, while preserving the advantages of a fully automated analysis process based on electronic data.

Our second study highlights the role of the social context in user-generated classifications of content, in particular in the tagging systems known as folksonomies (Chapter 9). Tagging is widely applied in organizing the content in many Web 2.0 services, including the social bookmarking application del.icio.us and the photo sharing site Flickr. We consider folksonomies as lightweight semantic structures where the semantics of tags emerges over time from the way tags are applied. We study tagging systems using the concepts and methodology of network analysis. We establish that folksonomies are indeed much richer in semantics than it might seem at first and we show the dependence of semantics on the social context of application. These results are particularly relevant for the development of the Semantic Web using bottom-up, collaborative approaches. Putting the available knowledge in a social context also opens the way to more personalized applications such as social search.

As the above descriptions show, both studies are characterized by an interdisciplinary approach where we combine the concepts and methods of Artificial Intelligence with those of Social Network Analysis. However, we will not assume any particular knowledge of these fields on the part of the reader and provide the necessary introductions to both (Chapters 1 and 2). These introductions should allow access to our work for both social scientists with an interest in electronic data and for information scientists with an interest in social-semantic applications.
Our primary goal is not to teach any of these disciplines in detail but to provide an insight for both Social and Information Scientists into the concepts and methods from outside their respective fields. We show a glimpse of the benefits that this understanding could bring in addressing complex outstanding issues that are inherently interdisciplinary in nature. Our hope is then to inspire further creative experimentation toward a better understanding of both online social interaction and the nature of human knowledge. Such understanding will be indispensable in a world where the border between these once far-flung disciplines is expected to shrink rapidly through more and more socially immersive online environments such as the virtual worlds of Second Life. Only when equipped with the proper understanding will we succeed in designing systems that show true intelligence in both reasoning and social capabilities and are thus able to guide us through an ever more complex online universe.

The Author would like to acknowledge the support of the Vrije Universiteit Research School for Business Information Sciences (VUBIS) in conducting the research contained in this volume.

Contents

Part I Introduction to the Semantic Web and Social Networks

1 The Semantic Web
  1.1 Limitations of the current Web
    1.1.1 What’s wrong with the Web?
    1.1.2 Diagnosis: A lack of knowledge
  1.2 The semantic solution
  1.3 Development of the Semantic Web
    1.3.1 Research, development and standardization
    1.3.2 Technology adoption
  1.4 The emergence of the social web
    1.4.1 Web 2.0 + Semantic Web = Web 3.0?
  1.5 Discussion

2 Social Network Analysis
  2.1 What is network analysis?
  2.2 Development of Social Network Analysis
  2.3 Key concepts and measures in network analysis
    2.3.1 The global structure of networks
    2.3.2 The macro-structure of social networks
    2.3.3 Personal networks
  2.4 Discussion

Part II Web data and semantics in social network applications

3 Electronic sources for network analysis
  3.1 Electronic discussion networks
  3.2 Blogs and online communities
  3.3 Web-based networks
  3.4 Discussion

4 Knowledge Representation on the Semantic Web
  4.1 Ontologies and their role in the Semantic Web
    4.1.1 Ontology-based Knowledge Representation
    4.1.2 Ontologies and ontology languages for the Semantic Web
  4.2 Ontology languages for the Semantic Web
    4.2.1 The Resource Description Framework (RDF) and RDF Schema
    4.2.2 The Web Ontology Language (OWL)
    4.2.3 Comparison to the Unified Modelling Language (UML)
    4.2.4 Comparison to the Entity/Relationship (E/R) model and the relational model
    4.2.5 Comparison to the Extensible Markup Language (XML) and XML Schema
  4.3 Discussion: Web-based knowledge representation

5 Modelling and aggregating social network data
  5.1 State-of-the-art in network data representation
  5.2 Ontological representation of social individuals
  5.3 Ontological representation of social relationships
    5.3.1 Conceptual model
  5.4 Aggregating and reasoning with social network data
    5.4.1 Representing identity
    5.4.2 On the notion of equality
    5.4.3 Determining equality
    5.4.4 Reasoning with instance equality
    5.4.5 Evaluating smushing
  5.5 Discussion
    5.5.1 Advanced representations

6 Developing social-semantic applications
  6.1 Building Semantic Web applications with social network features
    6.1.1 The generic architecture of Semantic Web applications
    6.1.2 Sesame
    6.1.3 Elmo
    6.1.4 GraphUtil
  6.2 Flink: the social networks of the Semantic Web community
    6.2.1 The features of Flink
    6.2.2 System design
  6.3 openacademia: distributed, semantic-based publication management
    6.3.1 The features of openacademia
    6.3.2 System design
  6.4 Discussion

Part III Case studies

7 Evaluation of web-based social network extraction
  7.1 Differences between survey methods and electronic data extraction
  7.2 Context of the empirical study
  7.3 Data collection
  7.4 Preparing the data
  7.5 Optimizing goodness of fit
  7.6 Comparison across methods and networks
  7.7 Predicting the goodness of fit
  7.8 Evaluation through analysis
  7.9 Discussion

8 Semantic-based Social Network Analysis in the sciences
  8.1 Context
  8.2 Methodology
    8.2.1 Data acquisition
    8.2.2 Representation, storage and reasoning
    8.2.3 Visualization and Analysis
  8.3 Results
    8.3.1 Descriptive analysis
    8.3.2 Structural and cognitive effects on scientific performance
  8.4 Conclusions and Future Work

9 Ontologies are us: emergent semantics in folksonomy systems
  9.1 A tripartite model of ontologies
    9.1.1 Ontology enrichment
  9.2 Case studies
    9.2.1 Ontology emergence in del.icio.us
    9.2.2 Community-based ontology extraction from Web pages
  9.3 Evaluation
  9.4 Conclusions and Future Work

Part IV Conclusions

10 The perfect storm
  10.1 Looking back: the story of Katrina PeopleFinder
    10.1.1 The Semantic Web
    10.1.2 Social Networks
  10.2 Looking ahead: a Second Life

References

Index