UNIVERSITY OF SOUTHAMPTON

FACULTY OF PHYSICAL AND APPLIED SCIENCES Electronics and Computer Science

Using Linked Data in Purposive Social Networks

by

Priyanka Singh

Thesis for the degree of Doctor of Philosophy

September 2016

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF PHYSICAL AND APPLIED SCIENCES Electronics and Computer Science

Doctor of Philosophy

USING LINKED DATA IN PURPOSIVE SOCIAL NETWORKS by Priyanka Singh

The Web has provided a platform for people to collaborate using collective intelligence. Messaging boards and Q&A forums are examples where people broadcast their problems and others provide solutions. Such communities are defined in this thesis as Purposive Social Networks (PSNs). A PSN is a community where people with similar interests and varied expertise come together, use collective intelligence to solve common problems and build tools for a common purpose. Q&A forums are usually closed or semi-open, and the data are controlled by the websites. Difficulty in the search and discovery of information is one issue: people searching for answers or experts on a website can only see results from its own network, losing a whole community of experts on other websites. Another issue in Q&A forums is getting no response from the community; there is a long tail of questions that receive no answer.

The thesis introduces the Suman system, which uses Semantic Web (SW) and Linked Data technologies to address the above challenges. SW technologies are used to structure the community data so that it can be decentralised and used across platforms, and Linked Data helps to find related information about linked resources. The Suman system uses available tools to solve the named entity disambiguation problem and add semantics to the PSN data. It uses a novel combination of semantic keyword search and traditional text search to find similar, answered questions for unanswered questions, expanding the query terms with the added semantics and ranking the results using crowdsourced data. Furthermore, the Suman system also recommends experts who can answer those questions. This helps to shorten the long tail of unanswered questions in such communities.

The Suman system is designed using the Design Science methodology and evaluated by users in two experiments. Statistical analysis of the results showed that the keywords generated by the Suman system were rated higher than the original keywords from the websites, and that participants agreed with the algorithm's rating of the answers provided by the Suman system. StackOverflow and Reddit are used as examples of PSNs and to build an application as a proof of concept of the Suman system.

Contents

Declaration of Authorship

Acknowledgements

Nomenclature

1 Introduction
  1.1 Overview
  1.2 Research Questions
  1.3 Research Methodology
    1.3.1 Scientific Steps to Answer KQ1 and KQ2
      1.3.1.1 Knowledge Problem Investigation
      1.3.1.2 Research Design
      1.3.1.3 Research Design Validation
      1.3.1.4 Research Execution
      1.3.1.5 Analysis of Results
    1.3.2 Scientific Steps to Answer DP1 and DP2
      1.3.2.1 Problem Investigation
      1.3.2.2 Treatment Design
      1.3.2.3 Design Validation
      1.3.2.4 Treatment Implementation
      1.3.2.5 Implementation Evaluation
    1.3.3 Scientific Steps to Answer KQ3, KQ4 and KQ5
      1.3.3.1 Knowledge Problem Investigation
      1.3.3.2 Research Design
      1.3.3.3 Research Design Validation
      1.3.3.4 Research Execution
      1.3.3.5 Analysis of Results
  1.4 Research Contribution
  1.5 Structure of Thesis

2 Background
  2.1 Emergence of Social Web
    2.1.1 Web 2.0 and Social Media
    2.1.2 Content-specific Social Networking Services
  2.2 Collective Intelligence and Crowdsourcing
    2.2.1 Information Sharing
    2.2.2 Human Computation
    2.2.3 Quality Management and User Recommendation
  2.3 Semantic Web and Linked Data
    2.3.1 Semantic Web Technology
    2.3.2 Linked Data Technology
      2.3.2.1 Benefits of Linked Data
      2.3.2.2 Linking the Datasets
      2.3.2.3 Linked Data Cloud
    2.3.3 Semantic Web Vocabularies
      2.3.3.1 FOAF
      2.3.3.2 SIOC
    2.3.4 Social Semantic Applications
      2.3.4.1 Semantic Tagging
      2.3.4.2 Semantic blogging and microblogging
      2.3.4.3 Semantic Wiki
    2.3.5 Semantic Search and Query
      2.3.5.1 Information Retrieval
      2.3.5.2 Web Search
      2.3.5.3 Semantic Search
        2.3.5.3.1 Keyword Disambiguation
        2.3.5.3.2 Concept Mapping
        2.3.5.3.3 Semantic Queries
        2.3.5.3.4 Expert Recommendation

3 Purposive Social Network
  3.1 What is Purposive Social Network
  3.2 Different types of communities in Purposive Social Network
    3.2.1 Information Based Community
    3.2.2 Interest Based Community
    3.2.3 Expert Based Community
    3.2.4 Location Based Community
  3.3 Properties of Purposive Social Network
    3.3.1 Community Size
    3.3.2 Focused Interest
    3.3.3 Direct Communication
    3.3.4 Active Participation
    3.3.5 Short Lifespan
    3.3.6 Strong Incentive
  3.4 Benefits of Purposive Social Network
    3.4.1 Information Exchange and Self-interest
    3.4.2 Symbiotic Relation and Social Exchange
    3.4.3 Social Recognition and Personal Satisfaction
    3.4.4 Recommendation System
    3.4.5 Expert Finder
  3.5 Challenges in Purposive Social Network
    3.5.1 Recruiting and Retaining Users
    3.5.2 Incentive Model
    3.5.3 Quality Control
    3.5.4 Search and Discovery of Quality Content
  3.6 Case Study of Purposive Social Network
    3.6.1 StackOverflow Analysis
      3.6.1.1 Question and Answers
      3.6.1.2 Users
      3.6.1.3 Tags
      3.6.1.4 Votes and Reputation
      3.6.1.5 Communication network structure
      3.6.1.6 Tags network
      3.6.1.7 Incentive Design
      3.6.1.8 Quality Control
      3.6.1.9 Community Moderation
    3.6.2 Reddit Analysis
      3.6.2.1 Communication network structure
      3.6.2.2 Subreddit network
      3.6.2.3 Incentive Design
      3.6.2.4 Quality Control
      3.6.2.5 Community Moderation
      3.6.2.6 Discourse Analysis

4 The Suman System
  4.1 Use of Semantic Web and Linked Data in Purposive Social Network
    4.1.1 Research problem and challenges
    4.1.2 Benefit of Using Semantic Web and Linked Data in Purposive Social Network
      4.1.2.1 Structured Data
      4.1.2.2 Linking People to People and People to Data
      4.1.2.3 Multidimensional Network and Graph
      4.1.2.4 Integrated Knowledge
      4.1.2.5 Smart Query and Search
      4.1.2.6 Social Network Analysis
    4.1.3 Research Validation
    4.1.4 Research execution
    4.1.5 Result evaluation
  4.2 The Suman System
    4.2.1 Problem Investigation
    4.2.2 Treatment Design of the Suman System
      4.2.2.1 Data Collection
      4.2.2.2 Data Structuring
      4.2.2.3 Keyword Annotation and Linking
      4.2.2.4 Database and Query
        4.2.2.4.1 Database Indexing and Configuration
      4.2.2.5 Suman Search Algorithm
        4.2.2.5.1 Detailed Explanation of Each Step
      4.2.2.6 Expert Finder
        4.2.2.6.1 Detailed Explanation of Each Step
      4.2.2.7 Design Innovation in the Suman System
        4.2.2.7.1 Special feature of the Suman algorithms
    4.2.3 Design Validation
      4.2.3.1 Validating the Suman system design
    4.2.4 Treatment Implementation using StackOverflow and Reddit Datasets
      4.2.4.1 StackOverflow Data Mining
      4.2.4.2 Reddit Data Mining
      4.2.4.3 Data Structuring using RDF
        4.2.4.3.1 Generating RDF
        4.2.4.3.2 Database
      4.2.4.4 Keyword Annotation and Linking
        4.2.4.4.1 Wikipedia Miner
        4.2.4.4.2 OpenCalais
      4.2.4.5 Information Retrieval
    4.2.5 Implementation Evaluation
      4.2.5.1 Keyword Annotation and Categories Analysis
      4.2.5.2 Answers and Experts Analysis
      4.2.5.3 Expert Recommendation Analysis
      4.2.5.4 Linked Data Graph Analysis

5 The Suman System Evaluation
  5.1 Knowledge Problem Investigation
  5.2 Research Design
    5.2.1 Knowledge Question 3 (KQ3)
    5.2.2 Knowledge Question 4 (KQ4)
    5.2.3 Knowledge Question 5 (KQ5)
  5.3 Research Design Validation
    5.3.1 Knowledge Question 3 (KQ3)
    5.3.2 Knowledge Question 4 (KQ4)
    5.3.3 Knowledge Question 5 (KQ5)
  5.4 Research Execution
    5.4.1 Calculating Sample Size
    5.4.2 Selecting Questions
    5.4.3 Questionnaire Design
      5.4.3.1 Keyword Evaluation
      5.4.3.2 Answer Evaluation
  5.5 Result Evaluation
    5.5.1 Dataset Frequency Distribution
    5.5.2 Keywords T-Test
    5.5.3 Q&A Correlation Test
    5.5.4 Summary of the results

6 Conclusions
  6.1 Summary of Research
    6.1.1 Limitation of current systems
    6.1.2 Research Questions
    6.1.3 Research Contribution
  6.2 Limitation of the Suman System
  6.3 Future Work

A Experiment Questionnaire
  A.1 Keywords Experiment
  A.2 Answers Experiment

References

List of Figures

2.1 A taxonomy for collaboration alternatives (Albors et al., 2008)
2.2 Semantic Web Layer Cake Model (Krauss, 2014)
2.3 The hash URI and 303 URI approach with content negotiation (Sauermann and Cyganiak, 2008)
2.4 An RDF Triple
2.5 Linked Data Cloud (Schmachtenberg et al., 2014)
2.6 Creating a FOAF profile and SIOC file of a user

3.1 StackOverflow: Questions and answers posted every month
3.2 StackOverflow: Histograms of questions and answers posted by day of the week
3.3 StackOverflow: User reputation histogram
3.4 StackOverflow: Tag trends per week of the 5 most popular tags
3.5 StackOverflow network structure
3.6 StackOverflow: Popular tags clustered together
3.7 StackOverflow: Related tags clustered together
3.8 StackOverflow: Time histograms of questions receiving their first answer and accepted answer
3.9 StackOverflow: Questions and answers vote count histogram
3.10 Reddit: Posts created per month
3.11 Reddit: Histogram of posts created by day of the week
3.12 Reddit network structure
3.13 Reddit users' link and comment karma histogram
3.14 Reddit: Time histogram of posts receiving their first comment
3.15 Reddit: Posts and comments vote count histogram
3.16 An example of a Reddit post (programmer564698, 2015) where people share personal stories
3.17 Example of a funny StackOverflow response (Guy, 2008)
3.18 Example of a funny StackOverflow response (Jeff, 2012)
3.19 Example of a StackOverflow question (Skeet, 2009) where people are sharing their opinion
3.20 Example of a StackOverflow question (Harvey, 2011) where people are sharing trivia
3.21 Example of a Reddit post (carmichael561, 2013) where people are sharing pop culture references

4.1 The Suman system design components
4.2 ER diagram of the StackOverflow dataset
4.3 ER diagram of the Reddit dataset
4.4 RDF schema of a user's profile
4.5 RDF schema of StackOverflow posts
4.6 RDF schema of Reddit posts
4.7 Parent-child relationships in two different types of posts in StackOverflow and Reddit
4.8 Wikipedia Miner annotation example
4.9 An example of a keywords N-Triple linked to DBpedia

5.1 Calculating keyword experiment sample size using G*Power
5.2 Calculating Q&A experiment sample size using G*Power
5.3 Calculating participants sample size using G*Power
5.4 Sample keyword experiment question
5.5 Sample Q&A experiment question
5.6 Keywords test frequency distribution diagram
5.7 Q&A test frequency distribution diagram
5.8 Keywords T-Test
5.9 Keyword T-Test showing confidence interval
5.10 Answers correlation test
5.11 Q&A data scatter plot diagram
5.12 Q&A data scatter plot diagram showing questions' difficulty

List of Tables

3.1 StackOverflow at a glance as of June 2014
3.2 StackOverflow: Ten most popular tags and their instances
3.3 StackOverflow: Percentage of unanswered questions by number of tags
3.4 StackOverflow: Percentage difference of unanswered questions in general tags vs specialized tags
3.5 StackOverflow: Percentage of answers' vote count in general tags vs specialized tags
3.6 StackOverflow: Number of users with reputations
3.7 StackOverflow: Questions and answers vote count
3.8 General overview of the Reddit dataset
3.9 Reddit: Different types of posts
3.10 Number of subscribed users for each tag and subreddit in StackOverflow and Reddit respectively
3.11 Reddit: Number of users with karma
3.12 Reddit: Posts and comments vote count

4.1 StackOverflow: Percentage of questions with answers with confidence score
4.2 StackOverflow: Percentage of experts with confidence score
4.3 Reddit: Percentage of posts with answers and their confidence score
4.4 Reddit: Percentage of experts with confidence score
4.5 Top users of top tags in StackOverflow
4.6 Top users of top disambiguated topics in the Suman system

Declaration of Authorship

I, Priyanka Singh, declare that the thesis entitled Using Linked Data in Purposive Social Networks and the work presented in the thesis are both my own, and have been generated by me as the result of my own original research. I confirm that:

• this work was done wholly or mainly while in candidature for a research degree at this University;

• where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated;

• where I have consulted the published work of others, this is always clearly attributed;

• where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work;

• I have acknowledged all main sources of help;

• where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself;

• parts of this work have been published as: (Singh and Shadbolt, 2013)

Signed:......

Date:......


Acknowledgements

This PhD has been a hard and long process, but I wouldn't be here without the four pillars of support who helped me through everything.

The first pillar consists of all my supervisors and advisers who have helped me through my academic process. Especially my supervisors Prof. Sir Nigel Shadbolt and Prof. Elena Simperl. I am sincerely grateful from the bottom of my heart for helping me reach the end of my journey. I wouldn't have progressed without the help of my other two former supervisors, Dr. Manuel Salvadores and Dr. Gianluca Correndo. You have my gratitude for helping me along the way. I also want to thank Prof. Les Carr for his advice as my examiner and Lester Gilbert for his advice on my experiment design and analysis. All of you have guided me through my journey to finish my PhD. I wouldn't be here without you.

The second pillar is my family. I wouldn’t have made it through my rough times without your unconditional love, comfort and strength. Maa, you have been my biggest ally. This thesis is dedicated to you and named after you. My sisters, Ankita and Arpita, you have been a shining beacon through the harshest storms, guiding me through rough patches. I am here because of your constant encouragement. I can’t forget you Shivam, you have always assured me and reminded me how good things will be in the end. Most of all, I want to thank my father. Papa, I am here in this University, pursuing my higher studies, all because of you. Honorary mention to all my uncles, aunts, grandparent and cousins. Yash, now you can stop asking me when I will be done with my PhD.

The third pillar is all my friends who have been my family and made this place feel like home away from home. You have shared your stories, trials and turbulence. Because of you I never felt lonely in my journey. You made me feel I can do this and there is light at the end of the tunnel. Kewalin, Asim, Moody, Sumit, Eamonn, Jamal, Will, Nawar, Huw, Rikki, Jonny, Areeb, Alaa, Devasena, thank you all. Especially, Eamonn for proofreading my thesis. I also want to thank everyone in WAIS for your constant support, you have made me feel part of the family. My other friends who I have not mentioned here, I will always remember you and your support and encouragement.

The final pillar of support is all the people I never met but who helped me out with my problems. This includes all the people in online forums who have helped me

fix my bugs, given me advice on how to tackle issues with my software, and helped with the many other things that went wrong. I was able to fix my software and make it work because of your timely advice. This list also includes all the friends I made online who lent me a patient ear when I wanted to talk, and the authors of all the books who helped me stay sane and kept me distracted when I needed to forget about my work. Your help will always be remembered and I will always be grateful for your kindness.

Finally, Thomas, Eva, Chase, Rhea and Rick. You were always there with me when I needed you. You were always there when I didn't need you. It's a huge comfort knowing you will always be there.

Nomenclature

DP      Design Problem
FOAF    Friend Of A Friend
HTTP    Hypertext Transfer Protocol
IDE     Integrated Development Environment
KQ      Knowledge Question
OBDA    Ontology Based Data Access
OWL     Web Ontology Language
PSN     Purposive Social Network
RDF     Resource Description Framework
RDFa    Resource Description Framework in Attributes
RDFS    RDF Schema
RIF     Rule Interchange Format
RQ      Research Question
SIOC    Semantically-Interlinked Online Communities
SNS     Social Networking Services
SPARQL  SPARQL Protocol and RDF Query Language
SSL     Secure Sockets Layer
WWW     World Wide Web
XML     Extensible Markup Language


Chapter 1

Introduction

1.1 Overview

This thesis discusses the concept of a Purposive Social Network (PSN): a social network with a purpose, where people come together with a common objective to gather information, build a knowledge base, solve problems or achieve some common goal. Here, people with similar interests and varied expertise may come together, use crowdsourcing techniques to solve a common problem and build tools for common purposes.

Web 2.0 has provided a distributed platform for people to collaborate to solve common problems. It has enabled the creation of Social Networking Services (SNSs) for people to communicate and share information. In these social networks people are connected to each other through explicit relationships (friendships) or through data. This facilitates the formation of PSNs. PSNs are a type of SNS, but not all SNSs are PSNs. PSNs are discussed in more detail in Chapter 3.

People form groups and communities based on similar interests and purposes (McPherson et al., 2001). In many cases this presents opportunities to leverage collective intelligence for distributed problem solving and knowledge creation. This crowdsourcing technique is different from human-based computing (Albors et al., 2008). Messaging boards, question-answer forums and wikis are examples of these types of social communities where people come together to create emergent knowledge (Stenmark, 2002). On these websites people broadcast any problems they have, and other users and experts answer and submit solutions.

Nowadays, there are many different SNSs created for different purposes, targeting different user groups. Different websites provide different functionality. For example, many people use Facebook 1 to connect with friends and Twitter 2 for microblogging.

1 http://www.facebook.com/
2 http://twitter.com/


There are also content-specialised services, for example YouTube 3 for videos and Flickr 4 for pictures. Many of these websites are centralised: users sign up, create a profile, invite their friends and create friendships. These websites can be open, meaning that all the data is visible to all users, or closed, where only logged-in users can view any information. In most cases, the websites are semi-open, i.e. registered users can see everything and non-registered users can see partial information on the website. Hence, not all the information is accessible to everyone.

The main drawback of current PSNs is that they are all closed or partially open networks and the service providers own all the users' data: users cannot take their data with them once they delete their account and leave the website, or when the website closes down. If people want to sign up to different websites to use their functions, they have to undergo the same cycle of creating a user profile and adding friends to their network again. They cannot migrate their data and social network from one website to another; all their data, activities and relationships are trapped inside a silo. In many countries, users of PSNs do not own the data they generate; the data are controlled by the websites. On some websites, if a user deletes their account they lose all their data: they cannot export their data or their interactions with friends, and cannot migrate to another website. These PSNs are centralised and have their own APIs and structure, so people cannot easily work across platforms and share information across different social networks. Fitzpatrick and Recordon referred to this whole cycle of recreating profiles and reconnecting with friends across multiple SNSs as Social Network Fatigue (Fitzpatrick and Recordon, 2007).

Difficulty in the search and discovery of information in these PSNs is another issue. A search engine can only retrieve information when explicitly asked, and it cannot return the solution to a problem if that solution does not exist on a webpage. Search engines only use the keywords in the search query to retrieve results; they might not expand the query to broader or narrower categories. Also, most search engines cannot access closed communities and websites. The search function of an individual website only returns results from its own network. People who want the advice and opinion of an expert can join a forum, messaging board or email list, but they can only find the subset of experts registered with that service, losing a whole community of experts in other networks or communities.

Another issue users face in PSNs is not getting any response from the community on their post, forcing them to go to a different PSN to find a solution to their problem. For example, StackOverflow 5 is a technology-based question answering forum with more than 3.2 million questions and 1.2 million registered users. Despite so many users, 23.7% of questions on StackOverflow do not get any answers (Singh and Shadbolt, 2013).

3 http://www.youtube.com/
4 http://www.flickr.com/
5 http://stackoverflow.com/

There is a long tail of questions that get no answer, response or votes. This is a problem in many forums and question answering websites. These websites follow the power law of the Web (Albert et al., 1999): there are some popular posts with most of the responses, votes and comments, and many posts in the long tail that have very few or no replies at all. This is a common problem in many PSNs, and it is discussed in more detail in Chapter 3.

1.2 Research Questions

The main research question this thesis focuses on is whether Linked Data and Semantic Web technologies can help to answer unanswered questions in PSNs (RQ1).

This research question (RQ1) is broad, and it can be further broken down into design problems and knowledge questions, following the design science approach of Wieringa (2014). A design problem is about designing an artifact (a system or algorithm) to solve a research problem, improve something for stakeholders or investigate the performance of the artifact, whereas a knowledge question studies an artifact in context.

The first step in answering RQ1 is to define PSNs (KQ1). This is a knowledge question, and existing systems can be studied in the context of PSNs to define their main properties and characteristics. Furthermore, other aspects of PSNs can be studied to see how PSNs are formed and how their communities sustain themselves. This leads to the study of community formation, community sustenance and the main motives and incentives for users to join PSNs to solve problems (KQ2). It is important to know how a community was formed as well as how it is maintained. These aspects of PSNs are studied in this thesis by examining StackOverflow and Reddit6, looking at how they are structured and how they motivate and incentivise users to participate.

RQ1 can be further broken down into design problems. A prototype system can be built, or a software-based solution used, to investigate whether Linked Data and Semantic Web technologies can provide solutions to unanswered questions in PSNs. This design problem is broken down into smaller problems. Firstly, it is investigated whether Semantic Web and Linked Data technologies can be useful in the search and discovery of answers in PSNs (KQ3). Secondly, a prototype system should be built that takes questions, answers, user profiles and other crowdsourced data (votes, favourites, tags, etc.) from PSNs and uses these data to provide answers to unanswered questions (DP1).

For this thesis, a prototype system called the Suman system is built to answer RQ1 and to solve the design problems. Moreover, the other problem this thesis aims to answer is whether the community and users' data from PSNs can be used to find experts on a topic and recommend them to users to answer unanswered questions (DP2). StackOverflow and Reddit are used to investigate this design problem. After creating the prototype, the Suman system needs to be evaluated to check that it meets the design goals and answers RQ1. This leads to further knowledge questions: how well the Suman system works and answers the unanswered questions in PSNs (KQ4), and how good the recommended experts are and whether they could answer the unanswered questions (KQ5).

6 http://reddit.com/

In summary, this thesis aims to investigate whether Semantic Web and Linked Data technologies improve the search and discovery of answers and experts in PSNs. The StackOverflow and Reddit datasets are also analysed to show how PSNs form and how users participate. Finally, the Suman system is created as a proof of concept that helps to answer these knowledge questions and design problems, and it is evaluated to show that it answers the research questions.

1.3 Research Methodology

The aim of this thesis is to investigate whether Semantic Web and Linked Data can be used to create a PSN where data is open, linked and has added semantics (knowledge), and whether the added links and semantics can help the search and discovery of information. This would, in theory, help solve the problems discussed above. This is done by first studying PSNs and then using the knowledge gained from that study to improve the search and discovery of information in the Suman system.

Design science research methodologies for information systems and software engineering research can be used as a guideline to perform this research and answer the research question. Wieringa (2014) breaks design science research down into two activities: design problems and knowledge questions. This thesis follows these methodologies to describe the research done to answer RQ1.

Wieringa (2014) describes a design problem as designing an artifact that improves something for stakeholders, and empirically investigating the performance of the artifact in context. The artifacts can be methods, algorithms or frameworks used in software and information systems, and their context is the design, development and use of software and information systems. The artifacts are designed for these contexts and should be investigated in them; this forms the knowledge question. Both the design problem and the empirical research can be treated as problem-solving approaches, described by the scientific steps of the engineering cycle and the empirical cycle respectively. In the first, the designed artifacts are intended to help the stakeholders; in the second, the knowledge questions about the artifacts are answered in the given contexts.

The results of both the engineering cycle and the empirical cycle are fallible: the designed system might not fully meet the goals of the stakeholders, and answers to the knowledge questions might have limited validity. So the designed system and the answers produced must be justified. This is done by validating the designed system in terms of stakeholders' goals and requirements, and by validating the inferences in the empirical cycle. The scientific steps that follow these methodologies are discussed below.

1.3.1 Scientific Steps to Answer KQ1 and KQ2

RQ1 is about finding answers for unanswered questions in PSNs. In this thesis, the term Purposive Social Network (PSN) is used to describe systems where people with similar interests come together and solve each other's problems using crowdsourcing techniques. PSNs therefore need to be defined and studied. KQ1 and KQ2 are about understanding and studying PSNs. The details of PSNs and all the scientific steps taken are described in Chapter 3. This section gives a brief summary of how KQ1 and KQ2 are answered.

1.3.1.1 Knowledge Problem Investigation

KQ1 is to define PSNs. This thesis is about PSNs, so understanding them is important; this is done in KQ2, where the community structure of PSNs, users' behaviour and motives, and the incentive models provided by PSNs are studied.

1.3.1.2 Research Design

To define PSNs, the literature in the area of social networks and community formation is critically analysed. Existing PSNs are studied and their main characteristics and properties are defined using the literature. This is a theoretical question, and it is answered using the research done in the area and existing examples of PSNs. This answers KQ1.

To study PSNs, the StackOverflow and Reddit websites are used. These popular websites are used by software developers, and their data is open. All the user data are analysed statistically to understand PSNs. The community and social network data are analysed to understand the community structure and how users are linked to each other. This answers KQ2.

1.3.1.3 Research Design Validation

There is a large body of research in the area of social networks and social network analysis, as seen in Chapters 2 and 3. The literature and current examples of PSNs provide enough support to define PSNs and understand their characteristics and properties. This answered KQ1; the answer is subjective, but the question is considered fully answered.

The statistical tests used to analyse the StackOverflow and Reddit data are standard tests. They are replicable and open for anyone to verify. The analysis is valid and justified. KQ2 is fully answered, although many other statistical tests could be performed to reveal more about PSNs.

1.3.1.4 Research Execution

The StackOverflow and Reddit data are analysed in detail in Chapter 3. Simple statistical analysis is done on user data and behaviour, and social network analysis is done on community data to find the community structure.

1.3.1.5 Analysis of Results

The main findings of the StackOverflow and Reddit data analysis can be found in Chapter 3. They show how popular some tags and subreddits are and how strong some communities are. The smaller communities are also active, but they dissipate easily.

The study of the existing literature and PSNs answers KQ1, and the study of StackOverflow and Reddit answers KQ2.

1.3.2 Scientific Steps to Answer DP1 and DP2

RQ1 is about solving a problem in PSNs: searching for answers to unanswered questions using Linked Data and Semantic Web technologies. This is a design problem, and it is solved by the Suman system. The details of the design problem and the scientific steps taken to solve it are discussed in Chapter 4. A short summary of the research methodology is given below.

1.3.2.1 Problem Investigation

RQ1 is about finding answers to unanswered questions in PSNs. A solution-based approach is required to solve this problem: a system or search algorithm should be designed to provide a solution (DP1). Another approach is to recommend experts who could potentially answer the unanswered questions. This expert recommender system is another solution to the problem (DP2).

Building this system using Linked Data and Semantic Web technologies raises some design challenges:

1. Data collection and integration: the main challenge here is to collect and integrate the structured data from different PSNs so it can be used by the search algorithm.

2. Named entity disambiguation: the main challenge here is to identify the main topics and concepts of the questions and answers, so that the added semantics can be used by the search algorithm.

3. Semantic search and query: the main challenge here is to use the semantically enriched and linked PSN datasets to improve the search for answers to unanswered questions.

These three challenges need to be met in the engineering cycle of the Suman system.

1.3.2.2 Treatment Design

The Suman system is created as a proof of concept to answer DP1 and DP2. The main goals of the research and the Suman system are to search for answers to unanswered questions in PSNs and to search for experts who could possibly answer those questions.

Suman is a Sanskrit word meaning wise and good-minded. The Suman system creates a distributed PSN that links data across different websites. It uses Semantic Web technologies to integrate heterogeneous datasets from PSNs to solve the data silo problem. The Semantic Web, as envisioned by Sir Tim Berners-Lee, is an extension of the WWW that enables people to share content beyond websites, applications and platforms (Berners-Lee et al., 2001). It is an intelligent web of structured data where each resource has a URI and is represented in RDF triples (Klyne and Carroll, 2006). Semantic Web technologies provide tools to represent and structure social network data to make it portable. This makes it possible to connect data from different portals to form a decentralised system that can be easily queried across platforms and reused. The Solid project is a step in this direction (Conner-Simons, 2015).
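As a minimal illustration of this kind of structuring (a sketch only, not the Suman system's actual schema, which is described in Chapter 4), the snippet below uses Python's rdflib to represent a hypothetical forum question as RDF triples with the SIOC and FOAF vocabularies; the URIs chosen here are illustrative assumptions:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

# SIOC describes forum content; FOAF describes people.
SIOC = Namespace("http://rdfs.org/sioc/ns#")

g = Graph()
g.bind("sioc", SIOC)
g.bind("foaf", FOAF)

# Hypothetical URIs for a question and its author.
post = URIRef("http://stackoverflow.com/questions/12345")
user = URIRef("http://stackoverflow.com/users/678")

g.add((post, RDF.type, SIOC.Post))
g.add((post, SIOC.has_creator, user))
g.add((post, SIOC.content, Literal("How do I parse JSON in Python?")))
g.add((user, RDF.type, FOAF.Person))
g.add((user, FOAF.name, Literal("Alice")))

# The resulting triples can be loaded into any triple store and
# queried alongside data from other sites.
print(g.serialize(format="nt"))
```

Because every resource is identified by a URI, triples produced from different websites can be merged into one graph, which is what makes the decentralised, cross-platform querying described above possible.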

The Suman system also uses Linked Data and Semantic Web technologies for name disambiguation of keywords and topics. The structured data is machine-readable, and ontologies help to describe the data and give it meaning understandable by both machines and humans (Berners-Lee et al., 2001). This data, when linked to other datasets and with semantics added, creates a network of interlinked categories. It can be used to search for extra information and to provide more detail for queries. This helps to solve the problem where users do not get any reply to their questions.

The Suman system disambiguates topics and creates a document and keyword graph. It then uses a search algorithm that combines keyword-based semantic search with traditional text-based search to find answers for unanswered questions from PSNs. The algorithm issues SPARQL (Prud'Hommeaux et al., 2008) queries at a SPARQL endpoint and uses the crowdsourced data (votes) to rank the results.
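To make this concrete, here is a hedged sketch of what such a ranked query might look like, using Python's SPARQLWrapper against a hypothetical local endpoint; the endpoint URL and the ex:hasTopic and ex:voteCount properties are illustrative assumptions, not the actual schema used in Chapter 4:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical local endpoint holding the RDF-structured PSN data.
sparql = SPARQLWrapper("http://localhost:3030/psn/sparql")
sparql.setReturnFormat(JSON)

# Find answers to questions annotated with the DBpedia concept "JSON",
# ranked by their crowdsourced vote count.
sparql.setQuery("""
    PREFIX sioc: <http://rdfs.org/sioc/ns#>
    PREFIX ex:   <http://example.org/psn#>

    SELECT ?answer ?votes WHERE {
        ?question ex:hasTopic <http://dbpedia.org/resource/JSON> .
        ?answer   sioc:reply_of ?question ;
                  ex:voteCount  ?votes .
    }
    ORDER BY DESC(?votes)
    LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["answer"]["value"], row["votes"]["value"])
```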

The same techniques can be used to create a semantically rich user profile, and the experts can be linked to categories. The Suman system also searches for experts in the field and recommends them to provide answers to queries with no results. This improves the search and discovery of answers and experts in a field and provides useful information that is otherwise unavailable on the website. It also helps to form a PSN with the right users and community. More details of the Suman system can be found in Chapter 4.

1.3.2.3 Design Validation

The Suman system has a specific requirement: to search for answers to unanswered questions from PSNs using Linked Data and Semantic Web technologies. The system will be considered valid if it meets this requirement.

The Suman system searches for answers to questions from the pre-existing knowledge base of PSNs. It addresses all three design challenges mentioned earlier. It searches for answers, meeting the design goal for DP1, and recommends experts, meeting the design goal for DP2. It meets the goals and solves the problems for the stakeholders.

Although the DP1 goals are fully met, the DP2 goals are only partially met: the Suman system recommends experts, but it is not certain that the experts will provide the answer the user requires.

1.3.2.4 Treatment Implementation

The Suman system is built using real-world PSN data. The system focuses on technology-based question answering forums, StackOverflow and Reddit, for evaluation purposes. These websites are good examples of PSNs: users ask technical questions of the community, and experts in the field provide solutions. These websites are popular among software programmers and developers for asking questions, sharing information and having discussions.

Data from these websites were collected and structured into RDF. Next, they were analysed using the Wikipedia Miner and OpenCalais tools to perform named entity disambiguation. The disambiguated topics and keywords were then linked to DBpedia, and categories were added to create a keyword matrix. The search algorithm then used crowdsourced data to search for and rank answers and experts. The details can be found in Chapter 4.
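To make the keyword-linking step concrete, the following is a minimal, hypothetical sketch: disambiguated keywords, as they might be returned by an annotator such as Wikipedia Miner with confidence weights, are mapped to DBpedia URIs and collected into a per-question weighted keyword set. The weights, names and scoring function are illustrative assumptions, not the exact matrix construction described in Chapter 4:

```python
# Hypothetical annotator output for one unanswered question:
# (disambiguated label, confidence weight) pairs.
annotations = [("Python (programming language)", 0.92),
               ("JSON", 0.85),
               ("Parsing", 0.61)]

def to_dbpedia_uri(label: str) -> str:
    """Map a disambiguated Wikipedia label to its DBpedia resource URI."""
    return "http://dbpedia.org/resource/" + label.replace(" ", "_")

# Weighted keyword representation of the question: URI -> weight.
question_keywords = {to_dbpedia_uri(label): weight
                     for label, weight in annotations}

def overlap_score(q1: dict, q2: dict) -> float:
    """Score the similarity of two questions by their shared concepts."""
    shared = q1.keys() & q2.keys()
    return sum(q1[k] * q2[k] for k in shared)
```

A score of this kind, combined with text similarity and vote counts, is the general flavour of the ranking described in Chapter 4.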

1.3.2.5 Implementation Evaluation

The Suman system was evaluated using StackOverflow data from July 2014. There were 20,326 unanswered questions in the top 10 tags, and the system found relevant answers with a confidence score of more than 75% for 13,209 of them. 23.62% of the unanswered questions had one or more answers with a confidence score of 85%, and 82.27% had an answer with a confidence score of more than 50%.

The search results also showed the list of experts recommended to answer each unanswered question. The list followed the same pattern as the answers: only the top 5 experts with a confidence score higher than 50% were shown. Even if a question had no answer with a confidence score greater than 50%, the system would still show a list of recommended experts, again restricted to those with a confidence score greater than 50%. More details can be found in Chapter 4.
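A minimal sketch of this filtering rule (the 50% threshold and top-5 cut-off are from the text above; the data structures are assumptions):

```python
def recommend_experts(candidates, threshold=0.5, top_n=5):
    """Keep only experts above the confidence threshold, best first."""
    ranked = sorted(candidates, key=lambda e: e["confidence"], reverse=True)
    return [e for e in ranked if e["confidence"] > threshold][:top_n]

# Hypothetical candidates: experts are shown even when no answer
# passed the 50% confidence bar.
experts = [{"user": "alice", "confidence": 0.83},
           {"user": "bob", "confidence": 0.47},
           {"user": "carol", "confidence": 0.66}]
print(recommend_experts(experts))  # alice and carol only
```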

The design goals for DP1 and DP2 were met: the system answered the questions and recommended experts.

1.3.3 Scientific Steps to Answer KQ3, KQ4 and KQ5

The Suman system uses Linked Data and Semantic Web technologies to search for answers to unanswered questions. The system needs to be evaluated, and the approach used by the Suman system needs to be justified. This is broken down into KQ3, KQ4 and KQ5, discussed in detail in Chapters 4 and 5. A brief summary is given below.

1.3.3.1 Knowledge Problem Investigation

KQ3 investigates whether Linked Data and Semantic Web technologies can be useful in the search and discovery of answers for unanswered questions in PSNs. This question can be answered by showing that Linked Data and Semantic Web technologies help solve the research challenges discussed in Section 1.3.2.1.

KQ4 investigates how well the Suman system answers the unanswered questions in PSNs.

KQ5 investigates how good the recommended experts are and whether they can answer the unanswered questions in PSNs.

1.3.3.2 Research Design

KQ3 tests whether Linked Data and Semantic Web technologies provide useful tools for the search and discovery of answers. There are systems and algorithms that can search for information without using Linked Data and Semantic Web technologies. The PSN data used in this thesis are from StackOverflow and Reddit. These data are from the technology domain and have a structured knowledge base. The existing ontologies and knowledge bases in open Linked Data include many data sources from the technology domain. Adding semantics to the StackOverflow and Reddit data sources makes them machine-readable and adds knowledge. This helps to identify the meaning and core topics in the datasets. Furthermore, Linked Data links these topics to other datasets, adding more information and helping to categorise and organise the datasets.

So, using Semantic Web and Linked Data technologies is useful in PSNs. It helps to understand the underlying meaning and concepts of the questions and answers, and the added semantics could potentially improve the search and discovery of information. The usefulness of Linked Data and Semantic Web technologies in the search and discovery of information is discussed in detail in Chapter 2 and is supported by the literature review.

The question then arises of how best to apply Semantic Web and Linked Data to the PSN datasets. This can be done by performing named entity disambiguation on the questions and answers to identify their main topics and concepts, which adds semantics to the datasets. These topics can be linked to the Linked Data Cloud to form connections between multiple datasets and to use the relationships to infer more information. The added links provide additional knowledge and information that could potentially improve the search and discovery of information.

The usefulness of the semantics and linking needs to be measured to justify using them. This can be done with a user experiment where participants look at the added semantics and rate how well these keywords describe the questions and answers. The other way to measure the usefulness of the added semantics is to analyse the relationships and categories of the topics in the questions and answers. These relationships and categories help to categorise and structure the knowledge and to find similar questions and answers. It is hypothesised that this improves the search and discovery of answers to unanswered questions. Its usefulness can be measured by another user experiment where participants rate how well the search result answers the unanswered question. Statistical analysis can then be done on the data collected from these experiments. If the findings and results are statistically significant, it can be inferred that using Semantic Web and Linked Data improves PSNs and answers the research questions. This would justify the use of these technologies, and the end result would meet the research goals.

Furthermore, the main advantage of using the Semantic Web is the added meaning and relationships between data, which can improve the system. This added meaning and these concepts can be tested by a user experiment where users are asked to rate how well the concepts added by Semantic Web technologies describe the PSN data. There are tags and categories for questions and answers in the StackOverflow and Reddit data. These can be compared to the keywords added by Semantic Web tools, and a T-Test can be performed to see whether this is useful or not.
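A minimal sketch of this comparison, assuming paired ratings of the same questions (the rating arrays below are placeholders, not the experimental data reported in Chapter 5):

```python
from scipy import stats

# Hypothetical paired ratings for the same five questions (1-5 scale):
# one rating for the site's original tags, one for the system's keywords.
original_tag_ratings = [3, 2, 4, 3, 2]
system_keyword_ratings = [4, 3, 4, 4, 3]

# Paired (dependent) T-Test: each participant rates both keyword sets
# for the same question, so the samples are not independent.
t_stat, p_value = stats.ttest_rel(system_keyword_ratings, original_tag_ratings)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```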

KQ4 tests whether the answers found by the Suman system are good or not. This can be measured by a user experiment where users with expertise in the field check whether the answers provided by the Suman system answer the unanswered questions. Participants can rate the answers, and this rating can be compared to the Suman search algorithm's rating. The correlation between the two ratings can be measured to see how well the system searches for answers to unanswered questions.
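The corresponding analysis is a Pearson correlation between the two sets of ratings; a sketch with placeholder values follows:

```python
from scipy import stats

# Hypothetical ratings for the same set of answers.
participant_ratings = [4, 2, 5, 3, 4, 1, 3]
algorithm_ratings = [3, 2, 5, 3, 5, 2, 3]

# A positive r means participants tend to agree with the algorithm's
# confidence in its own answers.
r, p_value = stats.pearsonr(participant_ratings, algorithm_ratings)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```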

The KQ5 test is harder to design because it concerns recommended experts rather than shown answers. It is not possible to contact the recommended experts, ask them to answer the unanswered questions and then test their answers. It is also hard to show the complete user profile to participants in a user experiment and have them rate the expertise of the recommended experts. Hence, in this thesis KQ5 is not tested, and this is one of the limitations of the thesis.

1.3.3.3 Research Design Validation

The T-Test designed to measure the usefulness of the semantics added to PSN data using Linked Data and Semantic Web technologies is a standard statistical test. It is used in the scientific community to compare the means of two paired measurements of a dependent variable. The standard measurement of the p and r values justifies the experiment results. Furthermore, it fully answers KQ3.

The correlation test designed for KQ4, which tests how well the search results produced by the Suman system answer the unanswered questions, is another standard statistical test used and accepted in the scientific community. The experiment design is justified, and if the measured p and r values meet the scientific community's requirements, then the test results are accepted and the inference is justified. This experiment and the inference made from its results fully meet the requirements to answer KQ4.

There is no suitable experiment designed to answer KQ5; hence this question is not answered at all and remains one of the drawbacks and limitations of this thesis.

1.3.3.4 Research Execution

The Suman system has been evaluated, by conducting two user experiments, to make sure that the added keywords are meaningful and that the answers provide solutions to the questions. Both experiments use standard sample sizes and follow the scientific steps of experiment and analysis. This is discussed in detail in Chapter 5.

The first user experiment asked participants to rate the keywords generated by the Suman system against the original keywords of questions and answers. 30 questions and answers were tested by 20 participants. A T-Test was performed and the results were analysed. This experiment aims to answer KQ3.

In the second user experiment, participants were shown an answer that could provide a solution to an unanswered question. They were asked to rate how well the answer answered the question. The participants' ratings and the Suman system algorithm's ratings were correlated. 46 questions and answers were tested by 20 participants. The result of this experiment aims to answer KQ4.

1.3.3.5 Analysis of Results

The results of both experiments were statistically analysed. The first experiment showed that the keywords generated by the Suman system were rated higher than the original keywords from StackOverflow and Reddit in 63.3% of the cases. It is inferred that the added keywords had some benefit for the PSN data and that Linked Data and Semantic Web technologies are useful in PSNs. This answered KQ3.

The analysis of the results of the second user experiment showed that the participants agreed with the algorithm's rating of the answers provided by the Suman system. There was a positive correlation between the two ratings (r = .380, n = 46, p (two-tailed) = .009). The correlation between the two ratings was moderately strong, and the significance was < .01. It is inferred that the Suman system did moderately well in answering the unanswered questions in PSNs. This result answered KQ4.
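As a sanity check, the significance of a Pearson correlation can be recomputed from the reported r and n alone:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.380\sqrt{44}}{\sqrt{1-0.380^2}}
  \approx 2.72
```

With df = n - 2 = 44, this t value gives a two-tailed p of approximately .009, matching the reported value.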

More details of the experiments and results are in Chapter 5. Overall, the Suman system performed well and showed that Semantic Web and Linked Data technologies can be used to provide answers to unanswered questions in PSNs.

1.4 Research Contribution

The primary aim of the research presented in this thesis is to show that Linked Data and Semantic Web technologies can be used to create an open and distributed system, and to improve the search for answers and experts in a crowdsourced Q&A system.

To achieve this, PSNs are first defined and studied in detail. The different attributes and characteristics of PSNs are identified, and how and why they form is also studied. StackOverflow and Reddit fulfil the criteria of PSNs. Datasets are collected from these two websites, and the emergent knowledge generated by the community is analysed as a study of PSNs. This follows the design science methodology for answering knowledge questions (KQ1 and KQ2), and both questions have been fully answered, although it is noted that KQ1 is a subjective question with many possible answers, and that KQ2 is a statistical analysis for which many other analyses could be done to find more details about PSNs.

The Suman system is created using the design science methodology. It solves the DP1 and DP2 problems discussed in the research questions. The Suman system harvests data, cleans it and structures it into RDF. The named entity disambiguation problem is solved using Wikipedia Miner and OpenCalais. The keywords are categorised and mapped to DBpedia and OpenCalais concepts. The Suman system also creates a weighted keyword graph for each question and answer. The algorithm searches for answers to unanswered questions and recommends experts in the field.

The search results provided by the Suman system are evaluated and analysed by conducting two experiments (more details in Chapter 5). The first, the keywords experiment, measures the usefulness of the system-generated keywords. This is done by asking participants to rate how well the original keywords and the system-generated keywords describe a question. The analysis shows that participants rated the Suman system's keywords higher than the original keywords in 63.3% of the cases. This answers KQ3.

The second experiment measured the answers provided by the Suman system to unanswered questions. Participants were shown an answer to an unanswered question and asked to rate how well it provided a solution. A Pearson correlation test was performed to measure the relationship between the participants' ratings and the Suman algorithm's ratings. There was a positive correlation between the two variables (r = .380, n = 46, p (two-tailed) = .009). This answers KQ4.

The results of the experiments showed that the Suman system provided answers to the unanswered questions and addressed the problem this thesis focuses on: finding answers to unanswered questions in PSNs using Semantic Web and Linked Data technologies.

1.5 Structure of Thesis

This thesis is structured into six chapters. This first chapter has introduced the research questions, the research methodology and the research contributions.

Chapter 2 gives the background and history of social networking and collective intelligence, and describes the role of Semantic Web technologies in this area. It explains the evolution of the Semantic Web and Linked Data in the area of social networks, briefly introducing some of the most important and widely used social semantic technologies and applications in use today. The chapter also discusses different semantic search and query techniques and related research in expert recommendation.

Chapter 3 defines and describes in more detail the topic of PSNs and community formation. It discusses the motivation of the research, explaining why forming a quick and purposeful community is important and how Linked Data can help achieve this goal. The chapter also analyses StackOverflow and Reddit as examples of PSNs. The questions, answers and user information are analysed to describe the network ties and relationships. The individual roles of users are also studied, and the incentive models that motivate users to make quality contributions are discussed. This answers KQ1 and KQ2.

Chapter 4 details the design methodology used to design the Suman system and describes how the data was collected, cleaned and structured. It also discusses the different tools used to solve the keyword disambiguation problem and how the keywords were linked to the Linked Data Cloud. The chapter then describes the search algorithm the Suman system uses for concept mapping and for creating the document keyword graph, and how keyword-based semantic search is combined with text search, using crowdsourced data, to improve the search for answers and experts. Finally, the chapter evaluates the Suman system design.

Chapter 5 describes the experiments designed to test and evaluate the Suman system with users, how the data was collected, and the statistical tests used to analyse it. The chapter shows how Semantic Web and Linked Data technologies can help in topic disambiguation, community formation, integration, and the search and discovery of better information and experts. This answers KQ3 and KQ4.

Chapter 6 describes potential directions for future work with the Suman system and closes with some concluding remarks on this thesis.

Chapter 2

Background

The Web has brought together people from different parts of the world and from different socioeconomic and cultural backgrounds. It provides a platform for people to connect with friends, family and strangers to share ideas and information. There were 3.2 billion people connected to the Internet as of June 2015 (ITU, 2015), and it has become easier for people to communicate and collaborate with each other on the Web. People do not only use it to share information; they also use it to form groups and communities with like-minded people and create their own social networks.

There are different types of websites and tools available for social collaboration, which people can use to create a collective knowledge base. Wikipedia is an example where people created knowledge together in a collaborative community. There are many AI-complete problems that are hard for computers, but people, with the power of collective intelligence, can solve them easily. Humans are better at natural language processing, speech recognition and image-processing tasks, and human computation and crowdsourcing techniques can be used to solve such problems (von Ahn, 2009). The social Web and human computation offer many opportunities to solve difficult computational problems and create emergent knowledge, thus forming a Purposive Social Network (PSN). This is a new concept defined in this thesis and discussed in detail in the next chapter.

This chapter discusses the emergence of the social Web and how the Web has evolved into Web 2.0, which connects objects and people. In recent years, Semantic Web technologies have also been incorporated into the Web to create the new era of Web 3.0 (Hendler, 2009). These technologies could extend the scope of the current social Web and social media. Semantic Web and Linked Data technologies provide tools to open and link data, combine information and search it. This chapter also discusses how Linked Data links different datasets and how different semantic search and query techniques can improve information retrieval. The chapter provides literature support for the design decisions made in the Suman system; there is no new or innovative work added

here by which to judge the proposed system. The literature reviewed and critically described in this chapter supports why Linked Data and Semantic Web technologies can help improve the knowledge in a PSN. It also describes the semantic search techniques, algorithms and systems that inspired the building of the Suman system.

2.1 Emergence of Social Web

The invention of the World Wide Web (WWW) and web browsers allowed more people to interact with each other, share documents and create static webpages easily, without detailed technical or specialized knowledge. Although Tim Berners-Lee had envisioned a read-write Web, where the very first browser also worked as an editor (Berners-Lee et al., 2006b), the initial WWW was a read-only Web for the majority of people. In the early days, the Web was mostly a collection of webpages with a phone-book-like directory to look up individual websites that were connected using hyperlinks (Berners-Lee et al., 2000).

The passive attitude towards the Web changed when web browsers and search engines made webpages easily accessible to users. The business world adopted the technology and started using it for e-business. More communication tools like IRC and Microsoft NetMeeting were developed for organizations to communicate and collaborate. Organizations started to have their own intranets and portals as collaborative tools and to integrate information (Stenmark, 2002).

The Web was adopted by more organizations, especially the news and entertainment industry, and more and more information was put on the Web as RSS (Board, 2007). People moved from static, personal webpages to blogs and wikis and started the social transformation of the Web from a web of data to a web of people (Albors et al., 2008). Figure 2.1 categorizes different collaboration methods used by people based on how strongly they are connected with each other and how much information they share with each other.

This section discusses the concept of the social Web because PSNs are part of the social Web, where people come together and form communities. The technologies and systems studied in this section give an insight into how the social Web forms and grows, which helps to understand how PSNs can form and grow. Furthermore, PSNs are social applications that have a focused interest, and these applications could be content-specific. This literature is discussed here because these systems share intrinsic properties with PSNs. The main properties of PSNs are discussed in detail in Chapter 3, but this section gives insight into other social networking services.

Figure 2.1: A taxonomy for collaboration alternatives (Albors et al., 2008).

2.1.1 Web 2.0 and Social Media

Web 2.0 is the term coined by Tim O’Reilly for the new phase of the Web (O’reilly, 2007) in which the Web emphasized user-generated content. Although Web 2.0 suggests a new version of the Web, it does not refer to a change in any technical specification, but rather to cumulative changes in the way the Web was used. Web 2.0 provided the Web as a platform and provided new ways to create webpages. The main characteristic of Web 2.0 is that it creates object-centred networks where people connect via objects of interest (O’reilly, 2007).

The first wave of socialization of the Web began with the appearance of blogs and wikis, as it was easier to use them without knowledge of HTML and web development. The easy-to-use ’What you see is what you get’ (WYSIWYG) editors facilitated the creation of user-generated content. The original idea of Berners-Lee came into effect when the Web became both read and write.

Blogosphere is a big part of Web 2.0. Services like LiveJournal (http://www.livejournal.com/) and Blogger (http://www.blogger.com/), both started in 1999, created networks of people with similar interests and built a collaborative knowledge base for communities of like-minded people. The blogosphere was soon recognized as a densely interconnected social network through which news, information, ideas and opinion travel rapidly, formed by professional and corporate bloggers as well as hobbyists (Herring et al., 2005; Chi et al., 2006). As of 2011, NM Incite studies tracked over 181 million blogs, growing from 36 million in 2006, with 1 million new posts every day (Nielsen and McKinsey, 2011).

Nowadays, there are blogs in every category and on every topic, and blogging has turned into a means to share news and information instantly. Blogs are densely interconnected based on political ideologies, language, topics and categories (Adamic and Glance, 2005; Chi et al., 2007). They have made it easier for users to create new content and share it on the Web. The simple editor does not require any technical knowledge, and blogs provide a platform for people to create networks of similar interest and expertise.

2.1.2 Content-specific Social Networking Services

In 2003, an SNS called Friendster (http://www.friendster.com/) had attracted 5 million users. Soon many other similar websites emerged, of which Facebook was the most successful, with more than 1.31 billion active users as of June 2015 (Facebook, 2015). People use these SNSs in every aspect of their lives. They use them to share personal information with friends and family (Facebook) or to share their work and interests with colleagues (LinkedIn). Business enterprises have also adopted SNSs to connect with their consumers and advertise products directly to the people who are interested in them (Ridings and Gefen, 2004).

SNSs connect people together through personal connections as well as through shared objects. A good example of this is Facebook’s ’Like’ button, which is used to connect people who like the same music or movies or share similar interests. This creates a vast community of people who share the same interests and also share similar information (Joinson, 2008).

There are also microblogging services like Twitter and Tumblr (https://www.tumblr.com/) that make sharing information easier and faster. These services create implicit relationships between people by the follower-following method. The social network grows quickly because people do not need to approve any friendship, and content is distributed widely and rapidly by the re-sharing features they have. Users can retweet the same tweet on Twitter or reblog the same post on Tumblr multiple times, creating a viral distribution of information (Huberman et al., 2008).

After Facebook, a new trend emerged for content-specific social networking websites like YouTube to watch videos, Flickr to share pictures, Delicious (http://delicious.com/) to share bookmarks, LastFM (http://www.last.fm/) and Spotify (http://www.spotify.com/) to listen to music, etc. In these websites people connect with other people with similar interests. They form groups and communities, and the websites take advantage of users’ ratings for quality control and users’ behaviour for recommendations (Linden et al., 2003; Bennett and Lanning, 2007). These websites depend on users’ contributions to sustain the community.

2.2 Collective Intelligence and Crowdsourcing

The Web provided a platform for people to share and discuss ideas and information, and collaborate with each other. The main trend of Web 2.0 was using the collective intelligence of people to create knowledge. Web 2.0 used the wisdom of the crowd to solve problems, perform quality control and provide recommendations. PSNs use crowdsourcing techniques to create their knowledge base and solve problems. Understanding different types of crowdsourcing systems provides useful information and helps to design PSNs. The papers discussed in this section provide insight into different types of crowdsourcing systems and how they form and grow. These systems can be studied to design better PSNs.

Jeff Howe coined the term crowdsourcing in 2006 to describe how businesses outsource their work to people using the Internet (Howe, 2006). Estellés-Arolas and González-Ladrón-de-Guevara (2012) later studied different definitions of the term and integrated them into the definition below, which is the definition adopted for the term crowdsourcing in this thesis.

”Crowdsourcing is a type of participative online activity in which an individual, an institution, a non-profit organization, or company proposes to a group of individuals of varying knowledge, heterogeneity, and number, via a flexible open call, the voluntary undertaking of a task. The undertaking of the task; of variable complexity and modularity, and; in which the crowd should participate, bringing their work, money, knowledge [and/or] experience, always entails mutual benefit. The user will receive the satisfaction of a given type of need, be it economic, social recognition, self-esteem, or the development of individual skills, while the crowdsourcer will obtain and utilize to their advantage that which the user has brought to the venture, whose form will depend on the type of activity undertaken.” (Estellés-Arolas and González-Ladrón-de-Guevara, 2012)

Crowdsourcing uses collective intelligence, a ‘form of universally distributed intelligence, constantly enhanced, coordinated in real time, and resulting in the effective mobilization of skills’ (Lévy, 1997).

There are many different types of crowdsourcing systems, and they target different user bases. Some systems use gamification techniques to get users’ input (e.g. ESP Games (von Ahn and Dabbish, 2004)), and other systems pay workers to finish tasks (e.g. MechanicalTurk (Buhrmester et al., 2011)). There are also community projects like Wikipedia (http://www.wikipedia.org/) where people create knowledge for altruistic reasons.

A crowdsourcing system can be categorized in many different ways (Yuen et al., 2011). This thesis focuses on three aspects and applications of such systems:

• Information Sharing

• Human Computation

• Quality Management and User Recommendation

2.2.1 Information Sharing

One of the popular applications of crowdsourcing systems is using the wisdom of the crowd to create knowledge and share information. Here are some case studies where people share information to solve problems.

Reddit is a popular website that allows users to share news, pictures, links and any other kind of information; people vote on popular items and hold discussions on any topic. NoiseTube is a web app that made it easier for users to use their mobile phones to measure noise in an area, add GPS tags to the measurements and share them with other people to monitor noise pollution (Maisonneuve et al., 2009).

One of the widely used crowdsourcing tools to create collective knowledge is a wiki. Web 2.0 provided easy-to-use tools for collaborative authoring and ownership of information. A wiki allows users to add, remove, edit or modify any content on a webpage and keep track of different version changes. Wikipedia, a collaborative encyclopedia, is an excellent example where people with similar interests and expertise came together to create knowledge. Crowdsourcing is used for quality control and to stop vandalism and spamming. These collaborative networks thrive because they enforce a strong sense of community through collaboration and the creation of content where individual contributions count, and Wikipedia became an efficient knowledge management tool (Kane, 2009).

Question and answering systems are another example where people answer other people’s questions and create knowledge. Yahoo! Answers (https://answers.yahoo.com/) and StackOverflow are some of the widely used systems. There are also many messaging boards and forums where people discuss topics, ask questions and provide suggestions. Yahoo! Suggestion Board (http://suggestions.yahoo.com/) is a feedback and suggestion system. These will be discussed in detail in Chapter 3, but these kinds of question and answering systems are good examples of PSNs. They use crowdsourcing to gather knowledge and solve problems.

The other feature of Web 2.0 that facilitates crowdsourcing and emergent knowledge is the ability to add tags to any content. User-generated taxonomy is utilized to categorize pictures, videos, music and every kind of data. The tags make it easier to classify and categorize information and also to search for and discover similar content on any topic (Wu et al., 2006). Collaborative tagging and bookmarking are used to add metadata and categories to pictures in Flickr and links in Delicious, making it easier for people to search for and discover information. Using multiple tags also connects categories, and geo-tagging helps in identifying any object based on location (Sinclair and Cardew-Hall, 2008; Xu et al., 2008).

Apart from collaborating to create and share knowledge, people also build tools to solve problems. Instead of just sharing information, motivated people come together, build projects and share them with the community. Open source software projects gave an alternative to traditional software development, and programmers formed communities to build software and tools. Linux provided developers with a platform to collaborate and create, where individual authorship is recognized but authors do not get exclusive intellectual property rights. The Creative Commons license (https://creativecommons.org/) is used, and it allows creators to easily mark their creative work and share it with the world.

This type of large-scale software development requires much coordination and communication with the community, and a project is divided into various phases of development. There are many tools available, like version control, bug tracking, task management and testing tools, for people to collaborate and communicate (Weber, 2004; Albors et al., 2008). Many software products like Firefox (http://www-archive.mozilla.org/projects/firefox/), Android (http://source.android.com/), etc. have been successfully developed this way and are used by the general public.

All of these examples suggest that the larger the online community, the more self-sustaining the system becomes.

2.2.2 Human Computation

Humans are good at solving AI-complete problems, like natural language processing and image analysis, that computers still find intractable. Luis von Ahn came up with various systems to utilize human computation and crowdsourcing to solve these problems. The ESP games (von Ahn and Dabbish, 2004) are used to annotate images on the web (von Ahn, 2006), reCaptcha (von Ahn et al., 2003) is used to digitize millions of books, and Duolingo (von Ahn, 2013) is used to translate the web into various languages.


Users of these systems play games, distinguish humans from machines to stop spam, and learn foreign languages, and the systems utilize tasks easily done by humans to solve computational problems. The same task done by many people gives consensus to the knowledge and helps to prevent errors (von Ahn, 2009).

Mechanical Turk provides a platform where people get paid to do human computation tasks. There are many research projects that ask users to perform annotation, alignment, sorting or editing tasks, and people get paid for each completed task (Paolacci et al., 2010). There are also systems like CrowdSearch that add a human layer on top of an algorithmic layer. This application combines automated image search with real-time human validation of the search results (Yan et al., 2010).

The Foldit computer game takes advantage of human reasoning and puzzle-solving abilities in predicting protein structures (Khatib et al., 2011). There are many similar programs: Galaxy Zoo asks users to classify galaxies (Lintott et al., 2008), Quantum Moves uses people to solve quantum physics problems, etc. (Silvertown, 2009).

All of these are examples indicating that crowdsourcing helps to solve problems. These systems use the wisdom of the crowd to find solutions. The success of a solution depends on its emergence from a large body of people solving the problem. This ‘wisdom of crowds’ is derived not from averaging solutions, but from aggregating them (Surowiecki, 2005). Crowdsourced information can be useful in searching for information by aggregating users’ responses and feedback. This helps to design PSNs.

2.2.3 Quality Management and User Recommendation

There are many user-rating systems that gather users’ votes and online behaviour. They use the aggregated data for quality management, to monitor spamming and for recommender systems.

The power of collective behaviour and knowledge was utilized by websites such as Amazon (http://www.amazon.com/) and Pandora (http://www.pandora.com/) for recommendations of books and music respectively (Linden et al., 2003; Bu et al., 2010). Users of these websites also rate and review items. The combined knowledge of users’ ratings and activity is used for predicting users’ needs. This is applied to other users, and content is recommended based on the aggregated user-base data. YouTube and Netflix (http://www.netflix.com/) use viewers’ watching patterns to interlink content and later recommend it to other users (Bennett and Lanning, 2007; Davidson et al., 2010).

On websites such as Reddit and Digg (http://www.digg.com/), voting is used as a tool to popularise the content of the website and rank it higher so more people have access to it. Mechanical Turk and ESP games use majority-selected answers to evaluate the correctness of answers. Mechanical Turk also uses a user’s work history to recommend tasks and help them find the right task (Ambati et al., 2011; Yuen et al., 2012).

The above examples show that user behaviour can be used to provide recommendations and perform quality control in a system. Collective intelligence and the wisdom of the crowd can be used to solve real-world problems. The Web has enabled crowdsourcing and made it easier for people to collaborate and communicate. The ratings and recommendations given by users are useful in ranking and searching for information. This property of crowdsourced systems can be used in search and information retrieval as well as in recommendation.

2.3 Semantic Web and Linked Data

This section discusses the concept of the Semantic Web: what the Semantic Web is and what technologies it uses. The main research question (RQ1) of this thesis is about using Semantic Web and Linked Data technologies to solve problems in PSNs. It is vital to understand how Semantic Web technologies work and what tools are available to use. Another important aspect discussed in this chapter is how Semantic Web technologies help to solve problems on the Web, what their benefits are and how they can improve the current Web.

These technologies are used in the Suman system. This section justifies their use by providing examples of systems and tools that use the Semantic Web and how they use it. This knowledge is vital to the design of the Suman system. Furthermore, semantic search and query techniques are used to search for answers to unanswered questions in PSNs, so understanding different semantic search techniques helps to design the Suman search algorithm.

2.3.1 Semantic Web Technology

The idea of the Semantic Web was coined by Sir Tim Berners-Lee (Berners-Lee et al., 2001). He described it as an extension of the current Web, in which all data is structured and has a meaning accessible to both machines and people. Linked Data is a set of best practices for publishing reusable structured content using the existing Web as a sustaining framework. In Linked Data, every resource is represented by a URI, and HTTP URIs are used in order to retrieve documents that describe those resources. RDF is used to describe documents, and they are linked to other resources (Bizer et al., 2009).


The Semantic Web uses the technologies of the current Web and formal semantics to overcome the Web’s limitations. In Web 2.0, webpages are written in HTML and do not have much metadata incorporated into them. HTML provides the markup to render webpage elements in the browser, but it does not give meaning to those elements. It does not say whether a certain number is a date, an item number or a price. The Semantic Web finds a solution for this problem by adding metadata and semantics to documents so they are machine-readable as well as human-readable.

Berners-Lee described the Semantic Web architecture known as the Semantic Web Layer Cake or Semantic Web Stack, as seen in Figure 2.2. This model takes advantage of current Web standards as well as new W3C standards such as RDF, OWL, etc. to create the structured and linked Web of data (Berners-Lee, 2003).

Figure 2.2: Semantic Web Layer Cake Model (Krauss, 2014).

As seen in Figure 2.2, each layer exploits the capabilities of the layer below to extend the classical hypertext Web. The bottom two layers are examples of current technologies that form the base of the Semantic Web. The Semantic Web uses URIs to identify each resource, and Unicode helps in encoding information in different languages (Berners-Lee et al., 2004). The XML layer uses XML namespaces and XML Schema to structure the content within a document. It also adds attributes and the schema of a particular domain to the XML document (Bray et al., 1998).

The middle layers use W3C standards developed to build Semantic Web applications.

RDF is a framework for representing information about resources in subject-predicate-object triple format (Klyne and Carroll, 2006). It represents the data as a graph and is used to store data as well as metadata about any resource. RDFS provides a vocabulary to represent schemas in RDF (Brickley and Guha, 2000). It describes the properties and hierarchies of classes to create lightweight ontologies.

The next layer deals with logic and rules that take advantage of the meanings of the resources to infer meaningful information. OWL is an ontology language that adds more constructs on top of RDFS. It is based on description logic and allows the addition of cardinality, properties and restrictions of values (McGuinness et al., 2004). RDFS and OWL can be used for reasoning within ontologies and knowledge bases. RIF is a rule interchange format that can be used to describe relations that cannot be described using OWL (Kifer and Boley, 2010). All the data can be queried using the SPARQL query language (Prud’Hommeaux et al., 2008). SPARQL can query RDF data as well as ontologies and knowledge bases described in RDFS and OWL. This is used to retrieve information for Semantic Web applications.
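
As an illustration of how SPARQL retrieves information from RDF data, the following is a minimal sketch that queries the public DBpedia endpoint for a few programming languages; the endpoint URL and the Python SPARQLWrapper library are assumptions made for illustration, not part of the Suman system described later.

    # A minimal sketch of querying an RDF store with SPARQL, assuming the
    # public DBpedia endpoint and the Python SPARQLWrapper library.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?lang ?label WHERE {
            ?lang a dbo:ProgrammingLanguage ;
                  rdfs:label ?label .
            FILTER (lang(?label) = "en")
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Each binding holds one matched variable assignment from the query.
    for row in results["results"]["bindings"]:
        print(row["label"]["value"])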

The top layers contain technologies that are not standardized by the W3C and are still work in progress as of 2015 (Hawke et al., 2015). The logic and proof layers will use all the semantics and rules from the lower layers and validate the deductions and inferences made. Formal proof with trusted inputs, measured using provenance or other models, will provide trust in the derived information. For reliable results, cryptographic methods such as digital signatures could be used to verify and ensure the reliability of information. The top layer is the user interface, which will enable people to use Semantic Web applications.

Some of these technologies such as RDF, SPARQL, etc., are described in detail in the next section. They are extensively used in Linked Data and are more relevant to this thesis as they are used to build the Suman system.

2.3.2 Linked Data Technology

Linked Data is a publishing paradigm in which not only documents but also data items are linked on the Web. Linked datasets use Semantic Web technologies to structure the data and create relationships. The data are linked with different datasets and can be queried to provide related information (Bizer et al., 2009).

Linked Data is an integral part of the Semantic Web. Large-scale datasets are linked with each other, and data publishers provide endpoints for query and reasoning. Berners-Lee has described four basic principles of Open Linked Data and the Semantic Web (Berners-Lee, 2011):

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

4. Include links to other URIs so that they can discover more things.

These principles have been adopted in this thesis as a framework to create Linked Data. The interlinking of datasets and using HTTP protocol for discovery applies the general architecture of the Web to share data on a global scale. Hyperlinks enable users to navigate between different datasets and servers.

2.3.2.1 Benefits of Linked Data

Linked Data provides a flexible and generic data-publishing model. It facilitates publishing and consuming data, and data integration also helps to discover related datasets. There are many benefits of using Linked Data in the area of PSNs, and understanding these benefits justifies the use of these technologies in the Suman system. They are as follows (Heath and Bizer, 2011).

• A unifying data model: Linked Data provides a decentralized platform to link data across different domains. Data is represented by URIs and structured using RDF. Entities are described using globally unique identifiers, so different schemas can be used to represent data. Multiple schemas can be used in parallel across domains and languages. RDF is useful for making data machine-readable and readable across platforms. Data on the Web is usually represented in different formats such as CSV, JSON, XML, etc., which require different methods to integrate the data. RDF solves the data heterogeneity problem to some extent, although semantic heterogeneity is still present.

• A standardized data access mechanism: Standard HTTP is used to access Linked Data. Linked Data makes it necessary for the URIs to be dereferenceable. This allows the datasets and URIs to be easily accessible using a browser and APIs. This facilitates indexing and crawling of datasets by search engines.

• Hyperlink-based data discovery: Linked Data offers a mechanism for linking datasets across different sources. This allows hyperlinks to be set between data that describe the same entity or have some relationship with each other. When a dataset is queried, these links enable the application to discover new data sources at run time. All linked datasets together create an Open Linked Data Graph bridging different sources.

• Self-descriptive data: Linked Data is structured using RDF schemas and shared vocabularies. This metadata makes the data definitions clear and the vocabulary retrievable. Different vocabularies are also interlinked, providing more meaning to the entities.

PSNs can benefit from using Linked Data because it provides an open platform and a distributed and homogeneous model to publish data. It makes it easier for data publishers and consumers to open, integrate and access data. It also provides useful endpoints for people to discover, reason over and query the datasets to find related information.

2.3.2.2 Linking the Datasets

Linked Data follows the same principle as the Web. The Web links documents using hyperlinks into a global information space; similarly, Linked Data links different datasets into a global data space. Linking the datasets requires the following technologies (Heath and Bizer, 2011):

1. URI: URIs are at the base of Linked Data. URIs are used to identify web documents, digital content and real-world objects as well as abstract concepts. Abstract concepts include relationships between objects, such as knowing someone (http://xmlns.com/foaf/0.1/knows), or an object’s property, such as colour.

2. HTTP: The second principle of Linked Data advocates using HTTP URIs. This makes the URIs dereferenceable over the HTTP protocol and provides more information about the objects. These descriptions can be in human-readable HTML or machine-readable RDF. This is possible by using an HTTP mechanism called content negotiation (Heath and Bizer, 2011). Content negotiation is a procedure in which HTTP clients indicate the kind of document they prefer (HTML, RDF) in the HTTP headers sent with each request. Servers inspect these headers and respond with the appropriate document type. There are two ways to make real-world object URIs dereferenceable: 303 URIs and hash URIs. When a client makes a request to dereference a real-world object URI, the 303 URI approach uses the HTTP response code ‘303 See Other’ and redirects the client to a web document that describes the object. The client dereferences the new URI and views the web document that describes the real-world object. This is called a ‘303 redirect’ (Sauermann and Cyganiak, 2008). 303 URIs require two HTTP requests to retrieve a single description of the real-world object. Hash URIs solve this problem. A hash URI contains a special part that is separated from the main URI by a hash symbol (#). This special part is the fragment identifier. When a client requests a hash URI, the HTTP protocol requires the fragment part to be stripped off before requesting the URI from the server. URIs that include a hash cannot be retrieved directly. Thus, the main URI identifies the document while the hash URI identifies the real-world concept without any ambiguity (Sauermann and Cyganiak, 2008).

Figure 2.3: The hash URI and 303 URI approach with content negotiation (Sauermann and Cyganiak, 2008).

The hash URIs have the advantage of reducing the number of necessary HTTP round-trips, which in turn reduces access latency. However, this approach has a downside. A client interested only in #alice will inadvertently load the data for all other resources as well, because they are in the same file. 303 URIs, on the other hand, are very flexible because the redirection target can be configured separately for each resource (Sauermann and Cyganiak, 2008).

3. RDF: Linked Data on the Web is represented using the Resource Description Framework (RDF). RDF is a framework for representing information on the Web in sets of subject-predicate-object triples. A set of triples is called an RDF graph. The URIs representing the subjects and objects are the nodes in the graph, and the predicates are the bridges that connect the nodes.

Figure 2.4: An RDF Triple.

The subject of the triple is a resource represented by a URI. The object can be a literal value such as a number, string, date, etc., or another URI describing another resource. The predicate describes the relationship between subject and object, e.g. name, location, date of birth, etc. The predicate is also identified by a URI that comes from ontologies and vocabularies. These vocabularies are generally standards and represent information about a certain domain. RDF Schema (RDFS) can be used to create more vocabularies about a domain, allowing data publishers to define their own concepts and relationships. For example, the FOAF vocabulary can be used to describe user profile details, but there is no FOAF term to describe users’ badges received on a website. This detail can be described by creating a new property using RDFS (a minimal sketch of building such triples follows this list). There are many benefits of using the RDF data model in Linked Data. It uses HTTP URIs for resources as well as vocabularies, hence making it easier to scale. Clients can look up any URI in an RDF graph to retrieve additional information. It enables linking between different datasets. Information from different sources can be merged together to create a global graph. RDF allows data to be represented using different vocabularies. This, combined with RDFS and OWL, gives freedom to structure the data as anyone desires.

4. Linking: The fourth Linked Data principle is to link the RDF data to other data sources on the Web. This external linking of datasets is fundamental to the Web of Data. It enables applications to discover additional information and links the data islands into an interconnected global data space. When URIs are dereferenced, the page shows an additional description of the data that might contain additional RDF links that can themselves be dereferenced. A Linked Data browser can navigate this linked graph to find additional information. There are three types of RDF links (Heath and Bizer, 2011):

• Relational links: These point to related things about the resources. This enables the search of related background information about a resource, e.g. relational links provide information about a person’s date of birth, where they are from, etc.
• Identity links: This type of link enables aliases of resources to be linked together from different datasets. Identity links allow clients to find descriptions of resources from different sources, giving different points of view. One example of this is when owl:sameAs is used to link the aliases of a resource.
• Vocabulary links: This type of link connects vocabulary terms to their definitions and other related definitions. Vocabulary links make data self-descriptive and represent the data and the relationships. They help to integrate data across vocabularies. owl:equivalentClass is an example of such linking.
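
To make the triple format above concrete, here is a minimal sketch that builds a small RDF graph with the Python rdflib library; the example.org URIs and the custom ‘badge’ property are hypothetical, standing in for the kind of RDFS extension described in the RDF item above.

    # A minimal sketch of building RDF triples with rdflib; the user URI
    # and the custom "badge" property are hypothetical examples.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import FOAF, RDF

    EX = Namespace("http://example.org/vocab/")      # hypothetical vocabulary
    alice = URIRef("http://example.org/users/alice")

    g = Graph()
    g.bind("foaf", FOAF)
    g.bind("ex", EX)

    # Each add() call states one subject-predicate-object triple.
    g.add((alice, RDF.type, FOAF.Person))
    g.add((alice, FOAF.name, Literal("Alice")))
    g.add((alice, FOAF.knows, URIRef("http://example.org/users/bob")))
    g.add((alice, EX.badge, Literal("Good Answer")))  # a property FOAF lacks

    print(g.serialize(format="turtle"))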

Following and implementing all four principles of Linked Data gives a five-star rating to the dataset based on Berners-Lee’s five-star rating scheme (Heath and Bizer, 2011).

In the interest of addressing this research question, these standards and definitions have been delineated for the purposes of implementing these technologies in this thesis. The Suman system uses RDF to structure the data and uses Linked Data linking techniques to link the PSN data to the Wikipedia and OpenCalais datasets.
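
The content negotiation mechanism described above can be exercised directly over HTTP. The sketch below, which assumes the Python requests library and a DBpedia resource URI, asks the server for RDF instead of HTML by setting the Accept header and follows the resulting 303 redirect automatically.

    # A minimal sketch of dereferencing a Linked Data URI with content
    # negotiation, assuming the Python requests library.
    import requests

    uri = "http://dbpedia.org/resource/Java_(programming_language)"

    # Ask for machine-readable Turtle instead of the default HTML page.
    response = requests.get(
        uri,
        headers={"Accept": "text/turtle"},
        allow_redirects=True,   # follows the 303 redirect to the data document
    )

    print(response.status_code)    # 200 after the redirect
    print(response.url)            # the document URI the server redirected to
    print(response.text[:500])     # the first lines of the Turtle description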

2.3.2.3 Linked Data Cloud

Figure 2.5: Linked Data Cloud (Schmachtenberg et al., 2014).

The Linked Open Data community project maintains a list (http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets) of all the open linked datasets that are connected together. The Linked Open Data Cloud as of April 2014 contains more than 1000 datasets. Figure 2.5 shows the different interlinked datasets. The DBpedia dataset, which is the Linked Data version of Wikipedia articles, is one of the biggest and most connected datasets (Schmachtenberg et al., 2014).

Berners-Lee has also coined the term Giant Global Graph to describe the Linked Data RDF graph (Berners-Lee, 2007). He considers it as important as the social graph. It is a decentralized graph in which everyone is free to link their datasets to one another and find related information. His Solid project works towards the same goal: the Solid system is for building decentralized social applications using Linked Data (Conner-Simons, 2015).

The Suman system developed in this thesis links the PSN datasets to the DBpedia dataset and uses DBpedia categories to categorise the information. Understanding the idea behind the Open Linked Data Cloud and how it works is therefore vital to this thesis.


2.3.3 Semantic Web Vocabularies

There are many Semantic Web technologies available in the area of social networks and social media. These technologies help with data representation, data portability and cross-platform interoperability. Using Linked Data principles, a decentralized system can be created that helps with social network integration. Every object and entity is identified using a URI, which is dereferenceable using HTTP (Shadbolt et al., 2006). Also, useful metadata about the entities can be added in a structured format using RDF and linked with other related data to improve the discovery of information.

As previously discussed, there is already a large amount of Linked Data available on the Web, called the Open Linked Data Cloud (Schmachtenberg et al., 2014). There are many useful ontologies and vocabularies to represent this data, including social data. Vocabularies define the concepts and relationships used to describe resources and classify the relationship terms in a particular domain. In Linked Data, vocabularies and ontologies are similar; there is no clear division between them. Ontologies are more complex and formal collections of terms, whereas vocabularies are not used with strict formalisation. In the Semantic Web and Linked Data, vocabularies and ontologies are the basic building blocks for inference (W3C, 2015).

These vocabularies are used in this thesis to add structure and meaning to the data. They are used in the Suman system, so understanding what these vocabularies describe and how they work is an important aspect of this thesis. Vocabularies help in data integration and in removing term ambiguity. They also help in organizing knowledge and concepts in a domain. The evolution and formation of different, widely used vocabularies to describe people, communities and their data are discussed below.

2.3.3.1 FOAF

The FOAF ontology is used to describe people and the relationships between them. Each person has a unique identifier, and a specific vocabulary is used to create a personal profile of users and describe their social network. FOAF describes people’s personal details such as name, date of birth, email address, etc. and how they are linked with other people (friendship). It is easily integrated with other Semantic Web ontologies, and a person can describe one or more of their online social networks (Brickley and Miller, 2010).

Many websites such as LiveJournal, FriendFeed and identi.ca (https://identi.ca/) use FOAF to describe their users. Many other websites have a plug-in or external application that creates a FOAF profile, such as the FOAF generator, a WordPress plug-in, etc. FOAF helps to solve the distributed identity problem, where one user has different accounts on different websites.


Using FOAF, a user can combine all the information from various sources into one file and interlink their different identities and social networks. This also helps in creating a complete user profile and enables cross-site content recommendation (Bojars et al., 2008a).

Privacy and digital identity protection issues can also be resolved in FOAF by using the SSL protocol, which provides a distributed authentication system for different users and allows different policies for private and public information (Bojars et al., 2008b). FOAF+SSL authenticates a user whenever a connection is made to access a website. This is done by using the SSL layer built into a standard Web browser that implements HTTPS. In this approach, trust is established recursively.

Figure 2.6: Creating a FOAF profile and SIOC file of a user.

2.3.3.2 SIOC

The SIOC ontology is used to describe online communities’ activities and interlink community data. Together with FOAF, the combined document contains users’ information, their social network and all the data users created, with the comments and metadata provided by others (Breslin et al., 2005).

SIOC can be used for blogs, forums, discussion boards, mailing lists, etc. Many websites, as well as many enterprises and e-government services, use SIOC to make their data freely available (http://www.w3.org/Submission/sioc-applications/).

SIOC is the next step in integrating different social networks together. It describes the content of a website in a structured format so the information is easier to access, search, and discover. Posts can be browsed using specialized queries. SIOC links people from different networks. Used together with FOAF, it can prevent identity and co-referencing problems. It allows people to connect data in a decentralized system and enables data integration (Breslin and Decker, 2007).

Figure 2.6 describes how FOAF and SIOC together can help interlink and integrate different communities and different accounts of the same person using Semantic Web technology for easy reusability. Both of them are implemented in this thesis, as explained in detail in Chapter 4.
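
As a concrete illustration of the combination shown in Figure 2.6, the short Turtle sketch below describes a hypothetical user with FOAF and links one of her forum posts to her account with SIOC terms; the example.org URIs are placeholders, not real identifiers.

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix sioc: <http://rdfs.org/sioc/ns#> .
    @prefix ex:   <http://example.org/> .

    # The person and her account, described with FOAF.
    ex:alice a foaf:Person ;
        foaf:name "Alice" ;
        foaf:account ex:alice-forum-account .

    # The account and one of its posts, described with SIOC.
    ex:alice-forum-account a sioc:UserAccount .

    ex:post42 a sioc:Post ;
        sioc:has_creator ex:alice-forum-account ;
        sioc:content "How do I parse RDF in Python?" .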

2.3.4 Social Semantic Applications

Semantic Web technologies are used to add semantics to Web 2.0 social applications. These added semantics improve the linking and searching of information. The data has meaning and context, which helps to find the right information. Social semantic applications also showcase how adding Semantic Web technologies to Web 2.0 applications benefits the current Web. PSNs can benefit from studying these social semantic applications: their techniques can be added to current PSNs to add meaning, structure and context to the data. A PSN can itself become a social semantic application if Semantic Web technologies are applied to it.

The Suman system aims to be a Semantic Web application and creates PSNs. Understanding these applications gives insight into building and designing the Suman system. Some of the popular applications of Semantic Web technologies in the social Web are discussed below.

2.3.4.1 Semantic Tagging

People use tags to classify, categorize and group their content using their own vocabulary. The websites get more user-generated data, but this data is also ambiguous and imprecise. People use different terms in the same context to mean the same concept, and the same terms in different contexts for different meanings.

For example, it is unclear whether a post tagged with “Apple” describes a fruit or a computer brand. It is necessary to reconcile lexical tags to URIs that unambiguously identify the meaning of a term (Garcia et al., 2009). The same tag also differs when people use different spellings. There is also a lack of organization and hierarchy between the tags (Correndo and Alani, 2007). It is hard to be certain whether a tag marked “Apple” is a subclass of a tag marked “Fruit”.

Most of these problems can be solved by the MOAT ontology, which helps to describe the meaning of a tag semantically and connect it with the object and related tags (Passant and Laublet, 2008). The co-occurrence of tags can be used to group similar tags together. It uses a collaborative approach to share the meaning of a tag in a community.

The Tag Ontology helps to solve the ontological and hierarchical problem that exists between groupings of tags. The SCOT (Social Semantic Cloud of Tags) ontology creates a model to describe a tag cloud and it makes the tag cloud portable so it can be exported from one service to another without losing the data (Kim et al., 2007).

Much research has been carried out in the area of extracting ontologies from tags. FolksOntology is one of those projects; it extracts relationships between tags (Van Damme et al., 2007). FLOR is another project that automatically identifies the meaning of a tag (Angeletou et al., 2008). These ontologies can be combined with other ontologies to create a complete model for semantic tagging in the area of social networking (Bojars et al., 2008a).

2.3.4.2 Semantic blogging and microblogging

Most blogosphere data is not structured or categorized; it is plain text (Chin and Chignell, 2006). With the advent of Twitter, microblog data also lacks context and structure.

Semantic Web technologies can be used to semantically annotate content and keywords of blog posts and microblogs to add meaning and structure to them. OpenCalais provides an easy-to-use platform to add annotations to blog posts and embeds RDFa into them (Corlosquet et al., 2009). The Twitter Annotation service takes tweet metadata such as location and time and annotates tweets to provide semantic metadata.

Other important projects that support semantic blogging are the Open Graph Project and Facebook Open Graph. These services help users to express preferences that improve the quality of the information provided in blogs (Rowe et al., 2009). People can add reviews and ratings to a movie or use Facebook’s “Like” button to show their preference, and Open Graph connects the objects and resources with their friends and people of similar interest. It helps to provide better recommendations. These projects follow the Open Graph Protocol and create the data in RDF, and the data can be queried to provide useful information.

The main benefit of semantic blogging technologies is that it is easier for users to publish their data in a structured format, and they can publish it cross-site because it is portable. Developers can quickly create mash-ups to provide better data visualizations by integrating data from different sites, because search and discovery are easy with structured data. This data can also be mapped to people’s FOAF profiles and SIOC data files. If the dataset has a SPARQL endpoint, it can be queried to find more information (Millard et al., 2010).

2.3.4.3 Semantic Wiki

Wikis are a good example of the collaborative creation and editing of emergent knowledge. There are a large number of people editing wiki pages, helping to maintain the quality of the information and preventing spam and irrelevant information (Chi, 2009). The structured information from Wikipedia info-boxes has already been converted into Linked Data within the DBpedia project, and this dataset is already one of the largest and most linked datasets in the Open Linked Data cloud.

DBpedia is very useful because it uses Wikipedia data, it is easy for anyone to read or write a Wikipedia article, and it solves problems machines cannot by using crowdsourcing. However, DBpedia only uses part of the data, because Wikipedia lacks proper structure and agreed semantics for the data and its categories (Auer et al., 2007).

The above problem can be solved by a semantic wiki where all the data is structured and properly categorized. This provides better search and discovery of information, as in OntoWiki. OntoWiki is a tool for agile, distributed knowledge engineering. It uses a similar approach to a wiki for browsing and authoring RDF knowledge bases. It offers an information map view of the stored information and an online editing mode for RDF data. Social collaboration is facilitated by keeping track of changes and allowing comments. People can also rate and measure the popularity of content. OntoWiki enhances browsing and retrieval by offering semantically enhanced search strategies (Auer et al., 2006).

Wikidata is another tool used within the other Wikimedia projects to provide well-maintained, high-quality data. Wikidata acts as a secondary database to Wikipedia. This means that instead of containing facts, it contains references for facts, including inconsistent and contradictory facts. This helps in representing the diversity of knowledge about a given entity. It supports multilingual datasets (Vrandečić and Krötzsch, 2014).

2.3.5 Semantic Search and Query

One of the strengths of the Semantic Web and Linked Data lies in searching for related information based on different categories and concepts. Semantic search uses the contextual meaning and relationships of keywords for information retrieval. It evaluates and understands the meaning of key phrases to find the search results (Guha et al., 2003).

Semantic search is a combination of conventional Information Retrieval (IR), web search and knowledge management methodologies. Semantic queries enable the retrieval of derived information based on the semantic and structural information contained in the data. The Suman system uses semantic search techniques to search for answers to unanswered questions. Hence, understanding different types of search techniques is an important part of the background literature review.

2.3.5.1 Information Retrieval

Traditional Information Retrieval (IR) methodologies are based on the occurrence of words in documents. This approach has been used in libraries, organizations and many document systems. It mainly uses keyword-based queries and returns a list of relevant documents containing those keywords, with different degrees of relevancy.

Some of the classic and widely used IR models are the Vector Space model (Salton et al., 1975), the probabilistic model (Eddy et al., 2009), Latent Semantic Indexing (Deerwester et al., 1990), machine learning based models (Sebastiani, 2002), etc. Many search systems use some form or combination of these models (Kosala and Blockeel, 2000).

Term frequency-inverse document frequency (tf-idf) is a widely used syntactic measure to determine the importance of a word based on its number of occurrences in a document, relative to its number of occurrences in the entire collection of documents (Salton et al., 1975). The Vector Space model represents text documents as vectors, and a relevance ranking of documents for a keyword query is calculated.
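
To make the measure concrete: in a common formulation, the tf-idf weight of a term t in a document d is tf(t, d) × log(N / df(t)), where N is the total number of documents and df(t) is the number of documents containing t. The sketch below, assuming scikit-learn, computes these weights and ranks toy documents against a keyword query by cosine similarity, in the spirit of the Vector Space model.

    # A minimal sketch of tf-idf ranking in the Vector Space model,
    # assuming scikit-learn; the toy documents are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "Java compiler error: cannot find symbol",
        "Best way to brew Java coffee beans",
        "How to iterate over a list in Python",
    ]

    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)  # documents as tf-idf vectors

    query_vector = vectorizer.transform(["java error"])
    scores = cosine_similarity(query_vector, doc_vectors)[0]

    # Rank documents by similarity to the query, highest first.
    for score, doc in sorted(zip(scores, documents), reverse=True):
        print(f"{score:.3f}  {doc}")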

A probabilistic model calculates the probability of the similarity of each document to the query (Eddy, 1998). Bayes’ theorem is often used as probabilistic inference to retrieve documents. Latent Semantic Indexing uses singular value decomposition techniques to identify relationships between keywords and concepts. Latent Semantic Analysis finds the concepts behind the words and correlates semantically related terms (Deerwester et al., 1990). This method retrieves documents that are conceptually similar to the search query.

Machine learning algorithms have also been applied in the area of IR. Langley and Simon (1995) identified five major paradigms that can be applied in this area: rule induction, instance-based learning, neural networks, genetic algorithms and analytical learning. These algorithms apply heuristics to generate structures that represent the relationships implicit in the data.

These are some of the important types of IR models and give a general overview of the different approaches to solving similar problems. There are many more models, but in this thesis and in the design and development of the Suman system an approach similar to the tf-idf model has been taken, so it has been discussed here.

2.3.5.2 Web Search

The IR technologies discussed above have also been used on the Web. The experience of the Web improved significantly when search engines started to crawl the Web and provided a centralised portal to discover webpages and information. Search engines augmented the existing methodologies with hyperlinks and started to rank the search results using algorithms such as PageRank (Lawrence Page and Winograd, 1999), HITS (Chakrabarti et al., 1999) and Citation Indexing (Lawrence et al., 1999).

Conventional search techniques were developed by augmenting word-based computational models with link analysis (Jiang and Conrath, 1997). Some of the currently popular models used in web page retrieval combine a content-based approach with link analysis methods. The content-based approach uses IR methods to analyse the content of web pages to find the best matches to the search query (Eiron and McCurley, 2003). The link analysis method is similar to the citation analysis used in the area of bibliometrics (Garfield et al., 1972). It analyses the hyperlinks that link the web pages, both in-links and out-links. It uses the Web’s graph structure to determine the importance of webpages and ranks the results accordingly. There are ways to detect link spam to filter out lower-quality content (Bharat and Henzinger, 1998). Li et al. (2012) also describe how the majority of links on webpages exist to organize information, not as recommendations; a path-based model is used to distinguish information organization from recommendation when ranking the search query. There is much research available in this area that describes the use of the Web’s graph structure to improve web search queries (Lawrence Page and Winograd, 1999; Kleinberg, 1999).
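
As an illustration of how link analysis ranks pages, the following sketch implements the basic PageRank power iteration over a tiny hypothetical link graph; the damping factor of 0.85 is the value commonly used in the literature, and the graph itself is invented for illustration.

    # A minimal sketch of the PageRank power iteration over a toy link
    # graph; the graph and number of iterations are hypothetical.
    import numpy as np

    # links[i] lists the pages that page i links to (0 = A, 1 = B, 2 = C).
    links = {0: [1, 2], 1: [2], 2: [0]}
    n = 3
    damping = 0.85

    rank = np.full(n, 1.0 / n)          # start with a uniform distribution
    for _ in range(50):
        new_rank = np.full(n, (1.0 - damping) / n)
        for page, outlinks in links.items():
            # A page shares its current rank equally among its out-links.
            for target in outlinks:
                new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank

    print(rank)   # page C ranks highest: it collects links from both A and B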

There are certain limitations to current web search methodologies. Text-based search provides results for exact keywords and phrases. This sometimes does not provide useful results for users who do not know exactly what they are looking for. This approach provides limited results for research-based queries, as described by Guha et al. (2003). The availability of well-structured, machine-understandable information and datasets offers opportunities to improve the traditional search methods (Guha et al., 2003), and these could be incorporated in PSNs.

The Suman system tries to overcome some of the issues and limitations of the web search approach, and this literature gives a general overview of the different types of algorithms and techniques used in web search. Some of these techniques are applied in the Suman system as indexing and ranking mechanisms. This section gives a summary of the important aspects of web search and of the different approaches that could be used in the Suman system.

2.3.5.3 Semantic Search

Semantic-based IR focuses on understanding the meaning of a document instead of calculating the frequency of words appearing in it. Semantic search focuses on understanding the concepts of the objects and finds meanings in the structured data (Guha et al., 2003). It utilises domain knowledge and ontology navigation to modify the query and apply the context of the topic to the search (Castells et al., 2007).

There are different research areas in semantic search, and there have been some attempts to classify and categorise them (Mäkelä, 2005; Mangold, 2007). Understanding these different techniques helps to choose the right approach for the Suman system, based on the dataset available and the outcome required. A brief summary of the different research methodologies is given below.

• Keywords to concept mapping - Early research in the Semantic Web added meanings and structure to text using an ontological approach or by finding similarities between words. (Li et al., 2003) and (Freitas et al., 2013) proposed methods to find similarities between keywords by using the WordNet (Fellbaum, 1998) thesaurus and ontology navigation to expand the query. Other approaches use Wikipedia categories (Han and Zhao, 2010). There are many systems that get synonyms and meronyms for keywords by graph traversal and utilize the concepts to broaden or constrain the search term (Moldovan and Mihalcea, 2000; Buscaldi et al., 2005; Varelas et al., 2005; Liu et al., 2004); a query-expansion sketch follows this list. Some other examples of these systems are TAP, Clever Search, etc. In TAP (Guha et al., 2003), keywords are matched against concept labels in an RDF repository. (Rocha et al., 2004) uses text search first, then RDF graph traversal finds related concepts using a spreading activation algorithm. (Airio et al., 2004) and (Grootjen and Van Der Weide, 2006) use the same approach, but their user interfaces allow people to browse the ontology.

• Ontology-based search - Data on the Semantic Web is divided into ontological data and instance data. The domain knowledge and relationships are described as class relationships, and the actual data is an instance of a class. This kind of search deals with efficiently locating the instances of a class. In the SHOE search system, users construct queries by attribute values of ontology classes (Heflin and Hendler, 2000). SEAL uses a class subsumption tree approach similar to Yahoo (Maedche et al., 2003). OntoViews-based portals (Mäkelä et al., 2004) and (Vandic et al., 2012) use a multifaceted search approach, in which multiple distinct views are provided into the data. Other examples are (Tran et al., 2007), (Ding et al., 2004) and (Kara et al., 2012).

• Graph patterns - Many kinds of complex queries can be formulated to find certain groups of objects connected by certain relationships. This is possible due to the graph patterns in the datasets, in which the objects are nodes and the relationships are the arcs that join them (Zeng et al., 2013). Such systems also use ontologies as directed graphs and exploit links between entities to provide navigation. The main problem with this is that the queries are not easy to formulate (Ferré and Hermann, 2011). SWSE is a graphical user interface for building graph pattern queries based on navigating the underlying ontology (Hogan et al., 2011). Multifaceted search portals provide a user interface for creating a very constrained subset of complex graph patterns. CS-AKTive used a Linked Data structure to provide a unified view of distributed datasets and used a reference ontology as a mediator (Shadbolt et al., 2004). RKB Explorer applied similar methodologies, but used heuristic methods for URI equivalence (Glaser et al., 2008); hence, it provides a consistent reference service.

• Logic and inference - One of the fundamental advantages of using ontological knowledge on the Semantic Web is making inferences based on rules to solve problems. Only simple applications have been created to apply this; actual implementations are quite rare. Rule-based systems are hard to implement because the Web works under an open-world assumption, while well-explored logic works in a closed world. Pellet (Sirin et al., 2007) and OWL-QL (Fikes et al., 2004) are DL reasoners and are simpler versions of this. OWL-QL uses if-then queries to reason. (Koutsomitropoulos et al., 2011) uses entailment-based queries and captures the meaning of the query in a reasoner-compatible format. The IRIS system allows repositories to be searched for articles (Wei et al., 2007). It uses an inference engine for reasoning with rules of computer science domain knowledge, where broader and narrower terms are defined in SKOS.

• Emerging knowledge - Graph patterns and property relations can be used to traverse from one resource to another along the connected path. Analyzing the relationships between resources can sometimes surface emergent knowledge. This is a hard problem to solve because the resources need to have interesting links between them, yet be general enough for complex hidden relationships to exist in the data. The Flink system helps find prominent researchers and experts in an area (Mika, 2005). It uses extraction, aggregation and visualization of social communities and professionals to find the experts. Ontocopi uses breadth-first and spreading activation search to identify communities (Alani et al., 2002). Another example is ArnetMiner, which uses a PLSA model to rank experts (Tang et al., 2007).

The Semantic Web is a wide area of research, and many systems combine multiple methodologies to get the best results. Semantic Web technologies apply tools to find simple concepts in the data and create relationships between them. The graphical nature of the objects, combined with fuzzy logic and keyword search, allows complex queries to be created. These ideas have been included in this thesis and applied in the Suman system. All the approaches that are part of the Suman system design are discussed in detail in the next sections.

2.3.5.3.1 Keyword Disambiguation: Semantic-based IR methods attempt to understand the meaning of keywords in order to match the best documents to queries. This is done by adding semantic tags to the texts of the documents and the queries. It is a hard problem to solve: too many non-relevant items produce false positive results, while the effects of synonymy and polysemy exclude too many relevant items, causing false negative results (Guha et al., 2003).

The Semantic Web addresses keyword ambiguity by adding ontology classes or other semantic tags to define the keywords. The keywords can also be extended using fuzzy logic to add spelling variations, synonyms and meronyms. WordNet and Wikipedia are good sources for keyword disambiguation (Moldovan and Mihalcea, 2000; Varelas et al., 2005). Many systems, like TAP (Guha et al., 2003) and AquaLog (Lopez et al., 2007), use these services to extend their knowledge of keywords.

This thesis deals with technology-related documents, where many terms overlap with common English vocabulary. For example, Java is a programming language as well as a coffee from the Indonesian island of Java. It is important to tag the keywords properly, so that queries about coffee and queries about errors in a Java program each find the right results.

There are many approaches to the keyword disambiguation problem. The field was built on the work of the computational linguistics community (Wilks and Stevenson, 1997; Schütze, 1998) and extended by the Semantic Web community (Moldovan and Mihalcea, 2000; Trillo et al., 2007; Varelas et al., 2005). SemTag uses a taxonomy-based disambiguation algorithm to annotate large-scale webpages (Dill et al., 2003). It applies the TAP knowledge base to find the context of the keywords. The platform developed by (Trillo et al., 2007) implements a similar approach, but uses the Swoogle (Ding et al., 2004) and WordNet knowledge bases, utilizing their ontology pool to extract the sense of the word.
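To make this disambiguation step concrete, the sketch below applies the classic Lesk algorithm, as implemented in the NLTK library over WordNet, to the Java example from above. This is an illustrative stand-in for the taxonomy-based approaches surveyed here, not the algorithm of SemTag or of the Suman system, and it assumes NLTK with its punkt and wordnet data packages is installed.

    # Word sense disambiguation sketch using NLTK's Lesk implementation.
    # Lesk picks the WordNet synset whose dictionary gloss overlaps most
    # with the words surrounding the ambiguous term.
    from nltk import word_tokenize
    from nltk.wsd import lesk

    q1 = "My Java program throws an exception in the main class"
    q2 = "Where can I buy good Java coffee beans from Indonesia"

    print(lesk(word_tokenize(q1), "Java"))  # expected: a programming language sense
    print(lesk(word_tokenize(q2), "Java"))  # expected: a coffee or island sense

The exact synsets returned depend on the WordNet version, which is one reason production systems back such heuristics with larger knowledge bases like TAP, Wikipedia or Freebase.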

There are also many machine learning approaches to the keyword disambiguation problem. (Wang et al., 2012) used unsupervised learning methods, relying on a predefined similarity matrix and heuristic rules to find the match. Supervised learning, where the algorithm infers a function from a training dataset, is the more widely adopted approach. (Zheng et al., 2012) and (Mendes et al., 2011) designed supervised classifiers to select and rank the best target. (Nebhi, 2013) and (Zheng et al., 2012) used Freebase, a vast entity database, to solve entity disambiguation using semi-supervised machine learning techniques.

Systems like Wikipedia Miner (Milne and Witten, 2012) and DBpedia Spotlight (Mendes et al., 2011) use Wikipedia and DBpedia data to annotate text and disambiguate its main topics. They make use of the rich hyperlinks available in the articles to compute similarities. They also use a spotting algorithm to find the context of the text and then map each spotted phrase to a Wikipedia article. Other systems that use Wikipedia are (Bunescu and Pasca, 2006) and (Cucerzan, 2007); they use the Wikipedia categories to improve accuracy. (Corlosquet et al., 2009) use the OpenCalais dataset to resolve name disambiguation for their content management system.

In the Suman system, the existing tools Wikipedia Miner and OpenCalais are used to perform keyword disambiguation.

2.3.5.3.2 Concept mapping Another widely used approach to improve semantic search is linking keywords to concepts and categories (Bhogal et al., 2007). This provides broader and narrower terms for the query and helps find better search results by exploiting the relationships between keywords and concepts (Trillo et al., 2007).

The Linked Data and Semantic Web communities usually use open datasets to map their keywords to existing, well defined concept knowledge bases. WordNet, Wikipedia and Freebase are the important datasets, as discussed in the section above.

(Li et al., 2014) and (Buscaldi et al., 2005) used a semantic query expansion approach based on the concept relationships from WordNet and Wikipedia. (Grootjen and Van Der Weide, 2006) and (Bhogal et al., 2007) used domain ontology knowledge for query augmentation and substitution. The SHOE search system uses hierarchical ontology knowledge to provide users with more filters for search (Heflin and Hendler, 2000). The SWED directory portal applies a multi-faceted search approach to navigate through different concept hierarchies (Matthews, 2005).

The Vector Space model was adapted to improve the ranking of documents by exploiting ontology-based knowledge (Fernández et al., 2011). (Rinaldi, 2009; Hao et al., 2008; Egozi et al., 2011) computed the semantic relatedness between defined concepts to rank the documents. The graph formed by the linked concepts can be used to find relevant documents by measuring the relevant graph paths, and an importance score is computed from the relevance between queries and documents (Brauer et al., 2009).

Co-occurrence of keywords can provide insight into the similarities of concepts and can be utilised to rank documents and search results (Rocha et al., 2004). (Lamberti et al., 2009) and (Kara et al., 2012) use a document semantic annotation graph to find semantic relationships between concepts for query matching. RKB Explorer uses the sameAs property to find similar concepts and aggregates them to show consistent references (Glaser et al., 2008).

The Google Knowledge Graph (Singhal, 2012) and Wolfram Alpha (Wolfram, 2009) use the CIA World Factbook (CIA, 2010), Freebase and Wikipedia to ground their concepts, and Bing has its own Satori knowledge base. These are widely used applications of the concept-mapping methodology.

Concept mapping is also implemented in this thesis to expand search terms in PSNs. All the concepts in the Suman system are mapped to the Wikipedia and OpenCalais datasets. Wikipedia categories are also used to categorize the concepts in the Suman system and expand the search terms, as illustrated in the sketch below.
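As an illustration of this kind of category-based expansion, the sketch below retrieves the categories of a Wikipedia page through the standard MediaWiki query API and treats the category names as candidate broader terms. It is a simplified stand-in for the Wikipedia Miner pipeline actually used in the Suman system, and it assumes the third-party requests package.

    # Fetch the (non-hidden) Wikipedia categories of a disambiguated
    # concept and use them as broader terms for query expansion.
    import requests

    def wikipedia_categories(title):
        params = {
            "action": "query",
            "titles": title,
            "prop": "categories",
            "clshow": "!hidden",   # skip hidden maintenance categories
            "format": "json",
        }
        resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
        pages = resp.json()["query"]["pages"]
        return [cat["title"].replace("Category:", "", 1)
                for page in pages.values()
                for cat in page.get("categories", [])]

    # e.g. ['JVM programming languages', 'Object-oriented programming languages', ...]
    print(wikipedia_categories("Java (programming language)"))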

2.3.5.3.3 Semantic Queries Semantic queries are used in Linked Data and, in some cases, on the Semantic Web for IR (Ferré and Hermann, 2011). They enable retrieval of both explicitly stated and implicitly derived information, taking advantage of the semantic data structure and the added semantics and relationships. The query engine can also perform pattern matching and reasoning to find answers to wide open questions.

SPARQL is a popular query language for formulating semantic queries, with a syntax similar to SQL (Prud'Hommeaux et al., 2008). It works on named graphs, triples or Linked Data. Named graphs are graphs in which a set of RDF statements is identified using URIs, which allows descriptions to be made of that set of statements, such as context, provenance information or other metadata. It is possible to use the graph structure to process the relationships between the objects and infer answers. General semantic search works on unstructured data with semantics, but a SPARQL query requires structured data represented in RDF serialisations such as Turtle or N-Triples. It is like a database query that requires precise relational-type operations; hence it supports features like comparison operators (greater than, less than, equal to), pattern matching and namespaces. It can also apply semantic rules, transitive relations and ontological subclassing, and perform contextual full text search. SPARQL engines also use graph traversal techniques to derive new information.
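A small example of such a query is sketched below: a SPARQL query executed from Python with the SPARQLWrapper library against the public DBpedia endpoint. The class and property names (dbo:ProgrammingLanguage, dbo:designer) are DBpedia's; the query itself is illustrative and unrelated to the Suman system's data.

    # Minimal SPARQL sketch: list programming languages and their
    # designers from DBpedia. Requires the SPARQLWrapper package.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?lang ?designer WHERE {
            ?lang a dbo:ProgrammingLanguage ;
                  dbo:designer ?designer .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["lang"]["value"], "designed by", row["designer"]["value"])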

RDF is stored in databases known as triplestores. Triplestores provide a flexible model to store graph data. Some triplestores are made by extending relational databases; they have an intermediate ETL process to Extract, Transform and Load the data (Rohloff et al., 2007). Some popular triplestores are Virtuoso 21, 4Store 22, Stardog 23, etc. The data provider sometimes provides a SPARQL endpoint to query the database. There are also Jena 24, Sesame 25 and Oracle RDBMS 26 based data storage solutions that provide a complete framework to store and process data and make inference-based queries. (Cyganiak, 2005) presents a relational model for SPARQL that uses relational algebra (join, projection) for selection. Federate is a relational database gateway that maps RDF queries to an existing database structure (Prud'hommeaux, 2004). Jena is a Java toolkit that provides complete support for manipulating RDF models (McBride, 2002).

21 http://virtuoso.openlinksw.com/
22 http://4store.org/
23 http://stardog.com/
24 https://jena.apache.org/
25 http://www.sesamedatabase.com/
26 http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/rdfsemantic-graph-1902016.html

There are also Ontology-Based Data Access (OBDA) systems that allow querying relational databases through conceptual representations mapped to an ontology of the domain (Kogalovsky, 2012). OBDA supports SPARQL queries, R2RML mappings, etc.; D2RQ and Ontop are examples of OBDA systems. Another way to query the database is graphical search. GRQL is a graphical user interface for building graph pattern queries based on navigating the ontology (Athanasis et al., 2004).

There has been a lot of research in the area of query optimisation and query construction to find the best results (Schmidt et al., 2010; Hartig and Heese, 2007; Tsialiamanis et al., 2012; Stocker et al., 2008). Different query engines apply different methods, or combinations of methods, to search and rank results. The DARQ engine decomposes the query into subqueries and forwards them to the query service (Quilitz and Leser, 2008), then combines the results received from the subqueries into the final results. SemWIQ works similarly and has a mediator service to distribute the queries (Langegger et al., 2008). Tabulator, a Linked Data browser, traverses RDF links to obtain additional related information about the resources (Berners-Lee et al., 2006a).

The SPARK system uses term mapping, query graph construction and query ranking to adapt keyword queries into formal logic queries (Zhou et al., 2007). Ontolook is another keyword-based search application that infers the possible relations between the keywords to improve the precision of the search (Li et al., 2007). SemSearch has a structured keyword query interface to hide the complexity of the system (Lei et al., 2006); it takes natural language queries and turns them into formal queries for reasoning, matching keyword entities against subjects, predicates or objects in the knowledge base. The Avatar semantic search engine takes advantage of annotations in classical keyword-based search methods (Kandogan et al., 2006). (Tran et al., 2007), (Wang et al., 2008) and (Rocha et al., 2004) have also adapted keyword-based queries into corresponding SPARQL-based formal queries. The Quick system allows users to incrementally add keywords to construct a query for any information in a domain (Zenz et al., 2009), combining the convenience of keyword queries with the precision of semantic queries.

The next step after making queries is ranking the results. (Li et al., 2014) added a two-step document ranking to the initial keyword-based search to improve the results. The limitations of keyword-based search are sometimes overcome by exploiting the domain ontology of the knowledge base. (Fernández et al., 2011) used a vector space model approach to find related ontologies and improve document ranking. (Hao et al., 2008) and (Rinaldi, 2009) compute document relevance by comparing the similarities of words using an ontology. (Daoud et al., 2010) and (Brauer et al., 2009) convert free text content into semantic graphs and use a graph matching algorithm to rank documents. (You et al., 2009) considered queries as concepts and documents as instances, then used ontology reasoning to calculate document relevance. These models use the semantic relations defined by the ontology for query expansion or semantic similarity calculations, and then rank the documents.

The keyword-only search approach is popular because of the ease of retrieving information. However, it lacks in-depth knowledge of the user's search intentions, and the search query itself lacks expressivity (concepts and topics). This can result in less effective ranking of results when the user's intentions are not completely clear. A hybrid approach to keyword search minimizes this issue (Rocha et al., 2004), as seen in cases where a keyword search is extended into expanded query terms using ontological knowledge, or where graph traversal is used to find related objects. Previous research suggests that adding semantic linkage to keyword-based search improves the accuracy of the search results.

Understanding these different approaches gives insight into how different systems work and how the same ideas can be implemented in the Suman system. The Suman system uses the Stardog SPARQL query engine, which uses the Lucene search engine for indexing and querying. The details of the Stardog query engine, how it works and how the indexing is improved are discussed in Chapter 4.

2.3.5.3.4 Expert Recommendation: There are many recommender systems available, and this is a huge area of research. Several surveys (Bobadilla et al., 2013; Felfernig et al., 2013; Park et al., 2012) give a good insight into the different recommender systems and models available. Early recommender systems were based on demographics, user behaviour and collaborative filtering. These recommender systems can be divided into three types (Bobadilla et al., 2013): content-based recommendation, where the user is recommended items similar to the ones they preferred in the past; collaborative recommendation, where the user is recommended items that people with similar tastes and preferences liked in the past; and hybrid approaches that combine collaborative and content-based methods. Nowadays, many other factors are incorporated, like social information, location and other personal information gathered through user profiling. These systems use many approaches, from tf-idf to clustering and nearest-neighbour methods; a minimal content-based sketch is given below.
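In the sketch below, each expert is represented by the text of their past answers, and candidates are ranked against a new question by tf-idf cosine similarity. The toy profiles are invented for illustration, and the sketch assumes the scikit-learn package; it is not the recommendation algorithm of any specific system surveyed here.

    # Content-based expert ranking: tf-idf profiles + cosine similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    profiles = {   # expert -> concatenated text of past answers (toy data)
        "alice": "java jvm garbage collection heap memory exceptions",
        "bob":   "python pandas numpy dataframe plotting",
        "carol": "javascript react dom events promises",
    }
    question = "How do I fix a memory leak caused by the Java garbage collector?"

    names = list(profiles)
    vec = TfidfVectorizer()
    docs = vec.fit_transform([profiles[n] for n in names])
    scores = cosine_similarity(vec.transform([question]), docs)[0]

    for name, score in sorted(zip(names, scores), key=lambda p: -p[1]):
        print(f"{name}: {score:.3f}")   # alice should rank first here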

In this thesis, the area of expert recommendation using Semantic Web and Linked Data technologies is studied. There are many recommendation systems that use Semantic Web techniques, like Sem-Fit for tourism (García-Crespo et al., 2011) and CHIP for digital museum collections (Aroyo et al., 2007), but this thesis focuses on expert recommenders in PSNs. (Fazel-Zarandi et al., 2011) use social network analysis and data representation techniques to map users to the domain data and recommend experts in the scientific field. (Venkataramani et al., 2013) rank experts by mining their activity across multiple domains like StackOverflow and GitHub to build a user model, and recommend experts based on their activity. (El-Korany, 2013) does something similar, using a cascade model to aggregate knowledge extracted from various online communities, the Vector Space model to find relevant content, and PageRank to rank the experts in a community.

In TEM, a probabilistic generative model is used to model topics and experts; users are ranked based on interest and expertise on different topics and recommended accordingly (Yang et al., 2013). (Davoodi et al., 2013) used Wikipedia knowledge to build a hybrid expert recommendation system that integrates the characteristics of content-based recommendation algorithms into a social network-based collaborative filtering system. (Stankovic et al., 2010) used Linked Open Data to link people from different platforms based on their activity and expertise and recommend them.

A similar approach is taken in the Suman system to generate user models based on users' activity on PSNs. Their interests are aggregated from their post history, and they are linked to Wikipedia and Linked Open Data, roughly as in the sketch below.
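A simplified sketch of this user-model construction is shown below: interests are aggregated by counting the tags on a user's post history, and the top tags would then be mapped to Wikipedia or Linked Open Data concepts as described in section 2.3.5.3.2. The post history here is invented for illustration; the Suman system's actual pipeline is described in Chapter 4.

    # Aggregate a user's interests from the tags of their past posts.
    from collections import Counter

    post_history = [               # toy post history
        {"tags": ["java", "spring"]},
        {"tags": ["java", "garbage-collection"]},
        {"tags": ["python", "pandas"]},
        {"tags": ["java", "jvm"]},
    ]

    interests = Counter(tag for post in post_history for tag in post["tags"])

    # Top interests, ready to be linked to Wikipedia/LOD URIs.
    print(interests.most_common(3))
    # -> [('java', 3), ('spring', 1), ('garbage-collection', 1)]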

The details of the Suman system and the semantic search and expert recommendation techniques used can be found in Chapter 4. The literature discussed above gives an insight into applications that take approaches similar to the Suman system's; it supports the design choices made and points to other research that is similar and can be useful.

Chapter 3

Purposive Social Network

A community is created when people collectively create and share information to provide a common source of knowledge (Lazar and Preece, 1998). The Web provides a virtual environment for people to create an online community. The communities have a purpose, are supported by technology, and are guided by norms and policies (Preece, 2000).

Online communities provide an easy platform for people to come together and participate. Another emerging trend on the Web and in online communities is using the power of the crowd to create knowledge. As discussed in section 2.2, community projects like Wikipedia provide a platform for people to come together and create a shared knowledge base. Successful online communities depend on users' participation, efficient moderation, interest in the topic and common ties between community members (Bishop, 2007).

Communities where people with common interests come together, solve problems, and create a self-sustaining and self-regulating community are defined as Purposive Social Networks (PSNs). Defining PSN using the scientific literature and existing examples of PSNs will answer KQ1.

In this thesis, the question and answer forums for programmers, StackOverflow and Reddit, are studied as examples of PSNs. The analysis of these two websites in the context of PSN, studying their community structure, user interaction and behaviour to find the motives and incentives used, will answer KQ2.

In this chapter, PSN is defined in more detail, and its characteristics and benefits are studied in the context of StackOverflow and Reddit.


3.1 What is a Purposive Social Network

There are many reasons to join and form communities; humans are social creatures and forming social ties is a natural instinct (Cooley, 1992). In the online world, people from any part of the world can connect together with a certain degree of anonymity and privacy (Bernstein et al., 2011), and can reach others with similar backgrounds and interests.

A Purposive Social Network can be defined, as the name suggests, as a social network with a purpose. It is created when people come together with a common interest and objective to share information, build a knowledge base, solve problems or achieve some common goal. The purpose could be defined by the community guidelines or by the people in the community themselves (Preece, 2000).

A PSN is a social network because people form ties, strong or weak, with each other. The relationships between people can be explicit, when they themselves create friendships, or implicit, arising from their interactions.

The main feature of a PSN is its problem-solving component. People form the community to solve a common problem: they have a clear goal, share a similar purpose and work towards the same objective. A PSN differs from other social networks in this regard; people communicate and work together towards the same goal, not just to exchange ideas with each other (Brabham, 2008; von Ahn, 2009).

A PSN is homophilous: people in the community have similar interests and work on a common problem. It may be a heterogeneous network involving individuals from different locations and backgrounds, but they have a shared focus and shared interests (Monge and Contractor, 2003; Garton et al., 1997; McPherson et al., 2001).

A PSN can be a communication network as well as a knowledge network. A communication network is analysed by studying the communication links used to seek information from knowledgeable others. Knowledge networks are the mechanism through which people share explicit and tacit knowledge (Monge and Contractor, 2003).

The emergence of a communication network can be linked to and embedded in a knowledge network. Groups of people share knowledge and information with each other, which might lead to an emergent knowledge base (Monge and Contractor, 2003; Kogut, 2000; Zettsu and Kiyoki, 2006).

A PSN relies on its users to solve problems and share knowledge, and can use the wisdom of the crowd for its purpose (Kittur et al., 2007; Shang et al., 2011). There are many systems that can be PSNs. This thesis focuses on the question and answering systems StackOverflow and Reddit.

On these websites, people answer each other's queries and solve programming-related problems using crowdsourcing.

3.2 Different types of communities in Purposive Social Networks

PSN is a term defined to describe certain types of communities. Most research on communities has been done on detecting their structure (Newman, 2004; Flake et al., 2002; Chin and Chignell, 2006). Some studies categorize communities based on user behaviour, privacy and community structure (Zhang et al., 2007; Plant, 2004; Angeletou et al., 2011).

This section categorizes communities based on their role and on how the community provides users with an objective to solve problems (Lazar and Preece, 1998; Backstrom et al., 2006). These communities can have many features in common, but they may also have some distinct features of their own that are not mentioned in this section. Understanding the different types of communities that could be PSNs helps to answer KQ1 better. It should be noted that a community can belong to one or many of the categories mentioned below.

3.2.1 Information Based Community

An Information Based community is a community that shares a particular type of information or resources. In these communities, people come to share their knowledge and gather more information on a matter. IRC channels, forums and mailing lists are some examples of such communities. The communication is fast (IRC) or delayed (mailing lists) depending on the communication medium.

The main feature of this community is information diffusion; content is the focus of the community. Nowadays, with the popularity of Twitter and Tumblr, it is easier to share information using retweets and reblogs. Here, the network cascade effect is strong, meaning information can pass from one person to another through their social network. This flow of information facilitates content going viral (Kwak et al., 2010). This kind of community generally has many new people joining to get some information, but the rate of return is low (Tantipathananandh et al., 2007).

3.2.2 Interest Based Community

An Interest Based community is a community where users come together to share their hobbies or interests and form a network. The number of people in the network is generally stable, and they actively participate in discussions. Online role-playing and gaming communities are typical examples of this type of community. Researchers are taking advantage of the skills and interest of the gaming community to solve scientific problems through human computation (Khatib et al., 2011; von Ahn and Dabbish, 2004). Foldit and Planet Hunters are some of the projects that take advantage of these communities.

Forums and boards are another good example. They provided a place for people to meet anonymously and discuss their issues. This was the beginning of the virtual community, where face-to-face communication was replaced by messaging boards, chat rooms, and question and answer forums. These online communities were formed based on interest and common objectives, and people from all around the world could come together and communicate about a particular topic of interest (Adamic et al., 2008).

3.2.3 Expert Based Community

An Expert Based community is formed by experts in a field who come together to discuss a common matter or to solve a common problem. They do not have to have complete knowledge of the topic, but a good understanding of the subject; they might have previous experience with the topic that they share with the community. These communities can also be considered knowledge networks, where knowledge flows from one person to the next. This type of community puts a lot of emphasis on the quality of the knowledge shared among the community (Zhang et al., 2007; Zettsu and Kiyoki, 2006).

StackOverflow is a good example, where experienced developers answer programming questions; the Stack Exchange network has different sites for different expert communities. Wikipedia is also an emergent knowledge base where people with expertise on a topic contribute to it (Kittur et al., 2007).

3.2.4 Location Based Community

A Location Based community is a community where people come together for a geographical reason. This type of community is created where proximity is important; such communities often form because of social issues concerning a region.

Neighbourhood watch groups, yoga and running groups, and support groups are examples of this type of community (Smith, 2008). They are region-based, and people not only interact online but also in person.

StackOverflow and Reddit are examples of expert based communities, and data from these websites are used in the Suman system to study them in detail.

3.3 Properties of Purposive Social Network

There are no fixed types and features of PSNs, but certain features are prominent. The network structure of PSN communities is the same as in any other social network, but some attributes and properties make them different (Katz et al., 2004; Plant, 2004). These are not intrinsic properties; they are characteristic properties that a PSN may or may not have. Understanding these special properties of PSNs helps to answer KQ1. They are as follows:

3.3.1 Community Size

People with similar objectives and purpose create a PSN community. It does not have to consist of a large number of people; community size varies across different websites and their user bases. It is often the case that a small number of people come together to share knowledge and solve a problem.

For example, popular tags in StackOverflow and popular subreddits have hundreds of thousands of users in the community, while a relatively niche community might only have a few hundred users associated with it.

3.3.2 Focused Interest

A PSN community focuses on a common objective; hence, the community often does not have a broad range of interests or activities associated with it. They are not aimed at everyone. Small groups of people with a shared interest come together to solve a very specific problem. Usually they have a focused interest and create focused knowledge. For example, Reddit has specific subreddits for different topics, and each subreddit has its own community of users.

3.3.3 Direct Communication

A PSN is generally a communication network. People do not have strong and explicit ties with each other (Monge and Contractor, 2003). However, it is easy to communicate with people in the community. Each individual member shares information and communicates directly with other members, either broadcasting a message for everyone to see or communicating directly with one another.

3.3.4 Active Participation

PSN communities have focused goals and interests, and the number of people in their networks varies. In small communities, all members are relied upon to sustain the community. Active participation by users keeps the community active, and everyone comes together to solve the common problem.

It is often the case that there are large numbers of non-participating users (lurkers) and small numbers of enthusiastic active users (posters). The posters are generally experts in the area and provide useful and insightful information. Lurkers use the information to learn and build their knowledge (Preece et al., 2004).

3.3.5 Short Lifespan

Any given PSN community has a focused interest and a particular purpose. Most threads and posts exist for a short period of time to discuss the problem; once the problem is solved, the posts are closed and the members disperse.

The main posts in the community are ephemeral and have a short lifespan, although in some cases they persist and provide information about the community. Membership lifespans also vary: some people only come to have their problem solved, while other expert participants stay active for a long time.

3.3.6 Strong Incentive

PSN communities rely on users who actively participate. Posts and threads are active for a short amount of time, so only interested people participate. Users focus on solving a particular problem and spend time and effort on it. Strong incentives are provided by the websites to motivate people to join and participate in the community.

The community would not sustain itself in the absence of sufficient motivation. Users need motivation to contribute, and if they receive proper incentives they will come back and contribute to the community. Some websites give contributors virtual points for their participation; others provide badges and other incentives to keep them motivated.

These characteristics of StackOverflow and Reddit are studied in detail in Chapter 5.

3.4 Benefits of Purposive Social Network

People are spending more and more time online. They connect with their friends and families, and do their work and shopping online. They use other users' recommendations and experience to make decisions and solve their own problems. Social network analysis states the six degrees of separation theory (Milgram, 1967): each of us is connected to any random person in the world through a chain of about six acquaintances. (Backstrom et al., 2012) showed that the Facebook network has around four degrees of separation. However, in the world of blogs, forums and messaging boards this barrier is broken: one does not have to know anyone personally, professionally or at all to interact, comment or reply to one another (Monge and Contractor, 2003). This provides many benefits to the communities and reasons for people to participate. The main motives for people to join such communities are discussed below.

3.4.1 Information Exchange and Self-interest

Online communities are a good place to share and exchange information because the Web is available to everyone, anywhere in the world. People come together and join a forum or a discussion board to share knowledge about their hobbies, health or politics. People use different websites to share different kinds of information, from Huffington Post 1 for sharing news to LiveJournal 2 for blogging about any topic.

In the area of E-Government, governments (e.g. the USA with data.gov 3, the UK with data.gov.uk 4) are publishing statistical and other Public Sector Information (or Open Government Data, OGD). People with different objectives come together, use this open government data and collaborate. They make apps to prevent crime using crime statistics for their geographic area, or create support groups based on the health and morbidity statistics of their nearby hospitals. Websites like FixMyTransport 5, PatientsLikeMe 6 and WhereDoesMyMoneyGo 7 are examples where people come together to utilize public data to solve social and political problems.

Social theorists consider self-interest a major motivation for social action. People maximize their social capital in a community that they later expect to profit from (Lin et al., 2001). People contribute to the community for altruistic reasons as well as when they know they will gain something from it. People answer questions on StackOverflow to boost their points and improve their profiles, which later lets them showcase their skills and find jobs.

1 www.huffingtonpost.com/
2 www.livejournal.com/
3 http://www.data.gov/
4 http://www.data.gov.uk/
5 www.fixmytransport.com/
6 www.patientslikeme.com/
7 wheredoesmymoneygo.org/

3.4.2 Symbiotic Relation and Social Exchange

People also communicate with their friends and family members using online communities. They may come together to form a neighbourhood watch program or start a political protest. People often go to forums to exchange recipes or to swap expensive tools they need for a short time. It is a mutually symbiotic relationship in which both parties come out better off. The BookMooch 8 and BarterPalace 9 web services are examples of this mutually beneficial relationship between people in the online community (Ridings and Gefen, 2004). This mutual interest improves the quality of the communal knowledge.

In social networks, people often measure their importance by the number of friends they have or by the number of famous or important people who are part of their network (Lin et al., 2001). People also use social networking platforms to advertise their products and skills. The more people they have access to, the more benefit they get from the communities and their social network. Twitter and Facebook are good examples where people expand their social circle by connecting to people who increase their social capital (Wellman et al., 2001; Ellison et al., 2007).

3.4.3 Social Recognition and Personal Satisfaction

People use online communities to get information and share knowledge. Some people in the community have more experience and knowledge about certain topics and are considered experts in the community. People use online communities to gain knowledge from the experts (Zhang et al., 2007). Some communities come together to create knowledge for the benefit of the whole community; Wikipedia is a prime example. The expert users create most of the wiki articles, and other users modify them and maintain the quality of the content. The person with the most knowledge and up-to-date information gets social recognition and is considered an expert (Huffaker, 2010).

People not only gain social capital and status with recognition, they also take pride in and gain satisfaction from helping others. The respect they gain by showing and sharing their expertise with their peers sometimes makes them power users. Online communities are a good place for people to build a reputation by sharing their expertise. This helps the community achieve its goals and share information, and motivates users to participate actively (Anderson et al., 2012).

8 bookmooch.com/
9 www.barterpalace.com/

3.4.4 Recommendation System

People often leave reviews and recommendations of products on their blogs, in forums or on e-commerce websites like Amazon. The same is true of recommendation websites for music, movies, restaurants, etc. People come together to rate and discuss the quality of places or products, and also to get recommendations from each other. These websites also take advantage of the behaviour of the whole community to recommend resources to individual users based on their actions (Angeletou et al., 2011). Recommendation systems take advantage of crowdsourcing and user generated data to ascertain the quality of products. They also collect data to create user models and use them for marketing and targeting ads to users (Shang et al., 2011).

3.4.5 Expert Finder

People often go online to find solutions to their problems. An example of this is seen in the software development community, where people go to ask for help and solutions to bugs in their programs.

People use specific forums for programmers, like StackOverflow, that have an extensive community of developers behind them. These communities help people with their code, bugs, system analysis, design, etc. Online communities are a good place to find experts, but it is also hard to find the right expert; sometimes, on popular websites, questions get buried under the amount of data created every day (Evans and Chi, 2008; Anderson et al., 2012).

These benefits help in developing PSNs and making them self-sustaining. Some of the benefits of StackOverflow and Reddit are studied in the Suman system.

3.5 Challenges in Purposive Social Network

PSNs provide a platform for people with similar interests and objectives to come together and form communities. It is necessary to motivate people in the community to contribute and collaborate in order to maintain a problem-solving system. Websites using crowdsourcing are, in some cases, good examples of PSNs, especially where people are contributing to and creating a knowledge base and utilizing the power of the network to achieve common goals.

PSNs are dependent on users' contributions, and self-sustaining systems are difficult to model. An efficient PSN requires motivated users who actively contribute, so finding and retaining users and keeping them motivated with enough incentive to contribute is vital (Treude et al., 2011). These systems also use crowdsourcing to manage and moderate the community and do quality control. The main challenges of building a strong PSN using crowdsourcing are discussed below.

3.5.1 Recruiting and Retaining Users

The three main parts of building a strong online community are starting the community, encouraging early interactions between members and moving to a self-sustaining interacting environment (Andrews, 2002). PSNs utilize users' participation and expertise to create and maintain a community. Question and answer forums like StackOverflow and collective knowledge generating systems like Wikipedia rely on a large amount of user participation to create an active community.

Generally, the users in a community can be distinguished as posters or lurkers. Posters are active users who frequently post and create content; lurkers are passive users who do not contribute but consume the content. In knowledge generating and question and answering communities, the posters are the experts who generate the knowledge, and the lurkers are the ones who ask questions and learn from it (Nonnecke et al., 2006). Some research has been done on motivating users to be more active and create content (Preece et al., 2004; Singh et al., 2009).

The community thrives when it reaches a critical mass and becomes self-sustaining (Raban et al., 2010). The primary means of attracting an initial base of users who will drive the growth is the quality of the content. Another factor is providing a clear context within which the community can develop (Marathe, 1999), which can be achieved by having a focused interest and purpose.

There are several methods to get users to contribute, such as paying users for their knowledge and expertise (Buhrmester et al., 2011). Another way is to make contribution a requirement, as in reCAPTCHA, where users have to transcribe an image to finish their task. The popular option is asking for volunteers and giving them incentives to contribute (Zhou, 2011; Lampe et al., 2010).

One of the popular ways to attract new users is to make the platform free and easy to use. Systems that target a specific group with a unique interest also gain new users by having a focused interest and by using their users' networks to expand (Ludford et al., 2004). These systems are open, so people can contribute easily. Websites like Wikipedia, StackOverflow and YouTube are content specific and free; people volunteer and contribute to these websites and create a vibrant community of like-minded people. The downside is that it is hard to predict how many people will actually participate and contribute (Zhou, 2011). These systems require a good incentive model that keeps users motivated enough to contribute and maintain the quality of knowledge (Vassileva, 2012).

3.5.2 Incentive Model

Designing an incentive model for a PSN is complex. The system should make it easy for people to contribute, but also keep track of the quality of the content created. The game theory approach uses a proportional mechanism to rank the content: good quality content is rated higher and shown at the top of the page, and is thereby rewarded with more viewers. This generates more active participation, while low quality contributors are deterred by their content being pushed to the bottom of the page. People who participate and generate high quality content are appropriately rewarded for their content as well as for their good or bad behaviour (Ghosh and Hummel, 2011).

The game industry uses the instant gratification model to incentivize and motivate users to participate and contribute more. They make the whole process of creating content an enjoyable, game-like experience, as in the case of the ESP Game (von Ahn and Dabbish, 2004), and the user is motivated to perform more tasks.

Question and answering systems rely on the user's desire to be recognized. They provide different methods to measure and display reputation or expertise in a particular field. When a user establishes a reputation and is recognized as an expert in the area, they generate more quality content and are motivated to participate in the community (Richardson and Domingos, 2003).

To sustain the quality of the content, there should be positive reinforcement for good quality contributions. Users who generate quality questions and prompt answers are rewarded for their contributions with badges or points; these users stay motivated and active in the community (Anderson et al., 2013). Similarly, people who spam, create poor quality content or ask repetitive questions should be given negative points, and their content should be filtered out through quality-based entry restrictions. Having high quality content brings users back, and they become more careful with the quality of their submissions (Ghosh and McAfee, 2011).

Another way to encourage user participation is to give ownership of the content to its creator. This entitles users and makes them responsible for the maintenance and quality of the product. Creating a competitive environment, where more contribution makes a user the top contributor of a category, also ensures a higher rate of return and contribution from participants (Singh et al., 2009).

An approval-voting scoring rule and a proportional-share scoring rule can, under certain conditions, enable the most efficient equilibrium with complementary information, by providing incentives for early responders as well as for the user who submits the final answer (Jain et al., 2009). Under the approval-voting scoring rule, the asker gives one point to each of the best answers; this helped in eliciting both complementary and substitute information. Under the proportional-share scoring rule, the asker gives each user a share of the available points proportional to the user's answer; this helped in eliciting more substitute information. A minimal sketch of both rules follows.
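The two rules can be stated compactly in code. The sketch below assumes that each answer already carries a numeric quality score (in practice this would come from the asker's judgement or community votes); it is a toy illustration of the mechanisms described by (Jain et al., 2009), not their exact formulation.

    # Two scoring rules for distributing points among answerers.

    def approval_voting(scores):
        """One point to every answer tied for the best score."""
        best = max(scores.values())
        return {user: int(s == best) for user, s in scores.items()}

    def proportional_share(scores, budget=100):
        """Split a fixed budget in proportion to each answer's score."""
        total = sum(scores.values())
        return {user: budget * s / total for user, s in scores.items()}

    answers = {"alice": 5, "bob": 3, "carol": 2}
    print(approval_voting(answers))     # {'alice': 1, 'bob': 0, 'carol': 0}
    print(proportional_share(answers))  # {'alice': 50.0, 'bob': 30.0, 'carol': 20.0}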

3.5.3 Quality Control

Maintaining the quality of user generated content is often challenging in PSNs. High quality content is necessary during the formative lifespan of the community to reach critical mass: good quality content attracts the initial base of users, who then attract their own networks to join the community. The rate of information diffusion is higher for high quality content (Kwak et al., 2010). This helps grow the user base and makes the community self-sufficient (Lin and Lee, 2006).

Many websites with a large amount of user-generated content depend on joint community action to rank the content according to quality by collective voting. This controls the quality of information displayed on the web page: higher quality content is displayed more prominently, lower quality content is suppressed, and spam is removed. Users give thumbs-up or thumbs-down style ratings on questions on StackOverflow, reviews on Amazon, and posts on Reddit (Agichtein et al., 2008; Anderson et al., 2012).

These websites display higher quality contributions more prominently by placing them near the top of the page and pushing lower quality ones to the bottom. Since content displayed near the top of the page is more likely to be viewed by a user, ranking good content higher leads to a better user experience. Another benefit is that it also provides an incentive to produce high quality content, appealing to a contributor's desire for attention (Jain and Parkes, 2009).

The rank order mechanism can be used to influence the quality of the content. Research has shown that the game theory model can motivate attention-driven users to generate higher quality content, creating a better environment for information distribution and sharing. Users who generate higher quality content are featured prominently on the page, and the proportional mechanism distributes attention in proportion to the positive votes received. This creates a game-theoretic equilibrium that facilitates higher quality posts and rewards users accordingly, creating a large incentive to participate in voting and contributing (Ghosh and Hummel, 2011).

A text analysis of user content can also determine the quality of posts. A post's punctuation, grammar and typos can be analyzed to estimate the knowledge and expertise of the contributor. Also, the syntactic and semantic complexity of the text gives an approximation of the overall knowledge of the user and their proficiency with the topic (Agichtein et al., 2008).

3.5.4 Search and Discovery of Quality Content

The amount of content created in a user generated knowledge system can be huge, which makes it difficult to find the high quality content. There are many algorithms and models used to filter the best content from the mass, and they are discussed below.

Link analysis and link-based ranking are used in blogs and other social networking systems to estimate the quality of the information. PageRank (Lawrence Page and Winograd, 1999) and HITS (Kleinberg, 1999) are the prominent ranking algorithms used in this method. The network graph formed by the analysis shows the flow of information and the relationships between people. Mutual reinforcement facilitates good blogs connecting to other good blogs, and good questions getting good answers. A learning framework that uses both relevance and quality for ranking is effective at retrieving facts and trivia in large-scale systems (Bian et al., 2008). These methods can be used to filter high quality material from the large collection of information available.
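To illustrate the link-analysis idea, the sketch below builds a toy directed graph, in which an edge from u to v means user u's question was answered by user v, so answerers accumulate authority, and computes PageRank with the networkx package. The edges are invented for illustration.

    # Link-based ranking sketch: PageRank over an "asker -> answerer" graph.
    import networkx as nx

    G = nx.DiGraph()
    G.add_edges_from([
        ("dan", "alice"), ("erin", "alice"), ("dan", "bob"),
        ("alice", "bob"), ("erin", "carol"), ("carol", "alice"),
    ])

    ranks = nx.pagerank(G, alpha=0.85)   # 0.85 is the usual damping factor
    for user, score in sorted(ranks.items(), key=lambda p: -p[1]):
        print(f"{user}: {score:.3f}")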

User voting and tagging are another use of crowdsourcing to search for and discover appropriate information. Users vote the best questions and answers to the top of the web page, making it easier for people to discover the information. People also tag content with appropriate keywords and categorize information, which makes it easier to browse related content (Agichtein et al., 2008; Amitay et al., 2009).

User ratings and recommendations are also used to search for and discover high quality content, as on Amazon, where books and other items are recommended based on what users who bought an item also bought. Likewise, on IMDb 10 and Rotten Tomatoes 11, movie recommendations are given based on users' ratings and reviews. These websites track popular user behaviour and utilize the information to give recommendations to the rest of the population (Linden et al., 2003).

StackOverflow and Reddit use these aspects of crowdsourcing to maintain the contents and keep the community growing.

3.6 Case Study of Purposive Social Network

The earlier sections define and describe different types of PSNs, provide insights into the challenges they face, and outline the benefits of a PSN once the community is growing and functioning. One of the research questions of this thesis, KQ2, is to study PSNs. This is done by studying StackOverflow and Reddit community data. In this section of the thesis, StackOverflow and Reddit data are analyzed to get more information about PSNs.

10 www.imdb.com/
11 www.rottentomatoes.com/

The data of StackOverflow and Reddit are analysed using the empirical cycle in design science. They are studied in the context of PSNs.

The main research problem here is one of the research questions, KQ2. The sections below study how PSNs are formed, how they are maintained and what incentivises users to join and participate in the community, based on StackOverflow and Reddit community data. The main aspects studied here are the social network, communication ties, user behaviour, and how the community is moderated and the quality of content maintained.

The main research design here is aggregating and statistically analysing StackOverflow and Reddit data to understand user behaviour and incentive design. Community ties and network structure are studied using social network analysis. All the data are empirical; they follow standard research practice and are valid.

The main research execution and result analysis are discussed below for both StackOverflow and Reddit.

3.6.1 StackOverflow Analysis

The StackOverflow website is well structured and provides a lot of information about the questions, answers and users. It should be noted that the data were analyzed by the author using various statistical analysis tools; this was not done automatically by the Suman system. This analysis gives some insight into community size and users' activity.

Post Type                Number
Questions                7,220,639
Answers                  12,939,535
Registered Users         3,155,396
Tags                     36,892
Unanswered Questions     2,781,360
Badges                   9,953,324
Votes                    63,167,067
Comments                 30,104,484

Table 3.1: StackOverflow at a glance as of June 2014

As of June 2014, there were over 3 million registered users on StackOverflow, and more than 7.2 million questions had been asked. The questions are categorized using tags, and individual users can subscribe to tags to receive daily, weekly or monthly emails of all the questions asked under the tag. There are more than 36 thousand tags associated with various questions and answers. Users have cast more than 63 million votes to mark the quality of questions and answers.

3.6.1.1 Question and Answers

Figure 3.1: StackOverflow: Questions and answers posted every month

Figure 3.1 shows the number of questions asked and answered by users each month from 2008 to June 2014. The trend shows the popularity of StackOverflow and its user growth over time. On average, 1.03 million questions and 1.8 million answers were posted each year.

Figure 3.2: StackOverflow: Histograms of questions and answers posted by day of the week

The analysis of posts shows that programmers prefer to post questions and answers on weekdays, when they are working and encounter problems in their programs. Most questions and answers are posted on Wednesday.

Looking only at the answered questions, on average a question receives 2.91 answers, and 0.12% of questions receive more than 15 answers. So good questions do get a good response from the community.

The analysis of unanswered questions shows that, despite high user feedback and participation, 38% of the questions on the StackOverflow website have no accepted answer and 8% of the questions have no answer at all. These questions are either repeated questions or too vague in their description. When a question is not specific, moderators flag it and users ignore it.
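The kind of computation behind these figures can be sketched as follows, streaming questions from the Posts.xml file of the public StackOverflow data dump, whose rows carry PostTypeId (1 for questions), AcceptedAnswerId and AnswerCount attributes. This is an illustrative reconstruction; the thesis analysis used its own statistical tooling.

    # Share of questions with no accepted answer / no answer at all,
    # computed from the StackOverflow data dump (Posts.xml).
    import xml.etree.ElementTree as ET

    total = no_accepted = no_answer = 0
    for _, row in ET.iterparse("Posts.xml"):   # stream; the dump is large
        if row.tag == "row" and row.get("PostTypeId") == "1":
            total += 1
            if row.get("AcceptedAnswerId") is None:
                no_accepted += 1
            if int(row.get("AnswerCount", "0")) == 0:
                no_answer += 1
        row.clear()   # release memory as we go

    print(f"no accepted answer: {100 * no_accepted / total:.1f}%")
    print(f"no answer at all:   {100 * no_answer / total:.1f}%")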

Studying this trend of user activity gives insight into user behaviour and the best practices for keeping the community engaged. This behaviour can be modelled and influenced to make the community more active and engaging (Zhou, 2011).

3.6.1.2 Users

Users who contribute to the website are the main actors of these PSNs. There are more than 3.1 million registered users on StackOverflow, and they ask questions, answer them, vote on them and moderate the community. Users are not directly linked to each other; in this network, relationships are formed by their interactions and contributions. User behaviour, motivation to use the website and incentives to contribute are described below.

Figure 3.3: StackOverflow: User reputation histogram

Despite the high volume of content generated by users, 53.12% do not interact with or contribute to the website; they have only the 1 reputation point they received when joining. Another problem is that new users are not familiar with the website etiquette and do not follow the community rules, which causes their questions to be ignored. Analysis of user activity shows that 19.74% of users asked only one question and never came back to the website, and this percentage is even higher among unanswered questions: 17.8% of unanswered questions were asked by first-time users who received no response and never returned. This shows that users who get no response from the community do not return.

Understanding what makes users contribute to the community and what improves their experience will help in creating a more engaging and active community (Lampe et al., 2010).

3.6.1.3 Tags

On StackOverflow, communities develop around shared interests and topics, which can be studied using the popularity of tags. Questions and answers are given tags to categorize and arrange them for easy search and discovery. The entire website is categorized using tags, and the most popular tags are shown in Table 3.2. Figure 3.4 shows the weekly use of the popular tags. The number of questions asked under each tag also provides insight into the languages popular with developers at the time.

Tag          Number of Questions
C#           642,126
Java         641,667
Javascript   618,902
PHP          578,830
Android      497,080
Jquery       481,129
Python       307,614
HTML         302,990
C++          292,515
MySQL        248,507

Table 3.2: StackOverflow: Ten most popular tags and their instance counts

Figure 3.4: StackOverflow: Tag trends per week for the 5 most popular tags

The analysis of questions shows that each question has between one and five tags associated with it; most questions (54.7%) have 3 or 4 tags. The relationships between tags show how the networks overlap and are tied to one another, as discussed in detail in the next section; a simple co-occurrence analysis is sketched below.
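A simple way to examine these tag relationships is to count how often pairs of tags co-occur on the same question; frequently co-occurring pairs indicate overlapping communities. The question list below is invented for illustration.

    # Tag co-occurrence counts across questions.
    from collections import Counter
    from itertools import combinations

    questions = [                       # toy tag lists
        ["java", "spring", "hibernate"],
        ["java", "android"],
        ["javascript", "jquery", "html"],
        ["java", "android", "sqlite"],
    ]

    pairs = Counter()
    for tags in questions:
        pairs.update(combinations(sorted(tags), 2))

    print(pairs.most_common(3))
    # -> [(('android', 'java'), 2), (('hibernate', 'java'), 1), ...]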

Another problem noticed with the unanswered questions is that they are not properly tagged. A quick analysis of the unanswered question tags shows that 35.2% of unanswered questions have only 1 or 2 tags. This difference in tagging could also be a cause of questions not getting an answer: the more tags a question has, the more communities of people can view it, so the chances of it getting an answer increase. Also, the more specific the tags a question has, the more specific the experts who can view it and give a detailed, higher quality answer. New users make this mistake and lose out on quality users and an expert audience. The details of the relation between the number of tags and answered versus unanswered questions can be seen in the table below.

Number of Tags used   Unanswered Questions (%)
1                     11
2                     24
3                     29
4                     21
5                     14

Table 3.3: StackOverflow: Percentage of unanswered questions with number of tags

Table 3.3 shows the relation between the number of tags and whether a question gets answered. It does not, however, capture questions with wrong tags; in that case, the question reached the wrong community of users and was ignored because it wasn't in their area of expertise.

During the initial analysis of the data, an attempt was made to determine whether users had used the wrong tags for their questions. There was no accurate way to answer this question, so a simpler approximation was used. First, a quick comparison was done between the number of unanswered questions in big, general tags like Java, Python, etc. and in specific versions of those tags like Java-EE, Python 3.1, etc.

The analysis showed that, per 1000 questions, the percentage of unanswered questions was lower in the specific tags than in the general tags. Table 3.4 gives a detailed breakdown of unanswered questions in the top 3 tags and their specialized tags.

This doesn't provide a conclusive answer, as correlation doesn't mean causation, but it does give some insight into the problem. One explanation is that the general tags get many more questions each day, so questions can easily get buried and never reach the right people. Jain and Parkes (2009) show that in these forums most users usually read only the top page, and the view count declines steeply on subsequent pages.

The other possible explanation is that the big communities are heavily moderated, and if a moderator sees any problem with a post, they quickly remove it. These pages also have many more moderators, and while each moderator must stick to the community rules, they have their own views as well.

Tag           Question count   Unanswered questions   Unanswered (%)
Java          641667           42136                  6.566
Java-7        1596             149                    9.33
Java-EE       17903            1497                   8.36
Javabeans     1968             171                    8.68
Android       497080           55037                  11.07
Android-4.0   1206             163                    13.5
Apk           2222             241                    10.84
Adb           2040             211                    10.34
Python        307614           16698                  5.42
Python-2.7    19264            1549                   8.04
Python-3.x    12499            578                    4.62
Python-idle   579              44                     7.59

Table 3.4: StackOverflow: Percentage of unanswered questions in general tags vs specialized tags

The analysis shows that the percentage of questions removed by moderators is higher in the general tags (2.85%) than in the smaller tags (0.9%).

Another view is that users who answer questions on general tag pages pick the questions best suited to them from the big heap; with limited time, they see no incentive to look through every question and answer as many as possible.

3.6.1.4 Votes and Reputation

Another reason more questions remain unanswered in popular tags is that users can get more votes by answering questions in the bigger tags than in the smaller tags. The analysis of votes is shown in the table below. Here, a general tag is one of the top 10 tags with more than 100,000 subscribers, and specialized tags are the smaller tags of the top 10 general tags, with fewer than 20,000 subscribers.

Answer votes   General Tags % (Java)   Specialized Tags % (Java EE)
0              25.15%                  18.1%
1              29.31%                  36.78%
>1 - <=3       27.02%                  28.92%
>3 - <=5       9.5%                    8.68%
>5 - <=10      4.04%                   4.72%
>10            3.96%                   1.94%
<0             1.02%                   0.86%

Table 3.5: StackOverflow: Percentage of answers' vote counts in general tags vs specialized tags

This analysis does not confirm the quality of the answers, just the votes. It cannot be said that the quality of answers is better in bigger tags than in smaller tags; the difference could simply be due to the number of users in each tag and how many of them vote on questions and answers. Users' behaviour changes based on the size of the community (Zhou, 2011).

3.6.1.5 Communication network structure

StackOverflow is not a typical social networking website per se, as users cannot create explicit friendships or follow other people's work, and they cannot send private messages or form groups.

On StackOverflow, social ties are formed implicitly by users asking questions and answering them. These social ties are also formed by voting on posts and commenting on them. The communication network is studied to see the social ties of individuals (Monge and Contractor, 2003).

Figure 3.5: StackOverflow network structure

For the network analysis of the StackOverflow dataset, the communications between users are studied. Here, the users who ask questions, give answers, create posts and comments are nodes, and edges are formed between the users who interact with them. The relationship between users becomes stronger the more they interact with each other.
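As a rough illustration (not the thesis's actual tooling), the following sketch uses the networkx library to build such a weighted interaction graph, where repeated interactions strengthen an edge:

```python
# Minimal sketch of the communication network: users are nodes, and an
# edge's weight grows with every question/answer/comment interaction.
import networkx as nx

def build_communication_graph(interactions):
    """interactions: iterable of (asker_id, responder_id) pairs,
    one per answer, comment or vote exchanged between two users."""
    g = nx.Graph()
    for asker, responder in interactions:
        if g.has_edge(asker, responder):
            g[asker][responder]["weight"] += 1  # stronger tie on repeat contact
        else:
            g.add_edge(asker, responder, weight=1)
    return g

g = build_communication_graph([(1, 2), (1, 2), (2, 3)])
print(g[1][2]["weight"])  # 2: users 1 and 2 interacted twice
```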

Figure 3.5 shows the sparse yet interconnected communication network built over time. It shows the top 20 tags and how the community is interconnected. Tags that are used together are strongly connected and overlap; tags that are rarely used with popular tags stay at the edge of the network.

The nodes on the edge of the graph are small niche communities that do not overlap with popular tags but are sometimes linked by one or two questions. The users in the middle are heavily connected to many other people; they are the expert users who answer a lot of questions and sit at the center of the network. There are also expert users who connect one network to another: they have knowledge of different programming languages and connect two or more communities together.

In the network analysis, edges get stronger the more users interact with each other, but closer analysis of the network graph shows that the edges between nodes are quite weak. Only a few nodes have medium-strength edges.

3.6.1.6 Tags network

In social network analysis, community is detected by analyzing the linkage of the network. Another approach to detecting community is to analyze the social objects and detect topical clusters, which gives a network structure based on similarity of topics (Zhao et al., 2012). A similar approach is used for community detection in StackOverflow: to analyze the community structure, the interconnectedness of tags is studied.

In StackOverflow, questions have multiple tags attached to them. These questions appear on all the corresponding tag pages, and people subscribed to each tag can view and answer them. This makes the communities overlap with each other and shows the strength of the relationships between the communities.

Figure 3.6 shows the relationships between the most popular tags and how closely they are related to each other. In the graph, each segment's size is directly proportional to the number of times its tag is used, and a connection between tags indicates the number of times they have been used together in a question. The thickness of a connection shows the strength of the relation. The segments are colour coded by frequency of connections: red segments are strongly connected and blue segments are weakly connected.
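A hypothetical sketch of the counting that underlies such a graph: each question's tag set contributes one co-use count per tag pair, giving the segment sizes and connection strengths.

```python
# Toy tag co-occurrence counting: tag frequency gives segment size,
# pair frequency gives the thickness of the connection between tags.
from collections import Counter
from itertools import combinations

def tag_cooccurrence(questions_tags):
    """questions_tags: iterable of tag lists, one list per question."""
    pair_counts = Counter()
    tag_counts = Counter()
    for tags in questions_tags:
        tag_counts.update(set(tags))
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    return tag_counts, pair_counts

sizes, edges = tag_cooccurrence([["javascript", "jquery"], ["javascript", "css"]])
print(edges[("javascript", "jquery")])  # 1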

The clustering of the tags shows the relationship between the tags and technologies. The two popular tags, Java and Android, are closely related to each other, but are scarcely joined with other tags.

Figure 3.6: StackOverflow: Popular tags clustered together

The strongest relationship is between jQuery and JavaScript, because jQuery is a framework built on JavaScript. C, C++ and C# form a closely related group, as do iOS, Objective-C and iPhone; however, Objective-C is sometimes also tagged with C, C++ and C#.

There is a large cluster of connected web development languages (CSS, HTML, JavaScript and jQuery), indicating the close-knit use of these technologies in the development of websites and web applications. An interesting observation is the relationship between the scripting languages PHP and Python: they are popular tags but are sparsely connected with other tags and are weakly linked with database related tags.

3.6.1.7 Incentive Design

StackOverflow is one of the most popular Q&A websites for software developers. The Alexa 12 rank of StackOverflow is 57 (Alexa, 2015b) worldwide as of November 2015.

12 http://www.alexa.com/

Figure 3.7: StackOverflow: Related tags clustered together

StackOverflow has a strong incentive design to motivate people to contribute and grow the community. It maintains high quality content through crowdsourcing, which leads to the popularity of the website. PSNs require active user participation, and this requires strong incentives for users to contribute (Ghosh and McAfee, 2011).

StackOverflow uses a game theory model to encourage user participation and activity. Participation is encouraged through an elaborate point system, and users also receive badges for participation. The top contributors and the users with the highest reputation are featured on the question page, giving them more visibility and acknowledgement of their expertise. This encourages participants to accumulate more points and contribute to get recognition (Anderson et al., 2012).

When an answer is voted up, the user gains 10 reputation points, and 5 points when a question is voted up. When an answer is accepted, the user receives 15 points. There is also a negative point system: a user loses 2 reputation points when a question or an answer is voted down (StackOverflow, 2008). This keeps spamming in check, and repeated questions and answers are avoided.
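A toy re-implementation of these scoring rules can make the arithmetic concrete; the event names are our own labels, not StackOverflow's API.

```python
# Reputation deltas as quoted above (StackOverflow, 2008).
REPUTATION_DELTAS = {
    "answer_upvote": 10,
    "question_upvote": 5,
    "answer_accepted": 15,
    "post_downvote": -2,
}

def reputation(events, start=1):
    """Every user starts with 1 reputation point on joining."""
    return start + sum(REPUTATION_DELTAS[e] for e in events)

print(reputation(["answer_upvote", "answer_accepted", "post_downvote"]))  # 24
```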

Reputation        Number of users
1                 1666503
2-10              327106
11-100            767635
101-1000          302338
1001-10,000       78766
10,001-50,000     7467
50,001-100,000    650
100,001-200,000   238
>200,000          91

Table 3.6: StackOverflow: Number of users with reputations

As Table 3.6 shows, there are 1,666,503 users (52.81%) with 1 reputation point, and one user with 827,131 reputation points. The distribution of users' reputations shows that more than half of the users are lurkers, while the elite users with the most reputation points are the editors and moderators of the community and are considered experts in their fields.

A user's reputation has a direct correlation with trust in the community. StackOverflow has designed an excellent reward program to motivate and incentivize users to contribute and gain more reputation and badges.

The system also encourages users to participate, as higher reputation points unlock more privileges. Only when a user has 15 reputation points can they vote up, and 50 points allow users to comment. To stop harassment and spam, a user requires 125 reputation points to vote down, and doing so costs the user 1 reputation point. The incentive model is thorough: higher reputation points open more gates for users to interact, contribute and be acknowledged as experts in their field.
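The privilege gating described above can be expressed as a simple threshold lookup; the point values come from the text, while the privilege names are our own shorthand.

```python
# Reputation thresholds for privileges, as described above.
PRIVILEGES = [(15, "vote up"), (50, "comment"), (125, "vote down")]

def unlocked_privileges(reputation):
    return [name for points, name in PRIVILEGES if reputation >= points]

print(unlocked_privileges(60))  # ['vote up', 'comment']
```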

Currently, there are 77 different types of badges given to users based on their contribution. There are badges for users who ask a question while having 1 reputation point (Student), for users who edit answers to make the posts better (Editor), and even for users who stay active for a year (Yearling). This kind of virtual acknowledgement of effort encourages users to participate and contribute to the website.

The other method that encourages users to participate is the promptness of responses. An asker prefers to receive information sooner rather than later, and will stop the process when satisfied with the cumulative value of the posted information. The analysis of the posts shows that half of the questions get an answer within an hour of posting, and within a day questions receive an accepted answer.

Figure 3.8: StackOverflow: Time histogram of questions receiving first answer and accepted answer respectively

3.6.1.8 Quality Control

The community thrives because of the high quality of its content, which is made possible by user actions and moderation. Users vote up the best questions and answers and vote down bad quality content or repeated posts. There are more than 6 million votes cast on StackOverflow, and users with enough reputation are allowed to cast 40 votes per day. Reddit doesn't have any limit on casting votes; there are more than 46 million votes in the collected dataset.

Vote        Question Vote Count   Answer Vote Count
0           5026503               6692276
1-10        4749669               12239034
11-100      246576                515107
101-1000    1890                  26909
>1000       260                   489
<0          608453                284509

Table 3.7: StackOverflow: Questions and answers vote count

The analysis of StackOverflow questions and votes shows that every question receives 3.06 votes on average. The lowest-voted question received -145 votes and the highest-voted question received 12,822. A similar analysis of answers and their votes shows that an answer receives 0.99 votes on average, with the lowest at -54 and the highest at 18,215.

Figure 3.9: StackOverflow: Questions and Answers Vote count histogram

There are more questions (69.61%) with 0 votes than answers (51.71%).

3.6.1.9 Community Moderation

Another way the quality of the community is maintained is through moderation. The community members elect moderators, who are considered experts in the field. They are given the responsibility to edit and remove posts, keep spamming in check and maintain the quality of posts in the community. They also have the authority to ban users who are found misbehaving or spamming.

StackOverflow provides additional badges for moderators. They also have extra privileges over normal users and sometimes have access to community interactions and extra information.

There are 18 moderators on StackOverflow, elected democratically by popular vote. On average, they moderate (edit and remove) 1,500 flagged posts daily.

3.6.2 Reddit Analysis

The structure of Reddit is completely different from StackOverflow, hence the same analysis could not be done with the Reddit dataset. First, Reddit posts do not have any tags, so they cannot appear to multiple communities of users. A post is made in a particular subreddit and cannot be shared across different subreddits; to do so, a user has to create a brand new post with the same question on a different subreddit.

Post Type          Number
Posts              19502
Comments           414591
Registered Users   71079
Votes              46112303

Table 3.8: General overview of Reddit dataset

Reddit posts are threaded: a parent comment has multiple child comments, and users often drift from the main topic and start a discussion on something different, sometimes personal. This is completely different from StackOverflow, where answer posts are used for answers and any other conversation is strictly restricted to the comments section. This caused a major problem with the Reddit dataset, as it could not be decided which comment or child comment is the actual answer to the question. Furthermore, unlike StackOverflow, there is no 'Accepted Answer' tag for Reddit posts, so it was quite hard to establish whether a question got a valid answer or not.

Figure 3.10: Reddit: Posts created per month

The analysis of 11 different technical subreddits, 19 thousand questions and 0.41 million comments is done quite differently and separately from the StackOverflow datasets. A post has 20 comments and 140 votes on average; a comment has 11 votes and 3.5 child comments on average.

Here, if a post has no comments at all, or no comments with a positive vote, it is considered an unanswered question. Any post that has at least one positively voted comment is, for the purpose of this analysis, assumed to be answered. This is not an ideal strategy, as it carries a lot of uncertainty, but no better way was found to solve the problem. Only one subreddit, /r/learnprogramming, has a 'Solved' flair that confirms a question has been answered.

The posts on these subreddits are not only questions but also discussion posts and news posts where users share links to other websites and blogs. There is no reliable way to distinguish between these types of posts, so for the analysis, any post linked to an external source was considered a discussion or news post. In the big subreddits like /r/javascript and /r/python, 79% and 81% of the posts respectively are discussion and news posts linked to an external website. Also, 2.03% are cross posts, i.e. Reddit posts linked from another subreddit. The number of discussion and news posts drops significantly in the 'learnprogramming' subreddit: only 4.8% of the posts are linked posts, 4.6% are cross posts and the rest are question posts. In the 'programming' subreddit, however, 99.7% of posts are discussion and news posts.
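The two classification heuristics used above can be made concrete in a short sketch; the field names are illustrative placeholders, not the actual Reddit API schema.

```python
# Sketch of the heuristics above: a post linking to an external source
# counts as a discussion/news post, and a question post with no comments
# (or none voted positively) counts as unanswered.
def classify_post(post):
    return "discussion/news" if post.get("external_url") else "question"

def is_unanswered(post):
    comments = post.get("comments", [])
    # True when there are no comments, or none has a positive vote.
    return not any(c["votes"] > 0 for c in comments)

post = {"external_url": None, "comments": [{"votes": -1}]}
print(classify_post(post), is_unanswered(post))  # question True
```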

Type of posts                                  Number   Percentage
Question posts                                 6367     32.7%
Discussion posts                               13115    67.2%
Posts with 'Solved' flair (learnprogramming)   105      4.05%
Parent comments                                88881    22.5%
Child comments                                 305710   77.4%
Posts with 0 votes                             1687     8.65%
Posts with negative votes                      1059     5.43%
Posts with votes >5                            14496    74.3%
Posts with no comments                         3952     20.26%

Table 3.9: Reddit: Different types of posts

There are no tags on subreddit posts, so the name of the subreddit is added as a tag to each post. Sometimes, in general subreddits like /r/learnprogramming, users add the name of the programming language in square brackets, e.g. [python]; these names are also added as tags to the posts.
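A small illustrative sketch of this tag extraction, assuming a simple bracket convention in titles:

```python
# Harvest tags: the subreddit name plus any bracketed names in the title.
import re

def extract_tags(subreddit, title):
    tags = {subreddit.lower()}
    tags.update(m.lower() for m in re.findall(r"\[([^\]]+)\]", title))
    return tags

print(extract_tags("learnprogramming", "[Python] How do I reverse a list?"))
# {'learnprogramming', 'python'}
```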

The dataset was analyzed using all the generated data and the subset containing the question data. 20.26% of posts have no comments at all and are treated as unanswered questions; 57.8% of these unanswered questions have 0 or negative votes, presumably because users ignored them.

The percentage of unanswered questions with no votes created by a new user is 39.1%. It is much easier to create a new account on Reddit, as it doesn't require an email address, only a username and password; so it is assumed users create a new account just to ask a question and never use it again.

Very few unanswered questions (2.8%) have more than 5 votes but no comments at all, so it is assumed that if a question is interesting, users engage in some kind of discussion. The majority of answered posts (78.63%) have more than one comment. Table 3.9 gives a general overview of all the analyses done.

Since the Reddit API doesn't allow retrieving more than 2000 posts for any subreddit, not all the posts could be collected, and the trend analysis is done only on the extracted data. A long-term posting trend therefore cannot be established for the Reddit data, but from a week's data it is clear that users post most on Wednesdays.

Figure 3.11: Reddit: Histogram of posts created day of the week

3.6.2.1 Communication network structure

Reddit allows users to make friends and send private messages, but this information is not public and cannot be downloaded via the API without a user's explicit consent. So the social ties are formed implicitly by users asking questions and answering them; these social ties are also formed by voting on posts and commenting on them. The communication network is studied to see the social ties of individuals (Monge and Contractor, 2003).

For the network analysis of the Reddit dataset, the communications between users are studied. Here, the users who ask questions, give answers, create posts and comments are nodes, and edges are formed between the users who interact with them. The relationship between users becomes stronger the more they interact with each other.

Similar patterns to StackOverflow can be observed by studying the communication network of Reddit. The nodes in the center are connected to multiple nodes and act as bridges between different communities. The Reddit data are incomplete, so the network is not properly connected.

Figure 3.12: Reddit network structure

The communities are linked by weak ties where a post is created across subreddits.

The only difference here is that the edges are thicker in the Reddit community than in StackOverflow. This could be because Reddit has relaxed commenting rules where users can have informal discussions, while StackOverflow has strict community guidelines where users can only interact through comments on the questions or answers.

3.6.2.2 Subreddit network

In Reddit, the cross posts are inspected to find the relationships between subreddits, because the Reddit data does not have tags. There were only 254 cross posts in the dataset, so the relationship between subreddits is not as strong as in StackOverflow, but it follows a similar pattern.

Looking at users' interactions also revealed their behaviour in subscribing to a tag or subreddit.

3.6.2.3 Incentive Design

Reddit is one of the most popular websites for sharing web links, and it is also popular with programmers, though not as much as StackOverflow. The Alexa 13 rank of Reddit is 32 (Alexa, 2015a) worldwide as of November 2015.

13 http://www.alexa.com/

Tag/Subreddit   Reddit Users   StackOverflow Users
Python          101893         80400
Javascript      74768          134600
Java            43875          124800
PHP             38661          84600
C++             31654          64200
C#              2507           85300

Table 3.10: Number of subscribed users for each tag and subreddit in Reddit and StackOverflow respectively

Like StackOverflow, Reddit has a strong incentive design to motivate people to contribute and grow the community. It maintains high quality content through crowdsourcing, which leads to the popularity of the website; as noted above, PSNs require active user participation and strong incentives for users to contribute (Ghosh and McAfee, 2011).

Reddit follows the same model; here the reputation points are called 'karma'. A user can earn link karma from creating posts and comment karma from making comments. Karma is the accumulation of all the up votes and down votes, but link karma and comment karma are not added together. The higher a user's karma points, the more popular their posts are, and the Reddit algorithm puts their posts at the top of the page.

Karma      Users with link karma   Users with comment karma
0          565                     9994
1          862                     38156
2-10       3858                    88237
11-100     8709                    36222
101-1000   1963                    2497
>1000      287                     11
<0         984                     16932

Table 3.11: Reddit: Number of users with karma

As Table 3.11 shows, there are 862 users (1.21%) with 1 link karma and 38,156 users (53.68%) with 1 comment karma. The highest comment karma held by a single user is 2,629 and the highest link karma is 4,077. The distribution of users' reputations shows that more than half of the users are lurkers, while the elite users with the most link and comment karma are the editors and moderators of the community and are considered experts in their fields.

Similar to StackOverflow badges, Reddit has an award system where a user can earn different awards based on their contribution, visible in their trophy case. Currently, Reddit offers 34 awards; they can be as simple as 'Verified Email Address', given to users who verify their email address, or 'Best Link' and 'Best Comment', given when users contribute positively to the community.

Figure 3.13: Reddit users' link and comment karma histogram

The response time on Reddit is quite fast. All the posts that receive any comments get the first comment within two hours, and half of them receive the first comment within half an hour. This also shows that the lifespan of posts is not long: if a post doesn't get a comment within 3 hours, it doesn't get any comments at all.

Figure 3.14: Reddit: Time histogram of posts receiving first comment

Another good incentive model Reddit uses is the award of 'Reddit Gold'. Users sometimes treat 'Reddit Gold' as an actual award for an excellent response, or offer it as a reward for a certain type of contribution. Gold is given by one user to another and provides special privileges to the recipient: an ad-free, customised experience of the website and access to special 'Gold' subreddits.

This kind of reward system helps elicit prompt responses and good quality contributions. The analysis of the posts shows that half of the posts get a comment within an hour of posting.

3.6.2.4 Quality Control

As with StackOverflow, the community thrives because of the high quality of its content, made possible by user actions and moderation: users vote up the best questions and answers and vote down bad quality content or repeated posts. Reddit doesn't have any limit on casting votes, and there are more than 46 million votes in the collected dataset.

Figure 3.15: Reddit: Posts and Comments Vote count histogram

The Reddit analysis shows that posts receive 35.87 votes on average; the lowest-voted post received -21 votes and the highest-voted post received 4,077. A similar analysis of comments and their votes shows that a comment receives 0.35 votes on average, with the lowest at -266 and the highest at 2,629. As with StackOverflow, there are more Reddit comments (4.78%) with 0 votes than posts (3.22%), but these percentages are much lower than StackOverflow's.

Vote       Post Vote Count   Comment Vote Count
0          628               19821
1-10       5341              303042
11-100     10229             43696
101-1000   1968              2641
>1000      287               11
<0         1059              25880

Table 3.12: Reddit: Posts and comments vote count

3.6.2.5 Community Moderation

As in StackOverflow, the quality of the Reddit community is also maintained through moderation. Moderators are considered experts in the field; they are given the responsibility to edit and remove posts, keep spamming in check and maintain the quality of posts in the community. They also have the authority to ban users who are found misbehaving or spamming.

Reddit provides additional badges for moderators. They also have extra privileges over normal users and sometimes have access to community interactions and extra information. On Reddit, the average number of moderators in a subreddit is 6, and no data is freely available about their moderation activity in the community.

3.6.2.6 Discourse Analysis

All the statistical analyses of the StackOverflow and Reddit data were done using SPSS. A discourse analysis was also done by looking through the data to find interesting posts and general trends that are not properly captured by statistical analysis applications.

Looking through the questions and answers, it is evident that StackOverflow interactions are more formal and structured than Reddit interactions. Reddit posts are more discussion oriented, with people sharing current news and updates. StackOverflow questions are more problem oriented; there are sometimes discussions too, but they occur mostly in the comments and chats.

On average, StackOverflow questions and answers are longer than Reddit questions and comments, while Reddit posts get more votes than StackOverflow posts. A quick analysis of one thousand posts from each website also shows that there are more spelling errors in Reddit posts than in StackOverflow posts.

Figure 3.16: An example of a Reddit post (programmer564698, 2015) where people share personal stories

The posts and comments on Reddit are more personal. Users ask more indirect questions, such as asking for recommendations for books, tools, etc. Users also share lots of memes and funny posts that are not clearly related to a programming language, as well as personal stories, anecdotes and experiences.

Figure 3.17: Example of funny StackOverflow response (Guy, 2008)

In StackOverflow, by contrast, most of the posts are about solving technical problems, but there are a few posts where people share jokes, humorous anecdotes and trivia. There was a question asking users to share jokes that has 459 answers 14, and one where users share their favourite "programmer" cartoon 15 that has 135 responses.

Figure 3.18: Example of funny StackOverflow response (Jeff, 2012)

There are some questions that have funny and controversial answers too. A question from a user about XHTML received a humorous response showing the importance of whitespace.

14 http://stackoverflow.com/questions/234075/
15 http://stackoverflow.com/questions/84556/

There are also controversial questions, where people share their personal experiences, that get so many responses that moderators have to close the thread.

Figure 3.19: Example of StackOverflow question (Skeet, 2009) where people are sharing their opinion

People also use this platform to share trivia about different languages and their programming experience.

Figure 3.20: Example of StackOverflow question (Harvey, 2011) where people are sharing trivia

Reddit also has more pop culture references in the posts and comments.

All these analyses of the StackOverflow and Reddit datasets give insight into the community structure and user behaviour, and help in understanding technology based PSNs better. The two sites have some similarities and some distinct features. The experiments and analyses help to answer KQ2.

Figure 3.21: Example of a Reddit post (carmichael561, 2013) where people are sharing pop culture references

Chapter 4

The Suman System

The Suman system is designed as a proof of concept to show how Linked Data and Semantic Web technologies can be used to improve search in a Purposive Social Network. The Suman system is built to test the hypothesis of the thesis: it tests whether the use of Linked Data and Semantic Web technologies can help with finding answers to unanswered questions in question and answering systems.

As mentioned in the research questions in Chapter 1, the main RQ1 is broken down into smaller questions. The first is the knowledge question KQ3, which investigates if Semantic Web and Linked Data technologies can be useful in the search and discovery of answers in PSNs. The second is the design problem DP1, which provides a solution to RQ1: the prototype system called the Suman system is built to investigate if Linked Data and Semantic Web technologies can be used to answer unanswered questions in PSNs. The main goal of the Suman system is to search for answers to unanswered questions in PSNs. Furthermore, another design problem, DP2, is also solved by the Suman system, where it recommends experts who could potentially answer the unanswered questions in PSNs.

This chapter discusses all three research questions: KQ3, DP1 and DP2. It follows the design methodology discussed by Wieringa (2014). This chapter explains why Semantic Web and Linked Data technologies were used, how the Suman system works, how the model was designed and how the search algorithm was built. It is also explained later how an application was built using StackOverflow and Reddit data.

4.1 Use of Semantic Web and Linked Data in Purposive Social Network

This thesis investigates if Semantic Web and Linked Data technologies can help answer unanswered questions in PSNs. This leads to KQ3, which investigates if Semantic Web and Linked Data technologies are useful in PSNs.

This question can be answered in multiple ways. In this section, the current literature and examples of PSNs are studied to investigate the main challenges encountered in answering RQ1, and how Semantic Web and Linked Data technologies can provide solutions to those problems.

KQ3 is answered using the empirical cycle in Chapter 5, but in this section it is answered based on related scientific literature and real world examples of the benefits of using Linked Data and Semantic Web technologies.

4.1.1 Research problem and challenges

KQ3 investigates if Semantic Web and Linked Data technologies are useful for answering unanswered questions in PSNs. This is tested by building the Suman system, which takes unanswered questions from PSNs and searches for answers and experts that could potentially provide a solution to them.

Firstly, the usefulness of Semantic Web and Linked Data in designing the Suman system needs to be studied. Secondly, it needs to be tested whether the knowledge added by the use of Semantic Web and Linked Data technology is good.

The first part is discussed in this section and the second part is tested in Chapter 5. To test the first part, the challenges encountered in building the Suman system are analysed: if all the challenges are met, this validates the use of Semantic Web and Linked Data technologies.

The main design and research challenges in building the Suman system are discussed below.

• Data Collection and Integration: The study of PSNs requires user data from different online social networks, forums and communities. Some forums and boards are open to the public, and data can be harvested freely using APIs or simple screen scraping. It becomes difficult to study a PSN when it is closed due to privacy policies and the data is not readily available to the general public. The focus of this research is Q&A systems, where most of the information is crowdsourced and the knowledge base is created by the contributions of multiple users. The details of each user's contribution are also required to create a proper user model to find the experts.

• Name Entity Disambiguation and Linking the Data: Data from different websites are harvested and integrated together. Integration of data from multiple sources causes a name entity problem for the main topics. Users might also have multiple accounts in different systems, so the information from all these accounts needs to be integrated to form a complete and homogeneous user profile.

The same entities might have different names, and data providers might have adopted different URIs for them. This makes it paramount to solve the co-reference and name ambiguity problem (Glaser et al., 2007). Solving the name entity problem is also integral to linking the data to the Linked Data Cloud, because the topics should be linked with the right concepts there.

• Search and Query: The main challenge of the research is to find answers to unanswered questions in PSNs, as well as the right experts. This is a big problem because first the main topic of a question needs to be determined, and then the right answers need to be searched for on that basis. This also raises the issue of indexing and ranking search results.

• Purposive Social Network Formation: The concept of a network is very broad; it can be associated with any kind of relation between people or entities. A PSN can be a small, temporary community of people solving a problem and then dispersing once the task is finished. The system should help bring together people with similar interests who can help solve problems.

If Semantic Web and Linked Data technologies can solve these problems, it would justify the use of these technologies in PSNs to answer unanswered questions.

4.1.2 Benefit of Using Semantic Web and Linked Data in Purposive Social Network

Solving these research challenges will validate the use of Semantic Web and Linked Data technologies. There are systems and algorithms that can search for information without using Linked Data and Semantic Web technologies; however, the usefulness of Linked Data and Semantic Web technologies in the search and discovery of information has been highlighted in detail in Chapter 2 and is supported by the literature review. Linked Data and the Semantic Web help with formatting and structuring the PSN data. Semantic Web technologies ease the sharing and portability of users' data and the forming of an open, decentralized social network across platforms and websites. The Suman system described in this thesis uses Semantic Web technologies to analyze PSNs. The benefits of using Semantic Web and Linked Data technologies in forming agile PSNs are discussed below.

4.1.2.1 Structured Data

With Linked Data and the Semantic Web, every resource is represented by a dereferenceable URI. This URI can be resolved to a document described in RDF format, with metadata and an ontology to describe its properties and provide definitions. It provides the context of the data and of people and their interests, and it provides a structured knowledge base that is useful for finding the correct information and people while forming a community.

4.1.2.2 Linking People to People and People to Data

Using vocabularies like FOAF, people are linked with their friends and colleagues, while SIOC profiles link people to their data, and users' datasets are linked with each other. This links people to people, people to data and data to data. The interlinked datasets, when queried, can provide related information.

4.1.2.3 Multidimensional Network and Graph

All data in the Linked Data Cloud are linked with each other based on semantic equivalence and the reuse of information by means of URIs, forming a network graph. This network is formed of people linked with other people by their friendships and other relationships, and people linked with their data by means of properties. This creates a linked network that can be queried across domains and helps to find important information.

4.1.2.4 Integrated Knowledge

Semantic Web technologies can be used to describe different communities and networks. The underlying semantics in the data help in linking similar concepts, and these links can be integrated from different datasets to form a large knowledge base.

Each PSN has its own databases. If the data is open and interlinked with other datasets, it can be queried; relationships can be created between the datasets from different sources, and they can be integrated like the Linked Data Cloud.

4.1.2.5 Smart Query and Search

Linked Data can be queried using SPARQL, or browsed using a semantically enabled browser. Querying the data makes it easy to create many useful applications and mash-ups and to discover relationships and patterns. Even if the data or clusters of resources are decentralized, they can be queried and analysed to measure network structure, network growth and other relationships. Many interesting applications and communities can be generated by analysing the semantic relationships of the data.
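As a minimal illustration of such querying, the sketch below asks DBpedia's public SPARQL endpoint for the categories of a resource; it assumes the SPARQLWrapper Python package, and the resource chosen is just an example.

```python
# Query DBpedia for the Wikipedia categories (dct:subject) of JQuery.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?category WHERE {
        <http://dbpedia.org/resource/JQuery>
            <http://purl.org/dc/terms/subject> ?category .
    }
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["category"]["value"])  # e.g. ...Category:JavaScript_libraries
```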

4.1.2.6 Social Network Analysis

With the integration of multiple social networks and the elimination of identity fragmentation, social networks and their data can be analyzed and visualized in many ways. Ereteo et al. (2009) state that classical social network analysis uses only raw social network data and loses some knowledge in the process. Semantic Web frameworks help mitigate this problem of representing and exchanging knowledge on such social networks with Semantic Web technologies like a rich typed graph model (RDF), a query language (SPARQL) and schema definition frameworks (RDFS and OWL). Experiments can be performed to understand the underlying meaning in the network data, how networks are formed, or how data flows within the network. It can also be used to measure the strength of a relationship, the degree of recognition of experts in networks and much more (Mika, 2005; Jamali and Abolhassani, 2006; Gruber, 2008).

The implementation of the Semantic Web and Linked Data technologies in the PSN is shown in detail in the next section. The Suman system utilises these technologies on PSNs like StackOverflow and Reddit. The usefulness of the knowledge and semantics added in StackOverflow and Reddit is tested and measured by a user experiment and the details can be found in Chapter 5.

4.1.3 Research Validation

The use of Semantic Web and Linked Data technologies will be validated if they meet all the research challenges discussed above. This is a subjective question, because these research challenges could be met fully or only partially.

The benefit of using Linked Data and Semantic Web technologies can also be measured objectively, by performing an experiment to test whether the meaning added to the datasets is useful. This is done in the next chapter through a user experiment. The inference from the experiment gives a strong indication and could potentially justify the use of this technology in PSNs.

4.1.4 Research execution

The PSN data used in this thesis are from StackOverflow and Reddit. These websites are chosen because they are open and have APIs to harvest the data. This meets the first research challenge.

The second research challenge, Name Entity Disambiguation, is solved with the help of the Wikipedia Miner 1 (Milne and Witten, 2012) and OpenCalais 2 (Reuters, 2008) toolkits. These tools use natural language processing to find the main keywords in a text and machine learning algorithms to match the keywords to topics and particular vocabularies.

The main entities are matched with Wikipedia articles and the OpenCalais vocabulary, and these entities are linked with the DBpedia and OpenCalais knowledge bases. The keywords are also linked with DBpedia categories to add more relationships and semantics.

1 http://wikipedia-miner.cms.waikato.ac.nz/
2 http://www.opencalais.com/

The third research challenge, searching for and querying answers to unanswered questions, is addressed by the Suman system, which utilises the semantically enriched data. This is done by converting all the PSN data into RDF and storing it in the database. The recognized keywords are also given a confidence score and stored in a document-keyword graph. All of this uses Semantic Web technologies to perform search and query.

The final research challenge is to build and study PSNs. The relations between people and objects are varied, with different attributes. When users do not create explicit relationships, relationships are derived by studying their communication networks (posts, questions, answers, etc.). This also creates a network of experts based on topics and categories. Attributes like the relationships/links between users, and between users and objects, that connect people and entities are analyzed. The incentive model of the website, which encourages people to collaborate and contribute, is also studied, as is the structure of the communication and network growth over time (Mika, 2005). Studying this structure is made easier by the linking of data with experts and categories.

So, all in all, the use of Semantic Web and Linked Data facilitates the building of the Suman system.

4.1.5 Result evaluation

Chapter 5 discusses in detail the user experiment that tests the usefulness of adding semantics to PSN data. In this section, the use of these technologies is justified, showing why they are used and the best possible way to employ Semantic Web and Linked Data.

The data from StackOverflow and Reddit are from the technology domain and have a structured knowledge base. The existing ontologies and knowledge bases in open Linked Data include many data sources from the technology domain. Adding semantics to the StackOverflow and Reddit data sources makes them machine readable and adds knowledge; this helps to identify the meaning and core topics in the datasets. Furthermore, Linked Data links these topics to other datasets, thus adding more information and helping to categorise and organize the datasets.

So, using Semantic Web and Linked Data technologies is useful in PSNs. It helps to understand the underlying meaning and concepts of the questions and answers, and these added semantics could potentially improve the search and discovery of information.

Then the question arises: what is the best way to apply Semantic Web and Linked Data to PSN datasets? This can be done by performing name entity disambiguation on the questions and answers to understand their main topics and concepts, which adds semantics to the datasets. These topics can then be linked to the Linked Data Cloud to form connections between multiple datasets and to use the relationships to infer more information. The added links provide additional knowledge and information that could potentially help to improve the search and discovery of information.

The usefulness of the semantics and linking needs to be measured to justify using them. This is done using a user experiment where participants look at the added semantics and rate how well these keywords describe the questions and answers. This is discussed in detail in Chapter 5.

The other way to measure the usefulness of the added semantics is to analyse the relationships and categories of the topics in the questions and answers. These relationships and categories help to categorize and structure the knowledge and to find similar questions and answers. It is hypothesized that this helps in improving the search and discovery of answers to unanswered questions. Its usefulness can be measured by another user experiment where participants rate how well a search result answers the unanswered question. Statistical analysis is done on the data collected from these experiments, discussed in detail in Chapter 5.

If the findings and results are statistically significant, then it can be inferred that using Semantic Web and Linked Data improves PSNs and answers the research questions. This would justify the use of these technologies, and the end result would be satisfactory because it meets the research goals and answers KQ3.

4.2 The Suman System

The Suman system is built as a prototype to answer the main research question. It is designed as a proof of concept to show how Linked Data and Semantic Web technologies can be used to improve search in a Purposive Social Network, and it tests whether the use of these technologies can help with finding answers to unanswered questions in question and answering systems. The Suman system also implements a semantic query algorithm that uses the semantics and the crowdsourced community information together to improve search and recommend experts. This could potentially solve DP1 and DP2.

This section explains how the Suman system works, how the model was designed and how the search algorithm was built. It is also explained later how an application was built using StackOverflow and Reddit data. The section is structured using the engineering cycle of the design science research methodology.

4.2.1 Problem Investigation

This thesis aims to solve the problem of unanswered questions in PSNs using Linked Data and Semantic Web technologies. In the earlier section, the use of Linked Data and Semantic Web technologies in PSNs was justified.

In this section the two design problems are discussed. The first design problem (DP1) is to find answers to unanswered questions in PSNs. This means a prototype system should be built that takes questions, answers, user profiles and other crowdsourced data (votes, favourites, tags, etc.) from PSNs and uses these data to provide answers to unanswered questions. The second design problem (DP2) this thesis aims to solve is to find experts in the field who could potentially answer the unanswered questions in PSNs.

A solution-based approach is required, and the Suman system is designed to provide it.

The main stakeholders for these research problems are the users of PSNs who ask questions and do not get any answers.

The main artifact to solve this problem is the Suman system: a prototype system that uses Semantic Web and Linked Data technologies to search for answers to the unanswered questions and to recommend experts who could potentially answer those questions.

The main goal and requirement is to find answers to the unanswered questions in PSNs using Linked Data and Semantic Web technologies, along with a recommended list of experts in the field. This would answer both DP1 and DP2.

4.2.2 Treatment Design of the Suman System

The main requirement to solve the research question is to build a system that can find answers to the unanswered questions using Linked Data and Semantic Web technologies and recommend experts. This could be done in the following ways:

1. Generating the answers to the questions automatically in the Suman system. If the system meets this goal, it will fully meet the design goals.

2. Searching for answers to the questions in the pre-existing knowledge base. This would require collecting the pre-existing data in the PSNs and solving the design challenges discussed in Chapter 1 (research challenges section). If the system meets this goal, it will fully meet the design goals.

3. Searching for experts in the PSNs who could answer the unanswered questions. This would not directly provide an answer to the questions, but it would find the right experts who could potentially answer them. This meets the research question only partially, and it is subject to failure.

For the Suman system, the second and third approaches are taken to solve the design problem. The Suman system collects PSN data and uses it to search for answers that could provide a solution to the unanswered questions; if there are no good answers to an unanswered question, experts are recommended who could potentially answer it. The Suman system addresses all the research challenges discussed in Chapter 1, Section 1.3.2.1, and in detail in Section 4.1.1.

So, the main goals of the research and the Suman system are, firstly, to search for answers to unanswered questions in PSNs and, secondly, to search for experts who could possibly answer those questions.

To address these issues, the Suman system is divided into five main components: data collection; data structuring; annotation and linking; database and query; and search and expert recommendation. Figure 4.1 gives an overall view of the different components and the system design.

4.2.2.1 Data Collection

The Suman system collects PSN data to build its knowledge base. The datasets usually consist of questions, answers, user details and other crowdsourced data (votes, favourites, etc.) from Q&A systems. The data can be collected from APIs, by screen scraping or from data dumps, and can come from one or many PSNs.

There are two main types of data collected in the Suman system. Firstly, post data: questions, answers, tags and categories, with votes and comments. This helps to create the knowledge base of questions and answers that can be used for search. Secondly, users' profile and activity data: users' details (name, username, email, etc.) and their activity (posts, votes, favourites, badges, reputation). This helps to create a user model, which in turn supports expert recommendation.
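A hedged sketch of this kind of collection against the two public APIs follows; real collection needs paging, rate-limit handling and, for some endpoints, authentication.

```python
# Fetch a small sample of recent posts from StackOverflow and Reddit.
import requests

so = requests.get(
    "https://api.stackexchange.com/2.3/questions",
    params={"site": "stackoverflow", "pagesize": 10,
            "order": "desc", "sort": "creation"},
).json()["items"]

reddit = requests.get(
    "https://www.reddit.com/r/learnprogramming/new.json",
    params={"limit": 10},
    headers={"User-Agent": "psn-research-sketch/0.1"},
).json()["data"]["children"]

print(len(so), len(reddit))
```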

Figure 4.1: The Suman system design components.

4.2.2.2 Data Structuring

The data collected from different PSNs might be structured differently; they might have different fields and different data types. The second part of the Suman system takes this heterogeneous data and gives it a common structure using Semantic Web technologies.

The Semantic Web requires data to be structured properly using some schema to give it meaning. For this system, the data were turned into RDF. To describe the data in RDF, ontologies are used to define concepts and relationships: the FOAF ontology describes the users and their profile information, while the posts (questions, answers, comments, etc.) are described using the SIOC and DC ontologies. Any properties that cannot be described using these vocabularies are described by creating our own schema in RDFS.
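A minimal rdflib sketch of this structuring step is shown below; the URIs are illustrative placeholders, not the Suman system's actual identifiers.

```python
# Describe a user with FOAF and a question post with SIOC and Dublin Core.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, DCTERMS, RDF

SIOC = Namespace("http://rdfs.org/sioc/ns#")
g = Graph()
user = URIRef("http://example.org/user/42")
post = URIRef("http://example.org/question/1001")

g.add((user, RDF.type, SIOC.UserAccount))
g.add((user, FOAF.name, Literal("Ada Example")))
g.add((post, RDF.type, SIOC.Post))
g.add((post, DCTERMS.title, Literal("How do I merge two dicts in Python?")))
g.add((post, SIOC.has_creator, user))

print(g.serialize(format="turtle"))
```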

4.2.2.3 Keyword Annotation and Linking

One of the main benefits of using Semantic Web and Linked Data technologies is adding semantics to the data and linking it to other datasets. Adding semantics helps by providing the context and the domain specific meaning of the data; linking the data helps to find related information about a resource.

The main challenge of adding semantics to the data is name entity disambiguation, as discussed in Chapter 2. Once names and entities are resolved, the context of the entities' domain can be used to add extra semantics, categories and other relevant information to the resources. For example, 'Eclipse' could mean a natural phenomenon or an IDE (Integrated Development Environment); knowing whether the data is from the technology domain or a general domain allows us to add the proper categories and semantics to the entities.

The third step of the Suman system adds keywords to the collected datasets from the technology domain. It also adds categories to the keywords to link them together and form semantic relationships between them. These data, with the added categories and keywords, are later linked to the DBpedia and OpenCalais datasets to find related information. DBpedia is the structured data from Wikipedia and allows other datasets to link to the Wikipedia data.

The DBpedia knowledge base describes 4.58 million things (Auer et al., 2007). OpenCalais is a Thomson Reuters initiative that tags keywords, topics, etc. (Reuters, 2008). The annotation is done using two tools, Wikipedia Miner and OpenCalais, which perform name entity recognition and match entities with topics and categories. Wikipedia Miner and OpenCalais do not convert the resulting keywords into Linked Data or RDF; the returned keywords were further structured into RDF and linked with the DBpedia and OpenCalais datasets. The linking of datasets helps to find related information about the topics.

Wikipedia Miner and OpenCalais are used to do the name entity recognition and match it to a known vocabulary and taxonomy. Li et al. (2003) and Glaser et al. (2009) state that adding semantics from different data sources improves the quality of the metadata and the semantic context significantly, and also overcomes false matches made by any one application. Hence, both the OpenCalais and DBpedia datasets were used to resolve the name entity issue and link the data. Wikipedia Miner and OpenCalais do natural language processing of the text and annotate it with keywords; the annotations are then matched to Wikipedia topics and OpenCalais entities.
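A sketch of how annotations from two extractors might be merged so that agreement between sources increases confidence; the scores and the boosting rule are illustrative, not the Suman system's actual algorithm.

```python
# Merge keyword annotations from two tools; agreement boosts confidence.
def merge_annotations(wikipedia_kw, opencalais_kw, boost=0.2):
    """Each argument maps keyword -> confidence in [0, 1]."""
    merged = {}
    for kw in set(wikipedia_kw) | set(opencalais_kw):
        score = max(wikipedia_kw.get(kw, 0), opencalais_kw.get(kw, 0))
        if kw in wikipedia_kw and kw in opencalais_kw:
            score = min(1.0, score + boost)  # both tools agree -> more trust
        merged[kw] = score
    return merged

print(merge_annotations({"Java": 0.8}, {"Java": 0.7, "JVM": 0.6}))
# {'Java': 1.0, 'JVM': 0.6}
```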

The Wikipedia Miner service uses word sense disambiguation. The algorithm uses machine learning to detect key terms in a text and disambiguate them against Wikipedia articles. Wikipedia Miner provides a Java API to access the Wikipedia database, including all categories (Milne and Witten, 2012).

The Wikipedia categories are attached to the keywords, and every keyword is linked to its parent and child categories. This later helps in expanding keywords with related keywords and with other keywords belonging to the same category. The result is a graph: a keyword can be part of multiple categories, so a keyword can have multiple parent categories.
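A toy sketch of such a keyword-category graph and the expansion it enables; the data here is illustrative.

```python
# Keyword-category graph: expand a query keyword with keywords that
# share one of its categories.
from collections import defaultdict

keyword_categories = {
    "jquery": {"JavaScript libraries"},
    "react": {"JavaScript libraries"},
    "javascript": {"Scripting languages"},
}
category_keywords = defaultdict(set)
for kw, cats in keyword_categories.items():
    for cat in cats:
        category_keywords[cat].add(kw)

def expand(keyword):
    related = set()
    for cat in keyword_categories.get(keyword, ()):
        related |= category_keywords[cat] - {keyword}
    return related

print(expand("jquery"))  # {'react'}
```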

The OpenCalais tool creates semantically tagged metadata for the content using natural language processing and machine learning algorithms. It provides many features, such as tag integration with different taxonomies and vocabularies, geo-mapping of locations and semantic annotation of keywords (Corlosquet et al., 2009). The keywords extracted by OpenCalais are also structured into RDF and linked to the OpenCalais dataset.

4.2.2.4 Database and Query

In the previous steps, all the posts, users' data and the keywords added by Wikipedia Miner and OpenCalais are structured into RDF. This RDF is stored in the Stardog database, where it is also referred to as documents. Stardog is a semantic graph database: it supports the RDF graph data model and the SPARQL query language, supports OWL 2 and user defined rules for inference, reasoning and constraints, and uses the HTTP protocol, providing a SPARQL endpoint for applications to perform queries.

Stardog (Inc, 2015) has an R2RML mapping feature to create virtual graphs. This maps the Stardog graph to an external data source so that the external source can also be queried. It has many other useful features: it is easy to import and export data, and Stardog provides a web console for easy browsing, with admin controls for managing the database and options to browse and search the graph.

Stardog includes an RDF-aware semantic search function. It indexes RDF literals and supports information-retrieval-style queries. The indexing strategy creates a search document per RDF literal. It also builds an extra index for named graphs, which is used when SPARQL queries specify datasets using 'FROM' and 'FROM NAMED'. The indexing strategy can be changed by the admin and the indexing parameters are configurable (Inc, 2015).

Stardog uses Lucene's default text analyzer for searching information. It also allows a custom analyzer, so the Lucene analyzer can be customized to support different natural languages, indexing strategies, domain-specific word lists, etc. Full-text search support is disabled by default, but it can be enabled at any time by rebuilding the search index, and it is customisable like the RDF index (Inc, 2015).

Stardog supports SPARQL for querying RDF graphs as well as providing functionalities such as searching, obfuscating and browsing graph data. From version 3.0 onwards, Stardog supports federated queries with the 'SERVICE' keyword, which allows distributed RDF data sources to be queried. Lucene syntax is allowed in search queries. The 'textMatch' predicate is used to access the search index, and a threshold score can be provided to control the quality of the search results. Stardog scores the results between 0 and 10. The search and query allow Lucene search modifiers (Inc, 2015).
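For illustration, a full-text query against the Stardog SPARQL endpoint could look like the sketch below; the database name ('suman'), credentials and query text are placeholders, while the textMatch pattern follows Stardog's documented search syntax:

    import requests

    ENDPOINT = "http://localhost:5820/suman/query"  # placeholder database name

    QUERY = """
    SELECT DISTINCT ?doc ?score WHERE {
      ?doc ?p ?text .
      (?text ?score) <tag:stardog:api:property:textMatch> "python subprocess" .
    }
    ORDER BY DESC(?score) LIMIT 10
    """

    response = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        auth=("admin", "admin"),  # default credentials; change in practice
    )
    for row in response.json()["results"]["bindings"]:
        print(row["doc"]["value"], row["score"]["value"])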

Stardog allows the admin to configure many different functions, such as search, querying, indexing and scoring. It provides support for browsing and searching the graph on the web console, and it provides an API and SPARQL endpoints for accessing and querying the database. All these benefits led to Stardog being chosen for the Suman system.

4.2.2.4.1 Database Indexing and Configuration: To improve the search and discovery of answers and experts in the Suman system, the database index was optimized. This was expected to improve the search results.

All the questions, answers and comments have keywords associated with them. These keywords also have categories. Keywords and categories are linked together and create a graph. Every post is associated with keywords and, through them, with categories.

The Stardog indexing system is configurable. It has RDF-aware semantic search functions: it indexes RDF literals and creates a search document per RDF literal. Each document consists of the following fields: literal ID, literal value and contexts. Stardog uses the Lucene text analyzer to index the database, and this analyzer is customizable. It is possible to implement a custom analyzer for Stardog by implementing org.apache.lucene.analysis.Analyzer. This allows Stardog to support different domain-specific knowledge.

The database is customized by adding the keyword-categories graph to the 'commongram' analyzer, which constructs n-grams for frequently occurring keywords. An n-gram is a contiguous sequence of n keywords for each post; this also extends to the categories. The KeywordAnalyzer 'tokenizes' the entire stream as a single token, and the 'ngram-analyzer' constructs n-gram tokenizers and filters for multiple keywords. One useful feature of the Lucene analyzer is its support for Wikipedia syntax, which helps in using Wikipedia categories and the DBpedia linked dataset while reasoning.

The Lucene analyzer indexes documents for text-based search. It follows the same principle as Latent Semantic Analysis and TF-IDF (Term Frequency-Inverse Document Frequency) weighting (Salton et al., 1975). There is a bidirectional relationship between documents (questions, answers, comments) and keywords. The frequency of occurrence of each keyword in a document is calculated, and the total number of documents containing each term is also calculated. Since there are a finite number of keywords in the database, it is easy to calculate the document frequency for each keyword.

The frequency of keywords in each document is counted and a term frequency matrix is created. In addition, the number of documents containing each keyword is counted and a document frequency matrix is created. This is useful because it helps to determine the importance of a keyword in each document, which minimises the query run time when searching for documents for a particular keyword or set of keywords.
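The two matrices can be illustrated with a small sketch; the document/keyword data below are placeholders for the database contents:

    from collections import Counter, defaultdict

    documents = {
        "q1": ["python", "subprocess", "python"],   # keywords of a question
        "a1": ["python", "os.system"],              # keywords of an answer
    }

    # Term frequency: how often each keyword occurs in each document.
    term_frequency = {doc: Counter(kws) for doc, kws in documents.items()}

    # Document frequency: in how many documents each keyword occurs.
    document_frequency = defaultdict(int)
    for kws in documents.values():
        for keyword in set(kws):        # count each document once per keyword
            document_frequency[keyword] += 1

    print(term_frequency["q1"]["python"])   # 2: 'python' occurs twice in q1
    print(document_frequency["python"])     # 2: 'python' appears in both documents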

Stardog can perform both SPARQL query search and full-text search. It has reasoning capabilities using the OWL 2 Direct Semantics Entailment Regime. This means that the semantic conditions are defined with respect to ontology structures; it follows the instances of ontology classes as defined in the OWL 2 syntax specification (Glimm, 2012). Stardog performs reasoning in a lazy, late-binding fashion: the reasoning is performed at query time according to a user-specified reasoning type, and inferences are not materialized (Inc, 2015). Reasoning is disabled by default but can be enabled at any time.

In order to perform query evaluation with reasoning, Stardog requires an RDF schema to be present in the database. The RDFS file is imported along with the RDF files containing all the data. Stardog supports built-in query rewriting via predefined rules; it then executes the resulting expanded query against the data normally.

4.2.2.5 Suman Search Algorithm

Stardog provides a SPARQL endpoint for the application to use. The Suman search algorithm uses this SPARQL endpoint and queries it over HTTP.

The Suman system takes the unanswered questions from Q&A systems and finds similar questions with answers. This was done to test the hypothesis that the resulting answers could be used to answer the original unanswered questions. The Suman system also recommends experts who could answer the questions.

The Suman search algorithm is discussed in detail below. The notations used are as follows:

1. K = set of keywords

2. D = set of documents

3. Q = Query

4. V = Vote

5. S = Score S ∈ R : 0 ≤ S ≤ 10

Here, questions and answers are referred to as documents. Each document has a set of keywords K associated with it. The system searches for answers to an unanswered question; hence, a query Q consists of an unanswered question and the keywords associated with it. The vote V refers to the vote given to questions and answers by the users. The score S is the score given to a search result based on its validity.

Algorithm 1 The Suman Search Algorithm

1: D = [ ], K = [ ], minScore = 0.7
2: [q̄, K] ← FindQuestion(Random)
3: Q ← k1 ∧ k2 ∧ ... ∧ kn, ∀ki ∈ K
4: [D, S] ← FindDocs(Q)
5: EK ← Expand(K)
6: Q̄ ← ek1 ∨ ek2 ∨ ... ∨ ekn, ∀eki ∈ EK
7: [ED, ES] ← UpdateDoc(D, Q̄, minScore)
8: [ED, Sq̄] ← TextSearch(q̄, ED)
9: [ED, S] ← UpdateScore(ED, ES, Sq̄)
10: [ED, V] ← GetVote(ED)
11: [DFinal] ← ScaleScore(ED, S, V)
12: [D*Final] ← GetContext(D)

1: procedure Expand(K)
2:   EK ← K
3:   for all k ∈ K do
4:     if k has parent p then
5:       EK ← EK ∪ {p}
6:     else if k has children C then
7:       EK ← EK ∪ C
8:     end if
9:   end for
10: end procedure
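The Expand procedure can be sketched in Python as follows; the parent and children lookup tables are illustrative stand-ins for queries against the keyword-category graph in the database:

    def expand(keywords, parents, children):
        """Sketch of the Expand procedure: add each keyword's broader
        (parent) term, or its narrower (children) terms when no parent
        exists. `parents` and `children` are placeholder lookup tables."""
        expanded = set(keywords)
        for k in keywords:
            if k in parents:                  # k has a parent p
                expanded.add(parents[k])
            elif k in children:               # otherwise take the children C
                expanded.update(children[k])
        return expanded

    # Example: 'python' has the parent category 'scripting language'.
    print(expand({"python"}, {"python": "scripting language"}, {}))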

4.2.2.5.1 Detailed Explanation of Each Step

1. D = [ ], K = [ ], minScore = 0.7: D is an empty list that stores all the documents returned after running the query. It is referred to as the cache in this section because the database stores it in cache memory. K is an empty list that stores all the keywords associated with the document. The minScore variable stores the threshold score; in this algorithm it is 0.7. Any document with a score lower than minScore is pruned to maintain the quality of the search result.

2. FindQuestion(Random) = This operation returns a randomly selected unanswered question q̄ from the database. This question has a set of keywords K associated with it; those keywords are also retrieved for the query.

3. Query(Q) = The first query is created using all the keywords associated with the unanswered question. The keywords are joined using the ∧ (AND) operator to make sure the search results contain all the keywords.

4. FindDocs(KeywordSet) = This operation takes the set of keywords K related to the query question Q. It then uses the set of keywords to run a SPARQL query and search for documents that contain all the keywords. The SPARQL query is created automatically by the Suman system. This operation creates a cache file [D, S] that holds all the returned results with a score associated with each result. The query gives a higher score to documents that contain all the keywords and multiple occurrences of the keywords. Any document with a score less than minScore is pruned.

5. Expand(KeywordSet) = This operation takes the set of keywords K associated with the query question Q and returns an expanded set of keywords, as described in the Expand procedure above. It takes each keyword and finds the parent keyword if one exists. As described in Section 4.1.3, each keyword has been disambiguated earlier and has broader and narrower terms associated with it; the broader terms are the parent keywords and the narrower terms are the children keywords. If the keyword has a parent keyword, that keyword is added to the keyword set. If there are no parents, the children terms are searched for and added to the keyword set if they exist. This generates an expanded set of keywords EK.

6. UpdatedQuery(Q̄) = The new query Q̄ is created by joining all the expanded keywords using the ∨ (OR) operator. This is done to also retrieve documents that have most, but not all, of the keywords associated with them.

7. UpdateDoc(DocumentSet, UpdatedQuery, minScore) = This operation takes a set of documents D, a query consisting of the set of expanded keywords Q̄, and an optional score called minScore. If a document scores above minScore (0.7) and already exists in the cache file, the document's score is updated to give it a boost; otherwise, the document is added to the cache file with its score.

8. TextSearch(Query, DocSet) = This operation takes the text of the question q̄ and performs a text search over the cached dataset using the Lucene search engine. This search is not done on the whole database but on the cached dataset consisting of all the documents related to the keywords. The operation returns a new score for the documents contained in the cache.

9. UpdateScore(DocumentSet, KeywordScore, TextSearchScore) = The cached documents have two scores: the first was given by the keyword query and the second by the text search. This operation updates the score by taking the average of both scores. This is done to normalize the score and keep it within the range of 0.7 to 10.

10. GetVote(DocumentSet) = This operation gets the votes received by each document in the cache.

11. ScaleScore(DocumentSet, Score, Votes) = This operation takes the votes V of the documents and scales them to the same range as the score S. The scores are usually between 0.7 and 10, whereas votes are integers (V ∈ ℤ). The votes are normalized so that V̄ ∈ ℝ : 0 ≤ V̄ ≤ 10. This is done by unity-based normalization (von Davier, 2010), where the value is restricted to the range between a and b; here, a and b match the range of the score S:

V̄ = a + (V − Vmin)(b − a) / (Vmax − Vmin)

For example, if the cache had three documents with votes 10, 15 and 20, then unity-based normalization turns their scores into 0.7, 5.35 and 10 respectively, keeping the scores within the range of 0.7 to 10 (a short sketch of this normalization follows the list). The Suman algorithm does this for all the documents present in the cache. If a vote is negative, the normalized vote value is deducted from the score; if it is positive, the normalized vote value is added. The average of the score and the normalized vote value is then taken. Any document with a final score less than 0.7 is pruned from the list. This modifies the final ranking of the documents, and the new sorted list of documents is presented as the final result.

12. GetContext(DocumentSet) = This operation retrieves the parent posts of each document, where available. Questions do not have parent posts, but answers and comments do. All the parent posts of answers and comments are retrieved to provide context for the final result.
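The unity-based normalization in the ScaleScore step can be illustrated with a minimal sketch, using the example values from step 11:

    def scale_votes(votes, a=0.7, b=10.0):
        """Unity-based normalization of raw votes into the score range [a, b]."""
        v_min, v_max = min(votes), max(votes)
        return [a + (v - v_min) * (b - a) / (v_max - v_min) for v in votes]

    print(scale_votes([10, 15, 20]))  # [0.7, 5.35, 10.0], as in step 11 above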

The final result consists of a ranked list of documents that could potentially answer the unanswered question. The top ten results are shown as the query result; the number of results can be modified to show more or all of the results. The results consist of answers together with their parent posts.

This search algorithm used keyword-based semantic search and combined it with text-based search. The final ranking of the search result was modified using the crowdsourced votes given to questions and answers by the community.

The first step in searching for answers is using the keyword graph to find the best match. The query result is limited to scores greater than 0.7. If there are results with a score above 0.7, the votes on the answers are used to sort them. The top ten results of the returned query, together with the original question, are shown as the final result of the query.

If the initial query returns no results with a score greater than 0.7, the search term is expanded from the keywords to include the categories of the keywords. This helps to find similar questions and answers in the same programming categories, even if they are not about the exact topic. Again, the votes on each answer are used to sort them and the top ten results are shown. If there are no answers with a score above 0.7, no results are shown in the application.

4.2.2.6 Expert Finder

The questions, answers and keyword graph are extended to users. Users are linked with the keywords and votes to create a user-keyword graph. Every document (questions, answers, posts, comments) has a user associated with it, which allows the keywords to be joined with the users. Since keywords also have categories related to them, this also creates a graph of users related to categories. This helps in recommending experts.

The Suman expert recommendation algorithm is discussed in detail below. The notations used are as follows:

1. K = set of keywords

2. E = set of experts

3. Q = Query

4. R = Reputation points of experts

5. S = Score S ∈ R : 0 ≤ S ≤ 10

Algorithm 2 The Suman Expert Recommendation Algorithm

1: E = [ ], K = [ ], minScore = 0.7
2: [q̄, K] ← FindQuestion(Random)
3: Q ← k1 ∧ k2 ∧ ... ∧ kn, ∀ki ∈ K
4: [E, S] ← FindExperts(Q)
5: EK ← Expand(K)
6: Q̄ ← ek1 ∨ ek2 ∨ ... ∨ ekn, ∀eki ∈ EK
7: [EE, ES] ← UpdateExperts(E, Q̄, minScore)
8: [EE, R] ← GetReputation(EE)
9: [EFinal] ← ScaleScore(EE, S, R)
10: [E*Final] ← GetDetail(E)

1: procedure Expand(K)
2:   EK ← K
3:   for all k ∈ K do
4:     if k has parent p then
5:       EK ← EK ∪ {p}
6:     else if k has children C then
7:       EK ← EK ∪ C
8:     end if
9:   end for
10: end procedure

4.2.2.6.1 Detailed Explanation of Each Step

1. E = [ ], K = [ ], minScore = 0.7: E is an empty list that stores all the experts returned after running the query. K is an empty list that stores all the keywords associated with the document. The minScore variable stores the threshold score; in this algorithm it is 0.7. Any expert with a score lower than minScore is pruned to maintain the quality of the search result.

2. FindQuestion(Random) = This operation returns a randomly selected unanswered question q̄ from the database. This question has a set of keywords K associated with it; those keywords are also retrieved for the query. The question is the same as the one used for the search algorithm.

3. Query(Q) = The first query is created using all the keywords associated with the unanswered question. The keywords are joined using the ∧ (AND) operator to make sure the search results contain all the keywords.

4. FindExperts(KeywordSet) = This operation takes the set of keywords K related to the query question Q. It then uses the set of keywords to run a SPARQL query and search for experts that are linked to all the keywords. The SPARQL query is created automatically by the Suman system. This operation creates a file [E, S] that holds all the returned results with a score associated with each result. The query gives a higher score to experts that are linked to all the keywords and to multiple occurrences of the keywords. Any expert with a score less than minScore is pruned.

5. Expand(KeywordSet) = This operation takes the set of keywords K associated with the query question Q and returns an expanded set of keywords, as described in the Expand procedure above. It takes each keyword and finds the parent keyword if one exists. As described in Section 4.1.3, each keyword has been disambiguated earlier and has broader and narrower terms associated with it; the broader terms are the parent keywords and the narrower terms are the children keywords. If the keyword has a parent keyword, that keyword is added to the keyword set. If there are no parents, the children terms are searched for and added to the keyword set if they exist. This generates an expanded set of keywords EK.

6. UpdatedQuery(Q̄) = The new query Q̄ is created by joining all the expanded keywords using the ∨ (OR) operator. This is done to also retrieve experts that have most, but not all, of the keywords associated with them.

7. UpdateExperts(ExpertSet, UpdatedQuery, minScore) = This operation takes a set of experts E, a query consisting of the set of expanded keywords Q̄, and an optional score called minScore. If an expert scores above minScore (0.7) and already exists in the list, the expert's score is updated to give it a boost; otherwise, the expert is added to the list with their score.

8. GetReputation(ExpertSet) = This operation gets the reputation points of all the experts on the list.

9. ScaleScore(ExpertSet, Score, Reputation) = This operation takes the reputation R of the experts and scales it to the same range as the score S. The scores are usually between 0.7 and 10, whereas reputations are integers (R ∈ ℤ). The reputation is normalized so that R̄ ∈ ℝ : 0 ≤ R̄ ≤ 10. This is done by unity-based normalization (von Davier, 2010), where the value is restricted to the range between a and b; here, a and b match the range of the score S:

R̄ = a + (R − Rmin)(b − a) / (Rmax − Rmin)

For example, if the list had three experts with reputations 1000, 1500 and 2000, then unity-based normalization turns their scores into 0.7, 5.35 and 10 respectively, keeping the scores within the range of 0.7 to 10. The Suman expert recommendation algorithm does this for all the experts present in the list. The average of the score and the normalized reputation value is then taken. Any expert with a final score less than 0.7 is pruned from the list. This modifies the final ranking of the experts, and the new sorted list of experts is presented as the final result.

10. GetDetail(ExpertSet) = This operation retrieves all the available details of the experts. This includes their name, username, profile link, email, etc. All these details provide more information about the experts and how to contact them. This provides some context to the final result.

The final result consists of a ranked list of experts who could potentially answer the unanswered questions. The top ten results are shown as the query result. The search for experts is done similarly to the search for answers: the list of experts is best matched to the keywords of the question, so they are assumed to be experts on those topics.

The experts have additional information associated with them, such as their location, latest activity and posting history. This can be used to modify the query result to find experts in the same time zone, or experts who were recently active and post on the website. This would potentially help in finding the right experts at the right time and help users to create agile PSNs. For example, using the response time of the experts would help to recommend experts who respond quickly to questions; users in urgent need of a solution could find quick responders instead of the experts with the highest reputation points. These features have not been implemented in the Suman system yet, but they could be added in the future and would potentially help to create PSNs quickly and based on different criteria.

4.2.2.7 Design Innovation in the Suman System

The Suman system uses PSN data, specifically from question and answering systems like StackOverflow and Reddit. These PSNs are online communities where people post questions and other people answer those questions. Sometimes users also share other information like links, blog posts, etc.

Some PSNs, like StackOverflow, are not social networks, as people cannot create friendships or other ties; in other cases, like Reddit, users can create friendships and follow their friends' work.

The Suman system does not provide a platform for creating friendships or other ties, as its primary objective is search and information retrieval. However, it does use users' communication structure and interests to determine how closely they are related to each other based on topics of interest. It is not a platform for creating a social network, but the Suman system utilizes the community structure and the users' expertise and interests to recommend experts who could potentially solve the problem and answer the unanswered questions. This feature makes the Suman system purposive, because it helps to solve the problem, and social, because it recommends users whom one can connect with and talk to in order to find a solution to one's problems.

Furthermore, the Suman system finds the main topics for questions and answers and recommends users based on those topics, thus recommending experts with very focused interest. These features of the Suman system make it a system that could potentially be used to create an agile PSN where recommended experts could form communities and help solve particular problems.

4.2.2.7.1 Special features of the Suman algorithms: The Suman system is a new contribution to the research community as it solves a specific problem: it uses PSN data to find answers to unanswered questions and to recommend experts who could potentially answer those questions. The Suman search algorithm is distinctive for this specific reason, because it uses the crowdsourced PSN datasets, including questions, answers, user profiles, votes, favourites, etc., to make a weighted keyword graph. The algorithm also uses a combination of keyword-based text search and keyword-based semantic search. This approach helps to expand the categories of the question topics and finds answers from related topics that could potentially solve the problem.

The Suman system also created a weighted document-keyword graph to improve search. The graph linked the keywords with all the questions, answers and experts. The links were weighted by the votes given to questions and answers and by the frequency of the keywords. These data were used to improve the indexing of the dataset and to rank the SPARQL search results.

The other special feature of the Suman system is the way it ranks the search results: it ranks based on semantic analysis, and it also uses crowdsourced data like votes and favourites. This combination of search and ranking over a specific dataset to find answers and experts makes the Suman search algorithm different from other search algorithms.

4.2.3 Design Validation

The Suman system has specific requirements: to search for answers to unanswered questions from PSNs and to recommend experts who could potentially answer those questions. The Suman system uses Linked Data and Semantic Web technologies to do this. The system will be considered valid if it meets the above-mentioned requirements.

The Suman system needs to be validated to check that it satisfies the design criteria and meets the research goals.

4.2.3.1 Validating the Suman system design

The Suman system searches for answers to questions using the preexisting knowledge base in PSNs. To successfully use the semantic search approach, the Suman system needs to meet all the research challenges discussed in Chapter 1.

This is a valid design approach because the Suman system is a real-world prototype that uses real-world data from PSNs, specifically StackOverflow and Reddit. It uses multiple websites to collect and harvest the data, and the system uses real-world unanswered questions to search for answers. This follows the technical action research method to answer the research questions.

Furthermore, the Suman system uses available research tools to solve the named entity disambiguation problem, and the search algorithm is designed specifically to solve the problem discussed in the research question. Solving these research challenges will validate the design treatment. If the Suman system answers the research questions, it meets all the design requirements by solving the research problems mentioned above.

There are certain trade-offs when choosing to solve the design problem using existing tools and Semantic Web technologies, but the system meets the final goal of finding answers to the unanswered questions. Furthermore, the outputs of the Suman system are tested in Chapter 5, which answers KQ3, KQ4 and KQ5.

The system can also be evaluated using the existing unanswered questions to see if it finds answers and experts. If this evaluation is successful, the Suman system design is considered valid, fully answering the research question and meeting the DP1 and DP2 goals.

4.2.4 Treatment Implementation using StackOverflow and Reddit Datasets

The Suman system can work with any PSN that is a question and answering system. For this thesis, the StackOverflow and Reddit websites were chosen to build an application using the Suman system. These websites are good examples of PSNs and they have APIs for retrieving their data. The data are public and open for users to extract and analyze. Both websites have a focused interest, active communities and use crowdsourcing to solve technology-related problems.

StackOverflow is a question and answering website in which programmers ask questions and other programmers with expertise in the field answer them. It is a popular website, ranked 57 globally in the Alexa ranking (Alexa, 2015b) as of November 2015, with a large number of users, questions and answers. Reddit is another popular website, ranked 32 globally in the Alexa ranking (Alexa, 2015a) as of November 2015. On Reddit, people share news and information and have discussions. The Reddit community is divided and organized based on interest; these divisions are called subreddits. Data from programming-based subreddits were collected for the Suman system.

4.2.4.1 StackOverflow Data Mining

The StackOverflow’s data are public and has an API to retrieve data. StackOverflow also provides a regular data dump of all their public data which is used in creative common license3. The data dump consists of some public data like questions, answers and some information in users’ profiles and other information are anonymized. The votes and favourites data are anonymous. These votes and favourites give insight on the quality of questions and answers, but an important aspect of linking users to their votes is lost. Users’ voting information and favourites post information could have given some insight on their behaviour and helped us to create a link between users with similar behaviour. The information could also have been used to link users’ votes with other users’ responses, hence extending the communication network of the community.

The first method, used to get the majority of the data, is the data dump. StackOverflow releases a new data dump quarterly. The dump has data about questions, answers, comments, user information (public data only), badges and votes. The data are in XML format and the files are zipped and shared using P2P torrents. A data dump up to June 2014 was downloaded for this experiment.

The biggest problem faced while processing the XML files was their size. The whole data dump was 35 GB, and the biggest file was the posts XML file at 21 GB. To overcome this, the files were broken down into smaller files so that the XML could be parsed and stored in a database. The other issue with the dataset was the encoding: the parser encoded all the text to Unicode and then stored it in the Suman database.

[3] http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/
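A streaming parse of this kind can be sketched as follows; the 'row' tag and attribute layout follow the StackOverflow dump format, and the database insertion is left as a placeholder:

    import xml.etree.ElementTree as ET

    def stream_posts(path):
        """Stream the posts XML dump row by row, so the 21 GB file never
        has to fit in memory."""
        for _event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "row":
                yield dict(elem.attrib)   # post fields are XML attributes
                elem.clear()              # free the parsed element immediately

    for post in stream_posts("Posts.xml"):
        pass  # insert the post into the MySQL staging database here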

The API was used to get the list of tags, the total number of questions in each tag, tag synonyms and related tags. The API returns JSON files; these files were parsed and stored in the database. The API also had a limit of 30 calls per minute, and an IP address without an access token can make a total of 10,000 calls. That is why the data dump was used to get the core data and the API was used to get specialized data that was not available in the data dump.
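For illustration, paging through the tag list via the StackExchange API could look like this sketch; the route and field names follow the public v2.2 API, and the page count is illustrative:

    import requests

    def fetch_tags(pages=5):
        """Collect tag names from the StackExchange API, page by page."""
        tags = []
        for page in range(1, pages + 1):
            resp = requests.get(
                "https://api.stackexchange.com/2.2/tags",
                params={"site": "stackoverflow", "page": page, "pagesize": 100},
            )
            data = resp.json()
            tags.extend(item["name"] for item in data["items"])
            if not data.get("has_more"):   # stop when the API has no more pages
                break
        return tags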

Figure 4.2 shows the StackOverflow dataset schema. This was built to store the data temporarily in a MySQL database.

Figure 4.2: ER diagram of StackOverflow dataset

4.2.4.2 Reddit Data Mining

Reddit has an API that allows for the retrieval of posts from particular subreddits. For this experiment, 15 programming-related subreddits were chosen that corresponded to the top 10 tags of StackOverflow, including Java, PHP, Python, Javascript, Ruby, C++, C#, Perl, Programming, Learnprogramming and Webdev.

The PRAW [4] (Python Reddit API Wrapper) library was used to get the posts, comments, votes, users, flairs and other information from every subreddit. The PRAW library dealt with the API bandwidth and call limits. The JSON files were parsed by the library and stored in the database.

[4] https://github.com/praw-dev/praw
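A minimal crawl of one subreddit with PRAW could look like the sketch below; it uses the modern PRAW interface (the version available at the time exposed a slightly different API), and the client credentials are placeholders:

    import praw

    reddit = praw.Reddit(
        client_id="CLIENT_ID",          # placeholder credentials
        client_secret="CLIENT_SECRET",
        user_agent="suman-crawler/0.1",
    )

    for submission in reddit.subreddit("python").new(limit=100):
        print(submission.title, submission.score, submission.author)
        submission.comments.replace_more(limit=0)  # flatten 'load more' stubs
        for comment in submission.comments.list():
            print("  ", comment.score, comment.body[:60])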

The main limitation of the Reddit API was that it did not return information beyond 200 pages, and each page contained only 25 posts, so every subreddit only provided a limited number of posts. The Reddit API did not provide complete datasets for a particular subreddit in the way StackOverflow did. Hence, the Reddit dataset and user profiles were incomplete.

Figure 4.3 shows the Reddit dataset schema. This was built to store the data temporarily in a MySQL database.

Figure 4.3: ER diagram of Reddit dataset

4.2.4.3 Data Structuring using RDF

StackOverflow and Reddit data were collected, encoded in Unicode and stored in the MySQL database after mining. Some posts contained snippets of code, which were stored as text in the database. The StackOverflow dataset consisted of 15 million questions, 28 million answers, 87 million votes and 2 million users. The whole dataset was too big for the work machine to cope with. To make the dataset manageable while still keeping it complete within the community, the top 10 tags were chosen. This trimmed the dataset down to 7.2 million questions, 12.9 million answers, 63 million votes and 2 million users. This did not affect the community structure because all the data for the top 10 tags were complete. When choosing the data, all the related tags and synonyms of the tags were also included to keep related communities together. The complete dataset of the tags was turned into RDF.

The Reddit data consisted of 190 thousand posts, 0.39 million comments and 172 thousand users. In comparison to StackOverflow, the Reddit dataset was smaller. This dataset was converted into RDF as well.

4.2.4.3.1 Generating RDF As mentioned in the earlier section, the FOAF ontology was used to describe the users and their profile information. Both StackOverflow and Reddit only show basic user profile information for privacy reasons.

Figure 4.4: RDF schema of User’s profile

An example of a simple user profile, expressed in FOAF (shown here in illustrative Turtle syntax):

    <http://stackoverflow.com/users/2/geoff-dalgas> a foaf:Person ;
        foaf:name "Geoff Dalgas" ;
        foaf:mbox_sha1sum "b437f461b3fd27387c5d8ab47a293d35" ;
        foaf:based_near "Corvallis, OR" ;
        foaf:age 35 ;
        foaf:homepage <http://stackoverflow.com/users/2/geoff-dalgas> .

Similarly, the posts created by users, the questions and the answers are described using the SIOC and DC ontologies. A post is linked with the user's profile data by using the same URI for the user.

Figure 4.5: RDF schema of StackOverflow posts

An example of simple post information, expressed with the SIOC and DC ontologies (shown here in illustrative Turtle syntax; the subject URIs and property choices are placeholders):

    :question a sioc:Post ;
        dc:title "Calling an external command in Python" ;
        dcterms:created "2008-09-18T21:42:52.667" ;
        sioc:content "How can I call an external command in Python" .

    :answer a sioc:Post ;
        sioc:reply_of :question ;
        dcterms:created "2008-09-18T23:42:52.667" ;
        sioc:content """Look at the subprocess module in the stdlib:
            from subprocess import call
            call(["ls", "-l"])
            The advantage of subprocess vs system is that it is more flexible
            (you can get the stdout, stderr, the "real" status code, better
            error handling, etc...). I think os.system is deprecated, too, or will be:
            http://docs.python.org/library/subprocess.html#replacing-older-functions-with-the-subprocess-module
            For quick/dirty/one time scripts, os.system is enough, though.""" .

As seen in Figures 4.4, 4.5 and 4.6, there are many fields in the StackOverflow and Reddit data that cannot be described by the FOAF, SIOC or DC ontologies. For example, fields like the up votes and the down votes do not have equivalent properties in these ontologies. These are limitations of the FOAF and SIOC ontologies. In these special cases, an RDF schema was defined to add vocabulary describing the upvotes and downvotes given to the questions, answers and comments. Three properties were defined in the RDFS: :hasVote, :hasUpVote and :hasDownVote.

The properties are self-explanatory. :hasVote describes the total number of votes a post has; it can be a positive or negative integer. :hasUpVote gives the number of up votes a post has, and similarly the number of down votes is given by the :hasDownVote property.
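A sketch of defining these three properties with rdflib is shown below; the namespace URI is a placeholder for the one used in the Suman schema:

    from rdflib import Graph, Literal, Namespace, RDF, RDFS
    from rdflib.namespace import XSD

    SUMAN = Namespace("http://example.org/suman#")  # placeholder namespace

    g = Graph()
    g.bind("suman", SUMAN)
    for prop in ("hasVote", "hasUpVote", "hasDownVote"):
        g.add((SUMAN[prop], RDF.type, RDF.Property))
        g.add((SUMAN[prop], RDFS.range, XSD.integer))
        g.add((SUMAN[prop], RDFS.comment, Literal("Vote count attached to a post")))

    print(g.serialize(format="turtle"))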

The other issue when generating the RDF was giving both the StackOverflow and Reddit data a common structure. The datasets mined from these websites were different: StackOverflow data consisted of questions and answers, while Reddit data consisted of posts and comments, where comments could also have children comments. The Reddit posts were of two types: text posts, which were questions, and information and link posts, which linked to external sources.

The issue was resolved by considering all questions, answers and comments as posts. These posts were of two types to differentiate their purpose.

Figure 4.6: RDF schema of Reddit posts

Type 1 was for StackOverflow questions and Reddit main posts. Type 2 was for StackOverflow answers and Reddit comments (parent and children comments).

Each post had a parent post associated with it. Type 1 posts had no parent posts. Type 2 posts, consisting of answers and main comments, had their corresponding Type 1 posts as parents. The children comment posts in Reddit had the main comment post as their parent post, to maintain the thread structure of the conversation. Figure 4.7 shows this relationship.

The other issue was that the mined data from the two websites were not homogeneous (e.g. StackOverflow has badges and reputations whereas Reddit has karma and trophies). This was resolved by adding an RDF property for each different piece of information; if the data were available they were added, otherwise they were ignored. One important example of such an issue is that a StackOverflow question has an 'accepted answer' property containing the id of the answer that the asker accepted as solving their problem. Reddit does not contain such data; instead it has 'flair' data that give extra information about the post.

4.2.4.3.2 Database All the RDF created from the StackOverflow and Reddit datasets was stored in the Stardog database, together with the RDF schema file containing all the newly created properties. There were more than 45 million triples containing all the information about the posts, and more than 3.2 million FOAF profiles in the dataset. 15 thousand of the resources are unique annotated keywords linked to the DBpedia dataset, and 6 thousand are linked to the OpenCalais dataset.

Figure 4.7: Parent-child relationships in the two types of posts in StackOverflow and Reddit.

4.2.4.4 Keyword Annotation and Linking

The StackOverflow dataset is sparsely annotated by user-generated tags and is not linked to any other datasets. When a user posts a question on the website, they add tags to it. The tags help to categorize the question into different topics, show it on the different tag pages and notify users subscribed to those tags. The answers, on the other hand, do not have any tags; they inherit the tags from their questions. During data collection, the question tags were added to the answers by the data mining script.

The Reddit dataset has no tags associated with posts. During the data mining process, the name of the subreddit was added as a tag to both posts and comments. This worked for language-specific subreddits like Python and PHP, but did not work for general programming subreddits like 'learnprogramming'.

The absence of tags in Reddit posts caused problems in finding the topic and category of the questions, answers and posts. The main topics in the text of questions and answers were not clearly stated; the topics were ambiguous and not linked to any vocabulary or concepts. This was one of the problems resolved by the Suman system through keyword disambiguation using Wikipedia Miner and OpenCalais.

The questions, answers, comments and tag data were annotated with links from the Wikipedia and OpenCalais datasets to resolve name and topic ambiguity. The returned keywords were further transformed into RDF. By using the links to the matched topics, the StackOverflow and Reddit data were linked to the Wikipedia and OpenCalais datasets.

4.2.4.4.1 Wikipedia Miner: Wikipedia Miner was used to annotate posts from the top 10 tags of StackOverflow and Reddit. The service returns the text with keywords associated with it. The service accepts simple text or HTML and returns different file formats like XML, JSON, etc. The service is configurable: one can specify the density of links to be added and the level of accuracy required from the service. For this experiment, the threshold score was 0.75; any keyword match below this score was ignored to maintain the quality of the keywords. The Wikipedia categories are also extracted with the keywords.

Figure 4.8 is an example of the annotated text of a posted question and answer.

Figure 4.8: Wikipedia Miner annotation example

4.2.4.4.2 OpenCalais: OpenCalais is another web service used to annotate the StackOverflow and Reddit posts with the OpenCalais dataset. An example annotation of text using OpenCalais is below.

Consider this sample question: "Does Python have a ternary conditional operator? If not available, is it possible to simulate one concisely using other language constructs?"

This question is annotated by OpenCalais, and the following annotations are returned by the service (extracted term → matched entity):

Python
Software engineering

Conditional → Conditional (programming)
Python → Python (programming language)
C → C (programming language)
Ternary operation → Ternary operation
Software engineering → Software engineering
Computing → Computing
Computer programming → Computer programming

As seen from the above snippet, the OpenCalais service finds the keywords and matches them with its own taxonomy or vocabulary. It assigns an importance to each disambiguated tag and adds an entity relevance score to each keyword. Similar to Wikipedia Miner, it is configurable: OpenCalais accepts different input formats and provides different output formats. It accepts a threshold for the entity relevance score and only provides matches above the threshold; a threshold of 0.75 was set for finding keywords. The entity relevance score is comparable across inputs, so relevancy is determined at the collection level, not just at the document level (Reuters, 2008).

OpenCalais adds SocialTags to extract predefined structured information; this is another way to extract the context of the content. A topic extracted by 'Categorization' with a score higher than 0.6 will also be extracted as a SocialTag. If its score is higher than 0.8, its importance as a SocialTag is set to 1; if the score is between 0.6 and 0.8, its importance is set to 2. This also allows categories to be added to the dataset. The keywords are linked to the OpenCalais dataset.
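The SocialTag importance rules translate directly into a small sketch:

    def social_tag_importance(score):
        """Map an OpenCalais categorization score to a SocialTag importance
        level, following the rules described above."""
        if score > 0.8:
            return 1       # high importance
        if score > 0.6:
            return 2       # moderate importance
        return None        # below 0.6: not extracted as a SocialTag

    assert social_tag_importance(0.85) == 1
    assert social_tag_importance(0.70) == 2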

All the keywords annotated using Wikipedia Miner and OpenCalais were linked with the DBpedia and OpenCalais datasets. There were more than 15 thousand unique annotated keywords linked to the DBpedia dataset, and 6 thousand were linked to the OpenCalais dataset.

4.2.4.5 Information Retrieval

The unanswered questions from StackOverflow and Reddit were used as queries to find similar questions with answers. This was done to test the hypothesis that the resulting answers could be used to answer the original unanswered questions. The Suman system also recommended experts who could answer the questions.

The Suman algorithm discussed earlier was used with the StackOverflow and Reddit datasets, and unanswered questions from both sites were used to test the system. The system searches for answers to an unanswered question, so a query consisted of an unanswered question and the keywords associated with it. The final result consisted of a ranked list of similar questions with answers that could potentially answer the original unanswered query question. The Suman algorithm also recommended a list of experts best suited to answer the unanswered question.

4.2.5 Implementation Evaluation

The Suman system needs to be evaluated to see if it satisfies all the requirements mentioned in the problem investigation section. The system also needs to meet all the research challenges, and it should be tested to see that it does what it is intended to do.

The Suman system is evaluated on these criteria:

1. Uses PSN data, collecting and harvesting it.

2. Solves the named entity disambiguation problem.

3. Uses Linked Data and Semantic Web technologies.

4. Uses semantic search to find answers to unanswered questions.

5. Searches for experts that could answer the unanswered questions.

The Suman system fulfills the first criterion because it uses the StackOverflow and Reddit datasets as its knowledge base. It uses Wikipedia Miner and OpenCalais to solve named entity disambiguation, fulfilling the second criterion. All the questions and answers have semantics added to them, giving extra meaning to the objects; these keywords are linked and weighted. The system also links the data to the Wikipedia and OpenCalais datasets, thus fulfilling the third criterion of linking the datasets.

The Suman system is tested using unanswered questions from StackOverflow and Reddit to fulfill the last two criteria. The details of these tests are given below.

4.2.5.1 Keyword Annotation and Categories Analysis

This section gives an overview of the annotations added to the questions and answers by the Suman system, through a statistical analysis of all the semantically enriched data in StackOverflow and Reddit.

Keywords were added to the questions and answers of the top 10 tags in the StackOverflow dataset.

The Suman system annotated the questions and answers with links to the DBpedia and OpenCalais datasets. The annotation procedure adds keywords to the keywords table, together with a degree of confidence. An advantage of the added keywords is that on the StackOverflow website users are restricted to the already existing list of keywords and need certain reputation points to add a brand new keyword; no such restriction exists in the Suman system. Also, the system annotated every occurrence of a keyword, giving repeated keywords a higher degree of confidence: if a keyword occurred multiple times in a question and answer, it was one of the important keywords describing the post.

The analysis of the keyword annotation shows that the system added more than 20 million keywords to the posts, of which more than 15 thousand were unique keywords; these unique keywords were repeated multiple times within the questions and answers. On average, 4.87 keywords were added to each post.

To maintain the quality of the keywords, the degree of confidence was kept at 75%; any keyword with a lower score was ignored by the system. This reduced the number of added keywords overall. Initially, when the confidence score was kept at 50%, an analysis of 1000 posts showed that the average number of keywords was 9.21 per post, but non-technical text was annotated as well. The degree of confidence was therefore increased and the keyword categories were limited to technical terms.

The keywords were also categorized using the DBpedia categories. Every keyword had a list of categories associated with it, and the list was kept separate from the keywords. The dataset had more than 600 categories in total and each keyword had 3.4 categories on average. The categories did not have a degree of confidence because the list was taken directly from the DBpedia website.

Similar to StackOverflow, the Reddit posts and comments were annotated too. The system added 2.6 million keywords to the posts and comments, of which 9.3 thousand were unique keywords, i.e. these unique keywords were repeated multiple times within the questions and answers. On average, 3.31 keywords were added to each post. As before, the degree of confidence was kept greater than 50%.

The keywords were also categorized like the StackOverflow data. The dataset had more than 500 categories in total and each keyword had 3.2 categories on average. The categories did not have a degree of confidence because the list was taken directly from the DBpedia website.

4.2.5.2 Answers and Experts Analysis

The Suman system searches for answers to the unanswered questions in the database and gives each answer a confidence score between 0 and 10. The system was tested using the unanswered questions from the month of July 2014. There were 20,326 unanswered questions in the top 10 tags, and the system found relevant answers with a confidence score of more than 75% for 13,209 questions. 23.62% of unanswered questions had one or more answers with a confidence score of 85%, and 82.27% of unanswered questions had answers with a confidence score of more than 50%.

Answers' confidence score   Number of questions   Percentage of questions
>90%                        2623                  12.9%
81-90%                      3104                  15.27%
71-80%                      5845                  28.75%
61-70%                      3804                  18.71%
50-60%                      1348                  6.63%

Table 4.1: StackOverflow: Percentage of questions with answers with confidence score

In the search results, any answer with a confidence score of less than 50% is rejected and only the top 3 answers are shown. Table 4.1 shows all the answers (without the limit of 3) available in each range of confidence score for the unanswered questions. For the evaluation of answers, the algorithm was changed to show the lower-rated answers, and the number of questions with one or more answers rose to a staggering 97.72%.

The search result also showed the list of experts recommended to answer each unanswered question. The expert list followed the same pattern as the answers: only the top 5 experts with a confidence score higher than 50% were shown. Even if a question had no answer with a confidence score greater than 50%, the system would still show a list of recommended experts; this list was also restricted to experts with a confidence score greater than 50%.

There were 20,067 questions with at least one recommended expert with a confidence score greater than 50%. This equates to 98.73% of the unanswered questions on StackOverflow.

Experts' confidence score   Percentage of experts
>90%                        14.74%
81-90%                      16.31%
71-80%                      27.66%
61-70%                      22.87%
50-60%                      13.92%

Table 4.2: StackOverflow: Percentage of experts with confidence score

The Suman search algorithm was used to search for answers to unanswered questions on Reddit for the month of June 2014. There were 1381 posts with no comments; these were considered unanswered questions for the analysis. The Suman system found one or more answers for 84.5% of the posts with a degree of confidence of more than 50%, and 7.89% had answers with a confidence score of more than 85%.

A quick glance at the results showed that most of the answers (96.37%) came from StackOverflow.

Answers’ confidence score Number of questions Percentage of questions >90% 48 3.47% 81-90% 81 5.86% 71-80% 425 30.77% 61-70% 342 24.76% 50-60% 271 19.62%

Table 4.3: Reddit: Percentage of posts with answers and their confidence score

The top 3 answers are shown as the search result. When the confidence score limit is removed, 99.79% of unanswered questions have one or more answers.

Again, similar to StackOverflow, the top 5 experts are recommended for each question. 98.82% of questions have one or more recommended experts. Table 4.4 lists the percentage of recommended experts in each confidence range.

Experts' confidence score   Percentage of experts
>90%                        12.65%
81-90%                      17.23%
71-80%                      24.38%
61-70%                      26.29%
50-60%                      11.76%

Table 4.4: Reddit: Percentage of experts with confidence score

One thing noticed in all the Reddit search results and expert recommendations was that most of the answers and experts came from StackOverflow. The reason could be that the StackOverflow data are huge compared to the Reddit dataset (see Tables 3.1 and 3.8), so the users on the StackOverflow website have more keywords and reputation points associated with them; the answers are similarly affected.

Also, since the StackOverflow and Reddit databases were combined when converted to RDF, it was not possible to test the quality of the Reddit answers and users' expertise by completely disabling the StackOverflow data.

4.2.5.3 Expert Recommendation Analysis

The Suman system used the reputation and karma points to search for experts based on the keywords and to recommend them as experts for the unanswered questions. The system only picks users who have answered a question and have at least one positive vote for the answer. StackOverflow and Reddit users are merged together in the expert database, so they cannot be analyzed separately.

There are 3.2 million users in the database that can be considered experts, and 20 thousand keywords. The experts are recommended based on individual keywords and also on sets of keywords taken together. The keywords also have categories, so the experts can also be recommended based on categories. The recommendation algorithm not only picks the top expert for each keyword, it also picks experts based on sets of keywords and categories.

For example, according to the StackOverflow website, the top user or expert for C# is Jon Skeet, with more than eighty thousand reputation points, and for Python it is Alex Martelli, with more than nineteen thousand reputation points. These users appear on the individual tag pages as the top users; without tag disambiguation they only appear as experts on a particular tag, not on the joint concept of the topic.

Tag          Top user (reputation points)
C#           Jon Skeet (80.6k)
Java         Jon Skeet (39.7k)
Python       Alex Martelli (19.8k)
PHP          Pekka (9k)
Javascript   CMS (12.3k)

Table 4.5: Top users of top tags in StackOverflow

When the tags are disambiguated and the keywords are matched to topics, both Java and C# are categorized as object-oriented programming languages, and here Jon Skeet is considered an expert in the whole area with more than one hundred and twenty thousand reputation points. Similarly, when the programming languages are further categorized as server-side scripting languages, with Python, PHP and Perl as the main languages, Alex Martelli is considered the expert, and the user CMS is an expert in client-side languages such as Javascript and AJAX with twelve thousand reputation points.

Disambiguated keywords                               Top expert (reputation points)
Object-oriented programming (C#, Java)               Jon Skeet (120.3k)
Programming language (C#, Java, Python)              Jon Skeet (120.7k)
Server-side scripting language (Python, PHP, Perl)   Alex Martelli (20.2k)
Client-side scripting language (Javascript, AJAX)    CMS (12.3k)

Table 4.6: Top users of top disambiguated topics in the Suman system.

This becomes more complicated when sets of tags are considered. For example, when Java and MySQL are used together, the top expert is JB Nizet with 9k reputation points, which is none of the original experts. Again, when another set is used (Java, MySQL, Database and Lucene), Mauricio Scheffer is considered the expert with 4.3k reputation points.

The Suman system provides more flexibility and diversity in generating expert lists that are not originally present on the main websites. It also provides experts across domains, without the restriction of a single website, and can help in better search and discovery of experts and information.

4.2.5.4 Linked Data Graph Analysis

The StackOverflow and Reddit data are annotated, converted into triples and linked with the DBpedia and OpenCalais datasets. The Wikipedia Miner and OpenCalais applications are used to annotate the data and link it to the Linked Data cloud.

There are more than 45 million triples that contain all the information about the posts and users. 15 thousand of the resources are unique annotated keywords that are linked to the DBpedia dataset, and 6 thousand are linked to the OpenCalais dataset.

User data are converted using FOAF, and there are more than 3.2 million FOAF profiles in the dataset.

Figure 4.9: An example of a keywords N-Triple linked to DBpedia

From all the analysis above, it is concluded that the Suman system design meets the requirements to solve DP1 and DP2: it fulfills all the design goals and meets the design requirements. The validity of the keywords and the answers is checked in the next chapter as knowledge questions, discussed in detail as KQ3, KQ4 and KQ5.

Chapter 5

The Suman System Evaluation

The Suman system has been designed to answer the design problems DP1 and DP2. It is a prototype system built to answer RQ1. To validate that the system works, it must be evaluated. This raises further knowledge questions: KQ3, KQ4 and KQ5. Knowledge questions are descriptive questions about the artifacts, in this case the Suman system. They are mentioned in Chapter 1.2.

These knowledge questions use the Design Science empirical cycle for problem solving. This chapter uses the design methodology discussed in Wieringa (2014).

5.1 Knowledge Problem Investigation

The main knowledge questions discussed in this chapter are as follows:

KQ3 investigates whether Linked Data technologies are useful for the search and discovery of information in PSNs. This is tested by measuring whether Linked Data technologies improve the understanding of the underlying meaning and concepts of questions and answers in PSNs.

KQ4 investigates how well the Suman system works and answers the unanswered questions in PSNs. The Suman system needs to be tested to make sure it does not have any bugs and that it provides the correct output.

KQ5 investigates how good the recommended experts are and whether they could answer the unanswered questions.

The Suman system algorithm generates the following information from the StackOverflow and Reddit datasets:

1. Keywords with a degree of confidence


2. Answers with rank and degree of confidence

3. Experts name with rank and degree of confidence

All three outputs are investigated through the three knowledge questions stated above.

The main goal of these research questions is to investigate how well the Suman system answers the unanswered questions, and thereby to answer the main research question (RQ1).

5.2 Research Design

The main object of study for the knowledge questions is whether Linked Data and Semantic Web technologies provide useful means to search for answers in PSNs. The other object of study is the search results generated by the Suman system. The primary goal is to study how well the answers provided by the Suman system answer the unanswered questions. Similarly, the usefulness of the recommended experts needs to be measured.

Research design for each knowledge question is discussed below.

5.2.1 Knowledge Question 3 (KQ3)

To answer KQ3, the usefulness of the semantics added to the questions and answers needs to be measured in the context of PSNs. This can be done with a single-case mechanism experiment, in which a single instance of the Suman system is tested to measure the usefulness and accuracy of the keywords that describe the questions and answers.

The Suman system uses Semantic Web and Linked Data tools to generate all the keywords and, based on the keyword graph, finds the answers and the experts. Testing the usefulness of the keywords therefore also tests the usefulness of the tools.

The first output of the Suman system is the semantically enriched data related to the StackOverflow and Reddit datasets. These semantically enriched keywords can be tested to see how relevant they are and whether they add value to the data. This can be done in a user experiment in which participants rate how well the semantically enriched keywords describe the data and add value to the datasets.

The main component of this research design can be broken down accordingly:

• Object of Study: The semantically enriched keywords generated by the Suman system.

• Treatment specification: There are system-generated keywords and user-generated keywords. A user experiment could be conducted where participants are asked to rate both sets of keywords.

• Measurement specification: The keywords are rated on a standard scale so the ratings can be compared. A T-Test is a suitable statistical test to compare the mean ratings of the paired dependent variables.

• Inference: The higher-rated keywords are considered better. If the system-generated keywords are rated higher, then semantically enriching the data adds value to the datasets.

For convenience, this test is called the Keyword T-Test in the rest of the chapter. If its result is moderately strong, it can be inferred that Linked Data and Semantic Web technologies add value to PSN data, and it can be concluded that they can improve the search and discovery of answers to unanswered questions.

5.2.2 Knowledge Question 4 (KQ4)

The second output of the Suman system is the search results, specifically answers with ratings. These results need to be evaluated to determine their usefulness and whether they are the right answers for the questions.

Another user experiment can be conducted in which users rate how well the answers provide a solution to the unanswered question. Experts in the programming language can tell whether an answer applies to the problem and whether users would benefit from it. The algorithm ratings can then be compared to the user ratings to check for correlation; a strong correlation would answer KQ4.

The main component of this research design can be broken down accordingly:

• Object of Study: The answers produced by the Suman system after conducting a search for unanswered questions

• Treatment specification: A user experiment could be conducted where users are asked to rate how well the answers searched by the Suman system can answer the unanswered question.

• Measurement specification: The answers are rated on a standard scale and also carry a rating from the algorithm. A correlation test is suitable to compare the user ratings and the algorithm ratings.

• Inference: If there is a correlation between the user ratings and the algorithm ratings, then it can be inferred that the Suman system works and can find answers for unanswered questions.

For convenience, this test is called the Q&A Correlation Test in the rest of the chapter. A strong positive correlation between the user ratings and the algorithm ratings would show that the Suman system performs well and successfully finds answers for unanswered questions in PSNs.

5.2.3 Knowledge Question 5 (KQ5)

The third output of the Suman system is the list of recommended experts, with ratings, who could potentially answer the unanswered questions. An experiment needs to be conducted to measure the usefulness of this list.

The main component of this research design can be broken down accordingly:

• Object of Study: List of recommended experts generated by the Suman system that could potentially answer the unanswered questions.

• Treatment specification: A user experiment could show the expert's profile and expertise to participants and ask them to rate how well the expert could answer the unanswered questions.

• Measurement specification: Participants could be asked to rate each expert's expertise, but this is not a good measurement because full information about an expert's expertise cannot be provided. A correlation test could then compare the user ratings and the algorithm ratings.

• Inference: This experiment is hard to conduct, and the quality of the expert ratings could not be fully inferred from the user ratings.

This is hard to assess in a user experiment setting. A user experiment similar to the ones above is not appropriate for testing the expert recommendation system, for the following reasons:

• The Suman system returns the list of recommended experts and the algorithm score; it does not provide any other information.

• To test the recommended expert list, participants would need to know the complete user profiles of the experts on the list.

• A complete profile of the experts cannot be provided to the participants (due to size and time constraints).

• Participants cannot accurately judge the expertise level of an expert by seeing their name and algorithm score.

Therefore the two user experiments are not appropriate for the expert list, and the evaluation of the expert generator part of the Suman system was dropped. Only the keywords and answers are evaluated.

5.3 Research Design Validation

The results of the empirical cycle are fallible: the designed system might not fully meet the stakeholders' goals, and answers to the knowledge questions might have limited validity. The designed system and the answers produced must therefore be justified. This is done by validating the designed system against the stakeholders' goals and requirements, and by validating the inferences made in the empirical cycle.

The experiments and analyses described above should be justified to show that they are the right approach and research design to answer the knowledge questions.

5.3.1 Knowledge Question 3 (KQ3)

The proposed Keyword T-Test is a user experiment to test the usefulness of semantically enriched keywords. There are pre-existing user-generated tags and keywords in the StackOverflow and Reddit data; the Suman system performs named-entity disambiguation and adds more semantically enriched tags. The two sets of tags can be compared to see which adds meaning to the questions and answers and describes them accurately. A statistical T-Test is a valid way to compare the means of two sets of dependent variables.

• Object of Study justification: The semantically generated keywords are the basic element of the Suman system; they are its first output, and the search algorithms are built on them. Testing the keywords is justified to check that the Suman system rests on a sound base.

• Treatment specification justification: Participants with experience in the field can be deemed good judges of the usefulness of the keywords. A user experiment to test the usefulness and validity of the keywords is justified.

• Measurement specification justification: A T-Test is a standard statistical test to compare the mean ratings of two paired dependent variables.

• Inference justification: If the effect size r is meaningful and p < 0.05, then the inference made is justified.

The Keyword T-Test is a valid test to answer KQ3. The inferences from its results can be justified and used to draw conclusions about the Suman system. This experiment fully answers KQ3.

5.3.2 Knowledge Question 4 (KQ4)

The Q&A Correlation Test evaluates how well the answers found by the Suman system's search algorithm provide solutions to the unanswered questions. The correlation between the algorithm ratings and the participants' ratings can provide useful inferences.

• Object of Study justification: The answers returned as search results are tested here; this relates directly to the main research question of the thesis. Testing the quality of the answers found by the Suman system is therefore justified.

• Treatment specification justification: Participants with experience in the field can be deemed good judges of the usefulness of the answers. A user experiment to test the usefulness and validity of the answers is justified.

• Measurement specification justification: A correlation test gives a good indication of how far users agree with the algorithm ratings. The users rate the answers on a standard scale of 1 to 10 and the algorithm rates them on a comparable 0-10 scale, so this measurement gives a clear indication of whether the algorithm is working correctly.

• Inference justification: The stronger the correlation between the user ratings and the algorithm ratings, the more confidently it can be inferred that the algorithm is finding the right answers to the unanswered questions and doing what it is intended to do. So the inference made in this experiment is justified.

The Q&A Correlation Test is a valid test to answer KQ4. The inferences from its results can be justified and used to draw conclusions about the Suman system. This experiment fully answers KQ4.

5.3.3 Knowledge Question 5 (KQ5)

The third experiment is to test how good the recommended expert list is. These experts could potentially answer the unanswered questions; since the list is also generated by the Suman system, it needs to be tested.

• Object of Study justification: The recommended lists of experts generated by the Suman system, who could potentially answer the unanswered questions.

• Treatment specification justification: Participants with experience in the field can be deemed good judges of the usefulness of the experts once they see their past answers. The problem is that the experts' complete history cannot be shown to the participants, so this experiment would be difficult to conduct.

• Measurement specification justification: It is hard to measure the usefulness of an expert from a partial profile.

• Inference justification: The goals of this experiment can only be partially met. One cannot fully claim that the list of experts is suitable from a partial profile and answer-submission history.

The expert recommender test is not a valid test to answer KQ5. The inferences from its results could not be justified or used to draw conclusions about the Suman system, because the test would not fully answer KQ5.

5.4 Research Execution

The Suman system has been evaluated with two experiments to make sure that the added keywords are meaningful and that the answers provide solutions to the questions.

The first experiment is the Keyword T-Test. In the keyword experiment, participants are given a questionnaire containing unanswered questions with two sets of keywords: one set is the existing keywords from StackOverflow and Reddit and the other is the system-generated keywords. Participants are asked to rate the quality of the keywords with respect to the question.

The Q&A experiment evaluates the answers. In this experiment, participants are shown an unanswered question together with a similar question whose answer may provide a solution to it. The participants are asked to rate the quality of the answer and how effectively it answers the unanswered question.

The experts generated by the system were not tested: participants in the evaluation experiment did not evaluate the Expert Finder application. There was no easy and simple way to provide a complete user profile of every expert to the participants. Such a profile would consist of the user's posts, badges, votes and reputation points; this is a huge amount of data, and there was no practical way to show it to the participants and then ask them to rate the expert recommender. This output was therefore not tested, and KQ5 is not answered.

The steps taken to conduct the two experiments are discussed in detail below.

5.4.1 Calculating Sample Size

In both experiments, to build a proper statistical model to test the Suman system, it is vital to cover the range of questions, answers, keywords and user responses, and to choose the right sample size. The sample must reflect the range of answer quality in the entire population of data so that a sound statistical analysis can estimate the effectiveness of the Suman system.

Figure 5.1: Calculating keyword experiment sample size using G*Power

The sample sizes for both experiments (keywords and Q&A) were calculated using the G*Power software. The values of the alpha and beta errors, standard deviation, power, etc., are the standard values used in the research community.

1. Keyword experiment: Calculating the sample size using G*Power gives a total sample size of 26; the values used are shown in Figure 5.1. 26 questions are therefore required for a sample that represents the whole population of keywords in the Suman system.

2. Q&A experiment: Calculating the sample size using G*Power gives a total sample size of 46; the values used are shown in Figure 5.2.

Figure 5.2: Calculating Q&A experiment sample size using G*Power

46 answers are required for a sample that represents the whole population of answers in the Suman system.

3. Participant size: Calculating the sample size using G*Power gives a total sample size of 20; the values used are shown in Figure 5.3.

Figure 5.3: Calculating participants sample size using G*Power

20 participants are required to evaluate the Suman system properly; an equivalent power calculation in code is sketched below.
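The same kind of power calculation can be reproduced programmatically as a cross-check. The sketch below uses the statsmodels power module with conventional values (medium effect size, 5% alpha, 80% power); these are illustrative assumptions, not necessarily the exact figures entered into G*Power for Figures 5.1-5.3.

import math
from statsmodels.stats.power import TTestPower

# Power analysis for a paired (dependent) T-Test, analogous to the
# G*Power calculation used for the keyword experiment.
analysis = TTestPower()

# Conventional defaults: Cohen's d = 0.5, alpha = 0.05, power = 0.8.
# These inputs are assumptions for illustration only.
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                         alternative='two-sided')
print("Required sample size:", math.ceil(n))  # 34 with these inputs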

5.4.2 Selecting Questions

The next part of the experiment is selecting the right questions to ask the participants. A good range of sample questions must be chosen that represents all the different aspects of the dataset.

As calculated above, 26 questions are needed for the keyword experiment and 46 for the answer experiment. Each question requires 20 responses to give valid results.

Certain factors needed to be considered in the question selection procedure.

1. First, the University's Ethics Committee requires that an experiment last no longer than one hour.

2. The questions have to be short and easy to understand so the participants can finish the experiment in 1 hour.

3. The questions must have no existing answers and must have keywords.

Due to time constraints, the keyword experiment was split into two groups, each consisting of 20 people answering 15 questions. The Q&A experiment was split into three groups: two groups answered 15 questions each and the third answered 16 questions.

Also, since the participants needed to answer programming-related questions, they had to be competent in programming and have moderate expertise in the topic. This was ensured by selecting participants with two or more years of experience in the given programming language.

The Java and Python programming languages were chosen because they were popular among the participants and are popular on both StackOverflow and Reddit. They provided a wide variety of questions that met the criteria.

1. Topic - Questions were selected from different topics to give a wide range and variety to the subject: web development, networking, mobile development, etc. Both Java and Python are used across many different areas of programming. This criterion is relevant to both the keyword and Q&A experiments.

2. Difficulty - The selected questions should span various levels of difficulty, so that the results can indicate whether the Suman system accommodates different levels of difficulty. Three levels were used: easy, medium and hard. The difficulty of a question was determined by consulting the contents of books on learning the programming languages: the opening chapters teach easier topics, the middle chapters harder but still accessible topics, and the later chapters the most difficult ones. On this basis the unanswered questions were selected for both Java and Python. This classification is later used for a more detailed statistical analysis of the results.

3. Quality - The final criterion concerned the quality of the answers found for the unanswered questions. The search algorithm gives each answer a rank and a confidence score that reflect its quality, and the experiment examines the correlation between the algorithm's ratings and the participants' ratings. A wide range of answers of different quality was therefore selected: good, medium and bad. The algorithm gives each answer a score from 0 to 10; a score above 7 is considered good, a score between 4 and 7 medium, and a score below 4 bad (a minimal sketch of this banding follows this list).
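For illustration, this banding can be written as a small helper function; this is only a sketch of the thresholds described above, not code taken from the Suman system.

def quality_band(score: float) -> str:
    """Map an algorithm answer score (0-10) to the quality bands
    used when selecting answers for the experiment."""
    if score > 7:
        return "good"
    if score >= 4:
        return "medium"  # boundary handling here is an assumption
    return "bad"

assert quality_band(8.2) == "good"
assert quality_band(5.0) == "medium"
assert quality_band(2.5) == "bad"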

Based on all of the above criteria, 15 Python questions and 31 Java questions were selected for participants. Since the questions used to test answers could also be used for keywords, the same 15 questions from each of Python and Java were reused in the keyword experiment.

There was no conflict in choosing the same questions, because the keyword test used only the questions and two sets of keywords: one set already collected from the websites and the other generated by the Suman system. Only the top 10 keywords, by confidence score, were selected. The Q&A experiment used the same unanswered questions together with a similar question and its answer.

5.4.3 Questionnaire Design

The next step of the keyword and Q&A experiments was to design the questionnaire to collect user feedback for evaluating the Suman system. The questionnaire was created online, for easy participant access and data collection, using the University's survey portal iSurvey1. The portal made it easy to add questions, create public questionnaire pages, obtain consent and collect the data into a database.

For this experiment, a rating scale was used to measure participants' responses. The ratings measure the effectiveness of the Suman system's output and help gauge the usefulness of the whole application. The keyword experiment used a scale of 1 to 5 and the answer experiment a scale of 1 to 10; this is discussed in more detail later.

Three questionnaires were created, two for Java and one for Python. The first Java questionnaire and the Python questionnaire each had 15 questions for rating the quality of answers and 15 questions with two sets of keywords for rating keyword quality. The last Java questionnaire evaluated only answers, not keywords.

1https://www.isurvey.soton.ac.uk/

First, a pilot experiment was run with three participants to get feedback on the questionnaire and how it could be improved. The participants provided much helpful feedback. Following their recommendations, the language of the questionnaire was made simpler and easier to understand, the questions were set in bold for clarity, and some grammatical mistakes were fixed. The participant instructions were added at the top of every question instead of only at the beginning of the questionnaire, and the order in which the keyword sets appeared was randomized.

After these changes, participants were recruited to take the improved questionnaire. The only selection criterion was knowledge of either Python or Java for more than two years. All participant information was anonymized. Participants were asked not to search for answers but to use their own expertise to judge the quality of the keywords and answers; if they were unsure, they were asked to use their best guess or select the middle value of the scale.

The details of each experiment are given below.

5.4.3.1 Keyword Evaluation

The participants were shown a question with the original keywords used on the website and asked to rate how well the keywords described the question. They were then shown the same question with the original plus the added keywords, limited to the top 10 keywords by confidence score, and again asked to rate how well the new set of keywords described the question. All ratings were stored in the database.

Figure 5.4 is an example question with the instructions.

The keyword experiment used a rating scale from 1 to 5, where 1 is very bad and 5 is very good. This scale was chosen because the keywords are the basis of the application: the Suman system starts from the original StackOverflow keywords and Reddit subreddit names and adds further keywords, so the difference between the two sets is easy to see. Only the top 10 keywords were shown to the participants, which did not alter the keyword lists too much, and participants could readily judge their use and quality. Since the value given to each set of keywords was to be compared in the T-Test, a small 1-5 scale also reduces the probability of error.

Figure 5.4: Sample keyword experiment question

In this survey, the order in which a question appeared with the original keywords and with the added keywords was randomized, so participants could not guess the pattern. This step reduced participant bias and was recommended during the pilot, as the original questionnaire always showed the question with the original keywords first and the generated keywords second.

In total, 30 questions were asked to evaluate keywords, and each question received 20 responses. The statistical analysis is presented in the next section.

5.4.3.2 Answer Evaluation

The Q&A experiment to evaluate the quality of answers consisted of a questionnaire in which participants were shown an unanswered question and then a similar question with an answer. This answer was expected to provide a solution to the first, unanswered question. The participants were asked to rate the quality of the answer based on how well it could solve the question. All ratings were stored in the database.

Figure 5.5 provides an example of the question and participant instruction.

Figure 5.5: Sample Q&A experiment question

For the answer evaluation, a scale of 1 to 10 was used, where 1 was very bad and 10 was very good. This scale was chosen because the algorithm that searches for the answers gives each answer a rank and confidence rating between 0 and 10. This rank is comparable to the participants' ratings, and a correlation test can be used to see whether the participants agree with the ranking of the Suman system.

The selected questions ranged in difficulty across easy, medium and hard, and the answers ranged in quality across good, medium and bad. The participants' responses give a good estimate of whether they agree with the Suman system's ratings. The question types were randomized so participants could not detect a pattern in the difficulty and quality of questions and answers.

In total, 46 question-answer pairs were tested. There were two questionnaires of 15 questions, one in Java and one in Python, and a third Java questionnaire with 16 questions. Every question received 20 responses from the participants. The responses and statistical tests are discussed in the next section.

5.5 Result Evaluation

After the experiments, the participants' data were collected and stored on the University's iSurvey website, which allows the data to be downloaded as an Excel sheet. The data were cleaned and structured and used to create a frequency distribution chart; the mean values were also calculated for the statistical analysis.

These values were loaded into SPSS for the statistical analysis, detailed in the following sections.

5.5.1 Dataset Frequency Distribution

A frequency distribution gives the frequency or count of all the data points for the sample. Here a general overview of the collected data is provided.

The diagram below describes the frequency distribution of all the ratings for the keyword experiment. It is a normal distribution.

Figure 5.6: Keywords test frequency distribution diagram

The diagram is divided into two parts: the first shows the frequency distribution of the ratings of the original keywords, and the second shows that of the keywords generated by the Suman system.

It is evident from the analysis that the system-generated keywords had higher ratings than the original keywords. Further statistical analysis of keywords will be done in the next section.
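As an illustration, such a frequency distribution can be computed directly from the exported ratings. The sketch below assumes the iSurvey export was saved as a CSV file with columns original_rating and generated_rating; both the file name and the column names are hypothetical.

import pandas as pd

# Hypothetical export of the keyword experiment ratings (1-5 scale).
ratings = pd.read_csv("keyword_ratings.csv")

# Frequency (count) of each rating value for both keyword sets.
for column in ["original_rating", "generated_rating"]:
    print(column)
    print(ratings[column].value_counts().sort_index())

# Mean ratings, used later in the T-Test.
print(ratings[["original_rating", "generated_rating"]].mean())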

A similar analysis was done with the data collected from the Q&A experiment. The diagram below shows the frequency distribution of the 20 responses received for each of the 46 questions.

Figure 5.7: Q&A test frequency distribution diagram

The participants' responses for the Q&A experiment are also normally distributed. More detailed statistical analysis of the answers follows in the next sections.

5.5.2 Keywords T-Test

The collected data came with a set of keywords from StackOverflow and Reddit. The Suman system analyzed the text of the questions and answers and added more keywords, including additional keywords giving broader and narrower categories for each item.

So there was an original set of keywords and an added set of keywords, and the keyword experiment was designed to evaluate the quality and usefulness of the added set. Participants were shown questions and asked to rate the quality of both sets of keywords based on how well they described the questions. There are thus ratings for how well the original set of keywords describes a question and ratings for how well the modified set describes the same question. Both ratings concern the same question with little modification, so they form a pair of dependent variables.

Comparing the mean ratings of the two dependent variables indicates the usefulness of each set of keywords, and this can be calculated using a dependent T-Test. This statistical test was therefore used to determine whether the generated sets of keywords added more value to the questions than the original keywords. SPSS was used on the data collected during the experiment and a paired-samples T-Test was performed.

The analysis showed that on average the keywords generated by the Suman system were more useful and described the questions better (Mean = 3.5, Standard Deviation = 0.44, Standard Error = 0.08) than the original keywords (Mean = 3.09, Standard Deviation = 0.88, Standard Error = 0.16). There was a significant difference in the usefulness of the generated keywords compared with the original keywords (T(28) = -2.254, p = .032, r = 0.38).

Effect size: r = √(t² / (t² + df)) = √((−2.254)² / ((−2.254)² + 29)) = 0.38
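For reference, the same paired test and effect size can be computed outside SPSS. The sketch below uses SciPy with made-up rating arrays standing in for the real per-question means; only the formula, not the data, reflects the actual analysis.

import numpy as np
from scipy import stats

# Hypothetical paired mean ratings (1-5 scale), one pair per question:
# original keywords vs Suman-generated keywords.
original = np.array([3, 2, 4, 3, 3, 2, 4, 3, 3, 2])
generated = np.array([4, 3, 4, 4, 3, 3, 5, 4, 3, 3])

# Paired-samples T-Test, as performed in SPSS.
t_stat, p_value = stats.ttest_rel(original, generated)

# Effect size r = sqrt(t^2 / (t^2 + df)), as in the formula above.
df = len(original) - 1
r = np.sqrt(t_stat**2 / (t_stat**2 + df))
print(f"t({df}) = {t_stat:.3f}, p = {p_value:.3f}, r = {r:.2f}")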

Figure 5.8: Keywords T-Test

Figure 5.9: Keyword T-Test showing confidence interval

Here p < .05 and the effect size r = 0.38, which is medium. The generated keywords add some benefit to the questions and answers: they improve the categorization of their topics, and the added keywords and categories could potentially improve the search results.

The participant data showed that the keywords generated by the Suman system were rated better than the original keywords in 63.3% of cases and worse in 26.6% of cases.

The Suman system is not limited by StackOverflow's five-tags-per-question limit: it can add as many keywords as it finds in the document. Another advantage is that it can add tags to answers, which StackOverflow does not allow; in the Suman system, answers are tagged separately from questions and so can carry additional tags. The system can also record the frequency of keywords in the questions and answers, which can be used as a weight for keyword importance. Finally, it can add categories to the questions and answers through keywords, creating a graph-like structure in which similar keywords are associated with each other.

A quick glance at the lower-rated keywords shows the limitations of the Suman system, which inherits all the limitations of the external systems it builds on. One of the main drawbacks is that the system generates keywords by linking them to the DBpedia and OpenCalais datasets: if a topic does not exist in DBpedia or OpenCalais, the keyword cannot be linked and is ignored entirely. This could be overcome by using an NLP library to find additional keywords without linking them to any external Linked Data cloud, which would improve the system's categorization.

Another drawback is that the collected data are technical: the datasets contain many misspellings, abbreviations and initialisms. These colloquialisms are easy for programmers to understand, but the keyword generator finds them difficult to interpret and link. In some cases keywords are linked to the wrong topic; in one example, the keyword 'Eclipse' was linked to the natural phenomenon rather than to the 'Eclipse (software)' topic.

A further limitation is that software versions such as Python 2.7 or Python 2.7.3 do not have individual pages or topics; they are sections or subsections of larger topics and are therefore harder to link to the Linked Data cloud.

One anomaly found in the keyword experiment data concerned a question about solving a time zone problem. The main topic was time zones, but the text contained the names of cities and countries; the keyword annotation algorithm linked and annotated all the cities and gave them a high confidence rating (92%), while the main topic, time zone, received only a 78% confidence rating. Another question had an image attached that provided additional information, which the keyword annotator had no way to process.

Overall, the generated keywords performed better than the original keywords and provided additional information about the questions, despite some limitations and drawbacks. This answers KQ3 and shows that Linked Data and Semantic Web technologies are useful and add value to PSN datasets.

5.5.3 Q&A Correlation Test

When searching for an answer to an unanswered question, the Suman system algorithm ranks all the results and gives each a score between 0 and 10. The Q&A experiment was designed to test the usefulness of the answers, i.e. whether they really provide a solution to the unanswered question.

Participants were asked to rate, from 1 to 10, how well the answers provided solutions to the unanswered questions. The questions covered all difficulty levels and the answers a range of quality. A positive relationship between the algorithm ratings and the participant ratings would indicate that the algorithm was finding the right answers.

All the data from the answer evaluation experiment were collected and analyzed in SPSS. A Pearson correlation test was chosen because the data values are at regular intervals and there is a linear relationship between the two variables (the algorithm's ratings and the participants' ratings).

The Pearson correlation test was performed to measure the relationship between the participants' ratings and the Suman algorithm's ratings. There was a positive correlation between the two variables (r = .380, n = 46, p (two-tailed) = .009).
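The same correlation can be verified outside SPSS. The sketch below uses SciPy's Pearson test on placeholder arrays; in the real analysis these would be the algorithm's score and the participants' mean rating for each of the 46 questions.

import numpy as np
from scipy import stats

# Placeholder values; the real data are the per-question ratings.
algorithm_rating = np.array([8.1, 3.2, 6.5, 7.8, 2.4, 5.9])
participant_rating = np.array([7.5, 4.0, 6.1, 8.2, 3.1, 5.4])

# Pearson correlation with a two-tailed p-value.
r, p = stats.pearsonr(algorithm_rating, participant_rating)
print(f"r = {r:.3f}, p (two-tailed) = {p:.3f}")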

Figure 5.10: Answers correlation test

The correlation between the two ratings is moderately strong and the significance is < .01. The scatter plot in Figure 5.11 summarizes the result.

Figure 5.11: Q&A data scatter plot diagram

The analysis shows that the algorithm is quite efficient in finding the right answers. The lower left quadrant contains the questions that received low ratings from both the Suman algorithm and the participants, while the upper right quadrant contains the questions that received high ratings from both. The upper left quadrant shows that some answers were rated highly by the participants but not by the algorithm; Figure 5.12 shows that these are mostly difficult questions.

Only one question received a high rating from the algorithm but a low rating from the participants. Examining the question and answer, it is evident that the question asked for a solution to a problem that did not exist, and the answer said so. StackOverflow users voted that answer up even though it was not a direct answer, because it provided useful information about the question. Since the Suman algorithm uses crowdsourced data for ranking, it gave that answer a high confidence rating. The participants may have regarded the absence of a direct answer, to a question that had no solution, as a failure of the Suman system.

The participants usually agreed with the quality of the answers provided. The algorithm does well in finding answers for difficult questions as well as easy questions, as seen in Figure 5.12.

Figure 5.12: Q&A data scatter plot diagram showing questions’ difficulty.

The easier questions are mostly in the upper right quadrant, where the participant ratings are similar to the algorithm ratings. The same holds for the medium-difficulty questions, which are also mostly in the upper right quadrant.

For the questions that were hard to answer, the participants gave higher ratings than the Suman algorithm. This could be because these questions were quite difficult and possibly beyond the participants' expertise: they were merely guessing, and the answers looked valid. Nevertheless, many difficult questions lie in the upper right quadrant, where the participants' ratings correlate with the Suman algorithm's ratings.

The algorithm performed well in the user evaluation, but it has some limitations. The Suman system first uses keywords and categories to find a subset of possible answers and then performs a text search over it, so any limitation of the keyword annotator is also a limitation of the search algorithm. The algorithm does not perform a full text search over the remaining answers because of time and processing-power restrictions.

The Suman system also uses crowdsourced data and votes to rank the answers, so it is highly dependent on people's contributions. With malicious users or too few votes, the algorithm degrades to a plain text search algorithm.

Overall, the answers performed well and provided relevant solutions to the unanswered questions. This answers KQ4 and shows that the Suman system can provide answers to unanswered questions in PSNs.

5.5.4 Summary of the Results

The user evaluation of the keywords showed that the keywords generated by the Suman system performed better than the original keywords. The user evaluation of the question-and-answer algorithm showed that participants agreed with the algorithm's ranking; the two ratings have a moderately strong positive correlation. These results answer the knowledge questions of this thesis, specifically KQ3 and KQ4.

The Suman system has not been compared to any existing search engines and algorithms. This is because the application is built on the StackOverflow and Reddit datasets; it does not have the resources of popular search engines and was not designed as a better alternative to them. It is a proof of concept to show that Semantic Web and Linked Data technologies can solve problems in PSNs. The Suman algorithm combines semantic search and keyword search; it has not been compared to popular semantic search systems because of limited resources.

The qualitative and quantitative analyses and the user evaluation showed that the Suman system performed well overall and helped solve the problem this thesis focuses on.

Chapter 6

Conclusions

The Web has made it easier for people to connect and interact. It has also made it easier to study these systems. People’s online activity can be studied to see how they interact and communicate and form networks.

People not only connect with their friends and families; they also interact with strangers from all over the world. They create communities with people of similar interests and expertise and use these communities to solve their problems. Forums and question-answering systems have made it easier for them to create a network, share their problems and queries, and seek solutions. These systems are examples of Purposive Social Networks, which are studied in this thesis.

6.1 Summary of Research

In this thesis, different types of PSNs were analyzed. People may come together with a common purpose, solve problems and so create a PSN. Some PSNs are small and agile and thrive on user contribution; such systems require a strong framework to support engagement and incentives for people to contribute. These PSNs use crowdsourcing to solve problems and create a knowledge base. All this crowdsourced information can be used, and additional knowledge added, using Semantic Web and Linked Data technologies to improve the functionality of PSNs.

6.1.1 Limitations of Current Systems

Many social networking services are closed. This can be a problem for users for a number of reasons. Users may be obliged to create multiple accounts across different SNSs to use their services. There is often no easy way to merge different social networks and their data together. Consequently, many SNS users do not have the freedom to move to different networks and take their data with them.

This problem also exists for question-answering websites, forums and other PSNs where people come together and solve problems. These PSNs are crowdsourced: people create and share knowledge. In some cases questions receive no answers. Another problem faced by users is searching for answers to unanswered questions. Different search engines can be used to look for answers, but they use text-based search to find solutions and cannot access data from closed systems. Nor do these search engines use the network structure to find answers or recommend experts who can help solve the problem.

This thesis presents a study of a programming-based Q&A website (StackOverflow) and an online forum (Reddit) as examples of PSNs. StackOverflow is a forum in which programmers ask questions about their problems and errors, and other programmers in the field answer them and provide solutions. Reddit is an online forum where users can ask questions as well as share current news and other information. These websites use crowdsourcing to find the best questions, answers and resources. Crowdsourcing is also used to maintain the quality of the content, to moderate the community and to stop spam and other antisocial activities.

On StackOverflow and Reddit many questions go unanswered or do not receive any comments or solutions; the people in the long tail get no response. These websites follow the power law of the web (Albert et al., 1999): a few popular questions and posts receive most of the votes and answers, while the remaining posts form a long tail that receives no replies or votes. The analysis in Chapter 3 shows how many posts get no response or votes on StackOverflow and Reddit. This is another limitation of these systems addressed in this thesis.

6.1.2 Research Questions

This thesis is an investigation of whether Linked Data and Semantic Web technologies can help to answer unanswered questions on PSNs.

This research problem is broken down into smaller questions. First, PSNs are defined and studied, with StackOverflow and Reddit used as examples. The thesis studied the communication networks of users in the Reddit and StackOverflow communities, along with the motivations and incentive systems that encourage participation. Research was also done to explore whether finding answers can help the long tail of users with unanswered questions and provide them with solutions to their problems.

This thesis also attempts to ascertain whether Semantic Web technologies can be used in PSNs. These technologies can integrate different PSNs by structuring the data and adding semantics to it. The structured PSN data are then linked to concepts and terms using Linked Data technologies. It is examined whether Semantic Web technologies can find different concepts and then categorize the questions and answers, providing broader and narrower search terms. There is evidence that broader and narrower search terms have the potential to improve accuracy when searching for answers to unanswered questions.

Similarly, this approach can be used to find experts in the area and recommend them so that users can get answers. These questions have been broken down into knowledge questions and design problems based on the Design Science methodology, and the Suman system was built to answer them.

6.1.3 Research Contribution

In this thesis, PSNs are defined based on the current literature in the field and on a case study of StackOverflow and Reddit as examples of PSNs. The community structure, network ties, user motivation and current problems of these PSNs were analysed. Inferences from these analyses were used to understand PSNs better and to design a system that solves the existing problems mentioned above.

The Suman system was created to answer the research questions. It searches for answers to unanswered questions, using Semantic Web and Linked Data technologies to add meaning to the PSN datasets and crowdsourced information to improve search and information retrieval.

Design Science methodology was followed to investigate the problems and provide a solution; the scientific steps taken to investigate, justify and evaluate the artifact guided the design of the Suman system. StackOverflow and Reddit were used to collect the data: the Suman system used the websites' APIs to collect questions, answers, posts, votes and user profiles.

The data were analyzed to study the structure of the community and a social network graph was generated. The network graphs of these websites were formed not by explicit connections between users (friendship) but by studying user interactions and how the objects were connected with each other. The network ties, user interactions and incentive models were studied to see how a website with a small community of programmers created a self-sustaining environment in which users participate, continuously create high-quality questions and answers, and solve problems.

The Suman system cleaned the data and used ontologies (FOAF, SIOC, etc.) to structure it and convert it into RDF.

The Suman system used Wikipedia Miner and OpenCalais to annotate the dataset with keywords. This also addressed the named-entity disambiguation problem and helped to classify names and entities under the proper concepts and categories. These tools are based on Natural Language Processing and machine-learning techniques; they matched the keywords with Wikipedia topics and the OpenCalais vocabulary. All the keywords and topics were categorized and linked with DBpedia and the OpenCalais knowledge base, which made it possible to link the converted dataset to the Linked Data Cloud.

The Suman system also created a document keyword graph to improve search. The graph linked the keywords with all the questions, answers and experts; the links were weighted by the votes given to questions and answers and by the frequency of the keywords. These data were used to improve the indexing of the dataset and the SPARQL search results.

To evaluate the Suman system, unanswered questions from the websites were used as search queries and the answers were saved. The Suman system also recommended experts who were best suited to answer each question. Two experiments were designed to evaluate the keywords and the answers. In the keyword experiment, participants were shown two sets of keywords associated with the questions, the original keywords and the system-generated keywords, and asked to rate their quality based on how well they described the questions. The answer experiment showed participants an answer to an unanswered question, and they were asked to rate its quality based on how well it provided a solution. All the participants had experience with programming and used their own expertise to rate the keywords and answers.

The results obtained from the experiments were statistically analysed. The analysis showed that the keywords generated by the Suman system were rated higher than the original keywords in 63.3% of cases. It also showed that the participants agreed with the algorithm's ratings for the answers provided by the Suman system: there was a positive correlation between the two ratings (r = .380, n = 46, p (two-tailed) = .009), which is moderately strong with significance < .01.

The Suman system showed that Semantic Web and Linked Data technologies can be used to solve the data integration problem of the current Web and to integrate heterogeneous datasets. The added knowledge (semantics) and the linking of the data can help improve the search for unanswered questions in PSNs that are overwhelmed by the number of posts and do not have answers for all the questions. Semantic Web and Linked Data technologies provide a decentralized platform where user-generated knowledge can be utilised and improved. Potentially this may contribute to improving convenience for the users of these online communities. In PSNs, when users post a question, the Suman system can show them similar questions with answers, and users can check those solutions to see whether they satisfy their need. In cases where users receive no response to their question, the Suman system can recommend experts who are best suited to answer it. The Suman system's expert recommendation can potentially be used to create a community for a particular purpose and solve problems based on topics and keywords. It provides a platform to integrate different forums and PSNs, to help the long tail of users who do not get the help they ask for from the websites.

6.2 Limitation of the Suman System

The Suman system helps to show the utility of Linked Data in solving PSN problems, but it has its limitations. It uses the StackOverflow and Reddit crowdsourced data to obtain information about question and answer quality, so any limitation of the original website data is also a limitation of the system. If the users of StackOverflow and Reddit were to stop voting, or if the websites were plagued by spammers, the Suman system would not be able to mitigate the situation.

The Suman system uses external tools, Wikipedia Miner and OpenCalais, to annotate the text and find the best matching DBpedia and OpenCalais topics to link to. Any limitation of these tools is a limitation of the Suman system. Even though the system only accepts keywords and links with a confidence score greater than 50%, the keywords are sometimes linked to incorrect concepts. If there is no Wikipedia or OpenCalais article for a concept, then those keywords are ignored by the system, even though they are valid keywords.

Currently, the system does not have a user interface; it uses the websites' unanswered questions as search terms, and users cannot ask their own questions to search for similar questions and answers. The Suman system runs in a terminal, and there is no user interface for people to rate the quality of the search results. These limitations could be addressed in the future by adding a GUI layer over the search algorithm that provides a portal for user input and feedback to improve the accuracy of the search results.

Participants in the Suman system evaluation experiment did not evaluate the Expert Finder application. There was no easy and simple way to provide a complete user profile of every expert to the participants. Such a profile would consist of the user's posts, badges, votes and reputation points; this is a huge amount of data, and there was no practical way to show it to the participants and then ask them to rate the expert recommender. The same algorithm that improves the search for answers is used to search for experts.

Search engines such as Google and Bing use sophisticated text search and algorithms such as PageRank; they crawl the entire Web to find the best match for a search query. Due to limited resources, the Suman system only uses two websites and provides answers from those datasets. It is a proof of concept: the results do not necessarily indicate that the Suman system provides better results than current search engines. Rather, they suggest that Semantic Web and Linked Data technologies provide useful tools for data integration, with the potential to improve search results as more knowledge bases and communities are linked together.

6.3 Future Work

The Suman system currently does not have any user interface. A user interface would let people enter their own questions and help them find similar questions and answers from StackOverflow and Reddit. Another layer of crowdsourcing on top of the system could allow users to up-vote and down-vote answers and improve the search results.

Currently, the Suman system recommends experts who are best suited to answer a question. The system could be extended to create communities and agile PSNs: people could use it to find the right experts to solve their problem, get details of the experts and contact them. There is scope for improving the system by adding more communities and website data to the knowledge base. People could thus create an agile PSN, find people in different communities and potentially extend their own social network.

Currently, the Suman system uses external tools for data annotation to find important topics and categories and link them to the Linked Data cloud. Ongoing research in Natural Language Processing may help find deeper semantics, synonyms and categories in the text; such a layer could sit on top of the annotation layer. The keywords and extra information extracted by deep learning algorithms might not be linked to the Linked Data cloud, but they could improve the document keyword graph. This might improve the overall search results by finding better questions and answers in broader and narrower topics.

For future work, the Suman System could potentially be improved by adding more datasets and linking to different knowledge bases. The Suman System in its current form provides a tool for data integration across platforms. It is possible to create a unified knowledge base from different networks on the Web.

Some of the data on the Web are freely available, but most of the data are still bound behind the closed walls of different websites. Using the APIs of these websites, and with the participation of users, it is possible to free user data from the ubiquitous SNS silos. The Solid project is a step in this direction (Conner-Simons, 2015). These different datasets can be integrated using Semantic Web and Linked Data technologies. The findings of this thesis suggest that this may open up opportunities to improve the community experience on PSNs in the future.

Appendix A

Experiment Questionnaire

The Suman system was evaluated in two experiments. The first, the keywords experiment, asked participants to rate original keywords and system-generated keywords based on how well they describe the questions. The second, the answers experiment, asked participants to rate answers based on how well they answer the unanswered questions.

The questions used in these two experiments are mentioned below.

A.1 Keywords Experiment

The keywords experiment showed participants 30 questions. Each question had the original keywords and the Suman system generated keywords. Participants were asked to rate both sets of keywords on a scale of 1 to 5. 15 questions were from the Python programming language1 and another 15 from Java2. All 30 questions, with instructions, are below.

Instructions for each question:

1. Read the question (Section A)

2. Rate how well the keywords describe the question in your opinion (Section B)

1. A. Read text from PNG with standard lib
Is there a way to read text from a PNG-File in Python by using only the standard libraries Python provides?

1https://www.isurvey.soton.ac.uk/start.php?id=13709
2https://www.isurvey.soton.ac.uk/start.php?id=57376


B. Keywords: python, png(portable network graphics), programming language, library (computing)

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

2. A. Read text from PNG with standard lib
Is there a way to read text from a PNG-File in Python by using only the standard libraries Python provides?

B. Keywords: python

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

3. A. How to programmatically create dummy email thread?
For comparing some software I need a thread of mails, i.e. some mails with replies and replies to the replies... Content does not matter, but attachment and richt-text would be nice. I wonder how to create such a dummy mail thread programmatically (preferably using Linux commandline tools or Python). How would I create those dummy mails?

B. Keywords: python, linux, email, cli, dummy-data

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

4. A. How to programmatically create dummy email thread?
For comparing some software I need a thread of mails, i.e. some mails with replies and replies to the replies... Content does not matter, but attachment and richt-text would be nice. I wonder how to create such a dummy mail thread programmatically (preferably using Linux commandline tools or Python).

How would I create those dummy mails?

B. Keywords: python, programming language, email, computer software, linux, command-line interface, dummy, mail

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

5. A. How to call python code from excel? I have written a python code using selenium for automating files extraction from website and then I need to format these reports and append them and i am using macros for this. Is there any way to call the python code from excel by integrating it with VBA or something like that?

B. Keywords: python (programming language), vba (visual basic for applications), selenium, macro (computer science), code, automation, website

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

6. A. How to call python code from excel? I have written a python code using selenium for automating files extraction from website and then I need to format these reports and append them and i am using macros for this. Is there any way to call the python code from excel by integrating it with VBA or something like that?

B. Keywords: excel, excel-vba, python-2.7

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

7. A. In Python IDLE show completion what does -> mean?

I managed to get Show Completion to work thanks to this answer. But what does str(object) -> string mean as a tip after typing the opening bracket of a function? Example code:

linkText = "some text" elms = browser.find_elements(By.PARTIAL_LINK_TEXT(linkText)) On Run gives: TypeError: ’str’ object is not callable

Does it mean linkText should be a pointer to string? How do I enter a pointer in Python?

B. Keywords: python, python-idle

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

8. A. In Python IDLE show completion what does -> mean? I managed to get Show Completion to work thanks to this answer. But what does str(object) -> string mean as a tip after typing the opening bracket of a function? Example code:

linkText = "some text" elms = browser.find_elements(By.PARTIAL_LINK_TEXT(linkText)) On Run gives: TypeError: ’str’ object is not callable

Does it mean linkText should be a pointer to string? How do I enter a pointer in Python?

B. Keywords: python, string (computer science), pointer (computing), object (computer science)

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

9. A. python significant figures and float I’m do not understand why 1.1 + 2.2 is not 3.3 if a computer calculates this. I am trying to understand the working of binairy floating points.. but I am not even sure of that float the cause is. could you explain this to me?, I have not been able to find a clear explanation.

Python 2.7.4 (default, Apr 6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ====== RESTART ======
>>>
>>> 1.1+2.2
3.3000000000000003
>>>

B. Keywords: python, floating-point, significant-digits

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

10. A. python significant figures and float I'm do not understand why 1.1 + 2.2 is not 3.3 if a computer calculates this. I am trying to understand the working of binairy floating points.. but I am not even sure of that float the cause is. could you explain this to me?, I have not been able to find a clear explanation.

Python 2.7.4 (default, Apr 6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ====== RESTART ======
>>>
>>> 1.1+2.2
3.3000000000000003
>>>

B. Keywords: python (programming language), significant figures, float, bit, intel, windows api, computer, copyright

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

11. A. How to verify ssl certificate? im working on dns masking. Im asking user to provide URL for masking and SSL certificate. I need to validate whether the certificate user provided is of valid format or not. The problem is if user doesnot provide valid certificate apache doesn't restart. Im using python. I've tried some solutions provided here but couldn't validate ssl certificate. Any help will be highly appreciated. Thanks!

B. Keywords: python, tls (transport layer security), url (uniform resource loca- tor), dns (domain name system), apache

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

12. A. How to verify ssl certificate? im working on dns masking. Im asking user to provide URL for masking and SSL certificate. I need to validate whether the certificate user provided is of valid format or not. The problem is if user doesnot provide valid certificate apache doesn't restart. Im using python. I've tried some solutions provided here but couldn't validate ssl certificate. Any help will be highly appreciated. Thanks!

B. Keywords: python, validation, ssl-certificate

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

13. A. HTTP Request coming up blank With my first foray into anything-related-to-web programming, I am using the Requests external library (Requests), I am testing it on https://google.com.au to see if I'm doing it right.

import requests

proxy_dict = { "https" : "10.10.20.99:8080" }

r = requests.get('https://google.com.au', proxies = proxy_dict)
print r.content
print r.status_code

with output

> 200

i.e. a completed HTTP request yet with no returned HTML information. I have read the relevant docs for "Requests" but I can't get the same results as that example (which is the same as this one, except using github.com instead of google.com.au). I am a complete and utter noob at all things HTTP/HTML right now, so does anyone have some idea where I'm going wrong? Thanks! EDIT: I forgot to mention, this will be behind some sort of company firewall as I am doing this through work. That proxy is my work's proxy.!

B. Keywords: python, html, http, newbie

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

14. A. HTTP Request coming up blank With my first foray into anything-related-to-web programming, I am using the Requests external library (Requests), I am testing it on https://google.com.au to see if I'm doing it right.

import requests

proxy_dict = { "https" : "10.10.20.99:8080" }

r = requests.get('https://google.com.au', proxies = proxy_dict)
print r.content
print r.status_code

with output

> 200

i.e. a completed HTTP request yet with no returned HTML information. I have read the relevant docs for "Requests" but I can't get the same results as that example (which is the same as this one, except using github.com instead of google.com.au). I am a complete and utter noob at all things HTTP/HTML right now, so does anyone have some idea where I'm going wrong? Thanks! EDIT: I forgot to mention, this will be behind some sort of company firewall as I am doing this through work. That proxy is my work's proxy.!

B. Keywords: python

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

15. A. Python Java Integration I'm developing a program completely written in Python but I need to integrate a Java code inside the file console.py. I want to integrate Sphinx4's program to give GNS3 the capability of voice recognition. Is it possible? What do I need to do this!

B. Keywords: java, python, voice-recognition, jpype

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

16. A. Python Java Integration

I’m developing a program completely written in Python but I need to integrate a Java code inside the file console.py. I want to integrate Sphinx4’s program to give GNS3 the capability of voice recognition. Is it possible? What do I need to do this!

B. Keywords: python (programming language), java (programming language), speech recognition

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

17. A. What is the most painless way to remove one version of python and not the other on OSX 10.7.4? I accidentally installed 64 bit python a few weeks ago, and now all of my programs jump for it by default, instead of the more functional 32 bit python. Is there a painless way to uninstall 64 bit python but not 32? Thanks!

B. Keywords: python, mac os x, uninstaller, bit

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

18. A. What is the most painless way to remove one version of python and not the other on OSX 10.7.4? I accidentally installed 64 bit python a few weeks ago, and now all of my programs jump for it by default, instead of the more functional 32 bit python. Is there a painless way to uninstall 64 bit python but not 32? Thanks!

B. Keywords: python, osx, uninstall

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

19. A. Free memory in python run time I have a python server running on tornado , there is request that uses lots of memory (using a gdata library) . I deleted those object after using them and even did gc.collect() , But when i see the system memory using free -m , I could see memory increase in memory when i do those operations using gdata . But the memory does not get freed up , after i delete the object . The memory gets freed up only after the main python program is killed . I want to know if there is any way to free up the memory .

B. Keywords: python

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

20. A. Free memory in python run time I have a python server running on tornado , there is request that uses lots of memory (using a gdata library) . I deleted those object after using them and even did gc.collect() , But when i see the system memory using free -m , I could see memory increase in memory when i do those operations using gdata . But the memory does not get freed up , after i delete the object . The memory gets freed up only after the main python program is killed . I want to know if there is any way to free up the memory .

B. Keywords: python, ram, run time (program lifecycle phase), server (computing), library (computing)

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

21. A. URL links in Eclipse python code I like to document my python code with links to web pages that explain the algorithm in detail. But I can't make the links 'live', so that just clicking on them in Eclipse brings up the page in the platform browser. I've just been putting the URL's inside the 3-double-quote pair, then having to manually copy paste them to the browser.

Is there a better way?

B. Keywords: python, eclipse (software), url, web browser, algorithm, pydev

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

22. A. URL links in Eclipse python code I like to document my python code with links to web pages that explain the algorithm in detail. But I can't make the links 'live', so that just clicking on them in Eclipse brings up the page in the platform browser. I've just been putting the URL's inside the 3-double-quote pair, then having to manually copy paste them to the browser. Is there a better way?

B. Keywords: python, eclipse, pydev

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

23. A. need a time widget or clock widget is there anything like a timewidget or clock widget in Django which will help me to input data into a form for appointment date (like we have calendar or date widget). i have my date widget working with the below code in my forms.py:

import datetime
from django.forms.extras.widgets import SelectDateWidget

mydate = forms.DateField(widget=SelectDateWidget)

is there anything like this for time widget

B. Keywords: python, django (web framework), gui widget, clock, calendar, data, import

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

24. A. need a time widget or clock widget is there anything like a timewidget or clock widget in Django which will help me to input data into a form for appointment date (like we have calendar or date widget). i have my date widget working with the below code in my forms.py:

import datetime
from django.forms.extras.widgets import SelectDateWidget

mydate = forms.DateField(widget=SelectDateWidget)

is there anything like this for time widget

B. Keywords: python, django, time, django-widget

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

25. A. Python child process limits I 'd like to create a process in Python (probably with subprocess and Popen), which should have limited CPU time, limited child processess and memory bandwidth. I can;t find a way to do this. resource.setrlimit does not seem to work.

My code is :

import os
import sys
import resource
import subprocess
import signal

def setlimits():
    os.seteuid(65534)  # Has to run as root user in order to be able to setuid
    resource.setrlimit(resource.RLIMIT_CPU, (1, 1))
    resource.setrlimit(resource.RLIMIT_FSIZE, (500, 500))
    resource.setrlimit(resource.RLIMIT_NPROC, (80, 80))

p = subprocess.Popen( ["./exec.out"] , preexec_fn=setlimits )

B. Keywords: python, process (computing), cpu, child process, exec (operating system), operating system, bandwidth, setuid

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

26. A. Python child process limits I 'd like to create a process in Python (probably with subprocess and Popen), which should have limited CPU time, limited child processess and memory bandwidth. I can;t find a way to do this. resource.setrlimit does not seem to work. My code is :

import os
import sys
import resource
import subprocess
import signal

def setlimits():
    os.seteuid(65534)  # Has to run as root user in order to be able to setuid
    resource.setrlimit(resource.RLIMIT_CPU, (1, 1))
    resource.setrlimit(resource.RLIMIT_FSIZE, (500, 500))
    resource.setrlimit(resource.RLIMIT_NPROC, (80, 80))

p = subprocess.Popen( ["./exec.out"] , preexec_fn=setlimits )

B. Keywords: python, process, fork, limits

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

27. A. Using C++ library functions from Python I got a library created for Win x86 with MS Visual Studio 2010. And I can not change the content of a library to use Boost.Python. I'm using Python 3.3 with PyQt4 to create interface, but not restricted to these versions. I need to call functions and get objects from said C++ library. What is the easiest way to wrap C++ library to be called from python? I guess, that such question was already asked, but I can not seem to find it. Here's an example of header file:

namespace SDK {
class IMethod {
public:
    virtual IModel* CreateModel(const IBuffer* pBuffer, const char* text) = 0;
};

extern __declspec(dllexport) SDK::IMethod* CreateMethod(MethodID integer);
}

B. Keywords: python, c++, windows, dll, pyqt4

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

28. A. Using C++ library functions from Python I got a library created for Win x86 with MS Visual Studio 2010. And I can not change the content of a library to use Boost.Python. I'm using Python 3.3 with PyQt4 to create interface, but not restricted to these versions. I need to call functions and get objects from said C++ library. What is the easiest way to wrap C++ library to be called from python? I guess, that such question was already asked, but I can not seem to find it. Here's an example of header file:

namespace SDK {
class IMethod {
public:
    virtual IModel* CreateModel(const IBuffer* pBuffer, const char* text) = 0;
};

extern __declspec(dllexport) SDK::IMethod* CreateMethod(MethodID integer);
}

B. Keywords: python, c++, pyqt, dynamic-link library (dll), class (computer programming), microsoft visual studio, sdk, header file

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

29. A. python pprint: how to output utf8 characters? I have a dictionary with 'utf8 char' : number . pprint() would treat the utf8 as a byte array and output hex values. Is it possible to tell it to print it as string so that the console can render UTF8 text?

B. Keywords: python, string (computer science), character (computing), dictionary, hexadecimal, utf-8

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

30. A. python pprint: how to output utf8 characters? I have a dictionary with 'utf8 char' : number . pprint() would treat the utf8 as a byte array and output hex values. Is it possible to tell it to print it as string so that the console can render UTF8 text?

B. Keywords: python

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

31. A. Java, console.readPassword adds extra line. How to delete it? When i use console.readPassword() to read user passwords through console, there is always one line added to the console. How to disable this behavior or how to delete that extra line (and move the cursor after the last character in the line before)? What escape character to use? Thanks

B. Keywords: java, console, password

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

32. A. Java, console.readPassword adds extra line. How to delete it? When i use console.readPassword() to read user passwords through console, there is always one line added to the console. How to disable this behavior or how to delete that extra line (and move the cursor after the last character in the line before)? What escape character to use? Thanks

B. Keywords: java, object oriented programming language, console, password, cursor (computers), escape character

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

33. A. is there an equivalent of Python's timeit module in Java stdlib I was amazed at Python's timeit module and wondered if there's an equivalent in Java's standard library. If not, is there a 3rd party module?

B. Keywords: java, python, programming language, library (computing)

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

34. A. is there an equivalent of Python’s timeit module in Java stdlib I was amazed at Python’s timeit module and wondered if there’s an equivalent in Java’s standard library. If not, is there a 3rd party module?

B. Keywords: java, python

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

35. A. java.lang.NoSuchMethodError when trying to read .xlsm file

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.xmlbeans.XmlOptions.setSaveAggressiveNamespaces()Lorg/apache/xmlbeans/XmlOptions;
at org.apache.poi.POIXMLDocumentPart.<clinit>(POIXMLDocumentPart.java:56)
at rulebooksToExcel.GenerateExcel.generateExcel(GenerateExcel.java:34)
at rulebooksToExcel.ParseNortDocFiles.main(ParseNortDocFiles.java:165)

I am getting the error at : workbook = new XSSFWorkbook(in); I read other similar questions but they all suggest XMLBeans Version 2.0+. But I am using 2.6, and I can't find any other explanation for what might be causing this.

B. Keywords: java, apache, thread (computing), XMLBeans

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

36. A. java.lang.NoSuchMethodError when trying to read .xlsm file

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.xmlbeans.XmlOptions.setSaveAggressiveNamespaces()Lorg/apache/xmlbeans/XmlOptions;
at org.apache.poi.POIXMLDocumentPart.<clinit>(POIXMLDocumentPart.java:56)
at rulebooksToExcel.GenerateExcel.generateExcel(GenerateExcel.java:34)
at rulebooksToExcel.ParseNortDocFiles.main(ParseNortDocFiles.java:165)

I am getting the error at : workbook = new XSSFWorkbook(in); I read other similar questions but they all suggest XMLBeans Version 2.0+. But I am using 2.6, and I can't find any other explanation for what might be causing this.

B. Keywords: java, apache-poi, xmlbeans, xlsm

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

37. A. How do we call a method in a jar file from a java script method How do we call a method in a jar file from a java script method. I am using a third party to authenticate a HTML5 application, the jquery method redirects to the third party url and they redirect back to the app after validation at their end. Now in Java we can use the methods in their jar file to get back the User ID, but I am not sure if we can do it using js. Code in java, the jar is added in the classpath -

private static String UID(HttpServletRequest req) {
    String unEncCookie = null;
    String cookie = getSecCookie(req);
    if (cookie == null)
        return null;
    else {
        unEncCookie = JAR.JAR(cookie, "param1", "param2");
        if (unEncCookie == null || "".equals(unEncCookie))
            return null;
        else {
            return unEncCookie.split("|")[5]; // 6th value is UID
        }
    }
}

B. Keywords: java, javascript, jar file, HTML5, authentication, URL, data vali- dation, string, HTTP cookie

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

38. A. How do we call a method in a jar file from a java script method

How do we call a method in a jar file from a java script method. I am using a third party to authenticate a HTML5 application, the jquery method redirects to the third party url and they redirect back to the app after validation at their end. Now in Java we can use the methods in their jar file to get back the User ID, but I am not sure if we can do it using js. Code in java, the jar is added in the classpath -

private static String UID(HttpServletRequest req) {
    String unEncCookie = null;
    String cookie = getSecCookie(req);
    if (cookie == null)
        return null;
    else {
        unEncCookie = JAR.JAR(cookie, "param1", "param2");
        if (unEncCookie == null || "".equals(unEncCookie))
            return null;
        else {
            return unEncCookie.split("|")[5]; // 6th value is UID
        }
    }
}

B. Keywords: java, javascript, jar

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

39. A. Is there any Java API for PCRE regex maching we have a requirement to match the PCRE expression against a file name. Tried to find the API regarding it, but unable to find any. Need some suggestion how to achieve it. If any of you have already done, a sample would be more greateful

B. Keywords: java, pcre

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

40. A. Is there any Java API for PCRE regex maching we have a requirement to match the PCRE expression against a file name. Tried to find the API regarding it, but unable to find any. Need some suggestion how to achieve it. If any of you have already done, a sample would be more greateful

B. Keywords: java, programming language, perl compatible regular expressions (pcre), regular expression, api, list of java APIs

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

41. A. Play Framework 2.1.3 Debugger failed to attach: handshake failed I get this message in the console when I run a Play.

C:\Users\Daniel\desenvolvimento\Sistemas>play debug
Listening for transport dt_socket at address: 9999
Debugger failed to attach: handshake failed - received >POST /setwindo< - expected >JDWP-Handshake<
Debugger failed to attach: handshake failed - received >POST /setwindo< - expected >JDWP-Handshake<

I don't have any problems running the app but this "Debugger failed" message keeps comimg out and it just bugs me. This happens even if I create a clean project. Play 2.1.3, Windows7 64bit. How could I get rid of this message?

B. Keywords: java, playframework, playframework-2.1

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

42. A. Play Framework 2.1.3 Debugger failed to attach: handshake failed I get this message in the console when I run a Play.

C:\Users\Daniel\desenvolvimento\Sistemas>play debug
Listening for transport dt_socket at address: 9999
Debugger failed to attach: handshake failed - received >POST /setwindo< - expected >JDWP-Handshake<
Debugger failed to attach: handshake failed - received >POST /setwindo< - expected >JDWP-Handshake<

I don't have any problems running the app but this "Debugger failed" message keeps comimg out and it just bugs me. This happens even if I create a clean project. Play 2.1.3, Windows7 64bit. How could I get rid of this message?

B. Keywords: java, windows 7, play framework, debugger, application software

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

43. A. Retrieve a list of all packages in a project with Google reflections library? I'm talking about the reflections lib. Is there any possibility to get a list of all packages which are included in the project where I let the code compile? I've tried it with the following code bracket but I don't want to insert a project name.

Reflections reflections = new Reflections(/* Project name here... */);
Set<Class<? extends Object>> allClasses = reflections.getSubTypesOf(Object.class);

B. Keywords: java, packages, google-reflections

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

44. A. Retrieve a list of all packages in a project with Google reflections library? I’m talking about the reflections lib. Is there any possibility to get a list of all packages which are included in the project where I let the code compile? I’ve tried it with the following code bracket but I don’t want to insert a project name.

Reflections reflections = new Reflections(/* Project name here... */);
Set<Class<? extends Object>> allClasses = reflections.getSubTypesOf(Object.class);

B. Keywords: java, programming language, google, library (computing), java package

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

45. A. Choose Gradle JDK in IntelliJ My system JAVA_HOME and PATH are pointing to JDK 7, but I want to use JDK 8 in project opened in IntelliJ. I changes settings in Project Structure and it works great in IDE, but unfortunately Gradle build run from IDE still uses JDK 7. How can I specify Gradle JDK in IntelliJ 13.0?

B. Keywords: java, java development kit (JDK), IntelliJ IDEA, gradle, integrated development environment (IDE)

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

46. A. Choose Gradle JDK in IntelliJ My system JAVA_HOME and PATH are pointing to JDK 7, but I want to use JDK 8 in project opened in IntelliJ. I changes settings in Project Structure and it works great in IDE, but unfortunately Gradle build run from IDE still uses JDK 7. How can I specify Gradle JDK in IntelliJ 13.0?

B. Keywords: java, intellij-idea, gradle

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

47. A. Serve Images from a separate location in JSF1.1 We are having a lot of images which all exist in our webroot. Is there a setting in JSF 1.1 which allows us to set location of image path.

B. Keywords: java, jsf

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

48. A. Serve Images from a separate location in JSF1.1 We are having a lot of images which all exist in our webroot. Is there a setting in JSF 1.1 which allows us to set location of image path.

B. Keywords: java, java server faces (jsf), programming language

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

49. A. Avoid direct links to login page I'm using spring security including remember-me feature: Now when i check remember-me checkbox, i can close browser and go to some app pages skipping the login page. But i still can type myApp/login.jsp and go to the login page. Is there any ways to avoid it? - I want to avoid any direct links to login page. User should be able to see login screen only if he hasn't logged in still, hasn't pressed "remember-me" button (and closed browser) or pressed logout button of my app.

B. Keywords: java, java server pages (JSP), spring framework, login, security, web browser

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

50. A. Avoid direct links to login page I'm using spring security including remember-me feature: Now when i check remember-me checkbox, i can close browser and go to some app pages skipping the login page. But i still can type myApp/login.jsp and go to the login page. Is there any ways to avoid it? - I want to avoid any direct links to login page. User should be able to see login screen only if he hasn't logged in still, hasn't pressed "remember-me" button (and closed browser) or pressed logout button of my app.

B. Keywords: java, jsp, spring-security, remember-me

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

51. A. How can I convert timezone of one City to another? I want to convert timezone of one city to another, using Joda DateTime, the process is straight forward, but I have a problem. I am referring to Joda Database and I am unable to locate specific cities, in the database there are so few mappings say for e.g. Asia/Kolkata, if I want to search timezone for say another city like Asia/New Delhi which is in Asia or more specifically India, I cannot do that, it is mostly probable that cities within the same country have same time zone but its not always true, so I am stuck in understanding of as how to map a city to its country and vice versa. But there is no mapping found for cities other than what is speicified in the database. Is there any way other than or including Joda Datetime which I can use to convert by directly inputting the city name in Olson format?

I have seen many websites 1, 2 which do what I want, please help me understand what I am missing here.

B. Keywords: java, datetime, time, timezone, jodatime

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

52. A. How can I convert timezone of one City to another? I want to convert timezone of one city to another, using Joda DateTime, the process is straight forward, but I have a problem. I am referring to Joda Database and I am unable to locate specific cities, in the database there are so few mappings say for e.g. Asia/Kolkata, if I want to search timezone for say another city like Asia/New Delhi which is in Asia or more specifically India, I cannot do that, it is mostly probable that cities within the same country have same time zone but its not always true, so I am stuck in understanding of as how to map a city to its country and vice versa. But there is no mapping found for cities other than what is speicified in the database. Is there any way other than or including Joda Datetime which I can use to convert by directly inputting the city name in Olson format? I have seen many websites 1, 2 which do what I want, please help me understand what I am missing here.

B. Keywords: java, time zone, map, database, asia, website

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

53. A. install java plugin inside HTML element in ActionScript3 i have tried to open a link inside HTML element in AS3 , but there is nothing . in chrome,mozilla will work and open the link because it need a java plugin installed. can i add java plugin in my Adobe Air to use it in HTML element and open link ? the link is Click Here

B. Keywords: java, html, action script, adobe integrated runtime, html element, plug-in (computing), adobe systems

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

54. A. install java plugin inside HTML element in ActionScript3 i have tried to open a link inside HTML element in AS3 , but there is nothing . in chrome,mozilla will work and open the link because it need a java plugin installed. can i add java plugin in my Adobe Air to use it in HTML element and open link ? the link is Click Here

B. Keywords: java, html, actionscript-3, plugins

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

55. A. Is it possible to configure a custom error message per converter? Is it possible to configure a custom error message per converter? For example I have this in my xwork-conversion.properties: java.util.Date=mx.com.afirme.midas2.converter.DateConverter Whenever a Date conversion fails in any action I'd like to show a message like this: Incorrect format, expected mm/dd/yyyy I don't want to define a custom message per property as mentioned in the documentation: However, sometimes you may wish to override this message on a per-field basis. You can do this by adding an i18n key associated with just your action (Action.properties) using the pattern invalid.fieldvalue.xxx, where xxx is the field name.

B. Keywords: java, struts2

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

56. A. Is it possible to configure a custom error message per converter? Is it possible to configure a custom error message per converter? For example I have this in my xwork-conversion.properties: java.util.Date=mx.com.afirme.midas2.converter.DateConverter Whenever a Date conversion fails in any action I'd like to show a message like this: Incorrect format, expected mm/dd/yyyy I don't want to define a custom message per property as mentioned in the documentation: However, sometimes you may wish to override this message on a per-field basis. You can do this by adding an i18n key associated with just your action (Action.properties) using the pattern invalid.fieldvalue.xxx, where xxx is the field name.

B. Keywords: Java, internationalization and localization (i18n), dd (unix), error message

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

57. A. JTable Customized Row I would like some input on how to implement the following row rendering. I don't need actual code but just the concept. Row 1 is Yellow and Row 2 is White. My bean has Order Date, Req Ship Date, ... and Product as properties.

B. Keywords: java, swing, table, tablecellrenderer

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

58. A. JTable Customized Row I would like some input on how to implement the following row rendering. I don’t need actual code but just the concept. Row 1 is Yellow and Row 2 is White. My bean has Order Date, Req Ship Date, ... and Product as properties.

B. Keywords: java, swing (java), table (database)

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

59. A. local storage list implementation Is there a implementation of List as a local storage instead of main memory resident in java. Basically my application have to store a huge amount of data and i don't want to use the memory resident list implementations.

B. Keywords: java, main memory, computer data storage, data

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

60. A. local storage list implementation Is there a implementation of List as a local storage instead of main memory resident in java. Basically my application have to store a huge amount of data and i don't want to use the memory resident list implementations.

B. Keywords: java, list

Q. Please rate how well the keywords describe the question in your opinion (Section B)

Very bad 1 2 3 4 5 Very good

A.2 Answers Experiment

The answers experiment showed participants 46 questions. Each item consisted of, first, an unanswered question; second, a question similar to the first that had an answer which could also answer it; and finally, the second question's answer, which could provide a solution to the first question. Participants were asked to rate how well the answer provided a solution to the first question, on a scale of 1 to 10, where 1 was very bad and 10 was very good. 15 questions were taken from the Python programming language[3], while the Java programming language experiment was broken into two sets, the first[4] consisting of 15 questions and the second[5] consisting of 16 questions. All 46 questions, with instructions, are given below.

Instructions for each question:

1. Read the question in both Section A and Section B

2. Read Section C, which is the answer to B

3. Rate how well the answer in Section C also answers the question in Section A, in your opinion

4. You are allowed to click on any links if you want.

5. Please don't Google the question, as it might show you StackOverflow results whose ratings might influence your rating.

1. A. Read text from PNG with standard lib Is there a way to read text from a PNG-File in Python by using only the standard libraries Python provides?

Similar Question: B. Python: default/common way to read png images I haven’t found a standard way in Python to read images. Is there really none (because there are so many functions for so many custom stuff that I really wonder that there are no functions to read images)? Or what is it? (It should be avail- able in the MacOSX standard installation and in most recent versions on Linux distributions.) If there is none, what is the most common lib?

[3] https://www.isurvey.soton.ac.uk/start.php?id=13709
[4] https://www.isurvey.soton.ac.uk/start.php?id=57376
[5] https://www.isurvey.soton.ac.uk/start.php?id=57853

Many search results hint me to Python Imaging Library. If this is some well known Python-lib for reading images, why isn’t it included in Python?

Answer: C. No, there are no modules in the standard library for reading/writing/processing images directly. But the most common library might be PIL (Python Imaging Library). Many projects are not included in the standard library because they are 1) totally optional and 2) cannot be maintained by the few Python core developers.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

2. A. How to programmatically create dummy email thread? For comparing some software I need a thread of mails, i.e. some mails with replies and replies to the replies... Content does not matter, but attachment and richt-text would be nice. I wonder how to create such a dummy mail thread programmatically (preferably using Linux commandline tools or Python). How would I create those dummy mails?

Similar Question: B. How to design an email system? I am working for a company that provides customer support to its clients. I am trying to design a system that would send emails automatically to clients when some event occurs. The system would consist of a backend part and a web interface part. The backend will handle the communication with a web interface (which will be only for internal use to change the email templates) and most important it will check some database tables and based on those results will send emails ... lots of them. Now, I am wondering how can this be designed so it can be made scalable and provide the necessary performance as it will probably have to handle a few thousands emails per hours (this should be the peek). I am mostly interested about how would this kind of architecture should be thought in order to be easily scaled in the future if needed. Python will be used on the backend with Postgres and probably whatever comes first between a Python web framework and GWT on the frontend (which seems

the simplest task).

Answer: C. This is a real good candidate for using some off the shelf software. There are any number of open-source mailing list manager packages around; they already know how to do the mass mailings. It’s not completely clear whether these mailings would go to the same set of people each time; if so, get any one of the regular mailing list programs. If not, the easy answer is

$ mail address -s subject < file

once per mail.

By the way, investigate the policies of whoever is upstream from you on the net. Some ISPs see lots of mails as probable spam, and may surprise you by cutting off or metering your internet access.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

3. A. How to call python code from excel? I have written a python code using selenium for automating files extraction from website and then I need to format these reports and append them and i am using macros for this. Is there any way to call the python code from excel by integrating it with VBA or something like that?

Similar Question: B. can i run c# code from vba macro? is it possible to include some kind of libraries inside of VBA that will enable me to use c# functions that i wrote

Answer: C. You need to expose managed code to COM using the [ComVisible] attribute. For more information, see here. EDIT: For example:

[ComVisible(true)]
public class MyClass
{
    public int GetNumber(string name)
    {
        ...
    }
}

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

4. A. In Python IDLE show completion what does -> mean?

I managed to get Show Completion to work thanks to this answer. But what does str(object) -> string mean as a tip after typing the opening bracket of a function? Example code:

linkText = "some text"
elms = browser.find_elements(By.PARTIAL_LINK_TEXT(linkText))

On Run gives: TypeError: 'str' object is not callable

Does it mean linkText should be a pointer to string? How do I enter a pointer in Python?

Similar Question: B. Python IDLE subprocess error? IDLE’s subprocess didn’t make connection. Either IDLE can’t start a subprocess or personal firewall software is blocking the connection. Don’t think this has been asked-how come this comes up occasionally when running very simple programs-I then have to go to Task Manager & stop all Pythonw processes to get it to work again? It seems to happen randomnly on different bits of code-here is the one I’m doing at the moment-

f = open('money.txt')
currentmoney = float(f.readline())
print(currentmoney, end='')
howmuch = (float(input('How much did you put in or take out?:')))
now = currentmoney + howmuch

print(now)
f.close()
f = open('money.txt', 'w')
f.write(str(now))
f.close()

Sometimes it works, sometimes it doesn’t!

Answer: C. You can use idle -n to avoid such issues (albeit possibly getting some other limitations).

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

5. A. python significant figures and float I’m do not understand why 1.1 + 2.2 is not 3.3 if a computer calculates this. I am trying to understand the working of binairy floating points.. but I am not even sure of that float the cause is. could you explain this to me?, I have not been able to find a clear explanation.

Python 2.7.4 (default, Apr 6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ====== RESTART ======
>>>
>>> 1.1+2.2
3.3000000000000003
>>>

Similar Question: B. Python float - str - float weirdness

>>> float(str(0.65000000000000002))

0.65000000000000002

>>> float(str(0.47000000000000003))

0.46999999999999997 ???

What is going on here? How do I convert 0.47000000000000003 to string and the resultant value back to float? I am using Python 2.5.4 on Windows.

Answer: C. str(0.47000000000000003) give '0.47' and float('0.47') can be 0.46999999999999997. This is due to the way floating point number are represented (see this wikipedia article) Note: float(repr(0.47000000000000003)) or eval(repr(0.47000000000000003)) will give you the expected result, but you should use Decimal if you need precision.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

6. A. How to verify ssl certificate? im working on dns masking. Im asking user to provide URL for masking and SSL certificate. I need to validate whether the certificate user provided is of valid format or not. The problem is if user doesnot provide valid certificate apache doesn’t restart. Im using python. I’ve tried some solutions provided here but couldn’t validate ssl certificate. Any help will be highly appreciated. Thanks!

Similar Question: B. Validate SSL certificates with Python I need to write a script that connects to a bunch of sites on our corporate intranet over HTTPS and verifies that their SSL certificates are valid; that they are not expired, that they are issued for the correct address, etc. We use our own internal corporate Certificate Authority for these sites, so we have the public key of the CA to verify the certificates against. Python by default just accepts and uses SSL certificates when using HTTPS, so even if a certificate is invalid, Python libraries such as urllib2 and Twisted will just happily use the certificate. Is there a good library somewhere that will let me connect to a site over HTTPS and verify its certificate in this way?

How do I verify a certificate in Python?

Answer: C. I have added a distribution to the Python Package Index which makes the match_hostname() function from the Python 3.2 ssl package available on previous versions of Python. http://pypi.python.org/pypi/backports.ssl_match_hostname/ You can install it with:

pip install backports.ssl_match_hostname

Or you can make it a dependency listed in your project's setup.py. Either way, it can be used like this:

from backports.ssl_match_hostname import match_hostname, CertificateError
...
sslsock = ssl.wrap_socket(sock,
                          ssl_version=ssl.PROTOCOL_SSLv3,
                          cert_reqs=ssl.CERT_REQUIRED,
                          ca_certs=...)
try:
    match_hostname(sslsock.getpeercert(), hostname)
except CertificateError, ce:
    ...

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

7. A. HTTP Request coming up blank With my first foray into anything-related-to-web programming, I am using the Requests external library (Requests), I am testing it on https://google.com.au to see if I’m doing it right.

import requests

proxy_dict = { "https" : "10.10.20.99:8080" }

r = requests.get('https://google.com.au', proxies = proxy_dict)
print r.content
print r.status_code

with output

> 200

i.e. a completed HTTP request yet with no returned HTML information. I have read the relevant docs for "Requests" but I can't get the same results as that example (which is the same as this one, except using github.com instead of google.com.au). I am a complete and utter noob at all things HTTP/HTML right now, so does anyone have some idea where I'm going wrong? Thanks! EDIT: I forgot to mention, this will be behind some sort of company firewall as I am doing this through work. That proxy is my work's proxy.!

Similar Question: B. Python Proxy Error With Requests Library I am trying to access the web via a proxy server in Python. I am using the requests library and I am having an issue with authenticating my proxy as the proxy I am using requires a password.

proxyDict = {
    'http' : 'username:[email protected]',
    'https' : 'username:[email protected]'
}
r = requests.get("http://www.google.com", proxies=proxyDict)

I am getting the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    r = requests.get("http://www.google.com", proxies=proxyDict)
  File "C:\Python27\lib\site-packages\requests\api.py", line 78, in get
    :param url: URL for the new :class:`Request` object.
  File "C:\Python27\lib\site-packages\requests\api.py", line 65, in request
    """Sends a POST request. Returns :class:`Response` object.
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 187, in request
    def head(self, url, **kwargs):
  File "C:\Python27\lib\site-packages\requests\models.py", line 407, in send
    """
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\poolmanager.py", line 127, in proxy_from_url
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 521, in connection_from_url
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 497, in get_host

ValueError: invalid literal for int() with base 10: ’[email protected]

How do I solve this? Thanks in advance for your help.

Answer: C. You should remove the embedded username and password from proxyDict, and use the auth parameter instead.

import requests
from requests.auth import HTTPProxyAuth

proxyDict = {
    'http' : '77.75.105.165',
    'https' : '77.75.105.165'
}
auth = HTTPProxyAuth('username', 'mypassword')

r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

8. A. Python Java Integration I’m developing a program completely written in Python but I need to integrate a Java code inside the file console.py. I want to integrate Sphinx4’s program to give GNS3 the capability of voice recognition. Is it possible? What do I need to do this!

Similar Question: B. Java Python Integration I have a Java app that needs to integrate with a 3rd party library. The library is written in Python, and I don't have any say over that. I'm trying to figure out the best way to integrate with it. I'm trying out JEPP (Java Embedded Python) - has anyone used that before? My other thought is to use JNI to communicate with the C bindings for Python.

Any thoughts on the best way to do this would be appreciated. Thanks.

Answer: C. Why not use Jython? The only downside I can immediately think of is if your library uses CPython native extensions. EDIT: If you can use Jython now but think you may have problems with a later version of the library, I suggest you try to isolate the library from your app (e.g. some sort of adapter interface). Go with the simplest thing that works for the moment, then consider JNI/CPython/etc if and when you ever need to. There’s little to be gained by going the (painful) JNI route unless you really have to.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

9. A. What is the most painless way to remove one version of python and not the other on OSX 10.7.4? I accidentally installed 64 bit python a few weeks ago, and now all of my programs jump for it by default, instead of the more functional 32 bit python. Is there a painless way to uninstall 64 bit python but not 32? Thanks!

Similar Question: B. How to uninstall Python 2.7 on a Mac OS X 10.6.4? I want to completely remove Python 2.7 from my Mac OS X 10.6.4. I managed to remove the entry from the PATH variable by reverting my .bash_profile. But I also want to remove all directories, files, symlinks, and entries that got installed by the Python 2.7 install package. I've got the install package from http://www.python.org/. What directories/files/configuration file entries do I need to remove? Is there a list somewhere?

Answer: C. The complete list is documented here. But, basically, all you need to do is to:

remove the Python 2.7 framework:
sudo rm -rf /Library/Frameworks/Python.framework/Versions/2.7

remove the Python 2.7 applications directory:
sudo rm -rf "/Applications/Python 2.7"

remove the symbolic links in /usr/local/bin that point to this python version:
ls -l /usr/local/bin | grep '../Library/Frameworks/Python.framework/Versions/2.7'

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good

10. A. Free memory in python run time I have a python server running on tornado , there is request that uses lots of memory (using a gdata library) . I deleted those object after using them and even did gc.collect() , But when i see the system memory using free -m , I could see memory increase in memory when i do those operations using gdata . But the memory does not get freed up , after i delete the object . The memory gets freed up only after the main python program is killed . I want to know if there is any way to free up the memory .

Similar Question: B. Python is not freeing memory I’ve been working with XML resources, and it seems that Python is issuing a weird behavior. I’ve tested both lxml library and xml.etree.ElementTree, both holding memory after it should be collected by gc. I typed gc.collect() as a test, but nothing else happen: memory still beign hold by process. Imports:

import time
from lxml import etree
import gc

This is the code:

def process_alternative():
    """ This alternative process will use lxml """
    filename = u"/tmp/randomness.xml"
    fd = open(filename, 'r')
    tree = etree.parse(fd)
    root = tree.getroot()

    accum = {}

    for _item in root.iter("*"):
        for _field in _item.iter("*"):
            if _field.tag in accum.keys():
                accum[_field.tag] += 1
            else:
                accum[_field.tag] = 1

    for key in accum.keys():
        print "%s -> %i" % (key, accum[key])

    fd.close()
    gc.collect()

And this is my main:

if __name__ == "__main__":
    while True:
        print "Wake up!"
        process_alternative()
        print "Sleeping..."
        time.sleep(30)

As you see, this main calls process_alternative(), and then sleeps. The XML file provided loads memory with nearly 800Mb; so, before time.sleep, memory should be freed by the process, returning to the basic VM memory needed (around 32Mb?). Instead, the process continues holding around 800Mb. Any tip about why memory has not been freed after every iteration? Thank you so much. EDIT: Using ubuntu 13.04, Python 2.7.4 EDIT2: This function deallocates memory in every iteration

def check_memory():
    ac1 = [a1**5 for a1 in xrange(10000000)]
    time.sleep(5)
    ac2 = [a1**5 for a1 in xrange(10000000)]
    time.sleep(5)
    ac3 = [a1**5 for a1 in xrange(10000000)]

Answer: C. I do not know why, but the process still holds memory, even when I set an explicit call to gc.collect(). After some playing, and thanks to Martijn Pieters, a solution appeared. Calling len(gc.get_objects()) frees all accessed memory, and keeps the process on the right resources when it’s not busy. Strange, but true. Thank you for all responses. Cheers, Isaac

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

11. A. URL links in Eclipse python code I like to document my python code with links to web pages that explain the algorithm in detail. But I can’t make the links ’live’, so that just clicking on them in Eclipse brings up the page in the platform browser. I’ve just been putting the URL’s inside the 3-double-quote pair, then having to manually copy paste them to the browser. Is there a better way?

Similar Question: B. Python DocStrings Pydev I’ve gotten Pydev up and running, and almost all is working well. However I’m having some trouble with docstrings. Let’s say for instance I have a function such as the following:

def _get_logging_statement(self):
    """Returns an easy to read string which separates
    items in the log file cleanly"""
    result = "\n\n#======"
    result += "\n# %-80s#" % (self)
    result += "\n\n#======"
    return result

Assume I’ve overridden __repr__ to format that string properly as well.

When I hover over this in Eclipse it’s showing me the full docstring as intended, however below the docstring is the implementation. Is there a way to show only the docstring?

Answer: C. Doesn’t look like it currently. Googled around for this issue and the top result pointed me to this Pydev-users post: On Mon, May 3, 2010 at 5:45 AM, Janosch Peters wrote: Hi, when I hover over a function or class, I get a tooltip showing the whole definition of the function/class not only the docstring (as I would expect). Is this expected behaviour? I think it would be more useful, if only the content of the docstring is shown. It’s currently expected. Please enter a feature request to make showing just the docstring an option. Cheers, Fabio Looked around the Pydev bug/feature tracker and didn’t find this specific issue entered. You might want to enter it in the Pydev feature request tracker and see if you can get help there.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

12. A. need a time widget or clock widget is there anything like a timewidget or clock widget in Django which will help me to input data into a form for appointment date (like we have calendar or date widget). i have my date widget working with the below code in my forms.py:

import datetime
from django.forms.extras.widgets import SelectDateWidget

mydate = forms.DateField(widget=SelectDateWidget)

is there anything like this for a time widget?

Similar Question: B. Using Django time/date widgets in custom form How can I use the nifty JavaScript date and time widgets that the default admin uses with my custom view? I have looked through the Django forms documentation, and it briefly mentions django.contrib.admin.widgets, but I don’t know how to use it? Here is my template that I want it applied on.

{% for f in form %}
<tr><td>{{ f.name }}</td><td>{{ f }}</td></tr>
{% endfor %}
</table>
</form>

Also, I think it should be noted that I haven’t really written a view up myself for this form, I am using a generic view. Here is the entry from the url.py:

(r'^admin/products/add/$', create_object, {'model': Product, 'post_save_redirect': ''}),

And I am relatively new to the whole Django/MVC/MTV thing, so please go easy...

Answer: C. What about just assigning a class to your widget and then binding that class to the JQuery datepicker?

Django forms.py:

class MyForm(forms.ModelForm):

    class Meta:
        model = MyModel

    def __init__(self, *args, **kwargs):
        super(MyForm, self).__init__(*args, **kwargs)
        self.fields['my_date_field'].widget.attrs['class'] = 'datepicker'

And some JavaScript for the template:

$(".datepicker").datepicker(); \\

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

13. A. Python child process limits I’d like to create a process in Python (probably with subprocess and Popen), which should have limited CPU time, limited child processes and memory bandwidth. I can’t find a way to do this. resource.setrlimit does not seem to work. My code is:

import os
import sys
import resource
import subprocess
import signal

def setlimits():
    os.seteuid(65534)  # Has to run as root user in order to be able to setuid
    resource.setrlimit(resource.RLIMIT_CPU, (1, 1))
    resource.setrlimit(resource.RLIMIT_FSIZE, (500, 500))
    resource.setrlimit(resource.RLIMIT_NPROC, (80, 80))

p = subprocess.Popen(["./exec.out"], preexec_fn=setlimits)

Similar Question: B. Python: ulimit and nice for subprocess.call / subprocess.Popen? I need to limit the amount of time and cpu taken by external command line apps I spawn from a python process using subprocess.call, mainly because sometimes the spawned process gets stuck and pins the cpu at 99%. nice and ulimit seem like reasonable ways to do this, but I’m not sure how they’d interact with subprocess. The limits look something like: Kill the process if it’s taking more than 60 seconds. Limit it to 20% CPU. I want to apply the resource limiting to the subprocess, not to the

python process that’s spawning the subprocesses. Is there a way to apply nice and ulimit to the subprocess.call spawned process? Are there better python-native alternatives? This is on a linux (ubuntu) system.

Answer: C. You can set limits for subprocesses with the ulimit and nice shell commands like this:

import subprocess
subprocess.Popen('ulimit -t 60; nice -n 15 cpuhog', shell=True)

This runs cpuhog with a limit of 60 seconds of CPU time and a niceness adjustment of 15. Note that there is no simple way to set a 20% CPU throttle as such. The process will use 100% CPU unless another (less nice) process also needs the CPU.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

14. A. Using C++ library functions from Python I got a library created for Win x86 with MS Visual Studio 2010. And I can not change the content of the library to use Boost.Python. I’m using Python 3.3 with PyQt4 to create the interface, but am not restricted to these versions. I need to call functions and get objects from said C++ library. What is the easiest way to wrap a C++ library to be called from python? I guess that such a question was already asked, but I can not seem to find it. Here’s an example of the header file:

namespace SDK {
    class IMethod {
    public:
        virtual IModel* CreateModel(const IBuffer* pBuffer, const char* text) = 0;
    };
    extern __declspec(dllexport) SDK::IMethod* CreateMethod(MethodID integer);
}

Similar Question: B. Python — accessing dll using ctypes I’m trying to access some functions in a dll (nss3.dll) that ships with the Firefox web browser. To handle this task I have

used ctypes in Python. The problem is that it fails at the initial point, which is when loading the dll into memory. This is the code snippet that I have to do so.

>>> from ctypes import *
>>> windll.LoadLibrary("E:\nss3.dll")

The exception I’m getting is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    windll.LoadLibrary("E:\nss3.dll")
  File "C:\Python26\lib\ctypes\__init__.py", line 431, in LoadLibrary
    return self._dlltype(name)
  File "C:\Python26\lib\ctypes\__init__.py", line 353, in __init__
    self._handle = _dlopen(self._name, mode)
WindowsError: [Error 126] The specified module could not be found

I also tried loading it from the Firefox installation path, assuming that there may be dependencies.

>>> windll.LoadLibrary("F:SoftwaresMozilla Firefoxnss3.dll")

But I’m getting the same exception as mentioned above. Thanks.

Answer: C. nss3 is linked to several DLLs in the Firefox directory: nssutil3.dll, plc4.dll, plds4.dll, nspr4.dll, and MOZCRT19.dll. You need to add this directory to the system PATH:

import os
import ctypes

firefox = r'F:\Softwares\Mozilla Firefox'
os.environ['PATH'] = ';'.join([firefox, os.environ['PATH']])

Then more than likely you’ll want to use cdll for the cdecl calling convention:

>>> nss3 = ctypes.CDLL(os.path.join(firefox, 'nss3.dll'))

>>> nss3.NSS_GetVersion.restype = c_char_p
>>> nss3.NSS_GetVersion()
'3.13.5.0 Basic ECC'

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

15. A. python pprint: how to output utf8 characters? I have a dictionary with ’utf8 char’ : number . pprint() would treat the utf8 as a byte array and output hex values. Is it possible to tell it to print it as string so that the console can render UTF8 text?

Similar Question: B. Unable to encode/decode pprint output This question is based on a side-effect of that one. My .py files all have a -*- coding: utf-8 -*- encoding definer on the first line, like my api.py. As I mention on the related question, I use HttpResponse to return the api documentation. Since I defined encoding by: HttpResponse(cy_content, content_type='text/plain; charset=utf-8') everything is ok, and when I call my API service, there are no encoding problems, except in the string formed from a dictionary by pprint. Since I am using Turkish characters in some values in my dict, pprint converts them to unichr equivalents, like:

API_STATUS = {
    1: 'müşteri',
    2: 'some other status message'
}

my_str = 'Here is the documentation part that contains Turkish chars like işüğçö'
my_str += pprint.pformat(API_STATUS, indent=4, width=1)
return HttpResponse(my_str, content_type='text/plain; charset=utf-8')

And my plain text output is like:

Here is the documentation part that contains Turkish chars like işüğçö

{   1: 'm\xc3\xbc\xc5\x9fteri',
    2: 'some other status message'}

I tried to decode or encode the pprint output to different encodings, with no success... What is the best practice to overcome this problem?

Answer: C. pprint appears to use repr by default, you can work around this by overriding PrettyPrinter.format:

# coding=utf8
import pprint

class MyPrettyPrinter(pprint.PrettyPrinter):
    def format(self, object, context, maxlevels, level):
        if isinstance(object, unicode):
            return (object.encode('utf8'), True, False)
        return pprint.PrettyPrinter.format(self, object, context, maxlevels, level)

d = {'foo': u'işüğçö'}

pprint.pprint(d)             # {'foo': u'i\u015f\xfc\u011f\xe7\xf6'}
MyPrettyPrinter().pprint(d)  # {'foo': işüğçö}

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

16. A. Java, console.readPassword adds extra line. How to delete it? When i use console.readPassword() to read user passwords through console, there is always one line added to the console. How to disable this behavior or how to delete that extra line (and move the cursor after the last character in the line before)? What escape character to use? Thanks

Similar Question:

B. Java console pretty printing: returning to the start of a line In Java, how can I return to the start of a line and overwrite what has already been output on the console? System.out.print(mystuff + '\r'); does not appear to work. thanks in advance

Answer: C. I suspect that your cursor IS moving to the front of the line. The text you already have isn’t disappearing because you haven’t overwritten it with anything. You could output spaces to blank the line and then add another \r. I just tested the following on Windows XP and AIX and it works as expected:

public class Foo {
    public static void main(String[] args) throws Exception {
        System.out.print("old line");
        Thread.sleep(3000);
        System.out.print("\rnew");
    }
}

I get ”old line” printed, a 3 second delay, and then ”old line” changes to ”new line”. I intentionally made the first line longer than the second to demonstrate that if you want to erase the whole line you’d have to overwrite the end with spaces. Also note that the "\b" escape sequence will back up 1 space, instead of to the beginning of the line. So if you only wanted to erase the last 2 characters, you could write: System.out.println("foo\b\bun") and get ”fun”.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

17. A. is there an equivalent of Python’s timeit module in Java stdlib I was amazed at Python’s timeit module and wondered if there’s an equivalent in Java’s standard library. If not, is there a 3rd party module?

Similar Question: B. How to compute accurately the time it takes a Java program to write or read a file? How to compute accurately the time it takes a Java program to write or read a number of bytes from/to a file ? It is really important that the time is being measured accurately. (The time should be computed by the program itself).

Answer: C. The standard idiom is:

long startTime = System.nanoTime();
doSomething();
long elapsedTime = System.nanoTime() - startTime;
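For reference, a minimal timeit-style helper built on this idiom; the class and method names are illustrative, not part of the Java standard library:

public final class Timeit {
    // Run the task several times and keep the best (lowest) wall-clock time,
    // mirroring what Python's timeit.repeat(...) encourages.
    public static long bestOfNanos(int repeats, Runnable task) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < repeats; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }
}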

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

18. A. java.lang.NoSuchMethodError when trying to read .xlsm file

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.xmlbeans.XmlOptions.setSaveAggressiveNamespaces()Lorg/apache/xmlbeans/XmlOptions;
at org.apache.poi.POIXMLDocumentPart.<clinit>(POIXMLDocumentPart.java:56)
at rulebooksToExcel.GenerateExcel.generateExcel(GenerateExcel.java:34)
at rulebooksToExcel.ParseNortDocFiles.main(ParseNortDocFiles.java:165)

I am getting the error at: workbook = new XSSFWorkbook(in); I read other similar questions but they all suggest XMLBeans Version 2.0+. But I am using 2.6, and I can’t find any other explanation for what might be causing this.

Similar Question: B. How to solve a NoSuchMethodError when using POI for doc files When I was trying to implement any code of the following

File someFile = new File("D:\\arz.doc");
InputStream inputStrm = new FileInputStream(someFile);
HWPFDocument wordDoc = new HWPFDocument(inputStrm);
System.out.println(wordDoc.getText());

or:

POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("D:\\arz.doc"));
WordExtractor extractor = new WordExtractor(fs);
String wordText = extractor.getText();

, the error message always comes out as follows:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.poi.poifs.filesystem.POIFSFileSystem.getRoot()Lorg/apache/poi/poifs/filesystem/DirectoryNode;
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:186)
at DB_connect.dissertation_araalz.ParseWodDocFile.main(ParseWodDocFile.java:29)
Java Result: 1
BUILD SUCCESSFUL (total time: 3 seconds)

Could you please help me in that problem?

Answer: C. You almost certainly have two copies of POI on your classpath. One is the new, latest version which contains the feature you want to use. The other is an older version that doesn’t, and it seems your system is preferring the older one... This is a common enough problem that the POI FAQ Covers this very case. Ideally, just look at your classpath, and try to identify the extra older POI jar. However, if that doesn’t work, try this snippet of code from the POI FAQ:

ClassLoader classloader =
    org.apache.poi.poifs.filesystem.POIFSFileSystem.class.getClassLoader();
URL res = classloader.getResource(
    "org/apache/poi/poifs/filesystem/POIFSFileSystem.class");
String path = res.getPath();
System.out.println("Core POI came from " + path);

That will print out the filename of the POI jar you’re using, so you can work out where the older copy is coming from and remove it!

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

19. A. How do we call a method in a jar file from a java script method How do we call a method in a jar file from a java script method. I am using a third party to authenticate a HTML5 application, the jquery method redirects to the third party url and they redirect back to the app after validation at their end.

Now in Java we can use the methods in their jar file to get back the User ID, but I am not sure if we can do it using js. Code in java, the jar is added in the classpath -

private static String UID(HttpServletRequest req) {
    String unEncCookie = null;
    String cookie = getSecCookie(req);
    if (cookie == null)
        return null;
    else {
        unEncCookie = JAR.JAR(cookie, "param1", "param2");
        if (unEncCookie == null || "".equals(unEncCookie))
            return null;
        else {
            return unEncCookie.split("\\|")[5]; // 6th value is UID
        }
    }
}

Similar Question: B. calling java methods in javascript code i created a java class content method return a String, my question is how to call this function in my javascript code to use the returned value from the java method. I want to call client-side Java code embedded in browser. here is an exemple of what im talking about: in my webpage i have a javascript code, here is some of it:

function createChartControl(htmlDiv1) {
    // Initialize Gantt data structures
    // project 1
    var parentTask1 = new GanttTaskInfo(1, "Old code review",
        new Date(2010, 5, 11), 208, 50, "");
    ......

i want to create a java class content methods to provide data to this javascript function ”GanttTaskInfo”. for exemple function to get name, get id and date. well i think this time im clear :D i searched a way to call java methods in javascript, and i found applets as you said, but i think its not useful to me. thanks again

Answer: C. Java is a server side language, whereas javascript is a client side language. Both cannot communicate. If you have setup some server side script using Java you could use AJAX on the client in order to send an asynchronous request to it and thus invoke any possible Java functions. For example if you use jQuery as js framework you may take a look at the $.ajax() method. Or if you wanted to do it using plain javascript, here’s a tutorial.
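For reference, a minimal sketch of the server side of that AJAX pattern, assuming a servlet container; the URL mapping and the placeholder lookup are illustrative, not from the original posts:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class UidServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // The jQuery $.ajax() call mentioned in the answer would GET this
        // URL and receive the value computed on the server in Java.
        resp.setContentType("text/plain");
        resp.getWriter().write(uid(req));
    }

    private String uid(HttpServletRequest req) {
        return "user42"; // placeholder for the JAR-based cookie lookup from section A
    }
}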

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

20. A. Is there any Java API for PCRE regex matching We have a requirement to match a PCRE expression against a file name. Tried to find an API for it, but unable to find any. Need some suggestion how to achieve it. If any of you have already done this, a sample would be much appreciated.

Similar Question: B. What support is there for PCRE (Perl Compatible Regular Expressions) in common languages? I am interested in the power of PCRE (Perl Compatible Regular Expressions) and wonder whether they are likely to become a de facto approach in all major languages (I am interested in Java). I am prepared to use a library if necessary. I also could not find a good page in SO describing the pros and cons of PCRE so if this does not exist it could be useful to include this in answers EDIT I am interested in power beyond Java 1.6 regex, particularly named capture groups

Answer: C. It seems that more mainstream languages actually use their own implementation of ”Perl-like” regexes than actually use libpcre. Languages that fall into this class include (at the very least) Java, JavaScript, and Python. Java’s java.util.regex library uses a syntax that’s very heavily based on Perl (approx. version 5.8) regexes, including the rules for escaping, the \p and \P Unicode classes, non-greedy and ”possessive” quantifiers, backreferences, \Q..\E quoting, and several of the (?...) constructs including non-capturing groups, zero-width lookahead/behind, and non-backtracking groups. In fact Java regexes seem to have more in common with Perl regexes than libpcre does. :)

The JavaScript language also uses regexes that are derived from Perl; Unicode classes, lookbehind, possessive quantifiers, and non-backtracking groups are absent, but the rest of what I mentioned for Java is present as well in JS. Python’s regex syntax is also based on Perl 5’s, with non-greedy quantifiers, most of the (?...) constructs including non-capturing groups, look-ahead/behind and conditional patterns, as well as named capture groups (but with a different syntax than either Perl or PCRE). Non-backtracking groups and ’possessive’ quantifiers are (as far as I can see) absent, as are \p and \P Unicode character classes, although the standard \d, \s, and \w classes are Unicode-aware if requested.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

21. A. Play Framework 2.1.3 Debugger failed to attach: handshake failed I get this message in the console when I run a Play app.

C:\Users\Daniel\desenvolvimento\Sistemas>play debug
Listening for transport dt_socket at address: 9999
Debugger failed to attach: handshake failed - received >POST /setwindo< - expected >JDWP-Handshake<
Debugger failed to attach: handshake failed - received >POST /setwindo< - expected >JDWP-Handshake<

I don’t have any problems running the app but this ”Debugger failed” message keeps coming out and it just bugs me. This happens even if I create a clean project. Play 2.1.3, Windows 7 64bit. How could I get rid of this message?

Similar Question: B. Play Framework 1.2.3 Debugger failed to attach: handshake failed I get this message in the console when I run a Play app.

06:08:08,069 INFO  ~ Starting D:\projects\play1.2.3\test
06:08:08,623 WARN  ~ You're running Play! in DEV mode
06:08:08,737 INFO  ~ Listening for HTTP on port 9000 (Waiting a first request to start)

06:08:31,229 INFO  ~ Application 'test' is now started !
Debugger failed to attach: handshake failed - connection prematurally closed
Debugger failed to attach: handshake failed - connection prematurally closed
...

I don’t have any problems running the app but this ”Debugger failed” message keeps coming out and it just bugs me. This happens even if I create a clean project. Ports 8000 and 9000 are open. System is Java 1.6.0_24, Play 1.2.3, Windows 7 64bit. How could I get rid of this message?

Answer: C. sorry this had nothing to do with play. There seems to be some other connection coming from my router accessing port 8000 and that was the problem. If I change the jpda.port to something other than 8000 the message doesn’t show.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

22. A. Retrieve a list of all packages in a project with Google reflections library? I’m talking about the reflections lib. Is there any possibility to get a list of all packages which are included in the project where I let the code compile? I’ve tried it with the following code bracket but I don’t want to insert a project name.

Reflections reflections = new Reflections(/* Project name here... */);
Set<Class<? extends Object>> allClasses = reflections.getSubTypesOf(Object.class);

Similar Question: B. Can you find all classes in a package using reflection? A beginner question about reflection, I suppose:

Is it possible to find all classes or interfaces in a given package? (Quickly looking at e.g. Package, it would seem like no.)

Answer: C. Due to the dynamic nature of class loaders, this is not possible. Class loaders are not required to tell the VM which classes they can provide; instead they are just handed requests for classes, and have to return a class or throw an exception. However, if you write your own class loaders, or examine the classpaths and their jars, it’s possible to find this information. This will be via filesystem operations though, and not reflection. There might even be libraries that can help you do this. If there are classes that get generated, or delivered remotely, you will not be able to discover those classes. The normal method is instead to somewhere register the classes you need access to in a file, or reference them in a different class. Or just use convention when it comes to naming. Addendum: The Reflections Library will allow you to look up classes in the current classpath. It can be used to get all classes in a package:

Reflections reflections = new Reflections("my.project.prefix");

Set<Class<? extends Object>> allClasses = reflections.getSubTypesOf(Object.class);
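For reference, a minimal sketch of the filesystem approach described in the answer, assuming the classes live in a plain directory on the classpath rather than inside a jar; the package name is illustrative:

import java.io.File;
import java.net.URL;

public class PackageScan {
    public static void main(String[] args) throws Exception {
        String pkg = "my.project.prefix";      // hypothetical package
        String path = pkg.replace('.', '/');
        URL url = Thread.currentThread().getContextClassLoader().getResource(path);
        if (url != null) {
            // List .class files in the package directory; jars would need
            // a JarFile-based walk instead.
            File[] files = new File(url.toURI()).listFiles();
            if (files != null) {
                for (File f : files) {
                    if (f.getName().endsWith(".class")) {
                        System.out.println(pkg + "." + f.getName().replace(".class", ""));
                    }
                }
            }
        }
    }
}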

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

23. A. Choose Gradle JDK in IntelliJ My system JAVA_HOME and PATH are pointing to JDK 7, but I want to use JDK 8 in a project opened in IntelliJ. I changed settings in Project Structure and it works great in the IDE, but unfortunately a Gradle build run from the IDE still uses JDK 7. How can I specify the Gradle JDK in IntelliJ 13.0?

Similar Question: B. How do I tell IntelliJ to start gradle with Java 1.6 JDK? A simple question I couldn’t figure out even after hours of trying and searching: Appendix A Experiment Questionnaire 211

I have both Java 6 and 7 installed. How do I tell IntelliJ to start Gradle builds with a JDK version 1.6? No matter what I do, IntelliJ keeps starting my gradle with "c:\program files\java\jdk1.7.0_10\jre\bin\java" -Dgradle.home=C:\CODING\gradle-1.4 [...] I tried:

all project modules are set to use 1.6
JAVA_HOME is set to C:\Program Files\Java\jdk1.6.0_38
setting IDEA_JDK did nothing
gradle -v recognizes 1.6 as its JDK on command line
restarting IntelliJ and/or computer didn’t change the behavior

Is there a setting somewhere to avoid Java 1.7 from being invoked?

Answer: C. The ’jre choose’ algorithm is defined here; namely, it tries the following:

Java location referenced at the ’gradle.java.home’ system property (if any);
the ’JAVA_HOME’ env variable;
the project jdk;
any jdk configured at the ide.

I suspect you have ’gradle.java.home’ which points to java 1.7 left at $IDEA_HOME/bin/idea.properties

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

24. A. Serve Images from a separate location in JSF1.1 We have a lot of images which all exist in our webroot. Is there a setting in JSF 1.1 which allows us to set the location of the image path?

Similar Question: B. Load images from outside of webapps / webcontext / deploy folder using h:graphicImage or img tag I need to display images which reside outside of the deploy folder in a web application using the JSF h:graphicImage tag or HTML img tag. How can I achieve that?

Answer: C. Several ways: If you have full control over the images folder, then just drop the folder with all images, e.g. /images, directly in the servletcontainer’s deploy folder, such as the /webapps folder in case of Tomcat and /domains/domain1/applications in case of Glassfish. No further configuration is necessary. Or add a new <Context> to the servletcontainer with a docBase which points to the absolute disk file system location of the folder with those images. How to do that depends on the container used. In case of for example Tomcat, that’ll be the following new entry in /conf/server.xml:

docBase="/path/to/images" path="/images" /> In case of for example Glassfish, that’ll be the following entry in glassfish-web.xml:

name="alternatedocroot\_1" value="from=/images/* dir=/path/to" /> Create a Servlet which streams the image from disk to response:

@WebServlet("/images/*") public class ImageServlet extends HttpServlet {

protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { String filename = request.getPathInfo().substring(1); File file = new File("/path/to/images", filename); response.setHeader("Content-Type", getServletContext().getMimetype(filename)); response.setHeader("Content-Length", String.valueOf(file.length())); response.setHeader("Content-Disposition", "inline; filename="" + filename + """); Files.copy(file.toPath(), response.getOutputStream()); }

}

For the first way and the Tomcat approach in the second way, the images will be available by http://example.com/images/filename.ext and thus referencable in plain HTML as follows: <img src="/images/filename.ext" />. For the GlassFish approach in the second way and the third way, the images will be available by http://example.com/context/images/filename.ext and thus referencable in plain HTML as follows: <img src="#{request.contextPath}/images/filename.ext" />, or in JSF as follows (the context path is automatically prepended): <h:graphicImage value="/images/filename.ext" />. See also: recommended way to save files uploaded to a Tomcat servlet; Simplest way to serve static data from outside the application server in a Java web application

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

25. A. Avoid direct links to login page I’m using spring security including the remember-me feature. Now when i check the remember-me checkbox, i can close the browser and go to some app pages, skipping the login page. But i can still type myApp/login.jsp and go to the login page. Is there any way to avoid it? I want to avoid any direct links to the login page. The user should be able to see the login screen only if he hasn’t logged in yet, hasn’t pressed the ”remember-me” button (and closed the browser), or pressed the logout button of my app.

Similar Question: B. Spring Security - Redirect if already logged in I’m new to Spring: I do not want an authenticated user to access the login page. What is the proper way to handle redirects for ’/login’ if the user is already authenticated? Say, I want to redirect to ’/index’ if already logged in. I have tried ’isAnonymous()’ on the login page, but it redirects to the access denied page.

auto-config="true" use-expressions="true" ...> login-processing-url="/resources/j_spring_security_check" default-target-url="/index" login-page="/login" authentication-failure-url="/login?login_error=t" /> logout-url="/resources/j_spring_security_logout" /> ... pattern="/login" access="permitAll" /> pattern="/**" access="isAuthenticated()" />

Answer: C. In the controller function of your login page, check if a user is logged in, then forward him to the index page in that case. Relevant code:

Authentication auth = SecurityContextHolder.getContext().getAuthentication();

if (!(auth instanceof AnonymousAuthenticationToken)) {
    /* The user is logged in :) */
    return new ModelAndView("forward:/index");
}

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

26. A. How can I convert timezone of one City to another? I want to convert the timezone of one city to another, using Joda DateTime. The process is straightforward, but I have a problem. I am referring to the Joda database and I am unable to locate specific cities; the database has only a few mappings, say for e.g. Asia/Kolkata. If I want to search the timezone for another city, like Asia/New_Delhi, which is in Asia or more specifically India, I cannot do that. It is most probable that cities within the same country have the same time zone, but it is not always true, so I am stuck in understanding how to map a city to its country and vice versa. There is no mapping found for cities other than what is specified in the database. Is there any way, other than or including Joda DateTime, which I can use to convert by directly inputting the city name in Olson format? I have seen many websites 1, 2 which do what I want; please help me understand what I am missing here.

Similar Question: B. Get timezone from a country Can anyone tell me how to get a timezone name from a city and country name? Any webservice link will also suffice my need.. For eg, Input: Bangalore, India. Output is: IST. Thanks

Answer: C. http://www.timeanddate.com/worldclock/search.html Enter just the city name and it will give you all kinds of information, including the time zone.
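For reference, a minimal sketch of the Joda-Time conversion the asker describes, using two Olson IDs that do exist in the tz database:

import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class ZoneConvert {
    public static void main(String[] args) {
        // New Delhi has no tz entry of its own; India is covered by Asia/Kolkata.
        DateTime inKolkata = new DateTime(DateTimeZone.forID("Asia/Kolkata"));
        DateTime inLondon = inKolkata.withZone(DateTimeZone.forID("Europe/London"));
        System.out.println(inKolkata + " -> " + inLondon);
    }
}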

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

27. A. install java plugin inside HTML element in ActionScript3 i have tried to open a link inside an HTML element in AS3, but there is nothing. in chrome, mozilla it will work and open the link because it needs a java plugin installed. can i add a java plugin in my Adobe Air to use it in the HTML element and open the link? the link is Click Here

Similar Question: B. Can Flash SWF communicate with Java applet, and vice versa, in any way? I know Flash can use ExternalInterface to call Javascript functions but I don’t know if it can call an applet also like that. Maybe it can be done by SWF -> JS -> Applet, and back, I don’t know. Since I have no idea how to do applets, I would also appreciate at least some pseudo code. Thanks.

Answer: C. Applets can communicate with JS, and JS can control applets. See these examples of Java/JavaScript interaction.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

28. A. Is it possible to configure a custom error message per converter? Is it possible to configure a custom error message per converter? For example I have this in my xwork-conversion.properties: java.util.Date=mx.com.afirme.midas2.converter.DateConverter

Whenever a Date conversion fails in any action I’d like to show a message like this: Incorrect format, expected mm/dd/yyyy. I don’t want to define a custom message per property as mentioned in the documentation: However, sometimes you may wish to override this message on a per-field basis. You can do this by adding an i18n key associated with just your action (Action.properties) using the pattern invalid.fieldvalue.xxx, where xxx is the field name.

Similar Question: B. Struts2 Customizing error message I am using struts2 in my application and trying to display an error message using ”s:actionerror/”. It displays fine but a dot (.) also appears with the error message, which looks ugly and is displayed like a list. Is there any way to customize the error message in struts2? Thanks in advance.

Answer: C. Another solution is to override the template for the default actionError out- put. You can find the default templates in the struts2 core jar. If you pull out template.simple/actionerror.ftl, you can customize that. You can either come up with your own template and reference it in the s:actionerror tag with the template attribute, or you can keep the same name and put it in /template/simple and it will be used as the default. Most of the templates are in freemarker, although there are still some of them in velocity depending on your struts2 version. Both are pretty easy to work with templating engines.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

29. A. JTable Customized Row I would like some input on how to implement the following row rendering. I don’t need actual code but just the concept. Row 1 is Yellow and Row 2 is White. My bean has Order Date, Req Ship Date, ... and Product as properties.

Similar Question: B. Change only one cell’s color in JTable I know I can set the whole column’s background color with this code, but how can I set a different color for each cell? I have a table with two columns and one to one thousand rows.

words.getColumn("columnNameHere").setCellRenderer(new DefaultTableCellRenderer() {
    @Override
    public Component getTableCellRendererComponent(JTable table, Object value,
            boolean isSelected, boolean hasFocus, int row, int column) {
        setText(value.toString());
        setBackground(Color.RED);
        return this;
    }
});

Answer: C. The row and column number are passed into getTableCellRendererComponent. So you could do something like:

public Component getTableCellRendererComponent(JTable table, Object value,
        boolean isSelected, boolean hasFocus, int row, int column) {
    setText(value.toString());
    if (row == 12 && column == 2) {
        setBackground(Color.RED);
    }
    return this;
}

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

30. A. local storage list implementation Is there an implementation of List backed by local storage instead of resident in main memory in java? Basically my application has to store a huge amount of data and i don’t want to use the memory resident list implementations.

Similar Question: B. Are there any decent on-disk implementations of Java’s Map? I’m looking for an on-disk implementation of java.util.Map. Nothing too fancy, just something that I can point at a directory or file and have it store its contents there, in some way it chooses. Does anyone know of such a thing?

Answer: C. You could use a simple EHCache implementation? The nice thing about EHCache being that it can be very simple to implement :-) I take it you’ve ruled out serialising / deserialising an actual Map instance?
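For reference, a minimal sketch of the serialisation option mentioned at the end of the answer, using only JDK classes; the class name, file handling and value types are illustrative:

import java.io.*;
import java.util.HashMap;

public class MapStore {
    // Write the whole map to disk; keys and values must be Serializable.
    static void save(HashMap<String, String> map, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(map);
        }
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (HashMap<String, String>) in.readObject();
        }
    }
}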

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

31. A. Is it possible to configure a custom error message per converter? Is it possible to configure a custom error message per converter? For example I have this in my xwork-conversion.properties: java.util.Date=mx.com.afirme.midas2.converter.DateConverter Whenever a Date conversion fails in any action I’d like to show a message like this: Incorrect format, expected mm/dd/yyyy. I don’t want to define a custom message per property as mentioned in the documentation: However, sometimes you may wish to override this message on a per-field basis. You can do this by adding an i18n key associated with just your action (Action.properties) using the pattern invalid.fieldvalue.xxx, where xxx is the field name.

Similar Question: B. remove conversion validation message in struts 2 or make it general Appendix A Experiment Questionnaire 219

In my struts2 application I have a field named carrierNo that accepts an integer; when i put a string in it, it gives me this validation error message: *Invalid field value for field ”carrierNo”.* i can customize this error message in the properties file like this: invalid.fieldvalue.carrierNo=this field does not accept characters but i don’t want to write a customized message for every non String field in my web application, i want to make it general. i tried the following but it did not work: invalid.fieldvalue.%getText(fieldName)=this field does not accept characters if there is no way to make it general, please help me disable this message at all. then i will use a conversion field validator with a single message that i define in the properties file. so my request is to help me make this invalid.fieldvalue.carrierNo general, something like this form invalid.fieldvalue.%getText(fieldName), or disable the display of this error message Invalid field value for field ”carrierNo”.

Answer: C. You could create your own implementation of ConversionErrorInterceptor which finds out the class of the failed field and gets your custom message. Edit: See the source code for ConversionErrorInterceptor. For example you could do something like this in your custom interceptor inside the intercept method:

// get field by name from action
Field f = invocation.getAction().getClass().getDeclaredField(propertyName);
// get type of field
Class clz = f.getType();

String message = LocalizedTextUtil.findDefaultText(
    XWorkMessages.DEFAULT_INVALID_FIELDVALUE + "." + clz,
    invocationContext.getLocale());

And in your messages.properties file put xwork.default.invalid.fieldvalue.int, xwork.default.invalid.fieldvalue.float, etc.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

32. A. How to create an installer out of an installer in Java Environment i must create an exe with another exe. While the first exe has some restrictions, the second exe follows those restrictions. i use Innosetup and Launch4j for creating the first exe and i don’t know how to develop an exe while i am executing another exe; is it possible in a java environment?

Similar Question: B. How to create an installer out of an installer? I have an old legacy application around for which I only have the installer. It doesn’t do anything more than uncompress and register itself and its libraries. As i don’t have the source-code, it gives me a lot of headaches in maintenance problems. On some particular computers (Acer Aspire One with Windows) it just does not run. I would like to extract the files and re-create this installer with NSIS. Is this possible or am I nuts? The original installer has been created with Ghost Installer Studio.

Answer: C. One option is to find a machine that it does run on, and then install some ”install guard” software (often called things like Acme Uninstaller). Use this to track what actually gets installed and then copy the relevant files and write your own. It looks like a lot of hassle to me, and you may be on thin ice with regards to the licence.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

33. A. customizing the MyLeadingMarginSpan2 class to allow right sided images I’ve figured out how to wrap text around an image using the MyLeadingMarginSpan2 class but it only works with the image on the left side of the text body. I know that the dir variable in drawLeadingMargin can change that, i just don’t know when or where to set it.

Similar Question:

B. How to layout text to flow around an image Can you please tell me if there is a way to lay out text around an image? Like this:

------   text text text
|    |   text text text
------   text text text
text text text text text
text text text text text

I have gotten a response from an android developer about this question. But I am not sure what he means by doing my own version of TextView? Thanks for any tips. On Mon, Feb 8, 2010 at 11:05 PM, Romain Guy wrote: Hi, This is not possible using only the supplied widgets and layouts. You could write your own version of TextView to do this, it shouldn’t be hard.

Answer: C. I can offer a more comfortable constructor for the MyLeadingMarginSpan2 class:

MyLeadingMarginSpan2(Context cc, int textSize, int height, int width) {
    int pixelsInLine = (int) (textSize * cc.getResources().getDisplayMetrics().scaledDensity);
    if (pixelsInLine > 0 && height > 0) {
        this.lines = height / pixelsInLine;
    } else {
        this.lines = 0;
    }
    this.margin = width;
}

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

34. A. screenshot on javascript and java I am using the google visualization api for my java based website. I am planning to capture screenshots of my clients’ webpage having images generated by the Google Visualization API, and i want to prepare a PPT using those screenshots.

is there any way to save these visualizations on my server so that i can use them directly in my application to generate the ppt file? Suggestions please?

Similar Question: B. Take a screenshot of a webpage with JavaScript? Is it possible to take a screenshot of a webpage with JavaScript and then submit that back to the server? I’m not so concerned with browser security issues. etc. as the implementation would be for HTA. But is it possible?

Answer: C. I have done this for an HTA by using an ActiveX control. It was pretty easy to build the control in VB6 to take the screenshot. I had to use the keybd_event API call because SendKeys can’t do PrintScreen. Here’s the code for that:

Declare Sub keybd_event Lib "user32" _
    (ByVal bVk As Byte, ByVal bScan As Byte, ByVal dwFlags As Long, ByVal dwExtraInfo As Long)

Public Const CaptWindow = 2

Public Sub ScreenGrab()
    keybd_event &H12, 0, 0, 0
    keybd_event &H2C, CaptWindow, 0, 0
    keybd_event &H2C, CaptWindow, &H2, 0
    keybd_event &H12, 0, &H2, 0
End Sub

That only gets you as far as getting the window to the clipboard. Another option, if the window you want a screenshot of is an HTA would be to just use an XMLHTTPRequest to send the DOM nodes to the server, then create the screenshots server-side.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

35. A. Modify the next and previous times in a TimePickerDialog? I’ve successfully implemented a TimePickerDialog that only allows the user to select a minute value that is an increment of 15, e.g. 00, 15, 30, 45. However, the ”grayed out” next and previous times still show +/- 1 minute, which is unintuitive from a user perspective. Is there any way to modify the next and previous times programmatically? I have searched through the related APIs and attempted to search the view tree in a debugger, but I cannot find a way to do so.

[Illustration of the unintuitive TimePickerDialog omitted.] To illustrate the problem, here is the TimePickerDialog with 7:30 PM selected. When the user clicks 31, the TimePicker will actually select 7:45. How can I update the gray ”next” value to show 45 instead of 31?

Similar Question: B. Android - TimePicker minutes to 15 Can I control the Android TimePicker view to just show the minutes at a 15 minute interval? Meaning if it’s 12:28 now, show 12:30, and clicking the + and - buttons will increment and decrement by 15? Thank you, Tee

Answer: C. Here’s my version where you can set the interval:

private static final int TIME_PICKER_INTERVAL = 15;
private boolean mIgnoreEvent = false;

private TimePicker.OnTimeChangedListener mTimePickerListener =
        new TimePicker.OnTimeChangedListener() {
    public void onTimeChanged(TimePicker timePicker, int hourOfDay, int minute) {
        if (mIgnoreEvent)
            return;
        if (minute % TIME_PICKER_INTERVAL != 0) {
            int minuteFloor = minute - (minute % TIME_PICKER_INTERVAL);
            minute = minuteFloor + (minute == minuteFloor + 1 ? TIME_PICKER_INTERVAL : 0);
            if (minute == 60)
                minute = 0;
            mIgnoreEvent = true;
            timePicker.setCurrentMinute(minute);
            mIgnoreEvent = false;
        }
    }
};

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

36. A. Wireless networks detector on Java SE Hi, can you advise me some materials about creating software for PC which can detect wireless networks, in Java?

Similar Question: B. Finding SSID of a wireless network with Java We’re doing a project coded in Java (compiled for JRE 1.6) and need some help with a little but apparently complicated feature: We want to do a certain action when a specific wireless network is connected, e.g. when the connected SSID==”myNetworkAtHome” or similar. After looking through this site, google and the Java documentation we have come a little closer. After looking at the code here: http://download.oracle.com/javase/tutorial/networking/nifs/retrieving.html It seems we were getting close but it hits a dead end; all the interfaces seem to be connected to ”net0” through ”net13” (on my laptop that is.) And we’re unable to get the SSID out of any interface at all. I do realise the code in the example is

only giving the interface names and not connected networks, but it doesn’t seem to offer a way of fetching the connected network information. Any help on this would be extremely helpful! Regards Martin NJ

Answer: C. You can’t access these low-level details of the network in Java. You can get some details of the network interface with the NetworkInterface class, but if you look at the provided methods, none is related to Wifi networks, nor is any way to get the SSID provided. As pointed out below, you should use some native functionality, through calling a native library with JNI or by calling an OS tool with Runtime. Java is not designed to do that kind of thing; it is hard to implement in a platform-independent way, and any hardware-level detail can not be managed in Java by principle. The same applies to other networks like 3G, GPRS... the application should not be aware of the connection type nor its details. Java can only manage things at the Transport (TCP) level, not the network (IP) nor Link (3G, Wifi, Ethernet...) levels, so you can only manage sockets.
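For reference, a minimal sketch of the Runtime approach the answer mentions, assuming a Windows machine where the netsh tool is available; the command and the output parsing are illustrative:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SsidProbe {
    public static void main(String[] args) throws Exception {
        // Ask the OS tool for wireless info; Java itself cannot see the SSID.
        Process p = Runtime.getRuntime().exec(
                new String[] {"netsh", "wlan", "show", "interfaces"});
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.trim().startsWith("SSID")) {
                    System.out.println(line.trim());
                }
            }
        }
    }
}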

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

37. A. How do i pause MediaPlayer when the microphone is on? I noticed that if I start the Google voice search my music player doesn’t pause. How do I refer to the state when the mic is on, so i can pause the MediaPlayer and start it again?

Similar Question: B. Detecting whether a headset is plugged into an Android device or not. How can I determine whether a headset is plugged into an Android device or not?

Answer: C. When you say ”headset”, do you mean ”wired headset”? If so, there’s an intent to detect whether or not one is being plugged or unplugged: ACTION_HEADSET_PLUG.

To check the status, you can use AudioManager.isWiredHeadsetOn(), although that may return false if there is also a bluetooth headset, and audio is routed to that instead.
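For reference, a minimal sketch of the intent-based detection the answer describes; the MediaPlayer field and the Activity name are hypothetical stand-ins for the asker’s player:

import android.app.Activity;
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.content.IntentFilter;
import android.media.MediaPlayer;

public class HeadsetAwareActivity extends Activity {
    private MediaPlayer mediaPlayer; // hypothetical player field

    @Override
    protected void onResume() {
        super.onResume();
        // ACTION_HEADSET_PLUG is sticky, so the receiver also fires once
        // with the current state as soon as it is registered.
        registerReceiver(new BroadcastReceiver() {
            @Override
            public void onReceive(Context c, Intent i) {
                int state = i.getIntExtra("state", -1); // 0 = unplugged, 1 = plugged
                if (state == 0 && mediaPlayer != null && mediaPlayer.isPlaying()) {
                    mediaPlayer.pause();
                }
            }
        }, new IntentFilter(Intent.ACTION_HEADSET_PLUG));
    }
}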

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

38. A. Make a table appear automatically in an Eclipse plugin How to make a table appear automatically in an Eclipse plugin? I am currently developing an eclipse plugin and I created this option in the plugin that provides me a table on the ”view”. However, every time when I want to see the table, I have to do this process: Window -> Show View -> Other -> My Table. Doing this process is not user friendly. Is there a way to make this table appear automatically when I click my option button, and refresh it every time when I click it?

Similar Question: B. Programmatically showing a View from an Eclipse Plug-in I have a plug-in to an Eclipse RCP application that has a view. After an event occurs in the RCP application, the plug-in is instantiated, its methods are called to populate the plug-in’s model, but I cannot find how to make the view appear without going to the ”Show View...” menu. I would think that there would be something in the workbench singleton that could handle this, but I have not found out how anywhere.

Answer: C. You are probably looking for this:

PlatformUI.getWorkbench().getActiveWorkbenchWindow().getActivePage().showView("viewId");

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

39. A. How can I exclude methods from EMMA reports? Is it possible to exclude selected methods from EMMA code coverage reports? I don’t want to have getters, setters, equals methods, etc. in my final EMMA reports. I know that it is impossible to configure this in EMMA, but I’m hoping there is a trick which will allow me to skip those methods. I thought about modifying coverage.em, but I don’t know the format of that file, or how to read it.

Similar Question: B. How to force Emma code coverage report to ignore some methods? Some methods, such as auto-generated getters, setters, equals and toString, are trivial for test. However, if they aren’t added into the testing classes, the code coverage percentage (calculated using Emma) is reduced and may crash our system build. How can I force emma to ignore these methods in the code coverage percentage? Regards, Felipe

Answer: C. From the EMMA FAQ: A feature to allow EMMA users to mark arbitrary methods as excluded from coverage is being considered for future versions. So unfortunately this doesn’t seem to be possible at the moment. There’s an open feature request for this in EMMA’s tracker.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

40. A. Spring Roo how to maintain state between commands I have written a couple of simple add-ons for Spring Roo. How do you maintain state between successive commands?

Similar Question: B. maintain state with spring between requests I am new to spring so sorry if this is a beginners question, but the manual is not clear (at least not for me)

My question is: how do I share state between requests in spring? I can send data from the controller to the view by using a ModelMap, but the data in the ModelMap is not sent back to the next controller by the view. How can I do this with spring? Below is a part of my source code. In the second controller the modelMap doesn’t contain the data I stored in the modelMap in the first controller. How am I supposed to maintain state between controllers in spring? thanks a lot for help.

@RequestMapping(value = "find/something", method = RequestMethod.GET) public String foo(@RequestParam("parent") Parent parent, ModelMap modelMap) { ... modelMap.addAttribute("question_index", 42); modelMap.addAttribute("something", new Something()); modelMap.addAttribute("data", new Data()); return "some/view"; }

<form:form>
... // using Something() and 42
</form:form>

@RequestMapping(value = "bla", method = RequestMethod.POST) public String bla(@ModelAttribute("data") Data data, BindingResult result, ModelMap modelMap) { System.out.println(modelMap); // doesn’t contain question_index, or something }

Answer: C. You can either put the modelMap in the HttpSession, or (preferably for larger applications) use Spring Web Flow, where you can have so-called conversations.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #
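As a companion to the answer in section C: storing model attributes in the HttpSession is what Spring MVC's @SessionAttributes annotation automates, since attributes listed there are kept in the session between requests. The sketch below is illustrative only; the controller class and second view name are assumptions, while the attribute names mirror the question.

import org.springframework.stereotype.Controller;
import org.springframework.ui.ModelMap;
import org.springframework.web.bind.annotation.*;

@Controller
@SessionAttributes({"question_index", "something"}) // kept in the session across requests
public class WizardController {

    @RequestMapping(value = "find/something", method = RequestMethod.GET)
    public String foo(ModelMap modelMap) {
        modelMap.addAttribute("question_index", 42);
        modelMap.addAttribute("something", new Object()); // placeholder for Something()
        return "some/view";
    }

    @RequestMapping(value = "bla", method = RequestMethod.POST)
    public String bla(ModelMap modelMap) {
        // question_index and something are merged back into the model here
        System.out.println(modelMap);
        return "some/other/view"; // hypothetical follow-up view
    }
}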

41. A. How to generate WSDL from SOAP message sample I need to call some Web Services, but there's no WSDL for it. (I'm using Java) For now, I'm building the SOAP message manually using javax.xml.soap classes...

but that's not very clean... What I would like to know is: is there a way to generate a WSDL file based on the content of a SOAP message (sample)? Then I would be able to use CXF... Thanks.

Similar Question: B. Generate WSDL for existing SOAP Service using captured traffic I need to use the SOAP service of a printer. There is a Windows tool to access this service and I used it to generate SOAP requests and responses of the important functions that I need. Now I should write a client for Linux using Python, and I found the easiest way would be to use the suds library with a WSDL file. But I don't have this WSDL! As I investigated the Windows tool (looked at the hexdump of the executables), I came to the conclusion that there probably is no WSDL file at all. Now my question is, has anybody experience with "reverse engineering" SOAP services and knows tools which could be useful for creating WSDL files for existing services? (Googling hasn't brought up anything useful yet.)

Answer: C. You mentioned this is the SOAP service of a printer. Is the printer's API documented on the manufacturer's site? Does the documentation include the WSDL? Can you get the WSDL from the manufacturer? If you can get the WSDL from the manufacturer then you're done! If not, then you have to build the WSDL by yourself, because I doubt you can find a tool that generates WSDLs given SOAP samples (when working with SOAP web services you mainly get two kinds of tools: those that generate code from WSDL and those that generate WSDL from code). It's not hard to create the WSDL if you are familiar with SOAP, WSDL and XSD. You just need a text editor, or maybe even a WSDL editor to speed things up. If you don't have full confidence in your WSDL knowledge, there are still some tools that can get you most of the way to the complete WSDL. Here is a way you could do it:

1 - First you need to create the XML schema for the SOAP payloads. For this you can find tools, even some online. After you have the schema, tweak it to your needs by adding, changing or removing elements.

2 - Now you can use the XSD to generate a WSDL. There is an online tool that does that. It just needs the request/response element types to end with Request/Response. Make sure you read the instructions.

You take your XSD file, change the names of the operations to add the Request/Response suffix and feed it to the WSDL Generator - Web Tool. You will get your WSDL. Now tweak this WSDL as you like (remove the Request/Response suffixes if you don't need them), then...

3 - ... make sure you end up with a valid WSDL.

4 - Now you can take your WSDL and use a tool like SoapUI to generate sample requests and responses from it, just to verify that you get the proper results back. Do the SoapUI messages match the messages you started with? If yes, you are done and can feed the WSDL to suds to create the Linux client. If not, tweak the WSDL until you get the result you are after.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #

42. A. Differences of pragma: no-cache and Cache-Control: no-cache

Similar Question: B. Difference between Pragma and Cache-control headers? I read about the Pragma header on Wikipedia, which says: "The Pragma: no-cache header field is an HTTP/1.0 header intended for use in requests. It is a means for the browser to tell the server and any intermediate caches that it wants a fresh version of the resource, not for the server to tell the browser not to cache the resource. Some user agents do pay attention to this header in responses, but the HTTP/1.1 RFC specifically warns against relying on this behavior." But I haven't understood what it does. What is the difference between the Cache-Control header whose value is no-cache and Pragma whose value is also no-cache?

Answer: C. Pragma is the HTTP/1.0 implementation and cache-control is the HTTP/1.1 implementation of the same concept. They both are meant to prevent the client

from caching the response. Older clients may not support HTTP/1.1, which is why that header is still in use.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #
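The answer in section C is commonly put into practice by sending both headers, so that HTTP/1.1 clients honour Cache-Control while HTTP/1.0 clients fall back to Pragma. A minimal, hedged servlet sketch (the class is hypothetical; the header values shown are one conventional choice):

import java.io.IOException;
import javax.servlet.http.*;

public class NoCacheServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // HTTP/1.1 clients and caches honour Cache-Control.
        resp.setHeader("Cache-Control", "no-cache, no-store, must-revalidate");
        // HTTP/1.0 clients and caches fall back to Pragma.
        resp.setHeader("Pragma", "no-cache");
        resp.setContentType("text/plain");
        resp.getWriter().println("fresh content");
    }
}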

43. A. session sharing on multiple domain but same ip address in tomcat I have two domains: one is example.com and the second is mydomain.com. Both domains are hosted at the same IP address. I have an add-to-cart feature on the website, and the cart is stored in the session. When I store values in the session on example.com and go to mydomain.com, the session is not available on mydomain.com. The problem is that the domain names are different, but both domains load the same content from the same IP. On close inspection I found that the jsessionid generated for the different domains is different. So my question is: is there any way to share the session between the two domains? Would this issue be fixed by making the jsessionid the same for both domains? I have implemented my project in the Struts framework in Java.

Similar Question: B. Cookies - set across multiple domains My company has a setup as follows:
subdomain1.domain1.com
subdomain2.domain1.com
subdomain3.domain1.com
subdomain4.domain1.com
subdomain5.domain1.com
subdomain6.domain1.com
subdomain1.domain2.com
subdomain2.domain2.com
subdomain3.domain2.com
subdomain4.domain2.com
subdomain5.domain2.com
subdomain6.domain2.com
On each site, bearing in mind there can be a hundred sites per subdomain, users can log in. We, as developers, have to test frontends across several browsers, but some work may only be required on a section once logged in.

I have written a userscript which enables us to save a username and password (and other details which I cannot mention because of confidentiality). The script checks to see if the user account exists by filling in the login form and clicking the submit button. If not, it registers for us, thus automating the registration process. Sharing cookies between subdomains on the same domain is easy. If I am on subdomain1.domain1.com I can save a cookie which can be retrieved by subdomain2.domain1.com. However, I would also like to save these for domain2. I do not appear to be able to get this to work. I can see two solutions from here, either: 1) attach an iFrame using the userscript, which loads a site on domain2; this then uses the querystring to decide what to set to what, or 2) use a form with method="POST" and simply post to a file on each domain. Either way will be resource intensive, particularly if the cookies are updated each time a cookie changes. We also have URL masking in place, so we'd also have to take into account sites like abc.clientdomain1.com, abc.clientdomain2.com etc.

Answer: C. Create a common domain specifically for your cookies and use it as a getter/setter API:
http://cookie.domain.com/set/domain1
http://cookie.domain.com/get/domain1
http://cookie.domain.com/set/domain2
http://cookie.domain.com/get/domain2
and so on.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #
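To make the getter/setter idea from section C concrete, the "set" endpoint on the shared cookie domain could look roughly like the sketch below. The class, parameter names and URL layout are assumptions, not part of the original answer.

import java.io.IOException;
import javax.servlet.http.*;

// Illustrative servlet behind http://cookie.domain.com/set/... (hypothetical URL scheme).
public class CookieSetServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String name = req.getParameter("name");   // assumed query parameters
        String value = req.getParameter("value");
        Cookie cookie = new Cookie(name, value);
        cookie.setPath("/");
        cookie.setMaxAge(60 * 60 * 24);           // one day, for illustration
        resp.addCookie(cookie);                   // stored under the shared cookie domain
        resp.setContentType("text/plain");
        resp.getWriter().println("ok");
    }
}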

44. A. Library to manipulate JPEG in java I'm new to Java; I need to manipulate a JPEG image to do steganography. Can you tell me if there is any library to manipulate JPEG images? I mean, is there any function to do a DCT or quantization? Thanks in advance.

Similar Question: B. What is the best java image processing library/approach? I am using both the JAI media APIs and ImageMagick. ImageMagick has some scalability issues, and the JNI-based JMagick isn't attractive either. JAI has poor quality results when doing resizing operations compared to ImageMagick. Does anyone know of any excellent tools, either open source or commercial, that are native Java and deliver high quality results?

Answer: C. There's ImageJ, which boasts to be the world's fastest pure Java image processing program. It can be used as a library in another application. Its architecture is not brilliant, but it does basic image processing tasks.

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #
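Separately from the ImageJ suggestion in section C, the plain JDK javax.imageio API already covers loading, pixel access and re-encoding of JPEGs, as in the hedged sketch below (file names are placeholders). Note that ImageIO exposes decoded pixels rather than the JPEG's DCT coefficients, so the coefficient-level steganography asked about in section A would still need a lower-level codec.

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class JpegAccess {
    public static void main(String[] args) throws IOException {
        // "input.jpg" is a placeholder path.
        BufferedImage img = ImageIO.read(new File("input.jpg"));

        // Example: read and rewrite one pixel's packed RGB value.
        int rgb = img.getRGB(0, 0);
        img.setRGB(0, 0, rgb);

        // Re-encode as JPEG (this recompresses, which matters for steganography).
        ImageIO.write(img, "jpg", new File("output.jpg"));
    }
}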

45. A. Is there any third-party framework or technology to reload an updated (latest) jar dynamically? As the title says, I have a requirement to load the latest jar if the jar has been updated while the application is running. The jar's status is unstable: it can be changed at any time. So guys, could you give me some suggestions about this? Appreciated. The reason why the jar file is unstable is that this project is under programming and debugging. The jar needs to be fed to another running Java application. Thus the interface might be specified, but the implementations are expected to be modified frequently. If we can dynamically reload the updated jar, we can speed up the debug time.

Similar Question: B. Can I dynamically unload and reload (other versions of the same) JAR? I am writing a server program which is used to run unit tests of an API (displaying lots of information and providing web access to control / monitor the whole thing)...

This API is known to the server during compile time and is provided as a JAR. To be able to compare between unit test results of different versions of the API (without restarting the server), I want to be able to unload the 'current' version of the API, and to reload a newer one (or an older one). I don't want to use URLClassLoader and invoke every single method by name (using getDeclaredMethod("someMethod")), because the server heavily depends on the API and it would be complicated to 'wrap' every method call in such a dirty way. I was thinking: since all interfaces of all versions of the JAR are the same, couldn't I do it by somehow reloading another version of the JAR (without that by-name invocation)? Note: I am using the latest Java SE (6) and Java EE (5). If you think what I'm trying to achieve is not possible, please suggest a 'workaround' or a different concept.

Answer: C. OSGi is a framework that will allow you to do it. JSR 277, the Java Module System, is designed for doing that as well (I think). I have not followed the OSGi-vs-JSR 277 debate, so I don't know if they are trying to merge them at all. You can roll your own with class loaders, but it'll be less "fun".

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #
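The "roll your own with class loaders" route from section C usually means creating a fresh URLClassLoader per JAR version and letting the old one be garbage-collected once nothing loaded through it is still referenced; the shared interface must live in the parent loader, with only the implementation inside the reloadable JAR. A hedged sketch, in which the JAR path and implementation class name are placeholders:

import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

public class JarReloader {
    // Loads a fresh copy of a (hypothetical) implementation class from the given JAR.
    public static Object loadImpl(String jarPath, String implClassName) throws Exception {
        URL jarUrl = Paths.get(jarPath).toUri().toURL();
        // A new loader per version; the old loader and its classes become collectable
        // once no instances or Class references from it remain.
        URLClassLoader loader = new URLClassLoader(new URL[] { jarUrl },
                JarReloader.class.getClassLoader());
        Class<?> implClass = loader.loadClass(implClassName);
        return implClass.getDeclaredConstructor().newInstance();
    }
}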

46. A. Java equivalent of Ruby's URI.escape? Today I had to escape a URL (http://rfc3986.openrfc.org/ - for instance, replace the 'space' character with %20) and surprisingly I couldn't find any API in Java which would take an unescaped URL as 1 parameter and return the escaped URL. I am aware of the URI class, which is capable of performing the escaping operation; however, it's required to use the multi-argument constructor (http://download.oracle.com/javase/1.4.2/docs/api/java/net/URI.html#URI(java.lang.String, java.lang.String, java.lang.String, int, java.lang.String, java.lang.String, java.lang.String)); using the single-argument constructor simply doesn't perform escaping but throws an exception.

Luckily, I was already embedding the JRuby library, so I was able to use the URI.escape method (http://www.ruby-doc.org/stdlib/libdoc/uri/rdoc/classes/URI/Escape.html); however, I'm really surprised there is no easy way to achieve the same effect in Java. Am I missing some API?

Similar Question: B. Java - Convert String to valid URI object I am trying to get a java.net.URI object from a String. The string has some characters which will need to be replaced by their percentage escape sequences. But when I use URLEncoder to encode the String with UTF-8 encoding, even the / are replaced with their escape sequences. How can I get a valid encoded URL from a String object? http://www.google.com?q=a b gives http%3A%2F%2Fwww.google.com... whereas I want the output to be http://www.google.com?q=a%20b. Can someone please tell me how to achieve this? I am trying to do this in an Android app, so I have access to a limited number of libraries.

Answer: C. You might try org.apache.commons.httpclient.util.URIUtil.encodeQuery in the Apache commons-httpclient project. Like this (see URIUtil): URIUtil.encodeQuery("http://www.google.com?q=a b") will become: http://www.google.com?q=a%20b. You can of course do it yourself, but URI parsing can get pretty messy...

Q. Please rate how well the similar question and answer (section B and C) provide a solution to the first question (section A).

Very bad 1 2 3 4 5 6 7 8 9 10 Very good # # # # # # # # # #
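Following up on section A's own observation, the multi-argument java.net.URI constructor performs the escaping itself, which gives a standard-library route to the output wanted in section B. A small sketch (the URL mirrors the example in section B; the path "/" is supplied because the constructor takes the parts separately):

import java.net.URI;
import java.net.URISyntaxException;

public class UriEscapeDemo {
    public static void main(String[] args) throws URISyntaxException {
        // scheme, authority, path, query, fragment: each part is escaped as needed.
        URI uri = new URI("http", "www.google.com", "/", "q=a b", null);
        System.out.println(uri.toASCIIString());
        // prints: http://www.google.com/?q=a%20b
    }
}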
