Analysis-Aware Approach to Improving Social Data Quality

UC Irvine Electronic Theses and Dissertations
Title: Analysis-Aware Approach to Improving Social Data Quality
Permalink: https://escholarship.org/uc/item/6k08w8js
Author: Sadri, Mehdi
Publication Date: 2017
Peer reviewed | Thesis/dissertation
eScholarship.org, powered by the California Digital Library, University of California

UNIVERSITY OF CALIFORNIA, IRVINE
Analysis-Aware Approach to Improving Social Data Quality
DISSERTATION
submitted in partial satisfaction of the requirements for the degree of
DOCTOR OF PHILOSOPHY
in Computer Science
by
Mehdi Sadri

Dissertation Committee:
Professor Sharad Mehrotra, Chair
Professor Chen Li
Professor Nalini Venkatasubramanian
Professor Yaming Yu

2017

© 2017 Mehdi Sadri

DEDICATION
To my beloved parents, Monir and Mohammad.

TABLE OF CONTENTS
List of Figures
List of Tables
Acknowledgments
Curriculum Vitae
Abstract of the Dissertation
1 Introduction
2 Preliminaries and Related Work
  2.1 Data Quality
    2.1.1 Social Data Quality
  2.2 Data Acquisition
    2.2.1 Social Data Acquisition
  2.3 Data Cleaning
    2.3.1 Social Data Cleaning
  2.4 Analysis-Aware Approach
3 Social Data Acquisition
  3.1 Introduction
  3.2 Motivating Example
  3.3 Notation and Problem Definition
  3.4 Query Generation
    3.4.1 Probabilistic Query Coverage
    3.4.2 Query Generation
    3.4.3 Statistics Maintenance
    3.4.4 Combinatorial MAB Framework
    3.4.5 Greedy Approximation Bound
    3.4.6 Greedy Algorithm
  3.5 Relevance Check
    3.5.1 Phrase Based Relevance (Rt)
    3.5.2 Clue Relevance (Rc)
    3.5.3 User History (Ru)
  3.6 Topic Maintenance
  3.7 Experimental Evaluation
    3.7.1 Experimental Setup
    3.7.2 Evaluation Criteria
    3.7.3 Experimental Results
  3.8 TAPP (Twitter Follow-up Application)
    3.8.1 System Overview
  3.9 Summary
4 Social Entity Linking
  4.1 Introduction
  4.2 Motivating Example
  4.3 Preliminaries
    4.3.1 Window-based Stream
    4.3.2 Data Cleaning Functionalities
    4.3.3 Entity Blocks
    4.3.4 Mention Probabilities
    4.3.5 Continuous Top-k Query
  4.4 Deterministic Top-k
  4.5 Probabilistic Top-k
    4.5.1 Factor Graph
    4.5.2 Entity Probabilistic Model
    4.5.3 Entity Dominance Graph (EDG)
    4.5.4 Selection Criteria
    4.5.5 Stopping Criteria
    4.5.6 Finding Top-K
    4.5.7 Scalability of EDG
  4.6 Architecture of TkET
    4.6.1 Sliding Window Stream Processing
  4.7 Experimental Evaluation
    4.7.1 Experimental Setup
    4.7.2 Knowledge Base
    4.7.3 Factor Graphs and Dimple
    4.7.4 Synthetic Dataset Generation
    4.7.5 Experimental Results
    4.7.6 Real Tweet Dataset Experimental Results
    4.7.7 Discussion
  4.8 Related Work
    4.8.1 Social Entity Linking
    4.8.2 Top-k Query Answering
  4.9 Summary
5 SoDAS: Social Data Analytics System
  5.1 System Overview
6 Conclusions and Future Work
Bibliography

LIST OF FIGURES
2.1 Common Steps in Data Processing Pipelines
3.1 TAS Architecture
3.2 Phrase Weight
3.3 TAS Iterations
3.4 Phrase Maintenance
3.5 Approximate Relative Recall
3.6 TAS vs. BaseM: Number of Tweets
3.7 TAS over Simulation: Number of Tweets
3.8 TAS vs. BaseM: Approximate Relative Recall
3.9 TAS with Different Phrase Budgets
3.10 TAS with Different Inner Iteration Sizes
3.11 Topic Maintenance Module On vs. Off: Number of Tweets
3.12 TAS: Number of Phrases
3.13 TAS vs. ATM: Topic 75
3.14 TAS vs. ATM: Topic 85
3.15 TAPP Application
4.1 NER and NEL Black Box Interfaces
4.2 Factor Graph for the "Catfish" Entity Block
4.3 Entity Dominance Graph Example
4.4 In-Degree vs. Out-Degree based Stopping Criteria
4.5 Out-Degree based Stopping Criteria Example
4.6 In-Degree based Stopping Criteria Example
4.7 EDG, Steps 1-2 of the TkET Top-2 Algorithm on the Motivating Example
4.8 EDG, Steps 3-5 of the TkET Top-2 Algorithm on the Motivating Example
4.9 EDG, Steps 6-7 of the TkET Top-2 Algorithm on the Motivating Example
4.10 Transitivity of Pairwise Dominance
4.11 Overview of TkET
4.12 Architecture of TkET
4.13 Sliding Window, Stream Processing
4.14 Motivating Example's Identified Entity Blocks
4.15 Selected Synthetic Datasets Block Size Distribution
4.16 SDS4: Latency vs. Parameters (k, th)
4.17 SDS4: Accuracy vs. Parameters (k, th)
5.1 SoDAS General Architecture

LIST OF TABLES
3.1 Example Phrases of an Interest
3.2 Fixed Corpus Topics of Interest
3.3 ARR Calculation, Sample Sizes
3.4 Streaming Topics of Interest
3.5 Streaming Experiment Summary
4.1 Example Raw Tweets
4.2 Selected Dataset Parameters
4.3 Efficiency for Out-Degree based Stopping Criteria
4.4 Efficiency for In-Degree based Stopping Criteria
4.5 Efficiency over the Real Tweet Dataset

ACKNOWLEDGMENTS
I would first like to express my deepest gratitude to my advisor, Prof. Sharad Mehrotra, for his unwavering guidance, support, and encouragement. Prof. Sharad patiently taught me how to identify important new research problems, how to solve them in a principled way, and how to write research papers. I am glad to have had the opportunity to work with him, and I am very grateful for it.

Special thanks are also due to Prof. Yaming Yu and Prof. Charless Fowlkes for their insightful support and suggestions throughout this research, especially on the third and fourth chapters of this thesis. The time and effort they spent with me were instrumental in my progress.

I would also like to extend my appreciation to the members of my doctoral committee, Prof. Chen Li, Prof. Nalini Venkatasubramanian, and Prof. Yaming Yu, for their useful feedback and for finding the time to serve on my committee.

I would like to thank everyone in the ISG group, especially my colleagues in the Data Quality and Privacy Group at UCI: Yasser Altowim, Hotham Altwaijry, Stylianos Doudalis, Kerim Oktay, Jie Xu, Liyan Zhang, Abdulrahman Alsaudi, and Jamshid Esmaelnezhad.

The work reported in this thesis was also supported in part by NSF grants CNS-1527536, CNS-1545071, CNS-1450768, CNS-1059436, and CNS-1118114.

Foremost, I would…
Recommended publications
  • Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
    Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
    Alexandra Olteanu; Carlos Castillo, Eurecat; Fernando Diaz, Microsoft Research; Emre Kıcıman, Microsoft Research
    Social data in digital form, which include user-generated content, expressed or implicit relationships between people, and behavioral traces, are at the core of many popular applications and platforms, and drive the research agenda of many researchers. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, product, celebrity, or other entity, as well as enabling better decision making in a variety of fields including public policy, healthcare, and economics. Many academics and practitioners have warned against the naïve usage of social data. There are biases and inaccuracies at the source of the data, as well as ones introduced during processing. There are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked. This survey recognizes that the rigor with which different researchers address these issues varies widely. We present a framework for identifying a broad range of menaces in the research and practices around social data.
    Additional Key Words and Phrases: Social media, user-generated content, behavioral traces, biases, evaluation
    1. INTRODUCTION
    “For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated.” – Ursula Franklin
    This survey covers a series of concerns about researchers' use of social data for a variety of goals. To set the context, in this section we describe social data and what it is being used for (§1.1), outline general concerns about its usage voiced by academics in the past (§1.2), and give an overview of the remainder of the survey (§1.3).
  • Meta-Information as a Service: A Big Social Data Analysis Framework
    Meta-Information as a Service: A Big Social Data Analysis Framework
    Completed Research Paper, Pacific Asia Conference on Information Systems (PACIS), PACIS 2018 Proceedings, 6-26-2018
    Kashif Ali, Margaret Hamilton, Charles Thevathayan, and Xiuzhen Zhang, School of Science, RMIT University, Melbourne, Victoria, Australia
    Recommended Citation: Ali, Kashif; Hamilton, Margaret; Thevathayan, Charles; and Zhang, Xiuzhen, "Meta-Information as a Service: A Big Social Data Analysis Framework" (2018). PACIS 2018 Proceedings. 99. https://aisel.aisnet.org/pacis2018/99
    Follow this and additional works at: https://aisel.aisnet.org/pacis2018
    This material is brought to you by the Pacific Asia Conference on Information Systems (PACIS) at AIS Electronic Library (AISeL). It has been accepted for inclusion in PACIS 2018 Proceedings by an authorized administrator of AIS Electronic Library (AISeL).
    Abstract
    Social information services generate a large amount of data.
  • Coded Visualization: The Rhetoric and Aesthetics of Data-Based Cultural Interface
    CODED VISUALIZATION: THE RHETORIC AND AESTHETICS OF DATA-BASED CULTURAL INTERFACE
    A Dissertation Presented to The Academic Faculty by Tanyoung Kim in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Literature, Media, and Communication, Georgia Institute of Technology, May 2013
    Copyright 2013 by Tanyoung Kim
    Approved by:
    Dr. Carl DiSalvo, Advisor, School of Literature, Media, and Communication, Georgia Institute of Technology
    Dr. John Stasko, School of Interactive Computing, Georgia Institute of Technology
    Dr. Jay David Bolter, School of Literature, Media, and Communication, Georgia Institute of Technology
    Dr. Johanna Drucker, Graduate School of Education and Information Studies, University of California, Los Angeles
    Dr. Ian Bogost, School of Literature, Media, and Communication, Georgia Institute of Technology
    Date Approved: April 2, 2013
    ACKNOWLEDGEMENTS
    It is difficult to explain how much gratitude and thankfulness I feel for the support of these people. First of all, I especially thank my thesis readers. Without my advisor Carl DiSalvo's guidance, encouragement, critiques, and patience, I would never have finished this dissertation. John Stasko has also supported me endlessly since I first took his class, which opened my eyes to visualization. Jay Bolter and Ian Bogost, the great digital media scholars, have guided me not to miss critical points in my dissertation. I am grateful that Johanna Drucker, who wrote the most significant books for my research, served as one of my committee members. I also thank the other fantastic faculty members in the Digital Media Program. Brian Magerko supported me in successfully carrying out my first research in Atlanta.
  • Twitter Strategies and Trends: an Analysis of the Smartphone Industry
    Twitter Strategies and Trends: An Analysis of the Smartphone Industry
    Santeri Nupponen & Øyvind Hegstad
    Master's Thesis, MSc in Business Administration and E-Business
    Date of Submission: 09/03/2020
    Supervisor: Niels Buss Lassen
    Number of characters: 140 272; Number of pages: 87
    Student ID Santeri Nupponen: 121729; Student ID Øyvind Hegstad: 116568
    Table of Contents
    1.0 Abstract
    2.0 Forewords
    3.0 Introduction
    4.0 Objective
    5.0 Aim
    6.0 Scope
    7.0 Delimitation
  • TV Ratings Vs. Social Media Engagement
    TV Ratings vs. Social Media Engagement: Big Social Data Analytics of the Scandinavian TV Talk Show Skavlan
    Larsen, Henrikke Hovda; Forsberg, Johanna Margareta; Hemstad, Sigrid Viken; Mukkamala, Raghava Rao; Hussain, Abid; Vatrapu, Ravi
    Document Version: Final published version
    Published in: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
    Publication date: 2016
    License: CC BY-NC-ND
    Citation for published version (APA): Larsen, H. H., Forsberg, J. M., Hemstad, S. V., Mukkamala, R. R., Hussain, A., & Vatrapu, R. (2016). TV Ratings vs. Social Media Engagement: Big Social Data Analytics of the Scandinavian TV Talk Show Skavlan. In R. Ak, G. Karypis, Y. Xia, X. T. Hu, P. S. Yu, J. Joshi, L. Ungar, L. Liu, A-H. Sato, T. Suzumura, S. Rachuri, R. Govindaraju, & W. Xu (Eds.), Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 3849-3858). [7841058] IEEE.
    General rights: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
    Take down policy: If you believe that this document breaches copyright, please contact us, providing details, and we will remove access to the work immediately and investigate your claim.
    Download date: 23 Sep. 2021
  • Chapter 1 Social Big Data: an Overview and Applications
    Chapter 1. Social Big Data: An Overview and Applications
    Bilal Abu-Salih¹, Pornpit Wongthongtham², Dengya Zhu³, Kit Yan Chan³, Amit Rudra³
    ¹The University of Jordan; ²The University of Western Australia; ³Curtin University
    Abstract: The emergence of online social media services has made a qualitative leap and brought profound changes to various aspects of human, cultural, intellectual, and social life. These significant Big Data tributaries have further transformed business processes by establishing convergent and transparent dialogues between businesses and their customers. Therefore, analysing the flow of social data content is necessary to enhance business practices, augment brand awareness, develop insights on target markets, detect and identify positive and negative customer sentiments, etc., thereby achieving the hoped-for added value. This chapter presents an overview of the term Social Big Data and its definition. It also lays the foundation for several applications and analytics that are broadly discussed in this book.
    Keywords: Social Big Data; Social Credibility; Domain Knowledge; Sentiment Analysis; Affective Design; Predictive Analytics
    1.1 Introduction
    Social media services, positioned on the throne of cyberspace, in their broad sense cover an ample set of freely accessible electronic platforms that are built to encourage and simplify communication between people with similar interests by enabling interactive conversations and the exchange of information regardless of physical location. These virtual platforms continue to spread exponentially by providing social communication services to their affiliated members. The services offered by these sites have expanded, providing their consumers with extensive possibilities for exchanging information in the fields of education, health, culture, sports, and other domains of knowledge [1, 2].
  • Over a Decade of Social Opinion Mining: a Systematic Review
    Over a Decade of Social Opinion Mining: A Systematic Review
    Keith Cortis, ADAPT Centre, Dublin City University, Ireland
    Brian Davis, ADAPT Centre, Dublin City University, Ireland
    arXiv:2012.03091v2 [cs.CL], 6 Jul 2021; preprint submitted to Artificial Intelligence Review, July 8, 2021
    Abstract
    The popularity and importance of social media are increasing as people use it for various types of social interaction across multiple channels. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm, and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video, and audio. Through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research, covering 485 published studies and spanning a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, and other aspects derived. Social Opinion Mining can be utilised in many application areas, such as marketing, advertising, and sales for product/service management, and in multiple domains and industries, such as politics, technology, finance, healthcare, sports, and government.
  • Computational Perspectives on Large-Scale Social Networks
    COMPUTATIONAL PERSPECTIVES ON LARGE-SCALE SOCIAL NETWORKS
    A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, by Johan Holke Olof Ugander, August 2014
    © 2014 Johan Holke Olof Ugander. ALL RIGHTS RESERVED
    Johan Holke Olof Ugander, Ph.D., Cornell University 2014
    This thesis investigates both how computational perspectives can improve our understanding of social networks and how modern insights about social networks can be put to work to address difficult computational and inferential challenges across systems engineering and the social sciences. The microstructure of human behavior has a rich history of study across many disciplines, and only recently, through the data deluge of online instrumentation and experimentation, has the role networks play across social and economic domains come into full view. Work in this thesis examines how social network neighborhoods, the rich local networks that surround individuals, function as contact surfaces through which individuals process information, mediating social decision and social contagion processes. Work in this thesis on distributing graph computations at Facebook, the online social networking service, has led to dramatic efficiency gains there: a new partitioning algorithm was successfully deployed that reduced average query times for Facebook's "People You May Know" link prediction system by 50%. These improvements were achieved by harnessing both geographic and network structures of social graphs not necessarily found in other graph contexts. Additional work presents a highly scalable "restreaming" approach to partitioning massive graphs with rich local structure.
  • Data Visualization in Society, Edited by Martin Engebretsen and Helen Kennedy
    Data Visualization in Society
    Edited by Martin Engebretsen and Helen Kennedy
    Amsterdam University Press
    The publication of this book is made possible by a grant from the Research Council of Norway (grant number 25936).
    Cover illustration: Prisca Schmarsow of Eyedea Studio
    Cover design: Coördesign, Leiden
    Lay-out: Crius Group, Hulshout
    ISBN 978 94 6372 290 2
    e-ISBN 978 90 4854 313 7
    DOI 10.5117/9789463722902
    NUR 811
    Creative Commons License CC BY NC ND (http://creativecommons.org/licenses/by-nc-nd/3.0)
    All authors / Amsterdam University Press B.V., Amsterdam 2020
    Some rights reserved. Without limiting the rights under copyright reserved above, any part of this book may be reproduced, stored in or introduced into a retrieval system, or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording or otherwise).
    Every effort has been made to obtain permission to use all copyrighted illustrations reproduced in this book. Nonetheless, whosoever believes to have rights to this material is advised to contact the publisher.
    Table of Contents
    List of tables
    List of figures
    Acknowledgements
    Foreword: The dawn of a philosophy of visualization (Alberto Cairo, Knight Chair at the University of Miami and author of How Charts Lie)
    1. Introduction: The relationships between graphs, charts, maps and meanings, feelings, engagements (Helen Kennedy and Martin Engebretsen)
    Section I: Framing data visualization
    2. Ways of knowing with data visualizations (Jill Walker Rettberg)
    3. …
  • Exploring the Value of Big Data Analysis of Twitter Tweets and Share Prices
    Exploring the Value of Big Data Analysis of Twitter Tweets and Share Prices
    A thesis submitted by Peter Wlodarczak, B.Sc. Computer Science, MBA, for the award of Doctor of Philosophy, 2017
    Abstract
    Over the past decade, the use of social media (SM) such as Facebook, Twitter, Pinterest, and Tumblr has dramatically increased. Using SM, millions of users create large amounts of data every day. According to some estimates, ninety per cent of the content on the Internet is now user generated. SM can be seen as a distributed content creation and sharing platform based on Web 2.0 technologies. SM sites make it very easy for their users to publish text, pictures, links, messages, or videos without needing to know how to program. Users post reviews of products and services they bought, write about their interests and intentions, or give their opinions and views on political subjects. SM has also been a key factor in mass movements such as the Arab Spring and the Occupy Wall Street protests, and is used for humanitarian aid and disaster relief (HADR). There is growing interest in SM analysis from organisations for detecting new trends, getting user opinions on their products and services, or finding out about their online reputation. Companies such as Amazon or eBay use SM data for their recommendation engines and to generate more business. TV stations buy data about opinions on their TV programs from Facebook to find out how popular a certain TV show is. Companies such as Topsy, Gnip, DataSift, and Zoomph have built their entire business models around SM analysis.
  • Information Revolution from Data to Policy Action in Low-Income Countries: How Can Innovation Help?
    Information Revolution From Data to Policy Action in Low-Income Countries: How Can Innovation Help?
    Matjaž Jug, PARIS21 (Partnership in Statistics for Development in the 21st Century), Discussion Paper No. 3, April 2014
    Disclaimer: The opinions expressed in this paper are those of the author and should not be attributed to the PARIS21 partnership or its members.
    Abstract
    While the volume and diversity of data available to researchers, policy analysts, and decision-makers is growing, it is clear that traditional methods and approaches are not enough to realize the potential of these new data sources. The environment in low-income countries poses additional challenges to the ways data are collected, analysed, disseminated, and used. This paper investigates technological trends, enabling initiatives, and opportunities that can significantly improve the collection, analysis, and dissemination of relevant, accurate, and timely information for policy action in low-income countries. Examples of innovative approaches in emerging areas like mobile data collection, big data, visualisation, and data dissemination are presented and investigated in terms of their potential for improved policy action.
    Introduction
    The Report of the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda has identified how new goals and measurable targets could be framed in the wake of transformative shifts, calling for a data revolution for sustainable development, with a new international initiative to improve the quality of statistics and information available to citizens: “We should actively take advantage of new technology, crowd sourcing, and improved connectivity to empower people with information on the progress towards the targets.”
    Information needs and challenges in low-income countries
    Information needs underpin policy action in our global, diverse, and fast-changing environment in many different ways.
  • Towards a Formal Model of Social Data
    Towards a Formal Model of Social Data
    Raghava Rao Mukkamala¹, Abid Hussain², Ravi Vatrapu²,³
    ¹IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen, Denmark (http://www.itu.dk)
    ²Copenhagen Business School, Howitzvej 60, 2000 Frederiksberg, Denmark (http://cssl.cbs.dk)
    ³Norwegian School of Information Technology (NITH), Norway
    IT University Technical Report Series TR-2013-169, ISSN 1600-6100, November 2013
    Copyright © 2013, Raghava Rao Mukkamala, Abid Hussain, Ravi Vatrapu. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use on condition that this copyright notice is included in any copy.
    ISBN 978-87-7949.307-0
    Copies may be obtained by contacting: IT University of Copenhagen, Rued Langgaards Vej 7, DK-2300 Copenhagen S, Denmark; Telephone: +45 72 18 50 00; Telefax: +45 72 18 50 01; Web: www.itu.dk
    Abstract. Computational social science (CSS) is an emerging field of research that seeks to apply computational methods and tools to important and interesting social science questions and problems. Situated within CSS, social data analytics as a research stream aims to collect, archive, retrieve, process, transform, analyse, and report social data from social media platforms such as Facebook and Twitter.