Provenance Data in Social Media
Total Page:16
File Type:pdf, Size:1020Kb
Series ISSN: 2151-0067 • GUNDECHA •LIUBARBIER • FENG DATA INSOCIAL MEDIA PROVENANCE SYNTHESIS LECTURES ON M Morgan & Claypool Publishers DATA MINING AND KNOWLEDGE DISCOVERY &C Series Editors: Jiawei Han, University of Illinois at Urbana-Champaign, Lise Getoor, University of Maryland, Wei Wang, University of North Carolina, Chapel Hill, Johannes Gehrke, Cornell University, Robert Grossman, University of Illinois at Chicago Provenance Data Provenance Data in Social Media Geoffrey Barbier, Air Force Research Laboratory Zhuo Feng, Arizona State University in Social Media Pritam Gundecha, Arizona State University Huan Liu, Arizona State University Social media shatters the barrier to communicate anytime anywhere for people of all walks of life. The publicly available, virtually free information in social media poses a new challenge to consumers who have to discern whether a piece of information published in social media is reliable. For example, it can be difficult to understand the motivations behind a statement passed from one user to another, without knowing the person who originated the message. Additionally, false information can be propagated through social media, resulting in embarrassment or irreversible damages. Provenance data associated with a social media statement can help dispel rumors, clarify opinions, and confirm facts. However, Geoffrey Barbier provenance data about social media statements is not readily available to users today. Currently, providing this data to users requires changing the social media infrastructure or offering subscription services. Zhuo Feng Taking advantage of social media features, research in this nascent field spearheads the search for a way to provide provenance data to social media users, thus leveraging social media itself by mining it for the Pritam Gundecha provenance data. Searching for provenance data reveals an interesting problem space requiring the development and application of new metrics in order to provide meaningful provenance data to social Huan Liu media users. This lecture reviews the current research on information provenance, explores exciting research opportunities to address pressing needs, and shows how data mining can enable a social media user to make informed judgements about statements published in social media. About SYNTHESIs This volume is a printed version of a work that appears in the Synthesis MORGAN Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. For more information visit www.morganclaypool.com & CLAYPOOL ISBN: 978-1-60845-783-0 SYNTHESIS LECTURES ON Morgan & Claypool Publishers 90000 DATA MINING AND KNOWLEDGE DISCOVERY www.morganclaypool.com 9 781608 457830 Jiawei Han, Lise Getoor, Wei Wang, Johannes Gehrke, Robert Grossman, Series Editors Provenance Data in Social Media Synthesis Lectures on Data Mining and Knowledge Discovery Editors Jiawei Han, UIUC Lise Getoor, University of Maryland Wei Wang, University of North Carolina, Chapel Hill Johannes Gehrke, Cornell University Robert Grossman, University of Chicago Synthesis Lectures on Data Mining and Knowledge Discovery is edited by Jiawei Han, Lise Getoor, Wei Wang, Johannes Gehrke, and Robert Grossman. The series publishes 50- to 150-page publications on topics pertaining to data mining, web mining, text mining, and knowledge discovery, including tutorials and case studies. The scope will largely follow the purview of premier computer science conferences, such as KDD. Potential topics include, but not limited to, data mining algorithms, innovative data mining applications, data mining systems, mining text, web and semi-structured data, high performance and parallel/distributed data mining, data mining standards, data mining and knowledge discovery framework and process, data mining foundations, mining data streams and sensor data, mining multi-media data, mining social networks and graph data, mining spatial and temporal data, pre-processing and post-processing in data mining, robust and scalable statistical methods, security, privacy, and adversarial data mining, visual data mining, visual analytics, and data visualization. Provenance Data in Social Media Geoffrey Barbier, Zhuo Feng, Pritam Gundecha, and Huan Liu 2013 Graph Mining: Laws, Tools, and Case Studies D. Chakrabarti and C. Faloutsos 2012 Mining Heterogeneous Information Networks: Principles and Methodologies Yizhou Sun and Jiawei Han 2012 iii Privacy in Social Networks Elena Zheleva, Evimaria Terzi, and Lise Getoor 2012 Community Detection and Mining in Social Media Lei Tang and Huan Liu 2010 Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions Giovanni Seni and John F. Elder 2010 Modeling and Data Mining in Blogosphere Nitin Agarwal and Huan Liu 2009 Copyright © 2013 by Morgan & Claypool All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in printed reviews, without the prior permission of the publisher. Provenance Data in Social Media Geoffrey Barbier, Zhuo Feng, Pritam Gundecha, and Huan Liu www.morganclaypool.com ISBN: 9781608457830 paperback ISBN: 9781608457847 ebook DOI 10.2200/S00496ED1V01Y201304DMK007 A Publication in the Morgan & Claypool Publishers series SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY Lecture #7 Series Editors: Jiawei Han, UIUC Lise Getoor, University of Maryland Wei Wang, University of North Carolina, Chapel Hill Johannes Gehrke, Cornell University Robert Grossman, University of Chicago Series ISSN Synthesis Lectures on Data Mining and Knowledge Discovery Print 2151-0067 Electronic 2151-0075 Provenance Data in Social Media Geoffrey Barbier Air Force Research Laboratory Zhuo Feng Arizona State University Pritam Gundecha Arizona State University Huan Liu Arizona State University SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY #7 M &C Morgan& cLaypool publishers ABSTRACT Social media shatters the barrier to communicate anytime anywhere for people of all walks of life. The publicly available, virtually free information in social media poses a new challenge to consumers who have to discern whether a piece of information published in social media is reliable. For example, it can be difficult to understand the motivations behind a statement passed from one user to another, without knowing the person who originated the message. Additionally, false information can be propagated through social media, resulting in embarrassment or irreversible damages. Provenance data associated with a social media statement can help dispel rumors, clarify opinions, and confirm facts. However, provenance data about social media statements is not readily available to users today. Currently, providing this data to users requires changing the social media infrastructure or offering subscription services. Taking advantage of social media features, research in this nascent field spearheads the search for a way to provide provenance data to social media users, thus leveraging social media itself by mining it for the provenance data. Searching for provenance data reveals an interesting problem space requiring the development and application of new metrics in order to provide meaningful provenance data to social media users. This lecture reviews the current research on information provenance, explores exciting research opportunities to address pressing needs, and shows how data mining can enable a social media user to make informed judgements about statements published in social media. KEYWORDS social computing, social media, provenance, data mining, social networking, microblog- ging vii This effort is dedicated to my family. Thank you all for your support. – GB To my parents and wife. – ZF To my parents, brother, and wife. – PG To my parents, wife, and sons. – HL ix Contents Acknowledgments ........................................................ xi 1 Information Provenance in Social Media .....................................1 1.1 Social Media ........................................................... 1 1.2 Social Media Data ...................................................... 4 1.3 Information Provenance ................................................. 9 1.4 The Information Provenance Problem .................................... 10 1.5 Challenges ............................................................ 12 1.6 In Search of Provenance Data ........................................... 13 1.6.1 Analyzing Provenance Attributes .................................. 14 1.6.2 Seeking Provenance via Network Information ....................... 14 1.6.3 Searching for Provenance Data .................................... 14 1.7 Summary ............................................................. 14 2 Provenance Attributes ................................................... 17 2.1 Defining Provenance Attributes .......................................... 18 2.2 Measuring Provenance Attributes ........................................ 19 2.3 Analyzing Provenance Attributes ........................................ 23 2.4 Summary ............................................................. 30 3 Provenance via Network Information ...................................... 31 3.1 Information Propagation Models ........................................ 32 3.1.1 Susceptible-Infected (SI)