Comparing Truth Discovery Methods Without Using Ground Truth
From Appearance to Essence: Comparing Truth Discovery Methods without Using Ground Truth

Xiu Susie Fang, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
Quan Z. Sheng, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
Xianzhi Wang, School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW 2052, Australia
Wei Emma Zhang, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
Anne H.H. Ngu, Department of Computer Science, Texas State University, San Marcos, TX 78666, USA

arXiv:1708.02029v1 [cs.DB] 7 Aug 2017

ABSTRACT

Truth discovery has been widely studied in recent years as a fundamental means for resolving the conflicts in multi-source data. Although many truth discovery methods have been proposed based on different considerations and intuitions, investigations show that no single method consistently outperforms the others. To select the right truth discovery method for a specific application scenario, it becomes essential to evaluate and compare the performance of different methods. A drawback of current research efforts is that they commonly assume the availability of certain ground truth for the evaluation of methods. However, the ground truth may be very limited or even out of reach in practice, rendering the evaluation biased by the small ground truth or even infeasible. In this paper, we present CompTruthHyp, a general approach for comparing the performance of truth discovery methods without using ground truth. In particular, our approach calculates the probability of observations in a dataset based on the output of different methods. The probabilities are then ranked to reflect the performance of these methods. We review and compare twelve existing truth discovery methods and consider both single-valued and multi-valued objects. Empirical studies on both real-world and synthetic datasets demonstrate the effectiveness of our approach for comparing truth discovery methods.

CCS CONCEPTS

• Information systems → Data cleaning; Data mining; • Theory of computation → Design and analysis of algorithms;

KEYWORDS

Big Data, Truth Discovery Methods, Sparse Ground Truth, Performance Evaluation, Single-Valued Objects, Multi-Valued Objects

ACM Reference format:
Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Wei Emma Zhang, and Anne H.H. Ngu. 2017. From Appearance to Essence: Comparing Truth Discovery Methods without Using Ground Truth. In Proceedings of ACM Conference, Washington, DC, USA, July 2017 (Conference'17), 11 pages. DOI: xxxxxxx

1 INTRODUCTION

The World Wide Web (WWW) has become a platform of paramount importance for storing, collecting, processing, querying, and managing Big Data in recent years, with around 2.5 quintillion bytes of data created every day through various channels such as blogs, social networks, discussion forums, and crowd-sourcing platforms. People from various domains, such as medical care, government, business, and research, are relying on these data to fulfill their information needs. In these scenarios, information about the same objects can often be collected from a variety of sources. However, due to the varying quality of Web sources, the data about the same objects may conflict among the sources. To help users determine the veracity of multi-source data, a fundamental research topic, truth discovery, has attracted broad attention recently.

So far, various truth discovery methods [12] have been proposed based on different considerations and intuitions. However, investigations [11, 12, 18] show that no method consistently outperforms the others in all application scenarios. Moreover, Li et al. [11] demonstrate with experiments that even an improved method does not always beat its original version, such as Investment and PooledInvestment [13], Cosine, 2-Estimates and 3-Estimates [7]. Therefore, to help users select the most suitable method to fulfill their application needs, it becomes essential to evaluate and compare the performance of different methods.

To evaluate the effectiveness of truth discovery methods, current research usually measures their performance in terms of accuracy (or error rate), F1-score, recall, precision, and specificity for categorical data [18], and Mean of Absolute Error (MAE) and Root of Mean Square Error (RMSE) for continuous data [12]. All these metrics are measured and compared based on the assumption that a reasonable amount of ground truth is available. However, ground truth is often very limited or even out of reach (generally less than 10% of the original dataset's size [18]). For example, knowledge graph construction [4] involves a huge number of objects, making it impossible to have the complete ground truth for performance validation. In addition, it requires enormous human effort to acquire even a small set of ground truth. The lack of sufficient ground truth can, in many cases, statistically undermine the legitimacy of evaluating and comparing existing methods using the ground truth-based approach. For example, previous comparative studies [2, 3, 10, 11, 16, 19, 20, 23, 24], based on real-world datasets with sparse ground truth, could all bring bias to the performance measurement of the methods. Under this circumstance, it is hard to conclude which method performs better, as we cannot trust the comparison results. This also makes it difficult to select the method with the best performance to be applied to specific application scenarios. Therefore, evaluating the performance of various truth discovery methods with missing or very limited ground truth can be a significant and challenging problem for truth discovery applications [12]. We identify the key challenges of this issue as the following:

• The only way to obtain evidence for performance evaluation without ground truth is to extract features from the given dataset for truth discovery. However, the features of a dataset are sometimes complex, concerning source-to-source, source-to-object, object-to-value, and value-to-value relations. In addition, it is challenging to find a method to capture those relations without importing additional biases.
• Current truth discovery methods commonly jointly determine value veracity and calculate source trustworthiness. Source trustworthiness and value confidence scores are the common intermediates of the existing methods, which are also the key elements for identifying the truth for each object. Therefore, we can consider identifying the relations among sources, objects, and values by leveraging those measurements to match the relations extracted from the given dataset. However, even if we are able to obtain the features of the given dataset, different truth discovery methods may calculate the source trustworthiness and

• We analyze, implement, and compare twelve specific truth discovery methods, including majority voting, Sums, Average-Log, Investment, PooledInvestment [13], TruthFinder [22], 2-Estimates, 3-Estimates [7], Accu [3], CRH [10], SimpleLCA, and GuessLCA [15].
• We propose a novel approach, called CompTruthHyp, to compare the performance of truth discovery methods without using ground truth, by considering the output of each method as a hypothesis about the ground truth. CompTruthHyp takes both single-valued and multi-valued objects into consideration. It utilizes the output of all methods to quantify the probability of observation of the dataset and then determines the method with the largest probability to be the most accurate.
• We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the effectiveness of our proposed approach. Our approach consistently achieves more accurate rankings of the twelve methods than the traditional ground truth-based evaluation approach.

The rest of the paper is organized as follows. We review the related work in Section 2. Section 3 introduces some background knowledge about truth discovery and the observations that motivate our work. Section 4 presents our approach. We report our experiments and results in Section 5. Finally, Section 6 provides some concluding remarks.

2 RELATED WORK

Generally, there are two categories of previous studies on performance evaluation and comparison of truth discovery methods. The first category includes the work on novel and advanced approaches for truth discovery in various scenarios. To validate the performance of their proposed approaches and to show how their approaches outperform the state-of-the-art truth discovery methods, those projects conduct comparative studies by running experiments on real-world datasets with manually collected ground truth. Truth discovery was first formulated by Yin et al. [22]. To show the effectiveness of TruthFinder, they conduct experiments on one real-world dataset, i.e., the Book-Author dataset, which contains 1,263 objects. The manually