Bulletin of the Technical Committee on Data Engineering

September 2016 Vol. 39 No. 3 IEEE Computer Society

Letters
Letter from the Editor-in-Chief ...... David Lomet 1
Letter from the 2016 IEEE TCDE Impact Award Winner ...... Michael Carey 2
Call for Nominations for TCDE Chair ...... Kyu-Young Whang 3
Letter from the Special Issue Editor ...... Haixun Wang 4

Special Issue on Text, Knowledge, and Database
Managing Google's data lake: an overview of the Goods system ...... Alon Halevy, Flip Korn, Natalya F. Noy, Christopher Olston, Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang 5
Data services leveraging Bing's data assets ...... Kaushik Chakrabarti, Surajit Chaudhuri, Zhimin Chen, Kris Ganjam, Yeye He 15
Attributed Community Analysis: Global and Ego-centric Views ...... Xin Huang, Hong Cheng, Jeffrey Xu Yu 29
Ten Years of Knowledge Harvesting: Lessons and Challenges ...... Gerhard Weikum, Johannes Hoffart, Fabian Suchanek 41
Towards a game-theoretic framework for text data retrieval ...... ChengXiang Zhai 51
Neural enquirer: learning to query tables in natural language ...... Zhengdong Lu, Hang Li, Ben Kao 63
Multi-Dimensional, Phrase-Based Summarization in Text Cubes ...... Fangbo Tao, Honglei Zhuang, Chi Wang, Qi Wang, Taylor Cassidy, Lance Kaplan, Clare Voss, Jiawei Han 74
Answering End-User Questions, Queries and Searches on Wikipedia and its History ...... Maurizio Atzori, Shi Gao, Giuseppe M. Mazzeo, Carlo Zaniolo 85

Conference and Journal Notices
TCDE Membership Form ...... back cover

Editorial Board

Editor-in-Chief
David B. Lomet
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA
[email protected]

Associate Editors

Tim Kraska
Department of Computer Science
Brown University
Providence, RI 02912

Tova Milo
School of Computer Science
Tel Aviv University
Tel Aviv, Israel 6997801

Christopher Ré
Stanford University
353 Serra Mall
Stanford, CA 94305

Haixun Wang
Facebook, Inc.
1 Facebook Way
Menlo Park, CA 94025

Distribution
Brookes Little
IEEE Computer Society
10662 Los Vaqueros Circle
Los Alamitos, CA 90720
[email protected]

The TC on Data Engineering
Membership in the TC on Data Engineering is open to all current members of the IEEE Computer Society who are interested in database systems. The TCDE web page is http://tab.computer.org/tcde/index.html.

The Data Engineering Bulletin
The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and application of database systems and their technology.
Letters, conference information, and news should be sent to the Editor-in-Chief. Papers for each issue are solicited by and should be sent to the Associate Editor responsible for the issue.
Opinions expressed in contributions are those of the authors and do not necessarily reflect the positions of the TC on Data Engineering, the IEEE Computer Society, or the authors' organizations.
The Data Engineering Bulletin web site is at http://tab.computer.org/tcde/bull_about.html.

TCDE Executive Committee

Chair
Xiaofang Zhou
The University of Queensland
Brisbane, QLD 4072, Australia
[email protected]

Executive Vice-Chair
Masaru Kitsuregawa
The University of Tokyo
Tokyo, Japan

Secretary/Treasurer
Thomas Risse
L3S Research Center
Hanover, Germany

Committee Members

Amr El Abbadi
University of California
Santa Barbara, California 93106

Malu Castellanos
HP Labs
Palo Alto, CA 94304

Xiaoyong Du
Renmin University of China
Beijing 100872, China

Wookey Lee
Inha University
Inchon, Korea

Renée J. Miller
University of Toronto
Toronto ON M5S 2E4, Canada

Erich Neuhold
University of Vienna
A 1080 Vienna, Austria

Kyu-Young Whang
Computer Science Dept., KAIST
Daejeon 305-701, Korea

Liaisons

Anastasia Ailamaki
École Polytechnique Fédérale de Lausanne
Station 15, 1015 Lausanne, Switzerland

Paul Larson
Microsoft Research
Redmond, WA 98052

Chair, DEW: Self-Managing Database Sys.
Shivnath Babu
Duke University
Durham, NC 27708

Co-Chair, DEW: Cloud Data Management
Xiaofeng Meng
Renmin University of China
Beijing 100872, China

Letter from the Editor-in-Chief

The Current Issue

Many years ago, there was an IBM ad with the catch phrase "Not just data, reality". Ad-speak is not science, so we should not take this phrase as describing the historical situation when it was published, nor should we look back at it with too critical an eye. (Though it is funny.) I think of it as aspirational, and in that context, it is really right-on! It is one thing to have petabytes of data; it is another to be able to exploit it to some purpose. Gathering data is easy, though it can be expensive. Extracting utility from data requires more than expensive infrastructure. It requires insights to build technology to "mine" the data. The data engineering community has been working on this for many years for multiple and increasing purposes, from knowledge research to health care to business opportunity, and many places in between. The current issue surveys the efforts by our community to put data to good use. And, perhaps unique to the Bulletin, it spans historical work to work in progress, from universities, research centers, and industry. Thus, the issue provides a broad perspective on the purposes to which data is being put and the technology to enable this. Haixun Wang, the issue editor, has succeeded in bringing this topic to Bulletin readers. The topic has been, is, and will continue to be timely. I want to thank Haixun for his hard work in bringing this important issue to us.

"No" on Proposed IEEE Constitutional Amendment

The Computer Society is part of the IEEE. As such, it has a role to play in the governance of the IEEE, as do all societies within the IEEE. A proposed amendment to the IEEE constitution would change the way that the IEEE is governed. As of this moment, the Computer Society has been joined by nine other societies that have formally voted in opposition to the amendment, and no society has voted in favor of it. The thrust of the IEEE amendment is to move governance partially out of the constitution and make it subject to the more easily changed by-laws. Prior versions of the amendment have suggested that the purpose may be to reduce the role of societies in IEEE governance. It also removes IEEE members from some governance votes that are currently required. The proposal in detail can be seen at https://www.ieee.org/documents/constitution_approved_amendment_changes_election.pdf (login is required using your IEEE account). As a member of the Computer Society Board of Governors (BOG), I have participated in BOG discussions on the proposed IEEE amendment, and I agree with the position taken by the Computer Society in opposition. I believe the amendment is misguided. Professional meetings can be contentious, as there are sometimes conflicting interests that need to be resolved. But the change proposed seems to me not to improve the way we deal with these conflicts but rather to seek to bury them. That is not the way democracy is supposed to work. So I urge you to vote "no" on this amendment.

IEEE TCDE Award Winners

The Technical Committee on Data Engineering (TCDE) initiated awards for the data engineering community in 2014. This is the third year for the awards. The winners of the awards this year are listed on the TCDE web site at http://tab.computer.org/tcde/tcdeawardsrecipients.html. This year, award winners have the opportunity to publish in the Bulletin a short communication about their thoughts as a result of the award. The current issue contains a short communication from Impact Award winner Michael Carey. Award winners have made truly distinguished contributions. I want to congratulate them on their awards, and I would encourage you to see what Mike has to say in the current issue.

David Lomet
Microsoft Corporation

Letter from the 2016 IEEE TCDE Impact Award Winner

In March I received the following news by e-mail: "...on behalf of the IEEE TCDE I am delighted to inform you that the TCDE Award Committee has awarded you the IEEE TCDE CSEE (Computer Science, Engineering, and Education) and Impact Award for this year 'for leadership and research excellence in building impactful data management systems, engineering tools, products, and practices'." I was very surprised, and I was extremely honored to be chosen! Along with that news came an opportunity to write a one-pager about "anything" for the DE Bulletin, and I've decided to use my DE space to offer some in-my-opinion (IMO) "stylistic suggestions" for researchers in our community – thoughts aimed at all of us, but especially at the younger set.

1. Results, not papers, are the objective! All too often, conversations with our fellow academics (both faculty and students alike) or research lab staffers include phrases like "I'm working on a paper about ..." or "I want to write a paper for SIGXXX...". The goal of our work should be to "do cool stuff" – to build cool systems or subsystems, or to come up with cool and useful algorithms or data structures or insights – not to write papers. Once we have something to report, it's paper time – but the paper should never be the end goal (IMO).

2. It's possible to over-publish – but please don't! In the "good old days", a fresh Ph.D. student would have a small handful of papers on their CV – maybe one per year spent in graduate school – and that was sufficient as long as the results were cool (see point 1). It's really not possible for one person to come up with a new publication-worthy result in under 6-12 months per result. Look at some of the most impactful systems researchers in CS – e.g., let's look at two of my heroes, Barbara Liskov (MIT) and John Ousterhout (currently at Stanford). Both are "repeat offenders" at doing terrific and impactful work on topics like operating systems, file systems, programming languages, CAD tools, storage systems, ... Each has built systems that have had truly lasting impact on our field. But check out their publication records in DBLP...! During a number of their years, each published just a small handful of papers with their students. If we just counted papers, neither would have gotten hired or been tenured – it's their cool stuff and their impact that have mattered over the years. We should become suspicious and skeptical about quality and motivation when we see a CV with more than a few papers per year – it's pretty unlikely that those are all papers that really deserved to be written (IMO).

3. You won't really know unless you try! Somewhat sadly, our systems conferences often include some papers that – if you really tried to use the ideas – just wouldn't work. If you work on a problem in isolation, it's dangerously easy to develop something that sounds good on paper, simulates or works well in isolation, but would fall on its face in a real system due to incompleteness or aspects that were overlooked (possibly on purpose, more likely by accident). Great, so you have a really cool new search structure! But can you build a big instance of it fast enough? Can it be updated? Made multiuser and recoverable? Is the information needed to build and maintain it available in an actual system setting? Are its APIs compatible with how real systems are structured? Is the portion of the end-to-end path that it improves on really where the key bottlenecks are? (Being 10x faster on 2% of a query's execution time sounds impressive if you don't say the 2% part too loudly.)

The ideal approach (IMO) to contributing a cool new result in systems is what I like to call the "BMW" approach: Build, Measure, Write – in that order. Come up with your idea(s), or examine ideas from others, and then start by building them completely – preferably in some actual system setting. Then measure their behavior and understand what you see – figure out why you're seeing what you're seeing. When doing experiments, first use workloads used by others – yes, their same workloads – before you show how your ideas do on your own favorite use cases. (That's how it's done in the real sciences – and CS should do this too.) Finally, once you're done, you'll have a well-understood and cool new result (see point 1) – so now you should write it up.

Please give some consideration to the points above. Our field doesn't suffer from a lack of papers, so let's see if we can reset our research culture a bit and move it back in the direction of the "good old days". Thanks for reading – and thanks to the TCDE community for somehow picking me as its 2016 CSEE Awardee! While it wasn't actually deserved (IMO), I'm certainly honored and appreciative.

Michael Carey
U.C. Irvine

Call for Nominations for TCDE Chair

The Chair of the IEEE Computer Society Technical Committee on Data Engineering (TCDE) is elected for a two-year period. The mandate of the current Chair, Xiaofang Zhou, expires at the end of this year, and the process of electing a Chair for the period 2017-2018 has begun. A Nominating Committee chaired by Kyu-Young Whang has been struck. The Nominating Committee invites nominations for the position of Chair from all members of the TCDE. To submit a nomination, please contact Kyu-Young Whang ([email protected]) before Oct. 30th, 2016. More information about TCDE can be found at http://tab.computer.org/tcde/.

Kyu-Young Whang
KAIST

Letter from the Special Issue Editor

The field of data management has come a long way since the age of relational databases, when data was well defined, managed by a centralized system, and accessed through methods based on well-founded semantics. In today's data management world, we are facing two major challenges: the diversity and heterogeneity of the data, and the ad-hoc methods for accessing and manipulating such data. Even within a single organization, data could be so varied and applications so diverse that no centralized system is able to cover all needs of data management. In the public domain, the challenges are often broader and more complex, with data ranging from Wikipedia to the human genome and applications ranging from question answering to disease prediction. In this issue, we survey several typical scenarios of data engineering in the post-relational-database age. The data we look into include enterprise data, social network data, Wikipedia data, etc., and the applications we cover range from simply cataloging the data to building natural language interfaces for the data. While there is not going to be a one-size-fits-all solution to the many challenges that come with the heterogeneity of the data, we argue there are still several priorities to follow in the new age of data engineering. First, cataloging heterogeneous datasets to make them readily available to the user is the first requirement of data management. Second, today's data is characterized by the richer and denser relationships it exhibits, and it is such relationships that drive many applications. Third, effective retrieval is essential for data to provide value. While we may not have a set of operators defined on well-founded semantics that can serve any need, investigating novel retrieval methods is important. Finally, data management is no longer just a system issue. Modeling and understanding the semantics of the data (particularly text data) holds the key to a new generation of intelligent applications.

We start with creating awareness for enterprise data. Enterprise data has seen such explosive growth in recent years that most enterprise users are not necessarily aware of the availability or the semantics of their own data. Halevy et al. discuss the situation at Google. They address the scale and heterogeneity challenges of Google's internal data and introduce techniques to extract metadata from it, so as to facilitate the understanding and use of such data internally at Google. Microsoft's Chakrabarti et al. focus on the massive amount of data amassed by web search engines. They describe services (e.g., a synonym service and a web table service) created out of Bing's search data that benefit a variety of Microsoft's products, including Office and Cortana.

Rich relationships inside graph-like data are the driving force of many applications, and such data poses great challenges to data management. Huang et al. turn their attention to social networks. They survey several state-of-the-art community models based on dense subgraphs and investigate social circles, which are a special kind of community formed by friends in the 1-hop neighborhood network of a particular user. Another important kind of graph data is the knowledge graph. Weikum et al. focus on advances the knowledge harvesting community has made in turning internet content, with its wealth of latent-value but noisy text and data sources, into crisp "machine knowledge" that can power intelligent applications.

New data calls for novel retrieval methods.
Zhai introduces a game-theoretic formulation of the text retrieval problem to optimize user experience in search. The key idea is to model text retrieval as a process of a search engine and a user playing a cooperative game, with a shared goal of satisfying the user's information need (or, more generally, helping the user complete a task) while minimizing the user's effort and the operation cost of the retrieval system. Lu et al. propose a neural network architecture for answering natural language questions against databases. It achieves this by finding distributed representations of queries and knowledge base tables. As we see in both Weikum et al.'s work on knowledge harvesting and Lu et al.'s work on neural natural language QA, the biggest challenge in data engineering is no longer just a system challenge; instead, how to model and understand the data is often the key to the success of applications. Tao et al. study the problem of phrase-based summarization of a set of documents of interest. The authors introduce a phrase ranking measure to leverage the relation between subsets of documents. Atzori et al. describe a smart system that allows people to enter natural language questions and then translates them into SPARQL queries executed on DBpedia.

Haixun Wang
Facebook Inc.

Managing Google's data lake: an overview of the GOODS system

Alon Halevy2∗, Flip Korn1, Natalya F. Noy1, Christopher Olston1, Neoklis Polyzotis1, Sudip Roy1, Steven Euijong Whang1

1Google    2Recruit Institute of Technology
[email protected], {flip,noy,olston,npolyzotis,sudipr,swhang}@google.com

Abstract

For most large enterprises today, data constitutes their core asset, along with code and infrastructure. For most enterprises, the amount of data that they produce internally has exploded in recent years. At the same time, in many cases, engineers and data scientists do not use centralized data-management systems and end up creating what has become known as a data lake—a collection of datasets that often are not well organized or not organized at all and where one needs to "fish" for useful datasets. In this paper, we describe our experience building and deploying GOODS, a system to manage Google's internal data lake. GOODS crawls Google's infrastructure and builds a catalog of discovered datasets, including structured files, databases, spreadsheets, and even services that provide access to the data. GOODS extracts metadata about datasets in a post-hoc way: engineers continue to generate and organize datasets in the same way that they have before, and GOODS provides value without disrupting teams' practices. The technical challenges that we had to address resulted both from the scale and heterogeneity of Google's data lake and from our decision to extract metadata in a post-hoc manner. We believe that many of the lessons that we learned are applicable to building large-scale enterprise-level data-management systems in general.

1 Introduction

Most large enterprises today witness an explosion in the amount of data that they generate internally for use in ongoing research and development. The reason behind this explosion is simple: by allowing engineers and data scientists to consume and generate data in an unfettered manner, enterprises promote fast development cycles, experimentation, and, ultimately, innovation that drives their competitive edge. As a result, this internally generated data often becomes a prime asset of the company, on par with source code and internal infrastructure.

Copyright 2016 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering

∗Work done while at Google Research.

The flip side of this explosion is the creation of a so-called data lake [1, 2, 6]: a growing volume of internal datasets, with little codified information about their purpose, value, or origin. This scarcity of information is problematic: data becomes siloed within the teams who carry the "tribal knowledge" of the data's origin, which, in turn, results in significant losses in productivity and opportunities, duplication of work, and mishandling of data. In this paper we describe Google Dataset Search (GOODS), a system that we built and deployed in order to help Google's engineers organize and manage datasets in its data lake. GOODS operates in a post-hoc manner: it collects and aggregates metadata about datasets after the datasets were created, accessed, or updated by various pipelines. Put differently, teams and engineers continue to generate and access datasets using the tools of their choice, and GOODS works in the background, in a non-intrusive manner, to gather the metadata about datasets and their usage. GOODS then uses this metadata to power services that enable Google engineers to organize and find their datasets in a more principled manner. Hence, GOODS is very different from Enterprise Data Management (EDM) systems, which act as gateways and require dataset owners and consumers to use specific protocols and APIs for data access.

[Figure 1 graphic: dataset-organizing tools (search, dataset profiles, monitoring, and an annotation service) sit on top of an index over the dataset catalog; the catalog records metadata such as path/identifier, size, provenance, and schema (e.g., /bigtable/foo/bar, 100G, written_by: job_A, proto:foo.Bar), drawn from additional sources including logs, the source code repository, user/group membership and team/project databases, content-analysis modules, and contributions through the GOODS API; underneath are the data-access layers: Bigtables, file systems, Spanner, and services/APIs.]
Figure 1: Overview of Google Dataset Search (GOODS). GOODS collects metadata about datasets from various storage systems. It infers metadata and relationships among datasets by processing additional sources such as logs and information about dataset owners and their projects, by analyzing content of the datasets, and by collecting input from the GOODS users. The collected metadata powers user-facing tools, such as search, dataset profiles, monitoring, and a dataset-annotation service.

Figure 1 shows a schematic overview of our system. GOODS continuously crawls different storage systems and the production infrastructure (e.g., logs from running pipelines) to discover which datasets exist and to gather metadata about each one (e.g., owners, time of access, content features, accesses by production pipelines). GOODS aggregates this metadata in a central catalog and correlates the metadata about a specific dataset with

information about other datasets. GOODS then uses this catalog to provide Google engineers with services for dataset management. These services include the following:

• A search engine over all the datasets in the company, with facets for narrowing search results, to help engineers and analysts find the most relevant datasets.

• A per-dataset profile page that renders the metadata that GOODS has recorded about a specific dataset and can thus help users understand the dataset and its relationships to other datasets in the company. The profile page also integrates with other tools that can further process the dataset, thus helping users act on the data.

• A monitoring service that allows teams to monitor features of the datasets that they own, such as size, distribution of values in the contents, or availability. Users can configure this monitoring service to issue alerts if the features change unexpectedly.

• An annotation service that allows dataset owners or trusted principals (e.g., data librarians, or a data-stewardship team) to extend a dataset's metadata with domain-specific annotations that can appear in the profile page. As an example, a dataset owner can provide a textual description for the dataset's contents or attach a visualization that can inform other users.

Search is the most frequently used service in GOODS, which demonstrates the importance of dataset discovery within a data lake. However, we have seen good adoption for the remaining services. We were also pleasantly surprised to see that teams used GOODS for scenarios that we did not originally anticipate:

• Schema auditing: A team used search to identify datasets owned by other teams that conformed to a specific schema that this team owned. The team then audited the resulting datasets to ensure that the schema was used according to its specifications.

• Data discovery through provenance: The GOODS catalog includes provenance metadata, in the form "dataset Y was read/written by production job X". This information, and in particular the transitive closure of the provenance links, can be useful in understanding the pedigree of a dataset and its downstream dependencies, and thus features prominently in the profile page of each dataset. A team found a different usage for these links and relied on them for dataset discovery. Specifically, an ML team wished to use datasets that were publicized as canonical for certain domains, but they found that these datasets were too "groomed" for ML. To overcome this problem, the team relied on the provenance links to browse upstream and identify the datasets that were used to derive the canonical datasets. These input datasets contained less groomed data and were more suitable for the specific ML task.

• Content visualization: A technical-infrastructure team used the annotation service to attach a visualization to datasets that represent training data for ML pipelines. This visualization, which illustrates statistics on the features of the training examples, is surfaced on the dataset profile page in order to help users understand the distribution of features and also to spot anomalies that can affect the quality of machine-learned models.

The remainder of the article describes our experience with the development of GOODS. We begin by identifying the key challenges that we had to address in building the data-lake management system at Google—a data lake that contains more than 20 billion datasets. Many of the challenges that we addressed in GOODS were precipitated by the scale and characteristics of the data lake at Google, but we believe that our experience and the lessons that we learned will apply to similar systems in other enterprises. We then introduce the logical structure that we used to organize the data lake's metadata, using relationships among entities (datasets, teams, projects, and so on). We argue that this organization, which is inspired by structured knowledge bases and their

application to Web search, can help answer important questions about the contents of the data lake. We then review briefly some of the system-related and technical issues that we solved in developing GOODS, and finally we present a few directions for future work based on our experience with the current system deployment.

2 Challenges in Managing and Organizing an Enterprise Data Lake

We encountered many challenges as we designed and built GOODS. In this section, we highlight and elaborate on some of the important challenges. A more detailed list of challenges can be found in our previous work [5].

2.1 Organizing the Data Lake

As we discussed, a major goal of a data-lake management system is to gather metadata for the datasets in it. We can view the metadata for each dataset independently. However, we found that it is far more powerful to consider how to integrate this metadata in order to uncover relationships among datasets. Identifying and inferring these relationships helped us address some of the scalability challenges discussed elsewhere in this section. Furthermore, it helped us organize the datasets in the data lake and thus to provide our users with a better understanding of the data lake's contents. As an example, consider the following query over the data lake: "Find all datasets that are derived from a dataset owned by project X." This query can help assess the impact of project X or identify teams that need to be notified if there is a plan to wipe the datasets owned by X. Answering this query requires knowledge of the composition of various teams, dataset ownership, and provenance relationships. Another interesting query is "Find all datasets written by a production job whose code uses a specific version of method X", which can help identify datasets that might have been affected by faulty code. Again, answering this query requires knowledge of several types of relationships, including provenance, the versions of binaries in production jobs, and the linkage of source code to binaries. To support this type of functionality we first have to identify what types of relationships exist and which of them we can determine efficiently. The other challenges that we identify in this section still apply here. The large scale means that we cannot perform an exhaustive search to identify these relationships. For instance, while it would be useful to know which datasets, or parts of datasets, have identical content, we cannot compare every pair of datasets to each other. Or, because we are building GOODS in a post-hoc manner, we need to understand which infrastructure signals can help uncover these relationships. For instance, to identify provenance relationships between datasets we may join the dataset catalog with logs of different jobs that generate datasets. However, because these resources come from different teams, they often use different naming schemes, different levels of granularity for what they define as a dataset or what they include, and so on.

2.2 Scalability of Metadata Extraction

The first challenge that we had to address is the scalability of gathering metadata for the datasets in the data lake. Three factors contribute to this challenge: the sheer number of datasets that GOODS manages, the daily churn of the datasets, and the cost of extracting metadata for each individual dataset. Our recent snapshot of Google's data lake contained 20 billion datasets, and this number has likely increased since then. Furthermore, we observed that one billion datasets were added to or removed from the data lake each day, with a significant fraction corresponding to transient short-lived datasets. At the same time, the cost of metadata extraction for individual datasets is high, because this extraction typically requires an expensive operation to the source storage system (e.g., accessing the contents of a file in a distributed filesystem). These factors lead to a prohibitive cost to analyze each and every dataset in the data lake. One way to address this challenge is to prioritize metadata extraction so that the catalog of the data-lake management system covers the "important" datasets. Coming up with a good metric of dataset importance is

difficult. For example, this metric may depend on the type of the dataset, the context in which the dataset is used (e.g., datasets that power user-facing services may be more important), or relationships to other datasets (e.g., datasets may be more important if they are used to derive other datasets). Similarly, while one may argue that transient datasets are not important given their limited lifespan, we discovered that this is not always the case: In some cases it was necessary to analyze short-lived datasets in order to derive metadata for non-transient datasets. For example, provenance links between non-transient datasets often go through temporary transient datasets.
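One way to make the prioritization idea above operational is sketched below; this is a toy of our own, and the importance signals and weights are illustrative assumptions, not the metric used in GOODS. Datasets are scored and pulled from a priority queue so that a limited metadata-extraction budget is spent on the most important ones first.

```python
import heapq

def importance(ds):
    """Toy importance score; the signals and weights are illustrative only."""
    score = 0.0
    if ds.get("powers_user_facing_service"):
        score += 2.0
    score += 0.5 * ds.get("num_derived_datasets", 0)   # used to derive other datasets
    if ds.get("is_transient"):
        score -= 1.0   # usually less important, though not always (see the text above)
    return score

def extraction_order(datasets, budget):
    """Return up to `budget` dataset paths, most important first."""
    heap = [(-importance(ds), ds["path"]) for ds in datasets]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(min(budget, len(heap)))]

datasets = [
    {"path": "/serving/index", "powers_user_facing_service": True},
    {"path": "/tmp/shard-000", "is_transient": True},
    {"path": "/logs/raw", "num_derived_datasets": 7},
]
print(extraction_order(datasets, budget=2))   # ['/logs/raw', '/serving/index']
```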

2.3 Post-hoc Management of Metadata

Early on in the design of our system we decided to adopt a post-hoc approach to the process of metadata extraction: the system would gather metadata about datasets by analyzing signals from Google's infrastructure (e.g., by processing logs or crawling storage-system catalogs) after these datasets have been accessed or updated. We can contrast this approach with traditional Enterprise Data Management (EDM) systems, which prescribe specific APIs to access datasets and thus act as gateways between teams and their data. EDM systems can gather very precise metadata because they are in the critical path of data access, but they also require an enterprise-wide opt-in. Instead, we designed GOODS to operate from the sidelines, assuming that teams were free to choose how they access their data. Our goal was to organize the data lake and to bring value to Google's teams without disrupting their practices. We also believe that this post-hoc approach is in line with the nature of a data lake, which allows engineers and analysts to experiment with data in an unfettered fashion. The downside of this approach is that it becomes more difficult to extract and reason about dataset metadata. First, we now have to deal with uncertainty in the metadata. As an example, consider the problem of inferring a schema for the contents of a dataset whose storage format does not record this information (e.g., a file containing serialized protocol buffers [7]). The analysis of the contents may yield several candidates for the schema, corresponding to different ways to parse the contents, and in the absence of other information we record all of these candidates in the dataset's metadata. Second, the metadata that we collect is heterogeneous because different types of datasets that appear in the data lake (e.g., files, spreadsheets, relational databases, or instances of key-value stores) have different metadata and may require different tools to extract it. A data-lake management system must be able both to extract and handle metadata across a variety of source storage systems and to record this heterogeneous metadata in a single catalog. However, having a single catalog for diverse metadata is also an opportunity to partially lift this heterogeneity in the data lake: the catalog can define a subset of the gathered metadata that is often common across datasets of different types. This common metadata can provide users and services with a unified view of the datasets in the data lake.
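To make this kind of heterogeneous, uncertain metadata concrete, the following is a minimal sketch in Python of what a catalog entry might look like; the field names and confidence scores are illustrative assumptions of ours, not the actual GOODS schema. It keeps a small common core shared across dataset types, a list of candidate schemas produced by post-hoc content analysis, and a free-form map for source-specific extras.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class CandidateSchema:
    """One plausible way to parse a dataset's contents (e.g., a protocol-buffer type)."""
    type_name: str      # e.g., "proto:foo.Bar"
    confidence: float   # heuristic score from content analysis (hypothetical)

@dataclass
class CatalogEntry:
    """A single dataset: a common core plus heterogeneous, source-specific metadata."""
    path: str                                   # identifier, e.g., a file path or table name
    storage_system: str                         # e.g., "gfs", "bigtable", "spanner"
    size_bytes: Optional[int] = None
    owners: List[str] = field(default_factory=list)
    # Post-hoc schema inference may yield several plausible parses; record them all.
    candidate_schemas: List[CandidateSchema] = field(default_factory=list)
    # Metadata that does not fit the common core (per storage system or per team).
    extras: Dict[str, str] = field(default_factory=dict)

# Example entry, loosely modeled on the sample rows shown in Figure 1.
entry = CatalogEntry(
    path="/gfs/nlu/foo",
    storage_system="gfs",
    size_bytes=10 * 2**30,
    owners=["nlu-team"],
    candidate_schemas=[
        CandidateSchema("proto:nlu.Schema", 0.8),
        CandidateSchema("proto:nlu.LegacySchema", 0.2),
    ],
)
```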

3 The Data Lake Relationship Graph

We view the GOODS catalog not only as a collection of metadata describing the datasets in Google's data lake but also as a representation of relationships among datasets and other related entities (teams, projects, code, and so on). This approach is inspired by the concept of knowledge graphs [3], which are used by modern enterprises to describe entities in the real world and to allow users to search with complex queries. Nodes in such a knowledge graph represent entities in the world (e.g., Tom Hanks, "Forrest Gump") and edges link these entities to each other (e.g., Tom Hanks played a role in "Forrest Gump"). This graph enables an extensible and flexible representation and supports queries such as "female actors who played lead roles in comedies," which require traversing diverse relationships. We can view the structure of a data lake in a similar way, in particular as we link it to other components of the enterprise infrastructure. First, we can have relationships between datasets (e.g., dataset A is a new version of dataset B). Second, we can link datasets to other entities, such as jobs that generate the datasets, projects and users that own the datasets and these jobs, or source code that defines these jobs. As

we list the relationships that we identified between datasets in a data lake, it is important to note that we infer all these relationships automatically, in a post-hoc fashion, relying on a variety of information that we can gather inside the enterprise. The following is a list of relationships among datasets that we identify as important and that we infer as we collect the metadata in the GOODS catalog.

Dataset containment: Some datasets may contain other datasets. For instance, Bigtable column families [4] are first-class entries in the GOODS catalog, and so are the bigtables themselves. We link the latter to the entries for the column families that they contain. This containment information is usually part of the metadata that we can extract directly from specific storage systems.

Provenance: Datasets are produced and consumed by code. This code may include analysis tools such as dashboarding solutions or SQL-query engines, serving infrastructures that provide access to datasets through APIs, or ETL pipelines that encode dataset transformations. For each dataset, we maintain the provenance of how the dataset is produced, how it is consumed, what datasets this dataset depends on, and what other datasets depend on this dataset. We identify and populate the provenance metadata through an analysis of production logs, which provide information on which jobs read and write each dataset. We then create a transitive closure of this graph connecting datasets and jobs, in order to determine how the datasets themselves are linked to one another. However, the number of data-access events in the logs can be extremely high and so can be the size of the transitive closure. Therefore, we trade off the completeness of the provenance associations for efficiency by processing only a sample of data-access events from the logs and also by materializing only the downstream and upstream relations within a few hops as opposed to computing the true transitive closure.

Logical clusters: We identify datasets that belong to the same logical cluster. While our definition of clusters is domain-dependent, in general, we usually group the following collections of datasets into a single logical cluster: datasets that are versions of the same logical dataset and that are being generated on a regular basis; datasets that are replicated across different data centers; or datasets that are sharded into smaller datasets for faster loading. Because engineers tend to use specific conventions in naming their datasets, we can identify these logical clusters efficiently by examining the dataset paths. For example, consider a dataset that is produced daily and let /dataset/2015-10-10/daily batch be the path for one of its instances. We can abstract out the day portion of the date to get a generic representation of all datasets produced in a month: /dataset/2015-10-<day>/daily batch, representing all instances from October 2015. By abstracting out the month as well, we can go up the hierarchy to create abstract paths that represent all datasets produced in the same year: /dataset/2015-<month>-<day>/daily batch. By composing hierarchies along different dimensions, we construct a granularity semi-lattice structure where each node corresponds to a different granularity of viewing the datasets. (A small sketch of this path abstraction appears after this list.)

Content similarity: Content similarity—both at the level of the dataset as a whole and at the level of individual columns—is another graph relationship that we extract. Given the size of Google's data lake, it is prohibitively expensive to perform pairwise comparison of all datasets. Instead, we rely on approximate techniques to determine which datasets are replicas of each other and which have different content. We collect fingerprints that have checksums for the individual fields and locality-sensitive hash (LSH) values for the content.
We use these fingerprints to find datasets with content that is similar or identical to the given dataset, or columns from other datasets that are similar or identical to columns in the current dataset.
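As a concrete illustration of the path-based clustering just described, here is a small Python sketch of our own (not GOODS code) that abstracts the date components of a dated dataset path to produce the increasingly general cluster paths that make up the granularity hierarchy; the underscore in daily_batch and the <day>/<month>/<year> placeholder tokens are our notation.

```python
import re

def abstract_paths(path):
    """Yield increasingly general versions of a dated dataset path."""
    variants = [path]
    # Abstract out the day: /dataset/2015-10-10/... -> /dataset/2015-10-<day>/...
    day = re.sub(r"(\d{4}-\d{2})-\d{2}", r"\1-<day>", path)
    if day != path:
        variants.append(day)
    # Abstract out the month as well: /dataset/2015-<month>-<day>/...
    month = re.sub(r"(\d{4})-\d{2}-<day>", r"\1-<month>-<day>", day)
    if month != day:
        variants.append(month)
    # Finally abstract out the year: /dataset/<year>-<month>-<day>/...
    year = re.sub(r"\d{4}-<month>-<day>", "<year>-<month>-<day>", month)
    if year != month:
        variants.append(year)
    return variants

print(abstract_paths("/dataset/2015-10-10/daily_batch"))
# ['/dataset/2015-10-10/daily_batch',
#  '/dataset/2015-10-<day>/daily_batch',
#  '/dataset/2015-<month>-<day>/daily_batch',
#  '/dataset/<year>-<month>-<day>/daily_batch']
```

Each abstract path can then serve as a node in the granularity hierarchy, grouping the physical datasets whose paths it matches.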

In addition to the relationships that link datasets in the GOODS catalog to each other, we also rely on the rest of Google's infrastructure to enrich this relationship graph. While this part of the system continues to grow, we list here some of the relationships that we currently extract:

10 • information on owners of the jobs that produce or read datasets;

• information on dataset owners and user groups that determine visibility permissions for datasets;

• links between the schemas that we infer for the datasets and their definitions in the source code repository.

Using this (conceptual) graph-based organization enables us to add new relationships between different types of entities easily. An example is the relationship between a dataset representing training data and the visualization of the corresponding features, which we mentioned in Section 1. More important, the graph allows users to navigate the data-lake catalog in a flexible way and to answer rich queries that require linking datasets to other assets in the enterprise.
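To make such graph queries concrete, here is a minimal, self-contained sketch (a toy in-memory representation of our own, not the GOODS catalog format) that stores relationship edges as triples and answers a query like "find all datasets derived, directly or transitively, from datasets owned by project X" by walking provenance links; all names are hypothetical.

```python
from collections import defaultdict, deque

# Toy relationship graph: (subject, relation, object) triples.
edges = [
    ("dataset:/logs/raw", "owned_by", "project:X"),
    ("job:etl_1", "reads", "dataset:/logs/raw"),
    ("job:etl_1", "writes", "dataset:/logs/clean"),
    ("job:train", "reads", "dataset:/logs/clean"),
    ("job:train", "writes", "dataset:/models/v1"),
]

readers = defaultdict(set)   # dataset -> jobs that read it
outputs = defaultdict(set)   # job -> datasets it writes
owned_by_x = set()
for subj, rel, obj in edges:
    if rel == "reads":
        readers[obj].add(subj)
    elif rel == "writes":
        outputs[subj].add(obj)
    elif rel == "owned_by" and obj == "project:X":
        owned_by_x.add(subj)

def downstream(datasets):
    """Datasets transitively derived from the given ones via read/write provenance."""
    seen, frontier = set(), deque(datasets)
    while frontier:
        d = frontier.popleft()
        for job in readers[d]:          # jobs that consumed d
            for out in outputs[job]:    # datasets those jobs produced
                if out not in seen:
                    seen.add(out)
                    frontier.append(out)
    return seen

print(downstream(owned_by_x))   # {'dataset:/logs/clean', 'dataset:/models/v1'}
```

In the real system the edges come from crawled storage catalogs, production logs, and ownership databases rather than a hard-coded list, and the traversal is bounded to a few hops as described in Section 3.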

4 System Design

GOODS maintains a metadata catalog for the data lake (Figure 1). Our previous work [5] details the physical organization of the catalog, the continuous processes that update it, and its usage to power the user-facing services mentioned in Section 1. Here we highlight some of the important technical approaches we adopted in our system to address the challenges in Section 2.

4.1 Leveraging the Data Lake Relationship Graph

The relationships that we have identified in Section 3 are critical in supporting most of the functionality in GOODS. They enable us both to address some of the scalability challenges that we described in Section 2 and to provide new services to the engineers who use GOODS. First, clustering provides enormous savings in metadata extraction, albeit potentially at the cost of precision. That is, instead of collecting expensive metadata for each individual dataset, we can collect metadata only for a few datasets in a cluster. We can then propagate the metadata across the other datasets in the cluster. For instance, if the same job generates versions of a dataset daily, these datasets are likely to have the same schema. Thus, we do not need to infer the schema for each version. Similarly, if a user provides a description for a dataset, it usually applies to all members of the cluster and not just the one version. When the clusters are large, the computational savings that we obtain by avoiding analysis of each member of the cluster can be significant. Second, we can use different types of graph edges to propagate metadata when it is too expensive to extract it directly or we simply do not have this metadata. For instance, we can propagate the description of one version of a dataset to all of its versions. Similarly, versions of the same dataset are likely to have the same schema. Naturally, some of this propagation will introduce uncertainty into our metadata—a fact that we are already dealing with at different levels. Finally, the links to knowledge graphs representing other parts of Google's infrastructure make GOODS a data-centric starting point for users exploring other resources. For instance, a user can find a definition of a protocol buffer in source code and immediately jump to the list of all the datasets that use that protocol buffer as their schema. A new member of a team can find datasets generated by her team. A profile page for a dataset has links to dataset owners, profile pages for jobs that read and write the dataset, and so on.
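Returning to the propagation idea above, the following is a rough sketch (our own simplification under assumed field names, not the production logic) of filling in a missing schema for cluster members from a sibling that has one, while marking the value as propagated so that downstream consumers can treat it as uncertain.

```python
def propagate_schema(cluster):
    """Copy an inferred schema from an analyzed cluster member to members lacking one."""
    donor = next((m for m in cluster if m.get("schema")), None)
    if donor is None:
        return cluster                      # nothing to propagate yet
    for member in cluster:
        if not member.get("schema"):
            member["schema"] = donor["schema"]
            # Mark the value as inferred rather than directly extracted.
            member["schema_source"] = "propagated_from:" + donor["path"]
    return cluster

cluster = [
    {"path": "/dataset/2015-10-09/daily_batch", "schema": "proto:foo.Bar"},
    {"path": "/dataset/2015-10-10/daily_batch"},   # not yet analyzed
]
propagate_schema(cluster)
```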

4.2 Coordinating Modules for Post-Hoc Processing

The GOODS backend uses a large number of diverse batch-processing jobs to collect information from a variety of systems and to add new and updated information to the GOODS catalog. This diversity of jobs is a consequence of our post-hoc approach: we collect information from many diverse corners of Google's internal infrastructure. Each job includes one or more modules, such as crawlers or analyzers. Different GOODS modules have different characteristics that influence how modules are grouped together in jobs, and how jobs are

scheduled. Next we discuss some of these characteristics and a few rules of thumb that we followed to design and optimize the GOODS backend.

First, not all GOODS modules are critical for the smooth functioning of the system. For example, modules that identify the existence of datasets and extract metadata on who owns and who can access the datasets are critical to ensure that the state of the system is fresh and that any changes in access control for the datasets are reflected accurately. On the other hand, modules like the schema analyzer, which identifies the schema of a dataset—while useful—are not as time sensitive. Therefore, we explicitly allocate more resources to jobs that include critical modules, and schedule non-critical jobs using spare resources.

Second, certain modules may depend on the successful run of other modules. For example, a fingerprint analyzer uses the schema identified by the schema analyzer to compute column-level fingerprints. Grouping dependent modules into the same job enables them to check the status of all the modules they depend on and make an informed decision. While all GOODS modules store the status of execution for each dataset in the catalog itself to avoid repetitive work on failure and between different instances of the same job, grouping dependent modules reduces the overall number of bytes read from the persistent storage backend.

Third, the failure characteristics of modules vary widely. It is important to isolate modules that are prone to failure to ensure steady progress of other modules. For instance, several of our modules that examine the content of datasets use a variety of libraries specific to different file formats. At times, these libraries crash or go into infinite loops. Because we cannot have long-running analysis jobs crashing or hanging, we sandbox such potentially dangerous jobs in a separate process. We then use a watchdog thread to convert long stalls into crashes while allowing the rest of the pipeline to proceed.

Finally, different modules have different computational complexity and therefore different resource footprints. While we schedule modules that have low complexity to run over the entire catalog every day, we avoid re-running computationally expensive modules unless there is a strong signal that a re-run will yield different results. For example, unless a dataset is modified, the fingerprint for its contents is unlikely to change and therefore the fingerprint analyzer can bypass such datasets. While some of these design choices were part of our initial design of the GOODS system, we made many of them after experiencing an issue and redesigning parts of our system to address it.
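The sandboxing and watchdog behavior described above can be sketched as follows; this is a simplified stand-in of our own (using Python's multiprocessing rather than the actual GOODS machinery) that runs a potentially crashing or hanging analyzer in a separate process and converts long stalls into failures so the rest of the pipeline keeps moving.

```python
import multiprocessing as mp

def _worker(queue, analyzer, path):
    """Child-process entry point: run the analyzer and report the outcome."""
    try:
        queue.put(("ok", analyzer(path)))
    except Exception as exc:            # a crash stays contained in the child process
        queue.put(("error", repr(exc)))

def run_sandboxed(analyzer, dataset_path, timeout_s=60):
    """Run analyzer(dataset_path) in a separate process; give up if it stalls."""
    queue = mp.Queue()
    proc = mp.Process(target=_worker, args=(queue, analyzer, dataset_path))
    proc.start()
    proc.join(timeout_s)                # watchdog: bound the analyzer's runtime
    if proc.is_alive():                 # infinite loop or long stall -> treat as failure
        proc.terminate()
        proc.join()
        return ("timeout", None)
    return queue.get() if not queue.empty() else ("crashed", None)

def risky_analyzer(path):
    # Stand-in for a format-specific content analyzer that may crash or hang.
    return {"path": path, "num_columns": 42}

if __name__ == "__main__":
    print(run_sandboxed(risky_analyzer, "/gfs/nlu/foo", timeout_s=10))
```

The status returned for each dataset can then be recorded in the catalog, in line with the practice described above of persisting per-dataset execution status to avoid repeating work after failures.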

4.3 Search Ranking

As we mentioned in the introduction, dataset discovery through search is the most frequent use case for GOODS. An important design choice for us was to build the search functionality at the level of logical clusters: We index the metadata that describes the logical cluster corresponding to a collection of related datasets (e.g., daily versions of the same dataset), and the user sees results corresponding to these logical datasets. This decision allowed us to compress search results in a meaningful way, instead of overwhelming the user with many similar datasets that match the same query (e.g., showing datasets that differ only on one component of their path denoting their version). We also experimented with the alternative of indexing metadata at the level of each physical dataset, but the end-user experience suffered from the sheer number of similar results. We faced a few technical difficulties in building the search index. For example, we needed to propagate the metadata from individual members of the cluster to the cluster itself so that users can search for it. However, by far the biggest challenge was to design a good ranking function for search results. In general, it is hard to overstate the importance of good ranking for search. The problem has unique characteristics in the case of dataset search (and is different from, say, web or bibliographic search) because of the domain-specific signals that can determine the relevance of a dataset to a search query. After much experimentation (and a few false starts), we ended up using a mix of standard IR-type signals (e.g., how well the search terms match the metadata) with domain-specific signals derived from each dataset's metadata. For instance, we found that provenance relationships among datasets provide a strong relevance signal. Specifically, it is common for teams to generate denormalized versions of some "master" dataset in order to facilitate different types of analysis of the master data. These

denormalized datasets can match the same search keywords as the master dataset, yet it is clear that the master dataset should be ranked higher for general-purpose queries or for metadata extraction. Another example comes from provenance relationships that cross team boundaries, when the dataset from one team is processed to create a dataset in the scope of another team or project. In this case, we can boost the importance of the input dataset, as evidenced by its usage by an external team. The output dataset is also important, because we can view it as an origin of other datasets within the external project. Dataset type is another example of a domain-specific signal: for instance, our ranking function primes datasets that correspond to relational databases, because these tend to have richer metadata and to be more tailored for wide usage compared to, say, files on a distributed filesystem. Ranking methods for dataset search remain an interesting research problem. It is particularly interesting to consider how techniques from other domains could apply in a data-lake setting, e.g., what would be the equivalent of personalized PageRank when searching for a dataset. However, as we discuss in the next section, our experience with users shows that they have different needs for ranking depending on the purpose of their search. This evidence indicates that a data-lake management system should support several ranking methods and allow users to choose at search time which one to use.
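To give a flavor of such a mixed ranking function, here is a toy scoring sketch; the particular weights and signal names are illustrative choices of ours, not the deployed GOODS formula.

```python
def rank_score(text_match, dataset):
    """Combine a standard IR relevance score with domain-specific metadata signals."""
    score = text_match                          # e.g., how well query terms match the metadata
    if dataset.get("is_relational"):            # relational sources tend to have richer metadata
        score *= 1.5
    if dataset.get("read_by_external_teams"):   # provenance that crosses team boundaries
        score *= 1.3
    if dataset.get("derived_from_master"):      # demote denormalized copies of a master dataset
        score *= 0.7
    return score

candidates = [
    {"name": "sales_master", "is_relational": True, "read_by_external_teams": True},
    {"name": "sales_denormalized_for_dashboards", "derived_from_master": True},
]
ranked = sorted(candidates, key=lambda d: rank_score(1.0, d), reverse=True)
```

Supporting several ranking methods, as suggested above, would amount to keeping a small family of such scoring functions and letting the user pick one at query time.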

5 Future Directions

After deploying GOODS for the initial use cases that we mentioned in Section 1, we had a chance to observe how the system got adopted, to interact with users, and to receive feedback on feature requests. Based on this information, we have identified a few directions that are interesting for further development on GOODS and, more generally, for the type of functionality that a system for data-lake management can offer.

• Building a community around datasets: GOODS allows users to share and exchange information about datasets, and to augment the metadata in the catalog with domain-specific knowledge. We view this functionality as the means to develop a community whose shared goal is the curation of the data lake. In this spirit, we can add more community-like features including comments on datasets, reviews, or Q&A functionality, to name a few. The goal is to foster a culture of joint data stewardship and of best practices on adding new datasets to the data lake.

• Rich analytics over the data lake: The first use case that we targeted with GOODS was dataset discovery within the data lake, through a simple keyword-search interface with relevance ranking. Over time, we found that users "outgrew" this modality and started asking for more flexible ways to query the catalog. The requests ranged from different options for result ranking (e.g., rank datasets by size or by modification timestamp) to full SQL access over the catalog. (In fact, GOODS itself has an internal monitoring component that tracks the state of the data lake through SQL queries over the metadata.) In some cases, users also found it convenient to explore the catalog through a graph visualization based on provenance relationships, which points to the idea of exposing slices of the catalog through specialized user interfaces. Thinking forward, we can also view the catalog as a temporal store that enables comparisons between snapshots of the data lake and thus the discovery of trends. Another option is to view GOODS as a generator of a stream of metadata events, where each event encodes the creation of a dataset, the discovery of a provenance edge, or any other piece of metadata that can be stored in the catalog. Under this model, users can issue continuous queries to monitor the contents of the data lake. For example, a team can set up a continuous query to monitor accesses to its datasets by other teams, or a user can monitor the generation of datasets by daily runs of some pipeline. In general, our experience with GOODS is that there is value in enabling rich data analytics over the catalog, both to allow users to explore the data lake more effectively and to monitor the overall state of the data lake.

• Beyond post-hoc: Our initial approach of crawling and analyzing datasets in a post-hoc manner enabled the collection of basic metadata for datasets stored in many different systems in a minimally invasive

manner. However, there are two main downsides to the post-hoc approach. First, the data in the catalog has a time lag, which conflicts with user expectations that the catalog immediately reflect the creation or modification of any metadata. Second, the analyzers in GOODS collect only generic, and often uncertain, metadata. As GOODS got adopted within the company, many teams expressed the desire to use the GOODS infrastructure to store, retrieve, share, and serve custom metadata for their datasets, ideally within a short time interval of such changes taking place. In order to tackle such use cases, we envision a hybrid approach that supports one-off deeper integration with storage infrastructures to reduce the time lag for discovery, and APIs for teams to register datasets and to contribute custom metadata to the catalog.

• Acting on data: The information in the catalog not only helps users understand the datasets that they know about but also enables them to discover what they can do with their datasets, how they can use datasets in other tools, and which new related datasets exist. More concretely, one of the very popular features of profile pages in GOODS is pre-populated queries and code snippets that use the path and schema information, which users can simply copy and paste into other tools. We can envision extensions of this feature that are based on a deeper analysis of the dataset's metadata and that can help the user act on the dataset with more elaborate tools; for example, we can automatically build a dashboard for the dataset based on its characteristics. These extensions can become more powerful if we can leverage the relationships encoded in the catalog. For instance, if GOODS can tell us that a key column in our dataset has very similar content to a key column in another dataset, then the two datasets might be candidates for joining and the user may even be presented with possible actions on the join results. This specific example is intriguing, as the data-lake management system helps the user understand what datasets could exist in the data lake instead of merely summarizing what has already been generated.

Overall, a data-lake management system should promote the treatment of datasets as first-class objects within the computing infrastructure of the enterprise. A big part of this goal involves services that make it easier for engineers and analysts to integrate datasets within their workflow in an organic fashion, and this direction has been the main focus of our work with GOODS. Furthermore, through these services the system can instill best practices for dataset management and break team-delineated silos, thus fostering a culture of responsible data stewardship across the enterprise.

References

[1] Azure data lake. https://azure.microsoft.com/en-us/solutions/data-lake/.
[2] Data lakes and the promise of unsiloed data. http://www.pwc.com/us/en/technology-forecast/2014/cloud-computing/features/data-lakes.html.
[3] A. Brown. Get smarter answers from the knowledge graph. http://insidesearch.blogspot.com/2012/12/get-smarter-answers-from-knowledge_4.html, 2012.
[4] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1–4:26, June 2008.
[5] A. Halevy, F. Korn, N. F. Noy, C. Olston, N. Polyzotis, S. Roy, and S. E. Whang. Goods: Organizing Google's datasets. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, pages 795–806, New York, NY, USA, 2016. ACM.
[6] I. Terrizzano, P. M. Schwarz, M. Roth, and J. E. Colino. Data wrangling: The challenging journey from the wild to the lake. In CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 2015.
[7] K. Varda. Protocol buffers: Google's data interchange format. Google Open Source Blog, July 2008.

Data Services Leveraging Bing's Data Assets

Kaushik Chakrabarti, Surajit Chaudhuri, Zhimin Chen, Kris Ganjam, Yeye He Microsoft Research Redmond, WA {kaushik, surajitc, zmchen, krisgan, yeyehe}@microsoft.com

Abstract

Web search engines like Bing and Google have amassed a tremendous amount of data assets. These include query-click logs, a web crawl corpus, an entity knowledge graph and geographic/maps data. In the Data Management, Exploration and Mining (DMX) group at Microsoft Research, we investigate ways to mine the above data assets to derive new data that can provide new value to a wide variety of applications. We expose the new data as cloud data services that can be consumed by Microsoft products and services as well as third-party applications. We describe two such data services we have built over the past few years: the synonym service and the web table service. These two data services have shipped in several Microsoft products and services including Bing, Office 365, Cortana, the Bing Synonyms API and the Bing Knowledge API.

1 Introduction

Web search engines like Bing and Google have amassed a “treasure trove” of data assets. One of the most important assets is the query-click log, which contains every search query submitted by a user, the urls and other information (e.g., answers) returned by the search engine, and the items clicked on by the user. Other important assets include the web crawl corpus, an entity knowledge graph (which contains information about named entities like people, places and products) and geographic/maps data. The above data assets are leveraged by the web search engine to deliver a high quality search experience. For example, the query-click log is used to improve the quality of web result ranking. The entity knowledge graph is used not only to improve web result ranking but also to compose the “entity information card” for entity queries. In the Data Management, Exploration and Mining (DMX) group at Microsoft Research, we explore ways to mine the above data assets to derive new data that can provide new value to a wide variety of applications. We expose the new data as cloud data services that can be consumed by Microsoft products and services as well as third-party products and services. The main idea is depicted in Figure 1.

Synonym service: Let us start with an example of such a data service, the synonym service. People often refer to a named entity like a product, a person or a place in many different ways. For example, the camera ‘Canon 600d’ is also referred to as ‘canon rebel t3i’, the film ‘Indiana Jones and the Kingdom of the Crystal Skull’ also as ‘indiana jones 4’, the person ‘Jennifer Lopez’ also as ‘jlo’, and the place ‘Seattle Tacoma International Airport’ also as ‘sea tac’. We refer to them as entity synonyms or simply synonyms (in contrast to other types of synonyms such as attribute synonyms [15]).


Figure 1: Data services leveraging Bing's data assets (Bing's data assets feed cloud data services, which are consumed by Microsoft products and services such as Bing, Cortana, Azure Search and Office 365, and by third-party applications such as e-tailer product search and digital marketing/SEO).

Consider the product search functionality on an e-tailer site like bestbuy.com. Without the knowledge of synonyms, it often fails to return relevant results. For example, when the user searches for ‘indiana jones and kingdom of crystal skull’ on bestbuy.com, it returns the DVD for that movie (the desired result). However, if she chooses to search using the query ‘indiana jones 4’, it fails to return the desired result. This is because bestbuy.com does not have the knowledge that the above film is also referred to as ‘indiana jones 4’. If we can create a data service that computes the synonyms for any entity by leveraging the rich data assets of a search engine, it will bring tremendous benefit to such e-tailers [9]. It will also be valuable to specialized entity portals/apps like movie portals (Fandango, Moviefone), music portals (Pandora, Spotify), sports portals (ESPN, NFL) and local portals (Yelp, Tripadvisor). Hence, we built such a data service, called the synonym service [7, 18].

Challenges in synonym service: The main challenge is to mine synonyms in a domain-independent way with high precision and good recall. While there is significant work on finding similar/related queries [3, 4, 12], there is little work on mining synonyms under the above precise definition of synonymity (i.e., alternate ways of referring to the exact same entity). Furthermore, existing synonym mining works rely mostly on query co-clicks [10], which we find in practice to be insufficient to ensure the very high precision (e.g., over 95%) required for the scenarios described above. In our work, we develop a variety of novel features that complement query-log features and achieve high precision and recall. We describe those techniques in detail in Section 2.

Impact of synonym service: The synonym service is being used extensively by Microsoft products and services as well as by external customers. Many Bing verticals like sports and movies use the synonym data to improve their respective vertical search quality. In Bing Sports, for example, when a user asks the query ‘tampabaybucs’, entity synonyms help to trigger the entity card for “Tampa Bay Buccaneers”. Our synonyms are also used inside Bing's entity-linking technology, which in turn is used in several applications like Bing Snapp 1, Ask Cortana 2, and the Bing Knowledge API 3. For external customers (e.g., e-tailers), the synonym technologies can be accessed from the Bing developer center as a web service called the Bing Synonyms API [1]. An entity name can be submitted as input, and all synonyms of that entity will be returned by the service. This is used in customer scenarios such as query enrichment and catalog enrichment. Thousands of customers subscribe to this API.

Web table services: A second data service 4 we have built is the web table service. Although the web crawl corpus mostly consists of unstructured data like textual documents, images and videos, there is also a vast amount of structured data embedded inside the HTML documents. For example, there are more than a billion HTML tables, HTML lists and spreadsheets on the web.

1 https://www.microsoft.com/en-us/store/p/snapp/9nblggh09m4d
2 https://www.microsoft.com/en-us/mobile/experiences/cortana/
3 https://www.bing.com/widget/knowledge
4 Actually two closely related services.

Each such table/list/spreadsheet contains a set/list of named entities and various attributes of those entities. For example, the HTML table embedded inside en.wikipedia.org/wiki/List_of_U.S._states_by_income contains the median household income for all the U.S. states. These tables are often very valuable to information workers. For example, a business data analyst often needs to “join” business data with public data. Consider a data analyst who is analyzing sales numbers from various U.S. states (say, in Excel) and wants to find how strongly they are correlated with the median household income. She needs to join the former with the table in en.wikipedia.org/wiki/List_of_U.S._states_by_income. It is hard for the analyst to discover the above table and also to import it into Excel in order to do the join. It would be very valuable if we could index such tables and allow Excel users to easily search for them (e.g., using keyword search) as shown in Figure 2(a). We built a data service called the web table search service for this purpose.

A substantial fraction of queries on Bing and Google can be best answered using a table. For example, for the query ‘largest software companies in usa’, it is much better to show the table from Wikipedia containing the largest software companies (en.wikipedia.org/wiki/List_of_the_largest_software_companies) than just the blue links, as shown in Figure 2(b). We refer to the above class of queries as list-intent queries, as the user is looking for a list of entities. We built a data service called the web table answer service for this purpose. We explore other classes of queries for table answers as well. Although both the web table search and web table answer services take a keyword query as input and return web tables, they are quite different. The former always returns a ranked list of tables relevant to the query. On the other hand, since the latter is invoked by a web search engine where the top result spot is reserved for the best possible answer (among all possible types of answers as well as the top algorithmic search result), the desired output for the latter is a single table if it is the best possible answer; otherwise it should return nothing.

Figure 2: (a) Web table search service in action in Excel PowerQuery. (b) Web table answer service in action on Bing.

Challenges in web table services: Most raw HTML tables (i.e., elements enclosed by <table> tags)

do not contain valuable data but are used for layout purposes. We need to identify and discard those tables. Furthermore, among the valuable tables, there are multiple different types, and we need to distinguish among them in order to understand their semantics, which in turn is necessary to provide high quality table search and table answer services. These are challenging problems as they cannot be accomplished via simple rules [5]. Furthermore, providing a high quality table ranking as well as providing table answers with high precision and good coverage are hard problems as well. While there is extensive prior work on table extraction [6, 5, 21, 20, 14, 11], there is limited work on the latter two challenges: table ranking and providing table answers with high precision. In Section 3, we present our table extraction techniques and highlight their differences

with prior table extraction work. We also present the novel approaches we have developed for the latter two challenges.

Impact of web table services: The web table search service was released in Excel PowerQuery in 2013 [2]. It allows Excel users to search and consume public tables directly from Excel. A screenshot is shown in Figure 2(a). The web table answer service has been shipping in Bing since early 2015. It shows table answers for list-intent and other types of queries with 98% precision and with a current coverage of 2%. A screenshot is shown in Figure 2(b).

2 Synonym Service

Given an entity name, the synonym service returns all the synonyms of the entity. We mine all possible synonym pairs in an offline process (200 million pairs in our latest version); the service simply performs a lookup into that data. Currently the service is hosted as a public Bing service in the Bing Developer Center [1]. In the rest of this section, we focus on the key technologies used in the offline mining process.

2.1 Synonym Mining Requirements

Based on the intended use cases, we summarize the key requirements of synonym mining as follows.

Domain independence. Entity synonyms are ubiquitous in almost all entity domains. A natural approach is to leverage authoritative data sources specific to each entity domain to generate synonyms. For example, one may use extraction patterns specific to IMDB for movie synonyms. However, techniques developed this way are specific to one domain and cannot easily generalize to different domains. Given the scale and the variety of the synonyms we are interested in, developing and maintaining specific techniques for each domain is unlikely to scale. Our goal is to develop domain-independent methods to systematically harvest synonyms for all domains.

High precision. Since the output of our service is used by an array of Microsoft products and third-party retailers, who would for example use synonyms to enrich their product catalogs, the precision of our synonyms needs to be very high (e.g., above 95%). Entities that are only related but not equivalent (e.g., “Microsoft office 2015” and “Microsoft office 2013”) should not be considered synonyms, for otherwise they will adversely affect downstream applications like product search.

Good recall. In addition to high precision, we want to discover as many synonyms as possible. The types of synonyms we are interested in range from simple syntactic name variations and misspellings (e.g., “Cannon 600d” for the entity “Canon 600d”), to subset/superset variations (e.g., “Canon EOS 600d” for the entity “Canon 600d”), to more semantic synonyms (e.g., “Canon rebel t3i” or “t3i slr” for the entity “Canon 600d”). The synonyms we produce should ideally cover all these types.

Freshness. Since new entities (movies, products, etc.) are constantly being created, and new names coined for existing entities, we want the synonym data to be up-to-date. The mining process thus needs to be refreshed regularly to reflect recent updates, and hence needs to be easily maintainable with minimal human intervention.

2.2 Prior Work on Synonym Mining

Prior work on discovering entity synonyms relies on query co-clicks [10]. Our experience suggests that this alone often leads to many false positives. For example, name pairs like “iphone 6” and “iphone 6s”, or “Microsoft office 2015” and “Microsoft office 2013”, share significant co-clicks and are almost always predicted as synonyms. We find synonyms so generated to have considerably lower precision than the 95% requirement, and incorrect synonyms like the ones above are particularly damaging to application scenarios such as product catalog enrichment. In this work we develop novel features utilizing a variety of orthogonal data assets to overcome the limitations of query logs.

The problem of finding semantically similar/related queries is related to synonym finding, and is extensively studied in the literature [3, 4, 12]. The fuzzy notion of semantic relatedness used in this body of work, however,

does not match our precise requirement of entity-equivalence for synonyms, and is thus insufficient for the high-precision entity synonyms that we intend to produce.

Figure 3: Example query log click graphs (queries such as “microsoft excel”, “ms spreadsheet”, “ms excel tutorial” and “office ios”, and the documents they click on).

2.3 Exploiting Bing Data Assets for Synonym Mining

At a high level, our synonym mining has three main steps: (1) generate billions of candidate name pairs that may be synonyms; (2) for each candidate pair, compute rich features derived from data assets such as Bing query logs, web tables, and web documents; (3) utilize manually labeled training data and machine learning classifiers to make synonym predictions for each candidate pair.

We start by generating pairs of candidate names for synonyms. In order to be inclusive and not miss potential synonyms in this step, we include all pairs of queries from the query logs that clicked on the same document at least twice. This produces around 3 billion candidate synonym pairs. For each candidate pair, we then compute a rich set of features derived from various data sources. Given the feature vectors, we use training data and a boosted regression tree [13] to train a binary classifier to predict synonyms. Since the boosted regression tree is a standard machine learning model, we focus our discussion on the design of features using the various data sets.

Query log based features. Query logs are one of the most important data assets for synonym mining. The key idea here is the so-called “query co-clicks” [7, 10]. Namely, if search engine users frequently click on the same set of documents for both query-A and query-B, then these two query strings are likely to be synonyms. The rationale here is that search engine clicks form implicit feedback of query-document relevance, which, when aggregated over millions of users and a long period of time, provides robust statistical evidence of synonymity between two query strings. In the example of Figure 3, suppose “microsoft excel” is the entity of interest. For this query users click on the three documents in the middle, represented by their urls. If we look at other queries whose clicks share at least one document, we can see that “ms spreadsheet” clicks on the exact same set of documents (a true synonym of “microsoft excel”). Both “microsoft spreadsheet” and “ms excel tutorial” share two co-clicks with “microsoft excel”. While the first query is a true synonym, the second is only a related entity (a tutorial) and thus not a synonym. Lastly, the query “office ios” shares only one clicked document with “microsoft excel”, indicating a lower degree of semantic relationship. Intuitively, the higher the overlap between the clicks shared by two query strings, the more likely they are actual synonyms. We use a number of metrics to quantify the relationship between two queries: Jaccard similarity and Jaccard containment when representing their clicked documents as sets, and Jensen-Shannon divergence when representing click frequencies on documents as distributions. We further encode, for each query, frequency-based features such as the number of clicks, the number of distinct documents clicked and the number of domains clicked as additional features, which allow machine learning models to differentiate, for example, popular queries from less popular ones.

One problem we encounter in using query logs is that click-edges in the query-document bipartite graph are sometimes too sparse. To overcome sparsity, we enhance the co-click graph by introducing artificial edges using pseudo-documents [7]. Specifically, we construct a pseudo-document p(d) for each clicked document d as the union of the tokens in the queries that click on d. Based on this definition, in Figure 3 for example, let d1 be the first document; then p(d1) = {microsoft, excel, ms, spreadsheet}. We can then add an artificial edge between query q and document d if the tokens of q are contained in the pseudo-document p(d). In Figure 3, because the tokens of the query “microsoft spreadsheet” are contained in p(d1), we add an edge between the two (shown as a dotted edge) as if such a click existed in the original click graph. Using this enhanced bipartite graph, we can define set-based similarity and distribution-based similarity as before. These form a group of pseudo-document-based features.

Notice that in the example of Figure 3, “ms excel tutorial” and “microsoft excel” are related entities with relatively high co-clicks, which however should not be recognized as synonyms because they are of different entity types (e.g., one is software whereas the other is a tutorial). Similarity-based features alone may mistakenly predict them as synonyms. In order to penalize pairs of candidates with different types, we introduce query-context based features. Figure 4 gives an example of query contexts. For each target query (e.g., “microsoft excel”, “ms spreadsheet”, etc.), we find from the query logs additional queries that have the target query as a prefix or suffix. For “microsoft excel”, we find queries like “microsoft excel download” and “microsoft excel help”. From these, we compute the suffix-context of the query “microsoft excel” as “download”, “help” and “review” (and their respective frequencies). These contexts are indicative of query types, because queries of the same entity type tend to share similar contexts. In this example the context of “microsoft excel” is very similar to that of “microsoft spreadsheet”, showing that they are likely of the same entity type. However, the context of “microsoft excel” is quite different from that of “ms excel tutorial”, which we use as evidence that the pair may be of different types. We compute distributional similarity between two candidates' query contexts, such as Jensen-Shannon divergence [16], as features for type similarity.

Figure 4: Example query contexts.

Web table based features. While query-log based features are powerful positive signals for semantic similarity, in many cases they are insufficient to differentiate between true synonyms and pairs of highly related names that are non-synonyms. For example, the pair “Harry Potter and the Deathly Hallows: Part 1” and “Harry Potter and the Deathly Hallows: Part 2” share substantial co-clicks in the query logs. It is thus difficult to know from the query logs alone that they should not be synonyms. For this we leverage another set of signals from HTML tables extracted from web pages indexed by Bing. Specifically, we observe that if two entities occur more frequently (than pure coincidence) in the same table columns, they are likely to be different entities and thus not synonyms.
The reasoning is that humans are unlikely to put two synonymous mentions of the same entity in the same table column.

In a real web table example shown in Figure 5(a), the fact that “Harry Potter and the Deathly Hallows: Part 1” and “Harry Potter and the Deathly Hallows: Part 2” co-occur in the same table column indicates that they are likely to be distinct entities instead of synonyms. We aggregate such statistical information from over 100 million tables extracted from the web, and use the point-wise mutual information of two candidate names occurring in the same table columns as an additional feature. Another way in which we can leverage web tables is to directly utilize the synonymous column pairs often seen in tables. Figure 6 gives a few example tables in different domains where two values from the same row are synonyms. We use an initial version of the synonym scores to discover such pairs of synonymous columns from the web table corpus, and then compute the number of times name-A and name-B occur in the same rows of synonymous columns as additional features. We find this feature to be helpful in improving the coverage of tail entities.

Web document based features. Web documents indexed by Bing provide an orthogonal source of signals. In particular, we consider two main groups of signals: text patterns and anchor texts. For text patterns, we scan all documents in Bing's crawl for patterns such as “name-A, also known as name-B”, “name-A, aka name-B”, “name-A, otherwise known as name-B”, etc. Given the huge variety and extensive coverage of web documents, this helps us capture a wide range of synonymous names. Figure 5(b) shows an example illustrating this idea: the names “beyonce giselle knowles carter” and “beyonce” frequently co-occur in these AKA text patterns, and they are indeed synonyms. We simply count the number of times name-A and name-B occur in such patterns in all Bing documents as a feature for these two names. We note that the idea of using text patterns is well studied, especially in the information extraction literature (e.g., [8, 17]); we leverage such patterns in the context of synonym mining as complementary signals derived from an orthogonal data source to achieve improved result quality. For anchor texts, we utilize the observation that anchor texts describing the same hyperlink are often interchangeable in meaning. So for each hyperlink in Wikipedia, we extract all anchor texts associated with that link, and count the number of times name-A and name-B are used as anchor texts pointing to the same link. For instance, both “Jennifer Lopez” and “JLO” are used as anchor texts describing the link pointing to her Wikipedia page, and we use their respective frequencies pointing to the same page as features. This is another beneficial feature derived from distant supervision.

Figure 5: Web tables and document features. (a) Value co-occurrence in tables as negative features. (b) AKA text patterns.

Wikipedia and Wiktionary. Wikipedia has rich redirect information. For example, “J.Lo” and “JLO” are redirected to the “Jennifer Lopez” page on Wikipedia. We extract such information from a dump of the Wikipedia pages as features. Synonyms as defined by a traditional thesaurus are also useful; we parse such information from Wiktionary as additional features.
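To make the query-log signals described earlier in this subsection more concrete, the following is a minimal Python sketch (with toy data) of the set-based and distribution-based co-click similarities and the pseudo-document augmentation; it illustrates the ideas above and is not the production feature pipeline. The same divergence helper can also be applied to the prefix/suffix context distributions of Figure 4.

# Simplified illustration of the co-click and pseudo-document features
# described above (not the production pipeline).
import math
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two clicked-document sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def containment(a, b):
    """Jaccard containment: fraction of a's clicked documents that b shares."""
    return len(a & b) / len(a) if a else 0.0


def js_divergence(p, q):
    """Jensen-Shannon divergence between two normalized click-frequency
    distributions (document -> probability)."""
    docs = set(p) | set(q)
    m = {d: 0.5 * (p.get(d, 0.0) + q.get(d, 0.0)) for d in docs}
    def kl(dist):
        return sum(dist[d] * math.log2(dist[d] / m[d])
                   for d in docs if dist.get(d, 0.0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)


def pseudo_documents(clicks):
    """p(d): union of the tokens of all queries that clicked on document d."""
    p = defaultdict(set)
    for query, docs in clicks.items():
        for d in docs:
            p[d].update(query.split())
    return p


def augment(clicks):
    """Add an artificial click edge (q, d) whenever q's tokens are contained
    in the pseudo-document p(d), densifying the bipartite click graph."""
    p = pseudo_documents(clicks)
    out = {q: set(docs) for q, docs in clicks.items()}
    for q in clicks:
        tokens = set(q.split())
        out[q] |= {d for d, pd in p.items() if tokens <= pd}
    return out


# Toy example in the spirit of Figure 3
clicks = {"microsoft excel": {"d1", "d2", "d3"},
          "ms spreadsheet": {"d1", "d2", "d3"},
          "ms excel tutorial": {"d1", "d2"},
          "office ios": {"d3"}}
print(jaccard(clicks["microsoft excel"], clicks["ms spreadsheet"]))          # 1.0
print(containment(clicks["ms excel tutorial"], clicks["microsoft excel"]))   # 1.0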

Figure 6: Example tables with synonymous columns.

Other syntactic features. As additional negative signals, we describe the syntactic difference between two names using character types and lengths, to leverage observations such as: if two names differ only by a number, they are unlikely to be synonyms (e.g., “ford mustang” and “ford mustang 2015”).

Quality evaluation. After producing feature vectors for all pairs of candidate names, we train a boosted regression tree model [13] to predict synonymity for each name pair. Based on a calibration of the classification scores using a holdout set of labeled data, we threshold the scores to produce over 200 million synonym pairs with very high precision. We take a sample from the produced name pairs and perform manual labeling to assess result quality. The result is shown in Table 1. Recall here is defined simply as the number of synonyms produced for each sampled name, where tier-1 recall only counts misspellings and name variations (“Canon 600d” and “Cannon 600d”), tier-2 only counts subset/superset synonyms (“Canon 600d” and “Canon EOS 600d”), and tier-3 counts semantic synonyms that do not belong to the two previous categories (“Canon 600d” and “EOS rebel t3i”).

Precision    Tier-1 recall    Tier-2 recall    Tier-3 recall
0.973        1.43             2.12             1.12

Table 1: Quality evaluation of synonyms. Recall is defined as the avg. number of synonyms produced per entity.
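The final prediction step can be sketched as follows, using scikit-learn's gradient-boosted trees as a stand-in for the boosted regression tree model of [13]; the hyper-parameters and the 0.9 threshold are illustrative, whereas the actual threshold is calibrated on a labeled holdout set as described above.

# Sketch of the prediction step: train a gradient-boosted classifier on the
# candidate-pair feature vectors and keep only high-confidence synonym pairs.
# GradientBoostingClassifier stands in for the boosted regression tree of [13];
# the 0.9 threshold is illustrative, not the calibrated production value.
from sklearn.ensemble import GradientBoostingClassifier


def train_synonym_classifier(X_train, y_train):
    """X_train: one feature vector per candidate name pair; y_train: 0/1 labels."""
    model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    model.fit(X_train, y_train)
    return model


def predict_synonyms(model, candidate_pairs, X, threshold=0.9):
    """Keep only pairs whose predicted synonym probability clears the threshold,
    trading recall for very high precision."""
    scores = model.predict_proba(X)[:, 1]
    return [pair for pair, score in zip(candidate_pairs, scores) if score >= threshold]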

3 Web Table Services

We describe the key techniques that enable the web table search and web table answer services. Both services rely on the web table extraction and understanding component, which we describe first. The web table search service takes a keyword query as input and returns a ranked list of web tables relevant to the query. Since the tables are already extracted, the key remaining challenge is to provide a high quality ranking over those tables. Ranking features used in web page ranking or in prior work on web table search are not adequate for our purpose [19]. We develop novel ranking features to address this challenge; we describe them in the second subsection. The web table answer service also takes a keyword query (a general web search query) but returns a single table if it is the best possible answer (among all possible types of answers as well as the top algorithmic search result); otherwise it returns nothing. The main challenge here is to provide table answers with high precision and good coverage. We develop novel approaches to address this challenge, which we describe in the last subsection.

Figure 7: Different types of HTML tables. (a) is a layout table, (b) is an attribute-value table, (c) is a relational table.

3.1 Prior Work on Web Tables

Web table extraction: There is significant work on web table extraction in the literature [6, 5, 21, 20, 14, 11]. One of the key challenges is to distinguish between the different types of tables, viz. layout, relational and attribute-value tables. Most of the above works formulate the problem as a machine learning classification task and design features for the task. Examples of features that distinguish relational tables from other ones are cell value consistency along columns and the number of empty cells [6, 5, 21]. We adopt a similar approach in our work. While there is overlap between our features and those proposed in earlier work, we also employ novel features like header features to improve accuracy.

Web table search: Venetis et al. developed a seminal system for keyword search on web tables [19]. They annotate each table with column labels that describe the class of entities that appear in that column (using an isA database). Then they look for matches between the query keywords and the annotations as well as the column names. This results in much better quality than simply piggybacking on Google's ranking of web pages. Their approach is not adequate for our services for multiple reasons. First, for the web table search service, some classes of queries require us to find matches inside the data cells (e.g., row-subset queries, entity-attribute queries); the above work does not consider such matches. Second, for the web table answer service, there is a hard requirement of 98% precision while maintaining good coverage, and the above approach is not optimized for this requirement.

3.2 Web Table Extraction and Understanding Component

The goal of this component is to extract tables from Bing's web crawl corpus and perform table understanding and annotation. These tables are subsequently used in the web table search and web table answer services. We first obtain the raw HTML tables (i.e., elements enclosed by <table> tags) by parsing each document in the crawl corpus into a DOM tree.

The main challenge is that not all raw HTML tables contain valuable data; a majority of them are used for layout purposes (e.g., page layout, form layout). Figure 7(a) shows an example of a layout table. We need to identify and discard these tables. Among the valuable tables, there are two main types, namely relational and attribute-value tables. A relational table is one where each row corresponds to a named entity and the columns correspond to different attributes of the entities [5, 6]. Figure 7(c) shows an example of a relational table where each row corresponds to a city (in Washington) and the columns correspond to different attributes like the county the city belongs to, the city's population and the city's land area. On the other hand, an attribute-value table contains different attributes and their values for a single entity [11]; Figure 7(b) shows an attribute-value table for the entity ‘Leica Q’. We need to distinguish between these two types of tables in order to understand their semantics, which in turn is necessary to provide high quality table search and table answer services. These are challenging problems as they cannot be accomplished via simple rules [5].

We first present our approach to distinguish between the different types of tables. Like prior work, we formulate the problem as a machine learning classification task and propose features for the task [6, 5, 21, 20]. We improve upon those works by proposing novel features like header features. We describe all the features below for the sake of completeness.

Distinguishing among different types of tables: We identify several features to distinguish the different types of tables, categorized as follows (a small sketch of these features appears at the end of this subsection):

• Column-wise homogeneity: One of the main features that distinguishes relational tables from attribute-value tables is column-wise homogeneity. The values in a column of a relational table are typically homogeneous, as the column contains the values of different entities on the same attribute. For a string-valued attribute, all cells have string values and their string lengths are similar to each other (e.g., the three leftmost columns of the table in Figure 7(c)). Similarly, for a numeric attribute, all cells have numeric values (e.g., the four rightmost columns of the table in Figure 7(c)). A column in a relational table typically does not contain a mix of string and numeric values. On the other hand, the second (counting from left) column of an attribute-value table typically contains a mix of string and numeric values, as it contains values on different attributes; for example, the second column in the table in Figure 7(b) contains such a mix. We capture the above observations with the following features: (i) the fraction of string-valued cells in the first and second columns; (ii) the fraction of numeric-valued cells in the first and second columns; (iii) the mean and standard deviation of the string lengths of cells in the first and second columns; (iv) the mean and standard deviation of the string lengths of cells averaged across all columns. A high value on either (i) or (ii) indicates homogeneity; we capture that with a derived feature equal to the maximum of (i) and (ii).

• Row-wise homogeneity: Layout tables are also often column-wise homogeneous, so column-wise homogeneity alone cannot distinguish between relational and layout tables. We use row-wise homogeneity for that purpose. All rows in a relational table typically have the same number of cells and they are typically non-empty. On the other hand, rows in a layout table might have different numbers of non-empty cells. For example, in the layout table in Figure 7(a), the first, second and third rows have 2, 3 and 1 non-empty cells respectively. We compute the standard deviation of the number of cells (and of non-empty cells) in different rows as features.

• Row and column counts: Row and column counts also help distinguish between the different types of tables. Layout tables often have a very small number of rows and/or a very small number of columns (e.g., the layout table in Figure 7(a) has only 3 rows). Attribute-value tables typically have 2 columns and a small number of rows. On the other hand, relational tables often have a much larger number of rows and more than 2 columns. We compute the number of rows and the number of columns as features.
• Presence of header: If a table contains a header row that is explicitly marked by <th> or <thead> tags and it “aligns” with the rest of the table (i.e., has the same number of columns as the other rows), it is most likely a relational table. This is especially true if there is a column where all the cell values are numeric except the cell value in the header row (which has a string value). We compute boolean features to capture the above insights.

We manually labeled about 5000 tables as relational, layout and attribute-value and trained a decision forest model on those labeled examples. We tuned our model for high precision; our relation classifier has a precision of 96%. We focus on relational tables in our project as we believe that these tables are most useful for joining with business data tables in spreadsheet scenarios; we plan to include attribute-value tables in the future. The rest of the discussion focuses on relational tables.

This component also tries to “understand” the relational tables. Each relational table typically has one column that contains the names of the entities that correspond to the rows of the table [19, 20]. We refer to it as the entity column of the table. For example, in the table in Figure 7(c), the leftmost column is the entity column. It is important to pinpoint the entity column for high-quality search.

Consider two entity-attribute queries: {aberdeen population 2014} and {grays harbor population 2014}. Both queries “match” the first data row in the table. However, it is a correct match for the first query (as the row contains the 2014 population of Aberdeen) but not for the second query (as it does not contain the 2014 population of Grays Harbor). This can be ascertained only if we know the entity column. We identify the entity column using machine learning techniques. Another example of table understanding is column name extraction, especially when the column names are not marked explicitly via <th> or <thead> tags. Figure 8(a) shows the various subcomponents of this component.

Figure 8: System architectures for (a) the table extraction and understanding component (relational table extraction over the web snapshot and focused crawls such as data.gov and census data, table understanding, and index building) and (b) the table search service (front end, query expansion, string mapping and IDF services, table content services, and local/global aggregators over the keyword and feature indexes).
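As a rough illustration of the table-type classification features listed earlier in this subsection (column-wise homogeneity, row-wise homogeneity, row/column counts, header presence), the following sketch extracts them from a table represented as a list of rows of cell strings; the production classifier uses a richer feature set and learned thresholds.

# Illustrative feature extraction for the table-type classifier described above.
# A table is a list of rows; each row is a list of cell strings.
import statistics


def is_numeric(cell):
    """Treat a cell as numeric if it parses as a number (commas allowed)."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def column(table, j):
    """Non-empty cells of column j."""
    return [row[j] for row in table if j < len(row) and row[j].strip()]


def table_features(table, has_marked_header=False):
    n_rows = len(table)
    n_cols = max((len(row) for row in table), default=0)
    feats = {"n_rows": n_rows, "n_cols": n_cols,
             "has_marked_header": has_marked_header}

    # Column-wise homogeneity on the first two columns: fraction of numeric
    # vs. string cells, plus a derived max() that is high for homogeneous columns.
    for j in (0, 1):
        cells = column(table, j)
        frac_num = sum(map(is_numeric, cells)) / len(cells) if cells else 0.0
        feats[f"col{j}_frac_numeric"] = frac_num
        feats[f"col{j}_frac_string"] = 1.0 - frac_num if cells else 0.0
        feats[f"col{j}_homogeneity"] = max(frac_num, 1.0 - frac_num) if cells else 0.0
        lengths = [len(c) for c in cells if not is_numeric(c)]
        feats[f"col{j}_strlen_std"] = statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

    # Row-wise homogeneity: layout tables tend to have rows with varying
    # numbers of non-empty cells, relational tables do not.
    non_empty = [sum(1 for c in row if c.strip()) for row in table]
    feats["nonempty_cells_std"] = statistics.pstdev(non_empty) if n_rows > 1 else 0.0
    return feats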

3.3 Web Table Search Service

The web table search service takes a keyword query as input and returns a ranked list of web tables relevant to the query. The architecture as well as the query processing of the web table search service are similar to those of a general web search engine. The architecture is shown in Figure 8(b). The indexed items are web tables instead of web pages. The inverted index over web tables, as well as indexes containing features about web tables (e.g., number of rows, PageRank), are distributed over many computers, each corresponding to a subset of the web table corpus. Each of those computers runs a local aggregator service that traverses the inverted index and performs top-k ranking among the table subset on that computer. When a query arrives, the global aggregator service distributes the query to the local aggregator services to search simultaneously. It then consolidates all the local top-k results and computes the global top-k. Finally, it invokes the table content service (which stores the full content of each table) to generate query-dependent snippets for the top-k tables.

The key challenge here is to provide a ranking of high quality. Ranking features used in web page ranking or in prior work on web table search are not adequate for our purpose [19]. The above features are adequate when the user searches based on the description of the tables; in that case, we only need to look for keyword matches in the description fields (e.g., page title, table caption) and column names. However, there are some classes of queries which require us to find matches inside the data cells as well as the alignment of those matches (e.g., in the same column). One such class is row-subset queries, where the ideal answer is a subset of rows in a bigger table and that subset can be obtained by filtering on the value of an attribute. Consider the query ‘largest software companies in usa’. The table in Figure 2(b) contains the largest software companies from all over the world (including several from the USA) and is hence relevant. However, ‘usa’ is not mentioned anywhere in the

page except in the cells of the ‘Headquarters’ column. To identify the above table as a relevant one, we need to compute a table alignment feature that indicates that ‘usa’ occurs multiple times in different rows of the same (non-entity) column of the table. Prior works do not compute such features and hence fail to return such a table. We incorporate such features in our table search engine. Another such class is entity-attribute queries. Consider the query ‘aberdeen population’, where the user is looking for a table containing the population of Aberdeen. The ideal table should contain ‘aberdeen’ in a cell in the entity column and ‘population’ as the column name of a non-entity column. Once again, the above approaches fail to return the relevant table. We incorporate such table alignment features in our service. To compute these features, we need to know the row and column indexes of keyword hits in data cells and the column indexes of keyword hits in column names. A document search engine does not store such fine-grained information; we design our inverted index to store it.
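A minimal sketch of the kind of cell-level postings this implies is shown below: each posting records the field and the row/column position of a hit, which is enough to compute an alignment feature such as "the keyword hits many rows of the same non-entity column". The data layout and scoring here are illustrative assumptions, not the actual index format.

# Illustrative cell-level inverted index: each posting records the table, the
# field (cell vs. column name), and the row/column position of the hit, so
# alignment features can be computed at query time.
from collections import defaultdict


class TableIndex:
    def __init__(self):
        # token -> list of (table_id, field, row, col); row is None for column names
        self.postings = defaultdict(list)

    def add_table(self, table_id, column_names, rows):
        for col, name in enumerate(column_names):
            for tok in name.lower().split():
                self.postings[tok].append((table_id, "column_name", None, col))
        for r, row in enumerate(rows):
            for c, cell in enumerate(row):
                for tok in str(cell).lower().split():
                    self.postings[tok].append((table_id, "cell", r, c))

    def column_alignment(self, token, table_id, entity_col=0):
        """Number of distinct rows of a single non-entity column hit by the token,
        maximized over columns; high for row-subset queries such as
        'largest software companies in usa' hitting a Headquarters-like column."""
        rows_per_col = defaultdict(set)
        for tid, field, r, c in self.postings.get(token.lower(), []):
            if tid == table_id and field == "cell" and c != entity_col:
                rows_per_col[c].add(r)
        return max((len(rows) for rows in rows_per_col.values()), default=0)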

3.4 Web Table Answer Service

The web table answer service takes a keyword query (a general web search query) and returns a single table if it is the best possible answer (among all possible types of answers as well as the top algorithmic search result); otherwise, it does not return any answer. The main challenge is to achieve high precision while still having significant coverage.

One of the main challenges is that a perfect match of the query with a table does not guarantee that the table is the best possible answer. A table answer may not be the best type of answer at all; the top algorithmic search result or some other type of answer (e.g., an entity information card or a passage answer) might be better. Consider the query ‘washington state climate’. The table with caption “Climate data for Washington State (1895-2015)” in en.wikipedia.org/wiki/Climate_of_Washington is a perfect match: Bing returns that page at the second position, and both the table caption and the page title perfectly match the query. But a passage answer is a better and more compact answer; both Bing and Google return a passage answer for this query.

We address this challenge by following a two-step approach. In the first step, we determine whether a table answer is the best type of answer; this is irrespective of whether such a table answer exists or not. If yes, we try to find such a table in the second step. We briefly describe the two steps.

Determine whether a table answer is the best answer type: We identify several classes of queries for which a table answer is the best type of answer. One such class is list-intent queries. Such a query seeks two or more entities. An example is ‘largest software companies in usa’ as shown in Figure 2(b). Typically a table containing all the desired entities and their important attributes is the best possible answer. Such queries contain the desired entity type in plural; we obtain the list of possible desired entity types from Bing's knowledge graph. For example, the desired entity type in the above query is ‘company’. However, this is not a sufficient condition. For example, ‘us companies outsourcing’ is not a list-intent query although it contains an entity type in plural. We observe that for list-intent queries, the rest of the query, specifically the pre-modifier (the part preceding the desired entity type) and the post-modifier (the part following it), either specifies constraints on the entities (to narrow down the desired entities) or a ranking criterion. For example, ‘software’ and ‘in usa’ are constraints and ‘largest’ is the ranking criterion. A constraint can be either an adjective or an entity that is “compatible” with the desired entity type. For example, ‘software’ is an adjective that is compatible with company, while ‘usa’ is an entity compatible with company: ‘usa’ is a location entity and companies are often constrained by location. On the other hand, a film entity will not be compatible with company, as companies are typically not constrained by films. The challenge is to check whether the pre-modifier and post-modifier satisfy the above criteria. We address this challenge in two steps. We first create a dictionary of all adjectives compatible with any entity type; we obtain this from Probase [22].
We also create a dictionary of all entities compatible with any entity type; we first manually curate constraining entity types that are compatible with any query entity type (e.g., location entity type is compatible with company) and then obtain the entities of the former types from Bing’s knowledge graph. We also curate a list of ranking keywords. In the first step,

we identify matches of compatible adjectives, compatible entities and ranking keywords in the pre-modifier and post-modifier. Subsequently, we use a probabilistic graphical model to discover the holistic set of matches that best “covers” the query. If the coverage exceeds a certain threshold, we classify the query as a list-intent query. Another class of queries for which a table answer is the best type of answer is superlative queries (e.g., ‘largest software company in usa’). This class is closely related to the class of list-intent queries; we use similar techniques to identify these queries. Under certain conditions, a table answer is the best type of answer for entity-attribute queries as well; we identify this class of queries too. The output of this step is a decision whether the query belongs to one of the above classes and, if yes, the parsed query.

Finding the table answer: Given a parsed query belonging to one of the above classes, this step tries to find a table answer. We first obtain a set of candidate tables among which we try to find the table answer: these are the tables extracted from the top k (k ≤ 10) pages returned by Bing for the query. Finding a table answer within the candidate set is challenging. One challenge is that there are often multiple tables within a page, and many of them are similar from the perspective of keyword match. Consider the query “tom cruise movies”. There are two tables from www.movies.com/actors/tom-cruise/tom-cruise-movies/p284917 in the candidate set: one containing the movies that Tom Cruise acted in and the other containing the actors that Tom Cruise has worked with. Both tables have hits for all the keywords, but only the first table is the right answer since the desired entity type is movie. We address this challenge by leveraging the fine-grained understanding of the roles of keywords in the query, such as desired entity type, constraining entity, constraining concept and ranking keyword. In this example, the desired entity type matches the column name of the entity column of the first table but there is no match with the entity column of the second one. We enrich our ranking features with the above roles to make such distinctions.

Another challenge is that the table answer not only needs to be the best table among the candidate tables but also better than the top algorithmic result in Bing. For example, for college ranking queries, the top Bing result is usually the right page from US News. However, sometimes the ranking list in the US News page is not formatted as a table or the table extractor fails to extract the table from the page. We might be able to find a table among the other candidate tables that perfectly matches the user's intent but comes from a less well-known source. We do not want to show such a table answer above the proven top result. We identify such cases by using the click features (click count, click-through rate) of both the page containing the candidate table and the most clicked Bing search result for that query.
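The list-intent check can be sketched as follows; the dictionaries are tiny stand-ins for the adjective, entity and ranking-keyword dictionaries mined from Probase and the knowledge graph, and a simple coverage ratio replaces the probabilistic graphical model used in the actual system.

# Simplified stand-in for the list-intent classifier described above: the
# dictionaries are toy examples and the coverage score replaces the
# probabilistic graphical model of the real service.
PLURAL_ENTITY_TYPES = {"companies": "company", "movies": "movie"}
COMPATIBLE_ADJECTIVES = {"company": {"software", "public"}}
COMPATIBLE_ENTITIES = {"company": {"usa", "india", "california"}}   # location entities
RANKING_KEYWORDS = {"largest", "best", "top", "fastest"}
STOP_WORDS = {"in", "of", "the"}


def list_intent(query, coverage_threshold=0.8):
    tokens = query.lower().split()
    etype_pos = next((i for i, t in enumerate(tokens) if t in PLURAL_ENTITY_TYPES), None)
    if etype_pos is None:
        return None                      # no plural entity type -> not list-intent
    etype = PLURAL_ENTITY_TYPES[tokens[etype_pos]]
    modifiers = tokens[:etype_pos] + tokens[etype_pos + 1:]
    covered = [t for t in modifiers
               if t in COMPATIBLE_ADJECTIVES.get(etype, set())
               or t in COMPATIBLE_ENTITIES.get(etype, set())
               or t in RANKING_KEYWORDS
               or t in STOP_WORDS]
    coverage = len(covered) / len(modifiers) if modifiers else 1.0
    return etype if coverage >= coverage_threshold else None


print(list_intent("largest software companies in usa"))   # -> 'company'
print(list_intent("us companies outsourcing"))             # -> None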

4 Conclusion

In this paper, we provide a brief overview of the data services we have been developing by leveraging Bing's data assets. Specifically, we described two data services: the synonym service and the web table services. Both have been successfully integrated into various Microsoft products and services. One of the key lessons is that commercial applications require data services with very high precision (in the high nineties); otherwise, the result is user dissatisfaction. While it is usually straightforward to obtain about 80% precision (for both the synonym and web table services), it is hard to obtain 95+% precision while still having good coverage. As described in this paper, novel approaches are needed to reach that level of precision.

Acknowledgements

We thank Tao Cheng, Dong Xin and Venkatesh Ganti for their core research contributions to the Synonym project. We thank David Simpson, Guihong Cao, Han Wang, Yi Li and Jiang Wu from Bing, who collaborated closely with us to ship the web table answers in Bing. We thank Minghui (Jason) Xia from Bing, whose team integrated our synonyms data into Bing's entity-linking technology. We thank James Finnigan, Konstantin Zoryn, Jean-Sebastien Brunner and Dennis Churin from the SQL Server Information Services team, who shipped

the web table search service in Excel PowerQuery. Finally, we thank Arnd Christian König and Vivek Narasayya for their feedback on the initial draft of the paper.

References

[1] Bing Synonyms API. http://datamarket.azure.com/dataset/bing/synonyms, 2012.
[2] Search public data (Power Query) - Excel. https://support.office.com/en-us/article/Search-public-data-Power-Query-12e0d27e-2e91-4471-b422-c20e75c008cd?ui=en-US&rs=en-US&ad=US, 2013.
[3] Ricardo Baeza-Yates, Carlos Hurtado, and Marcelo Mendoza. Query recommendation using query logs in search engines. In Proceedings of the 2004 International Conference on Current Trends in Database Technology, 2004.
[4] Ricardo Baeza-Yates and Alessandro Tiberi. Extracting semantic relations from query logs. In Proceedings of KDD, 2007.
[5] Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wang, Eugene Wu, and Yang Zhang. WebTables: exploring the power of tables on the web. PVLDB, 1(1):538–549, 2008.
[6] Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, and Eugene Wu. Uncovering the relational web. In WebDB, 2008.
[7] Kaushik Chakrabarti, Surajit Chaudhuri, Tao Cheng, and Dong Xin. A framework for robust discovery of entity synonyms. In SIGKDD, 2012.
[8] Chia-Hui Chang, Mohammed Kayed, Moheb Ramzy Girgis, and Khaled Shaalan. A survey of web information extraction systems. Transactions on Knowledge and Data Engineering, 2006.
[9] Surajit Chaudhuri, Manoj Syamala, Tao Cheng, Vivek Narasayya, and Kaushik Chakrabarti. Data services for e-tailers leveraging web search engine assets. In ICDE, 2013.
[10] Tao Cheng, Hady W. Lauw, and Stelios Paparizos. Entity synonyms for structured web search. Transactions on Knowledge and Data Engineering, 2011.
[11] Eric Crestan and Patrick Pantel. Web-scale knowledge extraction from semi-structured tables. In WWW, 2010.
[12] Hang Cui, Ji-Rong Wen, Jian-Yun Nie, and Wei-Ying Ma. Probabilistic query expansion using query logs. In Proceedings of WWW, 2002.
[13] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 2001.
[14] Wolfgang Gatterbauer, Paul Bohunsky, Marcus Herzog, Bernhard Krüpl, and Bernhard Pollak. Towards domain-independent information extraction from web tables. In WWW, 2007.
[15] Yeye He, Kaushik Chakrabarti, Tao Cheng, and Tomasz Tylenda. Automatic discovery of attribute synonyms using query logs and table corpora. In Proceedings of WWW, 2016.
[16] Jianhua Lin. Divergence measures based on the Shannon entropy. Transactions on Information Theory, 1991.
[17] Sunita Sarawagi. Information extraction. Foundations and Trends in Databases, 2008.
[18] Bilyana Taneva, Tao Cheng, Kaushik Chakrabarti, and Yeye He. Mining acronym expansions and their meanings using query click log. In WWW, 2013.
[19] Petros Venetis, Alon Y. Halevy, Jayant Madhavan, Marius Pasca, Warren Shen, Fei Wu, Gengxin Miao, and Chung Wu. Recovering semantics of tables on the web. PVLDB, 4(9):528–538, 2011.
[20] Jingjing Wang, Haixun Wang, Zhongyuan Wang, and Kenny Q. Zhu. Understanding tables on the web. In ER, 2012.
[21] Yalin Wang and Jianying Hu. A machine learning based approach for table detection on the web. In WWW, 2002.
[22] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q. Zhu. Probase: A probabilistic taxonomy for text understanding. In SIGMOD, 2012.

Attributed Community Analysis: Global and Ego-centric Views

Xin Huang†, Hong Cheng‡, Jeffrey Xu Yu‡ † University of British Columbia, ‡The Chinese University of Hong Kong [email protected], {hcheng,yu}@se.cuhk.edu.hk

Abstract

The proliferation of rich information available for real-world entities and their relationships gives rise to a type of graph, namely the attributed graph, where graph vertices are associated with a number of attributes. The value set of an attribute can be formed by a series of keywords. In attributed graphs, it is practically useful to discover communities of densely connected components with homogeneous attribute values. The community analysis tasks can be categorized into global network-wide analysis and ego-centric personalized analysis. Global network-wide community analysis considers the entire network, as in community detection, which is to find all communities in a network. On the other hand, ego-centric personalized community analysis focuses on the local neighborhood subgraph of given query nodes, as in community search. Given a set of query nodes and attributes, community search in attributed graphs is to locally detect a meaningful community containing the query-related nodes in an online manner. In this work, we briefly survey several state-of-the-art community models based on various dense subgraphs, and also investigate social circles, a special kind of community formed by the friends in the 1-hop neighborhood network of a particular user.

1 Introduction

Nowadays, with rich information available for real-world entities and their relationships, graphs can be built in which vertices are associated with a set of attributes describing the properties of the vertices. Such attributed graphs exist in many application domains, such as the web, social networks, collaboration networks, biological networks and communication networks. A community (cluster), as a group of densely inter-connected nodes sharing similar properties, naturally exists in real-world networks [24]. In this work, we investigate communities from two aspects: global network-wide and ego-centric personalized. From the global network-wide perspective, we study the task of community detection, which is to identify all communities in a network [13, 17, 23]. On the other hand, in ego-centric personalized community analysis, we study the problem of community search, which is to find meaningful communities containing query-related nodes in a local subgraph. Since the communities defined by different nodes in a network may be quite different, community search with query nodes opens up the prospect of user-centered and personalized search, with the potential of the answers being more meaningful to a user [9]. Recently, several papers [19, 9, 11, 22, 15, 5, 4, 1] have studied community search on graph structure for ego-centric personalized community analysis.


In Section 2, we focus on community detection in attributed graphs. For discovering all communities in an attributed graph, [24, 25, 2] model the problem as graph clustering, which aims to partition the graph into several densely connected components with homogeneous attribute values. We proposed a novel graph clustering algorithm, SA-Cluster, which combines structural and attribute similarities through a unified distance measure. SA-Cluster finds all clusters by considering the full attribute space. However, in high-dimensional attributed graphs [10], the high-dimensional clusters are hard to interpret, or there may even be no significant cluster with homogeneous attribute values in the full attribute space. If an attributed graph is projected onto different attribute subspaces, various interesting clusters embedded in the subspaces can be discovered. Therefore, based on the unified distance measure, we extend SA-Cluster and propose a novel cell-based algorithm, SCMAG, to discover clusters embedded in subspaces, with similar attribute values and cohesive structure [10].

In Section 3, we focus on community search in attributed graphs. Unlike community detection, community search focuses on the local neighborhood of given query-related nodes. Given a set of query nodes and attributes, community search on attributed graphs is to detect densely inter-connected communities containing all required query nodes and attributes in an online manner. First, we introduce one of the best-known query applications on attributed graphs, team formation [12, 14, 6]. Team formation is to find a group of individuals who together possess all the skills required by a task, with low communication cost. Then we show how to generalize the problem of team formation into community search. Next, we briefly summarize several community models based on various dense subgraphs, such as the quasi-clique [4], densest subgraph [22], k-core [19, 15, 5, 1] and k-truss [9, 11]. Finally, we investigate social circles and analyze their power in social contagion. In a social network, for a particular user, social circles are defined as communities in her 1-hop neighborhood network, the network of connections between her friends. The structure of social circles can be modeled as connected components, k-cores and k-trusses. [20] shows that the probability of contagion in a social contagion process is tightly controlled by the number of social circles.

2 Community Detection on Attributed Graphs

In this section, we study community detection on attributed graphs, under the semantics of both the full attribute space and attribute subspaces. We first formulate the problem of graph clustering on attributed graphs by considering both structural connectivity and attribute similarities. Then, we design a unified distance measure to combine structural and attribute similarities. Finally, we briefly review the key ideas of the community detection algorithms, namely SA-Cluster for graph clustering on the full attribute space [24] and SCMAG for graph subspace clustering [10].

2.1 Attributed Graphs

An undirected, unweighted simple graph is represented as G = (V, E) with |V| vertices and |E| edges. When the vertices are associated with attributes, the network structure can be modeled as an attributed graph as follows.

Definition 1 (Attributed Graph): An attributed graph is denoted as G = (V,E, Λ), where V is the set of vertices, E is the set of edges, and Λ = {a1, . . . , am} is the set of attributes associated with vertices in V for describing vertex properties. A vertex v ∈ V is associated with an attribute vector [a1(v), . . . , am(v)] where aj(v) is a set of attribute values of vertex v on attribute aj.

Figure 1 shows an example of a coauthor graph where a vertex represents an author and an edge represents the coauthor relationship between two authors. In addition, each author is associated with an author ID, a research topic, and an age range, which are considered attributes describing the vertex properties. For example, author r8 works on two topics, XML and Skyline.

Figure 1: A Coauthor Network with Two Attributes "Research Topic" and "Age Range". (a) Attributed Graph; (b) Attribute Augmented Graph.

The problem of community detection is to find all communities on the attributed graph, such as the example in Figure 1(a), based on both structural and attribute similarities. Therefore, we formulate the problem as graph clustering on an attributed graph as follows. Attributed graph clustering is to partition an attributed graph G into k disjoint subgraphs {Gi = (Vi, Ei, Λ)}, i = 1, . . . , k, where V = V1 ∪ · · · ∪ Vk and Vi ∩ Vj = ∅ for any i ≠ j. A desired clustering of an attributed graph should achieve a good balance between the following two objectives: (1) vertices within one cluster are close to each other in terms of structure, while vertices in different clusters are distant from each other; and (2) vertices within one cluster have similar attribute values, while vertices in different clusters could have quite different attribute values.
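To make Definition 1 and the clustering setting concrete, the following is a minimal Python sketch, not taken from [24], of one possible in-memory representation of an attributed graph with multi-valued vertex attributes; all class, method, and vertex names are illustrative, and the coauthor edge below is hypothetical rather than read off Figure 1.

```python
from dataclasses import dataclass, field

@dataclass
class AttributedGraph:
    """Sketch of G = (V, E, Lambda): vertices, undirected edges, per-vertex attribute values."""
    attributes: list                                    # Lambda = [a_1, ..., a_m]
    vertices: set = field(default_factory=set)          # V
    edges: set = field(default_factory=set)             # E, stored as frozensets {u, v}
    attr_values: dict = field(default_factory=dict)     # attr_values[v][a_j] = set of values a_j(v)

    def add_vertex(self, v, **values):
        self.vertices.add(v)
        # a_j(v) may contain several values, e.g. an author working on both XML and Skyline
        self.attr_values[v] = {a: set(vals) if isinstance(vals, (set, list, tuple)) else {vals}
                               for a, vals in values.items()}

    def add_edge(self, u, v):
        self.edges.add(frozenset((u, v)))

# Toy usage in the spirit of Figure 1(a); the edge is for illustration only.
G = AttributedGraph(attributes=["topic", "age"])
G.add_vertex("r8", topic={"XML", "Skyline"}, age="50-60")
G.add_vertex("r9", topic="Skyline", age="40-50")
G.add_edge("r8", "r9")
```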

2.2 Attribute Augmented Graph

In the following, we use the attribute augmented graph proposed by [24], which represents attributes explicitly as attribute vertices and attribute edges.

Definition 2 (Attribute Augmented Graph): Given an attributed graph G = (V, E, Λ) with a set of attributes Λ = {a1, . . . , am}, the domain of attribute ai is Dom(ai) = {ai1, . . . , aini} with size |Dom(ai)| = ni. An attribute augmented graph is denoted as Ga = (V ∪ Va, E ∪ Ea), where Va = {vij | 1 ≤ i ≤ m, 1 ≤ j ≤ ni} is the set of attribute vertices and Ea ⊆ V × Va is the set of attribute edges. An attribute vertex vij ∈ Va represents that attribute ai takes its j-th value. An attribute edge (vi, vjk) ∈ Ea iff ajk ∈ aj(vi), i.e., vertex vi takes the value ajk on attribute aj. Accordingly, a vertex v ∈ V is called a structure vertex and an edge (vi, vj) ∈ E is called a structure edge.

Figure 1(b) shows the attribute augmented graph for the coauthor network example. Two attribute vertices, v11 and v12, representing the topics "XML" and "Skyline" are added, and authors are connected to these vertices (shown as dashed lines) according to their topics. For the sake of clear presentation, we omit the attribute vertices and edges corresponding to the age attribute. To solve the attributed graph clustering problem, we need to address two main issues: (1) a distance measure and (2) a clustering algorithm, discussed below.
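Continuing the sketch above (again illustrative rather than the authors' implementation), the attribute augmented graph Ga of Definition 2 can be materialized by adding one attribute vertex per (attribute, value) pair and one attribute edge for every value a structure vertex takes:

```python
def build_augmented_graph(G):
    """Return (V, Va, E, Ea) for Ga = (V ∪ Va, E ∪ Ea) as in Definition 2."""
    attr_vertices = set()   # Va: one vertex per (attribute, value) pair
    attr_edges = set()      # Ea: (structure vertex, attribute vertex) pairs
    for v in G.vertices:
        for a, values in G.attr_values[v].items():
            for val in values:                  # a vertex may take several values of one attribute
                av = (a, val)                   # attribute vertex v_jk
                attr_vertices.add(av)
                attr_edges.add((v, av))         # connect v to every value it takes
    return G.vertices, attr_vertices, G.edges, attr_edges
```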

2.3 A Unified Random Walk Distance

We use the neighborhood random walk model on the attribute augmented graph Ga to compute a unified distance between vertices in V. The random walk distance between two vertices vi, vj ∈ V is based on paths consisting of both structure and attribute edges. Thus it effectively combines the structural proximity and attribute similarity of two vertices into one unified measure. The transition probability matrix PA on Ga is defined as follows.

A structure edge (vi, vj) ∈ E is of a different type from an attribute edge (vi, vjk) ∈ Ea. The m attributes in Λ may also have different importance, and thus may contribute to different degrees in the random walk distance. Without loss of generality, we assume that a structure edge has a weight of ω0, and that the attribute edges corresponding to a1, a2, . . . , am have edge weights ω1, ω2, . . . , ωm, respectively. In the following, we define the transition probabilities between two structure vertices, between a structure vertex and an attribute vertex, and between two attribute vertices. First, the transition probability from a structure vertex vi to another structure vertex vj through a structure edge is

$$p_{v_i, v_j} = \begin{cases} \dfrac{\omega_0}{|N(v_i)|\cdot\omega_0 + \omega_1 + \cdots + \omega_m}, & \text{if } (v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where N(vi) represents the set of structure vertices connected to vi. The transition probability from a structure vertex vi to an attribute vertex vjk through an attribute edge is

$$p_{v_i, v_{jk}} = \begin{cases} \dfrac{\omega_j}{|N(v_i)|\cdot\omega_0 + \omega_1 + \cdots + \omega_m}, & \text{if } (v_i, v_{jk}) \in E_a \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

The transition probability from an attribute vertex vik to a structure vertex vj through an attribute edge is

$$p_{v_{ik}, v_j} = \begin{cases} \dfrac{1}{|N(v_{ik})|}, & \text{if } (v_{ik}, v_j) \in E_a \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

The transition probability between two attribute vertices vip and vjq is 0, as there is no edge between attribute vertices:

$$p_{v_{ip}, v_{jq}} = 0, \quad \forall\, v_{ip}, v_{jq} \in V_a \qquad (4)$$

The transition probability matrix PA is a |V ∪ Va| × |V ∪ Va| matrix, where the first |V| rows and columns correspond to the structure vertices and the rest |Va| rows and columns correspond to the attribute vertices. For ease of presentation, PA is represented as

$$P_A = \begin{bmatrix} P_{V_1} & A_1 \\ B_1 & O \end{bmatrix} \qquad (5)$$

where PV1 is a |V| × |V| matrix representing the transition probabilities defined by Equation (1); A1 is a |V| × |Va| matrix representing the transition probabilities defined by Equation (2); B1 is a |Va| × |V| matrix representing the transition probabilities defined by Equation (3); and O is a |Va| × |Va| zero matrix.
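As an illustration of Equations (1) to (5), PA can be assembled as below. This is a dense NumPy sketch built on the toy data structures introduced earlier, not the authors' code; the function name and argument layout are assumptions.

```python
import numpy as np

def transition_matrix(struct_vs, attr_vs, struct_edges, attr_edges, attributes, omega):
    """Assemble P_A of Equation (5). omega[0] is the structure-edge weight w0 and
    omega[1..m] are the attribute-edge weights, aligned with `attributes`."""
    sv = {v: i for i, v in enumerate(sorted(struct_vs))}          # structure vertices first
    av = {a: len(sv) + i for i, a in enumerate(sorted(attr_vs))}  # then attribute vertices
    P = np.zeros((len(sv) + len(av), len(sv) + len(av)))

    nbrs = {v: set() for v in sv}                                 # N(v_i), structure neighbours
    for e in struct_edges:
        u, v = tuple(e)
        nbrs[u].add(v)
        nbrs[v].add(u)

    weight_of = {a: omega[j + 1] for j, a in enumerate(attributes)}
    denom = {v: len(nbrs[v]) * omega[0] + sum(omega[1:]) for v in sv}  # |N(v_i)|*w0 + w1 + ... + wm

    for v, neigh in nbrs.items():                                 # Equation (1)
        for u in neigh:
            P[sv[v], sv[u]] = omega[0] / denom[v]

    attr_deg = {a: 0 for a in av}                                 # |N(v_ik)| of each attribute vertex
    for _, a in attr_edges:
        attr_deg[a] += 1
    for v, a in attr_edges:
        P[sv[v], av[a]] = weight_of[a[0]] / denom[v]              # Equation (2); a = (attribute, value)
        P[av[a], sv[v]] = 1.0 / attr_deg[a]                       # Equation (3)
    # Equation (4): attribute-to-attribute entries stay zero, matching the O block of Equation (5)
    return P
```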

Definition 3 (Random Walk Distance Matrix): Let PA be the transition probability matrix of an attribute augmented graph Ga. Given L as the length that a random walk can go and c ∈ (0, 1) as the random walk restart probability, the unified neighborhood random walk distance matrix RA is

$$R_A = \sum_{l=1}^{L} c\,(1-c)^{l}\, P_A^{\,l} \qquad (6)$$
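Equation (6) translates directly into a few lines of matrix arithmetic; in the sketch below the default values of L and c are illustrative only, not taken from the paper.

```python
def random_walk_distance(P, L=3, c=0.15):
    """R_A = sum_{l=1..L} c * (1 - c)^l * P^l, as in Equation (6)."""
    R = np.zeros_like(P)
    P_power = np.eye(P.shape[0])
    for l in range(1, L + 1):
        P_power = P_power @ P               # P^l
        R += c * (1 - c) ** l * P_power
    return R
```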

2.4 SA-Cluster Algorithm

SA-Cluster adopts the K-Medoids clustering framework. After initializing the cluster centroids and computing the random walk distance matrix at the beginning of the clustering process, it repeats the following four steps until convergence.

1. Assign vertices to their closest centroids;

2. Update cluster centroids;

3. Adjust attribute edge weights {ω1, . . . , ωm};

4. Re-calculate the random walk distance matrix RA.

Unlike traditional K-Medoids, SA-Cluster has two additional steps (steps 3 and 4): in each iteration, the attribute edge weights {ω1, . . . , ωm} are automatically adjusted to reflect the clustering tendencies of different attributes. Interested readers can refer to [24] for the proposed weight adjustment mechanism. The time complexity of SA-Cluster is O(t · L · |V ∪ Va|³), where t is the number of iterations in the clustering process and O(L · |V ∪ Va|³) is the cost of computing the random walk distance matrix RA. To improve the efficiency and scalability of SA-Cluster, [25] proposes an efficient algorithm, Inc-Cluster, to incrementally update the random walk distances given the edge weight increments. Complexity analysis shows that Inc-Cluster can speed up SA-Cluster by a factor of approximately t. To further speed up Inc-Cluster, [2] designs parallel matrix computation techniques on a multicore architecture.
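Putting the pieces together, the four steps can be sketched as a K-Medoids-style loop over the helpers defined in the earlier sketches. This is only a skeleton, not the authors' implementation: the random centroid initialization and the "highest average score" centroid update are simplifications, and step 3 is stubbed out because the actual weight adjustment is the voting mechanism of [24].

```python
def sa_cluster(G, k, L=3, c=0.15, max_iter=20, seed=0):
    """Skeleton of the SA-Cluster iteration (steps 1-4); helper behaviour is simplified."""
    rng = np.random.default_rng(seed)
    struct_vs, attr_vs, struct_edges, attr_edges = build_augmented_graph(G)
    order = sorted(struct_vs)
    n = len(order)
    omega = [1.0] * (len(G.attributes) + 1)               # w0 and w1..wm, initially equal
    centroids = [int(i) for i in rng.choice(n, size=k, replace=False)]
    assign = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # step 4 (and initialisation): recompute the random walk scores with the current weights
        P = transition_matrix(struct_vs, attr_vs, struct_edges, attr_edges, G.attributes, omega)
        R = random_walk_distance(P, L, c)[:n, :n]         # scores among structure vertices (higher = closer)
        # step 1: assign each vertex to the centroid with the highest random walk score
        assign = np.argmax(R[:, centroids], axis=1)
        # step 2: pick as new centroid the member with the highest average score to its cluster
        new_centroids = []
        for i in range(k):
            members = np.where(assign == i)[0]
            if len(members) == 0:
                new_centroids.append(centroids[i])
                continue
            avg = R[np.ix_(members, members)].mean(axis=0)
            new_centroids.append(int(members[int(np.argmax(avg))]))
        # step 3 (stub): the voting mechanism of [24] would re-adjust omega[1:] here
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return {order[i]: int(assign[i]) for i in range(n)}
```

In the full algorithm the re-adjusted weights change PA in every iteration, which is why the O(L · |V ∪ Va|³) random walk computation dominates the cost and why the incremental update of Inc-Cluster [25] pays off.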

2.5 Subspace Clustering in High-dimensional Attributed Graphs

Although SA-Cluster can differentiate the importance of attributes with an attribute weighting strategy, it cannot get rid of irrelevant attributes completely, especially when the attribute dimensionality |Λ| = m is high. High-dimensional clusters are hard to interpret, or there may even be no significant cluster with homogeneous attribute values in the full attribute space. If an attributed graph is projected onto different attribute subspaces, various interesting clusters embedded in those subspaces can be discovered, which, however, may not be visible in the full attribute space. In the following, we study the problem of subspace clustering in high-dimensional attributed graphs. We first define the criteria for good subspace clusters in terms of homogeneous attribute values and cohesive structure. Then, we propose a novel cell-based subspace clustering algorithm, SCMAG.

2.5.1 Criterion of Subspace Clusters

The clusters discovered in subspaces should not only have homogeneous attribute values, but also have dense connections, i.e., they should correspond to communities with homogeneous properties and cohesive structure.
Attribute Criterion. Given an attribute subspace S ⊆ Λ, the subspace entropy and interest are defined as follows.

Definition 4 (Subspace Entropy): Given a set of attributes S = {a1, . . . , ak} ⊆ Λ, the subspace entropy of S is defined as ∑ ∑ H(a1, . . . , ak)