Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences
Edited by Antonio Pareja-Lora, María Blume, Barbara C. Lust, and Christian Chiarcos

The MIT Press
Cambridge, Massachusetts
London, England

© 2019 Massachusetts Institute of Technology

This work is subject to a Creative Commons CC BY-NC-ND license. Subject to such license, all rights are reserved.

The Open Access edition of this book was published with generous support from the National Science Foundation (grant number BCS-1463196), Pontificia Universidad Católica del Perú, and Knowledge Unlatched.

This book was set in Times New Roman by Westchester Publishing Services. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data
Names: Pareja-Lora, Antonio, editor. | Blume, María, editor. | Lust, Barbara C., 1941– editor. | Chiarcos, Christian, editor.
Title: Development of linguistic linked open data resources for collaborative data-intensive research in the language sciences / edited by Antonio Pareja-Lora, María Blume, Barbara C. Lust, and Christian Chiarcos.
Description: Cambridge : MIT Press, 2019. | Includes bibliographical references and index.
Identifiers: LCCN 2019019588 | ISBN 9780262536257 (paperback)
Subjects: LCSH: Language and languages--Study and teaching. | Language and languages--Research. | Linked data.
Classification: LCC P53 .D398 2019 | DDC 025.06/4--dc23
LC record available at https://lccn.loc.gov/2019019588

Contents

Acknowledgments
Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences: An Introduction
Barbara C. Lust, María Blume, Antonio Pareja-Lora, and Christian Chiarcos
1 Open Data—Linked Data—Linked Open Data—Linguistic Linked Open Data (LLOD): A General Introduction
Christian Chiarcos and Antonio Pareja-Lora
2 Whither GOLD?
D. Terence Langendoen
3 Management, Sustainability, and Interoperability of Linguistic Annotations
Nancy Ide
4 Linguistic Linked Open Data and Under-Resourced Languages: From Collection to Application
Steven Moran and Christian Chiarcos
5 A Data Category Repository for Language Resources
Kara Warburton and Sue Ellen Wright
6 Describing Research Data with CMDI—Challenges to Establish Contact with Linked Open Data
Thorsten Trippel and Claus Zinn
7 Expressing Language Resource Metadata as Linked Data: The Case of the Open Language Archives Community
Gary F. Simons and Steven Bird
8 TalkBank Resources for Psycholinguistic Analysis and Clinical Practice
Nan Bernstein Ratner and Brian MacWhinney
9 Enabling New Collaboration and Research Capabilities in Language Sciences: Management of Language Acquisition Data and Metadata with the Data Transcription and Analysis Tool
María Blume, Antonio Pareja-Lora, Suzanne Flynn, Claire Foley, Ted Caldwell, James Reidy, Jonathan Masci, and Barbara Lust
10 Challenges for the Development of Linked Open Data for Research in Multilingualism
María Blume, Isabelle Barrière, Cristina Dye, and Carissa Kang
11 Research Libraries as Partners in Ensuring the Sustainability of E-science Collaborations
Oya Y. Rieger
List of Contributors
Author Index
Thematic Index

Acknowledgments

This volume and the 2015 workshop, Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences, convened at the University of Chicago in July 2015, were funded by the National Science Foundation with a grant (Award ID 1463196) given to Barbara C. Lust, María Blume, and Antonio Pareja-Lora.
Publication was further supported by a grant from the Cornell University Library and supplemented by the Cornell Institute for Social Science and the Cornell Cognitive Science Program, as well as the Department of Humanities at the Pontificia Universidad Católica del Perú and Knowledge Unlatched. We thank William J. Badecker, NSF Linguistics Program director, whose advice has been invaluable during all stages of this project, and Marc Lowenthal and Anthony Zannino at MIT Press, who guided us to publication. We thank Amy Brand, director of MIT Press, whose pursuit of the development of scholarly communication in the digital age provided support for our project. The Cornell University Library, through Oya Rieger, provided continual advice and support, as well as a critical dimension of library-researcher relations, continuing the early vision of previous Mann Library director, Janet McCue. Emily Bernardski provided key support and coordination for the workshop, as did Carissa Kang and Jonathan Masci, our student support team. Our editors Michelle Melanson and Rebecca Rich Goldweber provided invaluable assistance in volume publication. James Gair provided continual support throughout. Earlier grants that supported the development of the LLOD vision and the related research that established the foundations for this project are: María Blume and Barbara Lust (2008), Transforming the Primary Research Process Through Cybertool Dissemination: an Implementation of a Virtual Center for the Study of Language Acquisition, NSF OCI-0753415; Janet McCue and Barbara Lust (2004), National Science Foundation Award: Planning Information Infrastructure Through a New Library-Research Partnership (SGER = Small Grant for Exploratory Research, NSF 0437603); and Barbara Lust (2003), Planning Grant: a Virtual Center for Child Language Acquisition Research, National Science Foundation, NSF BCS-0126546.
Additional support has been provided by the American Institute for Sri Lankan Studies, the Cornell University Einaudi Center, the Cornell University Faculty Innovation in Teaching Awards, the Cornell Institute for Social and Economic Research (CISER), and the Cornell Institute for Social Sciences. Finally, we gratefully acknowledge the welcome and continual collaboration of other founding members of the Virtual Center for the Study of Language Acquisition (VCLA) for supporting the vision represented in this volume: Suzanne Flynn (MIT, USA); Qi Wang, Marianella Casasola, and Claire Cardie (Cornell University, USA); Elise Temple (The Nielsen Company, USA); Liliana Sánchez (Rutgers University at New Brunswick, USA); Jennifer Austin (Rutgers University at Newark, USA); YuChin Chien (California State University at San Bernardino, USA); and Usha Lakshmanan (Southern Illinois University at Carbondale, USA). We greatly appreciate the collaboration of scholars who are VCLA affiliates, including Sujin Yang (Ewha Womans University, South Korea); Gita Martohardjono (City University of New York Graduate Center and Queens College, USA); Valerie Shafer (City University of New York, USA); Isabelle Barrière (Long Island University–Brooklyn, USA); Cristina Dye (Newcastle University, UK); Yarden Kedar (Beit Berl College, Israel); Joy Hirsch (Columbia University, USA); Sarah Callahan (Assessment Technology Inc., USA); Kwee Ock Lee (Kyungsung University, South Korea); R. Amritavalli (Central Institute of English and Foreign Languages, India); and A. Usha Rani (Osmania University, India).

Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences: An Introduction
Barbara C. Lust, María Blume, Antonio Pareja-Lora, and Christian Chiarcos

This volume arose out of a workshop, Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences, held under the auspices of the Linguistic Society of America (LSA) Summer Institute at the University of Chicago in July 2015. The workshop was organized by Barbara Lust, Antonio Pareja-Lora, and María Blume, with the support of the National Science Foundation (NSF 1463196), supplemented by support from Cornell University’s Institute for Social Sciences and Cognitive Science program. The collection of papers in this volume results from that workshop. Publication was further supported by the Cornell University Library, the Department of Humanities at the Pontificia Universidad Católica del Perú, and Knowledge Unlatched.

The workshop was energized by the transformation in science scholarship that has developed over recent decades and that was envisioned by the National Science Foundation’s Blue-Ribbon Advisory Panel on CyberInfrastructure (Atkins et al. 2003, reviewed and assessed by Borgman 2007). Empowered by the internet, the current digital age has opened unprecedented opportunities for storing, disseminating, sharing, and manipulating large and complex amounts of data, and for making those data open and linked (Berners-Lee 2009; Chiarcos, Hellmann, and Nordhoff 2012). The more that each data singleton can be significantly interlinked, the more powerful and useful it becomes, enabling scholars to pursue new and advanced questions. The more that data are linked, and the more that datasets and data providers are included in this linking, the more that researchers within and across disciplines can partner on the basis of shared data, and the more powerful the research questions they can pursue. Today, many of the sciences, including the social sciences, are being transformed by these developments.
Yet, converting disparate and self-contained databases (data silos) into interlinked resources to facilitate cooperation and synergies between academic researchers