Using Wikidata for Research

Experiences, Opportunities and Challenges for Research Data Management

Using Wikidata for Video Game Research

Vienna, 18.10.2019 Tracy Hoffmann

Leipzig University Library Using Wikidata for Video Game Research

The research project

● diggr (Databased Infrastructure for Global Games Research)
● Collaborative research project funded by the German Research Foundation
● Duration: 01/2017 - 07/2020

● The Team:
  ○ Interdisciplinary (Information Science, Librarianship, Cultural Studies, [Japan|Area] Studies)
  ○ Library's IT department
  ○ Institute for Japanese Studies of Leipzig University

Support Research Data Lifecycle

Research data lifecycle diagram by Jisc CC BY-NC-ND

For Science! Metadata about Video Games and Companies

Data about Video Game Companies

● No dedicated database (companies appear only as entities inside video game databases)
● No common identifier
● Little information

Approach
● Data curation by hand

Leipzig University Library Data-driven Perspectives on FromSoftware Videogames

Data about Video Games

● Many databases (Mobygames, Media Art DB, GameFAQs, IGDB, …)
● No common identifier
● Different data coverage / specialization
● Conceptual differences
● Completeness
● Errors
● Bias
● …

Approach
● Linking databases
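To make the linking idea concrete, here is a minimal sketch of how candidate links between two databases without a common identifier could be proposed by comparing normalized titles. It is an illustration only, not the project's actual method: the function names, the record IDs, and the crude normalization (aimed at Latin-script titles) are all assumptions, and every candidate pair would still need manual review.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Lowercase a title, drop accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", without_marks.lower()).strip()


def candidate_links(db_a: dict, db_b: dict) -> list:
    """Return (id_a, id_b) pairs whose normalized titles are identical.

    db_a and db_b map a database-specific record ID to a game title.
    """
    index_b = defaultdict(list)
    for id_b, title in db_b.items():
        key = normalize_title(title)
        if key:  # skip titles that normalize to an empty string
            index_b[key].append(id_b)

    pairs = []
    for id_a, title in db_a.items():
        for id_b in index_b.get(normalize_title(title), []):
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical records; the IDs are made up for illustration.
database_a = {"a-1": "Dark Souls", "a-2": "Pokémon Red"}
database_b = {"b-7": "DARK SOULS", "b-9": "Pokemon Red", "b-3": "Bloodborne"}
links = candidate_links(database_a, database_b)
```

Exact matching on normalized titles only finds the easy cases. Differing release titles, editions and transliterations are where the conceptual differences listed above show up.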

Two Main Tasks

● Data curation … of publicly available information
● Linking databases … to aggregate information
+ Cooperation with other researchers

Own infrastructure?!
● Development, maintenance, long-term service, time, human resources, ...

There's gotta be something out there …

Data Curation with Wikidata

Simple as that

Wikidata as THE Linking Hub for Video Games

100 identifier properties for video games!

https://www.wikidata.org/wiki/Wikidata:WikiProject_Video_games/Identifiers

Wikidata as THE Linking Hub for Video Games

Example: Dark Souls

● MusicBrainz Work ID: soundtrack
● Metacritic ID: reviews and scores
● subreddit: fan culture (e.g. memes)
● Behind The Voice Actors Video Game ID: voice actors
● speedrun.com game id: speedruns
● Bibliothèque nationale de France ID: library ID
● IGN game ID: news

https://www.wikidata.org/wiki/Wikidata:WikiProject_Video_games/Identifiers
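The external identifiers attached to an item can be listed through the public Wikidata Query Service. The following is a minimal sketch, assuming the standard SPARQL endpoint at query.wikidata.org and its predefined prefixes. The function names and the User-Agent string are hypothetical, and the item ID (QID) of the game has to be looked up on Wikidata and supplied by the caller.

```python
import json
import re
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_identifier_query(qid: str) -> str:
    """Build a SPARQL query listing all external-identifier statements of one item."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"""
SELECT ?propertyLabel ?value WHERE {{
  wd:{qid} ?claim ?value .
  ?property wikibase:directClaim ?claim ;
            wikibase:propertyType wikibase:ExternalId .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?propertyLabel
""".strip()


def fetch_identifiers(qid: str, user_agent: str = "example-script/0.1 (hypothetical)") -> list:
    """Run the query and return (property label, identifier value) pairs.

    Needs network access; replace the placeholder User-Agent with real contact details.
    """
    params = urllib.parse.urlencode({"query": build_identifier_query(qid), "format": "json"})
    request = urllib.request.Request(f"{WDQS_ENDPOINT}?{params}", headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        bindings = json.load(response)["results"]["bindings"]
    return [(row["propertyLabel"]["value"], row["value"]["value"]) for row in bindings]
```

Calling `fetch_identifiers` with the QID of a game returns one row per identifier statement, such as the properties listed in the Dark Souls example.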

Other Examples for Wikidata + Research Data

Bio (GeneDB)
● Waagmeester A, Schriml L, Su A (2019) Wikidata as a linked-data hub for Biodiversity data. Biodiversity Information Science and Standards 3: e35206. https://doi.org/10.3897/biss.3.35206
● Burgstaller-Muehlbacher S, Waagmeester A, Mitraka E, Turner J, Putman T, Leong J, Naik C, Pavlidis P, Schriml L, Good BM, Su AI (2016) Wikidata as a semantic framework for the Gene Wiki initiative. Database 2016: baw015. https://doi.org/10.1093/database/baw015
● Manske M, Böhme U, Püthe C, Berriman M (2019) GeneDB and Wikidata [version 1; peer review: 1 approved, 1 approved with reservations]. Wellcome Open Research 4: 114. https://doi.org/10.12688/wellcomeopenres.15355.1

History
● FactGrid: https://blog.factgrid.de/welcome

General
● Mietchen D, Hagedorn G, Willighagen E, Rico M, Gómez-Pérez A, Aibar E, Rafes K, Germain C, Dunning A, Pintscher L, Kinzler D (2015) Enabling Open Science: Wikidata for Research (Wiki4R). Research Ideas and Outcomes 1: e7573. https://doi.org/10.3897/rio.1.e7573

Advantages

● Other people's infrastructure
● Easy access (for researchers, student assistants, developers)
● Community knowledge
● Provenance and history of statements
● OpenRefine integration -> easy bulk imports
● Collaborative data curation tool -> international collaboration

Learning by doing: Activities

Timeline

● 2017: Project start
● 2018-01: Try own Wikibase
● 2018-05: Company data import
● 2019-01: OLAC Video Game Genre
● Since 2019-01: Company data enhancement
● 2019-08: Link import
● Since 2019-08: Curation of links

Activities

Bulk Imports
− Video game companies (names, locations, inception, industry, identifiers)
− Links of video game databases (Media Art DB, GameFAQs)

Ongoing data enhancement
− Locations and Mobygames ID for video game companies
− New company items
− Links to video game database Mobygames (via Mix'n'match)
− New game items

Discussion
− Community-driven Initiatives and Research Avenues Workshop (July 2019)

Pitfalls and Confusions: Challenges

Challenges Company Data

− Company items often do not meet Wikidata's notability criteria -> some were deleted :(
  Deletion log: Does not meet the notability policy: content was: "Beyond Interactive, Inc."
  − small companies
  − may no longer exist
  − little information available
− Complex relationships between or inside companies (subsidiaries, branches, successors, mergers, …)

Challenges Video Game Data

− Concepts, vocabulary, inconsistencies, ...

Wikidata and Research Data Management?

Support Research Data Lifecycle

− Wikidata can be used to:
  − Collect and capture data
  − Collaborate and analyse data
  − Store data
  − Share data
  − Discover and reuse data

Research data lifecycle diagram by Jisc CC BY-NC-ND

Is it Findable, Accessible, Interoperable, and Reusable (FAIR)?

− yes …

− … but we cannot import everything: the research dataset contains data that is derived from third parties or is not suitable for Wikidata
  − copyright (databases)
  − personal information (credit data)
− Approach a: Own Wikibase instance
  − we failed with that in 2018, but it may become easier for research projects in the future
− Approach b: Data dumps in repositories (also needed for publications!)
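For approach b, a dump deposited in a repository is easier to reuse when it comes with a small machine-readable description. The following is a minimal sketch under stated assumptions: the function name, the file naming and the metadata fields are hypothetical, and a real repository will usually ask for its own metadata schema.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path


def write_dump(rows: list, fieldnames: list, target_dir: str, name: str, source: str):
    """Write rows (a list of dicts) to <name>.csv plus a <name>.metadata.json sidecar."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    csv_path = target / f"{name}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Assumed minimal description of the dump; extend it to match the repository's schema.
    metadata = {
        "file": csv_path.name,
        "source": source,
        "retrieved": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "row_count": len(rows),
        "columns": list(fieldnames),
    }
    meta_path = target / f"{name}.metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return csv_path, meta_path
```

Recording the retrieval time matters because Wikidata keeps changing after the export, while a publication needs a fixed snapshot to cite.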

Summary #sharingiscaring

− After two years:
  − it's getting better and better
  − active community
  − data is (mostly) stable
− We are convinced that:
  − Having research data on Wikidata increases its sustainability
  − When a research project ends, the data remain on an independently funded infrastructure
  − Wikidata allows rapid integration with other domains
  − The project activities are well documented inside Wikidata
  − Open = many eyes = fast curation loops = fewer errors

Next steps

− WikidataCon 2019
  − "Sum of All video games − 2019 edition" (Jean-Frédéric Berthelot, Envel Le Hir, Tracy Hoffmann)
− Work on data model
− Further data curation

Thanks! A big thank you goes out to @Jean-Fred ( ´ ▽ ` )b

https://www.wikidata.org/wiki/Wikidata:WikiProject_Video_games

https://diggr.link/ https://github.com/diggr/

The contents of this presentation - except where otherwise noted and with the exception of the graphic design elements and the logos of the Leipzig University and its institutions - are licensed under Creative Commons Attribution 4.0 International.
