
REVISITING QUALITY @ Framework and Tools
Discussion workshop ¦ Wikimania 2019

We are interested in research that matters.

Claudia Müller-Birn (FUB)
Cristina Sarasua (UZH)
Lydia Pintscher (WMDE)
Mariam Farda Sarbas (FUB)

Relevant Data Quality Frameworks

Data: (Wang and Strong, 1996)
Linked Data / Knowledge Bases: (Zaveri et al., 2015), (Färber et al., 2018)
Wikidata: (Wikidata Community RfC by Piscopo, 2016)

Categories listed for each of the three: Intrinsic, Contextual, Representational, Accessibility
Additional categories marked (* Z): Dynamicity, Trust

❖ Data Quality Dimensions ❖

Accuracy
- “Barack Obama’s place of birth is Washington.”
+ “Barack Obama’s place of birth is Hawaii.”

Consistency
- “Q72 is an instance of municipality of Switzerland and human.”


+ “Q72 is an instance of municipality of Switzerland.”

Timeliness
- “Stockholm’s latest population is 905,184 inhabitants.”


+ “Stockholm’s latest population is 965,232 inhabitants.”

Completeness
- “Wikidata contains 280 Swedish municipalities.”


+ “Wikidata contains 290 Swedish municipalities.”
Method to compute class cardinality: (Luggen et al., 2019)

Information Gain via Interlinking
- “Q72 is linked to db:I1, which contains the coordinates of the city of Zurich.”

+ “Q72 is linked to odzh:I1, which contains population data at a neighbourhood level.”
Measures to assess gain: (Sarasua et al., 2017)

Diversity
- “Wikidata contains data about 200 types of things, referenced with 150 primary sources.”

+ “Wikidata contains data about 3K types of things, referenced with 2K primary sources.”
Source Diversity by Bots: (Farda Sarbas et al., 2019)

What tools do we have in the Wikidata ecosystem for data quality?

See Overview of Data Quality Tools by Wikimedia here
See more community tools in Hay’s directory here

Property constraints
Maps for coordinates
Recoin

Tools per Data Quality Dimension

Feedback from our interactive poster

Let’s discuss!
What critical data quality issues did you spot in Wikidata? Help curate a joint list here.
How should we organize data quality management in the community?
What did we learn from and/or other WM projects?

Thank you!

https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality

References

Vrandečić, D., Krötzsch, M., 2014. Wikidata: a free collaborative knowledgebase. Commun. ACM 57, 10 (September 2014), 78-85. https://doi.org/10.1145/2629489

Färber, M., Bartscherer, F., Menne, C., Rettinger, A., 2018. Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. https://doi.org/10.3233/SW-170275

Wang, R.Y., Strong, D.M., 1996. Beyond Accuracy: What Data Quality Means to Data Consumers. https://doi.org/10.1080/07421222.1996.11518099

Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S., 2015. Quality assessment for Linked Data: A Survey: A systematic literature review and conceptual framework. https://doi.org/10.3233/SW-150175

Piscopo, A., 2016. Requests for comment/Data quality framework for Wikidata. https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Data_quality_framework_for_Wikidata

Luggen, M., Difallah, D., Sarasua, C., Demartini, G., Cudré-Mauroux, P., 2019. Non-Parametric Class Completeness Estimators for Collaborative Knowledge Graphs. ISWC 2019. https://exascale.info/assets/pdf/luggen2019iswc.pdf

Sarasua, C., Staab, S., Thimm, M., 2017. Methods for intrinsic evaluation of links in the web of data. ESWC 2017. https://pdfs.semanticscholar.org/4986/e7de4af98d0689557ae2cad0ffe302030a98.pdf

Farda-Sarbas, M., Zhu, H., Nest, M., Müller-Birn, C., 2019. Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata. OpenSym 2019.

Images

Cybernetwork (front slide) (pixabay license) https://pixabay.com/illustrations/cyber-network-technology-futuristic-3400789/

Mars Climate Orbiter (Public Domain) https://en.wikipedia.org/wiki/Mars_Climate_Orbiter#/media/File:Mars_Climate_Orbiter_2.jpg

Hands (pixabay license) https://pixabay.com/photos/hand-united-together-people-unity-1917895/

Discussion logo (pixabay license) https://pixabay.com/illustrations/group-discussion-human-personal-1962592/

Back-up

Wikidata’s Data Model 101
Data
Schema
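
The diagrams for this back-up slide did not survive extraction. As a rough stand-in, the sketch below models an item as a list of statements (property, value, optional qualifiers and references) and replays the Consistency example for Q72. It is a deliberately simplified illustration, not the actual Wikibase JSON format. The class names `Item` and `Statement`, the helper `conflicting_classes`, and the `DISJOINT_CLASSES` rule are all hypothetical and introduced only for this example. The identifiers P31 (instance of), Q70208 (municipality of Switzerland) and Q5 (human) are the Wikidata IDs for the concepts named in that example.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One simplified claim about an item: property, value, optional context."""
    property_id: str                      # e.g. "P31" (instance of)
    value: str                            # simplified: an item ID or literal as text
    qualifiers: dict = field(default_factory=dict)
    references: list = field(default_factory=list)


@dataclass
class Item:
    """A simplified item: an ID, a label, and a list of statements."""
    item_id: str
    label: str
    statements: list = field(default_factory=list)

    def values(self, property_id):
        """Return all values stated for the given property, in statement order."""
        return [s.value for s in self.statements if s.property_id == property_id]


# Hypothetical rule, for illustration only: an item should not be both
# a municipality of Switzerland (Q70208) and a human (Q5).
DISJOINT_CLASSES = [frozenset({"Q70208", "Q5"})]


def conflicting_classes(item):
    """Return each disjoint class pair that the item's P31 values fully contain."""
    classes = set(item.values("P31"))
    return [sorted(pair) for pair in DISJOINT_CLASSES if pair <= classes]


# The "-" and "+" variants of the Consistency example.
zurich_inconsistent = Item(
    "Q72", "Zürich", [Statement("P31", "Q70208"), Statement("P31", "Q5")]
)
zurich_consistent = Item("Q72", "Zürich", [Statement("P31", "Q70208")])

print(conflicting_classes(zurich_inconsistent))  # [['Q5', 'Q70208']]
print(conflicting_classes(zurich_consistent))    # []
```

In Wikidata itself, checks of this kind are expressed through property constraints rather than hard-coded sets; the hard-coded rule here only keeps the sketch self-contained.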