Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond

Please complete a short survey: http://bit.ly/oclc18wikidatasurvey
Slides: http://bit.ly/oclc18wikidata
One page guide: http://bit.ly/wikidata-onepage

Wikimedia District of Columbia
@wikimediadc @fuzheado @wikigamaliel #Wikidata
OCLC, June 12, 2018 | Wikimedia DC
Licensed via CC-BY-SA 4.0

About
● Andrew Lih: Author of The Wikipedia Revolution; digital sharing strategist; journalism professor
● Robert Fernandez: Assistant Professor, Resources Development/eLearning Librarian, Prince George's Community College; Wikimedia DC board member

GLAM: Galleries, Libraries, Archives and Museums - Cultural Partners
Partners: Wikimedia DC, Library of Congress, NARA, Smithsonian
● Local chapter for Wikipedia / Wikimedia community
● Edit-a-thons
● Full-time Wikipedian in Residence
● Article improvement drives
● Wikidata and Linked Open Data modeling
● Associations for the Centers for Study of Congress
● Wikipedia Space exhibit
● Wikiconference hosting

Wikidata: The evolution of Wikipedia into the ultimate, free linked open database

Wikidata in one page
http://bit.ly/wikidata-onepage

2017 was a turning point for Wikidata
Why?
● Google Knowledge Graph
● Digital assistants: Siri, Alexa, etc.
● Infoboxes on Wikipedia
● Structured data on Commons
● Wikicite, WikidataCon

A hub for the future of Wikimedia content

The mission
(image credit: CC BY 3.0, Cavefrog)

Wikidata items have identifiers - VIAF links back

Dan Scott, Laurentian University
"rather than focusing on directly enhancing our own local data repository silos (for example, library catalogues, digital exhibits), libraries and archives should invest their limited resources in enriching Wikidata, a centralized data repository, to maximize the visibility of those entities and the reusability of that data in the world at large… and then pull that data back into our local repositories to enrich our displays and integration with the broader world of data."
https://coffeecode.net/wikidata-canada-150-and-music-festival-data.html

Overview
● Why Wikidata?
● Design of Wikidata: features, RDF, triples
● Queries and tools
● Case studies
● Calls to action

Wikipedia today
● More than 5 million English articles
● Top 10 most visited site
● Reputation and cultural partnerships

Wikipedia challenges
● Knowledge scattered among 30 million articles in 200+ languages
● Inconsistency, gaps and replication
● How to consolidate knowable facts?
Lesson: Images and multimedia
● 2001: Images scattered across Wikipedia editions
● 2004: Wikimedia Commons centralized and consolidated multimedia
● commons.wikimedia.org

Wikidata as the future
● Convert encyclopedic lexical content into "structured" statements
● Turn human readable into machine understandable
● Link to stable external data of LAM institutions
● "Semantic web" realized

Why structure is needed
● Facts and figures from articles and infoboxes are only in human-readable prose
● Navigation boxes at bottom of Wikipedia articles are done by hand

Wikidata
● Launched 2012
● Power of searching, sorting and querying capabilities
● Interconnected mesh
● Sum of all human knowledge

Fundamentals: Statements
Factual claims are stored as statements in Wikidata:
● subject - predicate - object, or
● item - property - value, or
● thing - relationship - thing

Wikidata item for United States Congress (Q11268)

Wikidata Basics
"Triple": item - property - value

Q numbers - item
● Anyone can make a Q item
● Corresponds to Wikipedia article / concept
● Examples
○ Q1 - the Universe
○ Q2 - Earth
○ Q5 - human
○ Q146 - cat
○ Q729 - animal
○ Q571 - book
○ Q7075 - library
○ Q190593 - OCLC
○ Q877140 - cardigan

P numbers - property
● Controlled vocabulary for consistency
● Proposal, discussion and approval process
● Examples
○ P31 - instance of
○ P279 - subclass of
○ P214 - VIAF ID
○ P569 - date of birth
○ P625 - coordinate location
● See: Wikidata:List_of_properties

Wikidata item page

Wikidata statement triples
● Claims capture factual, provable information
● Any number of statements can be associated with an item
● Underneath the surface... using symbols makes them language independent (identifiers vs names)

Item | Property | Value
"George Washington" | "instance of" | "human"
Q23 | P31 | Q5
"George Washington" | "place of burial" | "Mount Vernon"
Q23 | P119 | Q731635
"George Washington" | "LCAuth ID" | "n86140996"
Q23 | P244 |

Wikidata link on Wikipedia articles

Wikidata statement triples
● Wikidata stores statements as explicit triples - item + property + value
● Claims capture factual, provable information
● Using symbols makes them language independent (identifiers vs names)
● Relationships are "first class" = very fast to search and sort
● Seconds vs minutes to search
● Ad hoc data model, highly adaptive
● Well-suited to the wiki way

Item | Property | Value
United States Congress | "instance of" | "bicameral legislature"
Q11268 | P31 | Q189445

Traditional databases
● Schemas well-defined and controlled
● Relational databases and SQL: columns need lots of planning and forethought
● Changes can be complex, with many cascading effects
● Searches involving relationships can be slow or expensive (join operations)

Artist | Date of birth | Country | Medium
Henri Matisse | December 31, 1869 | France | Painting
Claude Monet | November 14, 1840 | France | Painting
Edward Hopper | July 22, 1882 | United States | Painting

Work | Creator | Date | Location
Les Bêtes De La Mer | Henri Matisse | 1950 | NGA
Cape Cod Morning | Edward Hopper | 1950 | SAAM
Nighthawks | Edward Hopper | 1942 | Art Inst Chicago

Wikidata and RDF databases
● Relationships are explicit and precise
● Database can take any shape and grow according to need
● Also known as "graph databases"

Graph diagram (nodes and labeled edges):
Edward Hopper - citizen of - United States
Edward Hopper - date of birth - July 22, 1882
Nighthawks - creator - Edward Hopper
Cape Cod Morning - creator - Edward Hopper
Nighthawks - instance of - painting
Cape Cod Morning - instance of - painting
painting - subclass of - creative work

Summary
UPSIDES
● RDF triples make for a very flexible and fast system
● Suitable for the BEBOLD wiki culture
● Multiple parallel ontologies can co-exist
DOWNSIDES
● Schema-on-the-fly system can make modeling inconsistent and difficult
● Hard for newcomers to understand
● Multiple parallel ontologies can co-exist

Wikidata items
Item: Muammar Gaddafi (Q19878)
53 Latinized variations! (May 2017)

Muammar Gaddafi Gadhafi Mu‘ammar al-Qaḏḏafi Muammar Muhammad Abu Qaḏḏafi Muamar al Gadafi Minyar al-Gaddafi Qaḏḏāfī Moammar Gadafi Colonel Gaddafi Muammar Muhammad Abu Muammar al-Gaddafi Kadhafi Minyar al-Gaddafi Muhammad Ghadaffi Mu‘ammar al Qaḏḏāfi Khadafi Muammar el Gaddafi Moammar Al Qadhafi Mu‘ammar al-Qaḏḏāfi Muamar al Gaddafhi Qaḏḏāfi Gaddafi Mu‘ammar al-Qaḏḏāfī Gadafi Muammar el Gadafi Mu‘ammar al Qaḏḏafī Kadaffi Muamar al-Gaddafi Khaddafi Al-Khadafy Muamar al Gaddafi Muammar al Gaddafi Gadaffi Mu‘ammar al Qaḏḏafi Qaḏḏafī Kaddafi Kadafi El Kazzafi Muammar al–Gaddafi Omar Gadafi Muhamad Gadafi Jaddafi Kaddaffi Muamar al-Gaddafhi Qaddafy Moammar Jaddafi Muammar Gaddafi Muamar Gadafi Muhamar Gadaffi Muamar el-Gadafi Mu‘ammar al-Qaḏḏafī Mu‘ammar al Qaḏḏāfī Al-Qadhdhaafi Al-Qathafi

Using identifiers removes language dependence and ambiguity in:
● Writing systems (Chinese, Serbian, Kazakh, et al)
● Phonetization variations
● Spelling variations
● Maiden vs. married names
Canonical identifiers help link to external databases

Speed, consistency, automation
● Wikidata has more than 48 million items
● Simple searches take less than a second
● Complex queries supported by open standards like SPARQL

Search example - Find all bicameral legislatures
http://query.wikidata.org

Item | Property | Value
? | "instance of" | "bicameral legislature"
? | P31 | Q189445

Wikidata Search - Result from Query
● 26 million items in 1/3 of a second

Wikidata items have identifiers - VIAF links back

Wikidata items have identifiers - links to external databases
● Barack Obama (Q76) has 83 identifiers!
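The triple pattern from the search example above (? P31 Q189445) can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Wikidata or its query service is implemented: the Statement type, the match and describe helpers, and the LABELS table are hypothetical names introduced here, and the only data used are the statements quoted in the slides.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """One Wikidata-style triple: item - property - value."""
    item: str   # Q number of the subject
    prop: str   # P number of the property
    value: str  # Q number or a literal such as an external ID


# Labels taken from the slides; identifiers stay language independent.
LABELS = {
    "Q23": "George Washington",
    "Q5": "human",
    "Q731635": "Mount Vernon",
    "Q11268": "United States Congress",
    "Q189445": "bicameral legislature",
    "P31": "instance of",
    "P119": "place of burial",
    "P244": "LCAuth ID",
}

STATEMENTS = [
    Statement("Q23", "P31", "Q5"),
    Statement("Q23", "P119", "Q731635"),
    Statement("Q23", "P244", "n86140996"),
    Statement("Q11268", "P31", "Q189445"),
]


def match(statements, item: Optional[str] = None,
          prop: Optional[str] = None, value: Optional[str] = None):
    """Return statements fitting the pattern; None plays the role of '?'."""
    return [
        s for s in statements
        if (item is None or s.item == item)
        and (prop is None or s.prop == prop)
        and (value is None or s.value == value)
    ]


def describe(statement: Statement) -> str:
    """Render a triple with human-readable labels where known."""
    return " - ".join(LABELS.get(part, part) for part in statement)


# "? P31 Q189445": every item that is an instance of bicameral legislature.
for s in match(STATEMENTS, prop="P31", value="Q189445"):
    print(describe(s))
```

On http://query.wikidata.org the same pattern is usually written in SPARQL as SELECT ?item WHERE { ?item wdt:P31 wd:Q189445 }, where the ?item variable takes the place of the None wildcard used in this sketch.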
Some prominent identifiers - links to external databases
● WorldCat
● VIAF
● LC Name Authority File
● ISNI
● GND (Integrated Authority File)
● SUDOC (French universities)
● BNF (Bibliothèque nationale de France)
● MusicBrainz
● NARA
● Bio Directory of Congress
● Quora topic ID
● C-SPAN person ID
● Freebase
● NDLAuth ID (National Diet Library of Japan)
● SELIBR (National Library of Sweden Libris)
● NLA (Australia) ID
● NKCR Czech National Authority Database (National Library of Czech Republic)
● RSL ID (person) Russian State Library
● IMDB
● Dutch National Thesaurus for Author names
● Declarator.org - Russian non-governmental database with information on the income of government officials
● NUKAT - Center of Warsaw University Library catalog
● CiNii (Scholarly and Academic Information Navigator) Japan
● NNDB people ID - Notable Names Database
● Politifact
● Encyclopedia Britannica ID
● CONOR ID (Slovenia)
● NYT topic ID
● Guardian topic ID
● Parlement & Politiek ID (Dutch politics site)
● Social Networks and Archival Context ID (SNAC)
● California Digital Library
● University of Virginia
● University of California, Berkeley

OCLC related properties

Consistency and automation
● Constraint reports/violations provide warnings on logic and bounds

Common Wikidata editing tasks - Language links
● Wikidata provides the central hub where all inter-wiki links are found

Properties: Identifiers
● Indexes into other databases
● Authority control
● Accession numbers
● Catalog identifiers
● Stable URLs to other sites

Alternative contribution methods
● Instead of individual item pages...
● Task lists, games and other interfaces contribute to Wikidata

Wikidata Game allows for "one-click" contributions based on task lists

Notable
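Identifier properties act as indexes into other databases, so a stored value such as the LCAuth ID "n86140996" can be turned into a stable URL. The following is a rough sketch of that idea, assuming a "$1" placeholder pattern of the kind Wikidata records as a property's formatter URL. The FORMATTER_URLS table and the external_url helper are hypothetical, and the two URL patterns are illustrative assumptions that should be checked against the current property pages before use.

```python
from typing import Optional
from urllib.parse import quote

# Assumed formatter patterns keyed by property; "$1" marks where the
# identifier value goes. Verify against each property page before relying on them.
FORMATTER_URLS = {
    "P214": "https://viaf.org/viaf/$1",           # VIAF ID (assumed pattern)
    "P244": "https://id.loc.gov/authorities/$1",  # LCAuth ID (assumed pattern)
}


def external_url(prop: str, value: str,
                 formatters: dict = FORMATTER_URLS) -> Optional[str]:
    """Build a link to the external database, or None if no pattern is known."""
    pattern = formatters.get(prop)
    if pattern is None:
        return None
    # Percent-encode the value so unusual characters cannot break the URL.
    return pattern.replace("$1", quote(value, safe=""))


# George Washington (Q23) carries LCAuth ID (P244) "n86140996" in the slides.
print(external_url("P244", "n86140996"))
```

Keeping the patterns in a lookup table rather than in code mirrors how the identifier list above grows: supporting another catalog means adding one entry, not writing new logic.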
