
Introduction to Wikidata: Structuring … and Beyond
Wikimedia District of Columbia (@wikimediadc) | OCLC | June 12, 2018 | #Wikidata
@fuzheado · @wikigamaliel
Slides: http://bit.ly/oclc18wikidata
One-page guide: http://bit.ly/wikidata-onepage
Please complete a short survey: http://bit.ly/oclc18wikidatasurvey
Licensed via CC-BY-SA 4.0

About Andrew Lih

Author of The Wikipedia Revolution; digital sharing strategist; journalism professor

About Robert Fernandez

Assistant Professor, Resources Development/eLearning, Prince George's Community College; Wikimedia DC board member

GLAM: Galleries, Libraries, Archives, and Museums - cultural partners

Wikimedia DC: local chapter for the Wikipedia / Wikimedia community
Partners: Library of Congress, NARA, Smithsonian, Association of Centers for the Study of Congress
● Edit-a-thons
● Full-time Wikipedian in Residence
● Article improvement drives
● Wikidata and linked data modeling
● Wikipedia Space exhibit
● Wikiconference hosting

Wikidata: the … of Wikipedia into the ultimate free, linked open database

Wikidata in one page

http://bit.ly/wikidata-onepage

2017 was a turning point for Wikidata. Why?
● Digital assistants: …, Alexa, etc.
● … on Wikipedia
● Structured data on Commons
● Wikicite, WikidataCon

A hub for the future of Wikimedia content

The mission

(Image: CC BY 3.0, Cavefrog)

Wikidata items have identifiers - VIAF links back

Dan Scott, Laurentian University:

"rather than focusing on directly enhancing our own local data repository silos (for example, library catalogues, digital exhibits), libraries and archives should invest their limited resources in enriching Wikidata, a centralized data repository, to maximize the visibility of those entities and the reusability of that data in the world at large… and then pull that data back into our local repositories to enrich our displays and integration with the broader world of data."
https://coffeecode.net/wikidata-canada-150-and-music-festival-data.html

Overview
● Why Wikidata?
● Design of Wikidata: features, RDF, triples
● Queries and tools
● Case studies
● Calls to action

Wikipedia today
● More than 5 million English articles
● Top 10 most visited site
● Reputation and cultural partnerships

Knowledge scattered among 30 million articles in 200+ languages

Wikipedia challenges: inconsistency, gaps, and replication
How to consolidate knowable facts?

Lesson: images and multimedia
● 2001: Images scattered across Wikipedia editions
● 2004: Centralized and consolidated multimedia

commons.wikimedia.org

Wikidata as the future
● Convert encyclopedic lexical content into "structured" statements
● Turn human-readable into machine-understandable
● Link to stable external data of LAM institutions

"…" realized
● Facts and figures from articles and infoboxes are only in human-readable prose
● Navigation boxes at the bottom of Wikipedia articles are done by hand
● Launched 2012

Wikidata
● Power of searching, sorting and querying capabilities
● Sum: an interconnected mesh of all human knowledge

Factual claims are stored as statements in Wikidata

Fundamentals: statements (a "triple")
subject - predicate - object
or item - property - value
or thing - relationship - thing

Wikidata item for United States Congress (Q11268)

Wikidata Basics: item - property - value

Q numbers - items
● Anyone can make a Q item
● Corresponds to a Wikipedia article / concept
● Examples
○ Q1 - the Universe
○ Q2 - Earth
○ Q5 - human
○ Q146 - cat
○ Q729 - animal
○ Q571 - book
○ Q7075 - library
○ Q190593 - OCLC
○ Q877140 - cardigan

P numbers - properties
● Proposal, discussion and approval process for consistency
● Examples
○ P31 - instance of
○ P279 - subclass of
○ P214 - VIAF ID
○ P569 - date of birth
○ P625 - coordinate location
● See: Wikidata:List_of_properties

Wikidata item page

Claims capture factual, provable information

Any number of statements can be associated with an item

Wikidata statement triples
Item | Property | Value
"George Washington" | "instance of" | "human"
Q23 | P31 | Q5

Underneath the surface...

Using symbols makes them language independent (identifiers vs names)
"George Washington" | "place of burial" | "Mount Vernon"
Q23 | P119 | Q731635

"George Washington" | "LCAuth ID" | "n86140996"
Q23 | P244 | "n86140996"

Wikidata link on Wikipedia articles

Wikidata stores statements as explicit triples: item + property + value
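As a minimal Python sketch of that idea (an in-memory illustration, not how Wikidata actually stores data), the statements below hold only identifiers, and a separate English label table turns them back into readable text. A table for another language would change the output without touching the triples. `Statement`, `render`, `statements_about` and `LABELS_EN` are hypothetical names chosen for this example.

```python
# Statements as ID-only triples; readable names live in a separate label table.
from typing import NamedTuple


class Statement(NamedTuple):
    item: str   # Q number of the subject
    prop: str   # P number of the relationship
    value: str  # another Q number, or a literal such as an external ID string


STATEMENTS = [
    Statement("Q23", "P31", "Q5"),          # George Washington - instance of - human
    Statement("Q23", "P119", "Q731635"),    # George Washington - place of burial - Mount Vernon
    Statement("Q23", "P244", "n86140996"),  # George Washington - LCAuth ID - literal string
    Statement("Q11268", "P31", "Q189445"),  # United States Congress - instance of - bicameral legislature
]

# English labels only; another language would need its own table, not new statements.
LABELS_EN = {
    "Q23": "George Washington",
    "Q5": "human",
    "Q731635": "Mount Vernon",
    "Q11268": "United States Congress",
    "Q189445": "bicameral legislature",
    "P31": "instance of",
    "P119": "place of burial",
    "P244": "LCAuth ID",
}


def render(stmt: Statement, labels: dict[str, str]) -> str:
    """Swap each identifier for its label; literals and unknown IDs pass through."""
    return " - ".join(labels.get(part, part) for part in stmt)


def statements_about(item: str) -> list[Statement]:
    """All statements whose subject is the given item."""
    return [s for s in STATEMENTS if s.item == item]


for s in statements_about("Q23"):
    print(render(s, LABELS_EN))
```

Running it prints the three George Washington statements from the slides in readable form; the Congress statement stays in the store but is not about Q23.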

Item | Property | Value
United States Congress | "instance of" | "bicameral legislature"
Q11268 | P31 | Q189445

Wikidata statement triples


Relationships are "first class": very fast to search and sort

Seconds vs minutes to search

Ad hoc data model: highly adaptive
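To illustrate why first-class relationships keep searches simple and the model adaptive, here is a toy in-memory graph (a sketch, not the query service's engine) built from the art example used in these slides. Plain labels stand in for Q/P identifiers, and `objects`, `subjects` and `is_a` are hypothetical helpers. A relational join becomes a walk along relationships, and `is_a` follows "instance of" and then "subclass of" links, much as the SPARQL path wdt:P31/wdt:P279* does.

```python
# Toy graph store: a set of (subject, relationship, object) triples.
# Plain labels stand in for Wikidata's Q/P identifiers to keep it readable.
TRIPLES = {
    ("Nighthawks", "creator", "Edward Hopper"),
    ("Cape Cod Morning", "creator", "Edward Hopper"),
    ("Edward Hopper", "citizen of", "United States"),
    ("Edward Hopper", "date of birth", "1882-07-22"),
    ("Nighthawks", "instance of", "painting"),
    ("Cape Cod Morning", "instance of", "painting"),
    ("painting", "subclass of", "creative work"),
}


def objects(subject: str, relationship: str) -> set[str]:
    """Everything the subject points to via this relationship."""
    return {o for s, r, o in TRIPLES if s == subject and r == relationship}


def subjects(relationship: str, obj: str) -> set[str]:
    """Everything that points to obj via this relationship."""
    return {s for s, r, o in TRIPLES if r == relationship and o == obj}


def is_a(thing: str, cls: str) -> bool:
    """Start from the thing's "instance of" classes, then follow any chain of "subclass of"."""
    frontier = set(objects(thing, "instance of"))
    seen: set[str] = set()
    while frontier:
        current = frontier.pop()
        if current == cls:
            return True
        if current not in seen:
            seen.add(current)
            frontier |= objects(current, "subclass of")
    return False


# "Works created by US citizens": walk creator -> citizen of, no join tables.
us_works = {
    work
    for artist in subjects("citizen of", "United States")
    for work in subjects("creator", artist)
}

# A new kind of fact needs no schema change, just one more triple.
TRIPLES.add(("Nighthawks", "location", "Art Institute of Chicago"))
```

The design choice being illustrated is that nothing here was declared up front: the "location" relationship did not exist until the last line added it, whereas a relational table would need a new column or a new joined table.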

Well-suited to the way …

Traditional databases
Artist | Date of birth | Country | Medium
Henri Matisse | December 31, 1869 | | Painting
Claude Monet | November 14, 1840 | France | Painting
Edward Hopper | July 22, 1882 | United States | Painting

Relational databases and SQL:
● Schemas well-defined and controlled
● Columns need lots of planning and forethought
● Changes can be complex, with many cascading effects
● Searches involving relationships can be slow or expensive (join operations)

Work | Creator | Date | Location
Les Bêtes De La Mer | Henri Matisse | 1950 | NGA
Cape Cod Morning | Edward Hopper | 1950 | SAAM
Nighthawks | Edward Hopper | 1942 | Art Inst

Wikidata and RDF databases
● Relationships are explicit and precise
● Database can take any shape and grow according to need
● Also known as "graph databases"

(Diagram: Edward Hopper - citizen of - United States; Edward Hopper - date of birth - July 22, 1882; Nighthawks, Cape Cod Morning - creator - Edward Hopper; Nighthawks, Cape Cod Morning - instance of - painting; painting - subclass of - creative work)

Summary

UPSIDES
● RDF triples make for a very flexible and fast system
● Suitable for the BE BOLD wiki culture
● Multiple parallel ontologies can co-exist

DOWNSIDES
● Schema-on-the-… system can make modeling inconsistent and difficult
● Hard for newcomers to understand
● Multiple parallel ontologies can co-exist

Muammar Gaddafi: 53 Latinized variations! (May 2017)
Spellings shown on the slide (word cloud): Muammar Gaddafi Gadhafi Mu‘ammar al-Qaḏḏafi Muammar Abu Qaḏḏafi Muamar al Gadafi Minyar al-Gaddafi Qaḏḏāfī Moammar Gadafi Colonel Gaddafi Muammar Muhammad Abu Muammar al-Gaddafi Kadhafi Minyar al-Gaddafi Muhammad Ghadaffi Mu‘ammar al Qaḏḏāfi Khadafi Muammar el Gaddafi Moammar Al Qadhafi Mu‘ammar al-Qaḏḏāfi Muamar al Gaddafhi Qaḏḏāfi Gaddafi Mu‘ammar al-Qaḏḏāfī Gadafi Muammar el Gadafi Mu‘ammar al Qaḏḏafī Kadaffi Muamar al-Gaddafi Khaddafi Al-Khadafy Muamar al Gaddafi Muammar al Gaddafi Gadaffi Mu‘ammar al Qaḏḏafi Qaḏḏafī Kaddafi Kadafi El Kazzafi Muammar al–Gaddafi Omar Gadafi Muhamad Gadafi Jaddafi Kaddaffi Muamar al-Gaddafhi Qaddafy Moammar Jaddafi Muammar Gaddafi Muamar Gadafi Muhamar Gadaffi Muamar el-Gadafi Mu‘ammar al-Qaḏḏafī Mu‘ammar al Qaḏḏāfī Al-Qadhdhaafi Al-Qathafi

Wikidata items: using identifiers removes language dependence and ambiguity in:
● Writing systems (Chinese, Serbian, Kazakh, et al)
● Phonetization variations
● Spelling variations
● Maiden vs. married names
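A small illustration of why one canonical identifier helps, assuming a deliberately simple normalization (case folding, stripping diacritics and apostrophe-like marks): several of the spellings from the slide all resolve to Q19878. Real-world name matching is considerably more involved; the variant list is a sample, and `normalize` and `resolve` are hypothetical helpers.

```python
# Resolve many spellings of one name to a single canonical identifier.
# A few variants from the slide; the normalization rules are a simple assumption.
import unicodedata
from typing import Optional

VARIANTS = [
    "Muammar Gaddafi",
    "Gadhafi",
    "Moammar Gadafi",
    "Kadhafi",
    "Khadafi",
    "Mu‘ammar al-Qaḏḏāfī",
]


def normalize(name: str) -> str:
    """Lowercase, drop diacritics and apostrophe-like marks, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for mark in "‘’'ʿʻ":
        stripped = stripped.replace(mark, "")
    return " ".join(stripped.lower().split())


# Every known variant points at the same item: Muammar Gaddafi (Q19878).
INDEX = {normalize(v): "Q19878" for v in VARIANTS}


def resolve(name: str) -> Optional[str]:
    """Return the Q number for a known spelling, or None if it is not indexed."""
    return INDEX.get(normalize(name))
```

For example, `resolve("Muammar al-Qaddafi")` matches the transliterated "Mu‘ammar al-Qaḏḏāfī" once the dots, macrons and the ‘ mark are stripped, while an unrelated name returns None.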

Canonical identifiers help link to external databases
Item: Muammar Gaddafi (Q19878)

Speed, consistency, automation
● Wikidata has more than 48 million items
● Simple searches take less than a second
● Complex queries supported by open standards like SPARQL

Search example - Find all bicameral legislatures
http://query.wikidata.org

Item | Property | Value
? | "instance of" | "bicameral legislature"
? | P31 | Q189445

Wikidata Search - Result from Query
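The "?" in the search pattern "? P31 Q189445" is a variable: the engine returns every item that fits the fixed parts of the triple. A toy in-memory matcher (an illustration, not the query service's actual engine) makes the idea concrete. Here `None` stands in for "?", the small store reuses statements from earlier slides, and `match` is a hypothetical helper.

```python
# A toy triple-pattern matcher: None plays the role of "?" (a variable).
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

STORE: list[Triple] = [
    ("Q11268", "P31", "Q189445"),  # United States Congress - instance of - bicameral legislature
    ("Q23", "P31", "Q5"),          # George Washington - instance of - human
    ("Q23", "P119", "Q731635"),    # George Washington - place of burial - Mount Vernon
]


def match(pattern: Pattern, store: list[Triple]) -> list[Triple]:
    """Keep every triple whose values equal the pattern's fixed (non-None) positions."""
    return [
        triple for triple in store
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]


# "? P31 Q189445": which items are instances of bicameral legislature?
answers = [item for item, _, _ in match((None, "P31", "Q189445"), STORE)]
print(answers)
```

A real triple store indexes triples by each position so that a pattern like this does not scan everything; the linear scan here only shows what "matching a pattern" means.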

26 million items in 1/3 of a second

Wikidata items have identifiers - VIAF links back

Wikidata items have identifiers - links to external databases

Barack Obama (Q76) has 83 identifiers!

Some prominent identifiers - links to external databases

● CiNii (Scholarly and Academic Information Navigator, Japan)
● NDL Auth ID (National Diet Library of Japan)
● SELIBR (National Library of Sweden Libris)
● NNDB people ID (Notable Names Database)
● NLA (Australia) ID
● Politifact
● NKCR (Czech National Authority Database, National Library of Czech Republic)
● Encyclopedia Britannica ID
● WorldCat
● CONOR ID (Slovenia)
● VIAF
● NYT topic ID
● LC Name Authority File
● RSL ID (person) - Russian State Library
● Guardian topic ID
● ISNI
● IMDb
● Parlement & Politiek ID (Dutch politics site)
● GND (Integrated Authority File)
● Dutch National Thesaurus for Author names
● SUDOC (French universities)
● Social Networks and Archival Context (SNAC)
● BnF (Bibliothèque nationale de France)
● Declarator.org - Russian non-governmental database with information on the income of government officials
● MusicBrainz
● NARA
● Bio Directory of Congress
● Digital Library topic ID
● University of Virginia
● C-SPAN person ID
● NUKAT - Center of Warsaw …
● …, Berkeley

OCLC related properties

Consistency and automation
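A minimal sketch of how a constraint report can flag problems without blocking an edit. The two rules, a logic check (death not before birth) and a bounds/format check on a VIAF ID, are simplified assumptions; Wikidata's real constraints are declared on each property and evaluated by its own tooling. `check_item` is a hypothetical helper.

```python
# Toy constraint checker: collect warnings instead of rejecting the edit,
# in the spirit of Wikidata's constraint reports. Both rules are simplified assumptions.
import re
from datetime import date


def check_item(claims: dict) -> list[str]:
    """claims maps P numbers to values, e.g. {"P569": date(...), "P214": "..."}."""
    warnings = []
    born = claims.get("P569")   # date of birth
    died = claims.get("P570")   # date of death
    if born is not None and died is not None and died < born:
        warnings.append("date of death (P570) is earlier than date of birth (P569)")
    viaf = claims.get("P214")   # VIAF ID
    if viaf is not None and not re.fullmatch(r"[1-9][0-9]*", viaf):
        warnings.append(f"VIAF ID (P214) {viaf!r} does not look like a numeric identifier")
    return warnings


# Hypothetical item with a logical error: it is reported, not refused.
example = check_item({"P569": date(1900, 1, 1), "P570": date(1850, 1, 1)})
print(example)
```

Returning warnings rather than raising errors mirrors the wiki approach: the edit stands, and the report tells editors where to look.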

Constraint reports/violations provide warnings on logic and bounds

Common Wikidata editing tasks - Language links

Wikidata provides a central hub where all inter-wiki links are found

Properties: Identifiers

Indexes into other databases

Authority control

Accession numbers

Catalog identifiers

Stable URLs to other sites

Alternative contribution methods
● Instead of individual item pages... task lists, games and other interfaces contribute to Wikidata
● Wikidata Game allows for "one-click" contributions based on task lists

Notable external databases

Art and museum databases, thesauri, dictionaries, encyclopedias, national and academic libraries

Internet-based databases - IMDb, MusicBrainz, Quora
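Identifier properties work as indexes into databases like these: Wikidata stores only the external ID, and a per-property URL pattern (Wikidata calls it a formatter URL) turns it into a stable link. The sketch below assumes the two patterns shown; confirm the current ones on the property pages before relying on them. `FORMATTERS` and `external_link` are hypothetical names.

```python
# Turn stored identifier values into links using one URL pattern per property.
# Wikidata keeps such a "formatter URL" on each identifier property; the two
# patterns below are assumptions for illustration, so check the live property pages.
from typing import Optional
from urllib.parse import quote

FORMATTERS = {
    "P214": "https://viaf.org/viaf/$1",                      # VIAF ID (assumed pattern)
    "P244": "https://id.loc.gov/authorities/names/$1.html",  # LC authority ID (assumed pattern)
}


def external_link(prop: str, value: str) -> Optional[str]:
    """Substitute the URL-escaped identifier for $1, or return None if no pattern is known."""
    pattern = FORMATTERS.get(prop)
    if pattern is None:
        return None
    return pattern.replace("$1", quote(value, safe=""))


print(external_link("P244", "n86140996"))  # George Washington's LCAuth ID from earlier slides
```

Keeping the pattern on the property rather than in each statement means a site can change its URL scheme with one edit, while the stored IDs stay untouched.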

Mix'n'Match: a one-click "game" interface to help match external data to Wikidata
https://tools.wmflabs.org/mix-n-match

Wikidata Game: Identifier match for SAAM

Querying tools

● Searching and displaying
● Presentation
● Visualization

Wikidata APIs and endpoints

Basic search with Wikidata

SPARQL endpoint at query.wikidata.org

Superficially similar to SQL

One of the busiest endpoints on the Internet
https://wikidata.org/wiki/Wikidata:SPARQL_query_service

Basic search with Wikidata
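Besides the web interface, the endpoint answers plain HTTP requests. Below is a minimal sketch using only Python's standard library, assuming the service's `query` and `format=json` URL parameters. The User-Agent string is a placeholder (Wikimedia asks automated clients to identify themselves), and `build_request` and `run_query` are hypothetical helper names. Only `run_query` touches the network.

```python
# Build (and, when network access is available, send) a Wikidata Query Service request.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# The "Find all bicameral legislatures" search, with English labels added.
BICAMERAL_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q189445 .   # instance of: bicameral legislature
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request(query: str) -> urllib.request.Request:
    """Encode the query as a GET request asking for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Placeholder client name: replace with your own tool name and contact details.
    return urllib.request.Request(url, headers={"User-Agent": "wikidata-intro-example/0.1"})


def run_query(query: str) -> list[dict]:
    """Send the query (requires network access) and return the result rows."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run_query(BICAMERAL_QUERY)` should return one row per matching item, each row mapping `item` and `itemLabel` to a small dictionary whose `value` field holds the URI or label.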

SPARQL is an open standard

Try the "Examples" button for lots of interesting searches

Hint: Don't write queries from scratch. Modify existing ones!

Use auto-complete with CTRL-SPACE (Beware Mac users!)

Advanced search with Wikidata

Statistics, graphs, maps via Wikidata

Discover stories in the data:

Example: Where have members of Congress been educated?

Example: Education of Congress

List all members of Congress who have ever served and examine where they have been educated:

?moc wdt:P31 wd:Q5 .          # "instance of" human
?moc wdt:P1157 ?lcbioid .     # "US Congress Bio ID"
?moc wdt:P69 ?school .        # grab "educated at"

COUNT the occurrences of each school

ORDER them from highest to lowest

LIMIT it to the top 15 results

Education of Congress
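Put together, those steps could be written as one query like the following, kept in a Python string so it can be sent with any HTTP client. It is a sketch under our own assumptions, not necessarily the query behind the deck's link: `COUNT(DISTINCT ?moc)` counts each person once per school even if they carry more than one Congress Bio ID, and the label service supplies readable school names. `top_schools` is a hypothetical helper for the standard SPARQL JSON result format.

```python
# The "Education of Congress" steps as a single SPARQL query, plus a result helper.
EDUCATION_OF_CONGRESS = """
SELECT ?school ?schoolLabel (COUNT(DISTINCT ?moc) AS ?count) WHERE {
  ?moc wdt:P31 wd:Q5 .          # instance of: human
  ?moc wdt:P1157 ?lcbioid .     # has a US Congress Bio ID
  ?moc wdt:P69 ?school .        # educated at
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?school ?schoolLabel
ORDER BY DESC(?count)
LIMIT 15
"""


def top_schools(bindings: list[dict]) -> list[tuple[str, int]]:
    """Convert SPARQL JSON result rows into (school label, member count) pairs."""
    return [
        (row["schoolLabel"]["value"], int(row["count"]["value"]))
        for row in bindings
    ]
```

GROUP BY, ORDER BY DESC and LIMIT correspond one-to-one to the COUNT, ORDER and LIMIT steps listed above.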

Run time: about 15 seconds

Results can be shown in multiple ways

Tables

Maps

Charts

Timelines
http://tinyurl.com/k8tqzj7

Education of Congress

Harvard, Yale, Princeton (H-Y-P) dominate

University of Michigan very prominent

Some surprises - Union College?

In 1800, the U.S. Big Four colleges:
● Harvard
● Yale
● Princeton
● Union College (!)

Union College lost ground amid a financial scandal and Civil War attrition (1861-1865)

Deeper query: members of Congress educated at Union - table mode, date of birth

Members of Congress educated at Union - timeline mode, Civil War (1861-1865)

Union College, Schenectady, NY

Columbia University, NY

Impact of Wikidata

Google closed their own Freebase project in 2016, in favor of backing Wikidata

Google search results and Knowledge Graph use Wikidata

Schema.org has endorsed using Wikidata

Interesting Wikidata tools

Wikidata Query

Wikidata Graph Builder

Monumental

Reasonator

Vizquery

Gender Gap Tool

Wikidata Query: SPARQL endpoint
https://query.wikidata.org/
Try example queries

Reasonator: Nicely formatted Wikidata pages
https://tools.wmflabs.org/reasonator/

SQID: Browsing Wikidata entries and linkages
https://tools.wmflabs.org/sqid/

Wikidata Timeline: Queries displayed in linear timelines
https://tools.wmflabs.org/wikidata-timeline

Wikidata Distributed Game: click to contribute
https://tools.wmflabs.org/wikidata-game/distributed/

Quickstatements: Bulk upload tables
https://tools.wmflabs.org/wikidata-todo/quick_statements.php

Wikidata Graph Builder: Visualizing relationships
https://angryloki.github.io/wikidata-graph-builder

Query: Washington DC museums, metadata
http://tinyurl.com/ydekc2w2

Consider this query to find all museums in DC...

Query: Washington DC museums, results
http://tinyurl.com/ydekc2w2
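The deck's actual query sits behind the short link. A query of the same kind might look like the sketch below, assuming that museums are items that are an instance of museum (Q33506) or of one of its subclasses, located in (P131) Washington, D.C. (Q61). Museums recorded only against a smaller area would be missed. The `#defaultView:Map` comment asks the query service to open in map view, using the optional coordinates (P625). `museums_query` is a hypothetical helper.

```python
# Build a "museums in a place" SPARQL query for the Wikidata Query Service.
import re


def museums_query(place_qid: str, view: str = "Map") -> str:
    """Query for museums located in the given place (a Q number such as "Q61")."""
    if not re.fullmatch(r"Q[1-9][0-9]*", place_qid):
        raise ValueError(f"not a Q number: {place_qid!r}")
    return f"""#defaultView:{view}
SELECT DISTINCT ?museum ?museumLabel ?coords WHERE {{
  ?museum wdt:P31/wdt:P279* wd:Q33506 .    # instance of museum, or of a subclass of museum
  ?museum wdt:P131 wd:{place_qid} .         # located in the administrative territorial entity
  OPTIONAL {{ ?museum wdt:P625 ?coords . }}  # coordinate location, used by the map view
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


DC_MUSEUMS = museums_query("Q61")  # Q61: Washington, D.C.
print(DC_MUSEUMS)
```

Passing `view="Table"` instead gives the raw table; the same results can be switched between views in the query service interface.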

Raw table results from query...

Query: Washington DC museums, multiple views
http://tinyurl.com/ydekc2w2

Vizquery: Simple Wikidata item selection
https://tools.wmflabs.org/hay/vizquery/

Much simpler way to do queries

Advanced Wikidata tools

Scholia - citations/authors of scholarly articles and journals

Bulk uploading - Quickstatements, Petscan

Wikidata Game, Distributed Game - contributing by clicking

Caveats: Wikidata still an early work in progress
● Many areas well-modeled
● Many areas quite bare (items with no statements)
● Instances vs subclasses

Shifting to Wikidata: issues
● Control of modeling
● Display and formatting
● Features vs. commonality

Wikidata: Internet duct tape

Research, academic hub

The future is structured, through Wikidata

CC0 - no …

Join top cultural and commercial institutions already working with Wikidata

Ask questions!

Wikidata in One Slide

Main site - http://wikidata.org/
Tools - https://wikidata.org/wiki/Wikidata:Tools/External_tools
● Queries - Experiment with http://query.wikidata.org
● Plan - contributions / image uploads
○ Mix-n-Match
○ Quickstatements

Wikidata in one page
http://bit.ly/wikidata-onepage

Just as with Wikipedia: Editing Wikidata
● Edit button
● +add button

The next step?

● Add your library to Wikidata
● Dan Scott's step-by-step guide: https://coffeecode.net/creating-and-editing-libraries-in-wikidata.html
● Wikidata explore: http://bit.ly/oclc18wikidataexplore
● Wikidata treasure hunt: http://bit.ly/oclcwikidata18hunt
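As a bulk alternative to adding a library by hand, Quickstatements (listed among the tools above) accepts plain-text commands. The sketch below generates them following our understanding of its tab-separated V1 syntax: CREATE, then LAST for the newly created item, Len/Den for the English label and description, and @lat/lon for coordinates. Check the tool's own documentation before uploading anything. The library name, description and coordinates are hypothetical, and `library_commands` is an illustrative helper.

```python
# Emit Quickstatements commands (V1, tab-separated) that create one library item.
# The command syntax follows our reading of the tool's V1 format; verify before uploading.
def library_commands(label_en: str, description_en: str, lat: float, lon: float) -> str:
    for text in (label_en, description_en):
        if '"' in text or "\t" in text or "\n" in text:
            raise ValueError(f"unsupported character in {text!r}")
    commands = [
        "CREATE",                            # start a new item
        f'LAST\tLen\t"{label_en}"',          # English label of the item just created
        f'LAST\tDen\t"{description_en}"',    # English description
        "LAST\tP31\tQ7075",                  # instance of (P31): library (Q7075)
        f"LAST\tP625\t@{lat}/{lon}",         # coordinate location (P625)
    ]
    return "\n".join(commands)


# Hypothetical library and coordinates, for illustration only.
print(library_commands("Example Community Library", "hypothetical library used for illustration", 38.9, -77.03))
```

Generating one block of commands per row of a spreadsheet is what makes this useful for many branches at once; the input check simply refuses characters that would break the tab-separated format.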

Thank you! Discussion - Q&A

@fuzheado

@wikigamaliel

Wikimedia District of Columbia @wikimediadc wikimediadc.org