Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond

Please complete a short survey: http://bit.ly/oclc18wikidatasurvey
Slides: http://bit.ly/oclc18wikidata
One page guide: http://bit.ly/wikidata-onepage

Wikimedia District of Columbia @wikimediadc
[email protected] @fuzheado
[email protected] @wikigamaliel
OCLC | June 12, 2018 | Wikimedia DC | #Wikidata
Licensed via CC-BY-SA 4.0

About

Andrew Lih
Author of The Wikipedia Revolution; digital sharing strategist; journalism professor

Robert Fernandez
Assistant Professor, Resources Development/eLearning Librarian, Prince George's Community College; Wikimedia DC board member

GLAM: Galleries, Libraries, Archives and Museums - Cultural Partners

Wikimedia DC
● Local chapter for the Wikipedia / Wikimedia community
● Wikidata and modeling

Library of Congress
● Edit-a-thons
● Associations for the Centers for Study of Congress

NARA
● Full-time Wikipedian in Residence
● Wikipedia Space exhibit
● Wikiconference hosting

Smithsonian
● Edit-a-thons / Article improvement drives
● Linked Open Data

Wikidata: The evolution of Wikipedia into the ultimate, free linked open database

Wikidata in one page
http://bit.ly/wikidata-onepage

2017 was a turning point for Wikidata
Why?
● Google Knowledge Graph
● Digital assistants: Siri, Alexa, etc.
● Infoboxes on Wikipedia
● Structured data on Commons
● Wikicite, WikidataCon

A hub for the future of Wikimedia content

The mission
Image credit: CC BY 3.0, Cavefrog

Wikidata items have identifiers - VIAF links back

Dan Scott, Laurentian University
"rather than focusing on directly enhancing our own local data repository silos (for example, library catalogues, digital exhibits), libraries and archives should invest their limited resources in enriching Wikidata, a centralized data repository, to maximize the visibility of those entities and the reusability of that data in the world at large… and then pull that data back into our local repositories to enrich our displays and integration with the broader world of data."
https://coffeecode.net/wikidata-canada-150-and-music-festival-data.html

Overview
Why Wikidata?
Design of Wikidata
Features, RDF, triples
Queries and tools
Case studies
Calls to action

Wikipedia today
More than 5 million English articles
Top 10 most visited site
Reputation and cultural partnerships

Wikipedia challenges
Knowledge scattered among 30 million articles in 200+ languages
Inconsistency, gaps and replication
How to consolidate knowable facts?

Lesson: Images and multimedia
2001: Images scattered across Wikipedia editions
2004: Wikimedia Commons centralized and consolidated multimedia
commons.wikimedia.org

Wikidata as the future
Convert encyclopedic lexical content into "structured" statements
Turn human readable into machine understandable
Link to stable external data of LAM institutions
"Semantic web" realized

Facts and figures from articles, infoboxes are only in human-readable prose
Navigation boxes at bottom of Wikipedia articles done by hand

Wikidata: Sum of all human knowledge
Launched 2012
Power of searching, sorting and querying capabilities
Interconnected mesh

Fundamentals: Statements
Factual claims are stored as statements in Wikidata
subject - predicate - object
or item - property - value
or thing - relationship - thing

Wikidata item for United States Congress (Q11268)
"Triple"

Wikidata Basics: item - property - value

Q numbers - item
● Anyone can make a Q item
● Corresponds to Wikipedia article / concept
● Examples
  ○ Q1 - the Universe
  ○ Q2 - Earth
  ○ Q5 - human
  ○ Q146 - cat
  ○ Q729 - animal
  ○ Q571 - book
  ○ Q7075 - library
  ○ Q190593 - OCLC
  ○ Q877140 - cardigan

P numbers - property
● Controlled vocabulary for consistency
● Proposal, discussion and approval process
● Examples
  ○ P31 - instance of
  ○ P279 - subclass of
  ○ P214 - VIAF ID
  ○ P569 - date of birth
  ○ P625 - coordinate location
● See: Wikidata:List_of_properties

Wikidata item page
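
Each Q or P number resolves to its own page on wikidata.org. The sketch below is a minimal illustration, not part of the original slides: it checks whether a string looks like an item or a property identifier and builds the corresponding page URL. The helper names `entity_kind` and `entity_url` are hypothetical.

```python
import re

# An identifier is a capital Q (item) or P (property) followed by a positive number.
ENTITY_ID = re.compile(r"^([QP])([1-9][0-9]*)$")


def entity_kind(entity_id: str) -> str:
    """Return "item" for Q numbers and "property" for P numbers."""
    match = ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(f"not a Wikidata identifier: {entity_id!r}")
    return "item" if match.group(1) == "Q" else "property"


def entity_url(entity_id: str) -> str:
    """Build the wikidata.org page URL for an item or property."""
    if entity_kind(entity_id) == "item":
        return f"https://www.wikidata.org/wiki/{entity_id}"
    # Property pages live in the "Property:" namespace.
    return f"https://www.wikidata.org/wiki/Property:{entity_id}"


if __name__ == "__main__":
    for identifier in ("Q11268", "Q5", "P31", "P214"):
        print(identifier, entity_kind(identifier), entity_url(identifier))
```
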
Wikidata statement triples

Claims capture factual, provable information
Any number of statements can be associated with an item
Underneath the surface...
Using symbols makes them language independent (identifiers vs names)

Item                   Property            Value
"George Washington"    "instance of"       "human"
Q23                    P31                 Q5
"George Washington"    "place of burial"   "Mount Vernon"
Q23                    P119                Q731635
"George Washington"    "LCAuth ID"         "n86140996"
Q23                    P244

Wikidata link on Wikipedia articles

Wikidata stores statements as explicit triples - item + property + value

Item                      Property        Value
United States Congress    "instance of"   "bicameral legislature"
Q11268                    P31             Q189445
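
The triples in the tables above can be sketched as plain tuples of identifiers, with human-readable labels kept in a separate lookup table. This is only an illustration of the item - property - value idea under simple assumptions, not how Wikidata is implemented. `Statement`, `match` and `render` are hypothetical names, and `None` acts as a wildcard when searching.

```python
from typing import Dict, List, NamedTuple, Optional


class Statement(NamedTuple):
    """One triple: item - property - value, stored as identifiers only."""
    item: str
    prop: str
    value: str


# English labels are kept apart from the statements, so the same triples
# could be rendered in any language by swapping this table.
LABELS_EN: Dict[str, str] = {
    "Q23": "George Washington",
    "Q5": "human",
    "Q731635": "Mount Vernon",
    "Q11268": "United States Congress",
    "Q189445": "bicameral legislature",
    "P31": "instance of",
    "P119": "place of burial",
}

STATEMENTS: List[Statement] = [
    Statement("Q23", "P31", "Q5"),
    Statement("Q23", "P119", "Q731635"),
    Statement("Q11268", "P31", "Q189445"),
]


def match(statements: List[Statement],
          item: Optional[str] = None,
          prop: Optional[str] = None,
          value: Optional[str] = None) -> List[Statement]:
    """Return statements that agree with every field that is not None."""
    return [
        s for s in statements
        if (item is None or s.item == item)
        and (prop is None or s.prop == prop)
        and (value is None or s.value == value)
    ]


def render(statement: Statement, labels: Dict[str, str]) -> str:
    """Replace identifiers with labels, falling back to the raw identifier."""
    return " - ".join(labels.get(part, part) for part in statement)


if __name__ == "__main__":
    for found in match(STATEMENTS, item="Q23"):
        print(render(found, LABELS_EN))
```
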
Relationships are "first class" = very fast to search and sort
Seconds vs minutes to search
Ad hoc data model highly adaptive
Well-suited to the wiki way

Traditional databases

Schemas well-defined and controlled
Relational databases and SQL: Columns need lots of planning and forethought
Changes can be complex, with many cascading effects
Searches involving relationships can be slow or expensive (join operations)

Artist           Date of birth        Country          Medium
Henri Matisse    December 31, 1869    France           Painting
Claude Monet     November 14, 1840    France           Painting
Edward Hopper    July 22, 1882        United States    Painting

Work                   Creator          Date    Location
Les Bêtes De La Mer    Henri Matisse    1950    NGA
Cape Cod Morning       Edward Hopper    1950    SAAM
Nighthawks             Edward Hopper    1942    Art Inst Chicago

Wikidata and RDF databases

Relationships are explicit and precise
Database can take any shape and grow according to need
Also known as "graph databases"

Diagram (nodes and labeled edges):
Edward Hopper -- citizen of --> United States
Edward Hopper -- date of birth --> July 22, 1882
Nighthawks, Cape Cod Morning -- creator --> Edward Hopper
Nighthawks, Cape Cod Morning -- instance of --> painting
painting -- subclass of --> creative work

Summary
UPSIDES
RDF triples make for a very flexible and fast system
Suitable for the BEBOLD wiki culture
Multiple parallel ontologies can co-exist

DOWNSIDES
Schema-on-the-fly system can make modeling inconsistent and difficult
Hard for newcomers to understand
Multiple parallel ontologies can co-exist

Wikidata items

Using identifiers removes language dependence and ambiguity in:
Writing systems (Chinese, Serbian, Kazakh, et al)
Phonetization variations
Spelling variations
Maiden vs. married names

53 Latinized variations! (May 2017)
Word cloud of name variants for a single item, including: Muammar Gaddafi, Gadhafi, Colonel Gaddafi, Kadhafi, Khadafi, Gadafi, Kadaffi, Khaddafi, Al-Khadafy, Gadaffi, Kaddafi, Kadafi, El Kazzafi, Omar Gadafi, Muhamad Gadafi, Jaddafi, Kaddaffi, Qaddafy, Moammar Jaddafi, Muamar Gadafi, Muhamar Gadaffi, Muamar el-Gadafi, Mu‘ammar al-Qaḏḏāfī, Al-Qadhdhaafi, Al-Qathafi

Canonical identifiers help link to external databases
Item: Muammar Gaddafi, Q19878

Speed, consistency, automation
Wikidata has more than 48 million items
Simple searches take less than a second
Complex queries supported by open standards like SPARQL

Search example - Find all bicameral legislatures
http://query.wikidata.org
Item    Property        Value
?       "instance of"   "bicameral legislature"
?       P31             Q189445

Wikidata Search - Result from Query
26 million items in 1/3 of a second

Wikidata items have identifiers - links to external databases
Barack Obama (Q76) has 83 identifiers!

Some prominent identifiers - links to external databases
● WorldCat
● VIAF
● LC Name Authority File
● ISNI
● GND (Integrated Authority File)
● SUDOC (French universities)
● BNF (Bibliotheque France)
● MusicBrainz
● NARA
● NDLAuth ID (National Diet Library of Japan)
● CiNii (Scholarly and Academic Information Navigator), Japan
● SELIBR (National Library of Sweden Libris)
● NLA (Australia) ID
● NKCR Czech National Authority Database (National Library of Czech Republic)
● CONOR ID (Slovenia)
● RSL ID (person), Russian State Library
● Dutch National Thesaurus for Author names
● NUKAT - Center of Warsaw University Library catalog
● Social Networks and Archival Context ID (SNAC)
● California Digital Library
● University of Virginia
● University of California, Berkeley
● NNDB people ID - Notable Names Database
● Politifact
● Encyclopedia Britannica ID
● NYT topic ID
● Guardian topic ID
● IMDB
● Parlement & Politiek ID (Dutch politics site)
● Declarator.org - Russian non-governmental database with information on the income of government officials
● Bio Directory of Congress
● Quora topic ID
● C-SPAN person ID
● Freebase
● OCLC related properties

Consistency and automation
Constraint reports/violations provide warnings on logic and bounds

Common Wikidata editing tasks - Language links
Wikidata provides central hub where all inter-wiki links are found

Properties: Identifiers
Indexes into other databases
Authority control
Accession numbers
Catalog identifiers
Stable URLs to other sites

Alternative contribution methods
Instead of individual item pages...
Task lists, games and other interfaces contribute to Wikidata
Wikidata Game allows for "one-click" contributions based on task lists

Notable external databases
Art and museum databases, thesauri, dictionaries, encyclopedias, national and academic libraries
Internet-based databases - IMDb, MusicBrainz, Quora
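
An identifier statement becomes a working link once the stored value is dropped into a URL pattern for the external database. The sketch below illustrates the idea with the LCAuth ID for George Washington shown earlier. The URL templates are assumptions chosen for illustration (the LC one only fits name authority IDs), and `URL_TEMPLATES` and `external_url` are hypothetical names, not a Wikidata API.

```python
from typing import Dict, Optional

# Assumed URL patterns, keyed by identifier property. Wikidata records a
# pattern like this on each identifier property; these are illustrative only.
URL_TEMPLATES: Dict[str, str] = {
    "P244": "https://id.loc.gov/authorities/names/{}",  # LCAuth ID (name records)
    "P214": "https://viaf.org/viaf/{}",                 # VIAF ID
}


def external_url(prop: str, value: str) -> Optional[str]:
    """Return a link into the external database, or None if no pattern is known."""
    template = URL_TEMPLATES.get(prop)
    if template is None:
        return None
    return template.format(value)


if __name__ == "__main__":
    # George Washington (Q23) - LCAuth ID (P244) - "n86140996"
    print(external_url("P244", "n86140996"))
```
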
Mix'n'Match: a one-click "game" interface to help match external data to Wikidata
https://tools.wmflabs.org/mix-n-match

Wikidata Game: Identifier match for SAAM

Searching and displaying
Querying tools
Presentation
Visualization
Wikidata APIs and endpoints

Basic search with Wikidata
SPARQL endpoint at query.wikidata.org
Superficially similar to SQL
One of the busiest endpoints on the Internet
https://wikidata.org/wiki/Wikidata:SPARQL_query_service
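
The earlier search example (? - P31 - Q189445, "find all bicameral legislatures") can be written as a short SPARQL query and sent to the endpoint over HTTP. This is a minimal sketch using only the Python standard library. The function names and the User-Agent string are placeholders, and `run_query` needs network access, so it is defined here but not called.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def instances_of_query(class_qid: str, limit: int = 10) -> str:
    """SPARQL for the pattern: ?item - "instance of" (P31) - class_qid."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def request_url(query: str) -> str:
    """Encode the query as a GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{ENDPOINT}?{params}"


def run_query(query: str) -> list:
    """Send the query and return the list of result rows (requires network)."""
    request = urllib.request.Request(
        request_url(query),
        headers={"User-Agent": "wikidata-slides-example/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]


if __name__ == "__main__":
    # Q189445 is "bicameral legislature" in the slides above.
    print(instances_of_query("Q189445"))
```

The same query text can also be pasted directly into the editor at query.wikidata.org.
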
SPARQL is an open standard
Try the "Examples" button for lots of interesting searches
Hint: Don't write queries from scratch. Modify existing ones!
Use auto-complete with CTRL-SPACE (Beware Mac users!)

Advanced search with Wikidata
Statistics, graphs, maps via Wikidata
Discover stories in the data:
Example: Where have members of Congress been educated?

Example: Education of Congress
List all members of Congress who have ever served and examine where they have been educated

?moc wdt:P31 wd:Q5 .          # "instances of" humans
?moc wdt:P1157 ?lcbioid .     # LC "Congress Bio ID"
?moc wdt:P69 ?school .        # grab "educated at"

COUNT the occurrences of each school
ORDER them from highest to lowest
LIMIT it to the top 15 results
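
The three patterns and the COUNT / ORDER / LIMIT steps above can be assembled into one complete query. The sketch below is one plausible way to write it and is not necessarily the exact query behind the slides: the SELECT, GROUP BY and label service lines are assumptions added to make the fragment runnable, and `top_schools_query` is a hypothetical helper that only builds the query text.

```python
def top_schools_query(limit: int = 15) -> str:
    """Build a SPARQL query counting where members of Congress were educated."""
    lines = [
        "SELECT ?school ?schoolLabel (COUNT(DISTINCT ?moc) AS ?count) WHERE {",
        '  ?moc wdt:P31 wd:Q5 .          # "instances of" humans',
        '  ?moc wdt:P1157 ?lcbioid .     # LC "Congress Bio ID"',
        '  ?moc wdt:P69 ?school .        # grab "educated at"',
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
        "GROUP BY ?school ?schoolLabel   # COUNT the occurrences of each school",
        "ORDER BY DESC(?count)           # ORDER them from highest to lowest",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed text into the editor at query.wikidata.org to run it.
    print(top_schools_query())
```
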
Run time: about 15 seconds
Results can be shown in multiple ways
Tables
Maps
Charts
Timelines
http://tinyurl.com/k8tqzj7

H-Y-P (Harvard, Yale, Princeton) dominate
University of Michigan very prominent
Some surprises - Union College?

In 1800, the U.S. Big Four colleges:
Harvard
Yale
Princeton
Union (!)

Union College lost ground amid a financial scandal and Civil War attrition (1861-1865)

Deeper query: members of Congress educated at Union - table mode, date of birth

Members of Congress educated at Union - Timeline mode, Civil War (1861-1865)

Map labels: Union College, Schenectady, NY; Columbia University, New York, NY

Impact of Wikidata
Google closed their own Freebase project in 2016, in favor of backing Wikidata
Google search results and Knowledge Graph use Wikidata
Schema.org has endorsed using Wikidata

Interesting Wikidata tools
Wikidata Query
Wikidata Graph Builder
Monumental
Reasonator
Vizquery
Gender Gap Tool

Wikidata Query: SPARQL endpoint
https://query.wikidata.org/
Try example queries

Reasonator: Nicely formatted Wikidata pages
https://tools.wmflabs.org/reasonator/

SQID: Browsing Wikidata entries and linkages
https://tools.wmflabs.org/sqid/

Wikidata Timeline: Queries displayed in linear timelines
https://tools.wmflabs.org/wikidata-timeline

Wikidata Distributed Game: click to contribute
https://tools.wmflabs.org/wikidata-game/distributed/

Quickstatements: Bulk upload tables
https://tools.wmflabs.org/wikidata-todo/quick_statements.php

Wikidata Graph Builder: Visualizing relationships
https://angryloki.github.io/wikidata-graph-builder

Query: Washington DC museums, metadata
http://tinyurl.com/ydekc2w2
Consider this query to find all museums in DC...

Query: Washington DC museums, results
http://tinyurl.com/ydekc2w2
Raw table results from query...

Query: Washington DC museums, multiple views
http://tinyurl.com/ydekc2w2

Vizquery: Simple Wikidata item selection
https://tools.wmflabs.org/hay/vizquery/
Much simpler way to do queries

Advanced Wikidata tools
Scholia - citations/authors of scholarly articles and journals
Bulk uploading - Quickstatements, Petscan
Wikidata Game, Distributed Game - contributing by clicking

Caveats
Wikidata still an early work in progress
Many areas well-modeled
Many areas quite bare (items with no statements)
Instances vs subclasses

Shifting to Wikidata: Issues
● Control of modeling
● Display and formatting
● Features vs. commonality

The future is structured through Wikidata
Wikidata: Internet duct tape
Research, academic hub
CC0 - no copyright
Join top cultural and commercial institutions already working with Wikidata
Ask questions!

Wikidata in One Slide
Main site - http://wikidata.org/
Tools - https://wikidata.org/wiki/Wikidata:Tools/External_tools
● Queries - Experiment with http://query.wikidata.org
● Plan - linked data contributions / image uploads
  ○ Mix-n-Match
  ○ Quickstatements

Just as with Wikipedia: Editing Wikidata
Edit button
+add button

The next step?
● Add your library to Wikidata
● Dan Scott's step by step guide: https://coffeecode.net/creating-and-editing-libraries-in-wikidata.html
● Wikidata treasure hunt: http://bit.ly/oclc18wikidataexplore
● Wikidata explore: http://bit.ly/oclcwikidata18hunt
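
For adding more than a handful of libraries, the Quickstatements tool mentioned earlier takes plain-text commands instead of manual edits. The sketch below generates commands in what is assumed here to be the tab-separated "version 1" Quickstatements syntax (CREATE, then LAST lines for the new item). It does not replace Dan Scott's step by step guide. The function name and the library label are made up for illustration, and any output should be checked against the current Quickstatements help page before use.

```python
from typing import Iterable, Tuple


def new_item_commands(label_en: str,
                      statements: Iterable[Tuple[str, str]]) -> str:
    """Build Quickstatements-style commands that create one new item.

    label_en   - English label for the new item
    statements - (property, value) pairs; values are passed through as given,
                 so item values look like Q7075 and strings must already be quoted
    """
    if '"' in label_en or "\t" in label_en:
        raise ValueError("label must not contain double quotes or tabs")
    lines = ["CREATE", f'LAST\tLen\t"{label_en}"']
    for prop, value in statements:
        lines.append(f"LAST\t{prop}\t{value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical library; P31 "instance of" and Q7075 "library" come from the slides.
    print(new_item_commands("Example Public Library", [("P31", "Q7075")]))
```
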
Thank you! Discussion - Q&A
[email protected] @fuzheado
[email protected] @wikigamaliel
Wikimedia District of Columbia @wikimediadc wikimediadc.org