A Disagreeable Case


Friday, 29 July 16

Taxonomies
• A simple, common form of representation
• Simplest version
  – A set of terms
  – Arranged in a tree
• The paths represent term relations
  – Generalization/specialization
  – Navigational salience
  – Decision structure
• Key feature
  – Every term has a single place
    • And thus a single path
    • (We can generalize from trees!)
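The "single place, single path" property can be made concrete with a small sketch. The terms below are invented for illustration and the helper names (`path_to_root`, `is_specialization_of`) are hypothetical; the only assumption is that a tree-shaped taxonomy can be stored as a map from each term to its one parent.

```python
# Toy taxonomy: each term maps to its single parent (None marks the root).
# The terms are invented for illustration only.
PARENT = {
    "appliance": None,
    "washer": "appliance",
    "dryer": "appliance",
    "front-loader": "washer",
    "top-loader": "washer",
}


def path_to_root(term, parent=PARENT):
    """Return the unique path from the root down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = parent[term]
    return list(reversed(path))


def is_specialization_of(term, ancestor, parent=PARENT):
    """True if `ancestor` lies on the root-to-`term` path (a term counts as its own ancestor)."""
    return ancestor in path_to_root(term, parent)


print(path_to_root("front-loader"))
```

Because every term has exactly one parent, the path is unique. Allowing several parents per term would turn the tree into a graph with several paths to the same term.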

Disagreement: Taxonomic position
• Hierarchies aren’t neutral!

“In meta-utopia, the lab-coated guardians of epistemology sit down and rationally map out a hierarchy of ideas...This presumes that there is a "correct" way of categorizing ideas, and that reasonable people, given enough time and incentive, can agree on the proper means for building a hierarchy. Nothing could be farther from the truth. Any hierarchy of ideas necessarily implies the importance of some axes over others.”
Source: Metacrap: Putting the torch to seven straw-men of the meta-utopia

Lab Coat View

Mary Van Rensselaer Buell (1893-1969) http://www.flickr.com/photos/smithsonian/3322785642/

http://www.well.com/~doctorow/metacrap.htm#2.5

Duelling Manufacturers

vs

http://www.well.com/~doctorow/metacrap.htm#2.5

Reconciled?

(Navigational, not generalisation)

Could Web Tech Help This?
• URIs!
  – URIs are (putatively) global identifiers
  – URIs designate “the same” entity in a given context
    • And perhaps, ideally, in all contexts
• The duelling manufacturers
  – Could use the same URIs for each concept
    • http://common-terminology-for-washers.com/size
    • Thus the common nodes would merge “automatically”
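A minimal sketch of what "merging automatically" could look like, assuming each manufacturer publishes its hierarchy as a set of (parent, child) edges between URIs. Only the `.../size` URI comes from the slide. The other URIs, the two hierarchies, and the helper names are invented for illustration.

```python
# Each manufacturer's taxonomy is a set of (parent, child) edges between URIs.
# Only BASE + "size" appears on the slide; the rest is hypothetical.
BASE = "http://common-terminology-for-washers.com/"

# Maker A navigates washer -> size -> colour.
maker_a = {
    (BASE + "washer", BASE + "size"),
    (BASE + "size", BASE + "colour"),
}
# Maker B navigates washer -> colour -> size.
maker_b = {
    (BASE + "washer", BASE + "colour"),
    (BASE + "colour", BASE + "size"),
}


def merge(*taxonomies):
    """Union the edge sets; nodes named by the same URI collapse into one."""
    merged = set()
    for edges in taxonomies:
        merged |= edges
    return merged


def nodes(edges):
    """All URIs mentioned by any edge."""
    return {uri for edge in edges for uri in edge}


def parents_of(edges, uri):
    """Every parent of `uri` in the merged structure."""
    return {parent for parent, child in edges if child == uri}


combined = merge(maker_a, maker_b)
print(len(nodes(combined)), "nodes,", len(combined), "edges")
print(sorted(parents_of(combined, BASE + "size")))
```

Shared URIs do make the nodes coincide without any extra work, but in this toy case the result is no longer a tree: `size` ends up with two parents. Agreeing on names does not settle the disagreement about position.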

Disagreement: Taxonomic position
• No one position!

“Discussion of the hierarchies frequently will elicit comments from the domain expert about the hierarchy structure. Not infrequently in biomedicine, there is no canonical determination of a concept’s correct tree position. For example, meningococcal meningitis may be classified correctly as both a disease of the central nervous systems and a bacterial disease.”
Source: Modeling a description logic vocabulary for cancer research

Even with graphs, people might disagree about position!

Reconciling?

“For example, meningococcal meningitis may be classified correctly as both a disease of the central nervous systems and a bacterial disease. So there are always things the experts will question. These discussions of why the hierarchies are structured as they are offer the opportunity to introduce the notions of roles, since the hierarchy position of defined concepts are the result of the concept’s role restrictions.”

• Focus on the definition of terms
  – Easier to get agreement?
• Position follows from definitions
  – But this is a standard KR benefit! (Inference!)
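To illustrate "position follows from definitions", here is a toy sketch in which each defined concept is just a set of role restrictions and the hierarchy is inferred by comparing those sets. This is a deliberately simplified structural check, not a description logic reasoner. The concept names, role names, and functions are hypothetical and are not taken from the NCI vocabulary.

```python
# A defined concept is modelled as a frozenset of (role, filler) restrictions.
# Names are illustrative only; fillers are compared by equality, with no
# hierarchy among the fillers themselves.
DEFINITIONS = {
    "Disease": frozenset(),
    "CNSDisease": frozenset({("has_site", "CentralNervousSystem")}),
    "BacterialDisease": frozenset({("has_cause", "Bacterium")}),
    "MeningococcalMeningitis": frozenset({
        ("has_site", "CentralNervousSystem"),
        ("has_cause", "Bacterium"),
    }),
}


def subsumes(general, specific, defs=DEFINITIONS):
    """`general` subsumes `specific` if all of its restrictions also apply to `specific`."""
    return defs[general] <= defs[specific]


def direct_parents(concept, defs=DEFINITIONS):
    """Most specific strict subsumers of `concept`, inferred from the definitions."""
    ancestors = {
        c for c in defs
        if defs[c] < defs[concept]  # strict subset: a proper generalization
    }
    return {
        a for a in ancestors
        if not any(defs[a] < defs[b] for b in ancestors)
    }


print(sorted(direct_parents("MeningococcalMeningitis")))
```

Nobody asserted where meningococcal meningitis sits. Both parents fall out of its definition, so the discussion with experts can move from tree position to the restrictions themselves.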

Lessons Learned?
• Is the lack of schema neutrality solved?
  – Perhaps for good actors
  – Perhaps not for bad actors
  – Problem is social as well as technological
    • We have to design tech around how people behave
• There wasn’t a clear “decentralization moment”
  – Variant perspectives are a feature of the Web
  – But it wasn’t clearly a new challenge for KR
    • Indeed, it may be an old challenge

Integrating KR and the Web


How to Combine Web & KR

1. Uploading a KR to the Web
2. Mining the Web to Generate a KR
3. Publishing a Web Based KR App
4. Something distinctive

1) Uploading a KR to the Web
• Still a popular mechanism!
  – The Web is effective at sharing files
  – And more!

Profound effects on KR Research
• 2002 ≈ 200 ontologies on the Web
• 2006 ≈ 1,000 ontologies on the Web
• 2011 ≈ 30,000 ontologies on the Web
• Size and complexity of ontologies growing
• Tools can handle them!

Effects on Research: Depth
• Publishing systematically yields benefits
  – NCIt published monthly for 10 years
  – Can study the evolution of ontologies

Effects on Practice?
• Harder to say
  – Reuse and linking not esp. high
    • Designing for reuse is hard
    • Reusing is hard
    • Reinvention is the norm
• “Standards” make progress on reuse
  – But usually high level and pushed hard (e.g., BFO)
  – Web based editing still in infancy
    • Most work is done with offline tools
    • See software engineering
• Standard Web Benefits
  – Examples
  – Feedback
  – Standardization of formalisms
  – Indirect (tools get better)

2) Mining the Web
• The Web is an information resource – let’s use it
• Use the content
  – DBpedia comes to mind
• Use the people!
  – E.g. ConceptNet 2

Interlude: The Web Evolves

Previous versions of ConceptNet were a home-grown crowd-sourced project, where we ran a Web site collecting facts from people who came to the site. The Web of Data is much bigger than that now. Our data comes from many different sources, many of which you can contribute to and improve not just the state of computational knowledge, but of human knowledge.
• To begin with, ConceptNet 5 contains almost all the data from ConceptNet 4, created by contributors to the Open Mind Common Sense project.
• Much of our knowledge comes from the English Wikipedia and its contributors, through two sources:
  – DBPedia extracts knowledge from the infoboxes that appear on articles.
  – ReVerb is a machine reading project, extracting relational knowledge from the actual text of each article.
• We have also parsed a large amount of content from the English Wiktionary, including synonyms, antonyms, translations of concepts into hundreds of languages, and multiple labeled word senses for many English words.

http://conceptnet5.media.mit.edu/

Using People
• Direct crowd sourcing
  – Let The People curate
• Indirect crowd sourcing
  – Feedback and use
  – Map data!
• Excellent for concrete facts
  – Unproven for more complex things

3) Publishing a Web Based KR App

http://www.isi.edu/isd/LOOM/PowerLoom/documentation/ontosaurus-screenshot.gif

What is Wolfram Alpha?

“Wolfram|Alpha's long-term goal is to make all systematic knowledge immediately computable and accessible to everyone.”

“We aim to collect and curate all objective data; implement every known model, method, and algorithm; and make it possible to compute whatever can be computed about anything.”

A (Private) !

http://www.wolframalpha.com/about.html

Interlude: Other Examples
• Wolfram Alphaesque
  – Evi/TrueKnowledge
    • Lighter understanding, narrower scope
  – Garlik
    • Deeper understanding, narrower scope
  –
    • Even lighter understanding, even greater scale
  – in Google ()
    • 500 million objects
    • 3.5 billion facts
• All fairly centralized!
  – Although they pull a lot from other sources
  – Share results but not data

Do we still need decentralization?

“We remove the centralized concepts of absolute truth, total knowledge, and total provability, and see what we can do with limited knowledge.”

• In 1993, the Web >>> any 1 org’s capacity
• Today?
  – Google, , Facebook, etc.
  – Can handle both the data and the people

4) Something Distinctive
• 1-3 are not fundamentally distinctive
  – Not a new web (really)
  – Not a new kind of KR
• What would be a Webesque KR?
  – BTW: I’ve no idea
    • (OK, I’ve some ideas...)
  – Do we need such a thing?
    • Web distinctiveness might be overstated
    • Technology may have caught up
      – Google (or Facebook) could run the Web

Linked Data?

Perhaps we were wrong about this knowledge thing?


Linked Data “Principles”
• Linked Data is the new cool fad!
  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  4. Include links to other URIs, so that they can discover more things.
• Still not very directive!
  – Some direction
    • More than said here: E.g., “reuse URIs”
  – Entity and attribute reconciliation?!
  – “Providing useful information” is tricky
    • Esp. for programs!
    • Meaning of terms....ontologies?!
      – (In the broadest sense of the term)
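One way to see how little the four principles prescribe is to simulate them. The sketch below replaces real HTTP lookups with an in-memory dictionary, so every URI, predicate, and value in it is hypothetical, and `dereference` and `follow_your_nose` are illustrative names rather than any library's API.

```python
# A stand-in for the Web: URI -> list of (predicate, object) pairs that a
# server might return for that URI. All URIs and data are hypothetical.
FAKE_WEB = {
    "http://example.org/id/washer-1": [
        ("http://example.org/vocab/label", "Front-loading washer"),
        ("http://example.org/vocab/madeBy", "http://example.org/id/acme"),
    ],
    "http://example.org/id/acme": [
        ("http://example.org/vocab/label", "Acme Appliances"),
        ("http://example.org/vocab/basedIn", "http://example.net/place/leeds"),
    ],
    "http://example.net/place/leeds": [
        ("http://example.org/vocab/label", "Leeds"),
    ],
}


def dereference(uri):
    """Principles 2 and 3: looking up a URI returns some description of it."""
    return FAKE_WEB.get(uri, [])


def follow_your_nose(start, max_hops=2):
    """Principle 4: discover more things by following links to other URIs."""
    seen, frontier = {start}, [start]
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            for _predicate, obj in dereference(uri):
                # Crude test: treat any object that looks like a URI as a link.
                if obj.startswith("http://") and obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return seen


print(sorted(follow_your_nose("http://example.org/id/washer-1")))
```

The traversal finds the linked URIs, but nothing in it says what `madeBy` or `basedIn` mean or whether the returned information is useful to a program, which is the gap the slide points at.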

http://www.w3.org/DesignIssues/LinkedData.html

Where are we?
• What’s the linking?
  – Ask yourself what sorts of links there are
  – Ask yourself what the links support
    • And who (or what) exploits them
  – What makes a linked datum?
    • And what value does it add over an unlinked one?
• What’s the useful information?
  – What sort of information could there be?
  – What makes it useful?
    • For what task?

Not Very Directive
• So, in the end
  – I can’t quite tell you what the Semantic Web is
  – I can’t tell you what makes SemWeb tech distinctive
    • Or distinctively valuable
  – Because, I believe we don’t know
    • I think these are very hard questions
• However,
  – The technologies and techniques are very interesting
    • Data on the web
    • KR on the web
    • Lots of stuff to do and learn!
  – We can do neat stuff with them
    • Truly open Wolfram Alpha/Evi
    • Custom Knowledge Sites

Have Fun!

Take pleasure in minority tech
Don’t depend on taking over the world
Expand our understanding
