Drawing Conclusions from Linked Data on the Web: the EYE Reasoner
IMPACT
Editor: Michiel van Genuchten, VitalHealth Software
Editor: Les Hatton, Oakwood Computing Associates

Ruben Verborgh and Jos De Roo

IEEE Software, May/June 2015. 0740-7459/15/$31.00 © 2015 IEEE

This issue’s installment examines a software program reasoning about the world’s largest knowledge source. Ruben Verborgh and Jos De Roo describe how a small open source project can have a large impact. This is the fourth open source product discussed in the Impact department and the first written in the logic programming language Prolog.
Michiel van Genuchten and Les Hatton

THE WEB is the world’s largest source of knowledge for people, and for machines. In the beginning, those machines were mostly search engine crawlers that extracted keywords from natural-language texts. But now, the Web offers them something far more powerful: linked data.

Linked data goes back to the essence of the Web and information itself, by representing each piece of data as a link between two things. For example, Figure 1 shows a triple stating that Thomas Edison “knows” Nikola Tesla. Unlike most hyperlinks between Edison and Tesla, this one carries a specific meaning. Yet linked data’s real benefit goes deeper: Edison and Tesla are represented by their Web address or URL. So, if you want to know more about Edison or Tesla, you can follow their URLs. Therefore, linked data is linked on two levels: each triple links two concepts, and those concepts link to more information about themselves.

Subject: http://dbpedia.org/resource/Thomas_Edison
Predicate: http://xmlns.com/foaf/0.1/knows
Object: http://dbpedia.org/resource/Nikola_Tesla

FIGURE 1. At the heart of linked data are triples: predicates link a subject to an object. This triple states that Thomas Edison “knows” Nikola Tesla.

If you look closely at Figure 1, you’ll notice that the link type itself (the property) is also a URL. So, if a machine doesn’t understand what “knows” means, it can look it up by following that URL. This principle is crucial to linked data: if you don’t know something, look it up. Which Thomas are we talking about? What does “knows” mean? Follow the URL to find out.

If you follow the URL for this particular “knows” (http://xmlns.com/foaf/0.1/knows), you’ll learn about the nature of this relationship. First, using “knows” means the involved subject and object are people. So even if we don’t know Thomas or Nikola, we know they’re people (as opposed to pets or cartoon figures). Second, this “knows” indicates reciprocity, so Nikola also knows Thomas. As humans, we can derive this without even being aware of who Nikola or Thomas are.

Such pieces of derived knowledge seem human-specific, but linked data also lets software reasoners arrive at the same conclusions. This column discusses how the EYE reasoner can do just that and how industry is already using this.

Reasoning on the Semantic Web with EYE

Since 2006, EYE (available under the W3C license at http://eulersharp.sourceforge.net) has been tackling such reasoning challenges on a large scale, thereby forming a part of the bigger Semantic Web vision. In this vision, machines perform tasks on the Web for people, combining linked data with concepts such as reasoning and proof to turn that data into knowledge and concrete actions.

Continuing the previous example, let’s see how EYE moves from data to conclusions. We can describe the knows property with existing concepts such as domain, range, and SymmetricProperty. Many reasoners have built-in knowledge; they would know what those concepts mean and how to apply them. The drawback, of course, is that only built-in concepts can create new knowledge. So, EYE was designed to have the least amount of inherent knowledge; it’s extensible through rules so that new knowledge can be created. For example, the EYE webpage offers rules that model the SymmetricProperty concept as follows:

    {
      ?property rdf:type owl:SymmetricProperty.
      ?subject ?property ?object.
    }
    =>
    {
      ?object ?property ?subject.
    }.

Perhaps surprisingly, this rule is also a (special) triple: antecedent–then–consequent. It explains that, if a symmetric property links a subject to an object, then it also links that object to the subject.

So, if we give EYE the Edison–knows–Tesla triple, the description of “knows,” and the previous rule, EYE will derive that Tesla also knows Edison.

In isolation, this might not look spectacular. After all, we gave EYE all the knowledge it needed. However, the rules can cascade at a large scale, so it becomes interesting if we give EYE tons of knowledge (say, millions of triples) and it derives just the facts we want. For instance, we might ask for “contemporaries of Nikola Tesla,” and by the (derived) fact that Tesla knows Edison, EYE could find Edison as an answer. In a different use case, we might feed EYE with medical information of millions of patients to identify adverse drug effects that previously were hidden. EYE is used regularly to solve such real-world clinical applications.

EYE’s Internals

Algorithmically speaking, EYE is a theorem prover. Users set a goal, and EYE tries to reach it by applying logical rules similar to what we mentioned previously, mostly working backward from the goal. To evade endless loops, the algorithm avoids needlessly repeating previous work through Euler path detection, similar to Leonhard Euler’s Königsberg bridge problem. EYE interprets each logical rule P ⇒ C (where P is a precondition and C is a consequent) as P AND NOT(C) ⇒ C, so that rules execute only when they can generate new triples.

A key characteristic of EYE’s architecture is portability, because interoperability is crucial to the Semantic Web. EYE components are interchangeable (see Figure 2). EYE runs in a Prolog virtual machine, which is compiled to assembly code and runs directly on the CPU. The core of EYE, the Euler Abstract Machine, is compatible with at least two major Prolog engines (YAP and SWI-Prolog). This machine accepts N3 (Notation3) P-code, which is a Prolog representation resulting from parsing RDF (Resource Description Framework) triples and N3 rules. The entire stack thereby becomes a generic reasoning engine, which is extensible with any kind of domain-specific rules. Because EYE can output N3 P-code to a file, users can create specific reasoning-engine instances that have a certain rule set preloaded for a particular domain.

[Figure 2 diagram, columns “Software” and “Machines”; labels from top to bottom: generic reasoning engine, Notation3 P-code, Euler abstract machine, Prolog VM code, Prolog virtual machine, assembly code, central processing unit.]

FIGURE 2. The EYE stack offers a generic reasoning engine on top of interchangeable virtual machines.

EYE also exhibits external compatibility: it accepts exchangeable N3 and Turtle RDF documents from any source on the Web. Users can ask EYE to generate a proof explaining how the given goal is reached. Such proofs use a publicly available, interoperable vocabulary, so other parties can follow and understand the line of reasoning. Most important, this mechanism allows independent proof validation, which contributes to one of the forms of trust on the Semantic Web. If a certain party comes to a conclusion, any other party can thus verify why that conclusion is valid. Furthermore, EYE is compatible with the W3C’s Rule Interchange Format (RIF), so the rules that constitute deductions and proofs can be exchanged even beyond the N3 language.

EYE’s Development and Use

EYE originated around 1999, the beginning of the Semantic Web era, [...] predominant and only implementation, aptly named EYE (first for Euler YAP [Prolog] Engine, then for Euler Yet Another Proof Engine). Since then, more than 150 updates of EYE with new features and bug fixes have been released; the schedule was recently adapted to a three-month cycle. The number of downloads from SourceForge has steadily increased, with an average of 200 per month and a total of 30,000 (see Figure 3). EYE is also available as a public Web service and a Docker image (with 500 downloads so far).

[Figure 3 plot: number of downloads (axis from 0 to 2,500) by year, 2003 to 2015.]

FIGURE 3. EYE has been downloaded 30,000 times, at an average of 200 downloads per month.

Jos De Roo develops and maintains EYE, and 30 people in its community support it through bug reports, which have totaled more than 200 so far. These reports are usually addressed within days. EYE’s current code base consists of roughly 10,000 lines of Prolog source code, which amounts to 100,000 lines of [...]

[...] domain modeled in RDF (and N3 for rules). In particular, if a domain was modeled with existing core ontological concepts from the Semantic Web, such as those from the RDFS (RDF Schema) and OWL vocabularies, EYE can derive new triples right away because it can reuse the existing N3 rules for these concepts.

In particular, EYE has several active applications in the medical sector [1]. One example is the SALUS (Scalable, Standard Based Interoperability Framework for Sustainable Proactive Post Market Safety Studies) project, for which Agfa Healthcare is developing a semantic interoperability service for postmarket drug safety studies. Different data sources each employ different ontologies, and EYE lets users transform queries and data to and from CREAM (Clinical Research Entities Patterns: Advanced Model).
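To make the SymmetricProperty rule and the P AND NOT(C) ⇒ C reading described earlier more concrete, here is a minimal Python sketch. It is not how EYE is implemented: EYE is written in Prolog and mostly works backward from a goal, whereas this sketch uses naive forward chaining over a set of triples. The function names `symmetric_rule` and `closure` are hypothetical, and declaring `foaf:knows` an `owl:SymmetricProperty` is an assumption made purely for illustration, following the column's description of "knows" as reciprocal.

```python
# Illustrative sketch only; names and structure are assumptions, not EYE's code.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SYMMETRIC = "http://www.w3.org/2002/07/owl#SymmetricProperty"
KNOWS = "http://xmlns.com/foaf/0.1/knows"
EDISON = "http://dbpedia.org/resource/Thomas_Edison"
TESLA = "http://dbpedia.org/resource/Nikola_Tesla"


def symmetric_rule(triples):
    """Yield (o, p, s) for every (s, p, o) whose property p is declared symmetric."""
    symmetric = {s for (s, p, o) in triples if p == RDF_TYPE and o == OWL_SYMMETRIC}
    for (s, p, o) in triples:
        if p in symmetric:
            yield (o, p, s)


def closure(triples, rules):
    """Apply the rules until none of them produces a triple that is not yet known.

    Subtracting `known` mirrors the "P AND NOT(C) => C" idea: a conclusion
    only counts when it is new, which is also what makes the loop terminate.
    """
    known = set(triples)
    while True:
        new = {t for rule in rules for t in rule(known)} - known
        if not new:
            return known
        known |= new


facts = {
    (KNOWS, RDF_TYPE, OWL_SYMMETRIC),  # assumed description of "knows"
    (EDISON, KNOWS, TESLA),            # the triple from Figure 1
}
derived = closure(facts, [symmetric_rule])
print((TESLA, KNOWS, EDISON) in derived)  # prints True
```

Without the subtraction of already known triples, the symmetric rule would keep re-deriving the same two "knows" triples forever; with it, the loop stops after one productive round.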