
TDT4215 Web-intelligence

Project Report

A semantic application using Semantic MediaWiki

Group 12: Hedda Nonstad, Jakob Hovland, Sigurd Sandve, Jørgen Grimnes

March 2015

Abstract

This report describes the design process of a Semantic MediaWiki application. We start by establishing a theme for the wiki and then move on to how we implemented it as a web application. The wiki relies heavily on content aggregated from Freebase and DBPedia, and maintains a clear semantic annotation by using common ontologies such as the Friend of a Friend (FOAF) ontology and the rdfs ontology.

Preface

TDT4215 - Web Intelligence is one of the available courses for specialization within data and information management. The purpose of this course is to give the students an understanding of web-based information systems and of how advanced technologies can be used to access them or explore the knowledge they contain. The team consists of four students from the Department of Computer and Information Science at the Norwegian University of Science and Technology (NTNU). Our task was to make a wiki based on RDF, SPARQL, ontologies (OWL), searching and querying content, and categorizing content.

Contents

1 Introduction
    1.1 Problem Description
        1.1.1 Wiki Theme
        1.1.2 Motivation
2 Preliminary studies
    2.1 Desired Solution
    2.2 Existing wikis
    2.3 Technical limitations
    2.4 Tools and technologies
        2.4.1 Semantic MediaWiki
        2.4.2 Semantic Bundle
        2.4.3 Theory from the curriculum
3 The wiki architecture
    3.1 The ontology
    3.2 Classes and properties
    3.3 Structure of the web application
    3.4 Formatting a query
        3.4.1 Advanced query 1: Gathering character information
        3.4.2 Advanced query 2: Get a description of the Rings of Power
    3.5 The public domain
        3.5.1 Wiki users
        3.5.2 Introduced risks
4 Implementation
    4.1 Forms
        4.1.1 Content aggregation
    4.2 Problems and challenges
5 Conclusion
    5.1 Final conclusion
    5.2 Future work
    5.3 Evaluation of the project

List of Figures

3.1 Our ontology
3.2 Query for the Hobbit friends of Gandalf
3.3 List the bearers of The One Ring.
3.4 Collect character information from DBPedia.
3.5 Retrieving a description of the Rings of Power from Freebase.
4.1 Content aggregation using DBPedia.

List of Tables

3.1 Links to the structure of the Wiki.

1 Introduction

1.1 Problem Description

We were tasked to create a wiki with a theme of our own choice. The wiki was required to be accessible to the public. We were encouraged to use an open source wiki engine such as Semantic MediaWiki [3], which is hosted free of charge at referata.com. The produced wiki should at least fulfill the following parameters:

• The wiki should be capable of displaying external data. This data could be available through e.g. SPARQL endpoints.

• The wiki should use at least one external ontology to annotate the content, such as the Friend of a Friend ontology (FOAF).

• The wiki should be structured in a manner that encourages [...] and must offer at least two different ways of searching and browsing the data.

1.1.1 Wiki Theme. During the initial discussions regarding the theme of our wiki, we considered creating a semantic application about movies. Unfortunately, once we began working on the project we discovered that the movie domain was too large and complex to model in the limited time frame of our project. In order to narrow the scope of our project, we had to reconsider our chosen domain. After some discussion we decided to create a semantic wiki about characters from Middle-earth, a fictional universe created by J.R.R. Tolkien. Middle-earth contains a wide range of magical creatures, such as elves and nazguls, and legendary items like the Rings of Power. We have defined the wiki's tagline as "Characters from Middle-earth that are connected to the Rings of Power".

1.1.2 Motivation. The Lord of the Rings and The Hobbit are enormously popular books and movie series. All of the members in the group have seen the The Lord of the Rings movies several times and some of the members have also read the books. The newly made movie series based on The Hobbit has also been seen by all the group members. A few group members have also read [...]. Because of the popularity of the series, we knew there would be a lot of information available on our subject.

We decided that the wiki should be based upon the characters related to the Rings of Power. There are a lot of other wikis with information on the J.R.R. Tolkien world, so we decided to make a wiki with a very specific theme rather than making a poor copy of the already established wikis.

2 Preliminary studies

2.1 Desired Solution

The theme of our wiki is based on Tolkien's work, and the wiki will contain information on the characters who are related to the Rings of Power. An ideal solution will contain enough information that the user won't need to look any further. The wiki could reach this goal by collecting information from other wikis and aggregating its stored knowledge through queries to external resources. In order to construct a solution of this magnitude we will depend on existing function calls and on good documentation of the inner workings of the Semantic MediaWiki application.

2.2 Existing wikis

Tolkien Gateway [2] is probably the biggest wiki themed around J.R.R. Tolkien and his world of Middle-earth. It is a fan-driven wiki, so every user and all fans can edit and share their wisdom. Users edit the pages daily, and since its launch in 2005 it has grown to over 11 000 articles and 42 000 pages. To best present its information, every part of the wiki is referenced to the pages of Tolkien's work. The Tolkien Gateway is created with MediaWiki, but it does not implement the Semantic MediaWiki extension. This implies that we can't issue semantic queries to the Tolkien Gateway and that the wiki does not facilitate data collection.

The One Wiki to Rule Them All [1] is similar to
the Tolkien Gateway wiki, but it is smaller in size. It provides fans with a community based on a shared interest and has a lot of the same features as the Tolkien Gateway.

2.3 Technical limitations

The project description recommended that we use Referata as the platform to distribute our wiki. The main problem with Referata was the lack of support for SPARQL queries. We therefore needed to migrate the wiki from Referata to folk.ntnu.no in order to accommodate the requirements of this project assignment.

2.4 Tools and technologies

This section presents the open source tools we have used during the implementation of the wiki.

2.4.1 Semantic MediaWiki. Semantic MediaWiki is an extension to the popular open source project MediaWiki which enriches the wiki application with the power of semantic annotation. Semantic MediaWiki also facilitates dynamic representation of information by running queries on the structured data.

2.4.2 Semantic Bundle. The Semantic Bundle is a pre-packaged bundle of common tools and extensions that play nicely with Semantic MediaWiki and provide "essential" functionality such as SPARQL support, along with other useful tools such as Semantic Drilldown.

2.4.3 Theory from the curriculum. We will have to rely on our theoretical background from the lectures in order to construct the required SPARQL queries for content aggregation, and we will rely heavily on a basic understanding of how structured data should be annotated in a semantically correct way.

3 The wiki architecture

3.1 The ontology

In addition to its own classes and properties, the wiki uses the Friend-of-a-Friend (FOAF) ontology to annotate its entities. FOAF is a small ontology for describing people and their relationships. Specifically, the wiki uses the Agent, Person and Group classes, and the Name, Age, Gender, Knows and Member properties from FOAF. The Person class is one of the core classes in FOAF. In the wiki it is used as a base class for the race classes: Elf, Dwarf, Man, Hobbit and Maia. The race classes are used to describe characters of the different races. All the race classes have the same properties: Name, Age, Gender, Knows, and Is Ring Bearer. Although the race classes should be disjoint, Semantic MediaWiki does not allow us to define the classes in such a way. The Group class is another core class in the FOAF ontology. In the wiki it is used to define groups of Agents and is itself a subclass of Agent. The Group class has two properties: Name and Members.

The ontology is presented graphically in Figure 3.1.

[Figure: ontology diagram. Hobbit, Elf, Dwarf, ... are each linked by "is a" to foaf:Person and have the properties Name, Age, Gender, Is ring bearer and Member of. Fellowship of the Ring and White Council are each linked by "is a" to foaf:Group, which has the properties Name and Members.]

Figure 3.1: A graphical representation of our ontology

3.2 Classes and properties

In the following section we describe the classes and properties in our ontology. They were chosen to best fit the wiki to our theme. The properties of the race classes Hobbit, Elf, Dwarf, Man and Maia are listed below:

• foaf:name: The name of the character.

• foaf:age: The age of the character at the time of [...]'s death.

• foaf:gender: The gender of the character. The allowed values are Male and Female.

• foaf:knows: The characters that are known by this character.

• isRingBearer: This property should be true if this character at some point carried the One Ring, and false otherwise.

• memberOf: This is an inferred property.

The properties of the foaf:Group class are listed below:
• foaf:name: The name of the group.

• foaf:member: The characters that are members of this group.

3.3 Structure of the web application

Please refer to the links in Table 3.1 to observe the implementation of our ontological structure.

    Wiki Properties    http://folk.ntnu.no/jorgegri/webint/mediawiki-1.24.1/index./Special:Properties
    Wiki Categories    http://folk.ntnu.no/jorgegri/webint/mediawiki-1.24.1/index.php/Special:Categories
    Wiki Templates     http://folk.ntnu.no/jorgegri/webint/mediawiki-1.24.1/index.php/Special:Templates
    Wiki Forms         http://folk.ntnu.no/jorgegri/webint/mediawiki-1.24.1/index.php/Special:Forms

Table 3.1: Links to the structure of the Wiki.

3.4 Formatting a query

The ability to perform semantic searches in our wiki offers the user a dynamic approach to representing information already stored on the wiki. By providing the semantic search with a collection of query terms, the search will try to retrieve and display the documents containing those terms.

    [[Category:Hobbit]] [[knows::Gandalf]]

Figure 3.2: Query for the Hobbit friends of Gandalf

The query in Figure 3.2 will display all the hobbits in the wiki who know Gandalf. The query will first locate all the entities of the category Hobbit, and then filter the retrieved entities to the subset that has Gandalf defined as a valid value of the property knows.

    {{#ask:
        [[Category:foaf:Person]] [[Is ring bearer::True]]
        | format = list
        | link   = all
    }}

Figure 3.3: List the bearers of The One Ring.

Following is a list of example semantic queries that will give data from the wiki:

• Figure 3.2 will return all of Gandalf's hobbit friends.

• Figure 3.3 will display all the bearers of The One Ring in the Lord of the Rings trilogy.

• Figure 3.4 will retrieve aggregated data from DBPedia.

• Figure 3.5 will retrieve structured data from Freebase.

The two last queries referred to in the list above are advanced queries which facilitate content aggregation. The content is fetched from Freebase with Ask queries [4] and from DBPedia with SPARQL queries [5]. The following sections will discuss how they work in detail.

3.4.1 Advanced query 1: Gathering character information. This query is included on the template for the fictional characters. The query is used to enrich the character's page with a description of the character's role in the series. Please refer to Figure 3.4.

The first line in Figure 3.4 initiates an inline Semantic MediaWiki SPARQL query. The following lines with the PREFIX keyword define the semantic annotation that is present in the query. The SELECT clause that follows specifies that we will collect the description of the given entity from a SPARQL endpoint. The default endpoint for our SPARQL queries is DBPedia.

The WHERE statement is where the advanced logic takes place. In plain text, the query asks for the following result:

    Return the English description for a character in the Middle Earth universe which has a name that is similar to the name of the page. The spaces " " in the name might be represented by some random symbol, and the name might be longer than we've specified.

In addition, the name should also be in English.

3.4.2 Advanced query 2: Get a description of the Rings of Power. This query was designed to retrieve a description of the Rings of Power from Freebase. Since Freebase does not support any API calls suitable for SPARQL queries, we used the MediaWiki extension External Data [6] to issue the request. This extension provides the means of retrieving information from structured data.

Refer to Figure 3.5 for the query in question. The first line in the code snippet indicates to MediaWiki that we wish to perform an inline information retrieval with the External Data extension. The second line specifies the URL to which we will make an HTTP request. The URL is formatted with all the parameters necessary to construct a query that will return the desired result. The syntactic sugar of adding an inline URL encoder is purely for debugging purposes.

We have further specified that the HTTP request will return a JSON object and that we want to store the retrieved description in a variable named "description".
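The variable mapping of the Freebase query in Section 3.4.2 (`description=/common/topic/description`) can be illustrated outside the wiki with a minimal Python sketch. It assumes that the value is looked up by key name regardless of how deeply it is nested in the JSON response; that lookup behaviour, the helper names `find_values` and `extract_description`, and the sample payload are all assumptions made for illustration, not something the report or the Freebase API confirms.

```python
import json


def find_values(node, key):
    """Collect every value stored under `key`, at any nesting depth."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


def extract_description(raw_json, key="/common/topic/description"):
    """Return all description strings found in a JSON response body."""
    flat = []
    for value in find_values(json.loads(raw_json), key):
        # A hit may be a single string or a list of strings; flatten both.
        flat.extend(value if isinstance(value, list) else [value])
    return flat


# Hypothetical response body; the real payload may be shaped differently.
SAMPLE_RESPONSE = """
{"status": "200 OK",
 "result": [{"name": "Rings of Power",
             "output": {"/common/topic/description": ["Placeholder description text."]}}]}
"""

description = extract_description(SAMPLE_RESPONSE)
print(description)
```

The sketch only shows the idea of binding one named field of a JSON document to a local variable; inside the wiki this work is done by the External Data extension itself.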

    {{#sparql:
        PREFIX dbpedia-owl:
        PREFIX dbpprop:
        PREFIX dbres:
        PREFIX pu:
        SELECT ?description
        WHERE { ?character rdfs:label ?name .
                ?character dbpedia-owl:abstract ?description .
                ?character pu:subject .
                FILTER ( lang(?description) = 'en' && lang(?name) = 'en' && regex(?name, "^{{#replace:{{PAGENAME}}| |.*}}.*", "i") )
        }
        |templateBare = tableCell
    }}

Figure 3.4: Collect character information from DBPedia.

    {{#get_web_data:
        url = https://www.googleapis.com/freebase/v1/search?
              query  = Rings+of+Power &
              output = {{ urlencode: (/common/topic/description ) | QUERY }}
        | format = json
        | data   = description = /common/topic/description
    }}

Figure 3.5: Retrieving a description of the Rings of Power from Freebase.

3.5 The public domain

3.5.1 Wiki users. We have made our wiki open to the public. That implies that all web users may edit the information contained in our wiki. If the site were being used for a third party application we would recommend only allowing registered users to edit. MediaWiki only enforces one restriction on the editing privileges of the users: the users may not delete pages.

The Tolkien fandom, an international, informal community of fans of the works of J. R. R. Tolkien, is a huge community dedicated to the tales of Tolkien. The One Wiki to Rule Them All is a wiki made by fans for fans, to share information in the best way possible. By opening the doors to a large community of editors, the online wikis gain the ability to expand their knowledge base quickly.

3.5.2 Introduced risks. Since we have made an open wiki there is always the chance of vandalism by users deleting content or inserting false information. However, MediaWiki has history logs for all content, giving us or any other user the opportunity to roll back to a previous version if necessary. With a sufficient user base, the content should also be self-correcting without us having to moderate it.

4 Implementation

4.1 Forms

There are 8 categories available for the user to edit on our wiki. We have created 4 forms available through the sidebar menu. These are Elf, Hobbit & Man for creation of new characters and Group for generating new groups.

These categories come with a pre-built template that the user has to follow, which ensures that our content is built with the same data. This makes it possible to write useful queries, because all categories share the same data points.

4.1.1 Content aggregation. Our wiki application is supported by content aggregated from DBPedia and Freebase. Specifying generalized queries that were able to automatically aggregate content based on the title of the current page proved a hard task to implement.

    {{#sparql:
        PREFIX dbpedia-owl:
        PREFIX dbpprop:
        PREFIX dbres:
        PREFIX pu:
        SELECT ?description
        WHERE { ?character rdfs:label ?name .
                ?character dbpedia-owl:abstract ?description .
                ?character pu:subject .
                FILTER ( lang(?description) = 'en' && lang(?name) = 'en' && regex(?name, "^{{#replace:{{PAGENAME}}| |.*}}.*", "i") )
        }
        |templateBare = tableCell
    }}

Figure 4.1: Content aggregation using DBPedia.
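The name matching performed by the regex filter in Figure 4.1 can be tried out in isolation. The following Python sketch mimics the pattern `"^{{#replace:{{PAGENAME}}| |.*}}.*"` with the case-insensitive flag: each space in the page title is replaced by `.*`, the match is anchored at the start of the label, and anything may follow. The function names and the sample labels are hypothetical, and the escaping of special characters is an addition that the wiki markup itself does not perform.

```python
import re


def page_name_pattern(page_name):
    """Build a case-insensitive pattern equivalent to ^<name with ' ' -> '.*'>.*"""
    # Escape each word so characters such as "(" are matched literally
    # (an addition in this sketch), then join the words with ".*".
    parts = [re.escape(word) for word in page_name.split(" ")]
    return re.compile("^" + ".*".join(parts) + ".*", re.IGNORECASE)


def matching_labels(page_name, labels):
    """Return the labels that the page-name pattern accepts."""
    pattern = page_name_pattern(page_name)
    return [label for label in labels if pattern.search(label)]


# Hypothetical rdfs:label values that an endpoint might return.
labels = [
    "Frodo Baggins",
    "frodo_baggins",
    "Bilbo Baggins",
    "Frodo Baggins (character)",
]

print(matching_labels("Frodo Baggins", labels))
```

Because the pattern is deliberately loose, it can accept more than one label for a page title; in the query of Figure 4.1 the remaining triple patterns and language filters narrow the result further.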

The design process of this query encouraged us to study the course material that was made available to us from the lectures held by Jon Atle Gulla. Through a combination of studying the DBPedia ontology and a fair amount of trial and error, we were able to define a query that performed as desired. By incorporating built-in MediaWiki functionality, regex filtering and a heavy SPARQL query, the system could automatically aggregate content upon request. The query is listed in Figure 4.1.

4.2 Problems and challenges

In the first delivery we specified the project with a very wide theme. When we started to construct the ontology, we found that having an extensive theme made the project unnecessarily difficult to model. We therefore decided to narrow it down to the work of J.R.R. Tolkien. The theme we chose has a lot of fans, and they make massive contributions to different wikis on the web.

We experienced a lot of challenges with the Referata hosted service. During the first weeks of the project, we experienced Referata as a very slow hosting service. Even basic operations such as refreshing the webpage could take 20 seconds. The slow response time of the hosting service affected our motivation during the initial phase of the project.

When we shifted our focus to incorporating SPARQL queries into our semantic wiki, we discovered that Referata did not support basic SPARQL queries. At first we believed this issue arose from our inadequate knowledge of how Semantic MediaWiki operated. We believed that Referata had to support SPARQL queries, since this was listed as a required feature in the project description. After having wasted a lot of time on this issue, we decided to migrate our wiki to a server we could host ourselves.

5 Conclusion

5.1 Final conclusion

We managed to construct a semantic wiki application by using the Semantic MediaWiki framework. We struggled to implement the automatic content aggregation while our application was hosted on Referata. The entire Referata affair was an unfortunate digression from the progression of our project. When we migrated the wiki to a domain we hosted ourselves, the development of our wiki sped up tremendously, and our wiki application gained the ability to fetch data from external resources, both by using SPARQL queries and by using the External Data extension of Semantic MediaWiki.

The wiki uses the Friend of a Friend ontology as the backbone of our knowledge representation. All character entities are sub-categories of foaf:Person, thus making the entities easily accessible to semantic queries.

We did not achieve the projected goal of creating a self-sustained wiki application that could dynamically expand its knowledge base on demand through SPARQL queries to external endpoints. We feel that we were not able to implement this feature because of a combination of the time-consuming failure of using Referata and the fact that the group members had no previous experience with semantic applications.

We are very pleased with the automatic content aggregation of the character pages. The SPARQL query that is issued in order to retrieve the data required us to employ most of this course's curriculum, ranging from using entities and properties to advanced filter operations such as regex filters.

5.2 Future work

The concept of an automatically expanding wiki should be feasible by using SPARQL queries and the template structure of MediaWiki. The wiki should be expandable as long as the produced SPARQL results return previously unobserved entities. The process of combining information from multiple resources by using the "same as" property remains a challenge to be solved. Creating a semi-autonomous wiki generator could be an exciting subject for a master's thesis.

5.3 Evaluation of the project

The unanimous opinion of the group was at first that this year's project in Web-intelligence seemed quite dull in comparison to earlier project assignments that have been given in this subject. However, as we worked on developing our ontology and gained knowledge of how the inner workings of a semantic application should be organized, we came to appreciate the value of this assignment. The project eased us into the process of designing ontologies and of using SPARQL queries in order to enrich the semantic application.

The project description should not have mentioned Referata as a valid option for hosting our wiki application, since Referata does not support SPARQL queries, which was a specified requirement in the project description. This detour into Referata consumed a lot of stressful work hours, as we tried to find ways to make SPARQL work on their hosted service. We solved this by hosting our own Semantic MediaWiki at the NTNU-provided folk.ntnu.no domain. If this project were to be repeated, we feel the project description should advise other students to set up their own web application as well.

All in all, the project was a very positive experience. It increased our understanding of how semantic applications operate and gave us hands-on experience with writing SPARQL queries.

References

[1] The One Wiki to Rule Them All. http://lotr.wikia.com/wiki/Main_Page

[2] Tolkien Gateway. http://tolkiengateway.net/wiki/Main_Page

[3] Semantic MediaWiki. https://semanticmediawiki.org

[4] Special Ask queries in SMW. https://semantic-mediawiki.org/wiki/Help:Inline_queries

[5] LinkedWiki, a SMW extension. http://m.mediawiki.org/wiki/Extension:LinkedWiki

[6] External Data, a SMW extension. http://www.mediawiki.org/wiki/Extension:External_Data
