
The online encyclopedia is being supplemented by user-edited structured data, available for free to anyone.

Wikidata: A Free Collaborative Knowledge Base

BY DENNY VRANDEČIĆ AND MARKUS KRÖTZSCH

Unnoticed by most of its readers, Wikipedia is currently undergoing dramatic changes, as its sister project Wikidata introduces a new multilingual ‘Wikipedia for data’ to manage the factual information of the popular online encyclopedia. With Wikipedia’s data being cleaned and integrated in a single location, opportunities arise for many new applications.

Initially conceived as a mostly text-based resource, Wikipedia [1] has been collecting increasing amounts of structured data: numbers, dates, coordinates, and many types of relationships from family trees to the taxonomy of species. This data has become a resource of enormous value, with potential applications across all areas of science, technology, and culture. This development is hardly surprising given that Wikipedia is driven by the general vision of ‘a world in which every single human being can freely share in the sum of all knowledge’. There can be no question today that this sum must include data that can be searched, analyzed, and reused.

It may thus be surprising that Wikipedia does not provide direct access to most of this data, neither through query services nor through downloadable data exports. Actual uses of the data are rare and often restricted to very specific pieces of information, such as the geo-tags of Wikipedia articles used in Google Maps. The reason for this striking gap between vision and reality is that Wikipedia’s data is buried within 30 million Wikipedia articles in 287 languages, from where it is very difficult to extract. This situation is unfortunate for anyone who wants to make use of the data, but it is also an increasing threat to Wikipedia’s main goal of providing up-to-date and accurate encyclopedic knowledge. The same information often appears in articles in many languages and in many articles within a single language. Population numbers for Rome, for example, can be found in the English and Italian articles about Rome, but also in the English article Cities in Italy. All of these numbers are different.

The goal of Wikidata is to overcome these problems by creating new ways for Wikipedia to manage its data on a global scale. The result of these ongoing efforts can be seen at wikidata.org. The following essential design decisions characterize the approach taken by Wikidata. We will have a closer look at some of these points later.

About this text: In March 2014, this manuscript has been accepted in its current form for publication as a contributed article in Communications of the ACM. It is an authors’ draft and not the final version. The final article should be published with open access, using CACM’s hybrid OA model.

Unpublished manuscript (authors’ draft) | Accepted for publication | COMMUNICATIONS OF THE ACM

Open Editing. Like Wikipedia, Wikidata allows every user of the site to extend and edit the stored information, even without creating an account. A form-based interface makes editing very easy.

Community Control. Not only the actual data but also the schema of the data is controlled by the contributor community. Contributors edit the population number of Rome, but they also decide that there is such a number in the first place.

Plurality. It would be naive to expect global agreement on the ‘true’ data, since many facts are disputed or simply uncertain. Wikidata allows conflicting data to coexist and provides mechanisms to organize this plurality.

Secondary Data. Wikidata gathers facts published in primary sources, together with references to these sources. There is no ‘true population of Rome’, but a ‘population of Rome as published by the city of Rome in 2011’.

Multilingual Data. Most data is not tied to one language: numbers, dates, and coordinates have universal meaning; labels like Rome and population are translated into many languages. Wikidata is multilingual by design. While Wikipedia has independent editions for each language, there is only one Wikidata site.

Easy Access. Wikidata’s goal is to allow data to be used both in Wikipedia and in external applications. Data is exported through Web services in several formats, including JSON and RDF. Data is published under legal terms that allow the widest possible reuse.

Continuous Evolution. In the best tradition of Wikipedia, Wikidata grows with its community and tasks. Instead of developing a perfect system that is presented to the world in a couple of years, new features are deployed incrementally and as early as possible.

These properties characterize Wikidata as a specific kind of curated database [8].

Data in Wikipedia: The Story So Far

The value of Wikipedia’s data has long been obvious, and many attempts have been made to use it. The approach of Wikidata is to crowdsource data acquisition, allowing a global community to edit data. This extends the traditional wiki approach of allowing users to edit a website (wiki is a Hawaiian word for fast; Ward Cunningham, who created the first wiki in 1995, used it to emphasize that his website could be changed quickly [17]).

The most popular system of this kind is Semantic MediaWiki (SMW) [15], which extends MediaWiki, the software used to run Wikipedia [2], with data management capabilities. SMW was originally proposed for Wikipedia, but was soon used on hundreds of other websites instead. In contrast to Wikidata, SMW manages data as part of its textual content. This hinders the creation of a multilingual, single knowledge base supporting all Wikimedia projects. Moreover, the data model of Wikidata (discussed below) is more elaborate than that of SMW, allowing users to capture more complex information. In spite of these differences, SMW has had a great influence on Wikidata, and the two projects share code for common tasks.

Other examples of free knowledge base projects are OpenCyc and Freebase. OpenCyc is the free part of Cyc [16], which aims for a much more comprehensive and expressive representation of knowledge than Wikidata. OpenCyc is released under a free license and available to the public, but unlike Wikidata, OpenCyc is not supposed to be editable by the public. Freebase, acquired in 2010 by Google, is an online platform that allows communities to manage structured data [7]. Objects in Freebase are classified by types that prescribe what kind of data the object can have. For example, Freebase classifies Einstein as a musical artist since it would otherwise not be possible to refer to records of his speeches. Wikidata supports the use of arbitrary properties on all objects. Other differences to Wikidata are related to multi-language support, source information, and the proprietary software used to run the site. The latter is critical for Wikipedia, which is committed to running on a fully open source software stack to allow anyone to fork the project.

Other approaches have aimed at extracting data from Wikipedia, most notably DBpedia [6] and Yago [13]. Both projects extract information from Wikipedia categories, and from the tabular infoboxes in the upper right of many Wikipedia articles. Additional mechanisms help to improve the extraction quality. Yago includes some temporal and spatial context information, but neither DBpedia nor Yago extracts source information.

Wikipedia data, obtained from the above projects or by custom extraction methods, has been used successfully to improve object search in Google’s Knowledge Graph (based on Freebase) and Facebook’s Open Graph, and in answering engines such as Wolfram Alpha [24], Evi [21], and IBM’s Watson [10]. Wikipedia’s geo-tags are also used by Google Maps. All of these applications would benefit from up-to-date, machine-readable data exports (e.g., Google Maps currently shows India’s Chennai district in the polar Kara Sea, next to Ushakov Island). Among the above applications, Freebase and Evi are the only ones that also allow users to edit or at least extend the data.

A Short History of Wikidata

Wikidata was launched in October 2012. At first, editors could only create items and connect them to Wikipedia articles. In January 2013, three Wikipedias (first Hungarian, then Hebrew and Italian) started to connect to Wikidata. Meanwhile, the community had already created more than three million items. In February, the English Wikipedia followed, and in March all Wikipedias were connected to Wikidata.

Wikidata has received input from over 40,000 contributors so far. Since May 2013, Wikidata has continuously had over 3,500 active contributors, i.e., contributors who make at least five edits within a month. These numbers make it one of the most active Wikimedia projects.

In March 2013, Lua was introduced as a scripting language to Wikipedia, which can be used to automatically create and enrich parts of articles, such as the infoboxes mentioned before. Lua scripts can access Wikidata, allowing Wikipedia editors to retrieve, process, and display data.

Many further features have been introduced in the course of 2013, and development is planned to continue in the foreseeable future.

Out of Many, One

The first challenge for Wikidata was to reconcile the 287 language editions of Wikipedia. For Wikidata to be truly multilingual, the object that represents Rome must be one and the same across all languages. Fortunately, Wikipedia already has a closely related mechanism: language links, displayed on the left of each article, connect articles in different languages. These links were created from user-edited text entries at the bottom of every article, leading to a quadratic number of links: each of the 207 articles about Rome contained a list of 206 links to all other articles about Rome, a total of 207 × 206 = 42,642 lines of text. By the end of 2012, Wikipedias in 66 languages contained more text for language links than for actual article content.

It would clearly be better to store and manage language links in a single location, and this was Wikidata’s first task. For every Wikipedia article, a page has been created on Wikidata where links to related Wikipedia articles in all languages are managed. Such pages on Wikidata are called items. Initially, only a limited amount of data could be stored for each item: a list of language links, a label, a list of aliases, and a one-line description. Labels, aliases, and descriptions can be specified individually for currently up to 358 languages.

The Wikidata community has created bots to move language links from Wikipedia to Wikidata, and more than 240 million links could be removed from Wikipedia. Today, most language links displayed on Wikipedia are served from Wikidata. It is still possible to add custom links in an article, which is needed in the rare cases where links are not bi-directional: some articles refer to more general articles in other languages, while Wikidata deliberately connects only pages that cover the same subject. By importing language links, Wikidata obtained a huge set of initial items that are ‘grounded’ in actual Wikipedia pages.

Simple Data: Properties and Values

For storing structured data beyond text labels and language links, Wikidata uses a simple data model. Data is basically described by using property-value pairs. For example, the item for Rome might have a property population with value 2,777,979. Properties are objects in their own right that have Wikidata pages with labels, aliases, and descriptions. In contrast to items, however, these pages are not linked to Wikipedia articles.

On the other hand, property pages always specify a datatype that defines which type of values the property can have. Population is a number, has father relates to another Wikidata item, and postal code is a string. This information is important to provide adequate user interfaces and to ensure that inputs are valid. There are only a small number of datatypes, mainly quantity, item, string, date and time, geographic coordinates, and URL. In each case, data is international, although its display may be language-dependent (e.g., the number 1,003.5 is written ‘1.003,5’ in German and ‘1 003,5’ in French).

Not-So-Simple Data

Property-value pairs are too simple for many cases. For example, Wikipedia states that the population of Rome was 2,761,477 as of 2010 based on estimations published by Istat. Figure 1 shows how this could be represented in Wikidata. Even when leaving source information aside, the information can hardly be expressed in property-value pairs. One could use a property estimated population in 2010, or create an item Rome in 2010 to specify a value for its estimated population, but either solution is clumsy and impractical. As suggested by Figure 1, we would like the data to contain a property as of with value 2010, and a property method with value estimation. These property-value pairs do not refer to Rome, but to the assertion that Rome has a population of 2,761,477. We thus arrive at a model where the property-value pairs assigned to items can have additional subordinate property-value pairs, which we call qualifiers.

Figure 1: Screenshot of a complex statement as displayed in Wikidata

Qualifiers can be used to state contextual information, such as the validity time of an assertion. They can also be used to encode ternary relations that elude the property-value model. For example, to state that Meryl Streep played Margaret Thatcher in The Iron Lady, one could add to the item of the movie a property cast member with value Meryl Streep, and an additional qualifier ‘role=Margaret Thatcher’.

These examples illustrate why we have decided to adopt an extensible set of qualifiers instead of restricting ourselves to the most common qualifiers, e.g., for temporal information. Indeed, qualifiers in their current form are an almost direct representation of data found in Wikipedia infoboxes today. This solution resembles known approaches of representing context information [18, 11]. It should not be misunderstood as a workaround to represent relations of higher arity in graph-based data models, since Wikidata statements do not have a fixed (or even bounded) arity in this sense [20].

Finally, Wikidata also allows for two special types of statements. First, it is possible to specify that the value of a property is unknown. For example, one can say that Ambrose Bierce’s day of death is unknown rather than not saying anything about it. This clarifies that he is certainly not among the living. As the second additional feature, one can say that a property has no value at all, for example to state that Angela Merkel has no children. It is important to distinguish this situation from the common case that information is simply incomplete. It would be wrong to consider these two cases as special values. This becomes clear when considering queries that ask for items sharing the same value for a property: otherwise, one would have to conclude that Merkel and Benedict XVI have a common child.

The full data model and its expression in OWL/RDF can be found online [9].

Citation Needed

Property assertions, possibly with qualifiers, provide a rich structure to express arbitrary claims. In Wikidata, every such claim has a list of references to sources that support the claim. This agrees with Wikipedia’s goal of being a secondary (or tertiary) source that does not publish its own research but gathers information published in other primary (or secondary) sources.

There are many ways to specify a reference, depending on whether it is a book, a curated database, a website, or something entirely different. Moreover, some possible sources are represented by Wikidata items while others are not. Because of that, a reference is simply a list of property-value pairs, leaving the details of reference modeling to the community. Note that Wikidata does not automatically record provenance [19], but rather provides for the structural representation of references.

Sources are also important as context information. Different sources often make contradicting claims, yet Wikidata should represent all views rather than choosing one ‘true’ claim. Combined with the context information provided by qualifiers (e.g., for temporal context), a large number of statements might be stored about a single property, such as population. To help manage this plurality, Wikidata allows contributors to optionally mark statements as preferred (for the most relevant, current statements) or deprecated (for irrelevant or unverified statements). Deprecated statements can be useful to Wikidata editors, to record erroneous claims of certain sources, or to keep statements that still need to be improved or verified. Like all content of Wikidata, these classifications are subject to community-governed editorial processes, similar to those of Wikipedia [1].

Wikidata in Numbers

Wikidata has grown significantly since its launch in October 2012. Some key facts about its current content are shown in Table 1. It has also become the most edited Wikimedia project, sporting 150–500 edits per minute, or half a million per day, about three times as many as the English Wikipedia. About 90% of these edits are made by bots that contributors have created for automating tasks, yet almost one million edits per month are made by humans. The left of Figure 2 shows the number of human edits during 14-day intervals. We highlight contributions of power users with more than ten thousand or more than one hundred thousand edits, respectively, as of February 2014; they account for most of the variation. The increase in March 2013 marks the official announcement of the site.

Table 1. Some basic statistics about Wikidata as of February 2014

  Supported languages                      358
  Labels                            45,693,894
  Descriptions                      33,904,616
  Aliases                            8,711,475
  Items                             14,449,300
  Items with statements              9,714,877
  Items with ≥5 statements           1,835,865
  Item with most statements:
    – Rio Grande do Sul                    511

  Statements                        30,263,656
  Statements with source            19,770,547
  Properties                               920
  Most-used properties:
    – instance of                    5,612,339
    – country                        2,018,736
    – taxon name                     1,689,377
  Registered contributors               42,065
    – with 5+ edits in Jan 2014          5,008

  Edits                            108,027,725
  Usage of datatypes:
    – Wikidata items                20,135,245
    – Strings                        7,589,740
    – Geocoordinates                 1,154,703
    – Points in time                   912,287
    – Media files                      386,357
    – URLs                              75,614
    – Numbers (new in 2014)              9,842

Figure 2: Growth of Wikidata: bi-weekly number of edits for different editor groups (left) and size of knowledge base (right)

The right of Figure 2 shows the growth of Wikidata from its launch until February 2014. There are about 14.5 million items and 36 million language links. Essentially every Wikipedia article is connected to a Wikidata item today, so these numbers grow only slowly. In contrast, the number of labels, currently 45.6 million, continues to grow: there are more labels than Wikipedia articles. Almost 10 million items have statements, and more than 30 million statements have been created, using over 900 different properties. As expected, property usage is skewed: the most frequent property is instance of (P31, 5.6 million uses), which is used to classify items; one of the least frequent properties is P485 (133 uses), which connects a topic (e.g., Johann Sebastian Bach) with the institution that archives the topic (e.g., the Bach-Archiv in Leipzig).

The Web of Data

One of the promising developments in Wikidata is the community’s reuse and integration of external identifiers from existing databases and authority controls, such as ISNI (International Standard Name Identifier), CALIS (China Academic Library & Information System), IATA (airlines and airports), MusicBrainz (albums and performers), or HURDAT (North Atlantic hurricanes). These external IDs allow applications to integrate Wikidata with data from other sources, which remains under the control of the original publisher.

Wikidata is not the first project to reconcile identifiers and authority files from different sources. Other examples include VIAF for the bibliographic domain [3], GeoNames for the geographical domain [22], or Freebase [7]. Wikidata is linked to many of these projects, yet it also differs in terms of scope, scale, editorial processes, and author community.

The collected data is exposed in various ways.¹ Current per-item exports are available in JSON, XML, RDF, and several other formats. Full database dumps are created at intervals and supplemented by daily diffs. All data is licensed under CC0, putting the data into the public domain.

Every Wikidata entity is identified by a unique URI, such as http://www.wikidata.org/entity/Q42 for item Q42 (Douglas Adams). By resolving this URI, tools can obtain item data in the requested format (through content negotiation). This follows Linked Data standards for data publication [5], making Wikidata part of the Semantic Web [4] and supporting the integration of other data sources with Wikidata.

Wikidata Applications

The data in Wikidata lends itself to manifold applications on very different levels.

Language Labels and Descriptions. Wikidata provides labels and descriptions for many terms in different languages. These can be used to present information to international audiences. In contrast to common dictionaries, Wikidata covers a large number of named entities, such as names for places, chemicals, plants, and specialist terms, which can be very difficult to translate. Many data-centric views can be translated trivially term by term (think of maps, shopping lists, or ingredients of dishes on a menu), assuming that all items are associated with suitable Wikidata IDs.

Identifier Reuse. Item IDs can be used as language-independent identifiers to facilitate data exchange and integration across application boundaries. By referring to Wikidata items, applications can provide unambiguous definitions for the terms they use, which at the same time are the entry point to a wealth of related information. Wikidata IDs thus resemble Digital Object Identifiers (DOIs), but they emphasize (meta)data beyond online document locations and use another social infrastructure for ID assignment. Wikidata IDs are stable: IDs do not depend on language labels, items can be deleted but IDs are never reused, and the links to other datasets and sites further increase stability. Besides providing a large collection of IDs, Wikidata also provides means to support contributors in selecting the right ID by displaying labels and descriptions; external applications can use the same functionality through the same API.

Accessing Wikidata. The information collected by Wikidata is interesting in its own right, and many applications can be built to access this information more conveniently and effectively. Applications created so far include generic data browsers like the one shown in Figure 3, and special-purpose tools including two genealogy viewers, a tree of life, a table of elements, and various mapping tools.² Applications can use the Wikidata API to browse, query, and even edit data. If simple queries are not enough, a dedicated copy of (parts of) the data is needed; it can be obtained from regular dumps and possibly be updated in real time by following edits on Wikidata.

Enriching Applications. Many applications can be enriched by embedding information from Wikidata directly into their interfaces. For example, a music player might want to fetch the portrait of the artist just being played. In contrast to earlier uses of Wikipedia data, e.g., in Google Maps, it is unnecessary to extract and maintain the data. Such lightweight data access is particularly attractive for mobile apps. In other cases, it is useful to preprocess data to integrate it into an application. For example, it would be easy to extract a file of all German cities together with region and post code range, which could then be used in any application. Such derived data can be used and redistributed online or in software, under any license, even in commercial contexts.

Advanced Analytics. Information in Wikidata can further be analyzed to derive new insights beyond what is already stated. An important approach in this area is logical reasoning, where information about general relationships is used to derive additional facts. For example, Wikidata’s property grandparent is obsolete since its value can be inferred from values of properties father and mother. If we are generally interested in ancestors, then a transitive closure needs to be computed. This is relevant for many hierarchical, spatial, and partonomical relations. Other types of advanced analytics include statistical evaluations, both of the data and of the incidental metadata collected in the system.
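The reasoning step mentioned under Advanced Analytics (deriving grandparents from father and mother, and ancestors by transitive closure) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries stand in for the father and mother properties, the person names are made-up placeholders rather than Wikidata items, and none of the function names belong to any Wikidata tool.

```python
from collections import deque

# Hypothetical toy data: each dict maps a person to the value of the
# 'father' or 'mother' property. The names are placeholders, not real items.
FATHER = {"Ann": "Bob", "Bob": "Carl", "Dora": "Ed"}
MOTHER = {"Ann": "Dora", "Bob": "Eve"}


def parents(person, father=FATHER, mother=MOTHER):
    """Return the set of known parents of a person."""
    found = (father.get(person), mother.get(person))
    return {p for p in found if p is not None}


def grandparents(person, father=FATHER, mother=MOTHER):
    """Infer grandparents by applying the parent relation twice."""
    return {g for p in parents(person, father, mother)
            for g in parents(p, father, mother)}


def ancestors(person, father=FATHER, mother=MOTHER):
    """Compute the transitive closure of the parent relation (breadth-first)."""
    seen = set()
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for p in parents(current, father, mother):
            if p not in seen:  # also guards against cycles in faulty data
                seen.add(p)
                queue.append(p)
    return seen


if __name__ == "__main__":
    print(sorted(grandparents("Ann")))
    print(sorted(ancestors("Ann")))
```

The breadth-first traversal visits each person at most once, so it terminates even if erroneous data were to contain a cycle; the same pattern would apply to other hierarchical relations such as part-of.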

¹ See http://www.wikidata.org/wiki/Wikidata:Data_access
² An incomplete list is at http://www.wikidata.org/wiki/Wikidata:Tools
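As a rough illustration of the per-item access described in ‘The Web of Data’, the following Python sketch prepares an HTTP request for an entity URI and states the desired format in the Accept header, which is how content negotiation is expressed on the client side. The helper name build_entity_request and the media types shown are assumptions made for this example; which formats the server actually honours should be checked against the data access page in footnote 1. The sketch only constructs the request and does not contact the network.

```python
import re
import urllib.request

# Base of the entity URIs given in the text (e.g. .../entity/Q42).
ENTITY_BASE = "http://www.wikidata.org/entity/"


def build_entity_request(entity_id, media_type="application/json"):
    """Build a request for an entity URI with an Accept header.

    `entity_id` is an item or property ID such as 'Q42' or 'P31'.
    `media_type` is the format asked for via content negotiation; the
    default and any alternative (e.g. 'application/rdf+xml') are
    assumptions of this sketch, not a statement about the server.
    """
    if not re.fullmatch(r"[QP][1-9][0-9]*", entity_id):
        raise ValueError(f"not a well-formed entity ID: {entity_id!r}")
    return urllib.request.Request(
        ENTITY_BASE + entity_id,
        headers={"Accept": media_type},
    )


if __name__ == "__main__":
    request = build_entity_request("Q42")
    print(request.full_url, request.get_header("Accept"))
    # Passing `request` to urllib.request.urlopen would perform the lookup.
```

Keeping request construction separate from the network call makes the negotiation step easy to inspect and test offline.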

Figure 3: Wikidata in external applications: the data browser ‘Reasonator’ (http://tools.wmflabs.org/reasonator/)
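Any consuming application, such as the data browser in Figure 3, has to handle the statement structure described under ‘Not-So-Simple Data’ and ‘Citation Needed’: a property-value pair with qualifiers, references, an optional preferred or deprecated mark, and the two special cases ‘unknown’ and ‘no value’. The Python sketch below is one possible in-memory reading of that description, not Wikidata’s actual implementation or export format. All class, field, and function names are hypothetical, the rank name ‘normal’ is an assumed default, and the reference property ‘published by’ is invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class ValueKind(Enum):
    VALUE = "value"          # an ordinary value is given
    UNKNOWN = "unknown"      # a value exists but is not known
    NO_VALUE = "no value"    # the property has no value at all


class Rank(Enum):
    PREFERRED = "preferred"
    NORMAL = "normal"        # assumed name for the unmarked default
    DEPRECATED = "deprecated"


@dataclass
class Statement:
    item: str
    prop: str
    value: Any = None
    kind: ValueKind = ValueKind.VALUE
    qualifiers: Dict[str, Any] = field(default_factory=dict)
    # Each reference is itself just a collection of property-value pairs.
    references: List[Dict[str, Any]] = field(default_factory=list)
    rank: Rank = Rank.NORMAL


def share_value(statements, prop, item_a, item_b):
    """True if both items have a common ordinary value for `prop`.

    'unknown' and 'no value' are deliberately not treated as values,
    and deprecated statements are skipped.
    """
    def values(item):
        return {s.value for s in statements
                if s.item == item and s.prop == prop
                and s.kind is ValueKind.VALUE
                and s.rank is not Rank.DEPRECATED}
    return bool(values(item_a) & values(item_b))


# The population example from the text, with qualifiers and one reference.
rome = Statement("Rome", "population", 2761477,
                 qualifiers={"as of": 2010, "method": "estimation"},
                 references=[{"published by": "Istat"}])

# Two 'no value' statements must not be read as a shared child.
merkel = Statement("Angela Merkel", "child", kind=ValueKind.NO_VALUE)
benedict = Statement("Benedict XVI", "child", kind=ValueKind.NO_VALUE)

if __name__ == "__main__":
    print(share_value([merkel, benedict], "child",
                      "Angela Merkel", "Benedict XVI"))
```

Modelling the special cases as a separate kind rather than as sentinel values is the design point the text argues for: a comparison of raw values would equate the two empty entries, whereas the query above keeps them apart.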

Statistical evaluations of this kind are already feasible: one can readily analyze article coverage by language [12], or the gender balance of persons with Wikipedia articles [14]. Like Wikipedia, Wikidata provides plenty of material for researchers to study.

These are only the most obvious approaches of exploiting the data, and many unforeseen uses can be expected. Wikidata is still very young and the data is far from complete. We look forward to new and innovative applications made possible by Wikidata and its development as a knowledge base [23].

Future Prospects

Wikidata is only at its beginning, with some crucial features still missing. These include support for complex queries, which is currently under development.

However, to predict the future of Wikidata, the plans of the development team might be less important than one would expect: the biggest open questions are about the evolution and interplay of the many Wikimedia communities. Will Wikidata earn the trust of the Wikipedia communities? How will the fact that such different Wikipedia communities, with their different languages and cultures, access, share, and co-evolve the same knowledge base imprint on the way Wikidata is structured? How will Wikidata respond to the demands of communities beyond Wikipedia?

The influence of the community even extends to the technical development of the website and the underlying software. Wikidata is based on an open development process that invites contributions, and the site itself provides many extension points for user-created add-ons. Various interface features, e.g., for image embedding and multi-language editing, were designed and developed by the community. The community also developed ways to enrich the semantics of properties by encoding (soft) constraints such as ‘items should not have more than one birthplace’. External tools gather this information, analyze the dataset for constraint violations, and publish the list of violations on Wikidata to allow editors to check if they are valid exceptions or errors.

These examples illustrate the close relationships between technical infrastructure, editorial processes, and content, and the pivotal role the community plays in shaping these aspects. The community, however, is as dynamic as Wikidata itself, based not on status or membership, but on the common goal of turning Wikidata into the most accurate, useful, and informative resource possible. This goal provides stability and continuity, in spite of the fast-paced development, while allowing anyone interested to take part in defining the future of Wikidata.

Wikipedia is one of the most important websites today: a legacy that Wikidata still has to live up to. Within a year, Wikidata has already become an important platform for integrating information from many sources. In addition to this primary data, Wikidata also aggregates large amounts of incidental metadata about its own evolution and impact on Wikipedia. Wikidata thus has the potential to become a major resource for both research and the development of new and improved applications. Wikidata, the free knowledge base that everyone can edit, may thus bring us one step closer to a world in which everybody can freely share in the sum of all knowledge.

Acknowledgements

The work on Wikidata is funded through donations by the Allen Institute for Artificial Intelligence (AI2), Google, the Gordon and Betty Moore Foundation, and Yandex. The second author is supported by the German Research Foundation (DFG) in project DIAMOND (Emmy Noether grant KR 4381/1-1).

References

[1] Phoebe Ayers, Charles Matthews, and Ben Yates. How Wikipedia works: And how you can be a part of it. No Starch Press, 2008.
[2] Daniel J. Barrett. MediaWiki. O’Reilly Media, Inc., 2008.
[3] Rick Bennett, Christina Hengel-Dittrich, Edward T. O’Neill, and Barbara B. Tillett. VIAF (Virtual International Authority File): Linking Die Deutsche Bibliothek and Library of Congress name authority files. In Proc. World Library and Information Congress: 72nd IFLA General Conference and Council. IFLA, 2006.
[4] Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, pages 96–101, May 2001.
[5] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data: The story so far. International Journal on Semantic Web and Information Systems (IJSWIS), 5(3):1–22, 2009.
[6] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. DBpedia – A crystallization point for the Web of Data. J. of Web Semantics, 7(3):154–165, 2009.
[7] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: A collaboratively created graph database for structuring human knowledge. In Proc. 2008 ACM SIGMOD Int. Conf. on Management of Data, pages 1247–1250. ACM, 2008.
[8] Peter Buneman, James Cheney, Wang-Chiew Tan, and Stijn Vansummeren. Curated databases. In Maurizio Lenzerini and Domenico Lembo, editors, Proc. 27th Symposium on Principles of Database Systems (PODS’09), pages 1–12. ACM, 2008.
[9] Wikimedia community. Wikidata: Data model. Wikimedia Meta-Wiki, 2012. https://meta.wikimedia.org/wiki/Wikidata/Data_model.
[10] David A. Ferrucci, Eric W. Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John M. Prager, Nico Schlaefer, and Christopher A. Welty. Building Watson: an overview of the DeepQA project. AI Magazine, 31(3):59–79, 2010.
[11] Ramanathan V. Guha, Rob McCool, and Richard Fikes. Contexts for the Semantic Web. In Sheila A. McIlraith, Dimitris Plexousakis, and Frank van Harmelen, editors, Proc. 3rd Int. Semantic Web Conf. (ISWC’04), volume 3298 of LNCS, pages 32–46. Springer, 2004.
[12] Scott A. Hale. Multilinguals and Wikipedia editing. arXiv:1312.0976 [cs.CY], 2013. http://arxiv.org/abs/1312.0976.
[13] Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artif. Intell., Special Issue on Artificial Intelligence, Wikipedia and Semi-Structured Resources, 194:28–61, 2013.
[14] Maximilian Klein and Alex Kyrios. VIAFbot and the integration of library data on Wikipedia. code{4}lib Journal, 2013. http://journal.code4lib.org/articles/8964.
[15] Markus Krötzsch, Denny Vrandečić, Max Völkel, Heiko Haller, and Rudi Studer. Semantic Wikipedia. J. of Web Semantics, 5(4):251–261, 2007.
[16] Douglas B. Lenat and Ramanathan V. Guha. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley, 1989.
[17] Bo Leuf and Ward Cunningham. The Wiki way: quick collaboration on the Web. Addison-Wesley Professional, 2001.
[18] Robert MacGregor. Representing reified relations in Loom. J. Exp. Theor. Artif. Intell., 5(2-3):179–183, 1993.
[19] Luc Moreau. The foundations for provenance on the Web. Foundations and Trends in Web Science, 2(2–3):99–241, 2010.
[20] Natasha Noy and Alan Rector, editors. Defining N-ary Relations on the Semantic Web. W3C Working Group Note, 12 April 2006. Available at http://www.w3.org/TR/swbp-n-aryRelations/.
[21] William Tunstall-Pedoe. True Knowledge: open-domain question answering using structured knowledge and inference. AI Magazine, 31(3):80–92, 2010.
[22] Unxos GmbH. GeoNames, launched 2005. http://www.geonames.org, accessed Dec 2013.
[23] Denny Vrandečić. The Rise of Wikidata. IEEE Intelligent Systems, 28(4):90–95, 2013.
[24] Wolfram Alpha, launched 2009. https://www.wolframalpha.com, accessed Dec 2013.

Denny Vrandečić works at Google. He was the project director of Wikidata at Wikimedia Deutschland until September 2013.

Markus Krötzsch (markus.kroetzsch@tu-dresden.de) is lead of the Wikidata data model specification, and research group leader at TU Dresden.
