Wikipedia Tools for Google Spreadsheets
Thomas Steiner
Google Germany GmbH
ABC Str. 19, 20354 Hamburg, Germany
[email protected]

arXiv:1602.02506v1 [cs.IR] 8 Feb 2016

ABSTRACT

In this paper, we introduce the Wikipedia Tools for Google Spreadsheets. Google Spreadsheets is part of a free, Web-based software office suite offered by Google within its Google Docs service. It allows users to create and edit spreadsheets online, while collaborating with other users in real-time. Wikipedia is a free-access, free-content Internet encyclopedia, whose content and data are available, among other means, through an API. With the Wikipedia Tools for Google Spreadsheets, we have created a toolkit that facilitates working with Wikipedia data from within a spreadsheet context. We make these tools available as open-source on GitHub,1 released under the permissive Apache 2.0 license.

1 Wikipedia Tools for Google Spreadsheets: https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets

Categories and Subject Descriptors

H.3.5 [Online Information Services]: Web-based services

Keywords

Wikipedia, Wikidata, Google Spreadsheets, Google Sheets

1. INTRODUCTION

In the world of Computer Science, spreadsheet applications serve for the organization, analysis, and storage of data in tabular form. Spreadsheets are the computerized simulation of paper accounting worksheets, and operate on data represented as cells of an array, organized in rows and columns. Cells can contain numeric or textual data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. With the Wikipedia Tools for Google Spreadsheets, we introduce a toolkit of such formulas, tailored to the universe of Wikipedia, that enables a wide range of potential use cases, from marketing, to search engine optimization, to business analysis. Especially through the chaining of formulas, the true power and ease of spreadsheet applications can be unleashed.

1.1 Wikipedia and Wikidata

Wikipedia's content and data are available through the Wikipedia API (https://{language}.wikipedia.org/w/api.php), where {language} represents one of the currently 291 supported Wikipedia languages,2 for example, en for English, de for German, or zu for Zulu. Wikidata is a collaboratively edited knowledge base intended to provide a common source of structured data that can be used by projects such as Wikipedia. Its content and data are available through the Wikidata API (https://www.wikidata.org/w/api.php). Both the Wikipedia and the Wikidata APIs return data as XML or JSON, among other formats. Wikipedia pageviews data, i.e., the number of times within a given period of time that a given Wikipedia article has been viewed, can be obtained using the Pageviews API (https://wikimedia.org/api/rest_v1/?doc). The data is available in JSON format.

1.2 Google Spreadsheets and Apps Scripts

Google Spreadsheets can be extended with custom functions (or formulas) using Google Apps Scripts3 that are written in standard JavaScript.4 To illustrate this, a trivial function is defined in Listing 1 that can then be used from within a spreadsheet as outlined in Listing 2. Custom functions can access external resources on the Web by fetching URLs with the UrlFetchApp, one of the scripting services available in Google Apps Script. Fetched data can be in either XML or JSON format and is parsed with convenience functions.

    function DOUBLE(input) {
      return input * 2;
    }

Listing 1: Custom Google Sheets function called DOUBLE.

    =DOUBLE(A1)

Listing 2: Usage of the custom function DOUBLE from Listing 1 in a cell with the value of cell A1 as a parameter.
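To make the combination of the API endpoint pattern from Section 1.1 and URL fetching from Section 1.2 more concrete, the following is a minimal sketch of how a request URL might be assembled before it is handed to UrlFetchApp. The helper name buildWikipediaApiUrl, its params argument, and the choice of the titles query parameter are assumptions made for illustration; they are not taken from the published toolkit.

```javascript
/**
 * Hypothetical helper (not part of the published toolkit): splits an
 * article reference such as "en:Berlin" into language code and title and
 * builds a Wikipedia API URL that requests JSON output.
 *
 * @param {string} article Article reference in "language:Title" form
 * @param {Object<string, string>} params Extra query parameters (assumed)
 * @return {string} The request URL, or '' if the input is malformed
 */
function buildWikipediaApiUrl(article, params) {
  'use strict';
  if (typeof article !== 'string') {
    return '';
  }
  // Split on the first colon only, so titles like "Category:Berlin" survive.
  var separator = article.indexOf(':');
  if (separator < 1 || separator === article.length - 1) {
    return '';
  }
  var language = article.slice(0, separator);
  var title = article.slice(separator + 1).replace(/\s/g, '_');
  var query = ['format=json', 'titles=' + encodeURIComponent(title)];
  Object.keys(params || {}).forEach(function (key) {
    query.push(encodeURIComponent(key) + '=' +
        encodeURIComponent(params[key]));
  });
  return 'https://' + language + '.wikipedia.org/w/api.php?' +
      query.join('&');
}

// Inside Apps Script, the result would then be fetched and parsed, e.g.:
//   var url = buildWikipediaApiUrl('en:Berlin', {action: 'query'});
//   var json = JSON.parse(UrlFetchApp.fetch(url).getContentText());
```

Keeping URL construction in a pure function, separate from the UrlFetchApp call, is only one possible design; it has the advantage that the string handling can be checked outside the spreadsheet environment.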
2 List of Wikipedias: https://meta.wikimedia.org/wiki/List_of_Wikipedias
3 Google Apps Script: https://developers.google.com/apps-script/
4 Custom functions in Google Sheets: https://developers.google.com/apps-script/guides/sheets/functions

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media.
ACM 123-4-5678-9012-3/45/67.
http://dx.doi.org/12.3456/7890123.4567890

2. LIST OF DEVELOPED FUNCTIONS

In our Wikipedia Tools for Google Spreadsheets, we provide twelve functions that, in traditional spreadsheet style, follow an all-uppercase naming convention and start with a WIKI prefix. These functions are wrappers around the particular Wikipedia or Wikidata API calls, or the Pageviews API respectively. Figure 1 shows exemplary output for the English Wikipedia article https://en.wikipedia.org/wiki/Berlin and the English Wikipedia category https://en.wikipedia.org/wiki/Category:Berlin. The functions are listed below.

WIKITRANSLATE: Returns Wikipedia translations (language links) for a Wikipedia article.
WIKISYNONYMS: Returns Wikipedia synonyms (redirects) for a Wikipedia article.
WIKIEXPAND: Returns Wikipedia translations (language links) and synonyms (redirects) for a Wikipedia article.
WIKICATEGORYMEMBERS: Returns Wikipedia category members for a Wikipedia category.
WIKISUBCATEGORIES: Returns Wikipedia subcategories for a Wikipedia category.
WIKIINBOUNDLINKS: Returns Wikipedia inbound links for a Wikipedia article.
WIKIOUTBOUNDLINKS: Returns Wikipedia outbound links for a Wikipedia article.
WIKIMUTUALLINKS: Returns Wikipedia mutual links, i.e., the intersection of inbound and outbound links for a Wikipedia article.
WIKIGEOCOORDINATES: Returns Wikipedia geocoordinates for a Wikipedia article.
WIKIDATAFACTS: Returns Wikidata facts for a Wikipedia article.
WIKIPAGEVIEWS: Returns Wikipedia pageviews statistics for a Wikipedia article.
WIKIPAGEEDITS: Returns Wikipedia pageedits statistics for a Wikipedia article.

Most functions directly wrap native API calls, with three exceptions: (i) the functionality of the WIKISYNONYMS and the WIKITRANSLATE functions is combined in the WIKIEXPAND function; both the WIKITRANSLATE and the WIKIEXPAND functions accept an optional target languages parameter that allows for limiting the output to just a subset of all available Wikipedia languages; (ii) the function WIKIMUTUALLINKS is the intersection of the two functions WIKIINBOUNDLINKS and WIKIOUTBOUNDLINKS; and (iii) the function WIKIDATAFACTS provides a list of claims [11] (or facts), enriched with entity and property labels for improved readability, limited to single-value objects, and simplified using an adapted version of Maxime Lathuilière's simplifyClaims function5 from his Wikidata SDK [6]. This allows us to return two columns (in RDF [2] terms, "predicate" and "object" pairs) with one unique object, for example, the predicate ISO 3166-2 code with the object DE-BE, while deliberately discarding multi-value claims, for example, the predicate head of government with objects Michael Müller and Klaus Wowereit, among many others. While in the concrete example the ordering is clear (temporal), this is not true in the general case, for example, with the predicate instance of. As a result, in WIKIDATAFACTS, we prefer indisputability of claims over their completeness. Listing 3 shows, as an example, the complete implementation of the WIKISYNONYMS function.

    /**
     * Returns Wikipedia synonyms
     *
     * @param {string} article The Wikipedia article
     * @return {Array<string>} The list of synonyms
     */
    function WIKISYNONYMS(article) {
      'use strict';
      if (!article) {
        return '';
      }
      var results = [];
      try {
        // Split "language:Title" on the first colon only
        var language = article.split(/:(.+)?/)[0];
        var title = article.split(/:(.+)?/)[1];
        if (!title) {
          return '';
        }
        title = title.replace(/\s/g, '_');
        // Redirects to the article are requested as filtered backlinks
        var url = 'https://' + language + '.wikipedia.org/w/api.php' +
            '?action=query' +
            '&blnamespace=0' +
            '&list=backlinks' +
            '&blfilterredir=redirects' +
            '&bllimit=max' +
            '&format=xml' +
            '&bltitle=' + encodeURIComponent(title);
        var xml = UrlFetchApp.fetch(url).getContentText();
        var document = XmlService.parse(xml);
        var entries = document.getRootElement()
            .getChild('query').getChild('backlinks')
            .getChildren('bl');
        for (var i = 0; i < entries.length; i++) {
          var text = entries[i].getAttribute('title').getValue();
          results[i] = text;
        }
      } catch (e) {
        // no-op
      }
      return results.length > 0 ? results : '';
    }

Listing 3: Implementation of WIKISYNONYMS.

3. USAGE SCENARIOS

We have tested the Wikipedia Tools for Google Spreadsheets with different usage scenarios in mind. These include, but are not limited to, the ones listed in the following.

3.1 Usage Scenario I: Ordered Category Panel

Wikipedia holds an enormous number of categories, for example, visitor attractions in Montreal.6 Category members obtained through a call of WIKICATEGORYMEMBERS are listed in alphabetical order. However, if we additionally request pageviews data for each category member through a series of WIKIPAGEVIEWS calls and then sort by pageviews in descending order, we get a representative list of top-10 visitor attractions, enriched with photos retrieved through calls of WIKIDATAFACTS filtered on "image", as shown in Figure 2. A similar feature (based on non-disclosed metrics) in form

5 Wikidata SDK simplifyClaims function: https://github.com/maxlath/wikidata-sdk#simplify-claims-results
6 Visitor attractions in Montreal: https://en.wikipedia.org/wiki/Category:Visitor_attractions_in_Montreal

[Figure 1: exemplary output of the functions for the English Wikipedia article Berlin and the English Wikipedia category Category:Berlin]