Wikipedia Tools for Google Spreadsheets
Thomas Steiner Google Germany GmbH ABC Str. 19, 20354 Hamburg, Germany [email protected]
arXiv:1602.02506v1 [cs.IR] 8 Feb 2016

ABSTRACT
In this paper, we introduce the Wikipedia Tools for Google Spreadsheets. Google Spreadsheets is part of a free, Web-based software office suite offered by Google within its Google Docs service. It allows users to create and edit spreadsheets online, while collaborating with other users in real-time. Wikipedia is a free-access, free-content Internet encyclopedia, whose content and data is available, among other means, through an API. With the Wikipedia Tools for Google Spreadsheets, we have created a toolkit that facilitates working with Wikipedia data from within a spreadsheet context. We make these tools available as open-source on GitHub,¹ released under the permissive Apache 2.0 license.

Categories and Subject Descriptors
H.3.5 [Online Information Services]: Web-based services

Keywords
Wikipedia, Wikidata, Google Spreadsheets, Google Sheets

1. INTRODUCTION
In the world of Computer Science, spreadsheet applications serve for the organization, analysis, and storage of data in tabular form. Spreadsheets are the computerized simulation of paper accounting worksheets, and operate on data represented as cells of an array, organized in rows and columns. Cells can contain numeric or textual data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. With the Wikipedia Tools for Google Spreadsheets, we introduce a toolkit of such formulas, tailored to the universe of Wikipedia, that enables a wide range of potential use cases starting from marketing, to search engine optimization, to business analysis. Especially through the chaining of formulas, the true power and ease of spreadsheet applications can be unleashed.

1.1 Wikipedia and Wikidata
Wikipedia's content and data is available through the Wikipedia API (https://{language}.wikipedia.org/w/api.php), where {language} represents one of the currently 291 supported Wikipedia languages,² for example, en for English, de for German, or zu for Zulu. Wikidata is a collaboratively edited knowledge base and intended to provide a common source of structured data which can be used by projects such as Wikipedia. Its content and data is available through the Wikidata API (https://www.wikidata.org/w/api.php). Both the Wikipedia and the Wikidata APIs' data is available as XML or JSON, among other formats. Wikipedia pageviews data, i.e., the number of times within a given period of time that a given Wikipedia article has been viewed, can be obtained using the Pageviews API (https://wikimedia.org/api/rest_v1/?doc). The data is available in JSON format.

1.2 Google Spreadsheets and Apps Scripts
Google Spreadsheets can be extended with custom functions (or formulas) using Google Apps Scripts³ that are written in standard JavaScript.⁴ To illustrate this, a trivial function is defined in Listing 1 that can then be used from within a spreadsheet as outlined in Listing 2. Custom functions can access external resources on the Web by fetching URLs with the UrlFetchApp, one of the scripting services available in Google Apps Script. Fetched data can either be in XML or JSON format and parsed with convenience functions.

    function DOUBLE(input) {
      return input * 2;
    }

Listing 1: Custom Google Sheets function called DOUBLE.

    =DOUBLE(A1)

Listing 2: Usage of the custom function DOUBLE from Listing 1 in a cell with the value of cell A1 as a parameter.

2. LIST OF DEVELOPED FUNCTIONS
In our Wikipedia Tools for Google Spreadsheets, we provide eleven functions that, in traditional spreadsheets style, follow an all-uppercase naming convention and start with a WIKI prefix. These functions are wrappers around the particular Wikipedia or Wikidata API calls, or the Pageviews API respectively. Figure 1 shows exemplary output for the English Wikipedia article https://en.wikipedia.org/wiki/Berlin […]

    /** Returns Wikipedia synonyms
     *
     * @param{string} article The Wikipedia article
     * @return{Array […]

¹ Wikipedia Tools for Google Spreadsheets: https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets
² List of Wikipedias: https://meta.wikimedia.org/wiki/List_of_Wikipedias
³ Google Apps Script: https://developers.google.com/apps-script/
⁴ Custom functions in Google Sheets: https://developers.google.com/apps-script/guides/sheets/functions

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. ACM 123-4-5678-9012-3/45/67. http://dx.doi.org/12.3456/7890123.4567890
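As a minimal sketch of how such a WIKI-prefixed wrapper could look, the following Apps Script-style JavaScript fetches daily pageviews from the Pageviews API. The function name WIKIPAGEVIEWS_SKETCH, its helper functions, the [date, views] row layout, and the optional fetchText parameter are assumptions made for illustration and are not taken from the toolkit's source code. The endpoint path follows the public Pageviews API, with the "user" agent type chosen here so that bot traffic is left out.

```javascript
// Hypothetical sketch of a pageviews wrapper; not the toolkit's actual code.

// Formats a Date as YYYYMMDD (UTC), the date format used in the URL path.
function formatDate(date) {
  return date.toISOString().slice(0, 10).replace(/-/g, '');
}

// Builds the per-article endpoint URL for an "en:Berlin"-style argument.
function buildPageviewsUrl(article, start, end) {
  var parts = article.split(':');
  var language = parts[0];
  // Re-join in case the title itself contains a colon; spaces become "_".
  var title = parts.slice(1).join(':').replace(/ /g, '_');
  return 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/' +
      language + '.wikipedia/all-access/user/' +
      encodeURIComponent(title) + '/daily/' +
      formatDate(start) + '/' + formatDate(end);
}

// Turns the JSON response text into rows of [YYYYMMDD, views].
function parsePageviews(json) {
  return JSON.parse(json).items.map(function(item) {
    return [item.timestamp.slice(0, 8), item.views];
  });
}

// fetchText is an assumed, injectable helper so that the logic can be
// exercised outside Apps Script; in Apps Script it defaults to UrlFetchApp.
function WIKIPAGEVIEWS_SKETCH(article, start, end, fetchText) {
  fetchText = fetchText || function(url) {
    return UrlFetchApp.fetch(url).getContentText();
  };
  return parsePageviews(fetchText(buildPageviewsUrl(article, start, end)));
}
```

In a sheet, such a function would presumably be called like =WIKIPAGEVIEWS_SKETCH("en:Berlin", TODAY() - 30, TODAY()). Returning a two-dimensional array lets the result spill into a date column and a views column.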
Figure 1: Example output for each function in the Wikipedia Tools for Google Spreadsheets (cropped). Live spreadsheet: https://goo.gl/yvbmex.

[…] of an image carousel can be seen in Google's Knowledge Graph [10] Web search results pages when searching for "visitor attractions in montreal" (demo https://goo.gl/Ugt0je).

3.2 Usage Scenario II: Search Ads
Search advertisers can greatly profit from the information that is contained in Wikipedia and Wikidata. For example, if we imagine a hotel booking site, it may be desirable to be able to advertise based on known points of interest (POIs) and automatically create advertisements featuring such POIs. Figure 3 shows an example where skyscrapers over 350 meters listed in the category Skyscrapers over 350 meter⁷ are first obtained via WIKICATEGORYMEMBERS and then checked for their "height" fact via WIKIDATAFACTS, which is then used in two templates to create ads. Search keywords are generated by calling WIKISYNONYMS and combined with terms like "hotel".

3.3 Usage Scenario III: Marketing Campaigns
On January 13, 2016, Google Maps added Street View imagery for the model railway Miniatur Wunderland.⁸ Taking global Wikipedia pageviews as a popularity indicator, we can examine if the marketing campaign has had any impact on the attraction, assuming that increased visitor interest translates to more pageviews. Therefore, we first take WIKITRANSLATE to obtain the Miniatur Wunderland article in all available languages and then retrieve pageviews via WIKIPAGEVIEWS. Figure 4 indeed shows an international upward curve of pageviews after a linear progression, starting January 13 (except for the German article, which had an earlier peak on January 8, a long weekend after a public holiday).

⁷ Skyscrapers over 350 meter: https://en.wikipedia.org/wiki/Category:Skyscrapers_over_350_meters
⁸ Miniatur Wunderland on Google Street View: https://www.google.com/maps/about/behind-the-scenes/streetview/treks/miniatur-wunderland/

4. RELATED WORK
In his book Google Apps Script for Beginners [4], Gabet gives an introduction to extending Google Spreadsheets with custom functions. A similar introduction is given in Ferreira's Google Apps Script: Web Application Development Essentials [3]. In [5], Han et al. describe their approach RDF123 to translate spreadsheet data to RDF, the inverse of what we do. Further, Olsen and Moser show in [8] how Web APIs can be taught with spreadsheets. The process of calling Web APIs via spreadsheets is further described in [9]. In [1], Abramson et al. describe how they enabled spreadsheets to have "super-computing" powers through parallelized custom functions. An open-source toolkit for mining Wikipedia (not bound to spreadsheets, but designed for general use with the Java programming language) is described by Milne and Witten in [7].

5. CONCLUSIONS AND FUTURE WORK
In this paper, we have introduced the Wikipedia Tools for Google Spreadsheets. First, we have introduced the data sources Wikipedia and Wikidata and their different APIs. Second, we have shown how Google Spreadsheets can be extended through custom functions that can then be used from within a cell context as if they were native functions. In the following, we have listed the implemented functions and explained where they extend the functionality of the underlying wrapped API functions. We have then focused on three different usage scenarios that illustrate how to work with the Wikipedia Tools for Google Spreadsheets and finally have provided an overview of related work in the area.

Future work will focus on adding more functions as need be and potentially making the functions more parameterizable. In the current iteration, we have favored simplicity and ease of use over customizability, essentially making the most common use case the only option. Possibly, in upcoming releases, we will add an advanced mode that allows experienced users to fine-tune the functions' results, for example, to implicitly include bot traffic in WIKIPAGEVIEWS that we have currently excluded on purpose.

Concluding, we were positively surprised by the increased productivity and short turnaround time enabled by the Wikipedia Tools for Google Spreadsheets for the rapid prototyping of ideas, especially in combination with the fill-down and fill-right features in spreadsheets and the charting capabilities. We look forward to making the tools even more powerful and hope to attract collaborators for the open source project available on GitHub at https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets. As a positive side effect, the tools can even help improve Wikipedia and Wikidata when authors add missing data; for example, we added an image to one of the visitor attractions of Montreal, as this fact was initially missing in Wikidata (and thus in Figure 2).

[Figure 2 screenshot: spreadsheet "DIY Knowledge Graph"; cell formula:]

    =IFERROR(SUM(QUERY(WIKIPAGEVIEWS("en:"&A2, TODAY() - 30, TODAY()), "SELECT Col2")), "")

Figure 2: Usage scenario I: Wikipedia Tools for Google Spreadsheets used to create an ordered category panel based on Wikipedia category memberships and accumulated Wikipedia pageviews for popularity ranking (here: the top-10 visitor attractions in Montreal). Live spreadsheet: https://goo.gl/Njvt1T.
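The popularity ranking behind Figure 2 (accumulated pageviews per article, ordered from most to least viewed) can be sketched in plain JavaScript as follows. This is an illustration only: the function names sumViews and rankByPageviews, the [date, views] row layout, and the shape of the input object are assumptions, not part of the toolkit.

```javascript
// Hypothetical sketch of the ranking step; not the toolkit's actual code.

// Sums the view counts (second column) of rows shaped like [date, views],
// analogous to SUM(QUERY(..., "SELECT Col2")) in the Figure 2 formula.
function sumViews(rows) {
  return rows.reduce(function(total, row) {
    return total + row[1];
  }, 0);
}

// Ranks articles by accumulated pageviews, highest first, keeping the top n.
// pageviewsByArticle maps an article name to its [date, views] rows.
function rankByPageviews(pageviewsByArticle, n) {
  return Object.keys(pageviewsByArticle)
      .map(function(article) {
        return [article, sumViews(pageviewsByArticle[article])];
      })
      .sort(function(a, b) {
        return b[1] - a[1];
      })
      .slice(0, n);
}
```

In the spreadsheet itself, the same effect is presumably achieved by filling the summing formula down the column of category members and sorting by the resulting totals, which is what makes the chaining of formulas attractive for this use case.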
[Figure 3 screenshot: spreadsheet "AdWords Ads"; cell formula:]

    =ARRAYFORMULA(LOWER(WIKISYNONYMS("en:"&C$2)&" hotel"))

Figure 3: Usage scenario II: Wikipedia Tools for Google Spreadsheets used to create textual search ads based on Wikidata facts (here: skyscraper heights) and Wikipedia synonyms as keywords combined with the term "hotel". Live spreadsheet: https://goo.gl/np1Is8.

[Figure 4 screenshot: spreadsheet "Miniatur Wunderland" with a line chart of daily pageviews per language edition (da, de, en, es, fa, fi, fr, hu, …); x-axis: Date (Dec 29, 2015 to Jan 19, 2016); y-axis: 0 to 2000; highlighted data point: Jan 13, 2016, en:Miniatur Wunderland: 1487; cell formula:]

    =IF(ISBLANK(C$2), "", QUERY(WIKIPAGEVIEWS(C$2, TODAY() - $B$1, TODAY()), "SELECT Col2"))

Figure 4: Usage scenario III: Wikipedia Tools for Google Spreadsheets used to evaluate the impact of a marketing campaign (here: model railway Miniatur Wunderland being featured on Google Street View since January 13, 2016). Live spreadsheet: https://goo.gl/q1yhuV.

6. REFERENCES
[1] D. Abramson, L. Kotler, D. Mather, and P. Roe. ActiveSheets: Super-Computing with Spreadsheets. In U. Seattle, editor, Proceedings of the High Performance Computing Symposium – HPC 2001, pages 110–115, San Diego, USA, 2001.
[2] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. Recommendation, W3C, Feb. 2014.
[3] J. Ferreira. Google Apps Script: Web Application Development Essentials. O'Reilly Media, 2014.
[4] S. Gabet. Google Apps Script for Beginners. Packt Publishing, 2014.
[5] L. Han, T. Finin, C. Parr, J. Sachs, and A. Joshi. RDF123: From Spreadsheets to RDF. In The Semantic Web – ISWC 2008, volume 5318 of LNCS, pages 451–466. Springer, 2008.
[6] M. Lathuilière. Wikidata SDK, 2016. https://github.com/maxlath/wikidata-sdk (2016-02-08).
[7] D. Milne and I. H. Witten. An Open-Source Toolkit for Mining Wikipedia. Artificial Intelligence, 194:222–239, Jan. 2013.
[8] T. Olsen and K. Moser. Teaching Web APIs in Introductory and Programming Classes: Why and How. Paper 16, SIGED: IAIM Conference, Feb. 2013.
[9] K. Patel, S. Prish, S. Sadhu, L. Bizek, and X. Pan. Spreadsheet Functions to Call REST API Sources, May 15 2014. US Patent App. 13/672,704.
[10] A. Singhal. "Introducing the Knowledge Graph: things, not strings", Official Google Blog, May 2012. http://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html.
[11] D. Vrandečić and M. Krötzsch. Wikidata: A Free Collaborative Knowledgebase. Commun. ACM, 57(10):78–85, Sept. 2014.