WikiBrain: Democratizing computation on Wikipedia

Shilad Sen
Macalester College
St. Paul, Minnesota
[email protected]

Toby Jia-Jun Li
University of Minnesota
Minneapolis, Minnesota
[email protected]

Brent Hecht
University of Minnesota
Minneapolis, Minnesota
[email protected]

WikiBrain Team∗
Matt Lesicko, Ari Weiland, Rebecca Gold, Yulun Li, Benjamin Hillmann
Macalester College
St. Paul, Minnesota

ABSTRACT

Wikipedia is known for serving humans' informational needs. Over the past decade, the encyclopedic knowledge encoded in Wikipedia has also powerfully served computer systems. Leading algorithms in artificial intelligence, natural language processing, data mining, geographic information science, and many other fields analyze the text and structure of articles to build computational models of the world.

Many software packages extract knowledge from Wikipedia. However, existing tools either (1) provide Wikipedia data, but not well-known Wikipedia-based algorithms, or (2) narrowly focus on one such algorithm.

This paper presents the WikiBrain software framework, an extensible Java-based platform that democratizes access to a range of Wikipedia-based algorithms and technologies.

... has raced towards this goal, growing in articles, revisions, and gigabytes by a factor of 20.¹ The planet has taken notice. Humans turn to Wikipedia's 31 million articles more than any other online reference. The online encyclopedia's 287 languages attract a global audience that makes it the sixth most visited site on the web.²

However, outside the public eye, computers have also come to rely on Wikipedia. Algorithms across many fields extract meaning from the encyclopedia by analyzing articles' text, links, titles, and category structure. Search engines such as Google and Bing respond to user queries by extracting structured knowledge from Wikipedia.³ Leading semantic relatedness (SR) algorithms estimate the strength of association between words by mining Wikipedia's text, link structure