Catching the Red Priest: Using Historical Editions of Encyclopaedia Britannica to Track the Evolution of Reputations Yen-Fu Luo†, Anna Rumshisky†, Mikhail Gronas∗ †Dept. of Computer Science, University of Massachusetts Lowell, Lowell, MA, USA ∗Dept. of Russian, Dartmouth College, Hanover, NH, USA yluo,arum @cs.uml.edu,
[email protected] { } Abstract mention statistics from books written at different historical periods. Google Ngram Viewer is a tool In this paper, we investigate the feasibil- that plots occurrence statistics using Google Books, ity of using the chronology of changes in the largest online repository of digitized books. But historical editions of Encyclopaedia Britan- while Google Books in its entirety certainly has nica (EB) to track the changes in the land- quantity, it lacks structure. However, the history scape of cultural knowledge, and specif- of knowledge (or culture) is, to a large extent, the ically, the rise and fall in reputations of history of structures: hierarchies, taxonomies, do- historical figures. We describe the data- mains, subdomains. processing pipeline we developed in order to identify the matching articles about his- In the present project, our goal was to focus on torical figures in Wikipedia, the current sources that endeavor to capture such structures. electronic edition of Encyclopaedia Britan- One such source is particularly fitting for the task; nica (edition 15), and several digitized his- and it has been in existence at least for the last torical editions, namely, editions 3, 9, 11. three centuries, in the form of changing editions of We evaluate our results on the tasks of arti- authoritative encyclopedias, and specifically, Ency- cle segmentation and cross-edition match- clopaedia Britannica.