Etymonline: from Etymological Dictionary to an Online Digital Library
Total Page:16
File Type:pdf, Size:1020Kb
MASTER’S THESIS IN LIBRARY AND INFORMATION SCIENCE FACULTY OF LIBRARIANSHIP, INFORMATION, EDUCATION AND IT 2017:11 Etymonline: From etymological dictionary to an online digital library Utilizing LIS notions in transforming a database into a DL YAELLE KALIFON © Yaelle Kalifon Partial or full copying and distribution of the material in this thesis is forbidden without written consent. English title Etymonline: From etymological dictionary to an online digital library Utilizing LIS notions in converting a database to a DL Author: Yaelle Kalifon Completed: 2017 Abstract: The thesis explores the hypothetical turning of an online database of etymology into an online digital library. The reader will be introduced with the principles that constitute a library, as opposed to other sources of information and knowledge, and the contemporary attributes of having those available online. The thesis argues that by separating metadata from entries and storing it separately, users would be able to gain more information per search, as well as browse through other etymological data in the area of their interest. The methodology employed is that of characterizing each entry according to categories, each of which composed of various possible tags that fit the existing corpus of entries. A search is then made possible according to specific entry (as is the case currently in the Etymonline database), as well as specific category, specific tag, or any combination of those. These added features also enable random browsing of the collection and the deducing of other etymological ties or points of interest. Though not without flaws, the suggested methodology proves to retrieve more hits upon searching for a specific attribute of words or related ties to specific entries. Flaws, restrictions and limitations, as well as functions that were not attempted but could improve user experience further, are discussed. The thesis stresses the importance of searching and browsing functions and their contribution to future etymological research, by using the available information as it is provided in Etymonline.com., in the structure of a DL instead of a database. Keywords: categorization, tagging, information organization, data mining, etymology 1 Table of Contents Chapter 1 Introduction........................................................................................................ 5 1.1 Background........................................................................................... 5 1.2 Research question and aim.................................................................... 5 1.2.1 Problem........................................................................................... 5 1.2.2 Aim ................................................................................................. 6 1.2.3 Research question ........................................................................... 6 1.2.4 Ideational limitation........................................................................ 7 1.2.5 Etymonline justification.................................................................. 7 1.3 Thesis structure ..................................................................................... 7 Chapter 2 Literature review ............................................................................................... 8 2.1 Libraries and digital libraries – past, present, future ............................ 8 2.2 Volunteer based digital resources ......................................................... 9 2.2.1 Project Gutenberg ........................................................................... 10 2.2.2 Wikipedia........................................................................................ 10 2.2.3 Flickr............................................................................................... 10 Chapter 3 Classification theory .......................................................................................... 11 3.1 Foundations of classification ................................................................ 11 3.2 The power of classification systems ..................................................... 12 3.3 Matching a classification system to a corpus – gains and pitfalls ........ 13 3.4 The freedom and difficulties of classifying in the digital world........... 13 3.4.1 Searching versus browsing ............................................................. 14 3.4.2 Metadata.......................................................................................... 14 3.4.3 Misc. ............................................................................................... 15 3.4.4 Different ways of organizing .......................................................... 15 3.4.5 Tagging ........................................................................................... 16 3.4.6 Penn tags project............................................................................. 17 Chapter 4 Etymology ......................................................................................................... 18 4.1 English etymology ................................................................................ 18 4.2 Lexical approaches to etymological dictionaries.................................. 19 Chapter 5 Conceptual and visual representation.............................................................. 20 5.1 System and interface design.................................................................. 20 5.2 Psychological representation ................................................................ 21 5.3 Text mining........................................................................................... 22 5.4 Text visualization.................................................................................. 22 5.5 Physical versus digital categorization................................................... 23 2 5.6 Classification versus categorization...................................................... 24 5.6.1 Classification................................................................................... 24 5.6.2 Categorization................................................................................. 24 Chapter 6 Etymonline.......................................................................................................... 25 Chapter 7 Method and procedure ...................................................................................... 30 7.1 Contents and set selection..................................................................... 30 7.2 Method .................................................................................................. 31 7.2.1 Categories ....................................................................................... 31 7.2.2 Tags................................................................................................. 35 7.3 Hypothesis and analysis........................................................................ 36 7.3.1 Etymonline related ties ................................................................... 37 7.3.2 Suggested model related ties........................................................... 38 7.3.3 Comparison..................................................................................... 39 7.4 Reservations and flaws ......................................................................... 39 7.4.1 Reservations.................................................................................... 39 7.4.2 Flaws............................................................................................... 41 Chapter 8 Results ............................................................................................................... 42 8.1 Testing the methodology in producing sets of interest ......................... 42 8.2 Testing the DL model in comparison to Etymonline............................ 44 Chapter 9 Discussion ......................................................................................................... 46 9.1 Context.................................................................................................. 47 9.2 Contextual needs................................................................................... 47 9.3 Metadata................................................................................................ 48 9.4 From database to a DL.......................................................................... 48 9.5 Original titles for categories and tags ................................................... 49 9.6 Why tagging.......................................................................................... 50 9.7 Results’ representation.......................................................................... 50 9.8 Community or experts’ control............................................................. 51 9.9 Value of Etymonline............................................................................. 51 Chapter 10 Conclusion .......................................................................................................... 52 Bibliography ....................................................................................................... 54 Appendices.......................................................................................................... 58 3 Figures and Tables Figures Figure 1 Related terms .................................................................................. 6 Figure 2 Hierarchical categorization ............................................................