Latent Semantic Analysis, Corpus Stylistics and Machine Learning


Latent Semantic Analysis, Corpus Stylistics and Machine Learning Stylometry for Translational and Authorial Style Analysis: The Case of Denys Johnson-Davies’ Translations into English

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by Mohammed Al Batineh
May, 2015

© Copyright by Mohammed S. Al-Batineh. All Rights Reserved

Dissertation written by Mohammed Al Batineh
BA., Yarmouk University, Jordan, 2008
MA., Yarmouk University, Jordan, 2010

APPROVED BY
Chair, Doctoral Dissertation Committee: Dr. Françoise Massardier-Kenney (advisor)
Members, Doctoral Dissertation Committee: Dr. Carol Maier, Dr. Gregory M. Shreve, Dr. Jonathan I. Maletic, Dr. Katherine Rawson

ACCEPTED BY
Interim Chair, Modern and Classical Language Studies: Dr. Keiran J Dunne
Dean, College of Arts and Sciences: Dr. James L. Blank

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
DEDICATION
ABSTRACT

CHAPTER 1: INTRODUCTION
  1.1. Introduction
  1.2. Denys Johnson-Davies
  1.3. Research Hypotheses
  1.4. Research Method
  1.5. Significance of the Study
  1.6. Summary of Chapters

CHAPTER 2: LITERATURE REVIEW
  2.1. A Brief History of Literary Stylistics
  2.2. Approaches to Style in Translation Studies
  2.3. Text-Oriented Approaches
    2.3.1. Comparative Approach
    2.3.2. Target-Oriented Approach
  2.4. Translator-Oriented Approaches
  2.5. Cognitive-Oriented Approach
  2.6. Conclusion

CHAPTER 3: METHODOLOGY
  3.1. Introduction
  3.2. Data Collection
  3.3. Corpus Database
  3.4. Corpus Compilation and Pre-processing
  3.5. Latent Semantic Analysis
    3.5.1. LSA Similarity Query
    3.5.2. LSA Similarity Cutoff
    3.5.3. LSA Output Evaluation
  3.6. Corpus Stylistics
    3.6.1. Standardized Type-Token Ratio (STTR)
    3.6.2. Mean Sentence Length
    3.6.3. Punctuation marks
  3.7. Statistical Testing
  3.8. Machine Learning Approach
    3.8.1. Character n-grams
    3.8.2. Part of Speech (POS) n-grams
    3.8.3. Word n-grams
  3.9. Tools Used in the Dissertation
  3.10. Conclusion

CHAPTER 4: LATENT SEMANTIC ANALYSIS RESULTS
  4.1. Introduction
  4.2. LSA Similarity Analysis
    4.2.1. LSA Similarity Query on J-D’s Translation before Creative Writing
      4.2.1.1. LSA Results with V=100
    4.2.2. LSA Similarity Query on J-D’s Translation after Creative Writing
      4.2.2.1. LSA Results with V=50
  4.3. Conclusion

CHAPTER 5: CORPUS STYLISTICS AND MACHINE LEARNING ANALYSIS RESULTS
  5.1. Introduction
  5.2. Corpus Analysis
    5.2.1. Textual Analysis
      5.2.1.1. Standardized Type-Token Ratio
      5.2.1.2. Mean Sentence Length
    5.2.2. Punctuation Marks Analysis
      5.2.2.1. Standardized hyphen Analysis
      5.2.2.2. Standardized Comma Analysis
      5.2.2.3. Standardized Semicolon Analysis
    5.2.3. SPSS Statistical Analysis
      5.2.3.1. Textual Analysis
        5.2.3.1.1. Standardized Type-Token Ratios (STTRs)
      5.2.3.2. Mean Sentence Length
      5.2.3.3. Punctuation Marks analysis
        5.2.3.3.1. Standardized Comma analysis
        5.2.3.3.2. Standardized Hyphen analysis
        5.2.3.3.3. Standardized Semicolon analysis
  5.3. Machine Learning Stylometry
    5.3.1. JGAAP Tool
    5.3.2. Corpus Pre-processing
    5.3.3. JGAAP Analysis Method
    5.3.4. Style Markers Analysis
      5.3.4.1. Character n-gram analysis
      5.3.4.2. Part-of-Speech (POS) Analysis
      5.3.4.3. Word n-gram Analysis
    5.3.5. Conclusion

CHAPTER 6: DISCUSSION
  6.1. Introduction
  6.2. Zooming into the Results
  6.3. Thematic analysis
  6.4. Textual Analysis
    6.4.1. STTR
    6.4.2. Mean Sentence length
  6.5. Punctuation Marks
  6.6. Syntactic Analysis
  6.7. Word n-gram Analysis
  6.8. Character n-gram Analysis
Recommended publications
  • St.Litter.-2 Lam.(4)-2018.Indd
    Studia Litteraria Universitatis Iagellonicae Cracoviensis 13 (2018), z. 4, s. 257–269 doi:10.4467/20843933ST.18.022.9475 www.ejournals.eu/Studia-Litteraria ROSWITHA BADRY Uniwersytet Alberta i Ludwika we Fryburgu e-mail: [email protected] Socially Marginalised Women in Selected Narratives of Egyptian Female Writers Abstract Since the 1970s women authors in Egypt have produced a number of narratives that centre on the plight and fate of socially marginalised women. In this context marginalisation is not only understood in the sense of socio-economically disadvantaged women of the lower strata but also refers to non-conformist women, whose behaviour is considered to be deviant from the norm, abnormal or even mad by mainstream society. As a result, they feel alienated from society, and choose diverse ways (passive, active, or subversive) of coping with their fate. This contribution will take selected novels and short stories written by Alifa Rifaat (1930–96), Nawal El Saadawi (b. 1931), and Salwa Bakr (b. 1949) as examples in order to demonstrate the shift in emphasis and perspective on the topic. This will be done against the individual biographical background and writing career of the three authors. Although all authors are committed to women’s issues and gender equality, not all of them can be described as feminist writer-activists. Keywords: Egyptian women writers, Alifa Rifaat, Nawal El Saadawi, Salwa Bakr, social marginalisation, alienation, literary perspectives Introduction: Social Marginalisation as a Major Topic of Egyptian Women Authors Since the 1970s women authors in Egypt have produced a number of narratives that centre on the plight and fate of socially marginalised women.
  • Genealogies of Feminism: Leftist Feminist Subjectivity in the Wake of the Islamic Revival in Contemporary Morocco
    Genealogies of Feminism: Leftist Feminist Subjectivity in the Wake of the Islamic Revival in Contemporary Morocco Nadia Guessous Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2011 ©2011 Nadia Guessous All rights reserved ABSTRACT Genealogies of Feminism: Leftist Feminist Subjectivity in the Wake of the Islamic Revival in Contemporary Morocco Nadia Guessous This dissertation is an ethnographic and genealogical study of leftist feminist subjectivity in the wake of the Islamic Revival in contemporary Morocco. It draws on two years (2004-2006) of field research amongst founding members of the Moroccan feminist movement whose activism emerged out of their immersion in and subsequent disenchantment with leftist and Marxist politics in the early 1980s. Based on ethnographic observations and detailed life histories, it explores how Moroccan feminists of this generation came to be constituted as particular kinds of modern leftist subjects who: 1) discursively construct “tradition” as a problem, even while positively invoking it and drawing on its internal resources; 2) posit themselves as “guardians of modernity” despite struggling with modernity’s constitutive contradictions; and 3) are unable to parochialize their own normative assumptions about progress, modernity, freedom, the body, and religion in their encounter with a new generation of women who wear the hijab. How and why a strong commitment to ideas associated with modernity, with women’s rights and with the left is seen as necessitating a condemnation and disavowal of “traditional” and of non-secular ways of being is one of the main themes animating this project.
  • The Role of Social Agents in the Translation Into English of the Novels of Naguib Mahfouz
    Some pages of this thesis may have been removed for copyright restrictions. If you have discovered material in AURA which is unlawful e.g. breaches copyright, (either yours or that of a third party) or any other law, including but not limited to those relating to patent, trademark, confidentiality, data protection, obscenity, defamation, libel, then please read our Takedown Policy and contact the service immediately The Role of Social Agents in the Translation into English of the Novels of Naguib Mahfouz Vol. 1/2 Linda Ahed Alkhawaja Doctor of Philosophy ASTON UNIVERSITY April, 2014 ©Linda Ahed Alkhawaja, 2014 This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without proper acknowledgement. Thesis Summary Aston University The Role of Social Agents in the Translation into English of the Novels of Naguib Mahfouz Linda Ahed Alkhawaja Doctor of Philosophy (by Research) April, 2014 This research investigates the field of translation in an Egyptian context around the work of the Egyptian writer and Nobel Laureate Naguib Mahfouz by adopting Pierre Bourdieu’s sociological framework. Bourdieu’s framework is used to examine the relationship between the field of cultural production and its social agents. The thesis includes investigation in two areas: first, the role of social agents in structuring and restructuring the field of translation, taking Mahfouz’s works as a case study; their role in the production and reception of translations and their practices in the field; and second, the way the field, with its political and socio-cultural factors, has influenced translators’ behaviour and structured their practices.
  • Ramzi Salti, Ph.D
    Ramzi Salti, Ph.D. Stanford University Stanford, California 94305-2015 Lecturer, Arabic Language and Literature Office: (650) 725-1560 Arabic Language Program Fax: (650) 725-9377 Stanford Language Center [email protected] Division of Literatures, Cultures and Languages www.author32.blogspot.com Building 240-212 ACADEMIC EXPERIENCE & SERVICE: Full Time Lecturer, Arabic Language Program (1997-present): Stanford Language Center, Division of Literatures, Cultures and Languages, Stanford University, Stanford, California Taught courses over the past 16 years in Beginning, Intermediate, and Advanced Arabic; created/taught new courses in Arabic including “Readings in Arabic Literature” and “Advanced Conversational Arabic.” Organized/ Hosted/Participated in dozens of academic events at Stanford University including The Arab Film Festival (Oct 2013); “Ya'ani: Week of Music, Culture, and Languages of the Middle East (May 2013); 'Tradition and Modernity: Globalization of Hip Hop' Event/Concert featuring Omar Offendum & DAM; “Emerging Voices in Arab American Literature” series (February 2000-9); “Minorities in the Arab World” (MSAN sponsored event 2009); “Islam and Hip Hop Culture” (co- presenter with Stanford Professor H. Samy Alim, 2011-12). Host of a weekly radio show titled ‘Arabology’ which airs on KZSU 90.1 FM (Stanford University) and features interviews with Middle Eastern scholars while spotlighting music and various cultural productions from the Arab world. See www.author32.blogspot.com and www.facebook.com/arabology. Guest speaker at MSAN’s Professor Lunch Series (by invitation). Active participant in workshops and that center on using technology to supplement classroom learning including lectures on “Using Internet-Based Technology to Enhance Classroom Proficiency” (AME Talk, 2008) and “Using Facebook and Blogger to Improve Student Communication (in Arabic)”.
  • Unit 1: Introduction
    Corpus building and investigation for the Humanities: An on-line information pack about corpus investigation techniques for the Humanities Unit 1: Introduction David Evans, University of Nottingham 1.1 What a corpus is A corpus is defined here as a principled collection of naturally occurring texts which are stored on a computer to permit investigation using special software. A corpus is principled because texts are selected for inclusion according to pre-defined research purposes. Usually texts are included on external rather than internal criteria. For example, a researcher who wants to investigate metaphors used in university lectures will attempt to collect a representative sample of lectures across a number of disciplines, rather than attempting to collect lectures that include a lot of figurative language. Most commercially available corpora are made up of samples of a particular language variety which aim to be representative of that variety. Here are some examples of some of the different types of corpora and how they represent a particular variety: General corpora An example of a general corpus is the British National Corpus which “… aims to represent the universe of contemporary British English [and] to capture the full range of varieties of language use.” (Aston & Burnard 1998: 5). As a result of this aim the corpus is very large (containing some 100 million words) and contains a balance of texts from a wide variety of different domains of spoken and written language. Large general corpora are sometimes referred to as reference corpora because they are often used as a baseline against which judgements about the language varieties held in more specialised corpora can be made.
  • A Study of Short Stories by Assia Djebar and Alifa Rifaat
    ISLAMIC CULTURE AND THE QUESTION OF WOMEN’S HUMAN RIGHTS IN NORTH AFRICA: A STUDY OF SHORT STORIES BY ASSIA DJEBAR AND ALIFA RIFAAT NAOMI EPONGSE NKEALAH (24127028) A MINI-DISSERTATION SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE MA PAN-AFRICAN LITERATURES IN THE DEPARTMENT OF ENGLISH IN THE FACULTY OF HUMANITIES AT THE UNIVERSITY OF PRETORIA SUPERVISOR: MS K SOLDATI-KAHIMBAARA CO-SUPERVISOR: PROFESSOR RA GRAY MARCH 2006 1 CONTENTS Dedication 3 Acknowledgements 4 Summary 5 CHAPTER ONE INTRODUCTION Aim 7 Proposition 7 Definition of concepts 8 Background to the study: women in Islam 18 The condition of women in pre-Islamic Arabia 19 Women’s status in the days of Prophet Muhammed 22 The situation of Muslim women in modern times 26 Problems and issues to be investigated 31 Literature review 33 Methodology 40 Structural organization 41 CHAPTER TWO ECHOING VOICES: CONTEXTUALIZING MUSLIM WOMEN’S WRITING The context of African feminisms 44 Arab feminism and gender discourse 47 Feminist politics in Muslim women’s writing 52 Nawal el-Saadawi 53 Mariama Bâ 57 Zaynab Alkali 60 CHAPTER THREE THE CRY OF A MOTHER HEN: ALIFA RIFAAT SPEAKS FOR MUSLIM WOMEN IN NORTH AFRICA Author’s background and creative vision 65 A synopsis of Distant View of a Minaret 68 Women’s right to love and kind treatment 73 Women’s right to maintenance 88 Women’s right to protection of honour 91 CHAPTER FOUR ASSIA DJEBAR AS AN ADVOCATE OF WOMEN’S RIGHTS IN ALGERIA Djebar’s writing and the issue of women’s rights in Islam 98 Brief overview of stories 101
  • Egypt & Alifa Rifaat
    EGYPT & ALIFA RIFAAT By Bruce Gordon GEOGRAPHY • Northern Africa, bordering the Mediterranean Sea, between Libya and the Gaza Strip, and the Red Sea north of Sudan, and includes the Asian Sinai Peninsula. • Desert; hot, dry summers with moderate winters. • Periodic droughts; frequent earthquakes; flash floods; landslides; hot, driving windstorms called khamsin occur in spring; dust storms; sandstorms. POPULATION • 88,487,396 (July 2015 est.) • Egyptian 99.6% • Other 0.4% • (2006 census) • -0.19 migrant(s) / 1,000 population (2015 est.) • Extremely low migration rate. RELIGION • 90% Muslim (predominantly Sunni) • 10% Christian (majority Coptic Orthodox, other Christians include Armenian Apostolic, Catholic, Maronite, Orthodox, and Anglican) • (2012 est.) Al-Azhar Mosque CULTURE FOOD • Koshari – Some consider this - a mixture of rice, lentils, and macaroni - to be the national dish. • Egyptian cuisine makes heavy use of legumes, vegetables and fruits since Egypt's rich Nile valley and delta produce large quantities of these crops in high quality. ART • Ancient Egyptian Art reached a high level in painting and sculpture, and was both highly stylized and symbolic. HISTORY Early Modern • Ottoman Egypt 1517–1867 • French occupation 1798–1801 • Egypt under Muhammad Ali 1805–1882 • Khedivate of Egypt 1867–1914 Modern Egypt • British occupation 1882–1922 • Sultanate of Egypt 1914–1922 • Kingdom of Egypt 1922–1953 • Republic 1953–present LITERATURE • In the late nineteenth and early twentieth centuries, the Arab world experienced al-Nahda, a Renaissance-esque movement which touched nearly all areas of life, including literature. • In 1914 Muhammad Husayn Haykal wrote Zaynab, considered the first modern Egyptian as well as Islamic novel.
  • Spotlight on the Muslim Middle East-Issues of Identity. a Student
    DOCUMENT RESUME ED 415 148 SO 027 957 AUTHOR Greenberg, Hazel Sara, Ed.; Mahony, Liz, Ed. TITLE Spotlight on the Muslim Middle East Issues of Identity. A Student Reader [and] Teacher's Guide. INSTITUTION American Forum for Global Education, New York, NY. SPONS AGENCY Department of Education, Washington, DC. ISBN ISBN-0-944675-55-7; ISBN-0-944675-56-5 PUB DATE 1995-00-00 NOTE 175p. AVAILABLE FROM The American Forum for Global Education, 120 Wall Street, Suite 2600, New York, NY 10005, telephone: 212-742-8232. PUB TYPE Guides Classroom Learner (051) Guides Classroom Teacher (052) EDRS PRICE MF01/PC07 Plus Postage. DESCRIPTORS African History; Arabs; Asian History; Foreign Countries; Global Education; *Islamic Culture; *Middle Eastern History; *Middle Eastern Studies; Non Western Civilization; *Primary Sources; Resource Materials; Secondary Education; Social Studies; *World History IDENTIFIERS Middle East; Muslims ABSTRACT These books offer primary source readings focusing on issues of identity and personality in the Middle East. Individual sections of the books examine a particular issue in personality development through the perspectives of Islamic religion and cultural tradition. The issues of identity include: (1) "Religion"; (2) "Community"; (3) "Ethnicity"; (4) "Nationalism"; and (5)"Gender." Unique to the teacher's guide are three essays that provide additional background information: (1) "Thinking about Identity" (Lila Abu Lughod); (2) "Muhammad, the Qur'an and Muslim Identity" (Frank E. Peters); and (3)"Identity and the Literacy Context" (Mona N. Mikhail). Insights and strategies are offered in the teacher's guide to accompany the student readings. Appended materials in the teacher's guide include: a student worksheet on religion, eight teacher readings, and 26 references.
  • Feminism and Religion in Alifa Rifaat's Short Stories Ramzi M. Salti
    Feminism and Religion in Alifa Rifaat's Short Stories Ramzi M. Salti, University of California, Riverside When one speaks of feminism in so-called Third World countries, one must be careful not to confound it with Western conceptions of feminism. This statement is especially relevant when applied to feminism in Arab countries, and particularly when it comes to feminist writers in Egypt, an Arab country which had seen a major feminist movement emerge in the latter part of the nineteenth century and acquire true recognition in the past fifty years or so. Following in the path of such women as the founder of the Egyptian Feminist Union, Huda Sha'rawi (1879-1947), women writers in Egypt have constantly struggled to obtain an identity of their own. This quest for an identity has, for the most part, limited itself to finding an identity for women in Arab societies within the boundaries of the Islamic tradition. Muslim authors such as Alifa Rifaat and Nawal Sadawi have fought largely for an independent identity for women within the context of Islam, without adopting a secular view or one which deviates from certain accepted social norms. Such authors, of course, hold the view that there indeed is a dignified and independent place for women within Islam, provided that the Qur'anic teachings on women are followed more faithfully. By borrowing some ideas from Western feminists and changing them to fit their own situation, these writers argue that the male dominance that governs most Arab societies is the result of misinterpreting the Qur'an on the part of men, and ignoring many of the parts that deal specifically with women and their social rights.
  • Cambridge Handbook of English Corpus Linguistics Chapter 2: Computational Tools and Methods for Corpus Compilation and Analysis1
    Cambridge Handbook of English Corpus Linguistics Chapter 2: Computational Tools and Methods for Corpus Compilation and Analysis Paul Rayson UCREL, Lancaster University 1. Introduction The growing interest in corpus linguistics methods in the 1970s and 1980s was largely enabled by the increased power of computers and the use of computational methods to store and process language samples. Before this, even simple methods for studying language such as extracting a list of all the different words in a text and their immediate contexts were incredibly time consuming and costly in terms of human effort. Only concordances of books of special importance such as the Qur’an, the Bible and the works of Shakespeare were made before the 20th century and required either a large number of scholars or monks or a significant investment in time by a single individual, in some cases more than ten years of their lives. In these days of web search engines and vast quantities of text that is available at our fingertips, the end user would be mildly annoyed if a concordance from a one billion word corpus took more than five seconds to be displayed. Other text rich disciplines can trace their origins back to the same computing revolution. Digital Humanities scholars cite the work of Roberto Busa working with IBM in 1949 who produced his Index Thomisticus, a computer-generated concordance to the writings of Thomas Aquinas. Similarly, lexicographers in the 19th century used millions of handwritten cards or quotation slips but the field was revolutionised in the 1980s with the creation of machine-readable corpora such as COBUILD and the use of computers for searching and finding patterns in the data.
  • Introduction ¸¹º
    Introduction This anthology contains short stories by forty contemporary women writers from across the Arab world. These multiple voices articulate the female experience over the past half-century in an area stretching from the Middle East to North Africa. They speak of old values, new needs, marriage, childbearing, love, sexuality, education, work, and freedom. They explore the relations between the sexes, and question traditional norms and bequeathed customs as they assert their own desires and aspirations. Invariably, they take a stand, be it romantic, rebellious, conservative, liberal, or radical. The intimate and vividly crafted portrait of the Arab woman that emerges from these narratives is not only fascinating but endlessly thought provoking. The aim of this anthology is to introduce the English reader to Arab women’s ways of life, currents of thought, and creative expression. The volume offers a rich cultural encounter in which the complex world of Arab women, as seen by these women themselves, is unveiled. One may hope that the wealth of material presented in this volume will deepen Western understanding of Arab society, illuminate the status and lifestyles of Arab women, and broaden awareness of the contribution of Arab women writers to modern Arabic literature. The anthology is organized around the genre of the short story and its individual female practitioners. In this regard, it is the first of its kind, the traditional collection being generally centered on a single author, a particular country, or a variety of genres. The scope of this anthology extends over several generations of women writers, beginning with the pioneers who published in the 1940s and 1950s, through the younger generation who followed in the 1960s and 1970s, to the present generation whose literary output appeared in the 1980s and especially the 1990s, thus providing a broad spectrum of works of fiction by Arab women. (Arab Women Writers, © 2005 State University of New York Press, Albany)
  • Significant Concordance and Co-Occurrence in Quantitative
    Exploring Cities in Crime: Significant Concordance and Co-occurrence in Quantitative Literary Analysis Janneke Rauscher (1), Leonard Swiezinski (2), Martin Riedl (2), Chris Biemann (2) (1) Johann Wolfgang Goethe University, Grüneburgplatz 1, 60323 Frankfurt am Main, Germany (2) FG Language Technology, Dept. of Computer Science, Technische Universität Darmstadt, 64289 Darmstadt, Germany [email protected], [email protected], {riedl,biem}@cs.tu-darmstadt.de Abstract We present CoocViewer, a graphical analysis tool for the purpose of quantitative literary analysis, and demonstrate its use on a corpus of crime novels. The tool displays words, their significant co-occurrences, and contains a new visualization for significant concordances. Contexts of words and co-occurrences can be displayed. After reviewing previous research and current challenges in the newly emerging field of quantitative literary research, we demonstrate how CoocViewer allows comparative research on literary corpora in a project-specific study, and how we can confirm or enhance our hypotheses through quantitative literary analysis. Although the number of research projects in Digital Humanities is increasing at fast pace, we still observe a gap between the traditional humanities scholars on the one side, and computer scientists on the other. While computer science excels in crunching numbers and providing automated processing for large amounts of data, it is hard for the computer scientist to imagine what research questions form the discourse in the humanities. In contrast to this, humanities scholars have a hard time imagining the possibilities and limitations of computer technology, how automatically generated results ought to be interpreted, and how to operationalize automatic processing in a way that its unavoidable imperfec-