Towards Computational Models to Theme Analysis in Literature

Towards Computational Models to Theme Analysis in Literature

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020 Towards Computational Models to Theme Analysis in Literature Abdulfattah Omar Department of English College of Sciences and Humanities Prince Sattam Bin Abdulaziz University Abstract—The recent years have witnessed the development In spite of the proliferation of computational technology of numerous computational methods that have been widely used and the articulation of an explosive production of electronically in humanities and literary studies. In spite of their potentials of encoded information of all kinds, computational methods have such methods in terms of providing workable solutions to been very little used in humanities in general and literature in different inherent problems within these domains including particular [11-13]. The wide cultural gap between the literary selectivity, objectivity, and replicability, very little has been done critic and computational research communities is the most on thematic studies in literature. Almost all the work is done obvious reason. The study is an attempt towards bridging the through traditional methods based on individual researchers’ gap between traditional literary criticism and computational reading of texts and intuitive abstraction of generalizations from methods. The study employs experimentally replicable data that reading. These approaches have negative implications to representation and clustering methods. The greatest advantage issues of objectivity and replicability. Furthermore, it is challenging for such traditional methods to deal effectively with of these methods is that they are completely objective in the the hundreds of thousands of new novels that are published every sense that the results obtained are independent of the person year. In the face of these problems, this study proposes an applying the method. integrated computational model for the thematic classifications The remainder of this article is organized as follows. of literary texts based on lexical clustering methods. As an Section 2 is a brief survey of the approaches to thematic studies example, this study is based on a corpus including Thomas of literary works. Section 3 outlines the methods and Hardy’s novels and short stories. Computational semantic procedures of the study. It describes document clustering analysis based on the vector space model (VSM) representation methods are used to classify the selected works in a of the lexical content of the texts is used. Results indicate that the selected texts were thematically grouped based on their semantic thematically coherent way. Section 4 reports the results of the content. It can be claimed that text clustering approaches which proposed methods and explores the thematic interrelationships have long been used in computational theory and data mining between the texts. Section 5 is conclusion. It summarizes the applications can be usefully used in literary studies. main findings and suggests propositions that may be generalized to other literary texts and genres. Keywords—Computational models; computational semantics; lexical clustering; lexical content; philological methods; Thomas II. LITERATURE REVIEW Hardy; Vector Space Model (VSM) Theme analysis of literary texts is one of the oldest and most established disciplines in literary studies. Critics have I. INTRODUCTION been generally concerned with identifying the themes within An important development in literary studies over the past literary texts. It was thought that part of the critic’s job is to few decades is the increasing application of scientific methods understand the deep meanings conveyed by authors, make in analysis of literary works [1-5]. It has been argued that the observations about literary texts in order to construct the use of such scientific methods can assist in preventing the expression of themes in these works [14-16]. formation of false theories of criticism and the generation of unreliable thematic classifications [6, 7]. The present study is Although theme analysis is very old in literature studies, intended as a contribution to that development. The study seeks the issue of the way themes are defined is still controversial in to propose a computational model that helps readers and critics literary criticism. There is no single agreed upon approach to of literary texts in an objective, replicable, therefore scientific theme analysis in literature. Over the years, there is no way through exploring the thematic relationships of texts in a consensus among critics on the best ways of interpreting texts conceptually coherent way. The study is based on the novels and deriving thematic concepts. It is true to claim that theme and short stories of Thomas Hardy as an example. Thomas analysis is still controversial and problematic in literary Hardy is one of the most important figures in the history of the criticism studies [17]. English novel and he has ever sustained readers’ interest in the With the development of different literary theories themes and topics he tackled. Thomas Hardy was a Victorian including Marxism, Modernism, and feminism, theme analysis poet and novelist and is considered by many critics as a main has been widely considered a reflection of these theories [18, component of the English cultural heritage [8-10]. 19]. Critics have been more concerned with identifying the relationship between author and work as reflected on themes of 93 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020 race, class, and gender. In this way, thematic concepts of It is thought however that it is appropriate for the study. The novelists and authors are usually confined and restricted to the rationale is that the study is concerned with build thematic critic’s engagement with a given theory or selections from a structures of the texts based on their lexical semantics. VSC is text or some texts. thus appropriate for the purposes of the study. In VSC, documents can be grouped into distinct classes based on their The issue of theme analysis with its complexities and lexical content [30, 38, 39]. In this regard, it is assumed that controversies has its implications to the thematic studies of VSC is appropriate for the purposes of the study. VSC is used Thomas Hardy’s literary texts. The thematic classification of for organizing the novels and short stories of Thomas Hardy Hardy’s prose fiction ranges from a broad general classification into distinct classes based on the lexical content of these texts. of his novels and short stories to a discussion of a single Herein, lexical clustering (one of VSC methods) is used. thematic aspect in one, some, or all his writings [10, 20-25]. The main observation about almost the critical studies on the Conventional lexical-clustering algorithms treat text thematic structures of Hardy’s work is that critics have been fragments as a mixed collection of words, with a semantic generally concerned with what Hardy himself classified as similarity between them calculated based on the term of how Major Works. Despite the rich thematic concepts exhibited in many the particular word occurs within the compared Hardy’s prose fiction works exhibit rich thematic concepts, the fragments. Whereas this technique is appropriate for clustering majority of the thematic discussions of Hardy have been large-sized textual collections, it operates poorly when flawed in limiting their discussions to the series of novels and clustering small-sized texts such as sentences [40]. short stories he wrote between 1871 and 1895 [26]. In so doing, a corpus including all the selected texts is It can be claimed thus that the work on Thomas Hardy’s designed. The tradition of building a corpus for text clustering prose fiction is widely selective. Some critics focus on what is applications has always been based on the assumption that the referred to Wessex novels. They think of Hardy’s works as a corpus is both large and representative of the research domain. cry for the lost beauty of the English countryside. Evidently, An important question in the context of this study is what size many commentators have characterized Hardy as a regional the corpus should be in order to support objective and reliable novelist, attribute this regionalist focus to his fascination with generalizations about Thomas Hardy’s prose fiction. The Wessex, an old English kingdom covering an area that corpus on which this analysis is based consists of all the provided the fictionalised setting for Hardy [27, 28]. Balanced known (published and unpublished) prose fiction texts of against this argument, others insist that Hardy was a Victorian Hardy. social critic since his writings depict the sufferings of As a first step for data representation, the corpus was England’s working class and society’s responsibility for their tragic fates. Through this process, Hardy is seen to have been confined to what is referred to a bag of words. It was also decided that the corpus to be built of only the content words. preoccupied with improving conditions in society. These concerns mark Hardy out as a realistic writer who took on the All function words were thus removed. The hypothesis is that they do not usually carry semantic meaning; thus, they cannot role of expressing the joys and woes of the victims of the merciless harshness of their lives [25, 29]. be considered as distinctive features. The corpus should include only and all the distinctive features [41]. Content words One major problem with studies in this tradition is that they can act as strong predictors of the topic(s) or content of a ignore much of the thematic richness in Hardy’s works. In the document [42]. Moreover, the experimental results of face of this limitation, this study suggests the use of empirical document classification indicate that content-word approaches and new technologies. These should have the representation gives good results in identifying the content of a impact of developing a comprehensive and more detailed document and its latent structure [43-45].

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    7 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us