A Topic Segmentation of Texts Based on Semantic Domains
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Plot Thickens in Lubowski Probe
• TODAY: PLOT THICKENS IN LUBOWSKI PR0BE • OSTRICH CASE' IN 'COURT " BARBARA A WOW· Malawi 'The truth must out' to polls Christina Dausas. grandmother of ClOSEtofourmil CHRIS NOIVANGA lion HaJawlan vot the boy. yesterday told The Nam ibian ers went to the she believed the bones were those of RELA TIVES of 15 ycar-old Raymond her grand-child as Ihe clothes found poU. yesterday to Charles Dausab have approached the vote In the coun with the bones had been those he had Namibian Police 10 invcstigate the worn on the day he vanished. try" nrst-ever cause of the teenager's death after multi-party elec Nampo\ 's Inspector Jun ius Shoopala bones believed to be his were found in told The Namibian Ihat although the tions. a riverbed in KalUtura o n May 6. The Namiblan'. results of a post-mortem will only be Dausab disappeared in March this CHRISTOF released today. the police believe the year after he was last seen with some MAlETSKY Is In bones are Oausab' s. The po lice are MaJawi and reportt; friends in Katutura. in vestigating Ihecasc, Shoopala ,added. on the historic The bones were found behind the The Nam ibian traced Ch ristina event. Carn olicchurth in Katutura's Soweto Dau sas at the police station in SEEKING THE TRUTH ... Granny Christina Dausas (left), relatives See p;ale 7. section, approximate ly one kilometre Petrina Kooitjieand Elsievan Wyk (behind)claim police did not do much from his home. continued on page 2 to trace the missing boy. Swawek forced to import SA power He e:lprusecl eom:em tbat lm· FRANNA KAVARI portedpowerwascoalDgSwawu. -
A Bootstrapping Approach for Thematic Analysis
A bootstrapping approach for thematic analysis Olivier Ferret and Brigitte Grau LIMSI-CNRS BP 133 91403 Orsay cedex [ferret,grau]@limsi.fr Abstract Thematic analysis is important for a lot of applications dealing with texts, such as text summarisation or information extraction. But it can be done with a great precision only if it relies on structured knowledge, which is difficult to produce. In this paper, we propose using bootstrapping in order to solve this problem: a first thematic analysis based on a weakly structured source of knowledge, a collocation network, is used for learning explicit topic representations that then support a more precise and a more reliable thematic analysis. 1 Introduction Applications on large text bases, as information retrieval systems, have to deal with topic identification and tracking if they aim at improving their results by presenting retrieved texts in a better way, by highlighting the most relevant passages or the thematic structure of the texts. Information extraction systems would also benefit of a precise topic identification in order to delimit a context for the searched information that gives a mean to reduce ambiguities. A problem such systems have to face consists of detecting thematically coherent pieces of text and identifying their topic, without limitation about the topics that are likely to be found. Detecting coherent pieces of text refers to text segmentation, and when it is applied to large text collections, segmentation method are based on lexical knowledge (Hearst, 1997; Salton, 1996) or on non specialized knowledge as collocation networks, thesaurus or dictionary (Kozima, 1993) when dealing with narratives. -
A Topic Segmentation of Texts Based on Semantic Domains Olivier Ferret, Brigitte Grau
A Topic Segmentation of Texts based on Semantic Domains Olivier Ferret, Brigitte Grau To cite this version: Olivier Ferret, Brigitte Grau. A Topic Segmentation of Texts based on Semantic Domains. ECAI 2000, Proceedings of the 14th European Conference on Artificial Intelligence, 2000, Berlin, Germany. pp.426–430. hal-02458243 HAL Id: hal-02458243 https://hal.archives-ouvertes.fr/hal-02458243 Submitted on 28 Jan 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A Topic Segmentation of Texts based on Semantic Domains Olivier Ferret1 and Brigitte Grau1 Abstract. Thematic analysis is essential for many Natural Language Proc- 2 OVERVIEW OF THE SEGAPSITH SYSTEM essing (NLP) applications, such as text summarization or information extraction. It is a two-dimensional process that has both to delimit the SEGAPSITH has two main components: a module that learns thematic segments of a text and to identify the topic of each of them. The semantic domains from texts and a topic segmentation module that system we present possesses these two characteristics. Based on the use of relies on these domains. Semantic domains are topic representa- semantic domains, it is able to structure narrative texts into adjacent the- tions made of weighted words. -
A Bootstrapping Approach for Robust Topic Analysis
Natural Language Engineering 1 (1): 1{25. Printed in the United Kingdom 1 °c 2002 Cambridge University Press A Bootstrapping Approach For Robust Topic Analysis O l i v i e r F E R R E T B r i g i t t e G R A U LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France (Received 30 June 2001; revised 31 January 2002 ) Abstract Topic analysis is important for a lot of applications dealing with texts, such as text sum- marization or information extraction. But it can be done with a great precision only if it relies on structured knowledge, which is di±cult to produce on a large scale. In this article, we propose using bootstrapping in order to solve this problem: a ¯rst topic analysis based on a weakly structured source of knowledge, a collocation network, is used for learning explicit topic representations that then support a more precise and a more reliable topic analysis. 1 Introduction The problem we address in this paper is the topic analysis of texts on a large scale. Topic analysis is a kind of paradox: it is a very intuitive process that seems famil- iar to every reader but it is also very di±cult to de¯ne precisely, especially in the ¯eld of linguistics. According to us, topic analysis covers three kinds of problems: topic segmentation, topic identi¯cation and topic structuring of texts. This divi- sion is closely akin to the division into the three following dimensions: syntagmatic, paradigmatic and functional. Topic segmentation consists in delimiting parts of texts that are thematically coherent.