Discours Et Document
Schedae
Prépublications de l'Université de Caen Basse-Normandie
Issue no. 1, 2006

Colloque International Discours et Document
International Symposium Discourse and Document

Presses universitaires de Caen

Editor: Patrice ENJALBERT

The aim of the Discourse and Document symposium is to bring together researchers interested in what may be called the "document level" in discourse linguistics, NLP, or document engineering. This issue collects the papers presented at the symposium.

Symposium chairs
M.-P. PÉRY-WOODLEY, U. Toulouse 2; P. ENJALBERT, U. Caen; M. GAIO, U. Pau et Pays de l'Adour.

Programme committee
J. BATEMAN, U. Bremen, Germany; D. BATTISTELLI, U. Paris 4, France; Y. BESTGEN, U. C. Louvain, Belgium; B. BOGURAEV, IBM T.J. Watson Research Center, USA; A. BORILLO, U. Toulouse 2, France; N. BOUAYAD-AGHA, U. Pompeu Fabra, Barcelona, Spain; F. CERBAH, Dassault Aviation, France; M. CHAROLLES, U. Paris 3, France; D. CRISTEA, U. Iasi, Romania; L. DEGAND, U. C. Louvain, Belgium; D. DUTOIT, Memodata, France; P. ENJALBERT, U. Caen, France; S. FERRARI, U. Caen, France; O. FERRET, CEA, France; M. GAIO, U. Pau, France; B. GRAU, U. Paris-Sud, France; N. HERNANDEZ, U. Caen, France; G. LAPALME, U. Montréal, Québec, Canada; A. LE DRAOULEC, U. Toulouse 2, France; A. LEHMAM, Pertinence Mining.com, France; D. LEGALLOIS, U. Caen, France; N. LUCAS, U. Caen and CNRS, France; F. MAUREL, U. Caen, France; A. MAX, U. Paris-Sud, France; J.-L. MINEL, U. Paris 4, France; M. MOJAHID, U. Toulouse 3, France; M.-P. PÉRY-WOODLEY, U. Toulouse 2, France; H. SAGGION, U. Sheffield, England; I. SALEH, U. Paris 8, France; S. SALMON-ALT, ATILF-CNRS, France; L. SARDA, CNRS, LATTICE, France; D. SCOTT, Open University, England.

Organising committee
S. FERRARI (coordinator); F. BILHAUT; N. HERNANDEZ; A. WIDLÖCHER.

GREYC (Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen)
Status: joint research unit of the university, CNRS and ENSICAEN (UMR 6072)
Director: Régis CARIN
Deputy director: Étienne GRANDJEAN
Research areas: algorithmics, security, information, language, human-computer interfaces, image, control, instrumentation, sensors, electronics

Contents

Preface

Session 1: Discourse organisation: corpus studies and modelling
Marie-Paule JACQUES & Josette REBEYROLLE: Titres et structuration des documents
Farida AOULADOMAR, Leila AMGOUD, Patrick SAINT-DIZIER: On Argumentation in Procedural Texts
Sophie PIÉRARD & Yves BESTGEN: Adverbiaux temporels et expressions référentielles comme marqueurs de segmentation : emploi simultané ou exclusif ?
Sandrine STEIN-ZINTZ: De l'altérité spatiale à l'organisation textuelle : la locution d'une part… d'autre part
Susanne HEMPEL & Liesbeth DEGAND: The use of sequencers in academic writing: a comparative study of French and English

Session 2: Discourse, document, and NLP
Frédérik BILHAUT: Introducteurs intra-prédicatifs d'univers de discours et leur détection automatique
Marion LAIGNELET: Les titres et les introducteurs de cadres comme indices pour le repérage de segments d'information évolutive
Dominique LEGALLOIS & Stéphane FERRARI: Vers une grammaire de l'évaluation des objets culturels
Nadia ZERIDA, Nadine LUCAS, Bruno CRÉMILLEUX: Combinaison de descripteurs linguistiques et de structure pour la fouille d'articles biomédicaux
Amanda BOUFFIER: Segmentation de textes procéduraux pour l'aide à la modélisation de connaissances : le rôle de la structure visuelle
Christophe PIMM: Quelle plus-value linguistique pour la segmentation automatique de texte ?

Session 3: New document types, new modes of access to textual information
Clara MANCINI & Donia SCOTT: Hyper-Document Structure: Maintaining Discourse Coherence in Non-Linear Documents
Javier COUTO & Jean-Luc MINEL: SEXTANT, un langage de modélisation des connaissances pour la navigation textuelle
Birgitta BEXTEN: Hypertext and Plurilinearity: Challenging an Old-fashioned Discourse Model
Thomas KRECZANIK: Modélisation de parcours dans des hypertextes pédagogiques : typage des ressources et des liens
Olivier LE DEUFF: Des bons mots au bon document. Comment éduquer à l'usage des mots-clés efficaces pour accéder à la pertinence documentaire

Session 4: NLP systems, demonstrations
Abderrafih LEHMAM: Solutions de traitement du document textuel avec prise en charge de ressources linguistiques
Frédérik BILHAUT & Antoine WIDLÖCHER: Analyse de structures discursives avec la plate-forme LinguaStream
Ágnes SÁNDOR, Aaron KAPLAN, Gilbert RONDEAU: Discourse and citation analysis with concept-matching

Invited talk
Simone TEUFEL: Discourse structure in scientific articles: argumentation and citation (forthcoming)

Preface

ISDD 2006: aims and scope

In connection with the development of digital documents, discourse linguistics, document engineering and NLP are increasingly converging: applying corpus analysis methods to discourse calls for greater use of NLP techniques, while new modes of access to the contents of documents place more emphasis on exploiting discourse structure. This convergence is manifest in a number of joint studies, and results in cross-fertilisation of the disciplines. This is the analysis which led us, in the call for papers for Discourse and Document 2006, to explicitly reach out towards researchers concerned with "the document level" in discourse linguistics, computational linguistics, and document engineering.

We present in this volume twenty contributions by authors who must have recognised themselves in this way of setting out the issues. The aim of the symposium is to build on the convergence of questions and objectives which clearly emerge from these contributions. Beyond their specific scientific interest, the challenge is to arrive at a usable definition of an emergent research field, with implications for both discourse linguistics and document engineering.

The first two sessions can be described as presenting different takes on document organisation. Each paper tends to focus on a particular view of what may be semantically important in discourse processing. One such view is that documents are organised in topics (in the sense of "what is being talked about"), and can be segmented in terms of this organisation (whether via automatic procedures to identify breaks in lexical cohesion or via analyses of reference chains). Other approaches stress argumentative structure, and identify segments that fulfil particular argumentative or rhetorical functions. In both these views, the organisation is assumed to be largely implicit: various techniques are brought to bear to identify the shifts between continuity and discontinuity, and to tease out discourse function on the basis of surface markers. Another take is to consider explicit clues to document organisation, such as metadiscursive expressions, or elements of the so-called "logical structure".

These questions are considered in a largely descriptive manner in the first session, while the second focuses on the design of NLP procedures to identify such structures in text. Indeed, a major field in NLP is the development of systems concerned with facilitating access to the information stored in documents, and there is a growing awareness of the need to take better account of the organisation of the documents being processed.
Another facet of this evolution is that researchers into discourse organisation are gradually moving towards more empirical methods and require computational instruments to analyse large volumes of data. The third session provides a very concrete illustration of these trends, through the presentation and demonstration of NLP systems, originating in both academic and industrial contexts.

Finally, new document types (hyper-documents) raise radically new questions about discourse organisation and the interaction between semiotic functions. What makes such documents cohere (or not)? How are they read and understood? How can this reading process be made easier and more efficient? But further, what new insight into the organisation of "ordinary" text can be gained through the comparison with these new non-linear textual forms? Some of these questions apply equally at the level of document bases, now widely accessible thanks to the Internet and other electronic devices, which can be seen as "macro-texts" through which the user has to wander as s/he scours for relevant information. And the notion of navigation is also at stake in the case of "classical" texts, with new NLP techniques going into the design of much needed tools to assist the reader in non-linear text browsing. From linear document to hyper-document to document bases, and back to non-linear modes of access to "classical" documents, we have come full circle… These are some of the stimulating questions which are addressed in the final session.

Taken as a whole, the twenty papers presented at ISDD'06 provide a rich and accurate view of a number of complementary aspects of discourse structure in relation to the functional notion of document. A promising area of research is outlined, an area which, as it extends across disciplinary boundaries, requires a scientific