Computer-aided lexicography
Creation, publication, and use of dictionaries: our experience at the Ixa NLP Group
Xabier Artola Zubillaga, Faculty of Computer Science, Donostia
Using the dictionary is not always fun
How many legs has a fly?
This looks like a past participle of some verb!!!: shrunk
There must be a word for... to remove the hair from the skin of goats and sheep
I need a verb now!: the fire ...s
Is there any relationship between these words? Which one?: to burn, to blacken
Which one is correct?: a quick shower or a fast shower
2010-11-26 IULA - InfoLex (UPF) 3 Using the dictionary is not always fun
Translating buy for into Spanish:
  The company bought stock for investment purposes
  They kept buying for several months
  They bought stock for €3,000,000
  The defendant said he bought it for his brother
look after: what does it mean?
Outline of the presentation
Creation
  Computer-aided lexicography: text corpora and language databases
  Dictionary editing environments
  Knowledge representation issues
Publication
  Print
  Electronic (on-line or whatever)
  From the editing application to the final product
Use
  Use cases, users, and dictionary software functionality
  Do we get from electronic dictionaries what we could expect from them?
Creation: dictionary making
Still in the 20th century: piles of index cards within shoeboxes Word usage was compiled largely on paper slips or index cards, as the basis for the creation of dictionary entries
Computer technology
  text corpora (concordances, KWIC) to:
    acquire real language use examples
    discover and ascertain word senses, extract definitions
    find and verify collocations
    find neologisms
    find out multiword lexical units
  databases (wide sense) to store dictionary contents
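As an illustration of the KWIC (keyword in context) concordances mentioned above, here is a minimal Python sketch. The function name `kwic`, the `width` parameter, and the sample text are assumptions made for this example; they do not come from any tool used by the group.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # Match whole words only, ignoring case.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for match in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords form a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = ("They bought stock for investment purposes. "
          "The defendant said he bought it for his brother.")
for line in kwic(sample, "bought"):
    print(line)
```

Aligning the keyword in a fixed column is what lets a lexicographer scan many contexts at once when looking for senses and collocations.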
Creation: dictionary making
Today's electronic dictionaries: where do we get dictionary content from?
  print dictionaries (legacy): scanning, OCR, parsing of typographic features
  importing it from glossaries, entry lists, other electronic dictionaries...
  from scratch: editing (lexicographer)
    word processors
    databases
    XML editors
    publishers' custom applications
    dictionary editing software: TshwaneLex...
Creation: dictionary making
Building electronic dictionaries from legacy dictionaries: [scanning + OCR +] parsing of typographic features
Goal: to obtain a structural representation of the dictionary content (often in XML)
→ from text to a lexicographic database
Two real cases (Ixa NLP Group): eEH: from RTF to TEI SGML / XML (Arregi et al., 2003, 2007) DBE: from RTF to TEI XML (Alegria et al., 2006a, 2006b)
eEH: from RTF to TEI SGML / XML
Sarasola I. Euskal Hiztegia. Kutxa Fundazioa: Donostia, 1996.
Basque monolingual dictionary, reference for the standard Basque dictionary (Hiztegi Batua, Academy of the Basque Language)
33,111 entries, 41,699 senses
Typical examples illustrating the use of words, drawn from corpora
From RTF to TEI SGML (later to TEI XML): DCG written in Prolog TEI DTD: select / customize / enhance
Manual correction of the automatically obtained output
eEH: from RTF to TEI SGML / XML
eEH: electronic Euskal Hiztegia (electronic dictionary prototype)
  Sophisticated indexing system (no databases are used)
    definition and example texts fully lemmatized
  Users:
    ordinary
    advanced (philologists, lexicographers, translators...)
  Functionality
    full hypertext utility (from definitions and examples to corresponding entries)
    basic query
    advanced query
      • specially designed query language
      • dictionary search as in a corpus
Problem: lack of editing environment
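To make the idea of indexing fully lemmatized definition texts concrete, the following Python sketch builds a small inverted index from lemmas to headwords. It is not the eEH indexing system: the lemma table `LEMMAS`, the helper names, and the second definition in `DEFINITIONS` are toy data invented for this illustration, and a real system would rely on a morphological analyser.

```python
from collections import defaultdict

# Toy lemma table (hypothetical); stands in for a morphological analyser.
LEMMAS = {"aberastasunak": "aberastasun", "ugaritasuna": "ugaritasun"}

def lemmatize(token):
    """Lower-case a token, strip punctuation, and map it to its lemma."""
    token = token.lower().strip(".,;:")
    return LEMMAS.get(token, token)

def build_index(definitions):
    """Map each lemma to the set of headwords whose definition contains it."""
    index = defaultdict(set)
    for headword, text in definitions.items():
        for token in text.split():
            index[lemmatize(token)].add(headword)
    return index

# The first definition is taken from the sample entry; the second is invented.
DEFINITIONS = {
    "aberastasun": "Ondasun edo gauza baliotsuen ugaritasuna",
    "aberats": "Aberastasunak dituena",
}
index = build_index(DEFINITIONS)
print(sorted(index["aberastasun"]))  # headwords whose definition uses this lemma
```

Because every word form is reduced to its lemma, the same index can support both hypertext links from a definition to the corresponding entry and corpus-like searches over the dictionary text.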
eEH: electronic dictionary prototype
[Screenshots: query language and query interface]
DBE: from RTF to TEI XML
Miyares Bermúdez E. (dir.) Diccionario Básico Escolar. Centro de Lingüística Aplicada, Santiago de Cuba. 2003.
School dictionary, monolingual
7,473 entries, 14,013 word senses (1st ed.)
From RTF to TEI P4 XML: Word macros Ferret (semi-automatic learning software) TEI DTD: select / customize / enhance
Manual correction of the automatically obtained output
leXkit: dictionary editing environment
Three on-line versions, two CDs, three print editions
DBE: CD and on-line (3rd version)
[Screenshot with callouts: image request, entry look-up, other functionality, letter index, indexes, look-up response, cross-references, orthographic help]
Dictionary editing environments
Essential if databases or markup languages are chosen for dictionary knowledge representation
Wish list
  all kinds of editing facilities: XML-transparent, navigation facilities, cross-reference building, wizards...
  integrity constraint checking and consistency
  multimedia integration
  import facilities
  collaborative editing
    Wiktionary
    discussion forums
      • Ultralingua (online discussion forum)
      • Leo collaborative bilingual dictionaries
Dictionary editing environments
Wish list (cont'd)
  customized output: dictionary publication
    different dictionary products:
      • unabridged dictionary
      • student's dictionary
      • ...
    export formats:
      • electronic versions: XML, HTML, other formats...
      • print: PDF, desktop publishing software...
A real case: leXkit (Ixa NLP Group)
leXkit: a dictionary content management system (Alegria et al., 2006c) Dictionary edition and maintenance XML-based: Berkeley DBXML XML native database for storage Client-server architecture: SOAP-based communication Suitable for different kinds of dictionaries
Main features: Allows adding, deleting and modifying entries in a friendly fashion: XML details are transparent for the lexicographer Provides the lexicographers with all the features of a full-fledged DBMS: full search capabilities, safe storage, concurrent access, etc.
leXkit
Main features (cont'd):
  Maintains entry states (version control and tracking)
  Automatically generates the files and components needed by a running application such as the current electronic DBE.
  Tailored output is feasible: the data required for print editions, diversified electronic versions, etc. can be exported easily.
Architecture Client The component used by the lexicographer Tool integration (corpora, other dictionaries...) Server: database, concurrency, configuration files (dictionary schema definitions, wizards, etc.), import/export utilities, backups...
leXkit
[Screenshots: leXkit client interface, with callouts]
  Editor: edition tree, predefined tasks; dictionary tabs; edition textbox
  Index: dictionary entries, search results
  Viewer: entry preview (WYSIWYG), integrated tools
  Views and info tabs: XML tab, entry info, session control...
leXkit: system architecture
leXkit
Communication (client / server) SOAP web services (RPC model + cookies) Intermediate declarative layer (XML) Dictionary specifications Operations (context-dependent tasks) Wizards (common edition operations, predefined searches...)
Other technical aspects
  XSLT is widely used in the application
  XSLTi: declarative language that adds interactivity to XSLT scripts
  XML processing: Xerces + Xalan
  Graphical interface: wxWidgets
  HTML rendering: Mozilla (wxMozilla)
leXkit: wizards for the DBE
leXkit: conclusions
leXkit has been used at the CLA for editing the DBE's 2nd and 3rd editions: from 7,473 entries / 14,013 senses in the 1st edition to 10,557 entries / 19,374 senses in the 3rd one.
Building leXkit was vital to the qualitative leap made in this work.
Dictionary edition applications are a must, especially if dictionaries are stored in databases or XML-encoded.
leXkit can be used by other lexicographical teams to create and update dictionaries. It is available as free software (open source) at http://sourceforge.net/projects/lexkit/.
Dictionary representation
Representation is the key factor for dictionary functionality we won't get what is not stored and adequately represented in the dictionary the representation we choose conditions what we later on will be able to get from the dictionary
Physical level text (no access facilities, deficient structuring) plain or somehow structured (CSV, tabular...) rich text: typography, word processors
→ even the entry concept is sometimes diluted
→ risk: vicious circle (to be avoided)
Dictionary representation
Physical level (cont'd)
database: relational (structure, indexing, query and update facilities) one database = one dictionary • is each pertinent information unit correctly represented in a field or column? integrated dictionary system (publishers) • publisher's general dictionary database
marked text HTML: mark-up language, presentation-oriented SGML / XML: mark-up metalanguage, content-oriented
Dictionary representation
→ content-oriented marked text constitutes a better data model for the representation of dictionary content and structure than the relational model
lexical information is inherently complex information apparently similar is represented in dictionaries using structurally different ways intra-entry hierarchical structure is not adequately represented using the relational model the information must be split in several tables: redundancy, factorization problems construction of user-friendly graphical user interfaces is not always easy query languages are often complex and non-intuitive
Dictionary representation
→ content-oriented marked text... (cont'd)
content-oriented marked texts (SGML, XML...)
  descriptive markup (structure, content)
  more flexible data representation model
  reflects better the lexicographic data model used in dictionaries
  drawback: manageability and efficiency
  → XML native databases: indexing, query and update facilities
TEI (Text Encoding Initiative): a whole chapter full of recommendations on marking up human-oriented dictionaries
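The point about intra-entry hierarchy can be illustrated with a short Python sketch that keeps a nested entry as one XML tree and walks it with the standard library. The element names follow common TEI dictionary usage (`entry`, `form`, `orth`, `gramGrp`, `pos`, `sense`, `def`, `cit`, `quote`), but the exact markup, the sense numbering ("1.1"), and the helper `walk_senses` are assumptions for this example, not the encoding actually used for the eEH or the DBE.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like encoding of part of the sample entry.
ENTRY = """
<entry>
  <form><orth>aberastasun</orth></form>
  <gramGrp><pos>iz.</pos></gramGrp>
  <sense n="1">
    <def>Ondasun edo gauza baliotsuen ugaritasuna</def>
    <cit type="example"><quote>Aberastasunak ematen du aginpidea.</quote></cit>
    <sense n="1.1">
      <def>Norbaitek dituen ondasun eta gauza baliozkoak</def>
    </sense>
  </sense>
  <sense n="2">
    <def>Aberatsa denaren nolakotasuna</def>
    <cit type="example"><quote>Aberastasunean bizi.</quote></cit>
  </sense>
</entry>
"""

root = ET.fromstring(ENTRY)

def walk_senses(node, depth=0):
    """Yield (depth, sense number, definition), following the nesting."""
    for sense in node.findall("sense"):
        yield depth, sense.get("n"), sense.findtext("def")
        yield from walk_senses(sense, depth + 1)

senses = list(walk_senses(root))
# One path expression collects every example, at whatever depth it occurs.
examples = [q.text for q in root.findall(".//cit[@type='example']/quote")]

for depth, number, definition in senses:
    print("  " * depth + f"{number}. {definition}")
```

In a relational schema the same entry would typically be spread over several tables (entries, senses, sub-senses, examples) joined by keys, whereas here the nesting of the markup carries the structure directly.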
Dictionary representation
Physical level (cont'd)
dictionary knowledge bases: reasoning, artificial intelligence techniques, knowledge representation languages
→ the only way to extract implicit knowledge from dictionary structures
Dictionary representation
Conceptual level
what information/knowledge is represented? orthography, pronunciation, grammar (mostly POS), register, definition... morphology? irregular inflection paradigms? • important in learner's dictionaries, highly inflected languages...but not only • two real cases (Ixa NLP Group) • Elhuyar eu-es (MS Word plugin): eu and es lemmatization • UZEI synonyms (MS Word plugin): eu lemmatization
Dictionary representation
Conceptual level (cont'd)
dictionary typology monolingual / bi- or multilingual language dictionary / encyclopedic general use / specific (terminology) ... implicit knowledge: in definitions, examples, lexical semantics WordNet, thesauri... association lists, semantic networks
→ inference, reasoning
Publication: presentation, output
print how to obtain the "file" to submit to the publisher?
electronic typology on-line • on-line dictionaries (free, subscriptions...) • dictionary directories: OneLook Dictionary Search • multi-dictionary access tools: Euskalbar, a Firefox plugin that integrates ~30 dictionaries and corpora
• the web (corpus) as a dictionary • translation memories, parallel corpora
Publication: presentation, output
typology (cont'd) desktop dictionary software • standalone applications: personal computers, small handheld devices, mobile phones... • integrated dictionaries, plugins: in word processors, web browsers... • Elhuyar eu-es (MS Word plugin) • UZEI synonyms (MS Word plugin) • multi-dictionary tools: Babylon machine-readable dictionaries: PDF...
formats: HTML, XML, PDF, PS, electronic book formats, application proprietary formats...
Publication: presentation, output
Which is the way leading from the editing environment or database to the print or to the electronic version?
DBE
  CD and on-line: XML to HTML (dynamic transformation, XSLT)
  print: XML to PDF (XSL-FO)
Hiztegi Batua (Euskaltzaindia, Basque Language Academy):
  on-line: XML to HTML (XSLT)
  publishing: HTML to Quark (manually)
  download: Quark to PDF
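The XML-to-HTML step can be sketched in a few lines of Python. The pipelines above use XSLT; this sketch deliberately swaps in the standard-library ElementTree module so that it runs without extra dependencies, and the element names, the function `entry_to_html`, and the HTML layout are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical TEI-like entry; not the actual DBE or Hiztegi Batua markup.
SAMPLE = """
<entry>
  <form><orth>aberastasun</orth></form>
  <gramGrp><pos>iz.</pos></gramGrp>
  <sense n="1"><def>Ondasun edo gauza baliotsuen ugaritasuna</def></sense>
  <sense n="2"><def>Aberatsa denaren nolakotasuna</def></sense>
</entry>
"""

def entry_to_html(entry_xml):
    """Render one content-oriented entry as presentation-oriented HTML."""
    entry = ET.fromstring(entry_xml)
    headword = escape(entry.findtext("form/orth", ""))
    pos = escape(entry.findtext("gramGrp/pos", ""))
    # Each top-level sense becomes one numbered list item.
    items = "".join(
        f"<li>{escape(sense.findtext('def', ''))}</li>"
        for sense in entry.findall("sense")
    )
    return f'<div class="entry"><b>{headword}</b> <i>{pos}</i><ol>{items}</ol></div>'

print(entry_to_html(SAMPLE))
```

The content-oriented source stays untouched; only this last step decides that headwords are bold and categories italic, which is why one source can feed several outputs.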
Publication: presentation, output
Which is the way... (cont'd) other solutions: [general dictionary] • Oracle to HTML (web) • Oracle to Quark (print) [terminological dictionary] • 4D to Quark (print) • 4D to XML (TBX) to XHTML (web)
customized output: proprietary formats (mainly in desktop dictionary software)
The longer the way... the easier it is to get lost!
  updates will be more costly
Use: functionality
Use cases
  language input: typical lookup (definitions, multiword expressions...)
  language output: is the dictionary well oriented to be used in language production situations?
    much more information is needed when we want to actually use a word in speech or in writing than when we only want to understand a word in a passage.
  translation tasks: language input and output
    special information is needed: faux amis...
  language learning activities: more information is needed about context of use, connotations of a word, collocations, etc.
Use: functionality
Users (models, profiles) native speakers language learners translators students, children... specialists: scientists, technicians...
Functionality do we get from electronic dictionaries what we could expect from them? are they something more than their print counterparts?
Dictionaries of the future: http://www.oxforddictionaries.com/page/84
Print dictionaries have been joined by dictionaries in electronic form: these are often enriched with many additional features, such as sound recordings or sophisticated links to other related material. ... It seems likely that by the middle of this century, if not before, all dictionaries will be in electronic form. This means that limitations of space, which have always been a serious issue for lexicographers and dictionary publishers, will be much less important. Dictionaries will be able to include more material: more words and definitions, interactive features, and multimedia content such as images, sound, and video. They will also be updated much more rapidly than ever before. But the general idea of a dictionary - a resource that provides explanations of words and how they are used - will probably remain the same.
Use: functionality
Functionality (cont'd) what we get search facilities: from basic lookup to advanced queries speed, storage facilities orthographic help (closeness) integration: word processors, reading applications... new features: multimedia (recorded sounds, images, videos), hyperlinks interactivity? wish list definition and examples: corpus queries navigation: fully hyperlinked (lemmatization of definitions, examples...) morphology, grammar, derivation...
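The "orthographic help (closeness)" feature can be approximated with Python's standard `difflib` module, as in the sketch below. The headword list, the function name `suggest`, and the cutoff value are assumptions for this example; the actual products may use a different similarity measure.

```python
import difflib

# A handful of headwords from the sample entry, used as a toy word list.
HEADWORDS = ["aberastasun", "aberats", "behartasun", "diru", "pobretasun"]

def suggest(query, headwords=HEADWORDS, limit=3):
    """Return the query itself if it is a headword, else the closest headwords."""
    if query in headwords:
        return [query]
    # get_close_matches ranks candidates by similarity ratio, best first.
    return difflib.get_close_matches(query, headwords, n=limit, cutoff=0.6)

print(suggest("aberastazun"))  # a misspelling of "aberastasun"
```

A look-up that fails on the exact form can then offer these candidates instead of an empty result.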
Use: functionality
wish list (cont'd) use of words, lexical combinatorics, collocations • dictionary and corpus integration? find a word from its definition, explore related concepts...: OneLook Reverse Dictionary (statistical language processing)
intelligent dictionary? why not integrate different kinds of information and tools (WordNet, thesauri, multimedia, collocations, thematic...) in powerful language help systems, and provide them with inference and reasoning capabilities? • Hiztsua / SIAD (Artola, 1993; Agirre et al. 1994a, 1994b, 1997) • AnHitz (Arregi, 1995; Agirre et al. 1996, 2000)
Have we investigated enough the ways users use dictionaries?
Hiztsua / SIAD: Intelligent Dictionary Help System
Built from a small French dictionary: Le Plus Petit Larousse (Librairie Larousse. Paris, 1980).
Definitions parsed using NLP techniques: morphology, syntax, definition patterns, lexico-semantic relationships
Building procedure: LPPL (typed directly into a DB GUI) → Dictionary Database (DDB, relational) → Dictionary Knowledge Base (DKB)
DKB: interrelated network of concepts (semantic network):
  hypernymy/hyponymy
  synonymy, antonymy
  meronymy
  semantic roles
Hiztsua / SIAD: Intelligent Dictionary Help System
Frame-based system, allowing inheritance, inference, composition of lexical relationships
Prototype conceived and designed for human users from the study of questions that human users would like to have answered when consulting a dictionary
Functionality that makes it possible to extract and infer implicit knowledge hidden in the dictionary structures
  definition queries, searches of alternative definitions
  differences, relations and analogies between concepts
  thesaurus-like word search
  verification of concept properties and of interconceptual relationships
  ...
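How a frame-based representation with inheritance lets a system answer questions that no single entry states explicitly (such as "How many legs has a fly?") can be sketched as follows. This is a toy model, not the Hiztsua / SIAD implementation: the class `Concept`, its methods, and the three concepts with their properties are invented for illustration and are not taken from the LPPL.

```python
class Concept:
    """A frame: a named concept with local properties and one hypernym link."""

    def __init__(self, name, hypernym=None, **properties):
        self.name = name
        self.hypernym = hypernym
        self.properties = properties

    def ancestors(self):
        """Yield the hypernyms of this concept, nearest first."""
        node = self.hypernym
        while node is not None:
            yield node
            node = node.hypernym

    def lookup(self, prop):
        """Return (value, source concept name), inheriting along hypernym links."""
        for node in (self, *self.ancestors()):
            if prop in node.properties:
                return node.properties[prop], node.name
        return None, None

    def is_a(self, other):
        """Verify a hypernymy relationship, directly or by transitivity."""
        return other is self or any(a is other for a in self.ancestors())

# Hypothetical toy network.
animal = Concept("animal", alive=True)
insect = Concept("insect", hypernym=animal, legs=6)
fly = Concept("fly", hypernym=insect, wings=2)

print(fly.lookup("legs"))  # value inherited from the hypernym "insect"
print(fly.is_a(animal))    # relationship inferred through "insect"
```

The number of legs is stored only once, on the more general concept, and the answer for "fly" is inferred; this is the kind of implicit knowledge a plain entry look-up cannot return.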
Anhitz: A translator-oriented Dictionary System
Intelligent help system for human translators the dictionary is conceived as an "active" tool that observes the activity of the user while he or she is working, providing him or her with "intelligent help"
Prototype based on two monolingual dictionaries (French and Basque): two monolingual knowledge bases one bilingual DKB establishes equivalence links between concepts from the monolingual dictionaries diverse types of equivalence relationships: more general, more specific...
Anhitz: A translator-oriented Dictionary System
Functionality:
  empirical observation and study, using protocols and questionnaires, of the activity of professional and non-professional translators
    to model the translator-dictionary interaction when translating lexical units from the source language into the target language
    user's goals and intentions, dictionary queries made, observations, etc. have been recorded
    monolingual and bilingual, locution, synonym... dictionaries
    real use cases
  functions classified basically according to three main activities:
    source text understanding
    target text generation
    search for translation equivalents
Anhitz: A translator-oriented Dictionary System
[Diagram: TRANSLATING THE SOURCE WORD (trans-lex); boxes: GETTING THE CONTEXT, ACQUIRING THE MODEL OF THE TRANSLATOR, SOURCE WORD UNDERSTANDING, SEARCHING FOR THE EQUIVALENT, TARGET WORD GENERATION]
[Diagram: TRANSLATING THE SOURCE WORD, TARGET WORD GENERATION; boxes: FINDING GENERATION HYPOTHESES, DISCRIMINATING GENERATION HYPOTHESES, GENERATION HYPOTHESIS VERIFICATION, FROM THE DICTIONARY ENTRY TO THE LEXICAL UNIT; function labels: rths, colloc, exam, dis-pro, comp-sem, prod, ver-reg, dpro, sint-pat]
Anhitz: A translator-oriented Dictionary System
Primitive functions:
  morphological analysis of a word form
  choice of a dictionary entry / word sense in a given context
  list of the possible senses that could be suitable for a word in a given context
  definition request
  reformulation of a definition
  request of the properties of a concept
  choice of a definition in a given context
  request of differences or relationships between two concepts
  verification of relationships between two concepts
  definition verification
Anhitz: A translator-oriented Dictionary System
Primitive functions (cont'd):
  verification of the properties of a concept
  thesaurus-like search of concepts
  request of examples
  direct lexical translation of a word form
  verification of translation equivalents
  semantic compatibility between two word senses according to a given relationship
  search for syntactic constructions corresponding to a given pattern
  search of lexical collocations
  request of the verb regime
  search for potential translation equivalents
To finish...
Dictionary edition: provide the lexicographer with advanced tools
Stress the importance of dictionary knowledge representation: we will get what we keep, and we will get it if we represent it adequately for the purpose required
We should investigate how users actually use dictionaries, in order to build more "intelligent" systems, capable of anticipating users' needs and helping them better
The dictionary of the future should be a "different" thing, not merely a "faster" print dictionary
  integration of different kinds of information and tools in powerful language help systems
  rich and heterogeneous functionality and access ways to the lexicon
Bibliography
Miyares Bermúdez E., Ruiz Miyares L., Álamo Suárez C., Pérez Marqués C., Artola Zubillaga X., Alegria Loinaz I., Arregi Iparragirre X.. 2010a. Las últimas ediciones del Diccionario Básico Escolar de Cuba. IV Congreso Internacional de Lexicografía Hispánica. Tarragona.
Miyares Bermúdez E., Ruiz Miyares L., Álamo Suárez C., Pérez Marqués C., Artola Zubillaga X., Alegria Loinaz I., Arregi Iparragirre X.. 2010b. La segunda y tercera ediciones del Diccionario Básico Escolar. Euralex2010. Leeuwarden (The Netherlands).
Arregi X., Arriola J.M., Artola X., Díaz de Ilarraza A., Garcia E., Lascurain V., Soroa A., Uria L. 2007. Semiautomatic Construction of the Electronic Euskal Hiztegia Basque Dictionary (eEHBD). The XVIth biennial conference of the Dictionary Society of North America, Chicago.
Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L.. 2006a. Different issues in the design and development of the electronic Cuban Basic School Dictionary. E. Miyares, L. Ruiz eds., Linguistics in the Twenty First Century, 273-288. Cambridge Scholars Press, UK. ISBN: 1904303862.
Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L.. 2006b. Building an Electronic Version of the Cuban Basic School Dictionary. Proceedings EURALEX 2006 I, 243-250 (Turin, Italy). (ISBN 88-7694-918-6).
Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L. 2006c. A Dictionary Content Management System. Proceedings EURALEX 2006 I, 105-109 (Turin, Italy). (ISBN 88-7694-918-6).
Soroa, A. Izaera heterogeneoko baliabide lexikalen integraziorako arkitektura baten proposamena. Datu-integrazioaren ikuspegitik egindako ekarpena. PhD Thesis. Informatika Fakultatea, UPV-EHU. 2004.
Arregi X., Arriola J., Artola X., Díaz de Ilarraza A., García E., Laskurain B., Sarasola K., Soroa A., Uria L.. 2003. Semiautomatic conversion of the Euskal Hiztegia Basque Dictionary to a queryable electronic form. T.A.L. journal. vol 44, num 2 p 107-124 ISSN: 1248-9433.
Arriola J., Artola X., Soroa A.. 2003. Automatic Extraction of verb patterns from Hauta-Lanerako Euskal Hiztegia. B. Oyharçabal ed., Inquiries into the lexicon-syntax relations in Basque. Supplements of ASJU no. XLVI (ISBN: 84-8373-580-6), 127-146. UPV/EHU, Bilbo.
E. Agirre, X. Arregi, X. Artola, A. Díaz de Ilarraza, F. Evrard, K. Sarasola, A. Soroa. 2003. An Intelligent Dictionary Help System. Encyclopedia of Library and Information Science, 2nd. Edition (ISSN/ISBN: 0-8247-2075-X [print]; 0-8247-4259-1 [web]), 1390-1401. Allen Kent (Marcel Dekker, Inc.), New York.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 2001. MLDS: A Translator-Oriented MultiLingual Dictionary System. Natural Language Engineering, 5 (4), 325-353. ISSN: 1351-3249. Cambridge University Press.
Agirre E., Ansa O., Arregi X., Artola X., Díaz de Ilarraza A., Lersundi M., Martinez D., Sarasola K., Urizar R. 2000. Extraction Of Semantic Relations From A Basque Monolingual Dictionary Using Constraint Grammar. Proceedings of Euralex Stuttgart (Germany). 2000. ISBN 3-00-006574-1.
Arriola, J.M. Euskal Hiztegia-ren azterketa eta egituratzea ezagutza lexikalaren eskuratze automatikoari begira. Aditz-adibideen analisia murriztapen-gramatika baliatuz, azpikategorizazioaren bidean. PhD Thesis. Filologia eta Historia-Geografia Fakultatea, UPV-EHU, 2000.
Patrick J., Zhang J., Artola X.. 2000. An Architecture and Query Language for a Federation of Heterogeneous Dictionary Databases. Computers and the Humanities (ISSN: 0010-4817).
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 2000. A Methodology For Building Translator-Oriented Dictionary Systems. Machine Translation Journal. ISSN: 0922-6567. Kluwer Academic Publishers. V. 15 nº 4. pp. 295-310. 2000.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A.. 1999. Un Diccionario activo vasco-castellano en un entorno de escritura. VI Simposio Internacional de Comunicación Social. Santiago de Cuba, 25-28 de Enero de 1999.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 1997. Constructing an intelligent dictionary help system. Natural Language Engineering 2(3): 229-252. ISSN: 1351-3249. Cambridge University Press. Cambridge. 1997.
Arriola J., Artola X., Soroa A.. 1996. Hauta-lanerako Euskal Hiztegiaren analisi erdiautomatikoa. ASJU, Anuario del Seminario de Filología Vasca.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Ezeiza N., Sarasola K., Soroa A., A. Agirre, Patel H..1996. Design of a translator-oriented dictionary: Enhancement of a dictionary knowledge base by task modelling. Le traitement automatique du langage et les applications industrielles/Natural Language Processing and Industrial Applications. (NLP + IA96), Volume I, pp 1-6. Moncton, Canada. 1996.
Arriola J., Artola X., Soroa A.. 1996. Automatic extraction of lexical information from an ordinary dictionary. EURALEX'96, Göteborg (Sweden).
Patrick J., Zhang J., Artola X. 1996. An Architecture for a Federation of Heterogeneous Lexical and Dictionary Databases. Joint International Conference ALLC/ACH'96, 221-225. Bergen (Norway).
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F.. 1995. IDHS, MLDS: Towards Dictionary Help Systems for Human Users. Semantics And Pragmatics Of Natural Language: Logical And Computational Aspects. K. Korta & J. M. Larrazabal (Eds.), ILCLI Series, n. 1. Donostia.
Arriola J., Artola X., Soroa A.. 1995. Análisis automático del diccionario Hauta-Lanerako Euskal Hiztegia. Procesamiento del lenguaje natural (SEPLN), Revista no. 17, 173-181. Bilbo.
Arregi, X.. ANHITZ: Itzulpenean laguntzeko hiztegi-sistema eleanitza. PhD Thesis. Informatika Fakultatea, UPV-EHU, 1995.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K. 1994a. Lexical Knowledge Representation in an Intelligent Dictionary Help System. Proceedings of COLING'94, vol. 1, 544-550. Kyoto (Japan).
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1994b. Intelligent dictionary help systems. Applications and Implications of current LSP Research. Eds. Brekke, M.; Andersen, I.; Dahl, T. & Myking, J., v. 1, 174-183. Fagbokforlaget (Norway).
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K.. 1994c. Analysing world-level translation activity to design a computerised dictionary. Proceedings of Euralex'94. Amsterdam.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K.. 1994d. A methodology for the extraction of semantic knowledge from dictionaries using phrasal patterns. Proceedings of IBERAMIA'94. IV Congreso Iberoamericano de Inteligencia Artificial. McGraw-Hill. , 263-270. Caracas (Venezuela).
Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F.. 1993. Sistema Diccionarial Multilingüe: aproximación funcional. Revista de la Asociación Española para el Procesamiento del Lenguaje Natural. Vol: 14, pp: 313-335.ISSN: 1135-5948.
Artola, X. HIZTSUA: Hiztegi-sistema urgazle adimendunaren sorkuntza eta eraikuntza. Hiztegi-ezagumenduaren errepresentazioa eta arrazonamenduaren ezarpena. / Conception et construction d'un système intelligent d'aide dictionnariale (SIAD). Acquisition et représentation des connaissances dictionnariales, établissement de mécanismes de déduction et spécification des fonctionnalités de base. PhD Thesis. Informatika Fakultatea, UPV-EHU, 1993.
Artola X., Evrard F. 1992. Dictionnaire intelligent d'aide à la compréhension. Actas IV Congreso Internacional EURALEX'90 (Benalmádena), 45-57. Barcelona.
Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F.. 1991. Aproximación funcional a DIAC: diccionario inteligente de ayuda a la comprensión. Revista de la Asociación Española para el Procesamiento del Lenguaje Natural. Vol: 11, pp:127-138. ISSN: 1135-5948.
RTF: Rich Text Format (MS Word)
{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;} … {\b\f69\fs16 aberastasun}{\fs14 .}{\b\i\fs14 }{\i\fs14 iz. }{\fs14 (1617; }{\i\fs14 abrastasun}{\fs14 1571).}{\b\fs14 1}{\i\fs14 . }{\fs14 Ondasun edo gauza baliotsuen ugaritasuna}{\i\fs14 . Aberastasunak ematen du aginpidea. }{\fs14 Ik. }{\b\fs14 diru}{\b\i\fs14 . }{\i\fs14 Aberastasunez betea. Ez ohorerik ez aberastasunik. Garai hart an Espainia guztian omen zen baso-oihanetan aberastasun handia. Basoetako aberastasuna. Zein zitezkeen gereziketa eta fruitu aberastasun horren iturburuak. }{\f69\fs12 II}{\fs14 }{\i\fs14 Pl. }{\fs14 Norbaitek dituen ondasun eta gauza baliozkoak}{ \i\fs14 . Herri baten aberastasunak eta baliabideak. Aberastasun galkorren ondoan ibiltzea. Euskarak bere baitan dituen aberastasunak. Aberastasunen banaketa zuzena. Aberastasunak hondatu. }{\b\fs14 2}{\fs14 .}{\i\fs14 }{\fs14 Aberatsa denaren nolakotasuna. Ant}{\i\fs14 . }{\b\fs14 pobretasun}{\fs14 ;}{\b\fs14 behartasun}{\b\i\fs14 . }{\i\fs14 Aberastasunean bizi. Pobretasunetik aberastasunera. Aberastasunaren arriskuak. \par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 { \par }}
Simplified DCG grammar to parse EH entries
Entry => Hdw [Relations] Category [date] [DefExamples].
Hdw => [Homograph] [NonStdHdw | StdHdw].
Homograph => bh number eh.
NonStdHdw => cross bb hdw eb.
StdHdw => bb hdw eb.
Category => [subc] Category.
Category => bi cat ei.
DefExamples => Def [Examples] DefExamples | ε.
Def => [SenseNumber][SenseGroup] def [Relations].
SenseNumber => bs number es.
SenseGroup => bsg grouptag esg.
Relations => [SynRel | AntRel] Relations [Examples] | ε.
SynRel => bsy synonyms esy.
AntRel => ba antonyms ea.
Examples => bi examples ei.
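The way typographic features drive the parse can be sketched in Python. The original parser is a DCG written in Prolog; this sketch is a much reduced stand-in covering only headword, category, sense number, definition, and examples, and it assumes the RTF has already been reduced to (style, text) runs. The style names, the function `parse_entry`, and the output layout are all assumptions made for illustration.

```python
def parse_entry(runs):
    """Parse [(style, text), ...] typographic runs into a nested entry dict.

    Assumed conventions: 'bold' marks the headword or a sense number,
    'italic' marks the category or the examples, 'plain' marks a definition.
    """
    pos = 0

    def peek():
        return runs[pos] if pos < len(runs) else (None, None)

    def take(style):
        nonlocal pos
        found, text = peek()
        if found != style:
            raise ValueError(f"expected {style!r} run at position {pos}, got {found!r}")
        pos += 1
        return text.strip()

    entry = {"headword": take("bold"), "category": take("italic"), "senses": []}
    while pos < len(runs):
        sense = {}
        # Optional sense number: a bold run holding only digits.
        style, text = peek()
        if style == "bold" and text.strip().isdigit():
            sense["n"] = int(take("bold"))
        sense["def"] = take("plain")
        # Optional examples: an italic run right after the definition.
        if peek()[0] == "italic":
            sense["examples"] = [e.strip() for e in take("italic").split(".") if e.strip()]
        entry["senses"].append(sense)
    return entry

# Runs abridged from the sample RTF entry.
runs = [
    ("bold", "aberastasun"), ("italic", "iz."),
    ("bold", "1"), ("plain", "Ondasun edo gauza baliotsuen ugaritasuna"),
    ("italic", "Aberastasunak ematen du aginpidea. Aberastasunez betea."),
    ("bold", "2"), ("plain", "Aberatsa denaren nolakotasuna"),
    ("italic", "Aberastasunean bizi."),
]
entry = parse_entry(runs)
print(entry["headword"], [s["n"] for s in entry["senses"]])
```

A run that does not fit the expected pattern raises an error instead of being silently absorbed, which mirrors the need for manual correction of the automatically obtained output.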
TEI XML encoding (DBE entry)
[Figure: TEI XML encoding of a DBE entry; the only text that survives is the example fragment "Con el paso del tiempo, su interés"]
DBE: print version (3rd ed.)
[Figure: page of the print version, with callouts for page markers and figure references]