
Creation, publication, and use of dictionaries: our experience at the Ixa NLP Group

Xabier Artola Zubillaga
Faculty of Computer Science, Donostia

Using the dictionary is not always fun

- How many legs has a fly?
- This looks like a past participle of some verb!: shrunk
- There must be a word for... to remove the hair from the skin of goats and sheep
- I need a verb now!: the fire ...s
- Is there any relationship between these words? Which one?: to burn, to blacken
- Which one is correct?: a quick shower or a fast shower

2010-11-26 IULA - InfoLex (UPF)

Using the dictionary is not always fun (cont'd)

- Translating "buy for" into Spanish:
  - The company bought stock for investment purposes
  - They kept buying for several months
  - They bought stock for €3,000,000
  - The defendant said he bought it for his brother
- look after: what does it mean?

Outline of the presentation

- Creation
  - Computer-aided lexicography: text corpora and language databases
  - Dictionary editing environments
  - Knowledge representation issues
- Publication
  - Print
  - Electronic (on-line or otherwise)
  - From the editing application to the final product
- Use
  - Use cases, users, and dictionary software functionality
  - Do we get from electronic dictionaries what we could expect from them?

Creation: dictionary making

- Still in the 20th century: piles of index cards in shoeboxes
  - Word usage was compiled largely on paper slips or index cards, as the basis for the creation of dictionary entries
- Computer technology
  - text corpora (concordances, KWIC) to:
    - acquire real language use examples
    - discover and ascertain word senses, extract definitions
    - find and verify collocations
    - find neologisms
    - find multiword lexical units
  - databases (in a wide sense) to store dictionary contents
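A keyword-in-context (KWIC) concordance of the kind mentioned above can be sketched in a few lines. This is only an illustration: the function name `kwic`, the window size, and the two-sentence sample corpus are assumptions made for the example, not part of any tool named in this presentation.

```python
import re

def kwic(text, keyword, width=3):
    """Return one line per occurrence of `keyword`, with `width` tokens of context on each side."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Hypothetical mini-corpus, used only to show the output layout.
corpus = "They took a quick shower before dinner. A quick shower saves water."
for line in kwic(corpus, "shower"):
    print(line)
```

Reading the left-hand context column is what lets a lexicographer check a collocation such as "quick shower" against real usage.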

Creation: dictionary making

- Today's electronic dictionaries: where do we get dictionary content from?
  - print dictionaries (legacy):
    - scanning
    - OCR
    - parsing of typographic features
  - importing it from glossaries, entry lists, other electronic dictionaries...
  - from scratch: editing (lexicographer)
    - word processors
    - databases
    - XML editors
    - publishers' custom applications
    - dictionary editing software: TshwaneLex...

Creation: dictionary making

- Building electronic dictionaries from legacy dictionaries: [scanning + OCR +] parsing of typographic features
  - Goal: to obtain a structural representation of the dictionary content (often in XML)
  → from text to a lexicographic database
- Two real cases (Ixa NLP Group):
  - eEH: from RTF to TEI SGML / XML (Arregi et al., 2003, 2007)
  - DBE: from RTF to TEI XML (Alegria et al., 2006a, 2006b)
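The step "from typographic features to structure" can be illustrated with a small Python sketch. It assumes the RTF has already been reduced to (style, text) runs; the heuristics, the `POS_TAGS` set, and the TEI-like element names are assumptions for the example and do not reproduce the conversion actually used for eEH or the DBE.

```python
import xml.etree.ElementTree as ET

POS_TAGS = {"iz.", "ad.", "adj."}  # hypothetical set of category abbreviations

def runs_to_entry(runs):
    """Map (style, text) runs to a TEI-like <entry> element.

    Illustrative heuristics: the first bold run is the headword, an italic
    run found in POS_TAGS is the category, a plain run opens a new sense
    with its definition, and any other italic run is an example of the
    current sense.
    """
    entry = ET.Element("entry")
    sense = None
    for style, text in runs:
        text = text.strip()
        if not text:
            continue
        if style == "bold" and entry.find("form") is None:
            ET.SubElement(ET.SubElement(entry, "form"), "orth").text = text
        elif style == "italic" and text in POS_TAGS:
            ET.SubElement(ET.SubElement(entry, "gramGrp"), "pos").text = text
        elif style == "plain":
            sense = ET.SubElement(entry, "sense")
            ET.SubElement(sense, "def").text = text
        elif style == "italic" and sense is not None:
            ET.SubElement(sense, "eg").text = text
    return entry

# Runs taken from the "aberastasun" entry of Euskal Hiztegia.
runs = [
    ("bold", "aberastasun"),
    ("italic", "iz."),
    ("plain", "Ondasun edo gauza baliotsuen ugaritasuna."),
    ("italic", "Aberastasunak ematen du aginpidea."),
]
entry = runs_to_entry(runs)
print(ET.tostring(entry, encoding="unicode"))
```

Real entries are far less regular than this, which is why both projects needed manual correction of the automatically obtained output.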

eEH: from RTF to TEI SGML / XML

- Sarasola I. Euskal Hiztegia. Kutxa Fundazioa: Donostia, 1996.
- Basque monolingual dictionary, a reference for the standard Basque dictionary (Hiztegi Batua, Academy of the Basque Language)
- 33,111 entries, 41,699 senses
- Typical examples illustrating the use of words, drawn from corpora
- From RTF to TEI SGML (later to TEI XML): DCG written in Prolog
  - TEI DTD: select / customize / enhance
- Manual correction of the automatically obtained output

eEH: from RTF to TEI SGML / XML

- eEH: electronic Euskal Hiztegia (prototype)
  - Sophisticated indexing system (no databases are used)
    - definition and example texts fully lemmatized
  - Users:
    - ordinary
    - advanced (philologists, lexicographers, translators...)
  - Functionality
    - full hypertext utility (from definitions and examples to corresponding entries)
    - basic query
    - advanced query
      - specially designed query language
      - dictionary search as in a corpus
- Problem: lack of editing environment
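The idea of indexing lemmatized definition text, so that the dictionary can be searched "as in a corpus", can be sketched as follows. The toy lemma table, the two English entries, and the function names are assumptions for illustration; the eEH prototype relies on real Basque lemmatization and its own indexing system.

```python
from collections import defaultdict

# Hypothetical toy lemmatizer: a real system would use a morphological analyser.
LEMMAS = {"burns": "burn", "burned": "burn", "flames": "flame"}

def lemmatize(word):
    return LEMMAS.get(word, word)

def build_index(entries):
    """Map each lemma occurring in a definition to the headwords whose definitions use it."""
    index = defaultdict(set)
    for headword, definition in entries.items():
        for word in definition.lower().split():
            index[lemmatize(word.strip(".,;"))].add(headword)
    return index

# Made-up definitions, for illustration only.
entries = {
    "fire": "Something that burns with flames.",
    "blacken": "To become black, as something burned.",
}
index = build_index(entries)
print(sorted(index["burn"]))
```

Because both "burns" and "burned" are indexed under the lemma "burn", a query for that lemma reaches both entries, and the same table can drive hyperlinks from any word in a definition to its own entry.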

eEH: electronic dictionary prototype

[Screenshots: query language and query interface]

DBE: from RTF to TEI XML

- Miyares Bermúdez E. (dir.) Diccionario Básico Escolar. Centro de Lingüística Aplicada, Santiago de Cuba. 2003.
- School dictionary, monolingual
- 7,473 entries, 14,013 word senses (1st ed.)
- From RTF to TEI P4 XML:
  - Word macros
  - Ferret (semi-automatic learning software)
  - TEI DTD: select / customize / enhance
- Manual correction of the automatically obtained output
- leXkit: dictionary editing environment
- Three on-line versions, two CDs, three print editions

DBE: CD and on-line (3rd version)

[Screenshot of the DBE interface. Callouts: request, response, image, entry look-up, other functionality, letter index, indexes look-up, cross-references, orthographic help]

Dictionary editing environments

- Essential if databases or markup languages are chosen for dictionary knowledge representation
- Wish list
  - all kinds of editing facilities: XML-transparent, navigation facilities, cross-reference building, wizards...
  - integrity constraint checking and consistency
  - multimedia integration
  - import facilities
  - collaborative editing
    - Wiktionary
    - discussion forums
      - Ultralingua (online discussion forum)
      - Leo collaborative bilingual dictionaries
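One concrete integrity constraint an editing environment could check is that every cross-reference points to an existing entry. The sketch below assumes a hypothetical layout in which entries carry an `id` attribute and references a `target` attribute; it is not the checking mechanism of any tool named here.

```python
import xml.etree.ElementTree as ET

def dangling_references(dictionary_xml):
    """Return (entry id, target) pairs for cross-references that match no entry."""
    root = ET.fromstring(dictionary_xml)
    known = {entry.get("id") for entry in root.iter("entry")}
    problems = []
    for entry in root.iter("entry"):
        for ref in entry.iter("ref"):
            if ref.get("target") not in known:
                problems.append((entry.get("id"), ref.get("target")))
    return problems

# Hypothetical two-entry dictionary: "flaquear" is referenced but not defined.
sample = """
<dictionary>
  <entry id="decaer"><xr><ref target="debilitar"/><ref target="flaquear"/></xr></entry>
  <entry id="debilitar"/>
</dictionary>
"""
print(dangling_references(sample))
```

Running such a check on every save is cheap and catches a class of errors that is very hard to spot by proofreading.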

Dictionary editing environments

- Wish list (cont'd)
  - customized output: dictionary publication
    - different dictionary products:
      - unabridged dictionary
      - student's dictionary
      - ...
    - export formats:
      - electronic versions: XML, HTML, other formats...
      - print: PDF, desktop publishing software...
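Deriving several dictionary products from one source could look roughly like this. The `products` attribute on each sense and the function `derive_product` are hypothetical conventions invented for the example; a real editing environment would define its own way of marking what belongs to which product.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source: each sense lists the products it should appear in.
SOURCE = """
<dictionary>
  <entry id="e1">
    <sense products="unabridged student"><def>core sense</def></sense>
    <sense products="unabridged"><def>rare sense</def></sense>
  </entry>
  <entry id="e2">
    <sense products="unabridged"><def>specialist sense</def></sense>
  </entry>
</dictionary>
"""

def derive_product(root, product):
    """Return a filtered copy keeping only the senses marked for `product`."""
    result = copy.deepcopy(root)
    for entry in list(result):
        for sense in entry.findall("sense"):
            if product not in sense.get("products", "").split():
                entry.remove(sense)
        if entry.find("sense") is None:  # drop entries left without senses
            result.remove(entry)
    return result

root = ET.fromstring(SOURCE)
student = derive_product(root, "student")
print([e.get("id") for e in student])
```

The source stays untouched, so the unabridged and the student's edition are always generated from the same, single set of edited data.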

A real case: leXkit (Ixa NLP Group)

- leXkit: a dictionary content management system (Alegria et al., 2006c)
  - Dictionary editing and maintenance
  - XML-based: Berkeley DB XML native XML database for storage
  - Client-server architecture: SOAP-based communication
  - Suitable for different kinds of dictionaries
- Main features:
  - Allows adding, deleting and modifying entries in a friendly fashion: XML details are transparent to the lexicographer
  - Provides lexicographers with all the features of a full-fledged DBMS: full search capabilities, safe storage, concurrent access, etc.

leXkit

- Main features (cont'd):
  - Maintains entry states (version control and tracking)
  - Automatically generates the files and components needed by a running application such as the current electronic DBE
  - Tailored output is feasible: the data required for print editions, diversified electronic versions, etc. can be exported easily
- Architecture
  - Client
    - The component used by the lexicographer
    - Tool integration (corpora, other dictionaries...)
  - Server: database, concurrency, configuration files (dictionary schema definitions, wizards, etc.), import/export utilities, backups...

leXkit

[Screenshots of the leXkit client. Editor: edition tree, predefined tasks, dictionary tabs, edition textbox. Index: dictionary entries, search results. Viewer: entry preview (WYSIWYG), integrated tools, views and info tabs (XML tab, entry info, session control...)]

leXkit: system architecture

leXkit

- Communication (client / server)
  - SOAP web services (RPC model + cookies)
  - Intermediate declarative layer (XML)
    - Dictionary specifications
    - Operations (context-dependent tasks)
    - Wizards (common edition operations, predefined searches...)
- Other technical aspects
  - XSLT is widely used in the application
    - XSLTi: declarative language that adds interactivity to XSLT scripts
  - XML processing: Xerces + Xalan
  - Graphical interface: wxWidgets
  - HTML rendering: Mozilla (wxMozilla)

leXkit: wizards for the DBE

leXkit: conclusions

- leXkit has been used at the CLA for editing the DBE's 2nd and 3rd editions: from 7,473 entries / 14,013 senses in the 1st edition to 10,557 entries / 19,374 senses in the 3rd one.
- leXkit was a vital tool in the qualitative leap of this work.
- Dictionary editing applications are a must, especially if dictionaries are stored in databases or XML-encoded.
- leXkit can be used by other lexicographical teams to create and update dictionaries. It is available as free software (open source) at

Dictionary representation

- Representation is the key factor for dictionary functionality
  - we won't get what is not stored and adequately represented in the dictionary
  - the representation we choose conditions what we will later be able to get from the dictionary
- Physical level
  - text (no access facilities, deficient structuring)
    - plain or somehow structured (CSV, tabular...)
    - rich text: typography, word processors
  → even the entry concept is sometimes diluted
  → risk: vicious circle (to be avoided)

Dictionary representation

- Physical level (cont'd)
  - database: relational (structure, indexing, query and update facilities)
    - one database = one dictionary
      - is each pertinent information unit correctly represented in a field or column?
    - integrated dictionary system (publishers)
      - publisher's general dictionary database
  - marked text
    - HTML: mark-up language, presentation-oriented
    - SGML / XML: mark-up metalanguage, content-oriented

Dictionary representation

→ content-oriented marked text constitutes a better data model for the representation of dictionary content and structure than the relational model

- lexical information is inherently complex
- apparently similar information is represented in dictionaries in structurally different ways
- intra-entry hierarchical structure is not adequately represented using the relational model
  - the information must be split across several tables: redundancy, factorization problems
  - construction of user-friendly graphical user interfaces is not always easy
  - query languages are often complex and non-intuitive

Dictionary representation

→ content-oriented marked text... (cont'd)

- content-oriented marked texts (SGML, XML...)
  - descriptive markup (structure, content)
  - more flexible data representation model
  - better reflects the lexicographic data model used in dictionaries
  - drawback: manageability and efficiency
  → XML native databases: indexing, query and update facilities

TEI (Text Encoding Initiative): a whole chapter full of recommendations on marking up human-oriented dictionaries
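To make the point about intra-entry hierarchy concrete, the sketch below encodes part of the "aberastasun" entry of Euskal Hiztegia with nested senses and queries it with path expressions. The element names are TEI-like but chosen for illustration, and the sense label "1.II" is an assumption; this is not the encoding actually used in eEH.

```python
import xml.etree.ElementTree as ET

ENTRY = """
<entry>
  <form><orth>aberastasun</orth></form>
  <gramGrp><pos>iz.</pos></gramGrp>
  <sense n="1">
    <def>Ondasun edo gauza baliotsuen ugaritasuna.</def>
    <eg>Aberastasunak ematen du aginpidea.</eg>
    <sense n="1.II">
      <def>Norbaitek dituen ondasun eta gauza baliozkoak.</def>
      <eg>Aberastasunak hondatu.</eg>
    </sense>
  </sense>
  <sense n="2">
    <def>Aberatsa denaren nolakotasuna.</def>
    <xr type="ant"><ref>pobretasun</ref><ref>behartasun</ref></xr>
    <eg>Aberastasunean bizi.</eg>
  </sense>
</entry>
"""

entry = ET.fromstring(ENTRY)

# Direct children only: the top-level senses.
top_level = [s.get("n") for s in entry.findall("sense")]
# Every sense at any depth, in document order.
all_senses = [s.get("n") for s in entry.iter("sense")]
# Antonyms attached to sense 2 specifically, not to the entry as a whole.
antonyms = [r.text for r in entry.findall(".//sense[@n='2']/xr[@type='ant']/ref")]

print(top_level, all_senses, antonyms)
```

In a relational design the same entry would be spread over several tables (entries, senses, sub-senses, examples, relations) joined by keys; here the nesting itself records which example or antonym belongs to which sense.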

Dictionary representation

- Physical level (cont'd)
  - dictionary knowledge bases: reasoning, artificial intelligence techniques, knowledge representation languages
  → the only way to extract implicit knowledge from dictionary structures

Dictionary representation

- Conceptual level
  - what information/knowledge is represented?
    - orthography, pronunciation, grammar (mostly POS), register, definition...
    - morphology? irregular inflection paradigms?
      - important in learner's dictionaries, highly inflected languages... but not only
      - two real cases (Ixa NLP Group)
        - Elhuyar eu-es (MS Word plugin): eu and es lemmatization
        - UZEI synonyms (MS Word plugin): eu lemmatization
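Why morphology matters for look-up can be shown with a minimal sketch: a user who types "shrunk" should still reach "shrink". The irregular-form table, the suffix list, and the glosses below are toy assumptions; the plugins mentioned above rely on full lemmatizers for Basque and Spanish.

```python
# Hypothetical data, for illustration only.
IRREGULAR = {"shrunk": "shrink", "shrank": "shrink", "bought": "buy"}
SUFFIXES = ["ing", "ed", "s"]
ENTRIES = {
    "shrink": "to become smaller",
    "buy": "to get something by paying for it",
    "burn": "to be on fire",
}

def lookup(form):
    """Return the headword under which `form` should be looked up, or None."""
    form = form.lower()
    candidates = [form]
    if form in IRREGULAR:
        candidates.append(IRREGULAR[form])
    for suffix in SUFFIXES:
        if form.endswith(suffix):
            candidates.append(form[:-len(suffix)])
    for candidate in candidates:
        if candidate in ENTRIES:
            return candidate
    return None

print(lookup("shrunk"), lookup("burns"), lookup("buying"))
```

Suffix stripping of this kind is far too crude for a highly inflected language such as Basque, which is exactly why a proper lemmatizer has to sit between the user's text and the dictionary.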

Dictionary representation

- Conceptual level (cont'd)
  - dictionary typology
    - monolingual / bi- or multilingual
    - language dictionary / encyclopedic
    - general use / specific (terminology)
    - ...
  - implicit knowledge: in definitions, examples, lexical semantics
    - WordNet, thesauri...
    - association lists, semantic networks
  → inference, reasoning

Publication: presentation, output

- print
  - how to obtain the "file" to submit to the publisher?
- electronic
  - typology
    - on-line
      - on-line dictionaries (free, subscriptions...)
      - dictionary directories: OneLook Dictionary Search
      - multi-dictionary access tools: Euskalbar, a Firefox plugin that integrates ~30 dictionaries and corpora
      - the web (corpus) as a dictionary
      - translation memories, parallel corpora

Publication: presentation, output

- typology (cont'd)
  - desktop dictionary software
    - standalone applications: personal computers, small handheld devices, mobile phones...
    - integrated dictionaries, plugins: in word processors, web browsers...
      - Elhuyar eu-es (MS Word plugin)
      - UZEI synonyms (MS Word plugin)
    - multi-dictionary tools: Babylon
  - machine-readable dictionaries: PDF...
- formats: HTML, XML, PDF, PS, electronic book formats, application proprietary formats...

Publication: presentation, output

- Which way leads from the editing environment or database to the print or electronic version?
  - DBE
    - CD and on-line: XML to HTML (dynamic transformation, XSLT)
    - print: XML to PDF (XSLT + XSL-FO)
  - Hiztegi Batua (Euskaltzaindia, Basque Language Academy):
    - on-line: XML to HTML (XSLT)
    - publishing: HTML to Quark (manually)
    - download: Quark to PDF
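The XML-to-HTML step can be pictured with the following sketch. Note that it swaps in plain Python for the XSLT stylesheets the projects above actually use, and that the element names, the CSS classes, and the function `entry_to_html` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

def entry_to_html(entry):
    """Render a TEI-like <entry> element as a small HTML fragment."""
    orth = entry.findtext("form/orth", default="")
    pos = entry.findtext("gramGrp/pos", default="")
    parts = [f'<p class="entry"><b>{escape(orth)}</b> <i>{escape(pos)}</i>']
    for number, sense in enumerate(entry.findall("sense"), start=1):
        definition = sense.findtext("def", default="")
        parts.append(f' <span class="sense">{number}. {escape(definition)}</span>')
    parts.append("</p>")
    return "".join(parts)

# Content taken from the DBE entry "decaer"; the markup is hypothetical.
entry = ET.fromstring(
    "<entry><form><orth>decaer</orth></form>"
    "<gramGrp><pos>v. intr.</pos></gramGrp>"
    "<sense><def>Ir a menos, perder una persona o cosa parte de las "
    "propiedades que le daban su fuerza o valor.</def></sense></entry>"
)
html_fragment = entry_to_html(entry)
print(html_fragment)
```

Whatever the transformation language, the key property is the same: the HTML is generated from the XML on demand, so a correction made in the editing environment reaches the on-line version without any manual step.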

Publication: presentation, output

- Which way... (cont'd)
  - other solutions:
    - [general dictionary]
      - Oracle to HTML (web)
      - Oracle to Quark (print)
    - [terminological dictionary]
      - 4D to Quark (print)
      - 4D to XML (TBX) to XHTML (web)
  - customized output: proprietary formats (mainly in desktop dictionary software)
- The longer the way... the easier it is to get lost!
  - updates will be more costly

Use: functionality

- Use cases
  - language input: typical lookup (definitions, multiword expressions...)
  - language output: is the dictionary well suited to language production situations?
    - much more information is needed when we want to actually use a word in speech or in writing than when we only want to understand a word in a passage
  - translation tasks: language input and output
    - special information is needed: faux amis...
  - language learning activities:
    - more information is needed about context of use, connotations of a word, collocations, etc.

Use: functionality

- Users (models, profiles)
  - native speakers
  - language learners
  - translators
  - students, children...
  - specialists: scientists, technicians...
- Functionality
  - do we get from electronic dictionaries what we could expect from them?
  - are they something more than their print counterparts?

Dictionaries of the future:

Print dictionaries have been joined by dictionaries in electronic form: these are often enriched with many additional features, such as sound recordings or sophisticated links to other related material. ... It seems likely that by the middle of this century, if not before, all dictionaries will be in electronic form. This means that limitations of space, which have always been a serious issue for lexicographers and dictionary publishers, will be much less important. Dictionaries will be able to include more material: more words and definitions, interactive features, and multimedia content such as images, sound, and video. They will also be updated much more rapidly than ever before. But the general idea of a dictionary - a resource that provides explanations of words and how they are used - will probably remain the same.

Use: functionality

- Functionality (cont'd)
  - what we get
    - search facilities: from basic lookup to advanced queries
    - speed, storage facilities
    - orthographic help (closeness)
    - integration: word processors, reading applications...
    - new features: multimedia (recorded sounds, images, videos), hyperlinks
    - interactivity?
  - wish list
    - definition and examples: corpus queries
    - navigation: fully hyperlinked (lemmatization of definitions, examples...)
    - morphology, grammar, derivation...
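Orthographic help based on closeness can be approximated with Python's standard `difflib` module, as in the sketch below. The headword list and the function name `suggest` are assumptions for the example; real dictionary software may use other similarity measures.

```python
import difflib

# Hypothetical headword list, for illustration only.
HEADWORDS = ["shrink", "shower", "burn", "blacken", "buy"]

def suggest(query, n=3):
    """Return up to `n` headwords that are orthographically close to `query`."""
    return difflib.get_close_matches(query.lower(), HEADWORDS, n=n, cutoff=0.6)

print(suggest("shrinc"))
```

The `cutoff` parameter controls how tolerant the suggestion list is: lowering it returns more distant candidates, raising it returns only near-identical spellings.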

Use: functionality

- wish list (cont'd)
  - use of words, lexical combinatorics, collocations
    - dictionary and corpus integration?
  - find a word from its definition, explore related concepts...: OneLook Reverse Dictionary (statistical language processing)
  - intelligent dictionary? why not integrate different kinds of information and tools (WordNet, thesauri, multimedia, collocations, thematic...) into powerful language help systems, and provide them with inference and reasoning capabilities?
    - Hiztsua / SIAD (Artola, 1993; Agirre et al. 1994a, 1994b, 1997)
    - AnHitz (Arregi, 1995; Agirre et al. 1996, 2000)
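Finding a word from its definition can be sketched, in its crudest form, as ranking headwords by the content words their definitions share with the user's description. This is plain word overlap, not the statistical processing used by OneLook, and the stopword list and the three definitions are made up for the example.

```python
STOPWORDS = {"the", "of", "a", "to", "from", "and", "or", "with"}

def content_words(text):
    """Lower-cased words of `text`, minus punctuation and stopwords."""
    return {w.strip(".,;").lower() for w in text.split()} - STOPWORDS

def reverse_lookup(description, definitions, top=3):
    """Rank headwords by how many content words their definition shares with `description`."""
    query = content_words(description)
    scored = []
    for headword, definition in definitions.items():
        overlap = len(query & content_words(definition))
        if overlap:
            scored.append((overlap, headword))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [headword for _, headword in scored[:top]]

# Hypothetical mini-dictionary.
definitions = {
    "shear": "to cut the wool from a sheep",
    "shave": "to remove hair from the skin with a razor",
    "burn": "to be on fire",
}
print(reverse_lookup("remove the hair from the skin of sheep", definitions))
```

Matching on lemmas rather than surface forms, and on related concepts rather than identical words, is where the lexical knowledge discussed in the rest of this presentation would come in.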

Have we sufficiently investigated how users use dictionaries?

Hiztsua / SIAD: Intelligent Dictionary Help System

- Built from a small French dictionary: Le Plus Petit Larousse (Librairie Larousse. Paris, 1980).
- Definitions parsed using NLP techniques: morphology, syntax, definition patterns, lexico-semantic relationships
- Building procedure: LPPL (typed directly into a DB GUI) → Dictionary Database (DDB, relational) → Dictionary Knowledge Base (DKB)
- DKB: interrelated network of concepts (semantic network):
  - hypernymy/hyponymy
  - synonymy, antonymy
  - meronymy
  - semantic roles

Hiztsua / SIAD: Intelligent Dictionary Help System

- Frame-based system, allowing inheritance, inference, composition of lexical relationships
- Prototype conceived and designed for human users
  - from the study of questions that human users would like to have answered when consulting a dictionary
- Functionality that makes it possible to extract and infer implicit knowledge hidden in the dictionary structures
  - definition queries, searches for alternative definitions
  - differences, relations and analogies between concepts
  - thesaurus-like word search
  - verification of concept properties and of interconceptual relationships
  - ...
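Inheritance in a frame-based knowledge base can be illustrated with a toy example that answers the opening question, "How many legs has a fly?". The `Frame` class and the three frames are assumptions for the sketch and do not reproduce the knowledge representation language used in Hiztsua / SIAD.

```python
class Frame:
    """A concept with local slots and a single hypernym (parent) frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def get(self, slot):
        """Return the slot value, looked up locally first and then along the hypernym chain."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        return None

    def is_a(self, other):
        """True if `other` is this frame or one of its (transitive) hypernyms."""
        frame = self
        while frame is not None:
            if frame is other:
                return True
            frame = frame.parent
        return False

animal = Frame("animal")
insect = Frame("insect", parent=animal, legs=6)
fly = Frame("fly", parent=insect, wings=2)

print(fly.get("legs"))  # inherited from "insect"
```

Nothing in the definition of "fly" states the number of legs; the answer is implicit knowledge that only becomes available once the hypernymy link to "insect" is represented explicitly.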

Anhitz: A translator-oriented Dictionary System

- Intelligent help system for human translators
  - the dictionary is conceived as an "active" tool that observes the activity of the user while he or she is working, providing him or her with "intelligent help"
- Prototype based on two monolingual dictionaries (French and Basque):
  - two monolingual knowledge bases
  - one bilingual DKB establishes equivalence links between concepts from the monolingual dictionaries
    - diverse types of equivalence relationships: more general, more specific...

Anhitz: A translator-oriented Dictionary System

- Functionality:
  - empirical observation and study, using protocols and questionnaires, of the activity of professional and non-professional translators
    - to model the translator-dictionary interaction when translating lexical units from the source language into the target language
    - user's goals and intentions, dictionary queries made, observations, etc. have been recorded
    - monolingual and bilingual, locution, synonym... dictionaries
    - real use cases
  - functions classified basically according to three main activities:
    - source text understanding
    - target text generation
    - search for translation equivalents

Anhitz: A translator-oriented Dictionary System

[Diagrams; surviving labels: SOURCE WORD, trans-lex, rths, colloc, exam, dis-pro, comp-sem, prod, ver-reg, dpro, sint-pat]

Anhitz: A translator-oriented Dictionary System

- Primitive functions:
  - morphological analysis of a word form
  - choice of a dictionary entry / word sense in a given context
  - list of the possible senses that could be suitable for a word in a given context
  - definition request
  - reformulation of a definition
  - request for the properties of a concept
  - choice of a definition in a given context
  - request for differences or relationships between two concepts
  - verification of relationships between two concepts
  - definition verification

Anhitz: A translator-oriented Dictionary System

- Primitive functions (cont'd):
  - verification of the properties of a concept
  - thesaurus-like search for concepts
  - request for examples
  - direct lexical translation of a word form
  - verification of translation equivalents
  - semantic compatibility between two word senses according to a given relationship
  - search for syntactic constructions corresponding to a given pattern
  - search for lexical collocations
  - request for the verb regime
  - search for potential translation equivalents

To finish...

- Dictionary editing: provide the lexicographer with advanced tools
- Stress the importance of dictionary knowledge representation: we will get what we keep, and we will get it only if we represent it adequately for the purpose required
- We should investigate how users actually use dictionaries, in order to build more "intelligent" systems, capable of anticipating users' needs and helping them better
- The dictionary of the future should be a "different" thing, not merely a "faster" print dictionary
  - integration of different kinds of information and tools in powerful language help systems
  - rich and heterogeneous functionality and ways of accessing the lexicon

Bibliography

- Miyares Bermúdez E., Ruiz Miyares L., Álamo Suárez C., Pérez Marqués C., Artola Zubillaga X., Alegria Loinaz I., Arregi Iparragirre X. 2010a. Las últimas ediciones del Diccionario Básico Escolar de Cuba. IV Congreso Internacional de Lexicografía Hispánica. Tarragona.
- Miyares Bermúdez E., Ruiz Miyares L., Álamo Suárez C., Pérez Marqués C., Artola Zubillaga X., Alegria Loinaz I., Arregi Iparragirre X. 2010b. La segunda y tercera ediciones del Diccionario Básico Escolar. Euralex2010. Leeuwarden (The Netherlands).
- Arregi X., Arriola J.M., Artola X., Díaz de Ilarraza A., Garcia E., Lascurain V., Soroa A., Uria L. 2007. Semiautomatic Construction of the Electronic Euskal Hiztegia Basque Dictionary (eEHBD). The XVIth biennial conference of the Dictionary Society of North America, Chicago.
- Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L. 2006a. Different issues in the design and development of the electronic Cuban Basic School Dictionary. E. Miyares, L. Ruiz eds., Linguistics in the Twenty First Century, 273-288. Cambridge Scholars Press, UK. ISBN: 1904303862.
- Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L. 2006b. Building an Electronic Version of the Cuban Basic School Dictionary. Proceedings EURALEX 2006 I, 243-250 (Turin, Italy). ISBN: 88-7694-918-6.

- Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L. 2006c. A Dictionary Content Management System. Proceedings EURALEX 2006 I, 105-109 (Turin, Italy). ISBN: 88-7694-918-6.
- Soroa A. 2004. Izaera heterogeneoko baliabide lexikalen integraziorako arkitektura baten proposamena. Datu-integrazioaren ikuspegitik egindako ekarpena. PhD Thesis. Informatika Fakultatea, UPV-EHU.
- Arregi X., Arriola J., Artola X., Díaz de Ilarraza A., García E., Laskurain B., Sarasola K., Soroa A., Uria L. 2003. Semiautomatic conversion of the Euskal Hiztegia Basque Dictionary to a queryable electronic form. T.A.L. journal, vol. 44, num. 2, pp. 107-124. ISSN: 1248-9433.
- Arriola J., Artola X., Soroa A. 2003. Automatic Extraction of verb patterns from Hauta-Lanerako Euskal Hiztegia. B. Oyharçabal ed., Inquiries into the lexicon-syntax relations in Basque. Supplements of ASJU no. XLVI (ISBN: 84-8373-580-6), 127-146. UPV/EHU, Bilbo.
- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Evrard F., Sarasola K., Soroa A. 2003. An Intelligent Dictionary Help System. Encyclopedia of Library and Information Science, 2nd Edition (ISBN: 0-8247-2075-X [print]; 0-8247-4259-1 [web]), 1390-1401. Allen Kent (Marcel Dekker, Inc.), New York.

Bibliography (cont'd)

- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 2001. MLDS: A Translator-Oriented MultiLingual Dictionary System. Natural Language Engineering, 5 (4), 325-353. ISSN: 1351-3249. Cambridge University Press.
- Agirre E., Ansa O., Arregi X., Artola X., Díaz de Ilarraza A., Lersundi M., Martinez D., Sarasola K., Urizar R. 2000. Extraction Of Semantic Relations From A Basque Monolingual Dictionary Using Constraint Grammar. Proceedings of Euralex, Stuttgart (Germany). ISBN: 3-00-006574-1.
- Arriola J.M. 2000. Euskal Hiztegia-ren azterketa eta egituratzea ezagutza lexikalaren eskuratze automatikoari begira. Aditz-adibideen analisia murriztapen-gramatika baliatuz, azpikategorizazioaren bidean. PhD Thesis. Filologia eta Historia-Geografia Fakultatea, UPV-EHU.
- Patrick J., Zhang J., Artola X. 2000. An Architecture and Query Language for a Federation of Heterogeneous Dictionary Databases. Computers and the Humanities (ISSN: 0010-4817).
- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 2000. A Methodology For Building Translator-Oriented Dictionary Systems. Journal. ISSN: 0922-6567. Kluwer Academic Publishers. V. 15 nº 4, pp. 295-310.

- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 1999. Un Diccionario activo vasco-castellano en un entorno de escritura. VI Simposio Internacional de Comunicación Social. Santiago de Cuba, 25-28 de Enero de 1999.
- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 1997. Constructing an intelligent dictionary help system. Natural Language Engineering 2(3): 229-252. ISSN: 1351-3249. Cambridge University Press. Cambridge.
- Arriola J., Artola X., Soroa A. 1996. Hauta-lanerako Euskal Hiztegiaren analisi erdiautomatikoa. ASJU, Anuario del Seminario de Filología Vasca.
- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Ezeiza N., Sarasola K., Soroa A., Agirre A., Patel H. 1996. Design of a translator-oriented dictionary: Enhancement of a dictionary knowledge base by task modelling. Le traitement automatique du langage et les applications industrielles/Natural Language Processing and Industrial Applications (NLP + IA96), Volume I, pp. 1-6. Moncton, Canada.
- Arriola J., Artola X., Soroa A. 1996. Automatic extraction of lexical information from an ordinary dictionary. EURALEX'96, Göteborg (Sweden).
- Patrick J., Zhang J., Artola X. 1996. An Architecture for a Federation of Heterogeneous Lexical and Dictionary Databases. Joint International Conference ALLC/ACH'96, 221-225. Bergen (Norway).

Bibliography (cont'd)

- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1995. IDHS, MLDS: Towards Dictionary Help Systems for Human Users. Semantics And Pragmatics Of Natural Language: Logical And Computational Aspects. K. Korta & J. M. Larrazabal (Eds.), ILCLI Series, n. 1. Donostia.
- Arriola J., Artola X., Soroa A. 1995. Análisis automático del diccionario Hauta-Lanerako Euskal Hiztegia. Procesamiento del lenguaje natural (SEPLN), Revista no. 17, 173-181. Bilbo.
- Arregi X. 1995. ANHITZ: Itzulpenean laguntzeko hiztegi-sistema eleanitza. PhD Thesis. Informatika Fakultatea, UPV-EHU.
- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K. 1994a. Lexical Knowledge Representation in an Intelligent Dictionary Help System. Proceedings of COLING'94, vol. 1, 544-550. Kyoto (Japan).
- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1994b. Intelligent dictionary help systems. Applications and Implications of current LSP Research. Eds. Brekke, M.; Andersen, I.; Dahl, T. & Myking, J., v. 1, 174-183. Fakbokforlaget (Norway).
- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K. 1994c. Analysing world-level translation activity to design a computerised dictionary. Proceedings of Euralex'94. Amsterdam.

- Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K. 1994d. A methodology for the extraction of semantic knowledge from dictionaries using phrasal patterns. Proceedings of IBERAMIA'94. IV Congreso Iberoamericano de Inteligencia Artificial. McGraw-Hill, 263-270. Caracas (Venezuela).
- Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1993. Sistema Diccionarial Multilingüe: aproximación funcional. Revista de la Asociación Española para el Procesamiento del Lenguaje Natural. Vol. 14, pp. 313-335. ISSN: 1135-5948.
- Artola X. 1993. HIZTSUA: Hiztegi-sistema urgazle adimendunaren sorkuntza eta eraikuntza. Hiztegi-ezagumenduaren errepresentazioa eta arrazonamenduaren ezarpena. / Conception et construction d'un système intelligent d'aide dictionnariale (SIAD). Acquisition et représentation des connaissances dictionnariales, établissement de mécanismes de déduction et spécification des fonctionnalités de base. PhD Thesis. Informatika Fakultatea, UPV-EHU.
- Artola X., Evrard F. 1992. Dictionnaire intelligent d'aide à la compréhension. Actas IV Congreso Internacional EURALEX'90 (Benalmádena), 45-57. Barcelona.
- Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1991. Aproximación funcional a DIAC: diccionario inteligente de ayuda a la comprensión. Revista de la Asociación Española para el Procesamiento del Lenguaje Natural. Vol. 11, pp. 127-138. ISSN: 1135-5948.

RTF: Rich Text Format (MS Word)

{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;} … {\b\f69\fs16 aberastasun}{\fs14 .}{\b\i\fs14 }{\i\fs14 iz. }{\fs14 (1617; }{\i\fs14 abrastasun}{\fs14 1571).}{\b\fs14 1}{\i\fs14 . }{\fs14 Ondasun edo gauza baliotsuen ugaritasuna}{\i\fs14 . Aberastasunak ematen du aginpidea. }{\fs14 Ik. }{\b\fs14 diru}{\b\i\fs14 . }{\i\fs14 Aberastasunez betea. Ez ohorerik ez aberastasunik. Garai hart an Espainia guztian omen zen baso-oihanetan aberastasun handia. Basoetako aberastasuna. Zein zitezkeen gereziketa eta fruitu aberastasun horren iturburuak. }{\f69\fs12 II}{\fs14 }{\i\fs14 Pl. }{\fs14 Norbaitek dituen ondasun eta gauza baliozkoak}{ \i\fs14 . Herri baten aberastasunak eta baliabideak. Aberastasun galkorren ondoan ibiltzea. Euskarak bere baitan dituen aberastasunak. Aberastasunen banaketa zuzena. Aberastasunak hondatu. }{\b\fs14 2}{\fs14 .}{\i\fs14 }{\fs14 Aberatsa denaren nolakotasuna. Ant}{\i\fs14 . }{\b\fs14 pobretasun}{\fs14 ;}{\b\fs14 behartasun}{\b\i\fs14 . }{\i\fs14 Aberastasunean bizi. Pobretasunetik aberastasunera. Aberastasunaren arriskuak. \par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 { \par }}

Simplified DCG grammar to parse EH entries

Entry => Hdw [Relations] Category [date] [DefExamples].
Hdw => [Homograph] [NonStdHdw | StdHdw].
Homograph => bh number eh.
NonStdHdw => cross bb hdw eb.
StdHdw => bb hdw eb.
Category => [subc] Category.
Category => bi cat ei.
DefExamples => Def [Examples] DefExamples | ε.
Def => [SenseNumber][SenseGroup] def [Relations].
SenseNumber => bs number es.
SenseGroup => bsg grouptag esg.
Relations => [SynRel | AntRel] Relations [Examples] | ε.
SynRel => bsy synonyms esy.
AntRel => ba antonyms ea.
Examples => bi examples ei.
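To show how such a grammar drives parsing, here is a recursive-descent sketch of a subset of the rules (StdHdw, Category, an optional date, SenseNumber, Def and Examples). It is written in Python rather than as the Prolog DCG used in the project, it omits homographs, non-standard headwords, sense groups and relations, and the token stream of (kind, text) pairs is an assumed output of an earlier RTF tokenization step.

```python
class EntryParser:
    """Recursive-descent parser over (kind, text) tokens for a subset of the EH grammar."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r} at token {self.pos}, found {self.peek()!r}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def bracketed(self, start, kind, end):
        """Parse `start kind end` and return the text of the middle token."""
        self.expect(start)
        text = self.expect(kind)
        self.expect(end)
        return text

    def entry(self):
        # Entry => StdHdw Category [date] [DefExamples]
        result = {
            "hdw": self.bracketed("bb", "hdw", "eb"),  # StdHdw => bb hdw eb
            "cat": self.bracketed("bi", "cat", "ei"),  # Category => bi cat ei
        }
        if self.peek() == "date":
            result["date"] = self.expect("date")
        result["senses"] = self.def_examples()
        return result

    def def_examples(self):
        # DefExamples => Def [Examples] DefExamples | ε
        senses = []
        while self.peek() in ("bs", "def"):
            sense = {}
            if self.peek() == "bs":  # SenseNumber => bs number es
                sense["n"] = self.bracketed("bs", "number", "es")
            sense["def"] = self.expect("def")
            if self.peek() == "bi":  # Examples => bi examples ei
                sense["examples"] = self.bracketed("bi", "examples", "ei")
            senses.append(sense)
        return senses

# Tokens for a shortened version of the "aberastasun" entry.
tokens = [
    ("bb", ""), ("hdw", "aberastasun"), ("eb", ""),
    ("bi", ""), ("cat", "iz."), ("ei", ""),
    ("date", "1617"),
    ("bs", ""), ("number", "1"), ("es", ""),
    ("def", "Ondasun edo gauza baliotsuen ugaritasuna."),
    ("bi", ""), ("examples", "Aberastasunak ematen du aginpidea."), ("ei", ""),
    ("bs", ""), ("number", "2"), ("es", ""),
    ("def", "Aberatsa denaren nolakotasuna."),
]
parsed = EntryParser(tokens).entry()
print(parsed["hdw"], len(parsed["senses"]))
```

A token sequence that does not fit the grammar raises `SyntaxError`, which is how irregular entries can be singled out for manual correction.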

TEI XML encoding (DBE entry)

decaer de|ca|er
v. intr. (33) Ir a menos, perder una persona o cosa parte de las propiedades que le daban su fuerza o valor. Con el paso del tiempo, su interés decayó. Sin. debilitar, disminuir, flaquear, desfallecer
decaído (p.p.)

DBE: print version (3rd ed.)

[Page image of the print edition; callouts: page markers, figure refs.]