EuDeLex: Creating a German-Basque Electronic Dictionary
Functions

The primary target user groups of a first edition are (1) Basque-L1 learners of German, and (2) German to Basque translators.
We propose a database with an open structure that allows more data to be included in the future, and different dictionary macro- and microstructures to be derived from it. For the design and editing of this database, we use the TshwaneLex software (De Schryver & Joffe 2005).

Macrostructure

Frequency-based lemmalists
• It has been shown that the most frequent words are actually the words most frequently looked up by dictionary users; this is true for the top few thousand (De Schryver et al. 2010).
• Frequency data is useful information for both dictionary editor and user.

German
• We define a lemma list based on a corpus-based frequency word list (DeReWo-40.000, IDS 2009), revise it by hand, and include frequency data in the published dictionary. GFL learners' basic vocabulary is fully covered by DeReWo.

Basque
• We develop a Basque frequency-based lemmalist from corpus-based frequency word lists (Elhuyar WebCorpus, ETC UPV-EHU) and the Basque Language Academy's Hiztegi Batua (Lindemann & San Vicente, in prep.)

Microstructure

Following microstructures proposed in some dictionaries (e.g. Langenscheidt's bilinguals with German, PONS Großwörterbuch DaF), we organize the dictionary entry first and foremost syntactically, i.e. contrary to the TEI Standard we define syntactical entities as parent elements, and sense groups as their child elements. For verbs, we distinguish (1) transitive, (2) intransitive, (3) reflexive or reciprocal and non-personal use as syntactical entities of their own.
Some arguments in favour of such an element order according to syntactic properties are:
1. It helps as a first orientation within longer dictionary entries.
2. In text reception, it is a strategy of advanced German learners to identify the verb (the meaning of which they possibly don't know) and its arguments as syntactic entities and then to proceed to semantics.
3. In German as a foreign language production, it is often not the meaning but the syntactic properties of a word that dictionary users want to find information about.

Sample entry:

aussetzen   DeReWo Rank 1948   trennbar
I Verb Transitiv +haben
  1 (umea, animalia) utzietsi; (bazterrera) utzi; abandonatu
  2 [+Dat.] eraginpean jarri
  3 eine Belohnung (auf etw.) aussetzen: zbt.en saria iragarri
  4 = unterbrechen eten
  5 RECHTSWISSENSCHAFT atzeratu; geroratu
  6 [an etw.] kezkatu; gaitzetsi
II Verb Intransitiv +haben
  1 eten
    (eine Runde) aussetzen: (txanda bat) galdu
  2 huts egin

Indications for German verb auxiliary selection are placed behind the syntactical flag. The valency formula gives the user clues about a correct argument structure realization, which is helpful not only for text production but also in text reception, in cases where a valency formula reveals what is needed to disambiguate the German verb's polysemy.
Together with Basque translation equivalents, word sense groups are furnished with additional information such as specification of semantic domain and register, and with links to German synonyms. The structure of German noun, verb, adjective and adverb entries is presented in detail in Lindemann (2014).

Research and Working Steps

• Proposals for macrostructure and microstructure
• 10% of DE Lemmalist: Edition of Dictionary entries DE>EU
• Scripting for User Interface, publication of a preliminary version
• Bilingual Dictionary Drafting (Lindemann, Manterola, Nazar et al. 2014)
  • Result evaluation by hand (Letter A)
  • Possible pasting of draft data into EuDeLex database (in prep.)
• Edition and publication of all dictionary entries DE>EU (planned)
• Edition and publication of dictionary entries EU>DE (planned)
  • Drafting by inverting DE>EU articles

German-Basque Corpora

• See Lindemann (2013), Lindemann, Manterola, Nazar et al. (2014)

Parallel Corpora
• Used in the documentation process for entry editing
• Used for automatic Bilingual Dictionary Drafting
• Used for the display of translated usage examples as part of the dictionary search result page (planned)

Literature Corpus
• Extracted from 81 German novels and their official translations to Basque, alignment hand-revised (Zubillaga & Sanz Villar 2011-2013, UPV-EHU), lemmatized and POS-annotated

Bible Corpus
• Aligned at verse level, lemmatized and POS-annotated

Online Interface

TshwaneLex exports to XML format, the DTD of which is defined inside the application. Our own Perl script transforms this XML to MySQL tables to be installed on the server. The User Interface is based on a single php script (scripts: Lindemann & Nazar 2013).

[Figure: online interface, with labelled areas: Search, Metalanguage Switch, EuDeLex Dictionary Entry, Comments Link, EuDeLex Lemma Browser, Morphology from Wiktionary, Links, Wiktionary and Wikipedia links DE (and EU)]

In the page header, the user is prompted for a German headword to look for. The search result frame is organized in three columns:
1. Found dictionary entry or entries, and matching page titles, redirect pages and Basque translation links from the German editions of Wiktionary and Wikipedia.
2. Lemmabrowser: Links to direct neighbours on the lemma list.
3. Morphological information about the German headword, retrieved from Wiktionary: Inflection. Below, links to German monolingual and bilingual websites.
These three columns function independently, i.e. column (3) will also respond with search results if no entry is found in the local database. Inflected forms are also found in Wiktionary; the word form is then analysed, and a link to the canonical form is provided. As all data from Wikimedia is retrieved on-the-fly together with the local database query, it will always reflect the latest version of these sources.

What can GFL learners expect from a bilingual dictionary?

• A bilingual dictionary should contain introductory and explanatory prefaces in both languages. User interfaces of electronic dictionaries should offer the possibility to switch between both languages as metalanguage, i.e. all instructions apart from the lemma signs, synonyms and translation equivalents should be available in both languages.
• German inflected forms should be searchable; information about the word form and a link to the corresponding canonical form listed as headword should be provided.
• Apart from single word lemma signs, multi-word expressions such as light verb constructions or idiomatic phrases should also be found in a dictionary, together with explanatory instructions.
• A bilingual dictionary article should not only be furnished with insightful instructions for word sense (polysemy) disambiguation and a mapping to suitable Translation Equivalents, but also with instructions related to morphology, syntax, and pragmatics. In the case of a dictionary for GFL learners these are the following:
  ◦ Inflection morphology paradigm (or complete inflection tables) for the German verb, noun or adjective
  ◦ For German verbs, auxiliary selection
  ◦ Instructions related to valency or argument structure realisation
  ◦ German synonyms
  ◦ Frequency data
  ◦ Pragmatics (Register)
  ◦ Collocates (planned)
  ◦ Phrases and Idioms (planned)
  ◦ Translated usage examples from bilingual corpora (in prep.)

References:
IDS (2009). Korpusbasierte Wortgrundformenliste DEREWO, v-40000g-2009-12-31-0.1, mit Benutzerdokumentation.
Lindemann, D. (2013). Bilingual Lexicography and Corpus Methods. The Example of German-Basque as Language Pair. In Procedia - Social and Behavioral Sciences, 95, 249–257.
Lindemann, D. (2014). Zweisprachige Lexikographie des Sprachenpaares Deutsch-Baskisch. In Domínguez Vázquez, M.J., Mollica, F. & Nied, M. (eds.) Zweisprachige Lexikographie im Spannungsfeld zwischen Translation und Didaktik, Lexicographica Series Maior. De Gruyter, pp. 259–281.
Lindemann, D., Saralegi, X., San Vicente, I., Manterola, I. & Nazar, R. (2014). Bilingual Dictionary Drafting. The example of German-Basque, a medium-density language pair. EURALEX, Bolzano.
Lindemann, D. & San Vicente, I. (in prep.). Euskarazko maiztasun lemategia gaurko teknologien ikuspuntutik. Bilbo: UPV-EHU.
De Schryver, G.-M. & Joffe, D. (2005). One database, many dictionaries – varying co(n)text with the dictionary application TshwaneLex. In Proceedings of the 4th ASIALEX conference. Singapore, pp. 54–59.
De Schryver, G.-M., Joffe, D., Joffe, P. & Hillewaert, S. (2010). Do dictionary users really look up frequent words? – on the overestimation of the value of corpus-based lexicography. In Lexikos, 16.

This study has been supported by the project IT665-13, funded by the Basque Government, and by the project EC FP7-SSH-2013-1 ATHEME (613465), funded by the European Commission.

www.ehu.es/eudelex
EURALEX 2014, Bolzano.
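
The microstructure described on this poster (syntactical entities as parent elements, sense groups as their children) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the class names `Entry`, `SyntacticEntity` and `SenseGroup`, their fields, and the `render` function are hypothetical and do not reproduce the TshwaneLex DTD or the EuDeLex database schema. The sample data is a subset of the "aussetzen" entry shown on the poster.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SenseGroup:
    """Child element: one word sense with its Basque translation equivalents."""
    equivalents: List[str]
    valency: Optional[str] = None   # valency formula, e.g. "[+Dat.]"
    domain: Optional[str] = None    # semantic domain label
    synonym: Optional[str] = None   # link to a German synonym


@dataclass
class SyntacticEntity:
    """Parent element: one syntactic use of the headword."""
    flag: str                        # e.g. "Verb Transitiv"
    auxiliary: Optional[str] = None  # shown behind the syntactical flag
    senses: List[SenseGroup] = field(default_factory=list)


@dataclass
class Entry:
    lemma: str
    frequency_rank: Optional[int] = None
    entities: List[SyntacticEntity] = field(default_factory=list)


# Hypothetical display convention: Roman numerals for syntactical entities.
ROMAN = ["I", "II", "III", "IV"]


def render(entry: Entry) -> str:
    """Render an entry as plain text, syntactical entities first, senses below."""
    head = entry.lemma
    if entry.frequency_rank is not None:
        head += f" (rank {entry.frequency_rank})"
    lines = [head]
    for i, entity in enumerate(entry.entities):
        label = ROMAN[i] if i < len(ROMAN) else str(i + 1)
        flag = entity.flag
        if entity.auxiliary:
            flag += f" +{entity.auxiliary}"
        lines.append(f"{label}. {flag}")
        for j, sense in enumerate(entity.senses, start=1):
            parts = []
            if sense.domain:
                parts.append(sense.domain)
            if sense.valency:
                parts.append(sense.valency)
            if sense.synonym:
                parts.append(f"= {sense.synonym}")
            parts.append("; ".join(sense.equivalents))
            lines.append(f"  {j} " + " ".join(parts))
    return "\n".join(lines)


# Subset of the poster's sample entry, used here as illustration data.
SAMPLE = Entry(
    lemma="aussetzen",
    frequency_rank=1948,
    entities=[
        SyntacticEntity("Verb Transitiv", "haben", [
            SenseGroup(["eraginpean jarri"], valency="[+Dat.]"),
            SenseGroup(["eten"], synonym="unterbrechen"),
            SenseGroup(["atzeratu", "geroratu"], domain="RECHTSWISSENSCHAFT"),
        ]),
        SyntacticEntity("Verb Intransitiv", "haben", [
            SenseGroup(["huts egin"]),
        ]),
    ],
)

if __name__ == "__main__":
    print(render(SAMPLE))
```

Keeping the syntactic flag on the parent means that auxiliary selection is stated once per syntactic use, while valency, domain and synonym links stay attached to the individual sense group they disambiguate.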