University of Zagreb
Faculty of Humanities and Social Sciences
Department of Information and Communication Sciences

A framework for consolidating the most important digitized Croatian dictionaries from 1595 to 1945

Petra Bago, research and teaching assistant
prof. Damir Boras, PhD

Information Technology and Journalism 2012 (ITJ 17), 28 May – 01 June, 2012

About the project

• Croatian Dictionary Heritage and Croatian European Identity
• project coordinator: prof. Damir Boras, PhD
• Goals
  - digitization of Croatian bilingual and multilingual dictionaries printed from 1595 to 1945
  - showing that this dictionary heritage is an inseparable part of European identity
  - awareness of Croatian identity in Europe
  - reception of dictionaries in Croatia and Europe
  - the structure of dictionary knowledge

Importance of the project

• scientific, cultural, and political context
• Croatian and European lexicography
• principles of knowledge representation
• enabling access for a (scientific) audience

Most important digitized dictionaries

• Faust Vrančić. Venice, 1595. [5]
• Peter Loderecker. Prague, 1605. [7]
• Jakov Mikalja. Rome, 1649-1651. [3]
• Juraj Habdelić. Graz, 1670. [2]
• Ivan Belostenec. Zagreb, 1740. [2]
• Ardelio Della Bella. Venice, 1728. [3]
• Ivan Mažuranić, Jakov Užarević. Zagreb, 1842. [2]
• Bartol Kašić. Rome, 1599. [2]
• Josip Altman, Stevan Bukl et al. Zagreb, 1881. [2]

Other smaller digitized dictionaries

• Libellus alphabeticus. (probably) Slavonia, 1756. [3]
• Jakov Anton Mikoč. Rijeka, 1852. [2]
• Božo Babić.
  - Trieste, 1870. [2]
  - Kraljevica, 1877. [3]
  - Senj, 1901. [3]
• Ivan Broz. Zagreb, 1893.
• Milan Žepić. Zagreb, 1913. [2]

Dictionaries in the process of digitization

• Andrija Jambrešić. Zagreb, 1742. [4]
• Mirko Divković. Zagreb, 1900. [2]

Languages in selected dictionaries

• Croatian
• Latin
• Italian
• German
• Hungarian

About the framework for consolidating dictionaries
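
As a minimal sketch of how one entry might be encoded in a TEI-based structure of this kind, the Python snippet below builds a single `<entry>` using elements from the TEI P5 dictionary module (`entry`, `form`, `orth`, `sense`, `cit`, `quote`). The helper names `tei` and `build_entry`, the language codes, and the sample words are illustrative assumptions, not data or code from the project, and only the lexical view is modeled.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TEI elements without a prefix (default namespace).
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return the namespace-qualified name of a TEI element (hypothetical helper)."""
    return f"{{{TEI_NS}}}{tag}"


def build_entry(headword, headword_lang, translations):
    """Build a TEI <entry> holding a headword and its translations.

    `translations` maps a language code to an equivalent word.
    """
    entry = ET.Element(tei("entry"))

    # <form type="lemma"><orth xml:lang="..">headword</orth></form>
    form = ET.SubElement(entry, tei("form"), {"type": "lemma"})
    orth = ET.SubElement(form, tei("orth"), {f"{{{XML_NS}}}lang": headword_lang})
    orth.text = headword

    # One <cit type="translation"> per target language, grouped in a <sense>.
    sense = ET.SubElement(entry, tei("sense"))
    for lang, word in translations.items():
        cit = ET.SubElement(
            sense, tei("cit"), {"type": "translation", f"{{{XML_NS}}}lang": lang}
        )
        quote = ET.SubElement(cit, tei("quote"))
        quote.text = word
    return entry


# Illustrative values only; they are not taken from any of the dictionaries.
entry = build_entry("aqua", "la", {"hr": "voda", "it": "acqua"})
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The typographic and editorial views would need additional markup (for example page and line breaks), which this sketch leaves out.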

• a universal structure of all dictionaries using TEI Guidelines
  - eXtensible Markup Language (XML)
  - Unicode
  - the Text Encoding Initiative (TEI)
• contains different views of dictionaries
  - the typographic view – page layout, physical details
  - the editorial view – sequence of tokens
  - the lexical view – the underlying information represented in the dictionary

Selected dictionaries

1. Petr Lodereker. Dictionarium septem diversarum linguarum videlicet Latine, Italice, Dalmatice, Bohemice, Polonice, Germanice et Ungarice. Prague, 1605.
2. Jakov Mikalja. Blago jezika slovinskoga. Rome, 1649.
3. Ardelio della Bella. Dizionario italiano-latino-illirico, 2nd ed. Dubrovnik, 1785.
4. Joakim Stulli. Rjecsosloxje ilirsko (slovinsko)-italiansko-latinsko. Dubrovnik, 1806.
5. Ivan Mažuranić, Jakov Užarević. Njemačko-ilirski slovar. Zagreb, 1842.
6. Dragutin Antun Parčić. Vocabolario croato-italiano, 3rd ed. 1901.
7. Mirko Divković. Latinsko-hrvatski rječnik za škole. Zagreb, 1900.

Steps
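
The pairing and single-search steps in this section might be sketched as follows, assuming each digitized entry has already been reduced to a mapping from language code to word. Grouping on a normalized Latin form is one plausible reading of "Latin as interlingua"; the resource names, the sample entries, and the functions `latin_key`, `pair_by_latin`, and `search` are hypothetical and do not come from the project.

```python
import unicodedata
from collections import defaultdict


def latin_key(word):
    """Case-fold a Latin word and strip combining marks to get a grouping key."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).strip()


def pair_by_latin(resources):
    """Group entries from all resources under their normalized Latin form.

    `resources` maps a resource name to a list of entries; each entry maps
    a language code to a word. Entries without a Latin form are skipped.
    """
    index = defaultdict(list)
    for source, entries in resources.items():
        for entry in entries:
            latin = entry.get("la")
            if latin:
                index[latin_key(latin)].append((source, entry))
    return index


def search(index, query, lang):
    """Return every paired (source, entry) whose group contains `query` in `lang`."""
    wanted = query.casefold()
    hits = []
    for group in index.values():
        if any(e.get(lang, "").casefold() == wanted for _, e in group):
            hits.extend(group)
    return hits


# Illustrative data only; not taken from the dictionaries themselves.
resources = {
    "dictionary_a": [
        {"la": "aqua", "hr": "voda", "it": "acqua"},
        {"la": "terra", "hr": "zemlja"},
    ],
    "dictionary_b": [
        {"la": "Aqua", "de": "Wasser"},
    ],
}

index = pair_by_latin(resources)
hits = search(index, "voda", "hr")
for source, entry in hits:
    print(source, entry)
```

A query in one language here also returns entries from a resource that does not contain that language at all, because both entries share the same Latin key. Language detection, transcription, and the identification of mistakes are not modeled.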

• (semi)automatic language detection
• transcription to modern orthography
  - Latin as interlingua
• pairing elements between resources
• identification of mistakes
• results: a single search and browse engine for all resources

Sources

• Dictionaries – TEI P5. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html (29.05.2012.)
• Extensible Markup Language. http://www.w3.org/XML/ (29.05.2012.)
• Mirko Divković. Latinsko-hrvatski rječnik za škole. Zagreb, 1900.
• Ardelio della Bella. Dizionario italiano-latino-illirico, 2nd ed. Dubrovnik, 1785.
• Petr Lodereker. Dictionarium septem diversarum linguarum videlicet Latine, Italice, Dalmatice, Bohemice, Polonice, Germanice et Ungarice. Prague, 1605.
• Ivan Mažuranić, Jakov Užarević. Njemačko-ilirski slovar. Zagreb, 1842.
• Jakov Mikalja. Blago jezika slovinskoga. Rome, 1649.
• Dragutin Antun Parčić. Vocabolario croato-italiano, 3rd ed. 1901.
• Joakim Stulli. Rjecsosloxje ilirsko (slovinsko)-italiansko-latinsko. Dubrovnik, 1806.
• The Unicode Consortium. http://unicode.org/ (29.05.2012.)