On the creation of a pronunciation dictionary for Hungarian

Stephen M. Grimes
August 2007

Abstract

This report describes the creation of a pronunciation dictionary and phonological lexicon for Hungarian, intended to aid linguistic research on Hungarian phonology and phonotactics. The pronunciation dictionary was created by transforming orthographic forms into pronunciation representations, taking advantage of the systematic deviations between Hungarian orthography and pronunciation. It is argued that the “automated” creation of such a dictionary can reasonably be expected to be accurate, because Hungarian orthography is relatively close to actual pronunciation. This document discusses goals and standards for creating a Hungarian pronunciation dictionary and highlights each phonological change that creates a mismatch between orthography and pronunciation. Future developments and additions to the current dictionary are suggested, as are strategies for evaluating its quality. Finally, potential applications to linguistic research are discussed.

1 Introduction

While students of English quickly learn that English spelling is by no means consistent, many Hungarians believe that the Hungarian alphabet is completely phonetic. Here, a phonetic alphabet is one with a one-to-one mapping between symbol and sound. It is easily demonstrated by counterexample that Hungarian orthography is not phonetic; in fact, several types of discrepancy between orthography and pronunciation exist. Consider the word /szabadság/ [sabatʃ:a:g] ‘freedom, liberty’, in which no fewer than four such discrepancies can be identified:

(1) a.