Blowing in the wind: Using ‘North Wind and the Sun’ texts to sample phoneme inventories Louise Baird ARC Centre of Excellence for the Dynamics of Language, The Australian National University
[email protected] Nicholas Evans ARC Centre of Excellence for the Dynamics of Language, The Australian National University
[email protected] Simon J. Greenhill ARC Centre of Excellence for the Dynamics of Language, The Australian National University & Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History
[email protected] Language documentation faces a persistent and pervasive problem: How much material is enough to represent a language fully? How much text would we need to sample the full phoneme inventory of a language? In the phonetic/phonemic domain, what proportion of the phoneme inventory can we expect to sample in a text of a given length? Answering these questions in a quantifiable way is tricky, but asking them is necessary. The cumulative col- lection of Illustrative Texts published in the Illustration series in this journal over more than four decades (mostly renditions of the ‘North Wind and the Sun’) gives us an ideal dataset for pursuing these questions. Here we investigate a tractable subset of the above questions, namely: What proportion of a language’s phoneme inventory do these texts enable us to recover, in the minimal sense of having at least one allophone of each phoneme? We find that, even with this low bar, only three languages (Modern Greek, Shipibo and the Treger dialect of Breton) attest all phonemes in these texts.