Lects in Helsinki Finnish - a probabilistic component modeling approach Olli Kuparinen, Tampere University Jaakko Peltonen, Tampere University Liisa Mustanoja, Tampere University Unni Leino, Tampere University Jenni Santaharju, University of Helsinki Competing interests: The authors declare none. Olli Kuparinen Faculty of Information Technology and Communication Sciences Kanslerinrinne 1 (Pinni B) FI-33100 Tampere
[email protected] Short title: Lects in Helsinki Finnish Accepted for publication in Language Variation and Change. Lects in Helsinki Finnish Lects in Helsinki Finnish - a probabilistic component modeling approach Abstract This article examines Finnish lects spoken in Helsinki from the 1970s to the 2010s with a probabilistic model called Latent Dirichlet Allocation. The model searches for underlying components based on the linguistic features used in the interviews. Several coherent lects were discovered as components in the data, which counters the results of previous studies that report only weak co-variation between features that are assumed to present the same lect. The speakers, however, are not categorical in their linguistic behavior and tend to use more than one lect in their speech. This implies that the lects should not be considered in parallel with seemingly uniform linguistic systems such as languages, but as partial systems that constitute a network. Keywords: lect, coherence, real time change, Finnish, Latent Dirichlet Allocation The research project was funded by the Kone Foundation. Lects in Helsinki Finnish 1 The coherence of linguistic varieties has been increasingly questioned in recent years, for example in studies on ethnolects (Boyd & Fraurud, 2010; Wolfram, 2007), dialects (Gregersen & Pharao, 2016), sociolects (Guy, 2013) and registers (Geeraerts, 2010).