Sardinian on Facebook
Total Page:16
File Type:pdf, Size:1020Kb
Sardinian on Facebook: Analysing Diatopic Varieties through Translated Lexical Lists Irene Russo Simone Pisano ILC CNR Universita` Guglielmo Marconi Pisa Roma [email protected] [email protected] Claudia Soria ILC CNR Pisa [email protected] Abstract usually very poorly represented digitally (Soria, 2016). English. Presence of regional and minor- Since poor digital representation of regional and ity languages over digital media is an in- minority languages further prevents their usability dicator of their vitality. In this paper, we on digital media and devices, it is extremely im- want to investigate quantitative aspects of portant to enhance every bottom-up effort that can the use on Facebook of the Sardinian lan- boost the quantity of available digital content. In guage. In particular, we want to focus fact, if the perception of the marginal role and lim- on the co-existence of diatopic varieties. ited applicability of RMLs persists, their attrac- We extracted linguistic data from public tiveness diminishes. pages and, through the translation of the An increase in quantity of digital content avail- most frequent words, we find out similari- able online represents today an opportunity for re- ties and differences between varieties. gional and minority languages. Online speakers Italiano. La presenza e l’ uso delle lingue can make visible the existence of a community that regionali e minoritarie sui mezzi digitali e` uses the language to interact; they can use online un indicatore della loro vitalita.` In questo communication to converge toward a standard and lavoro vogliamo concentrarci sugli aspetti they can instruct less skilled speakers toward bet- quantitativi del sardo usato su Facebook. ter mastering of the rules of the language, espe- In particolare, vogliamo analizzare le va- cially when the language is not formally included rieta` diatopiche estraendo i dati linguis- in education. From the perspective of computa- tici dalle pagine pubbliche. Mediante la tional linguistics, the presence of digital content traduzione delle parole piu` frequenti ab- written in RMLs means that corpora can be built biamo trovato similarita` e differenze tra le for them and basic tools (lemmatizers, spell check- varieta.` ers, lexicons etc.) can be developed. The presence of RMLs over digital media and their usability through digital devices is often limited to 1 Introduction instances of digital activism and/or by means of Everyday life makes an increasingly extensive use cultural initiatives focused on the preservation of of digital devices that involve language use; for cultural heritage. this reason, usability of a language over digital de- In this paper we promote the first study we are vices is a sign for that language of being mod- aware of about the use on social networks (more ern, relevant to current lifestyles and capable of specifically, Facebook) of Sardinian, an Italian facing the needs of the XXI century. A positive minority language characterised by the coexist- correlation between presence in new technologies ence of varieties and the difficulties for the pro- and better appreciation of a language has been re- moted standard to emerge as unifying factor. Our peatedly observed in the literature, see for instance starting hypothesis concerned the vitality on so- (Eisenlohr, 2004) and (Crystal, 2010). Regional cial networks of a language that is mainly spoken. and minority languages (RMLs henceforth) are With the help of a Sardinian linguist, we identi- fied a small set of FB public groups where specific This proposal was sharply criticised by some sec- varieties of Sardinian are chosen as their main lan- tors of the public opinion and strong disapproval guage plus groups where generic, not further de- came even from a part of native speakers, espe- fined Sardinian is used to communicate. We ex- cially from the South, who considered this stan- tracted messages from these pages and created a dard too much different from the language they frequency lexicon for each variety. The most fre- spoke. It is a fact that it never became a model quent 150 words have been translated by a Sar- of official Sardinian. dophone expert linguist with the aim of finding In 2006, another model of written language was differences and commonalities between varieties. made official by the Regional Committee resolu- This preliminary analysis is the first step toward tion n◦16/14. This standard, called LSC (Limba the use of computational linguistics methodolo- Sarda Comuna, Common Sardinian Language)2 gies in the promotion of a standard for Sardinian made the effort of taking into account also the dia- based on quantitative data. lects of the transition region of the center men- tioned earlier. Although regional administration 2 Sardinian today: Main Varieties and recommended its use for written public documents Standardization Efforts it is still reluctantly accepted by some speakers, Sardinian is an autonomous Romance language who perceive it as too distant from the varieties spoken in the island of Sardinia. According to they speak. (Lupinu, 2007) it is known by approximately In 2010, the Provincial Council of Cagliari took a different course choosing with the Provincial 68,4% of the population of the island. Ethno- ◦ 3 logue1 lists four varieties for Sardinian: North- Committee resolution n 17 a linguistic norm western Sardinian or Sassarese (100,000 speakers based on literary language of Southern poets and ca.), Campidanese (500,000 speakers ca.), Central writers, in order to draw up acts, documents and Sardinian or Logudorese (500,000 speakers ca.) even textbooks for primary children. and Gallurese (100.000 speakers ca.) All these standardization efforts, politically The most important differences from a lexical, guided or emerged bottom-up, clearly show that phonological and morphological point of view Sardinian speakers are aware of the role of stan- within Sardinian can be found between Central- dard orthography and grammar for the vitality and Southern and Central-Northern dialects. the survival of their language. On the one hand, Scholars use to divide Sardinian in two main vari- they want to promote the idea of a unique lan- eties: Logudorese and Campidanese, the first one guage as a matter of identity; on the other, they spoken in the North and in the center of the island dont want to lose local peculiarities by adopting and the second one spoken in the South. standard rules that inevitably hide some local dif- Logudorese and Campidanese can be related to ferences. two different pre-existing written standards: the Social media are widely used by Sardinian speak- so-called Logudorese (or Logudorese illustre) was ers and they represent an interesting scenario for used for the first time in a short poem at the end of written but informal use of the language. An in- the XV century (Manca, 2002), whereas what is depth analysis of the type of language used by known as Campidanese was the language of some Sardinian speakers on social media is still miss- religious plays at the end of the XVII Century (De ing. Certainly, use of everyday Sardinian in spo- Martini Abdullah Luca, 2006). ken and written (online) informal communication, Today, Sardinian lacks of a generally agreed stan- is a sign of vitality of the language. Interaction dard variety, although standardization efforts char- is a powerful instrument for standardization, and acterised the recent history of the Region. the interactive modality offered by social media The first attempt to introduce a written system could reveal the emergence of coordination strate- based on an integration of phonetic, lexical and 2Regione Autonoma della Sardegna (2006), Limba Sarda morphological features of modern Sardinian vari- Comuna. Norme linguistiche di riferimento a carattere sper- eties was made in 2001, when the basic rules of imentale per la lingua scritta dellAmministrazione regionale, Cagliari, Regione Autonoma della Sardegna. LSU (Limba Sarda Unificada,Unified Sardinian 3Arregulas` po sortografia, sa fonetica,` sa morfologia e su Language) were presented (Blasco Ferrer, 2001). fueddariu` de sa bariedadi Campidanesa de sa l`ıngua sarda (Rules for orthography, phonetic, morphology and the vocab- 1www.ethnologue.com ulary of Campidanese variety of Sardinian language) gies toward a standard in speakers community as the four sets of Facebook groups analysed are re- a natural need (Burghardt, 2016). To check this ported. In Table 3 each possible pair of varieties is hypothesis, we started to analyse the use of dif- compared by checking the overlapping of trans- ferent varieties of Sardinian that is being made on lations into Italian. The second column reports Facebook. According to the preliminary data of a how many Italian types are in common between recent survey, Facebook is the social media that is two varieties. For example, among the most fre- most used by Sardinian speakers, and where Sar- quent 150 LSC word forms and the 150 most fre- dinian is actively and extensively used4. quent Sardu word forms, 61 words have the same Italian translation. The third column contains the 3 Data Extraction and Analysis number of words with the same word forms in the We selected public pages and communities on two varieties compared, e.g. the Italian adjective Facebook that are rich in content and interactions grande has the same word form (mannu) in Nu- between users. With the help of a Sardinian lin- goresu and Campidanese. This is a first attempt to guist we identified four mutually exclusive sets: understand if two varieties are close orthograph- ically, considering the orthographic forms of the • pages where people communicate in LSC; analysed words. We also report the number of con- tent words found in each pair because we believe • pages where people communicate in Sar- that in the future the overlapping at orthographic dinian without further specification of the level should be analysed taking into account the chosen variety; distinction between content and function words.