Survey on European and Brazilian Portuguese Speech and Text Corpora

Total Page:16

File Type:pdf, Size:1020Kb

Survey on European and Brazilian Portuguese Speech and Text Corpora Carla Simões, [email protected] Survey on European and Brazilian Portuguese Speech and Text Corpora 1 Portuguese Corpora • Where can we find it? – World Wide Web • www.elra.info , European Language Resources Association • www.elda.org, Evaluations and Language resources Distribution Agency • www.ldc.upenn.edu, Linguistic Data Consortium • www.iltec.pt, Instituto de Linguística Teórica e Computacional • www.clul.ul.pt, Centro de Linguística da Universidade de Lisboa • www.l2f.inesc-id.pt, Laboratório de Sistemas de Língua Falada, INESC • www.linguateca.pt, Language resource center for Portuguese • devoted.to/corpora, Bookmarks for Corpus-based Linguists • www.appen.com.au, Appen Speech Technologies 2 ELRA www.elra.info 3 European Language Resources Association • Spoken corpus • Written corpus – Desktop/microphone – Monolingual Lexicon • C-ORAL-ROM - Integrated • LusoLEX European Portuguese reference corpora for spoken Lexicon romance languages • BrasiLEX Brazilian Portuguese • FASiL Portuguese “fasil-pt” corpus lexicon • Portuguese Speecon database • PAROLE Portuguese Lexicon • GlobalPhone Portuguese • LABEL-LEX (MW) (Brazilian) • LABEL-LEX (SW) – Telephony – Written corpora • Portuguese SpeechDat(M) • PAROLE Portuguese Corpus database • ECI/MCI (European Corpus • Portuguese SpeechDat(II) FDB- Initiative/Multilingual Corpus I) 4000 • MLCC - Multilingual and Parallel Corpora 4 Spoken corpus – Desktop/microphone • C-ORAL-ROM - Integrated reference corpora for • Catalog Reference : spoken romance languages S0172 – The corpus consists of four comparable recording • Source Channel : collections of Italian, French, Portuguese and Spanish Microphone, Radio, spontaneous speech sessions (around 300,000 words Telephone, Television for each Language) • Members Prices – Academic - Commercial – It provides the acoustic source of each session together 10000.00 EUR with the following main annotations: – Academic - Research • The orthographic transcription, in CHAT format, enriched 1500.00 EUR with the tagging of terminal and non terminal prosodic breaks – Commercial - Commercial 10000.00 EUR • Session metadata – Commercial - Research • The text to speech synchronization, in WIN PITCH CORPUS 10000.00 EUR format, based on the alignment of each transcribed utterance • Non Member Prices : – Package: – Academic - Commercial 20000.00 EUR • uncompressed .WAV files (Win PCM: 22,050 hz; 16 bit) – Academic - Research • Transcription files in .TXT and .XML format 3000.00 EUR • transcription files with PoS tagging in .TXT files – Commercial - Commercial 20000.00 EUR • The frequency list of lemmas for each language collection in – Commercial - Research TXT files 20000.00 EUR • Measurements of spoken language variability in EXCEL files 5 Menu Spoken corpus – Desktop/microphone • FASiL Portuguese unimodal “fasil-pt” corpus – The corpus was collected in the context of the FASiL • Members Prices project, EU FP5 IST-2001-38685 (http://www.fasil.co.uk), – Academic - Commercial as a wizard-of-oz experiment 8000.00 Academic - Research 4000.00 EUR – Commercial - – There are sound recordings of subject and wizard. A Commercial 8000.00 total of 70 subjects were recorded EUR – Commercial - Research 8000.00 EUR – The woz experiment is about the voice interaction with a • Non Member Prices Virtual Personal Assistant (VPA) for an email, calendar – Academic - Commercial and contacts task 10000.00 EUR – Academic - Research – .wav files (u-law) for audio, plain ASCII text (.txt) for 8000.00 EUR transcriptions – Commercial - • Catalog Reference : S0174-02 Commercial 10000.00 EUR • Distribution medium : CD-ROM, DVD – Commercial - Research 10000.00 EUR • Also available the FASiL combined unimodal “fasil- all” corpus and the FASiL multimodal “fasil-mm” corpus, where the subjects were recorded in three project languages: Swedish, Portuguese and English • Demo 6 Menu Spoken corpus – Desktop/microphone • Portuguese Speecon database • Catalog Reference : – It’s a Portuguese speech corpus recorded in S0180 Portugal • Source Channel : Microphone – Recorded at 16 KHz, 16 bit – linear – 87 hours • Members Prices – Microphone: CloseTalk/ FarTalk – Academic - Commercial 67000.00 – Divided in two sets: – Academic - Research 50000.00 EUR • The first set comprises the recordings of 553 – Commercial - Commercial adult Portuguese speakers (266 males, 287 67000.00 – Commercial - Research females), recorded over 4 microphone 67000.00 channels in 4 recording environments (office, entertainment, car, public place) • The second set comprises the recordings of • Non Member Prices 52 child Portuguese speakers (19 boys, 33 – Academic - Commercial girls), recorded over 4 microphone channels in 75000.00 EUR 1 recording environment (children room) – Academic - Research 60000.00 EUR • This database is partitioned into 29 DVDs – Commercial - Commercial (first set) and 4 DVDs (second set) 75000.00 EUR – Commercial - Research 75000.00 EUR 7 Menu Spoken corpus – Desktop/microphone • GlobalPhone Portuguese (Brazilian) • Catalog Reference : S0201 – provides transcribed speech data for the • Info development and evaluation of large vocabulary – http://www.cs.cmu.edu/~ continuous speech recognition systems in the most tanja/GlobalPhone widespread languages of the world • Distribution medium : DVD – The Portuguese (Brazilian) corpus was produced using the Folha de São Paulo newspaper • Members Prices – Academic - Commercial – The entire GlobalPhone corpus contains over 300 3000.00 hours of speech spoken by more than 1500 native – Academic - Research 600.00 adult speakers – Commercial - Commercial 3000.00 EUR – It contains recordings of 102 speakers (54 males, 48 – Commercial - Research females with different age distribution) recorded in 3000.00 Porto Velho and Sao Paulo, Brazil • Non Member Prices – Academic - Commercial – In each language about 100 adult native speakers 3600.00 Academic - were asked to read 100 sentences Research 700.00 – Commercial - Commercial – About 2 Gb for each language 3600.00 EUR – close-speaking microphone, PCM encoding, – Commercial - Research mono quality, 16-bit quantization, and 16 kHz 3600.00 sampling rate 8 Menu Spoken corpus – Telephony • Portuguese SpeechDat(M) database • An INESC project under a subcontract with Portugal • Catalog Reference : S0068 Telecom – first phase • Source Channel : Telephone – Contains the recordings of 1,001 speakers (453 males, • Distribution : 548 females). This speech database was collected by CD-ROM Portugal Telecom within the European SpeechDat • Members prices : project – Academic - Commercial – It has a good representation of many regional accents, 14000.00 and age distribution – EURAcademic - Research – It is also included a pronunciation lexicon with a 11000.00 phonemic transcription in SAMPA – EURCommercial - Commercial 14000.00 – 8 kHz, 8-bit A-law – EURCommercial - Research – Each speaker uttered the following items: 14000.00 EUR • natural numbers • digits • Non member prices : • money amounts – Academic - Commercial • dates 20000.00 • time phrase • application words – EURAcademic - Research • spelled-out words 14000.00 • word spotting phrases – EURCommercial - • sentences Commercial 20000.00 • yes/no questions – EURCommercial - Research • spontaneous date 20000.00 EUR • spontaneous time • region name 9 Menu Spoken corpus – Telephony • Portuguese SpeechDat(II) FDB-4000 • Catalog Reference : S0092 • An INESC project under a subcontract with Portugal Telecom – second phase • Source Channel : – Comprises 4027 Portuguese speakers (1861 males, 2166 Telephone females) recorded over the Portuguese fixed telephone • Distribution : network. It has a good representation of many regional CD-ROM accents, and age distribution – 8-bit 8 kHz A-law • Members prices : – A pronunciation lexicon with a phonemic transcription in – Academic - Commercial SAMPA 40000.00 EUR – Each speaker uttered different items: – Academic - Research - digits, numbers 28000.00 EUR - currency money amount – Commercial - Commercial - dates 40000.00 - time phrases : – Commercial - Research - spelled words : 40000.00 EUR - directory assistance utterances - yes/no questions • Non member prices : - application words – Academic - Commercial - phonetically rich words 56000.00 EUR - phonetically rich sentences – Academic - Research 48000.00 EUR • Samples - http://speechdat.phonetik.uni- – Commercial - Commercial muenchen.de/speechdt/speechDB/FIXED1PT/HTML/index. 56000.00 html – Commercial - Research 56000.00 EUR 10 Menu Written corpus - Monolingual Lexicon • LusoLEX European Portuguese Lexicon and BrasiLEX Brazilian Portuguese lexicon – Available at Microsoft Language Resources • PAROLE Portuguese Lexicon – It’s constituted by 20 000 entries morpho-syntactically and syntactically encoded – Distribution medium : CD-ROM • Members Prices – Academic - Commercial 10500.00 EUR – Academic - Research 1400.00 EUR – Commercial - Commercial 10500.00 EUR – Commercial - Research 3500.00 EUR • Non Member Prices – Academic - Commercial 15000.00 EUR – Academic - Research 2000.00 EUR – Commercial - Commercial 15000.00 EUR – Commercial - Research 5000.00 EUR 11 Menu Written corpus – Monolingual Lexicon • LABEL-LEX (MW) • Members Prices – It is a Portuguese formalized lexicon, – Academic - Commercial 10000.00 EUR containing 88 619 multiword lexical – Academic - Research 3000.00 EUR units (formally, sequences of simple – Commercial - Commercial 10000.00 EUR words) – Commercial - Research 10000.00 EUR – Often, are used to express ideas and • Non Member Prices concepts – Academic - Commercial 15000.00
Recommended publications
  • P TRANSCRIPTION of the INTONATION of NORTHEASTERN
    TRANSCRIPTION OF THE INTONATION OF NORTHEASTERN BRAZILIAN PORTUGUESE by Meghan Dabkowski B.A., University of Pittsburgh, 2002 Submitted to the Graduate Faculty of The Dietrich School of Arts and Sciences in partial fulfillment of the requirements for the degree of Master of Arts University of Pittsburgh 2012 p UNIVERSITY OF PITTSBURGH The Kenneth P. Dietrich School of Arts and Sciences This thesis was presented by Meghan Dabkowski It was defended on April 9, 2012 and approved by Shelome Gooden, Associate Professor, Department of Linguistics Alan Juffs, Associate Professor, Department of Linguistics Committee Chair: Marta Ortega-Llebaria, Assistant Professor, Department of Linguistics ii Copyright © by Meghan Dabkowski 2012 iii TRANSCRIPTION OF THE INTONATION OF NORTHEASTERN BRAZILIAN PORTUGUESE Meghan Dabkowski, M.A. University of Pittsburgh, 2012 Among dialects of Brazilian Portuguese (henceforth BP), the variety spoken in the Northeast region of Brazil is considered to display a distinctive use of intonation. In attempt to understand this characterization, this thesis presents an analysis of the intonational phonology of five main phrase types in Northeastern BP (henceforth NEBP): declarative statements, absolute questions, wh- questions, echo questions, and imperative statements. Contrastive focus, enumeration, and disjunction were also investigated. Participants were 5 female natives of the region currently residing in Pittsburgh, PA. Utterances were elicited using an intonation questionnaire designed to evoke everyday situations (Prieto and Roseano, 2010), which was adapted to BP by the author. This descriptive analysis is couched within Autosegmental Metrical theory, which posits a separate level of linear organization for the pitch track of an utterance, autonomous from the text, or segmental information, but associated with it via tonal alignment with metrically strong syllables and phrase edges.
    [Show full text]
  • Language Contact and Variation in Cape Verde and São Tomé and Príncipe Nélia Alexandre & Rita Gonçalves (University Of
    ALEXANDRE, Nélia & GONÇALVES, Rita. (2018). Language contact and variation in Cape Verde and São Tome and Principe. In L. Álvarez; P. Gonçalves & J. Avelar (eds.). The Portu- guese Language Continuum in Africa and Brazil. 237-265, Amsterdam: John Benjamins Publ. Language contact and variation in Cape Verde and São Tomé and Príncipe Nélia Alexandre1 & Rita Gonçalves (University of Lisbon, Center of Linguistics) Abstract Both Cape Verdean Portuguese (CVP) and Santomean Portuguese (STP) are in contact with Portuguese-related Creole languages formed during the 15th and 16th centuries: Capeverdean, in Cape Verde, and Santome, Angolar and Lung’Ie, in São Tomé and Príncipe. Despite this, Portuguese is the only offi- cial language of the two archipelagos. The goal of this chapter is twofold. First, we aim to compare and dis- cuss corpora data from CVP and STP underlining the impact of socio- historical factors in some morphosyntactic aspects of these Portuguese varie- ties’ grammars. Second, we will show that the convergence and divergence observed is not only driven by the language contact situation but also by inter- nal change mechanisms. 1. Introduction Cape Verde and São Tomé and Príncipe are two islands states near the west coast of Africa. Both are former Portuguese colonies which have Portuguese- related Creole languages formed during the 15th and 16th centuries: Capeverd- ean, in the former, and three Gulf of Guinea creoles, with a major role of San- tome, in the latter. In spite of this, Portuguese was chosen as the only official language after their independence in 1975. These historical similarities sug- 1 This research was funded by the Portuguese National Science Foundation (FCT) project UID/LIN/00214/2013.
    [Show full text]
  • Dialects of Spanish and Portuguese
    30 Dialects of Spanish and Portuguese JOHN M. LIPSKI 30.1 Basic Facts 30.1.1 Historical Development Spanish and Portuguese are closely related Ibero‐Romance languages whose origins can be traced to the expansion of the Latin‐speaking Roman Empire to the Iberian Peninsula; the divergence of Spanish and Portuguese began around the ninth century. Starting around 1500, both languages entered a period of global colonial expansion, giving rise to new vari- eties in the Americas and elsewhere. Sources for the development of Spanish and Portuguese include Lloyd (1987), Penny (2000, 2002), and Pharies (2007). Specific to Portuguese are fea- tures such as the retention of the seven‐vowel system of Vulgar Latin, elision of intervocalic /l/ and /n/ and the creation of nasal vowels and diphthongs, the creation of a “personal” infinitive (inflected for person and number), and retention of future subjunctive and pluper- fect indicative tenses. Spanish, essentially evolved from early Castilian and other western Ibero‐Romance dialects, is characterized by loss of Latin word‐initial /f‐/, the diphthongiza- tion of Latin tonic /ɛ/ and /ɔ/, palatalization of initial C + L clusters to /ʎ/, a complex series of changes to the sibilant consonants including devoicing and the shift of /ʃ/ to /x/, and many innovations in the pronominal system. 30.1.2 The Spanish Language Worldwide Reference grammars of Spanish include Bosque (1999a), Butt and Benjamin (2011), and Real Academia Española (2009–2011). The number of native or near‐native Spanish speakers in the world is estimated to be around 500 million. In Europe, Spanish is the official language of Spain, a quasi‐official language of Andorra and the main vernacular language of Gibraltar; it is also spoken in adjacent parts of Morocco and in Western Sahara, a former Spanish colony.
    [Show full text]
  • The Use of Third Person Accusative Pronouns in Spoken Brazilian Portuguese: an Analysis of Different Tv Genres
    THE USE OF THIRD PERSON ACCUSATIVE PRONOUNS IN SPOKEN BRAZILIAN PORTUGUESE: AN ANALYSIS OF DIFFERENT TV GENRES by Flávia Stocco Garcia A Thesis submitted to the Faculty of Graduate Studies of The University of Manitoba in partial fulfillment of the requirements of the degree of MASTER OF ARTS Department of Linguistics University of Manitoba Winnipeg Copyright © 2015 by Flávia Stocco Garcia ABSTRACT This thesis presents an analysis of third person accusative pronouns in Brazilian Portuguese. With the aim to analyze the variation between the use of standard (prescribed by normative grammar) and non-standard pronouns found in oral language, I gathered data from three kinds of TV show (news, non-scripted and soap-opera) in order to determine which form of pronoun is more common and if there is any linguistic and/or sociolinguistic factors that will influence on their usage. Based on data collected, I demonstrate that non-standard forms are favored in general and that the rules prescribed by normative grammar involving standard forms are only followed in specific contexts. Among all the variables considered for the analysis, the ones that showed to be significant were the kind of show, the context of the utterance, the socio-economic status of the speaker and verbs in the infinitive. Considering my results, I provide a discussion regarding to which extent the distribution of the 3rd-person pronouns on TV reflect their use by Brazilians and a brief discussion of other issues related to my findings conclude this work. 2 ACKNOWLEDGEMENTS First and foremost, I would like to express my gratitude to my advisor, Verónica Loureiro-Rodríguez, for all her help during the completion of this work.
    [Show full text]
  • PDF 1.3 MB(Link Is External)
    MEMORIA DE ACTIVIDADES 2020 Instituto da Lingua Galega Edita Instituto da Lingua Galega Universidade de Santiago de Compostela Praza da Universidade, 4 15782 Santiago de Compostela A Coruña Correo electrónico: [email protected] Páxina web: http://ilg.usc.gal/ Teléfonos +34 8818 12815 +34 8818 12802 Fax +34 81572770 Maquetación Raquel Vila-Amado A portada foi deseñada cunha imaxe de Harryarts (www.freepik.es) Universidade de Santiago de Compostela, maio de 2021 ÍNDICE LIMIAR ........................................................................................................................................... 3 RELACIÓN DE LIÑAS E PROXECTOS DE INVESTIGACIÓN ................................................................ 6 A. PROXECTOS PROMOVIDOS NO SEO DO ILG.......................................................................... 6 B. PROXECTOS PROMOVIDOS POR OUTRAS INSTITUCIÓNS .................................................. 26 CONSULTAS DA PÁXINA WEB DO CENTRO .................................................................................29 ORGANIZACIÓN DE CURSOS, CONFERENCIAS E ENCONTROS LINGÜÍSTICOS ............................. 30 A ACTIVIDADE DO PERSOAL INVESTIGADOR ............................................................................... 32 A. CONFERENCIAS, COMUNICACIÓNS E PÓSTERS EN CONGRESOS E ENCONTROS CIENTÍFICOS .................................................................................................................... 32 B. CURSOS E SEMINARIOS ..................................................................................................
    [Show full text]
  • Towards an Outline of Central and Southern Portugal Potamonymy Para Um Perfil Da Potamonímia Do Centro E Do Sul De Portugal
    DOI: 10.14393/DL46-v15n2a2021-10 Towards an outline of central and southern Portugal potamonymy Para um perfil da potamonímia do centro e do sul de Portugal Carlos ROCHA* ABSTRACT: Within the set of river RESUMO: Em relação ao conjunto names of Portugal, those of the northwest onomástico formado pelos nomes dos rios usually stand out because of their (potamónimos) de Portugal, destacam-se archaism. However, rivers located to the normalmente os do noroeste pelo arcaísmo. south of the Mondego basin and the Contudo, os potamónimos localizados a sul Central System are no less interesting, as da bacia do Mondego e do Sistema Central they reveal great etymological não são menos interessantes, pois revelam heterogeneity, ranging from a few that fit grande heterogeneidade etimológica, into the pre-Latin substrates to several abrangendo desde um pequeno grupo names that underwent Arabization enquadrável nos substratos pré-latinos a um between the 8th and 13th centuries. reportório alterado pela arabização ocorrida Several items also stand out, which are na região entre os séculos VIII a XIII. more recent and result from the Sobressai ainda um largo número de nomes expansion of the Galician-Portuguese de criação mais recente, criados pela dialects to the south, in the context of the implantação a sul dos dialetos galego- medieval Christian conquest and portugueses, assim configurando um colonization. This article, which draws on processo de colonização linguística previous research (ROCHA, 2017), sets decorrente da conquista cristã medieval. O out an outline of the central and southern presente trabalho, baseado noutro anterior Portuguese potamonym by classifying (ROCHA, 2017), propõe definir um perfil da each item etymologically and ascribing potamonímia centro-meridional portuguesa them to the stratigraphy and the history por meio da classificação etimológica de of transmission of the current toponymy cada item e do seu enquadramento tanto na in the territory in point.
    [Show full text]
  • Lisbon Stories: Migration, Community and Intercultural Relations in Contemporary Cinema and Literature FERNANDO ARENAS University of Michigan
    Lisbon Stories: Migration, Community and Intercultural Relations in Contemporary Cinema and Literature FERNANDO ARENAS University of Michigan Abstract: This essay focuses on immigration in contemporary Portugal and social phenomena related to community, intercultural relations, and citizenship as represented in acclaimed cinematic and literary texts. The objects of analysis are the Brazilian film Terra estrangeira (1996) by directors Walter Salles and Daniela Thomas, the Brazilian novel Estive em Lisboa e lembrei de você (2009) by Luiz Ruffato, and the Portuguese film Viagem a Portugal (2011) by Sérgio Tréfaut. Keywords: Portugal; cinema; novel; Walter Salles; Luiz Ruffato; Sérgio Tréfaut Migration played a significant role in the maritime-colonial expansion of Western European powers since the early modern era, through the forced migration of millions of Africans and a mixture of voluntary and forced migration of Europeans across the Atlantic to the Americas and Africa. Such was the case throughout the history of the Portuguese empire, as well as in postcolonial Brazil, Cape Verde, and Portugal. In the case of Brazil, successive waves of forced African and voluntary European migration (as well that from East Asia and the Middle East) constitute the foundation of the modern nation- state, alongside the native Amerindian population. Cape Verde is a diasporized nation-family par excellence, in that half of its population lives outside the 144 Journal of Lusophone Studies 2.1 (Spring 2017) archipelago. The Portuguese themselves have been migrating for more than five centuries. Throughout a long history of outward migration, however, there was an important interval shortly after Portugal’s entry into the European Union, between the late 1980s and early 2000s, that witnessed the modernization and expansion of the Portuguese economy.
    [Show full text]
  • Estudos Em Variação Linguística Nas Línguas Românicas 1
    Estudos em variação linguística nas línguas românicas 1 Estudos em variação linguística nas línguas românicas Coordenação Lurdes de Castro Moutinho Rosa Lídia Coimbra Elisa Fernández Rei Xulio Sousa Alberto Gómez Bautista 2 Estudos em variação linguística nas línguas românicas FICHA TÉCNICA TÍTULO Estudos em variação linguística nas línguas românicas EDITORES Lurdes de Castro Moutinho, Rosa Lídia Coimbra, Elisa Fernández Rei, Xulio Sousa, Alberto Gómez Bautista COMISSÃO CIENTÍFICA DO VOLUME Alexsandro Rodrigues Meireles (Universidade Federal do Espírito Santo), Antonio Romano (Universidade de Turim), Francisco Dubert (Universidade de Santiago de Compostela), Helena Rebelo (Universidade de Madeira), Izabel Christine Seara (Universidade Federal de Santa Catarina), Leandra Batista Antunes (Universidade Federal de Ouro Preto), Maria Teresa Roberto (Universidade de Aveiro), María Victoria Navas Sánchez-Élez (Universidade Complutense de Madrid), Paolo Mairano (Université de Lille), Regina Célia Fernandes Cruz (Universidade Federal do Pará), Rosa Maria Lima (Escola Superior de Educação Paula Frassinetti), Sandra Madureira (Pontifícia Universidade Católica de São Paulo), Valentina De Iacovo (Universidade de Turim), Vanessa Gonzaga Nunes (Universidade Federal de Santa Catarina) e os editores desta publicação. EDITORA UA Editora Universidade de Aveiro Serviços de Biblioteca, Informação Documental e Museologia 1.ª edição – julho 2019 ISBN 978-972-789-600-4 IMAGEM DA CAPA Pixabay Estudos em variação linguística nas línguas românicas 3 APOIOS AO EVENTO 4 Estudos em variação linguística nas línguas românicas Les études que nous menons ne sont jamais achevées, comme n’est jamais achevée l’analyse de langues que les hommes parlent, depuis des centaines de milliers d’années, et de l’univers culturel qu’elles véhiculent et qui nous émerveille chaque jour tout le long de notre existence.
    [Show full text]
  • Bigorna – a Toolkit for Orthography Migration Challenges
    Bigorna – a toolkit for orthography migration challenges José João Almeida1, André Santos1, Alberto Simões2 1 Departamento de Informática, Universidade do Minho, Portugal 2 Escola Superior de Estudos Industriais e de Gestão, Instituto Politécnico do Porto, Portugal [email protected], [email protected], [email protected] Abstract Languages are born, evolve and, eventually, die. During this evolution their spelling rules (and sometimes the syntactic and semantic ones) change, putting old documents out of use. In Portugal, a pair of political agreements with Brazil forced relevant changes on the way the Portuguese language is written. In this article we will detail these two Orthographic Agreements (one in the thirties and the other more recently, in the nineties), and the challenges present on the automatic migration of old documents spelling to their actual one. We will reveal Bigorna, a toolkit for the classification of language variants, their comparison and the conversion of texts in different language versions. These tools will be explained together with examples of migration issues. As Birgorna relies on a set of conversion rules we will also discuss how to infer conversion rules from a set of documents (texts with different ages). The document concludes with a brief evaluation on the conversion and classification tool results and their relevance in the current Portuguese language scenario. 1. Introduction impact. In this article we are interested into the latest, that can be summarized as: Languages evolve. This evolution can be natural and grad- ual, when speakers change habits during decades, or can be • 1931: This was the first orthographic agreement be- forced and drastic, when some kind of regulatory institu- tween Portugal and Brazil trying to approximate both tion defines some rules (or heuristics) on how the language languages.
    [Show full text]
  • Developments of the Lateral in Occitan Dialects and Their Romance and Cross-Linguistic Context Daniela Müller
    Developments of the lateral in occitan dialects and their romance and cross-linguistic context Daniela Müller To cite this version: Daniela Müller. Developments of the lateral in occitan dialects and their romance and cross- linguistic context. Linguistics. Université Toulouse le Mirail - Toulouse II, 2011. English. NNT : 2011TOU20122. tel-00674530 HAL Id: tel-00674530 https://tel.archives-ouvertes.fr/tel-00674530 Submitted on 27 Feb 2012 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. en vue de l’obtention du DOCTORATDEL’UNIVERSITÉDETOULOUSE délivré par l’université de toulouse 2 - le mirail discipline: sciences du langage zur erlangung der doktorwürde DERNEUPHILOLOGISCHENFAKULTÄT DERRUPRECHT-KARLS-UNIVERSITÄTHEIDELBERG présentée et soutenue par vorgelegt von DANIELAMÜLLER DEVELOPMENTS OF THE LATERAL IN OCCITAN DIALECTS ANDTHEIRROMANCEANDCROSS-LINGUISTICCONTEXT JURY Jonathan Harrington (Professor, Ludwig-Maximilians-Universität München) Francesc Xavier Lamuela (Catedràtic, Universitat de Girona) Jean-Léonard Léonard (Maître de conférences HDR, Paris
    [Show full text]
  • HERAUSGEBER Pan, Christoph: Minderheitenschutz in Südtirol: Hilfe Zur Selbsthilfe Christoph Pan Als Instrument Der Ethnopolitik (1954–2020)
    Europäisches Journal für Minderheitenfragen European Journal of Minority Studies Contents Vol 13 No 3–4 2020 EJM Editorial Videsott, Paul/Pan, Christoph: Editorial . .119 Beiträge No 3–4 13 Vol 2020 Europäisches Journal für Minderheitenfragen Banza, Ana Paula: Linguistic minorities in Portugal: the Barranquenho . .123 European Journal of Minority Studies EJM Bárbolo Alves, António: The Mirandese language of Portugal . .135 Opfer-Klinger, Björn: Im Bann der gesellschaftlichen Radikalisierung? – Die Situation der ethnischen Minderheiten in Bulgarien . .143 Vol 13 No 3–4 2020 Louvin, Roberto/Alessi, Nicolò: The maze of languages in Aosta Valley . .167 Chronik HERAUSGEBER Pan, Christoph: Minderheitenschutz in Südtirol: Hilfe zur Selbsthilfe Christoph Pan als Instrument der Ethnopolitik (1954–2020) . .191 Franz Matscher Jacob, Fabian/Novak, Měto: Creating added value. How the structural change in Lusatia can benefi t from Sorbian-German multilingualism . .236 Manfred Kittel Matthias Theodor Vogt Bröstl, Alexander: The extension of the application of Part III of the European Charter for Regional or Minority Languages in the Czech Republic . .248 Paul Videsott Rautz, Günther/Kuzborska-Pacha, Elzbieta: Minorities and COVID-19. Beate Sibylle Pfeil European Journal of Minority Studies European Academy Bozen/Bolzano (EURAC research) Webinar Series | (May–July 2020) . .257 Rezensionen Melchior, Luca: Videsott, Paul/Videsott, Ruth/Casalicchio, Jan (eds.): Manuale di linguistica ladina. Manuals of Romance lingusitics Bd. 26. Berlin/Boston, 2020 . .261
    [Show full text]
  • The Phonology of Portuguese Ebook Free Download
    THE PHONOLOGY OF PORTUGUESE PDF, EPUB, EBOOK Maria Helena Mateus | 172 pages | 05 Dec 2002 | Oxford University Press | 9780199256709 | English | Oxford, United Kingdom The Phonology of Portuguese PDF Book Schmid and Barbara Kopke. Already have an account? Rich in data and examples, the overall description of the phonology and morphology of Portuguese presented in the book is quite adequate. Email this Article. Influential in its development were successive invasions by Germanic peoples, Visigoths, and Moors, the latter of whom were finally evicted in the thirteenth century. The book deals with fundamental issues in the phonology of Portuguese, from segmental to suprasegmental, covering the inventory of consonants and vowels, the realization of glides, the formation and structure of the syllable, and the generation of lexical stress. For more detailed information on regional accents, see Portuguese dialects , and for historical sound changes see History of Portuguese. Unlike French, for example, Portuguese does not indicate most of these sound changes explicitly in its orthography. It is also a feature of a phonologically closely related language, Catalan. Continue with Facebook Sign up with Google. Phonologies of the world's languages. Finally, Ch. The book is organized into seven chapters. Brazilian Portuguese, on the other hand, is of mixed characteristics, halfway to being a syllable-timed language , with:. To purchase, visit your preferred ebook provider. This pronunciation is particularly common in lower registers , although found in most registers in some areas, e. Submitting a report will send us an email through our customer support system. Read Article. Phonological variation and change in Brazilian Portuguese: the case of the liquids.
    [Show full text]