Adaptation of Brazilian Portuguese Texts to European Portuguese


BP2EP – Adaptation of Brazilian Portuguese texts to European Portuguese

Luís Marujo 1,2,3, Nuno Grazina 1, Tiago Luís 1,2, Wang Ling 1,2,3, Luísa Coheur 1,2, Isabel Trancoso 1,2

1 Spoken Language Laboratory / INESC-ID Lisboa, Portugal
2 Instituto Superior Técnico, Lisboa, Portugal
3 Language Technologies Institute / Carnegie Mellon University, Pittsburgh, USA
{luis.marujo,ngraz,tiago.luis,wang.ling,luisa.coheur,imt}@l2f.inesc-id.pt

© 2011 European Association for Machine Translation. Mikel L. Forcada, Heidi Depraetere, Vincent Vandeghinste (eds.), Proceedings of the 15th Conference of the European Association for Machine Translation, pp. 129-136, Leuven, Belgium, May 2011.

Abstract

This paper describes a method to efficiently leverage Brazilian Portuguese resources as European Portuguese resources. Brazilian Portuguese and European Portuguese are two very close and usually mutually intelligible varieties of Portuguese, but with several known differences, which are studied in this work. Based on this study, we derived a rule-based system to translate Brazilian Portuguese resources. Some resources were enriched with multiword units retrieved semi-automatically from phrase tables created using statistical machine translation tools. Our experiments suggest that applying our translation step improves the translation quality between English and Portuguese, relative to the same process using the same resources.

1 Introduction

Modern statistical machine translation (SMT) depends crucially on large parallel corpora and on the amount and specialization of the training data for a given domain. Depending on the domain, such resources may sometimes be available in only one of the language varieties. For instance, classroom lecture and talk transcriptions such as the ones found on the MIT OpenCourseWare (OCW) website [1] and the TED talks website [2] (838 BP vs 308 EP) are examples of parallel corpora where Brazilian Portuguese (BP) translations can be found much more frequently than European Portuguese (EP) translations. This discrepancy originates in the size of the Brazilian population, which is nearly 20 times larger than the Portuguese population.

This paper describes the progressive development of a tool that transforms BP texts into EP, in order to increase the amount of EP parallel corpora available.

This paper is organized as follows: Section 2 describes some related work; Section 3 presents the main differences between EP and BP; the description of the BP2EP system is the focus of Section 4. The following section describes an algorithm for extracting Multiword Lexical Contrastive pairs from SMT phrase tables; Section 6 presents the results, and Section 7 concludes and suggests future work.

[1] http://ocw.mit.edu/OcwWeb
[2] http://www.ted.com/talk

2 Related Work

Few papers are available on the topic of improving Machine Translation (MT) quality by exploring similarities in varieties, dialects and closely related languages.

Altintas (2002) states that developing an MT system between similar languages is much easier than the traditional approaches, and that by putting aside issues like word reordering and most of the semantics, which are probably very similar, it is possible to focus on more important features like grammar and the translation itself. This also allows the creation of domains of closely related languages which may be interchangeable and that, in this particular case, would allow, for instance, the development of MT systems between English and a set of Turkic languages instead of only Turkish. The system uses a set of rules, written in the Xerox Finite State Tools (XFST) syntax, which is based on Finite State Transducers (FST), to apply several morphological and grammatical adaptations from Turkish to Crimean Tatar. Results showed that this approach does not cover all possible variations in these languages, and that in some cases there is no way of adapting the text without an additional parser capable of determining whether an adaptation not covered by the rules should be performed.

Other authors have developed very similar systems with identical approaches to translate from Czech to Slovak (Hajič et al., 2000), from Spanish to Catalan (Canals-Marote et al., 2001; Navarro et al., 2004), and from Irish to Scottish Gaelic (Scannell, 2006). Looking at Scannell's system (2006) gives a better understanding of the common system architecture. This architecture consists of a pipeline of components, such as a Part-of-Speech tagger (POS-tagger), a Naïve Bayes word sense disambiguator, and a set of lexical and grammatical transfer rules, based on bilingual contrastive lexical and grammatical differences.

On a different level, Nakov and Ng (2009) describe a way of building MT systems for less-resourced languages by exploring similarities with closely related languages that have many more resources. Beyond allowing translation for less-resourced languages, this work also aims at allowing translation from groups of similar languages to other groups of similar languages, as stated earlier. The method proposes the merging of bilingual texts and phrase-table combination in the training phase of the MT system. Merging bilingual texts from similar languages (on the source side), one with the less-resourced language and the other (much larger) with the resource-rich language, provides new contexts and new alignments for words existing in the smaller text, increases lexical coverage on the source side and reduces the number of unknown words at translation time. Also, words from the larger corpus that are present in the phrase table can be discarded simply because the input will never match them (the input will be in the low-resource language). This approach relies heavily on the existence of a high number of cognates between the related languages. Experiments in which both approaches are combined show that extending Indonesian-English translation models with Malaysian texts yielded a gain of 1.35 points in BLEU. Similar results are obtained when improving Spanish-English translation with larger Portuguese texts, which improved the BLEU score by 2.86 points (Nakov and Ng, 2009).

3 Corpora

Several corpora were collected to use in our experiments:

– CETEMPúblico corpus: collection of EP newspaper articles [3]
– CETENFolha: collection of BP newspaper articles [4]
– 115 texts from the Zero Hora newspaper and 50 texts from the Folha Ciência section of the Folha de São Paulo newspaper. This corpus includes about 2,200 sentences and 62,000 words, and it was initially presented in (Caseli et al., 2009). It has 3 versions (the original and 2 simplified versions). A parallel EP version of the original corpus was also created to allow a fair evaluation using very close translations. The unavailability of EP text simplification corpora also motivated this choice.
– Ted Talks (T.T.): 761 T.T. in English were collected. Of those, only 262 T.T. have a corresponding EP version and 749 T.T. have a BP version (Table 1).

Language   Nr. T.T.   Nr. Sentences   Nr. Words
EN         761        99,970          1,817,632
EP         262        29,284          512,233
BP         749        95,872          1,706,223

Table 1: Description of the gathered Ted Talks corpus.

[3] http://www.linguateca.pt/cetempublico
[4] http://www.linguateca.pt/cetenfolha/

4 Main Differences Between EP and BP Texts

The two Portuguese varieties involved in this work are very close and usually mutually intelligible, but there are several sociolinguistic, orthographic, morphologic, syntactic, and semantic differences (Mateus, 2003). Furthermore, there are also relevant phonetic differences that are beyond the scope of this work.

4.1 Sociolinguistic Differences

Silva (2008) made the point that language varieties contribute to sociolinguistic variation. As a matter of fact, such variations generate emotive meanings (e.g.: pejorative terms), strict sociolinguistic meanings (e.g.: erudite, popular, and regional terms/expressions), discursive meanings (e.g.: interjections, discourse markers) and ways of addressing people (e.g.: senhor, você, and tu). In BP, você (you) is used as a personal pronoun when addressing someone in the majority of situations, whereas EP uses tu (you) or omits the pronoun.

4.2 Orthographic and Morphologic Differences

The recent introduction (2009) of the orthographic agreement of 1990 [5] mitigated some differences between the two varieties, but most existing linguistic resources were written using the pre-agreement orthography. The germane orthographic differences from EP to BP are: inclusion of mute consonants; abolition of the umlaut; and different accentuation in proparoxytone words (words with stress on the third-to-last syllable), some paroxytone words (stress on the penultimate syllable) ending in -n, -r, -s, -x; and words ending in -eica and -oo (Teyssier, 1984). Examples:

– EN: project, water, tennis.
– BP: projeto, ágüa, tênis.
– EP: projecto, água, ténis.

4.3 Syntactic Differences

Both varieties differ in their preferences when [...] possessive pronouns are frequently omitted, while this is not correct in EP. Examples:

– EN: I sold my car.
– BP: Vendi meu carro.
– EP: Vendi o meu carro.

In addition, word expansions in BP are commonly word contractions in EP (Caseli et al., 2009). The list of contractions was extracted from (Abreu and Murteira, 1994) and it is also available at Wikipedia [6]. Examples:

– EN: He lived in that house.
– BP: Ele vivia em aquela casa.
– EP: Ele vivia naquela casa.

4.4 Lexical and Semantic Differences

In the same manner that color and colour, or gas and petrol, are examples of the differences between American and British dialects, there are also correlative lexical differences between BP and EP. At this level, there are innumerable differences between these varieties. The origin of these differences is also very varied, ranging from the influence of other languages to cultural differences and historical reasons.
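
The two example pairs in Section 4.3 (the contraction of em aquela into naquela, and the article before the possessive in o meu carro) can be read as simple rewrite rules. The following is only a minimal sketch of that idea in Python, not the BP2EP implementation: the rule list, the regular expressions, and the function name bp_to_ep are all assumptions introduced here for illustration, and only the two examples from the text are encoded.

```python
import re

# Hypothetical rule table: (compiled BP pattern, EP replacement).
# Only the two examples from Section 4.3 are covered; this is an
# illustrative assumption, not the rule set of the actual system.
RULES = [
    # BP writes preposition + demonstrative apart; EP contracts them.
    (re.compile(r"\bem aquela\b"), "naquela"),
    # EP places a definite article before the possessive.
    # The lookbehind skips cases where "o " already precedes "meu".
    (re.compile(r"(?<![Oo] )\bmeu\b"), "o meu"),
]


def bp_to_ep(sentence: str) -> str:
    """Apply each rewrite rule in order and return the adapted sentence."""
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence


if __name__ == "__main__":
    for bp in ("Ele vivia em aquela casa.", "Vendi meu carro."):
        print(bp, "->", bp_to_ep(bp))
```

Plain string patterns like these ignore capitalization, gender and number agreement, and any context in which a rule should not fire, which is presumably why systems of this kind combine such rules with tagging and disambiguation components, as in the pipeline architecture described in Section 2.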