Extending the Punctuation Module for European Portuguese

Total Page:16

File Type:pdf, Size:1020Kb

Extending the Punctuation Module for European Portuguese INTERSPEECH 2010 Extending the punctuation module for European Portuguese Fernando Batista1,2, Helena Moniz1,3, Isabel Trancoso1,4, Hugo Meinedo1, Ana Isabel Mata3, Nuno Mamede1,4 1INESC-ID, Lisbon, Portugal 2ISCTE-IUL - Lisbon University Institute, Lisbon, Portugal 3FLUL/CLUL, University of Lisbon, Portugal 4IST, Lisbon, Portugal {fmmb;helenam;isabel.trancoso;meinedo;njm}@l2f.inesc-id.pt and [email protected] Abstract 2. Related work This paper describes our recent work on extending the punc- Recent studies (e.g., [2, 3, 4, 5, 6, 7]) show that the analysis of tuation module of automatic subtitles for Portuguese Broad- prosodic features is crucial to improve sentence boundary de- cast News. The main improvement was achieved by the use of tection systems. Prosodic features, such as pause, final length- prosodic information. This enabled the extension of the previ- ening, pitch reset, and energy, are among the most salient cues ous module which covered only full stops and commas, to cover in these studies. This is supported on evidences that such cues question marks as well. The approach uses lexical, acoustic and are language-independent [8] and that languages have prosodic prosodic information. Our results show that the latter is rel- strategies to delimit sentence-like units (SU). evant for all types of punctuation. An analysis of the results also shows what type of interrogative is better dealt with by Recent studies have also pointed out that there are prosodic our method, taking into account the specificities of Portuguese. similarities across speaking styles [3]. The authors extracted This may lead to different results for different types of corpora, prosodic features and ranked them accordingly to their impact depending on the types of interrogatives that are more frequent. on the identification of dialog act boundaries. The scaled fea- tures are (in order): pause, pitch (log ratio of the pitch at the Index Terms: automatic punctuation, sentence boundary detec- end of the last word and at the beginning of the first word after tion, rich transcription, broadcast news subtitling. a boundary), energy (similar to the analysis window for pitch) and duration. 1. Introduction Detecting positions where a punctuation mark is missing, The main motivation of this work is the improvement of the roughly corresponds to the task of detecting a SU, or finding punctuation module of our automatic broadcast news captioning SU boundaries. SU boundary detection has gained increas- system. Like the speech recognition module and the modules ing attention during recent years, and it has been part of the that precede it, the punctuation and capitalization modules share NIST rich transcription evaluations. A general HMM (Hidden low latency requirements, given their on-the-fly usage. Markov Model) framework that allows the combination of lexi- Although the use of prosodic features in automatic punc- cal and prosodic clues for recovering full stop, comma and ques- tuation methods is well studied for some languages, the first tion marks is used by [9] and [10]. A similar approach was also implemented version for European Portuguese (henceforth EP) used for detecting sentence boundaries by [11, 2, 12]. [10] also deals only with full stop and comma recovery, and explores a combines 4-gram language models with a CART (Classifica- limited set of features, mostly lexical and acoustic, simultane- tion and Regression Tree) and concludes that prosodic infor- ously targeting at low latency, and language independence [1]. mation highly improve the results. [13] describes a maximum entropy (ME) based method for inserting punctuation marks The aim of this work is to improve the punctuation module, into spontaneous conversational speech, where the punctuation first by exploring additional features, namely prosodic, and later task is considered as a tagging task and words are tagged with by weighting the lexical and prosodic features impact on the the appropriate punctuation. It covers three punctuation marks: baseline system when encompassing interrogatives. To the best comma, full stop, and question mark; and the best results on the of our knowledge, this is the first study to quantify the distinct ASR output are achieved using bigram-based features and com- interrogative types and also to discuss the weight of the lexical bining lexical and prosodic features. [7] proposes a multi-pass and prosodic properties of these structures, based on planned linear fold algorithm for sentence boundary detection in spon- and spontaneous speech data for EP. taneous speech, which uses prosodic features, focusing on the The next section reviews related work. The baseline mod- relation between sentence boundaries, break indices and dura- ule and its improvement are described in Sections 3 and 4, re- tion, covering their local and global structural properties. Other spectively. Section 5 deals with the implementation of prosodic recent studies have shown that the best performance for the features to detect question marks. Conclusions and future work punctuation task is achieved when prosodic, morphologic and are presented in Section 6. syntactic information are combined [6, 12, 14]. Copyright 2010 ISCA 1509 26-30 September 2010, Makuhari, Chiba, Japan alignment between the manual and the automatic transcriptions. Table 1: Portuguese BN corpus properties. This is a non-trivial task mainly because of the recognition er- #Words Dur. (h) Planned Spont. WER rors. The NIST SCLite tool 1 was used for this task, followed Train 477k 46 55% 32% 14% by a post-processing step, either by aligning words which can Devel 66k 6 51% 38% 19% be written differently or by correcting some SCLite basic er- Test 135k 18 56% 36% 19% rors. The data was automatically annotated with part-of-speech information, using MARv [20]. The results achieved with this baseline version are shown 3. Baseline punctuation module in the first line of Table 2. The evaluation used the performance metrics Precision, Recall and SER (Slot Error Rate) [21]. Only Our on-line broadcast news processing system consists of a punctuation marks are considered as slots and used by these pipeline of modules that starts with jingle detection, audio di- metrics. Hence, the SER is computed by dividing the number arization, and automatic speech recognition (ASR) [15]. Our of punctuation errors by the number of punctuation marks in the ASR system follows the connectionist paradigm. The hy- reference data. brid models have been recently improved by the inclusion of multiple-state phone units, and a fixed set of phone transition units aimed at specifically modeling the most frequent intra- 4. Improved model for full stop and comma word phone transitions [16]. Although the system can be im- In order to introduce prosodic features for detecting SUs, we proved by using dynamic vocabulary and language models up- have performed a number of additional steps. The first step con- dated daily, the experiments reported in this paper used a fixed sisted of extracting the pitch and the energy from the speech sig- vocabulary of 100k words. nal, which was achieved using the Snack Sound Toolkit2. Du- Next in our pipeline, come the punctuation and capitaliza- rations of phones, words, and interword-pauses are extracted tion modules [1, 17]. Both use a discriminative approach, based from the recognizer output. By combining the pitch values with on maximum entropy (ME) models, which provides a clean way the phone boundaries, we have removed micro-intonation and of expressing and combining different properties of the infor- octave jump effects from the pitch track. Another important mation. This is specially useful for the punctuation task, given step consisted of marking the syllable boundaries as well as the the broad set of available lexical, acoustic and prosodic fea- syllable stress. A set of syllabification rules was designed and tures. This approach requires all information to be expressed applied to the lexicon. The rules account fairly well for native in terms of features, causing the resultant data file to become words, but need improvements for words of foreign origin. Fi- several times larger than the original one. The classification is nally, we have calculated the maximum, minimum, median and straightforward, making it interesting for on-the-fly usage. slope values for pitch and energy in each word, syllable, and The experiments described in this paper used the MegaM phone. Duration was also calculated for each one of the previ- tool [18] for training the maximum entropy models, using con- ous units. jugate gradient and logistic regression. Our baseline experi- As previously mentioned, our experiments aim at analyzing ments targeted only the two most frequent punctuation marks: the weight and contribution of each prosodic feature per se and full stop and comma. The following features were used for a the impact of the combination of prosodic features. Underlying given word w in the position i of the corpus: wi, wi+1, 2wi 2, the feature extraction process are linguistic evidences that pitch − 2wi 1, 2wi, 2wi+1, 3wi 2, 3wi 1, pi, pi+1, 2pi 2, 2pi 1, contour, boundary tones, energy slopes, and pauses are crucial − − − − − 2pi, 2pi+1, 3pi 2, 3pi 1, GenderChgs1, SpeakerChgs1, to delimit sentence-like units across languages. First, we have − − and TimeGap1, where: wi is the current word, wi+1 is the tested if the features would perform better on different units of word that follows and nwi x is the n-gram of words that starts analysis: phones, syllables and/or words. Supported on linguis- ± x positions after or before the position i; pi is part-of-speech tic findings for EP [22, 23], we hypothesized that the stressed of the current word, and npi x is the n-gram of part-of-speech and post-stressed syllables would be relevant units of analysis ± of words that starts x positions after or before the position i.
Recommended publications
  • Portuguese Language in Angola: Luso-Creoles' Missing Link? John M
    Portuguese language in Angola: luso-creoles' missing link? John M. Lipski {presented at annual meeting of the AATSP, San Diego, August 9, 1995} 0. Introduction Portuguese explorers first reached the Congo Basin in the late 15th century, beginning a linguistic and cultural presence that in some regions was to last for 500 years. In other areas of Africa, Portuguese-based creoles rapidly developed, while for several centuries pidginized Portuguese was a major lingua franca for the Atlantic slave trade, and has been implicated in the formation of many Afro- American creoles. The original Portuguese presence in southwestern Africa was confined to limited missionary activity, and to slave trading in coastal depots, but in the late 19th century, Portugal reentered the Congo-Angola region as a colonial power, committed to establishing permanent European settlements in Africa, and to Europeanizing the native African population. In the intervening centuries, Angola and the Portuguese Congo were the source of thousands of slaves sent to the Americas, whose language and culture profoundly influenced Latin American varieties of Portuguese and Spanish. Despite the key position of the Congo-Angola region for Ibero-American linguistic development, little is known of the continuing use of the Portuguese language by Africans in Congo-Angola during most of the five centuries in question. Only in recent years has some attention been directed to the Portuguese language spoken non-natively but extensively in Angola and Mozambique (Gonçalves 1983). In Angola, the urban second-language varieties of Portuguese, especially as spoken in the squatter communities of Luanda, have been referred to as Musseque Portuguese, a name derived from the KiMbundu term used to designate the shantytowns themselves.
    [Show full text]
  • Portuguese Languagelanguage Kitkit
    PortuguesePortuguese LanguageLanguage KitKit Expressions - Grammar - Online Resources - Culture languagecoursesuk.co.uk Introduction Whether you plan to embark on a new journey towards learning Portuguese or you just need a basic reference booklet for a trip abroad, the Cactus team has compiled some of the most help- ful Portuguese expressions, grammar rules, culture tips and recommendations. Portuguese is one the most significant languages in the world, and Portugal and Brazil are popular desti- nations for holidays and business trips. As such, Portuguese is appealing to an ever-growing number of Cactus language learners. Learning Portuguese will be a great way to discover the fascinating cultures and gastronomy of the lusophone world, and to improve your career pros- pects. Learning Portuguese is the beginning of an exciting adventure that is waiting for you! The Cactus Team 3. Essential Expressions Contact us 4. Grammar and Numbers Telephone (local rate) 5. Useful Verbs 0845 130 4775 8. Online Resources Telephone (int’l) 10. Take a Language Holiday +44 1273 830 960 11. Cultural Differences Monday-Thursday: 9am-7pm 12. Portugal & Brazil Culture Friday: 9am-5pm Recommendations 15. Start Learning Portuguese 2 Essential Expressions Hello Olá (olah) Goodbye Tchau (chaoh) Please Por favor Thank you Obrigado (obrigahdu) Yes Sim (simng) No Não (nowng) Excuse me/sorry Desculpe / perdão (des-cool-peh) My name is… O meu nome é… (oh meoh nomay ay) What is your name? Qual é o seu nome? (kwah-ooh eh seh-ooh noh-mee) Nice to meet you Muito prazer
    [Show full text]
  • ON MIRANDESE LANGUAGE RESOURCES for TEXT-TO-SPEECH José Pedro Ferreira2, Cristiano Chesi1,3,4 , Hyongsil Cho 1, 3, Daan Baldewi
    SLTU-2014, St. Petersburg, Russia, 14-16 May 2014 ON MIRANDESE LANGUAGE RESOURCES FOR TEXT-TO-SPEECH José Pedro Ferreira2, Cristiano Chesi1,3,4 , Hyongsil Cho 1, 3, Daan Baldewijns1, Daniela Braga1, 3, Miguel Dias 1, 3 1Microsoft Language Development Center, Portugal. 2Instituto de Linguística Teórica e Computacional. 3ISCTE-IUL University Institute of Lisbon, 4IUSS, Istituto Universitario di Studi Superiori, Pavia ABSTRACT 2. THE MIRANDESE LANGUAGE This paper aims at describing the major components of the first Text-to-Speech (TTS) system ever built for Mirandese, Mirandese belongs to the Astur-Leonese group of West [1] a minority language spoken in the Northeast of Portugal. Iberian languages, being closely related to Asturian, spoken Both language resources development (corpus, text- to this day in areas of the Asturias and Leon autonomous normalization rules, annotated lexicon, phone sets and communities in Spain, with which Mirandese no longer recordings) and the TTS (Statistical Parameter Synthesis) retains a linguistic continuum. Unwritten for most of its system are documented here. history, Mirandese was first scientifically identified and studied in the late 19th century [2]. Throughout the 20th Index Terms— Mirandese, text-to-speech, language century, strong demographic changes, namely the exodus of resources large numbers through emigration in the 1940s and in the 1970s, an influx of non-Mirandese speaking workers from 1. INTRODUCTION various other parts of the country in the 1950s and 1960s, along with the rise of Portuguese-spoken-only media, led to Mirandese is a minority language spoken by around 15.000 inter-generational transmission to be gradually abandoned, people in the Northeast of Portugal for which little or no leaving the language seriously threatened.
    [Show full text]
  • Developments of the Lateral in Occitan Dialects and Their Romance and Cross-Linguistic Context Daniela Müller
    Developments of the lateral in occitan dialects and their romance and cross-linguistic context Daniela Müller To cite this version: Daniela Müller. Developments of the lateral in occitan dialects and their romance and cross- linguistic context. Linguistics. Université Toulouse le Mirail - Toulouse II, 2011. English. NNT : 2011TOU20122. tel-00674530 HAL Id: tel-00674530 https://tel.archives-ouvertes.fr/tel-00674530 Submitted on 27 Feb 2012 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. en vue de l’obtention du DOCTORATDEL’UNIVERSITÉDETOULOUSE délivré par l’université de toulouse 2 - le mirail discipline: sciences du langage zur erlangung der doktorwürde DERNEUPHILOLOGISCHENFAKULTÄT DERRUPRECHT-KARLS-UNIVERSITÄTHEIDELBERG présentée et soutenue par vorgelegt von DANIELAMÜLLER DEVELOPMENTS OF THE LATERAL IN OCCITAN DIALECTS ANDTHEIRROMANCEANDCROSS-LINGUISTICCONTEXT JURY Jonathan Harrington (Professor, Ludwig-Maximilians-Universität München) Francesc Xavier Lamuela (Catedràtic, Universitat de Girona) Jean-Léonard Léonard (Maître de conférences HDR, Paris
    [Show full text]
  • Measuring Linguistic Distance in Galician Varieties
    languages Article From Regional Dialects to the Standard: Measuring Linguistic Distance in Galician Varieties Xulio Sousa Instituto da Lingua Galega, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Galicia, Spain; [email protected] Received: 12 December 2019; Accepted: 8 January 2020; Published: 13 January 2020 Abstract: The analysis of the linguistic distance between dialect varieties and the standard variety typically focuses on establishing the influence exerted by the standard norm on the dialects. In languages whose standard form was established and implemented well after the beginning of the last century, however, such as Galician, it is still possible to perform studies from other perspectives. Unlike other languages, standard Galician was not based on a single dialect but aspired instead to be supradialectal in nature. In the present study, the tools and methods of quantitative dialectology are brought to bear in the task of establishing the extent to which dialect varieties of Galician resemble or differ from the standard variety. Moreover, the results of the analysis underline the importance of the different dialects in the evaluation of the supradialectal aim, as established by the makers of the standard variety. Keywords: galician language; dialects; standard; typology; linguistic distance; dialectometry; language planning; language standardization 1. Introduction The measurement of linguistics differences is an old topic of linguistic research that has been given new life in recent decades by the advance of data-intensive computing systems in linguistic research. The two main topics of interest in this type of study are the measurement of the linguistic distance between language systems (languages of different families, languages of the same family, social varieties, regional varieties, etc.) and the analysis of linguistic differences within language systems, especially in computational linguistics (word-sense induction, word-sense disambiguation, etc.).
    [Show full text]
  • Adaptation of Brazilian Portuguese Texts to European Portuguese
    BP2EP – Adaptation of Brazilian Portuguese texts to European Portuguese Lu´ıs Marujo 1,2,3, Nuno Grazina 1, Tiago Lu´ıs 1,2, Wang Ling 1,2,3, Lu´ısa Coheur 1,2, Isabel Trancoso 1,2 1 Spoken Language Laboratory / INESC-ID Lisboa, Portugal 2 Instituto Superior Tecnico,´ Lisboa, Portugal 3 Language Technologies Institute / Carnegie Mellon University, Pittsburgh, USA luis.marujo,ngraz,tiago.luis,wang.ling,luisa.coheur,imt @l2f.inesc-id.pt { } Abstract Portuguese (EP) translations. This discrepancy has origin in the Brazilian population size that is near This paper describes a method to effi- 20 times larger than the Portuguese population. ciently leverage Brazilian Portuguese re- This paper describes the progressive develop- sources as European Portuguese resources. ment of a tool that transforms BP texts into EP, in Brazilian Portuguese and European Por- order to increase the amount of EP parallel corpora tuguese are two Portuguese varieties very available. close and usually mutually intelligible, This paper is organized as follows: Section 2 de- but with several known differences, which scribes some related work; Section 3 presents the are studied in this work. Based on this main differences between EP and BP; the descrip- study, we derived a rule based system to tion of the BP2EP system is the focus of Section translate Brazilian Portuguese resources. 4. The following section describes an algorithm Some resources were enriched with mul- for extracting Multiword Lexical Contrastive pairs tiword units retrieved semi-automatically from SMT Phrase-tables; Section 6 presents the re- from phrase tables created using statisti- sults, and Section 7 concludes and suggests future cal machine translation tools.
    [Show full text]
  • The Indo-Portuguese Creoles of the Malabar: Historical Cues and Questions
    ** NOTE: This is a pre-print version. The published version of the chapter can be found in: - Cardoso, Hugo C. 2019. The Indo-Portuguese creoles of the Malabar: Historical cues and questions. In Pius Malekandathil, Lotika Varadarajan & Amar Farooqi (eds.), India, the Portuguese, and maritime interactions, vol. II [Religion, language and cultural expression], 345-373. Delhi: Primus Books. ** --- The Indo-Portuguese Creoles of the Malabar Historical Cues and Questions Hugo C. Cardoso THIS ESSAY PROVIDES a state-of-the-art of current research on the Indo-Portuguese creoles of the Malabar. Having been given up as extinct, these creoles have been off the radar of linguists and historians alike for a long while. Yet, they are particularly important as potential descendants of the earliest forms of contact varieties of Portuguese that formed in Asia in the sixteenth century, and raise questions that interact with a social historiography of the Indo-Portuguese communities of the region. This essay will focus on four aspects of the study of these languages which operate on a linguistic-historical interface: (a) the social conditions required for their formation; (b) their course after the end of Portuguese colonial rule; (c) their putative foundational role in the context of Luso-Asian creoles; and (d) the social and linguistic stratification encapsulated in modern and late nineteenth-century records. This discussion is meant as a step towards the integration of linguistic evidence into the study of Indo-Portuguese social history, and of historical evidence into the study of Indo-Portuguese linguistics. Introduction Starting in the early sixteenth century, the colonial involvement of Portugal with Asia introduced the Portuguese language in the region.
    [Show full text]
  • 1 22. European Portuguese and Brazilian
    22. European Portuguese and Brazilian Portuguese: how quickly may languages diverge? CHARLOTTE GALVES 22.1 Introduction In 1500, Portuguese-speaking sailors arrived to South American coasts where people spoke Tupi languages. Although in the first part of the colonization period the influence was more from the Tupi speakers towards the Portuguese speakers, as the new incomers learn the language of the natives, the “Terra de Santa Cruz vulgarmente chamada Brasil”i will eventually be the greatest Portuguese-speaking language in the world. However, Brazilian Portuguese is in many aspects as different from European Portuguese as from Galician or Spanish. We shall even argue below that from a grammatical point of view, it is much more different, since it displays syntactic features typical of typologically different languages. Portuguese provides us therefore with a case of recent split between two different realizations of a formerly unique language whose effects are observable in the phonetic, lexical and syntactic aspects of the modern languages and can be argued to characterize two radically different languages. In this chapter we shall describe the results and the dynamics of such a split, addressing ultimately the issue of the span of time necessary for languages to diverge. It will be argued that at the core of this question lies the dialectic relationship between I(nternal)-language and E(xternal)-language, in such a way that it is not possible to approach this process without taking into consideration both aspects and the way they interact. This means that we should understand not only the grammatical (I-Language) changes occurred in the history of the new languages and trace back their emergence, but also what provoked them and how they came to be diffused in the population and eventually became sufficiently homogeneous in their output (E-language) to be recognized as a new common language.
    [Show full text]
  • From European to Brazilian Portuguese: a Parameter Tree Approach
    Cadernos de Estudos Linguísticos, 58(2), pp. 237-256, mai/ago 2016 From European to Brazilian Portuguese: a parameter tree approach Juanito Ornelas de Avelar University of Campinas Charlotte Galves University of Campinas Resumo: Neste artigo, apresentamos uma análise para mudanças sofridas pelo português brasileiro relacionadas à concordância e Caso, em contraste com o português europeu. Tomando como base a existência de características sintáticas amplamente atestadas nas línguas bantas, bem como a importância demográfica das populações de origem africana nos períodos colonial e imperial no Brasil, propomos que o português brasileiro é tipologicamente diferenciado do português europeu (e das demais línguas românicas), sob a influência das línguas africanas que entraram em território brasileiro pelo tráfico de escravos. Explorando o quadro teórico da versão minimalista da Teoria de Princípios e Parâmetros, argumentamos que dois parâmetros gramaticais estão crucialmente envolvidos nessa mudança: a possibilidade de expressões nominais serem inseridas sem traço de Caso na derivação de uma sentença, e a ausência de sensibilidade do traço EPP da categoria funcional Tempo à existência de traços- phi. Seguindo o modelo de Roberts (2012), sugerimos que essas propriedades podem ser captadas por meio de uma árvore paramétrica que abarque concordância e Caso. Résumé: Dans cet article, nous présentons une analyse de l’évolution du portugais européen vers le portugais brésilien. Prenant comme base l’existence dans celui-ci de caractéristiques syntaxiques amplement attestées dans les langues du groupe bantou, bien comme l’importance démographique au long de l’histoire du Brésil des populations d’origine africaine, nous proposons que le portugais a subi un changement typologique, sous l’influence des langues africaines apportées au Brésil à l’occasion du trafic d’esclaves.
    [Show full text]
  • The Meaning of Approximative Adverbs: Evidence from European Portuguese
    THE MEANING OF APPROXIMATIVE ADVERBS: EVIDENCE FROM EUROPEAN PORTUGUESE DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Patrícia Matos Amaral, Master’s Degree in Linguistics ***** The Ohio State University 2007 Dissertation Committee: Approved by Professor Scott Schwenter, Adviser Professor Craige Roberts ____________________________ Professor John Grinstead Adviser Spanish and Portuguese Graduate Program ABSTRACT This dissertation presents an analysis of the semantic-pragmatic properties of adverbs like English almost and barely (“approximative adverbs”), both in a descriptive and in a theoretical perspective. In particular, I investigate to what extent the meaning distinctions encoded by the system of approximative adverbs in European Portuguese (EP) shed light on the characterization of these adverbs as a class and on the challenges raised by their semantic-pragmatic properties. I focus on the intuitive notion of closeness associated with the meaning of these adverbs and the related question of the asymmetry of their meaning components. The main claim of this work is that the meaning of approximative adverbs involves a comparison between properties along a scalar dimension, and makes reference to a lexically provided or contextually assumed standard value of comparison. In chapter 2, I present some of the properties displayed by approximative adverbs cross-linguistically and more specifically in European Portuguese. This chapter discusses the difficulties raised by their classification within the major classes assumed in taxonomies of adverbs. In chapter 3, I report two experiments that were conducted to test the asymmetric behavior of the meaning components of approximative adverbs and the role that they play in interpretation.
    [Show full text]
  • John Benjamins Publishing Company
    John Benjamins Publishing Company This is a contribution from Portuguese-Spanish Interfaces: Diachrony, synchrony, and contact. Edited by Patrícia Amaral and Ana Maria Carvalho. © 2014. John Benjamins Publishing Company This electronic file may not be altered in any way. The author(s) of this article is/are permitted to use this PDF file to generate printed copies to be used by way of offprints, for their personal use only. Permission is granted by the publishers to post this file on a closed server which is accessible to members (students and staff) only of the author’s/s’ institute, it is not permitted to post this PDF on the open internet. For any other use of this material prior written permission should be obtained from the publishers or through the Copyright Clearance Center (for USA: www.copyright.com). Please contact [email protected] or consult our website: www.benjamins.com Tables of Contents, abstracts and guidelines are available at www.benjamins.com A historical perspective of Afro-Portuguese and Afro-Spanish varieties in the Iberia Peninsula John M. Lipski The Pennsylvania State University This study traces the linguistic history of 15th – 18th century Spanish and Portuguese contact varieties in Spain and Portugal. The plausibility of literary imitations – the sole source of information – is discussed. Several likely common denominators emerge from the discussion, including apparently consistent phonological and morphosyntactic patterns that are also attested in existent creole languages. Keywords: Afro-Hispanic language, Afro-Portuguese language, language contacts – history, pidgins and creoles 1. Introduction Sub-Saharan Africans were present in Spain and Portugal during medieval times, mostly arriving via northeastern Africa and the eastern Mediterranean.
    [Show full text]
  • ([email protected]) in This Paper, Novel Data on Trevigiano
    ON INSITUNESS AND (VERY) LOW WH-POSITIONS THE CASE OF TREVIGIANO* Caterina Bonan ([email protected]) 1. INTRODUCTION In this paper, novel data on Trevigiano (TV), a Northern Italian Romance dialect spoken in the Provincia di Treviso, are presented and analysed. The interrogative syntax TV of lies somewhere between that of more widely studied Romance languages like French (Mathieu 1999; Bošković 2000; Cheng and Rooryck 2002, among others) and Bellunese (Munaro 1995; Munaro et al. 2001; Munaro and Pollock 2005), which makes it worth studying. In this language, the uncanonical relative order of V-selected arguments and ‘in situ’ wh-adjuncts, along with the presence of an ‘if’-COMP that licenses ‘insituness’ in indirect questions, makes it incompatible with a Remnant-IP Movement analysis à la Munaro et al. 2001 (and related works) and rather calls for a TP-internal movement analysis of ‘in situ’ instances. 2. INSITUNESS IN TREVIGIANO Like many Northern Italian Dialects (NIDs) (Poletto 1993; Munaro 1999; Manzini and Savoia 2005), TV has two series of pronouns: a declarative proclitic paradigm, and an interrogative enclitic one. In matrix questions, Subject-Clitic Inversion (SClI) is compulsory (1): (1) a. Vjen-tu al marcà co nojaltri? TV Come=you to.the market with us 'Are you going to the market with us?' b. * Te vjen al marcà co nojaltri? TV You come to.the market with us In wh-interrogatives, both exsituness and insituness are possible in genuine questions, which makes TV similar to both Bellunese (BL) and French (FR). However, whereas TV requires for SClI even when the wh-phrase is ‘in situ’ (2), just like BL (Munaro 1995) (3), this construction is ruled out in FR (4): (2) a.
    [Show full text]