CURRICULUM VITAE: VICTOR A. FRIEDMAN, Andrew W. Mellon Professor, Department of Slavic Languages and Literatures • University of Ch
Recommended publications
Why Is Language Typology Possible?
Why is language typology possible? Martin Haspelmath
Languages are incomparable. Each language has its own system. Each language has its own categories. Each language is a world of its own.
Or are all languages like Latin?
  nominative: the book
  genitive: of the book
  dative: to the book
  accusative: the book
  ablative: from the book
Or are all languages like English?
How could languages be compared? If languages are so different, what could be possible tertia comparationis (= entities that are identical across comparanda and thus permit comparison)?
Three approaches:
• Indeed, language typology is impossible (non-aprioristic structuralism)
• Typology is possible based on cross-linguistic categories (aprioristic generativism)
• Typology is possible without cross-linguistic categories (non-aprioristic typology)
Non-aprioristic structuralism: Franz Boas (1858-1942). The categories chosen for description in the Handbook “depend entirely on the inner form of each language...” (Boas, Franz. 1911. Introduction to The Handbook of American Indian Languages.)
Non-aprioristic structuralism: Ferdinand de Saussure (1857-1913). “dans la langue il n’y a que des différences...” (In a language there are only differences), i.e. all categories are determined by the ways in which they differ from other categories, and each language has different ways of cutting up the sound space and the meaning space. (de Saussure, Ferdinand. 1915. Cours de linguistique générale.)
Example: Datives across languages (cf. Haspelmath, Martin. 2003. The geometry of grammatical meaning: semantic maps and cross-linguistic comparison.)
Non-aprioristic structuralism: Peter H. Matthews (University of Cambridge). Matthews 1997:199: "To ask whether a language 'has' some category is...to ask a fairly sophisticated question..
Instituti Albanologjik i Prishtinës (Albanological Institute of Prishtina)
Begzad Baliu, Onomastikë dhe identitet (Onomastics and Identity)
Assoc. Prof. Dr. Begzad Baliu
Reviewer: Prof. Dr. Bahtijar Kryeziu
Publisher: Era, Prishtinë, 2012
The publication of this book was supported by the Directorate for Culture of the Municipality of Prishtinë.
To Bardha, Era, and Eni, my children!
Contents
Reviewer's foreword
Introduction
I. The onomastics of Kosovo: between myths and identities
I.1. Onomastics as fate
I.2. Onomastics and the origin of the Albanians
I.3. Onomastics and politics
II.1. The ethnonym kosovar
II.2. Preserving homogeneity
II.3. The toponym Kosovë and the ethnonym kosovar
III.1. The Albanian-Slavic context of toponymy
III.2. The encounter: the structure of toponymy
III.3. The multilayered structure of the toponymy of Kosovo
III.4. The context
III.5. Standardization of toponymy and language
III.6. Standardization of toponymy: the blow
Albanian Language Identification in Text
BSHN(UT) 23/2017
ALBANIAN LANGUAGE IDENTIFICATION IN TEXT DOCUMENTS
KLESTI HOXHA¹*, ARTUR BAXHAKU²
¹University of Tirana, Faculty of Natural Sciences, Department of Informatics
²University of Tirana, Faculty of Natural Sciences, Department of Mathematics
Abstract: In this work we investigate the accuracy of standard and state-of-the-art language identification methods in identifying Albanian in written text documents. For this purpose, we constructed a dataset of news articles written in Albanian. We observed a considerable drop in accuracy on test documents that lack the Albanian letters “Ë” and “Ç”, and we created a custom training corpus that solved this problem, achieving an accuracy of more than 99%. Based on our experiments, the best-performing language identification methods for Albanian use a naïve Bayes classifier and n-gram-based classification features.
Keywords: Language identification, text classification, natural language processing, Albanian language.
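The abstract names the winning combination (a naïve Bayes classifier over n-gram features) but does not show it. The sketch below is not the authors' implementation: it is a minimal character-trigram multinomial naive Bayes classifier with add-one smoothing, trained on a few hypothetical toy sentences. How the authors built their custom training corpus is not stated; adding copies of the Albanian training text with "ë" and "ç" replaced by "e" and "c" is an assumption made here only to illustrate why text typed without these letters can confuse a classifier and how training data can compensate.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Lower-case the text, pad it with spaces, and return overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def strip_diacritics(text):
    """Drop combining marks after NFD decomposition, e.g. 'ë' -> 'e', 'Ç' -> 'C'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class NaiveBayesLangID:
    """Multinomial naive Bayes over character n-grams with add-one (Laplace) smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.gram_counts = defaultdict(Counter)  # language -> n-gram frequencies
        self.doc_counts = Counter()              # language -> number of training documents
        self.vocab = set()                       # all n-grams seen in training

    def fit(self, labelled_docs):
        for text, lang in labelled_docs:
            grams = char_ngrams(text, self.n)
            self.gram_counts[lang].update(grams)
            self.doc_counts[lang] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang, counts in self.gram_counts.items():
            total = sum(counts.values())
            # log P(lang) + sum of log P(gram | lang), in log space to avoid underflow
            score = math.log(self.doc_counts[lang] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Hypothetical toy training data (not from the paper's corpus).
albanian = [
    "Unë jam student dhe mësoj gjuhën shqipe çdo ditë.",
    "Kjo është një ditë e bukur në Tiranë.",
]
english = [
    "I am a student and I learn the English language every day.",
    "This is a beautiful day in London.",
]

training = [(t, "sq") for t in albanian] + [(t, "en") for t in english]
# Assumed augmentation: also train on Albanian text typed without 'ë'/'ç'.
training += [(strip_diacritics(t), "sq") for t in albanian]

model = NaiveBayesLangID(n=3).fit(training)
print(model.predict("kjo eshte nje dite e bukur"))  # prints: sq
```

Working in log space keeps the product of many small n-gram probabilities from underflowing, and add-one smoothing stops a single unseen n-gram from zeroing out a language's score. A real system would train on far larger corpora and would likely compare several n-gram orders.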
Modeling Language Variation and Universals: a Survey on Typological Linguistics for Natural Language Processing
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Edoardo Maria Ponti (LTL, University of Cambridge), Helen O'Horan (LTL, University of Cambridge), Yevgeni Berzak (Department of Brain and Cognitive Sciences, MIT), Ivan Vulić (LTL, University of Cambridge), Roi Reichart (Faculty of Industrial Engineering and Management, Technion - IIT), Thierry Poibeau (LATTICE Lab, CNRS and ENS/PSL and Univ. Sorbonne nouvelle/USPC), Ekaterina Shutova (ILLC, University of Amsterdam), Anna Korhonen (LTL, University of Cambridge)
To cite this version: Edoardo Ponti, Helen O'Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, et al. Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing. 2018. hal-01856176. HAL Id: hal-01856176, https://hal.archives-ouvertes.fr/hal-01856176. Preprint submitted on 9 Aug 2018.
Understanding cross-lingual variation is essential for the development of effective multilingual natural language processing (NLP) applications.
Form, Function and History of the Present Suffix -i/-ën in Albanian and Its Dialects
M.A. Lopuhaä, Form, function and history of the present suffix -i/-ën in Albanian and its dialects. Master Thesis, July 1, 2014. Supervisor: Dr. M.A.C. de Vaan.
Contents
1 Introduction
2 Conventions and notation
3 Background and statement of the problem
3.1 The Albanian verbal system
3.2 The Proto-Albanian verbal system
3.3 Main research questions
3.4 Previous work on the subject
4 Morphological changes from Old Albanian to Modern Albanian
4.1 Verbal endings in Old and Modern Albanian
4.2 Present singular
4.3 Present plural
4.4 Imperfect and subjunctive
5 Proto-Albanian reconstruction
6 Proto-Indo-European reconstruction
6.1 Vocalic nasals in Albanian
6.2 The reality of a PIE suffix *-n-ie/o-
7 Dialectal information
7.1 Buzuku
7.2 Northwestern Geg
7.3 Northern Geg
7.4 Northeastern Geg
7.5 Central Geg
7.6 Southern Geg
7.7 Transitory dialects
Proper Language, Proper Citizen: Standard Practice and Linguistic Identity in Primary Education
Proper Language, Proper Citizen: Standard Linguistic Practice and Identity in Macedonian Primary Education, by Amanda Carroll Greber. A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Department of Slavic Languages and Literatures, University of Toronto. © Copyright by Amanda Carroll Greber 2013.
Abstract: This dissertation analyzes how the concept of the ideal citizen is shaped linguistically and visually in Macedonian textbooks and how this concept changes over time and in concert with changes in society. It focuses particularly on the role of primary education in the transmission of language, identity, and culture as part of the nation-building process, and on how schools construct linguistic norms in association with the construction of citizenship. The linguistic practices represented in textbooks depict “good language” and thus also index the “good citizen.” Textbooks function as part of the broader sets of resources and practices with which education sets out to make citizens, and thus they have an important role in shaping young people’s knowledge and feelings about the nation and nation-state, as well as language ideologies and practices. By analyzing the “ideal” citizen represented in a textbook we can begin to discern the goals of the government and society. To this end, I conduct a diachronic analysis of the Macedonian language used in elementary readers at several points from 1945 to 2000 using a combination of qualitative and quantitative methods.
Morphological Tagging and Lemmatization of Albanian: a Manually Annotated Corpus and Neural Models
Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models
Nelda Kote¹, Marenglen Biba², Jenna Kanerva³, Samuel Rönnqvist³ and Filip Ginter³
¹Faculty of Information Technology, Polytechnic University of Tirana, Albania
²Faculty of Information Technology, New York University of Tirana, Albania
³TurkuNLP Group, University of Turku, Finland
{jmnybl, saanro, figint}@utu.fi
Abstract: In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it. There is currently a lack of available NLP resources for Albanian, and its complex grammar and morphology present challenges to their development. We have created an Albanian part-of-speech corpus based on the Universal Dependencies schema for morphological annotation, containing about 118,000 tokens of naturally occurring text collected from different text sources, with an additional 67,000 tokens of artificially created simple sentences used only in training. On this corpus, we subsequently train and evaluate segmentation, morphological tagging and lemmatization models, using the Turku Neural Parser Pipeline. On the held-out evaluation set, the model achieves 92.74% accuracy on part-of-speech tagging, 85.31% on morphological tagging, and 89.95% on lemmatization. The manually annotated corpus, as well as the trained models, are available under an open license.
Keywords: part-of-speech tagging, morphological tagging, lemmatization, Albanian language
1. Introduction
The Albanian language is an Indo-European language spoken by around 7 million native speakers mostly in the …
2. Related Work
There are several previous attempts to develop NLP tools for Albanian.
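The abstract reports separate accuracies for part-of-speech tagging, morphological tagging, and lemmatization on a corpus annotated with the Universal Dependencies (UD) schema. As an illustration only, the sketch below parses the CoNLL-U FEATS column (Feature=Value pairs joined by "|", with "_" for a token without features) and computes token-level accuracy for the three tasks. It assumes gold and predicted tokens are already aligned one-to-one, so segmentation errors are ignored, and the paper's exact metric definitions (for instance, whether morphological tagging is scored on FEATS alone or together with UPOS) are not stated here. The analyses in the example are hypothetical and not taken from the paper's corpus.

```python
from typing import Dict, List, Tuple


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS value such as 'Case=Nom|Number=Sing' into a dict.

    '_' marks a token with no features. Comparing dicts makes the check
    independent of the order in which features are listed.
    """
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# One analysed token: (UPOS tag, FEATS string, lemma).
Token = Tuple[str, str, str]


def token_accuracies(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Token-level accuracy for UPOS, FEATS and lemma; assumes aligned token lists."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned one-to-one")
    n = len(gold)
    upos = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    feats = sum(parse_feats(g[1]) == parse_feats(p[1]) for g, p in zip(gold, pred)) / n
    lemma = sum(g[2] == p[2] for g, p in zip(gold, pred)) / n
    return {"upos": upos, "feats": feats, "lemma": lemma}


# Hypothetical analyses for the forms "gjuha", "është", "." (illustrative only).
gold = [
    ("NOUN", "Case=Nom|Definite=Def|Gender=Fem|Number=Sing", "gjuhë"),
    ("AUX", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "jam"),
    ("PUNCT", "_", "."),
]
pred = [
    ("NOUN", "Case=Acc|Definite=Def|Gender=Fem|Number=Sing", "gjuhë"),  # wrong case
    ("VERB", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "jam"),  # wrong UPOS
    ("PUNCT", "_", "."),
]

print(token_accuracies(gold, pred))  # upos = 2/3, feats = 2/3, lemma = 1.0
```

Comparing parsed dictionaries rather than raw strings means two annotations that list the same features in a different order still count as a match.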
Typology, Documentation, Description, and Typology
Typology, Documentation, Description, and Typology
Marianne Mithun, University of California, Santa Barbara
Abstract: If the goals of linguistic typology are, as described by Plank (2016), (a) to chart linguistic diversity and (b) to seek out order or even unity in diversity, then knowledge of the current state of the art is an invaluable tool for almost any linguistic endeavor. For language documentation and description, knowing what distinctions, categories, and patterns have been observed in other languages makes it possible to identify them more quickly and thoroughly in an unfamiliar language. Knowing how they differ in detail can prompt us to tune into those details. Knowing what is rare cross-linguistically can ensure that unusual features are richly documented and prominent in descriptions. But if documentation and description are limited to filling in typological checklists, not only will much of the essence of each language be missed, but the field of typology will also suffer: new variables and correlations will fail to surface, and our understanding of the deeper factors behind cross-linguistic similarities and differences will not progress.
1. Typological awareness as a tool
Looking at the work of early scholars such as Franz Boas and Edward Sapir, it is impossible not to be amazed at the richness of their documentation and the insight of their descriptions of languages so unlike the more familiar languages of Europe. It is unlikely that Boas first arrived on Baffin Island forewarned to watch for velar/uvular distinctions and ergativity. Now, more than a century later, an awareness of which distinctions can be significant in languages and which kinds of systems recur can provide tremendous advantages, allowing us to spot potentially important features sooner and to identify patterns on the basis of fewer examples.
"Grammar of the Albanian Language" (1882) of Konstandin Kristoforidhi
Academic Journal of Interdisciplinary Studies, Vol 4 No 3 S1, December 2015. E-ISSN 2281-4612, ISSN 2281-3993. MCSER Publishing, Rome-Italy.
Description of the Verbal System of the Albanian Language in the "Grammar of the Albanian Language" (1882) of Konstandin Kristoforidhi
Dr. Manola Kaçi (Myrta), “Aleksandër Moisiu” University, Durrës
DOI: 10.5901/ajis.2015.v4n3s1p421
One of the most significant works of Kristoforidhi is the "Grammar of the Albanian Language". The author published this work in Greek ("Gramatiqi'tis Alvaniqi's Glo'sis") in 1882 in Istanbul. Albanian grammatology already had a good tradition in this respect, but according to the linguist Shaban Demiraj, Kristoforidhi's grammar is, after the grammar of De Rada, the second grammar of the Albanian language written and published by an Albanian, and better than any work of this kind written until then by foreign scholars (Demiraj, 2002 p. 42). In his grammar, Kristoforidhi describes Albanian morphology in detail. Because he intended not to draft a merely educational grammar but to raise the Albanian language to the same scientific level as the most advanced languages of Europe, the Grammar takes the form of a linguistic manual that presents the morphological structures of Albanian, supported by numerous linguistic facts and diverse examples from both Albanian dialects (Karapinjalli, Stringa, 2002, p. 213). Although it is generally a descriptive grammar, it rests on sound theoretical foundations that run through it from beginning to end. This linguistic theory is revealed in the systematic presentation of all the factual language material, including the numerous explanatory notes the author adds from time to time.
"Shoot the Teacher!": Education and the Roots of the Macedonian Struggle
"SHOOT THE TEACHER!" EDUCATION AND THE ROOTS OF THE MACEDONIAN STRUGGLE
Julian Allan Brooks
Bachelor of Arts, University of Victoria, 1992
Bachelor of Education, University of British Columbia, 2001
THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in the Department of History
© Julian Allan Brooks 2005
SIMON FRASER UNIVERSITY, Fall 2005
All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author.
APPROVAL
Name: Julian Allan Brooks
Degree: Master of Arts
Title of Thesis: "Shoot the Teacher!" Education and the Roots of the Macedonian Struggle
Examining Committee:
Chair: Professor Mark Leier, Professor of History
Professor André Gerolymatos, Senior Supervisor, Professor of History
Professor Nadine Roth, Supervisor, Assistant Professor of History
Professor John Iatrides, External Examiner, Professor of International Relations, Southern Connecticut State University
Date Approved:
DECLARATION OF PARTIAL COPYRIGHT LICENCE
The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection, and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.
Most If Not All the Major Social Issues of Our Time Are Considered As Political Or Economic Problems: What Can Anthropology
Course Syllabus
THE ROMA: BRINGING TOGETHER HISTORICAL, ANTHROPOLOGICAL AND LINGUISTIC APPROACHES
Language Resource Persons: Victor Friedman; Yaron Matras; Katalin Kovalcsik
Victor Friedman
Romani Language: Structure, History, and Identity
The focus of these lectures will be on the history and structure of the Romani language and the use of Romani as a vehicle of both social integration and identity maintenance. Romani language is closely tied to Romani identity and Romani history for many groups. Nonetheless, examples such as Anglo-Romani and Calo constitute quite different linguistic systems, and there are also groups who identify as Romani while speaking some form of a majority or minority European national language. Since Romani always exists in contact with other languages, the study of Romani grammar brings unique perspectives to issues of language history and language contact. This is especially true owing to the unidirectional nature of Romani multilingualism, even in the Balkans, where bi-directional multilingualism was the norm for …
… Romani is essential to comprehending the structure of … development of the standards. Moreover, a knowledge of Romani linguistic history and dialectology is indispensable to any encounter with these debates at the transnational level.
The course does not presume a familiarity with either linguistics or Romani, although students with such experience will benefit from it. The introduction will acquaint the student with some of the basic principles of linguistic science that are essential to an informed understanding of Romani history and modern Romani language questions. The lectures will also cover the basics of Romani dialectology and Romani grammar in a manner accessible to the non-specialist but with points useful to the specialist.
Linguistic Typology in Language Teaching and Learning in Multilingual Environments, Migrants’ Integration, and Preservation of Minority Languages
International Journal of Arts & Sciences, CD-ROM. ISSN: 1944-6934 :: 09(04):481–494 (2017)
LINGUISTIC TYPOLOGY IN LANGUAGE TEACHING AND LEARNING IN MULTILINGUAL ENVIRONMENTS, MIGRANTS’ INTEGRATION, AND PRESERVATION OF MINORITY LANGUAGES
Kristian Pérez Zurutuza, EHU-UPV, UNED, Spain
Linguistic typology has been a useful tool for establishing a unitary linguistic structure of language as a cognitive construct of the mind, drawing on various typological patterns to build a methodology that enables multilingual language teaching in infant, primary, and secondary education. This has been achieved by using unified typological items such as word-order patterns, verb inflection, comparative morphology, and syntax, among others. These allow young learners in multilingual contexts to acquire one or more second languages directly and in parallel with their mother tongue, through a single methodology common to all target languages that nonetheless treats the pupils’ native tongue as a solid reference point. The methodology breaks language down into a skeleton that young learners can approach regardless of previous knowledge, cultural background, or age. It is sustainable at all levels, because every second language is approached through its features as they relate to the learner’s native tongue. This leads to more efficient comprehension and to quick, effortless, natural assimilation and acquisition through visual and memory-based methods, supported by a wide range of exercises that serve only as tools in this process, not as the main source of language learning or teaching.
The methodology does not depend on economic resources: it can be used in traditional settings or with ICTs, flexibly helping both natives and migrants overcome linguistic barriers when integrating into foreign communities and their educational institutions. It can also help preserve minority languages and ensure their growth by increasing the number of speakers, which may in turn lead to cultural production of any kind.