Thomas Petzold Thesis



Thomas Petzold
B.Arts (Hons), The Open University London, UK
M.Arts, Europa Universität Viadrina, Frankfurt/Oder, Germany
Magister, Sofia University „St. Kliment Ohridski“, Sofia, Bulgaria

A dissertation presented in fulfilment of the requirements for the degree of Doctor of Philosophy
Creative Industries Faculty
Queensland University of Technology (QUT)
2011

Keywords

Languages
Internet research
Media studies
Complexity research
Language evolution
Governance of languages on the Internet
Wikipedia
Television
Google
Translation technology

…, or, as a keyword cloud (using www.wordle.net)

Abstract

Language use has proven to be the most complex and complicating of all Internet features, yet people and institutions invest enormously in language and cross-language features because they are fundamental to the success of the Internet's past, present and future. The thesis focuses on the development of the latter (features that facilitate and signify linking between or across languages), both in their historical and current contexts.

In the theoretical analysis, the conceptual platform of inter-language linking is developed both to accommodate efforts towards a new social complexity model for the co-evolution of languages and language content, and to create an open analytical space for language and cross-language related features of the Internet and beyond.

The uses of inter-language linking in practice have changed over the last decades. Before and during the first years of the WWW, mechanisms of inter-language linking were at best important elements used to create new institutional or content arrangements, but on a large scale they were insignificant. This has changed with the emergence of the WWW and its development into a web in which content in different languages co-evolves.

The thesis traces the inter-language linking mechanisms that facilitated these dynamic changes by analysing what these linking mechanisms are, how their historical as well as current contexts can be understood, and what kinds of cultural-economic innovation they enable and impede. The study discusses this alongside four empirical cases of bilingual or multilingual media use, ranging from television and web services for languages of smaller populations to large-scale web ventures involving multiple languages by the British Broadcasting Corporation, the Special Broadcasting Service Australia, Wikipedia and Google.

To sum up, the thesis introduces the concepts of 'inter-language linking' and the 'lateral web' to model the social complexity and co-evolution of languages online. The resulting model reconsiders existing social complexity models in that it is the first that can explain the emergence of large-scale, networked co-evolution of languages and language content facilitated by the Internet and the WWW. Finally, the thesis argues that the Internet enables an open space for language and cross-language related features and investigates how far this process is facilitated by (1) amateurs and (2) human-algorithmic interaction cultures.

Table of Contents

Keywords
Abstract
Table of Contents
List of Figures
Statement of Original Authorship
Acknowledgments
1 Introduction
  1.1 Overview of the thesis
  1.2 Research design
    1.2.1 The methods
    1.2.2 The interviews
    1.2.3 The research questions
2 Architectural foundations for a lateral web: codes, software, content
  2.1 Introduction
  2.2 Codes
    2.2.1 Early character encodings
    2.2.2 Universal character encodings
  2.3 Software
    2.3.1 Technical implementation measures
    2.3.2 Political implementation measures
  2.4 Content
  2.5 Conclusion
3 Theoretical approach
  3.1 Introduction
  3.2 The macro level: linking cultures and languages – towards a more complex understanding
    3.2.1 The social complexity of inter-language linking
    3.2.2 Excursus: Geo-linguistic complexities of the World Wide Web – the linguistic development of Wikipedia
      3.2.2.1 Wikipedia as a critical observation site
      3.2.2.2 Techniques of geo-linguistic analysis
        3.2.2.2.1 Choropleth maps and cartograms
        3.2.2.2.2 Network graphs
  3.3 The meso level: linking mechanisms
    3.3.1 Inter-language linking: defining large-scale content co-evolution
    3.3.2 Ambitions and regulations for inter-language linkage
      3.3.2.1 Governance of languages in the Internet environment
        3.3.2.1.1 Wikimedia
        3.3.2.1.2 Google Translate
    3.3.3 Software for linking languages
      3.3.3.1 Inter-language links, the example of Wikipedia
      3.3.3.2 Excursus: the development of machine translation
      3.3.3.3 Language pairs, the example of Google Translate
  3.4 The micro level: collective intelligence and inter-language linking
  3.5 Conclusion
4 Historical antecedents: inter-language linking & television
  4.1 Introduction
  4.2 Precursors of inter-language linking
    4.2.1 Prolegomenon: foundations for new institutional arrangements
    4.2.2 S4C's start-up environment & the formation of rudimentary forms of inter-language linking
    4.2.3 The development of the new Welsh content network
  4.3 Linking languages and content – Welsh content in the web environment
    4.3.1 Welsh content in the web environment
    4.3.2 Techno-linguistic enthusiasm in a global environment: a future for small languages
  4.4 Conclusion
Recommended publications
  • Cultural Anthropology Through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-Based Sentiment
    Cultural Anthropology through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-based Sentiment
    Peter A. Gloor, Joao Marcos, Patrick M. de Boer, Hauke Fuehres, Wei Lo, Keiichi Nemoto, MIT Center for Collective Intelligence
    Abstract
    In this paper we study the differences in historical World View between Western and Eastern cultures, represented through the English, the Chinese, Japanese, and German Wikipedia. In particular, we analyze the historical networks of the World’s leaders since the beginning of written history, comparing them in the different Wikipedias and assessing cultural chauvinism. We also identify the most influential female leaders of all times in the English, German, Spanish, and Portuguese Wikipedia. As an additional lens into the soul of a culture we compare top terms, sentiment, emotionality, and complexity of the English, Portuguese, Spanish, and German Wikinews.
    1 Introduction
    Over the last ten years the Web has become a mirror of the real world (Gloor et al. 2009). More recently, the Web has also begun to influence the real world: Societal events such as the Arab spring and the Chilean student unrest have drawn a large part of their impetus from the Internet and online social networks. In the meantime, Wikipedia has become one of the top ten Web sites1, occasionally beating daily newspapers in the actuality of most recent news. Be it the resignation of German national soccer team captain Philipp Lahm, or the downing of Malaysian Airlines flight 17 in the Ukraine by a guided missile, the corresponding Wikipedia page is updated as soon as the actual event happened (Becker 2012.
  • Universality, Similarity, and Translation in the Wikipedia Inter-Language Link Network
    In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-language Link Network
    Morten Warncke-Wang (1), Anuradha Uduwage (1), Zhenhua Dong (2), John Riedl (1)
    (1) GroupLens Research, Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, {morten,uduwage,riedl}@cs.umn.edu
    (2) Dept. of Information Technical Science, Nankai University, Tianjin, China
    ABSTRACT
    Wikipedia has become one of the primary encyclopaedic information repositories on the World Wide Web. It started in 2001 with a single edition in the English language and has since expanded to more than 20 million articles in 283 languages. Criss-crossing between the Wikipedias is an inter-language link network, connecting the articles of one edition of Wikipedia to another. We describe characteristics of articles covered by nearly all Wikipedias and those covered by only a single language edition, we use the network to understand how we can judge the similarity between Wikipedias based on concept coverage, and we investigate the flow of translation between a selection of the larger Wikipedias.
    1. INTRODUCTION
    The world: seven seas separating seven continents, seven billion people in 193 nations. The world's knowledge: 283 Wikipedias totalling more than 20 million articles. Some of the content that is contained within these Wikipedias is probably shared between them; for instance it is likely that they will all have an article about Wikipedia itself. This leads us to ask whether there exists some ur-Wikipedia, a set of universal knowledge that any human encyclopaedia will contain, regardless of language, culture, etc? With such a large number of Wikipedia editions, what can we learn about the knowledge in the ur-Wikipedia?
  • Une Minorité Invisible ? the Sorbs, an Invisible Minority?
    Belgeo, Revue belge de géographie, 3 | 2013: Les minorités nationales et ethniques : entre renouvellement et permanence (National and ethnic minorities: between renewal and permanence)
    Les Sorabes : une minorité invisible ? The Sorbs, an invisible minority?
    Hélène Yèche
    Electronic edition. URL: http://journals.openedition.org/belgeo/11570; DOI: 10.4000/belgeo.11570; ISSN: 2294-9135
    Publisher: National Committee of Geography of Belgium, Société Royale Belge de Géographie
    Print edition. Date of publication: 30 December 2013; ISSN: 1377-2368
    Electronic reference: Hélène Yèche, « Les Sorabes : une minorité invisible ? », Belgeo [Online], 3 | 2013, published online 24 May 2014, accessed 22 May 2020. URL: http://journals.openedition.org/belgeo/11570; DOI: https://doi.org/10.4000/belgeo.11570
    This document was generated automatically on 22 May 2020. Belgeo is made available under the terms of the Creative Commons Attribution 4.0 International licence.
    1 Long the democratic alibi of the SED state, the Sorbs of Lusatia, a Slavic linguistic and cultural minority within the German-speaking area whose origins go back to the 6th century, have retained a special status in unified Germany, protected by the Basic Law. The cultural policy pursued in this region of north-eastern Germany since the end of the Second World War by the government of the GDR, and continued after the turning point of 1989-1990 by the new Federal Republic, offers an example
  • Fifth Report of the Federal Republic of Germany
    Fifth Report of the Federal Republic of Germany in accordance with Article 15 (1) of the European Charter for Regional or Minority Languages, 2013
    Table of contents
    A. INTRODUCTION
    B. UPDATED GEOGRAPHIC AND DEMOGRAPHIC INFORMATION
    C. GENERAL TRENDS
      I. DIVISION OF COMPETENCES BETWEEN THE FEDERAL AND STATE LEVELS
      II. COOPERATION ACROSS STATES
      III. CHANGED FRAMEWORK CONDITIONS
      IV. BROCHURE OF THE FEDERAL MINISTRY OF THE INTERIOR
      V. SINTI MUSIC FESTIVAL
      VI. LANGUAGE CONFERENCE
      VII. LANGUAGE CONTEST
      VIII. EUROPEADA
  • A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
    Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC
    A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
    Dwaipayan Roy, Sumit Bhatia, Prateek Jain
    GESIS - Cologne, IBM Research - Delhi, IIIT - Delhi
    Abstract
    Wikipedia is the largest web-based open encyclopedia covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (most exhaustive) and Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap.
    Keywords: Wikipedia, Knowledge base, Information gap
    1. Introduction
    Wikipedia is the largest web-based encyclopedia covering […] other with respect to the coverage of topics as well as the amount of information about overlapping topics.
  • Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
    Information (journal), Article
    Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
    Włodzimierz Lewoniewski (corresponding author), Krzysztof Węcel and Witold Abramowicz
    Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland
    Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020
    Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia.
  • Omnipedia: Bridging the Wikipedia Language
    Omnipedia: Bridging the Wikipedia Language Gap
    Patti Bao*†, Brent Hecht†, Samuel Carton†, Mahmood Quaderi†, Michael Horn†§, Darren Gergle*†
    *Communication Studies, †Electrical Engineering & Computer Science, §Learning Sciences, Northwestern University
    {patti,brent,sam.carton,quaderi}@u.northwestern.edu, {michael-horn,dgergle}@northwestern.edu
    ABSTRACT
    We present Omnipedia, a system that allows Wikipedia readers to gain insight from up to 25 language editions of Wikipedia simultaneously. Omnipedia highlights the similarities and differences that exist among Wikipedia language editions, and makes salient information that is unique to each language as well as that which is shared more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with a multilingual Wikipedia experience. These include visualizing content in a language-neutral way and aligning data in the face of diverse information organization strategies. We present a study of Omnipedia that characterizes how people interact with information using a multilingual lens.
    […] language edition contains its own cultural viewpoints on a large number of topics [7, 14, 15, 27]. On the other hand, the language barrier serves to silo knowledge [2, 4, 33], slowing the transfer of less culturally imbued information between language editions and preventing Wikipedia’s 422 million monthly visitors [12] from accessing most of the information on the site.
    In this paper, we present Omnipedia, a system that attempts to remedy this situation at a large scale. It reduces the silo effect by providing users with structured access in their native language to over 7.5 million concepts from up to 25 language editions of Wikipedia. At the same time, it highlights similarities and differences between each of the
  • Title of Thesis: ABSTRACT CLASSIFYING BIAS
    ABSTRACT Title of Thesis: CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis Directed By: Dr. David Zajic, Ph.D. Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than topic distribution, which suggests that our classifier provides a different perspective on a text’s bias. CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis submitted in partial fulfillment of the requirements of the Gemstone Honors Program, University of Maryland, 2018 Advisory Committee: Dr. David Zajic, Chair Dr. Brian Butler Dr. Marine Carpuat Dr. Melanie Kill Dr. Philip Resnik Mr. Ed Summers © Copyright by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang 2018 Acknowledgements We would like to express our sincerest gratitude to our mentor, Dr.
  • International Journal of Computational Linguistics
    International Journal of Computational Linguistics & Chinese Language Processing
    Aims and Scope
    International Journal of Computational Linguistics and Chinese Language Processing (IJCLCLP) is an international journal published by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). This journal was founded in August 1996 and has been published in four issues per year since 2005. This journal covers all aspects related to computational linguistics and speech/text processing of all natural languages. Possible topics for manuscripts submitted to the journal include, but are not limited to:
    • Computational Linguistics
    • Natural Language Processing
    • Machine Translation
    • Language Generation
    • Language Learning
    • Speech Analysis/Synthesis
    • Speech Recognition/Understanding
    • Spoken Dialog Systems
    • Information Retrieval and Extraction
    • Web Information Extraction/Mining
    • Corpus Linguistics
    • Multilingual/Cross-lingual Language Processing
    Membership & Subscriptions
    If you are interested in joining ACLCLP, please see appendix for further information.
    Copyright © The Association for Computational Linguistics and Chinese Language Processing
    International Journal of Computational Linguistics and Chinese Language Processing is published in four issues per volume by the Association for Computational Linguistics and Chinese Language Processing. Responsibility for the contents rests upon the authors and not upon ACLCLP, or its members. Copyright by the Association for Computational Linguistics and Chinese Language Processing. All rights reserved. No part of this journal may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior permission in writing from the Editor-in-Chief.
    Cover: Calligraphy by Professor Ching-Chun Hsieh, founding president of ACLCLP. Text excerpted and compiled from ancient Chinese classics, dating back to 700 B.C.
  • Sixth Periodical Report Presented to the Secretary General of the Council of Europe in Accordance with Article 15 of the Charter
    Strasbourg, 19 February 2018, MIN-LANG (2018) PR 1
    EUROPEAN CHARTER FOR REGIONAL OR MINORITY LANGUAGES
    Sixth periodical report presented to the Secretary General of the Council of Europe in accordance with Article 15 of the Charter
    GERMANY
    Sixth Report of the Federal Republic of Germany pursuant to Article 15 (1) of the European Charter for Regional or Minority Languages, 2017
    Table of contents
    A. PRELIMINARY REMARKS
    B. UPDATED GEOGRAPHIC AND DEMOGRAPHIC INFORMATION
    C. GENERAL TRENDS
      I. CHANGED FRAMEWORK CONDITIONS
      II. LANGUAGE CONFERENCE, NOVEMBER 2014
      III. DEBATE ON THE CHARTER LANGUAGES IN THE GERMAN BUNDESTAG, JUNE 2017
      IV. ANNUAL IMPLEMENTATION CONFERENCE
      V. INSTITUTE FOR THE LOW GERMAN LANGUAGE, FEDERAL COUNCIL FOR LOW GERMAN
      VI. BROCHURE OF THE FEDERAL MINISTRY OF THE INTERIOR
      VII. LOW GERMAN IN BRANDENBURG
  • Topological Evolution of Networks: Case Studies in the US Airlines and Language Wikipedias By
    Topological Evolution of Networks: Case Studies in the US Airlines and Language Wikipedias
    by Gergana Assenova Bounova
    B.S., Theoretical Mathematics, Massachusetts Institute of Technology (2003)
    B.S., Aeronautics & Astronautics, Massachusetts Institute of Technology (2003)
    S.M., Aeronautics & Astronautics, Massachusetts Institute of Technology (2005)
    Submitted to the Department of Aeronautics and Astronautics in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, February 2009
    © 2009 Gergana A. Bounova, All rights reserved. ... The author hereby grants to MIT the permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.
    Author: Department of Aeronautics and Astronautics, February 27, 2009
    Certified by: Prof. Olivier L. de Weck, Associate Professor of Aeronautics and Astronautics and Engineering Systems, Thesis Supervisor
    Certified by: Prof. Christopher L. Magee, Professor of the Practice of Mechanical Engineering and Engineering Systems
    Certified by: Dr. Daniel E. Whitney, Senior Research Scientist, Center for Technology, Policy and Industrial Development, Senior Lecturer in
  • Langmag July06 14-17.Qxd (Page 1)
    Methodology
    Robert L. Read and Steven D. Brewer explain how Esperanto acts as a springboard for the acquisition of other languages
    Who Knows Where Esperanto Might Lead?
    In 1887, an obscure eye doctor in Poland self-published a little book in Russian. Over the next several years Lingvo Internacia1 appeared in English, French, German, Hebrew, and Polish. This book, written under the pen name Doctor Esperanto, laid the foundation for a new language that would achieve what no other language project had ever done: establish a living community that would go on to survive the death of its creator. Even conservative estimates place the number of active speakers in the tens of thousands, with the number who have learned Esperanto at some time in their lives
    […]ly attain a competency that eluded them in learning an ethnic language or report that they reached a given level of competency in a fraction of the time required by a national language. Early success creates a virtuous cycle which encourages more study and often leads to genuine fluency. Achievement yields positive effects on student self-confidence, insight into the nature of languages in general, and the structure of their native language in particular. Barry Farber writes in his book How to Learn Any Language:2 “It’s said that once you master one foreign language, all others come much more easily.
    […] Esperanto, or any language, provides a propaedeutic effect in learning a next language which is similar. Several factors may contribute to the Corder effect, including similarities in vocabulary, grammatical structure, and word order. Similarity of vocabulary has been shown to be an effective metric for predicting how much knowing one language will help with learning another.5 Since Esperanto was designed to have a widely recognized vocabulary and grammatical features broadly shared across language families, it takes