Char-RNN for Word Stress Detection in East Slavic Languages

Total Page:16

File Type:pdf, Size:1020Kb

Char-RNN for Word Stress Detection in East Slavic Languages Char-RNN for Word Stress Detection in East Slavic Languages Maria Ponomareva Kirill Milintsevich Ekaterina Artemova ABBYY University of Tartu National Research University Moscow, Russia Tartu, Estonia Higher School of Economics [email protected] [email protected] Moscow, Russia [email protected] Abstract 1. how well does the sequence labeling ap- proach suit the word stress detection task? We explore how well a sequence labeling approach, namely, recurrent neural net- 2. among the investigated RNN-based archi- work, is suited for the task of resource-poor tectures, what is the best one for the task? and POS tagging free word stress detec- tion in the Russian, Ukranian, Belarusian 3. can a word detection system be trained languages. We present new datasets, an- on one or a combination of languages and notated with the word stress, for the three successfully used for another language? languages and compare several RNN mod- els trained on three languages and explore To tackle these questions we: possible applications of the transfer learn- ing for the task. We show that it is possi- 1. compare the investigated RNN-based ble to train a model in a cross-lingual set- ting and that using additional languages models for the word stress detection task improves the quality of the results. on a standard dataset in Russian and se- lect the best one; 1 Introduction 2. create new data sets in Russian, It is impossible to describe Russian (and any Ukrainian and Belarusian and con- other East Slavic) word stress with a set of duct a series of mono- and cross-lingual hand-picked rules. While the stress can be experiments to study the possibility of fixed at a word base or ending along the cross-lingual analysis. whole paradigm, it can also change its posi- tion. The word stress detection task is im- The paper is structured as follows: we start portant for text-to-speech solutions and word- with the description of the datasets created. level homonymy resolving. Moreover, stress Next, we present our major approach to the detecting software is in demand among Rus- selection of neural network architecture. Fi- sian learners. nally, we discuss the results and related work. One of the approaches to solving this prob- 2 Dataset lem is a dictionary-based system. It simply keeps all the wordforms and fails at OOV- In this project, we approach the word stress words. The rule-based approach offers better detection problem for three East Slavic lan- results; however collecting the word stress pat- guages: Russian, Ukrainian and Belarusian, terns is a highly time consuming task. Also, which are said to be mutually intelligible to the method cannot manage words without spe- some extent. Our preliminary experiments cial morpheme markers. As shown in (Pono- along with the results of (Ponomareva et al., mareva et al., 2017), even simple deep learning 2017) show that using context, i.e., left and methods easily outperform all the approaches right words to the word under consideration, described above. is of great help. Hence, such data sources as In this paper we address the following re- dictionaries, including Wiktionary, do not sat- search questions: isfy these requirements, because they provide 35 Proceedings of VarDial, pages 35–41 Minneapolis, MN, June 7, 2019 c 2019 Association for Computational Linguistics only single words and do not provide context are shared between Ukranian and Belarusian words. datasets and between Russian and Ukranian To our knowledge, there are no corpora, an- and Belarusian datasets. The intersection notated with word stress for Ukrainian and between the Ukrainian and Russian datasets Belarusian, while there are available transcrip- amounts around 200 words. tions from the speech subcorpus in Russian1 The structure of the dataset is straightfor- of Russian National Corpus (RNC) (Grishina, ward: each entry consists of a word trigram 2003). Due to the lack of necessary corpora, and a number, which indicates the position of we decided to create them manually. the word stress in the central word3. The approach to data annotation is quite simple: we adopt texts from Universal Depen- dencies project and use provided tokenization and POS-tags, conduct simple filtering and use a crowdsourcing platform, Yandex.Toloka2, for the actual annotation. To be more precise, we took Russian, Ukrainian and Belarusian treebanks from Uni- versal Dependencies project. We split each text from these treebanks in word trigrams and filtered out unnecessary trigrams, where cen- Figure 1: A screenshot of the word stress detection ter words correspond to NUM, PUNCT, and task from Yandex.Toloka crowdsourcing platform other non-word tokens. The next step is to create annotation tasks for the crowdsourcing platform. We formulate word stress annota- 3 Preprocessing tion task as a multiple choice task: given a We followed a basic preprocessing strategy for trigram, the annotator has to choose the word all the datasets. First, we tokenize all the texts stress position in the central word by choosing into words. Next, to take the previous and one of the answer options. Each answer option next word into account we define left and right is the central word, where one of the vowels is contexts of the word as the last three charac- capitalized to highlight a possible word stress ters of the previous word and last three char- position. The example of an annotation task acters of the next word. The word stresses (if is provided in Fig.1. Each task was solved any) are removed from context characters. If by three annotators. As the task is not com- the previous / next word has less than three plicated, we decide to accept only those tasks letters, we concatenate it with the current where all three annotators would agree. Fi- word (for example, “te_oblak´a”[that-Pl.Nom nally, we obtained three sets of trigrams for the cloud-Pl.Nom]). This definition of context is Russian, Ukrainian and Belarusian languages used since East Slavic endings are typically of approximately the following sizes 20K, 10K, two-four letters long and derivational mor- 3K correspondingly. The sizes of the resulting phemes are usually located on the right pe- datasets are almost proportional to the initial riphery of the word. corpora from the Universal Dependencies tree- Finally, each character is annotated with one banks. of the two labels L = f0; 1g: it is annotated Due to the high quality of the Universal De- with 0, if there is no stress, and with 1, if there pendencies treebanks and the languages being should be a stress. An example of an input not confused, there are little intersections be- character string can be found in Table1. tween the datasets, i.e., only around 50 words 4 Model selection 1Word stress in spoken texts database in Russian National Corpus [Baza dannykh aktsentologicheskoy We treat word stress detection as a sequence razmetki ustnykh tekstov v sostave Natsional’nogo kor- labeling task. Each character (or syllable) is pusa russkogo yazyka], http://www.ruscorpora.ru/en/ search-spoken.html 3Datasets are avaialable at: https://github.com/ 2https://toloka.yandex.ru MashaPo/russtressa 36 in лая ворона тит out 00000001000000 Table 1: Character model input and output: each character is annotated with either 0, or 1. A tri- gram “белая ворона летит” (“white crow flies”) is annotated. The central word remains unchanged, while its left and right contexts are reduced to the last three characters labeled with one of the labels L = f0; 1g, indi- cating no stress on the character (0) or a stress (1). Given a string s = s1; : : : ; sn of characters, ∗ ∗ ∗ the task is to find the labels Y = y1; : : : ; yn, Figure 2: Local model for word stress detection such that Y ∗ = arg max p(Y js): position of the stress instead of making a series Y 2Ln of local decisions if there should be a stress on The most probable label is assigned to each each character or not. character. We compare two RNN-based models for the task of word stress detection (see Fig.2 and Fig.3). Both models have a common input and hidden layers but differ in output layers. The input of both models are embeddings of the characters. In both cases, we use bidirec- tional LSTM of 32 units as the hidden layer. Further, we describe the difference between the output layers of the two models. 4.1 Local model The decision strategy of the local model (see Fig.2) follows common language modeling and NER architectures (Ma and Hovy, 2016): all Figure 3: Global model for word stress detection outputs are independent of each other. We de- cide, whether there should be a stress on each To test the approach and to compare these given symbol (or syllable) or not. To do this models, we train two models on the subcorpus for each character we put an own dense layer of Russian National Corpus for Word stress with two units and a softmax activation func- in spoken texts, which appears to be a stan- tion, applied to the corresponding hidden state dard dataset for the task of word stress detec- of the recurrent layer, to label each input char- tion. This dataset was preprocessed according acter (or syllable) with L = f0; 1g. to our standard procedure, and the resulting dataset contains approximately around 1M tri- 4.2 Global model grams. The results of cross-validation exper- The decision strategy of the global model (see iments, presented in Table2, show that the Fig.3) follows common encoder-decoder archi- global model outperforms significantly the lo- tectures (Sutskever et al., 2014). We use the cal model. Hence, the global architecture is hidden layer to encode the input sequence into used further on in the next experiments.
Recommended publications
  • Detecting Inappropriate Messages on Sensitive Topics That Could Harm a Company's Reputation
    Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company’s Reputation∗ Nikolay Babakovz, Varvara Logachevaz, Olga Kozlovay, Nikita Semenovy, and Alexander Panchenkoz zSkolkovo Institute of Science and Technology, Moscow, Russia yMobile TeleSystems (MTS), Moscow, Russia fn.babakov,v.logacheva,[email protected] foskozlo9,[email protected] Abstract texts which are not offensive as such but can ex- press inappropriate views. If a chatbot tells some- Not all topics are equally “flammable” in terms thing that does not agree with the views of the com- of toxicity: a calm discussion of turtles or pany that created it, this can harm the company’s fishing less often fuels inappropriate toxic di- alogues than a discussion of politics or sexual reputation. For example, a user starts discussing minorities. We define a set of sensitive top- ways of committing suicide, and a chatbot goes ics that can yield inappropriate and toxic mes- on the discussion and even encourages the user to sages and describe the methodology of collect- commit suicide. The same also applies to a wide ing and labeling a dataset for appropriateness. range of sensitive topics, such as politics, religion, While toxicity in user-generated data is well- nationality, drugs, gambling, etc. Ideally, a chat- studied, we aim at defining a more fine-grained bot should not express any views on these subjects notion of inappropriateness. The core of inap- propriateness is that it can harm the reputation except those universally approved (e.g. that drugs of a speaker. This is different from toxicity are not good for your health).
    [Show full text]
  • Ride-Hailing Businesses Allowing To
    Company Presentation July 2021 This presentation contains forward-looking statements that involve risks and uncertainties. These include statements regarding our future financial and business performance, our business and strategy and the impact of the COVID-19 pandemic on our industry, business and financial results. Actual results may differ materially from the results predicted or implied by such statements, and our reported results should not be considered as an indication of future Forward performance. The potential risks and uncertainties that could cause actual results to differ from the results predicted or implied by such statements include, among others, the Looking impact of the ongoing COVID-19 pandemic and regulatory and business responses to that crisis, macroeconomic and geopolitical developments affecting the Russian Statement economy or our business, changes in the political, legal and/or regulatory environment, competitive pressures, changes in advertising patterns, changes in user preferences, technological developments, and our need to expend capital to accommodate the growth Disclaimer of the business, as well as those risks and uncertainties included under the captions “Risk Factors” and “Management’s Discussion and Analysis of Financial Condition and Results of Operations” in our Annual Report on Form 20-F dated April 1, 2021, which is on file with the Securities and Exchange Commission and is available on our investor relations website. All information provided in this presentation is as of July 28, 2021, and Yandex
    [Show full text]
  • Murat APISHEV SUMMARY EDUCATION WORK EXPERIENCE
    Murat APISHEV Moscow / Russia | +7-916-769-82-75 [email protected] github | My page (in russian) SUMMARY I am an expert in data science, machine learning, natural language processing and software development. My skills include data analysis, training ML and DL models, designing and writing efficient code, team leading. Also I teach a lot and like to be a public speaker. EDUCATION MOSCOW STATE UNIVERSITY (CMC) / FRC IM OF THE RUSSIAN ACADEMY OF SCIENCES 2017-2020 PH.D IN COMPUTER SCIENCE •Effective Implementation of Topic Modeling Algorithms with Additive Regularization MOSCOW STATE UNIVERSITY (CMC) 2015-2017 MASTER OF APPLIED MATHEMATICS AND COMPUTER SCIENCE MOSCOW STATE UNIVERSITY (CMC) 2011-2015 BACHELOR OF APPLIED MATHEMATICS AND COMPUTER SCIENCE WORK EXPERIENCE AND PROJECTS TALKMART42 April 2020 - Now DATA SCIENTIST (CO-FOUNDER) •Prototype of voice control system for self-service checkout for retail company (Python / NLP) •English neural QA-system for specific domain for advertising agency (Python / NLP) •Advertisements classification system for advertising and media research company (Python / NLP) •Voice games for Yandex Alice and Sber Salut (Python) DIGITAL DECISIONS (AITHEA) April 2019 - April 2020 NLP TEAM LEADER / DATA SCIENTIST (SENIOR) •Spell checker for government organization (C++ / Python / NLP) •English NER in application onboarding for B2C startup (Project managment) •Search and recommender system for English scientific articles (Project managment / Team leading) •Trend detection in Twitter and YouTube data for Instinct
    [Show full text]
  • Transmesis in Slavic Literary Postmodernism
    Transmesis in Slavic Literary Postmodernism: Understanding Translation through Fiction by Roman Ivashkiv A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Slavic Languages and Literatures Department of Modern Languages and Cultural Studies University of Alberta ©Roman Ivashkiv, 2015 ii Abstract This dissertation examines the phenomenon of transmesis — the mimesis or portrayal of translation in fiction — in three postmodernist novels in Ukrainian and Russian and their English translations: Yuri Andrukhovych’s Perverziia (translated by Michael Naydan), Serhiy Zhadan’s Depesh Mod (translated by Myroslav Shkandrij), and Viktor Pelevin’s Generation “П” (translated by Andrew Bromfield). My objective is to explore the use and identify the purposes of transmesis in fiction, to investigate issues of untranslatability to which it gives rise, and to identify the implications of transmesis for translation theory and practice. Transmesis, a term coined by Thomas Beebee, stands for the representation in fiction of translation, both as a process and a product, as well as for the portrayal of the figure of the translator in a fictional text. In a larger historico-theoretical framework, the concept of transmesis stands at the juncture of the so-called cultural and fictional turns in translation studies. While the former has been pivotal in expanding our understanding of translation as a cultural rather than merely a linguistic act, the latter has unraveled the potential of fictional portrayals of translation,
    [Show full text]
  • Working Conditions on Digital Labour Platforms: Evidence from a Leading Labour Supply Economy
    DISCUSSION PAPER SERIES IZA DP No. 12245 Working Conditions on Digital Labour Platforms: Evidence from a Leading Labour Supply Economy Mariya Aleksynska Anastasia Bastrakova Natalia Kharchenko MARCH 2019 DISCUSSION PAPER SERIES IZA DP No. 12245 Working Conditions on Digital Labour Platforms: Evidence from a Leading Labour Supply Economy Mariya Aleksynska IZA and FrDB Anastasia Bastrakova Kiev International Institute of Sociology Natalia Kharchenko Kiev International Institute of Sociology MARCH 2019 Any opinions expressed in this paper are those of the author(s) and not those of IZA. Research published in this series may include views on policy, but IZA takes no institutional policy positions. The IZA research network is committed to the IZA Guiding Principles of Research Integrity. The IZA Institute of Labor Economics is an independent economic research institute that conducts research in labor economics and offers evidence-based policy advice on labor market issues. Supported by the Deutsche Post Foundation, IZA runs the world’s largest network of economists, whose research aims to provide answers to the global labor market challenges of our time. Our key objective is to build bridges between academic research, policymakers and society. IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper should account for its provisional character. A revised version may be available directly from the author. ISSN: 2365-9793 IZA – Institute of Labor Economics Schaumburg-Lippe-Straße 5–9 Phone: +49-228-3894-0 53113 Bonn, Germany Email: [email protected] www.iza.org IZA DP No. 12245 MARCH 2019 ABSTRACT Working Conditions on Digital Labour Platforms: Evidence from a Leading Labour Supply Economy1 Online labour platforms matching labour supply and demand are profoundly modifying the world of work.
    [Show full text]
  • List of Merchants 2
    Merchant Name Date Registered Merchant Name Date Registered Merchant Name Date Registered ClubWPT VIP 06/08/2020 Fresha.com SV Ltd-Sue DeG 05/08/2020 Jurys Hotel Management (U 04/08/2020 Creative Curb Appeal 06/08/2020 GRØNN KONTAKT AS-Gronn Ko 05/08/2020 Koninklijke Ahrend BV-Gis 04/08/2020 Duifhuizen E-shop B.V.-Du 06/08/2020 Kouriten Ltd-Kouriten Ltd 05/08/2020 LEGO System A/S-LEGO, Ric 04/08/2020 Fresha.com SV Ltd-Apsara 06/08/2020 MEWS Systems B.V.-AethosS 05/08/2020 Lawn Works Triad, Inc. 04/08/2020 Fresha.com SV Ltd-BonsaiH 06/08/2020 Nespresso-Elements 05/08/2020 MEWS Systems B.V.-Cityhot 04/08/2020 Fresha.com SV Ltd-Bronzed 06/08/2020 Northern Virginia Landsca 05/08/2020 Nextbike-NEXTBIKE NEW ZEA 04/08/2020 Fresha.com SV Ltd-Capital 06/08/2020 Parkopedia Ltd-Parkopedia 05/08/2020 Qashier Pte. Ltd.-Donburi 04/08/2020 Fresha.com SV Ltd-Palm Vi 06/08/2020 Plan It Recreation II 05/08/2020 Qashier Pte. Ltd.-SILVERB 04/08/2020 Fresha.com SV Ltd-Rose Be 06/08/2020 RC Hotels (Pte.) Ltd.-RC 05/08/2020 Risol Imports-RP Pitlochr 04/08/2020 Fresha.com SV Ltd-Roxstar 06/08/2020 Roberts Dental Center 05/08/2020 Roberts Home Services 04/08/2020 Fresha.com SV Ltd-Seraphi 06/08/2020 SARA MART LIMITED-SARA MA 05/08/2020 SLC Scapes, LLC 04/08/2020 Fresha.com SV Ltd-The Hiv 06/08/2020 Smiles on Belmont 05/08/2020 SMCPHolding-DE FURSAC KIN 04/08/2020 Fresha.com SV Ltd-The Sir 06/08/2020 Soham Inc.-AIREM - Airem 05/08/2020 The Greener Side Lawn Car 04/08/2020 Fresha.com SV Ltd-Tropica 06/08/2020 Soham Inc.-GOAT - West Va 05/08/2020 UK Tool Centre Limited-Ro 04/08/2020 Fresha.com SV Ltd-VIVI Sa 06/08/2020 Soham Inc.-GOAT Haircuts 05/08/2020 Uber B.V.-UberPE 04/08/2020 GGR S.r.l.-GianvitoRoss P 06/08/2020 Svenska Te-Centralen AB-S 05/08/2020 Uber BV-UberDirectPE 04/08/2020 Lawn Fix, INC 06/08/2020 Uberall GmbH-Uberall GmbH 05/08/2020 apaleo GmbH-Goldener Grei 04/08/2020 MOW-IT-PROS, LLC 06/08/2020 Vodelca BV-Vodelca B.V.
    [Show full text]
  • Skin-Related Neglected Tropical Diseases (Skin-Ntds) a New Challenge
    Skin-Related Neglected Tropical Diseases (Skin-NTDs) A New Challenge Edited by Roderick J. Hay and Kingsley Asiedu Printed Edition of the Special Issue Published in Tropical Medicine and Infectious Disease www.mdpi.com/journal/tropicalmed Skin-Related Neglected Tropical Diseases (Skin-NTDs) Skin-Related Neglected Tropical Diseases (Skin-NTDs) A New Challenge Special Issue Editors Roderick J. Hay Kingsley Asiedu MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade Special Issue Editors Roderick J. Hay Kingsley Asiedu The International Foundation for Dermatology World Health Organization UK Switzerland Editorial Office MDPI St. Alban-Anlage 66 4052 Basel, Switzerland This is a reprint of articles from the Special Issue published online in the open access journal Tropical Medicine and Infectious Disease (ISSN 2414-6366) from 2018 to 2019 (available at: https://www. mdpi.com/journal/tropicalmed/special issues/Skin NTDs). For citation purposes, cite each article independently as indicated on the article page online and as indicated below: LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Article Number, Page Range. ISBN 978-3-03921-253-8 (Pbk) ISBN 978-3-03921-254-5 (PDF) Cover image courtesy of Daniel Mason. c 2019 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications. The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.
    [Show full text]
  • Work on Digital Labour Platforms in Ukraine
    WORK ON DIGITAL LABOUR PLATFORMS IN UKRAINE ISSUES AND POLICY PERSPECTIVES Mariya Aleksynska (ILO) Anastasia Bastrakova Natalia Kharchenko (Kiev International Institute of Sociology) WORK ON DIGITAL LABOUR 02 PLATFORMS IN UKRAINE: ISSUES AND POLICY PERSPECTIVES Copyright © International Labour Organization 2018 First published 2018 Publications of the International Labour Office enjoy copyright under Protocol 2 of the Universal Copy- right Convention. Nevertheless, short excerpts from them may be reproduced without authorization, on condition that the source is indicated. For rights of reproduction or translation, application should be made to ILO Publications (Rights and Licensing), International Labour Office, CH-1211 Geneva 22, Switzerland, or by email: [email protected]. The International Labour Office welcomes such applications. Libraries, institutions and other users registered with a reproduction rights organization may make copies in accordance with the licences issued to them for this purpose. Visit www.ifrro.org to find the reproduction rights organization in your country. Work on Digital Labour Platforms in Ukraine: Issues and Policy Perspectives ISBN: 978-92-2-030734-2 (print) ISBN: 978-92-2-030735-9 (web pdf) Also available in Ukranian: Зайнятість через цифрові платформи в Україні. Проблеми та Стратегічні Перспективи, Geneva, 2018 978-92-2-030736-6 (print) 978-92-2-030737-3 (web pdf) The designations employed in ILO publications, which are in conformity with United Nations practice, and the presentation of material therein do not imply the expression of any opinion whatsoever on the part of the International Labour Office concerning the legal status of any country, area or territory or of its authori- ties, or concerning the delimitation of its frontiers.
    [Show full text]
  • Crowdsourcing of Parallel Corpora: the Case of Style Transfer for Detoxification
    Crowdsourcing of Parallel Corpora: the Case of Style Transfer for Detoxification Daryna Dementieva1, Sergey Ustyantsev1, David Dale1, Olga Kozlova2, Nikita Semenov2, Alexander Panchenko1 and Varvara Logacheva1 1Skolkovo Institute of Science and Technology, Moscow, Russia 2Mobile TeleSystems (MTS), Moscow, Russia Abstract One of the ways to fighting toxicity online is to automatically rewrite toxic messages. This is asequence- to-sequence task, and the easiest way of solving it is to train an encoder-decoder model on a set of parallel sentences (pairs of sentences with the same meaning, where one is offensive and the other is not). However, such data does not exist, making researchers resort to non-parallel corpora. We close this gap by suggesting a crowdsourcing scenario for creating a parallel dataset of detoxifying paraphrases. In our first experiments, we collect paraphrases for 1,200 toxic sentences. We describe and analyse the crowdsourcing setup and the resulting corpus. Keywords toxicity, dataset, crowdsourcing, parallel data 1. Introduction Toxicity is a major problem on the Internet. There are multiple strategies for fighting it. Detection of toxicity, being a popular area of research in NLP [1,2,3], is a necessary first step for any toxicity removal strategy. There exist multiple methods for toxicity detection [4, 5] and datasets [6,7,8] which can be used as training data for toxicity detection models. Toxic messages can differ in terms of their content. For some of them, the offence istheonly content, and their only goal is to insult the reader. On the other hand, other messages classified as toxic can contain a non-toxic and useful content that is only expressed in a toxic way.
    [Show full text]
  • Master Thesis Wkmeijer an ... Acking.Pdf
    Android App Tracking Investigating the feasibility of tracking user behavior on mobile phones by analyzing en- crypted network traffic W.K. Meijer Technische Universiteit Delft Android app tracking Investigating the feasibility of tracking user behavior on mobile phones by analyzing encrypted network traffic by W.K. Meijer to obtain the degree of Master of Science at the Delft University of Technology, to be defended publicly on 17 December 2019. Student number: 4224426 Project duration: 19 November 2018 – 17 December 2019 Thesis committee: Dr. C. Doerr, TU Delft, supervisor Dr. ir. J.C.A. van der Lubbe, TU Delft, chair Dr. F.A.Oliehoek TU Delft An electronic version of this thesis is available at http://repository.tudelft.nl/. Abstract The mobile phone has become an important part of people’s lives and which apps are used says a lot about a person. Even though data is encrypted, meta-data of network traffic leaks private information about which apps are being used on mobile devices. Apps can be detected in network traffic using the networkfingerprint of an app, which shows what a typical connection of the app resembles. In this work, we investigate whether fingerprinting apps is feasible in the real world. We collected automatically generated data from various versions of around 500 apps and real-world data from over 65 unique users. We learn thefingerprints of the apps by training a Random Forest on the collected data. This Random Forest is used to detect appfingerprints in network traffic. We show that it is possible to build a model that can classify a specific subset of apps in network traffic.
    [Show full text]
  • LAPPEENRANTA-LAHTI UNIVERSITY of TECHNOLOGY LUT School of Business and Management Degree Programme in International Business and Entrepreneurship
    LAPPEENRANTA-LAHTI UNIVERSITY OF TECHNOLOGY LUT School of Business and Management Degree Programme in International Business and Entrepreneurship Eetu Paju INTERNATIONALIZATION OF FINNISH COMPLEMENTORS THROUGH DIGITAL PLATFORM AND ECOMMERCE MARKET IN RUSSIA – CASE YANDEX Examiners: Professor Juha Väätänen Post-Doctoral Researcher Roman Teplov ABSTRACT Lappeenranta-Lahti University of Technology LUT School of Business and Management Degree Programme in International Business and Entrepreneurship Eetu Paju Internationalization of Finnish complementors through digital platform and ecommerce market in Russia – case Yandex Master’s Thesis 2021 104 pages, 10 figures, 9 tables and 2 appendices Examiners: Professor Juha Väätänen, Post-Doctoral Researcher Roman Teplov Keywords: digital platforms, cross-border e-commerce, internationalization, Russia Digital platforms and e-commerce markets have been studied widely and researchers have been interested to explain how the platforms facilitate transactions and innovations between the users and how the ecosystem of digital platforms operates. The study of digital platforms is still missing information about Russian digital platform providers and the aim of this research was to fill the research gap by studying Russian digital platform from the perspective of a case company Yandex. The research explains how the company has internationalized first to its closest markets and how it has later expanded one if its products segments to multiple foreign markets. The research was conducted with a qualitative research method. The empirical part was divided into two parts where the first part studies Russian digital platform provider Yandex as the case company and the second part focuses on complementors of Yandex’s ecommerce platform by interviewing three Finnish companies which products are sold in the company’s ecommerce platform.
    [Show full text]
  • PRESERVING the DNIPRO RIVER Harmony, History and Rehabilitation PRESERVING the DNIPRO RIVER Harmony, History and Rehabilitation
    PRESERVING THE DNIPRO RIVER harmony, history and rehabilitation PRESERVING THE DNIPRO RIVER harmony, history and rehabilitation International Dnipro Fund, Kiev, Ukraine, National Academy of Sciences of Ukraine, International Development Research Centre, Ottawa, Canada, National Research Institute of Environment and Resources of Ukraine PRESERVING THE DNIPRO RIVER harmony, history and rehabilitation Vasyl Yakovych Shevchuk Georgiy Oleksiyovich Bilyavsky Vasyl Mykolayovych Navrotsky Oleksandr Oleksandrovych Mazurkevich International Development Research Centre Ottawa • Cairo • Dakar • Montevideo • Nairobi • New Delhi • Singapore Library and Archives Canada Cataloguing in Publication Preserving the Dnipro River / V.Y. Schevchuk ... [et al. Includes bibliographical references and index. ISBN 0-88962-827-0 1. Water quality management—Dnieper River. 2. Dnieper River—Environmental conditions. I. Schevchuk, V. Y. QH77.U38P73 2004 333.91'62153'09477 C2004-906230-1 No part of this book may be reproduced or transmitted in any form, by any means, electronic or mechanical, including photocopying and recording, in- formation storage and retrieval systems, without permission in writing from the publisher, except by a reviewer who may quote brief passages in a review. Publishing by Mosaic Press, offices and warehouse at 1252 Speers Rd., units 1 & 2, Oakville, On L6L 5N9, Canada and Mosaic Press, PMB 145, 4500 Witmer Industrial Estates, Niagara Falls, NY, 14305-1386, U.S.A. and In- ternational Development Research Centre PO Box 8500 Ottawa, ON K1G 3H9/Centre de recherches pour le developpement international BP 8500 Ottawa, ON K1G 3H9 ([email protected] / www.idrc.ca) Copyright©The Authors, 2005 Printed and Bound in Canada Co-published by Mosaic Press (ISBN 0-88962-827-0) and the International Development Research Centre (ISBN 1-55250-138-8).
    [Show full text]