Learning to Rewrite Queries

Yunlong He, Yahoo! Research, Sunnyvale, CA
Jiliang Tang, Michigan State University, East Lansing, MI
Hua Ouyang, Yahoo! Research, Sunnyvale, CA
Changsung Kang, Yahoo! Research, Sunnyvale, CA
Dawei Yin, Yahoo! Research, Sunnyvale, CA
Yi Chang, Yahoo! Research, Sunnyvale, CA

ABSTRACT

It is widely known that there exists a semantic gap between web documents and user queries, and bridging this gap is crucial to advancing information retrieval systems. The task of query rewriting, which aims to alter a given query into a rewrite query that can close the gap and improve information retrieval performance, has attracted increasing attention in recent years. However, the majority of existing query rewriters are not designed to boost search performance, and consequently their rewrite queries could be sub-optimal. In this paper, we propose a learning to rewrite framework that consists of a candidate generating phase and a candidate ranking phase. The candidate generating phase gives us the flexibility to reuse most existing query rewriters, while the candidate ranking phase allows us to explicitly optimize search relevance. Experimental results on a commercial search engine demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the important components of the proposed framework.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CIKM'16, October 24-28, 2016, Indianapolis, IN, USA
(c) 2016 ACM. ISBN 978-1-4503-4073-1/16/10. $15.00
DOI: http://dx.doi.org/10.1145/2983323.2983835

1 Introduction

Users of the Internet typically play two roles in commercial search engines. They are information creators who generate web documents, and they are also information consumers who retrieve documents for their information needs. It is well known, however, that there is a "lexical chasm" [18] between user queries and web documents because they use different language styles and vocabularies. As a consequence, search engines may be unable to retrieve relevant documents even when the issued queries perfectly describe users' information needs. For example, a user forms the query "how much tesla", but web documents in search engines use the expression "price tesla". Therefore it has become increasingly important for search engines to intelligently satisfy users' information needs by understanding the intent behind their queries.

Query rewriting (QRW), which aims to alter a given query into alternative queries that can improve relevance performance by reducing such mismatches, is a critical task in modern search engines and has attracted increasing attention in recent years [6, 14, 18, 8]. For example, the authors of [14] propose to modify search queries based on typical substitutions that web searchers make to their queries, while [18, 8] treat query rewriting as a machine translation problem and train statistical machine translation (SMT) models on query-snippet pairs. However, the vast majority of existing algorithms focus either on correlations between queries and rewrites or on bridging the language gap between user queries and web documents, which could be sub-optimal for the goal of query rewriting: improving search relevance performance. For example, since the SMT model aims at translating a sentence from a source language into a fluent and grammatically correct sentence in a target language, it prefers to rewrite the query "how much tesla" as "how much is the tesla" instead of "price tesla". Therefore it is necessary to explicitly consider ranking relevance when developing query rewriting methods.

In this paper, we propose a learning to rewrite framework that consists of (1) candidate generating and (2) candidate ranking, as shown in Figure 1. Given a query, we create possible candidates via a set of candidate generators in the candidate generating phase; given a learning target, we then train a scoring function to rank these candidates in the candidate ranking phase. The advantages of this framework are two-fold. First, the candidate generating phase not only allows us to reuse most existing query rewriting algorithms as candidate generators but also enables us to explore recent advanced techniques such as deep learning to build new candidate generators. Second, the candidate ranking phase gives us the flexibility to choose the target to optimize for query rewriting; for example, in this work we want to boost relevance performance, so we choose the relevance target to rank candidates.

Figure 1: Flow chart of the proposed learning to rewrite framework

Our contributions are summarized below:

• Define the problem of learning to rewrite formally;
• Propose a learning to rewrite framework that is flexible enough to incorporate existing query rewriting algorithms and to optimize the relevance performance explicitly; and
• Conduct experiments on a commercial search engine to demonstrate the effectiveness of the proposed framework.

The rest of the paper is organized as follows. In Section 2, we briefly review the related work. In Section 3, we formally define the problem and introduce the proposed framework in detail. In Section 4, we present an empirical evaluation with discussion. In Section 5, we conclude and outline future work.

2 Related Work

Query expansion and rewriting have long been important research topics in information retrieval [3]. Xu and Croft [23] studied using the top-ranked documents retrieved by the original query to expand the query. This method is sensitive to the initial ranking results and does not learn from user-generated data. Later approaches [6, 7, 14] focus on using user query logs to generate query expansions by collecting signals such as clickthrough rate [6, 24], co-occurrence in search sessions [14], or query similarity based on click graphs [7, 1]. Since search logs contain query-document pairs clicked by millions of users, the term correlations reflect the preference of the majority of users. However, the correlation-based method, as pointed out by [19], suffers from low precision, partly because the correlation model does not explicitly capture context information and is susceptible to noise. More recently, natural language technology in the form of statistical machine translation (SMT) [19, 18, 8, 9] has been introduced for the query expansion and rewriting problems. In the SMT system all component models are properly smoothed using sophisticated techniques to avoid sparse data problems, while the correlation model relies on pure counts of term frequencies. However, the SMT system is used as a black box instead of being fully tuned for the task of query rewriting.

Another related research topic is query recommendation. The authors of [12] make use of search session data to find query pairs that frequently co-occur in the same sessions, which are used as suggestions for each other. Baeza-Yates et al. [2] propose to rank the clustered queries according to two principles: i) the similarity of the queries to the input query, and ii) the support of the suggested query, which measures the magnitude of answers returned in the past to this query that have attracted the attention of users. A follow-up work [4] combines various click-based, topic-based and session-based ranking strategies and uses supervised learning to maximize the semantic similarity between the query and the recommendations. Cao et al. [5] addressed the data sparseness issue by summarizing queries into concepts through clustering a click-through bipartite graph. Then, from session data, a concept sequence suffix tree is constructed as the query suggestion model.

Recently, deep learning techniques [16, 21] have been applied to query processing and machine translation tasks. For example, the authors of [11] generate distributed language models for queries to improve relevance in sponsored search. The authors of [21] applied recurrent neural networks to machine translation tasks and achieved state-of-the-art performance compared to traditional SMT systems. A hierarchical recurrent encoder-decoder method is proposed in [20] for the task of query auto-completion.

In this work, our approach is formulated as a learning to rewrite framework whose key component is candidate ranking. The candidate ranking phase is therefore similar to the learning to rank framework in many aspects, such as loss functions [25, 22] and ranking features [15].

3 Learning to Rewrite Framework

The query rewriting problem aims to find the query rewrites of a given query for the purpose of improving the relevance of the information retrieval system. The proposed framework formulates the query rewriting problem as an optimization problem of finding a scoring function F(q, r) which assigns a score to any pair of query q and its rewrite candidate ...

... with traditional techniques. Therefore a candidate generator based on deep learning techniques could be complementary to traditional ones and potentially provides better candidates. A Recurrent Neural Network (RNN) is a neural sequence model that achieves state-of-the-art performance on many important sequential learning tasks. The long short-term memory (LSTM) network is one of the most popular RNN variants. It can learn long-range temporal dependencies and mitigate the vanishing gradient problem. We propose to use the Sequence-to-Sequence LSTM model [21] to build a new candidate generator.
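
The two-phase framework described in the text (pool candidates from several generators, then rank them with a scoring function F(q, r)) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in chosen for illustration and is not the authors' implementation: the two toy generators, the substitution table, and `toy_score`, which replaces the learned scoring function with a simple overlap against a hand-written document vocabulary.

```python
from typing import Callable, Iterable, List, Tuple

# A candidate generator maps a query to zero or more rewrite candidates.
Generator = Callable[[str], Iterable[str]]
# A scorer plays the role of F(q, r): higher means a better rewrite.
Scorer = Callable[[str, str], float]


def substitution_generator(query: str) -> List[str]:
    """Toy generator: phrase substitutions from a hypothetical table."""
    table = {"how much": "price"}  # illustrative entry only
    return [query.replace(src, dst) for src, dst in table.items() if src in query]


def fluent_generator(query: str) -> List[str]:
    """Toy stand-in for a generator that prefers fluent, grammatical output."""
    prefix = "how much "
    if query.startswith(prefix):
        return ["how much is the " + query[len(prefix):]]
    return []


def generate_candidates(query: str, generators: Iterable[Generator]) -> List[str]:
    """Candidate generating phase: pool and de-duplicate generator outputs."""
    seen = set()
    pooled = []
    for generate in generators:
        for candidate in generate(query):
            if candidate != query and candidate not in seen:
                seen.add(candidate)
                pooled.append(candidate)
    return pooled


def rank_candidates(query: str, candidates: Iterable[str],
                    score: Scorer) -> List[Tuple[str, float]]:
    """Candidate ranking phase: sort candidates by score, best first."""
    scored = [(r, score(query, r)) for r in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(query: str, rewrite: str) -> float:
    """Hypothetical F(q, r): share of rewrite terms in a tiny document vocabulary.

    A real system would learn this function from relevance data instead.
    """
    doc_vocab = {"price", "tesla"}  # assumed vocabulary, for illustration
    terms = rewrite.split()
    if not terms:
        return 0.0
    return sum(term in doc_vocab for term in terms) / len(terms)


if __name__ == "__main__":
    q = "how much tesla"
    pool = generate_candidates(q, [substitution_generator, fluent_generator])
    for rewrite, value in rank_candidates(q, pool, toy_score):
        print(f"{value:.2f}  {rewrite}")
```

With these toy components, the fluent rewrite "how much is the tesla" is generated but ranks below "price tesla", which mirrors the motivation in the introduction: generators may optimize for something other than relevance, and the ranking phase is where the relevance target is applied. Because generators share one signature, an existing rewriter or a sequence-to-sequence model could in principle be plugged in without changing the ranking code.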