A Probabilistic Algorithm for Segmenting Non-Kanji Japanese Strings (1994)


From: AAAI-94 Proceedings. Copyright © 1994, AAAI (www.aaai.org). All rights reserved.

A Probabilistic Algorithm for Segmenting Non-Kanji Japanese Strings

Virginia Teller
Hunter College and the Graduate School, The City University of New York
695 Park Avenue, New York, NY 10021
[email protected]

Eleanor Olds Batchelder
The Graduate School, The City University of New York
33 West 42nd Street, New York, NY 10036
[email protected]

Abstract

We present an algorithm for segmenting unrestricted Japanese text that is able to detect up to 98% of the words in a corpus. The segmentation technique, which is simple and extremely fast, does not depend on a lexicon or any formal notion of what a word is in Japanese, and the training procedure does not require annotated text of any kind. Relying almost exclusively on character type information and a table of hiragana bigram frequencies, the algorithm makes a decision as to whether to create word boundaries or not. This method divides strings of Japanese characters into units that are computationally tractable and that can be justified on lexical and syntactic grounds as well.

Introduction

A debate is being waged in the field of machine translation about the degree to which rationalist and empiricist approaches to linguistic knowledge should be used in MT systems. While most participants in the debate seem to agree that both methods are useful, albeit for different tasks, few have compared the limits of knowledge-based and statistics-based techniques in the various stages of translation.

The perspective adopted in this paper represents one end of the rationalist-empiricist spectrum. We report a study that assumes almost no rule-based knowledge and attempts to discover the maximum results that can be achieved with primarily statistical information about the problem domain. For the task of segmenting non-kanji strings in unrestricted Japanese text, we found that the success rate of this minimalist method approaches 95%.

Background

In recent years, researchers in the field of natural language processing have become interested in analyzing increasingly large bodies of text. Whereas a decade ago a corpus of one million words was considered large, corpora consisting of tens of millions of words are common today, and several are close to 100 million words. Since exhaustively parsing such enormous amounts of text is impractical, lexical analyzers called part-of-speech taggers have been used to obtain information about the lexical, syntactic, and some semantic properties of large corpora. Automatic text tagging is an important first step in discovering the linguistic structure of large text corpora.

Probabilistic approaches to tagging have developed in response to the failure of traditional rule-based systems to handle large-scale applications involving unrestricted text processing. Characterized by the brittleness of handcrafted rules and domain knowledge and the intractable amount of work required to build them and to port them to new domains and applications, the rule-based paradigm that has dominated NLP and artificial intelligence in general has come under close scrutiny. Stochastic taggers are one example of a class of alternative approaches, often referred to as "corpus-based" or "example-based" techniques, that use statistics rather than rules as the basis for NLP systems.

A major decision in the design of a tagger is to determine exactly what will count as a word, and whether two sequences of characters are instances of the same word or different words (Brown et al. 1992). This may sound trivial - after all, words are delimited by spaces - but it is a problem that has plagued linguists for decades. For example, is shouldn't one word or two? Is shouldn't different from should not? If hyphenated forms like rule-based and Baum-Welch (as in Baum-Welch algorithm) are to count as two words, then what about vis-à-vis? The effects of capitalization must also be considered, as the following example shows:

Bill, please send the bill. Bill me today or bill me tomorrow. May I pay in May?

In the first sentence Bill is a proper noun and bill is a common noun. In the second sentence Bill and bill are the same - both are verbs - while in the third sentence May and May are different - one is a modal and the other a proper noun.

These problems are compounded in Japanese because, unlike English, sentences are written as continuous strings of characters without spaces between words. As a result, decisions about word boundaries are all the more difficult, and lexical analysis plays an important preprocessing role in all Japanese natural language systems. Before text can be parsed, a lexical analyzer must segment the stream of input characters comprising each sentence.

Japanese lexical analyzers typically segment sentences in two major steps, first dividing each sentence into major phrases called bunsetsu, composed of a content word and accompanying function words, e.g. noun+particle, verb+endings, and then discovering the individual words within each phrase. Algorithms based on the longest match principle perform the bulk of the work (Kawada 1990). To extract a bunsetsu structure from an input string, the system first proposes the longest candidate that matches a dictionary entry and then checks whether the result agrees with the rules for bunsetsu composition. If the check fails, the system backtracks, proposes another shorter word, and checks the composition rules again. This process is repeated until the sentence is divided into the least number of bunsetsu consistent with its structure (Ishizaki et al. 1989).

Maruyama et al. (1988) describe a sentence analyzer that consists of five stages, each of which exploits a distinct kind of knowledge that is stored in the form of a set of rules or a table. The five stages are: segmentation by (1) character type, (2) character sequence, and (3) a longest matching algorithm; (4) a bottom-up parallel algorithm if stage 3 fails; and (5) compound-word composition. The first three lines in the transliterated example below illustrate what the procedure must accomplish:

input:       sisutemugabunobunkaisuru.
bunsetsu:    sisutemuga/buno/bunkaisuru.
words:       sisutemu-ga/bun-o/bunkai-suru.
meaning:     system-subj/sentence-obj/analyze-nonpast
translation: A/The system analyzes a/the sentence.

Two recent projects at BBN have used rule-based lexical analyzers to construct probabilistic models of Japanese segmentation and part-of-speech assignment. Matsukawa, Miller, & Weischedel (1993) based their work on JUMAN, developed at Kyoto University, which has a 40,000 word lexicon and tags with a success rate of about 93%. They used hand-corrected output from JUMAN to train an example-based algorithm to correct both segmentation errors and part-of-speech errors in JUMAN's output. POST, a stochastic tagger, then selects among ambiguous alternative segmentations and part-of-speech assignments and predicts the part of speech of unknown words. Papageorgiou (1994) trained a bigram hidden Markov model to segment Japanese text using the output of MAJESTY, a rule-based morphological preprocessor (Kitani & Mitamura 1993) that is reported to segment and tag Japanese text with better than 98% accuracy. Papageorgiou's method uses neither a lexicon of Japanese words nor explicit rules, basing its decisions instead solely on whether a two-character sequence is deemed more likely to continue a word or contain a word boundary. This approach was able to segment 90% of the words in test sentences correctly, compared to 91.7% for the JUMAN-based method.

Characteristics of Japanese Text

Japanese text is composed of four different types of characters: kanji, characters borrowed more than a millennium ago from Chinese; two kana syllabaries, hiragana and katakana; and romaji, consisting of Roman alphabetic and Arabic numeral characters. The syllabaries contain equivalent sets of around 80 characters each. Hiragana is used for Japanese words and inflections, while katakana is used for words borrowed from foreign languages and for other special purposes. Lunde (1993:4) describes the distribution of character types as follows:

Given an average sampling of Japanese writing, one normally finds 30 percent kanji, 60 percent hiragana, and 10 percent katakana. Actual percentages depend on the nature of the text. For example, you may find a higher percentage of kanji in technical literature, and a higher percentage of katakana in the literature of fields such as computer science, which make extensive use of loan words written in katakana.

The variable proportions of character types can easily be seen in a comparison of three different samples of Japanese text. The first corpus consists of a set of short newspaper articles on business ventures from Yomiuri. The second corpus contains a series of editorial columns from Asahi Shinbun (tenseijingo shasetsu, 1985-1991). Information on a third corpus was drawn from the description provided by Yokoyama (1989) of an online dictionary, Shin-Meikai Kokugo Jiten. Table 1 gives the size of each corpus in thousands of characters and shows the percentage of text written in each of the four character types. Punctuation and special symbols have been excluded from the counts.

                 bus.    ed.    dict.
size (K chars)     42    275    2,508
% hiragana       30.2   58.0     52.4
% kanji          47.5   34.6     37.9
% katakana       19.3    4.8      6.8
% num/rom         2.9    2.6      2.9

Table 1

Of particular note is the fact that the business corpus contains roughly half the amount of hiragana of the other two samples, both of which come close to Lunde's norm, and three to four times as much katakana. Table 2 lists the ten most frequent hiragana in the three corpora expressed as a percentage of total hiragana.

The Segmentation Algorithm
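The abstract describes the segmentation method as relying almost exclusively on character type information plus a table of hiragana bigram frequencies. A minimal sketch of that idea might look like the following; the Unicode-range character-type test, the toy bigram table, and the 0.01 threshold are all illustrative assumptions, not the authors' trained model.

```python
# Sketch: break on character-type changes; inside hiragana runs, consult a
# bigram table to decide "continue the word" vs. "insert a boundary".

def char_type(c):
    """Classify a character as kanji, hiragana, katakana, or other (romaji etc.)."""
    cp = ord(c)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"

def segment(text, bigram_freq, threshold=0.01):
    """Split text at type changes and at rare (low-frequency) hiragana bigrams."""
    if not text:
        return []
    segments, current = [], text[0]
    for prev, cur in zip(text, text[1:]):
        if char_type(prev) != char_type(cur):
            boundary = True                 # character type changed
        elif char_type(prev) == "hiragana":
            # rare bigram => more likely a word boundary than a continuation
            boundary = bigram_freq.get(prev + cur, 0.0) < threshold
        else:
            boundary = False
        if boundary:
            segments.append(current)
            current = cur
        else:
            current += cur
    segments.append(current)
    return segments

# Toy frequencies (hypothetical): pairs inside よんでいる "is reading" are
# common word-internally, while をよ (particle + verb start) is not listed.
toy_bigrams = {"よん": 0.02, "んで": 0.03, "でい": 0.02, "いる": 0.04}

print(segment("本をよんでいる", toy_bigrams))  # → ['本', 'を', 'よんでいる']
```

In the real system the bigram frequencies would be estimated from a large unannotated corpus, which is what lets the training procedure avoid annotated text entirely.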