[email protected]

Abstract: Current language resource tools provide only limited help for ESL/EFL writing. This research proposes a language information retrieval approach to acquire referential sentences from a corpus at various levels of users' language cognition. The approach includes a set of expression elements, such as exact words, prefixes, suffixes, parts of speech, wildcards, and subsequences. Sentence retrieval involves both exact and partial matches with the user query. Finally, the set of retrieved sentences is evaluated and ranked by multiple sequence alignment for relevance to the user's expression needs.

Keywords: Information Retrieval, Corpus, Writing Assistance.

(English as Second Language - ESL/English as Foreign Language - EFL) (collocation) [1] ESL/EFL ESL/EFL ESL/EFL

5 [2] [3][4] ESL/EFL ESL/EFL

concordance concordancer [5][6][7] concordancer ESL/EFL [8](1) ESL/EFL (2) (3)

concordancer concordancer ESL/EFL ESL/EFL

ESL/EFL - SAW (Sentence Assistance for Writing) SAW SAW ESL/EFL


makecreateproduce music compose music ESL/EFL [8]

[9][10](collocation) ESL/EFL problem causecreatesolve make Ilson (grammatical collocation)(lexical collocation)[11] (dominant word) determined by compose music ESL/EFL

Word Sketch [12] (lexical profiling)(collocation) Brigham Young University VIEW[13](POS) concordance determined by

VIEW concordancer

7 ESL/EFL / ESL/EFL ESL/EFL ESL/EFL by and large large by large ESL/EFL ESL/EFL ESL/EFL

(expression elements)

()

The expression elements are: 1. exact words; 2. prefix/suffix, written with "%" (e.g. "pro%"); 3. wildcard, written with "#";

4. part-of-speech (POS) tags: preposition (PREP), adjective (ADJ), noun (N), adverb (ADV), verb (V), and other (Other); 5. subsequence patterns such as "either … or" and "rather … than", written with "*".

These expression elements play the role of a lightweight regular expression language; example queries include "ADJ tea", "a pro% P", and "would rather V than V".
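The expression elements can be compiled into a simple token-level matcher. The sketch below is only an illustration under assumed conventions (the tag set, "%" on either side for prefix/suffix, "*" as a gap of zero or more words); it is not the paper's implementation:

```python
# Assumed conventions: "pro%" = prefix, "%ize" = suffix, "#" = any single
# word, an uppercase tag (e.g. "P", "V", "ADJ") = part-of-speech match,
# "*" = gap of zero or more words (subsequence), anything else = exact word.
POS_TAGS = {"P", "J", "N", "D", "V", "O", "PREP", "ADJ", "ADV"}

def token_matches(qtok, word, pos):
    """Match one query token against one (word, POS) pair."""
    if qtok == "#":
        return True
    if qtok in POS_TAGS:
        return pos == qtok
    if qtok.endswith("%"):
        return word.lower().startswith(qtok[:-1].lower())
    if qtok.startswith("%"):
        return word.lower().endswith(qtok[1:].lower())
    return word.lower() == qtok.lower()

def sentence_matches(query, sent):
    """True if the query matches a span of the tagged sentence; '*'
    separates contiguous segments that may lie any distance apart."""
    segments = [seg.split() for seg in query.split("*")]
    start = 0
    for seg in segments:
        matched = False
        for i in range(start, len(sent) - len(seg) + 1):
            if all(token_matches(t, w, p)
                   for t, (w, p) in zip(seg, sent[i:i + len(seg)])):
                start = i + len(seg)
                matched = True
                break
        if not matched:
            return False
    return True
```

For example, `sentence_matches("a pro% P", tagged)` finds "a problem in" inside a POS-tagged sentence.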

()

ESL/EFL

ESL/EFL not only but also only also only also not only but also an university "an" "a" an university an university an university

()

(Multiple Sequence Alignment – MSA)[14]

Given n (n ≥ 3) sequences S1, S2, …, Sn, a multiple sequence alignment produces sequences A1, A2, …, An of equal length, where each Ai is Si with gap symbols "–" inserted.

For example, with S1 = CCAATA and S2 = CCAT over the alphabet Σ = {A, C, T}, one possible alignment is

S1 = – – – – C C A A T A
S2 = C C A T – – – – – –

and a better one is

S1 = C C A A T A
S2 = C C A – T –

The quality of an alignment between S1 and S2 is scored with a substitution matrix. An optimal alignment can be computed with the Needleman-Wunsch algorithm [14], and an MSA of k sequences of length n can be approximated with the center star algorithm in O(k²n²) time [15], which is fast enough for query-time use.
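Pairwise alignment of such symbol sequences can be sketched with the Needleman-Wunsch dynamic program [14]. The match/mismatch/gap weights below are illustrative assumptions, not the paper's substitution matrix:

```python
def needleman_wunsch(s1, s2, match=2, mismatch=-1, gap=-2):
    """Global alignment of two symbol sequences (Needleman-Wunsch).
    Returns the aligned strings ('-' marks a gap) and the optimal score."""
    n, m = len(s1), len(s2)
    # score[i][j] = best score aligning prefixes s1[:i] and s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # traceback from the bottom-right corner
    a1, a2, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s1[i - 1] == s2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return "".join(reversed(a1)), "".join(reversed(a2)), score[n][m]

aligned1, aligned2, score = needleman_wunsch("CCAATA", "CCAT")
```

On the running example, the optimal alignment keeps the four matching symbols of CCAT and inserts two gaps.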

The MSA alphabet contains 11 symbols: the six POS tags P, J, N, D, V, O, plus "=" (exact match), "%" (prefix/suffix match), "#" (wildcard match), "–" (gap), and "X" (any other word). Substitution scores reward more specific matches: exact 100, prefix/suffix 50, POS 25, wildcard 5.

For example, the query S0 = "a pro% P" is encoded as the symbol sequence "= % P" and aligned against the candidate sentences S1, S2, S3:

S1. This(X) posed(X) a(=) particular(X) problem(%) for(P) an(X) agent(X).
S2. Listening(X) to(X) all(X) these(X) personal(X) accounts(X) has(X) had(X) a(=) profound(%) effect(X) on(P) us(X).
S3. Increasingly(X) acid(X) rain(X) is(X) a(=) problem(%) in(P) Europe(X) too(X).

In S1, S2, and S3, "a" is matched exactly, "pro%" by prefix, and "P" by POS. The query S0 and the three sentences are aligned by the center star algorithm into rows A0, A1, A2, A3, and each sentence receives a sum-of-pair score against the query row: C1 between A1 and A0, C2 between A2 and A0, C3 between A3 and A0, where S(x, y) is the substitution score between symbols x and y:

A0: – – – – – – – – = % P – – –
A1: – – – – – – X X = X % P X X
A2: X X X X X X X X = % X P X –
A3: – – – – X X X X = % P X X –

C1 = S(–,–) + … + S(=,=) + S(%,X) + S(P,%) + S(–,P) + … + S(–,X) = 93
C2 = S(–,X) + … + S(=,=) + S(%,%) + S(P,X) + S(–,P) + … + S(–,–) = 139
C3 = S(–,–) + … + S(=,=) + S(%,%) + S(P,P) + S(–,X) + … + S(–,–) = 169

Ranked by sum-of-pair score, SAW returns the sentences in the order S3, S2, S1:

S3. Increasingly acid rain is a problem in Europe too.

S2. Listening to all these personal accounts has had a profound effect on us.
S1. This posed a particular problem for an agent.

()
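The sum-of-pair ranking can be reproduced with a small substitution function. The weights below follow the stated element scores (exact 100, prefix/suffix 50, POS 25); the mismatch value (-1) and the gap-gap value (0) are inferred so that the worked scores 93, 139, and 169 come out as shown, and are assumptions:

```python
POS = set("PJNDVO")

def S(x, y):
    """Substitution score between two alignment symbols ('-' is a gap)."""
    if x == "-" and y == "-":
        return 0            # aligned gaps carry no score (assumed)
    if x == y:
        if x == "=":
            return 100      # exact-word match
        if x == "%":
            return 50       # prefix/suffix match
        if x in POS:
            return 25       # part-of-speech match
        return 5            # wildcard / other (assumed)
    return -1               # mismatch or symbol-vs-gap (assumed)

def pair_score(a0, ai):
    return sum(S(x, y) for x, y in zip(a0, ai))

def rank(query_row, candidates):
    """Rank aligned candidate rows by similarity to the query row."""
    scored = sorted(((pair_score(query_row, r), name) for name, r in candidates),
                    reverse=True)
    return [name for _, name in scored]

A0 = "--------=%P---"
A1 = "------XX=X%PXX"
A2 = "XXXXXXXX=%XPX-"
A3 = "----XXXX=%PXX-"
ranking = rank(A0, [("S1", A1), ("S2", A2), ("S3", A3)])
```

With these weights the three rows score 93, 139, and 169, reproducing the ranking S3 > S2 > S1 from the example.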

SAW(Sentence Assistance for Writing)

10 MSA SAW

POS wildcard

suffix prefix exact subsequence

MSA

SAW not only but also not only but also S1We must also make sure that future generations not only read, but also have a real enthusiasm for visiting bookshops and libraries. S2This was not only humiliating but also very awkward for Baldwin. S3This is not only easier, but also more fun. not only but also

S1S2S3 S1S2S3 S3S2S1

SAW deter% by () native P()

SAW deter% bynative P responsible #either * or prefixPOSwildcardsubsequence deter% bynative Presponsible # either * or

For the SAW query "deter% by", the concordance results include: determined by (676), deterred by (27), determination by (12), deterrence by (2), determine by (2), determining by (1).

SAW responsible # () either * or ()

BNC(British National Corpus) 1974-1994 350 POS 62 BNC 62 POS 6 POS

SAW ESL/EFL ()() () ESL/EFL SAW 16 TOEFL 11 45 12 ( subsequence ) 18 SAW (exactprefix POSwildcard) 12 SAW

SAW

12 16 3 SAW SAW SAW SAW

SAW [16] 5 1. 2. 4 3. 5 4. (number of relevant retrieved / number of retrieved) (precision) 5.

()

1.

10 8 exactprefix POSwildcard

Four element types were tested, illustrated with "specialize in": (1) exact: "specialize in"; (2) prefix: "s% in", "specialize i%"; (3) POS: "V in", "specialize P"; (4) wildcard: "# in", "specialize #".

SAW 10 10

13 10 SAW exact 100% prefix 10 POS POS wildcard native to prefix POS wildcard

For "a proportion of", the prefix queries are "a% proportion of", "a p% of", and "a proportion o%"; the POS queries are "O proportion of", "a N of", and "a proportion P"; the wildcard queries are "# proportion of", "a # of", and "a proportion #"; prefix, POS, and wildcard results are averaged.

[Figure: SAW recommendation match rates (first 10 sentences), Match(%) from 0 to 100, for Exact_Match, AVG_Prefix, AVG_POS, and AVG_Wildcard, on the queries "specialize in", "responsible for", "familiarity with", "relevance to", "deal with" (first chart) and "essential to", "native to", "composed of", "determined by", "guard against" (second chart).]

SAW prefix POS POS wildcard prefix wildcard

2.

Twelve subsequence queries were tested: "enable * to", "derive * from", "expose * to", "not only * but also", "would rather * than", "distinguish * from", "divide * from", "expand * into", "provide * with", "the same * as", "either * or", "so * that". For each query, partial-error variants were formed by replacing the closing word (to, from, to, but, than, from, from, with, into, as, or, so) with "and" or "but", e.g. "enable * and" and "enable * but" for "enable * to".

SAW ( 100%) enable * to to and but either * or so * that (false query)(vague query)

[Figure: SAW recommendation match rates (first 10 sentences), Match(%) from 0 to 100, for Subsequence vs. Partial_Errors queries: "enable * to", "derive * from", "expose * to", "not only * but also", "would rather * than", "distinguish * from" (first chart) and "divide * from", "expand * into", "provide * with", "the same * as", "either * or", "so * that" (second chart).]

subsequence partial errors

3.

19 (16 3 ) SAW BNC SAW 19 19 19

15 19 19 125 [17] 1000 80.44% 2000 46.10% Nation [18] 1000 74% 2000 81% 2000

125 12 ( 1 2 3 4 123 4) 113 86 44 20 22 27

86 44 20 ( 62.48%)22 ( 34.69%) 16 0.625 10 4.42 1.95 6.69 2.23 2.27 5.97 1.85 1.55 SAW

(0~10) 4.42 6.69 5.97 (0~10) 4.2 5.83 (1~4) --- 2.78 2.70 (1~4) --- 2.80 (1~3) --- 2.09 (1~5) --- 2.76

(significance level) SAW

Null Hypothesis (H0) SAW ( 4.42)Alternative Hypothesis (H1) SAW ( 4.42) 5% 20 H0 4.42 + (1.65 × 1.95 / √20) = 5.14 6.69 5.14

H0 5% SAW 22 4.42 + (1.65 × 1.95 / √22) = 5.10 SAW 5.97 5.97 5.10 5% SAW
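The one-tailed test above compares the observed mean against the critical value μ0 + z·σ/√n with z = 1.65 at the 5% significance level. A quick numerical check:

```python
from math import sqrt

def one_tailed_threshold(mu0, sigma, n, z=1.65):
    """Critical value for a one-tailed z-test at the 5% level (z = 1.65)."""
    return mu0 + z * sigma / sqrt(n)

# n = 20 subjects: threshold ~5.14, below the observed mean 6.69
t1 = one_tailed_threshold(4.42, 1.95, 20)
# n = 22 subjects: threshold ~5.1, below the observed mean 5.97
t2 = one_tailed_threshold(4.42, 1.95, 22)
```

Both observed means exceed their thresholds, so H0 is rejected in both cases.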

16 86 44 42 50% 25% 25%() 10 4.20 2.56 5.83 2.46 1.63 5% 4.97 5.83 4.97 5% SAW

(usefulness) 1~4 44 2.78 2.70 2.80 2.7~2.8 SAW 2000 BNC SAW

(0~10) 4.99 6.53 (0~10) 4.56 5.42 (1~4) --- 2.59 (1~4) --- 2.53 (1~3) --- 2.16 (1~5) --- 3.16

27 16 0.625 10 4.99 2.49 6.53 2.00 1.54 5% 5.71 6.53 5.71 SAW

50% 25% 25%( ) 10 4.56 2.59 5.42 2.60 0.86 5% 5.38 5.42 5.38 SAW

()

SAW SAW SAW

17 2000 BNC

For example, for the erroneous sentence "I would rather going shopping than staying home.", the correct form is "I would rather go shopping than stay home." With SAW, the query "would rather V than V" returned correct usage in 5/5 of the recommended sentences, and "would rather * than *" in 7/10.

CMU REAP [19] REAP REAP REAP SAW Gsearch (syntactic criteria) [20]Gsearch ESL/EFL TANGO [21]

ESL/EFL SAW SAW (exact) /(prefix/suffix)(POS)(wildcard) (subsequence)(expression elements) MSA

SAW ()( ) SAW SAW SAW SAW ESL/EFL

References
[1] A. Aghbar, Fixed Expressions in Written Texts: Implications for Assessing Sophistication, East Lansing, MI: National Center for Research on Teacher Learning, ERIC Document Reproduction Service No. ED 352808, 1990.
[2] T. McEnery and A. Wilson, Corpus Linguistics, Edinburgh University Press, Edinburgh, 1996.
[3] S. Conrad, The Importance of Corpus-based Research for Language Teachers, System 27: 1-18, 1999.
[4] A. B. M. Tsui, ESL Teachers' Questions and Corpus Evidence, International Journal of Corpus Linguistics 10(3): 335-356, 2005.
[5] I. de O'Sullivan and A. Chambers, Learners' Writing Skills in French: Corpus Consultation and Learner Evaluation, Journal of Second Language Writing 15(1): 49-68, 2006.
[6] Jean-Jacques Weber, A Concordance and Genre-informed Approach to ESP Essay Writing, ELT Journal 55(1): 14-20, 2001.
[7] Y. C. Sun, Learning Process, Strategies and Web-based Concordancers: a Case Study, British Journal of Educational Technology 34(5): 601-613, 2003.
[8] H. Yoon, An Investigation of Students' Experiences with Corpus Technology in Second Language Academic Writing, Ph.D. Dissertation, The Ohio State University, USA, 2005.
[9] C. C. Shei and H. Pain, An ESL Writer's Collocational Aid, Computer Assisted Language Learning 13(2): 167-182, 2000.
[10] B. Altenberg and S. Granger, The Grammatical and Lexical Pattern of MAKE in Native and Non-native Student Writing, Applied Linguistics 22(2): 173-195, 2001.
[11] R. Ilson, B. Morton, and B. Evelyn, The BBI Dictionary of English Word Combinations, Amsterdam: John Benjamins Publishing Company, 1997.
[12] A. Kilgarriff and D. Tugwell, WORD SKETCH: Extraction and Display of Significant Collocations for Lexicography, Proceedings of the Collocations Workshop, ACL 2001, pp. 32-38.
[13] M. Davies, The Advantage of Using Relational Databases for Large Corpora, International Journal of Corpus Linguistics 10(3): 307-334, 2005.
[14] S. B. Needleman and C. D. Wunsch, A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins, Journal of Molecular Biology 48: 443-453, 1970.
[15] Y. L. Francis, N. L. Ho, T. W. Lam, W. H. Wong, and M. Y. Chan, Efficient Constrained Multiple Sequence Alignment with Performance Guarantee, IEEE Computational Systems Bioinformatics 2: 337-346, 2003.
[16] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Analysis of Recommendation Algorithms for E-Commerce, Proceedings of the ACM Conference on Electronic Commerce, Minneapolis, Minnesota, U.S.A., 2000.
[17] English Word Tests, [Online]. Available: http://www.lextutor.ca/ [Accessed: May 15, 2007].
[18] Paul Nation, Measuring Readiness for Simplified Material: A Test of the First 1,000 Words of English, In Simplification: Theory and Application, RELC 31: 193-203, 1993.
[19] M. Heilman, K. Collins-Thompson, J. Callan, and M. Eskenazi, Classroom Success of an Intelligent Tutoring System for Lexical Practice and Reading Comprehension, Proceedings of the Ninth International Conference on Spoken Language Processing, Pittsburgh, USA, 2006.
[20] S. Corley, M. Corley, F. Keller, M. Crocker, and S. Trewin, Finding Syntactic Structure in Unparsed Corpora: The Gsearch Corpus Query System, Computers and the Humanities 35(2): 81-94, 2001.
[21] T. C. Chuang, J. Y. Jian, Y. C. Chang, and J. S. Chang, Collocational Translation Memory Extraction based on Statistical and Linguistic Information, Computational Linguistics and Chinese Language Processing, Vol. 10, No. 3, September 2005, pp. 329-346.


Bayesian Topic Mixture Model for Information Retrieval

Department of Computer Science and Information Engineering, National Cheng Kung University
[email protected]

(PSLA) PLSA Dirichlet TREC Abstract

In studies of automatic text processing, it is popular to apply probabilistic topic models to infer word correlation through latent topic variables. Probabilistic latent semantic analysis (PLSA) is such a model: each word in a document is seen as a sample from a mixture model whose components are multinomial distributions. Although the PLSA model deals with the issue of multiple topics, each topic model is quite simple and the word burstiness phenomenon is not taken into account. In this study, we present a new Bayesian topic mixture model (BTMM) to overcome the burstiness problem inherent in the multinomial distribution. Accordingly, we use the Dirichlet distribution to represent topic information beyond the document level. Conceptually, the documents in the same class are generated by the associated multinomial distribution. In experiments on TREC text corpora, we report average precision and model perplexity to demonstrate the superiority of the proposed BTMM method.

Dirichlet

Keywords: Bayesian model, Graphical model, PLSA, Dirichlet Prior, Information Retrieval

digital documents (statistical text model)

21 bag-of-word [32]Bag-of-word synonympolysemy bag-of-word (Latent Semantic Analysis, LSA)[10]“” (Singular Value Decomposition, SVD) (singular vector) [2][3][24](Probabilistic Model) (Probabilistic Latent Semantic Analysis)[16][17] Latent Dirichlet Allocation[6] PLSA (Aspect model)[18]PLSA [6] (unseen)LDA[6] (burstiness phenomenon)[12][25] [22](common) (average)(rare) Dirichlet [25] [6][16][23][25] PLSA Dirichlet (Bayesian Topic Mixture Model, BTMM ) Gibbs Gibbs

[10][32] [6][16][17][23] [31] ()

bag-of-word (Vector Space Model)[32]

22 wi

A = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}    (1)

where w_{ij} is the weight of word i in document j. Latent Semantic Analysis (LSA) [10] applies Singular Value Decomposition (SVD) to this matrix and truncates it to the r largest singular values.

[Figure: SVD decomposes the word-document matrix into word vectors, singular values, and document vectors, truncated to rank r.]

A \approx U_r S_r V_r^T, where U_r holds the first r word vectors, S_r the r largest singular values, and V_r^T the first r document vectors.
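The rank-r truncation used by LSA can be sketched with NumPy. The toy matrix and its raw-count weighting are assumptions for illustration:

```python
import numpy as np

# Toy word-document matrix A (4 words x 4 documents); entries w_ij are
# term weights, here simply raw counts.
A = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 2, 1],
              [0, 0, 1, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2                                          # keep the r largest singular values
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]    # rank-r approximation of A
```

By the Eckart-Young theorem, the Frobenius error of the truncation equals the norm of the discarded singular values.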

() 1. Mixture of Unigrams

The Mixture of Unigrams (MU) model extends the unigram model with a latent topic variable z [31]:

P(w) = \sum_z P(w|z) P(z)    (2)

so that a whole document is generated from a single topic drawn for the corpus:

P(d) = \sum_z P(z) \prod_w P(w|z)

2. Probabilistic Latent Semantic Analysis

LSA Hofmann Probabilistic Latent Semantic Analysis, PLSA[16][17]

23 PLSA LSA Aspect Model [18] EM [11]PLSA [7][19][1][8][9][29]

[Figure: Graphical model of PLSA: d → z → w, with P(d), P(z|d), and P(w|z).]

PLSA [16][17]

- (d, w) d {d1 ,, d N } N ;

w {w1 ,, wM } M

z {z1 ,, z K } - (d, w) (co-occurrence)(3)

P(d, w) ¦ P(z)P(w z)P(d | z) z (3) P(d)¦ P(w | z)P(z | d) z PLSA P(w | z) z PLSA EM

The log-likelihood is

L_{PLSA} = \sum_d \sum_w n(d,w) \log P(d,w) = \sum_d \sum_w n(d,w) \log \sum_z P(z) P(d|z) P(w|z)    (4)

where n(d,w) is the count of word w in document d. In the E-step, the topic posterior is

P(z | d, w) = \frac{ P(z) P(d|z) P(w|z) }{ \sum_{z'} P(z') P(d|z') P(w|z') }    (5)

In the M-step, using the E-step posteriors:

\hat{P}(w|z) = \frac{ \sum_d n(d,w) P(z|d,w) }{ \sum_{w'} \sum_d n(d,w') P(z|d,w') }    (6)

\hat{P}(d|z) = \frac{ \sum_w n(d,w) P(z|d,w) }{ \sum_{d'} \sum_w n(d',w) P(z|d',w) }    (7)

\hat{P}(z) = \frac{ \sum_d \sum_w n(d,w) P(z|d,w) }{ \sum_d \sum_w n(d,w) }    (8)
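The EM updates (5)-(8) can be sketched compactly with dense NumPy arrays. This is an unoptimized illustration for a small count matrix, not a production implementation:

```python
import numpy as np

def plsa_em(n_dw, K, iters=50, seed=0):
    """EM for PLSA on a document-word count matrix n_dw (N x M).
    Returns P(z), P(d|z), P(w|z) following Eqs. (5)-(8)."""
    rng = np.random.default_rng(seed)
    N, M = n_dw.shape
    Pz = np.full(K, 1.0 / K)
    Pd_z = rng.random((N, K)); Pd_z /= Pd_z.sum(0)
    Pw_z = rng.random((M, K)); Pw_z /= Pw_z.sum(0)
    for _ in range(iters):
        # E-step (Eq. 5): P(z|d,w) proportional to P(z) P(d|z) P(w|z)
        post = Pz[None, None, :] * Pd_z[:, None, :] * Pw_z[None, :, :]
        post /= post.sum(2, keepdims=True)
        # M-step (Eqs. 6-8): accumulate n(d,w) P(z|d,w)
        nz = n_dw[:, :, None] * post
        Pw_z = nz.sum(0) / nz.sum((0, 1))
        Pd_z = nz.sum(1) / nz.sum((0, 1))
        Pz = nz.sum((0, 1)) / n_dw.sum()
    return Pz, Pd_z, Pw_z

counts = np.array([[2, 0, 1], [0, 3, 1], [1, 1, 4]], dtype=float)
Pz, Pd_z, Pw_z = plsa_em(counts, K=2, iters=20)
```

After each M-step, P(z), P(d|z), and P(w|z) remain properly normalized distributions.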

As Hofmann notes [16][17], PLSA is not a fully generative model at the document level: P(z|d) is defined only for training documents, so unseen documents require folding in while P(w|z) is retained.

3. Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA)[6] [30][33] [4][5]LDA PLSA LDA PLSA LDA LDA T PLSA P(z | d) T z T Dirichlet P(z | d) PLSA LDA PLSA K*N P(z | d) LDA T K LDA d {1,, M} ( topic) z{1,, K} K (hyperparameter) LDA

θ = (θ_1, …, θ_K), with θ_k > 0 and \sum_k θ_k = 1; α is the Dirichlet prior on θ, and β is a K × M matrix of parameters representing P(w|z).

[Figure: Graphical model of LDA: α → θ → z → w, with β governing P(w|z), plated over words and documents.]

P_{LDA}(θ, d | α) = P(θ | α) \prod_w \Big[ \sum_z P(w|z) P(z|θ) \Big]^{n(d,w)}    (9)

where n(d,w) is the count of word w in document d and P(θ|α) is the Dirichlet density.

P_{LDA}(d | α) = \int P(θ | α) \prod_w \Big[ \sum_z P(w|z) P(z|θ) \Big]^{n(d,w)} dθ    (10)

The posterior P(θ, z | d) can be approximated by variational methods [6][21], expectation propagation [28], or Gibbs sampling [14]; Blei et al. [6] adopt the variational method.

[Figure: Variational approximation for LDA, with free parameters γ on θ and φ on z; exact inference of the true posterior is intractable.]

Blei et al. [6] introduce a factorized variational distribution q(θ, z | γ, φ) with free variational parameters γ and φ:

q(θ, z | γ, φ) = q(θ | γ) \prod_n q(z_n | φ_n)    (11)

chosen to minimize the KL divergence from the true posterior P(θ, z | d, α, β):

(γ*, φ*) = \arg\min_{(γ, φ)} D\big( q(θ, z | γ, φ) \,\|\, P(θ, z | d, α, β) \big)    (12)

This yields a variational EM procedure that maximizes a lower bound on the log-likelihood in place of the intractable P(θ, z | d). In the E-step, {γ, φ} are updated by

φ_{ni} ∝ β_{i,w_n} \exp\{ E[\log θ_i | γ] \}    (13)

γ_i = α_i + \sum_n φ_{ni}    (14)

and in the M-step,

β_{ij} ∝ \sum_d \sum_n φ_{dni} w_{dn}^j    (15)

while α is updated by a Newton-Raphson method [27]. Girolami and Kaban [13] showed that PLSA corresponds to LDA under a uniform Dirichlet prior.

4. Theme Topic Mixture Model

Because LDA admits no exact solution, Keller and Bengio [23] proposed the Theme Topic Mixture Model (TTMM). TTMM replaces LDA's continuous topic mixture with a discrete finite set of themes h ∈ {1, …, J} over topics z ∈ {1, …, K}, with parameters π, τ, β: P(h = j) selects a theme, P(z | h = j) a topic given the theme, and P(w | z) a word given the topic.

[Figure: Graphical model of TTMM: π → h → z → w, with τ governing P(z|h) and β governing P(w|z).]

P(d) = \sum_j P(h=j) P(d | h=j) = \sum_j P(h=j) \prod_w \Big[ \sum_z P(w|z) P(z | h=j) \Big]^{n(d,w)}    (16)

where P(d | h=j) is the probability of document d under theme j and n(d) = \sum_w n(d,w). The log-likelihood over the corpus D is

L_{TTMM} = \sum_d \log \sum_j P(h=j) \prod_w \Big( \sum_z P(w|z) P(z | h=j) \Big)^{n(d,w)}    (17)

which, as in PLSA, is maximized by EM. E-step:

P(h=j | d) = \frac{ P(h=j) \prod_w [ \sum_z P(w|z) P(z|h=j) ]^{n(d,w)} }{ \sum_{j'} P(h=j') \prod_w [ \sum_z P(w|z) P(z|h=j') ]^{n(d,w)} }    (18)

P(z | w, h=j) = \frac{ P(z | h=j) P(w|z) }{ \sum_{z'} P(z' | h=j) P(w | z') }    (19)

M-step, with the normalization constraints \sum_{j'} P(h=j' | d) = 1 and \sum_{z'} P(z' | w, h=j) = 1:

\hat{P}(h=j) = \frac{ \sum_d P(h=j | d) }{ \sum_d \sum_{j'} P(h=j' | d) } = \frac{ \sum_d P(h=j | d) }{ N }    (20)

\hat{P}(z | h=j) = \frac{ \sum_d P(h=j | d) \sum_w n(d,w) P(z | w, h=j) }{ \sum_d n(d) P(h=j | d) }    (21)

\hat{P}(w | z) = \frac{ \sum_d \sum_j n(d,w) P(h=j | d) P(z | w, h=j) }{ \sum_{w'} \sum_d \sum_j n(d,w') P(h=j | d) P(z | w', h=j) }    (22)

()

Madsen et al. [25] (counts distribution) Dirichlet [6][12][25] PLSA Dirichlet Dirichlet

[Figure: Graphical model of BTMM: d → z → w, with the word multinomial φ drawn from a Dirichlet prior β.]

BTMM D N d {d1 ,, d N }V

M w {w1 ,, wM }

z {z1 ,, z K } d w z I I Dirichlet priori E z K N ¦ P(z | d) 1 z {I, E, P(z | d)} Dirichlet I I KN + K P(w | z, E ) P(w | I)P(I | E, z)dI (23) ³I - (d, w) P(d, w | E ) P(d)¦ P(z | d)P(w | z, E ) z (24) P(d) P(z | d) P(w | I)P(I | E, z)dI ¦ ³I z

28 ()

Let w_d denote the words of document d and z their topic assignments. The conditional posterior distribution of the topics is

P(z | w_d) = \frac{ P(w_d, z) }{ \sum_z P(w_d, z) }    (25)

{w d ,z} w d P(z | w d ) [14] Z M M [20] (posterior distribution)[14] Gibbs sampling Gibbs sampling Gibbs sampling

A Gibbs sampler for P(z | w_d) constructs a Markov chain by drawing each z_i in turn from

P(z_i | z_{¬i}, w_d) = \frac{ P(z, w_d) }{ \int_z P(z, w_d) dz }    (26)

where z_{¬i} denotes z with the i-th assignment removed. In BTMM the joint is

P(z, w | d, β) = P(w | z, β) P(z | d)    (27)

and integrating out the word multinomial φ_z under its Dirichlet prior gives the Dirichlet-multinomial form

P(w | z, β) = \int_φ P(w | φ) P(φ | β, z) dφ = \frac{n!}{\prod_w n_w!} \cdot \frac{ \Gamma(\sum_w β_w) }{ \prod_w \Gamma(β_w) } \cdot \frac{ \prod_w \Gamma(n_z^{(w)} + β_w) }{ \Gamma(\sum_w β_w + n_z^{(\cdot)}) }    (28)

where n_z^{(w)} is the number of times word w is assigned to topic z and n_z^{(\cdot)} = \sum_w n_z^{(w)}.

Applying the identity \Gamma(x) = (x-1)\Gamma(x-1) to (28), the predictive probabilities with the i-th assignment excluded become

\hat{P}(w | z, z_{¬i}) = \frac{ n_{¬i,z}^{(w)} + β_w }{ n_{¬i,z}^{(\cdot)} + Vβ }    (29)

\hat{P}_{BTMM}(z | d, z_{¬i}) = \frac{ n_{d,¬i}^{(z)} }{ \sum_{z'} n_{d,¬i}^{(z')} }    (30)

Combining (29) and (30) gives the sampling distribution for z_i:

P(z_i | z_{¬i}, w, d) ∝ P(w | z, z_{¬i}) P(z | d, z_{¬i}) ∝ \frac{ n_{¬i,z}^{(w)} + β_w }{ n_{¬i,z}^{(\cdot)} + Vβ } \cdot \frac{ n_{d,¬i}^{(z)} }{ \sum_{z'} n_{d,¬i}^{(z')} }    (31)

Here n_{¬i,z}^{(w)} is the count of word w assigned to topic z, n_{d,¬i}^{(z)} the count of words in document d assigned to topic z, and n_{¬i,z}^{(\cdot)} the total count for topic z, all excluding the current word w_i; β is the Dirichlet prior.

A symmetric β is used; each z_i is initialized among the K topics and the Markov chain is iterated to convergence.
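A collapsed Gibbs sampler in the style of Eq. (31) can be sketched as follows. This is an illustration with a symmetric β; the tiny constant added to the document-topic factor is an implementation guard (an assumption, since the paper does not specify how empty counts are handled):

```python
import numpy as np

def gibbs_sample(docs, K, V, beta=0.1, sweeps=200, seed=0):
    """Collapsed Gibbs sampling: resample each z_i proportional to
    (n_z^(w) + beta) / (n_z + V*beta) * n_d^(z) / n_d, counts excluding i."""
    rng = np.random.default_rng(seed)
    z = [rng.integers(K, size=len(d)) for d in docs]
    nzw = np.zeros((K, V))                # word-topic counts n_z^(w)
    nz = np.zeros(K)                      # topic totals n_z^(.)
    ndz = np.zeros((len(docs), K))        # document-topic counts n_d^(z)
    for d, (ws, zs) in enumerate(zip(docs, z)):
        for w, t in zip(ws, zs):
            nzw[t, w] += 1; nz[t] += 1; ndz[d, t] += 1
    for _ in range(sweeps):
        for d, ws in enumerate(docs):
            for i, w in enumerate(ws):
                t = z[d][i]
                nzw[t, w] -= 1; nz[t] -= 1; ndz[d, t] -= 1  # exclude position i
                p = (nzw[:, w] + beta) / (nz + V * beta) * (ndz[d] + 1e-12)
                t = rng.choice(K, p=p / p.sum())
                z[d][i] = t
                nzw[t, w] += 1; nz[t] += 1; ndz[d, t] += 1
    return z, nzw, ndz

docs = [[0, 1, 2, 0], [2, 3, 3], [0, 0, 1]]   # toy corpus of word ids
z, nzw, ndz = gibbs_sample(docs, K=2, V=4, sweeps=20)
```

The count arrays stay consistent with the corpus size throughout the sweeps.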

zi ()

The table below compares PLSA, LDA, TTMM, and BTMM.

Model | Word | Topic | Document
PLSA | P(w|z) | P(z|d) | P(d,w)
LDA | w|z,β ~ Mult(β) | z ~ Mult(θ) | θ ~ Dir(α)
TTMM | w|z,β ~ Mult(β) | z ~ Mult(τ) | h ~ Mult(π)
BTMM | w ~ Mult(φ_z), φ_z ~ Dir(β) | P(z|d) | P(d,w)

With N documents, M words, K topics, and J themes: LDA has K + KM parameters; TTMM replaces LDA's continuous θ with J < N discrete themes, giving J(1 + K) + KM; PLSA has KN + KM; BTMM replaces PLSA's per-topic word multinomials φ with a Dirichlet prior β, giving KN + K.

Model | PLSA | LDA | TTMM | BTMM
Parameters | O(KN+KM) | O(K+KM) | O(J+JK+KM) | O(KN+K)

()

The experiments use the TREC Associated Press newswire 1988 (AP88) and Wall Street Journal 1989 (WSJ89) collections, with TREC Topics 101-150 (title and description fields, average query length 14.48 words) after stop-word removal and stemming. A Language Model (LM) baseline is compared with PLSA, LDA, and BTMM, the number of topics k set to 16, using precision-recall curves, mean average precision (mAP) [15], and perplexity.

Table 1: Document collections.

TREC Collection | Description | Size (MB) | #Doc. | Vocabulary Size
WSJ89 | Wall Street Journal (1989), Disk 2 | 36.5 | 12,380 | 17,732
AP88 | Associated Press (1988), Disk 1 | 237 | 79,908 | 8,783
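The mean average precision (mAP) reported below averages per-topic average precision. A minimal sketch (function names are illustrative):

```python
def average_precision(ranked, relevant):
    """Average precision for one query: mean of precision@k over the
    ranks k at which a relevant document is retrieved."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, 1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

Here relevant documents at ranks 1 and 3 give precisions 1/1 and 2/3, so AP is their mean, 5/6.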

TREC Precision-Recall WSJ89 AP88 mAP BTMM PLSA k stemming

[Figure: Precision-recall curves on WSJ89 for LM, PLSA, LDA, and BTMM; precision 0-0.7 against recall 0.0-1.0.]

[Figure: Precision-recall curves on AP88 for LM, PLSA, LDA, and BTMM; precision 0-0.7 against recall 0.0-1.0.]

Table 2: mAP of LM, PLSA, LDA, and BTMM.

Collection | LM | PLSA | LDA | BTMM
AP88 | 0.2128 | 0.2507 | 0.2411 | 0.2536
WSJ89 | 0.2761 | 0.3448 | 0.3507 | 0.3486

WSJ8 7,931 4,449 BTMM LM PLSA perplexity 257.59 251.8 250.42

Model | LM | PLSA | LDA | BTMM
Perplexity | 257.59 | 251.8 | 248.63 | 250.42

BTMM PLSA Dirichlet Dirichlet LDA

Table 3: Perplexity as the number of training documents grows.

#Docs | 10,000 | 20,000 | 30,000
LM | 247 | 380 | 511
PLSA | 240 | 372 | 504
LDA | 205 | 365 | 505
BTMM | 232 | 369 | 495

BTMM perplexity LDA LDA BTMM Dirichlet perplexity LDA
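The perplexity compared above is the exponentiated negative average per-word log-likelihood; lower is better. A minimal sketch:

```python
import math

def perplexity(log_probs, n_words):
    """Corpus perplexity: exp(-(1/N) * sum of word log-likelihoods)."""
    return math.exp(-sum(log_probs) / n_words)

# Sanity check: a uniform unigram model over a vocabulary of size V
# has perplexity exactly V.
V, N = 1000, 50
lp = [math.log(1.0 / V)] * N
assert round(perplexity(lp, N)) == V
```

A model that assigns each word probability 1/2 likewise has perplexity 2.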

bag-of-word PLSA Dirichlet Gibbs Tam Schultz[34] Dirichlet Tree[26] LDA Dirichlet Prior Unigram n-gram

References
[1] Y. Akita and T. Kawahara, “Language model adaptation based on PLSA of topics and speakers”, Proceedings of International Conference on Spoken Language Processing, pp. 1045-1048, 2004.
[2] J. R. Bellegarda, “Exploiting latent semantic information in statistical language modeling”, Proceedings of the IEEE, vol. 88, no. 8, pp. 1279-1296, 2000.
[3] M. W. Berry, S. T. Dumais and G. W. O’Brien, “Using linear algebra for intelligent information retrieval”, SIAM Review, vol. 37, no. 4, pp. 573-595, 1995.
[4] D. M. Blei and J. D. Lafferty, “Correlated topic model”, Advances in Neural Information Processing Systems (NIPS), vol. 18, pp. 147-154, 2006.
[5] D. M. Blei and J. D. Lafferty, “Dynamic topic model”, Proceedings of the 23rd International Conference on Machine Learning, pp. 113-120, 2006.
[6] D. M. Blei, A. Y. Ng and M. I. Jordan, “Latent Dirichlet allocation”, Journal of Machine Learning Research, vol. 3, no. 5, pp. 993-1022, 2003.
[7] T. Brants, F. Chen and I. Tsochantaridis, “Topic-based document segmentation with probabilistic latent semantic analysis”, Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 211-218, 2002.
[8] J.-T. Chien, M.-S. Wu and C.-S. Wu, “Bayesian learning for latent semantic language”, Proceedings of European Conference on Speech Communication and Technology, pp. 25-28, 2005.
[9] J.-T. Chien, M.-S. Wu and H.-J. Peng, “On latent semantic language modeling and smoothing”, Proceedings of International Conference on Spoken Language Processing, vol. 2, pp. 1373-1376, 2004.
[10] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, “Indexing by latent semantic analysis”, Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407, 1990.
[11] A. P. Dempster, N. M. Laird and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
[12] C. Elkan, “Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution”, Proceedings of the 23rd International Conference on Machine Learning, pp. 289-296, 2006.
[13] M. Girolami and A. Kaban, “On an equivalence between PLSI and LDA”, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 433-434, 2003.
[14] T. L. Griffiths and M. Steyvers, “Finding scientific topics”, Proceedings of the National Academy of Science, vol. 101, pp. 5228-5235, 2004.
[15] D. Harman, Overview of the Fourth Text Retrieval Conference, 1995. Available at http://trec.nist.gov/pubs/trec4/overvies.ps.gz
[16] T. Hofmann, “Probabilistic latent semantic analysis”, Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 289-296, 1999.
[17] T. Hofmann, “Unsupervised learning by probabilistic latent semantic analysis”, Machine Learning, vol. 42, no. 1, pp. 177-196, 2001.
[18] T. Hofmann, “Unsupervised learning from dyadic data”, Advances in Neural Information Processing Systems, vol. 11, MIT Press, 1999.
[19] X. Jin, Y. Zhou and B. Mobasher, “Web usage mining based on probabilistic latent semantic analysis”, Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 197-205, 2004.
[20] M. Jordan, editor, Learning in Graphical Models, MIT Press, Cambridge, MA, 1999.
[21] M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul, “Introduction to variational methods for graphical models”, Machine Learning, vol. 37, pp. 183-233, 1999.
[22] S. M. Katz, “Distribution of content words and phrases in text and language modeling”, Natural Language Engineering, vol. 2, pp. 15-59, 1996.
[23] M. Keller and S. Bengio, “Theme topic mixture model: A graphical model for document representation”, in PASCAL Workshop on Learning Methods for Text Understanding and Mining, 2004.
[24] T. G. Kolda and D. P. O’Leary, “A semi-discrete matrix decomposition for latent semantic indexing in information retrieval”, ACM Transactions on Information Systems, vol. 16, no. 4, pp. 322-346, 1998.
[25] R. Madsen, D. Kauchak, and C. Elkan, “Modeling word burstiness using the Dirichlet distribution”, Proceedings of the 22nd International Conference on Machine Learning, pp. 545-552, 2005.
[26] T. Minka, “The Dirichlet-tree distribution”, in http://research.microsoft.com/~minka/papers/dirichlet/minka-dirtree.pdf
[27] T. Minka, “Estimating a Dirichlet distribution”, Technical Report, MIT, 2000.
[28] T. Minka and J. Lafferty, “Expectation-propagation for the generative aspect model”, Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 352-359, 2002.
[29] D. Mrva and P. C. Woodland, “A PLSA-based language model for conversational telephone speech”, Proceedings of International Conference on Spoken Language Processing, pp. 2257-2260, 2004.
[30] D. Mrva and P. C. Woodland, “Unsupervised language model adaptation for mandarin broadcast conversation transcription”, Proceedings of International Conference on Spoken Language Processing, pp. 1961-1964, 2004.
[31] K. Nigam, A. K. McCallum, S. Thrun and T. Mitchell, “Text classification from labeled and unlabeled documents using EM”, Machine Learning, vol. 39, no. 2-3, pp. 103-134, 2000.
[32] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, New York: McGraw-Hill, 1983.
[33] Y.-C. Tam and T. Schultz, “Dynamic language model adaptation using variational Bayes inference”, Proceedings of European Conference on Speech Communication and Technology, pp. 5-8, 2005.
[34] Y.-C. Tam and T. Schultz, “Correlated latent semantic model for unsupervised LM adaptation”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 41-44, 2007.

Korean-Chinese Cross-Language Information Retrieval Based on Extension of Dictionaries and Transliteration

Yu-Chun Wang†‡, Richard Tzong-Han Tsai†§, Hsu-Chun Yen‡, Wen-Lian Hsu†

†Institute of Information Science, Academia Sinica, Taiwan
‡Department of Electrical Engineering, National Taiwan University, Taiwan
§Department of Computer Science and Engineering, Yuan Ze University, Taiwan
{albyu,thtsai}@iis.sinica.edu.tw
[email protected]
[email protected]

Abstract

This paper describes our Korean-Chinese cross-language information retrieval system. Our system uses a bilingual dictionary to perform query translation. We expand our bilingual dictionary by extracting words and their translations from Wikipedia, an online encyclopedia. To resolve the problem of translating Western people's names into Chinese, we propose a transliteration mapping method. We translate queries from Korean to Chinese using a co-occurrence method. When evaluated on the NTCIR-6 test set, our system achieves a mean average precision (MAP) of 0.1392 (relax score) for the title query type and 0.1274 (relax score) for the description query type.

Abstract (in Chinese)

This paper describes our proposed Korean-Chinese cross-language retrieval system. We use a Korean-Chinese bilingual dictionary to translate queries, and expand the coverage of our bilingual dictionary using the online Wikipedia and the Korean Naver site. In addition, for translating Western personal names appearing in Korean, we propose a transliteration-mapping search method. To resolve ambiguity in Korean-Chinese translation, we adopt a Mutual Information method. We evaluate our Korean-Chinese CLIR system on the NTCIR-6 test set; the mean average precision (MAP) is 0.1392 when querying with the title field and 0.1274 when querying with the description field.
Keywords: Korean-Chinese cross-language information retrieval, query translation

1 Introduction

The contents of the Internet are growing explosively due to improvements in computer and web technology. Besides English, web pages written in other languages are also increasing tremendously. To obtain useful information from the Internet, many advanced modern search engines have been developed, such as Google1, Yahoo2, and AltaVista3. However, for users who have no knowledge of other languages, it is impossible to obtain information in those languages through current single-language web search engines. Therefore, research on cross-language information retrieval (CLIR) is growing quickly. CLIR systems allow users to input keywords in their own language; the system then retrieves relevant documents written in the other language that the users want to search, based on the queries they input.

1http://www.google.com 2http://www.yahoo.com 3http://www.altavista.com

There are many different approaches to CLIR. Two kinds of approaches are usually adopted: the translation approach and the statistical approach. The translation approach uses bilingual dictionaries, ontologies, or thesauri to translate either queries or documents. The statistical approach uses pre-constructed bilingual corpora to extract cross-lingual associations without any language translation [1-3]. The translation approach is limited by the coverage and precision of the dictionaries. The statistical approach can extract bilingual lexicons automatically; however, it relies on a well-constructed, large-scale bilingual corpus, which requires a great deal of human effort. Within the translation approach, there are two possible targets: document translation and query translation. Document translation translates all documents in the collection from the target language into the users' source language; when a user issues a query, the system performs monolingual information retrieval. Query translation translates the query from the source language into the target language and then retrieves documents written in the target language. Document translation is feasible if a high-quality machine translation system exists [4, 5], but it is not very practical when the documents are unstable or updated frequently, as in web text retrieval. In this paper, we propose a Korean-Chinese cross-language information retrieval system. We adopt the query-translation approach because it is effective; moreover, our dictionary-based translation method does not involve a great deal of work. In CLIR, the most serious problem is that unknown words cannot be translated correctly. To resolve this problem, we utilize Wikipedia, an online encyclopedia, to expand our dictionary and increase its vocabulary coverage. Another difficult issue is translating Western people's names written in Korean into Chinese. As a solution, we propose a transliteration mapping method. The remainder of the paper is organized as follows. In Section 2, we give an overview of our system and describe its implementation, including the translation and indexing methods adopted. In Section 3, we detail the evaluation results of our CLIR system based on the topics and document collections provided by the NTCIR CLIR task, discuss the effectiveness of our method, and note some problems that remain to be solved. Finally, in Section 4, we present our conclusions and indicate the direction of our future work.

2 System Description

Figure 1 shows the architecture of our CLIR system. It comprises four stages. First, a Korean query is chunked into several key terms, which are then translated into Chinese using three dictionaries. In the third stage, we disambiguate the translated terms and transform them into a Lucene query. Finally, the query is sent to the Lucene IR engine and the answer is retrieved.

2.1 Query Processing

Unlike English, Korean written texts do not delimit individual words with spaces. Spaces in Korean sentences separate eojeols, which are composed of a noun and a postposition, or a verb stem and a verb ending. Therefore, Korean text has to be segmented. Users may make two types of queries: one composed of several keywords, the other a natural language sentence. We therefore use two different segmentation methods to handle these two query types separately. Due to the characteristics of Korean, keyword-type queries are comprised mainly of nouns. We use spaces to split the title into several eojeols, and

[Figure 1: System Architecture of Our CLIR System]

then remove the postpositions at the end of the eojeols according to our predefined rules. For the natural language sentence of a Korean query, we use the KLT Term Extractor4, developed by Kookmin University. The KLT term extractor performs word segmentation to extract vital keywords that are useful for information retrieval and removes stop words.

2.2 Query Translation

2.2.1 Bilingual Dictionary Translation
Due to copyright restrictions, we use the free online Korean-Chinese dictionary provided by the Daum Korean web site5. We send the key terms obtained in the query processing stage to the online dictionary. However, as a general bilingual dictionary is not suitable for proper nouns, we use Wikipedia6, an online encyclopedia, to expand our dictionary. In Wikipedia, an article may contain inter-language links to the same article in other language editions. Therefore, we send a Korean term to Korean Wikipedia; if its article contains an inter-language link to Chinese Wikipedia, we can find the corresponding Chinese word. This method is very effective because it yields accurate Chinese translations of Korean words. The Daum Korean-Chinese dictionary is written in simplified Chinese, as are many pages in Chinese Wikipedia. We use a simple mapping table to convert simplified Chinese characters to traditional Chinese characters. If a term cannot be found in the Daum dictionary or Wikipedia, we apply the maximal matching algorithm to split the long term into several shorter terms. The shorter terms are then sent to the Daum dictionary and Wikipedia to search for Chinese translations.
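For illustration, the maximal matching fallback can be sketched as below. The `lexicon` set and the sample term are hypothetical stand-ins for the Daum/Wikipedia lookups, and the greedy longest-prefix strategy is one common reading of the algorithm, not necessarily the exact variant used in the system.

```python
def maximal_match(term, lexicon):
    """Greedily split `term` into the longest substrings found in `lexicon`.

    Falls back to single characters when no dictionary entry matches,
    mirroring the splitting strategy described in Section 2.2.1.
    """
    pieces, i = [], 0
    while i < len(term):
        # Try the longest remaining prefix first; a lone character always matches.
        for j in range(len(term), i, -1):
            if term[i:j] in lexicon or j == i + 1:
                pieces.append(term[i:j])
                i = j
                break
    return pieces

# Hypothetical lexicon in which the wrong word "비접" outranks "비" + "접촉",
# reproducing the failure mode discussed later in Section 3.3.
lexicon = {"비접", "접촉", "촉", "형"}
print(maximal_match("비접촉형", lexicon))  # ['비접', '촉', '형']
```

The greedy choice is exactly why "비접촉형" is mis-segmented: the longer dictionary entry "비접" wins over the correct split.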

4http://nlp.kookmin.ac.kr/HAM/kor/index.html 5http://cndic.daum.net 6http://www.wikipedia.org

2.2.2 Person Name Translation
Person names often appear in queries. Although Wikipedia contains the names of many famous people around the world, some names are still missing. Therefore, it is necessary to deal with the person name translation problem. Unlike Korean-English or Korean-Japanese CLIR, transliteration methods are not appropriate for Korean-Chinese CLIR because many Chinese characters share the same pronunciation in Korean. Besides, to translate Japanese personal names, Korean uses the Hangul alphabet to approximate the Japanese pronunciation; Chinese, however, uses the original Chinese characters with their Mandarin pronunciation instead of the Japanese pronunciation. Thus, transliteration methods are not useful in this context. To solve the problem, we use Naver People Search7, a database containing the basic profiles of famous people, including their original names. We can submit person names in Korean to Naver People Search and get their original names. If the original name is composed of Chinese characters, it is clearly Chinese, Japanese, or Korean; therefore, we can send it directly to the next stage, i.e., the disambiguation stage. If, however, the original name is in English, we use the English name translation table provided by Taiwan's Central News Agency (CNA)8 to translate it into Chinese and then proceed to the next stage.

2.3 Term Disambiguation
In the past, the Korean language adopted many Chinese words; more than half of its vocabulary comprises Chinese loanwords. Now, however, Koreans use Hangul, an alphabetic writing system, instead of Chinese characters, which form an ideographic writing system. As a result, many different Chinese loanwords have the same pronunciation when written in the Hangul alphabet. For example, four different Chinese loanwords with different meanings, "理想" (ideal), "以上" (above), "異常" (unusual), and "異狀" (indisposition), are all written as the Hangul word "이상" because their pronunciations are the same in Korean. This creates a very serious ambiguity problem when Korean is translated into Chinese. Therefore, choosing the correct translation among the candidates is important. For each term in a given query Q, there may be several possible translation candidates. To select the best translation among all the candidates, we must consider not only the original query term qt but also all the other terms in Q and their translation candidates. We denote the j-th translation candidate for the i-th term qti in Q as tcij. We adopt the mutual information score (MI score) [6] to evaluate the correlation between tcij and all translation candidates of all the other terms in Q. The MI score of tcij given Q is calculated as follows:

$$\mathrm{MI\text{-}score}(tc_{ij} \mid Q) = \sum_{x=1,\, x \neq i}^{|Q|} \; \sum_{y=1}^{Z(qt_x)} \frac{\Pr(tc_{ij}, tc_{xy})}{\Pr(tc_{ij}) \Pr(tc_{xy})}$$

where Z(qtx) is the number of translation candidates of the x-th query term qtx; Pr(tcij, tcxy) is the probability that tcij and tcxy co-occur in the same sentence; and Pr(tcij) is the probability of tcij. The probability values are estimated from the Chinese Information Retrieval Benchmark (CIRB) Chinese corpus provided by the NTCIR CLIR task [7]. The higher a translation candidate's MI score, the higher the weight assigned to it in the retrieval module.
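As a concrete illustration, the MI-score summation can be coded directly. The candidate sets and probability values below are toy numbers, not statistics estimated from the CIRB corpus.

```python
def mi_score(i, j, candidates, p_joint, p_single):
    """MI score of candidate j of query term i, summed over the
    translation candidates of every other query term."""
    tc_ij = candidates[i][j]
    score = 0.0
    for x, cands in enumerate(candidates):
        if x == i:
            continue  # skip the term being disambiguated
        for tc_xy in cands:
            # Symmetric sentence-level co-occurrence probability.
            joint = p_joint.get(frozenset((tc_ij, tc_xy)), 0.0)
            score += joint / (p_single[tc_ij] * p_single[tc_xy])
    return score

# Toy query: term 0 is the ambiguous "이상", term 1 translates to "現象".
candidates = [["理想", "異常"], ["現象"]]
p_single = {"理想": 0.02, "異常": 0.01, "現象": 0.02}
p_joint = {frozenset(("異常", "現象")): 0.004,
           frozenset(("理想", "現象")): 0.0001}
best = max(range(2), key=lambda j: mi_score(0, j, candidates, p_joint, p_single))
print(candidates[0][best])  # 異常 — it co-occurs far more strongly with 現象
```

With these toy probabilities the score for "異常" (20.0) dominates that of "理想" (0.25), so the disambiguator picks the candidate that co-occurs with the rest of the query.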

7http://people.naver.com 8http://client.cna.com.tw/name/

2.4 Chinese Document Indexing
The Chinese document collection we use is CIRB 4.0, provided by NTCIR. The CIRB 4.0 documents are pre-processed to remove noise and then segmented by CKIP AutoTag [8] to obtain words and part-of-speech (POS) tags. We use Lucene9, an open source information retrieval engine, to index the Chinese documents. Our index is based on Chinese characters.

2.5 Lucene Queries
After processing a Korean query into several terms and translating them into Chinese, we transform the Chinese terms into a Lucene query. Different Chinese terms are separated by a space, which denotes an "OR" operator in the Lucene syntax. If a term has several translation candidates, the weight of the candidate with the highest mutual information score is increased by 1 via the boost operator, while the other candidates are boosted by a weight equal to the reciprocal of the total number of candidates. The boost operation affects the ranking of the documents that Lucene returns. The default boost value of each term is 1, and we decrease the weights of candidates with lower mutual information scores so that they affect the ranking less.
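The query construction described above can be sketched as follows. The exact boost bookkeeping (the winner boosted to 2 = 1 + 1, the others to 1/n) is our reading of the description, and the terms are illustrative.

```python
def to_lucene_query(translated_terms, mi_scores):
    """Build a space-separated (i.e., OR'd) Lucene query string, boosting
    the best candidate of each ambiguous term and demoting the rest."""
    clauses = []
    for candidates in translated_terms:
        if len(candidates) == 1:
            clauses.append(candidates[0])  # unambiguous: default boost of 1
            continue
        best = max(candidates, key=lambda c: mi_scores[c])
        low = 1.0 / len(candidates)  # demoted boost for the losers
        for c in candidates:
            boost = 2.0 if c == best else low  # default 1 increased by 1
            clauses.append(f"{c}^{boost}")
    return " ".join(clauses)  # space acts as OR in Lucene's default syntax

q = to_lucene_query([["癌", "岩"], ["治療"]], {"癌": 5.0, "岩": 0.1})
print(q)  # 癌^2.0 岩^0.5 治療
```

The `^` boost operator is standard Lucene query syntax; keeping the losing candidates in the query (rather than dropping them) preserves recall when the MI-based choice is wrong.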

3 Evaluation and Analysis

To evaluate our CLIR system, we use the topics and the document collections provided by the NTCIR-6 CLIR task [7]. The topic set contains 50 Korean queries, each composed of four parts: title, description, narration, and keywords. We use these topics as the queries input to our system. The main metric for evaluating retrieval performance is Mean Average Precision (MAP) [9]. Average precision is computed over the whole list of documents returned by the system and rewards returning relevant documents earlier; MAP is the mean of the average precisions computed for each query. In addition, R-precision [10], the precision over the top R retrieved documents where R is the number of relevant documents, is also a useful metric. There are two kinds of relevance judgments: Rigid and Relax. A document is rigid-relevant if it is highly relevant; it is relax-relevant if it is highly relevant or partially relevant. Our evaluation is based on the 50 topics selected by the NTCIR-6 CLIR task from the 140 topics provided. To assess the effectiveness of our CLIR system, we also build a monolingual Chinese IR system for comparison. The NTCIR-6 CLIR test set contains Chinese topics whose meanings are the same as the Korean ones. We use these Chinese topics as queries, apply CKIP AutoTag to perform Chinese word segmentation, and remove Chinese stop words. Then we use the same Lucene search engine as in our CLIR system to retrieve related Chinese documents. We perform four different runs:
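For reference, MAP can be computed as follows; this is a generic sketch of the metric, not code from our system.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a relevance set."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at each recall point
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP = mean of the per-query average precisions."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [(["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2 = 5/6
        (["d4", "d5"], {"d5"})]              # AP = 1/2
print(round(mean_average_precision(runs), 4))  # 0.6667
```

Note that average precision divides by the total number of relevant documents, so unretrieved relevant documents lower the score even though they never appear in the ranking.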

• KC-title-run: a run using the Korean title field to retrieve Chinese documents.
• KC-description-run: a run using the Korean description field to retrieve Chinese documents.
• CC-title-run: a monolingual run using the Chinese title field to retrieve Chinese documents.

9http://lucene.apache.org/

Table 1: Evaluation Results

                        Rigid                   Relax
Run                 MAP     R-precision     MAP     R-precision
KC-title-run        0.1118  0.1420          0.1392  0.1781
KC-description-run  0.1022  0.1311          0.1274  0.1760
CC-title-run        0.1501  0.1961          0.2141  0.2747
CC-description-run  0.1567  0.2111          0.2157  0.2788

• CC-description-run: a monolingual run using the Chinese description field to retrieve Chinese documents.

Table 1 shows the performance of our Korean-Chinese CLIR system and the monolingual Chinese IR system. The performance of Korean-Chinese CLIR is not as good as that of Chinese monolingual IR. We have investigated why it is difficult to retrieve high-precision answers for some queries.

3.1 Problems of Bilingual Dictionaries
We use a general bilingual dictionary and Wikipedia to translate most of the words in the 50 topics provided by the NTCIR-6 CLIR task. Although we use Wikipedia to expand our dictionary, some problems still cause translations to fail. The first problem is that some words remain unknown. For example, the word "배아" (embryo) is not listed in the dictionaries. The other problem is that the dictionaries do not always contain the proper translation candidates for the words and terms in queries. For instance, the word "감청" (monitor) is not translated correctly because the dictionary lacks the correct translation and provides another one instead, i.e., "紺青" (deep blue). Also, the word "암" (cancer) in one topic is translated as "岩" (rock), "庵" (nunnery), and "雌" (female), but not the correct translation, i.e., "癌" (cancer).

3.2 Different Phraseology Used in Taiwan and China
The Daum Korean-Chinese dictionary that we use was written for people studying Mainland Chinese, i.e., Pinyin. However, the CIRB 4.0 document collection contains Taiwanese newspapers. Taiwanese people use traditional Chinese characters, whereas Mainland Chinese people use simplified characters. Besides the difference in characters, the vocabulary and grammar used in Taiwan and China differ slightly, and these differences can make IR difficult. The following are some examples of the difficulties we face. The term "휴대폰" (mobile phone) is translated into the Mainland Chinese word "移動電話" (the phone that can move); however, the word used in Taiwan is "手機" (the machine held in the hand). The word "유전자" (gene) is translated to "遺傳子" (the factor of heredity), not to the correct word "基因" (the Mandarin transliteration of the English word "gene") used in Taiwan. The word "인터넷" (internet) in some topics is translated to "互聯網" (the net connecting to each other), but the word used in Taiwan is "網際網路" (cyber network).

3.3 The Limitations of the Maximal Matching Algorithm
If a term is not defined in our dictionaries, we split it into several shorter terms by the maximal matching algorithm discussed in Section 2.2.1. In some cases, however, the algorithm does not

segment a term correctly. For example, for the term "비접촉형" (contactless), the correct segmentation is 비(not)-접촉(contact)-형(type). However, it is segmented as 비접-촉-형, so the wrong word, "비접" (convalescing), is retrieved.

3.4 Different Expressions Used in Korean and Chinese
In some topics, different expressions used in Korean and Chinese may cause translation problems. In one of the topics, the word "10대" refers to people aged between 10 and 19; similarly, "20대" means people aged from 20 to 29. Therefore, the correct translation of "10대" in this topic is "青少年" (teenager). However, our system translates the numbers and the Hangul characters separately, so the final translation is "10代" (ten generations). This is a semantic problem that our system has difficulty coping with. Another problem relates to abbreviations used in Chinese. For instance, in another topic, "왜국인 노동자" (foreign worker) is translated into "外國人勞工" (foreign worker) by our system. However, in Taiwanese newspapers the abbreviation "外勞", composed of the first characters of the two words "外國人" (foreigner) and "勞工" (worker), is used more frequently. Similarly, our translation of the phrase "원자능 반대" in one topic is "反對 核能" (anti-nuclear), but the abbreviation "反核" is used more frequently.

4 Conclusions and Future Work

We have described our Korean-Chinese CLIR system. It is based on a query-translation approach and uses a general Korean-Chinese dictionary and Wikipedia to translate words and terms. To translate person names, we use the Naver People Search website and the CNA transliteration table. We have evaluated the performance of our Korean-Chinese CLIR system with the Korean topics and the Chinese document collection provided by the NTCIR-6 CLIR task. Our translation method is effective, but there are still some cases where precision is low. We believe the problems are due to the limitations of the dictionaries, the different phraseology used in Taiwan and China, and the different expressions used in Chinese and Korean. In future work, we will apply a Chinese thesaurus to overcome the problem of different Chinese phraseology and use more bilingual dictionaries to reduce the number of unknown words. We will also incorporate a query expansion method into our CLIR system to improve its precision.

References

[1] S. Dumais, T. Letsche, M. Littman, and T. Landauer, "Automatic cross-language retrieval using latent semantic indexing", in AAAI Symposium on Cross-Language Text and Speech Retrieval, American Association for Artificial Intelligence, 1997.

[2] Bob Rehder, Michael L. Littman, Susan Dumais, and Thomas K. Landauer, “Automatic 3-language cross-language information retrieval with latent semantic indexing”, in Sixth Text REtrieval Conference (TREC-6), 1997.

[3] Yiming Yang, Jaime G. Carbonell, Ralf D. Brown, and Robert E. Frederking, "Translingual information retrieval: Learning from bilingual corpora", Artificial Intelligence, vol. 103, pp. 323–345, 1998.

[4] Oh-Wook Kwon, I.S. Kang, J-H Lee, and G.B. Lee, "Cross-language text retrieval based on document translation using Japanese-to-Korean MT system", in NLPRS, 1997, pp. 101–106.

[5] Douglas W. Oard and Paul Hackett, "Document translation for cross-language text retrieval at the University of Maryland", in the Sixth Text REtrieval Conference (TREC-6).

[6] Hee-Cheol Seo, Sang-Bum Kim, Ho-Gun Lim, and Hae-Chang Rim, "KUNLP system for NTCIR-4 Korean-English cross-language information retrieval", in NTCIR-4, Tokyo, 2004.

[7] Kazuaki Kishida, Kuang-hua Chen, Sukhoon Lee, Kazuko Kuriyama, Noriko Kando, and Hsin-Hsi Chen, "Overview of CLIR task at the Sixth NTCIR Workshop", in Proceedings of NTCIR-6 Workshop Meeting, 2007.

[8] Cheng-Wei Lee, Cheng-Wei Shih, Min-Yuh Day, Tzong-Han Tsai, Tian-Jian Jiang, Chia-Wei Wu, Cheng-Lung Sung, Yu-Ren Chen, Shih-Hung Wu, and Wen-Lian Hsu, "ASQA: Academia Sinica question answering system for NTCIR-5 QA", in NTCIR-5, Tokyo, 2005.

[9] Tefko Saracevic, Paul Kantor, Alice Y. Chamis, and Donna Trivison, "A study of information seeking and retrieving", Journal of the American Society for Information Science, vol. 39, no. 3, pp. 161–176, 1988.

[10] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1999.

Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments

Tsung-hsueh Hsieh Department of Electrical Engineering National Chi Nan University [email protected]

Jeih-weih Hung Department of Electrical Engineering National Chi Nan University [email protected]

Abstract In this paper, we propose several compensation approaches to alleviate the effect of additive noise on speech features for speech recognition. These approaches are simple yet efficient noise reduction techniques that use online constructed pseudo stereo codebooks to evaluate the statistics in both clean and noisy environments. The process yields transforms for noise-corrupted speech features to make them closer to their clean counterparts. We apply these compensation approaches on various well- known speech features, including mel-frequency cepstral coefficients (MFCC), autocorrelation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC) and perceptual linear prediction cepstral coefficients (PLPCC). Experimental results conducted on the Aurora-2 database show that the proposed approaches provide all types of the features with a significant performance gain when compared to the baseline results and those obtained by using the conventional utterance-based cepstral mean and variance normalization (CMVN).

Keywords: automatic speech recognition, pseudo stereo codebooks, cepstral statistics compensation, linear least squares regression, quadratic least squares regression

This paper considers two families of approaches: (1) the conventional utterance-based cepstral mean and variance normalization (U-CMVN) [1] and segmental cepstral mean and variance normalization (S-CMVN) [2]; and (2) the proposed cepstral statistics compensation (CSC), linear least squares regression (LLS), and quadratic least squares regression (QLS) approaches.

We evaluate these approaches on several well-known speech features: mel-frequency cepstral coefficients (MFCC), autocorrelation mel-frequency cepstral coefficients (AMFCC) [3], linear prediction cepstral coefficients (LPCC) [4][5], and perceptual linear prediction cepstral coefficients (PLPCC) [6].

(1) Mel-frequency cepstral coefficients (MFCC)


cm

em

¦¦£²c ,e ¦¦m m ¦¦ ccmm%¤», %em ¦¦ ¦¦%%22c , e ¥¼¦¦m m

() (autocorrelation mel-frequency cepstral coefficientsAMFCC) [1](biased autocorrelation coefficients) 2ms

2ms

e m

¦¦£²c ,e ¦¦m m ¦¦ ccmm%¤», %em c ¦¦ m ¦¦%%22c , e ¥¼¦¦m m

47 () (linear prediction cepstral coefficientsLPCC) (linear prediction) p Levinson Durbin (linear prediction cepstral coefficientsLPCC)

Levinson Durbin

ci

­ ½ c i ° ° c ii ® '^`c ¾ ° ° ' 2 c °¯¿^`i °

() (perceptual linear prediction cepstral coefficientsPLPCC) (1)(frequency warping)(2) (equal loudness curve)(3) (cubic root)(intensity-loudness warping)

-

¦¦£²c ¦¦i ¦¦ c ii%¤»\^c ¦¦ ¦¦% 2 \^c ci ¥¼¦¦i


(CMVN) [1](U-CMVN) [2](S-CMVN)

(1) Utterance-based CMVN (U-CMVN)

Given a cepstral feature stream $\{Y[n],\, n = 1, 2, \ldots, N\}$, U-CMVN [1] normalizes every frame with the global mean and standard deviation of the whole utterance, producing $\{Y_{UCMVN}[n]\}$ as in Eqs. (3-1)–(3-3):

$Y_{UCMVN}[n] = \dfrac{Y[n] - \mu_Y}{\sigma_Y}, \quad n = 1, 2, \ldots, N$   (3-1)

$\mu_Y = \dfrac{1}{N} \sum_{n=1}^{N} Y[n]$   (3-2)

$\sigma_Y^2 = \dfrac{1}{N} \sum_{n=1}^{N} \left( Y[n] - \mu_Y \right)^2$   (3-3)

(2) Segmental CMVN (S-CMVN)

S-CMVN [2] normalizes frame n with statistics computed over a sliding window of P+1 frames centered on frame n (P/2 frames on each side), as in Eqs. (3-4)–(3-6):

$Y_{SCMVN}[n] = \dfrac{Y[n] - \mu[n]}{\sigma[n]}, \quad n = 1, 2, \ldots, N$   (3-4)

$\mu[n] = \dfrac{1}{P+1} \sum_{i=n-P/2}^{n+P/2} Y[i]$   (3-5)

$\sigma^2[n] = \dfrac{1}{P+1} \sum_{i=n-P/2}^{n+P/2} \left( Y[i] - \mu[n] \right)^2$   (3-6)
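Both normalization schemes can be sketched in a few lines of Python; the boundary handling of the segmental window (clipping at the utterance edges) is our assumption, since the original does not specify it here.

```python
import math

def u_cmvn(y):
    """Utterance-based CMVN: normalize by the global mean and std (Eqs. 3-1..3-3)."""
    n = len(y)
    mu = sum(y) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in y) / n)
    return [(v - mu) / sigma for v in y]

def s_cmvn(y, p):
    """Segmental CMVN: per-frame stats over a (p+1)-frame window (Eqs. 3-4..3-6)."""
    out = []
    for n in range(len(y)):
        lo, hi = max(0, n - p // 2), min(len(y), n + p // 2 + 1)  # clip at edges (assumption)
        seg = y[lo:hi]
        mu = sum(seg) / len(seg)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in seg) / len(seg)) or 1.0
        out.append((y[n] - mu) / sigma)
    return out

feats = [1.0, 2.0, 3.0, 4.0]
norm = u_cmvn(feats)
# After U-CMVN the stream has zero mean and unit variance.
print(round(abs(sum(norm)), 6), round(sum(v * v for v in norm) / len(norm), 6))  # 0.0 1.0
```

The difference is purely in the scope of the statistics: U-CMVN needs the whole utterance before it can normalize any frame, whereas S-CMVN only needs the surrounding P+1 frames, which is why the latter can run in an on-line manner.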

() (U-CMVN) (1) U-CMVN U-CMVN (on-line manner)(on-line manner) (real-time) (2) (3)(acoustic units) (a) AURORA2 "FAC_1911446" "FAC_1473533A""FAC_1O1""one" c1 (b)(a) U-CMVN (a) CMVN c1 ([-10.312.8][-11.613.0][-11.015.0])(b) c1 ([-0.52.0][-0.52.8][-1.12.3])


(a) (b) AURORA2 "FAC_1911446""FAC_1473533A" "FAC_1O1""one"(a) c1 (b) U-CMVN c1

(on-line manner) (S-CMVN)(MFCCAMFCC LPCCPLPCC)(U-CMVN)

[7](cepstral statistics compensationCSC)[7](linear least squares regressionLLS) [7](quadratic least squares regressionQLS) (codebooks) (pseudo stereo codebooks) (CSC)(LLS) (QLS) (transformation) ()

51 MFCCAMFCCLPCCPLPCC (intermediate feature) (pseudo stereo codebooks)

y x i

VQ xn >@

np[]

ym[] xn [] np []

f xn[]  f ([ym ]) xn>@ ym[]

()ycep i CSCLLSQLS

(vector quantizationVQ) N (codewords) {[x nnN ],1bb } ( 4-1) xx[]nfn  [] ( 4-1)

The intermediate feature domain depends on the feature type: for MFCC and AMFCC it is the mel-spectrum and the autocorrelation coefficients, respectively; for LPCC and PLPCC, either the magnitude spectrum (MS-) or the autocorrelation coefficients (AC-) can be used.

f . \^x[],1nnNbb \^n[],1ppPbb ( 4-2)

yxn[]|mnpmn(1) Pp  [] [] ( 4-2)

y<>m ( 4-3) yy[]mfm ([]) ( 4-3) {[y mmNP ],1bb } {[x n ]} {[y m ]} (pseudo stereo codebooks)""

\^x<>n (off-line manner) {[y m ]} n[]p {[y m ]} (on-line manner)

(1) Cepstral statistics compensation (CSC)

For each cepstral dimension i, the statistics of the clean codebook $\{x[n],\, 1 \le n \le N\}$ and of the noisy codebook $\{y[m],\, 1 \le m \le NP\}$ are computed as in Eqs. (4-4) and (4-5):

$\mu_{x,i} = \dfrac{1}{N} \sum_{n=1}^{N} x_i[n], \qquad \sigma_{x,i}^2 = \dfrac{1}{N} \sum_{n=1}^{N} \left( x_i[n] - \mu_{x,i} \right)^2$   (4-4)

$\mu_{y,i} = \dfrac{1}{NP} \sum_{m=1}^{NP} y_i[m], \qquad \sigma_{y,i}^2 = \dfrac{1}{NP} \sum_{m=1}^{NP} \left( y_i[m] - \mu_{y,i} \right)^2$   (4-5)

Each dimension $y_i$ of a noisy feature vector is then compensated toward the clean statistics as in Eq. (4-6):

$z_i = \dfrac{\sigma_{x,i}}{\sigma_{y,i}} \left( y_i - \mu_{y,i} \right) + \mu_{x,i}$   (4-6)

or, in vector form,

$\mathbf{z} = \boldsymbol{\mu}_x + \boldsymbol{\Sigma} \left( \mathbf{y} - \boldsymbol{\mu}_y \right)$   (4-7)

where $\boldsymbol{\mu}_x = [\mu_{x,1}, \mu_{x,2}, \ldots]^T$, $\boldsymbol{\mu}_y = [\mu_{y,1}, \mu_{y,2}, \ldots]^T$, and $\boldsymbol{\Sigma} = \mathrm{diag}\{\sigma_{x,i}/\sigma_{y,i}\}$ is a diagonal matrix.

CSC (CMVN) CSC (1)CSC (on-line manner) (2) CSC CMVN (3) CSC CMVN
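Given the codebook statistics, the per-dimension CSC transform is a one-liner; the statistics below are toy values, not values estimated from pseudo stereo codebooks.

```python
def csc_transform(y, mu_x, sigma_x, mu_y, sigma_y):
    """Map a noisy feature vector toward the clean statistics, dimension
    by dimension: z_i = (sigma_x_i / sigma_y_i) * (y_i - mu_y_i) + mu_x_i."""
    return [sx / sy * (yi - my) + mx
            for yi, mx, sx, my, sy in zip(y, mu_x, sigma_x, mu_y, sigma_y)]

# Toy statistics: the noisy features have a shifted mean and inflated variance.
z = csc_transform([4.0, 0.0],
                  mu_x=[1.0, 0.0], sigma_x=[1.0, 2.0],
                  mu_y=[2.0, 1.0], sigma_y=[2.0, 2.0])
print(z)  # [2.0, -1.0]
```

Structurally this is the same whiten-then-recolor operation as CMVN, except that the target statistics come from the clean codebook rather than being zero mean and unit variance.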

() (linear least squares regressionLLS) (quadratic least squares regressionQLS) (polynomial regression approaches)

y x[]n nmP ¡° ¯/ ( ¯. P ) \^x[]n \^y[]m x y y[]m

, ¸ , y[]m x[]n y , ()y x y

, i ()t y i Ji ,i y[]m i

54 x[]n i ( 4-8)

The mapping $\varphi_i(\cdot)$ for dimension i is chosen to minimize the total squared error between the clean codewords and the mapped noisy codewords:

$J_i = \sum_{m=1}^{NP} \left( x_i\!\left[ \lceil m/P \rceil \right] - \varphi_i\!\left( y_i[m] \right) \right)^2$   (4-8)

where $\varphi_i(\cdot)$ is a polynomial of order K,

$\varphi_i(u) = a_K u^K + a_{K-1} u^{K-1} + \cdots + a_0$   (4-9)

whose coefficients are found by least squares. Eq. (4-8) can be rewritten in matrix form as

$J_i = \left\| \mathbf{Y}_i \mathbf{a}_i - \mathbf{b}_i \right\|^2$   (4-10)

where the (m, n) entry of $\mathbf{Y}_i$ is

$\mathbf{Y}_i[m][n] = \left( y_i[m] \right)^{K-n+1}, \quad 1 \le m \le NP, \; 1 \le n \le K+1$   (4-11)

$\mathbf{a}_i = [a_K \; a_{K-1} \; \cdots \; a_0]^T$, and $\mathbf{b}_i = \left[ x_i[\lceil 1/P \rceil] \; x_i[\lceil 2/P \rceil] \; \cdots \; x_i[\lceil NP/P \rceil] \right]^T$. The coefficient vector minimizing $J_i$ has the closed-form solution

$\hat{\mathbf{a}}_i = \left( \mathbf{Y}_i^T \mathbf{Y}_i \right)^{-1} \mathbf{Y}_i^T \mathbf{b}_i$   (4-12)

A large polynomial order K risks over-fitting and an ill-conditioned matrix $\mathbf{Y}_i^T \mathbf{Y}_i$, so we use only K = 1 and K = 2. With K = 1, $\varphi_i(\cdot)$ yields linear least squares regression (LLS); with K = 2, quadratic least squares regression (QLS).
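A sketch of the K = 1 (LLS) case, using the normal-equation solution specialized to a line; the data are toy values, and the QLS case (K = 2) extends the same derivation with a squared term.

```python
def lls_fit(y_vals, x_vals):
    """Closed-form linear least squares fit x ≈ a1*y + a0, i.e. Eq. (4-12)
    specialized to K = 1, with y the noisy and x the clean features."""
    n = len(y_vals)
    sy, sx = sum(y_vals), sum(x_vals)
    syy = sum(v * v for v in y_vals)
    sxy = sum(v * w for v, w in zip(y_vals, x_vals))
    a1 = (n * sxy - sy * sx) / (n * syy - sy * sy)  # slope
    a0 = (sx - a1 * sy) / n                          # intercept
    return a1, a0

# Toy data where clean = 0.5 * noisy - 1 exactly, so the fit recovers it.
noisy = [0.0, 2.0, 4.0, 6.0]
clean = [-1.0, 0.0, 1.0, 2.0]
a1, a0 = lls_fit(noisy, clean)
print(round(a1, 6), round(a0, 6))  # 0.5 -1.0
```

With K = 1 the normal-equation matrix is only 2×2, so ill-conditioning is rarely a problem; it is the higher-order variants that motivate the warning about over-fitting above.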

() (European Telecommunication Standard InstituteETSI) AURORA2 Set A Set B (signal-to-noise ratio, SNR) 20dB, 15dB, 10dB, 5dB, 0dB. -5dB

() (MFCC)

55 (AMFCC)(LPCC)(PLPCC) (hidden Markov model toolkitHTK) 11 (0~9 oh 11 ) 10 4 HMM (Gaussian Mixture probability density function GM) (continuous density HMM CDHMM)

12 1 MFCC 23 1 39

12 1 AMFCC 23 1 39

13 23 LPCC 39 24

13 23 23 PLPCC 39

() (S-CMVN) 3-5 3-6 P+1 = 101 1 {[x nnN ],1bb } N 32641282565121024 N \^n

5 LPCC PLPCC (magnitude spectrum) (autocorrelation coefficients) MS- AC-

(U-CMVN)

U-CMVN recognition accuracy (%), compared with the baseline:

Set A     subway      babble  car      exhibition     average  baseline
MFCC      72.91       69.71   68.71    69.22          70.14    61.99
AMFCC     68.60       72.02   68.94    65.58          68.78    65.52
LPCC      72.25       71.32   71.50    70.18          71.31    51.26
PLPCC     75.24       74.34   73.72    74.60          74.48    57.38

Set B     restaurant  street  airport  train station  average  baseline
MFCC      71.60       72.16   71.00    68.28          70.76    55.78
AMFCC     72.89       71.23   73.35    70.23          71.93    59.43
LPCC      73.44       72.82   74.68    70.69          72.91    49.58
PLPCC     76.48       75.79   77.01    73.23          75.63    54.51

S-CMVN recognition accuracy (%), compared with U-CMVN:

Set A     subway      babble  car      exhibition     average  U-CMVN
MFCC      75.71       73.42   72.36    72.63          73.53    70.14
AMFCC     72.12       74.82   73.49    70.21          72.66    68.78
LPCC      74.53       74.20   75.30    74.38          74.60    71.31
PLPCC     76.07       75.14   75.72    76.06          75.75    74.48

Set B     restaurant  street  airport  train station  average  U-CMVN
MFCC      75.65       75.23   74.97    71.93          74.45    70.76
AMFCC     75.58       76.11   75.02    72.14          74.71    71.93
LPCC      76.08       76.63   76.87    73.84          75.86    72.91
PLPCC     77.69       77.48   78.21    75.13          77.13    75.63

1. 2. S-CMVN U-CMVN 3. N MFCC N 512 256

57 LPCC N 1024 4.

CSC recognition accuracy (%) with the best codebook size N, compared with S-CMVN:

Set A             subway      babble  car      exhibition     average  S-CMVN
MFCC (N=512)      78.71       75.84   80.54    77.40          78.12    73.53
AMFCC (N=512)     79.11       74.26   82.80    75.91          78.02    72.66
MS-LPCC (N=1024)  75.81       72.31   82.38    74.62          76.28    74.60
AC-LPCC (N=1024)  74.94       75.71   81.19    74.54          76.60
MS-PLPCC (N=128)  77.79       76.14   81.25    78.71          78.47    75.75
AC-PLPCC (N=32)   78.64       76.57   78.55    77.70          77.87

Set B             restaurant  street  airport  train station  average  S-CMVN
MFCC (N=512)      75.08       77.82   77.15    77.77          76.95    74.45
AMFCC (N=512)     73.15       79.36   77.28    79.10          77.22    74.71
MS-LPCC (N=1024)  73.57       76.55   78.48    79.16          76.93    75.86
AC-LPCC (N=1024)  76.35       75.95   80.24    79.61          78.04
MS-PLPCC (N=128)  77.24       77.47   80.40    79.55          78.66    77.13
AC-PLPCC (N=32)   76.76       77.06   79.90    78.07          77.95

LLS recognition accuracy (%) with the best codebook size N, compared with S-CMVN:

Set A             subway      babble  car      exhibition     average  S-CMVN
MFCC (N=512)      78.92       76.09   80.01    76.57          77.90    73.53
AMFCC (N=512)     76.97       75.06   80.35    73.46          76.46    72.66
MS-LPCC (N=1024)  72.85       75.23   79.79    71.18          74.77    74.60
AC-LPCC (N=1024)  71.70       77.48   78.22    70.91          74.58
MS-PLPCC (N=64)   79.74       77.70   81.37    79.22          79.51    75.75
AC-PLPCC (N=64)   76.63       77.15   76.02    75.93          76.43

Set B             restaurant  street  airport  train station  average  S-CMVN
MFCC (N=512)      78.89       78.19   77.72    77.62          77.35    74.45
AMFCC (N=512)     74.45       77.55   77.29    76.95          76.56    74.71
MS-LPCC (N=1024)  75.29       74.91   79.42    77.97          76.90    75.86
AC-LPCC (N=1024)  77.82       74.32   80.66    78.12          77.73
MS-PLPCC (N=64)   78.71       78.91   81.17    80.44          79.81    77.13
AC-PLPCC (N=64)   77.79       75.56   79.40    76.69          77.36

The results above show that the proposed CSC, LLS, and QLS compensation approaches generally improve the recognition accuracy of all four feature types (MFCC, AMFCC, LPCC, and PLPCC).

QLS recognition accuracy (%) with the best codebook size N, compared with S-CMVN:

Set A             subway      babble  car      exhibition     average  S-CMVN
MFCC (N=256)      77.71       76.44   82.60    77.38          78.53    73.53
AMFCC (N=512)     72.61       76.94   81.08    71.02          75.41    72.66
MS-LPCC (N=1024)  69.82       74.99   80.80    71.04          74.16    74.60
AC-LPCC (N=1024)  67.35       75.45   77.81    69.54          72.54*
MS-PLPCC (N=64)   76.83       76.74   82.80    77.94          78.58    75.75
AC-PLPCC (N=512)  71.48       73.05   74.29    72.48          72.82*

Set B             restaurant  street  airport  train station  average  S-CMVN
MFCC (N=256)      73.88       77.19   77.70    79.18          76.99    74.45
AMFCC (N=512)     74.47       75.50   78.95    78.72          76.91    74.71
MS-LPCC (N=1024)  74.75       72.06   79.65    78.55          76.25    75.86
AC-LPCC (N=1024)  75.28       70.17   78.92    76.97          75.34*
MS-PLPCC (N=64)   76.53       76.60   81.22    81.19          78.89    77.13
AC-PLPCC (N=512)  72.61       70.62   76.19    74.24          73.41*

(RASTA) [8]

[1] S. Tiberewala and H. Hermansky, "Multiband and adaptation approaches to robust speech recognition", in Eurospeech 97, 1997, pp. 107-110.
[2] O. Viikki and K. Laurila, "Noise robust HMM-based speech recognition using segmental cepstral feature vector normalization", in ESCA NATO Workshop on Robust Speech Recognition for Unknown Communication Channels, Pont-a-Mousson, France, 1997, pp. 107-110.
[3] Benjamin J. Shannon and Kuldip K. Paliwal, "Feature extraction from higher-lag autocorrelation coefficients for robust speech recognition", Speech Communication, 2006.
[4] B.S. Atal, "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification", Journal of the Acoustical Society of America, 1974.
[5] J. Makhoul, "Spectral linear prediction: properties and applications", IEEE Transactions on Acoustics, Speech and Signal Processing, 1975.
[6] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech", J. Acoust. Soc. Am., vol. 87, no. 4, pp. 1738-1752, Apr. 1990.
[7] Jeih-weih Hung, "Cepstral statistics compensation using online pseudo stereo codebooks for robust speech recognition in additive noise environments", in ICASSP 2006.
[8] H. Hermansky and N. Morgan, "RASTA processing of speech", IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 578-589, 1994.

59 ߓอחഏ፿ߢ፿ଃࡎڍ   ۶௠ನΕᏥၞዊ ૻֆ׹ڶ᝛პઝݾैٝ    ኴ૞

ᖲᜢ൳ຌ᧯Δ֫ءഏ፿ߢठڍ ਢ᝛პઝݾ۞۩ઔ࿇ऱ ፿ଃᐸᇆΕ፿ृش࣍  ࿛ཕᐝী֫ᖲխΔ༼ࠎࠌشᚨ ౨ΖפΕ፿ଃរዚΕ ிᦰ១ಛΕሽ՗ၡٙΕࡉ۩ࠃᖟփ୲࿛חଃਐ ༴૪֮ء၌መ  ཱི֫ᖲփ৬  ᔭഇ٤෺ΖڶΔؾছբء ౨֭ག  ጟ፿ߢठ ᧭ऄΕ༓ጟ፿ߢऱ፿ଃᙃᢝኔ܂ഏ፿ߢऱڍऱ፿ଃᙃᢝࡉ  ݾ๬Ε֭གش ࢬࠌ ߜऱუऄΖޏࠐآݺଚኙ֗א࿨࣠Ε

ഏ፿ߢ፿ଃݾ๬ڍΕחΕΕ፿ଃᐸᇆΕ፿ଃࡎګٽᣂ᝶ဲΚ፿ଃᙃᢝΕ፿ଃ  ԫΕፃᓵ

ದݙ࿳ऱ٤෺ࠎᚨمᖲ፹ທ೸৬֫֗אΔޡᖄ᧯ݾ๬ऱݶຒၞתᙟထࠐྤᒵຏಛΕ Δ؀ຄ၌መ  Ꮩנ٤෺֫ᖲڣ Ա֫ᖲขᄐऱݶຒ࿇୶ΔګΔআءګ፹ທ܅ᢸՕ༏૾ ੡ګΔ֫ᖲբ֪شሒ  Ꮩױڣ ᑇٍലડధ  ᏙΔ֪ش٤෺֫ᖲڣ ቃ۷վ ࢨ౒ऱᇘᆜհԫΖױ੒խլسԳזՕຝ։෼

ࠐ֫ᖲڣڍΖ՛ী֏Ε᥋൅ֱঁ੡ףᖲՂऱૹ૞ࢤດዬᏺ֫ڇ౨פࠐ፿ଃᜢ൳ڣ२ ᖲլജֱঁΔ֫܂ᖙ֫شګࢬૻࠫΔທڶ๻ૠլ᧢ऱ᝟ႨΔ֫ᖲਊ᝶။ࠐ။֟ΕᘛኟՕ՛ ᜤ࿮Գਢॺބោᥦ༈چਊ᝶ԫ࿝ԫ࿝شᇷறΔټ࿝ԳۍՂڶሽᇩ᡻խຟृشࠌڍৰڕࠏ ࠩუ਷ᇬऱᜤ࿮ԳᇷறΖ׼ބृشࠌܗ፿ଃᙃᢝݾ๬༉౨ৰݶऱᚥشய෷ऱΔᚨڶൄ޲ ୭۞ߪٲᙠऱᜰ೯Δլႛٲຍਢઌᅝ܀ᄎᐸؚ֫ᖲΔچ܍ᝩױၲ߫խլڇ؆ΔՕຝٝԳ ഏ୮բᆖૡࡳᣤڍຏࠃਚΔ๺ٌګᐸؚ֫ᖲۖທ܍ሁԳ৖ౡΔ੡ᝩشה٤ΔՈ൅࿯ࠡڜ ᐸچ٤ڜழٵၲ߫ऱڇृش౨༉౨ᨃࠌפᐸؚ֫ᖲΔ፿ଃᐸᇆ֫ش௑ऱऄ๵ᆃַၲ߫ழ รԫཱི፿ृઌᣂऱᜢ൳֫ᖲၲࡨΔ֫נංڣ ᇆΖ۞  ਔ ΕΕΕץΔףᏺڣດܛޣ౨ऱᏁפחᖲՂ፿ଃᐸᇆΕ፿ଃࡎ ࢍሁຘषษڣ ౨ΖפڼشΕ ࿛ছնՕ֫ᖲᐗຟၲࡨආ ፿ଃᐸᇆऱ֫ٽᖞڣ ౨Δۖ פ፿ଃᐸᇆٽᖲᖞ֫؀ Ꮩࠩ  Ꮩ ڶપڣฃቃ۷ᅝ ९ԫ଍Ζګᖲᑇၦല

Ղሎ۩ऱ፿ଃݾ๬Δࠀၲ࿇֫؀ᖲؓ֫ڇٽၲࡨઔߒᔞڣ Δݺଚൕ ڼ੡ Δڤՠ੡׌૞ᛜሎᑓזא᨜֫ᖲ፹ທ೸؀᨜֫ᖲ፹ທ೸੡׌૞ড়֪Ζ؀אᖲᜢ൳ຌ᧯Δ ڂഏ፿ߢΔڍՠข঴۩ᔭ٤෺Δ૞ؚԵ֫ᖲ፹ທ೸ऱࠎᚨᢸΔข঴ࡉݾ๬ؘႊ౨֭གז ག  ጟ֭ױഏ፿ߢऱ፿ଃᙃᢝፖ  ݾ๬ΔؾছڍೈԱխ֮؆ΔݺଚՈၲ࿇ڼ ܛ੡ࠏΔא፿ߢऱ፿ଃᙃᢝፖ Δᇡߠ।ԫΖ ਔ ΕΕץհ؆Δڼຄ٤෺Ζೈנऱء፿ߢठٵփ৬ጟլ

61 ഏ፿ߢठڍփ৬ڶዃᔭ֫ᖲຟཱིڍ   Ε  ࿛ Ζ

ء।ԫΕ ֭གऱ፿ߢठ ء೴ ፿ߢठچ ᨜Ցଃխ֮ΕՕຬՑଃխ֮Εᆕ፿Εឌ፿Εֲ፿Ε௠፿ΕՒۘࠡ፿Εᖾ؀ ࠅᖾ੊ ੊Ցଃ૎፿ ፿׃፿Ε֣۫Ցଃᆿရ׃ભ੊ ભഏՑଃ૎፿ΕխতભՑଃ۫ఄ ፿Ε঎ᢅཎ፿Ε׃፿Εᆿရ׃፿Ε۫ఄܓᑛ੊ ૎ഏՑଃ૎፿Εᐚ፿Εऄ፿ΕᆠՕ ፿Εᅗࠢ፿Εݦᢊ፿܌๛ᥞ፿Εկຽ፿Εंᥞ፿Ε൸

  Ε ᖙ൳֫ᖲڇګ፿ଃࡉ֫ᖲኙᇩյ೯Δۖሒش౨ृش ᨃࠌ Ε፿ଃ਷ᇬᜤ࿮ԳΕ፿ଃរዚΕ፿חਔ፿ଃᐸᇆΕ፿ଃਐץ౨פऱؾऱΖ ࠠໂऱ Հਢຘመ אଃிᦰ១ಛΕሽ՗ၡٙΕ۩ࠃᖟփ୲࿛Ζቹԫ੡૎֮ठ  ྽૿Δ ၞ۩፿ଃᐸᇆऱᒤࠏΚ

ח ᓮᎅਐ  ؚሽᇩ࿯۶௠ನ  ؚሽᇩࠩ۶௠ನΔ۰୮Εֆ׹Ε֫ᖲΕࢨ࠷௣  ֆ׹  ֆ׹Δᐸᇆխᓮ࿑ଢ

ऴ൷ᎅψؚሽᇩ࿯۶௠ನֆ׹ωΚृش׼ԫଡᒤࠏਢࠌ

ח ᓮᎅਐ  ؚሽᇩ࿯۶௠ನֆ׹  ؚሽᇩࠩ۶௠ನֆ׹Δᒔᎁࢨ࠷௣  ᒔᎁ ቹԫΕ૎֮ठ  ྽૿  ᐸᇆխᓮ࿑ଢ

 ऄΔรԲᆏല༴૪  ߓ܂ഏ፿ߢऱڍག֭֗אᓵ֮ല༴૪  ऱߓอਮዌء ऱݾ๬ΔรԿᆏ༴૪፿ଃᙃᢝኔ᧭ᛩشഏ፿ߢऱ፿ଃᙃᢝࡉ  ࢬࠌڍਔץอਮዌΔ ߜऱ༓ଡუऄΖޏࠐኙآݺଚנ༼ऱኔ᧭࿨࣠Δร؄ᆏءቼࡉ༓ଡ፿ߢठ  ԲΕ ߓอਮዌ

ኙᇩृشտ૿Εၞ۩ᙕ࣋ଃΕ൳ࠫᖞଡࡉࠌृشΔ༼ࠎࠌڤ࿓شਔԫิᚨץ  ቹԲڕऱੌ࿓Δࢍᐋ੡  ᑓิࡉ  ᑓิΔ ᙃᢝΔ ౨ലחऄૻࠫऱຑᥛ፿ଃࡎ֮ڶნᙃᢝΔࢨဲمၞ۩ᗑױΖ ᑓิقࢬ طࠐΖנᐾ࣋࣋؂ᜢଃΔຘመ֫ᖲऱໞګᙁԵ᠏ངڗᙃᢝऱ࿨࣠ࢨ੷۟ٚრ֮ ऱ़ၴՈլشᖞᑇሎጩΔ׊ࠌګޏऱሎጩؘႊڶՂ૿ሎ۩Δࢬڇ࣍֫ᖲᇷᄭለ՛Δ૞౨

62  ࠩ Δࠉ፿ ش۾ದࠐՕપף౨֜ՕΔݺଚऱ  ࡉ  ԫଡ፿ߢ ࢬ஁ฆΔ ՕપᏁ૞ Ζڶۖٵߢլ

נ፿ଃᙁԵ  ፿ଃᙁ ڤ࿓شᚨ 

   ᑓิ ᑓิ

ቹԲΕ ߓอਮዌ

 ԫΕ

ၦ ΔԲਢٻ༓ଡຝٝΚԫਢൕᜢଃխࢼ࠷௽ᐛګ։ױᑓิ ଃ ᑑ  ګ ऱ፿ଃᑓীΔԿ ਢ ല ᙃ ᢝ ऱ ᢯ ნ ᠏ ངشᙃᢝழࢬࠌ ቹԿΖڕΔ؄ਢ֮ࠠऄૻࠫऱ፿ଃᙃᢝჼ༈ዝጩऄΔ

ऄૻࠫ֮

ၦ ჼ༈ ᙃᢝ࿨࣠ٻ፿ଃᙁԵ ࢼ࠷ ௽ᐛ ၦ ዝጩऄٻ௽ᐛ

᠏ଃᑑڗ֮ ፿ଃᑓী

ຝ։ګิٺ ቹԿΕ  ၦٻ ௽ᐛ

ࠐ۞֫ᖲՂऱຽױၦΔᜢଃٻ௽ᐛګၞ۩፿ଃᙃᢝழΔᄎ٣ലᙕၞࠐऱᜢଃ᠏ ࣍៴ॕۘᖲ׽౨ႚᙁ  ऱᜢଃΔطᒵۘᖲΕࢨ៴ॕۘᖲΔڶଅΕ܌ ၦΔٻ ଡ௽ᐛ נ ઞ խ ᜢ ଃ ࢼ ࠷ ޢऱᜢଃᙁԵ੡ Ζشݺଚࠌڼڂ  ፂऱ     طၦ੡  ଡፂ৫Δٻଡ௽ᐛޢ ೚ຏሐயᚨ ش Ζګࡉ  ፂऱ  ࢬิ ᇖᚍΖ 

63  ፿ଃᑓী

ᑓী ֛ױ್ڤ៲ऱឆۯଃై੡໢א፿ଃᑓী੡ႚอ ԿᜤଃైᑓشΔݺଚࠌګऱणኪิ׳ࠩؐط ଡ طଡᑓীޢ Δ ႃऱಝᒭ፿றઌኙگൎኙຑଃऱᙃᢝΖઌለ࣍  ᑇၦΔݺଚࢬףী ೶ᑇش٥ױऱणኪۿ࿜ᖫ  ࠐެࡳୌࠄᣊެشױழڼլߩΔ ਬࠄ೶ᑇऱಝᒭ፿܍ऱ࿓৫ࠐ൳ࠫ೶ᑇၦΔԫֱ૿ᝩشᓳᖞ٥ױࡉಝᒭ፿றΔݺଚ ࠋऱᑓীՕ՛Ζ່נࠉ֫ᖲऱᇷᄭᓳᖞױறլߩΔԫֱ૿

ଡ ፿ ߢ ݺ ଚ ፦ ႃ  ࠩ  ޢ ዝጩऄΖ شಝᒭ፿ଃᑓীࠌ ଡԳആऱޢ՗ΔՕપ  ࠩ  ։ᤪऱ፿றΔ؁Գ࢚  ࠩ  ଡޢԳլ࿛ऱᜢଃΔ ࠐॣشऱଃᑑΔڶΔ׼؆٦ആ  ࠩ  ଡ໢ဲΔຍࠄ໢ဲො።ᇠ፿ߢࢬٵᒚઃլ֮ ՀΚݺଚ٣࠷પ ऱ໢ဲ፿றΔڕᨏޡࡨ֏፿ଃᑓী Ζॣࡨ֏ᑓীऱ ଃᑑऱᢰ੺ΔࠀಝᒭΔ٦ګଡิޢנԳՠᑑش ᓳᖞΔհ৵إԳՠૹᄅீא໢ဲ፿றऱଃᑑᢰ੺Δࠀڶࢬנ۞೯ᑑڼش Ζݺଚऱᆖ᧭ਢॣࡨ֏ᑓীऱ঴ᔆኙ່৵ ڻ৵ऱ፿றૹᄅಝᒭԫإ٦ஞீ ऱԳԺ೚፿றऱ๠෻ΖಝڍԵለދຍଡၸ੄ڇڼڂৰՕऱᐙ᥼Δڶᡏړ፿ଃᑓীऱ ګಝᒭךឩޡ৵Δ٦ലࠡດګᒭݙ ऱૠጩᇷᄭࡉড়؀೶ᑇऱᓳᖞΔࠉ֫ᖲؓش٥۩ࡉ Δ່৵٦ၞ ࠋՕ՛ऱ፿ଃᑓীΖ່נΔᓳᖞޣऱᏁ֪  ᠏ଃᑑڗ֮ 

ኙᚨऱ נ፿ଃᑓীխ࠷طଃᑑΔ٦ګ᠏ڗᙃᢝழႊല඿ᙃᢝऱ֮ ၦࡉٻᑓী ΔᙃᢝழലᙁԵऱ፿ଃ௽ᐛۯ੡໢ဲאګΔۭ൷ิ  ֺኙૠጩઌ֐಻ऱᖲ෷ଖΖ

᠏ଃᑑֱऄڗऱ֮شࢬࠌء፿ߢठٺ ।ԲΕ ऄ ፿ߢֱ ፿ΕՒۘࠡ፿Εឌ፿Ε܌፿Ε൸׃፿Εᆿရ׃፿Ε۫ఄܓࠏ؆ ᆠՕף࿇ଃ๵ঞ  ঎ᢅཎ፿Εݦᢊ፿Ε௠፿ټ࿇ଃኙᚨ। ֲ፿೗ ڗΕឌ፿ዧڗଃᑑኙᚨ। խ֮Εֲ፿ዧڗ֮ ף࿜ᖫዝጩऄެ ૎፿Εऄ፿Εᐚ፿Ε๛ᥞ፿Εկຽ፿Εᅗࠢ፿ ࠏ؆࿇ଃኙᚨ।

 ࡐࡳ࿇ଃ๵ঞऱ፿ڶ᠏ଃᑑऱֱऄΔรԫጟਢڗԿጟ֮شݺଚࠌٵࠉ፿ߢऱլ ෼נࠄ؆ࠐ፿ᄎڶ܀Δ֮ܓࡉᆠՕ֮׃ఄ۫ڕଃᑑΔࠏנኙᚨױܛߪءڗߢΔൕ֮ ԫଡࠏ؆࿇ଃऱኙᚨ।ࠐᇞެຍଡംᠲΖรԲጟਢݙ٤޲م৬ױழڼࠏ؆ऱणउΔ ऱڗଡ֮ޢژԫଡኙᚨ।ࠐᚏشΔ׽౨ڗխ֮ࢨֲ֮ऱዧڕ࿇ଃ๵ঞऱ፿ߢΔࠏڶ լജࣔᒔΔ܀ਬጟ࿓৫ऱ๵ঞࢤڶ࿇ଃΔ࿇ଃנխՕฃෲڗൕ֮ױ࿇ଃΖรԿጟਢ ࿜ᖫዝެشࠐΔݺଚঞࠌנ༓යࣔᒔऱ๵ঞ।ሒشא૎֮Δຍᣊ፿ߢऱ࿇ଃᣄڕࠏ ኙᚨ।ࠐᇞެΖشאױΔኙԫࠄࠏ؆ऱ࿇ଃՈچᑌٵ׌૞࿇ଃ๵ঞΔمጩऄࠐ৬ ᠏ଃᑑֱऄΖڗऱ֮شࢬ֭གऱ፿ߢࢬࠌڇ।Բਢ෼

64   ჼ༈ዝጩऄ֮֗ऄૻࠫ

֮ڶნᙃᢝΔࢨဲمၞ۩ᗑױ ऱჼ༈ዝጩऄਢΖشᙃᢝࠌ ຏחߢΔᙁԵऱຑᥛ፿ଃࡎۖח፿ଃᐸᇆΕ፿ଃࡎאᙃᢝΔחऄૻࠫऱຑᥛ፿ଃࡎ ΕࡉټΕ۶௠ನԳ܂ؚሽᇩ೯طψؚሽᇩ࿯۶௠ನ۰୮ωਢڕ๵ঞऱΔࠏڶൄਢ Ε ܂ၲඔ೯طΔψၲඔωਢګࢬิܿۯ۰୮ Δقቹ؄ࢬڕऱ֮ऄૻࠫחۖߢΔࠡ፿ଃࡎאΖګࢬิڤ࿓شᚨ ڇΖ؁ऄऱ፿ٽᝑψؚሽᇩ࿯۶௠ನωՈਢृشࠡխဠᒵՂऱဲნ੡ॺؘ૞ऱΔࠌ ᙃᢝऱመ࿓խΔ ၞ۩  հၴऱणኪ᠏ฝᄎ೶ ຑᥛ፿ଃᙃᢝऱჼ༈܅૾ױڼڕऄ࿨ዌऱणኪ᠏ฝΔ֮ٽऄ࿨ዌΔ׽ւ๺ฤ֮ڼە ழ༼೏ᙃᢝऱᄷᒔ෷ΖٵૠጩၦΔ܅ᓤᠧ৫Δ૾ ټԳ

ؚሽᇩ ۰୮Εֆ׹Ε֫ᖲ

ټڤ࿓شၲඔ ᚨ  

ټڴᐾ࣋ ዚ

ח໢ဲࡎהࠡ

ऱ֮ऄחቹ؄Ε ຑᥛ፿ଃࡎ  ڕࠄ፿ߢਢ৵ᆜ೯ဲऱΔࠏڶቹ؄ऱ֮ऄ࿨ዌΔشຟਢࠌء፿ߢठڶࠀլਢࢬ ψؚሽᇩ࿯۶௠ನ۰୮ω੡ࠏΔឌ֮ऱ࢚ऄʳא ፿ݧऱֲ֮Εឌ֮ࡉՒ֮ۘࠡΖ ੡ψ䞮䌲䠢㦮 㰧㦒⪲ 㩚䢪Ỏ₆ωΔኙᚨࠩխ࢚֮ऄਢψ۶௠ನΰ䞮䌲䠢㦮α۰୮ʳ قቹնࢬڕشΰ㰧㦒⪲αؚሽᇩΰ㩚䢪Ỏ₆αωΖኙຍᣊ৵ᆜ೯ဲऱ፿ߢΔݺଚࠌ ऱ֮ऄΖ ۰୮Εֆ׹Ε֫ᖲ

 ؚሽᇩټԳ

 ၲඔټڤ࿓شᚨ    ᐾ࣋ټڴዚ

ח໢ဲࡎהࠡ

ऱ֮ऄشࠌءቹնΕ৵ᆜ೯ဲ፿ߢठ   հၴऱणኪ᠏ฝழΔ ւ๺٣᠏ฝ۟ᙩଃᑓী ڇ

65 ऱ֮ऄΔࠡחᙩ ଃࡉࡑ݃ᑓীऱؚሽᇩࡎܶץࢨࡑ݃ᑓীΔቹք༴૪ ।ࡑ݃ᑓীΖᙩଃࡉࡑ݃ᑓীઃਢ  ଡणኪऱ Δࠌז ।ᙩଃᑓীΔז խ  ࠌڕԫࠄ᠇ဲࢨ࿑პೖቅլᝑᇩΔࠏܶץऱᙁԵ፿ଃխृشᨃࠌאױຍࠟଡᑓীش ऱ۰୮ऱሽᇩωΔࠡխऱψᓮᚥݺωΕהᎅψᓮᚥݺሽᇩ࿯Ⴕ۶௠ನאױृش ๯ אױழڶऱ ωΕ ࡉ ψ ऱ ሽᇩω࿛լ᥆࣍ᙃᢝဲნऱ፿ଃ ה ψ࿯ႵωΕψ ᙃᢝऱဲნၦ֟ऱڇڶऱᆖ᧭࿇෼Δመៀ᠇ဲऱய࣠׽شመៀൾΖլመݺଚኔᎾࠌ سΔ༉ৰ୲࣐࿇ۿࡉψ࿯Ⴕω࢚ದࠐऱᜢଃৰᣊټ࣠ਬଡԳڕΔړழଢᄎֺለ ऱൣݮΖܒᎄ    

 ۰୮Εֆ׹Ε֫ᖲټؚሽᇩ Գ  

ऱ֮ऄחᙩଃࡉࡑ݃ᑓীऱؚሽᇩࡎܶץቹքΕ  ੡ࡐࡳऱאױऱᒔ֊ᎅऄΔ૞ᇞެຍଡംᠲΔݺଚחழᄎݱಖ፿ଃਐڶृش ࠌ ψᐸሽᇩωࡉψؚ࿯ωΔګᎅאױψؚሽᇩωՈڕऱᎅऄΔࠏشൄהԵࠡףח፿ଃࡎ ᨃࠌאױԵᙃᢝऱဲნΔףᎅऄٵຍࠄլނψؚၲωࡉψച۩ωΔګᎅאױψၲඔω ᡹ࡳΖޓᜣࣔΕ।෼ޓΕش୲࣐ࠌޓᤚ൓ข঴ृش  ԲΕ

ݺଚؘႊᙇڼڂ੡֫ᖲΔઌኙૠጩၦΕಖᖋ᧯ֺለ՛Δ؀ऱؓش࣍  ׌૞ᚨط  ࣋ᜢᓳژΔ׽ۯଃᆏ੡࿇ଃ໢אխ֮ࡉᆕ፿ݺଚڇለ壄១ऱ  ݾ๬Ζشᖗࠌ ऱਢ ژΖݺଚऱᜢଃᇷற஄ۯ ੡໢ א፿ߢঞה੡ԫᜢऱଃᆏΔࠡ ڶۖٵࠩ  ࠩ Δࠉ፿ߢլ܅૾ױऱᜢଃΔຘመᚘᜍΔԫଡ፿ߢऱ  Օ՛ ࠐऱᜢଃ؈టՈֺለנګٽڼڂऄႊኙᜢଃ೚լ֟ऱ๠෻Δֱګٽࢬ஁ฆΖݺଚऱ፿ଃ ঴ᔆऱေᏝਢψᖲඳଃለૹΔլመᝫਢ౨ᦫ൓ᚩΖωګٽኙ  ፿ଃृشᣤૹΔԫ౳ࠌ

ۯ፿ଃ໢ ᇷற஄

Εׂ ፿Εଃ ᑑΕဲ ፿ଃګٽ ڗ֮؁ᖞ ᕴګٽ։࣫  ፿ଃ؁֮

࿇ଃဲࠢ ᣉ৳ᑓী  ဲࠢ

ຝ։ګิٺ ቹԮΕ 

66 ၞ۩ឰڗ֮؁։࣫๠෻Κኙᖞ؁֮۩ᙁԵ৵٣ၞڗ֮؁ΔᖞقቹԮࢬڕ  ऱਮዌ ࠄ፿ߢᝫᏁڶଡဲऱ ΔޢኙᚨऱଃᑑࡉנބழٵΔ؁ ৳ଡׂ፿ऱᣉޢم։࣫࿨࣠Δᣉ৳ᑓী৬؁૞٣ၞ۩ឰဲ๠෻Ζ௅ᖕ֮ ᇷۯ໢ګٽ፿ଃطԫଡׂ፿ऱ፿ଃΚګٽڻᕴ ঞԫګٽ೶ᑇΔ፿ଃ ᣉ৳ᑓী༼ࠎऱە໢ց፿ଃᇷறΔࠀ೶ګٽற஄࠷൓ׂ፿խࢬᏁऱ መ܂ຝ։ሎګଡิٺՀ༴૪  ऱא፿ଃΖګٽנᇷಛΔᓳᖞׂ፿ऱᣉ৳ᙁ ࿓Ζ

 ։࣫؁֮ 

ڍڶ౨ױଡဲޢࠢխဲڇ࣍طଡဲᑑု ΔޢᙁԵ৵Δݺଚႊ٣ലڗ֮؁ᖞ Δٽิ ᖲ෷່೏ऱ נބ   ࡉ  ش ጟ Δݺ ଚ ࠌ ऱנᑑုڼطᇷறխಝᒭۖ൓Δ٦ڗመ   ऱՕၦ֮قԳՠᑑط ਢࠃ٣ شՈਢࠌچᑌٵ༈ऱֱऄΔބ౨ऱׂ፿ᢰ੺ Δױ່נބ   ᆜΔۯ ᖲ෷່೏ऱ נބ   ࡉ   ܀ᇷறխಝᒭۖ൓Ζڗመ  ࡉ  ऱՕၦ֮قԳՠᑑط Ոਢࠃ٣ ᇷறΔຍᣊ፿ߢݺଚڗ  ऱ֮ ړࠢ౒׎ ΔՈ౒׎ᑑုڗࠄ፿ߢݺଚऱڶ ՗ս֜९ঞ๻ࡳׂ፿ଃᆏᑇऱ່؁Δૉ؁ᑑរฤᇆឰאΚ٣؁១໢ऱ๵ঞ೚ឰشࠌ ᢰ੺ ՂΔဲڇᆜؘႊᆵۯऱ؁ឰ܀Δ؁ൎ૰ឰܛՕଖΔ၌መ່Օଖ ឰڶ௑։ၲΔ޲़شᑛ੊፿ߓဲࡉဲհၴᄎڇࠟଡׂ፿խΖڇԫଡဲ։ၲ࣋ނլ౨ ᢰ੺Δ௠֮ԫ౳լࠌဲנބխ֮Ε௠֮༉ؘႊ٣೚ឰဲΔڕࠅ੊፿ߢ܀ऱംᠲΔဲ ፿ׂڇΔլመړऱ঴ᔆࠀլ؁੡ૹ૞Ζຍጟ១໢๵ঞឰޓ๠ڼڇᑑរฤᇆΔឰဲش ፿ଃᦫದࠐᝫլ۟࣍֜ડՌΖګٽၴऱᙩଃೖቅլ૞֜९ऱൣउՀΔ   ᣉ৳ᑓী

ڶ᎘ᜢ٥ܶץᜢᓳऱ፿ߢΔխ֮ڶᣉ৳ᑓীֱ૿Δխ֮ࡉᆕ፿ຟਢࠠڇ ᕴګٽጟᜢᓳऱ  ݮणΔຘመٺנԳՠ๻ૠش԰ጟᜢᓳΔݺଚڶնጟᜢᓳΔᆕ፿ঞ հ؆Δݺଚᨃᖞଡׂ፿ऱڼᜢᓳऱଃᆏΖೈהࠡנګٽױܛଃᆏऱ  ݮणΔ᧢ޏ ٽԵׂ፿ऱ  խΔิףᜢᓳऱଃᆏழലᇠᜢᓳऱ  ݮणڶ֏Δሖࠩ᧢܅೏ࠩط  شխ֮ࡉᆕ፿ݺଚࠌڇᖞଡׂ፿ऱ  ᧢֏Ζ׼ԫଡᣉ৳ಛஒਢଃᆏऱ९৫᧢֏Δנ ऱᦞૹΖٵᆜऱ᧢֏࿯ղլۯ፿ࡉဲփׂڇࡐࡳऱ๵ঞΔ௅ᖕଃᆏ

   شૹଃऱಛஒΔݺଚࠌڶ܀ᜢᓳΔڶ፿ߢ޲הࠡ ፿ृആՕપ  ࠩ ۯԫބᆜΖݺଚۯዝጩऄቃྒྷׂ፿խૹଃऱ ᆜΕۯ፿ᢰ੺Εဲᢰ੺Εଃᆏᢰ੺Εଃᑑᢰ੺ΕૹଃׂנقԳՠᑑش՛ழऱ፿றΔ ቃྒྷଃᆏૹଃऱمૉեऱ௽ᐛ೶ᑇΔ৬נଡဲऱ Δൕխ࠷ޢ֗אૹଃጟᣊΕ ᆜΕᇠଃᆏছ৵ۯփऱဲڇᆜΕۯ፿փऱׂڇਔΚᇠଃᆏץऱ௽ᐛ೶ᑇشΖൄ ৳ऱ፿ߢຍࠄ௽ᐛ೶ᑇኙᣉٵլڇऱဲऱ  ࿛ΔڇૉեଡଃᆏऱᣊܑΕᇠଃᆏࢬ ழݺଚൕׂ፿ऱګٽᇠ፿ߢऱ௽ᐛ೶ᑇΖٽᔞشጟ፿ߢᄎࠌޢڼڂΔٵऱᐙ᥼Ոլ  ࠉᖕຍࠄ௽ᐛቃྒྷୌଡଃᆏ طଡଃᆏऱ௽ᐛ೶ᑇΔޢଃᑑࡉ   խ࠷൓ ቃྒྷᖞ੄ׂ፿נಝᒭװऄូڃᒵࢤٽࠡૹଃጟᣊΔ಻֗אૹଃΕڶ ፿ऱ  ᧢֏Δׂנٽิױܛऱ  ᑓীΔ

67  ऱ፿ଃ شଡ፿ଃ໢ցऱ९৫ՂΔݺଚޢቃྒྷׂ፿խڇشݺଚՈല  ᚨ شΔൄ م௽ᐛ೶ᑇ৬נଡଃᑑᢰ੺Δൕխ࠷ޢנ֊ಝᒭ፿ற۞೯ڶᑓীലࢬ ᆜΕۯփऱဲڇᆜΕۯ፿փऱׂڇਔᇠଃᑑץΔۿऱ  ᣊشऱ೶ᑇࡉቃྒྷૹଃ ᇠଃᑑࢬ֗אᆜΕᇠଃᆏছ৵ૉեଡଃᑑ੡۶ΕᇠଃᑑऱૹଃጟᣊΕۯଃᆏփऱڇ ऱဲऱ  ࿛࿛Ζڇ

٣ಾڶ࣍ԳԺլߩΔ׽طቃྒྷૹଃऱ  Ꮑ૞ՕၦԳԺၞ۩ԳՠᑑုΔؾছ ૎֮ऱଃᑑΔګॺխΕᆕΕ૎Εឌऱ፿ߢঞਢലଃᑑኙᚨהΔࠡمኙ૎֮ࡉឌ֮৬ ऱׂ፿ ٵլشጟ፿ߢᄎࠌޢ܀ᆜࡉݮणΔۯ૎֮ऱ  ቃྒྷᇠ፿ߢऱૹଃش ױऱ፿ଃᦫದࠐᝫګٽᇠ፿ߢऱ۞աऱଃᑑ९৫ Ζຍጟֱऄڶ֏๵ঞΔՈ᧢ ᐚ֮੡ࠏΔᐚഏऱড়֪མ༼መ  ऱ א๯൷࠹Δլመᣉ৳ᦫದࠐࢡࢡऱΔא ᎅᐚ֮ωΖڇᦫದࠐψቝ؆ഏԳ

 ᕴګٽ ፿ଃ

ଡ፿ߢݺޢᕴਢഗ࣍ᒵࢤቃྒྷᒳᒘ   Ζګٽ፿ଃ ࢬᏁ૞ऱנ໊֊፿ऱՖࢤ፿ृၞ۩ᙕଃΔൕࢬᙕ፹ऱ፿ଃխئᇠ፿ߢ੡אۯԫބଚ ଡ ޢᆜΖݺଚኙۯ ऱ ۯଡ፿ଃ໢ޢנழ ެࡳ ٵ໢ցΔګٽ፿ଃ ழᣉ৳ᑓীګٽ࣍ᇷற஄խΖژ೚  ։࣫Δ٦ኙ  এᑇࡉ  ᚘᜍᚏ ګٽऱ९৫Δۯଡ፿ଃ໢ޢᖞଡׂ፿ऱ  ݮणࡉנᄎቃྒྷ ऱ  ᑇؾᓳګٽࢨ྇֟ף  ऱ९৫ᓳᖞ Δࡉຘመᏺ ᧢ޏຘመױᕴ ᖞ፿ଃऱ९৫Ζ  ፿ଃऱ঴ᔆࡉ፿ګٽ壄១ऱ  ֱऄΔլመݺଚऱᆖ᧭࿇෼Δش ឈྥݺଚᙇᖗࠌ ܡਔᙕଃृᜢଃऱ௽ࢤΕࢬᙕऱᜢଃਢץৰՕऱᣂএΔڶᡏΕԳՠ๠෻ऱ঴ᔆړଃᇷற าᛀ਷ᇷறࡉԳՠ֊ଃऱ࿨ג࣠౨ڕ൓ᄷᒔ࿛Ζ֊ܡᆜਢۯ ᅕชΕଃᑑᢰ੺ࡉ ڶ լࠩ  ऱՕش۾׽ګٽݺଚऱ૎፿፿ଃڕ঴ᔆլᙑऱ፿ଃΖࠏנګٽ࣠ΔՈਢ౨ ࣍֫ᖲ૎شऱׂ፿঴ᔆ༉ᝫլᙑΔݺଚലࠡᚨګփิא૎֮໢ဲࡉ  ଡဲګٽࠡ܀՛Δ ข঴ψ᝛პᙟߪࠢωΔ൓ࠩઌᅝլᙑऱנංڣתՀڣ ౨խΔ࣍ פࠢऱ૎֮࿇ଃڗዧ ԫ౳ᎁ੡բᆖሒࠩ൷२టԳ࿇ଃऱֽᄷΖृشေᏝΔึጤࠌ  ഏ፿ߢ፿ଃᙃᢝኔ᧭ڍԿΕ

ऱ᧭ᢞֱऄؾছսլജᣤګٽኙ፿ଃᙃᢝၞ۩ֺለઝᖂࢤऱኔ᧭Ζ፿ଃڶᓵ֮׽ء ௑ٽᦫृ׌ᨠᎁࡳᦫ൓ᚩ੡א፿ऱᦫृᇢᦫΔئᇠ፿ߢ੡אۯ༓ބΔ᧭ᢞֱऄ׌૞ਢ᠃ ءڇ༓ଡ፿ߢྤऄሒࠩຍଡᑑᄷΔาᆏঞլڶսءऱᑑᄷΔؾছ  ࢬ֭གऱ፿ߢठ ᓵ֮խಘᓵΖ

ئᇠ፿ߢ੡אႃگࠟጟΔรԫጟਢᑓᚵኔ᧭Δݺଚګ፿ଃᙃᢝऱኔ᧭։ ຍࠄྒྷᇢ፿றאᇢृऱᜢଃΔݺଚྒྷܶץ੡ྒྷᇢ፿றΔಝᒭ፿றխլ܂፿ऱ፿ृऱᜢଃ Δڗټၞݾ๬ऱࠉᖕΖྒྷᇢ፿ற੡Գऱޏၲ࿇መ࿓խ೚੡ڇᢞݺଚߓอऱᙃᢝ෷Δࠀ᧭ Δྒྷתٺႃ  ࠩ  ԳऱᜢଃΔߊՖگጟ፿ߢ۟֟ޢ Δټࡩ ࡉܶץ ขऱ سഏփݛሒഏᎾሽ՗ֆ׹ࢬאΔڻ ആ ټଡྒྷᇢԳޢނᇢृႊ

68 ᙩऱᙄֆ৛ᛩቼխᙕ፹Ζݺଚኔ᧭ऱᙃᢝဲნၦ੡  ଡԳڜڇ ᙕଃΔ ۩Ե  ޳߫ᕳଃΔၞףᇢ፿றྒྷނΔݺଚشၲ߫ழࠌڇ࣍֫ᖲᜢ൳׌૞طΔټ ࠄڶીאຄழ࿓ۖ᎔ՠΔנড়֪ٽழଢݺଚ੡Ա಻ڶ ੡  ࠩ  ऱᕳଃኔ᧭Ζ  ጟኔ᧭೚൓ለݙᖞऱ፿ߢΔᙃᢝ෷࿨࣠٨ נݺଚᙇᖗ٨ڼڇ፿ߢऱኔ᧭ࠀլݙᖞΔ ࣍।ԿΖ

፿ߢ࣍  ޳߫ᕳଃՀᑓᚵኔ᧭ऱᄷᒔ෷ٺ।ԿΕ ᙩ    ڜ    ፿ߢ ᨜Ցଃխ֮     ؀ ՕຬՑଃխ֮      ᆕ፿      ભഏՑଃ૎፿      ૎ഏՑଃ૎፿      ᐚ፿      ऄ፿      ᆠՕܓ፿      ፿     ׃ఄ۫ ፿     ׃Ցଃᆿရ֣۫ ๛ᥞ፿      ፿     ֲ ঎፿           ݁ؓ

 ழݺଚऱᙃᢝு֨ٵᒔ෷ΔإՂऱאሒࠩ ױՂຟءଡ፿ߢഗٺᙩᛩቼՀΔڜڇ Ζᙟထᕳ׳ᙃᢝய࣠ ؐ܅փऱᕳଃ׽ᄎ࿑პ૾א ઌᅝऱ᡹೜ࢤΔ ੡  ڶՈ ؐ ڇ ੡  ழᙃᢝ෷ፂ਍ ڇ෼ለ೏ऱՀ૾᝟ႨΔܧൎΔᙃᢝ෷ᙟհףଃ࿓৫ ለՕऱ஁ฆΔݺଚංנ෼ܧଡ፿ߢՈٺழڼΔ  ழؓ݁ᙃᢝ෷ঞՀ૾ࠩપ Δ׳ ೏࣍  ଡଃᆏαΖڍՕټ፿ߢԳה੡  ଡଃᆏΔࠡټᣂΰխ֮Գڶऱ९৫ټ౨ࡉྒྷᇢԳױྒྷ  ئᇠ፿ߢ੡אބຄছΔݺଚᄎנءጟ፿ߢठޢ ᇢ Δྒྷچ׼ԫጟਢኔ  ބଡ፿ߢݺଚޢणउΔࠀಖᙕᄷᒔ෷ΖشΔݺଚൕலᨠኘࠌش፿ऱྒྷᇢृࠐኔᎾࠌ Բ೏ᄅࢋٌੌሐலᄅࢋקΖྒྷᇢऱᛩቼ੡ᙄֆ৛փΕ್ሁᢰתٺᇢृΔߊՖྒྷټ ࠩ  ೏ຒֆሁՂ࿗֪ᣂຨऱ޳߫փΔ޳߫ؓ݁ழຒ੡  ࠩ  ֆڇၞ۩֗אխᘋሁՂΕؑ ਔ ΕΕץขऱ   Ζྒྷᇢ֫ᖲسڣ ߺΔྒྷᇢ߫ᔖਢ  ቃڶΔ֫ᖲߓอփՕપټ ଡԳ ڶขऱ Δ֫ᖲऱ๻ࡳ੡ሽᇩ᡻խسࢬ Δ؁ऱຑᥛ፿ଃྒྷᇢ፿؁ ԫጟྒྷᇢᛩቼՀആ ڇଡྒྷᇢृޢΔڤ࿓ش๻  ࠩ  ଡᚨ Κח ጟ፿ଃࡎ ਔՀ٨ץ

69 ټ ؚሽᇩ࿯Գ ۰୮Εֆ׹Ε֫ᖲټ ؚሽᇩ࿯Գ ټ ਷ᇬԳ ڤ࿓ش ၲඔᚨ

ᇢ࿨࣠٨࣍।؄Ζྒྷ

ᇢऱᄷᒔ෷ྒྷچ፿ߢኔٺ।؄Ε ໱ࢬ ᙄֆ৛ ್ሁᢰ ۩ၞ޳߫փ ፿ߢ ᨜Ցଃխ֮   ؀ ՕຬՑଃխ֮    ᆕ፿    ભഏՑଃ૎፿    ૎ഏՑଃ૎፿    ᐚ፿    ऄ፿    ᆠՕܓ፿    ፿   ׃ఄ۫ ፿   ׃Ցଃᆿရ֣۫ ๛ᥞ፿    ፿   ֲ ঎፿       ݁ؓ  ᇢ൓ࠩऱ࿨࣠ࡉᑓᚵྒྷᇢઌ२ΔྒྷچΔኔٵ፿ऄฃ੡լח ឈྥࡉᑓᚵྒྷᇢऱਐ ᒔ෷Δ޳߫փऱᙃᢝ෷ ঞտ࣍ᑓᚵྒྷᇢ  ੡  ࡉإऱ ڶᙄֆ৛ᛩቼՀ ԫࠄΔપ੡ Ζ܅ᙃᢝய࣠Ոฃڼڂ׊ለլ᡹ࡳΔڍ հၴΖ್ሁᢰऱᕳଃጟᣊለ ׽૞ᙕଃଃၦ൳ࠫ܀Δٵᖲᙕଃ᙮෷᥼ᚨࡉຏሐயᚨࢨ๺լጐઌཱི֫ٺऱشᇢࠌྒྷچኔ ऱᙃᢝ࿨࣠Ζۿ൓ࠩઌױ൓ࡵ׊ྤ௽௘ऱຏሐᠧಛΔຟ

 ߜ  ऱუऄޏ؄Ε࿨ᓵፖ

ߓอΔ ਢഏփഄԫΕՈਢ٤෺֟ᑇ౨֭ח፿ଃࡎڤ༊Եشᓵ֮տฯ  ೸ء ऱၲ࿇ءഏ፿ߢठڍ ᓵ֮௽ܑ༴૪Ա ءഏ፿ଃᙃᢝፖ  ऱ֫ᖲᜢ൳ຌ᧯Δڍག ᖲᜢ൳ຌ᧯ۖߢΔ ऱᙃᢝ෷բᆖ౨֫אᆖ᧭Δࡉԫࠄ፿ߢऱ፿ଃᙃᢝኔ᧭࿨࣠Ζ ፿ଃشچ٤ڜޓၲ߫ऱழଢΔ౨ڇृشࢬ൷࠹Δ௽ܑਢᨃࠌृشᨃ٤෺Օຝ։፿ߢऱࠌ ٤ΖڜࡎತขسԳऱהᖲΔঅᎽ۞աࡉ֫܂ᖙ

70 ڍข঴ՂؑၲࡨᔭഇΔᆖመॣڣ խၲࡨ  ข঴ऱઔ࿇Δ࣍ ڣ ݺଚ࣍  ࢬ൷࠹Δࠀᛧ൓ृشߜݾ๬Δ፿ଃᙃᢝᄷᒔ෷ࡉ  ࿇ଃ঴ᔆթດዬ੡ࠌޏچլឰڣ Δ ᝫਢڼڕΔփ৬࣍ࠡ֫ᖲข঴խ۩ᔭ٤෺ΖឈྥشՕ֫ᖲ፹ທֆ׹ආٺഏփ؆ Հݺא٤෺ؑ໱ᤁञսྥԼ։ᖿ௺ऱվ֚Δݺଚ࿪ኙլ౨৹ኬΔڇၞ़ၴΔޏৰՕऱڶ Κֱچ࿳ऱޏאױ ԫࠄ נ༼ଚ

ބ੡֮֏ऱᣂএΔৰᣄڂ֮܄ॳࢮڕழᔡሖԫࠄܺᣄΔࠏءਬࠄ፿ߢठ܂፹ڇ ݺଚ ൎຍࠄףࠐലآႃऱᇷறՈࣔ᧩լߩΖݺଚگࠄ፿ߢݺଚڶࠩՖࢤऱ፿ृࠐᙕଃΔ ႃΔ༼֒ຍࠄ፿ߢऱ፿ଃᙃᢝ෷Ζگ፿றऱ

௣၄ګ౨ທױࠄ፿ߢऱኔ᧭೚൓լജᣤ᠃Δڶીאຄۖ᎔ՠΔנড়֪ٽழ੡Ա಻ڶ  ࠠ፿ଃઌᣂઔߒહڍޓᗨᄕࢵႥإऱଅᙠΔ൅࿯ড়֪ܺឫΖݺଚړࣄৼᙃᢝ෷լृ ݙ࿳ऱኔ᧭ੌ࿓Ζޓ᠃ᣤޓمଡ፿ߢ৬ޢནऱԳթΔᇞެؾছԳԺլߩऱംᠲΔࠀ੡

፿ߢऱಝڍޓႃگΔԫਢٻၞऱֱޏԿଡڶၞ़ၴΔؾছޏৰՕऱڶ  ऱ঴ᔆᝫ ࡉ  ऱֱۯऱ፿ଃ໢ڍޓشᣉ৳ᑓীΔԲਢࠌمଡ፿ߢ৬ޢᒭᇷறΔ੡ ᣤ᠃ऱ᧭ᢞֱऄم፿ଃ؈టऱ࿓৫ΔԿਢ৬ګٽ܅ऄΔ྇֟ኙ፿ଃᇷறऱᓳᖞΔ૾ ࡉੌ࿓Δᒔঅข঴ऱ঴ᔆΖ

੡  ᦫլאᎄۖחᙑᎄऱ፿ଃਐנଶၲࡨኙข঴լᑵ൜Δആृشࠌڍ ݺଚ࿇෼ৰ ୲࣐ޓΖ૞ᨃข঴شଚլᣋ٦ࠌהტᨃމऱ஫سᙃᢝ؈ඓ৵ขڻଚᎅऱᇩΔ༓הᚩ חլݙᖞऱ፿ଃࡎृشΔݺଚႊ༼֒  ፿ଃᙃᢝᑓิऱ୲ᙑ౨ԺΔ੷۟ൕࠌشࠌ ऱംᠲΖ܂լᄎᖙृش౨ऱრቹΔ٦ຘመኙᇩऱመ࿓ᇞެࠌױृشխΔԱᇞࠌ

Δᨃ ق༼ऴ൷֊ឰև९ऱߓอ፿ଃृشᨃᑵ൜  ऱࠌױ౨פڼ Δ ய෷Ζڶޓ܂ऱᖙ  ᣸֮ە೶




, , { g93470185, g94470144, berlin}@ntnu.edu.tw

(Automatic Speech Recognition, ASR)

(Large Vocabulary Continuous Speech Recognition, LVCSR) (Maximum Likelihood, ML)(Baum-Welch algorithm) (Likelihood) (Discriminative Training) (Minimum Phone Error Training, MPE)


Let O_z denote the observed speech (the feature vector sequence) of the z-th utterance and W a candidate word sequence. Decoding can be cast as choosing the W that minimizes the risk R(W|O_z) [1]:

R(W|O_z) = Σ_{W'} l(W, W') P(W'|O_z)    (1)

where P(W'|O_z) is the posterior probability of a competing word sequence W' given O_z, and l(W, W') is the loss function measuring the loss incurred when W is chosen but W' is correct. R(W|O_z) is thus the expected loss, also called the Bayes risk or conditional risk, of choosing W. The recognition output Ŵ is

Ŵ = argmin_W R(W|O_z) = argmin_W Σ_{W'} l(W, W') P(W'|O_z)    (2)

which is the Bayesian decision theorem; with a 0/1 loss, (2) reduces to maximum a posteriori decoding (MAP) [2]. Decoding methods built on this view include ROVER (Recognizer Output Voting Error Reduction) [3], minimum Bayes risk decoding (MBR) [4], minimum time frame error search [5], and word error minimization [6].

Discriminative training extends this decision-risk view from decoding to parameter estimation by minimizing the overall risk R_all [1]:

R_all = ∫_O R(W|O) P(O) dO    (3)

where P(O) is the prior probability of the observation O. Over a finite training set of Z utterances, R_all becomes the expected conditional risk:

R_all = Σ_{z=1}^{Z} R(W|O_z) P(O_z) = Σ_{z=1}^{Z} Σ_W Σ_{W'} l(W, W') P(W'|O_z) P(O_z)    (4)

Making the dependence on the acoustic model parameters Θ explicit, the posterior P(W'|O_z) is written P(W'|O_z; Θ):

R_all = Σ_{z=1}^{Z} Σ_W Σ_{W'} l(W, W') P(W'|O_z; Θ) P(O_z)    (5)

Assuming the priors P(O_z) are uniform and fixing W to the reference transcription W_z of utterance z gives

R_all = Σ_{z=1}^{Z} Σ_{W'} l(W_z, W') P(W'|O_z; Θ)    (6)

so the training criterion becomes

Θ̂ = argmin_Θ Σ_{z=1}^{Z} Σ_{W'} l(W_z, W') P(W'|O_z; Θ)    (7)

Different choices of the generalized loss function l(·,·) within this risk-minimization framework lead to maximum mutual information estimation (MMIE) [7], overall risk criterion estimation (ORCE) [8], minimum Bayes risk discriminative training (MBRDT) [9], and minimum phone error training (MPE) [10].

Minimum phone error (MPE) training instantiates the loss l(W_z, W_i) as a raw phone accuracy A(W_z, W_i), the approximate phone accuracy of hypothesis W_i against the reference transcription W_z. Given the training observations O_z and the candidate hypotheses W_i ∈ W_z = {W_1, W_2, W_3, ...}, the MPE criterion maximizes the expected phone accuracy:

F_MPE(λ) = Σ_{z=1}^{Z} Σ_{W_i} P(W_i|O_z) A(W_z, W_i)
         = Σ_{z=1}^{Z} Σ_{W_i} [ p(O_z|W_i) P(W_i) / p(O_z) ] A(W_z, W_i)    (8)

Since p(O_z) cannot be computed over all possible word sequences, it is approximated by summing over the word lattice W_{z,lattice} of utterance z [11]:

F_MPE(λ) = Σ_{z=1}^{Z} Σ_{W_i ∈ W_{z,lattice}} [ p(O_z|W_i) P(W_i) / Σ_{W_k ∈ W_{z,lattice}} p(O_z|W_k) P(W_k) ] A(W_z, W_i)    (9)

where W_i and W_k range over the word sequences in the lattice of O_z and W_z is the reference transcription.

F_MPE(λ) cannot be maximized directly; following Povey, a weak-sense auxiliary function H_MPE(λ, λ') is optimized instead [12]:

H_MPE(λ, λ') = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} [ ∂F_MPE(λ)/∂ log p_λ(O_z|q) ]_{λ=λ'} log p_λ(O_z|q)    (10)

The partial derivative with respect to the log likelihood of a phone arc q can be expressed through the arc posterior γ_q^z and the accuracies c_q^z and c_avg^z:

∂F_MPE(λ)/∂ log p_λ(O_z|q) = γ_q^{z,MPE} = γ_q^z ( c_q^z − c_avg^z )    (11)

γ_q^z = Σ_{W_i ∈ W_{z,lattice}: q ∈ W_i} p(O_z|W_i) P(W_i) / Σ_{W_k ∈ W_{z,lattice}} p(O_z|W_k) P(W_k)    (12)

is the posterior probability of arc q, i.e. the probability mass of the lattice paths passing through q;

c_q^z = Σ_{W_i ∈ W_{z,lattice}: q ∈ W_i} p(O_z|W_i) P(W_i) A(W_z, W_i) / Σ_{W_i ∈ W_{z,lattice}: q ∈ W_i} p(O_z|W_i) P(W_i)    (13)

is the average accuracy of the paths passing through q; and

c_avg^z = Σ_{W_i ∈ W_{z,lattice}} p(O_z|W_i) P(W_i) A(W_z, W_i) / Σ_{W_k ∈ W_{z,lattice}} p(O_z|W_k) P(W_k)    (14)

is the average accuracy over all lattice paths. γ_q^{z,MPE} is therefore positive for arcs lying on paths that are more accurate than average and negative otherwise [12].

Since log p_λ(O_z|q) is itself a hidden-variable likelihood, it is further replaced by the standard ML auxiliary function Q_ML(λ, λ'; z, q), giving the auxiliary function H'_MPE(λ, λ'):

H'_MPE(λ, λ') = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} [ ∂F_MPE(λ)/∂ log p_λ(O_z|q) ]_{λ=λ'} Q_ML(λ, λ'; z, q)    (15)

where the derivative is γ_q^{z,MPE} as in (11) and

Q_ML(λ, λ'; z, q) = Σ_{t=s_q}^{e_q} Σ_m γ_qm^z(t) log N(o_t^z; μ_qm, Σ_qm)    (16)

Here o_t^z is the observation of utterance O_z at time t, N(·; μ_qm, Σ_qm) is the m-th Gaussian of the state of arc q, s_q and e_q are the start and end frames of q, and γ_qm^z(t) is the occupancy probability of Gaussian m at time t. Substituting (16) into (15):

H'_MPE(λ, λ') = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} γ_q^{z,MPE} Σ_{t=s_q}^{e_q} Σ_m γ_qm^z(t) log N(o_t^z; μ_qm, Σ_qm)    (17)

Because H'_MPE(λ, λ') is still only a weak-sense auxiliary function, a smoothing term H_SM(λ, λ') is added to stabilize the update [12]:

H''_MPE(λ, λ') = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} γ_q^{z,MPE} Σ_{t=s_q}^{e_q} Σ_m γ_qm^z(t) log N(o_t^z; μ_qm, Σ_qm) + H_SM(λ, λ')    (18)

H_SM(λ, λ') = −Σ_{q,m} (D_qm/2) [ log|Σ_qm| + tr( Σ_qm^{-1} Σ'_qm ) + (μ_qm − μ'_qm)^T Σ_qm^{-1} (μ_qm − μ'_qm) ]    (19)

where D_qm is a per-Gaussian smoothing constant and μ'_qm, Σ'_qm are the current (old) parameters. H_SM attains its maximum at μ_qm = μ'_qm, Σ_qm = Σ'_qm, so it does not shift the optimum of the auxiliary function.

Maximizing H''_MPE(λ, λ') with respect to μ_qm and Σ_qm yields the Extended Baum-Welch (EBW) update formulas [12]:

μ̂_qmd = [ θ_qmd^num(O) − θ_qmd^den(O) + D_qm μ'_qmd ] / [ γ_qm^num − γ_qm^den + D_qm ]

σ̂²_qmd = [ θ_qmd^num(O²) − θ_qmd^den(O²) + D_qm (σ'²_qmd + μ'²_qmd) ] / [ γ_qm^num − γ_qm^den + D_qm ] − μ̂²_qmd    (20)

where d indexes the feature dimensions, the superscript num (numerator) collects statistics from arcs with positive γ_q^{z,MPE}, and den (denominator) collects statistics from arcs with negative γ_q^{z,MPE}:
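The per-dimension EBW update in (20) can be sketched directly. The function below is an illustrative sketch only (the variable names and the scalar, per-dimension simplification are ours, not the paper's):

```python
def ebw_update(theta_num, theta_num_sq, gamma_num,
               theta_den, theta_den_sq, gamma_den,
               mu_old, var_old, d_const):
    """Extended Baum-Welch re-estimation of one dimension of a
    diagonal-Gaussian mean and variance, following Eq. (20).

    theta_*     : first-order statistics  sum_t gamma(t) * o_t
    theta_*_sq  : second-order statistics sum_t gamma(t) * o_t**2
    gamma_*     : occupancy counts (num / den)
    d_const     : per-Gaussian smoothing constant D_qm
    """
    denom = gamma_num - gamma_den + d_const
    mu_new = (theta_num - theta_den + d_const * mu_old) / denom
    var_new = ((theta_num_sq - theta_den_sq
                + d_const * (var_old + mu_old ** 2)) / denom) - mu_new ** 2
    return mu_new, var_new
```

Note that as d_const grows the update shrinks toward the old parameters, which is exactly the stabilizing role of the smoothing term.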

γ_qm^num = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, γ_q^{z,MPE})    (21)

θ_qmd^num(O) = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, γ_q^{z,MPE}) o_t^z(d)    (22)

θ_qmd^num(O²) = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, γ_q^{z,MPE}) o_t^z(d)²    (23)

γ_qm^den = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, −γ_q^{z,MPE})    (24)

θ_qmd^den(O) = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, −γ_q^{z,MPE}) o_t^z(d)    (25)

θ_qmd^den(O²) = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, −γ_q^{z,MPE}) o_t^z(d)²    (26)

The constant D_qm in (20) controls the stability and speed of the update; it is typically set large enough that all updated variances remain positive [12]. In addition, I-smoothing interpolates the numerator statistics toward the ML statistics to improve generalization [12]:

θ_qmd^num'(O) = θ_qmd^num(O) + (τ / γ_qm^ML) θ_qmd^ML(O)

θ_qmd^num'(O²) = θ_qmd^num(O²) + (τ / γ_qm^ML) θ_qmd^ML(O²)    (27)

γ_qm^num' = γ_qm^num + τ

where γ_qm^ML, θ_qmd^ML(O) and θ_qmd^ML(O²) are the ordinary ML occupancy and first-/second-order statistics and τ is the I-smoothing constant.

The statistics above are accumulated over the phone arcs q of each lattice W_{z,lattice}. The raw phone accuracy A(q) of MPE scores each hypothesis phone arc q against the reference phones u:

A(q) = max_u { −1 + 2 e(q, u)   if u and q are the same phone
             { −1 + e(q, u)     if u and q are different phones    (28)

where e(q, u) is the proportion of the duration of reference phone u that is overlapped by arc q. A(q) thus lies between −1 and +1: it equals 1 when q exactly covers a reference phone of the same identity and approaches −1 when q overlaps nothing. The quantities γ_q^{z,MPE} of (11) are computed efficiently with the forward-backward algorithm over the lattice [12].
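The scoring in (28) can be sketched as follows; the `(label, start, end)` arc representation is an assumption of this sketch, not the paper's data structure:

```python
def phone_arc_accuracy(arc, ref_phones):
    """Approximate raw phone accuracy A(q) of one hypothesis phone
    arc q against the reference segmentation, following Eq. (28).

    arc        : (label, start, end) of the hypothesis phone arc
    ref_phones : list of (label, start, end) reference phones
    e(q, u) is the fraction of the reference phone u's duration
    that is overlapped by q.
    """
    label, s, e = arc
    best = -1.0
    for u_label, u_s, u_e in ref_phones:
        overlap = max(0.0, min(e, u_e) - max(s, u_s))
        dur = u_e - u_s
        e_qu = overlap / dur if dur > 0 else 0.0
        # same-phone overlap counts double, per Eq. (28)
        score = -1 + 2 * e_qu if u_label == label else -1 + e_qu
        best = max(best, score)
    return best
```

An arc exactly covering a same-identity reference phone scores 1; a same-span arc of a different identity scores 0.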

[Figure: an example of raw phone accuracy computation over reference phones a, b, c. A hypothesis arc b spanning three frames scores −1 + 3/3 = 0 against the different-identity neighbors and −1 + 2·(3/3) = 1 against the same-identity reference b, so A(b) = 1.]

The raw phone accuracy of MPE has two notable weaknesses:

1. It is only an approximation of the true phone accuracy: deletion errors, insertion errors and substitution errors are not accounted for exactly.
2. It is computed phone-by-phone via (28), assigning each phone arc a score between −1 and +1 independently of the other arcs, so several overlapping hypothesis arcs can each score close to 1 against the same reference phone.

To address this, we previously proposed the time frame phone accuracy function (TFA) [13], which scores a phone arc frame by frame:

TimeFrameAccuracy(q) = [ Σ_{t=s_q}^{e_q} δ(q, u_t) ] / ( e_q − s_q + 1 )    (29)

[Figure: a worked example over time frames 0-30 comparing the raw phone accuracy of MPE with the time frame accuracy (MTFA) of hypothesis arcs against reference phones a, b, c.]

δ(q, u_t) = { 1    if q = u_t
           { −ρ   if q ≠ u_t,   0 < ρ ≤ 1    (30)

where s_q and e_q are the start and end frames of arc q, u_t is the reference phone label of frame t, and ρ is the deletion penalty weight: a frame scores +1 on a hit and −ρ on a mismatch. Hits, substitutions, insertions and deletions are thereby all reflected at the frame level, in the spirit of the edit (Levenshtein) distance used in [5], and unlike (28) the score of an arc is tied directly to the fraction of its frames that match the reference.
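Eqs. (29)-(30) can be sketched directly; frame-indexed reference labels are an assumed input format:

```python
def time_frame_accuracy(arc_label, start, end, ref_frame_labels, rho=0.5):
    """Time frame phone accuracy of one phone arc, Eqs. (29)-(30):
    each frame in [start, end] scores +1 when the reference phone
    label of that frame matches the arc label, and -rho otherwise
    (0 < rho <= 1, the deletion penalty weight); the sum is
    normalized by the arc duration in frames."""
    total = 0.0
    for t in range(start, end + 1):
        total += 1.0 if ref_frame_labels[t] == arc_label else -rho
    return total / (end - start + 1)
```

A perfectly aligned arc scores 1.0; with rho = 1 an arc covering its phone and an equally long stretch of another phone scores 0.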

The time frame accuracy of a word sequence (lattice path) W_i is the sum over its phone arcs:

TimeFrameAcc(W_i) = Σ_{q ∈ W_i} TimeFrameAccuracy(q)    (31)

Substituting (31) for the raw accuracy based on (28) gives the maximum time frame phone accuracy (MTFA) criterion:

F_MTFA(λ) = Σ_{z=1}^{Z} Σ_{W_i ∈ W_{z,lattice}} P(W_i|O_z) TimeFrameAcc(W_i)
          = Σ_{z=1}^{Z} Σ_{W_i ∈ W_{z,lattice}} [ p(O_z|W_i) P(W_i) / p(O_z) ] TimeFrameAcc(W_i)    (32)

To keep the per-arc score within (−1, +1), as in (28), the accumulated frame score of (29) can instead be passed through a sigmoid function, giving the sigmoid time frame phone accuracy (STFA):

SigTimeFrameAccuracy(q) = 2 / ( 1 + exp( −(α · net_q + β) ) ) − 1    (33)

net_q = Σ_{t=s_q}^{e_q} δ(q, u_t)    (34)

where δ(·) is as defined in (30), α controls the slope of the sigmoid and β its shift; (33) maps the accumulated frame score of an arc into the interval (−1, +1).

The sigmoid time frame accuracy of a path W_i is again the sum over its phone arcs:

SigTimeFrameAcc(W_i) = Σ_{q ∈ W_i} SigTimeFrameAccuracy(q)    (35)

and substituting (35) for the accuracy based on (28) gives the maximum sigmoid time frame phone accuracy (MSTFA) criterion:

F_MSTFA(λ) = Σ_{z=1}^{Z} Σ_{W_i ∈ W_{z,lattice}} P(W_i|O_z) SigTimeFrameAcc(W_i)
           = Σ_{z=1}^{Z} Σ_{W_i ∈ W_{z,lattice}} [ p(O_z|W_i) P(W_i) / p(O_z) ] SigTimeFrameAcc(W_i)    (36)
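The arc-level score of (33)-(34) can be sketched as follows; the sign convention of α and β in the exponent is our reading of the garbled Eq. (33):

```python
import math

def sigmoid_time_frame_accuracy(arc_label, start, end, ref_frame_labels,
                                rho=0.5, alpha=0.5, beta=0.0):
    """Sigmoid time frame phone accuracy, Eqs. (33)-(34): the raw
    frame score net_q = sum_t delta(q, u_t) is squashed into the
    interval (-1, +1) by 2 / (1 + exp(-(alpha*net + beta))) - 1,
    where alpha is the sigmoid slope and beta its shift."""
    net = sum(1.0 if ref_frame_labels[t] == arc_label else -rho
              for t in range(start, end + 1))
    return 2.0 / (1.0 + math.exp(-(alpha * net + beta))) - 1.0
```

With beta = 0, a zero net score maps to 0, and large positive or negative net scores saturate near +1 and −1 respectively.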

A related refinement of MPE is minimum exact word error training [14]. Another line of work draws on large margin classifiers [15] from machine learning: for a classification task, enlarging the margin between classes improves generalization ability, and when the training and test conditions are separable only with a mismatch, decision boundaries with a larger margin (measured in the likelihood domain) are more robust [16]. Large margin estimation (LME) [17] enlarges the separation margin of likelihoods around the decision boundary, updating only on the training samples near the boundary (a support vector set chosen by a threshold), while soft margin estimation (SME) [18] relaxes this with a soft margin and frame-level label matching. Inspired by this sample-selection view, this paper instead measures the confusability of each frame in the posterior domain by its entropy and performs data selection on the training frames accordingly.

For each frame t of utterance O_z, the Gaussian occupancy probabilities γ_qm^z(t) form a posterior distribution, so the confusability of a frame can be measured in the posterior domain by its entropy. The entropy ranges from 0 to log₂N for N Gaussians; dividing by log₂N gives a normalized entropy between 0 and 1:

E_z(t) = (1 / log₂N) Σ_{q=1}^{Q} Σ_m γ_qm^z(t) log₂( 1 / γ_qm^z(t) )    (37)

where E_z(t) is the normalized entropy of frame t of utterance z, γ_qm^z(t) is the occupancy probability of Gaussian m of arc q at frame t, Q is the number of arcs covering frame t, and N is the total number of Gaussians over which the posteriors at frame t are computed.
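The normalized entropy of (37), computed over a flat list of occupancy posteriors that sum to one, can be sketched as:

```python
import math

def normalized_frame_entropy(gammas):
    """Normalized entropy of the frame-level occupancy posteriors,
    Eq. (37): E(t) = (1/log2 N) * sum_i g_i * log2(1/g_i), where the
    g_i sum to one over the N Gaussians active at the frame. The
    result lies in [0, 1]: 0 for a one-hot posterior, 1 for a
    uniform one."""
    n = len(gammas)
    ent = sum(g * math.log2(1.0 / g) for g in gammas if g > 0.0)
    return ent / math.log2(n)
```

Frames with entropy near 1 are the confusable ones targeted by the selection schemes below.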

A training sample can be defined at several granularities: a whole sentence (utterance), a word arc, a phone arc, or a single frame. Since the statistics of (21)-(26) are accumulated frame by frame, and the parameters of a CDHMM are ultimately estimated from frame-level statistics, we perform frame selection: only frames whose normalized entropy indicates high confusability contribute to the MPE statistics. The numerator statistics, for example, become:

γ_qm^num = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, γ_q^{z,MPE}) I( E_z(t) > ρ )

θ_qmd^num(O) = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, γ_q^{z,MPE}) o_t^z(d) I( E_z(t) > ρ )    (38)

θ_qmd^num(O²) = Σ_{z=1}^{Z} Σ_{q ∈ W_{z,lattice}} Σ_{t=s_q}^{e_q} γ_qm^z(t) max(0, γ_q^{z,MPE}) o_t^z(d)² I( E_z(t) > ρ )

where ρ is a threshold between 0 and 1 and I( E_z(t) > ρ ) is the indicator function:

I( E_z(t) > ρ ) = { 1   if E_z(t) > ρ
                 { 0   if E_z(t) ≤ ρ    (39)

Since (39) makes a binary 0/1 decision, we call this hard selection. Alternatively, every frame can be kept but re-weighted so that confusable frames are emphasized and unambiguous frames are deemphasized; we call this soft selection:

γ̃_qm^z(t) = γ_qm^z(t) [ 1 + ω · E_z(t) ]    (40)

where ω is the weight placed on the entropy term.
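The two selection rules of (39)-(40), applied to a single occupancy statistic, can be sketched as:

```python
def select_frame_hard(gamma, entropy, threshold):
    """Hard frame selection, Eq. (39): keep the occupancy statistic
    only when the frame's normalized entropy exceeds the threshold;
    otherwise the frame contributes nothing."""
    return gamma if entropy > threshold else 0.0

def select_frame_soft(gamma, entropy, omega):
    """Soft frame selection, Eq. (40): re-weight the occupancy
    statistic by (1 + omega * entropy), so confusable (high-entropy)
    frames are emphasized rather than kept or dropped outright."""
    return gamma * (1.0 + omega * entropy)
```

Hard selection discards unambiguous frames entirely, while soft selection keeps them with reduced relative weight; the experiments below compare both, and their combination.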

The recognition system follows our earlier Mandarin broadcast news setup [19], using a tree-copy search decoder and word graph rescoring [11].

The acoustic features are processed with heteroscedastic linear discriminant analysis (HLDA) [20] and the maximum likelihood linear transform (MLLT) [21], so that diagonal covariance matrices can be used, followed by cepstral normalization (CN).

The acoustic models comprise 151 INITIAL-FINAL units (HMMs of 3 to 6 states, with up to 128 Gaussians per state), and the lexicon contains 14,396 words. The language model was trained on Central News Agency (CNA) news text from 2001-2002 (about 170M characters) [22] with Katz back-off smoothing [23], using the SRI Language Modeling Toolkit (SRILM) [24].

The speech data come from the MATBN Mandarin broadcast news corpus [25], collected by the Spoken Language Group at Academia Sinica [26] together with the Public Television Service [27]: about 25.5 hours (5,774 utterances) are used for training and 1.5 hours (292 utterances, 26,219 characters) for testing, drawn from 2001-2003 broadcasts.

Recognition performance is scored with the National Institute of Standards and Technology (NIST) tool, which aligns the recognized and reference strings by dynamic programming. With H the number of matches, I the number of insertions, and N the number of reference characters,

Accuracy = (H − I) / N × 100%,   Error Rate = 1 − Accuracy

and all results below are reported as character error rate (CER).

After 10 iterations of maximum likelihood (ML) training, the character error rate (CER) is 23.64%; this is the baseline. After 10 iterations of minimum phone error (MPE) training, the CER drops to 20.77%.

[Figure: CER (%) versus training iteration (0-10) for MPE, MTFA (Lo = 0.5) and MSTFA (Lo = 0.1, alpha = 0.5); the vertical axis ranges from 20 to 24.]

(MPE) (MTFA) ( 26,219 MPE 10 359 MSTFA 345 ) 1(Penalty Weight) Lo( U ) 1 ( U =0.8)( U =0.1) 10 (Lo=0.5) 10 (MPE) 0.05% 0.1% 1 10 3

For MSTFA, the sigmoid parameters are the slope α (alpha) and the shift β; β is fixed to 0 in all experiments.

Table 1 lists the results; at iteration 10 the best MSTFA setting (ρ = 0.1, α = 0.5) outperforms MPE by 0.31% absolute CER (about 1.5% relative).

Table 1. CER (%) of MPE, MTFA and MSTFA training (ρ is the penalty weight Lo; α is the sigmoid slope)

CER(%)    MPE    MTFA   MTFA   MTFA   MTFA   MSTFA        MSTFA        MSTFA      MSTFA
                 ρ=0.1  ρ=0.3  ρ=0.5  ρ=0.8  ρ=0.1,α=0.5  ρ=0.5,α=0.5  ρ=0.1,α=1  ρ=0.5,α=1
Baseline  23.64
Itr01     22.82  22.85  22.73  22.74  22.80  22.88        22.82        22.83      22.77
Itr02     22.44  22.35  22.33  22.36  22.39  22.37        22.34        22.37      22.38
Itr03     22.28  22.07  22.13  22.14  22.19  22.06        22.10        22.02      22.05
Itr04     21.79  21.65  21.50  21.56  21.69  21.52        21.58        21.41      21.56
Itr05     21.48  21.26  21.14  21.26  21.34  21.23        21.47        21.30      21.52
Itr06     21.24  20.98  20.97  21.09  21.23  21.05        21.27        21.06      21.32
Itr07     21.10  20.91  20.87  21.09  21.19  20.89        21.11        20.80      21.19
Itr08     21.06  20.87  20.81  20.82  20.93  20.50        20.97        20.54      20.98
Itr09     20.97  20.84  20.74  20.85  20.90  20.58        20.82        20.57      21.03
Itr10     20.77  20.82  20.80  20.72  20.93  20.46        20.87        20.65      21.10

Table 2. CER (%) of MPE with entropy-based frame selection: random selection, hard selection (HS, threshold Thr) and soft selection (SS, weight ω)

CER(%)    MPE    MPE Random  MPE HS Thr=0.05  MPE HS Thr=0.08  MPE SS ω=1.0  MPE SS ω=0.5
Baseline  23.64
Itr01     22.82  23.02       22.63            22.43            22.84         22.88
Itr02     22.44  22.62       22.05            21.80            22.40         22.43
Itr03     22.28  22.22       21.60            21.45            22.21         22.25
Itr04     21.79  22.16       21.40            21.34            21.65         21.73
Itr05     21.48  21.76       21.19            20.94            21.34         21.31
Itr06     21.24  21.66       20.92            20.82            21.33         21.18
Itr07     21.10  21.74       20.91            20.73            21.29         21.29
Itr08     21.06  21.62       21.22            20.74            21.00         21.06
Itr09     20.97  21.78       21.08            20.65            21.02         20.93
Itr10     20.77  21.84       21.29            20.63            20.94         20.89

Table 2 shows the effect of applying entropy-based frame selection to MPE training; all runs use I-smoothing and 10 iterations [28]. Hard selection (HS) with threshold Thr = 0.05 keeps 4,214,360 frames, i.e. 45.88% of the training frames; Thr = 0.08 gives the best CER at iteration 10. As a control, randomly selecting the same proportion of frames (45.88%, the MPE Random column, matching MPE HS Thr = 0.05) degrades performance, confirming that the entropy-based choice of frames, not merely the reduced amount of data, is responsible for the gain.

Finally, we combine the sigmoid time frame phone accuracy (MSTFA, fixed at ρ = 0.1, α = 0.5) with entropy-based frame selection, again with I-smoothing and 10 iterations: hard selection (HS), soft selection (SS), and both together (HS+SS). Table 3 lists the results.

Table 3. CER (%) of MSTFA training combined with frame selection (HS: Thr = 0.1; SS: ω = 1.0; HS+SS: Thr = 0.05, ω = 0.5)

CER(%)    MPE    MSTFA ρ=0.1,α=0.5  MSTFA HS  MSTFA SS  MSTFA HS+SS
Baseline  23.64
Itr01     22.82  22.88              22.46     22.75     22.53
Itr02     22.44  22.37              21.87     22.25     21.72
Itr03     22.28  22.06              21.40     21.83     21.45
Itr04     21.79  21.52              21.38     21.45     21.38
Itr05     21.48  21.23              21.08     21.27     21.03
Itr06     21.24  21.05              21.03     20.94     20.90
Itr07     21.10  20.89              21.02     20.65     21.14
Itr08     21.06  20.50              21.15     20.78     21.14
Itr09     20.97  20.58              20.86     20.56     21.07
Itr10     20.77  20.46              21.43     20.86     21.37


In summary:

(1) The proposed sigmoid time frame phone accuracy criterion (MSTFA) outperforms standard MPE training, with about 1.5% relative CER reduction.
(2) Entropy-based frame selection accelerates the early iterations, with the combined schemes reaching their best results by around iteration 6, likewise yielding roughly 1.5% relative improvement.

[1] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, Second Edition. New York: John & Wiley, 2000.

[2] Lalit R. Bahl, F. Jelinek and Robert L. Mercer, A Maximum Likelihood Approach to Continuous Speech Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-5, no.2, March 1983.

[3] J. Fiscus, A Post-processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER), in Proc. ASRU 1997.

[4] V. Goel and W. Byrne, Minimum Bayes-Risk Automatic Speech Recognition, Computer Speech and Language, Vol. 14, pp.115-135, 2000.

[5] F. Wessel, R. Schluter, K. Macherey and H. Ney, Explicit Word Error Minimization Using Word Hypothesis Posterior Probability, in Proc. ICASSP 2001.

[6] L. Mangu, E. Brill and A. Stolcke, Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks, Computer Speech and Language, Vol. 14, pp.373-400, 2000.

[7] Y. Normandin, Hidden Markov Models, Maximum Mutual Information Estimation, and the Speech Recognition Problem, Ph.D Dissertation, McGill University, Montreal, 1991.

[8] J. Kaiser, B. Horvat and Z. Kacic, Overall Risk Criterion Estimation of Hidden Markov Model Parameters, Speech Communication, Vol. 38, pp. 383-398, 2002.

[9] V. Doumpiotis, S. Tsakalidis and W. Byrne, Lattice Segmentation and Minimum Bayes Risk Discriminative Training, in Proc. Eurospeech 2004.

[10] D. Povey and P. C. Woodland, Minimum Phone Error and I-smoothing for Improved Discriminative Training, in Proc. ICASSP 2002.

[11] S. Ortmanns, H. Ney and X. Aubert, A Word Graph Algorithm for Large Vocabulary Continuous Speech Recognition, Computer Speech and Language, Vol. 11, pp. 11-72, 1997.

[12]D. Povey, Discriminative Training for Large Vocabulary Speech Recognition. Ph.D Dissertation, Peterhouse, University of Cambridge, July 2004.

[13] Shih-Hung Liu, Fang-Hui Chu, Shih-Hsiang Lin and Berlin Chen, Investigating Data Selection for Minimum Phone Error Training of Acoustic Models, in Proc. ICME 2007.

[14] G. Heigold et al, Minimum Exact Word Error Training, in Proc. ASRU 2005

[15] A. J. Smola, P. Bartlett, B. Scholkopf and D. Schuurmans, Advances in Large Margin Classifiers, The MIT Press.

[16] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.

[17] Xinwei Li, Hui Jiang and Chaojun Liu, Large Margin HMMs for Speech Recognition, in Proc. ICASSP 2005.

[18] Jinyu Li, Ming Yuan and Chin-Hui Lee, Soft Margin Estimation of Hidden Markov Model Parameters, in Proc. ICSLP 2006.

[19] B. Chen, J. W. Kuo and W. H. Tsai, Lightly Supervised and Data-Driven Approaches to Mandarin Broadcast News Transcription, in Proc. ICASSP 2004.

[20] N. Kumar, Investigation of Silicon-Auditory Models and Generalizaion of Linar Discriminant Analysis for Improved Speech Recognition, Ph.D. Thesis, John Hopkins University, Baltimore, 1997.

[21] R. A. Gopinath, Maximum Likelihood Modeling with Gaussian Distributions, in Proc. of ICASSP 1998.

[22] LDC: Linguistic Data Consortium, http://www.ldc.upenn.edu

[23] S. M. Katz, Estimation of Probabilities from Sparse Data for Other Language Component of a Speech Recognizer, IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 35, No.3, pp. 400-401, 1987.

[24]A. Stolcke, SRI language Modeling Toolkit, version 1.3.3, http://www.speech.sri.com/projects/srilm/

[25] H. M. Wang, B. Chen, J.-W. Kuo, and S.S. Cheng. MATBN: A Mandarin Chinese Broadcast News Corpus, International Journal of Computational Linguistics and Chinese Language Processing, Vol. 10, No. 2, pp. 219-236, 2005.

[26] SLG: Spoken Language Group at Chinese Information Processing Laboratory, Institute of Information Science, Academia Sinica. http://sovideo.iis.sinica.edu.tw/SLG/index.htm

[27] PTS: Public Television Service Foundation. http://www.pts.org.tw

[28] , , Master Thesis, NTNU, 2005.

[29] , , Master Thesis, NTNU, 2007.

Study of the Voice Activity Detection Techniques for Robust Speech Feature Extraction

Wen-Hsiang Tu Dept of Electrical Engineering, National Chi Nan University Taiwan, Republic of China [email protected]

Jeih-weih Hung Dept of Electrical Engineering, National Chi Nan University Taiwan, Republic of China [email protected]

Aurora2 0~20dB

Abstract The performance of a speech recognition system is often degraded due to the mismatch between the environments of development and application. One of the major sources that give rises to this mismatch is additive noise. The approaches for handling the problem of additive noise can be divided into three classes: speech enhancement, robust speech feature extraction, and compensation of speech models. In this thesis, we are focused on the second class, robust speech feature extraction. The approaches of speech robust feature extraction are often together with the voice activity detection in order to estimate the noise characteristics. A voice activity detector (VAD) is used to discriminate the speech and noise-only portions within an utterance. This thesis

primarily investigates the effectiveness of various features for the VAD. These features include low-frequency spectral magnitude (LFSM), full-band spectral magnitude (FBSM), cumulative quantized spectrum (CQS) and high-pass log-energy (HPLE). The resulting VAD offers the noise information to two noise-robustness techniques, spectral subtraction (SS) and silence log-energy normalization (SLEN), in order to reduce the influence of additive noise in speech recognition. The recognition experiments are conducted on Aurora-2 database. Experimental results show that the proposed VAD is capable of providing accurate noise information, with which the following processes, SS and SLEN, significantly improve the speech recognition performance in various noise-corrupted environments. As a result, we confirm that an appropriate selection of features for VAD implicitly improves the noise robustness of a speech recognition system.

Keywords: voice activity detection, spectral magnitude, spectral subtraction, speech recognition

(environmental mismatch) (additive noise) (convolutional noise) (bandwidth limitation)

(voice activity detection, VAD) (endpoint detection) (low-frequency spectral magnitude, LFSM)(full-band spectral magnitude, FBSM)(cumulative quantized spectrum, CQS) (high-pass log-energy, HPLE) (spectral subtraction) (silence log-energy normalization, SLEN)


The first feature for endpoint detection (voice activity detection, VAD) is the low-frequency spectral magnitude (LFSM), which sums the magnitude spectrum over the band [0, 50 Hz]. Let {x_m[n], 1 ≤ n ≤ N} be the m-th frame of the signal and K (≥ N) the DFT size; the spectrum of frame m is

X_m[k] = Σ_{n=0}^{N−1} x_m[n] e^{−j2πnk/K},   0 ≤ k ≤ K−1    (1)

where bin k corresponds to the frequency

f_k = (k / 2K) F_s    (2)

and F_s is the sampling frequency. The spectral magnitude of frame m over a band [F_L, F_U] is

Y_m[F_L, F_U] = Σ_{F_L ≤ f_k ≤ F_U} |X_m(f_k)|    (3)

so the low-frequency spectral magnitude is

Y_m^Low = Y_m[0, 50] = Σ_{0 ≤ f_k ≤ 50} |X_m(f_k)|    (4)

Assuming the first P frames of the utterance are noise-only, the decision threshold is

R = M · (1/P) Σ_{m=1}^{P} Y_m^Low    (5)

where M is a threshold multiplier, and each frame m is classified as

speech  if Y_m^Low > R;   noise  if Y_m^Low ≤ R    (6)
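The LFSM detector of Eqs. (1)-(6) can be sketched with NumPy's FFT; the default number of leading noise frames `p` and the threshold multiplier `m_scale` are illustrative assumptions:

```python
import numpy as np

def lfsm_vad(frames, fs=8000, p=10, m_scale=1.5, band=(0, 50)):
    """Low-frequency spectral magnitude VAD, Eqs. (1)-(6).
    frames  : 2-D array, one windowed frame per row
    p       : number of leading frames assumed noise-only
    m_scale : threshold multiplier M (an assumed value)
    Returns a boolean array, True where a frame is judged speech.
    """
    spec = np.abs(np.fft.rfft(frames, axis=1))             # Eq. (1)
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)   # Eq. (2)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    y_low = spec[:, mask].sum(axis=1)                      # Eqs. (3)-(4)
    threshold = m_scale * y_low[:p].mean()                 # Eq. (5)
    return y_low > threshold                               # Eq. (6)
```

Swapping `band` for (0, 4000) on 8 kHz speech turns the same routine into the FBSM detector of Eq. (7).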

The full-band spectral magnitude (FBSM) is defined analogously over the whole band; for 8 kHz-sampled speech the band is [0, 4000 Hz]:

Y_m^Full = Y_m[0, 4000] = Σ_{0 ≤ f_k ≤ 4000} |X_m(f_k)|    (7)

with the same thresholding as in (5)-(6).

(SNR=5dB) (SNR=5dB) SNR [1] [0,50Hz] 0.5 0.6 0.7 0.8 (cumulative quantized spectrum, CQS)

The third feature is the cumulative quantized spectrum (CQS), which quantizes each spectral bin to 1 or 0. Again taking the first P frames as noise-only, a per-bin reference spectrum is computed:

R[k] = (1/P) Σ_{m=1}^{P} |X_m[k]|    (8)

where |X_m[k]|, k = 0, 1, 2, ..., N/2, is the N-point DFT magnitude of frame m. Each bin is quantized against the reference:

Y_m[k] = { 1   if |X_m[k]| > R[k]
         { 0   if |X_m[k]| ≤ R[k],    k = 0, 1, 2, ..., N/2    (9)

and the quantized values {Y_m[k]} are accumulated over the bins:

Z_m = Σ_{k=0}^{N/2} Y_m[k]    (10)

A frame whose spectrum exceeds the noise reference in at least a quarter of the bins is declared speech:

speech  if Z_m ≥ N/4;   noise  if Z_m < N/4    (11)

Inspection of clean and noisy utterances (SNR = 5 dB) shows that Z_m separates speech frames from noise frames clearly, and that the separation degrades only gradually as the SNR decreases.

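The CQS detector of Eqs. (8)-(11) can be sketched as follows; as above, the assumed input is one windowed frame per row, with the first `p` rows noise-only:

```python
import numpy as np

def cqs_vad(frames, p=10):
    """Cumulative quantized spectrum VAD, Eqs. (8)-(11).
    Each spectral bin is quantized to 1 when it exceeds the per-bin
    noise average R[k] of the first p frames; a frame is judged
    speech when at least a quarter of the DFT points (N/4, with N
    the frame length) are quantized to 1."""
    spec = np.abs(np.fft.rfft(frames, axis=1))
    r = spec[:p].mean(axis=0)          # Eq. (8): per-bin noise reference
    quantized = spec > r               # Eq. (9): quantize to 1/0
    z = quantized.sum(axis=1)          # Eq. (10): accumulate over bins
    return z >= frames.shape[1] / 4.0  # Eq. (11): threshold at N/4
```

Because the decision counts bins rather than summing magnitudes, the detector is largely insensitive to the absolute signal level.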

The fourth feature is the high-pass log-energy (HPLE). Log-energy is widely used for endpoint detection [2-6], but additive noise raises its floor; passing the log-energy contour through a delta filter, here a first-order infinite impulse response (IIR) high-pass filter, removes the slowly varying noise floor:

Ẽ[n] = e[n] − (1/2) Ẽ[n−1]    (12)

where e[n] is the log-energy of frame n and Ẽ[n] the filtered value. The decision threshold is the utterance mean of the filtered contour,

T = (1/N) Σ_{n=1}^{N} Ẽ[n]    (13)

and each frame is classified as

noise  if Ẽ[n] < T;   speech  if Ẽ[n] ≥ T    (14)
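Eqs. (12)-(14) can be sketched directly; note the filter coefficient 1/2 is our reading of the garbled Eq. (12):

```python
def hple_vad(log_energy):
    """High-pass log-energy VAD, Eqs. (12)-(14): the log-energy
    contour is passed through the first-order IIR delta filter
    E~[n] = e[n] - 0.5 * E~[n-1], and frames whose filtered energy
    reaches the utterance mean are marked as speech."""
    filtered = []
    prev = 0.0
    for e in log_energy:
        prev = e - 0.5 * prev                        # Eq. (12)
        filtered.append(prev)
    threshold = sum(filtered) / len(filtered)        # Eq. (13)
    return [f >= threshold for f in filtered]        # Eq. (14)
```

On a contour with a raised but constant noise floor, the filter cancels the floor and only the speech bursts remain above the mean.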

The noise information provided by the VAD is first used for nonlinear spectral subtraction (NSS) [7-8]. Assuming the clean speech and the additive noise are uncorrelated, the noisy signal in the time domain and its Fourier transform satisfy

y_i(t) = x_i(t) + n_i(t)   --(Fourier transform)-->   Y_i(f) ≈ X_i(f) + N_i(f)    (15)

where y_i(t), x_i(t) and n_i(t) are the noisy speech, clean speech and noise of frame i, and Y_i(f), X_i(f) and N_i(f) the corresponding magnitude spectra. The noise magnitude spectrum N_i(f) cannot be observed directly, so it is estimated from the frames the VAD labels as noise and subtracted from Y_i(f) to recover an estimate of X_i(f). Following [8]:

X̂_i(f) = { Y_i(f) − B·N̄(f)   if Y_i(f) > B·N̄(f)
         { C·Y_i(f)           otherwise    (16)

where B is the over-estimation factor and C the flooring factor, and the noise estimate N̄(f) is the average magnitude spectrum of the M frames judged to be noise:

N̄(f) = (1/M) Σ_{j=1}^{M} N_j(f)    (17)

where N_j(f) is the magnitude spectrum of the j-th noise frame.
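Eqs. (16)-(17) can be sketched as one vectorized step; the default over-estimation and flooring factors are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

def nonlinear_spectral_subtraction(y_mag, noise_mags, b=2.0, c=0.1):
    """Nonlinear spectral subtraction, Eqs. (16)-(17).
    y_mag      : magnitude spectrum of the current frame
    noise_mags : magnitude spectra of the frames the VAD marked as noise
    b, c       : over-estimation and flooring factors (assumed values)
    """
    n_bar = np.mean(noise_mags, axis=0)        # Eq. (17): average noise spectrum
    cleaned = y_mag - b * n_bar                # subtract the over-estimated noise
    floor = c * y_mag                          # Eq. (16): flooring branch
    return np.where(y_mag > b * n_bar, cleaned, floor)
```

The flooring branch prevents negative magnitudes and limits musical-noise artifacts in bins dominated by noise.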

Nfj () j M (silence log-energy normalization, SLEN) [9] ¦£En<> n ¦ Enˆ <> ¤ (18) ¦ F n ¥¦ F

[Figure: log-energy contours of the same utterance at SNR = 10 dB and SNR = 0 dB, (a) before and (b) after silence log-energy normalization.]

The recognition experiments follow the framework defined by the European Telecommunication Standard Institute (ETSI): the AURORA2 corpus [10], whose test Sets A and B contain connected digits corrupted by various noises at signal-to-noise ratios (SNR) of 20 dB, 15 dB, 10 dB, 5 dB, 0 dB and −5 dB.

() 121

Recognition uses the hidden Markov model toolkit (HTK). The 11 digit models (0-9 and "oh") each have 10 states with a 4-mixture Gaussian mixture probability density function (GM) per state, i.e. continuous density HMMs (CDHMM). The front-end settings are:

sampling rate: 8000 Hz
frame size: 25 ms
frame shift: 10 ms
pre-emphasis filter: 1 − 0.97 z⁻¹, with a Hamming window
FFT size: 256
Mel-scaled triangular filter bank: 23 channels
features: 13 MFCCs + 13 ΔMFCCs + 13 ΔΔMFCCs = 39 dimensions

The first experiment uses the 12 static MFCCs plus their Δ and ΔΔ coefficients (36 dimensions, log-energy excluded) and compares, on Sets A and B: the baseline, SS-baseline (spectral subtraction without a VAD), and spectral subtraction guided by each VAD feature: (Energy-VAD)+SS, (HPLE-VAD)+SS, (CQS-VAD)+SS, (FBSM-VAD)+SS and (LFSM-VAD)+SS.


[Table: recognition accuracy (%) on Set A and Set B for the spectral-subtraction experiments.]

The results show that: 1. spectral subtraction alone improves over the baseline; 2. adding a VAD yields a further gain of about 2-4% at SNRs of 5 dB and below; 3. among the four proposed VAD features (HPLE, CQS, FBSM and LFSM), the gains range from those of HPLE up to those of LFSM, the latter improving accuracy by about 4-7%.

The second experiment adds log-energy to the 12 MFCCs (39 dimensions in total) and compares, on Sets A and B: the baseline, SS-baseline, SLEN-baseline and SS+SLEN, and then SS+SLEN guided by each VAD: (HPLE-VAD) SS+SLEN, (CQS-VAD) SS+SLEN, (FBSM-VAD) SS+SLEN and (LFSM-VAD) SS+SLEN. The results show that: 1. spectral subtraction (SS) alone improves accuracy by about 3% and SLEN alone by about 7.8%; 2. combining SS and SLEN gains a further ~3% at SNRs of 5 dB and below; 3. at low SNRs the combination improves accuracy by 13%-20%; 4. VAD guidance adds another 1%-2% at SNRs of 5 dB and below; 5. among the four VAD features (HPLE, CQS, FBSM and LFSM), HPLE improves Set B by about 1% while LFSM gains 3%-4%.


[Table: recognition accuracy (%) on Set A and Set B for the SS+SLEN experiments.]

Overall, the proposed VAD-guided processing keeps the average recognition accuracy above 80% across the 0-20 dB conditions.

[1] K. Yamashita,; T. Shimamura; " Nonstationary noise estimation using low-frequency regions for spectral subtraction", Signal Processing Letters, IEEE Volume 12, Issue 6, June 2005 [2] Tai-Hwei Hwang, “Energy Contour Extraction for In-Car Speech Recognition”, 9th European Conference on Speech Communication and Technology (Eurospeech 2003) [3] Weizhong Zhu and Douglas O’Shaughnessy “Log-energy Dynamic Range Normalization for Robust Speech Recognition”, 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005) [4] Tai-Hwei Hwang and Sen-Chin Chang, “Energy Contour Enhancement for Noisy Speech Recognition”, 2004 International Symposium on Chinese Spoken Language Processing (ISCSLP 2004) [5] M. Ahadi, H. Sheikhzadeh, R. Brennan and G. Freeman, “An Energy Normalization Scheme for Improved Robustness in Speech Recognition” 8th International Conference on Spoken Language Processing (ICSLP 2004) [6] R. Chengalvarayan, “Robust Energy Normalization Using Speech/nonspeech Discriminator for German Connected Digit Recognition”, 6th European Conference on Speech Communication and Technology (Eurospeech 1999) [7] S.F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction” IEEE Trans. on Acoustics, speech, and Processing, VOL. ASSP-27, NO. 2, April 1979 [8] P. Lockwood and J. Boudy, "Experiments with a Nonlinear Spectral Subtractor (NSS), Hidden Markov Models and the Projection, for Robust Speech Recognition in Cars", Eurospeech 1991 [9] C.F. Tai, J.W. Hung, "Silence Energy Normalization for Robust Speech Recognition in Additive Noise Environments", INTERSPEECH 2006 – ICSLP [10] H.-G Hirsch, D. Pearce, “The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions,” ISCA ITRW ASR 2000, Paris, France, September 18-20, 2000


[email protected], [email protected]

[1][2][3] HPG HPG Fujisaki model F0 HPG higher level information HPG

Fujisaki model

(F0 contour)(syllable duration)(Intensity distribution)(pause duration)


higher level information [3] association Hierarchical Prosodic Phrase Grouping Framework (HPG)[3]2006 [Tseng 2006] HPG

SYLPWPPhBGPG Tseng, 2006

HPG tone (intonation)cohesion coherence(association) Discourse Prosody


Fujisaki

() Fujisaki model

The Fujisaki model, proposed by Fujisaki in 1984 [4], is a physiologically motivated command-response model of the F0 contour, grounded in the control of the vocal apparatus. It decomposes the log-F0 contour into three components: (1) phrase components, impulse commands with amplitudes Ap that produce the slowly declining phrase shape; (2) tone components, pedestal commands with amplitudes Aa that carry the local tone/accent shapes; and (3) a base frequency Fb. For a tone language such as Mandarin, the tone components model the lexical tones, while tone of voice and emphasis also modulate the commands [5]. Both Ap and Aa can be estimated automatically from the F0 contour; this study focuses on the phrase commands Ap and their relation to higher-level prosodic grouping.
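The command-response structure can be sketched as follows; the time constants alpha, beta and the ceiling gamma are illustrative defaults, not values fitted in this study:

```python
import math

def fujisaki_f0(t, fb, phrase_cmds, tone_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """Log-F0 at time t under the Fujisaki command-response model:
    ln F0(t) = ln Fb + phrase responses + tone responses.
    phrase_cmds : list of (T0, Ap) impulse commands
    tone_cmds   : list of (T1, T2, Aa) pedestal commands
    alpha, beta : time constants of the phrase / tone mechanisms
    gamma       : ceiling of the tone response
    """
    def gp(x):   # phrase control mechanism impulse response
        return alpha ** 2 * x * math.exp(-alpha * x) if x >= 0 else 0.0

    def ga(x):   # tone control mechanism step response
        return min(1.0 - (1.0 + beta * x) * math.exp(-beta * x), gamma) if x >= 0 else 0.0

    ln_f0 = math.log(fb)
    ln_f0 += sum(ap * gp(t - t0) for t0, ap in phrase_cmds)
    ln_f0 += sum(aa * (ga(t - t1) - ga(t - t2)) for t1, t2, aa in tone_cmds)
    return math.exp(ln_f0)
```

With no commands the contour sits at the base frequency Fb; each phrase command adds a rise-then-decay excursion whose size is governed by Ap, which is the quantity analyzed in the experiments below.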


(a) (b) (a)Aa (b)Ap

() Fujisaki

Fujisaki model Fujusaki model

Fujisaki, 1984

Fujisaki Model tone component tone Fujisaki model Fujisaki model Fujisaki model Mixdorff 20002003

[6][7] (high-pass filter) Fujisaki model Fujisaki model 0.5Hz (1) (HFC)(2) (LFC) (Fb)(3) Mixdorff Fujisaki

(Mixdorff, 2000)

Fujisaki Bu [8] Mixdorff Bu Mixdoff

()

HPG Fujisakmodel i Ap Mixdorff Fujisaki model Ap


Fujisaki

Nakai 19941995 [9][10] Ap Keller [11] Zellner [12] : (1) (2) (3)

Tseng et al, 2004


Ap Ap (PPh)(BG) (PG): (1)PPh : PPh (Current PPh Length) PPh (Preceding PPh Length) Delta1 BG (2)BG : PPh BG (BG Sequence) BG Sequence=1 PPh BG PPh(3)PG : BG

26 4 1 6 6 8 34 (R)(SMR)(IR) (WIR) (R SMRIR) (1) (2)

Sony ECM-77B Cool Edit 2000 (1)f054 (2)

109 m056 26 m054 34

/ f054 3502 710 26 271 m056 3510 711 26 202

f054 7054 720 34 193 m054 7096 747 34 165

()

f054 m056 B1B2B3B4B5 HPG


Prediction accuracy by boundary type:

          B2       B3       B1       B4       B5
f054      97.98%   86.46%   82.79%   76.76%   62.30%
m056      96.71%   88.69%   80.19%   76.76%   80.85%

() Ap

Ap (PPh) Ap Ap PPh (BG) Ap BG Ap Ap Ap BG PPh PPh BG PPh BG Ap (RSMRIR)(a) PPh (b) BG (c) PPh BG (b) (c)

Ap (RSMRIR)

111 (a) PPh (b) BG (c) PPh BG

PPh BG PG (R)(SMR)(IR) (WIR) PG BG BG BG BG (R) 28.1% 51.6%(SMR) 20.4% 22.7%(IR) 8.7% 16.5% 5.6% 7.4%

Ap

HPG HPG


HPG HPG HPG HPG HPG

[2, 3] HPG

[1] Tseng, C. "Prosody Analysis", Advances in Chinese Spoken Language Processing, World Scientific Publishing, Singapore, pp. 57-76, 2006.

[2] Tseng, C., Pin, S., Lee, Y., Wang, H. and Chen, Y. "Fluent Speech Prosody: Framework and Modeling", Speech Communication, Special Issue on Quantitative Prosody Modeling for Natural Speech Description and Generation, Vol. 46:3-4, pp. 284-309, 2005.

[3] Tseng, C. and Lee, Y. “Speech rate and prosody units: Evidence of interaction from Mandarin Chinese”, Proceedings of the International Conference on Speech Prosody 2004, pp. 251-254, 2004.

[4] Fujisaki, H. and Hirose, K. "Analysis of voice fundamental frequency contours for declarative sentences of Japanese", J.Acoust, J.Acoust. Soc. Jpn.(E), 1984; 5(4), pp. 233-242, 1984.

[5] Wang, C., Fujisaki, H., Ohno, S. and Kodama, Tomohiro. "Analysis and synthesis of the four tones in connected speech of the standard Chinese based on a command-response model", Proceedings of EUROSPEECH'99, pp. 1655-1658, 1999.

[6] Mixdorff, H. "A Novel Approach to the Fully Automatic Extraction of Fujisaki Model Parameters", Proceedings of ICASSP 2000, vol. 3, pp.1281-1284, 2000.

[7] Mixdorff, H., Hu, Y. and Chen, G. "Towards the Automatic Extraction of Fujisaki Model Parameters for Mandarin", Proceedings of Eurospeech 2003, pp. 873-876, 2003.

[8] Bu, S., Yamamoto, M. and Itahashi, S. “An Automatic Extraction Method of F0 Generation Model Parameters”, Trans. IEICE, TRANS. INF. & SYST., VOL.E89–D, NO.1, Jan 2006.

[9] Nakai, M., Singer, H., Sagisaka, Y. and Shimodaira, H. “Automatic prosodic segmentation by F0 clustering using superpositional modeling” , Proceedings of ICASSP95, pp. 624–627, 1995.

[10] Nakai, M., Shimodaira, H. and Sagayama, S. “Prosodic Phrase Segmentation Based on Pitch-Pattern Clustering”, Trans. IEICE, J77-A, 2, pp. 206–214, Feb. 1994.

[11] Keller, E. and Zellner, K. “A Timing model for Fast French”, York Papers in Linguistics, 17, University of York, pp.53-75, 1996.

[12] Zellner, K. and Keller, E. "Representing Speech Rhythm", Improvements in Speech Synthesis. Chichester: John Wiley, pp. 154-164, 2001.


1. (R) ()

2. (SMR)

3. (IR) ()


Dau-cheng Lyu Ren-yuan Lyu [email protected] [email protected]

Yuang-chin Chiang Chun-nan Hsu [email protected]



[bh] [gh] [r] [ah] [ak] [annh] [annp] [ap] [at] [eh] [ennh] [erh] [et] [ih] [ik] [innh] [ip] [it] [oh] [ok] [onnh] [op] [uh] [ut] [b] [c] [d] [g] [h] [k] [l] [ann] [a] [enn] [e] [er] [i] [inn] [ng] [o] [onn] [unn] [m] [n] [p] [s] [t] [z] [u] [ch] [f] [rh] [sh] [zh] [ernn] [err] [ii] [yu]

ForPA 3 OBT OBM TM


The Bhattacharyya distance between two Gaussian distributions p and q is

D_{Bhatta}(p,q) = \frac{1}{8} (\mu_p - \mu_q)^T \left[ \frac{\Sigma_p + \Sigma_q}{2} \right]^{-1} (\mu_p - \mu_q) + \frac{1}{2} \ln \frac{\left| \frac{\Sigma_p + \Sigma_q}{2} \right|}{\sqrt{|\Sigma_p|\,|\Sigma_q|}}

The Kullback-Leibler divergence between the two Gaussians is

D_{KL}(p \,\|\, q) = \frac{1}{2} \left( \ln \frac{|\Sigma_q|}{|\Sigma_p|} + \mathrm{tr}\left(\Sigma_q^{-1}\Sigma_p\right) + (\mu_p - \mu_q)^T \Sigma_q^{-1} (\mu_p - \mu_q) - d \right)
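The paper measures the similarity of Gaussian phone models with the Bhattacharyya distance before clustering them. As a rough illustration, here is a minimal sketch restricted to diagonal covariances; the function name and the diagonal-covariance restriction are our own simplifications, not the paper's implementation (the full-covariance form replaces the per-dimension sums with matrix inverses and determinants):

```python
import math

def bhattacharyya_gaussian(mu_p, var_p, mu_q, var_q):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    mu_*  : list of per-dimension means
    var_* : list of per-dimension variances (diagonal of the covariance)
    """
    term1 = 0.0  # (1/8)(mu_p-mu_q)^T [(S_p+S_q)/2]^{-1} (mu_p-mu_q)
    term2 = 0.0  # (1/2) ln( |(S_p+S_q)/2| / sqrt(|S_p||S_q|) )
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        avg = 0.5 * (vp + vq)
        term1 += (mp - mq) ** 2 / avg
        term2 += math.log(avg / math.sqrt(vp * vq))
    return 0.125 * term1 + 0.5 * term2

# Identical Gaussians are at distance 0; unlike the KL divergence,
# the Bhattacharyya distance is symmetric in p and q.
assert bhattacharyya_gaussian([0.0], [1.0], [0.0], [1.0]) == 0.0
```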

<> AHC


AHC

AHC

delta-BIC delta-BIC <>

<> AHC

-

AHC AHC AHC AHC AHC

IPA ForPA AHC Bhattacharyya AHC

I.

AHC IPA [b-a+ng_T, b-a+ng_M] 284 [b-a+ng]

II.

<> [b+*] [b]


[b] AHC

III.

[13] -p -k <> AHC delta-BIC ()delta-BIC AHC <> 13 19 -ap -ak -ap -ak AHC <>

-ap -ak AHC

IV.

Delta Bayesian Information Criterion (delta-BIC)

Given a data set X = \{x_1, \dots, x_n\}, x_i \in \mathbb{R}^d, the Bayesian Information Criterion of a model p is

BIC(p) = \log L_p(X) - \frac{\lambda}{2}\, d_p \log n

where L_p is the likelihood of X under model p, d_p is the number of free parameters of p, and \lambda is a penalty weight.

For two clusters p and q and the merged cluster r, the delta-BIC is

\Delta\mathrm{BIC}_{pq} = \frac{n_r}{2}\log|\Sigma_r| - \frac{n_p}{2}\log|\Sigma_p| - \frac{n_q}{2}\log|\Sigma_q| - \frac{\lambda}{2}\left(d + \frac{d(d+1)}{2}\right)\log n_r

where n_r = n_p + n_q, and \mu_r and \Sigma_r are estimated from the pooled samples of p and q; d + d(d+1)/2 is the parameter count of a full-covariance Gaussian. A positive \Delta\mathrm{BIC} favours keeping p and q separate, a negative one favours merging them.
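The delta-BIC merge test described in this section can be illustrated with a small one-dimensional sketch; the function names, the default penalty weight lam = 1.0, and the sign convention (negative favours merging) are our own assumptions, not the paper's exact implementation:

```python
import math

def variance(xs):
    """Maximum-likelihood variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def delta_bic(p, q, lam=1.0):
    """delta-BIC for merging two 1-D Gaussian clusters p and q into r = p + q.

    A negative value means one Gaussian explains the pooled data about as
    well as two (merge); a positive value favours keeping them apart.
    """
    r = p + q
    n_p, n_q, n_r = len(p), len(q), len(r)
    d = 1                                            # feature dimension
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * math.log(n_r)
    return (0.5 * n_r * math.log(variance(r))
            - 0.5 * n_p * math.log(variance(p))
            - 0.5 * n_q * math.log(variance(q))
            - penalty)

# Two well-separated clusters: delta-BIC is positive, so keep them apart.
assert delta_bic([0.0, 0.2, -0.2, 0.1, -0.1],
                 [10.0, 10.2, 9.8, 10.1, 9.9]) > 0
```

In an agglomerative clustering loop, this test serves as the stopping criterion: stop merging once every candidate pair has a positive delta-BIC.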


ForSDAT 01 [8] 100 20

100 43078 11.3 100 46086 11.2 10 1000 0.28 10 1000 0.28

(Lang-De) (Lang-In) 4 (C-I) (C-II)(C-III)(C-IV)

( MFCC) 20 10 39 HMM HMM 3 ( GMM) [15] GMM GMM 30 GMM ” GMM” GMM ” GMM”

delta-BIC GMM HMM HMM HMM HMM delta-BIC HMM delta-BIC

125 924

A. GMM GMM

Lang-De Lang-In GMM GMM <> GMM <> 8-mix 16-mix 32-mix 64-mix Lang-De (1503) 60.7 63.9 62.1 60.2 GMM Lang-In (1242) 62.5 64.7 64.3 63.0 Lang-De (1503) 59.3 62.8 60.2 58.6 GMM Lang-In (1242) 61.4 63.1 62.5 61.6

Lang-De Lang-In GMM GMM 8-mix 16-mix 32-mix 64-mix Lang-De (1503) 32,848(7.1) 54,824(12.1) 88,000(17.8) 109,905(24.3) GMM Lang-In (1242) 26,328(7.1) 46,360(12.4) 69,328(18.6) 98,857(26.5) Lang-De (1503) 36,054(8.0) 72,144(16.0) 126,252(28.0) 261,522(58.7) GMM Lang-In (1242) 29,787(8.0) 59,616(16.0) 115,506(31.1) 219,834(59.6)

Lang-De Lang-In GMM GMM GMM ()

<> GMM GMM GMM 16 GMM GMM Lang-In GMM 16-mix 64.7% HTK[14] GMM GMM GMM GMM Lang-De 1503 Lang-In 1242

GMM <> GMM GMM GMM GMM

B. HMM

GMM GMM <><>


8-mix 16-mix 32-mix 64-mix C-I (1242) 62.5 64.7 64.3 63.0 C-II (527) 51.7 56.7 60.4 59.4 C-III (1083) 59.5 64.2 65.7 66.1 C-IV (862) 56.4 59.6 61.8 61.5

8mix 16mix 32mix 64mix C-I (1242) 26,328(7.1) 46,360(12.4) 69,328(18.6) 98,857(26.5) C-II (527) 12,578(7.8) 24,671(15.6) 41,459(26.2) 55,285(36.4) C-III (1083) 24,618(7.5) 46,256(14.3) 78,841(24.3) 98,754(30.4) C-IV (862) 19,590(7.6) 38,498(14.9) 63,784(24.6) 84,059(32.5)

GMM ()

C-I Lang-In delta-BIC <> C-III 66.1% delta-BIC 8-mix 64-mix 16-mix 32-mix Lang-In C-I 32-mix 64-mix C-III <><> GMM GMM C-I C-III 64-mix GMM GMM GMM C-III delta-BIC C-II GMM <>

C.

C-IC-IVHMMdelta-BIC HMM <>C-IC-II<> delta-BICHMM (decision tree) [16]

C-III 1. delta-BIC delta-BIC delta-BIC delta-BIC 0

8-mix 16-mix 32-mix 64-mix C-I (3726) 62.5 64.7 64.3 63.0 C-II (1581) 51.7 56.7 60.4 59.4 C-III (3569) 61.9 65.1 66.4 66.7 C-IV (2760) 59.2 61.5 62.3 62.5 DT(3374) 62.2 63.4 64.7 64.9

(C-III C-IV) HMM 8-mix C-III GMM C-III C-I,C-II C-IV 124 C-I C-IV C-I C-IV (3726 3374 C-III 64-mix 66.7%


[1] T. Schultz and A. Waibel, "Multilingual Cross-lingual Speech Recognition," in Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, 1998.
[2] Brian Mak and Etienne Barnard, "Phone clustering using the Bhattacharyya distance," in Proc. of ICSLP 1996, pp. 2005-2008.
[3] Chung-Hsien Wu, Yu-Hsien Chiu, Chi-Jiun Shia, and Chun-Yu Lin, "PHONE SET GENERATION BASED ON ACOUSTIC AND CONTEXTUAL," in Proc. of ICASSP 2006.
[4] Mathews, R. H., 1975. Mathews' Chinese-English Dictionary, Caves, 13th printing.
[5] A. Tritschler and R. Gopinath, "Improved speaker segmentation and segments clustering using the Bayesian information criterion," in Proc. EUROSPEECH 1999, pp. 679-682.
[6] J. C. Wells, "Computer-Coded Phonemic Notation of Individual Languages of the European Community," Journal of the International Phonetic Association, 1989, pp. 32-54.
[7] James L. Hieronymus, "ASCII Phonetic Symbols for the World's Languages: Worldbet," Journal of the International Phonetic Association, 1993.
[8] Ren-yuan Lyu, Min-siong Liang, Yuang-chin Chiang, "Toward Constructing A Multilingual Speech Corpus for Taiwanese (Min-nan), Hakka, and Mandarin," Computational Linguistics and Chinese Language Processing, Vol. 9 (2), 2004, pp. 1-12.
[9] Y. J. Chen, C-H. Wu et al., "Generation of robust phonetic set and decision tree for Mandarin using chi-square testing," Speech Communication, Vol. 38 (3-4), 2002, pp. 349-364.
[10] T.S. Chen, C.C. Lin, Y.H. Chiu and R.C. Chen, "Combined Density- and Constraint-based Algorithm for Clustering," in Proceedings of ICISKE 2006.
[11] Jacob Goldberger and Hagai Aronowitz, "A Distance Measure Between GMMs Based on the Unscented Transform and its Application to Speaker Recognition," in Proc. of EUROSPEECH 2005, pp. 1985-1988.
[12] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
[13] Liu Yi and Pascale Fung, "Automatic Phone Set Extension with Confidence Measure for Spontaneous Speech," in Proc. of EuroSpeech 2005.
[14] Phil C. Woodland and Steve J. Young, "The HTK Tied-State Continuous Speech Recogniser," in Proc. of EuroSpeech 1993.
[15] X. Anguera, T. Shinozaki, C. Wooters, and J. Hernando, "Model Complexity Selection and Cross-validation EM Training for Robust Speaker Diarization," in Proc. of ICASSP 2007, Honolulu, HI, April 2007.
[16] Dau-Cheng Lyu, Bo-Hou Yang, Min-Siong Liang, Ren-Yuan Lyu and Chun-Nan Hsu, "Speaker Independent Acoustic Modeling for Large Vocabulary Bi-lingual Taiwanese/Mandarin Continuous Speech Recognition," in Proc. of SST 2002.

Word Sense Disambiguation: Feature Selection and Combination for Machine Learning Algorithms

[email protected] Department of Computer Science and Information Engineering, National Taiwan University

[email protected] Department of Foreign Languages and Literatures, National Taiwan University

Abstract: Word Sense Disambiguation (WSD) is an important step in natural language processing. We use a Naïve Bayes classifier with nine kinds of features, covering the part of speech of the target word, the neighboring words and their parts of speech, collocations, syntactic dependency relations, and semantic features, and we apply the Forward Sequential Selection Algorithm proposed by Le and Shimazu (2004) to find the best feature combination. Using the Senseval-2 English lexical sample data for training and testing, we obtain an accuracy of 61.2%, 3% below the first-place team in the Senseval-2 competition (64.2%) and 0.5% below the fourth-place team from Stanford University.

Keywords: word sense disambiguation, collocation, dependency relations, sketch engine, Stanford parser, HowNet, Naïve Bayes, Forward Sequential Selection Algorithm

1. Introduction

A single English word can have several different meanings; for example, bank can mean a financial institution, a river bank, or a storehouse. The goal of word sense disambiguation is to let the computer automatically identify the correct sense of an ambiguous word in a given context. Part-of-speech tagging algorithms already reach very high accuracy, so when the senses of an ambiguous word have different parts of speech, a POS tagger can tell them apart; but the senses of bank above are all nouns, so tagging does not help, and disambiguation becomes much harder. The training corpus we use, the Senseval-2 English lexical sample data released in 2001, contains 73 target words, including nouns, verbs, and adjectives; for each target word the different senses all share the same part of speech, which makes the data a serious challenge for WSD algorithms.

The Senseval-2 training and test corpora are stored in XML format. Below is an excerpt from the training corpus:

Their multiscreen projections of slides and film loops have featured in orbital parties, at the

Astoria and Heaven, in Rifat Ozbek's 1988/89 fashion shows, and at Energy's recent Docklands all-dayer. From their residency at the Fridge during the first summer of love, Halo used slide and film projectors to throw up a collage of op-art patterns, film loops of dancers like E-Boy and Wumni, and unique fractals derived from video feedback. &bquo;We're not aware of creating a visual identify for the house scene, because we're right in there. We see a dancer at a rave, film him later that week, and project him at the next rave.&equo; Ben Lewis Halo can be contacted on 071 738 3248. Artyou can dance to from the creative group called Halo

In the corpus, the target word is marked with <head> tags. The test corpus has the same format as the training corpus; the only difference is that it carries no senseid annotation.

2. Literature Review

Early WSD algorithms mostly relied on dictionary definitions or on the semantic classes of a thesaurus. Lesk (1986), for example, chose the sense whose dictionary definition was closest to the context of the target word, measuring similarity mainly by the number of content words the two share; Walker (1987) instead used the semantic classes of a thesaurus. The accuracy of these methods is much lower than that of today's common machine learning algorithms (see Table 16 for the accuracy of the methods in the Senseval-2 WSD competition). Machine learning approaches divide into supervised and unsupervised learning; the difference is that the former trains on sense-annotated corpora while the latter does not, but either way the WSD algorithm must exploit contextual information. The method we use is supervised. Purandare and Pedersen (2004) took an unsupervised route: starting from raw text without sense annotation, they extracted contexts, removed function words from the sense definitions in the machine-readable dictionary Wordnet, built a co-occurrence matrix, reduced its dimensionality to 100 with Singular Value Decomposition (SVD), and finally used Latent Semantic Indexing (LSI) to find the most likely sense of the target word in a sentence. Jurafsky and Martin (2000) divide the commonly used contextual features into two classes: collocational features and bag-of-words information. The biggest difference between the two is that the latter only considers whether certain words appear within a fixed window around the target word, ignoring their order, while the former also encodes the relative position of the words before and after the target, and can even include syntactic dependency relations obtained from a parser. Besides sense-annotated corpora such as the Semantic Concordancer or the Senseval data, WSD methods can also use pseudowords or bilingual corpora. Pseudowords were introduced by Gale et al. (1992) and Schutze (1992) to save the large amount of human time needed to sense-tag training data: replacing every occurrence of, say, banana and door in a corpus with the artificial ambiguous word banana-door yields training data comparable to manually sense-tagged text. In addition, a word that is ambiguous in one language is often unambiguous in another; English duty has two senses that Chinese expresses with two different words (tariff vs. responsibility). Brown et al. (1991) and Gale et al. (1992) exploited this property, taking an English-French bilingual corpus as training data, building a context vector from a window of words (e.g., 50) around the target word, and using Bayesian classification to select the sense with the highest probability in a given context. We also adopt Bayesian classification, but with different collocational features around the target word; the idea behind Bayesian classification, that the surrounding words reflect the sense of the target word, is described in detail in Section 3. Yarowsky (1995) observed that within one discourse a target word usually keeps a single sense (one sense per discourse), and that the collocations of a target word also indicate its sense (one sense per collocation); our use of collocations as machine learning features is inspired by Yarowsky (1995). Lin (1997) noted that training a separate classifier for each ambiguous word is rather inconvenient, and proposed a single knowledge source instead: the syntactic dependency relations (such as verb-object relations) produced by his own MINIPAR English parser serve as features. What is special about his method is that it needs no sense-annotated corpus: words with the same meaning occur in the same dependency relations, i.e., in the same local context, and Lin (1997) reached accuracy comparable to other machine learning algorithms. Our use of dependency relations as features for sense disambiguation follows Lin (1997). On feature selection, Le and Shimazu (2004) proposed several features for English WSD together with a Forward Sequential Selection Algorithm for finding the best feature combination; we adopt the 5 features they proposed, add 4 more of our own, and use their Forward Sequential Selection Algorithm to find the best combination. Besides the methods introduced above, there are many other WSD methods, such as the mutual-information-based Flip-Flop algorithm (Brown et al. (1991)) and decision lists (Yarowsky (1994)); space prevents us from introducing them one by one. In recent years, newer machine learning algorithms such as Maximum Entropy, Support Vector Machines, and Conditional Random Fields have increasingly been used for WSD; the features and combinations selected in this paper could be used together with these methods as well.

132 ڇᖲ෷ᙇᖗဲᆠΔشܓؾᑑဲ೚อૠ٦֗אലࡌ໮ऱဲڼڂؾᑑဲऱრᆠΔנᄎ֘ਠဲ ᇡาऱտฯΖڶรԿᆏխᄎ ਬԫᒧ֮ີխԫଡؾᑑဲऱဲᆠຏൄਢࡐࡳਬԫଡဲᆠڇYarowsky (1995)ࣹრࠩ Աຍଡؾᑑဲऱဲᆠ(One sense perق༼One sense per discourse)Ζ׊ؾᑑဲऱჸ಻፿) ੡ᖲᕴᖂ฾ዝጩऱऄ௽ᐛ࠹ࠩ Yarowsky (1995)ऱඔ܂ჸ಻፿شࢬආ֮ءcollocation) Ζ נऱဲ։ܑಝᒭٵᖲᕴᖂ฾։ᣊᕴ(classifier)ࠐᙃᢝဲᆠᏁ੡լאᦸ࣍ڶ(࿇ΖLin (1997 ԫጟवᢝࠐᄭ(knowledge source)ऱֱٵشԫጟࠌנ༼ڼڂऱ։ᣊᕴΔᏅլֱঁΔٵլ ᣂএ(dependencyژաࢬ࿇୶ऱ MINIPAR ૎֮ଳ࣫ᕴ൓ࠩऱ፿ऄࠉ۞شܓהऄΖ ऱה࣍ڇֱچ੡ᖲᕴᖂ฾ዝጩऄऱ௽ᐛΖֺለ௽ܑऱ܂೯ဲፖ࠹ဲऱᣂএڕrelations)Δ ᣂএࢬิژऱࠉٵઌڶࠠڇ෼נ፿რऱဲᄎٵઌشܓᆠऱ፿றΔۖਢဲقऄլᏁ૞ᑑֱ ֽٵᒔ෷ሒࠩፖࠡ،ᖲᕴᖂ฾ዝጩऄઌإऱݝຝ፿ቼ(local context)ΖLin (1997) ऱګ ᣂ࣍௽ᐛڶ੡ဲᆠᙃᢝऱ௽ᐛᄭ۞࣍ Lin (1997)ऱუऄΖ܂ᣂএژ፿ऄࠉشආ֮ءᄷΖ Forward Sequential אᑇଡ௽ᐛࠀנ༼ऱᙇ࠷ΔLe and Shimazu (2004)ಾኙ૎֮ဲᆠᙃᢝ ऱ 5נ༼Le and Shimazu (2004)ࢬ شආ֮ءΔٽSelection Algorithm ࠐ൓່ࠩࠋऱ௽ᐛิ ऱ Forward SequentialشՂ 4 ଡ௽ᐛΔࠀُᅃ Le and Shimazu (2004)ࢬࠌףଡ௽ᐛ׼؆ ΖٽSelection Algorithm ൓່ࠩࠋ௽ᐛऱิ mutual information ऱ شܓڕᆠᙃᢝऱֱऄΔࠏဲڍ๺ڶೈԱՂ૿տฯऱֱऄΔᝫ decision list (Yarowsky (1994))࿛Δૻ࣍ᒧ༏ شFlip-Flop algorithm (Brown et al. (1991)),ࠌ شԳࠌڍᆠᙃᢝऱዝጩऄೈԱ Naïve Bayes հ؆Δ။ࠐ။ဲڣऄԫԫտฯΖ२༓ྤ Maximum EntropyΔSupport Vector MachineΔ֗ Conditional Random Field ࿛ለᄅऱᖲᕴ ΖشፖຍࠄֱऄԫದࠌאױऱֱऄՈٽࢬᙇ࠷௽ᐛࡉิ֮ءᖂ฾ዝጩऄΖ

3. Our Method

(1) Bayesian Classification

In our experiments we use a Naïve Bayes classifier to combine several kinds of features; we briefly describe it here.

Suppose the target word to be disambiguated has k senses s_1, \dots, s_k, and the context exhibits certain features f_1, \dots, f_n. The goal is to find the sense c that maximizes the conditional probability given those features. By Bayes' theorem, and assuming the features are conditionally independent given the sense, we obtain

c = \arg\max_{s_j} P(s_j \mid f_1, \dots, f_n) = \arg\max_{s_j} P(s_j) \prod_{i=1}^{n} P(f_i \mid s_j)

All of our experiments use this method for sense disambiguation; they differ only in the features selected.
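The Naïve Bayes classification used throughout this section can be sketched as follows; the class name, the toy training data, and the add-one smoothing are illustrative choices rather than the paper's exact implementation:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    """Tiny Naive Bayes sense classifier: s* = argmax_s P(s) * prod_f P(f|s)."""

    def fit(self, examples):
        # examples: list of (feature-list, sense) pairs
        self.sense_count = Counter(s for _, s in examples)
        self.feat_count = defaultdict(Counter)
        self.vocab = set()
        for feats, s in examples:
            self.feat_count[s].update(feats)
            self.vocab.update(feats)
        self.total = sum(self.sense_count.values())
        return self

    def predict(self, feats):
        best, best_lp = None, float("-inf")
        for s, cs in self.sense_count.items():
            lp = math.log(cs / self.total)          # log prior P(s)
            denom = sum(self.feat_count[s].values()) + len(self.vocab)
            for f in feats:                          # add-one smoothed P(f|s)
                lp += math.log((self.feat_count[s][f] + 1) / denom)
            if lp > best_lp:
                best, best_lp = s, lp
        return best

clf = NaiveBayesWSD().fit([
    (["money", "deposit"], "bank/finance"),
    (["loan", "money"], "bank/finance"),
    (["river", "water"], "bank/river"),
])
assert clf.predict(["money"]) == "bank/finance"
```

Swapping in different feature extractors (F1 through F9) changes only what goes into the feature lists, not the classifier itself.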

(2) Forward Sequential Selection Algorithm

For feature selection, we tried many kinds of features. With 7 kinds of features there are already 127 possible non-empty combinations, far too many to evaluate exhaustively one by one, so we use the Forward Sequential Selection algorithm proposed by Le and Shimazu (2004) to pick features. The method works roughly as follows: let S be an empty set; first pick the single best feature and put it into S; then add each remaining feature to S in turn and keep the one that yields the highest accuracy; repeat this until the accuracy no longer improves. The final set S is then a very good feature combination. Although it is not guaranteed to be the optimal solution, when applied to English WSD feature selection its gap from the true optimum is very small.
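The greedy selection loop of the Forward Sequential Selection Algorithm can be sketched as follows; the function name and the toy score table (which only mimics the shape of the paper's feature-selection experiments, not real data) are illustrative assumptions:

```python
def forward_sequential_selection(features, evaluate):
    """Greedy FSS: start from the empty set, repeatedly add the feature
    that most improves evaluate(subset); stop when no addition helps.
    `evaluate` maps a frozenset of feature names to an accuracy score.
    """
    selected, best_score = frozenset(), float("-inf")
    remaining = set(features)
    while remaining:
        cand, cand_score = None, best_score
        for f in remaining:
            score = evaluate(selected | {f})
            if score > cand_score:
                cand, cand_score = f, score
        if cand is None:          # no feature improves the score: stop
            break
        selected, best_score = selected | {cand}, cand_score
        remaining.remove(cand)
    return selected, best_score

# Toy accuracy table over three hypothetical features.
scores = {frozenset({"F4"}): 57.8, frozenset({"F7"}): 54.6,
          frozenset({"F1"}): 54.6,
          frozenset({"F4", "F7"}): 59.6, frozenset({"F4", "F1"}): 58.9,
          frozenset({"F7", "F1"}): 58.0,
          frozenset({"F4", "F7", "F1"}): 60.7}
sel, acc = forward_sequential_selection(["F1", "F4", "F7"],
                                        lambda s: scores.get(s, 0.0))
assert sel == {"F4", "F7", "F1"} and acc == 60.7
```

The greedy loop evaluates at most k subsets per step instead of all 2^k - 1 combinations.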

(3) Features

We tried 9 kinds of features in total, named F1 through F9. The first five mainly concern the words around the target word and their parts of speech; these 5 features are the ones used by Le and Shimazu (2004). The sixth and seventh emphasize the dependency relations of the target word, such as subject-verb and verb-object relations. The last two use the HowNet sememes of the two words before and after the target word, and of the words in a dependency relation with it, as (semantic) features. We introduce each feature in turn below.

F1 directly uses the words around the target word as features, but removes stop words such as is and a.
F2 also uses the surrounding words, but adds position information; for example, in "The art of design" with target word art, the extracted features are {(The, -1), (of, 1), (design, 2)}.
F3 is like F2, but extracts parts of speech instead of words.
F4 uses combinations of the target word and its surrounding words; taking "The art of design" as an example, the extracted features are {The-art, art-of, The-art-of, art-of-design, The-art-of-design}.
F5 is like F4, but takes combinations of parts of speech.

F6 uses the Sketch Engine to list, as features, the words that can stand in a grammatical collocation relation with the target word.
F7 uses the Stanford Parser: the words that the parser finds to be in a dependency relation with the target word, together with the relation types, are listed as features.
F8 takes the HowNet sememes of the two words before and after the target word as features.
F9 first uses the Stanford Parser to find the words in a dependency relation with the target word, then takes their HowNet sememes as features.
We use the Sketch Engine (Kilgarriff et al. (2004), http://www.sketchengine.co.uk/) word sketch query to find all the words in a grammatical collocation relation with the target word. Figure 1 shows the output for the target word duty: the object_of column lists the collocates that can take the target word as their object, subject_of those that can take it as their subject, a_modifier the adjectives that can modify it, n_modifier the nouns that can modify it, and modifies the words that the target word can modify. The English corpus we chose exceeds 2 billion words; such a large corpus ensures that most collocations are captured.

Figure 1: Sketch Engine word sketch output

The Stanford Parser, developed at Stanford University by Klein and Manning (2003), is a multilingual syntactic parser: given a treebank in Penn Treebank format, it can automatically be trained from the treebank into a parser for that language. Below is an example of the Stanford Parser's output.

Besides the part-of-speech tags and the syntactic structure, the parser also lists the typed dependency relations: for example, nsubj denotes the relation between a verb and its subject, dobj between a verb and its object, advmod between a verb and an adverbial modifier, and amod between a noun and its adjectival modifier. It must be stressed that the Stanford Parser's dependency relations are extracted from the parse trees with regular expressions, so even when the parse is correct the dependency relations are not necessarily correct. Below is the Stanford Parser's output.

The/DT government/NN first/RB established/VBD modern/JJ criminal/JJ investigation/NN system/NN in/IN 1946/CD ./.

(ROOT (S (NP (DT The) (NN government)) (ADVP (RB first)) (VP (VBD established) (NP (NP (JJ modern) (JJ criminal) (NN investigation) (NN system)) (PP (IN in) (NP (CD 1946))))) (. .))) det(government-2, The-1) nsubj(established-4, government-2) advmod(established-4, first-3) amod(system-8, modern-5) amod(system-8, criminal-6) nn(system-8, investigation-7) dobj(established-4, system-8) prep(system-8, in-9) pobj(in-9, 1946-10)
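Feature F7 reads typed-dependency lines like those above directly. A minimal sketch of turning such parser output into (relation, word) features; the function name and the regular expression are our own illustrative choices:

```python
import re

# Matches lines such as `nsubj(established-4, government-2)`.
DEP_RE = re.compile(r"(\w+)\(([^-]+)-(\d+), ([^-]+)-(\d+)\)")

def dependency_features(parser_output, target):
    """Collect (relation, other-word) pairs in which `target` participates,
    from Stanford-parser-style typed dependency lines."""
    feats = []
    for rel, head, _, dep, _ in DEP_RE.findall(parser_output):
        if head == target:
            feats.append((rel, dep))
        elif dep == target:
            feats.append((rel, head))
    return feats

deps = """det(government-2, The-1)
nsubj(established-4, government-2)
dobj(established-4, system-8)"""
assert dependency_features(deps, "established") == [
    ("nsubj", "government"), ("dobj", "system")]
```

The simple regex assumes unhyphenated tokens; a production extractor would need to handle hyphenated words and copy indices.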

HowNet (http://www.keenage.com), developed by Zhendong Dong (see Dong and Dong (2006)), is structured differently from Wordnet. Wordnet is essentially a lexical network: words with the same meaning belong to the same synset, each synset has a definition, and the lexical semantic relations in Wordnet include hypernyms, hyponyms, and so on. HowNet instead uses abstract sememes as the tools and units for expressing concepts. HowNet contains a wealth of information: it is a bilingual Chinese-English knowledge base covering sememes, semantic roles, hypernym-hyponym relations, part-whole relations, and other semantic information. For example, HowNet represents the sememes of doctor as: human|人: HostOf={Occupation|職位}, domain={medical|醫},

{doctor|醫治: agent={~}}}. The information in HowNet says that a doctor is a human, holds an occupation, belongs to the medical domain, and plays the agent role in a curing event. In our experiments we use only the first sememe in the HowNet representation; for doctor, for instance, the first sememe is human. For nouns, the first sememe roughly corresponds to the word's semantic class or ontology.

4. Experimental Results

Features F1-F6 all require a window size, which determines whether the extracted features include the words relevant to the target word's sense and therefore affects performance, so each of these six features comes with window-size experiments. In our experiments each kind of feature is first tuned independently to find its best parameters; after every feature has been optimized, the Forward Sequential Selection Algorithm decides which features to adopt. The corpus-processing and disambiguation programs are written in Perl and C++.

(1) F1
F1 is the simplest feature: it directly uses the words around the target word, but removes stop words such as is and a, since stop words usually say nothing about the sense of a word. Table 1 shows that F1's best window size is 3.

Table 1: F1 window size results
Window Size  Accuracy (%)
1            52.7
2            54.2
3            54.6
4            54.6
5            54.1

(2) F2
F2 also uses the words around the target word, but adds position information, recording at which position relative to the target word each word occurs. Table 2 shows that F2's best window size is 1.

Table 2: F2 window size results
Window Size  Accuracy (%)
1            54.9
2            53.6
3            51.1
4            47.9

(3) F3
F2 uses the words around the target word together with position information; F3 is similar to F2, but extracts parts of speech instead of words.

Table 3: F3 window size results
Window Size  Accuracy (%)
1            44.6
2            35.5
3            30.7
4            27.7

Thus F3's best window size is 1.

(4) F4
F4 uses combinations of the target word and its surrounding words; taking "The art of design" as an example, the extracted features are {The-art, art-of, The-art-of, art-of-design, The-art-of-design}.

Table 4: F4 window size results
Window Size  Accuracy (%)
1            48.2
2            56.9
3            57.8
4            57.8
5            57.8

Thus F4's best window size is 3.

(5) F5
F5 uses combinations of the target word and the surrounding parts of speech.

Table 5: F5 window size results
Window Size  Accuracy (%)
1            48.2
2            52.1
3            53.8
4            54.2
5            54.1

Thus F5's best window size is 4.

(6) F6
F6 uses the Sketch Engine to list, as features, the words that can stand in a dependency relation with the target word. The Sketch Engine has several adjustable parameters: the set of dependency relation types, the minimum salience, and the window size. In the window-size experiments, minimum salience is set to 0 and all dependency types are used; in the minimum-salience experiments, the window size is set to 5 and all dependency types are used; when selecting dependency types, minimum salience is set to 0 and the window size to 5.

Table 6: F6 window size results
Window Size  Accuracy (%)    Window Size  Accuracy (%)
1            50.5            11           51.6
2            51.1            12           51.4
3            51.6            13           51.4
4            51.8            14           51.1
5            52.0            15           50.8
6            52.0
7            51.8
8            51.5
9            51.4
10           51.5

Thus F6's best window size is 5.

Table 7: F6 minimum salience results
Minimum salience  Accuracy (%)
0.0               52.0
1.0               51.8
2.0               51.8
3.0               51.3

Thus F6's best minimum salience is 0.0.

For selecting dependency relation types, we again use the Forward Sequential Selection Algorithm to find the best combination.

Table 8: F6 dependency-type selection, step 1 (accuracy %): object 49.2, object_of 47.5, subject 48.4, subject_of 48.0, a_modifier 48.1, n_modifier 49.0, modifies 50.1, *comp 48.2, *comp_of 48.3, and/or 49.4, pp* 49.4, possessor 46.6, possessed 47.6, modifier 48.4, part* 48.5. Selected in this step: modifies.

Table 9: step 2: object 51.1, object_of 50.1, subject 50.2, subject_of 50.1, a_modifier 50.3, n_modifier 50.7, *comp 50.1, *comp_of 50.1, and/or 50.8, pp* 50.9, possessor 50.1, possessed 50.0, modifier 50.2, part* 50.2. Selected: object.

Table 10: step 3: object_of 51.2, subject 51.1, subject_of 51.1, a_modifier 51.3, n_modifier 51.5, comp 51.1, comp_of 51.2, and/or 51.5, pp* 51.4, possessor 51.1, possessed 51.0, modifier 51.2, part 51.1. Selected: n_modifier.

Table 11: step 4: object_of 51.6, subject 51.5, subject_of 51.5, a_modifier 51.7, comp 51.4, comp_of 51.4, and/or 51.7, pp* 51.6, possessor 51.5, possessed 51.5, modifier 51.4, part 51.3. Selected: a_modifier.

Table 12: step 5: object_of 51.6, subject 51.7, subject_of 51.7, comp 51.8, comp_of 51.7, and/or 51.9, pp* 51.8, possessor 51.6, possessed 51.7, modifier 51.8, part 51.7. Selected: and/or.

Table 13: step 6: object_of 51.9, subject 51.9, subject_of 51.9, comp 51.9, comp_of 51.9, pp* 51.9, possessor 51.9, possessed 51.9, modifier 52.0, part 51.9. Selected: modifier.

Table 14: step 7: object_of 52.0, subject 52.0, subject_of 51.9, comp 52.0, comp_of 52.0, pp* 51.9, possessor 51.9, possessed 51.9, part 52.0.

In step 7 no added dependency type raises the accuracy any further, so the final best set of dependency types is {modifies, object, n_modifier, a_modifier, and/or, modifier}. This result suggests that subject features are not very important for word sense disambiguation; the most important relations are modifiers and objects.

(7) F7
F7 uses the words that the Stanford Parser finds to be in a dependency relation with the target word, together with the dependency relation types (e.g., object_of, modifies), as features. The accuracy is 54.6%.

(8) F8
F8 takes the HowNet sememes of the two words before and after the target word as features. The accuracy is 47.2%.

(9) F9
F9 first uses the Stanford Parser to find the words in a dependency relation with the target word, then takes their HowNet sememes as features. The accuracy is 54.1%.

(10) Feature Selection

We apply the Forward Sequential Selection Algorithm to the nine features.

Table 15: Feature selection results (accuracy %; a dash marks features already selected)
Step  F1    F2    F3    F4    F5    F6    F7    F8    F9
1st   54.6  54.9  44.6  57.8  54.2  52.0  54.6  47.2  54.1
2nd   58.9  58.5  56.2   -    56.8  58.2  59.6  56.8  59.2
3rd   60.7  60.1  58.2   -    58.1  60.2   -    59.1  59.7
4th    -    61.2  60.1   -    58.8  60.6   -    60.4  60.7
5th    -     -    60.3   -    59.4  60.8   -    60.4  61.1

In Table 15 each row shows the accuracy when the feature of that column is added at that step. The first step selects F4; in the second step, the F1 column actually stands for F4+F1, the F2 column for F4+F2, and so on. The second step selects F7, so after step two the selected features are F4+F7; in the third step the F1 column stands for F4+F7+F1, and so on. In the fifth step no added feature raises the accuracy above that of step four, so the best feature combination consists of the combinations of the target word with its surrounding words (F4), the Stanford Parser dependency words and relation types (F7), the surrounding words (F1), and the surrounding words with position information (F2), with an accuracy of 61.2%.

The experimental results show that, since the ambiguous target words in Senseval-2 all share the same part of speech, POS-related features do not help sense disambiguation much; the helpful features are the words around the target word and the words in a dependency relation with it.

5. Conclusion

The table below lists the results of the Senseval-2 competition; these results use the same training and test data as ours.

Table 16: Senseval-2 English Lexical Sample results
System               Accuracy    System                          Accuracy
JHU (R)              64.2        Baseline Lesk Corpus            51.2
SMUls                63.8        Duluth B                        50.8
KUNLP                62.9        UNED - LS-T                     49.8
Stanford - CS224N    61.7        Baseline Commonest              47.6
Sinequa-LIA - SCT    61.3        Baseline Grouping Lesk Corpus   43.7
TALP                 59.4        Baseline Grouping Commonest     42.7
Duluth 3             57.1        Alicante                        41.1
JHU                  56.8        Baseline Grouping Lesk          26.8
UMD - SST            56.8        IRST                            24.9
BCU - ehu-dlist-all  56.4        BCU - ehu-dlist-best            23.3
Duluth 5             55.4        Baseline Grouping Lesk Def      23.0
Duluth C             55.0        Baseline Lesk                   22.6
Duluth 4             54.2        Baseline Grouping Random        18.3
Duluth 2             53.9        Baseline Lesk Def               16.3
Duluth 1             53.4        Baseline Random                 14.1
Duluth A             52.3

The data in the table show that randomly choosing a sense for each target word gives an accuracy of 14.1%, and always choosing the most frequent sense gives 47.6%. Our result of 61.2% is far above these baselines and also better than most of the participating systems, which indicates that the features we use are effective for sense disambiguation. Our experiments also show that subject features are not very important for sense disambiguation; modifiers and objects matter most. Although our accuracy is still 3% below the best Senseval-2 system, we are experimenting with other important features (such as Wordnet synsets, lexicographical files, and definitions) and trying other machine learning algorithms, such as Support Vector Machines (SVM) and CRF, hoping to further improve accuracy.

Acknowledgements

This research was supported in part by National Science Council grants NSC 93-2411-H-002-013, NSC 94-2411-H-002-043, and NSC 95-2411-H-002-045-MY2, for which we are grateful.

References

Brown, Peter et al. (1991) Word sense disambiguation using statistical methods. In ACL 29, pp. 264-270.
Dong, Zhendong and Dong, Qiang. (2006) Hownet and the Computation of Meaning. World Scientific.
Gale, William, Church, Kenneth, and Yarowsky, David. (1992) A method of disambiguating word senses in a large corpus. Computers and the Humanities 26:415-439.
Jurafsky, Daniel, and James H. Martin. (2000) Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. Prentice-Hall.
Klein, Dan and Manning, Christopher. (2003) Accurate Unlexicalized Parsing. Proceedings of the 41st Meeting of the Association for Computational Linguistics, pp. 423-430.
Le, Cuong Anh and Shimazu, Akira. (2004) High WSD Accuracy Using Naïve Bayesian Classifier with Rich Features. PACLIC 18, Tokyo. http://dspace.wul.waseda.ac.jp/dspace/bitstream/2065/564/1/oral-8.pdf
Lesk, Michael. (1986) Automatic Sense Disambiguation: How to tell a pine cone from an ice cream cone. In Proceedings of the 1986 SIGDOC Conference, pp. 24-26, New York. Association for Computing Machinery.
Lin, Dekang. (1997) Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity. In Proceedings of ACL-97, Madrid, Spain, July 1997.
Manning, Christopher, and Schutze, Hinrich. (1999) Foundations of Statistical Natural Language Processing. MIT Press.
Patwardhan, Banerjee, and Pedersen (2005) SenseRelate::TargetWord - A Generalized Framework for Word Sense Disambiguation. In Proceedings of the Twentieth National Conference on Artificial Intelligence, July 12, 2005, Pittsburgh, PA. (Intelligent Systems Demonstration)
Purandare and Pedersen (2004) Improving Word Sense Discrimination with Gloss Augmented Feature Vectors. In Proceedings of the Workshop on Lexical Resources for the Web and Word Sense Disambiguation, November 22, 2004, Puebla, Mexico.
Yarowsky, D. (1994) Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, pp. 88-95.

Hownet http://www.keenage.com/ senseval-2 http://193.133.140.102/senseval2/ Sketch Engine http://www.sketchengine.co.uk/ Stanford Parser http://www-nlp.stanford.edu/downloads/lex-parser.shtml

Word Translation Disambiguation via Dependency

Meng-Chin Hsiao1, Kun-Ju Yang2, and Jason S. Chang2 [email protected];[email protected];[email protected]

1 Institute of Information and Systems and Applications, National Tsing Hua University 2 Department of Computer Science, National Tsing Hua University

Abstract: We introduce a new method for automatic disambiguation of word translations using dependency relationships. In our approach, we learn the relationships between translations and dependency relationships from a parallel corpus. The method consists of a training stage and a runtime stage. During the training stage, the system automatically learns a translation decision list based on source sentences and their dependency relationships. At runtime, for each content word in the given sentence, we give the most appropriate Chinese translation relevant to the context of the given sentence according to the decision list. We also describe the implementation of the proposed method using the bilingual Hong Kong news and Hong Kong Hansard corpora. In the experiment, we use five different ways to translate content words in the test data and evaluate the results with an automatic BLEU-like evaluation methodology. Experimental results indicate that dependency relations can clearly help disambiguate word translations, and that some kinds of dependency are more effective than others.


Keyword: translation selection, statistical machine translation, parallel corpus, decision list, dependency.

1. Introduction

English is the major language in today’s world; for this reason, the latest knowledge and information is mostly written in English. People who want to get new information have to be good at reading English. Although non-native speakers of English can consult a dictionary to understand the meanings of a word, it is still difficult

to find the suitable translation for the in-context meanings of the words in a specific sentence. Hence, there are more and more machine translation systems on the web to help people overcome the language barrier. For example, BABEL FISH (http://babelfish.yahoo.com/translate_txt) and Google Translate (http://google.com/translate_t) are two representative machine translation services on the web. Traditional machine translation systems mostly translate by word or phrase. However, such a word (phrase)-based approach may lead to problems because it does not consider the structure of the sentence. Consider the word "motion" in the given sentence. When the sentence containing it was submitted to BABEL FISH (Figure 1) for translation, an incorrect answer was returned. To overcome the kind of limitation seen in BABEL FISH, many researchers consider cross-language phrasal information in statistical machine translation (SMT). At present, some machine translation systems (e.g., Google Translate) have been developed based on this idea to improve performance. However, when we submit the whole sentence containing "motion" and "passed" to Google Translate (Figure 2), we still cannot get a suitable translation of the word "passed", especially when "motion" and "passed" are far apart. Sometimes, words are not translated at all. To obtain the proper translation of the words in a sentence, a promising approach is to consider the syntactic information of the sentence, and to use it for improving the performance of word translation disambiguation (WTD).

Figure 1. Submitting text containing "motion" and "passed" to BABEL FISH for translation in Chinese

We present a new method that automatically determines the translation of given words in the sentence by considering dependency relationships between the words in a sentence. Dependency information includes structural information, and a dependency can be established between two words that are far apart in the sentence. For example, consider the following sentence: "I move that the motion on "Education on media literacy" as set out on the Agenda be passed." Here "motion" and "passed" have a passive nominal subject dependency. Intuitively, by conditioning the probability of translations of "motion" and "passed" on the dependency pair, nsubjpass(passed-20, motion-5), we can find the correct translation of the words for the context. The rest of the paper is organized as follows. We review the related work in the next section. Then we present our method in detail for automatically training a word translation disambiguation system (Section 3).

Afterward we compare the quality of results between the proposed model and other models (Section 4). Finally, we discuss the results, draw conclusions, and close with future work.

Figure 2. Submitting text containing “motion” and “passed” to Google Translate for translation in Chinese

2. Related Work

Word translation disambiguation has been an important problem in natural language processing. This problem is related to the WSD tasks and is one of the difficult issues in machine translation. In our work, we focus on finding the translations of each content word in the given sentence. The contexts would be English and the target words will be their translations in a second language (e.g., consider the word “motion” can be translated as “ ” or “” depending on the sentential content). Dagan, Itai, and Ulrike (1994) presented an approach for resolving lexical ambiguities in one language using a statistical data on lexical relationship in another language. Yarowsky (1994) showed that decision list (Rivest, 1987) is a good way to model the relation between the words and their translations. We also use the decision list in our approach for estimating translation probability of the word. Yarowsky (1995) exploited two powerful properties that one sense per collocation and one sense per discourse for WSD. He also presented a bootstrapping approach for word sense disambiguation. We also exploit one sense per dependency relationship in our approach. Pedersen (2000) presented a corpus-based approach to word sense disambiguation that builds an ensemble of Naive Bayesian classifiers, each of which is based on lexical features that represent co-occurring words in varying sized windows of context. Koehn and Knight (2000) present a novel approach to the WTD problem that can be trained using only unrelated monolingual corpora and a lexicon to estimate word translation probabilities using the EM algorithm. Zhou, Ding, and Huang (2001) also proposed an approach to training the translation model by using unrelated monolingual corpora. They parsed a Chinese corpus and an English corpus with

dependency parsers, and two dependency triple databases are generated. Then, the similarity between a Chinese word and an English word can be estimated using the two monolingual dependency triple databases with the help of a simple Chinese-English dictionary. Their translation model overcomes the long distance dependence problem to some extent. Their model can be used to translate Chinese collocations into English. In our approach, we only parse the English sentences in a parallel corpus with a dependency parser and try to translate English into Chinese. Li and Li (2002, 2004) considered bilingual bootstrapping as an extension of Yarowsky's approach. When the task is word translation disambiguation between two languages, they used the asymmetric relationship between the ambiguous words in the two languages to significantly increase the performance of bootstrapping. They have developed a method for implementing this bootstrapping approach that combines the use of naive Bayes and the EM algorithm. Ng, Wang, and Chan (2003) considered WSD when manually sense-tagged data is not available for supervised learning. They evaluated an approach to automatically acquire sense-tagged training data from English-Chinese parallel corpora. Pham, Ng, and Lee (2005) have investigated the use of unlabeled training data for WSD, in the framework of semi-supervised learning. Empirical results show that unlabeled data can bring significant improvement in WSD accuracy. We use a bilingual corpus but do not require sense annotation of the data, because we rely on a word alignment tool to annotate translation information of the words in the source sentences. In a study more closely related to our work, Carpuat and Wu (2005) proposed a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system.
However, they did not obtain significantly better translation quality than using the statistical machine translation system alone. But Cabezas and Resnik (2005) proposed using target language vocabulary directly as "senses," leading to a small improvement in translation performance over a state-of-the-art phrase-based statistical MT system. In previous work, human judgment is required for evaluation of sample-word tasks of WSD or WTD. In our research, our goal is to study the all-words task of WTD, and we propose an automatic evaluation methodology.

3. Word Translation Disambiguation Via Dependencies

Finding the appropriate translation of content words in a given sentence is important for machine translation as well as computer assisted language learning. State-of-the-art phrase-based statistical MT systems do a good job if the word providing the "hint" is nearby. Unfortunately, a phrase-based MT system may fail to make use of a word that is distant from the word we want to translate. To translate the words of the sentence, a promising alternative approach is to find the likely translation of each word through statistical analysis of its dependencies.

3.1 Problem Statement

We focus on a subtask of an MT system; that is, we focus on finding the appropriate translation of content words via dependencies. These dependencies provide recursive syntactic structure information about the words in the sentence. We collect these dependencies and the relevant translations from a parallel corpus and find the relationship between them. The goal is to find the proper translation of the content words in the given sentence. A formal statement of the problem is as follows.

148 Problem Statement: We are given an English sentence S (e.g., “A very big apple on the table was eaten by him.”) that we want to translate. Our goal is to give each content word, w1, w2, , wm, in S a most appropriate Chinese translation relevant to the context of S. For this, we derive dependencies (e.g., advmod (big-3, very-2), amod(apple-4, big-3), nsubjpass(eaten-9, apple-4), etc.), d1, , dp, in S, then use the dependencies of the word w (i.e., dependency relationship (w, w’) or dependency relationship(w’, w)) to find the most appropriate translation for w. In the rest of this section, we describe our solution to this problem. First, we define a dependency-based translation model for word translation disambiguation (Section 3.2). This training strategy relies on a set of dependency relationships derived from a dependency relationships collection. In this section, we also describe the other two strategies that we use when no dependency information is available. Finally, we show how our method handles a given sentence at run time by using a decision list (Section 3.3).

3.2 Training the Dependency-Based Translation Model

We take advantage of a word-aligned parallel corpus as training data to establish a decision list for word translations based on dependency relationships. For each word in a sentence, we obtain the translation and dependency relationships using word alignment tool (e.g., Giza++) and a general purpose parser (e.g., Stanford parser). With that information, we compute the word translation probability for all dependency relationships based on logarithmic likelihood ratio (LogL): P(t | w , d, w ) LogL Log( t d ) P(t | w d, w ) t, d count(t, w , d, w )) t d count(w , d, w ) count(t, w , d, w ) Log( t d ) Log( t d ) count(t , w , d, w ) count(t , w , d, w ) t d t d count(w , d, w ) t d (1) Parse the source language using a dependency parser (Section 3.2.1) (2) Use an alignment tool to align words in a parallel corpus (Section 3.2.2) (3) Compute the decision list for translation and dependency (Section 3.2.3) (4) Compute the probability of a translation for each word (Section 3.2.4) Figure 3. Outline of the process used to train in our method

3.2.1 Parse the source language using a dependency parser

In the first stage of the training process (Step (1) in Figure 3), we use the English part of an English-Chinese parallel corpus as the input data. First, we utilize a tagger to tokenize the sentences, give each word in the source sentences a part of speech (POS) tag, and obtain dependency relationships from the source sentences via a dependency parser. We use an English sentence as an example to show the process. (Figure 4).

3.2.2 Use an alignment tool to align words in a parallel corpus

In the second stage of the training process (Step (2) in Figure 3), we use a word alignment tool to align words in a parallel corpus. First, we lemmatize the tokens obtained from the first stage. Words that are tagged proper

149 noun are not lemmatized. Then, target language sentences are segmented using a word segmentation tool. Finally, each pair of source and target sentence is word-aligned using an existing word alignment model to produce word alignment information. Figure 5 shows an example of the process.

E: I move that the motion under the Interpretation and General Clauses

Ordinance as set out in the Agenda be passed.

tokenizing, POS tagging, dependency parsing

E: I/PRP move/VBP that/IN the/DT motion/NN under/IN the/DT Interpretation/NNP and/CC General/NNP Clauses/NNP Ordinance/NNP as/IN set/VBN out/RP in/IN the/DT Agenda/NNP be/VB passed/VBN ./.

nsubj(move-2, I-1) complm(passed-20, that-3) det(motion-5, the-4) nsubjpass(passed-20, motion-5) prep(motion-5, under-6) det(Interpretation-8, the-7) pobj(under-6, Interpretation-8) cc(Interpretation-8, and-9) nn(Ordinance-12, General-10) nn(Ordinance-12, Clauses-11) conj(Interpretation-8, Ordinance-12) mark(set-14, as-13) dep(motion-5, set-14) prt(set-14, out-15)

Figure 4. An example to show the result of the tagger and dependency parser

3.2.3 Compute the decision list for each translation and dependency

After deriving the translations and dependencies, we are in a position to train a classifier for WTD. We use the dependency relationship to condition the translation probability, and then we compute a score for each translation conditioned on one of the relevant dependency relationships. There are many different approaches to do this for various pattern recognition problems. We choose the decision list for simplicity and efficiency considerations. The algorithm we used is similar to the approach proposed in Yarowsky (1994) for WSD. For each possible translation of a given word, we compute the logarithmic likelihood ratio (LogL) count(t, w , d, w ) LogL Log( t d ) count(t , wt , d, wd ) where t is the translation of the word wt with dependency d with another word wd and count (t, wt, d, wd) is the number of instance of word wt aligned with the translation t under of dependency relationship d (wt , wd) d, and count ( t , wt, d, wd) is the number of instance of word wt aligned with the other translations t under the same relationship.1

Sample output is shown in Table 1. The LogL in Table 1 are computed by count (t, wt, d, wd) and count ( t ,

1 Here d (wt, wd) and d (wd, wt) are treated as different dependency relationships.

150 wt, d, wd). In the experiment described in Chapter4, we smooth count (t, wt, d, wd) and count ( t , wt, d, wd) by held out data.

E: I move that the motion under the Interpretation and General Clauses

Ordinance as set out in the Agenda be passed.

Lemmatize

E: I move that the motion under the Interpretation and General Clauses Ordinance as set out in the Agenda be pass. F:

Segment Words

E: I move that the motion under the Interpretation and General Clauses Ordinance as set out in the Agenda be pass. F:

Align Words

Result: NULL ({ 3 4 7 17 19 }) ({ 1 }) ({ 2 }) ({ 20 }) ({ 6 }) ({ }) ({ 8 }) ({ 9 }) ({ 10 11 }) ({ 12 }) ({ }) ({ }) ({ }) ({ }) ({ 5 }) ({ }) ({ }) ({ }) ({ 13 }) ({ 14 15 }) ({ 16 }) ({ 18 }) ({ }) ({ 21 })

Figure 5. An example to show the data handling after word alignment

Table 1. Calculating LOGL with N = count (t, wt, d, wd), N’= count (t , wt, d, wd), t z t

dep wd wt t LogL N N’ nsubjpass pass motion 1.608 294 58.8798 nsubjpass pass motion -1.975 43 309.8798 nsubjpass pass motion

3.2.4 Parse the source language using a dependency parser

In the last stage of the training process (Step (4) in Figure 3), we compute two types of word translation probability: P (t | w) and P (t | w, p). For unseen dependency relationships, we use P (t | w, p) to predict translation and for unseen word/POS combination, we use P (t | w) to predict translation for all POS’s. The word translation probability is calculated based on sentences where the source and target words are aligned.

151 We compute the word translation probability for all word aligned with a translation by the ratio of two counts: count(w,t) P(t | w) count(w) where count (w, t) is the number of instance of word w aligned with some translation, and count (w) is the number of w instances. Table 2. An example to calculate Table 3 Examples of P (t | w, p) for P (t | w) for “plant” the noun “plant” w t Count w t Count

plant 342 plant 341

plant 258 plant 245

plant 167 plant 132 plant plant P( |plant)=342/2172, P( |plant)=258/2172 P(|plant, Noun)=341/1852

We can then condition the translation probability using the POS information obtained from the first stage. Table 3 shows an example for the word “plant” that is tagged noun. count(w,t, p) P(t | w, p) count(w, p) where count (w, t, p) is the number of instances of word w with the POS p aligned with some translation, and count (w, p) is the number of w with the POS p instances.

3.3 Word Translation Disambiguation at Runtime

After the decision list and context-independent translation probabilities are obtained in the training process, we can then use them to disambiguate translations for the words in a sentence containing the words. The process of word translation disambiguation at runtime is shown in Figure 6.

Step 1: Parse the input sentence by a dependency parser Step 2: Select the highest score translation by using dependencies Step 3: Use P (t | w, p) to predict translation for unseen dependency relationship Step 4: Use P (t | w) to predict translation for unseen word/pos combination Figure 6. Outline of the process at run time In Step 1 we exploit a parser to obtain word tokens, POS tags, and dependency relationships of given sentence, and then we lemmatize all tokens except for words that are tagged as “NNP”. In Step 2 we determine the translation of words by using the most reliable piece of evidence. Figure 6 shows the process at runtime by using the sentence “In accordance with the Rules of Procedure, the motion and the amendment will be debated together in a joint debate.” as an example. In some situation, there is no dependency relationships information available to help us translate the word. In Step 3 we use POS information to find the translation of the word for unseen dependency relationship. In Step 4 we take the highest frequency translation to be the translation of the

152 word. If the word is not in our training data, we cannot translate the word.

E: In accordance with the Rules of Procedure, the motion and the amendment will be debated together in a joint debate.

(Step 1) tokenize, lemmatize, tag POS, parse

E: in accordance with the Rules of Procedure, the motion and the amendment will be debate together in a joint debate. In/IN accordance/NN with/IN the/DT Rules/NNPS of/IN Procedure/NNP ,/, the/DT motion/NN and/CC the/DT amendment/NN will/MD be/VB debated/VBN together/RB in/IN a/DT joint/JJ debate/NN ./.

prep(debated-16, In-1) pobj(In-1, accordance-2) prep(accordance-2, with-3) det(Rules-5, the-4) pobj(with-3, Rules-5) prep(Rules-5, of-6) pobj(of-6, Procedure-7) det(motion-10, the-9) nsubjpass(debated-16, motion-10) cc(motion-10, and-11) det(amendment-13, the-12) conj(motion-10, amendment-13) aux(debated-16, will-14) auxpass(debated-16, be-15) advmod(debated-16, together-17) prep(debated-16, in-18) det(debate-21, a-19) amod(debate-21, joint-20) pobj(in-18, debate-21)

(Step 2) translate content words (e.g., “motion”) based on dependencies

dep w w t LogL t d conj motion amendment 1.556 conj motion amendment -1.660 conj motion amendment -6.986 conj motion amendment nsubjpass debate motion 2.793

nsubjpass debate motion -3.475 nsubjpass debate motion -5.235

nsubjpass debate motion (Step 2) Determine translation according to LogL

motion:

Figure 7 An example of Step 1 and Step 2

4 Experiments and Evaluation

This approach was designed to disambiguate the translation of the words in the given sentence, by using the statistical properties of the dependency relationships with word to translation. In this section, we first describe the details of the experiments for the evaluation (Section 4.1). Then, we introduce the test data and automatic evaluation methodology and results. (Section 4.2)

153 4.1 Experimental Setting

In this section we will describe the implementation and experiments of the method described in section 3. For training the proposed model, we used a collection of approximately 740,000 sentence pairs from Hong Kong News English-Chinese Corpus (HKNC1997~2003) and approximately 1,375,000 sentence pairs obtained from Hong Kong Hansard English-Chinese Corpus (HKLC1985~2003). First, to preprocess the training data, we used Stanford Parser (Version 1.5.1) to implement tokenizing, POS tagging, and dependency parsing. We filtered out the English sentences with word length longer than forty or have some unusual letters. After filtering, we were left with approximately 630,000 sentence pairs from HKNC and approximately 1060,000 sentence pairs from HKLC. Then we use an in-house word lemmatization tool to lemmatize each word in the English sentences. We also segment each Chinese sentence using a word segmentation tool developed by CKIP in Academia Sinica. Finally, we use the Giza++v2 toolkit made available at (www.fjoch.com/GIZA++.html) to obtain word alignment information for the training data. In our experiment, we only use the direction of Chinese to English for word alignment information part. After filtering some errors that occurred in the word alignment process, we were left with about 556,000 sentence pairs from HKNC and about 983,000 sentence pairs from HKLC for training, and we reserved 3,500 sentence pairs obtained from HKNC and 9,500 sentence pairs obtained from HKLC for testing. Second, we grouped POS’s used in the Stanford Parser into nine groups. Table 4 shows the grouping of parts of speech. The grouping was done to reduce sparseness.

Table 4. The nine POS groups Pos Group Original tags Notes Light Verb have, do, know, think, get, go, say, see, come, 15 high-frequency verbs make, take, look, give, find, use (Svartvil and Ekedahl 1995) V VB, VBD, VBG, VBN, VBP, VBZ, ask verb but not light verb N FW, NN, NNS, PDT noun NNP NNP, NNPS proper noun C CD quantifier $ $ $, no., rule, section, J JJ, JJR, JJS, a adjective R RB, RBR, RBS, RP adverb F the other tag function word

Third, after calculating count (t, wt, d, wd ) and count ( t , wt, d, wd) described in section 3.2.3, we smoothed the counts for the unseen translations for wt and wd that have dependency relationship d using held out estimator that is purposed by Jelinek and Mercer(1985). We split training data into two parts that have equal number of sentence pairs. One was used as training data and the other was used as held out data, and then we changed the role of two parts and did held out estimation again. Table 5 shows the final modified number N.

Table 6 shows the results of smoothing. Every count ( t , wt, d, wd) had to add the counts for unseen translations.

154 Table 5. The average of two held out estimators Count C Obs. counts Set 1 Obs. counts Set 2 Smoothing counts 0 0.87995 0.87965 0.87980 1 0.25552 0.25562 0.25557 2 1.08762 1.08368 1.08565

8 7.14678 7.13740 7.14209 9 8.20037 8.15285 8.17661 Table 6. An example of smoothing dep wd wt t LogL N smoothing N N’ smoothing N’ nsubjpass pass motion 1.608 294 294 58 58.8798 nsubjpass pass motion -1.975 43 43 309 309.8798 nsubjpass pass motion -5.100 3 2.1331 349 349.8798 nsubjpass pass motion -5.778 2 1.08565 350 350.8798 nsubjpass pass motion -7.228 1 0.25557 351 351. 8798 nsubjpass pass motion Table 7. The properties of test data property HKNC HKLC

sentences 1,500 3,800

all words 31,569 84,290

content words 16,980 (53.79%) 42,343 (50.23%)

LV 585 (1.85%) 2,142 (2.54%)

be 858 (2.72%) 2,798 (3.32%)

F(Function words) 13,146 (41.64%) 37,007 (43.90%)

4.2 Evaluation and Discussion

In this section, we describe our test data and evaluation methodology (4.2.1). We then show the evaluation result of our experiment and give some discussions (4.2.2). 4.2.1 Test Data and Evaluation Methodology

We randomly choose 1,500 sentences out of 3,500 sentence pairs from HKNC and 3,800 sentences out of 9,500 sentence pairs from HKLC for testing. Then we translate the content words in the given sentences of test data. We did not consider the translation of the words that POS tagged in the group F and LV, also not did we consider the translation of the verb “be”. Table 7 shows the properties of our test data. The traditional WSD evaluation methodology relies on human judgment. In our experiment, we do not focus on the sense of the words, but rather the translation of the content words in the given sentences. Since it is

155 infeasible for human to evaluate such a large set of data, we developed a BLEU-like automatic evaluation methodology. We evaluate one sentence at a time. First, we combine all translations of content words that in the given sentence. Identical an overlapping translations of two neighboring word are combined and redundancy are removed. For example, see Figure 8 for more details. EPD/ and Transport/ Department/ will also/ step/ up/ their programmes/ to educate/ vehicle/ owners/ and mechanics/ to exercise/ their responsibility/ to maintain/ vehicles/ properly/ .

Combining translations

Æ Æ Æ Æ Æ Æ Æ Æ Æ Æ Æ Æ Figure 8. An example to show how to combine the translations of the sentence

Second, we calculated unigram precision rate based on the aligned Chinese sentence as the reference translation. Figure 9 shows an example of the process. Third, we filtered the highest ten percentage and lowest ten percentage sentences for data balance, and then we average the score of middle eighty percentage sentence to be the result. Result after combine:

Answer: the number of the Chinese characters matching the reference 19 Score = ------= ------number of the Chinese characters produced 27

Figure 9. An example to show how to calculate the score of the sentence

4.2.2 The Results of WTD of Different Methods

We used five different methods to disambiguate word translations. Table 8 shows the results of WTD. Baseline is the result of using only P (t | w) to estimate the translation of the content words, while baseline with POS is the result of using both P (t | w, p) and P (t | w), dependency method (all) is the result of using the process described in 3.3, window size 1 is the result of using a window based co-occurrence ( the word to the right or left of the word in question) instead of dependency, and dependency method (some) is the result of using the process described in 3.3 leaving out five kinds of dependency relationships, including determiner, negative, possessive, coordinating conjunction, preposition.

156 Table 8. Results of WTD in different methods Method HKNC HKLC HKNC+HKLC Baseline 0.582 0.564 0.569 baseline + POS 0.589 0.569 0.575 window size 1 0.698 0.643 0.659 dependency method (all) 0.714 0.686 0.694 dependency method (some) 0.716 0.685 0.694 The results in Figure 10 indicate that the dependency method obviously outperforms baseline with POS and also outperforms window based co-occurrence approach. We also found that using POS can only improve slightly and ignoring some kinds of dependency relationships does not affect the results too much.

Figure 10. Results of WTD in different methods However, we did not evaluate the performance of translations based on dependency. As shown in Table 9 that over ninety percent of the cases, words are translated via dependency. Table 9. The percentage of the word translations when we used dependency method (some) type HKNC HKLC all content words 16,980 42,343 dependency method 15,698 (92.45%) 38,587 (91.13%) baseline +POS 1,199 (7.06%) 3,610 (8.53%) baseline 12 (0.67%) 34 (0.08%) no answer 71 (0.42%) 112 (0.26%) As shown in Figure 11, in different number of sentences, the results of HKNC are better than the results of HKLC. We believe this is a result of the different character of the corpora and not the different number of sentences in the two corpora. Because of data sparseness, we may not calculate a suitable score for translations of the words with dependency relationships. If we use lager training set, we may improve the performance. Some word translation errors may be caused by word alignment errors. In addition, there also have some problems caused by incorrect segmentation. For example, “” is segmented into “” and “”, but in our

157 module we only consider the one to one case, therefore the word “smoker” will be translate to “” and not “”.

Figure 11. Results of WTD in different corpora Table 10 an example to explain majority voting methodology Dep wt wd t LogL advmod plant at 2.502 Advmod plant at -3.131 Dep wt wd t LogL Amod plant chemical 1.290 Amod plant chemical -3.568 Amod plant chemical -3.568

5 Future Work and Conclusion

In summary, we have introduced a method for word translation disambiguation, which improves the ability to disambiguate the translations of the content words in the given sentence using a dependency-based translation model trained as a parallel corpus. We have implemented and evaluated the method using a bilingual English-Chinese corpus. We have shown that the method outperforms the baseline. In addition, we also found some kinds of dependency are more effective than others. Moreover, we have purposed an automatic BLEU-like evaluation methodology for WTD. The results of word translation disambiguation can assist user in reading English, and also can be used as additional input information for an MT system to improve the performance. Many future directions present themselves. First, it would be interesting to extend the method to translate all words in the sentence including function words. Second, we can give different weight to different type of dependency since we believe different type of dependency relationships have different level of effectiveness. Third, we are currently using the dependency relationships with the highest score, but we can also consider all dependency relationships of the word in the given sentence. Table 10 shows an example that “plant” that has two dependency relationships with “at” and “chemical”. In the way we described in our approach, we will choose “” as the answer. If we combine scores of two dependency relationships to calculate a new score for

158 each translation, we may choose “” as our answer which seems to be more suitable.

References 1. Bengt Altenberg and Sylviane Grange. “The grammatical and lexical patterning of make in native and non-native student writing”. Applied Linguisics, 22(2), 173-194, 2001. 2. Clara Cabezas and Philip Resnik. “Using WSD Techniques for Lexical Selection in Statistical Machine Translation”. http://handle.dtic.mil/100.2/ADA453538 , July 2005. 3. Marine Carpuat and Dekai Wu. “Word Sense Disambiguation vs. Statistical Machine Translation”. In 43th Annual Meeting of the Association for Computation Linguistics (ACL 2005), 2005. 4. Dagan, Ido, Alon Itai, and Ulrike Schwall. "Two Languages are More Informative than One". In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL91). Berkeley, 1991. 5. Philipp Koehn, and Kevin Knight. “Estimating word translation probabilities from unrelated monolingual corpora using the EM algorithm”. In Proceedings of the 17th National Conference on Artificial Intelligence, pages 711–715, Austin, TX, 2000. 6. Cong Li and Hang Li. “Word translation disam-biguation using bilingual bootstrapping”. In Proceed-ings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 343-351, 2002. 7. Yajuan Lü, Ming Zhou, Sheng Li, Changning Huang, Tiejun Zhao (2001b). “Automatic translation template acquisition based on bilingual structure alignment”. International Journal of Computational Linguistics and Chinese Language Processing. 6(1), pp. 1-26, 2001 8. Hwee Tou Ng, BinWang, and Yee Seng Chan. “Exploiting parallel texts for word sense disambiguation: An empirical study”. In Proceedings of ACL-03, Sapporo, Japan, pages 455–462, 2003. 9. K. Papineni, S. Roukos, T. Ward, and W. Zhu. “Bleu: a method for automatic evaluation of machine translation”. In Proceedings of 40th Annual Meeting of the ACL, Philadelphia, 2002. 10. Ted Pedersen. “A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation”. 
In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics, Seattle, 2000. 11. Thanh Phong Pham, Hwee Tou Ng, and Wee Sun Lee. “Word sense disambiguation with semi-supervised learning” AAAI-05, The Twentieth National Conference on Artificial Intelligence, 2005. 12. Dan Klein and Christopher D. Manning. “Fast exact inference with a factored model for natural language parsing”. In Suzanna Becker, Sebastian Thrun, and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15, Cambridge, MA.MIT Press, 2003. 13. D. Yarowsky. “Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French”. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994. 14. D. Yarowsky. “Unsupervised word sense disambiguation rivaling supervised methods”. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pages 189–196, 1995. 15. Ming Zhou, Yuan Ding, and Changning Huang. “Improving translation selection with a new translation model trained by independent monolingual corpora”. Computational linguistics and Chinese Language Processing. Vol. 6, No. 1, pp 1-26, 2001.

159 Knowledge Representation for Interrogatives in E-HowNet

Shu-Ling Huang, You-Shan Chung, Yueh-Yin Shih, Keh-Jiann Chen

CKIP, Institute of Information Science, Academia Sinica

{josieh, yschung, yuehyin, kchen}@iis.sinica.edu.tw

Abstract In order to train machines to ‘understand’ natural language, we proposed a universal concept representational mechanism called E-HowNet to encode lexical semantics. In this paper, we take interrogative constructions as examples, i.e. concepts or sentences for asking questions or making inquiries, to demonstrate the mechanisms of semantic representation and composition under the framework of E-HowNet. We classify the interrogative words into five types according to their semantic distinctions, and represent each type with fine-grained features and operators. The process of semantic composition and the difficulties of the representation, such as word sense disambiguation, will be addressed. Finally, we’ll show how machine discriminates two synonymous sentences with different syntactic structures and surface strings to prove that machine understanding is achievable.

Keywords: semantic representation, interrogatives, E-HowNet

1. Introduction

To understand natural language by machines, lexical semantic representation and composition are the most important techniques. In this paper, we will take the interrogatives as examples to demonstrate the mechanism of lexical semantic representation and composition in E-HowNet (Chen et.al., 2004). E-HowNet uses the word sense definition mechanism of HowNet (Dong, 1988) and the vocabulary of WordNet (Fellbaum,1998) synsets to describe concepts.1 Its goal is to achieve near canonical semantic representation,

1 The advantage of using WordNet synsets is that each synset has unique sense and sense similarity between two synsets can be measured through WordNet ontology. But there is disadvantage of WordNet-like ontologies, for example, each concept class in WordNet has limited linking to other concepts. The major links are hyponymy relations which limit inheritance and inference capability to the classes on the taxonomy. So we adopt similar

161 that is, two sentences with different surface forms or in different languages may achieve similar E-HowNet representations. Take sentences (1) and (2) as examples:

(1) Is it OK for me to take pictures?

(2) Can I take photos?

Although the syntactic structure and surface strings of (1), (2) are very different, by using lexical sense definitions in E-HowNet, we hope machine can ‘understand’ that they are synonymous sentences.

Analysis of interrogative constructions is of great interest to linguists, as well as to computer scientists, for example, those who engaged in QA techniques. Interrogative constructions have played a central role in the development of modern syntactic theory.

Ginzburg & A.Sag (2000) have pointed that interrogative has been at the heart of work in generative grammar, also in government and binding (GB) and head-driven phrase structure

(HPSG). Nonetheless, to date most syntactic work has taken place quite separately from semantic and pragmatic work on interrogatives. Taking questions in Mandarin Chinese as example, Shao (1996) has summed up the current study of interrogatives and listed the main research themes as follows: the types of question, interrogative particles, querying focus and its answer, degree of doubt and special interrogative sentences pattern etc.. Most of the above themes are purely grammatical analysis. To build a frame-based entity-relation knowledge representation model, we find interrogative construction a good and challenging example, for it is feature-structured, free formed, and demanding for story comprehension. In other words, it combines problems of syntax, semantics and pragmatics. Our approach is to find a framework to represent interrogatives, therefore, semantic distinction of interrogatives is our focus.

In E-HowNet, we made distinctions between content sense and relational sense and mechanism in HowNet to define word sense, which represents concepts in more accurate way by not restricting the definition vocabulary to a closed set of primitives only, i.e., any well-defined concepts can be used to define a new concept. The detail discussions can be seen at (Chen et al., 2005).

162 represent senses of content words and senses of function words in different ways. For instance, we represent phrase ‘bathe with cold water’ as (3)

(3) bathe with cold water bathe def: {clean|: patient={body|}} with def: instrument={} cold water def: {water|: temperature={cold|}}}. composition: {clean|: patient={body|}, instrument={water|: temperature={cold|}}}. In this case, content words ‘bathe’ and ‘cold water’ are represented differently from function word ‘with’, for the latter plays the role of linking concepts. In much the same way, interrogative words have more relational sense than content sense, so they are defined by semantic role to denote relational sense and the operator ‘.Ques.’ to mark the querying focus that is the object or its discrimination features which speakers want to know.

In the following section, we’ll briefly describe the previous work for interrogatives. Then, we introduce our analysis of type classifications for interrogatives and their representation in

E-HowNet. Next, we present the composition of interrogative sentences and the difficulties encountered. We conclude the paper by discussing our results and future works.

2. Background

Questions in Chinese studies traditionally attributed to mood category of syntactics.2 Most of linguists consider there are four grammatical devices that explicitly mark an utterance as an interrogative. 3First is question which can be answered by ‘yes’ or ‘no’, called factual question, true and false interrogative or yes/no interrogative. Second is question which includes Wh-words such as ‘who’, ‘what’ or ‘when’ and so forth, called Wh-word

2 Ma (1989wrote the first grammar book for Mandarin Chinese. He classified interrogative words into mood category. Later, Li (1930) and Lv (1942) have carried forward his viewpoint and influenced on modern research of questions deeply. 3 Such as Lv (1942), Li & Thompson (1997), Tang (1983), Lu (1984), Shao (1996),etc..

163 interrogative, or information seeking interrogative. Third is question which mentions two or more possible alternative answers, called disjunctive interrogative or either/or interrogative.

Fourth is question which is composed of a statement followed by an A not A form, such as dui bu dui ‘right not right’, xing bu xing ‘Ok not Ok’ etc., called A not A interrogative or tag interrogative.4 From different analytical perspective, these four question devices may have different hierarchy. For example, Lv (1942) framed them as (4) while Lu (1984) structured them as (5):

(4) interrogative – Wh-word interrogative

– true/false interrogative – A not A interrogative

– disjunctive interrogative

(5) interrogative – true/false interrogative

– non true/false interrogative – Wh-word interrogative

– disjunctive interrogative

– A not A interrogative

Generally speaking, true and false interrogative and Wh-word interrogative are regarded as basic types.

3. Semantic Representation

3.1 Our classification of interrogatives

As we focus on knowledge representation, we are more concerned about semantic discriminations for different interrogative sentences. Therefore, we take a sense-based approach to create a hierarchical classification which is guided by a layered semantic hierarchy of answer types, and eventually classifies interrogative sentences into fine-grained classes, shown as (6):

4 A tag interrogative is formed by adding a short A not A form of certain verbs as a tag to a statement. In this paper, we treat it as a general A not A question device because of its identical semantic behavior.

(6) interrogative
      – (A) true/false interrogative
      – (B) Wh-interrogative
          – (a) asking factual information
          – (b) asking relationship
          – (c) asking opinion
          – (d) option choosing

According to the querying focus, we separate (A) the true/false interrogative from (B) the Wh-interrogative. Take sentences (7) and (8) as examples:

(7) Do you like this game?

(8) Who knows where I can find this game?

Sentence (7) belongs to the former question device because the entire statement is the querying focus. In contrast, sentence (8) indicates two querying focuses by using the different interrogatives 'who' and 'where'. In other words, the true/false interrogative asks for the truth value of the whole sentence, while the Wh-interrogative is used to ask for information. By analyzing the querying focus, the latter can be further divided into four types: (B-a) asking factual information, such as time, location, quantity, and so forth; (B-b) asking relationship, such as kinship; (B-c) asking opinion or attitude, such as possibility, capacity, volition, etc.; and finally (B-d) asking to choose an option. Sentence (8) belongs to type (B-a). For each of the remaining types, we give an example as follows:

(9) What is the relationship between you and her?

(10) Can he eat hot peppers?

(11) Is rice-washing water acidic or alkaline?

Here, two distinctions have to be made. First, sentence (9) belongs to type (B-b), but why do we need to separate it from type (B-a) when both use 'what' to form questions? In sentence (9), the question word 'what' asks for a relationship, not the type of a frame element or the value of a semantic role.5 Chen et al. (2004) proposed a complex relation description, i.e., a representation that explicitly denotes the relation between the head and a semantic role. In general,

the E-HowNet semantic representation model presumes that the head is the default variable of a relation. For example, 'white cloud' is defined as (12):

(12) white cloud def:{cloud|:color(~)={white|}}

In this definition, '~' indicates the head 'cloud' and is normally omitted in the expression.

Conversely, a word that denotes a complex relation always involves another relation variable apart from the head, so that variable needs to be marked explicitly. For example, we express 'mother-in-law' as (13):

(13) mother-in-law def:{human|=mother(spouse(x:human|))}

According to this representation model, when the querying focus is a complex relation, we put the question operator before the relation role (e.g., mother, spouse, parents) to form the interrogative definition. This is what distinguishes interrogative types (B-a) and (B-b).

More examples are given in Section 4.
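To make the contrast concrete, the two definition shapes can be sketched programmatically. The helper functions below are our own illustration, not part of E-HowNet or its tooling; they merely assemble the notation strings used in this paper:

```python
# Hypothetical sketch: the two interrogative definition shapes.
# A (B-a) question marks a *role value* as queried: role={.Ques.}
# A (B-b) question marks the *relation role* itself: .Ques.Relation(...)

def factual_def(head, roles, queried_role):
    """Build a (B-a)-style definition: the queried role's value is .Ques."""
    parts = [f"{role}={{{value}}}" for role, value in roles.items()]
    parts.append(f"{queried_role}={{.Ques.}}")
    return "def:{" + head + "|:" + ",".join(parts) + "}"

def relation_def(entity, relation_chain, anchor):
    """Build a (B-b)-style definition, e.g. def:{he|=.Ques.kinship(listener|)}."""
    inner = anchor + "|"
    for rel in reversed(relation_chain):
        inner = f"{rel}({inner})"
    return "def:{" + entity + "|=.Ques." + inner + "}"

# (8)-style question: a factual role (location) is queried.
print(factual_def("find", {"agent": "speaker|"}, "location"))
# (9)/(21)-style question: the relation itself is queried.
print(relation_def("he", ["kinship"], "listener"))
```
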

Second, some may argue that there is no distinction between type (A) and type (B-c).

Comparing sentences (7) and (10), we find that both have yes/no answers.

(7) Do you like this game?

(10) Can he eat hot peppers?

But, further considering sentence (14):

5 For the disambiguation of 'what', see Section 4.1.1.

(14) Does he eat hot peppers?

We can still find a slight difference between a yes/no question and an information-seeking question. Modal words like 'can', 'shall', and 'will' carry richer meaning than general verbs: they are used not to ask for a truth value but to ask for an opinion or attitude. In Mandarin Chinese, 'can not can' and 'like not like' both take the A not A form, so on syntactic grounds they are often assigned to the same type; from the semantic point of view, however, we assign them to different types.6

3.2 Knowledge Representation for Interrogatives

According to the classification above, we represent each type of interrogative as follows:

(15) true/false interrogative def: truth={.Ques.}

Wh-word interrogatives
    asking factual information def: role={.Ques.}
    asking relationship def: .Ques.RelationRole()
    asking opinion def: ModalityRole={.Ques.}
    option choosing def: role={.Option.{{x}.or.{y}}}

We use two operators, .Ques. and .Option., to denote the querying focus and optional items, respectively. Real examples are shown in Table 1.
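As an illustration of how the .Option. operator combines alternative values for the queried role, here is a small string-building sketch; the helper function is hypothetical, not an E-HowNet API:

```python
# Illustrative sketch (our own helper, not an official E-HowNet API):
# the .Option. operator lists alternative values for the queried role.

def option_def(head, roles, queried_role, options):
    body = [f"{r}={{{v}}}" for r, v in roles.items()]
    alts = ".or.".join("{" + o + "}" for o in options)
    body.append(f"{queried_role}={{.Option.{{{alts}}}}}")
    return "def:{" + head + "|:" + ",".join(body) + "}"

# A "Does he live here or there?"-style question:
print(option_def("live", {"agent": "he|"}, "location", ["here|", "there|"]))
# -> def:{live|:agent={he|},location={.Option.{{here|}.or.{there|}}}}
```
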

6 Shao (1996) classified the A not A form into five classes according to A's part of speech: (1) A is a copula, e.g. 'be not be'; (2) A is a modal word, e.g. 'ok not ok'; (3) A is an auxiliary, e.g. 'willing not willing'; (4) A is a verb, e.g. 'understand not understand'; (5) A is an adjective, e.g. 'beautiful not beautiful'. From the semantic perspective, we merge (1), (4), (5) and (2), (3), re-dividing these five classes into two categories: modal A not A interrogatives and other A not A interrogatives.

Table 1. The Type Classification and Semantic Representation of Interrogatives

Question device                    Representation of examples
true/false interrogatives          def: truth={.Ques.}
  (ma,7 A not A, etc.)
Wh-word interrogatives:
  asking factual information       def: participant={animate:.Ques.};
                                   def: time={.Ques.};
                                   def: participant={inanimate:.Ques.};
                                   def: formal={.Ques.};
                                   def: reason={.Ques.};
                                   def: location={.Ques.};
                                   def: quantifier={.Ques.};
                                   def: manner={.Ques.};
                                   def: means={.Ques.};
                                   def: degree={.Ques.};
                                   def: quantity={.Ques.}
  asking relationship              def: .Ques.RelationRole()
  asking opinion                   def: allowance={.Ques.};
                                   def: willingness={.Ques.};
                                   def: capacity={.Ques.};
                                   def: possibility={.Ques.}
  choosing options                 def: role={.Option.{{x}.or.{y}}}

The interrogatives above are gathered from Li & Thompson's analysis and consolidated by manually checking over 1,000 question titles on Baidu Knows (http://zhidao.baidu.com/).

4. Semantic Composition

The previous discussion concerned the logical representation of events. To establish a formal system for tasks requiring language understanding, we also need to address the issue of semantic composition. Through the segmentation and parsing process, we obtain coarse-grained arguments and the head of the sentence. Take sentence (16) as an example:

(16) Why is the data missing?

7 In this paper our focus is semantic representation, so we do not discuss the interrogative particles 'a', 'ba', or 'ni', since whether they are interrogative depends on the tone.

The segmentation and parsing result of (16) is:

Theme[NP: data] + reason[Dj: why] + Head[VJ3: lose]

We then try to map the surface syntax onto a semantic structure to establish fully integrated semantic relations. In example (16), we obtain the head 'lose' from the segmentation and parsing process; based on E-HowNet, the arguments of the event 'lose' are 'possessor' and 'possession', so we know that 'data' here is the possession of 'lose'. The composition is therefore:

def:{lose|:possession={information|},reason={.Ques.}}
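The composition step can be sketched as a lookup of the head's argument frame followed by role mapping. The FRAMES table below is a toy stand-in for the E-HowNet lexicon, and compose is our own illustrative helper, not the paper's actual system:

```python
# Toy sketch of semantic composition: map parsed surface arguments onto the
# argument frame of the head event, then attach the queried role.
# FRAMES is a stand-in for the E-HowNet lexicon, not real resource data.

FRAMES = {
    "lose": {"Theme": "possession"},   # surface Theme -> semantic 'possession'
}

def compose(head, parsed_args, queried_role):
    frame = FRAMES[head]
    parts = []
    for surf_role, filler in parsed_args:
        sem_role = frame.get(surf_role, surf_role)  # map surface to semantic role
        parts.append(f"{sem_role}={{{filler}}}")
    parts.append(f"{queried_role}={{.Ques.}}")
    return "def:{" + head + "|:" + ",".join(parts) + "}"

# (16) "Why is the data missing?"
# parse: Theme[NP: data] + reason[Dj: why] + Head[VJ3: lose]
print(compose("lose", [("Theme", "information|")], "reason"))
# -> def:{lose|:possession={information|},reason={.Ques.}}
```
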

The other types of interrogative words can also be composed into different sentences, as shown below:

true/false interrogative
(17) Is he sick? def:{sick|:experiencer={he|},truth={.Ques.}}
(18) Have you lied before? def:{lie|:agent={listener|},time={past|},truth={.Ques.}}

Wh-word interrogative:

asking factual information
(19) How to wash away ink stains on cloth? def:{wash|:patient={ink|:place={clothes|}},means={.Ques.}}
(20) Which tables are broken? def:{OutOfOrder|:theme={table|:quantifier={.Ques.}}}

asking relationship
(21) What is the relationship between you and her? def:{he|=.Ques.kinship(listener|)}

asking opinion

(22) Is it possible a ghost town? def:{be|:relevant={this|},content={ghost town|},possibility={.Ques.}}
(23) Is it ok for you to drive? def:{drive|:experiencer={listener|},willingness={.Ques.}}

choosing options
(24) Does he live here or there? def:{live|:agent={he|},location={.Option.{{here|}.or.{there|}}}}
(25) Does he kneel on the ground or stand there to beg ChangShan? def:{beg|:agent={he|},target={ChangShan|},means={.Option.{{stand|}.or.{kneel|}}}}

4.1 Disambiguation

To achieve the goal of automatic composition, we have to face the challenge of sense ambiguity. In Chinese, 'she me' (what), 'ze me' (how/why), and 'duo' (how) are the most frequently used interrogatives with ambiguous senses. Their disambiguation rules and representations are discussed below:

4.1.1

'She me' (what) plays the grammatical functions of adjective and pronoun, and each function has two senses. Accordingly, we generalize four rules to disambiguate its word sense; the details are shown in Table 2:

Table 2. Disambiguation Rules for 'what'

Rule                                             Examples & E-HowNet representation
adjective 1: 'what' + semantic role;             'what time' time={.Ques.};
  asks the value of the semantic role            'what price' price/cost={.Ques.};
                                                 'what place' location={.Ques.};
                                                 'what situation' condition={.Ques.};
                                                 'what color' color={.Ques.}
adjective 2: 'what' + entity (nominalized        participant={entity:formal={.Ques.}};
  verbs included); asks the type/restriction     'what person' participant={human|:formal={.Ques.}};
  of a frame element / participant role          'what car' participant={car|:formal={.Ques.}};
                                                 'what change' participant={change|:formal={.Ques.}};
                                                 'what difference' participant={difference|:formal={.Ques.}}
pronoun 1: V + 'what'; used as an                {event:participant={.Ques.}};
  interrogative pronoun functioning as NP        'eat what' {eat|:patient={.Ques.}};
                                                 'talk what' {speak|:content={.Ques.}}
pronoun 2: used as an indefinite pronoun         {event:participant={x}} or
  functioning as NP; the role played is          {event:participant={entity:quantity={all}}};
  either coindexed or universally quantified     'It's OK to get anything' {hold|:patient={entity:quantity={all}}};
                                                 'Be afraid of nothing' {.not.fear|:cause={entity:quantity={all}}}

4.1.2

'Ze me' (how/how come) plays the role of an adverb. It asks the value of an adverbial type of semantic role: means/method ('how') or reason ('why'). The two meanings can roughly be discriminated by the telicity of the matrix verb. That is, 'ze me' in a sentence with a telic verb or verb phrase, i.e. one referring to an event with an endpoint, means 'why'; conversely, 'ze me' in an atelic sentence means 'how'. Take sentences (26) and (27) as examples:

(26) def:{come|:means={.Ques.}}

(27) def:{come|:reason={.Ques.}}

In short, we conclude the disambiguation rule as follows:

(28) (How) + event[-telic]
     (Why) + event[+telic]

4.1.3

'Duo' (how) also plays the role of an adverb. It is usually followed by an attribute value, such as 'sweet', 'smart', 'far', or 'big', and can be used to express exclamation or doubt. These two senses cannot be distinguished simply from the context; they rely on the tone. For this reason, we deal only with the sense of doubt.

Incidentally, it is always possible to turn a declarative statement into a question by using a slightly rising intonation pattern. For the same reason, we do not deal with such sentences, nor with a few interrogative particles such as 'a', 'ba', and 'ni'.

'Duo' (how) with the interrogative sense can be represented as below:

(29) 'duo' + attribute value def:{attribute value:degree={.Ques.}}

The real examples are:

(30) ‘how sweet’ def:{sweet|:degree={.Ques.}} = def: sweetness={.Ques.}

‘how smart’ def:{smart|:degree={.Ques.}}= def: smartness={.Ques.}

‘how far’ def:{far|:degree={.Ques.}} = def: distance={.Ques.}

‘how big’ def:{big|:degree={.Ques.}} = def: size={.Ques.}

5. Conclusion and Future Work

To achieve a near-canonical semantic representation, we have studied the semantic representation and composition of interrogatives. According to the semantic classification of interrogatives, we represent interrogatives in a hierarchy as follows:

true/false interrogative def: truth={.Ques.}

Wh-word interrogative
    asking factual information def: role={.Ques.}
    asking relationship def: .Ques.RelationRole()
    asking opinion def: ModalityRole={.Ques.}
    choosing options def: role={.Option.{{x}.or.{y}}}

We cited two examples, (1) and (2), earlier to illustrate what machine 'understanding' of natural language means. After the discussion above, let us see the result of this work:

(1) Is it OK for me to take pictures?
Representation: def:{speaker|}; def: allowance={.Ques.}; def:{TakePicture|}
Composition: def:{TakePicture|:agent={speaker|},allowance={.Ques.}}

(2) Can I take photos?
Representation: def:{speaker|}; def: allowance={.Ques.}; def:{TakePicture|}
Composition: def:{TakePicture|:agent={speaker|},allowance={.Ques.}}

Although the surface syntax of (1) and (2) differs, the composition results are identical. This means that, through analysis with the E-HowNet model, a machine can judge the similarity of sentences, i.e., the machine understands them. However, this is only an example with a simple sentence structure. In future research, we will implement a parsing system incorporating the E-HowNet model to demonstrate the semantic composition process for more complex sentences. To achieve this goal, apart from sense disambiguation, the discordance between syntactic structure and semantic relations is another critical problem. Take sentence (31) as an example:

(31) Isn't it hard for long-distance travel?

Its parsing result is:

Theme[VP:(manner[A: long distance] + Head[VA4: travel])] + negation[Dc: not] + epistemics[Dbaa: be] + degree[Dfa: very] + Head[VH16: hard] + particle[Td: ma]

The E-HowNet definition of (31) is:

def:{hard|:theme={travel|:distance={far|}},degree={very|},truth={.Ques.}}

Comparing the semantic representation with the syntactic structure, we find that the rhetorical interrogative 'Isn't it' is segmented into three words in the syntactic analysis, but from the semantic point of view they are integrated into one unit and represented as 'truth={.Ques.}'.

There are still many types of discordance between syntactic structure and semantic relations that need to be studied. That is, in future work we have to find the mapping rules that match coarse-grained syntactic arguments to fine-grained semantic relations. These rules should be applicable to both declarative and interrogative sentences, because most interrogative sentences are transformed from declarative ones. Additionally, this study is also useful for question-answering systems, since it not only represents the sense of a question but also marks the focused information to be answered. The application to QA technologies will be our future task as well.

Acknowledgement: This research was supported in part by the National Science Council under Center of Excellence Grant NSC 95-2752-E-001-001-PAE and Grant NSC 95-2221-E-001-039.


References

[1] Chen Keh-Jiann, Huang Shu-Ling, Shih Yueh-Yin, Chen Yi-Jun, Extended-HowNet: Multi-level Concept Definition and Complex Relation Description, Peking University, 2004.
[2] Chen Keh-Jiann, Huang Shu-Ling, Shih Yueh-Yin, Chen Yi-Jun, Extended-HowNet: A Representational Framework for Concepts, IJCNLP-05 Workshop, Jeju Island, South Korea, 2005.
[3] HowNet, http://www.keenage.com/, 1988.
[4] Jonathan Ginzburg & Ivan A. Sag, Interrogative Investigations, Stanford University, 2000.
[5] Li & Thompson, Mandarin Chinese, The Crane Publishing Co., Ltd., 1997.
[6] Sinica Treebank, http://turing.iis.sinica.edu.tw/treesearch/
[7] Shi, Dingxu, 1999. (in Chinese)
[8] Lv, Shuxiang, 1942. (in Chinese)
[9] Shao, Jingmin, 1996. (in Chinese)
[10] Ma, Jianzhong, 1935 (first edition: 1898). (in Chinese)
[11] Lu, Jianming, 1984. (in Chinese)
[12] Tang, Tingchi, 1983. (in Chinese)
[13] Li, Chinhsi, Chinese Grammar on the National Language, 1930. (in Chinese)

Question Analysis and Answer Passage Retrieval for Opinion Question Answering Systems

Lun-Wei Ku, Yu-Ting Liang and Hsin-Hsi Chen
Department of Computer Science and Information Engineering
National Taiwan University
{lwku, eagan}@nlg.csie.ntu.edu.tw; [email protected]

Abstract

Question answering systems provide an elegant way for people to access an underlying knowledge base. Humans are interested not only in factual questions but also in opinions. This paper deals with question analysis and answer passage retrieval in opinion QA systems. For question analysis, six opinion question types are defined. A two-layered framework utilizing two question type classifiers is proposed, and algorithms for these two classifiers are described. The performance achieves 87.8% in general question classification and 92.5% in opinion question classification. The question focus is detected to form a query for the information retrieval system, and the question polarity is detected to retain relevant sentences that have the same polarity as the question. For answer passage retrieval, three components are introduced: retrieved relevant sentences are further examined to determine whether the question focus appears (Focus Detection), whether it falls within an opinion scope (Opinion Scope Identification), and, if so, whether the polarity of the scope matches the polarity of the question (Polarity Detection). The best model achieves an F-measure of 40.59% using partial match at the level of meaningful units. With relevance issues removed, the F-measure of the best model rises to 84.96%.

1 Introduction

Most state-of-the-art Question Answering (QA) systems serve the need of answering factual questions such as "When was James Dean born?" and "Who won the Nobel Peace Prize in 1991?". In addition to facts, people would also like to know about others' opinions, thoughts, and feelings toward specific topics, groups, and events. Opinion questions (e.g., "How do Americans consider the US-Iraq war?" and "What are the public's opinions on human cloning?") have long and complex answers which tend to scatter across different documents. Traditional QA approaches are not as effective at retrieving answers for opinion questions as they have been for factual questions (Stoyanov et al., 2005). Hence, an opinion QA system is essential and urgent. Most research on QA systems has been developed for factual questions, and the association of subjective information with question answering has not yet been much studied. As for subjective information, Wiebe (2000) proposed a method to identify strong clues of subjectivity on adjectives. Riloff et al. (2003) presented a subjectivity classifier using lists of subjective nouns learned by bootstrapping algorithms. Riloff and Wiebe (2003) proposed a bootstrapping process to learn linguistically rich extraction patterns for subjective expressions. Kim and Hovy (2004) presented a system to determine word sentiments and combine sentiments within a sentence. Pang, Lee, and Vaithyanathan (2002) classified documents not by topic but by overall sentiment, and then determined the polarity of a review. Wiebe et al. (2002) proposed a method for opinion summarization. Wilson et al. (2005) presented a phrase-level sentiment analysis to automatically identify contextual polarity.

Ku et al. (2006) proposed a method to automatically mine and organize opinions from heterogeneous information sources. Some research has moved from opinion analysis in texts toward opinion analysis in QA systems. Cardie et al. (2003) took advantage of opinion summarization to support a Multi-Perspective Question Answering (MPQA) system, which aims to extract opinion-oriented information for a question. Yu and Hatzivassiloglou (2003) separated opinions from facts at both the document and sentence levels. They intended to cluster opinion sentences from the same perspective together and summarize them as answers to opinion questions. Kim and Hovy (2005) identified opinion holders, which are frequently asked about in opinion questions. This paper deals with two major problems in opinion QA systems: question analysis and answer passage retrieval. Several issues are discussed, including how to separate opinion questions from factual ones, how to define question types for opinion questions, how to correctly classify opinion questions into the corresponding types, how to present answers for different types of opinion questions, and how to retrieve answer passages for opinion questions. Note that the unit of a passage is a sentence in this paper, though a passage can sometimes refer to more sentences, such as a paragraph.

2 An Opinion QA Framework

Figure 1 shows the framework of the opinion QA system. The question is initially submitted to a part-of-speech tagger (POS Tagger), and then the question is analyzed in three aspects: the question focus, the question polarity, and the opinion question type. The former two attributes are further applied in answer passage retrieval. The question focus is the query for an information retrieval (IR) system to retrieve relevant sentences. The question polarity is utilized to screen out relevant sentences with polarities different from the question's. With answer passages retrieved, answer extraction extracts text spans as answers according to the opinion question type and outputs answers to the user.

Figure 1. An Opinion QA System Framework. (Pipeline: input question → POS tagger → two-layered opinion question type classification, focus extraction, and polarity detection → answer passage retrieval → answer extraction → output answer.)

3 Experimental Corpus Preparation

The experimental corpus comes from four sources: TREC1, NTCIR2, Internet polls, and OPQ. TREC and NTCIR are two of the three major information retrieval evaluation forums in the world. Their evaluation tracks cover natural language processing and information retrieval domains such as large-scale information retrieval, question answering, genomics, cross-language processing, and many new research topics. We collect 500 factual questions from the main task of the QA track in TREC-11; these English questions are translated into Chinese for the experiments. A total of 1,577 factual questions are obtained from the development question set of the CLQA task in NTCIR-5. Questions from public opinion polls on three public media websites, Chinatimes, Era, and TVBS, are crawled. OPQ was developed for this research and contains both factual and opinion questions. To construct the OPQ question corpus, annotators are given the titles and descriptions of six opinion topics selected from NTCIR-2 and NTCIR-3. Annotators freely ask any three factual questions and seven opinion questions for each topic. Duplicated questions are dropped, and a total of 1,011 questions are collected; of these, 304 are factual questions and the other 707 are opinion questions. Overall, we collect 2,443 factual questions and 1,289 opinion questions from four different sources, for a total of 3,732 questions, as shown in Table 1.

Corpus    Factual   Opinion   Total
TREC         500        0       500
NTCIR      1,577        0     1,577
Polls         62      582       644
OPQ          304      707     1,011
Total      2,443    1,289     3,732

Table 1. Statistics of Experimental Questions.

There are some challenging issues in extracting answers automatically by opinion QA systems. We categorize these challenges (indexed by numbers and enclosed by parentheses as follows) in question analysis into challenges on holders, on opinions, and on concepts. On holders, (1) automatically identifying named entities expressing opinions is imperative. (2) Grouping opinion holders is another issue. Answers to the question, "How do Americans feel about the affair of the U.S. president Clinton?", consist of opinions from any American. To answer questions like "What kind of people support the abolishment of the Joint College Entrance Examination?", QA systems have to find people having opinions toward the examination and (3) classify them into the correct category, such as students, teachers, scholars, parents, and so forth. On opinions, (4) knowing whether questions themselves contain subjective information and deciding their opinion polarities is necessary. The question "Who disagrees with the idea of surrogate mothers?" points out a negative attitude, and the answer to this question is expected to be a list of persons or organizations that have negative opinions toward the idea of surrogate mothers. Another issue is (5) whether the comparison and summarization of positive and negative opinions are required. In the question "Is using the civil ID card more advantageous or disadvantageous?", opinions expressing advantages and disadvantages have

1 http://trec.nist.gov/
2 http://research.nii.ac.jp/ntcir/index-en.html

to be contrasted and scored to present answers as "More advantageous" or "More disadvantageous", with the evidence listed to users. On concepts, it is essential (6) to understand the concepts in opinions and perform concept expansion to extract correct answers. For the question "Is the civil ID card secure?" it is vital to know the definition and expansion of being secure: keeping the public's privacy, ensuring the system's security, and protecting fingerprint data are possible security points. For (7) the concept of targets, the idea is the same as the concept of opinions, except that it concerns targets. For instance, the question "What do Taiwanese think about the substitute program of the Joint College Entrance Examination?" necessitates comprehending what the substitute program is, or the alias of this program, so that the system can search for text spans which hold opinions toward it. Among the 707 opinion questions in the OPQ corpus, the answers of 160 opinion questions are found in the NTCIR corpus. These 160 opinion questions are analyzed with respect to the above seven challenges. Table 2 lists the number of questions (#Q) with respect to the number of challenges (#C).

#C     1    2    3    4    5    6    7   Total
#Q    19   47   39   30   13   12    0    160

Table 2. Challenges of Opinion Questions.

A total of 60 questions are selected for further annotation based on their challenges. Sentences are annotated as to whether they are opinions (Opinion), whether they are relevant to the topic (Rel2T), whether they are relevant to the question (Rel2Q), and whether they contain answers (AnswerQ). If sentences are annotated as relevant to the question, annotators further annotate the text spans which contribute answers to the question (CorrectMU).

4 Two-layered Question Classification

A two-layered classification with two classifiers, Q-Classifier and OPQ-Classifier, is proposed. Q-Classifier separates opinion questions from factual ones, and OPQ-Classifier determines the types of opinion questions.

4.1 Types of Opinion Questions

According to opinion questions themselves and their corresponding answers, we define six opinion question types as follows.

(1) Holder (HD) Definition: Asking who the expresser of the specific opinion is. Example: Who supports the civil ID card? Answer: Entities and the corresponding evidence.

(2) Target (TG) Definition: Asking whom the holder’s attitude is toward. Example: Who does the public think should be responsible for the airplane crash? Answer: Entities and the corresponding evidence.

(3) Attitude (AT) Definition: Asking what the attitude of a holder to a specific target is. Example: How do people feel about the affair of the U.S. President Clinton? Answer: Question-related opinions, separated into support, neutral, and non-support categories.


(4) Reason (RS) Definition: Asking the reasons of an explicit or an implicit holder’s attitude to a specific target. Example: Why do people think better not to have the college entrance exam? Answer: Reasons for taking the stand specified.

(5) Majority (MJ) Definition: Asking which option, listed or not listed, is the majority. Example: If the government tries to carry out the usage of the civil ID card, will its reputation get better or worse? Answer: The majority of support, neutral and non-support evidence.

(6) Yes/No (YN) Definition: Asking whether a statement is correct. Example: Is the airplane crash caused by management problems? Answer: The stronger opinion, i.e. yes or no.

4.2 Q-Classifier

Q-Classifier distinguishes opinion questions from factual questions. We use See5 (Quinlan, 2000) to train Q-Classifier, employing seven features. The feature pretype (PTY) denotes types used in factual QA systems, such as SELECTION, YESNO, METHOD, REASON, PERSON, LOCATION, PERSONDEF, DATE, QUANTITY, DEFINITION, OBJECT, and MISC, and is extracted by a conventional QA system (reference removed for blind review). For example, the value of pretype in "Who is Tom Cruise married to?" is PERSON. The other six features are operator (OPR), positive (POS), negative (NEG), totalow (TOW), totalscore (TSR), and maxscore (MSR). A publicly available sentiment dictionary (Ku et al., 2006), which contains 2,655 positive opinion keywords, 7,767 negative opinion keywords, and 150 opinion operators, is used to tell whether there are any positive (negative) opinion keywords and operators in a question. Each opinion keyword has a score expressing its degree of tendency. The feature operator (OPR) covers words of actions for expressing opinions; for example, say, think, and believe can be hints for extracting opinions. A total of 151 operators were manually collected. The feature totalow (TOW) is the total number of opinion operators, positive opinion keywords, and negative opinion keywords in a question. The feature totalscore (TSR) is the overall opinion score of the whole question, while the feature maxscore (MSR) is the absolute maximum opinion score of the opinion keywords in a question. Section 3 mentions that 2,443 factual questions and 1,289 opinion questions are collected from four different sources. To keep the quantities of factual and opinion questions balanced, 1,289 factual questions are randomly selected from the 2,443 questions, and a total of 2,578 questions are employed.
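The feature set described above could be extracted along the following lines. The tiny word lists and scores here are invented placeholders; the actual system uses the sentiment dictionary of Ku et al. (2006) and a conventional QA system to obtain pretype:

```python
# Sketch of Q-Classifier feature extraction. The dictionaries below are
# invented placeholders for the real sentiment dictionary and operator list.

POSITIVE = {"support": 0.8, "good": 0.5}        # word -> opinion score
NEGATIVE = {"disagree": -0.7, "crash": -0.4}
OPERATORS = {"think", "say", "believe", "feel"}

def extract_features(tokens, pretype):
    pos = [POSITIVE[t] for t in tokens if t in POSITIVE]
    neg = [NEGATIVE[t] for t in tokens if t in NEGATIVE]
    opr = [t for t in tokens if t in OPERATORS]
    scores = pos + neg
    return {
        "PTY": pretype,                         # type from a factual QA system
        "OPR": len(opr) > 0,                    # any opinion operator present
        "POS": len(pos) > 0,                    # any positive keyword present
        "NEG": len(neg) > 0,                    # any negative keyword present
        "TOW": len(pos) + len(neg) + len(opr),  # total opinion words
        "TSR": sum(scores),                     # overall opinion score
        "MSR": max((abs(s) for s in scores), default=0.0),  # max |score|
    }

q = "what do people think about the war".split()
print(extract_features(q, "MISC"))
```

These feature vectors would then be fed to the See5 decision-tree learner as in the paper.
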
We adopt See5 to generate decision trees based on different combinations of features. With 10-fold cross-validation, See5 outputs the resulting decision tree for each of the 10 folds, and a summary with the mean of the error rates produced by these folds. Table 3 shows the experimental results: only with feature x gives the error rate of using one single feature, while with all but feature x gives the error rate of using all features except the specified one.

feature x                  PTY    OPR    POS    NEG    TOW    TSR    MSR    ALL
only with feature x       19.6   38.5   34.9   35.3   21.9   26.6   29.6   12.2
with all but feature x    16.3   12.7   13.7   12.2   14.8   12.4   12.8     –

Table 3. Error Rates (%) of Q-Classifier.

The features pretype (PTY) and totalow (TOW) perform best in reducing errors when used alone. They also cannot be ignored, since the error rate increases more when they are excluded. The feature totalow shows that if a question contains more opinion keywords, it is more likely to be an opinion question. When all features are considered together, the best performance is 87.8%.

4.3 OPQ-Classifier

OPQ-Classifier categorizes opinion questions into the corresponding opinion question types. We first examine whether any specific pattern occurs in the question. If so, the rule for that pattern is applied; otherwise, a scoring function is applied. The heuristic rules are listed as follows.

(1) The pattern "A-not-A": Yes/No
(2) Ends with question words: Yes/No
(3) "Who" + opinion operator: Holder
(4) "Who" + passive tense: Target
(5) pretype (PTY) is Reason: Reason
(6) pretype (PTY) is Selection: Majority
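A sketch of this heuristic layer, using English stand-ins for what are Chinese surface patterns in the actual system; the operator list and the passive-voice cue are illustrative assumptions:

```python
# Sketch of the OPQ-Classifier heuristic rules (English stand-ins for the
# Chinese surface patterns; the regexes and operator list are illustrative).
import re

OPERATORS = ("think", "support", "say", "believe")  # toy operator list

def heuristic_type(question, pretype):
    q = question.lower()
    if re.search(r"\b(\w+) not \1\b", q):            # (1) "A-not-A" pattern
        return "YN"
    # (2) sentence-final question particles are Chinese-specific; omitted here
    if q.startswith("who") and any(op in q for op in OPERATORS):       # (3)
        return "HD"
    if q.startswith("who") and re.search(r"\b(is|are|was|were)\b.*\bby\b", q):
        return "TG"                                  # (4) crude passive cue
    if pretype == "REASON":                          # (5)
        return "RS"
    if pretype == "SELECTION":                       # (6)
        return "MJ"
    return None  # fall through to the scoring function

print(heuristic_type("who supports the civil ID card", "PERSON"))
print(heuristic_type("is it right not right", "MISC"))
```
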

A scoring function deals with those questions which cannot be classified by the above patterns. Unigrams, bigrams, and trigrams in the training questions are selected as feature candidates. These candidates fall into two kinds: a topic-dependent feature is only meaningful in questions of certain topics, while a general feature may appear in questions of all topics. If a feature is topic-dependent (e.g. human cloning or Clinton), it is dropped from the feature set; only general features (e.g. is or is not, whether, and reason) are kept. Finally, a set of features is obtained from the training questions, and the discriminative power of these features is calculated as follows. First, the observation probability of a feature i in question type j is defined in Formula (1).

Po(i, j) = NumQ(i, j) / NumQ(j)                                    (1)

where i is the index of the feature, j is the index of the question type, and NumQ represents the number of questions. The observation probability shows how often a feature is observed in each type. It is then normalized by Formula (2).

NPo(i, j) = Po(i, j) / Σ_{j'=1..6} Po(i, j')    (2)

Every feature has six normalized observation probabilities corresponding to the six types. With these probabilities, the score ScoreQ of a question can be calculated by Formula (3).

ScoreQ(j) = Σ_{i=1..n} NPo(i, j)    (3)

where n is the total number of features in question Q, and ScoreQ(j) represents the score of question Q as type j. Since there are six possible opinion question types, the six scores indicate how likely question Q belongs to each type, and together they form the feature vector of question Q for classification. Training instances are used to find the centroid of each type, the Pearson correlation is adopted as the distance measure, and each testing opinion question is assigned to the type whose centroid is closest.
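As an illustration of Formulas (1)-(3) and the centroid-based assignment, here is a minimal sketch; the feature counts, word lists, and helper names below are invented toy data, not the paper's actual features:

```python
import math

TYPES = ["HD", "TG", "AT", "RS", "MJ", "YN"]  # the six opinion question types

def build_npo(counts, type_totals):
    """Formulas (1)-(2): counts[i][j] = #questions of type j containing
    feature i; type_totals[j] = #questions of type j."""
    npo = {}
    for feat, row in counts.items():
        po = [row[j] / type_totals[j] if type_totals[j] else 0.0 for j in range(6)]
        s = sum(po)
        npo[feat] = [p / s for p in po] if s else po
    return npo

def score_vector(features, npo):
    """Formula (3): ScoreQ(j) = sum of NPo(i, j) over features i in Q."""
    return [sum(npo.get(f, [0.0] * 6)[j] for f in features) for j in range(6)]

def pearson(u, v):
    """Pearson correlation between two score vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if not su or not sv:
        return 0.0
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def classify(features, npo, centroids):
    """Assign the type whose centroid correlates most with the score vector."""
    vec = score_vector(features, npo)
    return max(centroids, key=lambda t: pearson(vec, centroids[t]))
```

A question whose features observed mostly in Yes/No training questions would thus land closest to the YN centroid.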

                        Opinion question type (number)
Classified as     HD    TG    AT    RS    MJ    YN
HD                27     0     0     1     0     0
TG                 0     5     0     0     0     0
AT                 0     0    68     0     0     0
RS                 1     0     4    17     0     0
MJ                 0     0     0     0     8     0
YN                 3     3    15     5     5   385
Total             31     8    87    23    13   385

Table 4. Confusion Matrix (Number).

                        Opinion question type (%)
Classified as     HD     TG     AT     RS     MJ     YN
HD              87.1    0.0    0.0    4.4    0.0    0.0
TG               0.0   62.5    0.0    0.0    0.0    0.0
AT               0.0    0.0   78.2    0.0    0.0    0.0
RS               3.2    0.0    4.6   73.9    0.0    0.0
MJ               0.0    0.0    0.0    0.0   61.5    0.0
YN               9.7   37.5   17.2   21.7   38.5  100.0
Total          100.0  100.0  100.0  100.0  100.0  100.0

Table 5. Confusion Matrix (Percentage).

We use the OPQ corpus in Section 3 for the evaluation of the OPQ-Classifier. The opinion types of these opinion questions are manually assigned. Among the 707 opinion questions, answers to 160 of them are found in the NTCIR corpus; these are used as the training data for an intensive analysis of both questions and answers. The remaining 547 opinion questions are used as the testing data. The confusion matrix of the OPQ-Classifier is shown in Tables 4 and 5. The average accuracy is 92.5%. There are fewer questions of the target (TG) and majority (MJ) types, 8 and 13 in the testing collection respectively; the unsatisfactory results for these two types may be due to the lack of training questions.

5 Answer Passage Retrieval

Figure 2 shows the framework of answer passage retrieval in an opinion QA system. The question focus supplied by question analysis serves as the input to an Okapi IR system to retrieve relevant sentences from the knowledge base. Each relevant sentence is then examined to determine whether the focus appears in it (Focus Detection), whether the focus lies within an opinion text span (Opinion Scope Identification), and, if so, whether the polarity of the scope matches the polarity of the question (Polarity Detection). The details are discussed in the following sections.

Figure 2. Answer Passage Retrieval.

5.1 Question Focus Extraction

The first stage of answer passage retrieval is to submit the question focus as a query to an IR system to retrieve relevant sentences from the knowledge base. These retrieved sentences may contain answers to the question. A set of content words in a question is used to represent its focus. The following steps extract the content words as the question focus and formulate a query.


(1) Remove question marks.
(2) Remove question words.
(3) Remove opinion operators.
(4) Remove negation words.
(5) Name the remaining terms as the focus.
(6) Use the Boolean OR operator to form a query.
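A minimal sketch of these six steps; the word lists below are invented placeholders, not the paper's actual lexicons:

```python
# Illustrative (invented) word lists; the paper's actual lexicons are larger.
QUESTION_WORDS = {"who", "what", "why", "how", "is", "do", "does"}
OPINION_OPERATORS = {"agree", "agrees", "approve", "approves", "support",
                     "doubt", "disapprove", "protest"}
NEGATION_WORDS = {"not", "no", "never"}

def extract_focus(tokens):
    """Steps (1)-(5): drop question marks, question words, opinion operators
    and negation words; the remaining terms form the question focus."""
    drop = QUESTION_WORDS | OPINION_OPERATORS | NEGATION_WORDS
    return [t for t in tokens if t != "?" and t.lower() not in drop]

def to_query(focus):
    """Step (6): join the focus terms with the Boolean OR operator."""
    return " OR ".join(focus)
```

For the question "Who approves the Joint College Entrance Examination?", the remaining focus terms are joined with OR so that any sentence touching the focus can be retrieved.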

Since question marks and question words are common to every question, they do not contribute to the retrieval of relevant sentences and are therefore removed. Opinion operators and negation words are removed as well, since they represent the question polarity rather than the question focus. Once we have the question focus, we use the Boolean OR operator rather than the AND operator to form the query, because we prefer the IR system to return sentences that have any relevance to the question.

5.2 Question Polarity Detection

The polarity of the question is useful in opinion QA systems to filter out query-relevant sentences whose polarities differ from the question's. If the question polarity is positive, the sentences providing answers ought to be positive, and vice versa. The polarity detection algorithm is as follows.

(1) Determine the polarity of the opinion operator: 1 for positive, 0 for neutral, and -1 for negative.
(2) Negate the operator polarity if any negation word precedes the operator.
(3) Determine the polarity of the question focus: 1 for positive, 0 for neutral, and -1 for negative.
(4) If either the operator polarity or the focus polarity is 0 (neutral), output the sign of the other; otherwise output the sign of the product of the two polarities.
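These four steps can be sketched as follows (a minimal illustration; the function name is ours):

```python
def question_polarity(operator_polarity, negated, focus_polarity):
    """Combine the opinion operator polarity (+1/0/-1), any preceding
    negation, and the question focus polarity (+1/0/-1), following
    steps (1)-(4) of the polarity detection algorithm."""
    if negated:                        # step (2): negation flips the operator
        operator_polarity = -operator_polarity
    if operator_polarity == 0 or focus_polarity == 0:
        # step (4), neutral case: output the sign of the non-neutral side
        return operator_polarity + focus_polarity
    return operator_polarity * focus_polarity  # sign of the product
```

A positive operator over a neutral focus yields a positive question; a positive operator over a negative focus (as in the abolishment example below) yields a negative question.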

We consider the polarity of the question focus together with the polarity of the opinion operator because the opinion operator primarily shows the opinion tendency of the question, while the polarity of the question focus can flip the polarity of the entire question. A positive opinion operator stands for a supportive attitude, such as “agree”, “approve”, and “support”. A neutral opinion operator stands for a neutral attitude, such as “state”, “mention”, and “indicate”. A negative opinion operator stands for an unsupportive attitude, such as “doubt”, “disapprove”, and “protest”. In the question “Who approves the Joint College Entrance Examination?”, “approve” is a positive operator and “the Joint College Entrance Examination” is a neutral question focus. The overall polarity of this question is positive, so the opinion QA system needs to retrieve sentences expressing a positive polarity toward “the Joint College Entrance Examination.” In contrast, in the question “Who agrees with the abolishment of the Joint College Entrance Examination?”, the question focus “the abolishment of the Joint College Entrance Examination” becomes negative because of “the abolishment”. Even though the operator is positive, the opinion QA system still has to look for sentences expressing negative opinions toward “the Joint College Entrance Examination.”

5.3 Opinion Scope Identification

In Chinese, a sentence ending with a full stop may be composed of several sentence fragments sf separated by commas or semicolons, as follows: “sf1, sf2, sf3, ..., sfn”.

This paper (reference removed for blind review) shows that about 75% of Chinese sentences contain more than two sentence fragments. An opinion scope denotes a range expressing attitudes in a sentence. It may be a complete sentence, a sentence fragment, or a meaningful unit (MU), depending on the criterion adopted. It is very common for many concepts to be expressed within one sentence in Chinese documents, so identifying the complete concept in a sentence, denoted as a meaningful unit, is necessary for processing relevant opinions. As mentioned, a Chinese sentence is composed of several sentence fragments, and one or more of them can form a meaningful unit expressing a complete concept. This paper (reference removed) employed linking elements (Li and Thompson, 1981) such as “because”, “when”, etc. to compose MUs from a sentence. In the Chinese sentence S below, “” (thus) is a linking element which links sf2, sf3, and sf4 together, and sf2 is a subordinate clause of the operator “” (indicate) in sf1. Therefore, sf1, sf2, sf3, and sf4 form a MU in this case.

S: sf1: (indicate:operator) sf2: IC sf3: (thus:linking element) sf4: sf5:

5.4 Focus Detection

The IR system takes a sentence as the retrieval unit and reports those sentences that are probably relevant to a given query. Focus detection aims to determine which sentence fragments are useful for extracting answer passages. Three criteria of focus detection, namely exact match, partial match, and lenient, are considered. In one extreme case (lenient), all the fragments in a retrieved sentence are regarded as relevant to the question focus. In the other extreme case (exact match), only the fragments containing the complete question focus are regarded as relevant; in other words, exact match filters the irrelevant fragments out of the retrieved sentences. Partial match is weaker than exact match and stronger than the lenient criterion: fragments which contain part of the question focus are regarded as relevant. With three criteria each for focus detection and opinion scope identification, a total of 9 combinations are considered. For example, the combination of exact match and meaningful units means there is at least one complete focus in a meaningful unit; similarly, the combination of partial match and sentence fragments means there is at least one partial focus in a sentence fragment.

5.5 Polarity Detection

Given a combination of the above strategies, we have a set of opinion scopes relevant to the specific focus. Polarity detection tries to identify the scopes which have the same polarities as the question. How to determine the opinion polarity is an important issue, and two approaches are adopted. The opinion word approach employs a sentiment dictionary to detect whether words in this dictionary appear in a scope; the score of an opinion scope is the sum of the scores of these words. People sometimes imply their feelings or beliefs toward a particular target or event by actions. For example, people may not say “Objection!” to disagree with an event, but they may try to abolish or terminate it as much as they can. On the other hand, people may not say “I’m loving it!” to show their delight in an event, but they may try to fight for it or legalize it.

In both circumstances, the actions people take express their opinions. Action words are those which indicate a person's willingness to do or not to do something. For example, carry out, seek, and follow show willingness to do something; we name these words do's. Substitute, stop, and boycott show unwillingness to do something; we name these words don'ts. In the action word approach, we detect opinions in scopes with the help of a seed vocabulary of do's and don'ts, together with a sentiment dictionary.

5.6 Experiments on Answer Passage Retrieval

The F-measure metric is used to evaluate answer passage retrieval. To answer an opinion question, all answer passages have to be retrieved for opinion polarity judgment; therefore, the conventional evaluation metric that uses precision and recall at a certain rank, e.g. top 10, may not be suitable for this task. Since all answer passages, sentence fragments, and meaningful units which provide correct answers are already annotated in the testing bed, the F-measure metric can be applied directly. Tables 6 and 7 show the F-measures of answer passage retrieval using the opinion word approach and the action word approach, respectively. In both approaches, adopting meaningful units as opinion scopes is better than adopting sentences or sentence fragments, and considering both opinion and action words is better than opinion words only. The best F-measure, 40.59%, is achieved when meaningful units and partial match are used.
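The opinion word and action word scoring of Section 5.5 can be sketched as follows; the word lists and scores are invented placeholders, not the paper's actual sentiment dictionary or seed vocabulary:

```python
# Invented placeholder lexicons for illustration only.
SENTIMENT = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}
DOS = {"carry", "seek", "follow"}          # willingness to act
DONTS = {"substitute", "stop", "boycott"}  # unwillingness to act

def score_scope(tokens, use_action_words=True):
    """Opinion word approach: sum sentiment scores over the scope.
    Action word approach: additionally count do's (+1) and don'ts (-1)."""
    score = sum(SENTIMENT.get(t, 0.0) for t in tokens)
    if use_action_words:
        score += sum(t in DOS for t in tokens) - sum(t in DONTS for t in tokens)
    return score
```

A scope whose score sign matches the question polarity would be kept as an answer passage candidate.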

Focus Detection \ Opinion Scope    sentence    sentence fragment    meaningful unit
Exact Match                         32.09%          36.06%              36.25%
Partial Match                       27.32%          27.46%              33.09%
Lenient                             19.91%          19.95%              25.05%

Table 6. F-Measure of Opinion Word Approach.

Focus Detection \ Opinion Scope    sentence    sentence fragment    meaningful unit
Exact Match                         28.75%          30.20%              36.36%
Partial Match                       32.83%          35.09%              40.59%
Lenient                             27.15%          29.19%              32.87%

Table 7. F-Measure of Action Word Approach.

5.7 Experiments on Relevance Effects

The previous experiments were done on sentences reported by the Okapi IR system, and these retrieved sentences are not all relevant to the questions. This section discusses how relevance affects answer passage retrieval. Recall that the experimental corpus is annotated with Rel2T (relevant or irrelevant to the topic), Rel2Q (relevant or irrelevant to the question), and CorrectMU (text spans containing answers to the question). Assume meaningful units are taken as the opinion scope. Tables 8 and 9 show how relevance influences the performance of answer passage retrieval using the opinion word and action word approaches, respectively.

Focus Detection \ Rel Degree    Rel2T     Rel2Q     CorrectMU
Exact Match                     36.69%    36.73%     50.43%
Partial Match                   34.79%    47.15%     70.15%
Lenient                         28.03%    48.35%     80.73%

Table 8. Relevance Effects on Answer Passage Retrieval Using Opinion Word Approach.

Focus Detection \ Rel Degree    Rel2T     Rel2Q     CorrectMU
Exact Match                     36.88%    36.92%     48.99%
Partial Match                   41.90%    50.37%     72.84%
Lenient                         37.04%    53.06%     84.96%

Table 9. Relevance Effects on Answer Passage Retrieval Using Action Word Approach.

Rel2T shows the performance of using answer passages relevant to the six topics, that is, the original relevant documents from the NTCIR CLIR task. Rel2Q shows the performance of using answer passages relevant to the questions, while CorrectMU shows the performance of using the correct opinion fragments, which are relevant to the question focus, to decide opinion polarities. Rel2T is similar to relevant sentence retrieval, which was shown to be tough in the TREC novelty track (Soboroff and Harman, 2003). From Rel2T to Rel2Q and CorrectMU, the best strategy for matching the question focus switches from partial match to lenient. This is reasonable, since the contents of Rel2Q and CorrectMU are already relevant to the question focus. In Rel2Q, focus detection neither helps nor hurts much (50.37% vs. 53.06%), showing that the question focus appears exactly or partially in the relevant sentences. However, focus detection lowers the performance in CorrectMU (72.84% vs. 84.96%). This suggests that the question focus and the correct meaningful units may appear in different positions within a sentence: for example, the first meaningful unit talks about the question focus, while the third meaningful unit really answers the question but omits the question focus since it was mentioned earlier. From Rel2T to Rel2Q, the F-measure does not increase as much as it does from Rel2Q to CorrectMU. This result shows that finding the correct fragments of passages on which to judge the opinion polarity is crucial to answer passage retrieval. The F-measure of CorrectMU shows the performance of judging opinion polarities with the relevance issue removed; using either the opinion word approach or the action word approach achieves an F-measure greater than 80%. As a whole, including action words is better than using opinion words only.

6 Conclusion

This paper proposes some important techniques for opinion question answering.
For question classification, a two-layered framework including two classifiers is proposed. General questions are divided into factual and opinion questions, and opinion questions are then classified into one of the six opinion question types defined in this paper. With both factual and opinion features for a decision tree model, the classifier achieves a precision rate of 87.8% for general question classification. With heuristic rules and the Pearson correlation coefficient as the distance measure, the classifier achieves a precision rate of 92.5% for opinion question classification.

For opinion answer passage retrieval, we consider not only relevance but also sentiment. Considering both opinion words and action words is better than considering opinion words only, and taking meaningful units as the opinion scope is better than taking sentences. Under the action word approach, the best model achieves an F-measure of 40.59% using partial match at the level of meaningful units. With relevance issues removed, the F-measure of the best model rises to 84.96%. Understanding the meaning of the question focus is important for relevance detection, but some foci proved quite challenging in the experiments. Query expansion and concept ontologies will be explored in the future.

References

Cardie, C., Wiebe, J., Wilson, T. and Litman, D. 2003. Combining Low-Level and Summary Representations of Opinions for Multi-Perspective Question Answering. In Proceedings of AAAI Spring Symposium Workshop, 20-27.
Kim, S.-M. and Hovy, E. 2004. Determining the Sentiment of Opinions. In Proceedings of the 20th International Conference on Computational Linguistics, 1367-1373.
Kim, S.-M. and Hovy, E. 2005. Identifying Opinion Holders for Question Answering in Opinion Texts. In Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains.
Ku, L.-W., Liang, Y.-T. and Chen, H.-H. 2006. Opinion Extraction, Summarization and Tracking in News and Blog Corpora. In Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, AAAI Technical Report, 100-107.
Li, C.N. and Thompson, S.A. 1981. Mandarin Chinese: A Functional Reference Grammar. University of California Press.
Pang, B., Lee, L. and Vaithyanathan, S. 2002. Thumbs up? Sentiment Classification Using Machine Learning Techniques. In Proceedings of the 2002 Conference on EMNLP, 79-86.
Quinlan, J. R. 2000. Data Mining Tools See5 and C5.0. http://www.rulequest.com/see5-info.html
Riloff, E. and Wiebe, J. 2003. Learning Extraction Patterns for Subjective Expressions. In Proceedings of the 2003 Conference on EMNLP, 105-112.
Riloff, E., Wiebe, J. and Wilson, T. 2003. Learning Subjective Nouns Using Extraction Pattern Bootstrapping. In Proceedings of the Seventh Conference on Natural Language Learning, 25-32.
Soboroff, I. and Harman, D. 2003. Overview of the TREC 2003 Novelty Track. In Proceedings of the Twelfth Text REtrieval Conference, National Institute of Standards and Technology, 38-53.
Stoyanov, V., Cardie, C. and Wiebe, J. 2005. Multi-Perspective Question Answering Using the OpQA Corpus. In Proceedings of HLT/EMNLP 2005, 923-930.
Wiebe, J. 2000. Learning Subjective Adjectives from Corpora. In Proceedings of the 17th National Conference on Artificial Intelligence, 735-740.
Wiebe, J., Breck, E., Buckley, C., Cardie, C., Davis, P., Fraser, B., Litman, D., Pierce, D., Riloff, E. and Wilson, T. 2002. NRRC Summer Workshop on Multi-Perspective Question Answering. ARDA NRRC Summer 2002 Workshop.
Wilson, T., Wiebe, J. and Hoffmann, P. 2005. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. In Proceedings of HLT/EMNLP 2005, 347-354.
Yu, H. and Hatzivassiloglou, V. 2003. Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. In Proceedings of EMNLP 2003, 129-136.


Department of Computer Information Science SooChow University [email protected] [email protected]

Abstract

We take the following steps to extract collocations consisting of combinations of 2, 3, or 4 words and/or parts of speech. First, we use Smadja's Xtract to extract co-occurring combinations of words and/or parts of speech at varying distances by computing means and variances. Second, we evaluate the significance of the collocation candidates by two metrics: mutual information and the t-test value. Finally, we compare the head words of the sense-tagged corpus built by Academia Sinica with the collocation candidates: if, at the same distance, the head words of the collocation candidates match those in the Academia Sinica corpus, we regard them as collocations. In addition, we apply the collocation information produced by this research to word sense disambiguation, achieving an application rate of 20.07% and a precision rate of 90.83%.


Keywords: Chinese collocation, mutual information, natural language processing, statistical method, t-test, word sense disambiguation.

191 (collocation) Smadja [1] 1234

Smadja’s Xtract [1] (Mutual Information) T [2] SSTC (Sinica Sense-Tagged Corpus) [3]

Smadja’s Xtract [1] n Breidt [4] T -Lu [5] CXtract Smadja’s Xtract [6] T [7] Li [8] Chen [9] Google Teng [10] [11, 12, 13]

192 Smadja’s Xtract T SSTC [3]

Smadja’s Xtract [1, 5] T SSTC

Smadja’s Xtract

()

T ()

()

Let w be the head word and wi (1 ≤ i ≤ n) the words co-occurring with w within a window of ±d positions. Let f(i, j) denote the frequency of wi at relative position j. The total and average positional frequencies of wi are

f_i = Σ_j f(i, j),    f̄_i = f_i / 2d

and, over all n co-occurring words,

f̄ = (1/n) Σ_{i=1..n} f_i,    σ² = (1/n) Σ_{i=1..n} (f_i − f̄)²

The strength k_i and spread U_i of wi are then

k_i = (f_i − f̄) / σ,    U_i = Σ_j (f(i, j) − f̄_i)² / 2d

Given thresholds (K0, K1, U0), a candidate is retained if it satisfies the three conditions:

C1: k_i = (f_i − f̄) / σ ≥ K0
C2: U_i ≥ U0
C3: f(i, j) ≥ f̄_i + (K1 · √U_i)

C1 selects words that co-occur with w significantly more often than average; C2 selects words whose occurrences concentrate at a few positions rather than spreading uniformly over the window; C3 identifies the peak positions j at which the pair is extracted as a collocation candidate.
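The candidate filtering can be sketched in code as follows; this follows Smadja's classic Xtract formulation, and the default thresholds (1, 1, 10) plus the square root in condition C3 are assumptions where the text above is ambiguous:

```python
import math
from statistics import mean, pstdev

def xtract_filter(freq_table, K0=1.0, K1=1.0, U0=10.0, d=5):
    """freq_table: {collocate: [f(i,-d), ..., f(i,-1), f(i,1), ..., f(i,d)]},
    positional co-occurrence counts around a head word (2d positions).
    Applies the three Xtract conditions and returns the surviving
    (collocate, relative position) pairs."""
    totals = {w: sum(pos) for w, pos in freq_table.items()}
    f_bar = mean(totals.values())
    sigma = pstdev(totals.values())
    kept = []
    for w, pos in freq_table.items():
        f_i = totals[w]
        k_i = (f_i - f_bar) / sigma if sigma else 0.0       # C1: strength
        p_bar = f_i / (2 * d)
        u_i = sum((p - p_bar) ** 2 for p in pos) / (2 * d)  # C2: spread
        if k_i < K0 or u_i < U0:
            continue
        for j, f_ij in enumerate(pos):                      # C3: peak positions
            if f_ij >= p_bar + K1 * math.sqrt(u_i):
                kept.append((w, j - d if j < d else j - d + 1))
    return kept
```

A collocate whose counts pile up at one offset passes all three conditions, while a collocate spread evenly over the window is rejected by C1/C2.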

MI(x, y) = log ( P(x, y) / (P(x) · P(y)) )

t-test

Following [14], the t-test regards each bigram occurrence as the outcome of a Bernoulli trial with probability p, so that the mean is μ = p and the variance is s² = p(1 − p) ≈ p. The t value is computed by Formula (5):

t = (x̄ − μ) / √(s² / N)    (5)

where x̄ is the observed probability of the bigram, μ is the probability expected under independence, and N is the number of samples. Larger t values indicate candidates whose co-occurrence is unlikely to be due to chance.
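Both metrics can be computed directly from corpus counts; this is a minimal sketch, assuming a base-2 logarithm for MI:

```python
import math

def mutual_information(f_xy, f_x, f_y, n):
    """MI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from the
    pair count f_xy, the word counts f_x and f_y, and corpus size n."""
    return math.log2((f_xy / n) / ((f_x / n) * (f_y / n)))

def t_score(f_xy, f_x, f_y, n):
    """Formula (5): t = (xbar - mu) / sqrt(s^2 / N), with xbar = P(x, y),
    mu = P(x) * P(y), and s^2 = p(1 - p) ~ p (Bernoulli approximation)."""
    p = f_xy / n
    mu = (f_x / n) * (f_y / n)
    return (p - mu) / math.sqrt(p / n)
```

A candidate whose t value falls below the chosen critical value (2.576 at the significance level mentioned below) would be filtered out.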

[2] SSTC [3] SSTC


3.0 9,227 500

5.56 d 5 (-5, -1) (1, 5) (bigram) (trigram)

Smadja's Xtract [1] originally set the thresholds (K0, K1, U0) to (1, 1, 10), while Lu et al.'s CXtract [5] used (1.2, 1.2, 12) for Chinese. For the SSTC experiments, (K0, K1, U0) = (1.2, 1.2, 12) is adopted rather than Xtract's original (1, 1, 10). As in Smadja's Xtract [1], stopwords are removed before the remaining candidates are evaluated with the MI and t-test metrics.

Candidates passing condition C1 are further tested with the t-test at significance level α = 0.005; candidates with t < 2.576 are filtered out ( Na).

195 Smadja’s Xtract [1] T

T

POS  dist    Ki        Ui     .     .        .       MI      t
Na    -1    1.96   2732.36   175  2945      246     7.25  13.22
Na    -1    1.55   1189.04   117  2945      601     5.95  10.79
Na    -2    1.33    189.84    42  2945      305     5.61   6.46
Na*   -1    1.60    764.20    95  2945     9775     2.95   9.24
Na*   -1   21.79  57239.05   817  2945   296183     1.69  23.33
Na*   -1    1.80    438.56    76  2945    31426     1.56   6.89
Na*    1    1.47    652.69    88  2945    46697     1.31   6.86
Na*   -1    5.16   2372.21   178  2945    95563     1.30   9.71
Na*    1    1.11    117.61    34  2945    29759     0.81   3.24
Na*   -1    2.68    395.65    61  2945    67325     0.58   3.44
Na*   -1   38.68  52626.69   858  2945  1025464     0.50  11.55
Na*   -4    1.21     66.64    27  2945    41143     0.26   1.18
Na*   -5    1.13     85.49    31  2945    57286     0.07   0.35

(* marks candidates rejected by the MI and t-test criteria)

SSTC 4/5 1/5 ( Na) SSTC 3 Na DE 2

Na VH VC VE VE SSTC

196 SSTC

Na -1 3 -1 -2 1 VC_ -2 1 _ -2 6 _Na -2 2 _ -3 1 _DE -2 1 Caa_Na_DE -3 2 Dfa__ -3 3 __ -3 1 _Na_DE -3 1 2

*

SSTC Na VC DE Caa Na DE VH D Cbb Na Dfa D DE DE D VC Di Nb DENh Di Nh Neu Na Ncd VE Nh DE SHI Na D VHD Nh VE Nh Di Neu Di Nh Nh

SSTC 1/5 (6)

Application rate = (number of test instances the collocation information applies to / total number of test instances) × 100%
Precision = (number of correctly disambiguated instances / number of instances applied to) × 100%    (6)

Of the 543 test instances, the collocation information applied to 109, and 99 of those were disambiguated correctly, giving an application rate of 20.07% and a precision of 90.83%.

Target (POS)   #Instances   #Applied   #Correct   Application rate   Precision
                   119           19         19         15.97%         100.00%
VC                 121           17         11         14.05%          64.71%
VE                 206           34         33         16.50%          97.06%
VE                  75           32         31         42.67%          96.88%
Na                  22            7          5         31.82%          71.43%
Total              543          109         99         20.07%          90.83%
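Formula (6) can be verified against the rows above with a few lines of code (the helper name is ours):

```python
def wsd_rates(total, applied, correct):
    """Formula (6): application rate = applied/total * 100%;
    precision = correct/applied * 100%."""
    application = round(applied / total * 100, 2)
    precision = round(correct / applied * 100, 2) if applied else 0.0
    return application, precision
```

Applied to the totals row (543 instances, 109 applied, 99 correct), this reproduces the reported 20.07% application rate and 90.83% precision.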

Smadja’s Xtract T 20.07% 90.83%

[1] F. Smadja, “Retrieving Collocations from Text: Xtract,” Computational Linguistics, Vol. 19, No. 1, pp. 143-177, 1993.
[2] D. Yarowsky, “One Sense Per Collocation,” In Proceedings of the ARPA Human Language Technology Workshop, 1993, pp. 266-271.
[3] In ROCLING 2007.
[4] E. Breidt, “Extraction of V-N-Collocations from Text Corpora: A Feasibility Study for German,” In Proceedings of the First Workshop on Very Large Corpora, 1993, pp. 74-83.
[5] Q. Lu, Y. Li and R. F. Xu, “Improving Xtract for Chinese Collocation Extraction,” In IEEE 2003 International Conference on Natural Language Processing and Knowledge Engineering, 2003, pp. 333-338.
[6] 2003.
[7] 2005, pp. 30-37.
[8] W. Li, Q. Lu and W. Li, “Integrating Collocation Features in Chinese Word Sense Disambiguation,” In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, 2005, pp. 87-94.
[9] H.-H. Chen, Y.-C. Yu, and C.-L. Li, “Collocation Extraction Using Web Statistics,” In Proceedings of the 4th International Conference on Language Resources and Evaluation, 2004, pp. 1851-1854.
[10] C.-Y. Teng and H.-H. Chen, “Analyzing Temporal Collocations in Weblogs,” In Proceedings of the International Conference on Weblogs and Social Media, 2007, pp. 303-304.
[11] C.-C. Wu and J. S. Chang, “Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses,” Computational Linguistics and Chinese Language Processing, Vol. 9, No. 1, 2004, pp. 1-20.
[12] J.-Y. Jian, Y.-C. Chang and J. S. Chang, “TANGO: Bilingual Collocational Concordancer,” In Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions, 2004, pp. 166-169.
[13] J.-Y. Jian, Y.-C. Chang and J. S. Chang, “Collocational Translation Memory Extraction Based on Statistical and Linguistic Information,” In ROCLING 2004.
[14] C.D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999.


Na VC P Nh V_2 × P Nh × Nc DE Nc × VJ V_2 Dfa × × V_2 Nep V_2 V_2 × × DE Dfa × × Nep × × × DE Caa Na DE Caa Na Dfa DE Dfa Ng VH DE Ng VH V_2 Dfa VH × V_2 Dfa × V_2 VH × V_2 × VH DE VH Dfa VH × Dfa × VH × × Na DE Na VH DE VH DE Na Cbb Na DE Cbb Na VC DE VH VH Di NdNcd VA Nb

Nd D Nb Nc

Nep Nh Di Na NcdNh VA Na Ncd Di DVC × D VC P × × × × DE Na Di Nb DE Di Nb Di Nh DE Di Nh VC Nh Na Di Nh Neu Nh Nc D Nc Di Neqa

200

VH DE DE DE DE D D D Dfa D Dfa Na Na Nep Dfa Nep Nf Dfa Nf SHI VC × VJ Dfa V_2 Dfa D D Dfa Dfa DE × × D Dfa Dfa Cbb Na Dfa DE Na DE Nv D D D D D Dfa D D Dfa D SHI D VC × D VJ Dfa D V_2 D Dfa D D Dfa D D D D D D D DE × D × Na DE Dfa Na DE Na Dfa D Na Dfa Na SHI Dfa Na SHI Na SHI Na VC × Na SHI × Na__ × Na Dfa Na Na Dfa Na Na Nf Na Dfa Nh VC × Nh VK Dfa Nh Dfa VC DE D VC DE VC DE VC VC D VC Dfa VC VC VE DE D VJ Nep D VJ Nep VJ D VJ Na Nv Na Dfa DE D DE Dfa DE D Dfa Dfa DE Dfa Na D Dfa Na D P Nh Dfa P Nh P Nh P D P Dfa Nh D Nh Dfa Nh DE Na × DE VH DE D VH D Nh VH D VA V H D VA V H VA _ _DVA VE VH D VE VH VE D VE Na × DE ×

VE Dfa

201 D VE DE D DE Nep DE DE DE D × × D D × Na × × Nf Nh P × Nh Nh × Nh × P P T T D P × × D Dfa Nep P × × Na Nh Nh DE D VH DE Nep Nf DE VH D D × D P Nh D P D P D Na D Nh D D Di Neu Di Na Nh Nh D P × Nh D × Nh Na D Nh P Nh Nh P Nh P Nh VE Nh Nh P × Nh × Nh Nh Nh Nh Nf Neu Nf D P × D × P Nh P Nh Dfa VH Nep Nf VH P Nh Nh Na D DE SHI DE SHI DE SHI Na DE Na SHI Na Na Na Nc Nh _ Nh × ×

P Nh D P Nh P D P Nh D Nh D VE

202 DE DE × D D D × Na Nb Nh × Nh SHI VA VH V_2 V_2 × × × × Di DE Nh D D Nh Nh Na DE P Nh DE P DE VH Dfb DE Nh D D D D × D P D VA D VE D V_2 D Na D D D Di D Na D Nh D D Nh Di Neu Di Na D × Na P Na Nh Na Nh Na Nb VH DE Nb VH Nh D Nh D × Nh P Nh SHI VE Nh SHI VK Nh SHI Nh VA Nh Nh Nh VE Nh VK Nh Nh Di Nh Nh Nh Nh Nh P Nh D P Nh P P D P SHI × VE Na T × VE Na × VE T × VE VH DE × VH × V_2 D V_2 VE Na VA Di VA VE Nh VE Di × Na T × Na × T × D D × P Nh P SHI VE SHI VK SHI VE VK Nh P Nh Nh VK DE VK VH Dfb VE Nh D D Nh Na Nh VE Nh VE VE Caa SHI Caa SHI Nh D Nh VK Nh D

Sentiment Trend Analysis Using Blog Corpora

ᐇ ଯहӼ ഋߞ׆ܱླྀ

Department of Computer Science and Information Engineering, National Taiwan University

{d91013, r95116, hhchen}@csie.ntu.edu.tw

Abstract

ሡᙦ൤ᇟ਑ٰྍǶҁЎଞჹЎ܌ೀ౛قЎҁǴࣁᇟޑԖਔ໔኱૶ڀεໆٮ೽ပ਱ග

ᇟ਑η໣ӝǴᆕӝঁձਔ໔ޑ(Ϫϩԋόӕਔ໔ୱ(Time DomainځǴஒ܄ਔ໔኱૶੝ޑҁ

ᡂϯǴޑᆶ௃ᆣ)ӧᐉၠਔ໔ୱـᇟ਑ǴᢀჸҞ኱ᢀᗺ(sentimentǴ೯தх֖ཀޑٮග܌ୱ

ගр೽ပ਱ၗૻز೽ပ਱ЎҁǴҁࣴޑ୷ᘵǶࣁΑᕇளόӕਔ໔ୱޑբࣁᢀᗺᖿ༈ϩ݋

಍ϸ㎸سࢗ၌ӧ೽ပ਱ၗૻۓǴа੝ٯЎҁǶӕਔа௃ᆣϩ݋ࣁޑ಍Ǵԏ໣ၠਔ໔ୱس

௃ޑϸᔈ܌᝼ᚒۓ໔Γॺჹ੝ޜ௃ᆣ੝ቻǴᙖаှ᠐ᆛၡޑ࣬ᜢЎҁǴᕇளӚਔ໔ୱޑ

ᆣᡂϯǶ

1. Introduction

วܴǴ࡭ុׯᡂΓॺޑ಍س߈ԃٰǴӄౚၗૻᆛ(World Wide Web; Web)΢Ӛᅿၗૻ

Ǵёว܄ಞޑޣϩНᔂǴϩ݋٬ҔޑБԄǶа 2005 ԃբࣁ΋ঁཷࡴޑჹၗૻ֎ԏᆶೀ౛

ϣ৒Ǵх֖ཥᆪൔᏤǵਓޑ೴ᅌಞᄍ᎙᠐ Web ΢ډ౜Ǻ2005 ԃϐ߻ǴΓॺவௗ᝻ᆛၡǴ

ቶݱӦബ೷ Web ΢ۈၯ௃ൔǵғࢲၗૻǵ׫ၗ੃৲ǵπբᐒ཮฻Ƕ2005 ԃϐࡕǴΓॺ໒

ىǴࣁΑૈᅈޣٮϣ৒Ǵх֖ᇙբԾρ೽ပ਱ǵ࣬ᛛၯ૶ǵቹॣइᒵ฻ǶWeb ୍ܺගޑ

ǴᐟวΑϣ৒ᆛઠ(ӵ nyyimes.comǵcnn.com)ǵᆶ୍ܺᆛઠ(хࡴΕαᆛ؃᎙᠐ሡޑޣ߻

ǴΨ஥؃ബ೷ሡޑޣᑫଆǶԶࣁΑૈ఼ᇂࡕޑ(ઠǴӵ Yahoo!2ǴϷཛྷ൨ᆛઠǴӵ Google2

1 http://www.yahoo.com/ 2 http://www.google.com/

205 ୏Α೽ပ਱ᆛઠ(ӵ Blogger3)ǵ࣬Тᆛઠ(ӵคӜλઠ4)ǵቹॣᆛઠ(ӵ YouTube5)฻ޑጲࠁ

ઠޑٮ൞ϟǴගޑҾ཰ஒ Web ຎࣁεໆၗૻۓ೸ၸ੝ޣᢀᗺϩ݋ϐǴ߻ޑว৖Ƕа൞ᡏ

ဂୖᆶǴബ೷ཥԄၗૻวթࠠᄊǴ೯ᆀޗޣ٬Ҕ ᙖҗ WebޣՅǶࡕفޑε౲൞ᡏុۯѠ

ࣁޗဂ൞ᡏǶ

ᆀᆛၡВᇞǵ܈)ݙཀǴҁЎଞჹ೽ပ਱ޑޣၗྍ֎ЇࡐӭᏢޑബ೷܌ဂ൞ᡏޗ߈ٰ ! !

ϟय़Ǵޑᙁൂٮ಍ගس௖૸Ƕ೽ပ਱ޑೀ౛Бय़قЎҁǴ຾Չᇟޑٮග܌(WeblogǵBlog

٬Ҕ೽ပ਱ӧᆛၡ΢ϩ٦ۈΓॺ໒ޑЎകǴӢԜԖຫٰຫӭޑਔ໔኱૶ڀว߄ޣᡣ٬Ҕ

ൔ֋ࡰޑ࣮ݤᆶЈ௃ǶਥᏵ೽ပ਱ཛྷ൨ЇᔏTechnorati6ޑނғࢲ࿶ᡍǵว߄ჹ٣ޑϺ؂

ϺԖຬၸ12࿤ঁ೽ပ਱ԋҥǴӢԜ؂ኧໆςຬၸ7,000࿤ঁǴ٠Ъѳ֡ޑрǴӄౚ೽ပ਱

ӧԜኧໆϐ΢Ƕ೭ҽൔ֋ύӕਔ׳ཥЎҁޑଅ᝘рૈ܌Ϻ؂(໔(Blogosphereޜ᏾ঁ೽ပ਱

՞К܌ӭǴӚ՞37%Ϸ36%ǴԶύЎҞ߻ۚޣ໔аВЎϷमЎ٬ҔޜΨࡰрǴҞ߻೽ပ਱

ჹຝǶزࣴޑᖿ༈ǴҁЎջаύЎ೽ပ਱ЎҁࣁЬाޑࢂ8%ǴՠԖቚߏٯ

ж߄ڀ܈ၨ஑཰ډפਣࢎΠǴΓॺӧ٬Ҕ೽ပ਱ཛྷ൨ЇᔏਔǴόՠགྷޑဂ൞ᡏޗӧ

ᖐᒤۈЈளϷགྷݤǶTRECԾ2006ԃ໒ޑගр܌ޣ΋૓٬Ҕډפ೽ပ਱ǴӕਔΨགྷޑ܄

΢ޣᝡᖻ໨ҞᇥܴΑ٬ҔځBlog Track7(Macdonald, de Rijke, Mishne, and Soboroff, 2006)Ǵ

ЎകǴޑـ߄ၲཀޣр٬Ҕפ᝼ᚒۓύOpinion Retrieval Taskࢂଞჹ੝ځǶ؃ၗૻሡޑॊ

р࡭ុჹפय़໼ӛǶќѦǴBlog Distillation (Feed Search) Taskࢂॄ҅ޑЎകـ٠ղᘐ၀ཀ

ࡕǴᘤ״ᄺୠεᖻ่ޑёૈགྷӧ΋ঁ๱ӜޣᇥǴ٬Ҕٰٯ೽ပ਱Ƕᖐޑ᝼ᚒᜢݙۓࢌ੝

኷ǴॣڂΨёૈখௗ᝻ђޣ᎙᠐Ƕ٬Ҕޑ࣬ᜢЎകǴ٠ࡷԖᑫ፪ޑӚ೽ပ਱നཥว߄ំ

ᙦ൤ϣ৒ࡕǴ࡭ុु᎙၀೽ပ਱Ƕޑࢌ೽ပ਱ۓ೽ပ਱Ǵ٠ӧޭޑ኷ॣڂ஑ߐ૸ፕђפགྷ

3 http://www.blogger.com/ 4 http://www.wretch.cc/ 5 http://www.youtube.com/ 6 http://www.sifry.com/alerts/archives/000493.html 7 http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG

206 ,௃ᆣ(Ku and Chen, 2007; Lin, Yang, and ChenکـཀޑޣவWebᇟ਑ղᘐ٬Ҕܭ࣬ၨ

Ƕ೽ပ਱ࢂ܄ਔ໔کӳೀǺঁΓϯᏤӛޑᢀᗺԖӵΠޣǴவ೽ပ਱Ўҁύࡩ௚٬Ҕ(2007

ϩ຾Չي຀ᔕޑۓ࿶ᇡ᛾ฦᒵࡕǴ൩ёа੝ޣǴ٬ҔڀวᖂπޑௗޔΓӧᆛၡШࣚനঁ

ӚԄЎҁӧ่ޑٮග܌Ѯε໣ӝǴޑՉࣁǶ࣬ၨϐΠǴWebࢂ႟႟ᕴᕴၗૻޑวѲЎҁ

ӸޔմନǴ൩཮΋܈୏׳ࠠᄊ΢೿ёૈό΋ठǶ྽Webᆛ।΋࿶วѲࡕǴନߚ຾Չ܈ᄬ

܈ᕇளᆛ।วѲڋӣЎҁ຾ՉղᘐǴΨၨคᐒڗ໔္ǴᏃᆅૈ೸ၸཛྷ൨Їᔏޜᆛၡܭӧ

Ԗਔ໔኱૶Ǵӧև౜΢Ψࢂ٩ྣਔ໔வന߈ڀਔ໔Ƕ࣬ၨϐΠǴ೽ပ਱Ўҁ೯தޑ୏׳

Ўҁ຾Չϩ݋ѦǴΨޑԖ܌ᢀᗺǴନΑૈਥᏵޣനᙑ௨ӈǴӢԜ೸ၸ೽ပ਱ԏ໣٬Ҕډ

੝ቻǶޑޣ٬Ҕ(ǵѐԃڬԖᐒ཮ϩ݋рόӕਔ໔ୱ)ӵന߈ǵ΢

ᢀᗺǴϝӸӧ΋٤ᇟ਑ϩ݋ਔѸ໪ޣ٬Ҕڗ੝ᗺǴ೸ၸ೽ပ਱ᘏޑॊ܌Ꮓᆅҗӵ΢ ! !

ЎകࢂٰԾᙯᒵǴԶόࢂ೽ပޑӵᝄ(2007)මࡰрǴ೽ပ਱ЎҁύԖ΋ъٯ᝼ᚒǴޑԵໆ

БᆛઠǴ࿶җ۔ޑ΋૓܈ϣ৒ε೽ҽࢂவཥᆪᆛઠǵޑԾρኗቪǶ೭٤ᙯᒵЎകޣ਱բ

ځளڗᆛၡᕉნΠǴΓॺࡐ৒ܰޑԾρ೽ပ਱΢Ƕ೭ࢂӢࣁӧߡճޣ٬Ҕډፄᇙǵᙯຠ

ҥ൑ޑيҁޣள൩ࢂ၀٬Ҕـόޑၗૻ໺ኞрѐǴӢԜ໺ኞрѐעдၗૻǴΨࡐБߡӆ

ᚇૻǶޑᢀᗺޣ٬Ҕܭόӕډ಍ёૈԏ໣سǶӧԜ௃ݩΠǴـཀ܈

ӧ೽ပ਱Шܭᗨฅ೸ၸ೽ပ਱৒ܰၲԋྎ೯Ǵՠ࣬ჹٰᇥǴΨၨ৒ܰౢғፂँǶҗ ! !

ό΋ठਔǴـဂཀޗж߄ǴӢԜӧᆶ܌ካᆀ܈IDޑϩ೯தࢂҗ΋ঁ຀ᔕيޑࣚύǴΓॺ

Tim O'Reillyӧ2007ޑᔐ฻ՉࣁǶঀ᝼Web 2.0װᇟقၨ৒ܰౢғуݨబᎉǵㄐጜǴࣗԿ

ᆢៈךԃමගр΋ঁȨ೽ပ࠼ՉࣁӺ߾ȩ (Blogger’s Code of Conduct8)Ǵ׆ఈ೽ပ࠼ૈԾ

ፕǴ೿ࢂ΋ᅿЎҁقޑࢂᚯ܈ᢀᗺǴคፕࢂӳޑ಍سᕉნǶฅԶவᇟ਑ϩ݋ޑፕԾҗق

և౜Ǵ཮΋ຎӕϘӦԏ໣ӣٰǴՠӧВࡕϩ݋ೀ౛ਔǴࢂցૈၸᘠ௞೽ပ࠼όؼ୏ᐒޑ

᝼ᚒǶޑᚇૻǴΨࢂ΋ঁሡाख़ຎޑ஥ٰ܌

ڀڗ಍ǴаᕇسҁЎ჋၂ӧόӕਔ໔ୱ΢ჹ೽ပ࠼௃ᆣ຾Չϩ݋Ǵࡌ࿼೽ပ਱ၗૻ!!

8 http://radar.oreilly.com/archives/2007/03/call_for_a_blog_1.html

207 ঁٿ಍س಍ࢎᄬǴಃΟǵѤ࿯ଞჹسЎҁᇟ਑Ƕϣ৒Ӽ௨ӵΠǺಃΒ࿯ӈрޑ܄ਔ໔੝

Ǵٯख़ाਡЈɡЎҁԏ໣ᆶ᏾ӝǵ௃ᆣϩ݋ᆶᖿ༈ɡуаᇥܴǴಃϖ࿯ࢂᖐр΋٤ᔈҔጄ

٠ӧಃϤ࿯ගр่ፕǶ

2. Architecture of the Blog Information System

Figure 1. System architecture of blog sentiment trend analysis.

಍хࡴѤ೽ϩǺԏ໣ᆶ᏾ӝ೽ပ਱Ўҁၗس೽ပ਱௃ᆣᖿ༈ϩ݋ޑ௖૸܌زҁࣴ

ࢗ၌ࢬำǵаϷ௃ᆣϩ݋ኳಔǶಃ΋೽ϩϷಃѤޣ಍ǵ٬Ҕسೀ౛ηقૻǵၗ਑৤ᆶᇟ

೽ϩஒܭಃΟǵѤ࿯ӆբ၁ಒ௖૸Ƕ

಍೽ϩǴӧ Crawler ࿶җࢗೖ೽ပ਱ЎҁൂՏࡕǴ཮Ӄওسೀ౛ηقၗ਑৤ᆶᇟܭᜢ

208 ܌೽ပ਱ၗ਑৤(Blog Database)ǶќѦǴӚឯՏޑӧჹᔈܫӸځ݋р೽ပ਱ឯՏၗૻǴஒ

಍سೀ౛ηقᇟޑॺך઩ЇਔǴ؁ၗૻǴӵ໪࿶ᘐຒೀ౛аᕇள຾΋قᇟޑх֖

(Language Processing System)཮ճҔ Stanford Natural Language Processing Group9 ܌໒ว

ࠠԄޑ Stanford Chinese Word Segmenter10уаೀ౛Ǵ٠ӣ໺Կচၗ਑৤Ǵа Metadata ޑ

຾ՉǶޑշ۳ࡕཛྷ൨ڐ

ਔ໔ၗૻǴӧჴᡏၗ਑໣(Physical Dataset)΢ౢғόޑё೸ၸЎҁي೽ပ਱ၗ਑৤ҁ

ࢗ၌ᔈޑၗ਑໣ёЍජаΠόӕޑᅿኳԄٿӕਔ໔ୱη໣ӝ(Time-sliced Dataset)Ǵ΢ॊ

ޣ಍Ǵᡣ٬Ҕسཛྷ൨ٮϟय़ёаගޣ٬ҔܭӵٯҔǺಃ΋ᅿࢂ୷ҁࢗ၌(General ࢗ၌)Ǵ

ёаᘤំϩ݋೽ပޣ೽ပ਱ЎҁၗૻǴ٬Ҕޑ಍ӣ໺࣬ᜢسᜢᗖӷࡕǴҗޑᗖΕᔕࢗ၌

ᆛ।ǶಃΒᅿၗ਑ޑ၀೽ပ਱ЎҁচӃډ਱ЎҁၗૻࡕǴ೸ၸ҉Φᆛ֟(Permalink)᜘่

ӵ೸ၸᜢᗖӷࢗ၌࣬ᜢЎҁӧόӕٯࢗ၌(Time-associated Search)Ǵޑ໣ёЍජਔ໔࣬ᜢ

ᖿ༈კǶ܈ӈ߄ޑ಍ё٩ྣਔ໔ୱᆶ௃ᆣϩᜪᛤᇙ௃ᆣݢ୏س௃ᆣև౜ᖿ༈Ǵޑਔ໔ୱ

3. Collection and Integration of Blog Posts

ૼжہԾՉࢎ೛ǵޣ೽ပ਱՛ܺᏔǴ೭٤՛ܺᏔёૈࢂ٬Ҕޑόӕܭ೽ပ਱ЎҁՏ

ԵໆǴޑ܄ᆶֹ᏾܄ᇟ਑ѳᑽܭϐ୍ܺѳѠǶ୷ٮග܌ࢂҗҾ཰ቷ୘܈ᐒᏔǴ܈ᆅϐำԄ

೽ပ਱ЎҁǴ٠ૈஒӚԄЎҁ᏾ӝԋࣁ΋ঁޑ಍ሡाૈԏ໣όӕٰྍس΋ঁ೽ပ਱ၗૻ

ॺךDeep Web (He et al., 2007)ୢᚒǴ೭္ ޑډय़ᖏ܌׷਑ਔޑ ၗ਑໣ӝǶӵӕԏ໣ Web

ӢᔈϐၰǶޑ Ǵᆶ௖઩ Deep Blogosphereۺཷޑ Ψ࣬ჹᆕӝр΋ঁ Deep Blogosphere

ޑᒏ܌΋ጇ೽ပ਱Ўҁ೿Ԗ؂ǴډᄬࡕёΑှ่ޑ २Ӄӧও݋ Blogosphere

ޑԖ܌ϐ൩ࢂૈࡨೖၸШࣚ΢قЎҁ௖઩Ǵᙁޑ PermalinkǴ΋ঁֹ᏾ Deep Blogosphere

ᆛ֟೯தځᆢៈǴ܌ၗ਑৤ޑPermalink ε೽ϩ೿җӚ೽ပ਱՛ܺᏔ PermalinkǶฅԶ೭٤

᜘่(Hyperlink)Ǵε೽ϩΨޑᆛ।΢ډᘤំ܌ᆛ֟ځჴᡏᆛ।ǴԶਥᏵޑόж߄΋ঁ੿҅

9 http://nlp.stanford.edu/ 10 http://nlp.stanford.edu/software/segmenter.shtml

209 ࢂҗӚ೽ပ਱՛ܺᏔԾ୏ౢғǴӵᏤЇ२।᜘่ǵკТ᜘่ǵቶ֋᜘่ǵբޣၗૻ᜘่ǵ

ޗဂ௢ᙚ᜘่฻Ƕ

೸ၸ೭ኬޑও݋Ǵёаว౜೸ၸ໺಍ Hyperlink ᐋރࡨೖ܌Ԗᆛ।่ᗺኳԄǴஒᜤа

Ԗ೽ပ܌ЎҁǶှ،ϐၰࢂӧЎҁϐ΢Ӄᆢៈ΋ঁԖᜢޑ Ԗ Deep Blogosphere܌௖઩ֹ

ਔࢗೖӚۓӈ߄Ǵӆޑ Ԗ Bloggers܌ၰШࣚ΢ޕΓαදࢗǴཀջӵ݀ૈ୼ޑ(࠼(Bloggers

Bloggers ܌ว߄ޑനཥЎҁǴаᕇளཥ PermalinkǴߡૈߥ᛾ӧ΋ঁჴՉਔ໔ᗺϐࡕǴૈ

Ԗ೽ပ਱ЎകǶ܌ډ୼ԏ໣

ነ೽ပڻ!ጄൎᙁϯǴஒψYahooޑ ஒ Deep BlogoshpereزǴҁࣴۺࣁΑჴᡍ΢ॊཷ

BloggersǴ٠வϤДҽ໒ ޑ೽ပωຎࣁΟঁ຀ᔕޜ਱ωǵψคӜλઠᆛᇞωǵψyam Ϻ

฽ 322,792 ޑډॺᒧԾ 6/20 Կ 7/8 ࣁЗԏ໣ךനཥЎകǶޑ”යࢗೖ၀Οঁ“Bloggersۓۈ

ځҢǶ܌ϩ݋Ƕ຀ᔕ Blogger ЎകኧǴϷ࣬ᜢ಍ीၗ਑ӵ߄΋زҁࣴٮ೽ပ਱ЎҁၗૻǴ

೽ပ਱ЎҁၗૻኧࣁനӭǴௗ߈ 1 ࿤฽Ǵψyam Ϻޑࢗೖૈ܌Ϻ؂ύψคӜλઠᆛᇞω

ޜ೽ပωനϿǴ໻ऊ 250 ฽Ƕ

Table 1. Statistics of crawled blog posts.

Blog host           Posts analyzed   Daily average   Crawl method for new posts
Yahoo! Kimo Blog        132,661          6,982        RSS / dynamic page
Wretch                  185,234          9,749        Ping Server
yam Sky Blog              4,897            258        dynamic page (front page)

ᘜયزǴҁࣴ܄੝ޑനཥЎҁၗૻǴளӢᔈӚ೽ပ਱՛ܺᏔόӕޑࣁΑࢗೖBloggers

ԏ໣ǴаΠϩॊϐǶޑБԄٰ຾Չ೽ပ਱Ўҁၗૻޑᅿόӕٿр

3.1 Crawling Blog Page Lists or RSS Feeds

ӧ२ځǴٯነ೽ပ਱ȩࣁڻ!߄΋ӈрӚ຀ᔕ೽ပ࠼നཥЎകࢗೖБԄǴаȨYahoo

ۓ੝ډRSSࡪ່Ǵ೸ၸԜRSSࡪ່ёᏤЇޑ኱Ң܌ӵკΒआ୮ೀٮ।ȨനཥЎകȩ୔༧ග

ޑURL22ǴRSS (Resource Description Framework)ࢂ೽ပ਱ޜ໔தҔբࣁၗ਑ҬඤޑXML

ȨϞВٮ׎Ԅӈр100ጇനཥЎകǶќѦ၀՛ܺᏔҭගޑURLջаXMLޑٯ਱ԄೕጄǴҁ

ҢǴӕኬΨёբࣁࢗೖ೽ပ਱Ўҁၗૻϐ٩ᏵǶฅԶԜᆛ܌ཥЎകȩ୏ᄊᆛ।23ǴӵკΟ

ӈ߄ӈр྽ϺځКၨ΢ǶӢࣁޑ܄೽ပ਱Ўҁ఼ᇂ౗ǴаϷջਔܭόӕϐೀӧޣ।ၨ߻

᏾೚ӭǴฅֹޣ఼ᇂ౗΢Ǵ཮ၨ߻ޑԖЎകǴࡺӧ೽ပ਱Ўҁၗૻ܌ޑวѲ܌ঐఃϐ߻

ཥ΋ԛǴ׳Ўകӈ߄໻ӧ྽ϺঐఃځܭளৡǴচӢӧٰޣ߄౜΢ၨ߻ޑ΢܄ӧջਔځԶ

ӵკΟआ୮኱Ңೀ܌ҢǶ

Figure 2. The "Latest Articles" block of Yahoo! Kimo Blog.

Figure 3. The "Today's New Articles" page of Yahoo! Kimo Blog.

11 http://tw.blog.yahoo.com/rss/newarticle.xml 12 http://tw.blog.yahoo.com/newarticle/newarticle.php

211 ೽ပȩ२।24ջԖ΋୔༧ӈрཥޜ೽ပȩǶȨyamϺޜȨyamϺܭБݤҭ፾Ҕޑᜪ՟

२।ǴჹԜ୔༧բও݋аԏ໣೽ပ਱ЎҁၗૻǶځ႖Μϩដࢗೖ؂ӵёٯวѲЎകǴ

3.2 Using Ping Servers to Obtain Real-Time New-Article Lists

߻΋λ࿯܌ॊޑRSSೕጄӧ೽ပ਱ޜ໔ౢғр΋ᅿ௢ኞ(push)ޑၗ਑ࢬӛǴӢԜ़ғ

Ծٮӵ྽೽ပ࠼วѲ΋ጇཥЎകਔǴ೽ပ਱த཮ගٯၗ਑ҬඤБԄǴޑᒏPing܌р΋ᅿ

܈ћȑߞဦ๏΋ঁڥǴҭջวଌ΋ঁXML-RPC(ᇻᆄำ୍ׇܺޑӭঁ՛ܺᏔ܈୏Ping΋ঁ

ߞဦٰౢғ΋ঁӈ߄ǴӈрޑډPing Servers”཮ᙖҗԏ“Ping Servers”Ƕ೭٤“ ޑᒏ܌ӭঁ

ޑPing ServerႽࢂVeriSign Ϧљޑܫ೽ပ਱ᆛ֟ǶҞ߻໒ޑ୏׳܈ԖཥЎҁวѲ

ӵ೽ပٯ೽ပ਱ӈ߄Ǵځ᎙ुޣࢂYahoo!Ϧљblo.gs26ǴࣣϢ೚ᆛၡ୍ܺ܈Weblogs.com25

ཛྷ൨่݀Ƕޑၨཥޣ٬Ҕٮ೽ပ਱Ǵٰගޑཥ׳਱ཛྷ൨Їᔏёаᙖҗࢗೖനཥ weblog change list”27բࣁ“ޑٮග܌Ǵᒧ᏷Weblogs.comۺཷޑճҔPing Serverزҁࣴ

ኬڗ೽ပ਱ᆛ֟Ǵ࿶җᒿᐒޑཥ׳೽ပ਱Ўҁၗૻԏ໣ϐ٩ᏵǶ၀ӈ߄ӈрന߈ϖϩដ

ፓࢗว౜ǴύЎᇟ਑ޑ೽ပ਱ЎҁаȨคӜλઠᆛᇞȩኧໆനӭǴ௖૸চӢᔈࣁȨคӜ

ޑਔΠၩԜӈ߄ۓфૈǴӢԜёаႣයӧ೸ၸޑWeblogs.comډԾ୏Pingٮλઠᆛᇞȩග

Ԗȩ܌ਔ໔ᗺϐࡕȨۓ೽ပ਱ЎҁၗૻǴಕᑈԋவ੝ޑཥȩ׳БԄǴவӚȨന߈ϖϩដ

ޑ೽ပ਱ЎҁၗૻǶ

4. Sentiment Analysis Module

௃ᆣޑ٬ҔLivejournal28኱૶Ўҁว߄ਔЈ௃(а೽ပ਱Ўҁբࣁᇟ਑ǴMishne (2005

಄ဦǴ૽ግSupport Vector Machine (SVM; Cortes and Vapnik, 1995)ӧЎҁቫԛޑЈ௃ϩᜪ

13 http://blog.yam.com/ 14 http://Web.weblogs.com/ 15 http://blo.gs/ 16 http://rpc.weblogs.com/shortChanges.xml 17 http://www.livejournal.com/

212 Ј௃ޑਔ໔ୱ΢Ǵаკࠠឍញ೽ပ਱Шࣚ᏾ᡏޑԖᢀჸ܌ӧ؁຾΋׳(ᏔǶMishne (2006

(ഋ(2006کǶླྀڙགޑ(໔Ԗസᎈ(Drunkڹང(Love)Ǵӧډڙӵӧ௃Γ࿯߻ࡕགٯࡰኧǴ

߾ԏ໣೽ပ਱ύ஥Ԗ߄௃಄ဦޑѡηǴ૽ግSVMӧѡηቫԛޑ௃ᆣϩᜪᏔǶӧа΢܌ॊ

௃ᆣ಄ဦǴࣣҔٰ྽բЈ௃ޑ੝ձૈх֖܌ύǴ೽ပ਱ЎҁӢӧᆛ।և౜БԄ΢زࣴޑ

੝ቻॶ(Features)Ƕޑᒏ܌ᜢᗖӷ߾׎ԋΑޑх֖܌Ўѡ܈௃ᆣ኱૶(Taggings)ǴԶЎҁ܈

ܜޑڂ೽ပ਱ᇟ਑Ǵ຾ՉԾ୏ϯ௃ᆣӷޑεೕኳжԖ߄௃಄ဦ׳٬ҔΑ(Yang฻Γ(2007

р౜բࣁԋ੝ቻॶǴ૽ޑ༼Ԗ௃ᆣຒڀБԄǴஒЎѡύڗܜڂ௃ᆣӷځԵୖزǶҁࣴڗ

Ўѡ௃ᆣϩᜪᏔǶӚ௃ޑግрх഻֖(joy)ǵࡗ(angry) ǵࠉ(sad)ǵ኷(happy) ӅѤᜪ௃ᆣ

ҢǴӵȨࡗȩ܌୔ձᆶ࣬ᜢຒ༼ӵ߄Οځ୔႖Ǵ܌ำࡋԖޑໆݢ୏ૈکᆣϩᜪӧ҅ॄय़໼ӛ

ύȨ഻ȩځև౜΢ǴޑำࡋΨၨεǶѤᜪ௃ᆣӧЎѡޑໆݢ୏ૈځय़௃ᆣѦǴॄܭᜪନΑឦ

感受到熱戀、愛慕、喜好之類「喜」情緒出現在語句中,例如:

    「我最喜歡 Jolin 了!」

感受到生氣、憤怒、咒罵之類「怒」情緒出現在文句中,例如:

    「可惡!早知道就不要出門了。」

感受到低潮、痛苦、難過、同情之類「哀」情緒出現在文句中,例如:

    「嗚嗚…可魯真的很可憐」

感受到高興、開心、有趣之類「樂」情緒出現在文句中,例如:

    「計畫出遊又碰到好天氣實在很開心。」

表二、情緒分類與相關詞彙

情緒分類    說明                          相關詞彙
喜(joy)     情緒傾向:正面;能量波動:較大    愛、幸福、可愛、喜歡、謝謝、感動…
怒(angry)   情緒傾向:負面;能量波動:較大    生氣、討厭、氣死、可惡、罵、恨…
哀(sad)     情緒傾向:負面;能量波動:較小    哭、痛、難過、傷、可憐…
樂(happy)   情緒傾向:正面;能量波動:較小    哈哈、開心、好笑、高興、不錯、加油、很好…

213 ჴբࢬำӵკѤޑကࡕǴ௃ᆣϩ݋ኳಔۓޑ٬ҔԖ୷ҁޑ༼ϩᜪᆶᆶຒޑ௃ᆣܭჹ

ڗѡηբࣁᇟ਑ǴޑԖ஥Ԗ߄௃಄ဦ܌6Дډነ೽ပ਱2006ԃ1Дڻ!ҢǶ२ӃаYahoo܌

р஥Ԗ߄௃಄ဦ኱૶ޑЎѡӅ560,127฽Ǵ೸ၸCollocation Modelᒧр500ঁ௃ᆣຒ༼Ƕх

Ўѡ࿶җ੝ቻॶᙯඤ׎ԋ૽ግၗ਑Ǵ၀ၗ਑Ҕа૽ግ௃ᆣϩᜪᏔǴϩᜪޑ༼၀௃ᆣຒ֖

ϩᜪᏔኳޑ૽ግ܌ڀǴаLibSVMπڀLibSVM29πޑගр܌(ჴբ٬ҔFan฻Γ(2005ޑᏔ

௃ᆣ໼ӛǶޑᄬԋЎѡځ಍Ǵᙖаղᘐ࣬ᜢЎകسࠠ߾᏾ӝΕ೽ပ਱ၗૻ

ᜢܭ૽ግၗ਑ໆ೽ϩǴܭѤᜪ௃ᆣӚࡷᒧ30,000ঁЎѡǴी12࿤ѡ຾Չ௃ᆣϩᜪᏔޑ

૽ግ໣Ǵޑഋ(2006)໻٬Ҕऊ4,000฽૽ግᇟ਑Ǵҁჴᡍ٬ҔΑ30७εکླྀܭ૽ግǶ࣬ၨ

૽ግၗ਑໣ኧໆᗨฅёаᝩុ჋၂ᘉεǴՠࢂӧ౜ϞIntel Xeon 5320πբઠᕉნΠǴ30७

εޑ૽ግ໣ςቚуΑ4,120७ޑ૽ግਔ໔Ƕόӕ૽ግၗ਑ελᆶ࣬ჹ૽ግਔ໔ӵ߄Ο܌

ሡ܌७ਔǴٿ૽ግਔ໔໻ሡ6ࣾǴՠ૽ግၗ਑ᡂԋޑሡ܌ҢǴ྽૽ግၗ਑໻Ԗ4,000฽ਔǴ

ϩᜪᏔ߾ሡऊ7λਔωૈ૽ግֹԋǶޑ௦Ҕ܌૽ግਔ໔ഖቚуԋ4७ϐӭǴԶҁЎޑा

               

      

   

圖四、情緒分析流程圖

18 http://rpc.weblogs.com/shortChanges.xml

表三、情緒分類器訓練時間分析

訓練句數    訓練倍數    訓練時間          時間倍數
4,000       1           約 6 秒           1
8,000       2           約 25 秒          4
40,000      10          約 31 分          310
120,000     30          約 6 小時 52 分   4,120

圖五、情緒趨勢分析圖(三個子圖,查詢關鍵字分別為:蕭敬騰、王建民、馬英九)

5. 部落格情緒趨勢分析

ǴёਥᏵ೽ပ࠼ӜൂǴаࢗೖനۺҁЎಃΟ࿯ගр΋ঁ຀ᔕ೽ပ࠼ȨΓαදࢗȩཷ

ᔈҔǴၮޑБԄٰᕇளЎҁၗૻǶҁ࿯ύӆගр΋ঁ೽ပ࠼Ȩ҇ཀፓࢗȩޑཥЎകӈ߄

215 ҔಃѤ࿯ᇥܴޑ௃ᆣϩ݋ኳಔǴଞჹঁձЎҁύޑӚЎѡ຾Չ௃ᆣϩ݋Ǵୀෳ၀ЎҁԖ

คр౜഻ǵࡗǵࠉǵ኷฻௃ᆣǴӚ௃ᆣϩኧа၀௃ᆣޑЎѡኧޑჹኧीᆉϐǶਥᏵಃΒ

ॺӧჹЎҁբ௃ᆣϩ݋ךԖਔ໔኱૶ǴӢԜڀЎҁၗૻޑٮග܌ॊǴ೽ပ਱ၗ਑৤܌࿯

ਔǴΨёаࣁόӕਔ໔ୱޑЎҁ໣ӝीᆉрჹᔈޑ௃ᆣϩኧǶ

ЎҁǴঁձਔ໔ୱаϺࣁൂՏǴޑวѲ܌ᢀჸ 2007 ԃ 6 Д 20 ВԿ 7 Д 8 Вزҁࣴ

ࡕǴёᙖԜᢀჸঁձ௃ᆣӧၠຫਔ໔ୱਔٯϩኧКޑЎҁ໣ӝᆉрঁձ௃ᆣޑϺ؂ӧჹ

௃ᆣᡂޑ࣬ᜢЎҁӧၠຫਔ໔ୱਔޑᕇள܌ӛǶკϖᡉҢаόӕᜢᗖӷࢗ၌وᡂϯޑ

ΟঁηკǴޑх֖܌ጎȩǵȨଭमΐȩǶკϖےǴϩձࢂȨླྀٯӜࣁۉޑނϯǴࢗ၌аόӕΓ

Πफ़೯ޑ௃ᆣև౜ǴԶȨ኷ȩ௃ᆣᜪᖿ༈ޑےനεܭȨ኷ȩ௃ᆣᜪࢂឦܭӅ೯ϐೀӧځ

தж߄๱Ȩࠉȩ௃ᆣᜪޑ΢ϲǹԶȨ഻ǵࡗǵࠉȩΟᅿ௃ᆣᜪӧόӕਔ໔ୱ΢཮ϕԖሦ

௃ᆣࡕǴޑ኷זޑǴӧ 6/20-6/21 և౜ၨଯٯ ጎȩࣁےᖿ༈კаȨࢗ၌ǺླྀޑӃǶᇥܴঁձ

ЗǶ 8/7 ډ26Ȩ኷ȩ௃ᆣᜪε൯΢ϲࡕǴߡ࡭ុሦӃ/6 ډޔय़௃ᆣ΢ඦǴॄ 6/22-6/23

表四、最近時間域情緒分析統計

查詢      喜     怒     哀     樂     說明
蕭敬騰    12%    14%    11%    63%    星光幫偶像
林宥嘉    14%    13%    12%    61%    星光幫偶像
股票      7%     11%    9%     72%    台灣股市大漲
貓空      10%    22%    19%    48%    受貓纜初期營運負面感受
纜車      11%    21%    18%    50%    受貓纜初期營運負面感受
謝長廷    7%     14%    7%     73%    總統參選人
馬英九    14%    15%    10%    62%    總統參選人
王建民    10%    18%    10%    62%    旅美棒球選手
日本      14%    17%    12%    57%    國家、旅遊、演藝娛樂
韓國      17%    21%    15%    48%    國家、旅遊、演藝娛樂

ۓᖿ༈ǴӢԜӵ݀೛ޑȨҞ߻ȩǵȨ ന ߈ ȩܭ೽ϩࢂឦޑǴനзΓᜢЈۺᒏᖿ༈ཷ܌

ਔǴჴբБݤёааਔ໔௨ׇଞჹӚࢗ၌ᒧрऩυനۺਔ໔ୱࣁȨന߈ȩ೭ঁཷޑᢀჸ

Ƕ߄ѤջଞჹӚࢗ၌ᒧрന߈ 150 ጇ࣬ᜢЎҁǴӧीᆉٯ௃ᆣϩኧКځཥЎҁǴ٠಍ी

Ƕٯ௃ᆣКޑ௃ᆣϩኧࡕǴӈрӚࢗ၌ᆕӝ࣬ჹޑЎҁঁձ

216 ǴаཱུॶڙགޑεॶǴёҔٰᢀჸ೽ပ࠼ჹӚ໨᝼ᚒόӕཱུ܈࣬ჹॶޑӚࢗ၌ϐ໔

ଯᑫǵჹډᕴ಍ୖᒧΓᖴߏ׊നག܈߄౜ǵޑǴёှ᠐ԋ೽ပ࠼ന߈ჹިѱٯᢀჸࣁޑ

ᢀޑᅿ௃ᆣǶа࣬ჹॶٿό৹ᆶғ਻ǵჹᗬ୯߾షԖᘂ۳Ϸғ਻ډᔼၮനགޑᢑًޜᒗ

࣬ᜢЎകၨࣁ΋ठǴ௃ځଽႽǴޑي዗ߐᄺୠКᖻрܭ࠽჏ӕឦ݅کጎےǴӵླྀٯჸࣁ

௃ᆣȨ഻ȩǵځᕴ಍ୖᒧΓǴՠࢂܭৡຯǶԶᖴߏ׊ᆶଭमΐᗨฅӕឦޑᆣ߄౜໻Ԗ٤༾

ᅈཀǴډ߄౜ၨགޑ೚ёှ᠐ԋᏃᆅ೽ပ࠼දၹჹᖴߏ׊܈߾ϕԖയॄǴٯКޑȨ኷ȩ

Զ௃ག΢߾ၨ৒഻ܰངଭमΐǶ

5.1 討論

6. 結論與未來方向

಍Ǵ٠ᔈҔӧ௃ᆣᖿس໔ᕇளЎҁၗૻǴჴբр΋ঁၗૻޜҁЎ௖૸ӵՖவ೽ပ਱

໔ᕇளޜࢗೖ೽ပ਱ޑӭ׳಍҂ٰว৖БӛǴ΋Бय़ёаᘉкسၗૻܭϩ݋΢Ƕᜢޑ༈

ᇟޑډሡҔ܌᝼ᚒزЎҁၗૻǴ΋Бय़ёЍජӵ೽ပ਱ཛྷ൨ǵ೽ပ਱ᄔा฻ࣴޑᙦ൤׳

Ծޑ၉ᚒޑ௃ᆣǴ٠ඓඝ࣬ᜢޑϩ݋΢Ǵ߾ёаරӛԾ୏ୀෳрᡂ୏ޑ਑Ƕӧ௃ᆣᖿ༈

ว৖Ƕڋ୏ᐒ

參考文獻

Corinna Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, Vol. 20, pp. 273–297, 1995.
Rong-En Fan, Pai-Hsuen Chen and Chih-Jen Lin, "Working Set Selection Using Second Order Information for Training Support Vector Machines," Journal of Machine Learning Research, Vol. 6, pp. 1889–1918, 2005.
Bin He, Mitesh Patel, Zhen Zhang, and Kevin Chen-Chuan Chang, "Accessing the Deep Web," Communications of the ACM, Vol. 50, No. 5, pp. 94–101, 2007.
Lun-Wei Ku and Hsin-Hsi Chen, "Mining Opinions from the Web: Beyond Relevance Retrieval," Journal of the American Society for Information Science and Technology, Special Issue on Mining Web Resources for Enhancing Information Retrieval, accepted.

Kevin Hsin-Yih Lin, Changhua Yang and Hsin-Hsi Chen, "What Emotions Do News Articles Trigger in Their Readers?" Proceedings of the 30th Annual International ACM SIGIR Conference, pp. 733–734, Amsterdam, Netherlands, 2007.
Gilad Mishne, "Experiments with Mood Classification in Blog Posts," Proceedings of the 1st Workshop on Stylistic Analysis of Text for Information Access, 2005.
Gilad Mishne and Maarten de Rijke, "Capturing Global Mood Levels using Blog Posts," Proceedings of the AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs, pp. 145–152, 2006.
I. Ounis, C. Macdonald, M. de Rijke, G. Mishne, and I. Soboroff, "Overview of the TREC 2006 Blog Track," Proceedings of the 15th Text REtrieval Conference, Gaithersburg, Maryland, 2006.
Changhua Yang, Kevin Hsin-Yih Lin, and Hsin-Hsi Chen, "Building Emotion Lexicon from Weblog Corpora," Proceedings of ACL-2007, poster, Prague, Czech Republic, pp. 133–136, 2007.
Sheng-Chuan Yen, A Study of Identifying Implicit Trackback in Weblogs and Its Application on Weblog Search, Master Thesis, National Taiwan University, 2007.
蕭、陳信希,「以部落格文本進行情緒分類之研究」,第十八屆自然語言與語音處理研討會論文集,頁 253–269,2006。

218 混合語言之語音的語言辨認

Language Identification on Code-Switching Speech

朱晴蕾 1, 呂道誠 2, 呂仁園 1
1. 長庚大學資訊工程研究所 2. 長庚大學電機工程研究所
E-mail: [email protected], TEL: 886-3-2218800 ext 5967

摘要 本論文主要針對台灣地區所出現的華語與台語的混合語言語音(Code-Switching Speech) ,研究其語音的語言辨識。藉由自動語音辨識技術得到語音音節序列,經語言 詞典比對產生斷詞後,以字詞在不同語言中出現之頻率和字詞組合機率為判別語言共同 字詞準則,進而得到語音之語言種類辨識。最後以語言標籤和語言時間資訊兩種不同的 評估方式,分別對語言辨別進行進一步正確率評估,約可達到 83.4%及 78%的語言辨識 率。

一、緒論

傳統 LID 主要用於判別出一段語音是由何種語言所建構成,例如:「 今天天氣不 好」、 “How are you?”…等,所判別的語音本身是僅由一種語言所組成。但由於多語 的社會環境,因大量吸收外來語,及方言接受度的提高,導致現代語言結構規則發生改 變,或為因應不同需求或習慣,而擷取不同語言片段(/特徵)以創造新詞,使得越來 越多不同語系的語詞夾雜應用於陳述上。在如今全球溝通交流普及的環境下,語言轉換 (Code-Switching)技巧不僅在報章雜誌大眾傳媒上頻繁出現,也在日常生活上被一般 大眾普遍使用。

所謂的混合語言語音(Code-Switching Speech)指的就是一段由 2 種或 2 種以上語 言交替組合而成的語音。說話者從一種語言轉用另一種語言,轉換原因諸如遇到談話主 題、對象或場合的改變,或是說話者雖有某事物的概念、卻僅能以某種語言形式表達時, 語句便會在中途產生轉換,局部轉切至該語言形式,例如:「 Star Bucks 的咖啡很好喝」 為華英 2 種語言的轉換用法,而「今天晚上有“夜市仔”」, 則為地方方言與國語混合 使用,為華台 2 種語言的轉換用法。 我們的研究目標,主要便是針對在台灣地區最普遍出現的語言交雜狀況—也就是以 華語為語言主幹,在語句中穿插其他種類語言的混合語言語音(Code-Switching Speech) 者—為研究對象。期望能找出一個較佳的策略,以區分一段未知語音中的不同語言的交 換分界處,進而有效判斷出語言種類。如圖一所示,我們期望能找「今仔日 e」為一個 台語語音片段,而「天氣不錯」為一個華語語音片段的結果。

219 圖一、Code-Switching Speech

目前在自動語言辨識的課題上,大部份仍是以純單一語言構成之語音為主,混合語 言語音之研究則較為少見。香港的 Joyce Y. C. CHAN 等人曾在 2004、2006 發表關於廣 東話與英語混雜的混合語言語音研究[7][9],分別嘗試使用 Knowledge Base 和 Data Driven 兩種不同的語言混合音素集合辨識語音序列,並以 Bi-phone 出現在廣東話中的 機率做為語言判別準則,其辨識結果約 69.62%、76.54%。而在國內,成功大學林俊憲、 Chi-Jiun Shia、Chung-Hsien Wu 等人對於華語與英語混雜語音的相關研究[4-6],使用 BIC 預先切割語音,再以加入隱含式語意索引(Latent Semantic Indexing) [2,3] 所訓練而成的 高斯混合模型來辨別語音音序,以達到混合語言辨別的成效,其辨識結果約為 74%。前 者在語言辨識的架構較為簡單且亦可得到一定語言辨識效果,故我們採取類似 Joyce Y. C. CHAN 等人的實驗系統架構,並於後端語言辨別處理部份則加以改進,期望能達到 更高的語言辨識率。

我們系統架構流程如圖二所示:當一個語音進來時,先將其進行特徵的截取,經由聲學模型和語言模型兩部份,將得到一個 syllable 序列;接著將這個 syllable 序列給予語言判別區塊,以詞典斷詞、語言共同詞處理判斷,得到這整段語音的語言類別;最後再使用語言標籤和語言時間資訊為兩種不同的評估方法,評鑑整體語言辨認正確率。

(圖示內容:Speech → Feature Extraction →(Acoustic Model、Language Model)→ Syllable Sequence → 語言判別 → Language Category)

聲學模型:
1. Language Independent Acoustic Model (LIAM):a、ap、e、err、cai、cang、ih、ik…etc.
2. Language Dependent Acoustic Model (LDAM):a_M、a_T、ak_T、err_M、i_M、i_T…etc.

語言模型:
1. Free Syllable:無特定的語言規則
2. N-gram:使用語言單位內在的統計規律

圖二、系統架構流程圖

本篇論文內容分別為:第二節音節辨識,介紹所使用之聲學和語言模型,第三節語 言判別策略,第四節實驗用語料、正確率評估方式和最後的結果討論。

220 二、音節辨識

在我們的語言辨認系統中,是基於自動語音辨識所得到的音節序列為基礎,再做後 續的語言辨識。而音節序列的正確與否將會影響後續語言辨識的正確性。

(一)聲學模型

在語言的標音上,我們使用福爾摩沙音標( Formosa Phonetic Alphabet, ForPA )。 ForPA 音標是一個台灣地區三語(華台客)共用標音系統,在華語的音素有 37 個,而台語 有 56 個,兩種語言音素聯集共有 63 個,而交集的有 32 個。

以 ForSDAT(Formosa Speech Database)台灣地區多語言語音資料庫做為訓練資料(Training Data)。華語部份使用 ForSDAT 中 MD01 語料庫,係由 100 人所錄製成的華語句型語料,總長度為 11.3 小時。台語部份使用 ForSDAT 中 TW01 語料庫,係由 100 人所錄製成的台語句型語料,總長度為 11 小時。每個聲學模型皆以音節內左右相關的方式,以三個連續音素為單位,使用 3 個狀態的隱藏式馬爾可夫模型(Hidden Markov Model, HMM)。

在聲學模型方面,分為語言獨立的聲學模型(Language Independent Acoustic Model, LIAM):在聲學模型中所有的音標沒有語言上的分別,如 a、ap、e、err、cai、cang、ih、ik…etc,整個聲學模型是由華語與台語共同訓練而成。另外,我們對於每種語言分別訓練其獨立的聲學模型,並在每個聲學模型中標記其所屬的語言類別,再將不同語言的聲學模型結合在一起,形成一個混合語言的聲學模型[11],稱之為語言相依之聲學模型(Language Dependent Acoustic Model, LDAM):在聲學模型中所有的音標皆帶有所屬語言的標籤,如 a_M、a_T、ak_T、err_M、i_M、i_T…etc.。

(二)語言模型

N-gram 的語言模型可以提供一種語言中其文字的序列規則,並以統計和機率的方 式來呈現。以下是 N-gram 的表示式,W 為 n 個 w 的集合,其中每個 w 代表一個字詞。 假設第 n 個字詞的出現只與前面 N-1 個字詞相關,而與其他任何字詞都不相關,整句的 機率就是各個字詞出現機率的乘積。這些機率可以通過直接從語料中統計 N 個字詞同 時出現的次數得到。當 N 愈大時所需統計語料將愈多。

PW() P ( www123 wnnn)()(|)(| P w 1 P w21 w P w ww12 w  1 ) n   Pw (iin | w121 w i  w i ) i  1 對正確的語言現象,字與字之間的共同出現機率較高,對一些較不符合語法者,字與字 之間的共同出現機率較低。此機率的大小可以反映出一個語言的局部規律。例如:今天 和今仔日有相同的意義,但“今+日”這種組合在華語中出現機率相對比在台語中出現 機率高,而“今+仔+日”這種組合在台語中出現機率相對會比在華語中出現機率高。

221 依據現有已標音完成文字語料資料量,我們使用的 N-gram 語言模型上取 N=2,亦 稱之為 bi-gram,並以 Syllable 為單位量。我們所使用來訓練 Bi-gram 機率的語料,在華 語文章部份總句數約有 1 萬 7 千(17203)句,全部約 23 萬字(230236)左右;台語文 章部份總句數約 9 千(9539)句,全部約 10 萬字(104324)左右。其中,所有文章中 的句子皆僅由一種語言組成。

在 Bi-gram 的情況下,一個字詞出現的機率只依據上一個字詞而定。Bi-gram 的機率表示法為:

P(w_t | w_{t-1}) = C(w_{t-1}, w_t) / C(w_{t-1})

其中 w_{t-1}、w_t 表示時間上相連的兩個 Syllable,C(w_{t-1}, w_t) 為 w_{t-1} 與 w_t 在文章中共同出現的次數,C(w_{t-1}) 則為 w_{t-1} 在文章中出現的次數。

三、語言判別

一段語音進來後經由聲學模型和語言模型後可以得到一 syllable sequence,將這個 序列再進行語言判別。以流程圖圖三所示,當一個 syllable sequence 進來後,會先判別 字詞邊界與語言,若序列中每個 syllable 皆已標記了可能語言種類後,再確認所標記的 語言種類中是否有包含了語言間共有的字詞後,將這些共有字詞再進行第 2 次判別,最 後將可得到這個序列的語言類別。這是我們的語言判別大略的步驟,詳細的內容在之後 會逐一說明。

(圖示內容:Input Syllable Sequence → 判別字詞邊界與語言(多字詞詞典)→ 是否有尚未標記者?是:以單字詞詞典標記單字詞語言 → 是否含共同詞?是:共同詞處理(多字詞:各語言中使用頻率;單字詞:前後相連出現機率)→ Result Language Category)

圖三、語言判別流程圖

(一)字典與詞典製作

首先,先介紹在一開始判別字詞邊界與語言這個步驟所用到的多字詞詞典與單字詞詞典。多字詞詞典部份,我們使用中研院的華語詞典與實驗室既有的台語詞典混合成一

新的華台混合詞典,總詞數約 14 萬(143705)詞,並將每個多字詞後標記所屬語言種類。整個詞典將被分成三種語言類別:僅會在華語中出現的多字詞,標示為 M,約 8 萬詞(88675);僅會在台語出現的多字詞,標示為 T,約 5 萬詞(53499);最後一種為兩種語言中皆可能出現者,標示為 *M/*T,約一千五百詞(1531)。 單字詞詞典的製作與多字詞詞典雷同,在單字詞詞典中將含有兩種語言所有可能出現的發音,總單字詞數為 1010 個,並同樣分為三種語言類別。僅在華語中使用的單字詞數為 230 個,僅在台語中使用的單字詞數為 580 個,華語與台語同時使用的單字詞數為 200 個。

(二)判別字詞邊界與語言

我們實際觀察一些混合語言的語音文句,發現語言轉換時機似乎經常發生在詞 (word)上,因此我們在這裡先假設語言轉換的時機在於 word 上,故 word 與 word 的 邊界便可能為一個語言交換的邊界處。在判別字詞邊界與語言這個步驟,便是將 syllable sequence 與詞典中詞句最長匹配比對,將比對到的部份做為一詞與詞的分界點,並依詞 典中每個詞後所帶的語言標記做為其語言種類。例如:“uo zuei si huan cyu ia ci a chii dong si”「我最喜歡去(夜市仔)吃東西」這句話,若詞典中分別包含有“uo zuei si huan cyu”、“ia ci a”、“chii dong si” 三個詞,那判別的結果將會為:uo zuei si huan cyu (M)|| ia ci a(T)|| chii dong si(M)|| 。 然而,於此方式的初步成果中,我們可以發現一些問題存在。
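上述「最長匹配斷詞並標記語言」的步驟,可用以下 Python 草稿示意(假設性實作;word_dict、char_dict 的內容僅取圖四的例句作示意,非論文系統之實際詞典):

```python
def segment(syllables, word_dict, char_dict):
    """由左至右以最長匹配比對多字詞詞典,並附上詞典中的語言標記;
    無法匹配多字詞者,退回以單字詞詞典標記單一音節的語言。"""
    result, i, n = [], 0, len(syllables)
    while i < n:
        for j in range(n, i + 1, -1):            # 先嘗試最長的音節組合(至少兩音節)
            word = " ".join(syllables[i:j])
            if word in word_dict:
                result.append((word, word_dict[word]))
                i = j
                break
        else:                                    # 多字詞皆不匹配:查單字詞詞典
            s = syllables[i]
            result.append((s, char_dict.get(s, "?")))
            i += 1
    return result

word_dict = {"zuei zin der": "M", "lai a": "*M/*T"}
char_dict = {"dou": "M", "hern": "M", "tien": "M"}
print(segment("zuei zin der lai a dou hern tien".split(), word_dict, char_dict))
```

其輸出與圖四的初步斷詞結果一致:zuei zin der (M)、lai a (*M/*T)、dou (M)、hern (M)、tien (M)。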

首先是在 syllable sequence 中的 syllable 組合有可能不存在於詞典中。以圖四為例, 最後的 dou hern tien(都很甜)這個 syllable 組合並不存在於詞典中,而這種問題我們 將改以直接使用字典判斷單一 syllable 的語言種類來解決,故最後會得到 dou(M)||hern (M)|| tien(M)||這樣的結果。另外一個問題在於,有些字詞是同時存在於不同語言 間共同使用,也就是原先分類為*M/*T 的字詞,如例句中的 lai a(*M/*T)||,而這個 部份,我們將以另一套方法來處理判斷。

(圖示內容:「最近的 梨仔 都 很 甜」→ 初步斷詞結果:zuei zin der (M)|| lai a (*M/*T)|| dou (M)|| hern (M)|| tien (M)||;其中 lai a 為語言共同字詞,dou、hern、tien 之音節組合不存在於多字詞詞典中)

圖四、判別字詞邊界與語言初步結果

(三)共同詞處理

在共同詞的處理上,會分為多字詞與單字詞的兩種處理方式。在多字詞的處理上,我們使用中研院 CKIP 語料庫與實驗室語料庫所統計得到的在不同語言所出現的字詞與其出現次數表,利用詞在不同語言出現的相對頻率多寡來決定所屬語言。例如,共同發音字詞 ker ci,在台語中出現次數為 149 次,而華語僅 79 次,分別除以所統計的華語、台語總詞數,取兩者相對頻率較高者,而在此我們可以發現得到的在台語詞中出現機率

223 較高,故將 ker ci 標記為一個台語發音詞。 而在單字詞方面,由於我們使用了兩種不同的聲學模型,故將以兩種不同聲學模型 所得到的 syllable sequence 分別做討論。
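多字詞共同詞以相對頻率判別語言的作法,可簡單示意如下(Python;出現次數取自上文 ker ci 的例子,兩語言的總詞數則為假設性數字,僅供示意):

```python
def resolve_common_word(count_m, count_t, total_m, total_t):
    """比較共同發音詞在兩語言中的相對頻率(出現次數 / 該語言總詞數),取較高者。"""
    return "M" if count_m / total_m > count_t / total_t else "T"

# ker ci:台語 149 次、華語 79 次;總詞數 500,000 / 300,000 為假設性數字
print(resolve_common_word(count_m=79, count_t=149, total_m=500000, total_t=300000))  # T
```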

1、Language Dependent Acoustic Model(LDAM):

若聲學模型為 LDAM,則得到的字串中每個 syllable 本身即會帶有語言標籤。在這種 syllable sequence 中發現的語言共有單字詞,便直接以 syllable 本身所帶的語言標籤做為語言依據。

2、Language Independent Acoustic Model(LIAM):

若為另一種聲學模型 LIAM,所得到的字串中 syllable 將不帶有語言標籤。因本身沒有語言標籤,故我們無法以直接的方式得到語言種類。考慮到在我們假設轉換發生於 word 的前提下,應不會以一個 syllable 做一次快速轉換,若前後兩個相鄰詞為相同語言時,則此 syllable 應與前後兩相鄰詞為同一語言。 以上為第一種方法,但這個想法並無法完全解決所有語言共有單字詞的問題,若語言共有單字詞出現在前後兩相鄰詞為不同語言類別間時,便會無法處理。為此,我們提出了第 2 種方法,利用與前後兩相鄰單字詞同時出現的機率大小做為判別依據。與 N-gram 的概念相同,當機率值相對較高時,表示愈有可能為同種語言。在加入這個方法後,將可以成功處理所有語言共有單字詞。
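上述兩種單字詞共同詞的判別方法,合併起來可示意如下(Python;機率數值皆為假設性的示意值):

```python
def resolve_common_syllable(prev_lang, next_lang, score_m, score_t):
    """方法一:前後相鄰詞語言相同時,直接沿用該語言;
    方法二:語言不同時,比較在各語言下與前後詞共同出現機率(bi-gram)的大小。"""
    if prev_lang == next_lang:
        return prev_lang
    return "M" if score_m > score_t else "T"

# 假設性的機率值:score 為該單字詞與前、後相鄰詞共同出現機率的乘積
print(resolve_common_syllable("M", "T", score_m=0.02 * 0.001, score_t=0.005 * 0.010))  # T
print(resolve_common_syllable("T", "T", score_m=0.5, score_t=0.1))                     # T
```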

四、實驗評估

(一)語料庫設計

由於我們在現有的所有語料庫中,並沒有任何混合語言的語音資料。故在進行實驗 前,我們將先行錄製建立一混合語言的語料庫做為我們整個實驗的來源依據。

在錄製語音前,我們需要有混合語言的文句做為錄音劇本。在混合語言的文句收集上,首先由各式文章中,例如:電子新聞、部落格…等,尋找日常生活中可能使用到的混合語言文句來做為我們的錄音劇本。由於台語為一種方言(Dialect),在文字的呈現上,可能與華語使用文字上相同,或使用較為特別、不常使用的文字,較不易直接在一般文章中發現。因此,除了在各式文章中直接尋找華語與台語混合語言文句外,我們參考[8]的作法,將兩句由不同語言構成但意義完全相同的文句做一比對,可得到兩種語言在句法結構與用字上的相對情況,並將文句依相對情況切分成數個文字區塊。選取原文句中某文字區塊,替換成另一種語言文句中相對應的文字區塊,如此便可得到一混合語言文句。

在華語與台語混合語言文句上,我們收集了約 75 句。表一為其中的一些例句。我 們所錄製的華台混合語言語料庫,參與錄音的人數總數為 10 人,約 750 句,平均每句 長度為 2 秒(0:02.391),由麥克風及個人電腦的 32k 音效卡,在安靜無噪音的環境底下 錄製 16KHz,16bits 的聲音訊號,並以 WAV 格式音檔儲存。

表一、華台語混合語言語音例句

Filename    Text                            Transcription
MT_001      我最喜歡去(夜市仔)吃東西      uo3_zuei4_si3_huan1_cyu4_ia2_ci2_a4_chii1_dong1_si1
MT_005      (歹勢!)我遲到了              painn4_se3_uo3_chii2_dau4_ler0

(二)、 實驗結果

我們使用了兩種不同的聲學模型,以及兩種不同的語言模型來做比較,共分為四種 不同的實驗組合:語言獨立聲學模型與 Free Syllable 語言模型、語言獨立聲學模型與 Syllable Bi-gram 語言模型、語言相依聲學模型與 Free Syllable 語言模型、語言相依聲學 模型與 Syllable Bi-gram 語言模型。將四種組合所得到的 Syllable Sequence 交予我們的 語言判別策略做語言上的辨認,可得到四種不同的語言辨認結果。

在混合語言語音 LID 的實驗中,由於沒有語言轉換邊界的資訊,故無法直接得到如單一語言語音 LID 的語言區段。在混合語言語音 LID 所得到的結果會是以音節序列的方式做為初始呈現,因此在這裡我們對原先在單一語言語音 LID 正確率計算上做了些改進。我們將辨識結果中相鄰為相同語言的語言標籤合併,形成數個單一語言區塊,而每個單一語言區塊的語言標籤便為我們計算正確率的單位量。這種單位的計算方式與在單一語言語音 LID 計算正確率的單位量是相同的,而做為在混合語言語音 LID 的語言辨認正確率計算單位時,可同時顯示出語言轉換邊界的正確程度。另外,我們增加了另一種計算單位量,以組成單一語言區塊的基本單位-音節為主,當完成語言辨識後一個音節即有一個對應的語言標籤,也就是最原始的結果單位。以這樣的計算單位,在語言辨認正確率評估上可同時顯示我們在音素辨認上的正確程度。
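「將相鄰且相同的語言標籤合併成單一語言區塊」的動作,可用以下 Python 程式片段示意(假設性的簡化實作):

```python
from itertools import groupby

def merge_language_blocks(labels):
    """將相鄰且相同的語言標籤合併為單一語言區塊,回傳 (語言, 音節數) 串列。"""
    return [(lang, len(list(group))) for lang, group in groupby(labels)]

# 「我最喜歡去夜市仔吃東西」的正確語言標籤
print(merge_language_blocks("MMMMMTTTMMM"))  # [('M', 5), ('T', 3), ('M', 3)]
```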

在評估語言辨認結果正確率上,我們同樣使用了兩種不同的評估方式。第一種是和 單一語言語音 LID 正確率判別相同的評估方式,以語言標籤為評估單位。另外我們在 原本正確率的評估上再加入時間上的資訊,也就是第二種評估方式,以語言時間資訊做 為評估單位。

1、語言標籤評估法
第一種評估方法是以語言標籤來做為評估單位。以圖五為例來說,若有一句話:「我最喜歡去夜市仔吃東西」,正確答案的語言標籤為:MMMMMTTTMMM,若我們辨識出來的結果為 MMMMTTTTMTM,將正確答案與語言判別結果做比對,可以得到 4 種不同情況數值,分別為:Hit:正確答案的語言標籤與語言判別結果的語言標籤完全相符。刪除錯誤(Deletion):語言判別結果所缺少的語言標籤部份。插入錯誤(Insertion):語言判別結果所多餘的語言標籤部份。替換錯誤(Substitution):正確答案的語言標籤與語言判別結果的語言標籤不相符。

225 我 最 喜 歡 去 夜 市 仔 吃 東 西 正確語言 M M M M M T T T M M M

辨識結果 M M M M T T T T M T M

圖五、語言標籤做為評估單位範例

在得到這四種不同的數值後,以現在最常使用的兩個衡量標準來評估我們的方法: 精確率(Precision Rate): 在語言判別結果的語言標籤總數中,正確語言標籤所佔比率。

N_H 為語言判別結果中正確的語言標籤數(Hit 總數),N_L 為語言判別結果全部的語言標籤數(Hit 總數+Substitution 總數+Insertion 總數):

precision = N_H / N_L

召回率(Recall Rate):在正確答案中有被判別出來且為正確的語言標籤佔正確答案語言標籤總數的比率。N_H 為正確答案中有被辨識出且為正確的語言標籤數(Hit 總數),N_C 為正確答案中的語言標籤總數(Hit 總數+Substitution 總數+Deletion 總數):

recall = N_H / N_C

通常,當有高召回率時,便很難能有高精確率;反之,有高精確率時,將難有高召 回率。故我們另使用 F-Measure 做評估。F-Measure 是依據上面的精確率和召回率兩個 衡量標準,加以綜合而成的另一個評估指標。

1111 (  ) Fmeasure 2 precisionrecall 2 precision recall Fmeasure  2 11 precision recall precision recall

在 F-Measure 中,需要同時滿足精確率和召回率皆較高時才能得到高的數值,而這樣的 方法可以在精確率和召回率上取得一個平衡。
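精確率、召回率與 F-Measure 的計算可整理為以下 Python 程式片段(假設性實作;範例中的 Hit、Substitution 數值取自圖五的例子,以逐一對齊方式計得):

```python
def prf(hit, deletion, insertion, substitution):
    """由 Hit、Deletion、Insertion、Substitution 計算精確率、召回率與 F-Measure。"""
    precision = hit / (hit + substitution + insertion)
    recall = hit / (hit + substitution + deletion)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# 正確 MMMMMTTTMMM 對辨識結果 MMMMTTTTMTM:Hit = 9、Substitution = 2
p, r, f = prf(hit=9, deletion=0, insertion=0, substitution=2)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.818 0.818 0.818
```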

表二、以語言標籤做為評估單位正確率

                          以 Syllable 為語言計算單位         以單一語言片段為語言計算單位
                          P        R        F                P        R        F
LIAM+Free Syllable        72.63%   68.55%   70.53%           58.67%   93.75%   72.17%
LDAM+Free Syllable        69.9%    65.7%    67.6%            69.11%   86.82%   76.96%
LIAM+Syllable Bi-gram     72.73%   68.71%   70.66%           82.26%   76.26%   79.14%
LDAM+Syllable Bi-gram     78.1%    68.51%   73.02%           82.35%   79.0%    80.67%

表二為我們以語言標籤做為評估單位的實驗結果,在這種評估單位下,我們可以發現在 LDAM+Syllable Bi-gram 的實驗組合並以合併後的語言區塊為計算單位的情況下能得到最高的正確率。

2、語言時間資訊評估法 第二種評估方式是使用語言時間資訊來做為評估的單位,語言的時間資訊所指是每 個語言標在語音中的起始時間、結束時間與持續時間長度(Duration)。這種評估方式, 是參考[10]在辨認語音中不同語者(Speaker)的正確率計算方法,由於在語音中有一個以 上不同語者交替出現與語音中有不同語言交替出現相類似,故我們使用[10]的評估方式 來評估我們的正確率。首先,將語言判別結果的語言時間邊界與正確答案語言時間邊界 做對位(Alignment)的動作,以判別結果語言區塊時間佔正確答案語言區塊時間比例 為主要對位上的準則。以圖六為例,在完成對位後再比較兩者的語言標籤,同樣可以得 到 Hit、Deletion、Insertion、Substitution 4 種數值,但在這邊數值的計算方式是取用時 間為單位來表示。

(圖示內容:「我最喜歡去夜市仔吃東西」
正確語言:M M M M M T T T M M M,時間邊界 0, 0.1, 0.2, 0.3, 0.45, 0.55, 0.72, 0.83, 0.95, 1.1, 1.2, 1.3
辨識結果:M M M M T T T T M T M,時間邊界 0, 0.1, 0.2, 0.3, 0.55, 0.65, 0.7, 0.8, 0.95, 1.1, 1.2, 1.3)

圖六、以語言時間資訊為評估單位

同樣的我們計算其精確率、召回率與 F 測度,在這個方法下精確度為在所有語言判 別結果的語言標籤中,是正確語言標籤者的時間長度占辨識結果總時間長度的比率。而

召回率也變更為在所有正確的語言標籤中,有被辨識出且為正確語言標籤者的時間長度占正確答案總時間長度的比率。因此,若當每個語言區間時間長度相同時,其實會退化成與以語言標籤為評估單位相同的方式。

N H   i NN*  precision iHH 1  N L NNLL*    i i  1

依語言時間資訊為評估單位時,可以較清楚瞭解判別錯誤的部份對整句語音的影響程度。以圖七「我最喜歡去夜市仔吃東西」為例,使用單一語言片段做計算單位,可分為三個單一語言片段 MTM。若有二種不同的語言辨識結果,第一種辨識結果為 MTT,而第二種辨識結果為 TTM 時,以語言標籤為單位的正確率計算方式上,兩種結果皆為三個語言標籤中有一個語言標籤為錯誤,所得到的正確率會相同,皆為約 67%;若以語言標籤時間資訊為單位時,第一種結果錯誤的語言長度為 0.35 秒,而第二種結果錯誤的語言長度為 0.55 秒,此時便可分出第一種辨識結果其實較第二種辨識結果好。
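以語言時間資訊為單位的正確率,可用以下 Python 程式片段示意(假設性的簡化實作,直接沿用圖七的三個單一語言片段與時間長度):

```python
def time_weighted_accuracy(ref_blocks, hyp_blocks, durations):
    """正確語言標籤的時間長度占總時間長度的比率(各區塊已對位完成)。"""
    total = sum(durations)
    correct = sum(d for ref, hyp, d in zip(ref_blocks, hyp_blocks, durations) if ref == hyp)
    return correct / total

# 圖七:三個單一語言片段 M、T、M,時間長度分別為 0.55、0.10、0.35 秒
durations = [0.55, 0.10, 0.35]
print(time_weighted_accuracy("MTM", "MTT", durations))  # 約 0.65
print(time_weighted_accuracy("MTM", "TTM", durations))  # 約 0.45
```

兩種辨識結果以語言標籤計皆為 2/3,但以時間計則可區分出兩者的優劣。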

我最喜歡去 夜市仔 吃東西 正確答案 M T M

辨識結果1 M T T

辨識結果2 T T M

時間(s) 0 0.55 0.65 1.0

圖七、兩種評估方法比較範例

表三、以語言時間資訊為評估單位正確率

                          以 Syllable 為語言計算單位         以單一語言片段為語言計算單位
                          P        A        F                P        A        F
LIAM+Free Syllable        75.3%    68.68%   71.84%           60.2%    52.78%   56.25%
LDAM+Free Syllable        77.1%    68.87%   72.75%           68.8%    58.05%   62.97%
LIAM+Syllable Bi-gram     84.1%    70.76%   76.8%            79%      60.57%   68.57%
LDAM+Syllable Bi-gram     84.6%    71.76%   77.3%            79.1%    61.67%   69.31%

表三為我們以語言時間資訊為評估單位所得到的結果。將兩種不同評估單位的結果做相互對應,在以音節為計算單位時,使用時間資訊為評估單位的情況下能得到 81.4% 的正確率,而以語言標籤為評估單位的方式為 73%,故可估測在語言辨識錯誤的音節,可能多數為持續時間較短的音節。

在以合併的語言區塊為計算單位時,我們可以發現使用語言標籤來做為評估單位的 情況下可得到的語言辨識結果約 80.7%。但在使用語言時間資訊來做為評估單位的方式 中,我們可以發現以語言區塊為計算單位的情況可得到約 74.8%的語言辨識率。這表示 我們辨識正確的語言片段佔所有的語言片段約八成左右,但正確語言的時間佔總體時間 僅約七成,故可估測在為正確語言標籤的語言區段時間中,含有部份的錯誤時間段。

(三)、 Word 與 Syllable 混合之 Bi-gram

在我們的假設中,混合語言語音中語言轉換可能較常發生於 word 上,故若能得知 word 與 word 間之相應機率,對混合語言語音的語言辨識應能有所幫助。但由於在各種 語言中,word 的數量皆相當多,若想以 word 為單位訓練出其 Bi-gram 機率,則所需的 訓練語料資料量將會相當龐大。因此我們折衷計算部份較常出現的 word 與 syllable 間 的 Bi-gram 機率。由於我們既有的已標音完成的語料庫資料不足,故我們需要重新收集 並製作新的語料庫。首先,收集大量文章文字,我們由新聞稿、鄉土文學、散文、生活 小品等文章中,收集結成約 10 萬句總字數約 100 萬字的文章,但這些文章皆無標音。 接著,我們將這些文章中的文字做標音,由於我們期望能得到類似 code-switching 此類 較貼近我們的主題的文句,故我們先以約 2 萬較常用台語詞(僅雙字詞以上)為主,將 有出現在文章中的詞均先標記為台語的標音,其餘尚未標音的文字再以同樣方式,以約 7 萬華語字詞為做標音。

在整個標音過程中,真正有使用到的 word 個數約 22731;其中華語字詞數為 15881 個,台語字詞數為 6765 個,而華語與台語共同字詞數 85 個。另外,在標音的同時,我 們計算所使用到的每個詞所出現的次數,並重新分別製作新的華台語字詞出現頻率表, 以做為之後語言共同字詞處理時使用。

而在標音完成後便可將文章轉為一個華台語混合的音標序列,並使用完成標音的混 合語言音標序列來做為 word 與 syllable 混合 bi-gram 語言模型的訓練語料。最後將系統 原本的 syllable Bi-gram 語言模型替換成新制作的 word 與 syllable 混合 Bi-gram 語言模 型,再以與之前相同的方法,進行語言辨認的實驗。

我們在混合語言語音實驗中,以音節 Bi-gram 的語言模型目前可得到約 74.8% 的辨識率。在我們假設中,詞可能為混合語言語音轉換的單位,且猜測 word-base 的語言模型應能具有更佳的語言鑑別率,因此我們製作了詞與音節混合的 Bi-gram 語言模型,取代原本的音節 Bi-gram 語言模型,並進行語言辨識實驗。在正確率計算上,一樣使用兩種計算單位和兩種評估方式評測整體的語言辨識率。最後以目前擁有最佳語言辨認結果的音節 Bi-gram 架構做為新語言模型的實驗對照組,比較詞與音節的 Bi-gram 語言模型對於語言辨識上是否有益,而實驗的結果如表四與表五所示。

表四、以語言標籤為評估單位正確率

                          以 Syllable 為語言計算單位         以單一語言片段為語言計算單位
                          P        R        F                P        R        F
LDAM+Syllable Bi-gram     78.1%    68.5%    73.0%            82.4%    79.0%    80.7%
LDAM+Word&Syl Bi-gram     86.4%    63.3%    73.0%            88.4%    78.9%    83.4%

表五、以語言時間資訊為評估單位正確率

                          以 Syllable 為語言計算單位         以單一語言片段為語言計算單位
                          P        R        F                P        R        F
LDAM+Syllable Bi-gram     78.8%    84.4%    81.5%            69.2%    79.0%    73.8%
LDAM+Word&Syl Bi-gram     78.2%    86.7%    82.2%            72.8%    84.1%    78.0%

由實驗統計可發現,無論以語言標籤為評估單位或以語言時間資訊為評估單位,詞 與音節混合的 Bi-gram 方法所得到的結果皆比音節 Bi-gram 進步了約 3%左右。由此可 知,以 word-base 的 Bi-gram 語言模型的確比音節 Bi-gram 的語言模型更具有語言鑑別 能力。

五、結論

在我們的語言辨認系統中,由於是基於自動語音辨識所得到的音節序列為基礎,音節序列的正確與否將會影響後續語言辨識的正確性。故我們使用了不同的聲學模型和語言模型做為音節辨識上正確率提升的嘗試。由實驗結果得知,在特徵參數擷取中,直接使用 39 維 MFCC 在我們的系統能有較好的語音識別表現。在聲學模型上,以語言相依的聲學模型(LDAM)能得到較好的結果,而在語言模型上,使用 Bi-gram 可表現出語言的規律性,有助於語言的鑑別,且以詞與音節混合 Bi-gram 又可比使用音節 Bi-gram 得到更高的語言鑑別度。而在評估方面,以語言時間資訊為評估方式在語言辨識正確率的計算上,比以語言標籤為單位的評估方式多考慮了時間因素,可得到更加精確的語言辨識率,使得評估結果上可更加客觀有力。

參考文獻

[1] 林奇嶽、王小川,”自動語言辨認簡介”,計算語言學通訊第十七卷第四期,2006

[2] Haizhou Li,Bin Ma,Chin-Hui Lee,”A Vector Space Modeling Approach to Spoken Language Identification”,IEEE TRANSACTIONS ON AUDIO,SPEECH,AND LANGUAGE PROCESSING,VOL.15,NO.1,JANUARY 2007,P271-P284 [3] Sheng Gao,Bin Ma,Haizhou Li,Chin-Hui Lee,”A Text Categorization Approach to

230 Automatic Language Identification”,INTERSPEECH 2005 [4] Chi-Jiun Shia, Yu-Hsien Chiu, Jia-Hsieh and Chung-Hsien Wu,“Language Boundary Detection and Identification of Mixed-Language Speech Based on MAP Estimation”,ICASSP 2004 [5] Chung-Hsien Wu, Senior Member, IEEE, Yu-Hsien Chiu, Chi-Jiun Shia, and Chun-Yu Lin,“Automatic Segmentation and Identification of Mixed-Language Speech Using Delta-BIC and LSA-Based GMMs”,IEEE Transactions on audio, speech, and language processing, Vol.14, No.1, January 2006, p266-p276 [6] 林俊憲,<應用隱含式語意索引與語言模型於中英夾雜語音之語言鑑別>,國立成 功大學,碩士論文,民國 91 年

[7] Joyce Y. C. Chan, P.C. Ching, Tan Lee and Helen M. Meng,”Detection of Language Boundary in Code-switching utterances by Bi-phone Probabilities”, ISCSLP, 2004 [8] Joyce Y.C. Chan, P. C. Ching and Tan Lee,“Development of a Cantonese-English Code-mixing Speech Corpus”, in Proc. of Eurospeech 2005,pp.1533-1536, Lisbon,2005 [9] Joyce Y.C. Chan, P.C. Ching, Tan Lee and Houwei Cao ,“Automatic speech recognition of Cantonese-English code-mixing utterance”,INTERSPEECH 2006 [10] Claude Barras, Xuan Zhu, Sylvain Meignier, and Jean-Luc Gauvain, Member, IEEE,”Multistage Speaker Diarization of Broadcast News”, IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, 2006 [11] Dau-Cheng Lyu, Ren-Yuan Lyu, Yuang-chin Chiang and Chun-Nan Hsu,”Language Identification by Using Syllable-based Duration Classification on Code-switching Speech”, In Proceedings of Lecture Notes in Artificial Intelligence, Kent Ridge, 2006

基於 HNM 之國語音節信號的合成方法 An HNM Based Method for Synthesizing Mandarin Syllable Signal

ពङ ࡌ৯۸ײ Hung-yan Gu and Yen-zuo Zhou

ፕ᨜ઝݾՕᖂᇷಛՠ࿓ߓمഏ Department of Computer Science and Information Engineering National Taiwan University of Science and Technology {guhy, m9315058}@mail.ntust.edu.tw

ኴ૞ ऄΔࠌֱګٽԫଡഗ࣍ HNM (Harmonic-plus-noise model) ऱഏ፿ଃᆏॾᇆऱנ༼֮ء ጟᣉ৳௽ࢤऱ࿇ଃΔࠀ׊ڍګٽאشױԫሙ࿇ଃΔ༉ژऄழΔԫጟଃᆏ׽ᏁᙕΕֱڼش ګऱଃᆏऱଃ९Δଈ٣๯։໊ګٽຍଡֱऄᇙΔԫଡ඿ڇॾᇆ঴ᔆऱಐಯΖנլ࣐਷ᤚ ৬ທԫଡׂឰᒵࢤऱழױଃైऱଃ९Δٺଃᆏᇙګٽଃైऱଃ९Δࠉᖕ଺ࡨࡉګऱิ، ࡉנބኙਠ۟଺ࡨଃᆏՂطᆖױଃᆏழၴၗՂऱԫଡ൳ࠫរΔ༉ګٽڼڕၴኙਠࠤᑇΔ ԫۥଃڇޡழၴՂऱփ஁Δ٦ၞԫ܂ኙᚨऱࠟଡଃ௃Ζྥ৵ࠉᖕࠟଃ௃ऱ HNM ೶ᑇ، ଡ൳ࠫរٺ൓ᇠ൳ࠫរՂऱ HNM ೶ᑇΖᅝޣഗၜ૩ᇾᓳᖞऱփ஁Δࠐ܂ીࢤऱයٙՀ נΔࠐૠጩڤֆګٽ ֏ऱ HNMڤݺଚૹᄅֆشࠌױՂऱ HNM ೶ᑇଖຟެࡳհ৵Δ༉ ءΔق᧩࿨࣠ޡ፿ଃऱ堚ཐ৫Δॣګٽᦫྒྷኔ᧭ࠐေ۷܂ऱଖΖ൷ထݺଚءଡॾᇆᑌٺ ऱנ༼ֺ٣ছچ᧩ऱॾᇆΔॺൄ堚ཐ׊ྤಱଃΔࣔנګٽऱֱऄࢬךࢬ༼ऱ HNM ឩ֮ ΖڍৰړTIPW ऄऱ ԫીࢤۥᠧଃᑓীΔଃףΔᘫंګٽᣂ᝶ဲΚ፿ଃ Keywords: Speech Synthesis, Harmonic-plus-noise Model, Timbre Consistency.

ԫΕছߢ

شΔլመࠌګٽ፿ଃॾᇆऱ܂࣍ش๯ᚨچհ৵[1]Δ،բᐖऑנ༼ऄ๯ګٽ ൕ PSOLA۞ ኙԫଡᙕ፹ऱ፿ଃ໢ցऱഗࡌ૩ڕऱ፿ଃॾᇆऱ঴ᔆΔࠀլ᡹ࡳΔࠏנګٽPSOLA ࢬ ڍ፿ଃऱ঴ᔆᄎಐಯৰګٽழΔঞ᧢ޏለՕ༏৫ऱ܂(ᇾ(pitch contour)ࢨଃ९(duration ழΔᣉ৳௽ࢤຏൄլᏁ૞[4 ,3]ٻආ࠷፿ற஄(corpus based)੡ഗ៕ऱઔߒֱڇΖឈྥ[2] ൣڼڕऄΔֱګٽΕሎጩၦৰ֟ऱॾᇆ܂Δࠀ׊ PSOLA ਢԫଡ୲࣐፹᧢ޏՕ༏৫ऱ܂ ऱছ༼යٙਢΔ፿ٻਢΔഗ࣍፿ற஄հઔߒֱ܀ᎅਢԫଡլᙑऱᙇᖗΖױ उՀ PSOLA ෼ଃᓳᎲ൷൓լؓႉऱംנਬࠄଃᆏհၴᄎڇऱ፿ଃॾᇆΔנګٽঞܡΔڍற૞ᙕ൓ജ ለՕ༏৫ऱᣉ܂෼ᎅᇩຒ৫࢙ݶ࢙ኬऱംᠲΔ߷Ꮦ༉սྥᏁ૞נփᄎ؁ᠲΔࠀ׊ਬࠄ፿ ᆏઊԳ)چৰᆖᛎאױݾ๬ݦඨګٽᐞࠩΔࢬઔߒऱ፿ଃە؆ݺଚڼ௽ࢤऱᓳᖞԱΖ৳ ࣍ආ࠷ॾᇆᑓী(signalٻݺଚႜڼڂΔشࠌװ(Ꮈত፿Εড়፿ڕ)ԺΕழၴ)ฝ᠏ࠩࠡ،፿ߢ ΖٻΔۖլᣋრආ࠷፿ற஄੡ഗ៕ऱઔߒֱٻmodel)੡ഗ៕ऱઔߒֱ

ᜢᓳऱ፿ߢΔ׊ഏ፿ᜢᓳհၴऱ஁ฆΔ׌૞।෼࣍ଃᆏഗၜ૩ᇾऱ೏ڶ࣍ഏ፿ਢԫጟط

233 ໢ցऱഗၜګٽழΔݺଚᏁ૞ኙԫଡٻᅝආ࠷ॾᇆᑓী੡ഗ៕ऱઔߒֱڼڂ৫ࡉݮणΖ ԱΔؘۖᏁ׼؆༈شࠌٽPSOLA ༉լᔞ ڼڕΔ᧢ޏՕ༏৫ऱ܂ᒵݮणڴ૩ᇾऱ೏৫֗ ၞޏ܂،ኙױݾ๬Ζ່२ݺଚ࿇෼ HNM ਢԫଡլᙑऱഗ៕Δګٽऱॾᇆٽࢨઔ࿇ᔞޣ ᇙΔૉ፿றᙕ൓լٻ؆ݺଚᤚ൓፿ற஄੡ഗ៕ऱઔߒֱڼഏ፿፿ଃऱॾᇆΖګٽࠐشۖ PSOLAΖ זHNM ࠐ࠷ אᐞەאױΔঞՈڍജ

፿ଃ๠෻(ᒳᒘΔ܂ڇऱԫጟ፿ଃॾᇆऱᑓী[5Δ6]Δݦඨנ༼Y. Stylianou ࢬ طHNM ਢ ၞΔ،ኙ࣍ޏਢ࢐ंᑓী[7]ऱګ઎ױ ழΔս౨অ਍ॾᇆऱ堚ཐ৫ፖ۞ྥ৫ΖHNM(ګٽ ऱᑓীΖHNM ऱᑓী೶ᑇ։࣫࿓ݧړԱለم։Δ৬ګ፿ଃ೏᙮ຝ։ऱᠧଃ(noise)ॾᇆ ᜢ᙮෷ MVF(maximum voiced frequency) ऱԫଡೠྒྷֱऄΔࠉ MVFڶᇙΔ༼ࠎԱ່Օ ᙮܅᙮Ε೏᙮հࠟଡຝ։Δኙ࣍܅ګലԫଡ፿ଃଃ௃(frame)ऱ᙮ᢜ(spectrum)։໊ױଖ ֏(modeling)Δۖኙ࣍ڤࠐᑓ᜔ף։(harmonic partials)ऱګᘫंא։Δආ࠷ګຝ։ऱॾᇆ ֏ΔኔᎾՂਢڤ࿮(spectral envelope)ࠐᑓץᄶऱ᙮ᢜؓא։Δঞආ࠷ګ೏᙮ຝ։ऱॾᇆ ࿮Ζץ᙮ᢜڼ।זᑇऱଙ᙮ᢜ(cepstrum)এᑇࠐ֟א

HNM ऱ ڇ౨آ༓ଡᤜᠲΔࠡᇞެֱऄࠀڶഏ፿፿ଃழΔݺଚ࿇෼ګٽHNM ࠐ شᅝᚨ timbre)ऱԫીࢤ)ۥऱଃᆏॾᇆΔঅ਍ଃګٽ۶ᨃڕࠩΔรԫଡᤜᠲਢΔބ᣸Ղ֮ ࿇ଃΔྥڻጟഏ፿ଃᆏᙕ፹ԫٺ࣍ݺଚ׽ݦඨኙطԫીࢤᤜᠲΖۥଃܛconsistency)Δ) ᅝڼڂࠡ،ᜢᓳऱଃᆏॾᇆΔנګٽԫଡଃᆏऱഗၜ૩ᇾऱ೏৫֗ݮणΔࠐޏ৵ຘመଥ ᘫٺ ԫጟᔞᅝऱֱऄࠐᓳᖞ HNMشଃᆏऱഗၜ૩ᇾ๯ਐࡳழΔݺଚؘᏁࠌګٽԫଡ඿ ΖรԲଡᤜᠲਢΔኙ࣍ԫଡޣԫીࢤऱ૞ۥழየߩഗၜ૩ᇾ֗ଃٵא։ऱ೶ᑇଖΔګं ൳ࠫរՂڼ۶ެࡳڕଃᆏհழၴၗՂऱԫଡ൳ࠫរ(control point) [8, 9]Δګٽ࣋ᆜ࣍඿ ԫଡଃᆏऱॾᇆழΔݺଚᏁ૞ᓳᖞګٽڇ೶ᑇଖ๻ࡳհᤜᠲΖܛऱ HNM ೶ᑇऱᑇଖΔ ଃᆏழၴګٽڇڼڂଃᆏհଃ९Δګٽየߩᣉ৳໢ցࢬਐ੔ऱאᇠଃᆏ଺ࡨᙕଃऱଃ९ ࣍଺ࡨଃᆏࠟ։࣫ଃ௃հၴऱழၴۯၗՂऱԫଡ൳ࠫរΔᅝ،๯ኙਠ(mapping)۟ԫଡ ؆Δรڼ൳ࠫរՂऱ HNM ೶ᑇଖΖڼԫጟᔞᅝऱփ஁ֱऄࠐૠጩشរழΔݺଚؘᏁࠌ ለ੡ੌዃ(fluent)ऱଃᆏ֗፿נګٽאଃᆏऱழၴၗΔګٽ(warp)إ۶ீڕԿଡᤜᠲਢΔ ለ੡ઌᣂΔۖࡉ HNM ऱګٽᤜᠲᚨਢࡉ፿ଃڼऱᤜᠲΔإऱॾᇆΔՈ༉ਢழၴၗீ؁ ऄਢֱإ९ࢨᜍ࿍ழΔԫଡ១໢ऱழၴၗீۼଃᆏऱଃ९Ꮑ૞ګٽઌᣂࢤለ֟Δᅝԫଡ ऄຏൄᄎ൓ࠩለ஁ऱੌዃ৫Ζ܂ጟڼ܀linear warping )Δ)إᒵࢤீ

ऱഏ፿ךၞΕឩޏ ԫଡ HNMנ৬ທאઔߒԱছ૪ԿႈᤜᠲऱᇞެֱऄΔྥ৵٦ᖕ֮ء ԫଡଃᆏऱॾᇆழΔګٽΖᅝ૞قቹ 1 ࢬڕߓอऱ׌૞๠෻ੌ࿓ڼߓอΔګٽଃᆏॾᇆ ቹ 1 ᇙऱรԫڼڂԱΔړᣉ৳໢ցૡࡳΕਐ੔طଡᣉ৳೶ᑇଖբᆖٺଃᆏऱڼچ᧩ৰࣔ ଃై(phoneme)ऱழګଃᆏऱิڼګଃᆏऱଃ९๵ቤΕ։໊ګٽऱਢΔല܂ଡֱჇଈ٣ ९(duration)Δ൷ထࠉᖕઌຑଃైऱழ९ࠐ৬ທԫଡׂឰᒵࢤ(piece-wise linear)ऱழၴீ ቹ 1 ᇙऱรԲଡڇଃழၴၗՂऱழၴរኙਠ۟଺ࡨଃऱழၴၗՂΙګٽലঁאࠤᑇΔإ ࠷ᇠរՂऱ HNMޣଡ൳ࠫរٺ࣋൳ࠫរΔྥ৵ኙ܉ଃऱழၴၗՂګٽڇچჇΔ٣݁֌ֱ ๠෻Δኙګٽ܂װԿጟݮኪ։ܑګቹ 1 ᇙऱ৵૿ԿଡֱჇΔലॾᇆ։ᣊڇ೶ᑇଖΙ൷ထ ګٽ଺ࡨଃᇙᓤ፹ࠩطsyllable initial)Δࠡॾᇆׂឰऴ൷)ئ࣍࿍ழၴऱྤᜢ(unvoiced)ᜢ Δ۟࣍ګٽ܂։ࠐګਢ HNM ऱᠧଃॾᇆ܂ΔࠡॾᇆঞᅝئଃᇙΔኙ࣍९ழၴऱྤᜢᜢ ։Δ٦ګᘫंࡉᠧଃנګٽଃΔঞ٣։ܑئᜢ՗ଃ֗ڶਔץᜢ(voiced)ຝ։Δڶଃᆏऱ Ζףઌ܂

234 Syllable Signal Synthesis

Determine phoneme lengths and construct time-warping function functionmeters

Determine HNM parameters for each control point

Synthesize the Directly copy The initial is long unvoiced as signal samples N short unvoiced? Y HNM noise signal of the unvoiced

Voiced part: Synthesize HNM harmonic signal Synthesize HNM noise signal

stop

ऄऱ׌૞๠෻ੌ࿓ֱګٽቹ 1Εഗ࣍ HNM հଃᆏॾᇆ

ࠤᑇإԲΕଃైழ९๵ቤ֗ழၴீ

ᜢ՗ଃΕྤᜢ՗ଃΕࢨਢ़ଃڶਢאױ ਢ CxVCnΔࠡխ Cxګ઎אױഏ፿ଃᆏऱ࿨ዌ ଃΖئଃΕࢨਢԿئଃΕᠨئ໢אױਢᏗଃ/n/ࢨ/ng/Εࢨਢ़ଃΔV ঞאױ null)Δۖ Cn) p/)Δྥ৵ၞ۩ቹ 1/ڕ)b/)ࢨ९ྤᜢ/ڕ)࿍ྤᜢګ٦๯։ᣊױ،ᅝ Cx ਢԫଡྤᜢ՗ଃழΔ ނm/)ࢨ़ଃழΔঞݺଚ/ڕ)ᜢ՗ଃڶ؆ᅝ Cx ਢԫଡڼᢰࢨؐᢰֱჇऱ๠෻Ι׳ဏݮჇ ๠෻(Ո༉ਢߨࠩቹ܂ਢ࿍ྤᜢࠐ઎ৱ֗܂Cx ࢨ V(ᅝ Cx ਢ null ழ)ऱದࡨຝ։ऱॾᇆᅝ ᜢଃైၲᙰऱଃᆏΔࠡದڶא੡ڂ෻ऱΔٽਢڤΔຍᑌऱኙৱֱ(װᢰऱֱჇ׳ဏݮჇ 1 ॺၜཚࢤऱΖګឰܒHNM ऱ೶ᑇ։࣫ழΔՈᆖൄ๯ ܂ڇࡨຝ։ऱԫΕԲଡॾᇆଃ௃Δ

՗ଃऱழ९๵ቤΔ༉ڼbau/֗/man/Δঞ/ڕᐖᆠऱ࿍ྤᜢ՗ଃၲᙰழΔאᅝԫଡଃᆏਢ ڼ९ྤᜢ՗ଃၲᙰழΔঞא࣠ଃᆏਢڕΔچ՗ଃऱ९৫Ζઌ֘ڼਢऴ൷ࠉᖕ଺ࡨᙕଃᇙ Ղ๻ء՗ଃऱ९৫ଊՂԫଡֺࠏଖ FuΔFu ऱଖഗڼ՗ଃऱழ९๵ቤਢΔല଺ࡨᙕଃᇙ ๻੡ 1.4Δۖᅝޏ଺ࡨଃᆏऱଃ९Δլመᅝ Fu Օ࣍ 1.4 ழ༉אଃᆏऱଃ९ೈګٽګࡳ ଃຝ։ၞ۩Δئڇ९ࢨᜍ࿍Δ׌૞ਢۼ੡ଃᆏଃ९ऱڂ๻੡ 0.6ΔຍਢޏFu ՛࣍ 0.6 ழ༉ ᜢຝ։ڶ๵ቤԱଃᆏದࡨऱྤᜢ՗ଃऱழ९ Du հ৵Δଃᆏ৵૿ڇΖڤॺ࿛ֺࠏऱֱۖ DuΖ װ༉ਢଃᆏଃ९྇چ᧩ऱ९৫ DvΔৰࣔ

/man/חଃᆏ/man/੡ࠏࠐᎅࣔΔאଃైऱழ९๵ቤΔຍᇙٺᜢຝ։ᇙڶᐞଃᆏە൷ထΔ ᜢຝڶᖕऱ९৫ਢ RmΕRa ࡉ Rn ශઞ(ms)Δ׊۾ऱ଺ࡨᙕଃᇙΔଃై/m/Ε/a/Ε/n/։ܑ ଃᆏᇙΔຍԿଡኙᚨऱଃైऱழ९ଖ։ܑګٽח։ऱ᜔९৫ਢ Rv = Rm + Ra + RnΔ׼؆ ऄ༉܂Dv = Dm + Da + DnΔঞݺଚ๵ቤ DmΕDaΕDn ᑇଖऱ חਢΔDmΕDaΕDnΔ׊ Հ٨ऱ࿓ݧΚڕ r=0.6; while ( r >= 0.1 ) { Dm=(Rm/Rv)*r*Dv; Dn=(Rn/Rv)*r*Dv;

235 Da=Dv ± Dm ± Dn; if (Da>Dv*0.4) break; r=r± 0.05; } Db=Dm+Dn; if (Dm>0&&Dm/Db<0.35){Dm=0.35*Db; Dn=Db-Dm; } if (Dn>0&&Dn/Db<0.35){Dn=0.35*Db; Dm=Db-Dn; }

ᜢၲᙰऱ՗ଃΔঞՂ٨࿓ݧڶ࣠ԫଡଃᆏऱ࿨ዌਢࡉ/san/ࢨ/an/ԫᑌऱΔՈ༉ਢ౒֟ڕ ࿨ڶ޲ܛ࣠ଃᆏऱ࿨ዌਢࡉ/ma/ࢨ/sa/ԫᑌΔڕچٵऴ൷๻੡ 0Ιઌױᇙ Rm ࡉ Dm ऱଖ ړऴ൷๻੡ 0Ζᅝ DmΕDaΕDn ऱଖ๻ࡳױݠऱᏗଃΔঞՂ٨࿓ݧᇙ Rn ࡉ Dn ऱଖՈ ቹ 2ڕנᜢଃైΔۖ৬ທڶᜢଃైࠉݧኙᚨ۟଺ࡨଃᇙऱڶଃᇙऱګٽലאױհ৵Δ༉ ࠤᑇΖإऱׂឰᒵࢤհழၴீقࢬ

Rn

Ra

Rm t Dm Da Dn ࠤᑇإቹ 2Εׂ੄ᒵࢤհழၴீ

ଃ९৫ئᜢ՗ଃ९৫ኙ࣍ڶܛՂ૿ऱଃైழ९հ๵ቤ࿓ݧΔࠡዝጩऄਢഗ࣍ԫଡᨠኘΔ Ζؾڍ՗ᇙ࿇ଃᆏ/man/ऱΔᄎֺ໢ᗑ/man/ଃᆏ࿇ଃऱ՛๺؁ڇऱֺଖ (Rm + Rn) / RvΔ ቹ 2 ऱׂ੄ڕ๻ࡳ੡چࠤᑇՈ១໢إ෼ွΔࠀ׊ழၴீڼ១໢ऱ࿓ݧࠐᑓᚵאছݺଚႛ ଃᆏࡉ଺ࡨଃᆏհګٽࠐݺଚലආ࠷׼ԫጟለ੡ߓอ֏ऱֱऄΔࠐઔߒآΔڤᒵࢤհী ᨃੌዃ৫ᛧױࠤᑇΔսྥإ១໢ऱழၴீشࠌ׽ܛၴऱழၴኙਠംᠲΖլመΔݺଚᎁ੡ ൓ࣔ᧩ऱޏၞΖ

ԿΕ൳ࠫរ֗ࠡ ˛ˡˠ ೶ᑇऱ๻ࡳ

ˆˁ˄ ൳ࠫរհ܉࣋

HNM ऱ೶ᑇ։࣫ழΔ๻ࡳଃ ܂ݺଚᙕ፹ഏ፿ଃᆏॾᇆऱ࠷ᑌ෷ਢ 22,050HzΔኙ࣍ॾᇆ ଃᆏ܂ڇរΖ׼ԫֱ૿ءছၞ 256 ଡᑌڻޢរ(23.2ms)Δ׊ଃ௃ء௃ऱ९৫੡ 512 ଡᑌ Ϛଃشऱᨠ࢚--൳ࠫរ[8, 9]Δຍᇙ։ܑࠌشᇙൄګٽழΔݺଚආ࠷Աሽᆰଃᑗګٽॾᇆऱ ࣋ऱ൳ࠫរΔ܉ᜢຝ։ऱழၴၗՂࢬڶଃᆏګٽڇנਐא៶Δဲټ௃ϛࡉϚ൳ࠫរϛࠟ ऱ HNM ೶ᑇऴ൷ᓤ፹መࠐΔۖਢஞנՂ૿ऱ HNM ೶ᑇΔࠀլਢൕਬԫଃ௃ࢬ։࣫، ९ழၴऱྤᜢ՗ଃழΔݺଚګٽڇ൓ऱΔլመޣփ஁ۖ܂ࠟଡኙᚨଃ௃ऱ HNM ೶ᑇࠐ ऱ HNM ೶ᑇਐ੔࿯ԫଡኙᚨऱ൳ࠫរΖຍࠟጟ HNMנԫଡଃ௃։࣫ނچ១໢ڶ༉׽ ऱΖقቹ 3 ᇙऱቹݮࢬ।ڕᜢຝ։ࡉ९ழၴྤᜢຝ։Δ༉ڶב։ܑኙאΔڤ೶ᑇऱ๻ࡳֱ

࣋ऱ൳ࠫរऱᑇၦΔࠡኔ༉ਢ଺ࡨଃ܉ଃऱྤᜢ՗ଃຝ։ΔࢬګٽڇΔנ઎ױൕቹ 3 Ո ᒵࢤאچଃऱྤᜢຝ։ऱழ९ᓳᖞΔ׽ਢ១໢ګٽڼڂᇙྤᜢ՗ଃຝ։ऱଃ௃ऱᑇၦΔ

236 ᎛ਢၴሶ 100 ଡॾᇆᑌةᜢຝ։Δઌᔣऱ൳ࠫរڶଃऱګٽڇ९ࢨᜍ࿍Ζྥۖۼ܂ڤֱ 100 אᜢຝ։հழ९ࠐެࡳΔ۟࣍ᙇᖗڶࢬਐ੔ऱط൳ࠫរऱᑇၦਢڼڂ4.5ms) Δ)ء ለ壄า܂(੡ݺଚݦඨኙ᙮ᢜऱዝၞ(progressingڂၴሶΔਢ܂(ለՕऱᑇଖشլۖ)ءଡᑌ ऱ൳ࠫΖ

Recorded Syllable, : analysis frame Unvoiced part Voiced part

Unvoiced part Voiced part Synthetic Syllable, :controlpoint ቹ 3Ε൳ࠫរ۟։࣫ଃ௃հኙਠ

հ ˛ˡˠ ೶ᑇ᧢آˁ˅ ଃ೏ʻ̃˼̇˶˻ʼˆ

ቹ 2 ࢬڕਢࠉᖕޡᜢຝ։ऱ൳ࠫរެࡳ،ऱ HNM ೶ᑇଖΔรԫڶଃګٽ࣍ۯ૞੡ԫଡ ଃ௃אᆜ tsΔኙਠ۟଺ࡨଃழၴၗՂԫଡۯऱழၴڇ൳ࠫរࢬڼࠤᑇΔലإऱழၴீق ܂ऱ HNM ೶ᑇࠐנᆜ trΙྥ৵ࠉᖕร¯ tr ¿ࡉ¯ tr ¿+1 ଃ௃ࠟृࢬ։࣫ۯऱழၴۯ੡໢ ᘫंٺ࠷ޣࠐڤᒵࢤփ஁ऱֱאנ༼ݺଚڼڇ൳ࠫរՂऱ HNM ೶ᑇΔڼ൓ޣאփ஁Δ ੡ڤ࿛Կႈ೶ᑇΔᇡาऱֆۯऱ஡༏Ε᙮෷Εઌ

A_i = (1 − w)·A_i^n + w·A_i^{n+1},  w = t_r − ⌊t_r⌋    (1)

F_i = (1 − w)·F_i^n + w·F_i^{n+1}    (2)

θ_i = w·(θ̂_i^{n+1} − θ_i^n) + θ_i^n    (3)

n n n Δۖ Ai Εۯ।ร n ଡ։࣫ଃ௃ᇙร i ଡᘫंऱ஡༏Ε᙮෷ፖઌזࠡխ Ai ΕFi ΕTi ։ܑ Ö n1 n1 Ti ኙق؆ΔTi ।ڼΙۯ൳ࠫរՂร i ଡᘫंऱ஡༏Ε᙮෷ፖઌڼ।זFiΕTi ঞ։ܑ n Ö n1 n1 n n1 Ti ؘᏁۯଖΔՈ༉ਢși puw(și ,și ) Δઌۯ፷(unwrap)ሎጩ৵ऱઌץ֘܂ ࣍Ti ፷ሎጩऱץ֘ۯመऱઌإݺଚଥڼڇ஁ଖᄎᆵ࣍-SࠩSհၴΔۯঅᢞઌא፷ሎጩץ֘܂ ܂ऄਢ Ö n1 n1 n n1 și puw(și , și ) și  M ˜ 2S (4)

°­S, if șn1 t șn M 1 șn1  șn  ș , ș i i ¬ 2S i i c ¼ c ® ¯°  S, otherwise

237 ։ऱଙ᙮ᢜএᑇΔګ।ᠧଃזHNM ೶ᑇ։࣫৵Δᄎ൓ࠩ 10 ଡ ܂࣍ԫଡଃ௃ط׼؆Δ ൓ޣᒵࢤփ஁Δۖ܂ऱଙ᙮ᢜএᑇࠐנΔݺଚ༉ஞࠟଡ๯ኙਠࠩऱઌᔣଃ௃ࢬ։࣫ڼڂ ൳ࠫរՂऱ 10 ଡଙ᙮ᢜএᑇΖڼ

ˆˁˆ ଃ೏ޏ᧢հ ˛ˡˠ ೶ᑇ

ऱਢΔ܂૞ޡ଺ࡨଃ೏(pitch)ऱᘫं೶ᑇ Ai ΕFiΕTi հ৵ΔՀԫנԫଡ൳ࠫរՂૠጩڇ ~ ~ ~ ڇ࣍ᘫं᙮෷ଖ Fi ࢬࡳᆠऱଃ೏Δࠡኔਢط৵ऱᘫं೶ᑇ Ak Ε Fk Εșk Ζ᧢ޏૠጩଃ೏ ऱଃ೏ࢬۭ۞ٺࠌຑᥛऱ൳ࠫរאଃ೏ऱᓳᖞΔ܂ݺଚؘᏁڼڂଃᆏᙕଃழ༉ެࡳԱΔ ೗๻ᘫं᙮෷ଖ Fi ࢬڼڇΖޣऱଃ೏૩ᇾΔ౨ജየߩᣉ৳໢ցࢬਐ੔ऱഗၜ૩ᇾ૞ګ ऄਢΔ܂ऱଃ೏ਢ 150HzΔ߷Ꮦԫଡ១໢ऱᓳᖞޣࡳᆠऱଃ೏੡ 100HzΔۖᣉ৳໢ց૞ ~ ~ ~ Δଃנ઎ױቹڼطቹ 4 ࢬ྽ऱൣउΔڕFk Fk ˜150 / 100 Δۖ Ak Ak ׊șk șk Δຍ༉ ח ਢ٥஡᙮෷(formant frequency)ଖՈ๯ᓳ೏Ա 1.5܀100Hz ᓳᖞࠩ 150HzΔ طױ೏ऱᒔ Δࠡ৵᧢ޏ੡ 360HzΔຍᑌऱ٥஡᙮෷ऱګቹ 4 ऱรԫ٥஡᙮෷ 240Hz ๯ᓳ೏ڕ଍Δࠏ ՈᄎۥլԫીΔঞଃ܅܅ଡ൳ࠫរՂऱ᙮෷ᓳᖞ଍෷೏೏ٺ࣠ڕԱΔ᧢ޏՈ๯ۥ࣠ਢଃ լԫીΖچװ᧢ࠐ᧢

ऱ଺ঞਢΔ૞ᨃ᙮ښঅ਍ԫીऱছ༼ՀΔᓳᖞԫଡ൳ࠫរऱଃ೏ΔݺଚؘᏁᙅۥଃڇ૞ ~ ~ ࿮অ਍լ᧢[8]ΔՈ༉ਢ᙮෷ଖ๯ᓳࠩ Fk ऱร k ଡᘫंऱ஡༏ Ak Δ،ऱଖؘᏁൕ଺ץᢜ ऄਢΔ٣ൕ܂ࠐΔለᇡาऱנփ஁װᒵխڴ࿮ץࡨଃ೏հᘫं஡༏ଖ Ai ࢬዌ৬ऱ᙮ᢜ ~ ~ ऱਢ Fj Δנބחᔾ२ Fk ׊ֺ Fk ՛ऱ᙮෷ଖΔ່נބF1, F2 , F3, «խ ऱᘫं᙮෷ݧ٨៱ Fj1, Fj , Fj1, Fj2 ؄ଡ଺ࡨଃ೏հᘫं᙮෷ଖࡉ،ଚኙᚨऱ஡༏ଖΔࠐ א൷ထΔ༉ ~ ~ ༼᙮෷ଖ Fk Ղࢬᚨኙᚨऱ஡༏ଖ Ak ΔՈ༉ਢࠉݺଚנޣאၸᑇ 3 հ Lagrange փ஁Δ܂ ૠጩΔ܂(5)ڤऱֆנ j2 j2 ~ ~ Fk  Fh Ak ¦ Am ˜ – (5) m j1 h j1 Fm  Fh hzm

ڼڇΔقቹ 5 ࢬڕ࿮լ᧢ऱছ༼Հᓳᖞଃ೏)ऱቹݮץঅ਍᙮ᢜڇܛ)ԫଡᎅࣔՂ૪ᨠ࢚ ᘫं(ऴऱኔᒵ)ࡉᄅᘫं(ऴऱဠᒵ)ࢬዌ৬៱܀ᘫंऱ᙮෷๯ᓳ೏Ա 1.25 ଍ΔٺቹᇙΔ ऱԫીΖۥᒔঅଃױ༉ڼڕԫದऱΔڇᒵਢૹᦤڴ࿮ץऱ᙮ᢜ

Amp 240Hz 360Hz

Freq 200 400 600 ࿮ԫದ๯ᓳ֒᙮෷ଖץቹ 4Εଃ೏ࡉ᙮ᢜ

238 Amp

Freq 200 400 600 ᧢࿮լץቹ 5Εଃ೏ᓳ֒᙮෷ଖۖ᙮ᢜ   , ᙮෷੡ Fj1 , Fj , Fj1אݺଚԫᑌڼڇ೶ᑇșk Δۯᐞ᙮෷੡ Fk հᄅᘫंऱઌە൷ထ  șk ऱଖΖլመΔ៱ᘫंऱઌנLagrange փ஁ۖૠጩ ܂ଖΔࠐۯFj2 հ؄ଡ៱ᘫंऱઌ լຑᥛऱൣउΔᇡسଖ࿇ۯઌ܍ᝩא፷ऱሎጩΔץ֘ۯփ஁հছؘᏁ٣ᆖመઌ܂ڇଖۯ Ö Ö Ö Ö Ö Ö า܂ऄਢΔૠጩș j1 = ș j1 Δ ș j = puw()ș jj,ș 1 Δ ș j1 = puw()ș jj1,ș Δ ș j2 = Ö փ஁Ζ܂װଖۯऱઌנpuw()ș jj21,ș  Δ٦ஞૠጩ

ګٽ؄Εॾᇆंݮ

ףᘫंॾᇆ H(t)ࡉᠧଃॾᇆ N(t)ઌطऱॾᇆ S(t)ਢګٽᜢຝ։Δڶଃᆏऱګٽቹ 3 ᇙڇ ף։ऱګଡᘫंॾᇆڍقਢᇡาᎅࠐΔH(t)ࠡኔ।܀൓ࠩΔՈ༉ਢ S(t) = H(t) + N(t)Ζۖ (Հऱࠟଡ՗ᆏ༉։ܑᎅࣔ H(tאڇΔ᜔ף։ऱګଡᠧଃॾᇆڍقΔۖ׊ N(t)Ոਢ।᜔ ऄΖֱګٽࡉ N(t)ऱ

ګٽˁ˄ ᘫंॾᇆˇ

ࠐૠڤመऱֆإՀݺଚଥڕאױ࣍ร n ࡉร n+1 ଡ൳ࠫរհၴऱᘫंॾᇆ H(t)Δۯኙ࣍ ଖΔءጩ،ऱᑌ

H(t) = Σ_{k=0}^{L} a_k^n(t)·cos(φ_k^n(t)),  t = 0, 1, …, 99    (6)

a_k^n(t) = A_k^n + (t/100)·(A_k^{n+1} − A_k^n)    (7)

φ_k^n(t) = φ_k^n(t−1) + 2π·f_k^n(t)/22,050,  φ_k^n(0) = θ̂_k^n    (8)

f_k^n(t) = F_k^n + (t/100)·(F_k^{n+1} − F_k^n)    (9)

其中 L 表示諧波成分(弦波)的個數,100 是相鄰控制點之間間隔的樣本數,22,050 是取樣率。

θ̂_k^n = puw(θ_k^n, θ̂_k^{n−1}),也就是 θ_k^n 對 θ̂_k^{n−1} 作相位反繞運算後的相位。
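公式(6)–(9)所述「振幅、頻率線性內插,相位逐樣本累加」的諧波合成流程,可用以下 Python 草稿示意(假設性的簡化實作:僅合成單一諧波在兩相鄰控制點之間的 100 個樣本,取樣率 22,050 Hz):

```python
import math

def synth_one_harmonic(A0, A1, F0, F1, theta, fs=22050, hop=100):
    """在兩相鄰控制點間合成單一諧波:振幅與頻率作線性內插,相位逐樣本累加。"""
    samples, phase = [], theta                   # φ(0) = θ
    for t in range(hop):
        a = A0 + (t / hop) * (A1 - A0)           # 振幅內插,對應式(7)
        samples.append(a * math.cos(phase))
        f = F0 + ((t + 1) / hop) * (F1 - F0)     # 頻率內插,對應式(9)
        phase += 2 * math.pi * f / fs            # 相位累加,對應式(8)
    return samples

seg = synth_one_harmonic(A0=1.0, A1=0.5, F0=200.0, F1=220.0, theta=0.0)
print(len(seg))  # 100
```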

n t=0 ࢨ ܛ)ᢰ੺ऱழၴរڇଖIk ()t ຏൄᄎۯழΔีᗨऱઌءॾᇆᑌګٽࠐ(6)ڤֆشᅝࠌ n n1 Ik (100) zIk (0)ΔຍᄎᖄીॾᇆंݮऱլຑᥛΔۖܛլຑᥛऱ෼ွΔٍسt=100 ழ)Ղ࿇ ᢰ੺ழၴរڇנ٣ૠጩױऱլຑᥛΔݺଚۯጟઌڼ܍ᨃԳᦫࠩ໔ⷢᜢ(click)Ζ੡Աᝩ n ։಻࿯ઌᔣ൳ࠫរհၴऱچၦؓ݁ۯ஁ฆऱઌނၦ[k Δྥ৵ۯt=100 ழΔࠟृઌ஁ऱઌ n ૠጩ[k ऱֆאش༼ݺଚࢬڼڇ༉ᄎ᧢൓ຑᥛԱΖۯᙟழၴีᗨऱઌڼڕរΔءଡᑌ 100 ਢΔڤ

ξ_k^n = puw(φ_k^n(100), φ_k^{n+1}(0)) − φ_k^{n+1}(0)    (10)

其中的相位反繞運算 puw(x, y) 就如公式(4)所定義的,而 φ_k^n(100) 可直接由如下的公式計算出:

φ_k^n(100) = φ_k^n(0) + (π/22,050)·(99·F_k^n + 101·F_k^{n+1})    (11)

此公式是從公式(8)和(9)反覆疊代而得到。當將相位差異量 ξ_k^n 作平均分配之後,公式(6)就可以改寫成為:

H(t) = Σ_{k=0}^{L} a_k^n(t)·cos(φ_k^n(t) − (t/100)·ξ_k^n),  t = 0, 1, …, 99    (12)

ױ։ऱଡᑇΔԫ౳ࠐᎅ Ln ࡉ Ln+1 ऱଖګร n ଡ൳ࠫរՂऱᘫंقLn । חᣂ࣍ L ऱଖΔ ࡉ(12)խऱ L ଖ੡ Ln ࡉ Ln+1 ऱለՕृΖᅝ Ln ֺ Ln+1(6)ڤֆחழݺଚ༉ڼ౨ᄎլઌ࿛Δ ։֗ࡳᆠຍࠄᘫंऱ೶ᑇଖΔګLn+1 - Ln ଡᘫं נ՛ழΔݺଚ༉Ꮑ੡ร n ଡ൳ࠫរឩᏺ چᐞΔ༉១໢ەݺଚഗ࣍ॾᇆंݮຑᥛࢤऱڼڇΔڤհֆګٽছ૪ऱॾᇆش౨ജ୚ঁא  n  n  n1 n n1 ࡳᆠ Ak =0, Fk = Fk , Tk =Tk , k = 1+Ln, 2+Ln, «, Ln+1Ζ

ګٽˁ˅ ᠧଃॾᇆˇ

ڍࡐࡳ᙮෷ၴ၏ऱא๠෻Ղݺଚګٽڇ࣍൳ࠫរ n ࡉ n+1 հၴऱᠧଃॾᇆ N(t)Δۯኙ࣍

࣍ Gkطร k ଡ࢐ंऱ᙮෷ΔقGk । חଖ[5]Ζءࠐૠጩ N(t)ऱᑌ᜔ף։ऱګଡ࢐ंॾᇆ ।ऱଖਢז Gk חڼڇ߷ԫଡ൳ࠫរՂऱΔڇլᏁ೴։ਢڼڂΔ᧢ޏ ଖլᄎᙟထழၴ t

౨ᄎլױ൳ࠫរՂऱದࡨଖՈٺ100˜k (Hz)Δຍᑌ Gk ऱՀᑑ k ऱದࡨଖ༉լਢ 1 ԱΔ׊ n ൳ࠫរऱ MVFطױร n ଡ൳ࠫរՂՀᑑ k ऱದࡨଖΔ،ࠡኔقKs । אݺଚڼڂԫᑌΔ n ؆ΔՀᑑ k ऱึଖਢԫଡࡐࡳଖΔKe =¯11,025ڼKs = ­MVF(n) / 100½Ι ܛଖࠐૠጩ൓ࠩΔ ΖתՕ࣍࠷ᑌ෷ऱԫױ੡ Gk լڂΔ¿100 /

n قΔຍ।قbtk ()।אร n ଡࡉร n+1 ଡ൳ࠫរհၴΔร k ଡ࢐ंऱ஡༏ݺଚڇ׼؆Δ ร n ࡉร n+1 ൳ࠫរ)Ղऱܛ)ΔኔᎾՂ،ਢࠉᖕழၴᢰ੺រ᧢ޏڇ ऱଖᄎᙟထழၴ t، n n ਢΔ٣ലร n ଡ൳ࠫរՂऱ 10ڤ࠷ֱޣᒵࢤᓳᖞऱΔ۟࣍ Bk ೶ᑇऱ܂஡༏೶ᑇ Bk ࠐ DFT(discrete ٻ֘܂੡ 2048 ଡᑇଖऱݧ٨Δྥ৵ګଡଙ᙮ᢜএᑇᇖՂԫݧ٨ऱሿΔࠌ

נބ Fourier trandform)᠏ངΔ࠷ਐᑇΔۖ൓ࠩ᙮ᢜএᑇΔXj, j=0, 1, « 2047Δ൷ထࠉ Gk

240 ޣᒵࢤփ஁Δࠐ܂ GkΔྥ৵ኙຍࠟଡ Xj ܶץ।ऱ᙮෷ଖᄎז ࠟଡઌᔣऱ XjΔࠡՀᑑ j n Bk ऱଖΖ נ

n n ຍشױհ৵Δ൷ထ༉נᅝઌᔣऱࠟଡ൳ࠫរՂऱ࢐ंದࡨᒳᇆ Ks ࡉ࢐ं஡༏ଖ Bk ຟጩ ੡Δڤ৵ऱૠጩֆإຍࠟଡ൳ࠫរհၴऱᠧଃॾᇆΔݺଚଥګٽࠄ೶ᑇଖࠐ

N(t) = Σ_{k=K_s}^{K_e} b_k^n(t)·cos(γ_k^n + t·2π·G_k/22,050),  t = 0, 1, …, 99    (13)

b_k^n(t) = B_k^n + (t/100)·(B_k^{n+1} − B_k^n)    (14)

γ_k^{n+1} = γ_k^n + 100·2π·G_k/22,050    (15)

n n1 n ۯร n ଡ൳ࠫរՂร k ଡ࢐ंऱॣࡨઌقKs ࡉ Ks խऱለ՛ृΔJ k । ګࠡխ Ks ๻ࡳ ଖΖ

ΔլመΔش๯ࠌױΕ(14)ࡉ(15)սྥ(13)ڤΔֆګٽᣂ࣍ቹ 1 ࡉ 3 ᇙ९ழၴྤᜢ՗ଃॾᇆऱ ԱΔຍઌᅝ࣍๻ࡳ MVF հଖ੡ 0HzΖ 1 ګޏᇙऱՀᑑ kΔ،ऱದࡨଖ༉૞(13)ڤֆ

ኔ᧭ࡉᦫᤚྒྷᇢګٽնΕॾᇆ

ݺଚڼڂऄΔጠ੡ TIPW[10]Δֱګٽၞऱ PSOLA ᧢ጟհޏԫଡנ༼ছݺଚམڣ࣍ᑇط ڕॾᇆ堚ཐ৫Ղऱ।෼ΔڇऄΔࠟृګٽհךઔߒऱ HNM ឩ֮ءუ૞ֺለ TIPW ऄࡉ ڶᎅਢࠠױ،ऱ፿ଃॾᇆᦫದࠐለլܰᠧ(noisy)Εለྤಱଃ(reverberant)Δঞګٽ࣠ԫଡ ։࣫ࡉᣉ৳೶ᑇ؁֮ڇڼڂॾᇆऱ堚ཐ৫Δګٽለ೏ऱ堚ཐ৫Ζຍᇙݺଚ׌૞ᣂ֨ऱਢ ഏ፿شऄԫᑌຟࠌګٽᑓิ[11, 12]Ζ۟࣍፿ଃ໢ցΔࠟጟڤऱ࿓ٵઌش૿༉ࠌֱسข Աԫሙؓᓳऱ࿇ଃΖᅝژԫጟഏ፿ଃᆏຟ׽ޢ੡ڂ໢ցᙇᖗΔ܂ଃᆏ੡໢ցΔࠀ׊ຟլ ऄֱګٽ๠෻ழΔຍࠟጟګٽ፿ଃॾᇆऱ܂CPU ੡ Pentium 2.6GHz ऱሽᆰࠐ ؀ԫشࠌ ڶऄऱച۩ຒ৫׽ګٽհך஁ฆऱΔHNM ឩڶച۩Δլመച۩ຒ৫ਢچழܛ๯אױຟ ऄګٽ ઞऱ፿ଃॾᇆᏁक़၄ 10 ઞऱ CPU ழၴΔۖ TIPW 30 ګٽܛழຒ৫Δܛ଍ऱ 3 ઞऱ፿ଃႛᏁक़၄ 1.5 ઞऱ CPU ழၴΖ 30 ګٽܛழຒ৫Δܛݶࠩ 20 ଍ऱאױऱຒ৫

ഏנګٽװऄΖֱࠟऄ։ֱܑګٽࠐֺለຍࠟጟڤᨠኘᜢెቹ(spectrogram)ऱֱאଈ٣ شࠌڼڇ)᧯ᜢె։࣫ຌאϚඝ᠏Ժϛ/syuen-2 zhuan-3li-4/ऱ፿ଃॾᇆΔྥ৵؁፿࿍ ։࣫ۖ܂ऱॾᇆګٽऄࢬך։࣫ࠐ൓ࠩᜢెቹΔቹ 6 ऱᜢెਢኙ HNM ឩ܂(wavesurfer ᨠኘଙΔቹױ։࣫ۖ൓ࠩΖൕቹ 6 ࡉ 7 ݺଚ܂ऱॾᇆګٽ൓ࠩΔቹ 7 ऱঞਢኙ TIPW ऄ Δࠀ׊ቹ 6 ᇙऱᘫंයెለֱچឰါऱڍᇙऱᘫंెሁ᧩൓ֺቹ 6 ᇙऱለ੡ሿᅷΕለ 7 ऱॾᇆᚨᄎנګٽऄךHNM ឩ ڼڂΔڴށԫࠄֻᛟΕڶ੡ؓᄶΔۖլቝቹ 7 ᇙऱ᧩൓ ֺ TIPW ऄऱ堚ཐΖ

ݮᚾூΔ࿍֮ंګژ፿ଃॾᇆΔࠀ׊נګٽװ؆ΔݺଚᙇԱԫᒧ࿍֮ࠐᨃຍࠟጟֱऄڼ ݧᐾ࣋࿯ 15ڻᙟᖲאଡଃᆏΖ൷ထݺଚലຍࠟଡंݮᚾ 132 ڶΔ֮܂ऱسਢԫᒧ՛ᖂ 堚ཐ৫ऱֺለΔေ։ऱ๵ঞ܂ଚኙছΕ৵ᐾ࣋ऱᚾூהᦫᤚྒྷᇢृแᦫΔྥ৵ᓮף೶ۯ ԫࠄΔঞ࿯ 1 ։(-1 ։)Δړ࣠৵ृ(ছृ)ֺছृ(৵ृ)࿑ڕਢΔࠟृྤऄ೴։ழ࿯ 0 ։Δ

241 /syuen-2/ /zhuan-3/ /li-4/

ॾᇆऱᜢెቹګٽऄࢬךቹ 6ΕHNM ឩ

/syuen-2/ /zhuan-3/ /li-4/

ॾᇆऱᜢెቹګٽቹ 7ΕTIPW ऄࢬ

ঞ࿯ 2 ։(-2 ։)Δ࿨࣠ݺଚ൓ࠩऱؓ݁։ᑇਢ 1.53 ։ΔՈ༉ڍৰړࢨړچ᧩࣠ਢࣔڕۖ ᘋᔊृ౨ജᇢᦫຍࠟጟֱऄڶֺለ堚ཐऱ፿ଃΖ׼؆Δ੡Աᨃנګٽऄᄎךਢ HNM ឩ ࠎԳោᥦΔ ࠡጻܿਢאऱ፿ଃॾᇆΔ ݺଚ๻ࡳԱԫଡጻ଄נګٽࢬ http://guhy.csie.ntust.edu.tw/hmtts/hnm-demo.htmlΖ

քΕ࿨፿

۶ެࡳ൳ڕ(ऱԫીࢤΔ(2ۥ۶অ਍ଃڕ(1)ܛᐞԱԿଡᤜᠲΔەHNM ੡ഗ៕Δ א֮ء ԫጟഏ፿ଃᆏ(լ೴։ޢ࣍طଃᆏऱழၴၗΖګٽإ۶ீڕ(ࠫរՂऱ HNM ೶ᑇଖΔ(3 ԫીۥഗၜ૩ᇾऱᓳᖞΔຍ༉֧࿇Աଃ܂ؘᏁא࿇ଃۖբΔࢬڻԫژᜢᓳ)Δຟ׽ᙕΕ ଃᆏழګٽڇڼڕΔڍ౨ࡉ଺ࡨଃᆏऱଃ९ઌ஁ৰױଃᆏऱଃ९ګٽࢤऱംᠲΙ٦ृΔ

for the control points distributed uniformly along the time axis, the problem arises of determining the HNM parameter values at each point. In addition, improving the fluency of the synthesized syllables involves the problem of correcting the time axis of a synthesized syllable. For the three issues above, this paper has proposed feasible solutions, which we collectively call the extended HNM syllable-signal synthesis method.

To verify the performance of the proposed method, we have implemented it as software that runs in real time, and we have compared the speech waveforms it synthesizes with those synthesized by the TIPW method through spectrogram comparison and listening tests. The preliminary results show that the synthesis method proposed here noticeably improves the quality of the synthesized speech (clearer and less reverberant).

Although the extended HNM synthesis method proposed in this paper can already noticeably improve the clarity of synthesized speech, the synthesized speech files used in the listening test still sound strongly machine-like. The main cause should be that the model generating the prosodic parameter values is not good enough; for example, the volume and duration of each syllable are decided by only a few simple rules. We will therefore study the problem of improving the prosody model in future work.

References

[1] Moulines, E. and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Communication, Vol. 9, pp. 453-467, Dec. 1990.
[2] Dutoit, T., An Introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1997.
[3] Chou, Fu-chiang, Corpus-based Technologies for Chinese Text-to-Speech Synthesis, Ph.D. Dissertation, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, 1999.
[4] ାᅜ്, A Mandarin text-to-speech system using a large vocabulary of words as synthesis units, Master's thesis, Institute of Computer Science, National Chung Hsing University, 2004. (in Chinese)
[5] Stylianou, Yannis, Harmonic plus Noise Models for Speech, Combined with Statistical Methods, for Speech and Speaker Modification, Ph.D. thesis, Ecole Nationale Supérieure des Télécommunications, Paris, France, 1996.
[6] Stylianou


The Role of Sound Change in the Speech Recognition System:

A Phonetic Analysis of the Final Nasal Shift in Mandarin

James H. Yang Department of Applied Foreign Languages National Yunlin University of Science and Technology [email protected]

Abstract Over the past decade, computational linguists have been striving to design speech recognition systems that can identify standard speech and accommodate the sound variation caused by individual accents. Furthermore, some speech recognition programs are able to learn and identify distinctive sound frequencies due to the user’s age and gender. Nevertheless, regular sound alterations that occur in different varieties of a language have never been seriously considered in the design of speech recognition systems. Accordingly, this study proposes to incorporate socio-phonological information about regular sound modifications to enhance the performance of Automatic Speech Recognition. To illustrate this point, this study investigates and analyzes the acoustic variation of the syllable-final nasal shift from the velar to the dental, which has been found to be one of the distinctive sound features that makes the variety of Mandarin spoken in Taiwan differ from that spoken in China. Following the phonetic analysis, this study discusses the effect of the nasal merger on the development of phonology-dependent speech technologies. It concludes by proposing a preliminary resolution to the identification of syllable-final nasals for the design of Automatic Speech Recognition.

Key words: Mandarin phonetics, sound change, nasal, speech recognition system


1. Introduction

My motivation to explore the nasal merger of Mandarin spoken in Taiwan originates from an incident in my life. My brother’s first baby was born on August 31st, 1999. He gave his

son a name called Geng-ren1 /kəŋ.ʐən/ (, meaning “to cultivate benevolence”).

Interestingly, I found, as a native speaker of Mandarin in Taiwan, that I would easily mispronounce his name as Gen-ren [kən.ʐən] (, meaning “to follow people”), rather than using its standard pronunciation. Since the name was subject to mispronunciation and misunderstanding, I suggested to my brother that he change the name; hence, he later selected another name, Jia-he (, meaning “harmony in the family”). Because of this interesting incident, I came to realize that many native speakers of Mandarin in Taiwan seem to merge the syllable-final velar nasal /ŋ/ with the dental nasal /n/, hence neutralizing such minimal pairs as geng /kəŋ/ (, “to cultivate”) and gen /kən/ (, “to follow”). To explore this possible sound change, I later conducted a speech production experiment, which is discussed in the subsequent section.

2. Speech Production Experiment

To investigate the possible nasal merger observed above, I addressed three research questions:

1) Is the syllable-final nasal modification a free variation or a conditioned alteration? 2) Does it occur in Mandarin spoken in Taiwan, China, or both? 3) Is it an ongoing or complete sound change?

To address these questions, I invited 30 native Mandarin speakers, 11 males and 19 females, to participate in the speech production experiment; all were students of the University of Hawaii at Manoa. Fifteen of them were from Taiwan, and the other fifteen were from China. They were all young adults with an average age of 27, the eldest being 36 years old and the youngest 21. For this experiment, I designed a questionnaire to record the subjects’ basic sociolinguistic backgrounds. In addition, I created sixty easy and interesting riddles to elicit spontaneous speech data. The answers to the riddles included the test words needed for this study, that is, words ending with three types of rhymes: -ing, -eng, and -ang. Three samples of the riddles are displayed below:

1 The character geng ( ) is phonemically transcribed as [gəŋ] in Guó-tái Shuāngyǔ Cídiǎn [Mandarin-Taiwanese Dictionary], edited by Xing-chu Yang (1992).


Table 1. Some examples of the riddles used for speech data collection

Riddle                                                            Answer       Pinyin
What kind of insect lights in the evening?                        Firefly      Yíng-huǒ-chóng
What kind of building stands on the coast and
guides boats in and out of a harbor?                              Lighthouse   Dēng-tǎ
“Home run” is a term of what sports?                              Baseball     Bàng-qiú

The test words were randomly mixed with 10 irrelevant words in order to avoid the subjects’ awareness of which words were being examined. I interviewed the participants one at a time and tape-recorded each interview. The subject was first asked to answer the questionnaire, which included questions concerning his or her basic sociolinguistic background. Next, the interview proceeded with a riddle game, which was carried out in a relaxed atmosphere to collect data from the subject’s virtually spontaneous utterances. The informant was told to give the answer to every riddle as soon as possible. If the informant had no idea about the answer, he or she would be given clues to elicit the test word. Furthermore, if a response was perceived to be not loud enough for sound analysis, the informant would be asked to say the answer once more, aloud. The recorded data were later analyzed on the computer using Praat, a sound-editing program. The findings are presented in the ensuing section.

3. Findings

Regarding the first research question as to whether the final nasal shift from the velar to the dental (/ŋ/ > /n/) occurs without syllabic constraints or appears only in certain environments, the results show that the nasal merger is not a free variation but a conditioned sound change, which can be formulated by the following phonological rule:

(1) Nasal Fronting: /ŋ/ → [n] / {i, ə} _____.

For example, the word jing (/tɕiŋ/, “pass”) is regularly pronounced by the Taiwanese respondents as jin (/tɕin/, “gold”) according to Rule 1. Furthermore, this change in the rhyme also causes lexical neutralization. For instance, the word jing-yu (, whale) is recurrently pronounced by the Taiwanese respondents as jin-yu (, goldfish). Consequently, this nasal merger creates homophones that share the same nasal ending but differ in meaning.


Notably, the nasal merger in Mandarin spoken in Taiwan is not only conditioned by the preceding vowel (Rule 1) but is also blocked by a bilabial onset, which is captured by the following rule:

(2) Vowel Labialization: /əŋ/ → [oŋ] / [labial] _____

To my knowledge, such a discovery has not been specifically analyzed in any previous study. For instance, the sound meng (/məŋ/) is pronounced as mong (/moŋ/), rather than men (/mən/), according to Rule 2, the vowel labialization rule. Obviously, this sound modification displays articulatory assimilation, because the vowel is labialized under the influence of the initial bilabial consonant /m/. This sound change, however, might not occur merely because of sound assimilation; it might also serve to constrain the creation of homophones. For example, if the word meng (/məŋ/, , dream) changed according to the nasal fronting rule, it would become neutralized with another word, men (/mən/, , depressed), thus resulting in homophones. By contrast, if the word meng changes according to the vowel labialization rule, it is pronounced as mong (/moŋ/), which does not appear in the Mandarin vocabulary. Therefore, the vowel labialization rule does not simply apply for ease of articulation but may also fill a vocabulary gap while avoiding the creation of homophones. In addition, the results demonstrate that the syllable-final nasal shift described above occurs mainly in Mandarin spoken in Taiwan, not in China. Specifically, when the preceding vowel is /i/, the final nasal merger occurs 96 percent of the time in the native speakers of Mandarin from Taiwan (MT). By contrast, the final nasal alteration takes place only 38 percent of the time in the native speakers of Mandarin from China (MC). Moreover, when the preceding vowel is /ə/, the nasal shift occurs 95 percent of the time in MT, but only 3 percent of the time in MC. Nevertheless, the nasal merger never occurs when the preceding vowel is /a/. The occurrence percentage of the nasal merger in Mandarin is displayed as follows:


[Figure 1: bar chart of merger frequency by environment]
  Environment          MT    MC
  /[i]____             96%   38%
  /[-labial][ə]____    95%    3%
  /[a]____              0%    0%

Figure 1. Occurrence percentage of the syllable-final nasal merger in three environments2

As Figure 1 displays, the syllable-final velar nasal /ŋ/ in MT merges nearly completely with the dental nasal /n/ when preceded by /i/ or /ə/. By comparison, MC in general does not undergo the nasal shift. Taken together, all of the Taiwanese respondents underwent nasal fronting (Rule 1): more than 95 percent of the time they displayed the nasal shift when the final nasal was preceded by the vowel /i/ or /ə/. By comparison, Tse’s 1992 survey suggests that 73% of his Taiwanese informants could not distinguish the syllable-final nasal minimal pairs. Accordingly, the merger of the final velar nasal with the dental has evolved into a nearly complete status in MT. However, the final nasal modification described above appears only sporadically in MC. Specifically, Rule 1 occurs only 20 percent of the time in the Chinese responses. Comparatively speaking, Rule 1 occurs more than 95 percent of the time in the Taiwanese responses, making MT differ significantly from MC (p < 0.05). While some of the Chinese informants from southern China also consistently underwent the nasal shift, the findings do not allow this study to conclude that the final nasal modification occurs regularly in southern China. First, although three of the speakers from southern China recurrently displayed the nasal merger, a couple of them were able to pronounce the test words according to the standard pronunciations without changing the final velar nasal into the dental one. Additionally, the number of speakers from southern China was very small; only five such informants participated in this study. Accordingly, the nasal shift from the velar to the dental seems to represent an ongoing phonological process of confusion and interchange in

2 The percentage of the nasal merger is obtained by excluding the pronunciations that follow Rule 2.


southern China. To confirm whether the nasal merger is common in southern China, an empirical and quantitative study is needed in the future. In summary, the results demonstrate that the nasal shift (Rule 1) is nearly complete in MT, leaving very few lexical “residues,” which are usually high-frequency words, such as sheng (, life) and qing (, clear), instead of shen (, body) and qin (, kin), respectively. Theoretically speaking, this nasal shift is consistent with Zee’s prediction of the nasal shift from -ng to -n in Chinese dialects (1985) and is opposed to M. Chen’s theory of the unidirectionality of the nasal shift from -n to -ng (1972, 1973, 1975). However, the syllable-final velar nasal remains unchanged when preceded by the non-palatal low vowel /a/.
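Rules 1 and 2 above, including their interaction (labialization applies to /əŋ/ after a labial onset and thereby blocks fronting), can be sketched as a small function over pinyin finals; the onset set and spellings are illustrative assumptions:

```python
LABIALS = {"b", "p", "m", "f"}   # labial onsets (assumed set)

def taiwan_mandarin_final(onset, final):
    """Surface form of a nasal final in Taiwan Mandarin.

    Rule 2 (vowel labialization): /əŋ/ -> [oŋ] after a labial onset,
    which blocks Rule 1.
    Rule 1 (nasal fronting): final /ŋ/ -> [n] after /i/ or /ə/,
    i.e. -ing -> -in and -eng -> -en; -ang is unaffected.
    """
    if final == "eng" and onset in LABIALS:
        return "ong"                 # Rule 2 applies first and bleeds Rule 1
    if final in ("ing", "eng"):
        return final[:-2] + "n"      # Rule 1
    return final
```

It maps jing to jin and geng to gen, turns meng into mong, and leaves bang and other -ang syllables unchanged.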

4. Phonetic Analyses

The perceived transcriptions of the nasal shift in question are also supported by acoustic analyses. First, the spectrogram of the word jing-yu (, whale) pronounced by the Taiwanese Mandarin speaker shows that the phonological change is a syllable-final nasal shift, rather than a nasal deletion with the preceding vowel nasalized, as shown below:

Figure 2. Spectrogram of the test word jing-yu (, whale) pronounced by the Taiwanese informant

The darker parts of the spectrogram reflect the vowels, while the whiter parts signal the consonants (Ladefoged, 2003). In addition, the sound analysis via Praat demonstrates that the syllable-final nasal still remains, as shown in the middle whiter part of Figure 2. The speculation that the sound alteration in question is a nasal deletion with the preceding vowel nasalized can also be ruled out by comparing the spectrograms of the two


sounds jĩ-yu and ji-yu, as shown below.

Figure 3. Spectrogram of the sound jĩ-yu

Figure 4. Spectrogram of the sound ji-yu

At a glance, neither Figure 3 nor Figure 4 looks similar to Figure 2. Figure 3 (the spectrogram of the sound jĩ-yu) does not display any whiter part between the vowels /i/ and /y/, only equally dark parallel lines running up to the vowel /y/. By contrast, Figure 2 exhibits a whiter part between the two vowels /i/ and /y/. Therefore, the nasal does not disappear but remains. Furthermore, Figure 4 (the spectrogram of the sound ji-yu) also looks distinct from Figure 2. Because the spectrogram of the word jing-yu (, whale) pronounced by the


Taiwanese informant looks similar neither to the spectrogram of the sound jĩ-yu nor to that of ji-yu, the sound modification in question should not be a nasal deletion. Nonetheless, the spectrogram of the word jin-yu (, goldfish) pronounced by the same Taiwanese informant looks similar to Figure 2, although their durations and intensities differ slightly, as presented below:

Figure 5. Spectrogram of the test word jin-yu (, goldfish) pronounced by the same Taiwanese informant

The spectrogram similarity between Figures 2 and 5 indicates that the Taiwanese Mandarin speaker does not distinguish the minimal pairs jing-yu (, whale) and jin-yu (, goldfish), but pronounces them the same, changing the final velar nasal into the dental. Furthermore, this acoustic analysis is also supported by the discovery from my interview with the Taiwanese informant after the riddle game. By comparison, the spectrogram of the word jing-yu (, whale) pronounced by the Chinese informant is distinct from the spectrogram of the same word pronounced by the Taiwanese informant, as shown below:


Figure 6. Spectrogram of the test word jing-yu (, whale) pronounced by the Chinese informant

Apparently, Figure 2 does not look similar to Figure 6; accordingly, it is safe to infer that the final velar nasal is not retained in MT but must shift to another nasal, either the bilabial or the dental. Finally, the comparison between Figure 2 and the spectrogram of the bilabial nasal demonstrates that the nasal does not shift to the bilabial, because the two spectrograms are distinct from each other, as shown in Figure 7:

Figure 7. Spectrogram of the sound jim-yu


The spectrogram comparison between Figures 2 and 7 finally assures us that the syllable-final velar nasal switches not to the bilabial but to the dental. To summarize, the spectrograms presented above indicate that the syllable-final nasal changes from the velar to the dental when the preceding vowel is /i/ or /ə/, instead of vanishing with the preceding vowel nasalized.
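The spectrogram readings in this section were made with Praat. Purely as an illustration of what such a display computes (darker cells mean more energy), a magnitude spectrogram can be obtained with a short-time Fourier transform; the window and hop sizes below are arbitrary choices, not Praat's settings:

```python
import numpy as np

def spectrogram(x, win=512, hop=128):
    """Magnitude spectrogram (dB) of a mono signal via a Hann-windowed STFT.

    Rows are time frames, columns are frequency bins (0 .. Nyquist).
    """
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win, hop)]
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mag + 1e-10)   # small floor avoids log(0)
```

Plotting the returned matrix with time on the horizontal axis and frequency on the vertical axis reproduces the familiar spectrogram view, in which a weak nasal murmur shows up as the kind of "whiter part" discussed above.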

5. Implications for Automatic Speech Recognition

As phonetically demonstrated above, the shift of the syllable-final velar nasal to the dental regularly occurs in the variety of Mandarin spoken in Taiwan. While this nasal merger might not affect computer intelligibility in phonologically independent words, it might interfere with the computer’s identification of syllable-final nasal minimal pairs. For instance, because of the nasal shift, Taiwanese Mandarin speakers tend to say the word gao-xing as gao-xin, changing the velar nasal into the dental one. This sound alteration might not mislead the speech recognition system into misidentifying gao-xing () as gao-xin () because no such word as gao-xin () exists in Mandarin. Nonetheless, this nasal shift has produced a considerable number of homophones, neutralizing many minimal pairs, such as jing-yu (, whale) and jin-yu (, goldfish), and sheng-gao () and shen-gao (). Consequently, when a Taiwanese Mandarin speaker says jing-yu (, whale) as jin-yu (, goldfish), the system might identify the intended word as jin-yu (, goldfish) rather than jing-yu (, whale), hence retrieving the wrong word. To resolve word-identification problems like this, a possible approach might be to train the user to produce the standard pronunciations of the minimal pairs differing only in nasal endings. Specifically, the user might be trained to articulate the minimal pairs with the standard pronunciations so that the recognition system is able to retrieve the right words. Such sound training, however, is neither easy nor realistic. While the speech recognition program may provide the standard pronunciations of the syllable-final nasal minimal pairs for the user to imitate and articulate, there is evidence that most people tend to mishear new sounds and mispronounce them according to their habitual articulation (Ohala, 1992, 2001).
Furthermore, some recent research has demonstrated that Taiwanese Mandarin speakers tend to preserve their distinctive and unique sound features as markers of Taiwanese identity, instead of following the standard pronunciations associated with China (Hsu, 2005). Accordingly, a viable resolution might be to program the automatic speech recognizer to distinguish syllable-final nasal minimal pairs, as represented in Table 2.


Table 2. Examples of the minimal pairs that differ in the syllable-final nasals following the vowels /i/ or /ə/

-ing/-in                  -eng/-en
qing-xin/qin-xin          cheng-jio/chen-jio
ying-qi/yin-qi            sheng-gao/shen-gao
jing-ying/jin-yin         seng-qing/shen-qin
xing-xiang/xin-xiang      zheng-zhi/zhen-zhi
jing-yu/jin-yu            zheng-feng/zheng-fong

Table 2, however, is not a complete list. Further research is needed to include as many minimal pairs of the syllable-final nasals as possible in the corpus of the speech recognition system. Most crucially, research into the contexts where minimal pairs differing in nasal endings occur is needed to enhance the performance of word identification. For example, when the user says the word ying-qi () as yin-qi (), the speech recognition system might retrieve both words for the user to choose between. Nonetheless, if we would like to advance automatic speech recognition, we need to investigate the discourse where each of the words occurs. For instance, the word ying-qi () often collocates with such words as huan-fa () and lin-ran (), while the word yin-qi () does not. Furthermore, the word ying-qi () often occurs when the discourse includes words like zhen-pai (), hao-shuang (), and guang-ming (). Therefore, future research into the discourse where syllable-final nasal minimal pairs occur will help the speech recognition system retrieve the intended words. To conclude, this study has analyzed the phonetic attributes of the syllable-final nasal shift from the velar to the dental, which is nearly complete in Mandarin spoken in Taiwan. To accommodate this nasal shift, a computational linguist needs to include as many minimal pairs ending with the nasal rhymes as possible in the corpus of the speech recognition system. In addition, to improve the efficiency of word identification, this study has also proposed investigating the discourse where each of the minimal pairs appears. In short, future research into the syllable-final nasal minimal pairs and the contexts of their usage is needed to enhance the identification accuracy of the speech recognition system.
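The proposal above, retrieving both members of a neutralized minimal pair and then letting discourse context decide, can be illustrated with a toy sketch. The candidate table and collocation sets are hypothetical stand-ins built from the paper's examples, not an actual ASR lexicon:

```python
# Hypothetical mini-lexicon: a recognized -in/-en word plus its -ing/-eng
# minimal-pair partner (the pairs follow Table 2).
CANDIDATES = {
    "yin-qi": ["yin-qi", "ying-qi"],
    "jin-yu": ["jin-yu", "jing-yu"],
}

# Hypothetical collocation sets, seeded with the paper's examples for ying-qi.
COLLOCATES = {
    "ying-qi": {"huan-fa", "lin-ran", "zhen-pai", "hao-shuang", "guang-ming"},
    "yin-qi": set(),
}

def disambiguate(recognized, context):
    """Pick the candidate whose collocates overlap the surrounding words most;
    ties fall back to the form the recognizer actually heard."""
    candidates = CANDIDATES.get(recognized, [recognized])
    score = {c: len(COLLOCATES.get(c, set()) & set(context)) for c in candidates}
    return max(candidates, key=lambda c: (score[c], c == recognized))
```

Given the context word huan-fa, the sketch resolves a recognized yin-qi to ying-qi; with no matching collocates it falls back to the recognized form.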


References
[1] J. K. Tse, "Production and perception of syllable final [n] and [ŋ] in Mandarin Chinese: An experimental study," Studies in English Literature, pp. 143-156, 1992.
[2] E. Zee, "Sound change in syllable-final nasal consonants in Chinese," Journal of Chinese Linguistics, vol. 13, pp. 291-330, 1985.
[3] M. Chen, "Nasals and nasalization in Chinese: Exploration in phonological universals," University of California at Berkeley, 1972.
[4] M. Chen, "Cross-dialectal comparison: A case study and some theoretical considerations," Journal of Chinese Linguistics, vol. 11, pp. 38-63, 1973.
[5] M. Chen, "An areal study of nasalization in Chinese," Journal of Chinese Linguistics, vol. 3, pp. 16-59, 1975.
[6] P. Ladefoged, Phonetic Data Analysis: An Introduction to Instrumental Phonetic Fieldwork. Oxford: Blackwell, 2003.
[7] J. J. Ohala, "What's cognitive, what's not, in sound change," Lingua e Stile, vol. 27, pp. 321-362, 1992.
[8] J. J. Ohala, "An Account of Sound Change," University of California, Santa Barbara: The LSA Institute, 2001.
[9] H.-J. Hsu, "The leveling of neutral tone in Taiwan Mandarin and its new identity," in 2005 International Conference of Applied Linguistics, Chia-yi, Taiwan, 2005.
