Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data
Zi Lin (1,2,3), Yuguang Duan (3), Yuanyuan Zhao (1,2,5), Weiwei Sun (1,2,4) and Xiaojun Wan (1,2)
(1) Institute of Computer Science and Technology, Peking University
(2) The MOE Key Laboratory of Computational Linguistics, Peking University
(3) Department of Chinese Language and Literature, Peking University
(4) Center for Chinese Linguistics, Peking University
(5) Academy for Advanced Interdisciplinary Studies, Peking University

Abstract

This paper studies semantic parsing for interlanguage (L2)¹, taking semantic role labeling (SRL) as a case task and learner Chinese as a case language. We first manually annotate the semantic roles for a set of learner texts to derive a gold standard for automatic SRL. Based on the new data, we then evaluate three off-the-shelf SRL systems, i.e., the PCFGLA-parser-based, neural-parser-based and neural-syntax-agnostic systems, to gauge how successful SRL for learner Chinese can be. We find two non-obvious facts: 1) the L1-sentence-trained systems perform rather badly on the L2 data; 2) the performance drop from the L1 data to the L2 data of the two parser-based systems is much smaller, indicating the importance of syntactic parsing in SRL for interlanguages. Finally, the paper introduces a new agreement-based model to explore the semantic coherency information in the large-scale L2-L1 parallel data. We then show that such information is very effective for enhancing SRL for learner texts. Our model achieves an F-score of 72.06, which is a 2.02 point improvement over the best baseline.

¹ In this paper, we call sentences written by non-native speakers (henceforth, "L2 sentences"), aligned to their corrections by native speakers (henceforth, "L1 sentences"), L2-L1 parallel sentences.

1 Introduction

A learner language (interlanguage) is an idiolect developed by a learner of a second or foreign language which may preserve some features of his/her first language. Previously, encouraging results of automatically building the syntactic analysis of learner languages were reported (Nagata and Sakaguchi, 2016), but it is still unknown how semantic processing performs, while parsing a learner language (L2) into semantic representations is the foundation of a variety of deeper analyses of learner languages, e.g., automatic essay scoring. In this paper, we study semantic parsing for interlanguage, taking semantic role labeling (SRL) as a case task and learner Chinese as a case language.

Before discussing a computational system, we first consider linguistic competence and performance. Can humans robustly understand learner texts? Or, to be more precise, to what extent can a native speaker understand the meaning of a sentence written by a language learner? Intuitively, the answer is towards the positive side. To validate this, we ask two senior students majoring in Applied Linguistics to carefully annotate some L2-L1 parallel sentences with predicate–argument structures according to the specification of Chinese PropBank (CPB; Xue and Palmer, 2009), which is developed for L1. A high inter-annotator agreement is achieved, suggesting the robustness of language comprehension for L2. During the course of semantic annotation, we find a non-obvious fact: we can re-use the semantic annotation specification, Chinese PropBank in our case, which is developed for L1. Only modest rules are needed to handle some tricky phenomena. This is quite different from syntactic treebanking for learner sentences, where defining a rich set of new annotation heuristics seems necessary (Ragheb and Dickinson, 2012; Nagata and Sakaguchi, 2016; Berzak et al., 2016).

Our second concern is to mimic the human's robust semantic processing ability with computer programs. The feasibility of reusing the annotation specification for L1 implies that we can reuse standard CPB data to train an SRL system to process learner texts. To test the robustness of state-of-the-art SRL algorithms, we evaluate two types of SRL frameworks. The first one is a traditional SRL system that leverages a syntactic parser and heavy feature engineering to obtain explicit information about semantic roles (Feng et al., 2012). Furthermore, we employ two different parsers for comparison: 1) the PCFGLA-based parser, viz. the Berkeley parser (Petrov et al., 2006), and 2) a minimal span-based neural parser (Stern et al., 2017). The other SRL system uses a stacked BiLSTM to implicitly capture local and non-local information (He et al., 2017), and we call it the neural syntax-agnostic system. All systems achieve state-of-the-art performance on L1 texts but show a significant degradation on L2 texts. This highlights the weakness of applying an L1-sentence-trained system to process learner texts.

While the neural syntax-agnostic system obtains superior performance on the L1 data, the two syntax-based systems both produce better analyses on the L2 data. Furthermore, as illustrated by the comparison between different parsers, the better the parsing results we get, the better the performance on L2 we achieve. This shows that syntactic parsing is important in semantic construction for learner Chinese. The main reason, according to our analysis, is that the syntax-based systems may generate correct syntactic analyses for partial grammatical fragments in L2 texts, which provides crucial information for SRL. Therefore, syntactic parsing helps build more generalizable SRL models that transfer better to new languages, and enhancing syntactic parsing can improve SRL to some extent.

Our last concern is to explore the potential of a large-scale set of L2-L1 parallel sentences to enhance SRL systems. We find that the semantic structures of the L2-L1 parallel sentences are highly consistent. This inspires us to design a novel agreement-based model to exploit such semantic coherency information. In particular, we define a metric for comparing predicate–argument structures and search for relatively good automatic syntactic and semantic annotations to extend the training data for SRL systems. Experiments demonstrate the value of the L2-L1 parallel sentences as well as the effectiveness of our method. We achieve an F-score of 72.06, which is a 2.02 percentage point improvement over the best neural-parser-based baseline.
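As a rough illustration of this agreement-based idea, the following minimal sketch scores how well an SRL system's analyses of an L2 sentence and of its L1 correction agree, and keeps only high-agreement pairs as additional, automatically labeled training data. The tuple representation of predicate–argument structures, the Dice-style overlap score, and the threshold are simplifying assumptions for illustration, not the exact metric defined in this paper.

```python
# Minimal sketch (assumed details, not the paper's exact metric): keep an
# L2-L1 pair as extra training data when the SRL analyses of its two sides
# are sufficiently consistent.

def agreement(l2_props, l1_props):
    """l2_props / l1_props: sets of (predicate, role, argument) tuples
    produced by an SRL system for the two sides of one L2-L1 pair."""
    if not l2_props or not l1_props:
        return 0.0
    overlap = len(l2_props & l1_props)
    return 2.0 * overlap / (len(l2_props) + len(l1_props))  # Dice-style score

def extend_training_data(parallel_pairs, srl_system, threshold=0.9):
    """parallel_pairs: iterable of (l2_sentence, l1_sentence) strings.
    srl_system: callable mapping a sentence to a set of role tuples.
    threshold: hypothetical cut-off for 'consistent enough' pairs."""
    extra = []
    for l2_sent, l1_sent in parallel_pairs:
        l2_props, l1_props = srl_system(l2_sent), srl_system(l1_sent)
        if agreement(l2_props, l1_props) >= threshold:
            extra.append((l2_sent, l2_props))  # auto-labeled L2 side
            extra.append((l1_sent, l1_props))  # auto-labeled L1 side
    return extra
```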
To the best of our knowledge, this is the first time that L2-L1 parallel data is utilized to enhance NLP systems for learner texts. For research purposes, we have released our SRL annotations on the 600 sentence pairs and the L2-L1 parallel dataset.²

² The data is collected from Lang-8 (www.lang-8.com) and used as the training data in the NLPCC 2018 Shared Task: Grammatical Error Correction (Zhao et al., 2018); it can be downloaded at https://github.com/pkucoli/srl4il

2 Semantic Analysis of An L2-L1 Parallel Corpus

2.1 An L2-L1 Parallel Corpus

An L2-L1 parallel corpus can greatly facilitate the analysis of a learner language (Lee et al., 2017). Following Mizumoto et al. (2011), we collected a large dataset of L2-L1 parallel texts of Mandarin Chinese by exploring "language exchange" social networking services (SNS), i.e., Lang-8, a language-learning website where native speakers can freely correct the sentences written by foreign learners. The proficiency levels of the learners are diverse, but most of the learners, according to our judgment, are of intermediate or lower level.

Our initial collection consists of 1,108,907 sentence pairs from 135,754 essays. As there is a lot of noise in the raw sentences, we clean up the data by (1) ruling out redundant content, (2) excluding sentences containing foreign words or the Chinese phonetic alphabet by checking the Unicode values, (3) dropping overly simple sentences which may not be informative, and (4) utilizing a rule-based classifier to determine whether to include the sentence in the corpus.
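Filter (2) can be implemented with simple Unicode checks. The sketch below is a minimal illustration of this kind of filtering; the specific character ranges, the length cut-off, and the function names are assumptions, and the actual rules used to build the corpus (including the rule-based classifier in step (4)) are more elaborate.

```python
import re

# Hypothetical character classes used to flag sentences for exclusion:
# Latin letters (foreign words, pinyin) and Bopomofo (phonetic symbols).
FOREIGN_OR_PHONETIC = re.compile(
    r'[A-Za-z'           # basic Latin letters
    r'\u00C0-\u024F'     # Latin letters with diacritics (e.g., pinyin tone marks)
    r'\u3105-\u312F'     # Bopomofo
    r'\u31A0-\u31BF]'    # Bopomofo Extended
)

def keep_sentence(sentence: str, min_chars: int = 5) -> bool:
    """Return True if a raw learner sentence passes the simple filters:
    no foreign/phonetic characters and not overly short."""
    if FOREIGN_OR_PHONETIC.search(sentence):
        return False
    if len(sentence) < min_chars:  # crude stand-in for "overly simple"
        return False
    return True
```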
enhancing syntactic parsing can improve SRL to The final corpus consists of 717,241 learner some extent. sentences from writers of 61 different native lan- guages, in which English and Japanese constitute Our last concern is to explore the potential of a the majority. As for completeness, 82.78% of the large-scale set of L2-L1 parallel sentences to en- Chinese Second Language sentences on Lang-8 hance SRL systems. We find that semantic struc- are corrected by native human annotators. One tures of the L2-L1 parallel sentences are highly sentence gets corrected approximately 1.53 times consistent. This inspires us to design a novel on average. agreement-based model to explore such seman- In this paper, we manually annotate the tic coherency information. In particular, we de- predicate–argument structures for the 600 L2-L1 fine a metric for comparing predicate–argument pairs as the basis for the semantic analysis of structures and searching for relatively good auto- learner Chinese. It is from the above corpus that matic syntactic and semantic annotations to ex- we carefully select 600 pairs of L2-L1 parallel tend the training data for SRL systems. Experi- sentences. We would choose the most appropri- ments demonstrate the value of the L2-L1 paral- ate one among multiple versions of corrections lel sentences as well as the effectiveness of our and recorrect the L1s if necessary. Because word method. We achieve an F-score of 72.06, which structure is very fundamental for various NLP is a 2.02 percentage point improvement over the tasks, our annotation also contains gold word seg- best neural-parser-based baseline. mentation for both L2 and L1 sentences. Note that To the best of our knowledge, this is the first there are no natural word boundaries in Chinese time that the L2-L1 parallel data is utilized to en- 2The data is collected from Lang-8 (www.lang-8. hance NLP systems for learner texts. com) and used as the training data in NLPCC 2018 Shared Task: Grammatical Error Correction (Zhao et al., 2018), For research purpose, we have released our SRL which can be downloaded at https://github.com/ annotations on 600 sentence pairs and the L2-L1 pkucoli/srl4il 3794 PRF tations selected by each annotator, discussing the L1 95.87 96.17 96.02 ENG differences, and either selecting one as fully cor- L2 94.78 93.06 93.91 rect or creating a hybrid representing the consen- L1 97.95 98.69 98.32 JPN sus decision for each choice point. When we felt L2 96.07 97.48 96.77 that the decisions were not already fully guided by L1 96.95 95.41 96.17 RUS the existing annotation guidelines, we worked to L2 97.04 94.08 95.53 articulate an extension to the guidelines that would L1 96.95 97.76 97.35 ARA support the decision.