Readability Assessment for Text Simplification

Sandra Aluisio1, Lucia Specia2, Caroline Gasperin1 and Carolina Scarton1
1 Center of Computational Linguistics (NILC), University of São Paulo, São Carlos - SP, Brazil
2 Research Group in Computational Linguistics, University of Wolverhampton, Wolverhampton, UK
{sandra,cgasperin}@icmc.usp.br, [email protected], [email protected]

Abstract

We describe a readability assessment approach to support the process of text simplification for poor literacy readers. Given an input text, the goal is to predict its readability level, which corresponds to the literacy level that is expected from the target reader: rudimentary, basic or advanced. We complement features traditionally used for readability assessment with a number of new features, and experiment with alternative ways to model this problem using machine learning methods, namely classification, regression and ranking. The best resulting model is embedded in an authoring tool for Text Simplification.

1 Introduction

In Brazil, the National Indicator of Functional Literacy (INAF) index has been computed annually since 2001 to measure the levels of literacy of the Brazilian population. The 2009 report presented a worrying scenario: 7% of the individuals are illiterate; 21% are literate at the rudimentary level; 47% are literate at the basic level; only 25% are literate at the advanced level (INAF, 2009). These literacy levels are defined as:

(1) Illiterate: individuals who cannot perform simple tasks such as reading words and phrases;
(2) Rudimentary: individuals who can find explicit information in short and familiar texts (such as an advertisement or a short letter);
(3) Basic: individuals who are functionally literate, i.e., they can read and understand texts of average length, and find information even when it is necessary to make some inference; and
(4) Advanced: fully literate individuals, who can read longer texts, relate their parts, compare and interpret information, distinguish fact from opinion, make inferences and synthesize.

In order to promote digital inclusion and accessibility for people with low levels of literacy, particularly regarding documents available on the web, it is important to provide text in a simple and easy-to-read way. This is a requirement of the Web Content Accessibility Guidelines 2.0's principle of comprehensibility and accessibility of Web content1. It states that for texts which demand reading skills more advanced than those of individuals with lower secondary education, one should offer an alternative version of the same content suitable for those individuals. While readability formulas for English have a long history – 200 formulas were reported from 1920 to the 1980s (DuBay, 2004) – the only tool available for Portuguese is an adaptation of the Flesch Reading Ease index, which evaluates the complexity of texts on a 4-level scale corresponding to grade levels (Martins et al., 1996).

In the PorSimples project (Aluísio et al., 2008) we develop text adaptation methods (via text simplification and elaboration approaches) to improve the comprehensibility of texts published on government websites or by renowned news agencies, which are expected to be relevant to a large audience with various literacy levels. The project provides automatic simplification tools to aid (1) poorly literate readers in understanding online content – a browser plug-in for automatically simplifying websites – and (2) authors producing texts for this audience – an authoring tool for guiding the creation of simplified versions of texts.

This paper focuses on a readability assessment approach to assist the simplification process in the authoring tool, SIMPLIFICA. The current version of SIMPLIFICA offers simplification operations addressing a number of lexical and syntactic phenomena to make the text more readable. The author has the freedom to choose when and whether to apply the available simplification operations, a decision based on the level of complexity of the current text and on the target reader.
1 http://www.w3.org/TR/WCAG20/

A method for automatically identifying such level of complexity is therefore of great value. With our readability assessment tool, the author is able to automatically check the complexity/readability level of the original text, as well as of modified versions of such text produced as he/she applies the simplification operations offered by SIMPLIFICA, until the text reaches the expected level, adequate for the target reader.

In this paper we present such a readability assessment tool, developed as part of the PorSimples project, and discuss its application within the authoring tool. Different from previous work, the tool does not model text difficulty according to linear grade levels (e.g., Heilman et al., 2008), but instead maps the text into the three levels of literacy defined by INAF: rudimentary, basic or advanced. Moreover, it uses a more comprehensive set of features, different learning techniques and targets a new language and application, as we discuss in Section 4. More specifically, we address the following research questions:

1. Given some training material, is it possible to detect the complexity level of Portuguese texts, which corresponds to the different literacy levels defined by INAF?
2. What is the best way to model this problem and which features are relevant?

We experiment with nominal, ordinal and interval-based modeling techniques and exploit a number of the cognitively motivated features proposed by Coh-Metrix 2.0 (Graesser et al., 2004) and adapted to Portuguese (called Coh-Metrix-PORT), along with a set of new features, including syntactic features to capture simplification operations and n-gram language model features.

In the remainder of this paper, we first provide some background information on the need for a readability assessment tool within our text simplification system (Section 2) and discuss prior work on readability assessment (Section 3), to then present our features and modeling techniques (Section 4) and the experiments performed to answer our research questions (Section 5).

2 Text Simplification in PorSimples

Text Simplification (TS) aims to maximize the reading comprehension of written texts through their simplification. Simplification usually involves substituting complex words with simpler ones and breaking down and changing the syntax of complex, long sentences (Max, 2006; Siddharthan, 2003).

To meet the needs of people with different levels of literacy, in the PorSimples project we propose two types of simplification: natural and strong. The first type results in texts adequate for people with a basic literacy level and the second, rudimentary level. The difference between the two is the degree of application of simplification operations to complex sentences. In strong simplification, operations are applied to all complex syntactic phenomena present in the text in order to make it as simple as possible, while in natural simplification these operations are applied selectively, only when the resulting text remains "natural". One example of an original text (a), along with its natural (b) and strong (c) manual simplifications, is given in Table 1.
(a) The cinema theaters around the world were showing a production by director Joe Dante in which a shoal of piranhas escaped from a military laboratory and attacked participants of an aquatic show. (...) More than 20 people were bitten by palometas (Serrasalmus spilopleura, a species of piranhas) that live in the waters of the Sanchuri dam.

(b) The cinema theaters around the world were showing a production by director Joe Dante. In the production a shoal of piranhas escaped from a military laboratory and attacked participants of an aquatic show. (...) More than 20 people were bitten by palometas that live in the waters of the Sanchuri dam. Palometas are Serrasalmus spilopleura, a species of piranhas.

(c) The cinema theaters around the world were showing a movie by director Joe Dante. In the movie a shoal of piranhas escaped from a military laboratory. The shoal of piranhas attacked participants of an aquatic show. (...) Palometas have bitten more than 20 people. Palometas live in the waters of the Sanchuri dam. Palometas are Serrasalmus spilopleura, a species of piranhas.

Table 1: Example of original and simplified texts

The association between these two types of simplification and the literacy levels was identified by means of a corpus study. We have manually built a corpus of simplified texts at both natural and strong levels and analyzed their linguistic structures according to the description of the two literacy levels. We verified that strong simplified sentences are more adequate for rudimentary level readers, and natural ones for basic level readers. This claim is supported by several studies which relate capabilities and performance of the working memory with reading levels (Siddharthan, 2003; McNamara et al., 2002).

2.1 The Rule-based Simplification System

The association between simplification operations and the syntactic phenomena they address is implemented within a rule-based syntactic simplification system (Candido Jr. et al., 2009). This system is able to identify complex syntactic phenomena in a sentence and perform the appropriate operations to simplify each phenomenon.

The simplification rules follow a manual for syntactic simplification in Portuguese also developed in PorSimples. They cover syntactic constructions such as apposition, relative clauses, coordination and subordination, which had already been addressed by previous work on text simplification (Siddharthan, 2003). Additionally, they address the transformation of sentences from passive into active voice, the normalization of sentences into the Subject-Verb-Object order, and the simplification of adverbial phrases. The simplification operations available are: sentence splitting, changing particular discourse markers into simpler ones, transforming passive into active voice, inverting the order of clauses, converting to subject-verb-object order, and relocating long adverbial phrases.
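To make the natural/strong distinction of Section 2 concrete, the sketch below shows one way the operations listed above could be organized: strong simplification applies an operation for every detected phenomenon, whereas natural simplification applies operations selectively. This is an illustrative outline only, not the PorSimples rule-based system; the phenomenon detector, the individual rule functions and the acceptability check are hypothetical placeholders.

```python
# Illustrative sketch only (not the PorSimples implementation): how the
# operations of Section 2.1 could be driven by a "natural" vs "strong" policy.
from typing import Callable, Dict, List

# Hypothetical rule functions: each maps one sentence to its simplified form(s).
RULES: Dict[str, Callable[[str], List[str]]] = {
    "apposition": lambda s: [s],             # placeholder: split off appositions
    "relative_clause": lambda s: [s],        # placeholder: split relative clauses
    "passive_voice": lambda s: [s],          # placeholder: passive -> active voice
    "non_svo_order": lambda s: [s],          # placeholder: normalize to SVO order
    "long_adverbial_phrase": lambda s: [s],  # placeholder: relocate long adjuncts
}

def detect_phenomena(sentence: str) -> List[str]:
    """Hypothetical detector for the phenomena covered by the rules above."""
    return []

def looks_natural(sentences: List[str]) -> bool:
    """Hypothetical acceptability check used by natural simplification."""
    return True

def simplify(sentence: str, mode: str = "strong") -> List[str]:
    """Strong mode applies every applicable rule; natural mode applies a rule
    only if the result still reads naturally."""
    output = [sentence]
    for phenomenon in detect_phenomena(sentence):
        rule = RULES.get(phenomenon)
        if rule is None:
            continue
        candidate = [new for part in output for new in rule(part)]
        if mode == "strong" or looks_natural(candidate):
            output = candidate
    return output
```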
2.2 The SIMPLIFICA Tool

The rule-based simplification system is part of SIMPLIFICA, an authoring tool for writers to adapt original texts into simplified texts. Within SIMPLIFICA, the author plays an active role in generating natural or strong simplified texts by accepting or rejecting the simplifications offered by the system on a sentence basis and post-editing them if necessary.

Despite the ability to make such choices at the sentence level, it is not straightforward for the author to judge the complexity level of the text as a whole in order to decide whether it is ready for a certain audience. This is the main motivation for the development of a readability assessment tool.

The readability assessment tool automatically detects the level of complexity of a text at any moment of the authoring process, and therefore guides the author towards producing the adequate simplification level according to the type of reader. It classifies a text into one of three levels: rudimentary, basic or advanced.

Figure 1 shows the interface of SIMPLIFICA, where the complexity level of the current text as given by the readability assessment tool is shown at the bottom, in red (in this case, "Nível Pleno", which corresponds to advanced). To update the readability assessment of a text the author can choose "Nível de Inteligibilidade" (readability level) at any moment.

The text shown in Figure 1 is composed of 13 sentences and 218 words. The lexical simplification module (not shown in Figure 1) finds 10 candidate words for simplification in this text, and the syntactic simplification module selects 10 sentences to be simplified (highlighted in gray).

When the author selects a highlighted sentence, he/she is presented with all possible simplifications proposed by the rule-based system for this sentence. Figure 2 shows the options for the first sentence in Figure 1. The first two options cover a non-finite clause and adverbial adjuncts, respectively, while the third option covers both phenomena in one single step. The original sentence is also given as an option.

It is possible that certain suggestions of automatic simplification result in ungrammatical or inadequate sentences (mainly due to errors). The author can choose not to use such suggestions, as well as manually edit the original or automatically simplified versions. The impact of the author's choice on the overall readability level of the text is not always clear to the author. The goal of the readability assessment function is to provide such information.

Simplified texts are usually longer than the original ones, due to sentence splitting and the repetition of information to connect the resulting sentences. We acknowledge that low literacy readers prefer short texts, but in this tool the shortening of the text is the responsibility of the author. Our focus is on the linguistic structure of the texts; the length of the text is actually a feature considered by our readability assessment system.

Figure 1: SIMPLIFICA interface

Figure 2: Simplification options available for the first sentence of the text presented in Figure 1

3 Readability Assessment

Recent work on readability assessment for the English language focuses on: (i) the feature set used to capture the various aspects of readability, to evaluate the contribution of lexical, syntactic, semantic and discursive features; (ii) the audience for which the readability measurement is intended; (iii) the genre effects on the calculation of text difficulty; (iv) the type of learning technique which is more appropriate: those producing nominal, ordinal or interval scales of measurement; and (v) providing an application for the automatic assessment of reading difficulty.

Pitler and Nenkova (2008) propose a unified framework composed of vocabulary, syntactic, elements of lexical cohesion, entity coherence and discourse relations to measure text quality, which resembles the composition of rubrics in the area of essay scoring (Burstein et al., 2003).

The following studies address readability assessment for specific audiences: learners of English as a second language (Schwarm and Ostendorf, 2005; Heilman et al., 2007), people with intellectual disabilities (Feng et al., 2009), and people with cognitive impairment caused by Alzheimer's disease (Roark et al., 2007).

Sheehan et al. (2007) focus on models for literary and expository texts, given that traditional metrics like the Flesch-Kincaid Grade Level score tend to overpredict the difficulty of literary texts and underpredict the difficulty of expository texts. Heilman et al. (2008) investigate an appropriate scale of measurement for reading difficulty – nominal, ordinal, or interval – by comparing the effectiveness of statistical models for each type of data. Petersen and Ostendorf (2009) use classification and regression techniques to predict a readability score.

Miltsakaki and Troutt (2007; 2008) propose an automatic tool to evaluate the reading difficulty of Web texts in real time, addressing teenagers and adults with low literacy levels. Using machine learning, Glöckner et al. (2006) present a tool for automatically rating the readability of German texts, using several linguistic information sources and a global readability score similar to the Flesch Reading Ease.

4 A Tool for Readability Assessment

In this section we present our approach to readability assessment. It differs from previous work in the following aspects: (i) it uses a feature set with cognitively-motivated metrics and a number of additional features to provide a better explanation of the complexity of a text; (ii) it targets a new audience: people with different literacy levels; (iii) it investigates different statistical models for non-linear data scales: the levels of literacy defined by INAF; (iv) it focuses on a new application: the use of readability assessment for text simplification systems; and (v) it is aimed at Portuguese.

4.1 Features for Assessing Readability

Our feature set (Table 2) consists of 3 groups of features. The first group contains cognitively-motivated features (features 1-42), derived from the Coh-Metrix-PORT tool (see Section 4.1.1). The second group contains features that reflect the incidence of particular syntactic constructions which we target in our text simplification system (features 43-49). The third group (the remaining features in Table 2) contains features derived from n-gram language models built considering unigrams, bigrams and trigrams, with probability and perplexity plus out-of-vocabulary rate scores. We later refer to a set of basic features, which consist of simple counts that do not require any linguistic tool or external resources to be computed. This set corresponds to features 1-3 and 9-11.

1  Number of words
2  Number of sentences
3  Number of paragraphs
4  Number of verbs
5  Number of nouns
6  Number of adjectives
7  Number of adverbs
8  Number of pronouns
9  Average number of words per sentence
10 Average number of sentences per paragraph
11 Average number of syllables per word
12 Flesch index for Portuguese
13 Incidence of content words
14 Incidence of functional words
15 Raw frequency of content words
16 Minimal frequency of content words
17 Average number of verb hypernyms
18 Incidence of NPs
19 Number of NP modifiers
20 Number of words before the main verb
21 Number of high level constituents
22 Number of personal pronouns
23 Type-token ratio
24 Pronoun-NP ratio
25 Number of “e” (and)
26 Number of “ou” (or)
27 Number of “se” (if)
28 Number of negations
29 Number of logic operators
30 Number of connectives
31 Number of positive additive connectives
32 Number of negative additive connectives
33 Number of positive temporal connectives
34 Number of negative temporal connectives
35 Number of positive causal connectives
36 Number of negative causal connectives
37 Number of positive logic connectives
38 Number of negative logic connectives
39 Verb ambiguity ratio
40 Noun ambiguity ratio
41 Adverb ambiguity ratio
42 Adjective ambiguity ratio
43 Incidence of clauses
44 Incidence of adverbial phrases
45 Incidence of apposition
46 Incidence of passive voice
47 Incidence of relative clauses
48 Incidence of coordination
49 Incidence of subordination
50 Out-of-vocabulary words
51 LM probability of unigrams
52 LM perplexity of unigrams
53 LM perplexity of unigrams, without line break
54 LM probability of bigrams
55 LM perplexity of bigrams
56 LM perplexity of bigrams, without line break
57 LM probability of trigrams
58 LM perplexity of trigrams
59 LM perplexity of trigrams, without line break

Table 2: Feature set
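As a rough illustration of how the basic feature subset (features 1-3 and 9-11) and the Flesch index for Portuguese (feature 12) can be computed, a minimal sketch follows. The sentence and syllable counting are crude approximations, and the constants of the Portuguese Flesch adaptation follow the commonly cited version of Martins et al. (1996), i.e. the English Flesch Reading Ease formula shifted by 42 points; treat both as assumptions rather than the exact procedure used in the paper.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count for Portuguese: number of vowel groups."""
    return max(1, len(re.findall(r"[aeiouáéíóúâêôãõà]+", word.lower())))

def basic_features(text: str) -> dict:
    paragraphs = [p for p in text.split("\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    syllables = sum(count_syllables(w) for w in words)

    n_words, n_sents, n_pars = len(words), len(sentences), len(paragraphs)
    avg_words_per_sentence = n_words / max(n_sents, 1)
    avg_sents_per_paragraph = n_sents / max(n_pars, 1)
    avg_syllables_per_word = syllables / max(n_words, 1)

    # Assumed constants: English Flesch Reading Ease shifted by +42 points,
    # as commonly cited for the Brazilian Portuguese adaptation.
    flesch_pt = 248.835 - 1.015 * avg_words_per_sentence - 84.6 * avg_syllables_per_word

    return {
        "n_words": n_words,                                       # feature 1
        "n_sentences": n_sents,                                   # feature 2
        "n_paragraphs": n_pars,                                   # feature 3
        "avg_words_per_sentence": avg_words_per_sentence,         # feature 9
        "avg_sentences_per_paragraph": avg_sents_per_paragraph,   # feature 10
        "avg_syllables_per_word": avg_syllables_per_word,         # feature 11
        "flesch_pt": flesch_pt,                                   # feature 12
    }
```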
4.1.1 Coh-Metrix-PORT

The Coh-Metrix tool was developed to compute features potentially relevant to the comprehension of English texts through a number of measures informed by linguistics, psychology and cognitive studies. The main aspects covered by the measures are cohesion and coherence (Graesser et al., 2004). Coh-Metrix 2.0, the free version of the tool, contains 60 readability metrics. The Coh-Metrix-PORT tool (Scarton et al., 2009) computes similar metrics for texts in Brazilian Portuguese. The major challenge in creating such a tool is the lack of some of the necessary linguistic resources. The following metrics are currently available in the tool (we refer to Table 2 for details):

1. Readability metric: feature 12.
2. Words and textual information:
   Basic counts: features 1 to 11.
   Frequencies: features 15 to 16.
   Hypernymy: feature 17.
3. Syntactic information:
   Constituents: features 18 to 20.
   Pronouns: feature 22.
   Types and tokens: features 23 to 24.
   Connectives: features 30 to 38.
4. Logical operators: features 25 to 29.

The following resources for Portuguese were used: the MXPOST POS tagger (Ratnaparkhi, 1996), a word frequency list compiled from a 700 million-token corpus2, a tool to identify reduced noun phrases (Oliveira et al., 2006), a list of connectives classified as positive/negative and according to cohesion type (causal, temporal, additive or logical), a list of logical operators, and WordNet.Br (Dias-da-Silva et al., 2008).

In this paper we include seven new metrics in Coh-Metrix-PORT: features 13, 14, 21, and 39 to 42. We used TEP3 (Dias-da-Silva et al., 2003) to obtain the number of senses of words (and thus their ambiguity level), and the Palavras parser (Bick, 2000) to identify the higher level constituents. The remaining metrics were computed based on the POS tags.

According to a report on the performance of each Coh-Metrix-PORT metric (Scarton et al., 2009), no individual feature provides sufficient indication to measure text complexity; hence the need to exploit their combination, and also to combine them with the other types of features described in this section.
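The incidence-style metrics above (e.g., features 13-14 and 43-49) are counts normalized by text length. The sketch below shows the general shape of such a computation over POS-tagged input; the tag names, the input format and the per-1000-words normalization (typical of Coh-Metrix-style incidence scores) are assumptions, since in the paper these counts come from the MXPOST tagger and the Palavras parser.

```python
from typing import Dict, List, Tuple

CONTENT_TAGS = {"N", "V", "ADJ", "ADV"}   # assumed tag names for content words

def incidence(count: int, n_words: int) -> float:
    """Occurrences per 1000 words (assumed Coh-Metrix-style normalization)."""
    return 1000.0 * count / max(n_words, 1)

def pos_based_features(tagged: List[Tuple[str, str]]) -> Dict[str, float]:
    """tagged is a list of (token, POS tag) pairs, e.g. produced by a tagger."""
    n_words = len(tagged)
    n_content = sum(1 for _, tag in tagged if tag in CONTENT_TAGS)
    return {
        "incidence_content_words": incidence(n_content, n_words),               # cf. feature 13
        "incidence_functional_words": incidence(n_words - n_content, n_words),  # cf. feature 14
    }
```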

4.1.2 Language-model Features

Language model features were derived from a large corpus composed of a sample of the Brazilian newspaper Folha de São Paulo containing issues from 12 months taken at random from 1994 to 2005. The corpus contains 96,868 texts and 26,425,483 tokens. SRILM (Stolcke, 2002), a standard language modelling toolkit, was used to produce the language model features.
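To illustrate the kind of scores used as features 50-59 (probability, perplexity and out-of-vocabulary rate), here is a self-contained sketch based on an add-one-smoothed unigram model. It is only an approximation of the setup described above, which uses SRILM with unigram, bigram and trigram models trained on the Folha de São Paulo sample.

```python
import math
from collections import Counter
from typing import Dict, List

class UnigramLM:
    """Add-one-smoothed unigram model; a stand-in for the SRILM models."""

    def __init__(self, training_tokens: List[str]):
        self.counts = Counter(training_tokens)
        self.total = sum(self.counts.values())
        self.vocab = set(self.counts)

    def logprob(self, token: str) -> float:
        # Unseen tokens receive a small smoothed probability instead of zero.
        return math.log((self.counts[token] + 1) /
                        (self.total + len(self.vocab) + 1))

    def features(self, tokens: List[str]) -> Dict[str, float]:
        logp = sum(self.logprob(t) for t in tokens)
        n = max(len(tokens), 1)
        oov = sum(1 for t in tokens if t not in self.vocab)
        return {
            "lm_logprob": logp,                    # cf. features 51/54/57
            "lm_perplexity": math.exp(-logp / n),  # cf. features 52/55/58
            "oov_rate": oov / n,                   # cf. feature 50
        }
```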

4.2 Learning Techniques

Given that the boundaries of the literacy level classes are one of the subjects of our study, we exploit three different types of models in order to check which of them can better distinguish among the three literacy levels. We therefore experiment with three types of machine learning algorithms: a standard classifier, an ordinal (ranking) classifier and a regressor. Each algorithm assumes different relations among the groups: the classifier assumes no relation, the ordinal classifier assumes that the groups are ordered, and the regressor assumes that the groups are continuous.

As classifier we use the Support Vector Machines (SVM) implementation in the Weka4 toolkit (SMO). As ordinal classifier we use a meta-classifier in Weka which takes SMO as the base classification algorithm and performs pairwise classifications (OrdinalClassClassifier). For regression we use the SVM regression implementation in Weka (SMO-reg). We use the linear versions of the algorithms for classification, ordinal classification and regression, and also experiment with a radial basis function (RBF) kernel for regression.
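For readers who want to reproduce the general setup outside Weka, the sketch below mirrors the three modelling choices and the evaluation measures of Section 5.3 using scikit-learn (a swap for Weka's SMO, OrdinalClassClassifier and SMOreg, not the original implementation). The feature matrix X, the integer encoding of the classes (0=original, 1=natural, 2=strong) and the simple cumulative-binary ordinal scheme are assumptions for illustration.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC, SVR

def corr_and_mae(y_true, y_pred):
    """Absolute Pearson correlation and mean absolute error, as in Tables 5-7."""
    corr, _ = pearsonr(y_true, y_pred)
    return {"corr": abs(corr), "mae": float(np.mean(np.abs(y_true - y_pred)))}

def run_models(X: np.ndarray, y: np.ndarray) -> dict:
    results = {}

    # Standard (nominal) classification with a linear SVM.
    clf_pred = cross_val_predict(SVC(kernel="linear"), X, y, cv=10)
    results["classification"] = {
        "f_per_class": f1_score(y, clf_pred, average=None).tolist(),
        **corr_and_mae(y, clf_pred),
    }

    # A simple ordinal scheme: cumulative binary classifiers P(y > 0) and
    # P(y > 1); their sum gives an ordered prediction in {0, 1, 2}. This only
    # approximates Weka's OrdinalClassClassifier meta-scheme.
    gt0 = cross_val_predict(SVC(kernel="linear"), X, (y > 0).astype(int), cv=10)
    gt1 = cross_val_predict(SVC(kernel="linear"), X, (y > 1).astype(int), cv=10)
    ord_pred = gt0 + gt1
    results["ordinal"] = {
        "f_per_class": f1_score(y, ord_pred, average=None).tolist(),
        **corr_and_mae(y, ord_pred),
    }

    # Regression with an RBF kernel over the same integer scale.
    reg_pred = cross_val_predict(SVR(kernel="rbf"), X, y.astype(float), cv=10)
    results["regression"] = corr_and_mae(y, reg_pred)

    return results
```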
5 Experiments

5.1 Corpora

In order to train (and test) the different machine learning algorithms to automatically identify the readability level of texts, we make use of the manually simplified corpora created in the PorSimples project. Seven corpora covering our three literacy levels (advanced, basic and rudimentary) and two different genres were compiled. The first corpus is composed of general news articles from the Brazilian newspaper Zero Hora (ZH original). These articles were manually simplified by a linguist, expert in text simplification, according to the two levels of simplification: natural (ZH natural) and strong (ZH strong). The remaining corpora are composed of popular science articles from different sources: (a) the Caderno Ciência section of the Brazilian newspaper Folha de São Paulo, a mainstream newspaper in Brazil (CC original), and a manually simplified version of this corpus using the natural (CC natural) and strong (CC strong) levels; and (b) advanced level texts from a popular science magazine called Ciência Hoje (CH). Table 3 shows a few statistics about these seven corpora.

Corpus        Doc   Sent   Words   Avg. words per text (std. dev.)   Avg. words per sentence
ZH original   104   2184   46190   444.1 (133.7)                     21.1
ZH natural    104   3234   47296   454.7 (134.2)                     14.6
ZH strong     104   3668   47938   460.9 (137.5)                     13.0
CC original    50    882   20263   405.2 (175.6)                     22.9
CC natural     50    975   19603   392.0 (176.0)                     20.1
CC strong      50   1454   20518   410.3 (169.6)                     14.1
CH            130   3624   95866   737.4 (226.1)                     26.4

Table 3: Corpus statistics

5.2 Feature Analysis

As a simple way to check the contribution of different features to our three literacy levels, we computed the (absolute) Pearson correlation between our features and the expected literacy level for the two sets of corpora that contain versions of the three classes of interest (original, natural and strong). Table 4 lists the most highly correlated features.

2 http://www2.lael.pucsp.br/corpora/bp/index.htm
3 http://www.nilc.icmc.usp.br/tep2/index.htm
4 http://www.cs.waikato.ac.nz/ml/weka/
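A minimal sketch of this feature analysis step follows, assuming the features are arranged in a matrix X (one row per text) and the literacy level is encoded as 0, 1 or 2; it only illustrates the computation behind Table 4, not the authors' actual analysis script.

```python
from typing import List, Tuple
import numpy as np
from scipy.stats import pearsonr

def rank_features(X: np.ndarray, y: np.ndarray, names: List[str]) -> List[Tuple[str, float]]:
    """Return (feature name, |Pearson correlation with y|) pairs, sorted as in Table 4."""
    scores = []
    for j, name in enumerate(names):
        corr, _ = pearsonr(X[:, j], y)
        scores.append((name, abs(corr)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```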


Feature                                      Corr.
1  Words per sentence                        0.693
2  Incidence of apposition                   0.688
3  Incidence of clauses                      0.614
4  Flesch index                              0.580
5  Words before main verb                    0.516
6  Sentences per paragraph                   0.509
7  Incidence of relative clauses             0.417
8  Syllables per word                        0.414
9  Number of positive additive connectives   0.397
10 Number of negative causal connectives     0.388

Table 4: Correlation between features and literacy levels

Among the top features are mostly basic and syntactic features, representing the number of appositive and relative clauses and of clauses in general, and also features from Coh-Metrix-PORT. This shows that traditional cognitively-motivated features can be complemented with more superficial features for readability assessment.

5.3 Predicting Complexity Levels

As previously discussed, the goal is to predict the complexity level of a text as original, naturally simplified or strongly simplified, which correspond to the three INAF literacy levels: advanced, basic and rudimentary, respectively.

Tables 5-7 show the results of our experiments using 10-fold cross-validation and standard classification (Table 5), ordinal classification (Table 6) and regression (Table 7), in terms of F-measure (F), Pearson correlation with the true score (Corr.) and mean absolute error (MAE). Results using our complete feature set (All) and different subsets of it are shown so that we can analyze the performance of each group of features. We also experiment with the Flesch index on its own as a feature.

Features          Class      F       Corr.   MAE
All               original   0.913   0.84    0.276
                  natural    0.483
                  strong     0.732
Language Model    original   0.669   0.25    0.381
                  natural    0.025
                  strong     0.221
Basic             original   0.846   0.76    0.302
                  natural    0.149
                  strong     0.707
Syntactic         original   0.891   0.82    0.285
                  natural    0.32
                  strong     0.74
Coh-Metrix-PORT   original   0.873   0.79    0.290
                  natural    0.381
                  strong     0.712
Flesch            original   0.751   0.52    0.348
                  natural    0.152
                  strong     0.546

Table 5: Standard classification

Features          Class      F       Corr.   MAE
All               original   0.904   0.83    0.163
                  natural    0.484
                  strong     0.731
Language Model    original   0.634   0.49    0.344
                  natural    0.497
                  strong     0.05
Basic             original   0.83    0.73    0.231
                  natural    0.334
                  strong     0.637
Syntactic         original   0.891   0.81    0.180
                  natural    0.382
                  strong     0.714
Coh-Metrix-PORT   original   0.878   0.8     0.183
                  natural    0.432
                  strong     0.709
Flesch            original   0.746   0.56    0.310
                  natural    0.489
                  strong     0

Table 6: Ordinal classification

The results of the standard and ordinal classification are comparable in terms of F-measure and correlation, but the mean absolute error is lower for the ordinal classification. This indicates that ordinal classification is more adequate to handle our classes, similarly to the results found in (Heilman et al., 2008). Results also show that distinguishing between natural and strong simplifications is a harder problem than distinguishing between these and original texts. This was expected, since these two levels of simplification share many features. However, the average performance achieved is considered satisfactory.
Features          Corr.    MAE
All               0.8502   0.3478
Language Model    0.6245   0.5448
Basic             0.7266   0.4538
Syntactic         0.8063   0.3878
Coh-Metrix-PORT   0.8051   0.3895
Flesch            0.5772   0.5492

Table 7: Regression with RBF kernel

Concerning the regression model (Table 7), the RBF kernel reaches the best correlation scores among all models. However, its mean error rates are above the ones found for classification. A linear SVM (not shown here) achieves very poor results across all metrics.

With respect to the different feature sets, we can observe that the combination of all features consistently yields better results according to all metrics across all our models. The performances obtained with the subsets of features vary considerably from model to model, which shows that the combination of features is more robust across different learning techniques. Considering each feature set independently, the syntactic features, followed by Coh-Metrix-PORT, achieve the best correlation scores, while the language model features perform the poorest.

These results show that it is possible to predict with satisfactory accuracy the readability level of texts according to our three classes of interest: original, naturally simplified and strongly simplified texts. Given such results, we embedded the classification model (Table 5) as a tool for readability assessment into our text simplification authoring system. The linear classification is our simplest model; it achieved the highest F-measure, and its correlation scores are comparable to those of the other models.

6 Conclusions

We have experimented with different machine learning algorithms and features in order to verify whether it was possible to automatically distinguish among the three readability levels: original texts aimed at advanced readers, naturally simplified texts aimed at people with a basic literacy level, and strongly simplified texts aimed at people with a rudimentary literacy level. All algorithms achieved satisfactory performance with the combination of all features, and we embedded the simplest model into our authoring tool.

As future work, we plan to investigate the contribution of deeper cognitive features to this problem, more specifically semantic, co-reference and mental model dimension metrics. Having this capacity for readability assessment is useful not only to inform authors preparing simplified material about the complexity of the current material, but also to guide automatic simplification systems to produce simplifications with the adequate level of complexity according to the target user.

The authoring tool, as well as its text simplification and readability assessment systems, can be used not only for improving text accessibility, but also for educational purposes: the author can prepare texts that are adequate according to the level of the reader, and the tool will also allow readers to improve their reading skills.

References

Sandra M. Aluísio, Lucia Specia, Thiago A. S. Pardo, Erick G. Maziero and Renata P. M. Fortes (2008). Towards Brazilian Portuguese Automatic Text Simplification Systems. In Proceedings of the 8th ACM Symposium on Document Engineering, pp. 240-248.

Eckhard Bick (2000). The Parsing System "Palavras": Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. PhD Thesis. University of Århus, Denmark.

Jill Burstein, Martin Chodorow and Claudia Leacock (2003). CriterionSM Online Essay Evaluation: An Application for Automated Evaluation of Student Essays. In Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico.

Arnaldo Candido Jr., Erick Maziero, Caroline Gasperin, Thiago A. S. Pardo, Lucia Specia and Sandra M. Aluisio (2009). Supporting the Adaptation of Texts for Poor Literacy Readers: a Text Simplification Editor for Brazilian Portuguese. In NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications, pages 34-42, Boulder.

Helena de M. Caseli, Tiago de F. Pereira, Lúcia Specia, Thiago A. S. Pardo, Caroline Gasperin and Sandra Maria Aluísio (2009). Building a Brazilian Portuguese Parallel Corpus of Original and Simplified Texts. In Proceedings of CICLing.

Max Coltheart (1981). The MRC psycholinguistic database. In Quarterly Journal of Experimental Psychology, 33A, pages 497-505.
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer and Richard Harshman (1990). Indexing by Latent Semantic Analysis. In Journal of the American Society for Information Science, V. 41, pages 391-407.

Bento C. Dias-da-Silva and Helio R. Moraes (2003). A construção de um thesaurus eletrônico para o português do Brasil. In ALFA - Revista de Lingüística, V. 47, N. 2, pages 101-115.

Bento C. Dias-da-Silva, Ariani Di Felippo and Maria das Graças V. Nunes (2008). The automatic mapping of Princeton WordNet lexical conceptual relations onto the Brazilian Portuguese WordNet database. In Proceedings of the 6th LREC, Marrakech, Morocco.

William H. DuBay (2004). The principles of readability. Costa Mesa, CA: Impact Information. http://www.impact-information.com/impactinfo/readability02.pdf

Christiane Fellbaum (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.

Lijun Feng, Noémie Elhadad and Matt Huenerfauth (2009). Cognitively Motivated Features for Readability Assessment. In Proceedings of EACL 2009, pages 229-237.

Ingo Glöckner, Sven Hartrumpf, Hermann Helbig, Johannes Leveling and Rainer Osswald (2006). An architecture for rating and controlling text readability. In Proceedings of KONVENS 2006, pages 32-35, Konstanz, Germany.

Arthur C. Graesser, Danielle S. McNamara, Max M. Louwerse and Zhiqiang Cai (2004). Coh-Metrix: Analysis of text on cohesion and language. In Behavioral Research Methods, Instruments, and Computers, V. 36, pages 193-202.

Ronald K. Hambleton, H. Swaminathan and H. Jane Rogers (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Press.

Michael Heilman, Kevyn Collins-Thompson, Jamie Callan and Max Eskenazi (2007). Combining lexical and grammatical features to improve readability measures for first and second language texts. In Proceedings of NAACL HLT 2007, pages 460-467.

Michael Heilman, Kevyn Collins-Thompson and Maxine Eskenazi (2008). An Analysis of Statistical Models and Features for Reading Difficulty Prediction. In Proceedings of the 3rd Workshop on Innovative Use of NLP for Building Educational Applications, pages 71-79.

INAF (2009). Instituto P. Montenegro and Ação Educativa. INAF Brasil - Indicador de Alfabetismo Funcional - 2009. Available online at http://www.ibope.com.br/ipm/relatorios/relatorio_inaf_2009.pdf

Teresa B. F. Martins, Claudete M. Ghiraldelo, Maria das Graças V. Nunes and Osvaldo N. de Oliveira Jr. (1996). Readability formulas applied to textbooks in Brazilian Portuguese. ICMC Technical Report, N. 28, 11p.

Aurélien Max (2006). Writing for Language-impaired Readers. In Proceedings of CICLing, pages 567-570.

Danielle McNamara, Max Louwerse and Art Graesser (2002). Coh-Metrix: Automated cohesion and coherence scores to predict text readability and facilitate comprehension. Grant proposal. http://cohmetrix.memphis.edu/cohmetrixpr/publications.html

Eleni Miltsakaki and Audrey Troutt (2007). Read-X: Automatic Evaluation of Reading Difficulty of Web Text. In Proceedings of E-Learn 2007, Quebec, Canada.

Eleni Miltsakaki and Audrey Troutt (2008). Real Time Web Text Classification and Analysis of Reading Difficulty. In Proceedings of the 3rd Workshop on Innovative Use of NLP for Building Educational Applications, Columbus, OH.

Cláudia Oliveira, Maria C. Freitas, Violeta Quental, Cícero N. dos Santos, Renato P. L. and Lucas Souza (2006). A Set of NP-extraction rules for Portuguese: defining and learning. In 7th Workshop on Computational Processing of Written and Spoken Portuguese, Itatiaia, Brazil.

Sarah E. Petersen and Mari Ostendorf (2009). A machine learning approach to reading level assessment. Computer Speech and Language 23, 89-106.

Emily Pitler and Ani Nenkova (2008). Revisiting readability: A unified framework for predicting text quality. In Proceedings of EMNLP 2008.

Adwait Ratnaparkhi (1996). A Maximum Entropy Part-of-Speech Tagger. In Proceedings of the First Empirical Methods in Natural Language Processing Conference, pages 133-142.

Brian Roark, Margaret Mitchell and Kristy Hollingshead (2007). Syntactic complexity measures for detecting mild cognitive impairment. In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, Prague, Czech Republic.

Caroline E. Scarton, Daniel M. Almeida and Sandra M. Aluísio (2009). Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. In Proceedings of STIL-2009, São Carlos, Brazil.

Sarah E. Schwarm and Mari Ostendorf (2005). Reading Level Assessment Using Support Vector Machines and Statistical Language Models. In Proceedings of the 43rd Annual Meeting of the ACL, pp. 523-530.
Kathleen M. Sheehan, Irene Kostin and Yoko Futagi (2007). Reading Level Assessment for Literary and Expository Texts. In D. S. McNamara and J. G. Trafton (Eds.), Proceedings of the 29th Annual Cognitive Science Society, page 1853. Austin, TX: Cognitive Science Society.

Advaith Siddharthan (2003). Syntactic Simplification and Text Cohesion. PhD Thesis. University of Cambridge.

Andreas Stolcke (2002). SRILM – an extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing.
