Summarising News Stories for Children

Iain Macdonald Advaith Siddharthan Computing Science Computing Science University of Aberdeen University of Aberdeen Scotland, U.K. Scotland, U.K. [email protected] [email protected]

Abstract poration’s CBBC Newsround1, a television pro- gramme and website dedicated to providing chil- This paper proposes a system to automat- dren in the age range of 6–12 years with news ically summarise news articles in a manner suitable for them (, 2011). This is suitable for children by deriving and com- primarily motivated by two factors: the impor- bining statistical ratings for how impor- tant, positively oriented and easy to read tance of young people engaging with current af- each sentence is. Our results demonstrate fairs and the potential benefits of automating that this approach succeeds in generating the creation of children’s news articles. summaries that are suitable for children, Multiple studies have highlighted potential and that there is further scope for combin- links between youth civic engagement (defined ing this extractive approach with abstrac- by Adler and Goggin (2005) as active partici- tive methods used in text simplification. pation in the life of a community to improve conditions for others or to help shape the com- 1 Introduction munity’s future) with the use of various forms of news media (see Boyd and Dobrow (2010) for a Automatic text summarisation is a research area good overview). However, while children’s news with half a century of history, with Luhn (1958) sources exist, possibly the best known being the discussing as far back as 1958 the task he called aforementioned Newsround, they are time con- “auto-abstracting of documents”. This field suming to maintain, and as a result very few has evolved considerably with a large number news stories are made available through them. of unsupervised and supervised techniques for For instance, Newsround has only six journalists summarising documents reported in the litera- working to maintain the website (Newsround, ture (see Nenkova and McKeown (2012) for an 2008), who focus more on multimedia content, overview). The vast majority of publications fo- so only around five articles a day are published cus on sentence selection based on notions of for children. While the guidelines used to pro- information content and topicality; such meth- duce Newsround articles are not public, we have ods are referred to collectively as extractive sum- observed that they are shorter than those that marisation. We adapt one such well understood appear on the main news site, use simpler lan- notion of informativeness to incorporate other guage, and also try to stay upbeat, avoiding up- desirable characteristics such as how positive or setting news where possible. Our primary objec- optimistic sentences are and how difficult they tive is to automate the generation of such news are to read, with the goal of generating news stories for children using an extractive approach, summaries that are suitable for children. though the further potential for abstractive ap- We are targeting a similar demographic of children as that of the British Broadcasting Cor- 1http://www.bbc.co.uk/newsround

1

Proceedings of The 9th International Natural Language Generation conference, pages 1–10, Edinburgh, UK, September 5-8 2016. c 2016 Association for Computational Linguistics proaches such as text simplification is also dis- where the denominator denotes the number of cussed. In order to achieve this objective, there words in the sentence. We adapted this metric are four key components described in this paper: in two ways: • A measure of how informative a sentence is. 1. A list Stop of 173 common stop words (Uni- • A measure of how positive or negative a sen- versity of Washington, 2012) was incorporated, tence is. and these were discounted in the calculations. • A measure of how difficult a sentence is to read 2. A peculiarity of news reporting in English and understand. is that the central information is often sum- • A formula for combining the combining the pre- marised within the first two sentences; this is vious measures. sometimes referred to as the inverted pyramid We describe these components and our eval- structure, widely believed to have been devel- uation methodology in §2 and our results in §3 oped in the 19th century (P¨ottker, 2003), and before discussing our contributions with respect the most common structure for print, broadcast to related work in §4 and presenting our conclu- and online news articles in English (Rich, 2015, sions in §5. p. 162). To account for this, we increased the score of the first sentence by a factor of 2 and 2 Method the second by a factor of 1.5. We based our summariser on SumBasic, a con- Our implemented information score is: temporary summariser that has been shown to perform well in evaluations in the news domain IPW Score (Sj) = p (wi) info wi∈Sj input (Nenkova and Vanderwende, 2005) and is easy |{wi|w ∈/Stop}| w ∈S i Xi j to adapt. SumBasic is a greedy algorithm that wi∈/Stop incrementally selects sentences to create a sum- where IPW , the inverted pyramid weight, is 2 mary with a similar distribution of words as the for first sentence, 1.5 for second sentence and 1 input document(s). It begins by estimating the otherwise. probability of seeing each word wi in the in- put as pinput(wi) = n/N, where n is the fre- 2.2 Sentence Difficulty Score quency of wi in the input and N is the total Sentence difficulty is often assessed as some com- number of words in the input. It then assigns bination of lexical and syntactic difficulty. Typi- a score to each sentence Sj which is the aver- cal heuristics such as readability formulae (Dale age probability of all the words in the sentence and Chall, 1948; Kincaid et al., 1975; Gun- Score S p w /length S SumBasic( j) = wi∈Sj ( i) ( j). ning, 1952; Mc Laughlin, 1969) are intended for Sentences are selected in decreasing value of the P scoring entire texts, rather than individual sen- score, and each time a sentence is incorporated tences. Alternately, psycholinguistic data for vo- in the summaries, the probabilities of words con- cabulary such as the Bristol Norms (Stadthagen- tained in the sentence are discounted to reduce Gonzalez and Davis, 2006; Gilhooly and Logie, the chance of selecting redundant sentences. We 1980) exist for age of acquisition, familiarity, extended this algorithm to incorporate senti- etc., but are relatively small (the Bristol Norms ment and ease of language as described below. contain only 3,394 words). 2.1 Information Score To more directly assess linguistic suitability for children, we used a language model derived We based our information metric on the Sum- from historical BBC Newsround stories. Text- Basic metric proposed by Nenkova and Vander- STAT (H¨uning,2002) was used to acquire 1000 wende (2005): Newsround URLs and ICEweb (Weisser, 2013) was used to extract the text from these web 1 page. The probability of every word in the cor- ScoreSumBasic(Sj) = pinput(wi) |{wi|wi ∈ Sj}| w ∈S pus was calculated, resulting in a lexicon of over Xi j

2 12,500 words. Lexical difficulty was then esti- abilities for each class (Pos and Neg), calculated mated in the same manner as importance in the as: section above; i.e. as the average probability p(P os|w1..n) = p(P os) p(wi|P os) of the words in the sentence, but this time ac- i ..n =1Y cording to the Newsround model. We excluded p(Neg|w1..n) = p(Neg) p(wi|Neg) names from the calculation by matching words i=1..n Y against a large collection of names (Ward, 1993): From these, we calculate a sentiment score as:

1 p(P os|w1..n) Score (Sj) = p (wi) lex wi∈Sj newsround Scorenb(Sj) = |{wi| }| p P os|w p Neg|w wi∈/Names wi∈Sj ( 1..n) + ( 1..n) wi∈/XNames

We used a simple sentence length heuristic for Dictionary based approach: In an effort syntactic difficulty, to give a combined difficulty to further overcome vocabulary issues with score: the statistical system, we also incorporated a Scorelex (Sj) dictionary-based approach. We used a senti- Scorediff (Sj) = |{wi|wi ∈ Sj}| ment dictionary with around 2000 positive and 4800 negative words respectively (Liu et al., 2.3 Sentiment score 2005). The classifier simply starting with a sen- We implemented hybrid of a statistical and a timent score of 0.5 and incremented or decre- rule based sentiment analysis component. mented by 0.1 for every word in a sentence found in the positive or negative dictionary respec- Supervised sentiment classifier: The sta- tively. tistical component was implemented as a su- pervised Na¨ıve Bayes classifier with unigram, Scoredict(Sj) = 0.5 + 0.1 − 0.1 bigram and trigram features. We first experi- wi∈Sj wi∈Sj wi∈XDictpos wi∈XDictneg mented with training it on a large corpus of pos- itive and negative movie reviews (Pang and Lee, 2.4 Combining Scores 2004). We were however not satisfied with the In order to combine scores, we first converted quality of classifications for news stories. The each individual score into its standard score key issue was the difference in vocabulary usage (also called z-score); a renormalisation that gives in the two genres; e.g. a word such as “ter- each score a mean of 0 and a standard deviation rifying” features prominently in positive movie of 1 over all sentences in the input. Following reviews, but should no predict positive senti- this step, a score for each sentence was computed ment in a news story. For genre adaptation, a as follows. First, the statistical and dictionary new dataset was created specifically for our pur- based (standardised) sentiment scores were com- pose by taking a pre-existent dataset of 2,225 bined in the ratio three is to one to give a single BBC articles assembled for topic classification sentiment score: (Greene and Cunningham, 2006). These articles were then manually labelled as positive, negative 2 1 Scoresent(Sj) = Scorenb(Sj) + Scoredict(Sj) or neutral based just on the topic of the story, 3 3 and the sentences from the positive and negative The final sentence score was then computed articles were added to the training data from the as a linear function of the scores for informa- movie review dataset. This augmentation was tiveness, difficulty and sentiment: observed to produce better results on new sto- ries, but no formal evaluation was carried out Scorechildren(Sj) = 5 × Scoreinfo(Sj) on this particular aspect. For a sentence with + 5 × Scoresent(Sj) n words, Na¨ıve Bayes returns conditional prob- + 2 × Scorediff (Sj)

3 checked that it was based on an article on the The weightings were set by hand based on main BBC webpage (we did this in order to con- manual experimentation. We found that within duct a further comparison to the manually writ- a single news report, there was limited varia- ten Newsround story [c.f.§3.1]). The first six tion in sentence difficulty; this score could be articles found to meet this criteria were used. assigned a higher weight in a multi-document An additional article was taken from each of summarisation task. The Guardian, The Independent and Sky News, 2.5 Experimental Setup again by sampling at random from a corpus of 1000 articles generated using TextSTAT. Evaluation platform: Amazon’s Mechanical These articles were then split into three sur- Turk service2 was utilised to create a survey to veys each with two BBC articles and one of the compare different summaries of news articles. other three articles. For each article partici- Various studies have been carried out into the pants were presented with the two summaries quality of data provided by Mechanical Turk produced by NSFC and GS, side by side, la- with the general consensus of these seeming to belled ‘A’ and ‘B’, in a randomised order and be that, provided the questions are clear and without any information on how they were pro- that the instructions are intuitive, the data gen- duced. They were provided a link to the original erated from Mechanical Turk is of a high quality news report, but not forced to read it. Examples (Ramsey et al., 2016; Buhrmester et al., 2011; of summaries used in the evaluation are provided Rand, 2012). in Table 1. Participants were then asked to an- Summariser settings: We compared two swer a four comparison questions on a five point summariser settings, the original SumBasic scale [“A is significantly more X”, “A is slightly score for informativeness (SSumBasic), and the more X”, “Not sure, or equally X”, “B is slightly other that combined informativeness with ease more X” and “B is significantly more X”], where of reading and sentiment (Schildren). For both X is the word in bold font in the questions below: settings, we set the required summary length to either one hundred words or half the length Q1 Which of these summaries is more informa- of the original article, whichever was smaller. tive? With respect to how this was implemented in Q2 Which of these summaries is more positive? the iterative summariser described at the top of §2, any sentence that would cause the summary Q3 Which of these summaries is more easy to read to exceed this length was ignored and the next and understand? highest rated sentence was given a chance in its place. Further, to prevent poorly scoring sen- Q4 Overall, which of these summaries do you be- tences being included, a minimum z-score limit lieve is more suitable for a child? was set to -0.25 below which sentences would be Finally, we asked a single non-comparison rejected. For both summarisers, sentences in the question for each summary on a five point scale summary were reordered to correspond to their [“Strongly disagree”, “Disagree”, “Not sure”, original ordering in the news article. “Agree”, “Strongly Agree”]: Evaluation data: We sampled 9 news arti- cles to summarise, six from the BBC and one Q5 I would consider showing summary {A|B} to a each from The Guardian, The Independent and child if I wanted them to know more about this news story. Sky News. For the BBC articles, we generated a corpus of 1000 Newsround stories using Text- Design: We solicited nine participants for STAT (H¨uning, 2002), and iteratively picked each survey, twenty-seven in total, resulting in one using a random number generator, and then each question being answered eight-one times 2http://www.mturk.com/ (twenty-seven participants, three articles each).

4 NSFC GS A blaze that swept through a dogs’ home has The newspaper has also captured aerial footage now claimed the lives of 60 animals, police have showing the extent of the damage caused by the said. More than 150 dogs were rescued from the blaze. In the aftermath of the fire, the manager fire, which broke out at Manchester Dogs’ Home of the home said 60 dogs had been housed in the in Moss Brook Road in Harpurhey on Thursday worst-affected building. Hundreds of messages of evening. Greater Manchester Fire and Rescue sympathy have been left on the JustGiving page, Service (GMFRS) tweeted its thanks to people as the amount of money donated continues to rise. who have donated money, saying: “One hundred A number of people, including police officers and and fifty dogs rescued. Thousands of pounds do- staff were quickly on the scene and put their life nated. Thank you Greater Manchester.” The on the line to help with the rescue effort. The RSPCA described the fire as “heartbreaking”. RSPCA described the fire as “heartbreaking”. The Manchester home was established in 1893 and cares for more than 7,000 dogs every year. Doctors have warned that almost half of all adults The doctors have produced a league table of pos- in Britain will be classified as obese within the sible actions that could be taken to curb the epi- next 20 years. They predict that on current demic. At the top is a 10% tax on high-calorie trends an extra 11 million people will be severely food and drink. “People know obesity is a real overweight by 2030, bringing the total to 26 mil- problem. People don’t know, as individuals, what lion. Only tough government action, including a to do about it.” “Governments do know what to tax on unhealthy food, can slow the trend, they do about it and if they could persuade people, say. At the top is a 10% tax on high-calorie food as they easily could, it would be a popular ac- and drink. “People know obesity is a real prob- tion.” Tam Fry, of the National Obesity Forum, lem. People don’t know, as individuals, what to said: “Children are born thin. It’s what we do to do about it.” children that makes them obese.” A new species of titanosaur unearthed in Ar- A new species of titanosaur unearthed in Ar- gentina is the largest animal ever to walk the gentina is the largest animal ever to walk the Earth, palaeontologists say. Based on its huge Earth, palaeontologists say. By measuring the thigh bones, it was 40m (130ft) long and 20m length and circumference of the largest femur (65ft) tall. A film crew from the BBC Natural (thigh bone), they calculated the animal weighed History Unit was there to capture the moment the 77 tonnes. “Given the size of these bones, which scientists realised exactly how big their discovery surpass any of the previously known giant an- was. This giant herbivore lived in the forests of imals, the new dinosaur is the largest animal Patagonia between 95 and 100 million years ago, known that walked on Earth,” the researchers based on the age of the rocks in which its bones told BBC News. “It will be named describing were found. There have been many previous con- its magnificence and in honour to both the region tenders for the title “world’s biggest dinosaur”. and the farm owners who alerted us about the discovery,” the researchers said.

Table 1: Sample summaries used in the evaluation

3 Results and aggregating counts for ‘slightly’ and ‘signif- icantly’ more. The family significance level was We will refer to the two summarisers being com- set at α = 0.05; with m = 6 null hypotheses pared as NSFC (News Summariser for Chil- (that the two summaries are equal on Q1–4 and dren), which uses Score as the met- children that for Q5 neither summariser would be con- ric and GS (Generic Summariser), which uses sidered suitable for children). We used the Bon- Score . The quantitative data for the SumBasic ferroni Correction (α/m), giving an individual four comparison questions are reported in Table significance threshold of 0.05/6 = 0.00833. 2, with pie charts for each question in Fig. 1. For statistical analysis of significance, we used Informative: News Summariser for Children the Sign Test, by ignoring the ‘Not Sure’ counts outperformed the generic summariser by a sig-

5 (NSFC) produced output deemed suitable for Q1 Q2 Q3 Q4 Info Pos Easy Overall the vast majority of cases (69 out of 74 where GS Significant 7 1 7 6 an opinion was expressed; p < 0.0001). GS Slight 12 12 9 14 Overall, these results were deemed to be Not Sure 6 32 14 4 tremendously positive and indicating that the NSFC Slight 26 20 34 32 News Summariser for Children has the potential NSFC Significant 30 16 17 25 to be an excellent tool in creating news sum- Table 2: Responses to comparison questions maries for children. To gain further insights, we also asked an expert in education to provide some qualitative feedback, as reported below. nificant margin of 56 to 19 (p < 0.0001), with only 14 instances of “Not Sure”. This suggests 3.1 Qualitative Comparison to BBC that the potentially negative effect on informa- Newsround tiveness of incorporating sentiment and reading In order to get qualitative feedback on the ease into the sentence score was more than offset strengths and weaknesses of our summariser by our adaptation of the SumBasic score to in- (NSFC), we selected the summaries of BBC corporate increased weight for the first two sen- news reports from the previous experiment for tences and ignore stop words. which NSFC received the highest and the lowest Positive: While the News Summariser for overall ratings. These were shown to a faculty Children still outperformed the generic sum- member from our University’s School of Educa- mariser by a significant margin of 36 to 13 tion, alongside the text from the corresponding (p = 0.0014), the most common response was BBC Newsround article. The Newsround arti- “Not Sure”(40% of responses). cle and NSFC summary were labelled A or B in each case and no indication was given as to the Easy: News Summariser for Children outper- identity of either. For the NSFC summary rated formed the generic summariser by a significant highest, the qualitative feedback from the ex- margin of 51 to 16 (p < 0.0001), with only 14 pert indicated that the summary created by the instances of “Not Sure”. NSFC (“B” in the following quote) was actually preferable to the article featured on Newsround Overall: News Summariser for Children out- (“A” in the following quote): performed the generic summariser by a signifi- cant margin of 57 to 20 (p < 0.0001), with only “While A provides more information 4 instances of “Not Sure”. about the [e]vent, it does not neces- Non-comparison question: The final ques- sarily make the news clearer or un- tion Q5 simply asked the participant to rate derstandable. Children up to the age whether they would show each summary to a of 10 generally find it difficult to deal child on a Likert scale. This question was neces- with big numbers and with metaphori- sary as the News Summariser for Children could cal words... The level of information [in have radically outperformed the generic sum- B] is kept to a minimum. It is very fac- mariser whilst still not have produced a par- tual and therefore easy to understand ticularly good summary in and of itself. Ta- ble 3 presents the quantitative data for the GS NSFC non-comparison question. While the generic Disagree Strongly 4 1 summariser (GS) produced output deemed suit- Disagree 16 4 able for being shown to children for slightly Not Sure 23 7 Agree 29 44 fewer than half the cases (38 out of 58 where Agree Strongly 9 25 an opinion was expressed; not significant with p = 0.0124), the news summariser for children Table 3: Responses to non-comparison question

6 (a) Q1: How informative? (b) Q2: How positive?

(c) Q3: How easy? (d) Q4: Overall? Figure 1: Responses to comparison questions

and to recall. Generally, the usage morphised usage of non-human sub- of common words, short sentences and jects and presence of metaphorical factual description helps understand- terms.... B is longer and it also in- ing of the news items.” cludes elements of scientific jargon and metaphorical terms. However the sen- For the NSFC summary rated lowest, the quali- tences are describing facts effectively tative feedback from the expert indicated that by means of clear stating of the sub- the summary created by the NSFC (“A” in jects, their actions captured by verbs the following quote) was inferior to the article in the active form and places/time” featured on Newsround (“B” in the following quote): 4 Discussion “A is shorter but ‘denser’ due to While there is considerable work in automatic the use of scientific jargon, anthropo- text summarisation (Nenkova and McKeown,

7 2012), sentiment analysis (Liu and Zhang, 2012) cific observations about vocabulary, highlight- and computational assessment of text readabil- ing words and phrases such as ‘blaze’, ‘flash ity (Collins-Thompson, 2014), as well as related floods’, ‘arson’ and ‘aid agencies’ as examples fields such as text simplification (Siddharthan, that may be difficult for a child to understand, 2014), we are unaware of any work directly tar- and approving of Newsround defining terms like geting the task of summarising news stories for ‘arson’ clearly within the text. The solution children. Perhaps the most closely related work it would appear is to combine the purely ex- is De Belder and Moens (2010), who describe a tractive approach described in this paper with system for simplifying news stories in a manner more abstractive approaches used in research on that is suitable for children, splitting sentences text simplification. This will be explored in fu- up into smaller simpler ones and replacing diffi- ture work. For instance, numerical simplifica- cult words with easier synonyms. Related ideas tion (Power and Williams, 2012; Bautista et al., have also been explored in Information Retrieval 2011), accurate conversion of passive to active research, with Collins-Thompson et al. (2011) voice (Siddharthan, 2010), sentence shortening describing how search results can be reranked by to preferentially remove difficult words (Angrosh readability to make them suitable for different et al., 2014), lexical simplification (De Belder reading skills, and Enikuomehin and Rahman and Moens, 2010; Yatskar et al., 2010), ex- (2015) describing how sentiment analysis could planatory descriptions of named entities (Sid- be incorporated into an IR engine for children. dharthan et al., 2011), simplifying causality and In the real world, news reporting for chil- discourse connectives (Siddharthan, 2003; Sid- dren is done manually at considerable cost. The dharthan and Katsos, 2010) and defining ter- BBC’s CBBC Newsround is a news source with minology (Elhadad, 2006) have all been demon- a long history, with its first episode airing in strated for text simplification systems. 1972 and regular episodes continuing to broad- cast to this day. The primary demographic for 5 Conclusions these summaries is children aged six to twelve years old (Newsround, 2011). Today a website Our goal was to create an automatic news sum- provides manually written news stories for chil- marisation system capable of producing sum- dren. In reality, these stories are often edited maries suitable for children by combining scores versions of an article on the main BBC web- for sentence informativeness, sentiment and dif- page, but considerably shorter, with easier to ficulty. Our evaluation confirmed that our sum- read sentences and by and large an optimistic mariser outperforms a generic summariser fo- outlook. This is the sort of news story we were cused only on informativeness in each of the as- attempting to emulate in this paper. pects of informativeness, positivity and simplic- Our quantitative results suggest that our sum- ity. Additionally, an overwhelming majority of mariser is successful in identifying sentences experimental participants rated the summaries that are informative while still being upbeat and created by this system as being suitable for be- easy to read. However, there are clearly limita- ing shown to children. An expert in the field of tions of our current work. These come through education further confirmed that when the sys- clearly in the qualitative feedback we received tem worked well, the summaries were of a high from the expert, who made references to “big standard and indeed superior to that created by numbers”, “metaphorical words”, “clear stating a professional journalist. The expert also anal- of the subjects”, “verbs in the active form”, ysed reasons for poor performance of the system etc. None of these are captured by our score. on other stories. As discussed in the previous Even if they were, it is doubtful whether al- section, there is potential for overcoming these ternative sentences that are equally informative by combining the extractive methods described can be found in a single document summarisa- here with abstractive methods from research on tion context. The expert also made various spe- automatic text simplification.

8 References and ambiguity measures for 1,944 words. Behavior Research Methods & Instrumentation, 12(4):395– Richard P Adler and Judy Goggin. 2005. What 427. do we mean by “civic engagement”? Journal of Derek Greene and P´adraig Cunningham. 2006. Transformative Education, 3(3):236–253. Practical solutions to the problem of diagonal Mandya Angrosh, Tadashi Nomoto, and Advaith dominance in kernel document clustering. In Proc. Siddharthan. 2014. Lexico-syntactic text simplifi- 23rd International Conference on Machine learn- cation and compression with typed dependencies. ing (ICML’06), pages 377–384. ACM Press. In Proceedings of COLING 2014, the 25th Interna- Robert Gunning. 1952. The technique of clear writ- tional Conference on Computational Linguistics: ing. Technical Papers, pages 1996–2006, Dublin, Ire- Matthias H¨uning. 2002. Textstat. http: land, August. Dublin City University and Associ- //neon.niederlandistik.fu-berlin.de/ ation for Computational Linguistics. static/textstat/TextSTAT-Doku-EN.html#7. Susana Bautista, Raquel Herv´as, Pablo Gerv´as, JP Kincaid, RP Fishburne, RL Rogers, and Richard Power, and Sandra Williams. 2011. How BS Chissom. 1975. Derivation of new readability to make numerical information accessible: Exper- formulas. Technical report, Technical report, TN: imental identification of simplification strategies. Naval Technical Training, US Naval Air Station, Human-Computer Interaction–INTERACT 2011, Memphis, TN. pages 57–64. Bing Liu and Lei Zhang. 2012. A survey of opin- Michelle J Boyd and Julie Dobrow. 2010. Media lit- ion mining and sentiment analysis. In Mining text eracy and positive youth development. Advances data, pages 415–463. Springer. in child development and behavior, 41:251–271. Bing Liu, Minqing Hu, and Junsheng Cheng. 2005. Michael Buhrmester, Tracy Kwang, and Samuel D Opinion observer: analyzing and comparing opin- Gosling. 2011. Amazon’s mechanical turk a ions on the web. In Proceedings of the 14th in- new source of inexpensive, yet high-quality, data? ternational conference on World Wide Web, pages Perspectives on psychological science, 6(1):3–5. 342–351. ACM. Kevyn Collins-Thompson, Paul N Bennett, Ryen W Hans Peter Luhn. 1958. A business intelligence sys- White, Sebastian de la Chica, and David Sontag. tem. IBM Journal of Research and Development, 2011. Personalizing web search results by reading 2(4):314–319. level. In Proceedings of the 20th ACM interna- G Harry Mc Laughlin. 1969. Smog grading- tional conference on Information and knowledge a new readability formula. Journal of reading, management, pages 403–412. ACM. 12(8):639–646. Kevyn Collins-Thompson. 2014. Computational as- Ani Nenkova and Kathleen McKeown. 2012. A sur- sessment of text readability: A survey of current vey of text summarization techniques. In Mining and future research. ITL-International Journal of Text Data, pages 43–76. Springer. Applied Linguistics, 165(2):97–135. Ani Nenkova and Lucy Vanderwende. 2005. The Edgar Dale and Jeanne S Chall. 1948. A formula for of frequency on summarization. Mi- predicting readability: Instructions. Educational crosoft Research, Redmond, Washington, Tech. research bulletin, pages 37–54. Rep. MSR-TR-2005-101. Jan De Belder and Marie-Francine Moens. 2010. CBBC Newsround. 2008. Cbbc newsround - “who Text simplification for children. In Prroceedings we are”. http://news.bbc.co.uk/cbbcnews/hi/ of the SIGIR workshop on accessible search sys- help/default.stm. Accessed: 16th November, tems, pages 19–26. ACM. 2015. Noemie Elhadad. 2006. Comprehending technical CBBC Newsround. 2011. Cbbc newsround - fre- texts: Predicting and defining unfamiliar terms. quently asked questions. http://www.bbc.co. In AMIA Annual Symposium proceedings, volume uk/newsround/13927399. Accessed: 13th April, 2006, page 239. American Medical Informatics As- 2016. sociation. Bo Pang and Lillian Lee. 2004. A sentimental edu- AO Enikuomehin and MA Rahman. 2015. A quan- cation: Sentiment analysis using subjectivity sum- tum based web summarizer for childrenâĂŹs news marization based on minimum cuts. In Proceed- rendering. International Journal of Internet of ings of the ACL. Things, 4(1):6–10. Horst P¨ottker. 2003. News and its communicative Ken J Gilhooly and Robert H Logie. 1980. Age- quality: the inverted pyramidâĂŤwhen and why of-acquisition, imagery, concreteness, familiarity, did it appear? Journalism Studies, 4(4):501–511.

9 Richard Power and Sandra Williams. 2012. Gener- Mark Yatskar, Bo Pang, Cristian Danescu- ating numerical approximations. Computational Niculescu-Mizil, and Lillian Lee. 2010. For the Linguistics, 38(1):113–134. sake of simplicity: Unsupervised extraction of lexi- Sarah R Ramsey, Kristen L Thompson, Melissa cal simplifications from wikipedia. In Human Lan- McKenzie, and Alan Rosenbaum. 2016. Psycho- guage Technologies: The 2010 Annual Conference logical research in the internet age: The quality of of the North American Chapter of the Association web-based data. Computers in Human Behavior, for Computational Linguistics, pages 365–368. As- 58:354–360. sociation for Computational Linguistics. David G Rand. 2012. The promise of mechanical turk: How online labor markets can help theorists run behavioral experiments. Journal of theoretical biology, 299:172–179. Carole Rich. 2015. Writing and reporting news: A coaching method. Cengage Learning, eighth edi- tion. Advaith Siddharthan and Napoleon Katsos. 2010. Reformulating discourse connectives for non- expert readers. In Proceedings of the 11th An- nual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2010), Los Angeles, CA. Advaith Siddharthan, Ani Nenkova, and Kathleen McKeown. 2011. Information status distinctions and referring expressions: An empirical study of references to people in news summaries. Compu- tational Linguistics, 37(4):811–842. Advaith Siddharthan. 2003. Preserving discourse structure when simplifying text. In Proceedings of the European Natural Language Generation Work- shop (ENLG), 11th Conference of the European Chapter of the Association for Computational Lin- guistics (EACL’03), pages 103–110, Budapest, Hungary. Advaith Siddharthan. 2010. Complex lexico- syntactic reformulation of sentences using typed dependency representations. In Proceedings of the 6th International Natural Language Genera- tion Conference, pages 125–133. Association for Computational Linguistics. Advaith Siddharthan. 2014. A survey of research on text simplification. ITL-International Journal of Applied Linguistics, 165(2):259–298. Hans Stadthagen-Gonzalez and Colin J Davis. 2006. The bristol norms for age of acquisition, image- ability, and familiarity. Behavior research meth- ods, 38(4):598–605. University of Washington. 2012. Stopword.list. http://courses.cs.washington.edu/courses/ cse573/12sp/A2/. Grady Ward. 1993. Names.txt. http://www. gutenberg.org/ebooks/3201. Martin Weisser. 2013. Iceweb. http:// martinweisser.org/ling_soft.html#iceweb.

10