Summarising News Stories for Children
Total Page:16
File Type:pdf, Size:1020Kb
Summarising News Stories for Children Iain Macdonald Advaith Siddharthan Computing Science Computing Science University of Aberdeen University of Aberdeen Scotland, U.K. Scotland, U.K. [email protected] [email protected] Abstract poration’s CBBC Newsround1, a television pro- gramme and website dedicated to providing chil- This paper proposes a system to automat- dren in the age range of 6–12 years with news ically summarise news articles in a manner suitable for them (Newsround, 2011). This is suitable for children by deriving and com- primarily motivated by two factors: the impor- bining statistical ratings for how impor- tant, positively oriented and easy to read tance of young people engaging with current af- each sentence is. Our results demonstrate fairs and the potential benefits of automating that this approach succeeds in generating the creation of children’s news articles. summaries that are suitable for children, Multiple studies have highlighted potential and that there is further scope for combin- links between youth civic engagement (defined ing this extractive approach with abstrac- by Adler and Goggin (2005) as active partici- tive methods used in text simplification. pation in the life of a community to improve conditions for others or to help shape the com- 1 Introduction munity’s future) with the use of various forms of news media (see Boyd and Dobrow (2010) for a Automatic text summarisation is a research area good overview). However, while children’s news with half a century of history, with Luhn (1958) sources exist, possibly the best known being the discussing as far back as 1958 the task he called aforementioned Newsround, they are time con- “auto-abstracting of documents”. This field suming to maintain, and as a result very few has evolved considerably with a large number news stories are made available through them. of unsupervised and supervised techniques for For instance, Newsround has only six journalists summarising documents reported in the litera- working to maintain the website (Newsround, ture (see Nenkova and McKeown (2012) for an 2008), who focus more on multimedia content, overview). The vast majority of publications fo- so only around five articles a day are published cus on sentence selection based on notions of for children. While the guidelines used to pro- information content and topicality; such meth- duce Newsround articles are not public, we have ods are referred to collectively as extractive sum- observed that they are shorter than those that marisation. We adapt one such well understood appear on the main news site, use simpler lan- notion of informativeness to incorporate other guage, and also try to stay upbeat, avoiding up- desirable characteristics such as how positive or setting news where possible. Our primary objec- optimistic sentences are and how difficult they tive is to automate the generation of such news are to read, with the goal of generating news stories for children using an extractive approach, summaries that are suitable for children. though the further potential for abstractive ap- We are targeting a similar demographic of children as that of the British Broadcasting Cor- 1http://www.bbc.co.uk/newsround 1 Proceedings of The 9th International Natural Language Generation conference, pages 1–10, Edinburgh, UK, September 5-8 2016. c 2016 Association for Computational Linguistics proaches such as text simplification is also dis- where the denominator denotes the number of cussed. In order to achieve this objective, there words in the sentence. We adapted this metric are four key components described in this paper: in two ways: • A measure of how informative a sentence is. 1. A list Stop of 173 common stop words (Uni- • A measure of how positive or negative a sen- versity of Washington, 2012) was incorporated, tence is. and these were discounted in the calculations. • A measure of how difficult a sentence is to read 2. A peculiarity of news reporting in English and understand. is that the central information is often sum- • A formula for combining the combining the pre- marised within the first two sentences; this is vious measures. sometimes referred to as the inverted pyramid We describe these components and our eval- structure, widely believed to have been devel- uation methodology in §2 and our results in §3 oped in the 19th century (P¨ottker, 2003), and before discussing our contributions with respect the most common structure for print, broadcast to related work in §4 and presenting our conclu- and online news articles in English (Rich, 2015, sions in §5. p. 162). To account for this, we increased the score of the first sentence by a factor of 2 and 2 Method the second by a factor of 1.5. We based our summariser on SumBasic, a con- Our implemented information score is: temporary summariser that has been shown to perform well in evaluations in the news domain IPW Score (Sj) = p (wi) info wi∈Sj input (Nenkova and Vanderwende, 2005) and is easy |{wi|w ∈/Stop}| w ∈S i Xi j to adapt. SumBasic is a greedy algorithm that wi∈/Stop incrementally selects sentences to create a sum- where IPW , the inverted pyramid weight, is 2 mary with a similar distribution of words as the for first sentence, 1.5 for second sentence and 1 input document(s). It begins by estimating the otherwise. probability of seeing each word wi in the in- put as pinput(wi) = n/N, where n is the fre- 2.2 Sentence Difficulty Score quency of wi in the input and N is the total Sentence difficulty is often assessed as some com- number of words in the input. It then assigns bination of lexical and syntactic difficulty. Typi- a score to each sentence Sj which is the aver- cal heuristics such as readability formulae (Dale age probability of all the words in the sentence and Chall, 1948; Kincaid et al., 1975; Gun- Score S p w /length S SumBasic( j) = wi∈Sj ( i) ( j). ning, 1952; Mc Laughlin, 1969) are intended for Sentences are selected in decreasing value of the P scoring entire texts, rather than individual sen- score, and each time a sentence is incorporated tences. Alternately, psycholinguistic data for vo- in the summaries, the probabilities of words con- cabulary such as the Bristol Norms (Stadthagen- tained in the sentence are discounted to reduce Gonzalez and Davis, 2006; Gilhooly and Logie, the chance of selecting redundant sentences. We 1980) exist for age of acquisition, familiarity, extended this algorithm to incorporate senti- etc., but are relatively small (the Bristol Norms ment and ease of language as described below. contain only 3,394 words). 2.1 Information Score To more directly assess linguistic suitability for children, we used a language model derived We based our information metric on the Sum- from historical BBC Newsround stories. Text- Basic metric proposed by Nenkova and Vander- STAT (H¨uning,2002) was used to acquire 1000 wende (2005): Newsround URLs and ICEweb (Weisser, 2013) was used to extract the text from these web 1 page. The probability of every word in the cor- ScoreSumBasic(Sj) = pinput(wi) |{wi|wi ∈ Sj}| w ∈S pus was calculated, resulting in a lexicon of over Xi j 2 12,500 words. Lexical difficulty was then esti- abilities for each class (Pos and Neg), calculated mated in the same manner as importance in the as: section above; i.e. as the average probability p(P os|w1..n) = p(P os) p(wi|P os) of the words in the sentence, but this time ac- i ..n =1Y cording to the Newsround model. We excluded p(Neg|w1..n) = p(Neg) p(wi|Neg) names from the calculation by matching words i=1..n Y against a large collection of names (Ward, 1993): From these, we calculate a sentiment score as: 1 p(P os|w1..n) Score (Sj) = p (wi) lex wi∈Sj newsround Scorenb(Sj) = |{wi| }| p P os|w p Neg|w wi∈/Names wi∈Sj ( 1..n) + ( 1..n) wi∈/XNames We used a simple sentence length heuristic for Dictionary based approach: In an effort syntactic difficulty, to give a combined difficulty to further overcome vocabulary issues with score: the statistical system, we also incorporated a Scorelex (Sj) dictionary-based approach. We used a senti- Scorediff (Sj) = |{wi|wi ∈ Sj}| ment dictionary with around 2000 positive and 4800 negative words respectively (Liu et al., 2.3 Sentiment score 2005). The classifier simply starting with a sen- We implemented hybrid of a statistical and a timent score of 0.5 and incremented or decre- rule based sentiment analysis component. mented by 0.1 for every word in a sentence found in the positive or negative dictionary respec- Supervised sentiment classifier: The sta- tively. tistical component was implemented as a su- pervised Na¨ıve Bayes classifier with unigram, Scoredict(Sj) = 0.5 + 0.1 − 0.1 bigram and trigram features. We first experi- wi∈Sj wi∈Sj wi∈XDictpos wi∈XDictneg mented with training it on a large corpus of pos- itive and negative movie reviews (Pang and Lee, 2.4 Combining Scores 2004). We were however not satisfied with the In order to combine scores, we first converted quality of classifications for news stories. The each individual score into its standard score key issue was the difference in vocabulary usage (also called z-score); a renormalisation that gives in the two genres; e.g. a word such as “ter- each score a mean of 0 and a standard deviation rifying” features prominently in positive movie of 1 over all sentences in the input. Following reviews, but should no predict positive senti- this step, a score for each sentence was computed ment in a news story.