ABC Research & Development Technical Report ABC-TRP-2015-T1
ACE: Automated Content Enhancement
An evaluation of natural language processing approaches to generating metadata from the full text of ABC stories
© 2015 Australian Broadcasting Corporation
Principal researcher Charlie Szasz Report prepared by Viveka Weiley & Charlie Szasz
It seems obvious, but we have to put the audience at the centre of what we do. If we are not delivering distinctive and quality content that finds its way to the people who pay for us, then we are not fulfilling our basic function.
ABC Strategy 2015 – The 2nd pillar – "audience at the centre"
Introduction: The Challenge of Discoverability
Audience at the centre is more than a goal – it's a reality. Audiences already expect to find the stories they are most interested in delivered to them at the time, place, device, platform and format of their choice.

The ABC produces a huge volume of high quality stories on a broad range of topics. However, with the proliferation of alternative media outlets, new platforms for consuming and producing media, and increasingly diverse audience expectations, traditional means of bringing those stories to audiences are no longer sufficient.
Diagrams: "Broadcast at the centre", "Home page at the centre", "Audience at the centre".

Solving for relevance
The task then is to help people find the most relevant stories, however they want them. This is more than just a technical challenge. To respond we must look into what people want, and into our own methods for organising and presenting stories to them.

ACE is focused on the latter question. In this project we are prototyping and demonstrating systems to radically improve discoverability of relevant ABC stories. Our first prototype is available now and has been demonstrated to stakeholders across the ABC.

We looked into the former question with the Spoke (Location, News and Relevance) pilots1. The Spoke engine aggregated stories from across the ABC as well as from third parties, tagged according to location and topic. We then built a mobile app to present those stories to pilot users according to their preferences. Within Spoke we implemented a machine learning system so that it could improve its responses over time, and embedded comprehensive metrics followed up by audience interviews to tell us how well the Spoke content reflected their preferences.

This analysis taught us a lot about what people care about; in particular their strong preference for highly relevant stories based on their locations and topics of interest, and the granularity of that expectation. For example, we discovered that few people are interested in sport as a category; instead they tend to be interested in certain sports and averse to others; they may want to read every article available on a particular team, but find any information on a sport that they don't like to be intensely irritating. Similarly, people may be interested in science but not technology or vice versa, and very interested in local business but not at all in national or global business.

This also gave us insight into how well the ABC's current and future metadata practices can help meet these emerging audience expectations.
Current metadata practices
The curated home page is no longer the sole entry point; as users turn to search, aggregators and emerging platforms, new discovery methods must be accounted for. This situation complicates the task of delivering the greatest value for audiences.
Every ABC story is tagged with metadata, primarily used by authors to control how content appears in current websites. To the extent that it accurately describes stories it can also be used as discovery metadata, to help search engines and aggregators find and present those stories.
There are two main gaps in our metadata. First, the existing fields and ontologies do not afford sufficiently detailed description for discovery and recommendation engines to deliver personalised results. Second, those fields that are available are often left empty by authors and editors.
For personalised content delivery and location and topic-based recommendations to deliver audience value, much more granular and comprehensive metadata will be required.
1 See ABC-WHP-2015-A

Solving the metadata problem
Sufficiently granular and comprehensive metadata could be delivered entirely manually: by training authors and editors in more comprehensive metadata entry, and through policy direction. This would however be a labour-intensive solution, in a context where those people have many competing demands on their time and attention.

Another manual option would be the employment of metadata subeditors. This sufficed for the Spoke pilot, where a single part-time editor was able to add sufficient location metadata to create localised feeds for two regional centres. To scale up to the whole country and to add the task of increasing metadata granularity, however, would not be sustainable.
At the other end of the scale is a fully automated system, using artificially intelligent expert systems to extract meaning from the full text of the articles and apply metadata tags. As the success of algorithm-driven solutions such as Google's PageRank demonstrates, machine learning systems and automated content analysis can be a powerful and scalable means of improving discoverability.

As computation becomes exponentially cheaper and more powerful, a wide variety of useful machine learning and expert systems are emerging. These systems first appear in the world as research projects; then, as each approach matures, its products coalesce into proprietary solutions and finally into open source and commodity platforms.
Better metadata through Natural Language Processing
Natural Language Processing (NLP) techniques in particular have the potential to become an important and useful tool for sorting through large volumes of stories, increasing discoverability.

NLP is a technology based on artificial intelligence. It can take large volumes of text, and summarise and codify it using a deep understanding of the structure of language as well as databases that link words and phrases to their meanings. It allows metadata to be created automatically. In the past this has been too computationally expensive for our uses, but in recent years advances in NLP techniques and the raw speed of computing have changed that reality.

This presents an opportunity, as NLP is now at the point where open source and commodity platforms are becoming available.

In recent years a number of NLP engines have been released as SaaS (Software as a Service) offerings, presenting APIs (Application Programming Interfaces) which can be used to analyse large datasets and provide sample results. This makes it possible to use third-party NLP engines as a core component of a technology stack to provide customisable content analysis services for an organisation.
Applications include increasing discoverability for textual content like news stories, but also anything for which a transcript is available, such as radio interviews and iView content.

Evaluation of automatically extracted metadata
In order to explore the options for NLP engines, ABC R&D first conducted an overview of all offerings available in the open market. We then selected the top three candidates for testing by building a prototype to facilitate analysis.
The ABC ACE prototype
The result is the ABC ACE prototype, a custom-built research apparatus which connects the three NLP systems to a corpus of ABC content aggregated from across the organisation and provides a user interface for exploring the results generated by each NLP engine.

The prototype presents a query interface to afford exploration of the dataset as augmented by the NLP systems. For example, a user can find stories in the category Business_Finance which must contain the entity BHP Billiton with relevance at least 0.7 (where 1 is highest) and should contain the concepts Iron Ore and Mining with relevance at least 0.5.
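The kind of query described above can be sketched in a few lines of Python. This is an illustration only, not the prototype's actual code or schema: the field names, the treatment of "must" as a hard constraint, and the treatment of "should" as at-least-one-concept-matching are all assumptions.

```python
def matches(story, category, must_entities, should_concepts):
    """Return True if a story satisfies a ACE-style query.

    must_entities / should_concepts are lists of (name, min_relevance)
    pairs, with relevance on a 0..1 scale where 1 is highest.
    """
    if category not in story["categories"]:
        return False
    # "must" constraints: every entity has to reach its relevance threshold
    for name, min_rel in must_entities:
        if story["entities"].get(name, 0.0) < min_rel:
            return False
    # "should" constraints: at least one concept reaches its threshold
    return any(story["concepts"].get(name, 0.0) >= min_rel
               for name, min_rel in should_concepts)

# The worked example from the text:
story = {
    "categories": {"Business_Finance"},
    "entities": {"BHP Billiton": 0.8},
    "concepts": {"Iron Ore": 0.6, "Mining": 0.4},
}
query = matches(story, "Business_Finance",
                [("BHP Billiton", 0.7)],
                [("Iron Ore", 0.5), ("Mining", 0.5)])
```

Under this reading, the example story matches: its category is right, BHP Billiton exceeds 0.7, and at least one of the two concepts exceeds 0.5.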
This system is designed to expose as much data and functionality as possible in order to support comparative evaluation of the three systems, and to reveal the breadth of possibilities for NLP. Accordingly it presents a comprehensive range of controls, which may be daunting for the non-expert user.

Future prototypes may explore specific use cases for NLP, intended for specific user populations and with custom-built user interfaces. In the meantime the ACE prototype system is live and available to use on request for further exploration. To gain access or schedule a demo please contact us.
Charlie at the ACE console. Left: code. Centre: ACE prototype. Right: DBPedia entity detail page.
Research Procedure
Using the ACE prototype, over 2,600 ABC stories were analysed with the three chosen NLP services: AlchemyAPI, OpenCalais and TextRazor.
All stories were analysed for:
Named entities, fuzzy and disambiguated. e.g. John Howard – Person (fuzzy); John Howard – Australian politician, http://en.wikipedia.org/wiki/John_Howard (disambiguated).

Concepts. e.g. Prime Minister – http://dbpedia.org/resource/Prime_minister. Concepts don't necessarily match exact words or phrases in the text; they are derived from meaning and linked to entries in various knowledge bases (Wikipedia, DBpedia).

Categories or Topics (Taxonomy). e.g. law, govt and politics / government. Only AlchemyAPI provides hierarchical categories; TextRazor and OpenCalais derive only the top level. The taxonomies extracted by all three services loosely match the ABC's own.

Sentiment. Only AlchemyAPI provides sentiment analysis. Due to limited API calls, only named entities were analysed for sentiment. The overall sentiment of stories was not established.
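The four metadata types above can be normalised into one record per story, whichever engine produced them. A minimal sketch follows; the field names are illustrative and do not reflect any engine's actual response format.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    """One named entity detected in a story."""
    surface: str                        # text as detected, e.g. "John Howard"
    relevance: float                    # 0..1, where 1 is highest
    dbpedia_uri: Optional[str] = None   # set only when disambiguated
    sentiment: Optional[str] = None     # "positive" / "neutral" / "negative"

    @property
    def disambiguated(self) -> bool:
        # A fuzzy entity has no knowledge-base link; a disambiguated one does.
        return self.dbpedia_uri is not None

@dataclass
class StoryMetadata:
    entities: list = field(default_factory=list)
    concepts: dict = field(default_factory=dict)    # concept name -> relevance
    categories: list = field(default_factory=list)  # e.g. taxonomy paths

# The John Howard / Prime Minister example from the text:
meta = StoryMetadata(
    entities=[Entity("John Howard", 0.9,
                     "http://en.wikipedia.org/wiki/John_Howard", "neutral")],
    concepts={"Prime Minister": 0.7},
    categories=["law, govt and politics / government"],
)
```

A record in this shape can hold AlchemyAPI sentiment where available and simply leave the field unset for OpenCalais and TextRazor.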
Engine Comparison – Initial Conclusion
As a result of this analysis we determined that all engines detected and identified a similar number of named entities, concepts and topics (within an order of magnitude). Only AlchemyAPI could provide sentiment.

Of all the available NLP APIs, AlchemyAPI stands out as the most robust and promising choice at this point, largely due to its News API, which connects to a corpus collected from over 75,000 news organisations. The capacity for sentiment analysis could also be a useful feature, particularly with regard to providing metadata for a recommendations system.
Further: as all NLP engines are driven by the same underlying concepts, it is possible to build an application architecture which is independent of which NLP engine provides the underlying results. The ACE prototype demonstrates this principle in practice, allowing users to switch between any of the three engines under review.
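The engine-independent principle can be sketched as a thin adapter layer: each service is wrapped behind one common interface, and the application only ever sees the normalised output. Class and method names here are illustrative, not the prototype's actual code; real adapters would call each vendor's API and translate its response.

```python
class NLPEngine:
    """Common interface that every engine adapter implements."""
    name = "base"

    def analyse(self, text):
        raise NotImplementedError

class AlchemyAdapter(NLPEngine):
    name = "AlchemyAPI"

    def analyse(self, text):
        # A real adapter would call the remote service here and map its
        # response into the common shape used across the application.
        return {"engine": self.name, "entities": [], "concepts": []}

class TextRazorAdapter(NLPEngine):
    name = "TextRazor"

    def analyse(self, text):
        return {"engine": self.name, "entities": [], "concepts": []}

# Registry keyed by name, so the UI can switch engines at will.
ENGINES = {e.name: e for e in (AlchemyAdapter(), TextRazorAdapter())}

def analyse_story(text, engine="AlchemyAPI"):
    """Analyse a story with the selected engine, as the ACE UI allows."""
    return ENGINES[engine].analyse(text)
```

Because downstream code depends only on the common result shape, swapping engines (or adding a fourth) touches nothing but the adapter registry.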
Confirmation study with prototype system users
We identified one potential confounding factor: false positives. Any data analysis procedure, including the natural language processing systems under review, can be considered on a range from more specific to more sensitive. A highly specific test will produce fewer results more accurately, whereas a highly sensitive test will produce more results, but with a higher chance of detecting false positives.
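The specificity/sensitivity trade-off above is the familiar precision/recall trade-off from information retrieval, and can be measured against a human-checked sample. A sketch, with made-up illustrative tag sets:

```python
def precision_recall(detected, correct):
    """Precision: share of detected tags that are right.
    Recall: share of correct tags that were detected."""
    detected, correct = set(detected), set(correct)
    true_pos = len(detected & correct)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(correct) if correct else 0.0
    return precision, recall

# A sensitive engine finds more tags but admits false positives:
detected = {"Indonesia", "Bali Nine", "Sting (musician)"}  # one false positive
correct  = {"Indonesia", "Bali Nine", "Cilacap"}           # one missed entity
p, r = precision_recall(detected, correct)
```

Here two of the three detected tags are right and two of the three correct tags were found, so both precision and recall come out at 2/3; a more specific engine would raise precision at the cost of recall.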
False positives will be missed by entirely automated systems and can pollute results. For example: a story may mention the ABC, and the engine may produce a disambiguated link for it which is wrong – for example ABC Learning Centres when the article means the Australian Broadcasting Corporation, or vice versa. Another example: a story about a police sting operation may be misclassified as Arts and Entertainment, because the word "sting" has been misidentified as the musician. Sometimes false positives are obvious – "Council of the European Union" for a story about a local council. Others are not: for example "A Private Function" will appear as a topic match when the story mentions a private function at a club. Only on further investigation will you see that this match links to a DBpedia entry about the 1984 British comedy film starring Michael Palin and Maggie Smith.
To account for this factor we chose a subset of the analysed stories (150) and recruited test users from the R&D team to check them for false positives. This was a time-consuming process – appropriate for our experimental testing but not for real-world use in editorial workflows. We also determined that while the number of genuine false positives is low, they can have damaging editorial effects: for example, flagging a story on a death as entertainment.
Quality Issues in Detected Metadata
Through these automated and user-driven analyses we identified the following concerns:
Inconsistent named entity disambiguation. e.g. AlchemyAPI disambiguated "ABC" in over 20 different ways in the stories analysed, while over 90% of the instances found refer to the Australian Broadcasting Corporation.
Too many fuzzy named entities and not enough disambiguated ones, which creates noise. Ontologies for entity types are ambiguous and vague. e.g. "driver" was detected as Position (an OpenCalais type), meaning a person's occupation. In the context of a traffic accident story this is misleading.
Errors with potential for editorial harm. The machine learning systems can produce errors that a human editor would find embarrassing: for example, AlchemyAPI categorising a murder story as "Arts and Entertainment".
Metadata Quality Findings
Even accounting for confounding factors, all the engines in this study are within the same order of magnitude in results and accuracy. None of them can be reliably used without editorial oversight, especially for Australian content. Even highly customised, proprietary solutions trained on Australian media (for example Fairfax’s Fizzing Panda) could only achieve 84% accuracy when it comes to disambiguating entities.
Notwithstanding this issue, the right workflow could enable content creators and editors to select quality NLP-generated metadata with minimal effort, enhancing the power of authors and editors to make content richer and more discoverable.
Conclusion
Our initial aim was to explore the usefulness of NLP systems for augmenting metadata at the point of publication. We are now satisfied that those systems are effective at the point of publication, and that their value can extend beyond it. NLP systems can provide tools useful at each stage of the story production process: research, writing and publishing.
We have also demonstrated that by including a human editor in the loop, results can be obtained that are more useful than either a purely manual or an entirely automated approach would deliver. Our recommendation therefore is to ensure that any system using NLP results to augment news stories includes a human in the loop. Future research can therefore most fruitfully focus on the human experience of using the engines to enrich content.
Our early user tests indicate that this person should select the strongest matches for retention, rather than specifically excluding mismatches. We plan to go on to produce a proof of concept prototype to show how such a system could be designed.
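The select-the-strongest workflow can be sketched as presenting candidate tags sorted by relevance and keeping only those the editor approves. This is an illustration of the interaction, not the planned design; field names and values are invented for the example.

```python
def editor_selection(candidates, approved_names):
    """Keep only candidates the editor explicitly selected,
    presented strongest-first."""
    ranked = sorted(candidates, key=lambda c: c["relevance"], reverse=True)
    return [c for c in ranked if c["name"] in approved_names]

# Candidates from an engine, including one plausible false positive:
candidates = [
    {"name": "Indonesia", "relevance": 0.72},
    {"name": "A Private Function", "relevance": 0.40},  # false positive
    {"name": "Bali Nine", "relevance": 0.55},
]
# The editor ticks the strong matches; everything unticked is dropped,
# so false positives never need to be hunted down one by one.
kept = editor_selection(candidates, {"Indonesia", "Bali Nine"})
```

Opt-in selection means an unreviewed tag defaults to absent, which directly limits the editorial harm a false positive can do.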
The experimental process has already uncovered some qualitative results that could provide insight into the opportunities and challenges of deploying NLP engines to augment editorial workflows. For example, we discovered that the engines could be useful in the research phase, by revealing related stories while an author is still drafting, as well as in writing and publication; but that some terminology used in the field is obscure and, if shown, should be translated into more recognisable terms for non-expert users.

Accordingly, future phases will explore practical scenarios for integration of NLP techniques in the news gathering and editorial workflow through interactive prototypes focused on specific use cases.

Demos of the ACE prototype have garnered strong interest from Digital Networks Technology, the LRS project, ABC News, Splash, iView and the WCMS project. We anticipate that any division with a significant content corpus in need of better discovery and analysis tools could benefit from the considered application of NLP techniques.
Opportunities for future development
If we can design an effective system that allows users to select the most useful metadata from an automatically extracted list, then we can substantially improve the quality of metadata in the ABC's systems while simultaneously providing a simpler and more effective editorial workflow.

This would demonstrate the usefulness of the system, but some opportunities for greater value would be missed. These improvements go beyond the scope of an initial proof of concept, but would be fruitful avenues for further research.
1. Identifying consistent errors
By only approving NLP metadata that is correct, we would miss out on metadata that is consistently identified erroneously by the NLP service. If we had the ability to teach the system the correct response, then those matches would become useful.
For example, the “ABC” is consistently misidentified as something other than the Australian Broadcasting Corporation. In our proposed proof of concept those mismatches would be ignored by the editor, and discarded by the system.
Further research could incorporate a “correction” process into the workflow which may include presenting multiple choices of disambiguation and/or creating our own definition.
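Such a correction process can be sketched as an override table applied after the engine responds: editor-supplied corrections for known consistent errors are substituted before results reach the workflow. The table contents and field names below are illustrative, not an existing component.

```python
# Editor-supplied overrides for consistent disambiguation errors:
# (surface form, wrong URI from engine) -> corrected URI.
OVERRIDES = {
    ("ABC", "http://dbpedia.org/resource/ABC_Learning"):
        "http://dbpedia.org/resource/Australian_Broadcasting_Corporation",
}

def apply_corrections(entities, overrides=OVERRIDES):
    """Replace known-wrong disambiguations; pass everything else through."""
    corrected = []
    for e in entities:
        key = (e["surface"], e.get("uri"))
        corrected.append({**e, "uri": overrides.get(key, e.get("uri"))})
    return corrected

# An engine result repeating the consistent "ABC" error:
raw = [{"surface": "ABC", "uri": "http://dbpedia.org/resource/ABC_Learning"}]
fixed = apply_corrections(raw)
```

Each editor correction then fixes every future occurrence of the same error, rather than being discarded story by story; the same table could seed the learning system described below.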
2. Identifying missing entities
Some entities are missed by the NLP engines, or identified but not disambiguated as there is no matching database entry. The ability for ABC users to create new database entries for those missing entities would lead to richer results in future.
3. Building a learning system
By teaching the system to correct consistent errors and enriching its concept and topic database, we would over time be building a system that learned from the collective knowledge of ABC contributors, content authors and editors. This would result in a continual opening up of our content to easier discovery and distribution.
Appendix 1: Natural Language Processing Case Study
Introduction
As part of our ACE (Automated Content Enhancement) project we investigated three commercially available NLP services – AlchemyAPI, OpenCalais and TextRazor – to analyse a subset of ABC content.
These services were used to extract named entities, categorise content into topics, generate concepts/tags and, in the case of AlchemyAPI, identify the associated sentiment of the named entities extracted.
In this document we explore in detail the results generated by AlchemyAPI for a single selected story published by the ABC. We chose this example as it is illustrative of the issues, matches and mismatches common to NLP analysis of Australian news stories.
Terminology
1. Named Entities. Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. – Wikipedia

2. Named Entity Disambiguation. In natural language processing, entity linking, named entity disambiguation (NED), named entity recognition and disambiguation (NERD) or named entity normalization (NEN) is the task of determining the identity of entities mentioned in text. It is distinct from named entity recognition (NER) in that it identifies not the occurrence of names (and a limited classification of those), but their reference. – Wikipedia

3. Sentiment Analysis. Sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader). – Wikipedia
Case study example: Bali Nine
Legend for the annotated story below (numbers refer to entries in the entity table that follows):
• Fuzzy named entities
• Named entities correctly disambiguated
• Named entities incorrectly disambiguated
• Named entities missed or ignored
Bali Nine1 families and diplomats en route to Cilacap2 amid emotional pleas
Myuran Sukumaran's3 sister has issued an emotional plea for his life to be spared, appearing in a YouTube4 video clutching a photograph of her brother as a young boy wearing a school uniform.
"My brother made a mistake 10 years ago and he's paid for this mistake every single day since then," Brintha Sukumaran5 said.
"My brother is now a good man and after 10 years in prison, he has taught so many Indonesian prisoners about art and about how to live outside in the world and have a good and productive life.

"From the bottom of my heart, please President Widodo6 have mercy on my brother ... change punishment for humanity."
Sukumaran7 and his co-charged Andrew Chan8 were sentenced to death in Indonesia9 in 2006, as ringleaders of the Bali Nine1 drug smuggling gang.
Some of their family members are on the way to Cilacap2.
Consular officials from the countries whose citizens face execution have also started arriving in Cilacap2, which is close to the high-security prison island of Nusakambangan10 where all of the death row convicts are now housed.

Australian and Indonesian officials have met and it is understood they discussed final requests from the condemned men and their funeral arrangements.

Foreign Minister11 Julie Bishop12 said Australian officials had been told the execution of the Bali Nine1 pair was imminent.

"Indonesian authorities today advised Australian consular officials that the executions of Andrew Chan8 and Myuran Sukumaran3 will be scheduled imminently at Nusakambangan10 prison in central Java13," she said in a statement on Saturday.
However, she said the Australian Government14 would still seek clemency from Indonesian president Joko Widodo15.
Jakarta16 has said an exact date for the executions could not be decided yet, as a judicial review was still pending for the sole Indonesian in the group of 10 people who face death by firing squad.
Indonesia9's Supreme Court17 said the ruling on that case could be made as early as Monday, paving the way for the executions to proceed.
Filipina on death row given execution notice: lawyer
A Filipina on death row in Indonesia9 has been informed that she will be executed on Tuesday, her lawyer said.
"We were informed by Mary Jane18 herself that she received the notice that the sentence will be implemented on April 28," Veloso's19 lawyer Minnie Lopez20 told news agency AFP21.
Veloso's19 father and mother, her two sons aged six and 12, and sister pushed through a scrum of waiting journalists.
"If anything bad happens to my daughter, I will hold many people accountable. They owe us my daughter's life," Veloso's19 55-year-old mother, Celia22, told a Philippine radio station.
"I hope my appeal reaches President Widodo6."
Lawyers for Veloso19 have also filed another court bid to halt her execution.
Authorities said on Thursday they had ordered prosecutors to start making preparations for the executions.
However convicts must be given 72 hours' notice before executions are carried out, and this notice is yet to be given.
Lawyers for the Australians say the legal process is not complete, with both a constitutional court challenge and judicial commission still in progress, however Indonesia9 says all judicial reviews and appeals for clemency have been exhausted, and that the legal manoeuvres amount to delaying tactics.
The 10 inmates facing execution, including Chan23, Sukumaran7, Veloso19, one each from Brazil24 and France25 and four from Africa26, have all lost appeals for clemency from Mr Widodo27, who has argued that Indonesia9 is fighting a drugs emergency.

Mr Widodo27 has turned a deaf ear to increasingly desperate appeals on the convicts' behalf from their governments, from social media and from others such as the band Napalm Death28; the president is a huge heavy metal fan.

Julian McMahon29 (centre), the lawyer for the Bali Nine1 pair on death row, leaves the Cilacap2 district prosecutor's30 office. (AAP31: Darma Semito32)
Highlights
• Mary-Jane Veloso, one of the subjects of this story, is never identified by her full name, leading to a number of misidentifications.
• The entity database is light on entries regarding Indonesian political figures, sometimes misidentifying them as entertainers with similar names.
• The entity database is better on Australian political figures, but still incomplete.
• The entity database (while very knowledgeable about Australian landmarks) does not recognise many significant Indonesian landmarks.
• The entity database is unaware of topical phrases such as "Bali Nine", which it detected and through text analysis defined as an unknown "organisation".
Entity analysis. For each numbered entity: identification and disambiguation notes, detected relevance (0–1, where 1 is highest) and sentiment.

1. Bali Nine. Identified as "Organization", somewhat incorrect. Relevance 0.55, sentiment negative. Did not disambiguate as: http://dbpedia.org/page/Bali_Nine
2. Cilacap. Identified as "City". Relevance 0.55, sentiment negative. Did not disambiguate as: http://dbpedia.org/page/Cilacap_Regency
3. Myuran Sukumaran. Did not recognise as entity. http://dbpedia.org/page/Myuran_Sukumaran
4. YouTube. Relevance 0.37, sentiment neutral. Correctly disambiguated as: http://dbpedia.org/resource/YouTube
5. Brintha Sukumaran. Incorrectly disambiguated as: http://dbpedia.org/resource/Sukumaran. Relevance 0.75, sentiment negative. No entry exists in DBpedia.
6. President Widodo. Identified as "Person", somewhat incorrect (should be just Widodo). Relevance 0.81, sentiment negative. Did not disambiguate as: http://dbpedia.org/resource/Joko_Widodo. See: 15
7. Sukumaran. See: 3
8. Andrew Chan. Relevance 0.54, sentiment negative. Correctly disambiguated as: http://dbpedia.org/resource/Andrew_Chan
9. Indonesia. Relevance 0.72, sentiment negative. Correctly disambiguated as: http://dbpedia.org/resource/Indonesia
10. Nusakambangan. Identified as "City". Relevance 0.31, sentiment negative. Did not disambiguate as: http://dbpedia.org/resource/Nusa_Kambangan
11. Foreign Minister. Relevance 0.30, sentiment neutral. Identified as "FieldTerminology", quite ambiguous.
12. Julie Bishop. Identified as "Person". Relevance 0.29, sentiment negative. Did not disambiguate as: http://dbpedia.org/page/Julie_Bishop
13. Java. Did not recognise entity. http://dbpedia.org/resource/Java
14. Australian Government. Relevance 0.34, sentiment neutral. Correctly disambiguated as: http://dbpedia.org/resource/Government_of_Australia
15. Joko Widodo. Relevance 0.49, sentiment negative. Correctly disambiguated as: http://dbpedia.org/resource/Joko_Widodo. See: 6
16. Jakarta. Relevance 0.34, sentiment negative. Correctly disambiguated as: http://dbpedia.org/resource/Jakarta
17. Supreme Court. Relevance 0.32, sentiment neutral. Identified as "Organization".
18. Mary Jane. Relevance 0.33, sentiment neutral. Incorrectly disambiguated as: http://dbpedia.org/resource/Mary_Jane_Croft
19. Veloso. Identified as "Person". Relevance 0.83, sentiment negative. No entry exists in DBpedia.
20. Minnie Lopez. Identified as "Person". Relevance 0.26, sentiment neutral. No entry exists in DBpedia.
21. AFP. Incorrectly disambiguated as: http://dbpedia.org/resource/Philippines. Relevance 0.30, sentiment neutral. The correct disambiguation is: http://dbpedia.org/page/Agence_France-Presse
22. Celia. Relevance 0.28, sentiment positive. Identified as "Person".
23. Chan. Did not recognise as entity. Did not identify it to be the same as 8.
24. Brazil. Incorrectly disambiguated as: http://dbpedia.org/resource/Brazilian_military_government. Relevance 0.26, sentiment neutral. Correct disambiguation: http://dbpedia.org/page/Brazil
25. France. Identified as "County". Relevance 0.22, sentiment neutral. Did not disambiguate as: http://dbpedia.org/page/France
26. Africa. Relevance 0.28, sentiment negative. Correctly disambiguated as: http://dbpedia.org/resource/Africa
27. Mr Widodo. Did not recognise as entity. Did not identify it to be the same as 15.
28. Napalm Death. Did not recognise as entity. No entry exists in DBpedia for heavy metal band Napalm Death.
29. Julian McMahon. Relevance 0.33, sentiment negative. Incorrectly disambiguated as: http://dbpedia.org/resource/Julian_McMahon
30. Prosecutor. Relevance 0.28, sentiment negative. Identified as "JobTitle".
31. AAP. Did not recognise entity. Did not disambiguate as: http://dbpedia.org/page/Australian_Associated_Press
32. Darma Semito. Did not recognise entity.
Named entities in order of detected relevance:
1. Veloso – 0.83
2. President Widodo – 0.81
3. Brintha Sukumaran – 0.75
4. Indonesia – 0.72
5. Bali Nine – 0.55
6. Cilacap – 0.55
7. Andrew Chan – 0.54
8. Joko Widodo – 0.49
9. YouTube – 0.37
10. Australian Government – 0.34
11. Jakarta – 0.34
12. Mary Jane – 0.33
13. Julian McMahon – 0.33
14. Supreme Court – 0.32
15. Nusakambangan – 0.31
16. Foreign Minister – 0.30
17. AFP – 0.30
18. Julie Bishop – 0.29
19. Celia – 0.28
20. Africa – 0.28
21. Prosecutor – 0.28
22. Brazil – 0.26
23. Minnie Lopez – 0.26
24. France – 0.22
Appendix 2: System documentation