
An Entity-Focused Approach to Generating Company Descriptions

Gavin Saldanha* (Columbia University), Or Biran** (Columbia University), Kathleen McKeown** (Columbia University), Alfio Gliozzo† (IBM Watson)

*[email protected]  **{orb, kathy}@cs.columbia.edu  †[email protected]

Abstract

Finding quality descriptions on the web, such as those found in Wikipedia articles, of newer companies can be difficult: search engines show many pages with varying relevance, while multi-document summarization algorithms find it difficult to distinguish between core facts and other information such as news stories. In this paper, we propose an entity-focused, hybrid generation approach to automatically produce descriptions of previously unseen companies, and show that it outperforms a strong summarization baseline.

1 Introduction

As new companies form and grow, it is important for potential investors, procurement departments, and business partners to have access to a 360-degree view describing them. The number of companies worldwide is very large and, for the vast majority, not much information is available in sources like Wikipedia. Often, only firmographics data (e.g. industry classification, location, size, and so on) is available. This creates a need for cognitive systems able to aggregate and filter the information available on the web and in news, databases, and other sources. Providing good quality natural language descriptions of companies allows for easier access to the data, for example in the context of virtual agents or with text-to-speech applications.

In this paper, we propose an entity-focused system using a combination of targeted (knowledge base driven) and data-driven generation to create company descriptions in the style of Wikipedia descriptions. The system generates sentences from RDF triples, such as those found in DBPedia and Freebase, about a given company and combines these with sentences on the web that match learned expressions of relationships. We evaluate our hybrid approach and compare it with a targeted-only approach and a data-driven-only approach, as well as a strong multi-document summarization baseline. Our results show that the hybrid approach performs significantly better than either approach alone, as well as the baseline.

The targeted (TD) approach to company description uses Wikipedia descriptions as a model for generation. It learns how to realize RDF relations that have the company as their subject: each relation contains a company/entity pair, and it is these pairs that drive both the content and expression of the company description. For each company/entity pair, the system finds all the ways in which similar company/entity pairs are expressed in other Wikipedia company descriptions, clustering together sentences that express the same company/entity relation pairs. It generates templates for the sentences in each cluster, replacing the mentions of companies and entities with typed slots, and generates a new description by inserting expressions for the given company and entity in the slots. All possible sentences are generated from the templates in the cluster, the resulting sentences are ranked, and the best sentence for each relation is selected to produce the final description. Thus, the TD approach is a top-down approach, driven to generate sentences expressing the relations found in the company's RDF data using realizations that are typically used on Wikipedia.

In contrast, the data-driven (DD) approach uses a semi-supervised method to select sentences from descriptions about the given company on the web. Like the TD approach, it also begins with a seed set of relations present in a few companies' DBPedia entries, represented as company/entity pairs, but instead of looking at the corresponding Wikipedia articles, it learns patterns that are typically used to express the relations on the web. In the process, it uses bootstrapping (Agichtein and Gravano, 2000) to learn new ways of expressing the relations corresponding to each company/entity pair, alternating with learning new pairs that match the learned expression patterns. Since the bootstrapping process is driven only by company/entity pairs and lexical patterns, it has the potential to learn a wider variety of expressions for each pair and to learn new relations that may exist for each pair. Thus, this approach lets the data for company descriptions on the web determine the possible relations and patterns for expressing those relations in a bottom-up fashion. It then uses the learned patterns to select matching sentences from the web about a target company.
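The company/entity pairs that drive both approaches can be read directly off RDF-style triples. A minimal Python sketch (the triples shown are illustrative stand-ins for DBPedia data, not actual DBPedia output):

```python
# Sketch: deriving company/entity seed pairs from RDF-style triples.
# The triples below are illustrative stand-ins, not real DBPedia data.

def seed_pairs(triples, company):
    """Return (company, entity, relation) tuples where the company is the subject."""
    return [(s, o, p) for (s, p, o) in triples if s == company]

triples = [
    ("Microsoft", "founder", "Bill Gates"),
    ("Microsoft", "founder", "Paul Allen"),
    ("Microsoft", "location", "Redmond"),
    ("Apple", "founder", "Steve Jobs"),
]

pairs = seed_pairs(triples, "Microsoft")
# Each pair drives both content selection (what to say) and
# realization (how similar pairs are expressed elsewhere).
```

Each such pair is the unit both systems operate on: the TD approach looks up how similar pairs are realized in Wikipedia, while the DD approach uses them as bootstrapping seeds.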

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 243–248, Berlin, Germany, August 7-12, 2016. © 2016 Association for Computational Linguistics

2 Related Work

The TD approach falls into the generation pipeline paradigm (Reiter and Dale, 1997), with content selection determined by the relations in the company's DBPedia entry, while microplanning and realization are carried out through template generation. While some generation systems, particularly in early years, used sophisticated grammars for realization (Matthiessen and Bateman, 1991; Elhadad, 1991; White, 2014), in recent years template-based generation has shown a resurgence. In some cases, authors focus on document planning, and sentences in the domain are stylized enough that templates suffice (Elhadad and McKeown, 2001; Bouayad-Agha et al., 2011; Gkatzia et al., 2014; Biran and McKeown, 2015). In other cases, learned models that align database records with text snippets and then abstract out specific fields to form templates have proven successful for generation in various domains (Angeli et al., 2010; Kondadadi et al., 2013). Others, like us, target atomic events (e.g., date of birth, occupation) for inclusion in biographies (Filatova and Prager, 2005), but the templates used in that work are manually encoded.

Sentence selection has also been used for question answering and query-focused summarization. Some approaches focus on selection of relevant sentences using probabilistic approaches (Daumé III and Marcu, 2005; Conroy et al., 2006), semi-supervised learning (Wang et al., 2011), and graph-based methods (Erkan and Radev, 2004; Otterbacher et al., 2005). Yet others use a mixture of targeted and data-driven methods for a pure sentence selection system (Blair-Goldensohn et al., 2003; Weischedel et al., 2004; Schiffman et al., 2001). In our approach, we target both relevance and variety of expression, driving content by selecting sentences that match company/entity pairs and inducing multiple patterns of expression. Sentence selection has also been used in prior work on generating whole Wikipedia articles (Sauper and Barzilay, 2009). Their focus is more on learning domain-specific templates that control the topic structure of an overview, a much longer text than we generate.

3 Targeted Generation

The TD system uses a development set of 100 S&P 500 companies along with their Wikipedia articles and DBPedia entries to form templates. For each RDF relation with the company as the subject, it identifies all sentences in the corresponding article containing the entities in the relation. The specific entities are then replaced with their relation to create a template. For example, "Microsoft was founded by Bill Gates and Paul Allen" is converted to "<company> was founded by <founder>," with conjoined entities collapsed into one slot. Many possible templates are created, some of which contain multiple relations (e.g., "<company>, located in <location>, was founded by <founder>"). In this way the system learns how Wikipedia articles express relations between the company and its key entities (founders, headquarters, products, etc.).

At generation time, we fill the template slots with the corresponding information from the RDF entries of the target company. Conjunctions are inserted when slots are filled by multiple entities. Continuing with our example, we might now produce the sentence "Palantir was founded by Peter Thiel, Alex Karp, Joe Lonsdale, Stephen Cohen, and Nathan Gettings" for the target company Palantir. Preliminary results showed that this method alone was not adequate: the data for the target company often lacked some of the entities needed to fill the templates, and without those entities the sentence could not be generated. As Wikipedia sentences tend to have multiple relations each (high information density), many sentences containing important, relevant facts were discarded due to phrases that mentioned lesser facts we did not have the data to replace. We therefore added a post-processing step to remove, if possible, any phrases from the sentence that could not be filled; otherwise, the sentence is discarded.

This process yields many potential sentences for each relation, of which we only want to choose the best. We cluster the newly generated sentences by relation and score each cluster. Sentences are scored according to how much information about the target company they contain (the number of replaced relations). Shorter sentences are also weighted more, as they are less likely to contain extraneous information, and sentences with more post-processing are scored lower. The highest-scoring sentence for each relation type is added to the description, as those sentences are the most informative, relevant, and most likely to be grammatically correct.
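The template extraction and slot-filling steps above can be sketched as follows. This is a simplified illustration that matches entity mentions by plain string replacement; the function names are hypothetical, and the real system works over clusters of sentences with typed slots:

```python
# Sketch of the TD templating step, assuming entity mentions have already
# been located in a Wikipedia sentence. Names here are illustrative.

def make_template(sentence, company, relation_entities):
    """Replace the company mention and entity mentions with typed slots."""
    template = sentence.replace(company, "<company>")
    for relation, entities in relation_entities.items():
        # Collapse a conjoined entity list (e.g. "A and B") into one slot.
        joined = " and ".join(entities)
        template = template.replace(joined, f"<{relation}>")
    return template

def fill_template(template, company, relation_entities):
    """Instantiate a template for a new company; conjoin multiple entities."""
    out = template.replace("<company>", company)
    for relation, entities in relation_entities.items():
        slot = f"<{relation}>"
        if slot not in out:
            continue
        if len(entities) == 1:
            conj = entities[0]
        elif len(entities) == 2:
            conj = f"{entities[0]} and {entities[1]}"
        else:
            conj = ", ".join(entities[:-1]) + ", and " + entities[-1]
        out = out.replace(slot, conj)
    return out

t = make_template(
    "Microsoft was founded by Bill Gates and Paul Allen",
    "Microsoft",
    {"founder": ["Bill Gates", "Paul Allen"]},
)
# t == "<company> was founded by <founder>"
s = fill_template(t, "Palantir", {"founder": ["Peter Thiel", "Alex Karp"]})
# s == "Palantir was founded by Peter Thiel and Alex Karp"
```

A template generated in this way can only be instantiated when the target company's RDF data supplies every slot, which is exactly the failure mode the post-processing step above addresses.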

4 Data-Driven Generation

The DD method produces descriptions using sentences taken from the web. Like the TD approach, it aims to produce sentences realizing relations between the input company and other entities. It uses a bootstrapping approach (Agichtein and Gravano, 2000) to learn patterns for expressing the relations. It starts with a seed set of company/entity pairs, representing a small subset of the desired relations, but unlike previous approaches, can generate additional relations as it goes.

Patterns are generated by reading text from the web and extracting those sentences which contain pairs in the seed set. The pair's entities are replaced with placeholder tags denoting the type of the entity, while the words around them form the pattern (the words between the tags are selected, as well as words to the left and right of the tags). Each pattern thus has the form "<L><T1><M><T2><R>," where L, M, and R are respectively the words to the left of, between, and to the right of the entities; T1 is the type of the first entity, and T2 the type of the second. Like the TD algorithm, this is essentially a template-based approach, but the templates in this case are not aligned to a relation between the entity and the company; only the type of entity (person, location, organization, etc.) is captured by the tag.
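The pattern representation just described can be sketched as a small data structure. The tokenization, entity spans, and window size w below are assumptions made for illustration; the paper does not specify the context window:

```python
from collections import namedtuple

# Sketch of the DD pattern form <L><T1><M><T2><R>. Entity spans and types
# are assumed to come from an upstream NER step; w is a free parameter.

Pattern = namedtuple("Pattern", "left t1 middle t2 right")

def extract_pattern(tokens, span1, type1, span2, type2, w=2):
    """Build a pattern from a tokenized sentence.

    span1/span2 are (start, end) token indices of the two entity mentions,
    with span1 occurring before span2.
    """
    left = tuple(tokens[max(0, span1[0] - w):span1[0]])    # words left of entity 1
    middle = tuple(tokens[span1[1]:span2[0]])              # words between entities
    right = tuple(tokens[span2[1]:span2[1] + w])           # words right of entity 2
    return Pattern(left, type1, middle, type2, right)

tokens = "Microsoft was founded by Bill Gates".split()
p = extract_pattern(tokens, (0, 1), "ORG", (4, 6), "PER")
# p.middle == ("was", "founded", "by"); p.left and p.right are empty here
```

Matching a candidate sentence against such a pattern then reduces to comparing entity types and fuzzy-matching the three word contexts, as described next.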

New entity pairs are generated by matching the learned patterns against web text. A sentence is considered to match a pattern if it has the same entity types in the same order and its L, M, and R words fuzzy-match the corresponding words in the pattern.[1] The entities are therefore assumed to be related, since they are expressed in the same way as the seed pair. Unlike the TD approach, the actual relationship between the entities is unknown (since the only data we use is the web text, not the structured RDF data); all we need to know here is that a relationship exists.

We alternate learning the patterns and generating entity pairs over our development set of 100 companies. We then take all the learned patterns and find matching sentences in the Bing search results for each company in the set of target companies.[2] Sentences that match any of the patterns are selected and ranked by number of matches (more matches means a greater probability of a strong relation) before being added to the description.

[1] We use a threshold on the cosine similarity of the texts to determine whether they match.
[2] We excluded Wikipedia results to better simulate the case of companies which do not have a Wikipedia page.

4.1 Pruning and Ordering

After selecting the sentences for the description, we perform a post-processing step that removes noise and redundancy. To address redundancy, we remove those sentences which were conveyed previously in the description using exactly the same wording. Thus, sentences which are equal to or subsets of other sentences are removed. We also remove sentences that come from news stories; analysis of our results on the development set indicated that news stories rarely contain information that is relevant to a typical Wikipedia description. To do this we use regular expressions to capture common newswire patterns (e.g., [CITY, STATE: sentence]). Finally, we remove incomplete sentences ending in "...", which sometimes appear on websites which themselves contain summaries.

We order the selected sentences using a scoring method that rewards sentences based on how they refer to the company. Sentences that begin with the full company name get a starting score of 25, sentences that begin with a partial company name start with a score of 15, and sentences that do not contain the company name at all start at -15 (if they contain the company name in the middle of the sentence, they start at 0). Then, 10 points are added to the score for each keyword in the sentence (keywords were selected from the most populous DBPedia predicates where the subject is a company). This scoring algorithm was tuned on the development set. The final output is ordered in descending order of scores.

5 Hybrid system

In addition to the two approaches separately, we also generated hybrid output from a combination of the two. In this approach, we start with the DD output; if (after pruning) it has fewer than three sentences, we add the TD output and re-order. The hybrid approach essentially supplements the large, more noisy web content of the DD output with the small, high-quality but less diverse TD output. For companies that are not consumer-facing or are relatively young, and thus have a relatively low web presence (our target population), this can significantly impact the description.

6 Experiments

To evaluate our approach, we compare the three versions of our output (generated by the TD, DD, and hybrid approaches) against multi-document summaries generated (from the same search results used by our DD approach) by TextRank (Mihalcea and Tarau, 2004). For each of the approaches, as well as the baseline, we generated descriptions for all companies that were part of the S&P 500 as of January 2016. We used our development set of 100 companies for tuning, and the evaluation results are based on the remaining 400.

We conducted two types of experiments. The first is an automated evaluation, where we use the METEOR score (Lavie and Agarwal, 2007) between the description generated by one of our approaches or by the baseline and the first section of the Wikipedia article for the company. In Wikipedia articles, the first section typically serves as an introduction or overview of the most important information about the company; METEOR scores capture the content overlap between the generated description and the Wikipedia text. To avoid bias from different text sizes, we set the same size limit for all descriptions when comparing them. We experimented with three settings: 150 words, 500 words, and no size limit.

In addition, we conducted a crowd-sourced evaluation on the CrowdFlower platform. In this evaluation, we presented human annotators with two descriptions for the same company, one produced by our approach and one by the baseline, in random order. The annotators were then asked to choose which of the two descriptions is a better overview of the company in question (they were provided a link to the company's Wikipedia page for reference) and to give each description a score on a 1-5 scale. For quality assurance, each pair of descriptions was processed by three annotators, and we only included in the results instances where all three agreed; those constituted 44% of the instances. In this evaluation we only used the hybrid version, and we limited the length of both the baseline and our output to 150 words to reduce bias from a difference in lengths and to keep the descriptions reasonably short for the annotators.

7 Results

The results of the automated evaluation are shown in Table 1. Our DD system achieves higher METEOR scores than the TextRank baseline under all size variations, while TD by itself is worse in most cases. In all cases the combined approach achieves a better result than the DD system by itself.

           150 words   500 words   no limit
TextRank   13.7        12.8        6.3
DD         15.0        14.5        14.0
TD         11.3        11.3        11.3
Hybrid     15.5        14.6        14.2

Table 1: First experiment results: average METEOR scores for various size limits.

           % best   Avg. score
TextRank   25.79    2.82
Hybrid     74.21    3.81

Table 2: Second experiment results: % of companies for which the approach was chosen as best by the human annotators, and average scores given.

The results of the human evaluation are shown in Table 2. Here the advantage of our approach becomes much more visible: we clearly beat the baseline both in terms of how often the annotators chose our output to be better (almost 75% of the time) and in terms of the average score given to our descriptions (3.81 on a 1-5 point scale).

All results are statistically significant, but the difference in magnitude between the results of the two experiments is striking: we believe that while the TextRank summarizer extracts sentences which are topically relevant, and thus achieves results close to ours in terms of METEOR, the more structured, entity-focused approach we present here is able to extract content that seems much more reasonable to humans as a general description. One example is shown in Figure 1.
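The combination rule from Section 5, which produced the hybrid output evaluated above, can be sketched in a few lines; the ordering helper below is a hypothetical placeholder for the Section 4.1 scorer:

```python
# Sketch of the Section 5 hybrid rule: fall back to the TD output when the
# pruned DD output is too short. order_sentences is a simplified stand-in
# for the company-name/keyword ordering of Section 4.1.

def order_sentences(sentences):
    # Placeholder ordering: longest first; the real system sorts by the
    # Section 4.1 score in descending order.
    return sorted(sentences, key=len, reverse=True)

def hybrid_description(dd_sentences, td_sentences, min_dd=3):
    """Combine pruned DD output with TD output when DD is too sparse."""
    combined = list(dd_sentences)
    if len(combined) < min_dd:
        combined.extend(td_sentences)
    return order_sentences(combined)

out = hybrid_description(["DD fact one."], ["TD fact one.", "TD two."])
# All three sentences survive because the DD output had fewer than three.
```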

Right from the start, we see that our system outperforms TextRank. Our first sentence introduces the company and provides a critical piece of history about it, while TextRank does not even immediately name it. The hybrid generation output has a more structured output, going from the origins of the company via merger, to its board, and finally its products. TextRank's output, in comparison, focuses on the employee experience and only mentions products at the very end. Our system is much more suitable for a short description of the company for someone unfamiliar with it.

TextRank: The company also emphasizes stretch assignments and on-the-job learning for development, while its formal training programs include a Masters in the Business of (or "MBActivision") program that gives employees a deep look at company operations and how its games are made, from idea to development to store shelves. How easy is it to talk with managers and get the information I need? Will managers listen to my input? At , 78 percent of employees say they often or almost always experience a free and transparent exchange of ideas and information within the organization. Gaming is a part of day-to-day life at Activision Blizzard, and the company often organizes internal tournaments for , Hearthstone: Heroes of Warcraft, Destiny, and other titles. What inspires employees' company spirit here Do people stand by their teams' work What impact do people have outside the organization.

Hybrid: Activision Blizzard was formed in 2007 from a merger between Activision and (as well as , which had already been a division of Vivendi Games.) Upon merger, Activision Blizzard's board of directors initially formed of eleven members: six directors designated by Vivendi, two Activision management directors and three independent directors who currently serve on Activision's board of directors. It's comprised of Blizzard Entertainment, best known for blockbuster hits including World of Warcraft, Hearthstone: Heroes of Warcraft, and the Warcraft, StarCraft, and Diablo franchises, and Activision Publishing, whose development studios (including Infinity Ward, , , and , to name just a few) create blockbusters like Call of Duty, Skylanders, , and Destiny.

Figure 1: Descriptions for Activision Blizzard

8 Conclusion

We described two approaches to generating company descriptions, as well as a hybrid approach. We showed that our output is overwhelmingly preferred by human readers, and is more similar to Wikipedia introductions, than the output of a state-of-the-art summarization algorithm.

These complementary methods each have their advantages and disadvantages. The TD approach ensures that typical expressions in Wikipedia company descriptions, known to be about the fundamental relations of a company, will occur in the generated output. However, since it modifies them, it risks generating ungrammatical sentences or sentences which contain information about another company. The latter can occur because the sentence is uniquely tied to the original. For instance, the Wikipedia sentence fragment "Microsoft is the world's largest software maker by revenue" is a useful insight about the company, but our system would not be able to correctly modify it to fit any other company.

In contrast, by selecting sentences from the web about the given company, the DD approach ensures that the resulting description will be both grammatical and relevant. It also results in a wider variety of expressions and a greater number of sentences. However, it can include nonessential facts that appear in a variety of different web venues. It is not surprising, therefore, that the hybrid approach performs better than either by itself.

While in this paper we focus on company descriptions, the system can be adapted to generate descriptions for other entities (e.g. persons, products) by updating the seed datasets for both approaches (to reflect the important facts for the desired descriptions) and retuning for best accuracy.

Acknowledgment

This research was supported in part by an IBM Shared University Research grant provided to the Computer Science Department of Columbia University.

References

Eugene Agichtein and Luis Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM Conference on Digital Libraries, pages 85–94. ACM.

Gabor Angeli, Percy Liang, and Dan Klein. 2010. A simple domain-independent probabilistic approach to generation. In Proceedings of EMNLP '10, pages 502–512, Stroudsburg, PA, USA. Association for Computational Linguistics.

Or Biran and Kathleen McKeown. 2015. Discourse planning with an n-gram model of relations. In Proceedings of EMNLP 2015, pages 1973–1977, Lisbon, Portugal. Association for Computational Linguistics.

Sasha Blair-Goldensohn, Kathleen R. McKeown, and Andrew Hazen Schlaikjer. 2003. DefScriber: a hybrid system for definitional QA. In SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 462–462.

Nadjet Bouayad-Agha, Gerard Casamayor, and Leo Wanner. 2011. Content selection from an ontology-based knowledge base for the generation of football summaries. In Proceedings of the 13th European Workshop on Natural Language Generation (ENLG '11), pages 72–81, Stroudsburg, PA, USA. Association for Computational Linguistics.

John Conroy, Judith Schlesinger, and Dianne O'Leary. 2006. Topic-focused multi-document summarization using an approximate oracle score. In Proceedings of ACL.

Hal Daumé III and Daniel Marcu. 2005. Bayesian multi-document summarization at MSE. In Proceedings of the Workshop on Multilingual Summarization Evaluation (MSE), Ann Arbor, MI.

Noemie Elhadad and Kathleen R. McKeown. 2001. Towards generating patient specific summaries of medical articles. In Proceedings of the NAACL 2001 Workshop on Automatic Summarization.

Michael Elhadad. 1991. FUF: The Universal Unifier User Manual, Version 5.0. Department of Computer Science, Columbia University.

Güneş Erkan and Dragomir R. Radev. 2004. LexRank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research (JAIR).

Elena Filatova and John Prager. 2005. Tell me what you do and I'll tell you what you are: Learning occupation-related activities for biographies. In Proceedings of HLT '05, pages 113–120, Stroudsburg, PA, USA. Association for Computational Linguistics.

Dimitra Gkatzia, Helen F. Hastie, and Oliver Lemon. 2014. Finding middle ground? Multi-objective natural language generation from time-series data. In Proceedings of EACL 2014, pages 210–214, Gothenburg, Sweden. The Association for Computer Linguistics.

Ravi Kondadadi, Blake Howald, and Frank Schilder. 2013. A statistical NLG framework for aggregated planning and realization. In Proceedings of ACL (1), pages 1406–1415. The Association for Computer Linguistics.

Alon Lavie and Abhaya Agarwal. 2007. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation (StatMT '07), pages 228–231. Association for Computational Linguistics.

Christian M.I.M. Matthiessen and John A. Bateman. 1991. Text Generation and Systemic-Functional Linguistics: Experiences from English and Japanese.

Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. In Proceedings of EMNLP 2004, Barcelona, Spain.

Jahna Otterbacher, Gunes Erkan, and Dragomir R. Radev. 2005. Using random walks for question-focused sentence retrieval. In Proceedings of HLT-EMNLP.

Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Natural Language Engineering, 3(1):57–87.

Christina Sauper and Regina Barzilay. 2009. Automatically generating Wikipedia articles: A structure-aware approach. In Proceedings of ACL-IJCNLP 2009, pages 208–216, Stroudsburg, PA, USA. Association for Computational Linguistics.

B. Schiffman, Inderjeet Mani, and K. Concepcion. 2001. Producing biographical summaries: Combining linguistic knowledge with corpus statistics. In Proceedings of ACL-EACL 2001, Toulouse, France.

William Yang Wang, Kapil Thadani, and Kathleen McKeown. 2011. Identifying event descriptions using co-training with online news summaries. In Proceedings of IJCNLP, Chiang Mai, Thailand.

Ralph M. Weischedel, Jinxi Xu, and Ana Licuanan. 2004. A hybrid approach to answering biographical questions. In Mark T. Maybury, editor, New Directions in Question Answering, pages 59–70. AAAI Press.

Michael White. 2014. Towards surface realization with CCGs induced from dependencies. In Proceedings of the 8th International Natural Language Generation Conference (INLG), pages 147–151, Philadelphia, Pennsylvania, USA.
