An Entity-Focused Approach to Generating Company Descriptions

Gavin Saldanha*, Or Biran**, Kathleen McKeown** (Columbia University); Alfio Gliozzo† (IBM Watson)
** {orb, kathy}@cs.columbia.edu
Abstract
Finding quality descriptions of newer companies on the web, such as those found in Wikipedia articles, can be difficult: search engines show many pages with varying relevance, while multi-document summarization algorithms find it difficult to distinguish between core facts and other information such as news stories. In this paper, we propose an entity-focused, hybrid generation approach to automatically produce descriptions of previously unseen companies, and show that it outperforms a strong summarization baseline.

1 Introduction
As new companies form and grow, it is important for potential investors, procurement departments, and business partners to have access to a

these with sentences on the web that match learned expressions of relationships. We evaluate our hybrid approach and compare it with a targeted-only approach and a data-driven-only approach, as well as with a strong multi-document summarization baseline. Our results show that the hybrid approach performs significantly better than either approach alone, and also better than the baseline.

The targeted (TD) approach to company description uses Wikipedia descriptions as a model for generation. It learns how to realize RDF relations that have the company as their subject: each relation contains a company/entity pair, and these pairs drive both the content and the expression of the company description. For each company/entity pair, the system finds all the ways in which similar company/entity pairs are expressed in other Wikipedia company descriptions, clustering together sentences that express the same company/entity relation pairs.
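To make the clustering step of the targeted approach concrete, here is a minimal sketch under simplifying assumptions. It is not the authors' implementation. It assumes that a sentence from a company's Wikipedia description expresses a triple whenever it mentions both the company name and the entity name, which is a plain co-occurrence heuristic. The function name `cluster_expressions`, the relation labels (`foundedBy`, `headquarters`), the slot markers `<COMPANY>`/`<ENTITY>`, and the example data are all hypothetical. Matching sentences are abstracted into templates by replacing the names with slots, then grouped by relation, so that one cluster collects the different ways a relation is phrased.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical RDF-style triple: (company, relation, entity).
Triple = Tuple[str, str, str]


def cluster_expressions(triples: List[Triple],
                        descriptions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group sentence templates by the relation they are assumed to express.

    Assumption (co-occurrence heuristic): a sentence in a company's
    description expresses a triple if it mentions both the company
    and the entity. Names are replaced by slots so the resulting
    template could later be reused for an unseen company.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for company, relation, entity in triples:
        for sentence in descriptions.get(company, []):
            if company in sentence and entity in sentence:
                template = (sentence.replace(company, "<COMPANY>")
                                    .replace(entity, "<ENTITY>"))
                # Keep each distinct phrasing only once per relation.
                if template not in clusters[relation]:
                    clusters[relation].append(template)
    return dict(clusters)


# Hypothetical example data.
triples = [
    ("Acme Corp", "foundedBy", "Jane Doe"),
    ("Globex", "foundedBy", "Hank Scorpio"),
    ("Acme Corp", "headquarters", "Springfield"),
]
descriptions = {
    "Acme Corp": ["Acme Corp was founded by Jane Doe in 1990.",
                  "Acme Corp is based in Springfield."],
    "Globex": ["Globex was founded by Hank Scorpio in 1990."],
}

result = cluster_expressions(triples, descriptions)
for relation, templates in result.items():
    print(relation, templates)
```

In this toy data, the two founding sentences collapse into a single `foundedBy` template because they differ only in the slotted names. A realistic system would need more robust entity matching (aliases, pronouns, partial names) than simple substring tests.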