Answering Table Queries on the Web using Column Keywords

Rakesh Pimplikar∗, IBM Research, New Delhi, India
Sunita Sarawagi, IIT Bombay, Mumbai, India
ABSTRACT

We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, ..., Tn, and a query Q with q sets of keywords Q1, ..., Qq, decide for each Ti if it is relevant to Q and if so, identify the mapping between the

like “Pain killer | Side effects” to retrieve instances of relationship between two entities; three column keyword queries like “Cheese name | Country of origin | Milk source” to find entities along with values of two attributes. We tap the large number of organically created tables on the Web to answer such queries. In a recent 500 million pages Web crawl, we conservatively estimated that over 25 million tables express structured information. Similar statistics have been reported elsewhere on other web crawls [4, 3, 7]. Each table contributes valuable facts about entities, their types, and relationships between them, and does so in a manner that is considerably less diverse and less noisy, compared to how facts are expressed in free-format text on the general Web.
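To make the shape of the task concrete, the following is a minimal sketch of its input and output, not the method developed in this paper. It assumes a query is written as column keyword sets separated by "|", and it maps each query column to a table column by plain header word overlap. The names WebTable, parse_query, and naive_column_mapping are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class WebTable:
    """Toy stand-in for a harvested Web table (hypothetical structure)."""
    headers: List[str]
    rows: List[List[str]]


def parse_query(query: str) -> List[Set[str]]:
    """Split a '|'-separated column keyword query into keyword sets Q1..Qq."""
    return [set(part.lower().split()) for part in query.split("|")]


def naive_column_mapping(
    table: WebTable, query_columns: List[Set[str]]
) -> List[Optional[int]]:
    """Map each query column to the table column whose header shares the most words.

    Illustrative baseline only (an assumption, not the paper's approach):
    a query column with no overlapping header maps to None, and nothing
    prevents two query columns from choosing the same table column.
    """
    header_words = [set(h.lower().split()) for h in table.headers]
    mapping: List[Optional[int]] = []
    for keywords in query_columns:
        overlaps = [len(keywords & words) for words in header_words]
        # Index of the header with the largest overlap; ties go to the first.
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=None)
        if best is not None and overlaps[best] > 0:
            mapping.append(best)
        else:
            mapping.append(None)
    return mapping


def is_relevant(mapping: List[Optional[int]]) -> bool:
    """Toy relevance rule: every query column found some table column."""
    return all(index is not None for index in mapping)


# Toy data, made up for this example.
cheese_table = WebTable(
    headers=["Cheese", "Country of origin", "Source of milk"],
    rows=[["Cheddar", "England", "Cow"]],
)
query = parse_query("Cheese name | Country of origin | Milk source")
cheese_mapping = naive_column_mapping(cheese_table, query)
```

A header-only match such as this one fails on tables with missing or uninformative headers, which is one face of the diversity challenge noted in the abstract.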