Squid: Semantic Similarity-Aware Query Intent Discovery
Total Page:16
File Type:pdf, Size:1020Kb
SQuID: Semantic Similarity-Aware Query Intent Discovery Anna Fariha Sheikh Muhammad Sarwar Alexandra Meliou [email protected] [email protected] [email protected] University of Massachusetts Amherst University of Massachusetts Amherst University of Massachusetts Amherst Traditional data retrieval is challenging Semantic context can be complex q Non expert users struggle to formulate How to express funny? complex SQL query q Implicit context q Humans can tell easily! complex schema no SQL expertise ü SQuID captures implicit semantic context Semantic context: funny actors q No explicit attribute, but hidden in the data Query by Example (QBE) tries but fails q Appearing in more than 40 comedy movies But I wanted SQuID in a nutshell Marvel superhero movies only! q Real-time performance through precomputing basic and derived semantic properties Example Tuples Iron Man Basic Derived Spider Man QBE Semantic Property Semantic Property The Incredible Hulk birth genre list of all year person movie movies language gender person q Existing QBE approaches fail to capture semantic context (e.g., Marvel movies) height genre age person SQuID: a query intent discovery framework language ü SQuID is aware of semantic similarity q Filters encode semantic properties and Result tuples constitute selection predicates Ant Man SELECT name The Avengers FROM person, person_to_genre pg, genre Example tuples Iron Man Iron Man WHERE person.id = pg.person_id AND The Incredible Hulk Spider Man pg.genre_id genre.id AND Thor: The Dark World The Incredible Hulk Doctor Strange genre.name = “Comedy” AND pg.count >= 40 Captain America Spider Man Architecture Guardians of the Galaxy q Offline: Precomputes derived semantic properties and related statistics for real-time performance Not all semantic similarities are intended q Online: Discovers semantic context, captures most likely query intent, and constructs SQL query What are the intended semantic similarities? Example Example tuples Offline Online intent ✘ All are Male precomputation discovery Adam Sandler • but, so are 2 million other people Query Eddie Murphy ✘ All are Hollywood actors Jim Carrey • but, so are 1.5 million other people derived ü All are Funny Actors relations • very few actors are funny Meta-info Semantic schema DB property information statistics ü SQuID rejects irrelevant semantic context.