Efficient and Effective Query Auto-Completion
Giulio Ermanno Pibiri
Simon Gog Rossano Venturini
ACM Conference on Research and Development in Information Retrieval (SIGIR), 2020 27/07/2020 Query Auto-Completion
Given a collection S of scored strings and a partially completed user query Q, find the top-k strings that “match” Q in S. Setting
We focus on matching algorithms, not ranking mechanisms: we return the “most popular” results from a query log.
Many matching algorithms are possible, such as: exact, prefix, pattern (substring), edit-distance… Setting
We focus on matching algorithms, not ranking mechanisms: we return the “most popular” results from a query log.
Many matching algorithms are possible, such as: exact, prefix, pattern (substring), edit-distance…
prefix conjunctive Setting
We focus on matching algorithms, not ranking mechanisms: we return the “most popular” results from a query log.
Many matching algorithms are possible, such as: exact, prefix, pattern (substring), edit-distance…
prefix conjunctive Conjunctive-Search
Return strings containing all the tokens in the prefix and any token prefixed by the suffix.
Build an inverted index where docids are assigned in decreasing score order: smaller docids are better. Conjunctive-Search
Return strings containing all the tokens in the prefix and any token prefixed by the suffix.
Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.
bmw s| Conjunctive-Search
Return strings containing all the tokens in the prefix and any token prefixed by the suffix.
Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.
bmw s| bmw i3 sedan Conjunctive-Search
Return strings containing all the tokens in the prefix and any token prefixed by the suffix.
Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.
bmw s| bmw i3 sedan bmw i3 sportback Conjunctive-Search
Return strings containing all the tokens in the prefix and any token prefixed by the suffix.
Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.
bmw s| bmw i3 sedan bmw i3 sportback bmw i3 sport Conjunctive-Search
Returnbmw strings[7,9] containing all the tokens in the prefix and any token prefixed by the suffix.
Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.
bmw s| bmw i3 sedan bmw i3 sportback bmw i3 sport Conjunctive-Search
Returnbmw strings[7,9] containing all the tokens in the prefix and any token prefixed by the suffix.
Build an inverted index where docids are assigned in decreasing score order: smaller docids are better. Heap-based approach: (1) Much better than explicitly computing the union. (2) Terms involved in union may be too many! bmw s| bmw i3 sedan bmw i3 sportback bmw i3 sport Conjunctive-Search
Forward search Conjunctive-Search
Forward bmw 2| search
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search bmw i 3 2015
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search bmw i 3 2015
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search bmw i 3 2015 bmw i 3 2017
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search bmw i 3 2015 bmw i 3 2017
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016
————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016 bmw i 8 2015 ————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search
Forward-Search approach: (1) No heap management. (2) Need “direct” access to completions: Fwd or FC.
Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016 bmw i 8 2015 ————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Experiments
Machine equipped with Intel i9-9900K cores (@3.60 GHz), 64 GB of RAM, and running Linux 5 (64 bits).
C++ code available at https://github.com/jermp/autocomplete
Datasets Experiments — Efficiency
Top-10 conjunctive-search query timings in μsec per query, by varying query length and percentage of the last query token. Experiments — Effectiveness
Percentage of better scored results returned by conjunctive-search with respect to those returned by prefix-search for top-10 queries. Experiments — Space
Space usage in total MiB and bytes per completion (bpc). Experiments — Space
Space usage in total MiB and bytes per completion (bpc). Experiments — Space
Space usage in total MiB and bytes per completion (bpc). Take-away Messages
• Conjunctive-search overcomes the limited effectiveness of prefix-search by returning more and better scored results.
• While prefix-search is very fast (less then 3 μsec per query on average), conjunctive-search is more expensive and costs between 4 and 500 μsec per query depending on the size of the query.
• Our optimized implementation of conjunctive-search substantially outperforms the use of a classical as well as blocked inverted index with small extra, or even less, space. Thanks for your attention! Prefix-Search Prefix-Search
bmw i3 s Prefix-Search
1
bmw i3 s
1 Prefix-Search
1
2
bmw i3 s
2
1 Prefix-Search
1 Docids are assigned in decreasing score order: top-k algorithm 2 reduces to RMQ. 3
bmw i3 s
2
3 1 Prefix-Search
1 Docids are assigned in decreasing score order: top-k algorithm 2 reduces to RMQ. 3
bmw i3 s bmw i3 sedan 2
3 1 Prefix-Search
1 Docids are assigned in decreasing score order: top-k algorithm 2 reduces to RMQ. 3
bmw i3 s bmw i3 sedan 2 bmw i3 sportback
3 1 Prefix-Search
1 Docids are assigned in decreasing score order: top-k algorithm 2 reduces to RMQ. 3
bmw i3 s bmw i3 sedan 2 bmw i3 sportback bmw i3 sport 3 1