Efficient and Effective Query Auto-Completion

Giulio Ermanno Pibiri

Simon Gog Rossano Venturini

ACM Conference on Research and Development in Information Retrieval (SIGIR), 2020 27/07/2020 Query Auto-Completion

Given a collection S of scored strings and a partially completed user query Q, find the top-k strings that “match” Q in S. Setting

We focus on matching algorithms, not ranking mechanisms: we return the “most popular” results from a query log.

Many matching algorithms are possible, such as: exact, prefix, pattern (substring), edit-distance… Setting

We focus on matching algorithms, not ranking mechanisms: we return the “most popular” results from a query log.

Many matching algorithms are possible, such as: exact, prefix, pattern (substring), edit-distance…

prefix conjunctive Setting

We focus on matching algorithms, not ranking mechanisms: we return the “most popular” results from a query log.

Many matching algorithms are possible, such as: exact, prefix, pattern (substring), edit-distance…

prefix conjunctive Conjunctive-Search

Return strings containing all the tokens in the prefix and any token prefixed by the suffix.

Build an inverted index where docids are assigned in decreasing score order: smaller docids are better. Conjunctive-Search

Return strings containing all the tokens in the prefix and any token prefixed by the suffix.

Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.

s| Conjunctive-Search

Return strings containing all the tokens in the prefix and any token prefixed by the suffix.

Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.

bmw s| Conjunctive-Search

Return strings containing all the tokens in the prefix and any token prefixed by the suffix.

Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.

bmw s| bmw i3 sedan bmw i3 sportback Conjunctive-Search

Return strings containing all the tokens in the prefix and any token prefixed by the suffix.

Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.

bmw s| bmw i3 sedan bmw i3 sportback bmw i3 sport Conjunctive-Search

Returnbmw strings[7,9] containing all the tokens in the prefix and any token prefixed by the suffix.

Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.

bmw s| bmw i3 sedan bmw i3 sportback bmw i3 sport Conjunctive-Search

Returnbmw strings[7,9] containing all the tokens in the prefix and any token prefixed by the suffix.

Build an inverted index where docids are assigned in decreasing score order: smaller docids are better. Heap-based approach: (1) Much better than explicitly computing the union. (2) Terms involved in union may be too many! bmw s| bmw i3 sedan bmw i3 sportback bmw i3 sport Conjunctive-Search

Forward search Conjunctive-Search

Forward bmw 2| search

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search bmw i 3 2015

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search bmw i 3 2015

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search bmw i 3 2015 bmw i 3 2017

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search bmw i 3 2015 bmw i 3 2017

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016

————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016 bmw i 8 2015 ————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Conjunctive-Search

Forward-Search approach: (1) No heap management. (2) Need “direct” access to completions: Fwd or FC.

Forward bmw 2| search bmw i 3 2015 bmw i 3 2017 bmw i 3 2016 bmw i 8 2015 ————————————————————————————————— terms inverted lists termids ————————————————————————————————— —————————————————————————————————————————— 1 4 0 docids completions sets 2015 0, 5, 6 1 —————————————————————————————————————————— 2016 3 2 5 audi a 3 2015 7 , 6 , 4 , 1 2017 1, 2 3 2 audi q 8 2017 7 , 10 , 5 , 3 3 0, 1, 3, 5 4 4 bmw x 1 8 , 11 , 0 8 2, 6 5 0 bmw i 3 2015 8 , 9 , 4 , 1 a 5 6 3 bmw i 3 2016 8 , 9 , 4 , 2 audi 2, 5 7 1 bmw i 3 2017 8 , 9 , 4 , 3 bmw 0, 1, 3, 4, 6 8 6 bmw i 8 2015 8 , 9 , 5 , 1 i 0, 1, 3, 6 9 q 2 10 x 4 11 Experiments

Machine equipped with Intel i9-9900K cores (@3.60 GHz), 64 GB of RAM, and running Linux 5 (64 bits).

C++ code available at https://github.com/jermp/autocomplete

Datasets Experiments — Efficiency

Top-10 conjunctive-search query timings in μsec per query, by varying query length and percentage of the last query token. Experiments — Effectiveness

Percentage of better scored results returned by conjunctive-search with respect to those returned by prefix-search for top-10 queries. Experiments — Space

Space usage in total MiB and bytes per completion (bpc). Experiments — Space

Space usage in total MiB and bytes per completion (bpc). Experiments — Space

Space usage in total MiB and bytes per completion (bpc). Take-away Messages

• Conjunctive-search overcomes the limited effectiveness of prefix-search by returning more and better scored results.

• While prefix-search is very fast (less then 3 μsec per query on average), conjunctive-search is more expensive and costs between 4 and 500 μsec per query depending on the size of the query.

• Our optimized implementation of conjunctive-search substantially outperforms the use of a classical as well as blocked inverted index with small extra, or even less, space. Thanks for your attention! Prefix-Search Prefix-Search

bmw i3 s Prefix-Search

1

bmw i3 s

1 Prefix-Search

1

2

bmw i3 s

2

1 Prefix-Search

1 Docids are assigned in decreasing score order: top-k algorithm 2 reduces to RMQ. 3

bmw i3 s

2

3 1 Prefix-Search

1 Docids are assigned in decreasing score order: top-k algorithm 2 reduces to RMQ. 3

bmw i3 s bmw i3 sedan 2

3 1 Prefix-Search

1 Docids are assigned in decreasing score order: top-k algorithm 2 reduces to RMQ. 3

bmw i3 s bmw i3 sedan 2 bmw i3 sportback

3 1 Prefix-Search

1 Docids are assigned in decreasing score order: top-k algorithm 2 reduces to RMQ. 3

bmw i3 s bmw i3 sedan 2 bmw i3 sportback bmw i3 sport 3 1