Extreme Multi-label Learning for Semantic Matching in Product Search

Wei-Cheng Chang, Daniel Jiang, Hsiang-Fu Yu, Choon-Hui Teo, Jiong Zhang, Kai Zhong, Kedarnath Kolluri, Qie Hu, Nikhil Shandilya, Vyacheslav Ievgrafov, Japinder Singh, Inderjit S. Dhillon
Amazon, USA

ABSTRACT
We consider the problem of semantic matching in product search: given a customer query, retrieve all semantically related products from a huge catalog of size 100 million or more. Because of large catalog spaces and real-time latency constraints, semantic matching algorithms not only desire high recall but also need to have low latency. Conventional lexical matching approaches (e.g., Okapi-BM25) exploit inverted indices to achieve fast inference time, but fail to capture behavioral signals between queries and products. In contrast, embedding-based models learn semantic representations from customer behavior data, but their performance is often limited by shallow neural encoders due to latency constraints. Semantic product search can be viewed as an eXtreme Multi-label Classification (XMC) problem, where customer queries are input instances and products are output labels. In this paper, we aim to improve semantic product search by using tree-based XMC models whose inference time complexity is logarithmic in the number of products. We consider hierarchical linear models with n-gram features for fast real-time inference. Quantitatively, our method maintains a low latency of 1.25 milliseconds per query and achieves a 65% improvement of Recall@100 (60.9% vs. 36.8%) over a competing embedding-based DSSM model. Our model is robust to weight pruning with varying thresholds, which can flexibly meet different system requirements for online deployments. Qualitatively, our method can retrieve products that are complementary to the existing product search system and add diversity to the match set.

CCS CONCEPTS
• Computing methodologies → Machine learning; Natural language processing; • Information systems → Information retrieval.

KEYWORDS
Product Search; Semantic Matching; Extreme Multi-label Learning

ACM Reference Format:
Wei-Cheng Chang, Daniel Jiang, Hsiang-Fu Yu, Choon-Hui Teo, Jiong Zhang, Kai Zhong, Kedarnath Kolluri, Qie Hu, Nikhil Shandilya, Vyacheslav Ievgrafov, Japinder Singh, and Inderjit S. Dhillon. 2021. Extreme Multi-label Learning for Semantic Matching in Product Search. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14–18, 2021, Virtual Event, Singapore. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3447548.3467092

1 INTRODUCTION
In general, product search consists of two building blocks: the matching stage, followed by the ranking stage. When a customer issues a query, the query is passed to several matching algorithms to retrieve relevant products, resulting in a match set. The match set passes through stages of ranking, where top results from the previous stage are re-ranked before the most relevant items are displayed to customers. Figure 1 outlines a product search system.

Figure 1: System architecture for augmenting the match set with extreme multi-label learning models.

The matching (i.e., retrieval) phase is critical. Ideally, the match set should have a high recall [29], containing as many relevant and diverse products that match the customer intent as possible; otherwise, many relevant products won't be considered in the final ranking phase. On the other hand, the matching algorithm should be highly efficient to meet real-time inference constraints: it must retrieve a small subset of relevant products in time sublinear to the enormous catalog space, whose size can be 100 million or more.

Matching algorithms can be put into two categories. The first type is lexical matching approaches, which score a query-product pair by a weighted sum of overlapping keywords between the pair. One representative example is Okapi-BM25 [36, 37], which has been state-of-the-art for decades and is still widely used in many retrieval tasks such as open-domain question answering [6, 27] and passage/document retrieval [5, 14]. While the inference of Okapi-BM25 can be done efficiently using an inverted index [53], these index-based lexical matching approaches cannot capture semantic and customer behavior signals (e.g., clicks, impressions, or purchases) tailored to the product search system and are fragile to morphological variants or spelling errors.

The second option is embedding-based neural models, which learn semantic representations of queries and products based on customer behavior signals. The similarity is measured by inner products or cosine distances between the query and product embeddings. To infer the match set of a novel query in real time, embedding-based neural models need to first vectorize the query tokens into a query embedding, and then find its nearest neighbor products in the embedding space. Finding nearest neighbors can be done efficiently via approximate nearest neighbor search (ANN) methods (e.g., HNSW [28] or ScaNN [15]). Vectorizing the query tokens into an embedding, nevertheless, is often the inference bottleneck, which depends on the complexity of the neural network architecture.

Using BERT-based encoders [11] for query embeddings is less practical because of the large latency in vectorizing queries. Specifically, consider retrieving the match set of a query from an output space of 100 million products on a CPU machine. In Table 1, we compare the latency of a BERT-based encoder (a 12-layer deep Transformer) with a shallow DSSM [32] encoder (a 2-layer MLP). The inference latency of the BERT-based encoder is 50 milliseconds per query (ms/q) in total, where vectorization and ANN search need 88% and 12% of the time, respectively. On the other hand, the latency of the DSSM encoder is 3.13 ms/q in total, where vectorization only takes 33% of the time. Because of real-time inference constraints, industrial product search engines do not have the luxury to leverage state-of-the-art deep Transformer encoders in embedding-based neural methods, whereas shallow MLP encoders result in limited performance.

Table 1: Comparing the inference latency of a BERT-based encoder with the DSSM encoder on a CPU machine, with single-batch, single-thread settings. The latency of the BERT-based encoder is taken from HuggingFace Transformers [43, 44]. BERT's ANN time is estimated by linear extrapolation on the dimensionality d, where BERT has d = 768 and DSSM has d = 256.

Vectorizer       | Vectorizer latency (ms/q) | ANN       | ANN latency (ms/q)
DSSM [32]        | 1.00                      | HNSW [28] | 2.13
BERT-based [11]  | 44.00                     | HNSW [28] | ≈ 6.39

In this paper, we take another route to tackle semantic matching by formulating it as an eXtreme Multi-label Classification (XMC) problem, where the goal is to tag an input instance (i.e., query) with its most relevant output labels (i.e., products). XMC approaches have received great attention recently in both the academic and industrial communities. For instance, the deep learning XMC model X-Transformer [7, 52] achieves state-of-the-art performance on public academic benchmarks [3]. Partition-based methods such as Parabel [33] and XReg [34], as another example, find successful applications to dynamic search advertising in Bing. In particular, tree-based partitioning XMC models are a staple of modern search engines and recommender systems due to their inference time being sub-linear (i.e., logarithmic) in the enormous output space (e.g., 100 million or more).

In this paper, we aim to answer the following research question: Can we develop an effective tree-based XMC approach for semantic matching in product search? Our goal is not to replace lexical-based matching methods (e.g., Okapi-BM25). Instead, we aim to complement/augment the match set with more diverse candidates from the proposed tree-based XMC approach, so that the final-stage ranker can produce a good and diverse set of products in response to search queries.

We make the following contributions in this paper.
• We apply the leading tree-based XR-Linear (PECOS) model [52] to semantic matching. To our knowledge, we are the first to successfully apply tree-based XMC methods to an industrial-scale product search system.
• We explore various n-gram TF-IDF features for XR-Linear, as vectorizing queries into sparse TF-IDF features is highly efficient for real-time inference.
• We study weight pruning of the XR-Linear model and demonstrate its robustness to different disk-space requirements.
• We present the trade-off between recall rates and real-time latency. Specifically, for beam size b = 15, XR-Linear achieves Recall@100 of 60.9% with a latency of 1.25 ms/q; for beam size b = 50, XR-Linear achieves Recall@100 of 65.7% with a latency of 3.48 ms/q. In contrast, DSSM only has Recall@100 of 36.2% with a latency of 3.13 ms/q.

The implementation of XR-Linear (PECOS) is publicly available at https://github.com/amzn/pecos.

2 RELATED WORK

2.1 Semantic Product Search Systems
Two-tower models (a.k.a. dual encoders, Siamese networks) are arguably one of the most popular embedding-based neural models used in passage or document retrieval [31, 46], dialogue systems [18, 30], open-domain question answering [16, 24, 27], and recommender systems [9, 47, 50]. It is not surprising that two-tower models with ResNet [17] encoders and deep Transformer encoders [41] achieve state-of-the-art results on most computer vision [26] and textual-domain retrieval benchmarks [10], respectively. The more complex the encoder architecture is, nevertheless, the less applicable the deep two-tower models are for industrial product search systems: very few of them can meet the low real-time latency constraint (e.g., 5 ms/q) due to the vectorization bottleneck of query tokens.

One of the exceptions is the Deep Semantic Search Model (DSSM) [32], a variant of two-tower retrieval models with shallow multi-layer perceptron (MLP) layers. DSSM is tailored for industrial-scale semantic product search, and was A/B tested online on the Amazon Search Engine [32]. Other examples are the two-tower recommender models for Youtube [9] and Google Play [47], which also embrace shallow MLP encoders for fast vectorization of novel queries to meet online latency constraints. Finally, [38] leverages distributed GPU training and KNN softmax, scaling two-tower models to the Alibaba Retail Product Dataset with 100 million products.

2.2 eXtreme Multi-label Classification (XMC)
To overcome computational issues, most existing XMC algorithms with textual inputs use sparse TF-IDF features and leverage different partitioning techniques on the label space to reduce complexity.

Sparse linear models. Linear one-versus-rest (OVR) methods such as DiSMEC [1], ProXML [2], and PPD-Sparse [48, 49] explore parallelism to speed up the algorithm and reduce the model size by truncating model weights to encourage sparsity. Linear OVR classifiers are also building blocks for many other XMC approaches, such as Parabel [33], SLICE [20], and XR-Linear (PECOS) [52], to name just a few. However, naive OVR methods are not applicable to semantic product search because their inference time complexity is still linear in the output space.

Partition-based methods. The efficiency and scalability of sparse linear models can be further improved by incorporating different partitioning techniques on the label spaces. For instance, Parabel [33] partitions the labels through a balanced 2-means label tree using label features constructed from the instances. Other approaches attempt to improve on Parabel, for instance, eXtremeText [45], Bonsai [25], NAPKINXC [21], XReg [34], and PECOS [52]. In particular, the PECOS framework [52] allows different indexing and matching methods for XMC problems, resulting in two realizations, XR-Linear and X-Transformer. Also note that tree-based methods with neural encoders, such as AttentionXML [51], X-Transformer (PECOS) [7, 52], and LightXML [22], mostly achieve state-of-the-art results on XMC benchmarks, at the cost of longer training time and more expensive inference. To sum up, tree-based sparse linear methods are more suitable for industrial-scale semantic matching problems due to their sub-linear inference time and fast vectorization of the query tokens.

Graph-based methods. SLICE [20] and AnnexML [39] build an approximate nearest neighbor (ANN) graph to index the large output space. For a given instance, the relevant labels can be found quickly via ANN search. SLICE then trains linear OVR classifiers with negative samples induced from the ANN graph. While the inference time complexity of advanced ANN algorithms (e.g., HNSW [28] or ScaNN [15]) is sub-linear, ANN methods typically work better on low-dimensional dense embeddings. Therefore, the inference latency of SLICE and AnnexML still hinges on the vectorization time of pre-trained embeddings.

3 XR-LINEAR (PECOS): A TREE-BASED XMC METHOD FOR SEMANTIC MATCHING
Overview. Tree-based XMC models for semantic matching can be characterized as follows: given a vectorized query x ∈ R^d and a set of labels (i.e., products) Y = {1, ..., ℓ, ..., L}, produce a tree-based model that retrieves the top k most relevant products in Y. The model parameters are estimated from the training dataset {(x_i, y_i) : i = 1, ..., n}, where y_i ∈ {0, 1}^L denotes the relevant labels of query i from the output label space Y.

Figure 2: Illustration of inference of XR-Linear (PECOS) [52] using beam search with beam width b = 2 to retrieve 4 relevant products for the given input query x.

There are many tree-based linear methods [21, 25, 33, 34, 45, 52], and we consider the XR-Linear model within PECOS [52] in this paper, due to its flexibility and scalability to large output spaces. The XR-Linear model partitions the enormous label space Y with hierarchical clustering to form a hierarchical label tree. The j-th label cluster at depth t is denoted by Y_j^(t). The leaves of the tree are the individual labels (i.e., products) of Y. See Figure 2 for an illustration of the tree structure and inference with beam search.

Each layer of the XR-Linear model has a linear OVR classifier that scores the relevance of a cluster Y_j^(t) to a query x. Specifically, the unconditional relevance score of a cluster Y_j^(t) is

    f(x, Y_j^(t)) = σ(x⊤ w_j^(t)),    (1)

where σ is an activation function and w_j^(t) ∈ R^d are the model parameters of the j-th node at the t-th layer.

Practically, the weight vectors w_j^(t) for each layer t are stored in a d × K_t weight matrix

    W^(t) = [ w_1^(t)  w_2^(t)  ...  w_{K_t}^(t) ],    (2)

where K_t denotes the number of clusters at layer t, such that K_D = L and K_0 = 1. In addition, the tree topology at layer t is represented by a cluster indicator matrix C^(t) ∈ {0, 1}^{K_t × K_{t-1}}. Next, we discuss how to construct the cluster indicator matrix C^(t) and learn the model weight matrix W^(t).
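To make the notation above concrete, the following minimal sketch (ours, not part of the paper; all sizes are toy values and the weights are random) builds the per-layer weight matrices W^(t) of Eq. (2) and the cluster indicator matrices C^(t) for a complete B-ary tree with L = 8 labels, B = 2, and depth D = 3, and scores one query against the clusters of a layer as in Eq. (1).

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
d, B, D, L = 16, 2, 3, 8                       # toy feature dim, branching factor, depth, #labels
K = [1] + [B**t for t in range(1, D)] + [L]    # K_0 = 1, K_1 = 2, K_2 = 4, K_3 = L = 8

# W[t]: d x K_t weight matrix of the OVR classifiers at layer t (Eq. 2), random here
W = {t: rng.normal(size=(d, K[t])) for t in range(1, D + 1)}

# C[t]: K_t x K_{t-1} cluster indicator matrix; child j belongs to parent j // B
C = {t: csr_matrix((np.ones(K[t]), (np.arange(K[t]), np.arange(K[t]) // B)),
                   shape=(K[t], K[t - 1])) for t in range(1, D + 1)}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=d)        # a vectorized query
scores = sigmoid(x @ W[2])    # Eq. (1): relevance of every cluster at layer t = 2
print(scores.shape)           # (4,)
print(C[3].toarray())         # leaf-to-parent assignment at the bottom layer
```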
3.1 Hierarchical Label Tree
Tree Construction. Given semantic label representations {z_ℓ : ℓ ∈ Y}, the XR-Linear model constructs a hierarchical label tree via B-ary partitioning (i.e., clustering) in a top-down fashion, where the numbers of clusters at layers t = 1, ..., D are

    K_D = L,  K_{D-1} = B^{D-1},  ...,  K_1 = B^1.    (3)

Node k of layer t-1 corresponds to a cluster Y_k^(t-1) containing the labels assigned to it. For example, Y_1^(0) is the root node cluster including all the labels {ℓ : 1, ..., L}. To proceed from layer t-1 to layer t, we perform Spherical B-Means [12] clustering to partition Y_k^(t-1) into B clusters that form its B child nodes {Y_{B(k-1)+j}^(t) : j = 1, ..., B}. Applying this to all parent nodes k = 1, ..., B^{t-1} at layer t-1, the cluster indicator matrix C^(t) ∈ {0, 1}^{K_t × K_{t-1}} at layer t is given by

    C_{j,k}^(t) = 1 if Y_j^(t) is a child of Y_k^(t-1), and 0 otherwise.    (4)

An illustration with B = 2 is given in Figure 3.

Figure 3: An illustration of hierarchical label clustering [52] with recursive B-ary partitions with B = 2.

Label Representations. For the semantic matching problem, we present two ways to construct label embeddings, depending on the quality of the product information (i.e., titles and descriptions). If product information such as titles and descriptions is noisy or missing, each label is represented by aggregating the query feature vectors of its positive instances (namely, PIFA):

    z_ℓ^PIFA = v_ℓ / ‖v_ℓ‖,  where  v_ℓ = Σ_{i=1}^{n} Y_{iℓ} x_i,  ℓ ∈ Y.    (5)

Otherwise, labels with rich textual information can be represented by featurization (e.g., n-gram TF-IDF or pre-trained neural embeddings) of their titles and descriptions.
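As an illustration of Eq. (5) and of one level of the B-ary partitioning, here is a small sketch under our own simplifications (it is not the PECOS implementation): spherical B-means is approximated by running ordinary k-means on ℓ2-normalized label embeddings, and the toy matrices stand in for the real query features and query-product signals.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def pifa_label_embeddings(X, Y):
    """Eq. (5): z_l = v_l / ||v_l|| with v_l = sum_i Y_il * x_i, i.e. normalized rows of Y^T X."""
    V = Y.T.dot(X)                       # L x d aggregated positive-instance features
    return normalize(V, norm="l2", axis=1)

def partition_labels(Z, B=2, seed=0):
    """One level of B-ary partitioning: cluster label embeddings into B child nodes.
    k-means on l2-normalized rows is used here as a stand-in for spherical B-means [12]."""
    assign = KMeans(n_clusters=B, n_init=10, random_state=seed).fit_predict(Z)
    L = Z.shape[0]
    # cluster indicator matrix C in {0,1}^{L x B} for this level (cf. Eq. 4)
    return csr_matrix((np.ones(L), (np.arange(L), assign)), shape=(L, B))

# toy data: n = 6 queries with d = 5 TF-IDF features, L = 4 labels
X = csr_matrix(np.abs(np.random.default_rng(0).normal(size=(6, 5))))
Y = csr_matrix(np.array([[1,0,0,0],[1,0,0,0],[0,1,0,0],[0,0,1,1],[0,0,1,0],[0,0,0,1]]))
Z = pifa_label_embeddings(X, Y)
print(partition_labels(Z, B=2).toarray())   # which of the 2 child clusters each label joins
```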

3.2 Sparse Linear Matcher
Given the query feature matrix X ∈ R^{n×d}, the label matrix Y ∈ {0, 1}^{n×L}, and the cluster indicator matrices {C^(t) ∈ {0, 1}^{K_t × K_{t-1}} : K_0 = 1, K_D = L, t = 1, ..., D}, we are ready to learn the model parameters W^(t) for layers t = 1, ..., D in a top-down fashion. For example, the bottom layer t = D corresponds to the original XMC problem on the given training data {X, Y_D = Y}. For layer t = D-1, we construct the XMC problem with the induced dataset {X, Y_{D-1} = binarize(Y_D C^(D))}. Similar recursion is applied to all other intermediate layers.

Learning OVR Classifiers. At layer t with the corresponding XMC dataset {X, Y_t}, the XR-Linear model considers a point-wise loss to learn the parameter matrix W^(t) ∈ R^{d×K_t} independently for each column ℓ, 1 ≤ ℓ ≤ K_t. Column ℓ of W^(t) is found by solving a binary classification sub-problem:

    w_ℓ^(t) = argmin_w  Σ_{i : M^(t)_{i,c_ℓ} ≠ 0} L(Y^t_{iℓ}, w⊤ x_i) + (λ/2) ‖w‖²,   ℓ = 1, ..., K_t,    (6)

where λ is the regularization coefficient, L(·, ·) is the point-wise loss function (e.g., the squared hinge loss), M^(t) ∈ {0, 1}^{n×K_{t-1}} is a negative sampling matrix at layer t, and c_ℓ ∈ {1, ..., K_{t-1}} is the cluster index of the ℓ-th label (cluster) at the t-th layer. Crucially, the negative sampling matrix M^(t) not only finds hard negative instances (queries) for stronger learning signals, but also significantly reduces the size of the active instance set for faster convergence.

In practice, we solve the binary classification sub-problem (6) via the state-of-the-art efficient solver LIBLINEAR [13]. Furthermore, the K_t independent sub-problems at layer t can be computed in an embarrassingly parallel manner to fully utilize the multi-core CPU design of modern hardware.
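A minimal sketch of how one column of W^(t) could be fit as in Eq. (6), assuming the active instance set for label ℓ has already been selected via the negative sampling matrix M^(t). scikit-learn's LinearSVC is used here as a stand-in for a direct LIBLINEAR call (it wraps the same solver family with an L2-regularized squared-hinge objective); the data are toy values.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import LinearSVC

def fit_column(X_active, y_active, lam=1.0):
    """Solve one binary sub-problem of Eq. (6) on the active (sampled) instances.
    LinearSVC minimizes 0.5*||w||^2 + C * sum(squared hinge), so C plays the role of 1/lambda."""
    clf = LinearSVC(loss="squared_hinge", penalty="l2", C=1.0 / lam, fit_intercept=False)
    clf.fit(X_active, y_active)
    return clf.coef_.ravel()              # w_l^(t), a d-dimensional weight vector

# toy active set for one column: 8 queries, d = 10 TF-IDF features, binary relevance
rng = np.random.default_rng(1)
X_active = csr_matrix(np.abs(rng.normal(size=(8, 10))))
y_active = np.array([1, 1, 0, 1, 0, 0, 1, 0])
w = fit_column(X_active, y_active, lam=0.5)
print(w.shape)                            # (10,)
```

The K_t columns are independent, so in practice each call like this runs in parallel across CPU cores, one per label or cluster at the layer.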
3.3 Inference
Weight Pruning. The XR-Linear model is composed of D OVR linear classifiers f^(t)(x, ℓ) parametrized by matrices W^(t) ∈ R^{d×K_t}, 1 ≤ t ≤ D. Naively storing the dense parameter matrices is not feasible. To overcome a prohibitive model size, we apply entry-wise weight pruning to sparsify W^(t) [1, 33]. Specifically, after solving for w_ℓ^(t) in each binary classification sub-problem, we perform a hard thresholding operation that truncates parameters with magnitude smaller than a pre-specified value ε ≥ 0 to zero:

    W_{ij}^(t) = W_{ij}^(t) if |W_{ij}^(t)| > ε, and 0 otherwise.    (7)

We set the pruning threshold ε such that the parameter matrices can be stored in main memory for real-time inference.

Beam Search. The relevance score of a query-cluster pair (x, Ỹ) is defined to be the aggregation over all ancestor nodes in the tree:

    f(x, Ỹ) = Π_{w ∈ A(Ỹ)} σ(w · x),    (8)

where A(Ỹ) = {w_i^(t) | Ỹ ⊆ Y_i^(t), t ≠ 0} denotes the weight vectors of all ancestors of Ỹ in the tree, including Ỹ itself and disregarding the root. Naturally, this definition extends all the way to the individual labels ℓ ∈ Y at the bottom of the tree.

In practice, exact inference is typically intractable, as it requires searching the entire label tree. To circumvent this issue, XR-Linear uses a greedy beam search of the tree as an approximation. For a query x, this approach discards any clusters at a given level that do not fall into the top b most relevant clusters, where b is the beam search width. The inference time complexity of XR-Linear with beam search [52] is

    O( b × Σ_{t=1}^{D} (K_t / K_{t-1}) × T_f ) = O( D × b × max{B, L/B^{D-1}} × T_f ),    (9)

where T_f is the time to compute a relevance score f^(t)(x, ℓ). We see that if the tree depth is D = O(log_B L) and the maximum leaf size L/B^{D-1} is a small constant such as 100, the overall inference time complexity of XR-Linear is

    O( b × T_f × log L ),    (10)

which is logarithmic in the size of the original output space.
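The sketch below (our own simplified, dense-matrix version; the production model stores pruned sparse matrices) applies the hard thresholding of Eq. (7) and then runs the greedy beam search of Eq. (8) down a complete B-ary tree whose layer-t classifier weights are given as d × K_t matrices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prune(W, eps):
    """Eq. (7): zero out entries with |W_ij| <= eps (kept as sparse matrices in practice)."""
    W = W.copy()
    W[np.abs(W) <= eps] = 0.0
    return W

def beam_search(x, W_layers, B, beam_size, topk):
    """Greedy beam search of Eq. (8). W_layers[t-1] holds the d x K_t weights of layer t;
    the children of cluster j at layer t-1 are clusters j*B, ..., j*B + B - 1 at layer t."""
    beam = {0: 1.0}                                    # root cluster with path score 1
    for t, W in enumerate(W_layers, start=1):          # layers t = 1, ..., D
        cand = {}
        for parent, score in beam.items():
            for child in range(parent * B, parent * B + B):
                cand[child] = score * sigmoid(x @ W[:, child])   # aggregate ancestor scores
        keep = beam_size if t < len(W_layers) else topk
        beam = dict(sorted(cand.items(), key=lambda kv: -kv[1])[:keep])
    return sorted(beam.items(), key=lambda kv: -kv[1])           # (leaf id, score) pairs

# toy model: d = 16 features, B = 4, depth D = 3, so L = B^D = 64 products
rng = np.random.default_rng(0)
d, B, D = 16, 4, 3
W_layers = [prune(rng.normal(size=(d, B ** t)), eps=0.1) for t in range(1, D + 1)]
x = rng.normal(size=d)
print(beam_search(x, W_layers, B=B, beam_size=2, topk=4))
```

Note how the per-layer work is bounded by b times the branching factor (or the leaf size at the bottom), which is exactly the count behind Eq. (9).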
3.4 Tokenization and Featurization
To pre-process the input query or product title, we apply simple normalization (e.g., lower-casing and removing basic punctuation) and use white space to tokenize the string into a sequence of smaller components such as words, phrases, or characters. We combine word unigrams, word bigrams, and character trigrams into a bag-of-n-grams feature, and construct sparse high-dimensional TF-IDF features as the vectorization of the query.

Word n-grams. The basic form of tokenization is a word unigram. For example, the word unigrams of "artistic iphone 6s case" are ["artistic", "iphone", "6s", "case"]. However, word unigrams lose word ordering information, which leads us to use higher-order n-grams, such as bigrams. For example, the word bigrams of the same query become ["artistic#iphone", "iphone#6s", "6s#case"]. These n-grams capture phrase-level information, which helps the model better infer the customer intent for search.

Character Trigrams. The string is broken into a list of all three-character sequences, and we only consider the trigrams within a word boundary. For example, the character trigrams of the query "artistic iphone 6s case" are ["#ar", "art", "rti", "tis", "ist", "sti", "tic", "ic#", "#ip", "iph", "pho", "hon", "one", "ne#", "#6s", "6s#", "#ca", "cas", "ase", "se#"]. Character trigrams are robust to typos ("iphone" and "iphonr") and can handle compound words ("amazontv" and "firetvstick") more naturally. Another advantage for product search is the ability to capture similarity of model parts and sizes.
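A minimal sketch of this bag-of-n-grams tokenization (ours; the production featurizer additionally applies TF-IDF weighting, vocabulary truncation, and ℓ2 normalization as described in Section 4.1):

```python
def word_unigrams(query):
    return query.lower().split()

def word_bigrams(tokens):
    return [f"{a}#{b}" for a, b in zip(tokens, tokens[1:])]

def char_trigrams(tokens):
    """Character trigrams within each word boundary, padded with '#'."""
    grams = []
    for tok in tokens:
        padded = f"#{tok}#"
        grams += [padded[i:i + 3] for i in range(len(padded) - 2)]
    return grams

query = "artistic iphone 6s case"
uni = word_unigrams(query)
print(uni)                 # ['artistic', 'iphone', '6s', 'case']
print(word_bigrams(uni))   # ['artistic#iphone', 'iphone#6s', '6s#case']
print(char_trigrams(uni))  # ['#ar', 'art', ..., 'se#']
```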
4 EXPERIMENTAL RESULTS

4.1 Data Preprocessing
We sample over 1 billion positive query-product pairs as the dataset, which covers over 240 million queries and 100 million products. A positive pair means the aggregated counts of clicks or purchases are above a threshold. While each query-product pair may have a real-valued weight representing the aggregated counts, we do not currently leverage this information and simply treat the training signals as binary feedback. We split the train/test sets by time horizon, where we use 12 months of search logs as the training set and the trailing 1 month as the offline evaluation test set. Approximately 12% of the products in the test set are unseen in the training set; these are also known as cold-start products.

For the XR-Linear model and Okapi-BM25, we consider white space tokenization for both the query text and the product title to vectorize the query and product features with n-gram TF-IDF. The training signals are extremely sparse, and the resulting query feature matrix has an average of 27.3 non-zero elements per query, while the product feature matrix has an average of 142.1 non-zero elements per product, with dimensionality d = 4.2M. For the DSSM model, both the query feature matrix and the product feature matrix are dense low-dimensional (i.e., 256) embeddings, which are more suitable for HNSW inference.

Featurization. For the proposed XR-Linear method, we consider n-gram sparse TF-IDF features for the input queries and conduct ℓ2 normalization of each query vector. The vocabulary size is 4.2 million, which is determined by the most frequent tokens of word- and character-level n-grams. In particular, there are 1 million word unigrams, 3 million word bigrams, and 200 thousand character trigrams. Note that we consider characters within each word boundary when building the character-level n-gram vocabulary. The motivation for using character trigrams is to better capture specific model types, certain brand names, and typos.

For out-of-vocabulary (OOV) tokens, we map those unseen tokens into a shared unique token id. It is possible to construct larger bins with hashing tricks [23, 32, 42] that potentially handle the OOV tokens better. We leave the hashing tricks as future work.

4.2 Experimental Setup
Evaluation Protocol. We sample 100,000 queries as the test set, which comprises 182,000 query-product pairs with purchase counts of at least one. Given a query in the retrieval stage, semantic matching algorithms need to find the top 100 relevant products from the catalog space consisting of 100 million candidate products. We measure the performance with recall metrics, which are widely used in retrieval [6, 46] and XMC tasks [20, 33, 35]. Specifically, for a predicted score vector ŷ ∈ R^L and a ground truth label vector y ∈ {0, 1}^L, Recall@k (k = 10, 50, 100) is defined as

    Recall@k = (1 / |y|) Σ_{ℓ ∈ top_k(ŷ)} y_ℓ,

where top_k(ŷ) denotes the labels with the k largest predicted values.

Comparing Methods and Hyper-parameters. We compare XR-Linear with two representative retrieval approaches and describe the hyper-parameter settings.
• XR-Linear: the proposed tree-based XMC model, where the implementation is from PECOS [52]¹ (a usage sketch of the released library is given at the end of this subsection). We use PIFA to construct semantic product embeddings and build the label tree via hierarchical K-means, where the branching factor is B = 32, the tree depth is D = 5, and the maximum number of labels in each cluster is L/B^{D-1} = 100. The negative sampling strategy is Teacher Forcing Negatives (TFN) [52], the transformation predictive function is ℓ3-hinge, and the default beam size is b = 10. Unless otherwise specified, we follow the default hyper-parameter settings of PECOS.
• DSSM [32]: the leading neural embedding-based retrieval model in semantic product search applications. We take a saved checkpoint of the DSSM model from [32] and generate query and product embeddings on our dataset for evaluation. For fast real-time inference, we use the state-of-the-art approximate nearest neighbor (ANN) search algorithm HNSW [28] to retrieve relevant products efficiently. Specifically, we use the implementation of NMSLIB [4]². Regarding the hyper-parameters of HNSW, we set M0 = 32, efC = 300, and vary the beam search width efS = {10, 50, 100, 250, 500} to examine the trade-off between inference latency and retrieval quality.
• Okapi-BM25 [36]: the lexical matching baseline widely used in retrieval tasks. We use the implementation of [40]³, where the hyper-parameters are set to k1 = 0.5 and b = 0.45.

¹ https://github.com/amzn/pecos
² https://github.com/nmslib/nmslib
³ https://github.com/dorianbrown/rank_bm25

Computing Environments. We run XR-Linear and Okapi-BM25 on an x1.32xlarge AWS instance (128 CPUs and 1952 GB RAM). For the embedding-based model DSSM, the query and product embeddings are generated from a pre-trained DSSM model [32] on a p3.16xlarge AWS instance (8 Nvidia V100 GPUs and 128 GB RAM). The inference of DSSM leverages the ANN algorithm HNSW, which is conducted on the same x1.32xlarge AWS instance for a fair comparison of inference latency.
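For reference, the sketch below shows a typical way to drive the released PECOS library end-to-end, following the pattern in its public README at the time of writing. The exact module paths, keyword names (e.g., beam_size, only_topk), and defaults are assumptions to be checked against the installed version, and the random matrices stand in for the real n-gram TF-IDF features and purchase signals; the paper's settings (B = 32, max leaf size 100, TFN negatives) would be passed through the corresponding indexer and training options rather than the defaults used here.

```python
import numpy as np
import scipy.sparse as smat
from pecos.xmc import Indexer, LabelEmbeddingFactory
from pecos.xmc.xlinear.model import XLinearModel

# Toy stand-ins for the n-gram TF-IDF query matrix X (n x d) and the
# query-product relevance matrix Y (n x L); PECOS expects float32 CSR inputs.
rng = np.random.default_rng(0)
X = smat.csr_matrix(rng.random((1000, 500)), dtype=np.float32)
Y = smat.csr_matrix(rng.random((1000, 200)) > 0.98, dtype=np.float32)

# Section 3.1: PIFA label embeddings and a hierarchical k-means label tree
# (branching factor and leaf size are left at library defaults in this sketch).
label_feat = LabelEmbeddingFactory.create(Y, X, method="pifa")
cluster_chain = Indexer.gen(label_feat)

# Section 3.2: train the per-layer sparse linear matchers.
model = XLinearModel.train(X, Y, C=cluster_chain)

# Section 3.3: beam-search inference, keeping the top 100 products per query.
Y_pred = model.predict(X, beam_size=10, only_topk=100)

# Recall@100 bookkeeping (computed against the training labels here, for illustration only).
hits = np.asarray(Y.multiply(Y_pred > 0).sum(axis=1)).ravel()
denom = np.maximum(np.asarray(Y.sum(axis=1)).ravel(), 1.0)
print(f"Recall@100 = {np.mean(hits / denom):.3f}")
```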
4.3 Main Results
A comparison of XR-Linear (PECOS) with two representative retrieval approaches (i.e., the neural embedding-based model DSSM and the lexical matching method Okapi-BM25) is presented in Table 2. For XR-Linear, we study different n-gram TF-IDF feature realizations. Specifically, we consider four configurations: 1) word-level unigrams with 3 million features; 2) word-level unigrams and bigrams with 3 million features; 3) word-level unigrams and bigrams with 5.6 million features; 4) word-level unigrams and bigrams plus character-level trigrams, with a total of 4.2 million features. Configuration 3) has a larger vocabulary size compared to 2) and 4) to examine the effect of character trigrams.

From n-gram TF-IDF configurations 1) to 3), we observe that adding bigram features increases the Recall@100 from 54.86% to 58.88%, a significant improvement over unigram features alone. However, this also comes at the cost of a 1.57x larger model size on disk. We discuss the trade-off between model size and inference latency in Section 4.4. Next, from configuration 3) to 4), adding character trigrams further improves the Recall@100 from 58.88% to 60.30%, with a slightly larger model size. This empirical result suggests that character trigrams can better handle typos and compound words, which are common patterns in real-world product search systems. Similar observations are also found in [19, 32].

All n-gram TF-IDF configurations of XR-Linear achieve higher Recall@k compared to the competitive neural retrieval model DSSM with the approximate nearest neighbor search algorithm HNSW. For a sanity check, we also conduct exact kNN inference on the DSSM embeddings, and the conclusion remains the same. Finally, the lexical matching method Okapi-BM25 performs poorly on this semantic product search dataset, which may be due to the fact that purchase signals are not well correlated with overlapping token statistics. This indicates another unique difference between the semantic product search problem and the conventional web retrieval problem, where Okapi-BM25 variants are often a strong baseline [8, 14, 27].

Table 2: Comparing XR-Linear (PECOS) with different n-gram features to representative semantic matching algorithms, where the output space is 100 million products. The results of XR-Linear are obtained with weight pruning threshold ε = 0.1 and beam search size b = 10. Training time is measured in hours (hr) and model size on disk is measured in Gigabytes (GB). The model size of DSSM includes the encoders and the stored product embeddings for retrieval. The training time of the DSSM model is measured on Nvidia V100 GPUs, while all other methods are benchmarked on Intel CPUs.

Methods                                        | Recall@10 | Recall@50 | Recall@100 | Training Time (hr) | Model Size (GB)
XR-Linear (PECOS), threshold ε = 0.1, beam size b = 10:
  Unigrams (3M)                                | 38.40     | 52.49     | 54.86      | 25.7               | 90.0
  Unigrams + Bigrams (3M)                      | 39.66     | 54.93     | 57.70      | 31.5               | 212.0
  Unigrams + Bigrams (5.6M)                    | 40.71     | 56.09     | 58.88      | 67.9               | 232.0
  Unigrams + Bigrams + Char Trigrams (4.2M)    | 42.86     | 57.78     | 60.30      | 42.5               | 295.0
DSSM [32] + HNSW [28]                          | 18.03     | 30.22     | 36.19      | 263.5              | 129.0
DSSM [32] + exact kNN                          | 18.37     | 30.72     | 36.79      | 260.0              | 192.0
Okapi-BM25 [36]                                | 11.62     | 18.35     | 21.72      | -                  | -

Training Cost on AWS. We compare the AWS instance cost of XR-Linear (PECOS) and DSSM, where the former is trained on an x1.32xlarge CPU instance ($13.3 per hour) and the latter is trained on a p3.8xlarge GPU instance ($24.5 per hour). From Table 2, the training of XR-Linear takes 42.5 hours (including vectorization, clustering, and matching), which amounts to $567 USD. In contrast, the training of DSSM takes 260.0 hours (training DSSM with early stopping and building the HNSW indexer), which amounts to $6365 USD. In other words, the proposed XR-Linear enjoys a 10x smaller training cost compared to DSSM, and thus offers a more frugal solution for large-scale applications such as product search systems.

4.4 Recall and Inference Latency Trade-off
Depending on the computational environment and resources, semantic product search systems can have various constraints such as the model size on disk, the real-time inference memory, and, most importantly, the real-time inference latency, measured in milliseconds per query (i.e., ms/q) under a single-thread, single-CPU setup. In this subsection, we dive deep into the trade-off between Recall@100 and inference latency by varying two XR-Linear hyper-parameters: 1) the weight pruning threshold ε; 2) the inference beam search size b.

Weight Pruning. As discussed in Section 3.3, we conduct element-wise weight pruning on the parameters of the XR-Linear model trained on Unigram + Bigram + Char Trigram (4.2M) features. We experiment with the thresholds ε = {0.1, 0.2, 0.3, 0.35, 0.4, 0.45}, and the results are shown in Table 3.

From ε = 0.1 to ε = 0.35, the model size has a 3x reduction (i.e., from 295 GB to 90 GB), reaching the same model size as the XR-Linear model trained on unigram (3M) features. Crucially, the pruned XR-Linear model still enjoys a sizable improvement over the XR-Linear model trained on unigram (3M) features, where the Recall@100 is 57.63% versus 54.86%. While the real-time inference memory of XR-Linear closely follows its model disk space, the latter has a smaller impact on the inference latency. In particular, from ε = 0.1 to ε = 0.45, the model size has a 5x reduction (from 295 GB to 62 GB), while the inference latency is reduced by only 32.5% (from 1.20 ms/q to 0.81 ms/q). The inference beam search size, on the other hand, has a larger impact on real-time inference latency, as we discuss in the following paragraph.

Beam Size. We analyze how the beam search size used in XR-Linear prediction affects Recall@100 and inference latency. The results are presented in Table 4. We also compare the throughput (i.e., the inverse of latency, higher is better) versus Recall@100 for XR-Linear and DSSM + HNSW, as shown in Figure 4.

From beam size b = 1 to b = 15, we see a clear trade-off between Recall@100 and latency, where the former increases from 20.95% to 60.94% at the cost of 7x higher latency (from 0.16 ms/q to 1.24 ms/q). Real-world product search systems typically limit the real-time latency of matching algorithms to be smaller than 5 ms/q. Therefore, even with b = 30, the proposed XR-Linear approach remains highly applicable for real-world online deployment.

In Figure 4, we also examine the throughput-recall trade-off for the embedding-based retrieval model DSSM [32] that uses HNSW [28] for inference. XR-Linear outperforms DSSM + HNSW significantly, with higher retrieval quality and larger throughput (smaller inference latency).

In Figure 5, we present the retrieved products for an example test query, "rose of jericho plant", and compare the retrieval quality of XR-Linear (PECOS), DSSM + HNSW, and Okapi-BM25. From the top 10 predictions (i.e., retrieved products), we see that XR-Linear covers more products that were purchased, and the retrieved set is more diverse, compared to the other two baselines.

Table 3: Weight pruning of the best XR-Linear (PECOS) model with various thresholds ε. The model memory is measured by the stable memory consumption when processing input queries sequentially in a single-thread real-time inference setup. The inference latency includes the time for featurizing query tokens into a vector as well as the prediction by the XR-Linear model.

Pruning threshold (ε) | Recall@10 | Recall@50 | Recall@100 | #Parameters | Disk Space (GB) | Memory (GB) | Latency (ms/q)
ε = 0.1               | 42.86     | 57.78     | 60.30      | 26,994M     | 295             | 258         | 1.2046
ε = 0.2               | 42.42     | 57.22     | 59.81      | 15,419M     | 168             | 166         | 1.0176
ε = 0.3               | 41.35     | 55.98     | 58.52      | 9,957M      | 109             | 118         | 0.8834
ε = 0.35              | 40.50     | 55.09     | 57.63      | 8,182M      | 90              | 102         | 0.8551
ε = 0.4               | 39.41     | 53.86     | 56.35      | 6,784M      | 75              | 88          | 0.8330
ε = 0.45              | 38.08     | 52.37     | 54.85      | 5,657M      | 62              | 76          | 0.8138

Table 4: The trade-off between retrieval performance and inference latency of XR-Linear (PECOS) with threshold ε = 0.35, which can be flexibly controlled by the beam size b.

Beam Size (b) | Recall@100 | Latency (ms/q) | Throughput (q/s)
b = 1         | 20.95      | 0.1628         | 6,140.97
b = 5         | 49.23      | 0.4963         | 2,014.82
b = 10        | 57.63      | 0.8551         | 1,169.39
b = 15        | 60.94      | 1.2466         | 802.20
b = 20        | 62.72      | 1.6415         | 609.19
b = 25        | 63.78      | 2.0080         | 498.00
b = 30        | 64.46      | 2.4191         | 413.37
b = 50        | 65.74      | 3.4756         | 287.72
b = 75        | 66.17      | 4.7759         | 209.38
b = 100       | 66.30      | 6.0366         | 165.66

Figure 4: Throughput-Recall trade-off comparison between XR-Linear (PECOS) and DSSM + HNSW. The curve of the latter method is obtained by sweeping the HNSW inference hyper-parameters efS = {10, 50, 100, 250, 500}.

Figure 5: Side-by-side comparison of the retrieved products for XR-Linear (PECOS), DSSM + HNSW, and Okapi-BM25. The test query is "rose of jericho plant"; a green box means there is at least one purchase, while a red box means no purchase.

4.5 Online Experiments
We conducted an online A/B test to experiment with different semantic matching algorithms on a large-scale e-commerce retail product website. The control in this experiment is the traditional lexical-based matching augmented with candidates generated by DSSM. The treatment is the same traditional lexical-based matching, but augmented with candidates generated by XR-Linear instead. After a period of time, many key performance indicators of the treatment were statistically significantly higher than the control, which is consistent with our offline observations. We leave how to combine DSSM and XR-Linear to generate a better match set for product search as future work.

5 DISCUSSION AND FUTURE WORK
In this work, we presented XR-Linear (PECOS), a tree-based XMC model to augment the match set of an online retail search system and improve product discovery with better recall rates. Specifically, to retrieve products from a 100 million product catalog, the proposed solution achieves Recall@100 of 60.9% with a low latency of 1.25 milliseconds per query (ms/q). Our tree-based linear model is robust to weight pruning, which can flexibly meet different system memory or disk space requirements.

One challenge for our method is how to retrieve cold-start products with no training signal (e.g., no clicks nor purchases). One naive solution is to assign cold-start products to the nearest relevant leaf node clusters based on the distances between the product embeddings and the cluster centroids. We can then use the average of the weight vectors of the existing products that are nearest neighbors of the novel product to induce its relevance score. This is an interesting setup and we leave it for future work.

REFERENCES
[1] Rohit Babbar and Bernhard Schölkopf. 2017. DiSMEC: distributed sparse machines for extreme multi-label classification. In WSDM.
[2] Rohit Babbar and Bernhard Schölkopf. 2019. Data scarcity, robustness and extreme multi-label classification. Machine Learning (2019), 1–23.
[3] K. Bhatia, K. Dahiya, H. Jain, A. Mittal, Y. Prabhu, and M. Varma. 2016. The extreme classification repository: Multi-label datasets and code. http://manikvarma.org/downloads/XC/XMLRepository.html
[4] Leonid Boytsov and Bilegsaikhan Naidan. 2013. Engineering efficient and effective non-metric space library. In International Conference on Similarity Search and Applications. Springer, 280–293.
[5] Leonid Boytsov and Eric Nyberg. 2020. Flexible retrieval with NMSLIB and FlexNeuART. arXiv preprint arXiv:2010.14848 (2020).
[6] Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, and Sanjiv Kumar. 2020. Pre-training Tasks for Embedding-based Large-scale Retrieval. In International Conference on Learning Representations.
[7] Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, and Inderjit S Dhillon. 2020. Taming Pretrained Transformers for Extreme Multi-label Text Classification. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3163–3171.
[8] Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL) (Volume 1: Long Papers). 1870–1879.
[9] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for Youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. 191–198.
[10] Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M Voorhees. 2020. Overview of the TREC 2019 deep learning track. arXiv preprint arXiv:2003.07820 (2020).
[11] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
[12] Inderjit S Dhillon and Dharmendra S Modha. 2001. Concept decompositions for large sparse text data using clustering. Machine Learning 42, 1-2 (2001), 143–175.
[13] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9, Aug (2008), 1871–1874.
[14] Luyu Gao, Zhuyun Dai, Zhen Fan, and Jamie Callan. 2020. Complementing lexical retrieval with semantic residual embedding. arXiv preprint arXiv:2004.13969 (2020).
[15] Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning. PMLR, 3887–3896.
[16] Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119). 3929–3938.
[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR.
[18] Matthew Henderson, Ivan Vulić, Daniela Gerz, Iñigo Casanueva, Paweł Budzianowski, Sam Coope, Georgios Spithourakis, Tsung-Hsien Wen, Nikola Mrkšić, and Pei-Hao Su. 2019. Training neural response selection for task-oriented dialogue systems. arXiv preprint arXiv:1906.01543 (2019).
[19] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. 2333–2338.
[20] Himanshu Jain, Venkatesh Balasubramanian, Bhanu Chunduri, and Manik Varma. 2019. SLICE: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 528–536.
[21] Kalina Jasinska-Kobus, Marek Wydmuch, Krzysztof Dembczynski, Mikhail Kuznetsov, and Robert Busa-Fekete. 2020. Probabilistic Label Trees for Extreme Multi-label Classification. arXiv preprint arXiv:2009.11218 (2020).
[22] Ting Jiang, Deqing Wang, Leilei Sun, Huayi Yang, Zhengyang Zhao, and Fuzhen Zhuang. 2021. LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification. In Proceedings of the AAAI Conference on Artificial Intelligence.
[23] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL): Volume 2, Short Papers. Association for Computational Linguistics, 427–431.
[24] Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-Tau Yih. 2020. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906 (2020).
[25] Sujay Khandagale, Han Xiao, and Rohit Babbar. 2020. BONSAI: Diverse and Shallow Trees for Extreme Multi-label Classification. Machine Learning (2020).
[26] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, et al. 2020. The open images dataset v4. International Journal of Computer Vision (2020), 1–26.
[27] Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL).
[28] Y. A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2020), 824–836.
[29] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press. https://doi.org/10.1017/CBO9780511809071
[30] Pierre-Emmanuel Mazaré, Samuel Humeau, Martin Raison, and Antoine Bordes. 2018. Training millions of personalized dialogue agents. In EMNLP.
[31] Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. In CoCo@NIPS.
[32] Priyanka Nigam, Yiwei Song, Vijai Mohan, Vihan Lakshman, Weitian Ding, Ankit Shingavi, Choon Hui Teo, Hao Gu, and Bing Yin. 2019. Semantic product search. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2876–2885.
[33] Yashoteja Prabhu, Anil Kag, Shrutendra Harsola, Rahul Agrawal, and Manik Varma. 2018. Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In WWW.
[34] Yashoteja Prabhu, Aditya Kusupati, Nilesh Gupta, and Manik Varma. 2020. Extreme regression for dynamic search advertising. In Proceedings of the 13th International Conference on Web Search and Data Mining. 456–464.
[35] Sashank J Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, and Sanjiv Kumar. 2019. Stochastic Negative Mining for Learning with Large Output Spaces. In AISTATS.
[36] Stephen Robertson, Hugo Zaragoza, et al. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval 3, 4 (2009), 333–389.
[37] Stephen E Robertson and Steve Walker. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In SIGIR '94. Springer, 232–241.
[38] Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, and Rong Jin. 2020. Large-Scale Training System for 100-Million Classification at Alibaba. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2909–2930.
[39] Yukihiro Tagami. 2017. AnnexML: Approximate nearest neighbor search for extreme multi-label classification. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 455–464.
[40] Andrew Trotman, Antti Puurula, and Blake Burgess. 2014. Improvements to BM25 and language models examined. In Proceedings of the 2014 Australasian Document Computing Symposium. 58–65.
[41] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS.
[42] Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning. 1113–1120.
[43] Thomas Wolf, Julien Chaumond, Lysandre Debut, Victor Sanh, Clement Delangue, Anthony Moi, Pierric Cistac, Morgan Funtowicz, Joe Davison, Sam Shleifer, et al. [n.d.]. HuggingFace Transformer Benchmarking. https://huggingface.co/transformers/benchmarks.html. Accessed: 2021-02-08.
[44] Thomas Wolf, Julien Chaumond, Lysandre Debut, Victor Sanh, Clement Delangue, Anthony Moi, Pierric Cistac, Morgan Funtowicz, Joe Davison, Sam Shleifer, et al. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 38–45.
[45] Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, Róbert Busa-Fekete, and Krzysztof Dembczynski. 2018. A no-regret generalization of hierarchical softmax to extreme multi-label classification. In NIPS.
[46] Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. 2021. Approximate nearest neighbor negative contrastive learning for dense text retrieval. In ICLR.
[47] Ji Yang, Xinyang Yi, Derek Zhiyuan Cheng, Lichan Hong, Yang Li, Simon Xiaoming Wang, Taibai Xu, and Ed H Chi. 2020. Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations. In Companion Proceedings of the Web Conference 2020. 441–447.
[48] Ian EH Yen, Xiangru Huang, Wei Dai, Pradeep Ravikumar, Inderjit S. Dhillon, and Eric Xing. 2017. PPDsparse: A parallel primal-dual sparse method for extreme classification. In KDD. ACM.
[49] Ian EH Yen, Xiangru Huang, Kai Zhong, Pradeep Ravikumar, and Inderjit S Dhillon. 2016. PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. In International Conference on Machine Learning (ICML).
[50] Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Kumthekar, Zhe Zhao, Li Wei, and Ed Chi. 2019. Sampling-bias-corrected neural modeling for large corpus item recommendations. In Proceedings of the 13th ACM Conference on Recommender Systems. 269–277.
[51] Ronghui You, Zihan Zhang, Ziye Wang, Suyang Dai, Hiroshi Mamitsuka, and Shanfeng Zhu. 2019. AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification. In Advances in Neural Information Processing Systems. 5812–5822.
[52] Hsiang-Fu Yu, Kai Zhong, and Inderjit S Dhillon. 2020. PECOS: Prediction for Enormous and Correlated Output Spaces. arXiv preprint arXiv:2010.05878 (2020).
[53] Justin Zobel and Alistair Moffat. 2006. Inverted files for text search engines. ACM Computing Surveys (CSUR) 38, 2 (2006), 6–es.