Pre-trained Language Model based Ranking in Baidu Search

Lixin Zou, Shengqiang Zhang†, Hengyi Cai, Dehong Ma, Suqi Cheng, Daiting Shi, Zhifan Zhu, Weiyue Su, Shuaiqiang Wang, Zhicong Cheng, Dawei Yin∗
Baidu Inc., Beijing, China
{zoulixin15,hengyi1995,chengsuqi,shqiang.wang}@.com, [email protected]
{madehong,shidaiting01,zhuzhifan,suweiyue,chengzhicong01}@baidu.com, [email protected]

∗ Corresponding author. † Co-first author.

ABSTRACT
As the heart of a search engine, the ranking system plays a crucial role in satisfying users' information demands. More recently, neural rankers fine-tuned from pre-trained language models (PLMs) establish state-of-the-art ranking effectiveness. However, it is nontrivial to directly apply these PLM-based rankers to the large-scale web search system due to the following challenging issues: (1) the prohibitively expensive computations of massive neural PLMs, especially for long texts in the web-document, prohibit their deployments in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and the ad-hoc retrieval scenarios that demand comprehensive relevance modeling is another main barrier for improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking components, and thus the compatibility of the individually fine-tuned ranking model is critical for a cooperative ranking system.

In this work, we contribute a series of successfully applied techniques for tackling these exposed issues when deploying the state-of-the-art Chinese pre-trained language model, i.e., ERNIE, in the Baidu search engine system. We first articulate a novel practice to cost-efficiently summarize the web document and contextualize the resultant summary content with the query using a cheap yet powerful Pyramid-ERNIE architecture. Then we endow an innovative paradigm to finely exploit the large-scale noisy and biased post-click behavioral data for relevance-oriented pre-training. We also propose a human-anchored fine-tuning strategy tailored for the online ranking system, aiming to stabilize the ranking signals across various online components. Extensive offline and online experimental results show that the proposed techniques significantly boost the search engine's performance.

CCS CONCEPTS
• Information systems → Language models; Learning to rank;

KEYWORDS
Pre-trained Language Model;

ACM Reference Format:
Lixin Zou, Shengqiang Zhang, Hengyi Cai, Dehong Ma, Suqi Cheng, Daiting Shi, Shuaiqiang Wang, Zhicong Cheng, Dawei Yin. 2021. Pre-trained Language Model based Ranking in Baidu Search. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14-18, 2021, Virtual Event, Singapore. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3447548.3467147

1 INTRODUCTION
As essential tools for accessing information in today's world, search engines like Google and Baidu satisfy millions of users' information needs every day. In large-scale industrial search engines, ranking typically serves as the central stage. It aims at accurately ordering the shortlisted candidate documents retrieved from previous stages, which plays a critical role in satisfying user information needs and improving user experience.

Traditional approaches, including learning to rank [34], are typically based on hand-crafted, manually-engineered features. However, they may easily fail to capture the search intent from the query text and infer the latent semantics of documents. With the recent significant progress of pre-trained language models (PLMs) like BERT [13] and ERNIE [44] on many language understanding tasks, large-scale pre-trained models also demonstrate increasingly promising text ranking results [33].
For example, neural rankers fine-tuned from pre-trained models establish state-of-the-art ranking effectiveness [39, 40], attributed to their ability to perform full self-attention over a given query and candidate document, in which deeply-contextualized representations of all possible input token pairs bridge the semantic gap between query and document terms.

However, it is nontrivial to directly apply the recent advancements in PLMs to web-scale search engine systems with trillions of documents and stringent efficiency requirements. First, the significant improvements brought by these PLMs come at a high cost of prohibitively expensive computations. Common wisdom [45, 49] suggests that the BERT-based ranking model is inefficient in processing long text due to its quadratically increasing memory and computation consumption, which is further exacerbated when involving the full content of a document (typically with length > 4000) in the ranking stage. It thus poses a challenging trade-off to reconcile efficiency and contextualization in a real-world ranking system. Second, explicitly capturing the comprehensive relevance between query and documents is crucial to the ranking task. Existing pre-training objectives, either sequence-based tasks (e.g., masked token prediction) or sentence pair-based tasks (e.g., permuted language modeling), learn contextual representations based on the intra/inter-sentence coherence relationship, which cannot be straightforwardly adapted to model query-document relevance relations. Although user behavioral information can be leveraged to mitigate this defect, elaborately designing relevance-oriented pre-training strategies to fully exploit the power of PLMs for industrial ranking remains elusive, especially with the noisy clicks and exposure bias induced by the search engine. Third, to well deploy the fine-tuned PLM in a real ranking system with various modules, the final ranking score should be compatible with the other components, such as the ranking modules of freshness, quality, and authority. Therefore, in addition to pursuing the individual performance, carefully designing the fine-tuning procedure to seamlessly interweave the resultant PLM and other components into a cooperative ranking system is the crux of a well-behaved deployment.

This work concentrates on sharing our experiences in tackling these issues that emerged in PLM-based online ranking and introducing a series of instrumental techniques that have been successfully implemented and deployed to power the Baidu search engine. In order to improve both the effectiveness and efficiency axes of PLM-based full-content-aware ranking, we propose a two-step framework to achieve this goal: (1) extract the query-dependent summary on the fly with an efficient extraction algorithm; (2) decouple the text representation and interaction with a modularized PLM. Specifically, we provide a QUery-WeIghted Summary ExTraction (QUITE) algorithm with linear time complexity to cost-efficiently summarize the full content of the web document. Given a summary, a Pyramid-ERNIE, built upon the state-of-the-art Chinese PLM ERNIE [44], first decouples the text representation into two parts: the query-title part and the summary part. Then, the Pyramid-ERNIE captures the comprehensive query-document relevance using contextualized interactions over the previously generated representations, for the sake of balancing the efficiency-effectiveness trade-off in online ranking. To explicitly incentivize the query-document relevance modeling in pre-training Pyramid-ERNIE with large-scale raw clicking data, we first manage the noisy and biased user clicks through human-guided calibration by aligning the post-click behaviors with human-preferred annotations, and then conduct relevance-oriented pre-training using the calibrated clicks with a ranking-based objective. Regarding the discrepancy of ranking signals between the fine-tuned Pyramid-ERNIE and other online ranking components that emerged in the naive fine-tuning paradigm, we alleviate such defects with a novel fine-tuning strategy in which the Pyramid-ERNIE is incentivized to be globally stable through anchoring the fine-tuning objective with human-preferred relevance feedback, leading to better cooperation with other ranking components.

We conduct extensive offline and online experiments in a large-scale web search engine. Extensive experimental results demonstrate the effectiveness of the proposed techniques and present our contributions to the relevance improvement in Baidu Search. We expect to provide practical experiences and new insights for building a large-scale ranking system. Our main contributions can be summarized as follows:
• Content-aware Pyramid-ERNIE. We articulate a novel practice to efficiently contextualize the web-document content with a fast query-dependent summary extraction and a Pyramid-ERNIE architecture, striking a good balance between the efficiency and effectiveness of the PLM-based ranking schema in the real-world search engine system.
• Relevance-oriented Pre-training. We design an innovative relevance-oriented pre-training paradigm to finely exploit the large-scale post-click behavioral data, in which the noisy and biased user clicks are calibrated to align with the relevance signals annotated by human experts.
• Human-anchored Fine-tuning. We propose a human-anchored fine-tuning strategy tailored for the online ranking system, aiming to stabilize the ranking signals across various online components and further mitigate the misalignment between the naive fine-tuning objective and the human-cared intrinsic relevance measurements.
• Extensive Offline and Online Evaluations. We conduct extensive offline and online experiments to validate the effectiveness of the designed ranking approach. The results show that the proposed techniques significantly boost the search engine's performance.

2 METHODOLOGY
In this section, we describe the technical details of our proposed approaches. We first formulate the ranking task as a utility optimization problem. Then, we provide the linear-time-complexity query-dependent summary extraction algorithm and propose the Pyramid-ERNIE architecture to reconcile the content-aware ranking's efficiency and effectiveness. To effectively incentivize a relevance-oriented contextual representation, we present a novel pre-training strategy in which large-scale post-click behavioral information can be distilled into the proposed Pyramid-ERNIE. We further design a human-anchored fine-tuning schema to pertinently anchor the resulting fine-tuned model with other online ranking components.

2.1 Problem Formulation
The task of ranking is to measure the relative order among a set of documents D = {d_i}, i = 1..N, under the constraint of a query q ∈ Q, where D is the set of q-related documents retrieved from the collection of all indexed documents [35], and Q is the set of all possible queries. We are required to find a scoring function f(·, ·): Q × D → R, which can maximize some utility as

    f^* = \max_f \mathbb{E}_{\{q, D, Y\}} \vartheta(Y, F(q, D)).    (1)

Here, ϑ is an evaluation metric, such as DCG [23], PNR, and ACC; F(q, D) = {f(q, d_i)}, i = 1..N, is the set of document scores; and f^* is the optimal scoring function. Y = {y_i}, i = 1..N, is a set of grades, with y_i representing the relevance label corresponding to d_i. Usually, y_i is a graded relevance in 0-4 ratings, denoting the relevance of d_i as {bad, fair, good, excellent, perfect}, respectively.
In learning to rank, a ranking model is trained with a set of labeled query-document pairs denoted as Φ = {ϕ_q}, where ϕ_q = {q, D = {d_i}, Y = {y_i} | 1 ≤ i ≤ N} is the set of labeled query-document pairs for a specific query q. Under this formulation, the ranking model is learned by minimizing the empirical loss over the training data as

    \mathcal{L}(f) = \frac{1}{|Z|} \sum_{\{q, D, Y\} \in \Phi} \ell(Y, F(q, D)),    (2)

where ℓ is the loss function, serving as an intermediate proxy for optimizing the non-differentiable ranking metric ϑ, and Z is the normalizing factor. Most ranking models are optimized with a pointwise loss (e.g., mean square error), a pairwise loss (e.g., hinge loss [42]), or a listwise approach (e.g., LambdaMART [5]).
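To make the pointwise/pairwise distinction concrete, the following minimal sketch contrasts the two most common instantiations of ℓ. It is our illustration rather than the paper's implementation, and PyTorch is assumed:

```python
import torch

def pointwise_mse(scores, labels):
    # Pointwise: regress each score f(q, d_i) towards its graded label y_i.
    return torch.mean((scores - labels) ** 2)

def pairwise_hinge(scores, labels, tau=0.1):
    # Pairwise: for every pair with y_i < y_j, penalize the model unless
    # f(q, d_j) exceeds f(q, d_i) by at least the margin tau.
    s_i, s_j = scores.unsqueeze(1), scores.unsqueeze(0)   # s_i - s_j per pair
    mask = labels.unsqueeze(1) < labels.unsqueeze(0)      # pairs with y_i < y_j
    return torch.clamp(s_i - s_j + tau, min=0.0)[mask].mean()

scores = torch.tensor([0.9, 0.2, 0.5])
labels = torch.tensor([4.0, 0.0, 2.0])
print(pointwise_mse(scores, labels), pairwise_hinge(scores, labels))
```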
2.2 Content-aware Pre-trained Language Model
In a large-scale search system, the scoring function f(q, d) is typically implemented to measure the semantic relevance between the query and the document title, while the document's content is ignored due to the high computational cost. However, merely considering the title for a query is risky, since a short title usually cannot describe the document faithfully and sometimes even deviates from the content of the document, e.g., the clickbait, which presents insurmountable obstacles to the ranking effectiveness. To incorporate the content of a document into the ranking process while simultaneously allowing for fast real-time inference in a production setup, we propose a two-step framework to achieve this goal: (1) we first pre-extract the query-dependent summary on the fly, which can be operated efficiently; (2) then, we employ a highly-modularized model—Pyramid-ERNIE—to measure the relevance of query, title, and concise summary.

2.2.1 Query-Dependent Summary Extraction. A document contains many contents, and correspondingly different parts may fit different queries' demands. It is more reasonable to retain the coarse-grained relevant contents and discard the rest before measuring the fine-grained semantic relevance between the query and the document. Therefore, we propose a simple yet effective method named QUery-WeIghted Summary ExTraction (QUITE) to extract a summary s from document d with respect to a given query q (shown in Algorithm 1). QUITE first pre-processes the query and the document, including word tokenization for the query, calculation of the word importance, sentence segmentation for the document, and word tokenization for each sentence, respectively (lines 1-4 in Algorithm 1). Precisely, the word importance is calculated by looking up a pre-computed importance dictionary. Then, each sentence candidate's score Score_{s_i} is measured by summing the word importance of all words that appear in both the query and the sentence candidate (lines 7-9 in Algorithm 1). The candidate with the highest score is chosen as the most related summary sentence at the current step (line 10 in Algorithm 1). To cover different words in the summary, the importance of words that appear in both the query and the current summary is decayed by a factor α (0 < α < 1) (line 13 in Algorithm 1). The above steps are repeated until the number of selected sentences meets the predetermined threshold k. In this way, we can adaptively select the number of summary sentences to balance ERNIE's performance and efficiency.

Algorithm 1 QUITE: Query-Weighted Summary Extraction
Input: the query q; the document d; the decay factor α; the number of query-dependent summary sentences to generate k.
Output: the generated query-dependent summary s.
 1: W_q = Word-Tokenize(q)
 2: ω_w = Word-Importance(w) for w ∈ W_q
 3: S = Sentence-Tokenize(d)
 4: W_{s_i} = Word-Tokenize(s_i) for s_i ∈ S
 5: s, c ← {}, 1
 6: while c ≤ k do
 7:     for all s_i ∈ S do
 8:         Score_{s_i} = Σ_{w ∈ W_o} ω_w with W_o = W_{s_i} ∩ W_q
 9:     end for
10:     s* = argmax_{s_i ∈ S} Score_{s_i}
11:     s ← s* + s
12:     S ← S − s*
13:     ω_w ← α · ω_w for w ∈ W_{s*} ∩ W_q
14:     c ← c + 1
15: end while
16: return s
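Below is a minimal, runnable Python sketch of the QUITE procedure specified in Algorithm 1. The whitespace tokenizers, the regex sentence splitter, and the uniform importance table are placeholder assumptions standing in for Baidu's production tokenization and pre-computed importance dictionary:

```python
import re

def quite(query, document, alpha=0.5, k=1, word_importance=None):
    """Query-weighted summary extraction (Algorithm 1), simplified."""
    w_q = query.lower().split()                        # line 1: tokenize query
    # line 2: word importance; a uniform table stands in for the
    # pre-computed importance dictionary used in production.
    omega = {w: (word_importance or {}).get(w, 1.0) for w in w_q}
    sentences = [t.strip() for t in re.split(r"[.!?]", document) if t.strip()]
    tokenized = {s: set(s.lower().split()) for s in sentences}  # lines 3-4
    summary, c = [], 1                                 # line 5
    while c <= k and sentences:                        # line 6
        # lines 7-9: score sentences by summed importance of query-overlap words
        scores = {s: sum(omega[w] for w in tokenized[s] & set(omega))
                  for s in sentences}
        best = max(sentences, key=lambda s: scores[s]) # line 10
        summary.append(best)                           # line 11
        sentences.remove(best)                         # line 12
        for w in tokenized[best] & set(omega):         # line 13: decay covered words
            omega[w] *= alpha
        c += 1                                         # line 14
    return " ".join(summary)                           # line 16

print(quite("pyramid ernie ranking",
            "ERNIE is a pre-trained model. Pyramid-ERNIE adapts ERNIE "
            "for ranking. Unrelated sentence here.", k=2))
```

The decay in line 13 is what makes the selection adaptive: once a query word is covered by a chosen sentence, its weight shrinks, so subsequent picks favor sentences covering the remaining query words.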
2.2.2 Pyramid-ERNIE. We introduce the Pyramid-ERNIE to conduct semantic matching between the query q, title t, and summary s. It comprises three major components: a query-title encoder E_{q,t} = TRM_{L_low}(q, t), which produces the query-title embedding; a summary encoder E_s = TRM_{L_low}(s), which produces a summary embedding; and a unified encoder E_{q,t,s} = TRM_{L_high}(E_{q,t}, E_s), which encodes the concatenation of the outputs of E_{q,t} and E_s and produces a relevance score between the query, title, and summary. The encoder is an n-layer self-attentive building block, denoted as TRM_n (short for TRansforMer [46]). L_low and L_high are the numbers of representation layers and interaction layers, respectively. Figure 1 depicts the proposed neural architecture.

Figure 1: Illustration of the Pyramid-ERNIE model.
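A minimal PyTorch sketch of this decoupled representation-interaction layout follows. It is our illustrative reading of the architecture: the layer split mirrors the 9+3 configuration of Section 3.4, while the CLS-style pooling head and the from-scratch embeddings are simplifying assumptions (the production model is warm-started from ERNIE 2.0):

```python
import torch
import torch.nn as nn

class PyramidRanker(nn.Module):
    def __init__(self, vocab=32000, h=768, heads=12, l_low=9, l_high=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, h)
        layer = lambda: nn.TransformerEncoderLayer(h, heads, 1024, batch_first=True)
        # Representation layers: query-title and summary are encoded separately.
        self.low_qt = nn.TransformerEncoder(layer(), num_layers=l_low)
        self.low_s = nn.TransformerEncoder(layer(), num_layers=l_low)
        # Interaction layers: full attention over the concatenated sequences.
        self.high = nn.TransformerEncoder(layer(), num_layers=l_high)
        self.score = nn.Linear(h, 1)  # relevance head on the first token

    def forward(self, qt_ids, s_ids):
        e_qt = self.low_qt(self.emb(qt_ids))               # E_{q,t}
        e_s = self.low_s(self.emb(s_ids))                  # E_s
        e_all = self.high(torch.cat([e_qt, e_s], dim=1))   # E_{q,t,s}
        return self.score(e_all[:, 0]).squeeze(-1)         # f(q, d)

model = PyramidRanker()
print(model(torch.randint(0, 32000, (2, 24)),
            torch.randint(0, 32000, (2, 40))).shape)      # torch.Size([2])
```

The key design choice is that the two lower towers never attend to each other, so the expensive quadratic attention over the full concatenated sequence is confined to the few interaction layers on top.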
2.2.3 Complexity Analysis. We conduct a time complexity analysis to inspect the efficiency of the proposed approach. For summary extraction, the time complexity is O(N_c), where N_c is the length of the whole content. Since the algorithm operates in linear time, the cost is rather cheap in online ranking. For semantic matching, the time complexity of the original ERNIE is O(Lh(N_q + N_t + N_s)^2), where L and h are the number of layers and the hidden dimension size of ERNIE, and N_q, N_t, and N_s are the lengths of the query, title, and summary, respectively. In Pyramid-ERNIE, the time complexities of E_{q,t}, E_s, and E_{q,t,s} are O(L_low · h · (N_q + N_t)^2), O(L_low · h · N_s^2), and O(L_high · h · (N_q + N_t + N_s)^2), respectively, where L = L_low + L_high. Therefore, the total time complexity of Pyramid-ERNIE is O(L_low · h · (N_q + N_t)^2 + L_low · h · N_s^2 + L_high · h · (N_q + N_t + N_s)^2), which can be simplified as O(Lh(N_q + N_t + N_s)^2 − 2 L_low · h · (N_q + N_t) · N_s). Hence, the time complexity of Pyramid-ERNIE is obviously lower than that of the original ERNIE. This is affirmed by the empirical results, in which Pyramid-ERNIE reduces about 30% of the time cost compared with the original ERNIE model.
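Plugging illustrative numbers into these formulas shows where the saving comes from. The sequence lengths below are our own assumptions for illustration; the layer split L_low = 9, L_high = 3 matches Section 3.4:

```python
# Self-attention cost model: layers * hidden size * (sequence length)^2.
L, L_low, L_high, h = 12, 9, 3, 768
n_qt, n_s = 40, 88   # assumed query+title and summary lengths

full = L * h * (n_qt + n_s) ** 2                                     # original ERNIE
pyramid = L_low * h * (n_qt ** 2 + n_s ** 2) + L_high * h * (n_qt + n_s) ** 2

print(f"relative cost: {pyramid / full:.2f}")   # ~0.68, i.e. roughly the
                                                # reported ~30% reduction
```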
2.3 Relevance-Oriented Pre-training with Calibrated Large-Scale User Clicks
To enhance the pre-training towards relevance ranking, one straightforward approach is to leverage the large-scale post-click behavioral information for continual domain-adaptive pre-training. The conventional training task for ranking is to predict whether the document will be clicked or not [9]. However, such a trivial approach has the following issues:

(1) The clicking data contains many false-positive samples, caused by noisy clicks such as clickbait and accidental clicks, impairing the accurate modeling of document relevance;
(2) The exposure bias caused by the ranking system [10]—the displayed documents usually acquire many more clicks. This is problematic since blindly fitting the data without considering the inherent biases will result in a discrepancy between offline evaluation and online metrics, and may even lead to a vicious circle of bias and rebiasing;
(3) The inherent inconsistency between clicking and query-document relevance further presents obstacles to the schema of pre-training directly with the user clicking behavioral data, since the documents being clicked are not necessarily relevant results.

Fortunately, a series of informative features exhibit the fine-grained quality of user clicking, including the average dwelling time, average scroll speed, number of user-issued query rewritings, and number of long-clicks, as well as carefully-designed ratio features such as #click/#skip, #click in the query, and #long-click/#click. These important features can be leveraged to calibrate the noisy clicks and exposure bias in the raw post-click behavioral data. For instance, the dwelling time or long-click can be used to effectively filter out the low-quality documents caused by the clickbait or accidental click (issue 1); the click-skip ratio #click/#skip can be employed to identify the clicks owing to exposure bias (issue 2). To this end, we manually annotate 70 thousand query-document pairs with rich user behavioral features into 0-4 ratings and align the M-dimensional post-click based features (denoted as x ∈ R^M) to the query-document relevance by training a tree-based model as the calibrator to predict the human label y (issue 3). The trained tree-based calibration model can then be applied to calibrate the large-scale post-click behavioral data, and the resultant refined clicking data is finally used to pre-train the Pyramid-ERNIE (the effectiveness of the tree-based calibration model is verified in Section 3.7.2). With human preference learning using a small amount of annotated data, we are able to substantially exploit the massive unsupervised data to pre-train a large ranking model and reduce the notorious defects mentioned above.

More concretely, a classification tree [37] h: R^M → R^5 is constructed to calibrate the post-click behaviors to the ground-truth relevance, as depicted in Figure 2. Furthermore, the tree-based model is optimized using gradient boosting methods [15]. Note that other sophisticated classification models can also be applied, e.g., a neural network-based classifier [17].

Figure 2: Human preference learning with tree-based structure.

For a given set of the query, documents, and post-click behaviors {q, D = {d_i}, X = {x_i} | 1 ≤ i ≤ N}, we pre-train the Pyramid-ERNIE with a triplet loss defined as

    \ell(G, F(q, D)) = \sum_{g(x_i) < g(x_j)} \max(0, f(q, d_i) - f(q, d_j) + \tau),    (3)

where τ is the margin enforced between positive and negative pairs, g(x) = argmax_m {h(x)}_m is the most probable label generated by the tree model h(x), and G = {g(x_i)}, i = 1..N, is the set of predicted human labels.
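A compact end-to-end sketch of this calibrate-then-pre-train loop is shown below. Scikit-learn's gradient-boosted trees stand in for the production calibrator (the paper reports a single depth-6 tree trained with scikit-learn in Section 3.4), and the feature dimension, sample counts, and synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Step 1: fit the calibrator h(x) on human-labeled (post-click features, grade)
# pairs; the paper uses ~70k pairs, and M = 8 here is an assumption.
x_human = rng.normal(size=(5000, 8))
y_human = rng.integers(0, 5, size=5000)          # human 0-4 relevance grades
calibrator = GradientBoostingClassifier(max_depth=6).fit(x_human, y_human)

# Step 2: relabel the raw click log with calibrated grades g(x) = argmax h(x).
x_clicks = rng.normal(size=(64, 8))              # stands in for billions of rows
g = calibrator.predict(x_clicks)

# Step 3: pre-train with the triplet objective of Eq. (3): whenever g_i < g_j,
# require f(q, d_i) to score at least tau below f(q, d_j).
def triplet_loss(scores, grades, tau=0.1):
    loss, pairs = 0.0, 0
    for i in range(len(grades)):
        for j in range(len(grades)):
            if grades[i] < grades[j]:
                loss += max(0.0, scores[i] - scores[j] + tau)
                pairs += 1
    return loss / max(pairs, 1)

scores = rng.normal(size=64)   # stand-in for Pyramid-ERNIE scores f(q, d_i)
print(triplet_loss(scores, g))
```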
2.4 Human-anchored Fine-Tuning
Provided with a pre-trained Pyramid-ERNIE, a common practice to leverage it for online ranking tasks is to fine-tune the pre-trained Pyramid-ERNIE with human-labeled task-specific data, using a ranking objective, e.g., a pairwise loss. However, merely pursuing the individual ranking performance leads to a ranking score discrepancy between the fine-tuned Pyramid-ERNIE model and the other online ranking components. This discrepancy is undesirable since a well-behaved online ranking system demands comparable ranking signals to fulfill the multi-modality and multi-source presentations of search results (e.g., freshness, authority, and quality). Besides, optimizing the ranking model solely with a pairwise loss generally suffers from the high variance of query distributions. High-relevance documents are usually overwhelming for hot queries but extremely scarce for tail ones, posing challenges for the ranking model to perceive such a cross-query relevance gap between documents. Moreover, disregarding the query-documents' intrinsic relevance also hurts the predicted scores' interpretability, due to the discrepancy between the corresponding unanchored ranking scores and well-reasoned human-defined relevance grades.

Therefore, the pre-trained Pyramid-ERNIE model's final ranking score is incentivized to be globally stable across different queries and online modules by anchoring the fine-tuning objective with a human-preferred relevance judgment. Specifically, we manually label 10 million query-document pairs into 0-4 ratings and train the Pyramid-ERNIE with a mixture of pairwise and pointwise loss as

    \ell(Y, F(q, D)) = \sum_{y_i < y_j} \max(0, f(q, d_i) - f(q, d_j) + \tau) + \lambda \big( \delta(f(q, d_i), y_i) + \delta(f(q, d_j), y_j) \big),    (4)

where δ(f(q, d), y) = \max\{ (f(q, d) - (y/5 + 0.1))^2 - \epsilon, 0 \} is the pointwise loss. It endeavors to anchor the ranking score to a meaningful range, and ε = 0.01, λ = 0.7 are the hyper-parameters. With the pointwise loss, the ranking score f(q, d) is encouraged to be consistent with the human-labeled relevance grade and can be easily blended with ranking signals from other modules in a real-world ranking system.
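Under our reading of Equation (4), the anchored objective can be sketched as follows (PyTorch assumed; the margin τ is left as a parameter since its value is not stated in the paper). The pointwise term pulls each score into a dead-zone of width ε around the grade-dependent target y/5 + 0.1:

```python
import torch

def human_anchored_loss(scores, labels, tau=0.1, lam=0.7, eps=0.01):
    """Mixture of pairwise hinge and grade-anchored pointwise loss, Eq. (4)."""
    # Pointwise anchor: delta(f, y) = max((f - (y/5 + 0.1))^2 - eps, 0).
    delta = torch.clamp((scores - (labels / 5.0 + 0.1)) ** 2 - eps, min=0.0)
    s_i, s_j = scores.unsqueeze(1), scores.unsqueeze(0)   # s_i - s_j per pair
    y_i, y_j = labels.unsqueeze(1), labels.unsqueeze(0)
    mask = y_i < y_j                                      # pairs that must be ordered
    pairwise = torch.clamp(s_i - s_j + tau, min=0.0)
    anchored = lam * (delta.unsqueeze(1) + delta.unsqueeze(0))
    return (pairwise + anchored)[mask].sum()

scores = torch.tensor([0.30, 0.85, 0.55], requires_grad=True)
labels = torch.tensor([1.0, 4.0, 2.0])   # human 0-4 grades
human_anchored_loss(scores, labels).backward()
```

Because every score is tied to an absolute grade-dependent target, scores remain comparable across queries and can be summed with freshness, quality, or authority signals downstream.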
3 EXPERIMENTS
To assess the effectiveness of the proposed solutions, we conducted extensive offline and online experiments on a large-scale real-world search system. This section details the experimental setup and presents several insights demonstrating that the proposed approaches are crucial to PLM-based ranking in a commercial search engine system.

3.1 Dataset
We train and evaluate our proposed method with both logged user behavioral data (log) and manually-labeled (manual) data. The logged data is used for the pre-training stage, and the manually-labeled query-document pairs are used for the fine-tuning stage. Specifically, we collect three months of users' accessing logs, from Aug. 2020 to Oct. 2020, which contain 538,314,000 queries and 2,986,664,000 query-document pairs. Regarding the fine-tuning data, we manually annotate the train/evaluate/test datasets with Baidu's crowd-sourcing platform, resulting in 9,697,087/160,999/279,128 query-document pairs. In the manually-labeled training data, 73,530 query-document pairs are used for constructing the tree-based calibrator to refine the raw user behavioral data during relevance-oriented pre-training. Table 1 offers the dataset statistics.

Table 1: Data statistics.

Data              #Query         #Query-Document Pairs
log data          538,314,000    2,986,664,000
manual train      469,115        9,697,087
manual evaluate   8,901          160,999
manual test       11,437         279,128

3.2 Evaluation Methodology
We employ the following evaluation metrics to assess the performance of the ranking system.

The Positive-Negative Ratio (PNR) is a pairwise metric for evaluating the search relevance performance. It has been extensively used in the industry due to its simplicity. For a ranked list of N documents, the PNR is defined as the number of concordant pairs versus the number of discordant pairs:

    PNR = \frac{\sum_{i,j \in [1,N]} \mathbb{1}\{y_i > y_j\} \cdot \mathbb{1}\{f(q, d_i) > f(q, d_j)\}}{\sum_{m,n \in [1,N]} \mathbb{1}\{y_m > y_n\} \cdot \mathbb{1}\{f(q, d_m) < f(q, d_n)\}},    (5)

where the indicator function 1{x > y} takes the value 1 if x > y and 0 otherwise. We use the symbol PNR to indicate this value's average over a set of test queries in our experiments.

The Discounted Cumulative Gain (DCG) [23] is a standard listwise accuracy metric for evaluating the ranking model performance and is widely adopted in the context of ad-hoc retrieval. For a ranked list of N documents, we use the following implementation of DCG:

    DCG_N = \sum_{i=1}^{N} \frac{G_i}{\log_2(i + 1)},    (6)

where G_i represents the weight assigned to the document's label at position i. A higher degree of relevance corresponds to a higher weight. We use the symbol DCG to indicate the average value of this metric over the test queries. DCG is reported only when absolute relevance judgments are available. In the following sections, we report DCG_2 and DCG_4, i.e., N ∈ {2, 4}. In online experiments, we extract 6,000 queries and manually label the top-4 ranking results generated by the search engine for calculating DCG.

The Interleaving [8] is a metric used to quantify the degree of user preference and summarize the outcome of an experiment. When conducting comparisons with this metric, two systems' results are interleaved and exposed together to the end-users, whose clicks are credited to the system that provides the corresponding user-clicked results. The gain of the new system A over the base system B can be quantified with Δ_AB, defined as

    \Delta_{AB} = \frac{wins(A) + 0.5 \cdot ties(A, B)}{wins(A) + wins(B) + ties(A, B)} - 0.5,    (7)

where wins(A) counts the number of times the results produced by system A are preferred over those of system B for a given query. Thus, Δ_AB > 0 implies that system A is better than system B, and vice versa. We conduct balanced interleaving experiments for comparing the proposed method with the base model.

The Good vs. Same vs. Bad (GSB) [53] is a metric measured by professional annotators' judgment. For a user-issued query, the annotators are provided with a pair (result1, result2), whereby one result is returned by system A and the other is generated by a competitor system B. The annotators, who do not know which system a result comes from, are then required to independently rate among Good (result1 is better), Bad (result2 is better), and Same (they are equally good or bad), considering the relevance between the returned document and the given query. To quantify the human evaluation, we aggregate these three indicators as a unified metric, denoted ΔGSB:

    \Delta GSB = \frac{\#Good - \#Bad}{\#Good + \#Same + \#Bad}.    (8)
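The four metrics are simple enough to state directly in code. This self-contained sketch, with toy inputs of our own, mirrors Equations (5)-(8):

```python
from math import log2

def pnr(labels, scores):
    """Eq. (5): concordant vs. discordant pairs for one query."""
    pos = sum(1 for yi, si in zip(labels, scores) for yj, sj in zip(labels, scores)
              if yi > yj and si > sj)
    neg = sum(1 for yi, si in zip(labels, scores) for yj, sj in zip(labels, scores)
              if yi > yj and si < sj)
    return pos / neg if neg else float("inf")

def dcg(gains, n):
    """Eq. (6): position-discounted sum of label weights G_i."""
    return sum(g / log2(i + 2) for i, g in enumerate(gains[:n]))

def delta_ab(wins_a, wins_b, ties):
    """Eq. (7): interleaving gain of system A over system B."""
    return (wins_a + 0.5 * ties) / (wins_a + wins_b + ties) - 0.5

def delta_gsb(good, same, bad):
    """Eq. (8): aggregated side-by-side judgment."""
    return (good - bad) / (good + same + bad)

print(pnr([4, 2, 0], [0.9, 0.6, 0.7]), dcg([3, 1, 0, 2], 4),
      delta_ab(520, 480, 1000), delta_gsb(60, 35, 5))
```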
3.3 Competitor System
Due to the high cost of deploying inferior models, we only compare the proposed method with the state-of-the-art ERNIE-based ranking model as well as different variants of the proposed approach.

• Base: The base model is a 12-layer ERNIE-based ranking policy, fine-tuned with a pairwise loss using human-labeled query-document pairs.
• Content-aware Pyramid-ERNIE (CAP): This model replaces the ERNIE-based ranking model with a Pyramid-ERNIE architecture, which incorporates the query-dependent document summary into the deep contextualization to better capture the relevance between the query and document.
• Relevance-oriented Pre-training (REP): This variant pre-trains the Pyramid-ERNIE model with refined large-scale user-behavioral data before fine-tuning it on the task data.
• Human-anchored Fine-tuning (HINT): In the fine-tuning stage, HINT anchors the ranking model with human-preferred relevance scores using the objective function in Equation (4).

3.4 Experimental Setting
For the tree-based calibration model, we build a single tree of depth 6 with scikit-learn (https://scikit-learn.org/stable/). Regarding Pyramid-ERNIE, we use a 12-layer transformer architecture with 9 layers for text representation and 3 layers for the query-title-summary interaction. It is warm-initialized with a 12-layer ERNIE 2.0 provided by Baidu Wenxin (https://wenxin.baidu.com/). The decay factor α is set to 0.5 for query-dependent summary extraction. The same hyper-parameters are used for the various comparison models, i.e., a vocabulary size of 32,000, a hidden size of 768, feed-forward layers with dimension 1024, and a batch size of 128. We use the Adam [27] optimizer with a dynamic learning rate following Vaswani et al. [46]. Expressly, we set the warmup steps to 4000 and the maximum learning rate to 2 × 10^-6 in both the pre-training and fine-tuning stages. All the models are trained on a distributed platform with 28 Intel(R) 5117 CPUs, 32G memory, 8 NVIDIA V100 GPUs, and 12T disk.
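The dynamic learning rate "following Vaswani et al. [46]" is the inverse-square-root schedule with linear warmup; one plausible instantiation, rescaled so that the peak hits the stated maximum of 2 × 10^-6 at step 4000, is sketched below (the exact production scaling is not specified in the paper, so treat this as an assumption):

```python
def lr(step, warmup=4000, lr_max=2e-6):
    # Noam-style schedule: linear warmup, then inverse-square-root decay,
    # normalized so the peak value at step == warmup equals lr_max.
    step = max(step, 1)
    return lr_max * min(step / warmup, (warmup / step) ** 0.5)

print(lr(1), lr(4000), lr(16000))   # ramp up, peak 2e-6, then decay
```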
3.5 Offline Experimental Results
Table 2 shows the PNR results when incrementally applying the proposed techniques, i.e., CAP, REP, and HINT, to the base model. The experimental result is quite consistent with our intuition. After adding the query-dependent summary and employing the Pyramid-ERNIE, the PNR reaches 3.017, advancing the base by 4.72%. It indicates that the query-dependent summary benefits the relevance modeling, and the introduced Pyramid-ERNIE is capable of capturing the semantics of query, title, and summary. With the relevance-oriented pre-training, our method outperforms the base by 6.49% and reaches the highest PNR of 3.068, which reveals that pre-training with large-scale post-click behavioral data substantially improves the performance of the ranking model. Finally, with the human-anchored fine-tuning strategy, although sacrificing a little bit of performance, this approach improves the stability of the Pyramid-ERNIE (see Section 3.7.3).

Table 2: Offline comparison of the proposed methods.

Model            PNR     Improvement
Base             2.881   -
+CAP             3.017   4.72%
+CAP+REP         3.068   6.49%
+CAP+REP+HINT    3.065   6.39%

3.6 Online Experimental Results
To investigate the effectiveness of the introduced techniques in the real-world commercial search engine, we deploy the proposed model to the online search system and compare it with the base model in the real production environment.

Table 3 reports the performance comparison between the different models regarding ΔDCG, Δ_AB, and ΔGSB.

Table 3: Performance improvements of online A/B testing.

                 ΔDCG                Δ_AB                       ΔGSB
Model            ΔDCG_2    ΔDCG_4    Random     Long-Tail       Random    Long-Tail
Base             -         -         -          -               -         -
+CAP             0.65%*    0.76%*    0.15%      0.35%*          3.50%*    6.00%*
+CAP+REP         2.78%*    1.37%*    0.58%*     0.41%*          5.50%*    7.00%*
+CAP+REP+HINT    2.85%*    1.58%*    0.14%*     0.45%*          6.00%*    7.50%*

"*" indicates a statistically significant improvement (t-test with p < 0.05 over the baseline).

First, we observe that the proposed mechanisms bring substantial improvements to the online search system. In particular, we note that the performance of CAP, CAP+REP, and CAP+REP+HINT increases gradually on ΔDCG_2, ΔDCG_4, and Δ_AB, which demonstrates that the designed techniques are practical skills for improving the performance of the online ranking system. Moreover, we also observe that our proposed schema outperforms the online base system by a large margin for long-tail queries (i.e., queries whose search frequency is lower than 10 per week). Particularly, the improvements for long-tail queries in the interleaving experiments are 0.35%, 0.41%, and 0.45% for CAP, CAP+REP, and CAP+REP+HINT, respectively. Furthermore, the GSB advantage for long-tail queries is 6.00%, 7.00%, and 7.50%. We also observe that the proposed approach beats the online base system by a large margin regarding DCG_2, with a 2.85% relative improvement. This reveals that the proposed schema retrieves not only relevant documents but also prefers high-quality results as judged by professional annotators. Finally, compared with the offline experiments, we notice that the human-anchored fine-tuning strategy further boosts the online performance but slightly hurts the offline metric PNR. This is reasonable since the human-preferred relevance annotations used in the human-anchored fine-tuning are intentionally designed to be aligned with the online users' judgments and are introduced to help the ranking model cooperate with the other components, which may not well-coordinate with the offline evaluations.

3.7 Ablation Study
To better understand the source of the designed schema's effectiveness, we examine a series of critical strategies by analytically ablating specific parts of the proposed approach.

3.7.1 Analysis of Content-Aware Pyramid-ERNIE. We study different options for designing the inputs and architecture of Pyramid-ERNIE to present our motivation for setting the hyper-parameters. In Table 4, we report the Pyramid-ERNIE with different settings of the interaction layers and input layers. As shown in Table 4, we find that concatenating the query and title on one side and putting the summary on the other side (denoted as qt|s) achieves the best results. Such performance boosts can be attributed to both the early interaction between the query and title, which coarsely reflects the query and document's semantic relations, and the deep interactions between the query/title and the content summary, which further enrich the resulting contextualization. In contrast, coupling the title and summary on one side (denoted as q|ts) enables title-summary early interactions but hinders query consolidation, which is crucial to the query-document relevance modeling. As a result, the PNR consistently drops for q|ts compared to qt|s. Furthermore, the experiments show that three layers of interaction module perform best, achieving almost equivalent performance while reducing the inference time by 25% compared with the full self-attention-based ERNIE model. As expected, the performance drops when reducing the interaction module layers, since insufficient interactions between query and document content make it difficult to comprehensively capture the semantic relations between query and document.

Table 4: Performance of Pyramid-ERNIE with different numbers of interaction layers. qt|s denotes that the left bottom is the concatenation of q, t and the right bottom is s. Similarly, q|ts means that the left bottom is q and the right bottom is the concatenation of t, s.

# Interaction Layers   q|ts PNR   qt|s PNR
1                      2.31       3.02
2                      2.77       3.02
3                      2.92       3.07
4                      2.94       3.07

We also explore the impact of using different summary lengths in Pyramid-ERNIE, as shown in Table 5. Without exception, increasing the number of sentences in the summary leads to continuous improvement on the PNR metric. However, a longer summary brings growing computational cost. Thus, we adopt the top-1 summary as the input for Pyramid-ERNIE to balance the trade-off between efficiency and effectiveness in the large-scale online ranking system.

Table 5: Performance of Pyramid-ERNIE with different lengths of summary. |s̄| is the average length of the summary.

       w/o summary   1 sentence   2 sentences   3 sentences   4 sentences
PNR    3.01          3.07         3.06          3.06          3.11
|s̄|    38            54           70            84            95

3.7.2 Influence of the Data Calibration in Relevance-Oriented Pre-training. As depicted in Table 2 and Table 3, the relevance-oriented pre-training strategy effectively boosts the ranking performance. The question then is: how do the performance improvements benefit from the tree-based calibration model? To answer this question, we first investigate the effectiveness of the proposed tree-based calibrator. As shown in Table 6, compared with the metric PNR estimated using raw user clicks, the proposed calibrator obtains a much higher score, indicating that the tree-based calibrator provides high-quality guidance regarding the query-document ranking relevance. Benefiting from the refined user clicking data calibrated by this strong guidance, we further observe that pre-training with data calibration outperforms the naive pre-training strategy by a large margin, in terms of both the pre-training stage and the fine-tuning stage, as presented in Table 7. Specifically, the improvements of pre-training with calibration are 52.5% and 23.5% over the naive strategy. It is worth noting that the PNR of the naive pre-training (2.83) even underperforms the base system (2.876 in Table 2), affirming our intuition that the noisy and biased clicks prevalent in the user behavioral data hurt the ranking model remarkably.

Table 6: Performance of raw user clicks and the tree-based calibrator on the test set.

       Raw user clicks   Calibrator
PNR    1.86              3.35

Table 7: Offline performance of different pre-training strategies: (a) pre-training w/o data calibration and (b) pre-training w/ calibrated clicking data.

       PNR (w/o fine-tuning)   PNR (w/ fine-tuning)
(a)    1.81                    2.83
(b)    2.76                    3.07

3.7.3 Effects of Human-Anchored Fine-Tuning. In the offline and online experimental results, we show that human-anchored fine-tuning significantly improves the ranking performance at a small PNR drop cost. We further conduct analytical experiments to understand the source of effectiveness of this strategy. Figure 3 scatters the relevance scores predicted by the ranking model fine-tuned with different strategies. We notice that the human-anchored fine-tuning induces concentrated clusters around the labels and a lower variance of the predicted ranking scores, suggesting a more human-aligned relevance approximation in the online ranking system, which is desirable for stable and interpretable relevance estimation and online cooperation among various ranking components. It also helps to combat the problematic cross-query relevance gap in which the query-document ranking scores are biased by the extremely long-tailed query distributions, aligning with the performance improvements of this human-anchored fine-tuning strategy in the long-tail scenarios of the online experiments (see the last line in Table 3).

Figure 3: Scatters of prediction scores regarding the naive fine-tuning (the red dots) and human-anchored fine-tuning (the green dots) on the test set.

4 RELATED WORK
4.1 Conventional Machine Learned Ranking
Learning-to-rank (LTR) techniques can be categorized into three types based on the loss function: the pointwise approach [12, 32], the pairwise approach [14, 25, 42, 50, 55], and the listwise approach [5, 6]. The pointwise approach, e.g., SLR [12] and McRank [32], assumes that each query-document pair has a relevance label and formalizes ranking as a regression task. The pairwise approach, e.g., RankSVM [25], RankBoost [14], and GBRank [55], treats the ranking problem as a binary classification problem and aims to learn a binary classifier discriminating which document is better in a pairwise manner. The listwise methods directly optimize ranking metrics, e.g., mean average precision or DCG/NDCG. As expected, they perform better than pointwise/pairwise methods in practice, although they are more time-consuming and difficult to optimize. A series of listwise methods have achieved impressive results in LTR, such as LambdaRank [5] and ListNet [6].

4.2 Efficient BERT-style Ranking
Deep learning approaches have been widely adopted in ranking, e.g., representation-based models [21, 43] and interaction-based models [18, 38, 48, 54, 58-60]. Currently, PLM-based ranking models achieve the state-of-the-art ranking effectiveness [39, 40]. However, the performance improvement comes at the cost of efficiency, since the computation cost scales quadratically with the text length. How to reconcile PLM-based ranking's efficiency and effectiveness is a seminal problem in a real-world ranking system. There are several research directions aiming to maintain high performance while keeping efficient computations for PLMs, including knowledge distillation [20], weight sharing [30], pruning [41], and quantization [22, 29]. Besides, many works attempt to model long text with more efficient PLMs, such as Longformer [4], Linformer [47], Reformer [28], and Performer [11]. As to the ranking area, MORES [16] attempts to modularize the Transformer ranker into separate modules for text representation and interaction. ColBERT [26] introduces a late interaction architecture that independently encodes the query and the document using BERT and employs a cheap yet powerful interaction step that models their fine-grained similarity. Our work provides a content-aware Pyramid-ERNIE architecture that balances efficiency and effectiveness in a real-world ranking system.

4.3 Task-tailored Pre-training
As standard PLMs usually do not explicitly model task-specific knowledge, a series of works have investigated encoding domain knowledge into pre-trained language models. Gururangan et al. [19] show that a second phase of pre-training on in-domain data leads to performance gains under both high- and low-resource settings [2, 3, 31, 52]. To name a few, Ma et al. [36] propose to pre-train the Transformer model to predict the pairwise preference between two sets of words given a document; Chang et al. [7] investigate various pre-training tasks for the large-scale dense retrieval problem; Zhang et al. [51] design a gap-sentences generation task as a pre-training objective tailored for abstractive text summarization; Zhou et al. [56] introduce two self-supervised strategies, i.e., concept-to-sentence generation and concept order recovering, to inject concept-centric knowledge into pre-trained language models. In this work, we instead perform relevance-oriented pre-training using large-scale user behavioral data and design a tree-based calibration model to refine the noisy and biased clicking data.

4.4 Effective Fine-tuning
Although widely adopted, existing approaches for fine-tuning pre-trained language models are confronted with issues like unstable predictions [1], poor generalization [24], or misalignment between the fine-tuning objective and the designer's preferences [57]. Blindly fine-tuning the pre-trained model without considering intrinsic human-preferred task properties risks deviating the resultant fine-tuned model from the human-cared ultimate goals. This paper aims to mitigate such risks by exploring a human-anchored fine-tuning strategy tailored for the online ranking system, which brings a substantial performance boost to a commercial search engine.

5 CONCLUSION
In this work, we give an overview of practical solutions to employ the state-of-the-art Chinese pre-trained language model—ERNIE—in the large-scale online ranking system.
The proposed solutions are successfully implemented and deployed to power the Baidu search engine. To mitigate the deficiency of existing PLMs when ranking the long web-document, we propose a novel practice to summarize the lengthy document and then capture the query-document relevance efficiently through a Pyramid-ERNIE architecture. To manage the discrepancy between the existing pre-training objective and the urgent demands of relevance modeling in the ranking, we first provide a tree-based calibration model to align the user clicks with human preferences and then conduct the large-scale pre-training with the refined user behavioral data. We also articulate a human-anchored fine-tuning strategy to deal with the inconsistency of ranking signals between the Pyramid-ERNIE and other online ranking components, which further improves the online ranking performance. The conducted extensive offline and online experiments verify the effectiveness of our proposed solutions.

REFERENCES
[1] Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, and Sonal Gupta. 2020. Better fine-tuning by reducing representational collapse. arXiv:2008.03156 (2020).
[2] Kristjan Arumae, Qing Sun, and Parminder Bhatia. 2020. An Empirical Investigation towards Efficient Multi-Domain Language Model Pre-training. In EMNLP'20.
[3] Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, and M. Auli. 2019. Cloze-driven Pretraining of Self-attention Networks. In EMNLP'19.
[4] Iz Beltagy, Matthew E Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv:2004.05150 (2020).
[5] Christopher JC Burges. 2010. From ranknet to lambdarank to lambdamart: An overview. Learning (2010).
[6] Zhe Cao, Tao Qin, T. Liu, Ming-Feng Tsai, and H. Li. 2007. Learning to rank: from pairwise approach to listwise approach. In ICML'07.
[7] Wei-Cheng Chang, X Yu Felix, Yin-Wen Chang, Yiming Yang, and Sanjiv Kumar. 2019. Pre-training Tasks for Embedding-based Large-scale Retrieval. In ICLR'19.
[8] Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. 2012. Large-scale validation and analysis of interleaved search evaluation. TOIS (2012).
[9] O. Chapelle and Y. Zhang. 2009. A dynamic bayesian network click model for web search ranking. In WWW'09.
[10] J. Chen, Hande Dong, Xiao lei Wang, Fuli Feng, Ming-Chieh Wang, and X. He. 2020. Bias and Debias in Recommender System: A Survey and Future Directions. arXiv:2010.03240 (2020).
[11] Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, et al. 2020. Rethinking attention with performers. arXiv:2009.14794 (2020).
[12] William S Cooper, Fredric C Gey, and Daniel P Dabney. 1992. Probabilistic retrieval based on staged logistic regression. In SIGIR'92.
[13] J. Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT'19.
[14] Yoav Freund, Raj Iyer, Robert E Schapire, and Yoram Singer. 2003. An Efficient Boosting Algorithm for Combining Preferences. JMLR (2003).
[15] Yoav Freund and Robert E Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. JCSS (1997).
[16] Luyu Gao, Zhuyun Dai, and J. Callan. 2020. Modularized Transformer-based Ranking Framework. In EMNLP'20.
[17] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. Vol. 1. MIT Press, Cambridge.
[18] Jiafeng Guo, Yixing Fan, Qingyao Ai, and W Bruce Croft. 2016. A deep relevance matching model for ad-hoc retrieval. In CIKM'16.
[19] Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. arXiv:2004.10964 (2020).
[20] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531 (2015).
[21] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM'13.
[22] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In CVPR'18.
[23] K. Järvelin and Jaana Kekäläinen. 2017. IR evaluation methods for retrieving highly relevant documents. In SIGIR'17.
[24] Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Tuo Zhao. 2019. Smart: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization. arXiv:1911.03437 (2019).
[25] Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In SIGKDD'02.
[26] Omar Khattab and Matei Zaharia. 2020. Colbert: Efficient and effective passage search via contextualized late interaction over bert. In SIGIR'20.
[27] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2015).
[28] Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The efficient transformer. arXiv:2001.04451 (2020).
[29] Raghuraman Krishnamoorthi. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv:1806.08342 (2018).
[30] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. arXiv:1909.11942 (2019).
[31] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, D. Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics (2020).
[32] Ping Li, Qiang Wu, and Christopher Burges. 2007. McRank: Learning to Rank Using Multiple Classification and Gradient Boosting. NIPS'07 (2007).
[33] Jimmy Lin, Rodrigo Nogueira, and A. Yates. 2020. Pretrained Transformers for Text Ranking: BERT and Beyond. arXiv:2010.06467 (2020).
[34] Tie-Yan Liu. 2009. Learning to Rank for Information Retrieval. Foundations and Trends in Information Retrieval (2009).
[35] Yiding Liu, Weixue Lu, Suqi Cheng, Daiting Shi, Shuaiqiang Wang, Zhicong Cheng, and Dawei Yin. 2021. Pre-trained Language Model for Web-scale Retrieval in Baidu Search. In SIGKDD'21.
[36] Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, and Xueqi Cheng. 2020. PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval. arXiv:2010.10137 (2020).
[37] Oded Z Maimon and Lior Rokach. 2014. Data mining with decision trees: theory and applications. World Scientific.
[38] Ryan McDonald, George Brokos, and Ion Androutsopoulos. 2018. Deep Relevance Ranking Using Enhanced Document-Query Interactions. In EMNLP'18.
[39] Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT. arXiv:1901.04085 (2019).
[40] Rodrigo Nogueira, W. Yang, Kyunghyun Cho, and Jimmy Lin. 2019. Multi-Stage Document Ranking with BERT. arXiv:1910.14424 (2019).
[41] Morteza Mousa Pasandi, Mohsen Hajabdollahi, Nader Karimi, and Shadrokh Samavi. 2020. Modeling of Pruning Techniques for Deep Neural Networks Simplification. arXiv:2001.04062 (2020).
[42] Lorenzo Rosasco, Ernesto De Vito, Andrea Caponnetto, Michele Piana, and Alessandro Verri. 2004. Are loss functions all the same? Neural computation (2004).
[43] Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. 2014. A latent semantic model with convolutional-pooling structure for information retrieval. In CIKM'14.
[44] Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced Representation through Knowledge Integration. arXiv:1904.09223 (2019).
[45] Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. 2020. Efficient transformers: A survey. arXiv:2009.06732 (2020).
[46] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, L. Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NIPS'17.
[47] Sinong Wang, Belinda Li, Madian Khabsa, Han Fang, and Hao Ma. 2020. Linformer: Self-attention with linear complexity. arXiv:2006.04768 (2020).
[48] Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-end neural ad-hoc ranking with kernel pooling. In SIGIR'17.
[49] Z. Yang, Zihang Dai, Yiming Yang, J. Carbonell, R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS'19 (2019).
[50] Dawei Yin, Yuening Hu, Jiliang Tang, Tim Daly, Mianwei Zhou, Hua Ouyang, Jianhui Chen, Changsung Kang, Hongbo Deng, Chikashi Nobata, et al. 2016. Ranking relevance in yahoo search. In SIGKDD'16.
[51] Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. 2020. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In ICML'20.
[52] R. Zhang, Revanth Reddy Gangi Reddy, Md Arafat Sultan, V. Castelli, Anthony Ferritto, Radu Florian, Efsun Sarioglu Kayi, S. Roukos, A. Sil, and T. Ward. 2020. Multi-Stage Pre-training for Low-Resource Domain Adaptation. arXiv:2010.05904 (2020).
[53] Shiqi Zhao, H. Wang, Chao Li, T. Liu, and Y. Guan. 2011. Automatically Generating Questions from Queries for Community-based Question Answering. In IJCNLP'11.
[54] Xiangyu Zhao, Long Xia, Lixin Zou, Hui Liu, Dawei Yin, and Jiliang Tang. 2020. Whole-Chain Recommendations. In CIKM'20.
[55] Zhaohui Zheng, Keke Chen, Gordon Sun, and Hongyuan Zha. 2007. A regression framework for learning ranking functions using relative relevance judgments. In SIGIR'07.
[56] Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill Yuchen Lin, and Xiang Ren. 2020. Pre-training text-to-text transformers for concept-centric common sense. arXiv:2011.07956 (2020).
[57] Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. arXiv:1909.08593 (2019).
[58] Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, and Dawei Yin. 2019. Reinforcement learning to optimize long-term user engagement in recommender systems. In SIGKDD'19.
[59] Lixin Zou, Long Xia, Pan Du, Zhuo Zhang, Ting Bai, Weidong Liu, Jian-Yun Nie, and Dawei Yin. 2020. Pseudo Dyna-Q: A reinforcement learning framework for interactive recommendation. In WSDM'20.
[60] Lixin Zou, Long Xia, Yulong Gu, Xiangyu Zhao, Weidong Liu, Jimmy Xiangji Huang, and Dawei Yin. 2020. Neural Interactive Collaborative Filtering. In SIGIR'20.