Model Ensemble for Click Prediction in Bing Search Ads

Xiaoliang Ling, Weiwei Deng, Chen Gu (Microsoft Bing)
Hucheng Zhou, Cui Li (Microsoft Research); Feng Sun (Microsoft Bing)
No. 5 Dan Ling Street, Beijing, China

∗This work was done during her internship in Microsoft Research.

ABSTRACT

Accurate estimation of the click-through rate (CTR) in sponsored ads significantly impacts the user search experience and businesses' revenue; even a 0.1% accuracy improvement would yield additional earnings in the hundreds of millions of dollars. CTR prediction is generally formulated as a supervised classification problem. In this paper, we share our experience and learning on model ensemble design and our innovation. Specifically, we present 8 ensemble methods and evaluate them on our production data. Boosting neural networks with gradient boosting decision trees turns out to be the best. With larger training data, there is a nearly 0.9% AUC improvement in offline testing and significant click yield gains in online traffic. In addition, we share our experience and learning on improving the quality of training.

Keywords

click prediction; DNN; GBDT; model ensemble

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW 2017, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4914-7/17/04. http://dx.doi.org/10.1145/3041021.3054192

1. INTRODUCTION

Search engine advertising has become a significant element of the web browsing experience. Choosing the right ads for a query and the order in which they are displayed greatly affects the probability that a user will see and click on each ad. Accurately estimating the click-through rate (CTR) of ads [10, 16, 12] has a vital impact on the revenue of search businesses; even a 0.1% accuracy improvement in our production would yield hundreds of millions of dollars in additional earnings. An ad's CTR is usually modeled as a classification problem, and thus can be estimated by machine learning models. The training data is collected from historical ad impressions and the corresponding clicks. Because of its simplicity, scalability and online learning capability, logistic regression (LR) is the most widely used model and has been studied by Google [21], Facebook [14] and Yahoo! [3]. Recently, factorization machines (FMs) [24, 5, 18, 17], gradient boosting decision trees (GBDTs) [25] and deep neural networks (DNNs) [29] have also been evaluated and gradually adopted in industry.

A single model leads to suboptimal accuracy, and the above-mentioned models each have different advantages and disadvantages. They are usually ensembled together in an industry setting (or even in machine learning competitions such as Kaggle [15]) to achieve better prediction accuracy. For instance, app recommendation in Google adopts Wide&Deep [7], which co-trains LR (wide) and DNN (deep) together; ad CTR in Facebook [14] uses GBDT for non-linear feature transformation and feeds the transformed features to LR for the final prediction; Yandex [25] boosts LR with GBDT for CTR prediction; and there also exists work [29] on ads CTR that feeds the FM embedding learned from sparse features to a DNN. Simply replicating them does not yield the best possible level of accuracy. In this paper, we share our experience and learning on designing and optimizing model ensembles to improve the CTR prediction in Microsoft Bing Ads.

The challenge lies in the large design space: which models are ensembled together; which ensemble techniques are used; and which ensemble design would achieve the best accuracy? In this paper, we present 8 ensemble variants and evaluate them in our system. The ensemble that boosts the NN with GBDT, i.e., initializes the sample target for GBDT with the prediction score of the NN, is considered to be the best in our setting. With larger training data, it shows a nearly 0.9% AUC improvement in offline testing and significant click yield gains in online traffic. Pushing this new ensemble design into the system also brings system challenges for a fast and accurate trainer, considering that multiple models are trained and each trainer must have good scalability and accuracy. We share our experience with identifying accuracy-critical factors in training.

The rest of the paper is organized as follows. We first provide a brief primer on the ad system in Microsoft Bing Ads in Section 2. We then present several model ensemble designs in detail in Section 3, followed by the corresponding evaluation against production data. The means of improving model accuracy and system performance are described in Section 5. Related work is listed in Section 6 and we conclude in Section 7.

2. ADS CTR OVERVIEW

In this section, we give an overview of the ad system in Microsoft Bing Ads and of the basic models and features we use.

2.1 Ads System Overview

Sponsored search typically uses a keyword-based auction. Advertisers bid on a list of keywords for their ad campaigns. When a user searches with a query, the search engine matches the user query with bidding keywords, and then selects and shows proper ads to the user. When a user clicks any of the ads, the advertiser is charged a fee based on the generalized second price [2, 1]. A typical system involves several steps including selection, relevance filtration, CTR prediction, ranking and allocation. The input query from the user is first used to retrieve a list of candidate ads (selection). Specifically, the selection system parses the query, expands it to relevant ad keywords and then retrieves the ads from advertisers' campaigns according to their bidding keywords. For each selected ad candidate, a relevance model estimates the relevance score between query and ad, and further filters out the least relevant ones (relevance filtration). For the remaining ads, the click model predicts the click probability (pClick) given the query and context information (click prediction). In addition, a ranking score is calculated for each ad candidate as bid * pClick, where bid is the corresponding bidding price. These candidates are then sorted by their ranking score (ranking). Finally, the top ads with a ranking score larger than the given threshold are allocated for impression (allocation), such that the number of impressions is limited by the total available slots. The click probability is thus a key factor used to rank the ads in appropriate order, place the ads in different locations on the page, and even determine the price that will be charged to the advertiser if a click occurs. Therefore, ad click prediction is a core component of the sponsored search system.

2.2 Models

Consider a training data set $D = \{(x_i, y_i)\}$ with $n$ examples (i.e., $|D| = n$), where each sample has $m$ features $x_i \in \mathbb{R}^m$ with observed label $y_i \in \{0, 1\}$. We formulate click prediction as a supervised learning problem, and binary classification models are often used for click probability estimation $p(\mathit{click} = 1 \mid \mathit{user}, \mathit{query}, \mathit{ad})$. Given the observed label $y \in \{0, 1\}$, the prediction $p$ incurs the LogLoss (logistic loss), given as:

$$\ell(p) = -y \cdot \log p - (1 - y) \cdot \log(1 - p) \qquad (1)$$

Figure 1: Graphical illustration of basic models: LR, DNN and GBDT.

Deep Neural Network. DNN generalizes to previously unseen query-ad feature pairs by learning a low-dimensional dense embedding vector for both query and ad features, with less burden of feature engineering. The middle model in Figure 1 depicts four layers of the DNN structure, including two hidden layers each with $u$ neuron units, one input layer with $m$ features and one output layer with one single output. With a top-down description, the output unit is a real number $p \in (0, 1)$ as the predicted CTR with $p = \sigma(w_2 \cdot x_2 + b_2)$, where $\sigma(a) = \frac{1}{1 + \exp(-a)}$ is the logistic activation function. $w_2 \in \mathbb{R}^{1 \times u}$ is the parameter matrix between the output layer and the connected hidden layer, and $b_2 \in \mathbb{R}$ is the bias. $x_2 \in \mathbb{R}^u$ is the activation output of the last hidden layer, computed as $x_2 = \sigma(w_1 \cdot x_1 + b_1)$, where $w_1 \in \mathbb{R}^{u \times u}$, $b_1 \in \mathbb{R}^u$, $x_1 \in \mathbb{R}^u$. Similarly, $x_1 = \sigma(w_0 \cdot x_0 + b_0)$, where $w_0 \in \mathbb{R}^{u \times m}$, $b_0 \in \mathbb{R}^u$ and $x_0 \in \mathbb{R}^m$ is the input sample. Different hidden layers can be regarded as different internal functions capturing different forms of representations of a data instance. Compared with the linear model, DNN is thus better at capturing intrinsic data patterns, which leads to better generalization. The sigmoid activation can be replaced by a tanh or ReLU [19] function.

2.3 Training data

The training data is collected from an ad impressions log, in which each sample $(x_i, y_i)$ represents whether or not the impressed ad allocated by the ad system has been clicked by a user. The output variable $y_i$ is 1 if the ad has been clicked, and 0 otherwise. The input features $x_i$ come from different sources that describe different domains of an impression. 1) Query features, which include query term, query classification, query length, etc.