New Analytics Paradigms in Online Advertising and Fantasy Sports

Raghav Singal

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2020

© 2020

Raghav Singal

All Rights Reserved

Abstract

New Analytics Paradigms in Online Advertising and Fantasy Sports

Raghav Singal

Over the last two decades, digitization has been drastically shifting the way businesses operate and has provided access to high volume, variety, velocity, and veracity data. Naturally, access to such granular data has opened a wider range of possibilities than previously available. We leverage such data to develop application-driven models in order to evaluate current systems and make better decisions. We explore three application areas. In Chapter 1, we develop models and algorithms to optimize portfolios in daily fantasy sports (DFS). We use opponent-level data to predict the behavior of fantasy players via a Dirichlet-multinomial process, and our predictions feed into a novel portfolio construction model. The model is solved via a sequence of binary quadratic programs, motivated by its connection to outperforming stochastic benchmarks, the submodularity of the objective function, and the theory of order statistics. In addition to providing theoretical guarantees, we demonstrate the value of our framework by participating in DFS contests. In Chapter 2, we develop an axiomatic framework for attribution in online advertising, i.e., assessing the contribution of individual ads to product purchase. Leveraging a user-level dataset, we propose a Markovian model to explain user behavior as a function of the ads she is exposed to. We use our model to illustrate limitations of existing heuristics and propose an original framework for attribution, which is motivated by causality and game theory. Furthermore, we establish that our framework coincides with an adjusted “unique-uniform” attribution scheme. This scheme is efficiently implementable and can be interpreted as a correction to the commonly used uniform attribution scheme. We supplement our theory with numerics using a real-world large-scale dataset. In Chapter 3, we propose a decision-making algorithm for personalized sequential marketing. As in attribution, using a user-level dataset, we propose a state-based model to capture user behavior as a function of the ad interventions. In contrast with existing approaches that model only the myopic value of an intervention, we also model the long-run value. The objective of the firm is to maximize the probability of purchase and a key challenge it faces is the lack of understanding of the state-specific effects of interventions. We propose a model-free learning algorithm for decision-making in such a setting. Our algorithm inherits the simplicity of Thompson sampling for a multi-armed bandit setting and we prove its asymptotic optimality. We supplement our theory with numerics on an email marketing dataset.

Table of Contents

List of Figures ...... iv

List of Tables ...... ix

Acknowledgments ...... x

Introduction ...... 1

Chapter 1: How to Play Fantasy Sports Strategically (and Win) ...... 4
1.1 Introduction ...... 4
1.2 Problem Formulation ...... 10
1.3 Modeling Opponents’ Team Selections ...... 13
1.4 Solving the Double-Up Problem ...... 19
1.5 Solving the Top-Heavy Problem ...... 23
1.6 Numerical Experiments ...... 30
1.7 The Value of Modeling Opponents, Insider Trading, and Collusion ...... 37
1.8 Conclusions and Further Research ...... 45

Chapter 2: Shapley Meets Uniform: An Axiomatic Framework for Attribution in Online Advertising ...... 48
2.1 Introduction ...... 48
2.2 Model ...... 55
2.3 Current Attribution Approaches and Their Limitations ...... 59
2.4 Shapley Value (SV) ...... 64
2.5 CASV for the Markov Chain Model ...... 74
2.6 Numerical Experiments ...... 79
2.7 Value of State-Specific Attribution ...... 87
2.8 Conclusions and Further Research ...... 94

Chapter 3: Beyond Myopia: Model-Free Approximate Bayesian Learning for Conversion Funnel Optimization ...... 97
3.1 Introduction ...... 98
3.2 Consumer Model and Conversion Funnel Optimization ...... 107
3.3 Model-Free Approximate Bayesian Learning ...... 113
3.4 Properties of MFABL ...... 118
3.5 Model Extensions ...... 123
3.6 Numerical Experiments ...... 126
3.7 Value of Prior Information, Concept Shift, and Covariate Shift ...... 142
3.8 Conclusions and Further Research ...... 147

Concluding Remarks ...... 151

References ...... 153

Appendix A: Additional Details for Chapter 1 ...... 168
A.1 Further Technical Details ...... 168
A.2 Parimutuel Betting Markets and Their Relation to DFS ...... 185
A.3 Further Details of Numerical Experiments ...... 194

Appendix B: Additional Details for Chapter 2 ...... 216
B.1 Coalition Value ...... 216
B.2 Proof of Theorem 2.1 ...... 218
B.3 Sensitivity of CASV Due to Finite Size of Data ...... 221
B.4 Proofs of Theorem 2.3 and Proposition 2.2 ...... 224
B.5 Counterfactuals: Local, Global, and Forward-Looking ...... 225

Appendix C: Additional Details for Chapter 3 ...... 230

C.1 Proof of Theorem 3.1 ...... 230
C.2 Algorithms for Model Extensions ...... 237

List of Figures

1.1 Cumulative realized dollar P&L for the strategic and benchmark models across the three contest structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. ...... 31

1.2 P&L distribution for the diversification strategy for the strategic and benchmark portfolios for week 10 contests of the 2017 NFL season. Recall N = 50, 25, and 10 for top-heavy, quintuple-up, and double-up, respectively. The three metrics at the top of each image are the expected P&L, the standard deviation of the P&L, and the probability of loss, that is, P(P&L < 0). ...... 35

1.3 P&L distribution for the replication strategy for the strategic and benchmark portfolios for week 10 contests of the 2017 NFL season. Recall N = 50, 25, and 10 for top-heavy, quintuple-up, and double-up, respectively. The three metrics at the top of each image are the expected P&L, the standard deviation of the P&L, and the probability of loss, that is, P(P&L < 0). ...... 35

1.4 Predicted and realized cumulative P&L for the strategic and benchmark models across the three contest structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. The realized cumulative P&Ls are displayed as points. ...... 36

1.5 Weekly expected dollar P&L for the strategic model (N = 50) with and without inside information p in the top-heavy series...... 41

2.1 Network for Examples 2.1, 2.2, and 2.4. The action space consists of two actions: no-ad action (a = 1) and ad action (a = 2). The solid blue lines show transitions corresponding to the ad action. We assume the no-ad action in every state leads to a transition to the quit state w.p. 1. The advertiser takes the ad action at all states w.p. 1. ...... 61

2.2 Network for Example 2.3. The action space consists of two actions: no-ad action (a = 1) and ad action (a = 2). The solid blue lines show transitions corresponding to the ad action. The no-ad action in every state leads to a transition to the quit state w.p. 1. The advertiser takes the ad action at all states w.p. 1. ...... 62

2.3 Network for Example 2.5. The action space consists of two actions: no-ad action (a = 1) and ad action (a = 2). Solid blue (dashed red) lines denote transitions for the ad action (no-ad action). The advertiser takes the ad action at state 1 w.p. 1, i.e., $\beta_1^2 = 1$ and $\beta_1^1 = 0$. ...... 68

2.4 Attributions to different actions under various schemes with τ = 10. We report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \pi_s^a / \sum_{(s',a')} \pi_{s'}^{a'}$ for each $a \in \mathcal{A}$. ...... 85

2.5 Attributions to state-action pairs under various schemes with τ = 10. We report the attributions $\pi_s^a$ for all $(s, a) \in \mathcal{S} \times \mathcal{A}$. The four colors correspond to the four states in $\mathcal{S}$, arranged in the natural order (unaware, aware, interest, desire). ...... 86

2.6 Attributions to different actions under various schemes with τ = 7. We report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \pi_s^a / \sum_{(s',a')} \pi_{s'}^{a'}$ for each $a \in \mathcal{A}$. ...... 88

2.7 Attributions to different actions under various schemes with τ = 14. We report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \pi_s^a / \sum_{(s',a')} \pi_{s'}^{a'}$ for each $a \in \mathcal{A}$. ...... 89

2.8 Value of state-specific attribution on the simulated dataset. For the state-specific model, we report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \psi_s^{a,\mathrm{Shap}} / \sum_{(s',a')} \psi_{s'}^{a',\mathrm{Shap}}$ for each $a \in \mathcal{A}$. For the aggregated model, we report $\bar{\psi}^{a,\mathrm{Shap}}$ for each $a \in \mathcal{A}$. ...... 94

3.1 Sequence of emails received by one of the authors after providing his email to Netflix (but not subscribing to the membership). The emails were sent with a 15-20 day gap in between (May 3, May 18, and June 8) and the contents of each email were unique. The subject lines of the three emails were “Movies & TV shows your way”, “Watch TV shows & movies anytime, anywhere”, and “Netflix - something for everyone”, respectively. ...... 99

3.2 Consumer journey as a function of her interactions with the firm. On average, out of 100 consumers who sign up, around 65, 30, and 2 receive, open, and click an email, respectively. Around 1.5 convert. As a consumer interacts more with the firm, her eventual conversion probability increases from 1.5% to 7%. ...... 129

3.3 Benchmarking MFABL / MFABL2 with model-free algorithms. ...... 138

3.4 Benchmarking MFABL2 with model-based algorithms. ...... 139

3.5 Summarizing the various algorithms across two dimensions: (1) performance and (2) scalability. The scale of the y-axis (performance) is consistent with the performance ratios in Figures 3.3(a) and 3.4(a). The x-axis (scalability) is partitioned into 3 levels as follows. PSRL is least scalable since it stores the entire transition structure and solves an MDP per consumer (i.e., N MDPs across all consumers) whereas ETO stores the entire transition structure but only solves one MDP (across all consumers). All other algorithms are model-free and exhibit similar scalability. ...... 140

3.6 Sensitivity of MFABL2 and QL with respect to ε. (Note that QL with ε = 0 is equivalent to the random policy.) ...... 141

3.7 Sensitivity of mETO and ETO with respect to N1. ...... 142

3.8 Performance ratios corresponding to perturbations in the transition probabilities. ...... 143

3.9 Value of prior information corresponding to MFABL2. ...... 144

3.10 Performance ratio for concept shift corresponding to various values of N1 with N = 10, 000...... 145

3.11 Regret for concept shift corresponding to various values of N1 with N = 10, 000. . . 146

3.12 Illustration of concept shift for N1 = 50,000 with N = 100,000. ...... 146

3.13 Covariate shift illustration. ...... 148

A.1 Payoffs corresponding to ranks 1 to 8 of the week 10 top-heavy contest we participated in during the 2017 NFL season. We note the payoff structure of the top-heavy contests in other weeks was very similar and we do not show it here for the sake of brevity. ...... 179

A.2 Predicted and realized QB ownerships ($p_{\text{QB}}$) for week 10 contests of the 2017 NFL season. ...... 205

A.3 Highlighting instances where the Dirichlet regression either under-predicted or over-predicted ownerships of some athletes (“before”) and what would have happened (“after”) if we had (a) reacted to breaking news or (b) access to better quality features that accounted for historical factors such as Brady’s poor track record in Miami. ...... 207

A.4 Predicted and realized portfolio points total of various opponent ranks for (a) the top-heavy contest in week 10 and (b) all weeks of the double-up series during the 2017 NFL season. For double-up, the rank of interest for each week was around 13,000 and the number of opponents was around 30,000. ...... 208

A.5 Screenshot of a web-page from FanDuel.com when we click on an opponent lineup. The lineup has 9 athletes. For each athlete selected in the lineup, we can observe the realized positional marginal of that athlete in the underlying contest. For example, in the contest corresponding to this screenshot, the realized positional marginal of Matthew Stafford equals 8.9%. ...... 210

B.1 The action space consists of two actions: no-ad-action (a = 1) and ad-action (a = 2). Solid blue (dashed red) lines denote transitions if the ad-action (no-ad-action) is taken. The advertiser takes the ad-action at state 1 w.p. 1 and takes the no-ad-action in state 2 w.p. 1, i.e., $\beta_1^2 = \beta_2^1 = 1$. ...... 217

B.2 Summary of the sensitivity experiment. Each column corresponds to an ad action and each row corresponds to a different value of D. The dotted red line denotes the true CASV and the histogram denotes the distribution of the estimator. The solid black line equals the empirical mean of the distribution. (We report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \psi_s^a / \sum_{(s',a')} \psi_{s'}^{a'}$ for each $a \in \mathcal{A}$.) ...... 223

B.3 Network for the proof of Proposition 2.2. The action space consists of three actions: show no-ad, show ad 1, and show ad 2. Solid blue (dashed brown) lines denote transitions if ad 1 (ad 2) is shown. At state 1, taking action 1 moves the traffic to state 2 w.p. 1 whereas taking action 2 directs the traffic to the quit state. At state s ∈ {2, ..., m}, taking action 1 results in a transition to the quit state whereas action 2 moves the users to the “next” state. For brevity, we do not show the transitions if an ad is not shown (to the quit state w.p. 1). The advertiser shows ad 1 at state 1 w.p. 1 and ad 2 at all other states w.p. 1. ...... 225

B.4 There are two days (day A and day B) and the state equals (day, number of ads seen so far). For notational simplicity, we use “A0” to denote state (A, 0), “B0” to denote state (B, 0), and “B1” to denote state (B, 1). The action space consists of two actions: no-ad-action (a = 1) and ad-action (a = 2). Solid blue (dashed red) lines denote transitions if the ad-action (no-ad-action) is taken. The advertiser takes the ad-action on both days w.p. 1. The user’s decision to make a purchase is made after the second day and she converts if and only if she sees at least 1 ad in total. ...... 227

List of Tables

1.1 Total expected dollar P&L (over 17 weeks), average weekly Sortino ratio, and average weekly probability of loss related to the top-heavy contests for both the non-colluding (“NC”) and colluding (“C”) portfolios with $E_{\max} = 50$ and $N_{\text{collude}} \in \{1, \ldots, 5\}$. The average weekly Sortino ratio is simply the average of the weekly Sortino ratios, $\mathrm{SR}_i$ for $i = 1, \ldots, 17$. Specifically, $\mathrm{SR}_i := (\mathbb{E}[\mathrm{P\&L}_i] - T)/\mathrm{DR}_i$, where $\mathbb{E}[\mathrm{P\&L}_i]$ denotes the expected P&L for week $i$, $T$ denotes the target P&L which we set to 0, and $\mathrm{DR}_i := \sqrt{\mathbb{E}\big[\mathrm{P\&L}_i^2 \, \mathbb{1}\{\mathrm{P\&L}_i \le T\}\big]}$ denotes the downside risk for week $i$. (The expected P&L is rounded to the nearest integer whereas the Sortino ratio and probability of loss are rounded to two decimal places.) ...... 44

3.1 Classification of the related marketing literature into four sets. Most of the traditional work belongs to set 1 and there has been noticeable progress in sets 2 and 3 over the last two decades. We focus on set 4, which is a relatively unexplored space in marketing. ...... 102

3.2 Classification of the benchmark algorithms into the four sets we introduced in our literature review. ...... 135

A.1 Cumulative realized dollar P&L for the strategic and benchmark models across the three contest structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. We invested $50 per week per model in the top-heavy series with each entry costing $1. In quintuple-up the numbers were $50 per week per model with each entry costing $2 and in double-up we invested $20 per week per model with each entry costing $2. (We were unable to participate in the quintuple-up contest in week 1 due to logistical reasons.) ...... 199

A.2 Performance of each week’s best realized entry for the strategic and benchmark models corresponding to the top-heavy contests for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. ...... 200

A.3 Various statistics of interest for the ex-ante optimal entry $w_1^*$ of the strategic model across all three reward structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. Mean and StDev refer to the expected fantasy points and its standard deviation. (We were unable to participate in the quintuple-up contest in week 1 due to logistical reasons.) ...... 201

A.4 Mean fantasy points and its standard deviation for the first optimal entry $w_1^*$ of the benchmark model across all three reward structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. (We were unable to participate in the quintuple-up contest in week 1 due to logistical reasons.) ...... 202

A.5 Characteristics of the QB picked by the best performing entry of the strategic and benchmark models in the top-heavy contests for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. ...... 203

A.6 Posterior predictive test summary statistic for each variation (denoted by V1, V2, and V3) of the Dirichlet regression model across all reward structures corresponding to the QB, RB, and WR positions. ...... 212

A.7 Bayesian p-values for the test statistic “most-picked athlete” for each variation of the Dirichlet regression model and each week corresponding to the QB, RB, and WR positions in the top-heavy reward structure. ...... 214

A.8 Bayesian p-values for the test statistic “most-picked athlete” for each variation of the Dirichlet regression model and each week corresponding to the QB, RB, and WR positions in the double-up reward structure. ...... 214

A.9 Comparing the three variations of the Dirichlet regression model using normalized cross-validation scores for each position and each reward structure. ...... 215

Acknowledgements

The past few years have been some of the best in my academic life, thanks to the wonderful people I have had the pleasure to interact with. First, I would like to thank my advisors: Garud, Omar, and Vineet. Garud, thank you for supporting me throughout and giving me the freedom to pursue my varied interests. Omar, thank you for always pushing me to pursue research that matters. Your critical evaluation has been very valuable. Vineet, thank you for being patient with me and providing me useful advice whenever needed. I would also like to thank Martin for advising me in the initial years and teaching me the importance of rigorous thinking. Professor Bienstock, thank you for being on my dissertation committee and helping me to develop as a teacher. I consider myself fortunate to have had the opportunity to work with and learn from all of you. A core part of my doctoral studies has been learning and I am grateful to all the instructors who taught me invaluable lessons. Thank you Dan for teaching me the fundamentals of reinforcement learning and guiding me on our work. Special thanks to Professors Andrew Gelman and John Paisley for their lectures on Bayesian thinking. I would also like to thank all the people who helped me prepare for the job market. Adam, Carri, Omar, Shipra, and Vineet, thank you for the mock interviews. Antoine, Auyon, and Tim, thank you for being there for me. Antoine, it was a pleasure working with you. I very much enjoyed the vibrant community at Columbia and I would like to thank the IEOR staff for creating such an experience. Lizbeth, thank you for your open-door policy. Kristen, thank you for all the positivity. Jaya and Shi, thank you for keeping my bank balance positive. Carmen, Chinnell, Gerald, Jenny, Krupa, Lola, Mercedes, Mindi, Raven, and Yosimir, thank you for all your help and the hallway conversations. I appreciate the work all of you put behind in organizing events year-round – they added intangible value to my PhD life. I also thank the staff at CUIT for their support on cluster computing. Special thanks to the CKGSB fellowship for funding me. Like most PhD journeys, mine has been a rollercoaster too. I thank all my friends for keeping me sane throughout the ride. Goutam, thank you for your friendship since day one. Jalaj, thank you for mentoring me. Allen, thank you for the study sessions. Omar and Rajan, thank you for the job

market gossip. Coming to the office always felt homely, thanks to the IEOR PhD family. Special thanks to Ana for being my study buddy, Bethany for being my running partner, undergraduate friends for the annual getaway, and friends in India for everlasting friendships. I also thank the I-House community for an invaluable experience. Mom and Dad, thank you for being my backbone. Manu, thank you for being a role model. To my family in Canada and USA, thank you for making me feel at home since I moved to this continent. This work would not have been possible without you all.

Introduction

This thesis focuses on application-driven developments in the data-rich new economy. Over the last two decades, digitization has been drastically shifting the way businesses operate and has provided access to high volume, variety, velocity, and veracity data. Naturally, access to such granular data has opened a wider range of possibilities than previously available. In this thesis, we leverage such data to develop application-driven models in order to evaluate current systems and make better decisions.

In Chapter 1, we develop models and algorithms to optimize portfolios in daily fantasy sports (DFS). DFS is a multi-billion dollar industry with millions of annual users and widespread appeal among sports fans across a broad range of popular sports. In this chapter, building on the recent work of Hunter et al. 2016, we provide a coherent framework for constructing DFS portfolios where we explicitly model the behavior of other DFS players. We formulate an optimization problem that accurately describes the DFS problem for a risk-neutral decision-maker in both double-up and top-heavy payoff settings. Our formulation maximizes the expected reward subject to feasibility constraints and we relate this formulation to mean-variance optimization and the outperformance of stochastic benchmarks. Using this connection, we show how the problem can be reduced to the problem of solving a series of binary quadratic programs. We also propose an algorithm for solving the problem where the decision-maker can submit multiple entries to the DFS contest. This algorithm is motivated by submodularity properties of the objective function and by some new results on parimutuel betting. One of the contributions of our work is the introduction of a Dirichlet-multinomial data generating process for modeling opponents’ team selections and we estimate the parameters of this model via Dirichlet regressions. A further benefit to modeling opponents’ team selections is that it enables us to estimate the value in a DFS setting of both insider trading and collusion. We demonstrate the value of our framework by applying it to DFS contests during the 2017 NFL season. A preliminary version of this work was a finalist in the Sloan Sports Analytics Conference (2018) and the final version appeared as an article in Management Science (Haugh and Singal 2020).

In Chapter 2, we tackle one of the central challenges in online advertising: attribution. Simply put, we assess the contribution of individual advertiser actions such as e-mails, display ads, and search ads to eventual conversion (purchase). Several heuristics are used for attribution in practice; however, most do not have any formal justification. The main contribution in this chapter is to propose an axiomatic framework for attribution in online advertising. We show that the most common heuristics can be cast under the framework and illustrate how these may fail. We propose a novel attribution metric, which we refer to as counterfactual adjusted Shapley value (CASV), which inherits the desirable properties of the traditional Shapley value while overcoming its shortcomings in the online advertising context. We also propose a Markovian model for the user journey through the conversion funnel, in which ad actions may have disparate impacts at different stages. We use the Markovian model to compare our metric with commonly used metrics. Furthermore, under the Markovian model, we establish that the CASV metric coincides with an adjusted “unique-uniform” attribution scheme. This scheme is efficiently implementable, and can be interpreted as a correction to the commonly used uniform attribution scheme. We supplement our theoretical developments with numerical experiments using a real-world large-scale dataset. A preliminary version of this work appeared in the WWW conference (Singal et al. 2019) and the current version is under revision at Management Science.

In Chapter 3, we study the problem of optimal sequential personalized interventions from the point-of-view of a firm promoting a product under the Markovian model proposed in Chapter 2. Our model captures the state of each consumer (interaction history with the firm, for example) and allows the consumer behavior to vary as a function of both her state and the firm’s interventions. In contrast with existing approaches that model only the myopic value of an intervention, we also model the long-run value by allowing the firm to make sequential interventions to the same consumer. The objective of the firm is to maximize the probability of conversion (the consumer buying the product) and a key challenge is that the firm does not know the state-specific effects of various interventions. To help make personalized intervention decisions, we propose a decision-making algorithm, which we call model-free approximate Bayesian learning. Our algorithm inherits the simplicity

of Thompson sampling for a multi-armed bandit setting and maintains an approximate belief over the value (myopic plus long-run) of each state-specific intervention. The belief is updated as the algorithm interacts with the consumers. Despite being an approximation to the Bayes update, we prove the asymptotic optimality of our algorithm. We supplement our theory with numerics on a real-world large-scale dataset, where we show the dominance of our algorithm over traditional approaches that are myopic or estimation-based. Furthermore, in contrast to the estimation-based approaches, our algorithm is able to adapt automatically to the underlying changes in consumer behavior (concept shift) and maintains a high level of uncertainty on the value of less explored consumer segments (covariate shift). Intuitively, one expects the value attributed to an advertising action (Chapter 2) to be connected to the decision-making problem we tackle in Chapter 3. We discuss this at the end of Chapter 3.

In addition to the application-driven work, we also pursued some theoretical research in reinforcement learning. Though not a part of this dissertation, an initial version of that work appeared in COLT (Bhandari et al. 2018) and the final version has been accepted to Operations Research.

Chapter 1: How to Play Fantasy Sports Strategically (and Win)

Daily fantasy sports (DFS) is a multi-billion dollar industry with millions of annual users and widespread appeal among sports fans across a broad range of popular sports. In this chapter, building on the recent work of Hunter et al. 2016, we provide a coherent framework for constructing DFS portfolios where we explicitly model the behavior of other DFS players. We formulate an optimization problem that accurately describes the DFS problem for a risk-neutral decision-maker in both double-up and top-heavy payoff settings. Our formulation maximizes the expected reward subject to feasibility constraints and we relate this formulation to mean-variance optimization and the outperformance of stochastic benchmarks. Using this connection, we show how the problem can be reduced to the problem of solving a series of binary quadratic programs. We also propose an algorithm for solving the problem where the decision-maker can submit multiple entries to the DFS contest. This algorithm is motivated by submodularity properties of the objective function and by some new results on parimutuel betting. One of the contributions of our work is the introduction of a Dirichlet-multinomial data generating process for modeling opponents’ team selections and we estimate the parameters of this model via Dirichlet regressions. A further benefit to modeling opponents’ team selections is that it enables us to estimate the value in a DFS setting of both insider trading and collusion. We demonstrate the value of our framework by applying it to DFS contests during the 2017 NFL season. A preliminary version of this work was a finalist in the Sloan Sports Analytics Conference (2018) and the final version appeared as an article in Management Science (Haugh and Singal 2020).

1.1 Introduction

Daily fantasy sports (DFS) has become a multi-billion dollar industry (Anderton 2016; Kolodny 2015; O’Keeffe 2015; Wong 2015; Woodward 2016) with millions of annual users (FSTA 2015; Wong 2015). The pervasiveness of fantasy sports in modern popular culture is reflected by the

regular appearance of articles discussing fantasy sports issues in the mainstream media. Moreover, major industry developments and scandals are now capable of making headline news (Drape and Williams 2015b; Johnson 2016). The two major DFS websites are FanDuel and DraftKings and

together they control approximately 95% of the U.S. market (Kolodny 2015; O’Keeffe 2015). Approximately 80% of DFS players have been classified as minnows (Pramuk 2015) as they are not believed to use sophisticated techniques for decision-making and portfolio construction. Accordingly, these users provide financial opportunities to the so-called sharks who do use sophisticated techniques (Harwell 2015; Mulshine 2015; Pramuk 2015; Woodward 2015) when constructing their fantasy sports portfolios. The goal of this chapter is to provide a coherent framework for constructing fantasy sports portfolios where we explicitly model the behavior of other DFS players. Our approach is therefore strategic and, to the best of our knowledge, we are the first academic work to develop such an approach in the context of fantasy sports. The number of competitors in a typical DFS contest might range from two to hundreds of thousands with each competitor constructing a fantasy team of real-world athletes, e.g. National Football League (NFL) players in a fantasy football contest, with each portfolio being subject to budget and possibly other constraints. The performance of each portfolio is determined by the performances of the real-world athletes in a series of actual games, e.g. the series of NFL games in a given week. The competitors with the best performing entries then earn a monetary reward, which depends on the specific payoff structure, e.g. double-up or top-heavy,1 of the DFS contest.

Several papers have already been written on the topic of fantasy sports. For example, Fry et al. 2007 and Becker and Sun 2016 develop models for season-long fantasy contests while Bergman and Imbrogno 2017 propose strategies for the survivor pool contest, which is also a season-long event. Multiple papers have been written of course on so-called office pools (which pre-date fantasy sports contests) where the goal is to predict the maximum number of game winners in an upcoming elimination tournament such as the March Madness NCAA college basketball tournament.

1Loosely speaking, in a double-up contest a player doubles her money if her entry is among the top 50% of submitted entries. In a top-heavy contest, the rewards are skewed towards the very best performing entries and often decrease rapidly in the rank of the entry. See Section 1.2 for further details.

Examples of this work include Kaplan and Garstka 2001 and Clair and Letscher 2007. There has been relatively little work, however, on the problem of constructing portfolios for daily fantasy sports. One notable exception is the recent work of Hunter et al. 2016, which is closest to the work we present in this chapter. They consider a winner-takes-all payoff structure and aim to maximize the probability that one of their portfolios (out of a total of N) wins. Their approach is a greedy heuristic that maximizes their portfolio means, that is, expected number of fantasy points, subject to constraints that lower bound their portfolio variances and upper bound their inter-portfolio correlations. Technically, their framework requires the solution of linear integer programs and they apply their methodology to fantasy sports contests which are top-heavy in their payoff structure as opposed to winner-takes-all. Their work has received considerable attention, e.g. Davis 2017, and the authors report earning2 significant sums in real fantasy sports contests based on the National Hockey League (NHL) and Major League Baseball (MLB).

There are several directions for potential improvement, however, and they are the focus of the work in this chapter. First, Hunter et al. 2016 do not consider their opponents’ behavior. In particular, they do not account for the fact that the payoff thresholds are stochastic and depend on both the performances of the real-world athletes as well as the unknown team selections of their fellow fantasy sports competitors. Second, their framework is only suitable for contests with the top-heavy payoff structure and is in general not suitable for the double-up payoff structure. Third, their approach is based on (approximately) optimizing for the winner-takes-all payoff, which is only a rough approximation to the top-heavy contests they ultimately target. In contrast, we directly model the true payoff structure (top-heavy or double-up) and seek to optimize our portfolios with this objective in mind.

Our work makes several contributions to the DFS literature. First, we formulate an optimization problem that accurately describes the DFS problem for a risk-neutral decision-maker in both double-up and top-heavy settings. Our formulation seeks to maximize the expected reward subject to portfolio feasibility constraints and we explicitly account for our opponents’ unknown portfolio

2They donated their earnings to charity and we have done likewise with our earnings from playing DFS competitions during the 2017 NFL season. The results of these real-world numerical experiments are described in Section 1.6.

choices in our formulation. Second, we connect our problem formulation to the finance literature on mean-variance optimization and in particular, the mean-variance literature on outperforming stochastic benchmarks. Using this connection, we show how our problems can be reduced (via some simple assumptions and results from the theory of order statistics) to the problem of solving a series of binary quadratic programs. The third contribution of our work is the introduction of a Dirichlet-multinomial data generating process for modeling opponents’ team selections. We estimate the parameters of this model via Dirichlet regressions and we demonstrate its value in predicting opponents’ portfolio choices.

We also show the DFS objective function for the problem with multiple entries is monotone submodular under certain conditions that often apply approximately in practice. A classic result (Nemhauser et al. 1978) on submodular maximization then suggests that a greedy algorithm should perform very well on this problem. Unfortunately, it’s not possible to implement this greedy algorithm and so instead we propose a modified version of it that we can implement. Further support for our algorithm is provided by some new results for the optimization of wagers in a parimutuel contest. Such a contest can be viewed as a special case of a DFS contest albeit with some important differences. Parimutuel betting in the horse-racing industry has long been a topic of independent interest in its own right, particularly in economics (Bayraktar and Munk 2017; Plott et al. 2003; Terrell and Farmer 1996; Thaler and Ziemba 1988), where it has been used to test theories related to market efficiency and information aggregation.

We demonstrate the value of our framework by applying it to both double-up and top-heavy DFS contests in the 2017 NFL season. Despite the fact that DFS contests have a negative net present value (NPV) on average (due to the substantial cut taken by the major DFS websites), we succeeded in earning a net profit over the course of the season. That said, model performance in DFS contests based on a single NFL season has an inherently high variance and so it is difficult to draw meaningful empirical conclusions from just one NFL season. Indeed other sports (baseball, ice hockey, basketball etc.) should have a much lower variance and we believe our approach is particularly suited to these sports.

We also use our model to estimate the value of “insider trading”, where an insider, e.g. an employee of the DFS contest organizers, gets to see information on opponents’ portfolio choices before making his own team selections. This has been a topic of considerable recent media interest (Drape and Williams 2015a; Drape and Williams 2015b) which was sparked by the case of a DraftKings employee who won $350,000 in a FanDuel DFS contest by using data from similar DraftKings contests to construct his entries. This problem of insider trading is of course also related to the well known value-of-information concept from decision analysis. While insider trading does result in an increase in expected profits, the benefits of insider trading are mitigated by superior modeling of opponents’ team selections. This is not surprising: if we can accurately predict the distribution of opponents’ team selection, then insider information on the composition of these portfolios will be less valuable.

It is also straightforward in our framework to study the benefits of a stylized form of collusion in DFS contests. Specifically, we consider the case where a number $N_{\text{collude}}$ of DFS players combine to construct a single portfolio of $N_{\text{collude}} \times E_{\max}$ entries for a given contest, where $E_{\max}$ is the maximum number of permitted entries per DFS player. In contrast, we assume that non-colluders choose identical portfolios of $E_{\max}$ entries. We show the benefits of this type of collusion can be surprisingly large in top-heavy contests. This benefit is actually twofold in that colluding can simultaneously result in a significant increase in the total expected payoff and a significant reduction in the downside risk of the payoff. In practice, however, it’s highly unlikely that non-colluding players will choose identical portfolios and so we argue that the benefits of collusion to a risk-neutral player are likely to be quite small.

Beyond proposing a modeling framework for identifying how to construct DFS portfolios, our work also has other implications. To begin with, it should be clear from our general problem formulation and solution approach that high levels of “skill” are required to play fantasy sports successfully. But this is not necessarily in the interest of the fantasy sports industry. In order to maintain popular interest (and resulting profit margins), the industry does not want the role of skill to be too great. Indeed a recent report from McKinsey & Company (Miller and Singer 2015)

on fantasy sports makes precisely this point arguing, for example, that chess is a high-skill and deterministic game, which is why it is rarely played for money. In contrast, while clearly a game of high skill, poker also has a high degree of randomness to the point that amateur players often beat professionals in poker tournaments. It is not surprising then that poker is very popular and typically played for money. The framework we have developed in this chapter can be used by the fantasy sports industry to determine whether the current DFS game structures achieve a suitable balance between luck and skill. One simple “lever” to adjust this balance, for example, would be to control the amount of data they release regarding the teams selected by the DFS players. By choosing to release no information whatsoever, it will become more difficult for skillful players to estimate their models and take advantage of their superior modeling skills. The industry can also use (as we do) our framework to estimate the value of insider trading and collusion and propose new rules / regulations or payoff structures to counter these concerns. A recent relevant development occurred in May 2018 when the U.S. Supreme Court struck down a 1992 federal law – the Professional and Amateur Sports Protection Act – that prohibited states from authorizing sports gambling. As a result, some states are taking advantage of this ruling by passing their own sports betting laws and encouraging gambling with the goal of raising additional tax revenue. This remains a controversial development but would certainly appear to be a positive development for the fantasy sports industry. To the extent that individual states seek to regulate online gambling and DFS, the “skill-versus-luck” debate (referenced in the preceding paragraph) may continue to play a role as it has done historically in the federal regulation of gambling in the U.S.

The remainder of this chapter is organized as follows. In Section 1.2, we formulate both the double-up and top-heavy versions of the problem while we outline our Dirichlet regression approach to modeling our opponents’ team selections in Section 1.3. In Section 1.4, we use results from mean-variance optimization (that relate to maximizing the probability of outperforming a stochastic benchmark) to solve the double-up problem. We then extend this approach to solve the top-heavy problem in Section 1.5, where we prove the submodularity of the objective function.

We present numerical results based on the 2017 NFL season for both problem formulations in Section 1.6. In Section 1.7 we discuss the value of information and in particular, how much an insider can profit from having advance knowledge of his opponents’ team selections. We also consider the benefits of collusion there. We conclude in Section 1.8, where we also discuss some directions for ongoing and future research. Various technical details and additional results are deferred to the appendices.

1.2 Problem Formulation

We assume there are a total of $P$ athletes / real-world players whose performance, $\delta \in \mathbb{R}^P$, in a given round of games is random. We assume that $\delta$ has mean vector $\mu_\delta$ and variance-covariance matrix $\Sigma_\delta$. Our decision in the fantasy sports competition is to choose a portfolio $w \in \{0, 1\}^P$ of athletes. Typically, there are many constraints on $w$. For example, in a typical NFL DFS contest, we will only be allowed to select $C = 9$ athletes out of a total of $P \approx 100$ to $300$ NFL players. Each athlete also has a certain “cost” and our portfolio cannot exceed a given budget $B$. These constraints on $w$ can then be formulated as

$$\sum_{p=1}^{P} w_p = C, \qquad \sum_{p=1}^{P} c_p w_p \le B, \qquad w_p \in \{0, 1\}, \quad p = 1, \ldots, P,$$

where $c_p$ denotes the cost of the $p$-th athlete. Other constraints are also typically imposed by the contest organizers. These constraints include positional constraints, e.g. exactly one quarterback can be chosen, diversity constraints, e.g. you cannot select more than 4 athletes from any single NFL team, etc. These constraints can generally be modeled as linear constraints and we use $\mathcal{W}$ to denote the set of binary vectors $w \in \{0, 1\}^P$ that satisfy these constraints.
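To make the feasible set $\mathcal{W}$ concrete, the following minimal sketch checks whether a candidate binary selection vector satisfies the roster-size, budget, and positional constraints above. The costs, position assignments, and positional limits are made-up placeholders for illustration only, not FanDuel's actual values.

```python
import numpy as np

# Illustrative data: P athletes with hypothetical costs and positions.
P, C, B = 120, 9, 60_000
rng = np.random.default_rng(0)
cost = rng.integers(4_000, 9_000, size=P)                      # c_p, hypothetical
position = rng.choice(["QB", "RB", "WR", "TE", "K", "D"], size=P)
limits = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "K": 1, "D": 1}  # positional constraints

def is_feasible(w: np.ndarray) -> bool:
    """Check w in W: exactly C athletes, within budget B, correct positional counts."""
    if w.sum() != C or cost @ w > B:
        return False
    return all(w[position == pos].sum() == k for pos, k in limits.items())

w = np.zeros(P, dtype=int)
w[:9] = 1                         # an arbitrary (and probably infeasible) candidate
print(is_feasible(w))
```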

10 nents, that is, other DFS players who also enter the same fantasy sports contest. We assume there { }O ∈ are O such opponents and we use Wop := wo o=1 to denote their portfolios with each wo W. Once the round of NFL games has taken place, we get to observe the realized performances

δ of the P NFL athletes. Our portfolio then realizes a points total of F := w>δ whereas our

> opponents’ realized points totals are Go := wo δ for o = 1,..., O. All portfolios are then ranked according to their points total and the cash payoffs are determined. These payoffs take different forms depending on the structure of the contest. There are two contest structures that dominate in practice and we consider both of them. They are the so-called double-up and top-heavy payoff structures.

1.2.1 The Double-Up Problem Formulation

Under the double-up payoff structure, the top r portfolios (according to the ranking based on realized points total) each earn a payoff of R dollars. Suppose now that we enter N << O

portfolios3 to the contest. Then, typical values of r are r = (O + N)/2 and r = (O + N)/5 with corresponding payoffs of R = 2 and R = 5 assuming an entry fee of 1 per portfolio. The (r = (O + N)/2, R = 2) case is called a double-up competition whereas the (r = (O + N)/5, R = 5) is called a quintuple-up contest. We will refer to all such contests as “double-up” contests except when we wish to draw a distinction between different types of double-up contests, e.g. (true) double-up versus quintuple-up. In practice of course, the contest organizers take a cut and keep approximately 15% of the entry fees for themselves. This is reflected by reducing r appropriately and we note that this is easily accounted for in our problem formulations below. We also note that this means the average DFS player loses approximately 15% of her initial entry. In contrast to financial investments then, DFS investments are on average NPV-negative and so some skill is required in portfolio construction to overcome this handicap. While it is possible and quite common for a fantasy sports player to submit multiple entries, that is, multiple portfolios, to a given contest, we will consider initially the case where we submit

3There is usually a cap on $N$, denoted by $E_{\max}$, imposed by the contest organizer, however. Typical cap sizes we have observed can range from $E_{\max} = 1$ to $E_{\max} = 150$.

11 just N = 1 entry. Given the double-up payoff structure, our fantasy-sports portfolio optimization problem is to solve

$$\max_{w \in \mathcal{W}} \; \mathbb{P}\left\{ w^\top \delta > G^{(r')}(W_{\text{op}}, \delta) \right\}, \tag{1.1}$$

where we use $G^{(r)}$ to denote the $r$-th order statistic of $\{G_o\}_{o=1}^{O}$ and we define $r' := O + 1 - r$. Note that we explicitly recognize the dependence of $G^{(r')}$ on the portfolio selections $W_{\text{op}}$ of our $O$ opponents and the performance vector $\delta$ of the NFL athletes.
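If we can simulate the performance vector $\delta$ and the opponents' portfolios $W_{\text{op}}$, the objective in (1.1) can be estimated directly by Monte Carlo. The sketch below does this under stand-in assumptions (independent normal athlete performances and randomly generated opponent portfolios); the opponent model actually used is developed in Section 1.3.

```python
import numpy as np

rng = np.random.default_rng(1)
P, O, n_sims = 50, 1_000, 2_000
mu, sigma = rng.uniform(5, 15, P), 4.0              # stand-ins for mu_delta and Sigma_delta
W_op = (rng.random((O, P)) < 9 / P).astype(float)   # stubbed opponent portfolios
r = (O + 1) // 2                                    # double-up: top half get paid (N = 1)
r_prime = O + 1 - r                                 # r' = O + 1 - r

def win_probability(w: np.ndarray) -> float:
    """Monte Carlo estimate of P(w' delta > G^(r')) in (1.1)."""
    wins = 0
    for _ in range(n_sims):
        delta = rng.normal(mu, sigma)               # simulated athlete performances
        G = np.sort(W_op @ delta)                   # opponents' point totals, ascending
        wins += w @ delta > G[r_prime - 1]          # beat the r'-th order statistic
    return wins / n_sims

w = np.zeros(P)
w[np.argsort(mu)[-9:]] = 1                          # naive entry: nine highest-mean athletes
print(win_probability(w))
```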

1.2.2 The Top-Heavy Problem Formulation

The top-heavy payoff structure is more complicated than the double-up structure as the size of the cash payoff generally increases with the portfolio ranking. In particular, we first define payoffs

$$R_1 > \cdots > R_D > R_{D+1} := 0$$

and corresponding ranks

$$0 =: r_0 < r_1 < \cdots < r_D.$$

Then, a portfolio whose rank lies in $(r_{d-1}, r_d]$ wins $R_d$ for $d = 1, \ldots, D$. In contrast to the double-up structure, we now account for the possibility of submitting $N > 1$ entries to the contest. We use $W := \{w_i\}_{i=1}^{N}$ to denote these entries and $F_i := w_i^\top \delta$ to denote the realized fantasy points total of our $i$-th entry. It is then easy to see that our portfolio optimization problem is to solve for4

$$\max_{W \in \mathcal{W}^N} \; \sum_{i=1}^{N} \sum_{d=1}^{D} (R_d - R_{d+1}) \, \mathbb{P}\left\{ w_i^\top \delta > G_{-i}^{(r_d')}(W_{-i}, W_{\text{op}}, \delta) \right\} \tag{1.2}$$

where $r_d' := O + N - r_d$, $G_{-i}^{(r)}$ is the $r$-th order statistic of $\{G_o\}_{o=1}^{O} \cup \{F_j\}_{j=1}^{N} \setminus F_i$, and $W_{-i} := W \setminus w_i$. (We note there is a slight abuse of notation here since duplicate entries are possible and so the

4The probability term in (1.2) involves a strict inequality “>” but we note that the objective should also include an additional term for $\mathbb{P}\big(w_i^\top \delta = G_{-i}^{(r_d')}(W_{-i}, W_{\text{op}}, \delta)\big)$, in which case a share of the reward $(R_d - R_{d+1})$ would be earned. To keep our expressions simple, we don't include this term in (1.2) as it is generally negligible (except when replication is used) but we do account correctly for such ties in all of our numerical results.

union operator “∪” should be interpreted with this in mind.) Later in Section 1.5, we will discuss our approach to solving (1.2) and we will argue (based on the submodularity of the objective function in (1.2) and our parimutuel betting formulation in Appendix A.2) that diversification, i.e., choosing $N$ different entries, is a near-optimal strategy. For top-heavy payoffs where the reward $R_d$ decreases rapidly in $d$, it should be clear why diversification might be a good thing to do. Consider the extreme case of a winner-takes-all structure, for example. Then, absent pathological instances5, replication of entries means you are only giving yourself one

chance to win. This is accounted for in (1.2) by the fact that your entry $w_i$ is “competing” with your other $N - 1$ entries as they together comprise $W_{-i}$. In contrast, when you fully diversify, you are giving yourself $N$ separate chances to win the prize in total. (We are ignoring here the possibility of sharing the prize.) We note that the top-heavy payoff structure is our main concern in this chapter. That said, it should be clear that the double-up formulation of (1.1) is a special case of the top-heavy formulation in (1.2). We will therefore address the double-up problem before taking on the top-heavy problem. Before doing this, however, we must discuss the modeling of our opponents' portfolios $W_{\text{op}}$.
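Analogously to the double-up case, the top-heavy objective in (1.2) can be estimated by simulating the contest and ranking all $O + N$ entries directly, which is equivalent to the order-statistic formulation above. The prize ladder, opponent portfolios, and athlete-performance model below are hypothetical stand-ins used only to make the computation explicit.

```python
import numpy as np

rng = np.random.default_rng(2)
P, O, N, n_sims = 50, 1_000, 5, 1_000
mu, sigma = rng.uniform(5, 15, P), 4.0
W_op = (rng.random((O, P)) < 9 / P).astype(float)        # stubbed opponent portfolios
ranks = np.array([1, 10, 100, 500])                      # r_1 < ... < r_D (hypothetical)
prizes = np.array([100.0, 20.0, 5.0, 2.0])               # R_1 > ... > R_D (hypothetical)

def expected_payoff(W: np.ndarray) -> float:
    """Monte Carlo estimate of the expected total prize for our N entries (rows of W)."""
    total = 0.0
    for _ in range(n_sims):
        delta = rng.normal(mu, sigma)
        scores = np.concatenate([W @ delta, W_op @ delta])
        order = np.argsort(-scores)                      # best score first (ties broken arbitrarily)
        rank_of = np.empty_like(order)
        rank_of[order] = np.arange(1, O + N + 1)         # rank 1 = best
        for i in range(N):                               # payoff of each of our entries
            d = np.searchsorted(ranks, rank_of[i])       # rank in (r_{d-1}, r_d] wins R_d
            total += prizes[d] if d < len(prizes) else 0.0
    return total / n_sims

W = (rng.random((N, P)) < 9 / P).astype(float)           # random candidate entries
print(expected_payoff(W))
```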

1.3 Modeling Opponents’ Team Selections

A key aspect of our modeling approach is that there is value to modeling our opponents' portfolio choices, $W_{\text{op}}$. This is in direct contrast to the work of Hunter et al. 2016 who ignore this aspect of the problem and focus instead on constructing portfolios that maximize the expected number of fantasy points, subject to possible constraints6 that encourage high-variance portfolios. Based on numerical simulations of DFS contests during the 2017 NFL season, we noted it is possible to

5For example, if the best team could be predicted in advance with perfect accuracy, then choosing and replicating this team would be optimal since by replicating this entry you will be (a) guaranteed to win and (b) gain a greater share of the reward if some of your competitors also chose it. If none of your competitors chose the team, you will earn the entire reward for yourself.

6They included constraints that encouraged high-variance portfolios because they too were focused on top-heavy contests where very few contestants earn substantial payoffs. It is intuitively clear that high-variance portfolios are desirable for such contests. We discuss this property in further detail in Sections 1.4 and 1.5 in light of the results from mean-variance optimization that we bring to bear on the problem.

obtain significant gains in expected dollar payoffs by explicitly modeling $W_{\text{op}}$. This is partly due to the well-known fact that some athletes are (often considerably) more / less popular than other athletes and because there is some predictability in the team selections of DFS players who may be responding to weekly developments that contain more noise than genuine information.

To the best of our knowledge, we are the first to explicitly model $W_{\text{op}}$ and embed it in our portfolio construction process. That said, we certainly acknowledge that some members of the fantasy sports community also attempt to be strategic in their attempted selection of less popular athletes and avoidance of more popular athletes, other things being equal; see for example Gibbs 2017.

If we are to exploit our opponents' team selections, then we must be able to estimate $W_{\text{op}}$ reasonably accurately. Indeed it is worth emphasizing that $W_{\text{op}}$ is not observed before the contest and so we must make do with predicting / simulating it, which amounts to being able to

predict / simulate the $w_o$'s. To make things clear, we will focus on the specific case of DFS in the NFL setting. Specifically, consider for example the following NFL contest organized by FanDuel (FanDuel 2016). Each fantasy team has $C = 9$ positions which must consist of 1 quarterback (QB), 2 running backs (RB), 3 wide receivers (WR), 1 tight end (TE), 1 kicker (K), and 1 “defense” (D).

We now write $w_o = (w_o^{\text{QB}}, w_o^{\text{RB}}, \ldots, w_o^{\text{D}})$ where $w_o^{\text{QB}}$ denotes the quarterback component of $w_o$, $w_o^{\text{RB}}$ denotes the running back component of $w_o$, etc. If there are $P_{\text{QB}}$ QBs available for selection, then $w_o^{\text{QB}} \in \{0, 1\}^{P_{\text{QB}}}$ and exactly one component of $w_o^{\text{QB}}$ will be 1 for any feasible $w_o$. In contrast, $w_o^{\text{RB}}$ and $w_o^{\text{WR}}$ will have exactly two and three components, respectively, equal to 1 for any feasible $w_o$. We refer to $w_o^{\text{QB}}$, $w_o^{\text{RB}}$, etc. as the positional marginals of $w_o$. Moreover, it follows that $P_{\text{QB}} + P_{\text{RB}} + \cdots + P_{\text{D}} = P$ since there are $P$ athletes in total available for selection.

In order to model the distribution of $w_o$, we will use a classic result from copula theory (Nelsen 2007), namely Sklar's theorem (Sklar 1959). This theorem states that we can write

$$F_{w_o}\left(w_o^{\text{QB}}, \ldots, w_o^{\text{D}}\right) = C\left(F_{\text{QB}}\left(w_o^{\text{QB}}\right), \ldots, F_{\text{D}}\left(w_o^{\text{D}}\right)\right) \tag{1.3}$$

where $F_{w_o}$ denotes the CDF of $w_o$, $F_{\text{QB}}$ denotes the marginal CDF of $w_o^{\text{QB}}$, etc., and $C$ is the copula of $w_o^{\text{QB}}, \ldots, w_o^{\text{D}}$. We note that $C$, which is only uniquely defined on $\text{Range}(F_{\text{QB}}) \times \cdots \times \text{Range}(F_{\text{D}})$, models the dependence structure of the positional marginals. The representation in (1.3) is convenient as it allows us to break our problem down into two separate sub-problems:

1. Modeling and estimating the positional marginals $F_{\text{QB}}, \ldots, F_{\text{D}}$.

2. Modeling and estimating the copula C.

Moreover, it turns out that the representation in (1.3) is particularly convenient from an estimation viewpoint as we will have sufficient data to estimate the positional marginals reasonably well whereas obtaining sufficient data to estimate the copula C is challenging. We note that this is often the case in copula modeling applications. For example, in the equity and credit derivatives world in finance, there is often plentiful data on the so-called marginal risk-neutral distributions but relatively little data on the copula C. We begin with the positional marginals.

1.3.1 The Positional Marginals

To simplify matters, we will focus here on the selection of the QB from the total of $P_{\text{QB}}$ that are available. We assume a Dirichlet-multinomial data generating process for a random opponent's selection. Specifically, we assume:

• $p_{\text{QB}} \sim \text{Dir}(\alpha_{\text{QB}})$ where $\text{Dir}(\alpha_{\text{QB}})$ denotes the Dirichlet distribution with parameter vector $\alpha_{\text{QB}}$.

• A random opponent then selects QB $k$ with probability $p_{\text{QB}}^k$ for $k = 1, \ldots, P_{\text{QB}}$, i.e., the chosen QB follows a $\text{Multinomial}(1, p_{\text{QB}})$ distribution.

Note $p_{\text{QB}} := \{p_{\text{QB}}^k\}_{k=1}^{P_{\text{QB}}}$ lies on the unit simplex in $\mathbb{R}^{P_{\text{QB}}}$ and therefore defines a probability distribution over the available QBs. It is important to note that $p_{\text{QB}}$ is not known in advance of the DFS contest. Moreover, they do not appear to be perfectly predictable and so we have to explicitly model their randomness.7 Accordingly, it is very natural to model $p_{\text{QB}}$ as following a Dirichlet distribution.

7In initial unreported experiments, we assumed $p_{\text{QB}}$ was fixed and known but this led to over-certainty and poor performance of the resulting portfolios.
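The following minimal sketch illustrates the two-stage Dirichlet-multinomial data generating process for the QB position; the concentration vector $\alpha_{\text{QB}}$ below is an arbitrary placeholder rather than an estimated one.

```python
import numpy as np

rng = np.random.default_rng(3)
P_QB = 8
alpha_QB = np.linspace(0.5, 4.0, P_QB)          # hypothetical concentration parameters

# Stage 1: draw the (random) ownership probabilities p_QB ~ Dir(alpha_QB).
p_QB = rng.dirichlet(alpha_QB)

# Stage 2: each of O opponents picks a QB according to Multinomial(1, p_QB).
O = 10_000
picks = rng.multinomial(1, p_QB, size=O)        # O x P_QB matrix of one-hot selections
realized_ownership = picks.mean(axis=0)         # approaches p_QB for large O

print(np.round(p_QB, 3))
print(np.round(realized_ownership, 3))
```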

Available Data

In most fantasy sports contests, it is possible to obtain some information regarding $W_{\text{op}}$ once the contest is over and the winners have been announced. In particular, it is often possible to observe the realized ownership proportions which (because $O$ is assumed large) amounts to observing $p_{\text{QB}}, p_{\text{RB}}, \ldots, p_{\text{D}}$ after each contest. We therefore assume we have such data from a series of historical contests. In practice, we will also have access to other observable features, e.g. expected NFL player performance $\mu_\delta$, home or away indicators, quality of opposing teams, etc. from these previous contests.

Dirichlet Regression

We can then use this data to build a Dirichlet regression model for estimating the marginal distributions of $w_o$. We do this by assuming that the parameter vector $\alpha_{\text{QB}} \in \mathbb{R}^{P_{\text{QB}}}$ is predictable. In particular, we assume

$$\alpha_{\text{QB}} = \exp(X_{\text{QB}} \beta_{\text{QB}}) \tag{1.4}$$

where $\beta_{\text{QB}}$ is a vector of parameters that we must estimate and $X_{\text{QB}}$ is a matrix (containing $P_{\text{QB}}$ rows) of observable independent variables that are related to the specific features of the NFL games and QBs underlying the DFS contest. To be clear, the exponential function on the r.h.s. of (1.4) is actually a $P_{\text{QB}} \times 1$ vector of exponentials. For example, in a DFS contest for week $t$, we might assume

$$\alpha_{\text{QB},t} = \exp\left(\beta_{\text{QB}}^0 \mathbf{1} + \beta_{\text{QB}}^1 f_{\text{QB},t} + \beta_{\text{QB}}^2 c_{\text{QB},t} + \beta_{\text{QB}}^3 \mu_{\text{QB},t}\right) \tag{1.5}$$

where $f_{\text{QB},t} \in \mathbb{R}^{P_{\text{QB}}}$ is an estimate of $p_{\text{QB}}$ for week $t$ that we can obtain from the FantasyPros website (FantasyPros 2017), $c_{\text{QB},t} \in \mathbb{R}^{P_{\text{QB}}}$ are the (appropriately scaled) week $t$ costs of the QBs in the contest, and $\mu_{\text{QB},t}$ is an (appropriately scaled) sub-vector of $\mu_\delta$ for week $t$ whose components correspond to the QB positions in $\mu_\delta$. Other features are of course also possible. For example,

16 we might also want to include expected returns µQB,t/cQB,t (where division is understood to be component-wise), home-away indicators, quality of opponents etc. as features.

We can estimate the β_QB vector by fitting a Bayesian Dirichlet regression. Assuming we have data from weeks t = 1 to t = T − 1 and a flat prior on β_QB, the posterior satisfies

p(β_QB | {p_QB,t}_{t=1}^{T−1}) ∝ p(β_QB) p({p_QB,t}_{t=1}^{T−1} | β_QB)          [flat prior: p(β_QB) ∝ 1]
                              ∝ ∏_{t=1}^{T−1} Dir(p_QB,t | e^{X_QB,t β_QB})       [α_QB,t = e^{X_QB,t β_QB}]
                              ∝ ∏_{t=1}^{T−1} (1 / B(α_QB,t)) ∏_{k=1}^{P_QB} (p_QB,t^k)^{α_QB,t^k − 1}     (1.6)

where B(α QB,t) is the normalization factor for the Dirichlet distribution. We fit this model using the Bayesian software package STAN (Team 2017).
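For intuition, the following sketch writes out the log-posterior in (1.6) directly in Python and computes a MAP estimate of β_QB (equivalently, the maximum-likelihood estimate, given the flat prior) rather than sampling the full posterior with STAN as is done in the chapter; the feature matrices and ownership vectors below are hypothetical stand-ins.

```python
# Minimal sketch (not the thesis's Stan implementation) of the Dirichlet regression in (1.6):
# alpha_{QB,t} = exp(X_{QB,t} beta_QB) with a flat prior on beta_QB, fitted here by MAP/MLE.
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

def dirichlet_logpdf(p, alpha):
    """Log-density of Dirichlet(alpha) evaluated at a probability vector p."""
    return gammaln(alpha.sum()) - gammaln(alpha).sum() + ((alpha - 1.0) * np.log(p)).sum()

def neg_log_posterior(beta, X_list, p_list):
    """Negative log-posterior over T-1 historical contests (flat prior on beta)."""
    val = 0.0
    for X_t, p_t in zip(X_list, p_list):
        alpha_t = np.exp(X_t @ beta)       # P_QB-vector of Dirichlet parameters for week t
        val -= dirichlet_logpdf(p_t, alpha_t)
    return val

# Toy data: 3 historical weeks, 4 QBs, 2 features (constant + a FantasyPros-style estimate).
rng = np.random.default_rng(1)
X_list = [np.column_stack([np.ones(4), rng.uniform(0, 1, 4)]) for _ in range(3)]
p_list = [rng.dirichlet(np.ones(4) * 3) for _ in range(3)]

# MAP estimate of beta_QB; with a flat prior this coincides with the MLE.
res = minimize(neg_log_posterior, x0=np.zeros(2), args=(X_list, p_list))
print(res.x)
```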

The Other Positions

It should be clear that we can handle the other positions in a similar fashion. In the case of the three selected WRs, for example, we assume p_WR ∼ Dir(α_WR) and that a random opponent then selects her three WRs according to a Multinomial(3, p_WR) distribution.⁸ We can again use Dirichlet regression to estimate the parameter vector β_WR where α_WR = exp(X_WR β_WR).

⁸ In fact, the rules of a DFS contest are likely to state that the same player cannot be chosen more than once. In that case, we could simply repeatedly draw from the Multinomial(3, p_WR) distribution until 3 different WRs are selected. Alternatively (but equivalently), we could draw each WR sequentially, adjusting the multinomial distribution each time so that once selected, a WR cannot be selected again.

1.3.2 The Copula

Returning to (1.3), the question arises as to which copula C we should use. For the sake of brevity, we only summarize here the copula that we use and defer the details to Appendix A.1.1. Motivated by the prevalent QB–WR stacking practice among NFL fantasy players (Bales 2016), we allow an opponent to be a stacker⁹ with probability q (the stacking copula) or to sample all positions independently of each other with probability 1 − q (the independence copula). We therefore use a mixture of the stacking and independence copulas and note that such a mixture is itself a copula. Given the limited data available for modeling a more complicated copula, we believe this rather simple copula is a reasonable choice. We show how well this simple copula performs on real-world data when we present numerical results in Section 1.6 and Appendix A.3.3 in particular.

1.3.3 Generating Random Opponents’ Portfolios

Suppose now that the Dirichlet regression model has been fit for each of the positional marginals and that we have also estimated the q parameter for the copula. It is then easy to simulate a candidate w_o. We first generate Stack ∼ Bernoulli(q) and if Stack = 0, we use the independence copula. For example, to generate the QB selection, we must:

1. First draw a sample pQB from the Dir(α QB) distribution.

2. Then draw a sample from the Mult(1, pQB) distribution.

3. This draw then defines our chosen QB, i.e., it sets one component of w_o^QB to 1 with the others set to 0.

We repeat this for all positions. If Stack = 1, however, then we use the stacking copula and therefore follow the same steps except that we set the first WR to be the main WR from the selected QB's team. At this point we only have a candidate w_o, as there is no guarantee that the resulting w_o is feasible, i.e., that w_o ∈ W. We therefore use an accept-reject approach whereby candidate w_o's are generated according to the steps outlined above and are accepted only if they are feasible. In fact, we impose one further condition: we insist that an accepted w_o uses up most of the available budget. We impose this condition because it is very unlikely in practice that a fantasy player in a DFS contest would leave much of her budget unspent. This is purely a behavioral requirement and so we insist the cost of an accepted w_o satisfy c^⊤ w_o ≥ B_lb for some lower bound B_lb ≤ B that we get to choose. Algorithm 7 in Appendix A.1.1 describes how to generate O random opponents' portfolios W_op and it therefore (implicitly) defines the distribution of W_op.

⁹ By a "stacker", we mean an opponent who picks the QB and the "main" WR from the same NFL team and samples the other positions independently of each other. By the "main" WR of a team, we refer to the WR with the highest expected points among all the WRs on that team.
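The following is a heavily simplified sketch of such an accept-reject generator (it is not Algorithm 7 and assumes a toy roster of one QB and three WRs only); all player data, the stacking probability q, and the budget numbers are hypothetical.

```python
# Simplified accept-reject sketch for generating one random opponent entry.
# Toy roster: 1 QB + 3 WRs. Costs, teams, expected points, q and budgets are made up.
import numpy as np

rng = np.random.default_rng(2)

qb_team  = np.array([0, 1, 2])                           # team id of each QB
wr_team  = np.array([0, 0, 1, 1, 2, 2])                  # team id of each WR
wr_mu    = np.array([14.0, 9.0, 12.0, 7.0, 11.0, 8.0])   # expected WR fantasy points
qb_cost  = np.array([8000, 7500, 7000])
wr_cost  = np.array([7000, 5000, 6500, 4500, 6000, 4000])
alpha_qb = np.array([6.0, 3.0, 2.0])
alpha_wr = np.array([5.0, 2.0, 4.0, 2.0, 3.0, 1.0])
B, B_lb, q = 26000, 24000, 0.35                          # budget, budget lower bound, stacking prob.

def sample_opponent_entry():
    while True:                                          # accept-reject loop
        p_qb, p_wr = rng.dirichlet(alpha_qb), rng.dirichlet(alpha_wr)
        qb = rng.choice(len(p_qb), p=p_qb)
        if rng.random() < q:                             # stacking copula: QB + his team's main WR
            same_team = np.where(wr_team == qb_team[qb])[0]
            main_wr = same_team[np.argmax(wr_mu[same_team])]
            others = [w for w in range(len(p_wr)) if w != main_wr]
            p_rest = p_wr[others] / p_wr[others].sum()
            wrs = [main_wr] + list(rng.choice(others, size=2, replace=False, p=p_rest))
        else:                                            # independence copula: all picks independent
            wrs = list(rng.choice(len(p_wr), size=3, replace=False, p=p_wr))
        cost = qb_cost[qb] + wr_cost[wrs].sum()
        if B_lb <= cost <= B:                            # feasible and spends most of the budget
            return qb, sorted(wrs)

print(sample_opponent_entry())
```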

1.4 Solving the Double-Up Problem

As mentioned earlier, we first tackle the double-up problem since our solution to it will help inform how we approach the top-heavy problem. We begin by recalling a result from mean-variance optimization and, in particular, from the problem of maximizing the probability of exceeding a stochastic benchmark. Our approach to solving both the double-up and top-heavy problems will be a mean-variance optimization based on this result.

1.4.1 Mean Variance Optimization and Outperforming Stochastic Benchmarks

We consider¹⁰ a one-period problem where at time t = 0 there are P financial securities available to invest in. At time t = 1 the corresponding random return vector ξ = (ξ_1, ..., ξ_P) is realized. Let µ_ξ and Σ_ξ denote the mean return vector and variance-covariance matrix, respectively, of ξ. The goal is then to construct a portfolio w = (w_1, ..., w_P) with random return R_w = w^⊤ ξ that maximizes the probability of exceeding a random benchmark R_b. Mathematically, we wish to solve

max_{w∈W} P(R_w − R_b ≥ 0)     (1.7)

where W includes the budget constraint w^⊤ 1 = 1 as well as any other linear constraints we wish to impose. If we assume R_w − R_b has a normal distribution so that R_w − R_b ∼ N(µ_w, σ_w²)¹¹ for some µ_w and σ_w² that depend on w, then (1.7) amounts to solving

max_{w∈W} [ 1 − Φ( −µ_w / σ_w ) ]     (1.8)

where Φ(·) denotes the standard normal CDF. Let w* be the optimal solution to (1.8). The following result is adapted from Morton et al. 2003 and follows from the representation in (1.8).

¹⁰ The material and results in this subsection follow Morton et al. 2003 and they should be consulted for further details and related results. In this subsection, we will sometimes use the same notation from earlier sections to make the connections between the financial problem of this subsection and the DFS problem more apparent.
¹¹ If the benchmark R_b is deterministic, then µ_w := w^⊤ µ_ξ − R_b and σ_w² := w^⊤ Σ_ξ w.

Proposition 1.1. Suppose R_w − R_b ∼ N(µ_w, σ_w²) for all w ∈ W.

1. Suppose µw < 0 for all w ∈ W. Then

w* ∈ { w(λ) : w(λ) ∈ arg max_{w∈W} (µ_w + λ σ_w²), λ ≥ 0 }.     (1.9)

2. Suppose µw ≥ 0 for some w ∈ W. Then

w* ∈ { w(λ) : w(λ) ∈ arg max_{w∈W, µ_w ≥ 0} (µ_w − λ σ_w²), λ ≥ 0 }     (1.10)

so that w∗ is mean-variance efficient.

Proposition 1.1 is useful because it allows us to solve the problem in (1.8) efficiently. In particular, we first determine which of the two cases from the proposition applies. This can be done when W is polyhedral by simply solving a linear program that maximizes µ_w (which is affine in w) over w ∈ W. If the optimal mean is negative, then we are in case (i); otherwise we are in case (ii). We then form a grid Λ of possible λ values and, for each λ ∈ Λ, we solve the appropriate quadratic optimization problem (defining w(λ)) from (1.9) or (1.10), and finally choose the value of λ that yields the largest objective in (1.7) or (1.8). See Algorithm 1 in Section 1.4.2 below, where we apply these results to our double-up problem.
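The following sketch illustrates this λ-grid procedure on a toy problem, with a small finite candidate set standing in for the quadratic programs and with the objective (1.7) estimated by Monte Carlo; all return and benchmark data are hypothetical.

```python
# Minimal sketch of the lambda-grid procedure implied by Proposition 1.1, using a tiny
# brute-force search over a finite candidate set instead of quadratic programs.
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical candidate portfolios (rows), security-return samples xi,
# and samples of a stochastic benchmark R_b.
W_cand = np.array([[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [1.0, 0.0, 0.0]])
xi  = rng.normal([0.02, 0.01, 0.03], [0.05, 0.02, 0.10], size=(20000, 3))
R_b = rng.normal(0.05, 0.04, size=20000)

diff = W_cand @ xi.T - R_b                 # samples of R_w - R_b, one row per candidate
mu, var = diff.mean(axis=1), diff.var(axis=1)

best = (-np.inf, None)
for lam in np.linspace(0.0, 5.0, 21):      # the grid Lambda
    if mu.max() < 0:                       # case (i): maximize mu_w + lam * sigma_w^2 over W
        idx = int(np.argmax(mu + lam * var))
    else:                                  # case (ii): maximize mu_w - lam * sigma_w^2 over {mu_w >= 0}
        feas = np.flatnonzero(mu >= 0)
        idx = int(feas[np.argmax(mu[feas] - lam * var[feas])])
    prob = (diff[idx] >= 0).mean()         # objective (1.7) estimated by Monte Carlo
    if prob > best[0]:
        best = (prob, W_cand[idx])
print(best)
```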

20 1.4.2 The Double-Up Problem

Recall now the double-up problem as formulated in (1.1). We define Y_w := w^⊤ δ − G^{(r')} and note that

µ_{Y_w} := w^⊤ µ_δ − µ_{G^{(r')}}
σ²_{Y_w} := w^⊤ Σ_δ w + σ²_{G^{(r')}} − 2 w^⊤ σ_{δ,G^{(r')}}     (1.11)

where µ_{G^{(r')}} := E[G^{(r')}], σ²_{G^{(r')}} := Var(G^{(r')}) and σ_{δ,G^{(r')}} is a P × 1 vector with p-th component equal to Cov(δ_p, G^{(r')}). Our approach to solving (1.1) is based on Proposition 1.1 and is presented in Algorithm 1 below. While this algorithm will deliver the optimal solution in the event that each Y_w ∼ N(µ_{Y_w}, σ²_{Y_w}), it should yield a good approximate solution even when the Y_w's are not normally distributed. Indeed the key insights yielded by Proposition 1.1 do not rely on the normality of the Y_w's. Specifically, if µ_{Y_w} < 0 for all w ∈ W, then it seems intuitively clear that we need to select a team w that simultaneously has a high mean and a high variance. The appropriate balance between mean and variance in the objective function will then be determined by λ. Similarly, if there is at least one w ∈ W such that µ_{Y_w} > 0, then intuition suggests we can search for a team w with a large (and positive) mean and a small variance. Again, the appropriate balance between the two will be determined by λ. Not insisting on the normality of the Y_w's also gives us the freedom to consider using non-normal distributions for δ. Indeed this parallels the situation in the asset allocation literature in finance where the mean-variance paradigm remains¹² very popular despite the well-known fact that asset returns have heavy tails and therefore are not normally distributed.

¹² To be clear, we are not claiming the original mean-variance approach of Markowitz is popular. Indeed it's well known that parameter estimation issues render Markowitz useless in practice. Developments which build on Markowitz such as Black-Litterman, robust mean-variance etc. are popular, however, and they too take a mean-variance perspective.

Algorithm 1 Optimization for the Double-Up Problem with a Single Entry
Require: W, Λ, µ_δ, Σ_δ, µ_{G^{(r')}}, σ²_{G^{(r')}}, σ_{δ,G^{(r')}} and Monte Carlo samples of (δ, G^{(r')})
1: if ∃ w ∈ W with µ_{Y_w} ≥ 0 then
2:   for all λ ∈ Λ
3:     w_λ = argmax_{w∈W, µ_{Y_w} ≥ 0} { µ_{Y_w} − λ σ²_{Y_w} }
4:   end for
5: else
6:   for all λ ∈ Λ
7:     w_λ = argmax_{w∈W} { µ_{Y_w} + λ σ²_{Y_w} }
8:   end for
9: end if
10: λ* = argmax_{λ∈Λ} P{Y_{w_λ} > 0}
11: return w_{λ*}

Remark 1.1. Note that λ* in line 10 can be computed using the Monte Carlo samples of (δ, G^{(r')}) that are inputs to the algorithm. In this case, the computation of P{Y_{w_λ} > 0} does not rely on any normal approximation. Alternatively, λ* could also be estimated via the assumption that each Y_{w_λ} is approximately normally distributed. We also note that if it turns out that λ* ≈ 0, then the optimization will basically seek to maximize w^⊤ µ_δ, thereby suggesting there is little value to be gained from modeling opponents.

One potential difficulty that might arise in practice with Algorithm 1 is if the distributions of the Y_w's display a significant skew. In this case it is possible that we end up seeking to increase variance when we should be decreasing it, or vice-versa. While this was never an issue in any of our numerical experiments, we discuss this possibility in Appendix A.1.3 and note that it is easy to adjust Algorithm 1 to allow for it.

Generating Monte Carlo Samples

In order to execute Algorithm 1, we must first compute the inputs µ_{G^{(r')}}, σ²_{G^{(r')}} and σ_{δ,G^{(r')}} as defined above. These quantities can be estimated off-line via Monte Carlo simulation as they do not depend on our portfolio choice w. We simply note here that the Monte Carlo can be performed relatively efficiently using results from the theory of order statistics. The specific details can be found in Appendix A.1.2.
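For intuition, the following brute-force sketch shows how these inputs could be estimated from simulated samples by sorting opponents' scores and picking out the r'-th order statistic; it does not use the order-statistics shortcuts of Appendix A.1.2, and the opponent entries, player means, and target rank are hypothetical (in the chapter, W_op would itself be re-sampled in each simulation via Algorithm 7).

```python
# Brute-force Monte Carlo sketch of mu_{G^{(r')}}, sigma^2_{G^{(r')}} and sigma_{delta,G^{(r')}}.
# All inputs are toy stand-ins; W_op is held fixed here purely for brevity.
import numpy as np

rng = np.random.default_rng(4)

P, O, n_sims, r_prime = 30, 500, 2000, 450   # players, opponents, simulations, target rank

mu_delta = rng.uniform(2, 20, size=P)        # hypothetical player means
delta = rng.normal(mu_delta, 5.0, size=(n_sims, P))          # player fantasy-point samples
W_op = (rng.random((O, P)) < 0.2).astype(float)              # hypothetical opponent entries

G_samples = np.empty(n_sims)
for s in range(n_sims):
    opp_scores = W_op @ delta[s]             # each opponent's fantasy points in simulation s
    G_samples[s] = np.sort(opp_scores)[r_prime - 1]          # r'-th order statistic

mu_G = G_samples.mean()
var_G = G_samples.var()
# p-th component is Cov(delta_p, G^{(r')}); np.cov returns the 2x2 joint covariance matrix.
cov_delta_G = np.array([np.cov(delta[:, p], G_samples)[0, 1] for p in range(P)])
print(mu_G, var_G, cov_delta_G[:3])
```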

22 Solving the Binary Quadratic Programs

The optimization problems in lines 3 and 7 of Algorithm 1 require the solution of binary quadratic programs (BQPs). In our numerical experiments of Sections 1.6 and 1.7, we solved these BQPs using Gurobi's (Gurobi Optimization 2016) default BQP solver, although the specific algorithm used by Gurobi was not clear from the online documentation. (We do note in passing, however, that it is straightforward to transform a BQP into an equivalent binary program (BP) at the cost of adding O(P²) binary variables and O(P²) linear constraints.)
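As an illustration of that parenthetical remark (and not of the chapter's actual Gurobi usage, which relied on the default BQP solver), the following sketch applies the standard linearization y_ij = x_i x_j to a toy BQP; it assumes gurobipy is installed and licensed, and the data c and Q are random stand-ins.

```python
# Sketch of the standard BQP-to-BP linearization: max c'x + x'Qx over binary x becomes a
# binary linear program with auxiliary variables y[i,j] playing the role of x_i * x_j.
import numpy as np
import gurobipy as gp
from gurobipy import GRB

P = 6
rng = np.random.default_rng(5)
c = rng.uniform(0, 1, P)
Q = rng.uniform(-0.5, 0.5, (P, P))

m = gp.Model("bqp_linearized")
x = m.addVars(P, vtype=GRB.BINARY, name="x")
y = m.addVars(P, P, vtype=GRB.BINARY, name="y")       # y[i,j] stands in for x_i * x_j

for i in range(P):
    for j in range(P):
        m.addConstr(y[i, j] <= x[i])                  # y_ij = 1 forces x_i = 1 ...
        m.addConstr(y[i, j] <= x[j])                  # ... and x_j = 1
        m.addConstr(y[i, j] >= x[i] + x[j] - 1)       # x_i = x_j = 1 forces y_ij = 1

m.addConstr(gp.quicksum(x[i] for i in range(P)) == 3) # a stand-in cardinality constraint
m.setObjective(
    gp.quicksum(c[i] * x[i] for i in range(P))
    + gp.quicksum(Q[i, j] * y[i, j] for i in range(P) for j in range(P)),
    GRB.MAXIMIZE,
)
m.optimize()
print([i for i in range(P) if x[i].X > 0.5])
```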

The Double-Up Problem with Multiple Entries

Since the top-heavy payoff structure is our main focus, we defer the discussion of the double-up problem with multiple entries to Appendix A.1.3. We briefly note here, however, that we advocate for a replication strategy. In particular, after purchasing N entries in the contest, the DFS player should then submit N copies of w∗ where

w* := argmax_{w∈W} P{ w^⊤ δ > G^{(O+1−r)}(W_op, δ) }.

Appendix A.1.3 discusses the intuition behind this strategy and also provides (see Proposition A.1) a simple certificate-of-optimality that can be used to confirm its optimality or near-optimality.

1.5 Solving the Top-Heavy Problem

We can now extend the analysis we developed for the double-up problem in Section 1.4 to tackle the more interesting top-heavy problem. We consider first the single-entry case where N = 1. In that case, the problem in (1.2) simplifies to solving

max_{w∈W} ∑_{d=1}^{D} (R_d − R_{d+1}) P{ w^⊤ δ > G^{(r'_d)}(W_op, δ) },     (1.12)

where r'_d := O + 1 − r_d. Following the development in Section 1.4.2, we can define Y_w^d := w^⊤ δ − G^{(r'_d)} and define

µ_{Y_w^d} := w^⊤ µ_δ − µ_{G^{(r'_d)}}     (1.13)
σ²_{Y_w^d} := w^⊤ Σ_δ w + σ²_{G^{(r'_d)}} − 2 w^⊤ σ_{δ,G^{(r'_d)}}     (1.14)

where µ_{G^{(r'_d)}} := E[G^{(r'_d)}], σ²_{G^{(r'_d)}} := Var(G^{(r'_d)}) and σ_{δ,G^{(r'_d)}} is a P × 1 vector with p-th component equal to Cov(δ_p, G^{(r'_d)}). Following our mean-variance approach, we can now approximate (1.12) as

max_{w∈W} ∑_{d=1}^{D} (R_d − R_{d+1}) ( 1 − Φ( −µ_{Y_w^d} / σ_{Y_w^d} ) ).     (1.15)

Before proceeding, we need to make two additional assumptions, which we now state formally.

Assumption 1.1. µ_{Y_w^d} < 0 for d = 1, ..., D and for all w ∈ W.

A justification for Assumption 1.1 can be found in Appendix A.1.4. Given Assumption 1.1, it follows that each of the arguments −µ_{Y_w^d}/σ_{Y_w^d} to the normal CDF terms in (1.15) is positive. Given that the objective in (1.15) is to be maximized, it is also clear that for a fixed value of w^⊤ µ_δ in (1.13), we would like the standard deviation σ_{Y_w^d} to be as large as possible. Unfortunately, the third term, 2 w^⊤ σ_{δ,G^{(r'_d)}}, on the r.h.s. of (1.14) suggests that w impacts the variance by a quantity that depends on d. Fortunately, however, we found this dependence on d to be very small in our numerical experiments with real-world DFS top-heavy contests. Specifically, we found these covariance terms to be very close to each other for values of d corresponding to the top 20 percentiles and in particular for the top few percentiles. We now formalize this observation via the following assumption.

Assumption 1.2. Cov(δ_p, G^{(r'_d)}) = Cov(δ_p, G^{(r'_{d'})}) for all d, d' = 1, ..., D and for all p ∈ {1, ..., P}.

Further support for Assumption 1.2 is provided by the following proposition, a proof of which may be found in Appendix A.1.4.

Proposition 1.2. Suppose the w_o's are IID given p and D is finite. Then, in the limit as O → ∞, we have

Cov(δ_p, G^{(r'_d)}) = Cov(δ_p, G^{(r'_{d'})}) for all d, d' = 1, ..., D     (1.16)

for any p ∈ {1,..., P}.

Given Assumption 1.2, it is clear from (1.14) that the impact of w on σ²_{Y_w^d} does not depend on d when D is finite and O is “large”. Given the preceding arguments, it follows that for any fixed value of w^⊤ µ_δ, we would like to make w^⊤ Σ_δ w − 2 w^⊤ σ_{δ,G^{(r'_d)}} (the terms from (1.14) that depend on w) as large as possible. We are therefore in the situation of part (i) of Proposition 1.1 and so we have a simple algorithm for approximately solving the top-heavy problem. This is given in Algorithm 2 below, where we omit the dependence on d of those terms that are assumed (by Assumption 1.2) to not vary with d.

Algorithm 2 Optimization for the Top-Heavy Problem with a Single Entry
Require: W, Λ, µ_δ, Σ_δ, σ_{δ,G^{(r')}} and Monte Carlo samples of (δ, G^{(r'_d)}) for all d = 1, ..., D
1: for all λ ∈ Λ
2:   w_λ = argmax_{w∈W} { w^⊤ µ_δ + λ ( w^⊤ Σ_δ w − 2 w^⊤ σ_{δ,G^{(r')}} ) }
3: end for
4: λ* = argmax_{λ∈Λ} ∑_{d=1}^{D} (R_d − R_{d+1}) P{ w_λ^⊤ δ > G^{(r'_d)}(W_op, δ) }
5: return w_{λ*}

As with Algorithm 1, the w_λ's are computed by solving BQPs and the optimal λ* from line 4 can then be determined via the Monte Carlo samples that were used as inputs.
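The following sketch illustrates how line 4 could be evaluated from Monte Carlo samples on a toy example; the payoff vector, the order-statistic samples G^{(r'_d)}, and the candidate entries are all hypothetical stand-ins rather than outputs of the BQPs.

```python
# Sketch of line 4 of Algorithm 2: score each candidate entry w_lambda by its estimated
# expected top-heavy reward, sum_d (R_d - R_{d+1}) * P(w' delta > G^{(r'_d)}), via Monte Carlo.
import numpy as np

rng = np.random.default_rng(6)

P, n_sims, D = 30, 5000, 3
R = np.array([1000.0, 100.0, 5.0, 0.0])          # payoffs R_1 >= ... >= R_D, with R_{D+1} := 0
delta = rng.normal(rng.uniform(2, 20, P), 5.0, size=(n_sims, P))
# G[s, d] is a toy sample of the order statistic G^{(r'_d)}; sorted so d = 1 is the hardest to beat.
G = np.sort(rng.normal(110, 10, size=(n_sims, D)), axis=1)[:, ::-1]

def expected_reward(w):
    """Estimate sum_d (R_d - R_{d+1}) * P(w' delta > G^{(r'_d)}) from the samples."""
    scores = delta @ w                           # fantasy points of the entry in each simulation
    return sum((R[d] - R[d + 1]) * np.mean(scores > G[:, d]) for d in range(D))

# Candidate entries w_lambda (here random ~9-athlete lineups, purely for illustration).
candidates = [(rng.random(P) < 9 / P).astype(float) for _ in range(5)]
best = max(candidates, key=expected_reward)      # the entry w_{lambda*}
print(expected_reward(best))
```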

1.5.1 The Top-Heavy DFS Problem with Multiple Entries

We now discuss the more general top-heavy DFS problem where we must submit N entries to the contest. Recalling the problem formulation from Section 1.2.2, we must solve for

max_{W ∈ W^{|W|}, |W| = N} R(W)

where

R(W) := ∑_{i=1}^{|W|} ∑_{d=1}^{D} (R_d − R_{d+1}) P{ w_i^⊤ δ > G_{−i}^{(r'_d)}(W_{−i}, W_op, δ) },     (1.17)

r'_d := O + |W| − r_d, G_{−i}^{(r)} is the r-th order statistic of {G_o}_{o=1}^{O} ∪ {F_j}_{j=1}^{|W|} \ F_i, and W_{−i} := W \ w_i. We propose a greedy algorithm motivated by (i) the submodularity of the objective function in (1.17) w.r.t. the decision variable W and (ii) our analysis of parimutuel betting, which can be viewed as a special case of our top-heavy DFS contests. For the sake of brevity, we defer our discussion of parimutuel betting to Appendix A.2 but note that it serves as a useful tool to gain intuition regarding the structure of an optimal portfolio in the multiple-entries case for DFS top-heavy contests. In particular, we show in Appendix A.2 that a greedy algorithm that adds an entry with the highest “value-add” in each iteration returns an optimal portfolio in the parimutuel betting setup. We also highlight there the subtle difference between the reward structures of parimutuel betting and DFS contests and the implications this might have for good top-heavy DFS strategies. We focus here on (i), i.e., the submodularity¹³ of the top-heavy objective function in (1.17). We first state an assumption which will prove sufficient to guarantee the submodularity of the top-heavy objective.

Assumption 1.3. Denote by V_k the payoff corresponding to the rank-k entry, where V_1 ≥ V_2 ≥ ··· ≥ V_K ≥ 0 and K := O + N corresponds to the last-ranked entry in the contest, and define ∆_k := V_k − V_{k+1} for all k = 1, ..., K − 1. Then ∆_k ≥ ∆_{k+1} for all k = 1, ..., K − 2.

We note there is a close relationship between the V_k's of Assumption 1.3 and the R_d's that we have been using to define the payoff function of the top-heavy contest, and in fact we could just as easily define the payoff function in terms of the V_k's. Assumption 1.3 can be interpreted as a convexity assumption on the payoffs of the top-heavy contest. This convexity assumption is generally not satisfied in practice but is typically satisfied for the first several (and therefore most important) ranks, which have the highest payoffs. As such, Assumption 1.3 may be viewed as holding approximately in practice. We discuss this further in Appendix A.1.4. We have the following result, a proof of which is provided in Appendix A.1.4.

¹³ We note the greedy algorithm proposed by Hunter et al. 2016 was also motivated by submodularity considerations but their focus was on maximizing the probability of winning a WTA contest whereas our focus is on maximizing the expected reward in general top-heavy contests.

Theorem 1.1 (Submodularity). Under Assumption 1.3, R(W) is monotone submodular in W.

Given the submodularity of R(W), we can invoke a classic result of Nemhauser et al. 1978 for the maximization of monotone submodular functions subject to a cardinality constraint. In particular, consider the following idealized greedy algorithm:

1. Set the first portfolio entry w_1 according to w_1 := arg max_{w∈W} R(w).

2. For i = 2, ..., N, set the i-th entry w_i to be the entry that provides the highest value-add to {w_1, ..., w_{i−1}}, i.e., maximize R({w_1, ..., w_{i−1}, w_i}) over w_i ∈ W.

The result of Nemhauser et al. 1978 implies that this idealized greedy algorithm returns a portfolio of N entries whose objective is guaranteed to lie within 1 − 1/e (≈ 63.2%) of the optimal objective. Motivated by this, we propose a greedy-style algorithm in Algorithm 3 below to solve the top-heavy DFS problem with multiple entries.
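To make the idealized greedy step concrete, here is a minimal generic sketch of greedy maximization of a monotone submodular function under a cardinality constraint; the toy weighted-coverage function R and the candidate universe are purely illustrative and are not the DFS reward function.

```python
# Generic greedy selection for a monotone submodular set function under a cardinality constraint.
# The toy R below (weighted coverage) is only for illustration.
import itertools

universe = {"a": 3.0, "b": 2.0, "c": 1.5, "d": 1.0}          # element weights
candidates = [frozenset(s) for s in itertools.combinations(universe, 2)]

def R(portfolio):
    """Weighted coverage of the union of the chosen candidate sets (monotone submodular)."""
    covered = set().union(*portfolio) if portfolio else set()
    return sum(universe[e] for e in covered)

def greedy(N):
    chosen = []
    for _ in range(N):
        # add the candidate with the highest value-add to the current portfolio
        best = max(candidates, key=lambda c: R(chosen + [c]) - R(chosen))
        chosen.append(best)
    return chosen

portfolio = greedy(N=2)
print([sorted(c) for c in portfolio], R(portfolio))
# Nemhauser et al. (1978): R(portfolio) >= (1 - 1/e) * (optimal value over all size-2 portfolios).
```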

Algorithm 3 Top-Heavy Optimization for N Entries
Require: W, N, γ, Λ, µ_δ, Σ_δ, σ_{δ,G^{(r')}} and Monte Carlo samples of (δ, G^{(r'_d)}) for all d = 1, ..., D
1: W* = ∅
2: for all i = 1, ..., N
3:   for all λ ∈ Λ
4:     w_λ = argmax_{w∈W} { w^⊤ µ_δ + λ ( w^⊤ Σ_δ w − 2 w^⊤ σ_{δ,G^{(r')}} ) }
5:   end for
6:   λ* = argmax_{λ∈Λ} R(W* ∪ w_λ)   % pick λ corresponding to biggest value-add
7:   W* = W* ∪ {w_{λ*}}   % minor notation abuse since W* is a set with duplicates
8:   W = W ∩ {w : w^⊤ w_{λ*} ≤ γ}   % add diversification constraint for next entry
9: end for
10: return W*

We first note that Algorithm 3 reduces to Algorithm 2 when N = 1. It is modeled on the idealized greedy algorithm above but with an important difference. In the idealized greedy algorithm, we identify the entry that will add the most to the portfolio. However, that is a difficult optimization to solve in the DFS context. In particular, in iteration i of Algorithm 3, given the current portfolio consisting of i − 1 entries, it is a non-trivial task to identify the entry ŵ_i that will add the most to the portfolio in terms of expected¹⁴ reward. The reason is that it is not necessarily true that ŵ_i will lie on the “efficient frontier” constructed in lines 3 to 5 of Algorithm 3. Hence, even though our optimization in line 6 of Algorithm 3 identifies the highest value-add entry in the set {w_λ}_{λ∈Λ}, it is possible and indeed very likely that ŵ_i does not belong to {w_λ}_{λ∈Λ} to begin with. In fact, when γ = C, we expect that the candidate entry w_{λ*} will often coincide with a previously chosen entry, i.e., an entry from {w*_1, ..., w*_{i−1}}. Indeed this is what we observed in our numerical experiments, where we typically found just ≈ 10 unique entries when N = 50. But this is simply a reflection of our failure to find ŵ_i.

In order to find a better candidate ŵ_i, we introduce the parameter γ in line 8 of the algorithm. This line ensures that our i-th entry cannot have more than γ athletes in common with each of the previous i − 1 entries. Recalling that C is the number of athletes in a DFS entry, it therefore follows that if we set γ ≥ C, then the constraint w^⊤ w_{λ*} ≤ γ on line 8 is never binding. But if we set γ < C, then the candidate entry w_{λ*} from iteration i will always be a new entry, i.e., an entry not represented in the current portfolio {w*_1, ..., w*_{i−1}}. In particular, setting γ < C results in a completely diversified portfolio of N distinct entries. In our real-world numerical experiments we ultimately chose a value γ < C. It is important to note that diversification is imposed only to find a better choice of ŵ_i. More importantly, by setting γ < C, we observed a much higher expected reward for the final portfolio; in particular, the expected reward almost doubled. (In all of our numerical experiments, we found that γ = C − 3 = 6 was an optimal choice in that it led to final portfolios with the highest expected reward.)

Though we defer our discussion of parimutuel betting markets to Appendix A.2, we emphasize that our results on parimutuel betting provide further support for Algorithm 3 and, in particular, for the imposition of diversification by choosing a value of γ < C in Algorithm 3. Recognizing that Algorithm 3 does not enjoy the “1 − 1/e” guarantee of Nemhauser et al. 1978, we now state a simple proposition which allows us to bound the degree of suboptimality of the portfolio of N entries returned by Algorithm 3; a proof is provided in Appendix A.1.4. (Note that this suboptimality bound does not require Assumption 1.3.)

¹⁴ Note that the R(·) appearing in Algorithm 3 is really an estimated version of the expected reward since we can only evaluate it using the Monte Carlo samples. On a related note, we mention that samples of (δ, G^{(r')}) for some additional ranks r' besides the r'_d's will be required in order to properly estimate R. For example, suppose D = N = 2 with r_1 = 1 and r_2 = 20. Then we will also need samples corresponding to the 19th rank since, if our first entry comes 5th say, then our second entry will only be among the top 20 if it is among the top 19 of our opponents' entries.

Proposition 1.3 (Suboptimality bound). Let w* := arg max_{w∈W} R(w) and W^♯ = {w_i^♯}_{i=1}^{N} denote optimal single-entry and N-entry portfolios, respectively, for the top-heavy contest, where R(·) is as defined in (1.17). Denote by W any arbitrary feasible portfolio of N entries and define ν_W := R(W) / (N R(w*)). Then the value of W is within ν_W of the optimal value, i.e., ν_W ≤ R(W)/R(W^♯) ≤ 1.

In the top-heavy contests of the real-data experiments of Section 1.6, we observed that Algorithm 3 achieved a very satisfactory performance. In particular, during the 17 weeks of the 2017 NFL season, the lowest value of ν_{W*} (for the portfolio W* returned by Algorithm 3) was 48.95%, the highest value was 80.21%, and the average¹⁵ was 62.27%. We also emphasize that these values are only lower bounds on the ratio of the algorithm's value to the optimal value, so it is possible that the actual average performance was much higher than 62.27% of the unknown optimal portfolio's performance. Finally, we acknowledge that these numbers are only legitimate to the extent that our model and the fitted parameters of our model are correct. While we have taken N as given up to this point, it is perhaps worth mentioning that one can always use Algorithm 3 to determine an optimal value of N. Specifically, we can continue to increase N until the expected P&L contribution from the next entry goes negative or falls below some pre-specified threshold. We also note it is straightforward to add additional linear constraints to W if further or different forms of diversification are desired. Finally, we note it is easy to estimate the expected P&L of any portfolio of entries via Monte Carlo simulation.

¹⁵ It is interesting but surely coincidental to see how close the average realized bound of 62.27% is to the guaranteed bound of 1 − 1/e (≈ 63.2%) provided by the idealized greedy algorithm!

1.6 Numerical Experiments

We participated in real-world DFS contests on FanDuel during the 2017 NFL regular season, which consisted of 17 weeks. Each week, we participated in three contests: top-heavy, quintuple-up and double-up. The cost per entry was $1 in top-heavy and $2 in both quintuple-up and double-up contests. The number of opponents O was approximately 200,000, 10,000 and 30,000 for the three contests, respectively, with these numbers varying by around 10% from week to week. The payoff in the top-heavy contest¹⁶ for rank 1 was approx. $5,000, for rank 2 it was approx. $2,500, and then it declined quickly to approx. $100 for rank 30. The lowest winning rank was around 50,000, with a payoff of $2. We used two different models for each contest: our strategic model and a benchmark model.

To be clear, for all top-heavy contests our strategic model was Algorithm 3 with γ = 6. Our strategic model for the double-up and quintuple-up contests was also Algorithm 3 with¹⁷ γ = 6 but with lines 3 to 5 replaced by lines 1 to 9 of Algorithm 1 and with the understanding that the expected reward function R(·) (used to determine λ*) corresponds to the double-up / quintuple-up contest. The second model is a benchmark model that does not model opponents and hence is not strategic; the details are provided in Appendix A.3.1. For each model, we submitted N = 50, 25, and 10 entries to the top-heavy, quintuple-up, and double-up contests, respectively, each week. Other details regarding our model inputs such as µ_δ, Σ_δ, the stacking probability q, the diversification parameter γ, and the budget lower bound B_lb are discussed in Appendix A.3.2, along with the specifications of the hardware and software we use to solve the BQPs.

We now discuss the P&L-related results for the strategic and benchmark models across the three contest structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. Figure 1.1 displays the cumulative realized P&L for both models across the three contest structures during the season. (Table A.1 in Appendix A.3.3 displays the actual numbers.) The strategic portfolio has outperformed the benchmark portfolio since inception in the top-heavy series of contests. The strategic portfolio has earned a cumulative profit of $280.74, which is over 3 times the realized P&L of the benchmark portfolio. Moreover, the maximum cumulative loss, that is, the max shortfall, for the strategic portfolio is just $18.50. In addition, the small initial investment of $50 plus two additional investments of $18.50 and $7.26 (a total of $75.76) have been sufficient to fund the strategic portfolio throughout the season. This suggests a profit of $280.74 on an investment of $75.76, that is, a return of over 350% in just 17 weeks. In contrast, the benchmark portfolio needed much more capital than the initial investment of $50. If we account for this additional required capital, then the benchmark portfolio has earned a return of less than 50% in 17 weeks. Note that given the so-called house edge of approximately 15%, both models have performed considerably better than the average portfolio, which would have lost ≈ 17 × 15% × $50 = $127.50 across the 17 weeks.

¹⁶ We note that there are other top-heavy contests with even more competitors and payoff structures that are even more “top-heavy”. For example, a regular NFL contest on FanDuel often has approximately 400,000 entries with a top payoff of $250,000 to $1,000,000. Payoffs then decline quickly to approx. $500 for the 50th rank. Top-heavy contests are extremely popular and hence are our principal focus in this chapter.
¹⁷ The reason for doing so in the double-up / quintuple-up contests was simply to reduce the variance of our P&L, albeit at the cost of a (hopefully slightly) smaller expected P&L. This is discussed further later in this section.

[Figure 1.1: three panels — (a) Top-heavy, (b) Quintuple-up, (c) Double-up — each plotting cumulative P&L (USD) against week (1–17) for the strategic and benchmark models.]

Figure 1.1: Cumulative realized dollar P&L for the strategic and benchmark models across the three contest structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season.

With regard to the quintuple-up series, the strategic model was better until the end of week 6 but since then the benchmark portfolio has outperformed it. We note, however, that the difference in cumulative P&L between the two models at the end of the season ($20 − (−$40) = $60) could easily be wiped out in just one week's contest, as we can see when we look at the relative performances of the two strategies in week 7, for example.

We are confident that the realized P&L to date for each contest series is actually conservative and that superior performance (in expectation) could easily be attained. There are at least three reasons for this. First, we used off-the-shelf estimates of the input parameters µ_δ and Σ_δ, which are clearly vital to the optimization model. Moreover, we obtained the µ_δ estimate a day before the actual NFL games started and mostly ignored the developments in the last few hours preceding the games, which can be very important in football. For example, in week 7, the main RB of the Jacksonville Jaguars (Leonard Fournette) was questionable to play. Accordingly, their second RB (Chris Ivory) was expected to play more time on the field. However, our µ_δ estimate did not reflect this new information. Our estimate projected 17.27 and 6.78 fantasy points for Fournette and Ivory, respectively. Moreover, since FanDuel sets the price of the athletes a few days before the games take place, Fournette was priced at 9000 and Ivory at 5900. There was a clear benefit to leveraging this information as Fournette was over-priced and Ivory was under-priced. In fact, our opponents exploited this opportunity as around 60% of them (in double-up) picked Ivory. A proactive user would have updated his µ_δ estimate following such news. In fact, the so-called sharks do react to such last-minute information (Nickish 2015), meaning that we were at a disadvantage by not doing so.

For another example, consider Devin Funchess, a wide receiver (WR) for the Carolina Panthers. During the course of the season, Funchess was usually the main WR for Carolina but in week 16 he was expected to be only the second or third WR and in fact Damiere Byrd was expected to be the main WR. This was late-developing news, however, and our µ_δ estimate did not reflect it. Moreover, Byrd was priced at 4900 while Funchess was priced at 7000 and so Byrd was clearly under-priced relative to Funchess. In the week 16 game itself, Byrd scored 9.6 points while Funchess scored only 2.6 points. Because of our failure to respond to this late-developing news and update our parameters, it transpired that 52 of our entries picked Funchess. We observed (after the fact) many similar situations during the course of the season and there is no doubt that we could have constructed superior portfolios had we been more proactive in monitoring these developments and updating parameters accordingly.

The second reason is simply a variance issue in that a large number of DFS contests (and certainly many more than 17) will be required to fully establish the outperformance of the strategic model in general. In fact, we believe the variance of the cumulative P&L is particularly high for NFL DFS contests. There are several reasons for this. Certainly, the individual performance of an NFL player in a given week will have quite a high variance due to the large roster size¹⁸ as well as the relatively high probability of injury. This is in contrast to other DFS sports where there is considerably more certainty over the playing time of each athlete. To give but one example, in week 5 we witnessed a series of injuries that impacted many of our submitted portfolios (both strategic and benchmark). Devante Parker (Miami Dolphins) was injured in the first quarter but was picked by 56 of our entries. Charles Clay (Buffalo Bills) and Sterling Shepard (NY Giants) were injured before halftime, affecting 70 and 4 entries, respectively. Bilal Powell (NY Jets) and Travis Kelce (Kansas City Chiefs) left the field close to halftime, impacting 44 and 25 entries, respectively. Furthermore, the NFL season consists of just 16 games per team whereas teams in sports such as basketball, ice hockey and baseball play 82, 82 and 162 games, respectively, per season. As a result, the cumulative P&L from playing DFS contests over the course of an NFL season will have a very high variance relative to these other sports. This high variance of NFL-based fantasy sports has been noted by other researchers including, for example, Clair and Letscher 2007. We also suspect that Hunter et al. 2016 focused on ice hockey and baseball (and avoided NFL) for precisely this reason.

The third reason applies specifically to the quintuple-up contests. In our strategic model for quintuple-up, there is a possibility of incorrectly minimizing portfolio variance when we should in fact be maximizing it (along with the expected number of points, of course). Proposition 1.1 leads us to try and increase variance if µ_w < 0 for all w ∈ W and to try and decrease variance otherwise. But µ_w must be estimated via Monte Carlo and is of course also model-dependent. As such, if we estimate a maximal value of µ_w ≈ 0, it is quite possible we will err and increase variance when we should decrease it, and vice versa. We suspect this may have occurred occasionally with the quintuple-up contests where we often obtained an estimate of µ_w that was close to zero. This of course is also related to the median-versus-mean issue we mentioned immediately after Algorithm 1 and discuss in Appendix A.1.3. We note that one potential approach to solving this problem would have been to use Algorithm 8 instead of Algorithm 1. We note that the benchmark portfolio is always long expected points and variance of points.

Figure 1.2 displays the in-model P&L distribution for the diversification strategy from Section 1.5.1 for both strategic and benchmark portfolios in week 10¹⁹ contests. For the strategic portfolio, we use Algorithm 3 as explained at the beginning of Section 1.6 and for the benchmark portfolio, we use the procedure outlined in Appendix A.3.1. We note this P&L distribution is as determined by our model with the continued assumption of the multivariate normal distribution for δ as well as the Dirichlet-multinomial model for opponents' portfolio selections. The strategic model dominates the benchmark model in terms of expected profit. In the top-heavy contest, the expected profit of the strategic portfolio is over 5 times that of the benchmark portfolio. The gain is not as drastic in the quintuple-up and double-up contests. The substantial gain in top-heavy seems to come from the fact that the strategic portfolio has considerably more mass in the right tail. Note this leads to the higher standard deviation of the top-heavy strategic portfolio.²⁰

Figure 1.3 is similar to Figure 1.2 except it is based upon using the replication strategy from Appendix A.1.3 instead of the diversification strategy. We note the strategic model continues to have a higher expected P&L than the benchmark model. The main observation here is that the expected P&L drops considerably when we go from the diversification strategy to the replication strategy for top-heavy. This is consistent with our analysis from Appendix A.2 on parimutuel betting as well as our discussion surrounding Algorithms 3 and 10 in Section 1.5.1 and Appendix A.2, respectively. In contrast, the P&L increases for both quintuple-up and double-up when we employ the replication strategy. Again, this is consistent with our earlier argument in favor of replication for double-up style contests. In our numerical experiments, however, we used the diversification strategy for both double-up and quintuple-up contests. This was only because of the variance issue highlighted earlier and our desire to use a strategy which had a considerably smaller standard deviation (while ceding only a small amount of expected P&L). As can be seen from Figures 1.2 and 1.3, the diversification strategy has (as expected) a smaller expected P&L as well as a smaller probability of loss.

¹⁸ There are more than 45 athletes on a roster but only 11 on the field at any one time.
¹⁹ Other weeks have similar results as shown in Figure 1.4.
²⁰ The high standard deviation in the top-heavy strategic portfolio should be seen as a pro instead of a con, since it is mostly coming from the right tail of the P&L distribution.

[Figure 1.2 here. Panel headers: (a) Top-heavy — Strategic: E(P&L), sd(P&L), p(loss) = 578.6, 2953.8, 0.58; Benchmark: 86.3, 694.5, 0.46. (b) Quintuple-up — Strategic: 54.6, 68.7, 0.24; Benchmark: 43.2, 75.2, 0.35. (c) Double-up — Strategic: 10.0, 11.0, 0.17; Benchmark: 9.3, 11.0, 0.18. Each panel shows a P&L histogram for the strategic and benchmark portfolios.]

Figure 1.2: P&L distribution for the diversification strategy for the strategic and benchmark portfolios for week 10 contests of the 2017 NFL season. Recall N = 50, 25, and 10 for top-heavy, quintuple-up and double-up, respectively. The three metrics at the top of each image are the expected P&L, the standard deviation of the P&L and the probability of loss, that is, P(P&L < 0).

[Figure 1.3 here. Panel headers: (a) Top-heavy — Strategic: E(P&L), sd(P&L), p(loss) = 123.9, 1265.9, 0.67; Benchmark: 51.9, 286.9, 0.52. (b) Quintuple-up — Strategic: 58.0, 123.8, 0.57; Benchmark: 51.0, 122.7, 0.60. (c) Double-up — Strategic: 10.7, 16.9, 0.23; Benchmark: 10.3, 17.2, 0.24. Each panel shows a P&L histogram for the strategic and benchmark portfolios.]

Figure 1.3: P&L distribution for the replication strategy for the strategic and benchmark portfolios for week 10 contests of the 2017 NFL season. Recall N = 50, 25, and 10 for top-heavy, quintuple-up and double-up, respectively. The three metrics at the top of each image are the expected P&L, the standard deviation of the P&L and the probability of loss, that is, P(P&L < 0).

Figure 1.4 displays the realized and expected P&Ls. For both strategic and benchmark models and all three contests, the expected profit is greater than the realized profit. This is perhaps not too surprising given the bias that results from optimizing within a model. In top-heavy, however, the realized P&L is within one standard deviation of the expected P&L, although this is not the case for the quintuple- and double-up contests. As discussed above, we believe our realized results are conservative and that a more proactive user of these strategies who makes a more determined effort to estimate µ_δ and Σ_δ and responds to relevant news breaking just before the games can do considerably better. Despite this potential for improvement, the strategic model has performed very well overall. The small losses from the double-up and quintuple-up contests have been comfortably offset by the gains in the top-heavy contests. As we noted earlier, the return on investment in top-heavy is over 350% for a seventeen-week period, although we do acknowledge there is considerable variance in this number as evidenced by Figure 1.4(a).

[Figure 1.4: three panels — (a) Top-heavy, (b) Quintuple-up, (c) Double-up — plotting predicted cumulative P&L (USD) with ±1 sd bands against week (1–17) for the strategic and benchmark models.]

Figure 1.4: Predicted and realized cumulative P&L for the strategic and benchmark models across the three contest structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. The realized cumulative P&Ls are displayed as points.

More granular results are presented in Appendix A.3.3. For example, in that appendix we show the performance of each week's best entry for the strategic and benchmark models corresponding to the top-heavy contests for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. We also present various statistics of interest, e.g., expected fantasy points, standard deviation, and λ* for the first optimal entry in each week for the strategic and benchmark models. We observe there that λ* is closer to zero for the double-up and quintuple-up contests, thereby indicating (see Remark 1.1) that the value of modeling opponents is much greater for top-heavy contests. In addition, Appendix A.3.3 also contains some additional anecdotes describing situations where our strategic model went against the “crowd” and was successful in doing so. Finally, Dirichlet regression results are also presented in Appendix A.3.3. Note that our Dirichlet regression models used the features described in (1.5) and we validated this choice of features by evaluating its goodness-of-fit and comparing its out-of-sample performance against two “simpler” variations of the Dirichlet regression model. Specific details are deferred to Appendix A.3.4.

1.7 The Value of Modeling Opponents, Insider Trading, and Collusion

In the numerical results of Section 1.6, we found that modeling opponents' behavior can significantly increase the expected P&L from participating in top-heavy DFS contests, and we explore this in more depth in Section 1.7.1. In Section 1.7.2, motivated by the issue of insider trading in fantasy sports that we described in Section 1.1, we evaluate how much a fantasy player gains by having access to inside information. Finally, in Section 1.7.3, we analyze the value of collusion in fantasy sports, that is, how much a fantasy player gains by strategically partnering with other fantasy players and submitting more portfolios than allowed. In each of these experiments, we employ the same algorithms as we did for the numerical experiments of Section 1.6.

1.7.1 The Value of Modeling Opponents

As we saw in Figures 1.2 and 1.3, the value of modeling opponents is clearly contest-dependent. Indeed our model, which explicitly models opponents, has a much bigger edge (in terms of expected P&L) over the benchmark model in the top-heavy contest²¹ than in the double-up and quintuple-up contests. But the value of modeling opponents also depends on how accurately we model their behavior. On this latter point, it is of interest to consider:

(a) How much do we gain (with respect to the benchmark model) if we use a deterministic p = (p_QB, ..., p_D)? For example, in the NFL contests, we could set p equal to the values predicted by the FantasyPros website.

(b) How much additional value is there if instead we assume (p_QB, ..., p_D) ∼ (Dir(α_QB), ..., Dir(α_D)) as in Algorithm 7, but now α_QB, ..., α_D depend only on the first two features stated in Equation (1.5), that is, the constant feature and the estimate of p that we obtain from the FantasyPros website?

(c) Finally, how much additional value is there to be gained by assuming the model of Algorithm 7 where α_QB, ..., α_D are allowed to depend on any and all relevant features?

²¹ This is discussed in more detail in Appendix A.1.4. The reason for the relative importance of modeling opponents is largely due to the importance of selecting entries with both a high variance and expectation in top-heavy contests.

To answer these questions, we computed the optimal portfolios for each of the three cases described above (and for the benchmark model) and also the corresponding expected P&Ls, assuming case (c) to be the ground truth. We did this for all three contest structures for each of the 17 weeks in the 2017 NFL regular season. All the parameter values such as N and γ were as in Section 1.6. We found accurate modeling of opponents to be most valuable in the top-heavy contests. In particular, the total expected P&L (over 17 weeks) in the top-heavy series was approximately $1,400, $5,400, $5,800, and $6,000 for the benchmark model, case (a), case (b), and case (c), respectively. Accordingly, even though the deterministic model for p (case (a)) explains most of the gain in expected P&L we reap by being strategic, there is approximately an additional 10% reward we receive by modeling the opponents more precisely (cases (b) and (c)). It is worth emphasizing, however, that this 10% additional gain depends on our “ground truth” model. For example, if we had assumed some other ground truth where p was more predictable given additional and better-chosen features, then there might be more to gain in moving from case (a) to case (c).

1.7.2 The Value of Insider Trading

A question that is somewhat dual to the first question concerns the issue of insider trading and the value of information. This question received considerable attention in 2015 (Drape and Williams 2015a; Drape and Williams 2015b) when a DraftKings employee was accused of using data from DraftKings contests to enter a FanDuel DFS contest in the same week and win $350,000.

38 Without addressing the specific nature of insider trading in that case, we pose several questions:

1. How much does the insider gain if he knows the true positional marginals p = (pQB,..., pD)?

2. How much does the insider gain if he knows the entries of all contestants, that is, W_op? In that case, the only uncertainty in the system is the performance vector δ of the real-world athletes. (Note that the problem of computing an optimal portfolio given full knowledge of W_op is straightforward in our framework.)

To state these questions more formally, we note that the optimal expected P&L for a portfolio consisting of N entries satisfies

max_{W ∈ W^N} E_p[ E_{δ,W_op}[ Reward(W, δ, W_op) | p ] ]     (1.18)

where Reward(W, δ, W_op) denotes the P&L function, which is easy to compute given W, δ, and W_op. The answer to question (i) is then given by the difference between (1.18) and

E_p[ max_{W ∈ W^N} E_{δ,W_op}[ Reward(W, δ, W_op) | p ] ].     (1.19)

Similarly, the answer to question (ii) is given by the difference between (1.18) and

E_{p,W_op}[ max_{W ∈ W^N} E_δ[ Reward(W, δ, W_op) | p, W_op ] ].     (1.20)

However, computing both (1.19) and (1.20) is computationally expensive since the optimization occurs inside an expectation over high-dimensional random variables and hence many expensive optimizations would be required. Though one could perform such computations on an HPC cluster over an extended period of time, we instead designed less demanding but nonetheless informative experiments to evaluate the value of insider trading. In particular, we ran the following two experiments for all three contest structures across all 17 weeks of the 2017 NFL season:

• Experiment 1: We first compute the optimal portfolio for each week conditional on knowing the realized p. We call this portfolio the insider portfolio. We then compare the expected P&L of the insider portfolio with the optimal strategic non-insider portfolio that we submitted to the real-world contests. (We assume the ground truth in the P&L computations to be the realized marginals p together with the same stacking parameters from Section 1.6.)

• Experiment 2: This is similar to Experiment 1 but we now replace p with W_op. However, we do not have access to the realized values of W_op during the NFL season. Instead, for each week we sample one realization of W_op using the realized p (with the same stacking parameters from Section 1.6) and treat the sampled W_op as the realized value. We then compute the optimal portfolio (the insider portfolio) for each week conditional on knowing the realized W_op and compare the expected P&Ls of the insider portfolio with the strategic non-insider optimal portfolio, assuming the ground truth in the P&L computations to be the realized W_op.

It is worth emphasizing that in both Experiments 1 and 2, we are taking expectations over δ, the performance vector of the underlying NFL athletes. As such, we are averaging over the largest source of uncertainty in the system.

In Experiment 1, we found the insider to have an edge (in terms of total expected P&L across the season) of around 20%, 1%, and 2% in the top-heavy, quintuple-up²², and double-up contests, respectively, over the (strategic) non-insider. In Figure 1.5, we compare the weekly expected top-heavy P&L of the insider and (strategic) non-insider portfolios and observe that the weekly increase varies from 1% (week 6) to 50% (week 16). As one would expect, the insider portfolio's P&L dominates that of the non-insider. Of course, the insider will have an even greater edge over a non-insider who is not strategic as we have already seen in Section 1.6 that the strategic non-insider has roughly five times the expected P&L of the non-strategic non-insider in top-heavy contests. Compared to this approximately 500% difference between the benchmark and strategic players, the additional 20% increase in expected P&L gained via insider trading seems modest. This modest increase is due in part to how well our Dirichlet regression model allows the (strategic) non-insider to estimate the positional marginals p. Accordingly, the value of inside information depends on how well the non-insider can predict opponents' behavior. In particular, the more sophisticated the non-insider, the less value there is to having inside information.

In Experiment 2, we found the insider's edge to be similar to that of Experiment 1. Intuitively, one would expect the edge to be bigger in Experiment 2 due to the insider having the more granular information of W_op. Noting that the variance of G^{(r')} | (δ, p) goes to zero as the number of opponents O goes to infinity, however, we can conclude that the additional value of seeing the realized W_op over and beyond the value of seeing the realized p should be small when O is large.²³ Given that the contests we participated in had large O, this observation supports our results from Experiment 2.

²² As expected, the benefits of insider trading were much greater in the top-heavy contests than in the double-up and quintuple-up contests, where we expected the benefits to be quite small. It is quite interesting, however, to see that the observed benefits in quintuple-up (1%) were less than the observed benefits in double-up (2%). We suspect this may be related to the same issue with quintuple-up that we identified earlier in Section 1.6, namely the issue that arises when the maximal value of µ_w ≈ 0. In this case, the optimal value of λ in Algorithm 1 will be close to 0. Indeed this is what we observed in Table A.3. As a result (and this should be clear from the expression for σ²_{Y_w} in (1.11) together with lines 3 and 7 of Algorithm 1), the only benefit to inside information in this case is in estimating µ_{Y_w}.


Figure 1.5: Weekly expected dollar P&L for the strategic model (N = 50) with and without inside information p in the top-heavy series.

²³ But note we are assuming here that the dependence structure between the positional marginals in p is known regardless of whether we only see p or W_op.

1.7.3 The Value of Collusion

In addition to the insider trading controversy, the subject of collusion in fantasy sports contests has also received considerable attention. In one case, two brothers were suspected of colluding when one of them won 1 million dollars (Brown 2016; Reagan 2016) in one of DraftKings' “Fantasy Football Millionaire” contests, a particularly top-heavy contest where just the first few places earn most of the total payoff. Collusion refers to the situation where two or more DFS players form (unbeknownst to the contest organizers) a strategic partnership and agree to pool their winnings. Maintaining separate accounts allows the partnership to submit N_collude × E_max entries to a given contest, where N_collude is the number of players in the partnership and E_max is the maximum number of entries permitted per player. Collusion can be beneficial in top-heavy contests as it allows the colluding players to avoid substantial overlap (and therefore achieve greater diversification) in their portfolios, thereby increasing the probability that the partnership will win a large payout.

We will assume that the Ncollude players will construct a single portfolio of Ncollude × Emax entries when they collude. This portfolio can be constructed using Algorithm 3 from Section 1.5.1 with

N = Ncollude × Emax. This portfolio can then be separated into Ncollude separate sub-portfolios each consisting of Emax entries and each colluding player can then submit one of these sub-portfolios as his official submission. In order to estimate the benefits of collusion, it is first necessary to understand the behavior of the colluding players when they are unable to collude. Many different behaviors are of course possible but it seems reasonable to assume that potentially colluding players are sophisticated and understand how to construct good portfolios. We therefore assume24 that each of the potentially colluding players has access to the modeling framework outlined in this chapter and that as a result, each one submits identical portfolios of Emax entries. This portfolio is constructed using the same approach from Section 1.5.1. While this assumption is stylized and not realistic in practice, it does

24To the extent that our framework is a good framework for constructing DFS portfolios (which we believe to be the case!), then this might overstate the value of collusion as most colluding players will not have access to such a framework. Nonetheless, we can use this framework to consider just how beneficial colluding might be.

42 allow us to compute an upper bound on how beneficial colluding might be. Specifically, we can easily estimate and compare the expectations and standard deviations of the profits for the colluding and non-colluding portfolios in order to estimate the potential benefits of colluding. We would argue that the difference in expected values provides an upper bound on the value of colluding since in practice non-colluders are very unlikely to choose identical or even near-identical portfolios. Before describing our numerical experiments, it is worthwhile noting that the results of Sec- tion 1.6 and specifically, Figures 1.2(a) and 1.3(a), can be used to estimate the benefits of colluding

in week 10 top-heavy contests of the 2017 NFL season if Emax = 1 and Ncollude = 50. We see from Figure 1.2(a) that collusion in this case results in an estimated expected profit of 578.6 with a standard deviation of 2,953.8. In contrast, we can see from Figure 1.3(a) that the non-colluding portfolio has an expected profit of 123.9 with a standard deviation of 1,265.9. In this case, the colluding portfolio has an expected profit that is almost 5 times the expected profit of the non-colluding portfolio. It may appear this gain comes at a cost, namely a higher standard deviation, but we note the higher standard deviation is entirely due to increased dispersion on the right-hand side of the probability distribution. This is clear from Figures 1.2(a) and 1.3(a). Indeed, we note that the probability of loss is 0.58 in Figure 1.2(a) (collusion) and increases to 0.67 in Figure 1.3(a) (non-collusion). This increased standard deviation can therefore hardly be considered a cost of collusion. We also performed a more formal experiment to evaluate the value of collusion in top-heavy contests. We assumed the larger value of Emax = 50, which is quite common in practice, and then varied the number of colluders so that Ncollude ranged from 1 to 5. To be clear, the non-colluding portfolio comprised 50 strategic entries replicated Ncollude times whereas the colluding portfolio consisted of Ncollude × 50 strategic entries. In Table 1.1, we compare the performances of the colluding and non-colluding portfolios over the 2017 NFL season in terms of the total expected dollar P&L, the average weekly Sortino ratio, and the average weekly probability of loss over

25Not surprisingly we do not see any benefits to collusion in the double-up or quintuple-up contest here and indeed as pointed out earlier, we expect replication (which corresponds to non-collusion in the setting considered here) to be very close to optimal.

the 17 weeks of the 2017 NFL season. To be clear, both portfolios were constructed for each week using our calibrated model for that specific week. The expected P&L for the week was

then computed by averaging (via Monte Carlo) over δ and Wop, where samples of (δ, Wop) were generated using the same calibrated model.26 In particular, the realized (δ, Wop)'s across the 17 weeks played no role in the experiment. The colluding portfolio clearly dominates the non-colluding portfolio across the three metrics

and for all values of Ncollude. For example, collusion among 5 sophisticated fantasy players can increase the expected P&L for the 17-week season by 44%, increase the average weekly Sortino ratio by 63%, and decrease the average weekly loss probability by 8%. It is also clear from these numbers that collusion also results in a decreased downside risk (the square root of E[P&L² × 1{P&L ≤ T}]) since the percentage increase in the Sortino ratio is more than the percentage increase in the expected P&L. Accordingly, collusion results in a win-win situation by simultaneously increasing the expected P&L and decreasing the downside risk, which demonstrates that collusion can be surprisingly valuable in top-heavy DFS contests.

Table 1.1: Total expected dollar P&L (over 17 weeks), average weekly Sortino ratio, and average weekly probability of loss related to the top-heavy contests for both the non-colluding ("NC") and colluding ("C") portfolios with Emax = 50 and Ncollude ∈ {1, ..., 5}. The average weekly Sortino ratio is simply the average of the weekly Sortino ratios SR_i for i = 1, ..., 17. Specifically, SR_i := (E[P&L_i] − T)/DR_i, where E[P&L_i] denotes the expected P&L for week i, T denotes the target P&L which we set to 0, and DR_i := sqrt(E[P&L_i² × 1{P&L_i ≤ T}]) denotes the downside risk for week i. (The expected P&L is rounded to the nearest integer whereas the Sortino ratio and probability of loss are rounded to two decimal places.)

              Expected P&L (USD)          Sortino Ratio              Probability of Loss
 Ncollude     NC       C       Increase   NC      C       Increase   NC     C      Decrease
 1            6,053    6,053   0%         14.60   14.60   0%         0.49   0.49   0%
 2            9,057    10,240  13%        11.02   13.24   20%        0.49   0.47   4%
 3            10,975   13,776  26%        8.96    12.32   37%        0.49   0.46   6%
 4            12,411   16,883  36%        7.64    11.56   51%        0.49   0.46   7%
 5            13,632   19,677  44%        6.75    10.99   63%        0.49   0.45   8%
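To make the metrics in Table 1.1 concrete, the Sortino ratio and loss probability are straightforward to estimate once Monte Carlo samples of a portfolio's weekly P&L are available. The sketch below is only an illustration of these definitions; the `pnl_samples` array is a hypothetical stand-in for a given week's simulated P&L and is not the code used to produce the table.

```python
import numpy as np

def sortino_ratio(pnl_samples, target=0.0):
    """(E[P&L] - target) / downside risk, where the downside risk is
    sqrt(E[P&L^2 * 1{P&L <= target}]), estimated from Monte Carlo samples."""
    pnl = np.asarray(pnl_samples, dtype=float)
    downside_risk = np.sqrt(np.mean((pnl ** 2) * (pnl <= target)))
    return (pnl.mean() - target) / downside_risk

def loss_probability(pnl_samples):
    """Estimate of P(P&L < 0)."""
    return np.mean(np.asarray(pnl_samples, dtype=float) < 0.0)

# Hypothetical heavy-right-tail P&L, loosely mimicking a top-heavy contest.
rng = np.random.default_rng(0)
pnl_samples = rng.exponential(scale=400.0, size=100_000) - 250.0
print(sortino_ratio(pnl_samples), loss_probability(pnl_samples))
```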

Of course the benefits from collusion are not as great as those from week 10 reported above

26Both colluding and non-colluding portfolios then benefitted in this experiment from the fact that the assumed model was indeed the correct model. We are interested in the difference in performances of the two portfolios, however, and so the bias that results from assuming the players know the true model should be relatively small.

when Emax = 1 and Ncollude = 50. This is because it is intuitively clear that these benefits, while positive, are a decreasing function of Emax, all other things being equal. For example, in the extreme case where Emax = ∞, there are clearly no benefits to colluding. In practice, we suspect the gains from collusion are much smaller for risk-neutral players since it is extremely unlikely that non-colluders would ever choose identical or near-identical portfolios as we have assumed here.

1.8 Conclusions and Further Research

In this chapter, we have developed a new framework for constructing portfolios for both double-up and top-heavy DFS contests. Our methodology explicitly accounts for the behavior of DFS opponents and leverages mean-variance theory (for the outperformance of stochastic benchmarks) to develop a tractable algorithm that requires solving a series of binary quadratic programs. Following Hunter et al. 2016, we also provide a tractable greedy algorithm for handling the multiple-entry, i.e., N > 1, case for top-heavy style contests. This is in contrast to the replication approach we advocate for double-up style contests. Moreover, our greedy algorithm (or simple variations of it) can be justified theoretically via the results we developed on parimutuel betting as well as the classic result of Nemhauser et al. 1978 on the performance of an idealized greedy algorithm for submodular maximization.

There are many potential directions for future research. We could back-test other benchmark strategies as well as refine our own preferred strategies. It would also be interesting to further develop our modeling and estimation approach for a random opponent's portfolio wo. We assumed in Section 1.3 that we had sufficient data to estimate the positional marginals of wo and we would like to explore other features that might be useful in the Dirichlet regression to better estimate these marginals. We would also like to explore other copula models for splicing these marginals together to construct the joint distribution of wo. It is not clear, however, whether we could ever obtain a rich enough dataset to estimate other copulas sufficiently accurately. While NFL contests are among the most popular DFS contests, the season is quite short with only 17 rounds of games. Moreover, as mentioned in Section 1.6, the individual performance

of an NFL player in a given week has quite a high variance, potentially causing the variance of the cumulative P&L in football DFS contests to be high relative to other sports such as basketball, ice hockey, and baseball. For these reasons and others, it would be interesting to apply our modeling framework to DFS contests in these other sports. It would also be interesting to use domain knowledge of these other sports to actively update estimates of µ_δ and Σ_δ as the round of games approaches. This is something we did not do in the current NFL season. Indeed, we recorded many instances when it would have been possible to avoid certain athletes in our DFS entries had we used up-to-date information that was available before the games in question and before our entries needed to be submitted. As a result, we believe the net positive P&L achieved by our models is very encouraging and can easily be improved (in expectation) by more active monitoring of the athletes. Other directions for future research include the development of very fast re-optimization procedures / heuristics that could be performed on an already optimized portfolio of N entries when new information regarding player injuries, availability, weather, etc. becomes known in the hours (and indeed minutes) before the portfolio of entries must be submitted to the DFS contest. As discussed in Section 1.6, such late-breaking developments are commonplace and, in order to extract the full benefit of the modeling framework presented here, it is important that such developments be reflected in updated parameter estimates, which in turn calls for re-optimizing the entries. Of course, it would be desirable to re-optimize the entire portfolio in such circumstances but, given time constraints, it may be necessary to make do with simple but fast heuristic updates. For the same reason, it would also be of interest to pursue more efficient Monte Carlo strategies for estimating

the inputs µ_{G^{(r')}}, σ²_{G^{(r')}}, and σ_{δ,G^{(r')}} that are required for the various algorithms we proposed. While we did make use of results from the theory of order statistics to develop our Monte Carlo algorithm, it should be possible to develop considerably more efficient algorithms to do this. In the case of top-heavy contests, for example, the moments corresponding to the top order statistics are particularly important and it may be possible to design importance-sampling or other variance reduction algorithms to quickly estimate them. Finally, we briefly mention the area of mean-field games. In our modeling of opponents, we

did not assume they were strategic, although we did note how some strategic modeling along the lines of stacking to increase portfolio variance could be accommodated. If we allowed some opponents to be fully strategic, then we would be in a game-theoretic setting. Such games would most likely be impossible to solve. Even refinements such as mean-field games (where we let O → ∞ in some appropriate fashion) would still likely be intractable, especially given the discreteness of the problem (binary decision variables) and portfolio constraints. But it may be possible to solve very stylized versions of these DFS games where it is possible to purchase or sell short fractional amounts of athletes. There has been some success in solving mean-field games in the literature on parimutuel betting (Bayraktar and Munk 2017) in horse-racing and it may be possible to do likewise here for very stylized versions of DFS contests. We hope to pursue some of these directions in future research.

Chapter 2: Shapley Meets Uniform: An Axiomatic Framework for

Attribution in Online Advertising

One of the central challenges in online advertising is attribution, namely, assessing the contribution of individual advertiser actions such as e-mails, display ads, and search ads to eventual conversion. Several heuristics are used for attribution in practice; however, most do not have any formal justification. The main contribution of this work is to propose an axiomatic framework for attribution in online advertising. We show that the most common heuristics can be cast under the framework and illustrate how these may fail. We propose a novel attribution metric that we refer to as counterfactual adjusted Shapley value (CASV), which inherits the desirable properties of the traditional Shapley value while overcoming its shortcomings in the online advertising context. We also propose a Markovian model for the user journey through the conversion funnel, in which ad actions may have disparate impacts at different stages. We use the Markovian model to compare our metric with commonly used metrics. Furthermore, under the Markovian model, we establish that the CASV metric coincides with an adjusted "unique-uniform" attribution scheme. This scheme is efficiently implementable and can be interpreted as a correction to the commonly used uniform attribution scheme. We supplement our theoretical developments with numerical experiments using a real-world large-scale dataset. A preliminary version of this work appeared in the WWW conference (Singal et al. 2019) and the current version is under revision at Management Science.

2.1 Introduction

With the rise of the Internet, the digital economy has become a trillion-dollar industry accounting for over 6% of the U.S. GDP (Hagan 2018). Retailers reach consumers through different online channels with the goal of acquiring new customers and managing relationships with existing ones. The wide range and frequency of economic activities taking place in the digital world has

enabled data collection at a massive scale, allowing retailers to better understand customer behavior and improve service quality using data-driven decisions. With over a billion people having access to the Internet, it is not surprising that advertising has moved to the digital space. The global digital marketing sector witnessed a growth of 21% in 2017, which increased its market size to USD 88 billion (Sluis 2018). Key decisions in this space include allocating budget to various ad channels and media (e-mail, display media platforms, and paid search, for instance) and optimizing tactical decisions in each channel. Such tactical decisions may be driven by the publishers (see, e.g., Balseiro et al. 2014; Hojjat et al. 2017 and Lejeune and Turner 2019) or by the advertisers themselves. For example, in display or search advertising, this would entail bidding for ads to push one's product towards the desired customer demographic at the right time; see, e.g., Iyer et al. 2014; Balseiro et al. 2015; Balseiro and Gur 2017 and Baardman et al. 2019. Furthermore, due to the digital nature of such online ad exchanges, advertisers can access user-level information before placing a bid, which motivates the importance of understanding the state-specific value of an advertiser action (ad action). All such tactical decisions require a deep understanding of the value of showing an ad via a specific channel at a given time.

How can an advertiser assign or attribute value to the advertising actions taken across the different channels and media? Attribution is one of the central questions in online advertising. The value of each channel is an important input to media mix optimization, helps build an understanding of the customer journey, and also helps a company justify its marketing spend (United 2012; Priest 2017). An incorrect understanding of the effectiveness of online channels can result in suboptimal budget allocation, possibly resulting in significant lost revenue (Kireyev et al. 2016). Recognizing the importance of the attribution problem, the Marketing Science Institute has consistently identified attribution as a topmost research priority over the last few years (Institute 2016; Institute 2018; Institute 2020). Though multiple heuristics have been proposed and studied in this domain, there is a lack of a systematic approach that both has theoretical foundations and is tractable.

Attribution is inextricably linked to causality since it involves quantifying the added value of showing an ad over the baseline value of what would have happened if no ad was shown (the counterfactual).

We consider the following view of causality, which comes from the pioneering work of Rubin 1974:

“Intuitively, the causal effect of one treatment, E, over another, C, for a particular unit and an interval of time from t1 to t2 is the difference between what would have happened at time t2 if the unit had been exposed to E initiated at t1 and what would have happened at t2 if the unit had been exposed to C initiated at t1.”

In the context of digital advertising, treatments E and C correspond to the advertiser taking and not taking an ad action, respectively. Accordingly, attribution involves capturing the causal effect of an ad where the baseline corresponds to not showing an ad. Causality is an active research area, and there exist paradigms for capturing the causal effect of a treatment. See, e.g., Pearl 2009; Halpern and Pearl 2005; Chockler and Halpern 2004; Hitchcock 1997; Morgan and Winship 2014; Collins et al. 2004; Eells 1991; Hume 2003 and Rubin 1974. We comment on some of these alternative approaches in our concluding remarks. In addition to these foundational or axiomatic approaches to causality, there also exist data-driven approaches, see, e.g., Bottou et al. 2013, where one uses importance sampling in a Bayesian network to estimate counterfactuals, without having to collect additional data. In particular, they get a handle on the "what would have happened" scenario using the data that one has already collected. In Section 2.5.2, we leverage this technique to construct data-efficient algorithms. As with the existing work in attribution, we also consider the setting where an advertiser is interested in understanding the contributions of various ads to a single product they are promoting. Even with a single product, attribution is a challenging problem. It involves distributing the value generated by a network of actions to each individual action. Such a network might have a mix of interaction effects that one needs to account for when decomposing the network value. Capturing such (possibly non-linear) interaction effects is a fundamental challenge.

2.1.1 Related Literature

The attribution problem has been studied widely and in diverse areas. We present a brief overview of existing approaches and refer the reader to existing surveys (Choi et al. 2017; Kannan et al. 2016) for a more complete treatment. Attribution methodologies can be classified into two broad classes: rule-based and algorithmic. We discuss both classes below.

Rule-Based Heuristics.

Rule-based heuristics include approaches such as last touch attribution (LTA), uniform weights, and customized weights (Arensman and Yeung 2016; Priest 2017; Quantcast 2013; Quantcast 2016). In LTA, an advertiser attributes all the value generated by a user to the last ad action, whereas under a uniform weights scheme, all the ad actions on a conversion path are allocated an equal credit. Ad actions receive tailored weights under a custom weights scheme. Although such heuristics are transparent and tractable, there is no rationale justifying their appropriateness as a measure for attribution. Metrics such as LTA can be unfair since they do not value the contribution of channels that build product awareness. Uniform or customized weights might appear to be a fix but there is no a priori reason to believe that attribution should be linear.

Algorithmic Approaches.

The algorithmic approaches can be classified as using either the incremental value heuristic (IVH) (also called the removal effect) or the Shapley value (SV) as the measure for attribution.

IVH. IVH computes the change in the eventual conversion1 probability of a user when a specific ad is removed from her path. This is the most common metric for attribution (Abhishek et al. 2012; Anderl et al. 2016; Arava et al. 2018; Archak et al. 2010; Danaher and Heerde 2018; Kakalejčík et al. 2018; Li and Kannan 2014; Li et al. 2017). In this approach, one calibrates a model that predicts the conversion probability as a function of the ad actions and then uses the estimated model to

1Conversion refers to the event in which a user buys the underlying product.

compute the incremental value of each action. The novelty comes from the model proposed to describe user behavior, e.g., a hidden Markov model (HMM) (Abhishek et al. 2012) or a neural network (Arava et al. 2018). There exists little (if any) formal justification for why IVH is a good attribution scheme. In Section 2.3.2, we show that IVH can result in unjustifiable attribution.

SV. SV (Shapley 1953) is a well-accepted concept for assigning credit to individual players in a cooperative game. The value generated by online advertising can be viewed as the outcome of a cooperative effect of the actions taken on various channels and media platforms. Dalessandro et al. 2012 pose attribution as a causal estimation problem and propose SV as an approximation scheme for the causally motivated problem. They also show that SV generalizes the probabilistic model of Shao and Li 2011. Using a stylized model, Berman 2018 shows that the use of SV for attribution can be beneficial to the advertiser. Under a different stylized setting, Abhishek et al. 2017 analyze attribution contracts (including SV) used by an advertiser to incentivize two publishers that affect customer acquisition and they highlight an interesting tension between "fairness" to the publishers and "optimality" for the advertiser. Unlike in Berman 2018 and Abhishek et al. 2017, in our work, publishers are not strategic. Our work focuses on the "fairness" of the attribution to the various channels and avenues available to the advertiser. It is also worth mentioning that the online advertising industry is embracing the use of SV for attribution; see for example Google 2019.

Other works. Attribution has been tackled from various other angles. Jordan et al. 2011 use a Markovian model to motivate a payment scheme that satisfies incentive compatibility for the advertiser and is fair from the publisher’s point-of-view. Xu et al. 2014 propose a mutually exciting point process to capture dynamic interactions among various ads. Zhang et al. 2014, Ji et al. 2016, and Ji and Wang 2017 provide an interesting view of attribution via the lens of survival theory. Zhao et al. 2018 propose a regression-based relative importance method to compute the marginal contributions. From an empirical perspective, Blake et al. 2015 measure the return on investment of paid search and document the importance of accounting for the counterfactual.

2.1.2 Our Approach and Contributions

In spite of almost a decade of research, it remains unclear what an "appropriate" or "best" attribution measure is. IVH appears to be the most popular but, to the best of our knowledge, there is no systematic framework to support it. In fact, as we later discuss, IVH suffers from serious drawbacks in certain settings. On the other hand, SV has a strong theoretical justification and a number of desirable properties such as efficiency, symmetry, linearity, and null player; in fact, SV is the unique solution to a cooperative game that has all these properties. However, estimating SV exactly is computationally intractable in general, and one has to resort to approximations (Avrachenkov et al. 2012; Castro et al. 2009; Fatima et al. 2008; Liben-Nowell et al. 2012; Littlechild and Owen 1973; Maleki et al. 2013; Michalak et al. 2013; Owen 1972). In addition, as we show in Section 2.4.2, SV is not counterfactual in nature. We seek a metric that has the desirable properties of SV, and yet is tractable and able to accommodate counterfactual reasoning.

Our contributions are as follows. First, we construct an abstract Markov chain model for the customer journey through the conversion funnel that generalizes most of the existing Markovian models in the attribution literature. In every period, the customer is in one of finitely many states. The advertiser observes the state and takes an action. The customer transitions to a random state distributed according to a probability mass function that depends on the current state and the advertising action. We propose attributing value to each state-action pair, which is a generalization of the existing approaches that attribute only to advertising actions. This extension allows us to capture state-specific attribution for each action, the need for which is supported by the finding in Bleier and Eisenbeiss 2015. We analytically quantify the additional value of state-specific attribution and show that attributing at a state-action level allows one to capture a multitude of interactions that are missed otherwise.

Second, we show that the Markov chain model allows us to generate intuitive canonical Markovian networks that serve as a robustness check for an attribution scheme. Using these networks, we show that the current attribution metrics (LTA, IVH, and uniform) have serious limitations. Furthermore, we show how one can compute state-specific SV for each action in our Markovian

model and highlight that it does not adjust for the counterfactual and, hence, is not an appropriate metric for attribution in our setting. To the best of our knowledge, this is the first work in the literature to analyze the various existing attribution metrics using a common framework.

Third, and the main contribution of this work, is an axiomatic framework for attribution in online advertising that explicitly accounts for the counterfactual action. We propose a new metric for attribution that we call counterfactual adjusted Shapley value (CASV). We show that our proposed metric inherits the desirable axioms of the classical SV-based attribution. Note that we do not need the Markovian model to define CASV and establish the uniqueness of this attribution scheme. We show that this new metric leads to appropriate attribution in the Markovian networks that highlighted the limitations of the existing metrics. In addition, we establish that CASV admits a crisp characterization under our Markovian model. It coincides with a unique-uniform attribution scheme that explicitly adjusts for the counterfactual, which, in turn, can be interpreted as a correction to the commonly used uniform attribution scheme. Furthermore, we exploit this characterization to develop simple algorithms to estimate our metric. We demonstrate the scalability of CASV for the Markovian framework by performing numerical experiments on a real-world large-scale dataset.

Outline. The remainder of this chapter is organized as follows. In Section 2.2, we introduce the Markovian model that describes the user journey as a function of ad exposure. In Section 2.3, we discuss how the existing attribution schemes (LTA, IVH, and uniform) apply to the Markovian model and we construct canonical Markovian networks that highlight the limitations of these attribution schemes. In Section 2.4, we introduce SV-based attribution and showcase its drawbacks. Next, we present our axiomatic framework for CASV. In Section 2.5, we show that the CASV-based attribution scheme is equivalent to the adjusted unique-uniform attribution scheme for the Markovian model, and propose simple algorithms to estimate it. In Section 2.6, we apply the proposed Markovian framework and the CASV-based attribution scheme to a large-scale real-world dataset. In Section 2.7, we discuss the value of state-specific attribution. We conclude in

Section 2.8 with some directions for ongoing and future research.

2.2 Model

We propose a Markovian model for user behavior in which transitions in the user's state are stochastic and are a function of only the current state and the advertiser action. This process ends when the user either quits (leaves the system) or converts (buys the product). In Section 2.2.1, we define the components of the Markov chain (denoted by M). When constructing the Markov chain, we keep most of its elements abstract to showcase its flexibility. In Section 2.2.2, we show how many practical settings can be modeled using the proposed Markovian framework. We conclude this section by defining the attribution problem for our Markovian model. We note that our model builds upon existing works in this literature that have also used a Markovian model to describe user behavior (Abhishek et al. 2012; Anderl et al. 2016; Kakalejčík et al. 2018). We define our model in abstract terms and therefore, most of the existing models can be seen as special instances of our model.

2.2.1 Markovian Model of User Behavior

We first discuss the state space of the Markov chain, followed by the arrival process of the users, then the action space of the advertiser, and finally, we define the transition probabilities.

State space. We define S := {1, ..., m} as the set of states, excluding the two absorbing states (quit q and conversion c), and S+ := S ∪ {q, c}. In order to highlight the flexibility of our model, we do not give a concrete meaning to a state. We discuss some examples in Section 2.2.2.

Arrival process. External traffic arrives at state s ∈ S w.p. λ_s (initial state probability). We define the vector λ ∈ R^m as [λ_s]_{s∈S}. We assume no external traffic arrives at c and q.

Action space. We define A := {1, ..., n} as the set of advertiser actions, such as sending an e-mail or showing a display ad. We include a baseline or no-ad action (a = 1)2 in A that captures the impact of both the baseline online actions and any offline promotion efforts of the advertiser that have an impact on online transitions. Let β_s^a denote the probability that an advertiser takes action a ∈ A at state s ∈ S. Note that taking the baseline action a = 1 in state s may still move the traffic along in the network. We denote by β the collection of all β_s^a values and assume it is fixed.

Transition probabilities. We denote by p_{ss'}^a the probability that a user moves from s ∈ S to s' ∈ S in one transition given the advertiser takes action a ∈ A at s. Also, for all (s, s') ∈ S², we define p_{ss'}^β := Σ_{a∈A} β_s^a p_{ss'}^a, which denotes the average transition probability. To keep the notation concise, for each a ∈ A, we define the matrix P^a := [p_{ss'}^a]_{(s,s')∈S²} ∈ R^{m×m}. Furthermore, P^β := [p_{ss'}^β]_{(s,s')∈S²} ∈ R^{m×m}, and B^a := diag([β_s^a]_{s∈S}) ∈ R^{m×m} for each a ∈ A is a diagonal matrix. Clearly, P^β represents the transition matrix over the partial state space S and P^β = Σ_{a∈A} B^a P^a. For all s ∈ S, we define the vector p_s^a ∈ R^m as the s-th row of P^a for all a ∈ A and p_s^β ∈ R^m as the s-th row of P^β. Next, we state the only assumption we make on our problem primitives.

Assumption 2.1 (Absorption). The Markov chain corresponding to (λ, P^β) is absorbing, i.e., the probability that each user will eventually either quit or convert from any state equals 1.

Assumption 2.1 is equivalent to saying that from any state, there exists a positive probability path to one of the absorbing states. We discuss the precise definitions for the quit and convert states in the context of our numerical study in Section 2.6. So far, we have not discussed the transitions to and from states q and c. We use the notation

p_{sc}^a and p_{sc}^β to denote the action-specific and average one-step transition probabilities from s ∈ S to c for all a ∈ A. (For transitions from s ∈ S to q, replace the index c by q.) Since both q and c are absorbing states, the transitions from them are self-loops w.p. 1. We define p_c^a := [p_{sc}^a]_{s∈S} ∈ R^m for each a ∈ A and p_c^β := [p_{sc}^β]_{s∈S} ∈ R^m. (Note that p_s^a and p_s^β for s ∈ S correspond to the probabilities of leaving s whereas p_c^a and p_c^β correspond to the probabilities of entering c.) Thus, a Markov chain M is specified by (λ, P^β, p_c^β, p_q^β). Next, we shift our focus to three quantities of interest for the Markov chain M.

2We are reserving a = 0 for future use as it will become transparent in Section 2.4.2.

Expected number of visits. Let F^β ∈ R^{m×m} denote the fundamental matrix of the Markov chain M, i.e., the (i, j)-th entry of F^β equals the expected number of visits to state j when the initial state is i. Assumption 2.1 ensures that F^β exists, and F^β = (I − P^β)^{−1} (Grinstead and Snell 2012).

Effective arrival rate. Let µ_s^β denote the effective arrival rate into state s ∈ S and µ^β := [µ_s^β]_{s∈S} ∈ R^m. It is easy to show that λ^⊤ + (µ^β)^⊤ P^β = (µ^β)^⊤; hence, (µ^β)^⊤ = λ^⊤ F^β.

Eventual conversion probability. We define h_s^β as the probability of eventually being absorbed in c from state s ∈ S and the vector h^β := [h_s^β]_{s∈S} ∈ R^m. It is easy to show that h^β = P^β h^β + p_c^β. Thus, h^β = F^β p_c^β. We set h_q^β = 0 and h_c^β = 1.
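For concreteness, once λ, P^β, and p_c^β have been estimated, the three quantities above reduce to a few lines of linear algebra. The following sketch illustrates this on a small hypothetical three-state chain (the numbers are made up for illustration and are not from our dataset).

```python
import numpy as np

# Hypothetical three-state chain under the average dynamics beta.
lam = np.array([0.6, 0.3, 0.1])         # lambda: initial state probabilities
P_beta = np.array([[0.10, 0.50, 0.00],  # P^beta: transitions within S
                   [0.00, 0.20, 0.40],
                   [0.00, 0.00, 0.30]])
p_c = np.array([0.05, 0.10, 0.40])      # p_c^beta: one-step conversion probabilities
# (the remaining probability mass in each row goes to the quit state q)

m = len(lam)
F = np.linalg.inv(np.eye(m) - P_beta)   # fundamental matrix F^beta = (I - P^beta)^{-1}
mu = lam @ F                            # effective arrival rates: (mu^beta)^T = lambda^T F^beta
h = F @ p_c                             # eventual conversion probabilities: h^beta = F^beta p_c^beta
total_value = lam @ h                   # total network value: lambda^T h^beta
print(F, mu, h, total_value, sep="\n")
```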

2.2.2 Model Discussion

We have defined the states and actions for the Markov chain M in abstract terms. Next, we provide some possible settings for these quantities. These are, by no means, the only possible mappings. Indeed, we expect the specific definitions of states and actions to be context dependent. One possibility for the state is a summary of past interactions with the customer, e.g., the number of visits to the product website, number of e-mails received, number of e-mails opened, number of display ads seen, number of display ads clicked, etc. These counts will need to be suitably quantized to control the size of the state space. The advertiser’s actions could be to do nothing, or send an e-mail, or show a display ad, or bid for a search ad.

Another possibility for the state space is S = {unaware, aware, interest, desire} associated with the conversion funnel used in the marketing literature (Strong 1925; Howard and Sheth 1969; Barry 1987; Bettman et al. 1998; Court 2009; Elzinga et al. 2009; Kotler and Armstrong 2010; Mulpuru 2011; Jansen and Schuster 2011; Bruce et al. 2012), which captures the journey of the user from being unaware of the product to becoming interested, and finally purchasing. Given the

fine granularity of data now available to advertisers, we believe the advertiser can infer such states using an appropriate statistical model. The action space remains the same as above. The conversion state c refers to the user purchasing the underlying product, and is easily observable in data. The transition to the quit state q indicates that the user has decided not to buy the product3. In practice, the quit state is usually not explicitly observed in the data, but can be inferred from the customer's inactivity level. In our experiments with real data detailed in Section 2.6, we present one simple heuristic to infer whether the customer has quit. In order to contextualize the arrival process and the transition dynamics, it is useful to consider an advertising campaign that is run for a fixed time period. The initial state of a user is her state on the first day of the campaign. Different customers can have different initial states (depending on the level of prior engagement with the product, for example) and such heterogeneity is captured by our arrival process. During the advertising campaign, the state of the customer evolves dynamically as a (stochastic) function of the ads she is exposed to, which is captured by our transition probabilities. (We provide an example based on real data when we present our numerical study in Section 2.6.) We note that the definitions of state and action spaces can be customized as per the needs of the advertiser (since our attribution framework is developed for the abstract Markovian model). The state space is likely context-specific and driven by the user features that are relevant to a particular advertiser. For example, one can even encode the time dimension in the state if one wishes to explicitly model time. Learning context-specific state aggregation and Markov chain dynamics from data is an active area of research. We refer the reader to Hallak et al. 2013 and references therein.

2.2.3 The Attribution Problem

We assume that the sale of one unit of the product generates a value of 1. As a result, the total value generated by the network equals λ^⊤ h^β. The goal of an attribution model is to allocate this

3To tie it with the existing literature, it can be seen as the user “leaving the system” (Abhishek et al. 2017).

value to the underlying state-action pairs4 in the system while accounting for the counterfactual of not taking the particular action in the state. Note that in our setting, the state-action probability β_s^a for all (s, a) ∈ S × A is fixed, i.e., we do not consider the change in β_s^a as a function of an attribution scheme. (Unlike the setting in Berman 2018 and Abhishek et al. 2017, the publishers here are not strategic.) Our focus here is on developing a framework to attribute the total generated value to each state-action pair in a manner that fairly accounts for the contribution of each state-action pair to the eventual conversion of the user. Let π_s^a denote the attribution to the pair (s, a) ∈ S × A. Naturally, we require the total attribution across all state-action pairs to not exceed the total generated value, i.e., Σ_{s∈S} Σ_{a∈A} π_s^a ≤ λ^⊤ h^β. We will call an attribution scheme budget-balanced if it satisfies this property.

2.3 Current Attribution Approaches and Their Limitations

Next, we consider commonly used attribution schemes, e.g., LTA, IVH, and uniform, in the Markov chain setting, and illustrate their limitations. As we will see, our abstract Markov chain model is a useful tool for constructing examples that illustrate these limitations. At a high-level, both LTA and uniform are “backward looking” in the sense that they split the value of a path after observing its realization (from the initial state to the end). LTA allocates all the value to the last interacting state-action pair whereas uniform allocates it equally to all the state-action pairs that appeared in the path to conversion. In contrast, IVH is “forward looking” since it attributes to a given state-action pair based on what will happen in the future. It does not require the knowledge of how the actual path will unfold. All it requires is a model that can predict the change in eventual conversion probability that results if the state-action pair is removed from the path. In many examples to come, we will sometimes assume that the no-ad action leads to the quit state with probability 1 (this could be seen as an extreme case in which the act of not advertising leads a customer to a competitor). We do so to find the simplest examples with the minimum number of moving parts. Note that this is not a limitation of our approach, as our theory does not

4One might want to attribute value to actions as opposed to state-action pairs. However, we attribute to state-action pairs because the impact of an action is likely to be state dependent.

require such an assumption.

2.3.1 Last Touch Attribution (LTA)

In LTA, all the value generated due to a purchase is attributed to the last state-action pair in the path. Denote by P a random path (over state-action pairs) sampled uniformly from M. For (s, a) ∈ S × A, define

w_s^{a,LTA}(P) := 1 if P converts and (s, a) is the last state-action pair in P, and 0 otherwise.

The attribution to (s, a) ∈ S × A under the LTA scheme equals

π_s^{a,LTA} := E_{P∼M}[w_s^{a,LTA}(P)].

In other words, (s, a) receives complete credit if it is the last state-action pair before the convert state c on a path P. Thus, LTA is clearly budget-balanced and easy to implement. However, LTA appears “unfair” since state-action pairs are not rewarded for moving the user up in the conversion funnel. We demonstrate this limitation next.

Example 2.1 (LTA is unfair). Consider the network in Figure 2.1 with self-loop probability p = 0 (a line network). Under LTA, all the value goes to the ad action at state m, i.e., π_m^{2,LTA} = 1 and π_s^{a,LTA} = 0 for all other pairs of (s, a), even though the ad actions at the first m − 1 states are essential to move the user to state m, because without them the user would have quit.

2.3.2 Incremental Value Heuristic (IVH)

Figure 2.1: Network for Examples 2.1, 2.2, and 2.4. The network is a line: state 1 (with arrival probability λ_1 = 1) has a self-loop w.p. p and moves to state 2 w.p. 1 − p, and each subsequent state moves to the next one w.p. 1, with state m leading to the conversion state c. The action space consists of two actions: no-ad action (a = 1) and ad action (a = 2). The solid blue lines show transitions corresponding to the ad action. We assume the no-ad action in every state leads to a transition to the quit state w.p. 1. The advertiser takes the ad action at all states w.p. 1.

IVH allocates to each state-action pair (s, a) the increase in the eventual conversion probability by taking action a in state s as opposed to the no-ad action. For each a ∈ A, we first define an auxiliary variable that captures the corresponding forward-looking increment conditioned on the user being at a given state:

z^{a,IVH} := (P^a h^β + p_c^a) − (P^1 h^β + p_c^1) = (P^a − P^1) h^β + (p_c^a − p_c^1),

where the first and second bracketed terms correspond to action a and the no-ad action, respectively, and, in the right-most expression, the first term captures the change in eventual conversions while the second captures the change in immediate conversions. Here z^{a,IVH} = [z_s^{a,IVH}]_{s∈S} ∈ R^m. The scalar z_s^{a,IVH} can be interpreted as the allocation at a "trace" level, i.e., if the advertiser observes a user at state s and decides to take action a, the corresponding allocation would be z_s^{a,IVH}. However, we need to scale this metric for it to be seen as attribution over the entire population. In particular, given that at state s the effective arrival rate is µ_s^β and action a is taken w.p. β_s^a, the IVH attribution is given by

π_s^{a,IVH} := µ_s^β β_s^a z_s^{a,IVH}. (2.1)

Although IVH is tractable and somewhat adjusts for the counterfactual, a serious limitation is that it can distribute more value than the network generates because it pays for both eventual and immediate conversions; consequently, each conversion may be accounted for several times. We now show that this is, indeed, possible.

Example 2.2 (IVH can over-allocate). Consider the network in Figure 2.1 with p = 0. Under IVH, the ad action at all states receives an attribution of 1 (i.e., π_s^{2,IVH} = 1 for s ∈ {1, ..., m}) and the no-ad action at all states receives an attribution of 0 (i.e., π_s^{1,IVH} = 0 for s ∈ {1, ..., m}), resulting in a total allocation of m even though the value generated equals 1.
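The over-allocation in Example 2.2 can be reproduced directly from (2.1). The sketch below builds the line network of Figure 2.1 with p = 0 (the ad action moves the user one state forward while the no-ad action sends her to the quit state) and checks that IVH allocates 1 to the ad action at every state even though the network generates a total value of 1; the variable names are ours and purely illustrative.

```python
import numpy as np

m = 5                                   # number of non-absorbing states (Figure 2.1, p = 0)

# Ad action (a = 2): state s -> s + 1 for s < m, and state m -> c.
P2 = np.zeros((m, m))
for s in range(m - 1):
    P2[s, s + 1] = 1.0
p_c2 = np.zeros(m)
p_c2[m - 1] = 1.0

# No-ad action (a = 1): every state -> quit, so it never converts.
P1 = np.zeros((m, m))
p_c1 = np.zeros(m)

beta2 = np.ones(m)                      # ad action taken w.p. 1 in every state
lam = np.zeros(m)
lam[0] = 1.0                            # all traffic arrives at state 1

# Since beta_s^2 = 1, the average dynamics coincide with the ad-action dynamics.
F = np.linalg.inv(np.eye(m) - P2)       # F^beta
h = F @ p_c2                            # h^beta (all ones here)
mu = lam @ F                            # mu^beta (all ones here)

z2 = (P2 - P1) @ h + (p_c2 - p_c1)      # z^{2,IVH}
pi2 = mu * beta2 * z2                   # pi^{2,IVH} from equation (2.1)
print(pi2, pi2.sum(), lam @ h)          # each entry 1, total m, generated value 1
```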

A common workaround to ensure IVH is budget-balanced is to normalize the output by an

appropriate constant (normalized IVH). Normalizing the numbers in Example 2.2 results in the ad action at each state receiving 1/m, which appears reasonable. However, as we illustrate in Example 2.3, even normalized IVH can be problematic.

Example 2.3 (Normalized IVH can be unjustified). Consider the network in Figure 2.2. IVH attributes 1 to the ad action at state 1 (i.e., π_1^{2,IVH} = 1) and 1/2 to the ad action at all other states (i.e., π_s^{2,IVH} = 1/2 for s ∈ {2, ..., m}). Thus, attribution under normalized IVH (denoted by π_s^{a,nIVH}) equals

π_s^{a,nIVH} = 1 / (1 + (m − 1)/2) for (s, a) = (1, 2), π_s^{a,nIVH} = (1/2) / (1 + (m − 1)/2) for (s, a) ∈ {(s, 2)}_{s=2}^{m}, and π_s^{a,nIVH} = 0 for all other (s, a) pairs.

As m → ∞, all the value goes to the ad action at states 2 to m, i.e., Σ_{s=2}^{m} π_s^{2,nIVH} = 1 and π_1^{2,nIVH} = 0 as m → ∞. This appears inappropriate as the ad action at state 1 "deserves" at least half the total value since 50% of the users convert immediately after seeing the ad at state 1.

Figure 2.2: Network for Example 2.3. From state 1 (with arrival probability λ_1 = 1), the ad action leads to the conversion state c w.p. 0.5 and to state 2 w.p. 0.5; from each state s ∈ {2, ..., m − 1} the ad action leads to state s + 1 w.p. 1, and from state m it leads to c w.p. 1. The action space consists of two actions: no-ad action (a = 1) and ad action (a = 2). The solid blue lines show transitions corresponding to the ad action. The no-ad action in every state leads to a transition to the quit state w.p. 1. The advertiser takes the ad action at all states w.p. 1.

2.3.3 Uniform Attribution

Under a uniform attribution scheme, the credit generated is attributed equally to all the state-action pairs that are encountered in a path. Denote by P a random path (over state-action pairs)

sampled uniformly from M. For (s, a) ∈ S × A, define

w_s^{a,uni}(P) := n_s^a / |P| if P converts and (s, a) ∈ P, and 0 otherwise,

where n_s^a equals the number of times (s, a) appears in P and |P| denotes the number of (not necessarily unique) state-action pairs in P. The uniform attribution for (s, a) ∈ S × A is

π_s^{a,uni} := E_{P∼M}[w_s^{a,uni}(P)]. (2.2)

In other words, (s, a) receives an “equal cut” of each converted path it contributes to. Such a scheme is budget-balanced and scalable. On the line network (Figure 2.1 with p = 0), it attributes

1/m to the ad action at each state (i.e., π_s^{2,uni} = 1/m for s ∈ {1, ..., m}), which seems reasonable. However, it does not account for the counterfactual. Furthermore, it can result in "unfair" attribution, which we show next.
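Since both LTA and uniform attribution are expectations over random paths, they can be estimated by simulating user journeys from M. The sketch below is a minimal Monte Carlo illustration, assuming the per-action transition probabilities and the policy β are available as numpy arrays (the function and argument names are ours); it is not an optimized implementation.

```python
import numpy as np
from collections import defaultdict

def lta_and_uniform(lam, beta, P, p_c, p_q, n_paths=100_000, seed=0):
    """Monte Carlo estimates of LTA and uniform attribution for each (s, a).
    lam: (m,) initial state probabilities; beta: (m, n) action probabilities;
    P: (n, m, m), p_c: (n, m), p_q: (n, m) per-action transition probabilities."""
    rng = np.random.default_rng(seed)
    m, n = beta.shape
    lta, uni = defaultdict(float), defaultdict(float)
    for _ in range(n_paths):
        s, path = rng.choice(m, p=lam), []
        while True:
            a = rng.choice(n, p=beta[s])
            path.append((s, a))
            probs = np.concatenate((P[a, s], [p_c[a, s], p_q[a, s]]))
            nxt = rng.choice(m + 2, p=probs)
            if nxt == m:                       # absorbed in the conversion state c
                lta[path[-1]] += 1.0
                for pair in path:              # uniform: equal cut across the path
                    uni[pair] += 1.0 / len(path)
                break
            if nxt == m + 1:                   # absorbed in the quit state q
                break
            s = nxt
    return ({k: v / n_paths for k, v in lta.items()},
            {k: v / n_paths for k, v in uni.items()})
```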

Example 2.4 (Uniform attribution can be unfair). Consider the network in Figure 2.1 with5 p ∈ (0, 1). For a given path with n_1 occurrences of the state-action pair (1, 2), the uniform attribution scheme attributes n_1/(n_1 + m − 1) to the state-action pair (1, 2). As p increases, the state-action pair (1, 2) receives more credit. However, this seems inappropriate since more value is attributed to a state-action pair for "slowing down" the path to conversion. Ideally, an attribution scheme should reward the state-specific ads for pushing the user towards conversion.

Remark 2.1 (Unique-uniform). A simple fix to the above drawback is to attribute based on the number of unique state-action pairs. For instance, in the example above, the unique-uniform scheme would attribute 1/m to the ad action at each state. However, this scheme still does not account for the counterfactual, and there is no formal rationale for it. We show later that uniform allocation adjusted for uniqueness and the counterfactual does, in fact, have a strong mathematical

5Note that a positive value of p can be interpreted as the ad action at state 1 not changing the state (e.g., awareness level) for a fraction of the user population, e.g. users who do not pay attention to promotional e-mails.

rationale and will be part of our prescribed method.

To summarize this section, the above examples highlight some limitations of the existing heuristics but also point to the fact that no single heuristic "dominates" the others. In some way, IVH possesses a counterfactual form, giving it some practical appeal. At the same time, it is only forward-looking, ignoring all past actions and their potential contributions. Uniform, on the other hand, accounts for past actions, but is not counterfactual. Ideally, an attribution scheme would provide the best of both worlds, properly accounting for past actions while also accounting for the counterfactual associated with not advertising. We also note that our proposed Markov chain model can be used to generate simple canonical examples to test the robustness and intuitive appeal of attribution schemes.

2.4 Shapley Value (SV)

In this section, we investigate SV-based attribution schemes. SV is a particularly attractive metric for attribution in our context, since the value, i.e., customer conversion, is the result of the cooperative impact of actions taken during the course of the customer's journey. We first present a primer on SV (Section 2.4.1) followed by a discussion on why a direct application to our context does not suffice (Section 2.4.2). Next, we propose a counterfactual adjusted SV (CASV) metric for attribution (Section 2.4.3) and evaluate the performance of CASV on all the motivating examples (Section 2.4.4). In this section, we hope to convince the reader that, leaving computational tractability aside, CASV is an attractive metric for attribution. We emphasize that until Section 2.4.4, we do not assume that the customer journey is given by the Markovian model and hence, the axiomatic support of CASV (Theorem 2.1) does not rely on any Markovian assumption. However, we recycle the notation already introduced (s, a, β, M) with the understanding that the underlying dynamics can be non-Markovian. In particular, we allow the eventual conversion probability to be any arbitrary function of the state-action frequencies β. For example, the transition probabilities can depend on the previous state-action pairs (as opposed to just the current state-action pair).

64 2.4.1 SV Primer

SV is a solution concept in coalitional game theory (Shapley 1953). Consider a finite set P of players with certain fixed strategies. Let the characteristic function v(X) denote the value generated by coalition X ⊆ P. The value v(∅) of the empty coalition is normalized to 0. The goal is to "fairly" distribute the value v(P) to the individual players. Note that the players are not assumed to be strategic here, i.e., they do not choose strategies as a function of the distribution they are likely to receive; the strategies are fixed ex-ante. SV distributes the value v(P) of the grand coalition to a player r ∈ P as follows:

π_r^{Shap} := Σ_{X ⊆ P\{r}} w_{|X|} × {v(X ∪ {r}) − v(X)},

where

w_{|X|} := |X|! (|P| − |X| − 1)! / |P|!.

The attractiveness of SV is rooted in the fact that it is the unique solution concept that satisfies the following four desirable properties:

1. Efficiency: Σ_{r∈P} π_r = v(P).

2. Symmetry: Consider players r, r' ∈ P such that for any X ⊆ P \ {r, r'}, r and r' are equivalent, i.e., v(X ∪ {r}) = v(X ∪ {r'}). Then, π_r = π_{r'}.

3. Linearity: Consider two characteristic functions v_1(·) and v_2(·). Linearity states that for all players r ∈ P, π_r(v_1 + v_2) = π_r(v_1) + π_r(v_2) and π_r(αv_1) = απ_r(v_1) for all α ∈ R.

4. Null player: Suppose player r ∈ P does not add any value to any coalition, i.e., for all X ⊆ P \ {r}, v(X ∪ {r}) = v(X). Then, π_r = 0.
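For a small number of players, the formula above can be evaluated exactly by enumerating all coalitions. The following brute-force sketch is only an illustration (its cost grows exponentially in the number of players); the characteristic function `v` is supplied by the caller and the toy game at the end is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_value(players, v):
    """Exact Shapley values by coalition enumeration.
    players: list of hashable player identifiers.
    v: characteristic function mapping a frozenset of players to a real value."""
    n = len(players)
    phi = {}
    for r in players:
        others = [p for p in players if p != r]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                X = frozenset(coalition)
                total += weight * (v(X | {r}) - v(X))
        phi[r] = total
    return phi

# Toy check: value 1 only for the grand coalition of two players.
v = lambda X: 1.0 if len(X) == 2 else 0.0
print(shapley_value(["p1", "p2"], v))   # {'p1': 0.5, 'p2': 0.5}
```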

2.4.2 Direct Application

SV is a natural candidate for decomposing network value because the state-action pairs can be viewed as players participating in a cooperative game to achieve a common goal of converting the users. In particular, given the action intensities β, the network generates a value of v(β) and

the state-action-specific SV π_s^{a,Shap} represents the credit allocated to each state-action pair (s, a) ∈ S × A. Note that we do not assume that the value function v(β) is given by the Markovian model proposed in Section 2.2. We first define the underlying components (players, coalitions, and the characteristic function) in our context and then discuss an important drawback of such a naïve application.

Player. We want to allow for the possibility that the value of an action a ∈ A is state-dependent, and thus want to attribute to state-action pairs (s, a). In order to make this possible in the SV setting, we must define the set of players P = S × A. Note that we want to allow attribution to the baseline no-ad action (a = 1), i.e., we are explicitly interested in understanding the value of the baseline action.

Coalition. Next, we need to define the value associated with a coalition X ⊆ P ≡ S × A. In the usual application of SV, one computes the value v(X) generated by the coalition X, assuming that the rest of the players P \ X are absent, i.e., not contributing to the value generation process in any manner. In our setting, we accomplish this by postulating that the customer exits the network when an action a is employed in state s and the pair (s, a) ∉ X. We operationalize this by defining a "zero-value action" (a = 0) that transitions customers to the quit state with probability 1, and setting its frequency to β_s^0 = Σ_{a:(s,a)∉X} β_s^a. This yields the coalition-specific state-action frequencies β^X, where the effective set of actions is A ∪ {0}. Note that when X = ∅, the zero-value action is taken with probability 1 in all states, and therefore, v(∅) = 0.

We note here that an alternative approach would be to attribute only to actions a ≠ 1, i.e., actions other than the baseline no-ad action. In this setting, the set of players is P = S × (A \ {1}), and we take the baseline action 1 in state s whenever we intended to take action a, but (s, a) ∉ X. In this setting, the value of the empty coalition corresponds to the value generated by taking only the baseline no-ad action in all states, and this value may not be zero in our setting because of offline promotional activities, i.e., in this specification, v(∅) ≠ 0 is possible. Since SV allocates the additional value v(X) − v(∅) to X ⊆ P = S × (A \ {1}), the actions a ≠ 1 are allowed to "free ride" on the value generated by the baseline action. Our approach of introducing the zero-value action, and explicitly attributing value to the baseline no-ad action, prevents this free riding. See Appendix B.1 for a more detailed discussion.

Characteristic function. Given X ⊆ S × A, we define v(X) to be the conversion probability when the state-action frequencies are given by β^X. For the Markov chain model, v(X) := λ^⊤ h^{β^X}. Note that the key result in this section (Theorem 2.1) applies to any model where the conversion is a function of the state-action frequencies β, i.e., we allow for the possibility of a non-Markovian model. Since the value depends on the (possibly non-Markovian) model M, we will use the notation v_M(·) when the emphasis on M is necessary.

Shapley value. The SV for each (s, a) ∈ S × A is

π_s^{a,Shap}(v) := Σ_{X ⊆ {S×A}\{(s,a)}} w_{|X|} × {v(X ∪ {(s, a)}) − v(X)}, (2.3)

where

w_{|X|} := |X|! (mn − |X| − 1)! / (mn)!. (2.4)

SV depends on our choice of the characteristic function v(·) and the (possibly non-Markovian) model M and hence, we will use the notation π_s^{a,Shap}(v) and π_s^{a,Shap}(M) when the emphasis on v(·) and M is necessary. Though this view of attribution inherits the desirability of SV, it suffers from a critical flaw: it does not adjust for the counterfactual (Example 2.5).

Example 2.5 (Need for counterfactual). Consider the network in Figure 2.3 and suppose the

customer behavior is described by a Markov chain model. Showing an ad in state 1 should not get any attribution since it provides no additional value over the counterfactual action (no ad).

However, (2.3) attributes all the value to the ad action as shown below:

π_1^{2,Shap} = (1/2){v({(1, 2)}) − v(∅)} + (1/2){v({(1, 2), (1, 1)}) − v({(1, 1)})} = (1/2){1 − 0} + (1/2){1 − 0} = 1,

π_1^{1,Shap} = (1/2){v({(1, 1)}) − v(∅)} + (1/2){v({(1, 1), (1, 2)}) − v({(1, 2)})} = (1/2){0 − 0} + (1/2){1 − 1} = 0,

where, in each expression, the first term corresponds to the coalition X = ∅ and the second term corresponds to X = {(1, 1)} and X = {(1, 2)}, respectively.

Note that v({(1, 1)}) = 0 since the action intensity β_1^1 of the no-ad action is 0 in this example. Furthermore, even LTA and uniform attribution fail this sanity check.

Figure 2.3: Network for Example 2.5. State 1 (with arrival probability λ_1 = 1) transitions to the conversion state c w.p. 1 under both actions. The action space consists of two actions: no-ad action (a = 1) and ad action (a = 2). Solid blue (dashed red) lines denote transitions for the ad action (no-ad action). The advertiser takes the ad action at state 1 w.p. 1, i.e., β_1^2 = 1 and β_1^1 = 0.

2.4.3 Counterfactual Adjusted Shapley Value (CASV)

As we alluded to earlier, we seek a measure that provides the best of two worlds: (1) it appropriately captures the contributions of past actions (a property of, e.g., uniform and SV)6 and (2) it exhibits counterfactual reasoning (capturing a feature of IVH). To this end, we focus on adapting

6We will see in Section 2.5 that uniform and SV are closely linked.

SV to account for the counterfactual with respect to the baseline no-ad action, and we do so by adhering to the causality framework of Rubin 1974. We first define a counterfactual player and then define the counterfactual adjusted Shapley value (CASV). Next, we motivate the desirability of CASV.

Counterfactual player. Following Rubin 1974, at state s, the counterfactual to taking action a

w.p. β_s^a is to take the no-ad action (action 1) w.p. β_s^a (in addition to the default intensity of β_s^1). Accordingly, for a given player (s, a) ∈ S × A, we denote the counterfactual player as (s, 1)^a, where the "a" in the superscript captures the dependence on β_s^a.

Counterfactual adjusted Shapley value. The game-theoretic setup (player, coalition, characteristic function) is the same as in Section 2.4.2. For each (s, a) ∈ S × A, we define CASV as

ψ_s^{a,Shap} := Σ_{X ⊆ {S×A}\{(s,a)}} w_{|X|} × {v(X ∪ {(s, a)}) − v(X ∪ {(s, 1)^a})}, (2.5)

where w_{|X|} is the same as in (2.4). We use the symbol ψ instead of π to differentiate CASV from SV.

The only change we make in going from π_s^{a,Shap} to ψ_s^{a,Shap} is that we replace v(X) by v(X ∪ {(s, 1)^a}), i.e., CASV captures the additional value added to the coalition X by a player (s, a) as compared to the value added by its counterfactual player (s, 1)^a. Note that the definition of v(X) assumes that state-action pairs (s, a) ∉ X are replaced by the state-zero-action pair (s, 0). It is unclear whether CASV under the current cooperative game is equivalent to SV under a different cooperative game. However, CASV can be expressed as the difference between two SVs:

ψ_s^{a,Shap}(M) = π_s^{a,Shap}(M) − π_s^{a,Shap}(M_{(s,a)}), (2.6)

where M denotes the original network and M_{(s,a)} denotes a counterfactual network for (s, a) in which we replace the state-action pair (s, a) by (s, 1), i.e., take action 1 with probability β_s^1 + β_s^a. Here, π_s^{a,Shap}(M_{(s,a)}) is the counterfactual value of (s, a), i.e., the value generated if the no-ad action 1 was taken instead of the ad action a at state s. (Note that the models M and M_{(s,a)} in (2.6) can be non-Markovian.) Next, we present an axiomatic justification for CASV. However, one needs to be careful when defining the axioms in the counterfactual context. To be specific, efficiency should now pertain to the redistribution of the additional value generated over the counterfactual value. Similarly, the definition of equivalent players should be adjusted, and the null player should correspond to a player with zero value-add. We give an axiomatic definition for CASV in Theorem 2.1 under a possibly non-Markovian model of customer behavior (see Appendix B.2 for the proof). The relevance of these axioms in the context of online advertising is discussed in Section 2.4.4, where we revisit the motivating examples.

Theorem 2.1 (Axioms). CASV is the unique solution of the following four counterfactual axioms.

1. Counterfactual efficiency: The sum of CASVs equals the additional value generated over the counterfactual value, i.e.,

$$\sum_{(s,a) \in S \times A} \psi_s^{a,\text{Shap}} = v(S \times A) - \sum_{(s,a) \in S \times A} \pi_s^{a,\text{Shap}}(M_{(s,a)}).$$

2. Counterfactual symmetry: If $(s, a) \in S \times A$ and $(s', a') \in S \times A$ are counterfactual equivalent, i.e.,

(A) $v(X \cup \{(s, a)\}) - v(X \cup \{(s, 1)^a\}) = v(X \cup \{(s', a')\}) - v(X \cup \{(s', 1)^{a'}\})$ and

(B) $v(X \cup \{(s, 1)^a, (s', a')\}) = v(X \cup \{(s', 1)^{a'}, (s, a)\})$

for all $X \subseteq \{S \times A\} \setminus \{(s, a), (s', a')\}$, then
$$\psi_s^{a,\text{Shap}} = \psi_{s'}^{a',\text{Shap}}.$$

3. Linearity: Consider two characteristic functions $v_1(\cdot)$ and $v_2(\cdot)$. For all $(s, a) \in S \times A$, we have
$$\psi_s^{a,\text{Shap}}(v_1 + v_2) = \psi_s^{a,\text{Shap}}(v_1) + \psi_s^{a,\text{Shap}}(v_2)$$
and for all $\alpha \in \mathbb{R}$,
$$\psi_s^{a,\text{Shap}}(\alpha v_1) = \alpha \psi_s^{a,\text{Shap}}(v_1).$$

4. Counterfactual null player: Consider a player $(s, a) \in S \times A$ that has a zero value-add to all coalitions that do not contain $(s, a)$, i.e., for all $X \subseteq \{S \times A\} \setminus \{(s, a)\}$,
$$v(X \cup \{(s, a)\}) = v(X \cup \{(s, 1)^a\}).$$
Then, $\psi_s^{a,\text{Shap}} = 0$.

In Theorem 2.1, we define $(s, a) \in S \times A$ and $(s', a') \in S \times A$ to be counterfactual equivalent if they satisfy (A) and (B). To better understand these conditions, consider the following three conditions for all $X \subseteq \{S \times A\} \setminus \{(s, a), (s', a')\}$:

(A1) $v(X \cup \{(s, a)\}) = v(X \cup \{(s', a')\})$,

(A2) $v(X \cup \{(s, 1)^a\}) = v(X \cup \{(s', 1)^{a'}\})$, and

(B) $v(X \cup \{(s, 1)^a, (s', a')\}) = v(X \cup \{(s', 1)^{a'}, (s, a)\})$.

Condition (A1) states that adding either of the two players to any coalition $X$ containing neither player has the same effect on the network value, and condition (A2) states that the same is true for the corresponding counterfactual players. The only remaining case of interest is when we add the first player and the counterfactual of the second player, or the other way around. Condition (B) states that doing either is equivalent in terms of the characteristic function. Note that we only need conditions (A) and (B) for two players to be counterfactual equivalent, and (A1) and (A2) are sufficient conditions for (A). It is worth mentioning that our definition of counterfactual equivalent players is a generalization of equivalent players from Section 2.4.1. In particular, equivalent players only need to satisfy condition (A1), and since the notion of a counterfactual player does not exist in the SV context, conditions (A2) and (B) are irrelevant.

Note that CASV is the unique solution to the counterfactual axioms for a (possibly non-Markovian) model where the characteristic function $v(\cdot)$ is a function of the state-action frequencies $\beta$. However, the Markovian customer model will be useful in gaining better insights into the rather involved and computationally intractable expression for CASV. We elaborate further in Section 2.5. We conclude this subsection with a discussion regarding the attribution to the baseline no-ad action. By definition of CASV, the baseline no-ad action at each state receives zero counterfactual credit.

However, one can attribute the counterfactual value $\sum_{(s,a) \in S \times A} \pi_s^{a,\text{Shap}}(M_{(s,a)})$ (the "residual") to the no-ad action in a post hoc fashion. In other words, the post hoc attribution to the no-ad action at state $s \in S$ equals
$$\sum_{a \in A} \pi_s^{a,\text{Shap}}(M_{(s,a)}) = \pi_s^{1,\text{Shap}}(M) + \sum_{a=2}^{n} \pi_s^{a,\text{Shap}}(M_{(s,a)}), \qquad (2.7)$$
where we use the fact that $M_{(s,1)} = M$. The two terms in (2.7) have an intuitive interpretation. The first term captures the value generated by the no-ad action due to a positive value of $\beta_s^1$, whereas the second term captures the value that would have been generated if the no-ad action were used in place of the other actions. It is also worth highlighting that this post hoc attribution to the no-ad action also captures the value that could be attributed to any advertising action not being modeled (offline ads, for example). We discuss this further when we revisit Example 2.5 in Section 2.4.4.

2.4.4 Revisiting Canonical Examples

We now revisit the networks discussed so far and show that our CASV measure does not suffer from the same limitations as the existing heuristics. We also use these relatively simple networks to illustrate the relevance of the counterfactual axioms (Theorem 2.1) in the online advertising context.

Examples 2.1, 2.2, and 2.4. For the network in Figure 2.1 with $p < 1$, all state-action pairs are counterfactual equivalent. Furthermore, since the counterfactual action, i.e., the no-ad action, directs the user to the quit state with probability 1, there is no counterfactual value in the network.

Therefore, it immediately follows that the CASV at every state equals 1/m for the ad action and 0 otherwise.

Example 2.3. For the network in Figure 2.2, CASV attributes 0 to the no-ad action at all the states and the following to the ad action:

  1 + 1 if s = 1  2 2m  1 if s ∈ {2,..., m}.  2m 

This seems sensible as half of the paths convert just due to the ad action at state 1 whereas the other half of the paths convert with the help of the ad action at all the m states. This example illustrates

the linearity axiom. The network in Figure 2.2 is the “sum” of two different Markov chains Mi, i = 1, 2. In chain M1, users convert directly after the ad at state 1, and in chain M2, users convert after the ad at state m, and a random user is equally likely to belong to either of the groups. The

CASV on M1 trivially attributes a value of 1 to the ad action at state 1 and 0 to all other players, whereas the CASV on M2 attributes a value of 1/m to the ad action at each state (since it can be seen as a “line” network). The CASV for the combined network follows from linearity.

Example 2.5. For the network in Figure 2.3, CASV allocates zero credit to the ad action, which is appropriate. However, CASV of the no-ad action is, by definition, zero. So, who gets credit for the value in the system? The answer, as discussed at the end of Section 2.4.3, is the baseline no-ad action. Recall that the attribution to the baseline no-ad action includes attribution to all “hidden” offline actions. This network exemplifies the counterfactual null player axiom since taking an ad action does not generate any additional value over the counterfactual action of taking the no-ad action.

In sum, CASV appears to be an appealing measure for attribution: it has a number of desirable properties and appears robust to various network structures. That said, similar to SV, computing CASV using (2.5) requires an exponential runtime in the number of underlying players. This is an important tractability concern that could render CASV impractical in settings with moderately-sized state and action spaces. In the next section, we show that CASV can be computed efficiently for the Markovian model.

2.5 CASV for the Markov Chain Model

In this section, we characterize CASV for the Markov chain model (Section 2.5.1) and use the characterization to develop simple algorithms for estimating CASV (Section 2.5.2).

2.5.1 Characterization

To characterize CASV for the Markov chain model, we use the fact that CASV can be expressed as a difference of two SVs (see (2.6)) and hence, analyze SV first. Proposition 2.1 connects the coalition-oriented game-theoretic construct of SV to the paths sampled from the Markov model M. In particular, we show that, under our Markovian setup, SV is, in fact, identical to the unique-uniform attribution scheme (motivated in Remark 2.1).

Proposition 2.1 (SV equals unique-uniform). Consider (s, a) ∈ S × A and the Markov chain M. The SV of (s, a) as defined in (2.3) equals

$$\pi_s^{a,\text{Shap}} = \mathbb{E}_{P \sim M}\left[ w_s^a(P) \right], \quad \text{where} \quad w_s^a(P) := \begin{cases} \dfrac{1}{u(P)} & \text{if } P \text{ converts and } (s, a) \in P \\ 0 & \text{otherwise.} \end{cases}$$

The function $u(\cdot)$ returns the number of unique players and $P$ is a path over players, i.e., state-action pairs.

Proof. For convenience, we define $r := (s, a)$ and $\mathcal{P} := S \times A$ with the understanding that $\pi_r^{\text{Shap}} := \pi_s^{a,\text{Shap}}$ and $w_r(\cdot) := w_s^a(\cdot)$. The proof is split into three parts.

Step 1: Express $v(X)$ as an expectation over paths:
$$v(X) = \lambda^\top h^{\beta^X} = \lambda^\top \left( I + P^{\beta^X} + (P^{\beta^X})^2 + \ldots \right) p_c^{\beta^X} = \mathbb{E}_{P \sim M(X)}\left[ \mathbb{I}\{P \text{ converts}\} \right] = \mathbb{E}_{P \sim M}\left[ \mathbb{I}\{P \text{ converts}\} \, \mathbb{I}\{X(P) \subseteq X\} \right],$$
where $M(X)$ denotes the Markov chain in which only the players in coalition $X$ are "active" and $X(P)$ denotes the set of unique players in path $P$.

Step 2: Use the representation from Step 1 in the definition of SV $\pi_r^{\text{Shap}}$ as stated in (2.3):
$$\pi_r^{\text{Shap}} = \sum_{X \subseteq \mathcal{P} \setminus \{r\}} w_{|X|} \times \left\{ v(X \cup \{r\}) - v(X) \right\} = \sum_{X \subseteq \mathcal{P} \setminus \{r\}} w_{|X|} \, \mathbb{E}_{P \sim M}\left[ \mathbb{I}\{P \text{ converts}\} \left( \mathbb{I}\{X(P) \subseteq X \cup \{r\}\} - \mathbb{I}\{X(P) \subseteq X\} \right) \right] = \mathbb{E}_{P \sim M}\left[ \mathbb{I}\{P \text{ converts}\} \times f_r \right],$$
where
$$f_r := \sum_{X \subseteq \mathcal{P} \setminus \{r\}} w_{|X|} \left( \mathbb{I}\{X(P) \subseteq X \cup \{r\}\} - \mathbb{I}\{X(P) \subseteq X\} \right).$$

Step 3: Observe that for a given path $P$, $f_r$ can be seen as the SV of player $r$ corresponding to a different game with characteristic function defined as
$$v_{X(P)}(X) := \mathbb{I}\{X(P) \subseteq X\} \quad \text{for all } X \subseteq \mathcal{P}.$$
A game with such a characteristic function is referred to as a carrier game, and it is well known (see Lemma 2 of Shapley 1953, for instance) that the SV of a player in a carrier game equals $1/u(P)$ if the player is in $P$ and 0 otherwise. Therefore,
$$f_r = \begin{cases} \dfrac{1}{u(P)} & \text{if } r \in P \\ 0 & \text{otherwise,} \end{cases}$$
which implies
$$\pi_r^{\text{Shap}} = \mathbb{E}_{P \sim M}\left[ w_r(P) \right].$$

This completes the proof. □

The above result enables one to characterize CASV, which constitutes the main result of this work.

Theorem 2.2 (CASV equals unique-uniform minus counterfactual). Consider $(s, a) \in S \times A$, the Markov chain $M$, and the counterfactual Markov chain $M_{(s,a)}$. The CASV of $(s, a)$ as defined in (2.5) equals

$$\psi_s^{a,\text{Shap}} = \mathbb{E}_{P \sim M}\left[ w_s^a(P) \right] - \mathbb{E}_{P \sim M_{(s,a)}}\left[ w_s^a(P) \right],$$
where $w_s^a(\cdot)$ is as defined in Proposition 2.1.
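A direct (if naive) way to use Theorem 2.2 is to simulate paths from both $M$ and $M_{(s,a)}$ and average the unique-uniform weights; Section 2.5.2 shows how a change of measure avoids sampling the counterfactual chain for every player. The sketch below assumes a hypothetical `sample_path(chain)` simulator that returns a list of $(s, a)$ pairs together with a conversion flag.

```python
def unique_uniform_weight(path, converted, player):
    """w_s^a(P): 1/u(P) if P converts and (s, a) lies on the path, else 0."""
    if converted and player in path:
        return 1.0 / len(set(path))   # u(P) = number of unique (state, action) pairs
    return 0.0

def casv_two_chain(player, M, M_cf, sample_path, n=100_000):
    """Naive Monte Carlo estimate of CASV per Theorem 2.2 (samples both chains)."""
    est = sum(unique_uniform_weight(*sample_path(M), player) for _ in range(n)) / n
    est_cf = sum(unique_uniform_weight(*sample_path(M_cf), player) for _ in range(n)) / n
    return est - est_cf
```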

There is a striking resemblance between this characterization of CASV and uniform attribution as defined in (2.2). In particular, the two expressions are identical except for the definition of the weight function and the counterfactual adjustment. The weight function in CASV only rewards based on whether a state-action pair appears in the path or not (as opposed to the number of times it appears). In other words, if the ad action $a$ was taken multiple times (say $n_s^a$ times) at the same state $s$ before the user moved to another state $s'$, the CASV weight function rewards it only once (as opposed to rewarding it $n_s^a$ times). This seems very reasonable and, in fact, is the simple fix we proposed in Remark 2.1. It is noteworthy that the coalition-oriented construct of CASV, which on the surface does not seem related to the paths of the underlying Markovian model, reduces to a remarkably simple function of such paths. This connection is valuable: it yields deeper insights into the structure of CASV and, as we show next, leads to simple algorithms for estimating it.

2.5.2 Algorithms

Given a set $D$ of user paths, where each user path consists of various state-action-state $(s, a, s')$ tuples, the action-specific transition probabilities and initial state probabilities can be estimated using empirical frequencies (we assume the advertiser knows $\beta$; if not, it can be estimated similarly). To estimate CASV for player $(s, a)$, one can sample paths from the estimated Markov chains $M$ and $M_{(s,a)}$ and use Theorem 2.2 directly. However, this involves sampling from a different Markov chain $M_{(s,a)}$ to estimate CASV for each player, which we address next. A simple change of measure allows CASV to be expressed as
$$\psi_s^{a,\text{Shap}} = \mathbb{E}_{P \sim M}\left[ w_s^a(P) \left( 1 - \frac{g_{(s,a)}(P)}{g(P)} \right) \right], \qquad (2.8)$$

where $g(P)$ and $g_{(s,a)}(P)$ denote the probabilities of observing path $P$ under Markov chains $M$ and $M_{(s,a)}$, respectively. The ratio $g_{(s,a)}(P)/g(P)$ denotes the importance weight and it is easy to show that
$$\frac{g_{(s,a)}(P)}{g(P)} = \prod_{s' \in S^+} \left( \frac{p_{ss'}^1}{p_{ss'}^a} \right)^{n_{ss'}^a(P)},$$

where $n_{ss'}^a(P)$ denotes the number of occurrences of the tuple $(s, a, s')$ in $P$. Given (2.8), it suffices to sample paths from just one Markov chain ($M$) to estimate CASV for all the players. In fact, each sample can be recycled for multiple players and the scheme can be implemented using a parallel (over both players and samples) architecture. We state this estimation procedure in Algorithm 4, where we assume we have access to the Markov chain $M$. In practice, the true model $M$ is not known, and one is forced to use a finite sample of data to compute an estimate $\widehat{M}$, and attribute according to the CASV $\psi_s^{a,\text{Shap}}(\widehat{M})$ of the estimated model $\widehat{M}$. We discuss sensitivity to model estimation when we report the results of our numerical experiments (Section 2.6). One can leverage the work of Bottou et al. 2013 to obtain confidence intervals for the estimate of CASV obtained using the importance weights technique in (2.8). Such confidence intervals are complementary to our sensitivity discussion and we refer the reader to Bottou et al. 2013 for more details.

Note that even if one has access to the true model $M$, the output of Algorithm 4 is an estimate of the true CASV since it relies on Monte Carlo sampling. It is easy to show that the resulting estimate is unbiased, asymptotically consistent, and has a variance that decays at a rate of $O(1/N)$, where $N$ denotes the number of Monte Carlo samples. Furthermore, as we discuss in Section 2.6.3, we found $N = 100{,}000$ to be large enough to obtain a stable estimate of CASV. Accordingly, moving forward, instead of focusing on the Monte Carlo error, we will focus on the sensitivity to model estimation.

Algorithm 4: Estimating CASV given Markov chain $M$
Require: Markov chain $M$ and number of user paths $N$
1: Initialize: $\psi_s^{a,\text{alg}} = 0$ for all $(s, a) \in S \times A$
2: for $i = 1$ to $N$
3:   Sample path: $P_i \sim M$
4:   if $P_i$ converted
5:     for $(s, a) \in P_i$
6:       $\psi_s^{a,\text{alg}} = \psi_s^{a,\text{alg}} + \frac{1}{u(P_i)} \left( 1 - \prod_{s' \in S^+} \left( p_{ss'}^1 / p_{ss'}^a \right)^{n_{ss'}^a(P_i)} \right)$
7:     end for
8:   end if
9: end for
10: Normalize: $\psi_s^{a,\text{alg}} = \psi_s^{a,\text{alg}} / N$ for all $(s, a) \in S \times A$
11: return $\psi^{\text{alg}} := \{\psi_s^{a,\text{alg}}\}_{(s,a) \in S \times A}$

It is also possible to employ a Bayesian estimation approach where one maintains a belief over the transition probability vector $[p_{ss'}^a]_{s' \in S^+}$ in the form of a Dirichlet distribution for each $(s, a) \in S \times A$. (One can also maintain a similar belief over the initial state probabilities vector $\lambda$ and the action intensities vector $[\beta_s^a]_{a \in A}$ for all $s \in S$ if required.) Since Dirichlet and multinomial are conjugate distributions, the posterior belief can be computed easily. The posterior belief can be used to generate posterior samples for CASV as follows. Use the posterior belief to generate samples of the entire Markov chain $M$, and for each sample of $M$, use Algorithm 4 to compute an estimate for CASV, potentially in parallel. These posterior samples quantify the uncertainty in CASV that arises due to noise in the data, or lack of data.
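A minimal Python sketch of Algorithm 4 is given below. It assumes a hypothetical dictionary representation of $M$ (initial distribution `lam`, action intensities `beta[s][a]`, transition probabilities `P[a][s][s_next]`, with every state having an entry for the no-ad action) and absorbing states `"c"` (conversion) and `"q"` (quit); the update mirrors line 6 of the pseudocode, and each sampled path is reused for all players on it, so the loops can be parallelized over both paths and players as noted above.

```python
import random
from collections import Counter

NO_AD = 1  # index of the baseline no-ad action

def sample_path(M, max_len=10_000):
    """Simulate one user path from M: a list of (s, a, s_next) transitions plus a conversion flag."""
    s = random.choices(list(M["lam"]), weights=list(M["lam"].values()))[0]
    transitions = []
    for _ in range(max_len):
        if s in ("c", "q"):                      # absorbing conversion / quit states
            return transitions, s == "c"
        a = random.choices(list(M["beta"][s]), weights=list(M["beta"][s].values()))[0]
        s_next = random.choices(list(M["P"][a][s]), weights=list(M["P"][a][s].values()))[0]
        transitions.append((s, a, s_next))
        s = s_next
    return transitions, False

def estimate_casv(M, N=100_000):
    """Algorithm 4: estimate CASV for every player (s, a) from N paths sampled from M alone."""
    psi = Counter()
    for _ in range(N):
        transitions, converted = sample_path(M)
        if not converted:
            continue
        players = {(s, a) for (s, a, _) in transitions}
        counts = Counter(transitions)            # n^a_{ss'}(P_i): occurrences of each (s, a, s')
        for (s, a) in players:
            # importance ratio g_(s,a)(P)/g(P) = prod_{s'} (p^1_{ss'} / p^a_{ss'})^{n^a_{ss'}(P)}
            ratio = 1.0
            for (s0, a0, s1), n in counts.items():
                if (s0, a0) == (s, a):
                    ratio *= (M["P"][NO_AD][s].get(s1, 0.0) / M["P"][a][s][s1]) ** n
            psi[(s, a)] += (1.0 - ratio) / len(players)   # unique-uniform weight 1/u(P_i)
    return {player: val / N for player, val in psi.items()}
```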

2.6 Numerical Experiments

We now shift our focus to evaluating the performance of our framework of attribution on a large-scale real-world dataset and comparing it against competing metrics (LTA, IVH, uniform, and SV). We first discuss the dataset composed of several million real-world user paths (Section 2.6.1), and then the postulated Markovian model and its estimation using this real-world data (Section 2.6.2). Next, we present the attribution metrics (computed on data simulated from the estimated model) corresponding to various schemes along with some discussion (Section 2.6.3). We note that our numerics are for illustrative purposes and in particular, to see how various attribution schemes differ.

79 2.6.1 Dataset

Our real-world dataset corresponds to a single product (software) promoted and sold on the Internet by a Fortune 500 company8. The dataset contains several million user paths with a few hundred thousand conversions (purchases). Each path starts with a “sign-up”, i.e., the user creating an account on the company’s website. From the date of the sign-up, we have access to the user’s interaction with the company (touchpoints) for a period of 8 months9. The touchpoints can be classified based on the type of ad:

• E-mail ad: There are three related touchpoints: (1) advertiser sending an e-mail (ad action), (2) user opening an e-mail (user action), and (3) user clicking on a link in the e-mail (user action).

• Display ad: There are two related touchpoints: (1) advertiser showing a display ad (ad action) and (2) user clicking on a display ad (user action).

• Paid search ad: There is only one touchpoint corresponding to paid search impressions: user clicking on the paid search ad (user action). Due to the way data is collected, an advertiser usually does not know whether a paid search ad is shown to a user if the user does not click on the ad.

There are two additional touchpoints: (1) "sign-up" (user creates an account) and (2) "conversion" (user buys the product). If the user converted within 8 months, we truncate the path at the time of conversion (since we are interested in activity up to the first conversion). The quit state is not explicitly observed in the data and we use the following rule to determine a transition to q: if there is no activity (user and advertiser) in the last 30 days and in the future, we mark the next state as q. Not every path ends in the conversion state or the quit state at the end of the 8-month period and we let it be that way (as opposed to forcing it to transition to the quit state).

8 We do not disclose the name of the company and specific statistics related to the real data for anonymity reasons. The company is the advertiser, as opposed to a website / platform that serves ads for many advertisers.
9 Our user-path data suffers from common issues; for example, we lose track of a user if she clears cookies from her web browser.

We discarded certain dubious paths after discussion with our industry partner. In particular, paths with over 100 touchpoints were discarded on suspicion of being generated by bots, and paths with any illogical sequence (for example, an e-mail opened before being received) were also discarded (possible data recording errors). The occurrence of such paths was very low (less than 1%).
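A rough sketch of this preprocessing is below, assuming a hypothetical representation of a raw path as a time-ordered list of `(day, touchpoint)` events within the 8-month window; the check for illogical sequences (e.g., an e-mail opened before being received) is omitted for brevity.

```python
def preprocess_path(events, horizon_days=8 * 30, quiet_days=30, max_touchpoints=100):
    """Truncate at the first conversion, mark a quit after 30 quiet days, drop suspected bots."""
    if len(events) > max_touchpoints:
        return None                                  # suspected bot: discard the whole path
    path = []
    for day, touchpoint in events:
        path.append((day, touchpoint))
        if touchpoint == "conversion":
            return path                              # keep activity only up to the first conversion
    # No conversion: if the tail of the observation window saw no activity, append the quit state.
    last_day = path[-1][0] if path else 0
    if horizon_days - last_day >= quiet_days:
        path.append((last_day + quiet_days, "quit"))
    return path                                      # otherwise leave the path open-ended
```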

2.6.2 Markovian Model and Estimation

Our Markov chain in earlier sections is defined with abstract action and state spaces to showcase the flexibility of our framework. For the numerics, we use the following construction. The action space A consists of four actions: (1) no-ad, (2) e-mail, (3) display ad, and (4) paid search click.10 In our dataset, the no-ad action is not observed explicitly and we use the following rule to "implant" no-ad actions in the user paths: if there has been no touchpoint activity (user and advertiser) in the past τ days, we implant the no-ad action. We use τ = 10 in our numerics and perform a sensitivity analysis with respect to τ after we present the results (at the end of Section 2.6.3).

We assume that a user goes through the following states during her decision process: (1) unaware, (2) aware, (3) interest, and (4) desire. This definition of the state space S is motivated by the widely used conversion funnel in the marketing literature (Strong 1925; Howard and Sheth 1969; Barry 1987; Bettman et al. 1998; Court 2009; Elzinga et al. 2009; Kotler and Armstrong 2010; Mulpuru 2011; Jansen and Schuster 2011; Bruce et al. 2012). However, unlike most of this literature, we assume that our microscopic user-level data allows us to objectively define the states using the touchpoints as follows (a minimal mapping sketch follows the list):

• Unaware: User has received no e-mail, no display ad, and clicked on no paid search ad so far.

• Aware: At least one ad (e-mail or display) received by the user so far.

10Since we only observe a paid search impression when it is clicked, we use the “click” as an action even though it is a user action (as opposed to an ad action). This is a limitation of the data and affects all attribution metrics, i.e., it is not specific to the framework we propose.

• Interest: At least one e-mail opened by the user so far.

• Desire: At least one ad (e-mail, display, or paid search) clicked by the user so far.
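The sketch below shows one way to derive the funnel state from the set of touchpoints a user has accumulated so far, matching the definitions above; the touchpoint labels (`email_received`, `email_opened`, `display_shown`, and the click events) are hypothetical names for the events described in Section 2.6.1.

```python
def funnel_state(touchpoints):
    """Map the set of touchpoints observed so far to the conversion-funnel state."""
    if touchpoints & {"email_clicked", "display_clicked", "paid_search_clicked"}:
        return "desire"      # at least one ad (e-mail, display, or paid search) clicked so far
    if "email_opened" in touchpoints:
        return "interest"    # at least one e-mail opened so far
    if touchpoints & {"email_received", "display_shown"}:
        return "aware"       # at least one ad (e-mail or display) received so far
    return "unaware"         # no e-mail, no display ad, and no paid search click so far
```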

Clearly, a “back” transition (for example, transition from aware to unaware) is impossible in our state space construction but we allow for the possibility of a “jump” (for example, transition from unaware to interest). Furthermore, transitions to quit and conversion states are possible from any of the four states in S. (The state space can be made more granular depending on the needs of an advertiser and the type of data available. For instance, being “aware” as a result of receiving an e-mail might be different from being “aware” due to seeing a display impression. Furthermore, the “level of awareness” might depend on the number of ads seen and hence, it might be worthwhile to define multiple states capturing different levels of awareness. We kept our construction simple for illustrative purposes.) Having defined the action and state spaces, we discuss the estimation of the corresponding

Markov model $M$ using real data. In particular, we need to estimate $\lambda$, $\beta$, and $\{P^a\}_{a \in A}$. The touchpoints in the dataset allow us to construct "$(s, a, s')$" tuples. Since each user path starts with the "sign-up" touchpoint, the initial state is always "unaware" and hence, $\lambda_1 = 1$. To estimate $\beta_s^a$ for all $(s, a) \in S \times A$, we simply compute the empirical frequencies, i.e., we count the number of times action $a$ was taken at state $s$ and divide it by the number of times we observe state $s$ in the dataset.11 We employ the same technique to estimate $p_{ss'}^a$ for all $(s, a, s') \in S \times A \times S^+$. As mentioned in Section 2.5.2, it is important to understand the impact of a finite dataset on the CASV estimate. The CASV estimates reported here are robust because of the large number (several million) of user paths in our dataset (see Appendix B.3 for details). See also Footnote 11. We acknowledge that our simple estimation scheme could potentially be improved. In particular, there might be undesired endogeneity effects in our data, which might result in biased estimates. For instance, a user who is more inclined to convert might be less likely to receive an ad, which is against the spirit of our $\beta$-randomized policy. However, we note that such estimation issues are not peculiar to CASV; they also affect other attribution schemes that require the model to be estimated (IVH, for instance) and hence, we place these challenges outside the scope of this work.

11 We also experimented with the Bayesian approach discussed at the end of Section 2.5.2. Due to the enormous number of user paths in our dataset, the posterior variance in the Bayesian approach was extremely small and hence, the results obtained were very similar. We do not present them to be concise.
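For illustration, the following sketch carries out this empirical-frequency estimation from preprocessed paths, here assumed to be lists of $(s, a, s')$ transitions; the initial distribution is omitted since $\lambda_1 = 1$, and no smoothing or Bayesian refinement is applied.

```python
from collections import Counter, defaultdict

def estimate_model(paths):
    """Estimate beta[s][a] and P[a][s][s_next] by empirical frequencies of (s, a, s_next) tuples."""
    sa_counts = Counter()                    # number of times action a was taken at state s
    transition_counts = Counter()            # number of occurrences of each (s, a, s_next)
    for path in paths:
        for (s, a, s_next) in path:
            sa_counts[(s, a)] += 1
            transition_counts[(s, a, s_next)] += 1

    state_counts = Counter()                 # number of times state s is observed
    for (s, a), n in sa_counts.items():
        state_counts[s] += n

    beta = defaultdict(dict)
    for (s, a), n in sa_counts.items():
        beta[s][a] = n / state_counts[s]

    P = defaultdict(lambda: defaultdict(dict))
    for (s, a, s_next), n in transition_counts.items():
        P[a][s][s_next] = n / sa_counts[(s, a)]

    return {"beta": dict(beta), "P": {a: dict(rows) for a, rows in P.items()}}
```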

The summary12 of the parameter estimates is as follows. In terms of actions, e-mail has the highest frequency, ranging from 40% to 90% depending on the state, followed by the no-ad action (10% to 50%). Combined, the e-mail and no-ad actions account for over 85% of the actions at each state. The action intensity of display impression ranges from 2% to 10% and that of paid search click from 0.1% to 7%.13 In terms of transition probabilities, we note that from each state in S, the eventual conversion probability is positive under each action, which validates the absorption assumption (Assumption 2.1). Furthermore, the self-loop probability for each state-action pair (except for the no-ad action at the unaware state and the impossible self-transitions) is over 90%, indicating "slow-moving" users. For the no-ad action at the unaware state, it is around 50%. In addition, there are other patterns in the estimates that are consistent with a priori expectations. For example, a user is around 5 times more likely to move from the "aware" state to the "interest" state under the e-mail action as compared to the no-ad action. Moreover, the one-step conversion probabilities are highest for paid search click (5% to 10% depending on the state) followed by display impression (0.2% to 1.5%). As one would expect, the eventual conversion probability $h_s^{\beta}$ increases monotonically with $s \in S$, i.e., as a user goes deeper into the conversion funnel, she becomes more likely to buy the product. For instance, a user is around twice as likely to purchase if she is in the "desire" state as compared to the "unaware" state.

We also observe some estimates to be counterintuitive. For instance, our estimate of the one-step conversion probability under the no-ad action is higher than under the e-mail action for all states except "unaware". This might be a consequence of the endogeneity issue we alluded to earlier or of advertising activity (such as offline ads) not captured in the online data. Recall that we use the baseline no-ad action as a proxy for all offline promotion efforts. Furthermore, for all three ad actions (e-mail, display, and paid search click), the one-step conversion probability from the "unaware" state is higher than from the "aware" state. We believe this is primarily a result of the user actually being acquainted with the product (via offline channels, for instance).

12 We do not report the exact numbers to protect the private data of our provider.
13 Interestingly, the highest (among all the states) action intensity of paid search click was at the first state in the funnel ("unaware"). We believe this is a result of "unaware" users actually knowing the product (from offline channels, for instance).

2.6.3 Results and Discussion

Given the model estimate from real data, we estimate CASV using Algorithm 4 with $N = 100{,}000$ (the number of paths to be sampled).14 It took less than a second to compute the attribution for each scheme15 on a single core of an Intel Xeon E5-2650v2 2.6 GHz processor with 8 GB of RAM. This highlights the scalability of our approach. Figures 2.4 and 2.5 display the results. Interestingly, the five attribution schemes output quite different results and we discuss them next. We first discuss the attributions to each action (Figure 2.4), and then comment on the state-specific attributions to each action (Figure 2.5).

In terms of attributions to actions (Figure 2.4), LTA allocates the majority of the share to e-mail and paid search click, which seems appropriate for LTA since e-mail ads are the most prevalent in our dataset and a paid search click strongly hints at a higher level of customer engagement (since it is a user action as opposed to an ad action) and hence a relatively high one-step conversion probability. Since LTA does not adjust for the counterfactual, the no-ad action receives little credit. IVH allocates almost all the credit to e-mail, which aligns with the facts that IVH scales with action intensity (see (2.1)) and the e-mail action occurs around 10 times more often than the display and paid search click actions. We make a similar observation for uniform attribution. Due to its unique-uniform nature, SV accounts for the fact that players corresponding to the e-mail action appear multiple times in certain paths and hence, the credit allocated to e-mail shrinks to roughly half (compared to uniform). This can be seen as a correction to other schemes (IVH and uniform in particular) allocating higher credit to e-mail than it deserves simply because it appears more often in the paths. Compared with LTA, SV allocates less value to paid search click, which aligns with the observation that LTA allocates more credit than appropriate to channels appearing later in the conversion funnel. Finally, CASV corrects SV by adjusting for the counterfactual. As expected, CASV takes away a portion from e-mail, display, and paid search click and allocates it to the no-ad counterfactual. This results in no-ad receiving roughly half of the total network value.16 The action that is least affected by the counterfactual adjustment is paid search click, which seems reasonable since replacing a paid search click (a user action) by no-ad (an ad action) can create a big difference in the total value of the network and hence, the counterfactual value would be small.

14 We repeated our computation using different seeds and obtained almost identical results, indicating N = 100,000 is high enough.
15 For LTA, IVH, and uniform, computing attribution is straightforward (see Section 2.3). To estimate SV, we used the unique-uniform characterization (Proposition 2.1).

Figure 2.4: Attributions to different actions under various schemes with τ = 10 (bar charts for LTA, IVH, uniform, SV (unique-uniform), and CASV over the actions no-ad, e-mail, display, and PS click). We report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \pi_s^a / \sum_{(s',a')} \pi_{s'}^{a'}$, for each $a \in A$.

16As noted earlier (around Equation (2.7) and when revisiting Example 2.5), the attribution to no-ad factors in the value generated by advertising actions that are not modeled (offline ads for instance) and hence, is not necessarily the value attributed only to no-ad.

Figure 2.5: Attributions to state-action pairs under various schemes with τ = 10 (bar charts for LTA, IVH, uniform, SV (unique-uniform), and CASV). We report the attributions $\pi_s^a$ for all $(s, a) \in S \times A$. The four colors correspond to the four states in $S$, arranged in the natural order (unaware, aware, interest, desire).

Next, we briefly discuss the state-action-specific attributions (Figure 2.5), which are reported on an absolute scale (contrast with the percentages in Figure 2.4). As is evident, each attribution scheme can output quite different allocations to the various states for a given action. LTA allocates the most to paid search click at the "desire" state. Similar to the empirical finding of Blake et al. 2015, it is possible that a user in a state of "desire" is already inclined to buy the product and performs a Google search, leading to a paid search click and a conversion. However, in the counterfactual

scenario, even if the paid search click did not occur, the user might have bought the product (by clicking on the "organic" link). In fact, the relatively low allocation to paid search click at the "desire" state under CASV provides empirical support for such a view. In terms of e-mail, existing heuristics such as LTA, IVH, and uniform have an increasing pattern in the attribution allocated with respect to the user state, whereas CASV allocates the most to the first state. This also aligns with the observation above, suggesting that the existing heuristics fail to account for the counterfactual appropriately. Interestingly, all the schemes allocate a considerable amount to paid search click at the "unaware" state. This might be a result of "unaware" users actually being aware of the product (from offline sources, for instance) and searching for the product (on Google, for example) just to make a purchase but clicking the paid search ad in the process. In theory, CASV should adjust for this phenomenon by allocating the "offline value" to the no-ad counterfactual. However, this correction is not reflected in our numerics due to the hidden nature of data corresponding to paid search impressions (recall we only observe them when they are clicked).

We conclude this section by performing a sensitivity analysis on τ. Recall that to implant the

no-ad action in raw user paths, we used τ = 10. We now experiment with a lower value of τ = 7 (Figure 2.6) and a higher value of τ = 14 (Figure 2.7) and analyze the resulting changes in the attribution numbers. It is easy to see that all the observations made in the case of τ = 10 (Figure 2.4) still apply, indicating robustness of the high-level insights to the choice of τ. We do see changes in the actual numbers, but they align with our expectations. For example, with τ = 7, we implant more instances of the no-ad action (as compared to τ = 10) and hence expect it to receive more attribution (and vice versa with τ = 14).

2.7 Value of State-Specific Attribution

In our cooperative game theory setup presented in Section 2.4, we defined each state-action pair $(s, a) \in S \times A$ to be a player (the state-specific model) and hence had a CASV $\psi_s^{a,\text{Shap}}$ for each state-action pair. As we mentioned previously, a naïve implementation that exactly computes $\{\psi_s^{a,\text{Shap}}\}_{(s,a) \in S \times A}$ using (2.5) would have a runtime exponential in the size of $S \times A$. Interestingly, there is an alternative view that reduces the computational complexity of a naïve implementation to be exponential only in the size of $A$, which might be tractable.

Figure 2.6: Attributions to different actions under various schemes with τ = 7 (bar charts for LTA, IVH, uniform, SV (unique-uniform), and CASV over the actions no-ad, e-mail, display, and PS click). We report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \pi_s^a / \sum_{(s',a')} \pi_{s'}^{a'}$, for each $a \in A$.

In particular, one can define the players in the cooperative game to be the actions in $A$ (the aggregated model) and compute a CASV $\bar{\psi}^{a,\text{Shap}}$ for each action $a \in A$. Under such a view, a coalition $\bar{X}$ corresponds to a collection of actions, i.e., $\bar{X} \subseteq A$. To be more precise, we define the empty coalition in the same way as earlier, i.e., it corresponds to the Markov chain containing all the states but with action 0 being taken at all of them w.p. 1. If an action $a \in A$ is added to the empty coalition, then the advertiser takes action $a$

at each state $s \in S$ w.p. $\beta_s^a$, and so on. Under this aggregated model, each action $a \in A$ receives an attribution of
$$\bar{\psi}^{a,\text{Shap}} := \sum_{\bar{X} \subseteq A \setminus \{a\}} \bar{w}_{|\bar{X}|} \times \left\{ v(\bar{X} \cup \{a\}) - v(\bar{X} \cup \{1^a\}) \right\}, \quad \text{where} \quad \bar{w}_{|\bar{X}|} := \frac{|\bar{X}|! \, (n - |\bar{X}| - 1)!}{n!}, \qquad (2.9)$$
and $1^a$ denotes the counterfactual player, i.e., the no-ad action (action 1) is taken at state $s$ w.p. $\beta_s^a$ (in addition to the default intensity $\beta_s^1$) for all $s \in S$.

Figure 2.7: Attributions to different actions under various schemes with τ = 14 (bar charts for LTA, IVH, uniform, SV (unique-uniform), and CASV over the actions no-ad, e-mail, display, and PS click). We report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \pi_s^a / \sum_{(s',a')} \pi_{s'}^{a'}$, for each $a \in A$.

Similar to the state-specific model, CASV in

the aggregated model can be expressed as $\bar{\psi}^{a,\text{Shap}}(M) = \bar{\pi}^{a,\text{Shap}}(M) - \bar{\pi}^{a,\text{Shap}}(\bar{M}^a)$, where
$$\bar{\pi}^{a,\text{Shap}}(M) := \sum_{\bar{X} \subseteq A \setminus \{a\}} \bar{w}_{|\bar{X}|} \times \left\{ v_M(\bar{X} \cup \{a\}) - v_M(\bar{X}) \right\}, \qquad (2.10)$$

$M$ denotes the original Markov chain, and $\bar{M}^a$ denotes the counterfactual network for $a$ (in an aggregated sense), i.e., it is identical to $M$ except that we replace the transition probabilities of $(s, a)$ by those of $(s, 1)$ for all $s \in S$. Furthermore, the SV $\bar{\pi}^{a,\text{Shap}}$ for $a \in A$ in the aggregated model can be characterized as a unique-uniform scheme using the same proof technique as for Proposition 2.1:

$$\bar{\pi}^{a,\text{Shap}} = \mathbb{E}_{P \sim M}\left[ \bar{w}^a(P) \right], \quad \text{where} \quad \bar{w}^a(P) := \begin{cases} \dfrac{1}{\bar{u}(P)} & \text{if } P \text{ converts and } a \in \bar{P} \\ 0 & \text{otherwise.} \end{cases}$$

The function $\bar{u}(\cdot)$ returns the number of unique actions and $\bar{P}$ denotes the actions in the path $P$ (over state-action pairs), i.e., $\bar{P} := \{a : (s, a) \in P\}$. Thus, the CASV $\bar{\psi}^{a,\text{Shap}}$ for $a \in A$ in the aggregated model equals

$$\bar{\psi}^{a,\text{Shap}} = \mathbb{E}_{P \sim M}\left[ \bar{w}^a(P) \right] - \mathbb{E}_{P \sim \bar{M}^a}\left[ \bar{w}^a(P) \right],$$
which is analogous to Theorem 2.2. If the size of $A$ is relatively small (which might be the case for certain retailers), CASV can be computed exactly in the aggregated model using (2.9). Accordingly, it is of interest to check whether using the granular state-specific model (which is computationally demanding) provides some quantifiable value over the aggregated model. However, one needs to be careful when comparing the attributions of the two models as they are on different "scales". The state-specific model outputs attribution at the state-action level whereas the aggregated model outputs it at the action level. Furthermore, decomposing the action-level allocation of the aggregated model to the state-action level seems non-trivial. On the other hand, aggregating the state-action level allocations of the

state-specific model is straightforward ($\sum_{s \in S} \psi_s^{a,\text{Shap}}$). Therefore, we will compare $\sum_{s \in S} \psi_s^{a,\text{Shap}}$ to $\bar{\psi}^{a,\text{Shap}}$ to answer the following question: does computing state-specific attribution and then aggregating over states provide a different answer than directly computing attribution that is not state-specific? Clearly, this is a stronger notion of comparison in the sense that if $\sum_{s \in S} \psi_s^{a,\text{Shap}}$ differs from $\bar{\psi}^{a,\text{Shap}}$, then any state-specific decomposition of $\bar{\psi}^{a,\text{Shap}}$ would be "inappropriate". Our unique-uniform characterizations imply that
$$\sum_{s \in S} \psi_s^{a,\text{Shap}} - \bar{\psi}^{a,\text{Shap}} = \mathbb{E}_{P \sim M}\left[ \sum_{s \in S} w_s^a(P) - \bar{w}^a(P) \right] - \left( \sum_{s \in S} \mathbb{E}_{P \sim M_{(s,a)}}\left[ w_s^a(P) \right] - \mathbb{E}_{P \sim \bar{M}^a}\left[ \bar{w}^a(P) \right] \right),$$
which gives a path-based view of the difference between the two approaches. For instance, in a network with zero counterfactual value, if a path contains $x$ unique state-action pairs with $x - 1$ of them corresponding to the same action, the state-specific weight function allocates $1/x$ to each state-action pair and hence, the repeated action receives an attribution of $(x - 1)/x$ to account for its state-dependent impact. On the contrary, the aggregated model allocates $1/2$ to each of the two unique actions. Though useful, such a path-based view does not build an understanding in terms of the types of interactions among players captured (and missed) by the two approaches. To facilitate such a fundamental understanding, we define two types of coalitions (assuming the underlying players are state-action pairs).

Type 1 coalition. In a type 1 coalition (say $X_1$), if a player $(s, a)$ is in $X_1$, then the player $(s', a)$ is in $X_1$ for all $s' \in S$. In other words, if an action is present at a single state, then it is present at all the states. We denote by $T_1$ the set of all type 1 coalitions. Type 1 coalitions correspond to the coalitions in the aggregated model. Intuitively, both the aggregated and the state-specific models capture the interactions present in type 1 coalitions. However, it is not clear whether the "weights" assigned to each of these interactions are the same in the two models. (We shed more light on this issue in Theorem 2.3.)

Type 2 coalition. Any coalition that is a subset of S × A but not a type 1 coalition is defined as a type 2 coalition. In other words, type 2 coalitions have at least one action that is present at one or

more states but not at all the states. We define $T_2$ as the collection of all type 2 coalitions. It should be intuitively clear from (2.9) that the aggregated model will miss all the interactions present in type 2 coalitions. (We formalize this intuition in Theorem 2.3.) We are now equipped to precisely quantify the value of state-specific attribution in terms of the interactions among the underlying players, and we do so in Theorem 2.3, the proof of which is presented in Appendix B.4.
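As a small illustration of these definitions, the sketch below (with hypothetical inputs: a coalition as a set of `(state, action)` pairs and the set of all states) checks whether a coalition is of type 1; any other subset of $S \times A$ is of type 2.

```python
def is_type1(coalition, states):
    """A coalition is type 1 iff every action it contains appears at every state."""
    actions = {a for (_, a) in coalition}
    return all((s, a) in coalition for a in actions for s in states)

# With S = {1, 2}: an action present at all states vs. only at state 1.
print(is_type1({(1, "email"), (2, "email")}, {1, 2}))  # True  -> type 1
print(is_type1({(1, "email")}, {1, 2}))                # False -> type 2
```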

Theorem 2.3 (Value of state-specific attribution). For any action, computing state-specific CASV and then aggregating over states is not the same as computing CASV over the aggregated model. In particular, for all a ∈ A,

$$\sum_{s \in S} \psi_s^{a,\text{Shap}} - \bar{\psi}^{a,\text{Shap}} = \sum_{X \in T_1} \left\{ c^a(X) - \bar{c}^a(X) \right\} \left( v_M(X) - v_{\bar{M}^a}(X) \right) + \sum_{X \in T_2} \sum_{s \in S} c_s^a(X) \left( v_M(X) - v_{M_{(s,a)}}(X) \right),$$
where $c^a(X) := \sum_{s \in S} c_s^a(X)$,
$$c_s^a(X) := \begin{cases} w_{|X|-1} & \text{if } (s, a) \in X \\ -w_{|X|} & \text{otherwise} \end{cases} \qquad \text{and} \qquad \bar{c}^a(X) := \begin{cases} \bar{w}_{|\bar{X}|-1} & \text{if } a \in \bar{X} \\ -\bar{w}_{|\bar{X}|} & \text{otherwise.} \end{cases}$$
The notation $\bar{X}$ denotes the actions in $X$, i.e., $\bar{X} := \{a : (s, a) \in X\}$.

We now parse the mathematical result presented above. The first term in the difference corresponds to type 1 coalitions. Such coalitions are captured by both models. However, it is not true that the two models assign equal weights to them ($c^a(X)$ vs. $\bar{c}^a(X)$). On the other hand, the second term corresponds to type 2 coalitions, which the aggregated model completely disregards (note that there is no $\bar{c}$ weight in the second term). Hence, the aggregated model misses a multitude of interactions that are captured by the state-specific model.

Remark 2.2. It is easy to show that $c^a(X)$ for any coalition $X \subseteq S \times A$ and action $a \in A$ equals

$$c^a(X) = m^a(X) \, w_{|X|-1} - \left( m - m^a(X) \right) w_{|X|},$$
where the scalar $m^a(X)$ equals the number of players in the set $\{(s, a)\}_{s \in S}$ that exist in the coalition $X$ and $m$ equals the number of states in $S$ (refer to the definition of $S$ in Section 2.2.1). Furthermore, for $X \in T_1$, we have
$$m^a(X) = \begin{cases} m & \text{if } a \in \bar{X} \\ 0 & \text{otherwise.} \end{cases}$$
As a result, the coefficient in the first term simplifies to
$$c^a(X) - \bar{c}^a(X) = \begin{cases} m w_{|X|-1} - \bar{w}_{|\bar{X}|-1} & \text{if } a \in \bar{X} \\ \bar{w}_{|\bar{X}|} - m w_{|X|} & \text{otherwise.} \end{cases}$$

Having quantified the difference between the state-specific and aggregated models, we show that the ratio of the two models' attributions can be arbitrarily large, which accentuates the importance of state-specific attribution.

Proposition 2.2. The ratio of the aggregated model's attribution to the aggregation of the state-specific model's attributions can be arbitrarily large. Mathematically,
$$\sup_{M} \; \sup_{a \in A} \; \frac{\bar{\psi}^{a,\text{Shap}}(M)}{\sum_{s \in S} \psi_s^{a,\text{Shap}}(M)} = \infty.$$

A proof of Proposition 2.2 is presented in Appendix B.4. We conclude this section by showing Figure 2.8, which displays the difference between the two models (state-specific and aggregated) on the simulated dataset from Section 2.6. The aggregated model underestimates the contributions of no-ad and e-mail actions by around 10%.

Figure 2.8: Value of state-specific attribution on the simulated dataset (bar chart comparing the state-specific and aggregated models over the actions no-ad, e-mail, display, and PS click). For the state-specific model, we report the percentage attributions to each action by aggregating over states, i.e., $\sum_s \psi_s^{a,\text{Shap}} / \sum_{(s',a')} \psi_{s'}^{a',\text{Shap}}$, for each $a \in A$. For the aggregated model, we report $\bar{\psi}^{a,\text{Shap}}$ for each $a \in A$.

2.8 Conclusions and Further Research

In this chapter, using a Markovian model for user behavior, we propose the counterfactual adjusted Shapley value (CASV) as a new metric for attribution in online advertising. We establish its theoretical foundations and appropriateness in two ways. First, we show the robustness of the proposed measure over various canonical settings in which the existing metrics fail. Second, we provide an underlying axiomatic framework motivated by game theory and causality that supports our choice. We show that the CASV in our Markovian model is an adjustment to the unique-uniform attribution scheme. Finally, we propose multiple approaches to estimate our metric for the Markov chain model and show its scalability through numerical experiments on a real-world large-scale dataset, in addition to benchmarking it against existing metrics.

Our work proposes a very general Markovian model for customer behavior and shows that the CASV metric can be efficiently computed for any such Markov chain. In Section 2.6, we propose a state space definition that is adequate for the particular application. Since the resulting attribution depends on the definition of the state, defining an appropriate state space is important. However, we expect that the appropriate definition of the state is likely to be very context specific.

Learning context-specific state aggregation and Markov chain dynamics from data is an active area of research. We refer the reader to Hallak et al. 2013 and references therein, where the authors posit a model selection procedure when the candidate models are generated by domain experts. Furthermore, since our axiomatic justification does not require the Markovian assumption, characterizing CASV under non-Markovian models is of interest.

Our work does not account for the cost of ad actions; a cost-aware attribution scheme is an interesting extension to explore since a relatively cheap but less effective ad action might be more valuable than an expensive but more effective ad action. Further, in this work, we implicitly assume that the various ad actions / channels cooperate with each other towards the common goal of maximizing network value instead of being strategic and maximizing their individual values. Motivated by the works of Berman 2018 and Abhishek et al. 2017, it would be interesting to analyze our framework when the individual channels are strategic. However, it is unclear if a cooperative game theory framework is even appropriate under this strategic setting. For example, the unique-uniform scheme provides each channel an incentive to pose as two smaller but distinct channels.

Our framework could be of interest in other domains. For instance, there has been growing interest in the field of "interpretable machine learning", where the goal is to attribute the output of a prediction model to the individual features of the input (Baehrens et al. 2010; Sundararajan et al. 2017; Lundberg and Lee 2017) or to the components of the prediction model (Dhamdhere et al. 2018; Leino et al. 2018). Extensions of the ideas presented in this work have implications for answering such questions. In this work, we modify the Shapley value using Rubin's definition of causality (Rubin 1974), and the counterfactual we consider can be viewed as one of a family of possible counterfactual constructs. In Appendix B.5, we describe other possible definitions that are worthy of further exploration. In addition, it is of interest to explore alternative formulations of causality. See, e.g., Pearl 2009; Halpern and Pearl 2005; Chockler and Halpern 2004; Hitchcock 1997; Morgan and Winship 2014; Collins et al. 2004; Eells 1991; and Hume 2003 for approaches to causality in the computer science and philosophy literature. Chockler and Halpern 2004 proposed metrics such

95 as “degree of responsibility” and “blame” in order to quantify the causal effect of one variable on another. However, computing these metrics is computationally expensive. Furthermore, these metrics lack axiomatic support. Also, similar to IVH, these metrics are not budget balanced. In fact, Chockler and Halpern 2004 mention adjusting their notions using Shapley value as a potential future work. Accordingly, though we have unified the existing attribution literature in this work, there is more to explore in terms of the set of counterfactuals to consider.

Chapter 3: Beyond Myopia: Model-Free Approximate Bayesian Learning for Conversion Funnel Optimization

Deciding what ad to show based on consumer-level data is one of the most important decisions in modern-day marketing. We study the problem of optimal sequential personalized interventions from the point of view of a firm promoting a product under a fairly general conversion funnel model of consumer behavior. Our model captures the state of each consumer (interaction history with the firm, for example) and allows the consumer behavior to vary as a function of both her state and the firm's interventions. In contrast with existing approaches that model only the myopic value of an intervention, we also model the long-run value by allowing the firm to make sequential interventions to the same consumer. The objective of the firm is to maximize the probability of conversion (the consumer buying the product) and a key challenge is that the firm does not know the state-specific effects of various interventions. To help make personalized intervention decisions, we propose a decision-making algorithm, which we call model-free approximate Bayesian learning. Our algorithm inherits the simplicity of Thompson sampling for a multi-armed bandit setting and maintains an approximate belief over the value (myopic plus long-run) of each state-specific intervention. The belief is updated as the algorithm interacts with the consumers. Despite being an approximation to the Bayes update, we prove the asymptotic optimality of our algorithm. We supplement our theory with numerics on a real-world large-scale dataset, where we show the dominance of our algorithm over traditional approaches that are myopic or estimation-based. Furthermore, in contrast to the estimation-based approaches, our algorithm is able to adapt automatically to underlying changes in consumer behavior (concept shift) and maintains a high level of uncertainty on the value of less explored consumer segments (covariate shift).

3.1 Introduction

Over the last two decades, digitization has been drastically shifting the way businesses operate. A significant portion of a modern-day organization's operations (ranging from marketing to sales) happens over the Internet, boosting the size of the digital economy to between 4.5 and 15.5 per cent of world GDP (UNCTAD 2019). This shift from offline to online operations has provided businesses access to "big data" (IBM 2020), giving them an unprecedented opportunity to understand customer behavior at a personalized level and make data-driven decisions to enhance customer experience and, ultimately, boost revenue (Mišić and Perakis 2020).

Consider a firm that promotes a product or service online (Netflix selling its paid membership service, for example). A central goal of such a firm is to maximize its conversion probability, i.e., given a new customer lead (consumer), maximize the probability the consumer buys its product. To do so, the firm performs sequential interventions (Netflix sending promotional emails as shown in Figure 3.1, for example) as a function of the information it collects on the consumer (state). The intervention space of the firm includes multiple actions (type of email, for example) and the consumer's state evolves dynamically as a function of the firm's actions (in an endogenous manner). For instance, a consumer who has interacted with a previous intervention (opening an email and clicking on one of the links in it, for example) might exhibit a different behavior (and hence, might be in a different "state") as compared to a consumer who has "avoided" previous interventions. Adding to the complexity, the firm does not know a priori the effect of its state-specific interventions on the consumer's behavior and hence needs to learn or estimate them as it interacts with various consumers.

In this chapter, we focus on the conversion funnel optimization problem (formally defined in Section 3.2). Informally, we consider a firm promoting a single product. The firm interacts with multiple consumers in parallel and its goal is to maximize the conversion probability. We model the consumer state by a conversion funnel, allowing different consumers to be in different states at each point in time.

Figure 3.1: Sequence of emails received by one of the authors after providing his email address to Netflix (but not subscribing to the membership); panels (a), (b), and (c) are dated May 3, May 18, and June 8, 2020. The emails were sent with a 15-20 day gap in between and the contents of each email were unique. The subject lines of the three emails were "Movies & TV shows your way", "Watch TV shows & movies anytime, anywhere", and "Netflix - something for everyone", respectively.

The firm observes the state of each consumer and performs personalized interventions, i.e., the intervention targeted at a consumer can depend on her state. As a function of the current state and the personalized intervention, the consumer state evolves. The process for each consumer continues until she buys the product (subscribing to Netflix's paid membership, for example) or leaves (unsubscribing from Netflix's email list, for example). For any consumer, we allow the firm to perform sequential interventions. We consider the setting where the firm does not know a priori the underlying parameters of the conversion funnel (transition probabilities between various states, for example) and needs to learn / estimate the effects of state-specific interventions by interacting with the consumers. This maps to a real-world scenario in which the firm is promoting a new product, or an existing product to an unexplored consumer segment such as a new geography

(covariate shift), or perhaps there has been a change in the underlying consumer behavior, possibly due to shifts in macroeconomic conditions, competitors' policies, or other marketing activities of the firm itself (concept shift). (For ease of exposition, we introduce our framework using a single product and costless interventions. However, our framework extends to multiple products and interventions with costs. We discuss these extensions in Section 3.5.)

Even with a single product and costless interventions, optimizing the conversion funnel is challenging, primarily due to complex consumer behavior. The consumer behavior can be affected by earlier interventions (carryover and spillover effects) and the consumer's interactions with such interventions (Manchanda et al. 2006; Abhishek et al. 2012; Braun and Moe 2013; Xu et al. 2014; Li and Kannan 2014; Ghose and Todri 2015; Bleier and Eisenbeiss 2015; Kireyev et al. 2016; Bruce et al. 2017; Zantedeschi et al. 2017). For example, a consumer who interacts with an initial intervention (opening a promotional email) might be more likely to convert in the future than a consumer who does not. These effects can be non-linear (Chatterjee et al. 2003). There can also be temporal effects (Sahni 2015; Bleier and Eisenbeiss 2015; Sahni et al. 2019), including marketing fatigue (Sinha and Foscht 2007; Cheng et al. 2010; Byers et al. 2012; Abebe et al. 2018; Cao et al. 2019). For example, sending promotional emails to a consumer every day might annoy her, leading her to leave the system (by unsubscribing from the firm's email list). The key challenge here is that the consumer behavior can be a complicated function of her state (past interactions, time since last intervention, etc.) and, a priori, the firm does not know how the consumer behaves as a function of her state and the intervention. The firm needs to learn / estimate such state-specific effects of its various interventions. Doing so while optimizing the funnel at the same time is challenging.

It is worth mentioning that the problem of conversion funnel optimization is quite widespread in real-life marketing applications. For instance, there are multiple firms operating in the space of email campaign management1 and the global market size for just email marketing is estimated to be around USD 7.5 billion, with a projected growth to USD 17.9 billion by 2027 (ReportLinker

1See https://www.privy.com/, https://www.klaviyo.com/, https://mailchimp.com/, https://www.constantcontact.com/, https://sendgrid.com/, and https://cordial.com/ for a sample of firms operating in this space.

2020). To define the focus of our work, we highlight an important characteristic that is omnipresent in the email marketing industry: when a consumer journey begins (for instance, a consumer providing her email to Netflix), the firm has very limited data on the consumer; typically, all the firm has is the consumer's email address and perhaps her name. Hence, in this work, we focus on a limited consumer-features regime. In particular, we do not assume the firm has access to the features of the consumers (age, sex, location, for example), but we let the firm learn about the consumers' preferences as it interacts with them via sequential interventions. Nonetheless, we discuss an extension of our framework to a setting in which the firm has access to consumer features in Section 3.5.

3.1.1 Related Literature

Conversion funnel optimization relates to the existing work in the marketing literature focused on determining optimal marketing actions. Motivated by Bertsimas and Mersereau 2007, we classify the related work into four sets as shown in Table 3.1: (1) estimation-based myopic approaches, (2) estimation-based non-myopic approaches, (3) learning-based myopic approaches, and (4) learning-based non-myopic approaches. We say an approach is myopic if it maximizes immediate / single-period reward (one-step conversion probability, for example) and we say an approach is non-myopic if it maximizes the long-run reward (eventual conversion probability, for example). We say an approach is estimation-based if it uses data from a fixed period of time to estimate the underlying parameters and makes decisions using such fixed estimates. On the other hand, an approach is learning-based if it does not use a pre-specified period to estimate parameters but learns them while optimizing for rewards at the same time.

Estimation-based myopic approaches. Most of the traditional work in marketing falls in the first set, i.e., estimation-based myopic approaches. These approaches first propose a model that predicts immediate reward (one-step conversion probability, for example) as a function of the intervention and possibly the consumer state / features, and then estimate the proposed model to make intervention decisions.

101 myopic non-myopic

estimation 1 2

learning 3 4

Table 3.1: Classification of the related marketing literature into four sets. Most of the traditional work belongs to set 1 and there has been noticeable progress in sets 2 and 3 over the last two decades. We focus on set 4, which is a relatively unexplored space in marketing. intervention decisions. Since such an approach is not our main focus, we do not discuss this litera- ture extensively but briefly mention the recent work of Simester et al. 2020 and refer the reader to references therein. Simester et al. 2020 consider seven widely used machine learning methods to predict single-period reward and investigate how a firm can use experimentation to estimate their parameters and then, use such estimates to optimize marketing decisions when prospecting for new customers. A key limitation of such approaches is that even though the parameter estimates can be specific to the state / features of the consumer, the final decision is myopic (since it maximizes single-period reward) and hence, can result in suboptimal reward. In the words of Simester et al. 2006,

“As early as 1960, it was recognized that catalog companies may be able to profit by focusing on long-run rather than immediate profits when designing their mailing policies (Howard 2002).”

For example, in the Netflix example introduced earlier, an intervention that has a lower one-step conversion probability will never be picked under a myopic policy even if it leads the consumer to a state from which the conversion probability is higher than at the current state. We will demonstrate this in our numerics (Section 3.6). Some other related work includes Bult and Wansbeek 1995; Rossi et al. 1996; Ansari and Mela 2003; and Feit and Berman 2019.

Estimation-based non-myopic approaches. Relatively little existing work falls in the second set, i.e., estimation-based non-myopic approaches. Examples include Bitran and Mondschein

1996; Gönül and Shi 1998; Simester et al. 2006; Montoya et al. 2010; Schweidel et al. 2011; Ariely and Bitran 2013; Ma et al. 2016; Zhang et al. 2016; Zhang et al. 2017; and Liberali and Ferecatu 2019. These approaches first propose a model that predicts long-run reward (eventual conversion probability, for example) as a function of the intervention and possibly the consumer state / features and then estimate the proposed model to make intervention decisions. A key limitation of such approaches is that they might suffer from issues such as concept shift and covariate shift (Simester et al. 2020). Furthermore, arbitrarily setting aside an initial period of time to estimate the underlying parameters can introduce suboptimality. We further discuss these limitations in Sections 3.6 and 3.7. (We place Ariely and Bitran 2013 in this category since they assume customer behavior to be known conditional on the customer segment.)

Learning-based myopic approaches. In the last two decades, we have seen noticeable progress here. As with estimation-based myopic approaches, these approaches propose a model that predicts immediate reward as a function of the intervention, and possibly the consumer state / features. The difference is that instead of estimating the model, they learn it. In the language of reinforcement learning (Sutton and Barto 2018), these approaches are similar to a multi-armed bandit, where the different arms represent the various interventions available to the firm. Works in this category include Gooley and Lattin 2000; Bertsimas and Mersereau 2007; Hauser et al. 2009; Urban et al. 2014; Hauser et al. 2014; Schwartz et al. 2017; and Cao et al. 2019. Although such learning-based approaches usually perform better than the estimation-based versions, both approaches share the same key limitation: the decision is myopic (since it maximizes single-period reward) and hence, can result in suboptimal reward. We will highlight this in our numerical study that uses a real-world dataset (Section 3.6).

Learning-based non-myopic approaches. The focus of our work is learning-based non-myopic approaches. As in estimation-based non-myopic approaches, these approaches propose a model that predicts long-run reward as a function of the intervention and possibly the consumer state / features. The difference is that instead of estimating the model, they learn it. Though such

approaches have been around for almost two decades in the marketing literature (Abe et al. 2002; Pednault et al. 2002), there has been little progress in this space. Some other examples include Tkachenko 2015; Theocharous et al. 2015; Yao and Lu 2019; and AboElHamd et al. 2020. The existing works in this space have leveraged off-the-shelf reinforcement learning algorithms and there has been little (if any) rigorous methodological advancement that is tailored to the context of marketing, the focus of our work. One exception is the recent work of Moazeni et al. 2019, which focuses on designing marketing campaigns for market entry using a sequential learning approach. A key difference between Moazeni et al. 2019 and our work is that Moazeni et al. 2019 focus on aggregate-level dynamics whereas we focus on consumer-level dynamics. We conclude our literature review with the following quote from Bertsimas and Mersereau 2007 that, although over a decade old, is still relevant today:

“... it remains a challenge to develop rigorous adaptive learning technologies for more complicated and realistic models of customer behavior, for example, allowing depen- dence among messages, dependence among customer segments, and customer hetero- geneity. In short, we believe that there is much room for interesting future research in this area. ... it remains a challenge to interface such techniques tractably with dynamic optimization formulations. We believe that approximations, either of the customer be- havior model or of optimality or both, are necessary. ... we also believe that functional and distributional approximations of the Bayes updates warrant investigation. Possi- ble alternatives to the Bayes solution include non-Bayesian approaches or approaches that seek a weaker form of optimality (e.g., asymptotic optimality).”

3.1.2 Our Approach and Contributions

Our approach is motivated by the above quote of Bertsimas and Mersereau 2007. We consider a fairly general model of consumer behavior and develop an asymptotically optimal decision-making algorithm that approximates the Bayes update. In terms of the model, at each point of time, we posit that the consumer is in some underlying state, which summarizes her interaction

history with the firm (and possibly includes segment information such as age, sex, location). The firm observes the state of the consumer and decides on which intervention to perform (if any). As a result, the consumer transitions to another state as a stochastic function of her current state and the firm's intervention, i.e., we allow for interventions to have state-specific effects. The firm then performs another intervention and the process repeats until the consumer makes a decision to purchase or not. Due to the sequential nature of our model, it captures dependence among various interventions. Furthermore, our model's sequential nature allows it to accommodate both the myopic and the long-run value of the interventions, which is in contrast with the existing approaches that only model myopic rewards. Our model is at a consumer level and hence, allows for personalized interventions (as a function of the consumer state). We emphasize that our model is general enough to capture a wide array of consumer behaviors.

The main contribution of our work is to propose a decision-making algorithm, which guides the firm in terms of what intervention to perform given the state of a consumer. A priori, the firm does not know the state-specific effects of the interventions and needs to learn / estimate them in order to make optimal intervention decisions. To help the firm make decisions in such an unknown environment, we propose the model-free2 approximate Bayesian learning (MFABL) algorithm. The high-level idea of MFABL is simple. It maintains a belief on the state-specific value (myopic plus long-run) of each intervention. As it interacts with the consumers and gathers data, it updates such beliefs in an approximate manner. Given the sequential nature of our model, maintaining the exact model-free belief is challenging and hence, MFABL approximates the Bayes update. MFABL's belief structure, decision-making policy, and the update rule inherit the simplicity of Thompson sampling for a multi-armed bandit. In particular, MFABL assigns a Beta belief to the value of each state-specific intervention. When a consumer is in a certain state, MFABL uses the belief to sample the value of all interventions available at that state and performs the intervention

2We use the terminology “model-free” to be consistent with the existing literature (Sutton and Barto 2018) and note that “model-free” does not mean that there does not exist a true underlying model. Instead, it means that there exists a true underlying model but our algorithm does not learn / estimate the model. Approaches that estimate / learn the model are called “model-based”. We revisit this discussion in Section 3.3 (Footnote 4), where we formally define our model-free algorithm.

with the highest sample value. If doing so transitions the consumer to a state from which she is "likely" / "unlikely" to convert, MFABL positively / negatively reinforces the corresponding intervention using a Beta-Bernoulli style update. Despite being an approximation to the Bayes update, we prove that MFABL is asymptotically optimal. We supplement our theoretical developments with numerical experiments using a real-world large-scale dataset corresponding to email interventions for a software product of a Fortune 500 firm, targeted at millions of consumers. We demonstrate the dominance of our algorithm over the traditional algorithms that either optimize for the myopic reward or are estimation-based. Furthermore, in contrast to the estimation-based approaches, our algorithm is able to adapt automatically to the underlying changes in consumer behavior (concept shift) and maintains a high level of uncertainty on the value of less explored consumer segments (covariate shift).

Outline. The rest of this chapter is organized as follows. In Section 3.2, we define our model for consumer behavior, i.e., the behavior of consumers as a function of the firm's personalized sequential interventions. We call this model the conversion funnel and use it to define the underlying optimization problem the firm wishes to solve (conversion funnel optimization). In Section 3.3, we present the algorithm (model-free approximate Bayesian learning) that we propose in order to tackle the conversion funnel optimization problem. We discuss the properties of our proposed algorithm in Section 3.4, followed by some model extensions in Section 3.5. In Section 3.6, we apply our algorithm to a real-world email marketing dataset composed of millions of consumers corresponding to a Fortune 500 firm and benchmark it with some existing algorithms. In Section 3.7, we present further numerics showcasing the performance of our approach under practical issues such as concept shift and covariate shift. We conclude in Section 3.8 with some directions for ongoing and future research.

3.2 Consumer Model and Conversion Funnel Optimization

In this section, we formally define the underlying problem. To do so, we first discuss our model for consumer behavior (Section 3.2.1) and then, the conversion funnel optimization problem (Section 3.2.2). Finally, in Section 3.2.3, we map our model to a real-life example to provide intuition.

3.2.1 Model for Consumer Behavior: The Conversion Funnel

We now define a model that explains how a consumer behaves as a function of the firm's interventions. For ease of exposition, we start with a single firm promoting a single product and extend the model to multiple products in Section 3.5. Furthermore, as discussed in Section 3.1, we focus on a limited consumer features regime where we do not assume the firm has access to the features of the consumers (age, sex, location, for example). Nonetheless, we discuss an extension to a setting in which the firm has access to such features in Section 3.5. We note that we leverage the model from Chapter 2. Motivated by the existing marketing literature on the conversion funnel (Strong 1925; Howard and Sheth 1969; Barry 1987; Bettman et al. 1998; Court 2009; Elzinga et al. 2009; Kotler and Armstrong 2010; Mulpuru 2011; Jansen and Schuster 2011; Bruce et al. 2012), we model the consumer behavior using a state-based model, i.e., at each point of time, the consumer is in some state (possibly a function of her history). The firm observes3 the state of the consumer and performs an intervention. As a result, the consumer transitions to some other state as a stochastic function of the firm's interventions and her current state. The process ends when the consumer converts or quits. If the consumer converts, the firm earns a corresponding reward. To showcase the flexibility of our model, we keep most of its elements in this section general and connect it to a specific application via an example in Section 3.2.3. We emphasize that our model is general enough to capture arbitrary consumer behavior. Our model for consumer behavior can be cast as a

3In the traditional conversion funnel models, the state is usually assumed to be hidden. However, given the granularity of consumer-level data available these days, we assume the state to be observable. We revisit this discussion when we discuss our real-world dataset in Section 3.6.

Markov decision process (MDP) (Bertsekas 1995; Puterman 2014; Sutton and Barto 2018) and we define the five underlying components of the MDP next: (1) state space S, (2) action space A, (3) transition probabilities P, (4) initial state probabilities λ, and (5) reward r. We refer to this MDP as the conversion funnel and denote it by M ≡ (S, A, P, λ, r).

State space. We define S := {1,..., S} as the set of states an "active" consumer can be in, i.e., a consumer who has not converted or quit. In addition, there are two absorbing states {q, c} with

S+ := S ∪ {q, c}. State c refers to conversion (consumer buys the product) and q refers to the quit state (consumer leaves the system). At each point of time, a consumer is in one of the states in the set S+ and the firm observes this information. We allow for any arbitrary definition of a state space and do not yet give it a physical meaning to highlight the flexibility of our framework.

Action space. The set of interventions available to the firm is defined as A := {0, 1,..., A} with 0 denoting no intervention. All actions are assumed to be zero-cost (extension discussed in

Section 3.5). Furthermore, we allow the action space to depend on the state, i.e., A_s for all s ∈ S, but to keep the notation simple, we will use A and note that all results hold for a state-specific action space too. We allow for any arbitrary definition of an action space and do not yet give it a physical meaning to highlight the flexibility of our framework. We assume that the firm takes actions at arbitrary but discrete time points $[t_k]_{k=1}^{\infty}$. Note that we allow the gap between various time points to be unequal, i.e., δ_k := t_{k+1} − t_k can be such that δ_k ≠ δ_{k′} for any k ≠ k′. For ease of notation, we will use t_k = k for all k but we note that none of the results depend on this simplification.

Transition probabilities. If a consumer is in state s ∈ S and the firm takes action a ∈ A, the

consumer transitions to state s′ ∈ S+ with probability p_{sas′} ∈ [0, 1]. Note that s′ can equal s (self-loop). We allow for an arbitrary transition structure (as long as Assumption 3.1 stated below holds). Trivially, $\sum_{s' \in S^+} p_{sas'} = 1$ for all (s, a) ∈ S × A and p_{cac} = p_{qaq} = 1 for all a ∈ A (absorbing states). Furthermore, p_{sac} denotes the one-step conversion probability corresponding to (s, a) ∈ S × A. We denote by P the collection of all transition probabilities. The only restriction we impose on the transition behavior of a consumer is that she eventually absorbs, i.e., either converts or quits.

Assumption 3.1 (Absorption). Under any policy, every consumer eventually converts or quits.

A sufficient condition for Assumption 3.1 is that the sum of one-step conversion and one-step quit probabilities corresponding to each state-action pair is positive. We will discuss this condition in the context of our numerics in Section 3.6.
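This condition is easy to check directly when the transition probabilities are available (for instance, for the "ground truth" funnel we estimate for our numerics in Section 3.6). The sketch below is purely illustrative and not part of the original analysis; the dictionary encoding of P and the toy numbers are assumptions made only for this example.

# Minimal sketch (illustrative only): verify the sufficient condition for
# Assumption 3.1, i.e., p_{sac} + p_{saq} > 0 for every state-action pair.
# P is assumed to be stored as P[s][a][s_next] = transition probability.
def satisfies_absorption_condition(P, states, actions, c="c", q="q"):
    for s in states:
        for a in actions:
            if P[s][a].get(c, 0.0) + P[s][a].get(q, 0.0) <= 0.0:
                return False
    return True

# Toy two-state funnel with two actions (hypothetical numbers).
P = {
    1: {0: {1: 0.90, 2: 0.05, "q": 0.05},               # action 0: no email
        1: {1: 0.60, 2: 0.30, "c": 0.02, "q": 0.08}},   # action 1: send email
    2: {0: {2: 0.90, "q": 0.10},
        1: {2: 0.70, "c": 0.20, "q": 0.10}},
}
print(satisfies_absorption_condition(P, states=[1, 2], actions=[0, 1]))  # True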

Initial state probabilities. We denote by λ_s the initial state probability corresponding to state s ∈ S, i.e., the probability a consumer starts her journey in state s. Trivially, $\sum_{s \in S} \lambda_s = 1$. We define λ := [λ_s]_{s∈S} and we allow the consumers to start in different initial states. Given λ, consider the set of initial states S_λ := {s ∈ S : λ_s > 0}. We assume wlog that each state in S is reachable via some state in S_λ under some policy. This assumption is wlog since if there exists a state s ∈ S that violates this assumption, then we can discard that state and re-define S ← S \ {s} (as no consumer will ever visit state s). We will refer to this assumption as "connectedness". (Since this assumption is wlog, we do not classify it as a formal assumption.)

Reward structure. The firm earns an immediate reward of r ∈ [0, ∞) when a consumer con- verts, i.e., transitions from a state in S to the conversion state c. All other transitions result in an immediate reward of 0. Note that since r is a constant, we can assume it to equal 1 without loss of generality and we will do so. We emphasize that such a terminal nature of the reward is specific to the context of marketing and hence, our conversion funnel model belongs to a special class of MDPs. We will exploit this application-specific structure of reward when developing our algorithm in Section 3.3. Before stating the conversion funnel optimization problem, we define some preliminaries of our MDP model, which will be useful later on. All these preliminaries are part of a standard MDP setup (Bertsekas 1995; Puterman 2014; Sutton and Barto 2018).

Policy. A policy π := [π_{sa}]_{(s,a)∈S×A} is a mapping from states to action probabilities, i.e., π_{sa} denotes the probability the firm takes action a ∈ A when a consumer is at state s ∈ S.

Value function of a policy. Given that the firm employs policy π, state-value function V^π(s) denotes the expected reward the firm reaps from a consumer who is in state s ∈ S. With r = 1 wlog, V^π(s) equals the eventual conversion probability from state s under policy π. Using the Bellman equation (Bellman 1954) and the terminal reward structure of the conversion funnel, we get

\[
V^{\pi}(s) = \sum_{a \in A} \pi_{sa} \left\{ \sum_{s' \in S} p_{sas'} V^{\pi}(s') + p_{sac} \right\} \qquad \forall s \in S.
\]

Given π, define the action-value function

\[
Q^{\pi}(s, a) := \underbrace{\sum_{s' \in S} p_{sas'} V^{\pi}(s')}_{\text{long-run value}} + \underbrace{p_{sac}}_{\text{myopic value}} \qquad \forall (s, a) \in S \times A, \tag{3.1}
\]

which denotes the eventual conversion probability of a consumer who is in state s ∈ S and the firm takes action a ∈ A at state s but follows policy π from then on. (If state s is visited in the future, then policy π is used.) Note that we have decoupled the eventual conversion probability into two components: (a) long-run value (probability of converting in more than one step) and (b) myopic value (one-step conversion probability).
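To make the decomposition in (3.1) concrete, the following minimal sketch evaluates V^π and Q^π for a toy two-state funnel with known transition probabilities and a fixed policy; the states, actions, and numbers are hypothetical and chosen only for illustration.

import numpy as np

# Minimal sketch (illustrative only): policy evaluation for a toy funnel.
# For each (s, a), p[s][a] = (vector of probabilities over active states, p_{sac}).
# Any remaining probability mass is an implicit transition to the quit state q.
S, A = [0, 1], [0, 1]
p = {0: {0: (np.array([0.90, 0.05]), 0.00),
         1: (np.array([0.55, 0.30]), 0.03)},
     1: {0: (np.array([0.00, 0.85]), 0.05),
         1: (np.array([0.00, 0.60]), 0.20)}}
pi = np.array([[0.5, 0.5],   # pi[s][a]: probability of taking action a in state s
               [0.5, 0.5]])

# Bellman equation: V = P_pi V + b_pi, so V = (I - P_pi)^{-1} b_pi.
P_pi = np.array([[sum(pi[s][a] * p[s][a][0][s2] for a in A) for s2 in S] for s in S])
b_pi = np.array([sum(pi[s][a] * p[s][a][1] for a in A) for s in S])
V = np.linalg.solve(np.eye(len(S)) - P_pi, b_pi)

# Q^pi(s, a) = long-run value (sum over s') + myopic value (p_{sac}), as in (3.1).
Q = {(s, a): float(p[s][a][0] @ V + p[s][a][1]) for s in S for a in A}
print(V, Q)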

Optimal policy and optimal value function. A policy π is optimal if and only if

\[
V^{\pi}(s) \geq V^{\pi'}(s) \qquad \forall s \in S, \ \forall \pi'.
\]

We use the notation π∗ to denote an optimal policy and define the optimal state-value function

V∗(s) := V^{π∗}(s) for all s ∈ S and the optimal action-value function Q∗(s, a) := Q^{π∗}(s, a) for all (s, a) ∈ S × A. Trivially,

\[
V^*(s) = \max_{a \in A} Q^*(s, a) \qquad \forall s \in S.
\]

We will use the notation Q∗ := [Q∗(s, a)]_{(s,a)∈S×A}.
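Had the transition probabilities been known, Q∗ could be computed by iterating the Bellman optimality operator. The sketch below is a minimal illustration of such Q-value iteration on the same hypothetical two-state funnel used above; it is only an illustration, since in our setting the firm does not observe P.

import numpy as np

# Minimal sketch (illustrative only): Q-value iteration for Q* on a toy funnel
# with known transition probabilities (same hypothetical funnel as above).
S, A = [0, 1], [0, 1]
p = {0: {0: (np.array([0.90, 0.05]), 0.00),
         1: (np.array([0.55, 0.30]), 0.03)},
     1: {0: (np.array([0.00, 0.85]), 0.05),
         1: (np.array([0.00, 0.60]), 0.20)}}

Q = np.zeros((len(S), len(A)))
for _ in range(1000):                  # repeatedly apply the Bellman optimality operator
    V = Q.max(axis=1)                  # V*(s) = max_a Q*(s, a)
    Q = np.array([[p[s][a][0] @ V + p[s][a][1] for a in A] for s in S])

pi_star = Q.argmax(axis=1)             # greedy (optimal) action in each state
print(Q, pi_star)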

3.2.2 The Conversion Funnel Optimization Problem

We now state the optimization problem the firm wishes to solve, i.e., the conversion funnel optimization. Suppose there are N consumers, possibly interacting with the firm in parallel. Denote

by y_n ∈ {0, 1} whether consumer n ∈ {1,..., N} converts or not. The objective of the firm is to maximize the average expected reward over all consumers, i.e., $\frac{r}{N} \sum_{n=1}^{N} E[y_n]$, which is equivalent to maximizing the average conversion probability

\[
\frac{1}{N} \sum_{n=1}^{N} E[y_n]
\]

since r is a constant. (Hence, assuming r = 1 is wlog.) Had the firm known the true conversion funnel MDP M, then it could have computed an optimal policy using standard MDP solution techniques such as value / policy iteration (Bertsekas 1995; Puterman 2014; Sutton and Barto 2018) and hence, maximized the probability of converting each consumer. However, the main challenge in the conversion funnel optimization is that the firm does not know the true model M. In particular, we assume that the firm knows S, A, and r but does not know the consumers' transition behavior P and λ. In order to make intervention decisions, the firm can use information it collects by interacting with the consumers. Consider an arbitrary time t and an arbitrary consumer n. Denote by s_{nt} the state of consumer n at time t and by a_{nt} the intervention directed by the firm at consumer n at time t. As a result, consumer n transitions to state s_{n,t+1}. Define the interaction history till an arbitrary time t as

\[
H_t := \{ (s_{n1}, a_{n1}, s_{n2}, a_{n2}, s_{n3}, \ldots, s_{n,t-1}, a_{n,t-1}, s_{nt}) : n \in \{1, \ldots, N\} \},
\]

i.e., the set of all interactions the firm has had with all the consumers till time t. To make an

intervention decision at time t, the firm is allowed to use the information in the set Ht.

3.2.3 The Netflix Example

We now shed light on our general conversion funnel model by connecting it to a real-life setting. In particular, we revisit the Netflix example we introduced in Section 3.1 (recall Figure 3.1) and discuss how the components of our model map to such a setting. In that direction, we first elaborate on the sequence of events that we observed in real-life and then, show the mapping between real-life and our model. (Note that our numerics in Section 3.6 provide another such example.)

One of the authors (B from here on) provided his email address to Netflix on May 3, 2020 but did not subscribe to the paid membership. Netflix sent him a generic welcome email on May 3 (email #1 in Figure 3.1). 15 days later, on May 18, B received email #2 (see Figure 3.1), with a few "trending now" titles on Netflix. 20 days later, on June 8, B received email #3 (see Figure 3.1), displaying a diverse mix of shows across various genres (drama, comedies, documentaries, etc.).

To show the mapping between real-life and our model, it suffices to discuss the state space S and the action space A. In terms of the state space, the conversion state c refers to B subscribing to the paid membership of Netflix and the quit state q refers to B opting to unsubscribe from Netflix's email list. On day 1 (May 3), B "arrives" with an initial state of 1, which corresponds to Netflix having no information on B except his email address. As B interacts with Netflix, the state of B evolves. For example, the state can capture information such as time since last email (to capture temporal effects) and the type of emails B interacted with. It can also include more granular information such as the titles B clicked on out of the ones displayed in emails ("Money Heist" in email #2 for example). In terms of the action space, the discretization of time at which Netflix sends emails can be at a weekly level. In addition to the zero action (do not send an email), the action space can include various types of emails, such as "trending now" (email #2), "diverse" (email #3), and "top 10 shows being watched on Netflix". Another type of email can be to offer a one-month free trial. Intuitively, such an email will have a lower myopic value but a higher long-run value since it might not lead to an immediate conversion (subscription to paid membership) but push a consumer to a state from which she is more likely to convert. Building further on this real-life example, after the first three emails, B opted in for a one-month

free trial, during which he received personalized emails specific to the shows he watched. After the trial expired, B did not become a paying member and received emails with a pricing theme. For example, the subject of one of the emails was "B, come back for just $8.99", illustrating an instance of a state-specific email policy. More broadly, we note that our conversion funnel model is not specific to email campaigns since one can model other applications using it too, one example being the e-commerce purchase funnel (a retailer selling clothes online and performing interventions such as changing the information displayed on its website as a function of the state of the consumer). For example, a consumer who has been reading product reviews might be persuaded by a message such as "1000 people like this product" whereas a consumer who has added the product to the cart might be persuaded by "only 1 unit left in the inventory".

3.3 Model-Free Approximate Bayesian Learning

In this section, we propose an algorithm to tackle the conversion funnel optimization problem as defined in Section 3.2. To build intuition behind our algorithm, we first present the Thompson sampling algorithm for the multi-armed bandit (Section 3.3.1) and use it to motivate the construction of our algorithm (Section 3.3.2).

3.3.1 Motivation: Thompson Sampling for Multi-Armed Bandit

To discuss the Thompson sampling algorithm, we briefly define the multi-armed bandit model (Sutton and Barto 2018). The multi-armed bandit can be seen as a special case of our conversion funnel model. In particular, if we restrict the number of states in S to 1 and only allow the firm to interact once with the consumer (as opposed to sequential interactions), our conversion funnel becomes a multi-armed bandit. The multiple actions available to the firm in the action space A represent the "multiple arms" of the bandit. The consumer dynamics in a multi-armed bandit model are as follows:

• Consumer 1 arrives and firm takes action a ∈ A.

• Consumer converts with probability p_a and quits with probability 1 − p_a.

• If consumer converts, firm earns a reward equal to r (equals 1 wlog).

• Then, consumer 2 arrives. And so on.

The objective of the firm is to maximize its expected reward over all consumers and the challenge is that the firm does not know the underlying parameters [p_a]_{a∈A}. Thompson sampling (Thompson 1933) is a well-known algorithm to tackle the multi-armed bandit problem. The idea behind Thompson sampling (Algorithm 5) is simple: maintain a belief on the "value" of each action and, when a consumer arrives, take action a ∈ A with the probability that it is optimal. Once an action is taken and a corresponding outcome is observed (consumer converts or quits), update the belief corresponding to that action. For a ∈ A, since the belief is over the conversion probability

p_a ∈ [0, 1], it seems natural to use a Beta distribution and exploit the Beta-Bernoulli conjugacy to update the belief. So, if the consumer converts, we increase the α of the corresponding action by 1 (and hence, increase the expected value of our belief over the value of that action) and if the consumer quits, we increase the β of the corresponding action by 1 (and hence, decrease the expected value of our belief over the value of that action). Despite its simplicity, Thompson sampling for the multi-armed bandit problem is known to exhibit strong theoretical guarantees (Agrawal and Goyal 2012) and work well in marketing applications (Chapelle and Li 2011; Schwartz et al. 2017).

Algorithm 5 Thompson Sampling for Multi-Armed Bandit with N Consumers
Require: Prior counts (α_a, β_a) for all a ∈ A
 1: for n = 1 to N
 2:   q_a ∼ Beta(α_a, β_a) for all a ∈ A        % generate samples
 3:   a∗ = arg max_a q_a                        % play action with the highest sample value
 4:   if consumer n converts
 5:     α_{a∗} ← α_{a∗} + 1                     % update belief using Beta-Bernoulli conjugacy
 6:   else
 7:     β_{a∗} ← β_{a∗} + 1                     % update belief using Beta-Bernoulli conjugacy
 8:   end if
 9: end for
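For concreteness, the following is a minimal Python sketch of Algorithm 5. The array true_p of conversion probabilities is a hypothetical simulator of consumer outcomes and is never seen by the algorithm itself.

import numpy as np

# Minimal sketch (illustrative only) of Algorithm 5: Thompson sampling for the
# multi-armed bandit. `true_p` is a hypothetical ground truth used only to
# simulate whether each consumer converts; the algorithm never observes it.
rng = np.random.default_rng(0)
true_p = np.array([0.02, 0.05, 0.03])   # unknown conversion probability of each action
alpha = np.ones(len(true_p))            # prior counts
beta = np.ones(len(true_p))

N = 10_000
for n in range(N):
    q = rng.beta(alpha, beta)           # generate samples (line 2)
    a = int(np.argmax(q))               # action with the highest sample value (line 3)
    if rng.random() < true_p[a]:        # consumer n converts
        alpha[a] += 1                   # Beta-Bernoulli update (line 5)
    else:                               # consumer n quits
        beta[a] += 1                    # Beta-Bernoulli update (line 7)

print(alpha / (alpha + beta))           # posterior mean value of each action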

3.3.2 Our Algorithm: Model-Free Approximate Bayesian Learning (MFABL)

Our algorithm, model-free4 approximate Bayesian learning (MFABL), extends the simple Beta-Bernoulli structure of Thompson sampling to the general conversion funnel model and is presented as Algorithm 6. Due to the terminal reward structure in our conversion funnel, the Q-value of an action at a given state represents the eventual conversion probability (as discussed

in Section 3.2.1). Hence, motivated by Thompson sampling, we give a Beta(α_{sa}, β_{sa}) belief to the "value" of taking action a ∈ A at state s ∈ S. We represent this belief by Q(s, a) for all (s, a) ∈ S × A, i.e., Q(s, a) =_d Beta(α_{sa}, β_{sa}), where =_d denotes "equal in distribution".

Algorithm 6 Model-Free Approximate Bayesian Learning with N Consumers
Require: Prior counts (α_{sa}, β_{sa}) for all (s, a) ∈ S × A, ε
 1: (α_{ca}, β_{ca}) = (∞, 1) and (α_{qa}, β_{qa}) = (1, ∞) for all a ∈ A   % prior counts for states c and q
 2: for t = 1, 2, ...   % until there exists an "active" consumer (has not converted or quit)
 3:   for n = 1 to N
 4:     Observe state s = s_{nt} of consumer n at time t
 5:     if s ∈ S   % filter for "active" consumers
 6:       q_{sa} ∼ Beta(α_{sa}, β_{sa}) for all a ∈ A   % generate samples
 7:       a∗ = arg max_a q_{sa} with ε-greedy   % highest sample value with ε-greedy
 8:       Consumer transitions to state s′ = s_{n,t+1} ∈ S+
 9:       f_{s′} ∼ Bernoulli( max_{a′∈A} α_{s′a′} / (α_{s′a′} + β_{s′a′}) )   % generate feedback
10:       if f_{s′} = 1
11:         α_{sa∗} ← α_{sa∗} + 1   % approximate posterior update mimicking Beta-Bernoulli
12:       else
13:         β_{sa∗} ← β_{sa∗} + 1   % approximate posterior update mimicking Beta-Bernoulli
14:       end if
15:     end if
16:   end for
17: end for
18: return Q := [Q(s, a)]_{(s,a)∈S×A} where Q(s, a) =_d Beta(α_{sa}, β_{sa})
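The following is a minimal Python sketch of the per-transition step of Algorithm 6 (lines 4-14). The state and action encodings and the simulate_transition argument are hypothetical placeholders for the firm's environment; this is an illustration under those assumptions, not the exact implementation used in this chapter.

import numpy as np

# Minimal sketch (illustrative only) of one MFABL step (lines 4-14 of Algorithm 6).
# `simulate_transition(s, a)` is a hypothetical environment returning the next state.
rng = np.random.default_rng(1)
S_active, A = [0, 1, 2, 3], [0, 1, 2, 3, 4]
CONVERT, QUIT = "c", "q"
alpha = {(s, a): 1.0 for s in S_active for a in A}   # Beta prior counts
beta = {(s, a): 1.0 for s in S_active for a in A}
eps = 0.05

def best_mean(s):
    # max_{a'} alpha / (alpha + beta) at state s, as in equation (3.2); the
    # absorbing states play the role of the (inf, 1) and (1, inf) prior counts.
    if s == CONVERT:
        return 1.0
    if s == QUIT:
        return 0.0
    return max(alpha[s, a] / (alpha[s, a] + beta[s, a]) for a in A)

def mfabl_step(s, simulate_transition):
    if rng.random() < eps:                                                  # ε-greedy (line 7)
        a = int(rng.choice(A))
    else:
        a = max(A, key=lambda act: rng.beta(alpha[s, act], beta[s, act]))   # lines 6-7
    s_next = simulate_transition(s, a)                                      # line 8
    feedback = rng.random() < best_mean(s_next)                             # line 9, eq. (3.2)
    if feedback:
        alpha[s, a] += 1                                                    # line 11
    else:
        beta[s, a] += 1                                                     # line 13
    return s_next

# Example: one step with a dummy environment that converts from state 3.
demo_env = lambda s, a: CONVERT if s == 3 else min(s + 1, 3)
print(mfabl_step(0, demo_env))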

In terms of picking an action, MFABL mimics Thompson sampling (lines 6 and 7 in Algorithm 6). When a consumer is in state s ∈ S, MFABL samples q_{sa} ∼ Beta(α_{sa}, β_{sa}) for a ∈ A and

4As noted in Section 3.1 (Footnote 2), “model-free” does not mean that there does not exist a true underlying model. Instead, it means that there exists a true underlying model M (the conversion funnel MDP) but our algorithm does not learn / estimate the model M. In particular, our algorithm (MFABL) never attempts to learn / estimate the transition probabilities P corresponding to the MDP M but only attempts at learning Q∗. Approaches that estimate / learn the transition probabilities P are called “model-based”.

plays the action with the highest sample value (with ε-greedy), i.e.,

\[
a^* = \begin{cases} \arg\max_{a \in A} q_{sa} & \text{w.p. } 1 - \epsilon \\ \text{UniformAtRandom}(A) & \text{w.p. } \epsilon. \end{cases}
\]

In words, MFABL plays the action with the highest sample value with probability 1 − ε and plays an action sampled uniformly at random from A with probability ε. (Note that ε-greedy is used to ensure convergence of MFABL to Q∗, as will become clear in Section 3.4. In theory, the parameter ε can be arbitrarily small but strictly positive.) However, unlike in a bandit, updating the belief in a conversion funnel is non-trivial since a reward is observed after a consumer visits a sequence of states. For example, suppose the consumer path is as follows:

(s, a) → (s′, a′) → c.

That is, the consumer starts in state s ∈ S. The firm takes action a ∈ A. This makes the consumer transition to state s′ ∈ S, where the firm takes action a′ ∈ A, which leads to conversion. Given this path, how does one update the parameters (α_{sa}, β_{sa}) and (α_{s′a′}, β_{s′a′})? Does one increase both α_{sa} and α_{s′a′} by 1? As another example, suppose the consumer path is as follows:

(s, a) → (s′, a′) → q.

Given the consumer quit after visiting (s, a) and (s′, a′), does one increase both β_{sa} and β_{s′a′} by 1? But what if taking action a at state s was optimal and it was the action a′ at state s′ that was the "culprit"? Should (s, a) be still "penalized"? The challenge here is to identify the "contributions" of various state-specific actions to the final outcome (consumer converting or quitting).

To overcome this challenge, MFABL adopts the following strategy (lines 9 to 13 in Algorithm 6). Suppose taking action a ∈ A at state s ∈ S transitions the consumer to state s′ ∈ S+. Intuitively speaking, if the eventual conversion probability from state s′ is "high", the state-action pair (s, a) should be "positively reinforced". For example, in the extreme case when s′ = c, we should increase α_{sa} by 1. On the other hand, if the eventual conversion probability from state s′ is "low", the state-action pair (s, a) should be "negatively reinforced". For example, in the extreme case when s′ = q, we should increase β_{sa} by 1. However, the eventual conversion probability from state s′ is not necessarily known in advance and MFABL maintains a belief Q(s′, a′) =_d Beta(α_{s′a′}, β_{s′a′}) for each a′ ∈ A. To operationalize such positive / negative reinforcement, MFABL generates a binary "feedback" f_{s′} from state s′ as follows:

\[
f_{s'} \sim \text{Bernoulli}\left( \max_{a' \in A} \frac{\alpha_{s'a'}}{\alpha_{s'a'} + \beta_{s'a'}} \right). \tag{3.2}
\]

If the feedback equals 1, MFABL positively reinforces (s, a) by increasing α_{sa} by 1. On the other hand, if the feedback equals 0, MFABL negatively reinforces (s, a) by increasing β_{sa} by 1. Note that the term α_{s′a′} / (α_{s′a′} + β_{s′a′}) in (3.2) equals the expected value of the belief Q(s′, a′). Hence, the feedback in MFABL is generated using the action with the highest expected value at state s′. We note that MFABL recovers Thompson sampling when the conversion funnel is simplified to a bandit.

Despite aesthetic similarities, it is important to note that there are fundamental differences between Thompson sampling for the bandit (Algorithm 5) and MFABL for the conversion funnel (Algorithm 6). In particular, the Beta belief maintained in Thompson sampling is exact whereas the Beta belief maintained in MFABL is approximate. This is because the parameter update in Thompson sampling obeys Bayes' theorem (via Beta-Bernoulli conjugacy) whereas the update in MFABL does not necessarily do so. In fact, it is not clear if one can maintain an exact model-free belief in the conversion funnel model in a tractable manner. Accordingly, the belief we maintain in MFABL can be seen as an approximation to the true posterior. A natural question to ask is whether our approximate belief converges to Q∗, which we will explore in Section 3.4 (and prove that it does). We note that there do exist ideas such as Dearden et al. 1998, Osband et al. 2019, and Ryzhov et al. 2019 that maintain a model-free belief over the value of each state-action pair in an MDP.

However, none of these ideas exploit the terminal reward structure of the conversion funnel and hence, do not maintain a Beta belief. Instead, the belief structures they maintain are rather complicated and their update policy is significantly different than the simple "Beta-Bernoulli" type update in MFABL. Furthermore, their action selection can be quite different than MFABL's. For example, Ryzhov et al. 2019 use a knowledge-gradient technique for action selection, which, in fact, requires one to estimate the transition probabilities P, making the overall algorithm model-based. It is also worth mentioning the recent work of Jin et al. 2018 and Dong et al. 2019, which attempt to solve similar problems in a model-free manner, but by using upper confidence bound (UCB) algorithms. We also briefly mention an alternative line of work (Strens 2000; Osband et al. 2013; Agrawal and Jia 2017) that attempts to solve similar problems but by maintaining a model-based belief over the underlying MDP. However, such model-based algorithms require one to optimize an entire MDP in each "iteration", which poses a higher computational burden than the model-free approaches, especially when the number of states S is large. Furthermore, since model-based approaches store belief over the transition probabilities P, their storage requirement is O(S^2 A), compared with O(SA) of MFABL. In our numerical study (Section 3.6), we will benchmark MFABL with such state-of-the-art algorithms.

3.4 Properties of MFABL

We now discuss the theoretical properties (Section 3.4.1) and the practical appeal (Section 3.4.2) of MFABL. In this section, we hope to convince the reader that in addition to exhibiting desirable theoretical properties, MFABL is a practical algorithm, mainly due to the simplicity it inherits from Thompson sampling.

3.4.1 Theoretical Properties

As we noted earlier, the belief we maintain in MFABL is an approximation to the true posterior and a natural question to ask is whether our approximate belief converges to Q∗. The following theorem provides a positive answer to this question. A formal proof can be found in Appendix C.1 and we provide the high-level intuition in the proof sketch.

Theorem 3.1 (Asymptotic convergence). Denote by Q_N the output of Algorithm 6 with N consumers. Then,

Q_N → Q∗ with probability 1 as N → ∞.

Proof sketch. The key idea here is to analyze how the belief in MFABL evolves in expectation.

Denote by Q_i := [Q_i(s, a)]_{(s,a)∈S×A} the belief in MFABL at the start of iteration i ∈ {1, 2, ...}, where Q_i(s, a) =_d Beta(α_i(s, a), β_i(s, a)) for all (s, a) ∈ S × A. (By an "iteration", we refer to a (t, n) pair in Algorithm 6.) We use Q̄_i := [Q̄_i(s, a)]_{(s,a)∈S×A} to denote the expected value of Q_i. We first show how the MFABL update in the Q-space translates to an update in the Q̄-space. Second, we establish that the update process in the Q̄-space is an asynchronous stochastic approximation scheme to the Bellman optimality equations corresponding to Q∗. Third, we leverage the stochastic approximation theory to prove that Q̄_i converges to Q∗. Finally, we show that the variance of the Beta belief Q_i goes to zero and hence, Q_i converges to Q∗. □

Theorem 3.1 establishes that our belief on the "value" of each state-action pair converges to the optimal value. Combined with how MFABL picks an action (lines 6 and 7 of Algorithm 6), an immediate consequence is that, asymptotically, MFABL picks an optimal action with probability at least 1 − ε. This establishes the asymptotic optimality of MFABL with high probability. In other words, as the firm interacts more and more with the consumers, MFABL learns the optimal personalized interventions. Note that this result holds irrespective of the prior counts supplied to Algorithm 6 as an input.

3.4.2 Practical Considerations

Despite the generality of our conversion funnel model, we wish to propose a solution that is both supported by theory and practically appealing. Having stated the theoretical properties of MFABL, we now discuss some properties of MFABL that highlight its desirability from a practitioner's point-of-view. In this direction, we first touch upon the scalability of MFABL. We then mention the ability of MFABL to enable a practitioner to encode prior information. Finally, given the real-life importance of issues such as concept shift and covariate shift (Simester et al. 2020), we discuss how MFABL handles them.

Scalability

Here, we discuss the tractability of MFABL (storage and computational requirements). Of course, tractability affects the scalability of a solution, which is critical when interacting with a large number of consumers on a daily basis. It is easy to see that MFABL only stores approximately 2SA parameters5 {α_{sa}, β_{sa}}_{(s,a)∈S×A}. In terms of computational cost, taking an intervention decision for a consumer at a given time requires MFABL to generate a sample from A + 1 Beta distributions and pick the maximum among them (lines 6 and 7 of Algorithm 6), which can be done in real-time. We contrast such tractability with model-based approaches (Strens 2000; Osband et al. 2013; Agrawal and Jia 2017), which store O(S^2 A) parameters and require solving an MDP in each "iteration", which can be computationally demanding, especially with large state and action spaces.

Prior Information

Since a firm might have some prior information regarding the consumer behavior, it seems desirable to allow the firm to use such information when sending personalized interventions to consumers. For instance, using third-party data or its experience with an earlier version of the product, the firm might know that the consumers who are in a certain state are highly likely to convert if they are shown a specific intervention. MFABL allows the firm to encode such state-action specific prior knowledge via the prior counts on the "value" of each state-action pair. We will explore the value of such prior information in Section 3.7. It is worth mentioning that even if

5To be exact, MFABL stores 2(S + 2)(A + 1) parameters (accounting for the two absorption states and the action of doing nothing).

the firm provides an incorrect prior, the convergence of MFABL to optimality (Theorem 3.1) still holds.

Concept Shift

We now shift our attention to one of the key data challenges a practitioner faces, which has been highlighted in the recent work of Simester et al. 2020: concept shift. In the words of Simester et al. 2020,

“If the underlying response function is different in the training data than in the im- plementation setting in ways that are not captured by the predictor variables, this will generally result in suboptimal targeting policies. The risk of this is high if the target- ing policy is implemented several months after the training data is collected. In the intervening period, changes in the environment through shifts in macroeconomic con- ditions, competitors’ actions, the firm’s other marketing activities, or just seasonality can all contribute to changes in how customers respond to the firm’s actions.”

Mapping to our conversion funnel model, concept shift refers to the possibility of the consumer behavior (transition probabilities) changing over time. Consider a two-phase setting such that the transition probabilities equal P_1 in phase 1 and change to P_2 in phase 2, with the underlying state space S and action space A held constant. The initial state probabilities can change too. The firm does not know a priori either P_1 or P_2. Nor does the firm know when the phase will shift. As discussed in Simester et al. 2020, an algorithm that estimates the model using data from phase 1 but uses the estimated model to make decisions in phase 2 will generally result in suboptimal policies. Fortunately, given its learning nature, MFABL does not suffer from such a limitation. In particular, under such a two-phase setting, MFABL will initially make progress towards learning

the Q∗ corresponding to P_1, say Q∗_1. As soon as there is a phase shift, the exploratory nature of MFABL will force it to learn the Q∗ corresponding to P_2, say Q∗_2. One way to see this is that when phase 2 begins, the status-quo of MFABL (the belief at the end of phase 1) can be seen as

the prior before phase 2 begins. Since Theorem 3.1 holds for all values of prior counts, it follows

that MFABL will automatically detect the phase shift and start learning Q∗_2. Of course, the speed at which MFABL moves towards Q∗_2 will depend on the strength of the belief at the end of phase 1 and the difference between the two phases. For instance, if the two phases are very different and phase 1 induces a strong belief, then MFABL can take longer to learn phase 2 dynamics as compared to a setting in which MFABL starts "fresh" (weak prior) in phase 2. Note that the same argument holds for a multi-phase setting and we will explore concept shift in more depth in Section 3.7.

Covariate Shift

As discussed in Simester et al. 2020, another critical data challenge in real-life is that of covariate shift. In the words of Simester et al. 2020,

“Most methods assume that the distribution of the targeting variables in the training data will be representative of the implementation data, but this is not always the case. For example, if the targeting method is implemented in different geographic regions than the regions in which the training data was gathered, the characteristics of the targeting pool may differ from the implementation pool.”

Mapping to our conversion funnel model, covariate shift can refer to two possibilities: (1) encountering a state s ∈ S that the firm has not encountered before and (2) encountering a consumer with features that the firm has not encountered before. We will defer the discussion of possibility 2 to Section 3.5.3 (Remark 3.2 in particular) since we introduce consumer features then. In terms of the first possibility, under MFABL, if a state-action pair (s, a) ∈ S × A is unexplored, then the variance in the belief Q(s, a) will be as it was at initialization. Naturally, such variance affects the action selection in MFABL (since MFABL picks the action via sampling). Accordingly, MFABL accommodates covariate shift by implicitly accounting for the high variance in the parts of the state space that are unexplored. We will illustrate this in Section 3.7.

3.5 Model Extensions

In this section, we show how the conversion funnel model presented in Section 3.2 can be extended to capture a broader class of real-life settings. We discuss three extensions. In extension 1 (Section 3.5.1), we show how the conversion funnel for a single product can be generalized to accommodate multiple products. In extension 2 (Section 3.5.2), we allow for the actions to incur a non-negative cost (as opposed to zero-cost actions). In extension 3 (Section 3.5.3), we show how the conversion funnel can be used to model consumer behavior if the firm has access to exogenous consumer features such as age, sex, and location. For each extension, we also discuss the corresponding changes needed in the MFABL algorithm to ensure the validity of the properties discussed in Section 3.4. Note that we always start with the “base” conversion funnel (the one from Section 3.2) and make one change at a time (either multiple products or actions with costs or consumer features). However, we do so only to keep the presentation simple and note that one can come up with a “hybrid” model and a corresponding MFABL algorithm with the same properties as in Section 3.4.

3.5.1 Multiple Products

We now show how the conversion funnel for a single product can be generalized to accommodate multiple products. Suppose there are M ∈ N underlying products. The partial state space S remains the same as in the base conversion funnel but now, there are M + 1 absorbing

states {c_1, ..., c_M, q} with S+ := S ∪ {c_1, ..., c_M, q}. State c_m refers to conversion to product m ∈ {1, ..., M}. The action space and the timeline remain the same as before and so do the transition probabilities along with the absorption assumption (consumer eventually converts to some product or quits). Initial state probabilities remain unchanged. The firm earns an immediate reward of r_m ∈ [0, ∞) when a consumer buys product m ∈ {1, ..., M}, i.e., transitions from a state in S to the state c_m. All other transitions result in an immediate reward of 0. As before, we normalize the

rewards wlog so that the maximum reward equals 1, i.e.,

\[
r_m \leftarrow \frac{r_m}{\max_{m' \in \{1, \ldots, M\}} r_{m'}} \qquad \forall m \in \{1, \ldots, M\}.
\]

This is wlog since dividing by the constant $\max_{m' \in \{1, \ldots, M\}} r_{m'}$ does not change the objective of maximizing expected reward. As before, the reward exhibits a terminal nature. The corresponding preliminaries (value function, optimal policy, etc.) can be defined as before and we note that the

Q-value of each state-action still lies in [0, 1] due to the normalization and the terminal nature of reward. Furthermore, the conversion funnel optimization problem can be defined as before with the objective of maximizing $\frac{1}{N} \sum_{n=1}^{N} E[y_n]$, where for all n = 1, ..., N,

  n 0 if consumer does not convert yn = r if consumer n converts to product m ∈ {1,..., M}.  m 

To be concise, the corresponding algorithm, which generalizes MFABL to a multiple products setting, is presented in Appendix C.2.1. We note that all the properties as discussed in Section 3.4 apply to this generalization of MFABL.

3.5.2 Actions with Costs

We now allow for the actions to incur a cost to the firm. Everything except the reward structure remains the same as in the base conversion funnel (Section 3.2). The terminal reward the firm reaps when a consumer converts remains the same, i.e., r ∈ [0, ∞). However, we now introduce immediate costs. In particular, the firm incurs a (possibly random) cost c_{sa} ∈ (−∞, 0] for taking action a ∈ A when the consumer is in state s ∈ S. (If the cost c_{sa} is random, we assume its variance to be finite as in Tsitsiklis 1994.) Given that the firm employs policy π, state-value function V^π(s) no longer equals the eventual

conversion probability (due to immediate costs) from state s under policy π. It satisfies

\[
V^{\pi}(s) = \sum_{a \in A} \pi_{sa} E[c_{sa}] + \sum_{a \in A} \pi_{sa} \left\{ \sum_{s' \in S} p_{sas'} V^{\pi}(s') + p_{sac} \, r \right\} \qquad \forall s \in S.
\]

The action-value function corresponding to policy π is defined as

\[
Q^{\pi}(s, a) := E[c_{sa}] + \sum_{s' \in S} p_{sas'} V^{\pi}(s') + p_{sac} \, r \qquad \forall (s, a) \in S \times A.
\]

Optimal policy and the corresponding value functions can be defined in a similar fashion to before. The key observation here is that the Q-value of a state-action pair no longer represents a probability and can be outside [0, 1] even if we normalize the costs and reward (due to the non-terminal nature of reward). Hence, assigning a Beta belief to the value of a state-action pair seems inappropriate. In this setting, we modify the MFABL algorithm such that it maintains Gaussian beliefs on the values of various state-action pairs. The modification is presented in Appendix C.2.2. We note that all the properties as discussed in Section 3.4 apply to this modification of MFABL.

3.5.3 Consumer Features

We now show how the conversion funnel can be used to model consumer behavior if the firm has access to exogenous consumer features such as age, sex, and location. By “exogenous”, we mean that such features do not change as a function of the action the firm takes. We denote the features of a consumer by x ∈ X, where X denotes the features space. For example, x = (young, female) denotes a consumer whose age category is “young” and sex is “female”. The underlying feature space might have age ∈ {young, middle, old} and sex ∈ {male, female} with X containing all 6 possible combinations of the (age, sex) pair. To accommodate such consumer features, we can simply define a new state space S × X where the new state corresponds to (s, x). Doing so allows for the possibility of the consumer behavior to be a function of both s ∈ S and x ∈

X. In particular, the transition probabilities are denoted by p_{(s,x),a,(s′,x)} for all (s, a, s′) ∈ S × A × S+ and x ∈ X. Everything else (including the MFABL algorithm) remains the same as before with

the understanding that the state captures both the endogenous component s ∈ S and the exogenous component x ∈ X, i.e., s ← (s, x) with S ← S × X. All properties as discussed in Section 3.4 apply trivially.
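The augmentation itself is mechanical. The snippet below is a minimal illustration of building the augmented state space S × X; the specific feature values are hypothetical.

from itertools import product

# Minimal sketch (illustrative only): augmenting the endogenous state with
# exogenous consumer features, i.e., redefining the state as (s, x).
S = [1, 2, 3, 4]                                    # endogenous funnel states
X = list(product(["young", "middle", "old"],        # hypothetical feature space X
                 ["male", "female"]))
augmented_states = [(s, x) for s in S for x in X]   # new state space S x X
print(len(augmented_states))                        # 4 * 6 = 24 states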

Remark 3.1 (High-dimensional features). The above definition of state space (S ← S × X) can potentially result in an extremely large number of states, especially if the firm has "fine-grained" data on consumer features. However, as discussed in Section 3.1, when a consumer journey begins (for instance, consumer providing her email to Netflix), the firm usually has very limited data on the consumer. Accordingly, such settings can be modeled tractably. Nonetheless, in applications other than email campaign management, the firm might have access to lots of data on consumer features. Even though our model-free approach is quite scalable (as discussed in Section 3.4.2), we reckon a high-dimensional feature space can lead to tractability concerns. Accordingly, it is of interest to extend our ideas to such high-dimensional settings. We mention in passing that there exists work for such "contextual" settings but when the underlying model is myopic (Li et al. 2010; Agrawal and Goyal 2013; Tang et al. 2013). Extending such ideas to non-myopic models (such as our conversion funnel) in a model-free manner remains an open question. We refer the reader to the relatively recent work of Hallak et al. 2015 for a model that accounts for contextual information in a non-myopic manner. To the best of our knowledge, there do not exist any model-free approaches that solve the optimization problem in Hallak et al. 2015.

Remark 3.2 (Covariate shift). As mentioned in Section 3.4.2, we discuss here covariate shift in the presence of consumer features, which refers to the firm encountering a consumer with features previously not encountered. Similar to our discussion in Section 3.4.2, if (s, x, a) ∈ S × X × A is unexplored, then the variance in the corresponding belief will be as it was at initialization and MFABL will account for it in its action selection.

3.6 Numerical Experiments

We now shift our focus to numerically evaluating the proposed algorithm (MFABL). To do so, we benchmark its performance on a large-scale real-world dataset against some existing algorithms. In Section 3.6.1, we provide details on the real-world dataset we use. In Section 3.6.2, we define a possible state space and an action space of the conversion funnel specific to our data and discuss its estimation. Note that we estimate the conversion funnel (in particular, its transition probabilities) to enable us to simulate consumer behavior from a "ground truth" model. None of the decision-making algorithms (including MFABL) will have access to the "ground truth" but only observe data sampled from it. We briefly discuss the benchmark algorithms in Section 3.6.3. In Section 3.6.4, we compare the performance of the various algorithms using the estimated "ground truth" model to simulate consumer behavior, followed by a sensitivity analysis in Section 3.6.5. We note that our numerics are for illustrative purposes and in particular, to see how various decision-making algorithms differ in terms of the number of conversions (given an underlying conversion funnel model). We admit that the ultimate test would be to perform an experiment in real-life; however, obtaining such decision-making control is challenging, as noted by Bertsimas and Mersereau 2007:

“A second challenge in studying adaptive experimentation in marketing contexts is in testing the methodologies. Testing an adaptive learning model requires not just data but also decision-making control in a real-world system. While examples of successful field testing exist (e.g., Simester et al. 2006), it is understandable why marketers have been reluctant to cede this control to academic researchers.”

3.6.1 Dataset

Our real-world data is for a software product promoted and sold on the Internet by a Fortune 500 firm. Firm details and microscopic data statistics are not disclosed for anonymity reasons. The dataset consists of a couple million consumer paths, out of which a few tens of thousands of consumers made a purchase. Each path corresponds to a unique consumer and starts with the consumer creating an account on the firm's website (sign-up). Once a consumer signs up, the firm sends her a sequence of emails. The firm classifies these emails into four categories: (1) awareness email, (2) promotional email, (3) ROI email, and (4) type 4 email6. For each type of email, the firm knows the time it was sent to a specific consumer and the time at which the consumer opened the email (if at all). In addition, for type 1 and 3 emails, the firm can track the time at which the consumer clicked on an embedded link in the email (if at all). From the date of the sign-up, the data tracks a consumer for 8 months to give her enough time to make a decision7. After discussions with the firm, certain consumer paths were removed. These include paths with over 100 "touchpoints" (every firm and consumer activity is called a touchpoint) with the suspicion of bots. Illogical paths (for example, a consumer opening an email before receiving it) were also discarded. The frequency of such paths was low (less than 1%).

Some high-level statistics of the dataset are as follows. A total of over 20 million emails were sent, with an average of around 5 emails per consumer. Note that a consumer can receive the same type of email multiple times. Type 3 email ("ROI") was the most prevalent in the dataset and each type of email was sent at least a few hundred thousand times. In terms of the frequency of emails, the average time between two successive emails to a consumer was around 7 days. The average length of paths that ended up in a purchase (conversion) was approximately 75 days but most conversions happened within the first couple of weeks after sign-up. The median was slightly below 50 days. In Figure 3.2, we illustrate the consumer journey as a function of her interactions with the firm. Out of all the consumers who sign-up, around 65% received at least one email, around 30% opened at least one email, around 2% clicked on at least one link, and around 1.5% converted. Such a progression is not unexpected and it resembles the well-known "funnel" shape in marketing. The eventual conversion probability of a consumer who received at least one email is around 2%. It increases to around 3% for a consumer who opens at least one email and to around 7% for a consumer who clicks on at least one link. Hence, as a consumer goes "deeper" in the funnel, she is more likely to convert.

⁶ Actual name of “type 4” email not disclosed for anonymity reasons.
⁷ Our data suffers from common issues; for example, we lose track of a consumer if she clears cookies from her web browser.

[Figure 3.2 here. Funnel stages (share of consumers, eventual conversion probability): Sign-up (100%, 1.5%); Email received (65%, 2%); Email opened (30%, 3%); Email clicked (2%, 7%); Purchase (1.5%).]

Figure 3.2: Consumer journey as a function of her interactions with the firm. On average, out of 100 consumers who sign-up, around 65, 30, and 2 receive, open, and click an email, respectively. Around 1.5 convert. As a consumer interacts more with the firm, her eventual conversion probability increases from 1.5% to 7%.

3.6.2 Conversion Funnel and Estimation

Motivated by the dataset, we now postulate a conversion funnel M. In particular, we define the underlying components of M, i.e., action space A, state space S, initial state probabilities λ, reward r, and transition probabilities P. We note that the funnel we propose below is one possibility out of many; in practice, the construction of the funnel will depend very much on the context and on the granularity of data the firm can observe. Constructing a context-specific state space is an active area of research and we refer the reader to Hallak et al. 2013 and references therein. We emphasize that our framework (both model and algorithm) is general enough to capture an arbitrary (finite) state space. Our goal is to work with a relatively straightforward yet practical state space and compare various decision-making algorithms using it. As we will see in our results (Section 3.6.4), even a simple setup can be useful in understanding shortcomings of various algorithms, especially the ones that are either myopic or estimation-based.

The action space consists of the 4 types of emails (a = 1, 2, 3, 4) along with the action of sending no email (a = 0), i.e., A = {0, 1, 2, 3, 4}. The state tracks the stage of the consumer in the funnel, i.e., her interaction history with the firm as illustrated in Figure 3.2. In particular, the interaction history can be classified into the following 5 stages:

1. Unaware: Consumer has signed up but not received any email yet.

2. Aware: Consumer has received at least one email but not opened / clicked on any email.

3. Interest: Consumer has opened at least one email but not clicked.

4. Desire: Consumer has clicked on a link in at least one email but not made a purchase.

5. Conversion: Consumer has made a purchase.

Accordingly, S+ = {1, 2, 3, 4} ∪ {q, c}, where “1” denotes stage 1 (unaware) and so on. Such a state space is motivated by the widespread literature on conversion funnel models (Strong 1925; Howard and Sheth 1969; Barry 1987; Bettman et al. 1998; Court 2009; Elzinga et al. 2009; Kotler and Armstrong 2010; Mulpuru 2011; Jansen and Schuster 2011; Bruce et al. 2012), with the key difference being that we assume the state to be observable (given our path-level data on each consumer). Our state space allows us to capture consumer behavior in a dynamic fashion and perform interventions at a personalized level. For example, a consumer who is in the initial stages of the funnel might engage with “broader” messages that help her learn about the firm / product, whereas a consumer who has already shown interest (by clicking, for instance) might be persuaded by “call to action” type interventions. Having the ability to encode such information in the consumer state is useful as it allows the decision-making algorithm to personalize appropriately. For simplicity, we do not explicitly model time, with the understanding that the frequency of decision-making is fixed, once every week for example. Note that since A includes no email (a = 0), this does not mean an email will be sent every week. Furthermore, we note that it is possible to capture the time dimension as part of the state space. For example, the state can track the “number of days since last email”.

In terms of the initial state probability vector λ = [λ_s]_{s∈S}, since each consumer path starts with a sign-up, the initial state is always s = 1. Hence, λ1 = 1 and λ_s = 0 for all s ∈ S \ {1}. As discussed in Section 3.2, we normalize the terminal reward r to 1 wlog. The only component of the conversion funnel M that remains to be discussed is the consumer transition behavior, which is governed by the transition probabilities P = [p_{sas'}]_{(s,a,s')∈S×A×S+}, and we discuss it now. Recall that p_{sas'} ∈ [0, 1] denotes the probability that a consumer transitions from state s ∈ S to state s' ∈ S+ given the firm takes action a ∈ A. Given our definitions of action space and state space as above, the physical meaning of a transition is straightforward. By construction of our state space, the only transitions allowed are in the “forward” direction, i.e., a consumer can either remain at her state or go “deeper” in the funnel, in addition to either quitting or converting. Formally, from s ∈ S, a consumer can transition to s' ≥ s or s' ∈ {q, c}. This is an implication of the way we constructed the state space in this section and not a necessity of our framework, for which Assumption 3.1 is sufficient. Given the dataset we described in Section 3.6.1, it should be apparent that the state space defined above is observable to the firm at a consumer level. All that remains before benchmarking MFABL with existing algorithms is to estimate the transition probabilities P. As discussed at the beginning of Section 3.6, this will give us a handle on a “ground truth” model, allowing us to simulate consumer behavior and do an “in vitro” comparison of the various decision-making algorithms. We emphasize that none of the decision-making algorithms will have access to the “ground truth” and will only observe paths sampled from it. Furthermore, we note that for model-free algorithms such as MFABL, this step of estimating P will not be required in practice since the firm will have access to the real-world paths generated as a function of MFABL. We follow Simester et al. 2006 and estimate the transition probabilities by taking the ratio of counts in the data, i.e., we estimate p_{sas'} for (s, a, s') ∈ S × A × S+ as the ratio of the number of times taking action a at state s resulted in a consumer transitioning to state s', and the number of times action a was taken at state s. We refer the reader to Simester et al. 2006, which discusses multiple advantages of using such a non-parametric approach. Note that we do not explicitly observe the consumer quitting in the dataset and we use the following heuristic to classify a transition to the quit state q: if there is no consumer activity (opening / clicking / purchasing) in the future, we mark the next state as q. (Receiving an email is not a consumer activity but an intervention by the firm.) Furthermore, since we do not observe the action of not sending an email (a = 0), we simply insert a no-email action whenever there is a gap of 24 hours since the previous email. We acknowledge that our simple estimation procedure could potentially be improved. However, our primary focus is on decision-making algorithms and we emphasize that model-free algorithms such as MFABL do not require one to estimate P in practice. If anything, this highlights an advantage of model-free approaches. Irrespective, in Section 3.6.5, we show that our results are robust to changes in the estimate of P. We found the estimates of the funnel to be in alignment with our intuition. For example, averaged over s ∈ S, the one-step conversion probability corresponding to the no email action was the lowest among all actions whereas the one-step quit probability was the highest. The one-step conversion probabilities were generally very low (less than 0.1%) and the self-loop probabilities (i.e., the consumer not changing her state) were high (over 90%). Given a consumer in state 1, the average email open rate was around 20% and the click rate was around 2%. Such “slow-moving” traffic seems consistent with the data in the email marketing industry (CampaignMonitor 2020). Finally, we note that our estimated funnel satisfies Assumption 3.1 as we found the sum of the one-step conversion and one-step quit probabilities to be strictly positive for each state-action pair.
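To make the ratio-of-counts estimator concrete, the following is a minimal sketch (not the implementation used in the thesis), assuming states and actions are encoded as small integers (stages 0–3, quit = 4, conversion = 5; emails a = 0, ..., 4); the array shapes and names are illustrative.

```python
# A minimal sketch of the ratio-of-counts estimator described above, assuming
# integer-encoded states/actions: stages 0-3, quit = 4, conversion = 5, a = 0,...,4.
import numpy as np

n_stages, n_states_plus, n_actions = 4, 6, 5

def estimate_P(transitions):
    """transitions: iterable of (s, a, s_next) triples observed in the paths."""
    counts = np.zeros((n_stages, n_actions, n_states_plus))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Ratio of counts; (s, a) pairs never observed are left at zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Example: three observed transitions from stage 0 under action 1.
P_hat = estimate_P([(0, 1, 1), (0, 1, 0), (0, 1, 4)])
print(P_hat[0, 1])  # [1/3, 1/3, 0, 0, 1/3, 0]
```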

3.6.3 Benchmark Algorithms

Before presenting our results, we discuss the benchmark algorithms we use. These cover all four possibilities presented in Table 3.1 and we show such categorization in Table 3.2. For each of the algorithms, we assume N sequential consumers (as not all benchmarks can be defined over parallel consumers). To the best of our knowledge, this work is the first to evaluate state-of-the-art non-myopic learning algorithms (e.g. QL-UCB and PSRL) for a marketing application.

Random policy. Given a state s ∈ S, the random policy samples an action uniformly at random from the action set A. We use it to establish a lower bound on the performance.

Myopic estimate-then-optimize (mETO). mETO works in two phases: exploration and exploitation. For the first N1 ≤ N consumers, it employs the random policy (“exploration”). At the end of phase 1, mETO uses the collected data to estimate the one-step conversion probability p_{sac} for each (s, a) ∈ S × A. To do so, it simply takes the ratio of the counts. In phase 2 (consumers N1 + 1 to N), it uses the estimates to play a myopic policy, i.e., at state s ∈ S, it plays action arg max_{a∈A} p̂_{sac}, where p̂_{sac} denotes the estimate of p_{sac} (“exploitation”). N1 is a parameter that needs to be tuned.
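As a minimal illustration of the two phases, here is a hedged sketch of mETO; the counters and function names (conv_counts, visit_counts, meto_action, meto_update) are our own, and the one-step conversion estimate is simply the ratio of counts described above.

```python
# A hedged sketch of the explore-then-exploit structure of mETO.
import numpy as np

n_stages, n_actions = 4, 5
rng = np.random.default_rng(0)

conv_counts = np.zeros((n_stages, n_actions))   # one-step conversions observed
visit_counts = np.zeros((n_stages, n_actions))  # times (s, a) was played

def meto_action(s, consumer_index, N1):
    if consumer_index <= N1:                    # phase 1: random exploration
        return int(rng.integers(n_actions))
    p_hat = np.divide(conv_counts[s], visit_counts[s],
                      out=np.zeros(n_actions), where=visit_counts[s] > 0)
    return int(np.argmax(p_hat))                # phase 2: myopic exploitation

def meto_update(s, a, converted_in_one_step):
    visit_counts[s, a] += 1
    conv_counts[s, a] += converted_in_one_step
```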

(Myopic) Thompson sampling (TS). TS is a Bayesian version of mETO. Instead of explicitly dividing into two phases, it maintains a Beta belief over the one-step conversion probability of each state-action pair, i.e., Beta(α_{sa}, β_{sa}). When a consumer is in state s ∈ S, TS generates a sample from Beta(α_{sa}, β_{sa}) for all a ∈ A and plays the action with the highest sample value. If the consumer transitions to the conversion state (in one step), TS increases α_{sa} of the corresponding action by 1. Else, it increases β_{sa} of the corresponding action by 1. At initialization, α_{sa} and β_{sa} are set to 1 for all (s, a) ∈ S × A. There are no additional tuning parameters.
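A minimal sketch of this Beta-Bernoulli bookkeeping is below; the array shapes and names are our own, and the update uses only the one-step conversion feedback, exactly as described.

```python
# A minimal sketch of the myopic Thompson sampling benchmark.
import numpy as np

n_stages, n_actions = 4, 5
rng = np.random.default_rng(0)
alpha = np.ones((n_stages, n_actions))
beta = np.ones((n_stages, n_actions))

def ts_action(s):
    samples = rng.beta(alpha[s], beta[s])   # one sample per action
    return int(np.argmax(samples))

def ts_update(s, a, converted_in_one_step):
    if converted_in_one_step:
        alpha[s, a] += 1
    else:
        beta[s, a] += 1
```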

Q-learning with ε-greedy (QL). QL (Watkins and Dayan 1992) is perhaps the most widely used non-myopic learning algorithm. It maintains an estimate of the value of each state-action pair. Denote the estimate by Q(s, a) for all (s, a) ∈ S × A. Given a consumer at state s ∈ S, it plays an action with the highest Q(s, a) over a ∈ A (with ε-greedy exploration). Then, if the consumer transitions to state s' ∈ S+, it updates Q(s, a) as follows:

\[ Q(s, a) \leftarrow (1 - \kappa_{sa}) \, Q(s, a) + \kappa_{sa} \max_{a' \in A} Q(s', a'). \]

The parameter κ_{sa} denotes the stepsize and we set it to 1 / n_{sa}, where n_{sa} denotes the number of visits to (s, a) so far. We initialize Q(s, a) = 0 for all (s, a) ∈ S × A, Q(q, a) = 0, and Q(c, a) = 1 for all a ∈ A. The only remaining tuning parameter is ε.
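The following sketch illustrates the update above with stepsize 1/n_{sa} and the terminal values Q(q, ·) = 0 and Q(c, ·) = 1; the integer encoding of the quit and conversion states is our own assumption.

```python
# A minimal sketch of the QL update with stepsize 1/n_sa and terminal values
# Q(quit, .) = 0 and Q(convert, .) = 1.
import numpy as np

n_stages, n_states_plus, n_actions = 4, 6, 5
QUIT, CONVERT = 4, 5
rng = np.random.default_rng(0)

Q = np.zeros((n_states_plus, n_actions))
Q[CONVERT, :] = 1.0                       # terminal reward of 1 on conversion
n_visits = np.zeros((n_stages, n_actions))

def ql_action(s, eps=0.02):
    if rng.random() < eps:                # epsilon-greedy exploration
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def ql_update(s, a, s_next):
    n_visits[s, a] += 1
    kappa = 1.0 / n_visits[s, a]          # stepsize 1/n_sa
    Q[s, a] = (1 - kappa) * Q[s, a] + kappa * Q[s_next].max()
```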

Q-learning with upper confidence bounds (QL-UCB). QL-UCB (Jin et al. 2018; Dong et al. 2019) is a variant of QL that leverages UCB ideas and is known to exhibit strong theoretical properties. We refer the reader to Jin et al. 2018 and Dong et al. 2019 for its formal description. We use the version presented in Dong et al. 2019 since it allows for a variable horizon (i.e., length of a consumer path) whereas the version in Jin et al. 2018 assumes a deterministic horizon. We note that both Jin et al. 2018 and Dong et al. 2019 assume the presence of a discount factor γ ∈ [0, 1). In our MDP, we do not discount the terminal reward and hence, MFABL implicitly sets γ = 1. For comparison purposes, in the algorithm of Dong et al. 2019, we set γ to be close to 1 (as setting it to 1 results in some of their parameters being undefined). There are 2 other parameters: ε and δ. The physical meaning of δ is that the theoretical guarantee of Dong et al. 2019 holds with probability 1 − δ (hence, δ should be close to 0) and ε quantifies the degree of suboptimality (hence, ε should be close to 0 as well). We initialize Q(s, a) for all (s, a) ∈ S × A as suggested in Dong et al. 2019, Q(q, a) = 0, and Q(c, a) = 1 for all a ∈ A.

Estimate-then-optimize (ETO). ETO is a non-myopic extension of mETO. Similar to mETO, it works in two phases. For the first N1 ≤ N consumers, it employs the random policy. At the end of phase 1, ETO uses the collected data to estimate the entire transition structure P, i.e., p_{sas'} for all (s, a, s') ∈ S × A × S+. To do so, it simply takes the ratio of the counts. At the end of phase 1, it computes an optimal policy of the estimated MDP M̂ ≡ (S, A, P̂, λ, r), where P̂ denotes the estimate of P. This computed policy is used to interact with consumers in phase 2, i.e., consumers N1 + 1 to N. As in mETO, N1 is a parameter. ETO requires solving an MDP only once.
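The “optimize” step amounts to solving the estimated MDP. Since the only reward is the terminal conversion reward, a simple undiscounted value iteration with absorbing values V(c) = 1 and V(q) = 0 suffices; the sketch below assumes the array layout of the earlier estimation sketch and is our own illustration, not the implementation used in the thesis.

```python
# A hedged sketch of undiscounted value iteration on the estimated MDP, with
# absorbing values V(convert) = 1 and V(quit) = 0.
import numpy as np

def solve_mdp(P_hat, n_stages=4, quit=4, convert=5, tol=1e-10):
    V = np.zeros(P_hat.shape[2])
    V[convert] = 1.0
    while True:
        Q = P_hat[:n_stages] @ V            # Q(s, a) = sum_{s'} P_hat[s, a, s'] * V(s')
        V_new = V.copy()
        V_new[:n_stages] = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1), V_new  # greedy policy and value function
        V = V_new
```

ETO would call such a solver once, on the P̂ estimated from the first N1 consumers; PSRL (described next) would call it before every consumer, on a freshly sampled P.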

Posterior sampling for reinforcement learning (PSRL). PSRL (Strens 2000; Osband et al. 2013; Agrawal and Jia 2017) generalizes Thompson sampling in a model-based manner. It maintains a belief over the entire transition probabilities P. In particular, given (s, a) ∈ S × A, it maintains a Dirichlet(α_{sa}) belief over the one-step transition probability vector p_{sa} := [p_{sas'}]_{s'∈S+}, where α_{sa} := [α_{sas'}]_{s'∈S+}. Before each consumer arrives, PSRL generates a sample from Dirichlet(α_{sa}) for all (s, a) ∈ S × A. Denoting the corresponding sampled transition probabilities by P̂, it computes an optimal policy of the MDP M̂ ≡ (S, A, P̂, λ, r) and plays the computed policy. Using the transition data of the form “(s, a, s')” observed in the realized consumer path, it updates the belief over P using Dirichlet-multinomial conjugacy. For example, if taking action a ∈ A at state s ∈ S transitioned the consumer to state s' ∈ S+, PSRL increases α_{sas'} by 1. This updated belief is used to generate a sample of P and re-compute an optimal policy for the next consumer, and so on. Hence, given N consumers, PSRL solves N MDPs. At initialization, α_{sas'} is set to 1 for all (s, a, s') ∈ S × A × S+. There is no additional parameter.

             myopic     non-myopic
estimation   mETO       ETO
learning     TS         QL, QL-UCB, PSRL

Table 3.2: Classification of the benchmark algorithms into four sets we introduced in our literature review.

Table 3.2 categorizes the benchmarks along the two dimensions of interest, namely, myopic / non-myopic and estimate / learn. MFABL, though not shown in Table 3.2, lies in the fourth category: non-myopic and learning-based. In addition to these two dimensions, we highlight a third dimension of model-free versus model-based since that affects the scalability of an algorithm. In particular, all algorithms except ETO and PSRL are model-free since they do not estimate / learn the entire P. ETO and PSRL, on the other hand, are model-based as they attempt to estimate / learn the entire P and compute an optimal policy using such information. Note that benchmarking a model-free approach such as MFABL with model-based approaches such as ETO and PSRL can be seen as “unfair” since model-free approaches are constrained in the type of information they can store. In particular, they do not store information on the transition dynamics of the funnel, which leads to better tractability, but possibly at a loss in performance. To state it differently, the information stored by model-based approaches is a superset of the information stored by model-free approaches since the Q-values can be computed using the transition probabilities P but not vice-versa. In fact, empirical work has suggested that model-based approaches may require fewer samples to learn as compared to model-free ones (Deisenroth and Rasmussen 2011; Schulman et al. 2015). Hence, model-based benchmarks seem too optimistic to begin with but nonetheless, we include them to understand how much performance loss we incur by being model-free.

Finally, we introduce a variant of MFABL that we refer to as MFABL2. There is only one change. In particular, if taking action a ∈ A at state s ∈ S results in a self-loop, i.e., a transition to s itself, then MFABL2 does not do any update. Intuitively, this should speed up the learning of MFABL by discarding redundant transitions.
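The belief update of MFABL itself is specified in Algorithm 6 and is not reproduced here; the only change in MFABL2 is a guard that discards self-loop transitions. A minimal sketch of that guard, with the underlying update passed in as a function:

```python
# MFABL2 differs from MFABL only in skipping self-loop transitions; the actual
# MFABL belief update (Algorithm 6) is assumed to be available as mfabl_update.

def mfabl2_update(s, a, s_next, mfabl_update):
    if s_next == s:          # self-loop: MFABL2 performs no belief update
        return
    mfabl_update(s, a, s_next)
```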

3.6.4 Results and Discussion

We now present our results. We first discuss the setup we used, followed by the parameter values for the various algorithms. Then, we define the two metrics used to compare the algorithms and finally, we showcase and discuss the results. In terms of the setup, we used the estimated MDP from Section 3.6.2 as the “ground truth” funnel. For each algorithm, we let it “interact” with a sequence of N = 100,000 consumers sampled from the ground truth and repeated the entire process for R = 100 runs (each with a different seed) to account for the random evolution. Given algorithm ALG, denote by y^{ALG}_{nr} ∈ {0, 1} the indicator of whether consumer n ∈ {1, ..., N} converted in run r ∈ {1, ..., R}. Given the extensive computational load required, we used over 100 cores on a high-performance computing cluster and parallelized over r ∈ {1, ..., R}.

For estimation-based approaches (mETO and ETO), we set N1 = 2,500 (the number of consumers used to estimate). Given that we are estimating around 50 parameters, this number seems large enough to estimate the transition probabilities of our relatively simple funnel, yet small enough (contrast with N = 100,000) not to “sacrifice” too many consumers. (In fact, as we will see in our sensitivity analysis (Section 3.6.5), N1 = 2,500 is near-optimal.) For ε-greedy, we used ε = 0.01 for MFABL / MFABL2 and ε = 0.02 for QL. We discuss the sensitivity of these four algorithms with respect to the underlying parameters in Section 3.6.5. For QL-UCB, we tried 8 possible combinations in the set {(ε, γ, δ) : ε ∈ {0.01, 0.05}, γ ∈ {0.95, 0.99}, δ ∈ {0.01, 0.05}} and obtained very similar results for each combination. So, we arbitrarily picked (0.01, 0.95, 0.01). None of the other algorithms require any parameter to be set. We use two performance metrics. First, for algorithm ALG, we define the performance ratio as

\[ \mathrm{PR}^{\mathrm{ALG}} = \frac{\text{Expected number of conversions under ALG}}{\text{Expected number of conversions under OPT}}, \]

where OPT denotes an optimal policy corresponding to the ground truth funnel. The denominator equals Nv*, where v* denotes the optimal conversion probability. Since λ1 = 1 in our funnel, v* = V*(1), where V*(s) denotes the optimal value function corresponding to state s ∈ S (recall from Section 3.2). We estimate the performance ratio using the following unbiased Monte-Carlo estimator:

\[ \frac{1}{R} \sum_{r=1}^{R} \frac{\sum_{n=1}^{N} y^{\mathrm{ALG}}_{nr}}{N v^*}. \]

Trivially, PR^{ALG} ∈ [0, 1] and a value closer to 1 denotes better performance. Moreover, we note that the denominator here is “too optimistic” since it assumes knowledge of the ground truth from the beginning (consumer 1). Second, we evaluate algorithms using their expected total regret as a function of N, an unbiased estimator for which is

\[ \frac{1}{R} \sum_{r=1}^{R} \Big( N v^* - \sum_{n=1}^{N} y^{\mathrm{ALG}}_{nr} \Big). \]
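For concreteness, both estimators reduce to simple averages over the R × N array of conversion indicators; a minimal sketch (with illustrative names y and v_star) is:

```python
# A minimal sketch of the two Monte-Carlo estimators, assuming y[r, n] holds the
# conversion indicators and v_star is the optimal conversion probability.
import numpy as np

def performance_ratio(y, v_star):
    R, N = y.shape
    return np.mean(y.sum(axis=1) / (N * v_star))

def expected_total_regret(y, v_star):
    R, N = y.shape
    return np.mean(N * v_star - y.sum(axis=1))
```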

Naturally, a smaller regret is better. Note that our first metric provides a static snapshot of the performance after N consumers whereas the second metric provides a fluid story from consumer 1 to N. We split our presentation of results in two steps. First, we benchmark MFABL with respect to model-free approaches and second, with respect to model-based approaches. Figure 3.3(a) shows the performance ratio of MFABL benchmarked with various model-free algorithms. For the performance ratio, we have shown the unbiased estimate discussed above

(blue dots in the figure) along with the standard deviation over r ∈ {1, ..., R} (blue lines in the figure denote ±1 standard deviation). MFABL2 achieves a performance ratio of around 93%, outperforming all other model-free algorithms. MFABL is the second-best (73%), followed by QL, TS, and mETO (all three around 66%). Surprisingly, QL-UCB is as good as the random policy (45%). In Figure 3.3(b), we plot the corresponding regret as the number of consumers increases from 1 to 100,000. MFABL2 clearly dominates all others. Notice that MFABL2 seems to have “converged” for N as low as 10,000. (Since MFABL2 employs ε-greedy, its regret curve will always have a positive slope.)

[Figure 3.3 here: (a) performance ratio by algorithm; (b) expected total regret versus N.]

Figure 3.3: Benchmarking MFABL / MFABL2 with model-free algorithms.

Before discussing model-based algorithms, we discuss the key shortcoming of myopic approaches such as TS and mETO. Intuitively, myopic approaches “fail” when the long-run value of an action dominates its myopic value (see (3.1)). In our MDP, we expect such instances to occur in earlier stages of the funnel since there might be an intervention that does a good job of pushing the consumer deeper into the funnel but has a lower one-step conversion probability. In fact, this is what we found in our data. In particular, at stage 1, the myopic value (one-step conversion probability) was highest for email type 1 whereas the optimal policy was to use email type 2, which had the third highest myopic value. We observed a similar finding at stage 2. At stage 3, the myopic value of the non-myopic optimal action was close to that of the myopic optimal action and at stage 4, the myopic optimal was the non-myopic optimal. Accordingly, as we move away from the end of the funnel, a myopic policy degrades as it fails to account for the long-run value. This explains the significant difference observed between the performance of myopic approaches and MFABL2 and highlights the importance of capturing long-run value in such settings. We benchmark MFABL2 with model-based approaches (ETO and PSRL) in Figure 3.4. The performance ratio is shown in Figure 3.4(a) and the regret in Figure 3.4(b). All three approaches achieve a performance ratio of over 90%, with PSRL almost close to 1 (98%), followed by MFABL2 (93%) and ETO (92%). Given our earlier discussion surrounding the appropriateness of benchmarking a model-free approach with model-based ones, it is interesting to see that MFABL2 can be comparable to ETO. In fact, as seen in Figure 3.4(b), the performance of ETO seems to degrade (with respect to that of MFABL2) as the number of consumers increases. This observation is supported by the fact that ETO, being an estimation-based approach, stops learning after the initial phase of N1 consumers. It employs the policy computed using the estimate of P from the first N1 consumers, which might be suboptimal (and it is in our numerics, as seen in Figure 3.4(b)!) due to the noise in the estimate of P. In general, this is a key limitation of estimation-based approaches and it is amplified under the presence of concept shift, as we will see in Section 3.7.

[Figure 3.4 here: (a) performance ratio by algorithm; (b) expected total regret versus N.]

Figure 3.4: Benchmarking MFABL2 with model-based algorithms.

PSRL achieves a better performance than MFABL2. Given the near-optimality of PSRL with respect to any learning algorithm (Osband and Van Roy 2017), this is not too surprising. Intuitively, since PSRL is model-based and learns on a continuing basis, it is able to use data much more efficiently. However, this comes at an increased computational cost of solving an MDP in each iteration and storing O(S²A) parameters. We summarize our findings in Figure 3.5, where we plot the various algorithms along two dimensions: (1) performance and (2) scalability. By scalability, as discussed in Section 3.4.2, we refer to both the storage and computational requirements of an algorithm. Among the model-free approaches, MFABL2 dominates since it has a better performance and similar storage and computational requirements. ETO achieves similar performance as MFABL2 but has lower scalability (since it stores the entire P and requires solving the MDP once). PSRL has the best performance but the least scalability. It should be clear that myopic approaches are dominated by non-myopic ones (MFABL2, for instance, delivers a better performance than all myopic approaches at a similar computation cost) and that estimation-based approaches are dominated by learning-based ones (consider MFABL2 versus mETO / ETO).

[Figure 3.5 here: algorithms plotted by performance (y-axis) and scalability (x-axis).]

Figure 3.5: Summarizing the various algorithms across two dimensions: (1) performance and (2) scalability. The scale of the y-axis (performance) is consistent with the performance ratios in Figures 3.3(a) and 3.4(a). The x-axis (scalability) is partitioned into 3 levels as follows. PSRL is least scalable since it stores the entire transition structure and solves an MDP per consumer (i.e., N MDPs across all consumers) whereas ETO stores the entire transition structure but only solves one MDP (across all consumers). All other algorithms are model-free and exhibit similar scalability.

3.6.5 Sensitivity Analysis

We conclude this section with some sensitivity analysis. First, we evaluate the performance ratio of various algorithms by varying the underlying parameters. For ε-greedy approaches (MFABL2 and QL), we vary ε ∈ {0, 0.01, ..., 0.10}. For estimation-based approaches (mETO and ETO), we vary N1 ∈ {100, 1000, 2500, 5000, 10000, 15000, 20000}. For each parameter setting, we use N = 100,000 (number of consumers) and R = 100 (number of runs) as before. Figure 3.6 shows the results corresponding to ε-greedy approaches. The performance ratio of MFABL2 seems to degrade as ε increases. This is not unexpected (especially for large N) as MFABL2 sacrifices approximately ε-optimality by definition. Despite this decreasing pattern, MFABL2 dominates among the model-free approaches. The performance ratio of QL seems to be around 0.66 for all values of ε we experimented with. In Figure 3.7, we plot the sensitivity results corresponding to estimation-based approaches. As can be seen, our choice of N1 = 2,500 reflected almost the best-case scenario for estimation-based approaches since the performance ratio seems to exhibit a unimodal pattern with respect to N1 and 2,500 is near-optimal.

[Figure 3.6 here: (a) performance ratio of MFABL2 versus ε; (b) performance ratio of QL versus ε.]

Figure 3.6: Sensitivity of MFABL2 and QL with respect to ε. (Note that QL with ε = 0 is equivalent to the random policy.)

[Figure 3.7 here: (a) performance ratio of mETO versus N1 (roughly 0.56–0.67); (b) performance ratio of ETO versus N1 (roughly 0.78–0.94).]

Figure 3.7: Sensitivity of mETO and ETO with respect to N1.

Second, to check the sensitivity of our results to changes in the estimate of P, we do the following. We repeat our computations from Section 3.6.4 but for each run r ∈ {1, ..., R} (R = 100), we perturb our estimate of P. In particular, for (s, a, s') ∈ S × A × S+, we add noise to p_{sas'} as follows:

\[ p_{sas'} \leftarrow p_{sas'} \times \big( 1 + \eta^{r}_{sas'} \big), \]

where η^{r}_{sas'} ∼ uniform[−0.25, 0.25]. After adding the noise, we ensure [p_{sas'}]_{s'∈S+} sums to 1 by appropriate normalization. We report the corresponding performance ratios in Figure 3.8 and observe they are almost identical to the ones in Figures 3.3(a) and 3.4(a). This highlights the robustness of our results to changes in P.
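A minimal sketch of this perturbation-and-renormalization step, assuming the array layout of the earlier sketches (the function name and the placeholder P_hat are illustrative):

```python
# A minimal sketch of the robustness-check perturbation of the estimated P.
import numpy as np

def perturb_P(P_hat, rng, width=0.25):
    noise = rng.uniform(-width, width, size=P_hat.shape)   # eta ~ uniform[-0.25, 0.25]
    P_noisy = P_hat * (1.0 + noise)
    row_sums = P_noisy.sum(axis=2, keepdims=True)
    # Renormalize each (s, a) row so it sums to one.
    return np.divide(P_noisy, row_sums, out=np.zeros_like(P_noisy), where=row_sums > 0)

P_hat = np.full((4, 5, 6), 1.0 / 6)                  # placeholder estimate
P_run = perturb_P(P_hat, np.random.default_rng(7))   # one perturbed copy per run
```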

3.7 Value of Prior Information, Concept Shift, and Covariate Shift

In this section, using the estimated funnel from Section 3.6.2, we shed light on some of the practical considerations we discussed in Section 3.4.2. In particular, we first showcase the ability of MFABL to let the firm encode prior information and the value of such information (Section 3.7.1). Second, we demonstrate the robustness of MFABL to concept shift and benchmark it with ETO (Section 3.7.2). Third, we discuss the working of MFABL under covariate shift (Section 3.7.3).

[Figure 3.8 here: performance ratios under the perturbed transition probabilities. (a) Model-free benchmarks: MFABL2 0.93, QL 0.63, TS 0.63, mETO 0.62, Random 0.43, QL-UCB 0.43. (b) Model-based benchmarks: PSRL 0.983, MFABL2 0.932, ETO 0.928.]

Figure 3.8: Performance ratios corresponding to perturbations in the transition probabilities.

3.7.1 Value of Prior Information

As mentioned in Section 3.4.2, a firm might have some prior information regarding the consumer behavior, and it seems desirable to allow the firm to encode such state-action specific information. Our algorithm allows the firm to leverage such prior knowledge via the prior counts on each state-action pair (recall the inputs to Algorithm 6). In this subsection, under a somewhat stylized setup, we illustrate the ease with which such prior information can be encoded and how valuable it can be. In particular, we perform the following experiments using the estimated funnel from Section 3.6.2. In the zeroth experiment, we assume the firm has access to no prior information, whereas in experiment x ∈ {1, 2, 3, 4}, we assume the firm has prior information on stage x and beyond. For example, if x = 3, we assume the firm knows the true values of Q*(s, a) for all s ≥ 3 and a ∈ A and sets the prior counts (α_{sa}, β_{sa}) such that the expected value equals Q*(s, a) and the variance is essentially zero. We reduce the value of N from 100,000 to 10,000 to leave more room for improvement (recall that the performance ratio with N = 100,000 was already 0.93).

[Figure 3.9 here: (a) performance ratio versus the prior information available (no info: 0.75; stage 4 and beyond: 0.79; stage 3 and beyond: 0.85; stage 2 and beyond: 0.94; stage 1 and beyond: 0.97); (b) expected total regret versus N.]

Figure 3.9: Value of prior information corresponding to MFABL2.

The results for MFABL2 (with ε = 0.01 as before) are shown in Figure 3.9. In Figure 3.9(a), we display the performance ratios corresponding to the five experiments. Note that under the “no prior information” setting, the performance ratio is now 0.75. This decrease from 0.93 (when N was set to 100,000) seems intuitive since we are giving the firm fewer consumers to learn from. As the x-axis decreases, the firm has more prior information and hence, the performance gets better. In particular, having information on just the last stage improves the performance ratio from 0.75 to 0.79 whereas having information on the last 2 and 3 stages increases it to 0.85 and 0.94, respectively. Figure 3.9(b) shows the corresponding regret curves.
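One simple way such prior knowledge could be encoded, consistent with the description above, is to pick Beta counts whose mean equals the known value and whose variance is driven to (essentially) zero by a large pseudo-count; the constant c and the helper name below are our own illustration, not necessarily the exact choice used in these experiments.

```python
# A hedged sketch of encoding a known state-action value q as Beta prior counts
# with mean q and variance roughly q * (1 - q) / c.

def prior_counts(q, c=1e6):
    """Return (alpha, beta) with prior mean q and negligible prior variance."""
    return q * c, (1.0 - q) * c

alpha_sa, beta_sa = prior_counts(0.07)  # e.g., a known state-action value of 0.07
```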

3.7.2 Concept Shift

We now discuss the topic of concept shift. As discussed in Section 3.4.2, in our conversion funnel model, concept shift refers to the possibility of the consumer behavior (transition probabilities) changing over time. To understand how our algorithm operates under concept shift, we do the following simple experiment. We set N = 10,000 and split the consumer behavior into two phases. In phase 1, the transition probabilities are P1 (the ones estimated in Section 3.6.2) and in phase 2, they change to P2. To generate P2, we randomly permute the actions so that the optimal conversion probability remains the same. Phase 1 corresponds to the first N1 ≤ N consumers and phase 2 corresponds to consumers N1 + 1 to N. We experiment with N1 ∈ {1000, 2500, 5000} and evaluate MFABL2 (with ε = 0.01 as before) and ETO, where ETO uses the first N1 consumers to estimate (similar to the real-life setup of Simester et al. 2020). As before, we perform R = 100 runs (P2 in run r can be different from P2 in run r' due to the random permutation). The decision-making algorithms do not know that a switch from P1 to P2 will happen. The performance ratios for both MFABL2 and ETO are shown in Figure 3.10. MFABL2 performs better for all values of N1. In fact, on further digging, we found that MFABL2 exhibits a “win-win” behavior in the sense that it has a higher performance ratio in both phases. This makes sense because in phase 1, ETO just plays a random policy whereas MFABL2 makes an attempt to learn. In phase 2, ETO plays the policy computed using phase 1 data (which is likely to be suboptimal) whereas MFABL2 eventually adapts to phase 2 consumers (due to its learning nature).
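A minimal sketch of one way to generate P2 from P1 by relabeling the actions at each state, which leaves the optimal conversion probability unchanged; whether the permutation is drawn per state or globally is our own reading of “randomly permute the actions”.

```python
# A hedged sketch of generating a phase-2 kernel P2 by permuting action labels.
import numpy as np

def permute_actions(P1, rng):
    P2 = P1.copy()
    for s in range(P1.shape[0]):
        perm = rng.permutation(P1.shape[1])  # random relabeling of the actions
        P2[s] = P1[s, perm]
    return P2

P1 = np.full((4, 5, 6), 1.0 / 6)              # placeholder estimated kernel
P2 = permute_actions(P1, np.random.default_rng(3))
```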

[Figure 3.10 here: performance ratios of MFABL2 (between 0.65 and 0.71) and ETO (between 0.43 and 0.50) for N1 ∈ {1000, 2500, 5000}.]

Figure 3.10: Performance ratio for concept shift corresponding to various values of N1 with N = 10,000.

In Figure 3.11, we display the corresponding regret curves. It is clear that MFABL2 incurs a lower regret. Furthermore, on closer examination, one should be able to see kinks corresponding to the concept shift in these curves (especially for MFABL2). Initially, MFABL2 learns the behavior of phase 1 consumers and hence, the regret starts to “decay”. However, as soon as the phase shifts, the regret jumps up but eventually, MFABL2 adapts. We note that MFABL2 detects the phase shift automatically due to its learning nature. As noted in Section 3.4.2, the speed at which our learning algorithm adapts to the phase shift will depend on the strength of the belief at the end of phase 1. To induce a strong belief, we increased N1 to 50,000 with N = 100,000. The corresponding results are shown in Figure 3.12.

[Figure 3.11 here: expected total regret of MFABL2 and ETO versus N for (a) N1 = 1000, (b) N1 = 2500, (c) N1 = 5000.]

Figure 3.11: Regret for concept shift corresponding to various values of N1 with N = 10,000.

(We did not change ε for MFABL2, i.e., ε = 0.01.) The observations from above still hold, showing the robustness of our approach. In fact, we highlight an important trade-off in a setting with a large N1. Due to the large N1, the belief at the end of phase 1 is strong, making the adaptation to phase 2 slower. However, at the same time, our algorithm enjoys a higher performance ratio in phase 1 since it can exploit its learning on more consumers (compared to a setting with a smaller N1). In fact, the average performance ratio of MFABL2 in Figure 3.12(a) (78%) is higher than in Figure 3.10 (65% to 71%).

[Figure 3.12 here: (a) performance ratio (MFABL2 0.78, ETO 0.45); (b) expected total regret versus N.]

Figure 3.12: Illustration of concept shift for N1 = 50,000 with N = 100,000.

3.7.3 Covariate Shift

To understand covariate shift, we perform the following experiment. We use the estimated funnel from Section 3.6.2 and augment the state space to include a second dimension x ∈ {1, 2}, which one can think of as capturing a consumer feature such as age (young / old), sex (male / female), or location (city / rural). So, the new state is represented as (s, x), where s denotes the stage and x denotes the type. For x = 1, we let the transition behavior be the same as in the estimated funnel and for x = 2, we randomly permuted the actions (the specific choice is not important). We sampled N = 10,000 consumers under the following split: a consumer is of type 1 with probability ω1 = 95% and type 2 with probability ω2 = 5%. Hence, we expect to see 9500 type 1 and 500 type 2 consumers. In other words, this dataset is not very representative of type 2 consumers and if, in the future, the firm encounters a type 2 consumer, the information gathered from this dataset might not be enough to make a good decision. Note that the absolute value of N is not important here but the number of type 2 consumers the firm encounters is, i.e., the scale of ω2 with respect to N is important. As noted in Simester et al. 2020, estimation-based approaches struggle in such settings. To illustrate the mechanics of MFABL2, we show in Figure 3.13 the Beta beliefs over the state-action pairs ((s, x), a) for s = 1, x ∈ {1, 2}, and a ∈ A after MFABL2 has interacted with 10,000 consumers (corresponding to just one run). Given the 0.95-0.05 split between type 1 and type 2 consumers, the posterior variance in the beliefs corresponding to x = 1 is quite low compared to that of x = 2. By construction of how MFABL2 picks an action (mimicking Thompson sampling with ε-greedy), such higher posterior variance for x = 2 forces MFABL2 to explore the unexplored actions. So, if in the future, the firm encounters a consumer of type 2, MFABL2 implicitly accounts for the uncertainty present due to the lack of encounters with type 2 consumers in the initial dataset.

[Figure 3.13 here: Beta beliefs on Q((1, x), a) for x ∈ {1, 2} and a ∈ {0, 1, 2, 3, 4}.]

Figure 3.13: Covariate shift illustration.

3.8 Conclusions and Further Research

In this work, we study the problem of optimal sequential personalized interventions from the point of view of a firm promoting a product, under a fairly general model of consumer behavior which we call the conversion funnel. Our state-based conversion funnel model explains the journey of a consumer (at an individual level) from her initial interactions with the firm till the time she makes a decision to purchase or not. The behavior of the consumer is driven by the sequential intervention decisions the firm makes as a function of the consumer state at various points of time. The effect of each intervention on the consumer behavior is allowed to be state-specific. Due to its sequential nature, our model captures both the myopic value and the long-run value of each intervention. A priori, the firm does not know the state-specific effects of the interventions and needs to learn / estimate them in order to make optimal intervention decisions. In that direction, we propose a decision-making algorithm, which we call model-free approximate Bayesian learning. Our algorithm inherits the simplicity of Thompson sampling for a multi-armed bandit in the sense that it maintains an approximate belief on the value of each state-specific intervention and updates this belief (mimicking the Beta-Bernoulli rule in Thompson sampling) as it interacts with the consumers.

We establish the asymptotic optimality of our algorithm and benchmark it with various algorithms on a real-world large-scale dataset. In the numerics, our algorithm dominates the traditional algorithms that either optimize for the myopic reward or are estimation-based. It also outperforms state-of-the-art model-free learning algorithms that are non-myopic. Furthermore, in contrast to the estimation-based approaches, our algorithm is able to adapt automatically to the underlying changes in consumer behavior (concept shift) and maintains a high level of uncertainty on the value of less explored consumer segments (covariate shift). We believe there is immense potential for non-myopic learning-based approaches to personalized marketing and we highlight some interesting research directions now. In our conversion funnel model, consumer behavior is captured using the notion of a state and the firm makes decisions as it observes the consumer state. It should be clear that the out-of-sample performance of any decision-making algorithm that optimizes conditioned on a state space would depend on how well the state space represents reality. Though our framework (both model and algorithm) is general enough to capture an arbitrary (finite) state space, constructing an appropriate state space is an interesting research avenue. We note that the “appropriateness” of a state space would very much vary between firms and industries and constructing such a context-specific state space is an active area of research (Hallak et al. 2013). Our work focuses on a “limited consumer features regime” and it is of interest to extend our framework to a setting where consumers can have high-dimensional features (and the firm can observe such information). As noted in Remark 3.1, works such as Li et al. 2010; Agrawal and Goyal 2013, and Tang et al. 2013 handle high-dimensional consumer features but they optimize for a myopic reward function. There appears to be some progress in modeling high-dimensional contextual information along with long-run rewards (see Hallak et al. 2015). We believe the idea of “value function approximation” from the reinforcement learning literature (Sutton et al. 2000; Sutton and Barto 2018) can be leveraged here, especially if the value function is parameterized using domain expertise from marketing. Such approximations can allow efficient learning across consumers.

From a theoretical perspective, it is important to understand properties such as the regret and convergence rate of our algorithm (or its variants). Given the connection of our algorithm to Q-learning (which we used while proving its asymptotic convergence), recent works such as Wainwright 2019a; Wainwright 2019b; Qu and Wierman 2020, and Li et al. 2020 that establish theoretical guarantees for Q-learning can possibly be leveraged. Furthermore, as touched upon at the beginning of Section 3.6, the ultimate test to evaluate any proposal is to perform a real-life experiment. Though we do not have such decision-making control yet, we hope to follow up with an empirical study. We conclude this chapter with some thoughts on the connection between attribution (Chapter 2) and optimizing actions. The goal in attribution is to compute the return-on-investment of various advertising actions. Therefore, it is natural to ask whether attribution can be utilized to construct an improved advertising policy: an advertising action that is attributed “high” value should be used “more” often. In fact, from our interaction with the industry, such a policy iteration is one of the desired properties for any attribution scheme. However, from a theoretical perspective, the mapping from attribution to prescription is not as straightforward. In the language of learning theory, attribution is a form of “policy evaluation” (given an existing advertising policy) and using attribution to increase the conversion probability corresponds to “policy improvement”. For attribution to serve as an input to a policy improvement algorithm, it needs to point in the direction of the gradient corresponding to the underlying objective function (maximize conversion probability). However, to the best of our knowledge, we are not aware of such a connection. We leave this as an open question and hope to pursue it in the future.

Concluding Remarks

We leveraged the granularity of data available in the new economy to develop application-driven models, leading to better understanding and decision-making in complex systems. In daily fantasy sports, we used data on athletes and opponents to construct strategic portfolios and in online advertising, we used user-level data to understand the value of a targeted ad and propose a decision-making algorithm for sequential personalized marketing. In addition to our thoughts at the end of each chapter, we briefly highlight some broader avenues of research that we are pursuing.

Fairness and pricing dynamics in ridesharing. We are developing models for the ridesharing industry with the goal of improving driver welfare. Though the rise of the gig economy has given drivers flexibility, “fairness” to drivers has been debated. With platforms being able to track local supply and demand conditions, dynamic policies such as turning a driver off under a demand surplus have become the norm. Such profit-maximizing policies often result in high uncertainty for drivers, forcing them to drive without passengers for extended periods and earn low wages. Using data on the temporal preferences of drivers, we are developing models that increase driver welfare without hurting platform profit. In addition, motivated by a real-world dataset on trip price quotes of major platforms, we are developing models to understand the temporal pricing dynamics under competition.

Experimentation in two-sided markets. With access to a user-level dataset corresponding to a music streaming app, we are exploring optimal ways to experiment in two-sided markets. To maximize revenue in such settings, the platform needs to satisfy the two sides: content creators and viewers. Given a newly created content, a key operational decision is how to experiment in order to learn the quality of the content. In order to retain viewers, the platform wants to show “high” quality content to more viewers and discard “low” quality content without exposing it to too many viewers. Using the granular dataset, we are developing models to understand optimal experimentation policies as a function of the “state” of the viewer.

Hot-hand in sports. Finally, we are developing a Bayesian statistical framework to test for hot-hand with a rigorous characterization of its statistical power. We plan to test our framework on a point-level tennis dataset, consisting of over 25,000 matches between 2011 and 2018.

Given the rise of the new economy, we believe this is an exciting time to pursue data-driven research in operations, capturing aspects of reality that were previously not possible. In addition to providing academic insights, we hope our models ultimately lead to a better understanding of complex systems and equip practitioners with improved decision-making capabilities.

References

Abe, Naoki et al. (2002). “Empirical comparison of various reinforcement learning strategies for sequential targeted marketing”. In: 2002 IEEE International Conference on Data Mining, 2002. Proceedings. IEEE, pp. 3–10.

Abebe, Rediet, Lada A Adamic, and Jon Kleinberg (2018). “Mitigating overexposure in viral mar- keting”. In: Thirty-Second AAAI Conference on Artificial Intelligence.

Abhishek, Vibhanshu, Peter Fader, and Kartik Hosanagar (2012). “Media exposure through the funnel: A model of multi-stage attribution”. In: Working paper, available at: https : / / papers.ssrn.com/sol3/papers.cfm?abstract_id=2158421.

Abhishek, Vibhanshu, Stylianos Despotakis, and R Ravi (2017). “Multi-Channel Attribution: The Blind Spot of Online Advertising”. In: Working paper, available at: https://papers. ssrn.com/sol3/papers.cfm?abstract_id=2959778.

AboElHamd, Eman, Hamed M Shamma, and Mohamed Saleh (2020). “Maximizing Customer Lifetime Value Using Dynamic Programming: Theoretical and Practical Implications”. In: Academy of Marketing Studies Journal 24.1.

Agrawal, Shipra and Navin Goyal (2012). “Analysis of thompson sampling for the multi-armed bandit problem”. In: Conference on learning theory, pp. 39–1.

— (2013). “Thompson sampling for contextual bandits with linear payoffs”. In: International Conference on Machine Learning, pp. 127–135.

Agrawal, Shipra and Randy Jia (2017). “Optimistic posterior sampling for reinforcement learning: worst-case regret bounds”. In: Advances in Neural Information Processing Systems, pp. 1184– 1194.

Anderl, Eva et al. (2016). “Mapping the customer journey: Lessons learned from graph-based online attribution modeling”. In: International Journal of Research in Marketing 33.3, pp. 457– 474.

Anderton, Kevin (2016). FanDuel And DraftKings Are Dominating The Daily Fantasy Sports Mar- ket. URL: https : / / www . forbes . com / sites / kevinanderton / 2016 / 11 / 30/- and- - are- dominating- the- daily- fantasy- sports-market-infographic/#2979acb87c4f.

Ansari, Asim and Carl F Mela (2003). “E-customization”. In: Journal of marketing research 40.2, pp. 131–145.

Arava, Sai Kumar et al. (2018). “Deep Neural Net with Attention for Multi-channel Multi-touch Attribution”. In: Working paper, available at: https://arxiv.org/abs/1809.02230.

Archak, Nikolay, Vahab S Mirrokni, and S Muthukrishnan (2010). “Mining advertiser-specific user behavior using adfactors”. In: Proceedings of the 19th International Conference on World Wide Web. ACM, pp. 31–40.

Arensman, Joan and Wilfred Yeung (2016). Move beyond last click attribution in AdWords. URL: https://adwords.googleblog.com/2016/05/move-beyond-last-click- attribution.html.

Ariely, Dan, Gabriel Bitran, et al. (2013). “Design to learn: Customizing services when the future matters”. In: Pesquisa Operacional 33.1, pp. 37–61.

Asmussen, Søren and Peter W Glynn (2007). Stochastic simulation: algorithms and analysis. Vol. 57. Springer Science & Business Media.

Avrachenkov, Konstantin, Laura Cottatellucci, and Lorenzo Maggi (2012). “Confidence intervals for Shapley value in Markovian dynamic games”. In: Technical report (Eurecom), available at: http://www.eurecom.fr/publication/3602.

Baardman, Lennart et al. (2019). “Learning Optimal Online Advertising Portfolios with Peri- odic Budgets”. In: Working paper, available at: https://papers.ssrn.com/sol3/ papers.cfm?abstract_id=3346642.

Baehrens, David et al. (2010). “How to explain individual classification decisions”. In: Journal of Machine Learning Research 11.Jun, pp. 1803–1831.

Bales, Jon (2016). Pairing a QB with his Receiver(s). URL: https://rotogrinders.com/ articles/pairing-a-qb-with-his-receiver-s-481544.

Balseiro, Santiago and Yonatan Gur (2017). “Learning in repeated auctions with budgets: Regret minimization and equilibrium”. In: Working paper, available at: https://papers.ssrn. com/sol3/papers.cfm?abstract_id=2921446.

Balseiro, Santiago R et al. (2014). “Yield optimization of display advertising with ad exchange”. In: Management Science 60.12, pp. 2886–2907.

Balseiro, Santiago R, Omar Besbes, and Gabriel Y Weintraub (2015). “Repeated auctions with budgets in ad exchanges: Approximations and design”. In: Management Science 61.4, pp. 864– 884.

Barry, Thomas E (1987). “The development of the hierarchy of effects: An historical perspective”. In: Current issues and Research in Advertising 10.1-2, pp. 251–295.

Bayraktar, Erhan and Alexander Munk (2017). “High-roller impact: a large generalized game model of parimutuel wagering”. In: Market Microstructure and Liquidity 3.01, p. 1750006.

Becker, Adrian and Xu Andy Sun (2016). “An analytical approach for fantasy football draft and lineup management”. In: Journal of Quantitative Analysis in Sports 12.1, pp. 17–30.

Bellman, Richard (1954). The theory of dynamic programming. Tech. rep. Rand corp santa monica ca.

Bergman, David and Jason Imbrogno (2017). “Surviving a national football league survivor pool”. In: Operations Research 65.5, pp. 1343–1354.

Berman, Ron (2018). “Beyond the last touch: Attribution in online advertising”. In: Marketing Science 37.5, pp. 771–792.

Bertsekas, Dimitri P (1995). Dynamic programming and optimal control. Vol. 1. 2. Athena scien- tific Belmont, MA.

Bertsimas, Dimitris and Adam J Mersereau (2007). “A learning approach for interactive marketing to a customer segment”. In: Operations Research 55.6, pp. 1120–1135.

Bettman, James R, Mary Frances Luce, and John W Payne (1998). “Constructive consumer choice processes”. In: Journal of Consumer Research 25.3, pp. 187–217.

Bhandari, Jalaj, Daniel Russo, and Raghav Singal (2018). “A Finite Time Analysis of Tempo- ral Difference Learning With Linear Function Approximation”. In: Conference On Learning Theory, pp. 1691–1692.

Bitran, Gabriel R and Susana V Mondschein (1996). “Mailing decisions in the catalog sales indus- try”. In: Management science 42.9, pp. 1364–1381.

Blake, Thomas, Chris Nosko, and Steven Tadelis (2015). “Consumer heterogeneity and paid search effectiveness: A large-scale field experiment”. In: Econometrica 83.1, pp. 155–174.

Bleier, Alexander and Maik Eisenbeiss (2015). “Personalized online advertising effectiveness: The interplay of what, when, and where”. In: Marketing Science 34.5, pp. 669–688.

Bottou, Léon et al. (2013). “Counterfactual reasoning and learning systems: The example of com- putational advertising”. In: The Journal of Machine Learning Research 14.1, pp. 3207–3260.

Braun, Michael and Wendy W Moe (2013). “Online display advertising: Modeling the effects of multiple creatives and individual impression histories”. In: Marketing Science 32.5, pp. 753– 767.

Brown, Larry (2016). DraftKings investigating potential collusion in millionaire contest. URL: http://larrybrownsports.com/sports-business/draftkings-investigating-collusion-millionaire-contest/325182.

Bruce, Norris I, Kay Peters, and Prasad A Naik (2012). “Discovering how advertising grows sales and builds brands”. In: Journal of Marketing Research 49.6, pp. 793–806.

Bruce, Norris I, BPS Murthi, and Ram C Rao (2017). “A dynamic model for digital advertising: The effects of creative format, message content, and targeting on engagement”. In: Journal of marketing research 54.2, pp. 202–218.

Bult, Jan Roelf and Tom Wansbeek (1995). “Optimal selection for direct mail”. In: Marketing Science 14.4, pp. 378–394.

Byers, John W, Michael Mitzenmacher, and Georgios Zervas (2012). “The groupon effect on yelp ratings: a root cause analysis”. In: Proceedings of the 13th ACM conference on electronic com- merce, pp. 248–265.

CampaignMonitor (2020). Ultimate Email Marketing Benchmarks for 2020: By Industry and Day. URL: https : / / www . campaignmonitor . com / resources / guides / email - marketing-benchmarks/.

Cao, Junyu, Wei Sun, and Zuo-Jun Max Shen (2019). “Sequential choice bandits: Learning with marketing fatigue”. In: Available at SSRN 3355211.

Castro, Javier, Daniel Gómez, and Juan Tejada (2009). “Polynomial calculation of the Shapley value based on sampling”. In: Computers & Operations Research 36.5, pp. 1726–1730.

Chapelle, Olivier and Lihong Li (2011). “An empirical evaluation of thompson sampling”. In: Advances in neural information processing systems, pp. 2249–2257.

Chatterjee, Patrali, Donna L Hoffman, and Thomas P Novak (2003). “Modeling the clickstream: Implications for web-based advertising efforts”. In: Marketing Science 22.4, pp. 520–541.

Cheng, Jiesi, Aaron Sun, and Daniel Zeng (2010). “Information overload and viral marketing: countermeasures and strategies”. In: International Conference on Social Computing, Behav- ioral Modeling, and Prediction. Springer, pp. 108–117.

Chockler, Hana and Joseph Y Halpern (2004). “Responsibility and blame: A structural-model ap- proach”. In: Journal of Artificial Intelligence Research 22, pp. 93–115.

Choi, Hana et al. (2017). “Online Display Advertising Markets: A Literature Review and Future Directions”. In: Working paper, available at: https : / / papers . ssrn . com / sol3 / papers.cfm?abstract_id=3070706.

Clair, Bryan and David Letscher (2007). “Optimal strategies for sports betting pools”. In: Operations Research 55.6, pp. 1163–1177.

Collins, John David et al. (2004). Causation and counterfactuals. MIT Press.

Cook, John D (2010). Determining distribution parameters from quantiles. URL: https://www. johndcook.com/quantiles_parameters.pdf.

Court, David (2009). “The consumer decision journey”. In: McKinsey Quarterly.

Dalessandro, Brian et al. (2012). “Causally motivated attribution for online advertising”. In: Pro- ceedings of the Sixth International Workshop on Data Mining for Online Advertising and In- ternet Economy. ACM, p. 7.

Danaher, Peter J and Harald J van Heerde (2018). “Delusion in Attribution: Caveats in Using Attri- bution for Multimedia Budget Allocation”. In: Journal of Marketing Research 55.5, pp. 667– 685.

David, Herbert Aron and Haikady Navada Nagaraja (2004). “Order statistics”. In: Encyclopedia of Statistical Sciences.

Davis, Alexander (2017). Data-Driven Portfolios Power ‘Home-Run’ Exits in MIT Study. URL: https : / / www . wsj . com / articles / data - driven - portfolios - power - home-run-exits-in-mit-study-1502105402.

Dearden, Richard, Nir Friedman, and Stuart Russell (1998). “Bayesian Q-learning”. In: Aaai/iaai, pp. 761–768.

Deisenroth, Marc and Carl E Rasmussen (2011). “PILCO: A model-based and data-efficient ap- proach to policy search”. In: Proceedings of the 28th International Conference on machine learning (ICML-11), pp. 465–472.

Dhamdhere, Kedar, Mukund Sundararajan, and Qiqi Yan (2018). “How important is a neuron?” In: Working paper, available at: https://arxiv.org/abs/1805.12233.

Dong, Kefan et al. (2019). “Q-learning with ucb exploration is sample efficient for infinite-horizon mdp”. In: arXiv preprint arXiv:1901.09311.

Drape, Joe and Jacqueline Williams (2015a). In Fantasy Sports, Signs of Insiders’ Edge. URL: https : / / www . nytimes . com / 2015 / 10 / 12 / sports / fantasy - sports - draftkings-fanduel-insiders-edge-football.html.

— (2015b). Scandal Erupts in Unregulated World of Fantasy Sports. URL: https://www.nytimes.com/2015/10/06/sports/fanduel-draftkings-fantasy-employees-bet-rivals.html.

Eells, Ellery (1991). Probabilistic causality. Vol. 1. Cambridge University Press.

Elzinga, Dave, Susan Mulder, Ole Jorgen Vetvik, et al. (2009). “The consumer decision journey”. In: McKinsey Quarterly 3, pp. 96–107.

FanDuel (2016). Rules & Scoring. URL: https://www.fanduel.com/rules.

FantasyPros (2017). URL: https://www.fantasypros.com.

Fatima, Shaheen S, Michael Wooldridge, and Nicholas R Jennings (2008). “A linear approximation method for the Shapley value”. In: Artificial Intelligence 172.14, pp. 1673–1699.

Feit, Elea McDonnell and Ron Berman (2019). “Test & Roll: Profit-Maximizing A/B Tests”. In: Marketing Science 38.6, pp. 1038–1058.

Fry, Michael J, Andrew W Lundberg, and Jeffrey W Ohlmann (2007). “A player selection heuristic for a sports league draft”. In: Journal of Quantitative Analysis in Sports 3.2.

FSTA (2015). Industry demographics. URL: http://fsta.org/research/industry- demographics/.

Gelman, Andrew et al. (2013). Bayesian data analysis. CRC press.

Ghose, Anindya and Vilma Todri (2015). “Towards a digital attribution model: Measuring the impact of display advertising on online consumer behavior. Forthcoming”. In: MIS quarterly.

Gibbs, Jacob (2017). Week 1 FanDuel NFL Tournament Pivots. URL: https://www.numberfire. com/nfl/lists/16195/week-1-fanduel-nfl-tournament-pivots.

Gönül, Füsun and Meng Ze Shi (1998). “Optimal mailing of catalogs: A new methodology using estimable structural dynamic programming models”. In: Management Science 44.9, pp. 1249– 1262.

Google (2019). MCF Data-Driven Attribution methodology. URL: https://support.google.com/analytics/answer/3191594?hl=en.

Gooley, Christopher G and James M Lattin (2000). Dynamic customization of marketing messages in interactive media. Graduate School of Business, Stanford University.

Grinstead, Charles Miller and James Laurie Snell (2012). Introduction to probability. American Mathematical Society.

Gurobi Optimization, LLC (2016). Gurobi Optimizer Reference Manual. URL: http://www. gurobi.com.

Hagan, Shelly (2018). Digital economy has been growing at triple the pace of U.S. GDP. URL: https://www.bloomberg.com/news/articles/2018-03-15/digital-economy-has-been-growing-at-triple-the-pace-of-u-s-gdp.

Hallak, Assaf, Dotan Di-Castro, and Shie Mannor (2013). “Model selection in markovian pro- cesses”. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 374–382.

— (2015). “Contextual markov decision processes”. In: arXiv preprint arXiv:1502.02259.

Halpern, Joseph Y and Judea Pearl (2005). “Causes and explanations: A structural-model approach. Part I: Causes”. In: The British Journal for the Philosophy of Science 56.4, pp. 843–887.

Harwell, Drew (2015). Why you (probably) won’t win money playing DraftKings, FanDuel. URL: http://www.dailyherald.com/article/20151012/business/151019683/.

Haugh, Martin B and Raghav Singal (2020). “How to play fantasy sports strategically (and win)”. In: Management Science.

Hauser, John R et al. (2009). “Website morphing”. In: Marketing Science 28.2, pp. 202–223.

Hauser, John R, Guilherme Liberali, and Glen L Urban (2014). “Website morphing 2.0: Switch- ing costs, partial exposure, random exit, and when to morph”. In: Management science 60.6, pp. 1594–1616.

Hitchcock, Christopher (1997). “Probabilistic causation.” In: URL: https://plato.stanford. edu/entries/causation-probabilistic/.

Hojjat, Ali et al. (2017). “A unified framework for the scheduling of guaranteed targeted display advertising under reach and frequency requirements”. In: Operations Research 65.2, pp. 289– 313.

Howard, John A and Jagdish N Sheth (1969). The theory of buyer behavior. John Wiley & Sons.

Howard, Ronald A (2002). “Comments on the origin and application of Markov decision pro- cesses”. In: Operations Research 50.1, pp. 100–102.

Hume, David (2003). A treatise of human nature. Courier Corporation.

Hunter, David Scott, Juan Pablo Vielma, and Tauhid Zaman (2016). “Picking winners using integer programming”. In: Working paper, available at: https://arxiv.org/abs/1604.01455.

IBM (2020). The Four V’s of Big Data. URL: https://www.ibmbigdatahub.com/infographic/four-vs-big-data.

Marketing Science Institute (2016). Research Priorities 2016-2018. URL: https://www.msi.org/uploads/articles/MSI_RP16-18.pdf.

— (2018). Research Priorities 2018-2020. URL: https://www.msi.org/wp-content/uploads/2020/06/MSI_RP18-20.pdf.

— (2020). Research Priorities 2020-2022. URL: https://www.msi.org/wp-content/uploads/2020/06/MSI_RP20-22.pdf.

Iyer, Krishnamurthy, Ramesh Johari, and Mukund Sundararajan (2014). “Mean field equilibria of dynamic auctions with learning”. In: Management Science 60.12, pp. 2949–2970.

Jansen, Bernard J and Simone Schuster (2011). “Bidding on the buying funnel for sponsored search and keyword advertising”. In: Journal of Electronic Commerce Research 12.1, p. 1.

Ji, Wendi and Xiaoling Wang (2017). “Additional Multi-Touch Attribution for Online Advertis- ing”. In: AAAI, pp. 1360–1366.

Ji, Wendi, Xiaoling Wang, and Dell Zhang (2016). “A probabilistic multi-touch attribution model for online advertising”. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, pp. 1373–1382.

Jin, Chi et al. (2018). “Is Q-learning provably efficient?” In: Advances in Neural Information Pro- cessing Systems, pp. 4863–4873.

Johnson, Alex (2016). After Ferocious Battle, New York Legalizes DraftKings, FanDuel. URL: https://www.nbcnews.com/news/us-news/after-ferocious-battle-new-york-legalizes-draftkings-fanduel-n622606.

Jordan, Patrick et al. (2011). “The multiple attribution problem in pay-per-conversion advertising”. In: International Symposium on Algorithmic Game Theory. Springer, pp. 31–43.

Kakalejčík, Lukáš et al. (2018). “Multichannel Marketing Attribution Using Markov Chains”. In: Journal of Applied Management and Investments 7.1, pp. 49–60.

Kannan, PK, Werner Reinartz, and Peter C Verhoef (2016). “The path to purchase and attribution modeling: Introduction to special section”. In: International Journal of Research in Marketing 33.3, pp. 449–456.

Kaplan, Edward H and Stanley J Garstka (2001). “March madness and the office pool”. In: Man- agement Science 47.3, pp. 369–382.

Kireyev, Pavel, Koen Pauwels, and Sunil Gupta (2016). “Do display ads influence search? Attribution and dynamics in online advertising”. In: International Journal of Research in Marketing 33.3, pp. 475–490.

Kolodny, Lora (2015). Fantasy Sports Create Billion-Dollar Startups. URL: https://www.wsj.com/articles/fantasy-sports-create-billion-dollar-startups-1436846402.

Kotler, Philip and Gary Armstrong (2010). Principles of marketing. Pearson Education.

Leino, Klas et al. (2018). “Influence-directed explanations for deep convolutional networks”. In: 2018 IEEE International Test Conference (ITC). IEEE, pp. 1–8.

Lejeune, Miguel and John Turner (2019). “Planning online advertising using Lorenz curves”. In: Operations Research (In press).

Li, Gen et al. (2020). “Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction”. In: arXiv preprint arXiv:2006.03041.

Li, Hongshuang and PK Kannan (2014). “Attributing conversions in a multichannel online mar- keting environment: An empirical model and a field experiment”. In: Journal of Marketing Research 51.1, pp. 40–56.

Li, Lihong et al. (2010). “A contextual-bandit approach to personalized news article recommen- dation”. In: Proceedings of the 19th international conference on World wide web, pp. 661– 670.

Li, Yiyi, Ying Xie, and Eric Zheng (2017). “Modeling Multi-Channel Advertising Attribution Across Competitors”. In: Working paper, available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3047981.

Liben-Nowell, David et al. (2012). “Computing Shapley value in supermodular coalitional games”. In: International Computing and Combinatorics Conference. Springer, pp. 568–579.

Liberali, Gui and Alina Ferecatu (2019). “Morphing Consumer Dynamics: Bandits Meet HMM”. In: Available at SSRN 3495518.

Littlechild, Stephen C and Guillermo Owen (1973). “A simple expression for the Shapley value in a special case”. In: Management Science 20.3, pp. 370–372.

Lundberg, Scott M and Su-In Lee (2017). “A unified approach to interpreting model predictions”. In: Advances in Neural Information Processing Systems, pp. 4765–4774.

Ma, Shaohui et al. (2016). “A nonhomogeneous hidden Markov model of response dynamics and mailing optimization in direct marketing”. In: European Journal of Operational Research 253.2, pp. 514–523.

Maleki, Sasan et al. (2013). “Bounding the estimation error of sampling-based Shapley value ap- proximation”. In: Working paper, available at: https://arxiv.org/abs/1306.4265.

Manchanda, Puneet et al. (2006). “The effect of banner advertising on internet purchasing”. In: Journal of Marketing Research 43.1, pp. 98–108.

Michalak, Tomasz P et al. (2013). “Efficient computation of the Shapley value for game-theoretic network centrality”. In: Journal of Artificial Intelligence Research 46, pp. 607–650.

Miller, Ed and Daniel Singer (2015). For daily fantasy-sports operators, the curse of too much skill. URL: https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/for-daily-fantasy-sports-operators-the-curse-of-too-much-skill#.

Mišić, Velibor V and Georgia Perakis (2020). “Data analytics in operations management: A review”. In: Manufacturing & Service Operations Management 22.1, pp. 158–169.

Moazeni, Somayeh, Boris Defourny, and Monika J Wilczak (2019). “Sequential Learning in De- signing Marketing Campaigns for Market Entry”. In: Management Science.

Montoya, Ricardo, Oded Netzer, and Kamel Jedidi (2010). “Dynamic allocation of pharmaceutical detailing and sampling for long-term profitability”. In: Marketing Science 29.5, pp. 909–924.

Morgan, Stephen L and Christopher Winship (2014). Counterfactuals and causal inference. Cam- bridge University Press.

Morton, David P et al. (2003). “Optimizing benchmark-based utility functions”. In: Bulletin of the Czech Econometric Society 10.18.

Mulpuru, Sucharita (2011). “The purchase path of online buyers”. In: Forrester Report 5, p. 12.

Mulshine, Molly (2015). How one man made hundreds of thousands of dollars playing daily fantasy sports. URL: http://www.businessinsider.com/draft-kings-jonathan-bales-a-day-in-the-life-2015-10.

Nelsen, Roger B (2007). An introduction to copulas. Springer Science & Business Media.

Nemhauser, George L, Laurence A Wolsey, and Marshall L Fisher (1978). “An analysis of approximations for maximizing submodular set functions—I”. In: Mathematical programming 14.1, pp. 265–294.

Nickish, Curt (2015). Meet A Bostonian Who’s Made $3 Million This Year Playing Daily Fantasy Sports. URL: http://www.wbur.org/news/2015/11/23/dfs-power-player-profile.

O’Keeffe, Kate (2015). Daily Fantasy-Sports Operators Await Reality Check. URL: https://www.wsj.com/articles/daily-fantasy-sports-operators-await-reality-check-1441835630.

Osband, Ian and Benjamin Van Roy (2017). “Why is posterior sampling better than optimism for reinforcement learning?” In: International Conference on Machine Learning, pp. 2701–2710.

Osband, Ian, Daniel Russo, and Benjamin Van Roy (2013). “(More) efficient reinforcement learning via posterior sampling”. In: Advances in Neural Information Processing Systems, pp. 3003–3011.

Osband, Ian et al. (2019). “Deep Exploration via Randomized Value Functions.” In: Journal of Machine Learning Research 20.124, pp. 1–62.

Owen, Guillermo (1972). “Multilinear extensions of games”. In: Management Science 18.5-part-2, pp. 64–79.

Pearl, Judea (2009). Causality. Cambridge University Press.

Pednault, Edwin, Naoki Abe, and Bianca Zadrozny (2002). “Sequential cost-sensitive decision making with reinforcement learning”. In: Proceedings of the eighth ACM SIGKDD interna- tional conference on Knowledge discovery and data mining, pp. 259–268.

Plott, Charles R, Jorgen Wit, and Winston C Yang (2003). “Parimutuel betting markets as informa- tion aggregation devices: experimental results”. In: Economic Theory 22.2, pp. 311–351.

Pramuk, Jacob (2015). Former pro poker player makes a living on fantasy sports. URL: http://www.cnbc.com/2015/10/06/former-pro-poker-player-makes-a-living-on-fantasy-sports.html.

Priest, Colin (2017). “Multichannel Marketing Attribution with DataRobot.” In: URL: https://www.datarobot.com/resource/mktattribution/.

Puterman, Martin L (2014). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.

Qu, Guannan and Adam Wierman (2020). “Finite-Time Analysis of Asynchronous Stochastic Ap- proximation and Q-Learning”. In: arXiv preprint arXiv:2002.00260.

Quantcast (2013). “Beyond Last Touch: Understanding Campaign Effectiveness.” In: URL: http://info.quantcast.com/rs/quantcast/images/Quantcast%20White%20Paper_Beyond%20Last%20Touch.pdf.

— (2016). “Guide to marketing attribution.” In: URL: https://www.iabuk.com/sites/default/files/case-study-docs/Quantcast%20-%20Guide%20to%20Marketing%20Attribution%20-%20EN%20-%202016%20-%20print.pdf.

Reagan, Brad (2016). DraftKings Investigating Potential Collusion in $1 Million Contest. URL: https://www.wsj.com/articles/draftkings-investigating-potential-collusion-in-1-million-contest-1475091553.

ReportLinker (2020). Global Email Marketing Industry. URL: https://www.reportlinker.com/p05900877/Global-Email-Marketing-Industry.html?utm_source=GNW.

Rossi, Peter E, Robert E McCulloch, and Greg M Allenby (1996). “The value of purchase history data in target marketing”. In: Marketing Science 15.4, pp. 321–340.

RotoViz (2017). URL: http://rotoviz.com.

Rubin, Donald B (1974). “Estimating causal effects of treatments in randomized and nonrandom- ized studies.” In: Journal of Educational Psychology 66.5, p. 688.

Ryzhov, Ilya O et al. (2019). “Bayesian exploration for approximate dynamic programming”. In: Operations research 67.1, pp. 198–214.

Sahni, Navdeep S (2015). “Effect of temporal spacing between advertising exposures: Evidence from online field experiments”. In: Quantitative Marketing and Economics 13.3, pp. 203–247.

Sahni, Navdeep S, Sridhar Narayanan, and Kirthi Kalyanam (2019). “An experimental investiga- tion of the effects of retargeted advertising: The role of frequency and timing”. In: Journal of Marketing Research 56.3, pp. 401–418.

Schrijver, Alexander (2003). Combinatorial optimization: polyhedra and efficiency. Vol. 24. Springer Science & Business Media.

Schulman, John et al. (2015). “Trust region policy optimization”. In: International conference on machine learning, pp. 1889–1897.

Schwartz, Eric M, Eric T Bradlow, and Peter S Fader (2017). “Customer acquisition via display advertising using multi-armed bandit experiments”. In: Marketing Science 36.4, pp. 500–522.

Schweidel, David A, Eric T Bradlow, and Peter S Fader (2011). “Portfolio dynamics for customers of a multiservice provider”. In: Management Science 57.3, pp. 471–486.

Shao, Xuhui and Lexin Li (2011). “Data-driven multi-touch attribution models”. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 258–264.

Shapley, Lloyd S (1953). “A value for n-person games”. In: Contributions to the Theory of Games 2.28, pp. 307–317.

Simester, Duncan, Artem Timoshenko, and Spyros I Zoumpoulis (2020). “Targeting prospective customers: Robustness of machine-learning methods to typical data challenges”. In: Manage- ment Science 66.6, pp. 2495–2522.

Simester, Duncan I, Peng Sun, and John N Tsitsiklis (2006). “Dynamic catalog mailing policies”. In: Management science 52.5, pp. 683–696.

Singal, Raghav et al. (2019). “Shapley Meets Uniform: An Axiomatic Framework for Attribution in Online Advertising”. In: The World Wide Web Conference, pp. 1713–1723.

Sinha, Indrajit and Thomas Foscht (2007). “Over-marketing and brand suicide”. In: Reverse Psy- chology Marketing. Springer, pp. 23–50.

Sklar, M (1959). “Fonctions de répartition à n dimensions et leurs marges”. In: Publ. inst. statist. univ. Paris 8, pp. 229–231.

Sluis, Sarah (2018). Digital ad market soars to $88 billion, Facebook and Google contribute 90% of growth. URL: https://adexchanger.com/online-advertising/digital-ad-market-soars-to-88-billion-facebook-and-google-contribute-90-of-growth/.

Strens, Malcolm (2000). “A Bayesian framework for reinforcement learning”. In: ICML. Vol. 2000, pp. 943–950.

Strong, Edward Kellogg (1925). The psychology of selling and advertising. McGraw-Hill Book Company, Incorporated.

Sundararajan, Mukund, Ankur Taly, and Qiqi Yan (2017). “Axiomatic attribution for deep net- works”. In: Proceedings of the 34th International Conference on Machine Learning. Vol. 70, pp. 3319–3328.

Sutton, Richard S and Andrew G Barto (2018). Reinforcement learning: An introduction. MIT press.

Sutton, Richard S et al. (2000). “Policy gradient methods for reinforcement learning with function approximation”. In: Advances in neural information processing systems, pp. 1057–1063.

Tang, Liang et al. (2013). “Automatic ad format selection via contextual bandits”. In: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pp. 1587–1594.

Stan Development Team (2017). RStan: the R interface to Stan. URL: http://mc-stan.org.

Terrell, Dek and Amy Farmer (1996). “Optimal betting and efficiency in parimutuel betting markets with information costs”. In: The Economic Journal 106.437, pp. 846–868.

Thaler, Richard H and William T Ziemba (1988). “Anomalies: Parimutuel betting markets: Racetracks and lotteries”. In: Journal of Economic perspectives 2.2, pp. 161–174.

Theocharous, Georgios, Philip S Thomas, and Mohammad Ghavamzadeh (2015). “Personalized ad recommendation systems for life-time value optimization with guarantees”. In: Twenty-Fourth International Joint Conference on Artificial Intelligence.

Thompson, William R (1933). “On the likelihood that one unknown probability exceeds another in view of the evidence of two samples”. In: Biometrika 25.3/4, pp. 285–294.

Tkachenko, Yegor (2015). “Autonomous CRM control via CLV approximation with deep rein- forcement learning in discrete and continuous action space”. In: arXiv preprint arXiv:1504.01840.

Tsitsiklis, John N (1994). “Asynchronous stochastic approximation and Q-learning”. In: Machine learning 16.3, pp. 185–202.

UNCTAD (2019). Digital economy report. URL: https://unctad.org/en/PublicationsLibrary/der2019_overview_en.pdf.

Econsultancy Digital Marketers United (2012). “Quarterly Digital Intelligence Briefing: Making Sense of Marketing Attribution (in association with Adobe).” In: URL: http://success.adobe.com/assets/en/downloads/whitepaper/Adobe-Quarterly-Digital-Intelligence-Briefing-Digital-Trends-for-2013.pdf.

Urban, Glen L et al. (2014). “Morphing banner advertising”. In: Marketing Science 33.1, pp. 27– 46.

Wainwright, Martin J (2019a). “Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning”. In: arXiv preprint arXiv:1905.06265.

— (2019b). “Variance-reduced Q-learning is minimax optimal”. In: arXiv preprint arXiv:1906.04697.

Watkins, Christopher JCH and Peter Dayan (1992). “Q-learning”. In: Machine learning 8.3-4, pp. 279–292.

Wong, Kristin (2015). The Fantasy Sports Industry, by the Numbers. URL: http://www.nbcnews.com/business/business-news/fantasy-sports-industry-numbers-n439536.

Woodward, Curt (2015). Top fantasy sports player uses software, analytics to reap millions. URL: https://www.bostonglobe.com/business/2015/12/23/from-boston-penthouse-world-best-fantasy-player-plunges-into-startup-world/QHNpLh0O3QMyUDqTd4t27N/story.html.

— (2016). DraftKings, FanDuel collected $3b in entry fees last year, analyst says. URL: https://www.bostonglobe.com/business/2016/02/29/fantasy-sports-industry-hits-amid-legal-questions-analyst-says/NKw364kiLjv8XcD54vRr4H/story.html.

Xu, Lizhen, Jason A Duan, and Andrew Whinston (2014). “Path to purchase: A mutually exciting point process model for online advertising and conversion”. In: Management Science 60.6, pp. 1392–1412.

Yao, Xinlin and Xianghua Lu (2019). “Optimizing Digital Coupon Assignment Using Constrained Reinforcement Learning”. In: Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, pp. 143–147.

Zantedeschi, Daniel, Eleanor McDonnell Feit, and Eric T Bradlow (2017). “Measuring multichan- nel advertising response”. In: Management Science 63.8, pp. 2706–2728.

Zhang, Jonathan Z et al. (2016). “Dynamic relationship marketing”. In: Journal of Marketing 80.5, pp. 53–75.

Zhang, Xi, V Kumar, and Koray Cosguner (2017). “Dynamically managing a profitable email marketing program”. In: Journal of marketing research 54.6, pp. 851–866.

Zhang, Ya, Yi Wei, and Jianbiao Ren (2014). “Multi-touch attribution in online advertising with survival theory”. In: 2014 IEEE International Conference on Data Mining (ICDM). IEEE, pp. 687–696.

Zhao, Kaifeng, Seyed Hanif Mahboobi, and Saeed R Bagheri (2018). “Revenue-based attribution modeling for online advertising”. In: International Journal of Market Research 61.2, pp. 195– 209.

Appendix A: Additional Details for Chapter 1

A.1 Further Technical Details

A.1.1 Details for Modeling Opponents’ Team Selections

The Copula

Determining which copula C to use when modeling opponents via (1.3) is a difficult issue as we will generally have very limited data available to estimate C. To be clear, the data we do have available is typically data on the positional marginals. In order to obtain data that is useful for estimating C, we would need to have access to the teams selected by contestants in historical DFS contests. Such data is hard to come by although it is often possible to obtain a small sample of selected teams via manual1 inspection on the contest web-sites. As a result, we restrict ourselves to three possible choices of C:

1. The independence copula C_ind satisfies C_ind(u_1, ..., u_m) = ∏_{i=1}^m u_i, where m is the number of positional marginals. As the name suggests, the independence copula models independence among the selected positions so that when a contestant is choosing her TE position, for example, it is done so independently of her selections for the other positions. The independence copula can therefore be interpreted as the copula of a non-strategic contestant.

2. The stacking copula C_stack is intended to model the well-known (Bales 2016) stacking behavior of some strategic contestants. In the NFL setting, for example, it is well known that the points scored by a given team's QB and main WR² are often strongly positively correlated. Selecting both players then becomes attractive to contestants who understand that positive correlation will serve to increase the overall variance of their entry, which is generally a desirable feature in top-heavy contests as we will argue in Section 1.5. Rather than explicitly defining the stacking copula (which would require further notation), we simply note that it is straightforward to simulate a value of w_o when C is the stacking copula. We first generate the QB position, i.e., w_o^QB. We then select the first WR to be the main WR from the generated QB's team. The remaining 2 WRs are generated from the Multinomial(2, p^WR) distribution. It is easy to ensure that all 3 WRs are different; see Footnote 8 (main text).

¹These web-sites are generally not easy to “scrape” nor do the owners look favorably on web-scrapers.
²By “main” WR of a team, we refer to the WR with the highest expected points among all the WRs in the same team.

3. The mixture copula sets C(·) := (1 − q)Cind(·) + qCstack(·) for some q ∈ (0, 1). Note that a finite mixture of copulas remains a copula. While very little data on complete team selections is available, as mentioned above, it is possible to observe a small sample of teams and such a sample could be used to estimate q. We can then interpret q as being the probability that a random contestant will be a “stacker” and therefore be represented by the stacking copula.

We note that different contest structures tend to result in more or less strategic behavior. Top-heavy contests, for example, encourage high variance teams, which suggests that stacking might be more common in those contests. Indeed that is what we observe and so the estimated q is typically higher for top-heavy contests. Finally, we note that the main fantasy sports companies will themselves have access to the team selections of all players and these companies could easily fit more sophisticated copulas to the data. This might be of general interest to these companies but it might also be useful to help them understand the skill-luck tradeoff in playing fantasy sports.

Algorithm for Sampling Opponent Portfolios

See Algorithm 7. We note that Algorithm 7 is guaranteed to terminate in a finite time since each accepted w_o may be viewed as a draw from a geometric distribution.

Algorithm 7 Sampling O Opponent Portfolios

Require: (β^QB, ..., β^D), (X^QB, ..., X^D), c, B_lb, q
1: (α^QB, ..., α^D) = (exp(X^QB β^QB), ..., exp(X^D β^D))
2: (p^QB, ..., p^D) ∼ (Dir(α^QB), ..., Dir(α^D))
3: for o = 1 : O
4:   Stack ∼ Bernoulli(q)
5:   Reject = True
6:   while Reject
7:     (k^QB, k^RB, ..., k^D) ∼ (Mult(1, p^QB), Mult(2, p^RB), ..., Mult(1, p^D))
8:     % Mult(2, p^RB) etc. should be understood as being without replacement; see Footnote 8 (main text)
9:     if Stack = 1
10:      Replace k^WR(1) with main WR from team of k^QB
11:    end if
12:    Let w_o denote the portfolio corresponding to (k^QB, ..., k^D)
13:    if w_o ∈ W and c^T w_o ≥ B_lb
14:      Reject = False and
15:      Accept w_o
16:    end if
17:  end while
18: end for
19: return W_op = {w_o}_{o=1}^O
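To make the sampling logic concrete, the following is a minimal Python sketch of Algorithm 7. The toy inputs (alphas, n_slots, main_wr_of_qb and the three-position roster) are illustrative assumptions rather than the full NFL roster and the feature-based α = exp(Xβ) of the main text, and the feasibility/budget rejection step is only indicated by a comment.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative positional data: Dirichlet parameters (one per player) and the
# number of roster slots per position.  In the text these alphas are exp(X beta).
alphas = {"QB": np.array([3.0, 2.0, 1.0]),
          "RB": np.array([2.5, 2.0, 1.5, 1.0]),
          "WR": np.array([3.0, 2.5, 2.0, 1.5, 1.0])}
n_slots = {"QB": 1, "RB": 2, "WR": 3}

def sample_opponent_portfolios(O, q_stack, main_wr_of_qb, rng):
    """Sample O opponent rosters via the Dirichlet-multinomial model, mixing
    the independence and stacking copulas with weight q_stack."""
    # Steps 1-2: draw the positional marginals p once per contest realization.
    p = {pos: rng.dirichlet(a) for pos, a in alphas.items()}
    portfolios = []
    for _ in range(O):
        stack = rng.random() < q_stack              # Stack ~ Bernoulli(q)
        team = {}
        for pos, k in n_slots.items():
            # Multinomial without replacement: k distinct players per position.
            team[pos] = list(rng.choice(len(p[pos]), size=k,
                                        replace=False, p=p[pos]))
        if stack:
            # Replace the first WR with the main WR of the sampled QB's team.
            wr_star = main_wr_of_qb[team["QB"][0]]
            if wr_star not in team["WR"]:
                team["WR"][0] = wr_star
        # A full implementation would also reject rosters violating the
        # feasibility/budget constraints (w in W, c'w >= B_lb).
        portfolios.append(team)
    return p, portfolios

# Toy map from QB index to that team's main WR index.
p, opponents = sample_opponent_portfolios(
    O=5, q_stack=0.3, main_wr_of_qb={0: 0, 1: 1, 2: 2}, rng=rng)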

A.1.2 Efficient Sampling of Order Statistic Moments

Monte Carlo simulation is required to generate samples of (δ, G^(r')), which are required to:

1. Estimate the input parameters (µ_{G^(r')}, σ_{δ,G^(r')}) that are required by the various algorithms in Sections 1.4, 1.5 and 1.7.

2. Estimate the expected payoff from a given entry in the various algorithms, e.g. line 4 in Algorithm 2.

3. Estimate the P&L distribution for a given portfolio of entries using samples of (δ, G^(r')).

Recalling G_o = w_o^T δ is the fantasy points score of the o-th opponent, we first note the G_o's, o = 1, ..., O, are IID given (δ, p) where p denotes the multinomial probability vectors for the positional marginals as discussed in Section 1.3. This then suggests the following algorithm for obtaining independent samples of (δ, G^(r')):

1. Generate δ ∼ N(µ_δ, Σ_δ)³ and (p, W_op) using Algorithm 7 where p := (p^QB, ..., p^D) and W_op = {w_o}_{o=1}^O.

2. Compute G_o := w_o^T δ for o = 1, ..., O.

3. Order the G_o's.

4. Return (δ, G^(r')).
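A small self-contained Python sketch of this brute-force sampler follows. The toy player means and covariance (mu, Sigma) and the stand-in opponent generator are illustrative assumptions, with Algorithm 7 abbreviated to a random three-player roster per opponent.

import numpy as np

rng = np.random.default_rng(1)
P = 8                                  # number of players (toy)
mu, Sigma = np.zeros(P), np.eye(P)     # mean / covariance of player scores delta

def sample_delta_and_order_stat(O, r_prime, rng):
    """Return one sample of (delta, G^(r')) by brute-force simulation."""
    delta = rng.multivariate_normal(mu, Sigma)          # step 1: delta ~ N(mu, Sigma)
    # Step 1 (cont.): opponent rosters as 0/1 vectors; here we simply pick
    # 3 random players per opponent as a stand-in for Algorithm 7.
    W_op = np.zeros((O, P))
    for o in range(O):
        W_op[o, rng.choice(P, size=3, replace=False)] = 1.0
    G = W_op @ delta                                    # step 2: opponent totals
    G_sorted = np.sort(G)                               # step 3: order them
    return delta, G_sorted[r_prime - 1]                 # step 4: r'-th order statistic

delta, G_r = sample_delta_and_order_stat(O=1000, r_prime=900, rng=rng)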

While all of the contests that we participated in had relatively large values of O, it is worth noting there are also some very interesting DFS contests with small values of O that may range4 in value from O = 1 to O = 1000. These small-O contests often have very high entry fees with correspond- ingly high payoffs and there is therefore considerable interest in them. At this point we simply note that (based on unreported numerical experiments) the algorithm described above seems quite adequate for handling small-O contests. Of course, if we planned to participate in small-O contests and also be able to quickly respond to developing news in the hours and minutes before the games, then it may well be necessary to develop a more efficient Monte Carlo algorithm. This of course is also true for the large-O algorithm we develop below.

³We note that the normal assumption for δ is not necessary and any multivariate distribution with mean vector µ_δ and variance-covariance matrix Σ_δ could also be used.
⁴All of the contests that we participated in during the 2017 NFL season had values of O that exceeded 8,000. That said, the cutoff between small O and large O is entirely subjective and indeed we could also add a third category – namely moderate-O contests. These contests might refer to contests with values of O ranging from O = 500 to O = 5,000.

Efficient Monte Carlo when O is Large

When O is large, e.g. when O = 500,000 which is often the case in practice, the algorithm above is too computationally expensive and so a more efficient algorithm is required. Recalling that the conditional random variables G_o | (δ, p) are IID for o = 1, ..., O, it follows (David and Nagaraja 2004) from the theory of order statistics that G^(r') | (δ, p) satisfies

G^(qO) | (δ, p) →_p F^{-1}_{G|(δ,p)}(q) as O → ∞    (A.1)

where q ∈ (0, 1) and "→_p" denotes convergence in probability. In large-O contests, we can use the result in (A.1) by simply setting G^(r') = F^{-1}_{G|(δ,p)}(r'/O). Of course in practice we do not know the CDF F_{G|(δ,p)} and so we will have to estimate it as part of our algorithm. The key observation now is that even if the DFS contest in question has say 500,000 contestants, we can estimate F_{G|(δ,p)} with potentially far fewer samples. Our algorithm for generating Monte Carlo samples of (δ, G^(r')) therefore proceeds as follows:

1. Generate δ ∼ N(µ_δ, Σ_δ) and (p, W_op) using Algorithm 7 (or the stacking variant of it) where p := (p^QB, ..., p^D) and W_op = {w_o}_{o=1}^O.

2. Compute G_o := w_o^T δ for o = 1, ..., O.

3. Use the G_o's to construct F̂_{G|(δ,p)}(·).

4. Set G^(r') = F̂^{-1}_{G|(δ,p)}(r'/O).

5. Return (δ, G^(r')).

Note in this algorithm O now represents the number of Monte Carlo samples we use to estimate F_{G|(δ,p)} rather than the number of contestants in a given DFS contest. One issue still remains with this new algorithm, however. Consider for example the case where r' = O + N − 1 (corresponding to the #1 ranked opponent) in a top-heavy contest with say 100,000 contestants. This corresponds to the quantile q = 1 − 10⁻⁵ and according to line 4 of the algorithm we can generate a sample of G^(O+N−1) by setting it equal to F̂^{-1}_{G|(δ,p)}(1 − 10⁻⁵). We cannot hope to estimate F̂^{-1}_{G|(δ,p)}(1 − 10⁻⁵) with just a moderate number of samples from line 2 of the algorithm, however, and this of course also applies to the values of r' corresponding to the #2 ranked opponent, the #3 ranked opponent etc.

We overcome this challenge as follows. We set O to a moderate value, e.g. O = 10,000, and then estimate the conditional CDF F̂_{G|(δ,p)}(·) with the empirical CDF of those O samples from line 2 of the algorithm. For r' values that are not deep in the tail, we use F̂^{-1}_{G|(δ,p)}(·) to sample G^(r'). For r' values that are deep in the right tail (corresponding to the largest payoffs), however, we will use an approximation based on the normal distribution. Specifically, we choose the mean and variance of the normal distribution so that it has the same 99.0th and 99.5th percentiles as F̂_{G|(δ,p)}(·); see Cook 2010. We then use this normal distribution in place of F̂ in line 4 of the algorithm for values of r' that correspond to extreme percentiles.

Further efficiencies were obtained through the use of splitting. The high-level idea behind splitting is as follows. If a system is dependent on two random variables and it takes more time to sample the second variable but the first variable influences the system more, then one should generate multiple samples of the first variable for each sample of the second variable. In our context, W_op takes more time to sample but δ appears to influence G^(r') more. Accordingly, in our experiments we implemented splitting with a ratio of 50:1⁵ so that for each sample of W_op we generated 50 samples of δ.
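The tail-handling step can be sketched in Python as follows. This helper is an illustration under simplifying assumptions (a single vector of simulated opponent totals standing in for the G_o's given (δ, p)): it returns the empirical quantile in the bulk of the distribution and switches to a normal matched to the 99.0th and 99.5th empirical percentiles for deep right-tail quantiles.

import numpy as np
from scipy.stats import norm

def order_stat_from_samples(G_samples, q, tail_threshold=0.99):
    """Approximate the q-quantile of G | (delta, p) from Monte Carlo samples,
    switching to a percentile-matched normal for deep right-tail quantiles."""
    if q <= tail_threshold:
        # Bulk of the distribution: the empirical quantile is adequate.
        return np.quantile(G_samples, q)
    # Deep tail: fit a normal with the same 99.0th and 99.5th percentiles as
    # the empirical CDF (Cook 2010) and read the extreme quantile off it.
    x1, x2 = np.quantile(G_samples, [0.990, 0.995])
    z1, z2 = norm.ppf(0.990), norm.ppf(0.995)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - sigma * z1
    return mu + sigma * norm.ppf(q)

# Example: a contest with 100,000 opponents where the CDF is estimated from
# only 10,000 simulated opponent totals, extrapolated to the top rank.
rng = np.random.default_rng(2)
G_samples = rng.gamma(shape=20.0, scale=5.0, size=10_000)   # stand-in for the G_o's
O_contest = 100_000
q_top = 1 - 1.0 / O_contest            # quantile of the top-ranked opponent
print(order_stat_from_samples(G_samples, q_top))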

A.1.3 Additional Details for the Double-Up Problem Formulation

Accounting for a Skew in the Distribution of Y_w in Double-Up Contests

One difficulty that might arise with the mean-variance approach is if the distribution of the Y_w's displays a significant skew. While we have seen no evidence⁶ of this when δ is assumed multivariate normally distributed, we might see such a skew if we assumed a distribution for δ which also had a skew. A significant skew in the Y_w's could then result in us mistakenly seeking a portfolio with a small variance or vice versa. To see this, consider a double-up contest where the top 50% of contestants earn a reward. If the Y_w's display a significant right skew, then their medians will be less than their means. It's then possible there exists a w such that µ_{Y_w} ≥ 0 but that median(Y_{w'}) < 0 for all w' ∈ W. In that event, the condition of the if statement on line 1 of Algorithm 1 will be satisfied and so we end up seeking a team with a large mean and a small variance. It's possible, however, that we should be seeking a team with a large mean and a large variance since median(Y_{w'}) < 0 for all w' ∈ W. Note that such a mistake might occur because the reward cutoff of the contest is determined by the median and not the mean, which is what we use in Algorithm 1. Of course this issue doesn't arise if the Y_w's are normal since then the means and medians coincide. An easy solution to this problem is to simply ignore the if-else statements in Algorithm 1 and consider both possibilities. This results in Algorithm 8.

⁵We empirically tested several split ratios and found a ratio of 50:1 to perform best. See Chapter V of Asmussen and Glynn 2007 for further details on splitting.
⁶In unreported experiments, we found Y_w to be unimodal and very well approximated by a normal distribution for a test-set of w's when δ was also normally distributed.

Algorithm 8 Adjusted Optimization for the Double-Up Problem with a Single Entry

Require: W, Λ, µ_δ, Σ_δ, µ_{G^(r')}, σ²_{G^(r')}, σ_{δ,G^(r')} and Monte Carlo samples of (δ, G^(r'))
1: for all λ ∈ Λ
2:   w_λ− = argmax_{w ∈ W, µ_{Y_w} ≥ 0} { µ_{Y_w} − λ σ²_{Y_w} }
3:   w_λ+ = argmax_{w ∈ W} { µ_{Y_w} + λ σ²_{Y_w} }
4: end for
5: return w* = argmax_{w ∈ {w_λ−, w_λ+ : λ ∈ Λ}} P{Y_w > 0}
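A schematic Python version of Algorithm 8 is given below. It is only a sketch: the thesis solves each inner argmax as a binary quadratic program over W, whereas here we simply enumerate a small illustrative candidate set, and the inputs mu_Y, var_Y and the scenario matrix Y_samples are assumed to have been estimated beforehand (e.g., from the Monte Carlo samples of (δ, G^(r'))).

import numpy as np

def adjusted_double_up(candidates, mu_Y, var_Y, Y_samples, lambdas):
    """Algorithm 8 sketch: sweep lambda over both signs of the variance term and
    keep the candidate maximizing the estimated probability of a positive P&L.
    candidates : list of lineup labels
    mu_Y, var_Y: arrays of E[Y_w] and Var(Y_w) per candidate
    Y_samples  : (num_candidates, num_scenarios) Monte Carlo draws of Y_w
    """
    shortlisted = set()
    for lam in lambdas:
        nonneg = np.where(mu_Y >= 0)[0]
        if nonneg.size:                              # line 2: mean minus lambda * variance
            shortlisted.add(nonneg[np.argmax(mu_Y[nonneg] - lam * var_Y[nonneg])])
        shortlisted.add(int(np.argmax(mu_Y + lam * var_Y)))   # line 3: mean plus lambda * variance
    # Line 5: among shortlisted lineups, pick the one with the highest P(Y_w > 0).
    win_prob = (Y_samples > 0).mean(axis=1)
    best = max(shortlisted, key=lambda i: win_prob[i])
    return candidates[best]

# Toy usage with three candidate lineups and simulated P&L scenarios.
rng = np.random.default_rng(3)
mu_Y = np.array([1.0, -0.5, 2.0])
var_Y = np.array([4.0, 9.0, 25.0])
Y_samples = rng.normal(mu_Y[:, None], np.sqrt(var_Y)[:, None], size=(3, 10_000))
print(adjusted_double_up(["w1", "w2", "w3"], mu_Y, var_Y, Y_samples, lambdas=[0.0, 0.05, 0.1]))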

The Double-Up Problem with Multiple Entries

Here we consider the problem of submitting a fixed but finite number of N > 1 entries to the double-up contest. In the case of a risk-neutral DFS player, it seems intuitively clear that if O → ∞

so that N/O → 0, then a replication strategy is an optimal or near-optimal strategy. In particular, under such a replication strategy, the DFS player should submit N copies of w*, which is defined as follows. Consider a double-up style contest in which we have purchased N entries⁷ and let R(W) denote the expected reward function when we submit the portfolio of entries W := {w_i}_{i=1}^{|W|}. That is,

R(W) := Σ_{i=1}^{|W|} P{ w_i^T δ > G_{-i}^{(r')}(W_{-i}, W_op, δ) }    (A.2)

where r' := O + |W| − r, G_{-i}^{(r)} is the r-th order statistic of {G_o}_{o=1}^O ∪ {F_j}_{j=1}^{|W|} \ F_i and W_{-i} := W \ w_i. The entry that we replicate is defined as

w* := argmax_{w ∈ W} R(w)    (A.3)

where it follows from (A.2) that

R(w) = P{ w^T δ > G^{(r')}(W_op, δ) }

with r' = O + 1 − r. To gain some intuition for our advocacy of the replication strategy where we submit N copies of w* to the double-up contest, consider such a contest where the top 50% of entries double their money and let w* be the optimal entry as defined in (A.3). Now consider the |W| = 2 case with O large and suppose we submit two copies of w*. Given the optimality of w*, submitting two copies of it can only be suboptimal to the extent that w* is at or near the boundary cutoff G^(r'). But this event will (in general) occur with vanishingly small probability in the limit as O → ∞. Even when O is not large, we suspect the replication strategy will still be close to optimal. While we can derive conditions guaranteeing the optimality or near-optimality of replication for double-up contests, these conditions are not easily expressed in terms of the observable parameters of the contest. This is not surprising since we know the double-up payoff structure can be viewed as a special case of the top-heavy payoff structure and certainly replication in general is not optimal for top-heavy contests. There is a simple test we can deploy, however, to check if replication is indeed optimal in any given double-up style contest. To see this, let R(N × w*) denote the expected reward when we replicate w* N times. Using Monte-Carlo, we can easily check to see whether or not R(N × w*) ≈ N R(w*). If this is the case, then we know that replication of w* is near-optimal because it must be the case that the expected reward of any portfolio of N entries is less than or equal to N R(w*). We formally state these observations regarding a "certificate-of-optimality" as a proposition.

⁷We assume we have already purchased the N entries and hence the cost associated with the purchase of these N entries can be viewed as a sunk cost. However, we allow ourselves to submit less than (or equal to) N entries, i.e., |W| ≤ N with the understanding that if |W| < N, then we simply "waste" N − |W| entries. As such the rank r that determines the cutoff between winning and not winning in double-up style contests (as discussed near the beginning of Section 1.2.1) is constant and does not depend on the number |W| of entries that we actually submit. Note that we also need to allow for the possibility of submitting less than N entries when we discuss the submodularity of the top-heavy objective function in Appendix A.1.4.

Proposition A.1. Consider w∗ as defined in (A.3) and denote by N × w∗ the portfolio consisting

of N replications of w*. Define d := N R(w*) − R(N × w*) where R(·) is as defined in (A.2). Finally, denote by W^♯ = {w_i^♯}_{i=1}^N an optimal portfolio of N entries for the double-up contest, i.e.,

W^♯ := argmax_{W ∈ W^N} R(W).

Then, the following statements hold:

(a) [Suboptimality bound]. The suboptimality of N × w∗ is bounded above by d, i.e.,

R(W^♯) − R(N × w*) ≤ d

(b) [Certificate-of-optimality]. If d equals 0, then N × w* is optimal, i.e., replication is optimal.

Proof. To prove the suboptimality bound (a), it suffices to show that R(W^♯) ≤ N R(w*), which holds since R(W^♯) ≤ R(w_1^♯) + ... + R(w_N^♯) ≤ N R(w*). The second inequality follows from the optimality of w* (see (A.3)). The first inequality holds since for an arbitrary portfolio W := {w_i}_{i=1}^N of N entries, we have R(W) ≤ Σ_{i=1}^N R(w_i). To see this, recall that the expected reward R(·) is an expectation over (δ, W_op) so consider an arbitrary (δ, W_op) realization and denote by R_{δ,W_op}(·) the reward function conditional on the realization (δ, W_op). Using I{·} to denote the indicator function, it follows from the definition of R(·) in (A.2) that

R_{δ,W_op}(W) = Σ_{i=1}^N I{ w_i^T δ > G_{-i}^{(O+N−r)}(W_{-i}, W_op, δ) }    (A.4)
            ≤ Σ_{i=1}^N I{ w_i^T δ > G^{(O+1−r)}(W_op, δ) }    (A.5)
            = Σ_{i=1}^N R_{δ,W_op}(w_i)

where the inequality holds since G_{-i}^{(O+N−r)}(W_{-i}, W_op, δ) ≥ G^{(O+1−r)}(W_op, δ) for all i ∈ {1, ..., N}. Taking an expectation over (δ, W_op) allows us to conclude R(W) ≤ Σ_{i=1}^N R(w_i).

To prove the certificate-of-optimality (b), we note that it follows directly from the suboptimality bound since d = 0 implies

0 ≤ R(W^♯) − R(N × w*) ≤ d = 0,

where the first inequality holds due to the optimality of W^♯. □

Part (b) of Proposition A.1 allows us to check if replication is indeed optimal in any given double-up contest. Indeed in the various numerical experiments of Section 1.6 we found (modulo the statistical noise arising from our Monte-Carlo simulations) that replication was optimal for all of our double-up contests.
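The certificate-of-optimality check is easy to carry out by simulation. The following Python sketch estimates R(w*) and R(N × w*) under illustrative normal score distributions; the expected_reward helper, the $1 payoff per winning entry, and the toy parameters are assumptions for illustration only. A value of d close to zero then certifies that replication is (near-)optimal.

import numpy as np

def expected_reward(scores_W, scores_opp, r):
    """Monte Carlo estimate of R(W) for a double-up contest paying the top r of
    O + |W| entries (reward 1 per winning entry), following (A.2).
    scores_W   : (num_scenarios, N) simulated points of our entries
    scores_opp : (num_scenarios, O) simulated points of the opponents
    """
    S, N = scores_W.shape
    O = scores_opp.shape[1]
    total = 0.0
    for s in range(S):
        for i in range(N):
            others = np.concatenate((scores_opp[s], np.delete(scores_W[s], i)))
            cutoff = np.sort(others)[O + N - r - 1]     # (O+N-r)-th order statistic
            total += scores_W[s, i] > cutoff
    return total / S

# Certificate of optimality: compare N * R(w*) with R(N copies of w*).
rng = np.random.default_rng(4)
S, O, N, r = 2_000, 500, 5, 252                          # r ~ top 50% of O + N entries
F_star = rng.normal(110, 20, size=S)                     # points of w* per scenario
G_opp = rng.normal(100, 20, size=(S, O))                 # opponent points per scenario
R_single = expected_reward(F_star[:, None], G_opp, r)
R_replicated = expected_reward(np.repeat(F_star[:, None], N, axis=1), G_opp, r)
d = N * R_single - R_replicated
print(R_single, R_replicated, d)                         # small d certifies replication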

A.1.4 Details for the Top-Heavy Problem Formulation

Justification of Assumptions 1.1 and 1.2

Assumption 1.1 can be interpreted as stating that, in expectation, the points total of our optimal portfolio will not be sufficient to achieve the minimum payout R_D. In option-pricing terminology, we are therefore assuming our optimal portfolio is "out-of-the-money". This is a very reasonable assumption to make for top-heavy contests where it is often the case that only the top 20% or so of entries earn a cash payout. In numerical experiments, our model often predicts that our optimal portfolio will (in expectation) be at or around the top 20th percentile. The assumption therefore may break down if payoffs extend beyond the top 20% of entries. Nonetheless, the payoff sizes around the 20th percentile are very small and almost negligible. Indeed within our model, most of the expected profit and loss (P&L) comes from the top few percentiles and µ_{Y_w^d} < 0 is certainly true for these values of d. Finally, we note the well-known general tendency of models to over-estimate the performance of an optimally chosen quantity (in this case our portfolio). We therefore

anticipate that our optimal portfolio will not quite achieve (in expectation) the top 20th percentile and may well be out of the money for all payoff percentiles as assumed in Assumption 1.1.

Proof of Proposition 1.2. First note that the number of feasible lineups is finite and so any w_o which is selected with strictly positive probability will be chosen infinitely often as O → ∞. In particular, the top team will be chosen infinitely often and so it follows that the top D teams will be identical for any finite D and any realization of (δ, p). It therefore follows that conditioned on (δ, p), G^{(r'_d)} = G^{(r'_{d'})} w.p. 1 in the limit as O → ∞. (1.16) will then follow from a simple interchange of limit and expectation, which can easily be justified assuming δ is integrable. □

In many of the contests we participated in, we saw values of O ≈ 200,000 which, while large, is actually quite small relative to the total number of feasible lineups. As such, we do not expect to see the top D teams being identical in practice or even to see much if any repetition among them. Nonetheless, we do expect to see sizeable overlaps in these top teams, especially for the very highest ranks, which are our ultimate target given the lop-sided reward structure of typical top-heavy contests. It was no surprise then that in our numerical experiments we observed a very weak dependence of Cov(δ_p, G^{(r'_d)}) on d as stated earlier.

Submodularity of Top-Heavy Objective Function

We first discuss here the practicality of Assumption 1.3, which assumes ∆k ≥ ∆k+1 for all

k = 1, ..., K − 2 where ∆_k := V_k − V_{k+1} and V_k denotes the payoff corresponding to the rank-k entry. We first give an example of a simple top-heavy payoff structure that satisfies this assumption and then discuss how the payoff structure of real-world top-heavy contests compares with our simple structure. To gain intuition regarding what type of payoff structure satisfies Assumption 1.3, consider the payoff structure in which V_k = τ × V_{k+1} for all k = 1, ..., K − 1 with the parameter τ > 1. Then, it follows trivially that ∆_k ≥ ∆_{k+1} for all k = 1, ..., K − 2. Furthermore, observe that one can make the parameter τ specific to rank, i.e., V_k = τ_k × V_{k+1}, as long as τ_k > 1 for all k = 1, ..., K − 1. Connecting this simple payoff structure to the real-world top-heavy contests we participated in, we show in Figure A.1 the payoffs corresponding to the top few ranks and observe that the "convex"

shape clearly demonstrates that the payoff structure satisfies τk > 1 for all k shown.


Figure A.1: Payoffs corresponding to ranks 1 to 8 of the week 10 top-heavy contest we participated in during the 2017 NFL season. We note the payoff structure of the top-heavy contests in other weeks were very similar and we do not show them here for the sake of brevity.

We note however that not all ranks satisfy Assumption 1.3. For example, in the contest corresponding to Figure A.1, both ranks 999 and 1000 had a payoff of $10 each whereas rank 1001 had a payoff of $8. Hence, ∆_999 = 0 < 2 = ∆_1000, which violates Assumption 1.3. But it is worth mentioning that Assumption 1.3 holds for the top few ranks where most of the action in top-heavy contests lies. Furthermore, as discussed in Section 1.5.1, though our proposed approach (Algorithm 3) is motivated by the submodularity considerations, it does not necessarily enjoy the classical "1 − 1/e" theoretical guarantee since finding an entry with the highest value-add is non-trivial. Recognizing this, we provide a bound (Proposition 1.3) in Section 1.5.1 on the degree of suboptimality of our approach and use it to evaluate the performance of our approach on real-world data. This bound does not require Assumption 1.3. We now recall the definition of a submodular function (Schrijver 2003).

Definition A.1 (Submodular function). Let Ω be a finite set and suppose f : 2^Ω → R is a function where 2^Ω denotes the power set of Ω. Suppose f(X ∪ {x_1}) + f(X ∪ {x_2}) ≥ f(X ∪ {x_1, x_2}) + f(X) for every X ⊆ Ω and every x_1, x_2 ∈ Ω \ X such that x_1 ≠ x_2. Then, f is submodular.

We also need the following definition.

Definition A.2 (Monotonic function). A function f : 2^Ω → R is monotonic if f(X) ≤ f(Y) whenever X ⊆ Y.

A function that is both monotonic and submodular is called a monotone submodular function. We now state the proof of Theorem 1.1, which asserts that the objective function of the top-heavy DFS problem with N entries, denoted by R(W), is monotone submodular in W when Assumption 1.3 holds.

Proof of Theorem 1.1. Consider a top-heavy style contest and let R(W) denote the expected reward function when we submit the portfolio of entries W := {w_i}_{i=1}^n. Recall from (1.17) the definition

R(W) := Σ_{i=1}^{n} Σ_{d=1}^{D} (R_d − R_{d+1}) P{ w_i^T δ > G_{-i}^{(r'_d)}(W_{-i}, W_op, δ) },

where r'_d := O + n − r_d, G_{-i}^{(r)} is the r-th order statistic of {G_o}_{o=1}^O ∪ {F_j}_{j=1}^{n} \ F_i and W_{-i} := W \ w_i.

We first show that R(W) is monotonic increasing in W. Note that the expected reward R(·) is an expectation over (δ, W_op) so consider an arbitrary (δ, W_op) realization and two arbitrary and feasible portfolio choices W_1 and W_2 such that W_1 ⊆ W_2. It is easy to see that

R_{δ,W_op}(W_1) ≤ R_{δ,W_op}(W_2)

where R_{δ,W_op}(·) denotes the reward function conditional on the realization (δ, W_op). Taking an expectation over (δ, W_op) allows us to conclude that R(W) is monotonic increasing.

Second, we show the submodularity of R(W) with respect to W. Consider an arbitrary (δ, W_op) realization, an arbitrary feasible portfolio choice W := {w_i}_{i=1}^n, and two arbitrary feasible entries x_1 and x_2. For all i = 1, ..., n, denote by F_i := w_i^T δ the fantasy points of entry w_i and by F̂_1 := x_1^T δ and F̂_2 := x_2^T δ the fantasy points of entries x_1 and x_2, respectively. As explained in Remark A.1 below, assume without loss of generality that all of the values in the collection {F_1, ..., F_n, F̂_1, F̂_2} are unique. Without loss of generality we can assume the entries are ranked according to

F_1 > F_2 > ... > F_{n_0} > F̂_1 > F_{n_0+1} > ... > F_{n_0+n_1} > F̂_2 > F_{n_0+n_1+1} > ... > F_{n_0+n_1+n_2}    (A.6)

where the three braced groups in (A.6) are W_0 := {w_i}_{i=1}^{n_0}, W_1 := {w_{n_0+i}}_{i=1}^{n_1} and W_2 := {w_{n_0+n_1+i}}_{i=1}^{n_2},

with W partitioned into {W0, W1, W2} as shown in (A.6) with n0 := |W0|, n1 := |W1|, n2 := |W2|, and n0 + n1 + n2 = n trivially. Now, for any entry w ∈ W ∪ {x1, x2}, define ρ(w) as the rank of w in

a contest in which the entries {W_op, W, x_1, x_2} are submitted. Recalling that V_k denotes the payoff corresponding to rank k, the reward function conditioned on (δ, W_op) can be expressed as follows:

R_{δ,W_op}(W ∪ {x_1, x_2}) = Σ_{w∈W_0} V_{ρ(w)} + Σ_{w∈W_1} V_{ρ(w)} + Σ_{w∈W_2} V_{ρ(w)} + V_{ρ(x_1)} + V_{ρ(x_2)}    (A.7a)
R_{δ,W_op}(W) = Σ_{w∈W_0} V_{ρ(w)} + Σ_{w∈W_1} V_{ρ(w)−1} + Σ_{w∈W_2} V_{ρ(w)−2}    (A.7b)
R_{δ,W_op}(W ∪ {x_1}) = Σ_{w∈W_0} V_{ρ(w)} + Σ_{w∈W_1} V_{ρ(w)} + Σ_{w∈W_2} V_{ρ(w)−1} + V_{ρ(x_1)}    (A.7c)
R_{δ,W_op}(W ∪ {x_2}) = Σ_{w∈W_0} V_{ρ(w)} + Σ_{w∈W_1} V_{ρ(w)−1} + Σ_{w∈W_2} V_{ρ(w)−1} + V_{ρ(x_2)−1}.    (A.7d)

Following Definition A.1 we must show

R_{δ,W_op}(W ∪ {x_1}) + R_{δ,W_op}(W ∪ {x_2}) ≥ R_{δ,W_op}(W ∪ {x_1, x_2}) + R_{δ,W_op}(W)    (A.8)

in order to establish the submodularity of R_{δ,W_op}(·). Using (A.7a) to (A.7d), we can see that (A.8) is equivalent to

Σ_{w∈W_2} V_{ρ(w)−1} + Σ_{w∈W_2} V_{ρ(w)−1} + V_{ρ(x_2)−1} ≥ Σ_{w∈W_2} V_{ρ(w)−2} + Σ_{w∈W_2} V_{ρ(w)} + V_{ρ(x_2)}    (A.9)

⇔ (V_{ρ(x_2)−1} − V_{ρ(x_2)}) + Σ_{i=1}^{n_2} (V_{ρ(w_{n_0+n_1+i})−1} − V_{ρ(w_{n_0+n_1+i})}) ≥ Σ_{i=1}^{n_2} (V_{ρ(w_{n_0+n_1+i})−2} − V_{ρ(w_{n_0+n_1+i})−1}),

i.e., ∆_{ρ(x_2)−1} + Σ_{i=1}^{n_2} ∆_{ρ(w_{n_0+n_1+i})−1} ≥ Σ_{i=1}^{n_2} ∆_{ρ(w_{n_0+n_1+i})−2}

⇔ ∆_{ρ(x_2)−1} + Σ_{i=1}^{n_2−1} ∆_{ρ(w_{n_0+n_1+i})−1} + ∆_{ρ(w_n)−1} ≥ ∆_{ρ(w_{n_0+n_1+1})−2} + Σ_{i=2}^{n_2} ∆_{ρ(w_{n_0+n_1+i})−2}

⇔ [∆_{ρ(x_2)−1} − ∆_{ρ(w_{n_0+n_1+1})−2}]_(⋆) + Σ_{i=1}^{n_2−1} [∆_{ρ(w_{n_0+n_1+i})−1} − ∆_{ρ(w_{n_0+n_1+i+1})−2}]_(†) + [∆_{ρ(w_n)−1}]_(⋄) ≥ 0.    (A.10)

But (A.10) is true because:

• (⋆) ≥ 0 since ρ(x_2) ≤ ρ(w_{n_0+n_1+1}) − 1 and ∆_k − ∆_ℓ ≥ 0 for k ≤ ℓ by Assumption 1.3.

• (†) ≥ 0 since ρ(w_{n_0+n_1+i}) − 1 ≤ ρ(w_{n_0+n_1+i+1}) − 2 for all i = 1, ..., n_2 − 1 and ∆_k − ∆_ℓ ≥ 0 for k ≤ ℓ again by Assumption 1.3.

• (⋄) ≥ 0 since ∆_k ≥ 0 for all k since by assumption the payoffs are non-increasing in k.

It therefore follows that R_{δ,W_op} is submodular for each (δ, W_op) realization. Taking expectation over (δ, W_op) then yields the submodularity of R(W). □

Remark A.1. While establishing the submodularity in the proof of Theorem 1.1, we assumed that each of the values in {F_1, ..., F_n, F̂_1, F̂_2} was unique. This assumption is without loss of generality. To see this, suppose F_1 = F_2 > F_3 and hence the entries w_1 and w_2 have the same rank, say k. Then the decision-maker will earn a payoff of (V_k + V_{k+1})/2 from F_1 and F_2 and hence, earn a total payoff of V_k + V_{k+1} from these two entries. However, the decision-maker also earns a total payoff of V_k + V_{k+1} from these two entries if we re-define F_2 to equal F_1 − ε for ε sufficiently small so that the rank of w_2 equals k + 1, the rank of w_1 remains k and the ranks (and hence, rewards) of the other entries {w_3, ..., w_n, x_1, x_2} remain unchanged. More generally, if ℓ values are equal, say F_1 = F_2 = ... = F_ℓ, then we can re-define F_i ← F_1 − (i − 1) × ε for all i = 2, ..., ℓ where ε is again sufficiently small. We can extend this reasoning to scenarios in which there are multiple "blocks" of equal values. In particular, we can adjust them so that there are no ties but the total payoff to the DFS portfolio remains unchanged.

Remark A.2. Theorem 1.1 states that the top-heavy objective R(·) is submodular under the "convexity" assumption (Assumption 1.3), i.e. if ∆_k ≥ ∆_{k+1} for all k. In fact it is even easier (and perhaps surprising) to see that R(·) is also submodular under the "concavity" assumption, i.e., ∆_k ≤ ∆_{k+1} for all k. To see this, note that (A.9) is equivalent to

Σ_{w∈W_2} (V_{ρ(w)−1} − V_{ρ(w)}) + (V_{ρ(x_2)−1} − V_{ρ(x_2)}) ≥ Σ_{w∈W_2} (V_{ρ(w)−2} − V_{ρ(w)−1}),

that is, Σ_{w∈W_2} ∆_{ρ(w)−1} + ∆_{ρ(x_2)−1} ≥ Σ_{w∈W_2} ∆_{ρ(w)−2} with ∆_{ρ(x_2)−1} ≥ 0. Hence, submodularity follows even if ∆_k ≤ ∆_{k+1} for all k. We chose to focus on Assumption 1.3, however, as the payoff structures in real-world top-heavy contests are more convex in nature as discussed around Figure A.1.

Proof of Proposition 1.3

Proof of Proposition 1.3. We have R(W) ≤ R(W^♯) ≤ R(w_1^♯) + ··· + R(w_N^♯) ≤ N R(w*) where the first inequality follows from the optimality of W^♯ for the N-entry problem, the second inequality holds due to reasons discussed below, and the third inequality follows from the optimality of w*. Dividing across by R(W) yields 1 ≤ R(W^♯)/R(W) ≤ N R(w*)/R(W) = 1/ν_W from which the result follows.

The second inequality holds since for an arbitrary portfolio W := {w_i}_{i=1}^N of N entries, we have R(W) ≤ Σ_{i=1}^N R(w_i). To see this, we follow the same argument used for proving part (a) of Proposition A.1. In particular, recall that the expected reward R(·) is an expectation over (δ, W_op) so consider an arbitrary (δ, W_op) realization and denote by R_{δ,W_op}(·) the reward function conditional on the realization (δ, W_op). Using I{·} to denote the indicator function, it follows from the definition of R(·) in (1.17) that

R_{δ,W_op}(W) = Σ_{i=1}^{N} Σ_{d=1}^{D} (R_d − R_{d+1}) I{ w_i^T δ > G_{-i}^{(O+N−r_d)}(W_{-i}, W_op, δ) }
            ≤ Σ_{i=1}^{N} Σ_{d=1}^{D} (R_d − R_{d+1}) I{ w_i^T δ > G^{(O+1−r_d)}(W_op, δ) }
            = Σ_{i=1}^{N} R_{δ,W_op}(w_i),

where the inequality holds since G_{-i}^{(O+N−r_d)}(W_{-i}, W_op, δ) ≥ G^{(O+1−r_d)}(W_op, δ) for all i ∈ {1, ..., N} and for all d ∈ {1, ..., D}. Taking an expectation over (δ, W_op) allows us to conclude R(W) ≤ Σ_{i=1}^N R(w_i). □

Why Skill Matters More for Top-Heavy Contests

In the numerical experiments that we reported in Section 1.6, the performance of our strategic model was better in top-heavy contests than in double-up contests. While we would be reluctant to draw too many conclusions from this observation given the high variance of NFL games and the relatively few games in an NFL season, we do nonetheless believe skill is more important for top- heavy contests than for double-up contests. In fact, our numerical experiments also point to this. In particular, we report the optimal values of λ in Table A.3 in Appendix A.3.3 for the top-heavy, double-up and quintuple-up contests of Section 1.6. We see there that the top-heavy contests have a considerably higher value of λ∗ than the double-up and quintuple-up contests whose values of λ∗ are close to zero. This points to the fact that variance is important for top-heavy contests and that it plays a much smaller role for double-up contests. Moreover, because the variance of the fantasy

points total includes the covariance term −2 w^T σ_{δ,G^{(r'_d)}} (see (1.14) for example), we know that our ability to estimate σ_{δ,G^{(r'_d)}} is very important in determining the optimal entry w* for top-heavy contests. This was not the case for the double-up or quintuple-up contests we played, however, since λ* was close to zero for them. This then can be viewed as a "structural" explanation for why top-heavy contests are more amenable to skill than double-up contests. (Of course, if the cutoff point for rewards in double-up contests was very high, e.g. the top 5% or 1% of entries, then we'd expect variance to start playing a more important role for these contests as well.)

A.2 Parimutuel Betting Markets and Their Relation to DFS

In this appendix, we consider the setting of parimutuel betting markets, which can be viewed as a special case of our top-heavy DFS contests. Parimutuel betting is widespread in the horse-racing industry and has often been studied in the economics literature (Bayraktar and Munk 2017; Plott et al. 2003; Terrell and Farmer 1996; Thaler and Ziemba 1988) with the goal of studying the efficient markets hypothesis and the investment behavior of individuals in a simple and well-defined real-world setting. Our goal here is to use the simplified setting of parimutuel betting to gain some insight into the structure of the optimal strategy for constructing multiple entries in a top-heavy DFS contest. The results we establish here are straightforward to obtain but are new to the best of our knowledge.

Consider then a horse race where there are H horses running and where there will be a single winner so there is no possibility of a tie. Each wager is for $1 and we have $N to wager. We

let n_h denote the number of wagers, i.e., dollars, that we place on horse h. It therefore follows that Σ_{h=1}^H n_h = N and we use (n_1, n_2, n_3, ..., n_H) to denote the allocation of our N wagers. We let q_h > 0 denote the probability that horse h wins the race. We assume there are a total of O wagers made by our opponents so that Σ_{h=1}^H O_h = O, where O_h is the number of opponent wagers on horse h. We assume the O_h's are deterministic and known.⁸ The total dollar value of the wagers is then O + N and w.l.o.g. we assume the cut or "vig" taken by the race-track is zero. To make clear the connection between parimutuel and DFS contests, we can equate each horse to a feasible team in DFS.

A.2.1 Parimutuel Winner-Takes-All Contests

In a parimutuel winner-takes-all (WTA) contest, the players that pick the winning horse share

the total value wagered. In particular, if horse h wins, then our winnings are (O + N)nh/(Oh + nh) so that our share is proportional to the number of wagers we placed on h. If the winning horse is picked by no one, then we assume that none of the contestants receives a payoff. This is in contrast

to the DFS setting where the reward O + N would be allocated to the highest ranked team that was submitted to the contest. This difference is quite significant and we will return to it later. For now, we note that it results in what we refer to as reward independence whereby the expected reward

we earn from our wagers on horse h does not depend on n_{h'} for any h' ≠ h.

Definition A.3. Suppose our current portfolio of wagers has n_h = k with at least one as yet "unassigned" wager. Let µ_h^{k+1} denote the expected gain we obtain from assigning this wager to horse h.

⁸We could model the $O_h$'s as being stochastic but this makes the analysis unnecessarily complicated.

Reward independence allows us to easily compute $\mu_h^{k+1}$. In particular, we obtain

\[
\mu_h^{k+1} \;=\; \underbrace{\frac{(k+1)\,q_h\,(O+N)}{O_h + k + 1}}_{\text{"after"}} \;-\; \underbrace{\frac{k\,q_h\,(O+N)}{O_h + k}}_{\text{"before"}} \;=\; \frac{q_h\,(O+N)}{O_h + k} \times \frac{O_h}{O_h + k + 1}. \qquad \text{(A.11)}
\]

It follows immediately from (A.11) that $\mu_h^k$ is strictly decreasing in k for k = 1, ..., N and for all horses h. We refer to this as the saturation effect. W.l.o.g., we assume hereafter that the horses have been sorted in decreasing order of the $\mu_h^1$'s so that $\mu_1^1 \geq \mu_2^1 \geq \cdots \geq \mu_H^1$. This ordering and the saturation effect then imply the following partial ordering of the $\mu_h^k$'s:

\[
\begin{array}{ccccccc}
\mu_1^1 & \geq & \mu_2^1 & \geq & \cdots & \geq & \mu_H^1 \\[2pt]
\geq & & \geq & & & & \geq \\[2pt]
\mu_1^2 & \geq & \mu_2^2 & \geq & \cdots & \geq & \mu_H^2 \\[2pt]
\geq & & \geq & & & & \geq \\[2pt]
\vdots & & \vdots & & & & \vdots \\[2pt]
\mu_1^N & \geq & \mu_2^N & \geq & \cdots & \geq & \mu_H^N.
\end{array}
\]

This partial ordering suggests an approach for allocating the N wagers. We start by allocating the first wager to the first horse. (This is optimal in the N = 1 case due to the presumed ordering of the horses.) We then consider allocating the second wager to either the first horse (and thereby replicating the first wager) or to the second horse. Because of the partial ordering, this must be optimal for the N = 2 case. Suppose the optimal choice was to allocate the second wager to the second horse. Then the third wager should be allocated to either one of the first two horses (thereby replicating an earlier wager) or to the third horse. In contrast, if the optimal choice for the second wager was to replicate the first wager and place it on the first horse, then only the first and second horses need be considered for the third wager. These observations all follow from the partial ordering of the expected gains and they lead immediately to Algorithm 9, which handles the case of general N. It is a greedy algorithm where each successive wager is placed on the horse

with the highest expected gain given all previous wagers.

Algorithm 9 Greedy Algorithm for Constructing a Portfolio of N Horses for Parimutuel WTA Contests
Require: {µ_h^k : 1 ≤ h ≤ H, 1 ≤ k ≤ N}, N
1: n_h = 0 for all h = 1, ..., H   % initialize
2: n_1 = 1   % assign first wager to horse 1
3: for j = 2 : N
4:   A = {(h, n_h + 1) : n_h > 0} ∪ {(h, 1) : n_h = 0, n_{h−1} > 0}   % next wager will be a replication or first horse
5:                                                                      % in ordering that has not yet been wagered upon
6:   h∗ = argmax_{h : (h,k) ∈ A} µ_h^k   % horse in A with highest expected gain
7:   n_{h∗} = n_{h∗} + 1   % fill entry j with horse h∗
8: end for
9: return (n_1, n_2, n_3, ..., n_H)
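To make the mechanics of Algorithm 9 concrete, here is a minimal Python sketch of the greedy allocation. The function and variable names are illustrative only; for simplicity it recomputes the marginal gain (A.11) directly at each step and evaluates all horses rather than only the restricted set A, which yields the same allocation under the assumed ordering of the horses.

import numpy as np

def expected_reward(q_h, O_h, n_h, total_pot):
    # Expected reward from having n_h of our wagers on a horse with win probability q_h
    # and O_h opponent wagers; zero if we have no wagers on it.
    return 0.0 if n_h == 0 else q_h * total_pot * n_h / (O_h + n_h)

def marginal_gain(q_h, O_h, k, total_pot):
    # mu_h^{k+1} in (A.11): gain from adding our (k+1)-th wager to this horse.
    return expected_reward(q_h, O_h, k + 1, total_pot) - expected_reward(q_h, O_h, k, total_pot)

def greedy_wta_portfolio(q, opp_wagers, N):
    """Greedy allocation of N unit wagers (in the spirit of Algorithm 9) for a parimutuel WTA contest."""
    q = np.asarray(q, dtype=float)
    O_h = np.asarray(opp_wagers, dtype=float)
    total_pot = O_h.sum() + N          # O + N, assuming zero track take
    n = np.zeros(len(q), dtype=int)
    for _ in range(N):
        gains = np.array([marginal_gain(q[h], O_h[h], n[h], total_pot) for h in range(len(q))])
        n[np.argmax(gains)] += 1       # place the next wager where the expected gain is largest
    return n

# Illustrative usage: 4 horses, win probabilities and opponent wagers are made up.
# print(greedy_wta_portfolio(q=[0.4, 0.3, 0.2, 0.1], opp_wagers=[50, 20, 5, 1], N=3))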

The following proposition asserts the optimality of Algorithm 9.

Proposition A.2. Algorithm 9 returns an optimal wager portfolio for the parimutuel WTA contest.

Proof. Let $\{n_h\}_{h=1}^{H}$ denote the output of Algorithm 9 and define $\mathcal{H} := \{(h, k) : 1 \leq h \leq H,\ 1 \leq k \leq n_h\}$. Note that $|\mathcal{H}| = N$. The expected reward of this wager allocation is $\sum_{(h,k) \in \mathcal{H}} \mu_h^k$. Let $\mathcal{H}^c$ denote the complement of $\mathcal{H}$ so that $\mathcal{H}^c := \{(h, k) : 1 \leq h \leq H,\ 1 \leq k,\ (h, k) \notin \mathcal{H}\}$. By construction of our greedy algorithm (which follows the partial ordering described after (A.11)) we have

\[
\mu_h^k \geq \mu_{h'}^{k'} \quad \text{for all } (h, k) \in \mathcal{H} \text{ and for all } (h', k') \in \mathcal{H}^c. \qquad \text{(A.12)}
\]

Consider now any alternative wager allocation $\{n_h^{alt}\}_{h=1}^{H}$ where $\sum_{h=1}^{H} n_h^{alt} = N$. Define $\mathcal{H}^{alt} := \{(h, k) : 1 \leq h \leq H,\ 1 \leq k \leq n_h^{alt}\}$ and note that $\mathcal{H}^{alt} = \mathcal{H}_1 \cup \mathcal{H}_2$ where $\mathcal{H}_1 := \mathcal{H}^{alt} \cap \mathcal{H}$ and $\mathcal{H}_2 := \mathcal{H}^{alt} \cap \mathcal{H}^c$. Since $\mathcal{H}_1 \cap \mathcal{H}_2 = \emptyset$ the expected reward of the alternative wager allocation can be written as

\[
\sum_{(h,k) \in \mathcal{H}^{alt}} \mu_h^k \;=\; \sum_{(h,k) \in \mathcal{H}_1} \mu_h^k + \sum_{(h',k') \in \mathcal{H}_2} \mu_{h'}^{k'}
\]
\[
\;=\; \sum_{(h,k) \in \mathcal{H}} \mu_h^k - \left( \sum_{(h,k) \in \mathcal{H} \setminus \mathcal{H}_1} \mu_h^k - \sum_{(h',k') \in \mathcal{H}_2} \mu_{h'}^{k'} \right) \qquad \text{(A.13)}
\]
\[
\;\leq\; \sum_{(h,k) \in \mathcal{H}} \mu_h^k, \qquad \text{(A.14)}
\]

where (A.14) follows because the term in parentheses in (A.13) is non-negative, which itself follows from (A.12) and the fact that $|\mathcal{H} \setminus \mathcal{H}_1| = |\mathcal{H}_2|$. (To see that $|\mathcal{H} \setminus \mathcal{H}_1| = |\mathcal{H}_2|$, observe that $|\mathcal{H} \setminus \mathcal{H}_1| = N - |\mathcal{H}_1|$ and $\mathcal{H}_2 = \mathcal{H}^{alt} \setminus \mathcal{H}_1$ so that $|\mathcal{H}_2| = |\mathcal{H}^{alt} \setminus \mathcal{H}_1| = N - |\mathcal{H}_1|$.) The result now follows. □

A natural question that arises when solving the N > 1 problem is whether to replicate or diversify our wagers. Some insight into this issue can be provided in the N = 2 case. From Proposition A.2, we know the optimal portfolio of wagers $(n_1, n_2, n_3, \ldots, n_H)$ is of the form $(2, 0, 0, \ldots, 0)$ or $(1, 1, 0, \ldots, 0)$. A simple calculation that compares the expected values of these portfolios then implies

\[
(n_1, n_2, n_3, \ldots, n_H) =
\begin{cases}
(2, 0, 0, \ldots, 0) & \text{if } \dfrac{\mu_1^1}{\mu_2^1} > \dfrac{O_1 + 2}{O_1} \\[8pt]
(1, 1, 0, \ldots, 0) & \text{otherwise.}
\end{cases} \qquad \text{(A.15)}
\]

We see from the condition in (A.15) that diversification becomes relatively more favorable when $O_1$ is small so that horse 1 is not very popular among opponents. Our expected gain from replicating our wager on this horse declines as $O_1$ decreases. For example, if $O_1 = 0$, then we would have made O + N if this horse won and the expected gain from replicating our wager on this horse equals 0. Diversification (by applying our second wager on horse 2) is clearly optimal in this case and this is reflected by the fact that $\mu_1^1/\mu_2^1 > (O_1 + 2)/O_1 = \infty$ can never be satisfied.

In contrast, replication becomes relatively more favorable when $O_1$ is large and horse 1 is therefore very popular among opponents. This horse has a good chance of winning the race (since it has the highest $\mu_h^1$) and by replicating our wager on it we can almost double our share of the total reward should the horse win. This follows because replicating our wager on horse 1 increases our total expected reward from $(O+N)/(O_1+1)$ to $2(O+N)/(O_1+2)$, which is an approximate doubling when $O_1$ is large. This must be close to optimal given that it was optimal to place our initial wager on horse 1 in the first place. Indeed this is reflected in the condition $\mu_1^1/\mu_2^1 > (O_1+2)/O_1$ from (A.15), which will typically be satisfied when $O_1$ is large since it is always the case that $\mu_1^1/\mu_2^1 \geq 1$.

It is perhaps worth mentioning at this point that unlike the parimutuel setting, there will typically be far more feasible teams than contestants in the DFS setting. This will be the case even for contests with several hundred thousand contestants. As such, in DFS contests we are invariably in the setting of small $O_h$'s, which results in diversification being favored.
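The replicate-versus-diversify threshold in (A.15) is easy to check numerically. The following hypothetical three-horse example (all numbers are illustrative) compares the two candidate N = 2 portfolios directly and verifies that the better one agrees with the condition µ₁¹/µ₂¹ > (O₁ + 2)/O₁.

# Hypothetical three-horse example for the N = 2 replicate-vs-diversify decision.
q = [0.5, 0.3, 0.2]        # win probabilities (horse 1 has the highest first-wager gain here)
O = [2.0, 3.0, 5.0]        # opponents' wagers on each horse
N = 2
pot = sum(O) + N           # total pot O + N (zero track take)

def value(n):              # expected reward of an allocation n = (n_1, ..., n_H)
    return sum(q[h] * pot * n[h] / (O[h] + n[h]) for h in range(len(q)) if n[h] > 0)

replicate, diversify = value([2, 0, 0]), value([1, 1, 0])
mu_1_1 = q[0] * pot / (O[0] + 1)               # first-wager gain on horse 1
mu_2_1 = q[1] * pot / (O[1] + 1)               # first-wager gain on horse 2
condition = mu_1_1 / mu_2_1 > (O[0] + 2) / O[0]
print(replicate, diversify, condition)         # here: 3.0, 2.9, True, so replication wins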

A.2.2 Extension to Parimutuel Top-Heavy Contests

The previous analysis for parimutuel WTA contests can be easily extended to more general

parimutuel top-heavy contests. Suppose the horse that places $d^{th}$ in the race carries a reward $R_d$ for $d = 1, \ldots, D \leq H$. This reward is then allocated to all contestants who placed wagers on this horse. Again, we assume that if no wagers were placed on it, then the reward is not allocated. We let $q_h^d := \mathbb{P}\{\text{horse } h \text{ places } d^{th} \text{ in race}\}$ and then update our expression for the expected gain $\mu_h^{k+1}$. A simple calculation leads to

\[
\mu_h^{k+1} \;=\; \frac{q_h^1 R_1 + q_h^2 R_2 + \cdots + q_h^D R_D}{O_h + k} \times \frac{O_h}{O_h + k + 1}.
\]

To maintain consistency with our earlier WTA setting, we can assume $\sum_{d=1}^{D} R_d = O + N$. Everything now goes through as before. In particular, Algorithm 9 still applies as does the proof of Proposition A.2, which guarantees its optimality. (We note that this also applies to double-up style parimutuel contests by simply assuming that for $d = 1, \ldots, D$ we have $R_d = R$, a constant.)
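In code, only the expected payout in the numerator changes, so a greedy routine such as the sketch given after Algorithm 9 carries over unchanged. The illustrative helper below (names are assumptions, not taken from the text) computes the per-horse expected reward with tiered rewards.

import numpy as np

def expected_reward_topheavy(place_probs_h, rewards, O_h, n_h):
    """Expected reward from n_h of our wagers on horse h when the horse placing d-th
    pays rewards[d-1] and place_probs_h[d-1] = P(horse h places d-th)."""
    if n_h == 0:
        return 0.0
    payout = float(np.dot(place_probs_h, rewards))   # q_h^1 R_1 + ... + q_h^D R_D
    return payout * n_h / (O_h + n_h)

# The greedy step is unchanged: the gain of the (k+1)-th wager on horse h is
# expected_reward_topheavy(..., k + 1) - expected_reward_topheavy(..., k).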

190 A.2.3 Difference Between Parimutuel and DFS Contests

The key difference between DFS contests and our parimutuel setup is that in the DFS contests, the reward $R_d$ is allocated to the submitted entry that has the $d^{th}$ highest ranking among the submitted entries. As such, the prize is always awarded for each $d = 1, \ldots, D$. To make the distinction concrete, suppose⁹ there are 10 billion feasible teams ("horses") in a given DFS contest with 500,000 entries and a WTA payoff structure. In this case, at most 0.005% of the feasible teams will have had wagers placed on them and so it's very unlikely¹⁰ that the ex-post best team will have received a wager. In our parimutuel setup, the O + N would simply not be allocated in that case. It is allocated in the DFS contest, however, and is allocated to the best performing team among the teams that were wagered upon. This might appear like a minor distinction but it is significant. In particular, reward independence no longer holds. To see this, consider a team that we have wagered upon and suppose it is ex-post the third ranked team out of the 10 billion possible entries. Again assuming a WTA structure, that wager will win the reward of O + N only if there were no wagers placed, by anyone else or ourselves in particular, on the first two horses. Our expected gain from the wager therefore depends on the other wagers we have placed. Because reward independence no longer holds, it means Proposition A.2 no longer holds even with updated $\mu_h^k$'s. Nonetheless, it is easy to see this loss of reward independence points towards a strategy of even greater diversification than that provided¹¹ by Algorithm 9. To see this, consider the following stylized setting. Suppose the space of feasible teams for the DFS contest can be partitioned into M "clusters" where M is "large". The clustering is such that the fantasy points scores of teams

⁹To give these numbers some perspective, the typical top-heavy DFS contest that we entered had 24 NFL teams playing in a series of 12 games. We calculated the number of feasible entries for these contests to be approx. 2 × 10¹³ of which approx. 7 × 10¹⁰ utilized 99% of the budget. (In our experience, the vast majority of DFS contestants like to use > 98% of their budget when constructing their entries.)
¹⁰In Section 1.6, we describe our results from playing various DFS contests during the 2017 NFL regular season. In the top-heavy contests of each of the 17 weeks of the season, we found that the ex-post best performing team was not wagered upon!
¹¹Algorithm 9 can yield portfolios anywhere on the spectrum from complete replication to complete diversification but, as mentioned earlier, the $O_h$'s tend to be very small and often 0 in DFS contests and this strongly encourages diversification.

in the same cluster are strongly positively correlated (owing to significant player overlap in these teams) while teams in different clusters are only weakly correlated. Suppose cluster 1 is ex-ante the "best" cluster in that the teams in cluster 1 have the highest expected reward. Clearly, in the

N = 1 case, it would be optimal to wager on the best team in cluster 1. In the N = 2 case, however, it may not be optimal to place the second wager on a team from cluster 1 even if this team has the second highest expected reward when considered by itself. This is because in some sense, the first wager “covers” cluster 1. To see this, suppose none of our opponents wagered on a team from cluster 1 and that ex-post, the best team was another team from cluster 1. While we did not wager on the ex-post best team, neither did anyone else and as we were the only contestant to wager on a

team from cluster 1, there’s a good chance our team will win the reward of O + N (assuming again a WTA structure) due to the strong positive correlation among teams within a cluster. It therefore may make more sense to select a team from another cluster for our second wager.

A.2.4 Our Initial Approach for Top-Heavy DFS Contests

We now discuss our initial approach for tackling the top-heavy DFS problem where we must submit N entries to the contest. As mentioned towards the end of Section 1.5.1, our discussion here provides further support for Algorithm 3 and in particular, the imposition of diversification. In that direction, this section sheds light on how we arrived at Algorithm 3 by discussing our initial approach which allows for some replication. Recalling the problem formulation from Section 1.5.1, we must solve for

\[
\max_{W \in \mathbb{W}^{|W|},\ |W| = N} R(W)
\]

where

\[
R(W) := \sum_{i=1}^{|W|} \sum_{d=1}^{D} (R_d - R_{d+1})\, \mathbb{P}\left\{ \mathbf{w}_i^\top \boldsymbol{\delta} > G_{-i}^{(r_d')}(W_{-i}, W_{op}, \boldsymbol{\delta}) \right\}, \qquad \text{(A.16)}
\]

$r_d' := O + |W| - r_d$, $G_{-i}^{(r)}$ is the $r^{th}$ order statistic of $\{G_o\}_{o=1}^{O} \cup \{F_j\}_{j=1}^{|W|} \setminus F_i$ and $W_{-i} := W \setminus \mathbf{w}_i$. Our initial algorithm (which was not discussed in the main text) for solving (A.16) was a greedy-style algorithm that we formally state in Algorithm 10.

Algorithm 10 Top-Heavy Optimization for N Entries (with Backtracking)
Require: 𝕎, N, γ, Λ, µ_δ, Σ_δ, σ_{δ,G^{(r′)}} and Monte Carlo samples of (δ, G^{(r_d′)}) for all d = 1, ..., D
1: W∗ = ∅
2: for all i = 1, ..., N
3:   for all λ ∈ Λ
4:     w_λ = argmax_{w ∈ 𝕎} { w⊤µ_δ + λ ( w⊤Σ_δ w − 2 w⊤σ_{δ,G^{(r′)}} ) }
5:   end for
6:   λ∗ = argmax_{λ ∈ Λ} R(W∗ ∪ w_λ)   % pick λ corresponding to biggest value-add
7:   w_i∗ = argmax_{w ∈ {w_1∗, ..., w_{i−1}∗, w_{λ∗}}} R(W∗ ∪ w)   % best addition from {w_1∗, ..., w_{i−1}∗, w_{λ∗}} to W∗
8:   W∗ = W∗ ∪ {w_i∗}   % minor notation abuse since W∗ is a set with duplicates
9:   𝕎 = 𝕎 ∩ {w : w⊤w_i∗ ≤ γ}   % add diversification constraint for next candidate entry
10: end for
11: return W∗
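A structural sketch of Algorithm 10 in Python is given below. The two callables `solve_bqp` (the λ-parameterized binary quadratic program of line 4) and `expected_reward` (a Monte Carlo estimate of R(·) in (A.16)) are placeholders that are not implemented here; the sketch only illustrates the λ grid, the backtracking step of line 7, and the diversification constraint of line 9.

def algorithm_10(lambdas, N, gamma, solve_bqp, expected_reward):
    """Greedy construction with backtracking, in the spirit of Algorithm 10.

    solve_bqp(lam, overlap_caps) is assumed to return a feasible 0/1 entry maximizing
    w'mu + lam*(w'Sigma w - 2 w'sigma) subject to the lineup constraints and to
    w'w_prev <= cap for every (w_prev, cap) in overlap_caps.
    expected_reward(portfolio) is assumed to return a Monte Carlo estimate of R(portfolio).
    Neither callable is implemented in this sketch.
    """
    portfolio, overlap_caps = [], []
    for _ in range(N):
        # Lines 3-5: best candidate entry for each lambda on the grid.
        candidates = [solve_bqp(lam, overlap_caps) for lam in lambdas]
        # Line 6: keep the candidate with the largest value-add.
        w_new = max(candidates, key=lambda w: expected_reward(portfolio + [w]))
        # Line 7 (backtracking): possibly replicate a previously chosen entry instead.
        w_next = max(portfolio + [w_new], key=lambda w: expected_reward(portfolio + [w]))
        portfolio.append(w_next)                  # line 8
        overlap_caps.append((w_next, gamma))      # line 9: diversify future candidates
    return portfolio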

We note that Algorithm 10, which reduces to Algorithm 2 when N = 1, is modeled on Algorithm 9 from the parimutuel setting. To see this, first note that the constraint $\mathbf{w}^\top \mathbf{w}_i^* \leq \gamma$ from line 9 restricts the next candidate entry $\mathbf{w}$ to have less than or equal to γ players in common with the previously selected entries. Recalling that C is the number of players in a DFS entry, it therefore follows that if we set γ ≥ C, then the constraint $\mathbf{w}^\top \mathbf{w}_i^* \leq \gamma$ is never binding. But if we set γ < C, then the candidate entry $\mathbf{w}_{\lambda^*}$ from iteration i will always be a new entry, i.e., an entry not represented in the current portfolio $\{\mathbf{w}_1^*, \ldots, \mathbf{w}_{i-1}^*\}$. As such, the set $\{\mathbf{w}_1^*, \ldots, \mathbf{w}_{i-1}^*, \mathbf{w}_{\lambda^*}\}$ in line 7 is analogous to the set A in Algorithm 9 and so our definition of $\mathbf{w}_i^*$ from line 7 corresponds to the definition of h∗ in line 6 of Algorithm 9. There is an important difference, however. In Algorithm 9, we identify the horse who will add the most to the portfolio. As mentioned following Algorithm 3 in Section 1.5.1, that is difficult in the DFS setting. Similar to our discussion in Section 1.5.1, setting γ < C significantly boosted the expected reward and hence, helped us to identify entries with relatively high value-adds. However, observe that even with γ < C, Algorithm 10 and Algorithm 3 can output different portfolios. The key difference between the two algorithms lies in line 7 of Algorithm 10, where

we allow for "backtracking", i.e., where $\mathbf{w}_i^*$ can be one of the previous i − 1 entries $\{\mathbf{w}_1^*, \ldots, \mathbf{w}_{i-1}^*\}$. Hence, Algorithm 10 allows for the possibility of replication even if γ < C. On the other hand, in Algorithm 3, we force $\mathbf{w}_i^*$ to be $\mathbf{w}_{\lambda^*}$ and the output portfolio is completely diversified if γ < C. To gain intuition behind our advocacy of Algorithm 3 (over Algorithm 10), recall from Section A.2.3 that we do not have reward independence in DFS contests. This is why even an idealized greedy algorithm (where we could find an entry with the highest value-add) would not be optimal in general. This is in contrast to our parimutuel setup and led to us arguing that even more diversification might be called for in the DFS setting. An easy way to test this is to set γ < C and to simply add entry $\mathbf{w}_{\lambda^*}$ to the portfolio W without considering replicating one of the previously chosen entries. This then results in full diversification and the selection of N distinct entries (Algorithm 3). In all of our numerical experiments, we found that γ = C − 3 = 6 was an optimal choice in both Algorithms 3 and 10 in that it led to final portfolios with the highest expected value in each case. We also found that for any fixed value of γ, the portfolio resulting from Algorithm 3 was approximately 5% to 20% better (in expected value terms) than the portfolio resulting from Algorithm 10. In light of our earlier comments, this was not very surprising and so Algorithm 3 is our preferred algorithm and the one we used in our numerical experiments.

A.3 Further Details of Numerical Experiments

A.3.1 Benchmark Models for the Numerical Experiments of Section 1.6

Our two benchmark models do not model opponents and in fact, they (implicitly) assume the

benchmarks $G^{(r')}$ or $G^{(r_d')}$ are deterministic.

Benchmark Model 1 (For Double-Up Contests)

To optimize in the N = 1 case, our first benchmark model simply maximizes the expected points total subject to the feasibility constraints on the portfolio. The resulting optimization model is a binary program (BP):

\[
\max_{\mathbf{w} \in \mathbb{W}} \mathbf{w}^\top \boldsymbol{\mu}_\delta.
\]

For N > 1 (which is the case in our numerical experiments), we employ the greedy diversification strategy discussed in Section 1.5.1 but suitably adapted for the case where we do not model opponents. In particular, when optimizing over the $i^{th}$ entry, we add the constraints that ensure the $i^{th}$ entry cannot have more than γ athletes in common with any of the previous i − 1 entries. We use this benchmark model for the double-up contest because, according to our calibrated model, we are comfortably in the case (ii) scenario of Proposition 1.1 where, other things being equal, we prefer less variance to more variance.
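A minimal gurobipy sketch of this benchmark for N entries is shown below. The roster-slot and budget constraints stand in for the feasible set W (whose precise definition is given in the main text), and all inputs (`mu`, `cost`, `position`, `slots`, `B`) are assumed to be supplied by the user; it is an illustration rather than the exact implementation used in the chapter.

import gurobipy as gp
from gurobipy import GRB
import numpy as np

def benchmark_entries(mu, cost, position, slots, B, N, gamma):
    """Benchmark model 1: sequentially solve max_w w'mu over a representative feasible set,
    forcing each new entry to share at most gamma athletes with earlier entries."""
    P = len(mu)
    entries = []
    for _ in range(N):
        m = gp.Model("benchmark_bp")
        m.Params.OutputFlag = 0
        w = m.addVars(P, vtype=GRB.BINARY, name="w")
        m.setObjective(gp.quicksum(mu[p] * w[p] for p in range(P)), GRB.MAXIMIZE)
        m.addConstr(gp.quicksum(cost[p] * w[p] for p in range(P)) <= B)            # budget
        for pos, count in slots.items():                                           # roster slots
            m.addConstr(gp.quicksum(w[p] for p in range(P) if position[p] == pos) == count)
        for prev in entries:                                                       # diversification
            m.addConstr(gp.quicksum(w[p] for p in range(P) if prev[p] > 0.5) <= gamma)
        m.optimize()
        entries.append(np.array([w[p].X for p in range(P)]))
    return entries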

Benchmark Model 2 (For Top-Heavy and Quintuple-Up Contests)

The second benchmark model is similar to the first and indeed the objective functions are identical. The only difference is that we add a stacking constraint to force the model to pick the QB and main WR from the same team. We denote this constraint as “QB-WR”. Mathematically,

the resulting BP for N = 1 is:

\[
\max_{\mathbf{w} \in \mathbb{W},\ \text{QB-WR}} \mathbf{w}^\top \boldsymbol{\mu}_\delta.
\]

Again for N > 1, we employ a suitably adapted version of the greedy diversification strategy from Section 1.5.1, i.e., the $i^{th}$ entry cannot have more than γ athletes in common with any of

the previous i − 1 entries. As discussed in Appendix A.1.1, the purpose of the stacking constraint is to increase the portfolio’s variance. This is because we are invariably “out-of-the-money” in these contests as we noted in Appendix A.1.4 and so variance is preferred all other things, that is, expected number of points, being equal. We note this model is very similar to the model proposed by Hunter et al. 2016 for hockey contests. They presented several variations of their model typically along the lines of including more stacking (or anti-stacking12) constraints, e.g. choosing athletes from exactly 3 teams to increase portfolio variance. We note that we could easily construct and back-test other similar benchmark strategies as well but for the purposes of

12An example of an anti-stacking constraint in hockey is that the goalie of team A cannot be selected if the attacker of team B was selected and teams A and B are playing each other in the series of games underlying the DFS contest in question. Such an anti-stacking constraint is also designed to increase variance by avoiding athletes whose fantasy points would naturally be negatively correlated.

our experiments, the two benchmarks above seemed reasonable points of comparison.
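For completeness, one possible linear encoding of a stacking constraint of the QB-WR type is sketched below on top of a gurobipy model such as the one above. It enforces a generic version (the QB of a team can be chosen only if some WR from the same team is chosen); the chapter's exact pairing with the main WR would simply restrict the WR index set to that single player. The index sets `qb_idx` and `wr_idx` are hypothetical inputs.

import gurobipy as gp

def add_qb_wr_stack(m, w, qb_idx, wr_idx, teams):
    """If the QB of team t is selected, require at least one WR from team t as well
    (a generic version of the QB-WR stacking constraint)."""
    for t in teams:
        m.addConstr(gp.quicksum(w[p] for p in wr_idx[t]) >=
                    gp.quicksum(w[p] for p in qb_idx[t]))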

A.3.2 Parameters and Other Inputs for the Numerical Experiments of Section 1.6

Our models rely on the following five input “parameters”: the expected fantasy points of the real-world athletes µδ, the corresponding variance-covariance matrix Σδ, the stacking probability q from Section 1.3.2, the diversification parameter γ from Section 1.5.1 and the lower bound on

the budget for accepting an opponent's portfolio $B_{lb}$ from Section 1.3.3.

We obtain the estimate of µδ from FantasyPros (FantasyPros 2017). This estimate is specific to each week’s games and we normally obtained it a day before the NFL games were played.

We decompose the variance-covariance matrix Σδ into the correlation matrix ρδ and the standard

deviations of the individual athletes $\sigma_\delta \in \mathbb{R}^P$. The estimate of $\rho_\delta$ was obtained from RotoViz

(RotoViz 2017) and $\sigma_\delta$ is estimated using the realized δ values from the 2016 and 2017 seasons. In particular, RotoViz provides correlations pegged to positions. For instance, using historical data, RotoViz has estimated the average correlation between the kicker of a team and the defense of the opposing team to be −0.50 and the average correlation between the kicker of a team and the defense of the same team to be 0.35. These estimates are not specific to any teams or athletes but are averages. (As a sanity check, we verified that the resulting correlation matrix $\rho_\delta$ is positive semi-definite.) Hence, $\rho_\delta$ does not change from week to week whereas $\sigma_\delta$ is updated weekly using the realized δ from the previous week. It was also necessary to assume a distribution for δ as we needed to generate samples of this random vector. We therefore simply assumed that $\delta \sim \text{MVN}_P(\mu_\delta, \Sigma_\delta)$ where $\text{MVN}_P$ denotes the P-dimensional multivariate normal distribution. Other distributions may have worked just as well

(or better) as long as they had the same first and second moments, that is, the same µδ and Σδ. We also needed the input features X and the realized p values for the Dirichlet regressions. Such data is available on the internet. For example, the f feature (point estimate of p) was available at the FantasyPros website and FanDuel contains the cost vector c (before a contest starts) and the realized positional marginals p (after a contest is over). We note that accessing the positional

marginals data at FantasyPros required us to create an account and pay for a six-month subscription

costing $65.94. For the stacking probability q, we first note that we expect it to be contest-specific as we anticipate more stacking to occur in top-heavy style contests where variance is relatively more important than in double-up contests. Accordingly, we empirically checked the proportion of opponents who stacked using data13 from the 2016-17 season for each contest-type. We then calibrated q to ensure that our Dirichlet-multinomial model for generating opponents implied the same proportion (on average). We estimated q to be 0.35, 0.25 and 0.20 for top-heavy, quintuple- up and double-up contests, respectively. In principle, one can perform out-of-sample testing to pick the “best” q in order to avoid in-sample over-fitting. However, given we are estimating a one-dimensional parameter using a reasonably large (and random) dataset, over-fitting was not our concern.

We set $B_{lb} = 0.99B$ using a straightforward moment matching technique. In particular, we observed that most of our opponents used 100% of the budget and the average budget usage was around 99.5%. Using our Dirichlet-multinomial model for generating opponent portfolios, we simply calibrated $B_{lb}$ so that the average budget usage was approximately 99.5%, which resulted in an estimate of $B_{lb} = 0.99B$. We used γ = 6 for the strategic and benchmark models across all contests since we found this value of γ to produce a near maximum within-model expected P&L. We note that the sensitivity of the expected P&L with respect to γ (around γ = 6) is relatively low in all contest types for both strategic and benchmark portfolios. For instance, in the top-heavy contest with N = 50, the average weekly expected P&L (averaged over 17 weeks of the 2017-18 NFL season) for the strategic portfolio equals USD 342, 357, and 344 for γ equal to 5, 6, and 7, respectively. Furthermore, if we allow γ to vary from week to week, i.e., in week t, pick $\gamma_t \in \{5, 6, 7\}$ that results in the

13Because of the user interface of FanDuel.com, collecting entry-level data on each opponent was very challenging and had to be done manually. Instead of collecting data for each entry (which would have been too time consuming), we therefore collected data on 300 entries for each reward structure type. We also ensured the 300 entries were spread out in terms of their ranks so that they formed a representative sample of the entire population. For each contest type, we then estimated q by inspecting the 300 data-points and checking to see whether or not stacking of the QB and main WR was present.

maximum expected P&L in week t, then the average weekly expected P&L changes to USD 358

(an increase of only 1). This indicates the robustness of setting γ = 6. We used a data sample from DFS contests in the 2016 NFL season to select an appropriate choice for Λ (the grid of λ values required for our optimization algorithms). In all of our contests,

we set Λ = (0.00, 0.01,..., 0.20). We could of course have reduced the computational burden by allowing Λ to be contest-specific. For example, in the case of quintuple-up contests, a choice

of Λ = (0.00, 0.01,..., 0.05) would probably have sufficed since λ∗ for quintuple-up was usually close to zero as discussed in Appendix A.3.3. All of our experiments were performed on a shared high-performance computing (HPC) cluster

with 2.6 GHz Intel E5 processor cores. Each week, we first estimated the parameters $\mu_{G^{(r')}}$, $\sigma^2_{G^{(r')}}$ and $\sigma_{\boldsymbol{\delta}, G^{(r')}}$ (as required by Algorithms 1 and 3) via Monte Carlo simulation. We typically ran the Monte-Carlo for one hour each week on just a single core and this was sufficient to obtain very accurate estimates of the parameters. We note there is considerable scope here for developing more sophisticated variance reduction algorithms which could prove very useful in practical settings when portfolios need to be re-optimized when significant late-breaking news arrives. In addition, it would of course also be easy to parallelize the Monte-Carlo by sharing the work across multiple cores. The BQPs were solved using Gurobi's (Gurobi Optimization 2016) default BQP solver and all problem instances were successfully solved to optimality with the required computation time varying with P (the number of real-world athletes), λ (see Algorithms 1 and 3) and the contest structure (double-up, quintuple-up or top-heavy). A typical BQP problem instance took anywhere from a fraction of a second to a few hundred seconds to solve. It was possible to parallelize with respect to λ and so we used 4 cores for double-up and quintuple-up contests and 8 cores for the top-heavy contests where the BQPs required more time to solve. Our experiments required up to 8 GB of RAM for double-up and quintuple-up contests and up to 16 GB of RAM for top-heavy contests.
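The weekly Monte Carlo step can be summarized by the following sketch. It assumes a function `sample_opponent_entries` implementing the Dirichlet-multinomial opponent model of Section 1.3 (not shown here), builds Σ_δ from the RotoViz correlation matrix and the weekly σ_δ as described above, and estimates the moments required by Algorithms 1 and 3 for a given benchmark rank; all names and defaults are illustrative.

import numpy as np

def estimate_inputs(mu_delta, sigma_delta, rho_delta, rank,
                    sample_opponent_entries, num_sims=10000, seed=0):
    """Monte Carlo estimates of E[G^(r')], Var(G^(r')) and Cov(delta, G^(r')), where G^(r')
    is the r'-th order statistic of the opponents' fantasy points totals.
    sample_opponent_entries(rng) is assumed to return a (num_opponents x P) 0/1 array of
    opponent lineups drawn from the Dirichlet-multinomial model (not implemented here)."""
    rng = np.random.default_rng(seed)
    P = len(mu_delta)
    Sigma = np.outer(sigma_delta, sigma_delta) * rho_delta     # Sigma = diag(sig) * rho * diag(sig)
    G = np.empty(num_sims)
    deltas = np.empty((num_sims, P))
    for s in range(num_sims):
        delta = rng.multivariate_normal(mu_delta, Sigma)       # delta ~ MVN(mu, Sigma)
        entries = sample_opponent_entries(rng)                 # opponent lineups for this simulation
        points = entries @ delta                               # opponents' fantasy points totals
        G[s] = np.sort(points)[rank - 1]                       # rank-th smallest, i.e., the order statistic
        deltas[s] = delta
    cov_delta_G = np.array([np.cov(deltas[:, p], G)[0, 1] for p in range(P)])
    return G.mean(), G.var(), cov_delta_G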

198 A.3.3 Additional Results from the 2017 NFL Season

Here we present some additional numerical results from the 2017 NFL season. First, we discuss results related to the P&L and portfolio characteristics and second, we discuss results related to the performance of the Dirichlet regressions.

Additional P&L Results and Portfolio Characteristics

Table A.1 displays the cumulative realized P&L for both models across the three contest structures during the 2017 season; see the discussion around Figure 1.1 in Section 1.6.

Table A.1: Cumulative realized dollar P&L for the strategic and benchmark models across the three contest structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. We invested $50 per week per model in the top-heavy series with each entry costing $1. In quintuple-up the numbers were $50 per week per model with each entry costing $2 and in double-up we invested $20 per week per model with each entry costing $2. (We were unable to participate in the quintuple-up contest in week 1 due to logistical reasons.)

              Top-heavy                Quintuple-up             Double-up
Week     Strategic   Benchmark    Strategic   Benchmark    Strategic   Benchmark
1            25.5       −39.5           -           -          3.13       15.13
2           −18.5       −77          −50         −50         −16.87       −4.87
3            85.24      −97           30         −60          −8.87        3.13
4            61.74       12.5         80          20          11.13       15.13
5            15.74      −34.5         30         −30          −8.87       −4.87
6            −7.26      −52.5         30         −70         −16.87       −8.87
7            92.74       −6.5        −10          20         −36.87      −24.87
8           170.74        0.5        −10          20         −28.87      −24.87
9           290.74       32.5         40         110          −8.87       −4.87
10          437.74      −17.5          0          60          −8.87       −4.87
11          406.74      189.5         20         130          −8.87        7.13
12          384.74      154.5        −30          80         −20.87       −8.87
13          347.74      112.5         10          40         −28.87       −8.87
14          341.74       87.5        −30         −10         −44.87      −24.87
15          320.74       59.5        −80         −30         −60.87      −44.87
16          317.74       29.5        −90         −50         −76.87      −60.87
17          280.74       91.5        −40          20         −60.87      −40.87

In Table A.2, we display the performance of each week’s best realized entry (out of the 50 that were submitted) for the strategic and benchmark models corresponding to the top-heavy contests for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. Perhaps the most notable feature of Table A.2 is the variability of our highest rank entry from week to week.

Table A.2: Performance of each week's best realized entry for the strategic and benchmark models corresponding to the top-heavy contests for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season.

                                 Rank                     Percentile             Reward (USD)
Week   Total # of entries   Strategic  Benchmark    Strategic  Benchmark    Strategic  Benchmark
1            235,294           2,851     40,421       1.21%      17.18%          8        1.5
2            235,294          46,909     26,728      19.94%      11.36%          1.5      2
3            235,294             429      5,715       0.18%       2.43%         25        5
4            238,095           2,566        864       1.08%       0.36%          8       15
5            208,333          46,709     24,695      22.42%      11.85%          2        3
6            208,333          10,466      5,945       5.02%       2.85%          4        5
7            208,333             139        647       0.07%       0.31%         20       10
8            208,333             550      5,767       0.26%       2.77%         10        5
9            178,571             138      3,103       0.08%       1.74%         25        5
10           178,571             211     60,938       0.12%      34.13%         20        0
11           178,571           7,480         24       4.19%       0.01%          4      100
12           148,809           4,301     11,994       2.89%       8.06%          4        3
13           190,476           8,263      6,759       4.34%       3.55%          4        4
14           166,666           5,503      8,566       3.30%       5.14%          4        3
15           142,857           5,189      7,601       3.63%       5.32%          4        3
16           142,857           1,424      6,835       1.00%       4.78%          6        3
17           142,857          11,920         87       8.34%       0.06%          3       30

This reflects the considerable uncertainty that is inherent to these contests. While the best strategic entry did well, we are confident that it could do much better (at least in expectation) by being more vigilant in updating parameter and feature estimates each week.

In Table A.3, we present various statistics of interest for the ex-ante optimal entry $\mathbf{w}_1^*$ of the strategic model across all three reward structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. It is interesting to note that none of the numbers vary much from week to week. It is also interesting to see how the top-heavy entry $\mathbf{w}_1^*$ has a lower mean and higher standard deviation than the corresponding entries in the double-up and quintuple-up contests. This is not surprising and is reflected by the fact that the top-heavy contests have a higher value of λ∗ than the double-up and quintuple-up contests. This is as expected since variance is clearly more desirable in top-heavy contests and the optimization over λ recognizes this. It is also interesting to see that λ∗ for the quintuple-up contests is approximately 0.

Table A.4 below shows the same results as Table A.3 except this time for the benchmark portfolios. It is interesting to see how similar¹⁴ the statistics are for all three contest-types in Table A.4.

¹⁴Recall the benchmark quintuple-up model enforces stacking and hence, the optimal strategic quintuple-up entry might differ from the optimal benchmark portfolio even if λ∗ = 0. There is no enforced stacking in the benchmark model for double-up and so if λ∗ = 0 for the strategic double-up model, then the two entries will coincide and have the same expected fantasy points.

Table A.3: Various statistics of interest for the ex-ante optimal entry $\mathbf{w}_1^*$ of the strategic model across all three reward structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. Mean and StDev refer to the expected fantasy points and its standard deviation. (We were unable to participate in the quintuple-up contest in week 1 due to logistical reasons.)

              Top-heavy                Quintuple-up              Double-up
Week     Mean    StDev    λ∗      Mean    StDev    λ∗      Mean    StDev    λ∗
1       124.45   24.76   0.03        -       -      -     127.35   20.00   0.00
2       120.70   26.94   0.05    124.22   23.69   0.01    123.58   21.50   0.01
3       115.08   27.54   0.04    121.15   22.34   0.00    120.54   21.03   0.01
4       114.18   27.54   0.04    121.85   21.67   0.00    121.85   21.67   0.00
5       115.48   23.65   0.05    123.22   20.49   0.00    123.22   20.49   0.00
6       106.45   27.96   0.05    118.82   21.37   0.00    118.28   17.53   0.02
7       108.69   29.82   0.06    120.53   22.08   0.00    119.34   21.02   0.02
8       107.61   28.26   0.04    120.73   20.22   0.00    120.61   20.11   0.01
9       105.16   28.52   0.05    116.42   21.83   0.01    115.94   19.48   0.01
10      110.25   28.99   0.05    123.36   21.49   0.00    122.74   19.42   0.02
11      107.79   29.43   0.04    123.28   20.88   0.00    122.44   19.88   0.02
12      117.60   25.47   0.03    124.90   19.72   0.00    124.90   19.72   0.00
13      116.70   29.30   0.03    123.10   22.65   0.00    122.20   19.44   0.02
14      111.50   28.15   0.04    119.70   20.68   0.00    119.40   19.33   0.01
15      116.80   27.79   0.06    129.30   19.30   0.00    129.30   19.30   0.00
16      117.60   26.38   0.04    122.20   23.40   0.01    120.90   16.96   0.02
17      110.70   27.68   0.07    126.80   19.02   0.00    126.40   17.29   0.01
Average 113.34   27.54   0.05    122.47   21.30   0.00    122.29   19.66   0.01

Indeed these statistics are similar to the statistics for the double-up and quintuple-up strategic portfolios in Table A.3 which is not surprising because the value of λ∗ in those contests was close to 0. (We know that when λ∗ is close to 0, then there is less value to being able to model opponents accurately. This merely reinforces the view that our strategic model adds more value in top-heavy style contests.) In Table A.5, we present some information on the QB selected by the best performing entry of the strategic and benchmark models in the top-heavy contests for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. It's clear that, on average, the strategic

model picks less popular QBs than the benchmark model: an average $p_{QB}$ of 6.88% for the strategic model versus an average of 12.74% for the benchmark model. In addition, QBs picked by the strategic model cost less (approx. 3% lower on average) and have lower expected fantasy points

Table A.4: Mean fantasy points and its standard deviation for the first optimal entry $\mathbf{w}_1^*$ of the benchmark model across all three reward structures for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season. (We were unable to participate in the quintuple-up contest in week 1 due to logistical reasons.)

              Top-heavy         Quintuple-up         Double-up
Week     Mean     StDev      Mean     StDev      Mean     StDev
1       125.71    19.28         -        -      127.35    20.00
2       122.65    23.70     122.65    23.70     124.24    23.19
3       117.28    23.53     117.28    23.53     121.15    22.34
4       118.07    21.67     118.07    21.67     121.85    21.67
5       120.17    20.16     120.17    20.16     123.22    20.49
6       116.71    18.97     116.71    18.97     118.82    21.37
7       118.35    22.37     118.35    22.37     120.53    22.08
8       119.12    20.04     119.12    20.04     120.73    20.22
9       113.60    21.78     113.60    21.78     116.51    21.40
10      121.85    22.80     121.85    22.80     123.36    21.49
11      121.00    21.94     121.00    21.94     123.28    20.88
12      121.40    22.03     121.40    22.03     124.90    19.72
13      120.40    23.39     120.40    23.39     123.10    22.65
14      117.80    20.81     117.80    20.81     119.70    20.68
15      126.40    20.96     126.40    20.96     129.30    19.30
16      119.90    22.10     119.90    22.10     122.30    22.70
17      125.30    18.88     125.30    18.88     126.80    19.02
Average 120.34    21.44     120.00    21.57     122.77    21.13

(approx. 9% lower on average) than the QBs picked by the benchmark model. To put the cost numbers in perspective, the budget that was available for entry was set by the contest organizers at

B = 60,000. We end this appendix with two anecdotes highlighting top-heavy contests where our strategic portfolio went against the "crowd" and was successful in doing so. In week 3, our strategic model selected an entry that consisted of some crowd favorites, in particular Tom Brady (QB) and A.J. Green as one of the three WRs. The entry also included four underdog picks from the Minnesota Vikings: two WRs (S. Diggs and A. Thielen), the kicker K. Forbath and the defense. Each of these four picks was expected to be chosen by less than 5% of our opponents and by choosing four players from the same team, the entry was stacked which resulted in a reasonably high variance. The Minnesota Vikings ended up having a good game, winning 34-17 at home against Tampa

Bay. Our entry ended up ranking 429th out of 235,294 entries in total. In contrast, none of the benchmark entries were similar to this team. While some of them picked Thielen, none of them

Table A.5: Characteristics of the QB picked by the best performing entry of the strategic and benchmark models in the top-heavy contests for all seventeen weeks of the FanDuel DFS contests in the 2017 NFL regular season.

                        Strategic                                        Benchmark
Week   QB                       pQB     Cost    µδ      QB                            pQB     Cost    µδ
1      D. Carr (OAK)           9.80%    7700   18.63    R. Wilson (SEA)              7.30%    8000   20.23
2      A. Rodgers (GB)         9.80%    9100   26.19    M. Ryan (ATL)               10.90%    8200   24.28
3      T. Brady (NE)           7.20%    9400   20.72    A. Dalton (CIN)              2.40%    6800   15.85
4      R. Wilson (SEA)        13.20%    7900   21.19    A. Dalton (CIN)              3.50%    7100   17.19
5      J. Brissett (IND)       4.40%    7000   15.79    A. Rodgers (GB)             18.10%    9500   24.67
6      D. Carr (OAK)           1.10%    7500   16.48    D. Watson (HOU)             29.70%    7900   20.76
7      D. Brees (NO)           8.50%    8300   22.75    D. Brees (NO)                8.50%    8300   22.75
8      D. Watson (HOU)         3.70%    8000   17.30    A. Dalton (CIN)              9.70%    7600   19.02
9      R. Wilson (SEA)        16.30%    8500   24.52    R. Wilson (SEA)             16.30%    8500   24.52
10     C. Keenum (MIN)         0.70%    6800   15.50    B. Roethlisberger (PIT)     12.70%    7600   18.53
11     C. Keenum (MIN)         2.80%    7300   15.26    T. Brady (NE)               20.50%    8600   24.60
12     M. Ryan (ATL)           8.80%    7600   19.20    M. Ryan (ATL)                8.80%    7600   19.20
13     C. Keenum (MIN)         6.50%    7600   17.80    R. Wilson (SEA)              8.50%    8200   21.90
14     C. Keenum (MIN)         3.40%    7500   17.20    P. Rivers (LAC)             12.80%    8100   20.30
15     C. Keenum (MIN)         8.00%    7400   18.30    B. Roethlisberger (PIT)     13.70%    8000   21.10
16     A. Smith (KC)           7.50%    7800   19.20    C. Newton (CAR)             24.00%    8300   22.30
17     M. Ryan (ATL)           5.20%    7400   18.40    P. Rivers (LAC)              9.10%    8300   19.90
Average                        6.88%    7812   19.08                                12.74%    8035   21.01

picked Diggs, Forbath or the Vikings defense and so there was no strategic stacking. Another such example can be found in week 10. One of the entries selected by the strategic model included an underdog QB (C. Keenum) again from the Minnesota Vikings. Keenum was predicted to be chosen by fewer than 3% of our opponents. This could be explained by his low δ, i.e., his low expected fantasy points, and his low expected return. Choosing Keenum was quite a bold choice since the QB position is particularly important as QBs typically have the highest expected points and expected returns among all positions. In that particular week, Matthew Stafford was predicted to be the most popular QB with approx. 25% of opponents expected to pick him; see Figure A.2(a) in Appendix A.3.3. In addition to Keenum, our strategic entry was also stacked with 2 WRs (A. Thielen and S. Diggs) and the kicker (K. Forbath) all chosen from the Vikings and all predicted to be chosen by only approx. 5% of opponents. In the NFL game itself, the Vikings won 38-30 away to the Redskins with Keenum, Thielen, Diggs and Forbath scoring 26.06, 26.6,

15.8 and 10 fantasy points, respectively. Our entry ended up ranking 211th out of 178,571 entries. In contrast, all 50 benchmark entries chose Ben Roethlisberger as the QB, who was considerably

more popular than Keenum.

Dirichlet Regression Results

Before concluding this section, we shed light on the performance of our Dirichlet-multinomial data generating process for modeling team selections of opponents and the corresponding Dirichlet

regression introduced in Section 1.3 in terms of how well they predict the marginals $p_{QB}, \ldots, p_D$ and how well they predict the benchmark fantasy points $G^{(r')}$ (double-up) and $G^{(r_d')}$ for $d = 1, \ldots, D$ (top-heavy). Our Dirichlet regression models used the features described in (1.5) and we validated this choice of features by evaluating its goodness-of-fit and comparing its out-of-sample performance against two "simpler" variations of the Dirichlet regression model. Specific details are deferred to Appendix A.3.4. In this subsection we focus instead on the key results and anecdotes we witnessed during the 2017-18 NFL season.

In Figure A.2, we show the performance of our approach in terms of predicting the QB marginals $p_{QB}$ for the top-heavy and double-up contests in week 10 of the 2017 NFL season.¹⁵ First, we observe that in both top-heavy and double-up contests, our model correctly forecasted one of the top-picked QBs in week 10, namely Matthew Stafford. Second, we observe that our 95% prediction intervals (PI) contain around 95% of the realizations. This speaks to the predictive power of our statistical model. Of course, we expect roughly 5% of the realizations to lie outside the 95% intervals and we do indeed see this in our results. For example, in Figure A.2, out of a total of 24 QBs, the number of QBs that lie outside the intervals for top-heavy and double-up equal 2 and 1, respectively. Of course, we did not do as well across all seventeen weeks as we did in week 10 but in general, our 95% prediction intervals contained 95% of the realizations. Over the course of the season, we did witness instances where our models under-predicted or over-predicted the ownerships by a relatively large margin. See Figure A.2(b), for example. Accordingly, there is room for improvement, specifically in the quality of features provided to our Dirichlet regression.

¹⁵We do not present Dirichlet regression results corresponding to quintuple-up contests for brevity. We note that the results in quintuple-up are very similar.

[Figure A.2 appears here: (a) Top-heavy; (b) Double-up. Each panel plots, for each QB, the realized ownership and our 95% prediction interval.]
Figure A.2: Predicted and realized QB ownerships ($p_{QB}$) for the week 10 contests of the 2017 NFL season.

Retrospectively speaking, including a feature capturing the "momentum" of athletes, that is, how well they performed in the previous few weeks, would have been beneficial in predicting our opponents' behavior. This statement is supported by multiple cases we noticed in the 2017 season. To give but one example, in week 9, Ezekiel Elliott of the Dallas Cowboys was a popular pick among our opponents in double-up; he had performed extremely well in the two weeks prior to week 9 and was, in fact, the second highest scoring RB in week 7 and the top-scoring RB in week 8.

We would also expect a significant improvement in predicting the player selections of our opponents if we were more proactive in responding to late developing news as discussed in Section 1.6. Such late developing news would typically impact our estimate of µδ, which in turn would change both our optimal portfolios as well as our predictions of opponents' team selections. Continuing with the Fournette-Ivory case study from Section 1.6, we predicted the ownership of Ivory to be below 5% due to our low estimate of his expected points. In reality, around 15% of the fantasy players in the top-heavy contest picked Ivory, which aligns with the sequence of events we discussed earlier. If we had updated our expected points estimate corresponding to Ivory to a value of 15 (from 6.78)¹⁶, we would have fared better. This is illustrated in Figure A.3(a) where we plot our original predictions ("before") in blue and updated predictions ("after")¹⁷ in red. It's clear that our original over-prediction of the ownership of Le'Veon Bell can be largely attributed to our stale µδ estimate of Ivory. As a side note, we can also observe that both our "before" and "after" predictions in Figure A.3(a) under-predict the ownership of Adrian Peterson (the first point on the x-axis). We believe the reason for this is "momentum" (a feature we omitted from our Dirichlet regressions) as Peterson scored over 25 fantasy points in week 6 but was expected to score only 7 points, making him the RB with the highest points to cost ratio (among all week 6 RBs) that week.

An interesting illustration of the importance of good features can be found in Figure A.3(b) where we display the positional marginals for QBs in week 12's double-up contest. We clearly over-predicted Tom Brady's ownership and under-predicted Russell Wilson's ownership. Perhaps the main reason for this was the point estimate f provided by FantasyPros (29.5% for Brady and 20.9% for Wilson) which was a feature in our Dirichlet regression. FantasyPros therefore severely overestimated the ownership of Brady and underestimated the ownership of Wilson and our regression model followed suit. However, it is well known in football that Tom Brady (arguably the greatest QB of all time) and the New England Patriots generally perform very poorly in Miami, where his team was playing in week 12. It is no surprise then that the realized ownership of Tom Brady that week was very low. Unfortunately FantasyPros did not account for this in their prediction and so none of our features captured this well known Tom Brady - Miami issue. To confirm that it was indeed the FantasyPros point estimate that skewed our predictions, we re-ran the regression after deducting 11% from Brady's FantasyPros estimate (making it 18.5%) and adding it to Wilson's estimate. The resulting fit is displayed in red in Figure A.3(b) and it's clear that it does a much better job of predicting the realized ownerships.

In Figure A.4(a), we plot the realized fantasy points total against the rank $r_d$ in the top-heavy

¹⁶An estimate of 15 for the expected points of a "main" RB is quite reasonable.
¹⁷For our updated predictions, we did not include the FantasyPros point estimate feature f in our Dirichlet regression model since we do not know how FantasyPros would have updated their estimate.

contest of week 10. We also show our 95% prediction intervals for these totals as well as our 95% prediction intervals conditional on the realized value of δ. These conditional prediction intervals provide a better approach to evaluate the quality of our Dirichlet-multinomial model for $W_{op}$ as they depend only on our model for $W_{op}$ when we condition on the realized value of δ, and it is clear from the figure that we do an excellent job in week 10. Not surprisingly, the interval widths shrink considerably when we condition on the realized value of δ. In Figure A.4(b), we display the results for the double-up contests for the 2017 NFL season. While our model appears to perform well across the entire 17 weeks, there were some weeks where the realized points total was perhaps 3 standard deviations away from the conditional mean. This largely reflects the issues outlined in our earlier discussions, in particular the need to better monitor player developments in the day and hours immediately preceding the NFL games.

[Figure A.3 appears here: (a) Top-heavy week 7 RBs; (b) Double-up week 12 QBs. Each panel plots, for each athlete, the realized ownership together with the 95% prediction intervals before and after the adjustment.]
Figure A.3: Highlighting instances where the Dirichlet regression either under-predicted or over-predicted the ownerships of some athletes and what would have happened if we had (a) reacted to breaking news or (b) had access to better quality features such as ones that accounted for Brady's poor historical track record in Miami.

[Figure A.4 appears here: (a) realized and predicted points totals against opponent rank (out of 178,571) for the week 10 top-heavy contest; (b) realized and predicted points totals for the rank of interest in the double-up series, by week. Legends show the realized points, the 95% PI, and the 95% PI given the realized δ.]
Figure A.4: Predicted and realized portfolio points total for various opponent ranks in the top-heavy contest of week 10 (a) and for the rank of interest in the double-up series for each week of the 2017 NFL season (b).

A.3.4 Model Checking and Goodness-of-Fit of the Dirichlet Regressions

In our numerical experiments of Section 1.6, we used the Dirichlet regression model of Section 1.3.1 to predict our opponents' behavior and, in particular, the positional marginals $p_{QB}, \ldots, p_D$ of the players in our opponents' team selections. In Appendix A.3.3, we discussed the performance of these Dirichlet regressions but not in a systematic fashion. In this appendix, we revisit the issue and evaluate the goodness-of-fit of our model and benchmark it against two simpler variations using data from the 2017-18 NFL season. We first state the three variations of the Dirichlet regression model that we considered:

1. Variation 1 has 2 features: the player cost vector c and the expected points vector µδ.
2. Variation 2 has 1 feature: the point estimate f of the positional marginals from FantasyPros.
3. Variation 3 has 3 features: the player cost vector c, the expected points vector µδ, and the point estimate f of the positional marginals from FantasyPros.

All three variations also include the constant vector 1 as an intercept. Variation 3 is therefore the model proposed in (1.5) and used in the numerical experiments of Section 1.6, while variations 1 and 2 are simpler versions of variation 3. We also note that we scaled the features to ensure they were roughly on the same scale as this helped in fitting the model. In particular, we

were typically in [0, 2]. Before proceeding, we note our Dirichlet regression model is very simple and that modern software packages such as STAN can fit such models within seconds. One could therefore easily include additional features as well as interaction / higher-order terms with the goal of increasing the predictive power of the model. Our goal here was not to find the best set of features, however, but simply to find features that explain the data reasonably well. As we will show later in this ap- pendix, the three variations listed above all explain the data quite well while variation 3 performed best in the cross-validation tests. These results justified the use of variation 3 in our numerical ex- periments in Section 1.6. The plots and discussion in Appendix A.3.3 also provide further support for the model. That said, the anecdotes from Appendix A.3.3 (which are reflected in the results in this subsection below) suggest how the performance of the model could have been significantly improved had we focussed more on the correctness of the features particularly in the light of new player developments before the games. Finally, we note that the model checking and the cross-validation results done here are standard Bayesian techniques and are discussed for example in Gelman et al. 2013. We used data from the 2017-18 NFL season for both of these tasks.

Data Collection

T We were able to obtain complete data on the features { f QB,t, cQB,t, µQB,t }t=1 where T = 17 was the number of weeks in the season. There was a minor issue with obtaining the realized positional marginals and to explain this issue, we will focus on the QB position whose realized marginals are {p }T−1 t p {pk }PQB ÍPQB pk QB,t t=1 . Consider now week with QB,t = QB,t k=1 and note that k=1 QB,t = 1. If there were O opponents in a given contest in week t, then we would like to inspect their lineups to determine pQB,t. Unfortunately, there was no way to do this in an automated fashion and so we had to resort to sampling their lineups. Fortunately, however, if a particular QB appeared in a lineup, then the web-site listed the realized positional marginal for that particular QB in the underlying DFS

209 contest. See Figure A.5. As a result, it would only be necessary to sample lineups until each QB appeared at least once. Unfortunately, some QBs were selected very rarely if at all and sampling sufficient lineups to find them proved too time consuming. Instead, we typically sampled approx.

100 lineups each week and we let $\mathcal{C}_{QB,t} \subseteq \{1, \ldots, P_{QB}\}$ denote the set of QBs for which we collect the marginal in week t. Since $\mathcal{C}_{QB,t}$ is a subset of $\{1, \ldots, P_{QB}\}$, it follows that $\sum_{k \in \mathcal{C}_{QB,t}} p_{QB,t}^k \leq 1$. Typical values of $\sum_{k \in \mathcal{C}_{QB,t}} p_{QB,t}^k$ were 95% to 99%.

Figure A.5: Screenshot of a web-page from FanDuel.com when we click on an opponent lineup. The lineup has 9 athletes. For each athlete selected in the lineup we can observe the realized posi- tional marginal of that athlete in the underlying contest. For example in the contest corresponding to this screenshot the realized positional marginal of Matthew Stafford equals 8.9%.

We defined the vector of collected marginals for week t as $\hat{\mathbf{p}}_{QB,t} := [p_{QB,t}^k]_{k \in \mathcal{C}_{QB,t}}$ and the corresponding vector of FantasyPros estimates $\hat{\mathbf{f}}_{QB,t} := [f_{QB,t}^k]_{k \in \mathcal{C}_{QB,t}}$. Both of these vectors are then re-scaled so that they sum to 1. We similarly define $\hat{\mathbf{c}}_{QB,t} := [c_{QB,t}^k]_{k \in \mathcal{C}_{QB,t}}$ and $\hat{\boldsymbol{\mu}}_{QB,t} := [\mu_{QB,t}^k]_{k \in \mathcal{C}_{QB,t}}$ and then use these features to fit the three Dirichlet regression models using non-informative priors for $\beta_{QB}$; see (1.6). While this data collection procedure might introduce some bias in the estimation of $\beta_{QB}$, we expect this to be a second order effect as $\sum_{k \in \mathcal{C}_{QB,t}} p_{QB,t}^k$ and $\sum_{k \in \mathcal{C}_{QB,t}} \hat{f}_{QB,t}^k$ were (before scaling) always close to 1. That said, further investigation of this may be worth pursuing.

Model Checking and Posterior Predictive Checks

The purpose of model checking and posterior predictive checks is to obtain a general idea of how well the model in question explains the data and what its weaknesses are. It may be viewed as a form of checking for internal consistency; see Gelman et al. 2013. For each possible tuple of the form (model variation, reward structure, position), we fit a Bayesian Dirichlet regression model with STAN. We ran each of 4 MCMC chains for 1,000 iterations and then discarded the first 500 iterations from each one. This left us with 2,000 posterior samples of the corresponding β parameter. All of our $\hat{R}$ values¹⁸ were between 1.00 and 1.05, indicating the MCMC chains had mixed well.

Marginal Posterior Predictive Checks

For each of the 2,000 posterior samples of β, we generated a sample of all of the positional marginals using the appropriate Dirichlet distribution and then used these samples to construct 95% posterior intervals for the marginals. We then computed the proportion of times the 95% posterior intervals contain the true realized values. For each (variation, reward structure, position) tuple, we computed a summary statistic as follows.

As before, we will use the QB position to explain our procedure. There were T = 17 weeks and for each week t, there were $P_{QB,t}$ QBs available for selection. Hence there were $\sum_{t=1}^{T} P_{QB,t}$ QB "instances" in total and for each such instance, we know the true realized marginal from the real-world data that we used to fit the model. We also have the posterior samples for that instance and therefore a 95% posterior interval for it. If the realized marginal is in the 95% posterior interval, then we assign the instance the value 1. Otherwise we assign it the value 0. The summary statistic

¹⁸$\hat{R}$ is a commonly used metric to determine how well the Markov chains have mixed. The closer $\hat{R}$ is to 1 the better and a common rule of thumb is that an $\hat{R}$ between 1.00 and 1.05 indicates that the chains have mixed sufficiently well.

Table A.6: Posterior predictive test summary statistic for each variation (denoted by V1, V2, and V3) of the Dirichlet regression model across all reward structures corresponding to the QB, RB, and WR positions.

                        QB                    RB                    WR
                  V1    V2    V3        V1    V2    V3        V1    V2    V3
Top-heavy        0.96  0.97  0.96      0.95  0.97  0.95      0.96  0.96  0.96
Quintuple-up     0.95  0.93  0.95      0.95  0.95  0.95      0.94  0.95  0.95
Double-up        0.93  0.93  0.94      0.94  0.95  0.95      0.94  0.94  0.94

The summary statistic for each combination of model variation and reward structure is shown in Table A.6 for the QB, RB, and WR positions^19. The three model variations seem to pass this check (at least at the aggregate level), as each of the summary statistics lies between 93% and 97%.
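For concreteness, a minimal Python sketch of this coverage computation is given below; the array shapes and toy data are illustrative assumptions, not the thesis implementation.

import numpy as np

def coverage_summary(posterior_samples, realized):
    """Proportion of instances whose realized marginal lies in its 95% posterior interval.

    posterior_samples: array of shape (num_draws, num_instances), e.g. 2000 x (sum_t P_{QB,t})
    realized: array of shape (num_instances,) with the realized ownership marginals
    """
    lo = np.percentile(posterior_samples, 2.5, axis=0)
    hi = np.percentile(posterior_samples, 97.5, axis=0)
    inside = (realized >= lo) & (realized <= hi)   # binary indicator per instance
    return inside.mean()

# Hypothetical toy data: 2000 posterior draws of the marginals for 50 QB instances.
rng = np.random.default_rng(1)
samples = rng.dirichlet(np.ones(50), size=2000)
realized = rng.dirichlet(np.ones(50))
print(coverage_summary(samples, realized))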

Most-Picked Athlete Predictive Checks

We also computed predictive checks of the test quantity "most-picked athlete" for each combination of model variation, reward structure, position, and week. To see how these p-values were computed, consider a specific combination where the position (as usual) is the QB and the week is week t. Each posterior sample of p_{QB,t} has a maximum value corresponding to the most popular QB that week in that posterior sample. We take all of these maximum values and compute the percentage of them that exceeded the realized^20 maximum. Ideally, the resulting percentile should be away from the extremes, i.e., 0 and 1. They are reported in Tables A.7 (top-heavy) and A.8 (double-up) below for the QB, RB, and WR positions. We omitted the other positions and quintuple-up contests for the sake of brevity. Highlighted instances correspond to percentiles less than 2.5% (blue) or greater than 97.5% (red). While we would expect to see extreme values approximately 5% of the time even if the model in question was correct, we see such extreme values approximately 12.5% of the time for the top-heavy contests and 19.5% of the time for the double-up contests. While variation 3 does perform the best of the three models on this test, there is clearly some room for improvement here.

^19 The results are similar for the other positions and are not shown for the sake of brevity.
^20 The realized maximum is the percentage of people who picked the most popular QB that week in that contest. So, for example, if Matthew Stafford was the most popular QB in the week 10 top-heavy contest with 23% of contestants picking him, then the realized maximum for that week was 23%.

It is interesting to note that in the double-up contests, the extreme values (when they occur) are almost invariably on the low end, i.e., less than 2.5%. This means that in these instances, the Dirichlet regression model is predicting that the most popular player in the given position will be considerably less popular among opponents than the realized most popular player in that position. There are two obvious directions for improving the model performance in light of these results. First, we outlined in Section 1.6 some of our occasional failures to obtain accurate data for the features or to adjust features to account for relevant information that would be known to most DFS players and, in particular, our DFS opponents. For example, in Appendix A.3.3, we discussed the failure of the FantasyPros feature f to account for Russell Wilson's popularity in the week 12 double-up contest, in part because it also failed to account for Tom Brady's well-known difficulties with playing in Miami. As depicted in Figure A.3(b), Wilson's realized ownership that week was over 50% and so this was the realized maximum in week 12 for the QB position in double-up contests. Given our feature values that week, and in particular the point estimate f, our fitted models were generally unable to produce such a high value in the posterior samples of the most-picked QB. As a result, we observed the low values that we see for the QB position in week 12 in Table A.8. In contrast, we can see from Figure A.2 that we had no difficulty in predicting the popularity of the most popular QB in the week 10 double-up and top-heavy contests. It is not surprising then to see that the week 10 QB results in Tables A.7 and A.8 are not^21 extreme. The less-than-perfect results here are thus explained by a combination of occasionally inaccurate feature data as well as possibly missing other useful features, e.g., momentum. It should be clear, however, that these are issues with the features and the occasional accuracy of the features rather than a problem with our Dirichlet regression, which certainly seems to be the right way to model this problem.

^21 While it is clear from Figure A.2(b) that we severely underestimated the ownership of Ryan Fitzpatrick in the week 10 double-up contest, this is not reflected in Table A.8 because we are focusing on the most popular player in that table and we did correctly predict a very popular player that week, i.e., Matthew Stafford.
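A minimal Python sketch of this "most-picked athlete" calculation (with hypothetical toy data) is given below; it mirrors the description above rather than reproducing the thesis code.

import numpy as np

def most_picked_pvalue(posterior_samples, realized_marginals):
    """Bayesian p-value for the 'most-picked athlete' test quantity.

    posterior_samples: (num_draws, num_athletes) sampled ownership marginals for one
                       (position, week, contest) combination
    realized_marginals: (num_athletes,) realized ownership marginals
    Returns the fraction of posterior draws whose maximum exceeds the realized maximum.
    """
    posterior_max = posterior_samples.max(axis=1)   # most popular athlete in each draw
    realized_max = realized_marginals.max()         # e.g. 0.23 if the top QB was picked by 23%
    return (posterior_max > realized_max).mean()

# Hypothetical toy example with 2,000 posterior draws over 30 QBs.
rng = np.random.default_rng(2)
draws = rng.dirichlet(np.ones(30) * 2, size=2000)
realized = rng.dirichlet(np.ones(30) * 2)
print(most_picked_pvalue(draws, realized))   # values near 0 or 1 flag a poor fit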

Table A.7: Bayesian p-values for the test statistic "most-picked athlete" for each variation of the Dirichlet regression model and each week corresponding to the QB, RB, and WR positions in the top-heavy reward structure.

              QB                    RB                    WR
Week    V1    V2    V3        V1    V2    V3        V1    V2    V3
  1    0.77  0.88  0.77      0.98  0.95  0.99      0.90  0.86  0.90
  2    0.92  0.95  0.94      0.46  0.48  0.48      0.37  0.71  0.46
  3    0.37  0.91  0.46      0.34  0.90  0.57      0.27  0.84  0.35
  4    0.22  0.34  0.21      0.61  0.90  0.74      0.15  0.33  0.22
  5    0.78  0.95  0.87      0.28  0.87  0.40      0.14  0.81  0.20
  6    0.01  0.02  0.01      0.61  0.64  0.65      0.04  0.44  0.07
  7    0.74  0.36  0.71      0.92  0.96  0.96      0.44  0.93  0.60
  8    0.00  0.02  0.00      0.98  0.98  0.99      0.14  0.13  0.21
  9    0.87  0.87  0.91      0.20  0.32  0.24      0.49  0.91  0.57
 10    0.09  0.87  0.14      0.65  0.27  0.70      1.00  0.44  1.00
 11    0.68  0.73  0.75      0.15  0.17  0.17      0.80  0.95  0.80
 12    0.74  0.80  0.82      0.84  1.00  0.89      0.94  0.84  0.89
 13    0.30  0.51  0.39      0.43  0.83  0.44      0.06  0.16  0.06
 14    0.07  0.21  0.09      0.72  0.63  0.70      0.12  0.98  0.16
 15    0.96  0.96  0.96      0.80  0.89  0.88      1.00  1.00  1.00
 16    0.03  0.22  0.04      0.11  0.33  0.11      0.35  0.77  0.46
 17    0.78  0.78  0.77      0.99  0.29  0.97      0.09  0.07  0.09

Table A.8: Bayesian p-values for the test statistic "most-picked athlete" for each variation of the Dirichlet regression model and each week corresponding to the QB, RB, and WR positions in the double-up reward structure.

              QB                    RB                    WR
Week    V1    V2    V3        V1    V2    V3        V1    V2    V3
  1    0.41  0.49  0.40      0.07  0.54  0.59      0.02  0.08  0.07
  2    0.15  0.40  0.35      0.04  0.02  0.03      0.01  0.08  0.06
  3    0.12  0.22  0.11      0.05  0.72  0.65      0.00  0.09  0.07
  4    0.18  0.15  0.18      0.03  0.15  0.11      0.07  0.13  0.11
  5    0.03  0.21  0.06      0.03  0.65  0.39      0.15  0.67  0.58
  6    0.00  0.00  0.00      0.36  0.45  0.33      0.02  0.12  0.04
  7    0.64  0.55  0.53      0.09  0.24  0.15      0.32  0.73  0.47
  8    0.25  0.18  0.27      0.29  0.46  0.48      0.03  0.02  0.02
  9    0.80  0.66  0.86      0.05  0.04  0.05      0.19  0.25  0.24
 10    0.27  0.70  0.54      0.03  0.03  0.05      0.58  0.11  0.13
 11    0.01  0.00  0.01      0.02  0.02  0.02      0.36  0.36  0.32
 12    0.01  0.06  0.06      0.07  0.93  0.88      0.24  0.21  0.22
 13    0.04  0.11  0.07      0.09  0.37  0.27      0.04  0.08  0.07
 14    0.08  0.09  0.06      0.01  0.01  0.01      0.01  0.47  0.25
 15    0.41  0.43  0.56      0.09  0.44  0.44      0.97  1.00  1.00
 16    0.00  0.00  0.00      0.02  0.05  0.03      0.32  0.58  0.43
 17    0.30  0.22  0.28      0.14  0.05  0.13      0.00  0.01  0.00

Table A.9: Comparing the three variations of the Dirichlet regression model using normalized cross-validation scores for each position and each reward structure.

                 Top-heavy                  Quintuple-up                Double-up
Position     V1      V2      V3         V1      V2      V3         V1      V2      V3
D          1.0000  0.9376  0.9822     1.0000  0.9910  0.9998     0.9961  0.9895  1.0000
K          0.9700  0.9565  1.0000     0.9747  0.9843  1.0000     0.9649  0.9863  1.0000
QB         0.9998  0.9299  1.0000     0.9852  0.9808  1.0000     0.9708  0.9792  1.0000
RB         0.9978  0.9260  1.0000     0.9737  0.9878  1.0000     0.9607  0.9977  1.0000
TE         0.9983  0.9023  1.0000     0.9693  0.9655  1.0000     0.9744  0.9682  1.0000
WR         1.0000  0.9593  0.9911     0.9870  1.0000  0.9832     0.9806  1.0000  0.9887

Cross-Validation

In order to compare the models in terms of out-of-sample performance, we perform leave-one-out cross-validation. For each combination of (model variation, reward structure, position), we do the following. We pick 16 weeks (the training set) out of the 17 available. We then fit the model on the data from those 16 weeks and use it to generate posterior samples of β. We then compute the log-likelihood on the data for the holdout week (the test set). We repeat this 17 times, each time with a different holdout week, and sum the 17 log-likelihoods to get a "raw" cross-validation score. See Chapter 7 (page 175) of Gelman et al. 2013 for further details.

The results are displayed in Table A.9 for all positions across all variations and reward structures. In the table we report a "normalized" cross-validation score to make the results easier to interpret. Consider, for example, the QB position and the top-heavy reward structure. The "raw" cross-validation scores are 742.71, 690.79, and 742.88 for variations 1, 2, and 3, respectively, and we normalize these scores by simply dividing across by their maximum. Hence, a normalized score of 1.0000 denotes the "winner" for each position and reward structure in Table A.9. Variation 3 clearly performs the best among the three models.
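A minimal Python sketch of the scoring step is shown below; it assumes the 17 held-out log-likelihoods are already available, and the stand-in numbers are chosen only so that the sums reproduce the QB / top-heavy raw scores quoted above.

import numpy as np

def normalized_cv_scores(log_liks):
    """Normalize raw leave-one-week-out cross-validation scores.

    log_liks: dict mapping model variation -> list of 17 held-out log-likelihoods
    Returns raw scores (summed log-likelihoods) and scores divided by their maximum,
    following the normalization described in the text (higher is better).
    """
    raw = {v: float(np.sum(ll)) for v, ll in log_liks.items()}
    best = max(raw.values())
    normalized = {v: score / best for v, score in raw.items()}
    return raw, normalized

# Hypothetical per-week values; only the sums (742.71, 690.79, 742.88) matter here.
raw, norm = normalized_cv_scores({
    "V1": [742.71 / 17] * 17,
    "V2": [690.79 / 17] * 17,
    "V3": [742.88 / 17] * 17,
})
print(norm)   # V3 is the winner with a normalized score of 1.0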

Appendix B: Additional Details for Chapter 2

B.1 Coalition Value

A key ingredient in using the concept of the Shapley value is to define the value of a coalition, i.e., the value that a subset of players generates. The key decision here is with regard to the players that are not in the coalition, and this decision can be very context dependent.

We illustrate the issues with a simple example. An organization consists of P players, and each player s takes the baseline action a = 1 or an enhanced action a = 2. Let a = [a_s]_{s∈P} denote the actions of all the players. Let f(a_S, S) denote the value generated by the organization when only S ⊆ P players are present and they take actions a_S, with f(a_∅, ∅) = 0 and f(1, P) > 0, i.e., there is value to the organization even if all players take the baseline action. Next, consider a coalition X ⊆ P of players. We have two options for treating the players P \ X:

1. A player p ∉ X is treated as absent from the organization. Thus, the value v(X) is only generated by the players in X, i.e., v(X) = f(a_X, X). This is the typical specification in SV applications. In this setting, v(∅) = 0 since, with no players, there is no organization and, therefore, no value.

2. A player p ∉ X is assumed to be still present in the organization and taking the baseline action a = 1, i.e., v(X) = f((a_X, 1_{P\X}), P). In this case, v(∅) = f(1, P) > 0, the value generated when all players in the organization take the baseline action a = 1. Next, consider any player s ∈ P that takes the baseline action a_s = 1; then v(X ∪ {s}) = v(X) = f((a_X, 1_{P\X}), P) and, consequently, the SV allocated to any such player is, identically, zero. Thus, all of the additional value f(a, P) − f(1, P) is allocated to players taking action 2, i.e., players taking action 2 are able to "free ride" on the effort of the players taking the baseline action. The players taking the baseline action a = 1 at most get a share of the baseline value f(1, P), which can be arbitrarily small. Note that this approach implicitly attributes the counterfactual value of taking a = 2; however, it attributes too much of the value to action a = 2. To further highlight the "unfairness" of this free-riding, consider a setting with f(1, P) = 0, i.e., the value is zero if all players take the baseline action, and, in addition, player s_0 is essential for the organization, i.e., f(a_X, X) = 0 if s_0 ∉ X. Suppose the player s_0 plays the baseline action a = 1; then s_0 gets zero attribution even though she is essential. (A small numerical illustration of this contrast is given after this list.)
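To make the free-riding issue concrete, here is a small, self-contained Python sketch (our own illustrative example, not from the thesis) that computes exact Shapley values for a toy two-player organization under the two coalition-value conventions above; the value function f and the player names are assumptions chosen to mirror the discussion.

from itertools import permutations

players = ["s1", "s2"]          # s1 takes the enhanced action, s2 takes the baseline action
actions = {"s1": 2, "s2": 1}

def f(action_profile, present):
    """Toy value function: s2 is essential, and value 1 is generated iff some present
    player takes the enhanced action (a = 2).  Here f(1, P) = 0."""
    if "s2" not in present:
        return 0.0
    return 1.0 if any(action_profile[p] == 2 for p in present) else 0.0

def v_absent(coalition):
    """Option 1: players outside the coalition are absent."""
    return f({p: actions[p] for p in coalition}, set(coalition))

def v_baseline(coalition):
    """Option 2: players outside the coalition stay present but take the baseline action."""
    profile = {p: (actions[p] if p in coalition else 1) for p in players}
    return f(profile, set(players))

def shapley(v):
    """Exact Shapley value via averaging marginal contributions over player orderings."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition.add(p)
    return {p: phi[p] / len(orders) for p in players}

print("option 1 (absent):  ", shapley(v_absent))    # {'s1': 0.5, 's2': 0.5}
print("option 2 (baseline):", shapley(v_baseline))  # {'s1': 1.0, 's2': 0.0} -- free riding

Under option 1 the two players split the value evenly, whereas under option 2 the player taking the enhanced action captures all of it, even though the baseline player is essential.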

This simple setting shows that if there is "value" in the baseline action, and we want to be able to attribute to the baseline action, we must define coalitions by ensuring that players P \ X are not contributing to value generation. In the online advertising setting, we can achieve this by assuming that the traffic exits the system with probability 1 when one employs a state-action pair (s, a) ∉ X.

To illustrate the implications of the two choices in a Markov chain setting, consider the network in Figure B.1. The total value in the system is 1. States 1 and 2 are both essential for completion; however, in state 2 it does not matter whether we take the baseline no-ad-action or the ad-action. Thus, any reasonable attribution scheme should assign a value of 1/2 to each of the states, and further assign a counterfactual-adjusted value of 1/2 to the ad-action in state 1 and a counterfactual-adjusted value of 0 to the ad-action in state 2. Under the construct in setting 2, where the "empty" coalition consists of taking the baseline "no-ad" action in each state, the ad-action at state 1 receives the entire value, i.e., the baseline action in state 2 does not receive any attribution.

Figure B.1: [Diagram omitted: states 1 and 2, a conversion state c, and a quit state q, with λ_1 = 1.] The action space consists of two actions: no-ad-action (a = 1) and ad-action (a = 2). Solid blue (dashed red) lines denote transitions if the ad-action (no-ad-action) is taken. The advertiser takes the ad-action at state 1 w.p. 1 and takes the no-ad-action in state 2 w.p. 1, i.e., β_1^2 = β_2^1 = 1.

B.2 Proof of Theorem 2.1

Proof of Theorem 2.1. Counterfactual efficiency and linearity follow from (2.6) when used with the efficiency and linearity of SV, respectively. Counterfactual null player follows from (2.5). For counterfactual symmetry, consider (s, a) ∈ S × A and (s', a') ∈ S × A satisfying (A) and (B) and observe that

  ψ_s^{a,Shap} = ∑_{X ⊆ {S×A}\{(s,a)}} w_{|X|} × {v(X ∪ {(s,a)}) − v(X ∪ {(s,1)^a})}

    = ∑_{X ⊆ {S×A}\{(s,a),(s',a')}} w_{|X|} × {v(X ∪ {(s,a)}) − v(X ∪ {(s,1)^a})}
      + ∑_{X ⊆ {S×A}\{(s,a),(s',a')}} w_{|X|+1} × {v(X ∪ {(s,a),(s',a')}) − v(X ∪ {(s,1)^a, (s',a')})}

    = ∑_{X ⊆ {S×A}\{(s,a),(s',a')}} w_{|X|} × {v(X ∪ {(s',a')}) − v(X ∪ {(s',1)^{a'}})}
      + ∑_{X ⊆ {S×A}\{(s,a),(s',a')}} w_{|X|+1} × {v(X ∪ {(s,a),(s',a')}) − v(X ∪ {(s',1)^{a'}, (s,a)})}

    = ∑_{X ⊆ {S×A}\{(s',a')}} w_{|X|} × {v(X ∪ {(s',a')}) − v(X ∪ {(s',1)^{a'}})}

    = ψ_{s'}^{a',Shap}.

To show uniqueness, consider {ψ_s^a}_{(s,a)∈S×A} such that it satisfies the counterfactual axioms. Motivated by (2.6), express ψ_s^a as

  ψ_s^a = ψ_s^a + π_s^{a,Shap}(M_{(s,a)}) − π_s^{a,Shap}(M_{(s,a)}),   where we define π_s^a := ψ_s^a + π_s^{a,Shap}(M_{(s,a)}),

for all (s, a) ∈ S × A. To show ψ_s^a = ψ_s^{a,Shap} for all (s, a) ∈ S × A, it suffices to show that π_s^a = π_s^{a,Shap} for all (s, a) ∈ S × A. We do so by proving {π_s^a}_{(s,a)∈S×A} satisfies the four desirable properties of SV from Section 2.4.1 (recall {π_s^{a,Shap}}_{(s,a)∈S×A} is the unique solution to those properties).

Efficiency: Since π_s^a = ψ_s^a + π_s^{a,Shap}(M_{(s,a)}) for all (s, a) ∈ S × A and {ψ_s^a}_{(s,a)∈S×A} satisfies counterfactual efficiency, we get

  ∑_{(s,a)∈S×A} π_s^a = ∑_{(s,a)∈S×A} ψ_s^a + ∑_{(s,a)∈S×A} π_s^{a,Shap}(M_{(s,a)}) = v(S × A).

Symmetry: Consider (s, a) ∈ S × A and (s', a') ∈ S × A. We need to show that

  v(X ∪ {(s,a)}) = v(X ∪ {(s',a')}) ∀ X ⊆ {S × A}\{(s,a),(s',a')}   (⋆)   ⟹   π_s^a = π_{s'}^{a'}.

We can use the fact that {ψ_s^a, ψ_{s'}^{a'}} satisfy counterfactual symmetry, i.e., if (A) and (B) hold for all X ⊆ {S × A}\{(s,a),(s',a')}, then

  π_s^a − π_s^{a,Shap}(M_{(s,a)}) = π_{s'}^{a'} − π_{s'}^{a',Shap}(M_{(s',a')}).

For the purposes of a contradiction, suppose that (⋆) holds but π_s^a ≠ π_{s'}^{a'}. It suffices to show that this results in {ψ_s^a, ψ_{s'}^{a'}} violating counterfactual symmetry. Suppose (A) and (B) hold. Given (⋆), statement (A) implies

  v(X ∪ {(s,1)^a}) = v(X ∪ {(s',1)^{a'}}) ∀ X ⊆ {S × A}\{(s,a),(s',a')}.

Together with (B), this implies π_s^{a,Shap}(M_{(s,a)}) = π_{s'}^{a',Shap}(M_{(s',a')}):

  π_s^{a,Shap}(M_{(s,a)}) = ∑_{X ⊆ {S×A}\{(s,a)}} w_{|X|} × {v(X ∪ {(s,1)^a}) − v(X)}

    = ∑_{X ⊆ {S×A}\{(s,a),(s',a')}} w_{|X|} × {v(X ∪ {(s,1)^a}) − v(X)}
      + ∑_{X ⊆ {S×A}\{(s,a),(s',a')}} w_{|X|+1} × {v(X ∪ {(s,1)^a, (s',a')}) − v(X ∪ {(s',a')})}

    = ∑_{X ⊆ {S×A}\{(s,a),(s',a')}} w_{|X|} × {v(X ∪ {(s',1)^{a'}}) − v(X)}
      + ∑_{X ⊆ {S×A}\{(s,a),(s',a')}} w_{|X|+1} × {v(X ∪ {(s',1)^{a'}, (s,a)}) − v(X ∪ {(s,a)})}

    = ∑_{X ⊆ {S×A}\{(s',a')}} w_{|X|} × {v(X ∪ {(s',1)^{a'}}) − v(X)}

    = π_{s'}^{a',Shap}(M_{(s',a')}).

Hence, we require π_s^a = π_{s'}^{a'} for {ψ_s^a, ψ_{s'}^{a'}} to satisfy counterfactual symmetry, which contradicts π_s^a ≠ π_{s'}^{a'}.

Linearity: Since π_s^a = ψ_s^a + π_s^{a,Shap}(M_{(s,a)}) for all (s, a) ∈ S × A and both ψ_s^a and π_s^{a,Shap}(M_{(s,a)}) satisfy linearity, it follows that π_s^a does so too.

Null player: Consider (s, a) ∈ S × A. We need to show that

  v(X ∪ {(s,a)}) = v(X) ∀ X ⊆ {S × A}\{(s,a)}   (∗)   ⟹   π_s^a = 0.

We can use the fact that ψ_s^a satisfies counterfactual null player, i.e.,

  v(X ∪ {(s,a)}) = v(X ∪ {(s,1)^a}) ∀ X ⊆ {S × A}\{(s,a)}   (∗∗)   ⟹   π_s^a = π_s^{a,Shap}(M_{(s,a)}).

For the purposes of a contradiction, suppose that (∗) holds but π_s^a ≠ 0. It suffices to show that this results in ψ_s^a violating counterfactual null player. Suppose (∗∗) holds. Subtract v(X) from both sides of (∗∗) to get

  v(X ∪ {(s,a)}) − v(X) = v(X ∪ {(s,1)^a}) − v(X) ∀ X ⊆ {S × A}\{(s,a)},

which combined with (∗) yields

  0 = v(X ∪ {(s,1)^a}) − v(X) ∀ X ⊆ {S × A}\{(s,a)}.

This implies π_s^{a,Shap}(M_{(s,a)}) = 0 and hence, we require π_s^a = 0 for ψ_s^a to satisfy counterfactual null player, which contradicts π_s^a ≠ 0. This completes the proof.



B.3 Sensitivity of CASV Due to Finite Size of Data

In this appendix, we discuss the sensitivity of CASV due to the finite amount of data one observes in practice. Throughout the discussion, we assume there exists a true underlying Markov model M but we only observe a finite dataset of D sample paths 𝒟 := {P_i}_{i=1}^{D},^1 where each path is sampled independently from M. The goal is to estimate CASV using the data 𝒟 and understand the difference between the estimated CASV and the true CASV.

We use the data 𝒟 to compute an estimate M̂ for the Markov model M (step 1) and then use M̂ to estimate CASV (step 2). We followed this process for the results presented in Section 2.6. For step 2, we used Algorithm 4 with the inputs M̂ (estimated model) and N = 100,000 (number of Monte Carlo samples). As discussed in Section 2.5.2, N = 100,000 is large enough to obtain a stable estimate of CASV and hence, we do not discuss Monte Carlo error here. For ease of notation, we use {ψ_s^{a,Shap}(M)}_{(s,a)∈S×A} and {ψ_s^{a,Shap}(M̂)}_{(s,a)∈S×A} to denote the output of Algorithm 4 when the input model to the algorithm is M and M̂, respectively.^2

^1 The notation 𝒟 should not be confused with the number of Monte Carlo samples N used in Algorithm 4. Instead, D corresponds to the number of user paths we have in our real-world dataset, which is several million.
^2 To be technically correct, we should use the notation ψ̂_s^{a,Shap}(M) and ψ̂_s^{a,Shap}(M̂) to acknowledge the Monte Carlo error in Algorithm 4, but as discussed earlier, we found the Monte Carlo error to be negligible with N = 100,000.

In order to understand the difference between the true CASV {ψ_s^{a,Shap}(M)}_{(s,a)∈S×A} and the estimated CASV {ψ_s^{a,Shap}(M̂)}_{(s,a)∈S×A}, we conducted the following simple yet informative experiment. We fix M to be the model we estimated using real-world data in Section 2.6 and then repeat the following steps 1,000 times (a minimal sketch of this loop is given after the list):

1. Sample a dataset 𝒟 := {P_i}_{i=1}^{D} of D independent paths from M.

2. Use 𝒟 to obtain the estimated Markov model M̂.

3. Use M̂ in Algorithm 4 with N = 100,000 to obtain an estimate of CASV.
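The following Python sketch (not the thesis code) captures the structure of this loop on a toy two-state chain: it repeatedly samples D paths, re-estimates the transition probabilities, and records how the spread of the estimator shrinks with D. The CASV computation itself (Algorithm 4) is omitted, and all numbers are illustrative.

import numpy as np

rng = np.random.default_rng(3)
P_true = np.array([[0.2, 0.8],    # toy 2-state transition matrix standing in for M
                   [0.6, 0.4]])

def sample_path(P, length=10):
    """Sample one path of states from the toy chain, starting in state 0."""
    states = [0]
    for _ in range(length - 1):
        states.append(rng.choice(2, p=P[states[-1]]))
    return states

def estimate_chain(paths):
    """Estimate the transition matrix from observed paths (the analogue of M-hat)."""
    counts = np.zeros((2, 2))
    for path in paths:
        for s, s_next in zip(path[:-1], path[1:]):
            counts[s, s_next] += 1
    return counts / counts.sum(axis=1, keepdims=True)

for D in [100, 1000]:               # the thesis uses D in {100, 1000, 10000} and 1,000 replications
    estimates = []
    for _ in range(100):            # fewer replications here, purely for speed
        paths = [sample_path(P_true) for _ in range(D)]
        estimates.append(estimate_chain(paths)[0, 1])
    print(D, round(float(np.std(estimates)), 4))   # spread of the estimator shrinks as D grows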

We experiment with D ∈ {100, 1000, 10000} to understand how the finite sample size of data affects the estimate of CASV. In particular, we expect the distribution of the estimator to become more concentrated around the true value as we increase the sample size D.

The results are summarized in Figure B.2. As in Figure 2.4, we have condensed the state-action attributions into attributions to only actions by aggregating over states after computing state-action attributions. Note that the dotted red line in each subfigure of Figure B.2 is the attribution corresponding to the true model M, i.e., it corresponds to the attribution displayed under CASV in Figure 2.4. For instance, the dotted red line in the no-ad column corresponds to an attribution of 50%, which matches the attribution to no-ad under CASV in Figure 2.4. The histogram in each subfigure characterizes the distribution of the estimator and the solid black line is the empirical mean of the distribution. Each column corresponds to a different ad action and each row corresponds to a different value of D (number of paths used to estimate the Markov model).

Clearly, as we expected, the distribution of the estimator concentrates around the true value as we increase the sample size D. This is perhaps not too surprising since the estimator is asymptotically consistent (because M̂ → M as D → ∞). However, it is remarkable that the estimator almost converges to the true value with a value of D as low as 10,000. Contrasting this sample size of 10,000 with the size of our real-world data (several million) highlights the robustness of our numerics presented in Section 2.6.3.


Figure B.2: Summary of the sensitivity experiment. Each column corresponds to an ad action and each row corresponds to a different value of D. The dotted red line denotes the true CASV and the histogram denotes the distribution of the estimator. The solid black line equals the empirical mean of the distribution. (We report the percentage attributions to each action by aggregating over states, i.e., ∑_s ψ_s^a / ∑_{(s',a')} ψ_{s'}^{a'} for each a ∈ A.)

Furthermore, the difference between the solid black line (empirical mean) and the dotted red line (true value) in Figure B.2 for D = 100 suggests that the CASV estimator might be biased. In fact, we believe that to be quite likely. Even if M̂ is an unbiased estimator of M, it is not necessarily the case that ψ_s^{a,Shap}(M̂) is an unbiased estimator of ψ_s^{a,Shap}(M) since the mapping from M to ψ_s^{a,Shap} is non-linear for an arbitrary (s, a) ∈ S × A. Furthermore, characterizing such bias analytically is challenging due to the complicated nature of this mapping. Fortunately, given the enormous size of our real-world dataset (which is quite common in the context of online advertising, where data is relatively cheap to collect), we do not have to be concerned about this bias.

B.4 Proofs of Theorem 2.3 and Proposition 2.2

Proof of Theorem 2.3. We split the proof into four parts.

Step 1: We use the fact that CASV equals the difference between two SVs,

  ψ_s^{a,Shap}(M) = π_s^{a,Shap}(M) − π_s^{a,Shap}(M_{(s,a)}) ∀ (s, a) ∈ S × A
  ψ̄^{a,Shap}(M) = π̄^{a,Shap}(M) − π̄^{a,Shap}(M̄^a) ∀ a ∈ A,

to express the difference ∑_{s∈S} ψ_s^{a,Shap} − ψ̄^{a,Shap} as

  [ ∑_{s∈S} π_s^{a,Shap}(M) − π̄^{a,Shap}(M) ] − [ ∑_{s∈S} π_s^{a,Shap}(M_{(s,a)}) − π̄^{a,Shap}(M̄^a) ],

where we denote the first bracketed term by (◦) and the second by (•).

Step 2: To analyze (◦), we observe that the state-specific and aggregated SVs (see (2.3) and (2.10)) equal

  π_s^{a,Shap}(M) = ∑_{X∈T_1∪T_2} c_s^a(X) v_M(X)
  π̄^{a,Shap}(M) = ∑_{X∈T_1} c̄^a(X) v_M(X),

where T_1 and T_2 are the sets of type 1 and 2 coalitions, respectively. This implies

  (◦) = ∑_{X∈T_1} {c^a(X) − c̄^a(X)} v_M(X) + ∑_{X∈T_2} ∑_{s∈S} c_s^a(X) v_M(X).

Step 3: We analyze (•) similarly to obtain

  (•) = ∑_{X∈T_1} {c^a(X) − c̄^a(X)} v_{M̄^a}(X) + ∑_{X∈T_2} ∑_{s∈S} c_s^a(X) v_{M_{(s,a)}}(X),

where we use the fact that if X ∈ T_1, then v_{M_{(s,a)}}(X) = v_{M̄^a}(X) for all (s, a) ∈ S × A.

Step 4: Putting steps 1, 2, and 3 together concludes the proof.

Proof of Proposition 2.2. It suffices to show the existence of one such instance of (M, a). Consider the network in Figure B.3. In the aggregated model, the two players (a = 1 and a = 2) are symmetric since a user cannot convert if either one is absent. Accordingly, for a ∈ {1, 2}, ψ̄^{a,Shap} = 1/2. On the other hand, in the state-specific model, players (1, 2) and {(s, 1)}_{s=2}^{m} are null players and the remaining m players are symmetric; hence, each of them receives an attribution equal to the total value in the system divided by m. Accordingly, ∑_{s∈S} ψ_s^{1,Shap} = 1/m and ∑_{s∈S} ψ_s^{2,Shap} = (m−1)/m. Clearly, lim_{m→∞} ψ̄^{1,Shap} / ∑_{s∈S} ψ_s^{1,Shap} = ∞, which concludes the proof.

Figure B.3: Network for the proof of Proposition 2.2. [Diagram omitted: states 1, 2, 3, ..., m, a conversion state c, and a quit state q, with λ_1 = 1.] The action space consists of three actions: show no-ad, show ad 1, and show ad 2. Solid blue (dashed brown) lines denote transitions if ad 1 (ad 2) is shown. At state 1, taking action 1 moves the traffic to state 2 w.p. 1 whereas taking action 2 directs the traffic to the quit state. At state s ∈ {2, ..., m}, taking action 1 results in a transition to the quit state whereas action 2 moves the users to the "next" state. For brevity, we do not show the transitions if an ad is not shown (to the quit state w.p. 1). The advertiser shows ad 1 at state 1 w.p. 1 and ad 2 at all other states w.p. 1.



B.5 Counterfactuals: Local, Global, and Forward-Looking

In Section 2.4.3, we defined the counterfactual of player (s, a) as (s, 1)^a, which is equivalent to replacing the transition probabilities of (s, a) by those of (s, 1). Given the original Markov chain M (which corresponds to players {(s', a')}_{(s',a')∈S×A}) and an arbitrary player (s, a) ∈ S × A, we defined the counterfactual network M_{(s,a)} such that we "replace" the player (s, a) by its counterfactual player (s, 1)^a, keeping everything else as it is, and stated that our notion of CASV equals the difference between two SVs:

  ψ_s^{a,Shap}(M) = π_s^{a,Shap}(M) − π_s^{a,Shap}(M_{(s,a)}).   (B.1)

In this appendix, we will call such a view of a counterfactual local, since we replace a player (s, a) by its counterfactual (s, 1)^a "locally" and keep everything else as it is. Furthermore, we will call M_{(s,a)} the local counterfactual network. Though our local counterfactual construct is axiomatically appealing (recall Theorem 2.1), there are alternative ways to construct a counterfactual in the Markovian network M and we discuss some possibilities in this appendix. In this direction, we first revisit our local counterfactual construct in Appendix B.5.1 and show its potential limitation. We then introduce two alternative constructs of counterfactuals in Appendices B.5.2 ("global") and B.5.3 ("forward-looking") and contrast these three different constructs. Note that the discussion in this appendix is applicable to non-Markovian models as well, but we ground ourselves in Markovian models for ease of exposition.

B.5.1 Local Counterfactual

Although the "local" counterfactual has an axiomatic grounding, it has the potential to "take away" too much value from the player (s, a) since it allows the counterfactual player (s, 1)^a to "have access" to all of the remaining players that were present in the original network M. To shed light on this limitation, we now discuss a simple example.

Consider the network in Figure B.4. This network is time-indexed, and a user converts if and only if she sees at least 1 ad in total. The current policy is to show an ad on both days and the path P corresponding to the original network M is always A0 →(ad) B1 →(ad) c. Then, CASV attributes zero to the ad action on both days. To see this, observe that under the original network M, since the path P is always A0 →(ad) B1 →(ad) c, both the players (A0, ad) and (B1, ad) have a SV of 1/2 each. However, since "locally" replacing either of the players by their counterfactual also results in a conversion w.p. 1, the SV under the counterfactual network M_{(s,a)} is also one half for (s, a) ∈ {(A0, ad), (B1, ad)}. Hence, it follows from (B.1) that CASV equals zero for both players. Thus, we see that the "local" counterfactual construct has the potential to "take away" too much value from the players. The key reason here is that when turning ad #1 off, ad #2 is assumed to be on, and when turning ad #2 off, ad #1 is assumed to be on. In other words, the counterfactual is local and it "gets access" to all of the remaining players that were present in the original network.

Figure B.4: [Diagram omitted: states A0, B0, B1, a conversion state c, and a quit state q, split across day A and day B.] There are two days (day A and day B) and the state equals (day, number of ads seen so far). For notational simplicity, we use "A0" to denote state (A, 0), "B0" to denote state (B, 0), and "B1" to denote state (B, 1). The action space consists of two actions: no-ad-action (a = 1) and ad-action (a = 2). Solid blue (dashed red) lines denote transitions if an ad-action (no-ad-action) is taken. The advertiser takes the ad-action on both days w.p. 1. The user's purchase decision is made after the second day and she converts if and only if she sees at least 1 ad in total.

B.5.2 Global Counterfactual

In direct contrast to the local counterfactual, we now discuss the global CASV (G-CASV),

where the no-ad-action in state s is not allowed to "exploit" the ads in states s' ≠ s. We define the global counterfactual network M^G as the network where all the players are replaced by their counterfactuals, i.e., the no-ad-action is taken at all the states w.p. 1. The G-CASV attribution to player (s, a) ∈ S × A is given by

  χ_s^{a,Shap}(M) = π_s^{a,Shap}(M) − π_s^{a,Shap}(M^G).   (B.2)

The G-CASV satisfies a (different) set of axioms but for brevity, we do not discuss such axioms here. Instead, we focus on the difference between G-CASV and (local) CASV.

For the network in Figure B.4, the SV π_s^{a,Shap}(M^G) is identically equal to zero for all (s, a) ∈ S × A because the user converts if, and only if, she sees at least one ad. Hence, (B.2) implies that the ad on day 1 and the ad on day 2 will each receive an attribution of 1/2. Clearly, G-CASV moves in the right direction in that it recognizes that there is value to showing ads and does not "take away" all the value from the ads. However, it appears that this global view may end up attributing too much value to ads. For instance, in Figure B.4, once the user has seen the first ad (which happens w.p. 1 in our setup), there is no added value in showing the second ad. This line of reasoning implies that the second ad should receive zero attribution, but the global counterfactual still attributes a value of 1/2. Accordingly, it seems desirable to have a middle ground between the two extremes of local and global counterfactuals, and we propose one such middle ground next.

B.5.3 Forward-Looking Counterfactual

The forward-looking counterfactual asks “conditioned on being in state s, what would have happened if instead of taking the ad-action a at state s, no-ad-action was taken at state s and at all the states encountered from then on?”. In contrast, the local counterfactual view asks the question “conditioned on being in state s, what would have happened if instead of taking the ad-action a at state s, no-ad-action was taken at state s but other state-ad pairs were as in the original network?” and the global counterfactual view does not even condition on state s and simply asks the question “what would have happened if no-ad-action was taken at all the states?”.

Define the "expanded" Markov chain M^ext on the state space S^ext = S × A, where for each state (s, a) ∈ S^ext there is a single action, with the transition probability from (s, a) to state (s', a') ∈ S^ext equal to p_{ss'}^a β_{s'}^{a'}. Note that the original Markov chain M and M^ext are equivalent in the sense that the sequences of state-action pairs have the same distribution. Define the Markov chain M^G on the extended state space S^ext by setting the transition probability from state (s, a) to state (s', a') ∈ S^ext to be equal to p_{ss'}^1 β_{s'}^{a'}, i.e., we take the baseline no-ad action in each state (s, a) ∈ S^ext.

For a given state-action pair (s, a), define the "forward-looking" Markov chain M^F_{(s,a)} as follows:

1. Start in the Markov chain M^ext with the expanded state space S^ext.

2. When the given (s, a) is encountered in M^ext, transition to the corresponding state in M^G and follow that chain until absorption.

Given these definitions, the forward-looking CASV (F-CASV) for the state (s, a) is given by

  φ_s^{a,Shap}(M) = π_s^{a,Shap}(M) − π_s^{a,Shap}(M^F_{(s,a)}).   (B.3)

This characterization can be converted into an appropriate axiomatic characterization for F-CASV. For the network in Figure B.4, F-CASV attributes 1/2 to the ad-action on day 1 and 0 to the ad-action on day 2. The remaining half is attributed to no-ad. For all reasonable networks, we expect

  local CASV ≤ forward-looking CASV ≤ global CASV;

however, we are able to prove this statement only under very strong assumptions on the underlying Markovian network.

Appendix C: Additional Details for Chapter 3

C.1 Proof of Theorem 3.1

In this appendix, we present the proof of Theorem 3.1. To do so, we leverage stochastic approximation theory, for which we provide a brief primer (Appendix C.1.1). We then establish a few supporting lemmas (Appendix C.1.2), which we invoke to prove Theorem 3.1 (Appendix C.1.3).

C.1.1 Primer on Asynchronous Stochastic Approximation

The contents of this subsection are based on Tsitsiklis 1994 and hence, we refer the reader to Tsitsiklis 1994 for further details. Here, we only present the results of Tsitsiklis 1994 that are relevant to us. To help the reader connect these results to our setup, we alter some of the notation in Tsitsiklis 1994 so that it matches our notation.

First, observe that the optimal Q-value vector Q* := [Q*(s, a)]_{(s,a)∈S×A} for the conversion funnel is the unique solution to the following system of equations (Bertsekas 1995; Puterman 2014; Sutton and Barto 2018):

  Q*(s, a) = ∑_{s'∈S⁺} p_{sas'} × max_{a'∈A} Q*(s', a') ∀ (s, a) ∈ S × A,   (C.1)

where we set Q*(c, a) = 1 and Q*(q, a) = 0 for all a ∈ A. For ease of notation, we will denote the RHS of (C.1) by F_{sa}(Q*) and hence, we get

  Q*(s, a) = F_{sa}(Q*) ∀ (s, a) ∈ S × A.   (C.2)
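Since (C.2) is a standard Bellman-type fixed point, it can be solved by simple value iteration. The sketch below (in Python, with made-up transition probabilities for a toy two-state funnel) is only meant to illustrate the system of equations; it is not part of the thesis.

import numpy as np

S = [0, 1]                 # non-absorbing states
A = [0, 1]                 # actions
CONVERT, QUIT = 2, 3       # absorbing states with Q* = 1 and Q* = 0, respectively

# p[s][a] is an illustrative distribution over the successor states {0, 1, CONVERT, QUIT}.
p = np.array([
    [[0.50, 0.30, 0.05, 0.15],    # state 0, action 0
     [0.30, 0.45, 0.15, 0.10]],   # state 0, action 1
    [[0.10, 0.50, 0.10, 0.30],    # state 1, action 0
     [0.05, 0.40, 0.35, 0.20]],   # state 1, action 1
])

Q = np.zeros((4, 2))
Q[CONVERT, :] = 1.0            # Q*(c, a) = 1
Q[QUIT, :] = 0.0               # Q*(q, a) = 0
for _ in range(200):           # iterate Q(s,a) <- sum_{s'} p(s'|s,a) max_{a'} Q(s',a')
    V = Q.max(axis=1)
    for s in S:
        for a in A:
            Q[s, a] = p[s, a] @ V
print(Q[:2])                   # converged Q* for the non-absorbing states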

Second, we define an asynchronous stochastic approximation scheme with respect to the system of equations (C.2). Our goal here is to be able to find a vector X := [X(s, a)]_{(s,a)∈S×A} such that it satisfies (C.2) (and hence, equals Q*). We initialize X(s, a) arbitrarily in [0, 1] for all (s, a) ∈ S × A and set X(c, a) = 1 and X(q, a) = 0 for all a ∈ A. We update X iteratively and use the notation X_{t+1} := [X_{t+1}(s, a)]_{(s,a)∈S×A} to denote the value of X just after iteration t ∈ {1, 2, ...}. The initialization corresponds to X_1. The value corresponding to a state-action pair (s, a) ∈ S × A might not be updated in each iteration and we denote by T(s, a) ⊆ {1, 2, ...} the iterations in which it is updated. In an asynchronous stochastic approximation scheme, the updates are as follows for all (s, a) ∈ S × A:

  X_{t+1}(s, a) ← X_t(s, a)   if t ∉ T(s, a),
  X_{t+1}(s, a) ← X_t(s, a) + κ_t(s, a) [F_{sa}(X_t) − X_t(s, a) + w_t(s, a)]   if t ∈ T(s, a).   (C.3)

Here, κ_t(s, a) ∈ [0, 1] is a stepsize parameter and w_t(s, a) is a noise term. For all t ∈ {1, 2, ...}, the information set F_t is defined such that it captures the history of the algorithm up to the time the stepsizes κ_t(s, a) are selected, but it does not include the noise information w_t(s, a). Finally, we state the result we will leverage, which follows Theorem 3 of Tsitsiklis 1994 and uses Assumption 3.1 (absorption) from Section 3.2.

Proposition C.1. Under Assumption 3.1, X_t converges to Q* w.p. 1 as t → ∞ if the following conditions hold:

1. For every (s, a) ∈ S × A and t ∈ {1, 2, ...}, w_t(s, a) is F_{t+1}-measurable.

2. For every (s, a) ∈ S × A and t ∈ {1, 2, ...}, κ_t(s, a) is F_t-measurable.

3. For every (s, a) ∈ S × A and t ∈ {1, 2, ...}, E[w_t(s, a) | F_t] = 0.

4. There exist deterministic constants A and B such that for every (s, a) ∈ S × A and t ∈ {1, 2, ...}, we have

   E[w_t²(s, a) | F_t] ≤ A + B max_{(s',a')} max_{τ≤t} |Q_τ(s', a')|².

5. For every (s, a) ∈ S × A,

   ∑_{t=0}^{∞} κ_t(s, a) = ∞ w.p. 1   and   ∑_{t=0}^{∞} κ_t²(s, a) < ∞ w.p. 1.

Remark C.1. Theorem 3 of Tsitsiklis 1994 requires “Assumptions 1, 2, 3, and 5” (as stated in Tsitsiklis 1994). Assumption 1 of Tsitsiklis 1994 is trivially satisfied by the asynchronous stochastic approximation we defined above and our conditions in Proposition C.1 cover Assumptions 2 and 3 of Tsitsiklis 1994. Finally, our absorption assumption implies Assumption 5 of Tsitsiklis 1994 (see the discussion above Theorem 4 in Tsitsiklis 1994).

C.1.2 Supporting Lemmas

We now establish a few supporting lemmas that will help us in proving Theorem 3.1. In particular, we show that the expected update in our MFABL algorithm is an instance of the asynchronous stochastic approximation scheme above.

Consider an arbitrary iteration^1 i in Algorithm 6 such that the corresponding consumer was in state s ∈ S, the firm took action a ∈ A, and the consumer transitioned to state s' ∈ S⁺. Denote by Q_i(s, a) =_d Beta(α_{sa}(i), β_{sa}(i)) the belief over the value of (s, a) before the update and by Q_{i+1}(s, a) =_d Beta(α_{sa}(i+1), β_{sa}(i+1)) the belief after the update. Denote by Q̄_i(s, a) and Q̄_{i+1}(s, a) the expected values of Q_i(s, a) and Q_{i+1}(s, a), respectively. Finally, define n_{sa}(i) := α_{sa}(i) + β_{sa}(i) and n_{sa}(i+1) := α_{sa}(i+1) + β_{sa}(i+1). Recall that f_{s'} denotes the feedback generated from state s':

  f_{s'} ∼ Bernoulli( max_{a'∈A} α_{s'a'}(i) / (α_{s'a'}(i) + β_{s'a'}(i)) ).

The following lemma characterizes the expected update.

^1 By "iteration", we refer to a specific (t, n) pair in Algorithm 6.

Lemma C.1. The expected update obeys the following equation:

  Q̄_{i+1}(s, a) = [n_{sa}(i) / (n_{sa}(i) + 1)] Q̄_i(s, a) + [1 / (n_{sa}(i) + 1)] f_{s'}.

Proof. We split the proof into two parts: (1) f_{s'} = 0 and (2) f_{s'} = 1.

Case 1. If f_{s'} = 0, MFABL increases β_{sa} by 1, i.e.,

  α_{sa}(i+1) = α_{sa}(i)
  β_{sa}(i+1) = β_{sa}(i) + 1.

Before the update, the expected value of Q_i(s, a) =_d Beta(α_{sa}(i), β_{sa}(i)) equals

  Q̄_i(s, a) = α_{sa}(i) / n_{sa}(i).

After the update, the expected value of Q_{i+1}(s, a) =_d Beta(α_{sa}(i+1), β_{sa}(i+1)) equals

  Q̄_{i+1}(s, a) = α_{sa}(i+1) / n_{sa}(i+1)
               = α_{sa}(i) / (n_{sa}(i) + 1)
               = [n_{sa}(i) / (n_{sa}(i) + 1)] [α_{sa}(i) / n_{sa}(i)] + [1 / (n_{sa}(i) + 1)] × 0
               = [n_{sa}(i) / (n_{sa}(i) + 1)] Q̄_i(s, a) + [1 / (n_{sa}(i) + 1)] × 0
               = [n_{sa}(i) / (n_{sa}(i) + 1)] Q̄_i(s, a) + [1 / (n_{sa}(i) + 1)] f_{s'}.

Case 2. If f_{s'} = 1, MFABL increases α_{sa} by 1, i.e.,

  α_{sa}(i+1) = α_{sa}(i) + 1
  β_{sa}(i+1) = β_{sa}(i).

Before the update, the expected value of Q_i(s, a) =_d Beta(α_{sa}(i), β_{sa}(i)) equals

  Q̄_i(s, a) = α_{sa}(i) / n_{sa}(i).

After the update, the expected value of Q_{i+1}(s, a) =_d Beta(α_{sa}(i+1), β_{sa}(i+1)) equals

  Q̄_{i+1}(s, a) = α_{sa}(i+1) / n_{sa}(i+1)
               = (α_{sa}(i) + 1) / (n_{sa}(i) + 1)
               = α_{sa}(i) / (n_{sa}(i) + 1) + 1 / (n_{sa}(i) + 1)
               = [n_{sa}(i) / (n_{sa}(i) + 1)] [α_{sa}(i) / n_{sa}(i)] + 1 / (n_{sa}(i) + 1)
               = [n_{sa}(i) / (n_{sa}(i) + 1)] Q̄_i(s, a) + [1 / (n_{sa}(i) + 1)] f_{s'}.

This completes the proof.
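A quick numerical check of Lemma C.1: the sketch below (illustrative Python, not thesis code) performs one MFABL count update and verifies that the posterior mean moves exactly as the lemma states.

def mfabl_update(alpha, beta, feedback):
    """One MFABL update of the Beta(alpha, beta) belief given binary feedback f_{s'}."""
    if feedback == 1:
        alpha += 1
    else:
        beta += 1
    return alpha, beta

# Check: the posterior mean after the update equals
#   n/(n+1) * old_mean + 1/(n+1) * feedback,   where n = alpha + beta.
alpha, beta, feedback = 3.0, 5.0, 1
n = alpha + beta
old_mean = alpha / n
alpha_new, beta_new = mfabl_update(alpha, beta, feedback)
new_mean = alpha_new / (alpha_new + beta_new)
print(new_mean, n / (n + 1) * old_mean + 1 / (n + 1) * feedback)   # identical values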



In iteration i, MFABL updates belief Q_i(s, a) to Q_{i+1}(s, a) by updating the corresponding parameters α_i(s, a) and β_i(s, a) to α_{i+1}(s, a) and β_{i+1}(s, a). The belief over the value of all other state-action pairs (s', a') ≠ (s, a) remains unchanged. This process repeats for multiple iterations and the corresponding process is denoted by {Q_i}_i, where Q_i := [Q_i(s, a)]_{(s,a)∈S×A} for all i. By the "counterpart of MFABL in the expectation space", we refer to the process {Q̄_i}_i, where Q̄_i := [Q̄_i(s, a)]_{(s,a)∈S×A} for all i. Lemma C.1 enables us to draw a connection between MFABL and the asynchronous stochastic approximation scheme above. In particular, we claim that the "counterpart of MFABL in the expectation space" is an instance of the asynchronous stochastic approximation scheme.

Lemma C.2. The process {Q̄_i}_i is an asynchronous stochastic approximation scheme with respect to the system of equations (C.2).

Proof. Given any prior counts as an input to MFABL, Q̄_1(s, a) ∈ [0, 1] for all (s, a) ∈ S × A. Line 1 of Algorithm 6 ensures Q̄_1(c, a) = 1 and Q̄_1(q, a) = 0 for all a ∈ A. Lemma C.1 implies (C.3) is satisfied with the stepsize parameter equal to κ_i(s, a) = 1 / (n_{sa}(i) + 1), which is in [0, 1] since n_{sa}(i) ≥ 0 for all (s, a) ∈ S × A and for all i. To see this, observe that

  Q̄_{i+1}(s, a) = [n_{sa}(i) / (n_{sa}(i) + 1)] Q̄_i(s, a) + [1 / (n_{sa}(i) + 1)] f_{s'}
               = Q̄_i(s, a) + [1 / (n_{sa}(i) + 1)] [f_{s'} − Q̄_i(s, a)]
               = Q̄_i(s, a) + [1 / (n_{sa}(i) + 1)] [F_{s,a}(Q̄_i) − Q̄_i(s, a) + f_{s'} − F_{s,a}(Q̄_i)]
               = Q̄_i(s, a) + [1 / (n_{sa}(i) + 1)] [F_{s,a}(Q̄_i) − Q̄_i(s, a) + w_i(s, a)],

where w_i(s, a) := f_{s'} − F_{s,a}(Q̄_i) represents the noise term:

  E[w_i(s, a) | F_i] = E[f_{s'} − F_{s,a}(Q̄_i) | F_i]
                    = E[f_{s'} | F_i] − E[F_{s,a}(Q̄_i) | F_i]
                    = E_{s'∈S⁺}[ Bernoulli( max_{a'∈A} α_{s'a'}(i) / (α_{s'a'}(i) + β_{s'a'}(i)) ) | a ] − ∑_{s'∈S⁺} p_{sas'} max_{a'∈A} Q̄_i(s', a')
                    = ∑_{s'∈S⁺} p_{sas'} max_{a'∈A} Q̄_i(s', a') − ∑_{s'∈S⁺} p_{sas'} max_{a'∈A} Q̄_i(s', a')
                    = 0.

Note that F_i includes the information that MFABL played action a in iteration i, but it does not include the information that the consumer transitioned to state s'. This completes the proof.



C.1.3 Proof of Theorem 3.1

To prove Theorem 3.1, we first establish that the expectation counterpart Q̄_i converges to Q* w.p. 1 as i goes to infinity. Then, we show that the variance of the Beta belief Q_i goes to zero as i goes to infinity, and hence, Q_i converges to Q*. The following lemma claims the first part.

Lemma C.3. Under Assumption 3.1, Q̄_i converges to Q* w.p. 1 as i → ∞.

Proof. Given Lemma C.2, it suffices to show that the conditions in Proposition C.1 are satisfied.

For condition 1, as shown in the proof of Lemma C.1, the noise term equals w_i(s, a) = f_{s'} − F_{s,a}(Q̄_i), which is F_{i+1}-measurable for every (s, a) ∈ S × A and i ∈ {1, 2, ...}. For condition 2, from the proof of Lemma C.1, the stepsize equals κ_i(s, a) = 1 / (n_{sa}(i) + 1), which is F_i-measurable for every (s, a) ∈ S × A and i ∈ {1, 2, ...}. We verified condition 3 (noise is mean-zero) in the proof of Lemma C.1. For condition 4, note that A = 1 and B = 0 work since for all (s, a) ∈ S × A, i ∈ {1, 2, ...}, and s' ∈ S⁺, we have

  w_i²(s, a) = [f_{s'} − F_{s,a}(Q̄_i)]²
            = [f_{s'} − ∑_{s'∈S⁺} p_{sas'} max_{a'∈A} Q̄_i(s', a')]²
            ≤ 1.

The final inequality is true because f_{s'} ∈ [0, 1] and Q̄_i(s', a') ∈ [0, 1]. Finally, condition 5 is true because the stepsize sequence corresponding to κ_i(s, a) = 1 / (n_{sa}(i) + 1) forms a harmonic series and each state-action pair is visited infinitely often due to the ε-greedy construction of MFABL and the "connectedness" assumption (recall the discussion when defining initial state probabilities in Section 3.2).

Proof of Theorem 3.1. Given Lemma C.3, it suffices to show that the variance of the Beta belief Q_i goes to zero as i goes to infinity. Observe that in each visit to (s, a) ∈ S × A, the "count" n(s, a) increases by 1 since either α(s, a) is increased by 1 or β(s, a) is increased by 1. Furthermore, due to the ε-greedy construction of MFABL and the "connectedness" assumption (recall the discussion when defining initial state probabilities in Section 3.2), each state-action pair is visited infinitely often and hence n_i(s, a) → ∞ as i goes to infinity for all (s, a) ∈ S × A. This implies that the variance of Q_i(s, a) goes to 0 because

  Var(Q_i(s, a)) = Var(Beta(α_i(s, a), β_i(s, a)))
                = [α_i(s, a) × β_i(s, a)] / [n_i²(s, a) × (n_i(s, a) + 1)]
                ≤ [n_i(s, a) × n_i(s, a)] / [n_i²(s, a) × (n_i(s, a) + 1)]
                = 1 / (n_i(s, a) + 1).



C.2 Algorithms for Model Extensions

In this appendix, we discuss the algorithms corresponding to the model extensions discussed in Section 3.5. Since the algorithm for extension #3 (consumer features) is the same as the original MFABL (Algorithm 6), we skip that extension here and only discuss the first two extensions: (1) multiple products (Appendix C.2.1) and (2) actions with costs (Appendix C.2.2).

C.2.1 Multiple Products

The MFABL algorithm for the conversion funnel with multiple products is essentially the same as the one corresponding to a single product (Algorithm 6). The only change is in line 1 of Algorithm 6, i.e., the initialization of the counts corresponding to the absorbing states. In particular, the initialization in line 1 is as follows:

  (α_{qa}, β_{qa}) = (1, ∞) for all a ∈ A
  (α_{c_m a}, β_{c_m a}) = (1 + L, (1 + L)(1 − r_m)/r_m) for all a ∈ A, m ∈ {1, ..., M},

where L is a "large" constant. Such an initialization of the conversion states ensures the belief in MFABL is such that

  Q(c_m, a) = r_m w.p. 1 ∀ m ∈ {1, ..., M} as L → ∞.

Hence, a transition to a conversion state generates the “true” reward. The convergence result (Theorem 3.1) for this generalization mimics the same arguments as in Appendix C.1.
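As a sanity check on this initialization, the small illustrative Python snippet below (not from the thesis) verifies that the prior Beta(1 + L, (1 + L)(1 − r_m)/r_m) has mean exactly r_m and variance shrinking to zero as L grows.

def conversion_prior(r_m, L):
    """Mean and variance of the conversion-state prior Beta(1+L, (1+L)(1-r_m)/r_m)."""
    alpha = 1 + L
    beta = (1 + L) * (1 - r_m) / r_m
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, var

for L in [10, 1_000, 100_000]:
    mean, var = conversion_prior(r_m=0.3, L=L)
    print(L, round(mean, 6), var)   # mean stays at 0.3; variance -> 0 as L grows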

C.2.2 Actions with Costs

The MFABL algorithm for actions with costs is presented as Algorithm 11. The key change is that instead of maintaining a Beta belief on the value of a state-action pair, we now maintain a Gaussian belief Normal(µ_{sa}, τ_{sa}), where µ_{sa} denotes the mean and τ_{sa} denotes the precision for all (s, a) ∈ S × A. The action selection remains the same as in MFABL (pick the maximum sample value with ε-greedy). The parameter update is done to mimic the Q-update in the original MFABL algorithm (Algorithm 6); in particular, compare line 11 of Algorithm 11 to Lemma C.1 in Appendix C.1. Due to such a construction, convergence of Algorithm 11 to Q* can be shown in a similar manner as the convergence of Algorithm 6 (Theorem 3.1). To be precise, (C.1) from Appendix C.1 changes to the following:

  Q*(s, a) = E[c_{sa}] + ∑_{s'∈S⁺} p_{sas'} max_{a'∈A} Q*(s', a') ∀ (s, a) ∈ S × A,   (C.5)

where we set Q*(c, a) = r and Q*(q, a) = 0 for all a ∈ A. As in Appendix C.1, we denote the RHS of (C.5) by F_{sa}(Q*):

  Q*(s, a) = F_{sa}(Q*) ∀ (s, a) ∈ S × A.   (C.6)

The definition of an asynchronous stochastic approximation scheme remains the same as in Appendix C.1 (see (C.3)), but now F_{sa}(·) is as defined above and the initialization X_1 need not be in [0, 1]. Proposition C.1 still holds due to Tsitsiklis 1994. By construction of Algorithm 11 (lines 11 and 12 in particular), Lemma C.1 still holds, and so does Lemma C.2 with κ_i(s, a) = 1 / (τ_{sa}(i) + 1), i.e., τ_{sa}(i) replaces n_{sa}(i). The noise w_i(s, a) is still mean-zero as shown below:

  E[w_i(s, a) | F_i] = E[f_{s'} − F_{s,a}(Q̄_i) | F_i]
                    = E[f_{s'} | F_i] − E[F_{s,a}(Q̄_i) | F_i]
                    = E_{s'∈S⁺}[ c̃_{sa} + Normal(µ_{s'a'}, τ_{s'a'}) | a, a' = argmax_{â∈A} µ_{s'â} ] − E[c_{sa}] − ∑_{s'∈S⁺} p_{sas'} max_{a'∈A} Q̄_i(s', a')
                    = E[c̃_{sa}] + ∑_{s'∈S⁺} p_{sas'} max_{a'∈A} µ_{s'a'} − E[c_{sa}] − ∑_{s'∈S⁺} p_{sas'} max_{a'∈A} µ_{s'a'}
                    = 0.

To ensure the validity of Lemma C.3, we need to re-establish condition 4 of Proposition C.1, which is known to hold (see the discussion above Theorem 4 in Tsitsiklis 1994). All other conditions of Proposition C.1 hold for similar reasons as in Appendix C.1. Finally, to conclude the analysis, we need to ensure the variance of our Gaussian belief Q(s, a) goes to zero for all (s, a) ∈ S × A. This is true since the ε-greedy construction of Algorithm 11 and the "connectedness" assumption (recall the discussion when defining initial state probabilities in Section 3.2) ensure that each state-action pair is visited infinitely often and hence the precision τ_{sa} (which equals the inverse of the variance) goes to infinity for each (s, a) ∈ S × A (this follows from line 12 of Algorithm 11).

Algorithm 11 MFABL with Non-Zero Cost Actions (N Consumers)
Require: Prior parameters (µ_{sa}, τ_{sa}) ∀ (s, a) ∈ S × A, and ε
 1: (µ_{ca}, τ_{ca}) = (r, ∞) and (µ_{qa}, τ_{qa}) = (0, ∞) for all a ∈ A   % parameters for states c and q
 2: for t = 1, 2, ...   % until there exists an "active" consumer (has not converted or quit)
 3:   for n = 1 to N
 4:     Observe state s = s_{nt} of consumer n at time t
 5:     if s ∈ S   % filter for "active" consumers
 6:       q_{sa} ∼ Normal(µ_{sa}, τ_{sa}) ∀ a ∈ A   % generate samples
 7:       a* = argmax_a q_{sa} with ε-greedy   % highest sample value with ε-greedy
 8:       Firm incurs a cost c̃_{sa*} ∼ c_{sa*}   % c_{sa*} possibly random
 9:       Consumer transitions to state s' = s_{n,t+1} ∈ S⁺
10:       f_{s'} ∼ c̃_{sa*} + Normal(µ_{s'a'}, τ_{s'a'}) where a' = argmax_{a∈A} µ_{s'a}   % generate feedback
11:       µ_{sa*} ← [τ_{sa*} / (τ_{sa*} + 1)] µ_{sa*} + [1 / (τ_{sa*} + 1)] f_{s'}   % update mean
12:       τ_{sa*} ← τ_{sa*} + 1   % update precision
13:     end if
14:   end for
15: end for
16: return Q := [Q(s, a)]_{(s,a)∈S×A} where Q(s, a) =_d Normal(µ_{sa}, τ_{sa})
