
CS320: Value of Data & AI Winter 2020 Session 5 Data and Algorithms for Personalization Lecturer(s): James Zou Scribes: Qiwen Wang

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

5.1 Introduction

Personalization algorithms influence what people see and choose on the Internet. Personalization tailors a product to an individual's needs and interests. However, the data and algorithms used for personalization raise concerns about privacy, ethics, and policy.

5.1.1 Netflix’s Artwork Personalization

Netflix is known for its high-quality media content and personalized search results and viewing recommendations. Personalization allows each Netflix member to see a different view of the catalog that adapts to their interests and can help expand those interests over time. Beyond personalizing which media to recommend, Netflix personalizes the whole viewing experience for each title. Netflix differs from traditional media offerings in that they "don't have one product but over 100 million different products with one for each of our members with personalized recommendations and personalized visuals." [1] Each movie and TV series can be personalized in terms of its artwork and visual elements. The core algorithmic challenge for the Netflix team is to use viewer data to customize not only the viewing options but also the visual presentation.

There has been significant controversy over Netflix showing different artwork for the same movie or TV series to targeted viewers. Some viewers believe Netflix is targeting them by race through this artwork. For the 2018 Netflix comedy film "Like Father," people noticed that African American viewers in particular were shown posters featuring black characters, even though the film predominantly starred a white cast. Such a poster is misleading: the content is not what viewers expect based on the poster, and the poster is not an accurate representation of the film. Is Netflix targeting users by race? Netflix defended itself by claiming it did not use race, gender, or ethnicity information to personalize a user's individual Netflix experience; the only information used was viewing history. However, the controversy remains. People argue that it is Netflix's responsibility to ensure film posters do not misrepresent the content.

Moreover, even though Netflix doesn't explicitly use sensitive attributes in its model, proxy features can lead to bias against certain groups: these features can implicitly encode an individual's race, gender, and ethnicity. More fundamentally, is Netflix providing "too much" personalization?


5.1.2 Challenges of personalization

• Compared with systems of the past, today's systems have a huge number of options to personalize. They can customize every single element, e.g., artwork, font size, image position, etc. So many options pose a challenge from the algorithmic perspective, and from the privacy, ethics, and security perspective.

• Companies need to run a large number of experiments. The challenge lies in the methodologies companies use to perform these tests efficiently. In particular, we will discuss the contextual bandit algorithm in Section 5.5.

• Although every element can be personalized nowadays, too much personalization can result in recommendation bias and a narrower perspective. Companies need to evaluate whether the personalization is beneficial, or whether it is too much for the users.

5.2 How do companies experiment?

The Washington Post runs live tests on its website to identify content that users respond to positively, in order to give users a better reading experience. To identify the best headline for a news story, The Washington Post generates different versions of the headline for the same story and tests which headline yields the highest click-through rate (CTR) [2]. For example, The Washington Post tested the following headlines for a story on why organization expert Marie Kondo's tips don't work for parents:

• Why Marie Kondo's life-changing magic doesn't work for parents (CTR: 3.3%)

• The real reasons Marie Kondo's life-changing magic doesn't work for parents (CTR: 3.9%)

• The real reasons Marie Kondo's life-changing magic doesn't work for parents (with the thumbnail of the author) (CTR: 4.8%)

The third headline seems more promising because the phrase “The real reasons” evokes more curiosity and is more declarative, and the thumbnail is more related to the content.

5.2.1 How to compute CTR?

To compute the click-through rate, The Washington Post

1. Picks a subset of the users, e.g., 1% of the subscribers, as the sample,

2. Randomly partitions the sample into three sets, each corresponding to one of the options,

3. Computes the CTR for each set.

The headline that gets the highest CTR is selected as the headline for the news story.
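The three steps above can be sketched in code. This is a minimal illustration, not The Washington Post's actual system (their tool is Bandito [2]); the function and parameter names are invented for this sketch, and `click_fn` stands in for observing real user clicks.

```python
import random

def assign_and_measure(user_ids, options, click_fn, sample_frac=0.01, seed=0):
    """Pick a small sample of users, split them randomly across the
    headline options, and compute the click-through rate of each."""
    rng = random.Random(seed)
    sample = rng.sample(user_ids, int(len(user_ids) * sample_frac))
    clicks = {o: 0 for o in options}   # observed clicks per option
    shown = {o: 0 for o in options}    # impressions per option
    for u in sample:
        o = rng.choice(options)        # random assignment to one option
        shown[o] += 1
        clicks[o] += click_fn(u, o)    # 1 if this user clicked, else 0
    return {o: clicks[o] / shown[o] for o in options if shown[o] > 0}
```

The winning headline is then simply the key with the maximum CTR in the returned dictionary.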

5.2.2 Bing Search

Many companies perform similar experiments. Take Bing ads as another example. The test compares two advertisements for the same product shown on the Bing search engine: one shows an additional set of links at the bottom, and the other doesn't. The result shows that the advertisement with the additional links performs better. This tiny change in the ads generated over 100 million dollars in revenue for the company. Such a methodology is often referred to as A/B testing, in which version A and version B, identical except for one variation, are compared in a randomized experiment. In the above example, the variation is whether the ad contains the additional links.
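Before acting on an A/B result like the ones above, practitioners typically check that the CTR difference is not just noise. The lecture does not cover the statistics, but a standard tool is the two-proportion z-test; a minimal sketch using only the standard library:

```python
from math import sqrt, erf

def ab_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test: is the CTR difference between version A
    and version B larger than chance alone would explain?"""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)  # pooled CTR under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

For example, 330 clicks out of 10,000 impressions (3.3%) versus 480 out of 10,000 (4.8%), roughly the Marie Kondo headline CTRs, gives a z-score above 5, so the difference would be judged real rather than noise.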

5.2.3 Run Experiments in Parallel

The number of experiments companies run per day grows quadratically. With a limited number of users, the overhead of testing components sequentially is enormous. How do companies handle this massive scale of experiments per day? In fact, hundreds of trials run in parallel across the different interfaces each individual sees. At companies like Microsoft (Bing), product teams run experiments simultaneously on the same interface to test various components, e.g., background color, font, new features, etc., in a decentralized way. Product teams can specify the slice of the population on which to run an experiment and the length of the experiment.

5.2.3.1 Challenges in Large-scale Experimentation

• Combining the best individual components from each test doesn't imply a better overall user experience. The marginal changes may not be additive.

• The output metrics are hard to attribute to individual changes. LinkedIn built its own platform (XLNT) to address this and help make data-driven A/B testing decisions; it runs over 400 experiments per day in parallel. With multiple experiments running simultaneously on a page, locating the source of an impact is difficult. To solve this problem, an experiment runner can subscribe to metrics on the platform and get a list of the experiments that are impacting those metrics. The list of experiments is selected based on the experiment population, the effect size, and the metric's intrinsic volatility.

5.2.4 Personalized Pricing

Personalization is now so widely used that even for the same product, companies charge users different prices based on what they can afford. Companies can track which websites you've visited, which items you've purchased, your location, and which device you're using, in order to offer different prices for the same product.

Take, for example, a search for a hotel room in Panama City on Travelocity. The Westin Playa Bonita hotel was listed at $200 using a regular browser, but only $181 in an "incognito" browser that hides the user's cookies. Both searches were done for the same date. What consumers call "price discrimination" has become a common practice in the travel industry.

Uber uses machine learning to find the price trade-off curve for each individual so that it can offer different prices for different rides. Daniel Graf, Uber's head of product, said the company applies machine-learning techniques to estimate how much groups of customers are willing to shell out for a ride. Uber calculates riders' propensity for paying a higher price for a particular route at a certain time of day. For instance, someone traveling from a wealthy neighborhood to another tony spot might be asked to pay more than a person heading to a poorer part of town, even if demand, traffic, and distance are the same.

People are concerned about Uber's personalized pricing. Although personalized pricing is a legitimate business model from the marketing perspective, it is an unfair advantage that some people are charged more. There are no clear societal standards on price discrimination, and the only way for new services like ride-sharing to find the standard is to experiment. However, the effects such experiments capture are narrow: the results can only show whether a user is willing to keep using the service after a price change, not the long-term impact on neighborhoods, commute patterns, and demographics. These experiments also often lack transparency. Uber obtains permission to experiment when users register an account and accept the policy terms. Nevertheless, Uber doesn't inform users what exactly they are being tested on, nor is that realistic in the case of pricing. From the driver's perspective, if Uber drivers see higher prices in wealthier communities, it becomes harder for people in more impoverished communities to access rides. With every product being personalized nowadays, it is hard to disentangle personalization from discrimination.

5.3 Emotion Experiment

5.3.1 Background

Emotional contagion happens when people transfer positive or negative moods and emotions to others they interact with. Further, the social comparison effect is that exposure to the happiness of others may actually be depressing to us. Both effects are well established in social network theory. In an emotion experiment [3], Facebook tested whether emotional contagion occurs on social media by adjusting the amount of emotional content in users' "News Feed." News Feed is the Facebook product in which people express their emotions in posts and see their friends' posts. Because a person's friends may produce far more content than that person can view, News Feed filters posts with a randomized algorithm. Facebook tested two hypotheses:

• The “social comparison” hypothesis: exposure to happiness of others on Facebook depresses users by making them feel bad about their own lives. These users tend to post more negative words.

• The “emotional contagion” hypothesis: interacting with happy people makes users feel happy. Users would post more positive words.

5.3.2 Experiment

1. Facebook and researchers experimentally modified News Feed for one week in 2012. Participants were randomly selected; around 155,000 participants were in each experimental group, with a total of around 700,000 participants in the study.

2. For around 155k participants, each post containing positive words (e.g., happy, excited) had between a 10% and 90% chance of being omitted from their feed.

3. For around 155k participants, each post containing negative words (e.g., mad, sad, depressed, angry) had between a 10% and 90% chance of being omitted from their feed.

4. Measure the number of positive/negative words subsequently used by each participant.

Under the emotional contagion hypothesis, people in the positivity-reduced condition should be less positive than their control group, and people in the negativity-reduced condition should be less negative; under the social comparison hypothesis, the opposite emotion should move instead.
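The outcome metric in step 4 above can be made concrete with a small sketch. The actual study classified words with the LIWC lexicon; the tiny word lists and function below are stand-ins for illustration only.

```python
# Toy word lists standing in for the LIWC lexicon used in the actual study.
POSITIVE = {"happy", "excited", "great", "love"}
NEGATIVE = {"sad", "mad", "angry", "depressed"}

def emotion_word_rates(posts):
    """Fraction of all words across a user's posts that are positive
    and negative -- the outcome metric of the experiment."""
    words = [w.strip(".,!?").lower() for post in posts for w in post.split()]
    if not words:
        return 0.0, 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos / len(words), neg / len(words)
```

Comparing these rates between the experimental and control arms is what produces the percentage changes discussed in the results below.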

5.3.3 Result

Figure 1 shows the result of the experiment. The control arms are participants whose feeds were not reduced and who viewed their feeds as usual; the experimental arms are participants who had negative or positive posts removed. The y-axis measures the percentage of positive/negative words in participants' posts during the experiment. The result supports emotional contagion: for participants whose negative posts were filtered out, the change in the amount of negative words is larger than the change in positive words, and vice versa. Furthermore, the researchers also observed that

• Omitting emotional content reduced the number of words the person subsequently produced. This held whether positive or negative posts were removed.

• The effect was stronger when positive words were omitted.

• People who were exposed to fewer positive/negative posts were less expressive overall.

5.3.4 Controversy

The study faced huge controversy for exploring whether Facebook could manipulate people's moods by tweaking their news feeds to favor negative or positive content, by running direct experiments on users.

Facebook showed in this study that it has the power to filter posts; does that imply that in the future Facebook may manipulate content for every user? In addition, using the percentage of emotional words as the metric may not reflect participants' actual emotions. People may post positive words in response to another positive post, but that may not reflect the user's actual mood.

5.4 Ethics of Experiments

There are three ethics rules to consider when thinking about the ethics of such applications. Stanford's human subjects research review (IRB) also examines these three components. We take the Facebook emotion experiment as an example and examine it against the following three rules:

• Subjects give voluntary and informed consent, when feasible. It's not feasible for Facebook to collect voluntary consent every time it runs an experiment; consent happens only once, during registration, when the user agrees to their data being used.

• Risks to subjects are reasonable compared to the benefits to subjects and society. Running the experiment is beneficial: given its large user base, Facebook arguably has a moral responsibility to understand the effects of emotional posts. The risk of filtering posts is statistically small; the change in the number of positive and negative words is less than 0.1%.

• The selection of subjects is equitable: do not pick on a vulnerable population. Facebook ran the experiment on 700k random users, which is a good representation of the general population.

Therefore, judged by the above three ethics criteria, the Facebook emotion experiment sounds legitimate.

5.5 Personalization Algorithm

5.5.1 A/B Testing

A/B testing is the process of comparing two or more versions of a web page, email, or other marketing asset with just one varying element. For instance, an A/B test for a headline would compare two versions of the same page with only the headline changed.

Let k be the number of options, such as the different headline options; k is potentially large. Let N be the number of individuals in the experiment. Each individual is randomly assigned to one of the options. For each option, we define an estimate to decide which option to use. One popular estimate is the click-through rate (CTR), giving CTR_1, CTR_2, ..., CTR_k for the k options.

One disadvantage of this method is that if the number of options k is large, the number of individuals N needs to be large to get roughly the same number of individuals for each option. If k is 100 million and each option needs 1000 individuals, then the whole experiment needs 100 billion individuals, which is unrealistic even for big companies like Facebook.

5.5.2 Multi-armed bandit

A multi-armed bandit adaptively allocates the limited resource N across the k options to maximize the expected gain. One motivation for the multi-armed bandit is that even with a small number of trials, we can observe some properties of an option: whether it doesn't work at all, or works as expected. The multi-armed bandit sends only a small number of individuals to each option in the first round; based on the performance in that round, the algorithm adaptively tests the promising options in subsequent rounds. In general, the multi-armed bandit simultaneously attempts to acquire new knowledge and optimize its decisions based on existing knowledge.

Mathematically, a multi-armed bandit chooses the option k that maximizes the CTR plus an additional factor involving the total number of rounds played so far, T, and the number of rounds played with option k, n_k:

Algorithm 1: UCB1 (Multi-armed bandit) [4]
Result: the optimal option k
Initialization: run one round with each option;
while the choice of option k has not converged do
    k = arg max_k ( CTR_k + sqrt(2 log T / n_k) );
    run a round of the experiment with option k;
end

The maximized expression gives an upper estimate of the performance of each option. The sqrt(2 log T / n_k) term balances exploration, i.e., gathering more information, against exploitation, i.e., making the best decision with the information already gathered.
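Algorithm 1 can be written out directly. The sketch below is a straightforward implementation of UCB1 with a fixed budget of rounds in place of a convergence check; `pull(k)` is a stand-in for running one round of the experiment with option k and observing a click (1) or no click (0).

```python
import math
import random

def ucb1(options, pull, total_rounds, seed=0):
    """UCB1 multi-armed bandit: play each option once, then repeatedly
    play the option with the highest upper confidence bound
    CTR_k + sqrt(2 * log(T) / n_k)."""
    random.seed(seed)
    n = {k: 0 for k in options}       # rounds played with option k
    clicks = {k: 0 for k in options}  # clicks observed for option k
    for k in options:                 # initialization: one round per option
        clicks[k] += pull(k)
        n[k] += 1
    for t in range(len(options), total_rounds):
        ucb = {k: clicks[k] / n[k] + math.sqrt(2 * math.log(t) / n[k])
               for k in options}
        k = max(ucb, key=ucb.get)     # option with the best upper bound
        clicks[k] += pull(k)
        n[k] += 1
    return max(options, key=lambda k: clicks[k] / n[k])
```

Because the bonus term shrinks as n_k grows, poorly performing options are quickly starved of rounds, which is exactly why the bandit needs far fewer individuals than a uniform A/B split.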

5.5.3 Personal Contextual Bandit

The personal contextual bandit is the algorithm used by The Washington Post and Facebook. In this algorithm [5], both the user and the component are featurized. For example, every user has a set of features such as age, income, and location, and every headline option of The Washington Post has a set of features such as color, font, and content. Let the featurized user vector be x_u and the featurized component option vector be x_o. Then the model of the CTR is

CTR = [x_u; x_o] · θ,

where [x_u; x_o] is the concatenation of the two feature vectors and θ is an unknown coefficient vector over the features. This is now a regression problem for θ: we extract both vectors from the user data and the component, and observe the CTR from the experiment. A multi-armed bandit (LinUCB) is run on top of this regression. The advantage of the contextual bandit is that it can find the best option k with a much smaller number of individuals N compared to A/B testing.
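The regression step of this model can be sketched as follows. This is only the estimation of θ by ridge regression from logged data, with invented function names; full LinUCB [5] additionally maintains a per-option confidence bound on the predicted CTR, which this sketch omits.

```python
import numpy as np

def fit_theta(X_users, X_options, ctrs, lam=1.0):
    """Ridge-regression estimate of theta in CTR = [x_u; x_o] . theta,
    from logged rows of (user features, option features, observed CTR)."""
    X = np.hstack([X_users, X_options])  # concatenate user/option features
    d = X.shape[1]
    # closed-form ridge solution: (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ ctrs)

def predict_ctr(theta, x_u, x_o):
    """Predicted CTR for showing option x_o to user x_u."""
    return float(np.concatenate([x_u, x_o]) @ theta)
```

Because θ is shared across all options, data from every (user, option) pair improves the estimate, which is what lets the contextual bandit get away with far fewer individuals than one-CTR-per-option A/B testing.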

References

[1] Netflix Technology Blog. "Artwork Personalization at Netflix." 2017. https://netflixtechblog.com/artwork-personalization-c589f074ad76

[2] Nikhil Muralidhar. "Bandito, a Multi-Armed Bandit Tool for Content Testing." 2018. https://developer.washingtonpost.com/pb/blog/post/2016/02/08/bandito-a-multi-armed-bandit-tool-for-content-testing/

[3] Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. "Experimental evidence of massive-scale emotional contagion through social networks." Proceedings of the National Academy of Sciences 111.24 (2014): 8788-8790.

[4] Auer, Peter, Nicolo Cesa-Bianchi, and Paul Fischer. "Finite-time analysis of the multiarmed bandit problem." Machine Learning 47.2-3 (2002): 235-256.

[5] Li, Lihong, et al. "A contextual-bandit approach to personalized news article recommendation." Proceedings of the 19th International Conference on World Wide Web. 2010.