Two Decades of Recommender Systems at Amazon.com
The Test of Time

Amazon is well-known for personalization and recommendations, which help customers discover items they might otherwise not have found. In this update to our original article, we discuss some of the changes as Amazon has grown.

Brent Smith, Amazon.com
Greg Linden, Microsoft

Published by the IEEE Computer Society. 1089-7801/17/$33.00 © 2017 IEEE. IEEE Internet Computing, May/June 2017.

For two decades now,[1] Amazon.com has been building a store for every customer. Each person who comes to Amazon.com sees it differently, because it’s individually personalized based on their interests. It’s as if you walked into a store and the shelves started rearranging themselves, with what you might want moving to the front, and what you’re unlikely to be interested in shuffling further away.

From a catalog of hundreds of millions of items, Amazon.com’s recommendations pick a small number of items you might enjoy based on your current context and your past behavior. The algorithms aren’t magic; they simply share with you what other people have already discovered. The algorithm does all the work. It’s computers helping people help other people, implicitly and anonymously.

Amazon.com launched item-based collaborative filtering in 1998, enabling recommendations at a previously unseen scale for millions of customers and a catalog of millions of items. Since we wrote about the algorithm in IEEE Internet Computing in 2003,[2] it has seen widespread use across the Web, including YouTube, Netflix, and many others. The algorithm’s success has been from its simplicity, scalability, and often surprising and useful recommendations, as well as desirable properties such as updating immediately based on new information about a customer and being able to explain why it recommended something in a way that’s easily understandable.

What was described in our 2003 IEEE Internet Computing article has faced many challenges and seen much development over the years. Here, we describe some of the updates, improvements, and adaptations for item-based collaborative filtering, and offer our view on what the future holds for collaborative filtering, recommender systems, and personalization.

The Algorithm

As we described it in 2003, the item-based collaborative filtering algorithm is straightforward. In the mid-1990s, collaborative filtering was generally user-based, meaning the first step of the algorithm was to search across other users to find people with similar interests (such as similar purchase patterns), then look at what items those similar users found that you haven’t found yet. Instead, our algorithm begins by finding related items for each item in the catalog. The term “related” could have several meanings here, but at this point, let’s loosely define it as “people who buy one item are unusually likely to buy the other.” So, for every item i1, we want every item i2 that was purchased with unusually high frequency by people who bought i1.

Once this related items table is built, we can generate recommendations quickly as a series of lookups. For each item that’s part of this customer’s current context and previous interests, we look up the related items, combine them to yield the most likely items of interest, filter out items already seen or purchased, and then we are left with the items to recommend.

This algorithm has many advantages over the older user-based collaborative filtering. Most importantly, the majority of the computation is done offline (a batch build of the related items), and the computation of the recommendations can be done in real time as a series of lookups. The recommendations are high quality and useful, especially given enough data, and remain competitive in perceived quality even with the newer algorithms created over the last two decades. The algorithm scales to hundreds of millions of users and tens of millions of items without sampling or other techniques that can reduce the quality of the recommendations. The algorithm updates immediately on new information about a person’s interests. Finally, the recommendations can be explained in an intuitive way as arising from a list of items the customer remembers purchasing.

In 2003: Amazon.com, Netflix, YouTube, and More

By the time we published in IEEE in 2003, item-based collaborative filtering was widely deployed across Amazon.com. The homepage prominently featured recommendations based on your past purchases and items browsed in the store. Search result pages recommended items related to your search. The shopping cart recommended other items to add to your cart, perhaps impulse buys to bundle in at the last minute, or perhaps complements to what you were already considering. At the end of your order, more recommendations appeared, suggesting items to order later. Using e-mails, browse pages, product detail pages, and more, many pages on Amazon.com had at least some recommended content, starting to approach a store for every customer.

Others have reported using the algorithm, too. In 2010, YouTube reported using it for recommending videos.[3] Many open source and third-party vendors included the algorithm, and it showed up widely in online retail, travel, news, advertising, and more. In the years following, the recommendations were used so extensively by Amazon.com that a Microsoft Research report estimated 30 percent of Amazon.com’s page [...]

Standing the Test of Time

As part of recognizing IEEE Internet Computing for its 20 years in publication, I recommended to the editorial board that we pick one of our magazine articles that, over the past 20 years, has withstood the “test of time.” In selecting an article, we evaluated the ideas in more than 20 candidate articles that reported on “evergreen” research areas over the past two decades and then assessed these articles based on downloads from IEEE Xplore, citations, and mentions of the work in popular press. This information was presented to a committee consisting of previous Editors in Chief for the magazine. I would like to thank the selection committee from the editorial board, led by Arun Iyengar, and including Fred Douglis, Robert Filman, Michael Huhns, Charles Petrie, Michael Rabinovich, and Munindar Singh. This committee deliberated on the top three articles by evaluating each work’s previous importance within the context of its sustained importance in the future.

It’s my pleasure to recognize the committee’s official “Test of Time” winner: an industry article titled “Amazon.com Recommendations: Item-to-Item Collaborative Filtering” by Greg Linden, Brent Smith, and Jeremy York, from the January/February 2003 issue of IC (see doi:10.1109/MIC.2003.1167344). Fourteen years after the publication of this article, it shows 125 downloads from IEEE Xplore in one month, with more than 12,754 downloads since January 2011. The article currently shows 4,258 citations in Google Scholar. I’m delighted that the selection committee recommended an industry article, as it aligns with the magazine’s focus of accessibility in academic, research, and industrial populations.

In addition to recognizing the article, we asked the authors to create this retrospective piece discussing research and insights that have transpired since publishing their winning “Test of Time” article, while projecting into the future.

Going forward, the magazine hopes to celebrate a “Test of Time” article every 2–3 years. I hope that you enjoy this retrospective article, and please take a moment to congratulate Greg Linden, Brent Smith, and Jeremy York.

M. Brian Blake
Editor-in-Chief, IEEE Internet Computing
Provost and Distinguished Professor, Drexel University

A Present-Day Perspective on Recommendation and Collaborative Filtering

As a PhD student who uses collaborative filtering in my work to introduce customized recommendation techniques (and collaborative filtering) that select “workers” for crowdsourcing,[1,2] the Test of Time article is particularly meaningful to me. Collaborative filtering is a technique used to personalize the experience of users through recommendations tailored to the users’ interests, leveraging the experiences of other users with similar profiles. Traditionally, the technique is used in e-commerce platforms to drive sales by converting targeted suggestions to purchases.[3] The technique has rendered more favorable results than blanket advertisement, and is more purposeful toward customizing the experience of individual users. Despite this success, two primary challenges have surfaced: these are concerns related to real-time scalability and recommendation quality. These concerns directly impact the users’ individual experiences and, by induction, the success of the platforms using the technique.

The first concern of scalability is directly affected by today’s [...]

[...] means, this approach is item-centric, which drastically reduces the data space for evaluation. As outlined in IEEE Internet Computing’s Test of Time article[4] and other closely related work,[5] this data-space reduction is potentially up to three orders of magnitude of its original size. Being item-centric, it overcomes the issue with sparsity in user data in traditional approaches (such approaches contribute largely to unnecessary evaluation). It also overcomes the issue of the density in frequent users who have large portions of data associated with their profiles.

Item-based collaborative filtering still requires offline processing to pair similar items. By preprocessing this information offline, recommendations in the list produced from item-based collaborative filtering can occur in real time in an online modality. This allows for easy, quick, more personalized recommendation for the user. The similar items list is a sleek subset of items targeted to the user’s purchasing or rating history, as opposed to that of others in the entire dataset.
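
The split between offline pairing of similar items and real-time lookups can be illustrated with a minimal Python sketch. This is not the production algorithm: the function names, the toy purchase data, the `min_copurchases` threshold, and the scoring rule (observed co-purchases divided by the count expected if the two items were bought independently) are all assumptions chosen for illustration, as one simple reading of “unusually likely to buy the other.”

```python
from collections import defaultdict
from itertools import combinations


def build_related_items(purchases, min_copurchases=2):
    """Offline step: map each item to (related_item, score) pairs, best first.

    purchases: dict of customer id -> set of purchased item ids.
    Score = observed co-purchases / co-purchases expected under independence.
    This scoring rule is an illustrative assumption, not Amazon.com's method.
    """
    n_customers = len(purchases)
    buyers = defaultdict(int)       # item -> number of customers who bought it
    pair_counts = defaultdict(int)  # (a, b) -> customers who bought both
    for items in purchases.values():
        for item in items:
            buyers[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    related = defaultdict(list)
    for (a, b), observed in pair_counts.items():
        if observed < min_copurchases:
            continue  # too little evidence to call the pair related
        expected = buyers[a] * buyers[b] / n_customers
        score = observed / expected
        if score > 1.0:  # bought together more often than chance predicts
            related[a].append((b, score))
            related[b].append((a, score))
    for item in related:
        related[item].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(related)


def recommend(related, history, k=3):
    """Online step: look up related items for each item in the history,
    sum their scores, skip items already in the history, return the top k."""
    totals = defaultdict(float)
    for item in history:
        for candidate, score in related.get(item, []):
            if candidate not in history:
                totals[candidate] += score
    return sorted(totals, key=lambda c: (-totals[c], c))[:k]


# Hypothetical toy data: customer id -> purchased items.
PURCHASES = {
    "u1": {"tent", "stove", "lantern"},
    "u2": {"tent", "stove"},
    "u3": {"tent", "lantern"},
    "u4": {"novel", "bookmark"},
    "u5": {"novel", "bookmark", "stove"},
    "u6": {"novel"},
}

RELATED = build_related_items(PURCHASES)  # batch build, done ahead of time
print(recommend(RELATED, {"tent"}))       # lookups only at request time
```

Only `build_related_items` touches the full purchase data; `recommend` reads the precomputed table, which is why the online step stays cheap no matter how many customers there are. Because each suggestion traces back to a specific item in the customer's history, the same table also supports the kind of explanation the article describes.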