Recipe Recommendation Using Ingredient Networks
Chun-Yuen Teng
School of Information, University of Michigan
Ann Arbor, MI, USA

Yu-Ru Lin
IQSS, Harvard University; CCS, Northeastern University
Boston, MA

Lada A. Adamic
School of Information, University of Michigan
Ann Arbor, MI, USA

ABSTRACT
The recording and sharing of cooking recipes, a human activity dating back thousands of years, naturally became an early and prominent social use of the web. The resulting online recipe collections are repositories of ingredient combinations and cooking methods whose large scale and variety yield interesting insights about both the fundamentals of cooking and user preferences. At the level of an individual ingredient we measure whether it tends to be essential or can be dropped or added, and whether its quantity can be modified. We also construct two types of networks to capture the relationships between ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users’ preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted with features derived from combinations of ingredient networks and nutrition information.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database applications—Data mining

General Terms
Measurement; Experimentation

Keywords
ingredient networks, recipe recommendation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WebSci 2012, June 22–24, 2012, Evanston, Illinois, USA.
Copyright 2012 ACM 978-1-4503-1228-8.

1. INTRODUCTION
The web enables individuals to collaboratively share knowledge, and recipe websites are one of the earliest examples of collaborative knowledge sharing on the web. Allrecipes.com, the subject of our present study, was founded in 1997, years ahead of other collaborative websites such as Wikipedia. Recipe sites thrive because individuals are eager to share their recipes, from family recipes that had been passed down for generations to new concoctions that they created that afternoon, having been motivated in part by the ability to share the result online. Once shared, the recipes are implemented and evaluated by other users, who supply ratings and comments.

The desire to look up recipes online may at first appear odd given that tomes of printed recipes can be found in almost every kitchen. The Joy of Cooking [12] alone contains 4,500 recipes spread over 1,000 pages. There is, however, substantial additional value in online recipes, beyond their accessibility. While the Joy of Cooking contains a single recipe for Swedish meatballs, Allrecipes.com hosts “Swedish Meatballs I”, “II”, and “III”, submitted by different users, along with 4 other variants, including “The Amazing Swedish Meatball”. Each variant has been reviewed, with review counts ranging from 329 for “Swedish Meatballs I” to 5 for “Swedish Meatballs III”. The reviews not only provide a crowd-sourced ranking of the different recipes, but also many suggestions on how to modify them, e.g. using ground turkey instead of beef, or skipping the “cream of wheat” because it is rarely on hand.

The wealth of information captured by online collaborative recipe sharing sites is revealing not only of the fundamentals of cooking, but also of user preferences. The co-occurrence of ingredients in tens of thousands of recipes provides information about which ingredients go well together, and when a pairing is unusual. Users’ reviews provide clues as to the flexibility of a recipe and the ingredients within it. Can the amount of cinnamon be doubled? Can the nutmeg be omitted? If one is lacking a certain ingredient, can a substitute be found among supplies at hand without a trip to the grocery store? Unlike cookbooks, which contain vetted but perhaps not the best variants for some individuals’ tastes, ratings assigned to user-submitted recipes allow for the evaluation of what works and what does not.

In this paper, we seek to distill the collective knowledge and preference about cooking through mining a popular recipe-sharing website. To extract such information, we first parse the unstructured text of the recipes and the accompanying user reviews. We construct two types of networks that reflect different relationships between ingredients, in order to capture users’ knowledge about how to combine ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users’ preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted by features derived from combinations of ingredient networks and nutrition information (with accuracy .792), while most of the prediction power comes from the ingredient networks (84%).

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 describes the dataset. Section 4 discusses the extraction of the ingredient and complement networks and their characteristics. Section 5 presents the extraction of recipe modification information, as well as the construction and characteristics of the ingredient substitute network. Section 6 presents our experiments on recipe recommendation, and Section 7 concludes.

2. RELATED WORK
Recipe recommendation has been the subject of much prior work. Typically the goal has been to suggest recipes to users based on their past recipe ratings [15][3] or browsing/cooking history [16]. The algorithms then find similar recipes based on overlapping ingredients, either by treating each ingredient equally [4] or by identifying key ingredients [19]. Instead of modeling recipes using ingredients, Wang et al. [17] represent recipes as graphs built on ingredients and cooking directions, and demonstrate that graph representations can be used to easily aggregate Chinese dishes by the flow of cooking steps and the sequence of added ingredients. However, their approach only models the occurrence of ingredients or cooking methods, and does not take into account the relationships between ingredients. In contrast, in this paper we incorporate the likelihood of ingredients to co-occur, as well as the potential of one ingredient to act as a substitute for another.

Another branch of research has focused on recommending recipes based on desired nutritional intake or on promoting healthy food choices. Geleijnse et al. [7] designed a prototype of a personalized recipe advice system, which suggests recipes to users based on their past food selections and nutrition intake. In addition to nutrition information, Kamieth et al. [9] built a personalized recipe recommendation system

3. DATASET
Allrecipes.com is one of the most popular recipe-sharing websites, where novice and expert cooks alike can upload and rate cooking recipes. It hosts 16 customized international sites for users to share their recipes in their native languages, of which we study only the main, English, version. Recipes uploaded to the site contain specific instructions on how to prepare a dish: the list of ingredients, preparation steps, preparation and cook time, the number of servings produced, nutrition information, serving directions, and photos of the prepared dish. The uploaded recipes are enriched with user ratings and reviews, which comment on the quality of the recipe and suggest changes and improvements. In addition to rating and commenting on recipes, users are able to save them as favorites or recommend them to others through a forum.

We downloaded 46,337 recipes from allrecipes.com, including all of the information listed above and several classifications, such as a region (e.g. the Midwest region of the US, or Europe), the course or meal the dish is appropriate for (e.g. appetizers or breakfast), and any holidays the dish may be associated with. In order to understand users’ recipe preferences, we crawled 1,976,920 reviews, which include reviewers’ ratings, review text, and the number of users who voted the review as useful.

3.1 Data preprocessing
The first step in processing the recipes is identifying the ingredients and cooking methods from the freeform text of the recipe. Usually, although not always, each ingredient is listed on a separate line. To extract the ingredients, we tried two approaches. In the first, we found the maximal match between a pre-curated list of ingredients and the text of the line. However, this missed too many ingredients, while misidentifying others. In the second approach, we used regular expression matching to remove non-ingredient terms from the line and identified the remainder as the ingredient. We removed quantifiers, such as “1 lb” or “2 cups”, and words referring to consistency or temperature, such as “chopped” or “cold”, and applied a few other heuristics, such as removing content in parentheses. For example, “1 (28 ounce) can baked beans (such as Bush’s Original®)” is identified as “baked beans”.
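The regular-expression approach described above can be illustrated with a short Python sketch. It is not the authors’ code: the paper does not publish its patterns or word lists, so the unit and descriptor sets below are hypothetical stand-ins, and `extract_ingredient` is an illustrative name. The sketch removes parenthesized content, numeric quantities, unit words, and consistency or temperature words, and treats whatever remains as the ingredient name.

```python
import re

# Hypothetical word lists: the paper does not publish the exact terms it removed,
# so these sets are illustrative stand-ins, not the authors' lists.
UNITS = {
    "cup", "cups", "tablespoon", "tablespoons", "teaspoon", "teaspoons",
    "lb", "lbs", "pound", "pounds", "ounce", "ounces", "can", "cans",
    "package", "packages", "pinch", "clove", "cloves",
}
# Consistency / temperature / preparation words (e.g. "chopped", "cold").
DESCRIPTORS = {
    "chopped", "diced", "minced", "sliced", "cold", "warm",
    "softened", "melted",
}


def extract_ingredient(line: str) -> str:
    """Reduce one ingredient line to an ingredient name (heuristic sketch)."""
    text = line.lower()
    # Remove parenthesized content such as "(28 ounce)" or "(such as ...)".
    text = re.sub(r"\([^)]*\)", " ", text)
    # Remove numeric quantities, including simple fractions like "1/2".
    text = re.sub(r"\d+(?:[./]\d+)?", " ", text)
    # Tokenize on letters and apostrophes; this drops symbols such as "®" or ",".
    tokens = re.findall(r"[a-z']+", text)
    # Keep only tokens that are neither unit words nor descriptor words.
    kept = [t for t in tokens if t not in UNITS and t not in DESCRIPTORS]
    return " ".join(kept)


if __name__ == "__main__":
    example = "1 (28 ounce) can baked beans (such as Bush's Original®)"
    print(extract_ingredient(example))  # baked beans
```

Applied to the example line from the text, the sketch returns “baked beans”. Word lists of this kind trade recall for precision: adding “hot” to the descriptor set, for instance, would reduce “1 cup hot sauce” to “sauce”, which is why the list above leaves it out.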