
Recipe recommendation using ingredient networks

Chun-Yuen Teng, School of Information, University of Michigan, Ann Arbor, MI, USA, [email protected]
Yu-Ru Lin, IQSS, Harvard University; CCS, Northeastern University, Boston, MA, [email protected]
Lada A. Adamic, School of Information, University of Michigan, Ann Arbor, MI, USA, [email protected]

ABSTRACT
The recording and sharing of cooking recipes, a human activity dating back thousands of years, naturally became an early and prominent social use of the web. The resulting online recipe collections are repositories of ingredient combinations and cooking methods whose large scale and variety yield interesting insights about both the fundamentals of cooking and user preferences. At the level of an individual ingredient we measure whether it tends to be essential or can be dropped or added, and whether its quantity can be modified. We also construct two types of networks to capture the relationships between ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users' preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted with features derived from combinations of ingredient networks and nutrition information.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database applications - Data mining

General Terms
Measurement; Experimentation

Keywords
ingredient networks, recipe recommendation

1. INTRODUCTION
The web enables individuals to collaboratively share knowledge, and recipe websites are one of the earliest examples of collaborative knowledge sharing on the web. Allrecipes.com, the subject of our present study, was founded in 1997, years ahead of other collaborative websites such as Wikipedia. Recipe sites thrive because individuals are eager to share their recipes, from family recipes that had been passed down for generations, to new concoctions that they created that afternoon, having been motivated in part by the ability to share the result online. Once shared, the recipes are implemented and evaluated by other users, who supply ratings and comments.

The desire to look up recipes online may at first appear odd given that tomes of printed recipes can be found in almost every kitchen. The Joy of Cooking [12] alone contains 4,500 recipes spread over 1,000 pages. There is, however, substantial additional value in online recipes, beyond their accessibility. While the Joy of Cooking contains a single recipe for Swedish meatballs, Allrecipes.com hosts "Swedish Meatballs I", "II", and "III", submitted by different users, along with 4 other variants, including "The Amazing Swedish Meatball". Each variant has been reviewed, from 329 reviews for "Swedish Meatballs I" to 5 reviews for "Swedish Meatballs III". The reviews not only provide a crowd-sourced ranking of the different recipes, but also many suggestions on how to modify them, e.g. using ground turkey instead of beef, skipping the "cream of wheat" because it is rarely on hand, etc.

The wealth of information captured by online collaborative recipe sharing sites is revealing not only of the fundamentals of cooking, but also of user preferences. The co-occurrence of ingredients in tens of thousands of recipes provides information about which ingredients go well together, and when a pairing is unusual. Users' reviews provide clues as to the flexibility of a recipe, and the ingredients within it. Can the amount of cinnamon be doubled? Can the nutmeg be omitted? If one is lacking a certain ingredient, can a substitute be found among supplies at hand without a trip to the grocery store? Unlike cookbooks, which will contain vetted but perhaps not the best variants for some individuals' tastes, ratings assigned to user-submitted recipes allow for the evaluation of what works and what does not.

In this paper, we seek to distill the collective knowledge and preference about cooking through mining a popular recipe-sharing website.
To extract such information, we first parse the unstructured text of the recipes and the accompanying user reviews. We construct two types of networks that reflect different relationships between ingredients, in order to capture users' knowledge about how to combine ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users' preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted by features derived from combinations of ingredient networks and nutrition information (with accuracy .792), while most of the prediction power comes from the ingredient networks (84%).

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 describes the dataset. Section 4 discusses the extraction of the ingredient and complement networks and their characteristics. Section 5 presents the extraction of recipe modification information, as well as the construction and characteristics of the ingredient substitute network. Section 6 presents our experiments on recipe recommendation and Section 7 concludes.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WebSci 2012, June 22–24, 2012, Evanston, Illinois, USA.
Copyright 2012 ACM 978-1-4503-1228-8.

2. RELATED WORK
Recipe recommendation has been the subject of much prior work. Typically the goal has been to suggest recipes to users based on their past recipe ratings [15][3] or browsing/cooking history [16]. The algorithms then find similar recipes based on overlapping ingredients, either treating each ingredient equally [4] or by identifying key ingredients [19]. Instead of modeling recipes using ingredients, Wang et al. [17] represent the recipes as graphs which are built on ingredients and cooking directions, and they demonstrate that graph representations can be used to easily aggregate Chinese dishes by the flow of cooking steps and the sequence of added ingredients. However, their approach only models the occurrence of ingredients or cooking methods, and doesn't take into account the relationships between ingredients. In contrast, in this paper we incorporate the likelihood of ingredients to co-occur, as well as the potential of one ingredient to act as a substitute for another.

Another branch of research has focused on recommending recipes based on desired nutritional intake or promoting healthy food choices. Geleijnse et al. [7] designed a prototype of a personalized recipe advice system, which suggests recipes to users based on their past food selections and nutrition intake. In addition to nutrition information, Kamieth et al. [9] built a personalized recipe recommendation system based on availability of ingredients and personal nutritional needs. Shidochi et al. [14] proposed an algorithm to extract replaceable ingredients from recipes in order to satisfy users' various demands, such as calorie constraints and food availability. Their method identifies substitutable ingredients by matching the cooking actions that correspond to ingredient names. However, their assumption that substitutable ingredients are subject to the same processing methods is less direct and specific than extracting substitutions directly from user-contributed suggestions.

Ahn et al. [1] and Kinouchi et al. [10] examined networks involving ingredients derived from recipes, with the former modeling ingredients by their flavor bonds, and the latter examining the relationship between ingredients and recipes. In contrast, we derive direct ingredient-ingredient networks of both complements and substitutes. We also step beyond characterizing these networks to demonstrating that they can be used to predict which recipes will be successful.

3. DATASET
Allrecipes.com is one of the most popular recipe-sharing websites, where novice and expert cooks alike can upload and rate cooking recipes. It hosts 16 customized international sites for users to share their recipes in their native languages, of which we study only the main, English, version. Recipes uploaded to the site contain specific instructions on how to prepare a dish: the list of ingredients, preparation steps, preparation and cook time, the number of servings produced, nutrition information, serving directions, and photos of the prepared dish. The uploaded recipes are enriched with user ratings and reviews, which comment on the quality of the recipe, and suggest changes and improvements. In addition to rating and commenting on recipes, users are able to save them as favorites or recommend them to others through a forum.

We downloaded 46,337 recipes from allrecipes.com, including all information listed above as well as several classifications, such as a region (e.g. the midwest region of the US or Europe), the course or meal the dish is appropriate for (e.g. appetizers or breakfast), and any holidays the dish may be associated with. In order to understand users' recipe preferences, we crawled 1,976,920 reviews, which include reviewers' ratings, review text, and the number of users who voted the review as useful.

3.1 Data preprocessing
The first step in processing the recipes is identifying the ingredients and cooking methods from the freeform text of the recipe. Usually, although not always, each ingredient is listed on a separate line. To extract the ingredients, we tried two approaches. In the first, we found the maximal match between a pre-curated list of ingredients and the text of the line. However, this missed too many ingredients, while misidentifying others. In the second approach, we used regular expression matching to remove non-ingredient terms from the line and identified the remainder as the ingredient. We removed quantifiers, such as "1 lb" or "2 cups", and words referring to consistency or temperature, e.g. chopped or cold, along with a few other heuristics, such as removing content in parentheses. For example "1 (28 ounce) can baked beans (such as Bush's Original®)" is identified as "baked beans". By limiting the list of potential terms to remove from an ingredient entry, we erred on the side of not conflating potentially identical or highly similar ingredients, e.g. "cheddar", used in 2450 recipes, was considered different from "sharp cheddar cheese", occurring in 394 recipes.

We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names as our finalized ingredient list. Each of the top 1000 ingredients occurred in 23 or more recipes, with plain salt making an appearance in 47.3% of recipes. These ingredients also accounted for 94.9% of ingredient entries in the recipe dataset. The remaining ingredients were missed either because of high specificity (e.g. yolk-free egg noodle), referencing brand names (e.g. Planters almonds), rarity (e.g. serviceberry), misspellings, or not being a food (e.g. "nylon netting").

The remaining processing task was to identify cooking processes from the directions. We first identified all heating methods using a listing in the Wikipedia entry on cooking [18]. For example, baking, boiling, and steaming are all ways of heating the food. We then identified mechanical ways of processing the food such as chopping and grinding, and other chemical techniques such as marinating and brining.

Figure 1: The percentage of recipes by region that apply a specific heating method. [Axes: x, method (bake, boil, fry, grill, roast, simmer, marinate); y, % in recipes (0 to 40); one series per region: midwest, mountain, northeast, west coast, south.]

3.2 Regional preferences
Choosing one cooking method over another appears to be a question of regional taste. 5.8% of recipes were classified into one of five US regions: Mountain, Midwest, Northeast, South, and West Coast (including Alaska and Hawaii). Figure 1 shows significantly ($\chi^2$ test p-value < 0.001) varying preferences in the different US regions among 6 of the most popular cooking methods. Boiling and simmering, both involving heating food in hot liquids, are more common in the South and Midwest. Marinating and grilling are relatively more popular in the West and Mountain regions, but in the West more grilling recipes involve seafood (18/42 ≈ 43%) relative to other regions combined (7/106 ≈ 7%). Frying is popular in the South and Northeast. Baking is a universally popular and versatile technique, which is often used for both sweet and savory dishes, and is slightly more popular in the Northeast and Midwest. Examination of individual recipes reflecting these frequencies shows that these differences in preference can be tied to differences in demographics, immigrant culture and availability of local ingredients, e.g. seafood.

4. INGREDIENT COMPLEMENT NETWORK
Can we learn how to combine ingredients from the data? Here we employ the occurrences of ingredients across recipes to distill users' knowledge about combining ingredients.

We constructed an ingredient complement network based on pointwise mutual information (PMI) defined on pairs of ingredients (a, b):

$$\mathrm{PMI}(a, b) = \log \frac{p(a, b)}{p(a)\,p(b)},$$

where

$$p(a, b) = \frac{\#\text{ of recipes containing } a \text{ and } b}{\#\text{ of recipes}}, \qquad p(a) = \frac{\#\text{ of recipes containing } a}{\#\text{ of recipes}}, \qquad p(b) = \frac{\#\text{ of recipes containing } b}{\#\text{ of recipes}}.$$

The PMI compares the probability that two ingredients occur together with the probability that they would co-occur if they appeared independently. Complementary ingredients tend to occur together far more often than would be expected by chance.

Figure 2 shows a visualization of ingredient complementarity. Two distinct subcommunities of recipes are immediately apparent: one corresponding to savory dishes, the other to sweet ones. Some central ingredients, e.g. egg and salt, are actually pushed to the periphery of the network. They are so ubiquitous that, although they have many edges, the edges are all weak, since these ingredients don't show particular complementarity with any single group of ingredients.

We further probed the structure of the complementarity network by applying a network clustering algorithm [13]. The algorithm confirmed the existence of two main clusters containing the vast majority of the ingredients. An interesting satellite cluster is that of mixed drink ingredients, which is evident as a constellation of small nodes located near the top of the sweet cluster in Figure 2. The cluster includes the following ingredients: lime, rum, ice, orange, pineapple juice, vodka, cranberry juice, lemonade, tequila, etc.

For each recipe we recorded the minimum, average, and maximum pairwise pointwise mutual information between ingredients. The intuition is that complementary ingredients would yield higher ratings, while ingredients that don't go together would lower the average rating. We found that while the average and minimum pointwise mutual information between ingredients is uncorrelated with ratings, the maximum is very slightly positively correlated with the average rating for the recipe ($\rho = 0.09$, p-value $< 10^{-10}$). This suggests that having at least two complementary ingredients very slightly boosts a recipe's prospects, but having clashing or unrelated ingredients does not seem to do harm.

5. RECIPE MODIFICATIONS
Co-occurrence of ingredients aggregated over individual recipes reveals the structure of cooking, but tells us little about how flexible the ingredient proportions are, or whether some ingredients could easily be left out or substituted. An experienced cook may know that applesauce is a low-fat alternative to oil, or may know that nutmeg is often optional, but a novice cook may implement recipes literally, afraid that deviating from the instructions may produce poor results. While a traditional hardcopy cookbook would provide few such hints, they are plentiful in the reviews submitted by users who implemented the recipes, e.g. "This is a great recipe, but using fresh tomatoes only adds a few minutes to the prep time and makes it taste so much better", or another comment about the same recipe: "This is by far the best recipe we have ever come across. We did however change it just a little bit by adding extra onion."

As the examples illustrate, modifications are reported even when the user likes the recipe. In fact, we found that 60.1% of recipe reviews contain words signaling modification, such as "add", "omit", "instead", "extra" and 14 others. Furthermore, it is the reviews that include changes that have a statistically higher average rating (4.49 vs. 4.39, t-test p-value $< 10^{-10}$), and lower rating variance (0.82 vs. 1.05, Bartlett test p-value $< 10^{-10}$), as is evident in the distribution of ratings, shown in Fig. 3. This suggests that flexibility in recipes is not necessarily a bad thing, and that reviewers who don't mention modifications are more likely to think of the recipe as perfect, or to dislike it entirely.
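The comparison of reviews with and without modification words could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: only the four signal words named in the text are used (the other 14 are not listed here), matching is on exact lowercase tokens, and the function names and toy reviews are hypothetical.

```python
import re
from statistics import mean, pvariance

# Only these four signal words are named in the text; the remaining 14 are
# not listed, so this set is deliberately incomplete.
SIGNAL_WORDS = {"add", "omit", "instead", "extra"}


def mentions_modification(review_text):
    """Return True if the review contains a signal word as a whole token.

    Exact-token matching is an assumption; inflected forms such as
    "adding" are not matched by this sketch.
    """
    tokens = re.findall(r"[a-z]+", review_text.lower())
    return any(tok in SIGNAL_WORDS for tok in tokens)


def rating_summary(reviews):
    """Group (text, rating) pairs by modification flag.

    Returns {flag: (mean rating, population variance)} or None for an
    empty group.
    """
    groups = {True: [], False: []}
    for text, rating in reviews:
        groups[mentions_modification(text)].append(rating)
    return {
        flag: (mean(ratings), pvariance(ratings)) if ratings else None
        for flag, ratings in groups.items()
    }


# Hypothetical toy reviews, not data from the study.
toy_reviews = [
    ("Great, but I would add extra onion.", 5),
    ("Used turkey instead of beef.", 4),
    ("Perfect as written.", 5),
    ("Did not like it at all.", 1),
]
summary = rating_summary(toy_reviews)
```

On real data the two groups would then be compared with a t-test on the means and a Bartlett test on the variances, as reported above.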


Figure 2: Ingredient complement network. Two ingredients share an edge if theyluncheon meat occur together more than would be expected by chance and if their pointwise mutual information exceeds a threshold.
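The edge rule stated in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes PMI is computed as log p(a,b) / (p(a) p(b)) with probabilities taken as fractions of recipes, and the function name `complement_edges` and the default `pmi_threshold` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def complement_edges(recipes, pmi_threshold=0.0):
    """Return {(a, b): pmi} for ingredient pairs whose PMI exceeds the threshold.

    recipes: list of ingredient lists. The threshold value is an assumption;
    a PMI above 0 means the pair co-occurs more often than chance predicts.
    """
    n = len(recipes)
    single = Counter()  # number of recipes containing each ingredient
    pair = Counter()    # number of recipes containing both ingredients of a pair
    for ingredients in recipes:
        items = sorted(set(ingredients))  # sorted so each pair has one canonical key
        single.update(items)
        pair.update(combinations(items, 2))
    edges = {}
    for (a, b), n_ab in pair.items():
        pmi = math.log((n_ab / n) / ((single[a] / n) * (single[b] / n)))
        if pmi > pmi_threshold:
            edges[(a, b)] = pmi
    return edges

toy_recipes = [
    ["flour", "sugar", "butter"],
    ["flour", "sugar", "egg"],
    ["beef", "onion", "garlic"],
    ["beef", "onion", "butter"],
]
for edge, value in sorted(complement_edges(toy_recipes).items()):
    print(edge, round(value, 3))
```

On the toy data, flour and sugar always appear together and receive an edge, while beef and butter co-occur exactly as often as chance predicts and do not.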

Figure 3: The likelihood that a review suggests a modification to the recipe depends on the star rating the review is assigning to the recipe. (Axes: rating, 1 to 5, vs. proportion of reviews with given rating; series: no modification, with modification.)

In the following, we describe the recipe modifications extracted from user reviews, including adjustment, deletion and addition. We then present how we constructed an ingredient substitute network based on the extracted information.

5.1 Adjustments

Some modifications involve increasing or decreasing the amount of an ingredient in the recipe. In this and the following analyses, we split the review on punctuation such as commas and periods. We used simple heuristics to detect when a review suggested a modification: adding/using more/less of an ingredient counted as an increase/decrease. Doubling or increasing counted as an increase, while reducing, cutting, or decreasing counted as a decrease. While it is likely that there are other expressions signaling the adjustment of ingredient quantities, using this set of terms allowed us to compare the relative rate of modification, as well as the frequency of increase vs. decrease between ingredients. The ingredients themselves were extracted by performing a maximal character match within a window following an adjustment term.

Figure 4 shows the ratios of the number of reviews suggesting modifications, either increases or decreases, to the number of recipes that contain the ingredient. Two patterns are immediately apparent. Ingredients that may be perceived as being unhealthy, such as fats and sugars, are, with the exception of vegetable oil and margarine, more likely to be modified, and to be decreased. On the other hand, flavor enhancers such as soy sauce, lemon juice, and cinnamon, and toppings such as bacon and mushrooms, are also likely to be modified; however, they tend to be added in greater, rather than lesser, quantities. Combined, the patterns suggest that good-tasting but “unhealthy” ingredients can be reduced, if desired, while spices, extracts, and toppings can be increased to taste.

5.2 Deletions and additions

Recipes are also frequently modified such that ingredients are omitted entirely. We looked for words indicating that the reviewer did not have an ingredient (and hence did not use it), e.g. “had no” and “didn’t have”. We further used “omit/left out/left off/bother with” as indication that the reviewer had omitted the ingredients, potentially for other reasons. Because reviewers often used simplified terms, e.g. “vanilla” instead of “vanilla extract”, we compared words in proximity to the action words by constructing 4-character-grams and calculating the cosine similarity between the n-grams in the review and the list of ingredients for the recipe.

To identify additions, we simply looked for the word “add”, but omitted possible substitutions. For example, we would use “added cucumber”, but not “added cucumber instead of green pepper”, the latter of which we analyze in the following section. We then compared the addition to the list of ingredients in the recipes, and considered the addition valid only if the ingredient does not already belong in the recipe.
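The 4-character-gram matching used for deletions can be illustrated with a short sketch. It is not the authors' code: the use of raw n-gram counts as vector weights, the absence of padding, and the `min_similarity` cutoff are all assumptions made for illustration, and `best_match` is a hypothetical helper name.

```python
import math
from collections import Counter

def char_ngrams(text, n=4):
    """Count the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    if len(text) < n:
        return Counter([text])
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(c1, c2):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(c1[g] * c2[g] for g in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def best_match(mention, ingredients, min_similarity=0.3):
    """Return the recipe ingredient most similar to a review mention, or None.

    min_similarity is an illustrative cutoff, not a value from the paper.
    """
    if not ingredients:
        return None
    mention_grams = char_ngrams(mention)
    score, ingredient = max(
        (cosine(mention_grams, char_ngrams(ing)), ing) for ing in ingredients
    )
    return ingredient if score >= min_similarity else None

print(best_match("vanilla", ["vanilla extract", "white sugar", "butter"]))
```

Character n-grams make the match tolerant of shortened names, which is the situation the text describes: “vanilla” shares all four of its 4-grams with “vanilla extract” and none with the other ingredients.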

Figure 4: Suggested modifications of quantity for the 50 most common ingredients, derived from recipe reviews. The line denotes equal numbers of suggested quantity increases and decreases. (Axes: (# reviews adjusting down)/(# recipes) vs. (# reviews adjusting up)/(# recipes), both spanning 0.01 to 1.00.)

Figure 5: Ingredient substitute network. Nodes are sized according to the number of times they have been recommended as a substitute for another ingredient, and colored according to their indegree.

Table 1 shows the correlation between ingredient modifications. As might be expected, the more frequently an ingredient occurs in a recipe, the more times its quantity has the opportunity to be modified, as is evident in the strong correlation between the number of recipes the ingredient occurs in and both increases and decreases recommended in reviews. However, the more common an ingredient, the more stable it appears to be. Recipe frequency is negatively correlated with deletions/recipe (ρ = −0.22), additions/recipe (ρ = −0.25), and increases/recipe (ρ = −0.26). For example, salt is so essential, appearing in over 21,000 recipes, that we detected only 18 reviews where it was explicitly dropped. In contrast, Worcestershire sauce, appearing in 1,542 recipes, is dropped explicitly in 148 reviews.

As might also be expected, additions are positively correlated with increases, and deletions with decreases. However, additions and deletions are very weakly negatively correlated, indicating that an ingredient that is added frequently is not necessarily omitted more frequently as well.

Table 1: Correlations between ingredient modifications

            addition   deletion   increase   decrease
#recipes      0.41       0.22       0.61       0.68
addition                -0.15       0.79       0.11
deletion                            0.09       0.58
increase                                       0.39

5.3 Ingredient substitute network

Replacement relationships show whether one ingredient is preferable to another. The preference could be based on taste, availability, or price. Some ingredient substitution tables can be found online¹, but are neither extensive nor contain information about relative frequencies of each substitution. Thus, we found an alternative source for extracting replacement relationships: users’ comments, e.g. “I replaced the butter in the frosting by sour cream, just to soothe my conscience about all the fatty calories”.

To extract such knowledge, we first parsed the reviews as follows: we considered several phrases to signal replacement relationships: “replace a with b”, “substitute b for a”, “b instead of a”, etc., and matched a and b to our list of ingredients.

We constructed an ingredient substitute network to capture users’ knowledge about ingredient replacement. This weighted, directed network consists of ingredients as nodes. We thresholded and eliminated any suggested substitutions that occurred fewer than 5 times. We then determined the weight of each edge by p(b|a), the proportion of substitutions of ingredient a that suggest ingredient b. For example, 68% of substitutions for white sugar were to splenda, an artificial sweetener, and hence the assigned weight for the sugar → splenda edge is 0.68.

The resulting substitution network, shown in Figure 5, exhibits strong clustering. We examined this structure by applying the map generator tool by Rosvall et al. [13], which uses a random walk approach to identify clusters in weighted, directed networks. The resulting clusters, and their relationships to one another, are shown in Fig. 6. The derived clusters could be used when following a relatively new recipe which may not receive many reviews, and therefore many suggestions for ingredient substitutions. If one does not have all ingredients at hand, one could examine the content of one’s fridge and pantry and match it with other ingredients found in the same cluster as the ingredient called for by the recipe.
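The thresholding and edge-weighting steps of the substitute network can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the text does not say whether p(b|a) is normalized before or after rare substitutions are removed, so the sketch normalizes over the surviving substitutions only, and `substitute_weights` is a hypothetical name.

```python
from collections import Counter, defaultdict

def substitute_weights(substitutions, min_count=5):
    """Build weighted, directed substitute edges from extracted (a, b) pairs.

    substitutions: iterable of (a, b), meaning "b was suggested in place of a".
    Pairs seen fewer than min_count times are dropped (5 in the text).
    Assumption: p(b|a) is normalized over the surviving pairs only.
    """
    counts = Counter(substitutions)
    kept = {pair: c for pair, c in counts.items() if c >= min_count}
    totals = defaultdict(int)  # surviving substitutions out of each ingredient a
    for (a, _), c in kept.items():
        totals[a] += c
    return {(a, b): c / totals[a] for (a, b), c in kept.items()}

# Toy counts chosen so the sugar -> splenda weight mirrors the 0.68 example.
observed = (
    [("white sugar", "splenda")] * 17
    + [("white sugar", "honey")] * 8
    + [("white sugar", "maple syrup")] * 2  # below the threshold, dropped
)
for edge, weight in substitute_weights(observed).items():
    print(edge, round(weight, 2))
```

With this normalization the outgoing weights of each ingredient sum to 1, which suits the random-walk clustering mentioned in the text; normalizing before thresholding would be an equally plausible reading.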
Table 2 lists the contents of a few such sample ingredient clusters, and Fig. 7 shows two example clusters extracted from the substitute network.

¹ e.g., http://allrecipes.com/HowTo/common-ingredient-substitutions/detail.aspx

Figure 6: Ingredient substitution clusters. Nodes represent clusters and edges indicate the presence of recommended substitutions that span clusters. Each cluster represents a set of related ingredients which are frequently substituted for one another.

Table 2: Clusters of ingredients that can be substituted for one another. A maximum of 5 additional ingredients for each cluster are listed, ordered by PageRank.

main                other ingredients
chicken             turkey, beef, sausage, chicken breast, bacon
olive oil           butter, apple sauce, oil, banana, margarine
sweet potato        yam, potato, pumpkin, butternut squash, parsnip
baking powder       baking soda, cream of tartar
almond              pecan, walnut, cashew, peanut, sunflower s.
apple               peach, pineapple, pear, mango, pie filling
egg                 egg white, egg substitute, egg yolk
tilapia             cod, catfish, flounder, halibut, orange roughy
spinach             mushroom, broccoli, kale, carrot, zucchini
italian seasoning   basil, cilantro, oregano, parsley, dill
cabbage             coleslaw mix, sauerkraut, bok choy napa cabbage

Figure 7: Relationships between ingredients located within two of the clusters from Fig. 6. (a) milk substitutes; (b) cinnamon substitutes.

Finally, we examine whether the substitution network encodes preferences for one ingredient over another, as evidenced by the relative ratings of similar recipes, one which contains an original ingredient, and another which implements a substitution. To test this hypothesis, we construct a “preference network”, where one ingredient is preferred to another in terms of received ratings, and is constructed by creating an edge (a, b) between a pair of ingredients, where a and b are listed in two recipes X and Y respectively, if recipe ratings $R_X > R_Y$. For example, if recipe X includes beef, ketchup and cheese, and recipe Y contains beef and pickles, then this recipe pair contributes to two edges: one from pickles to ketchup, and the other from pickles to cheese. The aggregate edge weights are defined based on PMI. Because PMI is a symmetric quantity (PMI(a; b) = PMI(b; a)), we introduce a directed PMI measure to cope with the directionality of the preference network:

\[ \mathrm{PMI}(a \to b) = \log \frac{p(a \to b)}{p(a)\,p(b)}, \]

where

\[ p(a \to b) = \frac{\#\text{ of recipe pairs from } a \text{ to } b}{\#\text{ of recipe pairs}}, \]

and p(a), p(b) are defined as in the previous section.

We find high correlation between this preference network and the substitution network (ρ = 0.72, p < 0.001). This observation suggests that the substitute network encodes users’ ingredient preference, which we use in the recipe prediction task described in the next section.

6. RECIPE RECOMMENDATION

We use the above insights to uncover novel recommendation algorithms suitable for recipe recommendations. We use ingredients and the relationships encoded between them in ingredient networks as our main feature sets to predict recipe ratings, and compare them against features encoding nutrition information, as well as other baseline features such as cooking methods, and preparation and cook time. Then we apply a discriminative machine learning method, stochastic gradient boosting trees [6], to predict recipe ratings.

In the experiments, we seek to answer the following three questions. (1) Can we predict users’ preference for a new recipe given the information present in the recipe? (2) What are the key aspects that determine users’ preference? (3) Does the structure of ingredient networks help in recipe recommendation, and how?

6.1 Recipe Pair Prediction

The goal of our prediction task is: given a pair of similar recipes, determine which one has higher average rating than the other. This task is designed particularly to help users with a specific dish or meal in mind, and who are trying to decide between several recipe options for that dish.

Recipe pair data. The data for this prediction task consists of pairs of similar recipes. The reason for selecting similar recipes, with high ingredient overlap, is that while apples may be quite comparable to oranges in the context of recipes, especially if one is evaluating desserts, lasagna may not be comparable to a mixed drink. To derive pairs of related recipes, we computed the cosine similarity between the ingredient lists for the two recipes, weighted by the inverse document frequency, log(# of recipes / # of recipes containing the ingredient). We considered only those pairs of recipes whose cosine similarity exceeded 0.2. The weighting is intended to identify higher similarity among recipes sharing more distinguishing ingredients, such as Brussels sprouts, as opposed to recipes sharing very common ones, such as butter.

A further challenge to obtaining reliable relative rankings of recipes is variance introduced by having different users choose to rate different recipes. In addition, some users might not have a sufficient number of reviews under their belt to have calibrated their own rating scheme. To control for variation introduced by users, we examined recipe pairs where the same users are rating both recipes and are collectively expressing a preference for one recipe over another. Specifically, we generated 62,031 recipe pairs (a, b) where $\mathrm{rating}_i(a) > \mathrm{rating}_i(b)$, for at least 10 users i, and over 50% of users who rated both recipe a and recipe b. Furthermore, each user i should be an active enough reviewer to have rated at least 8 other recipes.

Figure 8: Prediction performance. The nutrition information and ingredient networks are more effective features than full ingredients. The ingredient network features lead to impressive performance, close to the best performance. (Axis: accuracy, 0.60 to 0.80; feature sets: combined, ing. networks, nutrition, full ingredients, baseline.)

Features. In the prediction dataset, each observation consists of a set of predictor variables or features that represent information about two recipes, and the response variable is a binary indicator of which gets the higher rating on average. To study the key aspects of recipe information, we constructed different sets of features, including:

• Baseline: This includes cooking methods, such as chopping, marinating, or grilling, and cooking effort descriptors, such as preparation time in minutes, as well as the number of servings produced, etc. These features are considered as primary information about a recipe and will be included in all other feature sets described below.
• Full ingredients: We selected up to 1000 popular ingredients to build a “full ingredient list”. In this feature set, each observed recipe pair contains a vector with entries indicating whether an ingredient from the full list is present in either recipe in the pair.
• Nutrition: This feature set does not include any ingredients but only nutrition information such as the total caloric content, as well as quantities of fats, carbohydrates, etc.
• Ingredient networks: In this set, we replaced the full ingredient list by structural information extracted from different ingredient networks, as described in Sections 4 and 5.3. Co-occurrence is treated separately as a raw count, and a complementarity, captured by the PMI.
• Combined set: Finally, a combined feature set is constructed to test the performance of a combination of features, including baseline, nutrition and ingredient networks.

To build the ingredient network feature set, we extracted the following two types of structural information from the co-occurrence and substitution networks, as well as the complement network derived from the co-occurrence information:

Network positions are calculated to represent how a recipe’s ingredients occupy positions within the networks. Such position measures are likely to inform if a recipe contains any “popular” or “unusual” ingredients. To calculate the position measures, we first calculated various network centrality measures, including degree centrality, betweenness centrality, etc., from the ingredient networks. A centrality measure can be represented as a vector $\vec{g}$ where each entry indicates the centrality of an ingredient. The network position of a recipe, with its full ingredient list represented as a binary vector $\vec{f}$, can be summarized by $\vec{g}^T \cdot \vec{f}$, i.e., an aggregated centrality measure based on the centrality of its ingredients.

Network communities provide information about which ingredient is more likely to co-occur with a group of other ingredients in the network. A recipe consisting of ingredients that are frequently used with, complemented by or substituted by certain groups may be predictive of the ratings the recipe will receive. To obtain the network community information, we applied latent semantic analysis (LSA) on recipes. We first factorized each ingredient network, represented by matrix $W$, using singular value decomposition (SVD). In the matrix $W$, each entry $W_{ij}$ indicates whether ingredient i co-occurs with, complements or substitutes ingredient j.

Suppose $W_k = U_k \Sigma_k V_k^T$ is a rank-k approximation of $W$; we can then transform each recipe’s full ingredient list using the low-dimensional representation, $\Sigma_k^{-1} V_k^T \vec{f}$, as community information within a network. These low-dimensional vectors, together with the vectors of network positions, constitute the ingredient network features.

Learning method. We applied discriminative machine learning methods such as support vector machines (SVM) [2] and stochastic gradient boosting trees [5] to our prediction problem. Here we report and discuss the detailed results based on the gradient boosting tree model. Like SVM, the gradient boosting tree model seeks a parameterized classifier, but unlike SVM that considers all the features at one time, the boosting tree model considers a set of features at a time and iteratively combines them according to their empirical errors. In practice, it not only has competitive performance comparable to SVM, but can serve as a feature ranking procedure [11].

In this work, we fitted a stochastic gradient boosting tree model with 8 terminal nodes under an exponential loss function. The dataset is roughly balanced in terms of which recipe is the higher-rated one within a pair. We randomly
We randomly 1.0 group 1.0 nutrition nutrition (6.5%) carbs (20.9%) cook effort (5.0%) cholesterol (17.7%) 0.8 ing. networks (84%) 0.8 calories (19.7%) cook methods (3.9%) sodium (16.8%) 0.6 0.6 fiber (12.3%) fat (12.4%) 0.4 0.4 importance importance

0.2 0.2

0.0 0.0

20 40 60 80 100 2 4 6 8 10 12 feature feature Figure 9: Relative importance of features in the Figure 11: Relative importanceof featuresfrom nu- combined set. The individual items from nutri- trition information. The carbs item is the most in- tion information are very indicative in dierent iat - fluential feature in predicting higher-rated recipes. ing highly rated recipes, while most of the prediction power comes from ingredient networks. .746. In contrast, the nutrition information and ingredient networks are more eective(with accuracy .753 and .786, re- network 0.7 substitution (39.8%) spectively). Both of them havemuch lower dimensions(from co−occurrence (30.9%) tenstoseveralhundreds), comparedwith thefull ingredients 0.6 complement (29.2%) that are represented by more than 2000 dimensions (1000 0.5 ingredients per recipe in the pair). The ingredient network features lead to impressive performance, close to the best 0.4 performance given by the combined set (.792), indicating 0.3 thepower of network structuresin reciperecommendation. importance Figure 9 shows the influence of dier ent f eat ur es i n t he 0.2 combined feature set. Up to 100 features with the highest 0.1 relative importanceareshown. The importanceof a feature group is summarized by how much t he t ot al import ance is 0.0 contributed by all features in the set. For example, the 20 40 60 80 100 feature baseline consisting of cooking eort and cooking met hods Figure 10: Relative importance of features repre- cont ribut e 8.9% to the overall performance. The individual senting the network structure. The substitution net- itemsfrom nutrition information arevery indicativein dier - work has the strongest contribution (39.8%)tothe ent i at i ng hi ghl y -r at ed r eci p es, whi l e most of t he pr edi ct i on total importanceof network features,and it alsohas power comesfrom ingredient networks (84%). 
more influential features in the top 100 list, which Figure 10 shows the top 100 featuresfrom the threenet- suggests that the substitution network is comple- works. In terms of the total importance of ingredient net- mentary to other features. work features, the substitution network has slightly stronger cont ribut ion (39.8%) than the other two networks, and it divided the dataset into a training set (2/ 3) and a testing also has more influent ial feat ures in t he t op 100 list . T his set (1/ 3). Theprediction performanceisevaluated based on suggests that the structural information extracted from the accuracy, and t he feat ure performance is evaluat ed in t erms substitution network isnot only important but also comple- of relat ive import ance [8]. For each single decision t ree, one mentary to information from other aspects. of t he input variables, xj ,isusedtopartitiontheregionas- Looking into thenutrition information (Fig.11), wefound sociated with that node into two subregions in order to fit that carbohydratesare the most influential feature in pre- to theresponsevalues. Thesquaredrelativeimportanceof dicting higher-rated recipes. Since carbohydrates comprise variable xj is the sum of such squared improvements over around 50% or more of t ot al calories, t he high import ance all int ernal nodes for which it was chosen as t he split t ing of t his feat ure int erest ingly suggest s t hat a recipe’s rat ing variable, as: can be influenced by users’ concerns about nutrition and diet. Another interesting observation is that, while individ- 2 j imp(j )= ˆi k I (splits on x ) ual nutrition items are powerful predictors, a higher predic- k tion accuracy can be reachedby using ingredient networks alone, as shown in Fig. 8. T his implies t he informat ion ˆ 2 where i k is the empirical improvement by the k-th node about nut rit ion may have been encoded in t he ingredient spl i t t i ng on xj at t hat point . network structure, e.g. 
substitutions of less healthful ingre- dients with “ healthier” alternatives. 6.2 Results Constructing the ingredient network feature involves re- The overall prediction performance is shown in Fig. 8. ducing high-dimensional network information through SVD, Surprisingly, even with a full list of ingredients, the pre- as described in t he previous sect ion. T he dimensionalit y can diction accuracy is only improved from .712 (baseline) to be determined by cross-validation. As shown in Fig. 12, fea- tureswith avery largedimensiontendtooverfit thetraining 0.80 In Figure 13 weshow the most representative ingredients in thedecomposedmatrix derivedfrom thesubstitution net- work. We display the top five influential dimensions, eval- uated based on the relative importance, from the SVD re- ● sul t ant mat r i x Vk ,andineachofthesedimensionsweex- ● 0.79 tracted six representative ingredientsbased on their inten- ● sities in the dimension (the squared entry values). These ● representative ingredients suggest that the communities of ● ● ingredient substitutes, such as the sweet and oil substitutes in the first dimension or the milk substitutesin the second ● 0.78 dimesion (which is similar to the cluster shown in Fig. 6), are part icularly informat ive in predict ing recipe rat ings. Accuracy To summarize our observations, we find we are able to eect i vel y pr edi ct user s’ pr ef er ence f or a r eci p e, but t he pr e- diction is not through using a full list of ingredients. Instead, network 0.77 by using the structural information extracted from the re- ● combined lationships among ingredients, we can better uncover users’ substitution preference about recipes. complement co−occurrence 7. CONCLUSION 0.76 10 20 30 40 50 60 70 Recipes are little more than instructions for combining Dimensions and processing set s of ingredient s. 
I ndividual cookbooks, Figure 12: Prediction performanceover reduced di- even t he most ex pansi ve ones, cont ai n si ngl e r eci p es f or each mensionality. The best performance is given by re- dish. The web, however, permits collaborative recipe gen- duced dimension k =50when combining all three er at i on and modi fi cat i on, wi t h t ens of t housands of r eci p es networks. In addition, using the information about contributed in individual websites. Wehaveshown how this the complement network alone is more eect ive in data can be used to glean insights about regional preferences

prediction than using other two networks. and modifiabilit y of individual ingredient s, and also how it Color Key Color

Value can be used to construct two kinds of networks, one of in-

− 0.5 ingredient gredient complement s, t he ot her of ingredient subst it ut es. Thesenetworks encodewhich ingredients gowell together, 41 Color Key

svd dimension svd and which can be subst it ut ed t o obt ain superior result s, and 82 permit one to predict, given a pair of related recipes, which −0.5 0.5 Value 433 one will be more highly rat ed by users. splenda In future work, weplan to extend ingredient networks to 194 olive oil applesauce honey incorporate the cooking methods as well. It would also be 65 butter brown sugar of int erest t o generat e region-specific and diet -specific rat -

milk

chickenbreast

pork

italian sausage italian

sausage

chicken

turkey

coconutextract

walnut

lime juice lime

lemon extract lemon

chocolate pudding chocolate

almond extract almond

cream of chicken of cream soup

beef

almond

kale

vanilla

vanillaextract

evaporatedmilk

sour cream sour

buttermilk

chickenbroth

half and half and half

milk

brownsugar

butter

honey

applesauce

oliveoil splenda half and half chicken broth ings, depending on the users’ background and preferences. buttermilk sour cream Awholehostofuser-interfacefeaturescouldbeaddedfor evaporated milk vanilla extract vanilla users who are interacting with recipes, whether the recipe kale almond is newly submitted, and henceunrated, or whether they are beef cream of chicken soup browsing a cookbook. In addition to automatically predict- almond extract chocolate pudding lemon extract ing a rating for the recipe, one could flag ingredients that lime juice walnut can be omitted, ones whose quantity could be tweaked, as coconut extract Figure 13: Influential substitutionturkey communities. chicken well as suggested additions and substit utions. sausage The matrix shows the most influentialitalian sausage feature di- pork mensions extracted from the substitutionchicken breast network.

4

8 6 8. ACKNOWLEDGMENTS

43 For each dimension, the six19 representative ingredi- This work wassupportedby MURI award FA9550-08-1- ents with the highest intensity values are shown, 0265 from t he A ir Force O ce of Scientific Research. The with colors indicating their intensity. T hesefeatures methodology used in this paper was developed with sup- suggest that the communities of ingredient substi- port from funding from the Army Research O ce, M ult i- tutes, such as the sweet and oil in the first dimen- University Research Initiative on Measuring, Understand- sion, are particularly informative in prediction. ing, and Responding to Covert Social Networks: Passiveand ActiveTomography. Theauthorsgratefully acknowledgeD. data. Hence we chose k =50forthereduceddimensionof Lazer for support. all t hree net works. T he figure also shows t hat using t he information about the complement network alone is more eective in prediction than using either the co-occurrence 9. REFERENCES and subst it ut e net works, even in t he case of low dimen- [1] Ahn, Y., Ahnert, S., Bagrow, J., and Barabasi, A. si ons. Consi st ent l y, as shown i n t er ms of r el at i ve i mp or t ance Flavor network and the principlesof food pairing. (Fig. 10), the substitution network alone is not the most ef- Bulletin of the American Physical Society 56 (2011). fective, but it provides more complementary information in [2] Cortes, C., and Vapnik, V. Support-vector networks. thecombinedfeatureset. Machine learning 20,3(1995),273–297. [3] Forbes, P., and Zhu, M. Content-boosted matrix [12] Rombauer, I., Becker, M., Becker, E., and Maestro, L. factorization for recommender systems: Experiments Joy of cooking.ScribnerBookCompany,1997. with recipe recommendation. Proceedings of [13] Rosvall, M., and Bergstrom, C. Maps of random walks Recommender Systems (2011). on complex net works reveal communit y st ruct ure. [4] Freyne, J., and Berkovsky, S. Intelligent food PNAS 105,4(2008),1118. planning: personalized recipe recommendation. 
In IUI, [14] Shidochi, Y., Takahashi, T., Ide, I., and Murase, H. ACM (2010), 321–324. Finding replaceable materials in cooking recipe texts [5] Friedman, J. St ochast ic gradient boost ing. considering charact erist ic cooking act ions. I n Proc. of Computational Statistics & Data Analysis 38,4 the ACM multimedia 2009 workshop on Multimedia (2002), 367–378. for cooking and eating activities,ACM(2009),9–14. [6] Friedman, J., Hastie, T., and Tibshirani, R. Additive [15] Svensson, M ., H ¨o¨ok, K ., and C¨ost er, R. Designing and logistic regression: a statistical view of boosting. evaluating kalas: A social navigation system for food Annals of Statistics 28 (1998), 2000. recipes. ACM Transactions on Computer-Human [7] Geleijnse, G., Nachtigall, P., van Kaam, P., and Interaction (TOCHI) 12,3(2005),374–400. Wijgergangs,L. A personalizedrecipeadvicesystem [16] Ueda, M., Takahata, M., and Nakajima, S. User’s food to promotehealthful choices.In IUI,ACM(2011), preference extraction for personalized cooking recipe 437–438. recommendation. Proc. of the Second Workshop on [8] Hastie, T., Tibshirani, R., Friedman, J., and Franklin, Semantic Personalized Information Management: J. The elements of statistical learning: data mining, Retrieval and Recommendation (2011). inference and prediction. TheMathematical [17] Wang, L., Li, Q., Li, N., Dong, G., and Yang, Y. Intelligencer 27,2(2005). Substructure similarity measurement in chinese [9] Kamieth, F., Braun, A., and Schlehuber, C. Adaptive recipes. In WWW,ACM(2008),979–988. implicit interaction for healthy nutrition and food [18] Wikipedia. Outlineof food preparation, 2011. [Online; intake supervision. Human-Computer Interaction. accessed 22-Oct -2011]. Towards Mobile and Intelligent Interaction [19] Zhang, Q., Hu, R., Mac Namee, B., and Delany, S. Environments (2011), 205–212. Back to the future: Knowledgelight casebasecookery. [10] Kinouchi, O., Diez-Garcia,R., Holanda,A., In Proc. 
of The 9th European Conference on Zambianchi, P., and Roque, A. The non-equilibrium Case-Based Reasoning Workshop (2008), 15. nature of culinary evolution. New Journal of Physics 10 (2008), 073020. [11] Lu, Y., Peng, F., Li, X., and Ahmed, N. Coupling feature selection and machine learning methods for navigational query identification. In CIKM,ACM (2006), 682–689.