Distance functions for knowledge-based cocktail recommendation Sigurd Sippel Hamburg University of Applied Sciences, Department of Computer Science, Berliner Tor 7, 20099 Hamburg [email protected] April 11, 2015
1. Introduction

The cocktail recommendations made by bartenders in a bar have to be appropriate to the guest to be successful. An automatic recommendation system for cocktail recipes can combine knowledge and huge volumes of recipes, such as those from books, to find appropriate recommendations. If an exemplary favorite is given, the appropriate recommendation must be similar to the favorite, but not too obviously so. This recommender system aims to find more appropriate recommendations than a human expert, the bartender. Personalization is implicitly given by an exemplary favorite of the user. The target group comprises bartenders.

The methodical steps necessary to develop a recommender system for cocktail recipes are considered in [Sip15]. It contains three main challenges: the knowledge stored in an ontology with feature extraction and recommendation based on distance functions, the pre-extraction for cocktail books, and the validation using expert knowledge. The first of these challenges is considered in this paper.

Section 2 gives an overview of the experimental platform. Section 3 explains how the pre-extraction is made to work with the feature extraction. The feature extraction with an ontology is considered in Section 4. Section 5 shows the distance functions for the ingredients, preparation and glassware, and combines them in a cocktail distance function. An alternative approach with a balance distance is considered to find adaptations. Section 6 follows with an experiment on the coherence and the distinction of clusters, which are made by a domain expert. Section 7 shows how the distance functions are used to get a recommendation for a given exemplary favorite. The last section provides the conclusion and prospects for future work.

2. Experimental platform

Cocktail recipes primarily contain a title and a list of ingredients. Every ingredient contains a quantity with an optional measurement unit. Additional information includes the preparation, such as shake or stir, and the selected glassware. The following example is a prototype of a cocktail recipe.

Manhattan Cocktail
(1882 Harry Johnson, Bartenders Manual, p. 162)
1 dash of gum syrup, very carefully;
1 dash of bitters (orange bitters);
1 dash of curacao, if required;
1/2 wine glass of whiskey;
1/2 wine glass of sweet vermouth;
stir up well; strain into a fancy cocktail glass;

Such recipes can be found in books or blogs, which form the sources for this recommender system. The experimental platform contains several components (Figure 1). The pre-extraction parses sources to get a clean and normalized raw data set. The ontology component finds ontology items for a raw string and a chosen taxonomy such as ingredients, preparations, glassware or units. The feature extraction is in the center, and is dependent on the pre-processing and the ontology. The feature extraction converts the pre-extracted data with the help of the ontology to the target structure, which contains only ontology items and optional meta information.

Figure 1: Architecture overview of cocktail recommender system

Depending on the output of the feature extraction, the recommendation finds recommendable cocktail recipes for the given cocktail recipe example. The extraction is partially mocked by manually pre-extracted examples. Pre-processing is planned for the future, which is visualized by dotted lines.
From the technical perspective, each component is written in Scala 2.11 (http://www.scala-lang.org) with some additional technologies. The ontology is realized with the ontology description language RDF (http://www.w3.org/RDF), the Resource Description Framework. Requests to RDF are written in SPARQL (http://www.w3.org/TR/rdf-sparql-query) and computed with the banana-rdf 0.7 library (https://github.com/w3c/banana-rdf). The feature extraction, including the distance functions, is written in pure Scala. The pre-extracted examples are persisted in XML and read with the scala-xml 1.0 library (https://github.com/scala/scala-xml). Graph visualization is realized using Graphviz 2.38 (http://www.graphviz.org), and the RDF format is converted to the necessary data format DOT by the rdf2dot library (https://github.com/hannibalhh/rdf2dot).

3. Pre-extracting

The pre-extraction is realized as a cocktail recipe pool, which is persisted as a simple XML structure. A cocktail is separated into a title, a list of ingredients, a preparation and a chosen glass. Each ingredient contains a quantity, with a unit and a value, and an ingredient name.

Quantity(unit : String, value : String)

4. Feature extraction with an ontology

The feature extraction uses the ontology component to find features in the raw data structure. Because of the XML structure, the feature extraction does not have to decide whether a string is an ingredient or anything else. This information is already given by the manual pre-extraction.

The main task of the ontology component is to find an item for a given name and a taxonomy. For this, a concrete ontology has to be designed. The ontology component contains categories separated into the following taxonomies: ingredient, preparation, glassware and units. These are different kinds of items that are addressed and identified by an always unique URI.

The RDF model contains a set of triples (resource, property, atomic value). Instead of atomic values, such as labels or titles, there could also be other triples. This nested definition is used to model trees. Every property can have a URI to ensure a unique address. The property describes the edge that connects the left element with the right one. There are predefined properties.

The type is referenced to the ingredient class (Listing 4). The ingredient class contains two subclasses, which represent the basic categories such as gin and subordinates such as London dry gin. The superordinates such as spirits are explicitly excluded, because two spirits such as absinthe and gin share too few properties.
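The pre-extracted structure of Section 3 can be sketched as Scala case classes. Only Quantity(unit, value) is given in the text; the names RawIngredient, RawCocktail and RawExample are hypothetical and chosen here for illustration. All fields stay strings, since resolving them to ontology items is the task of the feature extraction.

```scala
// Sketch of the pre-extracted (raw) recipe structure described in Section 3.
// Only Quantity(unit, value) appears in the text; the other names are assumptions.
final case class Quantity(unit: String, value: String)
final case class RawIngredient(quantity: Quantity, name: String)
final case class RawCocktail(
  title: String,
  ingredients: List[RawIngredient],
  preparation: String,
  glass: String
)

object RawExample {
  // The Manhattan Cocktail from Section 2, pre-extracted by hand.
  val manhattan: RawCocktail = RawCocktail(
    title = "Manhattan Cocktail",
    ingredients = List(
      RawIngredient(Quantity("dash", "1"), "gum syrup"),
      RawIngredient(Quantity("dash", "1"), "orange bitters"),
      RawIngredient(Quantity("dash", "1"), "curacao"),
      RawIngredient(Quantity("wine glass", "1/2"), "whiskey"),
      RawIngredient(Quantity("wine glass", "1/2"), "sweet vermouth")
    ),
    preparation = "stir",
    glass = "fancy cocktail glass"
  )
}
```

Keeping the value as a string leaves fractions such as 1/2 untouched until the unit taxonomy is consulted.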
The query to RDF, which is written in SPARQL, has to map (name, type(ingredient)) → Item (Listing 5). SPARQL is similar to SQL, so a SELECT query is used. It allows one to declare triples with bound and free variables. Two kinds of triples are important. The first one binds an ingredient ?kindof0 to its literal name. The name is bound with a filter to an uncapitalized exemplary string of Plymouth. The second one binds the ingredient to a ?type0 that is defined as a subclass of the ingredient class. This is either a basic category or a subordinate. Assuming that the ingredient is found, there could be parent categories. These are requested in optional statements. Only a triple (?kindof_x kindof ?kindof_(x+1)) and a type check are necessary. The type check is needed to prevent items that are not ingredients from appearing in the result. The maximum tree depth is defined as four, so only three parents could be found. The result is a [...] always possible with the URI. The path has a minimal size of 1.

SELECT ?type0 ?kindof0 ?type1 ?kindof1 ?type2 ?kindof2 ?type3 ?kindof3
WHERE { ?kindof0

Figure 2: Gin categories

The path of Plymouth contains itself and the parent gin (Equation 4). The superordinate is ignored, and the types are represented by the chosen data structure, such as BasicIngredient.

pathI(Plymouth) =    (4)
  SubordinateIngredient(cocktail://ingredient/plymouth) ::
  BasicIngredient(cocktail://ingredient/gin) :: Nil

pathP(build) = Preparation(cocktail://preparation/stir) :: Nil    (5)
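The path construction behind Equations 4 and 5 can be sketched in Scala. This is a minimal sketch under stated assumptions, not the platform's code: the RDF store and the SPARQL query are replaced by two in-memory maps (names and kindOf, both hypothetical), and only the item types named in Equation 4 are modelled.

```scala
object OntologyPath {
  // Item types as they appear in Equation 4.
  sealed trait Item { def uri: String }
  final case class SubordinateIngredient(uri: String) extends Item
  final case class BasicIngredient(uri: String) extends Item

  // ASSUMPTION: a tiny in-memory stand-in for the RDF store.
  // names:  lower-cased literal name -> item
  // kindOf: child URI -> parent item (superordinates are left out on purpose)
  val plymouth = SubordinateIngredient("cocktail://ingredient/plymouth")
  val gin = BasicIngredient("cocktail://ingredient/gin")
  val names: Map[String, Item] = Map("plymouth" -> plymouth, "gin" -> gin)
  val kindOf: Map[String, Item] = Map(plymouth.uri -> gin)

  // The item itself plus at most three parents.
  val MaxDepth = 4

  // Follow kindof edges upwards, stopping at the root or at MaxDepth.
  def path(name: String): List[Item] = {
    def climb(item: Item, depth: Int): List[Item] =
      if (depth >= MaxDepth) Nil
      else item :: kindOf.get(item.uri).map(climb(_, depth + 1)).getOrElse(Nil)
    names.get(name.toLowerCase).map(climb(_, 0)).getOrElse(Nil)
  }
}
```

With these maps, path("Plymouth") yields the two-element list of Equation 4, and an unknown name yields the empty list.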
Figure 3: Preparation categories

4.3. Glassware

Glassware is highly diverse [Sip15]; there are many names for the same glass or a very similar one. If the name is ignored and only the figure is considered, a glass can be classified into a small number of figures. This could be done automatically, but in this approach it is done manually.

Glassware is separated into classes of bottles or drinking glasses, which are realized in RDF as subclasses of glassware. The drinking glasses are manually classified into a small number of raw figures, such as highballs, tumblers, balloons, goblets, or cocktail glasses (little bowls). They are represented as properties with the type of drinking glass. The names, such as julep cup or silver cup, are recognized as synonyms (Figure 4). The hierarchy of glassware in the ontology tends to be flat, but there are examples, such as the julep cup, which have the same figure but are not of the same material. The julep cup is a special kind of whiskey tumbler, but it is made of silver.

Figure 4: Glassware categories

SELECT ?type0 ?kindof0 ?type1 ?kindof1
WHERE { ?kindof0

pathG(julep cup) =    (6)
  DrinkingGlass(cocktail://glassware/silver/cup) ::
  DrinkingGlass(cocktail://glassware/whiskeytumbler) :: Nil

4.4. Units

The main task of the unit in the ontology is to identify measurement units and their conversion factor to the standard unit cl. Conversion is necessary to normalize the quantity. The measurement units are separated into quantitative and qualitative units. Quantitative units such as cl are scalable, while qualitative units such as dash are not scalable. There are metric units such as ml and American or British units such as ounce. For non-metric units, there are synonyms such as singular and plural words. In particular, since qualitative units such as slice or piece are not delimitable, they are presented as synonyms, too.

The conversion of a quantity q, which contains a value and a unit, to another unit u is defined with the factor to the standard unit (Equation 7). In the example, 30 ml are converted to 1 ounce.
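The conversion of Equation 7 can be sketched as follows. The factors for ml and ounce are the ones used in the worked example of Equation 7 (0.1 and 3); the Units object and its factor map are assumptions standing in for the unit taxonomy of the ontology.

```scala
object Units {
  final case class Quantity(value: Double, unit: String)

  // Factor of each unit to the standard unit cl. The ml and ounce values are
  // taken from the worked example of Equation 7; the map itself is a
  // hypothetical stand-in for the ontology lookup.
  val factor: Map[String, Double] = Map("cl" -> 1.0, "ml" -> 0.1, "ounce" -> 3.0)

  // Equation 7: scale to cl, then divide by the factor of the target unit.
  // Returns None for units without a factor, such as qualitative units.
  def convert(q: Quantity, u: String): Option[Quantity] =
    for {
      from <- factor.get(q.unit)
      to   <- factor.get(u)
    } yield Quantity(q.value * from / to, u)
}
```

Returning an Option keeps non-scalable units such as dash out of the arithmetic instead of silently treating them as cl.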
convert(q, u) = Quantity(\frac{value(q) \cdot factor(unit(q))}{factor(u)}, u)    (7)
convert(Quantity(30, ml), ounce) = Quantity(\frac{30 \cdot 0.1}{3}, ounce)

4.5. Balance

The cocktail balance represents four pieces of information: the amounts of sweet, sour, water and alcohol. This is an abstract point of view on the cocktail. These four pieces of information are needed for every ingredient, but they are not always available, and the ontology does not contain all of them. Therefore, a default logic approach is needed. For example, the ontology does not contain balance information for a concrete gin product, but the balance of the gin prototype is known. Then the balance information of gin has to be used.

Listing 9: Superordinate class

A basic category may not carry specific information either, but superordinates, such as spirits, contain these details. Then this information must be used. In this case, superordinates share some of these properties. The superordinates are added to the ontology (Listing 9), but they are not a subclass of ingredient, to prevent useless ingredient similarities (described in subsection 4.1).

SELECT DISTINCT ?sweet0 ?sour0 ?alcohol0 ?water0 ?sweet1 ?sour1 ?alcohol1 ?water1 ?sweet2 ?sour2 ?alcohol2 ?water2 ?sweet3 ?sour3 ?alcohol3 ?water3
WHERE { ?kindof0

Listing 10: Balance query written in SPARQL

The balance query is designed for search by a known ingredient URI (Listing 10). Self-declared and domain-specific elements such as c:sweet are used to declare balance information. This information could be missing, so all declarations in the WHERE clause are optional. Only the ingredient URI has to be there. It is necessary to declare a triple (in this case to find the literal name) because a SPARQL filter does not work otherwise. The balance information may instead be held by a parent ingredient, so the kindof triple is declared as well. If another balance exists there, the information will be found. As the tree depth is limited to four, this query contains three nested kindof triples.

In this example, the given ingredient Plymouth does not have balance information. The basic category gin has alcohol and water in the proportion 0.47 and 0.53, respectively. The superordinate has alcohol and water in the proportion 0.4 and 0.6, respectively. As sweet is not declared, the default value of a balance property that is not found is 0.

The path contains the balance information of all single ingredients: first Plymouth, then gin and finally spirits (Equation 8). The question mark is used as a symbol to indicate that the information is not known. The first information to be known is part of the result.

balance(water, alcohol, sweet, sour)    (8)
pathB(Plymouth) = (?, ?, ?, ?) :: (0.53, 0.47, ?, ?) :: (0.6, 0.4, ?, ?) :: Nil
balance(Plymouth) = (0.53, 0.47, 0, 0)

Figure 7: Graph of steps

The stepFunction (Figure 8) has several aims. The
first is to scale the stepDistance between 0 and 1. The second is to ensure that the stepDistance is independent of the path sizes. The last of these is that the stepDistance has to approach 1 smoothly. If no equal item is found, then the distance is 1.0.

stepFunction(n) = 1 - \frac{0.85}{\sqrt{n}}    (9)

The stepFunction is only designed along these aims and there is no relation to the knowledge. If the step count is 0, then the stepDistance must also be 0. Because of the maximum depth of four in the graph, only three steps per path are possible, so there are at most six steps between two items. Therefore the value n is defined in the range n \in \{1, ..., 6\}. The stepFunction cannot be negative.

Figure 8: Graph of step function

5.1. Ingredient distance

The distance of a distance pair (I_a, I_b) is a path distance (Equation 10). A quantity weighting is added, because the quantity says something about the importance: 6 cl gin are more important than 1 cl sugar syrup. The weight is the quantity in relation to the volume of the cocktail. The volume is the sum of the quantities of all quantitatively measured ingredients. All quantities are transformed to the standard unit cl.

d_{DP_I} = stepDistance(I_a, I_b) \cdot \frac{quantity(I_a)}{volume(a)}    (10)

The stepDistance has the lowest value 0 if both ingredients are the same. The quantity could still be different, but a different quantity has no effect because it is multiplied by 0. Therefore, d_{DP} has to be divided into two kinds of distance functions: the ingredient-based distance function (Equation 10) and the quantity-based distance function (Equation 11).

d_{DP_Q} = \left| \frac{quantity(I_a)}{volume(a)} - \frac{quantity(I_b)}{volume(b)} \right|    (11)

The distance pair distance is dependent on the stepDistance (Equation 12).

d_{DP} = if (stepDistance == 0) d_{DP_Q} else d_{DP_I}    (12)

A cocktail recipe contains a list of ingredients. The order must not affect the distance, because the order could be different and would not change the recipe. If there is an ingredient I_a of the cocktail a, the aim is to find the most similar ingredient to I_a in the ingredients of cocktail b.

The distance d_I (Equation 13) between the ingredients of recipe a and the ingredients of recipe b represents the ingredient distance between two recipes. It uses the distance d_{DP}, which maps an ingredient to another ingredient. As mappings are not completely accurate, the distance must be calculated in both directions to catch all the ingredients in the distance. The distance d_I sums up all the minimum d_{DP} distances in both directions. One direction is already scaled to 1 because the ingredients are weighted by the ratio. The sum of the two directions must be divided by 2 to scale d_I to 1.

d_I(a, b) = \frac{\sum_{i=0}^{n} \min(d_{DP}(I_{a_i}, I_{b_j})) + \sum_{j=0}^{m} \min(d_{DP}(I_{b_j}, I_{a_i}))}{2}    (13)

The following example shows two different recipes, a Negroni and a Mezcal Negroni.

Negroni
3.0 cl Punt e Mes
3.0 cl Plymouth
3.0 cl Campari
1.0 piece orange zest
(stir, whiskey tumbler)

Mezcal Negroni
3.5 cl Tlacuache silver Leyenda
3.5 cl Carpano Antica Formula
2.0 cl Gran Classico
(stir, cocktail glass)

The distance pairs are the following (Listing 11). First, there are the mappings of the ingredients of the Mezcal Negroni recipe to the ingredients of the Negroni recipe. The mapping in the other direction follows. If there is no similar ingredient, this is visualized by a question mark. The distance is displayed in the middle. All distances are ingredient-based, because there are no equal ingredients. As the Tlacuache silver Leyenda does not have a similar ingredient, the stepDistance is 1 and, because of the ratio to the volume of the recipe, the d_{DP} is 0.39. This has a huge effect. The Gran Classico and Campari are types of bitter liquors. Carpano Antica Formula and Punt e Mes are types of red vermouth. The stepDistance is low, but their ratio is also low, so their effect on the d_{DP} is not very high. The other direction is similar. The Plymouth does not have a similar ingredient, which has the largest effect. The distances are rounded off to two decimal places.

The sums of the mappings are not equal (Equation 14). In this case, they are very similar. The ingredient distance d_I (Equation 15) says that these drinks have some similarities such as red vermouth, because in the range of 0 to 1 it is in the middle, but they have differences such as gin and mezcal.

sum(Negroni → Mezcal Negroni) = 0.63    (14)
sum(Mezcal Negroni → Negroni) = 0.62
d_I = \frac{0.63 + 0.62}{2} = 0.63    (15)

Mezcal Negroni => Negroni with Punt e Mes
IngredientBased[2.0 cl Gran Classico <= (0.40,0.09) => 3.0 cl Campari]
IngredientBased[3.5 cl Carpano Antica Formula <= (0.40,0.16) => 3.0 cl Punt e Mes]
IngredientBased[3.5 cl Tlacuache silver Leyenda <= (1.00,0.39) => ?]
Negroni with Punt e Mes => Mezcal Negroni
IngredientBased[1.0 piece orange zest <= (1.00,0.05) => ?]
IngredientBased[3.0 cl Campari <= (0.40,0.13) => 2.0 cl Gran Classico]
IngredientBased[3.0 cl Plymouth <= (1.00,0.32) => ?]
IngredientBased[3.0 cl Punt e Mes <= (0.40,0.13) => 3.5 cl Carpano Antica Formula]

Listing 11: Negroni Cocktail Distance

5.2. Preparation distance

The preparation distance is a simple path distance (Equation 16).

d_P(p_a, p_b) = stepDistance(p_a, p_b)    (16)

For example (Equation 17), the distance between stir and build is 0 because build is only a synonym. The preparations shake and stir are absolutely different and have the maximum distance.

d_P(build, stir) = 0.0    (17)
d_P(shake, stir) = 1.0

5.3. Glassware distance

The glassware distance is also a simple path distance (Equation 18).

d_G(g_a, g_b) = stepDistance(g_a, g_b)    (18)

For example, the distance between a whiskey tumbler and a cocktail glass is 1.0, because there are no similarities in the paths. A silver cup contains the whiskey tumbler in its path, so one step is necessary here.

d_G(whiskey tumbler, cocktail glass) = 1.0    (19)
d_G(whiskey tumbler, silver cup) = 0.15

5.4. Cocktail distance

The combined function, the cocktail distance, uses the ingredients, preparation and glassware. Since the ingredient distance is the most important part, its weight is 0.6; preparation and glassware split the remaining 0.4 in equal parts.

d_C(c_a, c_b) = 0.6 d_I(i(c_a), i(c_b)) + 0.2 d_P(p(c_a), p(c_b)) + 0.2 d_G(g(c_a), g(c_b))    (20)

The cocktail distance in the example of Negroni and Mezcal Negroni (Listing 11) is about 0.58 (Equation 21). This distance is high because the two recipes have many differences.

d_C = 0.58 =    (21)
  0.6 d_I((0.63 + 0.62) / 2 = 0.63)
  + 0.2 d_P((stir, stir) = 0.00)
  + 0.2 d_G((cocktail glass, whiskey tumbler) = 1.00)

An example of two very similar recipes (Listing 12) contains only ingredients that have a similar counterpart. As the stepDistance is very low, the ingredient distance is low, and the preparation and glassware distances are 0. The cocktail distance is only 0.06. One of these recipes should not lead to a recommendation of the other because they are too similar.

Negroni => Negroni with Punt e Mes
QuantityBased[1.0 piece orange zest <= (0.00,0.00) => 1.0 piece orange zest]
QuantityBased[3.0 cl Campari <= (0.00,0.00) => 3.0 cl Campari]
IngredientBased[3.0 cl Gin <= (0.15,0.05) => 3.0 cl Plymouth]
IngredientBased[3.0 cl red Vermouth <= (0.15,0.05) => 3.0 cl Punt e Mes]
Negroni with Punt e Mes => Negroni
QuantityBased[1.0 piece orange zest <= (0.00,0.00) => 1.0 piece orange zest]
QuantityBased[3.0 cl Campari <= (0.00,0.00) => 3.0 cl Campari]
IngredientBased[3.0 cl Plymouth <= (0.15,0.05) => 3.0 cl Gin]
IngredientBased[3.0 cl Punt e Mes <= (0.15,0.05) => 3.0 cl red Vermouth]
CocktailDistance = 0.06 =
0.6 ingredient(0.09 + 0.09 / 2 = 0.09)
+ 0.2 preparation((stir,stir) => 0.00)
+ 0.2 glass((whiskey tumbler,whiskey tumbler) => 0.00)

Listing 12: Cocktail distance of two Negroni recipes

In an example of two absolutely different recipes (Listing 13), there is no similar ingredient and no similar preparation or glassware. The cocktail distance is 1. One of these recipes is also not a good recommendation for the other.

Manhattan => Gin Fizz
IngredientBased[1.0 piece Angostura <= (1.00,0.05) => ?]
IngredientBased[1.0 piece orange zest <= (1.00,0.05) => ?]
IngredientBased[4.0 cl red vermouth <= (1.00,0.36) => ?]
IngredientBased[6.0 cl Rye <= (1.00,0.55) => ?]
Gin Fizz => Manhattan
IngredientBased[2.0 cl soda <= (1.00,0.15) => ?]
IngredientBased[2.0 cl sugar syrup <= (1.00,0.15) => ?]
IngredientBased[3.0 cl lemon juice <= (1.00,0.23) => ?]
IngredientBased[6.0 cl Gin <= (1.00,0.46) => ?]
CocktailDistance = 1.00 =
0.6 ingredient(1.00 + 1.00 / 2 = 1.00)
+ 0.2 preparation((stir,shake) => 1.00)
+ 0.2 glass((cocktail glass,collins glass) => 1.00)

Listing 13: Cocktail distance of a Manhattan and Gin Fizz

5.5. Balance distance

The balance distance is an alternative to the cocktail distance. The aim is to find adaptations. Every ingredient has a balance. The balance of a cocktail is the sum of all single ingredients (Equation 22).

balance(c) = \sum_{i=1}^{n} balance_i(water, alcohol, sweet, sour)    (22)
d_B(balance) = water + alcohol + sweet + sour    (23)
d_B(c_a, c_b) = d_B(|balance(c_a) - balance(c_b)|)    (24)

The difference between two balances (Equation 23) is a balance with a difference in each component, such as sour. The balance distance is the difference between the summed-up balances of c_a and c_b (Equation 24); all components are added up to a scalar distance.
The balance distance is not scaled to 1. A balance such as balance(1, 1, 1, 1) is unrealistic. An ingredient with a high ratio of alcohol such as Absinth does not contain a high ratio of sugar such as syrup. It is more realistic that the sum is 1. An ontology cannot ensure this, because the default values can always break this constraint. With huge volumes of data, an empirical maximum value can be computed to scale this distance function.

Mezcal Negroni
2.0 cl Gran Classico: Balance(alcohol=0.28,sweet=0.20,sour=0.00,water=0.60)
3.5 cl Carpano Antica Formula: Balance(alcohol=0.18,sweet=0.25,sour=0.00,water=0.74)
3.5 cl Tlacuache silver Leyenda: Balance(alcohol=0.40,sweet=0.00,sour=0.00,water=0.60)
MezcalNegroniBalance(alcohol=0.29,sweet=0.14,sour=0.00,water=0.65)
Negroni
1.0 piece orange zest: Balance(alcohol=0.00,sweet=0.00,sour=0.00,water=0.00)
3.0 cl Campari: Balance(alcohol=0.25,sweet=0.20,sour=0.00,water=0.60)
3.0 cl Plymouth: Balance(alcohol=0.47,sweet=0.00,sour=0.00,water=0.53)
3.0 cl Punt e Mes: Balance(alcohol=0.18,sweet=0.15,sour=0.00,water=0.74)
NegroniBalance(alcohol=0.28,sweet=0.11,sour=0.00,water=0.59)
BalanceDistance = 0.10 =
DifferenceOfBalance(alcohol=0.00,sweet=0.03,sour=0.00,water=0.06)

Listing 14: Negroni Balance Distance

In the example of Negroni and Mezcal Negroni (Listing 11), the cocktail distance is very high because the two drinks do not share many properties. The balance distance shows more shared properties (Listing 14). The sums of all ingredient balances are very similar, because both drinks have the same alcohol strength and sweetness, and differ only slightly in the dilution with water. The balance distance is only 0.1. These recipes are not the same, but have a similar characteristic that qualifies one of them to be a recommendation for the other.

6. Experiment by domain expert

The aim of the experiment is to analyze the precision of the cocktail distance function. A domain expert assumed that classic recipes, which have been popular for a long time, are distinct from all other classic recipes. A recipe that was not distinct from the others would have been forgotten. 52 recipes, clustered by a domain expert into 19 clusters, are used for an initial measurement of how well the distances work. The recipes are from five different historic cocktail books.

• Jerry Thomas, How to Mix Drinks (1862)
• Harry Johnson, Bartenders' Manual (1882)
• Harry Craddock, Savoy Cocktail Book (1930)
• Virginia Elliott and Phil D. Stong, Shake em up! (1930)
• David A. Embury, Fine Art of Mixing Drinks (1948)

The clusters contain recipes that differ only in small things, such as a different kind of gin, or with or without a lemon twist. They use different kinds of units such as wineglasses or ounces. These recipes are pre-extracted and are persisted in an XML data structure (section 3). The reference to the book is added to be able to reconstruct these recipes.

One file is used for each cluster that represents the idea of one classic cocktail:

• lemonade.xml (three recipes)
• crusta.xml (three recipes)
• brandypunch.xml (two recipes)
• julep.xml (three recipes)
• alexander.xml (two recipes)
• aromatic.xml (two recipes)
• vermouth.xml (three recipes)
• flip.xml (two recipes)
• tomcollins.xml (two recipes)
• absinth.xml (three recipes)
• eggnogg.xml (two recipes)
• whiskeysour.xml (two recipes)
• manhattan.xml (three recipes)
• daiquiri.xml (nine recipes)
• japanesecocktail.xml (two recipes)
• jackrose.xml (two recipes)
• ginfizz.xml (two recipes)
• cloverclub.xml (three recipes)
• sidecar.xml (two recipes)

The manual pre-extraction into the data structure (Equation 1) has to simplify the recipes in terms of vocabulary, data structure and knowledge.

Recipes in historic books contain or-relations (Equation 25). That means, for example, that either bourbon or rye has to be used, not both. An optional ingredient could pose the same problem. The target data structure supports only one ingredient, so only one was chosen for pre-extraction.

3 ounce bourbon or rye    (25)
optionally 1 dash Angostura

Recipes contain ranges of quantities (Equation 26). Often, this means seasoning with an ingredient. The average was chosen for pre-extraction.

2 - 4 dashes bitters    (26)

Recipes contain solid ingredients (Equation 27). The mapping of solids to liquids allows one to find better similarities to other recipes. Converting the measurements is not enough, because it is necessary to combine a qualitative unit such as half with an ingredient such as lemon. The ontology has to know that one lemon contains about 5 cl in order to convert this correctly. The conversion to the liquid quantity of the ingredient was done manually.

half small lemon → 2.5 cl    (27)
piece orange → 1 cl orange juice
5 cl lemon lemonade → 4 cl soda, 0.5 cl lemon juice, 0.5 cl sugar

Recipes also contain fillers (Equation 28), which are ingredients that lack a concrete quantity. This does not mean a dash or a splash, which is always a small quantity. A filler could be more than 10 cl. The concrete quantity chosen must be realistic with respect to the glassware.

fill with soda    (28)
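Two of the manual simplifications above, the average of a quantity range (Equation 26) and the mapping of a solid to its liquid quantity (Equation 27), can be sketched as follows. In the paper these steps were done by hand; the PreExtraction object, its maps and its function names are hypothetical, and only the value of about 5 cl per lemon comes from the text.

```scala
import scala.util.Try

object PreExtraction {
  // Range of quantities (Equation 26): take the average of both bounds.
  // Everything that is not a digit or a decimal point counts as a separator,
  // so "2 - 4" and "2 - 4 dashes" are both accepted.
  def averageOfRange(range: String): Option[Double] =
    range.trim.split("[^0-9.]+").toList match {
      case List(lo, hi) => Try((lo.toDouble + hi.toDouble) / 2).toOption
      case _            => None
    }

  // Liquid yield of one whole solid in cl. Only the lemon value is stated in the text.
  val clPerPiece: Map[String, Double] = Map("lemon" -> 5.0)

  // Fractions written as words (a small subset, for illustration).
  val fraction: Map[String, Double] = Map("half" -> 0.5, "one" -> 1.0)

  // "half" of a "lemon" becomes 2.5 cl of liquid, as in Equation 27.
  def solidToCl(amount: String, solid: String): Option[Double] =
    for {
      f  <- fraction.get(amount)
      cl <- clPerPiece.get(solid)
    } yield f * cl
}
```

Both functions return None for input they cannot interpret, which matches the manual process: such cases need a decision by the person doing the pre-extraction.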
Recipes contain comments that are not always necessary to make a cocktail. In some cases, such as a Crusta or a Julep, which are more complicated, knowing these comments may prove useful. But they do not affect the distances.

Recipes contain ingredients along with their origins (Equation 29). The ontology contains categories such as Jamaica Rum, but if an origin is not known, the ingredient will not be recognized, because the ontology knows only the whole name. The ontology is maintained with all spellings.

Jamaica Rum    (29)
Demerara Rum

Recipes contain the known default names of ingredients. Since recipes should be short, ingredient names are as short as possible. The problem is that the names are not distinct. Chartreuse is a company, but usually the product Chartreuse Verte is meant. Vermouth is a category, but red vermouth is meant; therefore, vermouth is modelled as a superordinate to prevent it from being matched, and vermouth is added to red vermouth as a synonym.

Chartreuse → Chartreuse Verte    (30)
Vermouth → Red Vermouth

Recipes contain numbers and fractions as words (Equation 31). This requires synonyms of numbers or fractions in the ontology. These are manually converted to a double value.

One → 1    (31)
1/3 → \frac{1}{3}
half → 0.5
one third → \frac{1}{3}

Recipes contain compound and separate spellings as well as singular and plural words. All are represented as synonyms to recognize that these are the same.

wine-glass → wineglass    (32)
wine glass → wineglass
dashes → dash

Recipes contain many different spellings (Equation 33). These spellings are persisted in the ontology as further synonyms. The result is an ontology that is larger than is useful.

one slice of lemon    (33)
lemon slice

Recognition and support of these issues make it possible to get distances that are more precise.

It is assumed that the maximum distance of a cocktail in one hand-made cluster must be low enough, so that recipes of other clusters have a chance to get a higher distance. The measurement of coherence (Figure 9) shows the distances within a cluster. It is sorted in descending order. The first, the julep.xml cluster, has a maximum distance of 0.34. The last one, flip.xml, has a maximum value of 0.09. These are positive results, because recipes in the clusters are also similar from the perspective of the distance function.

Figure 9: Coherency of clusters

The next measurement shows the distances of each recipe to every recipe that is outside its own cluster (Figure 10). It is sorted in increasing order. The first, brandypunch.xml, is about 0.29. The last, julep.xml, is about 0.60. Most of these distances are bigger than the distances within a cluster, which shows that recipes in a cluster are coherent and the clusters are distinct from each other. This is a positive result, because the distance function represents the domain knowledge that classics are distinct.

Figure 10: Distinction of clusters

The clusters could be more distinct if the following is recognized: Solids such as mint in a julep may be of high importance in one cocktail. The quantity is irrelevant because this is a natural ingredient and the intensity is unstable; hence, the quantity of mint differs strongly. Ingredients such as orgeat or absinth always have high intensity.

An unsolved problem is that different authors of recipes have different opinions. One says that a cocktail should be stirred, another believes that it must be shaken. The result is that the same idea of a recipe could occur in another cluster.

The time taken for calculating distances is very high. 2,702 distance calculations without any caching mechanism or indexing need about 38.2 seconds, and with simple map caching they need about 6.8 seconds. The ontology queries are the bottlenecks, because they are used often but are very slow. If more spellings are supported, the number of necessary queries increases and performance drops. Hence, an indexing mechanism is necessary.

7. Recommendation by example

Recommendations could be made from the perspective of ingredients or balance. In the first approach, the ingredients of the example and the recommendation are nearly the same, but the balance has changed. In the approach taken here, the aim is to find recipes that have nearly the same balance as the given example. But cocktails that are too similar have to be removed.

The recommendation approach uses the nearest neighbor classification nn (Equation 34) of the given [...] as graph (Figure 11). Orange nodes are the examples e. Each recipe was used once as an example. They are grouped along with their recommendations. The cocktail distance is in brackets behind the name. 30 % of the recipes do not have a recommendation, because no recipes met the given constraints, as the dataset is too small. The thresholds could be softer, but this would lead to lower precision.

The biggest cluster is daiquiri.xml. This cluster can be found in the recommendation. A good recommendation for a Jack Rose or a Whiskey Sour is a Daiquiri. This recommendation contains the result, but in the same recommendation there are also recipes which are very near to the Daiquiri. Clusters showed that classics are distinct from other classics. A recipe could be a good recommendation, but not all members of the cluster that contains this recipe have to occur in one recommendation. Clusters could be used to remove such duplicates.

The recommendation for the example Albemarle Fizz contains the Tom Collins and the Side Car. From the perspective of cocktail structure, the fizz is a sour with a splash of soda. A collins is a sour that is topped up with soda. The recommendations are not the same but have a similar characteristic, which makes this a good recommendation. Another example is the Cold Rum Flip, which has only the recommendation of Real Georgia Mint Julep. This does not make sense because the flip is a drink with egg and the julep contains mint. Issues with mint have already been discussed. Egg or cream are missing in the balance and cannot be recognized.

8. Conclusion and future work

This paper has shown that the cocktail distance and the balance distance have an expected precision. The opinions of different authors cannot be changed or handled. Numerous optimizations are needed, especially to recognize ingredients correctly.
In this process, the pre-extraction can be simplified step by step. The per- example e. The balance distance dB shows which cock- tails have the same characteristic if the balance dis- formance optimization of ontology with an indexer is tance is close to 0. A higher distance has no validity also useful. because it could result from a heavy change in one The introduced recommendation approach of found component of the balance such as sour. In the first recipes with the same balance and different ingredients instance, it uses all cocktail recipes that have a lower must be compared with this opposite approach. Per- haps both can be appropriate and dependent only on dB than the threshold tB = 0.1. The nearest neigh- the guest’s preferences. bor with the cocktail distance dC which is lower than With a greater number of recognized recipes, a val- threshold tC = 0.2 has to be removed from recommen- dation r, because the recipe is too similar. idation of the recommendation approach with domain experts is possible. For this case, a platform for a vali- r = nndB (e, tB) \ nndC (e, tC ) (34) dation by the user is necessary, which offers recommen- The recommendation is a list of cocktails. This is or- dations and a rating of whether or not it is appropriate. dered increasingly by balance distance. The hand-made clusters were used as a flattened recipe list to test this approach. The result is shown as
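As a basis for the comparison and validation proposed above, the recommendation rule of Section 7 (Equation 34) can be sketched in a few lines. The following Python sketch is an illustration only and not the implementation of the experimental platform: the function name `recommend`, the callables `d_b` and `d_c`, the placeholder recipe names and the toy distance values are assumptions introduced here. Only the thresholds t_B = 0.1 and t_C = 0.2, the set difference and the ordering by balance distance are taken from the text.

```python
from typing import Callable, Iterable, List

Distance = Callable[[str, str], float]


def recommend(
    example: str,
    recipes: Iterable[str],
    d_b: Distance,
    d_c: Distance,
    t_b: float = 0.1,
    t_c: float = 0.2,
) -> List[str]:
    """Sketch of Equation 34: nn_dB(e, t_B) minus nn_dC(e, t_C), ordered by d_B."""
    # Assumption: the example itself is never a candidate.
    others = [r for r in recipes if r != example]
    # Recipes whose balance is nearly the same as that of the example.
    near_in_balance = {r for r in others if d_b(example, r) < t_b}
    # Recipes that are too similar overall and therefore too obvious.
    too_similar = {r for r in others if d_c(example, r) < t_c}
    # Order by increasing balance distance; the name breaks ties deterministically.
    return sorted(near_in_balance - too_similar,
                  key=lambda r: (d_b(example, r), r))


# Hypothetical toy distances from one example to four placeholder recipes.
toy_balance = {"recipe_a": 0.02, "recipe_b": 0.05, "recipe_c": 0.30, "recipe_d": 0.08}
toy_cocktail = {"recipe_a": 0.10, "recipe_b": 0.45, "recipe_c": 0.50, "recipe_d": 0.35}

result = recommend(
    "example",
    ["example", "recipe_a", "recipe_b", "recipe_c", "recipe_d"],
    d_b=lambda e, r: toy_balance[r],
    d_c=lambda e, r: toy_cocktail[r],
)
print(result)  # ['recipe_b', 'recipe_d']
```

In this toy run, `recipe_a` is dropped because its cocktail distance is below t_C (too similar), and `recipe_c` is dropped because its balance distance is above t_B. Passing the distance functions as parameters would also make it possible to swap in the opposite, ingredient-based approach for the comparison.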
References
[HKRS07] Hitzler, Pascal ; Krötzsch, Markus ; Rudolph, Sebastian ; Sure, York: Semantic Web: Grundlagen. Springer-Verlag, 2007
[Sip15] Sippel, Sigurd: Knowledge-based Recommendations for cocktail recipes. (2015). http://users.informatik.haw-hamburg.de/~ubicomp/projekte/master2014-sem/sippel/bericht.pdf
A. Recommendation map
Figure 11: Recommendation with nearest neighbor approach

B. XML of hand-made clusters
B.1. lemonade.xml
B.2. crusta.xml
B.3. brandypunch.xml
B.4. julep.xml
B.5. alexander.xml
B.6. aromatic.xml
B.7. vermouth.xml
B.8. flip.xml
B.9. tomcollins.xml
B.10. absinth.xml
B.11. eggnogg.xml
B.12. whiskeysour.xml
B.13. manhattan.xml
B.14. daiquiri.xml
B.18. cloverclub.xml
B.19. sidecar.xml
C. XML-RDF of ontology