Distance functions for knowledge-based cocktail recommendation

Sigurd Sippel
Hamburg University of Applied Sciences, Department of Computer Science, Berliner Tor 7, 20099 Hamburg
[email protected]
April 11, 2015

1. Introduction

The cocktail recommendations that bartenders make in a bar have to suit the guest to be successful. An automatic recommendation system for cocktail recipes can combine knowledge with large volumes of recipes, such as those from books, to find appropriate recommendations. If an exemplary favorite is given, the appropriate recommendation must be similar to the favorite, but not too obviously so. This recommender system aims to find more appropriate recommendations than a human expert, the bartender. Personalization is implicitly given by the user's exemplary favorite. The target group comprises bartenders.

The methodical steps necessary to develop a recommender system for cocktail recipes are considered in [Sip15]. That work identifies three main challenges: the knowledge stored in an ontology, with feature extraction and recommendation based on distance functions; the pre-extraction for cocktail books; and the validation using expert knowledge. The first of these challenges is considered in this paper.

Section 2 gives an overview of the experimental platform. Section 3 explains how the pre-extraction is made to work with the feature extraction. The feature extraction with an ontology is considered in Section 4. Section 5 presents the distance functions for ingredients, preparation and glassware, and combines them in a cocktail distance function. An alternative approach with a balance distance is considered to find adaptations. Section 6 follows with an experiment on the coherence and distinction of clusters, which are made by a domain expert. Section 7 shows how the distance functions are used to get a recommendation for a given exemplary favorite. The last section provides the conclusion and prospects for future work.

2. Experimental platform

Cocktail recipes primarily contain a title and a list of ingredients. Every ingredient contains a quantity with an optional measurement unit. Additional information includes the preparation, such as shake or stir, and the selected glassware. The following example is a prototype of a cocktail recipe.

    Manhattan Cocktail
    (1882 Harry Johnson, Bartenders Manual, p. 162)
    1 dash of gum syrup, very carefully;
    1 dash of bitters (orange bitters);
    1 dash of curacao, if required;
    1/2 wine glass of whiskey;
    1/2 wine glass of sweet vermouth;
    stir up well; strain into a fancy cocktail glass;

Such recipes can be found in books or blogs, which form the sources for this recommender system. The experimental platform contains several components (Figure 1). The pre-extraction parses sources to get a clean and normalized raw data set. The ontology component finds ontology items for a raw string and a chosen taxonomy such as ingredients, preparations, glassware or units. The feature extraction is in the center and depends on the pre-processing and the ontology. It converts the pre-extracted data, with the help of the ontology, to the target structure, which contains only ontology items and optional meta information.

Figure 1: Architecture overview of cocktail recommender system

Depending on the output of the feature extraction, the recommendation finds recommendable cocktail recipes for the given cocktail recipe example. The extraction is partially mocked by manually pre-extracted examples. Pre-processing is planned for the future, which is visualized by dotted lines.

From the technical perspective, each component is written in Scala 2.11 (http://www.scala-lang.org) with some additional technologies. The ontology is realized with the ontology description language RDF, the Resource Description Framework (http://www.w3.org/RDF). Requests to RDF are written in SPARQL (http://www.w3.org/TR/rdf-sparql-query) and computed with the banana-rdf 0.7 library (https://github.com/w3c/banana-rdf). The feature extraction, including the distance functions, is written in pure Scala. The pre-extracted examples are persisted in XML and read with the scala-xml 1.0 library (https://github.com/scala/scala-xml).

Graph visualization is realized using Graphviz 2.38 (http://www.graphviz.org), and the RDF format is converted to the necessary data format DOT by the rdf2dot library (https://github.com/hannibalhh/rdf2dot).

3. Pre-extracting

The pre-extraction is realized as a cocktail recipe pool, which is persisted as a simple XML structure. A cocktail is separated into a title, a list of ingredients, a preparation and a chosen glass. Each ingredient contains an ingredient name and a quantity, which consists of a value and a unit.

    <cocktails>
      <cocktail>
        <title>Manhattan</title>
        <ingredients>
          <ingredient>
            <quantity><value>6</value><unit>cl</unit></quantity>
            <name>rye</name>
          </ingredient>
          <ingredient>
            <quantity><value>4</value><unit>cl</unit></quantity>
            <name>red vermouth</name>
          </ingredient>
          <ingredient>
            <quantity><value>1</value><unit></unit></quantity>
            <name>orange zest</name>
          </ingredient>
          <ingredient>
            <quantity><value>1</value><unit></unit></quantity>
            <name>Angostura</name>
          </ingredient>
        </ingredients>
        <preparation>stir</preparation>
        <glass>cocktail glass</glass>
      </cocktail>
      ...
    </cocktails>

Listing 1: Preprocessed example written in XML

The XML structure is designed to be easy to read. Every piece of information is put in its own tag and converted into a data structure (Equation 1).

    Cocktail(title : String, List[Ingredient],
             preparation : String, glassware : String)        (1)
    Ingredient(q : Quantity, name : String)
    Quantity(unit : String, value : String)

4. Feature extraction with an ontology

The feature extraction uses the ontology component to find features in the raw data structure. Because of the XML structure, the feature extraction does not have to decide whether a string is an ingredient or anything else. This information is already given by the manual pre-extraction.

The main task of the ontology component is to find an item for a given name and a taxonomy. For this, a concrete ontology has to be designed. The ontology component contains categories separated into the following taxonomies: ingredient, preparation, glassware and units. These are different kinds of items, each of which is addressed and identified by a unique URI.

The RDF model contains a set of triples (resource, property, atomic value). Instead of atomic values, such as labels or titles, there could also be other triples. This nested definition is used to model trees. Every property can have a URI to ensure a unique address. The property describes the edge that connects the left element of a triple with the right one. There are predefined properties.

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:c="http://www.myclassicbar.com/rdf#">
      ...
    </rdf:RDF>

Listing 2: RDF schemas

The minimal structure of RDF (Listing 2) contains the root element rdf:RDF with three name spaces [HKRS07]: rdf contains elements such as property or type, which are extended by the name space rdfs, which contains elements such as Class. The name space c is the self-invented name space for domain-specific elements such as factor. The semantics of these elements are explained in the following sections.

4.1. Ingredients

Each ingredient category is a property (Listing 3) with the type of ingredient, a URI about itself and a literal as a name.

    <rdf:Property rdf:type="cocktail://ingredient/basic"
                  rdf:about="cocktail://ingredient/gin"
                  rdfs:Literal="gin"/>

Listing 3: RDF property

The type refers to the ingredient class (Listing 4). The ingredient class contains two subclasses, which represent the basic categories such as gin and the subordinates such as London dry gin. The superordinates such as spirits are explicitly excluded, because two spirits such as absinthe and gin share too few properties.

    <rdfs:Class rdf:about="cocktail://ingredient">
      <rdfs:label>ingredient</rdfs:label>
    </rdfs:Class>
    <rdfs:Class rdf:about="cocktail://ingredient/basic">
      <rdfs:label>basic category of ingredient</rdfs:label>
      <rdfs:subClassOf>cocktail://ingredient</rdfs:subClassOf>
    </rdfs:Class>
    <rdfs:Class rdf:about="cocktail://ingredient/subordinate">
      <rdfs:label>subordinate ingredient</rdfs:label>
      <rdfs:subClassOf>cocktail://ingredient</rdfs:subClassOf>
    </rdfs:Class>

Listing 4: RDF Class

[...] size of 1.

    cocktail://unknown/?name        (3)

In the example (Figure 2), there is a subordinate ingredient Plymouth, which has the parent gin as a basic category of ingredients, and the superordinate spirits, which is not declared as an ingredient.

The query to RDF, which is written in SPARQL, has to map (name, type(ingredient)) → Item (Listing 5). SPARQL is similar to SQL, so it uses a select query. It allows one to declare triples with bound and free variables. Two kinds of triples are important. The first one binds an ingredient kindof0