Domain-Specific Recommendation Based on Deep Understanding of Text
Total Page:16
File Type:pdf, Size:1020Kb
Master Thesis Sigurd Sippel Domain-specific recommendation based on deep understanding of text Fakultät Technik und Informatik Faculty of Engineering and Computer Science Studiendepartment Informatik Department of Computer Science Sigurd Sippel Domain-specic recommendation based on deep understanding of text Master Thesis eingereicht im Rahmen der Masterprüfung im Studiengang Master of Science Informatik am Department Informatik der Fakultät Technik und Informatik der Hochschule für Angewandte Wissenschaften Hamburg Betreuender Prüfer: Prof. Dr. Kai von Luck Zweitgutachter: Prof. Dr. Klaus-Peter Schoeneberg Eingereicht am: 21. April 2016 Sigurd Sippel Thema der Arbeit Domain-specic recommendation based on deep understanding of text Stichworte Empfehlungssysteme, KDD, Data Mining, Tiefenverständnis, Featureextraktion, Ontologie, Basiskategorien, Validierung, Domänenexperten Kurzzusammenfassung Diese Masterarbeit betrachtet den Prozess zur Entwicklung domain-spezischer Empfeh- lungssysteme am Beispiel der Domäne Cocktailrezepte. Auf Basis einer Ontologie wird ein Tiefenverständnis der Texte — der Rezepte — erzeugt. Die Ontologie ist modelliert mit Ba- siskategorien, die dazu dienen Markmale wie Zutaten aus dem Rezept zu extrahieren. Für eine Vergleichbarkeit der Rezepte werden Zutaten auf Basis von Aromen modelliert. Für den Prozess der Datenaufbereitung bis hin zur Empfehlung wird KDD als Leitlinie verwendet. Die Empfehlung anhand einer domänen-spezischen Distanzfunktion wird berechnet. Zur Klassikation einer Empfehlung zu einem gegebenen Favoriten wird ein k-nearest neighbor verwendet. Die Validierung erfolgt durch eine Befragung von Domännenexperten hinsichtlich der Akzeptabilität der Empfehlungen. Sigurd Sippel Title of the paper Domain-specic recommendation based on deep understanding of text Keywords Recommender Systems, KDD, Data Mining, Deep Understanding, Feature extraction, Ontology, Basic Categories, Validation, Domain experts Abstract This thesis considers the process of development for a domain-specic recommender system that uses the domain of cocktail recipes for experiments. Based on ontology a deep understand- ing of text is created — recipes are considered. The ontology is designed by basic categories to extract features such as ingredients. Ingredients are modeled by avors for comparability. The process of data processing along with the recommendation uses the KDD process as its guidelines. The key of the recommendation is based on domain-specic distance functions. A nearest-neighbor approach is used to classify recommendations for a given favorite. Validation is considered based on the acceptability of domain experts. Contents 1. Introduction 1 2. Objective 3 3. Literature review 6 3.1. Recommender systems . .6 3.2. Knowledge discovery in database and data mining . .8 3.3. Distances in context of clustering and classication . 10 3.4. Basic categories . 13 3.5. Extraction and distances with ontologies . 16 3.5.1. Graph-based recommendation of cooking recipes . 20 3.5.2. Graph-based menu planning . 22 3.6. Modeling of sensations . 25 3.6.1. Dimensions of odors . 25 3.6.2. Nutrition-based modeling of cooking recipes . 31 3.6.3. Flavor and emotion for cocktail recommendation . 33 3.7. Validation by domain experts . 34 3.8. Challenges for experiments . 37 4. Experiments 40 4.1. Understanding the eld of cocktail recipes with domain experts . 40 4.1.1. Minimal recipe . 41 4.1.2. Preparation and ice . 42 4.1.3. Substitution of ingredients . 43 4.1.4. Quantity of an ingredient . 44 4.1.5. Glassware . 44 4.1.6. Choice of preparation . 45 4.1.7. Choice of volume . 46 4.1.8. Recommendation of recipe . 47 4.2. Classic recipe extraction by domain experts . 51 4.2.1. Ingredients . 53 4.2.2. Preparations . 54 4.2.3. Glassware . 55 4.2.4. Experiences of extraction . 56 4.2.5. Target structure . 58 iv Contents 4.3. Parsing of cocktail books . 60 4.3.1. Recognition phase . 62 4.3.2. Cleaning phase . 68 4.3.3. Selection phase . 70 4.3.4. Context phase . 70 4.3.5. Domain-specic reasoning phase . 75 4.3.6. Plausibility metric . 79 4.3.7. Optimization of data quality . 80 4.3.8. Development process . 81 4.3.9. Statistics of cocktail parsing . 82 4.4. Distances between classic recipes . 88 4.4.1. Item path . 89 4.4.2. Units . 90 4.4.3. Balance . 92 4.4.4. Distances for recommendation . 94 4.4.5. Evaluation of distance measurement . 99 4.5. Validation by domain-experts . 102 4.5.1. Setting of oine experiment . 103 4.5.2. First proof of validation approach . 104 4.5.3. Acceptability of independent domain-experts . 109 4.6. Architecture . 116 4.7. Summarization and evaluation of experiments . 117 5. Conclusion and future work 119 A. Survey 121 B. Domain-specific clusters 130 C. Extract of ontology 144 D. Extract of acceptability ratings by domain experts 145 v 1. Introduction In order to understand the recommendation process, a specic domain is used for experiments that are focused on deep understanding of text. Deep understanding [ASB08] leads to a rich semantic representation of data, which is necessary for content recommendation. As an example of a specic domain, the domain of cocktails is chosen because it is denite and documented by bartending manuals and books of cocktail recipes written by domain experts. The quality of recommendation is not directly measurable, but domain experts can also be interviewed to get feedback on the quality. The cocktail recommended by a bartender in a bar has to be appropriate to the ambience and to the preferences of individual guests. The success of a bar depends on how appropriate the recommendation is. If the guest asks the bartender for a recommendation, the bartender then acts as a gatekeeper [PGK11, p. 105] for the cocktails for him, because the bartender is limited by the given circumstances. However, though he probably knows about a great variety of drinks, only a few are on his mind at a particular point in time. Cocktail recipes are considered as special kinds of cooking recipes because they contain basically a small list of ingredients and suggestions about preparation and glassware. When cooking, people tend to use a limited collection of recipes, which they try to remember. A huge number of cooking recipes is available in the form of books or on the internet, and it is possible to select a new meal every day by using these sources [STIM09, p. 9]. Owing to such a diverse and dizzying range of choices, a system of automatic recommendation deals with such diversity [XYL10, p. 254]. A bartender’s recommendation serves as a metaphor for the content-based recommendation approach. The cognitive processes of human beings and machines are dierent, so only the performances and not the processes have to be validated. Recommender systems [RV97] make it possible for a user to classify an appropriate recom- mendation if the necessary knowledge about the items — such as quantied ingredients — are given. This thesis considers a recommender system of cocktail recipes that are appropriate to domain experts. It includes the algorithm of and modeling techniques for recipes. Since there is little research on cocktails, theoretical approaches to cooking recipes will be applied to cocktail 1 1. Introduction recipes. The dierences is that cooking recipes considers a longer process of preparation including more ingredients, the consideration in cocktail recipes goes more in depth such as more denite ingredients instead of breadth. Section 2 describes the objectives of the thesis. In Section 3, the literature review focuses on the deep understanding of text. Content-based recommendation systems, the KDD process, and the modeling of an ontology with basic categories are the basis for a path-based distance measurement. Domain-related work such as modeling of ingredients and sensations such as avors are considered. Section 4 describes the experiments started by the understanding of domain knowledge. This is followed by a feature extraction of raw text and ends with a recommendation approach including validation. The last section, 5, provides the conclusion and prospects for future work. 2 2. Objective For analysis and understanding of a domain-specic recommendation system the following application is envisaged: When a guest seeks a recommendation for a drink in a bar, a selection of cocktails may be oered according to a menu card or in a more personal way by the bartender. A part of the recommendation process is based on psychological factors such as the atmosphere or guest’s character but this thesis considers objective factors based on cocktail recipes. Cocktails are written down as cocktail recipes that contain a name, ingredients including quantity, and partially short information about preferred glassware and preparation. Detailed availability of these specic parts is presented in the experiments. Manhattan Cocktail 1 1 dash of gum syrup, very carefully; 1 dash of bitters (orange bitters); 1 dash of curacao, if required; 1/2 wine glass of whiskey; 1/2 wine glass of sweet vermouth; stir up well; strain into a fancy cocktail glass; These recipes are available in cocktail books2, blogs3, or cocktail databases4. The sources present a huge volume of data, which is already available and increases with time. There are dierent types of cocktails: Besides classic cocktails such as a Manhattan, which are cold and contain only liquid ingredients, there are hot cocktails and molecular recipes containing drops or foams. This thesis focuses on the classic recipes with two or more recipes that contain partially a cherry, a zest, or mint but are basically liquid. If it is liquid, the result is a mixture containing all ingredients of this recipe. This approach assumes that a recipe results from a single mixture and each necessary ingredient is already prepared. 11882 Harry Johnson, Bartenders Manual p. 162 2euvs-vintage-cocktail-books.cld.bz 3www.winebags.com/50-Top-Cocktail-Blogs-of-2015/2910.htm 4www.kindredcocktails.com 3 2. Objective Cocktail recipes contain relevant information to prepare a specic cocktail.