
Exploiting Food Embeddings for Ingredient Substitution

Chantal Pellegrini∗, Ege Özsoy∗, Monika Wintergerst∗ and Georg Groh
Department of Informatics, Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany
ORCIDs: https://orcid.org/0000-0002-7555-0336 (C. Pellegrini), https://orcid.org/0000-0003-2841-3838 (E. Özsoy), https://orcid.org/0000-0002-9244-5431 (M. Wintergerst), https://orcid.org/0000-0002-5942-2297 (G. Groh)
∗These authors contributed equally.

Keywords: Food Substitution, BERT, Word Embeddings.

Abstract: Identifying ingredient substitutes for cooking recipes can be beneficial for various goals, such as nutrient optimization or avoiding allergens. Natural language processing (NLP) techniques can be valuable tools to make use of the vast cooking-related knowledge available online, and aid in finding ingredient alternatives. Despite previous approaches to identify ingredient substitutes, there is still a lack of research in this area regarding the most recent developments in the field of NLP. On top of that, a lack of standardized evaluation metrics makes comparing approaches difficult. In this paper, we present two models for ingredient embeddings, Food2Vec and FoodBERT. In addition, we combine both approaches with images, resulting in two multimodal representation models. FoodBERT is furthermore used for relation extraction. We conduct a ground truth based evaluation for all approaches, as well as a human evaluation. The comparison shows that FoodBERT, and especially the multimodal version, is best suited for substitute recommendations in dietary use cases.

1 INTRODUCTION

In the light of rising rates of non-communicable diseases such as type 2 diabetes, and with an unhealthy diet being one of the leading health risks globally, healthy eating has become increasingly important (World Health Organization, 2020). However, changing one's diet can be challenging as eating habits are hard to break. This is why the substitution of food items is a promising approach to foster healthy nutrition. Substituting ingredients in a cooking recipe for healthier alternatives can, for instance, help to optimize the meal's nutrient profile to meet individuals' dietary needs, while at the same time preserving the dish's culinary attributes. Thus, familiar meals can be enjoyed while approaching dietary aims in manageable steps. Apart from replacing ingredients for nutrient optimization, food substitutes can also be applied for other goals such as avoiding allergens or adapting dishes to dietary preferences.

A promising way to find food substitutes is to leverage the vast amounts of (mostly textual) cooking-related data online to draw conclusions about which food items are interchangeable. Thus, natural language processing (NLP) can be used to identify generally applicable ingredient substitutes. We refer to these replacements as context-free, meaning independent of the direct recipe context. The food items identified as fitting substitutes for an ingredient should, therefore, be suitable replacements in several cases, and lay the foundation for all substitution use cases mentioned above.

While there has been previous research in the NLP domain that explored this task, there are few generally applicable approaches that use neural embedding models to generate ingredient substitutes, despite these models' success on other NLP tasks. Notably, there is no approach to our knowledge that applies the most recent NLP powerhouses, transformer-based models like BERT (Devlin et al., 2018), to the task of substitute generation.

In this paper, we propose and compare several approaches for context-free ingredient substitute generation. We train two models, word2vec (Mikolov et al., 2013) and BERT (Devlin et al., 2018), on recipe instructions from the Recipe1M+ dataset (Marin et al., 2019) to compute meaningful ingredient embeddings. For BERT, we start from a pre-trained checkpoint. We refer to these models as Food2Vec and FoodBERT. Additionally, we combine these text-only approaches with an image-based approach resulting in multimodal ingredient representations. We also use FoodBERT to perform relation extraction on recipe comment data to extract substitute pairs.




Furthermore, we propose an approach for a ground truth based evaluation for all methods and conduct a human evaluation. Our code and the trained models are available on GitHub (https://github.com/ChantalMP/Exploiting-Food-Embeddings-for-Ingredient-Substitution).

The remainder of the paper is organized as follows: Section 2 presents related work, and Section 3 provides the theoretical background. Section 4 introduces the used dataset. The approaches are explained in Section 5, and their results on ingredient substitution are presented and discussed in Section 6. Section 7 concludes the paper and proposes possible future work.

2 RELATED WORK

This section presents previous research in the area of NLP that has explored the task of substitute generation, as well as different approaches to evaluating such methods.

Some approaches have detected food substitutes in recipe websites' user comments, for instance by identifying segments with alteration suggestions (Druck and Pang, 2012), or through relation extraction (Wiegand et al., 2012a; Reiplinger et al., 2014). (Teng et al., 2012) used pattern extraction results to build an ingredient substitute network. Other researchers have focused on recipe texts and used statistical methods to find food substitutes (Shidochi et al., 2009; Boscarino et al., 2014; Yamanishi et al., 2015). (Achananuparp and Weber, 2016) identified substitutes from food diary data with food-context matrices. The authors refer to the distributional hypothesis, theorizing that ingredients that occur in similar contexts tend to be similar. Other researchers have followed this concept with neural embedding techniques. Some approaches employ vector arithmetic on word embeddings to find substitutes via ingredient analogies. They are designed for special use cases, such as "veganizing" meals (Lawo et al., 2020) or adapting recipes to different cuisines (Kazama et al., 2018). There is a lack of research on general-purpose ingredient embeddings for food substitution, as the general approaches so far have been rather exploratory and without in-depth evaluation (Hinds, 2016; Altosaar, 2017; Sauer et al., 2017). Additionally, there is no approach to our knowledge that employs transformer-based models like BERT (Devlin et al., 2018).

A big challenge for food substitution is the absence of a standard for comparing the results of different methods. Many approaches so far relied solely on human evaluation on a relatively small scale, either by cooking and testing recipes with substitutions (Kazama et al., 2018; Yamanishi et al., 2015) or through qualitative analysis of recipes or substitute pair examples (Shidochi et al., 2009; Sauer et al., 2017; Lawo et al., 2020). In contrast, (Achananuparp and Weber, 2016) had crowd workers rate 2,000 substitute pairs. The authors were then able to employ quantitative evaluation metrics. In an endeavor to create a gold standard for various food relations, (Wiegand et al., 2012b) found that two subjects' lists of substitute pairs had little overlap, indicating that finding a consensus on suitable substitutes is challenging. The authors used their gold standard in subsequent research to evaluate their approaches quantitatively (Wiegand et al., 2012a). (Reiplinger et al., 2014) created a gold standard for food relations by manually labeling sampled sentences from their domain-specific corpus. However, the above gold standards are both in German and include terms specific to German cuisine, such as "Maultaschen", a German kind of dumpling. Therefore, it is difficult to apply them to approaches in other languages, which in turn might require additional terms in their vocabulary to reflect the respective cuisines.

Given the lack of exploration of neural embedding models for ingredient substitution and the challenges concerning evaluation, we propose several learning-based approaches for substitute generation and conduct both a ground truth based and a human evaluation.

3 BACKGROUND

In the previous section, we observed that a concept that has been employed in many substitution approaches is the distributional hypothesis, which states that words occurring in similar contexts tend to be similar in meaning. It was first explored in linguistics (Harris, 1954), and later adopted in the area of computational linguistics (Rubenstein and Goodenough, 1965; Rieger, 1991). In recent years and with the surge of neural networks, neural approaches to capture this have been developed.

Two of the most popular approaches are word2vec (Mikolov et al., 2013) and BERT (Devlin et al., 2018). Word2vec (Mikolov et al., 2013) is an approach to compute continuous vector representations of words given a large corpus of training text. Training a word2vec model results in vectors representing the words in the corpus, with semantically similar words being closer to one another in the vector space. Each token is represented by exactly one vector, meaning the embeddings are context-free.


BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2018) is a transformer-based language model pre-trained on a large amount of textual data, which can be fine-tuned for more specific tasks. The training is based on two tasks, masked language modeling and next sentence prediction, and results in contextualized embeddings. The authors show that BERT has achieved state-of-the-art results on various NLP tasks.

One of these applications, R-BERT (Wu and He, 2019), employs BERT for relation extraction and is among the best methods for this task. The authors tagged two possible entities in a relation and had the model predict the relation between them.

In addition to fine-tuning the pre-trained BERT models, success was also achieved by pre-training BERT on domain-specific data for science or biology (Beltagy et al., 2019; Lee et al., 2020). This is especially useful if the target domain contains many words rarely encountered in common texts.

4 DATA

We use Recipe1M+ (Marin et al., 2019), a dataset containing one million recipes with information like title, ingredient list, instructions, and images. We specifically use 10,660,722 instruction sentences for training our models. We also extend this dataset by scraping recipe comments and normalize the content.

4.1 Comment Scraping

Recipe1M+ provides URLs to the original recipe pages for all recipes. We scrape accessible user comments for the most frequent web pages. A part of this data was already provided to us in a master's thesis by (Engstler, 2020). In total, our data comprises 1,525,545 comments.

4.2 Normalization

As Recipe1M+ (Marin et al., 2019) does not include a clean list of all ingredients, we use an ingredient dataset provided by Yummly (Yummly, 2015) as the starting point to create such a list. We clean this set by deleting terms in brackets and discarding ingredients with more than three words, as these generally include unneeded names of a brand or location, such as "Nakano Seasoned Rice Vinegar" or "Spice Islands Chili Powder". To match ingredients from the Yummly dataset with Recipe1M+, we lemmatize all nouns in the ingredient list. We do not alter other words like verbs, as these are usually constant, and lemmatizing them can lead to wrong matchings. For instance, "baked" in "baked potatoes" would be normalized to "bake", thus prohibiting a differentiation between instruction and ingredient part. The same lemmatization is applied to the instruction and comment sentences. We also combine multi-word ingredients with an underscore ("ice cream" becomes "ice_cream") and discard ingredients that occur less than ten times in the whole Recipe1M+ dataset. Additionally, we manually delete non-food items from the ingredient list. After these steps, the ingredient list contains 4,372 ingredients.
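To make the normalization concrete, here is a minimal sketch of noun-only lemmatization and multi-word joining with spaCy; the helper name and pipeline details are our illustration, not the authors' released code.

```python
import spacy

# requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def normalize_sentence(text, ingredient_phrases):
    """Lemmatize nouns only, then join known multi-word ingredients with underscores."""
    doc = nlp(text)
    # Only nouns are lemmatized; verbs etc. stay as-is, so "baked potatoes"
    # keeps "baked" and only "potatoes" becomes "potato".
    tokens = [t.lemma_ if t.pos_ == "NOUN" else t.text for t in doc]
    sentence = " ".join(tokens)
    # "ice cream" -> "ice_cream"
    for phrase in ingredient_phrases:
        sentence = sentence.replace(phrase, phrase.replace(" ", "_"))
    return sentence

print(normalize_sentence("Top the baked potatoes with ice cream.", {"ice cream"}))
# e.g. "Top the baked potato with ice_cream ."
```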

5 METHODS FOR SUBSTITUTE GENERATION

In this section, we present several approaches that can be used for context-free ingredient substitute generation.

5.1 Pattern Extraction

One previously applied method for food substitute extraction (Wiegand et al., 2012a; Teng et al., 2012), which we use as a baseline for our work, is to use user comments from recipe sites, since a good portion of them mention how to replace certain ingredients with others (Druck and Pang, 2012). We use spaCy (Honnibal and Montani, 2017) to extract these recommendations by applying the following hand-crafted patterns previously presented by (Engstler, 2020), where A and B reference ingredients:

• {replace | substitute} A {with | for} B
• A {instead of} B

Instead of predicting every extracted pair as a substitute, only pairs of ingredient I and substitute S which occur at least t times will be predicted as substitutes. This way, precision can be traded off for recall.
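The following regex-based sketch illustrates this baseline; the paper uses spaCy for the actual matching, and the pattern details, pair orientation, and toy threshold here are simplifying assumptions.

```python
import re
from collections import Counter

ING = r"([a-z_]+)"  # underscore-joined ingredient tokens from the normalization step
REPLACE_WITH = re.compile(rf"(?:replace|substitute)\s+{ING}\s+(?:with|for)\s+{ING}")
INSTEAD_OF = re.compile(rf"{ING}\s+instead\s+of\s+{ING}")

def extract_substitutes(sentences, threshold=2):
    """Count extracted (ingredient, substitute) pairs; keep those seen >= threshold times."""
    counts = Counter()
    for sentence in sentences:
        s = sentence.lower()
        for a, b in REPLACE_WITH.findall(s):  # "replace A with B": B stands in for A
            counts[(a, b)] += 1
        for a, b in INSTEAD_OF.findall(s):    # "A instead of B": A stands in for B
            counts[(b, a)] += 1
    return {pair for pair, n in counts.items() if n >= threshold}

comments = ["I replace sour_cream with yogurt all the time.",
            "Used yogurt instead of sour_cream.",
            "You could substitute sour_cream for yogurt."]
print(extract_substitutes(comments))  # {('sour_cream', 'yogurt')}
```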


5.2 Food2Vec

The context-free word representations created by word2vec (Mikolov et al., 2013) match our goal of context-free ingredient substitutes. Thus, we follow the idea first described in a blog post by Rob Hinds (Hinds, 2016) and adopt his experimental approach food2vec of training word2vec on recipe instructions. We train it on a bigger, pre-processed dataset and refine the process of substitute generation as explained below. Figure 1 shows an overview of our approach Food2Vec.

Figure 1: The Food2Vec approach is separated into two parts: The first part calculates text-based embeddings for all ingredients and optionally concatenates them with image-based embeddings. The second part uses these embeddings in conjunction with KNN to predict substitutes.

5.2.1 Training

For creating Food2Vec, we trained the CBOW variant of word2vec on the normalized instructions of the Recipe1M+ (Marin et al., 2019) corpus. We use Gensim (Řehůřek and Sojka, 2010) for training with default settings, but set the min count to 10 during training, as we only consider ingredients that occur at least ten times in Recipe1M+. This means that non-ingredient words that occur less than 10 times will also be ignored. These rare words usually do not provide substantial context and can be neglected. After completing the training, all our considered ingredients are represented by a 100-dimensional embedding.

5.2.2 Generating Substitutes

For generating substitutes, we search for the N nearest neighbors in the embedding space containing all ingredients. N functions as a threshold to balance the number of proposed substitutes and the model's precision. We filter the possible results by a handcrafted rule based on the assumption that for a given ingredient I, the substitute S should not be a specialization of that ingredient. For example, we do not want to substitute "chicken" with "chicken breast". To achieve this, we remove any potential substitute S that completely contains the original ingredient I. We refer to this approach as Food2Vec-Text.
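A minimal Food2Vec sketch with Gensim, assuming normalized, tokenized instruction sentences; the toy corpus and min_count=1 are only for illustration (the paper trains on Recipe1M+ instructions with min_count=10 and otherwise default settings).

```python
from gensim.models import Word2Vec

# toy stand-in for the normalized Recipe1M+ instruction sentences
sentences = [["melt", "butter", "in", "a", "pan"],
             ["spread", "margarine", "on", "the", "toast"]]

model = Word2Vec(sentences, sg=0, vector_size=100, min_count=1)  # sg=0 selects CBOW

def food2vec_substitutes(ingredient, n=5):
    """N nearest neighbours, minus specializations ("chicken" vs. "chicken_breast")."""
    neighbours = model.wv.most_similar(ingredient, topn=n)
    return [(s, score) for s, score in neighbours if ingredient not in s]

print(food2vec_substitutes("butter"))
```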

5.3 FoodBERT

BERT achieves state-of-the-art results in many different NLP tasks (Devlin et al., 2018), making it a promising approach for generating substitutes. Arguably one of the main strengths of BERT is that it works with contextualized embeddings, meaning a word can have different embeddings depending on its context. This advantage can not be fully utilized for our task, as we make context-free predictions. Nonetheless, BERT's expressive power and language understanding can still be helpful. An overview of this approach is shown in Figure 2.

Figure 2: The FoodBERT approach is separated into two parts: The first part calculates text-based embeddings for up to 100 occurrences of every ingredient and optionally concatenates them with image-based embeddings. The second part employs these embeddings together with KNN and a further scoring and filtering step to predict substitutes.

5.3.1 Modifying BERT for Recipes

The BERT version we use has about 29,000 tokens in its vocabulary but contains only about 3% of our ingredients. Even some common ingredients such as "onion" or "pasta" are not included. This means that those ingredients will be understood worse and split into multiple tokens, resulting in multiple model outputs for one ingredient. Average, min, or max pooling of multiple embeddings are possible ways to deal with this, but we eliminate the problem entirely by making sure no ingredient is split into multiple tokens. To achieve this, we extend the BERT vocabulary to include all of the 4,372 ingredients mentioned in Section 4. In total, we end up with 33,247 tokens in the vocabulary.

5.3.2 Training FoodBERT

We start with the pre-trained bert-base-cased, with extended vocabulary, and further train it on Recipe1M+ (Marin et al., 2019) instructions for three epochs. We use Hugging Face's Transformers library (Wolf et al., 2019) and the masked language modeling training script (Hugging Face, 2020), only adapting the BERTTokenizer to work with the extended vocabulary. This training took approximately three full days on an NVIDIA P-100 GPU.
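The vocabulary extension can be sketched with the Transformers library as follows; the two example ingredients stand in for the full list of 4,372, and the subsequent masked language modeling fine-tuning is then run with the referenced training script.

```python
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

ingredients = ["ice_cream", "chicken_breast"]  # in the paper: all 4,372 ingredients

# Register each ingredient as a single token so none is split into word pieces,
# then grow the embedding matrix to match the enlarged vocabulary.
tokenizer.add_tokens(ingredients)
model.resize_token_embeddings(len(tokenizer))

print(tokenizer.tokenize("Top with ice_cream."))  # e.g. ['Top', 'with', 'ice_cream', '.']
```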


5.3.3 Embedding Space Visualization

While BERT's hidden representation has a dimension of 768, it can still be projected into 3D space by using a dimensionality reduction algorithm such as principal component analysis (PCA). We visualize the embeddings for 100 occurrences of eight different ingredients. The visualization can be seen in Figure 3. It shows that the ingredients are separated by their type and use case. This is a valuable property, as the quality of the hidden representation of FoodBERT is directly correlated with the success of the approaches relying on it.

Figure 3: Embedding space visualization for the ingredients Raspberry, Blueberry, Strawberry, Tofu, Chicken, Turkey, Beef and Triple Sec. Each small image represents a contextualized embedding for one occurrence of the depicted ingredient. It can be seen that berries and meat-like products are each grouped together, while Triple Sec is far away from both groups.

5.3.4 Generating Substitutes

We exploit the FoodBERT embedding space to generate substitutes. To this end, we first sample 100 random sentences for every ingredient (fewer if there are less than 100 occurrences) from the recipes in Recipe1M+. Afterward, a contextualized embedding with 768 dimensions is computed for every occurrence of every ingredient using FoodBERT. In total, we end up with ∼285,000 embeddings for all ingredients. For every embedding, we compute the 200 nearest neighbors using an approximate KNN (Bernhardsson, 2019). To calculate the substitutes for an ingredient I, we sum up how often every other ingredient appears in these 200 nearest neighbors for all the embeddings of the ingredient I. This number of occurrences is assigned to every possible substitute as a score. Afterward, we sort the potential substitutes from the highest (most occurrences) to the lowest score and apply the same filtering as in Food2Vec. To get the final set of substitutes, we first require any potential substitute to have a score of at least 100. Second, we have a relative threshold that sets the minimum required score in relation to the highest score (HS) for the ingredient, e.g. 1/10 of HS. This relative threshold is tuned to achieve different precision-recall trade-offs. We refer to this approach as FoodBERT-Text.
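A condensed sketch of this scoring scheme, using Annoy (Bernhardsson, 2019) for the approximate nearest-neighbor search; the distance metric, tree count, and helper names are our assumptions.

```python
from collections import Counter
from annoy import AnnoyIndex

def build_index(embeddings):
    """embeddings: list of (ingredient_name, 768-dim FoodBERT vector), one entry
    per sampled occurrence of every ingredient."""
    index = AnnoyIndex(768, "euclidean")
    for i, (_, vec) in enumerate(embeddings):
        index.add_item(i, vec)
    index.build(10)  # number of trees; speed/accuracy trade-off
    return index

def foodbert_substitutes(ingredient, embeddings, index, min_score=100, rel=0.1):
    """Vote over the 200 nearest neighbours of every occurrence of `ingredient`."""
    names = [name for name, _ in embeddings]
    scores = Counter()
    for i, name in enumerate(names):
        if name != ingredient:
            continue
        for j in index.get_nns_by_item(i, 200):
            if names[j] != ingredient:
                scores[names[j]] += 1  # each neighbour occurrence counts as one vote
    best = max(scores.values(), default=0)
    # absolute threshold (score >= 100), relative threshold (e.g. 1/10 of the
    # highest score), and the same containment filter as in Food2Vec
    return [(s, c) for s, c in scores.most_common()
            if c >= min_score and c >= rel * best and ingredient not in s]
```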
5.4 Multimodal Representations

We argue that a good indicator for two ingredients' interchangeability is their appearance. To utilize this additional knowledge, we propose using multimodal embeddings instead of pure text embeddings. Figures 1 and 2 show how we integrate the image embeddings in our methods.

5.4.1 Image Embeddings for Ingredients

For every ingredient, we download up to ten images from the web. Afterward, we perform a manual verification and cleanup to ensure we have exactly five correct images for every ingredient. For every image, we extract a 2,048-dimensional embedding from the last layer of a ResNet-50 (He et al., 2016), which is pre-trained on ImageNet (Deng et al., 2009). Then we average the embeddings of all five images to get one final image representation per ingredient.

5.4.2 Combining Image and Text Embeddings

As Food2Vec and FoodBERT embeddings have a dimension of 100 and 768 respectively, directly concatenating these image embeddings with the corresponding text embeddings would put more focus on the image embeddings. Therefore, we use PCA to decrease the image embedding size from 2,048 to 100 or 768 depending on the method, so that both embedding types have equal influence. We also normalize the image embeddings so that they have similar mean and variance as the text embeddings. Afterward, we concatenate the text-based and image-based ingredient representations to get one multimodal embedding of size 200 for Food2Vec or 1,536 for FoodBERT. This new ingredient representation is then used like the text-only embeddings described in the respective substitute generation sections of Food2Vec and FoodBERT. Empirically, we achieve better results when intersecting the predicted substitutes of our multimodal approach with the predicted substitutes of the corresponding text-only approach. In this setting, an ingredient will only be proposed as a substitute if both the text-only and multimodal methods predict it. These intersected results are what we refer to as Food2Vec-Multimodal and FoodBERT-Multimodal.
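The combination step can be sketched as follows, assuming the per-ingredient text embeddings and averaged ResNet-50 image embeddings are already computed as NumPy arrays; the exact normalization is our approximation of "similar mean and variance".

```python
import numpy as np
from sklearn.decomposition import PCA

def multimodal_embeddings(text_emb, image_emb):
    """text_emb: (n_ingredients, d_text), d_text = 100 (Food2Vec) or 768 (FoodBERT);
    image_emb: (n_ingredients, 2048) averaged ResNet-50 features."""
    d_text = text_emb.shape[1]
    reduced = PCA(n_components=d_text).fit_transform(image_emb)  # 2048 -> d_text
    # Shift/scale the image embeddings towards the text embeddings' statistics
    # so both modalities carry roughly equal weight after concatenation.
    reduced = (reduced - reduced.mean()) / reduced.std()
    reduced = reduced * text_emb.std() + text_emb.mean()
    return np.concatenate([text_emb, reduced], axis=1)  # (n, 2 * d_text)
```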


5.5 Relation Extraction

Handcrafted patterns are not optimal when it comes to extracting information from user comments. These patterns do not capture all substitute recommendations and also make mistakes, as they are not context-specific. We therefore propose a learned and context-specific relation extraction method, which is based on FoodBERT.

5.5.1 Data Preparation

We extract pairs of mentioned ingredients for all the sentences in the user comments by matching the words to our ingredient list. For all pairs that occur together in one sentence, we mark the ingredients' beginning and end with unique characters:

"I used plain $ yogurt $ in place of the £ sour cream £ and it is delicious."

This modified sentence is then added to our relation extraction dataset. To generate labels for training, 1,000 frequent ingredient pairs are extracted and labeled with two labels. The first label denotes if the first ingredient can be substituted by the second, the second label vice versa. We gather all 1.3 million sentences which include these frequent pairs. To further focus the dataset on relation extraction, we only use sentences which include one of the following expressions: "instead", "substitute", "in place of" or "replace". In this way, we bias our data towards comments which contain user substitute recommendations. After this step, we end up with approximately 170,000 sentences, which we use to train and evaluate our relation extraction model.

5.5.2 Modifying R-BERT

We base our implementation on R-BERT (Wu and He, 2019), which is one of the state-of-the-art relation extraction methods based on BERT. We differ in our implementation mainly in four places. First, unlike R-BERT, we do not have entities that span multiple tokens, so we do not need to average the representation over multiple embeddings. Instead, we directly use the embedding for an ingredient token. Second, we use $ and £ symbols, as the underscore is reserved for multi-word ingredients. Third, we use FoodBERT instead of BERT in the backbone of the model. Finally, we adapt the classification layer to support our desired output format.

5.5.3 Model Architecture and Training

A marked sentence is given to FoodBERT, with the first token being the BERT-specific classifier token. From the computed hidden state vector H, three parts are used further, namely the pooled output H0, which represents the whole sentence, and the embeddings corresponding to the first and second ingredient. The three representations are concatenated and fed into a fully connected layer which reduces the representation size to two. These two features correspond to the two labels described previously (ingredient 1 can be substituted by 2 or vice versa). To get binary labels, a sigmoid is applied, and the output is thresholded at 0.5. We train our network for ten epochs with a learning rate of 10⁻⁵ and AdamW (Loshchilov and Hutter, 2019) as the optimizer.

5.5.4 Substitute Generation

For generating a final list of substitutes, we use all of the comment data, including the part without labels, but still limit ourselves to sentences that contain substitute-relation indicating expressions. We only consider ingredient pairs that occur together more than t times. This threshold t can be tuned to achieve different precision-recall trade-offs. We then use our model to predict labels for all pairs and add a pair to our substitute list if more than 10% of the predicted labels are positive.
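A PyTorch sketch of the modified R-BERT classification head described above; tensor shapes and names are assumptions, and the FoodBERT encoder producing `pooled` and `hidden` is omitted.

```python
import torch
import torch.nn as nn

class SubstituteRelationHead(nn.Module):
    """Concatenates the pooled [CLS] output with the two marked ingredient token
    embeddings and reduces them to two logits ("1 replaces 2", "2 replaces 1")."""

    def __init__(self, hidden_size=768):
        super().__init__()
        self.classifier = nn.Linear(3 * hidden_size, 2)

    def forward(self, pooled, hidden, ing1_pos, ing2_pos):
        # pooled: (batch, hidden) = H0; hidden: (batch, seq_len, hidden) = H
        batch = torch.arange(hidden.size(0))
        ing1 = hidden[batch, ing1_pos]  # embedding of the first marked ingredient
        ing2 = hidden[batch, ing2_pos]  # embedding of the second marked ingredient
        return self.classifier(torch.cat([pooled, ing1, ing2], dim=-1))

# Inference: labels = torch.sigmoid(logits) > 0.5
# Training: nn.BCEWithLogitsLoss() for ten epochs, lr 1e-5, AdamW optimizer.
```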

6 EXPERIMENTS

In this section, we present the effects of fine-tuning BERT for the food domain and the results for the task of substitute generation of all presented methods.

6.1 Evaluation Metrics

Our evaluation is twofold. First, we perform a human evaluation, where we label the correctness of generated substitute pairs. Additionally, we created a list of ground truth substitutes for a subset of ingredients, which we use for automatic evaluation. For creating this ground truth set as well as for the human evaluation, we consider an ingredient as substitute S for another ingredient I if we can think of several dishes where I can be substituted by S. The labeling was performed by two authors with amateur cooking experience.


6.1.1 Human Evaluation

We randomly sample 200 ingredient-substitute pairs (I, S) per method and label them regarding their correctness. We sample these pairs from two portions of our ingredient set. For 100 pairs, we only sample from substitute recommendations where the ingredient I is among the 1,000 most common ingredients in the recipe dataset. For the other 100 pairs, we sample from all recommendations. A substitute S can, in both cases, be from the whole ingredient set. Two authors labeled these pairs independently to get a more reliable evaluation, as we are dealing with an ambiguous task. The inter-rater reliability using Cohen's Kappa lies at 0.72, indicating substantial reliability (Landis and Koch, 1977). Given the labels on these two sets, we compute the overall precision and the precision on common ingredients for all methods. As we have two annotators, we compute the precision scores separately for both sets of annotations and then average them to get the final precision values.
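A small sketch of how these quantities can be computed, using scikit-learn for Cohen's kappa; the toy annotator judgments are hypothetical.

```python
from sklearn.metrics import cohen_kappa_score

def human_eval(labels_a, labels_b):
    """labels_*: 0/1 correctness judgments over the same sampled (I, S) pairs,
    one list per annotator. Since all pairs are predictions, the fraction
    judged correct is the precision; it is averaged over both annotators."""
    kappa = cohen_kappa_score(labels_a, labels_b)
    precision = (sum(labels_a) / len(labels_a) + sum(labels_b) / len(labels_b)) / 2
    return precision, kappa

a = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]  # hypothetical annotator 1
b = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1]  # hypothetical annotator 2
print(human_eval(a, b))
```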

6.1.2 Ground Truth based Evaluation

In comparison to human evaluation, an evaluation on a ground truth set can measure not only precision but also recall. The challenge in measuring recall is that it requires an extensive list of all possible substitutes for every ingredient. As this is unfeasible, we decided to create such a list only for a subset of 42 ingredients. Seven of these are not among the 1,000 most common ingredients. We started creating the ground truth set with a selection of 42 ingredients and some of their substitutes recommended in the Food Substitutions Bible (Joachim, 2010). We continuously extended the set by manually adding any correct predictions of our approaches if they were missing in the ground truth. In the end, we evaluated all approaches with the final ground truth set. This set contains on average 16.9 substitutes per ingredient and 708 unique substitute pairs in total.

6.2 Ablation Study on FoodBERT

Since this is to our knowledge the first attempt to fine-tune BERT for ingredient embeddings, we wanted to better understand the benefit of fine-tuning BERT for the food domain. Therefore, we performed an ablation study by replacing FoodBERT with the pre-trained BERT model bert-base-cased from Hugging Face (Wolf et al., 2019) for the two methods based on FoodBERT. The results of these experiments on the ground truth dataset are shown in Table 1.

Table 1: Precision and Recall of FoodBERT methods compared to the same methods using bert-base-cased.

                                  Prec.   Recall
  FoodBERT-Text                   0.806   0.147
  BERT                            0.151   0.032
  FoodBERT Relation Extraction    0.700   0.129
  BERT Relation Extraction        0.632   0.085

It can be seen that fine-tuning on text from the food domain substantially improves performance. Directly using the basic BERT embeddings for substitute generation produces poor results, with a precision of 0.151 compared to 0.806, and a lower recall. For relation extraction, the difference is smaller. BERT is able to learn the general task of relation extraction on this dataset, but the additional information about the food domain still increases the precision from 0.632 to 0.700 and the recall from 0.085 to 0.129. These results are not surprising, as the original BERT vocabulary only contains 137 of the 4,372 ingredients in our ingredient set. Moreover, several of these are ambiguous, like animals ("rabbit", "lamb") or body parts ("liver", "breast"), or are homonyms ("date").

6.3 Substitution Results

Here we present the quantitative results of our approaches. Some example substitute predictions can be found in the appendix in Table 5.

All presented approaches can be tuned for either higher precision or higher recall by setting a method-specific threshold value. Figure 4a shows the precision-recall curves of our approaches for different thresholds. Similarly, Figure 4b visualizes the precision in relation to the average number of predictions per ingredient. We can see in both cases that for a small recall or few predictions, Relation Extraction stands out with the highest precision, and FoodBERT-Text has the lowest precision together with Pattern Extraction. When looking at a higher recall or at least three predictions, FoodBERT shows better results than the rest, both on its own and in the multimodal setting. Relation Extraction's performance decreases sharply and becomes similar to Pattern Extraction, and the Food2Vec approaches are in between. Another observation from both curves is that combining text embeddings with image embeddings yields additional improvement.

For further evaluation, we consider that people have different food preferences and can have several medical conditions like allergies that forbid them from eating certain foods. In this context, it is vital to provide the user with some choice when recommending food substitutes. We therefore chose to evaluate in more detail a version of every approach which predicts at least three substitutes on average for the ground truth ingredients.


Figure 4: Precision in relation to recall or prediction count. (a) Precision-recall curve. (b) Precision-prediction count curve.

The detailed results for these versions can be found in Table 2. The first thing to notice is that generally, learning-based methods perform better than the rule-based Pattern Extraction. We can further see that the FoodBERT approaches, both the vanilla and the multimodal version, excel in our scenario of predicting at least three substitutes, with considerably better precision values of 0.806 and 0.847. The Food2Vec approaches follow with precision values around 0.77 and slightly worse recall. Relation Extraction and Pattern Extraction perform worst with a precision of around 0.70.

The overall low recall across all approaches can be explained by the average number of 16.9 substitutes per ingredient in our ground truth. The evaluated versions of the approaches make on average at least 3 predictions per ingredient, which makes it impossible to achieve a high recall, as the large number of ground truth substitutes per ingredient can not be predicted. Nevertheless, the differences in recall between the approaches still show that the multimodal approaches perform best, with FoodBERT-Multimodal achieving the highest recall of 0.164. Pattern Extraction and Relation Extraction have the lowest recall, which once again indicates that they are not very suitable if several substitute predictions per ingredient are desired.

The human evaluation was performed using predictions from the same method versions, which make, on average, three predictions. The results are shown in Tables 3 and 4. In comparison to the ground truth based results in Table 2, Relation Extraction performs better, whereas the Food2Vec approaches are in a worse position. In Table 3, we can see that approaches with fewer uncommon predictions achieve higher precision. At the same time, considering only common ingredients, meaning the 1,000 most frequent ones, leads to much fewer uncommon substitute predictions compared to the results of the overall case in Table 4. There, we see considerably more uncommon ingredients and substitute predictions, except for Pattern Extraction and Relation Extraction. Instead of a recipe corpus, these two approaches use user comments, in which uncommon ingredients are rarely mentioned, thereby not surpassing the used threshold. Generally, suggesting fewer uncommon ingredients or, in other words, having limited creativity by being restricted to only the most common ingredients, can increase precision by "playing it safe". How much the creativity of the approach is worth against its precision can not be decided generally, as it depends strongly on the use case.


Table 2: Precision, Recall, Top-5 Recall, average prediction count for the ground truth ingredients, and total prediction count for all approaches. This shows the best versions of all approaches making at least 3 predictions per ingredient on average.

                        Precision   Recall   Top-5 Recall   Avg Predictions   Total Predictions
  Pattern Extraction      0.695     0.126       0.257            3.05               1238
  Food2Vec-Text           0.762     0.136       0.224            3.0               13056
  Food2Vec-Multimodal     0.771     0.153       0.257            3.33               5908
  FoodBERT-Text           0.806     0.147       0.281            3.07              28385
  FoodBERT-Multimodal     0.847     0.164       0.338            3.26              33941
  Relation Extraction     0.700     0.129       0.248            3.10               1393

Table 3: Top1000: Precision of all approaches in human evaluation when considering only common ingredients for sampling. "S rare" is the approximate proportion of uncommon ingredients predicted as substitute S for pairs (I, S).

                        Precision   S rare
  Pattern Extraction      0.64        4%
  Food2Vec-Text           0.51       41%
  Food2Vec-Multimodal     0.63       39%
  FoodBERT-Text           0.76        3%
  FoodBERT-Multimodal     0.80       27%
  Relation Extraction     0.79        8%

Table 4: Overall: Precision of all approaches in human evaluation when considering all ingredients for sampling. "I rare" is the approximate proportion of uncommon ingredients as ingredient I, "S rare" the proportion of uncommon ingredients predicted as substitute S for pairs (I, S).

                        Prec.   I rare   S rare
  Pattern Extraction    0.61     11%      4%
  Food2Vec-Text         0.53     78%     86%
  Food2Vec-Multimodal   0.66     47%     66%
  FoodBERT-Text         0.60     90%     89%
  FoodBERT-Multimodal   0.61     92%     91%
  Relation Extraction   0.73     10%      8%

6.4 Discussion

Overall, the results are encouraging, as the approaches show good performance according to our evaluation. Nevertheless, accurately evaluating food substitutes remains a big challenge. Without a unified benchmark, it is hard to compare different approaches objectively. However, it is questionable whether the creation of an exhaustive substitute gold standard is even feasible, as culinary matters are subjective.

The ground truth dataset we created is biased because we used our own approaches to expand it. The unbiased alternative would have been to label all ingredient pairs (I, S), where I is from the 42 ingredients in the ground truth set, and S is from all 4,372 ingredients. This would have resulted in manually labeling 183,624 pairs, which was unfeasible. By labeling only pairs that any approach predicted, we obtain a correct precision value, but the recall is an upper bound, as the ground truth might miss correct substitute pairs. Still, the relation between the recall values of different approaches is valid.

In our human evaluation, we randomly sampled ingredient-substitute pairs for each approach. This sampling may have been advantageous for one approach and less favorable in another method's case. However, using the same set of ingredients to evaluate every approach would carry the same risk of an unfair comparison, as some approaches might not have predicted a single substitute for certain ingredients.

Despite these challenges, we provide an overall impression of different approaches' strengths and weaknesses by considering several evaluation methods. For our envisioned use case of providing substitute recommendations while leaving room for personal choice, FoodBERT-Multimodal is the best approach, both according to the ground truth based evaluation and the Top1000 human evaluation. Of course, when considering a different use case, for instance, if one substitute suggestion per ingredient is sufficient, different methods can be preferable.

7 CONCLUSION AND FUTURE WORK

In this work, we presented several approaches for the task of context-free food substitute recommendation. We demonstrated that learning-based approaches perform very well on this task. The comparison of the approaches showed that in a dietary context, where it is beneficial to offer a variety of substitutes to choose from according to personal preferences, the exploitation of the FoodBERT embeddings, and especially its multimodal enhancement, performs best.

Our results suggest that FoodBERT's embedding space represents knowledge about food items much better than the embedding space of the original BERT model. This opens the possibility to apply our model to other food-related tasks like cuisine prediction or estimating meals' healthiness. Another direction for future work could be the contextualized setting of food substitute recommendations, where the specific use case of an ingredient in a particular recipe is considered to find substitutes. Since BERT embeddings are contextualized, it is promising to apply them to this task. Also, since ImageNet is not food-focused, the multimodal model's understanding of food items could be improved by fine-tuning the image embedding model on food images.

While the models presented in this paper consider the usage context of ingredients in order to identify culinarily fitting alternatives, the selection process for substitutes can be extended to incorporate other factors as well, especially regarding healthy nutrition. With rules based on dietary guidelines, or even an individually tailored health metric considering personal nutrient requirements, the substitutes proposed by the models could be filtered or re-ranked, resulting in suggestions which are both culinarily adequate as well as conducive to health goals.


ACKNOWLEDGEMENTS

The preparation of this paper was supported by the enable Cluster and is catalogued by the enable Steering Committee as enable 065 (http://enable-cluster.de). This work was funded by a grant of the German Ministry for Education and Research (BMBF) FK 01EA1807A.

REFERENCES

Achananuparp, P. and Weber, I. (2016). Extracting food substitutes from food diary via distributional similarity. In 10th ACM Conference on Recommender Systems.

Altosaar, J. (2017). food2vec - augmented cooking with machine intelligence. https://jaan.io/food2vec-augmented-cooking-machine-intelligence/. Accessed: 2020-09-05.

Beltagy, I., Lo, K., and Cohan, A. (2019). SciBERT: A pretrained language model for scientific text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3606–3611.

Bernhardsson, E. (2019). Annoy. https://github.com/spotify/annoy. Accessed: 2020-07-04.

Boscarino, C., Nedović, V., Koenderink, N. J., and Top, J. L. (2014). Automatic extraction of ingredient's substitutes. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pages 559–564.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Druck, G. and Pang, B. (2012). Spice it up? Mining refinements to online instructions from user generated content. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 545–553, Jeju Island, Korea. Association for Computational Linguistics.

Engstler, L. N. (2020). from text in the food domain. Unpublished master's thesis, Technical University of Munich, Munich, Germany.

Harris, Z. S. (1954). Distributional structure. Word, 10(2-3):146–162.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.

Hinds, R. (2016). Unsupervised learning in Scala using word2vec. https://automateddeveloper.blogspot.com/2016/10/unsupervised-learning-in-scala-using.html. Accessed: 2020-09-05.

Honnibal, M. and Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.

Hugging Face (2020). Run language modeling. https://github.com/huggingface/transformers/blob/master/examples/language-modeling. Accessed: 2020-07-04.

Joachim, D. (2010). The Food Substitutions Bible. Robert Rose Inc., Toronto, Ontario, Canada, 2 edition.

Kazama, M., Sugimoto, M., Hosokawa, C., Matsushima, K., Varshney, L. R., and Ishikawa, Y. (2018). A neural network system for transformation of regional cuisine style. Frontiers in ICT, 5:14.

Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1).

Lawo, D., Böhm, L., and Esau, M. (2020). Supporting plant-based diets with ingredient2vec.

Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., and Kang, J. (2020). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240.

Loshchilov, I. and Hutter, F. (2019). Decoupled weight decay regularization. In International Conference on Learning Representations.

Marin, J., Biswas, A., Ofli, F., Hynes, N., Salvador, A., Aytar, Y., Weber, I., and Torralba, A. (2019). Recipe1M+: A dataset for learning cross-modal embeddings for cooking recipes and food images. IEEE Trans. Pattern Anal. Mach. Intell.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Řehůřek, R. and Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta. ELRA. http://is.muni.cz/publication/884893/en.

Reiplinger, M., Wiegand, M., and Klakow, D. (2014). Relation extraction for the food domain without labeled training data – is distant supervision the best solution? In International Conference on Natural Language Processing, pages 345–357. Springer.

Rieger, B. B. (1991). On distributed representation in word semantics. International Computer Science Institute, Berkeley, CA.

Rubenstein, H. and Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633.

Sauer, C., Haigh, A., and Rachleff, J. (2017). Cooking up food embeddings.

Shidochi, Y., Takahashi, T., Ide, I., and Murase, H. (2009). Finding replaceable materials in cooking recipe texts considering characteristic cooking actions. In Proceedings of the ACM Multimedia 2009 Workshop on Multimedia for Cooking and Eating Activities, pages 9–14.


Teng, C.-Y., Lin, Y.-R., and Adamic, L. A. (2012). Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference, pages 298–307.

Wiegand, M., Roth, B., and Klakow, D. (2012a). Web-based relation extraction for the food domain. In International Conference on Application of Natural Language to Information Systems, pages 222–227. Springer.

Wiegand, M., Roth, B., Lasarcyk, E., Köser, S., and Klakow, D. (2012b). A gold standard for relation extraction in the food domain. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 507–514.

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., and Brew, J. (2019). HuggingFace's Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.

World Health Organization (2020). Healthy diet. https://www.who.int/news-room/fact-sheets/detail/healthy-diet. Accessed: 2020-09-05.

Wu, S. and He, Y. (2019). Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 2361–2364.

Yamanishi, R., Shino, N., Nishihara, Y., Fukumoto, J., and Kaizaki, A. (2015). Alternative-ingredient recommendation based on co-occurrence relation on recipe database. Procedia Computer Science, 60:986–993.

Yummly (2015). What's cooking? https://www.kaggle.com/c/whats-cooking/overview. Accessed: 2020-07-04.

APPENDIX

Table 5: Example substitute predictions of all approaches.

potato
  Pattern Extraction      pasta, rice
  Food2Vec-Text           beet, squash, turnip
  Food2Vec-Multimodal     beet, brussel sprout, squash, turnip
  FoodBERT-Text           beet, parsnip, yam
  FoodBERT-Multimodal     parsnip, plantain
  Relation Extraction     -

tuna
  Pattern Extraction      salmon, chicken
  Food2Vec-Text           crab meat, pickle relish, salmon
  Food2Vec-Multimodal     crab meat, pickle relish, salmon
  FoodBERT-Text           crab, crab meat, fish, prawn, salmon, seafood, tilapia
  FoodBERT-Multimodal     crab meat, prawn, salmon, swordfish, tofu
  Relation Extraction     chicken, salmon

pork
  Pattern Extraction      beef, lamb, chicken
  Food2Vec-Text           beef, chicken, lamb
  Food2Vec-Multimodal     beef, chicken, lamb, veal
  FoodBERT-Text           lamb, tenderloin
  FoodBERT-Multimodal     beef, lamb, tenderloin
  Relation Extraction     chicken, beef, fat, veal, shrimp, turkey, bacon, sausage, turkey bacon, chicken breast, turkey sausage
