Multiple-Food Recognition Considering Co-Occurrence Employing Manifold Ranking
Total Page:16
File Type:pdf, Size:1020Kb
21st International Conference on Pattern Recognition (ICPR 2012) November 11-15, 2012. Tsukuba, Japan Multiple-Food Recognition Considering Co-occurrence Employing Manifold Ranking Yuji Matsuda and Keiji Yanai Graduate School of Informatics, The University of Electro-Communications Email: [email protected], [email protected] Abstract In this paper, we propose a method to recog- nize food images which include multiple food items Figure 1. Examples of multiple-food pho- considering co-occurrence statistics of food items. tos. Theproposedmethodemploysamanifoldrank- ing method which has been applied to image re- search of scene recognition the targets of which usu- trieval successfully in the literature. In the exper- ally contain multiple objects, relations between ob- iments, we prepared co-occurrence matrices of 100 jects were considered as important cue for scene food items using various kinds of data sources in- recognition in some works [8, 9]. Inspired by these cluding Web texts, Web food blogs and our own food works, we introduce relation information between database, and evaluated the final results obtained by food items for recognizing multiple-food meal pho- applying manifold ranking. As results, it has been tos. As relations which have been used in object proved that co-occurrence statistics obtained from a recognition research so far, co-occurrence [8] and food photo database is very helpful to improve the relative location [9] are common. In case of meal classification rate within the top ten candidates. photos, we think co-occurrence relation is more im- portant than relative locations, because some com- 1 Introduction binations of foods such as “hamburger and french Recently, personal services to recode people’s fries” are very common, while the way to place food food habits by taking meal photos with mobile items on the table is not strictly restricted in gen- phones have become popular. Currently, labeling eral. food names to meal photos requires human labor, In this paper, we propose a method to rec- which is a quite troublesome task. Therefore, it ognize multiple-food meal photos considering co- is desired to make recording of food items more occurrence statistics by extending our previous easier and quickly. To this end, several methods work [4]. To do that, we use manifold ranking [10] to recognize food images have been proposed so which have been used as a method for relevance far [1, 2, 3, 4]. feedback of image retrieval [11]. Manifold rank- Most of the existing works assumed that one ing is a ranking method to consider similarities meal image contained only one food item. They between items. In the image retrieval research, cannot handle a meal photo which contains two or manifold ranking was used to rank images con- more food items such as a hamburger-and-french- sidering both user preference and visual similar- fries image. Then, in [4], we proposed a new ities between images. In this paper, we use the method for recognizing meal photos which con- manifold ranking method to rank food candidates tain two or more food items as shown in Figure considering both recognition results by our previ- 1. In our proposed method, firstly, we detect candi- ous method [4] and co-occurrence statistics between date regions with several methods including Felzen- food items. That is, the method proposed in this szwalb’s deformable part model (DPM) [5], a cir- paper re-ranks the original candidate ranking ob- cle detector and the JSEG region segmentation tained by the multiple-food recognition method, [6]. Then, we extract various kinds of image fea- which does not consider co-occurrence statistics, by tures from each candidate region. After applying taking into account co-occurrence statistics. To our the classification models trained by multiple kernel best knowledge, this is the first work to recognize learning [7], we obtain the names of the top N food multiple-food items in one meal image taking into item candidates over the given image. account co-occurrence statistics. In the proposed method [4], each food item is The rest of this paper is organized as follows: recognized independently. Meanwhile, in the re- Section 2 explains the proposed method, and Sec- 978-4-9906441-1-6 ©2012 IAPR 2017 tion 3 describes the experimental results. Finally photo. Then, we select the maximum SVM out- in Section 4 we conclude this paper. put score over all the candidate region regarding each food category. Finally, we obtain the evalua- tion scores which express likelihood that the corre- 2 Proposed Method sponding food items appear in the given photo. As The proposed method refine the results obtained a system output, we obtain the names of the top by our previous method [4] with manifold rank- N food items over the given image regarding the ing [10]. Before explaining the propose method, we descending order of the evaluation score. describe the previous method to detect food items for multiple-food photos as shown in Figure 1. 2.2 Manifold Ranking with Co- occurrence Statistics Query Image Some common combinations of food items ex- Classification ist such as “hamburger and french-fries” and “rice Candidate Region Detection MKL-SVM and miso-soup”, while unlikely combinations ex- Whole DPM Circle JSEG ist such as “sushi and hamburger” or “sashimi Ranking of 100 Foods Integration of Candidate Region and french-fries”. From these observation, co- occurrence statistics is expected to be able to Re-Ranking by Manifold Ranking Coding Image Feature Vector enhance the performance of multiple-food image Color SIFT CSIFT HOG Gabor Result recognition. By considering co-occurrence statis- tics, we can reduce unlikely combinations and boost Figure 2. Recognition Flow possible combinations which are included in the higher ranked candidates. As results, obtained ranking of food item candidates become more pre- 2.1 Previous Method cise. In [4], we proposed a food image recognition sys- For multi-food recognition with co-occurrence tem which outputs the names of food items that are statistics, we use manifold ranking [10], which is a expected to be shown in a given meal photo. We re-ranking method to consider similarities between show the overview of the processing flow of the pro- items. In this paper, we propose to re-rank the posed system including the co-occurrence extension food candidate ranking obtained by our previous proposed in this paper in Figure 2. Given an input work [4] with the manifold ranking method. image, first, the system detects candidate regions The equation of the manifold ranking is as fol- of dishes. We use four types of detectors includ- lows: ing the deformable part model (DPM) [5], a circle r∗ =(I − αS)−1r, (1) detector, the JSEG region segmentation [6], and r r∗ whole image. Next, we integrate bounding boxes where and represent the initial ranking vector I S of the candidate regions detected by the four meth- and the manifold ranking vector, and and rep- ods. Then, we check the aspect ratio of width and resent an identity matrix and a similarity matrix, α height of the bounding boxes, and exclude irrele- respectively. Note that is a constant varying from S vant bounding boxes regarding their shapes from 0 to 1, which adjusts the effect of for the initial r the candidate set. vector . r Next, the system extracts various kinds of image The initial ranking vector is calculated by ap- features including Bag-of-Features (BoF) of SIFT plying a standard sigmoid function to the SVM out- v and color SIFT with spatial pyramid, Histogram of put value i of each category and L1-normalized as Oriented Gradients (HoG), and Gabor texture fea- shown in the following equation: tures from the selected regions, and calculate SVM −1 (1 + exp(−vi)) scores by multiple-kernel learning (MKL) [7] with ri , = −1 (2) chi-square RBF kernels for each candidate region j(1 + exp(−vj)) in the one-vs-rest manner. MKL can estimate the optimal weights for linear combination of kernels As a similarity matrix to re-rank the initial rank- each of which corresponds to one kind of a visual ing, we use a co-occurrence probability matrix, feature. which can be calculated by counting the number of After that, we obtain SVM output scores for co-occurrence of two food item pairs in a training all the candidate regions and all the food cate- dataset. Each element of the co-occurrence matrix gories. Note that our objective is not associating Si,j can be obtained with extracted regions with possible names of food items directly, but listing all the names of the food items ci,j Si,j = , (3) c which are estimated to be shown in the given meal k∈F ∧k= j k,j 2018 3 Experiments chicken-’n’- rice eels on rice pilaf egg on port cutlet beef curry sushi chicken fried rice tempura bibimbap toast croissant roll bread raisin chip butty hamburger rice on rice rice bowl bread Japanese- udon tempura soba ramen tensin fried sauteed grilled sauteed In the experiments, we used our own food im- pizza spaghetti style takoyaki gratin croquette sandwiches noodle udon noodle noodle beef noodle noodle noodle vegetables eggplant spinach pancake age dataset built for [4] as shown in Figure 3 which vegetable teriyaki grilled sweet and miso soup potage sausage oden omelet ganmodoki jiaozi stew fried fish grilled salmon sashimi pacific sukiyaki tempura grilled fish salmon meuniere saury sour pork includes 100 Japanese food categories with bound- ing boxes on each food item. It contains about one lightly steamed seasoned spicy chili- egg roasted egg tempura fried sirloin nanbanzuke boiled fish beef with hambarg steak dried fish ginger pork flavored yakitori cabbage omelet sunny-side fish hotchpotch chicken cutlet potatoes steak saute tofu roll up hundred images for each category and 9132 images stir-fried boiled fish-shaped steamed chilled simmered chicken sashimi pancake shrimp roast omelet cutlet spaghetti fried natto cold tofu egg roll beef and pork and sushi bowl with bean with chill meat with fried curry shrimp noodle peppers bowl source chicken dumpling rice meat sauce totally.