A Modified All-And-One Classification Algorithm Combined with the Bag-Of-Features Model to Address the Food Recognition Task
Total Page:16
File Type:pdf, Size:1020Kb
A Modified All-and-One Classification Algorithm Combined with the Bag-of-Features Model to Address the Food Recognition Task Kalliopi Dalakleidi, Myriam Sarantea and Konstantina Nikita School of Electrical and Computer Engineering, National Technical University of Athens, Iroon Polytechniou Str., Athens, Greece Keywords: Diabetes, All-And-One, Bag-Of-Features, Food Recognition System. Abstract: Dietary intake monitoring can play an important role in reducing the risk of diet related chronic diseases. Automatic systems that support patients to count the nutrient contents, like carbohydrates (CHO), of their meals, can provide valuable tools. In this study, a food recognition system is proposed, which consists of two modules performing feature extraction and classification of food images, respectively. The dataset used con- sists of 1200 food images split into six categories (bread, meat, potatoes, rice, pasta and vegetables). Speeded Up Robust Features (SURF) along with Color and Local Binary Pattern (LBP) features are extracted from the food images. The Bag-Of-Features (BOF) model is used in order to reduce the features space. A modified version of the All-And-One Support Vector Machine (SVM) is proposed to perform the task of classification and its performance is evaluated against several classifiers that follow the SVM or the K-Nearest Neighbours (KNN) approach. The proposed classification method has achieved the highest levels of accuracy (Acc = 94.2 %) in comparison with all the other classifiers. 1 INTRODUCTION changes and distinctiveness means that the extracted features can be used to distinguish the specified ob- Diet related chronic diseases, such as obesity and di- ject from other objects or the background. Features abetes mellitus, are expanding nowadays. Therefore, related to color and texture have been shown to ensure an urgent need for dietary intake monitoring arises stability and distinctiveness. Moreover, a large vari- that can reduce the risk of these diseases. Studies have ety of local feature descriptors has been proposed in shown that when patients with diabetes mellitus do the literature, like Gaussian derivatives (Florack et significant errors in reporting their dietary intake, al., 1994), moment invariants (Mindru et al., 2004), there is an increased risk of postprandial hypo- or hy- complex features (Baumberg, 2000; Schaffalitzky perglycemia. Automatic systems, usually based on a and Zisserman, 2002), steerable filters (Freeman and mobile phone, can support patients that suffer from Adelson, 1991), and phase-based local features (Car- diet related chronic diseases with carbohydrates neiro and Jepson, 2003). A variant of Scale Invariant (CHO) counting. The user first takes a photograph of Feature Transform (SIFT), Speeded Up Robust Fea- the upcoming meal with the camera of his mobile tures (SURF), has the ability to capture spatial inten- phone. Then, the image is processed so that the dif- sity patterns, while being robust to small defor- ferent types of food are divided from each other and mations or localization errors and is shown to outper- segmented in different areas of the image. A series of form the previous mentioned categories of features features are extracted from each segmented area and (Mikolajczyk and Schmid, 2003; Bay et al., 2008). are fed to a classifier, which decides what kind of Classification results of food images in dietary in- food is represented by each segmented area. Then, the take monitoring systems can be improved when the volume of each segmented area is calculated and the dimension of the extracted feature vector is reduced. total CHO of the depicted meal are estimated. The use of the Bag-Of-Features (BOF) model (Peng Feature extraction can play a key role in dietary et al., 2016), which is inspired by the Bag-Of-Words intake monitoring systems. Efficient feature de- model for text classification (Cruz-Roa et al., 2011) scriptors could ensure stability and distinctiveness, has been reported to highly improve classification ac- where stability means that the extracted features are curacy in food recognition tasks. The BOF model invariant to different photometric and geometric achieves dimensionality reduction by creating from 284 Dalakleidi K., Sarantea M. and Nikita K. A Modified All-and-One Classification Algorithm Combined with the Bag-of-Features Model to Address the Food Recognition Task. DOI: 10.5220/0006141302840290 In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017), pages 284-290 ISBN: 978-989-758-213-4 Copyright c 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved A Modified All-and-One Classification Algorithm Combined with the Bag-of-Features Model to Address the Food Recognition Task the extracted features visual words, and by describing calculates pairwise statistics between local features the image content with the distribution of these visual has been presented (Yang et al., 2010). These statis- words (Wang et al., 2016). tics are accumulated in a multi-dimensional histo- Another important aspect of the food recognition gram, which is then used as input to a SVM classifier. task is that it is usually a multiclass classification Food images are taken from the Pittsburgh Food Im- problem, as the used food datasets almost always con- age Dataset (PFID) (Chen et al., 2009). This system tain more than two categories of food. There exist has also achieved high levels of recognition accuracy many classification approaches in order to address the (80%). multiclass recognition task, but the most prominent The use of the BOF model has been adopted in ones, like the One-Against-All (OAA), the One- several food recognition systems recently, since food Against-One (OAO) and the All-And-One (A&O) recognition does not presume any typical spatial ar- (Pedrajas and Boyer, 2006) descend from the binari- rangement of the food elements. Based on the BOF zation strategy, where the division of the initial mul- model, the Food Intake Visual and Voice Recognizer ticlass problem to several binary class problems takes system which aims to measure the nutritional content place (Galar et al., 2011). of a user’s meal (Puri et al., 2009) has been proposed. Several attempts to implement automatic or semi- Given a set of three images of a user’s plate of food, automatic systems for dietary intake monitoring have the system first asks the user to list food items through been reported in the literature. A food identification speech, then attempts to identify each food item on application called DietCam has been recently pre- the plate, and finally reconstructs them in 3D to meas- sented (Kong and Tan, 2012), which consists of three ure their respective volumes. Food images are col- parts: image manager, food classifier and volume es- lected by the developers of the system. Food classifi- timator. Images taken by the users are fed to the im- cation is based on the combined use of color neigh- age manager, then SIFT features are extracted, clus- borhood and maximum response features in a texton tered into visual words and fed to a simple Bayesian histogram model, which resembles the BOF ap- probabilistic classifier, which achieves high levels of proach. Adaboost is used for feature selection and accuracy (92%). The food volume estimator calcu- SVM for classification, which achieves recognition lates the volume of each food item recognized by the accuracy about 90%. Moreover, a food recognition food classifier and then the calorie content of the food system for the classification of Japanese food images is estimated. Another food recognition application has been introduced (Joutou and Yanai, 2009), based has been recently proposed for the classification of on the combined use of BOF of SIFT, Gabor filter re- fast-food images (Shroff et al., 2008). After segmen- sponses and color histograms, which are then fed to a tation of the fast-food image, color, size, texture, multiple kernel learning classifier, which has shape and context-based features are computed and achieved acceptable levels of accuracy (61,34%). The fed to a feed-forward artificial neural network achiev- BOF model has been used in another automatic food ing a 90% accuracy. Moreover, a food identification recognition system (Anthimopoulos et al., 2014). The system has been presented which consists of the fol- system firstly computes dense local features using the lowing modules: image segmentation, feature extrac- SIFT on the HSV (Hue Saturation Value) color space, tion, food classification, and volume estimation (Zhu then builds a visual vocabulary of 10000 visual words et al., 2010). Food description is based on a set of by using the hierarchical k-means clustering and, fi- color and texture features, while classification is nally, classifies the food images with a linear SVM based on a Support Vector Machine (SVM) classifier, classifier, which achieves high levels of accuracy which has achieved high classification accuracy (78%). (95,8%). An automated Food Intake Evaluation Sys- In the present study, a food recognition system is tem (AFIES) has been reported (Martin et al., 2009), proposed which consists of two modules performing which consists of reference card detection, food re- feature extraction and classification of food images, gion segmentation, food classification and food respectively (Figure 1). Motivated by the ability of amount estimation modules. The color RGB data are SURF to capture spatial intensity patterns