Mobile Barcode Scanning Applications for Consumers Application Development and the Quality of Product Master Data
Total Page:16
File Type:pdf, Size:1020Kb
Research Collection Doctoral Thesis Mobile barcode scanning applications for consumers Application development and the quality of product master data Author(s): Karpischek, Stephan Publication Date: 2012 Permanent Link: https://doi.org/10.3929/ethz-a-009774955 Rights / License: In Copyright - Non-Commercial Use Permitted This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use. ETH Library Diss. ETH No. 20792 Mobile Barcode Scanning Applications for Consumers Application development and the quality of product master data A dissertation submitted to ETH Zürich for the degree of Doctor of Science Presented by Stephan Karpischek Dipl. Des., Universität der Künste Berlin born 6 January 1975 citizen of Austria accepted on the recommendation of Prof. Dr. Elgar Fleisch Prof. Sanjay Sarma, Ph. D. Dr. Florian Michahelles 2012 This dissertation is dedicated to the memory of my father Franz Karpischek (1929 - 1991) Information is not knowledge Knowledge is not wisdom Wisdom is not truth Truth is not beauty Beauty is not love Love is not music Music is the best. Frank Zappa. Packard Goose, 1978. Abstract Mobile barcode scanning applications can improve the daily buying decisions of millions of users. Application providers need access to high quality product master data for correct product descriptions. Data quality problems are emerging as product master data designated for industrial supply chains reach a wide audience of consumers in these applications. This thesis contributes to the research on mobile barcode scanning ap- plications and product master data quality. We describe the development of a mobile barcode scanning application that enables consumers to share comments and ratings on products. The app has been deployed to thousands of Android and iPhone smartphones and the software has been released under an open source license. Analysis of usage data shows that users are less likely to share comments and ratings when product descriptions are missing. We use aggregated prod- uct master data for more than 120,000 products to develop a method for identifying incorrect product names. We evaluate the performance and use- fulness for consumer packaged goods businesses and measure the correctness of product master data from publicly available sources. Our results show that approximately 2% of product names are incorrect. The method developed can be used to effectively monitor and control product master data quality in external sources. Implemented in master data manage- ment processes, it can help to improve the overall quality of product master data for mobile barcode scanning applications. v Zusammenfassung Mobile Software-Anwendungen, die Produkte anhand ihres Strichcodes identi- fizieren, ermöglichen jeden Tag bessere Kaufentscheidungen für Millionen von Konsumenten weltweit. Wesentliche Voraussetzung für diese Anwendungen sind qualitativ hochwertige Produktstammdaten, um eine korrekte Erkennung der Produkte zu gewährleisten. Allerdings werden durch die – ursprünglich nicht vorgesehene – Verwendung von Strichcodes durch Konsumenten auch zunehmend Qualitätsprobleme im Bereich der Produktstammdaten sichtbar. Diese Arbeit untersucht die Entwicklung von mobilen Anwendungen für Konsumenten und die Qualität von Produktstammdaten für die Produk- terkennung. Wir präsentieren ein Konzept für eine mobile Anwendung, mit der Konsumenten Kommentare und Meinungen zu Produkten abgeben kön- nen. Die Anwendung wird für die mobilen Plattformen Android und iPhone implementiert und aus unterschiedlichen Perspektiven evaluiert. Eine Analyse der Nutzungsdaten zeigt, dass Benutzer deutlich öfter ihre Meinung zu einem Produkt abgeben, wenn dieses korrekt erkannt wurde. Aufbauend auf einer Sammlung von Produktstammdaten für über 120.000 Produkte wird eine Methode entwickelt, mit deren Hilfe falsche Produkt- namen schnell und zuverlässig erkannt werden können. Wir demonstrieren die Leistungsfähigkeit dieser Methode und ihre praktische Anwendbarkeit für Unternehmen der Konsumgüterindustrie und zeigen, dass etwa 2% der Produktnamen in online verfügbaren Datenquellen falsch sind. Unternehmen können die entwickelte Methode nutzen, um Stammdaten für ihre Produkte in externen Quellen zu kontrollieren und so die Datenqualität und Wahrnehmung ihrer Produkte zu verbessern. vii Acknowledgments First, I want to thank Elgar, Sanjay, and Florian for supervising this thesis, for their patience, and for their great support at the Auto-ID Labs at ETH Zürich and MIT. I want to thank my dear colleagues at the Auto-ID Labs Irena, Erica, Andrea, Andreas, Felix, and Mikko for the good work and the time together; and Thorsten, Lukas, and Albrecht for their always helpful and friendly advice and support. I also want to thank my colleagues at the Chair of Information Management at ETH Zürich and at the Institute of Technology Management at the University of St. Gallen for an inspiring and motivating research environment, and especially Liz and Monica for their caring and friendly support. I also want to thank the team that worked with us on the my2cents project: Nadine, Tilmann, Yan, Steffan, Mike and Claudio, Anton, and all the others who helped with advice, resources, and encouragement, especially Roman Bleichenbacher and the codecheck team, Christian Flörkemeier and the Mirasense team, and Nicolas Florin and the team of GS1 Switzerland. Florian Resatsch, Gerald Madlmayer, Mark Butler, Steffan Hradetzky, and Dieter Würch have been great mentors and friends. I want to thank them and my parents Christa and Klaus for their support. Thank you! My deepest thanks go to Vivian Mary for her patience and the power which enabled me to work on this thesis, a great and loving family, Emma and Simon: Do what you love. No excuses. ix Contents 1 Introduction1 1.1 Motivation.............................1 1.2 Research questions........................9 1.3 Contributions........................... 12 1.4 Structure of thesis........................ 13 2 Related work 15 2.1 Research on mobile shopping assistants............. 15 2.1.1 Hardware devices..................... 16 2.1.2 Pocket computers..................... 17 2.1.3 Mobile phone applications................ 20 2.2 Mobile applications on the market................ 22 2.2.1 Price comparison..................... 23 2.2.2 Social applications.................... 24 2.2.3 Other barcode scanning applications.......... 25 2.3 Research on data quality..................... 26 2.3.1 Product master data................... 27 2.3.2 Data quality assessment................. 28 2.3.3 Data integration..................... 29 2.3.4 Name matching...................... 30 2.3.5 Machine learning..................... 31 2.4 Word of mouth and consumer power.............. 31 2.5 Summary............................. 33 xi 3 Application development 35 3.1 Concept.............................. 36 3.1.1 Our approach....................... 37 3.1.2 Features.......................... 38 3.1.3 Value network....................... 41 3.1.4 Benefits for brand owners................ 42 3.1.5 Business models...................... 43 3.2 Implementation.......................... 46 3.2.1 External elements..................... 47 3.2.2 Mobile clients....................... 50 3.2.3 Server back end...................... 53 3.2.4 Current status....................... 54 3.2.5 Credits........................... 55 3.3 Evaluation............................. 55 3.3.1 Application usage..................... 55 3.3.2 User feedback....................... 57 3.3.3 Usage scenarios...................... 58 3.3.4 Usability tests....................... 58 3.3.5 Feedback on the business concept............ 59 3.4 Effects of incomplete data.................... 60 3.4.1 Comments......................... 61 3.4.2 Ratings.......................... 63 3.5 Discussion............................. 65 3.5.1 Software implementation................. 66 3.5.2 Barcode scanning..................... 66 3.5.3 User base......................... 67 3.5.4 Commercialization.................... 68 3.5.5 Data for research..................... 69 3.5.6 Quality of product descriptions............. 69 4 Quality of product master data 71 4.1 Methodology........................... 73 4.1.1 Data collection...................... 73 4.1.2 Quality dimensions.................... 76 4.1.3 String similarity measures................ 77 4.1.4 Supervised learning.................... 78 4.1.5 Regularized logistic regression.............. 83 4.2 Results............................... 86 4.2.1 Collected data....................... 86 4.2.2 Classifier selection.................... 90 4.2.3 Classifier performance.................. 90 4.2.4 Identifying incorrect product names........... 94 4.3 Discussion............................. 95 4.3.1 Recognition rates..................... 97 4.3.2 Classification results................... 97 4.3.3 Reasons for incorrect product names.......... 99 4.3.4 Limitations........................ 101 5 Conclusions 103 5.1 Contributions........................... 103 5.2 Implications for research..................... 105 5.3 Implications for practice..................... 106 A Additional tables 113 B Source code 121 Bibliography 135 Referenced web resources 151 List of Figures 1.1 A 13-digit GTIN barcode.....................2 1.2 Consumer scanning a product barcode with a smartphone in a supermarket (picture copyright Florian Michahelles)......3 3.1 my2cents concept: Scan a product’s barcode, read what oth- ers have