// NO.19-063 | 12/2019
DISCUSSION PAPER
// JANNA AXENBECK AND PATRICK BREITHAUPT

Web-Based Innovation Indicators – Which Firm Website Characteristics Relate to Firm-Level Innovation Activity?

Janna Axenbeck†+* & Patrick Breithaupt†*

† Department of Digital Economy, ZEW – Leibniz Centre for European Economic Research, L7 1, 68161 Mannheim, Germany
+ Justus-Liebig-University Giessen, Faculty of Economics, Licher Straße 64, 35394 Gießen, Germany
* Correspondence:
[email protected]; Phone: +49 621 1235 – 188,
[email protected]; Phone: +49 621 1235 – 217

December 31, 2019

Abstract

Web-based innovation indicators may provide new insights into firm-level innovation activities. However, little is known yet about the accuracy and relevance of web-based information. In this study, we use 4,485 German firms from the Mannheim Innovation Panel (MIP) 2019 to analyze which website characteristics are related to innovation activities at the firm level. Website characteristics are measured by several text mining methods and are used as features in different Random Forest classification models that are compared against each other. Our results show that the most relevant website characteristics are the website's language, the number of subpages, and the total text length. Moreover, our website characteristics show a better performance for the prediction of product innovations and innovation expenditures than for the prediction of process innovations.

Keywords: Text as data, innovation indicators, machine learning

JEL Classification: C53, C81, C83, O30

Acknowledgments: The authors would like to thank the German Federal Ministry of Education and Research for providing funding for the research project (TOBI - Text Data Based Output Indicators as Base of a New Innovation Metric; funding ID: 16IFI001).