The Power of Prediction: Predictive Analytics, Workplace Complements, and Heterogeneous Firm Performance 1
Erik Brynjolfsson
Stanford University and NBER
[email protected]

Wang Jin
MIT Sloan School of Management
[email protected]

Kristina McElheran
University of Toronto
[email protected]
Abstract

Anecdotes abound concerning the promise of predictive analytics tools and techniques in an age of increasingly ubiquitous digital information. However, large-scale microdata on this fast-emerging practice and its nuances have been lacking. To address this gap, we worked with the U.S. Census Bureau to collect detailed responses from over 30,000 establishments in the U.S. manufacturing sector on predictive analytics and key workplace complements. This effort yielded the first statistics on the widespread adoption of predictive analytics in a large, representative panel of establishments. Focusing on plant-level productivity over time, we find a significant increase in firm performance among adopters – but almost entirely among plants that have also pursued specific tangible and intangible investments. For workplaces with high IT investment, educated employees, a high-efficiency production strategy, and certain management practices, multi-factor productivity is more than two to four percent higher with the use of predictive analytics. Instrumental variables estimates and the timing of performance gains with respect to adoption suggest a causal relationship. While the use of predictive analytics is diffusing rapidly across a wide range of firms, there are understudied strategic and organizational contingencies that determine how productive it is in practice.
Keywords: predictive analytics, productivity, complementarity, strategic commitment, management practices, information technology, digitization
1 Disclaimer: Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed. All errors are our own.
“What information consumes is rather obvious: it consumes the attention of its recipients…The real design problem is not to provide more information to people, but [to design] intelligent information-filtering systems.”
- Herbert A. Simon (1971), Designing Organizations for an Information-Rich World

1. Introduction
According to Ransbotham et al. (2016), “the hype around data and analytics has reached a fever pitch.” The increased volume, variety, and timeliness of digital information produced by sources ranging from social media to the Internet of Things (IoT) have dramatically shifted what is measurable within firms and markets (McAfee and Brynjolfsson 2012). At the same time, exponential growth in computing power and the rise of new computational methods (in particular, new machine learning algorithms) have fueled excitement about tools and practices to extract more value from this data. According to a 2019 IDC report, worldwide revenues for big data and business analytics are forecast to reach $274.3 billion by 2022, a five-year compound annual growth rate of 13.2%.2 Research in strategic management is increasingly oriented towards understanding the impacts of digitization on firm strategy and firm performance (e.g., Greenstein et al. 2013, Adner et al. 2019, Bennett 2020). Ironically, however, rising big-data enthusiasm is not grounded in big-sample evidence. Critical gaps persist in our understanding of the prevalence, business impacts, and limitations of these tools and techniques among heterogeneous firms. We worked to address these gaps by collaborating with the U.S. Census Bureau to gather the first large-scale, representative data on the adoption of predictive analytics and potential tangible and intangible complements such as IT investment, worker education, production strategy, and structured management practices. We link this new survey to a detailed panel of administrative data to estimate productivity at the establishment level. Leveraging these novel measures, instrumental variables estimates based on plant-level variance in government-mandated data collection, and key details of timing for distinct waves of adopters, we provide new insights into the prevalence and variance in predictive analytics use, the causal relationships at work, and the
2 For more details, please see IDC website: https://www.idc.com/getdoc.jsp?containerId=IDC_P33195.
workplace complements that both constrain and augment the benefits achievable with these rapidly-emerging tools and techniques. In our representative sample of over 30,000 U.S. manufacturing establishments, we find that adoption of predictive analytics is both widespread and significantly associated with at least two to four percent higher multi-factor productivity. This shift in how firms extract value from data translates into an average $830,000 difference in sales between adopters and non-adopters, accounting for production inputs and a wide range of other factors. Critically, these gains are only present among establishments that have also pursued high levels of IT capital, employ educated workers at high rates, have committed to a high-efficiency production strategy, or rely on certain management practices centered on tracking key performance indicators. Predictive analytics, unlike other methods for making sense of data inputs,3 is a set of techniques ranging from data mining to statistical modeling (including machine learning and “artificial intelligence”) that analyzes historical and current data in order to make predictions about future events.4 Long-standing theories in economics and management have modeled how better information can support and improve human decision-making, creating value at both the individual and organizational levels (Raiffa 1968; Blackwell 1953; Gorry and Scott Morton 1971; March 1994). By extracting core information from large and complex data, predictive analytics shrinks the dimensions of relevant information, reduces the cognitive costs of decision-making, and speeds the execution of higher-quality decisions, thereby boosting returns from digital information. Significant research attention has been paid in recent years to “data-driven decision making” (DDD) practices among managers and the rise of predictive analytics as a related but distinct phenomenon.
The adoption of DDD and data analytics has been linked to higher Tobin’s q and profits (Brynjolfsson et al. 2011; Saunders and Tambe 2015), higher productivity (Müller et al. 2018; Brynjolfsson and McElheran 2019), and innovation (Wu et al. 2019). However, evidence on the causal link between predictive analytics and organizational performance from large-scale, representative studies remains lacking. Building on prior literature, we address this gap by exploring the
3 For instance, a recent study by Berman and Israeli (2020) examines the impacts of descriptive analytics. All of our results separately control for management practices that are “data-driven” according to other questions on the survey (Brynjolfsson and McElheran 2019), so we are specifically isolating this particular set of techniques. 4 Please see Wikipedia and citations within at: https://en.wikipedia.org/wiki/Predictive_analytics.
performance implications of predictive analytics in a sample covering more than 50% of the U.S. manufacturing economy. We address open questions about causality by employing both instrumental variables estimation and timing tests to rule out reverse causality. Our findings provide evidence that predictive analytics is causally related to higher productivity, on average. However, important boundary conditions apply related to firm heterogeneity and strategic commitments that have received little or no attention to date. In particular, our study reveals how these non-trivial returns on the use of predictive analytics depend almost entirely on the presence of other tangible and intangible firm investments. Anecdotal evidence is mounting that many firms struggle to realize benefits from their investments in data analytics (Schrage 2014). Lack of complementary skills and unaligned corporate strategy are reported to be key challenges (Ransbotham et al. 2015). Earlier academic work on IT productivity emphasizes significant heterogeneity across industries and firms in the benefits of generic IT investment (e.g., Stiroh 2002, Brynjolfsson and Hitt 1995). A robust theoretical literature argues that this heterogeneity may arise from investments in complementary assets and managerial practices (Kandel and Lazear 1992; Milgrom and Roberts 1990, 1995; Holmstrom and Milgrom 1994; Brynjolfsson and Milgrom 2013; Brynjolfsson et al. 2018). A number of empirical studies have confirmed this intuition with respect to general-purpose IT and computer use (Black and Lynch 2001; Caroli and Van Reenen 2001; Bresnahan et al. 2002; Aral and Weill 2007; Bloom et al. 2012; Aral et al. 2012) as well as for data-centered management practices (Tambe 2014; Brynjolfsson and McElheran 2019; Dubey et al. 2019).
Yet tools and techniques for leveraging digital information are evolving rapidly, detailed data on their use in practice are difficult to obtain for large samples of firms, and many open questions remain. In particular, large-scale evidence on what specific complements should be considered – much less how much they matter – is almost entirely lacking. Also missing is a rigorous empirical approach for disentangling causality and potential confounds across a large and economically important sample of firms.5 Our study makes progress on these dimensions and sheds light on heretofore understudied organizational complements such as elements of the firm's strategic positioning and day-to-day management practices.
5 A few studies in the management information systems literature provide conceptual frameworks for understanding how predictive analytics can create synergies in organizations and improve performance (Côrte-Real et al. 2014; Grover et al. 2018).
This study builds on the frameworks for organizational complementarity developed in Athey and Stern (1998) and Brynjolfsson and Milgrom (2013) and contributes to a small but growing empirical literature on complementarities among technology, workplace characteristics, and managerial practices (e.g., Black and Lynch 2001; Aral et al. 2012; Tambe et al. 2012; Bloom et al. 2012; Brynjolfsson and McElheran 2019). Insights from these and related works guided our development of new survey questions specifically designed with this research question in mind. Our findings contribute to a growing strategic management literature emphasizing not only the average impacts of digitization and digital strategies, but also the heterogeneity in those effects related to varying organizational and market-based contingencies and constraints. This literature dates at least as far back as Tushman and Anderson’s (1986) exploration of how technology interacts with a given firm’s competence to more-recent explorations of internal and external influences on digital strategy choices and their implications (e.g., Mithas et al. 2013, McElheran and Jin 2020). This effort is part of a small but non-trivial set of collaborations between academics and administrative agencies to combine the coverage and consistency of administrative data collection with insights and questions from the frontier of academic research (see Buffington et al. 2017 and Zolas et al. 2020). Our partnership with Census yielded detailed establishment-level information on management and organizational practices, worker education, the availability and use of data, extensive and intensive margins of predictive analytics use, specific use cases, and related questions on the organizational design of data collection – including government influence on this process.
Linking this new survey with longstanding administrative data on production inputs and outputs, we achieve unusually detailed insights for such a large sample (over 30,000 establishments). Thus, a key contribution of this study is to provide the first-ever descriptive statistics on the prevalence and contexts in which these practices have diffused in the U.S. Through systematic analysis, our study further reveals how core dimensions of process design and workplace practices may determine the returns to this increasingly widespread set of tools. In addition to contributing to the frontier of academic research in this area, our findings provide evidence-based guidance for managers looking to achieve better returns on investment in data and predictive analytics and help them stay competitive in an increasingly digital age.
2. Conceptual Motivation
The Performance Effect of Predictive Analytics
Given the high ratio of interest-to-evidence surrounding predictive analytics, it is more important than ever to understand what impact predictive analytics has on business performance and how firms can achieve a higher return on their investments in “big data” (Grover et al. 2018). Existing theory in economics and management predicts performance gains from increasingly sophisticated tools for extracting useful insights from large sets of structured and unstructured data. All else equal, better information is argued to support and improve human decision-making, creating value at both the individual and organizational levels (Raiffa 1968; Blackwell 1953; Gorry and Scott Morton 1971; March 1994). By extracting essential insights from large and complex data, predictive analytics shrinks the dimensions of relevant information, reduces the cognitive costs of decision-making, and speeds execution of higher-quality decisions, thereby boosting returns from digital information. Following prior literature focusing on the business value of data and analytics (Brynjolfsson et al. 2011; Tambe 2014; Saunders and Tambe 2015; Müller et al. 2018; Brynjolfsson and McElheran 2019), we argue that the adoption of predictive analytics is likely to have a positive impact on business performance and can lead to a significant gain in efficiency and productivity, all else equal.
Predictive Analytics and Potential Complements
In practice, important factors are typically not equal across firms. A prevalent theme from the frontier of economics and strategic management research is the widespread and increasing heterogeneity in firm performance and workplace conditions.6 Prior theory and empirical studies highlight how complementarities among IT investments and organizational characteristics can lead to differences among firms that may persist and even grow over time (Milgrom and Roberts 1990, 1995; Black and Lynch 2001; Caroli and Van Reenen 2001; Bresnahan et al. 2002; Aral and Weill 2007; Bloom et al. 2012; Aral et al. 2012; Brynjolfsson and Milgrom 2013). While laggard firms
6 While high cross-sectional heterogeneity in firm performance is long-established (e.g., Syverson 2004, 2011; Hopenhayn 2014), recent studies point to increasing firm heterogeneity along a number of economically important dimensions (Andrews et al. 2015; Van Reenen 2018; Song et al. 2019; Decker et al. 2020; Autor et al. 2020; Bennett 2020a). This phenomenon is not restricted to the United States (e.g., Berlingieri et al. 2017).
may catch up along certain dimensions (Brynjolfsson and McElheran 2019), trends towards increasing dominance of “superstar firms” (Autor et al. 2020), as well as rising concentration and inequality in workplace conditions and employee earnings, have been linked to technology investment at the industry and firm level (Bessen 2017; Bennett 2020b; Lashkari et al. 2020; Barth et al. 2020). In addition, attention is increasingly focused on difficult-to-measure intangible features of firms and markets that may amplify these dynamics (Saunders and Brynjolfsson 2016; Haskel and Westlake 2018). Building on these insights and techniques, we explore potential tangible and intangible complements to predictive analytics that might be expected to drive economically important heterogeneity in returns to these tools and techniques.
IT Capital Stock
The successful implementation of predictive analytics is widely expected to depend on the firm’s IT infrastructure. The collection, storage, and communication of input data for predictive modeling all require robust tangible investments in sensors, transmission equipment (e.g., routers), data storage hardware, and computing resources for analyzing data. For instance, Bosch’s manufacturing factories use predictive analytics tools for the maintenance of equipment such as robots, heat exchangers, and spindles using data generated from sensors.7 General Electric, another manufacturing giant, augments its jet engines with sensors to monitor temperature, fuel consumption, and many other daily operation tasks, generating 500 gigabytes of data per engine for each flight.8 Unsurprisingly, it is also investing heavily in predictive analytics to utilize such volumes of data (Winnig 2016). Furthermore, building, training, and implementing these analytic tools require corresponding data processing hardware and software. IT-savvy firms that are more prepared for the industrial Internet of Things (IoT) – a particularly important dimension of the “big data” trend for our manufacturing context – may have fully-depreciated investments in sensors and other IT infrastructure to collect and analyze data, giving them a tangible advantage when it comes to analytics.
Intangible complements may also be associated with these easier-to-measure IT capital
7 See reports at https://blog.bosch-si.com/industry40/industry-4-0-predictive-maintenance-use-cases-in-detail/ and https://amfg.ai/2019/03/28/industry-4-0-7-real-world-examples-of-digital-manufacturing-in-action/ for more details. 8 https://venturebeat.com/2015/06/18/here-comes-the-industrial-internet-and-enormous-amounts-of-data/
investments over time. Firms with long-standing experience in developing and deploying IT resources and related practices are expected to enjoy disproportionate returns due to accumulated IT capabilities, appropriate organizational design, human capital accumulation, and other dimensions of “learning by doing” (e.g., Bharadwaj 2000; Saunders and Brynjolfsson 2016). In addition, firms with greater accumulation of IT capital may enjoy a greater capacity to adopt and effectively deploy new techniques and related business processes (e.g., Han et al. 2011). We hypothesize that firms with higher measured IT capital stock will have lower costs and greater opportunities to leverage predictive analytics to boost productivity in their workplaces.
Educated Workers
While IT capital stocks may proxy for a range of critical complements to predictive analytics, it is useful to disentangle this cluster of interrelated activities as finely as possible. In particular, it is widely accepted that more-skilled and better-educated workers are key drivers of growth in both manufacturing productivity (Black and Lynch 2001; Moretti 2004; Waldman 2016) and returns to IT (Brynjolfsson and Hitt 2000; Bresnahan et al. 2002). With increasingly digitalized organizations and the growing prevalence of business applications for data and predictive analytics, demand for workers with complementary analytic skills is also growing. Firms increasingly demand workers who know how to deploy “smart” technologies in production settings (Helper et al. 2019), as well as those who can help translate digital information into business insights. While competition for these workers may drive up wages to a point where they do not manifest excess returns in the production function, we expect that their presence will boost the estimated returns from adopting predictive analytics due to these complementarities.
Production Strategy

Production strategy has received limited attention – even in the strategic management literature – and has not, to our knowledge, been linked to IT or digitization. Yet the relevance of this concern is conceptually very straightforward and arguably first-order in our manufacturing setting. At its most fundamental level, predictive analytics uses historical and current data to predict future outcomes. Thus, the effectiveness of this forward-looking approach depends on: 1) the richness of past data inputs to characterize meaningful states of the world, and 2) sufficient stationarity of the environment so that past performance may be a useful indicator of future
conditions. Not all production processes will score high on these criteria. In general, workplaces with greater automation and reduced variance will provide richer data inputs and better alignment with the value proposition of predictive analytics compared to other, more dynamic, production environments. Yet dynamism versus stationarity of a production environment is rooted in a host of economic and strategic considerations that are very difficult to adjust, potentially limiting or augmenting the value of predictive analytics in particular settings in persistent ways. For organizational, strategic, cognitive, and technological reasons, the orientation of a business unit’s production process towards efficiency versus flexibility tends towards one extreme or the other and is fundamentally difficult to reverse (e.g., Ghemawat and Ricart Costa 1993). It is correlated with a range of specific management practices (McElheran et al. 2020), and typically aligned with key dimensions of the firm’s market context and positioning (Hayes and Wheelwright 1979, Safizadeh et al. 1996). Thus, investment in more-automated and more-efficient production processes versus more innovative and dynamic activities is indicative of an organizational environment that is subject to lower variance. This holds for features of both the external market context (e.g., market demand and supply chain processes), as well as for internal operations (equipment utilization, product quality, cycle times, etc.). Following the influential work of Hayes and Wheelwright (1979), we conceptualize a high-efficiency production process in the manufacturing sector as one where a plant operates primarily on a continuous-flow production basis.9 Compared to other processes (e.g., job shops for custom work or prototyping for product development), continuous-flow processes tend to be more capital-intensive,10 continuously monitored, and managed so as to minimize variance.
This promotes stationarity and increases the value of predictions based on past information. The richness of the information available to be analyzed also tends to be higher. Higher-volume, lower-mix production tends to involve more tightly-coupled systems where downtime and low
9 The comparison is between establishments with a continuous-flow production process and job shops or establishments with a batch process. See Hayes and Wheelwright (1979) for a more complete discussion of the process design features captured by the measures we use here. 10 Based on the industry-level productivity database from the National Bureau of Economic Research (NBER), these industries have the highest average capital stock within the manufacturing sector. Database and statistics are available upon request.
capital utilization are more costly or even dangerous, promoting systematically greater use of instrumentation and sensors. The data generated from these sensors are increasingly digitalized, transmitted, and analyzed at a high frequency, providing rich inputs to predictive analytics practices when they are adopted. Thus, we anticipate lower costs and higher returns to predictive analytics in workplaces with a high-efficiency production strategy.
KPI monitoring
The old adage that “what gets measured gets managed” holds as well as ever in the digital age. Manufacturing firms have long been developing and monitoring key performance indicators (KPIs) in a variety of operational practices such as Lean manufacturing and Total Quality Management. Firms track capacity utilization rates, manufacturing cycle times, and production downtime, among many others. These KPIs help firms quantify and monitor the performance of operations and thus are highly correlated with the breadth and intensity of data collection and analysis at the establishment (Brynjolfsson and McElheran 2019). Schrage and Kiron (2018) stress the importance of KPI use in the digital era with the rise of big data and computational and algorithmic innovation. Firms with well-established KPI practices will tend to have richer inputs to feed their predictive models. In other words, “what gets measured gets into predictive modeling.” Given advances in reinforcement learning and neural networks, predictive analytics, with its increasing capability and precision, has the potential to help firms make better decisions in rather complex business processes (Khurana et al. 2018; Khan et al. 2018).
Yet more-sophisticated predictions will only provide performance gains to the extent that they are acted on in practice. Again, we anticipate that prior use of managerial routines around KPI monitoring will support greater organizational capacity for leveraging outputs from predictive models, leading to important productive interdependencies between this class of managerial practices and new predictive analytics techniques.
3. Data and Key Measures
The lack of large-scale data on business adoption of predictive analytics has been a significant barrier to empirical research in this area. We made progress by collaborating with the U.S. Census Bureau to design and field a purpose-built survey that was sent to a large and representative sample of establishments in the U.S. manufacturing sector. Leveraging the extensive reach,
expertise, and administrative infrastructure of the Census Bureau, we developed, tested, and deployed a set of questions that were added to the 2015 Management and Organizational Practice Survey (MOPS).11 Our sample includes roughly 30,000 establishments that provided data on predictive analytics adoption and frequency of use in 2015, along with recall data for 2010. Responses to the survey are required by law, yielding a response rate of 70.9%. 12 Therefore, our study is significantly less affected by the respondent bias and lack of representativeness that typically plague other custom-designed surveys. 13
Predictive Analytics Use

Adding questions to Census surveys requires that they be applicable across a wide range of industry and firm settings and that they pass rigorous cognitive testing designed and implemented by dedicated Census employees and reviewed by stakeholders across a range of positions and backgrounds. After multiple rounds of development and cognitive testing, the focal question on the MOPS survey asks, “How frequently does this establishment typically rely on predictive analytics (statistical models that provide forecasts in areas such as demand, production, or human resources)?” Respondents may mark all that apply among Never, Yearly, Monthly, Weekly, and Daily, with separate columns for 2010 (recall) and 2015. We construct a measure of the extensive margin of predictive analytics use by identifying establishments that adopt with any frequency and comparing them to those that report “Never.” To capture the intensive margin, we first assign numeric values from 0 to 4 to each frequency category, with higher values representing higher frequencies of predictive analytics use. In cases where plants report multiple frequencies, we default to the highest one. We also explore using a normalized score based on taking the average of multiple responses for a given establishment (see Bloom et al. 2019).14 To explore different use cases for predictive analytics, we further combine information from
11 This survey was conducted in both 2010 and 2015 as a supplement to the Annual Survey of Manufactures (ASM). See Bloom et al. (2019) and Buffington et al. (2017) for more details on the MOPS survey. 12 For more details, please see the website for the MOPS: https://www.census.gov/programs-surveys/mops/technical-documentation/methodology.html#:~:text=The%202015%20MOPS%20had%20a,(URR)%20of%2070.9%25. 13 See Zolas et al. (2020) for a more-comprehensive discussion of the issues and related survey work. 14 The normalization is helpful for easy interpretation of coefficients in the regression analysis, and taking an average of the scores helps to account for the multiple responses given by a given establishment. Nevertheless, we find the results from both measures are consistent.
another MOPS question regarding where data analytics was applied at the plant. This question focuses on three main business functions: supply chain management (SCM), demand forecasting (DF), and product design (PD). We construct the corresponding use cases for predictive analytics conditional on the plant reporting having adopted predictive analytics and using data analytics in the corresponding function.15 Figures 1 and 2 present the adoption of predictive analytics by state and industry (3-digit NAICS). Both figures reveal a surprising fact: predictive analytics was widely diffused among manufacturing plants across almost all states and industries as early as 2010, with average adoption rates well over 70%. Further exploration using a balanced sample of roughly 18,000 establishments (those with data for both years)16 indicates a rather small change in the adoption of predictive analytics between 2010 and 2015. That is, the average rate of growth in the use of predictive analytics observed in our sample is under 1.5% per year.17 This slow rate of change is due in no small part to the high baseline adoption in 2010. If we think about predictive analytics as a “management technology” following Bloom et al. (2016), its diffusion likely reached a relatively high level of maturity – the “late majority” stage of the technology adoption S-curve (see Rogers 2010) – and thus its growth has slowed among the remaining population of firms. Remaining non-adopters will tend to be firms with low net benefits of adoption, due either to high costs or low anticipated returns. In addition to being informative about the timing and penetration of predictive analytics adoption, this empirical fact has important implications for our research approach. In particular, standard models for addressing time-invariant organizational heterogeneity through the use of firm or establishment fixed effects are of limited use in our empirical setting.
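As a concrete illustration of how the extensive- and intensive-margin measures described above can be constructed, consider the following sketch. The plant identifiers and response lists are hypothetical stand-ins, not actual MOPS microdata, and the function names are our own:

```python
# Hypothetical "mark all that apply" frequency responses for four plants;
# these are illustrative values, not actual MOPS survey records.
responses = {
    1: ["Never"],
    2: ["Yearly"],
    3: ["Monthly", "Daily"],
    4: ["Weekly"],
}

# Numeric intensity score assigned to each frequency category (0-4),
# with higher values representing more frequent use.
SCORE = {"Never": 0, "Yearly": 1, "Monthly": 2, "Weekly": 3, "Daily": 4}

def extensive_margin(resp):
    """1 if the plant reports any use at all (anything other than 'Never')."""
    return int(any(r != "Never" for r in resp))

def intensive_margin_max(resp):
    """Default rule from the text: take the highest reported frequency."""
    return max(SCORE[r] for r in resp)

def intensive_margin_norm(resp):
    """Alternative: average across multiple responses, then normalize to
    [0, 1], in the spirit of the normalized scores in Bloom et al. (2019)."""
    return sum(SCORE[r] for r in resp) / len(resp) / 4

for plant_id, resp in responses.items():
    print(plant_id, extensive_margin(resp),
          intensive_margin_max(resp), intensive_margin_norm(resp))
```

The "highest frequency" default mirrors the construction described in the text, while the normalized average implements the alternative scoring; both rank plants identically when each plant gives a single response.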
Recall that panel data methods only identify coefficients off of establishments that transition from not using predictive analytics to adopting it in our sample. Thus, they would be applicable to a very small and highly-selected group of plants whose defining feature is being in this “laggard” group of adopters (those that adopted between 2010 and 2015). Frontier users of these practices would be differenced out
15 Both questions are presented in Appendix Table A1. 16 The smaller number of observations in 2010, as well as in the balanced panel, is due to the sample rotation of the ASM. The ASM sample frame changes in years ending in “4” and “9”, which means that the establishments in our 2010 and balanced samples have to be in both the 2009-2013 and 2014-2018 sample frames. 17 The adoption rate of predictive analytics increased from 73% in 2010 to 80% in 2015, implying a 1.4% growth rate per year.
of any panel data estimation, likely biasing the average productivity attributed to these practices. Moreover, the 5-year gap in our two-year panel generates considerable measurement imprecision concerning the exact year of adoption. Therefore, we focus our empirical approach primarily on pooled OLS models for our two discrete years of data and include both industry-year fixed effects and a rich set of controls (many of which capture fixed or quasi-fixed organizational characteristics). For completeness, we report the less-informative results from plant fixed-effects models in the online appendix.

[Insert Figures 1 and 2 here]

Performance Data and Workplace Complements

Using plant-level identifiers maintained by the Census Bureau, we merge the MOPS with the Annual Survey of Manufactures (ASM), the Census of Manufactures (CMF), and the Longitudinal Business Database (LBD) to bring in information on detailed inputs (including accumulated and depreciated capital stocks, as well as input costs for labor, materials, and energy), outputs (total value of shipments and value-added), age, and whether the establishment belongs to a multi-establishment firm. We restrict the sample to observations with complete information on sales, labor, materials, energy, and the total number of employees for technical and disclosure-avoidance reasons. To explore potential organizational complements, we leverage several other measures in the ASM and the MOPS. To examine the relationship to tangible IT investments, we calculate IT capital stocks using capital expenditure on computer and peripheral data processing equipment from the ASM and CMF panel dating back to 2002, using a standard perpetual inventory approach and an industry-level deflator for hardware from the Bureau of Economic Analysis (BEA). We impute values for years in which values are missing18 and depreciate at the rate of 35% per year following Bloom et al.
(2014), Brynjolfsson and McElheran (2016), and Jin and McElheran (2019). 19 We
18 We impute the missing values using the average of IT investment from the closest preceding and following years with non-missing values, for establishment-years where capital expenditure on computer and data processing equipment is missing. For instance, if IT investment in 2008 is missing, we impute it using the average of the establishment's IT investment in 2007 and 2009, or the 2007 and 2010 values if 2009 is also missing. Similar logic applies to missing values in other years. A similar method was used in Bloom et al. (2014); see the appendix for more detail. 19 Non-IT capital is calculated using a similar approach, but with total non-IT capital expenditure (total capital expenditure less IT capital investment) instead. The industry-level deflator for the non-IT capital stock is obtained from the ASM Total Factor Productivity databases.
test the complementarity between IT and predictive analytics using the IT capital stock variable. The advantage of this measure is that it accounts for the overall stock: if it takes time for firms to adjust to and utilize novel IT investment, the stock measure captures this lagged effect. A reasonable concern is that we are unable to capture the effect of capitalized software (e.g., ERP investment), which might also play a significant role in facilitating the implementation of predictive analytics or otherwise boost productivity (Barth et al. 2020). To address this concern, we conduct several robustness tests in both the baseline performance analysis and the complementarity tests. First, we control separately for software and IT services expenditures from the ASM, in addition to IT capital stock. Alternatively, we use a measure summing all IT investments (hardware, software, and services) in place of the IT capital stock variable. In both cases, our results remain robust and consistent.

Next, we use information from the background characteristics in the MOPS regarding the percentages of managers and non-managers with a bachelor's degree. Combined with the total number of employees (from the ASM) and the number of managers (from the MOPS), we calculate the weighted average percentage of employees (both managers and non-managers) with a bachelor's degree, following Bloom et al. (2019). This measure is similar to those in the prior literature that use education as a proxy for human capital (Card 1999; Bresnahan et al. 2002; Arcidiacono et al. 2010). Another measure we exploit in the MOPS is an indicator of how the production process is organized.
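To make these variable constructions concrete, the following is a minimal sketch, not the authors' code: the gap-imputation rule from footnote 18, the perpetual-inventory IT capital stock with 35% depreciation, and the employment-weighted bachelor's-degree share following Bloom et al. (2019). Function names, the zero initial capital stock, and the unit deflator in the example are our own assumptions.

```python
def impute_gaps(series):
    """Fill missing (None) investment values with the average of the
    nearest non-missing values before and after (footnote 18's rule)."""
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            before = next((series[j] for j in range(i - 1, -1, -1)
                           if series[j] is not None), None)
            after = next((series[j] for j in range(i + 1, len(series))
                          if series[j] is not None), None)
            known = [x for x in (before, after) if x is not None]
            out[i] = sum(known) / len(known) if known else None
    return out

def perpetual_inventory(investments, deflators, delta=0.35):
    """K_t = (1 - delta) * K_{t-1} + I_t / P_t, assuming (our choice)
    a zero stock before the first observed year."""
    k, stocks = 0.0, []
    for inv, p in zip(investments, deflators):
        k = (1 - delta) * k + inv / p
        stocks.append(k)
    return stocks

def employee_degree_share(total_emp, n_managers, pct_mgr_ba, pct_nonmgr_ba):
    """Employment-weighted share of workers with a bachelor's degree,
    combining the MOPS manager and non-manager percentages."""
    n_nonmgr = total_emp - n_managers
    return (n_managers * pct_mgr_ba + n_nonmgr * pct_nonmgr_ba) / total_emp
```

For example, `impute_gaps([10.0, None, 14.0])` returns `[10.0, 12.0, 14.0]`, and a plant with 10 managers (80% with degrees) among 100 employees (non-managers at 10%) gets a weighted share of 0.17.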
We focus on establishments with high-flow production processes (including both cellular and continuous-flow manufacturing).20 These processes are characterized by a low product mix and high volume per product, and they also tend to be more capital-intensive, with higher levels of automation (Safizadeh et al. 1996). This contrasts with job shops, batch manufacturing facilities, and R&D-focused establishments, all of which are more likely to have a "jumbled flow" process design supporting flexible, high-mix but generally low-volume operations. The latter type is typically central to product design and prototyping (Hayes and Wheelwright 1979) and appears less amenable to highly structured approaches to management (McElheran et al. 2020).21
20 See Kiran (2019) for a detailed description of cellular manufacturing. 21 We grouped establishments characterized by batch production together with job shops. These establishments lie in between "job shops" and continuous-flow production in terms of their capital intensity, level of machine
We test empirically whether establishments with high-flow process designs are better positioned to capture larger benefits from the adoption of predictive analytics. Note that the production process is usually quasi-fixed, since the cost of transitioning is high. This is largely consistent with the empirical evidence in our data, where we observe almost no change in this variable: the percentage of plants transitioning into a continuous-flow production process is less than 0.7% per year between 2010 and 2015. Arguably, this stability is a useful feature for testing complementarity, since it allows us to estimate the effect of predictive analytics on performance given pre-existing cross-establishment differences (Aral et al. 2012).

Lastly, we construct an indicator for establishments that report monitoring 10 or more key performance indicators (KPIs). In the MOPS survey, establishments are categorized by the number of KPIs monitored: 0, 1-2, 3-9, or 10 or more. We take the top category as a proxy for high-intensity KPI monitoring to test our hypothesis that the practice of KPI monitoring is a complement to predictive analytics.

Table 1 presents the summary statistics for our baseline sample.22 Overall, we have a pooled sample of 51,000 observations from 2010 and 2015 with complete information for our empirical analysis (about 20,000 for 2010 and 31,000 for 2015). Table 2 reports the summary statistics for all key variables in this baseline sample. The average adoption rate for predictive analytics is well above 70%, with the majority of establishments reporting annual and/or monthly use (i.e., a frequency value of roughly 1.12). The average log total value of shipments and log total employment for establishments in our sample are around 10.37 and 4.56, respectively, which translate into about $32,000,000 in annual shipments and an average size of 96 employees.23 The mean age of these establishments is more than 24 years.
These establishments have, on average, about $175,000 of IT capital stock, and slightly over 15% of their workers hold a bachelor's degree. Finally, about 35% of the establishments reported having a continuous-flow production process, and
automation, as well as product standardization. Including them in the control group is likely to add measurement error, biasing our estimates downward; we would thus be estimating a lower bound of the effect rather than suffering omitted-variable bias (as we would if we did not control for them in the base group). We also control for R&D facilities separately. 22 Table A1 presents the key question from the MOPS regarding predictive analytics. See also the questionnaire for the 2015 MOPS on the U.S. Census Bureau website: https://www2.census.gov/programs-surveys/mops/technical-documentation/questionnaires/ma-10002_15_final_3-2-16.pdf 23 To cover a larger percentage of the manufacturing economy and all industries, the ASM selects a set of establishments into the certainty sample frame. For instance, establishments with employment greater than or equal to 1,000 in the 2012 Census, or establishments classified within an industry with less than
around 44% of them reported tracking 10 or more KPIs. With this discussion in mind, we now turn to our empirical methods.
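As a quick check, the level figures quoted above follow from exponentiating the reported log means, assuming the value of shipments is recorded in thousands of dollars (consistent with the $32 million figure; the variable names below are ours):

```python
import math

mean_log_shipments = 10.37   # log total value of shipments ($ thousands)
mean_log_employment = 4.56   # log total employment

shipments_usd = math.exp(mean_log_shipments) * 1_000  # ~ $31.9 million
employees = math.exp(mean_log_employment)             # ~ 95.6 employees
```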
4. Empirical Methods
Adoption and Correlation Tests

After providing descriptive statistics on adoption, our empirical analysis explores correlations between predictive analytics use and potential workplace complements, including IT capital stock, an indicator for plants with a more continuous-flow production process, the percentage of educated workers, and top-category KPI monitoring. If complementarities exist, we should observe higher adoption of predictive analytics among establishments that also report these characteristics (Brynjolfsson and Milgrom 2013). We estimate both linear probability and probit models, and explore a wide range of establishment and firm characteristics to address potential omitted-variable bias. We also control for geographic differences and industry-year fixed effects to account for any transitory industry-specific shocks.

Performance Analysis

To estimate the performance impact of predictive analytics, we take a conventional approach to modeling the establishment's production function (Brynjolfsson and Hitt 2000; Brynjolfsson and Hitt 2003; Bloom et al. 2012; Tambe and Hitt 2012). As our baseline specification, we consider a log-transformed Cobb-Douglas production function in equation (1):