Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas Alex Deng Ulf Knoblich Jiannan Lu∗ Microsoft Corporation Microsoft Corporation Microsoft Corporation Redmond, WA Redmond, WA Redmond, WA
[email protected] [email protected] [email protected] ABSTRACT variance of a normal distribution from independent and identically During the last decade, the information technology industry has distributed (i.i.d.) observations, we only need to obtain their sum adopted a data-driven culture, relying on online metrics to measure and sum of squares, which are the corresponding summary statis- 1 and monitor business performance. Under the setting of big data, tics and can be trivially computed in a distributed fashion. In data- the majority of such metrics approximately follow normal distribu- driven businesses such as information technology, these summary tions, opening up potential opportunities to model them directly statistics are often referred to as metrics, and used for measuring without extra model assumptions and solve big data problems via and monitoring key performance indicators [15, 18]. In practice, it closed-form formulas using distributed algorithms at a fraction of is often the changes or differences between metrics, rather than the cost of simulation-based procedures like bootstrap. However, measurements at the most granular level, that are of greater inter- certain attributes of the metrics, such as their corresponding data est. In the context of randomized controlled experimentation (or generating processes and aggregation levels, pose numerous chal- A/B testing), inferring the changes of metrics can establish causal- lenges for constructing trustworthy estimation and inference pro- ity [36, 44, 50], e.g., whether a new user interface design causes cedures.