Bulk Volume Trade Classification and Informed Trading*

Preliminary and Incomplete: Please do not cite or circulate this version.


Allen Carrion University of Memphis

Madhuparna Kolay University of Portland

This Draft: January 2020

Abstract

We document that the existing evidence that bulk volume trade classification (BVC) can measure informed trading arises largely due to misspecified tests. In particular, simulations show that these tests detect spurious relationships in data containing only uninformed liquidity trades. We also assess the performance of BVC order imbalances in the NASDAQ HFT dataset, showing that BVC order imbalances underperform conventional order imbalance measures that are based on aggressor flags in detecting informed trading. When we isolate the component of order flow that BVC designates as passive informed trading, we find that this component of order flow fails to predict future returns with the correct sign. On balance, our evidence supports the use of conventional order imbalance measures to identify informed trading.

∗ We thank NASDAQ and Frank Hatheway for supplying the HFT dataset.

1. Introduction

A standard approach to identifying informed trading in microstructure data is to use order imbalances (or unexpected order imbalances) as a proxy. Order imbalances measure the difference between aggressive buying volume and aggressive selling volume in an interval, and sometimes normalize this value by total volume traded.1 This has been a longstanding practice and has been validated empirically, but this approach has a well-known shortcoming. Order imbalances measure the amount of aggressive trading in an interval, and their use as a proxy for informed trading rests on the assumption that, on balance, informed traders tend to demand liquidity. However, as pointed out by Harris (1998), Kaniel and Liu (2006), Baruch, Panayides, and Venkataraman (2017), and many others, there are conditions that can motivate informed traders to trade passively. This has been confirmed empirically (Kaniel and Liu (2006), others).

The validity of the use of order flow imbalances as a proxy for informed trading rests on the assumption that, while some informed traders may trade passively, they generally demand more liquidity than they supply. O’Hara (2015) and Easley, Lopez de Prado, and O’Hara (2016) (ELO (2016) hereafter) suggest that this assumption has weakened in modern markets, as informed traders presumably have increased their use of algorithms that trade passively. ELO (2016) argue that, as a result of increases in passive trading by informed traders, “the notion of the active side of the trade signaling underlying information is undermined.”

Regardless of whether aggressive trading remains correlated with informed trading in modern markets, more accurate methods of identifying informed trading that outperform order imbalances by detecting both aggressive and passive informed trading would be desirable. ELO (2016) introduce the bulk volume classification technique (or BVC), which potentially addresses this issue. As described in ELO (2016), BVC “aggregates trades over short time or volume intervals and then uses a standardized price change between the beginning and end of the interval to approximate the percentage of buy and sell order flow.” The authors suggest that this technique identifies imbalances from informed traders whether they trade aggressively or passively. They provide the following interpretation of their main empirical result: “What matters for our purposes is that order imbalance created from bulk volume works, in the sense that it is positively related to the high-low spread [signifying a response to informed trading], but that order imbalance created from the tick rule (or even derived directly from the aggressor flag) does not work.”

1 Some studies calculate imbalances using number of trades or dollar volumes instead of share volumes. Examples include Chordia, Roll and Subrahmanyam (2002), Chordia and Subrahmanyam (2004), and Kim and Stoll (2014).

If BVC indeed has these properties, this technique offers an invaluable tool for researchers. However, the empirical evidence supporting these claims is limited. ELO (2016) provide strong arguments motivating BVC, but their empirical validation of BVC is limited to a single test asset and a single BVC parameterization. We are aware of only two other studies that conduct similar analyses (Panayides, Shofi, and Smith (2019) and Chakrabarty, Pascual, and Shkilko (2015)), and they primarily rely on the methodology employed in ELO (2016) and a similar test specification that was proposed in an earlier working paper version of ELO (2016) and later abandoned.2,3 Both studies qualitatively reproduce the main ELO (2016) result in samples of equity data using many stocks, but Chakrabarty, Pascual, and Shkilko (2015) express qualifications.4 Chakrabarty, Pascual, and Shkilko (2015) also use an older version of BVC based on the ELO working paper, which is similar but not identical.

2 The working paper version of ELO (2016) used a test with a high-low spread (HL) as a liquidity proxy. The final version of ELO (2016) states that the HL spread is contaminated by “fundamental variance” and replaces it with the Corwin-Schultz spread in this test. The final version of their test is discussed in detail in Section 3.

3 Panayides, Shofi, and Smith (2019) also conduct an alternate test that finds that BVC predicts returns around corporate events. Andersen and Bondarenko (2014a, 2014b, and 2015) also test or discuss BVC. However, they focus on its accuracy in identifying the aggressor side of trades or its use in VPIN calculations.

In this paper, we further investigate the ability of BVC to detect informed trading. First, we revisit the initial empirical evidence provided by ELO (2016). We first discuss some conceptual concerns with their tests. We then repeat their tests in simulated data with no informed trading, and find that their tests are severely misspecified. The ELO (2016) regressions never fail to reject the null hypothesis of no informed trading at the 1% level in the simulated data. It appears that these tests are measuring something other than the relationship between BVC and informed trading. While we believe that the original evidence in support of BVC’s ability to detect informed trading is unreliable, this does not rule out the possibility that BVC may still have the properties claimed by ELO (2016). Therefore, we propose two alternative tests and conduct them on a sample of equity trading data. Both of these tests are motivated by the idea that informed trading should positively predict future returns, and are related to tests previously conducted in the literature (cites). In the first set of tests, we compare order imbalances constructed with BVC against conventional order imbalances constructed from the known trade signs (true order imbalances). In multiple specifications with several return horizons and parameterizations of BVC, true order imbalances outperform BVC order imbalances in every case, and the coefficients on BVC order imbalances are often insignificant or predict future returns with the wrong sign. In a second set of tests, we derive a measure from BVC and the true order imbalance that should measure passive informed trading if BVC performs as claimed, and test the ability of this measure to predict future returns. This measure fails to do so in all specifications examined, and often predicts returns with the wrong sign. We repeat these tests using order imbalances constructed using trades signed with the Lee and Ready (1991) method instead of true trade signs, and find similar results (available next version).

4 Chakrabarty, Pascual, and Shkilko (2015) note that the relationship between BVC and the HL spread may be driven by correlations with volatility. They also replace HL with returns and alternate liquidity proxies and find mixed results; BVC is generally found to be positively related to returns and contemporaneous liquidity proxies but does not uniformly outperform conventional order imbalance measures.

Our results suggest that researchers should be wary of employing BVC based on expectations of its superior ability to measure informed trading. We find no evidence that BVC outperforms conventional order imbalances constructed from true trade signs or Lee and Ready (1991) trade signs in this regard; it actually underperforms significantly in our tests.

The rest of this paper is organized as follows. Section 2 describes and discusses the BVC algorithm. Section 3 reviews the existing evidence used to support the claim that BVC identifies informed trading. Section 4 presents results from alternate tests of the ability of BVC to identify informed trading. Section 5 concludes.

2. The Bulk Volume Classification Algorithm

The bulk volume classification (or BVC) method does not classify individual trades as do conventional trade classification algorithms such as the tick rule or the Lee and Ready method. Instead, BVC aggregates trades into bars, by either volume or time.5 ELO (2012) and ELO (2016) argue for the importance of aggregation by volume in modern markets, so we focus on volume bars in this paper. The CDF of a Student's t-distribution, evaluated at the standardized price change between the close of a bar and the close of the prior bar, is then used to calculate the percentage of buys versus sells per bar. Since trade-by-trade classification is replaced by a bar-by-bar probability of buys versus sells, the aggregation may lead to greater efficiency with respect to data usage.

5 Trade bars have also been used in Chakrabarty, Pascual and Shkilko (2015) and a working paper version of ELO (2016).

Specifically, the buyer-initiated volume is calculated as follows (Eq. (1), ELO (2016)):

$$\hat{V}^{B}_{\tau} = V_{\tau} \times t\left(\frac{\Delta P_{\tau}}{\sigma_{\Delta P}},\ df\right) \tag{1}$$

where $\hat{V}^{B}_{\tau}$ is the estimated buyer-initiated volume during bar $\tau$, $V_{\tau}$ is the aggregated volume, $\Delta P_{\tau}$ is the price change between consecutive bars, $\sigma_{\Delta P}$ is the standard deviation of price changes, and $t$ is the cumulative distribution function of Student’s t-distribution with $df$ degrees of freedom. The buyer-initiated volume is then converted into an order imbalance with the following formula (adapted from Eq. (5), ELO (2016)):

$$OIB\_BVC_{\tau} = 2\,\frac{\hat{V}^{B}_{\tau}}{V_{\tau}} - 1 \tag{2}$$

The usefulness of BVC can be viewed from two perspectives. First, BVC is a potential competitor to conventional trade signing algorithms. In this context, its performance can be measured by its accuracy in reproducing the true trade signs (or “aggressor flags”) in the underlying data. The empirical evidence on BVC’s accuracy as a trade signing algorithm is mixed. In various tests in the literature, it does not generally outperform other popular trade signing algorithms, but there is some evidence that BVC may have accuracy advantages over the competing algorithms in certain settings, and it does correlate with true trade signs.6 However, BVC has other advantages such as computational efficiency and minimal data requirements that may make it attractive for specific applications. Second, and more importantly for this study, BVC may outperform conventional order imbalances in measuring the information in order flow regardless of its performance from a trade signing perspective. Stated alternately, BVC may disagree with the true trade signs but still more accurately reflect the information in order flow than the true trade signs. Conventional order imbalances are a proxy for informed trading under the assumption that informed trading tends to be aggressive, while it is possible that BVC actually measures informed trading whether it is aggressive or passive. We discuss the relevant evidence for this potential property of BVC in Section 3 below.

6 See Panayides, Shofi, and Smith (2019), Chakrabarty, Pascual, and Shkilko (2015), ELO (2016), and Andersen and Bondarenko (2015).
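The calculation in Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It is not the implementation used in this paper or in ELO (2016): the function names, the numerical integration of the Student's t density, the example prices, and the degrees-of-freedom value passed in the usage lines are all illustrative assumptions.

```python
import math


def student_t_cdf(x, df, steps=2000):
    """Student's t CDF via Simpson's rule on the density (stdlib only).

    `steps` must be even. Accuracy is adequate for illustration only.
    """
    if x == 0.0:
        return 0.5
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return norm * (1.0 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(x)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3.0  # integral of the density from 0 to |x|
    return 0.5 + math.copysign(area, x)


def bvc_order_imbalance(bar_closes, bar_volumes, df):
    """Apply Eqs. (1)-(2) to every bar after the first.

    The volatility is the in-sample standard deviation of bar-to-bar
    price changes (an assumption about the estimation window).
    """
    dps = [b - a for a, b in zip(bar_closes, bar_closes[1:])]
    mean = sum(dps) / len(dps)
    sigma = math.sqrt(sum((d - mean) ** 2 for d in dps) / (len(dps) - 1))
    buy_volumes, imbalances = [], []
    for dp, volume in zip(dps, bar_volumes[1:]):
        buy_fraction = student_t_cdf(dp / sigma, df)
        buy_volumes.append(volume * buy_fraction)  # Eq. (1)
        imbalances.append(2.0 * buy_fraction - 1.0)  # Eq. (2)
    return buy_volumes, imbalances


# Hypothetical bar closes and volumes; df=0.25 is an illustrative choice.
closes = [100.0, 100.5, 100.25, 100.25, 101.0]
volumes = [1000, 1000, 1000, 1000, 1000]
buys, oibs = bvc_order_imbalance(closes, volumes, df=0.25)
```

A bar with a positive price change receives a positive imbalance, a bar with no price change receives an imbalance of exactly zero, and the sign and size of the imbalance depend only on the standardized price change, not on any trade-level aggressor information.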

BVC may be particularly advantageous in the context of markets where high-frequency trading is common. ELO (2016) assert that high speeds of trading, increased fragmentation, and higher rates of order submissions and cancellations have changed the mechanics of trading in ways that reduce the classification accuracy of conventional trade signing algorithms.7 ELO (2016) also suggest that in the modern market environment the aggressor side of a trade is now less likely to correlate with the informed side of the trade.

One potential drawback of BVC is that its output is not entirely deterministic: the classification of a given group of trades depends on implementation choices. The grouping of trades into bars can differ if the starting point changes. For example, consider two researchers working on partially overlapping trade samples with different starting dates. If they choose to fill bars continuously over the full sample (as in ELO (2016)), they would potentially arrive at different bars for the overlapping trades. The situation is similar if two researchers use the same sample but one chooses to fill bars continuously while the other restarts with an empty bar periodically (possibly every stock-month as in Panayides, Shofi, and Smith (2019) or every trading day). Volatility estimation can also cause uncertainty in the output. The volatility parameter in Eq. (1) (the standard deviation of price changes) is estimated in-sample using the full sample period in ELO (2016). Therefore, the volatility and BVC classifications for a specific group of trades could differ depending on the starting or ending dates of the samples they are included in. Further, if the researcher is concerned with time-varying volatility, it would clearly be reasonable to re-estimate volatility over shorter intervals. This would also potentially alter the output. Another related issue is that there is little guidance in the literature on how to select the bar size for BVC. These problems could be mitigated if researchers coordinated on a set of implementation details, but this has not happened as of this writing. With the exception of some sensitivity analysis around bar size, these effects have not been quantified in the literature, so it is not clear if they are significant in practice.8

7 There is disagreement in the literature regarding this assertion. Chakrabarty, Pascual, and Shkilko (2015) and Carrion and Kolay (2019) both report evidence that conventional trade signing algorithms perform well in data from fast and fragmented markets, and Panayides, Shofi, and Smith (2019) report mixed evidence.
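The dependence of bar formation on the starting point can be seen in a small Python sketch. The helper name `volume_bars`, the rule of splitting a trade that overflows a bar, and the trade data are assumptions made for illustration; they are not taken from ELO (2016) or from our implementation.

```python
def volume_bars(trades, bar_size):
    """Group (price, volume) trades into consecutive bars of `bar_size` volume.

    Returns the closing price of each completed bar. A trade that overflows
    a bar is split, with the excess carried into the next bar (an assumption).
    """
    closes = []
    filled = 0
    for price, volume in trades:
        remaining = volume
        while filled + remaining >= bar_size:
            remaining -= bar_size - filled
            closes.append(price)
            filled = 0
        filled += remaining
    return closes


# Ten hypothetical trades of 4 contracts each at steadily rising prices.
trades = [(100.0 + 0.25 * i, 4) for i in range(10)]

closes_a = volume_bars(trades, bar_size=10)      # sample starts at the first trade
closes_b = volume_bars(trades[1:], bar_size=10)  # sample starts one trade later
```

In this example the two samples share nine of ten trades, yet no bar closes on the same trade in both, so the bar-to-bar price changes that feed Eq. (1) differ for the overlapping trades.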

3. Existing Evidence on BVC and Informed Trading

First, we describe and discuss the evidence presented in ELO (2016). Next, we report the results of a simulation exercise where we assess the statistical properties of the ELO (2016) test specification by repeating their test in simulated data with no informed trading.

3.1 The ELO (2016) Test

The main evidence provided by ELO (2016) in support of the ability of BVC to identify informed trading comes from a regression designed to associate time variation in liquidity with order flow imbalance measures.9 This is based on the idea that market makers will withdraw liquidity when informed traders are active. The regression specification is:

$$CS\_SPRD_{t} = \beta_{1}\,|OIB\_BVC_{t}| + \beta_{2}\,|OIB\_TICK_{t}| + \beta_{3}\,CS\_SPRD_{t-1} \tag{3}$$

where CS_SPRD is the Corwin and Schultz (2012) high-low spread (Corwin-Schultz spread hereafter), OIB_BVC is the order flow imbalance constructed using BVC, OIB_TICK is the order flow imbalance constructed using the tick rule, and t indexes volume bars. The lagged value of CS_SPRD is included as a control for autocorrelation, and t-statistics are Newey-West adjusted.

8 See Chakrabarty, Pascual, and Shkilko (2015), Masot, Nawn, and Pascual (2018), and Panayides, Shofi, and Smith (2019).

9 ELO (2016) also present related evidence graphically in their Figure 5, but without an accompanying statistical test.

The results of this test are presented in Table 6 of ELO (2016). Three versions of the regression model are estimated: one with OIB_BVC, one with OIB_TICK, and a third with both. All three versions include the lagged CS_SPRD. The coefficient on |OIB_BVC| is positive and significant and the coefficient on |OIB_TICK| is negative and significant. These results are consistent across models. ELO (2016) interpret these results as evidence that BVC works to measure informed trading, and that the tick rule order imbalances identify aggressive traders that tend to be uninformed. We discuss several observations and concerns with this test below.

First, we note that the main conclusion is only supported by the result of a single time series regression, using a single test asset and a single BVC parameterization. The rationale for this choice is not explained in the paper, and a separate analysis in ELO (2016) uses three test assets and ten BVC parameterizations for each. Regardless of the rationale, it seems difficult to draw strong conclusions from these results without evidence of robustness.

Second, this test relies on reductions in liquidity provision by market makers to detect informed trading. In contrast to other tests in the literature that use permanent price impacts, future returns, or other price-discovery-based measures, this test relies on a near-immediate market maker response to detect informed trading. As Kacperczyk and Pagnotta (2016) state, “most theory-motivated information measures, such as the bid–ask spread and the price impact of trades (Glosten and Milgrom, 1985; Kyle, 1985), rely on the notion that the presence of informed traders is common knowledge to other market participants.” This is essentially a joint hypothesis problem, where the test relies not only on the presence of informed trading but also on the nature of the response by market makers. This immediate response is inconsistent with some theories where market makers learn over a series of trades or informed traders attempt to hide their informed status.10 If market makers respond slowly to informed imbalances, or withdraw liquidity while patient liquidity demanders continue to post quotes at the old prices, these tests will misclassify the imbalances as uninformed. We do not suggest that this issue invalidates this test, as it is based on a theoretically sound response to informed trading. It seems reasonable, however, to take the nature of this test into account when determining how much weight to place on these results and as motivation to seek confirmation from other tests.

Third, it is not clear that the Corwin-Schultz spread is well-suited to this test. This test is designed to associate time variation in liquidity provision with the time variation in informed trading captured by the explanatory variables. The Corwin-Schultz spread relies only on trade data, and ELO (2016) motivate their choice of this liquidity measure with concerns about the reliability of quote data from fast markets. The Corwin-Schultz spread as originally introduced, however, estimates an average spread over a series of observations and treats time variation between adjacent observations as noise to be averaged out rather than as meaningful time variation. In their monthly analysis, Corwin and Schultz (2012) discard all stock-months where they are not able to average over at least 12 trading days. Corwin and Schultz (2012) also find time variation in individual observations when using simulated data with a constant bid-ask spread, including negative spread observations.11 Similarly, in our analysis below we find significant time variation in the Corwin-Schultz spread, including negative observations, in simulated data with a constant trading cost function.12 ELO (2016) use the individual spread observations directly without smoothing or aggregation.

10 For example, in Kyle (1985) the informed trader trades slowly to blend in with liquidity traders, and its private information is only revealed gradually in prices.

11 Corwin and Schultz (2012) investigate alternate adjustments for negative spreads and find that setting negative spreads to 0 before averaging results in the best performance.

12 Not tabulated, available upon request.

Fourth, the BVC order imbalance is compared only against the tick rule order imbalance and not the true order imbalance. One of the most important hypothesized properties of BVC is its potential advantage over order imbalances constructed from true trade signs in identifying informed trading.

Fifth, as others have pointed out in similar tests, there is a possibility that there is a mechanical bias in regressions of this form.13 The Corwin-Schultz spread, used as the dependent variable, is essentially a nonlinear transformation of the high-low range in a bar. The BVC order imbalance is essentially a nonlinear transformation of the return in the bar, and the regressions use the absolute value of this variable as an explanatory variable. The high-low range in a bar is clearly mechanically related to the absolute value of the return in the bar, regardless of the presence of informed trading.14 Therefore, one might expect regressions of the high-low range on the contemporaneous absolute value of the return to be misspecified. It is not clear if the transformations of the high-low range into the Corwin-Schultz spread and the absolute value of the return into BVC break this mechanical relationship. While many of the other concerns we raise in this section are qualitative and somewhat subjective, this issue is testable. We investigate the properties of this regression specification in the next section.
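The mechanical link between the high-low range and the absolute return of a bar can be illustrated with a short Python sketch. It uses a driftless Gaussian random walk with no trading costs and no informed trading; the function names, bar counts, and seed are illustrative assumptions, and the sketch does not apply the Corwin-Schultz or BVC transformations.

```python
import math
import random


def bar_range_and_abs_return(n_bars, steps_per_bar, seed=0):
    """Simulate a driftless random walk and summarize it bar by bar."""
    rng = random.Random(seed)
    price = 0.0
    ranges, abs_returns = [], []
    for _ in range(n_bars):
        open_price = high = low = price
        for _ in range(steps_per_bar):
            price += rng.gauss(0.0, 1.0)
            high = max(high, price)
            low = min(low, price)
        ranges.append(high - low)                   # high-low range of the bar
        abs_returns.append(abs(price - open_price)) # absolute bar return
    return ranges, abs_returns


def correlation(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


ranges, abs_returns = bar_range_and_abs_return(n_bars=2000, steps_per_bar=50)
rho = correlation(ranges, abs_returns)
```

By construction the range of each bar is at least as large as its absolute return, and the two series are positively correlated even though the simulated prices contain no information events.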

3.2 Simulations

In this section, we conduct a simulation exercise to investigate the properties of the ELO (2016) regression tests. We simply reproduce these regressions repeatedly in simulated data with no informed trading. If these tests are well specified, the coefficients of interest will be zero on average and their test statistics will reject the null hypothesis at an appropriate rate for the specified significance level.

13 See Chakrabarty, Pascual, and Shkilko (2015) and Andersen and Bondarenko (2015).

14 We are not aware of an analytical proof of this relationship, but it is easily demonstrated with simulations.

ELO (2016) conduct their tests only on BVC order imbalances constructed with 10,000 contract volume bars. However, they advance no arguments that this bar size is optimal and provide no cautions that their results may be sensitive to bar size. They use bar sizes from 1,000 to 25,000 contracts in other sections of their paper. Therefore, we reproduce tests on order imbalances constructed with 1,000, 10,000, and 25,000 contract bars. In each test, we follow ELO (2016) in using the same bar size to calculate the Corwin-Schultz spread as that used to calculate the BVC order imbalances.

To generate data with no informed trading, we use a version of the Glosten and Harris (1988) model with the adverse selection component of the spread set to zero. Our implementation of the model can be summarized as follows:

$$m_{t} = m_{t-1} + \varepsilon_{t} \tag{4}$$

$$\varepsilon_{t} \sim N(0, \sigma_{\varepsilon}^{2}) \tag{5}$$

$$P_{t} = m_{t} + Q_{t}\left(c_{0} + c_{1}(V_{t} - 1)\right) \tag{6}$$

where $m_t$ is the pre-trade midpoint (or, equivalently in this setting, the fair value of the security) for trade t, $P_t$ is the trade price for trade t, $Q_t$ is a trade sign indicator variable for trade t taking the value of 1 for a buy trade and −1 for a sell trade, $V_t$ is the unsigned volume of trade t, $c_0$ and $c_1$ are trading cost parameters, and $\varepsilon_t$ is the change in pre-trade midpoint due to public information. We subtract one contract from $V_t$ so $c_0$ will match the half bid-ask spread exactly for a one contract trade. Note that we simulate the model in event time rather than calendar time. In the data generated from this model, trades have no permanent price impact at all.

For our main simulations, we generate signed trade volumes from a normal distribution, round to the nearest contract, and use these signed volumes to calculate $Q_t$ and $V_t$. We also adjust small trade sizes up to one contract before rounding to eliminate zero-contract trades. This procedure generates less skewness in trade size than that reported by ELO (2016), which we believe is conservative in that it will likely lead to data with better-behaved statistical properties.
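The data-generating process in Eqs. (4) to (6), together with the trade size procedure just described, can be sketched in Python as follows. The function name, the zero mean of the signed-volume distribution, and every parameter value in the usage lines are illustrative assumptions; they are not the values reported in Table 1.

```python
import math
import random


def simulate_uninformed_trades(n_trades, m0, sigma_eps, c0, c1, size_sd, seed=0):
    """Simulate trades with temporary price impact only (no adverse selection).

    Returns a list of (midpoint, price, sign, volume) tuples in event time.
    """
    rng = random.Random(seed)
    midpoint = m0
    trades = []
    for _ in range(n_trades):
        # Eqs. (4)-(5): the midpoint moves only on public information.
        midpoint += rng.gauss(0.0, sigma_eps)
        # Signed volume from a normal distribution (zero mean is assumed here).
        signed = rng.gauss(0.0, size_sd)
        if abs(signed) < 1.0:
            signed = math.copysign(1.0, signed)  # lift small sizes to one contract
        signed = round(signed)
        sign = 1 if signed > 0 else -1
        volume = abs(signed)
        # Eq. (6): half-spread plus a linear cost for larger trades.
        price = midpoint + sign * (c0 + c1 * (volume - 1))
        trades.append((midpoint, price, sign, volume))
    return trades


# Hypothetical parameter values chosen only to make the example run.
trades = simulate_uninformed_trades(
    n_trades=1000, m0=1000.0, sigma_eps=0.05, c0=0.125, c1=0.001, size_sd=10.0
)
```

Because the trade sign never enters the midpoint update, the deviation of each trade price from the midpoint is purely temporary, which is the sense in which the simulated order flow is uninformed.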

The price dynamics in these simulations contain no features resembling informed trading and can be described simply as follows. Future midpoint price changes are uncorrelated with trades. Trade prices only deviate from the pre-trade midpoints by temporary price impacts which disappear by the next trade. These temporary price impacts include half the bid-ask spread plus a linear function of the volume traded, which can be viewed as the effect of larger trades “walking the book.” One could employ richer models that simulate other microstructure effects such as autocorrelated order flow induced by order-splitting, inventory effects, time-varying volatility, or imperfect resiliency where temporary price impacts decay slowly instead of disappearing by the next trade. A well-specified test for informed trading should be robust to these effects, which could generate patterns that share some features of informed trading and can be present in a market with no informed trading. However, given the results for the simple simulation tests we report below, we consider richer simulations unnecessary in this case.

For each bar size, we simulate 1,000 trials. For each trial we generate a dataset of 128,579,415 trades, matching the ELO (2016) ES sample. We choose our other simulation parameters to approximately match the summary statistics reported by ELO (2016) for their sample and other known characteristics of the ES market. Table 1 reports the simulation parameters. Figure 1 shows the price path for a randomly selected trial. Next, in each trial we form volume bars and calculate the BVC order imbalances and Corwin-Schultz spreads as described above. In place of tick rule order imbalances, we calculate true order imbalances for each bar using our simulation-generated trade signs. The true order imbalances are defined as:

$$OIB\_TRUE_{t} = \frac{BUY\_VOLUME_{t} - SELL\_VOLUME_{t}}{VOLUME_{t}} \tag{7}$$

where BUY_VOLUME_t and SELL_VOLUME_t are the share volumes designated as buys and sells in bar t and VOLUME_t is the total share volume in bar t.
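Eq. (7) can be computed bar by bar as in the following Python sketch. The function name, the choice to split a trade across bars when it overflows the current bar, and the decision to drop an incomplete final bar are assumptions made for illustration.

```python
def true_order_imbalances(trades, bar_size):
    """Compute the Eq. (7) imbalance for each completed volume bar.

    `trades` is an iterable of (sign, volume) pairs with sign +1 for a buy
    and -1 for a sell, and integer volumes. Trades that overflow a bar are
    split across bars; an incomplete final bar is dropped (both assumptions).
    """
    imbalances = []
    buy = sell = 0
    for sign, volume in trades:
        remaining = volume
        while remaining > 0:
            take = min(remaining, bar_size - (buy + sell))
            if sign > 0:
                buy += take
            else:
                sell += take
            remaining -= take
            if buy + sell == bar_size:
                imbalances.append((buy - sell) / bar_size)
                buy = sell = 0
    return imbalances


# Hypothetical signed trades grouped into 10-contract bars.
example = [(1, 6), (-1, 4), (1, 3), (-1, 7), (-1, 5)]
oib_true = true_order_imbalances(example, bar_size=10)
```

In the example, the first bar holds 6 buys and 4 sells and the second holds 3 buys and 7 sells, so the imbalances are 0.2 and −0.4; the last five contracts do not complete a bar and are ignored.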

Table 2 reports selected summary statistics from the simulated trades. In general, these are similar to the ELO (2016) sample in the main dimensions. Of particular interest, the mean Corwin-Schultz spread for the 10,000 contract bar size simulations is 22 bps, compared to 23 bps in ELO (2016). The one notable difference is that the trade size distribution is less skewed than the ELO (2016) sample as mentioned above; while the means are similar, the median in our sample is 4 contracts compared to 1 contract in the ELO (2016) sample. While we attempt to simulate data similar to that used in ELO (2016), we emphasize that this is for comparability only; the validity of their regression model is not purported to be contingent on specific features of the test asset or market, so we should not require a close match to their data in order to investigate the regression specification properties of interest.

In each trial, we run variations of the following regression:

$$CS\_SPRD_{t} = \alpha + \beta_{1}\,|OIB\_BVC_{t}| + \beta_{2}\,|OIB\_TRUE_{t}| + \beta_{3}\,CS\_SPRD_{t-1} \tag{8}$$

where all variables are as defined above. This regression is similar to Eq. (3), which corresponds to Eq. (6) in ELO (2016). Aside from the inclusion of an intercept, our only deviation from Eq. (3) is the replacement of the tick rule order imbalance with the true order imbalance. This choice is motivated by the discussion in Section 3.1 above. We estimate three versions of the model that vary only in the included imbalance variables: one version with only the BVC order imbalance, one with only the true order imbalance, and another with both. All versions include the lagged Corwin-Schultz spread. We calculate Newey-West t-statistics.

Table 3 reports the average coefficients from Eq. (8) across all 1,000 trials for each combination of bar size and regression model, and t-statistics that test whether the average coefficient is different from 0. Panel A reports results for 1,000 contract bars. Models 1 and 3 show that the average coefficient on |OIB_BVC| is significantly positive, and Models 2 and 3 show that the average coefficient on |OIB_TRUE| is significantly negative. These results are qualitatively consistent with the results reported in ELO (2016) for the 10,000 contract bar size, although the estimated coefficients are smaller in magnitude. Panel B reports results for 10,000 contract bars. Models 1 and 3 show that the average coefficient on |OIB_BVC| is significantly negative, and Models 2 and 3 show that the average coefficient on |OIB_TRUE| is insignificant. Panel C reports results for 25,000 contract bars. The results are similar to those for the 10,000 contract bars. Again Models 1 and 3 show that the average coefficient on |OIB_BVC| is significantly negative, and Models 2 and 3 show that the average coefficient on |OIB_TRUE| is insignificant.

For all three parameterizations of BVC, these tests indicate a statistically significant relationship between BVC and informed trading, despite being run on data with no informed trading. Further, the relationship is not stable across BVC parameterizations. Taken literally, when we divide the data into 1,000 contract bars, the tests indicate that BVC measures the informed side of the order flow. However, for 10,000 and 25,000 contract bars, the tests indicate that BVC measures the uninformed side. While these are clearly incorrect interpretations from spurious results, it is noteworthy that the conclusions a researcher employing this test might form are so sensitive to the BVC parameterization choices.

Table 4 provides an alternate perspective on these regressions. While Table 3 considers results aggregated across all trials, Table 4 considers the distribution of Newey-West t-statistics on the individual trials and reports associated rejection rates for the null hypotheses of no relationship between the imbalance variables and informed trading. The data presented in Table 4 allow us to evaluate the size of the tests; given the lack of informed trading in the data, well-specified tests should reject the null at approximately the specified significance level. Panels A and B report rejection rates for hypothesis tests on the coefficient on |OIB_BVC|. Panel A uses tests from the regression model that omits |OIB_TRUE| (corresponding to Model 1 in Table 3), and Panel B uses tests from the regression model that controls for |OIB_TRUE| (corresponding to Model 3 in Table 3). Mean t-statistics are similar for both specifications and rejection rates are identical. We use a 1% significance threshold.
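The rejection rates described above can be tabulated from a set of per-trial t-statistics as in the following Python sketch. The function name, the dictionary keys, the use of standard normal critical values, and the two synthetic sets of t-statistics are illustrative assumptions and do not reproduce Table 4.

```python
import random
from statistics import NormalDist


def rejection_rates(t_stats, alpha=0.01):
    """Share of trials rejecting each null at level `alpha`.

    Uses standard normal critical values (a large-sample assumption).
    """
    z = NormalDist()
    two_sided_crit = z.inv_cdf(1 - alpha / 2)
    one_sided_crit = z.inv_cdf(1 - alpha)
    n = len(t_stats)
    return {
        # H0: coefficient = 0
        "two_sided": sum(abs(t) > two_sided_crit for t in t_stats) / n,
        # H0: coefficient <= 0, rejected by large positive t
        "null_le_zero": sum(t > one_sided_crit for t in t_stats) / n,
        # H0: coefficient >= 0, rejected by large negative t
        "null_ge_zero": sum(t < -one_sided_crit for t in t_stats) / n,
    }


rng = random.Random(0)
# Synthetic t-statistics from a well-specified test under the null.
well_specified = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Synthetic t-statistics from a test with a strong negative bias.
biased = [rng.gauss(-30.0, 1.0) for _ in range(1000)]

rates_ok = rejection_rates(well_specified)
rates_biased = rejection_rates(biased)
```

For the well-specified case the rates should be close to the 1% level, while the biased case rejects the two-sided null and one of the one-sided nulls in every trial and never rejects the other, which is the 100% or 0% pattern discussed in the text.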

The results in Table 4 Panels A and B are extreme. The two-sided test of the hypothesis that the coefficient on |OIB_BVC| is equal to zero is rejected for 100% of trials for all parameterizations, with or without the |OIB_TRUE| control. Every one of the 1,000 trials incorrectly rejects the null in every specification tested. Similarly, the one-sided tests all reject at either 100% or 0% rates. Recall that we are using a 1% significance level, so a well-specified test would reject in approximately 1% of trials.15 In summary, in both models and for all three bar sizes, we observe a Type I error in every single trial for the two-sided test and for one of the one-sided tests. The asymmetry in the one-sided tests points to a bias, but the direction of the bias changes with the parameterization of BVC and matches the direction suggested by the aggregated results in Table 3.

15 Brown and Warner (1980) point out that, when the null is true and the test statistic is well-specified, the rejection rate will not be exactly equal to the significance level; the rejection rate is a random variable with a Bernoulli distribution. In our setting, this corresponds to a 95% confidence interval for rejection rates of (0.47%, 1.53%).

Table 4 Panels C and D report rejection rates for hypothesis tests on the coefficient on |OIB_TRUE|. Panel C uses tests from the regression model that omits |OIB_BVC| (corresponding to Model 2 in Table 3), and Panel D uses tests from the regression model that controls for |OIB_BVC| (corresponding to Model 3 in Table 3). For 1,000 contract bars, the two-sided tests reject 100% of the time in both models. Again, the one-sided tests reject at either 100% or 0% rates in the directions suggested by the results in Table 3. For 10,000 and 25,000 contracts, however, the rejection rates in both models are much closer to the specified significance level of 1%. For 10,000 contract bars, the rejection rates are all greater than 1% but none are significantly different from 1% at the 5% level (i.e., all fall within the 95% confidence interval around 1% given in Footnote 15). For 25,000 contract bars, the rejection rates are all lower than 1%, but only significantly lower in one case (the one-sided test of the null hypothesis that the coefficient on |OIB_TRUE| is >= 0, without controlling for |OIB_BVC|). The result that the rejection rates for |OIB_TRUE| are much more reasonable than those for |OIB_BVC| for all but the small bar size suggests that concerns of a mechanical relationship between |OIB_BVC| and the Corwin-Schultz spread have merit. However, the extreme misspecification observed for the 1,000 contract bars indicates that this explanation is incomplete.

Note that for the 10,000 contract bar size the coefficient on |OIB_BVC| is uniformly significantly negative, while ELO (2016) uses this bar size and reports a significantly positive coefficient. One might wonder whether this means the test is biased against the ELO (2016) result and strengthens their conclusions. We do not believe that this is the correct interpretation.

First, given that the bias in this test is not stable across parameterizations, it is not likely to be stable across characteristics of the data. As mentioned above, our simulated data omit many features that may be present in real data, such as order splitting, inventory effects, and imperfect resiliency. Therefore, it would be an aggressive interpretation to claim that this negative result holds uniformly in the absence of informed trading when bar sizes of 10,000 contracts or shares are used to estimate BVC in actual market data. Second, it is important that the coefficients are unbiased in this type of regression. In this specification, a negative coefficient has the specific economic interpretation that BVC is measuring the trading of the uninformed counterparties of informed order flow. It is therefore misleading to observe these negative coefficients when there is no informed trading. Third, and perhaps most importantly, our results are so extreme that it is difficult to rationalize using regressions of this form to measure the relationship between order imbalances and informed trading. In our view, 100% rejection rates in these simulations indicate a completely spurious regression rather than one containing a statistical issue that can be overcome with an adjustment for a predictable bias or through inflation of the standard errors.

A reasonable question is whether the temporary price impacts in our simulations drive our results by mimicking some dimension of informed trading. To investigate this possibility, we modify the simulations to omit the temporary price impacts and repeat the experiment. Mechanically speaking, we set the c0 and c1 trading cost parameters in Eq. (6) to 0 and leave the rest of the procedure unchanged. The midpoint prices then follow a random walk and all trades take place at the midpoint, regardless of trade direction and size. The results are presented in Tables A1 and A2 of Appendix A. Table A1 shows that the mean coefficients on |OIB_BVC| are lower in magnitude than those in Table 3, but are all significantly negative. For the 1,000 contract bars this coefficient switches from the positive value shown in Table 3, and the signs match for the larger bar sizes. The mean coefficients on |OIB_TRUE| are close to zero and are only significant for 1,000 contract bars. Table A2 reports extreme rejection rates for all hypothesis tests involving |OIB_BVC|. The lowest two-sided rejection rate is 38.7% for 25,000 contract bars. These rejection rates are generally lower than the 100% rates observed in the simulations that included temporary price impacts, but still indicate extreme misspecification. Similarly, the one-sided tests either grossly overreject or never reject. None of the rejection rates for |OIB_TRUE| are significantly different from 1%. This analysis shows that, while temporary price impacts exacerbate the misspecification in the ELO (2016) test, the test is still grossly misspecified when prices follow a random walk and are completely unrelated to trades.
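The role of the two trading cost parameters can be illustrated with a small sketch. Eq. (6) is not reproduced in this section, so the code assumes a Glosten and Harris style form in which the trade price equals the midpoint plus the trade sign times (c0 + c1 × unsigned volume). That functional form, the function name, and the seed are assumptions made for illustration; the default values echo Table 1.

```python
import numpy as np

def simulate_trades(n_trades, c0=0.125, c1=0.09, mid0=1222.0,
                    mid_vol=0.02, vol_sd=5.6, seed=0):
    """Simulate midpoints and trade prices with no informed trading.

    Assumed form (a hypothetical stand-in for Eq. (6)):
        price = midpoint + sign * (c0 + c1 * |signed volume|)
    """
    rng = np.random.default_rng(seed)
    # Midpoints follow a random walk that is independent of order flow.
    mid = mid0 + np.cumsum(rng.normal(0.0, mid_vol, n_trades))
    # Signed volumes are pure liquidity trades drawn from a normal distribution.
    signed_vol = rng.normal(0.0, vol_sd, n_trades)
    sign = np.sign(signed_vol)
    # The temporary price impact moves the trade price away from the midpoint.
    price = mid + sign * (c0 + c1 * np.abs(signed_vol))
    return mid, signed_vol, price

# Setting both cost parameters to zero places every trade at the midpoint.
mid, signed_vol, price = simulate_trades(1000, c0=0.0, c1=0.0)
```

With `c0 = c1 = 0` the trade prices coincide with the random-walk midpoints, so any relationship a test detects between order imbalance and price-based measures cannot come from trading costs.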

Overall, we conclude that the evidence in ELO (2016) in support of a relationship between BVC and informed trading is unreliable. Our simulations show that the tests employed grossly overreject the null in the hypothesis test of central importance and are severely biased. Further, the bias is unstable across BVC parameterizations and is not likely to be corrected with a simple statistical adjustment.

4. Alternate Tests of the Relationship between BVC and Informed Trading

The evidence presented in Section 3 calls into question the previous evidence in the literature supporting the relationship between BVC and informed trading. However, even if the tests used in prior studies were flawed, BVC may still be a useful technique to measure informed trading. In this section, we propose and implement what we believe to be improved tests to assess this issue. We use the NASDAQ HFT dataset to conduct our tests, which we describe below.

4.1 Motivation and Regression Specifications

We believe that the problems identified above with the tests conducted in ELO (2016) primarily stem from timing issues. There is potentially a mechanical relationship between the contemporaneous Corwin-Schultz spread and BVC order imbalance that mimics the hypothesized informed trading effects. There are also theoretical arguments that this type of specification could either miss information effects that occur with a delay if market makers recognize informed order flow gradually or could confuse temporary price impacts with information effects.

These issues can be avoided by employing tests that exploit the simple property that informed trading will be in the direction of future returns. Informed buying (selling) will tend to be followed by positive (negative) returns. We use tests that regress returns in intervals of various lengths on signed order imbalance variables from the prior volume bar. We use midpoint returns to mitigate the effects of bid-ask bounce. We interpret positive coefficients on order imbalance variables as evidence that those variables are positively correlated with informed trading. Related specifications have been used often in prior literature; see, for example, Chordia and Subrahmanyam (2004), Kaniel and Liu (2006), and Kim and Stoll (2014).

We note a limitation to our tests. Information could be incorporated into prices too rapidly after informed trades to be captured with this design. If prices do adjust almost immediately after informed trades, then the information effects will be largely realized before the end of the bar in which the trades occur. Other studies using similar designs address this issue by combining contemporaneous and future price changes or returns into a single dependent variable (Huang and Stoll (1996), Kaniel and Liu (2006), and others). Given that BVC uses contemporaneous returns in its calculation, it is not obvious how a valid test can be designed that includes contemporaneous returns in the dependent variable. However, BVC cannot be calculated until the end of a bar, so its relationship to information revealed after the bar is complete (which our tests are designed to capture) is probably of greater interest. Regardless, we believe that positive results in these tests could potentially provide reliable evidence of a relationship between BVC and informed trading in the literature, while negative results would not rule out a relationship between BVC and very short-lived information in our data.

We design our tests to address two questions. First, do BVC order imbalances improve on the ability of true order imbalances to measure informed trading? And second, do BVC order imbalances capture passive informed trading?

In our first set of tests, we simply compare the ability of BVC order imbalances and true order imbalances to predict returns in post-bar intervals ranging from 15 seconds to 5 minutes:

$$RET_t = \alpha + \beta_1\, OIB\_BVC_{t-1} + \beta_2\, OIB\_TRUE_{t-1} + \varepsilon_t \tag{9}$$

where RET_t is the midpoint return computed in the interval from the first midpoint after the end of the bar in which the order imbalance is measured until the first midpoint after the specified time has elapsed, and other variables are as previously defined.
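The return definition above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the paper: the function name, the use of a simple (not log) return, the representation of quote times in seconds, and the choice to return NaN when a required quote is missing are all hypothetical.

```python
import numpy as np

def post_bar_return(quote_times, midpoints, bar_end, horizon):
    """Midpoint return from the first quote after bar_end to the first quote
    after bar_end + horizon. Returns NaN if either quote is unavailable."""
    quote_times = np.asarray(quote_times, dtype=float)  # sorted, in seconds
    midpoints = np.asarray(midpoints, dtype=float)
    # side="right" selects the first quote strictly after the given time.
    start = np.searchsorted(quote_times, bar_end, side="right")
    end = np.searchsorted(quote_times, bar_end + horizon, side="right")
    if start >= len(midpoints) or end >= len(midpoints):
        return float("nan")
    return midpoints[end] / midpoints[start] - 1.0

# A bar ends at t = 1.5 seconds; measure the 15-second post-bar return.
example = post_bar_return([0, 1, 2, 20, 40], [100, 100, 101, 102, 103], 1.5, 15)
```

Starting the return at the first midpoint after the bar closes keeps the dependent variable free of the within-bar price change that BVC itself uses.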

For our second set of tests, we construct a variable that is designed to isolate the component of order flow attributable to passive informed trading if the claims of ELO (2016) are correct. We describe the construction of this variable as follows.

Trades in an interval (whether a volume, trade, or time bar) can be classified into the categories V_BB, V_BS, V_BN, V_SS, V_SB, and V_SN, where V indicates the proportion of the interval’s volume traded in that category, the first subscript denotes whether the aggressive side of the trade bought or sold, and the second subscript denotes whether the more informed side of the trade bought or sold. The second subscript is set to N when two uninformed traders trade with each other. For example, V_BS is the volume in the interval where aggressive uninformed buyers trade with passive informed sellers.

A conventional order imbalance measure, using either true trade signs or a trade classification algorithm, measures the buyer-initiated volume less the seller-initiated volume in a block of trades. For a given block of trades, this measure can be represented as:

$$OIB\_TRUE = (V_{BB} + V_{BS} + V_{BN}) - (V_{SS} + V_{SB} + V_{SN}) \tag{10}$$

Note that this measure does not require the researcher to identify the informed side of the trade, only the aggressive side. We present this formula assuming true trade classifications are observable, as when “aggressor flags” are provided. If trades are classified with error, this relationship can be rewritten with estimated variables and classification errors.

While OIB_TRUE mechanically measures the aggressive trading imbalance, it is often used as a proxy for the informed trading imbalance. This is based on the assumption that informed traders tend to trade more aggressively than uninformed traders. This assumption can be stated as:

$$V_{BB} > V_{SB} \tag{11}$$

$$V_{SS} > V_{BS} \tag{12}$$

BVC is used to construct a similar order imbalance variable. ELO (2016) argue that order imbalances calculated using BVC capture information, and that when these order imbalances differ from conventional order imbalances, the differences are due to information rather than error. Therefore, if these claims are correct, we can represent a BVC order imbalance as:

$$OIB\_BVC = (V_{BB} + V_{SB}) - (V_{SS} + V_{BS}) \tag{13}$$

Note that this representation ignores classification errors; we do not mean to imply that ELO (2016) claim that this classification is perfect.

Using this representation, we can see that the difference between a BVC order imbalance and a conventional order imbalance should contain information. Taking this difference and simplifying, we find:

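$$\begin{aligned}
OIB\_BVC - OIB\_TRUE &= (V_{BB} + V_{SB}) - (V_{SS} + V_{BS}) - \left[(V_{BB} + V_{BS} + V_{BN}) - (V_{SS} + V_{SB} + V_{SN})\right] \\
&= 2(V_{SB} - V_{BS}) + (V_{SN} - V_{BN})
\end{aligned} \tag{14}$$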

For brevity we refer to this difference variable as OIB_DIFF hereafter. The first term is twice the imbalance between informed traders buying passively and those selling passively. The second term is an imbalance between uninformed traders selling aggressively and those buying aggressively. Therefore, the first term should positively predict future returns while the second should have no effect, so the total difference term should positively predict future returns.16 With perfect BVC classification, all passive informed trading in the bar should be isolated in this difference variable and no aggressive informed trading should be included. If BVC classification is noisy but on balance adequate, the sign of this variable should still tend to indicate the direction of passive informed trading. This motivates the following regression specification for our second set of tests:

$$RET_t = \alpha + \beta\, OIB\_DIFF_{t-1} + \varepsilon_t \tag{15}$$
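The algebra behind OIB_DIFF can be checked numerically. The sketch below writes the conventional imbalance as aggressive buys minus aggressive sells, writes the idealized BVC imbalance as informed buys minus informed sells, and confirms that their difference equals twice the passive informed imbalance plus the uninformed aggressive imbalance. The dictionary keys, the function name, and the random volume shares are illustrative assumptions.

```python
import numpy as np

def order_imbalances(v):
    """v maps category labels to volume shares, e.g. v["BS"] is V_BS:
    first letter = aggressor side, second letter = informed side (N = none)."""
    oib_true = (v["BB"] + v["BS"] + v["BN"]) - (v["SS"] + v["SB"] + v["SN"])
    oib_bvc = (v["BB"] + v["SB"]) - (v["SS"] + v["BS"])  # idealized, error-free BVC
    oib_diff = 2 * (v["SB"] - v["BS"]) + (v["SN"] - v["BN"])
    return oib_true, oib_bvc, oib_diff

# Draw arbitrary volume shares that sum to one across the six categories.
rng = np.random.default_rng(0)
shares = rng.dirichlet(np.ones(6))
v = dict(zip(["BB", "BS", "BN", "SS", "SB", "SN"], shares))
oib_true, oib_bvc, oib_diff = order_imbalances(v)

# The difference of the two imbalances matches the simplified expression.
assert np.isclose(oib_bvc - oib_true, oib_diff)
```

Because the aggressive informed terms V_BB and V_SS cancel in the difference, only passive informed volume and uninformed aggressive volume remain, which is what gives the OIB_DIFF regression its interpretation.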

For both tests, we estimate the regressions one stock at a time, and average the coefficients from these time series regressions across stocks in a reverse Fama and MacBeth (1973) procedure. For each coefficient, we calculate the t-statistic on the cross-sectional mean and also calculate Newey-West t-statistics for each stock.
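A minimal sketch of this estimation procedure is given below, assuming plain OLS for each stock and a simple t-statistic on the cross-sectional mean of the coefficients. The per-stock Newey-West t-statistics are omitted for brevity. The function names and the synthetic data (50 hypothetical stocks whose returns load only on OIB_TRUE) are illustrative assumptions, not the paper's data.

```python
import numpy as np

def ols_coefs(y, X):
    """OLS coefficients of y on a constant and the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, slope_1, slope_2, ...]

def cross_sectional_summary(per_stock_coefs):
    """Mean coefficient across stocks and the t-statistic on that mean."""
    coefs = np.asarray(per_stock_coefs, dtype=float)
    n_stocks = coefs.shape[0]
    mean = coefs.mean(axis=0)
    se = coefs.std(axis=0, ddof=1) / np.sqrt(n_stocks)
    return mean, mean / se

# Synthetic illustration: returns depend on OIB_TRUE but not on OIB_BVC.
rng = np.random.default_rng(0)
per_stock = []
for _ in range(50):
    oib_bvc = rng.uniform(-1, 1, 400)
    oib_true = rng.uniform(-1, 1, 400)
    ret = 0.5 * oib_true + rng.standard_normal(400)
    per_stock.append(ols_coefs(ret, np.column_stack([oib_bvc, oib_true])))

mean_coef, t_stat = cross_sectional_summary(per_stock)
```

Averaging time-series coefficients across stocks treats each stock as one observation, so the cross-sectional t-statistic reflects how consistent the relationship is across the sample rather than its strength within any single stock.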

16 For now we are ignoring the reversals of temporary price impacts for simplicity. If material, this effect should bias the test in favor of associating OIB_DIFF with informed trading.

4.2 Data

We conduct our tests in the NASDAQ HFT dataset, which contains trade and quote data for a sample of 120 stocks over a subset of dates in 2008-2010. This dataset is also used in Brogaard (2012), Carrion (2013), Brogaard, Hendershott, and Riordan (2014), O’Hara, Yao, and Ye (2014), Carrion and Kolay (2019), and others. The 120 stocks in this sample were chosen with a stratified random sampling approach along the dimensions of market capitalization and listing venue.17 Market capitalization is evenly split between small, medium, and large firms. Listing venues are split equally between NASDAQ and NYSE. We utilize two subsets of data from this dataset: trade reports and the NASDAQ Inside Quotes (BBO). The trade sample covers each trading date during 2008 and 2009 and the week of Feb 22 – 26, 2010. For each sample stock and date, each trade executed on the NASDAQ exchange is shown, excluding trades done in the opening, closing, and intraday crosses. Trades include a millisecond timestamp and are signed to indicate whether they were initiated by a buyer or seller. The trade signs are based on records of fee and rebate payments used by the exchange. The NASDAQ BBO data is available only for the following subset of dates: 1) the first full week of the first month of each quarter during 2008 and 2009; 2) the crisis week of Sept 15 – 19, 2008; and 3) the week of Feb 22 – 26, 2010. We apply a filter to the sample to remove trades and quotes occurring before 9:30 am and after 4:00 pm.

There are several benefits of using this dataset for our analysis. First, it contains true trade signs, which we use to create true order imbalances. Second, NASDAQ notes that trade and quote sequencing is of high quality in this dataset, which is particularly relevant for our analysis. Third, this data is drawn from a market characterized by pervasive high-frequency trader (HFT) participation and short durations between trades and quotes, which is the type of environment where ELO (2016) argue that BVC should have advantages over conventional techniques.

17 The sample was chosen by Ryan Riordan and Terrence Hendershott. See Brogaard (2012) and Carrion (2013) for further details.

Carrion and Kolay (2019) study the same data and report that HFTs participate in 75.16% of the sample trades, the median time elapsed between quotes is 0.024 seconds, and the median time elapsed between trades is 0.001 seconds.

This dataset is described in more detail in Carrion (2013), Brogaard, Hendershott, and Riordan (2014), and Carrion and Kolay (2019).

4.3 BVC Implementation Details

Implementing BVC requires the researcher to make a number of decisions. The most obvious are the basic parameterization: the bar size and volatility estimation. It is also necessary to decide how to handle the overnight trading period and trades that overflow bars. There is little relevant guidance or precedent in the literature. ELO (2016) note that implementation in equity markets involves addressing new issues that they did not face in their futures data. Chakrabarty, Pascual, and Shkilko (2015) and Panayides, Shohfi, and Smith (2019) are the only other studies we are aware of that have implemented BVC in equity data, and we follow their procedures where appropriate.

We design bar sizes to be roughly equivalent to the bar sizes used by ELO (2016). In their ES sample they use a range of bar sizes from 1,000 contracts to 25,000 contracts, and focus on 10,000 contract bars for their main test. The number of contracts per bar itself has very little economic meaning across instruments. Therefore, instead of directly using these numbers as share volume per bar, we scale bar sizes for each stock to approximately match the bars-per-hour pace of the three bar sizes in ELO (2016). Considering the trading hours and average daily volume, the ELO (2016) 1,000, 10,000, and 25,000 contract bar sizes approximately correspond to 100, 10, and 4 bars per hour on average.18 After adjusting for the 6.5-hour equity trading day, these sizes correspond to averages of 650, 65, and 26 bars per day. For each stock, we convert these to “bars per day” rules, which are then applied to find fixed numbers of shares per bar every month using the average in-sample daily volume for each stock-month. For brevity we refer to these sizes as small, medium, and large bars hereafter.
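The pace conversion can be reproduced approximately from the ES figures reported in footnote 18. The sketch below is a back-of-the-envelope check, not the authors' calculation: the count of 252 trading days in the one-year sample is an assumption not stated in the text, and the function names are hypothetical.

```python
ES_TRADES_PER_YEAR = 128_579_415   # trades in the ELO (2016) one-year sample
ES_MEAN_TRADE_SIZE = 4.5           # contracts per trade
ES_HOURS_PER_DAY = 23.55           # average ES trading hours per day
TRADING_DAYS = 252                 # assumption: not stated in the text
EQUITY_HOURS_PER_DAY = 6.5

def es_bars_per_hour(bar_size):
    """Average number of ES volume bars completed per hour for a given bar size."""
    contracts_per_hour = (ES_TRADES_PER_YEAR * ES_MEAN_TRADE_SIZE
                          / TRADING_DAYS / ES_HOURS_PER_DAY)
    return contracts_per_hour / bar_size

def shares_per_bar(avg_daily_volume, bars_per_day):
    """Shares per bar for a stock-month, given a bars-per-day rule."""
    return avg_daily_volume / bars_per_day

for size in (1_000, 10_000, 25_000):
    pace = es_bars_per_hour(size)
    print(size, round(pace, 1), round(pace * EQUITY_HOURS_PER_DAY))
```

Under these assumptions the hourly paces come out close to the rounded values of 100, 10, and 4 used in the text. As a usage example, a stock-month with an average daily volume of 650,000 shares would receive 1,000-share bars under the 650 bars-per-day rule via `shares_per_bar(650_000, 650)`.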

We drop stock-months for a specific bar size where this approach would give a bar size of less than 500 shares. Very small bar sizes do not result in well-formed bars. With bar sizes around a typical trade size, BVC would not result in much aggregation. With bar sizes smaller than the typical trade size, BVC would actually result in disaggregation where most trades are split over multiple bars, which is clearly contrary to the spirit of BVC. This filtering leaves us with a sample of 56 usable stocks for small bars, 112 stocks for medium bars, and 117 stocks for large bars.

It is also necessary to estimate volatility for each stock to compute the CDF in Eq. (1). ELO (2016) use the in-sample standard deviation of bar-to-bar price changes over the whole sample. In our data we have a longer sample period and each stock’s volatility may vary significantly, so we use in-sample standard deviations of bar-to-bar price changes within each stock-month.
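The role of the volatility estimate can be seen in a minimal sketch of the BVC calculation for one stock-month. Eq. (1) is not reproduced in this section, so the code assumes the standard normal CDF applied to the standardized bar-to-bar price change; the distribution actually used in Eq. (1) may differ (for example, a Student's t). The function names and the example prices are hypothetical.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF (assumed here in place of the CDF in Eq. (1))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bvc_order_imbalance(bar_close_prices):
    """Estimated buy fraction and order imbalance per bar for one stock-month."""
    prices = np.asarray(bar_close_prices, dtype=float)
    dp = np.diff(prices)            # bar-to-bar price changes
    sigma = dp.std(ddof=1)          # in-sample volatility within the stock-month
    buy_frac = np.array([normal_cdf(x / sigma) for x in dp])
    oib = 2.0 * buy_frac - 1.0      # (buys - sells) as a fraction of bar volume
    return buy_frac, oib

buy_frac, oib = bvc_order_imbalance([10.0, 10.1, 10.1, 9.9, 10.2])
```

Because every price change is divided by `sigma`, the same price change is classified as a stronger buy or sell signal in a low-volatility month than in a high-volatility one, which is why the estimation window matters. The sketch does not guard against a stock-month in which all price changes are identical, where `sigma` would be zero.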

18 The ES market is open an average of 23.55 hours per day and the ELO (2016) one-year sample contained 128,579,415 trades with an average size of 4.5 contracts.

To handle the overnight period, we discard the partially filled bar at the end of each day and start a fresh bar with the first trade of the next day. We also use the close-open within-bar price change for the first bar of the day rather than using the closing bar from the previous day.

Chakrabarty, Pascual, and Shkilko (2015) and Panayides, Shohfi, and Smith (2019) note a problem with large trades that span multiple bars. There may be a valid price change to use in Eq. (1) for the first bar, but for subsequent bars the open and the close are the same price. With no price change, BVC evenly splits the volume in these bars between buys and sells. We apply a correction for this issue proposed by Chakrabarty, Pascual, and Shkilko (2015). When a bar is filled solely by volume from a single trade, we assign the same buy/sell split calculated for the last bar which contained volume from multiple trades. This assumes that the first bar in each cluster of bars with volume from a single trade generally contains volume from previous trades and the beginning of the large trade. At the end of the cluster, we treat the first bar with volume from multiple trades normally. This issue is likely to be more severe when using small bar sizes, and partially motivates our filtering criterion of a required minimum bar size of 500 shares discussed above.
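The single-trade correction can be sketched as a simple carry-forward rule. This is an illustrative reading of the description above rather than the original implementation: the function name, the representation of each bar by its buy fraction and its number of contributing trades, and the treatment of a single-trade bar that has no earlier multi-trade bar are assumptions.

```python
def apply_single_trade_correction(buy_fracs, trades_in_bar):
    """Replace the buy fraction of bars filled by a single trade with the
    buy fraction of the most recent bar that contained multiple trades."""
    corrected = []
    last_multi = None
    for frac, n_trades in zip(buy_fracs, trades_in_bar):
        if n_trades > 1:
            # Bars with volume from multiple trades are treated normally.
            corrected.append(frac)
            last_multi = frac
        elif last_multi is None:
            # Assumption: with no earlier multi-trade bar, keep the raw split.
            corrected.append(frac)
        else:
            corrected.append(last_multi)
    return corrected

# One large trade fills the second and third bars entirely.
fixed = apply_single_trade_correction([0.7, 0.5, 0.5, 0.4], [3, 1, 1, 2])
```

In the example, the two bars filled by the large trade inherit the 0.7 buy fraction of the preceding multi-trade bar instead of the uninformative 0.5 split, and normal classification resumes with the fourth bar.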

We only use the subperiod that includes NASDAQ BBO data for our main tests, but we use the full sample period to construct BVC order imbalances. This is important because the BVC estimation benefits from using the longer period of contiguous data to compute the standard deviations of bar-to-bar price changes.

4.4 Results

Our first set of tests uses the regression model in Eq. (9) to compare the predictive power of OIB_BVC and OIB_TRUE for future returns. The results are reported in Table 5. Panel A reports results for the small bar size, which corresponds to an average of 650 bars per day. The regressions only use 56 of the stocks, because reasonably well-formed BVC bars could not be calculated for the remaining stocks at such a small bar size.19 Using returns measured at all three post-bar horizons (15 seconds, 1 minute, and 5 minutes), the coefficients on both order imbalance variables are positive, indicating that these variables positively predict future returns and suggesting that they are both correlated with information in the order flow. The coefficient on OIB_TRUE is larger than that on OIB_BVC in all three models. Based on the t-statistic that tests whether the mean coefficient for all stocks is equal to 0, OIB_TRUE is significant at all three horizons, while OIB_BVC is significant at 15 seconds and marginally significant at 1 minute. Inspecting the distributions of t-statistics from the 56 individual regressions reveals a similar pattern. The coefficients on OIB_TRUE have higher mean t-statistics, are more often individually significantly positive, and are less often significantly negative than the coefficients on OIB_BVC. However, the coefficients on OIB_BVC are more often significantly positive than negative. The predictive power of both variables weakens as the return horizon lengthens, but the predictive power of OIB_TRUE weakens more slowly and retains statistical significance at 5 minutes. Overall, for the small bar size, OIB_TRUE outperforms OIB_BVC in predicting future returns, but the coefficients on OIB_BVC have the correct sign and are statistically significant at shorter horizons.

Table 5 Panels B and C report results for medium and large bars. After filtering, 112 (117) stocks are usable with medium (large) bars. For all three return horizons and both bar sizes, the coefficients on OIB_TRUE are positive and significant based on t-tests of the mean. The distributions of the individual t-statistics show some weakening of significance with large bars and longer return horizons, however. OIB_BVC has no significant marginal predictive power for future returns at these bar sizes. The mean coefficients are negative in four of the six specifications and are never significantly different from zero. The distribution of individual t-statistics tells a similar story; the means are always below 0.25 and often negative, and the frequencies of significantly positive individual coefficients are much lower than observed with small bars and sometimes lower than the number of significantly negative coefficients in the same test.

19 The relevant filtering criterion is described in Section 4.3.

Our second set of tests assesses the predictive power of OIB_DIFF for future returns. If BVC successfully measures passive informed trading, this variable should isolate this component of the order flow and positively predict returns. The results are reported in Table 6. Panel A reports results for small bars, Panel B reports results for medium bars, and Panel C reports results for large bars. The results are consistent across all bar sizes and return horizons. The mean coefficients on OIB_DIFF are uniformly negative and statistically significant. Turning to the distributions of t-statistics from individual regressions, the means are uniformly negative and less than −2.0 for all small bar regressions and the medium bar regression with the 15 second horizon. Few coefficients are positive and significant (ranging from 0/56 with small bars and 15 second and 1 minute horizons to 4/117 with large bars and 5 minute horizons), while many are negative and significant (ranging from 16/117 with large bars and 5 minute horizons to 55/56 with small bars and 15 second horizons). In short, we see no evidence that OIB_DIFF can positively predict returns, which is inconsistent with the hypothesis that OIB_BVC captures passive informed trading.

How should we interpret the often-strong negative relationship between OIB_DIFF and future returns? Mechanically, order flow only enters into the OIB_DIFF variable when OIB_BVC differs in its buy-sell classification from OIB_TRUE. Therefore, regardless of whether the difference is related to information or not, it takes on positive values for aggressive selling and negative values for aggressive buying on the disagreed-upon order flow. If neither OIB_BVC nor OIB_TRUE captures informed trading and the disagreements between them are random, then we should expect OIB_DIFF to have no predictive power for future returns. However, if BVC does not capture passive informed trading and the disagreements are random, but aggressive trading tends to be more informed than passive trading, then we should expect OIB_DIFF to have negative predictive power for future returns. Therefore, what we observe is consistent with OIB_TRUE capturing informed trading and the disagreed-upon order flow consisting of more random misclassification than passive informed trading.

Considering the hypothesized properties of BVC, we consider OIB_TRUE the most interesting benchmark of OIB_BVC’s ability to identify informed trading. However, researchers often use data that do not contain true trade signs and must rely on trade signing algorithms such as the Lee and Ready method to estimate order imbalances. Therefore, in a robustness test we repeat the analysis presented in Table 5 using order imbalances constructed from Lee and Ready trade signs (OIB_LR) in place of OIB_TRUE.20 We describe the procedure in Appendix B and report the results in Appendix Table B1. The results are very similar to those in Table 5. The coefficients on OIB_LR are positive in every specification and generally significant, while the coefficients on OIB_BVC have mixed signs and are only positive and significant for small bars with a 15 second return horizon.

20 We do not repeat the OIB_DIFF tests from Table 6 using Lee and Ready order imbalances. These tests would lose their interpretation as tests of BVC’s ability to measure passive informed trading if OIB_DIFF were constructed using trades signed with error instead of true trade signs. This can be illustrated by modifying Eq. (14) to incorporate trade signing errors.

Overall, we find no evidence that OIB_BVC outperforms OIB_TRUE or OIB_LR in measuring informed trading and no evidence that OIB_BVC captures passive informed trading. From our first set of tests, OIB_BVC does have some positive predictive power for future returns when controlling for OIB_TRUE for small bar sizes and short horizons, but even in these cases it underperforms OIB_TRUE and cannot be calculated for many stocks if one requires reasonably well-formed bars.

5. Conclusions

Our analysis calls into question the use of BVC to measure informed trading. We identify several issues with the evidence used to support this relationship in prior literature. Most importantly, we conduct a simulation exercise that reproduces the main test used in the literature in data with no informed trading, and find that the results are similar to those previously thought to show a relationship between BVC and informed trading. Our simulation exercise shows that these tests are severely misspecified, are sensitive to the BVC parameterization, and are consistent with a mechanical relationship between BVC and the liquidity measure used as a dependent variable. We conduct independent tests of this relationship in the NASDAQ HFT dataset using a research design that we believe avoids these problems. We find that conventional order imbalances constructed from both true trade signs and Lee and Ready (1991) trade signs uniformly outperform BVC in predicting future returns across multiple combinations of BVC parameterizations and return measurement horizons. Additional tests uniformly fail to show that the disagreements between BVC and true trade signs are driven by informed passive trading as suggested by ELO (2016).

We note that BVC has many degrees of freedom, with little guidance from the literature on how to best parameterize it. BVC is also arguably nondeterministic; even with a fixed parameterization the same group of trades could be classified differently depending on the length of the sample they are included in. These issues make it difficult to make conclusive statements about the properties of BVC. It is possible that BVC can be improved by some implementation strategy that we are not aware of.

We focus on the relationship between BVC and informed trading only, and our results should not be interpreted to imply that BVC does not have other useful applications. ELO (2016) and other studies have established that BVC can be used as a trade signing algorithm, and nothing in this paper cautions against this application when the goal is unrelated to the information content of the trade. Prior research has shown that BVC does not generally outperform other algorithms with regard to trade signing accuracy, but it has computational advantages, does not require quote data, and may be more robust to data sequencing errors than other methods.

References

Andersen, Torben G., and Oleg Bondarenko, 2014a, VPIN and the flash crash, Journal of Financial Markets 17, 1–46.

Andersen, Torben G., and Oleg Bondarenko, 2014b, Reflecting on the VPIN dispute, Journal of Financial Markets 17, 53–64.

Andersen, Torben G., and Oleg Bondarenko, 2015, Assessing measures of order flow toxicity and early warning signals for market turbulence, Review of Finance 19, 1–54.

Baruch, Shmuel, Marios Panayides, and Kumar Venkataraman, 2017, Informed trading and price discovery before corporate events, Journal of Financial Economics 125, 561–588.

Brogaard, Jonathan, 2012, Essays on high frequency trading, Northwestern University dissertation.

Brogaard, Jonathan, Terrence Hendershott, and Ryan Riordan, 2014, High frequency trading and price discovery, Review of Financial Studies 27, 2267–2306.

Brown, Stephen J., and Jerold B. Warner, 1980, Measuring security price performance, Journal of Financial Economics 8, 205–258.

Carrion, Allen, 2013, Very fast money: High-frequency trading on the NASDAQ, Journal of Financial Markets 16, 680–711.

Carrion, Allen, and Madhuparna Kolay, 2019, Trade signing in fast markets, Financial Review, forthcoming.

Chakrabarty, Bidisha, Roberto Pascual Gascó, and Andriy Shkilko, 2015, Evaluating trade classification algorithms: Bulk volume classification vs. the tick rule and the Lee-Ready algorithm, Journal of Financial Markets 25, 52–79.

Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam, 2002, Order imbalance, liquidity, and market returns, Journal of Financial Economics 65, 111–130.

Chordia, Tarun, and Avanidhar Subrahmanyam, 2004, Order imbalance and individual stock returns: theory and evidence, Journal of Financial Economics 72, 485–518.

Corwin, Shane, and Paul Schultz, 2012, A simple way to estimate bid-ask spreads from daily high and low prices, Journal of Finance 67, 719–759.

Easley, David, Marcos M. López de Prado, and Maureen O’Hara, 2012, Flow toxicity and liquidity in a high-frequency world, Review of Financial Studies 25, 1457–1493.

Easley, David, Marcos M. López de Prado, and Maureen O’Hara, 2016, Discerning information from trade data, Journal of Financial Economics 120, 269–285.

Easley, David, Marcos M. López de Prado, and Maureen O’Hara, 2014, VPIN and the Flash Crash: A rejoinder, Journal of Financial Markets 17, 47–52.

Harris, Lawrence, 1998, Optimal dynamic order submission strategies in some stylized trading problems, Financial Markets, Institutions and Instruments 7, 1–76.

Huang, Roger D., and Hans R. Stoll, 1996, Dealer vs. auction markets: a paired comparison of execution costs on NASDAQ and the NYSE, Journal of Financial Economics 41, 313–357.

Kaniel, Ron, and Hong Liu, 2006, So what orders do informed traders use? Journal of Business 79, 1867–1913.

Kacperczyk, Marcin, and Emiliano Pagnotta, 2016, Chasing private information, working paper.

Kim, Sukwon Thomas, and Hans R. Stoll, 2014, Are trading imbalances indicative of private information? Journal of Financial Markets 20, 151–174.

Kyle, Albert S., 1985, Continuous auctions and insider trading, Econometrica 53, 1315–1336.

Lee, Charles M. C., and Mark J. Ready, 1991, Inferring trade direction from intraday data, Journal of Finance 46, 733–746.

Massot, Magdalena, Samarpan Nawn and Roberto Pascual, 2018, Bulk volume classification under the microscope: Estimating the net order flow, working paper.

O’Hara, Maureen, 2015, High frequency market microstructure, Journal of Financial Economics 116, 257–270.

O’Hara, Maureen, Chen Yao, and Mao Ye, 2014, What’s not there: Odd lots and market data, Journal of Finance 69, 2199–2236.

Panayides, Marios, Thomas Shohfi, and Jared Smith, 2019, Bulk volume classification and information detection, Journal of Banking and Finance 103, 113–129.

Table 1 Summary of Simulation Parameters. Signed trade volumes are generated from a normal distribution and are used to calculate the unsigned trade volume and the trade sign indicator as per Eqs. 4, 5 and 6 in Section 3. Trades are aggregated into volume bars of 1,000, 10,000, or 25,000 contracts. The number of trades per trial is matched to the number of trades in the ELO (2016) E-mini S&P 500 futures sample, while the other simulation parameters are chosen to approximately match either the characteristics of that sample or those of the E-mini S&P 500 futures market. The starting E-mini S&P 500 futures midpoint is the settlement price of E-mini S&P 500 futures on the day prior to the start of ELO's sample period. The midpoint volatility corresponds to the closing value of VIX on the day prior to the start of ELO's sample period (18.26), scaled to a per-trade value. The signed volume volatility is calibrated to match ELO's mean trade size after adjustments. The Glosten & Harris (GH) trading cost parameter c0 is half of the E-mini S&P 500 futures tick size, while c1 is calibrated to match the mean Corwin-Schultz spread in ELO's sample with a 10,000-contract bar size. The bar volatility for each of the three bar sizes is the midpoint volatility scaled to bar size.

Simulation Parameter                             Value

Number of trials for each bar size               1,000
Number of trades/trial                           128,579,415
Starting E-mini S&P 500 futures midpoint         1,222
Midpoint volatility (per trade)                  0.02
Signed volume volatility (contracts/trade)       5.6
Glosten & Harris trading cost parameter c0       0.125
Glosten & Harris trading cost parameter c1       0.09
Bar volatility, 1k bar size                      2.98
Bar volatility, 10k bar size                     9.43
Bar volatility, 25k bar size                     14.91
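As a quick consistency check on the three bar volatilities, the sketch below rescales the 1,000-contract value to the larger bar sizes. It assumes that "scaled to bar size" means square-root-of-volume scaling, which the caption does not state explicitly; the function name is ours and purely illustrative.

```python
import math


def scale_bar_volatility(base_vol: float, base_size: int, target_size: int) -> float:
    """Rescale a bar volatility, ASSUMING it grows with the square root of bar volume."""
    return base_vol * math.sqrt(target_size / base_size)


# Table 1 reports a bar volatility of 2.98 for 1,000-contract bars.
for size in (10_000, 25_000):
    print(size, round(scale_bar_volatility(2.98, 1_000, size), 2))
```

Under this assumption the rescaled values land within rounding distance of the 9.43 and 14.91 reported in Table 1, which suggests the three bar volatilities differ only by the square root of the bar-size ratio.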

Table 2 Simulation summary statistics. Mean order imbalances and Corwin-Schultz spreads calculated from volume bars in each trial. The calculation of BVC order imbalances and Corwin-Schultz spreads is as described in Section 4.3. As in ELO, the Corwin-Schultz spreads are reported as a proportion of the price of a contract and order imbalances are the absolute value of the estimated order imbalance as a fraction of the total volume in a bar.

Statistics                       Value

Trade Size, mean                 4.53
Trade Size, median               4

Bar size = 1,000 contracts
Mean number of bars              582,896.7
Mean Corwin-Schultz spread       0.0019
Mean |OIB_BVC|                   0.111
Mean |OIB_TRUE|                  0.066

Bar size = 10,000 contracts
Mean number of bars              58,289.2
Mean Corwin-Schultz spread       0.0022
Mean |OIB_BVC|                   0.050
Mean |OIB_TRUE|                  0.021

Bar size = 25,000 contracts
Mean number of bars              23,315.4
Mean Corwin-Schultz spread       0.0023
Mean |OIB_BVC|                   0.041
Mean |OIB_TRUE|                  0.013

Table 3 Regressions of Corwin-Schultz spreads on order imbalances in simulated trade data. This table reports the results of variations of the following regression:

$$CS\_SPRD_t = \alpha + \beta_1 |OIB\_BVC_t| + \beta_2 |OIB\_TRUE_t| + \beta_3 CS\_SPRD_{t-1}$$

in simulated trade data. The simulation parameters are given in Table 1 and summary statistics of the simulated data are presented in Table 2. CS_SPRD is the Corwin-Schultz spread, OIB_BVC is the BVC order imbalance, and OIB_TRUE is the order imbalance constructed from true trade signs. The regression is estimated separately for each of 1,000 draws of the data. Reported coefficients are means over all trials, and the t-statistics test the null that the mean coefficient is equal to 0.

Panel A: Bar size = 1,000 contracts

Variable        Model 1     Model 2     Model 3
Intercept       0.0015      0.0015      0.0015
                (287.79)    (289.73)    (288.37)
|OIB_BVC|       0.0001                  0.0001
                (253.42)                (253.40)
|OIB_TRUE|                  0.0001      0.0001
                            (198.08)    (197.72)
CS_SPRD t-1     0.1774      0.1775      0.1774
                (70.37)     (70.41)     (70.38)

Panel B: Bar size = 10,000 contracts

Variable        Model 1     Model 2     Model 3
Intercept       0.0022      0.0021      0.0022
                (329.14)    (329.78)    (329.00)
|OIB_BVC|       0.0018                  0.0018
                (252.1)                 (252.24)
|OIB_TRUE|                  0.00001     0.00001
                            (1.13)      (1.27)
CS_SPRD t-1     0.0532      0.0532      0.0532
                (43.46)     (43.43)     (43.46)

Panel C: Bar size = 25,000 contracts

Variable        Model 1     Model 2     Model 3
Intercept       0.0024      0.0023      0.0024
                (312.61)    (314.68)    (312.76)
|OIB_BVC|       0.0044                  0.0044
                (246.93)                (246.91)
|OIB_TRUE|                  0.00002     0.00002
                            (0.92)      (1.06)
CS_SPRD t-1     0.0319      0.0319      0.0319
                (41.11)     (41.21)     (41.11)
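The aggregation across trials described in the caption can be illustrated with a short sketch. We assume here that the reported t-statistic is a standard one-sample t-test on the per-trial coefficient estimates (mean divided by its standard error); the function name and the example values are hypothetical and are not taken from the simulations.

```python
import math
import statistics


def mean_and_t_stat(coefs):
    """Mean of per-trial coefficients and the t-statistic for H0: mean == 0."""
    n = len(coefs)
    mean = statistics.fmean(coefs)
    # Standard error of the mean, using the sample standard deviation (n - 1).
    std_err = statistics.stdev(coefs) / math.sqrt(n)
    return mean, mean / std_err


# Hypothetical per-trial slope estimates, for illustration only.
example = [0.0017, 0.0019, 0.0018, 0.0018, 0.0020]
mean, t = mean_and_t_stat(example)
print(f"mean = {mean:.5f}, t = {t:.2f}")
```

Because the t-statistic grows with the square root of the number of trials, a coefficient that is tiny but stable across 1,000 trials can produce a very large t-statistic.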

Table 4 Rejection rates for significance tests of order imbalances. This table reports the proportion of times the null hypotheses for the coefficients on the order imbalances in the regression in Table 3 are rejected. For each of the three bar sizes, 1,000 regressions are estimated using simulated data. The simulation parameters are given in Table 1 and summary statistics of the simulated data are presented in Table 2. |OIB_BVC| is the absolute value of the calculated contemporaneous BVC order imbalance and |OIB_TRUE| is the absolute value of the true order imbalance from the simulations. The mean t-statistics reported are the cross-sectional averages of the 1,000 t-statistics obtained for each bar size.

Panel A: Without controlling for |OIB_TRUE| Rejection rates for significance tests of |OIB_BVC|,

Panel B: Controlling for |OIB_TRUE| Rejection rates for significance tests of |OIB_BVC|,

Panel C: Without controlling for |OIB_BVC| Rejection rates for significance tests of |OIB_TRUE|,

Panel D: Controlling for |OIB_BVC| Rejection rates for significance tests of |OIB_TRUE|,
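A rejection rate of the kind reported in this table can be computed as the share of trials whose t-statistic exceeds a critical value. The sketch below is illustrative only: the two-sided test and the 1.96 critical value (a 5% level under a normal approximation) are our assumptions, and the t-statistics in the example are hypothetical.

```python
def rejection_rate(t_stats, critical_value=1.96):
    """Share of trials whose |t| exceeds the critical value (two-sided test)."""
    if not t_stats:
        raise ValueError("need at least one t-statistic")
    rejected = sum(1 for t in t_stats if abs(t) > critical_value)
    return rejected / len(t_stats)


# Hypothetical t-statistics from five trials, for illustration only.
print(rejection_rate([2.5, -0.3, 1.1, -2.2, 0.4]))  # two of five exceed 1.96 in absolute value
```

In data that contain no informed trading, a well-specified test should reject at roughly its nominal significance level, so rejection rates far above that level point to a misspecified test.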

Table 5 Regressions of returns on order imbalances in equity trading data from the NASDAQ HFT dataset. This table reports the results of the following regression:

$$RET_t = \alpha + \beta_1 OIB\_BVC_{t-1} + \beta_2 OIB\_TRUE_{t-1}$$

for a sample of stocks selected for NASDAQ by Terrence Hendershott and Ryan Riordan. The full sample consists of 61,271,087 trades for 120 stocks over the time periods January 2008 – December 2009 and February 22, 2010 – February 26, 2010. RET is the midpoint-to-midpoint return from the end of the volume bar to a post-bar midpoint after the return horizon has elapsed. OIB_TRUE is the lagged true order imbalance determined by rebate payments from the data. OIB_BVC is the lagged order imbalance estimated using the BVC methodology as described in Section 4.3. Panel A reports the results from 650 average bars per day, Panel B from 65 average bars per day, and Panel C from 26 average bars per day. The number of stocks used drops below 120 in each panel because stocks that cannot be distributed into well-formed bars are dropped from the sample. The coefficients and t-statistics (raw and Newey-West corrected) reported are the averages across all stocks used in the regression. All coefficients are multiplied by 1,000. Num pos sig (Num neg sig) is the number of stocks for which the coefficient is positive (negative) and significant.

Panel A: Average Bars per Day = 650, N = 56 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0171       0.94     0.09       5         6
15 seconds      OIB_BVC     0.1104       2.71     2.59       34        3
15 seconds      OIB_TRUE    0.2518       6.62     9.90       56        0
1 minute        Intercept   0.0406       1.21     0.27       4         1
1 minute        OIB_BVC     0.1682       1.82     2.48       35        2
1 minute        OIB_TRUE    0.2654       4.11     5.14       47        0
5 minutes       Intercept   0.0498       1.1      0.07       3         2
5 minutes       OIB_BVC     0.0492       0.46     0.69       13        4
5 minutes       OIB_TRUE    0.2367       3.17     2.09       28        1

Panel B: Average Bars per Day = 65, N = 112 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0234       0.99     0.12       5         0
15 seconds      OIB_BVC     0.0374       0.58     0.22       8         4
15 seconds      OIB_TRUE    0.3261       6.81     2.78       68        0
1 minute        Intercept   0.0301       1.42     0.16       8         4
1 minute        OIB_BVC     0.0564       0.54     0.12       5         3
1 minute        OIB_TRUE    0.4564       4.97     2.01       55        1
5 minutes       Intercept   0.1113       1.93     0.17       7         0
5 minutes       OIB_BVC     0.0698       0.54     0.13       8         11
5 minutes       OIB_TRUE    0.4554       4.41     1.14       34        2

Panel C: Average Bars per Day = 26, N = 117 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0219       1.12     0.12       4         8
15 seconds      OIB_BVC     0.0479       0.62     0.20       4         2
15 seconds      OIB_TRUE    0.3300       4.76     1.46       36        0
1 minute        Intercept   0.0127       0.62     0.09       2         3
1 minute        OIB_BVC     0.0736       0.65     0.02       4         4
1 minute        OIB_TRUE    0.4628       3.76     1.09       30        0
5 minutes       Intercept   0.0759       1.54     0.04       2         4
5 minutes       OIB_BVC     0.1427       0.99     0.18       7         2
5 minutes       OIB_TRUE    0.4315       2.76     0.67       18        3

Table 6 Regressions of returns on the BVC – True order imbalance disagreement variable in equity trading data from the NASDAQ HFT dataset. This table reports the results of the following regression:

$$RET_t = \alpha + \beta\, OIB\_DIFF_{t-1}$$

for a sample of stocks selected for NASDAQ by Terrence Hendershott and Ryan Riordan. The full sample consists of 61,271,087 trades for 120 stocks over the time periods January 2008 – December 2009 and February 22, 2010 – February 26, 2010. RET is the midpoint-to-midpoint return from the end of the volume bar to a post-bar midpoint after the return horizon has elapsed. OIB_DIFF is the lagged difference between the order imbalance estimated using the BVC methodology and the true order imbalance, as described in Section 4.3. Panel A reports the results from 650 average bars per day, Panel B from 65 average bars per day, and Panel C from 26 average bars per day. The number of stocks used drops below 120 in each panel because stocks that cannot be distributed into well-formed bars are dropped from the sample. The coefficients and t-statistics (raw and Newey-West corrected) reported are the averages across all stocks used in the regression. All coefficients are multiplied by 1,000. Num pos sig (Num neg sig) is the number of stocks for which the coefficient is positive (negative) and significant.

Panel A: Average Bars per Day = 650, N = 56 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0170       0.94     0.07       5         6
15 seconds      OIB_DIFF    0.2653       6.9      10.64      0         55
1 minute        Intercept   0.0405       1.2      0.26       4         1
1 minute        OIB_DIFF    0.2821       4.41     5.65       0         46
5 minutes       Intercept   0.0497       1.09     0.06       3         2
5 minutes       OIB_DIFF    0.2494       3.37     2.29       1         29

Panel B: Average Bars per Day = 65, N = 112 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0228       0.95     0.14       0         5
15 seconds      OIB_DIFF    0.2989       6.16     2.43       0         61
1 minute        Intercept   0.0293       1.38     0.15       6         4
1 minute        OIB_DIFF    0.4360       4.72     1.87       2         52
5 minutes       Intercept   0.1111       1.91     0.16       5         0
5 minutes       OIB_DIFF    0.4718       4.63     1.22       2         37

Panel C: Average Bars per Day = 26, N = 117 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0204       1.06     0.13       2         8
15 seconds      OIB_DIFF    0.2671       4.03     1.10       1         31
1 minute        Intercept   0.0117       0.57     0.10       2         3
1 minute        OIB_DIFF    0.4220       3.51     0.95       1         27
5 minutes       Intercept   0.0747       1.52     0.03       2         4
5 minutes       OIB_DIFF    0.3903       2.62     0.60       4         16

Figure 1 Trade Price Path from Single Trial of Simulation Exercise

Appendix A: Simulations without Temporary Price Impacts

In this section we repeat the analysis reported in Table 3 and Table 4 using simulated data with no temporary price impacts. We generate data with no informed trading and no temporary price impacts using the following model:

$$P_t = m_t = m_{t-1} + \varepsilon_t \qquad \text{(A1)}$$

$$\varepsilon_t \sim N(0, \sigma^2) \qquad \text{(A2)}$$

where m_t is the pre-trade midpoint (or, equivalently in this setting, the fair value of the security) for trade t and P_t is the trade price for trade t. Note that we simulate the model in event time rather than calendar time. In the data generated from this model, trades have no temporary or permanent price impact. This is equivalent to the modified Glosten and Harris (1988) model described in Section 3.2 with the trading cost parameters c0 and c1 set to 0. The rest of the procedure is identical to that described in Section 3.2, and we continue to use the relevant parameters given in Table 1.
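The data-generating process described in this appendix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce the tables: the function name, the seed handling, the absence of tick rounding, and the short run length are ours, while the starting midpoint and per-trade midpoint volatility are taken from Table 1.

```python
import random


def simulate_prices(n_trades, start_mid, mid_vol, seed=0):
    """Simulate trade prices when trades have no temporary or permanent price impact.

    The midpoint follows a Gaussian random walk in event time, and every trade
    prints at the prevailing pre-trade midpoint, so each price equals its midpoint.
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    mids, prices = [], []
    mid = start_mid
    for _ in range(n_trades):
        mids.append(mid)      # pre-trade midpoint for this trade
        prices.append(mid)    # trade price equals the midpoint (no trading costs)
        mid += rng.gauss(0.0, mid_vol)  # innovation arriving before the next trade
    return mids, prices


# Starting midpoint and per-trade volatility from Table 1; 10 trades for illustration.
mids, prices = simulate_prices(n_trades=10, start_mid=1222.0, mid_vol=0.02)
print(prices[:3])
```

Because prices never deviate from the midpoint, any relationship that a regression detects between spreads and order imbalances in such data cannot be attributed to trading costs or to informed trading.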

Table A1 Regressions of Corwin-Schultz spreads on order imbalances in simulated trade data with no temporary price impacts. This table reports the results of variations of the following regression:

$$CS\_SPRD_t = \alpha + \beta_1 |OIB\_BVC_t| + \beta_2 |OIB\_TRUE_t| + \beta_3 CS\_SPRD_{t-1}$$

in simulated trade data. The simulation parameters are given in Table 1, with the exception of the trading cost parameters c0 and c1, which are set to 0. CS_SPRD is the Corwin-Schultz spread, OIB_BVC is the BVC order imbalance, and OIB_TRUE is the order imbalance constructed from true trade signs. The regression is estimated separately for each of 1,000 draws of the data. Reported coefficients are means over all trials, and the t-statistics test the null that the mean coefficient is equal to 0.

Panel A: Bar size = 1,000 contracts

Variable        Model 1     Model 2     Model 3
Intercept       0.0001      0.0001      0.0001
                (286.83)    (286.91)    (286.68)
|OIB_BVC|       0.0001                  0.0001
                (207.35)                (207.35)
|OIB_TRUE|                  0.0000      0.0000
                            (4.47)      (4.73)
CS_SPRD t-1     0.0251      0.0251      0.0251
                (158.3)     (158.3)     (158.3)

Panel B: Bar size = 10,000 contracts

Variable        Model 1     Model 2     Model 3
Intercept       0.0003      0.0003      0.0003
                (286.46)    (284.96)    (285.57)
|OIB_BVC|       0.0002                  0.0002
                (104.27)                (104.26)
|OIB_TRUE|                  0.0000      0.0000
                            (0.2)       (0.17)
CS_SPRD t-1     0.0174      0.0174      0.0174
                (83.35)     (83.36)     (83.37)

Panel C: Bar size = 25,000 contracts

Variable        Model 1     Model 2     Model 3
Intercept       0.0005      0.0005      0.0005
                (285.28)    (285.17)    (284.14)
|OIB_BVC|       0.0004                  0.0004
                (70.24)                 (70.22)
|OIB_TRUE|                  0.0000      0.0000
                            (0.04)      (0.05)
CS_SPRD t-1     0.0163      0.0163      0.0163
                (58.6)      (58.66)     (58.61)

Table A2 Rejection rates for significance tests of order imbalances in simulated trade data with no temporary price impacts. This table reports the proportion of times the null hypotheses for the coefficients on the order imbalances in the regression in Table A1 are rejected. For each of the three bar sizes, 1,000 regressions are estimated using simulated data. The simulation parameters are given in Table 1, with the exception of the trading cost parameters c0 and c1, which are set to 0. |OIB_BVC| is the absolute value of the calculated contemporaneous BVC order imbalance and |OIB_TRUE| is the absolute value of the true order imbalance from the simulations. The mean t-statistics reported are the cross-sectional averages of the 1,000 t-statistics obtained for each bar size.

Panel A: Without controlling for |OIB_TRUE| Rejection rates for significance tests of |OIB_BVC|,

Panel B: Controlling for |OIB_TRUE| Rejection rates for significance tests of |OIB_BVC|,

Panel C: Without controlling for |OIB_BVC| Rejection rates for significance tests of |OIB_TRUE|,

Panel D: Controlling for |OIB_BVC| Rejection rates for significance tests of |OIB_TRUE|,

Appendix B: Regressions of Returns on BVC and Lee and Ready Order Imbalances

In this section we repeat the analysis reported in Table 5 using order imbalances constructed from trades signed with the Lee and Ready method instead of true trade signs. We first sign trades using the Lee and Ready method, which applies the quote rule where possible and the tick rule otherwise. In our implementation of the quote rule, we match each trade with the most recent pre-trade quote midpoint that is not in the same millisecond as the trade. Trades above the midpoint are classified as buys, and trades below the midpoint are classified as sells. The tick rule is used when a trade occurs at the midpoint of the matched quotes, or in the small number of cases where no pre-trade quote is available. The tick rule compares the current trade price with the preceding trade price and classifies the trade as a buy if the current price is higher or as a sell if the current price is lower. If the preceding trade price is identical to the current price, the tick rule looks back to the most recent different price for a benchmark.
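The classification logic described above can be sketched as follows. This is a simplified illustration rather than our production code: the function name and input layout are hypothetical, quote matching (including the millisecond rule) is assumed to have been done upstream so that each trade arrives with its matched midpoint or None, and a trade with no prior price change to look back on is left unsigned (0).

```python
from typing import List, Optional, Sequence


def lee_ready_signs(prices: Sequence[float],
                    midpoints: Sequence[Optional[float]]) -> List[int]:
    """Sign trades as +1 (buy), -1 (sell), or 0 (cannot be classified).

    midpoints[i] is the matched pre-trade quote midpoint for trade i,
    or None when no pre-trade quote is available.
    """
    signs = []
    last_tick_sign = 0   # direction of the most recent price change
    prev_price = None
    for price, mid in zip(prices, midpoints):
        # Track the tick-rule state: sign of the move from the most recent different price.
        if prev_price is not None and price != prev_price:
            last_tick_sign = 1 if price > prev_price else -1
        if mid is not None and price != mid:
            signs.append(1 if price > mid else -1)   # quote rule
        else:
            signs.append(last_tick_sign)             # tick rule (at midpoint or no quote)
        prev_price = price
    return signs


# Hypothetical trades: the first has no quote, the third and fifth print at the midpoint.
print(lee_ready_signs([10.00, 10.02, 10.02, 10.01, 10.01],
                      [None, 10.01, 10.02, 10.02, 10.01]))
```

Keeping the sign of the last price change as running state is equivalent to looking back to the most recent different price, and it avoids rescanning earlier trades.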

Next, we aggregate the signed trades in each bar into order imbalances as follows. The Lee and Ready order imbalances are defined as:

$$OIB\_LR_t = \frac{BUY\_VOLUME_t - SELL\_VOLUME_t}{VOLUME_t} \qquad \text{(B1)}$$

where BUY_VOLUME_t and SELL_VOLUME_t are the share volumes designated as buys and sells by the Lee and Ready method in bar t and VOLUME_t is the total share volume in bar t. The bars are volume bars formed as described in Section 4.3.
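As a worked illustration of this definition, the sketch below computes the order imbalance for a single bar from (sign, size) pairs. The function name and input layout are hypothetical, and the example trades are made up.

```python
from typing import Iterable, Tuple


def order_imbalance(trades: Iterable[Tuple[int, float]]) -> float:
    """(buy volume - sell volume) / total volume for one bar.

    Each trade is a (sign, size) pair with sign +1 for a buy and -1 for a sell.
    Unsigned trades (sign 0) count toward total volume only.
    """
    buy = sell = total = 0.0
    for sign, size in trades:
        total += size
        if sign > 0:
            buy += size
        elif sign < 0:
            sell += size
    if total == 0:
        raise ValueError("bar has no volume")
    return (buy - sell) / total


# Hypothetical bar: 400 shares bought, 100 shares sold.
bar = [(1, 300), (-1, 100), (1, 100)]
print(order_imbalance(bar))  # (400 - 100) / 500
```

The measure is bounded between -1 (all volume classified as sells) and +1 (all volume classified as buys).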

For our regressions, we modify Eq. (9) as follows:

$$RET_t = \alpha + \beta_1 OIB\_BVC_{t-1} + \beta_2 OIB\_LR_{t-1} \qquad \text{(B2)}$$

where OIB_LR is the Lee and Ready order imbalance defined in Eq. (B1) and other variables are as previously defined. We estimate this regression one stock at a time and report the results in Table B1 below.

Table B1 Regressions of returns on order imbalances in equity trading data from the NASDAQ HFT dataset. This table reports the results of the following regression:

$$RET_t = \alpha + \beta_1 OIB\_BVC_{t-1} + \beta_2 OIB\_LR_{t-1}$$

for a sample of stocks selected for NASDAQ by Terrence Hendershott and Ryan Riordan. The full sample consists of 61,271,087 trades for 120 stocks over the time periods January 2008 – December 2009 and February 22, 2010 – February 26, 2010. RET is the midpoint-to-midpoint return from the end of the volume bar to a post-bar midpoint after the return horizon has elapsed. OIB_LR is the lagged order imbalance estimated using a modified Lee and Ready (1991) algorithm that matches trades with the last quote in the prior millisecond. OIB_BVC is the lagged order imbalance estimated using the BVC methodology as described in Section 4.3. Panel A reports the results from 650 average bars per day, Panel B from 65 average bars per day, and Panel C from 26 average bars per day. The number of stocks used drops below 120 in each panel because stocks that cannot be distributed into well-formed bars are dropped from the sample. The coefficients and t-statistics (raw and Newey-West corrected) reported are the averages across all stocks used in the regression. All coefficients are multiplied by 1,000. Num pos sig (Num neg sig) is the number of stocks for which the coefficient is positive (negative) and significant.

Panel A: Average Bars per Day = 650, N = 56 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0164       0.93     0.08       4         6
15 seconds      OIB_BVC     0.0903       2.15     2.29       30        3
15 seconds      OIB_LR      0.2626       6.76     10.54      55        0
1 minute        Intercept   0.0399       1.21     0.27       4         1
1 minute        OIB_BVC     0.1294       1.36     2.17       28        2
1 minute        OIB_LR      0.2901       4.37     5.80       48        0
5 minutes       Intercept   0.0491       1.10     0.07       4         2
5 minutes       OIB_BVC     0.0075       0.07     0.43       11        3
5 minutes       OIB_LR      0.2751       3.53     2.60       30        0

Panel B: Average Bars per Day = 65, N = 112 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0246       1.02     0.11       2         6
15 seconds      OIB_BVC     0.0606       0.84     0.15       5         4
15 seconds      OIB_LR      0.3638       7.19     3.00       71        0
1 minute        Intercept   0.0269       1.21     0.17       6         4
1 minute        OIB_BVC     0.0527       0.52     0.04       5         4
1 minute        OIB_LR      0.4699       5.25     2.29       62        0
5 minutes       Intercept   0.1128       1.93     0.17       6         0
5 minutes       OIB_BVC     0.1422       1.02     0.21       4         13
5 minutes       OIB_LR      0.5599       5.19     1.46       43        0

Panel C: Average Bars per Day = 26, N = 117 stocks

                                            t-statistic         Num       Num
Return Horizon  Variable    Coefficient  t(mean)  mean(t)    pos sig   neg sig
15 seconds      Intercept   0.0228       1.16     0.10       4         8
15 seconds      OIB_BVC     0.0156       0.22     0.14       2         2
15 seconds      OIB_LR      0.3781       5.45     1.66       49        0
1 minute        Intercept   0.0125       0.62     0.07       2         3
1 minute        OIB_BVC     0.0892       0.71     0.03       4         4
1 minute        OIB_LR      0.5041       3.94     1.30       34        0
5 minutes       Intercept   0.0788       1.57     0.05       3         4
5 minutes       OIB_BVC     0.0959       0.35     0.13       6         3
5 minutes       OIB_LR      0.0505       3.51     0.87       24        4