Preliminary and Incomplete: Please do not cite or circulate this version.
Bulk Volume Trade Classification and Informed Trading ∗
Allen Carrion University of Memphis
Madhuparna Kolay University of Portland
This Draft: January 2020
Abstract
We document that the existing evidence that bulk volume trade classification (BVC) can measure informed trading arises largely due to misspecified tests. In particular, simulations show that these tests detect spurious relationships in data containing only uninformed liquidity trades. We also assess the performance of BVC order imbalances in the NASDAQ HFT dataset, showing that BVC order imbalances underperform conventional order imbalance measures that are based on aggressor flags in detecting informed trading. When we isolate the component of order flow that BVC designates as passive informed trading, we find that this component of order flow fails to predict future returns with the correct sign. On balance, our evidence supports the use of conventional order imbalance measures to identify informed trading.
∗ We thank NASDAQ and Frank Hatheway for supplying the HFT dataset.
1. Introduction
A standard approach to identifying informed trading in microstructure data is to use order imbalances (or unexpected order imbalances) as a proxy. Order imbalances measure the difference between aggressive buying volume and aggressive selling volume in an interval, and are sometimes normalized by total volume traded.1 This has been a long-standing practice and has been validated empirically, but it has a well-known shortcoming. Order imbalances measure the amount of aggressive trading in an interval, and their use as a proxy for informed trading rests on the assumption that, on balance, informed traders tend to demand liquidity. However, as pointed out by Harris (1998), Kaniel and Liu (2006), Baruch, Panayides, and Venkataraman (2017), and many others, there are conditions that can motivate informed traders to trade passively. This has been confirmed empirically (Kaniel and Liu (2006), others).
The validity of order flow imbalances as a proxy for informed trading rests on the assumption that, while some informed traders may trade passively, they generally demand more liquidity than they supply. O’Hara (2015) and Easley, Lopez de Prado, and O’Hara (2016) (ELO (2016) hereafter) suggest that this assumption has weakened in modern markets, as informed traders presumably have increased their use of algorithms that trade passively. ELO (2016) argue that, as a result of increases in passive trading by informed traders, “the notion of the active side of the trade signaling underlying information is undermined.”
Regardless of whether aggressive trading remains correlated with informed trading in modern markets, more accurate methods of identifying informed trading that outperform order imbalances by detecting both aggressive and passive informed trading would be desirable. ELO (2016) introduce the bulk volume classification technique (or BVC), which potentially addresses this issue. As described in ELO (2016), BVC “aggregates trades over short time or volume intervals and then uses a standardized price change between the beginning and end of the interval to approximate the percentage of buy and sell order flow.” The authors suggest that this technique identifies imbalances from informed traders whether they trade aggressively or passively. They provide the following interpretation of their main empirical result: “What matters for our purposes is that order imbalance created from bulk volume works, in the sense that it is positively related to the high low spread [signifying a response to informed trading], but that order imbalance created from the tick rule (or even derived directly from the aggressor flag) does not work.”
1 Some studies calculate imbalances using number of trades or dollar volumes instead of share volumes. Examples include Chordia, Roll and Subrahmanyam (2002), Chordia and Subrahmanyam (2004), and Kim and Stoll (2014).
If BVC indeed has these properties, this technique offers an invaluable tool for researchers. However, the empirical evidence supporting these claims is limited. ELO (2016) provide strong arguments motivating BVC, but their empirical validation of BVC is confined to a single test asset and a single BVC parameterization. We are aware of only two other studies that conduct similar analyses (Panayides, Shofi, and Smith (2019) and Chakrabarty, Pascual, and Shkilko (2015)), and they primarily rely on the methodology employed in ELO (2016) and a similar test specification that was proposed in an earlier working paper version of ELO (2016) and later abandoned.2,3 Both studies qualitatively reproduce the main ELO (2016) result in samples of equity data using many stocks, but Chakrabarty, Pascual, and Shkilko (2015) express qualifications.4 Chakrabarty, Pascual, and Shkilko (2015) also use an older version of BVC based on the ELO working paper, which is similar but not identical.
2 The working paper version of ELO (2016) used a test with a high low spread (HL) as a liquidity proxy. The final version of ELO (2016) states that the HL spread is contaminated by “fundamental variance” and replaces it with the Corwin and Schultz spread estimator in this test. The final version of their test is discussed in detail in Section 3.
3 Panayides, Shofi, and Smith (2019) also conduct an alternate test that finds that BVC predicts returns around corporate events. Andersen and Bondarenko (2014a, 2014b, and 2015) also test or discuss BVC. However, they focus on its accuracy in identifying the aggressor side of trades or its use in VPIN calculations.
In this paper, we further investigate the ability of BVC to detect informed trading. First, we revisit the initial empirical evidence provided by ELO (2016). We begin by discussing some conceptual concerns with their tests. We then repeat their tests in simulated data with no informed trading, and find that their tests are severely misspecified. The ELO (2016) regressions always reject the null hypothesis of no informed trading at the 1% level in the simulated data. It appears that these tests are measuring something other than the relationship between BVC and informed trading. While we believe that the original evidence in support of BVC’s ability to detect informed trading is unreliable, this does not rule out the possibility that BVC may still have the properties claimed by ELO (2016). Therefore, we propose two alternative tests and conduct them on a sample of equity trading data. Both of these tests are motivated by the idea that informed trading should positively predict future returns, and are related to tests previously conducted in the literature (cites). In the first set of tests, we compare order imbalances constructed with BVC against conventional order imbalances constructed from the known trade signs (true order imbalances). In multiple specifications with several return horizons and parameterizations of BVC, true order imbalances outperform BVC order imbalances in every case, and the coefficients on BVC order imbalances are often insignificant or predict future returns with the wrong sign. In a second set of tests, we derive a measure from BVC and the true order imbalance that should measure passive informed trading if BVC performs as claimed, and test the ability of this measure to predict future returns. This measure fails to do so in all specifications examined, and often predicts returns with the wrong sign. We repeat these tests using order imbalances constructed using trades signed with the Lee and Ready (1991) method instead of true trade signs, and find similar results (available next version).
4 Chakrabarty, Pascual, and Shkilko (2015) note that the relationship between BVC and the HL spread may be driven by correlations with volatility. They also replace HL with returns and alternate liquidity proxies and find mixed results; BVC is generally found to be positively related to returns and contemporaneous liquidity proxies but does not uniformly outperform conventional order imbalance measures.
Our results suggest that researchers should be wary of employing BVC based on expectations of its superior ability to measure informed trading. We find no evidence that BVC outperforms conventional order imbalances constructed from true trade signs or Lee and Ready (1991) trade signs in this regard; it actually underperforms significantly in our tests.
The rest of this paper is organized as follows. Section 2 describes and discusses the BVC algorithm. Section 3 reviews the existing evidence used to support the claim that BVC identifies informed trading. Section 4 presents results from alternate tests of the ability of BVC to identify informed trading. Section 5 concludes.
2. The Bulk Volume Classification Algorithm
The bulk volume classification (or BVC) method does not classify individual trades, as conventional trade classification algorithms such as the tick rule or the Lee and Ready method do. Instead, BVC aggregates trades into bars, by either volume or time.5 ELO (2012) and ELO (2016) argue for the importance of aggregation by volume in modern markets, so we focus on volume bars in this paper. The CDF evaluated at the standardized price change between the close of a bar and the close of the prior bar is then used to calculate the percentage of buys versus sells per bar. Since trade-by-trade classification is replaced by a bar-by-bar probability of buys versus sells, the aggregation may lead to greater efficiency with respect to data usage.
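The bar-level procedure described above can be illustrated with a minimal sketch. Several choices here are assumptions made only for illustration and are not taken from ELO (2016): the standard normal CDF, the full-sample standard deviation of bar-to-bar price changes used as the scaling term, and the simplification that a trade is never split across two bars. The function names volume_bars and bvc_order_imbalance are hypothetical. Eq. (1) of ELO (2016) remains the authoritative definition.

```python
from statistics import NormalDist, pstdev


def volume_bars(prices, sizes, bar_volume):
    """Aggregate trades into bars holding at least `bar_volume` shares.

    Returns a list of (close_price, bar_volume_traded) tuples.
    Simplification (assumption): a trade is never split across bars,
    so a bar can slightly exceed the target volume. Trades left over
    after the last full bar are dropped.
    """
    bars = []
    accumulated = 0
    for price, size in zip(prices, sizes):
        accumulated += size
        if accumulated >= bar_volume:
            bars.append((price, accumulated))
            accumulated = 0
    return bars


def bvc_order_imbalance(bars):
    """Split each bar's volume into buy and sell volume, BVC style.

    The buy fraction of a bar is the CDF evaluated at the price change
    from the prior bar's close, scaled by the standard deviation of all
    bar-to-bar price changes. The first bar has no prior close and is
    skipped. Assumption: standard normal CDF and full-sample sigma.

    Returns a list of (buy_volume, sell_volume, imbalance) tuples,
    where imbalance = (buy - sell) / bar volume.
    """
    closes = [close for close, _ in bars]
    changes = [curr - prev for prev, curr in zip(closes, closes[1:])]
    sigma = pstdev(changes)
    if sigma == 0:
        raise ValueError("price changes have zero variance")
    cdf = NormalDist().cdf
    results = []
    for (_, volume), change in zip(bars[1:], changes):
        buy = volume * cdf(change / sigma)
        sell = volume - buy
        results.append((buy, sell, (buy - sell) / volume))
    return results


# Toy data: six trades aggregated into four 100-share bars.
prices = [10.0, 10.0, 10.1, 10.1, 10.0, 10.0]
sizes = [50, 50, 60, 40, 100, 100]
bars = volume_bars(prices, sizes, bar_volume=100)
classified = bvc_order_imbalance(bars)
```

In this toy example, a bar that closes above the prior bar's close is assigned mostly buy volume, a bar that closes below is assigned mostly sell volume, and a bar with no price change is split evenly. Note that the split depends only on the price change and the bar volume, not on which side initiated each trade.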
Specifically, the buyer-initiated volume is calculated as follows (Eq. (1), ELO (2016)):
5 Trade bars have also been used in Chakrabarty, Pascual and Shkilko (2015) and a working paper version of ELO (2016).