Evaluating the Impact of IPv6 on a High Frequency Trading System

Nereus Lobo, Vaibhav Malik, Chris Donnally, Seth Jahne, Harshil Jhaveri

[email protected] , [email protected] , [email protected] , [email protected] , [email protected]

A capstone paper submitted as partial fulfillment of the requirements for the degree of Masters in Interdisciplinary at the University of Colorado, Boulder, 4 May 2012. Project directed by Dr. Pieter Poll and Professor Andrew Crain.

1 Introduction

Employing latency-dependent strategies, financial trading firms rely on trade execution speed to obtain a price advantage on an asset in order to earn a profit of a fraction of a cent per asset share [1]. Through successful execution of these strategies, trading firms are able to realize profits on the order of billions of dollars per year [2]. The key to success for these trading strategies is ultra-low latency processing and networking systems, henceforth called High Frequency Trading (HFT) systems, which enable trading firms to execute orders with the necessary speed to realize a profit [1]. However, competition from other trading firms magnifies the need to achieve the lowest latency possible. A 1 µs latency disadvantage can result in unrealized profits on the order of $1 million per day [3]. For this reason, trading firms spend billions of dollars on their data center infrastructure to ensure the lowest propagation delay possible [4]. Further, trading firms have expanded their focus on latency optimization to other aspects of their trading infrastructure, including application performance, inter-application messaging performance, and network infrastructure modernization [5]. As new networking technologies are introduced into the market, it is imperative that they be evaluated for their impact on end-to-end system latency.

The objective of this research is to determine whether emerging networking technologies have a significant impact on HFT system latency that could result in a competitive disadvantage. Specifically, this paper develops a latency-optimized HFT system performance model and contrasts latency performance between IPv4 and IPv6 through this model. For the purposes of measurement and evaluation, this paper defines a latency difference of 1 µs as significant, based on the magnitude of forgone profit that can result from such a disadvantage. Our research contributes a latency-optimized end-to-end HFT system model for comparing latency performance with other HFT systems and platforms. Additionally, this paper provides recommendations for HFT system optimization.

This research is important because the profitability of low latency trading strategies is highly coupled to end-to-end HFT system latency performance, where a small change in latency performance can result in a large swing in realized profits. Further, IP is a foundational networking technology implemented across an HFT system. Changing the IP implementation from IPv4 to IPv6 alters the latency characteristics of these devices due to additional overhead processing and serialization delays. While each individual change in latency is arguably small, these small latencies, accumulated over an end-to-end HFT system, may result in a significant 1 µs latency increase.

The salient feature of an HFT system is the speed with which it can execute a trade. In an HFT system, pre-trade analytics determine whether a trade should be executed. The latency of pre-trade analytics, which occurs at the application layer inside an HFT market analysis engine, is beyond the scope of this paper. Additionally, inter-software messaging and trade execution software processing latency are beyond the scope of this paper. Instead, this research focuses solely on the trade execution path latency performance across the processing platform and networking devices that provide connectivity to the financial exchange.
The main audiences for this research are financial trading firms that employ low latency trading strategies and network service providers that service these firms. The findings from this research serve to assist with a financial trading firm’s HFT system IP modernization planning. Further, the resulting latency-optimized HFT system model can be used by trading firms and network service providers as a point of comparison to assess the performance of their HFT systems and to identify latency optimization opportunities for those systems.

2 Assumptions

This paper makes two assumptions to establish a common networking interface across an HFT system. The first assumption is the message size for a financial trading firm’s trade execution order is 256 bytes. This assumption is based on the largest average trade order message size used by a financial exchange [6]. The second assumption is that all processing and networking devices implement a 10G networking adapter, which is derived from trading firms’ data center equipment expenditures [4].

3 Prior Study

Our initial literature review served to confirm the novelty of our research question. We were able to find general latency studies for specific computing and networking technologies, but did not find any studies specific to the latency of an end-to-end HFT system. We were also able to find general studies on latency issues caused by IPv6, especially as they relate to transition technologies; however, we were unable to find any studies that described potential IPv6 latency issues in an HFT network.

4 Methodology and Results

To determine the latency impact of IPv6 on an HFT system, we decomposed the system into three segments: the trade execution computing platform segment, the IP network segment, and the optical transport segment. Next, we developed latency models to identify sources of latency for the processing, networking, and transmission platforms used in the HFT system segments. From there, a literature survey was conducted to identify high performance platforms likely deployed in HFT systems and to identify where differences between IPv4 and IPv6 implementations would impact latency. With the HFT system segment platforms identified, the literature survey was expanded to obtain latency performance measures for each platform and to assess any performance differential between IPv4 and IPv6. Finally, with consideration to the latency contributions of IPv4 and IPv6, platform latency performances are compiled into the HFT system model to establish a latency-optimized performance baseline.

4.1 Trade Execution Computing Platforms

Trade execution computing platforms are simply high performance computers that run a financial trading firm's trade execution software application [7]. To identify potential sources of latency on the computing platform, an IP packet processing model was developed to illustrate the different paths through an operating system (OS) available to trade execution software executing a trade order. The first path is through the OS's networking stack, where IP protocol overhead processing occurs along with the processing of other protocols. The second path is through a TCP Offload Engine (TOE), which instantiates the networking stack on the platform's network adapter. The computing platform IP packet processing model is illustrated in Figure 1. From this model, the pertinent computing platform processing aspects that contribute to trade execution latency are the OS networking stack, the TOE, IPSec, and IPv6 transition protocol processing. Further, HFT literature identifies that trade execution computing platform deployment configurations are non-standard and can be implemented with a variety of OS and hardware configurations [7]. Therefore, the literature survey focused on locating latency performance measures for High Performance Computing (HPC) platforms running modern versions of either the Windows or Linux OS configured with a latency-optimized networking stack.

[Figure 1 depicts the application, socket, transport, network, and device driver layers of the OS networking stack alongside the NIC hardware, with TCP off-loading, zero-copy, IPSec, and IPv6 transition processing paths and interrupt coalescing at the network adapter.]

Figure 1: Computing Platform IP Packet Processing Model

The literature survey produced two independent research studies identifying round-trip time (RTT) latency performance on similar HPC platforms. The first study was conducted on a 3 GHz Intel Xeon 5160 processor with the following configuration: Linux 2.6.18, a latency-optimized network stack, and a Chelsio T110 10 Gigabit Ethernet (GigE) adapter. The platform's RTT latency performance was 10.2 µs [8]. The second study was conducted on a 2.2 GHz AMD Opteron processor with the following configuration: Linux 2.6.6, a latency-optimized network stack, and a Chelsio T110 10GigE adapter. The platform's RTT latency performance was 10.37 µs [9]. Further, the first study contrasted the platform's OS networking stack latency performance against the TOE on the Chelsio T110 10GigE adapter; the TOE provided a 1.3 µs latency improvement, which lowered the total platform latency performance to 8.9 µs [9]. In contrast, non-optimized computing platforms and networking stacks have latency performance on the order of 40 µs, which is significantly higher than that of optimized platforms [10][11]. Finally, the computing platform's total latency performance is decomposed into transmit and receive latency performance. Utilizing the TOE, the computing platform's transmit latency is 2.9 µs and its receive latency is 6.0 µs [8][9].

The measured IPv4 processing latency for the optimized networking stack on the Intel platform, which utilized the network adapter for IPv4 header checksum offloading, is 450 ns [8]. While latency performance data for IPv6 is unavailable from the platform study, investigation into the Linux 2.6 network stack shows that increased processing is needed for IPv6 socket creation due to the inclusion of ICMP, NDP, and IGMP functions in IPv6 [12]. An independent study contrasting IPv4 and IPv6 socket creation times confirms this finding [13]. However, a financial firm holds a persistent connection to a financial exchange during trading hours, so socket creation latency is not a factor in trade order execution speed. Once the socket is created, and assuming IPv4 checksum offloading, the main processing performed at the IP layer is fragmenting large TCP segments [14]. Based on the assumed 256 byte message size, the IP layer will not need to fragment the TCP segment. Therefore, the OS processing demands for IPv6 are equivalent to those for IPv4 and will not incur a significant latency penalty. However, a latency penalty is incurred when transmitting an IPv6 packet due to the serialization delay from the larger header size: the IPv6 serialization delay increases the computing platform latency performance by 0.016 µs. Table 1 identifies the trade execution computing platform latency performance values for IPv4 and IPv6 used in the latency-optimized HFT system model.

Table 1: Trade Execution Computing Platform Latency Performance Model

Trade Execution Computing Platform   IPv4 Latency (µs)   IPv6 Latency (µs)
Model Tx latency                     2.9                 2.916
Model Rx latency                     6.0                 6.016
Total Latency                        8.9                 8.932
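As a worked illustration of where the 0.016 µs penalty in Table 1 comes from, the following sketch computes the time needed to serialize the assumed 256 byte trade order onto a 10 Gb/s interface. The TCP and Ethernet overhead values are illustrative assumptions rather than figures taken from the cited studies; only the 20 byte difference between the IPv4 and IPv6 headers drives the result.

```python
# Minimal sketch: serialization delay of a 256-byte trade order over 10GigE.
# TCP and Ethernet overhead sizes are assumptions for illustration.
LINE_RATE_BPS = 10e9           # 10GigE line rate
MESSAGE_BYTES = 256            # assumed trade order size [6]
TCP_HEADER_BYTES = 20
ETHERNET_OVERHEAD_BYTES = 18   # MAC header + FCS (preamble and gap omitted)

def serialization_delay_ns(ip_header_bytes: int) -> float:
    """Time to clock the full frame onto the wire, in nanoseconds."""
    frame_bits = (MESSAGE_BYTES + TCP_HEADER_BYTES + ip_header_bytes
                  + ETHERNET_OVERHEAD_BYTES) * 8
    return frame_bits / LINE_RATE_BPS * 1e9

ipv4_ns = serialization_delay_ns(20)   # ~251 ns
ipv6_ns = serialization_delay_ns(40)   # ~267 ns
print(f"IPv4: {ipv4_ns:.1f} ns, IPv6: {ipv6_ns:.1f} ns, delta: {ipv6_ns - ipv4_ns:.1f} ns")
```

Regardless of the framing assumptions, the delta is fixed by the extra 20 header bytes: 160 bits at 10 Gb/s is 16 ns, the 0.016 µs applied to the transmit and receive rows of Table 1.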

Other optional IP technologies that may impact computing platform latency performance are IPSec and IPv6 transition mechanisms. IPSec appends additional headers to IPv4 and IPv6 packets for increased IP communications security [15]. When IPSec is configured to have the lowest impact on latency, i.e., tunnel mode with AES-128 Encapsulating Security Payload (ESP), the Intel platform incurs an additional 1.9 µs latency penalty [8][16]. The IPSec latency penalty is equal across IPv4 and IPv6 [15]. Further, IPSec use is optional in both IPv4 and IPv6, even though its implementation is mandatory in IPv6 [17]. Finally, a survey of IPv6 transition mechanism performance studies exposed the potential for up to a 200 ms latency penalty on non-optimized platforms [18][19].


4.2 IP Network

From the perspective of a financial trading firm, the main purpose of networking devices in an HFT system is to process and route trading orders to the financial exchange as fast as possible. A general architecture for networking devices was constructed to identify potential IP packet processing latency sources [20]. Based on the functions identified in this architecture, the potential latency sources are the networking device's processing delay, fragmentation delay, serialization delay, queuing delay, and checksum delay. This architecture is illustrated in Figure 2.

Figure 2: Networking Platform IP Packet Processing Model

When an IP packet enters a networking device, a processing latency is incurred because the device must read in the IP packet header and make a routing decision based on the IP header address [20]. Additionally, IPv4 packets incur further latency due to verification of the header checksum, which determines whether any header errors were introduced during transmission of the packet [21]. IPv6 does not contain a header checksum and is therefore not subject to this additional checksum latency penalty [16]. This paper combines IPv4 checksum processing into the IPv4 processing latency for evaluation purposes. The processing delay for a networking device is defined as the inverse of the device's packet processing speed [22]. Once the path through the networking device is determined, the IP packet is sent to the queue on the selected egress port for transmission. The queuing delay is the amount of time a packet is held prior to transmission and is determined by the queuing algorithm efficiency and the egress port buffer size [23]. The queuing delay for a networking device is defined as the product of the network load and the store-and-forward switching latency [24]. As the IP packet exits the networking device's egress port, the packet is serialized and transmitted one bit at a time onto the physical transmission medium. The serialization delay is defined as the sum of the message size and the networking overhead, in bits, divided by the data rate [24]. The last source of latency identified is fragmentation delay, where a transmitting networking device fragments a large IP packet into smaller packets for transmission over a path with a smaller Maximum Transmission Unit (MTU). The receiving networking device must wait until all packet fragments are received before a routing decision can be made. Based on the assumed trade order packet size, IP fragmentation will not occur in an HFT system [6]. Additionally, it is worth noting that while fragmentation is supported in IPv4, IPv6 implements path MTU discovery, which eliminates the need for packet fragmentation on intervening devices such as routers [16]. From this model, the relevant networking device processing aspects that contribute to trade execution latency are the processing delay, queuing delay, and serialization delay.

The main objective of the literature survey was to identify latency performance measures for the highest performing networking devices. The router literature survey produced a study comparing the performance of the Cisco NX7000 and Juniper EX8000 series routers, which are marketed as high performance data center routers [25]. Based on the data provided in this study, and using the processing delay calculation from the model, the processing delay for the Cisco NX7000 series router was calculated to be 16.67 ns for IPv4 and 33.3 ns for IPv6. The processing delay for the Juniper EX8000 series router was calculated to be 8.3 ns for both IPv4 and IPv6. To find the worst case queuing and serialization latency for a trade execution order, the largest average message size of 256 bytes is used in the calculations [6]. The queuing latency is 66.2 ns for IPv4 and 66.8 ns for IPv6. The serialization latency over a 10GigE network adapter is 251 ns for IPv4 and 267 ns for IPv6. Based on the latency sources identified in the model, the aggregate latency performance for the Cisco NX7000 is 333.9 ns for IPv4 and 367.1 ns for IPv6, and the aggregate latency performance for the Juniper EX8000 is 325.5 ns for IPv4 and 342.1 ns for IPv6. Therefore, the latency performance difference between IPv4 and IPv6 is not significant.

However, a large discrepancy exists between the networking device IP packet processing model and the measured overall device latency. From the study, the aggregate latency performance of the Juniper EX8000 ranges between 8 and 15 µs and that of the Cisco NX7000 ranges between 20 and 40 µs [25]. Investigation into the difference between the modeled and measured latency performance of the Cisco and Juniper routers is left to further study. Instead, the literature survey expanded in scope to compare the latency performance of TCP/IP on latency-optimized technologies such as Infiniband and Myrinet. From the literature survey, one study measured the latency performance of TCP/IP at 60 µs, of Myrinet at 8.3 µs, and of Infiniband at 5.3 µs [8]. Another study evaluated networking devices specifically optimized for latency performance; it evaluated the Fujitsu XG1200 10GigE switch and measured its latency performance at 0.45 µs [9]. The latency performance of optimized networking devices and technologies more closely matches the model's calculated latency performance. Table 2 contains the modeled IP network device latency performance values used in the latency-optimized HFT system model.

Table 2: IP Network Device Latency Performance Model

IP Network Device     IPv4 Latency (µs)   IPv6 Latency (µs)
Processing Delay      0.008               0.008
Queuing Delay         0.066               0.067
Serialization Delay   0.251               0.267
Total Latency         0.326               0.342
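The per-device values in Table 2 follow from the three delay definitions above. The sketch below restates those definitions as code; the forwarding rate and network load are illustrative assumptions chosen to approximate the Juniper figures, not vendor specifications.

```python
# Minimal sketch: networking device latency model (processing + queuing + serialization).
# Forwarding rate and load values are illustrative assumptions.
LINE_RATE_BPS = 10e9
FRAME_BYTES = {"IPv4": 256 + 20 + 20 + 18,   # message + TCP + IPv4 header + Ethernet framing
               "IPv6": 256 + 20 + 40 + 18}   # IPv6 header is 20 bytes larger

def processing_delay_s(packets_per_second: float) -> float:
    # Processing delay modeled as the inverse of the packet processing rate [22].
    return 1.0 / packets_per_second

def queuing_delay_s(load: float, store_and_forward_s: float) -> float:
    # Queuing delay modeled as network load times store-and-forward latency [24].
    return load * store_and_forward_s

def serialization_delay_s(frame_bytes: int) -> float:
    # Serialization delay: frame bits divided by the line rate [24].
    return frame_bytes * 8 / LINE_RATE_BPS

for version, frame in FRAME_BYTES.items():
    ser = serialization_delay_s(frame)
    total = processing_delay_s(120e6) + queuing_delay_s(0.25, ser) + ser
    print(f"{version}: {total * 1e9:.0f} ns")   # on the order of 0.32 to 0.34 µs
```

With these assumed parameters the model lands within a few nanoseconds of the Table 2 totals; the dominant term is serialization, which is also where the IPv4/IPv6 difference originates.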

4.3 Optical Network

Financial trading firms are conscious of propagation delay and have been actively reducing it, primarily by co-locating their datacenters with the financial exchanges [4]. Pragmatism unfortunately dictates that trading firms cannot co-locate a datacenter at every exchange and must therefore rely on high-speed long-haul optical networks for many connections. Given the ubiquitous need for trade execution speed, latency sources in the long-haul optical network need to be evaluated for their impact on HFT system performance. The architecture model, illustrated in Figure 3, identifies three categories of potential latency sources present in the optical network.

[Figure 3 depicts an optical route built from ADM/multiplexor terminals, optical amplifiers, and dispersion compensation equipment spaced at 80 to 150 km spans, contributing processing delay at the equipment and distance delay along the fiber.]

Figure 3: Optical Network Latency Model

The first category is distance delay, which is a function of lightwave propagation delay, fiber cable construction, and cable routing directness. Lightwave propagation delay is a function of the speed of light and the fiber's index of refraction. Modern optical fiber cables, including OFS TrueWave Reach and Corning LEAF, exhibit latency characteristics of approximately 4.9 ms over 1,000 km [26][27]. Air Core Fiber is a future generation fiber technology that exhibits significantly improved latency characteristics of 3.36 ms over 1,000 km [28]. Next, fiber cable construction impacts latency because most deployed fiber cables use a loose buffer tube design, where several fiber-filled tubes are twisted around a central core. This twisting increases the lightwave's travel distance with respect to the fiber cable length. Alternatively, fiber cables can be constructed with a single central tube, which reduces the lightwave's travel distance within the cable [29]. The loose buffer tube design can add 59 to 398 µs of latency over 1,000 km, whereas single-tube ribbon cable adds only 15 to 24 µs over 1,000 km [29].

Finally, cable routing directness relates the total installed fiber cable length to the shortest distance between two geographic points. For example, based on the Level 3 fiber route map between New York and Chicago, we estimated a distance of 1,485 km and a resultant latency of 7.28 ms. One approach to reducing the overall fiber distance is to remove installed fiber maintenance coils, which would shorten the route by 36 km and lower the latency to 7.1 ms [30]. Alternatively, new fiber routes could be installed to further reduce the fiber route distances between two points. Spread Networks applied this approach and was able to reduce the fiber distance by 158 km, lowering its network latency to 6.5 ms [31].

To construct an optimized distance latency model, the minimum distance between the New York and Chicago exchanges is calculated from their latitude and longitude coordinates. Using this method, the distance between the New York and Chicago exchanges is 1,040 km. This distance is the optimized lightwave propagation distance and is used, along with the optimized fiber index of refraction and cable construction type, to find the optimized lightwave distance latency. Table 3 identifies the lightwave distance latencies for the optimized and non-optimized models.

Table 3: New York Exchange to Chicago Exchange Lightwave Distance Latency Model

Optical Network Distance   Optimized Latency                Non-Optimized Latency
Refraction Index           3.36 ms per 1,000 km             4.9 ms per 1,000 km
                           (Air Core Fiber)                 (OFS TrueWave)
Helix Factor               0.015 ms per 1,000 km            0.398 ms per 1,000 km
                           (Single Tube Ribbon)             (Loose Buffer Tube)
Routing Directness         1,040 km                         1,485 km
                           (Theoretical minimum)            (Level 3 Fiber Map)
Total Distance Latency     3.510 ms                         7.868 ms
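The Table 3 totals compose multiplicatively with route length. The sketch below reproduces the optimized and non-optimized figures from the per-1,000 km values cited above; the values themselves are taken from the table rather than re-derived from fiber parameters.

```python
# Minimal sketch: lightwave distance latency = (propagation + helix penalty) per km x route length.
def distance_latency_ms(propagation_ms_per_1000km: float,
                        helix_ms_per_1000km: float,
                        route_km: float) -> float:
    return (propagation_ms_per_1000km + helix_ms_per_1000km) * route_km / 1000.0

# Air Core Fiber, single-tube ribbon, theoretical minimum New York to Chicago route.
optimized = distance_latency_ms(3.36, 0.015, 1040)
# OFS TrueWave, loose buffer tube, Level 3 fiber route.
non_optimized = distance_latency_ms(4.9, 0.398, 1485)
print(f"Optimized: {optimized:.3f} ms, Non-optimized: {non_optimized:.3f} ms")
# -> Optimized: 3.510 ms, Non-optimized: 7.868 ms (Table 3 totals)
```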

The second optical network latency category is the deployed optical transmission systems, which include optical amplifiers, Add/Drop Multiplexers (ADMs), muxponders, dispersion compensation modules (DCMs), and Forward Error Correction (FEC). For optical amplifiers, the implementation of an Erbium Doped Fiber Amplifier (EDFA) adds approximately 30 m of erbium doped fiber per amplifier, which serves as an all-optical amplification medium for the optical signal. Over 1,000 km, approximately 10 EDFAs are needed, which would increase the overall fiber length by 300 m and result in an additional latency of 1 µs [32]. However, Raman amplifiers are able to amplify the optical signal without introducing additional fiber length to the route. For muxponders, which adapt layer 2 networking protocols to and from the optical medium, the processing and queuing latency for adapting a 10GigE input onto the layer 1 optical network is 6 µs [33]. ADMs, which perform a similar adaptation function for SONET networks, did not have any 10GigE latency data available. Next, DCMs correct chromatic dispersion, which would otherwise render the optical signal unintelligible to the receiver. The latency characteristics of dispersion compensation are technology dependent. Based on Level 3's New York to Chicago fiber route, older style DCMs on SMF-28e fiber increase the latency by 1.09 ms [29]. Alternatively, along the same route, current generation coherent compensation, based on specialized ASICs/DSPs, reduces the latency penalty for dispersion compensation to 3 µs. Finally, FEC processing, which is employed to improve the optical signal-to-noise ratio via coding gain, incurs a latency penalty ranging from 15 µs to 150 µs [28][34]. Table 4 identifies the optical equipment processing latencies for the optimized and non-optimized models.

The final latency category is the serialization of the optical protocol onto the fiber, which is impacted by the overhead differences in the IP header and the choice of layer 2 adaptation protocol. To contrast the latency performance between IPv4 and IPv6, a 256 byte packet was modeled through an OC-192c (10G) Packet over SONET (POS) link and a 10G optical Ethernet link. For the 10G POS link, we calculated the latency performance for IPv4 at 246 ns and for IPv6 at 263 ns. For the 10G optical Ethernet link, we calculated the latency performance for IPv4 at 254 ns and for IPv6 at 270 ns. Table 5 identifies the optical network IP serialization latencies for the optimized and non-optimized models.


Table 4: Optical Network Equipment Processing Latency Model

Optical Network Equipment       Optimized Latency     Non-Optimized Latency
Optical Amplifier               0.0 µs (Raman)        1.5 µs (EDFA)
Terminal Equipment Processing   6.0 µs (ADM)          6.0 µs (Muxponder)
Dispersion Compensation         3.0 µs (ASIC/DSP)     1,090.0 µs (DCM)
FEC Processing                  15.0 µs               150.0 µs
Total Processing Latency        24.0 µs               1,247.5 µs

Table 5: Optical Network IP Serialization Latency Model

                                   Optimized (10G POS)             Non-Optimized (10G Ethernet)
Optical Network IP Serialization   IPv4 (µs)       IPv6 (µs)       IPv4 (µs)       IPv6 (µs)
                                   0.246           0.263           0.254           0.270

Table 6 identifies the composite optimized optical network model comprised of the optimized lightwave distance latency, optical equipment processing latency, and IP serialization latency. The optimized optical network latency performance values are used in the latency-optimized HFT system model.

Table 6: Composite Optimized Optical Network Latency Performance Model

Optical Network Latency   IPv4 Model Latency (µs)   IPv6 Model Latency (µs)
Lightwave distance        3,510.00                  3,510.00
Equipment processing      24.00                     24.00
IP serialization          0.25                      0.26
Total                     3,534.25                  3,534.26
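For clarity on units, the composite figures in Table 6 simply convert the optimized distance latency from milliseconds to microseconds and add the equipment processing and serialization terms, as in the short sketch below (values carried over from Tables 3 through 5).

```python
# Minimal sketch: composite optimized optical network latency (µs), per Table 6.
distance_us = 3.510 * 1000                          # optimized distance latency, ms -> µs (Table 3)
equipment_us = 24.0                                 # optimized equipment processing (Table 4)
serialization_us = {"IPv4": 0.246, "IPv6": 0.263}   # optimized 10G POS serialization (Table 5)

for version, ser in serialization_us.items():
    print(f"{version}: {distance_us + equipment_us + ser:,.2f} µs")
# -> IPv4: 3,534.25 µs / IPv6: 3,534.26 µs
```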

4.4 HFT System Model

Based on our research, the latency-optimized HFT system model provides the minimum latency achievable between a trading firm and a financial exchange. Some aspects of the HFT system model are not immediately achievable in practice because they are based on either future technologies or direct paths between two geographic locations. This paper examines two trading scenarios for latency-dependent trading strategies to characterize the HFT system model's performance.

The first strategy is low latency arbitrage, which exploits a condition at financial exchanges where trade orders at the front of a queue receive a lower asset price, in the range of a cent or two per share, than orders deeper in the queue [1]. A typical HFT system implementation strategy is to co-locate the trade execution computing platform with the financial exchange [1]. Under this scenario, the transmit latency of a trade order is critical. From the developed HFT system model and our co-location assumption, the HFT system segments that apply to this scenario are the trade execution computing platform and the IP network. Figure 4 illustrates the HFT low latency arbitrage scenario.

[Figure 4 depicts a co-located HFT firm (Firm A) and a remote firm (Firm B) each submitting a buy order to the financial exchange; Firm A's order sits ahead of Firm B's in the exchange queue and is filled at a $10.00 stock price versus $10.01.]

Figure 4: High Frequency Trading Scenario – Low Latency Arbitrage

For the computing platform, the transmit latency for IPv4 is 2.9 µs and for IPv6 is 2.916 µs [8][9]. For the IP network, the transmit latency for IPv4 is 326 ns and for IPv6 is 342 ns. Therefore, based on the developed HFT system latency performance model for this scenario, the total latency for IPv4 is 3.226 µs and for IPv6 is 3.258 µs. Table 7 contains the latency-optimized HFT system model values for IPv4 and IPv6. Under the low latency arbitrage scenario, the HFT system latency model shows a non-significant latency performance penalty for IPv6 network implementations.

Table 7: Latency Optimized HFT System Model for Low Latency Arbitrage Scenario

HFT System Model                     IPv4 Latency (µs)   IPv6 Latency (µs)
Trade Execution Computing Platform   2.900               2.916
IP Network Device                    0.326               0.342
Total Latency                        3.226               3.258

The second strategy is HFT scalping, where a trading firm identifies, between two distant exchanges, a higher asset bid price on one exchange than the asking price on the other. The firm purchases the asset at the cheaper asking price on one exchange and immediately resells it at the higher bid price on the other, realizing a profit of a few cents per share [1]. A trading firm may capitalize on these scalping opportunities at any financial exchange. Under this scenario, the round trip latency is critical because the trading firm must complete the buy transaction before the selling opportunity disappears. In addition to the round trip latency of the buy transaction, the transmit latency to the distant exchange for the sell transaction must also be considered. From the developed HFT system model, the HFT system segments that apply to this scenario are the trade execution computing platform, the IP network, and the optical network. Additionally, for this scenario we model the HFT system latency performance of a trading firm in New York performing HFT scalping in Chicago. Figure 5 illustrates the HFT scalping scenario.

Figure 5: High Frequency Trading Scenario – Scalping

For the trade execution computing platform, the total latency for IPv4 is 11.8 µs and for IPv6 is 11.848 µs. For the IP network, the total latency for IPv4 is 0.978 µs and for IPv6 is 1.026 µs. For the optical network, the total latency for IPv4 is 3,534.25 µs and for IPv6 is 3,534.26 µs. Therefore, based on the developed HFT system latency performance model for this scenario, the total latency for IPv4 is 3,547.028 µs and for IPv6 is 3,547.134 µs. Table 8 contains the latency-optimized HFT system model values for IPv4 and IPv6. Under the scalping scenario, the HFT system latency model shows a non-significant latency performance penalty for IPv6 network implementations.

Table 8: Latency Optimized HFT System Model for Scalping Scenario

HFT System Model                     IPv4 Latency (µs)   IPv6 Latency (µs)
Trade Execution Computing Platform   11.800              11.848
IP Network Device                    0.978               1.026
Optical Network                      3,534.250           3,534.260
Total Latency                        3,547.028           3,547.134
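To make the scenario composition explicit, the sketch below assembles the segment values from Tables 1, 2, and 6 into the totals of Tables 7 and 8. The segment weights (one platform transmit and one IP device traversal for arbitrage; a buy round trip plus a sell transmit, three IP device traversals, and one optical traversal for scalping) are our reading of how the model tables combine, stated here as an assumption.

```python
# Minimal sketch: composing segment latencies (µs) into the two trading scenarios.
# Segment weights are our reading of the model tables, stated as an assumption.
SEGMENTS = {                         # (IPv4, IPv6) latency in µs
    "platform_tx": (2.9, 2.916),
    "platform_rx": (6.0, 6.016),
    "ip_device":   (0.326, 0.342),
    "optical":     (3534.25, 3534.26),
}

def scenario_total(weights: dict) -> tuple:
    """Sum weighted segment latencies, returning (IPv4, IPv6) totals in µs."""
    ipv4 = sum(SEGMENTS[name][0] * count for name, count in weights.items())
    ipv6 = sum(SEGMENTS[name][1] * count for name, count in weights.items())
    return round(ipv4, 3), round(ipv6, 3)

# Low latency arbitrage: one platform transmit plus one IP device traversal (Table 7).
print("Arbitrage:", scenario_total({"platform_tx": 1, "ip_device": 1}))
# Scalping: buy round trip plus sell transmit, three IP device traversals,
# and one optical network traversal (Table 8).
print("Scalping:", scenario_total({"platform_tx": 2, "platform_rx": 1,
                                   "ip_device": 3, "optical": 1}))
# -> Arbitrage: (3.226, 3.258)   Scalping: (3547.028, 3547.134)
```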


5 Conclusion

Based on 1 µs as the latency of significance for HFT systems, we conclude that a trading firm implementing a latency-optimized native IPv6 HFT system will not incur a latency disadvantage when compared to a native IPv4 implementation. Our results show that among all of the various delay types, the serialization of the additional IP header bits is the prominent performance differentiator between IPv4 and IPv6, producing a performance difference on the order of tens of nanoseconds. Additionally, our results show that the optical layer is the largest source of latency in an HFT network and provides the greatest opportunity for latency optimization.

5.1 Recommendations

For HFT system administrators reviewing their path to IPv6, we offer a few recommendations. First, our research finds that transitional deployments will be significantly disadvantaged; administrators should plan for a direct cutover to a native IPv6 environment. Second, high performance platforms, which are generally hardware based and software optimized, are essential to eliminate any processing delay differential between IPv4 and IPv6. Third, faster interfaces, such as the 10GigE used in our model, are essential to minimize the serialization delay difference between IPv4 and IPv6. Finally, rigorous lab testing, based on the firm's production network, will need to be performed to ensure that platforms and their interoperation do not contain any latency-degrading hardware or software bugs.

5.2 Further Study

During the course of our research, we found several areas that warrant further study. The first area to investigate is the latency performance difference between IPv4 and IPv6 on layer 4 processing platforms, including firewalls and WAN accelerators. While we expect, based on our research findings, that the performance difference is negligible, a systematic study of these platforms would serve to enhance the end-to-end HFT system latency performance model. Next, our research focused on a static model of an HFT system. Dynamic modeling, defined as characterizing latency performance under different load scenarios, presents another opportunity for further study. Due to the cost and complexity of latency-optimized equipment, a study of this nature would need significant funding to be performed in an academic environment. Finally, while HFT firms are currently focused on microsecond optimizations, we project that nanosecond latencies will be emphasized in the future. As latency becomes optimized at the optical layer, or for direct co-location applications, the IPv4 to IPv6 serialization delay differential, even at higher future interface speeds, may again become a question of significance. For this future possibility, further study into the latency advantages and feasibility of Infiniband and Myrinet would be warranted.


References:

[1] A. Golub, "Overview of high frequency trading," presented at the Marie Curie Initial Training Network on Risk Management and Risk Reporting Mid-Term Conference, Berlin, 2011.
[2] C. Reed. (2012, Feb. 27). A fair transaction tax for U.S. stock trading (1st ed.) [Online]. Available: http://www.seekingalpha.com
[3] J. Goldstein. (2010, June 8). The million dollar microsecond (1st ed.) [Online]. Available: http://www.npr.org/
[4] K. McPartland. (2010, June 21). Long distance latency: Straightest and fastest equals profit (1st ed.) [Online]. Available: http://www.tabbgroup.com
[5] M. Rabkin. (2010, Sep. 27). TABB says banks are taking holistic approach to finding weak links within a trade's lifecycle (1st ed.) [Online]. Available: http://www.tabbgroup.com
[6] M. Simpson, "Market data optimized for high performance," presented at the FIA Futures and Options Expo Conference, Chicago, IL, 2005.
[7] "Low latency framework," presented at the Oracle Trading Applications Developer Workshop, London, 2011.
[8] S. Larsen et al., "Architectural breakdown of end-to-end latency in a TCP/IP network," in 19th Intl. Symp. on Computer Architecture and High Performance Computing, Rio Grande do Sul, 2007, pp. 195-202.
[9] W. Feng et al., "Performance characterization of a 10-gigabit ethernet TOE," in 13th Symp. on High Performance Interconnects, 2005, pp. 58-63.
[10] S. Narayan and Y. Shi, "TCP/UDP analysis of Windows operating systems with IPv4 and IPv6," in 2nd Intl. Conf. on Signal Processing Systems, Dalian, 2010, pp. 219-222.
[11] S. Narayan et al., "Network performance evaluation of internet protocols IPv4 and IPv6 on operating systems," in Intl. Conf. on Wireless and Optical Communications Networks, Cairo, 2009, pp. 1-5.
[12] T. Herbert, "Internet protocol version 6 (IPv6)," in The Linux TCP/IP Stack: Networking for Embedded Systems, 1st ed. Hingham, 2004, ch. 11, sec. 11.8, pp. 425-426.
[13] S. Zeadally and L. Raicu, "Evaluating IPv6 on Windows and Solaris," IEEE Internet Computing, 2003, pp. 51-57.
[14] T. Herbert, "Internet protocol version 6 (IPv6)," in The Linux TCP/IP Stack: Networking for Embedded Systems, 1st ed. Hingham, 2004, ch. 11, sec. 11.5.1, pp. 417-420.
[15] S. Kent and R. Atkinson. (1998, Nov.). Security architecture for the internet protocol [Online]. Available: http://www.ietf.org/rfc/rfc2401.txt
[16] H. Niedermayer et al., "The networking perspective of security performance – a measurement study," in 13th GI/ITG Conference on Measuring, Modeling and Evaluation of Computer and Communication Systems, 2006, pp. 1-17.
[17] S. Deering and R. Hinden. (1998, Dec.). Internet protocol, version 6 (IPv6) specification [Online]. Available: http://www.ietf.org/rfc/rfc2460.txt
[18] S. Narayan and S. Tauch, "IPv4-v6 transition mechanisms network performance evaluation on operating systems," in 3rd IEEE Intl. Conf. on Computer Science and Information Technology, Chengdu, 2010, pp. 664-668.
[19] S. Tauch, "Performance evaluation of IP version 4 and IP version 6 transition mechanisms on various operating systems," M.S. thesis, Computing & Technology, Unitec Inst. of Technology, NZ, 2010.
[20] J. Aweya, "IP router architecture: An overview," J. of Systems Architecture, vol. 46, pp. 483-511, 1999.
[21] J. Postel. (1981, Sep.). Internet protocol: DARPA internet protocol specification [Online]. Available: http://www.ietf.org/rfc/rfc791.txt
[22] J. Kurose and K. Ross, "Delay and loss in packet-switched networks," in Computer Networking: A Top-Down Approach Featuring the Internet, Boston: Addison-Wesley, 2000.
[23] Latency on a switched Ethernet network, RuggedCom, 2008 [Online]. Available: http://www.ruggedcom.com/pdfs/application_notes/latency_on_a_switched_ethernet_network.pdf
[24] E. Gamess and N. Morales, "Modeling IPv4 and IPv6 performance in Ethernet networks," International Journal of Computer and Electrical Engineering, vol. 3, no. 2, pp. 285, 2011.
[25] Juniper EX8200 vs. Cisco Nexus 7000, Great Lakes Computer, 2012 [Online]. Available: www.glcomp.com
[26] Corning LEAF optical fiber product information, Corning Incorporated, 2002.
[27] TrueWave REACH fiber, OFS Fitel, LLC, 2011.


[28] P. Schoenau. (2011, Apr. 28). Optical networks for low latency applications [Online]. Available: http://www.a-teamgroup.com/
[29] J. Jay, "Low signal latency in optical fiber networks," in Proc. of the 60th IWCS Conference, Charlotte, NC, 2011, pp. 429-437.
[30] B. Quigley. (2011, May 4). Another building block of low-latency trading: Efficient optical transport [Online]. Available: http://blog.advaoptical.com/another-building-block-low-latency-trading-efficient-optical-transport/
[31] Network map, Spread Networks, 2012 [Online]. Available: http://www.spreadnetworks.com/network-map/
[32] Low latency—how low can you go?, Transmode, 2011 [Online]. Available: http://www.transmode.com/doc_download/262-low-latency-design
[33] Cisco ONS 15454 40Gbps enhanced FEC full band tuneable muxponder cards, Cisco Systems, Inc., 2011.
[34] A sensible low-latency strategy for optical transport networks, Optelian, 2011.
