An Industry Briefing Researched and Written by Low-Latency.com

Powering Winning Low-Latency Trading Strategies
Gaining an Edge Through Server Performance
February 2013

In Association with Dell

Introduction

It’s appropriate to begin with a little levity:

Two campers are walking through the forest when they suddenly encounter a big grizzly bear. The bear rears up on his hind legs and lets out a terrifying roar.

Both campers freeze in their tracks. The first camper whispers “I’m sure glad I wore my running shoes today.”

“It doesn’t matter what kind of shoes you’re wearing, you won’t outrun that bear,” replies the second.

“I don’t have to outrun the bear, I just have to outrun you,” answers the first.

In the world of the financial markets, securing an edge over the competition can mean life or death for a trading firm. Whether it is acting on news alerts or price movements, determining the best trading opportunity, or delivering an order to the marketplace, microseconds mean the difference between winning or just playing.

Reducing those microseconds – referred to as latency – is a continuing focus of trading firms, and an increasing challenge as that latency is pushed down to double- and single-digit microseconds. The “race to zero” becomes increasingly difficult and expensive to engage in as it nears its conclusion.

To date, much of the focus on latency reduction has been directed at reducing the physical distance between trading firms and the markets in which they participate, which results in so-called propagation latency. Nowadays, co-location of trading firms’ servers in the same data centers as markets’ matching engines has nearly eradicated that distance and associated latency.

With local network latency essentially addressed, an emerging but still challenging area of focus for latency reduction is on trading execution and matching system applications, and on the servers that host them.

This industry briefing outlines the low-latency trading landscape, details the latency characteristics of key data and trade execution processing applications, and introduces microprocessor techniques, such as Processor Acceleration Technology, designed to reduce latency in a cost-effective manner.

An industry briefing researched and written by Low-Latency.com for DELL

The Business of Low-Latency Trading

Market Automation, Fragmentation and Execution Latency

Whether in the U.S., Europe, Latin America or Asia/Pacific, exchanges and alternative trading systems (ATS) have invested heavily in low-latency automation. In the U.S., competition among these marketplaces was encouraged by the 2007 implementation of Regulation NMS, an initiative by the Securities and Exchange Commission. Trading firms seek out the exchanges offering the fastest execution times, so that the best price can be achieved before the markets move against them. Today, with 13 regulated equities markets and around 50 ATS in existence, round trip matching times of less than 100 microseconds are commonplace.

In Europe, similar cross-country regulation in the form of the Markets in Financial Instruments Directive (MiFID) was introduced and exchanges across the continent have engaged in similar competition for order flow, with the SIX Swiss Exchange – leveraging technology from NASDAQ – offering a matching time of less than 40 microseconds. Markets in Asia/Pacific – from Singapore to Japan to Australia – and in Latin America – where Brazil and Mexico are leading the way – are also following the low-latency matching trend.

And it’s not just the cash equities markets that have become fragmented and seen low-latency technology investment. In the U.S., the addition at the end of 2012 of the Miami Options Exchange brought the number of equity options markets to 11, and applications are pending for more. Meanwhile, major futures exchanges, such as the Chicago Mercantile Exchange, NYSE Liffe in London and the Frankfurt-based Eurex, have updated their technology to reduce matching latency. As a result, trading firms are able to engage in low-latency arbitrage between cash and derivatives markets.

New automated marketplaces unrelated to equities – including foreign exchange and fixed income – are also emerging and investing in latency reduction. Markets such as FXAll and Hotspot FX have emerged to support FX HFT strategies, while a number of Swap Execution Facilities are expected to establish themselves and will compete, at least in part, on the latency of their trading functions.

Algorithmic and High Frequency Trading

As new markets in all asset classes have emerged, and market fragmentation has risen alongside advances in technology, trading firms have adopted new approaches to electronic trading for both their proprietary operations and for their investment management customers.

Algorithmic trading – which might be broadly defined as computer-initiated trading of financial instruments – began in the purest sense in the 1980s as a means to trade baskets of securities, sometimes arbitraging between cash and futures markets. During the past few years it has become more widespread and is now directed at a wide range of markets.

So-called execution algorithms are widely used by investment management firms to buy or sell large blocks of equities in the marketplace with minimum market impact. These algorithms seek out liquidity across trading markets by examining the order books published by each, and break down a large order into much smaller ones, trickling them out across markets over an extended period of time.
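As a rough illustration of the slicing step, the following Python sketch splits a parent order into child orders released at even intervals. The names `ChildOrder` and `slice_order`, the parameters and the even spacing are assumptions made for illustration only; a real execution algorithm would also weigh the liquidity visible in each market's order book, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChildOrder:
    offset_seconds: float  # release time, relative to the start of the window
    quantity: int          # number of shares in this slice


def slice_order(total_quantity: int, slices: int, window_seconds: float) -> List[ChildOrder]:
    """Split a parent order into child orders spread evenly over a time window."""
    if total_quantity <= 0 or slices <= 0 or window_seconds < 0:
        raise ValueError("quantity and slices must be positive, window non-negative")
    base, remainder = divmod(total_quantity, slices)
    interval = window_seconds / slices
    children = []
    for i in range(slices):
        # Give the leftover shares, one each, to the earliest slices.
        quantity = base + (1 if i < remainder else 0)
        if quantity > 0:
            children.append(ChildOrder(offset_seconds=i * interval, quantity=quantity))
    return children


if __name__ == "__main__":
    # Hypothetical parent order: 100,000 shares worked over one hour in 8 slices.
    for child in slice_order(total_quantity=100_000, slices=8, window_seconds=3600):
        print(f"t+{child.offset_seconds:6.0f}s  {child.quantity} shares")
```

The child quantities always sum to the parent quantity, so no shares are lost to rounding.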

Latency is an important factor in such trading strategies to ensure that best execution is achieved with the minimum of price slippage while orders are being fed into the various markets, which will seek to respond by adjusting bid/offer prices accordingly.


For some algorithms, just as important as latency is jitter, or variance of latency from the norm. These strategies take account of known price variances on markets over microsecond timespans, and so it is important to keep those timespans consistent for the algorithms to work effectively.
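Jitter can be quantified in more than one way. The sketch below, in Python, summarizes a set of latency measurements using two common choices: the standard deviation, and the gap between the 99th percentile and the median. The function name `latency_profile` and the sample values are hypothetical, and the choice of metrics is an assumption rather than something prescribed by this briefing.

```python
import statistics


def latency_profile(samples_us):
    """Summarize latency samples (in microseconds): typical value and spread."""
    if len(samples_us) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_us)
    median = statistics.median(ordered)
    # Nearest-rank 99th percentile, using integer arithmetic for ceil(0.99 * n).
    rank = max(1, -(-99 * len(ordered) // 100))
    p99 = ordered[rank - 1]
    return {
        "median_us": median,
        "p99_us": p99,
        "jitter_stdev_us": statistics.pstdev(ordered),
        "jitter_p99_minus_median_us": p99 - median,
    }


if __name__ == "__main__":
    # Hypothetical measurements: mostly 10 microseconds, with one slow outlier.
    samples = [10, 10, 10, 10, 10, 10, 10, 10, 10, 50]
    for name, value in latency_profile(samples).items():
        print(f"{name}: {value}")
```

A single outlier barely moves the median but dominates both jitter figures, which is why a strategy sensitive to timing consistency would track them separately from typical latency.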

High-frequency trading – or HFT – is an important class of algorithmic trading, where trading strategies determine the very rapid buying and selling of an asset, individually or as portfolios, with the intent to aggregate small profits per transaction over many trades. For HFT, price slippage must be kept to a minimum for the strategy to be profitable, and hence low-latency execution is a must.

Many trading firms access markets via the Direct Market Access (DMA) services of sponsoring brokers, who are continuously reducing the latency of their offerings. Against that trend, regulatory pressure is requiring these brokers to implement compliance and risk monitoring functions. Thus, implementing this functionality while adding the minimum of latency is important.

The Market Data Explosion

As a result of market fragmentation and competition, the growth in automated trading of options, derivatives and foreign exchange, and the introduction of algorithmic and high frequency trading, the marketplace has witnessed a massive increase – an explosion as some have termed it – in aggregate market data (trade reports and quotations) rates.

Given that market data is the lifeblood of algorithmic and HFT strategies, being able to digest it with minimum latency – never missing an update, and processing and storing each one – is a crucial first step in automated trading.

A major challenge to processing market data is coping with peak volumes, which generally occur when markets open and close, but also occur during the day as trading firms react to corporate, economic and political events. Aggregate peaks for U.S. markets have recently been as high as 6.65 million price messages per second (according to www.marketdatapeaks.com), with options market data accounting for much of that.

Moreover, despite the current period of low trading volumes, aggregate market data rates are increasing, and marketplaces expect market data rates to continue to increase in future years. For example, OPRA, which consolidates data feeds from the U.S. options markets, is advising market participants to plan for peaks of nearly 13 million messages per second for 2013.

Thus, the challenge for automated trading systems is to cope with both high data throughput, in the form of many millions of price updates, and low-latency processing of that data.
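To put those peak rates in perspective, the short Python calculation below converts a message rate into the average time available to process each message. The two rates are the figures quoted above; the function name `per_message_budget_ns` and the idea of dividing the load evenly across parallel streams are assumptions made for illustration.

```python
def per_message_budget_ns(messages_per_second: float, parallel_streams: int = 1) -> float:
    """Average time per message, in nanoseconds, for each of `parallel_streams`
    workers that share the load evenly (an idealized assumption)."""
    if messages_per_second <= 0 or parallel_streams <= 0:
        raise ValueError("rate and stream count must be positive")
    return 1e9 * parallel_streams / messages_per_second


if __name__ == "__main__":
    for label, rate in [("recent U.S. aggregate peak", 6.65e6),
                        ("OPRA planning figure for 2013", 13e6)]:
        print(f"{label}: {per_message_budget_ns(rate):.1f} ns per message")
```

At 6.65 million messages per second a single processing stream has roughly 150 nanoseconds per message on average, and at 13 million roughly 77 nanoseconds, which is why feeds are typically partitioned across several streams and cores.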


Low Latency Trading Technology

As marketplaces across all asset classes and geographies automate and provide faster matching, so trading firms are reducing the latency of their execution technologies to be competitive with their peers.

Broadly speaking, the latency associated with trading infrastructure is related to two components: the latency of moving data from point A to point B; and the latency of processing that data at point A and point B. Much of the focus to date has been on the latency of moving data – propagation latency – between marketplaces and trading firms, and the primary contributor to that latency is the distance between the parties involved.
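The split between the two components can be made concrete with a little arithmetic. The Python sketch below estimates one-way propagation delay over optical fiber and adds processing time at each end. The refractive index of 1.47 is an assumed typical value for fiber, and the distances and processing times in the example are hypothetical.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # assumption: typical value for optical fiber


def propagation_latency_us(distance_km: float,
                           refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """One-way propagation delay, in microseconds, over a length of fiber."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S / refractive_index
    return distance_km / speed_km_per_s * 1e6


def total_latency_us(distance_km: float, processing_a_us: float,
                     processing_b_us: float) -> float:
    """Processing at point A, plus propagation from A to B, plus processing at B."""
    return processing_a_us + propagation_latency_us(distance_km) + processing_b_us


if __name__ == "__main__":
    # Hypothetical case: 20 microseconds of processing at each end.
    for label, km in [("co-located (0.1 km)", 0.1), ("remote (1,000 km)", 1000.0)]:
        print(f"{label}: {total_latency_us(km, 20.0, 20.0):.1f} microseconds one way")
```

Under these assumptions fiber adds a little under 5 microseconds per kilometer, so distance dominates for a remote firm, while for a co-located firm processing time is almost the whole total.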

Propagation Latency and Co-Location

Reducing propagation latency by making use of fast direct fiber and wireless connections has been a common approach for trading firms, many of which are now leveraging co-location, which involves a firm placing its execution systems in the same data center as the matching engines it is trading against.

Within a co-lo data center, connectivity leverages local area networking technology and latency is measured in tens of microseconds. Ethernet at 10 Gbit/s is the most common technology in place, with switches available from a number of vendors. Data transports are commonly TCP/IP for transactional data and UDP for the broadcast of market data.

Since most trading infrastructure consists of multiple servers, local networking is also leveraged as an interconnect. Once again, 10 Gigabit Ethernet is widely deployed, but InfiniBand is also popular as it generally offers lower latency and reduced jitter. Mellanox Technologies is the leader in the InfiniBand space.

In addition to TCP/IP protocols, memory-to-memory transports such as Remote Direct Memory Access (RDMA) are often deployed for inter-server communications, with network cards from the likes of Solarflare Communications and Mellanox, which support kernel-bypass data transfer (from network interface to RAM without involving CPU processing).

Within each server, in addition to the use of RDMA and kernel-bypass networking, it is common for operating systems to be configured for low-latency applications, especially with regard to resource allocation and processor priorities, to avoid context switching that raises latency and increases jitter. Red Hat’s distribution of Linux (with appropriate patches) and Oracle’s Solaris are most likely to be used to host latency-sensitive applications.
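One common example of such tuning is pinning a latency-sensitive process to a single processor core, so that the scheduler does not migrate it between cores. The Python sketch below uses the standard library calls `os.sched_getaffinity` and `os.sched_setaffinity`, which are available on Linux only. The function name `pin_current_process` and the choice of the highest-numbered allowed core as the default are assumptions for illustration; production systems usually combine pinning with further settings, such as isolating cores from the general scheduler, that are not shown here.

```python
import os


def pin_current_process(core=None):
    """Pin the calling process to one CPU core.

    Returns the core number, or None where the platform lacks the call."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # not supported on this platform
    allowed = os.sched_getaffinity(0)  # cores this process may currently use
    if core is None:
        core = max(allowed)  # assumption: default to the highest-numbered core
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})  # 0 means the calling process
    return core


if __name__ == "__main__":
    pinned = pin_current_process()
    if pinned is None:
        print("CPU affinity is not supported on this platform")
    else:
        print(f"pinned to core {pinned}; affinity is now {os.sched_getaffinity(0)}")
```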

Focus on Application Latency

With latency related to wide area connectivity and local area networking largely understood, addressed and equalized, the focus is now moving to the matching and trading applications themselves, and approaches to reducing latency within them to maintain an edge.

Application latency reduction is dependent on:

• Designing application architecture for speed
• Choice of coding language and code optimization
• Choice of microprocessor and leveraging specific abilities


For certain extreme low-latency applications, deploying specialist microprocessors, typically Field Programmable Gate Arrays (FPGAs), has become accepted. Typically, they are implemented on network interfaces and perform specific processing related to data feed decoding and order book management. While the performance of this approach can be attractive, the downside of FPGAs is the difficulty in designing code to run on them, along with subsequent debugging and on-going code maintenance.

Thus, most trading systems applications are implemented on mainstream Intel microprocessor architectures, the most recent of which is commonly referred to as Sandy Bridge, and is implemented in Intel® Xeon® processor E5 family chips for servers.

While the Intel® Xeon® processor E5 family is a multi-core design – with up to eight cores allowing up to 16 code threads to execute in parallel – several functions that form part of an automated trading system are not well suited to parallel processing, and so high single thread execution performance is important in order to reduce latency.


Dell Processor Acceleration Technology

DPAT Overview

Dell Processor Acceleration Technology (DPAT) is a technique to run Intel® Xeon® processor E5 family microprocessors in certain Dell servers at their highest clock speed for maximum single thread execution power. It was specifically developed based on feedback from trading firms engaged in algorithmic trading and HFT.

DPAT is a free, downloadable, customer-installable BIOS patch available for – and only for – Dell PowerEdge R720, R720xd and R620 servers. It is configured for those servers via manual console commands.

The clock speed of a microprocessor – set by its oscillator crystal and expressed in gigahertz – determines how fast it executes instructions, and relatedly how much power it requires and dissipates.

DPAT leverages the Turbo Boost mode of the Intel® Xeon® processor E5-2690 to increase the clock speed from its normal base level up to a rated maximum for a subset of cores in a multi-core microprocessor.

Usually, Turbo Boost is dynamically controlled by a server’s operating system to boost power for its application load when possible, based on the number of cores in active use and power and thermal limits.

DPAT allows Turbo Boost to be specifically configured and optimized to run applications at higher than base clock speeds, and so execute faster. Thus, applications that require high single thread performance – including key components of trading systems – can run in an environment that provides increased processor power.

For example, the Intel® Xeon® processor E5-2690 chip has eight cores, which have a base frequency of 2.9 GHz. With all cores enabled, it is possible to boost clock speed to 3.3 GHz, while with just one core enabled, it can boost as high as 3.8 GHz. With such processor frequency increases – of 13% to 31% – application performance will generally be improved (real-world impact being subject to specific application design).
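The frequency figures above can be checked with a short calculation. The Python sketch below computes the percentage uplift for each boost level, and the best-case run time of a task whose duration depends only on clock speed. That clock-bound assumption is an idealization, and the 10 microsecond baseline is hypothetical; code that waits on memory or I/O would gain less.

```python
BASE_GHZ = 2.9        # E5-2690 base frequency
ALL_CORES_GHZ = 3.3   # boost with all eight cores enabled
ONE_CORE_GHZ = 3.8    # boost with a single core enabled


def uplift_percent(boost_ghz: float, base_ghz: float = BASE_GHZ) -> float:
    """Clock speed increase over the base frequency, as a percentage."""
    return (boost_ghz - base_ghz) / base_ghz * 100.0


def scaled_time_us(baseline_us: float, boost_ghz: float,
                   base_ghz: float = BASE_GHZ) -> float:
    """Best-case run time at the boosted clock for purely clock-bound work."""
    return baseline_us * base_ghz / boost_ghz


if __name__ == "__main__":
    for label, ghz in [("all cores", ALL_CORES_GHZ), ("one core", ONE_CORE_GHZ)]:
        print(f"{label}: +{uplift_percent(ghz):.1f}% clock, "
              f"10 us task takes {scaled_time_us(10.0, ghz):.2f} us at best")
```

The computed uplifts are about 13.8% and 31.0%, in line with the 13% to 31% range quoted above.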

Moreover, because the Turbo Boost clock speed is locked ‘on’ for the active cores, there is no transition into and out of Turbo mode that can lead to jitter, which is undesirable for trading systems applications.

Importantly, microprocessors and servers implementing DPAT exhibit the same thermal characteristics, and enjoy the same warranty and support services, as standard chips and servers. This contrasts with other processor acceleration techniques, such as overclocking, which typically invalidate warranties and can prove to be unreliable.

By deploying DPAT, trading applications run faster, with no time-consuming application code changes – no need to redesign code to eke out microseconds of compute time, no debugging or testing, and no release/deployment issues of the kind that have led to high profile failures at marketplaces and trading firms in recent months.

For trading firms, this translates to a boost in performance that can be implemented in minutes, with no impact on application stability.


DPAT and Low-Latency Trading

Low-latency trading applications leveraging DPAT enjoy a number of benefits:

• Ease of deployment – free, downloadable, customer installable, potentially in minutes
• Improved performance with no operating system stack or application code changes, and hence no deployment testing or rogue code errors
• No jitter induced by Turbo Boost transitioning
• Standard warranties and support services apply

As stated above, certain applications within a trading system require single thread execution, and do not benefit from multi-core parallel processing. For these applications, a higher clock speed means faster code execution, and so faster applications with lower latency. Specific examples include:

• Market Data Feed Handlers – which are responsible for interfacing to market data feeds, decoding message structures and transforming data formats (both character-based and binary), normalization of data and data enrichment, and then presenting data to downstream components via a standard interface.

Error detection and recovery is also performed by these handlers. Optional functionality can include last value caching, persistent storage of a time series of messages, and calculations such as daily highs and lows.

• Order Execution Gateways – which interface to different marketplaces, delivering orders and handling order acknowledgements, supporting standard protocols such as FIX as well as proprietary binary protocols. This generally requires data transformation and message construction, fed by data from algorithmic trading engines.

Handling of order exceptions is also performed by gateways. Optionally, timestamping and storage of orders and acknowledgements is performed for latency monitoring.
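The two components listed above can be sketched together in a few dozen lines. The Python example below is illustrative only and is written for readability rather than speed. The binary quote layout, the class names `FeedHandler` and `OrderGateway`, and the `send` callable are all hypothetical. The FIX message follows the standard tag=value framing with BodyLength (tag 9) and CheckSum (tag 10), but omits the session-level header fields, such as sender, target and sequence number, that a real FIX session requires.

```python
import struct
import time
from dataclasses import dataclass

# Hypothetical wire layout for one quote message (big-endian): 8-byte ASCII
# symbol, uint64 timestamp in ns, then bid and ask, each as an int64 price in
# 1/10,000ths followed by a uint32 size.
QUOTE_FORMAT = struct.Struct(">8sQqIqI")
PRICE_SCALE = 10_000
SOH = "\x01"  # FIX field delimiter


@dataclass(frozen=True)
class Quote:
    symbol: str
    timestamp_ns: int
    bid: float
    bid_size: int
    ask: float
    ask_size: int


class FeedHandler:
    """Decodes raw messages, normalizes them and keeps a last value cache."""

    def __init__(self):
        self.last_value_cache = {}  # symbol -> most recent Quote
        self.rejected = 0           # messages that failed validation

    def on_message(self, payload: bytes):
        if len(payload) != QUOTE_FORMAT.size:
            self.rejected += 1      # basic error detection: wrong length
            return None
        sym, ts, bid, bid_size, ask, ask_size = QUOTE_FORMAT.unpack(payload)
        quote = Quote(sym.decode("ascii", errors="replace").strip(), ts,
                      bid / PRICE_SCALE, bid_size, ask / PRICE_SCALE, ask_size)
        self.last_value_cache[quote.symbol] = quote
        return quote


def build_fix_message(msg_type, fields, begin_string="FIX.4.2") -> bytes:
    """Frame tag=value fields with BodyLength (9) and CheckSum (10)."""
    body = f"35={msg_type}{SOH}" + "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    checksum = sum((head + body).encode("ascii")) % 256
    return (head + body + f"10={checksum:03d}{SOH}").encode("ascii")


class OrderGateway:
    """Builds limit orders and timestamps them for latency monitoring."""

    def __init__(self, send):
        self.send = send        # hypothetical transport: a callable taking bytes
        self.sent_at_ns = {}    # ClOrdID -> send timestamp

    def submit_limit_order(self, cl_ord_id, symbol, side, quantity, price) -> bytes:
        message = build_fix_message("D", [   # 35=D is a new single order
            (11, cl_ord_id),                       # ClOrdID
            (55, symbol),                          # Symbol
            (54, "1" if side == "buy" else "2"),   # Side: 1 = buy, 2 = sell
            (38, quantity),                        # OrderQty
            (40, "2"),                             # OrdType: 2 = limit
            (44, f"{price:.2f}"),                  # Price
        ])
        self.sent_at_ns[cl_ord_id] = time.perf_counter_ns()
        self.send(message)
        return message

    def on_ack(self, cl_ord_id) -> float:
        """Round trip time, in microseconds, for an acknowledged order."""
        return (time.perf_counter_ns() - self.sent_at_ns.pop(cl_ord_id)) / 1_000
```

Both paths are sequential per message: each update or order is decoded, transformed and framed in order, which is the kind of work that benefits from single thread speed rather than additional cores.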

Thus, DPAT can play an important role in the overall functioning of algorithmic and HFT systems, ensuring key components take full advantage of a server’s maximum performance, and hence reducing latency and jitter.


Conclusion and Takeaways

In the financial trading markets, acting on a market data update, delivering an order to a market, or executing a trade faster than the competition can mean the difference between making money or losing it on algorithmic and high frequency trading strategies.

As a result, marketplaces and trading firms have sought to reduce the latency of their systems to compete with their peers. Of increasing importance is the ability to deploy low-latency technologies in a cost-effective and agile manner, in order to take advantage of new trading opportunities, in new asset classes and geographies.

To date, much of the focus on latency reduction has been on propagation latency, the delay incurred in moving data over distance, which has resulted in the uptake of co-location space physically close to matching engines in the same data centers. Next, the focus moved to local networking technology and systems/networking stack software.

The latency reduction focus is now moving to trading and matching applications, including their design and efficiency of code. Making improvements in this area is time consuming and expensive, requires specialist skills, and is prone to errors that might be catastrophic.

Dell Processor Acceleration Technology – or DPAT – is one route to boosting application performance requiring no code changes or associated debugging and testing. It is a free, customer-installable BIOS patch available exclusively for certain Dell PowerEdge servers, which allows the Turbo Boost mode of the Intel® Xeon® processor E5-2690 to be configured for optimal application performance. Thermal characteristics of the Dell server are not impacted.

By leveraging DPAT – which can be implemented in minutes – applications can benefit from the 13% to 31% increase in processor frequency, and not be subject to jitter that impacts the performance of trading strategies. As such, it provides an attractive upgrade option for trading firms seeking to maintain an edge over their competition.

For further information:

http://content.dell.com/us/en/enterprise/financial-services-markets-solutions-processor-acceleration-technology.aspx

Also see “Dell Processor Acceleration Technology” – a technical white paper – available from Dell (NDA required).


About Dell

Dell Inc. (NASDAQ: DELL) listens to customers and delivers innovative technology and services that give them the power to do more. For more information, visit www.dell.com.

About Low-Latency.com

Low-Latency.com is the premier online community from A-Team Group for financial markets technologists who operate at the cutting edge. It is the community portal offering vital news, expert analysis, in-depth research and thought leadership focused on the role played by low-latency and related technologies in today’s high performance electronic trading markets. In particular, we see increasing convergence of low latency, cloud and big data technologies – each bringing real benefits to the automated trading markets of tomorrow. This convergence is a key focus for Low-Latency.com and the community that engages with it.
