Application of Machine Learning in High Frequency Trading of Stocks
International Journal of Scientific & Engineering Research Volume 10, Issue 5, May-2019 1592 ISSN 2229-5518
IJSER © 2019 http://www.ijser.org

Obi Bertrand Obi
WorldQuant University
201 St. Charles Avenue, Suite 2500, New Orleans, LA 70170, USA

Abstract

Algorithmic trading strategies have traditionally been centered on following market trends and on the use of technical indicators. Over the years, High Frequency algorithmic Trading has been left in the hands of institutional players with deep pockets and large assets under management, despite the huge returns involved. In this project we built trading strategies by applying Machine Learning models to technical indicators based on High Frequency stock data. The result is an automated trading system which, when applied to any stock, could generate returns ten times higher than the market returns without a significant increase in volatility. With advances in technology, High Frequency algorithmic Trading can be undertaken even by individuals or retail traders with a moderate initial investment and technical skills.

Keywords: Machine Learning; Prediction of stock price movements; Classification reports; Algorithmic trading; High frequency trading; Key performance indicators

1. Introduction

Not too long ago, Algorithmic Trading was only available to institutional players with deep pockets and large assets under management. Recent developments in the areas of open source, open data, cloud computing and storage, as well as online trading platforms, have leveled the playing field for smaller institutions and individual traders, making it possible to venture into this fascinating discipline with only a modern notebook and an Internet connection.

Nowadays, Python and its ecosystem of powerful packages are the technology platform of choice for algorithmic trading. Among other things, Python allows you to do efficient data analytics (with e.g. numpy, pandas), to apply machine learning to stock market prediction (with e.g. scikit-learn) or even to make use of Google’s deep learning technology (with tensorflow) and Microsoft’s CNTK.

Algorithmic trading refers to the trading of financial instruments based on some formal algorithm. An algorithm is a set of operations (mathematical, technical) to be conducted in a certain sequence to achieve a certain goal. For example, there are mathematical algorithms to solve a Rubik’s cube (The Mathematics of the Rubik’s Cube or Algorithms for Solving Rubik’s Cube). Such an algorithm can perfectly solve the problem at hand via a step-by-step procedure. Another example is algorithms for finding the roots of an equation, if any exist. In that sense, the objective of a mathematical algorithm is often well specified and an optimal solution is often expected.

High-frequency trading (HFT) is a type of algorithmic trading characterized by complex computer algorithms that trade in and out of positions in fractions of a second, leveraging arbitrage strategies in order to profit from the public markets. Commonly, traders take advantage of the penny spread between the bid and ask prices on equities. For the typical retail trader, this would seem pointless because the pay-off on a single trade is minuscule. For HFTs, the profit from the spread accumulates, and as thousands of trades are executed there are millions of dollars to be made [1].

Traditionally, financial markets operated on a quote-driven process where a few market makers provided the sole liquidity and prices for financial assets. Recently, major developments have been made to automate the financial markets, which has led many trading firms to use computer algorithms to trade these assets.
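As a rough illustration of how a penny spread can accumulate over many trades, the following Python sketch multiplies a per-share spread by the trade size and the number of trades. Every figure in it (a $0.01 spread, 500 shares per round trip, 200,000 round trips, a $0.002 per-share cost) is a hypothetical value chosen only to make the arithmetic concrete, not data from this study, and the function name `spread_capture_profit` is likewise an assumption.

```python
from decimal import Decimal


def spread_capture_profit(spread, shares_per_trade, num_trades,
                          cost_per_share=Decimal("0")):
    """Profit from capturing `spread` on every share, net of per-share costs.

    Hypothetical helper: it assumes every round trip captures the full
    spread, which real trading does not guarantee.
    """
    net_per_share = spread - cost_per_share
    return net_per_share * shares_per_trade * num_trades


# Hypothetical inputs, chosen for illustration only.
penny = Decimal("0.01")

single_trade = spread_capture_profit(penny, 500, 1)
many_trades = spread_capture_profit(penny, 500, 200_000)
after_costs = spread_capture_profit(penny, 500, 200_000, Decimal("0.002"))

print("one round trip:      ", single_trade)
print("200,000 round trips: ", many_trades)
print("net of assumed costs:", after_costs)
```

Under these assumed inputs a single round trip earns only a few dollars, while the same trade repeated at scale reaches the order of a million dollars, and even a small per-share cost removes a noticeable share of it.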
High Frequency Trading (HFT), in particular, has been a major topic due to the features that distinguish it from electronic and manual trading. These include the extremely high speed of execution (microseconds), multiple executions per session, and very short holding periods (usually less than a day).

1.1. Problem statement

Time series data in financial markets are highly nonlinear, nonstationary and noisy in nature. Traditional models based on statistical methods, such as the Autoregressive Moving Average (ARMA) model, the Autoregressive Integrated Moving Average (ARIMA) model, and the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model, suffer from limitations due to their linearity assumption. Predicting how the stock market will perform is one of the most difficult things to do. Many factors are involved in the prediction, such as physical factors, psychological factors, and rational and irrational behaviour. All these aspects combine to make share prices volatile and very difficult to predict with a high degree of accuracy. Warren Buffett states that: “Forecasts may tell you a great deal about the forecaster; they tell you nothing about the future.” Hence, finding the right algorithm to automatically and successfully predict and trade in financial markets is the Holy Grail in finance.

1.2. Project Objectives

The main objective of this project is to develop a High Frequency Trading System which uses Machine Learning to predict the movements of stock market prices with a reasonable level of accuracy, and to trade the stock with a simple trading strategy to generate adequate performance. Other objectives include the following:

1. Perform a comparative analysis of Machine Learning algorithms on High Frequency stock data to determine algorithms with high predictive power for stock price movements.
2. Perform technical analyses to produce features for the Machine Learning models in the High Frequency Trading System.
3. Generate and track adequate performance from the High Frequency Trading System.
4. Add to the elaborate body of literature on the application of Machine Learning to Finance and High Frequency Trading.

1.3. Hypothesis

Machine Learning algorithms cannot predict stock price movement with a reasonable amount of certainty in High Frequency Trading.

2. Literature Review

Several authors have employed Machine Learning technologies in predicting and trading stock markets. The following algorithms have been used in various situations.

Because of their ability to model nonlinear relationships without pre-specification during the modeling process, neural networks (NNs) have become a popular method in financial time-series forecasting. NNs also offer great flexibility in the architecture of the model, in terms of the number of hidden nodes and layers. Indeed, Pekkaya and Hamzacebi compare the results from using a linear regression versus a NN model to forecast macro variables and show that the NN gives much better results [3]. Many studies have used NNs and shown promising results in the financial markets. Grudnitski and Osburn implemented NNs to forecast S&P500 and Gold futures price directions and found they were able to correctly predict the direction of monthly price changes 75% and 61% of the time, respectively [4]. Another study showed that a NN-based model leads to higher arbitrage profits compared to cost-of-carry models [5]. Phua, Ming and Lin implement a NN using Singapore’s stock market index and show a forecasting accuracy of 81% [6].

Another popular machine learning classification technique that does not require any domain knowledge or parameter setting is the decision tree. It also often offers a more visually interpretable model than a NN, as the nodes in the tree can be easily understood.
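To illustrate why tree nodes are easy to read, here is a minimal sketch of a single-split decision tree (a "stump") fitted by minimizing weighted Gini impurity, written in plain Python. The feature (a hypothetical indicator value), the toy labels (1 = price went up, 0 = price went down) and all function names are assumptions made for illustration; they are not the models or data used in the studies cited in this review.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty or pure list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Most common 0/1 label; ties resolve to 1."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(values, labels):
    """Return (threshold, left_label, right_label) for the best single split.

    Candidate thresholds are midpoints between consecutive distinct values;
    the split with the lowest weighted Gini impurity wins.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2.0
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, thr, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict(value, threshold, left_label, right_label):
    """Apply the fitted one-node rule to a new indicator value."""
    return left_label if value <= threshold else right_label


# Toy data, invented for illustration: indicator value -> next move (1 up, 0 down).
indicator = [-2.0, -1.5, -0.5, 0.5, 1.0, 2.5]
moved_up = [0, 0, 0, 1, 1, 1]

threshold, left_label, right_label = fit_stump(indicator, moved_up)
print(f"if indicator <= {threshold}: predict {left_label}, else predict {right_label}")
```

The fitted model is a single threshold rule that can be read directly, which is the interpretability advantage described above. Libraries such as scikit-learn provide full tree learners; this hand-written version only shows what one node of such a tree contains.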
The simplest type of decision tree model is the classification and regression tree (CART). Sorensen et al. show that CART decision trees perform better than single-factor models based on the same variables in picking stock portfolios [7]. Another study found that a boosted alternating decision tree with expert weighting generated abnormal returns for the S&P500 index during the test period [8].

To improve accuracy, some studies used the random forest algorithm for classification. Booth et al. show that a recency-weighted ensemble of random forests produces superior results, in terms of both profitability and prediction accuracy, compared with other ensemble techniques when analyzed on a large sample of stocks from the DAX [9]. Similarly, a gradient boosted random forest model applied to Singapore’s stock market was able to generate excess returns compared with a buy-and-hold strategy [10]. Some recent studies combine decision tree analysis with evolutionary algorithms to allow the model to adapt to changing market conditions. Hsu et al. present constraint-based evolutionary classification trees (CECT) and show strong predictability of a company’s financial performance [11].

Support Vector Machines (SVMs) are also often used in predicting market behavior. Huang et al. compare SVM with other classification methods (random walk, linear discriminant analysis, quadratic discriminant analysis and Elman backpropagation neural networks) and find that SVM performs best in forecasting weekly movements of the Nikkei 225 index [12]. Nair et al. propose a hybrid system that combines a decision tree and a support vector machine optimized by a genetic algorithm, validate its performance on the BSE-Sensex, and find that its predictive accuracy is better than that of both a NN-based and a Naive Bayes-based model [13]. While some studies