RCURRENCY: Live Digital Asset Trading Using a -based Forecasting System

Yapeng Jasper Hu Ralph van Gurp Ashay Somai Delft University of Technology Delft University of Technology Delft University of Technology Distributed Systems Group Distributed Systems Group Distributed Systems Group Delft, The Netherlands Delft, The Netherlands Delft, The Netherlands [email protected] [email protected] [email protected]

Hugo Kooijman Jan S. Rellermeyer Delft University of Technology Delft University of Technology Distributed Systems Group Distributed Systems Group Delft, The Netherlands Delft, The Netherlands [email protected] [email protected]

Abstract—Consistent alpha generation, i.e., maintaining an difficulty in predicting traditional asset data is all the more edge over the market, underpins the ability of asset traders to true for the recent digital asset (e.g. Bitcoin) markets, which reliably generate profits. Technical indicators and trading strate- have proven to be extraordinarily volatile [8–10]. gies are commonly used tools to determine when to buy/hold/sell assets, yet these are limited by the fact that they operate on However, the recent rise of AI and ML has opened up new known values. Over the past decades, multiple studies have opportunities. AI techniques can analyze enormous amounts investigated the potential of artificial intelligence in stock trading of data and observe decades of conventional experience at a in conventional markets, with some success. In this paper, we rate that is only limited by the complexity of the technique present RCURRENCY, an RNN-based trading engine to predict and the available computational power [11, 12]. Furthermore, data in the highly volatile digital asset market which is able to successfully manage an asset portfolio in a live environment. By modern algorithms are capable of discover- combining asset value prediction and conventional trading tools, ing patterns in data that are hidden to the human eye due to RCURRENCY determines whether to buy, hold or sell digital the volume and complexity of the data [13]. Unsurprisingly, currencies at a given point in time. Experimental results show investment and asset management firms have started to invest that, given the data of an interval t, a prediction with an error billions into developing AI-based asset trading and investment of less than 0.5% of the data at the subsequent interval t+1 can be obtained. Evaluation of the system through backtesting shows applications [14]. that RCURRENCY can be used to successfully not only maintain In this paper, we present our findings and experiences a stable portfolio of digital assets in a simulated live environment in designing and implementing RCURRENCY, an automated using real historical trading data but even increase the portfolio trading engine tasked with managing a portfolio of digital value over time. assets (Section III). The primary goal of our research is to Index Terms—Machine Learning, Digital currencies, Neural networks, Forecasting, Deep learning, Bitcoin provide insights into the capabilities of AI-powered digital arXiv:2106.06972v1 [cs.AI] 13 Jun 2021 asset trading in a highly volatile and therefore challenging live market environment. Effectively managing a portfolio requires I.INTRODUCTION the AI-based training engine to estimate when to sell off assets, Ever since the emergence of modern financial markets in buy new ones, or hold the current position. Decisions are made the 17th century, specialists have made an effort to maximize based on data gathered from a digital asset exchange at a profits from trading goods, services, stocks and securities. regular interval. For this purpose, we combine the prediction Consequently, consistent alpha generation [1], a term to de- with previous (known) values and apply a trading strategy to scribe the edge over the market or the ability to make profit, derive and execute a final buy/hold/sell decision. In practice, has remained a hot topic [2, 3]. With the advent of the achieving a stable portfolio is considered a high bar for digital age and the ability to process large amounts of data, automatic trading engines to master. many strategies have revolved around new techniques and We experimentally evaluate different trading strategies computational breakthroughs in an effort to predict the upward based on actual historical market data (Section IV). Our and downward trends of assets (e.g., [4, 5]). The reliability results show that the prediction accuracy is highly dependent of these predictions often remains questionable [6, 7]. This on the configuration of the neural network. The Root Mean Squared Error over a prediction can increase significantly variables is limited; resulting the loss of potentially valuable when incorrect parameters are chosen. We thoroughly evaluate network input data. the most important parameter spaces and their impact on the prediction accuracy. When an adequate set of parameters is III.THE RCURRENCYLIVE TRADING SYSTEM chosen, the trading engine is capable of estimating future asset In order to address the challenge of making reliable predic- values with an error of approximately 0.4%. In practice, this tions of cryptocurrency markets for live trading, we designed makes it possible to reliably predict up and down motions, the RCURRENCY system and structured it into three separate as well as approximate the absolute change in value. This, components. The responsibility of the first component is to in turn, allows the system to maintain and increase portfolio collect and preprocess the timely-ordered series of data. The value over time. second component feeds this data into the neural network and, by doing so, trains it to recognize patterns in the processed II.RELATED WORK data. These two components are sufficient to grant the system The suggestion of predicting stock prices utilizing historic predictive capabilities for single steps in the time series data. data has been an active research topic for decades. One of the The third component allows the system to make trading early prediction methods based on an inter-temporal general decisions depending not only on a single prediction, but also equilibrium model was proposed by Balvers et al. [15]. These on an elaborate analysis of trends and patterns. It implements early methods used a purely analytical approach applying and applies various trading strategies, each of which calculates basic mathematics and statistics. However, some schools of a trading signal based on its respective predefined rules. The thought even then disputed the merit of the approach and flow diagram of the RCURRENCY system is visualized in results of predicting the stock prices. For instance, it has been Figure 1. empirically shown that if a random walk strategy accurately reflects the reality, then all models concerning (stock) market A. Training Data predictions are meaningless [6]. A neural network-based prediction method is only as reli- Nowadays, more variables are added to the prediction able as the data that can be leveraged to train the network. As a model, such as the market volume [16], twitter sentiment result, an essential step in the processing pipeline is to collect [17], investor sentiment, or media news content [18]. Whereas the raw data and turn it into preprocessed training data for the fundamental analysis describes the examination of data fo- learning algorithm to digest. Our system obtains raw Bitcoin cused on macro-economic variables [19], technical analysis training data by calling an API provided by CryptoCompare1. puts more emphasis on extracting trends and patterns of an This company offers services concerning over 3.000 digital investment instrument’s price, volume, breadth and trading assets, including the ability to retrieve historical price data. activities [20–22]. The methodology proposed and analyzed Such data comprises candlestick data points in one-minute in this paper focuses on technical analysis while it could be time frames, where each data point contains four values: the expanded to support fundamental analysis in future works. open, high, low, and close price of the time frame. The system Most recently, multiple studies have used AI techniques for runs a simple script to recursively and sequentially call this prediction. Examples of concrete techniques include particle API to fetch historical Bitcoin price data, which consists of swarm optimization [23], nearest neighbor classification [24], approximately 80.000 hourly data points, spanning back to and neural networks [25]. An empirical study by Gu et al. August 2010. In contrast, live data on the current market [26] compared multiple AI techniques in combination with condition is instead fetched directly from a cryptocurrency trading techniques in the conventional stock market. Their exchange to minimize acquisition delays. fundamental result is that overall AI can be successfully The raw data then undergoes preprocessing. Unfortunately, utilized and the predictive capability of the AI techniques approximately the first 30 percent of the data set contains carries a significant advantage. However, using AI in highly many near-zero values that tend to deteriorate the prediction volatile markets such as digital assets [9] has gone largely capabilities of the network. These values do not contain unexplored to this date. sufficient variance for the gradient descent of the error function Some initial efforts to predict prices and trends of Bitcoin to converge. As a consequence, all data up until May 2013 and other digital currencies have been undertaken by McNally has been manually removed, roughly up until the time Bitcoin et al. [27] and [28]. McNally et al. [27] explore prediction of reached values around one hundred USD. Bitcoin prices based on the Simple Moving Average (SMA) Next, the system applies some technical analysis on the technical indicator, through RNN, LSTM and ARIMA meth- resulting data set, which involves extracting trends and pat- ods. terns from the price data by constructing technical trading Alessandretti et al. [28] investigate price prediction of a indicators. Each indicator is calculated according to their large number of cryptocurrencies using tree boosting tech- respective formula, where each formula uses some (or all) of niques, as well as an LSTM-based method. The researchers the data set as input. RCURRENCY implements the financial simulate trading of currencies, taking into account transaction technical indicators. The calculated indicator values are added fees, but fail to explore the myriad of strategies traders use to determine their investments. In either case the number of input 1https://min-api.cryptocompare.com/ Fig. 1: Structure and data flow in RCURRENCY. as additional rows to the training data set. This extra training value in the predictions of the actions following the current data provides the neural network with more information about subsequence. the market and should thus lead to higher predictive accuracy. The recurrent neural network encompasses three layers: The data is subsequently converted to what is called a 1) Linear layer stationary data set. Such a data set is created by taking the The linear layer is required to map the nodes of the difference of each data point from the previous data point in input layer to the number of nodes in the hidden layer. a column-based manner. It is a fully connected layer, of which every edge has an Finally, row-based normalization is applied to the data set individual weight. to fit the resulting values in the range of -1 and 1 in order to 2) Fast LSTM layer obtain better results and reduce the training time [29]. The new The FastLSTM layer is a special, faster version of the row-normalized matrix is mathematically defined as follows: conventional LSTM layer. It combines the calculation of Given a matrix x, with dimensions m×n, for the new matrix the input, forget and output gates and the hidden state y holds: in a single step. The composite functions of the standard LSTM cell are x − ymin y = 2 ∗ ( i,j ) − 1, 1 ≤ i ≤ m, &1 ≤ j ≤ n. (1) based on the paper from [37]. and are defined as follows: i,j ymax − ymin i = σ(W x + W h + W c + b ) B. Recurrent Neural Network t xi t hi t−1 ci t−1 i ft = σ(Wxf xt + Whf ht−1 + Wcf ct−1 + bf ) There are various options available regarding networks c = f c + i (W x + W h + b ) capable of consuming a series of data points in order to make t t t−1 ttanh xc t hc t−1 c (2) predictions on future values, such as the time series analysis ot = σ(Wxoxt + Whoht−1 + Wcoct−1 + bo) technique ARIMA [30], or neural networks. As evaluated ht = ottanh(ct). by Kohzadi et al. [31], a neural network outperforms the However, as the network implements a FastLSTM layer ARIMA network over longer periods of time, justifying its rather than a conventional LSTM layer, the composite preference for our system. Specifically a recurrent neural functions are slightly altered [38]. network functions as RCURRENCY’s predictor, which is a particular type of neural network capable of exhibiting it = σ(Wxixt + Whiht−1 + bi) temporal dynamic behavior. As first introduced by [32], these ft = σ(Wxf xt + Whf ht−1 + bf ) networks model the connections between layer nodes as a ct = ftct−1 + tanh(Wxcxt + Whcht−1 + bc) (3) directed graph along a temporal sequence. The input data o = σ(W x + W h + b ) for RCURRENCY’s recurrent neural network consists of so- t xo t ho t−1 o called candlesticks, data structures containing numerical stock ht = tanh(ct). market information on assets. The values in a candlestick are 3) Output layer with Mean Squared Error (MSE) perfor- dependent on previous candlestick values and do therefore not mance function follow a random walk [33]. The output layer evaluates the neural network, in case A conventional recurrent neural network experiences prob- it is used for training purposes. If the neural network lems with the gradient descent when learning long-time de- is used to predict a values, the value that is returned pendencies, as depicted in the research by Bengio et al. reflects the output of the specified output layer. In this [34]. When an LSTM layer is used in the architecture of case, the MSE function is used to evaluate the network. the recurrent neural network, the problems of exploding and It does that as follows: m n vanishing gradients are solved [35]. An example of a long-term 1 X X MSE = (x − xˆ )2 (4) dependency in the current context is given by Tsinaslanidis m ∗ n ij ij and Kugiumtzis [36]. Their paper shows that a previously i=1 j=1 encountered subsequence of the time series data could be where xij is the actual value, xˆij is the predicted value similar to the current trend. The consequent actions of the price in row i and column j, m is the total amount of rows, following the previously encountered subsequence could carry and n is the total amount of columns. The network also uses the Adam optimizer, introduced by result in an asset being marked as overbought (indicating Kingma and Ba [39], to speed up the training process. The the holder should sell) and vice versa. Adam optimizer is computationally efficient, requires little • Double Exponential Moving Averages (DEMA) memory, is invariant to diagonal re-scaling of the gradients, The DEMA strategy utilizes two moving averages to de- and is well suitable for large quantities of data, or problems termine whether to buy, hold or sell an asset. Crossovers with many parameters. Therefore, it is a good fit for the of these averages indicate buy and sell moments as the solution. The resulting formula for updating parameters is: momentum changes between long-term and short-term

mt asset value. mˆ t = t , • Moving Average Convergence / Divergence (MACD) 1 − β1 vt MACD is a well-known strategy based on the difference vˆt = t , between two moving averages with different lengths, 1 − β2 (5) called slow- and fast moving average. This is then com- a ∗ p1 − βt 2 pared to the moving average of the difference itself, called at = t , 1 − β1 the signal line. A decision to buy, hold or sell is made at ∗ mt θt+1 = θt − √ mˆ t, whenever the difference crosses the signal line. vˆt +  • Random Walk where α is the step size, mt is the mean at time t, and vt is A controversial yet widely regarded investment theory the uncentered variance of the gradients at time t. known as the efficient market hypothesis states that all available information is immediately reflected in the C. Trading Strategies and Prediction value of an asset, and therefore no consistently profitable Technical analysis of digital assets is indeed sensitive to the trading strategies should exist [43]. Consequently, a series trading strategy, i.e, the methods of buying and selling based of random actions should be able to approximate the on predefined rules to make trading decisions. There have been results of a series of calculated trades. longstanding discourses on how useful trading strategies are D. Implementation in practice [40]. Common trading strategies based on time RCURRENCY is implemented in C++ with various libraries series pattern analysis, such as momentum and contrarian, as making up its core. First, the historical data is fetched using the well as other technical analysis strategies have had varying API provided by CryptoCompare. Since the API is currently degrees of success [24, 41, 42]. As such, it was decided not rate-limited or metered, this step takes only minutes. to add this element to the experimental AI, giving it the Live data is fetched in real-time by creating a WebSocket capability of applying four different trading strategies based connection with the exchange (Binance) and fetching the on technical indicators. Due to the vast number of strategies candles after they have formed. available and conceivable, priority was given to those that are After obtaining the historical data, it is preprocessed and fed most commonly adopted by existing trading platforms, such as to the neural network. This process utilizes two well-known TradingView2 or Cryptowatch3. Each trading strategy is briefly libraries: Armadillo4 and TA-Lib5. The library discussed below. provides highly optimized matrix computation functionality Every time period (minutely, hourly, daily) the neural net- desired for regular data set manipulation. TA-Lib provides work makes a prediction of the next period’s asset price. This all necessary functionality for efficiently calculating technical value is digested by the chosen trading strategy or strategies indicators from stock market data. As such, preprocessing and to determine a decision signal, which is most commonly preparation of the training dataset is completed in under a facilitated by observing the intersection between two or more minute. moving averages. Volatility of the portfolio value can be con- The recurrent neural network, as described in Section III, tained by setting limits on the minimum and maximum amount is implemented using MLPack [44]. MLPack is a fast and of portfolio value that can be traded each time period. Altering flexible C++ machine learning library, implementing various these limits and changing the trading strategies determines the types of configurable neural networks and machine learning level of risk management in trading. methods. It utilizes the Armadillo library for linear algebra • Rate-of-Change (ROC) computations, allowing for relatively low network training The ROC strategy considers the change of the predicted times. Training the network on all of the historical data value relative to the last actual value. When the change is between May 15th, 2013 and January 31st, 2019 takes ap- below, in between, or above a lower and upper threshold, proximately 10.5 seconds on a single commodity machine a decision is made whether to sell, buy or hold. (Apple MacBook Pro model A1502, 2.4GHz Intel Core i5- • Relative Strength Indicator (RSI) 4258U, 8GB RAM). The prediction of a data point as well This strategy tracks upward and downward trends in asset as an associated trading decision can be obtained in under a value. A sufficiently long series of positive value changes millisecond.

2https://www.tradingview.com/ 4http://arma.sourceforge.net/ 3https://cryptowat.ch/ 5http://ta-lib.org/ Fig. 2: Performance of the network under k-fold cross- Fig. 3: Impact of the different weight initialization values on validation. the RMSE.

IV. EXPERIMENTAL EVALUATION A. Parameter Optimization The performance of the neural network is highly dependent Considering the fact that the training data set is time- on several parameters that fine-tune different aspects of the dependent, Time Series k-fold Cross Validation was applied network. These parameters relate to the topology of the net- to assess the accuracy of the AI’s predictive capabilities work itself, as well as mathematical functions used to increase and express them numerically in a meaningful manner. The the speed of learning and overcome local optima. training data set is divided into a training part (80%) of size p To optimize the hyperparameters, the validation technique samples and a validation part (20%) of size q samples, where of the previous section was applied while conducting an each sample represents the values of one time unit. exhaustive search for the minimal RMSE by modifying a The value k represents how many samples are used in the single parameter at a time. While related works using neural validation step, or how many time units are predicted in each networks often remains silent on how and why specific hyper- training-validation cycle. Next, a software environment was parameters have been chosen, in practice they are crucial to developed to facilitate this validation method. The software the accuracy of the RNN and therefore to the success of any continuously trained the model on samples 1, . . . , p, produced RNN-based agent. a prediction for the next hour p + 1, validated the prediction 1) Edge Weight Initialization: The initialization values of using validation sample p + 1, trained the model anew on these edges will notably influence the training time of the samples 1, . . . , p + 1, and so on. Setting the value of k to neural network. A more optimal setting of initial weights will 1 means q cycles are performed to complete one series of lead to a faster and better convergence of the final weight cross validation. An increasing value of k means a decreasing values, and this will reduce the training time significantly [45]. amount of training-validation cycles. After each cycle, the As Figure 3 shows, random numbers in the range of -0.75 and error is calculated and stored. Finally, an error function is 0.75 proved to have the lowest RMSE across the different folds used to measure the performance. A range of values for k of the Bitcoin data set. was tested in order to obtain the lowest margin of error, as 2) Optimization Algorithm: The role of the optimization shown in Figure 2 where the Root Mean Squared Error is algorithm is to adjust the weights on each training iteration expressed for the number of chosen folds. of the network, impacting both the speed of convergence As expected, the AI makes fewer predictive errors for a as well as the accuracy of the prediction [46]. Stochastic lower value of k, as a larger number of folds reduces the Gradient Descent (SGD) methods are the most commonly used amount of data that is used to train the network before the optimization algorithms. After testing it with different step prediction cycle(s). The figure also shows that the prediction sizes against the Adam optimizer [39], the latter proved to error on all four elements stays well below fifty dollars for give the best balance of speed and accuracy at a learning rate lower values of k. This effectively allows the system to of 0.01. The results are shown in Figure 4a. accurately predict up- and downward trends, as the absolute 3) Rho Value (Look-back): The rho value sets the length of change between each interval is on average higher than the the back-propagation through time (BPTT) used to train the prediction error. Since the initial plan for the trading engine network. Effectively, it determines how many recent values are was to make hourly predictions in the first place, the value of used to predict the output at the next step in the time series. k was fixed to one. A new validation cycle is automatically As shown in Figure 4b, the optimal value was found to be performed each hour after the JavaScript API pulls new data. around 150. Fig. 5: Performance of the different hidden layer sizes, mea- sured against the RMSE.

Fig. 4: Impact of parameter optimization on the RMSE.

4) Hidden Layer Complexity: The complexity of the hidden layer refers to the number of hidden nodes used in the Fig. 6: Portfolio growth of 10.000 USDT between January FastLSTM layer. Increasing the number of nodes brings the 31th and September 13th, 2019. potential to learn more complex rules, but at the same time requires more data to train. Furthermore, it increases the risk of over-fitting and thereby losing the ability to generalize, which the quote currency. The total value is calculated by converting results in inaccurate results when the network is used with the portfolio to quote currency at each period. slightly different data sets. Figure 5 shows that the RMSE However, merely achieving a higher monetary value does was the lowest at approximately 36 hidden layer nodes. not imply successful trading if the trader is outperformed by the market itself. If the BTC/USDT value was only ever B. Evaluating Trading Strategy Results increasing during a certain time period, then even taking no This section discusses the results obtained by validating action whatsoever would have turned a profit. Incidentally, this the trading engine and each of its five previously described is known as the Buy-And-Hold strategy. With this strategy, a trading strategies on historical data; a technique popularly designated asset volume is simply bought and held; no further known as backtesting. The set of historical data points acts trading actions are taken. This strategy therefore provides an as a simulation of live input and updates the portfolio by appropriate baseline for comparison, also known as ”trading executing hypothetical (’dummy’) trades. The performance of against Alpha”. The results of this comparison and the exper- each trading strategy in minimizing losses to the portfolio is iments in general are shown in Figure 6. dependent on the quality of the prediction used to determine The AI was trained on approximately 90% of the data set a course of action, as well as the quality of the strategy itself. and used to predict the remaining 10% of the data. Each The recurrent neural network in these experiments utilizes strategy was tested independently, each receiving an initial the optimized parameter configuration as mentioned in the portfolio containing 10.000 USDT and subsequently used for previous section to maximize prediction accuracy. trades using the stock market data spanning January 31st to Each experiment concerns trading the BTC/USDT pair, with September 13th 2019, approximately 5.400 hours. The Alpha BTC as the base currency and USDT as the quote currency. baseline was generated by simply converting the 10.000 USDT The goal is to maximize the portfolio’s total monetary value in into BTC at t = 0 and taking no further trading actions. Some limitations have been imposed on the trading agent in order to achieve more stable results. Primarily, a maximum of 25% of the total portfolio value may be traded at a time. Secondarily, no trades will be made if the expected value increase across the traded volume is smaller than the trading fee (set at 0.1%). Figure 6 shows that during this part of historical Bitcoin data the RSI strategy managed to achieve the highest portfolio value, with the MACD strategy coming in as a very close second and the DEMA strategy performing only slightly worse. However, all three strategies managed to maintain a stable portfolio value, keeping up with and even outperforming the Alpha baseline. The ROC strategy performs almost as badly as the Random Walk, both barely managing to retain a third of the Alpha baseline value. Although the AI is able to outperform the Alpha baseline Fig. 7: Sharpe Ratios of the six trading strategies between with certain strategies, the question remains whether the profit February 13th and September 13th, 2019. merits the risks involved. One way of answering this question is by calculating the Sharpe Ratio [47], which gives an indication of how well an investor is compensated by the asset value) and seven values derived from this data (technical returns of an asset compared to a risk-free asset with respect indicators), while the four outputs represent the predictions for to the risk. The Sharpe Ratio is calculated by subtracting some the four asset values. Performance is measured using the Root risk-free asset return rate from the considered asset return rate, Mean Squared Error, which emphasizes the gravity of large divided by the standard-deviation of the considered asset return prediction errors. Results obtained from the TSKCV suggest rate. Figure 7 shows the monthly Sharpe Ratios of each of the that the network is capable of predicting new asset values six trading strategies measured over the final seven (virtual) with approximately 0.4% error across the four output channels. months of the backtesting period, ranging from February 13th The strategies have varying rates of success in maintaining to September 13th, 2019. The return rate R is calculated as a steady portfolio value throughout the historical dataset but the percentage of excess returns relative to the start of each every single one performs better than the randomized strategy. Despite the fact that RCURRENCY is tested for Bitcoin, its month. As for determining the risk-free rate of return Rf , monthly U.S. Treasury Bills (T-Bills) made for an appropriate design allows the system to support other digital assets as well. risk-free asset, as the portfolio is measured in USDT. The No strategy was able to predict large and sudden drops in T-Bills during the backtesting period had a risk-free rate asset value correctly. This, however, is a common problem of roughly 2%, as indicated on the website of the United of existing AI-based solutions and future research could shed States Treasury Department [48]. To put these numbers into light on the question whether utilizing a broader set of input perspective, a Sharpe Ratio above is 1 is generally considered data can improve the ability to predict such events effectively. a good investment, above 2 is considered excellent [49]. Neural networks generally perform better given more train- The diagram reveals that even though the RSI and MACD ing data. One specific input readily available is volume data, strategies attained the highest portfolio values respectively, it which measures what quantities of an asset have been traded is the DEMA strategy that holds the highest Sharpe Ratio. in a given period of time. Unfortunately, volume data had to Since the returns (R) of the RSI and MACD strategies were be excluded as it was unknown from which exchanges the higher and the risk-free rate of return (Rf ) of each month was volume data originated, a problem that would likely lead to the same for every strategy, the explanation is that the RSI and poor predictions. Finding out the origins of the volume data MACD strategies displayed a higher volatility (expressed as would provide the system with additional training data. σR) than the DEMA strategy during this backtesting period. Despite the fact that RCURRENCY is tested for Bitcoin, The Random Walk strategy suffered both suffered major losses the design allows the system to support other digital assets as and a high volatility, thus its Sharpe Ratio fell below negative. well. The training data has to include the historical data of the Although the ROC strategy suffered similar losses, it managed alternative digital asset. Subsequently, the hyper-parameters to keep a positive Sharpe Ratio due to lower volatility levels. have to be verified or re-optimized for the newly added data. Another addition, is combining the historical data of multiple V. CONCLUSIONAND FUTURE WORK assets in different slices per asset, supplying the model with With RCURRENCY, we have built a viable AI-based trad- the possibility to utilize the correlations between the slices. ing agent that is able to operate within the highly volatile cryp- Lastly, it is a common trader’s tactic to combine multiple tocurrency market and even generate profit under real market trading strategies to maximize profits. Further simulations conditions. RCURRENCY uses a FastLSTM-backed recurrent could be executed in order to test the performance disparity of neural network to predict future stock prices. Network inputs the trading engine when combining two, three or even more are a combination of direct data (four values representing trading strategies. REFERENCES ticipating cryptocurrency prices using machine learning,” Complexity, [1] W. H. Gross, “Consistent alpha generation through structure,” Financial vol. 2018, 2018. Analysts Journal, vol. 61, no. 5, pp. 40–43, 2005. [29] J. Sola and J. Sevilla, “Importance of input data normalization for the [2] G. P. C. Fung, J. X. Yu, and W. Lam, “News sensitive stock trend application of neural networks to complex industrial problems,” IEEE prediction,” in Adv. in Knowledge Discovery and , M.-S. Trans Nucl Sci, vol. 44, no. 3, 1997. Chen, P. S. Yu, and B. Liu, Eds., 2002. [30] T. C. Mills and T. C. Mills, Time series techniques for economists. [3] E. W. Saad, D. V. Prokhorov, and D. C. Wunsch, “Comparative study Cambridge University Press, 1991. of stock trend prediction using time delay, recurrent and probabilistic [31] N. Kohzadi, M. S. Boyd, B. Kermanshahi, and I. Kaastra, “A comparison neural networks,” IEEE Transactions on neural networks, vol. 9, no. 6, of artificial neural network and time series models for forecasting pp. 1456–1470, 1998. commodity prices,” Neurocomputing, vol. 10, no. 2, pp. 169–181, 1996. ˇ [4] M.-C. Lee, “Using support vector machine with a hybrid feature [32] T. Mikolov, M. Karafiat,´ L. Burget, J. Cernocky,` and S. Khudanpur, selection method to the stock trend prediction,” Expert Systems with “Recurrent neural network based language model,” in Eleventh annual Applications, vol. 36, no. 8, pp. 10 896–10 904, 2009. conference of the international speech communication association, 2010. [5] L.-P. Ni, Z.-W. Ni, and Y.-Z. Gao, “Stock trend prediction based on [33] A. W. Lo and A. C. MacKinlay, “Stock market prices do not follow fractal feature selection and support vector machine,” Expert Systems random walks: Evidence from a simple specification test,” The review with Applications, vol. 38, no. 5, 2011. of financial studies, vol. 1, no. 1, pp. 41–66, 1988. [6] E. F. Fama, “Random walks in stock market prices,” Financial analysts [34] Y. Bengio, P. Simard, P. Frasconi et al., “Learning long-term depen- journal, vol. 51, no. 1, pp. 75–80, 1995. dencies with gradient descent is difficult,” IEEE transactions on neural [7] C. S. Vui, G. K. Soon, C. K. On, R. Alfred, and P. Anthony, “A networks, vol. 5, no. 2, pp. 157–166, 1994. review of stock market prediction with artificial neural network (ann),” [35] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural in 2013 IEEE International Conference on Control System, Computing Comput., 1997. and Engineering, 2013. [36] P. E. Tsinaslanidis and D. Kugiumtzis, “A prediction scheme using per- [8] R. Ali, J. Barrdear, R. Clews, and J. Southgate, “The economics of ceptually important points and dynamic time warping,” Expert Systems digital currencies,” Bank of England Quarterly Bulletin, p. Q3, 2014. with Applications, vol. 41, no. 15, pp. 6848–6860, 2014. [9] A. H. Dyhrberg, “Bitcoin, gold and the dollar–a garch volatility analy- [37] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with sis,” Finance Research Letters, vol. 16, 2016. deep recurrent neural networks,” in Acoustics, speech and signal pro- [10] D. Yermack, “Is bitcoin a real currency? an economic appraisal,” in cessing (icassp), 2013 ieee international conference on. IEEE, 2013, Handbook of digital currency. Elsevier, 2015, pp. 31–43. pp. 6645–6649. [11] L. Bottou, “Large-scale machine learning with stochastic gradient de- [38] M. Edel, “Fast lstm layer documentation,” jun 2018. scent,” in COMPSTAT’2010, 2010, pp. 177–186. [Online]. Available: https://github.com/mlpack/mlpack/blob/master/src/ [12] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large- mlpack/methods/ann/layer/fast lstm.hpp scale machine learning,” Siam Review, vol. 60, no. 2, pp. 223–311, 2018. [39] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” [13] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical arXiv preprint arXiv:1412.6980, 2014. machine learning tools and techniques. Morgan Kaufmann, 2016. [40] P. J. Knez and M. J. Ready, “Estimating the profits from trading [14] J. Lee, “One of the world’s oldest quants is going all-in with robots,” strategies,” The Review of Fin. Studies, vol. 9, no. 4, 1996. Bloomberg, 2019. [41] J. Conrad and G. Kaul, “An anatomy of trading strategies,” The Review [15] R. J. Balvers, T. F. Cosimano, and B. McDonald, “Predicting stock of Financial Studies, vol. 11, no. 3, pp. 489–519, 1998. returns in an efficient market,” The Journal of Finance, vol. 45, no. 4, [42] H. Bessembinder and K. Chan, “Market efficiency and the returns to pp. 1109–1128, 1990. technical analysis,” Financial mgmnt, 1998. [16] C. Brooks, “Predicting stock index volatility: can market volume help?” [43] S. Basu, “Investment performance of common stocks in relation to their Journal of Forecasting, vol. 17, no. 1, pp. 59–80, 1998. price-earnings ratios: A test of the efficient market hypothesis,” The [17] X. Zhang, H. Fuehres, and P. A. Gloor, “Predicting stock market journal of Finance, vol. 32, no. 3, pp. 663–682, 1977. indicators through twitter “i hope it is not as bad as i fear”,” Procedia- [44] R. Curtin, M. Edel, M. Lozhnikov, Y. Mentekidis, S. Ghaisas, and Social and Behavioral Sciences, vol. 26, 2011. S. Zhang, “Mlpack 3: A fast, flexible machine learning library,” Journal [18] P. C. Tetlock, “Giving content to investor sentiment: The role of media in of Open Source Softw., 2018. the stock market,” The Journal of finance, vol. 62, no. 3, pp. 1139–1168, [45] J. Y. Yam and T. W. Chow, “Feedforward networks training speed 2007. enhancement by optimal initialization of the synaptic coefficients,” IEEE [19] J. S. Abarbanell and B. J. Bushee, “Fundamental analysis, future Trans Neural Netw, vol. 12, no. 2, 2001. earnings, and stock prices,” Journal of accounting research, vol. 35, [46] “A weight initialization method for improving training speed in feedfor- no. 1, pp. 1–24, 1997. ward neural network,” Neurocomputing, vol. 30, no. 1, pp. 219 – 232, [20] M. C. Thomsett, Mastering technical analysis. Dearborn Trade 2000. Publishing, 1999. [47] W. F. Sharpe, “Mutual fund performance,” The Journal of business, [21] M. J. Pring, Technical analysis explained: The successful investor’s vol. 39, no. 1, pp. 119–138, 1966. guide to spotting investment trends and turning points. McGraw-Hill [48] U.S. Department of the Treasury, “Daily treasury yield curve rates,” New-York, 1991, vol. 4. https://bit.ly/3kYRIax, 2019. [22] T. A. Meyers, The Technical Analysis Course: A Winning Program for [49] W. Goetzmann, J. Ingersoll, M. I. Spiegel, and I. Welch, “Sharpening Investors and Traders. Irwin Prof. Publ., 1994. sharpe ratios,” Tech. Rep., 2002. [23] J. Nenortaite and R. Simutis, “Stocks’ trading system based on the particle swarm optimization algorithm,” in International Conference on , 2004. [24] L. A. Teixeira and A. L. I. De Oliveira, “A method for automatic stock trading combining technical analysis and nearest neighbor classifica- tion,” Expert systems with applications, vol. 37, no. 10, pp. 6885–6890, 2010. [25] M. Lam, “Neural network techniques for financial performance predic- tion: integrating fundamental and technical analysis,” Decision support systems, vol. 37, no. 4, pp. 567–581, 2004. [26] S. Gu, B. Kelly, and D. Xiu, “Empirical asset pricing via machine learning,” NBER, Tech. Rep., 2018. [27] S. McNally, J. Roche, and S. Caton, “Predicting the price of bitcoin using machine learning,” in 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). IEEE, 2018, pp. 339–343. [28] L. Alessandretti, A. ElBahrawy, L. M. Aiello, and A. Baronchelli, “An-