Trader-Company Method: A Metaheuristic for Interpretable Stock Price Prediction Katsuya Ito Kentaro Minami Preferred Networks, Inc. Preferred Networks, Inc. [email protected] [email protected] Kentaro Imajo Kei Nakagawa Preferred Networks, Inc. Nomura Asset Management Co., Ltd. [email protected] [email protected]
ABSTRACT [7, 15, 16] have been standard asset pricing models for many years. Investors try to predict returns of financial assets to make suc- Technical indicators such as Moving Average Convergence Diver- cessful investment. Many quantitative analysts have used machine gence (MACD) and Relative Strength Index (RSI) are also prediction learning-based methods to find unknown profitable market rules methods that have been used by many traders [1, 28]. from large amounts of market data. However, there are several Although many quantitative analysts are struggling to derive challenges in financial markets hindering practical applications of new rules from newly available big data, there has been no gold- machine learning-based models. First, in financial markets, there standard practical method that can fully leverage these data [41]. is no single model that can consistently make accurate prediction We believe that there are the following two challenges that are because traders in markets quickly adapt to newly available informa- hindering the development of quantitative models today. tion. Instead, there are a number of ephemeral and partially correct models called “alpha factors”. Second, since financial markets are 1.1 Tackling Nearly-Efficient Markets highly uncertain, ensuring interpretability of prediction models is Our first challenge is to tackle the non-stationary and noisy nature quite important to make reliable trading strategies. To overcome of the financial market, which is known asthe efficiency of mar- these challenges, we propose the Trader-Company method, a novel kets. The widely acknowledged Efficient Market Hypothesis14 [ ] evolutionary model that mimics the roles of a financial institute and states that asset prices reflect all available information, correcting traders belonging to it. Our method predicts future stock returns by undervalued or overvalued prices into fair values. In an efficient aggregating suggestions from multiple weak learners called Traders. market, investors cannot outperform the overall market because A Trader holds a collection of simple mathematical formulae, each asset prices quickly follow other traders’ strategic and adversarial of which represents a candidate of an alpha factor and would be activities [10]. In fact, many empirical studies have reported that interpretable for real-world investors. The aggregation algorithm, real-world markets are nearly efficient [9, 32, 41]. Due to this, future called a Company, maintains multiple Traders. By randomly gener- stock returns are hardly predictable in most markets, and no single ating new Traders and retraining them, Companies can efficiently explanatory model can consistently make an accurate prediction. find financially meaningful formulae whilst avoiding overfitting to On the other hand, there still is a common belief that stock a transient state of the market. We show the effectiveness of our returns can be predictable in a sufficiently short time period, which method by conducting experiments on real market data. suggests the existence of investments or trading strategies that KEYWORDS beat the overall market at least temporarily. In particular, potential sources of profitability would come from some simple mathematical Finance, Metaheuristics, Stock Price Prediction formulae, called alpha factors [13, 26]. Typical alpha factors used in ACM Reference Format: production are given as combinations of few elementary functions Katsuya Ito, Kentaro Minami, Kentaro Imajo, and Kei Nakagawa. 2021. and arithmetic operations. For example, Trader-Company Method: A Metaheuristic for Interpretable Stock Price
arXiv:2012.10215v1 [q-fin.TR] 18 Dec 2020 Prediction. In Proc. of the 20th International Conference on Autonomous Agents ( / ) and Multiagent Systems (AAMAS 2021), Online, May 3–7, 2021, IFAAMAS, log yesterday’s close price yesterday’s open price 9 pages. represents the classical momentum strategy [24]. It has been re- 1 INTRODUCTION ported that there is a variety of mathematical formulae with rea- Developing quantitative trading strategies is a universal task in the sonably low mutual correlations [26], each of which can be a good financial industry11 [ ]. Many quantitative models have been pro- trading signal and actually usable in real-life trading. posed to predict the behavior of financial markets20 [ , 32, 41]. For Although the efficacy of a single formula is slight and ephemeral, example, Fama–Frech’s three-factor model and five-factor model combining multiple formulae in a sophisticated way can lead to a more robust trading signal. We hypothesize that we can overcome Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems the instability and the uncertainty of markets by maintaining mul- (AAMAS 2021), U. Endriss, A. Nowé, F. Dignum, A. Lomuscio (eds.), May 3–7, 2021, Online. © 2021 International Foundation for Autonomous Agents and Multiagent Systems tiple “weak models” given as simple mathematical formulae. This (www.ifaamas.org). All rights reserved. is in the same spirit as the ensemble methods (see e.g., [21, 39]), Trader: predict based on simple formulas Company: aggregate predictions and manage traders
If sign(� ) > 0 Stock then Buy A �
If 0.1� + 0.9� > 0 Stock then Buy A New! � If 0.2� + � > 0 Stock then Buy A (2)Aggregate (3)Educate (4)Dismiss (5)Recruit � (1)Traders make simple strategies Traders’ prediction Bad Traders Bad Traders New Traders
Figure 1: Illustration of our method. (1) Traders predict the return of assets using simple formulas. Companies manage Traders and combine Traders’ predictions into one value. The Company algorithm consists of four functions: (2) prediction by aggre- gation, (3) education of bad Traders, (4) dismissal of bad Traders, and (5) recruitment of new Traders. but paying more attention to specific structure of real-world al- (i.e., hiring) new candidates of good Traders as well as by deleting pha factors may improve the performance of the resulting trading (i.e., dismissing) poorly performing Traders. This framework allows strategy. us to effectively search over the space of mathematical formulae having categorical parameters. 1.2 Interpretability of Trading Strategies We demonstrate the effectiveness of our method by experiments The second challenge is to gain the interpretability of models. As on real market data. We show that our method outperforms several mentioned above, it is hard to achieve consistently high perfor- standard baseline methods in realistic settings. Moreover, we show mance in the financial markets. Even if we could have the best that our method can find formulae that are profitable by themselves possible trading strategies at hand, their predictive accuracies are and simple enough to be interpreted by real-world investors. quite limited and unsustainable. To gain intuition, suppose that we forecast the rise or fall of a single stock. Then, the accuracy is 2 PROBLEM DEFINITION typically no higher than 51%, which is approximately the chance In this section, we provide several mathematical definitions of fi- rate. As such, investors should worry about their trading strategies nancial concepts and formulate our problem setting. Table 1 sum- having a large uncertainty in the returns and the risks. marizes the notation we use in this paper. In such a highly uncertain environment, the interpretability of models is of utmost importance. Warren Buffett said, “Risk comes 2.1 Problem Setting from not knowing what you are doing” [18]. As his word implies, in- Our problem is to forecast future returns of stocks based on their vestors may desire the model to be explainable to understand what historical observations. To be precise, let 푋 [푡] be the price of stock they are doing. In fact, historically, investors and researchers have 푖 푖 at time 푡, where 1 ≤ 푖 ≤ 푆 is the index of given stocks and preferred linear factor models to explain asset prices (e.g., [15, 16]), 0 ≤ 푡 ≤ 푇 is the time index. Throughout this paper, we consider which are often believed as interpretable. On the other hand, for the logarithmic returns of stock prices as input features of models. machine learning-based strategies, the lack of interpretability can That is, we denote the one period ahead return of stock 푖 by be an obstacle to practical use, without which investors cannot understand their own investments nor ensure accountability to cus- 푋푖 [푡] − 푋푖 [푡 − 1] 푟푖 [푡] := log(푋푖 [푡]/푋푖 [푡 − 1]) ≈ . (1) tomers. We believe that using the combination of simple formulae 푋푖 [푡 − 1] mentioned in the previous subsection can open up a possibility of We denote returns over multiple periods and returns over multiple interpretable machine learning-based trading strategies. periods and multiple stocks by 1.3 Our Contributions 풓푖 [푢 : 푣] = (푟푖 [푢], ··· , 푟푖 [푣]), 풓푖:푗 [푢 : 푣] = (푟푖 [푢 : 푣], ··· , 푟 푗 [푢 : 푣]) To address the aforementioned challenges, we propose the Trader- (2) Company method, a new metaheuristics-based method for stock Our main problem is formulated as follows. price prediction. Our method is inspired by the role of financial institutions in the real-world stock markets. Figure 1 depicts the Problem 1 (one-period-ahead prediction). The predictor sequen- entire framework of our method. Our method consists of two main tially observes the returns 푟푖 [푡] (1 ≤ 푖 ≤ 푆) at every time 0 ≤ 푡 ≤ 푇 . ingredients, Traders and Companies. A single Trader predicts the For each time 푡, the predictor predicts the one-period-ahead re- returns based on simple mathematical formulae, which are postu- turn 푟푖 [푡 + 1] based on the past 푡 returns 풓1:푆 [0 : 푡]. That is, the lated to be good candidates for interpretable alpha factors. Brought predictor’s output can be written as together by a Company, Traders act as weak learners that provide 푟ˆ푖 [푡 + 1] = 푓푡 (풓1:푆 [0 : 푡]) (3) partial information helping the Company’s eventual prediction. The Company also updates the collection of Traders by generating for some function 푓푡 that does not depend on the values of 풓1:푆 [푡 : 푇 ]. Table 1: Notation. values [26]. We postulate that there are a number of unexplored profitable market rules that can be represented by simple formulae, Notation Meaning Def. which leads us to the following definition of a parametrized family stock price of stock 푖 at time 푡 of formulae. 푋푖 [푡] § 2.1 where 1 ≤ 푖 ≤ 푆, 0 ≤ 푡 ≤ 푇 Definition 1. A Trader is a predictor of one period ahead returns 푟푖 [푡] logarithmic return of 푖 at 푡 (1) defined as follows. Let 푀 be the number of terms in the prediction 풓 [푢 : 푣](푟 [푢], ··· , 푟 [푣]) (2) 푖 푖 푖 formula. For each 1 ≤ 푗 ≤ 푀, we define 푃푗 , 푄 푗 as the indices of 풓 [푢 : 푣](푟 [푢 : 푣], ··· , 푟 [푢 : 푣]) (2) 푖:푗 푖 푗 the stock to use, 퐷 푗 , 퐹푗 as the delay parameters, 푂 푗 as the binary [ ] ˆ [ ] [ ] 푟ˆ푖 푡 ,푏푖 푡 predicted value of 푟푖 푡 and its sign (3)(4) operator, 퐴푗 as the activation function, and 푤 푗 as the weight of the 푀, 푃 , 푄 , 퐷 푗 푗 푗 hyper-parameters of Traders (5) 푗-th term. Then, the Trader predicts the return value 푟푖 [푡 + 1] at 퐹푗 , 퐴푗 , 푂 푗 time 푡 + 1 by the formula 푀 ∑︁ 푓 (풓 [0 : 푡]) = 푤 퐴 (푂 (푟 [푡 − 퐷 ], 푟 [푡 − 퐹 ])). (5) 2.2 Evaluation Θ 1:푆 푗 푗 푗 푃푗 푗 푄 푗 푗 푗=1 To evaluate the goodness of the prediction, we use the cumulative where Θ is the parameters of the Trader: return defined as follows. Given a predictor’s output 푟ˆ푖 [푡 + 1], we 푀 define the “canonical” trading strategy as Θ := {푀, {푃푗 , 푄 푗 , 퐷 푗 , 퐹푗 , 푂 푗 , 퐴푗 ,푤 푗 }푗=1}.
푏ˆ푖 [푡 + 1] = sign(푟ˆ푖 [푡 + 1]). (4) For activation functions 퐴푗 , we use standard activation functions used in deep learning such as the identity function, hyperbolic ˆ [ ] Here, the value of 푏푖 푡 represents the trade of stock 푖 at time 푡. tangent function, hyperbolic sine function, and Rectified Linear That is, 푏ˆ [푡] = ±1 respectively means that the strategy buys/sells 푖 Unit (ReLU). For the binary operators 푂 푗 , we use several arithmetic one stock and sell/buy it back after one periods. Thus, equation binary operators (e.g., 푥 + 푦, 푥 − 푦, and 푥 × 푦), the coordinate (4) can be interpreted as a strategy that buys a unit amount of a projection, (푥,푦) ↦→ 푥, the max/min functions, and the comparison stock if its one period ahead return is expected to be positive and function (푥 > 푦) = sign(푥 − 푦). otherwise sells it. Then, we define the cumulative return of the Our definition of the Trader has several advantages. First, the prediction 푟ˆ푖 [푡 + 1] as formula (5) is ready to be interpreted in the sense that it has a similar 푡 푡 form to typical human-generated trading strategies [26]. Second, ∑︁ ˆ ∑︁ 퐶푖 [푡] = 푏푖 [푢 + 1]푟푖 [푢 + 1] = sign(푟ˆ푖 [푢 + 1])푟푖 [푢 + 1]. the Trader model has a sufficient expressive power. The Trader 푢=0 푢=0 has various binary operators as fundamental units, which allows If the predictor could perfectly predict the sign of the one period it to represent any binary operations commonly used in practical ahead returns, the above canonical strategy yields the maximum trading strategies. Besides, the model also encompasses the linear cumulative return among all possible strategies that can only trade models since we can choose the projection operator (푥,푦) ↦→ 푥 as unit amounts of stocks. As such, we consider 퐶푖 [푡] as an evaluation 푂 푗 . major of the prediction. Ideally, we want to optimize the Traders by maximizing the cumulative returns: 3 TRADER-COMPANY METHOD ∗ Í Θ ∈ arg max 푢 sign(푓Θ (풓1:푆 [0 : 푢]))· 푟푖 [푢 + 1] In this section, we present the Trader-Company method, a new Θ metaheuristics-based prediction algorithm for stock prices. := arg max 푅(푓Θ, 풓1:푆 [0 : 푡], 푟푖 [0 : 푡 + 1]) (6) Figure 1 outlines our proposed method. Our method consists of Θ two main components, Traders and Companies, which are inspired However, it is difficult to apply common optimization methods by the role of human traders and financial institutes, respectively. since the objective in the right-hand side is neither differentiable A Trader predicts the returns using a simple model expressing nor continuous w.r.t. the parameter Θ. Therefore, we introduce a realistic trading strategies. A Company combines suggestions from novel evolutionary algorithm driven by Company models, which multiple Traders into a single prediction. To train the parameters we will describe below. of the proposed system, we employ an evolutionary algorithm that mimics the role of financial institutes as employers of traders. 3.2 Companies - Optimization and Aggregation During training, a Company generates promising new candidates Module of Traders and deletes poorly performing ones. Below, we provide As mentioned in the introduction, the behaviour of the financial more detailed definitions and training algorithms for Traders and market is highly unstable and uncertain, and thereby any single Company. explanatory model is merely partially correct and transient. To over- come this issue and robustify the prediction, we develop a method 3.1 Traders - Simple Prediction Module to combine predictions of multiple Traders. This is in the same spirit First, we introduce the Traders, which are the minimal components as the general and long-standing framework of ensemble methods in our proposed framework. As mentioned in the introduction, trad- (e.g., [21] or Chapter 4 of [39]), but introducing an “inductive bias” ing strategies used in real-life trading are made of simple formulae that takes into account the dynamics of real-world financial mar- involving a small number of arithmetic operations on the return kets would improve the performance of combined prediction. In 1 Í푁 ( [ ]) particular, given the fact that the stock prices are determined as 푁 푛=1 푓Θ푛 풓1:푆 0 : 푡 , linear regression or general trainable pre- a result of diverse investments made by institutional traders, it diction models (e.g., neural networks and the Random Forest) that is reasonable to consider a model imitating the environments in take the Traders’ suggestions as the input features. which institutional traders are involved (i.e., financial institutions). In order to achieve low training errors whilst avoiding overfitting, the Company should maintain the average quality as well as the Algorithm 1 Prediction algorithm of Company diversity of the Traders’ suggestions. To this end, we introduce the Educate algorithm (Algorithm 2) and the Prune-and-Generate [ ] { }푁 Input: 풓1:푆 0 : 푡 :stock returns before 푡, Traders : Θ푛 푛=1 algorithm (Algorithm 3), which update weights and formulae of Output: 푟ˆ푖 [푡 + 1]: predicted return of stock 푖 at 푡 Traders, respectively. 1: function CompanyPrediction • Educating Traders: Recall that a single Trader (5) is a linear 2: for 푛 = 1, ··· , 푁 do combination of 푀 mathematical formula. A single Trader 3: 푃 ⇐ 푓 (풓 [0 : 푡]) ⊲ Prediction by Trader (5) 푛 Θ푛 1:푆 can perform poorly in terms of cumulative returns. However, 4: end for if the Trader has good candidates of formulae (i.e., alpha 5: return Aggregate(푃1, ··· , 푃푁 ) ⊲ Aggregation factors), slightly updating the weights {푤 푗 } while keeping 6: end function the formulae would significantly improve the performance. Algorithm 2 corrects the weights {푤 푗 } of the Traders achiev- ing relatively low cumulative returns. Here, we update the Algorithm 2 Educate algorithm of Company weights by the least-squares method, which is solved analyt- ically. Input: 풓 [0 : 푡]:stock returns before 푡 1:푆 • Pruning poorly performing Traders and generating new Input: Traders. 푁 : the number of Traders. 푄: ratio. candidate good Traders: If a Trader holds “bad” candidates Output: Traders of formulae, keeping that Trader makes no improvement on 1: function CompanyEducate the prediction performance while it can increase risk expo- 2: 푅푛 ⇐ 푅(푓Θ푛 , 풓1:푆 [0 : 푡], 푟푖 [0 : 푡 + 1]) ⊲ Trader’s return (6) ∗ sures. In that case, it may be beneficial to simply remove 3: 푅 ⇐ bottom 푄 percentile of {푅푛 } ∗ that Trader and replace it with a new promising candidate. 4: for 푛 ∈ {푚|푅 ≤ 푅 } do ⊲ for all bad traders 푚 Algorithm 3 implements this idea. First, we evaluate the cu- 5: Update 푤 in (5) by least squares method 푖 mulative returns of the current set of Traders, and remove 6: end for the Traders having relatively low returns. Then, we generate 7: return Traders new Traders by randomly fluctuating the existing Traders 8: end function with good performances. To this end, we fit some probabil- ity distribution to the current set of parameters, and draw new parameters from it. While the parameter specifying Algorithm 3 Prune-and-Generate algorithm of Company the formulae contain discrete variables such as indices of stocks and choices of arithmetic operations, we empirically Input: 풓 [0 : 푡]:stock returns before 푡, F: # of fit times 1:푆 found that fitting the Gaussian mixture distribution (i.e., a Input: 푁 : the number of Predictors. 푄: ratio. continuous multi-modal distribution) to discrete indices and Output: 푁 ′ Predictors discretizing the generated samples can achieve reasonably 1: Θ ∼ Unifrom Distribution 푛 good performances. See Section 4 for detailed experimental 2: for 푘 = 1, ··· , 퐹 do results. 3: 푅푛 ⇐ 푅(푓Θ, 풓1:푆 [0 : 푡], 푟푖 [0 : 푡 + 1]) ⊲ Trader’s return (6) ∗ 4: 푅 ⇐ bottom 푄-percentile of {푅푛 } Using the above algorithms together, we can effectively search ∗ 5: {Θ푗 }푗 ⇐ {Θ푛 |푅푛 ≥ 푅 } ⊲ Pruning the complicated parameter space of Traders. Educate algorithm 푁 ′ (Algorithm 2) is intended to be applied before pruning (Algorithm 6: {Θ푗 } ∼ GM fitted to {Θ푗 }푗 * ⊲ Generation 푗=1 3) to prevent potentially useful alpha factors from being pruned. 7: end for ′ ′ In practice, given past observations of returns 풓 푆 [푡1 : 푡2], we can 8: return 푁 Predictors with {Θ }푁 1: 푗 푗=1 train the model by the following workflow. * If the parameter is an integer, we round it off. (1) Educate a fixed proportion of poorly performing Traders by Algorithm 2. In our framework, a Company maintains 푁 Traders that act as (2) Replace a fixed proportion of poorly performing Traders weak learners or feature extractors, and aggregate them. Given 푁 with random new Traders by Algorithm 3. Traders specified by parameters Θ1,..., Θ푛 and the past observa- (3) If the aggregation function Aggregate has trainable param- tions of stock returns 풓1:푆 [0 : 푡], a Company predicts the future eters, update them using the data 풓1:푆 [푡1 : 푡2] and any opti- returns by mization algorithm.
푟ˆ[푡 + 1] = Aggregate(푓Θ1 , . . . , 푓Θ푛 ). (4) Predict future returns by Algorithm 1. For clarity, this procedure is presented in Algorithm 1. Here, Aggregate We comment on some intuitions about the advantages of our can be an arbitrary aggregation function and allowed to have ex- method, although there is no theoretical guarantee in practical tra parameters. For example, we can use the simple averaging settings. Algorithm 3 increases the diversity of Traders by injecting random fluctuations to existing good Traders. From a generalization • Sharpe Ratio (SR): The Sharpe ratio [36], or the Return/Risk perspective, injecting randomness may help to avoid overfitting ratio (R/R) is the return value adjusted by its standard devia- 1 1 Í푇 ˆ to the current transient state of the market. From an optimization tion. That is, letting 휇푖 := 푇 퐶푖 [푇 ] = 푇 푡=1 푏푖 [푢 + 1]푟푖 [푢 + perspective, we can view our algorithm as a variant of evolutionary 2 1 Í푇 ˆ 2 1] and휎푖 := 푇 푡=1 (푏푖 [푢 +1]푟푖 [푢 +1] −휇푖 ) , and we define algorithms such as the Covariance Matrix Adaptation Evolution 1 Í푆 SR푖 := 휇푖 /휎푖 . Then, we report the average SR = 푆 푖=1 SR푖 . Strategy (CMA-ES) [19]. While the original CMA-ES generates new • Calmar Ratio (CR): We also use the Calmar ratio [43], an- particles from a Gaussian distribution, we see that using a multi- other definition of adjuster returns. Define the Maximum modal distribution is practically important. See Section 4 for an DrawDown (MDD) as empirical verification. 퐶푖 [푡 ] MDD푖 := max1≤푡 ≤푇 max푡<푠 ≤푇 1 − [ ] . 4 EXPERIMENTS 퐶푖 푠 The Calmar ratio is defined as CR := AR /MDD . In our We conducted experiments to evaluate the performance of our 푖 푖 푖 experiments, we report CR = 1 Í푆 CR . Note that while proposed method on real market data. We compare our method 푆 푖=1 푖 both SR and CR are adjusted returns by its risk measures, with several benchmarks including simple linear models and state- CR is more sensitive to drawdown events that occur less of-the-art deep learning methods, and confirm the superiority of our frequently (e.g., financial crises). method. We also demonstrate that our method can find profitable formulae (alpha factors) that are simple enough to interpret. 4.3 Baseline Methods 4.1 Datasets We compared our methods with the following baseline methods: Throughout the experiments, we used two real market datasets: (i) • Market: a uniform Buy-And-Hold strategy. US stock prices listed on Standard & Poor’s 500 (S&P 500) Stock In- • Vector Autoregression (VAR): a commonly-used linear model dex and (ii) UK stock prices from the London Stock Exchange (LSE). for financial time series forecasting [37]. S&P 500 and LSE have one of the highest trading volumes and mar- • Random Forest (RF): a commonly-used ensemble method ket capitalization in the world. From a reproducibility viewpoint, [5]. we used data that were distributed online free from Dukascopy’s • Multi Head Attention (MHA): a deep learning algorithm for Online data feed1. For the S&P 500 data, we used daily data for all time series prediction [40]. the 500 stocks listed S&P 500 Stock Index in the period from May • Long- and Short-Term Networks (LSTNet): a deep-learning- 19, 2000 to May 19, 2020. For the LSE data, we used hourly data for based algorithm which combines Convolutional and Recur- all the 77 stocks prices available on Dukascopy in the period from rent Neural Network [27]. 2 September 07, 2016 to September 07, 2019. • State-Frequency Memory Recurrent Neural Networks (SFM) : a deep learning-based stock price prediction algorithm that 4.2 Evaluation Protocols incorporates the concept of Fourier Transform into Long Short-Term Memory [44]. 4.2.1 Time windows and execution lags. In our experiments, we • Symbolic Regression by Genetic Programming (GP): a pre- employed two practical constraints, time windows and execution dicton algorithm using genetic programming [33]. lags. In practice, we cannot use all the past observations of stock prices due to the time and space complexity. Also, we cannot trade To verify the effects of individual technical components in our stocks immediately after the observation of returns because we proposed method (TC), we also compared the following “ablation” need some time to make an inference by the model and execute models. the trade. Thus, it is reasonable to introduce a time window 푤 > 0 • Changing the Trader structure: TC linear only uses the and an execution lag 푙 > 0, so we train models using observations linear activation function 푥 ↦→ 푥, so the eventual prediction 풓푖:푆 [푡 − 푙 − 푤 : 푡 − 푙] and predict returns at time 푡 + 1. Throughout of a Campany becomes just a linear combination of several experiments, we used 푤 = 10 and 푙 = 1. binary operations. TC unary only uses the unary operator 푂 (푥,푦) = 푥. 4.2.2 Metrics. To evaluate the performances of prediction algo- • Changing the optimization algorithm: TC w/o educate does rithms, we adopted three metrics defined as follows. For each not execute the Educate Algorithm (Algorithm 2), so Traders metric, higher value is better. Let 푟 [푡] be the return of stock 푖 푖 can be discarded even if they have promising formulae. TC (푖 ∈ {1, . . . , 푆}) at time 푡, and let 푟ˆ [푡] be its prediction obtained 푖 w/o prune does not execute the pruning step in Algorithm 푏ˆ [푡] = (푟 [푡) from an arbitrary method. Recall 푖 sign ˆ푖 . 3, so a Company keeps poorly performing Traders. TC uni- • Accuracy (ACC) the accuracy rate of prediction of the rise modal uses the Gaussian distributions instead of the Gauss- and drop of stock prices. ACC푖 = P[sign(푟푖 [푡]) = 푟ˆ푖 [푡]]. ian mixtures in the generation step. TC MSE uses the mean • Annualized Return(AR): Given predictions of the returns squared loss as scores for educating and pruning, so it does Í푡 ˆ of stock푖, the cumulative return is given as퐶푖 [푡] := 푢=0 푏푖 [푢+ not use the cumulative returns at all. ] [ + ] 1 푟푖 푢 1 . We define the Annualized Return (AR) averaged Table 2 lists the hyper-parameters used in the baseline algo- × 푇Y 1 Í푆 [ ] over all stocks as AR := 100 푇 푆 푖=1 퐶푖 푇 , where 푇Y is rithms. the average number of periods contained in one year. 2There was an unintended data leak in the implementation published by the author. 1https://www.dukascopy.com/ Therefore, we fixed it in our experiment for fairness. Table 2: Hyper-parameters in Experiments Table 3: Performance comparison on US markets
Parameter Value Definition ACC(%) AR(%) SR CR 푀 {1, ··· , 10} (5) Market 55.09 -5.45 -0.2 -0.12 퐷 푗 , 퐹푗 {0, ··· , 10} (5) VAR 49.86 3.51 0.28 0.24 퐴푗 (푥){푥, tanh(푥), exp(푥), sign(푥), ReLU(푥)} (5) MHA 48.16 ± 1.1 0.66 ± 0.54 0.12 ± 0.10 0.11 ± 0.09 {푥 + 푦, 푥 − 푦, 푥푦, 푥, 푦, max(푥, 푦), LSTNet 47.51 ± 1.12 3.61 ± 2.09 0.18 ± 0.10 0.16 ± 0.1 푂 (푥, 푦) (5) 푗 min(푥, 푦), 푥 > 푦, 푥 < 푦, Corr(푥, 푦)} SFM 50.14 ± 1.09 5.95 ± 1.16 0.52 ± 0.10 0.50 ± 0.18 푁 100 Algorithm 1 GP 54.99 ± 1.15 -0.27 ± 0.67 -0.03 ± 0.10 -0.02 ± 0.08 Aggregate Linear regression Algorithm 1 RF 53.74 ± 1.11 4.38 ± 1.60 0.28 ± 0.11 0.25 ± 0.13 푄 0.5 Algorithm 3 TC linear 53.23 ± 1.18 2.81 ± 0.36 0.86 ± 0.10 0.93 ± 0.28 TC unary 51.14 ± 1.17 1.50 ± 0.33 0.46 ± 0.11 0.44 ± 0.17 TC w/o educate 52.36 ± 1.15 3.69 ± 0.32 1.12 ± 0.10 1.36 ± 0.38 TC w/o prune 50.31 ± 1.14 1.08 ± 0.33 0.34 ± 0.11 0.30 ± 0.13