JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
arXiv:2107.14033v1 [q-fin.ST] 22 Jul 2021

Temporal-Relational Hypergraph Tri-Attention Networks for Stock Trend Prediction

Chaoran Cui, Xiaojie Li, Juan Du, Chunyun Zhang, Xiushan Nie, Meng Wang, and Yilong Yin

This work was supported by the National Natural Science Foundation of China under Grant 62077033 and Grant 61876098, by the National Key R&D Program of China under Grant 2018YFC0830100 and Grant 2018YFC0830102, by Shandong Provincial Natural Science Foundation Key Project under Grant ZR2020KF015, and by the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions.
C. Cui, X. Li, and C. Zhang are with the School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China (e-mail: [email protected]; [email protected]; [email protected]).
J. Du is with the School of Finance, Shandong University of Finance and Economics, Jinan 250014, China (e-mail: [email protected]).
X. Nie is with the School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China (e-mail: [email protected]).
M. Wang is with the School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China (e-mail: [email protected]).
Y. Yin is with the School of Software, Shandong University, Jinan 250101, China (e-mail: [email protected]).

Abstract: Predicting the future price trends of stocks is a challenging yet intriguing problem given its critical role in helping investors make profitable decisions. In this paper, we present a collaborative temporal-relational modeling framework for end-to-end stock trend prediction. The temporal dynamics of stocks are first captured with an attention-based recurrent neural network. Then, different from existing studies relying on the pairwise correlations between stocks, we argue that stocks are naturally connected as a collective group, and introduce the hypergraph structures to jointly characterize the stock group-wise relationships of industry-belonging and fund-holding. A novel hypergraph tri-attention network (HGTAN) is proposed to augment the hypergraph convolutional networks with a hierarchical organization of intra-hyperedge, inter-hyperedge, and inter-hypergraph attention modules. In this manner, HGTAN adaptively determines the importance of nodes, hyperedges, and hypergraphs during the information propagation among stocks, so that the potential synergies between stock movements can be fully exploited. Extensive experiments on real-world data demonstrate the effectiveness of our approach. Also, the results of investment simulation show that our approach can achieve a more desirable risk-adjusted return. The data and code of our work have been released at https://github.com/lixiaojieff/HGTAN.

Index Terms: Stock trend prediction, stock investment simulation, hypergraph convolutional networks, triple attention mechanism.

I. INTRODUCTION

For a long time, the stock market has been one of the most important investment options for both individuals and institutions to chase wealth. As recently reported, the overall capitalization of major stock markets worldwide had exceeded 100 trillion U.S. dollars by the first quarter of 2021¹. Stock trend prediction, which aims to forecast the future price trends of stocks, has received increasing attention due to its potential in helping investors make profitable decisions. Although the famous efficient market hypothesis [1] holds a pessimistic view that the future price of a stock is unpredictable with respect to currently available information, continuous research works [2]–[4] on stock trend prediction have achieved impressive success in past decades, and provided strong evidence for the predictability of stock markets.

¹ http://www.businesskorea.co.kr/news/articleView.html?idxno=63985

A natural solution to stock trend prediction is to regard it as a time series modeling problem, for which the autoregressive model and its variants [5] were initially applied to fit the stock trends based on the historical price data. Afterwards, classic linear models including logistic regression and support vector machine (SVM) were frequently adopted as the predictive models [6]. However, the inherent non-linear and non-stationary nature of stock prices limits the applicability of these early techniques. With the rise of deep learning, recurrent neural networks (RNNs) [7], [8] and transformer networks [9] have shown promising results in stock trend prediction, owing to their powerful abilities to capture the underlying dynamics of the chaotic time series.

In another research line, the relationship information between stocks has proven to be highly valuable in improving stock trend prediction. Especially with the popularity of graph neural networks [10], [11], different stocks and their relationships are typically viewed as nodes and edges in a graph, and the influence between stocks is incorporated via the node representation learning applied on the graph. Despite the encouraging progress, existing studies [12], [13] make the predictions on future trends depending mainly on the correlations between pairs of stocks. But in fact, we argue that different stocks are naturally connected as a collective group rather than by pairwise interactions. For example, multiple stocks could belong to the same industry or be held by the same fund, and they may thus share common intrinsic properties [14], [15]. Fig. 1 displays the price volatility patterns of two such groups of stocks within a certain period of time. The stocks in each group exhibit approximately consistent price trends, and the phenomenon suggests the existence of the group-wise relationships among stocks. As a result, simply decomposing the group-wise relationships into pairwise ones may inevitably cause the loss of information.

[Fig. 1 plots: normalized closing price versus trading date, 10/08/2019 to 12/27/2019. (a) Daily closing prices of stocks belonging to the bank industry: Pudong Development Bank, Hua Xia Bank, China Construction Bank. (b) Daily closing prices of constituent stocks of a mutual fund: Haier Smart Home, China Jushi, Weichai Power, Jinjiang Hotel.]
Fig. 1: Price volatility patterns of two groups of stocks belonging to the same industry and held by the same fund in China's A-share market, respectively.

Motivated by the above discussions, in this paper, we introduce the hypergraph structures [16] to jointly characterize the group-wise relationships of industry-belonging and fund-holding among stocks. A hypergraph is a generalization of a simple graph, in which a hyperedge expresses a group-wise relationship that links multiple nodes simultaneously. Accordingly, the recently proposed hypergraph convolutional networks (HGCNs) [17], [18] could be easily used for stock representation learning, so that the group-wise relationship information is integrated in stock trend prediction [19]. However, due to the complexity of the influence process between stocks, HGCNs still face three main problems: 1) they treat the neighbors of a stock in a hyperedge equally, and ignore the subtle differences of their impacts on the target stock; 2) when a stock is associated with multiple hyperedges, how to choose proper hyperedge weights remains an open question; and 3) the industry-belonging and fund-holding relationships result in two heterogeneous stock hypergraphs, but it is difficult for HGCNs to effectively coordinate them.

To address the issues, we propose a hypergraph tri-attention network (HGTAN), which augments HGCNs with a triple attention mechanism. Specifically, HGTAN is equipped with the hierarchical intra-hyperedge, inter-hyperedge, and inter-hypergraph attention modules, which measure the importance of different nodes, hyperedges, and hypergraphs, respectively. In this way, HGTAN selectively aggregates the information from different sources, and fully exploits the potential synergies between stock movements. Note that our approach provides a collaborative temporal-relational modeling capacity for end-to-end stock trend prediction, because we deploy HGTAN to model the group-wise relationships among stocks after capturing the temporal dynamics of stocks with an attention-based gated recurrent unit (GRU) model.

Extensive experiments are carried out on real-world data collected from China's A-share market, and the results show the superiority of our approach over state-of-the-art methods for stock trend prediction. In addition, we simulate the stock investment using the trading strategies based on different methods, and the results show that our approach earns significantly higher returns with limited downside risk. Finally, detailed ablation studies are performed to investigate the efficacy of the key components in our approach.

In summary, the main contributions of our work are:
• We introduce the hypergraph structures to jointly characterize the group-wise relationships of industry-belonging and fund-holding for stock trend prediction.
• We propose a novel HGTAN consisting of hierarchical attention modules to consider the importance of different nodes, hyperedges, and hypergraphs when guiding the information propagation in stock hypergraphs.
• We conduct both experimental evaluation and investment simulation on real-world data, and the results demonstrate the validity and rationality of our approach.

The remainder of the paper is organized as follows. Section II reviews the related work. Section III details the proposed framework for stock trend prediction. Experimental setups are described in Section IV, and the results and analysis are reported in Section V. Section VI concludes our work and outlines the directions of future research.

II. RELATED WORK

In this section, we first review the existing literature on stock trend prediction. Then, we present a brief overview of the topic of hypergraph learning, which is closely related to our work.

A. Stock Trend Prediction

In the early stage, many statistical models such as autoregressive integrated moving average (ARIMA) [5] and Kalman filters [20] were widely adopted as solutions to stock trend prediction. Besides, some technical indicators were designed based on stocks' historical prices and volumes to provide insights about the future trends [15]. Machine learning techniques like logistic regression and SVM have also shown promise for stock trend prediction [6]. The major limitation of these research efforts is that they presume the input signals to be linear and stationary, regardless of the fact that the stock market is a highly volatile dynamic system. Meanwhile, they may have to manually extract useful features from the stock time series, which requires a considerable amount of domain knowledge and engineering skills.

With the surge of deep learning, RNNs have become a popular alternative to the traditional time series models for stock trend prediction. For example, Nelson et al. [21] used the long short-term memory (LSTM) network to predict the future trends of stocks based on the price history together with technical indicators. Akita et al. [22] converted newspaper articles into event representations and modeled the temporal effects of past events on the opening prices of multiple companies with the LSTM network. Zhang et al. [8] proposed a state frequency memory (SFM) network to decompose the hidden states of the LSTM memory cells into multiple components, each of which captures a particular frequency of latent trading pattern underlying the fluctuation of stock prices. Qin et al. [7] presented a dual-stage attention-based recurrent neural network (DARNN), which integrates two attention modules within an LSTM network to adaptively extract relevant input features at each time step and select relevant encoder hidden states across all time steps. Most recently, the transformer architecture [9], which relies solely on the self-attention mechanism to model temporal context information, has been reported to achieve remarkable results in stock trend prediction.

In real financial markets, stocks are broadly correlated with each other via a variety of relationships. It is natural to believe that the price change of a stock would be significantly affected by other related stocks. From this point, the relationship information between stocks has been extensively exploited in previous works [4]. Typically, the stock relationships are viewed as a graph, and some graph-based learning methods are applied to model stock representations more effectively. For example, Chen et al. [23] established a stock graph according to the shareholding information and adopted the graph convolutional networks (GCNs) to forecast the rising or falling of stock prices. Feng et al. [12] introduced a new component in neural network modeling, named temporal graph convolution (TGC), which handles the impact between different stocks by encoding stock relationships in a time-sensitive way. Kim et al. [13] proposed a hierarchical graph attention network for stock trend prediction (HATS), which selectively aggregates information on different relationship types to learn stock representations.

Despite the impressive progress, most existing works simply assume stocks to be correlated in a pairwise manner. However, as previously mentioned, stocks are usually connected to each other as a group, and decomposing the group-wise relationships into pairwise ones may result in the loss of information. In this paper, we directly characterize the group-wise relationships among stocks by hypergraph modeling.

B. Hypergraph Learning

A hypergraph is designed to describe the topological structure beyond pairwise interactions between nodes [24]. It has proven to be effective in a wide range of applications, including image classification [25], object segmentation [26], and multimodal learning [27]. For stock trend prediction, the hypergraph clustering algorithm was initially applied to partition all stocks into multiple sets, and the price trend of a stock was determined with the remaining stocks in the same set [28], [29].

In the era of deep learning, the application of graph neural networks to hypergraphs has received increasing attention. Feng et al. [17] were the first to perform spectral convolution on hypergraphs, and developed a hypergraph neural network framework. Yadati et al. [30] proposed a new way of training a GCN on hypergraphs, where a hypergraph is transformed into a simple graph and each hyperedge is represented by a simple edge whose weight is proportional to the maximum distance between two nodes in the hyperedge. Bai et al. [18] introduced two end-to-end trainable operators to the family of graph neural networks, i.e., hypergraph convolution and hypergraph attention, in order to deal with the non-pairwise relationships among nodes.

Building on the success of hypergraph learning, Sawhney et al. [19] proposed a spatiotemporal hypergraph convolutional network (STHGCN) to model the temporal evolution in stock prices and the industry-belonging relationships of stocks for predicting the future trends. However, STHGCN treats different nodes in the stock hypergraph equally, and simply mixes up all relationship information hidden in different hyperedges. By contrast, both the group-wise relationships of industry-belonging and fund-holding are leveraged in our work, and a triple attention mechanism is designed to hierarchically quantify the importance of nodes, hyperedges, and hypergraphs for better guiding the information propagation among stocks.

III. FRAMEWORK

In this section, we present a collaborative temporal-relational modeling framework for end-to-end stock trend prediction. Firstly, we formulate the problem of stock trend prediction. Then, we illustrate the process of temporal dynamics modeling for stocks. Finally, we detail the hypergraph tri-attention network (HGTAN) that models the group-wise relationships among stocks. Fig. 2 displays the overall architecture of our framework.

[Fig. 2 diagram: three stages. Temporal Dynamics Modeling: a GRU reads the price attributes $\mathbf{x}_1, \ldots, \mathbf{x}_m$, produces hidden states $\mathbf{h}_1, \ldots, \mathbf{h}_m$, and yields the stock embeddings $\mathbf{g}$. Group-wise Relationship Modeling: the Hypergraph Tri-Attention Network processes the industry-belonging and fund-holding hypergraphs, each with intra-hyperedge attention, max pooling, and inter-hyperedge attention, followed by inter-hypergraph attention with weights $\omega_i$ and $\omega_f$. Trend Prediction: a fully-connected layer outputs the stock movement direction.]
Fig. 2: The overall architecture of our framework.

A. Problem Formulation

To formulate our problem, we declare some notations in advance. Throughout this paper, we use calligraphic capital letters (e.g., $\mathcal{X}$), bold capital letters (e.g., $\mathbf{X}$), bold lowercase letters (e.g., $\mathbf{x}$), and non-bold letters (e.g., $x$) to represent sets, matrices, vectors, and scalars, respectively. If not clarified, all vectors are in column form. Table I summarizes some key notations and definitions used throughout this paper.

TABLE I: Summary of key notations and definitions.

| Notation | Definition |
| $\mathcal{S}$ | Set of stocks |
| $n$ | Number of stocks, i.e., $n = |\mathcal{S}|$ |
| $m$ | Size of lookback window |
| $s$ | Stock index, i.e., $s \in \mathcal{S}$ |
| $t$ | $t$-th trading day, i.e., $t \le m$ |
| $\mathcal{X}$ | Historical price records of stocks |
| $\mathbf{x}_{s,t}$ | Price attributes of $s$ at the $t$-th trading day |
| $\mathcal{G}_i = (\mathcal{S}, \mathcal{E}_i, \mathbf{w}_i)$ | Industry-belonging hypergraph |
| $\mathcal{G}_f = (\mathcal{S}, \mathcal{E}_f, \mathbf{w}_f)$ | Fund-holding hypergraph |
| $e$ | Hyperedge index, i.e., $e \in \mathcal{E}_i$ or $e \in \mathcal{E}_f$ |
| $w_{i,e}$, $w_{f,e}$ | Importance of $e$ in $\mathcal{G}_i$ and $\mathcal{G}_f$ |
| $\hat{\mathbf{y}}_s$, $\mathbf{y}_s$ | Predicted and ground-truth trends of $s$ |
| $\mathbf{h}_t$ | Hidden state at the $t$-th trading day |
| $\mathbf{g}_s$ | Temporal dynamics representation of $s$ |
| $\mathbf{r}_s$, $\tilde{\mathbf{r}}_s$ | Embedding of $s$ before and after the update of HGTAN |
| $\mathcal{H}_s$ | Subset of hyperedges containing $s$ |
| $\mathcal{N}_e$ | Subset of nodes forming hyperedge $e$ |
| $\mathbf{r}_s^e$ | Hyperedge-specific embedding of $s$ with respect to $e$ |
| $\mathbf{r}_s^i$, $\mathbf{r}_s^f$ | Hypergraph-specific embeddings of $s$ regarding $\mathcal{G}_i$ and $\mathcal{G}_f$ |

As it is rather difficult to estimate the exact price of a stock, we instead judge the stock price trend in the future. Given a set of $n$ stocks $\mathcal{S}$, we collect the historical price records over a lookback window of $m$ trading days for each stock, i.e., $\mathcal{X} = \{[\mathbf{x}_{s,1}, \mathbf{x}_{s,2}, \ldots, \mathbf{x}_{s,m}], s \in \mathcal{S}\}$, where $\mathbf{x}_{s,t}$ is a set of price attributes of stock $s$ at the $t$-th trading day.

Besides, we introduce two hypergraphs to model the group-wise relationships among stocks, i.e., the industry-belonging and fund-holding relationships, respectively. A hypergraph is an extension of a simple graph, in which a set of nodes is defined as a weighted hyperedge. Let $\mathcal{G}_i = (\mathcal{S}, \mathcal{E}_i, \mathbf{w}_i)$ be the industry-belonging hypergraph, where $\mathcal{S}$ is taken as the node set, $\mathcal{E}_i$ is the set of hyperedges connecting different stocks belonging to the same industry, and $\mathbf{w}_i$ is a weight vector with the element $w_{i,e}$ representing the importance of hyperedge $e \in \mathcal{E}_i$. In a similar way, we denote by $\mathcal{G}_f = (\mathcal{S}, \mathcal{E}_f, \mathbf{w}_f)$ the fund-holding hypergraph, which contains the hyperedges connecting the stocks held by the same fund.

In this study, we consider three movement directions of stock prices, i.e., the rising, falling, and steady trends. Based on the historical price records and the group-wise relationships among stocks, our goal is to learn a mapping function $[\hat{\mathbf{y}}_1, \hat{\mathbf{y}}_2, \ldots, \hat{\mathbf{y}}_n] = f(\mathcal{X}, \mathcal{G}_i, \mathcal{G}_f)$, where $\hat{\mathbf{y}}_s$ is the predicted probability distribution over the rising, falling, and steady trends of stock $s$ on the following trading day $m + 1$. The future movement direction is subsequently chosen as the one with the largest probability. Meanwhile, the ground-truth label $\mathbf{y}_s$ is given as a one-hot vector indicating the real movement direction of $s$. We use the cross entropy as the loss function to penalize the deviation of $\hat{\mathbf{y}}_s$ from $\mathbf{y}_s$, i.e.,

$$ l(\hat{\mathbf{y}}_s, \mathbf{y}_s) = -\sum_{c} y_{s,c} \ln \hat{y}_{s,c}. \tag{1} $$

The desired mapping function $f(\cdot)$ can be determined by minimizing the loss over all stocks across different lookback windows.

B. Temporal Dynamics Modeling

Due to the volatility dependencies over time in financial markets, the historical price records of a stock play a critical role in predicting its future trend. In this paper, we use the GRU model to capture the temporal dynamics of each stock from its time series price data. Compared to the vanilla RNN and its LSTM variant, the GRU model has not only the powerful ability of memorizing long-term information, but also a relatively simpler structure, fewer parameters, and faster training [31].

At the $t$-th trading day, we individually feed the price attributes $\mathbf{x}_{s,t}$ of stock $s$ to the GRU model. In the following, we shall drop the stock index $s$ for notational simplicity. Formally, the GRU model executes the calculations as follows:

$$
\begin{aligned}
\mathbf{z}_t &= \sigma\left(\mathbf{W}_z [\mathbf{h}_{t-1}, \mathbf{x}_t]\right), \\
\mathbf{u}_t &= \sigma\left(\mathbf{W}_u [\mathbf{h}_{t-1}, \mathbf{x}_t]\right), \\
\mathbf{c}_t &= \tanh\left(\mathbf{W}_c [\mathbf{z}_t \odot \mathbf{h}_{t-1}, \mathbf{x}_t]\right), \\
\mathbf{h}_t &= (1 - \mathbf{u}_t) \odot \mathbf{h}_{t-1} + \mathbf{u}_t \odot \mathbf{c}_t,
\end{aligned}
\tag{2}
$$

where $\mathbf{h}_{t-1}$ denotes a hidden state summarizing all past information up to the $(t-1)$-th trading day. $\mathbf{h}_{t-1}$ and $\mathbf{x}_t$ are firstly concatenated and transformed into a reset gate $\mathbf{z}_t$ and an update gate $\mathbf{u}_t$, respectively. The former determines how much of the past information to forget, while the latter controls how much of that needs to be brought into the current hidden state $\mathbf{h}_t$. Then, $\mathbf{h}_{t-1}$ is reset with $\mathbf{z}_t$, and it is again concatenated with $\mathbf{x}_t$ to generate a memory cell $\mathbf{c}_t$. $\mathbf{c}_t$ represents the new information to be added to $\mathbf{h}_t$. Finally, $\mathbf{h}_t$ is computed as the combination of $\mathbf{h}_{t-1}$ and $\mathbf{c}_t$, and $\mathbf{u}_t$ serves as a balance factor in this procedure. In Eq. (2), $\mathbf{W}_z$, $\mathbf{W}_u$, and $\mathbf{W}_c$ are the parameters to be learned, $\odot$ denotes the Hadamard product, and $\sigma(\cdot)$ and $\tanh(\cdot)$ denote the sigmoid and tanh activation functions, respectively.

The GRU model successively outputs the hidden states across all $m$ trading days, i.e., $\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_m$. However, the price volatility of stocks may not be sequentially dependent. For example, stock prices frequently exhibit periodic changes over long time intervals, a typical representative of which is the phenomenon of calendar effects [32]. Therefore, it is necessary to discriminate the importance of historical contexts at different moments for predicting the future trends of stocks. To this end, we further introduce a temporal attention layer to selectively emphasize informative hidden states of past trading days and suppress less useful ones [7]. Specifically, the attention weight $\gamma_t$ of the hidden state $\mathbf{h}_t$ is measured by examining how well it is compatible with a query reference. Based on the recency bias hypothesis [33] that the future trend of a stock has a stronger correlation with its recent volatility, we choose the latest hidden state $\mathbf{h}_m$ as the query reference. $\gamma_t$ is thus defined as follows:

$$ \gamma_t = \frac{\exp\left(s(\mathbf{h}_t, \mathbf{h}_m)\right)}{\sum_{j=1}^{m} \exp\left(s(\mathbf{h}_j, \mathbf{h}_m)\right)}, \tag{3} $$

where

$$ s(\mathbf{h}_t, \mathbf{h}_m) = (\mathbf{U}_k \mathbf{h}_t)^{T} (\mathbf{U}_q \mathbf{h}_m) \tag{4} $$

is a compatibility function that transforms $\mathbf{h}_t$ and $\mathbf{h}_m$ into a latent space, and computes their dot product in the space. We generate a unified embedding $\mathbf{g}$ to describe the global temporal dynamics of a stock, which is computed as the weighted sum of the transformed hidden states, i.e.,

$$ \mathbf{g} = \sum_{t=1}^{m} \gamma_t \mathbf{U}_v \mathbf{h}_t. \tag{5} $$

In the temporal attention layer, $\mathbf{U}_q$, $\mathbf{U}_k$, and $\mathbf{U}_v$ are three transformation matrices to be learned. Following the same steps, we can yield the temporal dynamics representations of all stocks, which are denoted as $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_n$.

C. Group-wise Relationship Modeling

In temporal dynamics modeling, different stocks are regarded as independent of each other. However, there exist many potential relationships between stocks, and the stocks with closer relationships are more likely to have similar price trends. More importantly, we notice that stocks are naturally connected as a collective group rather than by pairwise interactions. Therefore, as opposed to prior works [12], [13], [23] that depend on stock pairwise correlations, we leverage the hypergraph structures to characterize the inherent group-wise relationships among stocks, including the industry-belonging and fund-holding relationships.

Recently, hypergraph convolutional networks (HGCNs) [17], [30] have emerged for modeling hypergraph-structured data and achieved state-of-the-art performance. HGCNs define an information propagation rule in hypergraphs for data representation learning via a convolution operator [18]. Taking the industry-belonging hypergraph $\mathcal{G}_i = (\mathcal{S}, \mathcal{E}_i, \mathbf{w}_i)$ as an example, the convolution operator updates the embedding of stock $s \in \mathcal{S}$ by aggregating the information from its local neighbors in each hyperedge:

$$ \tilde{\mathbf{r}}_s = \delta\left(\sum_{e \in \mathcal{H}_s} w_{i,e} \sum_{u \in \mathcal{N}_e} \mathbf{P} \mathbf{r}_u\right), \tag{6} $$

where $\mathbf{r}_s$ and $\tilde{\mathbf{r}}_s$ are the embeddings of $s$ before and after the update, respectively. $\mathcal{H}_s \subseteq \mathcal{E}_i$ is the subset of hyperedges containing $s$, $\mathcal{N}_e \subseteq \mathcal{S}$ is the subset of nodes forming hyperedge $e$, and $\mathbf{P}$ is a projection matrix to be learned. $\delta(\cdot)$ denotes a nonlinear activation function like LeakyReLU [34]. In practice, the hyperedge weight $w_{i,e}$ is usually normalized to avoid numerical instabilities. We omit this step here for simplicity of expression.

By accepting the temporal dynamics representation of a stock as the input embedding (e.g., $\mathbf{r}_s = \mathbf{g}_s$), HGCNs can further integrate the group-wise relationship information in stock trend prediction [19]. But as explained above, due to the complexity of the influence process between stocks, it may be inappropriate to directly apply HGCNs to the stock hypergraphs. In this paper, we propose a hypergraph tri-attention network (HGTAN), which augments HGCNs with a triple attention mechanism [11]. In particular, HGTAN consists of intra-hyperedge, inter-hyperedge, and inter-hypergraph attention modules. Benefiting from the three hierarchically designed modules, HGTAN simultaneously takes account of the importance of different nodes, hyperedges, and hypergraphs, and adaptively determines the optimal way of information propagation in stock hypergraphs. In the following, we elaborate the attention module at each level in HGTAN.

Intra-Hyperedge Attention: The intra-hyperedge attention module aims to learn the importance of local neighbors of a stock within the same hyperedge. Using the same notations as given in Eq. (6), for stock $s$ and its neighbor $u$ in hyperedge $e$, we quantify the degree to which $s$ is close to $u$ by

$$ d(\mathbf{r}_s, \mathbf{r}_u) = \delta\left(\mathbf{a}_d^{T} [\mathbf{P} \mathbf{r}_s, \mathbf{P} \mathbf{r}_u]\right), \tag{7} $$

where $\mathbf{a}_d$ is a shared attention vector when computing the degree between any pair of stocks. The neighbor weight $\alpha_{su}$ is further computed indicating how $s$ should attend to $u$ in $e$:

$$ \alpha_{su} = \frac{\exp\left(d(\mathbf{r}_s, \mathbf{r}_u)\right)}{\sum_{v \in \mathcal{N}_e} \exp\left(d(\mathbf{r}_s, \mathbf{r}_v)\right)}. \tag{8} $$

In the intra-hyperedge attention module, the embedding of $s$ with respect to $e$ is updated via the weighted aggregation of the information provided by its neighbors:

$$ \mathbf{r}_s^e = \delta\left(\sum_{u \in \mathcal{N}_e} \alpha_{su} \mathbf{P} \mathbf{r}_u\right). \tag{9} $$

Inter-Hyperedge Attention: Intuitively, a stock may belong to multiple industries or be held by multiple funds at the same time. As a result, a node can be covered by multiple hyperedges in each stock hypergraph, and the hyperedge-specific node embedding as shown in Eq. (9) only reflects a partial view of the node. To gain a more comprehensive picture, we develop the inter-hyperedge attention module to further fuse all hyperedge-specific embeddings of an individual stock. As before, let $\mathcal{H}_s$ denote the subset of hyperedges containing $s$ in $\mathcal{G}_i$. For hyperedge $e \in \mathcal{H}_s$, the module generates the hyperedge embedding $\mathbf{q}_e$ for $e$ by

$$ \mathbf{q}_e = \mathrm{pool}\left(\{\mathbf{r}_s^e, s \in \mathcal{N}_e\}\right), \tag{10} $$

where $\mathrm{pool}(\cdot)$ denotes the element-wise max-pooling operation. In graph theory [35], the importance of $e$ can be measured by its closeness centrality, i.e., how close it is to all other hyperedges:

$$ c(\mathbf{q}_e) = \sum_{b \in \mathcal{H}_s} \delta\left(\mathbf{a}_c^{T} [\mathbf{Q} \mathbf{q}_e, \mathbf{Q} \mathbf{q}_b]\right), \tag{11} $$

where $\mathbf{a}_c$ and $\mathbf{Q}$ are the trainable attention vector and projection matrix in the inter-hyperedge attention module, respectively. Similar to Eq. (8), the normalized version of $c(\mathbf{q}_e)$ is used to indicate the weight of $e$ regarding $s$:

$$ \beta_e = \frac{\exp\left(c(\mathbf{q}_e)\right)}{\sum_{b \in \mathcal{H}_s} \exp\left(c(\mathbf{q}_b)\right)}. \tag{12} $$

Furthermore, all hyperedge-specific embeddings of $s$ are aggregated to globally characterize $s$ in $\mathcal{G}_i$:

$$ \mathbf{r}_s^i = \sum_{e \in \mathcal{H}_s} \beta_e \mathbf{r}_s^e. \tag{13} $$

Analogously, the overall representation of $s$ in the fund-holding hypergraph $\mathcal{G}_f$ can be obtained and denoted as $\mathbf{r}_s^f$.

Inter-Hypergraph Attention: For stock $s$, the inter-hypergraph attention module combines its dual representations $\mathbf{r}_s^i$ and $\mathbf{r}_s^f$ obtained from the heterogeneous industry-belonging and fund-holding hypergraphs, respectively. Here, the attention mechanism is applied to balance the trade-off between $\mathbf{r}_s^i$ and $\mathbf{r}_s^f$ in the combination. $\mathbf{r}_s^i$ and $\mathbf{r}_s^f$ are firstly compared with each other by

$$
\begin{aligned}
o(\mathbf{r}_s^i) &= \delta\left(\mathbf{a}_o^{T} [\mathbf{V} \mathbf{r}_s^i, \mathbf{V} \mathbf{r}_s^f]\right), \\
o(\mathbf{r}_s^f) &= \delta\left(\mathbf{a}_o^{T} [\mathbf{V} \mathbf{r}_s^f, \mathbf{V} \mathbf{r}_s^i]\right),
\end{aligned}
\tag{14}
$$

where $\mathbf{a}_o$ and $\mathbf{V}$ are the model parameters in the hypergraph-level attention. The relative weights of $\mathbf{r}_s^i$ and $\mathbf{r}_s^f$ in the combination are then computed by

$$
\begin{aligned}
\omega_i &= \frac{\exp\left(o(\mathbf{r}_s^i)\right)}{\exp\left(o(\mathbf{r}_s^i)\right) + \exp\left(o(\mathbf{r}_s^f)\right)}, \\
\omega_f &= \frac{\exp\left(o(\mathbf{r}_s^f)\right)}{\exp\left(o(\mathbf{r}_s^i)\right) + \exp\left(o(\mathbf{r}_s^f)\right)}.
\end{aligned}
\tag{15}
$$

Finally, HGTAN completes the update of the embedding of $s$ by

$$ \tilde{\mathbf{r}}_s = \omega_i \mathbf{r}_s^i + \omega_f \mathbf{r}_s^f, \tag{16} $$

and predicts the probability distribution $\hat{\mathbf{y}}_s$ over the future movement directions of $s$ by feeding $\tilde{\mathbf{r}}_s$ into a fully-connected layer with softmax activations. The entire update process of HGTAN is shown in Algorithm 1.

Algorithm 1: The update process of HGTAN.
Input: Industry-belonging hypergraph $\mathcal{G}_i = (\mathcal{S}, \mathcal{E}_i, \mathbf{w}_i)$, fund-holding hypergraph $\mathcal{G}_f = (\mathcal{S}, \mathcal{E}_f, \mathbf{w}_f)$, temporal dynamics representations of stocks $\{\mathbf{g}_s, \forall s \in \mathcal{S}\}$.
Output: Final embeddings of stocks $\{\tilde{\mathbf{r}}_s, \forall s \in \mathcal{S}\}$.
for $s \in \mathcal{S}$ do
    $\mathbf{r}_s = \mathbf{g}_s$;
    for $\mathcal{G} \in \{\mathcal{G}_i, \mathcal{G}_f\}$ do
        Find the subset of hyperedges $\mathcal{H}_s \subseteq \mathcal{E}$ containing $s$;
        for $e \in \mathcal{H}_s$ do
            Find the subset of nodes $\mathcal{N}_e \subseteq \mathcal{S}$ forming $e$;
            for $u \in \mathcal{N}_e$ do
                Compute the neighbor weight $\alpha_{su}$; // Eq. (8)
            end
            Generate the hyperedge-specific stock embedding $\mathbf{r}_s^e$; // Eq. (9)
        end
        for $e \in \mathcal{H}_s$ do
            Generate the hyperedge embedding $\mathbf{q}_e$; // Eq. (10)
            Compute the hyperedge weight $\beta_e$; // Eq. (12)
        end
        Generate the hypergraph-specific stock embedding $\mathbf{r}_s^i$ or $\mathbf{r}_s^f$; // Eq. (13)
    end
    Compute the hypergraph weights $\omega_i$ and $\omega_f$; // Eq. (15)
    Generate the final stock embedding $\tilde{\mathbf{r}}_s$; // Eq. (16)
end
return $\tilde{\mathbf{r}}_s$;

TABLE II: Main statistics of the dataset.

| Statistics | Value |
| Number of stocks | 758 |
| Number of training days | 1021 |
| Number of validation days | 340 |
| Number of testing days | 331 |
| Number of industry-belonging relationships | 104 |
| Number of fund-holding relationships | 61 |
| Percentage of rising days | 38.0% |
| Percentage of falling days | 37.3% |
| Percentage of steady days | 24.7% |

IV. EXPERIMENTAL SETUP AND IMPLEMENTATION

In this section, we describe the experimental setup and implementation details for our performance evaluation.
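As an implementation-oriented illustration of the method in Section III-C, the intra-hyperedge and inter-hyperedge attention steps (Eqs. (7)-(13)) can be sketched in plain Python for a single hypergraph. This is a minimal sketch and not the released implementation: the helper names, the LeakyReLU slope of 0.2, the identity projection matrices, the attention vectors, and the three toy stocks are all assumptions chosen only to make the example runnable.

```python
import math

def leaky_relu(x, slope=0.2):
    # Nonlinearity delta; the slope value is an assumption.
    return x if x > 0 else slope * x

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def intra_hyperedge(s, members, r, P, a_d):
    """Hyperedge-specific embedding of stock s, following Eqs. (7)-(9)."""
    proj = {u: matvec(P, r[u]) for u in members}
    # Eq. (7): score on the concatenation [P r_s, P r_u] (list + list concatenates).
    scores = [leaky_relu(dot(a_d, proj[s] + proj[u])) for u in members]
    alpha = softmax(scores)  # Eq. (8)
    dim = len(proj[s])
    agg = [sum(a * proj[u][k] for a, u in zip(alpha, members)) for k in range(dim)]
    return [leaky_relu(x) for x in agg], alpha  # Eq. (9)

def inter_hyperedge(s, hyperedges, r, P, a_d, Q, a_c):
    """Hypergraph-specific embedding of stock s, following Eqs. (10)-(13)."""
    own = [e for e in hyperedges if s in e]  # H_s
    r_e, q = [], []
    for e in own:
        embs = [intra_hyperedge(u, e, r, P, a_d)[0] for u in e]
        r_e.append(embs[e.index(s)])
        q.append([max(col) for col in zip(*embs)])  # Eq. (10): element-wise max pooling
    Qq = [matvec(Q, v) for v in q]
    # Eq. (11): closeness of each hyperedge to all hyperedges in H_s.
    c = [sum(leaky_relu(dot(a_c, qe + qb)) for qb in Qq) for qe in Qq]
    beta = softmax(c)  # Eq. (12)
    dim = len(r_e[0])
    fused = [sum(b * v[k] for b, v in zip(beta, r_e)) for k in range(dim)]  # Eq. (13)
    return fused, beta

# Toy setup (hypothetical values): three stocks and two hyperedges.
r = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
I2 = [[1.0, 0.0], [0.0, 1.0]]
a_d = [0.5, -0.5, 0.25, 0.25]
a_c = [0.1, 0.2, 0.3, 0.4]
hyperedges = [("A", "B"), ("A", "B", "C")]
emb_A, beta_A = inter_hyperedge("A", hyperedges, r, I2, a_d, I2, a_c)
```

Running the same function on the second hypergraph and mixing the two results with the weights of Eq. (15) would complete the update of Eq. (16). A practical implementation would batch these operations as tensor computations rather than looping over stocks.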

A. Data Collection

Several public datasets [8], [36] for stock trend prediction have recently emerged, but most of them lack stock relational data, especially the group-wise relationships required in this study. We therefore used a financial data API² to collect the historical price and relational data of stocks from China's A-share market. Specifically, we chose the stocks that have price records between 01/04/2013 and 12/31/2019, obtaining 2,433 stocks. Similar to the setting in [12], we further performed a filtering step to eliminate those stocks that were traded on less than 98% of all trading days. This finally results in 758 stocks. The main statistics of the dataset are summarized in Table II.

For each stock, we extracted six price attributes per day, including the open price, high price, low price, close price, trading volume, and trading value. If a stock experienced a temporary trading suspension, we used the price attributes of the last day before the suspension as an alternative. Additionally, we introduced four technical indicators, i.e., 5-, 10-, 20-, and 30-day moving averages, to capture the past weekly and monthly trends. Note that each attribute of a stock was separately normalized by dividing by its maximum value over the entire trading horizon. In our experiments, the historical price data were chronologically split into three time periods in the ratio of 6:2:2 for training, validation, and testing, respectively.

We considered the industry-belonging and fund-holding relationships of stocks. For the former, we grouped all stocks into 104 industry categories according to the Shenwan Industry Classification Standard³. For the latter, we selected 61 mutual funds established before 2013 in the A-share market, and acquired the constituent stocks of each fund from the quarterly portfolio reports.

² https://tushare.pro/document/2
³ http://www.swsindex.com/idx0530.aspx

B. Evaluation Methodology and Metric

In our study, the stock trend for the next trading day is defined as one of the directions of rising (+1), falling (−1), and steady (0). The ground-truth label is determined based on the change ratio of the closing price, i.e.,

$$y = \begin{cases} +1, & \text{if } \dfrac{p_{t+1} - p_t}{p_t} \geq \xi_{rising}; \\ -1, & \text{if } \dfrac{p_{t+1} - p_t}{p_t} \leq \xi_{falling}; \\ 0, & \text{otherwise}, \end{cases} \tag{17}$$

where $p_t$ and $p_{t+1}$ are the closing prices at the $t$-th and $(t+1)$-th trading days, respectively, and $\xi_{rising}$ and $\xi_{falling}$ are two thresholds. To balance the number of samples in different categories, we set $\xi_{rising} = 0.55\%$ and $\xi_{falling} = -0.50\%$ in line with the previous works [9], [37]. The daily average percentages of stocks with rising, falling, and steady trends are listed in Table II.

We evaluated the algorithm performance in terms of the classification accuracy, precision, recall, and $F_1$ score on stock trends. These metrics can be calculated according to the number of predictions correctly made for positive classes (tp) and negative classes (tn), as well as that of predictions made falsely for both classes (fp and fn), i.e.,

$$\begin{aligned} \text{accuracy} &= \frac{tp + tn}{tp + tn + fp + fn}, \\ \text{precision} &= \frac{tp}{tp + fp}, \\ \text{recall} &= \frac{tp}{tp + fn}, \\ F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}. \end{aligned} \tag{18}$$

C. Investment Simulation

The classification performance only reflects the ability of models to forecast future stock trends. However, what ultimately matters in stock markets is the profitability. To test whether the predictions made by a model can make a profit, we set up a back-testing by simulating the stock investment in China's A-share market. In particular, the model is regarded as a market-timing tool [38], which generates the trading signals of buying, holding, or selling stocks based on its predictions about future trends. At the beginning of the back-testing, we evenly allocate the investment budget to each stock. The trading strategy is executed as follows: if the model predicts that a stock is expected to have a rising trend the next day, the trader will invest in the stock at the closing price. After a purchase, if the model predicts that the stock price continues to rise or keeps steady, the trader will hold the stock on that day; on the other hand, if the model predicts that the stock may show a falling trend, the trader will sell it at the closing price. In the back-testing, we only take long positions (a.k.a. buy-hold-sell) on stocks and ignore short positions (a.k.a. borrow-sell-buy).

The back-testing was conducted on the trading days covered by the test set, i.e., from 08/22/2018 to 12/31/2019. At the end of the back-testing, we counted the cumulative investment return rate (IRR), which is calculated by summing over the return rates of all stocks. Note that most previous works [12], [14], [15] performed back-testing without consideration of transaction costs. However, as widely reported in the literature [39], [40], many trading strategies fail to yield the excess return once transaction costs are included. Especially as trading frequency increases, the effect of transaction costs may outweigh the profitability of trading strategies. In our back-testing, we take into account a transaction cost of 0.03% when calculating the investment return rate, which is in accordance with the stock market practice in China. As a result, our back-testing provides a more realistic evaluation of the profitability of trading strategies.

In addition, we care about the risk exposure of trading strategies, which can be assessed by the metrics of maximum drawdown (MDD) and Sharpe ratio (SR) [41]. The maximum drawdown is the maximum observed loss from a peak of a trading strategy during the back-testing. Obviously, a smaller maximum drawdown indicates a lower downside risk of the trading strategy. The Sharpe ratio helps investors understand the return of a trading strategy relative to its risk. It is defined as the ratio of the average return earned in excess of the risk-free rate to the volatility of excess return, i.e.,

$$SR = \frac{IRR_a - R_f}{\sigma}, \tag{19}$$

where $IRR_a$ denotes the annual investment return rate, $R_f$ denotes the risk-free rate, and $\sigma$ denotes the standard deviation of excess return during the back-testing. In this case, we set the risk-free rate to the interest rate of one-year time deposits announced by the People's Bank of China in 2019⁴, i.e., $R_f = 1.5\%$. Intuitively, the greater the value of the Sharpe ratio, the more attractive the risk-adjusted return [42].

D. Baseline

We compared our proposed HGTAN against different algorithms for stock trend prediction. Specifically, we first chose two traditional stock technical indicators as baselines:
• MOM [43]: The momentum indicator suggests that the future direction of stock price is consistent with that in the recent period.
• MR [37]: The mean reversion indicator predicts that the stock price will move from its current level back toward its past average price.

Three deep sequence models were further introduced to the comparison:
• LSTM [21]: This method is the LSTM model that sequentially accepts the time series price data to make a prediction on future stock trend.
• DARNN [7]: This method utilizes a dual-stage attention-based recurrent neural network, which adaptively extracts relevant input features at each time step and selects relevant encoder hidden states across all time steps.
• SFM [8]: This method extends LSTM by decomposing the hidden memory states into multiple components, and models the latent trading patterns with multiple frequencies to predict the trend of stock prices.

Besides, we experimented with several recently proposed algorithms that are also based on stock relationships:
• GCN [23]: This method uses an LSTM network to encode the historical price data of stocks, and the results are then fed into a GCN to learn based on the relationships between stocks.
• TGC [12]: This method devises a new component of neural network modeling, named temporal graph convolution, which generates the relational embeddings of stocks in a time-sensitive way.
• HATS [13]: This method presents a hierarchical graph attention network that selectively aggregates different types of relational data to learn stock representations.
• STHGCN [19]: This method models the industry-belonging relationships of stocks via a hypergraph, and introduces a gated temporal convolution to capture the temporal dependencies in stock price features.

E. Implementation Details

We implemented all methods evaluated in this study with PyTorch⁵ except SFM, for which the original Keras implementation⁶ was directly used. For the sake of reproducibility, the data and codes of our work have been released at https://github.com/lixiaojieff/HGTAN.

In our implementation, the price data of stocks at each trading day is embedded in a 16-dimensional space before being fed into different methods. For baseline methods, we determined the optimal hyperparameters either following the suggested settings in the original papers or based on the classification accuracy obtained on the validation set. In our model, all hyperparameters were also optimized with the validation set. Specifically, we chose the number of hidden layers and the number of cells per layer in the GRU model to be 2 and 32, respectively. The size of stock embeddings before and after the update of HGTAN was set to 16 and 8, respectively. Our network was trained with the mini-batch Adam optimizer [44]. We set the batch size to 64. The initial learning rate was $10^{-3}$ for all layers, and the maximum number of epochs was 600 during training.

V. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we report a series of experimental results to validate the effectiveness of the proposed HGTAN method. Note that for each method compared in our experiments, we repeat the training and testing procedures five times, and report the average performance to alleviate the fluctuations caused by random initializations. Through these experiments, we try to address the following research questions:
• RQ1: Does our method achieve superior performance in stock trend prediction?
• RQ2: Does our method help investors earn excess returns from stock investment in real market?
• RQ3: Does our method work better with the group-wise relationships of industry-belonging and fund-holding among stocks?
• RQ4: Does our method benefit from the triple attention mechanism in hypergraph modeling?

A. Classification Performance

For stock trend prediction, we studied how different methods perform when considering different lengths of price records, including the past 5 trading days, 10 trading days, and 20 trading days (i.e., m = 5, 10, and 20 in Eq. (5)). The settings simulate the scenario that we predict the future stock price trend depending on the historical data from the past week, two weeks, or month. Table III lists the performance comparison between different methods, from which we can make the following observations:
• As a traditional technical indicator, MR performs acceptably relative to the more complex deep learning based models. For example, in the case of 5-trading day records, MR achieves the best result in

⁴ https://cn.investing.com/economic-calendar/pboc-deposit-rate-1082
⁵ https://pytorch.org
⁶ https://github.com/z331565360/State-Frequency-Memory-stock-prediction

TABLE III: Classification performance of different methods with the transaction records over different numbers of past trading days.

| Method | 5d Accuracy | 5d Precision | 5d Recall | 5d F1 | 10d Accuracy | 10d Precision | 10d Recall | 10d F1 | 20d Accuracy | 20d Precision | 20d Recall | 20d F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MOM [43] | 34.52% | 34.79% | 31.88% | 33.27% | 34.50% | 34.94% | 32.07% | 33.44% | 35.73% | 35.19% | 32.82% | 33.96% |
| MR [37] | 35.59% | 39.37% | 33.77% | 36.36% | 34.73% | 29.34% | 31.79% | 30.52% | 35.32% | 38.03% | 33.60% | 35.68% |
| LSTM [21] | 34.92% | 35.34% | 33.91% | 34.27% | 35.09% | 38.09% | 34.37% | 35.90% | 35.03% | 36.43% | 34.23% | 35.20% |
| DARNN [7] | 37.68% | 37.81% | 35.17% | 36.43% | 38.89% | 38.59% | 35.22% | 36.82% | 38.41% | 37.99% | 39.24% | 38.60% |
| SFM [8] | 33.29% | 26.83% | 33.35% | 29.52% | 34.95% | 24.82% | 33.34% | 28.22% | 34.54% | 26.93% | 33.32% | 29.49% |
| GCN [23] | 37.24% | 37.23% | 33.54% | 35.22% | 37.44% | 39.07% | 34.49% | 36.62% | 37.30% | 39.28% | 34.16% | 36.54% |
| TGC [12] | 37.43% | 38.28% | 34.05% | 36.01% | 38.42% | 39.35% | 35.72% | 37.44% | 37.81% | 36.96% | 34.49% | 35.67% |
| HATS [13] | 38.74% | 36.92% | 34.29% | 35.52% | 38.05% | 39.23% | 34.52% | 36.67% | 38.85% | 38.70% | 35.06% | 36.78% |
| STHGCN [19] | 38.53% | 37.35% | 34.65% | 35.89% | 38.81% | 36.57% | 35.11% | 35.75% | 38.45% | 37.22% | 32.82% | 34.87% |
| HGTAN | 39.51% | 38.90% | 36.96% | 37.89% | 39.83% | 41.72% | 37.32% | 39.37% | 40.02% | 41.77% | 39.03% | 40.32% |

Column prefixes 5d, 10d, and 20d denote records over the past 5, 10, and 20 trading days.

The best result in terms of each metric is indicated in bold, and the second best one is underlined. This convention is also adopted in the following tables.
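As an illustration only, the labeling rule of Eq. (17) and the metrics of Eq. (18) that underlie Table III can be sketched in Python. The function names and the one-vs-rest treatment of the three trend classes are assumptions of this sketch; the text does not state how tp, tn, fp, and fn are aggregated across the rising, falling, and steady classes.

```python
from typing import Dict, Sequence

XI_RISING = 0.0055    # +0.55%, threshold stated in the text
XI_FALLING = -0.0050  # -0.50%, threshold stated in the text


def trend_label(p_t: float, p_next: float) -> int:
    """Eq. (17): map the closing-price change ratio to +1, -1, or 0."""
    change = (p_next - p_t) / p_t
    if change >= XI_RISING:
        return 1
    if change <= XI_FALLING:
        return -1
    return 0


def one_vs_rest_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int
) -> Dict[str, float]:
    """Eq. (18) with one class treated as positive (assumed aggregation).

    Expects non-empty label sequences of equal length.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Averaging the per-class results over the three classes would be one possible way to obtain a single score per metric; whether the reported numbers were produced this way is not stated in the text.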

terms of precision. The observation is in accordance with previous financial empirical studies [45], [46], which have proven that simple forms of technical analysis contain impressive forecasting power, especially with the short-term technical indicators.
• As deep sequence models, LSTM and SFM considerably fall behind those stock relationship based contenders, including GCN, TGC, HATS, STHGCN, and HGTAN. This highlights the importance of relationship information for stock trend prediction. On the other hand, DARNN frequently achieves competitive performance merely using the historical price data. In contrast to LSTM and SFM, DARNN additionally incorporates the attention mechanism to adaptively select relevant time series features. The performance gap between them suggests the benefit of introducing the attention mechanism to capture stock temporal dynamics.
• The proposed HGTAN consistently outperforms the other competitors, leading to the best or runner-up performance in all cases. More precisely, it exceeds the second place by an average of nearly 1.0 and 1.70 percentage points in terms of accuracy and F1 score, respectively. The results clearly demonstrate the effectiveness of HGTAN for stock trend prediction, and provide the evidence that the research question RQ1 can be positively answered.
• HGTAN offers a steady improvement in performance with the increase of the lookback window of past trading days. For instance, the F1 score obtained by HGTAN gradually rises from 37.89% in the case of 5-trading day records to 39.37% and 40.32% in those of 10- and 20-trading day records, respectively. It should be noted that such a phenomenon is not observed for most of the other methods, which could experience a performance degradation when considering a longer period of records. Therefore, we believe that HGTAN has a higher capability of making full use of historical price data.

B. Profitability

TABLE IV: Profitability of different methods during the back-testing.

| Method | IRR | MDD | SR |
|---|---|---|---|
| Buy-and-Hold | 3.84% | 19.57% | 0.076 |
| MOM [43] | 4.01% | 8.74% | 0.179 |
| MR [37] | 4.89% | 18.13% | 0.123 |
| LSTM [21] | 4.73% | 13.87% | 0.147 |
| DARNN [7] | 3.23% | 12.34% | 0.083 |
| SFM [8] | 5.85% | 5.26% | 0.513 |
| GCN [23] | 6.51% | 14.44% | 0.217 |
| TGC [12] | 8.23% | 8.60% | 0.513 |
| HATS [13] | 11.55% | 7.31% | 0.697 |
| STHGCN [19] | 6.23% | 10.43% | 0.248 |
| HGTAN | 25.17% | 4.23% | 1.792 |

Table IV reports the profitability comparison of different methods during the back-testing. To observe the overall volatility of the stock market, a passive buy-and-hold trading strategy is also included as a benchmark for the comparison. In the buy-and-hold strategy, investors are assumed to purchase all stocks at the beginning of the back-testing, and hold them until they are sold at the end of the back-testing. For a more intuitive understanding, Fig. 3 plots the fluctuation curves of different methods regarding IRR during the back-testing. From the results, we can see that:
• Most active trading strategies generated with different methods beat the passive buy-and-hold strategy, indicating that both technical indicator based and machine learning based models are capable of providing profitable trading signals to some extent. The only exception is DARNN, which yields a 3.23% return rate, slightly inferior to the 3.84% of the buy-and-hold strategy. This is perhaps surprising given the relatively good performance of DARNN for stock trend prediction as shown in Table III. Therefore, it cannot be definitively concluded that the trend prediction ability and profitability of models are equivalent for stock investment.
• The proposed HGTAN earns significantly higher returns than the other methods with a cumulative rate of 25.17%. Notably, it achieves more than twice the return obtained by the second-best method, and more than six times that of the buy-and-hold strategy. On the other hand, HGTAN shows a stronger power of risk management, reaching the lowest

[Figure 3: line chart. y-axis: Investment Return Rate (−15% to 30%); x-axis: Trading Date (08/31/2018 to 10/31/2019); curves: Buy-and-Hold, MOM, MR, LSTM, DARNN, SFM, GCN, TGC, HATS, STHGCN, HGTAN.]

Fig. 3: The fluctuation curves of different methods regarding the cumulative investment return rate during the back-testing.
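To make the quantities behind Fig. 3 and Table IV concrete, the following Python sketch simulates the long-only buy-hold-sell rule for a single stock and computes the maximum drawdown and the Sharpe ratio of Eq. (19). It is a simplified illustration under stated assumptions, not the released implementation: the function names are hypothetical, the 0.03% cost is assumed to be charged proportionally on both the buy and the sell, and the annualization of the return and of σ is left to the caller because the text does not specify it.

```python
from typing import List, Sequence


def backtest_one_stock(
    closes: Sequence[float], signals: Sequence[int], cost: float = 0.0003
) -> List[float]:
    """Equity curve of one unit of budget traded at closing prices.

    signals[t] is the trend predicted on day t for the next day:
    +1 buys (if flat), -1 sells (if holding), 0 keeps the current position.
    """
    cash, shares = 1.0, 0.0
    curve = []
    for price, signal in zip(closes, signals):
        if shares == 0.0 and signal == 1:
            shares = cash * (1.0 - cost) / price  # buy at the close, pay cost
            cash = 0.0
        elif shares > 0.0 and signal == -1:
            cash = shares * price * (1.0 - cost)  # sell at the close, pay cost
            shares = 0.0
        curve.append(cash + shares * price)
    return curve


def max_drawdown(curve: Sequence[float]) -> float:
    """Largest relative loss from a running peak of the equity curve."""
    peak, worst = curve[0], 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(annual_return: float, risk_free: float, sigma: float) -> float:
    """Eq. (19): excess annual return divided by the volatility sigma."""
    return (annual_return - risk_free) / sigma
```

With an even budget allocation, a portfolio-level curve could be obtained by averaging the per-stock curves day by day; this aggregation is likewise an assumption of the sketch.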

TABLE V: Accuracy of stock trend prediction of different methods per day when the market is in a bear period from 04/23/2018 to 05/10/2018.

| Method | 04/23 | 04/24 | 04/25 | 04/26 | 04/29 | 04/30 | 05/06 | 05/07 | 05/08 | 05/09 |
|---|---|---|---|---|---|---|---|---|---|---|
| MOM [43] | 43.27% | 36.94% | 48.02% | 38.39% | 44.46% | 37.86% | 48.29% | 40.90% | 37.60% | 36.41% |
| MR [37] | 16.23% | 52.77% | 9.76% | 17.68% | 19.13% | 64.78% | 7.12% | 73.48% | 25.33% | 17.81% |
| LSTM [21] | 24.80% | 46.44% | 14.38% | 22.43% | 22.03% | 66.49% | 8.31% | 69.92% | 30.87% | 20.32% |
| DARNN [7] | 35.36% | 43.27% | 20.19% | 30.74% | 24.14% | 61.87% | 14.12% | 57.26% | 30.61% | 25.86% |
| SFM [8] | 20.05% | 30.87% | 14.38% | 56.86% | 62.27% | 22.56% | 52.59% | 22.03% | 48.68% | 53.30% |
| GCN [23] | 22.43% | 48.02% | 15.44% | 21.50% | 23.22% | 63.72% | 11.35% | 71.11% | 28.36% | 18.73% |
| TGC [12] | 69.39% | 21.24% | 77.97% | 43.40% | 65.57% | 65.44% | 15.17% | 15.30% | 48.15% | 22.69% |
| HATS [13] | 37.86% | 38.26% | 31.40% | 38.13% | 43.67% | 50.53% | 48.55% | 50.26% | 44.59% | 33.25% |
| STHGCN [19] | 21.24% | 27.44% | 15.04% | 23.88% | 14.78% | 66.62% | 12.67% | 75.20% | 26.65% | 19.26% |
| HGTAN | 57.78% | 22.03% | 88.26% | 60.16% | 72.43% | 16.62% | 76.52% | 31.27% | 44.99% | 56.07% |

maximum drawdown among all competitors. It is worthwhile to point out that all the other methods fail to yield a Sharpe ratio above one, implying that their returns are not high enough to compensate for the risk [47]. By contrast, HGTAN results in a more desirable risk-adjusted return with a Sharpe ratio of 1.792. In other words, HGTAN is able to provide more return under the same risk. As a result, we can give a positive answer to the research question RQ2.
• By analyzing the investment return curves of different methods in Fig. 3, we notice that HGTAN produces approximately stable and continuous positive returns throughout the back-testing. Particularly, the advantage of HGTAN over the others mainly lies in its superior performance when the stock market is in a downtrend. Take the period from 04/23/2018 to 05/10/2018 as an example, for which Fig. 4 shows the number of stocks with rising, falling, and steady trends per trading day. As can be seen, the market is in a bear stage at that time, and there are far more stocks going down than those going up or keeping steady. Meanwhile, Table V lists the accuracy of predicting stock trends of different methods per day in the same period. Clearly, HGTAN provides more accurate predictions in the bear stage, so that the selling decisions can be made in time to avoid losses.

[Figure 4: bar chart. y-axis: Number of Stocks (0 to 800); x-axis: Trading Date (04/23 to 05/09); series: Rising, Falling, Steady.]

Fig. 4: Number of stocks with different trends per day when the market is in a bear period from 04/23/2018 to 05/10/2018.

C. Impact of Group-wise Relationships

In our study, we characterize the group-wise relationships of industry-belonging and fund-holding among stocks via the hypergraph modeling. To verify the effectiveness of this

TABLE VI: Classification performance comparison between GCN and HGCN with different kinds of relationship information.

| Method | 5d Accuracy | 5d Precision | 5d Recall | 5d F1 | 10d Accuracy | 10d Precision | 10d Recall | 10d F1 | 20d Accuracy | 20d Precision | 20d Recall | 20d F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GCN-I | 37.45% | 35.63% | 34.79% | 35.19% | 37.21% | 35.23% | 33.96% | 34.55% | 37.88% | 36.74% | 35.66% | 36.19% |
| GCN-F | 37.63% | 33.85% | 34.47% | 34.15% | 37.23% | 34.01% | 34.29% | 34.14% | 36.48% | 32.51% | 33.31% | 32.90% |
| GCN-M | 37.24% | 37.23% | 33.54% | 35.22% | 37.44% | 39.07% | 34.49% | 36.62% | 37.30% | 39.28% | 34.16% | 36.54% |
| HGCN-I | 37.55% | 37.83% | 34.62% | 36.15% | 38.29% | 37.98% | 35.01% | 36.43% | 37.70% | 37.76% | 35.64% | 36.67% |
| HGCN-F | 38.03% | 36.76% | 35.23% | 35.97% | 38.32% | 37.65% | 36.23% | 36.92% | 37.19% | 36.58% | 36.52% | 36.55% |
| HGCN-M | 38.15% | 37.60% | 33.72% | 35.55% | 38.78% | 39.14% | 35.35% | 37.15% | 38.13% | 38.14% | 34.92% | 36.44% |
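The difference between the pairwise inputs of the GCN variants and the group-wise inputs of the HGCN variants in Table VI can be illustrated with a small Python sketch. The helper names are hypothetical, and the pairwise construction shown (connecting every pair of stocks that share a group) is one straightforward reading of how groups are reduced to edges. The example shows that one three-stock group and three separate two-stock groups collapse to the same pairwise graph, while their hypergraph incidence matrices remain distinct.

```python
from itertools import combinations
from typing import List, Sequence


def incidence_matrix(n_stocks: int, groups: Sequence[Sequence[int]]) -> List[List[int]]:
    """H[v][e] = 1 if stock v belongs to group (hyperedge) e, else 0."""
    matrix = [[0] * len(groups) for _ in range(n_stocks)]
    for e, members in enumerate(groups):
        for v in members:
            matrix[v][e] = 1
    return matrix


def pairwise_adjacency(n_stocks: int, groups: Sequence[Sequence[int]]) -> List[List[int]]:
    """A[u][v] = 1 if stocks u and v share at least one group."""
    matrix = [[0] * n_stocks for _ in range(n_stocks)]
    for members in groups:
        for u, v in combinations(sorted(set(members)), 2):
            matrix[u][v] = matrix[v][u] = 1
    return matrix


# One industry containing stocks 0, 1, and 2 ...
one_group = [[0, 1, 2]]
# ... versus three funds that each hold a different pair of them.
three_pairs = [[0, 1], [1, 2], [0, 2]]

same_graph = pairwise_adjacency(3, one_group) == pairwise_adjacency(3, three_pairs)
same_hypergraph = incidence_matrix(3, one_group) == incidence_matrix(3, three_pairs)
print(same_graph, same_hypergraph)
```

The two configurations are indistinguishable once reduced to pairs, which is the kind of information a hyperedge keeps and a pairwise edge set discards.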

TABLE VII: Classification performance comparison between HGTAN and its variants.

| Method | 5d Accuracy | 5d Precision | 5d Recall | 5d F1 | 10d Accuracy | 10d Precision | 10d Recall | 10d F1 | 20d Accuracy | 20d Precision | 20d Recall | 20d F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EWintraE | 37.54% | 38.97% | 34.79% | 36.76% | 38.92% | 39.35% | 35.36% | 37.65% | 39.36% | 40.27% | 37.62% | 38.90% |
| EWinterE | 38.67% | 38.50% | 33.81% | 36.00% | 39.63% | 39.06% | 34.68% | 36.74% | 38.74% | 41.16% | 37.52% | 39.26% |
| EWinterG | 38.97% | 39.20% | 36.43% | 37.76% | 39.35% | 41.41% | 37.23% | 39.21% | 39.85% | 39.31% | 39.29% | 39.30% |
| HGTAN | 39.51% | 38.90% | 36.96% | 37.89% | 39.83% | 41.72% | 37.32% | 39.37% | 40.02% | 41.77% | 39.03% | 40.32% |

scheme, we conduct the performance comparison between GCN and HGCN with different kinds of relationship information. Notably, GCN generates the stock embeddings based on the pairwise relationships, for which any pair of stocks are connected if they both belong to the same industry or are held by the same fund; instead, HGCN works on the group-wise relationships that collectively bring together all stocks associated with an industry or fund. Table VI summarizes the comparison results, in which the suffixes ‘-I’, ‘-F’, and ‘-M’ indicate the methods using the industry-belonging relationship information, the fund-holding relationship information, as well as the mixture of the two, respectively. From the table, we have the following findings:

• Overall, GCN is substantially worse than HGCN, whether exploiting the industry-belonging or fund-holding relationship information. The observation confirms our belief that simply compressing the group-wise relationships into pairwise ones inevitably causes the loss of information.
• It is difficult to tell which is better between HGCN-I and HGCN-F, but HGCN-M frequently obtains higher performance than both. The results show that the industry-belonging and fund-holding relationship information are complementary to each other, and jointly leveraging them is beneficial to improving stock trend prediction. Therefore, a positive answer to the research question RQ3 can be formed.

Fig. 5: Visualization of the learned embeddings of 300 randomly chosen stocks. Some stocks belonging to the industries of banking, airlines, and liquor are marked by colored points.

D. Contribution of Triple Attention Mechanism

In the proposed HGTAN, we introduce a triple attention mechanism consisting of the intra-hyperedge, inter-hyperedge, and inter-hypergraph attention modules to hierarchically measure the importance of nodes, hyperedges, and hypergraphs during the process of information propagation among stocks. To investigate the contribution of the triple attention mechanism, we devise three variants of HGTAN, named ‘EWintraE’, ‘EWinterE’, and ‘EWinterG’, in which the intra-hyperedge, inter-hyperedge, and inter-hypergraph attention modules are replaced by assigning equal weights to different nodes, hyperedges, and hypergraphs, respectively. Table VII compares the performance between HGTAN and the variant methods. It can be seen that when any of the intra-hyperedge, inter-hyperedge, and inter-hypergraph attention modules is missing, the variant methods become clearly inferior to HGTAN in most cases. This underlines the necessity of simultaneously weighing the importance of nodes, hyperedges, and hypergraphs for guiding the information propagation in stock hypergraphs, and also indicates that the triple attention mechanism plays a critical role in HGTAN. Therefore, we can give a positive answer to the research question RQ4.

E. Visualization and Case Study

To gain an intuitive understanding about the representation learning ability of HGTAN, we took the embeddings of 300

randomly chosen stocks generated by HGTAN, and projected them into a two-dimensional space using the t-SNE algorithm [48]. Fig. 5 displays the projection results. Taking the industries of banking, airlines, and liquor as examples, we can see that some stocks in the same industry are embedded close to each other, which are marked by colored points in the figure. The finding suggests that the representations learned from HGTAN indeed reflect the industry relatedness between different stocks to some extent.

Given a target stock, HGTAN can learn the attention weights to indicate the importance of local neighbors of the stock within the same hyperedge. Here, we present a case study for the stock Airport High-Tech Park (BAHP), in which its neighbors⁷ in the hyperedge of park development industry are considered. We show the correlation coefficient between the price movement directions of BAHP and each neighbor in Fig. 6(a), as well as the attention weights assigned to the neighbors in Fig. 6(b). By comparison, we can observe that the neighbor lists sorted by correlation coefficient and by attention weight are consistent, i.e., the neighbors having more similar price trends to BAHP are assigned higher attention weights. Moreover, it is worth noting that BAHP gets the highest attention weight, which accords with the conclusion in previous studies [49] that a node itself often plays the most important role in learning its representation. Such results demonstrate that HGTAN discriminates between the neighbors belonging to the same industry, and effectively identifies those meaningful ones.

[Figure 6: two bar charts over the stocks BAHP, NG, BEZHG, THD, CWTC, SZHPD. (a) Correlation coefficients regarding the price movement direction: 1.00, 0.89, 0.87, 0.82, 0.79, 0.73. (b) Attention weights: 0.30, 0.17, 0.15, 0.14, 0.13, 0.11.]

Fig. 6: The correlation coefficients between BAHP and each neighbor, and the attention weights of the neighbors in the hyperedge of park development industry.

As for the fund-holding relationships, we take the Xingquan Global Vision Equity Securities Investment Fund as an example, and obtain its top 10 constituent stocks⁸ in the fourth quarter of 2018. Fig. 7 visualizes the attention weights between each pair of stocks in the fund. As can be seen, the first three stocks, including CHNT, JNP, and HP, pay much more attention to each other. A possible reason is that they all belong to the pharmaceutical industry. The phenomenon indicates that HGTAN is also able to capture the underlying correlations between the stocks held by the same fund.

[Figure 7: heatmap with both axes ordered CHNT, JNP, HP, CS, PAI, PDH, AP, HPI, CRC, YS; color scale from Lower Attention to Higher Attention.]

Fig. 7: Heatmap depicting the attention weights between each pair of stocks in a fund.

⁷ Nanjing Gaoke (NG), Beijing Electronic Zone High-Tech Group (BEZHG), Tianjin High-Tech Development (THD), China World Trade Center (CWTC), and Shanghai Zhangjiang High-Tech Park Development (SZHPD)
⁸ Changchun High and New Tech (CHNT), Jiangsu Nhwa Pharmaceutical (JNP), Haisco Pharmaceutical (HP), Huaneng Power International (HPI), Citic Securities (CS), Avicopter PLC (AP), Poly Developments and Holdings (PDH), Construction (CRC), Ping An Insurance (PAI), and Yonghui Superstores (YS)

VI. CONCLUSION

Inspired by the observation that different stocks are naturally connected as a collective group, we present a collaborative temporal-relational modeling framework for end-to-end stock trend prediction, in which a hypergraph tri-attention network (HGTAN) is introduced to model the stock group-wise relationships of industry-belonging and fund-holding after capturing the temporal dynamics of stocks. HGTAN is equipped with the hierarchical intra-hyperedge, inter-hyperedge, and inter-hypergraph attention modules, which simultaneously measure the importance of nodes, hyperedges, and hypergraphs for guiding the information propagation in stock hypergraphs. In the experiments, we demonstrate that our approach substantially outperforms state-of-the-art methods on a real-world dataset, and show its superior profitability through an investment simulation. We also perform detailed ablation analysis on its key components, as well as visualization and case studies to provide more insights into our approach.

Building upon the current study, our future work will be carried out along three directions. Firstly, we intend to collect more diverse stock-related data like online financial news and social media contents, and integrate these additional cues in improving stock trend prediction. Secondly, we will explore alternative trading strategies to generate more reliable trading signals based on the predictions about future stock trends.

Finally, we will test on more real-world stock markets to further validate the adaptability of our approach.

ACKNOWLEDGEMENT

We would like to thank the authors of the works [8], [12], and [13] for sharing their codes.

REFERENCES

[1] E. F. Fama, “The behavior of stock-market prices,” The Journal of Business, vol. 38, no. 1, pp. 34–105, 1965.
[2] Z.-R. Lai, D.-Q. Dai, C.-X. Ren, and K.-K. Huang, “Radial basis functions with adaptive input and composite trend representation for portfolio selection,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 12, pp. 6214–6226, 2018.
[3] H. Wu, W. Zhang, W. Shen, and J. Wang, “Hybrid deep sequential modeling for social text-driven stock prediction,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 1627–1630.
[4] W. Jiang, “Applications of deep learning in stock market prediction: recent progress,” arXiv preprint arXiv:2003.01859, 2020.
[5] A. A. Adebiyi, A. O. Adewumi, and C. K. Ayo, “Comparison of arima and artificial neural networks models for stock price prediction,” Journal of Applied Mathematics, vol. 2014, no. 1, pp. 1–7, 2014.
[6] M. Ballings, D. Van den Poel, N. Hespeels, and R. Gryp, “Evaluating multiple classifiers for stock price direction prediction,” Expert systems with Applications, vol. 42, no. 20, pp. 7046–7056, 2015.
[7] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell, “A dual-stage attention-based recurrent neural network for time series prediction,” arXiv preprint arXiv:1704.02971, 2017.
[8] L. Zhang, C. Aggarwal, and G.-J. Qi, “Stock price prediction via discovering multi-frequency trading patterns,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 2141–2149.
[9] Q. Ding, S. Wu, H. Sun, J. Guo, and J. Guo, “Hierarchical multi-scale gaussian transformer for stock movement prediction,” in Proceedings of the 29th International Joint Conference on Artificial Intelligence, 2020, pp. 4640–4646.
[10] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[11] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
[12] F. Feng, X. He, X. Wang, C. Luo, Y. Liu, and T.-S. Chua, “Temporal relational ranking for stock prediction,” ACM Transactions on Information Systems, vol. 37, no. 2, pp. 1–30, 2019.
[13] R. Kim, C. H. So, M. Jeong, S. Lee, J. Kim, and J. Kang, “Hats: A hierarchical graph attention network for stock movement prediction,” arXiv preprint arXiv:1908.07999, 2019.
[14] C. Chen, L. Zhao, J. Bian, C. Xing, and T.-Y. Liu, “Investment behaviors can tell what inside: Exploring stock intrinsic properties for stock trend prediction,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, pp. 2376–2384.
[15] Z. Li, D. Yang, L. Zhao, J. Bian, T. Qin, and T.-Y. Liu, “Individualized indicator for all: Stock-wise technical indicator optimization with stock embedding,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, pp. 894–902.
[16] A. Bretto, “Hypergraph theory,” An introduction. Mathematical Engineering. Cham: Springer, 2013.
[17] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao, “Hypergraph neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 3558–3565.
[18] S. Bai, F. Zhang, and P. H. Torr, “Hypergraph convolution and hypergraph attention,” arXiv preprint arXiv:1901.08150, 2019.
[19] R. Sawhney, S. Agarwal, A. Wadhwa, and R. R. Shah, “Spatiotemporal hypergraph convolution network for stock movement forecasting,” in Proceedings of the 20th IEEE International Conference on Data Mining, 2020, pp. 482–491.
[20] X. Yan and Z. Guosheng, “Application of kalman filter in the prediction of stock price,” in Proceedings of the 5th International Symposium on Knowledge Acquisition and Modeling, 2015, pp. 197–198.
[21] D. M. Nelson, A. C. Pereira, and R. A. de Oliveira, “Stock market’s price movement prediction with lstm neural networks,” in Proceedings of the 30th International Joint Conference on Neural Networks, 2017, pp. 1419–1426.
[22] R. Akita, A. Yoshihara, T. Matsubara, and K. Uehara, “Deep learning for stock prediction using numerical and textual information,” in Proceedings of the 15th International Conference on Computer and Information Science, 2016, pp. 1–6.
[23] Y. Chen, Z. Wei, and X. Huang, “Incorporating corporation relationship via graph convolutional neural networks for stock price prediction,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 1655–1658.
[24] S. Agarwal, K. Branson, and S. Belongie, “Higher order learning with graphs,” in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 17–24.
[25] H. Shi, Y. Zhang, Z. Zhang, N. Ma, X. Zhao, Y. Gao, and J. Sun, “Hypergraph-induced convolutional networks for visual classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 10, pp. 2963–2972, 2019.
[26] Y. Huang, Q. Liu, and D. Metaxas, “Video object segmentation by hypergraph cut,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1738–1745.
[27] E.-S. Kim, W. Y. Kang, K.-W. On, Y.-J. Heo, and B.-T. Zhang, “Hypergraph attention networks for multimodal learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 14581–14590.
[28] S. Yang, J. Hu, Y. Lu, and X. Wang, “Stock trends prediction by hypergraph modeling,” in Proceedings of the IEEE International Conference on Software Engineering and Service Science, 2012, pp. 104–107.
[29] Y. Luo, J. Hu, X. Wei, D. Fang, and H. Shao, “Stock trends prediction based on hypergraph modeling clustering algorithm,” in Proceedings of the IEEE International Conference on Progress in Informatics and Computing, 2014, pp. 27–31.
[30] N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar, “Hypergcn: A new method for training graph convolutional networks on hypergraphs,” Advances in Neural Information Processing Systems, pp. 1509–1520, 2019.
[31] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li, “T-gcn: A temporal graph convolutional network for traffic prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 9, pp. 3848–3858, 2019.
[32] G. Kling and L. Gao, “Calendar effects in chinese stock market,” Annals of Economics and Finance, vol. 6, no. 1, pp. 75–88, 2005.
[33] Y. Hao, H.-H. Chu, K.-Y. Ho, and K.-C. Ko, “The 52-week high and momentum in the stock market: Anchoring or recency biases?” International Review of Economics & Finance, vol. 43, pp. 121–138, 2016.
[34] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proceedings of the 30th International Conference on Machine Learning, vol. 30, 2013, p. 3.
[35] S. P. Borgatti and M. G. Everett, “A graph-theoretic perspective on centrality,” Social networks, vol. 28, no. 4, pp. 466–484, 2006.
[36] Y. Xu and S. B. Cohen, “Stock movement prediction from tweets and historical prices,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 1970–1979.
[37] F. Feng, H. Chen, X. He, J. Ding, M. Sun, and T.-S. Chua, “Enhancing stock movement prediction with adversarial training,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2018, pp. 5843–5849.
[38] P. Bolton, H. Chen, and N. Wang, “Market timing, investment, and risk management,” Journal of Financial Economics, vol. 109, no. 1, pp. 40–62, 2013.
[39] J. Andrada-Félix and F. Fernández-Rodríguez, “Improving moving average trading rules with boosting and statistical learning methods,” Journal of Forecasting, vol. 27, no. 5, pp. 433–449, 2008.
[40] D. Bowen, M. C. Hutchinson, and N. O’Sullivan, “High-frequency equity pairs trading: transaction costs, speed of execution, and patterns in returns,” The Journal of Trading, vol. 5, no. 3, pp. 31–38, 2010.
[41] W. Long, Z. Lu, and L. Cui, “Deep learning-based feature engineering for stock price movement prediction,” Knowledge-Based Systems, vol. 164, pp. 163–173, 2019.
[42] W. Shen, J. Wang, Y.-G. Jiang, and H. Zha, “Portfolio choices with orthogonal bandit learning,” in Proceedings of the 24th International Conference on Artificial Intelligence, 2015, pp. 974–980.
[43] T. J. Moskowitz, Y. H. Ooi, and L. H. Pedersen, “Time series momentum,” Journal of Financial Economics, vol. 104, no. 2, pp. 228–250, 2012.

[44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[45] W.-K. Wong, M. Manzur, and B.-K. Chew, “How rewarding is technical analysis? Evidence from Singapore stock market,” Applied Financial Economics, vol. 13, no. 7, pp. 543–551, 2003.
[46] H. Yu, G. V. Nartea, C. Gan, and L. J. Yao, “Predictive ability and profitability of simple technical trading rules: Recent evidence from Southeast Asian stock markets,” International Review of Economics & Finance, vol. 25, pp. 356–371, 2013.
[47] J. Du, “Machine learning based trading strategies for the Chinese stock market,” Ph.D. dissertation, University of Liverpool, 2020.
[48] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, pp. 2579–2605, 2008.
[49] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu, “Heterogeneous graph attention network,” in Proceedings of the World Wide Web Conference, 2019, pp. 2022–2032.

Chaoran Cui received his Ph.D. degree in computer science from Shandong University in 2015. Prior to that, he received his B.E. degree in software engineering from Shandong University in 2010. From 2015 to 2016, he was a research fellow at Singapore Management University. He is now a professor with the School of Computer Science and Technology, Shandong University of Finance and Economics. His research interests include information retrieval, recommender systems, multimedia, and machine learning. He is a member of the IEEE.

Xiushan Nie received the Ph.D. degree from Shandong University in 2011. He was a Research Fellow with the University of Missouri–Columbia, USA, from 2013 to 2014, under the supervision of Prof. Wenjun (Kevin) Zeng. He is currently a Full Professor with the School of Computer Science and Technology, Shandong Jianzhu University, China. His research interests include multimedia retrieval and indexing, multimedia security, and computer vision.

Meng Wang is a professor at the Hefei University of Technology, China. He received his B.E. degree and Ph.D. degree in the Special Class for the Gifted Young and the Department of Electronic Engineering and Information Science from the University of Science and Technology of China (USTC), Hefei, China, in 2003 and 2008, respectively. His current research interests include multimedia content analysis, computer vision, and pattern recognition. He has authored more than 200 book chapters, journal papers, and conference papers in these areas. He is the recipient of the ACM SIGMM Rising Star Award 2014. He is an associate editor of IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), and IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS).

Xiaojie Li received her B.E. degree in digital media technology from Shandong University of Finance and Economics, Jinan, China, in 2019. She is currently pursuing the master's degree in computer application technology at Shandong University of Finance and Economics, Jinan, China. Her research interests include data mining, recommender systems, and machine learning.

Yilong Yin received the Ph.D. degree from Jilin University, Changchun, China, in 2000. He is currently the Director of the Machine Learning and Applications Group and a Professor with Shandong University, Jinan, China. From 2000 to 2002, he was a Postdoctoral Fellow with the Department of Electronic Science and Engineering, Nanjing University, Nanjing, China. His research interests include machine learning, data mining, computational medicine, and biometrics.

Juan Du obtained her M.Sc. in Quantitative Finance from the National University of Singapore in 2016 and her Ph.D. in Mathematical Science from the University of Liverpool in 2020. She is currently with the School of Finance, Shandong University of Finance and Economics. Her research interests focus on developing methodologies and algorithms, such as machine learning and mathematical models, to solve problems in quantitative finance.

Chunyun Zhang received her Ph.D. degree from Beijing University of Posts and Telecommunications, China, in 2015. She is now an associate professor in Machine Learning and Data Mining Center, Shandong University of Finance and Economics. Her current research interests include machine learning and natural language processing.