RECENT DEVELOPMENT IN DEEP LEARNING IN FINANCE

Harsh Prasad

DISCLAIMER: 1. Any opinion in this presentation is expressed in my personal capacity. 2. The material presented here is in the nature of a literature review of work done by others.

Adoption of Deep Learning in Finance

1. Deep Learning for Financial Applications: A Survey (Ahmet Murat Ozbayoglu, Mehmet Ugur Gudelek, Omer Berat Sezer)
2. Deep learning in finance and banking: A literature review and classification (Jian Huang, Junyi Chai and Stella Cho)
3. Artificial Intelligence in Finance (Bonnie G Buchanan, Alan Turing Institute)

1. Financial text mining, algo-trading, risk assessment, sentiment analysis, portfolio management and fraud detection are among the most studied areas of finance research.
2. Even though DL models already outperform their traditional counterparts in almost all areas, overall interest is still rising across all research areas.
3. Cryptocurrencies, behavioral finance, HFT and the derivatives market hold promising potential for research.
4. RNN-based models (in particular LSTM), CNN and DMLP have been used extensively in implementations.
5. In most of the studies, DL models performed better than their ML counterparts.
6. Hybrid models based on spatio-temporal data representations, NLP, semantics and text-mining-based models might become more important in the near future.

Purpose of Finance

1. Why Finance Matters: Building an industry that serves its customers and society (Pitt-Watson and Mann)
2. On the theory and measurement of financial intermediation (Philippon, 2015, 2016)

1. The academic literature identifies four principal purposes of the finance industry, against which its output can be measured. These are:

• The safe-keeping of assets;
• Providing an effective payment system;
• Pooling risk;
• Intermediation – matching the users and suppliers of money.

2. Other considerations, which are in the nature of enabling functions or externalities, include providing information in a decentralized system and managing asymmetric information (Merton and Bodie), and providing liquidity or developing new processes (Epstein).

3. "Enabling functions" such as successful innovation or the management of asymmetric information, and externalities of intermediation leading to price discovery and the separation of ownership and management.

4. It is difficult not to see finance as an industry with excessive rents and poor overall efficiency. The puzzle is why this has persisted for so long. There are several plausible explanations: zero-sum games in trading activities, inefficient regulations, barriers to entry, increasing returns to size, etc.

Is Big Data Enabling Finance?

1. Market efficiency in the age of big data (Ian Martin, Stefan Nagel, 2020)
2. Artificial Intelligence and Asymmetric Information Theory (Tshilidzi Marwala and Evan Hurwitz)

1. Does big data give more predictability?

2. Does it help make information symmetry?

3. The efficient market hypothesis states that the market incorporates all available information, such that it is impossible to beat the market (Fama, 1965). It follows that the only way to beat the market is to engage in high-risk transactions. Implicit in the efficient market hypothesis is the assumption that the agents participating in the market are rational. We now know that human agents are not rational and therefore markets cannot be fully rational. Theories such as bounded rationality and prospect theory have shown that human agents are at best not fully rational and almost always not rational (Simon, 1974; Kahneman and Tversky, 1979). Marwala (2015) surmised that artificial intelligent agents make markets more rational than human agents.

4. Five specific economic patterns influenced by AI are discussed: (1) following in the footsteps of 'homo economicus', a new type of agent, 'machina economica', enters the stage of the global economy; (2) the pattern of division of labor and specialization is further accelerated by AI-induced micro-division of labor; (3) the introduction of AI leads to triangular agency relationships and next-level information asymmetries; (4) data and AI-based machine labor have to be understood as new factors of production; (5) the economics of AI networks can lead to market dominance and unwanted external effects.

NEURAL NETWORK

Output = 0 if Σ_j w_j·x_j ≤ threshold, and 1 if Σ_j w_j·x_j > threshold

Output = 0 if w·x + b ≤ 0, and 1 if w·x + b > 0

[Figure-only slides: Activation Functions; Network and Activation; Feedforward vs Recurrent; Use Case: Handwriting Recognition; Solving the Problem – Final Decision; Breaking the Problem Statement; Layer 2 Decision; Layer 1 Decision; Feedforward Decision Flow; Cost Function; Cost Minimization; Gradient Descent; Backpropagation]
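As an illustration (not from the sources above), the threshold unit defined by the equations above can be sketched in a few lines of Python:

```python
import numpy as np

def perceptron_output(w, x, b):
    """Threshold unit: fires 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Example: a 2-input unit that behaves like a logical AND gate
w = np.array([1.0, 1.0])
b = -1.5
print(perceptron_output(w, np.array([1, 1]), b))  # prints 1
print(perceptron_output(w, np.array([1, 0]), b))  # prints 0
```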

Deep Learning for Fraud Detection and Risk Assessment

1. Deep learning detecting fraud in credit card transactions (Abhimanyu Roy, Jingyi Sun, Robert Mahoney, Loreto Alonzi, Stephen Adams, and Peter Beling)
2. Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs (Y. Lucas et al., 2020)

1. Roy et al. (2017), studying fraud detection in credit card transactions, found that the LSTM and GRU models significantly outperformed the baseline ANN, which indicates that the order of transactions for an account contains useful information for differentiating between fraudulent and non-fraudulent transactions.

2. Lucas et al presented a feature engineering framework to model a sequence of credit card transactions from three different perspectives, namely (i) The sequence contains or doesn’t contain a fraud (ii) The sequence is obtained by fixing the card-holder or the payment terminal (iii) It is a sequence of spent amount or of elapsed time between the current and previous transactions. Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each one of these sequences is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood to a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection.
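As a purely illustrative sketch of this idea (not the authors' code), one such HMM likelihood feature could be computed with the hmmlearn package roughly as follows; the toy data, component count and package choice are assumptions:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed third-party package, not from the paper

# Illustration: one HMM trained on genuine card-holder spending amounts.
# In the paper, eight such HMMs (fraud/genuine x card-holder/terminal x amount/time-delta)
# each contribute a log-likelihood feature to the Random Forest classifier.
genuine_amounts = np.random.lognormal(mean=3.0, sigma=1.0, size=(500, 1))  # toy training data
hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
hmm.fit(genuine_amounts)

# Log-likelihood of a new transaction window under this HMM becomes one extra feature
window = np.random.lognormal(mean=3.0, sigma=1.0, size=(10, 1))
likelihood_feature = hmm.score(window)
print(likelihood_feature)
```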

3. The feature engineering strategy is shown to perform well for e-commerce and face-to-face credit card fraud detection: the results show an increase in precision-recall AUC of 18.1% for face-to-face transactions and 9.3% for e-commerce ones. The strategy is shown to be relevant for various types of classifiers (random forest, logistic regression and AdaBoost) and robust to the hyperparameter choices made when constructing the features.

Deep Learning for Fraud Detection and Risk Assessment

1. Dual Sequential Variational Autoencoders for Fraud Detection (Ayman Alazizi, Amaury Habrard, Francois Jacquenet, Liyun He-Guelton, and Frederic Oble, 2020)

1. An autoencoder is a special type of feedforward neural network where the target output is the same as the input. It compresses the input into a lower-dimensional representation and then reconstructs the output from that representation. It is an unsupervised algorithm trained with backpropagation, with the target value set equal to the input. It is made up of two parts linked together: an encoder E(x) and a decoder D(z). Given an input sample x, the encoder generates z, a condensed representation of x. The decoder is then tuned to reconstruct the original input x from the encoded representation z. The objective function used during the training of the AE is given by: L_AE(x) = ∥x − D(E(x))∥, where ∥ · ∥ denotes an arbitrary distance function. The l2 norm is typically applied here. The AE can be optimized, for example, using stochastic gradient descent.
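A minimal PyTorch sketch of such an autoencoder (illustrative only; the layer sizes and the squared-error loss are assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Minimal dense autoencoder: encoder E compresses x, decoder D reconstructs it."""
    def __init__(self, n_features: int, n_latent: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(), nn.Linear(32, n_features))

    def forward(self, x):
        z = self.encoder(x)      # condensed representation z = E(x)
        return self.decoder(z)   # reconstruction D(E(x))

model = AutoEncoder(n_features=30)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)  # plain SGD, as mentioned in the text
loss_fn = nn.MSELoss()                              # squared l2 reconstruction error

x = torch.randn(64, 30)                             # toy batch of transaction features
loss = loss_fn(model(x), x)                         # L_AE(x) = ||x - D(E(x))||^2
opt.zero_grad()
loss.backward()
opt.step()
```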

2. A variational autoencoder (VAE) is an attractive probabilistic generative version of the standard autoencoder. It can learn a complex distribution and then use it as a generative model defined by a prior p(z) and a conditional distribution pθ(x|z). Because the true likelihood of the data is generally intractable, a VAE is trained by maximizing the evidence lower bound (ELBO):
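The ELBO itself is not reproduced on the slide; in standard VAE notation, with encoder (approximate posterior) q_φ(z|x), it reads

L_ELBO(x) = E_{q_φ(z|x)}[log p_θ(x | z)] − KL(q_φ(z | x) ‖ p(z)),

i.e. a reconstruction term minus a regularization term that pulls the encoder towards the prior p(z).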

3. Negative learning is a technique for regularizing the training of the AE in the presence of labelled data by limiting its reconstruction capability (LRC). The basic idea is to maximize the reconstruction error for abnormal instances while minimizing it for normal ones, in order to improve the discriminative ability of the AE. Let x ∈ R^n be an input instance and y ∈ {0, 1} its associated label, where y = 1 stands for a fraudulent instance and y = 0 for a genuine one. The objective function of LRC to be minimized is: (1 − y)·L_AE(x) − y·L_AE(x)
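A minimal sketch of this objective (assuming a squared-error reconstruction loss and PyTorch tensors; not the authors' implementation):

```python
import torch

def lrc_loss(x, x_hat, y):
    """Limiting-Reconstruction-Capability objective described above:
    minimize reconstruction error for genuine samples (y = 0) and
    maximize it for fraudulent ones (y = 1)."""
    rec_err = ((x - x_hat) ** 2).sum(dim=1)          # per-sample L_AE(x)
    return ((1 - y) * rec_err - y * rec_err).mean()  # (1 - y)*L_AE(x) - y*L_AE(x)
```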

Deep Learning for Fraud Detection and Risk Assessment

1. Dual Sequential Variational Autoencoders for Fraud Detection (Ayman Alazizi, Amaury Habrard, Francois Jacquenet, Liyun He-Guelton, and Frederic Oble, 2020)

1. The DuSVAE model consists of a generative model that takes into account the sequential nature of the data. It combines two variational autoencoders that can generate a condensed representation of the input sequential data that can then be processed by a classifier to label each new sequence as fraudulent or genuine.

2. One of the main contributions of this paper is a method to identify fraudulent sequences of credit transactions in the context of highly imbalanced data. For this purpose, the Dual Sequential Variational Autoencoder (DuSVAE) is used, which consists of a combination of two variational autoencoders. The first one is trained on fraudulent sequences of transactions in order to project the input data into another feature space and to assign a fraud score to each sequence based on the reconstruction error. Once this model is trained, a second VAE is plugged into the output of the first one. This second VAE is then trained with a negative learning approach, with the objective of maximizing the reconstruction error of the fraudulent sequences and minimizing the reconstruction error of the genuine ones.

Deep Learning for Text Mining/Sentiment Analysis

1. Decision support from financial disclosures with deep neural networks and transfer learning (Mathias Kraus, Stefan Feuerriegel, 2017)
2. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models (Dogu Tan Araci, 2019)
3. A Novel Ensemble Deep Learning Model for Stock Prediction Based on Stock Prices and News (Yang Li, Yi Pan, 2020)

1. With the rapid spread of social media and real-time streaming news/tweets, instant text-based information retrieval has become available for financial model development. As a result, financial text mining studies have become very popular in recent years. Even though some of these studies are directly interested in sentiment analysis through crowdsourcing, many implementations are interested in content retrieval from news, financial statements, disclosures, etc. through analyzing the text context.

2. The recurrent neural network processes raw text in sequential order, which helps the RNN implicitly learn context-sensitive features. However, the RNN is subject to drawbacks (the vanishing gradient problem and short context dependencies), which often prohibit its application to real-world problems.

3. An improvement over the classical RNN is the long short-term memory (LSTM) model, which is capable of processing sequential inputs with very long dependencies between related input signals by utilizing forget gates that prevent exploding gradients during back-propagation and thus numerical instabilities.

4. Transfer learning performs representation learning on a different, but related, dataset and then applies the gained knowledge to the actual training phase. The weights in a usual neural network are initialized randomly and then optimized for the training set. The idea behind transfer learning is to initialize the weights not randomly, but rather with values that might be close to the optimized ones.

5. ELMo embeddings are contextualized word representations in the sense that the surrounding words influence the representation of the word. ULMFiT is a transfer learning method for downstream NLP tasks that makes use of language-model pre-training.

6. Ensemble learning combines the decisions of multiple sub-models into a new model that produces the final output, in order to improve prediction accuracy or overall performance.

Deep Learning in Algo Trading and Portfolio Management (RL)

1. Deep Reinforcement Learning for Option Replication and Hedging (Jiayi Du, Muyang Jin, Petter Kolm, Gordon Ritter, Yixuan Wang, Bofei Zhang, 2020)

Reinforcement Learning provides a way to train computer models, referred to as agents, that learn to interact with an environment by means of the sequence of actions they take, with the goal of optimizing a cumulative reward over time. At each time step t, the agent observes the current state of the environment s_t ∈ S and chooses an action a_t ∈ A. This choice influences both the transition to the next state, s_{t+1}, as well as the reward, R_{t+1}, the agent receives. The agent's goal is to choose actions to maximize the expected cumulative reward.
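A minimal sketch of this agent-environment loop (illustrative only; assumes a classic OpenAI-Gym-style interface and a user-supplied policy function):

```python
def run_episode(env, policy, gamma=0.99):
    """Roll out one episode and return the discounted cumulative reward G_0."""
    state = env.reset()
    G, discount, done = 0.0, 1.0, False
    while not done:
        action = policy(state)                     # a_t chosen from the current state s_t
        state, reward, done, _ = env.step(action)  # environment transitions to s_{t+1}, emits R_{t+1}
        G += discount * reward                     # accumulate gamma^t * R_{t+1}
        discount *= gamma
    return G
```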

E[G_t] := E[R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ...],   (1)

where the constant γ ∈ [0, 1] is referred to as the discount factor. The sum in (1) can be either finite or infinite, depending on the problem at hand. If rewards are bounded, then γ < 1 ensures convergence when the sum in (1) is infinite. A policy π is a strategy for determining an action a_t, conditional on the current state s_t. Policies can be deterministic or stochastic. In the deterministic case, π maps S → A; in the stochastic case, π maps a state s ∈ S to a probability distribution π(a | s) on A.

Deep Learning in Algo Trading and Portfolio Management

1. Deep Reinforcement Learning for Option Replication and Hedging (Jiayi Du, Muyang Jin, Petter Kolm, Gordon Ritter, Yixuan Wang, Bofei Zhang, 2020)

The action-value function Q^π : S × A → R, also known as the Q-function, expresses the value of starting in state s, taking action a, and following policy π thereafter:

Q^π(s, a) := E_π[G_t | s_t = s, a_t = a],   (2)

where E_π denotes the expectation under the assumption that policy π is followed. The state-value function is defined as the action-value function where the first action also comes from the policy π, i.e.

V^π(s) := E_π[G_t | s_t = s] = Q^π(s, π(s)).   (3)

Policy π is defined to be at least as good as π′ if

V^π(s) ≥ V^{π′}(s)   (4)

for all states s. An optimal policy is defined to be one which is at least as good as any other policy. All optimal policies share the same action-value function Q*, the optimal action-value function. The goal of Q-learning is to learn Q*. The optimal action-value function satisfies the Bellman equation

Q*(s, a) = E[R + γ max_{a′} Q*(s′, a′) | s, a].   (5)

The basic idea of Q-learning is to turn the Bellman equation into the update Q_{i+1}(s, a) = E[R + γ max_{a′} Q_i(s′, a′) | s, a], and iterate this scheme until convergence, Q_i → Q*. Once the optimal action-value function has been determined, the optimal policy can be computed via

π*(s) = arg max_a Q*(s, a).   (6)
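Before moving to the deep-network version, the update above can be sketched as plain tabular Q-learning (illustrative only; assumes a Gym-style environment with discrete integer states and actions):

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning: sample transitions and apply the Bellman update above."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy
            a = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q[s]))
            s_next, r, done, _ = env.step(a)
            # Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            s = s_next
    # greedy policy pi*(s) = argmax_a Q*(s,a)
    return Q, Q.argmax(axis=1)
```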

Deep Learning in Algo Trading and Portfolio Management

1. Deep Reinforcement Learning for Option Replication and Hedging (Jiayi Du, Muyang Jin, Petter Kolm, Gordon Ritter, Yixuan Wang, Bofei Zhang, 2020)

DQN: In deep Q-learning the action-value function is approximated with a DNN, Q(s, a; θ) ≈ Q*(s, a), where θ represents the network parameters. The DNN is then trained by minimizing the following sequence of loss functions

L_i(θ_i) = E[ L_δ( Q(s, a; θ_i) − R − γ max_{a′} Q(s′, a′; θ⁻) ) | (s, a, R, s′) ∼ U(D) ],   (7)

where L_δ is the Huber loss: L_δ(x) = ½ x² when |x| ≤ δ, and δ(|x| − ½ δ) otherwise.
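For reference, the Huber loss can be written out directly (a small sketch, not from the paper):

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails (as used in the DQN loss above)."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))
```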

In the loss function (7) above, Q(s, a; θ), Q(s, a; θ⁻) and U(D) are referred to as the policy network, target network and behavior distribution, respectively. While θ is updated at every iteration, θ⁻ is updated only every M steps. In the computational examples here, the DQN models are trained using experience replay, and the exploration-exploitation tradeoff is controlled using the ε-greedy approach as in Mnih et al. (2015, Algorithm 1).

Pop-Art: DQN can experience stability issues in problems where rewards vary significantly in magnitude, resulting in poor performance. Mnih et al. (2015) use reward clipping to address this issue. However, deciding on an acceptable range for the rewards is ad hoc and can also change the learning objective, thereby resulting in different policies.

Hasselt et al. (2016) propose an adaptive approach to normalize rewards that they call “Preserving Outputs Precisely, while Adaptively Rescaling Targets” (Pop-Art) and demonstrate how it improves stability and performance of DQN on a number of Atari 2600 games.

Policy Gradient Methods: Each action a_t is generated by a stochastic policy such that a_t ∼ π_θ(a_t | s_t) with parameters θ. In contrast to improving the approximation of the action-value function as is done in Q-learning, policy gradient methods aim to directly learn the policy π_θ* that maximizes the cumulative reward by performing gradient updates of θ.

Proximal Policy Optimization: A challenge with policy gradient methods is that estimates of the gradients have high variance, resulting in a lack of robustness and poor convergence. Additionally, basic policy gradient methods are often not data-efficient. Proximal policy optimization (PPO) is chosen as it has been shown to be data-efficient and robust in a number of practical applications.

Deep Learning in Algo Trading and Portfolio Management (Results)

1. Deep Reinforcement Learning for Option Replication and Hedging (Jiayi Du, Muyang Jin, Petter Kolm, Gordon Ritter, Yixuan Wang, Bofei Zhang, 2020)

1. The paper demonstrates that deep reinforcement learning (DRL) models learn strategies similar to or better than delta hedging. The three DRL approaches used, (a) deep Q-learning, (b) deep Q-learning with Pop-Art and (c) proximal policy optimization, could learn to optimally replicate options with different strikes subject to realistic conditions. Each agent is trained to hedge a whole range of strikes, and no retraining is needed when the user changes to another strike within the range*.

2. The system is model-free and does not require assumptions about the price processes of the derivatives and hedging securities, or about transaction costs. It also does not depend on the existence of a "perfect" dynamic trading strategy replicating the derivatives. Rather, it learns to optimally trade off variance and cost, as best as possible using any hedging securities, doing away with the assumption of a complete market.

3. One can evaluate the efficacy of an automatic hedging model by how often the total P&L (including the hedging and trading costs) is significantly less than zero. The middle panel in Figure 6 displays density plots of the t-statistics of total P&L for the agents and the BSM model. We observe that DQN and PPO perform best in that their t-statistics are more frequently close to zero than those of the other models.

• For DQN, DQN with Pop-Art and PPO, we refer the reader to Mnih et al. (2013), Mnih et al. (2015), Silver et al. (2017), Hasselt et al. (2016), Schulman et al. (2015b), and Schulman et al. (2017).

SIMPLE RECURRENT NEURAL NETWORK

Elman Network:
h_t = σ_h(W_h x_t + U_h h_{t−1} + b_h)
y_t = σ_y(W_y h_t + b_y)

Jordan Network:
h_t = σ_h(W_h x_t + U_h y_{t−1} + b_h)
y_t = σ_y(W_y h_t + b_y)
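A one-step numpy sketch of the Elman recurrence (illustrative; tanh is assumed for σ_h and the identity for σ_y):

```python
import numpy as np

def elman_step(x_t, h_prev, Wh, Uh, bh, Wy, by):
    """One step of an Elman network: the hidden state is fed back into itself."""
    h_t = np.tanh(Wh @ x_t + Uh @ h_prev + bh)  # h_t = sigma_h(W_h x_t + U_h h_{t-1} + b_h)
    y_t = Wy @ h_t + by                         # y_t = sigma_y(W_y h_t + b_y), identity output here
    return h_t, y_t
```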

LONG SHORT TERM MEMORY (LSTM)

i_t = σ(W_i h_{t−1} + U_i x_t + b_i)

f_t = σ(W_f h_{t−1} + U_f x_t + b_f)

o_t = σ(W_o h_{t−1} + U_o x_t + b_o)

g_t = tanh(W_g h_{t−1} + U_g x_t + b_g)

c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t

h_t = o_t ⊙ tanh(c_t)
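A one-step numpy sketch of these LSTM equations (illustrative only; the dictionary-of-weights layout is an assumption made for readability):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step following the equations above.
    W, U, b are dicts with keys 'i', 'f', 'o', 'g' for the input, forget and
    output gates and the candidate cell update."""
    i_t = sigmoid(W['i'] @ h_prev + U['i'] @ x_t + b['i'])
    f_t = sigmoid(W['f'] @ h_prev + U['f'] @ x_t + b['f'])
    o_t = sigmoid(W['o'] @ h_prev + U['o'] @ x_t + b['o'])
    g_t = np.tanh(W['g'] @ h_prev + U['g'] @ x_t + b['g'])
    c_t = f_t * c_prev + i_t * g_t   # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```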

Deep Recurrent Factor Model

1. Deep Recurrent Factor Model: Interpretable Non-Linear and Time-Varying Multi-Factor Model (Kei Nakagawa, Tomoki Ito, Masaya Abe, Kiyoshi Izumi, 2019)

The traditional linear multi-factor model: r_i = α_i + X_i1 F_1 + ··· + X_iN F_N + ε_i,
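As a toy illustration of this linear baseline (not from the paper), the intercept α_i and the exposures X_i can be recovered by ordinary least squares given hypothetical factor returns and one stock's return series:

```python
import numpy as np

# Toy time-series regression for r_i = alpha_i + X_i1 F_1 + ... + X_iN F_N + eps_i
T, N = 120, 5
F = np.random.randn(T, N)                       # hypothetical factor returns (T months x N factors)
true_X = np.array([0.5, -0.2, 0.1, 0.0, 0.3])   # hypothetical exposures
r = 0.01 + F @ true_X + 0.05 * np.random.randn(T)

design = np.column_stack([np.ones(T), F])       # prepend a column of ones for alpha_i
coef, *_ = np.linalg.lstsq(design, r, rcond=None)
alpha_i, X_i = coef[0], coef[1:]
```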

The non-linear time-dependent model: r_i(t) = f(X_i1(t) F_1(t) + X_i1(t−1) F_1(t−1) + ··· + X_iN(t) F_N(t) + ···) + ε_i(t)

To estimate the unknown non-linear function f, a long short-term memory (LSTM) model is used.

However, given the limitations in the interpretability of predictions from a black-box model such as the LSTM, layer-wise relevance propagation (LRP) (Bach et al. 2015) was applied to linearize the proposed LSTM model.

Transfer Learning

1. QuantNet: Transferring Learning Across Trading Strategies (Adriano Koshiyama, Stefano B. Blumberg, Nick Firoozye and Philip Treleaven, 2020)

1. QuantNet: an architecture that learns market-agnostic trends and uses these to learn superior market-specific trading strategies. QuantNet uses recent advances in transfer and meta-learning, where market-specific parameters are free to specialize on the problem at hand, while market-agnostic parameters capture signals from all markets. QuantNet takes a step towards end-to-end global financial trading that can deliver superior market returns. In a few big regional markets, such as S&P 500, FTSE 100, KOSPI and Saudi Arabia Tadawul All Shares, QuantNet showed a 2-10 times improvement in SR and CR. QuantNet also generated positive and statistically significant alpha according to the Fama-French 5-factor model.

2. Transfer learning embodies a set of techniques for sharing information obtained on one task, or market, when learning another task (market). In the simplest case of pre-training, for example, we would train a model in a market M2 by initializing its parameters to the final parameters obtained in market M1. While such pre-training can be useful, there is no guarantee that the parameters obtained on task M1 will be useful for learning task M2.
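A minimal sketch of this pre-training idea in PyTorch (illustrative only; the architecture and file name are assumptions, not the QuantNet code):

```python
import torch
import torch.nn as nn

# Pre-training transfer: train on market M1, then initialize the M2 model
# with M1's final weights instead of random values.
def make_model(n_features):
    return nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))

model_m1 = make_model(n_features=20)
# ... train model_m1 on market M1 data ...
torch.save(model_m1.state_dict(), "m1_weights.pt")

model_m2 = make_model(n_features=20)
model_m2.load_state_dict(torch.load("m1_weights.pt"))  # warm start for market M2
# ... fine-tune model_m2 on market M2 data ...
```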

3. Key contributions are: (i) a novel architecture for transfer learning across financial trading strategies; (ii) a novel learning objective to facilitate end-to-end training of trading strategies; and (iii) a demonstration that QuantNet can achieve significant improvements across global markets with an end-to-end learning system.

Deep Learning in Finance – Explainability, Fairness and Ethics

1. Four Principles of Explainable Artificial Intelligence (NIST, 2020)
2. Understanding artificial intelligence ethics and safety (Alan Turing Institute, 2019)

1. The four principles for Explainable AI:

- Explanation: Systems deliver accompanying evidence or reason(s) for all outputs.

- Meaningful: Systems provide explanations that are understandable to individual users.

- Explanation Accuracy: The explanation correctly reflects the system's process for generating the output.

- Knowledge Limits: The system only operates under conditions for which it was designed or when it reaches sufficient confidence in its output.

2. Tools used for explainability: SHAP (Shapley regression values and layer-wise relevance propagation), a unified approach to interpreting model predictions; InterpretML, an open-source tool; LIME, a model-agnostic approach; TreeInterpreters; RFEX 2.0; Alibi; DeepLift.

3. The FAST track principles:

- Fairness: Data, Design, Outcome, Implementation

- Accountability: Deserves consideration both before and after implementation

- Sustainability and Safety: Stakeholder impact assessment, accuracy, reliability, security and robustness

- Transparency: AI transparency map
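As a small illustration of one of the tools listed above (assumes the shap and scikit-learn packages; toy data, not a financial example):

```python
import shap  # assumed third-party packages, not from the sources above
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # Shapley-value attributions for tree models
shap_values = explainer.shap_values(X)  # per-feature contribution to each prediction
shap.summary_plot(shap_values, X)       # global view of which features drive the model
```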

Deep Learning in Finance – Explainability, Fairness and Ethics

1. Top 9 ethical issues in artificial intelligence (World Economic Forum, 2016)

Top 9 ethical issues in artificial intelligence:
1. Unemployment. What happens after the end of jobs?
2. Inequality. How do we distribute the wealth created by machines?
3. Humanity. How do machines affect our behaviour and interaction?
4. Artificial stupidity. How can we guard against mistakes?
5. Racist robots. How do we eliminate AI bias?
6. Security. How do we keep AI safe from adversaries?
7. Evil genies. How do we protect against unintended consequences?
8. Singularity. How do we stay in control of a complex intelligent system?
9. Robot rights. How do we define the humane treatment of AI?

Questions?