
Review

Comprehensive Review of Deep Learning Methods and Applications in Economics

Amirhosein Mosavi 1,2,* , Yaser Faghan 3, Pedram Ghamisi 4,5 , Puhong Duan 6, Sina Faizollahzadeh Ardabili 7 , Ely Salwana 8 and Shahab S. Band 9,10,*

1 Environmental Quality, Atmospheric Science and Climate Change Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam
2 Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Instituto Superior de Economia e Gestao, University of Lisbon, 1200-781 Lisbon, Portugal; [email protected]
4 Helmholtz-Zentrum Dresden-Rossendorf, Chemnitzer Str. 40, D-09599 Freiberg, Germany; [email protected]
5 Department of Physics, Faculty of Science, the University of Antwerp, Universiteitsplein 1, 2610 Wilrijk, Belgium
6 College of Electrical and Engineering, Hunan University, Changsha 410082, China; [email protected]
7 Department of Biosystem Engineering, University of Mohaghegh Ardabili, Ardabil 5619911367, Iran; [email protected]
8 Institute of IR4.0, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia; [email protected]
9 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
10 Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin 64002, Taiwan
* Correspondence: [email protected] (A.M.); [email protected] (S.S.B.)

 Received: 10 May 2020; Accepted: 15 September 2020; Published: 23 September 2020 

Abstract: The popularity of deep reinforcement learning (DRL) applications in economics has increased exponentially. DRL, through a wide range of capabilities from reinforcement learning (RL) to deep learning (DL), offers vast opportunities for handling sophisticated dynamic economics systems. DRL is characterized by scalability, with the potential to be applied to high-dimensional problems in conjunction with noisy and nonlinear patterns of economic data. In this paper, we initially provide a brief review of DL, RL, and deep RL methods in diverse applications in economics, providing an in-depth insight into the state-of-the-art. Furthermore, the architecture of DRL applied to economic applications is investigated in order to highlight the complexity, robustness, accuracy, performance, computational tasks, risk constraints, and profitability. The survey results indicate that DRL can provide better performance and higher efficiency as compared to traditional algorithms when facing real economic problems in the presence of risk parameters and ever-increasing uncertainties.

Keywords: economics; deep reinforcement learning; deep learning; mathematics; applied mathematics; survey; literature review; explainable artificial intelligence; ensemble; 5G; fraud detection; COVID-19; Prisma

1. Introduction

Deep learning (DL) techniques are based on the use of multi-neurons that rely on multi-layer architectures to accomplish a learning task.

In DL, the neurons are linked to the input data in conjunction with a training algorithm for the purpose of updating their weights and maximizing the fit to the inbound data [1,2]. In the multi-layer structure, every node takes the outputs of all the prior layers in order to represent output sets by diminishing the approximation of the primary input data, while multiple neurons learn various weights for the same data at the same time.

There is a great demand for appropriate mechanisms to improve productivity and product quality in the current market development. Compared to traditional ML algorithms, DL enables predicting and investigating complicated market trends. DL presents great potential to provide powerful tools to learn from stochastic data arising from multiple sources and can efficiently extract complicated relationships and features from the given data. DL is reported as an efficient predictive tool to analyze the market [3,4]. Additionally, compared to traditional algorithms, DL is able to prevent the over-fitting problem, to provide more efficient sample fitting associated with complicated interactions, and to outstretch input data to cover all the essential features of the relevant problem [5].

Reinforcement learning (RL) [6] is a powerful mathematical framework for experience-driven autonomous learning [7]. In RL, agents interact directly with the environment by taking actions to enhance their efficiency through trial-and-error, optimizing the cumulative reward without requiring explicit supervision. Policy search and value function approximation are critical tools of autonomous learning. The policy search in RL aims to detect an optimal (stochastic) policy by applying gradient-based or gradient-free approaches, dealing with both continuous and discrete state-action settings [8]. The value function strategy is to estimate the expected return in order to find the optimal policy dealing with all possible actions based on the given state. When considering an economic problem, in contrast to traditional approaches [9], reinforcement learning methods prevent suboptimal performance, namely, by imposing significant market constraints that lead to finding an optimal strategy in terms of market analysis and forecast [10]. Despite RL successes in recent years [11–13], these results suffer from a lack of scalability and cannot manage high-dimensional problems. The DRL technique, which combines RL and DL methods, where DL is equipped with vigorous function approximation, the representation learning properties of deep neural networks (DNN), and the handling of complex and nonlinear patterns of economic data, can efficiently overcome these problems [14,15].

Ultimately, the purpose of this paper is to comprehensively provide an overview of the state-of-the-art in the application of both DL and DRL approaches in economics. In particular, we focus on the state-of-the-art papers that employ DL, RL, and DRL methods for economic issues. The main contributions of this paper can be summarized as follows:

• Classification of the existing DL, RL, and DRL approaches in economics.
• Providing extensive insights into the accuracy and applicability of DL-, RL-, and DRL-based economic models.
• Discussing the core technologies and architecture of DRL in economic technologies.
• Proposing a general architecture of DRL in economics.
• Presenting open issues and challenges in current deep reinforcement learning models in economics.

The survey is organized as follows. We briefly review the common DL and DRL techniques in Section 2. Section 3 presents the core architecture and applicability of DL and DRL approaches in economics. Finally, we discuss and present real-world challenges of the DRL model in economics in Section 4, with a conclusion in Section 5.

2. Methodology and Taxonomy of the Survey

The survey adopts the Prisma standard to identify and review the DL and DRL methods used in economics. As stated in [16], a systematic review based on the Prisma method includes four steps: (1) identification, (2) screening, (3) eligibility, and (4) inclusion. In the identification stage, the documents are identified through an initial search among the mentioned databases. Through Thomson Reuters Web-of-Science (WoS) and Elsevier Scopus, 400 of the most relevant articles are identified. The screening step includes two stages in which, first, duplicate articles are eliminated.

As a result, 200 unique articles moved to the next stage, where the relevance of the articles is examined on the basis of their title and abstract. The result of this step was 80 articles for further consideration. The next step of the Prisma model is eligibility, in which the full text of the articles was read by the authors, and 57 of them were considered eligible for final review in this study. The last step of the Prisma model is the creation of the database of the study, which is used for qualitative and quantitative analyses. The database of the current research comprises 57 articles, and all the analyses in this study took place based on these articles. Figure 1 illustrates the steps of creating the database of the current research based on the Prisma method.

Figure 1. Diagram of the systematic selection of the survey database.

Taxonomy of the survey is given in Figure 2, where the notable DRL methods are presented.

Figure 2. Taxonomy of the notable DRL methods applied in economics: value-based methods (Q-learning, deep Q-networks (DQN), double DQN, distributional DQN), policy gradient methods (stochastic policy gradient (SPG), deterministic policy gradient (DPG), actor–critic methods, combined policy gradient and Q-learning), and model-based methods (pure model-based methods, integrating model-free and model-based methods (IMF&MBM)).

2.1. Deep Learning Methods

In this section, we review the most commonly used DL algorithms, which have been applied in various fields [16–20]. These deep networks comprise stacked auto-encoders (SAEs), deep belief networks (DBNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). A fundamental structure of a neural network is presented in Figure 3.

Figure 3. Structure of a simple neural network.

2.2. Stacked Auto-Encoders (SAEs)

The basic building block of the stacked AE is called an AE, which includes one visible input layer and one hidden layer [17]. It has two steps in the training process. Mathematically, they can be expressed as Equations (1) and (2):

$h = f(w_h x + b_h)$  (1)

$\mathcal{Y} = f(w_y h + b_y)$  (2)

The hidden layer $h$ can be transformed to provide the output value $\mathcal{Y}$. Here, $x \in \mathbb{R}^d$ represents the input values, and $h \in \mathbb{R}^L$ denotes the hidden layer, i.e., the encoder. $w_h$ and $w_y$ represent the input-to-hidden and hidden-to-input weights, respectively. $b_h$ and $b_y$ refer to the bias of the hidden and output terms, respectively, and $f(\cdot)$ indicates an activation function. One can estimate the error term utilizing the Euclidean distance for approximating the input data $x$ while minimizing $\| x - y \|_2^2$. Figure 4 presents the architecture of an AE.
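To make Equations (1) and (2) concrete, the following is a minimal Python (NumPy) sketch of a single auto-encoder forward pass and its reconstruction error. The dimensions, sigmoid activation, and random initialization are illustrative assumptions and are not taken from the reviewed studies.

import numpy as np

rng = np.random.default_rng(0)

d, L = 8, 3                      # input dimension d, hidden dimension L
x = rng.normal(size=d)           # a single input vector

# Randomly initialized parameters; in practice these are learned.
w_h, b_h = rng.normal(size=(L, d)) * 0.1, np.zeros(L)   # encoder weights/bias
w_y, b_y = rng.normal(size=(d, L)) * 0.1, np.zeros(d)   # decoder weights/bias

def f(z):
    # Sigmoid activation, one common choice for f(.)
    return 1.0 / (1.0 + np.exp(-z))

h = f(w_h @ x + b_h)             # Equation (1): hidden code
y = f(w_y @ h + b_y)             # Equation (2): reconstruction from the hidden code
error = np.sum((x - y) ** 2)     # reconstruction error ||x - y||_2^2 that training would minimize
print(h.shape, y.shape, error)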


2.3. Deep Belief Networks (DBNs)

The basic building block of deep belief networks is known as a restricted Boltzmann machine (RBM), which is a layer-wise training model [18]. It contains a two-layer network with visible and hidden units. One can express the joint configuration energy of the units as Equation (3):

$E(v, h; \theta) = -\sum_{i=1}^{d} b_i v_i - \sum_{j=1}^{L} a_j h_j - \sum_{i=1}^{d} \sum_{j=1}^{L} w_{ij} v_i h_j = -b^T v - a^T h - v^T w h$  (3)

where $b_i$ and $a_j$ are the bias terms of the visible and hidden units, respectively. Here, $w_{ij}$ denotes the weight between the visible unit $i$ and hidden unit $j$. In an RBM, the hidden units can capture an unbiased sample from the given data vector, as they are conditionally independent given the visible states. One can improve the feature representation of a single RBM by cumulating diverse RBMs one after another, which constructs a DBN for detecting a deep hierarchical representation of the training data. Figure 5 presents a simple architecture of an RBM.
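The energy of Equation (3) can be sketched directly in Python (NumPy); the binary states and small random parameters below are purely illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

d, L = 6, 4                               # number of visible and hidden units
v = rng.integers(0, 2, size=d)            # binary visible vector
h = rng.integers(0, 2, size=L)            # binary hidden vector

b = rng.normal(size=d) * 0.1              # visible biases b_i
a = rng.normal(size=L) * 0.1              # hidden biases a_j
w = rng.normal(size=(d, L)) * 0.1         # weights w_ij

# Equation (3): E(v, h; theta) = -b^T v - a^T h - v^T w h
energy = -b @ v - a @ h - v @ w @ h
print(energy)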

Figure 4. The simple architecture of an AE.


Figure 5. A simple architecture of an RBM with hidden layers.

2.4. Convolutional Neural Networks (CNNs)

CNNs are composed of a stack of periodic convolution layers and pooling layers with multiple fully connected layers. In the convolutional layer, CNNs employ a set of kernels to convolve the input data and intermediate features to yield various feature maps. In general, the pooling layer follows a convolutional layer and is utilized to diminish the dimensions of the feature maps and the network parameters. Finally, by utilizing the fully connected layers, these obtained maps can be transformed into feature vectors. We present the formula of the vital parts of CNNs, which are the convolution layers.


Assume that $X$ is the input cube with the size of $m \times n \times d$, where $m \times n$ refers to the spatial level of $X$, and $d$ counts the channels. The $j$-th filter is specified with weight $w_j$ and bias $b_j$. Then, one can express the $j$-th output associated with the convolution layer as Equation (4):

$y_j = f\left( \sum_{i=1}^{d} x_i * w_j + b_j \right), \quad j = 1, 2, \ldots, k$  (4)

Here, the activation function $f(\cdot)$ is used to enhance the network's nonlinearity. Currently, ReLU [19] is the most popular activation function, which leads to notably rapid convergence and robustness in terms of gradient vanishing [20]. Figure 6 presents the convolution procedure.
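A minimal Python (NumPy) sketch of Equation (4) is given below, computing one convolutional layer with a ReLU activation. The input size, kernel size, "valid" padding, and unit stride are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)

m, n, d, k = 8, 8, 3, 2                    # spatial size m x n, d channels, k filters
kh, kw = 3, 3                              # kernel size
X = rng.normal(size=(m, n, d))             # input cube
W = rng.normal(size=(k, kh, kw, d)) * 0.1  # k filters, each kh x kw x d
b = np.zeros(k)                            # one bias per filter

def relu(z):
    return np.maximum(z, 0.0)

# Equation (4): y_j = f( sum_i x_i * w_j + b_j ), "valid" convolution, no stride or padding
out_h, out_w = m - kh + 1, n - kw + 1
Y = np.zeros((out_h, out_w, k))
for j in range(k):
    for r in range(out_h):
        for c in range(out_w):
            patch = X[r:r + kh, c:c + kw, :]          # kh x kw x d patch
            Y[r, c, j] = relu(np.sum(patch * W[j]) + b[j])

print(Y.shape)   # (6, 6, 2) feature maps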

Figure 6. Convolution procedure: (a) kernel; (b) input; (c) output.

2.5. Recurrent Neural Networks (RNNs)

RNNs extended the conventional neural network with loops in connections and were developed in [21]. RNNs can identify patterns in sequential data and dynamic temporal specifications by utilizing recurrent hidden states, compared to the feedforward neural network. Suppose that $x$ is the input vector. The recurrent hidden state $h^{\langle t \rangle}$ of the RNN can be updated by Equation (5):

$h^{\langle t \rangle} = \begin{cases} 0 & \text{if } t = 0 \\ f_1(h^{\langle t-1 \rangle}, x^{\langle t \rangle}) & \text{otherwise} \end{cases}$  (5)

where $f_1$ denotes a nonlinear function, such as a hyperbolic tangent function. The update rule of the recurrent hidden state can be expressed as Equation (6):

$h^{\langle t \rangle} = f_1(u h^{\langle t-1 \rangle} + w x^{\langle t \rangle} + b_h)$  (6)

where $w$ and $u$ indicate the coefficient matrices for the input in the current state and for the activation of the recurrent hidden units at the prior step, respectively. $b_h$ is the bias vector. Therefore, the output $\mathcal{Y}^{\langle t \rangle}$ at time $t$ is presented as Equation (7):

$\mathcal{Y}^{\langle t \rangle} = f_2(p h^{\langle t \rangle} + b_y)$  (7)

Here, $f_2$ is the nonlinear function, $p$ is the coefficient matrix for the activation of the recurrent hidden units in the current step, and $b_y$ represents the bias vector. Due to the vanishing gradient of traditional RNNs, long short-term memory (LSTM) [22] and gated recurrent units [23] were introduced to handle huge sequential data. The convolution procedure is presented in [24]. Figure 7 illustrates a schematic for an RNN.
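The recurrence of Equations (5)–(7) can be sketched in Python (NumPy) as follows; choosing $f_1 = \tanh$, $f_2$ as the identity, and the sequence length and layer sizes below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)

d_in, d_h, d_out, T = 4, 5, 2, 6          # input, hidden, output sizes; sequence length
xs = rng.normal(size=(T, d_in))           # an input sequence x^<1..T>

u = rng.normal(size=(d_h, d_h)) * 0.1     # recurrent weights
w = rng.normal(size=(d_h, d_in)) * 0.1    # input weights
p = rng.normal(size=(d_out, d_h)) * 0.1   # output weights
b_h, b_y = np.zeros(d_h), np.zeros(d_out)

h = np.zeros(d_h)                         # Equation (5): h^<0> = 0
for x_t in xs:
    h = np.tanh(u @ h + w @ x_t + b_h)    # Equation (6): recurrent update with f1 = tanh
    y = p @ h + b_y                       # Equation (7) with f2 = identity
print(h.shape, y.shape)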

Figure 7. An illustration of RNN.

2.6. Deep Reinforcement Learning Methods

In this section, we mainly focus on the most commonly used deep RL algorithms, such as value-based methods, policy gradient methods, and model-based methods.

We first describe how to formulate the RL problem for the agent dealing with an environment while the goal is to maximize cumulative rewards. Two important characteristics of RL are as follows: first, the agent has the capability of learning good behavior incrementally, and second, the RL agent gains trial-and-error experience by only dealing with the environment and gathering information (see Figure 8). It is worth mentioning that RL methods are able to practically provide the most appropriate method in terms of computational efficiency as compared to some traditional approaches. One can model the RL problem with a Markov Decision Process (MDP) with a 5-tuple $(\mathcal{S}, \mathcal{A}, T, \mathcal{R}, \gamma)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $T \in [0,1]$ is the transition function, $\mathcal{R}$ is the reward function, and $\gamma \in [0,1)$ is the discount factor. The RL agent aims to search for the optimal expected return based on the value function $V^{\pi}(s)$ by Equation (8):

$V^{\pi}(s) = \mathbb{E}\left( \sum_{k=0}^{\infty} \gamma^{k} r_{k+t} \mid s_t = s, \pi \right)$, where $V^{*} = \max_{\pi \in \Pi} V^{\pi}(s)$  (8)

where:

$r_t = \mathbb{E}_{a \sim \pi(s_t, \cdot)} \mathcal{R}(s_t, a, s_{t+1})$  (9)

$\mathbb{P}(s_{t+1} \mid s_t, a_t) = T(s_t, a_t, s_{t+1})$ with $a_t \sim \pi(s_t, \cdot)$  (10)

Analogously, the Q value function can be expressed as Equation (11):

$Q^{\pi}(s, a) = \mathbb{E}\left( \sum_{k=0}^{\infty} \gamma^{k} r_{k+t} \mid s_t = s, a_t = a, \pi \right)$, where $Q^{*} = \max_{\pi \in \Pi} Q^{\pi}(s, a)$  (11)

One can see the general architecture of the DRL algorithms in Figure 8, adapted from [25].
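Equations (8)–(10) can be made concrete by sampling discounted returns from a toy MDP and averaging them, as in the Python (NumPy) sketch below. The two-state MDP, the uniform policy, and the rollout horizon are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)

# A toy 2-state, 2-action MDP, purely illustrative.
n_s, n_a, gamma = 2, 2, 0.9
T = np.array([[[0.8, 0.2], [0.1, 0.9]],       # T[s, a, s']
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                      # R[s, a]: reward for taking a in s
              [0.0, 2.0]])

def policy(s):
    # A fixed stochastic policy pi(s, .), assumed uniform here.
    return rng.integers(0, n_a)

def rollout_return(s0, horizon=200):
    # Sample sum_k gamma^k r_{k+t} starting from s0, as inside the expectation of Equation (8).
    s, g, disc = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        g += disc * R[s, a]
        disc *= gamma
        s = rng.choice(n_s, p=T[s, a])        # Equation (10): s' ~ T(s, a, .)
    return g

# Monte Carlo estimate of V^pi(s=0) by averaging sampled returns
v_hat = np.mean([rollout_return(0) for _ in range(2000)])
print(v_hat)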


Figure 8. Interaction between the agent and environment and the general structure of the DRL approaches.

2.6.1. Value-Based Methods

The value-based algorithms allow us to construct a value function for defining a policy. We discuss the Q-learning algorithm [26] and the deep Q-network (DQN) algorithm [6], which achieved great success when playing ATARI games. We then give a brief review of the improved DQN algorithm.

2.6.2. Q-Learning

The basic value-based algorithm is called the Q-learning algorithm. Assume Q is the value function; then, the optimal value of the Q-learning algorithm using the Bellman equation [27] can be expressed as Equation (12):

$Q^{*}(s, a) = (\mathcal{B} Q^{*})(s, a)$  (12)

where the Bellman operator $\mathcal{B}$ can be described as Equation (13):

$(\mathcal{B} K)(s, a) = \sum_{s' \in \mathcal{S}} T(s, a, s') \left( \mathcal{R}(s, a, s') + \gamma \max_{a' \in \mathcal{A}} K(s', a') \right)$  (13)

Here, the unique optimal solution of the Q value function is $Q^{*}(s, a)$. One can check out the theoretical analysis of the optimal Q function in discrete space with a sufficient exploration guarantee in [26]. In practice, a parameterized value function is able to overcome high-dimensional problems (possibly continuous space).
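A tabular sketch of Equations (12) and (13) in Python (NumPy) is given below, using sampled transitions in place of the full expectation over $T$; the toy MDP, the epsilon-greedy exploration, and the hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(5)

# A toy MDP (same shape as the one above) used to run tabular Q-learning.
n_s, n_a, gamma, alpha, eps = 2, 2, 0.9, 0.1, 0.1
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])

Q = np.zeros((n_s, n_a))
s = 0
for _ in range(50_000):
    # epsilon-greedy exploration around the current Q estimate
    a = rng.integers(n_a) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next = rng.choice(n_s, p=T[s, a])
    r = R[s, a]
    # Sample-based version of the Bellman operator in Equation (13)
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next

print(Q)   # approximates Q*(s, a) of Equation (12)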

2.6.3. Deep Q-Networks (DQN)

The DQN algorithm was presented by Mnih et al. [6] and can obtain good results for ATARI games in an online framework. In deep Q-learning, we make use of a neural net to estimate a complex, nonlinear Q-value function. Imagine the target function as Equation (14):

$Y_k^{Q} = r + \gamma \max_{a' \in \mathcal{A}} Q(s', a'; \bar{\theta}_k)$  (14)

The $\bar{\theta}_k$ parameter, which defines the values of the Q function at the $k$-th iteration, will be updated only every $A \in \mathbb{N}$ iterations to keep the stability and diminish the risk of divergence. In order to bound the instabilities, DQN uses two heuristics, i.e., the target Q-network and the replay memory [28]. Additionally, DQN takes advantage of other heuristics such as clipping the rewards for maintaining reasonable target values and for ensuring proper learning.


An interesting aspect of DQN is that a variety of deep learning techniques are used to practically improve its performance, such as the preprocessing step of the inputs, convolutional layers, and the optimization (stochastic gradient descent) [29]. We show the general scheme of the DQN algorithm [25] in Figure 9.
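The target of Equation (14) and the role of the frozen parameters $\bar{\theta}_k$ can be sketched in Python (NumPy) as follows. The linear stand-in for the Q-network and the randomly generated replay batch are assumptions made purely for illustration.

import numpy as np

rng = np.random.default_rng(6)

n_a, gamma, dim_s = 3, 0.99, 4

def q_net(states, params):
    # Stand-in for a neural network Q(s, .; theta): a single linear layer here.
    return states @ params["w"] + params["b"]

theta = {"w": rng.normal(size=(dim_s, n_a)) * 0.1, "b": np.zeros(n_a)}      # online parameters
theta_bar = {k: v.copy() for k, v in theta.items()}                         # frozen target parameters

# A mini-batch sampled from the replay memory (here: random illustrative data)
s, a = rng.normal(size=(32, dim_s)), rng.integers(0, n_a, size=32)
r, s_next = rng.normal(size=32), rng.normal(size=(32, dim_s))

# Equation (14): Y_k^Q = r + gamma * max_a' Q(s', a'; theta_bar)
y = r + gamma * np.max(q_net(s_next, theta_bar), axis=1)

# The (clipped) TD errors that the DQN loss would minimize with respect to theta
td_error = np.clip(y - q_net(s, theta)[np.arange(32), a], -1.0, 1.0)
print(td_error.shape)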

Figure 9. Basic structure of the DQN algorithm.

2.6.4. Double DQN

In Q-learning, the DQN-value function utilizes a similar amount as the Q-value in order to identify and evaluate an action, which may cause overestimated values and an upward bias in the algorithm. Thus, the double estimator method can be used for each variable to efficiently remove the positive bias in the action estimation process [30]. The Double DQN is independent of any source of error, namely, stochastic environmental error. The target value function in the double DQN (DDQN) can be described as Equation (15):

$Y_k^{DDQN} = r + \gamma Q\left(s', \arg\max_{a \in \mathcal{A}} Q(s', a; \theta_k); \bar{\theta}_k\right)$  (15)

Compared to the Q-network, DDQN is usually able to improve stability and to obtain a more accurate Q-value function as well.

2.6.5. Distributional DQN

The idea of the approaches explained in the previous subsections was to estimate the expected cumulative return. Another interesting method is to represent a value distribution, which allows us to better detect the inherent stochastic rewards and agent transitions in conjunction with the environment. One can define the random return distribution associated with policy $\pi$ as follows in Equation (16):

$Z^{\pi}(s, a) = \mathcal{R}(s, a, S') + \gamma Z^{\pi}(S', A')$  (16)

Equation (16) includes random state-action pairs $(S', A')$ with $A' \sim \pi(\cdot \mid S')$. Thus, the Q value function can be expressed as Equation (17):

$Q^{\pi}(s, a) = \mathbb{E}[Z^{\pi}(s, a)]$  (17)

In practice, the distributional Bellman equation, which interacts with deep learning, can play the role of the approximation function [31–33]. The main benefit of this approach is the implementation of risk-aware behavior [34], and improved learning provides a richer set of training signals [35].
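Equations (16) and (17) can be illustrated with a categorical (C51-style) representation of the return distribution, sketched below in Python (NumPy); the support, probabilities, and reward are made up for demonstration and are not taken from the reviewed works.

import numpy as np

# A categorical return distribution Z^pi(s, a) over fixed support atoms (one common representation).
atoms = np.linspace(-10.0, 10.0, 51)          # support of the value distribution
probs = np.ones_like(atoms) / atoms.size      # illustrative probabilities p(z | s, a)

# Equation (17): the usual Q value is the expectation of the return distribution
q_value = np.sum(atoms * probs)

# Equation (16) backs up the whole distribution: the support is shifted and scaled by r + gamma * z
r, gamma = 1.0, 0.99
backed_up_support = r + gamma * atoms
print(q_value, backed_up_support[:3])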
2.7. Policy Gradient Methods

This section discusses policy gradient (PG) methods, which are frequently used algorithms in reinforcement learning [36] and belong to the class of policy-based methods. The method is to find a neural-network-parameterized policy in order to maximize the expected cumulative reward [37].


2.7.1. Stochastic Policy Gradient (SPG)

The easiest approach to obtain the policy gradient estimator could be to utilize the algorithm of [38]. The general approach to derive the estimated gradient is shown in Equation (18):

$\nabla_{\omega} \pi_{\omega}(s, a) = \pi_{\omega}(s, a) \nabla_{\omega} \log(\pi_{\omega}(s, a))$  (18)

while

$\nabla_{\omega} V^{\pi_{\omega}}(s_0) = \mathbb{E}_{s \sim \rho^{\pi_{\omega}},\, a \sim \pi_{\omega}}\left[ \nabla_{\omega}(\log \pi_{\omega}(s, a))\, Q^{\pi_{\omega}}(s, a) \right]$  (19)

Note that, in these methods, the policy evaluation estimates the Q-function, and the policy improvement optimizes the policy by taking a gradient step, utilizing the value function approximation. The easy way of estimating the Q-function is to exchange it with a cumulative return from entire trajectories. A value-based method such as the actor–critic method can be used to estimate the return efficiently. In general, an entropy function can be used for policy randomness and for efficient exploration. Additionally, it is common to employ an advantage value function, which measures the comparison to the expected return for each action. In practice, this replacement improves numerical efficiency.
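Equations (18) and (19) can be illustrated in Python (NumPy) with a linear-softmax policy; the sampled states, actions, and return estimates below are random stand-ins, so this is a sketch of the gradient estimator rather than a working trading agent.

import numpy as np

rng = np.random.default_rng(8)
dim_s, n_a = 4, 3

def softmax_policy(s, omega):
    logits = s @ omega                       # omega has shape (dim_s, n_a)
    z = np.exp(logits - logits.max())
    return z / z.sum()

def grad_log_pi(s, a, omega):
    # d/d_omega log pi_omega(s, a) for a linear-softmax policy (the log form of Equation (18))
    pi = softmax_policy(s, omega)
    g = -np.outer(s, pi)
    g[:, a] += s
    return g

omega = np.zeros((dim_s, n_a))

# One illustrative batch of state-action pairs and the returns that followed them.
states = rng.normal(size=(10, dim_s))
actions = rng.integers(0, n_a, size=10)
returns = rng.normal(size=10)                # stand-in for Q^{pi_omega}(s, a) estimates

# Equation (19): average of grad log pi(s, a) * Q(s, a) over sampled state-action pairs
grad = np.mean([grad_log_pi(s, a, omega) * q
                for s, a, q in zip(states, actions, returns)], axis=0)
omega += 0.01 * grad                         # one gradient ascent step on V^{pi_omega}
print(grad.shape)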

2.7.2. Deterministic Policy Gradient (DPG)

The DPG approach is the expected gradient of the action–value function. The deterministic policy gradient can be approximated without using an integral term over the action space. It can be demonstrated that DPG algorithms can perform better than SPG algorithms in high-dimensional action spaces [39]. The Deep Deterministic Policy Gradient (DDPG) [39] and the Neural Fitted Q Iteration with Continuous Actions (NFQCA) [40] algorithms, with their direct representation of a policy, can resolve the restriction of the NFQ and DQN algorithms to discrete actions. An approach proposed by [41] was developed to overcome the global optimization problem while updating the greedy policy at each step. They defined a differentiable deterministic policy that can be moved in the gradient direction of the value function for deriving the DDPG algorithm, Equation (20):

$\nabla_{\omega} V^{\pi_{\omega}}(s_0) = \mathbb{E}_{s \sim \rho^{\pi_{\omega}}}\left[ \nabla_{\omega}(\pi_{\omega}) \nabla_{a}(Q^{\pi_{\omega}}(s, a)) \big|_{a = \pi_{\omega}(s)} \right]$  (20)

which shows that Equation (20) is based on $\nabla_{a}(Q^{\pi_{\omega}}(s, a))$.

2.7.3. Actor–Critic Methods

An actor–critic architecture is a common approach where the actor updates the policy distribution with policy gradients, and the critic estimates the value function for the current policy [42], Equation (21).

$\nabla_{\omega} V^{\pi_{\omega}}(s_0) = \mathbb{E}_{s \sim \rho^{\pi_{\beta}},\, a \sim \pi_{\beta}}\left[ \nabla_{\omega}(\log \pi_{\omega}(s, a))\, Q^{\theta}(s, a) \right]$  (21)

where $\beta$ is the behavior policy that makes the gradient biased, and the critic with parameter $\theta$ estimates the value function, $Q(s, a; \theta)$, with the current policy $\pi$. In deep reinforcement learning, the actor–critic functions can be parameterized with nonlinear neural networks [36]. The approach proposed by Sutton [7] was quite simple but not computationally efficient. The idea is to design an architecture that profits from the reasonably fast reward propagation, the stability, and the capability of using replay memory. However, the new approach utilized in the actor–critic framework presented by Wang et al. [43] and Gruslys et al. [44] has sample efficiency and is computationally efficient as well.
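A minimal on-policy actor–critic loop in the spirit of Equation (21) is sketched below in Python (NumPy), with a linear-softmax actor and a linear critic; the toy environment, learning rates, and reward rule are illustrative assumptions, and the off-policy behavior policy $\beta$ of Equation (21) is not modeled here.

import numpy as np

rng = np.random.default_rng(9)
dim_s, n_a, gamma = 4, 2, 0.95
alpha_actor, alpha_critic = 0.01, 0.05

omega = np.zeros((dim_s, n_a))        # actor parameters
theta = np.zeros((dim_s, n_a))        # critic parameters: Q(s, a; theta) = s . theta[:, a]

def pi(s):
    logits = s @ omega
    z = np.exp(logits - logits.max())
    return z / z.sum()

def grad_log_pi(s, a):
    p = pi(s)
    g = -np.outer(s, p)
    g[:, a] += s
    return g

s = rng.normal(size=dim_s)
for _ in range(1000):
    a = rng.choice(n_a, p=pi(s))
    # Illustrative environment: random next state, reward favouring action 0.
    s_next, r = rng.normal(size=dim_s), 1.0 if a == 0 else 0.0
    a_next = rng.choice(n_a, p=pi(s_next))

    q_sa = s @ theta[:, a]
    td_err = r + gamma * (s_next @ theta[:, a_next]) - q_sa
    theta[:, a] += alpha_critic * td_err * s          # critic: TD(0) update of Q(s, a; theta)
    omega += alpha_actor * grad_log_pi(s, a) * q_sa   # actor: gradient step as in Equation (21)
    s = s_next

print(pi(rng.normal(size=dim_s)))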

2.7.4. Combined Policy Gradient and Q-Learning

In order to improve the policy strategy in RL, an efficient technique needs to be engaged, such as a policy gradient, applying a sample-efficient approach and value function approximation associated with the policy. These algorithms enable us to work with continuous action spaces, to construct policies for explicit exploration, and to apply the policies to multiagent settings where the problem deals with the stochastic optimal policy. The idea of combining policy gradient methods with optimal policy Q-learning was proposed by O'Donoghue et al. [45] by summing the policy gradient with an entropy function, Equation (22):

$\nabla_{\omega} V^{\pi_{\omega}}(s_0) = \mathbb{E}_{s,a}\left[ \nabla_{\omega}(\log \pi_{\omega}(s, a))\, Q^{\pi_{\omega}}(s, a) \right] + \alpha \mathbb{E}_{s}\left[ \nabla_{\omega} H^{\pi_{\omega}}(s) \right]$  (22)

where

$H^{\pi}(s) = -\sum_{a} \pi(s, a) \log \pi(s, a)$  (23)

It was shown that, in some specific settings, both value-based and policy-based approaches have a fairly similar structure [46–48].
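As a small check on the entropy term of Equation (23), the following Python (NumPy) sketch evaluates it for two illustrative action distributions; the probabilities are made up for demonstration.

import numpy as np

def entropy(pi_s):
    # Equation (23): H^pi(s) = -sum_a pi(s, a) log pi(s, a)
    pi_s = np.asarray(pi_s)
    return -np.sum(pi_s * np.log(pi_s))

# A peaked policy has low entropy; a uniform policy attains the maximum, log(|A|).
print(entropy([0.97, 0.01, 0.01, 0.01]), entropy([0.25, 0.25, 0.25, 0.25]), np.log(4))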

2.8. Model-Based Methods

So far, we have discussed the value-based and policy-based methods, which belong to the model-free approach. In this section, we focus on the model-based approach, where the model deals with the dynamics of the environment and the reward function.

2.8.1. Pure Model-Based Methods

When the explicit model is not known, it can be learned from experience by function approximators [49–51]. The model plays the role of the actual environment to recommend an action. The common approach in the case of discrete actions is lookahead search, while trajectory optimization can be utilized in the continuous case. A lookahead search generates potential trajectories, with the difficulty of the exploration and exploitation trade-off in sampling trajectories. The popular approaches for lookahead search are Monte Carlo tree search (MCTS) methods, such that the MCTS algorithm recommends an action (see Figure 10). Recently, instead of using explicit tree search techniques, learning an end-to-end model was developed [52] with improved sample efficiency and performance as well. Lookahead search techniques are useless in a continuous environment. Another approach is PILCO, i.e., applying Gaussian processes in order to produce a probabilistic model with reasonable sample efficiency [53]. However, Gaussian processes are reliable only in low-dimensional problems. One can take advantage of the generalization capabilities of DL approaches to build the model of the environment in higher dimensions. For example, a DNN can be utilized in a latent state space [54]. Another approach aims at leveraging trajectory optimization, such as guided policy search [55], by taking a few sequences of actions and then learning the policy from these sequences. The illustration of the MCTS process is presented in Figure 10 and was adopted from [25].
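A much-simplified stand-in for lookahead search with a learned model is sketched below in Python (NumPy): trajectories are generated inside the model by random shooting, and the first action of the best-scoring batch is recommended. The "learned" model here is randomly generated for illustration, and this sketch is not MCTS itself.

import numpy as np

rng = np.random.default_rng(10)
n_s, n_a, gamma, horizon, n_rollouts = 3, 2, 0.95, 10, 200

# A learned (here: made-up) model of the environment: transition probabilities and rewards.
T_hat = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))     # T_hat[s, a] is a distribution over s'
R_hat = rng.normal(size=(n_s, n_a))

def rollout_value(s, first_action):
    # Simulate one trajectory inside the model, starting with a fixed first action.
    g, disc, a = 0.0, 1.0, first_action
    for _ in range(horizon):
        g += disc * R_hat[s, a]
        s = rng.choice(n_s, p=T_hat[s, a])
        disc *= gamma
        a = rng.integers(n_a)                             # random policy for the remaining steps
    return g

def lookahead_action(s):
    # Score each candidate first action by averaged simulated returns, then act greedily.
    scores = [np.mean([rollout_value(s, a) for _ in range(n_rollouts)]) for a in range(n_a)]
    return int(np.argmax(scores))

print(lookahead_action(0))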

2.8.2. Integrating Model-Free and Model-Based Methods (IMF&MBM)

The choice of model-free versus model-based approaches mainly depends on the model architecture, such as the policy and value function. As an example to clearly explain the key point, assume that an agent needs to cross a street randomly, while the best choice is to take the step unless something unusual happens in front of the agent. In this situation, using the model-based approach may be problematic due to the randomness of the model, while a model-free approach to find the optimal policy is highly recommended. There is a possibility of integrating planning and learning to produce a practical algorithm which is computationally efficient. In the absence of a model where only a limited number of trajectories is known, one approach is to build an algorithm to generalize or to construct a model to generate more samples for model-free problems [56]. Another choice could be utilizing a model-based approach to accomplish primary tasks and applying model-free fine-tuning to achieve the goal successfully [57]. Tree search techniques employ a model directly [58]. The notion of a neural network is able to combine these two approaches. The model proposed by Heess et al. [59] engaged a backpropagation algorithm to estimate a value function. Another example is the work presented by Schema Networks [60], which uses a prolific structured architecture that leads to robust generalization.


Figure 10. Illustration of the MCTS process.

3. Review Section

This section provides an overview of various interesting uses of both DL and deep RL approaches in economics.

3.1. Deep Learning Application in Economics

The recent attractive applications of deep learning in a variety of economics domains are discussed in this section.

3.1.1. Deep Learning in Stock Pricing

From an economic point of view, the stock market value and its development are essential to business growth. In the current economic situation, many investors around the world are interested in the stock market in order to receive a quicker and better return compared to other sectors. The presence of uncertainty and risk in the forecasting of stock pricing brings challenges to researchers designing a market model for prediction. Despite all the advances in developing mathematical models for forecasting, they are still not that successful [61]. Deep learning attracts scientists and practitioners as it is useful for generating high revenue while enhancing prediction accuracy. Table 1 presents recent research.

Table 1. Application of deep learning in stock price prediction.

Reference | Methods | Application
[62] | Two-streamed network | Deep learning framework for stock value prediction
[63] | Filtering methods | Novel filtering approach
[64] | Pattern techniques | Pattern matching algorithm for forecasting the stock value
[65] | Multilayer deep approach | Advanced DL framework for the stock value price

According to Table 1, Minh et al. [62] presented a more realistic framework for forecasting stock price movement concerning financial news and a sentiment dictionary, as previous studies mostly relied on an inefficient sentiment dataset, which is crucial in stock trends and led to poor performance. They proposed the Two-stream Gated Recurrent Unit (TGRU) using deep learning techniques, which performs better than the LSTM model, as it takes advantage of applying two states that enable the model to provide much better information. They presented a sentiment Stock2Vec embedding with proof of the model's robustness in terms of market risk while using Harvard IV-4. Additionally, they provided a simulation system for investors in order to calculate their actual return. Results were evaluated using accuracy, precision, and recall values to compare the TGRU and LSTM techniques with GRU. Figure 11 presents the relative percentage of performance factors for TGRU and LSTM in comparison with that for GRU.

Figure 11. Performance factors for comparing TGRU and LSTM with GRU.

As is clear from Figure 11, TGRU presents a higher improvement in relative values for the performance factors in comparison with LSTM. Also, TGRU provides a higher improvement in recall values.

Song et al. [63] presented work applying spotlighted deep learning techniques for forecasting stock trends. The research developed a deep learning model with a novel input feature mainly focused on filtering techniques in terms of delivering better training accuracy. Results were evaluated by accuracy values in a training step. Comparing profit values for the developed approaches indicated that the novel filtering technology and stock price model employing 715 features provided the highest return value, by more than $130. Figure 12 presents a visualized comparison of accuracy values. The comparative analysis for the methods developed by Song et al. [63] is presented in Figure 12.

Figure 12. Comparing accuracy values for the methods.

Based on Figure 12, the highest accuracy is related to a simple feature model without filtering, and the lowest accuracy is related to a novel feature model with plunge filtering. By considering the trend, it can be concluded that the absence of the filtering process has a considerable effect on increasing the accuracy.

In another study, Go and Hong [64] employed the DL technique to forecast stock value streams while analysing the pattern in stock prices. The study designed a DNN deep learning algorithm to find the pattern utilizing the time series technique, which had a high accuracy performance. Results were evaluated by the percentage of test sets of 20 companies. The accuracy value for the DNN was calculated to be 86%. However, the DNN had some disadvantages, such as overfitting and complexity. Therefore, it was proposed to employ CNN and RNN.

In the study by Das and Mishra [65], a new multilayer deep learning approach was used by employing the time series concept for data representation to forecast the close price of a current stock. Results were evaluated by prediction error and accuracy values compared to the results obtained from related studies. Based on the results, the prediction error was very low based on the outcome graph, and the predicted price was fairly close to the reliable price in the time series data. Figure 13 provides a comparison of the proposed method with similar methods reported by different studies in terms of accuracy. Figure 13 reports the comparative analysis from Das and Mishra [65].

Figure 13. The accuracy values.

Based on Figure 13, the proposed method employing the related dataset could considerably improve the accuracy value by about 34%, 13%, and 1.5% compared with Huynh et al. [66], Mingyue et al. [67], and Weng et al. [68], respectively.

The approach used in [62] has the strong advantage of utilizing forward and backward learning states at the same time to present more useful information. Part of the mathematical forward pass formula for the update gate is given in Equation (24):

$\vec{z}_t = \sigma(\vec{W}_z x_t + \vec{U}_z \vec{h}_{t-1} + \vec{b}_z)$  (24)

and the backward pass formula is shown in Equation (25):

$\overleftarrow{z}_t = \sigma(\overleftarrow{W}_z x_t + \overleftarrow{U}_z \overleftarrow{h}_{t-1} + \overleftarrow{b}_z)$  (25)

where $x_t$ is the input vector and $b$ is the bias. $\sigma$ denotes the logistic function, $h_t$ is the hidden activation, and $W$ and $U$ are the weights. In the construction of TGRU, both forward and backward passes are linked into a single context for the stock forecasting, which led to dramatically enhanced prediction accuracy by applying more efficient financial indicators regarding financial analysis. The whole architecture of the proposed model, the TGRU structure, is presented in Figure 14, adopted from [62].
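The update gates of Equations (24) and (25) can be sketched in Python (NumPy) as follows, computing forward and backward gates over a sequence and concatenating the two streams; the dimensions, parameter initialization, and the deliberately simplified hidden-state update are assumptions for illustration only and do not reproduce the full TGRU of [62].

import numpy as np

rng = np.random.default_rng(11)
d_in, d_h, T = 4, 5, 7
xs = rng.normal(size=(T, d_in))                       # an input sequence

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params():
    return {"W": rng.normal(size=(d_h, d_in)) * 0.1,
            "U": rng.normal(size=(d_h, d_h)) * 0.1,
            "b": np.zeros(d_h)}

fwd, bwd = init_params(), init_params()               # separate forward/backward parameters

def update_gates(sequence, p):
    # z_t = sigma(W_z x_t + U_z h_{t-1} + b_z), Equations (24)/(25); h is a plain tanh state here.
    h, gates = np.zeros(d_h), []
    for x_t in sequence:
        z_t = sigmoid(p["W"] @ x_t + p["U"] @ h + p["b"])
        gates.append(z_t)
        h = np.tanh(p["W"] @ x_t + p["U"] @ h + p["b"])   # simplified state update for illustration
    return np.stack(gates)

z_forward = update_gates(xs, fwd)                     # forward pass over x_1 .. x_T
z_backward = update_gates(xs[::-1], bwd)[::-1]        # backward pass over the reversed sequence
combined = np.concatenate([z_forward, z_backward], axis=1)   # the two streams linked per time step
print(combined.shape)                                 # (T, 2 * d_h)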


Figure 14. Illustration of the TGRU structure.

3.1.2. Deep Learning in Insurance

Another application of DL methods is the insurance sector. One of the challenges of insurance companies is to efficiently manage fraud detection (see Table 2). In recent years, ML techniques have been widely used to develop practical algorithms in this field due to the high market demand for new approaches, compared with traditional methods, to practically measure all types of risks (Brockett et al. 2002; Pathak et al. 2005; Derrig 2002). For instance, there is a great demand for car insurance that forces companies to find novel strategies in order to ameliorate and upgrade their systems. Table 2 summarizes the most notable studies on the application of DL techniques in insurance.

Table 2. Application of deep learning in the insurance industry.

Reference | Methods | Application
[69] | Cycling algorithms | Fraud detection in car insurance
[70] | LDA-based approach | Insurance fraud
[71] | Autoencoder technique | Evaluation of risk in car insurance

Bodaghi and Teimourpour [69] proposed a new method to detect professional fraud in car insurance for big data by using social network analysis. Their approach employed cycling, which plays crucial roles in network systems, to construct an indirect collisions network and to then identify doubtful cycles in order to make more profit concerning a more realistic market assumption. Fraud detection may affect pricing strategies and long-term profit while dealing with the insurance industry. Evaluation of the methods for suspicious components was performed by the probability of being fraudulent in the actual data. Fraud probability was calculated for different numbers of nodes in various community IDs and cycle IDs. Based on the results, the highest fraud probability for a community was obtained at node number 10, at 3.759, and the lowest fraud probability for a community was obtained at node number 25, at 0.638. Also, the highest fraud probability for a cycle was obtained at node number 10, at 7.898, which was about 110% higher than that for the community, and the lowest fraud probability for the cycle was obtained at node number 12, at 1.638, which was about 156% higher than that for the community.

Recently, a new deep learning model was presented to investigate fraud in car insurance by Wang and Xu [70].

Recently, a new deep learning model was presented by Wang and Xu [70] to investigate fraud in car insurance. The proposed model outperforms the traditional method: a combination of latent Dirichlet allocation (LDA) [60] and the DNN technique is utilized to extract the text features of the accidents, comprising traditional features and text features. Another important topic that brings interest to the insurance industry is telematics devices, which deal with efficiently detecting hidden information inside the data. Results were evaluated by accuracy and precision performance factors in two scenarios, "with LDA" and "without LDA", to consider the effect of LDA on the prediction process. Figure 14 presents the visualized results. A comparative analysis of SVM, RF, and DNN by Wang and Xu [70] is illustrated in Figure 15.

Figure 15. Comparative analysis of SVM, RF, and DNN.

According to Figure 15, LDA has a positive effect on the accuracy of DNN and RF: it increased the accuracy of DNN and RF by about 7% and 1%, respectively, but reduced the accuracy of SVM by about 1.2%. On the other hand, LDA increased the precision of SVM, RF, and DNN by about 10%, 1.2%, and 4%, respectively.

Recent work proposed an algorithm combining an auto-encoder technique with telematics data to forecast the risk associated with insurance customers [71]. To efficiently deal with a large dataset, one requires powerful, up-to-date tools for detecting valuable information, such as telematics devices [71]. The work utilized a conceptual model in conjunction with telematics technology to forecast the risk (see Figure 16), where the risk score (RS) is calculated by Equation (26):

(RS)_j = \sum_i W_{c_i} O_{ij}, \quad \text{where} \quad W_{c_i} = \frac{\sum_j O_{ij}}{\sum_i \sum_j O_{ij}}  (26)

where W_{c_i} and O_{ij} represent the risk weight and driving style, respectively. Their experimental results showed the superiority of their proposed approach (see Figure 16, adopted from [71]).
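To make Equation (26) concrete, the following minimal NumPy sketch computes the risk score from a small, invented driving-style matrix O (rows: driving styles, columns: customers); the data and the exact interpretation of O are assumptions, not values from [71].

```python
# Numeric sketch of Equation (26): per-customer risk score from driving-style counts.
import numpy as np

O = np.array([[10.0, 2.0, 5.0],   # rows: driving styles (e.g., harsh braking, speeding, ...)
              [ 1.0, 8.0, 3.0],   # columns: customers
              [ 4.0, 4.0, 2.0]])

W_c = O.sum(axis=1) / O.sum()      # W_ci = sum_j O_ij / sum_i sum_j O_ij
RS = W_c @ O                       # (RS)_j = sum_i W_ci * O_ij
print(np.round(W_c, 3), np.round(RS, 3))
```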

Figure 16. The graph of the conceptual model.


3.1.3. Deep Learning in Auction Mechanisms

Auction design has major importance in practice, allowing organizations to present better services to their customers. A great challenge in learning a trustable auction is that its bidders require an optimal strategy for maximizing profit. In this direction, Myerson designed an optimal auction with only a single item [72]. There are many works with results for single bidders, but most often with only partial optimality [73–75]. Table 3 presents the notable studies that develop DL techniques for auction mechanisms.

Table 3. Application of deep learning in auction design.

Reference | Methods | Application
[76] | Augmented Lagrangian technique | Optimal auction design
[77] | Extended RegretNet method | Maximized return in auction
[78] | Data-driven method | Mechanism design in auction
[79] | Multi-layer neural network method | Auction in mobile networks

Dütting et al. [80] designed a compatible auction with multiple bidders that maximizes profit by applying multi-layer neural networks to encode its mechanisms. The proposed method was able to solve much more complex tasks than the LP-based approach by using the augmented Lagrangian technique. Results were evaluated by comparing the total revenue. Beyond all previous results, the proposed approach had great capability to be applied to large settings with high profit and low regret. Figure 17 presents the revenue outcomes for the study by Dütting et al. [80].

Figure 17. The comparison of revenue for the study.

According to Figure 17, RegretNet, as the proposed technique, increased the revenue by about 2.32% and 2% compared with VVCA and AMAbsym, respectively. Another study used the deep learning approach to extend the result in [76] in terms of both budget constraints and Bayesian compatibility [77]. The method demonstrated that neural networks are able to efficiently design novel revenue-optimal auctions by focusing on multiple-setting problems with different valuation distributions. Additionally, a new method proposed by [78] improved the result in [80] by constructing different mechanisms to apply DL techniques. The approach makes use of a strategy under the assumption that multiple bids can be applied to each bidder. Another attractive approach, applied to mobile blockchain networks, constructed an effective auction using a multi-layer neural network technique [79]. The neural networks, trained by formulating the parameters, maximize the profit of the problem and considerably outperformed the baseline approach. The main recent works mentioned in Table 3 indicate that this field is growing fast.


The proposed approach in [77] modified the regret definition in order to handle budget constraints while designing an auction with multiple item settings (see Figure 18). The main expected regret is shown in Equation (27):

\mathrm{rgt}_i = \mathbb{E}_{t_i \sim F_i}\Big[\max_{t'_i \in \mathcal{T}_i} \chi\big(p_i(t'_i) \le b_i\big)\,\big(\mathcal{U}_i(t'_i, t_i) - \mathcal{U}_i(t_i, t_i)\big)\Big]  (27)

where χ is the indicator function, \mathcal{U} is the interim utility, and p is the interim payment. The method improved the state-of-the-art by utilizing DL concepts to optimally design an auction applied to multiple items with high profit. Figure 18 represents an adaptation of RegretNet [77].
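The expected regret in Equation (27) can be estimated by Monte Carlo. The sketch below does this for a single bidder with placeholder utility and payment functions and a uniform value distribution; all of these stand in for the outputs of a trained budgeted RegretNet and are assumptions, not the paper's specification.

```python
# Hedged Monte Carlo sketch of the expected-regret term in Equation (27)
# for one bidder, over a finite set of candidate misreports.
import numpy as np

rng = np.random.default_rng(0)

def utility(misreport, true_value):   # placeholder interim utility U_i(t'_i, t_i)
    return true_value * min(misreport / 10.0, 1.0) - 0.5 * misreport

def payment(misreport):               # placeholder interim payment p_i(t'_i)
    return 0.5 * misreport

def expected_regret(budget, candidate_reports, n_samples=1000):
    regrets = []
    for _ in range(n_samples):
        t = rng.uniform(0.0, 10.0)    # t_i ~ F_i (uniform assumed here)
        truthful = utility(t, t)
        # chi(p_i(t') <= b_i): only budget-feasible misreports are counted
        gains = [utility(r, t) - truthful
                 for r in candidate_reports if payment(r) <= budget]
        best = max(gains, default=0.0)
        regrets.append(max(best, 0.0))   # truthful reporting always gives regret >= 0
    return float(np.mean(regrets))

print(expected_regret(budget=3.0, candidate_reports=np.linspace(0, 10, 21)))
```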

Figure 18. Illustration of a budgeted RegretNet.

3.1.4. Deep Learning in Banking and Online Markets

With current technological improvements, fraud detection is a challenging application of deep learning, namely in online shopping and credit cards. There is high market demand to construct an efficient fraud detection system in order to keep the involved systems safe (see Table 4).

Table 4. Application of deep learning in the banking system and online market.

Reference | Methods | Application
[81] | AE | Fraud detection in unbalanced datasets
[82] | Network topology | Credit card transactions
[83] | Natural language processing | Anti-money laundering detection
[84] | AE and RBM architecture | Fraud detection in credit cards

Unsupervised learning can be used to investigate online transactions, due to the variable patterns of fraud and changes in customer behavior. An interesting work relied on employing deep learning methods such as AE and RBM to distinguish irregularity from regular patterns by rebuilding regular transactions in real time [84]. They applied fundamental experiments to confirm that AE and RBM approaches are able to accurately detect credit card fraud using a huge dataset. Although deep learning

approaches enable us to fairly detect the fraud problem in credit cards, model building makes use of diverse parameters that affect its outcomes. The work of Abhimanyu [82] evaluated commonly used deep learning methods to efficiently check previous fraud detection problems in terms of class inconsistency and scalability. Comprehensive advice was provided by the authors regarding the analysis of model parameter sensitivity and its tuning when applying neural networks to fraud detection problems in credit cards. Another study [81] designed an autoencoder algorithm in order to model fraudulent activities, as efficient automated tools are needed to accurately handle the huge number of daily transactions around the world. The model enables investigators to report on unbalanced datasets without the need for data-balancing approaches such as under-sampling. One of the world's largest illicit industries is money laundering, the unlawful process of hiding the original source of unlawfully received money by transmitting it through complicated banking transactions. A recent work on anti-money laundering detection designed a new framework using natural language processing (NLP) technology [83]. Here, the main reason for constructing a deep learning framework is to decrease the cost of human capital and time consumption. The distributed and scalable method (e.g., NLP) performs complex mechanisms associated with various data sources, such as news and tweets, in order to make decisions simpler while providing more records. It is a great challenge to design a practical algorithm that prevents fraudulent credit card transactions in financial sectors. The work in [81] presented an efficient algorithm that has the advantage of handling unbalanced data compared to traditional algorithms, where anomaly detection is handled by the reconstruction loss function. We depict their proposed AE algorithm in Algorithm 1 [81].

Algorithm 1. AE pseudo-algorithm.

Input: matrix X // input dataset; parameters (w, b_x, b_h)
Step 1: Prepare the input data, where w are the weights between layers, b_x the encoder parameters, and b_h the decoder parameters.
Step 2: Initialize variables:
    h <- null    // vector for the hidden layer
    X' <- null   // reconstructed X
    L <- null    // vector for the loss function
    l <- batch number
    i <- 0
Step 3: Loop statement:
    while i < l do
        h = f(p[i].w + p[i].b_x)    // encoder maps an input X to the hidden representation h
        X' = g(p[i].h + p[i].b_h)   // decoder maps the hidden representation h back to a reconstruction X'
        L = -sum(x * log(X') + (1 - x) * log(1 - X'))   // for nonlinear reconstruction: cross-entropy loss
        L = sum((x - X')^2)                             // for linear reconstruction: squared-error loss
        min θ[i] = L(x, X')
    end while
    return θ
Step 4: Output: θ // objective function. Training an auto-encoder involves finding the parameters θ = (w, b_x, b_h) that minimize the reconstruction loss on the given dataset X.
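For readers who prefer runnable code, the following Keras sketch captures the same idea as Algorithm 1: fit an auto-encoder with a squared-error reconstruction loss and flag transactions whose reconstruction error exceeds a quantile threshold. The data, network sizes, and threshold are illustrative assumptions rather than the configuration used in [81].

```python
# Minimal auto-encoder sketch for reconstruction-based anomaly (fraud) detection.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 30)).astype("float32")   # stand-in transaction features

inputs = tf.keras.Input(shape=(30,))
h = tf.keras.layers.Dense(16, activation="relu")(inputs)   # encoder
h = tf.keras.layers.Dense(8, activation="relu")(h)         # bottleneck
h = tf.keras.layers.Dense(16, activation="relu")(h)        # decoder
outputs = tf.keras.layers.Dense(30, activation="linear")(h)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")           # squared-error reconstruction loss
autoencoder.fit(X, X, epochs=5, batch_size=256, verbose=0)

reconstruction_error = np.mean((X - autoencoder.predict(X, verbose=0)) ** 2, axis=1)
threshold = np.quantile(reconstruction_error, 0.99)          # flag the top 1% as suspicious
print("flagged:", int((reconstruction_error > threshold).sum()))
```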

3.1.5. Deep Learning in Macroeconomics

Macroeconomic prediction approaches have gained much interest in recent years, as they are helpful for investigating economic growth and business changes [85]. There are many proposed methods that can forecast macroeconomic indicators, but these approaches require huge amounts of data and suffer from model dependency. Table 5 shows recent results that are more acceptable than the previous ones.

Table 5. Application of deep learning in macroeconomics.

Reference | Methods | Application
[86] | Encoder-decoder | Indicator prediction
[87] | Backpropagation approach | Forecasting inflation
[88] | Feed-forward neural network | Asset allocation

Application of deep learning in macroeconomics has been growing exponentially during the past few years [89–91]. Smalter and Cook [92] presented a new robust model called an encoder–decoder that makes use of a deep neural architecture to increase prediction accuracy with a low data demand for unemployment problems. The mean absolute error (MAE) was employed for evaluating the results. Figure 19 visualizes the average MAE values obtained by Smalter and Cook [88], comparing CNN, LSTM, AE, DARM, and the Survey of Professional Forecasters (SPF).

Figure 19. Average MAE reports.

According to Figure 19, the lowest average MAE is related to AE, followed by SPF, with additional advantages such as good single-series efficiency, higher prediction accuracy, and better model specification. Haider and Hanif [87] employed an ANN method to forecast macroeconomic indicators for inflation. Results were evaluated by RMSE values comparing the performance of ANN, AR, and ARIMA techniques. Figure 20 presents the average RMSE values for comparison. According to Figure 20, the study, with model simulation based on a backpropagation approach, outperformed previous models such as the autoregressive integrated moving average (ARIMA) model. Another useful application of DL architecture is investment decisions in conjunction with macroeconomic data. Chakravorty et al. [88] used a feed-forward neural network to perform tactical asset allocation using macroeconomic indicators and price-volume trends. They proposed two different methods to build a portfolio: the first estimates expected returns and uncertainty, and the second obtains the allocation directly using a neural network architecture and optimizes the portfolio Sharpe ratio. Their methods, with the adopted trading strategy, demonstrated achievement comparable with previous results.
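The two error measures used throughout this subsection are straightforward to compute; the short sketch below evaluates MAE and RMSE for two invented forecast series against an invented actual series, purely to fix notation.

```python
# MAE and RMSE for comparing forecasting models (illustrative data only).
import numpy as np

actual     = np.array([4.1, 4.3, 4.6, 4.9, 5.2])   # e.g., an unemployment or inflation rate
forecast_a = np.array([4.0, 4.4, 4.5, 5.0, 5.1])   # hypothetical model A
forecast_b = np.array([4.3, 4.6, 4.2, 5.3, 5.6])   # hypothetical model B

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

for name, f in [("A", forecast_a), ("B", forecast_b)]:
    print(name, "MAE:", round(mae(actual, f), 3), "RMSE:", round(rmse(actual, f), 3))
```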

Figure 20. Average RMSE reports.


A new technique was used in [86] to enhance the indicator prediction accuracy; the model requires little data. Experimental results indicate that the encoder–decoder outperformed the highly cited SPF prediction. The results in Table 6 show that the encoder–decoder is more responsive than the SPF prediction, i.e., more adaptable to data changes [86].

Table 6. Inflection point prediction for Unemployment around 2007.

Time Horizon | SPF | Encoder–Decoder
3-month horizon model | Q3 2007 | Q1 2007
6-month horizon model | Q3 2007 | Q2 2007
9-month horizon model | Q2 2007 | Q3 2007
12-month horizon model | Q3 2008 | Q1 2008

3.1.6. Deep Learning in Financial Markets (Services and Risk Management)

In financial markets, it is crucial to efficiently handle the risk arising from credit. Due to recent advances in big data technology, DL models can provide a reliable financial model to forecast credit risk in banking systems (see Table 7).

Table 7. Application of deep learning in financial markets (services and risk management).

Reference | Methods | Application
[89] | Binary classification technique | Loan pricing
[90] | Feature selection | Credit risk analysis
[91] | AE | Portfolio management
[92] | Likelihood estimation | Mortgage risk

Addo et al. [89] employed a binary classification technique to identify essential features of selected ML and DL models in order to evaluate the stability of the classifiers based on their performance. The study used the models separately to forecast loan default probability, considering the selected features and the selected algorithm, which are the crucial keys in loan pricing processes. In credit risk management, it is important to distinguish good from bad customers, which obliges us to construct very efficient classification models using deep learning tools. Figure 21 presents the visualized results for the study by Addo et al. [89], comparing the performance of LR, RF, Gboosting, and DNN in terms of RMSE and area under the curve (AUC) values.
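A minimal sketch of this evaluation style, using scikit-learn on synthetic loan-default data and comparing classifiers by AUC and RMSE, is given below; the models, settings, and data are illustrative and are not those of Addo et al. [89].

```python
# Compare a few binary classifiers on imbalanced synthetic "default" data by AUC and RMSE.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBoosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]   # predicted default probability
    auc = roc_auc_score(y_te, prob)
    rmse = np.sqrt(mean_squared_error(y_te, prob))
    print(f"{name}: AUC={auc:.3f}, RMSE={rmse:.3f}")
```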


Figure 21. AUC (a) and RMSE (b) results.

In general, credit data include useless and unneeded features that need to be filtered by a feature selection strategy to provide the classifier with high accuracy. Ha and Nguyen [90] studied a novel feature selection method that helps financial institutions perform credit assessment with less workload while focusing on important variables, and improves the classification accuracy in terms of credit scoring and customer rating. The accuracy value was employed for comparing the performance of linear SVM, CART, k-NN, Naïve Bayes, MLP, and RF techniques. Figure 22 presents the comparison results related to the study by Ha and Nguyen [90].

Figure 22. The comparison results for the study.

Figure 22 contains the base type of each method and its integration with GA, PSO, LR, LDA, and t-tests as different scenarios, except for RF, which appears only in its baseline form. In all cases except linear SVM, integration with optimizers and filters such as GA, PSO, and LR improved the accuracy compared with the baseline type. However, for SVM, the baseline model provided higher accuracy than the other scenarios. A study applied hierarchical models to achieve high performance on big data [91]. The authors constructed a deep portfolio with the four processes of auto-encoding, calibrating, validating, and verifying, applied to a portfolio including put and call options with underlying stocks. Their methods are able to find the optimal strategy in the theory of deep portfolios by ameliorating the

deep feature. Another work developed a deep learning model for mortgage risk applied to huge data sets, which discovered the nonlinear relationship between the variables and debtor behavior that can be affected by the local economic situation [92]. Their research demonstrated that the unemployment variable is substantial in mortgage risk, which carries important implications for the relevant practitioners.

3.1.7. Deep Learning in Investment

Financial problems generally need to be analyzed in terms of datasets from multiple sources. Thus, it is substantial to construct a reliable model for handling unusual interactions and features from the data for efficient forecasting. Table 8 comprises the recent results of using deep learning approaches in financial investment.

Table 8. Application of deep learning in stock price prediction.

Reference | Methods | Application
[93] | LSTM and AE | Market investment
[94] | Hyper-parameter | Option pricing in finance
[95] | LSTM and SVR | Quantitative strategy in investment
[96] | R-NN and genetic method | Smart financial investment

Aggarwal and Aggarwal [93] designed a deep learning model applied to an economic investment problem with the capability of extracting nonlinear data patterns. They presented a decision model using neural network architectures such as LSTM, auto-encoding, and smart indexing to better estimate the risk of portfolio selection with securities for the investment problem. Culkin and Das [94] investigated the option pricing problem using DNN architecture to reconstruct the well-known Black and Scholes formula with considerable accuracy. The study revisited a previous result [97] and made the model more accurate with a diverse selection of hyper-parameters. The experimental results showed that the model can price the options with minor error. Fang et al. [95] investigated the option pricing problem in conjunction with transaction complexity, where the research goal is to explore an efficient investment strategy in a high-frequency trading manner. The work used the delta hedging concept to handle the risk, in addition to several key components of the pricing such as implied volatility and historical volatility. LSTM-SVR models were applied to forecast the final transaction price, with experimental results evaluated by the deviation value and compared with single RF and single LSTM methods. Figure 23 presents the visualized results to compare the results more clearly for the study by Culkin and Das [94].
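The option pricing experiment of Culkin and Das [94] can be imitated in a few lines: generate Black–Scholes call prices over a grid of inputs and fit a small feed-forward network to them. The sketch below follows this recipe with assumed parameter ranges and network sizes, not the authors' exact configuration.

```python
# Fit a small MLP to reproduce the Black-Scholes call pricing function.
import numpy as np
import tensorflow as tf
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 20000
S   = rng.uniform(50, 150, n)     # spot price
K   = rng.uniform(50, 150, n)     # strike
T   = rng.uniform(0.1, 2.0, n)    # time to maturity (years)
r   = rng.uniform(0.0, 0.05, n)   # risk-free rate
sig = rng.uniform(0.1, 0.5, n)    # volatility

d1 = (np.log(S / K) + (r + 0.5 * sig ** 2) * T) / (sig * np.sqrt(T))
d2 = d1 - sig * np.sqrt(T)
call = S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)   # Black-Scholes call price

X = np.column_stack([S / K, T, r, sig]).astype("float32")     # moneyness-based inputs
y = (call / K).astype("float32")                              # price scaled by strike (homogeneity)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=256, verbose=0)
print("MSE on training grid:", float(model.evaluate(X, y, verbose=0)))
```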

Figure 23. Deviation results for LSTM, RF, and LSTM-SVR.


Based on the results, the single DL approach outperforms the traditional RF approach with a higher return in the adopted investment strategy, but its deviation is considerably higher than that of the hybrid DL (LSTM-SVR) method. A novel learning genetic algorithm was proposed by Serrano [96] concerning smart investment, which uses the R-NN model to emulate human behavior. The model makes use of a complicated deep learning architecture in which reinforcement learning is used for fast decision making, deep learning for constructing the stock identity, clusters for the overall decision making, and genetics for the transfer purpose. The created algorithm improved the return with respect to the market risk with minor error.

The authors in [96] constructed a complex model in which the network weights imitate the human brain, applying the genetic algorithm with the following formula for the encoding–decoding organism, Equation (28):

\min \lVert X - \zeta(W_2\,\zeta(X W_1)) \rVert \quad \text{s.t.} \quad W_1 \ge 0  (28)

and

W_2 = \mathrm{pinv}(\zeta(X W_1))\,X, \qquad \mathrm{pinv}(x) = (x^{T} x)^{-1} x^{T}  (29)

where x is the input vector and W is the weight matrix. Q_1 = \zeta(X W_1) represents the neuron vector, and Q_2 = \zeta(W_2 Q_1) represents the cell vector. We illustrate the whole structure of the model in Figure 24, adopted from [96].
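Equations (28) and (29) translate almost directly into NumPy: the decoding weights are the least-squares (pseudo-inverse) solution given the hidden activations. The sketch below uses invented data and a random nonnegative W1, so it only illustrates the computation, not the full genetic training loop of [96].

```python
# Numeric reading of Equations (28)-(29): decoding weights from the pseudo-inverse.
import numpy as np

def zeta(z):
    return 1.0 / (1.0 + np.exp(-z))           # sigmoid activation

rng = np.random.default_rng(7)
X  = rng.random((100, 8))                      # input data (rows = samples)
W1 = np.abs(rng.normal(size=(8, 4)))           # W1 >= 0, as required by Eq. (28)

Q1 = zeta(X @ W1)                              # neuron vector Q1 = zeta(X W1)
W2 = np.linalg.pinv(Q1) @ X                    # Eq. (29): W2 = pinv(zeta(X W1)) X
X_hat = Q1 @ W2                                # linear least-squares reconstruction of X
print("reconstruction error:", round(float(np.linalg.norm(X - X_hat)), 4))
```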

Figure 24. Structure of the smart investment model.

3.1.8. Deep Learning in Retail

New applications of DL, as well as novel DRL, in the retail industry are emerging at a fast pace [98–117]. Augmented reality (AR) enables customers to improve their experience while buying or finding a product in physical stores. This approach is frequently used by researchers in the field. Table 9 presents the notable studies.


Table 9. Application of deep learning in retail markets.

Reference | Methods | Application
[98] | Augmented reality and image classification | Improving shopping in retail markets
[99] | DNN methods | Sale prediction
[100] | CNN | Investigation in retail stores
[101] | Adaptable CNN | Validation in the food industry

Cruz et al. [98] combined the DL technique and augmented reality approaches in order to provide information for clients. They presented a mobile application that locates clients by means of the image classification technique in deep learning and then employs AR approaches to efficiently guide clients to the selected product, with all helpful information regarding that product, in large stores. Another interesting application of deep learning methods is in fashion retail with large supply and demand, where precise sales prediction is tied to the company's profit. Nogueira et al. [99] designed a novel DNN to accurately forecast future sales, where the model uses a huge and completely different set of variables, such as the physical specifications of the product and the opinion of subject-matter experts. Their experimental work indicated that DNN model performance is comparable with other shallow approaches such as RF and SVR. An important issue for retail is analyzing client behavior in order to maximize revenue, which can be performed by computer vision techniques as in the study by De Sousa Ribeiro et al. [100]. The proposed CNN regression model deals with counting problems, assessing the number of persons present in the stores and detecting crucial spots. The work also designed a foreground/background approach for detecting the behavior of people in retail markets. The results of the proposed method were compared with a common CNN technique in terms of accuracy. Figure 25 presents the visualized result for the study by De Sousa Ribeiro et al. [100].

Figure 25. Comparison of CNN for two separate data sets.

According to Figure 25, the proposed model is robust enough and performs better than common CNN methods on both datasets. In the current situation of the food retail industry, it is crucial to distribute the products to the market with proper information and remove the possibility of mislabeling, for the sake of public health. A new adaptable convolutional neural network approach for optical character verification was presented with the aim of automatically identifying the use-by dates in a large dataset [101]. The model applies both a k-means algorithm and k-nearest neighbor to incorporate calculated centroids into the CNN for efficient separation and adaptation. The developed models allow us to better manage the precision and readability of important use-by dates and to better control the traceability information in food production processes. The most notable research with the application of DL in the retail market is collected in Table 9. The adaptation method in [117] interconnects the C and Z cluster centroids to construct a new augmented cluster including information from both networks, CNN1 and CNN2, using k-NN classification [101] (see Figure 26). The methodologies lead to considerably improved food safety and correct labelling.
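A rough sketch of the centroid-interconnection idea (k-means centroids from two networks merged and assigned by k-NN) is shown below with scikit-learn; the random feature vectors stand in for real CNN embeddings and are not part of the original pipeline.

```python
# Merge k-means centroids from two "networks" and assign new samples with k-NN.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
feats_cnn1 = rng.normal(0.0, 1.0, size=(300, 16))   # stand-in embeddings from "CNN1"
feats_cnn2 = rng.normal(0.5, 1.0, size=(300, 16))   # stand-in embeddings from "CNN2"

# C and Z centroids from the two networks, merged into one augmented set.
C = KMeans(n_clusters=5, n_init=10, random_state=0).fit(feats_cnn1).cluster_centers_
Z = KMeans(n_clusters=5, n_init=10, random_state=0).fit(feats_cnn2).cluster_centers_
centroids = np.vstack([C, Z])
labels = np.arange(len(centroids))                   # one label per centroid

knn = KNeighborsClassifier(n_neighbors=1).fit(centroids, labels)
new_samples = rng.normal(0.25, 1.0, size=(5, 16))
print("assigned centroids:", knn.predict(new_samples))
```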

Figure 26. Full structure of the adaptable CNN framework.

3.1.9. Deep Learning in Business (Intelligence)

Nowadays, big data solutions play a key role in business services and productivity to efficiently reinforce the market. To solve complex business intelligence (BI) problems dealing with market data, DL techniques are useful (see Table 10). Table 10 presents the notable studies for the application of deep learning in business intelligence.

Table 10. Application of deep learning in business intelligence.

Reference | Methods | Application
[102] | MLP | BI with client data
[103] | MLS and SAE | Feature selection in market data
[104] | RNN | Information detection in business data
[105] | RNN | Predicting procedure in business

Fombellida et al. [102] engaged the concept of metaplasticity, which has the capability of improving the flexibility of learning mechanisms to detect more useful information and to learn from data. The study focused on MLP, where the output is in the application of BI while utilizing client data. The developed work showed that the model reinforces learning over classical MLPs and other systems in terms of learning evolution and final accomplishment regarding precision and stability. The results of the proposed method [102] were compared with MOE developed by West [106] and MCQP developed by Peng et al. [107] in terms of accuracy and sensitivity. Results are presented in Figure 27.

Figure 27. The visualized results considering sensitivity and accuracy.

As is clear from Figure 27, the proposed method provided higher sensitivity compared with the other methods and a lower accuracy value. Business Intelligence Net, a recurrent neural network for detecting irregularity in business processes, was designed by Nolle et al. [104] to mainly manage the data aspect of business processes; this net does not depend on any given information about the process or on a clean dataset. The study presented a heuristic setting that decreased the manual workload. The proposed approach can be applied to model the time dimension in sequential phenomena, which is useful for anomalies. The experimental results prove that BINet is a trustable approach with a high capability of anomaly detection in business process logs. An alternative strategy to the ML algorithm is needed to manage huge amounts of data generated by the enlargement and broader use of digital technology. DL enables us to handle large datasets with heterogeneous properties. Singh and Verma [103] designed a new multi-layer feature selection interacting with a stacked auto-encoder (SAE) to detect crucial representations of data. The presented method performed better than commonly used ML algorithms applied to the Farm Ads dataset. Results were evaluated by employing accuracy and area under the curve factors. Figure 28 presents the visualized results according to Singh and Verma [103].

Figure 28. Comparative analysis of RF, linear SVM, and RBF SVM.

According to Figure 28, RBF-based SVM provided the highest accuracy and AUC, and the lowest accuracy and AUC were related to the RF method. The kernel type is the most important factor for the performance of SVM, and the RBF kernel provides higher performance compared with the linear kernel. A new approach [105] used a recurrent neural network architecture for forecasting the behavior of business processes. The approach works with an explicit process characteristic and does not need a model: the RNN inputs are built by means of an embedded space, with demonstrated results regarding validation accuracy and the feasibility of the method. The work in [103] utilized a novel multi-layer SAE architecture to handle the Farm Ads data setting, where traditional ML algorithms did not perform well given the high dimensionality and huge sample size of the Farm Ads data. Their feature selection approach has a high capability of dimensionality reduction and of enhancing the classifier performance. We depict the complete process of the algorithm in Figure 29.

Figure 29. Multi-Layer SAE architecture.

Although DL utilizes the representation learning property of DNNs and handles complex problems involving nonlinear patterns of economic data, DL methods generally cannot align the learning problem with the goal of the trader and the significant market constraints needed to detect the optimal strategy associated with economic problems. RL approaches enable us to fill this gap with their powerful mathematical structure.

3.2. Deep Reinforcement Learning Application in Economics

In contrast to traditional approaches, DRL has the important capability of capturing substantial market conditions to provide the best strategy in economics, and it also provides the potential for scalability and efficient handling of high-dimensional problems. Thus, we are motivated to consider the recent advances of deep RL applications in economics and the financial market.

3.3. Deep Reinforcement Learning in Stock Trading

Financial companies need to detect the optimal strategy while dealing with stock trading in a dynamic and complicated environment in order to maximize their revenue. Traditional methods applied to stock market trading are quite difficult to experiment with when the practitioner wants to consider transaction costs. RL approaches are not efficient enough to find the best strategy due to the lack of scalability of the models to handle high-dimensional problems [108]. Table 11 presents the most notable studies developed by deep RLs in the stock market.

Table 11. Application of deep RLs in the stock market.

Reference | Methods | Application
[109] | DDPG | Dynamic stock market
[110] | Adaptive DDPG | Stock portfolio strategy
[111] | DQN methods | Efficient market strategy
[112] | RCNN | Automated trading

Xiong et al. [109] used the deep deterministic policy gradient (DDPG) algorithm as an alternative to explore the optimal strategy in dynamic stock markets. The algorithm components handle a large action-state space, take care of stability, remove sample correlation, and enhance data utilization. Results demonstrated that the applied model is robust in terms of balancing risk and performs better than traditional approaches, with the guarantee of a higher return. Xinyi et al. [110] designed a new Adaptive Deep Deterministic Reinforcement Learning framework (Adaptive DDPG) for detecting the optimal strategy in dynamic and complicated stock markets. The model combines an optimistic and a pessimistic deep RL that rely on both negative and positive forecasting errors. Under complicated market situations, the model has the capability to gain better portfolio profit based on the Dow Jones stocks. Li et al. [111] surveyed deep RL studies, analyzing multiple algorithms for the stock decision-making mechanism. Their experimental results, based on three classical models, DQN, Double DQN, and Dueling DQN, showed that among them DQN models enable us to attain a better investment strategy in order to optimize the return in stock trading when applying empirical data to validate the models. Azhikodan et al. [112] focused on automating oscillation in securities trading using deep RL, where they employed a recurrent convolutional neural network (RCNN) approach for forecasting stock value from economic news. The original goal of the work was to demonstrate that deep RL techniques are able to efficiently detect stock trading tricks. The recent deep RL results in the application of stock markets can be viewed in Table 12. An adaptive DDPG [110], comprising both an actor network µ(s|θ^µ) and a critic network Q(s, a|θ^Q), is utilized for the stock portfolio with a different update structure than the DDPG approach, given by Equation (30):

θ^{Q′} ← τ θ^{Q} + (1 − τ) θ^{Q′},    θ^{µ′} ← τ θ^{µ} + (1 − τ) θ^{µ′}    (30)

where the new target function is Equation (31):

y_i = r_i + γ Q′(s_{i+1}, µ′(s_{i+1} | θ^{µ′}) | θ^{Q′})    (31)
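To make Equations (30) and (31) concrete, the following minimal Python sketch (using PyTorch) illustrates the soft target-network update and the target-value computation of a DDPG-style agent. It is an illustrative example only, not the implementation of [110]; the network sizes and the names Actor, Critic, soft_update, and td_target are assumptions introduced for this sketch.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # mu(s | theta_mu): maps a state to an action (e.g., a portfolio adjustment); sizes are assumed
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    # Q(s, a | theta_Q): maps a state-action pair to a scalar value estimate
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau):
    # Equation (30): theta' <- tau * theta + (1 - tau) * theta'
    for tp, p in zip(target.parameters(), source.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)

def td_target(critic_target, actor_target, r, s_next, gamma):
    # Equation (31): y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    with torch.no_grad():
        return r + gamma * critic_target(s_next, actor_target(s_next))

After each gradient step on the critic and actor, soft_update would be called for both target networks with a small tau (e.g., 0.001), so that the targets track the learned networks slowly and stabilize training.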

Table 12. Application of deep reinforcement learning in portfolio management.

References   Methods          Application
[109,113]    DDPG             Algorithmic trading
[114]        Model-less CNN   Financial portfolio algorithm
[15]         Model-free       Advanced strategy in portfolio trading
[115]        Model-based      Dynamic portfolio optimization

Experimental results indicated that the model outperformed the DDPG with high portfolio return as compared to the previous methods. One can check the structure of the model in Figure 30, adopted from [110].


Figure 30. Architecture for the adaptive deep deterministic reinforcement learning framework (Adaptive DDPG).

3.3.1. Deep Reinforcement Learning in Portfolio Management

Algorithmic trading currently uses deep RL techniques for portfolio management with fixed allocation of capital into various financial products (see Table 12). Table 12 presents the notable studies in the application of deep reinforcement learning in portfolio management.

Liang et al. [113] engaged different RL methods, such as DDPG, Proximal Policy Optimization (PPO), and PG, to obtain the strategy associated with a financial portfolio in continuous action-space. They compared the performance of the models in various settings on the China asset market and indicated that PG is more favorable for stock trading than the other two. The study also presented a novel adversarial training approach to ameliorate the training efficiency and the mean return, and this new training approach improved the performance of PG as compared to uniform constant rebalanced portfolios (UCRP).

Recent work by Jiang et al. [114] employed a deterministic deep reinforcement learning approach based on cryptocurrency to obtain the optimal strategy in financial problem settings. Cryptocurrencies, such as Bitcoin, can be used in place of governmental money. The study designed a model-less convolutional neural network (CNN) whose inputs are historical asset prices from a cryptocurrency exchange, with the aim of producing the set of portfolio weights. The proposed model tries to optimize the return using the network reward function in the reinforcement learning manner. Their CNN approach obtained good results as compared to other benchmark algorithms in portfolio management.

Another trading algorithm was proposed as the model-free reinforcement learning framework of [15], where the reward function was engaged by applying a fully exploiting DPG approach so as to optimize the accumulated return. The model includes an Ensemble of Identical Independent Evaluators (EIIE) topology to incorporate large sets of neural nets in terms of weight-sharing and a portfolio-vector memory (PVM) in order to prevent the gradient destruction problem. An online stochastic batch learning (OSBL) scheme consecutively analyzes the steady inbound market information, in conjunction with CNN, LSTM, and RNN models. The results indicated that the models, when compared with benchmark models, performed better than previous portfolio management algorithms on a cryptocurrency market database.

A novel model-based deep RL scheme was designed by Yu et al. [115] in the sense of automated trading to take actions and make decisions sequentially associated with global goals. The model architecture includes an infused prediction module (IPM), a generative adversarial data augmentation module (DAM), and a behavior cloning module (BCM), dealing with designed back-testing. Empirical results, using historical market data, proved that the model is stable and gains more return as compared to baseline approaches and other recent model-free methods. Portfolio optimization is a challenging task while trading stock in the market. The authors in [115] utilized a novel efficient RL architecture associated with a risk-sensitive portfolio, combining IPM to forecast the stock trend with historical asset prices for improving the performance of the RL agent, DAM to control the over-fitting problem, and BCM to handle unexpected movements in portfolio weights and to retain the portfolio with low volatility (see Figure 31). The results demonstrate that the complex model is more robust and profitable as compared to the prior approaches [115].
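To illustrate, at a very high level, how such a portfolio agent can turn network outputs into portfolio weights and a reward, the short Python sketch below scores each asset from a window of its recent prices, applies a softmax so that the weights sum to one, and uses the log-return of the rebalanced portfolio as the reward. This is only a schematic example, loosely in the spirit of the weight-sharing EIIE idea; the class AssetScorer, its layer sizes, and the tensor shapes are assumptions, not the architecture of [15] or [115].

import torch
import torch.nn as nn

class AssetScorer(nn.Module):
    # Scores every asset with the same small network (weight sharing across assets).
    def __init__(self, window):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(window, 32), nn.ReLU(), nn.Linear(32, 1))
    def forward(self, price_windows):                 # shape: (n_assets, window), normalized prices
        scores = self.net(price_windows).squeeze(-1)  # shape: (n_assets,)
        return torch.softmax(scores, dim=0)           # portfolio weights: non-negative, sum to 1

def log_return_reward(weights, price_relatives):
    # price_relatives[i] = p_t[i] / p_{t-1}[i]; reward = log of the portfolio growth factor
    return torch.log(torch.dot(weights, price_relatives))

In a full agent, the weights would also account for transaction costs and the previous allocation (the role of the portfolio-vector memory), which this sketch omits.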

Figure 31. Trading architecture.

3.3.2. Deep Reinforcement Learning in Online Services

In the current development of online services, users face the challenge of efficiently detecting the items they are interested in, and recommendation techniques provide the right solutions to this problem. Various recommendation methods have been presented, such as content-based collaborative filtering, factorization machines, and multi-armed bandits, to name a few. These approaches are mostly limited to settings where the users and recommender systems interact statically and focus on short-term rewards. Table 13 presents the notable studies in the application of deep reinforcement learning in online services.

Table 13. Application of deep reinforcement learning in online services.

Reference   Methods               Application
[116]       Actor–critic method   Recommendation architecture
[117]       SS-RTB method         Bidding optimization in advertising
[118]       DDPG and DQN          Pricing algorithm for online market
[119]       DQN scheme            Online news recommendation

Feng et al. [116] applied a new recommendation framework using an actor–critic model in RL that can explicitly capture dynamic interaction and long-term rewards in consecutive decision-making processes. Experimental work utilizing practical datasets showed that the presented approach outperforms the prior methods.

A key task in online advertising is to optimize advertisers' gain in the framework of bidding optimization. Zhao et al. [117] focused on real-time bidding (RTB) applied to sponsored search (SS) auctions in a complicated stochastic environment associated with user actions and bidding policies (see Algorithm 2). Their model, the so-called SS-RTB, engages reinforcement learning concepts to adjust a robust Markov Decision Process (MDP) model in a changing environment and is based on a suitable aggregation level of datasets from the auction market. Empirical results of both online and offline evaluation on the Alibaba auction platform indicated the usefulness of the proposed method.

With the rapid growth of business, online retailers face more complicated operations that heavily require up-to-date pricing methodology in order to optimize their profit. Liu et al. [118] proposed a pricing algorithm that models the problem in the framework of an MDP on an e-commerce platform. They used deep reinforcement learning approaches to effectively deal with the dynamic market environment and market changes and to set efficient reward functions associated with the complex environment. Online testing of such models is impossible, as legally the same prices have to be presented to the various customers simultaneously. The model engages deep RL techniques to maximize the long-term return while applying both discrete and continuous settings for pricing problems. Empirical experiments showed that the policies obtained from the DDPG and DQN methods were much better than other pricing strategies on various business databases.

Based on the reports of Kompan and Bieliková [120], news aggregator services are favorite online services able to supply a massive volume of content for users. Therefore, it is important to design news recommendation approaches to boost the user experience. Results were evaluated by precision and recall values in two scenarios: SME.SK and manually annotated data. Figure 32 presents the visualized results for the study by Kompan and Bieliková [120].

Algorithm 2 [117]: DQN Learning
1: for episode = 1 to n do
2:   Initialize replay memory D to capacity N
3:   Initialize action-value functions (Q_train, Q_ep, Q_target) with weights θ_train, θ_ep, θ_target
4:   for t = 1 to m do
5:     With probability ε select a random action a_t
6:     otherwise select a_t = argmax_a Q_ep(s_t, a; θ_ep)
7:     Execute action a_t in the auction simulator and observe state s_{t+1} and reward r_t
8:     if budget of s_{t+1} < 0 then continue
9:     Store transition (s_t, a_t, r_t, s_{t+1}) in D
10:    Sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D
11:    if j = m then
12:      Set y_j = r_j
13:    else
14:      Set y_j = r_j + γ max_{a′} Q_target(s_{j+1}, a′; θ_target)
15:    end if
16:    Perform a gradient descent step on the loss (y_j − Q_train(s_j, a_j; θ_train))^2
17:  end for
18:  Update θ_ep with θ_train
19:  Every C steps, update θ_target with θ_train
20: end for
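For readers who prefer code to pseudocode, the following minimal Python sketch (using PyTorch) mirrors the core of Algorithm 2: ε-greedy action selection, a replay memory, and a gradient step on the squared TD error against a target network. It is a generic DQN fragment under assumed names (QNet, select_action, dqn_loss) and is not the SS-RTB implementation of [117].

import random
from collections import deque
import torch
import torch.nn as nn

replay_memory = deque(maxlen=100_000)   # replay memory D with capacity N (line 2)

class QNet(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    def forward(self, s):
        return self.net(s)

def select_action(q_net, state, n_actions, eps):
    # epsilon-greedy selection (lines 5-6)
    if random.random() < eps:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def dqn_loss(q_train, q_target, batch, gamma):
    # batch of transitions (s, a, r, s_next, done) sampled from the replay memory (line 10)
    s, a, r, s_next, done = batch
    q = q_train(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * q_target(s_next).max(dim=1).values  # TD target (lines 11-14)
    return nn.functional.mse_loss(q, y)  # squared loss for the gradient step (line 16)

As in line 19 of Algorithm 2, the weights of q_target would be copied from q_train every C steps rather than after every update.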

As is clear from Figure 32, in both scenarios the proposed method provided the lowest recall and precision compared with TFIDF. Recent research by Zheng et al. [119] designed a new framework using the DQN scheme for online news recommendation with the capability of considering both current and future rewards. The study considered user activeness and also the Dueling Bandit Gradient Descent approach to ameliorate the recommendation accuracy. Extensive empirical results, based on online and offline tests, indicated the superiority of their novel approach.

Figure 32. Comparative analysis of TFIDF for DDPG and DQN.

Dealing with the bidding optimization problem is a real-world challenge in online advertising. In contrast to the prior methods, recent work [117] used an approach called SS-RTB to handle the sophisticated changing environments associated with bidding policies (see Table 13). More importantly, the model was extended to the multi-agent problem using robust MDP and was then evaluated by the performance metric PUR_AMT/COST, on which it showed considerably better results. PUR_AMT/COST is a metric for optimizing the buying amount in high correlation with minimizing the cost.

4. Discussion

A comparison of the DL and DRL models applied to the multiple economic domains is discussed in Table 14 and Figure 33. Different aspects of the proposed models are summarized, such as model complexity, accuracy and speed, source of the dataset, internal detection property of the models, and profitability of the model with regard to revenue and managing risk. Our work indicated that both DL and DRL algorithms are useful for prediction purposes with almost the same quality of prediction in terms of statistical error. Deep learning models such as AE methods in risk management [91] and LSTM-SVR approaches [95] in investment problems showed that they enable agents to considerably maximize their revenue while taking care of risk constraints with reasonably high performance, which is quite important to the economic markets. In addition, reinforcement learning is able to simulate more efficient models with more realistic market constraints, while deep RL goes further to solve the scalability problem of RL algorithms, which is crucial for fast user and market growth, and efficiently works with high-dimensional settings, as is highly desired in the financial market. DRL can give notable help in designing more efficient algorithms for forecasting and analyzing the market with real-world parameters. The deep deterministic policy gradient (DDPG) method used by [109] in stock trading demonstrated how the model is able to handle large settings concerning stability, improving data use, and equilibrating risk while optimizing the return with a high performance guarantee. Another example in the deep RL framework made use of the DQN scheme [119] to ameliorate news recommendation accuracy, dealing with huge numbers of users at the same time with a considerably high performance guarantee. Our review demonstrated that there is great potential in improving deep learning methods applied to RL problems to better analyze the relevant problem and find the best strategy in a wide range of economic domains. Furthermore, RL is a widely growing research area in DL. However, most of the well-known results in RL have been in single-agent environments, where the cumulative reward involves only single state-action spaces. It would be interesting to consider multi-agent scenarios in RL that comprise more than one agent, where the cumulative reward can be affected by the actions of other agents. There is some recent research that applied multi-agent reinforcement learning (MARL) scenarios to a small set of agents but not to a large set of agents [121]. For instance, in financial markets where there is a huge number of agents acting at the same time, the actions of particular agents, concerning the optimization problem, might be affected by the decisions of the other agents. MARL can provide an imprecise model, and hence inexact predictions, when considering infinitely many agents. On the other hand, mathematicians use the theory of mean-field games (MFGs) to model a huge variety of non-cooperative agents in a complicated multi-agent dynamic environment; in theory, however, MFG modeling often leads to unsolvable partial differential equations (PDEs) when dealing with large numbers of agents. Therefore, the big challenge in the application of MARL is to efficiently model a huge number of agents and then solve the real-world problem using MFGs, dealing with economics and financial markets. The big challenge might thus be applying MFGs to MARL systems involving infinitely many agents. Note that MFGs are able to efficiently model the problem but give unsolvable equations [122,123]. Recent advanced research demonstrated that using MFGs can diminish the complexity of the MARL system dealing with infinite agents. Thus, the question is how to effectively present the combination of MFGs (mathematically) and MARL so as to help solve otherwise unsolvable equations applied to economic and financial problems. For example, the goal of all high-frequency traders is to dramatically optimize profit, where MARL scenarios can be modeled by MFGs, assuming that all the agents have the same cumulative reward.
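For reference, one common formulation of an MFG, adapted from the literature cited above [122], couples a backward Hamilton–Jacobi–Bellman equation for the representative agent's value function u with a forward Fokker–Planck (Kolmogorov) equation for the population distribution m; the coupling between the two equations is what typically makes the system analytically and numerically intractable:

\begin{aligned}
  -\partial_t u - \nu \Delta u + H(x, \nabla u) &= f(x, m), \\
  \partial_t m - \nu \Delta m - \operatorname{div}\!\left( m \, \partial_p H(x, \nabla u) \right) &= 0, \\
  m(0, \cdot) = m_0, \qquad u(T, x) &= g\big(x, m(T)\big),
\end{aligned}

where ν is the diffusion (noise) level, H is the Hamiltonian of the individual control problem, and f and g encode the running and terminal costs that depend on the distribution of the other agents.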

Table 14. Comparative study of DL and DRL models in economics.

Methods     Dataset Type   Complexity        Accuracy & Efficiency   Detection-Processing Rate   Profit & Risk         Application
TGRU        Historical     High              Reasonable              High-Reasonable             Reasonable profit     Stock pricing
DNN         Historical     Reasonably high   Reasonable              High-Reasonable             Reasonable profit     Stock pricing
AE          Historical     High              High                    High-Reasonably high        Reasonably low risk   Insurance
RegretNet   Historical     High              Reasonable              High-Reasonable             High profit           Auction design
AE & RBM    Historical     Reasonably high   Reasonable              High-Reasonable             Low risk              Credit card fraud
ENDE        Historical     Reasonable        Reasonable              High-Reasonable             —                     Macroeconomic
AE          Historical     High              High                    High-Reasonable             High-Low              Risk management
LSTM-SVR    Historical     High              Reasonable              Reasonably high-High        High-Low              Investment profit
CNNR        Historical     Reasonable        High                    Reasonably high-High        Reasonably high       Retail market
RNN         Historical     Reasonable        High                    Reasonable-Reasonable       —                     Business intelligence
DDPG        Historical     High              High                    Reasonably high-High        High-Low              Stock trading
IMF&MBM     Historical     High              High                    Reasonably high-High        High profit           Portfolio managing
DQN         Historical     High              High                    High-High                   High-Low              Online services

Figure 33. The ranking score for each method in its own application.

Figure 34 illustrates a comprehensive evaluation of the models' performance in the surveyed manuscripts. LSTM, CNN, DNN, RNN, and GRU generally produced higher RMSE. The identified DRL methods with higher performance can be further used in emerging applications, e.g., pandemic modeling, financial markets, and 5G communication.

Figure 34. Comparative analysis of the models' performance.

5. Conclusions

In the current fast growth of economics and markets, there is a high demand for appropriate mechanisms to considerably enhance productivity and product quality. DL can contribute to effectively forecasting and detecting complex market trends, as compared to traditional ML algorithms, with the major advantage of a high-level feature extraction property and the proficiency of its problem-solving methods. Furthermore, reinforcement learning enables us to construct more efficient frameworks that integrate the prediction problem with the portfolio structuring task, considering crucial market constraints and achieving better performance, while the deep reinforcement learning architecture, the combination of both DL and RL approaches, allows RL to resolve the problem of scalability and to be applied to high-dimensional problems, as desired in real-world market settings. Several DL and deep RL approaches, such as DNN, Autoencoder, RBM, LSTM-SVR, CNN, RNN, DDPG, DQN, and a few others, were reviewed in various applications in economic and market domains, where the advanced models improved prediction, extracted better information, and found the optimal strategy, mostly in complicated and dynamic market conditions. This brief work highlights the basic issue that all proposed approaches mainly have to deal fairly with model complexity, robustness, accuracy, performance, computational tasks, risk constraints, and profitability. Practitioners can employ a variety of both DL and deep RL techniques, with their relevant strengths and weaknesses, that serve economic problems by enabling the machine to detect the optimal strategy associated with the market. Recent works showed that novel techniques in DNN, and their recent interaction with reinforcement learning, so-called deep RL, have the potential to considerably enhance model performance and accuracy while handling real-world economic problems. We note that our work indicates the recent approaches in both DL and deep RL perform better than the classical ML approaches. Significant progress can be obtained by designing more efficient novel algorithms using deep neural architectures in conjunction with reinforcement learning concepts to detect the optimal strategy, namely, to optimize the profit and minimize the loss while considering the risk parameters in a highly competitive market. For future research, a survey of machine learning and deep learning in 5G, cloud computing, cellular networks, and the COVID-19 outbreak is suggested.

Author Contributions: Conceptualization, A.M.; methodology, A.M.; software, Y.F.; validation, A.M., P.G. and P.D.; formal analysis, S.F.A. and E.S.; investigation, A.M.; resources, S.S.B.; , A.M.; writing—original draft preparation, A.M.; writing—review and editing, P.G.; visualization, S.F.A.; supervision, P.G. and S.S.B.; project administration, A.M.; funding acquisition, A.M. All authors have read and agreed to the published version of the manuscript. Funding: We acknowledge the financial support of this work by the Hungarian State and the European Union under the EFOP-3.6.1-16-2016-00010 project and the 2017-1.3.1-VKE-2017-00025 project. The research presented in this paper was carried out as part of the EFOP-3.6.2-16-2017-00016 project in the framework of the New Szechenyi Plan. The completion of this project is funded by the European Union and co-financed by the European Social Fund. We acknowledge the financial support of this work by the Hungarian State and the European Union under the EFOP-3.6.1-16-2016-00010 project. Acknowledgments: We acknowledge the financial support of this work by the Hungarian-Mexican bilateral Scientific and Technological (2019-2.1.11-TÉT-2019-00007) project. The support of the Alexander von Humboldt Foundation is acknowledged. Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations

AE      Autoencoder
ALT     Augmented Lagrangian Technique
ARIMA   Autoregressive Integrated Moving Average
BCM     Behavior Cloning Module
BI      Business Intelligence
CNN     Convolutional Neural Network
CNNR    Convolutional Neural Network Regression
DAM     Data Augmentation Module
DBNs    Deep Belief Networks
DDPG    Deep Deterministic Policy Gradient
DDQN    Double Deep Q-network
DDRL    Deep Deterministic Reinforcement Learning
DL      Deep Learning
DNN     Deep Neural Network
DPG     Deterministic Policy Gradient
DQN     Deep Q-network
DRL     Deep Reinforcement Learning
EIIE    Ensemble of Identical Independent Evaluators
ENDE    Encoder-Decoder
GANs    Generative Adversarial Nets
IPM     Infused Prediction Module
LDA     Latent Dirichlet Allocation
MARL    Multi-agent Reinforcement Learning
MCTS    Monte-Carlo Tree Search
MDP     Markov Decision Process
MFGs    Mean Field Games
ML      Machine Learning
MLP     Multilayer Perceptron
MLS     Multi Layer Selection
NFQCA   Neural Fitted Q Iteration with Continuous Actions
NLP     Natural Language Processing
OSBL    Online Stochastic Batch Learning
PCA     Principal Component Analysis
PDE     Partial Differential Equations
PG      Policy Gradient

PPO     Proximal Policy Optimization
PVM     Portfolio-Vector Memory
RBM     Restricted Boltzmann Machine
RCNN    Recurrent Convolutional Neural Network
RL      Reinforcement Learning
RNN     Recurrent Neural Network
R-NN    Random Neural Network
RS      Risk Score
RTB     Real-Time Bidding
SAE     Stacked Auto-encoder
SPF     Survey of Professional Forecasters
SPG     Stochastic Policy Gradient
SS      Sponsored Search
SVR     Support Vector Regression
TGRU    Two-Stream Gated Recurrent Unit
UCRP    Uniform Constant Rebalanced Portfolios

References

1. Erhan, D.; Bengio, Y.; Courville, A.; Vincent, P. Visualizing higher-layer features of a deep network. Univ. Montr. 2009, 1341, 1. 2. Olah, C.; Mordvintsev, A.; Schubert, L. Feature visualization. Distill 2017, 2, e7. [CrossRef] 3. Ding, X.; Zhang, Y.; Liu, T.; Duan, J. Deep learning for event-driven stock prediction. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. 4. Pacelli, V.; Azzollini, M. An artificial neural network approach for credit risk management. J. Intell. Learn. Syst. Appl. 2011, 3, 103–112. [CrossRef] 5. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. 6. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [CrossRef][PubMed] 7. Sutton, R.S. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Proceedings of the Advances in Neural Information Processing Systems 9, Los Angeles, CA, USA, September 1996; pp. 1038–1044. 8. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [CrossRef] 9. Moody, J.; Wu, L.; Liao, Y.; Saffell, M. Performance functions and reinforcement learning for trading systems and portfolios. J. Forecast. 1998, 17, 441–470. [CrossRef] 10. Dempster, M.A.; Payne, T.W.; Romahi, Y.; Thompson, G.W. Computational learning techniques for intraday FX trading using popular technical indicators. IEEE Trans. Neural Netw. 2001, 12, 744–754. [CrossRef] 11. Bekiros, S.D. Heterogeneous trading strategies with adaptive fuzzy actor–critic reinforcement learning: A behavioral approach. J. Econ. Dyn. Control 2010, 34, 1153–1170. [CrossRef] 12. Kearns, M.; Nevmyvaka, Y. Machine learning for market microstructure and high frequency trading. In High Frequency Trading: New Realities for Traders, Markets, and Regulators; Easley, D., de Prado, M.L., O’Hara, M., Eds.; Risk Books: London, UK, 2013. 13. Britz, D. Introduction to Learning to Trade with Reinforcement Learning. Available online: http: //www.wildml.com/2018/02/introduction-to-learning-to-tradewith-reinforcement-learning (accessed on 1 August 2018). 14. Guo, Y.; Fu, X.; Shi, Y.; Liu, M. Robust log-optimal strategy with reinforcement learning. arXiv 2018, arXiv:1805.00205. 15. Jiang, Z.; Xu, D.; Liang, J. A deep reinforcement learning framework for the financial portfolio management problem. arXiv 2017, arXiv:1706.10059. 16. Nosratabadi, S.; Mosavi, A.; Duan, P.; Ghamisi, P. Data science in economics. arXiv 2020, arXiv:2003.13422. Mathematics 2020, 8, 1640 38 of 42

17. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107. [CrossRef] 18. Chen, Y.; Zhao, X.; Jia, X. Spectral–spatial classification of hyperspectral data based on . IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392. [CrossRef] 19. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 25, Los Angeles, CA, USA, 1 June 2012; pp. 1097–1105. 20. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166. [CrossRef] 21. Williams, R.J.; Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989, 1, 270–280. [CrossRef] 22. Schmidhuber, J.; Hochreiter, S. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. 23. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. 24. Wu, J. Introduction to Convolutional Neural Networks; National Key Lab for Novel Software Technology, Nanjing University: Nanjing, China, 2017; Volume 5, p. 23. 25. François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. Found. Trends® Mach. Learn. 2018, 11, 219–354. [CrossRef] 26. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [CrossRef] 27. Bellman, R.E.; Dreyfus, S. Applied ; Princeton University Press: Princeton, NJ, USA, 1962. 28. Lin, L.-J. Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 1992, 8, 293–321. [CrossRef] 29. Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26–31. 30. Hasselt, H.V. Double Q-learning. In Proceedings of the Advances in Neural Information Processing Systems 23, San Diego, CA, USA, July 2010; pp. 2613–2621. 31. Bellemare, M.G.; Dabney, W.; Munos, R. A distributional perspective on reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, Sydney,Australia, 6–11 August 2017; Volume 70, pp. 449–458. 32. Dabney, W.; Rowland, M.; Bellemare, M.G.; Munos, R. Distributional reinforcement learning with quantile regression. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. 33. Rowland, M.; Bellemare, M.G.; Dabney, W.; Munos, R.; Teh, Y.W. An analysis of categorical distributional reinforcement learning. arXiv 2018, arXiv:1802.08163. 34. Morimura, T.; Sugiyama, M.; Kashima, H.; Hachiya, H.; Tanaka, T. Nonparametric return distribution approximation for reinforcement learning. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 799–806. 35. Jaderberg, M.; Mnih, V.; Czarnecki, W.M.; Schaul, T.; Leibo, J.Z.; Silver, D.; Kavukcuoglu, K. Reinforcement learning with unsupervised auxiliary tasks. arXiv 2016, arXiv:1611.05397. 36. Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. 
In Proceedings of the 12th International Conference, MLDM 2016, New York, NY, USA, 18–20 July 2016; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9729. 37. Salimans, T.; Ho, J.; Chen, X.; Sidor, S.; Sutskever, I. Evolution strategies as a scalable alternative to reinforcement learning. arXiv 2017, arXiv:1703.03864. 38. Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256. [CrossRef] 39. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on International Conference on Machine Learning, Beijing, China, 21–26 June 2014. 40. Hafner, R.; Riedmiller, M. Reinforcement learning in feedback control. Mach. Learn. 2011, 84, 137–169. [CrossRef] 41. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. Mathematics 2020, 8, 1640 39 of 42

42. Konda, V.R.; Tsitsiklis, J.N. Actor-critic algorithms. In Proceedings of the Advances in Neural Information Processing Systems 12, Chicago, IL, USA, 20 June 2000; pp. 1008–1014. 43. Wang, Z.; Bapst, V.; Heess, N.; Mnih, V.; Munos, R.; Kavukcuoglu, K.; de Freitas, N. Sample efficient actor-critic with experience replay. arXiv 2016, arXiv:1611.01224. 44. Gruslys, A.; Azar, M.G.; Bellemare, M.G.; Munos, R. The reactor: A sample-efficient actor-critic architecture. arXiv 2017, arXiv:1704.04651. 45. O’Donoghue, B.; Munos, R.; Kavukcuoglu, K.; Mnih, V. Combining policy gradient and Q-learning. arXiv 2016, arXiv:1611.01626. 46. Fox, R.; Pakman, A.; Tishby, N. Taming the noise in reinforcement learning via soft updates. arXiv 2015, arXiv:1512.08562. 47. Haarnoja, T.; Tang, H.; Abbeel, P.; Levine, S. Reinforcement learning with deep energy-based policies. In Proceedings of the 34th International Conference on Machine Learning, Sydney,Australia, 6–11 August 2017; Volume 70, pp. 1352–1361. 48. Schulman, J.; Chen, X.; Abbeel, P. Equivalence between policy gradients and soft q-learning. arXiv 2017, arXiv:1704.06440. 49. Oh, J.; Guo, X.; Lee, H.; Lewis, R.L.; Singh, S. Action-conditional video prediction using deep networks in atari games. In Proceedings of the Advances in Neural Information Processing Systems 28: 29th Annual Conference on Neural Information Processing Systems, Seattle, WA, USA, May 2015; pp. 2863–2871. 50. Mathieu, M.; Couprie, C.; LeCun, Y. Deep multi-scale video prediction beyond mean square error. arXiv 2015, arXiv:1511.05440. 51. Finn, C.; Goodfellow, I.; Levine, S. Unsupervised learning for physical interaction through video prediction. In Proceedings of the Advances in Neural Information Processing Systems Advances in Neural Information, Processing Systems 29, Barcelona, Spain, 5–10 December 2016; pp. 64–72. 52. Pascanu, R.; Li, Y.; Vinyals, O.; Heess, N.; Buesing, L.; Racanière, S.; Reichert, D.; Weber, T.; Wierstra, D.; Battaglia, P. Learning model-based planning from scratch. arXiv 2017, arXiv:1707.06170. 53. Deisenroth, M.; Rasmussen, C.E. PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, 28 June–2 July 2011; pp. 465–472. 54. Wahlström, N.; Schön, T.B.; Deisenroth, M.P. From pixels to torques: Policy learning with deep dynamical models. arXiv 2015, arXiv:1502.02251. 55. Levine, S.; Koltun, V. Guided policy search. In Proceedings of the International Conference on Machine Learning, Oslo, Norway, February 2013; pp. 1–9. 56. Gu, S.; Lillicrap, T.; Sutskever, I.; Levine, S. Continuous deep q-learning with model-based acceleration. In Proceedings of the International Conference on Machine Learning(ICML 2013), Atlanta, GA, USA, June 2013; pp. 2829–2838. 57. Nagabandi, A.; Kahn, G.; Fearing, R.S.; Levine, S. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In Proceedings of the 2018 IEEE International Conference on and Automation (ICRA), Brisbane, Australia, 21–26 May 2018; pp. 7559–7566. 58. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [CrossRef] 59. Heess, N.; Wayne, G.; Silver, D.; Lillicrap, T.; Erez, T.; Tassa, Y. Learning continuous control policies by stochastic value gradients. 
In Proceedings of the Advances in Neural Information Processing Systems; Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, December 2015; pp. 2944–2952. 60. Kansky, K.; Silver, T.; Mély, D.A.; Eldawy, M.; Lázaro-Gredilla, M.; Lou, X.; Dorfman, N.; Sidor, S.; Phoenix, S.; George, D. Schema networks: Zero-shot transfer with a generative causal model of intuitive physics. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1809–1818. 61. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock and stock price index movement using trend deterministic and machine learning techniques. Expert Syst. Appl. 2015, 42, 259–268. [CrossRef] Mathematics 2020, 8, 1640 40 of 42

62. Lien Minh, D.; Sadeghi-Niaraki, A.; Huy, H.D.; Min, K.; Moon, H. Deep learning approach for short-term stock trends prediction based on two-stream gated recurrent unit network. IEEE Access 2018, 6, 55392–55404. [CrossRef] 63. Song, Y.; Lee, J.W.; Lee, J. A study on novel filtering and relationship between input-features and target-vectors in a deep learning model for stock price prediction. Appl. Intell. 2019, 49, 897–911. [CrossRef] 64. Go, Y.H.; Hong, J.K. Prediction of stock value using pattern matching algorithm based on deep learning. Int. J. Recent Technol. Eng. 2019, 8, 31–35. [CrossRef] 65. Das, S.; Mishra, S. Advanced deep learning framework for stock value prediction. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 2358–2367. [CrossRef] 66. Huynh, H.D.; Dang, L.M.; Duong, D. A new model for stock price movements prediction using deep neural network. In Proceedings of the Eighth International Symposium on Information and Communication Technology, Ann Arbor, MI, USA, June 2016; pp. 57–62. 67. Mingyue, Q.; Cheng, L.; Yu, S. Application of the artifical neural network in predicting the direction of stock market index. In Proceedings of the 2016 10th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS), Fukuoka, Japan, 6–8 July 2016; pp. 219–223. 68. Weng, B.; Martinez, W.; Tsai, Y.T.; Li, C.; Lu, L.; Barth, J.R.; Megahed, F.M. Macroeconomic indicators alone can predict the monthly closing price of major US indices: Insights from artificial intelligence, time-series analysis and hybrid models. Appl. Soft Comput. 2017, 71, 685–697. [CrossRef] 69. Bodaghi, A.; Teimourpour, B. The detection of professional fraud in automobile insurance using social network analysis. arXiv 2018, arXiv:1805.09741. 70. Wang, Y.; Xu, W. Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decis. Support Syst. 2018, 105, 87–95. [CrossRef] 71. Siaminamini, M.; Naderpour, M.; Lu, J. Generating a risk profile for car insurance policyholders: A deep learning conceptual model. In Proceedings of the Australasian Conference on Information Systems, Geelong, Australia, 3–5 December 2012. 72. Myerson, R.B. Optimal auction design. Math. Oper. Res. 1981, 6, 58–73. [CrossRef] 73. Manelli, A.M.; Vincent, D.R. Bundling as an optimal selling mechanism for a multiple-good monopolist. J. Econ. Theory 2006, 127, 1–35. [CrossRef] 74. Pavlov, G. Optimal mechanism for selling two goods. BE J. Theor. Econ. 2011, 11, 122–144. [CrossRef] 75. Cai, Y.; Daskalakis, C.; Weinberg, S.M. An algorithmic characterization of multi-dimensional mechanisms. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing; Association for Computing Machinery, New York, NY, USA, 19–22 May 2012; pp. 459–478. 76. Dutting, P.; Zheng, F.; Narasimhan, H.; Parkes, D. Optimal economic design through deep learning. In Proceedings of the Conference on Neural Information Processing Systems (NIPS), Berlin, Germany, 4–9 December 2017. 77. Feng, Z.; Narasimhan, H.; Parkes, D.C. Deep learning for revenue-optimal auctions with budgets. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, Stockholm, Sweden, 10–15 July 2018; pp. 354–362. 78. Sakurai, Y.; Oyama, S.; Guo, M.; Yokoo, M. Deep false-name-proof auction mechanisms. In Proceedings of the International Conference on Principles and Practice of Multi-Agent Systems, Kolkata, India, 12–15 November 2010; pp. 594–601. 79. Luong, N.C.; Xiong, Z.; Wang, P.; Niyato, D. 
Optimal auction for edge computing resource management in mobile blockchain networks: A deep learning approach. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6. 80. Dütting, P.; Feng, Z.; Narasimhan, H.; Parkes, D.C.; Ravindranath, S.S. Optimal auctions through deep learning. arXiv 2017, arXiv:1706.03459. 81. Al-Shabi, M. Credit card fraud detection using autoencoder model in unbalanced datasets. J. Adv. Math. Comput. Sci. 2019, 33, 1–16. [CrossRef] 82. Roy, A.; Sun, J.; Mahoney, R.; Alonzi, L.; Adams, S.; Beling, P. Deep learning detecting fraud in credit card transactions. In Proceedings of the 2018 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, 27 April 2018; IEEE: Trondheim, Norway, 2018; pp. 129–134. Mathematics 2020, 8, 1640 41 of 42

83. Han, J.; Barman, U.; Hayes, J.; Du, J.; Burgin, E.; Wan, D. Nextgen aml: Distributed deep learning based language technologies to augment anti money laundering investigation. In Proceedings of the ACL 2018, System Demonstrations, Melbourne, Australia, 15–20 July 2018. 84. Pumsirirat, A.; Yan, L. Credit card fraud detection using deep learning based on auto-encoder and restricted boltzmann machine. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 18–25. [CrossRef] 85. Estrella, A.; Hardouvelis, G.A. The term structure as a predictor of real economic activity. J. Financ. 1991, 46, 555–576. [CrossRef] 86. Smalter Hall, A.; Cook, T.R. Macroeconomic indicator forecasting with deep neural networks. In Federal Reserve Bank of Kansas City Working Paper; Federal Reserve Bank of Kansas City: Kansas City, MO, USA, 2017; Volume 7, pp. 83–120. [CrossRef] 87. Haider, A.; Hanif, M.N. Inflation forecasting in Pakistan using artificial neural networks. Pak. Econ. Soc. Rev. 2009, 47, 123–138. 88. Chakravorty, G.; Awasthi, A. Deep learning for global tactical asset allocation. SSRN Electron. J. 2018, 3242432. [CrossRef] 89. Addo, P.; Guegan, D.; Hassani, B. Credit risk analysis using machine and deep learning models. Risks 2018, 6, 38. [CrossRef] 90. Ha, V.-S.; Nguyen, H.-N. Credit scoring with a feature selection approach based deep learning. In Proceedings of the MATEC Web of Conferences, Beijing, China, 25–27 May 2018; p. 05004. 91. Heaton, J.; Polson, N.; Witte, J.H. Deep learning for finance: Deep portfolios. Appl. Stoch. Models Bus. Ind. 2017, 33, 3–12. [CrossRef] 92. Sirignano, J.; Sadhwani, A.; Giesecke, K. Deep learning for mortgage risk. arXiv 2016, arXiv:1607.02470. [CrossRef] 93. Aggarwal, S.; Aggarwal, S. Deep investment in financial markets using deep learning models. Int. J. Comput. Appl. 2017, 162, 40–43. [CrossRef] 94. Culkin, R.; Das, S.R. Machine learning in finance: The case of deep learning for option pricing. J. Invest. Manag. 2017, 15, 92–100. 95. Fang, Y.; Chen, J.; Xue, Z. Research on quantitative investment strategies based on deep learning. Algorithms 2019, 12, 35. [CrossRef] 96. Serrano, W. The random neural network with a genetic algorithm and deep learning clusters in fintech: Smart investment. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Hersonissos, Crete, Greece, 24–26 May 2019; pp. 297–310. 97. Hutchinson, J.M.; Lo, A.W.; Poggio, T. A nonparametric approach to pricing and hedging derivative securities via learning networks. J. Financ. 1994, 49, 851–889. [CrossRef] 98. Cruz, E.; Orts-Escolano, S.; Gomez-Donoso, F.; Rizo, C.; Rangel, J.C.; Mora, H.; Cazorla, M. An augmented reality application for improving shopping experience in large retail stores. Virtual Real. 2019, 23, 281–291. [CrossRef] 99. Loureiro, A.L.; Miguéis, V.L.; da Silva, L.F. Exploring the use of deep neural networks for sales forecasting in fashion retail. Decis. Support Syst. 2018, 114, 81–93. [CrossRef] 100. Nogueira, V.; Oliveira, H.; Silva, J.A.; Vieira, T.; Oliveira, K. RetailNet: A deep learning approach for people counting and hot spots detection in retail stores. In Proceedings of the 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Rio de Janeiro, Brazil, 28–31 October 2019; pp. 155–162. 101. Ribeiro, F.D.S.; Caliva, F.; Swainson, M.; Gudmundsson, K.; Leontidis, G.; Kollias, S. An adaptable deep learning system for optical character verification in retail food packaging. 
In Proceedings of the 2018 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), Rhodes, Greece, 25–27 May 2018; pp. 1–8. 102. Fombellida, J.; Martín-Rubio, I.; Torres-Alegre, S.; Andina, D. Tackling business intelligence with bioinspired deep learning. Neural Comput. Appl. 2018, 1–8. [CrossRef] 103. Singh, V.; Verma, N.K. Deep learning architecture for high-level feature generation using stacked auto encoder for business intelligence. In Complex Systems: Solutions and Challenges in Economics, Management and Engineering; Springer: Berlin/Heidelberg, Germany, 2018; pp. 269–283. 104. Nolle, T.; Seeliger, A.; Mühlhäuser, M. BINet: Multivariate business process anomaly detection using deep learning. In Proceedings of the International Conference on Business Process Management, Sydney, Australia, 9–14 September 2018; pp. 271–287. Mathematics 2020, 8, 1640 42 of 42

105. Evermann, J.; Rehse, J.-R.; Fettke, P. Predicting process behaviour using deep learning. Decis. Support Syst. 2017, 100, 129–140. [CrossRef] 106. West, D. Neural network credit scoring models. Comput. Oper. Res. 2000, 27, 1131–1152. [CrossRef] 107. Peng, Y.; Kou, G.; Shi, Y.; Chen, Z. A multi-criteria convex quadratic programming model for credit . Decis. Support Syst. 2008, 44, 1016–1030. [CrossRef] 108. Kim, Y.; Ahn, W.; Oh, K.J.; Enke, D. An intelligent hybrid trading system for discovering trading rules for the futures market using rough sets and genetic algorithms. Appl. Soft Comput. 2017, 55, 127–140. [CrossRef] 109. Xiong, Z.; Liu, X.-Y.; Zhong, S.; Yang, H.; Walid, A. Practical deep reinforcement learning approach for stock trading. arXiv 2018, arXiv:1811.07522. 110. Li, X.; Li, Y.; Zhan, Y.; Liu, X.-Y. Optimistic bull or pessimistic bear: Adaptive deep reinforcement learning for stock portfolio allocation. arXiv 2019, arXiv:1907.01503. 111. Li, Y.; Ni, P.; Chang, V. An empirical research on the investment strategy of stock market based on deep reinforcement learning model. Comput. Sci. Econ. 2019.[CrossRef] 112. Azhikodan, A.R.; Bhat, A.G.; Jadhav, M.V. Stock Trading Bot Using Deep Reinforcement Learning. In Innovations in and Engineering; Springer: Berlin/Heidelberg, Germany, 2019; pp. 41–49. 113. Liang, Z.; Chen, H.; Zhu, J.; Jiang, K.; Li, Y. Adversarial deep reinforcement learning in portfolio management. arXiv 2018, arXiv:1808.09940. 114. Jiang, Z.; Liang, J. Cryptocurrency portfolio management with deep reinforcement learning. In Proceedings of the 2017 Intelligent Systems Conference (IntelliSys), London, UK, 7–8 September 2017; pp. 905–913. 115. Yu, P.; Lee, J.S.; Kulyatin, I.; Shi, Z.; Dasgupta, S. Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization. arXiv 2019, arXiv:1901.08740. 116. Feng, L.; Tang, R.; Li, X.; Zhang, W.; Ye, Y.; Chen, H.; Guo, H.; Zhang, Y. Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv 2018, arXiv:1810.12027. 117. Zhao, J.; Qiu, G.; Guan, Z.; Zhao, W.; He, X. Deep reinforcement learning for sponsored search real-time bidding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & , London, UK, 19–23 August 2018; pp. 1021–1030. 118. Liu, J.; Zhang, Y.; Wang, X.; Deng, Y.; Wu, X. Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning. arXiv 2019, arXiv:1912.02572. 119. Zheng, G.; Zhang, F.; Zheng, Z.; Xiang, Y.; Yuan, N.J.; Xie, X.; Li, Z. DRN: A deep reinforcement learning framework for news recommendation. In Proceedings of the 2018 Conference, Lyon, France, 23–27 April 2018; pp. 167–176. 120. Kompan, M.; Bieliková, M. Content-based news recommendation. In Proceedings of the International Conference on Electronic Commerce and Web Technologies, Valencia, Spain, September 2015; pp. 61–72. 121. Jaderberg, M.; Czarnecki, W.M.; Dunning, I.; Marris, L.; Lever, G.; Castaneda, A.G.; Beattie, C.; Rabinowitz, N.C.; Morcos, A.S.; Ruderman, A. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. arXiv 2018, arXiv:1807.01281. 122. Guéant, O.; Lasry, J.-M.; Lions, P.-L. Mean field games and applications. In Paris-Princeton Lectures on Mathematical Finance 2010; Springer: Berlin/Heidelberg, Germany, 2011; pp. 205–266. 123. Vasiliadis, A. An introduction to mean field games using probabilistic methods. arXiv 2019, arXiv:1907.01411.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).