Master's Thesis: Predicting Periodic and Chaotic Signals Using Wavenets

Predicting periodic and chaotic signals using Wavenets

Master of Science Thesis

For the degree of Master of Science in Applied Mathematics with the specialization Financial Engineering at Delft University of Technology

D.C.F. van den Assem (4336100)
August 18, 2017

Supervisor: Prof. dr. ir. C. W. Oosterlee, TU Delft
Thesis committee: Dr. S. M. Bohte, CWI Amsterdam
                  Dr. ir. R. J. Fokkink, TU Delft

Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS), Delft University of Technology

Copyright © Delft Institute of Applied Mathematics (DIAM). All rights reserved.

Abstract

This thesis discusses forecasting periodic time series using Wavenets, with an application to financial time series. Conventional neural networks used for forecasting, such as the LSTM and the fully convolutional network (FCN), are computationally expensive. The Wavenet uses dilated convolutions, which significantly reduce the computational cost compared to an FCN with the same number of inputs. Forecasts made on the sine wave show that the network can successfully produce a full forecast of a sine wave. Forecasts made on the Mackey Glass time series show that the network can outperform the LSTM and other methods. Furthermore, forecasts made on the Lorenz system show that the network is able to outperform the LSTM. By conditioning the network on the other relevant coordinate, the prediction becomes more accurate and the network is able to make full forecasts. In a financial application, the network shows less predictive accuracy than multivariate dynamic kernel support vector machines.

Table of Contents

Acknowledgements
1 Introduction
    1.1 Outline
2 Machine Learning
    2.1 Terminology in Machine Learning
    2.2 Classification in the Iris data set
    2.3 The Single Layer Perceptron Model
        2.3.1 Implementation and Training
        2.3.2 Example 2: The other data set
    2.4 Logistic regression
    2.5 Introduction to Neural Networks
    2.6 Summary
3 Neural Networks
    3.1 Network Architectures
        3.1.1 Activation Functions
        3.1.2 Convolutional Neural Networks
        3.1.3 Recurrent Neural Networks
    3.2 Supervised Learning of the Neural Network
        3.2.1 Backpropagation Algorithm
        3.2.2 Cost Function
        3.2.3 Stochastic Gradient Descent, Batch Gradient Descent and Mini-Batch Gradient Descent
        3.2.4 Initializers
    3.3 Regularization methods
        3.3.1 Bootstrap aggregating
        3.3.2 Dropout
    3.4 Wavenet
    3.5 Augmented Wavenet (AWN)
    3.6 Summary
4 Methodology
    4.1 Evaluation of the network
    4.2 Error Measures
        4.2.1 Statistical Testing
        4.2.2 Benchmarks
    4.3 Artificial Time Series
        4.3.1 The sine wave
        4.3.2 The Lorenz System
        4.3.3 Mackey Glass Equation
    4.4 Real world time series
        4.4.1 Data pre processing
5 Results
    5.1 Implementation comparison
    5.2 The Sine Wave
    5.3 The Mackey Glass Time Series
    5.4 The Lorenz System
    5.5 Results on financial time series
    5.6 Summary
6 Conclusion
    6.1 Summary and conclusion
    6.2 Future research
A Separation hyperplanes
B Glorot derivation
C Code of the model
Bibliography

List of Figures

2.1 Scatter plot of Iris data, Setosa (blue •), Versicolor (red ×), Virginica (green ♦). In the subplot on the first row and second column, the sepal width is plotted against the sepal length.
2.2 Scatter plot of Iris data, Setosa (blue •), Versicolor (red ×), Virginica (green ♦).
2.3 Schematic representation of the single layer perceptron.
2.4 Illustrations of the multi-class logit and the softmax implementation.
2.5 A multi layer network for solving the XOR-problem.
3.1 Graph of a layered network with E = E1 ∪ E2 ∪ E3, N = N1 ∪ N2.
3.2 The three different activation functions.
3.3 Illustration of the Receptive Field.
3.4 Illustration of the replications, shared weights and feature map.
3.5 Illustration of the Padding.
3.6 Illustration of the Strides.
3.7 Illustration of the causal convolutions.
3.8 Illustration of the causal convolutions with larger inputs and outputs.
3.9 Illustration of the dilated convolutions.
3.10 The RNN on the left and the unfolded RNN network on the right.
3.11 The LSTM block. The × and + are point-wise operators; σ and tanh are activation functions. Two joining arcs make a concatenate operation. Two splitting arcs make a copy operation.
3.12 Figures of paraboloids, with a = 1 and b = 2.
3.13 Behaviour of the training error and validation error during training.
3.14 Overview of the residual block and the entire architecture, retrieved from [1].
3.15 Illustration of the stacked dilated convolutions.
3.16 Overview of the architecture used in AugmentedWavenet, retrieved from [2].
5.1 The full forecast of the sine wave using different implementations.
5.2 Overview of the AWN I(4).
5.3 The full forecast of a noisy sine wave using AWN I(4).
5.4 The full forecast of the Mackey Glass time series using 8 layers on AWN I(4).
5.5 The full forecast of the Mackey Glass time series using 8 layers on AWN I(4) trained on one-ahead noisy data (σ = 0.1).
5.6 The one ahead forecast and the full forecast of the Lorenz system using 4 layers on AWN I(4).
5.7 Convergence behavior of the training of networks with different γ parameter, for 4 and 8 layers.
5.8 The full conditioned forecast of the Lorenz system using 4 layers on AWN I(4C).
5.9 Comparison of the one step ahead forecast (using months) of the AWN I(4) without and with tuning of the parameters.
A.1 Petal Length vs Petal Width.

List of Tables

2.1 XOR-problem using SLP on the left and Multi-Layer XOR-problem on the right.
3.1 Number of weights for networks with a ‘visual field’ of 512.
4.1 Difference in response to errors between MAE and RMSE.
5.1 The standard parameters used in the Wavenet.
5.2 MAE and MSE based on 1000 samples of full forecast.
5.3 - means that the forecast diverged, therefore the number is not useful.
5.4 Results for I(2) with a variation, I(3) and I(4).
5.5 The√ one-ahead and full forecast performance with different values for SNR. (SNR = 2 σ2 )
5.6 The results for the Mackey Glass t steps ahead forecast using 4 layers. Values are RMSE × 10^-3. The ± value is the standard deviation of the 10 runs.
5.7 Results for noisy Mackey Glass time series on two configurations. Configuration 1 uses 4 layers and configuration 2 uses 8 layers. Values are RMSE × 10^-2.
5.8 Results of the modified network for different γ; RMSE of the benchmark is 4.78 × 10^-3.
5.9 Comparison of the noisy conditioned Lorenz system. RMSE × 10^3.
5.10 Results from AWN I(4) and AWN I(4C) on the S&P500 data, conditioned with the CBOE data.
5.11 The standard parameters used in the AWN I(4) and AWN I(4C).

Acknowledgements

This thesis has been submitted
