Unsupervised Feature Learning from Time Series

Total Pages: 16

File Type: PDF, Size: 1020 KB

Unsupervised Feature Learning from Time Series
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

Qin Zhang*, Jia Wu*, Hong Yang§, Yingjie Tian†‡, Chengqi Zhang*
* Quantum Computation & Intelligent Systems Centre, University of Technology Sydney, Australia
† Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing, China
‡ Key Lab of Big Data Mining & Knowledge Management, Chinese Academy of Sciences, Beijing, China
§ MathWorks, Beijing, China
{qin.zhang@student., jia.wu@, chengqi.zhang@}uts.edu.au, [email protected], [email protected]

Abstract
In this paper we study the problem of learning discriminative features (segments), often referred to as shapelets [Ye and Keogh, 2009], of time series from unlabeled time series data. Discovering shapelets for time series classification has been widely studied, where many search-based algorithms are proposed to efficiently scan and select segments from a pool of candidates. However, such search-based algorithms may incur high time cost when the segment candidate pool is large. Alternatively, a recent work [Grabocka et al., 2014] uses regression learning to directly learn, instead of searching for, shapelets from time series. Motivated by the above observations, we propose a new Unsupervised Shapelet Learning Model (USLM) to efficiently learn shapelets from unlabeled time series data. The corresponding learning function integrates the strengths of pseudo-class labels, spectral analysis, a shapelet regularization term and regularized least-squares to auto-learn shapelets, pseudo-class labels and classification boundaries simultaneously. A coordinate descent algorithm is used to iteratively solve the learning function. Experiments show that USLM outperforms search-based algorithms on real-world time series data.

Figure 1: The two blue thin lines represent the time series data. The upper one is a rectangular signal with noise and the lower one is a sinusoidal signal with noise. The curves marked in bold red are learnt features (shapelets). We can observe that the learnt shapelets may differ from all candidate segments and thus are robust to noise.

1 Introduction
Time series classification has wide applications in finance [Ruiz et al., 2012], medicine [Hirano and Tsumoto, 2006] and trajectory analysis [Cai and Ng, 2004]. The main challenge of time series classification is to find discriminative features that can best predict class labels. To solve this challenge, a line of works have been proposed to extract discriminative features, which are often referred to as shapelets, from time series. Shapelets are maximally discriminative features of time series which enjoy the merit of high prediction accuracy and are easy to explain [Ye and Keogh, 2009]. Therefore, discovering shapelets has become an important branch in time series analysis.

The seminal work on shapelet discovery [Ye and Keogh, 2009] resorts to a full scan of all possible time series segments, where the segments are ranked according to a pre-defined distance metric and the segments which best predict the class labels are selected as shapelets. Based on the seminal work, a line of speed-up algorithms [Mueen et al., 2011][Rakthanmanon and Keogh, 2013][Chang et al., 2012] have been proposed to improve the performance. All these methods can be categorized as search-based algorithms which scan a pool of candidate segments. For example, for the Synthetic Control dataset [Chen et al., 2015], which contains 600 time series examples of length 60, the number of candidates over all lengths is 1.098 × 10^6.
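To make the size of that candidate pool concrete, the short sketch below (an illustration added here, not code from the paper) computes the standard sliding-window distance between a shapelet and a series, and reproduces the candidate count quoted above for the Synthetic Control dataset; the function names and toy data are ours.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any equal-length
    subsequence of the series (the usual sliding-window matching step)."""
    w = np.lib.stride_tricks.sliding_window_view(np.asarray(series), len(shapelet))
    return np.sqrt(((w - shapelet) ** 2).sum(axis=1)).min()

x = np.sin(np.linspace(0, 10, 60))
print(shapelet_distance(x, x[10:20]))   # 0.0: the shapelet is an exact subsequence

# Candidate-pool size for the Synthetic Control example quoted above:
# 600 series of length 60, with candidate segments of every length 1..60.
n, m = 600, 60
num_candidates = sum(n * (m - l + 1) for l in range(1, m + 1))
print(num_candidates)                    # 1,098,000, i.e. 1.098 x 10^6
```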
On the other hand, a recent work [Grabocka et al., 2014] proposes a new time series shapelet learning approach. Instead of searching for shapelets from a candidate pool, they use regression learning and aim to learn shapelets from time series. This way, shapelets are detached from candidate segments and the learnt shapelets may differ from all the candidate segments. More importantly, shapelet learning is fast to compute, scalable to large datasets, and robust to noise, as shown in Fig. 1.

We present a new Unsupervised Shapelet Learning Model (USLM for short) that can auto-learn shapelets from unlabeled time series data. We first introduce pseudo-class labels to transform unsupervised learning into supervised learning. Then, we use the popular regularized least-squares and spectral analysis approaches to learn both shapelets and classification boundaries. A new regularization term is also added to avoid similar shapelets being selected. A coordinate descent algorithm is used to iteratively solve for the pseudo-class labels, classification boundaries and shapelets.
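The following is a schematic sketch of that alternating, coordinate-descent-style structure, written here only to illustrate the idea: it is not the authors' objective or update rules, the shapelet update itself is omitted, and all function names and the toy data are assumptions.

```python
import numpy as np

def transform(X, shapelets):
    """Feature map: minimum sliding-window distance from each series to each shapelet."""
    F = np.zeros((len(X), len(shapelets)))
    for i, x in enumerate(X):
        for j, s in enumerate(shapelets):
            w = np.lib.stride_tricks.sliding_window_view(x, len(s))
            F[i, j] = np.sqrt(((w - s) ** 2).sum(axis=1)).min()
    return F

def pseudo_labels(F, c=2):
    """Spectral step: soft cluster indicators from the Laplacian of a similarity graph."""
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (d2.mean() + 1e-12))
    L = np.diag(S.sum(1)) - S
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :c]                                  # pseudo-class labels Y

def boundaries(F, Y, lam=1.0):
    """Regularized least-squares (ridge) classification boundaries W."""
    k = F.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(k), F.T @ Y)

# Coordinate-descent-style outer loop (schematic): alternate the three updates.
rng = np.random.default_rng(0)
X = [rng.standard_normal(60) for _ in range(20)]        # toy unlabeled series
shapelets = [rng.standard_normal(10) for _ in range(3)] # initial shapelets
for it in range(5):
    F = transform(X, shapelets)
    Y = pseudo_labels(F)
    W = boundaries(F, Y)
    # A full implementation would now also take a gradient step on the shapelets
    # themselves (they are continuous parameters); omitted here for brevity.
```

Each outer iteration refreshes the shapelet-transformed features, the spectral pseudo-labels and the ridge classifier in turn, which mirrors the alternating solution described above.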
Compared to the search-based algorithms on unlabeled time series data [Zakaria et al., 2012], our method provides a new tool that learns shapelets from unlabeled time series data. The contributions of the work are summarized as follows:
• We present a new Unsupervised Shapelet Learning Model (USLM) to learn shapelets from unlabeled time series data. USLM combines pseudo-class labels, spectral analysis, shapelet regularization and regularized least-squares for learning.
• We empirically validate the performance of USLM on both synthetic and real-world data. The results show promising performance compared to the state-of-the-art unsupervised shapelet selection models.

The remainder of the paper is organized as follows: Section 2 surveys related work. Section 3 gives the preliminaries. Section 4 introduces the proposed unsupervised shapelet learning model USLM. Section 5 introduces the learning algorithm with analysis. Section 6 conducts experiments. We conclude the paper in Section 7.

2 Related Work
Shapelets [Ye and Keogh, 2009] are short time series segments that can best predict class labels. The basic idea of shapelet discovery is to consider all segments of the training data and assess them with a scoring function that estimates how predictive they are with respect to the given class labels [Wistuba et al., 2015]. The seminal work [Ye and Keogh, 2009] builds a decision tree classifier by recursively searching for informative shapelets measured by information gain. Besides information gain, several other measures such as F-Stat, Kruskal-Wallis and Mood's median are used in shapelet selection [Hills et al., 2014][Lines et al., 2012].

Since time series data usually have a large number of candidate segments, the runtime of brute-force shapelet selection is infeasible. Therefore, a series of speed-up techniques have been proposed. On the one hand, there are smart implementations using early abandon of distance computations and entropy pruning of the information gain heuristic [Ye and Keogh, 2009]. On the other hand, many speed-ups rely on the reuse of computations and pruning of the search space [Mueen et al., 2011], as well as pruning candidates by searching possibly interesting candidates on the SAX representation [Rakthanmanon and Keogh, 2013] or using infrequent shapelets [He et al., 2012]. Shapelets have been applied in a series of real-world applications.

Shapelet learning. A recent work [Grabocka et al., 2014] learns shapelets directly through regression learning. This type of learning method does not consider a limited set of candidates but can obtain arbitrary shapelets.

Unsupervised feature selection. Many unsupervised feature selection algorithms have been proposed to select informative features from unlabeled data. A commonly used criterion in unsupervised feature learning is to select the features that best preserve data similarity or the manifold structure constructed from the whole feature space [Zhao and Liu, 2007][Cai et al., 2010], but these methods fail to incorporate the discriminative information implied within the data and cannot be directly applied to our shapelet learning problem. Earlier unsupervised feature selection algorithms evaluate the importance of each feature individually and select features one by one [He et al., 2005][Zhao and Liu, 2007], with the limitation that correlation among features is neglected [Cai et al., 2010][Zhao et al., 2010][Zhang et al., 2015]. State-of-the-art unsupervised feature selection algorithms perform feature selection by simultaneously exploiting discriminative information and feature correlation. Unsupervised Discriminative Feature Selection (UDFS) [Yang et al., 2011] aims to select the most discriminative features for data representation, where manifold structure is also considered. Since the most discriminative information for feature selection is usually encoded in labels, it is very important to predict good cluster indicators as pseudo labels for unsupervised feature selection.

Shapelets for clustering. Shapelets have also been utilized to cluster time series [Zakaria et al., 2012]. Zakaria et al. [Zakaria et al., 2012] have proposed a method that uses unsupervised shapelets (u-Shapelets) for time series clustering. The algorithm searches for u-Shapelets which can separate and remove a subset of time series from the rest of the dataset, then iteratively repeats the search among the remaining data until no data remains to be separated. It is a greedy search algorithm which attempts to maximize the gap between the two groups of time series divided by a u-shapelet. The k-Shape algorithm [Paparrizos and Gravano, 2015] is another approach to clustering time series: a shape-based clustering algorithm that is efficient and domain independent, based on a scalable iterative refinement procedure which creates homogeneous and well-separated clusters. Specifically, k-Shape requires a distance measure that is invariant to scaling and shifting; it uses a normalized version of the cross-correlation measure as its distance to compare the shapes of time series (a minimal sketch of this distance is given below). Based on the normalized cross-correlation, the method computes cluster centroids in every iteration to update the assignment of time series to clusters.

Our work differs from the above research problems. We introduce a new approach for unsupervised shapelet learning to auto-learn shapelets from unlabeled time series by combining pseudo-class labels, spectral analysis, shapelet regularization and regularized least-squares.
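As a small illustration of the shift-invariant distance mentioned in the k-Shape paragraph above, here is a minimal version of the shape-based distance built from the coefficient-normalized cross-correlation. It is our simplification: z-normalization and the centroid computation of k-Shape are omitted.

```python
import numpy as np

def sbd(x, y):
    """Shape-based distance: 1 minus the maximum of the coefficient-normalized
    cross-correlation over all shifts."""
    ncc = np.correlate(x, y, mode="full") / (np.linalg.norm(x) * np.linalg.norm(y))
    return 1.0 - ncc.max()

t = np.linspace(0, 4 * np.pi, 100)
a = np.sin(t)
b = np.sin(t + 0.5)              # the same shape, shifted in phase
print(sbd(a, b), sbd(a, -a))     # small for the shifted copy, larger for the inverted one
```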
Recommended publications
  • Automatic Feature Learning Using Recurrent Neural Networks
    Predicting purchasing intent: Automatic Feature Learning using Recurrent Neural Networks
    Humphrey Sheil (Cardiff University, Cardiff, Wales), Omer Rana (Cardiff University, Cardiff, Wales), Ronan Reilly (Maynooth University, Maynooth, Ireland)
    [email protected], [email protected], [email protected]

    ABSTRACT
    We present a neural network for predicting purchasing intent in an Ecommerce setting. Our main contribution is to address the significant investment in feature engineering that is usually associated with state-of-the-art methods such as Gradient Boosted Machines. We use trainable vector spaces to model varied, semi-structured input data comprising categoricals, quantities and unique instances. Multi-layer recurrent neural networks capture both session-local and dataset-global event dependencies and relationships for user sessions of any length. An exploration of model design decisions including parameter sharing and skip connections further increases model accuracy. Results on benchmark datasets deliver classification accuracy within 98% of state-of-the-art on one and exceed state-of-the-art on the second, without the need for any domain/dataset-specific feature engineering, on both short and long event sequences.

    ... influence three out of four major variables that affect profit. In addition, merchants increasingly rely on (and pay advertising to) much larger third-party portals (for example eBay, Google, Bing, Taobao, Amazon) to achieve their distribution, so any direct measures the merchant group can use to increase their profit are sorely needed.

    Table 1: Effect of improving different variables on operating profit, from [22].
                       McKinsey   A.T. Kearney   Affected by
    Price management   11.1%      8.2%           Yes
    Variable cost      7.8%       5.1%           Yes
    Sales volume       3.3%       3.0%           Yes
    Fixed cost         2.3%       2.0%           No
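As a rough illustration of the kind of model the abstract above describes, the sketch below embeds categorical click-stream events and runs a recurrent network over a session to score purchase intent. It is a toy sketch, not the authors' architecture: the layer sizes, the single categorical input and all names are assumptions, and the paper's parameter sharing and skip connections are not shown.

```python
import torch
import torch.nn as nn

class SessionClassifier(nn.Module):
    """Toy sketch: embed categorical click-stream events, run a 2-layer GRU
    over the session, and predict purchase intent from the final state."""
    def __init__(self, n_items=1000, emb_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_items, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, item_ids):                    # item_ids: (batch, session_len)
        h, _ = self.rnn(self.embed(item_ids))
        return torch.sigmoid(self.head(h[:, -1]))   # P(purchase) per session

model = SessionClassifier()
sessions = torch.randint(0, 1000, (4, 12))           # 4 sessions of 12 events each
print(model(sessions).shape)                          # torch.Size([4, 1])
```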
  • Self-Discriminative Learning for Unsupervised Document Embedding
    Self-Discriminative Learning for Unsupervised Document Embedding
    Hong-You Chen*1, Chin-Hua Hu*1, Leila Wehbe2, Shou-De Lin1
    1 Department of Computer Science and Information Engineering, National Taiwan University
    2 Machine Learning Department, Carnegie Mellon University
    {b03902128, [email protected]}, [email protected], [email protected]

    Abstract
    Unsupervised document representation learning is an important task providing pre-trained features for NLP applications. Unlike most previous work which learns the embedding based on self-prediction of the surface of text, we explicitly exploit the inter-document information and directly model the relations of documents in embedding space with a discriminative network and a novel objective. Extensive experiments on both small and large public datasets show the competitiveness of the proposed method. In evaluations on standard document classification, our model has errors that are relatively 5 to 13% lower than state-of-the-art unsupervised embedding models. The reduction in error is even more pronounced in

    ... meaningful a document embedding as they do not consider inter-document relationships. Traditional document representation models such as Bag-of-words (BoW) and TF-IDF show competitive performance in some tasks (Wang and Manning, 2012). However, these models treat words as flat tokens which may neglect other useful information such as word order and semantic distance. This in turn can limit the models' effectiveness on more complex tasks that require a deeper level of understanding. Further, BoW models suffer from high dimensionality and sparsity. This is likely to prevent them from being used as input features for downstream NLP tasks. Continuous vector representations for documents are being developed.
  • Exploring the Potential of Sparse Coding for Machine Learning
    Portland State University, PDXScholar
    Dissertations and Theses, 10-19-2020
    Exploring the Potential of Sparse Coding for Machine Learning
    Sheng Yang Lundquist, Portland State University
    Follow this and additional works at: https://pdxscholar.library.pdx.edu/open_access_etds
    Part of the Applied Mathematics Commons, and the Computer Sciences Commons
    Let us know how access to this document benefits you.
    Recommended Citation: Lundquist, Sheng Yang, "Exploring the Potential of Sparse Coding for Machine Learning" (2020). Dissertations and Theses. Paper 5612. https://doi.org/10.15760/etd.7484
    This Dissertation is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected].

    Exploring the Potential of Sparse Coding for Machine Learning
    by Sheng Y. Lundquist
    A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science
    Dissertation Committee: Melanie Mitchell, Chair; Feng Liu; Bart Massey; Garrett Kenyon; Bruno Jedynak
    Portland State University, 2020. © 2020 Sheng Y. Lundquist

    Abstract
    While deep learning has proven to be successful for various tasks in the field of computer vision, there are several limitations of deep-learning models when compared to human performance. Specifically, human vision is largely robust to noise and distortions, whereas deep-learning performance tends to be brittle to modifications of test images, including being susceptible to adversarial examples. Additionally, deep-learning methods typically require very large collections of training examples for good performance on a task, whereas humans can learn to perform the same task with a much smaller number of training examples.
  • Q-Learning in Continuous State and Action Spaces
    Q-Learning in Continuous State and Action Spaces
    Chris Gaskett, David Wettergreen, and Alexander Zelinsky
    Robotic Systems Laboratory, Department of Systems Engineering, Research School of Information Sciences and Engineering, The Australian National University, Canberra, ACT 0200, Australia
    [cg|dsw|alex]@syseng.anu.edu.au

    Abstract. Q-learning can be used to learn a control policy that maximises a scalar reward through interaction with the environment. Q-learning is commonly applied to problems with discrete states and actions. We describe a method suitable for control tasks which require continuous actions, in response to continuous states. The system consists of a neural network coupled with a novel interpolator. Simulation results are presented for a non-holonomic control task. Advantage Learning, a variation of Q-learning, is shown to enhance learning speed and reliability for this task.

    1 Introduction
    Reinforcement learning systems learn by trial-and-error which actions are most valuable in which situations (states) [1]. Feedback is provided in the form of a scalar reward signal which may be delayed. The reward signal is defined in relation to the task to be achieved; reward is given when the system is successfully achieving the task. The value is updated incrementally with experience and is defined as a discounted sum of expected future reward. The learning system's choice of actions in response to states is called its policy. Reinforcement learning lies between the extremes of supervised learning, where the policy is taught by an expert, and unsupervised learning, where no feedback is given and the task is to find structure in data.
  • Text-To-Speech Synthesis
    Generative Model-Based Text-to-Speech Synthesis
    Heiga Zen (Google London), February rd, @MIT

    Outline: Generative TTS; generative acoustic models for parametric TTS (hidden Markov models (HMMs), neural networks); beyond parametric TTS (learned features, WaveNet, end-to-end); conclusion & future topics.

    Text-to-speech as sequence-to-sequence mapping: automatic speech recognition (ASR) maps speech to the text "Hello my name is Heiga Zen"; machine translation (MT) maps "Hello my name is Heiga Zen" to "Ich heiße Heiga Zen"; text-to-speech synthesis (TTS) maps the text "Hello my name is Heiga Zen" to speech.

    [Slide: diagram of the speech production process, showing a sound source (voiced: pulses at the fundamental frequency; unvoiced: noise) modulated by the vocal-tract transfer characteristics to produce speech.]

    Typical flow of a TTS system: TEXT enters the NLP frontend for text analysis (sentence segmentation, word segmentation, text normalization, part-of-speech tagging, pronunciation, prosody prediction), whose discrete output drives waveform generation in the speech backend to produce SYNTHESIZED SPEECH.

    Rule-based, formant synthesis []
  • Reinforcement Learning Data Science Africa 2018 Abuja, Nigeria (12 Nov - 16 Nov 2018)
    Reinforcement Learning
    Data Science Africa 2018, Abuja, Nigeria (12 Nov - 16 Nov 2018)
    Chika Yinka-Banjo, PhD (University of Lagos, Nigeria) and Ayorkor Korsah, PhD (Ashesi University, Ghana)

    Outline
    • Introduction to Machine learning
    • Reinforcement learning definitions
    • Example reinforcement learning problems
    • The Markov decision process
    • The optimal policy
    • Value function & Q-value function
    • Bellman Equation
    • Q-learning
    • Building a simple Q-learning agent (coding)
    • Recap
    • Where to go from here?

    Introduction to Machine learning
    • Artificial Intelligence (AI) is the study and design of intelligent agents.
    • An intelligent agent can perceive its environment through sensors and it can act on its environment through actuators.
    • E.g. Agent: humanoid robot. Environment: Earth? Sensors: camera, tactile sensor, etc. Actuators: motors, grippers, etc.
    • Machine learning is a subfield of Artificial Intelligence.
    [Slide: Branches of AI]

    Introduction to Machine learning
    • Machine learning techniques learn from data without being explicitly programmed to do so.
    • Machine learning models enable the agent to learn from its own experience by extracting useful information from feedback from its environment.
    • Three types of learning feedback: supervised learning, unsupervised learning, reinforcement learning.
    [Slide: Branches of Machine learning]

    Supervised learning
    • Supervised learning: the machine learning model is trained on many labelled examples of input-output pairs, such that when presented with a novel input, the model can estimate accurately what the correct output should be.
    • Data (x, y): x is the input data, y is the label. Goal: learn a function to map x -> y.
    • Examples include classification, regression, object detection, image captioning, etc.
    [Slide: a supervised learning task in the form of classification]

    Unsupervised learning
    • Unsupervised learning: here the model extracts useful information from unlabeled and unstructured data.
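The outline above lists the Bellman equation, Q-learning and building a simple Q-learning agent; a minimal tabular version of such an agent on a toy chain environment might look like the following (the environment, hyperparameters and all names are our own illustration, not the workshop's code).

```python
import numpy as np

# Tabular Q-learning on a toy 5-state chain: move left/right, reward at the right end.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for _ in range(500):                                  # episodes
    s, done = 0, False
    while not done:
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Q-learning (Bellman) update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(np.round(Q, 2))   # action 1 (move right) should dominate in every state
```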
  • Unsupervised Speech Representation Learning Using WaveNet Autoencoders
    Unsupervised speech representation learning using WaveNet autoencoders
    Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

    Abstract—We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately re-

    ... speaker gender and identity, from phonetic content, properties which are consistent with internal representations learned by speech recognizers [13], [14]. Such representations are desired in several tasks, such as low resource automatic speech recognition (ASR), where only a small amount of labeled training data is available. In such a scenario, limited amounts of data may be sufficient to learn an acoustic model on the representation discovered without supervision, but insufficient to learn the acoustic model and a data representation in a fully supervised manner [15], [16]. We focus on representations learned with autoencoders applied to raw waveforms and spectrogram features and investigate the quality of learned representations on LibriSpeech [17].
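Of the three bottlenecks compared in the abstract above, the VQ-VAE is the least familiar; the sketch below shows only its quantization step, replacing each latent vector with its nearest codebook entry. This is a simplified illustration of ours: the straight-through gradient, commitment loss and WaveNet decoder are all omitted.

```python
import numpy as np

def vector_quantize(z, codebook):
    """VQ step (as in a VQ-VAE bottleneck): map each latent vector to its
    nearest codebook entry and return the chosen code indices."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (T, K) distances
    idx = d.argmin(axis=1)
    return codebook[idx], idx

rng = np.random.default_rng(0)
latents = rng.standard_normal((50, 16))     # 50 frames of 16-d encoder output
codebook = rng.standard_normal((8, 16))     # 8 discrete codes
quantized, codes = vector_quantize(latents, codebook)
print(codes[:10])
```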
  • Sparse Feature Learning for Deep Belief Networks
    Sparse Feature Learning for Deep Belief Networks
    Marc'Aurelio Ranzato 1, Y-Lan Boureau 2,1, Yann LeCun 1
    1 Courant Institute of Mathematical Sciences, New York University
    2 INRIA Rocquencourt
    {ranzato, ylan, [email protected]}

    Abstract
    Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured.

    1 Introduction
    One of the main purposes of unsupervised learning is to produce good representations for data that can be used for detection, recognition, prediction, or visualization. Good representations eliminate irrelevant variabilities of the input data, while preserving the information that is useful for the ultimate task. One cause for the recent resurgence of interest in unsupervised learning is the ability to produce deep feature hierarchies by stacking unsupervised modules on top of each other, as proposed by Hinton et al.
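As a generic example of "reconstructing the input while constraining the representation to be sparse", here is a small ISTA-style sparse coding routine. It is not the encoder-decoder machine proposed in the paper above; the dictionary, penalty and all names are our assumptions.

```python
import numpy as np

def sparse_codes(X, D, lam=0.1, n_iter=100):
    """Generic sparse coding by ISTA: minimize 0.5*||x - D z||^2 + lam*||z||_1 per sample."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the smooth part
    Z = np.zeros((X.shape[0], D.shape[1]))
    for _ in range(n_iter):
        grad = (Z @ D.T - X) @ D               # gradient of the reconstruction error
        Z = Z - grad / L
        Z = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)   # soft threshold
    return Z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))              # overcomplete dictionary: 20-d data, 50 atoms
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((5, 20))
Z = sparse_codes(X, D)
print((np.abs(Z) > 1e-8).sum(axis=1))          # number of active atoms per sample
```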
  • A Review of Unsupervised Artificial Neural Networks with Applications
    A REVIEW OF UNSUPERVISED ARTIFICIAL NEURAL NETWORKS WITH APPLICATIONS
    Samson Damilola Fabiyi, Department of Electronic and Electrical Engineering, University of Strathclyde, 204 George Street, G1 1XW, Glasgow, United Kingdom, [email protected]

    ABSTRACT
    Artificial Neural Networks (ANNs) are models formulated to mimic the learning capability of human brains. Learning in ANNs can be categorized into supervised, reinforcement and unsupervised learning. Application of supervised ANNs is limited to when the supervisor's knowledge of the environment is sufficient to supply the networks with labelled datasets. Application of unsupervised ANNs becomes imperative in situations where it is very difficult to get labelled datasets. This paper presents the various methods and applications of unsupervised ANNs. In order to achieve this, several secondary sources of information, including academic journals and conference proceedings, were selected. Autoencoders, self-

    ... (designer) who uses his or her knowledge of the environment to train the network with labelled data sets [7]. Hence, the artificial neural networks learn by receiving input and target pairs of several observations from the labelled data sets, processing the input, comparing the output with the target, computing the error between the output and target, and using the error signal and the concept of backward propagation to adjust the weights interconnecting the network's neurons, with the aim of minimising the error and optimising performance [6, 7]. Fine-tuning of the network continues until the set of weights that minimise the discrepancy between the output and the desired output is obtained. Figure 1 shows the block diagram which conceptualizes supervised learning in ANNs.
  • A Hybrid Model Consisting of Supervised and Unsupervised Learning for Landslide Susceptibility Mapping
    Remote Sensing, Article: A Hybrid Model Consisting of Supervised and Unsupervised Learning for Landslide Susceptibility Mapping
    Zhu Liang 1, Changming Wang 1,*, Zhijie Duan 2, Hailiang Liu 1, Xiaoyang Liu 1 and Kaleem Ullah Jan Khan 1
    1 College of Construction Engineering, Jilin University, Changchun 130012, China; [email protected] (Z.L.); [email protected] (H.L.); [email protected] (X.L.); [email protected] (K.U.J.K.)
    2 State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China; [email protected]
    * Correspondence: [email protected]; Tel.: +86-135-0441-8751

    Abstract: Landslides cause huge damage to the economy and to human beings every year. Landslide susceptibility mapping (LSM) occupies an important position in land use and risk management. This study investigates a hybrid model which makes full use of the advantages of a supervised learning model (SLM) and an unsupervised learning model (ULM). Firstly, ten continuous variables were used to develop a ULM, consisting of factor analysis (FA) and k-means clustering, for a preliminary landslide susceptibility map. Secondly, 351 landslides with a "1" label were collected, and the same number of non-landslide samples with a "0" label were selected from the very low susceptibility area in the preliminary map, constituting a new prior condition for a SLM; thirteen factors were used for the modelling of a gradient boosting decision tree (GBDT), which represented the SLM. Finally, the performance of the different models was verified using related indexes. The results showed that the performance of the pretreated GBDT model was improved, with sensitivity, specificity, accuracy and area under the curve (AUC) values of 88.60%, 92.59%, 90.60% and 0.976, respectively.
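A stripped-down version of the two-stage pipeline described above (unsupervised clustering to propose very-low-susceptibility areas, then a gradient-boosted classifier on landslide/non-landslide samples) could look like the following sketch on synthetic data; the factor analysis step and the real conditioning factors are omitted, and every name and number here is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_cont = rng.standard_normal((500, 10))        # stand-in for 10 continuous conditioning factors

# Unsupervised stage: cluster the study area into preliminary susceptibility groups.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_cont)

# Supervised stage: treat cluster 0 as "very low susceptibility", sample
# pseudo-negatives from it, pair them with known landslide points, train GBDT.
X_pos = rng.standard_normal((100, 10)) + 1.0   # stand-in for mapped landslide samples
X_neg = X_cont[clusters == 0][:100]
X_train = np.vstack([X_pos, X_neg])
y_train = np.hstack([np.ones(len(X_pos)), np.zeros(len(X_neg))])
gbdt = GradientBoostingClassifier().fit(X_train, y_train)
print(gbdt.predict_proba(X_cont[:5])[:, 1])    # susceptibility scores for a few cells
```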
  • 4 Perceptron Learning
    4 Perceptron Learning

    4.1 Learning algorithms for neural networks
    In the two preceding chapters we discussed two closely related models, McCulloch-Pitts units and perceptrons, but the question of how to find the parameters adequate for a given task was left open. If two sets of points have to be separated linearly with a perceptron, adequate weights for the computing unit must be found. The operators that we used in the preceding chapter, for example for edge detection, used hand-customized weights. Now we would like to find those parameters automatically. The perceptron learning algorithm deals with this problem.
    A learning algorithm is an adaptive method by which a network of computing units self-organizes to implement the desired behavior. This is done in some learning algorithms by presenting some examples of the desired input-output mapping to the network. A correction step is executed iteratively until the network learns to produce the desired response. The learning algorithm is a closed loop of presentation of examples and of corrections to the network parameters, as shown in Figure 4.1.

    [Fig. 4.1. Learning process in a parametric system: the network is tested on input-output examples, the error is computed, and the network parameters are adjusted.]
    (R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996)

    In some simple cases the weights for the computing units can be found through a sequential test of stochastically generated numerical combinations. However, such algorithms which look blindly for a solution do not qualify as "learning". A learning algorithm must adapt the network parameters according to previous experience until a solution is found, if it exists.
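The correction loop described above is easy to state in code; here is a minimal version of the classic perceptron learning rule on a linearly separable toy problem (our example, not the book's code).

```python
import numpy as np

def perceptron_train(X, y, epochs=50):
    """Classic perceptron rule: on each misclassified example, nudge the
    weights toward (or away from) that example until no errors remain."""
    w = np.zeros(X.shape[1] + 1)                    # last entry is the bias
    Xb = np.hstack([X, np.ones((len(X), 1))])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            if pred != yi:
                w += (yi - pred) * xi               # correction step
                errors += 1
        if errors == 0:                              # converged: data separated
            break
    return w

# Linearly separable toy data: class is 1 when x0 + x1 > 1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 2))
y = (X.sum(axis=1) > 1).astype(int)
print(perceptron_train(X, y))
```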
  • Unsupervised Pre-Training of a Deep LSTM-Based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
    www.nature.com/scientificreports OPEN
    Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
    Alaa Sagheer 1,2,3* & Mostafa Kotb 2,3

    Currently, most real-world time series datasets are multivariate and are rich in dynamical information of the underlying system. Such datasets are attracting much attention; therefore, the need for accurate modelling of such high-dimensional datasets is increasing. Recently, the deep architecture of the recurrent neural network (RNN) and its variant long short-term memory (LSTM) have been proven to be more accurate than traditional statistical methods in modelling time series data. Despite the reported advantages of the deep LSTM model, its performance in modelling multivariate time series (MTS) data has not been satisfactory, particularly when attempting to process highly non-linear and long-interval MTS datasets. The reason is that the supervised learning approach initializes the neurons randomly in such recurrent networks, disabling the neurons that ultimately must properly learn the latent features of the correlated variables included in the MTS dataset. In this paper, we propose a pre-trained LSTM-based stacked autoencoder (LSTM-SAE) approach in an unsupervised learning fashion to replace the random weight initialization strategy adopted in deep LSTM recurrent networks. For evaluation purposes, two different case studies that include real-world datasets are investigated, where the performance of the proposed approach compares favourably with the deep LSTM approach. In addition, the proposed approach outperforms several reference models investigating the same case studies. Overall, the experimental results clearly show that the unsupervised pre-training approach improves the performance of deep LSTM and leads to better and faster convergence than other models.
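A minimal single-layer version of the reconstruction objective used for such unsupervised pre-training might look like the sketch below; the full LSTM-SAE stacks several layers and trains them layer-wise, which is not shown, and all sizes and names are our assumptions.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Toy single-layer LSTM autoencoder: encode a multivariate window into the
    final hidden state, then reconstruct the window from that latent state."""
    def __init__(self, n_features=4, hidden=16, seq_len=20):
        super().__init__()
        self.seq_len = seq_len
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                                    # x: (batch, seq_len, n_features)
        _, (h, _) = self.encoder(x)
        z = h[-1].unsqueeze(1).repeat(1, self.seq_len, 1)    # repeat the latent per time step
        dec, _ = self.decoder(z)
        return self.out(dec)

model = LSTMAutoencoder()
x = torch.randn(8, 20, 4)                    # 8 windows of a 4-variate series
loss = nn.functional.mse_loss(model(x), x)   # pre-training reconstruction loss
loss.backward()
print(float(loss))
```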