Parallel WaveNet: Fast High-Fidelity Speech Synthesis


Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis

DeepMind Technologies, London, United Kingdom. Correspondence to: Aaron van den Oord <[email protected]>. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

Abstract

The recently-developed WaveNet architecture (van den Oord et al., 2016a) is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples at more than 20 times faster than real-time, a 1000x speed-up relative to the original WaveNet, and capable of serving multiple English and Japanese voices in a production setting.

1. Introduction

Recent successes of deep learning go beyond achieving state-of-the-art results in research benchmarks, and push the frontiers in some of the most challenging real-world applications such as speech recognition (Hinton et al., 2012), image recognition (Krizhevsky et al., 2012; Szegedy et al., 2015), and machine translation (Wu et al., 2016). The recently published WaveNet (van den Oord et al., 2016a) model achieves state-of-the-art results in speech synthesis, and significantly closes the gap with natural human speech. However, it is not well suited for real-world deployment due to its prohibitive generation speed. In this paper, we present a new algorithm for distilling WaveNet into a feed-forward neural network which can synthesise equally high-quality speech much more efficiently, and is deployed to millions of users.

WaveNet is one of a family of autoregressive deep generative models that have been applied with great success to data as diverse as text (Mikolov et al., 2010), images (Larochelle & Murray, 2011; Theis & Bethge, 2015; van den Oord et al., 2016c;b), video (Kalchbrenner et al., 2016), and handwriting (Graves, 2013), as well as human speech and music. Modelling raw audio signals, as WaveNet does, represents a particularly extreme form of autoregression, with up to 24,000 samples predicted per second. Operating at such a high temporal resolution is not problematic during network training, where the complete sequence of input samples is already available and, thanks to the convolutional structure of the network, can be processed in parallel. When generating samples, however, each input sample must be drawn from the output distribution before it can be passed in as input at the next time step, making parallel processing impossible.

Inverse autoregressive flows (IAFs) (Kingma et al., 2016) represent a kind of dual formulation of deep autoregressive modelling, in which sampling can be performed in parallel, while the inference procedure required for likelihood estimation is sequential and slow. The goal of this paper is to marry the best features of both models: the efficient training of WaveNet and the efficient sampling of IAF networks. The bridge between them is a new form of neural network distillation (Hinton et al., 2015), which we refer to as Probability Density Distillation, where a trained WaveNet model is used as a teacher for a feedforward IAF model.
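To make the teacher/student idea concrete before the formal treatment later in the paper, here is a minimal, hypothetical sketch in NumPy. The student transforms logistic noise into samples in one parallel pass, the teacher scores those samples, and training would drive down a Monte Carlo estimate of the KL divergence between student and teacher. The unit-Gaussian teacher and the scalar student parameters `mu` and `s` are toy stand-ins, not the paper's networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_log_prob(x):
    # Stand-in for a trained autoregressive teacher, which can score every
    # sample x_t given x_<t in a single parallel pass. Here: a unit Gaussian.
    return -0.5 * (x ** 2 + np.log(2.0 * np.pi))

def student_sample(z, mu, s):
    # IAF-style student: a single parallel transform of noise, x = s * z + mu.
    x = s * z + mu
    # Log-density of x under the student (logistic base + change of variables).
    log_q = -z - 2.0 * np.log1p(np.exp(-z)) - np.log(s)
    return x, log_q

mu, s = 0.1, 1.2                                # hypothetical student parameters
u = rng.uniform(1e-6, 1.0 - 1e-6, size=4000)
z = np.log(u) - np.log1p(-u)                    # z ~ Logistic(0, 1) via inverse CDF
x, log_q = student_sample(z, mu, s)

# Probability Density Distillation minimises KL(student || teacher),
# estimated by Monte Carlo as E_q[log q(x) - log p_teacher(x)].
kl_estimate = np.mean(log_q - teacher_log_prob(x))
print(f"KL(student || teacher) ~ {kl_estimate:.3f}")
```

Because the expectation is taken over the student's own samples, training only requires sampling from the student and scoring by the teacher, and both of those operations are parallel.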
The next section describes the original WaveNet model, while Sections 3 and 4 define in detail the new, parallel version of WaveNet and the distillation process used to transfer knowledge between them. Section 5 then presents experimental results showing no loss in perceived quality for parallel versus original WaveNet, and continued superiority over previous benchmarks. We also present timings for sample generation, demonstrating a more than 1000x speed-up relative to the original WaveNet.

2. WaveNet

Autoregressive networks model the joint distribution of high-dimensional data as a product of conditional distributions using the probabilistic chain rule:

$$p(\mathbf{x}) = \prod_t p(x_t \mid \mathbf{x}_{<t}; \theta),$$

where $x_t$ is the $t$-th variable of $\mathbf{x}$ and $\theta$ are the parameters of the autoregressive model. The conditional distributions are usually modelled with a neural network that receives $\mathbf{x}_{<t}$ as input and outputs a distribution over possible $x_t$.

WaveNet (van den Oord et al., 2016a) is a convolutional autoregressive model which produces all $p(x_t \mid \mathbf{x}_{<t})$ in one forward pass, by making use of causal, or masked, convolutions (van den Oord et al., 2016c; Germain et al., 2015). Every causal convolutional layer can process its input in parallel, making these architectures very fast to train compared to RNNs (van den Oord et al., 2016b), which can only be updated sequentially. At generation time, however, the waveform has to be synthesised in a sequential fashion, as $x_t$ must be sampled first in order to obtain $\mathbf{x}_{>t}$. Due to this nature, real-time (or faster) synthesis with a fully autoregressive system is challenging. While sampling speed is not a significant issue for offline generation, it is essential for real-world applications. A version of WaveNet that generates in real time has been developed (Paine et al., 2016), but it required the use of a much smaller network, resulting in severely degraded quality.

Raw audio data is typically very high-dimensional (e.g. 16,000 samples per second for 16kHz audio), and contains complex, hierarchical structures spanning many thousands of time steps, such as words in speech or melodies in music. Modelling such long-term dependencies with standard causal convolution layers would require a very deep network to ensure a sufficiently broad receptive field. WaveNet avoids this constraint by using dilated causal convolutions, which allow the receptive field to grow exponentially with depth.

WaveNet uses gated activation functions, together with a simple mechanism introduced in (van den Oord et al., 2016c) to condition on extra information such as class labels or linguistic features:

$$\mathbf{h}_i = \sigma\left(W_{g,i} * \mathbf{x}_i + V_{g,i}^{T}\mathbf{c}\right) \odot \tanh\left(W_{f,i} * \mathbf{x}_i + V_{f,i}^{T}\mathbf{c}\right), \qquad (1)$$

where $*$ denotes a convolution operator and $\odot$ denotes an element-wise multiplication operator, $\sigma(\cdot)$ is a logistic sigmoid function, $\mathbf{c}$ represents extra conditioning data, $i$ is the layer index, $f$ and $g$ denote filter and gate, respectively, and $W$ and $V$ are learnable weights. In cases where $\mathbf{c}$ encodes spatial or sequential information (such as a sequence of linguistic features), the matrix products ($V_{f,i}^{T}\mathbf{c}$ and $V_{g,i}^{T}\mathbf{c}$) are replaced by convolutions ($V_{f,i} * \mathbf{c}$ and $V_{g,i} * \mathbf{c}$).

2.1. Higher Fidelity WaveNet

For this work we made two improvements to the basic WaveNet model to enhance its audio quality for production use. Unlike previous versions of WaveNet (van den Oord et al., 2016a), where 8-bit (µ-law or PCM) audio was modelled with a 256-way categorical distribution, we increased the fidelity by modelling 16-bit audio. Since training a 65,536-way categorical distribution would be prohibitively costly, we instead modelled the samples with the discretized mixture of logistics distribution introduced in (Salimans et al., 2017). We further improved fidelity by increasing the audio sampling rate from 16kHz to 24kHz. This required a WaveNet with a wider receptive field, which we achieved by increasing the dilated convolution filter size from 2 to 3. An alternative strategy would be to increase the number of layers or add more dilation stages.
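To make the two core ingredients above concrete, the sketch below implements a dilated causal convolution and the gated activation of Eq. (1) in plain NumPy, with the conditioning term omitted and random weights. It is an illustrative toy under those assumptions, not the production network.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    # 1-D causal convolution: output[t] depends only on x[t], x[t-d], x[t-2d], ...
    # x has shape (T,); w has shape (k,), with w[-1] multiplying the current sample.
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return sum(w[i] * xp[i * dilation : i * dilation + len(x)] for i in range(k))

def gated_unit(x, w_f, w_g, dilation):
    # Gated activation of Eq. (1), without the conditioning term:
    # h = sigmoid(conv_g(x)) * tanh(conv_f(x)).
    f = causal_dilated_conv(x, w_f, dilation)
    g = causal_dilated_conv(x, w_g, dilation)
    return (1.0 / (1.0 + np.exp(-g))) * np.tanh(f)

rng = np.random.default_rng(0)
h = rng.standard_normal(32)                     # a toy "waveform"
for d in (1, 2, 4, 8):                          # dilations double layer by layer
    h = gated_unit(h, rng.standard_normal(2), rng.standard_normal(2), d)

# Receptive field of this stack with filter size 2: 1 + (1 + 2 + 4 + 8) = 16 samples.
print(h.shape)
```

With filter size 2 and dilations doubling at every layer, the receptive field grows exponentially with depth, which is why WaveNet can span thousands of time steps without an impractically deep stack.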
3. Parallel WaveNet

While the convolutional structure of WaveNet allows for rapid parallel training, sample generation remains inherently sequential and therefore slow, as it is for all autoregressive models which use ancestral sampling. We therefore seek an alternative architecture that will allow for rapid, parallel generation.

Inverse-autoregressive flows (IAFs) (Kingma et al., 2016) are stochastic generative models whose latent variables are arranged so that all elements of a high-dimensional observable sample can be generated in parallel. IAFs are a special type of normalising flow (Dinh et al., 2014; Rezende & Mohamed, 2015; Dinh et al., 2016) which model a multivariate distribution $p_X(\mathbf{x})$ as an explicit invertible non-linear transformation $\mathbf{x} = f(\mathbf{z})$ of a simple tractable distribution $p_Z(\mathbf{z})$ (such as an isotropic Gaussian distribution). Using the change of variables formula, the resulting distribution can be written as:

$$\log p_X(\mathbf{x}) = \log p_Z(\mathbf{z}) - \log\left|\frac{d\mathbf{x}}{d\mathbf{z}}\right|,$$

where $\left|\frac{d\mathbf{x}}{d\mathbf{z}}\right|$ is the determinant of the Jacobian of $f$. For all normalizing flows the transformation $f$ is chosen so that it is invertible and its Jacobian determinant is easy to compute. In the case of an IAF, the output is modelled by $x_t = f(\mathbf{z}_{\le t})$. Because of this strict dependency structure, the transformation has a triangular Jacobian matrix, which makes the determinant equal to the product of the diagonal entries:

$$\log\left|\frac{d\mathbf{x}}{d\mathbf{z}}\right| = \sum_t \log\left|\frac{\partial f(\mathbf{z}_{\le t})}{\partial z_t}\right|.$$

To sample from an IAF, a random sample is first drawn from $\mathbf{z} \sim p_Z(\mathbf{z})$ (we use the $\mathrm{Logistic}(0, I)$ distribution), which is then transformed into an output sample $\mathbf{x} = f(\mathbf{z})$ in a single parallel pass.

[Figure 1: visualisation of a WaveNet stack of dilated causal convolutions, from the input through hidden layers with dilations 1, 2 and 4 to the output layer with dilation 8.]
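The parallel-sampling property and the triangular Jacobian are easy to see in code. Below is a minimal sketch assuming toy stand-in networks for the shift $\mu(\mathbf{z}_{<t})$ and scale $s(\mathbf{z}_{<t})$; in the real model these would be WaveNet-like autoregressive networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def shift_net(z):
    # Hypothetical autoregressive shift: reads only strictly-past noise samples.
    z_past = np.concatenate([[0.0], z[:-1]])
    return 0.5 * z_past

def scale_net(z):
    # Hypothetical autoregressive scale: positive, and also reads only z_<t.
    z_past = np.concatenate([[0.0], z[:-1]])
    return np.exp(0.1 * np.tanh(z_past))

def iaf_sample(z):
    # One IAF layer: x_t = z_t * s(z_<t) + mu(z_<t). Every x_t is computed
    # at once, because s and mu depend only on earlier noise variables.
    mu, s = shift_net(z), scale_net(z)
    x = z * s + mu
    # Triangular Jacobian: dx_t/dz_t = s_t, so log|det| is a sum of diagonal terms.
    log_det = np.sum(np.log(np.abs(s)))
    return x, log_det

u = rng.uniform(1e-6, 1.0 - 1e-6, size=8)
z = np.log(u) - np.log1p(-u)                    # z ~ Logistic(0, I)
x, log_det = iaf_sample(z)

# Change of variables: log p_X(x) = log p_Z(z) - log|dx/dz|.
log_pz = np.sum(-z - 2.0 * np.log1p(np.exp(-z)))
print(x.shape, log_pz - log_det)
```

Inverting this map (the inference direction) would require recovering each z_t one step at a time from x, which is exactly the sequential bottleneck that distillation sidesteps by never asking the student to evaluate likelihoods of external data.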
Recommended publications
  • Backpropagation and Deep Learning in the Brain
Backpropagation and Deep Learning in the Brain. Simons Institute -- Computational Theories of the Brain, 2018. Timothy Lillicrap (DeepMind, UCL), with Sergey Bartunov, Adam Santoro, Jordan Guerguiev, Blake Richards, Luke Marris, Daniel Cownden, Colin Akerman, Douglas Tweed, and Geoffrey Hinton.

The "credit assignment" problem, and the solution in artificial networks: backprop. Credit assignment by backprop works well in practice and shows up in virtually all of the state-of-the-art supervised, unsupervised, and reinforcement learning algorithms. Why isn't backprop "biologically plausible"? Is there neuroscience evidence for backprop in the brain? A spectrum of credit assignment algorithms.

How to convince a neuroscientist that the cortex is learning via [something like] backprop: an appeal to variance in gradient estimates might be enough to convince a machine learning researcher, but this is rarely enough to convince a neuroscientist. So what lines of argument help? By "something like backprop" is meant that learning is achieved across multiple layers by sending information from neurons closer to the output back to "earlier" layers to help compute their synaptic updates. 1. Feedback connections in cortex are ubiquitous and modify the [...]
  • Face Recognition: a Convolutional Neural-Network Approach
IEEE Transactions on Neural Networks, Vol. 8, No. 1, January 1997. Face Recognition: A Convolutional Neural-Network Approach. Steve Lawrence, Member, IEEE, C. Lee Giles, Senior Member, IEEE, Ah Chung Tsoi, Senior Member, IEEE, and Andrew D. Back, Member, IEEE.

Abstract—Faces represent complex multidimensional meaningful visual stimuli and developing a computational model for face recognition is difficult. We present a hybrid neural-network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The SOM provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the [...]

[...] include fingerprints [4], speech [7], signature dynamics [36], and face recognition [8]. Sales of identity verification products exceed $100 million [29]. Face recognition has the benefit of being a passive, nonintrusive system for verifying personal identity. The techniques used in the best face recognition systems may depend on the application of the system. We can identify at least two broad categories of face recognition systems. 1) We want to find a person within a large database of faces (e.g., in a police database). These systems typically return a list of the most likely people in the database [34].
  • CNN Architectures
Lecture 9: CNN Architectures. Fei-Fei Li & Justin Johnson & Serena Yeung. May 2, 2017.

Administrative: A2 due Thu May 4. Midterm: in-class Tue May 9, covering material through the Thu May 4 lecture. Poster session: Tue June 6, 12-3pm.

Last time: deep learning frameworks. Paddle (Baidu), Caffe (UC Berkeley), Caffe2 (Facebook), CNTK (Microsoft), Torch (NYU / Facebook), PyTorch (Facebook), MXNet (Amazon; developed by U Washington, CMU, MIT, Hong Kong U, etc., but the main framework of choice at AWS), Theano (U Montreal), TensorFlow (Google), and others. These frameworks (1) easily build big computational graphs, (2) easily compute gradients in computational graphs, and (3) run it all efficiently on GPU (wrapping cuDNN, cuBLAS, etc.), with modularized layers that define forward and backward passes, and model architectures defined as a sequence of layers.

Today: CNN architectures. Case studies: AlexNet, VGG, GoogLeNet, ResNet. Also: NiN (Network in Network), DenseNet, Wide ResNet, FractalNet, ResNeXT, SqueezeNet, Stochastic Depth.

Review: LeNet-5 [LeCun et al., 1998]. Conv filters were 5x5, applied at stride 1; subsampling (pooling) layers were 2x2, applied at stride 2.
  • Self-Training WaveNet for TTS in Low-Data Regimes
StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes. Manish Sharma, Tom Kenter, Rob Clark. Google UK. {skmanish, tomkenter, rclark}@google.com

Abstract: Recently, WaveNet has become a popular choice of neural network to synthesize speech audio. Autoregressive WaveNet is capable of producing high-fidelity audio, but is too slow for real-time synthesis. As a remedy, Parallel WaveNet was proposed, which can produce audio faster than real time through distillation of an autoregressive teacher into a feedforward student network. A shortcoming of this approach, however, is that a large amount of recorded speech data is required to produce high-quality student models, and this data is not always available. In this paper, we propose StrawNet: a self-training approach to train a Parallel WaveNet. Self-training is performed using the synthetic examples generated by the autoregressive WaveNet teacher.

[...] is increased. However, it can be seen from their results that the quality degrades when the number of recordings is further decreased. To reduce the voice artefacts observed in WaveNet student models trained under a low-data regime, we aim to leverage both the high-fidelity audio produced by an autoregressive WaveNet, and the faster-than-real-time synthesis capability of a Parallel WaveNet. We propose a training paradigm, called StrawNet, which stands for "Self-Training WaveNet". The key contribution lies in using high-fidelity speech samples produced by an autoregressive WaveNet to self-train first a new autoregressive WaveNet and then a Parallel WaveNet model. We refer to models distilled this way as StrawNet student models. We evaluate StrawNet by comparing it to a baseline [...]
  • Deep Learning Architectures for Sequence Processing
Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © 2021. All rights reserved. Draft of September 21, 2021. Chapter 9: Deep Learning Architectures for Sequence Processing.

"Time will explain." (Jane Austen, Persuasion)

Language is an inherently temporal phenomenon. Spoken language is a sequence of acoustic events over time, and we comprehend and produce both spoken and written language as a continuous input stream. The temporal nature of language is reflected in the metaphors we use; we talk of the flow of conversations, news feeds, and twitter streams, all of which emphasize that language is a sequence that unfolds in time. This temporal nature is reflected in some of the algorithms we use to process language. For example, the Viterbi algorithm applied to HMM part-of-speech tagging proceeds through the input a word at a time, carrying forward information gleaned along the way. Yet other machine learning approaches, like those we've studied for sentiment analysis or other text classification tasks, don't have this temporal nature: they assume simultaneous access to all aspects of their input. The feedforward networks of Chapter 7 also assumed simultaneous access, although they also had a simple model for time. Recall that we applied feedforward networks to language modeling by having them look only at a fixed-size window of words, and then sliding this window over the input, making independent predictions along the way. Fig. 9.1, reproduced from Chapter 7, shows a neural language model with window size 3 predicting what word follows the input "for all the". Subsequent words are predicted by sliding the window forward a word at a time.
  • Unsupervised Speech Representation Learning Using WaveNet Autoencoders
Unsupervised speech representation learning using WaveNet autoencoders. Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord.

Abstract—We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately re[...]

[...] speaker gender and identity, from phonetic content, properties which are consistent with internal representations learned by speech recognizers [13], [14]. Such representations are desired in several tasks, such as low resource automatic speech recognition (ASR), where only a small amount of labeled training data is available. In such scenario, limited amounts of data may be sufficient to learn an acoustic model on the representation discovered without supervision, but insufficient to learn the acoustic model and a data representation in a fully supervised manner [15], [16]. We focus on representations learned with autoencoders applied to raw waveforms and spectrogram features and investigate the quality of learned representations on LibriSpeech [17].
  • Unsupervised Speech Representation Learning Using WaveNet Autoencoders
Unsupervised speech representation learning using WaveNet autoencoders. https://arxiv.org/abs/1901.08810. Jan Chorowski, University of Wrocław, 06.06.2019.

Deep model = hierarchy of concepts: cat, dog, ..., moon, banana. (M. Zeiler, "Visualizing and Understanding Convolutional Networks".)

Deep learning history, 2006: stacked RBMs. (Hinton & Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks".)

Deep learning history, 2012: AlexNet, SOTA on ImageNet, fully supervised training.

Deep learning recipe: 1. Get a massive, labeled dataset D = {(x, y)}: computer vision: ImageNet, 1M images; machine translation: European Parliament data, CommonCrawl, several million sentence pairs; speech recognition: 1000h (LibriSpeech), 12000h (Google Voice Search); question answering: SQuAD, 150k questions with human answers; ... 2. Train the model to maximize log p(y|x).

Value of labeled data: labeled data is crucial for deep learning, but labels carry little information. Example: an ImageNet model has 30M weights, but ImageNet is about 1M images from 1000 classes. Labels: 1M * 10 bits = 10 Mbits; raw data (128 x 128 images): ca. 500 Gbits!

Value of unlabeled data: "The brain has about 10^14 synapses and we only live for about 10^9 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 10^5 dimensions of constraint per second." (Geoff Hinton, https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/)

Unsupervised learning recipe: 1. Get a massive unlabeled dataset D = {x}; easy, since unlabeled data is nearly free. 2. Train the model to...??? What is the task? What is the loss function? Unsupervised learning by modeling the data distribution: train the model to minimize −log p(x). E.g. [...]
  • Real-Time Black-Box Modelling with Recurrent Neural Networks
Proceedings of the 22nd International Conference on Digital Audio Effects (DAFx-19), Birmingham, UK, September 2–6, 2019. REAL-TIME BLACK-BOX MODELLING WITH RECURRENT NEURAL NETWORKS. Alec Wright, Eero-Pekka Damskägg, and Vesa Välimäki. Acoustics Lab, Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland. [email protected]

ABSTRACT: This paper proposes to use a recurrent neural network for black-box modelling of nonlinear audio systems, such as tube amplifiers and distortion pedals. As a recurrent unit structure, we test both Long Short-Term Memory and a Gated Recurrent Unit. We compare the proposed neural network with a WaveNet-style deep neural network, which has been suggested previously for tube amplifier modelling. The neural networks are trained with several minutes of guitar and bass recordings, which have been passed through the devices to be modelled. A real-time audio plugin implementing the proposed networks has been developed in the JUCE framework [...]

[...] tube amplifiers and distortion pedals. In [14] it was shown that the WaveNet model of several distortion effects was capable of running in real time. The resulting deep neural network model, however, was still fairly computationally expensive to run. In this paper, we propose an alternative black-box model based on an RNN. We demonstrate that the trained RNN model is capable of achieving the accuracy of the WaveNet model, whilst requiring considerably less processing power to run. The proposed neural network, which consists of a single recurrent layer and a fully connected layer, is suitable for real-time emulation of tube amplifiers and distortion pedals.
  • Machine Learning V/S Deep Learning
International Research Journal of Engineering and Technology (IRJET). e-ISSN: 2395-0056, p-ISSN: 2395-0072. Volume: 06, Issue: 02, Feb 2019. www.irjet.net

Machine Learning v/s Deep Learning. Sachin Krishan Khanna, Department of Computer Science Engineering, Chandigarh Group of Colleges, PinCode: 140307, Mohali, India.

ABSTRACT: This is a research paper giving a brief comparison and summary of machine learning and deep learning. The comparison of these two learning techniques was made because there is a lot of confusion about them. Nowadays these techniques are widely used in the IT industry to build projects, to solve problems, and to maintain large amounts of data. This paper includes a comparison of the two techniques and also discusses the future aspects of these learning techniques.

Keywords: ML, DL, AI, Neural Networks, Supervised & Unsupervised learning, Algorithms.

INTRODUCTION: As technology advances day by day, we are trying to make machines work like humans, so that we do not have to make any effort to solve problems or do heavy work. To make a machine work like a human, the machine needs to learn how to do the work; for this, the machine learning technique is used, and deep learning is used to help a machine solve real-time problems. Both have algorithms which work on these issues. With the rapid growth of the IT sector, the industry needs speed and accuracy to meet its targets. With these learning algorithms the industry can meet its requirements, and these new techniques provide the industry a different way to solve problems.
  • Linear Prediction-Based WaveNet Speech Synthesis
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis. Min-Jae Hwang (Search Solution, Seongnam, South Korea; [email protected]), Frank Soong (Microsoft, Beijing, China; [email protected]), Eunwoo Song (Naver Corporation, Seongnam, South Korea; [email protected]), Xi Wang (Microsoft, Beijing, China; [email protected]), Hyeonjoo Kang (Yonsei University, Seoul, South Korea; [email protected]), and Hong-Goo Kang (Yonsei University, Seoul, South Korea; [email protected]).

Abstract—We propose a linear prediction (LP)-based waveform generation method via a WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TTS) systems. However, it is challenging to effectively train the neural vocoder when the target database contains a massive amount of acoustical information such as prosody, style or expressiveness. As a solution, approaches that only generate the vocal source component by a neural vocoder have been proposed. However, they tend to generate synthetic noise because the vocal source component is handled independently, without considering the entire speech production process, where it is inevitable to end up with a mismatch between the vocal source and the vocal tract filter.

[...] than the speech signal, the training and generation processes become more efficient. However, the synthesized speech is likely to be unnatural when the prediction errors in estimating the excitation are propagated through the LP synthesis process. As the effect of LP synthesis is not considered in the training process, the synthesis output is vulnerable to variation of the LP synthesis filter. To alleviate this problem, we propose LP-WaveNet, which enables joint training of the complicated interactions [...]
  • Comparative Analysis of Recurrent Neural Network Architectures for Reservoir Inflow Forecasting
    water Article Comparative Analysis of Recurrent Neural Network Architectures for Reservoir Inflow Forecasting Halit Apaydin 1 , Hajar Feizi 2 , Mohammad Taghi Sattari 1,2,* , Muslume Sevba Colak 1 , Shahaboddin Shamshirband 3,4,* and Kwok-Wing Chau 5 1 Department of Agricultural Engineering, Faculty of Agriculture, Ankara University, Ankara 06110, Turkey; [email protected] (H.A.); [email protected] (M.S.C.) 2 Department of Water Engineering, Agriculture Faculty, University of Tabriz, Tabriz 51666, Iran; [email protected] 3 Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Vietnam 4 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam 5 Department of Civil and Environmental Engineering, Hong Kong Polytechnic University, Hong Kong, China; [email protected] * Correspondence: [email protected] or [email protected] (M.T.S.); [email protected] (S.S.) Received: 1 April 2020; Accepted: 21 May 2020; Published: 24 May 2020 Abstract: Due to the stochastic nature and complexity of flow, as well as the existence of hydrological uncertainties, predicting streamflow in dam reservoirs, especially in semi-arid and arid areas, is essential for the optimal and timely use of surface water resources. In this research, daily streamflow to the Ermenek hydroelectric dam reservoir located in Turkey is simulated using deep recurrent neural network (RNN) architectures, including bidirectional long short-term memory (Bi-LSTM), gated recurrent unit (GRU), long short-term memory (LSTM), and simple recurrent neural networks (simple RNN). For this purpose, daily observational flow data are used during the period 2012–2018, and all models are coded in Python software programming language.
  • A Primer on Machine Learning
A Primer on Machine Learning. By instructor Amit Manghani.

Question: What is Machine Learning?
Simply put, Machine Learning is a form of data analysis. Using algorithms that continuously learn from data, Machine Learning allows computers to recognize hidden patterns without actually being programmed to do so. The key aspect of Machine Learning is that as models are exposed to new data sets, they adapt to produce reliable and consistent output. ("The core of Machine Learning revolves around a computer system consuming data and learning from the data.")

Question: What is driving the resurgence of Machine Learning?
There are three interrelated phenomena behind the growing prominence of Machine Learning: 1) the ever-increasing volume, variety and velocity of data, 2) the decrease in bandwidth and storage costs, and 3) the exponential improvements in computational processing. In a nutshell, the ability to perform complex mathematical computations on big data is driving the resurgence in Machine Learning.

Question: What are some of the commonly used methods of Machine Learning?
[Diagram: Supervised, Semi-supervised, Unsupervised, and Reinforcement Machine Learning.]

Supervised Machine Learning: In Supervised Learning, algorithms are trained using labeled examples, i.e. the desired output for an input is known. For example, a piece of mail could be labeled either as relevant or junk. The algorithm receives a set of inputs along with the corresponding correct outputs to foster learning. Once the algorithm is trained on a set of labeled data, the algorithm is run against the same labeled data and its actual output is compared against the correct output to detect errors.