Neural Network Based Feature Extraction for Speech and Image Recognition


Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences of RWTH Aachen University for the academic degree of Doctor of Natural Sciences, submitted by Dipl.-Inform. Christian Plahl from Bielefeld. Reviewers: Univ.-Prof. Dr.-Ing. Hermann Ney and Priv.-Doz. Dr.-Ing. Björn W. Schuller. Date of the oral examination: 23 January 2014. This dissertation is available online on the web pages of the university library.

Dipl.-Inform. Christian Plahl, Human Language Technology and Pattern Recognition Group, RWTH Aachen University, [email protected]

Declaration: I hereby affirm that I wrote this doctoral thesis independently and did not use any aids other than those stated. All text excerpts and figures that were taken literally or in spirit from published works are marked by references. Aachen, June 2014, Dipl.-Inform. Christian Plahl

Abstract

This work investigates features derived from an artificial neural network. These artificial neural network based probabilistic features have become a major component of current state-of-the-art systems for automatic speech recognition and other areas, e.g. image recognition. A detailed study of the artificial neural network based features helps to improve the feature extraction and to understand which information of the speech signal is relevant for recognition.

Two algorithms which are widely used to integrate the information derived from an artificial neural network are investigated: the tandem approach and the hybrid approach. This work studies the effect of each algorithm on recognition performance, measured by word error rate, and on the computational requirements. In addition, a detailed comparison and a discussion of the main advantages of each integration approach are given.

Furthermore, novel extensions are proposed which improve the artificial neural network feature extraction and the final recognition performance of the trained systems. These extensions concern the input features and the topology of the network used to train the artificial neural network, and they are independent of the integration method. Different short-term and long-term features model complementary aspects of the speech signal. By combining these different feature sets, the development cycle of the speech recognition system can be simplified, which allows increasing the model complexity of the artificial neural network or of the acoustic model.

The topology of an artificial neural network has a huge impact on the quality of the features derived from it. This work investigates the hierarchical framework, bottle-neck processing and recurrent neural networks, especially the long short-term memory structure and the training of bi-directional networks. Furthermore, this work examines cross-lingual artificial neural network features and the influence of the network topology and of the amount of audio data used to train such features. With cross-lingual features, the training and testing languages of the artificial neural network differ, and the system development cycle is simplified. In addition, this work analyzes different supervised and unsupervised weight pre-training techniques.
The initialization of the weights of a deep neural network is critical since the optimization function is non-convex. A new unsupervised pre-training technique is developed which allows optimizing the loss function directly and provides a clear stopping criterion, in contrast to other pre-training techniques like Restricted Boltzmann Machines. Finally, this work analyzes the generality of the artificial neural network based feature extraction approach by transferring the concept to different image tasks, optical character recognition and automatic sign language recognition. While most results are confirmed, some surprising new results are obtained.

Zusammenfassung

This doctoral thesis investigates features that are generated with the help of artificial neural networks. These probabilistic features have become an indispensable component of current automatic speech recognition systems. They are used just as successfully in many other areas of pattern recognition, e.g. in image recognition or handwriting recognition. A detailed analysis of these features and of the process by which they arise optimizes the feature extraction on the one hand, and on the other hand identifies the parts of the speech signal that are relevant for automatic speech recognition.

The hybrid and the tandem approach are examined in detail in this work. The two methods provide the recognition system with the information obtained from the artificial neural network in different ways. They are analyzed with respect to their recognition performance and the required computational effort. Their most important advantages and disadvantages are contrasted, and their relevance for the recognition system is discussed.

To improve recognition with the features obtained from the neural network, both the importance of the structure of the artificial neural network and the influence of the input data fed into the network are investigated. The change of the network structure and the influence of different features are independent of the method by which the features are integrated into the speech recognition system. Different features of the short-term and long-term spectrum emphasize different aspects of the speech signal. By combining different features, the neural network benefits from these different knowledge sources. This work shows that an artificial neural network can exploit the information of different features better than a combination of individual systems trained on those features, where the system combination benefits from the different errors of the individual systems and each individual system is trained on only one feature set. In addition, the new combination approach simplifies the development of the final speech recognition system, since several systems with different input data no longer have to be trained. The resources saved in this way can be used, for example, to train more complex artificial neural networks.

The structure of an artificial neural network has a large influence on the quality of the generated features. This work investigates the effect of hierarchical approaches, of the bottle-neck architecture and of the use of recurrent neural networks on the recognition performance.
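As a rough illustration of the tandem and bottle-neck idea described above, the following minimal numpy sketch takes the activations of a narrow hidden layer of a trained network as features and appends them to the standard acoustic features; the layer sizes, names and random weights are illustrative assumptions, not the actual configuration used in this thesis.

    import numpy as np

    rng = np.random.default_rng(0)

    def bottleneck_features(frames, weights):
        """Forward pass up to the narrow bottle-neck layer of an MLP; its
        activations serve as the tandem features for the downstream system."""
        h = frames
        for W, b in weights:
            h = np.tanh(h @ W + b)
        return h

    # Toy setup: 40-dim input frames -> 1000 hidden units -> 42-dim bottle-neck.
    # In practice the weights come from a network trained to classify phoneme states.
    sizes = [40, 1000, 42]
    weights = [(rng.normal(scale=0.05, size=(m, n)), np.zeros(n))
               for m, n in zip(sizes[:-1], sizes[1:])]

    mfcc = rng.normal(size=(300, 40))            # 300 frames of base acoustic features
    bn = bottleneck_features(mfcc, weights)      # 300 x 42 bottle-neck features
    tandem = np.concatenate([mfcc, bn], axis=1)  # augmented features for the GMM-HMM system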
In the hierarchical approach, several neural networks are connected in sequence, so that the output of a previous network is used as the input of the next one. Recurrent networks introduce a memory that represents the previous input signals. When recurrent networks are used, the main focus lies on the analysis of the bi-directional network structure and on the use of a long short-term memory.

Cross-lingual features reduce the development effort of a speech recognition system: setting up and training new systems is simplified by re-using networks that have already been trained. This work investigates how well such neural network features trained on a different language (cross-lingual features) generalize for the speech recognition system. In particular, the effect of the network structure and the relevance of the amount of speech data used to train the neural network are addressed.

The error function of an artificial neural network is non-convex, so reaching the global optimum is not guaranteed; in most cases the objective function gets stuck in a local optimum. Pre-training the weights with unsupervised and supervised learning methods helps to find a better local optimum. In this work, different unsupervised and supervised learning strategies are tested and analyzed. In addition to pre-initializing the weights with Restricted Boltzmann Machines, a new unsupervised method is introduced, the Sparse Encoder Symmetric Machines. The new method is characterized both by a clear stopping criterion and by a direct optimization of the weights based on the objective function, whereas for Restricted Boltzmann Machines the actual objective function has to be approximated.

The neural network based feature extraction methods can be applied successfully not only in automatic speech recognition but also in other areas of pattern recognition. This work shows that artificial neural networks bring improvements in automatic handwriting recognition and in the automatic recognition of sign language, and the results from the speech recognition domain are confirmed. Training Gaussian mixture systems on the neural network features together with the baseline features, however, is not promising; recognition systems trained only on the neural network features achieve clearly better recognition error rates.

Acknowledgement

A long time ago it started, and finally here it is: my thesis. It was a long and difficult road, and I would like to thank a number
Recommended publications
  • Automatic Feature Learning Using Recurrent Neural Networks
    Predicting purchasing intent: Automatic Feature Learning using Recurrent Neural Networks. Humphrey Sheil (Cardiff University, Cardiff, Wales, [email protected]), Omer Rana (Cardiff University, Cardiff, Wales, [email protected]), Ronan Reilly (Maynooth University, Maynooth, Ireland, [email protected])

    ABSTRACT: We present a neural network for predicting purchasing intent in an Ecommerce setting. Our main contribution is to address the significant investment in feature engineering that is usually associated with state-of-the-art methods such as Gradient Boosted Machines. We use trainable vector spaces to model varied, semi-structured input data comprising categoricals, quantities and unique instances. Multi-layer recurrent neural networks capture both session-local and dataset-global event dependencies and relationships for user sessions of any length. An exploration of model design decisions including parameter sharing and skip connections further increase model accuracy. Results on benchmark datasets deliver classification accuracy within 98% of state-of-the-art on one and exceed state-of-the-art on the second without the need for any domain / dataset-specific feature engineering on both short and long event

    [...] influence three out of four major variables that affect profit. In addition, merchants increasingly rely on (and pay advertising to) much larger third-party portals (for example eBay, Google, Bing, Taobao, Amazon) to achieve their distribution, so any direct measures the merchant group can use to increase their profit is sorely needed.

    Table 1: Effect of improving different variables on operating profit, from [22].
                       McKinsey   A.T. Kearney   Affected by shopping intent
    Price management   11.1%      8.2%           Yes
    Variable cost      7.8%       5.1%           Yes
    Sales volume       3.3%       3.0%           Yes
    Fixed cost         2.3%       2.0%           No
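    A small PyTorch sketch of the kind of model this abstract describes: categorical session events are embedded into a trainable vector space and fed through a multi-layer recurrent network whose final state predicts purchasing intent. The dimensions, names and the GRU cell are illustrative assumptions, not the authors' exact architecture.

        import torch
        import torch.nn as nn

        class SessionRNN(nn.Module):
            """Embeds categorical click-stream events and runs a multi-layer RNN
            over the session; the final hidden state predicts purchasing intent."""
            def __init__(self, n_event_types=1000, emb_dim=32, hidden=64, layers=2):
                super().__init__()
                self.embed = nn.Embedding(n_event_types, emb_dim)
                self.rnn = nn.GRU(emb_dim, hidden, num_layers=layers, batch_first=True)
                self.out = nn.Linear(hidden, 1)

            def forward(self, event_ids):
                x = self.embed(event_ids)              # (batch, session_len, emb_dim)
                _, h = self.rnn(x)                     # h: (layers, batch, hidden)
                return torch.sigmoid(self.out(h[-1]))  # probability of purchase

        # Toy usage: a batch of 4 sessions, each 10 events long
        model = SessionRNN()
        sessions = torch.randint(0, 1000, (4, 10))
        print(model(sessions).shape)                   # torch.Size([4, 1])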
  • Exploring the Potential of Sparse Coding for Machine Learning
    Portland State University, PDXScholar, Dissertations and Theses, 10-19-2020. Exploring the Potential of Sparse Coding for Machine Learning. Sheng Yang Lundquist, Portland State University. Recommended Citation: Lundquist, Sheng Yang, "Exploring the Potential of Sparse Coding for Machine Learning" (2020). Dissertations and Theses. Paper 5612. https://doi.org/10.15760/etd.7484

    A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science. Dissertation Committee: Melanie Mitchell (Chair), Feng Liu, Bart Massey, Garrett Kenyon, Bruno Jedynak. Portland State University, 2020. © 2020 Sheng Y. Lundquist

    Abstract: While deep learning has proven to be successful for various tasks in the field of computer vision, there are several limitations of deep-learning models when compared to human performance. Specifically, human vision is largely robust to noise and distortions, whereas deep-learning performance tends to be brittle to modifications of test images, including being susceptible to adversarial examples. Additionally, deep-learning methods typically require very large collections of training examples for good performance on a task, whereas humans can learn to perform the same task with a much smaller number of training examples.
  • Incorporating Automated Feature Engineering Routines Into Automated Machine Learning Pipelines by Wesley Runnels
    Incorporating Automated Feature Engineering Routines into Automated Machine Learning Pipelines, by Wesley Runnels. Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, May 2020. © Massachusetts Institute of Technology 2020. All rights reserved. Author: Wesley Runnels, Department of Electrical Engineering and Computer Science, May 12th, 2020. Certified by: Tim Kraska, Department of Electrical Engineering and Computer Science, Thesis Supervisor. Accepted by: Katrina LaCurts, Chair, Master of Engineering Thesis Committee.

    Abstract: Automating the construction of consistently high-performing machine learning pipelines has remained difficult for researchers, especially given the domain knowledge and expertise often necessary for achieving optimal performance on a given dataset. In particular, the task of feature engineering, a key step in achieving high performance for machine learning tasks, is still mostly performed manually by experienced data scientists. In this thesis, building upon the results of prior work in this domain, we present a tool, rl feature eng, which automatically generates promising features for an arbitrary dataset. In particular, this tool is specifically adapted to the requirements of augmenting a more general auto-ML framework. We discuss the performance of this tool in a series of experiments highlighting the various options available for use, and finally discuss its performance when used in conjunction with Alpine Meadow, a general auto-ML package.
  • Text-To-Speech Synthesis
    Generative Model-Based Text-to-Speech Synthesis. Heiga Zen (Google London), @MIT.

    Outline: Generative TTS; generative acoustic models for parametric TTS (hidden Markov models (HMMs), neural networks); beyond parametric TTS (learned features, WaveNet, end-to-end); conclusion and future topics.

    Text-to-speech as sequence-to-sequence mapping: automatic speech recognition (ASR) maps speech to text ("Hello my name is Heiga Zen"); machine translation (MT) maps text to text ("Hello my name is Heiga Zen" to "Ich heiße Heiga Zen"); text-to-speech synthesis (TTS) maps text ("Hello my name is Heiga Zen") to speech.

    Speech production process: text (a concept) is conveyed by modulating a carrier wave (the air flow) with speech information; the sound source is the fundamental frequency (voiced: pulse, unvoiced: noise), shaped by the frequency transfer characteristics of the vocal tract into the speech magnitude spectrum.

    Typical flow of a TTS system: TEXT passes through text analysis (sentence segmentation, word segmentation, text normalization, part-of-speech tagging, pronunciation) and prosody prediction in the NLP frontend (discrete representations), then through waveform generation in the speech backend (continuous representations) to produce the SYNTHESIZED SPEECH. Rule-based, formant synthesis []
  • Unsupervised Speech Representation Learning Using Wavenet Autoencoders Jan Chorowski, Ron J
    Unsupervised speech representation learning using WaveNet autoencoders. Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

    Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct

    [...] speaker gender and identity, from phonetic content, properties which are consistent with internal representations learned by speech recognizers [13], [14]. Such representations are desired in several tasks, such as low resource automatic speech recognition (ASR), where only a small amount of labeled training data is available. In such scenario, limited amounts of data may be sufficient to learn an acoustic model on the representation discovered without supervision, but insufficient to learn the acoustic model and a data representation in a fully supervised manner [15], [16]. We focus on representations learned with autoencoders applied to raw waveforms and spectrogram features and investigate the quality of learned representations on LibriSpeech [17].
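    As a rough illustration of the discrete VQ-VAE constraint mentioned above, the following numpy sketch shows the codebook lookup that maps each continuous latent vector to its nearest learned code; the sizes and names are illustrative assumptions, not the paper's configuration.

        import numpy as np

        def vector_quantize(latents, codebook):
            """Map each continuous latent vector to its nearest codebook entry.

            latents:  (T, D) array of encoder outputs for T frames
            codebook: (K, D) array of K learned code vectors
            Returns the quantized latents and the chosen code indices.
            """
            # Squared Euclidean distance between every latent and every code vector
            dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
            codes = dists.argmin(axis=1)     # (T,) discrete code indices
            quantized = codebook[codes]      # (T, D) nearest code vectors
            return quantized, codes

        # Toy usage: 100 frames of 64-dim latents, a codebook of 512 entries
        rng = np.random.default_rng(0)
        z = rng.normal(size=(100, 64))
        cb = rng.normal(size=(512, 64))
        z_q, idx = vector_quantize(z, cb)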
  • Sparse Feature Learning for Deep Belief Networks
    Sparse Feature Learning for Deep Belief Networks. Marc'Aurelio Ranzato (1), Y-Lan Boureau (2,1), Yann LeCun (1). (1) Courant Institute of Mathematical Sciences, New York University; (2) INRIA Rocquencourt. {ranzato,ylan,[email protected]}

    Abstract: Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured.

    1 Introduction. One of the main purposes of unsupervised learning is to produce good representations for data, that can be used for detection, recognition, prediction, or visualization. Good representations eliminate irrelevant variabilities of the input data, while preserving the information that is useful for the ultimate task. One cause for the recent resurgence of interest in unsupervised learning is the ability to produce deep feature hierarchies by stacking unsupervised modules on top of each other, as proposed by Hinton et al.
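    A minimal numpy sketch of the reconstruction-plus-sparsity idea this abstract describes: a plain one-layer sparse autoencoder with an L1 penalty on the code, trained by gradient descent. The sizes, the tanh/L1 choices and the learning rate are illustrative assumptions, not the paper's actual algorithm.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy data: 256 samples of 64-dim inputs (e.g. flattened image patches)
        X = rng.normal(size=(256, 64))

        n_in, n_code = 64, 32
        W_enc = rng.normal(scale=0.1, size=(n_in, n_code))
        W_dec = rng.normal(scale=0.1, size=(n_code, n_in))
        lam, lr = 0.1, 0.01

        for step in range(200):
            Z = np.tanh(X @ W_enc)        # code (the learned representation)
            X_hat = Z @ W_dec             # reconstruction of the input from the code
            err = X_hat - X
            loss = (err ** 2).mean() + lam * np.abs(Z).mean()  # reconstruction + sparsity
            # Gradients (sub-gradient for the L1 sparsity term)
            dZ = 2 * err @ W_dec.T / err.size + lam * np.sign(Z) / Z.size
            W_dec -= lr * (Z.T @ err * 2 / err.size)
            W_enc -= lr * (X.T @ (dZ * (1 - Z ** 2)))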
  • Learning from Few Subjects with Large Amounts of Voice Monitoring Data
    Learning from Few Subjects with Large Amounts of Voice Monitoring Data. Jose Javier Gonzalez Ortiz, John V. Guttag, Robert E. Hillman, Daryush D. Mehta, Jarrad H. Van Stan, Marzyeh Ghassemi

    Challenges of many medical time series: few subjects and large amounts of data lead to overfitting to subjects; there is no obvious mapping from signal to features, so feature engineering is labor intensive; only subject-level labels are available, and in many cases there is no good way of getting sample-specific annotations. The first two issues are addressed with unsupervised feature extraction, the third with multiple instance learning.

    Learning features: segment the signal into windows, compute a time-frequency representation, and extract features without supervision using an encoder-decoder network (Conv + BatchNorm + ReLU, pooling, dense layer, upsampling, sigmoid) over 128 x 64 time-frequency windows.

    Classification using multiple instance learning: logistic regression is trained on the learned features with subject labels (raw waveform -> spectrogram -> encoder -> logistic regression -> prediction), and per-window predictions are aggregated into a per-subject prediction using the percentage of positive windows per subject.

    Application, voice monitoring data: voice disorders affect 7% of the US population; data are collected through a neck-placed accelerometer; one week of monitoring yields roughly 4 billion samples.

    Results: previous work relied on expert-designed features [1]; the learned features give comparable performance without task-specific feature engineering.
                           AUC            Accuracy
    Expert LR    Train     0.70 ± 0.05    0.71 ± 0.04
                 Test      0.68 ± 0.05    0.69 ± 0.04
    Ours         Train     0.73 ± 0.06    0.72 ± 0.04
                 Test      0.69 ± 0.07    0.70 ± 0.05
    [1] Marzyeh Ghassemi et al.
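    A small sketch of the multiple-instance aggregation step described above: window-level probabilities are pooled into a per-subject fraction of positive windows. The threshold, names and toy data are illustrative assumptions.

        import numpy as np

        def subject_scores(window_probs, subject_ids, threshold=0.5):
            """Aggregate window-level probabilities into one score per subject.

            window_probs: (N,) predicted probability that each window is positive
            subject_ids:  (N,) subject identifier for every window
            Returns a dict mapping subject id -> fraction of windows above threshold.
            """
            scores = {}
            for sid in np.unique(subject_ids):
                probs = window_probs[subject_ids == sid]
                scores[sid] = float((probs > threshold).mean())
            return scores

        # Toy usage: 10 windows from 3 subjects
        probs = np.array([0.9, 0.2, 0.8, 0.1, 0.3, 0.7, 0.6, 0.4, 0.95, 0.05])
        sids = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
        print(subject_scores(probs, sids))  # {0: 0.67, 1: 0.33, 2: 0.5} approximately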
  • Expert Feature-Engineering Vs. Deep Neural Networks: Which Is Better for Sensor-Free Affect Detection?
    Expert Feature-Engineering vs. Deep Neural Networks: Which is Better for Sensor-Free Affect Detection? Yang Jiang (1), Nigel Bosch (2), Ryan S. Baker (3), Luc Paquette (2), Jaclyn Ocumpaugh (3), Juliana Ma. Alexandra L. Andres (3), Allison L. Moore (4), Gautam Biswas (4). (1) Teachers College, Columbia University, New York, NY, United States, [email protected]. (2) University of Illinois at Urbana-Champaign, Champaign, IL, United States, {pnb, lpaq}@illinois.edu. (3) University of Pennsylvania, Philadelphia, PA, United States, {rybaker, ojaclyn}@upenn.edu, [email protected]. (4) Vanderbilt University, Nashville, TN, United States, {allison.l.moore, gautam.biswas}@vanderbilt.edu

    Abstract. The past few years have seen a surge of interest in deep neural networks. The wide application of deep learning in other domains such as image classification has driven considerable recent interest and efforts in applying these methods in educational domains. However, there is still limited research comparing the predictive power of the deep learning approach with the traditional feature engineering approach for common student modeling problems such as sensor-free affect detection. This paper aims to address this gap by presenting a thorough comparison of several deep neural network approaches with a traditional feature engineering approach in the context of affect and behavior modeling. We built detectors of student affective states and behaviors as middle school students learned science in an open-ended learning environment called Betty's Brain, using both approaches. Overall, we observed a tradeoff where the feature engineering models were better when considering a single optimized threshold (for intervention), whereas the deep learning models were better when taking model confidence fully into account (for discovery with models analyses).
  • Comparison of Feature Learning Methods for Human Activity Recognition Using Wearable Sensors
    sensors, Article. Comparison of Feature Learning Methods for Human Activity Recognition Using Wearable Sensors. Frédéric Li (1,*), Kimiaki Shirahama (1), Muhammad Adeel Nisar (1), Lukas Köping (1) and Marcin Grzegorzek (1,2). (1) Research Group for Pattern Recognition, University of Siegen, Hölderlinstr 3, 57076 Siegen, Germany; [email protected] (K.S.); [email protected] (M.A.N.); [email protected] (L.K.); [email protected] (M.G.). (2) Department of Knowledge Engineering, University of Economics in Katowice, Bogucicka 3, 40-226 Katowice, Poland. * Correspondence: [email protected]; Tel.: +49-271-740-3973. Received: 16 January 2018; Accepted: 22 February 2018; Published: 24 February 2018

    Abstract: Getting a good feature representation of data is paramount for Human Activity Recognition (HAR) using wearable sensors. An increasing number of feature learning approaches, in particular deep-learning based ones, have been proposed to extract an effective feature representation by analyzing large amounts of data. However, getting an objective interpretation of their performances faces two problems: the lack of a baseline evaluation setup, which makes a strict comparison between them impossible, and the insufficiency of implementation details, which can hinder their use. In this paper, we attempt to address both issues: we firstly propose an evaluation framework allowing a rigorous comparison of features extracted by different methods, and use it to carry out extensive experiments with state-of-the-art feature learning approaches. We then provide all the codes and implementation details to make both the reproduction of the results reported in this paper and the re-use of our framework easier for other researchers.
  • Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification
    International Journal of Environmental Research and Public Health, Article. Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification. Kwang Ho Park (1), Erdenebileg Batbaatar (1), Yongjun Piao (2), Nipon Theera-Umpon (3,4,*) and Keun Ho Ryu (4,5,6,*). (1) Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea; [email protected] (K.H.P.); [email protected] (E.B.). (2) School of Medicine, Nankai University, Tianjin 300071, China; [email protected]. (3) Department of Electrical Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand. (4) Biomedical Engineering Institute, Chiang Mai University, Chiang Mai 50200, Thailand. (5) Data Science Laboratory, Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 700000, Vietnam. (6) Department of Computer Science, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea. * Correspondence: [email protected] (N.T.-U.); [email protected] or [email protected] (K.H.R.)

    Abstract: Hematopoietic cancer is a malignant transformation in immune system cells. Hematopoietic cancer is characterized by the cells that are expressed, so it is usually difficult to distinguish its heterogeneities in the hematopoiesis process. Traditional approaches for cancer subtyping use statistical techniques. Furthermore, due to the overfitting problem of small samples, in case of a minor cancer, it does not have enough sample material for building a classification model. Therefore, we propose not only to build a classification model for five major subtypes using two kinds of losses,
  • Convex Multi-Task Feature Learning
    Convex Multi-Task Feature Learning. Andreas Argyriou (1), Theodoros Evgeniou (2), and Massimiliano Pontil (1). (1) Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK, [email protected], [email protected]. (2) Technology Management and Decision Sciences, INSEAD, 77300 Fontainebleau, France, [email protected]

    Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select (not learn) a few common variables across the tasks.

    Key words: Collaborative Filtering, Inductive Transfer, Kernels, Multi-Task Learning, Regularization, Transfer Learning, Vector-Valued Functions.
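    A rough numpy sketch of the alternating scheme this summary describes: a supervised per-task step (a ridge-style regression under a shared metric D) followed by an unsupervised update of the shared representation matrix D from the stacked task weights. The closed-form updates shown follow the standard trace-norm formulation and are an assumption about the details, not a transcription of the chapter's algorithm.

        import numpy as np

        def multitask_feature_learning(Xs, ys, gamma=1.0, n_iter=50, eps=1e-6):
            """Alternating minimization: supervised step per task, then update D."""
            d = Xs[0].shape[1]
            D = np.eye(d) / d                  # shared feature matrix, trace 1
            W = np.zeros((d, len(Xs)))         # one weight vector per task (columns)
            for _ in range(n_iter):
                D_inv = np.linalg.inv(D + eps * np.eye(d))
                # Supervised step: solve each task given the shared regularizer D
                for t, (X, y) in enumerate(zip(Xs, ys)):
                    W[:, t] = np.linalg.solve(X.T @ X + gamma * D_inv, X.T @ y)
                # Unsupervised step: D = (W W^T)^(1/2) / trace((W W^T)^(1/2))
                vals, vecs = np.linalg.eigh(W @ W.T)
                sqrt_wwt = (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T
                D = sqrt_wwt / max(np.trace(sqrt_wwt), eps)
            return W, D

        # Toy usage: three related tasks that share a few informative dimensions
        rng = np.random.default_rng(0)
        true = np.zeros(20); true[:3] = [1.0, -2.0, 0.5]
        Xs = [rng.normal(size=(40, 20)) for _ in range(3)]
        ys = [X @ (true + 0.1 * rng.normal(size=20)) + 0.1 * rng.normal(size=40) for X in Xs]
        W, D = multitask_feature_learning(Xs, ys)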
  • Feature Engineering for Predictive Modeling Using Reinforcement Learning
    The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). Feature Engineering for Predictive Modeling Using Reinforcement Learning. Udayan Khurana ([email protected]), Horst Samulowitz ([email protected]), Deepak Turaga ([email protected]), IBM Research AI

    Abstract: Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of a given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance driven exploration of a transformation graph, which systematically and compactly captures the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.

    Figure 1: Illustration of different representation choices. (a) Original data, (b) Engineered data.

    Introduction: Predictive analytics are widely used in support for decision making across a variety of domains including fraud detection, marketing, drug discovery, advertising, risk management [...] rental demand (kaggle.com/c/bike-sharing-demand) in Figure 2(a). Deriving several features (Figure 2(b)) dramatically reduces the modeling error. For instance, extracting the hour of the day from the given timestamp feature helps to capture certain trends such as peak versus non-peak demand. Note that certain valuable features are derived through a composition of multiple simpler functions.
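    A minimal sketch of exploring a space of feature transformations, in the spirit of the transformation graph described above but using a greedy search rather than the paper's reinforcement-learned policy; the transform set, the Ridge model and the toy dataset are illustrative assumptions.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score

        def greedy_feature_search(X, y, transforms, rounds=2):
            """Apply every candidate transform to every column and keep a new
            feature only if it lowers the cross-validated error."""
            def cv_error(data):
                return -cross_val_score(Ridge(), data, y, cv=3,
                                        scoring="neg_mean_squared_error").mean()
            best = cv_error(X)
            for _ in range(rounds):
                for name, fn in transforms.items():
                    for col in range(X.shape[1]):
                        candidate = np.column_stack([X, fn(X[:, col])])
                        score = cv_error(candidate)
                        if score < best:           # keep the engineered feature
                            X, best = candidate, score
            return X, best

        # Toy usage: the target depends on log(x0) and on x1 squared
        rng = np.random.default_rng(0)
        X = np.abs(rng.normal(size=(200, 3))) + 0.1
        y = np.log(X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=200)
        transforms = {"log": np.log, "square": np.square, "sqrt": np.sqrt}
        X_new, err = greedy_feature_search(X, y, transforms)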