Feature Engineering for Machine Learning
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Data Science (ML-DL-Ai)
Data science (ML-DL-ai) Statistics Multiple Regression Model Building and Evaluation Introduction to Statistics Model post fitting for Inference Types of Statistics Examining Residuals Analytics Methodology and Problem- Regression Assumptions Solving Framework Identifying Influential Observations Populations and samples Detecting Collinearity Parameter and Statistics Uses of variable: Dependent and Categorical Data Analysis Independent variable Describing categorical Data Types of Variable: Continuous and One-way frequency tables categorical variable Association Cross Tabulation Tables Descriptive Statistics Test of Association Probability Theory and Distributions Logistic Regression Model Building Picturing your Data Multiple Logistic Regression and Histogram Interpretation Normal Distribution Skewness, Kurtosis Model Building and scoring for Outlier detection Prediction Introduction to predictive modelling Inferential Statistics Building predictive model Scoring Predictive Model Hypothesis Testing Introduction to Machine Learning and Analytics Analysis of variance (ANOVA) Two sample t-Test Introduction to Machine Learning F-test What is Machine Learning? One-way ANOVA Fundamental of Machine Learning ANOVA hypothesis Key Concepts and an example of ML ANOVA Model Supervised Learning Two-way ANOVA Unsupervised Learning Regression Linear Regression with one variable Exploratory data analysis Model Representation Hypothesis testing for correlation Cost Function Outliers, Types of Relationship, -
Incorporating Automated Feature Engineering Routines Into Automated Machine Learning Pipelines by Wesley Runnels
Incorporating Automated Feature Engineering Routines into Automated Machine Learning Pipelines by Wesley Runnels Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY May 2020 c Massachusetts Institute of Technology 2020. All rights reserved. Author . Wesley Runnels Department of Electrical Engineering and Computer Science May 12th, 2020 Certified by . Tim Kraska Department of Electrical Engineering and Computer Science Thesis Supervisor Accepted by . Katrina LaCurts Chair, Master of Engineering Thesis Committee 2 Incorporating Automated Feature Engineering Routines into Automated Machine Learning Pipelines by Wesley Runnels Submitted to the Department of Electrical Engineering and Computer Science on May 12, 2020, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science Abstract Automating the construction of consistently high-performing machine learning pipelines has remained difficult for researchers, especially given the domain knowl- edge and expertise often necessary for achieving optimal performance on a given dataset. In particular, the task of feature engineering, a key step in achieving high performance for machine learning tasks, is still mostly performed manually by ex- perienced data scientists. In this thesis, building upon the results of prior work in this domain, we present a tool, rl feature eng, which automatically generates promising features for an arbitrary dataset. In particular, this tool is specifically adapted to the requirements of augmenting a more general auto-ML framework. We discuss the performance of this tool in a series of experiments highlighting the various options available for use, and finally discuss its performance when used in conjunction with Alpine Meadow, a general auto-ML package. -
Machine Learning V1.1
An Introduction to Machine Learning v1.1 E. J. Sagra Agenda ● Why is Machine Learning in the News again? ● ArtificiaI Intelligence vs Machine Learning vs Deep Learning ● Artificial Intelligence ● Machine Learning & Data Science ● Machine Learning ● Data ● Machine Learning - By The Steps ● Tasks that Machine Learning solves ○ Classification ○ Cluster Analysis ○ Regression ○ Ranking ○ Generation Agenda (cont...) ● Model Training ○ Supervised Learning ○ Unsupervised Learning ○ Reinforcement Learning ● Reinforcement Learning - Going Deeper ○ Simple Example ○ The Bellman Equation ○ Deterministic vs. Non-Deterministic Search ○ Markov Decision Process (MDP) ○ Living Penalty ● Machine Learning - Decision Trees ● Machine Learning - Augmented Random Search (ARS) Why is Machine Learning In The News Again? Processing capabilities General ● GPU’s etc have reached level where Machine ● Tools / Languages / Automation Learning / Deep Learning practical ● Need for Data Science no longer limited to ● Cloud computing allows even individuals the tech giants capability to create / train complex models on ● Education is behind in creating Data vast data sets Scientists ● Organizing data is hard. Organizations Memory (Hard Drive (now SSD) as well RAM) challenged ● Speed / capacity increasing ● High demand due to lack of qualified talent ● Cost decreasing Data ● Volume of Data ● Access to vast public data sets ArtificiaI Intelligence vs Machine Learning vs Deep Learning Artificial Intelligence is the all-encompassing concept that initially erupted Followed by Machine Learning that thrived later Finally Deep Learning is escalating the advances of Artificial Intelligence to another level Artificial Intelligence Artificial intelligence (AI) is perhaps the most vaguely understood field of data science. The main idea behind building AI is to use pattern recognition and machine learning to build an agent able to think and reason as humans do (or approach this ability). -
Dynamic Feature Scaling for Online Learning of Binary Classifiers
Dynamic Feature Scaling for Online Learning of Binary Classifiers Danushka Bollegala University of Liverpool, Liverpool, United Kingdom May 30, 2017 Abstract Scaling feature values is an important step in numerous machine learning tasks. Different features can have different value ranges and some form of a feature scal- ing is often required in order to learn an accurate classifier. However, feature scaling is conducted as a preprocessing task prior to learning. This is problematic in an online setting because of two reasons. First, it might not be possible to accu- rately determine the value range of a feature at the initial stages of learning when we have observed only a handful of training instances. Second, the distribution of data can change over time, which render obsolete any feature scaling that we perform in a pre-processing step. We propose a simple but an effective method to dynamically scale features at train time, thereby quickly adapting to any changes in the data stream. We compare the proposed dynamic feature scaling method against more complex methods for estimating scaling parameters using several benchmark datasets for classification. Our proposed feature scaling method consistently out- performs more complex methods on all of the benchmark datasets and improves classification accuracy of a state-of-the-art online classification algorithm. 1 Introduction Machine learning algorithms require train and test instances to be represented using a set of features. For example, in supervised document classification [9], a document is often represented as a vector of its words and the value of a feature is set to the num- ber of times the word corresponding to the feature occurs in that document. -
Capacity and Trainability in Recurrent Neural Networks
Published as a conference paper at ICLR 2017 CAPACITY AND TRAINABILITY IN RECURRENT NEURAL NETWORKS Jasmine Collins,∗ Jascha Sohl-Dickstein & David Sussillo Google Brain Google Inc. Mountain View, CA 94043, USA {jlcollins, jaschasd, sussillo}@google.com ABSTRACT Two potential bottlenecks on the expressiveness of recurrent neural networks (RNNs) are their ability to store information about the task in their parameters, and to store information about the input history in their units. We show experimentally that all common RNN architectures achieve nearly the same per-task and per-unit capacity bounds with careful training, for a variety of tasks and stacking depths. They can store an amount of task information which is linear in the number of parameters, and is approximately 5 bits per parameter. They can additionally store approximately one real number from their input history per hidden unit. We further find that for several tasks it is the per-task parameter capacity bound that determines performance. These results suggest that many previous results comparing RNN architectures are driven primarily by differences in training effectiveness, rather than differences in capacity. Supporting this observation, we compare training difficulty for several architectures, and show that vanilla RNNs are far more difficult to train, yet have slightly higher capacity. Finally, we propose two novel RNN architectures, one of which is easier to train than the LSTM or GRU for deeply stacked architectures. 1 INTRODUCTION Research and application of recurrent neural networks (RNNs) have seen explosive growth over the last few years, (Martens & Sutskever, 2011; Graves et al., 2009), and RNNs have become the central component for some very successful model classes and application domains in deep learning (speech recognition (Amodei et al., 2015), seq2seq (Sutskever et al., 2014), neural machine translation (Bahdanau et al., 2014), the DRAW model (Gregor et al., 2015), educational applications (Piech et al., 2015), and scientific discovery (Mante et al., 2013)). -
Learning from Few Subjects with Large Amounts of Voice Monitoring Data
Learning from Few Subjects with Large Amounts of Voice Monitoring Data Jose Javier Gonzalez Ortiz John V. Guttag Robert E. Hillman Daryush D. Mehta Jarrad H. Van Stan Marzyeh Ghassemi Challenges of Many Medical Time Series • Few subjects and large amounts of data → Overfitting to subjects • No obvious mapping from signal to features → Feature engineering is labor intensive • Subject-level labels → In many cases, no good way of getting sample specific annotations 1 Jose Javier Gonzalez Ortiz Challenges of Many Medical Time Series • Few subjects and large amounts of data → Overfitting to subjects Unsupervised feature • No obvious mapping from signal to features extraction → Feature engineering is labor intensive • Subject-level labels Multiple → In many cases, no good way of getting sample Instance specific annotations Learning 2 Jose Javier Gonzalez Ortiz Learning Features • Segment signal into windows • Compute time-frequency representation • Unsupervised feature extraction Conv + BatchNorm + ReLU 128 x 64 Pooling Dense 128 x 64 Upsampling Sigmoid 3 Jose Javier Gonzalez Ortiz Classification Using Multiple Instance Learning • Logistic regression on learned features with subject labels RawRaw LogisticLogistic WaveformWaveform SpectrogramSpectrogram EncoderEncoder RegressionRegression PredictionPrediction •• %% Positive Positive Ours PerPer Window Window PerPer Subject Subject • Aggregate prediction using % positive windows per subject 4 Jose Javier Gonzalez Ortiz Application: Voice Monitoring Data • Voice disorders affect 7% of the US population • Data collected through neck placed accelerometer 1 week = ~4 billion samples ~100 5 Jose Javier Gonzalez Ortiz Results Previous work relied on expert designed features[1] AUC Accuracy Comparable performance Train 0.70 ± 0.05 0.71 ± 0.04 Expert LR without Test 0.68 ± 0.05 0.69 ± 0.04 task-specific feature Train 0.73 ± 0.06 0.72 ± 0.04 engineering! Ours Test 0.69 ± 0.07 0.70 ± 0.05 [1] Marzyeh Ghassemi et al. -
Training and Testing of a Single-Layer LSTM Network for Near-Future Solar Forecasting
applied sciences Conference Report Training and Testing of a Single-Layer LSTM Network for Near-Future Solar Forecasting Naylani Halpern-Wight 1,2,*,†, Maria Konstantinou 1, Alexandros G. Charalambides 1 and Angèle Reinders 2,3 1 Chemical Engineering Department, Cyprus University of Technology, Lemesos 3036, Cyprus; [email protected] (M.K.); [email protected] (A.G.C.) 2 Energy Technology Group, Department of Mechanical Engineering, Eindhoven University of Technology, 5612 AE Eindhoven, The Netherlands; [email protected] 3 Department of Design, Production and Management, Faculty of Engineering Technology, University of Twente, 7522 NB Enschede, The Netherlands * Correspondence: [email protected] or [email protected] † Current address: Archiepiskopou Kyprianou 30, Limassol 3036, Cyprus. Received: 30 June 2020; Accepted: 31 July 2020; Published: 25 August 2020 Abstract: Increasing integration of renewable energy sources, like solar photovoltaic (PV), necessitates the development of power forecasting tools to predict power fluctuations caused by weather. With trustworthy and accurate solar power forecasting models, grid operators could easily determine when other dispatchable sources of backup power may be needed to account for fluctuations in PV power plants. Additionally, PV customers and designers would feel secure knowing how much energy to expect from their PV systems on an hourly, daily, monthly, or yearly basis. The PROGNOSIS project, based at the Cyprus University of Technology, is developing a tool for intra-hour solar irradiance forecasting. This article presents the design, training, and testing of a single-layer long-short-term-memory (LSTM) artificial neural network for intra-hour power forecasting of a single PV system in Cyprus. -
Expert Feature-Engineering Vs. Deep Neural Networks: Which Is Better for Sensor-Free Affect Detection?
Expert Feature-Engineering vs. Deep Neural Networks: Which is Better for Sensor-Free Affect Detection? Yang Jiang1, Nigel Bosch2, Ryan S. Baker3, Luc Paquette2, Jaclyn Ocumpaugh3, Juliana Ma. Alexandra L. Andres3, Allison L. Moore4, Gautam Biswas4 1 Teachers College, Columbia University, New York, NY, United States [email protected] 2 University of Illinois at Urbana-Champaign, Champaign, IL, United States {pnb, lpaq}@illinois.edu 3 University of Pennsylvania, Philadelphia, PA, United States {rybaker, ojaclyn}@upenn.edu [email protected] 4 Vanderbilt University, Nashville, TN, United States {allison.l.moore, gautam.biswas}@vanderbilt.edu Abstract. The past few years have seen a surge of interest in deep neural net- works. The wide application of deep learning in other domains such as image classification has driven considerable recent interest and efforts in applying these methods in educational domains. However, there is still limited research compar- ing the predictive power of the deep learning approach with the traditional feature engineering approach for common student modeling problems such as sensor- free affect detection. This paper aims to address this gap by presenting a thorough comparison of several deep neural network approaches with a traditional feature engineering approach in the context of affect and behavior modeling. We built detectors of student affective states and behaviors as middle school students learned science in an open-ended learning environment called Betty’s Brain, us- ing both approaches. Overall, we observed a tradeoff where the feature engineer- ing models were better when considering a single optimized threshold (for inter- vention), whereas the deep learning models were better when taking model con- fidence fully into account (for discovery with models analyses). -
Training Dnns: Tricks
Training DNNs: Tricks Ju Sun Computer Science & Engineering University of Minnesota, Twin Cities March 5, 2020 1 / 33 Recap: last lecture Training DNNs m 1 X min ` (yi; DNNW (xi)) +Ω( W ) W m i=1 { What methods? Mini-batch stochastic optimization due to large m * SGD (with momentum), Adagrad, RMSprop, Adam * diminishing LR (1/t, exp delay, staircase delay) { Where to start? * Xavier init., Kaiming init., orthogonal init. { When to stop? * early stopping: stop when validation error doesn't improve This lecture: additional tricks/heuristics that improve { convergence speed { task-specific (e.g., classification, regression, generation) performance 2 / 33 Outline Data Normalization Regularization Hyperparameter search, data augmentation Suggested reading 3 / 33 Why scaling matters? : 1 Pm | Consider a ML objective: minw f (w) = m i=1 ` (w xi; yi), e.g., 1 Pm | 2 { Least-squares (LS): minw m i=1 kyi − w xik2 h i 1 Pm | w|xi { Logistic regression: minw − m i=1 yiw xi − log 1 + e 1 Pm | 2 { Shallow NN prediction: minw m i=1 kyi − σ (w xi)k2 1 Pm 0 | Gradient: rwf = m i=1 ` (w xi; yi) xi. { What happens when coordinates (i.e., features) of xi have different orders of magnitude? Partial derivatives have different orders of magnitudes =) slow convergence of vanilla GD (recall why adaptive grad methods) 2 1 Pm 00 | | Hessian: rwf = m i=1 ` (w xi; yi) xixi . | { Suppose the off-diagonal elements of xixi are relatively small (e.g., when features are \independent"). { What happens when coordinates (i.e., features) of xi have different orders 2 of magnitude? Conditioning of rwf is bad, i.e., f is elongated 4 / 33 Fix the scaling: first idea Normalization: make each feature zero-mean and unit variance, i.e., make each row of X = [x1;:::; xm] zero-mean and unit variance, i.e. -
Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification
International Journal of Environmental Research and Public Health Article Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification Kwang Ho Park 1 , Erdenebileg Batbaatar 1 , Yongjun Piao 2, Nipon Theera-Umpon 3,4,* and Keun Ho Ryu 4,5,6,* 1 Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea; [email protected] (K.H.P.); [email protected] (E.B.) 2 School of Medicine, Nankai University, Tianjin 300071, China; [email protected] 3 Department of Electrical Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand 4 Biomedical Engineering Institute, Chiang Mai University, Chiang Mai 50200, Thailand 5 Data Science Laboratory, Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 700000, Vietnam 6 Department of Computer Science, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea * Correspondence: [email protected] (N.T.-U.); [email protected] or [email protected] (K.H.R.) Abstract: Hematopoietic cancer is a malignant transformation in immune system cells. Hematopoi- etic cancer is characterized by the cells that are expressed, so it is usually difficult to distinguish its heterogeneities in the hematopoiesis process. Traditional approaches for cancer subtyping use statistical techniques. Furthermore, due to the overfitting problem of small samples, in case of a minor cancer, it does not have enough sample material for building a classification model. Therefore, Citation: Park, K.H.; Batbaatar, E.; we propose not only to build a classification model for five major subtypes using two kinds of losses, Piao, Y.; Theera-Umpon, N.; Ryu, K.H. -
Temporal Convolutional Neural Network for the Classification Of
Article Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series Charlotte Pelletier 1*, Geoffrey I. Webb 1 and François Petitjean 1 1 Faculty of Information Technology, Monash University, Melbourne, VIC, 3800 * Correspondence: [email protected] Academic Editor: name Version February 1, 2019 submitted to Remote Sens.; Typeset by LATEX using class file mdpi.cls Abstract: New remote sensing sensors now acquire high spatial and spectral Satellite Image Time Series (SITS) of the world. These series of images are a key component of classification systems that aim at obtaining up-to-date and accurate land cover maps of the Earth’s surfaces. More specifically, the combination of the temporal, spectral and spatial resolutions of new SITS makes possible to monitor vegetation dynamics. Although traditional classification algorithms, such as Random Forest (RF), have been successfully applied for SITS classification, these algorithms do not make the most of the temporal domain. Conversely, some approaches that take into account the temporal dimension have recently been tested, especially Recurrent Neural Networks (RNNs). This paper proposes an exhaustive study of another deep learning approaches, namely Temporal Convolutional Neural Networks (TempCNNs) where convolutions are applied in the temporal dimension. The goal is to quantitatively and qualitatively evaluate the contribution of TempCNNs for SITS classification. This paper proposes a set of experiments performed on one million time series extracted from 46 Formosat-2 images. The experimental results show that TempCNNs are more accurate than RF and RNNs, that are the current state of the art for SITS classification. We also highlight some differences with results obtained in computer vision, e.g. -
Feature Engineering for Predictive Modeling Using Reinforcement Learning
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) Feature Engineering for Predictive Modeling Using Reinforcement Learning Udayan Khurana Horst Samulowitz Deepak Turaga [email protected] [email protected] [email protected] IBM Research AI IBM Research AI IBM Research AI Abstract Feature engineering is a crucial step in the process of pre- dictive modeling. It involves the transformation of given fea- ture space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effec- tive feature engineering. It involves domain knowledge, in- tuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process sig- nificantly influences the cost of model generation. We present (a) Original data (b) Engineered data. a new framework to automate feature engineering. It is based on performance driven exploration of a transformation graph, Figure 1: Illustration of different representation choices. which systematically and compactly captures the space of given options. A highly efficient exploration strategy is de- rived through reinforcement learning on past examples. rental demand (kaggle.com/c/bike-sharing-demand) in Fig- ure 2(a). Deriving several features (Figure 2(b)) dramatically Introduction reduces the modeling error. For instance, extracting the hour of the day from the given timestamp feature helps to capture Predictive analytics are widely used in support for decision certain trends such as peak versus non-peak demand. Note making across a variety of domains including fraud detec- that certain valuable features are derived through a compo- tion, marketing, drug discovery, advertising, risk manage- sition of multiple simpler functions.