Neural Networks and Ensemble Methods for Classification


NEURAL NETWORKS

Neural Networks
A neural network is a set of connected input/output units (neurons) where each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so that it can predict the correct class label of the input samples (the training samples). Knowledge about the learning task is given in the form of examples; inter-neuron connection strengths (weights) are used to store the acquired information (the training examples). During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.
http://aemc.jpl.nasa.gov/activities/bio_regen.cfm

Neural Networks: Advantages and Criticism
Advantages:
- prediction accuracy is generally high
- robust: works even when training examples contain errors or noisy data
- output may be discrete, real-valued, or a vector of several discrete or real-valued attributes
- fast evaluation of the learned target function
Criticism:
- long training time
- parameters, such as the network topology or structure, are best determined empirically
- difficult to understand the learned function (weights)
- not easy to incorporate domain knowledge

Network Architectures
Three different classes of network architectures:
- single-layer feed-forward: an input layer of source nodes feeding an output layer of neurons
- multi-layer feed-forward: an input layer, one or more hidden layers, and an output layer; neurons are organized in acyclic layers
- recurrent
The architecture of a neural network is linked with the learning algorithm used to train it.

Neurons
Neural networks are built out of a densely interconnected set of simple units (neurons). Each neuron takes a number of real-valued inputs and produces a single real-valued output. Inputs to a neuron may be the outputs of other neurons, and a neuron's output may be used as input to many other neurons.

The Neuron
A neuron consists of an adder (linear combiner), which computes the weighted sum of the inputs, followed by an activation (squashing) function that limits the amplitude of the neuron's output. The bias b, with weight w_0, serves to vary the activity of the unit. For inputs x_1, ..., x_m with weights w_1, ..., w_m, the weighted sum (local field) u and the output signal y are

    u = w_0·b + Σ_{j=1..m} w_j·x_j,        y = φ(u)

How Does It Work?
- Assign a weight to each input link.
- Multiply each weight by the input value (0 or 1).
- Sum all the weighted inputs.
- Apply the squashing function, e.g.: if the sum > threshold for the neuron, then output = +1, else output = -1.
http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg

Popular Activation Functions
- Linear activation: φ(z) = z
- Logistic activation: φ(z) = 1 / (1 + e^(-z))
- Threshold activation: φ(z) = sign(z), i.e. +1 if z >= 0 and -1 if z < 0
- Hyperbolic tangent activation: φ(u) = tanh(u) = (1 - e^(-2u)) / (1 + e^(-2u))
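The neuron model and the activation functions above translate directly into code. The following is a minimal Python sketch; the example inputs, weights, and bias value in the final call are made up for illustration, and phi stands for whichever activation function is chosen.

```python
import math

# Activation functions listed above.
def linear(z):
    return z

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def threshold(z):
    return 1 if z >= 0 else -1

def tanh_act(u):
    return (1.0 - math.exp(-2 * u)) / (1.0 + math.exp(-2 * u))

def neuron_output(x, w, b, w0, phi=logistic):
    """Adder (weighted sum of the inputs plus weighted bias) followed by
    the squashing function: u = w0*b + sum_j w_j*x_j, y = phi(u)."""
    u = w0 * b + sum(wj * xj for wj, xj in zip(w, x))
    return phi(u)

# Illustrative call: two inputs with hypothetical weights.
print(neuron_output(x=[0.5, 1.0], w=[0.4, -0.2], b=1.0, w0=0.1))
```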
How Are Neural Networks Trained?
Initially:
- choose small random weights (w_i)
- set the threshold = 1 (step function)
- choose a small learning rate (r)
Then apply each member of the training set to the neural network model, using a training rule to adjust the weights. For each unit:
- compute the net input to the unit as a linear combination of all the inputs to the unit
- compute the output value using the activation function
- compute the error
- update the weights and the bias

Single Layer Perceptron
Single-layer perceptrons are the simplest form of neural networks: the input variables feed directly into the output nodes that produce the output variables.

Single Layer Perceptron: Training Rule
Modify the weights (w_i) according to the training rule

    w_i = w_i + r·(t − a)·x_i

where r is the learning rate (e.g. 0.2), t is the target output, a is the actual output, and x_i is the i-th input value. If the learning rate is too small, learning occurs at a slow pace; if it is too large, the search may get stuck in a local minimum in the decision space.

Example
Training data:

    x1  x2  Y
    0   0   0
    1   0   1
    0   1   1
    1   1   1

Current weights w0 = 0.49, w1 = 0.95, w2 = 0.15; bias input b = -1; threshold = 0.5; learning rate r = 0.05. For the training example x1 = 0, x2 = 1 with target output 1:
- Compute the output for the input: u = -1 x 0.49 + 0 x 0.95 + 1 x 0.15 = -0.34 < threshold, thus the actual output is y = 0.
- Compute the error: error = (1 - 0) = 1; correction factor = error x r = 0.05.
- Compute the new weights:
    w0 = 0.49 + 0.05 x (1 - 0) x (-1) = 0.44
    w1 = 0.95 + 0.05 x (1 - 0) x 0 = 0.95
    w2 = 0.15 + 0.05 x (1 - 0) x 1 = 0.20
- Repeat the process with the new weights for a given number of iterations.
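The perceptron update just computed can be checked with a short Python sketch. It applies the training rule w_i = w_i + r·(t − a)·x_i once, using the bias input b = -1 and the weights from the example; the printed weights match the values above up to floating-point rounding.

```python
def perceptron_step(weights, inputs, target, r=0.05, threshold=0.5):
    """One application of the training rule w_i = w_i + r * (t - a) * x_i."""
    u = sum(w * x for w, x in zip(weights, inputs))   # weighted sum (bias input included)
    a = 1 if u > threshold else 0                     # actual output of the step unit
    return [w + r * (target - a) * x for w, x in zip(weights, inputs)], a

# Example above: weights (w0, w1, w2) = (0.49, 0.95, 0.15),
# inputs (b, x1, x2) = (-1, 0, 1), target output 1.
new_weights, actual = perceptron_step([0.49, 0.95, 0.15], [-1, 0, 1], target=1)
print(actual)       # 0  (u = -0.34 is below the threshold)
print(new_weights)  # approximately [0.44, 0.95, 0.20]

# Training repeats this step over all training examples, with the new weights,
# for a given number of iterations.
```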
Multi Layer Network
A multi-layer network consists of an input layer, one or more hidden layers, and an output layer.

Training Multi Layer Networks: the Backpropagation Algorithm
Problem: what is the desired output for a hidden node? => the backpropagation algorithm.

Phase 1: Propagation
- Forward propagation of a training input through the network.
- Back propagation of the propagation's output activations.

Phase 2: Weight update
For each weight-synapse:
- Multiply its output delta and input activation to get the gradient of the weight.
- Bring the weight in the opposite direction of the gradient by subtracting a ratio of it from the weight. This ratio influences the speed and quality of learning. The sign of the gradient of a weight indicates where the error is increasing, which is why the weight must be updated in the opposite direction.
Repeat phases 1 and 2 until the performance of the network is good enough.

Multi-Layer Network of Sigmoid Units
Net input and output of unit j (for input vector x_i, the input nodes simply pass on their values):

    I_j = Σ_i w_ij·O_i + θ_j,        O_j = 1 / (1 + e^(-I_j))

Error for a node j in the output layer (T_j is the target output):

    Err_j = O_j·(1 − O_j)·(T_j − O_j)

Error for a node j in the hidden layer:

    Err_j = O_j·(1 − O_j)·Σ_k Err_k·w_jk

Update of the weights and the bias:

    w_ij = w_ij + r·Err_j·O_i,        θ_j = θ_j + r·Err_j

Example
Input variables x_i = (1, 0, 1), whose class is 1; randomly assigned weights w_ij; learning rate r = 0.9. The network has input nodes 1, 2, 3, hidden nodes 4 and 5, and output node 6, with initial weights (w0j denotes the bias of node j):

    w14 = 0.2, w24 = 0.4, w34 = -0.5, w04 = -0.4
    w15 = -0.3, w25 = 0.1, w35 = 0.2, w05 = 0.2
    w46 = -0.3, w56 = -0.2, w06 = 0.1

Propagation
Using I_j = Σ_i w_ij·O_i + θ_j and the activation function O_j = 1 / (1 + e^(-I_j)):

    neuron   input                                       output
    4        0.2 x 1 + 0.4 x 0 - 0.5 x 1 - 0.4 = -0.7    1/(1 + e^(0.7)) = 0.332
    5        -0.3 x 1 + 0.1 x 0 + 0.2 x 1 + 0.2 = 0.1    1/(1 + e^(-0.1)) = 0.525
    6        -0.3 x 0.332 - 0.2 x 0.525 + 0.1 = -0.105   1/(1 + e^(0.105)) = 0.474

Calculation of the Neurons' Errors

    neuron   error
    6        0.474 x (1 - 0.474) x (1 - 0.474) = 0.1311      (output layer)
    5        0.525 x (1 - 0.525) x (-0.2) x 0.1311 = -0.0065 (hidden layer)
    4        0.332 x (1 - 0.332) x (-0.3) x 0.1311 = -0.0087 (hidden layer)

Updating Weights

    weight   new value
    w46      -0.3 + 0.9 x 0.1311 x 0.332 = -0.261
    w56      -0.2 + 0.9 x 0.1311 x 0.525 = -0.138
    w14      0.2 + 0.9 x (-0.0087) x 1 = 0.192
    w15      -0.3 + 0.9 x (-0.0065) x 1 = -0.306
    w24      0.4 + 0.9 x (-0.0087) x 0 = 0.4
    w25      0.1 + 0.9 x (-0.0065) x 0 = 0.1
    w34      -0.5 + 0.9 x (-0.0087) x 1 = -0.508
    w35      0.2 + 0.9 x (-0.0065) x 1 = 0.194
    w06      0.1 + 0.9 x 0.1311 = 0.218
    w05      0.2 + 0.9 x (-0.0065) = 0.194
    w04      -0.4 + 0.9 x (-0.0087) = -0.408

This is the resulting network after the first iteration. We now process another training example, and continue until the overall error is low or we run out of examples.
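The worked example above can be reproduced with a few lines of Python. The sketch below uses the same inputs, initial weights, and learning rate; the intermediate outputs, errors, and updated weights match the tables up to rounding.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Inputs (x1, x2, x3) = (1, 0, 1), class label 1, learning rate r = 0.9.
O = {1: 1.0, 2: 0.0, 3: 1.0}          # node outputs; input nodes output their value
target, r = 1.0, 0.9
w = {(1, 4): 0.2, (2, 4): 0.4, (3, 4): -0.5,    # inputs -> hidden node 4
     (1, 5): -0.3, (2, 5): 0.1, (3, 5): 0.2,    # inputs -> hidden node 5
     (4, 6): -0.3, (5, 6): -0.2}                # hidden nodes -> output node 6
theta = {4: -0.4, 5: 0.2, 6: 0.1}               # biases

# Phase 1: forward propagation, I_j = sum_i w_ij * O_i + theta_j, O_j = sigmoid(I_j).
for j, sources in [(4, (1, 2, 3)), (5, (1, 2, 3)), (6, (4, 5))]:
    O[j] = sigmoid(sum(w[(i, j)] * O[i] for i in sources) + theta[j])
# O[4] ~ 0.332, O[5] ~ 0.525, O[6] ~ 0.474

# Back propagation of the error.
err = {6: O[6] * (1 - O[6]) * (target - O[6])}        # output node: ~0.1311
for j in (4, 5):                                      # hidden nodes: ~-0.0087, ~-0.0065
    err[j] = O[j] * (1 - O[j]) * err[6] * w[(j, 6)]

# Phase 2: update the weights and the biases.
for (i, j) in w:
    w[(i, j)] += r * err[j] * O[i]
for j in theta:
    theta[j] += r * err[j]
# e.g. w[(4, 6)] ~ -0.261, w[(1, 4)] ~ 0.192, theta[6] ~ 0.218 after the first iteration
```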
Neural Network as a Classifier
Weakness:
- Long training time.
- Requires a number of parameters that are typically best determined empirically, e.g., the network topology or "structure".
- Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of the "hidden units" in the network.
Strength:
- High tolerance to noisy data.
- Ability to classify untrained patterns.
- Well-suited for continuous-valued inputs and outputs.
- Successful on a wide array of real-world data.
- Algorithms are inherently parallel.

ENSEMBLE METHODS

Ensemble Method
Aggregation of multiple learned models with the goal of improving accuracy. Intuition: simulate what we do when we combine an expert panel in a human decision-making process.

Some Comments
- Combining models adds complexity: it is more difficult to characterize and explain predictions, but the accuracy may increase.
- Violation of Ockham's Razor ("simplicity leads to greater accuracy"): identifying the best model requires identifying the proper "model complexity".

Methods to Achieve Diversity
- Diversity from differences in input variation.
- Different feature weightings: give each classifier a different view of the training examples (e.g., ratings to classifier A, actors to classifier B, genres to classifier C) and combine their predictions.
- Divide up the training data among the models: each classifier is trained on a different portion of the training examples and the predictions are combined.

How to Combine Models
Algebraic methods:
- Average
- Weighted average
- Sum
- Weighted sum
- Product
- Maximum
- Minimum
- Median
Voting methods:
- Majority voting
- Weighted majority voting
- Borda count (rank candidates in order of preference)

Ensemble Methods: Increasing the Accuracy
Ensemble methods use a combination of models to increase accuracy: they combine a series of k learned models, M1, M2, ..., Mk, with the aim of creating an improved model M*.

Popular Ensemble Methods
Analogy: diagnosis based on multiple doctors' majority vote.
- Bagging: averaging the prediction over a collection of classifiers.
- Boosting: weighted vote with a collection of classifiers.
- Ensemble: combining a set of heterogeneous classifiers.

Bagging: Bootstrap AGGregatING
Training: given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample). A classifier model Mi is learned for each training set Di (see the code sketch below).
Classification (classifying an unknown sample X): each classifier Mi returns its class prediction; the bagged classifier M* counts the votes and assigns the class with the most votes to X.
Prediction: bagging can also be applied to the prediction of continuous values by taking the average of the predictions for a given test tuple.

Bagging: Accuracy
- Often significantly better than a single classifier derived from D.
- For noisy data: not considerably worse, and more robust.
- Proven improved accuracy in prediction.
- Requirement: needs unstable classifier types; "unstable" means that a small change to the training data may lead to major decision changes.
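To make the bagging procedure concrete, here is a minimal Python sketch: bootstrap samples of the training set, one classifier per sample, and a majority vote at classification time. The decision-tree base learner (an unstable classifier type) and the toy two-blob dataset are illustrative choices, not prescribed by the slides.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=0):
    """Learn k models M_1..M_k, each on a bootstrap sample D_i of d tuples
    drawn with replacement from the training set D."""
    rng = np.random.default_rng(seed)
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)                      # sample with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_classify(models, x):
    """The bagged classifier M*: count the votes and return the majority class."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]

# Toy data for illustration: two Gaussian blobs, labelled 0 and 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

models = bagging_fit(X, y)
print(bagging_classify(models, X[0]))   # majority vote over the 25 trees (expected: 0)
```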