Deep Neural Networks on Big-Data
Aurelio Uncini, [email protected]
Ordine degli Ingegneri della provincia di Ancona

Facoltà di Ingegneria dell'Informazione, Informatica e Statistica (I3S)

http://ispac.diet.uniroma1.it — Ancona, 2015

Prologue

• Aristotle argued that all people express similar intellectual faculties and that the differences are due to teaching and example.

• My elementary school teacher said that “man is intelligent because he has the ability to adapt”.

• Bernard Widrow (LMS inventor): “I'm an ‘adaptive’ guy”

Keywords: teaching, example, and adaptation

The Big Data Phenomenon

Exponential growth of available information

• Social networks

• Sensor networks

• Internet of Things

• Bureaucratic and domain-specific databases

• Apps

• …

Big Data cycle

[Diagram: the Big Data cycle – Apps → Data → Users]

Big Data: many ‘V’

By 2020: about 44×10²¹ bytes (44 zettabytes)

Volume

Velocity

Variability

Variety

Source: IDC’s Digital Universe Study (EMC)

Big Data

• Untapped opportunities for socioeconomic growth (World Economic Forum)

• Data is the new oil of the Internet and the new currency of the digital world. (Meglena Kuneva, European Consumer Commissioner)

• Data in the 21st century is like oil in the 18th century: an immensely valuable, untapped asset.

Big Data - Many ‘V’ definition

Big problem: extraction of ‘V’alue from the large pools of data

Cost center → Profit center

Harvesting valuable knowledge from Big Data is not an ordinary task. Today, computational intelligence methods have come to play a vital role in Big Data analytics and knowledge discovery.

Big-Data relevant themes

[Diagram: Big-Data relevant themes]
• Data constraints: massive scale, decentralized, real-time streams
• Computational intelligence methods: deep learning methods, deep neural nets, convolutive neural nets, distributed neural nets, metaheuristics, …
• Infrastructure: massive-scale cloud storage, high-speed networks, high-speed computers, …
• Computational models: adaptive, parallel, distributed, local connections, ‘green’, …
• Tasks: modeling, prediction, classification, clustering, …
• Value: BD business models, BD analytics, high-value-added products, …

• Support projects that can transform our ability to extract knowledge and insights, in novel ways, from huge volumes of digital data.

• In April 2013, U.S. President Barack Obama announced another federal project, a new brain-mapping initiative called BRAIN (Brain Research Through Advancing Innovative Neurotechnologies).

• President Barack Obama’s Big Data Keynote – Hadoop World 2015 (he talks about the importance of Big Data and Data Science) (19 Feb 2015)

Biologically inspired computing

Biologically inspired approach …

[Diagram: the brain – instinct, knowledge, experience, culture, memory, deduction, emotions, a priori knowledge, rules, action, awareness, reasoning ability, …]

Moreover: fusion with other information … Most of our behavior, which combines information, knowledge, and intelligence, happens unconsciously.

Ex.: complex scene summarization in a few words.

Characteristics of the biological brain

The neuron cell

Dendrites (receivers)

Axon terminals (transmitters), cell body

[Diagram labels: nucleus, stimuli, response]

• Birth of Artificial Neural Networks (ANN) (’40s)
• The formal neuron of McCulloch-Pitts (1943)

• Simple biologically inspired circuit

Nonlinear function φ(·); synaptic weights w; threshold or bias; cell potential (activation) s.

The stimuli x₁ … x_M arrive on the dendrites, are weighted by w₁ … w_M and combined at the summing junction; the activation function produces the response on the axon:

  s = wᵀx,  y = φ(wᵀx)

• Can be implemented by a very simple algorithm, suitable for artificial neural networks; a minimal sketch follows.
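As an illustration (mine, not from the slides), the formal neuron above fits in a few lines of Python; the weights, bias, and tanh activation are arbitrary example choices:

```python
import numpy as np

def neuron(x, w, b, phi=np.tanh):
    """Formal neuron: cell potential s = w^T x + b, response y = phi(s)."""
    s = np.dot(w, x) + b      # summing junction (cell potential / activation)
    return phi(s)             # nonlinear activation function on the axon

x = np.array([0.5, -1.0, 2.0])   # stimuli arriving at the dendrites
w = np.array([0.3, 0.8, -0.2])   # synaptic weights
print(neuron(x, w, b=0.1))       # response
```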

Learning model and paradigms

• Learning model: simple rewarding mechanism

• In general terms we can define two learning paradigms:
  – Supervised
  – Unsupervised

Supervised learning

Learning through teaching by examples

wwnn1  Rewarding_Function

Stimuli Response

Supervisor or Teacher Comparison Correct answer

Reward mechihanism Error e[n ]

Rewarding mechanism: error-function minimization, provided through examples (external forcing).

Learning by error correction

• A learning algorithm with concrete and useful results is the LMS algorithm (delta rule) of Bernard Widrow (1959).

External stimuli x (signals) produce the response y = wᵀx; comparison with the desired output d (supervisor or teacher) gives the error e = d − y, and the learning algorithm updates the weights (a minimal sketch follows):

  w[n+1] = w[n] + μ·x·e
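A hedged sketch of the LMS/delta rule in Python; the learning rate mu and the toy teacher are my illustrative assumptions:

```python
import numpy as np

def lms_step(w, x, d, mu=0.01):
    """One LMS update: y = w^T x, e = d - y, w[n+1] = w[n] + mu * x * e."""
    e = d - np.dot(w, x)          # error from comparison with desired output
    return w + mu * x * e         # learning by error correction

rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(5000):
    x = rng.normal(size=2)        # external stimuli
    d = 2.0 * x[0] - 1.0 * x[1]   # supervisor's correct answer
    w = lms_step(w, x, d)
print(w)                          # converges toward [2, -1]
```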

Bernard Widrow, “I'm an ‘adaptive’ guy” – Professor Emeritus, Electrical Engineering Department, Stanford University, USA

Multi-Layer Neural Networks

Compare outputs with the correct answer to get the error signal; back-propagate the error signal to get the derivatives for learning.

  Outputs: y = Φ(W(3) Φ(W(2) Φ(W(1) x)))

[Diagram: input vector (pattern) x → W(1) → W(2) → W(3) → outputs y; many hidden layers; feed-forward computation (a minimal sketch of the forward pass follows)]

Back-Propagation learning algorithm (mid ’80s)
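A minimal sketch of the feed-forward computation y = Φ(W(3) Φ(W(2) Φ(W(1) x))); the layer sizes and random weights are illustrative assumptions, and the backward pass is omitted for brevity:

```python
import numpy as np

def forward(x, weights, phi=np.tanh):
    """Feed-forward pass through the stacked layers W(1), W(2), W(3)."""
    a = x
    for W in weights:
        a = phi(W @ a)            # each layer: nonlinearity of a weighted sum
    return a

rng = np.random.default_rng(1)
sizes = [4, 8, 8, 3]              # input, two hidden layers, output
weights = [rng.normal(scale=0.5, size=(m, n))
           for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(rng.normal(size=4), weights)
print(y)                          # outputs, to be compared with the correct answer
```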

Unsupervised learning

Learning through self-adaptation

[Diagram: stimuli → learner → response]

Rewarding mechanism: no external forcing

Rewarding mechanism: a simple primal instinct that creates the adaptation, i.e., natural evolutionary behavior.

Unsupervised learning

Hebbian learning

• Hebb’s postulate
• The strength of the connection depends on the correlated activity of the neurons (a minimal sketch of the rule follows).
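As an illustration (not from the slides), the Hebbian update Δw = η·y·x strengthens weights in proportion to joint pre- and post-synaptic activity; η and the stimulus below are arbitrary:

```python
import numpy as np

def hebbian_step(w, x, eta=0.01):
    """Hebb's rule: no teacher and no error, only correlated activity."""
    y = np.dot(w, x)              # post-synaptic activity
    return w + eta * y * x        # delta_w = eta * y * x

w = np.array([0.1, 0.1])
for _ in range(100):
    w = hebbian_step(w, np.array([1.0, 0.2]))   # repeated correlated stimulus
print(w)                          # grows along the dominant input direction
```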

Donald Hebb (1904-1985), Canadian psychologist, McGill University, Montreal

Neural Networks History: Gartner Hype Cycle

• Neural network disillusionment

[Gartner hype-cycle chart of neural network history, 1950-2010: Technology Trigger (Widrow’s LMS, 1950-70), Peak of Inflated Expectations (NNs rebirth with BP and MLP, ’80s), Trough of Disillusionment (’90s), Slope of Enlightenment, Plateau of Productivity; RNN appears along the curve]

BP-NNs disillusionment, ’80s and ’90s

• Supervised learning
  – It requires labeled training data
  – Almost all data is unlabeled

• Long learning time
  – Very slow in networks with many hidden layers
  – Vanishing gradient problem

• It may fall into poor local minima
  – For deep networks these may be too far from the optimal solution

Back-propagation problems in the ’80s and ’90s

Three main problems of BP

1. Difficulty of producing labelled training data: not enough labelled data sets.

2. CPUs were not fast enough.

3. Difficulty of finding the correct weights: error-propagation problems.

What has happened recently

1. Labelled data sets got much bigger.

2. Computers got much faster.

3. New paradigm for learning deep layers using unlabeled data (2006).

• Result: deep neural networks are now the state of the art for many real-world problems.

Deep Neural Networks

Neural Networks History: Gartner Hype Cycle

[Updated hype-cycle chart: after the Trough of Disillusionment, DNNs (’06 onward) drive a second rebirth of NNs, climbing the Slope of Enlightenment toward a possible Plateau of Productivity (?); DNN (industry) follows]

Deep Neural Networks - Gartner Hype Cycle

[Chart: hypothesized trend of DNN expectations rising into strong-AI hype. WARNING: Bill Gates and Stephen Hawking have voiced concerns about strong AI]

http://www.huffingtonpost.com/james-barrat/hawking-gates-artificial-intelligence_b_7008706.html

Machine Learning performance vs amount of data

[Plot: performance vs. amount of data; deep learning methods keep improving as data grows, while standard machine learning algorithms plateau]

Deep Learning definition

• Many definitions:
• DL is a set of algorithms in machine learning that attempt to learn at multiple levels, corresponding to different levels of abstraction. It typically uses artificial neural networks.

• DL is a class of machine learning techniques that exploit many layers of non-linear information processing for supervised or unsupervised feature extraction and transformation, and for pattern analysis and classification.

• …

DL Biological evidence

• For example, the layered organization of the visual system

[Diagram: external stimuli → receptors → hidden layers (memory, ideation, psyche, etc.) → motoneurons → muscle cells]

Many levels of transformation

DL Psychological-cognitive evidence

• Knowledge is represented at different levels of abstraction

Abstraction

Wisdom

Insight

Understanding

Knowledge

Information

Data

Concreteness

The Ladder of Abstraction and the Data-Wisdom Pyramid

Example of Deep Learning solutions

• Apple – Siri speech recognition, iPhone personal assistant, …
• Facebook – massive data analysis, …
• Google – Translator, Android’s voice recognition, Word2Vec text processing (Google acquires AI startup DeepMind for > $500M), …
• IBM – brain-like computers, deep learning for Big Data (IBM acquires AlchemyAPI, enhancing Watson’s deep learning capabilities), …
• Microsoft – speech, massive data analysis, …
• Twitter – acquires deep learning startup Madbits
• Yahoo – acquires startup LookFlow to work on Flickr and deep learning

• As data keeps getting bigger, DL is coming to play a key role in:
  – Data modeling
  – Analytics solutions
  – Leverage for competitive advantage

Three main DNN families (L. Deng, D. Yu 2014)

• Deep networks for unsupervised or generative learning
  – Capture high-order correlations of the observed data when no information about target class labels is available.

• Deep networks for supervised learning
  – Directly provide discriminative power for pattern classification purposes.

• Hybrid deep networks
  – A mix of the previous models. The goal is discrimination, which is assisted, often in a significant way, by the outcomes of generative or unsupervised deep networks.

• Research activity in the field is very high.

Unsupervised generative model

• Ex. Deep Belief Networks (DBN)
• Stack of Restricted Boltzmann Machines (RBM)

Independent unsupervised training of each layer. DBNs can effectively utilize large amounts of unlabeled data for exploiting complex data structures.

[Diagram: input layer V → W(1) → hidden layer H1 (RBM) → W(2) → H2 (RBM) → W(3) → H3 (RBM) → W(4) → output layer O]

Deep networks for supervised learning

• Ex. Convolutional Neural Network (CNN), Yann LeCun (NYU)

Specific architecture for image classification

Fig. from: http://parse.ele.tue.nl/cluster/2/CNNArchitecture.jpg

Biologically inspired: small neuron collections look at small portions of the input image, like receptive fields.

Convolutional Neural Network (CNN)

Softmax to predict object class

Fully-connected layers

Convolutional layers, Layer 1 … Layer 7 (the same weights are used at all spatial locations in a layer)

Biologically inspired: small neuron collections look at small portions of the input image, like receptive fields; a minimal sketch of the shared-weight convolution follows.

Input
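A hedged numpy sketch of the weight-sharing idea: one small kernel is slid over all spatial locations of the image. The image and kernel values are arbitrary; real CNNs stack many such filters with pooling and nonlinearities:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: the same weights (kernel) are applied
    at every spatial location of the input."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # local receptive field
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.random.rand(8, 8)                    # toy 8x8 input
kernel = np.array([[1.0, -1.0]])                # toy 1x2 edge-like filter
print(conv2d(image, kernel).shape)              # (8, 7) feature map
```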

Won the 2012 ImageNet challenge with a 16.4% top-5 error rate.

Hybrid DNN architecture

[Diagram: W(1) → W(2) → … → W(N−1) → W(N) → softmax classifier]

Unsupervised learning (hidden layers) → Supervised learning (softmax classifier)

Supervised final fine-tuning

DNN by stacked autoencoder

Output classes, softmax classifier

[Diagram: layer-wise pre-training phases P1 … P4 of network N, with weights W(1) … W(4)]

Separate unsupervised pre-training of the hidden layers; a minimal sketch of the procedure follows.
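A highly simplified sketch of the greedy layer-wise idea (my assumption-laden toy: linear tied-weight autoencoders trained by plain gradient descent; real systems use nonlinear layers and better optimizers):

```python
import numpy as np

def train_autoencoder(X, n_hidden, lr=0.05, epochs=200, seed=0):
    """Train a tied-weight linear autoencoder on X; return encoder weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_hidden, X.shape[1]))
    for _ in range(epochs):
        H = X @ W.T                         # encode
        E = H @ W - X                       # reconstruction error (decode - input)
        grad = (H.T @ E + (E @ W.T).T @ X) / len(X)
        W -= lr * grad
    return W

X = np.random.rand(100, 20)                 # toy unlabeled data
codes, weights = X, []
for n_hidden in [16, 8, 4]:                 # one autoencoder per hidden layer
    W = train_autoencoder(codes, n_hidden)
    weights.append(W)
    codes = codes @ W.T                     # feed the codes to the next layer
# `weights` would initialize the DNN before supervised fine-tuning.
```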

Large Scale Deep Neural Network

Parallel and distributed computing

SM-MIMD

DM-MIMD Vector Supercomputer

High-Speed Network

Workstation Storage

Special-purpose architectures, SIMD

Large Scale Distributed Deep Networks

• Problem: training a deep network with billions of parameters using tens of thousands of CPU cores.

• Exploit many kinds of parallelism

• Data parallelism

• Model parallelism

• Data and model parallelism (a toy sketch of data parallelism follows this list)
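A toy, synchronous sketch of data parallelism (mine, not from the slides): each worker computes a gradient on its own data shard, and the averaged gradient updates a shared toy linear model. Real systems run the workers on separate machines, and Downpour SGD (below) makes the exchange asynchronous:

```python
import numpy as np

def worker_gradient(w, X_shard, d_shard):
    """Each worker computes a gradient on its own shard (data parallelism)."""
    e = X_shard @ w - d_shard
    return X_shard.T @ e / len(X_shard)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
d = X @ np.arange(5.0)                          # toy linear targets
shards = np.array_split(np.arange(1000), 4)     # 4 workers
w = np.zeros(5)
for _ in range(100):
    grads = [worker_gradient(w, X[s], d[s]) for s in shards]
    w -= 0.1 * np.mean(grads, axis=0)           # average the workers' gradients
print(w)                                        # approaches [0, 1, 2, 3, 4]
```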

Large scale DNN

• Model parallelism

Minimal network traffic:

[Diagram: the model is partitioned across Machines 1-4; the most densely connected areas are placed on the same partition; data enters from below]

• Network partitions

Large scale SGD

• Asynchronous Stochastic Gradient Descent (SGD) (Widrow’s generalized delta rule)

wwnn 1  w nParameter Server ‘Downpour’ SGD(1). Model replicas asynchronously fetch parameters w and push gradients w to the parameter

server. wn wn

Model Replicas

Data Shards
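A toy single-process sketch of the Downpour interaction pattern described in the Dean et al. paper cited below; threads stand in for replica machines and the model is a trivial linear one (my assumptions):

```python
import threading
import numpy as np

class ParameterServer:
    """Holds the global parameters w; replicas fetch/push asynchronously."""
    def __init__(self, dim, lr=0.05):
        self.w, self.lr = np.zeros(dim), lr
        self.lock = threading.Lock()
    def fetch(self):
        with self.lock:
            return self.w.copy()
    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad       # apply the pushed gradient

def replica(ps, shard, w_true):
    for x in shard:                        # each replica owns a data shard
        w = ps.fetch()                     # asynchronously fetch parameters
        e = np.dot(w - w_true, x)          # error on one sample
        ps.push(e * x)                     # push gradient of squared error

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
ps = ParameterServer(dim=3)
threads = [threading.Thread(target=replica,
                            args=(ps, rng.normal(size=(500, 3)), w_true))
           for _ in range(4)]              # 4 model replicas
for t in threads: t.start()
for t in threads: t.join()
print(ps.w)                                # near w_true despite async updates
```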

Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng, ‘Large Scale Distributed Deep Networks’, NIPS 2012.

Large scale L-BFGS

• Limited-memory quasi-Newton algorithm of Broyden, Fletcher, Goldfarb, and Shanno (L-BFGS).

wwnn1  w n Parameter Server

Coordinator w n w (small messages) n

Model Replicas L-BFGS-A: single ‘coordinator’ sends small messages to reppplicas and the parameter server to orchestrate batch optimization. Data Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng, ‘Large Scale Distributed Deep Networks’, NIPS 2012. DNN on Big-Data applications DNN: state-of-the-art performance reported in severaldl doma ins

• Text, Language Model and Natural Language Processing

• Information Retrieval

• Visual Object Recognition and Computer Vision

• Speech Recognition and Audio Processing

• Multimodal and Multi-task Learning: Text-Image, Speech-Image, …

Text and Language processing

• Feedforward Neural Net Language Model

[Diagram: neural net language model architecture – input words → projection layer (matrix U) → hidden layer (weights V, W) → output w(t)]

The training is done using backpropagation. The word vectors are in matrix U.

Text and Language processing

• Skip-gram architecture

Predicts the surrounding words w(t−2), w(t−1), w(t+1), w(t+2) given the current word w(t); a toy sketch follows.

[Diagram: input w(t) → hidden layer → output context words]
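A hedged numpy sketch of skip-gram with a full softmax (fine only for a toy vocabulary; real word2vec uses hierarchical softmax or negative sampling, and all sizes below are arbitrary):

```python
import numpy as np

def skipgram_step(U, V, center, context, lr=0.1):
    """One skip-gram update: predict one context word from the center word."""
    h = U[center]                              # hidden layer = embedding of w(t)
    p = np.exp(V @ h - np.max(V @ h))
    p /= p.sum()                               # softmax over the vocabulary
    p[context] -= 1.0                          # gradient of cross-entropy loss
    U[center] -= lr * (V.T @ p)                # update input embedding (in place)
    V -= lr * np.outer(p, h)                   # update output embeddings

rng = np.random.default_rng(0)
corpus = [0, 1, 2, 0, 1, 2, 0, 1, 2]           # toy stream of word ids
vocab, dim = 3, 4
U = rng.normal(scale=0.1, size=(vocab, dim))   # input word vectors
V = rng.normal(scale=0.1, size=(vocab, dim))   # output word vectors
for _ in range(200):
    for t in range(1, len(corpus) - 1):        # context window of +/- 1 word
        for c in (corpus[t - 1], corpus[t + 1]):
            skipgram_step(U, V, corpus[t], c)
# Rows of U are the learned word vectors.
```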

wt(2) Text and Language processing

• Ex. Skipgram Text Model

Hierarchical softmax Classifier

Single embedding function

Raw sparse features

Mikolov, Chen, Corrado, and Dean, ‘Efficient Estimation of Word Representations in Vector Space’, http://arxiv.org/abs/1301.3781

Text and Language processing

• Continuous Bag-of-words (CBOW) Architecture

Predicts the current word w(t) given the context words w(t−2), w(t−1), w(t+1), w(t+2).

[Diagram: input context words → hidden layer → output w(t)]

Example: Google

• Neural network trained to predict a word given the words nearby.

• It allows you to create numerical representations of each word.

• These representations can be mathematically manipulated like classic vectors.

• Training is carried out on databases of hundreds of billions of words.

http://deeplearning4j.org/word2vec.html

Example: Google

• W2V is a neural net that processes text before that text is handled by deep-learning algorithms.
• W2V creates features without human intervention, including the context of individual words.
• W2V can make highly accurate guesses about a word’s meaning based on its past appearances.

• Word: ‘france’

  Word          Cosine distance
  spain         0.678515
  belgium       0.665923
  netherlands   0.652428
  italy         0.633130
  switzerland   0.622323
  luxembourg    0.610033
  portugal      0.577154
  russia        0.571507
  germany       0.563291
  catalonia     0.534176
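Lists like the one above can be reproduced with, for example, the open-source gensim library; a hedged sketch, where the vectors file path is hypothetical and must point to real pre-trained word2vec vectors:

```python
from gensim.models import KeyedVectors

# "vectors.bin" is a hypothetical path to pre-trained word2vec vectors
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

for word, cosine in kv.most_similar("france", topn=10):
    print(f"{word:15s} {cosine:.6f}")
```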

Example: Google

• Here’s a graph of words associated with “China” using Word2vec

Example: Google

• ‘Semantic computation’

• The word vectors capture many linguistic regularities; for example, the vector operation vector('Paris') − vector('France') + vector('Italy') results in a vector that is very close to vector('Rome'),

and

vector('king') − vector('man') + vector('woman') is close to vector('queen')
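Continuing the hedged gensim sketch above, the same analogies can be queried directly (again, "vectors.bin" is a hypothetical path to real pre-trained vectors):

```python
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# vector('Paris') - vector('France') + vector('Italy') ~ vector('Rome')
print(kv.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))
# vector('king') - vector('man') + vector('woman') ~ vector('queen')
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```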

• W2V is a key element for the development of applications of great ‘V’alue operating on Big Data.

Visual object recognition: GoogleNet(1)

Deep network: 1 billion connections; a 9-layered, locally connected sparse autoencoder trained on a dataset of 10 million 200x200-pixel images downloaded from the Internet.

Training: parallel asynchronous stochastic gradient descent on a cluster of 1,000 machines (16,000 cores) for three days.

Image from (1)

(1) Q. Le, M.A. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng, ‘Building high-level features using large scale unsupervised learning’, International Conference on Machine Learning, 2012.

Visual object recognition (2014 Google)

Winner of the 2014 ImageNet challenge with a 6.66% top-5 error rate

• 6 modules of convolutional layers.

• 24 layers deep!

• Good Fine-grained Classification

• Good Generalization

• Sensible errors

“Snake” “Dog”

Image from: http://www.engadget.com/2014/09/08/google-details-object-recognition-tech/

Visual object recognition: complex scene summarization in a few words(1)

“Two pizzas sitting on top of a stove top oven”

(1) Google Research Blog, http://googleresearch.blogspot.it/2014/11/a-picture-is-worth-thousand-coherent.html

App – NVIDIA’s DRIVE PX(1)

Self-Driving Cars Using Deep Learning

(1) http://dataconomy.com/nvidias-drive-px-platform-to-pave-way-for-self-driving-cars-using-deep-learning/

Industrial sectors of interest

• Topics include:

Banking / Retail / Finance
  – Identify: prospective customers, dissatisfied customers, good customers, bad payers
  – Obtain: more effective advertising, less credit risk, less fraud, decreased churn rate
  – Finance: econometrics, time-series analysis and prediction

Biomedical / Biometrics
  – Medicine: screening, diagnosis and prognosis, drug discovery (semantic medicine)
  – Security: face recognition, signature/fingerprint/iris verification, speaker recognition, DNA fingerprinting, …

Computer / Internet / Multimedia
  – Computer interfaces: troubleshooting wizards, handwriting and speech, brain waves
  – Internet: hit ranking, text categorization, text translation, sentiment analysis, …
  – Cyber security: network anomalies, cyber-attack prediction, spam detection, malicious-code recognition, …
  – Audio/video processing: audio-video content information retrieval, scene analysis, video games, virtual movies, …

Electrical / Computer Engineering
  – Wireless communication, cognitive radio, remote sensing, array processing, multi-sensor data fusion, robotics, Smart Grid, intelligent house, …

Data processing
  – Classification, time-series filtering, prediction, regression, clustering, spam filtering, security, …

• Etc.

Research activity @ Sapienza University of Rome

Computational intelligence

• Fast DNN model and architectures

• Random feature extraction

• Semi-supervised model

• Evolutionary methods for learning

• Distributed learning with Big Data

Research activity @ Sapienza University of Rome

Ex. Large Scale Distributed Learning on Big Data

• Development of learning algorithms that avoid communication with a single central node and that can scale to large networks.

• The data are distributed on a network of interconnected agents.

• Applications include: learning on sensor networks, peer-to-peer networks, swarms of robots, …

• Lynx toolbox: an open-source MATLAB toolbox designed for fast prototyping of supervised machine learning simulations.

Highlights of Deep Learning on Big Data

• DL can be used to merge symbolic and non-symbolic heterogeneous information

• Development of parallel DL algorithms distributed on clusters of servers and/or parallel hardware (e.g., CUDA GPUs, …)

• Supervised and unsupervised mixed learning

• Possibility of continuous adaptation (learning while working)

• Possibility of customized solutions for specific problems

• Real-time data stream processing

• Order of weeks to train on large-scale datasets even on the fastest available GPUs

• Heuristic approach for the determination of the network topology

• Many tricks to make them learn optimally

• Developing applications with DL requires expertise and experience.

Epilogue

• Had Aristotle and my elementary-school teacher already understood everything?

Conclusions

• The problem of artificial consciousness seems to be the final act in the history of engineering. Giving the term ‘engineer’ the extended meaning of ‘one who makes’, the construction of an artifact capable of saying “I exist” could represent the final dream of the human builder, who wants to build even without knowing. Ing. Vincenzo Tagliasco (1941-2008)

• My great ill has always been, and always will be, one: that of desiring and dreaming, instead of willing and doing. Ing. Carlo Emilio Gadda (1893-1973)

• Questions?