Semantic Folding & Word Embeddings Seminar: Suchmaschinen, Sose 2021

Total Page:16

File Type:pdf, Size:1020Kb

Semantic Folding & Word Embeddings Seminar: Suchmaschinen, Sose 2021 Semantic Folding & Word Embeddings Seminar: Suchmaschinen, SoSe 2021 ReferentInnen: Constantin Rebouskos, Dzmitry Kirylenka, Laura Lehmann Zoom, den 16.06.2021 Inhalt 1. Word Embeddings Grundkonzept 2. Beispiel Word2Vec 3. Anwendungsbereiche Word Embeddings 4. Semantic Folding 5. Semantic Fingerprints 6. Rechnen mit Fingerabdrücken 7. Anwendungsbereiche Semantic Folding 8. Word Embeddings und Semantic Folding im Vergleich 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 2 Word Embeddings ● Begriff der im Bereich des NLP benutzt wird um die Bedeutung eines Wortes innerhalb eines Vektors zu kodieren ● Das Grundprinzip hierbei ist, dass Wortvektoren, welche näher beieinander liegen zueinander ähnliche Bedeutungen haben ● Dies Bedeutungsähnlichkeiten werden durch Distributionseigenschaften der einzelnen Wörter in den Trainingsdaten gewonnen ● Eine Annotation der Trainingsdaten ist hierbei nicht nötig 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 3 Wortvektorrepräsentation Eine Möglichkeit Wörter als Vektoren darzustellen ist die One-Hot-Vector- Darstellung: • Der [1, 0, 0, 0, … ] • flinke [0, 1, 0, 0, … ] • Fuchs [0, 0, 1, 0, … ] • springt [0, 0, 0, 1, … ] • ... 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 4 Wortvektorrepräsentation Problematisch an der One-Hot-Vektor-Darstellung ist, dass Ähnlichkeiten zwischen Wörtern nicht dargestellt werden können • Flinke [0, 1, 0, 0, …] • Schnelle [0, 0, 0, 1, …] Besser (aber immer noch nicht gut) also: • Flinke [0, 1, 0, 0, …] • Schnelle [0, 1, 0, 0, …] 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 5 Wortvektorrepräsentation Noch besser ist es nicht nur 1 & 0 als Werte zu nehmen sondern alle reellen zahlen • Fuchs [0.234, -4.993, 0.45, … ] Fuchs flinke schnelle sprang 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 6 Wortvektorrepräsentation Die Ähnlichkeit zweier Worte kann dann über die Kosinus-Ähnlichkeit der beiden Vektoren berechnet werden • cos(x,y) =(x·y) / (‖x‖‖y‖) Fuchs flinke schnelle 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 7 Lernen von Word Embeddings • Grundkonzept: Distributionsähnlichkeit, d.h. die Bedeutung eines Wortes kann aus seinem Kontext geschlossen werden • “Der Fuchs trinkt Wasser” vs. “Der Fuchs trinkt Bier” • Bekanntester Ansatz: Word2Vec von Tomas Mikolov 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 8 Word2Vec ● Entwickelt von Tomas Mikolov Et al. 2013 ● Erster WE-Ansatz, welcher richtig gut performt hat ● Besteht aus zwei Ansätzen: – Skipgram – CBOW (Continious Bag of Words) 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 9 von Continous Bag of Words • Idee: der Kontext ist gegeben und wir wollen das Wort vorhersagen • Zu beginn sind die Wortvektoren der Kontextwörter noch zufällig initialisiert • Die Wortvektoren der Kontextwörter werden aufsummiert um das mittlere Wort zu repräsentieren • Dies wird in mehreren Durchläufen für den ganzen Text gemacht 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 10 Skipgram • Idee: das Wort ist gegeben und wir wollen den Kontext vorhersagen • Ansonsten gleich in der Funktionsweise wie CBOW • Nachdem das Netzwerk trainiert wurde kann man dem Netzwerk ein Input-Wort geben und bekommt seinen Kontext 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 11 Neuere Ansätze • GloVe • FastText • BERT von Jacob Devlin • ELMo von Matthew E. Peters 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 12 Anwendungsbereiche von Word Embeddings • Durch Word Embeddings kann die Bedeutung eines Wortes in eine arithmetisch verarbeitbare Form gebracht werden • z.B. Hauptstadt + Spanien = Madrid oder (Madrid – Spanien) + Deutschland = Berlin • Word Embeddings sind hilfreich bei: • Analyse von Kommentaren, Umfragen, Websiteinhalten • Vorschlagen ähnlicher Inhalte auf Basis von Suchanfragen • Maschineller Übersetzung • Sentiment Analyse & Question Answering 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 13 Semantic Folding • Ansatz zur (computergestützten) Repräsentation von Sprache in Anlehnung an menschliches Gehirn • Explizite Kodierung von Sprachsemantik einschließlich aller Bedeutungen und Kontexte von Wörtern • Basis: Theorie des Hierarchischen Temporalspeichers (engl. Hierarchical Temporal Memory, HTM) − Künstliche Nachbildung menschlichen Neokortex − Beschreibt Neokortex als (topographischen) 2D-Raum − Sparse Distributed Representation als Datenstruktur • Unüberwachter ML-Ansatz 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 14 Sparse Distributed Representation (SDR) • Kodierungsformat für Daten • Modelliert menschlichen Neokortex (2D-Raum) • Bildet sehr großen, spärlich gefüllten Bitvektor • Bits entsprechen Neuronen im Gehirn • Jedes Bit stellt eindeutige semantische Bedeutung dar • Bedeutung ist über die Bits verteilt • Gleiche Bits in SDRs deuten auf semantische Ähnlichkeit hin • Unterstützt Mengenoperationen 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 15 Sparse Distributed Representation (SDR) 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 16 Sparse Distributed Representation (SDR) • Bitvektor für “Cat”: 00100000000000000001000000 00010100000 … 000010001000 • Bitvektor für “Dog”: 00000000000000000001000100 00000110000 … 000000001010 • Bitvektor für “Fish”: 10000010000000000001001000 00010000000 … 000000100000 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 17 Semantic Folding Prozess Semantic Folding Pipeline Datenbank Trainingskorpus semantischer (Semantic Aufteilung des Fingerabdrücke Universe) Erzeugung (topografischen) Erzeugung semantischer Vorverarbeitung des Trainingskorpus (Retina DB) semantischen Raums Fingerabdrücke Trainingskorpus in Passagen (Semantic Map) (Semantic Fingerprints) (Snippets) 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 18 Semantic Map • Jedes Snippet stellt einzelnen Kontext dar • Snippets werden auf 2D-Matrix (128x128 Bit) verteilt • Erzeugte 2D-Matrix bildet den topografischen semantischen Raum (Semantic Map) des Trainingskorpus 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 19 Semantic Fingerprints (Wort) • Große, sehr spärlich gefüllte Bitvektoren für jedes Wort im Trainingskorpus • Verkapseln Wortbedeutung in topografischer Darstellung (SDR, 2D, 128x128 Bits) • Erlauben direkten semantischen Vergleich der Bedeutungen zwischen Wörtern • Ähnlichkeit wird durch Grad der Überlappung zwischen SDRs gemessen 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 20 Semantische Ähnlichkeit (Wort) 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 21 Semantic Fingerprints (Text) • Wort-Fingerabdrücke können gestapelt und zu Text-Fingerabdrücken aggregiert werden • Gestapelte Wort-Fingerabdrücke bilden den sog. Bit-Stapel • Zur Sparsamkeit des Text-Fingerabdrucks wird der Bit-Stapel an einem Schwellenwert abgeschnitten • Alle semantischen Fingerabdruck-Eigenschaften bleiben erhalten → ähnliche Fingerabdrücke bedeuten ähnliche Texte und vice versa 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 22 Semantic Fingerprints (Text) 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 23 Semantische Ähnlichkeit (Text) 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 24 Rechnen mit Fingerabdrücken • Alle semantischen Fingerabdrücke haben die gleiche Größe und den gleichen Feature-Raum • Direktes Rechnen mit booleschen Ausdrücken möglich • AND-, OR- und SUB-Operationen 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 25 Anwendungsbereich Semantic Folding ● Dokumentenklassifikation ● Text Stream Filter ● Dokumentensuche 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 26 Dokumentenklassifikation ● Normalerweise: Training eines Klassifikators auf Trainingsdaten (oft supervised) ● Hier: Jede Klasse wird durch Fingerprint repräsentiert ● Bsp.: 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 27 Text Stream Filter Konvertieren der Tweets in Fingerprints Vergleich Tweets mit Filter Fingerprints der Tweets Matching Not matching Fingerprints Filter Fingerprints 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 28 Dokumentensuche Wie wird Honig Text documents hergestellt? Suchanfrage (Wort, Frage, …) Semantic Folding Pipeline mit: - Vorverarbeitung - Aufteilung in Snippets Suchergebnis (Snippet, Dokument) Erstellen des Fingerprints Erstellen des Fingerprints pro Snippet der Suchanfrage Fingerprints der Snippets Vergleich mit Fingerprints aus Databank Snippets mit Snippet 1 Snippet 2 Snippet … dem besten Rank 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 29 Word Embeddings und Semantic Folding im Vergleich ● Word Embedding vs. Semantic Folding ● Vor- und Nachteile 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 30 Word Embedding vs. Semantic Folding Word Embeddings (wie Word2Vec) Semantic Folding Repräsentation: - Dense - Sparse - mehrere Hundert Features - Zehntausende Feautres (~ 16 000) - float - binär Vergleich: durch Distanzberechnung durch binäre Überlappung Disambiguierung: desambiguiert das Wort „Apple“ in desambiguiert das Wort „Apple“ mittels Form von einer Computer Firma oder seines Kontexts: bspw. Computer, einer Frucht Smartphone, Tablet, kochen, Frucht, Baum, usw. Training: - Training mit Gold-Standard - Training auf Text Snippets - Supervised - Unsupervised 16.06.2021 Semantic Folding & Word Embeddings – Referat Folie 31 Vor- und Nachteile Word Embeddings (wie Word2Vec) Semantic Folding Viele Open Source Packages
Recommended publications
  • Approximate Pattern Matching Using Hierarchical Graph Construction and Sparse Distributed Representation
    Portland State University PDXScholar Dissertations and Theses Dissertations and Theses 9-29-2020 Approximate Pattern Matching Using Hierarchical Graph Construction and Sparse Distributed Representation Aakanksha Mathuria Portland State University Follow this and additional works at: https://pdxscholar.library.pdx.edu/open_access_etds Part of the Electrical and Computer Engineering Commons Let us know how access to this document benefits ou.y Recommended Citation Mathuria, Aakanksha, "Approximate Pattern Matching Using Hierarchical Graph Construction and Sparse Distributed Representation" (2020). Dissertations and Theses. Paper 5581. https://doi.org/10.15760/etd.7453 This Thesis is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. Approximate Pattern Matching using Hierarchical Graph Construction and Sparse Distributed Representation by Aakanksha Mathuria A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering Thesis Committee: Dan Hammerstrom, Chair Christof Teuscher Nirupama Bulusu Portland State University 2020 Abstract With recent developments in deep networks, there have been significant advances in visual object detection and recognition. However, some of these networks are still easily fooled/hacked and have shown ”bag of features” kinds of failures. Some of this is due to the fact that even deep networks make only marginal use of the complex structure that exists in real-world images. Primate visual systems appear to capture the structure in images, but how? In the research presented here, we are studying approaches for robust pattern matching using static, 2D Blocks World images based on graphical representations of the various components of an image.
    [Show full text]
  • Anomalous Behavior Detection Framework Using HTM-Based Semantic Folding Technique
    Hindawi Computational and Mathematical Methods in Medicine Volume 2021, Article ID 5585238, 14 pages https://doi.org/10.1155/2021/5585238 Research Article Anomalous Behavior Detection Framework Using HTM-Based Semantic Folding Technique Hamid Masood Khan ,1 Fazal Masud Khan,1 Aurangzeb Khan,2 Muhammad Zubair Asghar ,1 and Daniyal M. Alghazzawi 3 1Institute of Computing and Information Technology, Gomal University, D.I.Khan, Pakistan 2Department of Computer Science, University of Science and Technology, Bannu, Pakistan 3Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia Correspondence should be addressed to Muhammad Zubair Asghar; [email protected] Received 23 January 2021; Revised 17 February 2021; Accepted 26 February 2021; Published 16 March 2021 Academic Editor: Waqas Haider Bangyal Copyright © 2021 Hamid Masood Khan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Upon the working principles of the human neocortex, the Hierarchical Temporal Memory model has been developed which is a proposed theoretical framework for sequence learning. Both categorical and numerical types of data are handled by HTM. Semantic Folding Theory (SFT) is based on HTM to represent a data stream for processing in the form of sparse distributed representation (SDR). For natural language perception and production, SFT delivers a solid structural background for semantic evidence description to the fundamentals of the semantic foundation during the phase of language learning. Anomalies are the patterns from data streams that do not follow the expected behavior.
    [Show full text]
  • Approximate Pattern Matching Using Hierarchical Graph Construction and Sparse Distributed Representation
    Portland State University PDXScholar Electrical and Computer Engineering Faculty Publications and Presentations Electrical and Computer Engineering 7-23-2019 Approximate Pattern Matching using Hierarchical Graph Construction and Sparse Distributed Representation Aakanksha Mathuria Portland State University, [email protected] Dan Hammerstrom Portland State University Follow this and additional works at: https://pdxscholar.library.pdx.edu/ece_fac Part of the Engineering Commons Let us know how access to this document benefits ou.y Citation Details "Approximate Pattern Matching using Hierarchical Graph Construction and Sparse Distributed Representation," A. Mathuria, D.W. Hammerstrom, International Conference on Neuromorphic Systems, Knoxville, TN, July 23-5, 2019 This Conference Proceeding is brought to you for free and open access. It has been accepted for inclusion in Electrical and Computer Engineering Faculty Publications and Presentations by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. Approximate Pattern Matching using Hierarchical Graph Construction and Sparse Distributed Representation Aakanksha Mathuria Electrical and Computer Engineering Portland State University USA [email protected] Dan W. Hammerstrom Electrical and Computer Engineering Portland State University USA [email protected] ABSTRACT training on large numbers of images. None of these techniques actually captures the spatial relationships of the low level or high- With recent developments in deep networks, there have been level features, which biological networks appear to do. significant advances in visual object detection and Efficient graph representations capture the higher order recognition. However, some of these networks are still easily information content of the objects and provide algorithmic benefits fooled/hacked and have shown “bag of features” failures.
    [Show full text]
  • Biologically-Inspired Machine Intelligence Technique for Activity Classification in Smart Home Environments
    Biologically-Inspired Machine Intelligence Technique for Activity Classification in Smart Home Environments Talal Sarheed Alshammari A thesis submitted in partial fulfillment of the requirement of Staffordshire University for the degree of Doctor of Philosophy May 2019 List of Publications 2 REFEREED JOURNAL PUBLICATIONS OpenSHS: Open Smart Home Simulator May, 2017 Sensors Journal MDPI 2 REFEREED JOURNAL PUBLICATIONS SIMADL: Simulated Activities of Daily Living Dataset. April, 2018 Data Journal MDPI 2 REFEREED CONFERENCE PUBLICATIONS Evaluating Machine Learning Techniques for Activity Classifica- February ,2018 tion in Smart Home Environments World Academy of Science, Engineering and Technology International Journal of Infor- mation and Communication Engineering i Acknowledgements I would like to thank my supervisors Dr. Mohamed Sedky and Chris Howard for encour- aging and supporting me during my Ph.D study. They were very helpful and provided me their assistance throughout my Ph.D journey. It was very good experience to improve my research skills. Due to this experience, I learned new knowledge, found new ideas from your supervision. I appreciate your consideration, efforts and kind cooperation, you deserve all the thanks. I would like to thank to my parents and my wife for their unconditional support that has helped me for achieving this thesis. I would like to thank the Ministry of Education in Saudi Arabia and Hail university for funding and supporting this research. ii Contents List of Publicationsi Acknowledgements ii Table of Contents
    [Show full text]
  • Distributional Semantics
    Distributional semantics Distributional semantics is a research area that devel- by populating the vectors with information on which text ops and studies theories and methods for quantifying regions the linguistic items occur in; paradigmatic sim- and categorizing semantic similarities between linguis- ilarities can be extracted by populating the vectors with tic items based on their distributional properties in large information on which other linguistic items the items co- samples of language data. The basic idea of distributional occur with. Note that the latter type of vectors can also semantics can be summed up in the so-called Distribu- be used to extract syntagmatic similarities by looking at tional hypothesis: linguistic items with similar distributions the individual vector components. have similar meanings. The basic idea of a correlation between distributional and semantic similarity can be operationalized in many dif- ferent ways. There is a rich variety of computational 1 Distributional hypothesis models implementing distributional semantics, includ- ing latent semantic analysis (LSA),[8] Hyperspace Ana- The distributional hypothesis in linguistics is derived logue to Language (HAL), syntax- or dependency-based from the semantic theory of language usage, i.e. words models,[9] random indexing, semantic folding[10] and var- that are used and occur in the same contexts tend to ious variants of the topic model. [1] purport similar meanings. The underlying idea that “a Distributional semantic models differ primarily with re- word is characterized by the company it keeps” was pop- spect to the following parameters: ularized by Firth.[2] The Distributional Hypothesis is the basis for statistical semantics. Although the Distribu- • tional Hypothesis originated in linguistics,[3] it is now re- Context type (text regions vs.
    [Show full text]
  • Arguments For: Semantic Folding and Hierarchical Temporal Memory
    [email protected] Mariahilferstrasse 4 1070 Vienna, Austria http://www.cortical.io ARGUMENTS FOR: SEMANTIC FOLDING AND HIERARCHICAL TEMPORAL MEMORY Replies to the most common Semantic Folding criticisms Francisco Webber, September 2016 INTRODUCTION 2 MOTIVATION FOR SEMANTIC FOLDING 3 SOLVING ‘HARD’ PROBLEMS 3 ON STATISTICAL MODELING 4 FREQUENT CRITICISMS OF SEMANTIC FOLDING 4 “SEMANTIC FOLDING IS JUST WORD EMBEDDING” 4 “WHY NOT USE THE OPEN SOURCED WORD2VEC?” 6 “SEMANTIC FOLDING HAS NO COMPARATIVE EVALUATION” 7 “SEMANTIC FOLDING HAS ONLY LOGICAL/PHILOSOPHICAL ARGUMENTS” 7 FREQUENT CRITICISMS OF HTM THEORY 8 “JEFF HAWKINS’ WORK HAS BEEN MOSTLY PHILOSOPHICAL AND LESS TECHNICAL” 8 “THERE ARE NO MATHEMATICALLY SOUND PROOFS OR VALIDATIONS OF THE HTM ASSERTIONS” 8 “THERE IS NO REAL-WORLD SUCCESS” 8 “THERE IS NO COMPARISON WITH MORE POPULAR ALGORITHMS OF DEEP LEARNING” 8 “GOOGLE/DEEP MIND PRESENT THEIR PERFORMANCE METRICS, WHY NOT NUMENTA” 9 FMRI SUPPORT FOR SEMANTIC FOLDING THEORY 9 REALITY CHECK: BUSINESS APPLICABILITY 9 GLOBAL INDUSTRY UNDER PRESSURE 10 “WE HAVE TRIED EVERYTHING …” 10 REFERENCES 10 Arguments for Semantic Folding and Hierarchical Temporal Memory Theory Introduction During the last few years, the big promise of computer science, to solve any computable problem given enough data and a gold standard, has triggered a race for machine learning (ML) among tech communities. Sophisticated computational techniques have enabled the tackling of problems that have been considered unsolvable for decades. Face recognition, speech recognition, self- driving cars; it seems like a computational model could be created for any human task. Anthropology has taught us that a sufficiently complex technology is indistinguishable from magic by the non-expert, indigenous mind.
    [Show full text]
  • Using Semantic Folding with Textrank for Automatic Summarization
    DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2017 Using semantic folding with TextRank for automatic summarization SIMON KARLSSON KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION Using semantic folding with TextRank for automatic summarization SIMON KARLSSON [email protected] Master in Computer Science Date: June 27, 2017 Principal: Findwise AB Supervisor at Findwise: Henrik Laurentz Supervisor at KTH: Stefan Nilsson Examiner at KTH: Olov Engwall Swedish title: TextRank med semantisk vikning för automatisk sammanfattning School of Computer Science and Communication i Abstract This master thesis deals with automatic summarization of text and how semantic fold- ing can be used as a similarity measure between sentences in the TextRank algorithm. The method was implemented and compared with two common similarity measures. These two similarity measures were cosine similarity of tf-idf vectors and the number of overlapping terms in two sentences. The three methods were implemented and the linguistic features used in the construc- tion were stop words, part-of-speech filtering and stemming. Five different part-of- speech filters were used, with different mixtures of nouns, verbs, and adjectives. The three methods were evaluated by summarizing documents from the Document Understanding Conference and comparing them to gold-standard summarization cre- ated by human judges. Comparison between the system summaries and gold-standard summaries was made with the ROUGE-1 measure. The algorithm with semantic fold- ing performed worst of the three methods, but only 0.0096 worse in F-score than cosine similarity of tf-idf vectors that performed best. For semantic folding, the average preci- sion was 46.2% and recall 45.7% for the best-performing part-of-speech filter.
    [Show full text]
  • Biological and Machine Intelligence (Bami)
    BIOLOGICAL AND MACHINE INTELLIGENCE (BAMI) A living book that documents Hierarchical Temporal Memory (HTM) March 8, 2017 ©Numenta, Inc. 2017 For more details on the use of Numenta’s software and intellectual property, including the ideas contained in this book, see http://numenta.com/business-strategy-and-ip/. Biological and Machine Intelligence (BAMI) is a living book authored by Numenta researchers and engineers. Its purpose is to document Hierarchical Temporal Memory, a theoretical framework for both biological and machine intelligence. While there’s a lot more work to be done on HTM theory, we have made good progress on several components of a comprehensive theory of the neocortex and how to apply that knowledge to practical applications. We would like to share our work as we go. We hope this book will become the standard reference for people who want to learn about HTM cortical theory and its applications for machine intelligence. Going forward we anticipate using two main forms of documentation of our research, one is published research papers and the other is additions to this book, Biological and Machine Intelligence. Just as we make all of our research and technology available in open source, we want to be transparent with this manuscript as well, even well-ahead of its completion. When it is finished, this book will cover all the fundamental concepts of HTM theory. For now, we are sharing the chapters that have been completed, each of which contains a revision history, so you can see what has changed in each chapter. Over time, we will add chapters to BAMI.
    [Show full text]
  • Encoding Data for HTM Systems
    ! Encoding Data for HTM Systems Scott Purdy Numenta, Inc, Redwood City, California, USA Email: [email protected] We welcome comments. Please contact the author with any questions or comments. Encoding Data for HTM Systems Scott Purdy Numenta, Inc, Redwood City, California, United States of America ! Abstract Hierarchical Temporal Memory (HTM) is a 2. The Encoding Process biologically inspired machine intelligence The encoding process is analogous to the functions of technology that mimics the architecture and sensory organs of humans and other animals. The processes of the neocortex. In this white paper cochlea, for instance, is a specialized structure that we describe how to encode data as Sparse converts the frequencies and amplitudes of sounds in Distributed Representations (SDRs) for use in the environment into a sparse set of active neurons HTM systems. We explain several existing (Webster et al, 1992, Schuknecht, 1974). The basic encoders, which are available through the open 1 mechanism for this process (Fig. 1) comprises a set of source project called NuPIC , and we discuss inner hair cells organized in a row that are sensitive to requirements for creating encoders for new different frequencies. When an appropriate frequency types of data. of sound occurs, the hair cells stimulate neurons that send the signal into the brain. The set of neurons that 1. What is an Encoder? are triggered in this manner comprise the encoding of Hierarchical Temporal Memory (HTM) provides a the sound as a Sparse Distributed Representation. flexible and biologically accurate framework for solving prediction, classification, and anomaly detection problems for a broad range of data types (Hawkins and Ahmad, 2015).
    [Show full text]
  • Gait Recognition and Understanding Based on Hierarchical Temporal Memory Using 3D Gait Semantic Folding
    sensors Article Gait Recognition and Understanding Based on Hierarchical Temporal Memory Using 3D Gait Semantic Folding Jian Luo 1,* and Tardi Tjahjadi 2 1 Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410000, China 2 School of Engineering, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK; [email protected] * Correspondence: [email protected]; Tel.: +86-(0)731-8887-2135 Received: 22 January 2020; Accepted: 13 March 2020; Published: 16 March 2020 Abstract: Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their use of unstructured data from image and video has affected their performance, e.g., they are easily influenced by multi-views, occlusion, clothes, and object carrying conditions. This paper addresses these problems using a realistic 3-dimensional (3D) human structural data and sequential pattern learning framework with top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameters estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. In order to achieve time-based gait recognition, an HTM Network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions including multi-views by refining the SL-GSDRs, according to prior knowledge.
    [Show full text]
  • Research on NLP for RE at Utrecht University: a Report
    Research on NLP for RE at Utrecht University: A Report Fabiano Dalpiaz Sjaak Brinkkemper RE-Lab, Utrecht University RE-Lab, Utrecht University Utrecht, The Netherlands Utrecht, The Netherlands [email protected] [email protected] Abstract [Team Overview] The Requirements Engineering Lab at Utrecht Uni- versity conducts research on techniques and software tools that help people express better requirements in order to ultimately deliver bet- ter software products. [Past Research] We have focused on natural language processing-powered tools that analyze user stories to iden- tify defects, to extract conceptual models that deliver an overview of the used concepts, and to pinpoint terminological ambiguity. Also, we studied how reviews for competing products can be analyzed via natural language processing (NLP) to identify new requirements. [Re- search Plan] The gained knowledge from our experience with NLP in requirements engineering (RE) triggers new research lines concerning the synergy between humans and NLP, including the use of intelligent chatbots to elicit requirements, the automated synthesis of creative re- quirements, and the maintenance of traceability via linguistic tooling. 1 Team Overview The Requirements Engineering Lab (RE-Lab) is a research group in the Department of Information and Com- puting Sciences at Utrecht University. The RE-Lab members are involved in several research directions with the common objective to help people express better requirements in order to ultimately deliver better software products. The lab's research method involves the use of state-of-the-art, innovative techniques from various disciplines (computer science, logics, artificial intelligence, computational linguistics, social sciences, psychology, etc.) and to apply them to solve real-world problems in the software industry.
    [Show full text]
  • Sparse Distributed Representations As Word Embeddings for Language
    University Institute of Lisbon Department of Information Science and Technology Sparse Distributed Representations as Word Embeddings for Language Understanding André de Vasconcelos Santos Silva A dissertation submitted in partial fulfillment of the requirements for the Degree of Master in Computer Engineering Supervisor Doctor Ricardo Daniel Santos Faro Marques Ribeiro, Assistant Professor ISCTE-IUL October, 2018 Acknowledgements I would like to thank my wife and son for being so comprehensive and supportful throughout this year of intense work. They have been my source of motivation and inspiration that helped me accomplish all my objectives. I’m also grateful to my supervisor Prof. Ricardo Ribeiro who always guided me to the right direction. Without his guidance I wouldn´t have achieved my research goals. Last but not least a very special gratitude for my parents, from whom I owe everything. i Resumo A designação word embeddings refere-se a representações vetoriais das palavras que capturam as similaridades semânticas e sintáticas entre estas. Palavras similares tendem a ser representadas por vetores próximos num espaço N dimensional considerando, por exemplo, a distância Euclidiana entre os pontos associados a estas representações vetoriais num espaço vetorial contínuo. Esta propriedade, torna as word embeddings importantes em várias tarefas de Processamento Natural da Língua, desde avaliações de analogia e similaridade entre palavras, às mais complexas tarefas de categorização, sumarização e tradução automática de texto. Tipicamente, as word embeddings são constituídas por vetores densos, de dimensionalidade reduzida. São obtidas a partir de aprendizagem não supervisionada, recorrendo a consideráveis quantidades de dados, através da otimização de uma função objetivo de uma rede neuronal.
    [Show full text]