A Categorical Viewpoint on Machine Learning

Christophe Culan, Maxime Lubin

Abstract

Modeling the reasoning process of human beings is a long-standing goal of artificial intelligence. Approaches have ranged from the symbolic, logic-rooted methods of the 1960s, which are composable and interpretable but brittle in the face of fuzzy and stochastic patterns, to the recent deep learning advances at the other end of the spectrum. We propose a novel model and inference technique based on category theory that treats training data not merely as a set but as a full-fledged learnable category: which points are related to each other? In what way? In which causal direction? Learning such a rich view of a dataset amounts to soundly dropping the often unquestioned hypothesis of independent, identically distributed training data. This can help to better account for the consequences of data augmentation techniques, as well as to learn non-trivial relations spanning multiple observations.

1 Introduction

We perceive and abstract our reality through the prism of causality: causes precede consequences. However, we do not perceive causality directly. Instead, through repeated experimentation, we eventually notice patterns emerging from the underlying causal links, e.g., drinking alcohol and then feeling tipsy. These patterns can be learned through induction, even by machines, as the recent progress in machine learning showcases in vision [KSH12][HZRS15], natural language processing [VSP+17][HS97], speech processing [GMH13][BCS+16] and reinforcement learning [SSS+17][MKS+15]. We now have tremendous pattern-matching machines, but they still struggle to reify observations into causal links.
The most notable exceptions are Probabilistic Graphical Models [KF09], Markov Logic Networks [RD06], Structural Causal Models [Pea09] and the recent Graph Networks [BHB+18], all of which require an explicit encoding of the stochastic causal dependence between observables.

Nature and human abstractions abound in different patterns, so much so that they first appear to form a large set of distinct, unrelated curiosities. However, in each domain, humanity’s work and ingenuity have largely shown the seemingly infinite zoo of patterns to stem from a small number of unique interacting entities. For example, physical processes are largely described by the handful of elementary particles and evolution laws of the Standard Model, mathematics by axioms and proof constructs, and language by words and syntax. Complexity seems to emerge through the interaction of simpler components, a phenomenon known as combinatorial generalization. The emphasis is not on the intrinsics of the elementary entities but on their collective behavior and interaction. [BHB+18] offers a comprehensive, well-written introduction to and justification for this argument.

Category theory [EM45] is a very powerful framework [Law66] that precisely embodies this view. Since its inception, the pervasive nature of categories has been steadily fleshed out, revealing many deep connections between seemingly unrelated fields of mathematics; category theory is now a core tool in state-of-the-art developments in mathematics, computer science and theoretical physics. Machine learning, and statistics more generally, have thus far reaped few theoretical or practical benefits from it. Our choice to rely on category theory for AI is not so strange a position. Most if not all computational ontologies, e.g. [HV06][PSST10], are about categories, although they are very often phrased in the language of logic rather than directly in that of category theory.

Preprint. Work in progress.
Category theory is starting to creep into the cognitive sciences to formally model concepts, interplays and analogies [BP][HG08], and has even been applied to neural networks [Hea00].

This paper starts with a brief introduction to the few basic notions required from category theory, after which we incrementally build a categorical view of classification problems in Section 2. This is followed by a short presentation of related work in the literature, after which we briefly conclude and expand on planned future work. An open source Python implementation can be found at: https://github.com/Previsou/CategoryLearning.

Basic category theory primer

For the reader unfamiliar with category theory, we define here the few basic notions used in the rest of the paper. For a more principled introduction and advanced category theory topics, see [Awo10] and [Rie16].

Definition 1. A category $\mathcal{C}$ consists of:
• Objects, noted as $x, y, z, \dots : \mathcal{C}$.
• Arrows/relations between two objects, noted $f : x \to y$, $g, \dots$
• Identities: given $x : \mathcal{C}$, there is an identity arrow $1_x : x \to x$ in $\mathcal{C}$.
• Compositions: the collection of arrows is closed under composition. Given $f : x \to y$ and $g : y \to z$, there is an arrow noted $g \circ f : x \to z$ in $\mathcal{C}$.

Composition is furthermore required to be associative, and identities must act as left and right units. That is, given $f : x \to y$, $g : y \to z$, $h : z \to w$ in $\mathcal{C}$:
• associativity: $h \circ (g \circ f) = (h \circ g) \circ f$.
• unit: $f = f \circ 1_x = 1_y \circ f$.

Anything satisfying the above definition is a category. For example, a directed graph $G(V, E)$ can be seen as a form of proto-category. A path in $G$ is a finite sequence of edges $e_1 e_2 \dots e_l$ in which each edge starts where the previous one ends. The free category $\mathcal{C}(G)$ over a graph $G(V, E)$ is obtained by completing the set of edges with all such paths. $\mathcal{C}(G)$ then has the vertices for objects and the paths for arrows: two composable paths $e_1 \dots e_l$ and $e'_1 \dots e'_m$ in $G$ have as composite the concatenated path $e_1 \dots e_l e'_1 \dots e'_m$,¹ i.e. there is an arrow $f : V_1 \to V_2$ if and only if $V_1 = V_2$ or $G$ contains a directed path from $V_1$ to $V_2$.

Definition 2.
Let $\mathcal{C}$ be a category. A subcategory $S$ of $\mathcal{C}$ is given by a subcollection of objects of $\mathcal{C}$, denoted $\mathrm{ob}(S)$, and a subcollection of arrows of $\mathcal{C}$, denoted $\mathrm{hom}(S)$, such that:
• $\forall X : \mathrm{ob}(S),\ \mathrm{id}_X : \mathrm{hom}(S)$,
• $\forall f : \mathrm{hom}(S)(X, Y),\ X : \mathrm{ob}(S) \wedge Y : \mathrm{ob}(S)$,
• $\forall f, g : \mathrm{hom}(S),\ (g \circ f : \mathcal{C} \Rightarrow g \circ f : \mathrm{hom}(S))$.

¹ Loops must also be added to every vertex to serve as identities.

2 Learning the category of a dataset

A supervised machine learning algorithm receives a set of observations and matching targets as input. Let $\theta \in \mathbb{R}^q$ be a $q$-dimensional parameter vector. Let $X \in F_T^N$ be the observation sample of length $N$ and $Y$ the associated labels or targets, where $F_T$ is the feature space of a single observation. Discriminative supervised training can be viewed as follows: given a model $f$, typically a neural network or an SVM, find the maximum a posteriori (MAP) parameter vector:

$$\mathrm{MAP}(\theta, f, X, Y) = \arg\max_{\theta} P(Y = f(X, \theta) \mid X) \tag{1}$$

For tractability, most algorithms hypothesize that those observations are independent and identically distributed²: they were generated by the same measurement protocol, with previous measurements having no effect on future ones. However, not only is this assumption quite wrong in many situations, but it is inherently broken by the very common use of data augmentation techniques. Independence of samples is usually assumed because it allows Eq. 1 to be factored:

$$\mathrm{MAP}(\theta, f, X, Y) = \arg\max_{\theta} \prod_{i=1}^{N} P(Y_i = f(X_i, \theta) \mid X_i) \tag{2}$$

For numerical reasons, Eq. 2 is turned into the minimization of a loss $L$ by applying the transformation $x \mapsto -\log x$. The factored product becomes a summation, which is less prone to floating-point rounding errors:

$$L(\theta, f, X, Y) = -\frac{1}{N} \sum_{i=1}^{N} \log P(Y_i = f(X_i, \theta) \mid X_i) \tag{3}$$

We now specialize to the $K$-class classification setting. Targets $Y$ are assumed to be one-hot encoded, i.e. $Y_{i,j} = 1$ if and only if observation $i$ is of class $j$, and $0$ otherwise. Let our model be $S : F_T \times \theta \to [0, 1]^K$, constrained by $\sum_{j=1}^{K} S_j(\ast) = 1$.
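The numerical argument behind Eq. 3 and the simplex constraint on the model output can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`softmax`, `likelihood_product`, `mean_negative_log_likelihood`) and the per-observation probabilities are hypothetical, and softmax is just one common way of producing an output vector that sums to 1.

```python
import math


def softmax(scores):
    """Map real-valued scores to a probability vector (entries in [0, 1], sum 1)."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def likelihood_product(probs):
    """Objective of Eq. 2: product of the per-observation probabilities."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def mean_negative_log_likelihood(probs):
    """Objective of Eq. 3: average of -log p over the observations."""
    return -sum(math.log(p) for p in probs) / len(probs)


# Hypothetical values of P(Y_i = f(X_i, theta) | X_i) for N = 2000 observations.
probs = [0.5] * 2000

likelihood = likelihood_product(probs)      # 0.5**2000 underflows to 0.0 in float64
loss = mean_negative_log_likelihood(probs)  # stays finite, equal to log(2)

print(likelihood, loss)
print(sum(softmax([1.0, 2.0, 3.0])))        # the output sums to 1 up to rounding
```

With only 2000 observations the raw product is already indistinguishable from zero, so it carries no usable signal for optimization, whereas the averaged negative log-likelihood remains well scaled whatever the value of N.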
The output of $S$ is essentially a probability vector. The same holds for $Y_i$, where all the probability mass is concentrated in a single entry. The goal is to align these two distributions. Classification problems typically use the Kullback-Leibler divergence $D_{KL}$ to compare distributions.

Definition 3. The Kullback-Leibler divergence between discrete probability distributions $P$ and $Q$ is defined as

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}.$$

Its most relevant properties are:
• non-negativity: $D_{KL}(P \,\|\, Q) \geq 0$.
• asymmetry: $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$ in the general case.
• minimum: $D_{KL}(P \,\|\, Q) = 0$ if and only if $P = Q$ almost everywhere.

We end up with the following training loss:

$$L(\theta, S, X, Y) = \frac{1}{N} \sum_{i=1}^{N} D_{KL}(Y_i \,\|\, S(X_i, \theta)) \tag{4}$$

In general, $D_{KL}(P \,\|\, Q) = H(P, Q) - H(P)$, where $H(P, Q)$ is the cross-entropy and $H(P)$ the entropy of $P$. Since $Y_i$ is degenerate (one-hot), its entropy is zero, so the KL divergence reduces to the cross-entropy, and we end up with the familiar classification loss:

$$L_{CE}(\theta, S, X_i, Y_i) = H(Y_i, S(X_i, \theta)) = -\sum_{j=1}^{K} Y_{i,j} \log S(X_i, \theta)_j$$
$$L(\theta, S, X, Y) = \frac{1}{N} \sum_{i=1}^{N} L_{CE}(\theta, S, X_i, Y_i) \tag{5}$$

Considering the input data not as a set but as a learnable small category requires selecting a proper and practical cost function for identities and composition, and defining how observations can relate to each other and in how many ways. The gist of what we set out to do can be exemplified as follows. Assume we want to perform a binary classification task: given a sequenced genome, is the patient affected by a given genetic disease? From what we know of DNA and heredity, encoding and properly leveraging whether two patients are related should yield a more accurate model.

² The most notable exceptions are time-series and sequence learning problems, where the time/order dependence is implicit and given.
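To make the idea of a dataset viewed as a small category more tangible, the following Python sketch checks the axioms of Definition 1 on a finite collection of observations and relations. It is a hand-written illustration, not the model proposed in the paper and not the API of the CategoryLearning repository: the class `FiniteCategory`, the patients `alice`, `bob` and `carol`, and the arrow names are all hypothetical, and the relations are given explicitly rather than learned.

```python
from itertools import product


class FiniteCategory:
    """A finite category described by explicit tables (illustrative only)."""

    def __init__(self, objects, arrows, compose):
        self.objects = set(objects)
        self.arrows = dict(arrows)    # arrow name -> (source, target)
        self.compose = dict(compose)  # (g, f) -> name of g ∘ f for non-identity pairs

    def identity(self, obj):
        return f"id_{obj}"

    def composable(self, g, f):
        # g ∘ f is defined only when f ends where g starts.
        return self.arrows[f][1] == self.arrows[g][0]

    def comp(self, g, f):
        """Return g ∘ f for a composable pair; identities are resolved by the unit laws."""
        src_f, tgt_f = self.arrows[f]
        if f == self.identity(src_f):
            return g
        if g == self.identity(tgt_f):
            return f
        return self.compose[(g, f)]  # KeyError if the composite is missing

    def is_category(self):
        # Identities: every object x needs an arrow id_x : x -> x.
        for x in self.objects:
            if self.arrows.get(self.identity(x)) != (x, x):
                return False
        # Closure: every composable pair has a composite with matching endpoints.
        for g, f in product(self.arrows, repeat=2):
            if self.composable(g, f):
                try:
                    h = self.comp(g, f)
                except KeyError:
                    return False
                if self.arrows.get(h) != (self.arrows[f][0], self.arrows[g][1]):
                    return False
        # Associativity: h ∘ (g ∘ f) == (h ∘ g) ∘ f on every composable triple.
        for h, g, f in product(self.arrows, repeat=3):
            if self.composable(g, f) and self.composable(h, g):
                if self.comp(h, self.comp(g, f)) != self.comp(self.comp(h, g), f):
                    return False
        return True


# Hypothetical patients related by parenthood.
patients = ["alice", "bob", "carol"]
arrows = {f"id_{p}": (p, p) for p in patients}
arrows.update({
    "parent_ab": ("alice", "bob"),
    "parent_bc": ("bob", "carol"),
    "grandparent_ac": ("alice", "carol"),
})
family = FiniteCategory(
    patients, arrows, {("parent_bc", "parent_ab"): "grandparent_ac"}
)

# Dropping the composite arrow breaks closure under composition.
broken_arrows = {k: v for k, v in arrows.items() if k != "grandparent_ac"}
broken = FiniteCategory(patients, broken_arrows, {})

print(family.is_category(), broken.is_category())
```

In this toy setting, closure under composition is what forces the grandparent relation to exist once the two parent relations are present; a learned category would presumably have to score such composites rather than look them up in a table.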