Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling


Kuruge Darshana Abeyrathna * 1, Bimal Bhattarai * 1, Morten Goodwin * 1, Saeed Rahimi Gorji * 1, Ole-Christoffer Granmo * 1, Lei Jiao * 1, Rupsa Saha * 1, Rohan Kumar Yadav * 1

* Equal contribution (the authors are ordered alphabetically by last name). 1 Department of Information and Communication Technology, University of Agder, Grimstad, Norway. Correspondence to: Ole-Christoffer Granmo <[email protected]>.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

Using logical clauses to represent patterns, Tsetlin machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed on several benchmarks. Each TM clause votes for or against a particular class, with classification resolved using a majority vote. While the evaluation of clauses is fast, being based on binary operators, the voting makes it necessary to synchronize the clause evaluation, impeding parallelization. In this paper, we propose a novel scheme for desynchronizing the evaluation of clauses, eliminating the voting bottleneck. In brief, every clause runs in its own thread for massive native parallelism. For each training example, we keep track of the class votes obtained from the clauses in local voting tallies. The local voting tallies allow us to detach the processing of each clause from the rest of the clauses, supporting decentralized learning. This means that the TM most of the time will operate on outdated voting tallies. We evaluated the proposed parallelization across diverse learning tasks and it turns out that our decentralized TM learning algorithm copes well with working on outdated data, resulting in no significant loss in learning accuracy. Furthermore, we show that the proposed approach provides up to 50 times faster learning. Finally, learning time is almost constant for reasonable clause amounts (employing from 20 to 7,000 clauses on a Tesla V100 GPU). For sufficiently large clause numbers, computation time increases approximately proportionally. Our parallel and asynchronous architecture thus allows processing of massive datasets and operating with more clauses for higher accuracy.

1. Introduction

Tsetlin machines (TMs) (Granmo, 2018) have recently demonstrated competitive results in terms of accuracy, memory footprint, energy, and learning speed on diverse benchmarks (image classification, regression, natural language understanding, and speech processing) (Berge et al., 2019; Yadav et al., 2021a; Abeyrathna et al., 2020; Granmo et al., 2019; Wheeldon et al., 2020; Abeyrathna et al., 2021; Lei et al., 2021). They use frequent pattern mining and resource allocation principles to extract common patterns in the data, rather than relying on minimizing output error, which is prone to overfitting. Unlike the intertwined nature of pattern representation in neural networks, a TM decomposes problems into self-contained patterns, expressed as conjunctive clauses in propositional logic (i.e., in the form if input X satisfies condition A and not condition B then output y = 1). The clause outputs, in turn, are combined into a classification decision through summation and thresholding, akin to a logistic regression function, but with binary weights and a unit step output function. Being based on the human-interpretable disjunctive normal form (Valiant, 1984), like Karnaugh maps (Karnaugh, 1953), a TM can map an exponential number of input feature value combinations to an appropriate output (Granmo, 2018).

Recent progress on TMs. Recent research reports several distinct TM properties. The TM can be used in convolution, providing competitive performance on MNIST, Fashion-MNIST, and Kuzushiji-MNIST, in comparison with CNNs, K-Nearest Neighbor, Support Vector Machines, Random Forests, Gradient Boosting, BinaryConnect, Logistic Circuits and ResNet (Granmo et al., 2019). The TM has also achieved promising results in text classification (Berge et al., 2019), word sense disambiguation (Yadav et al., 2021b), novelty detection (Bhattarai et al., 2021c;b), fake news detection (Bhattarai et al., 2021a), semantic relation analysis (Saha et al., 2020), and aspect-based sentiment analysis (Yadav et al., 2021a), using the conjunctive clauses to capture textual patterns. Recently, regression TMs compared favorably with Regression Trees, Random Forest Regression, and Support Vector Regression (Abeyrathna et al., 2020). The above TM approaches have further been enhanced by various techniques. By introducing real-valued clause weights, it turns out that the number of clauses can be reduced by up to 50× without loss of accuracy (Phoulady et al., 2020). Also, the logical inference structure of TMs makes it possible to index the clauses on the features that falsify them, increasing inference and learning speed by up to an order of magnitude (Gorji et al., 2020). Multi-granular clauses simplify the hyper-parameter search by eliminating the pattern specificity parameter (Gorji et al., 2019). In (Abeyrathna et al., 2021), stochastic searching on the line automata (Oommen, 1997) learn integer clause weights, performing on par with or better than Random Forest, Gradient Boosting, Neural Additive Models, StructureBoost and Explainable Boosting Machines. Closed form formulas for both local and global TM interpretation, akin to SHAP, were proposed by Blakely & Granmo (2020). From a hardware perspective, energy usage can be traded off against accuracy by making inference deterministic (Abeyrathna et al., 2020). Additionally, Shafik et al. (2020) show that TMs can be fault-tolerant, completely masking stuck-at faults. Recent theoretical work proves convergence to the correct operator for “identity” and “not”. It is further shown that arbitrarily rare patterns can be recognized, using a quasi-stationary Markov chain-based analysis. The work finally proves that when two patterns are incompatible, the most accurate pattern is selected (Zhang et al., 5555). Convergence for the “XOR” operator has also recently been proven by Jiao et al. (2021).

Paper Contributions. In all of the above mentioned TM schemes, the clauses are learnt using Tsetlin automaton (TA)-teams (Tsetlin, 1961) that interact to build and integrate conjunctive clauses for decision-making. While producing accurate learning, this interaction creates a bottleneck that hinders parallelization. That is, the clauses must be evaluated and compared before feedback can be provided to the TAs.

In this paper, we first cover the basics of TMs in Section 2. Then, we propose a novel parallel and asynchronous architecture in Section 3, where every clause runs in its own thread for massive parallelism. We eliminate the above interaction bottleneck by introducing local voting tallies that keep track of the clause outputs, per training example. The local voting tallies detach the processing of each clause from the rest of the clauses, supporting decentralized learning. Thus, rather than processing training examples one-by-one as in the original TM, the clauses access the training examples simultaneously, updating themselves and the local voting tallies in parallel. In Section 4, we investigate the properties of the new architecture empirically on regression, novelty detection, semantic relation [...] investigate how processing time scales with the number of clauses, uncovering almost constant-time processing over reasonable clause amounts. Finally, in Section 5, we conclude with pointers to future work, including architectures for grid-computing and heterogeneous systems spanning the cloud and the edge.

The main contributions of the proposed architecture can be summarized as follows:

• Learning time is made almost constant for reasonable clause amounts (employing from 20 to 7,000 clauses on a Tesla V100 GPU).
• For sufficiently large clause numbers, computation time increases approximately proportionally to the increase in number of clauses.
• The architecture copes remarkably well with working on outdated data, resulting in no significant loss in learning accuracy across diverse learning tasks (regression, novelty detection, semantic relation analysis, and word sense disambiguation).

Our parallel and asynchronous architecture thus allows processing of more massive data sets and operating with more clauses for higher accuracy, significantly increasing the impact of logic-based machine learning.

2. Tsetlin Machine Basics

2.1. Classification

A TM takes a vector $X = [x_1, \ldots, x_o]$ of $o$ Boolean features as input, to be classified into one of two classes, $y = 0$ or $y = 1$. These features are then converted into a set of literals that consists of the features themselves as well as their negated counterparts: $L = \{x_1, \ldots, x_o, \neg x_1, \ldots, \neg x_o\}$.

If there are $m$ classes and $n$ sub-patterns per class, a TM employs $m \times n$ conjunctive clauses to represent the sub-patterns. For a given class, we index its clauses by $j$, $1 \le j \le n$, each clause being a conjunction of literals:

$$C_j(X) = \bigwedge_{l_k \in L_j} l_k. \tag{1}$$

Here, $l_k$, $1 \le k \le 2o$, is a feature or its negation. Further, $L_j$ is a subset of the literal set $L$. For example, the particular clause $C_j(X) = x_1 \wedge \neg x_2$ consists of the literals $L_j = \{x_1, \neg x_2\}$ and outputs 1 if $x_1 = 1$ and $x_2 = 0$.

The number of clauses $n$ assigned to each class is user-configurable. The clauses with odd indexes are assigned pos-
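
The clause definition in Eq. (1) can be illustrated with a minimal Python sketch. This is only an illustration of evaluating one conjunctive clause, not the authors' implementation: the names `Literal`, `literal_value`, `clause_output`, and `example_clause` are hypothetical, features are indexed from 0 rather than 1, and the sketch says nothing about how clauses are learned or how votes are tallied.

```python
from typing import List, Tuple

# Hypothetical encoding (an assumption of this sketch): a literal is a pair
# (feature_index, negated). Indices are 0-based here, whereas the text
# writes the features as x_1, ..., x_o.
Literal = Tuple[int, bool]


def literal_value(x: List[int], lit: Literal) -> int:
    """Return the value of a literal: x_k itself, or its negation."""
    k, negated = lit
    return 1 - x[k] if negated else x[k]


def clause_output(x: List[int], clause: List[Literal]) -> int:
    """Conjunction of the literals in L_j, following Eq. (1)."""
    return int(all(literal_value(x, lit) == 1 for lit in clause))


# The example clause from the text, C_j(X) = x_1 AND NOT x_2,
# written with 0-based indices.
example_clause: List[Literal] = [(0, False), (1, True)]
```

Under these assumptions, `clause_output([1, 0, 1], example_clause)` evaluates the clause on an input with $x_1 = 1$ and $x_2 = 0$, the case in which the text states the clause outputs 1; any input with $x_1 = 0$ or $x_2 = 1$ falsifies it.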