The Adaptive Radix Tree: Artful Indexing for Main-Memory Databases

Total Page:16

File Type:pdf, Size:1020Kb

The Adaptive Radix Tree: Artful Indexing for Main-Memory Databases The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases Viktor Leis, Alfons Kemper, Thomas Neumann Fakultat¨ fur¨ Informatik Technische Universitat¨ Munchen¨ Boltzmannstrae 3, D-85748 Garching <lastname>@in.tum.de Abstract—Main memory capacities have grown up to a point digit 1 where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. A Traditional in-memory data structures like balanced binary digit 2 search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often NR used for main-memory indexes, are fast but only support point queries. digit 3 To overcome these shortcomings, we present ART, an adaptive D YT ET radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while leaf nodes supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem Fig. 1. Adaptively sized nodes in our radix tree. of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART’s performance is comparable to hash tables, it maintains the data in sorted the long pipelines of modern CPUs stall, which causes addi- order, which enables additional operations like range scan and tional latencies after every second comparison (on average). prefix lookup. These problems of traditional search trees were tackled by recent research on data structures specifically designed to be I. INTRODUCTION efficient on modern hardware architectures. The k-ary search After decades of rising main memory capacities, even large tree [6] and the Fast Architecture Sensitive Tree (FAST) [7] transactional databases fit into RAM. When most data is use data level parallelism to perform multiple comparisons cached, traditional database systems are CPU bound because simultaneously with Singe Instruction Multiple Data (SIMD) they spend considerable effort to avoid disk accesses. This instructions. Additionally, FAST uses a data layout which has led to very intense research and commercial activities in avoids cache misses by optimally utilizing cache lines and main-memory database systems like H-Store/VoltDB [1], SAP the Translation Lookaside Buffer (TLB). While these opti- HANA [2], and HyPer [3]. These systems are optimized for mizations improve search performance, both data structures the new hardware landscape and are therefore much faster. Our cannot support incremental updates. For an OLTP database system HyPer, for example, compiles transactions to machine system which necessitates continuous insertions, updates, and code and gets rid of buffer management, locking, and latching deletions, an obvious solution is a differential file (delta) overhead. For OLTP workloads, the resulting execution plans mechanism, which, however, will result in additional costs. are often sequences of index operations. Therefore, index Hash tables are another popular main-memory data struc- efficiency is the decisive performance factor. ture. In contrast to search trees, which have O(log n) access More than 25 years ago, the T-tree [4] was proposed as time, hash tables have expected O(1) access time and are an in-memory indexing structure. Unfortunately, the dramatic therefore much faster in main memory. Nevertheless, hash processor architecture changes have rendered T-trees, like all tables are less commonly used as database indexes. One reason traditional binary search trees, inefficient on modern hardware. is that hash tables scatter the keys randomly, and therefore only The reason is that the ever growing CPU cache sizes and support point queries. Another problem is that most hash tables the diverging main memory speed have made the underlying do not handle growth gracefully, but require expensive reor- assumption of uniform memory access time obsolete. B+-tree ganization upon overflow with O(n) complexity. Therefore, variants like the cache sensitive B+-tree [5] have more cache- current systems face the unfortunate trade-off between fast friendly memory access patterns, but require more expensive hash tables that only allow point queries and fully-featured, update operations. Furthermore, the efficiency of both binary but relatively slow, search trees. and B+-trees suffers from another feature of modern CPUs: A third class of data structures, known as trie, radix tree, Because the result of comparisons cannot be predicted easily, prefix tree, and digital search tree, is illustrated in Figure1. These data structures directly use the digital representation of II. RELATED WORK keys instead of hashing or comparing keys. The underlying + idea is similar to a thumb-index found in many alphabetically In disk-based database systems, the B -tree [8] is ubiq- ordered dictionary books: The first character of a word can uitous [9]. It retrieves large blocks from disk to reduce the directly be used to jump to all words starting with that number of accesses. Red-black trees [10], [11] and T-trees character. In a computer, this process can be repeated with [4] were proposed for main-memory database systems. Rao the next characters until a match is found. As a consequence and Ross [5] showed that T-trees, like all binary search trees, of this process, all operations have O(k) complexity where k is suffer from poor cache behavior and are therefore often slower + the length of the key. In the era of extremely large data sets, than B -trees on modern hardware. As an alternative, they + + when n is growing faster than k, having a time complexity proposed a cache conscious B -tree variant, the CSB -tree + independent of n is very attractive. [12]. Further cache optimizations for B -trees were surveyed In this work, we present the adaptive radix tree (ART) which by Graefe and Larson [13]. is a fast and space-efficient in-memory indexing structure Modern CPUs allow to perform multiple comparisons with specifically tuned for modern hardware. While most radix a single SIMD instruction. Schlegel et al. [6] proposed k- trees require to trade off tree height versus space efficiency ary search which reduces the number of comparisons from by setting a globally valid fanout parameter, ART adapts the log2 n to logK n where K is the number of keys that fit representation of every individual node, as exemplified in into one SIMD vector. In comparison with binary trees, this Figure1. By adapting each inner node locally, it optimizes technique also reduces the number of cache misses, because global space utilization and access efficiency at the same time. K comparisons are performed for each cache line loaded Nodes are represented using a small number of efficient and from main memory. Kim et al. extended this research by compact data structures, chosen dynamically depending on proposing FAST, a methodology for laying out binary search the number of child nodes. Two additional techniques, path trees in an architecture sensitive way [7]. SIMD, cache line, compression and lazy expansion, allow ART to efficiently and page blocking are used to optimally use the available index long keys by collapsing nodes and thereby decreasing cache and memory bandwidth. Additionally, they proposed to the tree height. interleave the stages of multiple queries in order to increase A useful property of radix trees is that the order of the keys the throughput of their search algorithm. FAST trees and is not random as in hash tables; rather, the keys are ordered the k-ary search trees are pointer-free data structures which bitwise lexicographically. We show how typical data types can store all keys in a single array and use offset calculations be reordered efficiently to support all operations that require to traverse the tree. While this representation is efficient and the data to be ordered (e.g., range scan, prefix lookup, top-k, saves space, it also implies that no online updates are possible. minimum, and maximum). Kim et al. also presented a GPU implementation of FAST and This work makes the following contributions: compared its performance to modern CPUs. Their results show • We develop the adaptive radix tree (ART), a fast and that, due to the higher memory bandwidth, GPUs can achieve space-efficient general-purpose indexing structure for higher throughput than CPUs. Nevertheless, the use of GPUs main-memory database systems. as dedicated indexing hardware is not yet practical because • We prove that the space consumption per key is bounded memory capacities of GPUs are limited, communications cost to 52 bytes, even for arbitrarily long keys. We show with main memory is high, and hundreds of parallel queries experimentally, that the space consumption is much lower are needed to achieve high throughput. We, therefore focus on in practice, often as low as 8.1 bytes per key. index structures for CPUs. • We describe how common built-in data types can be The use of tries for indexing character strings has been stored in radix trees while retaining their order. studied extensively. The two earliest variants use lists [14] • We experimentally evaluate ART and other state of the and arrays [15] as internal node representations. Morrison art main-memory index structures, including the most introduced path compression in order to store long strings efficient search tree proposals. efficiently [16]. Knuth [17] analyzes these early trie variants. • By integrating ART into the main-memory database The burst trie is a more recent proposal which uses trie nodes system HyPer and running the TPC-C benchmark, we for the upper levels of the tree, but switches to a linked list prove its superior end-to-end performance in a “real-life” once a subtree has only few elements. The HAT-trie [18] transaction processing application. improves performance by replacing the linked list with a The rest of this paper is organized as follows.
Recommended publications
  • Application of TRIE Data Structure and Corresponding Associative Algorithms for Process Optimization in GRID Environment
    Application of TRIE data structure and corresponding associative algorithms for process optimization in GRID environment V. V. Kashanskya, I. L. Kaftannikovb South Ural State University (National Research University), 76, Lenin prospekt, Chelyabinsk, 454080, Russia E-mail: a [email protected], b [email protected] Growing interest around different BOINC powered projects made volunteer GRID model widely used last years, arranging lots of computational resources in different environments. There are many revealed problems of Big Data and horizontally scalable multiuser systems. This paper provides an analysis of TRIE data structure and possibilities of its application in contemporary volunteer GRID networks, including routing (L3 OSI) and spe- cialized key-value storage engine (L7 OSI). The main goal is to show how TRIE mechanisms can influence de- livery process of the corresponding GRID environment resources and services at different layers of networking abstraction. The relevance of the optimization topic is ensured by the fact that with increasing data flow intensi- ty, the latency of the various linear algorithm based subsystems as well increases. This leads to the general ef- fects, such as unacceptably high transmission time and processing instability. Logically paper can be divided into three parts with summary. The first part provides definition of TRIE and asymptotic estimates of corresponding algorithms (searching, deletion, insertion). The second part is devoted to the problem of routing time reduction by applying TRIE data structure. In particular, we analyze Cisco IOS switching services based on Bitwise TRIE and 256 way TRIE data structures. The third part contains general BOINC architecture review and recommenda- tions for highly-loaded projects.
    [Show full text]
  • KP-Trie Algorithm for Update and Search Operations
    The International Arab Journal of Information Technology, Vol. 13, No. 6, November 2016 KP-Trie Algorithm for Update and Search Operations Feras Hanandeh1, Mohammed Akour2, Essam Al Daoud3, Rafat Alshorman4, Izzat Alsmadi5 1Dept. of Computer Information Systems, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, Hashemite University, Zarqa, Jordan, [email protected] 2,5Dept. of Computer Information Systems, Faculty of Information Technology, Yarmouk University, Irbid, Jordan, [email protected], [email protected] 3,4Computer Science Department, Zarqa University [email protected]; [email protected] Abstract: Radix-Tree is a space optimized data structure that performs data compression by means of cluster nodes that share the same branch. Each node with only one child is merged with its child and is considered as space optimized. Nevertheless, it can't be considered as speed optimized because the root is associated with the empty string . Moreover, values are not normally associated with every node; they are associated only with leaves and some inner nodes that correspond to keys of interest. Therefore, it takes time in moving bit by bit to reach the desired word. In this paper we propose the KP-Trie which is consider as speed and space optimized data structure that is resulted from both horizontal and vertical compression. Keywords: Trie, radix tree, data structure, branch factor, indexing, tree structure, information retrieval. 1- Introduction The concept of Data structures such as trees has developed since the 19th century. Tries evolved from trees. They have different names Data structures are a specialized format for such as: Radix tree, prefix tree compact trie, efficient organizing, retrieving, saving and bucket trie, crit bit tree, and PATRICIA storing data.
    [Show full text]
  • Zero Displacement Ternary Number System: the Most Economical Way of Representing Numbers
    Revista de Ciências da Computação, Volume III, Ano III, 2008, nº3 Zero Displacement Ternary Number System: the most economical way of representing numbers Fernando Guilherme Silvano Lobo Pimentel , Bank of Portugal, Email: [email protected] Abstract This paper concerns the efficiency of number systems. Following the identification of the most economical conventional integer number system, from a solid criteria, an improvement to such system’s representation economy is proposed which combines the representation efficiency of positional number systems without 0 with the possibility of representing the number 0. A modification to base 3 without 0 makes it possible to obtain a new number system which, according to the identified optimization criteria, becomes the most economic among all integer ones. Key Words: Positional Number Systems, Efficiency, Zero Resumo Este artigo aborda a questão da eficiência de sistemas de números. Partindo da identificação da mais económica base inteira de números de acordo com um critério preestabelecido, propõe-se um melhoramento à economia de representação nessa mesma base através da combinação da eficiência de representação de sistemas de números posicionais sem o zero com a possibilidade de representar o número zero. Uma modificação à base 3 sem zero permite a obtenção de um novo sistema de números que, de acordo com o critério de optimização identificado, é o sistema de representação mais económico entre os sistemas de números inteiros. Palavras-Chave: Sistemas de Números Posicionais, Eficiência, Zero 1 Introduction Counting systems are an indispensable tool in Computing Science. For reasons that are both technological and user friendliness, the performance of information processing depends heavily on the adopted numbering system.
    [Show full text]
  • KP-Trie Algorithm for Update and Search Operations
    The International Arab Journal of Information Technology, Vol. 13, No. 6, November 2016 722 KP-Trie Algorithm for Update and Search Operations Feras Hanandeh1, Izzat Alsmadi2, Mohammed Akour3, and Essam Al Daoud4 1Department of Computer Information Systems, Hashemite University, Jordan 2, 3Department of Computer Information Systems, Yarmouk University, Jordan 4Computer Science Department, Zarqa University, Jordan Abstract: Radix-Tree is a space optimized data structure that performs data compression by means of cluster nodes that share the same branch. Each node with only one child is merged with its child and is considered as space optimized. Nevertheless, it can’t be considered as speed optimized because the root is associated with the empty string. Moreover, values are not normally associated with every node; they are associated only with leaves and some inner nodes that correspond to keys of interest. Therefore, it takes time in moving bit by bit to reach the desired word. In this paper we propose the KP-Trie which is consider as speed and space optimized data structure that is resulted from both horizontal and vertical compression. Keywords: Trie, radix tree, data structure, branch factor, indexing, tree structure, information retrieval. Received January 14, 2015; accepted March 23, 2015; Published online December 23, 2015 1. Introduction the exception of leaf nodes, nodes in the trie work merely as pointers to words. Data structures are a specialized format for efficient A trie, also called digital tree, is an ordered multi- organizing, retrieving, saving and storing data. It’s way tree data structure that is useful to store an efficient with large amount of data such as: Large data associative array where the keys are usually strings, bases.
    [Show full text]
  • Lecture 26 Fall 2019 Instructors: B&S Administrative Details
    CSCI 136 Data Structures & Advanced Programming Lecture 26 Fall 2019 Instructors: B&S Administrative Details • Lab 9: Super Lexicon is online • Partners are permitted this week! • Please fill out the form by tonight at midnight • Lab 6 back 2 2 Today • Lab 9 • Efficient Binary search trees (Ch 14) • AVL Trees • Height is O(log n), so all operations are O(log n) • Red-Black Trees • Different height-balancing idea: height is O(log n) • All operations are O(log n) 3 2 Implementing the Lexicon as a trie There are several different data structures you could use to implement a lexicon— a sorted array, a linked list, a binary search tree, a hashtable, and many others. Each of these offers tradeoffs between the speed of word and prefix lookup, amount of memory required to store the data structure, the ease of writing and debugging the code, performance of add/remove, and so on. The implementation we will use is a special kind of tree called a trie (pronounced "try"), designed for just this purpose. A trie is a letter-tree that efficiently stores strings. A node in a trie represents a letter. A path through the trie traces out a sequence ofLab letters that9 represent : Lexicon a prefix or word in the lexicon. Instead of just two children as in a binary tree, each trie node has potentially 26 child pointers (one for each letter of the alphabet). Whereas searching a binary search tree eliminates half the words with a left or right turn, a search in a trie follows the child pointer for the next letter, which narrows the search• Goal: to just words Build starting a datawith that structure letter.
    [Show full text]
  • Data Structure
    EDUSAT LEARNING RESOURCE MATERIAL ON DATA STRUCTURE (For 3rd Semester CSE & IT) Contributors : 1. Er. Subhanga Kishore Das, Sr. Lect CSE 2. Mrs. Pranati Pattanaik, Lect CSE 3. Mrs. Swetalina Das, Lect CA 4. Mrs Manisha Rath, Lect CA 5. Er. Dillip Kumar Mishra, Lect 6. Ms. Supriti Mohapatra, Lect 7. Ms Soma Paikaray, Lect Copy Right DTE&T,Odisha Page 1 Data Structure (Syllabus) Semester & Branch: 3rd sem CSE/IT Teachers Assessment : 10 Marks Theory: 4 Periods per Week Class Test : 20 Marks Total Periods: 60 Periods per Semester End Semester Exam : 70 Marks Examination: 3 Hours TOTAL MARKS : 100 Marks Objective : The effectiveness of implementation of any application in computer mainly depends on the that how effectively its information can be stored in the computer. For this purpose various -structures are used. This paper will expose the students to various fundamentals structures arrays, stacks, queues, trees etc. It will also expose the students to some fundamental, I/0 manipulation techniques like sorting, searching etc 1.0 INTRODUCTION: 04 1.1 Explain Data, Information, data types 1.2 Define data structure & Explain different operations 1.3 Explain Abstract data types 1.4 Discuss Algorithm & its complexity 1.5 Explain Time, space tradeoff 2.0 STRING PROCESSING 03 2.1 Explain Basic Terminology, Storing Strings 2.2 State Character Data Type, 2.3 Discuss String Operations 3.0 ARRAYS 07 3.1 Give Introduction about array, 3.2 Discuss Linear arrays, representation of linear array In memory 3.3 Explain traversing linear arrays, inserting & deleting elements 3.4 Discuss multidimensional arrays, representation of two dimensional arrays in memory (row major order & column major order), and pointers 3.5 Explain sparse matrices.
    [Show full text]
  • Efficient In-Memory Indexing with Generalized Prefix Trees
    Efficient In-Memory Indexing with Generalized Prefix Trees Matthias Boehm, Benjamin Schlegel, Peter Benjamin Volk, Ulrike Fischer, Dirk Habich, Wolfgang Lehner TU Dresden; Database Technology Group; Dresden, Germany Abstract: Efficient data structures for in-memory indexing gain in importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capac- ity, and (3) the gap between main-memory and CPU speed. In consequence, there are high performance demands for in-memory data structures. Such index structures are used—with minor changes—as primary or secondary indices in almost every DBMS. Typically, tree-based or hash-based structures are used, while structures based on prefix-trees (tries) are neglected in this context. For tree-based and hash-based struc- tures, the major disadvantages are inherently caused by the need for reorganization and key comparisons. In contrast, the major disadvantage of trie-based structures in terms of high memory consumption (created and accessed nodes) could be improved. In this paper, we argue for reconsidering prefix trees as in-memory index structures and we present the generalized trie, which is a prefix tree with variable prefix length for indexing arbitrary data types of fixed or variable length. The variable prefix length enables the adjustment of the trie height and its memory consumption. Further, we introduce concepts for reducing the number of created and accessed trie levels. This trie is order-preserving and has deterministic trie paths for keys, and hence, it does not require any dynamic reorganization or key comparisons. Finally, the generalized trie yields improvements compared to existing in-memory index structures, especially for skewed data.
    [Show full text]
  • Data Structure Invariants
    Lecture 8 Notes Data Structures 15-122: Principles of Imperative Computation (Spring 2016) Frank Pfenning, André Platzer, Rob Simmons 1 Introduction In this lecture we introduce the idea of imperative data structures. So far, the only interfaces we’ve used carefully are pixels and string bundles. Both of these interfaces had the property that, once we created a pixel or a string bundle, we weren’t interested in changing its contents. In this lecture, we’ll talk about an interface that mimics the arrays that are primitively available in C0. To implement this interface, we’ll need to round out our discussion of types in C0 by discussing pointers and structs, two great tastes that go great together. We will discuss using contracts to ensure that pointer accesses are safe. Relating this to our learning goals, we have Computational Thinking: We illustrate the power of abstraction by con- sidering both the client-side and library-side of the interface to a data structure. Algorithms and Data Structures: The abstract arrays will be one of our first examples of abstract datatypes. Programming: Introduction of structs and pointers, use and design of in- terfaces. 2 Structs So far in this course, we’ve worked with five different C0 types — int, bool, char, string, and arrays t[] (there is a array type t[] for every type t). The character, Boolean and integer values that we manipulate, store LECTURE NOTES Data Structures L8.2 locally, and pass to functions are just the values themselves. For arrays (and strings), the things we store in assignable variables or pass to functions are addresses, references to the place where the data stored in the array can be accessed.
    [Show full text]
  • Number Systems and Radix Conversion
    Number Systems and Radix Conversion Sanjay Rajopadhye, Colorado State University 1 Introduction These notes for CS 270 describe polynomial number systems. The material is not in the textbook, but will be required for PA1. We humans are comfortable with numbers in the decimal system where each po- sition has a weight which is a power of 10: units have a weight of 1 (100), ten’s have 10 (101), etc. Even the fractional apart after the decimal point have a weight that is a (negative) power of ten. So the number 143.25 has a value that is 1 ∗ 102 + 4 ∗ 101 + 3 ∗ 0 −1 −2 2 5 10 + 2 ∗ 10 + 5 ∗ 10 , i.e., 100 + 40 + 3 + 10 + 100 . You can think of this as a polyno- mial with coefficients 1, 4, 3, 2, and 5 (i.e., the polynomial 1x2 + 4x + 3 + 2x−1 + 5x−2+ evaluated at x = 10. There is nothing special about 10 (just that humans evolved with ten fingers). We can use any radix, r, and write a number system with digits that range from 0 to r − 1. If our radix is larger than 10, we will need to invent “new digits”. We will use the letters of the alphabet: the digit A represents 10, B is 11, K is 20, etc. 2 What is the value of a radix-r number? Mathematically, a sequence of digits (the dot to the right of d0 is called the radix point rather than the decimal point), : : : d2d1d0:d−1d−2 ::: represents a number x defined as follows X i x = dir i 2 1 0 −1 −2 = ::: + d2r + d1r + d0r + d−1r + d−2r + ::: 1 Example What is 3 in radix 3? Answer: 0.1.
    [Show full text]
  • Scalability of Algorithms for Arithmetic Operations in Radix Notation∗
    Scalability of Algorithms for Arithmetic Operations in Radix Notation∗ Anatoly V. Panyukov y Department of Computational Mathematics and Informatics, South Ural State University, Chelyabinsk, Russia [email protected] Abstract We consider precise rational-fractional calculations for distributed com- puting environments with an MPI interface for the algorithmic analysis of large-scale problems sensitive to rounding errors in their software imple- mentation. We can achieve additional software efficacy through applying heterogeneous computer systems that execute, in parallel, local arithmetic operations with large numbers on several threads. Also, we investigate scalability of the algorithms for basic arithmetic operations and methods for increasing their speed. We demonstrate that increased efficacy can be achieved of software for integer arithmetic operations by applying mass parallelism in het- erogeneous computational environments. We propose a redundant radix notation for the construction of well-scaled algorithms for executing ba- sic integer arithmetic operations. Scalability of the algorithms for integer arithmetic operations in the radix notation is easily extended to rational- fractional arithmetic. Keywords: integer computer arithmetic, heterogeneous computer system, radix notation, massive parallelism AMS subject classifications: 68W10 1 Introduction Verified computations have become indispensable tools for algorithmic analysis of large scale unstable problems (see e.g., [1, 4, 5, 7, 8, 9, 10]). Such computations require spe- cific software tools; in this connection, we mention that our library \Exact computa- tion" [11] provides appropriate instruments for implementation of such computations ∗Submitted: January 27, 2013; Revised: August 17, 2014; Accepted: November 7, 2014. yThe author was supported by the Federal special-purpose program "Scientific and scientific-pedagogical personnel of innovation Russia", project 14.B37.21.0395.
    [Show full text]
  • Search Trees for Strings a Balanced Binary Search Tree Is a Powerful Data Structure That Stores a Set of Objects and Supports Many Operations Including
    Search Trees for Strings A balanced binary search tree is a powerful data structure that stores a set of objects and supports many operations including: Insert and Delete. Lookup: Find if a given object is in the set, and if it is, possibly return some data associated with the object. Range query: Find all objects in a given range. The time complexity of the operations for a set of size n is O(log n) (plus the size of the result) assuming constant time comparisons. There are also alternative data structures, particularly if we do not need to support all operations: • A hash table supports operations in constant time but does not support range queries. • An ordered array is simpler, faster and more space efficient in practice, but does not support insertions and deletions. A data structure is called dynamic if it supports insertions and deletions and static if not. 107 When the objects are strings, operations slow down: • Comparison are slower. For example, the average case time complexity is O(log n logσ n) for operations in a binary search tree storing a random set of strings. • Computing a hash function is slower too. For a string set R, there are also new types of queries: Lcp query: What is the length of the longest prefix of the query string S that is also a prefix of some string in R. Prefix query: Find all strings in R that have S as a prefix. The prefix query is a special type of range query. 108 Trie A trie is a rooted tree with the following properties: • Edges are labelled with symbols from an alphabet Σ.
    [Show full text]
  • Using Machine Learning to Improve Dense and Sparse Matrix Multiplication Kernels
    Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2019 Using machine learning to improve dense and sparse matrix multiplication kernels Brandon Groth Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Applied Mathematics Commons, and the Computer Sciences Commons Recommended Citation Groth, Brandon, "Using machine learning to improve dense and sparse matrix multiplication kernels" (2019). Graduate Theses and Dissertations. 17688. https://lib.dr.iastate.edu/etd/17688 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Using machine learning to improve dense and sparse matrix multiplication kernels by Brandon Micheal Groth A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Applied Mathematics Program of Study Committee: Glenn R. Luecke, Major Professor James Rossmanith Zhijun Wu Jin Tian Kris De Brabanter The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University Ames, Iowa 2019 Copyright c Brandon Micheal Groth, 2019. All rights reserved. ii DEDICATION I would like to dedicate this thesis to my wife Maria and our puppy Tiger.
    [Show full text]