Parallel Alpha-Beta Algorithm on the GPU

Total Page:16

File Type:pdf, Size:1020Kb

Parallel Alpha-Beta Algorithm on the GPU Journal of Computing and Information Technology - CIT 19, 2011, 4, 269–274 269 doi:10.2498/cit.1002029 Parallel Alpha-Beta Algorithm on the GPU Damjan Strnad and Nikola Guid Faculty of Electrical Engineering and Computer Science, University of Maribor, Slovenia In the paper we present the parallel implementation of the laptops. They feature hardware accelerated alpha-beta algorithm running on the graphics processing scheduling of thousands of lightweight threads unit (GPU). We compare the speed of the parallel player with the standard serial one using the game of reversi running on a number of streaming multiproces- with boards of different sizes. We show that for small sors, resembling in operation the SIMD (single boards the level of available parallelism is insufficient for instruction multiple data) machines. In recent efficient GPU utilization, but for larger boards substan- years, the parallel architecture of the GPU has tial speed-ups can be achieved on the GPU. The results indicate that the GPU-based alpha-beta implementation become available for general purpose program- would be advantageous for similar games of higher com- ming not related to graphics rendering. This putational complexity (e.g. hex and go) in their standard was facilitated by the emergence of specialized form. toolkits like C for CUDA [16] and OpenCL [8] Keywords: alpha-beta, reversi, GPU, CUDA, paralleliza- which propelled the successful parallelization tion of many compute intensive tasks [15]. Our goal with the GPU-based alpha-beta imple- mentation is to compare its speed of play against 1. Introduction the basic serial version of the algorithm. The game of reversi (also known as Othello) serves as a testing ground for comparison, where dif- The alpha-beta algorithm is central to most en- ferent board sizes are used to vary the com- gines for playing the two player zero-sum per- [ ] plexity of the game. This allows us to set the fect information board games 10 . Its effi- future directions of research on more complex ciency stems from the pruning of the game tree games that exhibit higher level of exploitable that relies heavily on the serial interdependence parallelism. of tree traversal. This sequential nature of the algorithm has also proved to be the major cul- The GPU-based implementation of game tree prit for its effective parallelization, which has search has previously been reported by Blei- been the focus of many researchers in the past weiss, but he demonstrated significant speed- [3, 6, 13]. The noticeable speed-ups have been ups only for practically limited case of thou- achieved in simulation [7], but required special sands of simultaneously running games [1].An- hardware [11, 12, 21] or network configurations other report shows good acceleration results [4, 19] to be demonstrated in practice. With with the exhaustive minimax tree search, which multi-threaded solution on broadly accessible represents the foundation of the alpha-beta al- multi-core CPUs, a relatively modest accelera- gorithm [17]. In their work, Rocki and Suda in- tion has been obtained [2]. dicate some intricacies of the GPU architecture that impair the performance of parallel minimax In this paper, we present the parallel alpha-beta and are inherited by the alpha-beta. algorithm running on the graphics processing unit (GPU). Modern GPUs are massively par- The rest of the paper is organized as follows: allel processors residing on the graphics boards in Section 2 we describe the principles of gen- installed in common desktop computers and eral purpose computation on graphics proces- 270 Parallel Alpha-Beta Algorithm on the GPU sors, followed by the description of alpha-beta on-device memory comes in much larger quan- parallelization methodology and its GPU-based tities (1GB and more) and can be used for inter- implementation in Section 3. In Section 4 we block communication, but has a high latency. give the specification of testing details and in It is further divided into read-only texture and Section 5 the results are presented. In Section 6 constant memory, and read-write global mem- we conclude and present some ideas for future ory. The purpose and characteristics of each work. memory type are detailed in the documentation [9]. The usual template of operation in CUDA ker- 2. General Purpose GPU Programming nels is to copy the data from global to shared and CUDA memory, process it there, and copy the results back. All of these steps are performed in par- In our research the GPU programming toolkit allel. The kernel complexity in terms of shared C for CUDA by Nvidia has been used because memory usage and register count determines it is the most mature and well documented [18]. the number of blocks that can be simultaneously Here we give a short overview of the technology. scheduled on the multiprocessor and thereby af- fects the GPU occupancy. The expected GPU The parallel architecture of Nvidia GPUs con- occupancy can be calculated using the tools pro- sists of a set of pipelined multiprocessors. The vided with CUDA so that the execution con- parallel computation on the GPU is performed figuration and kernel compilation can be tuned as a set of concurrently executing thread blocks, for optimal utilization of resources. The higher which are organized into a 1D or 2D grid.The GPU occupancy allows the scheduler to swap blocks themselves can be 1D, 2D, or 3D with the warps waiting for global memory transfer each thread designated by a unique combination with the queued warps. of indices. The hardware schedules the execu- Another major impact on kernel performance tion of blocks on the multiprocessors in units is the way threads in a warp address the mem- of 32 threads called warps. The threads from ory locations. The desired addressing scheme the same warp run in a lockstep, with each di- uses coalesced memory reads and writes, which vergent branching in the code suspending some means that consecutive threads access sequen- of the threads and thereby reducing the rate of tial memory locations or at least a permutation parallelism. The hierarchical structure of GPU of them. Conflicting memory accesses are seri- processing is also expressed in the necessity for alized and lower the memory bandwidth. careful thread synchronization to avoid the rac- ing conditions and optimize the performance. In CUDA, the C-like code to be executed on the 3. Implementation of Parallel Alpha-beta GPU is written in the form of functions called on the GPU kernels. For the efficient implementation of kernels on the GPU, one must consider the lim- The main source of parallelism in the alpha- ited amount of available resources like on-chip beta algorithm is in the concurrent processing memory and registers, as well as restrictions on of multiple sibling nodes at any level of the game execution configuration (i.e. the dimensions of tree. If the best move is evaluated first, then the grid and blocks). The properties of the GPU rest of the moves can be refuted in parallel with are described by its compute capability,which maximal efficiency. Unfortunately, the a priori can be queried at run-time and used to adjust quality of move can only be heuristically esti- the kernel parameters. mated. The parallel processing of nodes there- The memory available to the GPU is of sev- fore usually introduces some search overhead eral types. Each thread block has at disposal which may surpass the amount of serial search a limited amount (16KB – 48KB) of fast local in case of unfavourable move order. storage called shared memory, which resides on The basis for our parallel alpha-beta implemen- the GPU and has low latency. It is used for tation is the PV-split algorithm [14],inwhich internal computation and communication be- the parallelism is employed at the nodes on the tween the threads of the block. The off-chip, principal variation (i.e. the leftmost path in the Parallel Alpha-Beta Algorithm on the GPU 271 Figure 1. GPU-based PV-split heuristically ordered game tree), also known as the PV-nodes or type 1 nodes [10]. In the GPU- based variant the leftmost child of each PV-node is searched on the CPU to establish the lower bound on the PV-node value. The rest of PV- node’s descendants are searched in parallel on the GPU using the narrower window. Figure 1 shows the principle of parallel execution in the PV-nodes. Figure 2. Two level parallelism of GPU The parallel processing of sibling nodes is real- ized using multiple thread blocks on the GPU, but this is only the high level of parallelization exerted by the GPU-based alpha-beta. On the lower level, each node (i.e. reversi board) is processed in parallel by the threads of the two- dimensional block where a single block is used per node. Figure 2 depicts the described two level parallelism which differs from the ap- proach by Rocki and Suda, who delegate the nodes to individual warps of the same block [17]. The dimension of the block may be smaller than that of the board because: • using the smaller block is desirable when it results in higher GPU occupancy, and • the number of threads in the block is limited to 1024, which requires the use of smaller blocks for boards larger than 32×32 squares. If the block and board are of the same dimen- Figure 3. The mapping of threads to consecutive board squares sion, then the thread with index pair (x, y) is mapped directly to square (x, y) of the board. If the block is smaller than the board, then the appropriate number of consecutive squares the mapping scheme shown in Figure 3 is em- which ensues in nicely coalesced memory ac- ployed.
Recommended publications
  • Graph Traversals
    Graph Traversals CS200 - Graphs 1 Tree traversal reminder Pre order A A B D G H C E F I In order B C G D H B A E C F I Post order D E F G H D B E I F C A Level order G H I A B C D E F G H I Connected Components n The connected component of a node s is the largest set of nodes reachable from s. A generic algorithm for creating connected component(s): R = {s} while ∃edge(u, v) : u ∈ R∧v ∉ R add v to R n Upon termination, R is the connected component containing s. q Breadth First Search (BFS): explore in order of distance from s. q Depth First Search (DFS): explores edges from the most recently discovered node; backtracks when reaching a dead- end. 3 Graph Traversals – Depth First Search n Depth First Search starting at u DFS(u): mark u as visited and add u to R for each edge (u,v) : if v is not marked visited : DFS(v) CS200 - Graphs 4 Depth First Search A B C D E F G H I J K L M N O P CS200 - Graphs 5 Question n What determines the order in which DFS visits nodes? n The order in which a node picks its outgoing edges CS200 - Graphs 6 DepthGraph Traversalfirst search algorithm Depth First Search (DFS) dfs(in v:Vertex) mark v as visited for (each unvisited vertex u adjacent to v) dfs(u) n Need to track visited nodes n Order of visiting nodes is not completely specified q if nodes have priority, then the order may become deterministic for (each unvisited vertex u adjacent to v in priority order) n DFS applies to both directed and undirected graphs n Which graph implementation is suitable? CS200 - Graphs 7 Iterative DFS: explicit Stack dfs(in v:Vertex) s – stack for keeping track of active vertices s.push(v) mark v as visited while (!s.isEmpty()) { if (no unvisited vertices adjacent to the vertex on top of the stack) { s.pop() //backtrack else { select unvisited vertex u adjacent to vertex on top of the stack s.push(u) mark u as visited } } CS200 - Graphs 8 Breadth First Search (BFS) n Is like level order in trees A B C D n Which is a BFS traversal starting E F G H from A? A.
    [Show full text]
  • Lecture12: Trees II Tree Traversal Preorder Traversal Preorder Traversal
    Tree Traversal • Process of visiting nodes in a tree systematically CSED233: Data Structures (2013F) . Some algorithms need to visit all nodes in a tree. Example: printing, counting nodes, etc. Lecture12: Trees II • Implementation . Can be done easily by recursion Computers”R”Us . Order of visits does matter. Bohyung Han Sales Manufacturing R&D CSE, POSTECH [email protected] US International Laptops Desktops Europe Asia Canada CSED233: Data Structures 2 by Prof. Bohyung Han, Fall 2013 Preorder Traversal Preorder Traversal • A preorder traversal of the subtree rooted at node n: . Visit node n (process the node's data). 1 . Recursively perform a preorder traversal of the left child. Recursively perform a preorder traversal of the right child. 2 7 • A preorder traversal starts at the root. public void preOrderTraversal(Node n) 3 6 8 12 { if (n == null) return; 4 5 9 System.out.print(n.value+" "); preOrderTraversal(n.left); preOrderTraversal(n.right); 10 11 } CSED233: Data Structures CSED233: Data Structures 3 by Prof. Bohyung Han, Fall 2013 4 by Prof. Bohyung Han, Fall 2013 Inorder Traversal Inorder Traversal • A preorder traversal of the subtree rooted at node n: . Recursively perform a preorder traversal of the left child. 6 . Visit node n (process the node's data). Recursively perform a preorder traversal of the right child. 4 11 • An iorder traversal starts at the root. public void inOrderTraversal(Node n) 2 5 7 12 { if (n == null) return; 1 3 9 inOrderTraversal(n.left); System.out.print(n.value+" "); inOrderTraversal(n.right); } 8 10 CSED233: Data Structures CSED233: Data Structures 5 by Prof.
    [Show full text]
  • Efficient Stackless Hierarchy Traversal on Gpus with Backtracking in Constant Time
    High Performance Graphics (2016) Ulf Assarsson and Warren Hunt (Editors) Efficient Stackless Hierarchy Traversal on GPUs with Backtracking in Constant Time Nikolaus Binder† and Alexander Keller‡ NVIDIA Corporation 1 1 1 2 3 2 3 2 3 4 5 6 7 4 5 6 7 4 5 6 7 10 11 10 11 10 11 22 23 22 23 22 23 a) b) c) 44 45 44 45 44 45 Figure 1: Backtracking from the current node with key 22 to the next node with key 3, which has been postponed most recently, by a) iterating along parent and sibling references, b) using references to uncle and grand uncle, c) using a stack, or as we propose, a perfect hash directly mapping the key for the next node to its address without using a stack. The heat maps visualize the reduction in the number of backtracking steps. Abstract The fastest acceleration schemes for ray tracing rely on traversing a bounding volume hierarchy (BVH) for efficient culling and use backtracking, which in the worst case may expose cost proportional to the depth of the hierarchy in either time or state memory. We show that the next node in such a traversal actually can be determined in constant time and state memory. In fact, our newly proposed parallel software implementation requires only a few modifications of existing traversal methods and outperforms the fastest stack-based algorithms on GPUs. In addition, it reduces memory access during traversal, making it a very attractive building block for ray tracing hardware. Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation—Ray Trac- ing 1.
    [Show full text]
  • MA/CSSE 473 Day 15
    MA/CSSE 473 Day 15 BFS Topological Sort Combinatorial Object Generation MA/CSSE 473 Day 15 • HW 6 due tomorrow, HW 7 Friday, Exam next Tuesday • HW 8 has been updated for this term. Due Oct 8 • ConvexHull implementation problem due Day 27 (Oct 21); assignment is available now – Individual assignment (I changed my mind, but I am giving you 3.5 weeks to do it) • Schedule page now has my projected dates for all remaining reading assignments and written assignments – Topics for future class days and details of assignments 9-17 still need to be updated for this term • Student Questions • DFS and BFS • Topological Sort • Combinatorial Object Generation - Intro Q1 1 Recap: Pseudocode for DFS Graph may not be connected, so we loop. Backtracking happens when this loop ends (no more unmarked neighbors) Analysis? Q2 Notes on DFS • DFS can be implemented with graphs represented as: – adjacency matrix: Θ(|V| 2) – adjacency list: Θ(| V|+|E|) • Yields two distinct ordering of vertices: – order in which vertices are first encountered (pushed onto stack) – order in which vertices become dead-ends (popped off stack) • Applications: – check connectivity, finding connected components – Is this graph acyclic? – finding articulation points, if any (advanced) – searching the state-space of problem for (optimal) solution (AI) Q3 2 Breadth-first search (BFS) • Visits graph vertices by moving across to all the neighbors of last visited vertex. Vertices closer to the start are visited early • Instead of a stack, BFS uses a queue • Level-order tree traversal is
    [Show full text]
  • Discrete Mathematics and Its Applications Seventh Edition
    Discrete Mathematics and Its Applications Seventh Edition Kenneth H. Rosen Monmouth University (and.formerly AT&T Laboratories) � onnect Learn •� Succeed" The McGraw·Hi/1 Companies ��onnect Learn a_,Succeed· DISCRETE MATHEMATICS AND ITS APPLICATIONS, SEVENTH EDITION Published by McGraw-Hill, a business unit of The McGraw-Hill Companies, Inc., 1221 Avenue of the Americas, New York, NY 10020. Copyright© 2012 by The McGraw-Hill Companies, Inc. All rights reserved. Previous editions © 2007, 2003, and 1999. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written consent of The McGraw-Hill Companies, Inc., including, but not limited to, in any network or other electronic storage or transmission, or broadcast for distance learning. Some ancillaries, including electronic and print components, may not be available to customers outside the United States. This book is printed on acid-free paper. 1234567890DOW/DOW 10987654 321 ISBN 978-0-07-338309-5 0-07-338309-0 MHID Vice President & Editor-in- Chief: Marty Lange Editorial Director: Michael Lange Global Publisher: Raghothaman Srinivasan Executive Editor: Bill Stenquist Development Editors: LorraineK. Buczek/Rose Kernan Senior Marketing Manager: Curt Reynolds Project Manager: Robin A. Reed Buyer: Sandy Ludovissy Design Coordinator: Brenda A. Rolwes Cover painting: Jasper Johns, Between the Clock and the Bed, 1981. Oil on Canvas (72 x 126 I /4 inches) Collection of the artist. Photograph by Glenn Stiegelman. Cover Art© Jasper Johns/Licensed by VAGA, New York, NY Cover Designer: Studio Montage, St. Louis, Missouri Lead Photo Research Coordinator: Carrie K.
    [Show full text]
  • Trees Assignment #4
    Trees Assignment #4 Tuesday, February 16: Read Rosen and Write Essay Sections 9.6, 9.7, 9.8 -- Pages 647-672 Thursday, February 18: Read Rosen and Write Essay Section 10.1, 10.2, 10.3 -- Pages 683-722 Assignment #4 Homework Exercises Section 9.1: Problem 31 Section 9.2: Problems 18, 28 Section 9.3: Problems 30, 31 Section 9.4: Problems 54, 55 Section 9.5: Problems 10, 26, 27, 46, 54 Extra Credit Problem: See Comp 280 web page Motivation for Trees Many, Many Applications Fundamental Data Structure Neat Algorithms Simpler than Graphs Examples and Animations http://oneweb.utc.edu/~Christopher-Mawata/petersen/ Definitions Tree • Connected graph with no simple circuits. • Graph with a unique path between any two vertices. Rooted Tree • Explicit Definition -- A tree with one vertex called the root • Recursive Definition -- A single vertex r is a rooted tree with root r. -- If T1, …, Tn are rooted trees with roots r1, …, rn and r is a new vertex, then the graph with edges joining r to r1, …, rn is a rooted tree with root r. Counting Theorems Theorem 1: e = v −1 Proof: By induction on the number of vertices. (Inductive Step: Remove a leaf.) Theorem 2: # leaves ≤ mheight (m-ary trees) Proof: By induction on the height of the tree. (Inductive Step: Remove the root.) Theorem 3: height ≥ logm (#leaves) Proof: This result is just a restatement of Theorem 2. Standard Terminology Nodes Trees Ancestor Level Parent Height Child Balanced Sibling N-ary Descendent Binary Leaf -- Left Subtree / Child Internal Vertex (Node) -- Right Subtree / Child Examples
    [Show full text]
  • Traversal of Binary Trees 30 Traversal of Binary Trees 31 Traversals of Binary Trees • Is the Process of Visiting Each Node (Precisely
    Data Structures September 28 1 Hierachical structures: Trees • 2 Objectives Discuss the following topics: • Trees, Binary Trees, and Binary Search Trees • Implementing Binary Trees • Tree Traversal • Searching a Binary Search Tree • Insertion • Deletion 3 Objectives (continued) Discuss the following topics: • Heaps • Balancing a Tree • Self-Adjusting Trees 4 Trees, Binary Trees, and Binary Search Trees • A tree is a data type that consists of nodes and arcs • These trees are depicted upside down with the root at the top and the leaves (terminal nodes ) at the bottom • The root is a node that has no parent; it can have only child nodes • Leaves have no children (their children are null) 5 Trees, Binary Trees, and Binary Search Trees (continued) • Each node has to be reachable from the root through a unique sequence of arcs, called a path • The number of arcs in a path is called the length of the path • The level of a node is the length of the path from the root to the node plus 1, which is the number of nodes in the path • The height of a nonempty tree is the maximum level of a node in the tree 6 Trees, Binary Trees, and Binary Search Trees (continued) Figure 6-1 Examples of trees 7 Trees, Binary Trees, and Binary Search Trees (continued) Figure 6-2 Hierarchical structure of a university shown as a tree 8 Trees: abstract/mathematical important, great number of varieties • terminology (knoop, wortel, vader, kind) node/vertex, root, father/parent, child ) ) ) ) (non) directed (non) orderly adt adt adt adt binary trees (left ≠≠≠ right) full
    [Show full text]
  • Cmsc 132: Object-Oriented Programming Ii
    CMSC 132: OBJECT-ORIENTED PROGRAMMING II Graphs & Graph Traversal Department of Computer Science University of Maryland, College Park © Department of Computer Science UMD Graph Data Structures • Many-to-many relationship between elements • Each element has multiple predecessors • Each element has multiple successors © Department of Computer Science UMD Graph Definitions • Node • Element of graph A • State • List of adjacent/neighbor/successor nodes • Edge • Connection between two nodes • State • Endpoints of edge © Department of Computer Science UMD Graph Definitions • Directed graph • Directed edges • Undirected graph • Undirected edges © Department of Computer Science UMD Graph Definitions • Weighted graph • Weight (cost) associated with each edge © Department of Computer Science UMD Graph Definitions • Path • Sequence of nodes n1, n2, … nk • Edge exists between each pair of nodes ni , ni+1 • Example • A, B, C is a path • A, E, D is not a path © Department of Computer Science UMD Graph Definitions • Cycle • Path that ends back at starting node • Example • A, E, A • A, B, C, D, E, A • Simple path • No cycles in path • Acyclic graph • No cycles in graph • What is an example? © Department of Computer Science UMD Graph Definitions • Connected Graph • Every node in the graph is reachable from every other node in the graph • Unconnected graph • Graph that has several disjoint components Unconnected graph © Department of Computer Science UMD Graph Operations • Traversal (search) • Visit each node in graph exactly once • Usually perform computation
    [Show full text]
  • Trees and Tree Traversal Definition of a Tree
    Trees and Tree Traversal Material adapted – courtesy of Prof. Dave Matuszek at UPENN Definition of a tree A tree is a node with a value and zero or more children Depending on the needs of the program, the children may or may not be ordered A tree has a root, internal nodes, and leaves A Each node contains an element and has branches B C D E leading to other nodes (its children) F G H I J K Each node (other than the root) has a parent L M N Each node has a depth (distance from the root) 2 1 More definitions An empty tree has no nodes The descendents of a node are its children and the descendents of its children The ancestors of a node are its parent (if any) and the ancestors of its parent The subtree rooted at a node consists of the given node and all its descendents An ordered tree is one in which the order of the children is important; an unordered tree is one in which the children of a node can be thought of as a set The branching factor of a node is the number of children it has The branching factor of a tree is the average branching factor of its nodes 3 File systems File systems are almost always implemented as a tree structure The nodes in the tree are of (at least) two types: folders (or directories), and plain files A folder typically has children—subfolders and plain files A plain file is typically a leaf 4 2 Family trees It turns out that a tree is not a good way to represent a family tree Every child has two parents, a mother and a father Parents frequently remarry An “upside down” binary tree almost works
    [Show full text]
  • Modern B-Tree Techniques Contents
    Foundations and TrendsR in Databases Vol. 3, No. 4 (2010) 203–402 c 2011 G. Graefe DOI: 10.1561/1900000028 Modern B-Tree Techniques By Goetz Graefe Contents 1 Introduction 204 1.1 Perspectives on B-trees 204 1.2 Purpose and Scope 206 1.3 New Hardware 207 1.4 Overview 208 2 Basic Techniques 210 2.1 Data Structures 213 2.2 Sizes, Tree Height, etc. 215 2.3 Algorithms 216 2.4 B-trees in Databases 221 2.5 B-trees Versus Hash Indexes 226 2.6 Summary 230 3 Data Structures and Algorithms 231 3.1 Node Size 232 3.2 Interpolation Search 233 3.3 Variable-length Records 235 3.4 Normalized Keys 237 3.5 Prefix B-trees 239 3.6 CPU Caches 244 3.7 Duplicate Key Values 246 3.8 Bitmap Indexes 249 3.9 Data Compression 253 3.10 Space Management 256 3.11 Splitting Nodes 258 3.12 Summary 259 4 Transactional Techniques 260 4.1 Latching and Locking 265 4.2 Ghost Records 268 4.3 Key Range Locking 273 4.4 Key Range Locking at Leaf Boundaries 280 4.5 Key Range Locking of Separator Keys 282 4.6 Blink-trees 283 4.7 Latches During Lock Acquisition 286 4.8 Latch Coupling 288 4.9 Physiological Logging 289 4.10 Non-logged Page Operations 293 4.11 Non-logged Index Creation 295 4.12 Online Index Operations 296 4.13 Transaction Isolation Levels 300 4.14 Summary 304 5 Query Processing 305 5.1 Disk-order Scans 309 5.2 Fetching Rows 312 5.3 Covering Indexes 313 5.4 Index-to-index Navigation 317 5.5 Exploiting Key Prefixes 324 5.6 Ordered Retrieval 327 5.7 Multiple Indexes for a Single Table 329 5.8 Multiple Tables in a Single Index 333 5.9 Nested Queries and Nested Iteration 334
    [Show full text]
  • Adapting Tree Structures for Processing with SIMD Instructions
    Adapting Tree Structures for Processing with SIMD Instructions Steffen Zeuch Frank Huber⇤ Johann-Christoph Freytag Humboldt-Universität zu Berlin {zeuchste,huber,freytag}@informatik.hu-berlin.de ABSTRACT and the kind of data each node stores. The most widely + + In this paper, we accelerate the processing of tree-based in- used variant of the B-Tree is the B -Tree. The B -Tree dex structures by using SIMD instructions. We adapt the distinguishes between leaf and branching nodes. While leaf B+-Tree and prefix B-Tree (trie) by changing the search al- nodes store data items to form a so-called sequence set [8], gorithm on inner nodes from binary search to k-ary search. branching nodes are used for pathfinding based on stored The k-ary search enables the use of SIMD instructions, which key values. Thus, each node is either used for navigation or for storing data items, but not for both purposes like in are commonly available on most modern processors today. + The main challenge for using SIMD instructions on CPUs the original B-Tree. One major advantage of the B -Tree is their inherent requirement for consecutive memory loads. is its ability to speedup sequential processing by linking leaf The data for one SIMD load instruction must be located nodes to support range queries. in consecutive memory locations and cannot be scattered As a variant of the original B-Tree, Bayer et al. intro- over the entire memory. The original layout of tree-based duced the prefix B-Tree (trie) [4], also called digital search index structures does not satisfy this constraint and must tree.
    [Show full text]
  • “Breadth First Traversal Technique Using Tcl/Tk” In
    “BREADTH FIRST TRAVERSAL TECHNIQUE USING TCL/TK” Mini Project-I Report Submitted in Partial Fulfillment of the Requirements for the Degree of BACHELOR OF TECHNOLOGY IN ELECTRONICS & COMMUNICATION ENGINEERING By Aagam Jariwala (17BEC001) Akshay Mishra (17BEC008) Department of Electronics & Communication Engineering Institute of Technology NIRMA UNIVERSITY Ahmedabad 382 481 November 2019 1 | P a g e CERTIFICATE This is to certify that the Mini Project Report entitled “Breadth First Traversal technique using TCL/TK” submitted by Aagam Jariwala (17BEC001) and Akshay Mishra (17BEC008) towards the partial fulfillment of the requirements for the award of degree in Bachelor of Technology in the field of Electronics & Communication Engineering of Nirma University is the record of work carried out by him under our supervision and guidance. The work submitted has in our opinion reached a level required for being accepted for examination. The results embodied in this mini project work to the best of our knowledge have not been submitted to any other University or Institution for award of any degree or diploma. Date: 20th November 2019 Name of Guide: Dr. Usha Mehta Head of the Department Department of Electronics & Communication Engineering Institute of Technology, Nirma University Ahmedabad 2 | P a g e Undertaking for Originality of the Work (for all students) We, Aagam Jariwala and Akshay Mishra, Roll No.17BEC001 & 17BEC008 respectively, give undertaking that the Mini Project entitled “Breadth First Traversal technique using TCL/TK” submitted by us, towards the partial fulfillment of the requirements for the degree of Bachelor of Technology in the field of Electronics & Communication Engineering of Nirma University, Ahmedabad, is the original work carried out by me and I give assurance that no attempt of plagiarism has been made.
    [Show full text]