Beyond Binary Search: Parallel In-Place Construction of Implicit Search Tree Layouts
Total pages: 16
File type: PDF; size: 1,020 KB
Recommended publications
- Parallelizing the Data Cube
PARALLELIZING THE DATA CUBE, by Todd Eavis. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Dalhousie University, Halifax, Nova Scotia, June 27, 2003. © Copyright by Todd Eavis, 2003. Research supervisor: Andrew Rau-Chaplin; external examiner: Virendra Bhavsar; examining committee: Qigang Gao, Evangelos Milios. Permission is granted to Dalhousie University to circulate and copy the thesis for non-commercial purposes; the author reserves all other publication rights. Dedication: To the two women in my life, Amber and Bailey. Contents open with 1 Introduction, 1.1 Overview of Primary Research, …
- WLFC: Write Less in the Flash-based Cache
Chaos Dong, Fang Wang, and Jianshun Zhang (Huazhong University of Science and Technology, Wuhan, China).
Abstract: Flash-based disk caches, for example Bcache [1] and Flashcache [2], have gained tremendous popularity in industry in the last decade because of their low energy consumption, non-volatile nature, and high I/O speed. But these cache systems have worse write performance than read performance because of the asymmetric I/O costs and the internal GC mechanism. In addition to the performance issues, since NAND flash is a type of EEPROM device, the lifespan is also limited by Program/Erase (P/E) cycles. So how to improve the performance and the lifespan of flash-based caches in write-intensive scenarios has always been a hot issue. Benefiting from Open-Channel SSDs (OCSSDs) [3], we propose a write-friendly flash-based disk cache system, which is called WLFC (Write Less in the Flash-based Cache).
… log-on-log [5]. At the same time, the Open-Channel SSD (OCSSD) has become increasingly popular in academia. Through a standard OCSSD interface, the behavior of the flash memory can be managed by the host to improve the utilization of the storage. In this paper, we mainly make three contributions to optimize the write performance of the flash-based disk cache. First, we designed a write-friendly flash-based disk cache system upon Open-Channel SSDs, named WLFC (Write Less in the Flash-based Cache), in which requests are handled by a strictly sequential writing method to reduce write …
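The "strictly sequential writing" idea can be sketched independently of the paper. The following minimal C++ example is my own illustration, not WLFC's actual design (the class and method names are invented): dirty blocks are only ever appended to the tail of a log, and an in-memory table maps block IDs to log slots, so the flash never sees an in-place update. Garbage collection of stale slots and the OCSSD interface are omitted.

    #include <cstdint>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    // Minimal sketch of a sequential-write (log-structured) cache front end.
    // All writes go to the tail of an append-only log; a RAM-resident table
    // maps block IDs to log slots, so the device never sees in-place updates.
    class SequentialWriteCache {
    public:
        void write(uint64_t blockId, const std::vector<char>& data) {
            log_.push_back(data);              // strictly sequential append
            index_[blockId] = log_.size() - 1; // latest version wins
        }
        const std::vector<char>* read(uint64_t blockId) const {
            auto it = index_.find(blockId);
            return it == index_.end() ? nullptr : &log_[it->second];
        }
    private:
        std::vector<std::vector<char>> log_;          // stands in for flash
        std::unordered_map<uint64_t, size_t> index_;  // blockId -> log slot
    };

    int main() {
        SequentialWriteCache cache;
        cache.write(42, {'a', 'b'});
        cache.write(42, {'c', 'd'});  // overwrite appends; old slot becomes garbage
        if (auto* d = cache.read(42)) std::cout << d->at(0) << d->at(1) << '\n'; // cd
    }

Overwrites leave stale slots behind, which is exactly why a real design of this kind must pair the log with a garbage-collection policy.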
- Two-Level Main Memory Co-Design: Multi-Threaded Algorithmic Primitives, Analysis, and Simulation
Michael A. Bender (Stony Brook University and Tokutek, Inc.) and Samuel McCauley (Stony Brook University); Jonathan Berry, Simon D. Hammond, K. Scott Hemmert, Branden Moore, Cynthia A. Phillips, David Resnick, and Arun Rodrigues (Sandia National Laboratories); Benjamin Moseley (Washington University in St. Louis).
Abstract: A fundamental challenge for supercomputer memory architecture is that processors cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. As the number of cores per chip increases, and traditional DDR DRAM speeds stagnate, the problem is only getting worse. A variety of non-DDR 3D memory technologies (Wide I/O 2, HBM) offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. However, such a packaging scheme cannot contain sufficient memory capacity for a node. It seems likely that future systems will require at least two levels of main memory: high-bandwidth, low-power memory near the processor and low-bandwidth, high-capacity memory …
… memory close to the processor, there can be a higher number of connections between the memory and caches, enabling higher bandwidth than current technologies. While the term scratchpad is overloaded within the computer architecture field, we use it throughout this paper to describe a high-bandwidth, local memory that can be used as a temporary storage location. The scratchpad cannot replace DRAM entirely. Due to the physical constraints of adding the memory directly to the chip, the scratchpad cannot be as large as DRAM, although it will be much larger than cache, having gigabytes of storage capacity.
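A scratchpad in this sense is managed explicitly by software rather than by a cache controller. As a rough sketch (my own, not code from the paper; the function and constant names are invented), a bandwidth-bound kernel can stage one tile of a large DRAM-resident array into a small fast buffer, do all of its work there, and write the tile back once, so each element crosses the slow link only twice:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hypothetical sketch: process a large DRAM-resident array in tiles that
    // fit a small, fast scratchpad. Each element crosses the slow DRAM link
    // only twice (one read, one write); repeated accesses hit the tile.
    constexpr std::size_t kScratchpadElems = 1 << 16;  // assumed scratchpad size

    void scale_in_tiles(std::vector<double>& dram, double factor) {
        std::vector<double> scratch(kScratchpadElems);  // stands in for scratchpad
        for (std::size_t base = 0; base < dram.size(); base += kScratchpadElems) {
            std::size_t n = std::min(kScratchpadElems, dram.size() - base);
            std::copy_n(dram.begin() + base, n, scratch.begin());    // stage in
            for (std::size_t i = 0; i < n; ++i) scratch[i] *= factor; // compute
            std::copy_n(scratch.begin(), n, dram.begin() + base);    // write back
        }
    }

    int main() {
        std::vector<double> a(1 << 20, 1.0);
        scale_in_tiles(a, 2.0);
        return a[123456] == 2.0 ? 0 : 1;
    }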
- Minimizing Writes in Parallel External Memory Search
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. Nathan R. Sturtevant and Matthew J. Rutherford (University of Denver, Denver, CO, USA).
Abstract: Recent research on external-memory search has shown that disks can be effectively used as secondary storage when performing large breadth-first searches. We introduce the Write-Minimizing Breadth-First Search (WMBFS) algorithm, which is designed to minimize the number of writes performed in an external-memory BFS. WMBFS is also designed to store the results of the BFS for later use. We present the results of a BFS on a single-agent version of Chinese Checkers and the Rubik's Cube edge cubes, state spaces with about 1 trillion states each. In evaluating against a comparable approach, WMBFS reduces the I/O for the … speed in Chinese Checkers, and is 3.5 times faster in Rubik's Cube.
WMBFS is motivated by several observations. First, most large-scale BFSs have been performed to verify the diameter (the number of unique depths at which states can be found) or the width (the maximum number of states at any particular depth) of a state space, after which any computed data is discarded. There are, however, many scenarios in which the results of a large BFS can be later used for other computation. In particular, a breadth-first search is the underlying technique for building pattern databases (PDBs), which are used as heuristics for search. PDBs require that the depth of each state in the BFS be stored for later usage.
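The general shape of a layered BFS that writes each depth out once, so per-state depths survive for later PDB construction, can be sketched as follows. This is a generic skeleton of my own, not the WMBFS algorithm: for brevity it keeps duplicate detection in an in-memory set, where a true external-memory search would sort and merge layer files instead.

    #include <cstdint>
    #include <fstream>
    #include <functional>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Sketch of a layered BFS that persists each depth layer to its own file,
    // so state depths survive for later use (e.g., as a pattern database).
    using State = uint64_t;

    void layered_bfs(State start,
                     const std::function<std::vector<State>(State)>& expand) {
        std::unordered_set<State> seen{start};
        std::vector<State> frontier{start};
        for (int depth = 0; !frontier.empty(); ++depth) {
            // Write the whole layer once, sequentially: one write per state.
            std::ofstream out("layer_" + std::to_string(depth) + ".bin",
                              std::ios::binary);
            out.write(reinterpret_cast<const char*>(frontier.data()),
                      frontier.size() * sizeof(State));
            std::vector<State> next;
            for (State s : frontier)
                for (State t : expand(s))
                    if (seen.insert(t).second) next.push_back(t);
            frontier = std::move(next);
        }
    }

    int main() {
        // Toy state space: states 0..15, neighbors (i + 1) mod 16 and (2i) mod 16.
        layered_bfs(0, [](State s) {
            return std::vector<State>{(s + 1) % 16, (s * 2) % 16};
        });
    }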
- Sorting Algorithms
Next to storing and retrieving data, sorting is one of the most common algorithmic tasks, with many different ways to perform it. Whenever we perform a web search and/or view statistics at some website, the presented data has most likely been sorted in some way. In this lecture and in the following lectures we will examine several different ways of sorting. The following are some reasons for investigating several of the different algorithms (as opposed to one or two, or the "best" algorithm).
• There exist very simple algorithms which, although they behave poorly for large data sets, perform well for small amounts of data, or when the range of the data is sufficiently small.
• There exist sorting algorithms which have been shown to be more efficient in practice.
• There are still other algorithms which work better in specific situations; for example, when the data is mostly sorted, or when unsorted data needs to be merged into a sorted list (for example, adding names to a phonebook).
1 Counting Sort. Counting sort is primarily used on data that is sorted by integer values which fall into a relatively small range (compared to the amount of random-access memory available on a computer). Without loss of generality, we can assume the range of integer values is [0 : m], for some m ≥ 0. Now given array a[0 : n − 1], the idea is to define an array of lists l[0 : m], scan a, and, for i = 0, 1, …, n − 1, store element a[i] in list l[v(a[i])], where v is the function that computes an array element's sorting value.
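A minimal C++ sketch of the bucket-list formulation above, with the sorting-value function v supplied by the caller (here the identity on integers in [0, m]):

    #include <iostream>
    #include <vector>

    // Counting sort as described above: bucket each element a[i] into the
    // list l[v(a[i])], then concatenate the lists in order of sorting value.
    // Stable, and runs in O(n + m) time.
    template <typename T, typename KeyFn>
    void counting_sort(std::vector<T>& a, int m, KeyFn v) {
        std::vector<std::vector<T>> l(m + 1);  // l[0 : m], one list per key
        for (const T& x : a) l[v(x)].push_back(x);
        a.clear();
        for (auto& bucket : l)
            for (T& x : bucket) a.push_back(std::move(x));
    }

    int main() {
        std::vector<int> a{3, 1, 4, 1, 5, 9, 2, 6};
        counting_sort(a, 9, [](int x) { return x; });  // v = identity
        for (int x : a) std::cout << x << ' ';         // 1 1 2 3 4 5 6 9
        std::cout << '\n';
    }

Because elements are appended to their buckets in scan order, equal keys keep their relative order, which is what makes the algorithm stable.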
- Random Access Memory (RAM)
www.studymafia.org. A seminar report on RANDOM ACCESS MEMORY (RAM), submitted in partial fulfillment of the requirement for the award of the degree of Bachelor of Technology in Computer Science.
Acknowledgement: I would like to thank respected Mr. …….. and Mr. …….. for giving me such a wonderful opportunity to expand my knowledge of my own branch and giving me guidelines to present a seminar report. It helped me a lot to realize what we study for. Secondly, I would like to thank my parents, who patiently helped me as I went through my work and helped me to modify and eliminate some of the irrelevant or unnecessary material. Thirdly, I would like to thank my friends, who helped me to make my work more organized and well-stacked till the end. Next, I would thank Microsoft for developing such a wonderful tool as MS Word; it helped my work a lot to remain error-free. Last but clearly not the least, I would thank The Almighty for giving me the strength to complete my report on time.
Preface: I have made this report file on the topic RANDOM ACCESS MEMORY (RAM); I have tried my best to elucidate all the details relevant to the topic to be included in the report, and in the beginning I have tried to give a general view of the topic. My efforts and the wholehearted cooperation of each and everyone have ended on a successful note. I express my sincere gratitude to ………….., who assisted me throughout the preparation of this topic.
- The Parallel Persistent Memory Model
Guy E. Blelloch, Phillip B. Gibbons, Yan Gu, and Charles McGuffey (Carnegie Mellon University); Julian Shun (MIT CSAIL).
Abstract: We consider a parallel computational model, the Parallel Persistent Memory model, comprised of P processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault at any time (with bounded probability), and possibly restart. When a processor faults, all of its state and local ephemeral memory is lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are nearly as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual.
Introduction: In this paper, we consider a parallel computational model, the Parallel Persistent Memory (Parallel-PM) model, that consists of P processors, each with a fast local ephemeral memory of limited size M, and sharing a large slower persistent memory. As in the external memory model [4, 5], each processor runs a standard instruction set from its ephemeral memory and has instructions for transferring blocks of size B to and from the persistent memory. The cost of an algorithm is calculated based on the number of such transfers. A key difference, however, is that the model allows for individual processors to fault at any time. If a processor faults, all of its processor state and local ephemeral memory is lost, but the persistent memory …
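A toy simulation (entirely my own sketch, not from the paper) makes the fault model concrete: work accumulates in ephemeral state, progress only counts once it is committed to persistent memory, and a simulated fault rolls execution back to the last commit.

    #include <algorithm>
    #include <cstddef>
    #include <cstdlib>
    #include <iostream>
    #include <vector>

    // Toy illustration of fault/restart: sum an array in chunks. The running
    // total lives in "ephemeral" local state; after each chunk it is committed
    // to "persistent" memory. A simulated fault wipes the ephemeral state, and
    // the processor restarts from the last commit.
    int main() {
        std::vector<int> data(100, 1);
        struct Persistent { long sum = 0; std::size_t next = 0; } pmem; // survives faults

        while (pmem.next < data.size()) {
            long eph_sum = pmem.sum;   // restart: reload from persistent memory
            std::size_t i = pmem.next;
            std::size_t end = std::min(i + 10, data.size());  // one chunk of work
            for (; i < end; ++i) eph_sum += data[i];
            if (std::rand() % 4 == 0) {  // simulated fault: ephemeral state lost
                std::cout << "fault before commit; redoing chunk at " << pmem.next << '\n';
                continue;
            }
            pmem.sum = eph_sum;          // commit chunk result persistently
            pmem.next = end;
        }
        std::cout << "sum = " << pmem.sum << '\n';  // 100
    }

Because a chunk either commits fully or is redone from its start, the computation is idempotent under faults, which is the property algorithms in such a model must arrange for.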
- Implementing Operational Intelligence Using In-Memory Computing
William L. Bain, June 29, 2015.
Agenda:
• What is Operational Intelligence?
• Example: Tracking Set-Top Boxes
• Using an In-Memory Data Grid (IMDG) for Operational Intelligence: tracking and analyzing live data; comparison to Spark
• Implementing OI Using Data-Parallel Computing in an IMDG
• A Detailed OI Example in Financial Services, with code samples in Java
• Implementing MapReduce on an IMDG, and optimizing MapReduce for OI
• Integrating Operational and Business Intelligence
About ScaleOut Software: develops and markets in-memory data grids, software middleware for scaling application performance and providing operational intelligence using in-memory data storage and computing. Founder and CEO Dr. William Bain's career has focused on parallel computing (Bell Labs, Intel, Microsoft); three prior start-ups, the last acquired by Microsoft, with its product now shipping as Network Load Balancing in Windows Server. Ten years in the market; 400+ customers, 10,000+ servers.
Product portfolio: ScaleOut StateServer® (SOSS), an in-memory data grid for Windows and Linux that scales application performance, with industry-leading performance and ease of use; ScaleOut ComputeServer™, which adds operational intelligence for "live" data and comprehensive management tools; ScaleOut hServer®, a full Hadoop Map/Reduce engine (>40X faster*).
- Lenovo ThinkSystem DM Series Performance Guide
By Vincent Kao, Lenovo Press (http://lenovopress.com). Cover summary: introduces DM Series performance concepts; explains ONTAP performance fundamentals; explains controller read and write operations; provides basic infrastructure checkpoints.
Abstract: Lenovo® ThinkSystem™ DM Series Storage Arrays run ONTAP data management software, which gives customers unified storage across block-and-file workloads. This document covers the performance principles of the ONTAP operating system and best-practice recommendations. This paper is intended for technical specialists, sales specialists, sales engineers, and IT architects who want to learn more about performance tuning of the ThinkSystem DM Series storage array. It is recommended that users have basic knowledge of ONTAP operation. Contents: Introduction; ONTAP software introduction; Introduction to ONTAP performance; …
- CSS 133: Linked Lists, Stack, Queue
CSS 133, Computer Programming for Engineers II. Professor Pisan.
1 Doubly Linked List. What can we do that we could not do before? What is the disadvantage? How would removeNode change? Review code: https://github.com/pisan133/doubly-linkedlist. Lab: printBackward, findLargest, findSmallest, swapNodes (a sketch of printBackward follows this entry).
2 Extending Linked Lists. Linked list (flexible): ● insertAtFront, insertAtBack ● addBefore, addAfter ● contains ● empty ● removeNode, swap ● size. Stack (LIFO: last in, first out): ● push, pop, top. Queue (FIFO: first in, first out): ● push, pop, front (older versions: enqueue, dequeue, front).
3 Linked List vs. Array. Arrays: ● fixed size ● inserting anywhere other than at the end takes extra time ● random access to elements is possible (binary search for a sorted array) ● forward/backward traversal. Linked list: ● extendable ● has to allocate/deallocate memory ● flexible (need front and back pointers to insert at the end; need a doubly linked list to traverse forwards and backwards) ● no random access (maintaining a sorted list is easy; sorting is time-consuming; no binary search).
4 Stack. LIFO: last in, first out ● push, pop, top ● function call: push the return address onto the stack ● backtracking: remember the series of choices; when stuck, go back and change a choice (http://cs.lmu.edu/~ray/notes/backtracking/) ● Reverse Polish notation: 3 4 + ● efficient for computer math, no parentheses ● operators are applied to the operands at the top of the stack.
5 Implementation. Array: ● have a maximum stack size ● keep track of the index of the last …
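As a companion to the lab exercises, a minimal doubly linked list with printBackward might look like the following sketch (my own; the class layout in the course repository may differ):

    #include <iostream>

    // Minimal doubly linked list with the printBackward lab task. Each node
    // links both ways, so the list can be walked from either end.
    struct Node {
        int data;
        Node* prev = nullptr;
        Node* next = nullptr;
    };

    class DoublyLinkedList {
    public:
        ~DoublyLinkedList() {
            while (head_) { Node* n = head_->next; delete head_; head_ = n; }
        }
        void insertAtBack(int value) {
            Node* n = new Node{value};
            n->prev = tail_;
            if (tail_) tail_->next = n; else head_ = n;
            tail_ = n;
        }
        void printBackward() const {
            // Walk from tail to head using the prev pointers.
            for (Node* cur = tail_; cur; cur = cur->prev)
                std::cout << cur->data << ' ';
            std::cout << '\n';
        }
    private:
        Node* head_ = nullptr;
        Node* tail_ = nullptr;
    };

    int main() {
        DoublyLinkedList list;
        for (int v : {1, 2, 3, 4}) list.insertAtBack(v);
        list.printBackward();  // 4 3 2 1
    }

The extra prev pointer is exactly the trade-off the slides ask about: backward traversal and O(1) removal of a known node, at the cost of one more pointer per node to store and keep consistent.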
- Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures
A thesis by Alejandro Salinger, presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science. Waterloo, Ontario, Canada, 2013. © Alejandro Salinger 2013.
Abstract: Multi-core processors have become the dominant processor architecture, with 2, 4, and 8 cores on a chip being widely available and an increasing number of cores predicted for the future. In addition, the decreasing costs and increasing programmability of Graphics Processing Units (GPUs) have made these an accessible source of parallel processing power in general-purpose computing. Among the many research challenges that this scenario has raised are the fundamental problems related to theoretical modeling of computation in these architectures. In this thesis we study several aspects of computation in modern parallel architectures, from modeling of computation in multi-cores and heterogeneous platforms, to multi-core cache management strategies, through the proposal of an architecture that exploits bit-parallelism on thousands of bits. Observing that in practice multi-cores have a small number of cores, we propose a model for low-degree parallelism for these architectures. We argue that assuming a small number of processors (logarithmic in a problem's input size) simplifies the design of parallel algorithms.
- Random Access to Grammar-Compressed Strings and Trees
Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Published in SIAM Journal on Computing, 44(3), 513-539 (2015). DOI: 10.1137/130936889. Record available from DTU Orbit (orbit.dtu.dk).
Abstract: Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures (sometimes with slight reduction in efficiency) many of the popular compression schemes, including the Lempel-Ziv family, Run-Length Encoding, Byte-Pair Encoding, Sequitur, and Re-Pair.
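The core obstacle the paper addresses can be seen in a naive sketch (mine, not the paper's method): if each rule of a straight-line program caches the length of the string it derives, random access is a simple top-down descent, but it costs time proportional to the grammar's depth; the paper's contribution is achieving access in time logarithmic in the string length, independent of depth.

    #include <cassert>
    #include <iostream>
    #include <vector>

    // Naive random access to a grammar-compressed string. Each rule of a
    // straight-line program (SLP) is either a terminal character or the
    // concatenation of two earlier rules; len caches the derived length.
    // access() walks down the derivation in O(depth) time.
    struct Rule {
        char terminal = '\0';       // used when leaf == true
        bool leaf = false;
        int left = -1, right = -1;  // indices of sub-rules when leaf == false
        long long len = 0;          // length of the derived string
    };

    char access(const std::vector<Rule>& g, int rule, long long i) {
        const Rule& r = g[rule];
        assert(0 <= i && i < r.len);
        if (r.leaf) return r.terminal;
        return i < g[r.left].len ? access(g, r.left, i)
                                 : access(g, r.right, i - g[r.left].len);
    }

    int main() {
        // SLP for "abab": 0 -> 'a', 1 -> 'b', 2 -> 0 1 ("ab"), 3 -> 2 2 ("abab").
        std::vector<Rule> g = {
            {'a', true, -1, -1, 1},
            {'b', true, -1, -1, 1},
            {'\0', false, 0, 1, 2},
            {'\0', false, 2, 2, 4},
        };
        for (long long i = 0; i < 4; ++i) std::cout << access(g, 3, i);  // abab
        std::cout << '\n';
    }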