Survey of the Sequential and Parallel Models of Computation: Technical Report LUSY-2012/02
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
- Parallelizing the Data Cube
Parallelizing the Data Cube, by Todd Eavis. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Dalhousie University, Halifax, Nova Scotia, June 27, 2003. © Copyright by Todd Eavis, 2003. Dalhousie University, Department of Computer Science: the undersigned hereby certify that they have read and recommend to the Faculty of Graduate Studies for acceptance a thesis entitled "Parallelizing the Data Cube" by Todd Eavis in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Dated: June 27, 2003. External Examiner: Virendra Bhavsar. Research Supervisor: Andrew Rau-Chaplin. Examining Committee: Qigang Gao, Evangelos Milios. Author: Todd Eavis. Degree: Ph.D. Convocation: October 2003. Permission is herewith granted to Dalhousie University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions. The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author's written permission. The author attests that permission has been obtained for the use of any copyrighted material appearing in this thesis (other than brief excerpts requiring only proper acknowledgement in scholarly writing) and that all such use is clearly acknowledged. Dedication: to the two women in my life, Amber and Bailey. Contents: Table of Contents, List of Tables, List of Figures, Abstract, Acknowledgements, 1 Introduction, 1.1 Overview of Primary Research ...
- Elements of Programming (PDF with red/green hyperlinks)
Elements of Programming, by Alexander Stepanov and Paul McJones. (ab)c = a(bc). Semigroup Press, Palo Alto • Mountain View. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. Copyright © 2009 Pearson Education, Inc. Portions Copyright © 2019 Alexander Stepanov and Paul McJones. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions Department, please visit www.pearsoned.com/permissions/. ISBN-13: 978-0-578-22214-1. First printing, June 2019. Contents: Preface to Authors' Edition; Preface; 1 Foundations; 1.1 Categories of Ideas: Entity, Species, ...
- Chapter 4: Algorithm Analysis
Chapter 4: Algorithm Analysis. In this chapter we look at how to analyze the cost of algorithms in a way that is useful for comparing algorithms to determine which is likely to be better, at least for large input sizes, but that is abstract enough that we don't have to look at minute details of compilers and machine architectures. There are several parts to such an analysis. First, we need to abstract the cost away from details of the compiler or machine. Second, we have to decide on a concrete model that allows us to formally define the cost of an algorithm; since we are interested in parallel algorithms, the model needs to account for parallelism. Third, we need to understand how to analyze costs in this model. These are the topics of this chapter. 4.1 Abstracting Costs. When we analyze the cost of an algorithm formally, we need to be reasonably precise about the model in which we perform the analysis. Question 4.1: How precise should this model be? For example, would it help to know the exact running time of each instruction? The model can be arbitrarily precise, but it is often helpful to abstract away from some details, such as the exact running time of each (kind of) instruction. For example, the model can posit that each instruction takes a single step (unit of time), whether it is an addition, a division, or a memory-access operation. Some more advanced models, which we will not consider in this class, distinguish between different classes of instructions; for example, a model may require analyzing calculation (e.g., an addition or a multiplication) and communication (e.g., a memory read) separately.
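To make the unit-cost abstraction concrete, here is a minimal sketch (not code from the book; the function name and the choice of summing an array are assumptions of the example) in which every primitive operation is charged one abstract step, whether it is a read or an addition:

```python
# A minimal sketch of a unit-cost model: every primitive operation
# (a memory read, an addition) is charged one abstract "step",
# regardless of how long it takes on a real machine.

def sum_with_cost(xs):
    """Return (total, steps), where steps counts unit-cost operations."""
    steps = 0
    total = 0
    for x in xs:      # charge one step for reading the element
        steps += 1
        total += x    # charge one step for the addition
        steps += 1
    return total, steps

if __name__ == "__main__":
    total, steps = sum_with_cost(range(10))
    print(total, steps)  # 45 20 -- about 2n steps, i.e. O(n)
```

Under such a model the sum costs about 2n steps, i.e. O(n); a parallel cost model would additionally track the longest chain of dependent steps separately from this total, since that chain limits how much parallel hardware can help.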
- Undecidability of the Word Problem for Groups: The Point of View of Rewriting Theory
Università degli Studi Roma Tre, Facoltà di Scienze M.F.N., Corso di Laurea in Matematica. Master's thesis in Mathematics: Undecidability of the word problem for groups: the point of view of rewriting theory. Candidate: Matteo Acclavio. Advisors: Prof. Y. Lafont, Prof. L. Tortora de Falco. This thesis was written within the Curriculum Binazionale di Laurea Magistrale in Logica (binational Master's programme in Logic), with the support of the Università Italo-Francese (Vinci 2009 programme). Academic year 2011-2012, October 2012. "There once was a king, sitting on the sofa, he said to his maid, tell me a story, and the maid began: There once was a king, sitting on the sofa, he said to his maid, tell me a story, and the maid began: There once was a king, sitting on the sofa, ..." (Italian nursery rhyme). Even if you don't know this tale, it is easy to see that it could continue indefinitely, but it does not have to. If we now want to know whether the narration will ever finish, we face what is called an undecidable problem: we would need to listen to the tale until it finishes, but if it never does, we can never conclude that it won't stop, since it could always finish later. Such things make some people lose sleep, but children, bored, usually fall asleep. More precisely, a decision problem is a question about some data that admits a yes-or-no answer, for example: "is the integer n odd?" or "does the story of the king on the sofa have a happy ending?"
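To make the contrast concrete (this sketch is not from the thesis; the toy rewrite rules, the word representation, and the breadth-first strategy are assumptions chosen for illustration), deciding whether an integer is odd always halts with a yes/no answer, whereas naively searching for a sequence of rewritings that turns one word into another is only guaranteed to halt when such a sequence exists:

```python
from collections import deque

def is_odd(n: int) -> bool:
    """A decision procedure: always halts with a yes/no answer."""
    return n % 2 == 1

def words_equal(start: str, goal: str, rules, max_steps=None):
    """Semi-decision sketch: breadth-first search over rewritings.

    Returns True if some sequence of rewrites turns start into goal;
    may search forever (or until max_steps) when no such sequence exists."""
    seen = {start}
    queue = deque([start])
    steps = 0
    while queue:
        if max_steps is not None and steps >= max_steps:
            return None          # gave up: no answer either way
        steps += 1
        word = queue.popleft()
        if word == goal:
            return True
        for lhs, rhs in rules:   # apply each rule at every position
            i = word.find(lhs)
            while i != -1:
                new_word = word[:i] + rhs + word[i + len(lhs):]
                if new_word not in seen:
                    seen.add(new_word)
                    queue.append(new_word)
                i = word.find(lhs, i + 1)
    # Exhausting the queue only happens here because these toy rules
    # generate finitely many words from the start word.
    return False

if __name__ == "__main__":
    rules = [("ab", "ba"), ("ba", "ab")]   # toy rules: swap adjacent a and b
    print(is_odd(7))                                         # True
    print(words_equal("aab", "aba", rules, max_steps=1000))  # True
```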
- Introduction to Parallel Computing, 2nd Edition
732A54 Big Data Analytics: Introduction to Parallel Computing. Christoph Kessler, IDA, Linköping University.

Traditional use of parallel computing: large-scale HPC applications. High Performance Computing (HPC) involves much computational work (in FLOPs, floating-point operations) and often large data sets, e.g. climate simulations, particle physics, engineering, and sequence matching or protein docking in bioinformatics. Single-CPU computers, and even today's multicore processors, cannot provide such massive computation power, so LOTS of computers are aggregated into clusters (e.g. NSC Triolith); this requires scalable parallel algorithms that exploit multiple levels of parallelism.

More recent use of parallel computing: Big Data analytics applications. Big Data analytics is data-access intensive (disk I/O, memory accesses), typically over very large data sets (GB ... TB ... PB ... EB ...), with some computational work for combining/aggregating data, e.g. data-center applications, business analytics, click-stream analysis, scientific data analysis, and machine learning, often with soft real-time requirements on interactive queries. Single-CPU and multicore processors cannot provide such massive computation power and I/O bandwidth and capacity, so again LOTS of computers are aggregated into clusters; this requires scalable parallel algorithms, exploitation of multiple levels of parallelism, and fault tolerance.

HPC vs Big Data computing: both need parallel computing, the same kind of hardware (clusters of (multicore) servers), and the same OS family (Linux), but different programming models, languages, and tools. HPC applications use programming languages such as Fortran and C/C++ (and Python), programming models such as MPI and OpenMP, and scientific computing libraries such as BLAS. Big Data applications use programming languages such as Java, Scala, and Python, programming models such as MapReduce and Spark, and big-data storage/access such as HDFS. Let us start with the common basis: parallel computer architecture. (C. Kessler, IDA, Linköpings universitet.)
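As a rough illustration of the programming-model contrast drawn on these slides (this example is not from the course material; the function names are made up), here is a word count expressed in the MapReduce style using only the Python standard library, whereas an HPC code would typically perform the same aggregation with explicit message passing such as MPI reductions:

```python
from itertools import chain
from collections import defaultdict

# Map phase: each input record (a line of text) becomes (key, value) pairs.
def map_line(line):
    return [(word.lower(), 1) for word in line.split()]

# Shuffle phase: group all values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the values for each key.
def reduce_counts(groups):
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    lines = ["big data needs parallel computing",
             "parallel computing needs scalable algorithms"]
    pairs = list(chain.from_iterable(map_line(line) for line in lines))
    counts = reduce_counts(shuffle(pairs))
    print(counts["parallel"], counts["needs"])  # 2 2
```

In a real framework the map, shuffle, and reduce phases run distributed over a cluster and the shuffle moves data between machines; the point of the sketch is only the shape of the programming model.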
- CS411-2015F-14: Counter Machines & Recursive Functions
Automata Theory, CS411-2015F-14: Counter Machines & Recursive Functions. David Galles, Department of Computer Science, University of San Francisco.

14-0: Counter machines. Give a non-deterministic finite automaton a counter: it can increment the counter, decrement the counter, and check whether the counter is zero.

14-1: A counter machine is M = (K, Σ, Δ, s, F), where K is a set of states, Σ is the input alphabet, s ∈ K is the start state, F ⊂ K are the final states, and Δ ⊆ ((K × (Σ ∪ ε) × {zero, ¬zero}) × (K × {−1, 0, +1})). The machine accepts if it reaches the end of the string, ends in an accept state, and has an empty counter.

14-2: Do we have more power than a standard NFA?

14-3/14-4: A counter machine for the language aⁿbⁿ uses two states: in state 0, each a increments the counter (transitions (a, zero, +1) and (a, ¬zero, +1)); a b with a non-zero counter decrements it and moves to state 1 (transition (b, ¬zero, −1)); further b's keep decrementing via a (b, ¬zero, −1) self-loop on state 1.

14-5/14-6: A 2-counter machine for the language aⁿbⁿcⁿ is a straightforward extension: examine (and change) two counters instead of one. Each a increments the first counter ((a, zero, zero, +1, 0) and (a, ¬zero, zero, +1, 0)); each b decrements the first counter and increments the second ((b, ¬zero, ¬zero, −1, +1) and (b, ¬zero, zero, −1, +1)); each c decrements the second counter ((c, zero, ¬zero, 0, −1)).

14-7: Our counter machines only accept if the counter is zero. Does this give us any more power than a counter machine that accepts whenever the end of the string is reached in an accept state? That is, given ...
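Below is a hedged Python sketch of the one-counter machine for aⁿbⁿ described on these slides, written deterministically for simplicity; the state numbering and the encoding of transitions as if/elif branches are assumptions of the sketch, not the notation used in the course:

```python
def accepts_anbn(word: str) -> bool:
    """Simulate a one-counter machine for { a^n b^n : n >= 0 }.

    State 0: reading a's, incrementing the counter.
    State 1: reading b's, decrementing the counter.
    Accept iff the input is exhausted with the counter back at zero."""
    state = 0
    counter = 0
    for ch in word:
        if state == 0 and ch == "a":
            counter += 1              # (a, zero/~zero, +1): stay in state 0
        elif state in (0, 1) and ch == "b" and counter > 0:
            counter -= 1              # (b, ~zero, -1): move to / stay in state 1
            state = 1
        else:
            return False              # no transition applies
    return counter == 0               # accept only with an empty counter

if __name__ == "__main__":
    print(accepts_anbn("aaabbb"), accepts_anbn("aabbb"), accepts_anbn(""))
    # True False True
```

The two-counter version for aⁿbⁿcⁿ follows the same pattern with a second counter tracked alongside the first.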
- A Microcontroller Based Fan Speed Control Using PID Controller Theory
A Microcontroller Based Fan Speed Control Using PID Controller Theory. Thu Thu Zue Zin and Hlaing Thida Oo, University of Computer Studies, Yangon. thuthuzuezin@gmail.com.

Abstract: PIC control applications are widely used and popular in modern electronics. The aim of this paper is to design and construct a fan speed control system with a microcontroller. A PIC 16F877 and a phototransistor are used for the system. A user-defined speed can be selected via a keypad from several speeds. The motor is controlled by pulse-width modulation. The system can operate in normal mode and in timer mode; the delay time for timer mode is 10 seconds. The feedback signal from the sensor is processed by the proportional-integral-derivative equations to reproduce the desired duty cycle. (Figure 1, block diagram of the system: user-defined speed via keypad → PIC 16F877 microcontroller → duty cycle → driver circuit → motor, with a sensor feeding back to the microcontroller.)

1. Introduction. Motors are widely used in many applications, such as air conditioners, sliding doors, washing machines and control areas. A motor's output performance deviates because of tolerances, operating conditions, process error and measurement error. The pulse-width modulation control technique is better for controlling a DC motor than other techniques: PWM controls the speed of the motor without changing the voltage supplied to the motor, and the series of pulses defines the speed of the motor.

2. Background Theory. PID controller: a proportional-integral-derivative controller is a generic control-loop feedback mechanism widely used in industrial control systems. A PID controller attempts to correct the error between a measured process variable and a desired set point by calculating, and then outputting, a corrective action that can adjust ...
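As a hedged sketch of the control idea only (not the paper's PIC 16F877 firmware; the gains, sample time, and fan numbers are placeholder assumptions), a discrete PID update computes a correction from the speed error, which is then clamped to a PWM duty cycle:

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

def to_duty_cycle(correction, current_duty):
    """Adjust the PWM duty cycle and clamp it to 0..100 percent."""
    return max(0.0, min(100.0, current_duty + correction))

if __name__ == "__main__":
    pid = PID(kp=0.5, ki=0.1, kd=0.05, dt=0.1)     # placeholder gains
    correction = pid.update(setpoint=1200.0, measured=900.0)  # target vs measured RPM
    duty = to_duty_cycle(correction, current_duty=50.0)
    print(round(correction, 1), duty)  # 303.0 100.0 (large error -> duty clamped)
```

On the microcontroller this loop would run at a fixed sample period, with the measured speed coming from the phototransistor sensor and the duty cycle driving the PWM output.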
- What's Decidable About Hybrid Automata (UCB/ERL Memorandum M98/22)
Copyright © 1998, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

What's Decidable About Hybrid Automata, by Thomas A. Henzinger, Peter W. Kopke, Anuj Puri, and Pravin Varaiya. Memorandum No. UCB/ERL M98/22, 15 April 1998. Electronics Research Laboratory, College of Engineering, University of California, Berkeley 94720.

Abstract: Hybrid automata model systems with both digital and analog components, such as embedded control programs. Many verification tasks for such programs can be expressed as reachability problems for hybrid automata. By improving on previous decidability and undecidability results, we identify a precise boundary between decidability and undecidability for the reachability problem of hybrid automata. On the positive side, we give an (optimal) PSPACE reachability algorithm for the case of initialized rectangular automata, where all analog variables follow independent trajectories within piecewise-linear envelopes and are reinitialized whenever the envelope changes. Our algorithm is based on the construction of a timed automaton that contains all reachability information about a given initialized rectangular automaton.
- Two-Level Main Memory Co-Design: Multi-Threaded Algorithmic Primitives, Analysis, and Simulation
Two-Level Main Memory Co-Design: Multi-Threaded Algorithmic Primitives, Analysis, and Simulation. Michael A. Bender, Jonathan Berry, Simon D. Hammond, K. Scott Hemmert, Samuel McCauley, Branden Moore, Benjamin Moseley, Cynthia A. Phillips, David Resnick, and Arun Rodrigues (Stony Brook University; Sandia National Laboratories; Washington University in St. Louis; Tokutek, Inc.).

Abstract: A fundamental challenge for supercomputer architecture is that processors cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. As the number of cores per chip increases, and traditional DDR DRAM speeds stagnate, the problem is only getting worse. A variety of non-DDR 3D memory technologies (Wide I/O 2, HBM) offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. However, such a packaging scheme cannot contain sufficient memory capacity for a node. It seems likely that future systems will require at least two levels of main memory: high-bandwidth, low-power memory near the processor and low-bandwidth, high-capacity memory ...

... With memory close to the processor, there can be a higher number of connections between the memory and caches, enabling higher bandwidth than current technologies. While the term scratchpad is overloaded within the computer architecture field, we use it throughout this paper to describe a high-bandwidth, local memory that can be used as a temporary storage location. The scratchpad cannot replace DRAM entirely. Due to the physical constraints of adding the memory directly to the chip, the scratchpad cannot be as large as DRAM, although it will be much larger than cache, having gigabytes of storage capacity.
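To illustrate the kind of access pattern such a two-level memory rewards (a generic sketch under assumed names and block size, not an algorithm from the paper), data is processed in scratchpad-sized blocks: each block is staged into a small fast buffer, reused there, and written back in one bulk transfer, so the far memory sees only bulk traffic:

```python
def blocked_scale(far_memory, scale, block_size=4096):
    """Process a large array in scratchpad-sized blocks.

    Each block is staged into a small fast buffer (standing in for the
    scratchpad), processed there, and written back in one bulk transfer."""
    n = len(far_memory)
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        scratchpad = far_memory[start:end]              # one bulk read from far memory
        scratchpad = [scale * x for x in scratchpad]    # compute on the fast copy
        far_memory[start:end] = scratchpad              # one bulk write back

if __name__ == "__main__":
    data = list(range(10_000))
    blocked_scale(data, scale=2)
    print(data[0], data[9_999])  # 0 19998
```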
- Minimizing Writes in Parallel External Memory Search
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. Minimizing Writes in Parallel External Memory Search. Nathan R. Sturtevant and Matthew J. Rutherford, University of Denver, Denver, CO, USA.

Abstract: Recent research on external-memory search has shown that disks can be effectively used as secondary storage when performing large breadth-first searches. We introduce the Write-Minimizing Breadth-First Search (WMBFS) algorithm, which is designed to minimize the number of writes performed in an external-memory BFS. WMBFS is also designed to store the results of the BFS for later use. We present the results of a BFS on a single-agent version of Chinese Checkers and the Rubik's Cube edge cubes, state spaces with about 1 trillion states each. In evaluating against a comparable approach, WMBFS reduces the I/O for the ...

... speed in Chinese Checkers, and is 3.5 times faster in Rubik's Cube. WMBFS is motivated by several observations. First, most large-scale BFSs have been performed to verify the diameter (the number of unique depths at which states can be found) or the width (the maximum number of states at any particular depth) of a state space, after which any computed data is discarded. There are, however, many scenarios in which the results of a large BFS can be later used for other computation. In particular, a breadth-first search is the underlying technique for building pattern databases (PDBs), which are used as heuristics for search. PDBs require that the depth of each state in the BFS be stored for later usage.
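As a hedged, purely in-memory sketch of the property this excerpt relies on (not the WMBFS algorithm itself, which works out of core; the toy state space and successor function are assumptions), a breadth-first search can record the depth at which each state is first reached, and that depth table is exactly what a pattern database stores:

```python
from collections import deque

def bfs_depths(start, successors):
    """Return a dict mapping each reachable state to its BFS depth.

    The depth table is the raw material of a pattern-database heuristic,
    which is why the BFS results must be stored rather than discarded."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in depths:          # first visit is the shallowest depth
                depths[nxt] = depths[state] + 1
                queue.append(nxt)
    return depths

if __name__ == "__main__":
    # Toy state space: states 0..11 on a cycle, moves are +/-1 (mod 12).
    succ = lambda s: [(s + 1) % 12, (s - 1) % 12]
    table = bfs_depths(0, succ)
    print(table[6], table[11])  # 6 1
```

For trillion-state spaces like those in the paper, the table no longer fits in RAM, which is what forces the external-memory treatment and makes minimizing writes worthwhile.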
- Oblivious Network RAM and Leveraging Parallelism to Achieve Obliviousness
Oblivious Network RAM and Leveraging Parallelism to Achieve Obliviousness. Dana Dachman-Soled, Chang Liu, Charalampos Papamanthou, Elaine Shi, and Uzi Vishkin (University of Maryland Department of Electrical and Computer Engineering; University of Maryland Department of Computer Science; University of Maryland Institute for Advanced Computer Studies (UMIACS); Cornell University). January 12, 2017.

Abstract: Oblivious RAM (ORAM) is a cryptographic primitive that allows a trusted CPU to securely access untrusted memory, such that the access patterns reveal nothing about sensitive data. ORAM is known to have broad applications in secure processor design and secure multi-party computation for big data. Unfortunately, due to a logarithmic lower bound by Goldreich and Ostrovsky (Journal of the ACM, '96), ORAM is bound to incur a moderate cost in practice. In particular, with the latest developments in ORAM constructions, we are quickly approaching this limit, and the room for performance improvement is small. In this paper, we consider new models of computation in which the cost of obliviousness can be fundamentally reduced in comparison with the standard ORAM model. We propose the Oblivious Network RAM model of computation, where a CPU communicates with multiple memory banks, such that the adversary observes only which bank the CPU is communicating with, but not the address offset within each memory bank. In other words, obliviousness within each bank comes for free (either because the architecture prevents a malicious party from observing the address accessed within a bank, or because another solution is used to obfuscate memory accesses within each bank), and hence we only need to obfuscate communication patterns between the CPU and the memory banks.
- Turing Machines [Fa'16]
Models of Computation, Lecture 6: Turing Machines [Fa'16].

"Think globally, act locally." — Attributed to Patrick Geddes (c. 1915), among many others.
"We can only see a short distance ahead, but we can see plenty there that needs to be done." — Alan Turing, "Computing Machinery and Intelligence" (1950).
"Never worry about theory as long as the machinery does what it's supposed to do." — Robert Anson Heinlein, Waldo & Magic, Inc. (1950).
"It is a sobering thought that when Mozart was my age, he had been dead for two years." — Tom Lehrer, introduction to "Alma", That Was the Year That Was (1965).

6 Turing Machines. In 1936, a few months before his 24th birthday, Alan Turing launched computer science as a modern intellectual discipline. In a single remarkable paper, Turing provided the following results:
• A simple formal model of mechanical computation now known as Turing machines.
• A description of a single universal machine that can be used to compute any function computable by any other Turing machine.
• A proof that no Turing machine can solve the halting problem: given the formal description of an arbitrary Turing machine M, does M halt or run forever?
• A proof that no Turing machine can determine whether an arbitrary given proposition is provable from the axioms of first-order logic. This is Hilbert and Ackermann's famous Entscheidungsproblem ("decision problem").
• Compelling arguments that his machines can execute arbitrary "calculation by finite means".
Although Turing did not know it at the time, he was not the first to prove that the Entscheidungsproblem had no algorithmic solution.