Parallel Finger Search Structures (arXiv:1908.02741v4 [cs.DS])


Seth Gilbert (National University of Singapore), Wei Quan Lim (National University of Singapore)

Keywords: Parallel data structures, multithreading, dictionaries, comparison-based search, distribution-sensitive algorithms

Abstract

In this paper¹ we present two versions of a parallel finger structure FS on $p$ processors that supports searches, insertions and deletions, and has a finger at each end. This is to our knowledge the first implementation of a parallel search structure that is work-optimal with respect to the finger bound and yet has very good parallelism (within a factor of $O((\log p)^2)$ of optimal). We utilize an extended implicit batching framework that transparently facilitates the use of FS by any parallel program $P$ that is modelled by a dynamically generated DAG $D$ where each node is either a unit-time instruction or a call to FS.

The total work done by either version of FS is bounded by the finger bound $F_L$ (for some linearization $L$ of $D$), i.e. each operation on an item with distance $r$ from a finger takes $O(\log r + 1)$ amortized work. Running $P$ using the simpler version takes $O\left(\frac{T_1 + F_L}{p} + T_\infty + d \cdot \left((\log p)^2 + \log n\right)\right)$ time on a greedy scheduler, where $T_1, T_\infty$ are the size and span of $D$ respectively, and $n$ is the maximum number of items in FS, and $d$ is the maximum number of calls to FS along any path in $D$. Using the faster version, this is reduced to $O\left(\frac{T_1 + F_L}{p} + T_\infty + d \cdot (\log p)^2 + s_L\right)$ time, where $s_L$ is the weighted span of $D$ where each call to FS is weighted by its cost according to $F_L$. We also sketch how to extend FS to support a fixed number of movable fingers.

The data structures in our paper fit into the dynamic multithreading paradigm, and their performance bounds are directly composable with other data structures given in the same paradigm. Also, the results can be translated to practical implementations using work-stealing schedulers.
Acknowledgements

We would like to express our gratitude to our families and friends for their wholehearted support, to the kind reviewers who provided helpful feedback, and to all others who have given us valuable comments and advice. This research was supported in part by Singapore MOE AcRF Tier 1 grant T1 251RES1719.

arXiv:1908.02741v4 [cs.DS], 10 Oct 2019

1 Introduction

There has been much research on designing parallel programs and parallel data structures. The dynamic multithreading paradigm (see [14], chap. 27) is one common parallel programming model, in which algorithmic parallelism is expressed through parallel programming primitives such as fork/join (also spawn/sync), parallel loops and synchronized methods, but the program cannot stipulate any mapping from subcomputations to processors. This is the case with many parallel languages and libraries, such as Cilk dialects [20, 25], Intel TBB [34], Microsoft Task Parallel Library [37] and subsets of OpenMP [31].

Recently, Agrawal et al. [3] introduced the exciting modular design approach of implicit batching, in which the programmer writes a multithreaded parallel program that uses a black box data structure, treating calls to the data structure as basic operations, and also provides a data structure that supports batched operations. Given these, the runtime system automatically combines these two components together, buffering data structure operations generated by the program, and executing them in batches on the data structure.

This idea was extended in [4] to data structures that do not process only one batch at a time (to improve parallelism). In this extended implicit batching framework, the runtime system not only holds the data structure operations in a parallel buffer, to form the next input batch, but also notifies the data structure on receiving the first operation in each batch. Independently, the data structure can at any point flush the parallel buffer to get the next batch.
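The interaction between the program, the parallel buffer and the data structure described above can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: the class names `ParallelBuffer` and `BatchedMap`, the callback, and the tuple encoding of operations are hypothetical and are not taken from [3] or [4], the batch is applied sequentially, and calls do not block on their results as they would in the actual framework.

```python
import threading


class ParallelBuffer:
    """Hypothetical stand-in for the runtime system's parallel buffer."""

    def __init__(self, on_first_operation):
        self._lock = threading.Lock()
        self._pending = []
        # Assumed hook: invoked when a new batch receives its first operation.
        self._on_first_operation = on_first_operation

    def submit(self, operation):
        """Called by the program: buffer one data structure operation."""
        with self._lock:
            is_first = not self._pending
            self._pending.append(operation)
        if is_first:
            self._on_first_operation()

    def flush(self):
        """Called by the data structure: take everything buffered so far."""
        with self._lock:
            batch, self._pending = self._pending, []
        return batch


class BatchedMap:
    """Toy batched dictionary; a real structure would process batches in parallel."""

    def __init__(self):
        self.items = {}
        self.notifications = 0
        self.buffer = ParallelBuffer(self._notified)

    def _notified(self):
        self.notifications += 1

    def process_next_batch(self):
        """Flush the buffer and apply the batch, returning one result per operation."""
        results = []
        for kind, key, value in self.buffer.flush():
            if kind == "insert":
                self.items[key] = value
                results.append(None)
            elif kind == "delete":
                results.append(self.items.pop(key, None))
            else:  # "search"
                results.append(self.items.get(key))
        return results


m = BatchedMap()
m.buffer.submit(("insert", 3, "c"))
m.buffer.submit(("search", 3, None))
print(m.process_next_batch())  # [None, 'c']
print(m.notifications)         # 1
```

Because `flush` can be called at any time, the data structure in this sketch is free to fetch a new batch before it has finished with the previous one, which is the property the extended framework relies on.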
This framework nicely supports pipelined batched data structures, since the data structure can decide when it is ready to get the next input batch from the parallel buffer, which may be even before it has finished processing the previous batch. Furthermore, this framework makes it easy for us to build composable parallel algorithms and data structures with composable performance bounds. This is demonstrated by both the parallel working-set map in [4] and the parallel finger structure in this paper.

¹ This is the full version of a paper published in the 33rd International Symposium on Distributed Computing (DISC 2019). It is posted here for your personal or classroom use. Not for redistribution. © 2019 Copyright is held by the owner/author(s).

Finger Structures

The map (or dictionary) data structure, which supports inserts, deletes and searches/updates, collectively referred to as accesses, comes in many different kinds. A common implementation of a map is a balanced binary search tree such as an AVL tree or a red-black tree, which (in the comparison model) takes $O(\log n)$ worst-case cost per access for a tree with $n$ items. There are also maps such as splay trees [36] that have amortized rather than worst-case performance bounds.

A finger structure is a special kind of map that comes with a fixed finger at each end and a (fixed) number of movable fingers, each of which has a key (possibly $-\infty$ or $\infty$ or between adjacent items in the map) that determines its position in the map, such that accessing items nearer the fingers is cheaper. For instance, the finger tree [27] was designed to have the finger property in the worst case; it takes $O(\log r + 1)$ steps per operation with finger distance $r$ (Definition 1), so its total cost satisfies the finger bound (Definition 2).

Definition 1 (Finger Distance).
Define the finger distance of accessing an item $x$ on a finger structure $M$ to be the number of items from $x$ to the nearest finger in $M$ (including $x$), and the finger distance of moving a finger to be the distance moved.

Definition 2 (Finger Bound). Given any sequence $L$ of $N$ operations on a finger structure $M$, let $F_L$ denote the finger bound for $L$, defined by $F_L = \sum_{i=1}^{N} (\log r_i + 1)$, where $r_i$ is the finger distance of the $i$-th operation in $L$ when $L$ is performed on $M$.

Main Results

We present in this paper, to the best of our knowledge, the first parallel finger structure. In particular, we design two parallel maps that are work-optimal with respect to the finger bound $F_L$ (i.e. each takes $O(F_L)$ work) for some linearization $L$ of the operations (that is consistent with the results), while having very good parallelism. (We assume that each key comparison takes $O(1)$ steps.)

These parallel finger structures can be used by any parallel program $P$, whose actual execution is captured by a program DAG $D$, where each node is an instruction that finishes in $O(1)$ time or a call to the finger structure $M$, called an $M$-call, that blocks until the result is returned, and each edge represents a dependency due to the parallel programming primitives.

The first design, called FS1, is a simpler data structure that processes operations one batch at a time.

Theorem 3 (FS1 Performance). If $P$ uses FS1 (as $M$), then its running time on $p$ processes using any greedy scheduler (i.e. at each step, as many tasks are executed as are available, up to $p$) is $O\left(\frac{T_1 + F_L}{p} + T_\infty + d \cdot \left((\log p)^2 + \log n\right)\right)$ for some linearization $L$ of $M$-calls in $D$, where $T_1$ is the number of nodes in $D$, and $T_\infty$ is the number of nodes on the longest path in $D$, and $d$ is the maximum number of $M$-calls on any path in $D$, and $n$ is the maximum size of $M$.

Notice that if $M$ is an ideal concurrent finger structure (i.e.
one that takes $O(F_L)$ work), then running $P$ using $M$ on $p$ processors according to the linearization $L$ takes $\Omega(T_{opt})$ worst-case time where $T_{opt} = \frac{T_1 + F_L}{p} + T_\infty$. Thus FS1 gives an essentially optimal time bound except for the 'span term' $d \cdot \left((\log p)^2 + \log n\right)$, which adds $O\left((\log p)^2 + \log n\right)$ time per FS1-call along some path in $D$.

The second design, called FS2, uses a complex internal pipeline to reduce the 'span term'.

Theorem 4 (FS2 Performance). If $P$ uses FS2, then its running time on $p$ processes using any greedy scheduler is $O\left(\frac{T_1 + F_L}{p} + T_\infty + d \cdot (\log p)^2 + s_L\right)$ for some linearization $L$ of $M$-calls in $D$, where $d$ is the maximum number of FS2-calls on any path in $D$, and $s_L$ is the weighted span of $D$ where each FS2-call is weighted by its cost according to $F_L$, except that each finger-move operation is weighted by $\log n$. Specifically, each FS2-call that is an access with finger distance $r$ according to $L$ is given the weight $\log r + 1$, and each FS2-call that is a finger-move is given the weight $\log n$, and $s_L$ is the maximum weight of any path in $D$.

Thus, ignoring finger-move operations, FS2 gives an essentially optimal time bound up to an extra $O((\log p)^2)$ time per FS2-call along some path in $D$.

We shall first focus on basic finger structures with just one fixed finger at each end, since we can implement the general finger structure with $f$ movable fingers by essentially concatenating $(f + 1)$ basic finger structures, as we shall explain later in Section 6. We will also discuss later in Section 7 how to adapt our results for work-stealing schedulers that can actually be provided by a real runtime system.
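Definitions 1 and 2 can be made concrete with a short calculation. The Python sketch below assumes a static sorted list with one fixed finger at each end, a sequence consisting only of searches for keys that are present, and base-2 logarithms. The function names `finger_distance` and `finger_bound` are illustrative and do not come from the paper.

```python
import math
from bisect import bisect_left


def finger_distance(sorted_keys, key):
    """Number of items from `key` to the nearer end finger, counting `key` itself."""
    i = bisect_left(sorted_keys, key)
    if i == len(sorted_keys) or sorted_keys[i] != key:
        raise KeyError(key)
    # i + 1 items up to the left end, len - i items up to the right end.
    return min(i + 1, len(sorted_keys) - i)


def finger_bound(sorted_keys, accesses):
    """Sum of (log2 r_i + 1) over the accesses, as in Definition 2 (base 2 assumed)."""
    return sum(math.log2(finger_distance(sorted_keys, k)) + 1 for k in accesses)


keys = list(range(1, 17))                # 16 items, fingers at both ends
print(finger_distance(keys, 8))          # 8
print(finger_bound(keys, [1, 16, 8]))    # 6.0
```

In this example the two accesses next to the fingers cost 1 each and the access in the middle costs $\log_2 8 + 1 = 4$, for a total of 6. Charging every access $\log_2 16 + 1 = 5$, as a bound that depends only on $n$ would, gives 15 for the same three searches.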