From Dataflow to Multithreading

ASYNCHRONY IN PARALLEL COMPUTING: FROM DATAFLOW TO MULTITHREADING

JURIJ ŠILC†, BORUT ROBIČ‡, AND THEO UNGERER§

* Received by the editors January 14, 1997; accepted for publication (in revised form) September 9, 1997. This work was partially supported by the Ministry of Science and Technology of the Republic of Slovenia under grants J2-7648 and J2-8697.
† Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia ([email protected]).
‡ Faculty of Computer and Information Science, University of Ljubljana, Tržaška cesta 25, SI-1000 Ljubljana, Slovenia, and Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia ([email protected]).
§ Department of Computer Design and Fault Tolerance, University of Karlsruhe, P.O. Box 6980, D-76128 Karlsruhe, Germany ([email protected]).

Abstract. The paper presents an overview of the parallel computing models, architectures, and research projects that are based on asynchronous instruction scheduling. It starts with pure dataflow computing models and presents a historical development of several ideas (i.e., single-token-per-arc dataflow, tagged-token dataflow, explicit token store, threaded dataflow, large-grain dataflow, RISC dataflow, cycle-by-cycle interleaved multithreading, block-interleaved multithreading, simultaneous multithreading) that resulted in modern multithreaded superscalar processors. The paper shows that unification of the von Neumann and dataflow models is possible and preferable to treating them as two unrelated, orthogonal computing paradigms. Today's dataflow research incorporates more explicit notions of state into the architecture, and von Neumann models use many dataflow techniques to improve the latency-hiding aspects of modern multithreaded systems.

Key words. parallel computer architectures, data-driven computing, multithreaded computing, static dataflow, tagged-token dataflow, threaded dataflow, cycle-by-cycle interleaving, block interleaving, multithreaded superscalar

AMS subject classifications. 68-02, 68M05, 68Q10

1. Introduction. There are many problems that require enormous computational capacity to solve, and therefore present opportunities for high-performance parallel computing. There are also a number of well-known hardware organization techniques for exploiting parallelism. In this paper we give an overview of the computers where parallelism is achieved by executing multiple asynchronous threads of instructions concurrently.

For sequential computers the principal execution model is the well-known von Neumann model, which consists of a sequential process running in a linear address space. The amount of concurrency available is relatively small [228]. Computers having more than one processor may be categorized in two types, SIMD or MIMD [82], and allow several multiprocessor von Neumann program execution models, such as shared memory with actions coordinated by synchronization operations (e.g., P and V semaphore commands [75]), or distributed memory with actions coordinated by message-passing facilities [118, 135] (e.g., explicit operating system calls). A MIMD computer in principle will have a different program running on every processor. This makes for an extremely complex programming environment. A frequent program execution model for MIMD computers is the single-program multiple-data (SPMD) model [62], where the same program is run on all processors, but the execution may follow different paths through the program according to the processor's identity.
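To make the SPMD idea concrete, the following minimal sketch (an illustration using Python's multiprocessing module, not something taken from the paper; the program function and its coordinator/worker split are assumptions made for the example) runs the same program on several processes, each branching on its own identity.

    # Minimal SPMD sketch: every worker executes the same program and branches
    # on its own rank. Python's multiprocessing stands in for a multiprocessor;
    # the coordinator/worker roles are invented for illustration.
    from multiprocessing import Process

    def program(rank, nprocs):
        if rank == 0:
            print(f"rank {rank}: coordinating {nprocs} processes")
        else:
            print(f"rank {rank}: processing data partition {rank}")

    if __name__ == "__main__":
        nprocs = 4
        procs = [Process(target=program, args=(r, nprocs)) for r in range(nprocs)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()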
In multiprocessor systems, two main issues must be addressed: memory latency, which is the time that elapses between issuing a memory request and receiving the corresponding response, and synchronization, which is the need to enforce the ordering of instruction executions according to their data dependencies. The two issues cannot be properly resolved in a von Neumann context, since connecting von Neumann processors into a very high-speed general-purpose computer also brings bottleneck problems [20]. As an alternative, the dataflow model was introduced as a radically new model capable of properly satisfying the two needs [8, 17, 66, 88].

Dataflow models use dataflow program graphs to represent the flow of data and control. Synchronization is achieved by a requirement that two or more specified events occur before an instruction is eligible for execution [191]. For example, in the graph/heap model [65] an instruction is enabled by the presence of simple tokens (i.e., data packets) on the arcs of copies of the dataflow graph, whereas the unraveling interpreter model (U-interpreter) [19] enables instructions by matching tags that identify the graph instance to which a token belongs. In [158] a detailed implementation of an unraveling dataflow interpreter of the Irvine Dataflow (Id) programming language is proposed. Dataflow models have also guided the design of several multithreaded computers [70].

Unfortunately, direct implementation of computers based on a pure dataflow model has been found to be an arduous task. For this reason, the impact of the convergence of dataflow and control flow was investigated [16, 125, 127, 141, 169]. In particular, Nikhil and Arvind posed the following question: What can a von Neumann processor borrow from dataflow to become more suitable for a multiprocessor? They offered the answer in terms of the P-RISC (Parallel RISC) programming model [164]. The model is based on a simple, RISC-like instruction set extended with three instructions that give a von Neumann processor a fine-grained dataflow capability [16]. It uses explicit fork and join commands. Based on the P-RISC programming model, a multithreaded Id programming language has been implemented [162, 163]. Another model designed to support nested functional languages using sequential threads of instructions is the Multilisp model [114, 115, 116, 146]. Multilisp is an extension of the Lisp dialect Scheme with additional operators and semantics to deal with parallel execution. The principal language extension provided by Multilisp is future(x). Upon executing future(x), an immediate undetermined value is returned. The computation of x occurs in parallel and the result replaces the undetermined value when complete. Of course, any use of the result would block the parent process until the computation is finished.
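As a concrete illustration of the future semantics just described, the sketch below uses Python's concurrent.futures as a stand-in for Multilisp's future construct (the example, including the slow_square helper, is an assumption for illustration and is not taken from the paper): submitting the computation returns immediately with a placeholder, and any use of the result blocks until the value is available.

    # Illustrative sketch only: Python futures stand in for Multilisp's future(x);
    # slow_square is a made-up placeholder for some long-running computation x.
    from concurrent.futures import ThreadPoolExecutor
    import time

    def slow_square(n):
        time.sleep(1.0)                     # pretend x takes a while to compute
        return n * n

    with ThreadPoolExecutor() as pool:
        fut = pool.submit(slow_square, 7)   # returns at once with an "undetermined" value
        print(fut.done())                   # typically False: computation still running
        print(fut.result())                 # touching the result blocks until ready -> 49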
The incorporation of conventional control-flow thread execution into the dataflow approach resulted in the multithreaded computer architecture, which is one of the most promising and exciting avenues for the exploitation of parallelism [10, 43, 87, 128, 151, 159].

2. The day before yesterday: Pure dataflow. The fundamental principles of dataflow were developed by Jack Dennis [65] in the early 1970s. The dataflow model [8, 68, 93, 191] avoids the two features of the von Neumann model, the program counter and the global updatable store, which become bottlenecks in exploiting parallelism [27].

The computational rule, also known as the firing rule of the dataflow model, specifies the condition for the execution of an instruction. The basic instruction firing rule, common to all dataflow systems, is as follows: An instruction is said to be executable when all the input operands that are necessary for its execution are available to it. The instruction for which this condition is satisfied is said to be fired. The effect of firing an instruction is the consumption of its input values and the generation of output values.

Due to the above rule the model is asynchronous. It is also self-scheduling, since instruction sequencing is constrained only by data dependencies. Thus, the flow of control is the same as the flow of data among the various instructions. As a result, a dataflow program can be represented as a directed graph consisting of named nodes, which represent instructions, and arcs, which represent data dependencies among instructions [64, 138] (Fig. 2.1a,b). Data values propagate along the arcs in the form of data packets, called tokens. The two important characteristics of dataflow graphs are functionality and composability. Functionality means that evaluation of a graph is equivalent to evaluation of a mathematical function on the same input values. Composability means that graphs can be combined to form new graphs.

In a dataflow architecture the program execution is in terms of receiving, processing, and sending out tokens containing some data and a tag. Dependencies between data are translated into tag matching and transformation, while processing occurs when a set of matched tokens arrives at the execution unit. The instruction, which has to be fetched from the instruction store according to the tag information, contains information about what to do with the data and how to transform the tags. The matching and execution units are connected by an asynchronous pipeline, with queues added to smooth out load variations [136]. Some form of associative memory is required to support token matching. It can be a real memory with associative access, a simulated memory based on hashing, or a direct matched memory. Each solution has its proponents, but none is absolutely suitable.

Due to its elegance and simplicity, the pure dataflow model has been the subject of many research efforts. Since the early 1970s, a number of dataflow computer prototypes have been built and evaluated, and different designs and compiling techniques have been simulated [17, 66, 88, 151, 168, 185, 196, 205, 208, 216, 219]. Clearly, an architecture supporting the execution of dataflow graphs should support the flow of data.
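To see the firing rule in action, the following minimal interpreter (an illustrative sketch, not a model of any machine surveyed in the paper; the Node class and the example graph are invented for the example) fires a node as soon as tokens are present on all of its input arcs, consumes those tokens, and propagates result tokens along the output arcs.

    # Minimal, illustrative firing-rule interpreter. Node names and the example
    # graph are invented; no specific dataflow machine is being modeled.
    from collections import deque

    class Node:
        def __init__(self, name, op, arity):
            self.name, self.op, self.arity = name, op, arity
            self.tokens = {}      # input port -> waiting token (value)
            self.outputs = []     # list of (successor node, successor input port)

    def run(initial_tokens):
        ready = deque(initial_tokens)            # (node, port, value) triples
        while ready:
            node, port, value = ready.popleft()
            node.tokens[port] = value
            if len(node.tokens) == node.arity:   # firing rule: all operands present
                result = node.op(*(node.tokens[p] for p in sorted(node.tokens)))
                node.tokens.clear()              # firing consumes the input tokens
                print(f"fired {node.name} -> {result}")
                for succ, succ_port in node.outputs:
                    ready.append((succ, succ_port, result))   # tokens flow along arcs

    # Example graph for (a + b) * (a - b) with a = 5, b = 3:
    add = Node("add", lambda x, y: x + y, 2)
    sub = Node("sub", lambda x, y: x - y, 2)
    mul = Node("mul", lambda x, y: x * y, 2)
    add.outputs = [(mul, 0)]
    sub.outputs = [(mul, 1)]
    run([(add, 0, 5), (add, 1, 3), (sub, 0, 5), (sub, 1, 3)])   # fires add, sub, then mul -> 16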