The Abacus Processor Architecture

Abacus – A Processor Family for Education

Nikita Bhardwaj, Maximilian Senftleben, and Klaus Schneider
Department of Computer Science, TU Kaiserslautern
http://es.cs.uni-kl.de

ABSTRACT

We present the Abacus processor family and its compiler framework for the MiniC language that we have developed for teaching processor architectures. Besides typical RISC instructions, Abacus also offers instructions for vector processing and thread synchronization, but it is still small enough to be discussed completely in a class. With reasonable effort, students can therefore modify given implementations of micro-architectures and code generators to deepen their understanding of the theoretical concepts. Moreover, using benchmark examples, they can explore the quantitative aspects of their optimizations. In contrast to commercial and other educational processors, we provide many micro-architectures that are each based on a single pure concept rather than on a combination of concepts, and we provide code generators that contain the core ideas of some architectures.

1. INTRODUCTION

Even though instruction sets of processors have quite a long lifetime, it is more worthwhile to study the concepts and core ideas of processor architectures than concrete instances of processors. To that end, processor architectures are classically categorized in many ways: the famous SISD, SIMD, MISD, and MIMD¹ classes of Flynn [17] relate the number of instruction streams to the number of data streams. Other classic classification criteria are RISC or CISC instruction sets [23, 47], while yet others consider the micro-architecture of the processors rather than their instruction sets.

Today, one can further distinguish classes of processor architectures by dynamic or static instruction scheduling [15, 16, 21, 29, 60], use of scalar or vector data operands [25], use of subword-level parallelism as in multimedia or arithmetic instructions, general-purpose or application-specific instruction sets [35, 62], reconfigurable hardware units [18, 19], or their use in desktop, server, or embedded systems.

Moreover, it is well known that compilers and processor architectures have to support each other, which was one of the insights that led to RISC architectures [23, 47]. While some processor architectures, like the dynamically organized out-of-order execution schemes [29, 60], are quite independent of the compiler, others like VLIW [16], EDGE [8], and TTA [26] processors strongly depend on their compilers. Studying these architectures without considering the code generators of their compilers is not reasonable.

Last but not least, the impact of the memory hierarchy is a primary concern in the interplay between compilers and processors. Students must experience its enormous effect, which can lead to performance differences of up to a factor of 1000 [59]. For multicore processors, students moreover have to understand the difficulties of cache coherence and weak memory consistency [1, 58]. All of these aspects can only be understood by actively working with compilers and processors, or at least with simulators of processors.

To keep things simple, many microprocessors and simulators have been developed for educational use. The most popular ones are probably the DLX processor of Hennessy and Patterson [24] and Knuth’s MIX and MMIX processors [31], which are both supported by simulators [32]. However, both are outdated and lack simple models of micro-architectures and compilers. The ‘little computer LC3’ [46] offers simulators and compilers for a very simple processor, but it does not provide different micro-architectures and is therefore only useful for Bachelor students. The same is the case for the ‘little man’s computer LMC’ [67]. In contrast, most open micro-architecture models² like OpenRISC, OpenSPARC, LEON, and others typically lack simple compilers, and these processors are also too complex for teaching. We address further related work in Section 5 of the paper, but a complete survey would require a separate paper of its own.

In this paper, we therefore present the Abacus processor family and its compiler framework for the MiniC language that we developed for educational use in processor architecture classes. Besides typical RISC instructions, the Abacus instruction set also offers instructions for vector processing and thread synchronization and can therefore be used to demonstrate hot topics of processor architectures. Nevertheless, the instruction set is still small enough³ that we can offer complete implementations of micro-architectures that support special architectural concepts like pipelined execution, out-of-order execution, or vector pipeline chaining. Students can refine these implementations, e.g., by adding data conflict resolution techniques to pipelines or branch predictors to superscalar machines. Analogously, we can offer a compiler library whose code generators can be adapted by the students to study register allocation, instruction selection schemes, dataflow analysis, and refined compilation techniques like trace and modulo scheduling for statically organized processors like VLIW [16], EDGE [8], and TTA [26] processors.

Using the different Abacus micro-architectures and code generators, our students can deepen their knowledge about compilers and processors, which is a mandatory competence of today’s embedded system designers. Using benchmark examples, they can moreover explore the quantitative impact of their designs. Neither the qualitative nor the quantitative aspects can be taught with commercial processors. With this intention in mind, we introduce the different Abacus micro-architectures and code generators to our students, who learn the following in detail:

• general compiler techniques like intermediate representation of programs as control-flow graphs and dataflow analysis to compute, e.g., the liveness of variables
• general instruction selection, e.g., for generating code for expressions [55] or dataflow graphs
• register allocation schemes like Chaitin-Briggs graph coloring [7, 22] and the linear scan method [49]
• pipelined execution with conflict resolution schemes, either by the compiler or by the processor using stalling or result forwarding
• dynamic branch prediction techniques
• dynamic instruction scheduling by processors using reservation stations and reordering
• static instruction scheduling by compilers using trace and modulo scheduling
• pipeline techniques for vector processors like convoys and pipeline chaining
• code vectorization of scalar code for vector processors including strip-mining and other compiler techniques like tiling

¹ Abbreviating single/multiple instruction vs. single/multiple data.
² See http://opencores.org
³ See http://es.cs.uni-kl.de/research/architectures for more details and a reference card.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

WESE’14, October 12-17, 2014, New Delhi, India
© 2014 ACM. ISBN 978-1-4503-3090-9/14/10 ...$15.00.
DOI: http://dx.doi.org/10.1145/2829957.2829959

However, considering high-level programming languages with features like object orientation, higher-order functions, or polymorphic types is far too complicated for a course on processor architectures, where only the final code generators of the compilers are of interest. To that end, we could, in principle, consider open source compilers like the GNU compiler collection⁴ or SUIF [65], or code generation from the byte code language of a virtual machine like LLVM [33, 34]. However, these are still too big to allow our students to concentrate on the main problems of code generation for the particular architectures.

We have therefore developed a simple programming language called MiniC that is described in this section. For this MiniC language, we have developed a compiler framework including a typical frontend (lexer, parser, type checker) with various backends (intermediate code, code optimization, register allocators, instruction selection). This framework is made available to our students so that they can concentrate on the code generation part for the considered processor architectures.

2.1 The MiniC Language

MiniC has been developed as a minimal multi-threaded programming language that can be easily compiled into intermediate formats like control-flow graphs. Students can therefore easily work with these data structures to implement their own code optimizers and generators.

MiniC has a minimal set of data types, namely booleans (bool), unsigned/signed integers of machine width (nat and int), arrays, and tuples. For these data types, there are the usual boolean and arithmetic operations that are needed to implement reasonable benchmark examples. The language is strictly typed and allows programmers to convert between types using cast expressions like (nat) τ, which force the type checker to give expression τ the type nat.

Statements of MiniC are also standard except
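A cast such as (nat) τ changes only the type that the checker assigns. The operand expression itself is left unchanged. The Python sketch below shows one way a strictly typed checker for a MiniC-like subset could handle casts. Everything in it is a hypothetical assumption for illustration: the AST classes `Lit`, `Add`, and `Cast`, the function `check`, the exception `MiniCTypeError`, the rule that `+` needs two operands of the same integer type, and the rule that casts are allowed only among `bool`, `nat`, and `int`. None of these is taken from the actual MiniC framework.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST for a tiny MiniC-like expression subset;
# not the data structures of the actual MiniC framework.
@dataclass
class Lit:
    value: Union[bool, int]
    ty: str  # "bool", "nat", or "int"

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

@dataclass
class Cast:
    target: str  # the type written in the cast, e.g. "nat" in (nat) e
    expr: "Expr"

Expr = Union[Lit, Add, Cast]

# Assumption: casts are permitted only between these scalar types.
SCALAR_TYPES = {"bool", "nat", "int"}


class MiniCTypeError(Exception):
    pass


def check(e: Expr) -> str:
    """Return the static type of e, or raise MiniCTypeError."""
    if isinstance(e, Lit):
        return e.ty
    if isinstance(e, Add):
        lt, rt = check(e.left), check(e.right)
        # Strict typing (assumed): no implicit nat/int conversion.
        if lt != rt or lt not in ("nat", "int"):
            raise MiniCTypeError(f"cannot add {lt} and {rt}")
        return lt
    if isinstance(e, Cast):
        inner = check(e.expr)  # the operand itself must be well typed
        if inner not in SCALAR_TYPES or e.target not in SCALAR_TYPES:
            raise MiniCTypeError(f"cannot cast {inner} to {e.target}")
        return e.target  # the cast overrides the operand's type
    raise MiniCTypeError(f"unknown expression {e!r}")


if __name__ == "__main__":
    n, i = Lit(3, "nat"), Lit(-2, "int")
    print(check(Add(n, Cast("nat", i))))  # prints: nat
    try:
        check(Add(n, i))
    except MiniCTypeError as err:
        print(err)  # prints: cannot add nat and int
```

Under strict typing, the explicit cast is what makes `3 + (nat) -2` well typed, while `3 + -2` with mixed `nat`/`int` operands is rejected. The sketch checks types only. It does not model how a backend would reinterpret the machine-width bit pattern at run time.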
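The first topic in the introduction's list is computing the liveness of variables by dataflow analysis on control-flow graphs. It rests on the standard backward equations:

- live_out(b) is the union of live_in(s) over all successors s of b.
- live_in(b) = use(b) ∪ (live_out(b) \ def(b)).

Below is a minimal Python sketch of the round-robin fixpoint iteration. The `Block` summary with precomputed `use`/`defs` sets, the function `liveness`, and the example loop in `example_cfg` are hypothetical illustrations. They are not the data structures of the Abacus/MiniC compiler framework.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical basic-block summary (not the MiniC framework's API):
    # use  = variables read before any write in the block,
    # defs = variables written in the block, succs = successor labels.
    use: set[str]
    defs: set[str]
    succs: list[str] = field(default_factory=list)


def liveness(cfg: dict[str, Block]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Solve the backward liveness equations by round-robin iteration."""
    live_in: dict[str, set[str]] = {b: set() for b in cfg}
    live_out: dict[str, set[str]] = {b: set() for b in cfg}
    changed = True
    while changed:  # terminates: the sets only grow and are bounded
        changed = False
        for name, blk in cfg.items():
            # live_out(b) = union of live_in(s) over all successors s
            out = set().union(*(live_in[s] for s in blk.succs))
            # live_in(b) = use(b) | (live_out(b) - def(b))
            inn = blk.use | (out - blk.defs)
            if out != live_out[name] or inn != live_in[name]:
                live_out[name], live_in[name] = out, inn
                changed = True
    return live_in, live_out


def example_cfg() -> dict[str, Block]:
    # Sketch of: i = 0; s = 0; while (i < n) { s = s + i; i = i + 1; } return s;
    return {
        "entry": Block(use=set(), defs={"i", "s"}, succs=["loop"]),
        "loop": Block(use={"i", "n"}, defs=set(), succs=["body", "exit"]),
        "body": Block(use={"s", "i"}, defs={"s", "i"}, succs=["loop"]),
        "exit": Block(use={"s"}, defs=set(), succs=[]),
    }


if __name__ == "__main__":
    live_in, live_out = liveness(example_cfg())
    for b in ("entry", "loop", "body", "exit"):
        print(b, sorted(live_in[b]), sorted(live_out[b]))
```

For the example loop, only `n` is live on entry, because `i` and `s` are defined before they are read. Meanwhile `i`, `n`, and `s` all stay live around the loop, and `exit` has no live-out variables. A register allocator such as Chaitin-Briggs coloring or linear scan would consume exactly this kind of liveness information. A worklist processed in reverse postorder usually converges faster; the plain round-robin loop is kept here for brevity.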