The Abacus Processor Architecture

Abacus – A Processor Family for Education

Nikita Bhardwaj, Maximilian Senftleben, and Klaus Schneider
Department of Computer Science, TU Kaiserslautern
http://es.cs.uni-kl.de

ABSTRACT

We present the Abacus processor family and its compiler framework for the MiniC language that we have developed for teaching processor architectures. Besides typical RISC instructions, Abacus also offers instructions for vector processing and thread synchronization, but it is still small enough to be discussed completely in a class. With reasonable effort, students can therefore modify given implementations of micro-architectures and code generators to deepen their understanding of the theoretical concepts. Moreover, using benchmark examples, they can explore the quantitative aspects of their optimizations. In contrast to commercial and other educational processors, we provide many micro-architectures that are based on a pure concept only rather than on a combination of concepts, and we provide code generators which contain the core ideas of some architectures.

1. INTRODUCTION

Even though instruction sets of processors have a quite long lifetime, it is more worthwhile to study the concepts and core ideas of processor architectures rather than concrete instances of processors. To that end, processor architectures are classically categorized in many ways: The famous SISD, SIMD, MISD, and MIMD¹ classes of Flynn [17] relate the number of instructions to the number of their operands. Other classic classification criteria are the RISC or CISC instruction sets [23, 47], while yet others consider the micro-architecture of the processors rather than their instruction sets.

Today, one can further distinguish classes of processor architectures by dynamic or static instruction scheduling [15, 16, 21, 29, 60], use of scalar or vector data operands [25], use of subword-level parallelism as used in multimedia or arithmetic instructions, general-purpose or application-specific instruction sets [35, 62], reconfigurable hardware units [18, 19], or their uses in desktop, server or embedded systems.

Moreover, it is a well-known fact that compilers and processor architectures have to support each other, which was one of the insights that led to RISC architectures [23, 47]. While some processor architectures like the dynamically organized out-of-order execution schemes [29, 60] are quite independent of the compiler, others like VLIW [16], EDGE [8], and TTA [26] processors strongly depend on their compilers. Studying these architectures without considering code generators of compilers is not reasonable.

Last but not least, the impact of the memory hierarchy is a primary matter of concern in the interplay between compilers and processors. Students must experience the enormous effect that can lead to performance differences up to a factor of 1000 [59]. For multicore processors, students moreover have to understand the difficulties of cache coherence and weak memory consistency [1, 58]. All of these aspects can only be understood by actively working with compilers and processors, or at least with simulators of processors.

To keep it as simple as possible, many microprocessors and simulators have been developed for educational use: The most popular ones are probably the DLX processor of Hennessy and Patterson [24] and Knuth's MIX and MMIX processors [31] that are both supported by simulators [32], but both are outdated and lack simple models of both micro-architectures and compilers. The 'little computer LC3' [46] instead offers simulators and compilers for a very simple processor, but there are no different micro-architectures, and it is therefore only useful for Bachelor students. The same is the case for the 'little man's computer LMC' [67]. In contrast, for most open micro-architecture models² like OpenRISC, OpenSPARC, LEON, and others, there is typically a lack of simple compilers, and these processors are also too complex for teaching. We address further related works in Section 5 of the paper, but a complete survey would require a separate paper of its own.

In this paper, we therefore present the Abacus processor family and its compiler framework for the MiniC language that we developed for educational use in processor architecture classes. Besides typical RISC instructions, the Abacus instruction set also offers instructions for vector processing and thread synchronization and can therefore be used to demonstrate hot topics of processor architectures. Nevertheless, the instruction set is still small enough³ so that we can offer complete implementations of micro-architectures that support special architectural concepts like pipelined execution, out-of-order execution or vector pipeline chaining. Students can refine these implementations, e.g., by adding data conflict resolution techniques to pipelines or by adding branch predictors to superscalar machines, etc. Analogously, we can offer a compiler library whose code generators can be adapted by the students to study register allocation, instruction selection schemes, dataflow analysis, and refined compilation techniques like trace and modulo scheduling for statically organized processors like VLIW [16], EDGE [8], and TTA [26] processors.

Using the different Abacus micro-architectures and code generators, our students can deepen their knowledge about compilers and processors, which is a mandatory competence of today's embedded system designers. Using benchmark examples, they can moreover explore quantitative impacts of their designs. Neither the qualitative nor the quantitative aspects can be taught with commercial processors. With this intention in mind, we introduce the different Abacus micro-architectures and code generators to our students who learn the following in detail:

• general compiler techniques like intermediate representation of the programs as control-flow graphs and dataflow analysis to compute, e.g., the liveness of variables
• general instruction selection, e.g., for generating code for expressions [55] or dataflow graphs
• register allocation schemes like Chaitin-Briggs graph coloring [7, 22] and the linear scan method [49]
• pipelined execution with conflict resolution schemes either by the compiler or the processor using stalling or result forwarding
• dynamic branch prediction techniques
• dynamic instruction scheduling by processors using reservation stations and reordering
• static instruction scheduling by compilers using trace and modulo scheduling
• pipeline techniques for vector processors like convoys and pipeline chaining
• code vectorization of scalar code for vector processors including strip-mining and other compiler techniques like tiling

¹ Abbreviating single/multiple instruction vs. single/multiple data.
² See http://opencores.org
³ See http://es.cs.uni-kl.de/research/architectures for more details and a reference card.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
WESE'14, October 12-17, 2014, New Delhi, India
© 2014 ACM. ISBN 978-1-4503-3090-9/14/10 ...$15.00.
DOI: http://dx.doi.org/10.1145/2829957.2829959.

However, considering high-level programming languages with features like object orientation, higher order functions or polymorphic types is way too complicated for a course on processor architectures where just the final code generators of the compilers are of interest. To that end, we could, in principle, consider open source compilers like the GNU compiler collection⁴ or SUIF [65] or the code generation from a byte code language of a virtual machine like LLVM [33, 34]. However, these are still too big to allow our students to concentrate on the main problems for code generation of the particular architectures.

We have therefore developed a simple programming language called MiniC that is described in this section. For this MiniC language, we have developed a compiler framework including a typical frontend (lexer, parsers, type checker) with various backends (intermediate code, code optimization, register allocators, instruction selections) that is made available to our students, so that they can concentrate on the code generation part for the considered processor architectures.

2.1 The MiniC Language

MiniC has been developed as a minimal multi-threaded programming language that can be easily compiled into intermediate formats like control-flow graphs so that students can easily work with these data structures to implement their own code optimizers and generators.

MiniC has a minimal set of data types, namely booleans (bool), unsigned/signed integers of machine width (nat and int), arrays, and tuples. For these data types, there are the usual boolean and arithmetic operations that are needed to implement reasonable benchmark examples. The language is strictly typed and allows programmers to cast types to other types using cast expressions like (nat) τ to force the type checker to give expression τ the type nat.
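The text names control-flow graphs as the intermediate format and liveness of variables as a typical dataflow analysis. As a rough illustration of what such an analysis looks like, here is a minimal sketch in Python. It is not taken from the Abacus/MiniC framework: the function name `liveness`, the dictionary layout of `cfg`, and the example blocks are all assumptions made for illustration.

```python
# Hypothetical sketch: backward liveness analysis on a tiny control-flow graph.
# Each block maps to (use, defs, succs), where
#   use   = variables read in the block before any write to them,
#   defs  = variables written in the block,
#   succs = names of successor blocks.
# This layout is an assumption, not the MiniC compiler's actual data structure.

def liveness(blocks):
    """Iterate the liveness equations to a fixed point.

    live_out[b] = union of live_in[s] over all successors s of b
    live_in[b]  = use[b] | (live_out[b] - defs[b])
    """
    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        # Visiting blocks in reverse order speeds up a backward analysis.
        for name in reversed(list(blocks)):
            use, defs, succs = blocks[name]
            out = set()
            for succ in succs:
                out |= live_in[succ]
            inn = use | (out - defs)
            if out != live_out[name] or inn != live_in[name]:
                live_out[name], live_in[name] = out, inn
                changed = True
    return live_in, live_out


# Example CFG for:  i = 0; s = 0; while (i < n) { s = s + i; i = i + 1; } return s;
cfg = {
    "entry": (set(),      {"i", "s"}, ["head"]),
    "head":  ({"i", "n"}, set(),      ["body", "exit"]),
    "body":  ({"s", "i"}, {"s", "i"}, ["head"]),
    "exit":  ({"s"},      set(),      []),
}

live_in, live_out = liveness(cfg)
for name in cfg:
    print(name, sorted(live_in[name]), sorted(live_out[name]))
```

In this example only `n` is live on entry, since `i` and `s` are written before they are read. A register allocator would use the resulting sets to decide which variables can share a register.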
cluding strip-mining and other compiler techniques like tiling Statements of MiniC are also standard except
