Appears in the 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013)

Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures

Emily Blem, Jaikrishnan Menon, and Karthikeyan Sankaralingam
University of Wisconsin - Madison
{blem,menon,karu}@cs.wisc.edu

Abstract

RISC vs. CISC wars raged in the 1980s when chip area and design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Further, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important, and we seek to answer this question through a detailed measurement-based study on real hardware running real applications. We analyze measurements on the ARM Cortex-A8 and Cortex-A9 and Intel Atom and Sandybridge i7 microprocessors over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors' performance and energy efficiency. We find that ARM and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant.

1. Introduction

The question of ISA design, and specifically RISC vs. CISC, was an important concern in the 1980s and 1990s when chip area and design complexity were the primary constraints [24, 12, 17, 7]. It is questionable if the debate was settled in terms of technical issues. Regardless, both flourished commercially through the 1980s and 1990s. In the past decade, the ARM ISA (a RISC ISA) has dominated mobile and low-power embedded computing domains and the x86 ISA (a CISC ISA) has dominated desktops and servers.

Recent trends raise the question of the role of the ISA and make a case for revisiting the RISC vs. CISC question. First, the computing landscape has quite radically changed from when the previous studies were done. Rather than being exclusively desktops and servers, today's computing landscape is significantly shaped by smartphones and tablets. Second, while area and chip design complexity were previously the primary constraints, energy and power constraints now dominate. Third, from a commercial standpoint, both ISAs are appearing in new markets: ARM-based servers for energy efficiency and x86-based mobile and low power devices for higher performance. Thus, the question of whether ISA plays a role in performance, power, or energy efficiency is once again important.

Related Work: Early ISA studies are instructive, but miss key changes in today's applications and design constraints that have shifted the ISA's effect. We review previous comparisons in chronological order, and observe that all prior comprehensive ISA studies considering commercially implemented processors focused exclusively on performance.

Bhandarkar and Clark compared the MIPS and VAX ISAs by comparing the MIPS M/2000 to the Digital VAX 8700 implementations [7] and concluded: "RISC as exemplified by MIPS provides a significant processor performance advantage." In another study in 1995, Bhandarkar compared the Pentium Pro to the Alpha 21164 [6], again focused exclusively on performance, and concluded: "...the Pentium Pro processor achieves 80% to 90% of the performance of the Alpha 21164... It uses an aggressive out-of-order design to overcome the instruction set level limitations of a CISC architecture. On floating-point intensive benchmarks, the Alpha 21164 does achieve over twice the performance of the Pentium Pro processor." Consensus had grown that RISC and CISC ISAs had fundamental differences that led to performance gaps, and that aggressive optimization for CISC only partially bridged the gap.

Isen et al. [22] compared the performance of the Power5+ to the Intel Woodcrest considering SPEC benchmarks and concluded that x86 matches the POWER ISA. The consensus was that "with aggressive microarchitectural techniques for ILP, CISC and RISC ISAs can be implemented to yield very similar performance."

Many informal studies in recent years claim the x86's "crufty" CISC ISA incurs many power overheads and attribute the ARM processor's power efficiency to the ISA [1, 2]. These studies suggest that the microarchitecture optimizations of the past decades have led to RISC and CISC cores with similar performance, but that the power overheads of CISC are intractable.

In light of the prior ISA studies from decades past, the significantly modified computing landscape, and the seemingly vastly different power consumption of ARM implementations (1-2 W) versus x86 implementations (5-36 W), we feel there is a need to revisit this debate with a rigorous methodology.

[Figure 1. Summary of Approach. Four platforms: Cortex-A8 (RISC, Beagle Board), Cortex-A9 (RISC, Panda Board), Atom N450 (CISC, Atom dev board), and Sandybridge i7-2700 (CISC). 26 workloads: mobile (CoreMark, WebKit), desktop (SPEC CPU2006, 10 INT and 10 FP), and server (lighttpd, CLucene, database kernels). Over 200 measures: performance and power (WattsUp meter), instruction mix via the perf interface to hardware performance counters, simulated ARM instruction info, and binary instrumentation for x86 instruction info. Over 20,000 data points plus careful analysis; conclusion: RISC vs. CISC appears irrelevant.]

Specifically, considering the dominance of ARM and x86 and the multi-pronged importance of the metrics of power, energy, and performance, we need to compare ARM to x86 on those three metrics. Macro-op cracking and decades of research in high-performance microarchitecture techniques and compiler optimizations seemingly help overcome x86's performance and code-effectiveness bottlenecks, but these approaches are not free. The crux of our analysis is the following: after decades of research to mitigate CISC performance overheads, do the new approaches introduce fundamental energy inefficiencies?

Challenges: Any ISA study faces challenges in separating out the multiple implementation factors that are orthogonal to the ISA from the factors that are influenced or driven by the ISA. ISA-independent factors include chip technology node, device optimization (high-performance, low-power, or low-standby-power transistors), memory bandwidth, I/O device effects, operating system, compiler, and workloads executed. These issues are exacerbated when considering energy measurements and analysis, since chips implementing an ISA sit on boards, and separating out chip energy from board energy presents additional challenges. Further, some microarchitecture features may be required by the ISA, while others may be dictated by performance and application domain targets that are ISA-independent.

To separate out the implementation and ISA effects, we consider multiple chips for each ISA with similar microarchitectures, use established technology models to separate out the technology impact, use the same operating system and compiler front-end on all chips, and construct workloads that do not rely significantly on the operating system. Figure 1 presents an overview of our approach: the four platforms, 26 workloads, and set of measures collected for each workload on each platform. We use multiple implementations of the ISAs and specifically consider the ARM and x86 ISAs representing RISC against CISC. We present an exhaustive and rigorous analysis using workloads that span smartphone, desktop, and server applications. In our study, we are primarily interested in whether and, if so, how the ISA impacts performance and power. We also discuss infrastructure and system challenges, missteps, and software/hardware bugs we encountered. Limitations are addressed in Section 3. Since there are many ways to analyze the raw data, this paper is accompanied by a public release of all data at www.cs.wisc.edu/vertical/isa-power-struggles.

Key Findings: The main findings from our study are:
◦ Large performance gaps exist across the implementations, although average cycle count gaps are ≤ 2.5×.
◦ Instruction count and mix are ISA-independent to first order.
◦ Performance differences are generated by ISA-independent microarchitecture differences.
◦ The energy consumption is again ISA-independent.
◦ ISA differences have implementation implications, but modern microarchitecture techniques render them moot; one ISA is not fundamentally more efficient.
◦ ARM and x86 implementations are simply design points optimized for different performance levels.

Implications: Our findings confirm known conventional (or suspected) wisdom, and add value by quantification. Our results imply that microarchitectural effects dominate performance, power, and energy impacts. The overall implication of this work is that the ISA being RISC or CISC is largely irrelevant for today's mature design world.

Paper organization: Section 2 describes a framework we develop to understand the ISA's impacts on performance, power, and energy. Section 3 describes our overall infrastructure, the rationale for the platforms in this study, and our limitations; Section 4 discusses our methodology; and Section 5 presents the analysis of our data. Section 6 concludes.
2. Framing Key Impacts of the ISA

In this section, we present an intellectual framework in which to examine the impact of the ISA, assuming a von Neumann model, on performance, power, and energy. We consider the three key textbook ISA features that are central to the RISC/CISC debate: format, operations, and operands. We do not consider the other textbook features, data types and control, as they are orthogonal to RISC/CISC design issues and the RISC/CISC approaches are similar. Table 1 presents the three key ISA features in three columns and their general RISC and CISC characteristics in the first two rows. We then discuss contrasts for each feature and how the choice of RISC or CISC potentially and historically introduced significant trade-offs in performance and power. In the fourth row, we discuss how modern refinements have led to similarities, marginalizing the choice of RISC or CISC on performance and power. Finally, the last row raises empirical questions focused on each feature to quantify or validate this convergence. Overall, our approach is to understand all performance and power differences by using measured metrics to quantify the root cause of differences and whether or not ISA differences contribute. The remainder of this paper is centered around these empirical questions, framed by the intuition presented as the convergence trends.

Table 1. Summary of RISC and CISC Trends.

                Format                                    Operations                                Operands
RISC / ARM      ◦ Fixed length instructions               ◦ Simple, single-function operations      ◦ Operands: registers, immediates
                ◦ Relatively simple encoding              ◦ Single cycle                            ◦ Few addressing modes
                ◦ ARM: 4B, THUMB (2B, optional)                                                     ◦ ARM: 16 general purpose registers
CISC / x86      ◦ Variable length instructions            ◦ Complex, multi-cycle instructions       ◦ Operands: memory, registers, immediates
                ◦ Common insts shorter/simpler            ◦ Transcendentals, encryption,            ◦ Many addressing modes
                ◦ Special insts longer/complex              string manipulation                     ◦ x86: 8 32b & 6 16b registers
                ◦ x86: from 1B to 16B long
Historical      ◦ CISC decode latency prevents pipelining ◦ Even w/ µcode, pipelining hard          ◦ CISC decoder complexity higher
Contrasts       ◦ CISC decoders slower/more area          ◦ CISC latency may be longer than         ◦ CISC has more per-inst work,
                ◦ Code density: RISC < CISC                 compiler's RISC equivalent                longer cycles
                ◦ Static code size: RISC > CISC
Convergence     ◦ µ-op cache minimizes decoding overheads ◦ CISC insts split into RISC-like         ◦ CISC insts split into RISC-like
Trends          ◦ x86 decode optimized for common insts     micro-ops; optimizations eliminated       micro-ops; µ-op counts similar
                ◦ I-cache minimizes code density impact     inefficiencies                            for ARM and x86
                                                          ◦ x86 and ARM µ-op latencies similar      ◦ Number of data cache accesses similar
                                                          ◦ Modern compilers pick mostly RISC insts
Empirical       ◦ How much variance in x86 inst length?   ◦ Are macro-op counts similar?            ◦ Number of data accesses similar?
Questions         Low variance ⇒ common insts optimized     Similar ⇒ RISC-like on both               Similar ⇒ no data access inefficiencies
                ◦ Are ARM and x86 code densities similar? ◦ Are complex instructions used by x86 ISA?
                  Similar density ⇒ no ISA effect           Few complex ⇒ compiler picks RISC-like
                ◦ What are instruction cache miss rates?  ◦ Are µ-op counts similar?
                  Low ⇒ caches hide low code densities      Similar ⇒ CISC split into RISC-like µ-ops

Although whether an ISA is RISC or CISC seems irrelevant, ISAs are evolving; expressing more semantic information has led to improved performance (x86 SSE, larger address space), better security (ARM TrustZone), better virtualization, etc. Examples in current research include extensions to allow the hardware to balance accuracy with energy efficiency [15, 13] and extensions to use specialized hardware for energy efficiency [18]. We revisit this issue in our conclusions.
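The format questions in Table 1 are directly measurable with standard tooling. As a rough illustration (not the paper's actual scripts), the following Python sketch computes a binary's code-section size using GNU binutils' size utility and the average dynamic instruction length from per-instruction byte lengths, e.g., as produced by a Pin or DynamoRIO tool; the tool choice and trace format are assumptions.

    import statistics
    import subprocess

    def code_section_bytes(binary_path):
        # Sum executable-section sizes reported by GNU `size` in SysV format;
        # this approximates the "static binary size" measure used later in Table 6a.
        out = subprocess.run(["size", "--format=sysv", binary_path],
                             capture_output=True, text=True, check=True).stdout
        total = 0
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] in (".text", ".init", ".fini"):
                total += int(fields[1])
        return total

    def mean_inst_length(per_inst_byte_lengths):
        # Average dynamic instruction length (cf. Table 6b); ARM is a constant
        # 4.0 B with THUMB disabled, while x86 varies per instruction.
        return statistics.fmean(per_inst_byte_lengths)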
3. Infrastructure

We now describe our infrastructure and tools. The key takeaway is that we pick four platforms, doing our best to keep them on equal footing, pick representative workloads, and use rigorous methodology and tools for measurement. Readers can skip ahead to Section 4 if uninterested in the details.

3.1. Implementation Rationale and Challenges

Choosing implementations presents multiple challenges due to differences in technology (technology node, frequency, high-performance/low-power transistors, etc.); ISA-independent microarchitecture (L2-cache, memory size, etc.); and system effects (operating system, compiler, etc.). Finally, platforms must be commercially relevant, and it is unfair to compare platforms from vastly different time-frames.

We investigated a wide spectrum of platforms spanning Intel Nehalem, Sandybridge, AMD Bobcat, NVIDIA Tegra-2, NVIDIA Tegra-3, and Qualcomm Snapdragon. However, we did not find implementations that met all of our criteria: same technology node across the different ISAs, identical or similar microarchitecture, a development board that supported the necessary measurements, a well-supported operating system, and similar I/O and memory subsystems. We ultimately picked the Beagleboard (Cortex-A8), Pandaboard (Cortex-A9), and Atom board, as they include processors with similar microarchitectural features like issue-width, caches, and main memory and are from similar technology nodes, as described in Tables 2 and 7. They are all commercially relevant, as shown by the last row in Table 2. For a high-performance x86 processor, we use an Intel i7 Sandybridge processor; it is significantly more power-efficient than any 45nm offering, including Nehalem. Importantly, these choices provided usable software platforms in terms of operating system, cross-compilation, and driver support. Overall, our choice of platforms provides a reasonably equal footing, and we perform detailed analysis to isolate out microarchitecture and technology effects. We present system details of our platforms for context, although the focus of our work is the processor core.

A key challenge in running real workloads was the relatively small memory (512MB) on the Cortex-A8 Beagleboard. While representative of the typical target (e.g., the iPhone 4 has 512MB RAM), it presents a challenge for workloads like SPEC CPU2006; execution times are dominated by swapping and OS overheads, making the core irrelevant. Section 3.3 describes how we handled this. In the remainder of this section, we discuss the platforms, applications, and tools for this study in detail.

3.2. Implementation Platforms

Hardware platform: We consider two chip implementations each for the ARM and x86 ISAs, as described in Table 2.
Intent: Keep non-processor features as similar as possible.

Table 2. Platform Summary.

               32/64b x86 ISA               ARMv7 ISA
Architecture   Sandybridge   Atom           Cortex-A9      Cortex-A8
Processor      Core 2700     N450           OMAP4430       OMAP3530
Cores          4             1              2              1
Frequency      3.4 GHz       1.66 GHz       1 GHz          0.6 GHz
Width          4-way         2-way          2-way          2-way
Issue          OoO           In Order       OoO            In Order
L1 Data        32 KB         24 KB          32 KB          16 KB
L1 Inst        32 KB         32 KB          32 KB          16 KB
L2             256 KB/core   512 KB         1 MB/chip      256 KB
L3             8 MB/chip     --             --             --
Memory         16 GB         1 GB           1 GB           256 MB
SIMD           AVX           SSE            NEON           NEON
Area           216 mm2       66 mm2         70 mm2         60 mm2
Tech Node      32 nm         45 nm          45 nm          65 nm
Platform       Desktop       Dev Board      Pandaboard     Beagleboard
Products       Desktop       Netbook,       Galaxy S-III,  iPhone 4, 3GS,
                             Lava Xolo      Galaxy S-II    Motorola Droid

Data from TI OMAP3530, TI OMAP4430, Intel Atom N450, and Intel i7-2700 datasheets, www.beagleboard.org & www.pandaboard.org.

Operating system: Across all platforms, we run the same stable Linux 2.6 LTS kernel with some minor board-specific patches to obtain accurate results when using the performance counter subsystem. We use perf's1 program sampling to find the fraction of time spent in the kernel while executing the SPEC benchmarks on all four boards; overheads were less than 5% for all but GemsFDTD and perlbench (both less than 10%), and the fraction of time spent in the operating system was virtually identical across platforms spanning ISAs.
Intent: Keep OS effects as similar as possible across platforms.

Compiler: Our toolchain is based on a validated gcc 4.4 based cross-compiler configuration. We intentionally chose gcc so that we can use the same front-end to generate all binaries. All target-independent optimizations are enabled (O3); machine-specific tuning is disabled, so there is a single set of ARM binaries and a single set of x86 binaries. For x86 we target 32-bit, since 64-bit ARM platforms are still under development. For ARM, we disable THUMB instructions for a more RISC-like ISA. We ran experiments to determine the impact of machine-specific optimizations and found that these impacts were less than 5% for over half of the SPEC suite, and caused performance variations of plus or minus 20% on the remainder, with speed-ups and slow-downs equally likely. None of the benchmarks include hand-written SIMD code, and although we allow auto-vectorization, very few SIMD instructions are generated for either architecture. Floating point is done natively on the SSE (x86) and NEON (ARM) units. Vendor compilers may produce better code for a platform, but we use gcc to eliminate compiler influence. As seen in Table 12 in Appendix I of an accompanying technical report [10], static code size is within 8% and average instruction lengths are within 4% using gcc and icc for SPEC INT, so we expect that the compiler does not make a significant difference. A sketch of this build setup appears at the end of this section.
Intent: Hold compiler effects constant across platforms.

1perf is a Linux utility to access performance counters.

3.3. Applications

Since both ISAs are touted as candidates for mobile clients, desktops, and servers, we consider a suite of workloads that span these. We use prior workload studies to guide our choice, and where appropriate we pick equivalent workloads that can run on our evaluation platforms. A detailed description follows and is summarized in Table 3. All workloads are single-threaded to ensure our single-core focus.

Table 3. Benchmark Summary.

Domain         Benchmarks        Notes
Mobile client  CoreMark          Set to 4000 iterations
               WebKit            Similar to BBench
Desktop        SPEC CPU2006      10 INT, 10 FP, test inputs
Server         lighttpd          Represents web-serving
               CLucene           Represents web-indexing
               Database kernels  Represents data-streaming and data-analytics

Mobile client: This category presented challenges, as mobile client chipsets typically include several accelerators, and careful analysis is required to determine the typical workload executed on the programmable general-purpose core. We used CoreMark (www.coremark.org), widely used in industry white-papers, and two WebKit regression tests informed by the BBench study [19]. BBench, a recently proposed smartphone benchmark suite, is "a web-page rendering benchmark comprising 11 of the most popular sites on the internet today" [19]. To avoid web-browser differences across the platforms, we use the cross-platform WebKit with two of its built-in tests that mimic real-world HTML layout and performance scenarios for our study2.

Desktop: We use the SPEC CPU2006 suite (www.spec.org) as representative of desktop workloads. SPEC CPU2006 is a well-understood standard desktop benchmark, providing insights into core behavior. Due to the large memory footprint of the train and reference inputs, we found that for many benchmarks the memory-constrained Cortex-A8, in particular, ran out of memory and execution was dominated by system effects. Instead, we report results using the test inputs, which fit in the Cortex-A8's memory footprint for 10 of 12 INT and 10 of 17 FP benchmarks.

Server: We chose server workloads informed by the CloudSuite workloads recently proposed by Ferdman et al. [16]. Their study characterizes server/cloud workloads into data analytics, data streaming, media streaming, software testing, web search, and web serving. The actual software implementations they provide are targeted for large memory-footprint machines, and their intent is to benchmark the entire system and server cluster. This is unsuitable for our study, since we want to isolate processor effects. Hence, we pick implementations with small memory footprints and single-node behavior. To represent data-streaming and data-analytics, we use three database kernels commonly used in database evaluation work [26, 23] that capture the core computation in Bayes classification and data-store3. To represent web search, we use CLucene (clucene.sourceforge.net), an efficient, cross-platform indexing implementation similar to CloudSuite's Nutch. To represent web-serving (CloudSuite uses Apache), we use the lighttpd server (www.lighttpd.net), which is designed for "security, speed, compliance, and flexibility"4. We do not evaluate the media-streaming CloudSuite benchmark, as it primarily stresses the I/O subsystem. CloudSuite's Software Testing benchmark is a batch coarse-grained parallel symbolic execution application; for our purposes, the SPEC suite's Perl parser, combinatorial optimization, and linear programming benchmarks are similar.

2Specifically coreLayout and DOMPerformance.
3CloudSuite uses Hadoop+Mahout plus additional software infrastructure, ultimately running Bayes classification and data-store; we feel this kernel approach is better suited for our study while capturing the domain's essence.
4Real users of lighttpd include YouTube.
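To make the compiler setup of Section 3.2 concrete, the sketch below shows how such a build matrix might be driven. The cross-compiler name and source lists are placeholders, but the flags mirror the configuration described above: same gcc front-end, -O3, no machine-specific tuning, THUMB disabled via -marm, 32-bit x86 via -m32.

    import subprocess

    # Hypothetical benchmark-to-source mapping; real suites have per-benchmark
    # build systems, so treat this as a schematic driver only.
    BENCHMARKS = {"toy_bench": ["toy_bench.c"]}

    TOOLCHAINS = {
        "arm": ["arm-linux-gnueabi-gcc", "-O3", "-marm"],  # THUMB disabled
        "x86": ["gcc", "-O3", "-m32"],                     # 32-bit target
    }

    def build_all():
        # One binary set per ISA, built from the identical gcc front-end.
        for bench, sources in BENCHMARKS.items():
            for isa, cc in TOOLCHAINS.items():
                subprocess.run(cc + sources + ["-o", f"{bench}.{isa}"],
                               check=True)

    if __name__ == "__main__":
        build_all()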

3.4. Tools

The four main tools we use in our work are described below, and Table 5 in Section 4 describes how we use them.

Native execution time and microarchitectural events: We use wall-clock time and performance-counter-based clock-cycle measurements to determine the execution time of programs. We also use performance counters to understand microarchitecture influences on the execution time. Each of the processors has different counters available, and we examined them to find comparable measures. Ultimately, three counters explain much of the program behavior: branch mis-prediction rate, Level-1 data-cache miss rate, and Level-1 instruction-cache miss rate (all measured as misses per kilo-instructions). We use the perf tool for performance counter measurement.

Power: For power measurements, we connect a WattsUp (www.wattsupmeters.com) meter to the board (or desktop) power supply. This gives us system power. We run the benchmark repeatedly to find consistent average power, as explained in Table 5. We use a control run to determine the board power alone when the processor is halted and subtract away this board power to determine chip power. Some recent power studies [14, 21, 9] accurately isolate the processor power alone by measuring the current supply line of the processor. This is not possible for the SoC-based ARM development boards, and hence we determine and then subtract out the board power. This methodology allows us to eliminate the main memory and I/O power and examine only processor power. We validated our strategy for the i7 system using the exposed energy counters (the only platform we consider that includes isolated power measures). Across all three benchmark suites, our WattsUp methodology compared to the processor energy counter reports ranged from 4% to 17% less, averaging 12%. Our approach tends to under-estimate core power, so our results for power and energy are optimistic. We saw average power of 800mW, 1.2W, 5.5W, and 24W for the A8, A9, Atom, and i7 (respectively), and these fall within the typical vendor-reported power numbers.

Technology scaling and projections: Since the i7 processor is 32nm and the Cortex-A8 is 65nm, we use technology node characteristics from the 2007 ITRS tables to normalize to the 45nm technology node in two results where we factor out technology; we do not account for device type (LOP, HP, LSTP). For our 45nm projections, the A8's power is scaled by 0.8x and the i7's power by 1.3x. In some results, we scale frequency to 1 GHz, accounting for DVFS impact on voltage using the mappings disclosed for the Intel SCC [5]. When frequency scaling, we assume that 20% of the i7's power is static and does not scale with frequency; all other cores are assumed to have negligible static power. When frequency scaling, the A8's power is scaled by 1.2x, the Atom's power by 0.8x, and the i7's power by 0.6x. We acknowledge that this scaling introduces some error to our technology-scaled power comparison, but feel it is a reasonable strategy that doesn't affect our primary findings (see Table 4). A sketch of this projection appears at the end of this section.

Emulated instruction mix measurement: For the x86 ISA, we use DynamoRIO [11] to measure instruction mix. For the ARM ISA, we leverage the gem5 [8] simulator's functional emulator to derive instruction mixes (no ARM binary emulation was available). Our server and mobile-client benchmarks use many system calls that do not work in the gem5 functional mode. We do not present detailed instruction-mix analysis for these, but instead present a high-level mix determined from performance counters. We use the MICA tool to find the available ILP [20].

3.5. Limitations or Concerns

Our study's limitations are classified into core diversity, domain, tool, and scaling effects. The full list appears in Table 4. Throughout our work, we focus on what we believe to be the first-order effects for performance, power, and energy, and feel our analysis and methodology is rigorous. Other more detailed methods may exist, and we have made the data publicly available at www.cs.wisc.edu/vertical/isa-power-struggles to allow interested readers to pursue their own detailed analysis.

Table 4. Infrastructure Limitations.

         Limitation                                Implications
Cores    Multicore effects: coherence, locking...  2nd order for core design
         No platform uniformity across ISAs        Best effort
         No platform diversity within ISAs         Best effort
         Design teams are different                µarch effect, not ISA
         "Pure" RISC, CISC implementations         Out of scope
         Ultra low power                           Out of scope
         Server style platforms                    See server benchmarks
Domain   Why SPEC on mobile platforms?             Tracks emerging uses
         Why not SPECjbb or TPC-C?                 CloudSuite more relevant
         Proprietary compilers are optimized       gcc optimizations uniform
         Arch. specific compiler tuning            <10% effect
Tools    No direct decoder power measure           Results show 2nd order
         Power includes non-core factors           4-17%
         Performance counters may have errors      Validated use (Table 5)
         Simulations have errors                   Validated use (Table 5)
Scaling  Memory rate affects cycles nonlinearly    Second-order
         Vmin limit affects frequency scaling      Second-order
         ITRS scaling numbers are not exact        Best effort; extant nodes
4. Methodology

In this section, we describe how we use our tools and the overall flow of our analysis. Section 5 presents our data and analysis. Table 5 describes how we employ the aforementioned tools and obtain the measures we are interested in, namely, execution time, execution cycles, instruction mix, microarchitecture events, power, and energy.

Table 5. Methodology Summary.

(a) Native Execution on Real Hardware

Execution time, cycle counts:
◦ Approach: Use the perf tool to sample cycle performance counters; sampling avoids potential counter overflow.
◦ Analysis: 5-20 trials (dependent on variance and benchmark runtime); report minimum from trials that complete normally.
◦ Validation: Compare against wall-clock time.

Inst. count (ARM):
◦ Approach: Use the perf tool to collect macro-ops from performance counters.
◦ Analysis: At least 3 trials; report minimum from trials that complete normally.
◦ Validation: Performance counters within 10% of gem5 ARM simulation. Table 9 elaborates on challenges.

Inst. count (x86):
◦ Approach: Use perf to collect macro-ops and micro-ops from performance counters.
◦ Analysis: At least 3 trials; report minimum from trials that complete normally.
◦ Validation: Counters within 2% of DynamoRIO trace count (macro-ops only). Table 9 elaborates on challenges.

Inst. mix (coarse):
◦ Approach: SIMD + FP + load/store performance counters.

Inst. length (x86):
◦ Approach: Wrote a Pin tool to find the length of each instruction and keep a running average.

Microarch events:
◦ Approach: Branch mispredictions, cache misses, and other µarch events measured using perf performance counters.
◦ Analysis: At least 3 trials; additional if a particular counter varies by > 5%. Report minimum from normal trials.

Full system power:
◦ Set-up: Use WattsUp meter connected to board or desktop (no network connection, peripherals on separate supply, kernel DVFS disabled, cores at peak frequency, single-user mode).
◦ Approach: Run benchmarks in a loop to guarantee 3 minutes of samples (180 samples at maximum sampling rate).
◦ Analysis: If outliers occur, rerun experiment; present average power across run without outliers.

Board power:
◦ Set-up: Same as full system power.
◦ Approach: Run with kernel power saving enabled; force to lowest frequency. Issue halt; report power when it stabilizes.
◦ Analysis: Report minimum observed power.

Processor power:
◦ Approach: Subtracting the above two gives processor power.
◦ Validation: Compare core power against energy performance counters and/or reported TDP and power draw.

(b) Emulated Execution

Inst. mix (detailed):
◦ Approach (ARM): Use gem5 instruction trace and analyze using a python script.
◦ Approach (x86): Use DynamoRIO instruction trace and analyze using a python script.
◦ Validation: Compare against coarse mix from SIMD + FP + load/store performance counters.

ILP:
◦ Approach: Pin-based MICA tool, which reports ILP with window sizes 32, 64, 128, and 256.

Our overall approach is to understand all performance and power differences and use the measured metrics to quantify the root cause of differences and whether or not ISA differences contribute, answering the empirical questions from Section 2. Unless otherwise explicitly stated, all data is measured on real hardware. The flow of the next section is outlined below.

4.1. Performance Analysis Flow

Step 1: Present execution time for each benchmark.
Step 2: Normalize frequency's impact using cycle counts.
Step 3: To understand differences in cycle count and the influence of the ISA, present the dynamic instruction count measures, measured in both macro-ops and micro-ops.
Step 4: Use instruction mix, code binary size, and average dynamic instruction length to understand the ISA's influence.
Step 5: To understand performance differences not attributable to the ISA, look at detailed microarchitecture events.
Step 6: Attribute performance gaps to frequency, ISA, or ISA-independent microarchitecture features, and qualitatively reason about whether the ISA forces microarchitecture features.

4.2. Power and Energy Analysis Flow

Step 1: Present per-benchmark raw power measurements.
Step 2: To factor out the impact of technology, present technology-independent power by scaling all processors to 45nm and normalizing the frequency to 1 GHz.
Step 3: To understand the interplay between power and performance, examine raw energy.
Step 4: Qualitatively reason about the ISA influence on microarchitecture in terms of energy.

4.3. Trade-off Analysis Flow

Step 1: Combining the performance and power measures, compare the processor implementations using Pareto-frontiers.
Step 2: Compare measured and synthetic processor implementations using energy-performance Pareto-frontiers.
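As a concrete illustration of the Table 5 flows, the sketch below drives Linux perf for cycle counts (minimum over trials that complete normally) and derives processor power by subtracting idle board power from the full-system average. The CSV parsing and helper names are assumptions rather than the paper's released scripts.

    import subprocess

    def min_cycles(cmd, trials=3):
        # `perf stat -x,` prints machine-readable counts to stderr.
        best = None
        for _ in range(trials):
            r = subprocess.run(["perf", "stat", "-e", "cycles", "-x", ","] + cmd,
                               capture_output=True, text=True, check=True)
            first = next(l for l in r.stderr.splitlines() if l.strip())
            count = int(first.split(",")[0])
            best = count if best is None else min(best, count)
        return best

    def processor_power(system_samples_watts, board_idle_watts):
        # Full-system average power minus halted-board power (Table 5).
        avg = sum(system_samples_watts) / len(system_samples_watts)
        return avg - board_idle_watts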

5. Measured Data Analysis and Findings

We now present our measurements and analysis of performance, power, energy, and the trade-offs between them. We conclude the section with sensitivity studies projecting performance of additional implementations of the ARM and x86 ISAs using a simple performance and power model.

We present our data for all four platforms, often comparing the A8 to the Atom (both dual-issue in-order) and the A9 to the i7 (both out-of-order), since their implementations are pair-wise similar. For each step, we present the average measured data, average in-order and OoO ratios if applicable, and then our main findings. When our analysis suggests that some benchmarks are outliers, we give averages with the outliers included in parentheses.

5.1. Performance Analysis

Step 1: Execution Time Comparison

Data: Figure 2 shows execution time normalized to the i7; averages including outliers are given using parentheses. Average ratios are in the table below. Per-benchmark data is in Figure 16 of Appendix I in an accompanying technical report [10].

[Figure 2. Execution Time Normalized to i7: per-suite bars for A8, Atom, A9, and i7; outlier-inclusive suite averages reach (130), (72), (24), and (344).]

Ratio       Mobile    SPEC INT  SPEC FP    Server
A8 to Atom  3.4 (34)  3.5       4.2 (7.4)  3.7 (103)
A9 to i7    5.8       8.4       7.2 (23)   7.4

Outliers: For wk_perf and lighttpd, the A8 executes more than twice as many instructions as the A9, skewing A8/Atom differences in the mobile and server data, respectively; see details in Step 2. Five SPEC FP benchmarks are also considered outliers; see Table 8. Where outliers are listed, they are in this set.

Finding P1: Large performance gaps are platform and benchmark dependent: A9 to i7 performance gaps range from 5x to 102x, and A8 to Atom gaps range from 2x to 997x.

Key Finding 1: Large performance gaps exist across the four platforms studied, as expected, since frequency ranges from 600 MHz to 3.4 GHz and the microarchitectures are very different.

Step 2: Cycle-Count Comparison

Data: Figure 3 shows cycle counts normalized to the i7. Per-benchmark data is in Figure 7.

[Figure 3. Cycle Count Normalized to i7: per-suite bars for A8, Atom, A9, and i7; outlier-inclusive suite averages reach (23), (13), (7), and (61).]

Ratio       Mobile    SPEC INT  SPEC FP    Server
A8 to Atom  1.2 (12)  1.2       1.5 (2.7)  1.3 (23)
A9 to i7    1.7       2.5       2.1 (7.0)  2.2

Finding P2: Per-suite cycle count gaps between the out-of-order implementations A9 and i7 are less than 2.5x (no outliers).

Finding P3: Per-suite cycle count gaps between the in-order implementations A8 and Atom are less than 1.5x (no outliers).

Key Finding 2: Performance gaps, when normalized to cycle counts, are less than 2.5x when comparing in-order cores to each other and out-of-order cores to each other.

Step 3: Instruction Count Comparison

Data: Figure 4a shows dynamic instruction (macro-op) counts on the A8 and Atom normalized to Atom x86 macro-instructions. Per-benchmark data is in Figure 17a, and derived CPIs are in Table 11 in Appendix I of [10].

Data: Figure 4b shows dynamic micro-op counts for the Atom and i7 normalized to Atom macro-instructions5. Per-benchmark data is in Figure 17b in Appendix I of [10].

[Figure 4. Instructions Normalized to i7 Macro-Ops: per-suite bars, (a) macro-ops for ARM and x86 (outliers (3.2) and (1.5)), (b) micro-ops for Atom and i7.]

5For the i7, we use issued micro-ops instead of retired micro-ops; we found that, on average, this does not impact the micro-op/macro-op ratio.

Outliers: The A8 performs particularly poorly on the WebKit tests and lighttpd, executing more than twice as many instructions as the A96; we report A9 instruction counts for these two benchmarks. For CLucene, x86 machines execute 1.7x more instructions than ARM machines; this appears to be a pathological case of x86 code generation inefficiencies. For cactusADM, the Atom executes 2.7x more micro-ops than macro-ops; this extreme is not seen for other benchmarks.

6The A8 spins for IO, event-loops, and timeouts.

Finding P4: Instruction counts are similar across ISAs, implying that gcc picks the RISC-like instructions from the x86 ISA.

Finding P5: All ARM outliers in SPEC FP are due to transcendental FP operations supported only by x86.

Finding P6: The x86 micro-op to macro-op ratio is often less than 1.3x, again suggesting that gcc picks the RISC-like instructions.

Key Finding 3: Instruction and cycle counts imply that CPI is lower on the x86 implementations: geometric mean CPI is 3.4 for the A8, 2.2 for the A9, 2.1 for the Atom, and 0.7 for the i7 across all suites. x86 ISA overheads, if any, are overcome by microarchitecture.

Step 4: Instruction Format and Mix

Data: Table 6a shows average ARM and x86 static binary sizes, measuring only the binary's code sections. Per-benchmark data is in Table 12a in Appendix I of [10].

Data: Table 6b shows average dynamic ARM and x86 instruction lengths. Per-benchmark data is in Table 12b in Appendix I of [10].

Table 6. Instruction Size Summary.

                     (a) Binary Size (MB)  (b) Instruction Length (B)
                     ARM     x86           ARM     x86
Mobile    Minimum    0.02    0.02          4.0     2.4
          Average    0.95    0.87          4.0     3.3
          Maximum    1.30    1.42          4.0     3.7
Desktop   Minimum    0.53    0.65          4.0     2.7
INT       Average    1.47    1.46          4.0     3.1
          Maximum    3.88    4.05          4.0     3.5
Desktop   Minimum    0.66    0.74          4.0     2.6
FP        Average    1.70    1.73          4.0     3.4
          Maximum    4.75    5.24          4.0     6.4
Server    Minimum    0.12    0.18          4.0     2.5
          Average    0.39    0.59          4.0     3.2
          Maximum    0.47    1.00          4.0     3.7

Outliers: The CLucene binary (from the server suite) is almost 2x larger for x86 than ARM; the server suite thus has the largest span in binary sizes. ARM executes correspondingly fewer instructions; see the outliers discussion in Step 3.

Finding P7: Average ARM and x86 binary sizes are similar for SPEC INT, SPEC FP, and Mobile workloads, suggesting similar code densities.

Finding P8: Executed x86 instructions are on average up to 25% shorter than ARM instructions: short, simple x86 instructions are typical.

Finding P9: x86 FP benchmarks, which tend to have more complex instructions, have instructions with longer encodings (e.g., cactusADM with 6.4 B/inst on average).

Data: Figure 5 shows average coarse-grained ARM and x86 instruction mixes for each benchmark suite.

[Figure 5. Instruction Mix (Performance Counters): stacked fractions of load, store, branch, and other µops/insts for ARM and x86, per suite.]

Data: Figure 6 shows fine-grained ARM and x86 instruction mixes normalized to x86 for a subset of SPEC benchmarks7.

[Figure 6. Selected Instruction Counts (Emulated): per-category fractions (load, store, branch, move, ALU, logical, mul, div, special, other) for gcc, omnetpp, soplex, and tonto, normalized to x86 µops.]

7x86 instructions with memory operands are cracked into a memory operation and the original operation.

Finding P10: The fraction of loads and stores is similar across ISAs for all suites, suggesting that the ISA does not lead to significant differences in data accesses.

Finding P11: Large instruction counts for ARM are due to the absence of FP instructions like fsincos and fyl2xp1 (e.g., tonto); in Figure 6, many special x86 instructions correspond to ALU/logical/multiply ARM instructions.

Key Finding 4: Combining the instruction-count and mix findings, we conclude that ISA effects are indistinguishable between the x86 and ARM implementations.

Step 5: Microarchitecture

Data: Figure 7 shows the per-benchmark cycle counts for more detailed analysis where performance gaps are large.

Data: Table 7 compares the A8 microarchitecture to the Atom, and the A9 to the i7, focusing on the primary structures. These details are from five Microprocessor Report articles8, and the A9 numbers are estimates derived from publicly disclosed information on the A15 and A9/A15 comparisons.

Table 7. Processor Microarchitecture Features.

(a) In-Order Cores
       Pipeline Depth  Issue Width  Threads  ALU/FP Units  Br. Pred. BTB Entries
A8     13              2            1        2/2 + NEON    512
Atom   16+2            2            2        2/2 + IMul    128

(b) Out-of-Order Cores
       Issue Width  Threads  ROB Size  LD/ST Entries  Rename  Scheduler  BTB Entries
A9     4 (10)       1        --        --/4           56      20         512 (9)
i7     4(6)         2        168       64/36          160     54         8K-16K

8"Cortex-A8 High speed, low power" (Nov 2005), "More applications for OMAP4" (Nov 2009), "Sandybridge spans generations" (Sept 2010), "Intel's Tiny Atom" (April 2008), "Cortex A-15 Eagle Flies the Coop" (Nov 2010).
9960 for the A15.
10We assume the conventional wisdom that the A9 is dual issue, although its pipeline diagrams indicate it is quad-issue.
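The CPI arithmetic behind Key Finding 3 is a per-benchmark division summarized with a geometric mean. A minimal sketch, assuming cycle and instruction counts have already been collected per benchmark:

    import math

    def geometric_mean(values):
        return math.exp(sum(math.log(v) for v in values) / len(values))

    def mean_cpi(cycles_by_bench, insts_by_bench):
        # CPI = cycles / dynamic instructions, per benchmark, then the
        # geometric mean across the suite (as in Key Finding 3).
        cpis = [cycles_by_bench[b] / insts_by_bench[b] for b in cycles_by_bench]
        return geometric_mean(cpis)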

s A9 4 1 - -/4 56 20 512

p 80% o

µ i7 4(6) 2 64/36 160 168 54 8K - 16K - Other o d

e 60% u

s Finding P12: A9 and i7’s different issue widths (2 versus p

f 10 o Branch 40% × t 4, respectively) explain performance differences up to 2 , as- n

e Store

c suming sufficient ILP, a sufficient instruction window and a well r

e 20% P Load balanced processor pipeline. We use MICA to confirm that our

ARM x86 ARM x86 ARM x86 ARM x86 benchmarks all have limit ILP greater than 4 [20]. Mobile SPEC INT SPEC FP Server Finding P13: Even with different ISAs and significant differ- Figure 5. Instruction Mix (Performance Counters). ences in microarchitecture, for 12 benchmarks, the A9 is within Data: Figure 6 shows fine-grained ARM and x86 instruction 2× the cycle count of i7 and can be explained by the difference mixes normalized to x86 for a subset of SPEC benchmarks7. in issue width. 4.0 Data: Figures 8, 9, and 10 show branch mispredictions & L1 s

p 3.5 Load Logical o data and instruction cache misses per 1000 ARM instructions. µ -

o 3.0 Store Mul d Finding P14: Observe large microarchitectural event count u

e 2.5

s Branch Div

p differences (e.g., A9 branch misses are more common than i7

6 2.0

8 Move Special

x branch misses). These differences are not because of the ISA,

f 1.5 o ALU Other but rather due to microarchitectural design choices (e.g., A9’s n

o 1.0 i t

c BTB has 512 entries versus i7’s 16K entries). a

r 0.5 F 8“Cortex-A8 High speed, low power” (Nov 2005), “More applications for 0.0 ARM x86 ARM x86 ARM x86 ARM x86 OMAP4” (Nov 2009), “ Sandybridge spans generations” (Sept 2010), “Intel’s gcc omnetpp soplex tonto Tiny Atom” (April 2008), “Cortex A-15 Eagle Flies the Coop” (Nov 2010). Figure 6. Selected Instruction Counts (Emulated). 960 for A15. 7x86 instructions with memory operands are cracked into a memory opera- 10We assume the conventional wisdom that A9 is dual issue, although its tion and the original operation. pipeline diagrams indicate it is quad-issue. Appears in the 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013) 9

[Figure 7. Cycle Counts Normalized to i7: per-benchmark bars, (a) in-order A8 and Atom (outliers up to 64, 176), (b) out-of-order A9 and i7 (outliers up to 61).]

[Figure 8. Branch Misses per 1000 ARM Instructions: per-benchmark bars, (a) A8 and Atom (outlier 373), (b) A9 and i7.]

[Figure 9. Data L1 Misses per 1000 ARM Instructions: per-benchmark bars, (a) A8 and Atom (outlier 498), (b) A9 and i7.]

[Figure 10. Instruction Misses per 1000 ARM Instructions: per-benchmark bars, (a) A8 and Atom (outlier 269), (b) A9 and i7. Benchmark label key: wk_lay: webkit layout, wk_perf: webkit perf, libq: libquantum, perl: perlbench, omnt: omnetpp, Gems: GemsFDTD, cactus: cactusADM, db: database, light: lighttpd.]

Finding P15: Per benchmark, we can attribute the largest gaps in i7 to A9 performance (and in Atom to A8 performance) to specific microarchitectural events. In the interest of space, we present example analyses for benchmarks with gaps greater than 3x in Table 8; bwaves details are in Appendix II of [10].

Table 8. Detailed Analysis for Benchmarks with A9 to i7 Gap Greater Than 3x.

Benchmark   Gap  Analysis
omnetpp     3.4  Branch MPKI: 59 for A9 versus only 2.0 for i7; I-cache MPKI: 33 for A9 versus only 2.2 for i7.
db kernels  3.8  1.6x more instructions and 5x more branch MPKI for A9 than i7.
tonto       6.2  Instructions: 4x more for ARM than x86.
cactusADM   6.6  Instructions: 2.8x more for ARM than x86.
milc        8.0  A9 and i7 both experience more than 50 data cache MPKI; the i7's microarchitecture hides these misses more effectively.
leslie3D    8.4  4x as many L2 cache misses using the A8 than using the Atom explains the 2x A8 to Atom gap. On the A9, the data cache MPKI is 55, compared to only 30 for the i7.
bwaves      30   324x more branch MPKI, 17.5x more instructions, 4.6x more instruction MPKI, and 6x more L2 cache misses on the A8 than the Atom. The A9 has similar trends, including 1000x more branch MPKI than the i7.

Key Finding 5: The microarchitecture has significant impact on performance. The ARM and x86 architectures have similar instruction counts; highly accurate branch prediction and large caches, in particular, effectively allow the x86 architectures to sustain high performance. x86 performance inefficiencies, if any, are not observed. The microarchitecture, not the ISA, is responsible for performance differences.
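A Table 8 style attribution can be mechanized: for each benchmark whose cycle-count gap exceeds a threshold, report the per-event MPKI ratios large enough to plausibly explain the gap. The helper below is hypothetical (the paper's analysis was done by inspection), and the 2x reporting ratio is chosen arbitrarily.

    def gap_suspects(gap, mpki_slow, mpki_fast, gap_threshold=3.0, ratio=2.0):
        # mpki_slow / mpki_fast: per-event MPKI dicts for the slower core
        # (A8 or A9) and the faster core (Atom or i7) on one benchmark.
        if gap < gap_threshold:
            return {}
        return {event: mpki_slow[event] / mpki_fast[event]
                for event in mpki_slow
                if mpki_fast[event] > 0
                and mpki_slow[event] / mpki_fast[event] >= ratio}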

Step 6: ISA Influence on Microarchitecture

Key Finding 6: As shown in Table 7, there are significant differences in microarchitectures. Drawing upon the instruction mix and instruction count analysis, we feel that the only case where the ISA forces larger structures is in the ROB size, physical rename file size, and scheduler size, since there are almost the same number of x86 micro-ops in flight compared to ARM instructions. The difference is small enough that we argue it is not necessary to quantify further. Beyond the translation to micro-ops, pipelined implementation of an x86 ISA introduces no additional overheads over an ARM ISA for these performance levels.

5.2. Power and Energy Analysis

In this section, we normalize to the A8, as it uses the least power. Per-benchmark data corresponding to Figures 11, 12, and 13 is in Figures 18, 19, and 20 in Appendix I of [10].

Step 1: Average Power

Data: Figure 11 shows average power normalized to the A8.

[Figure 11. Raw Average Power Normalized to A8: per-suite bars for A8, Atom, A9, and i7 on a 0-40 scale.]

Ratio       Mobile  SPEC INT  SPEC FP  Server
Atom to A8  3.0     3.1       3.1      3.0
i7 to A9    20      17        20       21

Key Finding 7: Overall, the x86 implementations consume significantly more power than the ARM implementations.

Step 2: Average Technology-Independent Power

Data: Figure 12 shows technology-independent average power: cores are scaled to 1 GHz at 45nm (normalized to A8).

[Figure 12. Tech. Independent Avg. Power Normalized to A8: per-suite bars for A8, Atom, A9, and i7 on a 0-8 scale.]

Ratio       Mobile  SPEC INT  SPEC FP  Server
Atom to A8  0.6     0.6       0.6      0.6
i7 to A9    7.0     6.1       7.4      7.6

Finding E1: With frequency and technology scaling, the ISA appears irrelevant for power-optimized cores: the A8, A9, and Atom are all within 0.6x of each other (the A8 consumes 29% more power than the A9). The Atom is actually lower power than the A8 and A9.

Finding E2: The i7 is performance, not power, optimized. Per-suite power costs are 6.1x to 7.6x higher for the i7 than the A9, with 1.7x to 7.0x higher frequency-independent performance (Figure 3 cycle count performance).

Key Finding 8: The choice of power- or performance-optimized core design impacts core power use more than the ISA.

Step 3: Average Energy

Data: Figure 13 shows energy (the product of power and time).

[Figure 13. Raw Average Energy Normalized to A8: per-suite bars for A8, Atom, A9, and i7 on a 0-1.6 scale.]

Ratio       Mobile     SPEC INT  SPEC FP    Server
A8 to Atom  0.8 (0.1)  0.9       0.8 (0.6)  0.8 (0.2)
i7 to A9    3.3        1.7       1.7 (1.0)  1.8

Finding E3: Despite power differences, the Atom consumes less energy than the A8, and the i7 uses only slightly more energy than the A9, due primarily to faster execution times, not the ISA.

Finding E4: For "hard" benchmarks with high cache miss rates that leave the core poorly utilized (e.g., many in SPEC FP), fixed energy costs from structures provided for high performance make the i7's energy 2x to 3x worse than the A9's.

Key Finding 9: Since power and performance are both primarily design choices, energy use is also primarily impacted by design choice. The ISA's impact on energy is insignificant.

Step 4: ISA Impact on Microarchitecture

Data: Table 7 outlined the microarchitecture features.

Finding E5: The energy impact of the ISA is that it requires micro-op translation and an additional micro-op cache. Further, since the number of micro-ops is not significantly higher, the energy impact of x86 support is small.

Finding E6: Other power-hungry structures like a large L2 cache, a highly associative TLB, an aggressive prefetcher, and a large branch predictor seem dictated primarily by the performance level and application domain targeted by the Atom and i7 processors, and are not necessitated by x86 ISA features.

5.3. Trade-off Analysis

Step 1: Power-Performance Trade-offs

Data: Figure 14 shows the geometric mean power-performance trade-off for all benchmarks using technology-node-scaled power. We generate a cubic curve for the power-performance trade-off curve. Given our small sample set, a core's location on the frontier does not imply that it is optimal.

[Figure 14. Power Performance Trade-offs: power (W) versus performance (BIPS) with a fitted frontier through the A8, A9, Atom, and i7 points; a starred "i7 - low perf" point marks the benchmarks with the smallest i7 to A9 performance gap.]

Finding T1: The A9 provides 3.5x better performance using 1.8x the power of the A8.

Finding T2: The i7 provides 6.2x better performance using 10.9x the power of the Atom.

Finding T3: The i7's microarchitecture has a high energy cost when its performance is low: the benchmarks with the smallest performance gap between the i7 and A9 (star in Figure 14) have only 6x better performance than the A9 but use more than 10x more power.

Key Finding 10: Regardless of ISA or energy efficiency, high-performance processors require more power than lower-performance processors. They follow well-established cubic power/performance trade-offs.
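The fitted frontiers in Figures 14 and 15 are polynomial fits through the measured design points: cubic for power versus performance (above) and quadratic for energy versus performance (next step). A minimal sketch with numpy, where the input arrays are the technology-scaled measurements:

    import numpy as np

    def fit_frontiers(bips, watts, joules):
        # Cubic power-performance and quadratic energy-performance curves,
        # as described in Section 5.3, Steps 1 and 2.
        power_fit = np.poly1d(np.polyfit(bips, watts, deg=3))
        energy_fit = np.poly1d(np.polyfit(bips, joules, deg=2))
        return power_fit, energy_fit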
Step 2: Energy-Performance Trade-offs

Data: Figure 15 shows the geometric mean energy-performance trade-off using technology-node-scaled energy. We generate a quadratic energy-performance trade-off curve. Again, a core's location on the frontier does not imply optimality. Synthetic processor points beyond the four processors studied are shown using hollow points; we consider a performance-targeted ARM core (A15) and frequency-scaled A9, Atom, and i7 cores. A15 BIPS are from reported CoreMark scores; details on the synthetic points are in Appendix III of [10].

[Figure 15. Energy Performance Trade-offs: energy (J) versus performance (BIPS); measured points for A8, A9, Atom, and i7 with a fitted frontier; synthetic points (hollow) for A9 2 GHz, i7 2 GHz, Atom 1 GHz, and A15 2 GHz.]

Finding T4: Regardless of ISA, power-only or performance-only optimized cores have high energy overheads (see the A8 and i7).

Table 9. Summary of Challenges.

Challenge            Description
Board Cooling (ARM)  No active cooling, leading to failures.
                     Fix: use a fan-based laptop cooling pad.
Networking (ARM)     ssh connection used up to 20% of CPU.
                     Fix: use a serial terminal.
Networking (Atom)    USB networking not supported.
                     Fix: use as standalone terminal.
Perf Counters (ARM)  PMU poorly supported on selected boards.
                     Fix: backport over 150 TI patches.
Compilation (ARM)    Failures due to dependences on > 100 packages.
                     Fix 1: pick portable equivalent (lighttpd).
                     Fix 2: work through errors (CLucene & WebKit).
Tracing (ARM)        No dynamic binary emulation.
                     Fix: use gem5 to generate instruction traces.

Table 10. Summary of Findings.

#  Finding                                      Support    Representative Data
Performance
1  Large performance gaps exist                 Fig-2      A8/Atom: 2x to 997x
2  Cycle-count gaps are less than 2.5x          Fig-3      <= 2.5x
   (A8 to Atom, A9 to i7)
3  x86 ISA overheads hidden by µarch;           Fig-3 & 4  x86 CPI < ARM CPI:
   x86 CPI < ARM CPI                                       A8: 3.4, Atom: 2.2
4  ISA performance effects indistinguishable    Table-6,   inst. mix same;
   between x86 and ARM                          Fig-5 & 6  short x86 insts
5  µarchitecture, not the ISA, responsible      Table-8    324x Br MPKI;
   for performance differences                             4x L2-misses
6  Beyond micro-op translation, x86 ISA         Table-7
   introduces no overheads over ARM ISA
Power
1  x86 implementations draw more power          Fig-11     Atom/A8 raw power: 3x
   than ARM implementations
2  Choice of power or perf. optimization        Fig-12     Atom/A8 power
   impacts power use more than ISA                         @1 GHz: 0.6x
3  Energy use primarily a design choice;        Fig-13     Atom/A8 raw
   ISA's impact insignificant                              energy: 0.8x
Trade-offs
1  High-perf processors require more power      Fig-14     A8/A9: 1.8x;
   than lower-performance processors                       i7/Atom: 10.9x
   ED: i7@2GHz