A64FX Microarchitecture Manual 1.3

A64FX® Microarchitecture Manual
English 1.3, 31 October 2020

Copyright 2020 - 2021 Fujitsu Limited

Copyright© 2019 Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, 211-8588, Japan. All rights reserved.

This product and related documentation are protected by copyright and distributed under licenses restricting their use, copying, distribution, and decompilation. No part of this product or related documentation may be reproduced in any form by any means without prior written authorization of Fujitsu Limited and its licensors, if any.

The product(s) described in this book may be protected by one or more U.S. patents, foreign patents, or pending applications.

TRADEMARKS

Fujitsu and the Fujitsu logo are trademarks of Fujitsu Limited.

This publication is provided “as is” without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or noninfringement. This publication could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein; these changes will be incorporated in new editions of the publication. Fujitsu Limited may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time.

Revision History

Change Date   Edition   Description of Change
2/28/2020     1.0       First Release
4/28/2020     1.1       Correct typos
7/31/2020     1.2       Update following chapters and sections:
                        ・6.5.1. SVE Instruction with Merging Predication
                        ・7.8.1. Multiple Structures Instruction
                        ・7.8.2. Gather Load/Scatter Store
                        ・9.6. Zero Fill
                        ・16. List of Instruction Attribute and Latency: the description of extra µOP and latency
10/31/2020    1.3       Update and modify a following chapter:
                        ・16. List of Instruction Attribute and Latency: ARMv8 Base and SVE instructions

Contents

1. Introduction
    1.1. A64FX Processor Overview
    1.2. A64FX Processor Specification
    1.3. A64FX Processor Block Diagram
2. Out-of-Order Architecture
    2.1. Overview
    2.2. Micro-Operation Instruction
    2.3. Operation-Flow
    2.4. Out-of-Order Resources
    2.5. Pipeline Stage
    2.6. Execution Latency
    2.7. Operand Bypass
    2.8. Resource Allocation and Release
    2.9. Execution Latency Changing
3. Instruction Fetch
    3.1. Overview of Fetch Stage
    3.2. Branch Prediction Mechanism
        Small Taken Chain Predictor
        Loop Prediction Table
        Branch Weight Table
        Branch Target Buffer
        Return Address Stack
    3.3. Combination of Predictors
    3.4. Short Loop Detector
4. Instruction Decode and Commit
    4.1. Micro-Operation Instruction
    4.2. Multi-Operation
    4.3. MOVPRFX Instruction Packing
    4.4. Instruction Decode
        Pre-Decode
        Decode
    4.5. Instruction Commit
        No Exception Mode
    4.6. Pipeline Flush
    4.7. Particular Instruction Controls
5. Instruction Dispatch
    5.1. Reservation Station
    5.2. Instruction Dispatch Attribute
    5.3. Dependency Group Detection
    5.4. Instruction Dispatch Mechanism
6. Instruction Execution
    6.1. Instruction Issue
    6.2. Execution Pipeline
    6.3. Blocking Control
    6.4. Physical Register File
    6.5. Execution of Particular Instructions
        SVE Instruction with Merging Predication
        Inter-Register-File MOV Operation
        Denormalized Number Operation
7. Memory Access
    7.1. Overview of Load/Store Pipeline
    7.2. Basic Execution Mechanism of Load/Store
        Load Instruction
        Store Instruction
