Dynamic Vectorization of Instructions

Títol: Dynamic vectorization of instructions Volum: 1 Alumne: Aaron Call Barreiro Director/Ponent: Xavier Martorell Bofill Departament: Arquitectura de computadors Data: 30/05/2014 DADES DEL PROJECTE Títol del Projecte: Dynamic vectorization of instructions Nom de l'estudiant: Aaron Call Barreiro Titulació: Enginyeria en Informàtica Crèdits: 37.5 Director/Ponent: Xavier Martorell Bofill Departament: Arquitectura de Computadors MEMBRES DEL TRIBUNAL (nom i signatura) President: Ramon Canal Corretger Vocal: Isabel Navazo Alvaro Secretari: Xavier Martorell Bofill QUALIFICACIÓ Qualificació numèrica: Qualificació descriptiva: Data: Dynamic vectorization of instructions 5 Table of contents 1. Introduction and motivation........................................................................................................11 1.1 Context.....................................................................................................................................11 1.2 Saving energy...........................................................................................................................11 1.2.1 Vectorization....................................................................................................................12 1.2.2 Problem and compilers solution.......................................................................................12 1.3 Motivation................................................................................................................................16 1.3.1 Goal..................................................................................................................................18 2. Computer architecture overview.................................................................................................19 2.1 Von Neumann architecture.......................................................................................................19 2.2 Processor design overview.......................................................................................................20 2.2.1 Pipeline of a processor.....................................................................................................21 2.2.2 Linear processors.............................................................................................................22 2.2.3 Scalar processors..............................................................................................................22 2.2.4 Multi-cycle processors.....................................................................................................22 2.2.5 Superscalar processors.....................................................................................................23 2.2.6 CPI...................................................................................................................................24 2.3 Hazards....................................................................................................................................24 2.3.1 Data hazards.....................................................................................................................25 2.3.2 Eliminating hazards..........................................................................................................26 2.3.2.1 Rename stage............................................................................................................27 2.4 Final pipeline...........................................................................................................................28 3. Design.............................................................................................................................................30 3.1 Rename algorithms..................................................................................................................30 3.1.1 Scoreboard.......................................................................................................................30 3.1.2 Tomasulo algorithm.........................................................................................................33 3.1.3 Renaming through the reorder buffer...............................................................................36 3.1.4 Renaming through a rename buffer..................................................................................36 3.1.5 Merged register file..........................................................................................................37 3.2 Proposal rename algorithm......................................................................................................38 3.2.1 Rename algorithm execution example.............................................................................40 3.2.2 Correctness of vectorial words.........................................................................................41 3.2.3 Additional structures of rename algorithm.......................................................................42 3.3 Dispatch stage..........................................................................................................................43 3.4 Proposal dispatch modifications..............................................................................................43 3.4.1 Additional structures of dispatch stage............................................................................45 3.5 Second algorithm.....................................................................................................................46 4. Implementation.............................................................................................................................47 4.1 Gem5 simulator architecture....................................................................................................48 4.1.1 Simulator source code tree...............................................................................................49 4.1.2 Running simulations with gem5......................................................................................50 4.1.3 CPU models.....................................................................................................................51 4.1.4 O3CPU model..................................................................................................................52 4.1.4.1 O3CPU Rename algorithm.......................................................................................54 4.1.4.2 O3CPU Dispatch algorithm......................................................................................55 Dynamic vectorization of instructions 6 4.2 Rename implementation..........................................................................................................55 4.2.1 Register file structure.......................................................................................................55 4.2.2 Instructions information structure....................................................................................57 4.2.3 Rename algorithm............................................................................................................57 4.3 Dispatch implementation.........................................................................................................62 4.3.1 Additional structures........................................................................................................62 4.3.2 Dispatch algorithm implementation.................................................................................63 4.4 Second algorithm implementation...........................................................................................67 5. Validation and evaluation............................................................................................................69 5.1 Code semantics validation.......................................................................................................69 5.2 Evaluation framework..............................................................................................................69 5.3 Benchmark A – Vectorial loop.................................................................................................70 5.3.1 Rename algorithm validation...........................................................................................70 5.3.2 Dispatch algorithm validation..........................................................................................72 5.3.3 Evaluation........................................................................................................................73 5.4 Benchmark B - Non-vectorial loop..........................................................................................75 5.4.1 Rename algorithm validation...........................................................................................75 5.4.2 Dispatch algorithm validation..........................................................................................76 5.4.3 Evaluation........................................................................................................................76 5.5 Benchmark C – Non detectable vectorial loop........................................................................77 5.5.1 Rename algorithm validation...........................................................................................77 5.5.2 Dispatch algorithm validation..........................................................................................78 5.5.3 Evaluation........................................................................................................................79

Dynamic Vectorization of Instructions

Computer Science 246 Computer Architecture Spring 2010 Harvard University

Sections 3.2 and 3.3 Dynamic Scheduling – Tomasulo's Algorithm

Multiple Instruction Issue and Completion Per Clock Cycle Using Tomasulo’S Algorithm – a Simple Example

Tomasulo's Algorithm

Tomasulo's Algorithm

Tomasulo Algorithm and Dynamic Branch Prediction

WCAE 2003 Workshop on Computer Architecture Education

Verification of an Implementation of Tomasulo's Algorithm by Compositional Model Checking

MIPS Architecture with Tomasulo Algorithm [12]

MP-Tomasulo: a Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs

California State University, Northridge a Tomasulo

Superscalar Techniques – Register Data Flow Inside the Processor