Dynamic Vectorization of Instructions
Total Page:16
File Type:pdf, Size:1020Kb
Títol: Dynamic vectorization of instructions Volum: 1 Alumne: Aaron Call Barreiro Director/Ponent: Xavier Martorell Bofill Departament: Arquitectura de computadors Data: 30/05/2014 DADES DEL PROJECTE Títol del Projecte: Dynamic vectorization of instructions Nom de l'estudiant: Aaron Call Barreiro Titulació: Enginyeria en Informàtica Crèdits: 37.5 Director/Ponent: Xavier Martorell Bofill Departament: Arquitectura de Computadors MEMBRES DEL TRIBUNAL (nom i signatura) President: Ramon Canal Corretger Vocal: Isabel Navazo Alvaro Secretari: Xavier Martorell Bofill QUALIFICACIÓ Qualificació numèrica: Qualificació descriptiva: Data: Dynamic vectorization of instructions 5 Table of contents 1. Introduction and motivation........................................................................................................11 1.1 Context.....................................................................................................................................11 1.2 Saving energy...........................................................................................................................11 1.2.1 Vectorization....................................................................................................................12 1.2.2 Problem and compilers solution.......................................................................................12 1.3 Motivation................................................................................................................................16 1.3.1 Goal..................................................................................................................................18 2. Computer architecture overview.................................................................................................19 2.1 Von Neumann architecture.......................................................................................................19 2.2 Processor design overview.......................................................................................................20 2.2.1 Pipeline of a processor.....................................................................................................21 2.2.2 Linear processors.............................................................................................................22 2.2.3 Scalar processors..............................................................................................................22 2.2.4 Multi-cycle processors.....................................................................................................22 2.2.5 Superscalar processors.....................................................................................................23 2.2.6 CPI...................................................................................................................................24 2.3 Hazards....................................................................................................................................24 2.3.1 Data hazards.....................................................................................................................25 2.3.2 Eliminating hazards..........................................................................................................26 2.3.2.1 Rename stage............................................................................................................27 2.4 Final pipeline...........................................................................................................................28 3. Design.............................................................................................................................................30 3.1 Rename algorithms..................................................................................................................30 3.1.1 Scoreboard.......................................................................................................................30 3.1.2 Tomasulo algorithm.........................................................................................................33 3.1.3 Renaming through the reorder buffer...............................................................................36 3.1.4 Renaming through a rename buffer..................................................................................36 3.1.5 Merged register file..........................................................................................................37 3.2 Proposal rename algorithm......................................................................................................38 3.2.1 Rename algorithm execution example.............................................................................40 3.2.2 Correctness of vectorial words.........................................................................................41 3.2.3 Additional structures of rename algorithm.......................................................................42 3.3 Dispatch stage..........................................................................................................................43 3.4 Proposal dispatch modifications..............................................................................................43 3.4.1 Additional structures of dispatch stage............................................................................45 3.5 Second algorithm.....................................................................................................................46 4. Implementation.............................................................................................................................47 4.1 Gem5 simulator architecture....................................................................................................48 4.1.1 Simulator source code tree...............................................................................................49 4.1.2 Running simulations with gem5......................................................................................50 4.1.3 CPU models.....................................................................................................................51 4.1.4 O3CPU model..................................................................................................................52 4.1.4.1 O3CPU Rename algorithm.......................................................................................54 4.1.4.2 O3CPU Dispatch algorithm......................................................................................55 Dynamic vectorization of instructions 6 4.2 Rename implementation..........................................................................................................55 4.2.1 Register file structure.......................................................................................................55 4.2.2 Instructions information structure....................................................................................57 4.2.3 Rename algorithm............................................................................................................57 4.3 Dispatch implementation.........................................................................................................62 4.3.1 Additional structures........................................................................................................62 4.3.2 Dispatch algorithm implementation.................................................................................63 4.4 Second algorithm implementation...........................................................................................67 5. Validation and evaluation............................................................................................................69 5.1 Code semantics validation.......................................................................................................69 5.2 Evaluation framework..............................................................................................................69 5.3 Benchmark A – Vectorial loop.................................................................................................70 5.3.1 Rename algorithm validation...........................................................................................70 5.3.2 Dispatch algorithm validation..........................................................................................72 5.3.3 Evaluation........................................................................................................................73 5.4 Benchmark B - Non-vectorial loop..........................................................................................75 5.4.1 Rename algorithm validation...........................................................................................75 5.4.2 Dispatch algorithm validation..........................................................................................76 5.4.3 Evaluation........................................................................................................................76 5.5 Benchmark C – Non detectable vectorial loop........................................................................77 5.5.1 Rename algorithm validation...........................................................................................77 5.5.2 Dispatch algorithm validation..........................................................................................78 5.5.3 Evaluation........................................................................................................................79