TABLE DRIVEN ADAPTIVE, EFFECTIVELY HETEROGENEOUS MULTI-CORE ARCHITECTURE by SURPRIYA TIKE Bachelor of Science in Electronics
Total Page:16
File Type:pdf, Size:1020Kb
TABLE DRIVEN ADAPTIVE, EFFECTIVELY HETEROGENEOUS MULTI-CORE ARCHITECTURE By SURPRIYA TIKE Bachelor of Science in Electronics and Telecommunication Pune University Pune, India 2007 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE December, 2011 TABLE DRIVEN ADAPTIVE, EFFECTIVELY HETEROGENEOUS MULTI-CORE ARCHITECTURE Thesis Approved: Dr. James E. Stine Jr. Thesis Adviser Dr. Chris Hutchens Dr. Louis G. Johnson Dr. Sheryl A. Tucker Dean of the Graduate College ii TABLE OF CONTENTS Chapter Page 1. INTRODUCTION .............................................................................................................. 1 2. BACKGROUND ................................................................................................................ 1 2.1. Processor based platforms ........................................................................................... 1 2.1.1. OpenSPARC ..................................................................................................... 1 2.1.2. SimpleScalar .................................................................................................... 2 2.1.3. SPIM ................................................................................................................ 2 2.1.4. Simics ............................................................................................................... 2 2.2. Other Related Work .................................................................................................... 3 2.3. Proposed Solution ....................................................................................................... 4 3. IMPLEMENTATION ......................................................................................................... 6 3.1. Architecture Top Level description - .......................................................................... 6 3.2. Performance Counters ............................................................................................... 10 3.3. Single-threaded core ................................................................................................. 10 3.3.1. Datapath ........................................................................................................ 11 3.3.2. Hazard Unit ................................................................................................... 21 3.3.3. Coprocessor0 ................................................................................................. 21 3.4. L2 Data and Instruction Cache .................................................................................. 21 3.5. Floating Point Units .................................................................................................. 22 3.6. Arbiter Logic ............................................................................................................. 25 3.7. Asynchronous FIFO .................................................................................................. 27 3.8. Data Coherency ......................................................................................................... 29 3.8.1. Snooping ........................................................................................................ 29 3.8.2. Directory-based ............................................................................................. 30 3.8.3. Snarfing .......................................................................................................... 30 4. TOOL FLOW AND DESIGN METHODOLOGY .......................................................... 34 iii 4.1. ModelSim .................................................................................................................. 34 4.1.1. Unit testing of module .................................................................................... 37 4.1.2. Benchmark Testing ......................................................................................... 37 4.1.3. Challenge with Benchmark Testing ............................................................... 37 4.1.4. Solution .......................................................................................................... 38 4.1.5. Script and Setup ............................................................................................. 38 4.2. Design Compiler ....................................................................................................... 38 4.2.1. Design Compiler flow .................................................................................... 38 4.2.2. Design Compiler file formats ......................................................................... 39 4.2.3. Script and Setup ............................................................................................. 41 4.3. Formal Verification ................................................................................................... 41 4.3.1. Need for Formal Verification ......................................................................... 42 4.3.2. Script and Setup ............................................................................................. 43 4.4. IC Compiler .............................................................................................................. 43 4.4.1. IC Compiler flow ............................................................................................ 44 4.4.2. Script and Setup ............................................................................................. 56 5. CONCLUSION ................................................................................................................. 57 REFERENCES ........................................................................................................................ 59 APPENDICES ......................................................................................................................... 62 iv LIST OF TABLES Table Page Table 1 - Implemented Instruction Set ........................................................................................... 12 Table 2 - Arbiter Bus - [Level1_Icache - Arbiter - Level2_Icache] .............................................. 25 Table 3 - Arbiter Bus - [Level1_Dcache - Arbiter - Level2_Dcache] ........................................... 25 Table 4 - Arbiter Bus - [Core_CX - Arbiter - Floating Point Unit] ............................................... 25 v LIST OF FIGURES Figure Page Figure 1 - Top Level Architecture ................................................................................................... 7 Figure 2 - 5 Stage pipelined datapath............................................................................................... 9 Figure 3 - Register File Set ............................................................................................................ 13 Figure 4 - Integer Radix16 Divider (fast) ...................................................................................... 14 Figure 5 - Integer Radix4 Divider (slow)....................................................................................... 15 Figure 6 - Level1 Dcache architecture flexibility .......................................................................... 16 Figure 7 - Memory Subsystem adapted into different architectures .............................................. 17 Figure 8 - Memory moving from lower to higher set associativity ............................................... 18 Figure 9 - Memory moving from higher to lower set associativity – encounters invalid entries .. 19 Figure 10 - multi_match penalty evaluation .................................................................................. 20 Figure 11 - Pipelined Floating Point Unit ...................................................................................... 23 Figure 12 - Arbiter Logic ............................................................................................................... 26 Figure 13 - Asynchronous FIFO with Gray code read/write pointers ........................................... 27 Figure 14 - Snooping based cache coherency ................................................................................ 28 Figure 15 - Directory Structure ...................................................................................................... 31 Figure 16 - Tool Flow Methodology.............................................................................................. 35 Figure 17 - ModelSim Simulation of RTL..................................................................................... 36 Figure 18 - Design Vision showing the synthesized memory stage from datapath ....................... 40 Figure 19 - Formality – Successful Verification with post-synthesis netlist for Core1 ................. 42 Figure 20 - Initialize floorplan for complete design ...................................................................... 45 vi Figure 21 – Creating plan-groups for hierarchical sub-modules ................................................... 45 Figure 22 – Placement of plan-groups along with hard macros..................................................... 46 Figure 23 - Committing final plan-groups ..................................................................................... 46 Figure 24 - Plan-group after placement ......................................................................................... 47 Figure