TABLE DRIVEN ADAPTIVE, EFFECTIVELY HETEROGENEOUS MULTI-CORE ARCHITECTURE by SURPRIYA TIKE Bachelor of Science in Electronics

TABLE DRIVEN ADAPTIVE, EFFECTIVELY HETEROGENEOUS MULTI-CORE ARCHITECTURE By SURPRIYA TIKE Bachelor of Science in Electronics and Telecommunication Pune University Pune, India 2007 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE December, 2011 TABLE DRIVEN ADAPTIVE, EFFECTIVELY HETEROGENEOUS MULTI-CORE ARCHITECTURE Thesis Approved: Dr. James E. Stine Jr. Thesis Adviser Dr. Chris Hutchens Dr. Louis G. Johnson Dr. Sheryl A. Tucker Dean of the Graduate College ii TABLE OF CONTENTS Chapter Page 1. INTRODUCTION .............................................................................................................. 1 2. BACKGROUND ................................................................................................................ 1 2.1. Processor based platforms ........................................................................................... 1 2.1.1. OpenSPARC ..................................................................................................... 1 2.1.2. SimpleScalar .................................................................................................... 2 2.1.3. SPIM ................................................................................................................ 2 2.1.4. Simics ............................................................................................................... 2 2.2. Other Related Work .................................................................................................... 3 2.3. Proposed Solution ....................................................................................................... 4 3. IMPLEMENTATION ......................................................................................................... 6 3.1. Architecture Top Level description - .......................................................................... 6 3.2. Performance Counters ............................................................................................... 10 3.3. Single-threaded core ................................................................................................. 10 3.3.1. Datapath ........................................................................................................ 11 3.3.2. Hazard Unit ................................................................................................... 21 3.3.3. Coprocessor0 ................................................................................................. 21 3.4. L2 Data and Instruction Cache .................................................................................. 21 3.5. Floating Point Units .................................................................................................. 22 3.6. Arbiter Logic ............................................................................................................. 25 3.7. Asynchronous FIFO .................................................................................................. 27 3.8. Data Coherency ......................................................................................................... 29 3.8.1. Snooping ........................................................................................................ 29 3.8.2. Directory-based ............................................................................................. 30 3.8.3. Snarfing .......................................................................................................... 30 4. TOOL FLOW AND DESIGN METHODOLOGY .......................................................... 34 iii 4.1. ModelSim .................................................................................................................. 34 4.1.1. Unit testing of module .................................................................................... 37 4.1.2. Benchmark Testing ......................................................................................... 37 4.1.3. Challenge with Benchmark Testing ............................................................... 37 4.1.4. Solution .......................................................................................................... 38 4.1.5. Script and Setup ............................................................................................. 38 4.2. Design Compiler ....................................................................................................... 38 4.2.1. Design Compiler flow .................................................................................... 38 4.2.2. Design Compiler file formats ......................................................................... 39 4.2.3. Script and Setup ............................................................................................. 41 4.3. Formal Verification ................................................................................................... 41 4.3.1. Need for Formal Verification ......................................................................... 42 4.3.2. Script and Setup ............................................................................................. 43 4.4. IC Compiler .............................................................................................................. 43 4.4.1. IC Compiler flow ............................................................................................ 44 4.4.2. Script and Setup ............................................................................................. 56 5. CONCLUSION ................................................................................................................. 57 REFERENCES ........................................................................................................................ 59 APPENDICES ......................................................................................................................... 62 iv LIST OF TABLES Table Page Table 1 - Implemented Instruction Set ........................................................................................... 12 Table 2 - Arbiter Bus - [Level1_Icache - Arbiter - Level2_Icache] .............................................. 25 Table 3 - Arbiter Bus - [Level1_Dcache - Arbiter - Level2_Dcache] ........................................... 25 Table 4 - Arbiter Bus - [Core_CX - Arbiter - Floating Point Unit] ............................................... 25 v LIST OF FIGURES Figure Page Figure 1 - Top Level Architecture ................................................................................................... 7 Figure 2 - 5 Stage pipelined datapath............................................................................................... 9 Figure 3 - Register File Set ............................................................................................................ 13 Figure 4 - Integer Radix16 Divider (fast) ...................................................................................... 14 Figure 5 - Integer Radix4 Divider (slow)....................................................................................... 15 Figure 6 - Level1 Dcache architecture flexibility .......................................................................... 16 Figure 7 - Memory Subsystem adapted into different architectures .............................................. 17 Figure 8 - Memory moving from lower to higher set associativity ............................................... 18 Figure 9 - Memory moving from higher to lower set associativity – encounters invalid entries .. 19 Figure 10 - multi_match penalty evaluation .................................................................................. 20 Figure 11 - Pipelined Floating Point Unit ...................................................................................... 23 Figure 12 - Arbiter Logic ............................................................................................................... 26 Figure 13 - Asynchronous FIFO with Gray code read/write pointers ........................................... 27 Figure 14 - Snooping based cache coherency ................................................................................ 28 Figure 15 - Directory Structure ...................................................................................................... 31 Figure 16 - Tool Flow Methodology.............................................................................................. 35 Figure 17 - ModelSim Simulation of RTL..................................................................................... 36 Figure 18 - Design Vision showing the synthesized memory stage from datapath ....................... 40 Figure 19 - Formality – Successful Verification with post-synthesis netlist for Core1 ................. 42 Figure 20 - Initialize floorplan for complete design ...................................................................... 45 vi Figure 21 – Creating plan-groups for hierarchical sub-modules ................................................... 45 Figure 22 – Placement of plan-groups along with hard macros..................................................... 46 Figure 23 - Committing final plan-groups ..................................................................................... 46 Figure 24 - Plan-group after placement ......................................................................................... 47 Figure

TABLE DRIVEN ADAPTIVE, EFFECTIVELY HETEROGENEOUS MULTI-CORE ARCHITECTURE by SURPRIYA TIKE Bachelor of Science in Electronics

Day 2, 1640: Leveraging Opensparc

Sparc Enterprise T5440 Server Architecture

SPIM MIPS Simulator

Opensparc – an Open Platform for Hardware Reliability Experimentation

MIPS Assembly Language Programming Using Qtspim

Computer Architectures an Overview

Marrying Prediction and Segmentation to Identify Sales Leads

Industry Perspective on Chip Multi-Threading: Bridging the Gap with Academia Using Opensparc Dwayne Lee Opensparc Community Manager

Pipenightdreams Osgcal-Doc Mumudvb Mpg123-Alsa Tbb

QEMU As a Platform for PLC Virtualization an Analysis from a Cyber Security Perspective

Philip Machanick

Kutztown CSC Multiprocessors, Page 1 Using The