An Evaluation of the Adapteva Epiphany Many-Core Architecture

21 August 2015 MASTER THESIS AN EVALUATION OF THE ADAPTEVA EPIPHANY MANY-CORE ARCHITECTURE FINAL REPORT Tom Vocke, BSc. (s0144002) Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) Chair: Computer Architecture for Embedded Systems (CAES) Graduation Committee: Dr.Ir. A.B.J. Kokkeler (University of Twente) Ir. E. Molenkamp (University of Twente) Ir. J. Scholten (University of Twente) Ir. R. Marsman (Thales Nederland) Ing. S.T. de Feber (Thales Nederland) 2 / 82 Contents Glossary 5 Abstract 6 1 Introduction 7 1.1 History and Motivation ................................................................................................... 7 1.2 Research Goals ............................................................................................................ 8 1.2.1 Thesis Outline ........................................................................................................ 9 2 Research Approach 11 2.1 Performance Analysis .................................................................................................. 11 2.2 Programming Model and Tool Support ........................................................................ 12 2.3 Architecture ................................................................................................................. 12 3 Relevant Background 13 3.1 Radar Processing and Beam-Forming ........................................................................ 13 3.1.1 Channel Filtering: Hilbert and Bandpass Filter ..................................................... 14 3.1.2 Digital Beam-forming ........................................................................................... 14 3.1.3 Front-End Throughput and Processing Requirements ......................................... 15 3.2 Epiphany Architecture ................................................................................................. 17 3.2.1 Memory Layout .................................................................................................... 17 3.2.2 eMesh Network-On-Chip ..................................................................................... 18 3.2.3 The Epiphany eCore ............................................................................................ 19 3.2.4 Maximizing Epiphany Performance ..................................................................... 20 3.2.5 Initial E16G301 Performance Estimates .............................................................. 22 3.3 CRISP / Recore Xentium RFD Summary .................................................................... 23 3.3.1 Front-End Receiver Mapping and Performance ................................................... 24 4 Benchmarking & Related Work 27 4.1 Related Work .............................................................................................................. 27 4.1.1 eCore performance .............................................................................................. 27 4.1.2 eMesh Performance ............................................................................................ 29 4.1.3 eLink Performance .............................................................................................. 30 4.1.4 Epiphany Power Consumption ............................................................................ 30 4.1.5 Remaining Work .................................................................................................. 31 4.2 Custom Micro-Benchmarks ......................................................................................... 32 4.2.1 Message Size vs. Bandwidth ............................................................................... 32 4.2.2 eLink Bandwidth Sharing ..................................................................................... 32 4.2.3 eMesh Bandwidth Sharing ................................................................................... 33 4.2.4 E16G301 Power Consumption ............................................................................ 34 5 Micro-Benchmark Results 35 5.1.1 Message Size versus Bandwidth ......................................................................... 36 5.1.2 eLink Bandwidth Sharing: .................................................................................... 37 5.1.3 eMesh Bandwidth Sharing ................................................................................... 39 5.1.4 E16G301 Power Consumption ............................................................................ 40 5.1.5 Micro-Benchmark result summary ....................................................................... 44 3 / 82 CONTENTS Contents 6 Front-end Task Implementation 45 6.1.1 Hilbert Filter Optimization ................................................................................... 45 6.1.2 Bandpass Filter Optimization ............................................................................. 49 6.1.3 Beam-former Optimization ................................................................................. 50 6.1.4 Task Power Consumption .................................................................................. 52 6.1.5 Task Implementation Summary and Further Improvements ............................... 53 7 Front-End Receiver Design 55 7.1 Process Communication ............................................................................................ 55 7.2 Problem Decomposition ............................................................................................ 56 7.2.1 Filter Stage Decomposition ................................................................................ 56 7.2.2 Beam-Former Decomposition ............................................................................ 58 7.2.3 Load Balancing .................................................................................................. 58 7.3 Example Front-End Receiver Mapping ...................................................................... 59 7.3.1 Load case and video sizes ................................................................................. 59 7.3.2 Task Mapping on the E16G301 .......................................................................... 59 7.3.3 Data Communication ......................................................................................... 61 7.3.4 E16G301 Front-End Receiver Performance ....................................................... 62 7.3.5 E16G301 Front-End Receiver Power Consumption ........................................... 64 8 Tools and Architecture 65 8.1 Programming Model and Tool Support ...................................................................... 65 8.1.1 Compiler ............................................................................................................ 65 8.1.2 The Debugger and Instruction-Set Simulator ..................................................... 66 8.1.3 Performance Evaluation and Profiling ................................................................ 66 8.1.4 SDK Software Libraries ...................................................................................... 66 8.2 Architecture Design Choices ...................................................................................... 67 8.2.1 Reconfigurability ................................................................................................ 67 8.2.2 Dependability ..................................................................................................... 67 8.2.3 Scalability ........................................................................................................... 67 9 Conclusions and Future Work 69 9.1.1 Epiphany Architecture Performance ................................................................... 69 9.1.2 Programming Model and Tool Support ............................................................... 69 9.1.3 Architecture ........................................................................................................ 70 9.1.4 Future Work ....................................................................................................... 70 References 71 Appendix 73 A. Thales Many-Core Processor Survey ......................................................................... 73 B. Power Measurement Load Cases .............................................................................. 74 C. Power measurement setup ........................................................................................ 75 D. Software Implementation Details ............................................................................... 77 E. SDF3: Automated MPSOC Task Mapping .................................................................. 82 4 / 82 Glossary Term Meaning MMAC(S) Million Multiply-ACcumulate operations (per Second) GFLOP(S) Giga (109) FLoating point OPerations (per Second) ADC Analog to Digital Converter MPSoC Multi-Processor System-on-Chip FPGA Field Programmable Gate Array TDP Thermal Design Power DMA Direct Memory Access MSB Most Significant Bit KB / MB / GB Kilo (103), Mega (106) and Giga (109) Byte KiB / MiB / GiB Kibi (210), Mebi (220) and Gibi (230) Byte FMADD Fused Multiply-Add (multiplies two arguments and adds the result to a third) MS/s Mega-Samples / second. A sample can be of arbitrary bit-length VLIW Very Large Instruction Word RFD Reconfigurable Fabric Device GPP General Purpose Processor GP_GPU General Purpose Graphics Processing Unit DRAM Dynamic Random Access Memory eMesh On-Chip Network of the Epiphany Architecture

Load more