Multi-Core Architectures with Coarse-Grained Dynamically Reconfigurable Processors for Broadband Wireless Access Technologies
Total Page:16
File Type:pdf, Size:1020Kb
Multi-core Architectures with Coarse-grained Dynamically Reconfigurable Processors for Broadband Wireless Access Technologies Wei Han A thesis submitted for the degree of Doctor of Philosophy. The University of Edinburgh. January 2010 Abstract Broadband Wireless Access technologies have significant market potential, especially the WiMAX protocol which can deliver data rates of tens of Mbps. Strong demand for high performance WiMAX solutions is forcing designers to seek help from multi-core processors that offer competitive advantages in terms of all performance metrics, such as speed, power and area. Through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching that of an ASIC, coarse-grained dynamically reconfigurable processors are proving to be strong candidates for processing cores used in future high performance multi-core processor systems. This thesis investigates multi-core architectures with a newly emerging dynamically reconfigurable processor – RICA, targeting WiMAX physical layer applications. A novel master-slave multi-core architecture is proposed, using RICA processing cores. A SystemC based simulator, called MRPSIM, is devised to model this multi-core architecture. This simulator provides fast simulation speed and timing accuracy, offers flexible architectural options to configure the multi-core architecture, and enables the analysis and investigation of multi-core architectures. Meanwhile a profiling-driven mapping methodology is developed to partition the WiMAX application into multiple tasks as well as schedule and map these tasks onto the multi-core architecture, aiming to reduce the overall system execution time. Both the MRPSIM simulator and the mapping methodology are seamlessly integrated with the existing RICA tool flow. Based on the proposed master-slave multi-core architecture, a series of diverse homogeneous and heterogeneous multi-core solutions are designed for different fixed WiMAX physical layer profiles. Implemented in ANSI C and executed on the MRPSIM simulator, these multi-core solutions contain different numbers of cores, combine various I memory architectures and task partitioning schemes, and deliver high throughputs at relatively low area costs. Meanwhile a design space exploration methodology is developed to search the design space for multi-core systems to find suitable solutions under certain system constraints. Finally, laying a foundation for future multithreading exploration on the proposed multi-core architecture, this thesis investigates the porting of a real-time operating system – Micro C/OS-II to a single RICA processor. A multitasking version of WiMAX is implemented on a single RICA processor with the operating system support. II Declaration of originality I hereby declare that the research recorded in this thesis and the thesis itself was composed by myself in the School of Engineering at The University of Edinburgh, except where explicitly stated otherwise in the text. Wei Han III Acknowledgements This thesis would not have been what you are seeing now, without the help of my supervisors and colleagues. Foremost, I would like to thank my PhD supervisors Prof. Tughrul Arslan and Dr. Ahmet T. Erdogan for their support and guidance during my study. I would also like to thank my colleagues, Dr. Ying Yi who made great contributions to our multi-core project and did a lot of work which I was too lazy to do, Mr. Mark I.R. Muir for his valuable guidance and parser libraries, Dr. Ioannis Nousias for his help on MRPSIM simulator design, Mr. Xin Zhao and Dr. Cheng Zhan for their contributions on WiMAX implementations. Meanwhile, I would like to thank all RICA team members: Dr. Sami Khawam, Dr. Mark Milward, Dr. Ioannis Nousias, Dr. Ying Yi and Mr. Mark I.R. Muir for their brilliant invention – the RICA architecture and its tool flow. In addition, many thanks to all members in SLI group for their help throughout my PhD study. A very special thank to my wife Xinyu Chen who encourages me with her love, stays with me getting through all tough times and cooks delicious food for me. Finally, I would like to express my deepest appreciation to my parents for their love, guide and support to me throughout my life. IV Acronyms and abbreviations 3G Third Generation ADSL Asymmetric Digital Subscriber Line AES Advanced Encryption Standard ALM Adaptive Logic Module ALU Arithmetic Logic Unit API Application Programming Interface ASIC Application-Specific Integrated Circuit ASIP Application-Specific Instruction Set Processor ASSP Application Specific Standard Product BPSK Binary Phase-Shift Keying BS Base Station BWA Broadband Wireless Access CDMA Code Division Multiple Access CLB Configurable Logic Block CMP Chip-Level Multiprocessing CP Cyclic Prefix CPS Cycles Per Second DC Direct Current DL Downlink DSL Digital Subscriber Line DSP Digital Signal Processor DR Dynamically Reconfigurable EV-DO Evolution-Data Optimized V FEC Forward Error Correction FFT Fast Fourier Transform FIR Finite Impulse Response FPGA Field Program Gate Array GF Galois Field GFLOPS Giga Floating-Point Operations Per Second GPP General Purpose Processor GSM Global System for Mobile communications GPU Graphics Processing Unit HDL Hardware Description Language HPT Highest Priority Task HSPA High Speed Packet Access HSDPA High Speed Downlink Packet Access HSUPA High Speed Uplink Packet Access IFFT Inverse Fast Fourier Transform ILP Instruction Level Parallelism IP Intellectual Property IPI Inter-Processor Interrupt IPS Instructions Per Second ISA Instruction Set Architecture ISDN Integrated Services Digital Network ISI Intersymbol Interference LL Load-Link LUT Look-Up Table MAC Media Access Control Mbps Megabits per second MIMO Multi-Input Multi-Output ML Maximum Likelihood VI MDF Machine Description File MRPSIM Multiple Reconfigurable Processor Simulator OFDM Orthogonal Frequency-Division Multiplexing OS Operating System PHY Physical Layer PLD Programming Logic Device PPE Power Processing Element PRBS Pseudo-Random Binary Sequence PSE Processing and Storage Element QAM Quadrature Amplitude Modulation QPSK Quadrature Phase-Shift Keying RA Reconfigurable Architecture RAM Random Access Memory RICA Reconfigurable Instruction Cell Array RISC Reduced Instruction Set Computer RRC Reconfigurable Rate Controller RPU Reconfigurable Processing Unit RS Reed-Solomon RTL Register Transfer Level RTOS Real-Time Operating System SC Store-Conditional SDL System Description Level SIMD Single Instruction Multiple Data SoC System on Chip SOFDMA Scalable Orthogonal Frequency Division Multiple Access SMP Symmetric Multiprocessing SPE Synergistic Processing Element SPMD Single Program Multiple Data VII SPS Steps Per Second SRAM Static Random Access Memory SS Subscriber Station TDD Time Division Duplex TLM Transaction-Level modeling TTA Transport Triggered Architecture UL Uplink UMTS Universal Mobile Telecommunications System VLIW Very Long Instruction Word WiMAX Worldwide Interoperability for Microwave Access WLAN Wireless Local Area Network VIII Publication from this work Journals 1. W. Han, Y. Yi, M. Muir, I. Nousias, T. Arslan, and A. T. Erdogan, “Multi-core Architectures with Dynamically Reconfigurable Array Processors for Wireless Broadband Technologies,” IEEE Transaction on Computer Aided Design of Integrated Circuits and Systems (TCAD), Vol. 28, Issue 12, pp. 1830-1843, December 2009. 2. W. Han, Y. Yi, M. Muir, I. Nousias, T. Arslan, and A. T. Erdogan, “Efficient Implementation of WiMAX Physical Layer on Multi-core Architecture with Dynamically Reconfigurable Processors,” Scalable Computing: Practice and Experience Scientific international journal for parallel and distributed computing, Vol. 9, pp.185-196, ISSN 1097-2803, 2008. Conference and Workshops 1. W. Han, Y. Yi, Xin Zhao, M. Muir, T. Arslan, and A. T. Erdogan, “Heterogeneous Multi-core Architectures with Dynamically Reconfigurable Processors for WiMAX transmitter”, the 22nd Annual IEEE International SOC Conference (SOCC’09), pp. 97 - 100, Belfast, UK, 9 – 11 September, 2009. 2. W. Han, Y. Yi, Xin Zhao, M. Muir, T. Arslan, and A. T. Erdogan, “Heterogeneous Multi-core Architectures with Dynamically Reconfigurable Processors for Wireless Communication”, the 7th IEEE Symposium on Application Specific Processors in conjunction with Design Automation Conference 2008 (SASP’09), pp. 1 - 6, San Francisco, California, 27 - 28 July, 2009. 3. Y. Yi, W. Han, X. Zhao, A. T. Erdogan and T. Arslan, “An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures,” the Conference on Design, Automation and Test in Europe (DATE’09), pp. 33-38, Nice, France, 20 – 24 April, 2009. 4. W. Han, Y. Yi, M. Muir, I. Nousias, T. Arslan, and A. T. Erdogan, “MRPSIM: a TLM based Simulation Tool for MPSoCs targeting Dynamically Reconfigurable Processors,” the 21st Annual IEEE International SOC Conference (SOCC’08), pp. 41-44, Newport Beach, California, 17 - 20 September, 2008. 5. Y. Yi, W. Han, A. Major, A. T. Erdogan, and T. Arslan, “Exploiting Loop-Level Parallelism on Multi-Core Architectures for the WiMAX Physical Layer,” the 21st Annual IEEE International SOC Conference (SOCC’08), pp. 31-34, Newport Beach, California, 17 – 20 September, 2008. 6. W. Han, Y. Yi, M. Muir, I. Nousias, T. Arslan, and A. T. Erdogan, “Multi-core Architectures with Dynamically Reconfigurable Array Processors for the WIMAX Physical IX Layer,” the 6th IEEE Symposium on Application Specific Processors conjunction with Design Automation Conference 2008