8th Karlsruhe Workshop on Software Radios

Porting GNU Radio to Multicore DSP+ARM System-on-Chip – A Purely Open-Source Approach

Shenghou Ma∗, Vuk Marojevic∗, Philip Balister†, and Jeffrey H. Reed∗

∗Bradley Dept. of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA, {minux,maroje,reedjh}@vt.edu

†Open SDR, PO Box 11789, Blacksburg, Virginia, USA, [email protected]

Abstract—GNU Radio is a versatile fast-prototyping platform for Software-Defined Radio (SDR). This paper describes our mission of porting GNU Radio to Texas Instruments' KeyStone II Multicore DSP+ARM System-on-Chip (SoC). We describe our long-term vision of providing unified scheduling over the heterogeneous multiprocessor SoC and initial results on this revolutionary path.

Index Terms—GNU Radio, System-on-Chip, Open-Source.

I. INTRODUCTION

Software-defined radio (SDR) is becoming omnipresent in both academia and industry. Vanu, for instance, has been deploying SDR base stations for a decade already. Amarisoft has recently implemented an LTE base station, or eNode-B, purely in software that runs on a powerful general-purpose processor (GPP). GNU Radio is gaining momentum in education and research. It has a growing user community, which provides support through its mailing list and organizes regular phone conferences as well as meetings at diverse conferences and events. GNU Radio is open-source and provides a framework for rapid waveform development and testing, supporting a variety of popular radio front ends. It is GPP-centric and has only recently been ported to other devices, such as graphics processing units (GPUs) [1].

Systems-on-chip (SoCs) provide important advantages over GPPs for building wireless communication systems: processing capacity, system integrity and power efficiency, to name a few. Heterogeneous Multiprocessor SoCs (HMP-SoCs) are being deployed not only in radio access network elements, but also in mobile devices [2]. However, developing applications (waveforms) for HMP-SoCs is a much more tedious task than for GPPs. The designer needs to deal with heterogeneous devices and manage the data flows among them, as sophisticated development tools and schedulers are not commonly available. Several vendors are developing powerful HMP-SoCs with the goal of deploying cheap and versatile LTE transceivers.

Our goal is to provide a fully open-source approach for SDR education and professional development on a modern SoC. This paper describes our vision for the KeyStone II Multicore DSP+ARM SoC from Texas Instruments to be used in SDR education, research and development. Porting GNU Radio to the ARM allows directly interfacing with radio frequency (RF) front ends. This provides the excitement of over-the-air transmissions for students and researchers. Having GNU Radio running on the ARMs can be useful for DSP developers as well, if their algorithms, running on a DSP core, can be integrated into a GNU Radio flow graph. We will evaluate the different approaches and justify a purely open-source approach.

II. CONTEXT

GNU Radio provides an open-source framework for rapid waveform design and prototyping. It provides a number of readily available signal processing blocks and also easily integrates third-party blocks. The waveform designer can glue these blocks together to create an intended data flow graph. Aided by the availability of compatible RF front-end hardware, such as Ettus Research's Universal Software Radio Peripheral (USRP) series, GNU Radio is ideal for education and research. However, GNU Radio


is not without its limitations: designed for PCs, GNU Radio is not easily portable to non-GPP-based processing platforms.

The trend in computing is towards heterogeneous multiprocessing, where different instruction set architectures (ISAs) and micro-architectures are integrated in a single system. This trend is observable in the current top5001 supercomputer list. The fastest supercomputer in the world in January 2014 is the Tianhe-2 (MilkyWay-2)2 in the National Super Computer Center in Guangzhou, China. It uses both the Intel Xeon E5-2692 12-core general-purpose processor and the Intel Xeon Phi 31S1P many-core processor, and it has a total of 3,120,000 computing cores and 1 PB of memory. Using 17.808 MW of electricity, it can provide a theoretical peak computation throughput of 54.902 PFLOPS. The second ranked is the Titan, a Cray XK7, which uses AMD Opteron 6274 16-core processors and Nvidia K20x GPUs to provide a theoretical peak computation throughput of 27.112 PFLOPS. Such heterogeneous multiprocessing architectures provide higher processing integration and power efficiency than equivalent GPP-based solutions.

SDR requires a huge amount of computational resources because of the high data rates and sophisticated signal processing algorithms. In this respect, it looks very much like a supercomputer at a smaller scale. But one cannot afford to carry around a mega-watt smartphone. To be practical, SDR platforms also need the virtue of power efficiency. This is where GPPs fail to deliver on their promises. Nowadays, almost all smartphones use ARM processors, precisely because they can deliver the ever-increasing computation requirements at affordable power levels. However, ARM processors alone cannot provide the necessary processing power. Digital signal processors (DSPs), which are optimized for multiply-accumulate (MAC) operations, close this gap. A DSP is the choice for implementing typical digital signal processing algorithms in software, but it has the disadvantage of not being very easy to program (partly because of the lack of smart compilers that do not need manual hints in order to utilize the full potential of the chip). ARMs and DSPs are therefore combined to take advantage of both processor types: the processing power of the DSP and the flexibility of the ARM. Texas Instruments has recently released its KeyStone II multicore DSP+ARM SoC.

A. The Open-Source Aspect

GNU Radio is an open-source project distributed under the General Public License. This calls for open-source SDR tools for integrating GNU Radio flow graphs with processing blocks running on the DSP cores of the KeyStone II or similar HMP-SoCs. The reasons for this are manifold. Firstly, open-source software is generally more accessible, so relying on open-source technology makes the result easily repeatable. Secondly, having the source code available means that one does not need to rely on the vendor for bug fixes and feature enhancements, and it also provides long-term stability against the vendor dropping support for the software. This is especially a problem in the SoC market, where the fast proliferation of new devices leads to limited-time support for older generations. Last but not least, open source means that one can modify the software to fit one's own needs and, more importantly, share the modifications freely. Although some proprietary software comes with source code, the license agreement generally forbids the user from distributing modified copies. This limits sharing among the research and education communities.

B. Hardware Platform

Fig. 1 shows the EVMK2HX evaluation board, produced by ADVANTECH [3], that is used in this paper. Under the fan and heat sink lies the main multi-core DSP+ARM SoC, the TCI6638K2K from Texas Instruments.

Fig. 1. The Target EVMK2HX Keystone-II Evaluation Board [4]

Fig. 2 depicts the functional block diagram of the TCI6638K2K chip. The chip contains a number of processing cores, accelerators and peripherals [5].

1http://www.top500.org/lists/
2http://www.top500.org/system/177999



Fig. 2. The Diagram of TCI6638K2K [3]

The chip features eight TMS320C66x fixed/floating point DSP cores running at up to 1.2 GHz, each capable of providing:

• 38.4 GMACs for fixed point,
• 19.2 GFLOPS for single-precision floating point.

Each DSP core also contains 32 KB of L1P cache, 32 KB of L1D cache and 1 MB of local L2 cache. All of the L1P/L1D/L2 can be separately configured as SRAM or cache, or a combination of the two. Moreover, these caches are accessible from the 40-bit physical memory address space, so that the ARM cores (or even other DSP cores) can access another DSP's cache/local SRAM memory, which naturally provides a way to share information between the cores. Note that the L2 cache/SRAM is big even by PC standards. One can thus load the entire DSP program and all its required data into L2 RAM and bypass the main chip memory matrix, reducing both the access latency and the contention on the chip-wide memory matrix.

Four ARM Cortex-A15 MPCores run at up to 1.4 GHz. They serve as the main controlling cores of the chip. The ARM Cortex-A15 is the single most powerful 32-bit ARM implementation to date, and four of them provide an ample amount of computing resources for running Linux. Each ARM core has 32 KB L1I and L1D caches and shared access to 4 MB of L2 cache.

The chip also includes a Multicore Shared Memory Controller (MSMC), which provides another 6 MB of SRAM shared by all 12 cores. It also has memory protection capability, so that we can limit each core's access privileges to this memory for safety and isolation.

The SoC furthermore has a number of hardware coprocessors supporting the typical digital signal processing functionalities of modern base stations [3].

Since the SoC contains 12 heterogeneous cores, communication and debugging of multi-core applications poses a bigger challenge than for traditional SoCs or GPPs with a smaller core count. In order to facilitate inter-core communication and synchronization, the SoC also provides a Multicore Navigator.

C. Related Work

In [1] and [6], the authors managed to use GPGPUs (more specifically, Nvidia GPGPUs with CUDA) to accelerate certain GNU Radio blocks. They rewrote the inner loops of the blocks as CUDA compute kernels and devised a way to call those kernels. While this project successfully brings GPU-accelerated blocks into the equation, it requires rewriting the critical portion of the blocks as CUDA compute kernels, which is a challenge due to the way GPUs work. Nvidia GPUs contain a large number of processing units with very large register sets and very fast, fine-grained hardware thread switching. Every core on the GPU executes the same kernel. Because the bottleneck is memory access, whenever a thread accesses shared memory (each core also has local memory, which is not subject to this limitation) and is stalled by contention on the GPU memory bandwidth, the hardware thread scheduling unit immediately switches to another thread. This means that if enough runnable threads are available, the memory access latency, although very large and on the scale of thousands of core clock cycles, can be completely hidden, because the processing unit always has new work to do. To fully exploit this architecture, the programmer needs to partition the program into many small pieces and coordinate the memory access pattern to optimize the usage of the special memory controller inside the chip (for example, the memory controller can do memory write aggregation; because the GPU memory bus is very wide, some wider than 1024 bits, it is critical to fully utilize this wide bus). This paradigm is particularly suited to image processing problems, where the program can be naturally divided into small sub-blocks and the processing of each sub-block assigned to a different thread. However, for SDR applications this might not be the best fit. Because we are essentially


doing 1D processing, and as the window length is usually limited (to prevent large latency), there might not be enough parallelism available to fully exploit the GPU architecture.

In [7], the authors use a TMS320C6670 DSP connected to the host PC through PCIe for computation offload, and they created special DSP blocks in GNU Radio. This work is very similar to what we propose, except for the way the DSP is connected and how the control of the DSP is handled. As the DSP is connected to the host PC through the PCIe point-to-point protocol and does not share memory with the host, excessive data movement could eliminate any DSP computation performance benefits. As presented in the previous subsection, our project utilizes a multi-core SoC with a large shared memory on chip, so that we can eliminate the cost of data copying between the host processors (ARM) and the DSP accelerators. It is worth mentioning that [7] also uses specially created DSP blocks in GNU Radio, just like what we are proposing.

In [8] the authors introduced interesting work (GReasy) for getting GNU Radio blocks running in FPGAs without the lengthy compilation time typically associated with FPGA designs. Although not directly related to this work, their approach of pre-compiling the FPGA blocks and assembling them on the fly is fascinating and fits within our long-term vision. One limitation is that [8] does not address any scheduling problem associated with GNU Radio; instead, the complete flow graph is assembled inside the FPGA. Although FPGAs are generally suited to the digital signal processing algorithms involved in SDR, using FPGAs to implement signal processing blocks typically involves much more development effort than purely software-based approaches.

III. APPROACH

In this section we outline our long-term vision and the evolutionary path to reach it.

A. Long-term vision

The future of GNU Radio is clearly multi-core and, more specifically, as mentioned in the preceding sections, heterogeneous. We can imagine that once GNU Radio is ported to non-GPP architectures, it will allow developing a unified scheduler capable of scheduling to multiple heterogeneous cores (including, but not limited to, networked clusters, GPUs, DSPs, custom accelerators like the Parallella project, and even FPGAs [8]). Imagine that after you draw your data flow graph and click Run, GNU Radio automatically compiles the code for each block and distributes memory-intensive blocks to the GPU, bitwise and integer-intensive blocks to the FPGA, and others even to a remote server cluster, all without human intervention. That should be the picture for tomorrow's GNU Radio, optimized for heterogeneous computing architectures.

To achieve that goal, GNU Radio will need a unified scheduler that can schedule each block to execute on the processor (or worker, in the case of a remote server cluster) that fits it best and optimizes the overall latency and other costs (e.g. power consumption). The unified scheduler will need to take all of these parameters into account:

• The proper way to share data: should it stream the data via message passing channels, or is shared memory a better approach? It will also need to take care of non-local memory (NUMA) issues.
• Throughput of the block running on a particular processor (worker).
• Latency of using different processors.
• Amount of data that needs to be transferred in real time, and the communication throughput and latency.
• Processor setup overhead. (For example, FPGAs need some time to set up; please refer to [8] for an interesting approach to reducing such setup time.)

Because GNU Radio blocks differ greatly in their processing needs, it makes sense not to restrict the scheduler to merely scheduling blocks to processors (workers). It is also interesting to investigate how the scheduler could combine adjacent blocks or disassemble huge blocks on demand to suit the given processors and processor availability.

Also, for non-local processors connected by low-throughput networks, it may be beneficial to automatically and transparently insert fast stream compressors and decompressors into the chain. Although this increases the computation demands of the involved blocks, the decrease in the amount of data to be transferred and the associated communication latency reduction might be worth the additional computing effort.

In order for this to work, all processing blocks need to be able to run on all supported processors. One naïve way might be to implement every block on every supported processor, but this does not scale very well: for N blocks and P supported processors, one would have to write N × P different implementations. Worse yet, every additional block involves implementing it on all P processors, and every newly supported processor architecture involves rewriting every one of the N blocks. In order to achieve better scalability, and to ease adding new blocks and new processors, we need to find a common implementation language that each processor supports reasonably efficiently.

Of course, this approach will not exploit the full potential of each supported processor, so for specific blocks, and likely for the processor that most efficiently executes such a block, one can still write a hand-optimized implementation.

Also, to relieve waveform developers from inter-block communication requirements, we should strive to decouple the core processing algorithm from the code needed to interface with the other blocks. We will also need to provide smarter memory management intrinsics for non-local memory management.

But in which language should we write the core processing algorithms? The answer to this would make an important research contribution that would benefit not only the GNU Radio community, but also the whole heterogeneous multiprocessing community. For this paper, we simply pick a subset of the venerable C programming language for that job. The reason is that C is simple and universal enough that most current processors support it directly. Even for the more data-flow-oriented and traditionally hostile-to-C FPGA targets, people are actively working on high-level synthesis tools that generate low-level hardware description language (HDL) code directly from algorithmic C functions. So picking a restricted subset of C seems a good choice for now. It is conceivable that future advances in data flow languages will provide better alternatives that suit the available computing technologies.

B. Evolutionary Path

We need to take smaller steps to achieve the ambitious goal outlined in the preceding section. We propose the following three milestones.

First, support running remote blocks on the same architecture as the host machine. This involves manually selecting parts of the flow graph and assigning them to another machine for execution. GNU Radio first partitions the flow graph into sub-flow graphs accordingly, and inserts the required TCP sources and sinks for each interconnection between the sub-flow graphs. Each sub-flow graph is treated as if it were self-contained, and a Python script is generated for each one of them to be sent to the remote machine for execution. The major work involved is mainly on the GUI, for enabling the partitioning, automatic TCP source/sink insertion, and managing the communication between machines. There are no scheduler changes: each worker machine essentially runs its own separate copy of the scheduler, and there is no global coordination between them.

Second, extend support to DSPs, GPUs and other processors, but still rely on manual partitioning and assignment of the blocks to each processor. This is the most tedious step, because it may require rewriting each block. The major work is separating out the core processing code and writing it in a portable language of choice (for example, OpenCL kernels or simple C subsets), and also extending the communication framework developed in the preceding milestone to shared memory and other point-to-point links (e.g. PCIe). At this stage the scheduling of blocks is still local, and there is no global coordination. This step might be tedious, but it is considered feasible.

The third step is designing, modeling and developing the unified scheduler. This is the major research challenge and remains for future research.

This paper focuses on the second milestone for the C66x DSP target.

C. Specific Issues for this Target

As in other multi-core programming projects, managing the data flow and the memory hierarchy is the single most important factor for performance. Since this work focuses on offloading computation to the DSP cores (part of the second milestone), the following sections will generally ignore the ARM cores running the Linux kernel.

As discussed in Section II, the DSP cores in the TCI6638K2K SoC use a complex 3-level memory hierarchy. Each core has 32 KB of Level 1 program and data caches, both of which can be configured into a flexible split of cache and SRAM. Each core also has 1 MB of private Level 2 cache, which can likewise be configured as SRAM.

Based on the timing data from [9], we conclude that configuring L2 as SRAM will reduce the L1 cache miss stall time. As L1D and L1P are only 32 KB in size, generally not big enough to hold all the data and program, we leave them as caches. (If the program does fit in 32 KB, then we can use L1P as program memory, and it will provide the best performance.)

Fig. 3 shows the proposed data flow between the ARM and DSP cores. Normally the DSP cores are kept in reset. When GNU Radio starts executing a DSP block, the code for the DSP is compiled on the fly and loaded to


the available DSP core's L2 SRAM along with the data. Then the ARM releases the target DSP core from reset and waits for its completion interrupt. To stream data to the DSP, we use a ping-pong double buffer within the DSP core's L2 SRAM: after the completion interrupt arrives, we swap the buffers and start filling data into the other buffer.

Fig. 3. Data Flows Between ARM and DSP Cores

IV. PROOF OF CONCEPT

Because we are after a purely open-source approach, we cannot make use of the proprietary TI DSP/BIOS RTOS [10] and the CCS IDE [11]. This means that we need to build our own toolchains and runtime libraries. Fortunately, existing open-source projects can accomplish all of these tasks; the only problem is properly integrating them (and fixing the bugs found in the process).

A. Development Setup

The development environment consists of a PC running Debian Linux unstable, the EVM board and two USRP N210s [12]. All of them are connected via an eight-port gigabit Ethernet switch. The USB-serial converter embedded on the board is also connected to the PC for the serial terminal.

B. Components

1) ARM Toolchain: The ARM toolchain is only needed to build the kernel and the bootloader. That is, we need to build binutils and gcc. Version 4.7.2 of gcc works fine for this purpose.

2) Linux Kernel: For this project we used the official TI kernel for the KeyStone II SoC, which is available from [13]. Remember to also build the boot monitor mon.bin [14].

3) Linux Rootfs: For simplicity, we selected Debian unstable (sid) as the Linux rootfs. We bootstrapped the system on the PC using the debootstrap [15] utility. This decision later turned out to be crucial when we started porting GNU Radio, because having a complete Linux distribution with very good package management eases the porting task.

4) Booting the System with NFS Root: Assuming the host Linux PC is on 172.16.1.1 and the board's IP is 172.16.1.2, the following u-boot transcript boots the system from the NFS root (/export/k2dsp):

setenv serverip 172.16.1.1
setenv ipaddr 172.16.1.2
setenv bootargs console=ttyS0,115200n8 \
    rootwait=1 earlyprintk root=/dev/nfs \
    nfsroot=172.16.1.1:/export/k2dsp,v3,tcp \
    rw ip=dhcp
setenv _1 tftpboot 0c5f0000 mon.bin
setenv _2 mon_install 0x0c5f0000
setenv _3 tftpboot 87000000 evm.dtb
setenv _4 tftpboot 88000000 uImage
setenv _5 bootm 88000000 - 87000000
setenv _ run _1 _2 _3 _4 _5

These commands only need to be entered once on the board, and saveenv saves them into the flash. After that, starting the system is as simple as one command: run _.

It should be noted that by default u-boot reserves 512 MB of DDR for DSP use. Since we do not plan to use the DDR from the DSP, we set setenv mem_reserve 128M so that only 128 MB is reserved. This makes more memory available for the ARMs and is convenient when compiling GNU Radio.

5) Build GNU Radio: We used the latest git master branch of GNU Radio: https://github.com/gnuradio/gnuradio.git. To enable most of the GNU Radio blocks, the following packages are needed:

libboost1.53-all-dev libcppunit-1.13-0 libcppunit-dev
libfftw3-dev libfftw3-3 python python-numpy
python-cheetah python-gtk2-dev python-lxml
python-wxgtk2.8 libgsl0-dev libzeroc-ice35-dev

6) DSP Toolchain: Except for a vague mention of the c6x gcc at [16], there is not much information available on how to build the c6x backend of GCC [17]. After some trial and error and digging through the gcc source code, we discovered that the correct target is c6x-none-elf. Only the C language frontend can be built, because libc is not yet available at this stage, and we do not plan to use C++ on the DSP anyway.


../configure --target=c6x-none-elf \
    --enable-languages=c \
    --disable-shared --disable-libssp

7) DSP Runtime Libraries: To the best of our knowledge, a complete and free implementation of a bare-metal C runtime library for the C66x DSPs is not available, so we had to port our own.

First we need a complete C runtime environment for the DSP. Devising the first program to run on the DSP is challenging, because we do not have any way to access the DSP directly. Worse yet, we do not have any C runtime crt0.o available, so the stack pointer is not set up correctly. Therefore we carefully developed a C program which does not require setting up the stack pointer (see Listing 1). The compiled program sends its status out through the serial port, which is guaranteed to be powered up and clocked.

Listing 1. First C Code Running on DSP

// c6x-none-elf-gcc -Os -fPIC -mdsbt -c test0.c -o test0.o
// c6x-none-elf-objcopy -O binary test0.o test0.bin
typedef unsigned int u32;
void _start() {
    volatile u32 *uart_data = (volatile u32 *)0x02530c00;
    *uart_data = '!';
    // won't return
    while (1) ;
}

Once Listing 1 was verified to work, we started adding support for a rudimentary C runtime using [18], as shown in Listing 2.

Listing 2. General C Runtime

typedef unsigned int u32;
__attribute__((noinline)) void putchar(int c) {
    volatile u32 *uart_data = (volatile u32 *)0x02530c00;
    *uart_data = c;
}
__asm__ (
    ".section .text.boot,\"ax\",@progbits\n"
    ".global _start\n"
    "_start:\n"
    // setup the stack pointer (b15)
    "mvk .S2 0, b15\n"
    "nop 2\n"
    // 0x0C010000 = multicore shared memory + 64KB
    "mvkh .S2 0x0C010000, b15\n"
    "nop 2\n"
    "b .S1 (_init)\n"
    "nop 5\n"
    ".text\n"
);
__attribute__((noreturn)) void _init() {
    // clear .bss section
    extern char _edata[], _ebss[];
    extern int main();
    char *p = _edata;
    while (p < _ebss)
        *p++ = '\0';
    main();
}
__attribute__((noreturn)) int main() {
    putchar('?');
    // won't return
    while (1) ;
}

To build Listing 2, one needs a linker script (Listing 3) and a makefile (Listing 4).

Listing 3. General C Linker Script

ENTRY(_start)
MEMORY {
    MCSM : ORIGIN = 0x0C000000, LENGTH = 6M
}
SECTIONS {
    .bin : {
        _text = .;
        *(.text.boot)
        *(.text)
        *(.text.startup)
        _etext = .;
        *(.data)
        _edata = .;
        *(.bss)
        _ebss = .;
    } > MCSM
}

Listing 4. The Makefile

CFLAGS := -Wall -Wextra -Werror -march=c674x \
    -fno-builtin \
    -O0 -ggdb
LDFLAGS := -Tlinker.map -nostdlib

all: test0.bin test1.bin

%.elf: %.o linker.map
	$(CC) $(LDFLAGS) -o "$@" $<
%.bin: %.elf
	$(OBJCOPY) -O binary "$<" "$@"

After getting the general C runtime environment working, we started porting the C library (libc) and the math library (libm). Because there are many open-source libc implementations — musl [19], diet libc [20], newlib [21], µClibc [22] and Android's bionic [23] — we simply picked the one we are most familiar with: newlib [21]. We do not need a complete libc implementation, as there is no OS running on the DSP.

For the math library we had two choices: one is the fdlibm library open-sourced by Sun Microsystems in 1993 [24]; the other is the one used in the various BSD UNIX operating systems, generally a fork of fdlibm [25]. We picked the latter, because it has been actively maintained in recent years. Having a complete IEEE-754-compliant libm is much more important than having a complete libc.

8) DSP Core Functions: For the DSP library we use liquid-dsp [26], which provides a standalone DSP library implemented in C with an emphasis on embedded systems.

9) ARM Control of the DSP: A lesser known fact about the IEEE POSIX pthread standard is that it does not actually require a thread to run on the same processor architecture as the parent thread. We


devised a DSP controlling library that mimics the well-known pthread interface. For example, the user can call pthread_create to create a DSP thread on any of the available DSP cores. The entry address is actually a pointer to a relocatable ELF program file containing the DSP code, and the argument to that thread is the address of the DSP L2 RAM where the input data resides. Another important API is pthread_join, which waits for the designated thread to terminate. This is achieved by waiting for the interprocessor interrupt sent from the DSP core (the exit() function for DSP code sends this interrupt to notify the completion of a program).

10) GNU Radio Integration: Following the model of [1], [6], [7] and [8], we created special DSP blocks in GNU Radio. The difference to the existing approaches is that we use the foreign pthread interface discussed above and compile the DSP code on the fly. This enables assembling and disassembling of blocks that run on the DSP.

C. Results

We have completed all of the preceding steps. Since the latter part of the project is still undergoing significant development and optimization, we will publish our performance benchmark results at [27].

V. CONCLUSIONS

GNU Radio is a popular SDR development environment for education and research. This paper has described our initial work on porting GNU Radio to the KeyStone II HMP-SoC. We argue for the need for fully open-source tools that support this development. Our long-term vision is that of a unified scheduler that can dynamically partition waveform processing tasks and assign them to a suitable processor. This paper discussed the milestones on this mission and provided initial results. The code and supporting documents are available for free download from [27].

ACKNOWLEDGMENT

The authors would like to thank Texas Instruments for providing the cutting-edge KeyStone II DSP+ARM SoC evaluation board. The authors also appreciate the support from the Wireless@VT lab for this work. Shenghou Ma is supported by the Configurable Computing Lab of Virginia Tech, and he would also like to thank his advisor, Dr. Peter Athanas, for his kind support of this work.

REFERENCES

[1] A Standalone Package for Bringing Graphics Processor Acceleration to GNU Radio: GRGPU, http://gnuradio.org/redmine/attachments/download/257/06-plishker-grc-grgpu.pdf, 2011.
[2] C. H. Van Berkel, "Multi-core for mobile phones," in Design, Automation & Test in Europe Conference & Exhibition (DATE '09), 2009, pp. 1260–1265.
[3] EVMK2HX Evaluation Modules, http://www.advantech.com/Support/TI-EVM/EVMK2HX.aspx, 2013.
[4] Keystone 2 EVM Technical Reference Manual Version 1.0, http://wfcache.advantech.com/www/support/TI-EVM/download/XTCIEVMK2X Technical Reference Manual Rev1 0.pdf, 2013.
[5] TCI6638K2K: Multicore DSP+ARM KeyStone II System-on-Chip (SoC), http://www.ti.com/lit/ds/symlink/tci6638k2k.pdf, 2013.
[6] W. Plishker, G. F. Zaki, S. S. Bhattacharyya, C. Clancy, and J. Kuykendall, "Applying graphics processor acceleration in a software defined radio prototyping environment," in 22nd IEEE International Symposium on Rapid System Prototyping (RSP), 2011, pp. 67–73.
[7] Using Digital Signal Processors in GNU Radio, http://gnuradio.squarespace.com/storage/grcon13 presentations/grcon13 ford snl dsps.pdf, 2013.
[8] A. Love and P. Athanas, "Rapid modular assembly of Xilinx FPGA designs," in 23rd International Conference on Field Programmable Logic and Applications (FPL), 2013, pp. 1–1.
[9] TMS320C66x DSP Cache User Guide, http://www.ti.com/lit/ug/sprugy8/sprugy8.pdf, 2010.
[10] DSP/BIOS Real-Time Operating System (RTOS), http://www.ti.com/tool/dspbios.
[11] Code Composer Studio (CCStudio) Integrated Development Environment (IDE) v5, http://www.ti.com/tool/ccstudio.
[12] UHD - USRP2 and N2X0 Series Application Notes, http://files.ettus.com/uhd docs/manual/html/usrp2.html.
[13] TI KeyStone SoC Linux Kernel, git://arago-project.org/git/projects/linux-keystone.git.
[14] MCSDK User Guide: Exploring the MCSDK, http://processors.wiki.ti.com/index.php/MCSDK UG Chapter Exploring, 2013.
[15] Debootstrap, https://wiki.debian.org/Debootstrap.
[16] Linux-c6x Project, http://linux-c6x.org/.
[17] The GNU Compiler Collection, http://gcc.gnu.org/.
[18] The C6000 Embedded Application Binary Interface: Application Report, http://www.ti.com/lit/an/sprab89/sprab89.pdf, 2011.
[19] musl-libc: a new libc for Linux, http://www.musl-libc.org/.
[20] diet libc: a libc optimized for small size, http://www.fefe.de/dietlibc.
[21] Newlib: a C library intended for use on embedded systems, http://sourceware.org/newlib/.
[22] µClibc: a C library for embedded Linux, http://www.uclibc.org/.
[23] bionic libc for Android, https://android.googlesource.com/platform/bionic.git.
[24] FDLIBM: Freely Distributable LIBM, http://www.netlib.org/fdlibm/.
[25] Math library contributed by SunPro, http://svnweb.freebsd.org/base/head/lib/msun/.
[26] liquid-dsp: digital signal processing library for software-defined radios, http://liquidsdr.org.
[27] GNU Radio on DSP Project, https://github.com/GRDSP/.
