Software-Defined Radios: Architecture, State-Of-The-Art, And

INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 1

Software-deﬁned Radios: Architecture, State-of-the-art, and Challenges Rami Akeela, and Behnam Dezfouli Internet of Things Research Lab, Department of Computer Engineering, Santa Clara University, USA Email: [email protected], [email protected]

Abstract—Software-defined Radio (SDR) is a programmable central significance for designers in the industry due to wire- transceiver with the capability of operating various wireless less protocols that advance rapidly and consistently, hence communication protocols without the need to change or update requiring a hardware that can keep up with the evolution. For the hardware. Progress in the SDR field has led to the escalation of protocol development and a wide spectrum of applications, example, the authors in [5] proposed a platform, OpenRadio, with more emphasis on programmability, flexibility, portability, for programming both PHY and MAC layers while offering and energy efficiency, in cellular, WiFi, and M2M communication. a high level of abstraction. Rather than including yet another Consequently, SDR has earned a lot of attention and is of equipment to deal with a new standard or recurrence band, great significance to both academia and industry. SDR designers the equipment of a formerly introduced platform can adjust intend to simplify the realization of communication protocols while enabling researchers to experiment with prototypes on to the particulars of another standard. In a military scenario, deployed networks. This paper is a survey of the state-of-the- for example, the needs of these platforms change in light of art SDR platforms in the context of wireless communication terrible conditions that develop during a mission, which may protocols. We offer an overview of SDR architecture and its not have been predicted when designed initially, leading to the basic components, then discuss the significant design trends development and utilization of new protocols. and development tools. In addition, we highlight key contrasts between SDR architectures with regards to energy, computing Software-defined Radio (SDR) is a technology for radio power, and area, based on a set of metrics. We also review existing communication. This technology is based on software-defined SDR platforms and present an analytical comparison as a guide wireless protocols, as opposed to hardware-based solutions. to developers. Finally, we recognize a few of the related research This translates to supporting various features and functionali- topics and summarize potential solutions. ties, such as updating and upgrading through reprogramming, Index Terms—SDR, Wireless Communication, Programmabil- without the need to replace the hardware on which they ity, Co-design, LTE, WiFi, IoT. are implemented. This opens the doors to the possibility of realizing multi-band and multi-functional wireless devices. I.INTRODUCTION The driving factors for the high demand of SDR include DVANCES in wireless technologies have altered con- network interoperability, readiness to adapt to future updates A sumers communication habits. Wireless technologies are and new protocols, and more importantly, lower hardware and an essential part of users daily lives, and its effect will become development costs. In a report [6], the SDR market is projected even larger in the future. In a technical report, the World to be more than USD 29 billion by the year 2021. Global Wireless Research Forum (WWRF) has predicted that 7 trillion Industry Analysts [7] highlights some of the market trends wireless devices for 7 billion people will be deployed by for SDR as follows: (i) increasing interest from the military 2020 [1]. When these huge numbers of devices are connected sector in building communication systems and large-scale to the Internet to form an Internt of Things (IoT) network, deployment in developing countries, (ii) growing demand for arXiv:1804.06564v1 [cs.NI] 18 Apr 2018 the first challenge is to adjust the basic connectivity and public safety and disaster preparedness applications, and (iii) networking layers to handle the large numbers of end points. building virtualized base stations (BSs). SDRs are also ideal There is an increasing number of wireless protocols that have for developing future space communications [8]–[10], Global been developed, such as ZigBee, BLE, LTE, and new WiFi Navigation Satellite System (GNSS) sensors [11], Vehicle-to- protocols for loT and Machine-to-Machine (M2M) communi- Vehicle (V2V) communication [12]–[14], and IoT applications cation purposes due to different demanding requirements, one [15], [16], where relatively small and low-power SDRs can be of which is high energy efficiency [2]. Wireless standards, utilized. in general, are adapting quickly in order to accommodate SDRs are implemented through employing various types different user needs and hardware specifications [3], [4]. This of hardware platforms, such as General Purpose Processors has called for a transceiver design with the ability to handle (GPPs), Graphics Processing Units (GPUs), Digital Signal several protocols, including the existing ones and those being Processors (DSPs), and Field Programmable Gate Arrays developed. In order to accomplish this task, one needs to (FPGAs). Each of these platforms is associated with their realize the protocols’ need for a flexible, re-configurable, and own set of challenges. Some of these challenges are: utilizing programmable framework. the computational power of the selected hardware platform, Both consumer enterprise and military frameworks have keeping the power consumption at a minimum, ease of design a need for programmable platforms. Programmability is of process, and cost of tools and equipment. The research com- INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 2

TABLE I Digital Front End RF Front End Antenna KEY ABBREVIATIONS

ADC Analog-to-Digital Converter Signal Interpolation Digital RF Power DAC ASIC Application-speciﬁc Integrated Circuit Processing Filter Mixer Up-Converter Amplifier

BS Base Station - FFT/IFFT Digital CUDA Compute Unified Device Architecture - FIR/IIR Digital Up - Decoding Local DAC Digital-to-Analog Converter Converter Oscillator DSP Digital Signal Processor (DUC) FFT Fast Fourier Transform FLOPS Floating Point Operations Per Second (a) Transmitter FPGA Field Programmable Gate Array GPP General Purpose Processor GPU Graphics Processing Unit RF Front End Digital Front End HLS High Level Synthesis Antenna NFV Network Function Virtualization SDR Software-defined Radio Power RF Digital Lowpass Signal ADC SDR Software-defined Network Amplifier Tuner Mixer Filter Processing SNR Signal-to-noise Ratio SoC System on Chip Digital Digital Down USRP Universal Software Radio Peripheral Local Converter Digital Oscillator Analog (DDC)

(b) Receiver munity and industry have both developed SDRs that are based on the aforementioned hardware platforms. A few examples Fig. 1. SDR architecture. Sub-figure (a) shows SDR from a receiver’s point include USRP [17], Sora [18], Atomix [19], Airblue [20], of view, and sub-figure (b) shows SDR from a transmitter’s point of view. and WARP [21]. Each SDR is unique with regards to the design methodology, development tools, performance, and end application. digital signal processing, and digital front end, which includes In this paper, we first present an overview of SDR archi- channel selection and modulation/demodulation, take place tecture as well as the analog and digital divides of the system in the digital domain. This is usually performed in software and interconnection of components. We then introduce the running on processors, such as GPPs and DSPs. However, it criteria based on which the different hardware platforms are can also run on programmable hardware, i.e., FPGAs. classified. We thoroughly examine the architecture and design In general, from the transmitter point of view, a baseband approaches employed by these hardware platforms and present waveform needs to be produced, then an Intermediate Fre- their strengths and weaknesses in the context of SDR imple- quency (IF) waveform, generate an RF waveform, then send mentation. Furthermore, we provide an analytical comparison it through the antenna. From the receiver point of view, this RF of hardware platforms as a guide for design decision making. signal is sampled and de-modulated, then decoded. To provide Moreover, we discuss the use of respective development tools more details to the process, we study the receiving end of the and present a summary to help explain their functionalities system as follows. and the platforms they support. Afterwards, we review the The RF signal from the antenna is amplified, with a tuned SDR platforms developed by both industry and academia, and RF stage that amplifies a range of the frequency band. This analyze and compare them using the criteria discussed before. amplified RF signal is then converted to an analog IF signal. Finally, we identify current challenges and the open research The Analog-to-Digital Converter (ADC) digitizes this IF signal topics related to future SDR development. into digital samples. Then, it is fed into a mixer stage. The This paper is organized as follows: Section II provides a mixer has another input coming from a local oscillator with a description of SDR architecture as well as the classification frequency set by the tuning control. The mixer then translates process used to summarize the various design approaches the input signal to a baseband. The next stage is essentially adopted. Section IV lists some of the corresponding develop- a FIR filter that permits only one signal. This filter limits the ment tools and platforms. Section V presents an analysis and signal bandwidth and acts as a decimating low-pass filter. The comparison of the commercially and academically developed digital down-converter includes a large number of multipliers, SDR platforms. Research questions and future trends are adders, and shift-registers, in the hardware in order to accom- highlighted in Section VI. Section VII presents an analysis plish the aforementioned tasks. Next, the signal processing of the existing literature on SDR surveys. We conclude the stage performs tasks such as de-modulation and decoding. paper in Section VIII. A list of key abbreviations used in this This stage is typically handled by a dedicated hardware like paper can be found in Table I. an Application Specific Integrated Circuit (ASIC) or other programmable alternatives like FPGA or DSP [22]. II.CONCEPTSAND ARCHITECTURE As shown in Figure 1 (a) and (b), at a high level, a typical In this section, we examine the general architecture of SDR transceiver consists of the following components: Signal SDRs, their main components, and their processing require- Processing, Digital Front End, Analog RF Front End, and an ments. As explained in the previous section, SDRs play a vital antenna. role in wireless standard development due to their flexibility 1) Antenna: SDR platforms usually employ several anten- and ease of programmability. This is due to the fact that most nas to cover a wide range of frequency bands [23]. Antennas INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 3 are often referred to as ”intelligent” or ”smart” due to their In a SDR transceiver, the following tasks are executed in the ability to select a frequency band and adapt with mobile digital front end: tracking or interference cancellation [22], [24]. In the case of – In the transmitting side (Figure 1(a)), the Digital Up SDRs, an antenna needs to meet a certain list of requirements Converter (DUC) translates the aforementioned baseband such as self-adaptation (i.e., flexibility to tuning to several signal to IF. The DAC that is connected to the DUC then bands), self-alignment (i.e., beamforming capability), and self- converts the digital IF samples into an analog IF signal. healing (i.e., interference rejection) [24]. Afterwards, the RF up-converter converts the analog IF 2) RF Front End: This is a RF circuitry that its main func- signal to RF frequencies. tion is to transmit and receive the signal at various operating – In the receiving side (Figure 1(b)), the ADC converts frequencies. Its other function is to change the signal to/from the IF signal into digital samples. These samples are the Intermediate Frequency (IF). The process of operation is subsequently fed into the next block, which is the Digital divided into two, depending on the direction of the signal (i.e., Down Converter (DDC). The DDC includes a digital Tx or Rx mode): mixer and a numerically-controlled oscillator. DDC ex- tracts the baseband digital signal from ADC, and after – In the transmission path, digital samples are converted being processed by the Digital Front End, this digital into an analog signal by the Digital-to-Analog Converter baseband signal is forwarded to a high-speed digital (DAC), which in turn feeds the RF Front End. This analog signal processing block [29]. signal is mixed with a preset RF frequency, modulated, and then transmitted. 5) Signal Processing: Signal processing operations, such – In the receiving path, the antenna captures the RF signal. as encoding/decoding, interleaving/deinterleaving, modula- The antenna input is connected to the RF Front End tion/demodulation, and scrambling/descrambling, are per- using a matching circuitry to guarantee an optimum formed in this block. Encoding for the channel serves as an signal power transfer. It then passes through a Low Noise error correcting code. Specifically, the encoded signal includes Amplifier (LNA), which resides in a close proximity to redundancy that is utilized by the receiver’s decoder to re- the antenna, to amplify weak signals and minimize the construct the original signal from the corrupted received sig- noise level. This amplified signal, with a signal from the nal. Examples of error correcting codes include Convolutional Local Oscillator (LO), are fed into the mixer in order to Codes, Turbo Codes, and Low Density Parity Check (LDPC) down convert it to the IF [25]. [30]. The decoder constitutes the most computationally intensive part of the Signal Processing block, due to data transfer 3) Analog-to-Digital and Digital-to-Analog Conversion: and memory schemes [31]. The second part that is regarded The DAC, as mentioned in the previous section, is responsible as highly complex and expensive (in terms of area and power) for producing the analog signal to be transmitted from the is the Fast Fourier Transform (FFT) and Inverse FFT (IFFT), digital samples. On the receiver side, the ADC resides, and is as part of the modulation phase [32]. an essential component in radio receivers. The ADC is respon- The signal processing block is commonly referred to as sible for converting continuous-time signal to a discrete-time, the baseband processing block. When discussing SDRs, the binary-coded signals. ADC performance can be described by baseband block is at the heart of the discussion, as it makes various parameters [26], [27] including: (i) Signal-to-Noise up the bulk of the digital domain of the implementation. This Ratio (SNR): the ratio of signal power to noise power in implementation runs on top of a hardware circuitry that is the output, (ii) resolution: number of bits per sample, (iii) capable of processing signals efficiently. Examples include Spurious-free Dynamic Range (SFDR): the strength ratio of ASICs, FPGAs, DSPs, GPPs, and GPUs. The second part the carrier signal to the next strongest noise component or spur, of the implementation is the software, which provides the and (iv) power dissipation. Advances in SDR development functionality and high-level abstractions to execute signal have provided momentum for ADC performance improve- processing operations. In the next section we examine the ments. For example, since ADC’s power consumption affects aforementioned hardware platforms and analyze in detail the the lifetime of battery-powered SDRs, more energy efficient various design approaches. ADCs have been developed [28]. 4) Digital Front End: The Digital Front End has two main functions [27]: III.DESIGN APPROACHES – Sample Rate Conversion (SRC), which is a functionality In this section, we discuss the classification of the various to convert the sampling from one rate to another. This is SDR design methodologies of the baseband processing block, necessary since the two communication parties must be namely GPP, GPU, DSP, FPGA, and co-design based method- synchronized. ologies. In this classification, we analyze and compare SDR – Channelization, which includes up/down conversion in platforms based on a set of performance metrics in a criteria the transmitter and receiver side, respectively. It also which we introduce. This criteria includes: includes channel filtering, where channels that are divided – Flexibility and Reconfigurability. The capability for the by frequency are extracted. Examples include interpola- modulation and air-interface algorithms and protocols to tion and low-pass filters, as can be observed in Figure evolve by merely loading new software onto the platform 1. [10]. INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 4

– Adaptability. The SDR platform can adjust its capabilities GPUs are processors specifically designed to handle graphics- as network or traffic operational requirements change. related tasks and they efficiently process large blocks of – Computational Power. The processing rate of the SDR streaming data in parallel. SDR platforms comprised of both platform, namely Giga Operations per Second (GOPS). GPPs and GPUs are flexible and have a higher level of pro- – Energy Efficiency. The total power consumption (typi- cessing power. However, this results in a lower level of power cally within a few hundreds milli watts), especially for efficiency (e.g., GPP’s power efficiency is ∼9GFLOPS/W for mobile [33] and IoT deployments. single-precision, compared to 20GFLOPS/W for GPU [42]). – Cost. The total cost of the SDR platform, including GPUs act as co-processors to GPPs because a GPP is required time-to-market, development, and hardware costs. to act as the control unit and transfer data from external memory. After a transfer is completed, signal processing algorithms are executed by the GPU. While GPUs are typically used for processing graphics, they A. GPP-based are also useful at signal processing algorithms. Over the past One of the first approaches to realizing SDR platforms is few years, theoretical peak performance for GPUs and GPPs using a General Purpose Processor (GPP), or the commonly for single and double precision processing has been growing known generic computer microprocessors such as x86/64 and [43]. For example, comparing Intel Haswell’s 900 GFLOPs ARM architectures. Examples of SDR platforms that utilize [44] with NVIDIA GTX TITAN’s 4500 GFLOPS [45] for GPPs include Sora [18], KUAR [34], and USRP [17]. single precision, it is apparent that GPUs have a computational 1) Definition and Uses: A GPP is a digital circuit that is power that far exceeds their GPP counterparts [43]. Their clock-driven and register-based, and is capable of processing multi-core architectures and parallel processors are the main different functions and operates on data streams represented attractive features, in addition to their relatively reasonable in binary system [35]. These GPPs can be used for several prices and credit card sizes. These features makes them purposes, making them extremely useful for unlimited number good candidates as co-processors in GPP-based SDRs, where of applications, eliminating the need for building application- they can play a vital role in accelerating computing-intensive specific circuits, and thus reducing the overall cost of run- blocks [46]. Another advantage is their power efficiency, which ning applications. GPPs are generally a preferable hardware keeps improving with every new model (e.g., it went from platform by researchers in academia due to their flexibility, 0.5 to 20GFLOPS/W for single-precision) [42]. To take full abundance, and ease of programmability, which is one of advantage of GPUs, it is a condition that algorithms conform the main requirements in SDR platforms [36]. In addition, to their architecture. From an architectural perspective, GPUs researchers prefer GPPs since they are more familiar with them have a number of advantages that makes them preferable so- and their software frameworks, compared to DSPs and FPGAs. lutions to applications such as video processing. In particular, From the performance point of view, GPPs are being enhanced GPUs employ a concept called Single Program Multiple Data rapidly, credited not only to technological advances in terms of (SPMD) that allows multiple instruction streams to execute CMOS technology [37], but also to the increase of the average the same program. In addition, due to their multi-threading number of instructions processed per clock cycle. The latter is scheme, data load instructions are more efficient. GPUs also achieved through different means, and in particular, utilizing present a high computational density, where cache to ALU parallelism within and between processors. This has led to the ratio is low [47]. evolution of multi-core GPPs [38]. In Table II, the authors of [46] confirmed that the signal 2) Adoption and GPUs: Architecturally, the instruction set detection algorithm (which includes intensive FFT computa- of GPPs include instructions for different operations such as tions) shows a faster parallel processing in the case of GPU Arithmetic and Logic Unit (ALU), data transfer, and I/O. over GPP, while operating in real-time. This is due to the A GPP processes these instructions in the sequential order. availability of cuFFT library developed for NVIDIA GPUs Because of sequential processing, GPPs are not convenient for for more efficient FFT processing [48]. With regards to the high-throughput computing with real-time requirements (i.e., architectural advantage of GPUs, several hundred CUDA cores high throughput and low latency) [39]. For example, using can perform a single operation at the same time, as opposed GNU Radio [40] to implement IEEE 802.11 standard, which to a few cores in the case of multi-core GPPs. requires 20MHz sampling rate, would be challenging, since Examples of using GPUs alongside GPPs to build SDR GNU Radio is restricted by the limited processing capabilities platforms is the work in [49], where the authors built a of GPPs. This leads to the GPP cores (of the PC attached) to framework on a desktop PC in addition to using a GPU to reach saturation and frames become corrupted and discarded. implement an FM receiver. The authors in [46] studied real- Moreover, wireless protocols require predictable performance time signal detection using an SDR platform composed of in order to guarantee meeting timing constraints. However, a laptop computer and an NVIDIA Quadro M4000M [45]. conditional branch instructions in GPP’s instruction sets lead Examples of GPUs available in the market can be found to out-of-order execution, which makes it unfeasible to achieve in Table III. In this table, we show two examples of high predictability. performing GPUs (> 5500 GFLOPS) suitable for SDRs with To remedy the limitation of GPPs, researchers have pro- strict timing and performance requirements. We also show posed multiple solutions, one of which is the addition of two more examples of less powerful and less expensive GPUs co-processors, such as Graphic Processing Unit (GPU) [41]. suitable for prototyping SDRs in academic environments. INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 5

TABLE II PERFORMANCEOF SIGNAL DETECTION ALGORITHMON GPP AND GPU [46]

Processing Platform of Signal Detection Algorithm ADC Data Length (ms) GPP Serial Processing (ms) GPP Parallel Processing (ms) GPU Parallel Processing (ms) 1 13.487 1.254 0.278 10 135.852 12.842 2.846 100 1384.237 131.026 29.358 1000 13946.218 1324.346 321.254

TABLE III COMPARISON OF GPUS

NVIDIA GeForce AMD Radeon NVIDIA GeForce AMD Radeon GTX 980 Ti [45] R9 390X [50] GTX 680 [45] RX 560 [50] GFLOPS 5632 5913 3090 2611 Power Consumption (W) 250 363 356 180 Frequency (MHz) 1000 1050 1006 1175 Cost (USD) 870 520 300 150

3) Shortcomings: State-of-the-art GPP and GPU-based modulation/demodulation, filtering, and encoding/decoding are platforms, such as Sora [18] and USRP [17], utilize desktop commonly and frequently used in applications that include computers to realize the systems. However, these platforms speech recognition, image processing, and communication consume a significant amount of power for a performance systems. DSPs, however, implement them more quickly and goal and their form factor is large, which makes it impossible efficiently due to their architecture (e.g., RISC-like architec- for real-world deployments. It is worth noting that GPPs ture, parallel processing) which is specifically optimized to and GPUs alike, present scaling limitation while meeting handle arithmetic operations, especially multiplications. Since Koomey’s Law. This law states that energy efficiency of DSPs are capable of delivering high performance with lower computers doubles roughly every 18 months [51]. This lim- power, they are better candidates for SDR deployment [58], itation calls for alternatives that provide higher computing compared to GPPs. Examples of DSPs especially designed power while keeping the energy efficiency the same. One for SDR platforms are TI TMS320C6657 and TMS320C6655. alternative is the hybrid or co-design approach, where software These DSPs are both equipped with hardware accelerators for and hardware implementations are combined. This will be complex functions like the Viterbi and Turbo Decoders [59]. discussed in more details in Section III-D. 2) Adoption: As discussed in the previous section, GPPs When both GPP and GPU are used for a SDR design, data provide an average performance for a wide range of appli- transfer operations between GPP and GPU can be bottlenecks cations. Needless to say, this performance level might be and cause performance loss, especially for meeting real- sufficient for research and academia, but if the system is to time requirements [52]. However, there are continuous efforts be deployed commercially, certain performance requirements to reduce or eliminate the time overhead of data transfers must be met. To this end, compared to GPPs, DSPs are tailored by introducing multi-stream scheduling for pipelining of the for processing digital signals efficiently, utilizing features memory copy tasks. This would ensure no stalls in the pipeline like combined multiply-accumulate operations (MAC units) and thus enhancing processing parallelism [53], [54]. Finally, and parallelism [60]. DSP manufacturers usually sell these although the processing power of microprocessors is being products in two flavors: optimized for performance, and energy constantly improved, balancing between sufficient computing optimized. Therefore, when used in SDRs, high performance power and a specific goal for energy consumption and cost, and energy efficient products can be employed in BSs and stays a very difficult task now and in the future. This is true edge devices, respectively. especially with the growing need for more data to be processed In terms of the instruction set, DSPs can be categorized into and blocks that can handle data processing in parallel. two groups: (i) Single Instruction Multiple Data (SIMD) architecture, and (ii) Multiple Instruction Multiple Data (MIMD) architecture, as described by Michael J. Flynn in what is B. DSP-based known as Flynn’s Taxonomy [61], [62]. This taxonomy is a The DSP-based solution can be considered as a special method of classifying various architectures depending on the case of GPP-based solutions, but due to its popularity and number of concurrent instructions and data streams, as follows: unique processing features, it deserve a separate discussion. – A SIMD-based DSP can execute an instruction on mul- An example of DSP-based SDR is the Atomix platform [19] tiple data streams at the same time. This architecture can which utilizes TI TMS320C6670 DSP [55]. be very efficient in cases when there exists high data 1) Definition and Uses: DSP is a particular type of mi- parallelism within the algorithm [63]. This indicates that croprocessor that is optimized to process digital signals [56]. there are similar operations that can be performed on To help understand how DSPs are distinguished from GPPs, different datasets at the same time. Examples of SIMD- we should first note that both are capable of implementing based DSPs include the Cell processor presented in [64] and processing complex arithmetic tasks [57]. Tasks like which supports 256 GFLOPS. More examples of DSPs INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 6

that are optimized for low power are NXP CoolFlux DSP C. FPGA-based [65] and Icera Livanto [66]. A SDR employing a SIMD DSP is the SODA architecture [67]. It has been a common Another approach towards realizing SDRs is to use a practice to add more cores in order to achieve a better programmable hardware such as FPGAs. Example of FPGA- trade off between performance and power. With each based SDR platforms are Airblue [20], Xilinx Zynq-based extra core utilizing Very Long Instruction Word (VLIW), implementation of IEEE 802.11ah [73], and [74] that used the a higher level of parallelism can be accomplished as well. same FPGA board to implement a complete communication – On the other hand, MIMDs have the ability to operate system with channel coding. on multiple data streams executing multiple instructions 1) Definition and Uses: An FPGA is an array of pro- at any point in time. This is essentially an extension grammable logic blocks, such as general logic, memory, and of the SIMD architecture, where different instructions multiplier blocks, that are surrounded by a routing fabric, or programs run on multiple cores concurrently. This is which is also programmable [75]. This circuit has the ca- especially important and useful in cases where parallelism pability of implementing any design or function, with ease is not uniform across different blocks, but MIMD ar- of updating it. Although FPGAs consume more power and chitecture allows for parallel execution which leads to occupy more area than ASICs, the programmability feature is speed improvements. Examples of MIMD-based DSPs the reason behind their increasing adoption in a wide range include Texas Instruments SMJ320C80 and SM320C80 of applications. Furthermore, when the reconfiguration delay DSPs with 100 MFLOPS [59]. is in the order of milliseconds, the SDR can switch between different modes and protocols seamlessly [76]. Another major Since DSPs are customized to meet certain signal difference is that, ASIC fabrication is expensive (at least a processing-related needs, it is crucial to clarify these cus- few tens of thousands of dollars) and requires a few months, tomizations in order to understand how DSPs stand out and whereas FPGAs can be quickly reprogrammed, and their cost how they are successful at not only meeting the requirements, is within a few tens to a few thousands of dollars, at most. but also becoming a vital player in the wireless communication The low end product cycle, along with attractive hardware field. These customizations, which are mostly architecture- processing advantages, such as high speed performance, low related, are as follows. power consumption, and portability, compared to processors such as GPPs and DSPs, present FPGAs as contenders that In [68], the authors discuss the energy efficiency of DSPs. offer the best of both worlds [75]. In general, DSPs consume more power than ASICs, however, there exists DSPs that are optimized for low power wireless In a study by [77], the authors compared the performance of implementations, such as TI C674x DSP [59]. One of the Xilinx FPGAs [78] against 16-core GPPs. The calculation of methods to lower power consumption is using multiple data peak performance for GPPs was performed through multiply- memory buses (e.g., one for write, and two for reads). This ing the number of floating point function units on each core by paves the way for higher memory bandwidth, and allows the number of cores and by the clock frequency. For FPGAs, for multiple operand instructions, which in turn results in performance is calculated through picking a configuration, fewer cycles. Also, as discussed above, VLIW architectures adding up the Lookup Tables (LUTs), flip-flops, and DSP along with specialized instructions can provide a higher level slices needed, then multiplying them by the appropriate clock of efficiency, and hence lower energy consumption. These frequency. The authors calculated the theoretical peaks for 64- improvements can be seen in DSPs such as TI TMS320C6x bit floating point arithmetic and showed that Xilinx Virtex-7 [59] and ADI TigerSHARC [69]. These techniques, coupled FPGA is about 4.2 times faster than a 16-core GPP. This can with proven power saving techniques like clock gating and been seen in Figure 2. Even with a one-to-one adder/multiplier putting parts of or the entire system in sleep mode, further configuration, the V7-2000T achieved 345.35GFLOPS, which reduce the power consumption. Examples of DSPs available in is better than a 16-core GPP. From Intel [44], Stratix 10 the market can be found in Table IV. In this table, we present FPGAs can achieve a 10 Tera FLOPS peak floating point three examples of DSPs that do no include co-processors, and performance [79]. This is due to the fixed architecture of the three DSP-based SoCs that, in addition to DSP cores, include GPP, where not all functional units can be fully utilized, and extra soft cores as control processors. the inherent parallelism of FPGAs and their dynamic architecture. In addition, despite having lower clock frequencies (up to 3) Shortcomings: Although DSPs have been widely 300MHz), FPGAs can achieve better performances due to their adopted for SDR implementations for decades [72], they architectures which allow higher levels of parallelism through present shortcomings as follows. As more applications call for custom design [80]. In a study by [81], the authors compared increasing parallelism and reconfigurability in order to handle the performance and power efficiency of FPGAs to that of computationally intensive tasks, DSPs can be insufficient. GPPs and GPUs using double-precision floating point matrix- In addition, programming DSPs to achieve higher levels of vector multiplication. The results show that FPGAs are capable parallelism predictability can be challenging. This opened the of outperforming the other platforms, while maintaining their door for parallel architectures like FPGAs, or multi-core GPPs, flexibility. In another study by [47], the authors thoroughly or even a hybrid of both, to be adopted for SDRs. In addition, analyzed and compared FPGAs against GPUs via the imple- power consumption of DSPs is generally higher than FPGAs mentations of various algorithms. The authors concluded that due to operating at high frequencies. although both architectures support a high level of parallelism, INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 7

TABLE IV COMPARISON OF DSPSAND DSP-BASED SOCS

DSP only SoC TI C66x CEVA Analog Devices TI Keystone II Analog Devices Qualcomm Snapdragon 820 (TMS320C6652) [59] (XC-4500) [70] (ADSP-21369) [69] (66AK2G02) [59] (ADSP-SC573) [69] (Hexagon 680) [71] GFLOPS 9.6 40 2.4 28.8 5.4 No Floating Point Memory (Kb) 1088 No Info 2000 1024 768 No Info Frequency (MHz) 600 1300 400 600 450 2000 Cost (USD) ∼25 No Info ∼20 ∼20 ∼20 ∼70 ARM ARM Qualcomm Kyro 385 (CPU) Soft Core N/A N/A N/A Cortex-A15 Cortex-A5 Adreno 530 (GPU)

700 through the implementation of several techniques discussed GPP 600 FPGA (1-to-1) in [76]. These techniques can be at a system, device, and/or

500 FPGA (Best) architecture level, such as clock gating and glitch reduction. Table V presents a summary of widely-used FPGA platforms. 400 3) Shortcomings: One of the challenges of using FPGAs, V7-2000T GFLOPS GFLOPS 300 however, can be the prior knowledge about target hardware 200 architecture and resources that a developer needs to possess V6-SX475T V5-SX240T 100 Sixteen-core in order to design an application efficiently for FPGAs. In the V4-LX200 Six-core Dual-core Quad-core SDR domain, designing the platform has typically been the 0 90nm 65nm 40-45nm 28-32nm job of software engineers, and thus the process can be time- CMOS Technology consuming and less trivial to incorporate this experience into Fig. 2. Peak performance of GPPs versus FPGAs when performing 64-bit hardware design. However, as it will be discussed in Section floating point operations [77]. It can be observed that FPGAs increased their IV-A, adoption of FPGA solutions can be made more feasible floating point performance by an order of magnitude compared to GPPs. through HLS tools. which is crucial to signal processing applications, FPGAs offer D. Hybrid Design (a.k.a., co-design) a larger increase in parallelism, whereas GPUs have a fixed The fourth approach towards realizing SDRs is the hybrid parallelism due to their data path and memory system. approach, where both hardware and software-based techniques 2) Adoption: Over the past decade, FPGAs have signifi- are combined into one platform. This is commonly referred cantly advanced and become more powerful computationally, to as the co-design or hybrid approach. Examples of SDRs and now exist in many differnet versions such as Xilinx that adopted the co-design approach include WARP [21] and Kintex UltraScale [78] and Intel Arria 10 [44] [82], [83]. In CODIPHY [92]. addition, the availability of various toolsets gave FPGAs an 1) Definition: Hardware/software co-design as a concept advantage by making them more accessible. This is supported has been around for over a decade, and it has evolved at a by the availability of compilers that have the capability of faster rate in the past few years due to an increasing interest generating Register-transfer Level (RTL) code, such as Verilog in solving integrated circuit design problems with a new and and VHDL, that is needed to run on FPGAs, from high-level different approach. Even with GPPs becoming more powerful programming languages. This process is typically referred to than ever, and with multi-core designs, it is clear that in order as High Level Synthesis (HLS). Examples of such compilers to achieve higher performance and realize applications that include HDL Coder [84] for MATLAB code [85] and Xilinx demand real-time processing, designers had to shift attention HLS [86] or Altera Nios II C2H compiler [87] for C, C++, to new design schemes that utilize hardware solutions, namely, and SystemC. We will explain some of these tools in Section FPGAs and ASICs [93], [94]. Co-design indicates the use IV. of hardware design methodology, represented by the FPGA HLS allows software engineers to design and implement fabric, and software methodology, represented by processors. applications, such as SDRs, on FPGAs using a familiar pro- As more applications, such as automotive, communication, gramming language to code, namely C, C++, SystemC, and and medical, grow in complexity and size, it has become MATLAB, without the need to posses a prior rich knowledge a common practice to design systems that integrate both about the target hardware architecture (refer to Section IV-A). software (like firmware and operating system) and hardware These compilers can also be used to speed up or accelerate [95]. This has been made feasible in the recent years thanks to parts of the software code running on a GPP or DSP that the advances in high-level synthesis and developing tools that are causing slowdowns or setbacks to the overall performance. not only have the capability to produce efficient RTL from This will be further discussed in Section III-D. Further, FPGAs software codes, but also define the interface between the both can achieve high performance while still consuming less sides. The industry has realized the huge market for co-design, energy than previously discussed processors [88] (e.g., Intel and provided various SoC boards, that in addition to the FPGA Stratix 10 FPGA can achieve up to 100 GFLOPS/W [89], fabric, contain multiple processors. For example, the Xilinx compared to 23 GFLOPS/W for NVIDIA GeForce GTX 980 Zynq board [78] includes an FPGA fabric as well as two ARM Ti [45]). In addition, power dissipation can be further lowered Cortex-A9 processors [96]. In addition to the aforementioned INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 8

TABLE V COMPARISON OF FPGASAND FPGA-BASED SOCS

FPGA only SoC Xilinx Kintex-7 Intel Cyclone V GX Lattice ECP3-70 Xilinx Zynq-700 Intel Cyclone V SE SoC Microsemi SmartFuion2 (XC7K70T) [78] (C5) [44] (LFE3-70EA) [90] (Z-7020 XC7Z020) [78] (A5) [44] (M2S090) [91] Logic Cells (K) 65.6 77 67 85 85 86.31 Memory (Mb) 4.86 4.46 4.42 4.9 3.97 4.488 DSP Slices 240 150 128 220 87 84 Cost (USD) 130 185 80 110 110 155 Dual-core ARM Dual-core ARM Soft Core N/A N/A N/A ARM Cortex-M3 Cortex-A9 Cortex-A9 advantages, there are other reasons that make co-design even 14000 16nm more interesting including, faster time-to-market, lower power 12000 consumption (when optimized for this), ﬂexibility, and higher 10000 processing speeds, as typically hardware in these systems is 8000 used as an acceleration to software bottlenecks [97]. 6000 Adopting the co-design methodology in essence is a mat- 20nm

Number of DSP Slices DSP of Number 4000 28nm 16nm ter of partitioning the system into synthesizable hardware 20nm 40nm 28nm 28nm 2000 and executable software blocks. This process depends on a 65nm 90nm strict criteria that is developed by the designer [98], [99]. 0 Virtex-4 Virtex-5 Virtex-6 Kintex-7 Virtex-7 Zynq-7000 Kintex Virtex Kintex Virtex The authors in [100] and [101] discuss their partitioning UltraScale UltraScale UltraScale+ UltraScale+ Xilinx FPGA Series methodologies and present the process of making the proper architectural decisions. Common methods typically provide Fig. 3. Number of DSP Slices in Xilinx FPGAs [78]. The values on top of useful information to the designer to help make the best the bars refer to the CMOS technology used. decision of what to implement in hardware and what to keep in software. This information can include possible speedups, for handling intensive signal processing tasks, as depicted in communication overheads, data dependencies, and locality and Figure 3. This also can be seen from USRP E310 model, which regularity of computations [101]. Examples of SoC boards incorporates a Xilinx Zynq SoC [78]. available in the market can be found in Table V. 3) Shortcomings: A downside of adopting SoCs for co- 2) Adoption: As mentioned in Section II, SDRs can be con- design is that their prices are generally higher, compared sidered inherently hybrid or heterogeneous systems, thereby to previously mentioned design approaches, due to having implying the need for both hardware and software blocks. multiple components on the same board, i.e., processor and This is due to the fact that the control part is usually taken FPGA fabric. Other factors contributing to this include extra care of by a general processor, and other functions, such memory and sophisticated interfaces. Another challenge of co- as signal processing, by a specialized processor like DSPs, design is shared memory access, e.g., external DDR memory, and sometimes are accelerated using dedicated hardware like between the processor and FPGA fabric. The study of [106] FPGAs [102]. This design approach fits well with SDRs and shows that the number of memory read and write operations can be fully utilized to meet certain requirements that pertain performed by a GPP is higher than that of FPGAs. This is to their attractive features. For example, accelerating portions due to the fact that processors perform operation on registers, of a block or moving it entirely to the FPGA fabric can while FPGAs operate on buffers. Since memory accesses add help to push the processing time to the limit in order to up to the overall latency, this can cause a bottleneck to the achieve a real-time performance for real-life deployment. In overall performance. In addition, the authors have developed addition, through careful implementation of RTL optimiza- a methodology for predicting shared memory bandwidth by tion techniques, the development of power efficient systems using a functionally-equivalent software. This enables the for mobile and IoT applications would be possible. On the designers to be aware of any bottlenecks before implementing other hand, running most of the MAC layer operations on a the entire co-design. processor, or multi-processors, can be advantageous for easy reconfiguration. Therefore, different partitioning schemes can E. Comparison be adopted to meet the requirements of the application at hand. When we covered different design methodologies and hard- It is worth noting that FPGA vendors, namely Intel [44] ware platforms for a wide selection of SDR platforms, we and Xilinx [78], are widening their product base with more intended to compare them analytically one-on-one, using a SoCs and Multi-processor SoCs (MPSoCs) [103], due to the cross-platform implementation of one of the wireless commu- the growing demand for such devices. An example of an nication protocols. What was available in literature, instead, SDR realized on an MPSoC is the work by [104]. In a white was a series of abstract comparisons using a set of bench- paper, National Instruments (the company that owns USRP marks targeting High Performance Computing (HPC), and not [17]) predicts that the future of SDRs is essentially a co- necessarily SDR applications. It is somewhat difficult to draw design implementation [105], especially with the introduction conclusions from these numbers alone, since a performance of FPGAs that are equipped with a large number of DSP slices comparison in the SDR field requires testing them in real-life. INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 9

In Table VI, we provide a high-level comparison between and their results lead to the conclusion that GPUs and FPGAs three major design approaches as a guideline for designers perform better than GPPs. It is obvious from above studies towards choosing the method that best meets their application that trade-offs are to be expected when a particular design specifications. In this comparison, we focus on the features methodology is adopted, hence careful analysis should be that are important to SDR design. However, we do not make carried out beforehand. Other comparative studies include assumptions on what the best approach is and believe it [111]–[113] with similar results and conclusions. is the developer’s responsibility to make the best judgment depending on the application area. Please note that in this table IV. DEVELOPMENT TOOLS we did not include GPUs, as they typically act as co-processors to GPPs and their addition generally improves performance. As we mentioned in Section III-C, HLS is an abstract We also did not include co-design since it combines GPPs method of designing hardware using a high-level programming with FPGAs. language. Developers of FPGA and co-design based SDRs As Table VI shows, while GPPs are easy to program and can benefit from HLS as it requires no prior experience with extremely flexible, they lack the power to meet specifications hardware design. Unlike the rest of the development tools, in real-time and are very inefficient in terms of power. To HLS tools share a common theme and offer similar features. increase their performance, multiple cores with similar instruc- Thus, we first discuss HLS in this section. Next, we review tion sets are included in the same GPP platform to exploit the common development tools that are typically used in the parallelism and perform more operations per clock cycle. process of SDR design and implementation for different design However, hardware replication (i.e., adding more cores to approaches. GPPs) may not necessarily translate to a higher performance. GPUs tackle this by offering the same control logic for A. High Level Synthesis (HLS) several functional units. The sequential portion of the code runs on the GPP, which can be optimized on multi-core HLS has been a hot research topic for over a decade, GPPs, while the computationally intensive portion runs on a with both academia and industry trying to make hardware several-hundred-core GPU, where the cores operate in parallel. design more accessible by every developer [114]. HLS is Another example of a customized processor is DSP, which the process of converting an algorithmic specification of the performs significantly better than GPPs, while at the same design described by a high-level programming language to time maintaining the ease-of-use feature that GPPs possess, an RTL implementation. HLS provides a new level of design making them very attractive options. They are also more power abstraction through exploring the micro-architecture and any efficient and better fit for signal processing applications. On hardware constraints. The resulting RTL is highly optimized, the other hand, they are more expensive, which is the main in terms of power, throughput and latency, and reasonably trade-off. Finally, FPGAs combine the flexibility of processors comparable to a hand-tuned code. Figure 4 depicts this pro- and efficiency of hardware. FPGAs can achieve a high level cess. The major difference between RTL and C is the absence of parallelism through dynamic reconfiguration, while yielding of timing constraints in the high-level model, which is merely better power efficiency [42]. FPGAs are typically more suit- a behavioral description of the system with no details about the able for fixed-point arithmetic, like signal processing tasks, underlying hardware. The second difference is the processing but in the recent years their floating-point performance has architecture: while GPP architecture is fixed, the best possible increased significantly [81], [107]. However, the designers are processing architecture is built by the compiler for FPGA expected to know a lot more about the hardware, which is [115]. In addition, HLS can speed up the development cycle sometimes a deterring feature. (time to market) to several weeks, down from several months In a comparative analysis by [108], authors studied the [116]. This is because the task of producing an optimized RTL performance and energy efficiency of GPUs and FPGAs using is handled by the HLS tool and the developer’s efforts are a number of benchmarks in terms of targeted applications, focused on describing the system’s algorithmic description. complexity, and data type. The authors concluded that GPUs In [117] the authors presented LegUP, an open-source HLS perform better for streaming applications, whereas FPGAs tool. This tool is capable of profiling a code to identify are more suitable for applications that employ intensive FFT frequently executed sections of the code for hardware acceler- computations, due to their ability to handle non-sequential ation (i.e., moving them to the FPGA fabric). The authors in memory accesses in a faster and more energy efficient manner. [118] survey HLS compilers and their capability to provide an Similarly, in [42], the authors review and report the sustainable accurate estimation of functional area and timing, comparable performance and energy efficiency for different applications. to results from hand-tuned hardware designs. In an effort to One of their findings related to SDRs is that FPGAs should be help the developer make the right decision to pick an HLS tool used for signal processing without floating point, confirming that yields the best results for their application, the authors aforementioned results. In addition, the authors in [109] report in [116] present a study where they compared three of the that GPUs are ten times faster than FPGAs with regards to FFT industry tools, namely Vivado HLS [86], Intel FPGA SDK for processing, while authors in [81] demonstrate that the power OpenCL [119], and MaxCompiler [120], through developing efficiency of FPGAs is always better than GPUs for matrix LDPC decoders, which are often used as error correcting operations. Finally, authors in [110] compare GPPs, GPUs, blocks in SDRs. All three tools successfully synthesized LDPC and FPGAs through the implementation of LDPC decoders, decoders and implemented them on Intel [44] and Xilinx [78] INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 10

TABLE VI COMPARISON OF SDRDESIGN APPROACHES

GPP DSP FPGA Computation Fixed Arithmetic Engines Fixed Arithmetic Engines User Configurable Logic Execution Sequential Partially Parallel Highly Parallel Throughput Low Medium High Data Rate Low Medium High Data Width Limited by Bus Width Limited by Bus Width High Programmability Easy Easy Moderate Complex Algorithms Easy Easy Moderate I/O Dedicated Ports Dedicated Ports User Configurable Ports Cost Moderate Low Moderate Power Efficiency Low Moderate High Form Factor Large Medium Small

C / C++ / MATLAB / openCL High-Level Language While some of them are vendor-speciﬁc, some tools work with a variety of FPGA boards. Examples mentioned in the table all Pipelining / dataflow / unrolling HLS Tool provide a set of area and timing optimizations such as resource Verilog / VHDL Synthesizable HDL Code sharing, scheduling, and pipelining. However, not all of them

Vivado / Quartus RTL Synthesis and Place-and-Route Tool are capable of generating testbenches for the design.

FPGA Implementation Bit Stream B. Tools

Fig. 4. HLS design flow. In this section we review the existing software tools for SDR development. For each design methodology, we discuss a compatible development tool and list its features. We also FPGA boards. The difference, however, was in the logic uti- provide an overall comparison between them to highlight the lization and performance. Similarly, the authors in [121], com- differences. pare the same aforementioned list of compilers quantitatively 1) MATLAB and Simulink: Most designers start with mod- and qualitatively using several financial engineering problems elling and simulating the system using Mathworks MATLAB (e.g., Monte Carlo-based Option Pricing) and compare the [85] and Simulink [132]. Through the availability of a wide performance of several FPGA boards. Their results show range of built-in functions and toolboxes, especially for signal that both Intel FPGA SDK for OpenCL and MaxCompiler processing and communication, developing and testing appli- performed better than Vivado HLS due to their ability to cations became very common and widely adopted. However, extract parallelism more effectively. In [122], the authors in order to use these models for different platforms, developers comprehensively review recent HLS tools and provide through would need to use MATLAB Coder [133] and Simulink Coder a careful analysis a methodology based on C benchmarks to [134] to generate C/C++ codes. The generated codes can compare some of these tools and their optimization features. be used with Embedded Coder [135] to optimize them and The various benchmarks implemented demonstrate that some generate software interfaces with AXI drivers for the sake of tools are better suited for certain applications than the rest, running on embedded processors and microprocessors, like the with no specific tool dominating the HLS field. The authors dual ARM cortex A9 MPcore [96] on the ZedBoard [136]. also show that open-source HLS tools such as LegUP [123] Alternatively, developers can use the HDL Coder [84] to can be as effective as their commercial counterparts. Other generate synthesizable RTL (Verilog or VHDL) code to be surveys and analyses include [124], [97], [125], which focused implemented on FPGAs or ASICs. It also has support for on open-source tools, and [126], which studied some of the Xilinx [78] and Intel [44] SoC devices by providing some trade-offs of HLS-generated designs and how reliable they information and optimizations pertaining to resource utiliza- are when errors are injected. All of the studies above prove tion and distributed pipelining. Figure 5 shows the design the feasibility and reliability of HLS tools to generate RTL flow for SoC platforms that the aforementioned tools offer codes, despite having different development and optimization and how they are connected. Examples of using MATLAB solutions. and Simulink to develop an SDR are the works by [137] and Examples of HLS tools include Xilinx Vivado HLS [86] and [138], where the authors used the RTL-SDR very low-cost SDSoC [127], Intel FPGA SDK for OpenCL [119], Cadence SDR dongle [139] (∼ $20) with a desktop computer to design Stratus High-level Synthesis (combining Cadence C-to-Silicon an academic curriculum for teaching DSP and communications and Forte Cynthesizer) [128], Synopsys Synphony C Compiler theory. [129], Maxeler MaxCompiler [120], MATLAB HDL Coder 2) Vivado HLS and SDSoC: Xilinx Vivado HLS [86] is a [84], and LegUP [123], which unlike the rest of the tools is design environment for high-level synthesis. This tool offers a vendor-independent (works with all types of FPGA boards, variety of features to tweak and improve the RTL netlist output such as Xilinx [78], Intel [44], Lattice [90], and Microsemi that is compatible and optimized for Xilinx FPGA boards. [91]). It accepts input specifications described in several languages Table VII presents a summary of commercial HLS tools. (e.g., C, C++, SystemC, and OpenCL), and generates hardware INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 11

TABLE VII HLSTOOLS

Xilinx Vivado Intel FPGA Cadence Stratus Synopsys Synphony Maxeler [86] [119] [128] [129] [120] HLS SDK for OpenCL High-level Synthesis C Compiler MaxCompiler Input C/C++/SystemC C/C++/SystemC C/C++/SystemC C/C++ MaxJ Output VHDL/Verilog/SystemC VHDL/Verilog VHDL/Verilog VHDL/Verilog/SystemC VHDL Testbench Yes No Yes Yes No Optimizations Yes Yes Yes Yes Yes Compatibility Xilinx FPGA Intel FPGA All All All

TABLE VIII DEVELOPMENT TOOLSAND PLATFORMS

MATLAB & Simulink [130] Vivado HLS & SDSoC [78] LegUP [123] GNU Radio [40] LabView [131] CUDA [48] Input MATLAB/Graphical C/C++/SystemC C Graphical/Python/C++ Graphical C/C++/Fortran/Python Output MATLAB/C++/RTL C/RTL C/RTL C/RTL C/RTL Machine Code Platform GPP/GPU/DSP/FPGA GPP/FPGA GPP/FPGA GPP/GPU/DSP/FPGA GPP/GPU/DSP/FPGA GPU Licence commercial commercial open-source open-source commercial commercial

such as decoders, demodulators, and ﬁlters. It is also capable of connecting these blocks and managing data transfer in a MATLAB/Simulink reliable fashion. In addition, it supports USRP systems [17]. Algorithm Design & Modeling One of the attractive features of GNU Radio is the ability to

HDL Coder deﬁne and add new blocks. This can be done via programming in C++ or Python. An example of using GNU Radio is the Embedded System Tool Integration Simulink Coder + Embedded Coder work by [142], where the author uses it with a USRP to realize different types of transceivers such as TDMA and CSMA, and FPGA Bitstream SW Code showcases some of its capabilities. Similarly, the authors in

Xilinx/Intel SoC Device [143] successfully achieve real-time communication between two computers using USRP [17] and RTL-SDR [139]. 4) LabVIEW: A widely used tool from National Instru- Fig. 5. Mathworks SoC design ﬂow [130]. ments [131] that offers a visual programming environment for test, automation and control applications used by both industry modules in Verilog or VHDL. Developers have several options and academia. It is similar to GNU Radio and Simulink, to optimize the solution in terms of area and timing through the where the design can be realized schematically by connecting use of directives (guidelines for the optimization process) and a chain of various blocks together, each of which performs pragmas for RTL optimization. These optimizations include a certain function. It also offers complete support for USRP loop unrolling, loop pipelining, and operation chaining. SD- [17] enabling rapid prototyping for communications systems. SoC [127] is another tool by Xilinx [78]. The major difference Designing different blocks of the system can be achieved between the two tools is that the latter has the capability using high-level languages such as C or MATLAB, or using to provide solutions for SoCs. SDSoC is built on top of a graphical dataﬂow. An SDR platform development using Vivado HLS and has the same C-to-RTL conversion capability. LabVIEW is the work by [144], where the author describes a The main advantage of using SDSoC is that it automatically wireless communication course design that incorporates USRP generates data movers, which are responsible for transferring and LabVIEW, due to their ease of use, to help teach students data between the software on the processor and the hardware basic concepts. Similarly, in [145] the authors designed an on the FPGA. SDR platform, namely FRAMED-SOFT, that includes two A similar tool to SDSoC that is open-source is LegUP types of USRPs and is intended for an academic environment. [123]. It was developed at the University of Toronto, as part 5) CUDA: Developed by NVIDIA, it issues and man- of an academic research effort to design an HLS tool that is ages computing platforms and programming models for data- capable of taking in a C code as an input and providing three parallel computing on GPUs [48]. Developers typically use possible outputs: a synthesizable RTL code for an FPGA, a CUDA when GPUs are part of the processing architecture as pure software executable, and a hardware/software co-design co-processors, and want to take full advantage of their power solution for a SoC. by speeding up applications. As discussed in Section III-A2, 3) GNU Radio: It is an open-source software development in order to identify application components that should run toolkit that provides signal processing blocks to implement be on GPP and the parts that should be accelerated by the SDRs [40], [140]. It runs on desktop or laptop computers, GPU, one needs to look at the tasks at hand. Programming and with the addition of simple hardware such as USRP B200 languages that can be used in CUDA include C, C++, Python, [17], can build a basic SDR. It is often used by academia Fortran, and MATLAB [85]. The toolkit includes, in addition and the research community for simulation as well as quick to the rich library for GPU-related acceleration functions, a setup of SDR platforms. Similar to System Generator tool compiler, development tools, and CUDA runtime, to develop [141] and Simulink [132], it includes different kinds of blocks applications and optimize them for systems that incorporate INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 12

a Xilinx Virtex II Pro P30 FPGA, which has two PowerPC USB Controller 405 cores [78]. The main advantage of this platform is the degree of flexibility that it offers. Developers have the option to implement communication standards in three different Daughterboard ADC ADC Daughterboard methods: (i) they can fit the entire design onto the FPGA Rx ADC ADC Rx and assign few tasks to the host’s GPP (full hardware), (ii) a FPGA Board full software implementation, where the design is implemented entirely on the GPP with minimal FPGA involvement, and (iii) Daughterboard DAC DAC Daughterboard a hybrid hardware/software co-design implementation, where Tx DAC DAC Tx the developer can partition the design in any way that fits their criteria. LimeSDR. Resembling the general basic architecture of Fig. 6. USRP board architecture. RF daughterboard selection depends on the USRP (e.g., USRP B200 [17]), Lime Microsystems [146] application specifications in terms of frequency coverage. develops a series of SDRs that is based on Lime Microsys- tems latest generation of field programmable RF transceiver GPUs. (FPRF) technology, in addition to an Intel FPGA [44] and a microcontroller. It is then connected to a computer via USB V. PLATFORMS 3.0. The SDR platform has the responsibility of delivering the wireless data, while the GPP (computer) has the task In this section, we list the different types of SDRs from of processing the incoming signals and generating the data the architecture and design point of view. We analyze them, to be transmitted by the SDR. The GPP in this case is the examine their strengths and shortcomings, and discuss their source of computing power for baseband. LimeSDR supports impact on SDR development. signal bandwidth ∼30-60MHz. Developers of LimeSDR also developed LimeSuite software, which is used to model SDRs. A. GPP-based This tool, source code, firmware, and schematics, are open- USRP N-Series. Universal Software Radio Peripheral source. (USRP) is the most common SDR platform known to the Ziria. It is a programming platform that uses a domain- developers’ community [17]. The cost of this platform is specific language (DSL) named Ziria, and an optimizing around $4000-5000. It provides a hardware platform for the compiler [147]. The application of Ziria is the implementation GNU Radio Project [140]. There are two generations available: of the physical layer (PHY) of wireless protocols. Ziria has a USRP1 and USRP2. USRP1 (released in 2004) is connected to 2-layer design: a generic computer through USB, with the addition of a small – A lower layer that is an imperative language, which is a FPGA. The FPGA board has two roles: routing information, mixture of C and MATLAB [85] code, with features of and limited signal processing. This generation was capable of the two languages carefully chosen to guarantee efficient supporting a ∼ 3MHz bandwidth due to USB 2.0 limitation. compilation. The second generation, USRP2, was released in 2008, and it – A higher layer, which is the language used to specify and supports 25MHz bandwidth by utilizing gigabit Ethernet. It stage stream processors. includes a Xilinx Spartan 3 FPGA [78] for local processing The Ziria optimizing compiler consists of two parts: the operations. frontend and the backend. The frontend parses the Ziria USRP, in general, is a board with ADC and DAC, an RF code, putting it in an abstract representation, then applying front end, a PC host interface, and an FPGA. This board several optimizations on it. The backend compiles the resulting consists of a motherboard and typically four daughterboards optimized code into an optimized low-level execution model. (two transmitters Tx, and two receivers Rx), as depicted The particular benefits of this platform are as follows: The in Figure 6. The daughterboards process analog operations first one is easy and dynamic reconfiguration, due to the like filtering and up/down conversions. They are modular so dynamic staging of the control graph. This is unlike GNU they can deal with applications operating up to 6GHz. The Radio [40] C++ templates that allow limited reconfigurability. FPGA board, depending on the USRP series, handles a few In addition, its code optimization can operate on data-flow signal processing operations, and the majority of operations components, and often yields a faster execution on GPPs (e.g., are offloaded to the connected host system. USRP platforms through the use of LUTs). In general, a Ziria code is very can be easily set up to use. However, while their performance concise and easy to use. For example, an implementation of is suitable for research experiments and quick prototyping, a WiFi scrambler in Ziria needs thirteen lines only. Ziria is these platforms do not necessarily meet the requirements of an interesting research based on data-flow languages, which communication standards. In fact, the minimum bandwidth are typically used in embedded systems. It also builds upon of the RF, PC host, or FPGA component used affects the popular functional reactive programming framework. throughput and timing characteristics of the platform. Sora. Sora is a fully programmable software radio platform KUAR. Another platform that is similar to USRP is the on PC architecture. It requires C programming on multi- Kansas University Agile Radio (KUAR) platform [34]. The core GPP, and yields high performance that includes high basic architecture is composed of a generic computer and processing speed and low latency. The Sora platform uses INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 13 the Ziria language discussed above to write high-level SDR descriptions, and is tested for real-time operation [18]. Unlike Antenna FPGA CPU WARP [21] (which we will explain in Section V-E), Sora can Packet text Buffer Transfer Buffer text GPU accommodate various RF front ends. Bus Since PC hardware is not intended for signal processing WARP GPU Server of wireless protocols, their performance can be limited. We discussed some of the limitations of GPPs in the context of Fig. 7. GPU-accelerated SDR Platform using WARP [52]. The GPU server SDRs in Section III-A. These limitations were the motivation is used for baseband processing, while WARP is used for radio control and behind the development of Sora. The overall setup of Sora interface. Offloading signal processing tasks to the GPU has significantly includes a soft-radio stack that combines a multi-core GPP improved the overall performance. and a radio control board, which consists of a Xilinx Virtex- 5 FPGA [78], PCIe-x8 interface, and DDR2 SDRAM. Sora communication systems are in a constant state of development uses both hardware and software techniques to address the and they increase in complexity and sophistication. This calls challenges of using PC architectures for high speed SDR. It is for higher computational performance and, along with it, a the first SDR platform that enables users to develop high speed higher level of flexibility. In order to realize Iris on Zynq wireless implementations entirely in software on a standard PC platform, the components are first translated into C++ using architecture. Cmake tools and then they are ported to the platform. HLS In Sora, new techniques are proposed for efficient PHY tools, like Xilinx HLS [86], can be used to accelerate parts of implementation. Some of these techniques include: (i) ex- the design that are considered to be the bottlenecks. This is ploiting large high-speed cache memory to minimize memory done by running system profiling on the software components, accesses, (ii) extensive use of LUTs, where they would trade and one of these profilers is Linux Perf. Acceleration, in memory for calculation and still well fit into L2 cache, (iii) particular, refers to running parts of the software (after re- exploiting data parallelism in PHY, and (iv) utilizing wide- writing it in RTL) on the FPGA fabric in order to achieve vector SIMD extension in GPP. One of the main novelties higher performance. of Sora is efficiently partitioning and scheduling the PHY processing across cores in GPP. The second innovation is core dedication for real-time support. This was accomplished by B. GPU-based exclusively allocating enough cores for SDR processing in multi-core systems. They showcased its effectiveness through WiMAX SDR. The authors in [150] built a GPU-based SoftWiFi, which is a full implementation of IEEE 802.11a/b/g platform to realize a WiMAX system. In their study, they PHY and CSMA MAC with 9000 lines of C code and real-time also compared the performance of GeForce 9800GTX GPU performance. They also successfully implemented a 3GPP LTE [45] against a TMS320C6416 DSP [59] via implementing Physical Uplink Shared Channel (PUSCH), or Soft-LTE, with the Viterbi decoder algorithm. The results indicate that the 5000 lines of C code and a peak rate of 43.8Mbps with throughput of the GPU is 181.6Mbps, with a considerable 20MHz. difference compared to the DSP’s 2.07Mbps. Some of the critiques to Sora is that its FPGA is not programmable and its capabilities are not fully utilized. In OFDM for WiFi Uplink SDR. In [52], the authors used the addition, the authors do not share the details of its internal WARP framework [21] as a basis to realize their NVIDIA routines. Finally, Sora only works on GPPs, with no clear GPU-based SDR platform. They utilized the inherent paral- mapping to DSPs. lelism of GPUs, and with the help of CUDA [48], they were Iris. It is a cross-platform SDR, developed to support highly able to achieve real-time performance on WARP. They used reconfigurable radio networks [148]. It is built using a plu- this platform to design and implement a Single Input Single gin architecture, namely components, to achieve modularity. Output (SISO) OFDM system for WiFi uplink communication. These components process data streams and perform different Figure 7 depicts the architecture of this enhanced (accelerated) operations on them. An engine is used to run, load and WARP SDR. For this platform, they used: (i) a WARP version maintain the components. Similar to System Generator [141] 3, which consists of a Xilinx Virtex-6 FPGA [78] for radio and Simulink [132] tools, Iris engine can be used to link control and interface, and (ii) a GPU server, which consists of components together to build a complete radio system. XML an Intel i7-3930K six-core 3.2GHz CPU [44] for transceiver is used to specify the components used in the program, their configuration, and four NVIDIA GTX TITAN graphic cards parameters, and how they should be linked. The main features [45] for baseband processing. Each TITAN is comprised of of the Iris architecture include: (i) runtime configurability, (ii) 2880 core Kepler GPU running at 889MHz. The accelerated support for the entire network stack (all layers), not just the WARP achieves less than 3ms latency and higher than 50Mbps PHY layer, (iii) support for advanced processing platforms over-the-air throughput. including FPGAs. There are multiple studies on using Iris. An example is the Signal Detection SDR. In [46] the authors designed and work in [149], where the authors discuss the process through studied real-time signal detection using an SDR platform which they were able to implement Iris on Xilinx Zynq composed of a laptop computer with an Intel Xeon E3-1535M SoC [78]. The motivation behind this work is the fact that processor [44] and an NVIDIA Quadro M4000M GPU [45]. INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 14

developers must learn the architecture in order to implement SDRAM SDRAM SDRAM SDRAM SDRs. ARM Ardbeg. In [156], a commercial prototype based on Streaming Memory System SODA architecture was presented. The main enhancements of Ardbeg compared to SODA are optimized SIMD design, Host Stream Processor Controller Stream Register File Network VLIW support, and a few special ASIC accelerators, which are Interface dedicated to certain algorithms such as Turbo encoder/decoder, block floating point and arithmetic operations. Atomix. Typically, programmers need to do one of three tasks to the software workflow of DSPs: tapping into a signal ALU Cluster 0 Cluster ALU 1 Cluster ALU 2 Cluster ALU 3 Cluster ALU 4 Cluster ALU 5 Cluster ALU 6 Cluster ALU 7 Cluster ALU Microcontroller processing chain, tweaking a block, or inserting/deleting a block. To simplify these tasks, modularity is crucial. How- Fig. 8. Imagine Processor Architecture. This platform employs a real-time ever, designing a modular software for DSPs is challeng- stream processor for baseband processing. ing considering the particular requirements, such as latency sensitivity and high throughput, that must be supported. The main challenge is the need for programmers to define and % For 1000ms long samples, this design achieves around 75 manage everything manually and explicitly. In other words, reduction in parallel processing time, compared to GPPs. it is necessary to use bare metal features, like moving data across cores, managing SRAMs, and parallelizing software. C. DSP-based In order to address these concers, Atomix [19] describes Imagine Processor-based SDR. Authors of [151] proposed the software in blocks, named atoms. An atom can be used one of the earliest SDR solutions that is fully based on to implement any operation. This can be signal processing a DSP. This SDR employs the Imagine stream processor, or system handling. Atoms can be used for realizing blocks, developed at Stanford University in 2001 [152]. The Stan- flowgraphs, and states in wireless stacks. In addition, simple ford Imagine project aimed at providing a signal and image control flow makes atoms composable. The easy modification processor that was C programmable and was able to match feature mentioned above is due to declarative language. It the high performance and density of an ASIC. It is based on is important to note that an Atomix signal processing block stream processing [153]–[155], which is similar to dataflow implements a fixed algorithmic function, operates on fixed data programming in exploiting data parallelism and is suitable for lengths, is associated with a specific processor type, and uses signal processing applications. This work paved the way to only the memory buffers passed to it during invocation. The the development of GPUs. blocks will run fixed sets of instructions executing uninter- As Figure 8 shows, Imagine processor uses VLIW-based rupted on fixed resources using fixed memories. This results ALU clusters arranged in a SIMD fashion to handle data in having fixed execution times. Atoms can also be composed streams. At the middle of the architecture, there is the Stream to build larger atoms. Using Atomix, radio software can be Register File, which stores data from other components, built entirely out of atoms and is easily modifiable. Atomix- thereby minimizing memory accesses. The performance of this based radio also meets throughput and latency requirements. platform has been evaluated through implementing complex Developers define the atoms in C. Then, the blocks are algorithms relevant to W-CDMA cellular system. The results composed into flowgraphs and states using Atomix interface. show higher performance compared to TI C67 DSP [59]: The next step involves developing a parallelized schedule and channel estimation and detection are improved by 48x and resource assignment in order to meet latency and throughput 42x, respectively. requirements. The software is then compiled by Atomix into a SODA. In [67], the authors introduce the Signal-processing low-level C. The compiled code, along with Atomix libraries, On-Demand Architecture (SODA), which is an SDR platform will be compiled into a binary. Atomix was used to build based on multi-core DSPs. It offers full programmability and IEEE 802.11a receiver only, namely Atomix11a. Evaluation targets various radio standards. The SODA design achieves of Atomix11a shows that it exceeds receiver sensitivity re- high performance, energy efficiency, and programmability. quirements, operates in indoor environments robustly, has low This is attributed to a combination of features that include processing latency, and atoms have low timing variability. SIMD parallelism and hardware support for 16bit computa- Although the power consumption reported is 7W, it does not tions, since most algorithms operate on small values. The basic include the power consumed by the front end, USRP2, which processing element is an asymmetric processor consisting of a is about 14W. A shortcoming of Atomix is that it is intended scalar and SIMD pipeline, and a set of distributed scratchpad only for synthesis on a variety of DSPs, not for GPPs, GPUs, memories that are fully managed in software. SODA is a multi- or FPGAs. core architecture, with one ARM Cortex-M3 processor [96] BeagleBoard-X15. A collaborative project between Texas for control purposes and multiple processing elements for DSP Instruments [59], Digi-Key [157], and Newark element14 operations. Using four processing elements can meet the com- [158], BeagleBoard is an open-source SoC computer [159]. putational requirements of 802.11a and W-CDMA. Compared It features TI Sitara AM5728 [59], which includes two C66x to WARP and Sora, as a single-chip implementation, SODA DSPs [59], two ARM Cortex-A15, two ARM M4 cores [96], is more appropriate for embedded scenarios. As with WARP, and two PowerVR SGX544 GPUs [160]. With its relatively INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 15 low price (∼$270), the DSPs along with the co-processors and experimented with a set of protocol changes. Airblue make a powerful platform for implementing standalone SDRs. demonstrated that it was easily modifiable and still meets An example of using the BeagleBoard (an older model but timing requirements. Airblue achieves a higher speed than same general architecture) is the work by [161], where it was Sora for cross-layer communication (between MAC and PHY used to implement the Public Safety Cognitive Radio (PSCR) layers), which typically has the latency requirement of a few [162] through GNU Radio [40]. PSCR is FM radio-based and microseconds. was developed by the Center for Wireless Telecommunications (CWT) at Virginia Tech. E. Hybrid Design USRP Embedded (E) Series. This is the embedded version of USRP, where they incorporate Xilinx SoCs [78] to develop D. FPGA-based standalone SDRs. USRP E310, for example, is based on Xilinx Airblue. This work [20] introduces an FPGA-based SDR Zynq 7020, which yields high performance and is energy platform for PHY and MAC layers. Airblue is a method to efficient. This USRP is stand-alone and suitable for mobile implement radios on FPGA to achieve configurability. This applications. It supports frequency range 70MHz to 6GHz and is done using an HDL language called Bluespec, through features a 2 × 2 MIMO RF front end. which all hardware blocks of a radio transceiver are written. WARP. The Wireless open-Access Research Platform is In Bluespec, a developer describes the execution semantics of another example of a co-design specifically developed to the design through Term Rewriting Systems (TRS). TRS is prototype wireless protocols [21]. It is programmable and a computational paradigm based on the repeated application scalable, however, parts of the device are implemented in of simplification rules [163]. The next step is compiling the ASIC, which makes it less flexible. WARP v3 contains a TRS into RTL codes. TRS has the capability of generating Xilinx Virtex-6 FPGA [78], which includes two MicroBlaze efficient hardware designs. The main difference between a processors and a Gigabit Ethernet peripheral. It requires the Verilog interface and a Bluespec interface is that the former use of Xilinx Embedded Development Kit (EDK) [78] to is merely a collection of wires with no semantic meaning, design SDRs. It is also open-source, with 802.11 reference while the latter includes handshake signals for blocks com- design made available to the research community. WARP munication. Therefore, Bluespec facilitates latency insensitive has become widely used in the research community due to designs which are essential to system construction via modular its effectiveness in implementing various wireless protocols composition. Using Airblue, developers may find the need to (e.g., 802.11 g/n PHY and MAC) and achieving real-time modify the building blocks, or modules, to add new features, performance. It also supports MIMO since it has multiple RF make algorithmic modifications, tune the performance to meet interfaces. throughput or timing requirements, or make FPGA-specific PSoC 5LP. Developed by Cypress Semiconductor [164], optimizations. this SoC is composed of an ARM Cortex-M3 GPP [96], an In order to reach modular refinements, the design of a analog system, and a digital system that uses Universal Digital configurable radio must have two design properties, latency- Blocks (UDBs). A UDB is a programmable digital block insensitivity and data-driven control. In addition to flexibility, based on Programmable Logic Devices (PLDs) for realizing Airblue meets tight latency requirements by implementing synchronous state machines, i.e., they resemble an FPGA, but both PHY and the MAC on FPGA and connecting them are smaller. All parts are reconfigurable and programmable with streaming interfaces, which allows data to be processed by using the PSoC Creator software IDE, which includes a as soon as it is ready. Another advantage of Airblue is the graphical design editor. A few developers have used it to build implementation of highly reusable designs through parameter- standalone SDRs with simplicity [165]. In addition to its low izations (i.e., same RTL block can be instantiated several times price ($15), PSoC is a good candidate to quickly prototype using different parameter values). Also, several techniques that and get familiar with SDRs. permit designers to reuse existing designs to implement run- Zynq-based SDR. The authors in [166] developed an time parameterized designs are developed. SDR using the Xilinx Zynq ZC706 and ZedBoard SoCs to Airblue essentially is a HLS platform that offers modular implement IEEE 802.11a. For their RF front end, they used refinement, where modifying one module does not affect Analog Devices FMComm3 AD9361 [69]. They used two the rest of the system, similar to the approach adopted by Zynq boards to compare their performances. They both in- Atomix. This is why, when compared to WARP, Airblue clude dual-core ARM Cortex-A9 [96], with ZC706 containing is more flexible, since WARP was not designed with the Kintex-7 and ZedBoard containing Artix-7 FPGAs. In addition aforementioned principles in mind. The authors in [20] also to HDL Coder [84] and Embedded Coder [135] to generate studied HLS tools and compared them to Bluespec. They RTL and software executable, they used Mathworks Simulink found them to be more effective than Bluespec in early stages [132] to generate the model. They reported an average of 2W of the design; nevertheless, Bluespec is capable of yielding power consumption for Tx and Rx, compared to 5 to 8W a more optimized final RTL. They argued that HLS will be reported by Atomix [19]. of limited use in final FPGA implementations, especially for high performance blocks required by future wireless protocols, F. Comparison hence Bluespec is the language of choice for Airblue. For Table IX compares the SDR platforms discussed above in performance evaluation, the authors have implemented 802.11 an effort to provide a reference guide for developers. SDR INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 16 platforms are compared according to the following criteria: In [167], the authors introduce a method based on RS- – Programmablity: As protocols evolve, a platform is re- 422 serial communication and High Level Data Link Control programmed by simply adding or exchanging parts of (HDLC) protocol to update DSP and FPGA systems. Their the design. For example, when 3GPP issues an update method is fast, stable and easy to implement. The authors in for LTE standard, instead of replacing the entire radio, [171] show a new method for remotely updating Xilinx FPGAs an SDR is reprogrammed to accommodate the upgrade. [78] by storing the new design or code in Serial Peripheral – Flexibility: A platform is capable of handling future wire- Interface (SPI) flash memories. They utilized the Xilinx Quick less protocols as requirements become more demanding. Boot application, along with the KC705 Evaluation Kit from This means an SDR should be able to support tighter Xilinx, in order to develop their method. However, it is not timing requirements for next generation protocols. always practical or feasible to use a download cable in order – Portability: A platform is standalone and readily de- to update the system. The authors in [170] tackle this issue ployed. This is generally a requirement for mobile and and the problem of downtime during an update or in the IoT applications. case of a failure. With this method, there exists two images, – Modularity: A design’s components are separated and re- namely a factory mode configuration image, which acts as combined without internal module changes. For example, the baseline, and an application mode configuration image, a developer may need to exchange a Viterbi decoder with which is used for some specific functions. They show the a Turbo decoder without having to worry about the rest capability of updating and running a new application mode of the design. (design) image, in addition to rolling back into the factory – Computing Power: Since performance depends on the mode image when no application mode image is available. protocol used, we merely evaluate the capability of plat- Others have focused their efforts on improving the security forms to implement a given protocol, and not be limited of remote updates, such as [172] and [173]. While there to a subset of the protocol (e.g., implementing the Viterbi exists some solutions to the remote update process, a few decoder only). challenges including partial reconfiguration of FPGAs are not – Energy Efficiency: Similar to computing power, we evalu- yet resolved. ate the capability to implement protocols efficiently, while keeping power consumption at a minimum. B. Centralized Algorithms and Network Slicing – Cost: The cost of the hardware equipment. If a platform requires a PC, its cost is not included. Also, when the In order to simplify the design and provisioning of large- price of the entire setup is unknown, the price of the scale networks, software-defined networking (SDN) has been basic platform (not including RF front end or interfaces) proposed to centralized network control. In this architecture, a is shown. controller (or multiple controllers) communicate with network devices (data plane) to collect their status information and configure their operation. Recent studies show that wireless VI.RELATED RESEARCHAND POTENTIAL SOLUTIONS networks can significantly benefit from the central network Although in the previous sections we have highlighted some view established in the controller to design more efficient of the challenges of building SDRs, in this section, we present network control algorithms such as channel assignment and additional research areas that are still being faced by the mobility control. research and development community. These challenges are Centralized control of network resources is the enabler technical or practical. of network slicing, which refers to the abstraction, slicing and sharing of network resources. Three levels of slicing is applicable to wireless networks: (i) spectrum (a.k.a., link virtu- A. Remote System Update alization): requires frequency, time or space multiplexing. (ii) One of the main features and motivations of SDRs is their infrastructure: the slicing of physical devices such as BSs. (iii) reconfigurability and flexibility. In order to take full advantage network: this refers to the slicing of the network infrastructure. of this, the process of updating a SDR platform should be Compared to wired networks, slicing the resources of wireless quick and easy. Remote stand alone SDRs are usually FPGA networks is significantly more challenging due to the variable and DSP based, with more FPGAs being used. Hence, most of nature of wireless channel. Meanwhile, since SDRs enable the the research has been focused on remote updates of FPGAs. slicing of resources both at the spectrum and infrastructure Since FPGAs are volatile, which are configured during system level, they can be used to augment SDNs towards a fine- power-on through their flash memories, update is traditionally grain allocation of resources. For example, compared to ASIC- done using Joint Test Action Group (JTAG) method [167]. based transceivers, SDRs provide a significantly higher level However, with more SDRs deployed and adopted in wireless of control over the parameters of physical and MAC layer. and cellular networks, remote update becomes necessary and a It should be noted that when centralized network control is challenge. With remote over the air (OTA) updates, it becomes employed, the delay of communication between the controller possible to send patches to current designs, or even upload an and SDR platforms as well as the delay of programming and entirely new and improved design to mobile networks and BSs applying new configurations should be bounded within the [168], [169]. Some of the challenges faced by the research specification requirements. For example, in a dense and mobile community include speed, reliability, cost, and security [170]. environment, the controller may employ a centralized channel INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 17

TABLE IX COMPARISON OF EXISTING SDRPLATFORMS

Programm- Flex- Port- Modul- Computing Energy Cost Soft Core FPGA ability ability ability arity Power Efficiency (USD) Imagine Stream Imagine-based [151] × × × Medium Low N/A N/A X Processor Xilinx ∼ 4 − 5K USRP X300 [17] × High Low PC X X X Kintex-7 Total Dual-core ARM Xilinx ∼ 3K USRP E310 [17] High High X X X X Cortex-A9 Artix-4 Total Pc + Xilinx KUAR [34] × × × Medium Low N/A X 2× PowerPC cores Virtex II Pro Intel ∼ 300 LimeSDR [146] × High Low PC X X X Cyclone IV Board Only Depends Ziria [147] × × High Low PC N/A X X on App Xilinx ∼ 900 Sora [18] × × High Low PC X X Virtex-5 Board Only ARM Cortex-M3 SODA [67] × High High N/A N/A X X X + Processing Elements Dual-core ARM Xilinx ∼ 1.2K Iris [148] High High X X X X Cortex-A9 Kintex-4 Total ∼ 200 Atomix [19] High Medium TI 6670 DSP N/A X X X X DSP Only 2× TI C66x DSPs + ∼ 270 BeagleBoard-X15 [159] High Medium N/A X X X X 2× ARM Cortex-A15 & 2× M4 Board Only Intel ∼ 1.3K Airblue [20] High High N/A X X X X Cyclone IV Board Only 2× Xilinx Xilinx ∼ 7K WARP v3 [21] × High High X X X MicroBlaze cores Virtex-6 Total 10 PSoC 5LP [164] × Low High ARM Cortex-M3 N/A X X X Board Only Dual-core ARM Xilinx ∼ 1.2K Zynq-based [166] High High X X X X Cortex-A9 Kintex-4 Total and power control algorithm to instruct the nodes adjust their these operations are offloaded to a general computing platform. channels based on the decisions made centrally. In this case, Such an architecture reduces the load of edge devices and BSs it is essential to ensure the delay of sending configuration as can benefit from powerful processors to implement complex messages to multiple SDR platforms meets the requirements signal processing operations. For example, when multiple of central algorithm. Furthermore, it is essential to ensure all SDRs are connected to a computing platform, sophisticated the SDRs apply the configuration at the same time, otherwise signal processing algorithms could be developed to cope with serious interference and collision might happen. Although challenges such as interference. protocols such as OpenFlow [174] and Netconf [175] have To leverage NFV for SDRs, developers have been attempt- been designed for interactions between the controller and ing to tie them together to achieve complete flexibility across data plane, their implications on the performance of wireless the platform [178], [179]. Although several wireless SDN networks have not been investigated. Specifically, it is essential architectures have been proposed to address the challenges to evaluate the effect of hardware platform (i.e., GPP, DSP, of wireless communication [180]–[183], most of them do not FPGA) on the delay of applying configurations. These chal- benefit from the features of SDRs. lenges have not been addressed yet. D. Energy Efficiency C. Network Functions Virtualization (NFV) Battery-powered devices in an IoT network face the chal- One of the new topics is the concept of Network Functions lenge of minimizing power consumption in order to extend Virtualization (NFV), which offers an alternative way of battery-life before they are due for a replacement, which is designing and managing networking functions. The concept a costly operation. Ready-to-deploy SDR systems may use is very similar to SDRs, in the sense that various network high-performing platforms, such as FPGAs, without providing functions can run in software on top of different hardware solutions or alternatives to batteries [184]. Even in the case of platforms. These platforms are typically high-volume servers, BSs with on-grid power sources, it has become crucial to lower storage devices, and cloud-computing data centers [176]. Some power consumption in order to reduce CO2 emissions [185]. of the functions that are virtualized via this method are load This is particularly important for cellular network operators, balancing, firewalls, intrusion detection devices and WAN where BSs consume more that 50% of the total energy accelerators. This flexibility is what makes NFV very attractive consumed in the network [186]. To address these concerns, for network operators, carriers, and manufacturers, in addition energy harvesting mechanisms have been introduced. Energy to cost reductions [177]. From the SDR point of view, instead harvesting or scavenging is the process of deriving power of performing signal processing operations on the platform, from external sources, such as solar and wind energies (also, INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 18 referred to as green energy), and stored for consumption along- F. Security side internal sources (e.g., battery or electrical grid sources) SDRs simplify security provisioning. For example, when a [187]. As this green energy is a viable option for powering new security mechanism does not require hardware replace- BSs [188], Ericsson (a major telecommunication company) has ment (e.g., 802.11i’s WPA), it can be implemented through invested into and designed green-energy-powered BSs moti- reprogramming SDRs. On the other side, the reprogramma- vated by environmental and financial reasons [189]. Therefore, bility feature of SDRs exposes security threats, whether they hybrid power operation has been accepted as a solution to are standalone or part of a SDN architecture. For example, lower electrical grid energy consumption and cost [186]. assume an 802.11 network employs SDR-based BSs (i.e., In [187], the authors present a survey on energy harvesting access points). In this case, a denial of service (DoS) attack technologies with regards to transducers, such as antennas can be implemented by instructing a large number of clients and solar cells, that can utilize renewable energy sources, and to associate with a BS. Also, if the controller is compromised, cover some of the applications in the IoT and M2M world. then all the SDRs might be reprogrammed to be nonfunctional. The authors in [188] overview the cellular network operators Therefore, it is important to identify security threats and that have adopted the hybrid solution and started using green take them into account when designing SDR-based wireless energies to power their BSs. The authors in [190] discuss the networks. issue of implementing an energy harvesting transmitter in a Offloading SDR processing to general processing platforms cognitive radio. They also derive the optimal spectrum sensing through NFV enables the deployment of sophisticated cen- policy that maximizes the expected total throughput under tral algorithms for detecting abnormal activities and network energy causality and collision constraints. Energy causality breaches. For example, by analyzing the signal strength re- states that the total consumed energy should be equal to or ceived from a client at one or multiple BSs, a denial of service less than the total harvested energy, whereas the collision attack caused by generating excessive interference could be constraint states that the probability of accessing the occupied detected. Proposing security mechanisms that benefit from the spectrum is equal to or less than the target probability of a features of SDRs is an important future direction, especially collision. The authors concluded that a battery-operated radio for large-scale and public networks. can operate for a long time by optimizing both the energy and spectral efficiency. Energy harvesting is often associated with VII.EXISTING SURVEYS what is known as ”Green IoT”, which is the new trend for IoT networks and devices, where the main focus is making them Joseph Mitola III was the first to use the term ”Software more energy efficient. Another relevant work is [191], [192], Radio” in 1993, and published an important work introducing where the authors present an overview of the challenges and and explaining the concept of using software rather than existing solutions of energy-efficient IoT systems. traditionally used hardware for designing radio systems [194]. In an interesting early survey (1999) [195] on the ”then” emerging technology, namely SDR, the authors made the case E. Co-Design that SDRs have a great potential to advance and facilitate the development of communication standards such as WiFi By definition, co-design is the process of realizing system- and cellular networks. In a review published a few years level goals through exploiting the trade-offs between software later [22], the author paid close attention to the hardware and hardware in an integrated design. This yields a higher component of SDRs, such as programmable RF modules and design quality, and optimizes the cost and design cycle time, high-performance DACs and ADCs, as more technological which in turn shortens the time-to-market time. As co-design advancement had been made. is adopted for more applications, developers typically face the There are very few notable works to survey and review problem of partitioning and scheduling. Finding the optimal different SDR platforms and testbeds. One such survey is design partition is not trivial. While system profilers can assist the work by [38], where the author discusses the challenges in providing an insight about the system’s behavior and help SDR developers face. These challenges include size, weight, identify the parts of the code that can be accelerated on power, software architecture, security, certification and busi- hardware, partitioning should be an automatic process and ness opportunities. While this work is important for compiling requires no external involvement. There are several algorithms information about these challenges and presenting them in proposed to address the challenges of partitioning, such as one place, it stays away from enumerating the different SDR PSO, FCM, and FCMPSO [193]. These are optimization platforms, implementation approaches and their applications algorithms used to mapping embedded applications to Directed in the communication standards world. Acyclic Graphs (DAGs) for multi-core architectures. However, In that regard, the authors of [196] compare the SDR plat- in SDRs, the problem is even more complicated due to forms developed by the year 2012 and give a brief description having two layers of operation, namely PHY and MAC, and of what each platform entails. It lacks, however, any detailed partitioning needs to take into account strict real-time re- comparison based on the different categories of computational quirements. It is a delicate equilibrium between performance, power, energy efficiency, flexibility, adaptability, and cost. It cost, and correct operation. Even with the process being more is through these comparisons that a designer is able to make challenging than other design approaches, the benefits of co- an informed decision on what platform to adopt for their design for complex systems outweigh the initial cost. specific application. The authors of [197] attempt to list and INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 19

review several DSP-based SDR platforms from both academia REFERENCES and industry, with more focus on commercial solutions, and [1] “WWRF (2009) Visions and research direction for the wireless world,” then providing a simple comparison between them in terms Tech. Rep., 2009. of programmability, power, and flexibility. However, what this [2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A Survey on Enabling Technologies, work lacks, in addition to being outdated, is the discussion of Protocols, and Applications,” IEEE Communications Surveys and Tu- FPGAs (mentioned only one example) and hardware/software torials, 2015. co-design solutions, and a methodical analysis of different [3] IEEE, “IEEE 802.11, The Working Group Setting the Standards for Wireless LANs.” [Online]. Available: http://www.ieee802.org/11/ design approaches based on a set of performance metrics. [4] 3GPP, “3GPP - Release 16.” [Online]. Available: http://www.3gpp. Authors of [198] presented a survey of a few SDR platforms org/release-16 that had been developed more than a decade ago. The authors [5] M. Bansal, J. Mehlman, S. Katti, and P. Levis, “OpenRadio,” in Proceedings of the first workshop on Hot topics in software defined of [199] presented a paper that laid out the development and networks - HotSDN ’12, New York, USA, 2012, p. 109. evolution of SDRs over the past several years and discussed [6] ReportBuyer, “Software Defined Radio Market by Application, the motivation behind gaining even more attention recently. Component, End User, Type - Global Forecast to 2021,” Tech. Rep., 2016. [Online]. Available: https://www.reportbuyer. Another work is the comparison conducted by the authors com/product/4364831/software-defined-radio-market-by-application- in [200] between the Imec Bear platform [201] and a few component-end-user-type-global-forecast-to-2021.html multicore-based SDR platforms. Other attempts include the [7] Global Industry Analysts Inc, “Software Defined Radio (SDR) - Global Strategic Business Report,” Tech. Rep., 2017. work by [202], where the authors focused on discussing the [Online]. Available: https://www.researchandmarkets.com/research/ analog end of the SDR concerning signal sampling, process- pc9x7g/software defined ing, and the hardware/software responsible for handling these [8] B. Paillassa and C. Morlet, “Flexible satellites: software radio in the sky,” in 10th International Conference on Telecommunications, 2003. tasks. The work by [203] compares several SDR platforms to ICT 2003., vol. 2, pp. 1596–1600. prove the feasibility and reliability of using SDRs in education, [9] P. Angeletti, R. D. Gaudenzi, M. Lisi, I. Introduction, S. Division, and industry, and government. Another outdated survey is [204]. C. Scientist, “From Bent Pipes to Software Defined Payloads: Evolution and Trends of Satellite Communications Systems,” System, no. June, Other surveys that are relevant to SDR platforms is the work pp. 10 – 12, jun 2008. by [205], where the author discusses and reviews the state- [10] P. Angeletti, M. Lisi, and P. Tognolatti, “Software Defined Radio: A key of-the-art in microwave technology in transceivers. The paper technology for flexibility and reconfigurability in space applications,” in 2014 IEEE Metrology for Aerospace (MetroAeroSpace), may 2014, explores the development of SDRs using different technologies pp. 399–403. in radio frequency engineering. It is a comprehensive study of [11] J. Seo, Y.-H. Chen, D. S. De Lorenzo, S. Lo, P. Enge, D. Akos, and several research topics such as the design of tunable radio J. Lee, “A real-time capable software-defined receiver using GPU for adaptive anti-jam GPS sensors.” Sensors (Basel, Switzerland), vol. 11, frequency components, linear and power efficient amplifiers, no. 9, pp. 8966–91, 2011. linear mixers, and interference rejection techniques. Whereas, [12] W. Xiang, F. Sotiropoulos, and S. Liu, “xRadio: An Novel Software De- authors in [206] provide a very thorough study on the several fined Radio (SDR) Platform and Its Exemplar Application to Vehicle- to-Vehicle Communications.” Springer, Cham, 2015, pp. 404–415. security threats and challenges in SDRs, and go over their [13] K. D. Singh, P. Rawat, and J.-M. Bonnin, “Cognitive radio for vehicular certification process. As SDR platforms grow in popularity, ad hoc networks (CR-VANETs): approaches and challenges,” EURASIP and more communication protocols are realized using them, Journal on Wireless Communications and Networking, vol. 2014, no. 1, p. 49, dec 2014. it becomes essential that developers are prepared for security [14] M. Kloc, R. Weigel, and A. Koelpin, “SDR implementation of an issues and well-equipped with tools that protect their systems. adaptive low-latency IEEE 802.11p transmitter system for real-time wireless applications,” in 2017 IEEE Radio and Wireless Symposium (RWS), jan 2017, pp. 207–210. VIII.CONCLUSION [15] Y. Chen, S. Lu, H.-S. Kim, D. Blaauw, R. G. Dreslinski, and T. Mudge, “A low power software-defined-radio baseband processor for the In- In this paper, we provided a comprehensive overview of ternet of Things,” in 2016 IEEE International Symposium on High the various design approaches and hardware platforms adopted Performance Computer Architecture (HPCA), mar 2016, pp. 40–51. [16] Y. Park, S. Kuk, I. Kang, and H. Kim, “Overcoming IoT Language for SDR solutions. This includes GPPs, GPUs, DSPs, FPGAs, Barriers Using Smartphone SDRs,” IEEE Transactions on Mobile and co-design. We explained the basic architectures and ana- Computing, vol. 16, no. 3, pp. 816–828, mar 2017. lyzed their advantages and disadvantages. Due to the different [17] “Ettus Research - Networked Software Defined Radio (SDR).” [Online]. Available: https://www.ettus.com/ features of design approaches, it was important to compare [18] K. Tan, H. Liu, J. Zhang, Y. Zhang, J. Fang, and G. M. Voelker, them against each other in terms of computational power and “Sora: high-performance software radio using general-purpose multi- power efficiency. We then reviewed the major current and past core processors,” Communications of the ACM, vol. 54, no. 1, p. 99, jan 2011. SDR platforms, whether they were developed by the industry [19] M. Bansal, A. Schulman, and S. Katti, “Atomix: A Framework for or in academia. Finally, we discussed some of the research Deploying Signal Processing Applications on Wireless Infrastructure,” challenges and topics that are predicted to improve in the near in 12th USENIX Symposium on Networked Systems Design and Imple- mentation (NSDI 15), 2015, pp. 173–188. future, helping to advance SDRs and their wide adoption. [20] M.-C. Ng, K. Fleming, M. Vutukuru, S. Gross, Arvind, and H. Bal- We believe that SDR solutions are going to be mainstream akrishnan, “Airblue: A system for cross-layer wireless protocol devel- and that their ability to implement different wireless com- opment,” Symp. Architectures for Networking & Communications Syst. (ANCS), pp. 1–11, 2010. munication standards with high levels of flexibility and re- [21] “WARP Project.” [Online]. Available: https://warpproject.org/trac programmability will be considered the norm. This paper poses [22] A. Haghighat, “A review on essentials and technical challenges of as an exhaustive overview of this phenomenon, its enabling software defined radio,” in MILCOM 2002. Proceedings, 2002, pp. 377–382. technologies, applications, and the current research tackling [23] U. L. Rohde and T. T. N. Bucher, Communications Receivers: Princi- this issue from different angles. ples and Design, 4th ed. McGraw-Hill Education, 1988. INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 20

[24] T. J. Rouphael, RF and digital signal processing for software-defined [52] K. Li, M. Wu, G. Wang, and J. R. Cavallaro, “A high performance radio : a multi-standard multi-mode approach. Newnes, 2009. GPU-based software-defined basestation,” in 2014 48th Asilomar Con- [25] J. J. Carr, The technician’s radio receiver handbook : wireless and ference on Signals, Systems and Computers, nov 2014, pp. 2060–2064. telecommunication technology. Newnes, 2001. [53] K. Li, B. Yin, M. Wu, J. R. Cavallaro, and C. Studer, “Accelerating [26] R. Walden, “Analog-to-digital converter survey and analysis,” IEEE massive MIMO uplink detection on GPU for SDR systems,” in 2015 Journal on Selected Areas in Communications, vol. 17, no. 4, pp. 539– IEEE Dallas Circuits and Systems Conference (DCAS), oct 2015, pp. 550, apr 1999. 1–4. [27] T. Hentschel, M. Henker, and G. Fettweis, “The digital front-end of [54] J. G. Millage, “GPU Integration into a Software Defined Radio software radio terminals,” IEEE Personal Communications, vol. 6, Framework,” Ph.D. dissertation, Iowa State University, 2010. no. 4, pp. 40–46, 1999. [55] “TMS320C6670 Multicore Fixed and Floating-Point System-on- [28] C. Bowick, J. Blyler, and C. J. Ajluni, RF circuit design. Chip — TI.com.” [Online]. Available: https://www.ti.com/product/ Newnes/Elsevier, 2011. tms320c6670 [29] M. N. O. Sadiku and C. M. Akujuobi, “Software-defined radio: a brief [56] L. R. Rabiner and B. Gold, “Theory and application of digital signal overview,” IEEE Potentials, vol. 23, no. 4, pp. 14–15, oct 2004. processing,” Englewood Cliffs, N.J., Prentice-Hall, Inc., 1975. 777 p., [30] L. C. Choo and Z. Lei, “CRC Codes for Short Control Frames in 1975. IEEE 802.11ah,” in 2014 IEEE 80th Vehicular Technology Conference [57] S. W. others Smith, The scientist and engineer’s guide to digital signal (VTC2014-Fall), sep 2014, pp. 1–5. processing. California Technical Pub. San Diego, 1997. [31] F. Berns, G. Kreiselmaier, and N. Wehn, “Channel decoder architecture [58] S. A. Dyer and B. K. Harms, “Digital Signal Processing,” 1993, pp. for 3G mobile wireless terminals,” in Proceedings Design, Automation 59–117. and Test in Europe Conference and Exhibition, pp. 192–197. [59] “SMJ320C80 Digital Signal Processor — TI.com.” [Online]. Available: [32] P.-Y. Chiueh, Tzi-Dar and Tsai, OFDM baseband receiver design for http://www.ti.com/product/SMJ320C80 wireless communications. John Wiley & Sons, 2008. [60] A. Antoniou, Digital Signal Processing. Mcgraw-Hill, 2016. [33] B. Dezfouli, M. Radi, and O. Chipara, “REWIMO,” ACM Transactions [61] D. A. Patterson and J. L. Hennessy, “Computer organization and on Sensor Networks, vol. 13, no. 3, pp. 1–42, aug 2017. design,” Morgan Kaufmann, pp. 474–476, 2007. [34] G. J. Minden, J. B. Evans, L. Searl, D. DePardo, V. R. Petty, R. Rajban- [62] M. J. Flynn, “Some Computer Organizations and Their Effectiveness,” shi, T. Newman, Q. Chen, F. Weidling, J. Guffey, D. Datla, B. Barker, IEEE Transactions on Computers, vol. C-21, no. 9, pp. 948–960, sep M. Peck, B. Cordill, A. M. Wyglinski, and A. Agah, “KUAR: A 1972. Flexible Software-Defined Radio Development Platform,” in 2007 2nd [63] R. Duncan, “A survey of parallel computer architectures,” Computer, IEEE International Symposium on New Frontiers in Dynamic Spectrum vol. 23, no. 2, pp. 5–16, feb 1990. Access Networks, apr 2007, pp. 428–439. [64] M. Gschwind, H. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and [35] K. Kant, Microprocessors and microcontrollers. Prentice-Hall Of T. Yamazaki, “Synergistic Processing in Cell’s Multicore Architecture,” India, 2014. IEEE Micro, vol. 26, no. 2, pp. 10–24, mar 2006. [36] T. Kazaz, C. Van Praet, M. Kulin, P. Willemen, and I. Moerman, [65] “nxp — Microcontrollers and Processors.” [Online]. Avail- able: http://www.nxp.com/products/microcontrollers-and-processors: “Hardware Accelerated SDR Platform for Adaptive Air Interfaces,” apr 2017. MICROCONTROLLERS-AND-PROCESSORS [66] “Livanto ICE8060.” [Online]. Available: http : / / [37] C.-H. Lin, B. Greene, S. Narasimha, and J. Cai, “High performance www.icerasemi.com/products/livanto chipsets/livanto soft baseband/ 14nm SOI FinFET CMOS technology with 0.0174m2 embedded Livanto ICE8060/index.html DRAM and 15 levels of Cu metallization,” in 2014 IEEE International [67] Y. Lin, H. Lee, M. Woh, Y. Harel, S. Mahlke, T. Mudge, C. Chakrabarti, Electron Devices Meeting, dec 2014, pp. 3.8.1–3.8.3. and K. Flautner, “SODA: A low-power architecture for software radio,” [38] T. Ulversoy, “Software Defined Radio: Challenges and Opportunities,” in Proceedings - International Symposium on Computer Architecture, IEEE Communications Surveys & Tutorials, vol. 12, no. 4, pp. 531– vol. 2006, no. 2, may 2006, pp. 89–100. 550, 2010. [68] A. Gatherer, T. Stetzler, M. McMahan, and E. Auslander, “DSP-based Embedded systems : architecture, programming and design [39] R. Kamal, . architectures for mobile communications: past, present and future,” New Delhi: Tata MaGraw-Hill, 2003. IEEE Communications Magazine, vol. 38, no. 1, pp. 84–90, 2000. [40] “GNU Radio.” [Online]. Available: https://www.gnuradio.org/ [69] “TigerSHARC Processors — Analog Devices.” [41] K. Vachhani, “Multiresolution analysis: An unified approach using [70] “CEVA- Leading Licensor of Signal Processing IP.” [Online]. Discrete Wavelet Transform on GNU radio,” in 2015 International Available: https://www.ceva-dsp.com/ Conference on Green Computing and Internet of Things (ICGCIoT), [71] “Qualcomm — Wireless Technology and Innovation.” [Online]. oct 2015, pp. 887–892. Available: https://www.qualcomm.com/ [42] M. Vestias and H. Neto, “Trends of CPU, GPU and FPGA for high- [72] G. Frantz, “Digital signal processor trends,” IEEE Micro, vol. 20, no. 6, performance computing,” in 2014 24th International Conference on pp. 52–59, 2000. Field Programmable Logic and Applications (FPL), sep 2014, pp. 1– [73] R. Akeela and Y. El Ziq, “Design and Verification of IEEE 802.11ah 6. for IoT and M2M Applications,” 2017 IEEE International Conference [43] “CPU vs GPU performance - michaelgalloy.com.” [Online]. Available: on Pervasive Computing and Communications Workshops (PerCom http://michaelgalloy.com/2013/06/11/cpu-vs-gpu-performance.html Workshops), pp. 491–496, mar 2017. [44] “Intel Xeon Processors.” [Online]. Available: https://www.intel.com/ [74] X. Cai, M. Zhou, and X. Huang, “Model-Based Design for Software content/www/us/en/products/processors/xeon.html Defined Radio on an FPGA,” IEEE Access, vol. 5, pp. 8276–8283, [45] “NVIDIA - Visual Computing Technologies.” [Online]. Available: 2017. http://www.nvidia.com/content/global/global.php [75] I. Kuon, R. Tessier, and J. Rose, “FPGA Architecture: Survey and [46] A. Fisne and A. Ozsoy, “Grafik processor accelerated real time soft- Challenges,” Foundations and Trends in Electronic Design Automation, ware defined radio applications,” in 2017 25th Signal Processing and vol. 2, no. 2, pp. 135–253, 2007. Communications Applications Conference (SIU), may 2017, pp. 1–4. [76] N. Grover and M. K. Soni, “Reduction of Power Consumption in FP- [47] B. Cope, P. Y. Cheung, W. Luk, and L. Howes, “Performance Compar- GAs -An Overview,” Information Engineering and Electronic Business, ison of Graphics Processors to Reconfigurable Logic: A Case Study,” vol. 5, no. 5, pp. 50–69, 2012. IEEE Transactions on Computers, vol. 59, no. 4, pp. 433–448, apr [77] D. Strenski, C. Kulkarni, J. Cappello, O. Design, P. Sundararajan, 2010. F. Programmable, G. Arrays, B. High, and T. Graphical, “Latest [48] “CUDA Toolkit Documentation.” [Online]. Available: http://docs. FPGAs Show Big Gains in Floating Point Performance,” pp. 1– nvidia.com/cuda/ 7, 2014. [Online]. Available: https://www.hpcwire.com/2012/04/16/ [49] P. Szegvari and C. Hentschel, “Scalable Software Defined FM-radio re- latest fpgas show big gains in floating point performance/ ceiver running on desktop computers,” in 2009 IEEE 13th International [78] “Xilinx - All Programmable.” [Online]. Available: https://www.xilinx. Symposium on Consumer Electronics, may 2009, pp. 535–539. com/ [50] “Amd — processors — graphics and technology.” [79] D. Lewis, G. Chiu, J. Chromczak, D. Galloway, B. Gamsa, [51] J. Koomey, S. Berard, M. Sanchez, and H. Wong, “Implications of V. Manohararajah, I. Milton, T. Vanderhoek, and J. Van Dyken, “The Historical Trends in the Electrical Efficiency of Computing,” IEEE Stratix 10 Highly Pipelined FPGA Architecture,” in Proceedings of the Annals of the History of Computing, vol. 33, no. 3, pp. 46–54, mar 2016 ACM/SIGDA International Symposium on Field-Programmable 2011. Gate Arrays - FPGA ’16, New York, USA, 2016, pp. 159–168. INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 21

[80] K. Sano and S. Yamamoto, “FPGA-Based Scalable and Power-Efficient Codesign,” in 2017 Euromicro Conference on Digital System Design Fluid Simulation using Floating-Point DSP Blocks,” IEEE Transactions (DSD), aug 2017, pp. 533–537. on Parallel and Distributed Systems, vol. 28, no. 10, pp. 2823–2837, [107] K. Underwood and Keith, “FPGAs vs. CPUs: trends in peak floating- oct 2017. point performance,” in Proceeding of the 2004 ACM/SIGDA 12th [81] S. Kestur, J. D. Davis, and O. Williams, “BLAS Comparison on FPGA, international symposium on Field programmable gate arrays - FPGA CPU and GPU,” in 2010 IEEE Computer Society Annual Symposium ’04, New York, USA, 2004, p. 171. on VLSI, jul 2010, pp. 288–293. [108] B. Betkaoui, D. B. Thomas, and W. Luk, “Comparing performance [82] R. Woods and Wiley InterScience, FPGA-based implementation of and energy efficiency of FPGAs and GPUs for high productivity signal processing systems. John Wiley & Sons, 2008. computing,” in 2010 International Conference on Field-Programmable [83] U. Meyer-Baese, Digital Signal Processing with Field Programmable Technology, dec 2010, pp. 94–101. Gate Arrays, ser. Signals and Communication Technology. Berlin, [109] B. Duan, W. Wang, X. Li, C. Zhang, P. Zhang, and N. Sun, “Floating- Heidelberg: Springer Berlin Heidelberg, 2014. point mixed-radix fft core generation for fpga and comparison with gpu [84] “HDL Coder.” [Online]. Available: https://www.mathworks.com/ and cpu,” in 2011 International Conference on Field-Programmable products/hdl-coder.html Technology, Dec 2011, pp. 1–6. [85] “MATLAB - MathWorks.” [Online]. Available: https://www. [110] M. Owaida, P. Ienne, G. Falcao, J. Andrade, C. Antonopoulos, mathworks.com/products/matlab.html N. Bellas, M. Purnaprajna, D. Novo, G. Karakonstantis, and A. Burg, [86] “Vivado High-Level Synthesis.” [Online]. Available: https://www. “Enhancing Design Space Exploration by Extending CPU/GPU Spec- xilinx.com/products/design-tools/vivado/integration/esl-design.html ifications onto FPGAs,” ACM Transactions on Embedded Computing [87] “Intel FPGA and SoC.” [Online]. Available: https://www.altera.com/ Systems, vol. 14, no. 2, pp. 1–23, feb 2015. [88] S. Choi, R. Scrofano, V. K. Prasanna, and J.-W. Jang, “Energy- [111] X. Tian and K. Benkrid, “High-Performance Quasi-Monte Carlo Fi- efficient signal processing using FPGAs,” in Proceedings of the 2003 nancial Simulation: FPGA vs. GPP vs. GPU,” ACM Transactions on ACM/SIGDA eleventh international symposium on Field programmable Reconfigurable Technology and Systems, vol. 3, no. 4, pp. 1–22, nov gate arrays - FPGA ’03, New York, USA, 2003, p. 225. 2010. [89] Altera, “Achieving One TeraFLOPS with 28-nm FPGAs,” Altera White [112] Q. Wu, Y. Ha, A. Kumar, S. Luo, A. Li, and S. Mohamed, “A Paper, 2010. heterogeneous platform with GPU and FPGA for power efficient [90] “Lattice Semiconductor.” [Online]. Available: http://www.latticesemi. high performance computing,” in 2014 International Symposium on com/ Integrated Circuits (ISIC), dec 2014, pp. 220–223. [91] “Microsemi — Semiconductor & System Solutions — Power Matters.” [113] H. Giefers, P. Staar, C. Bekas, and C. Hagleitner, “Analyzing the [Online]. Available: https://www.microsemi.com/ energy-efficiency of sparse matrix multiplication on heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA,” in 2016 [92] A. Dutta, D. Saha, D. Grunwald, and D. Sicker, “CODIPHY,” in IEEE International Symposium on Performance Analysis of Systems Proceedings of the second workshop on Software radio implementation and Software (ISPASS), apr 2016, pp. 46–56. forum - SRIF ’13, New York, USA, 2013, p. 1. [114] J. M. P. Cardoso and M. Weinhardt, “High-Level Synthesis,” in FPGAs [93] W. Wolf, “A decade of hardware/software codesign,” Computer, for Software Programmers. Cham: Springer International Publishing, vol. 36, no. 4, pp. 38–43, Apr 2003. 2016, pp. 23–47. [94] G. De Micheli, R. Ernst, and M. Wolf, Readings in hardware/software [115] R. Tessier, K. Pocek, and A. DeHon, “Reconfigurable Computing co-design. Morgan Kaufmann Publishers, 2002. Architectures,” Proceedings of the IEEE, vol. 103, no. 3, pp. 332–354, [95] J. Teich, “Hardware/Software Codesign: The Past, the Present, and mar 2015. Predicting the Future,” Proceedings of the IEEE, vol. 100, no. Special [116] J. Andrade, N. George, K. Karras, D. Novo, F. Pratas, L. Sousa, Centennial Issue, pp. 1411–1430, may 2012. P. Ienne, G. Falcao, and V. Silva, “Design Space Exploration of LDPC [96] “Architecting a Smarter World Arm.” [Online]. Available: https: Decoders using High-Level Synthesis,” IEEE Access, pp. 1–1, 2017. //www.arm.com/ [117] A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, T. Cza- [97] S. Windh, X. Ma, R. J. Halstead, P. Budhkar, Z. Luna, O. Hussaini, and jkowski, S. D. Brown, and J. H. Anderson, “LegUp: : An open- W. A. Najjar, “High-level language tools for reconfigurable computing,” source high-level synthesis tool for FPGA-based processor/accelerator Proceedings of the IEEE, vol. 103, no. 3, 2015. systems,” ACM Transactions on Embedded Computing Systems, vol. 13, [98] K. B. Chehida and M. Auguin, “HW / SW partitioning approach no. 2, pp. 1–27, sep 2013. for reconfigurable system design,” in Proceedings of the international [118] J. Cong, Bin Liu, S. Neuendorffer, J. Noguera, K. Vissers, and conference on Compilers, architecture, and synthesis for embedded Zhiru Zhang, “High-Level Synthesis for FPGAs: From Prototyping systems - CASES ’02, New York, USA, 2002, p. 247. to Deployment,” IEEE Transactions on Computer-Aided Design of [99]M.L opez-Vallejo´ and J. C. Lopez,´ “On the hardware-software partition- Integrated Circuits and Systems, vol. 30, no. 4, pp. 473–491, apr 2011. ing problem,” ACM Transactions on Design Automation of Electronic [119] “Intel FPGA SDK for OpenCL.” [Online]. Available: Systems, vol. 8, no. 3, pp. 269–297, jul 2003. https://www.altera.com/products/design-software/embedded-software- [100] L. Zhuo and V. K. Prasanna, “Hardware/Software Co-Design for Matrix developers/opencl/overview.html Computations on Reconfigurable Computing Systems,” in 2007 IEEE [120] “MaxCompiler — Maxeler Technologies.” [Online]. Available: International Parallel and Distributed Processing Symposium, 2007, https://www.maxeler.com/products/software/maxcompiler/ pp. 1–10. [121] G. Inggs, S. Fleming, D. Thomas, and W. Luk, “Is high level synthesis [101] G. Sapienza, I. Crnkovic, and P. Potena, “Architectural Decisions for ready for business? A computational finance case study,” in 2014 HW/SW Partitioning Based on Multiple Extra-Functional Properties,” International Conference on Field-Programmable Technology (FPT), in 2014 IEEE/IFIP Conference on Software Architecture, apr 2014, pp. dec 2014, pp. 12–19. 175–184. [122] R. Nane, V.-M. Sima, C. Pilato, J. Choi, B. Fort, A. Canis, Y. T. [102] I. Bolsens, H. De Man, B. Lin, K. Van Rompaey, S. Vercauteren, and Chen, H. Hsiao, S. Brown, F. Ferrandi, J. Anderson, and K. Bertels, “A D. Verkest, “Hardware/software co-design of digital telecommunication Survey and Evaluation of FPGA High-Level Synthesis Tools,” IEEE systems,” Proceedings of the IEEE, vol. 85, no. 3, pp. 391–418, mar Transactions on Computer-Aided Design of Integrated Circuits and 1997. Systems, vol. 35, no. 10, pp. 1591–1604, oct 2016. [103] C. Zhang, Y. Ma, and W. Luk, “HW/SW Partitioning Algorithm [123] “High-Level Synthesis For Any FPGA — LegUp Computing.” Targeting MPSOC with Dynamic Partial Reconfigurable Fabric,” in [Online]. Available: https://www.legupcomputing.com/ 2015 14th International Conference on Computer-Aided Design and [124] W. Meeus, K. Van Beeck, T. Goedeme,´ J. Meel, and D. Stroobandt, Computer Graphics (CAD/Graphics), aug 2015, pp. 240–241. “An overview of today’s high-level synthesis tools,” Design Automation [104] D. K. Halim, S. W. Lee, M. S. Ng, Z. N. Lim, and C. M. Tang, “Ex- for Embedded Systems, vol. 16, no. 3, pp. 31–51, sep 2012. ploring software-defined radio on Multi-Processor System-on-Chip,” in [125] S. Ravi and M. Joseph, “Open source HLS tools: A stepping stone 2015 3rd International Conference on New Media (CONMEDIA), nov for modern electronic CAD,” in 2016 IEEE International Conference 2015, pp. 1–4. on Computational Intelligence and Computing Research (ICCIC), dec [105] G. Snider, P. Kuekes, and R. S. Williams, “{CMOS}-like logic in 2016, pp. 1–8. defective, nanoscale crossbars,” Nanotechnology, vol. 15, no. 8, pp. [126] L. A. Tambara, J. Tonfat, A. Santos, F. Lima Kastensmidt, N. H. 881–891, aug 2004. Medina, N. Added, V. A. P. Aguiar, F. Aguirre, and M. A. G. [106] M. Gobel, A. Elhossini, and B. Juurlink, “A Methodology for Predict- Silveira, “Analyzing Reliability and Performance Trade-Offs of HLS- ing Application-Specific Achievable Memory Bandwidth for HW/SW- Based Designs in SRAM-Based FPGAs Under Soft Errors,” IEEE INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 22

Transactions on Nuclear Science, vol. 64, no. 2, pp. 874–881, feb [152] B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong, 2017. J. D. Owens, B. Towles, A. Chang, and S. Rixner, “Imagine: media [127] “SDSoC Development Environment.” [Online]. Available: https: processing with streams,” IEEE Micro, vol. 21, no. 2, pp. 35–46, 2001. //www.xilinx.com/products/design-tools/software-zone/sdsoc.html [153] B. Khailany, W. J. Dally, S. Rixner, U. J. Kapasi, J. D. Owens, and [128] “Stratus High-Level Synthesis - Cadence Design Systems.” [Online]. B. Towles, “Exploring the VLSI scalability of stream processors,” in Available: https://www.cadence.com/content/cadence-www/global/en The Ninth International Symposium on High-Performance Computer US/home/tools/digital-design-and-signoff/synthesis/stratus-high-level- Architecture, 2003. HPCA-9 2003. Proceedings., 2003, pp. 153–164. synthesis.html [154] J. Gummaraju and M. Rosenblum, “Stream Programming on General- [129] “Synphony C Compiler - Synopsys.” [Online]. Available: Purpose Processors,” in 38th Annual IEEE/ACM International Sympo- https://www.synopsys.com/implementation-and-signoff/rtl-synthesis- sium on Microarchitecture (MICRO’05), 2005, pp. 343–354. test/synphony-c-compiler.html [155] B. K. Khailany, T. Williams, J. Lin, E. P. Long, M. Rygh, D. W. Tovey, [130] “MathWorks - Makers of MATLAB and Simulink.” [Online]. and W. J. Dally, “A Programmable 512 GOPS Stream Processor for Available: https://www.mathworks.com/ Signal, Image, and Video Processing,” IEEE Journal of Solid-State [131] “LabVIEW - National Instruments.” [Online]. Available: http: Circuits, vol. 43, no. 1, pp. 202–213, jan 2008. //www.ni.com/en-us/shop/labview.html [156] M. Woh, Y. Lin, S. Seo, S. Mahlke, T. Mudge, C. Chakrabarti, R. Bruce, [132] “Simulink - Simulation and Model-Based Design.” [Online]. Available: D. Kershaw, A. Reid, M. Wilder, and K. Flautner, “From SODA to https://www.mathworks.com/products/simulink.html scotch: The evolution of a wireless baseband processor,” in 2008 41st [133] “MATLAB Coder.” [Online]. Available: https://www.mathworks.com/ IEEE/ACM International Symposium on Microarchitecture, nov 2008, products/matlab-coder.html pp. 152–163. [134] “Simulink Coder - MATLAB & Simulink.” [Online]. Available: [157] “DigiKey Electronics - Electronic Components Distributor.” [Online]. https://www.mathworks.com/products/simulink-coder.html Available: https://www.digikey.com/ [135] “Embedded Coder - MATLAB & Simulink.” [Online]. Available: [158] “Newark element14 Electronics — Electronic Components https://www.mathworks.com/products/embedded-coder.html Distributor.” [Online]. Available: https://www.newark.com [136] “ZedBoard Zynq-7000 ARM/FPGA SoC Development Board.” [159] “BeagleBoard.org - community supported open hardware computers [Online]. Available: https://www.xilinx.com/products/boards-and- for making.” [Online]. Available: http://beagleboard.org/ kits/1-elhabt.html [160] “Imagination Technologies - Developing and Licensing IP cores.” [137] R. W. Stewart, L. Crockett, D. Atkinson, K. Barlee, D. Crawford, [Online]. Available: https://www.imgtec.com/ I. Chalmers, M. Mclernon, and E. Sozer, “A low-cost desktop software [161] A. S. Fayez, “Designing a Software Defined Radio to Run on a defined radio design environment using MATLAB, simulink, and the Heterogeneous Processor,” Ph.D. dissertation, Virginia Tech, apr 2011. RTL-SDR,” IEEE Communications Magazine, vol. 53, no. 9, pp. 64– [162] B. Le, F. A. Rodriguez, Q. Chen, B. P. Li, F. Ge, M. ElNainay, T. W. 71, sep 2015. Rondeau, and C. W. Bostian, “A public safety cognitive radio node,” [138] R. W. Stewart, K. W. Barlee, D. S. W. Atkinson, and L. H. Crockett, in Proceedings of the 2007 SDR forum technical conference, SDRF Software Defined Radio using MATLAB & Simulink and the RTL-SDR. Google Scholar, 2007. Strathclyde Academic Media, 2015. [163] M. M. Bezem, J. W. Klop, R. de. Vrijer, and Terese (Group), Term [139] “RTL-SDR (RTL2832U).” [Online]. Available: https://www.rtl- rewriting systems. Cambridge University Press, 2003. sdr.com/ [164] “32-bit ARM Cortex-M3 PSoC 5LP — Cypress Semiconductor.” [140] Blossom and Eric, “GNU radio: tools for exploring the radio frequency [Online]. Available: http://www.cypress.com/products/32-bit-arm- spectrum,” Linux Journal, vol. 2004, no. 122, p. 4, 2004. cortex-m3-psoc-5lp [141] “Vivado System Generator for DSP.” [Online]. Available: https:// [165] “PSoC SDR PAøRWE.” [Online]. Available: http://pa0rwe.nl/?page www.xilinx.com/products/design-tools/vivado/integration/sysgen.html id=32 [142] J. Malsbury, “Modular, open-source software transceiver for PHY/MAC research,” Proceedings of the second workshop on Software radio [166] B. Drozdenko, M. Zimmermann, T. Dao, K. Chowdhury, and implementation forum - SRIF ’13, 2013. M. Leeser, “Hardware-Software Codesign of Wireless Transceivers on [143] M. Abirami, V. Hariharan, M. B. Sruthi, R. Gandhiraj, and K. P. Soman, Zynq Heterogeneous Systems,” IEEE Transactions on Emerging Topics “Exploiting GNU radio and USRP: An economical test bed for real in Computing, pp. 1–1, 2017. time communication systems,” in 2013 4th International Conference on [167] Z. X. Zhang Xin, T. L.-b. Tang Lin-bo, and J. M.-p. Ji Mei-ping, Computing, Communications and Networking Technologies, ICCCNT “Remote updating for DSP and FPGA programs,” in IET International 2013, 2013. Radar Conference 2015. Institution of Engineering and Technology, [144]K.K uç¨ uk,¨ “RTWiFi-Lab: A real-time Wi-Fi laboratory platform on 2015, pp. 4 .–4 . USRP and LabVIEW for wireless communications education and [168] J. Hoffmeyer, Il-Pyung Park, M. Majmundar, and S. Blust, “Radio research,” Computer Applications in Engineering Education, aug 2017. software download for commercial wireless reconfigurable devices,” [145] V. P. G. Jimenez, A. L. Serrano, B. G. Guzman, and A. G. Armada, IEEE Communications Magazine, vol. 42, no. 3, pp. S26–S32, mar “Learning Mobile Communications Standards through Flexible Soft- 2004. ware Defined Radio Base Stations,” IEEE Communications Magazine, [169] B. Bing, “A fast and secure framework for over-the-air wireless vol. 55, no. 5, pp. 116–123, may 2017. software download using reconfigurable mobile devices,” IEEE Com- [146] “Software Defined Radio - Lime Micro.” [Online]. Available: munications Magazine, vol. 44, no. 6, pp. 58–63, jun 2006. http://www.limemicro.com/products/software-defined-radio/ [170] L. Zhou, Q. Liu, B. Wang, P. Yang, X. Li, and J. Zhang, “Remote [147] G. Stewart, M. Gowda, G. Mainland, B. Radunovic, D. Vytiniotis, and System Update for System on Programmable Chip Based on Controller C. L. Agullo, “Ziria,” in Proceedings of the Twentieth International Area Network,” Electronics, vol. 6, no. 2, p. 45, jun 2017. Conference on Architectural Support for Programming Languages and [171] A. Fernandes, R. C. Pereira, J. Sousa, P. F. Carvalho, M. Correia, A. P. Operating Systems - ASPLOS ’15, vol. 50, no. 4, New York, USA, Rodrigues, B. B. Carvalho, C. M. B. A. Correia, and B. Goncalves, 2015, pp. 415–428. “FPGA Remote Update for Nuclear Environments,” IEEE Transactions [148] P. D. Sutton, J. Lotze, H. Lahlou, S. A. Fahmy, K. E. Nolan, B. ??zg??l, on Nuclear Science, vol. 63, no. 3, pp. 1645–1649, jun 2016. T. W. Rondeau, J. Noguera, and L. E. Doyle, “Iris: An architecture for [172] J. Vliegen, N. Mentens, and I. Verbauwhede, “Secure, Remote, Dy- cognitive radio networking testbeds,” IEEE Communications Magazine, namic Reconfiguration of FPGAs,” ACM Transactions on Reconfig- vol. 48, no. 9, pp. 114–122, sep 2010. urable Technology and Systems, vol. 7, no. 4, pp. 1–19, dec 2014. [149] J. van de Belt, P. D. Sutton, and L. E. Doyle, “Accelerating software [173] H. Kashyap and R. Chaves, “Compact and On-the-Fly Secure Dynamic radio: Iris on the Zynq SoC,” in 2013 IFIP/IEEE 21st International Reconfiguration for Volatile FPGAs,” ACM Transactions on Reconfig- Conference on Very Large Scale Integration (VLSI-SoC), oct 2013, pp. urable Technology and Systems, vol. 9, no. 2, pp. 1–22, jan 2016. 294–295. [174] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, [150] J. Kim, S. Hyeon, and S. Choi, “Implementation of an SDR system J. Rexford, S. Shenker, and J. Turner, “Openflow: Enabling innovation using graphics processing unit,” IEEE Communications Magazine, in campus networks,” ACM SIGCOMM Computer Communication vol. 48, no. 3, pp. 156–162, mar 2010. Review, vol. 38, no. 2, pp. 69–74, 2008. [151] S. Rajagopal, S. Rixner, and J. R. Cavallaro, “A programmable base- [175] R. Enns, M. Bjorklund, and J. Schoenwaelder, “Network configuration band processor design for software defined radios,” in The 2002 45th protocol (netconf),” 2011. Midwest Symposium on Circuits and Systems, 2002. MWSCAS-2002., [176] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and vol. 3, 2002, pp. 413–416. R. Boutaba, “Network Function Virtualization: State-of-the-Art and INTERNETOFTHINGSRESEARCHLAB,DEPARTMENTOFCOMPUTERENGINEERING,SANTACLARAUNIVERSITY,USA— MARCH 2018 23

Research Challenges,” IEEE Communications Surveys & Tutorials, [200] M. Palkovic, P. Raghavan, M. Li, A. Dejonghe, L. Van der Perre, and vol. 18, no. 1, pp. 236–262, 2016. F. Catthoor, “Future Software-Defined Radio Platforms and Mapping [177] J. Gil Herrera and J. F. Botero, “Resource Allocation in NFV: A Flows,” IEEE Signal Processing Magazine, vol. 27, no. 2, pp. 22–33, Comprehensive Survey,” IEEE Transactions on Network and Service mar 2010. Management, vol. 13, no. 3, pp. 518–532, sep 2016. [201] V. Derudder and B. Bougard and A. Couvreur and A. Dewilde and [178] S. Sun, M. Kadoch, L. Gong, and B. Rong, “Integrating network S. Dupont and L. Folens and L. Hollevoet and F. Naessens and D. function virtualization with SDR and SDN for 4G/5G networks,” IEEE Novo and P. Raghavan and T. Schuster and K. Stinkens and J. W. Network, vol. 29, no. 3, pp. 54–59, may 2015. Weijers and L. Van der Perre, “A 200Mbps+ 2.14nJ/b digital baseband [179] P. Shome, Muxi Yan, S. M. Najafabad, N. Mastronarde, and multi processor system-on-chip for SDRs,” in 2009 Symposium on VLSI A. Sprintson, “CrossFlow: A cross-layer architecture for SDR using Circuits, 2009, pp. 292–293. SDN principles,” in 2015 IEEE Conference on Network Function [202] R. G. Machado and A. M. Wyglinski, “Software-defined radio: Bridg- Virtualization and Software Defined Network (NFV-SDN), nov 2015, ing the analog-digital divide,” Proceedings of the IEEE, vol. 103, no. 3, pp. 37–39. pp. 409–423, mar 2015. [180] J. Schulz-Zander, C. Mayer, B. Ciobotaru, S. Schmid, and A. Feld- [203] A. M. Wyglinski, D. P. Orofino, M. N. Ettus, and T. W. Rondeau, mann, “Opensdwn: Programmatic control over home and enterprise “Revolutionizing software defined radio: case studies in hardware, wifi,” in Proceedings of the 1st ACM SIGCOMM Symposium on software, and education,” IEEE Communications Magazine, vol. 54, Software Defined Networking Research. ACM, 2015, p. 16. no. 1, pp. 68–75, jan 2016. [181] A. Zubow, S. Zehl, and A. Wolisz, “BIGAP - Seamless handover in [204] R. Farrell, M. Sanchez, and G. Corley, “Software-Defined Radio high performance enterprise IEEE 802.11 networks,” in in IEEE/IFIP Demonstrators: An Example and Future Trends,” International Journal Network Operations and Management Symposium (NOMS), April of Digital Multimedia Broadcasting, pp. 1–12, mar 2009. 2016, pp. 445–453. [205] T. Nesimoglu, “A review of Software Defined Radio enabling technolo- [182] R. Riggio, M. K. Marina, J. Schulz-Zander, S. Kuklinski, and gies,” in 2010 10th Mediterranean Microwave Symposium, aug 2010, T. Rasheed, “Programming abstractions for software-defined wireless pp. 87–90. networks,” IEEE Transactions on Network and Service Management, [206] G. Baldini, T. Sturman, A. R. Biswas, R. Leschhorn, G. Godor, and vol. 12, no. 2, pp. 146–162, June 2015. M. Street, “Security Aspects in Software Defined Radio and Cognitive [183] J. Vestin and A. Kassler, “Qos enabled wifi mac layer processing as Radio Networks: A Survey and A Way Ahead,” IEEE Communications an example of a nfv service,” in 1st IEEE Conference on Network Surveys & Tutorials, vol. 14, no. 2, pp. 355–379, 2012. Softwarization (NetSoft), April 2015, pp. 1–9. [184] L.-m. Ang, K. P. Seng, L. W. Chew, L. S. Yeong, and W. C. Chia, Wireless Multimedia Sensor Networks on Reconfigurable Hardware. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. [185] T. Han and N. Ansari, “A Traffic Load Balancing Framework for Software-Defined Radio Access Networks Powered by Hybrid Energy Sources,” IEEE/ACM Transactions on Networking, vol. 24, no. 2, pp. 1038–1051, apr 2016. [186] ——, “On Optimizing Green Energy Utilization for Cellular Networks with Hybrid Energy Supplies,” IEEE Transactions on Wireless Com- munications, vol. 12, no. 8, pp. 3872–3882, aug 2013. [187] M. M. Tentzeris, A. Georgiadis, and L. Roselli, “Energy Harvesting and Scavenging [Scanning the Issue],” Proceedings of the IEEE, vol. 102, no. 11, pp. 1644–1648, nov 2014. [188] T. Han and N. Ansari, “Powering mobile networks with green energy,” IEEE Wireless Communications, vol. 21, no. 1, pp. 90–96, feb 2014. [189] Ericsson Inc., “Sustainable Energy Use in Mobile Communications Abstract — TechOnline,” Tech. Rep., 2017. [190] S. Park, H. Kim, and D. Hong, “Cognitive Radio Networks with Energy Harvesting,” IEEE Transactions on Wireless Communications, vol. 12, no. 3, pp. 1386–1397, mar 2013. [191] F. K. Shaikh, S. Zeadally, and E. Exposito, “Enabling technologies for green internet of things,” IEEE Systems Journal, 2017. [192] R. Arshad, S. Zahoor, M. A. Shah, A. Wahid, and H. Yu, “Green IoT: An Investigation on Energy Saving Practices for 2020 and Beyond,” IEEE Access, vol. 5, pp. 15 667–15 681, 2017. [193] I. Mhadhbi, S. Ben Othman, and S. Ben Saoud, “An Efficient Technique for Hardware/Software Partitioning Process in Codesign,” Scientific Programming, pp. 1–11, Jul 2016. [194] J. M. Mitola, “Software Radios Survey, Critical Evaluation and Future Directions,” IEEE Aerospace and Electronic Systems Magazine, vol. 8, no. 4, pp. 25–36, apr 1993. [195] W. H. W. Tuttlebee, “Software-defined radio: facets of a developing technology,” IEEE Personal Communications, vol. 6, no. 2, pp. 38–44, apr 1999. [196] M. Dardaillon, K. Marquet, T. Risset, and A. Scherrer, “Software defined radio architecture survey for cognitive testbeds,” in IWCMC 2012 - 8th International Wireless Communications and Mobile Computing Conference, aug 2012, pp. 189–194. [197] O. Anjum, T. Ahonen, F. Garzia, J. Nurmi, C. Brunelli, and H. Berg, “State of the art baseband DSP platforms for Software Defined Ra- dio: A survey,” EURASIP Journal on Wireless Communications and Networking, vol. 2011, no. 1, p. 5, dec 2011. [198] M. Cummings and T. Cooklev, “Tutorial: Software-defined radio technology,” in 2007 25th International Conference on Computer Design, oct 2007, pp. 103–104. [199] C. Moy and J. Palicot, “Software radio: a catalyst for wireless innovation,” IEEE Communications Magazine, vol. 53, no. 9, pp. 24–30, sep 2015.