Eindhoven University of Technology

MASTER

Mapping a China Digital Radio (CDR) receiver on a software-defined-radio platform

Cheng, Y.

Award date: 2017

Department of Mathematics and Computer Science
Algorithm & Software Innovation

Mapping a China Digital Radio (CDR) receiver on a Software-Defined-Radio platform

Master Thesis

Yan Cheng

Supervisors: prof.dr.ir. C.H. (Kees) van Berkel, Dr. Hong Li

Eindhoven, August 2017

Abstract

With the launch of the China Digital Radio (CDR) standard in hundreds of cities in China, CDR receiver chips are in demand on the market. To explore fast and efficient design and realization of an embedded Software-Defined-Radio (SDR) CDR receiver, this thesis project used Data-Flow (DF) modeling to study the architectural options of a CDR receiver design for an existing NXP SDR chip. Next to a study of the CDR standard and the CDR processing algorithms, the architectural requirements, constraints and options of the CDR receiver are extracted and analyzed for three receiver use cases: Single Radio, Single Radio plus background scanning, and Dual Radios for listening to two radio channels simultaneously. Abstract Cyclo-Static Data-Flow (CSDF) models of all CDR processing tasks and DF graphs for each use case are established from measurements of the processing load, state and memory of the tasks, and are simulated with the HAPI tool to evaluate architecture options in the different CDR transmission modes. With the CSDF modeling and simulation, we have shown that the NXP chip and the processing algorithms can support CDR processing in all use cases when the broadcast uses QPSK. The chip also has sufficient processing capacity for 16QAM, but it cannot support the Dual Radios use case with 64QAM modulation, although 64QAM is not expected in real broadcasting. The DF simulation also shows the best LDPC decoding options for the different transmission configurations, which allows the receiver to select dynamically, at run time, the option that gives the best reception quality for the detected broadcasting mode.

Chapter 1

Introduction

With the development of technologies such as the internet, the infotainment behavior of users has changed dramatically, and traditional analog radio broadcasting is facing more and more challenges. Compared with traditional analog FM radio, digital radio can provide new features, including:

1. Efficient use of spectrum space

2. More channels and content, offering a wider variety of radio services

3. Clear sound quality

4. Pause and rewind of live radio

Many digital radio standards have been launched in different areas of the world. China has also developed its own standard, the China Digital Radio (CDR) standard, which is now being deployed in more than 500 cities in China.

1.1 Project Context

Because of the market demand for CDR receiver chips, a CDR receiver is designed for a Software-Defined-Radio (SDR) platform in this project. SDR offers a flexible way to support a wide range of wireless communication standards through function design and configuration. It can support a new standard by updating the software and the reprogrammable logic, without any change to the hardware platform.

However, mapping and scheduling functional tasks onto an embedded target platform requires considerable expert experience and architectural skill. Moreover, searching for an optimized design among a series of options involves a large amount of quantitative analysis of the demands of each task and of the resources available on the platform, in order to evaluate the advantages, disadvantages and potential problems of each option.

Model-based analysis is an efficient way to deal with the complexity and the issues arising in architecture design. Complicated systems can be described by mathematical models that represent the system components and their interactions with the surrounding environment [1]. These models can be used in different stages of the design procedure, including system simulation, stability analysis, and scheduling on the target platform.

Data-flow modeling is a widely used model-based analysis approach for processing applications. Many tools provide simulation based on the DF model to improve the quality and efficiency of the analysis. With a simulator, engineers can evaluate design options and detect critical problems before the implementation phase. Therefore, the implementation of the CDR receiver in this project will benefit from data-flow modeling and simulation. The approach is explored to improve the quality and efficiency of the design procedure in industry.


1.2 Related Work

The demodulation and decoding subsystem of the CDR receiver is defined by the standard [6]. A signal processing simulation chain was designed and tested according to the CDR standard [16], with several issues investigated and solutions explored in Matlab. Another subsystem, for demultiplexing, is defined in the standard [7].

In research on architecture design for radio receivers on an SDR platform, a flexible architecture is proposed in [8] to support multiple channels in the FM band and two single tuners for DAB, which is a pure digital radio standard in Europe. Four single channels are extracted in real time, and each channel is buffered in a FIFO memory to allow further time sharing of the decoder blocks.

To deal with the complexity of the design procedure, many studies have explored data-flow based analysis and design. A reconfigurable phase-shift keying system was designed and implemented as an SDR application with a lightweight data-flow approach [13], which facilitates cross-language and cross-platform migration and prototyping in signal processing application domains for efficient implementation.

Synchronous Data-Flow (SDF) is an important type of data-flow model in application modeling. However, it is hard to describe the dynamic behavior of an SDR system with SDF, since SDF is a restriction of Kahn process networks [12] in which actors produce and consume a fixed number of data items per firing, with static scheduling. To address this limitation in data-flow based modeling, the dynamic data processing behavior can be divided into a group of static modes of operation, where each mode is modeled by an SDF graph referred to as a scenario [15], and worst-case throughput analysis is applied to the scenarios. However, the scenario-based data-flow model lacks the dependencies between scenarios, which will lead to an invalid worst-case temporal analysis. A technique for modeling these dependencies with a finite-state machine was then explored, and the resulting throughput was compared and validated against that of the previous scenario-aware data-flow model [14].

As an extension of SDF modeling, Cyclo-Static Data-Flow (CSDF) is much more expressive: it supports algorithms with a cyclically changing but predefined behavior [3]. Data-dependent and state-dependent conditional behaviors can be described by CSDF. Static scheduling is possible with CSDF because the behavior changes according to a group of predictable sequences. Because the CDR standard supports multiple transmission modes and spectrum modes, CSDF is suitable for describing the signal processing tasks, which fire in a periodic and data-dependent pattern. In this report, the CDR receiver is modeled with CSDF, and the architecture options are applied to the models and simulated with a data-flow simulator to analyze and generate optimal design options.
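As an illustration of the difference between SDF and CSDF described above, the following minimal Python sketch models an actor whose consumption and production quanta repeat cyclically. The class name `CsdfActor` and the example quanta are assumptions made for illustration only; they are not taken from the CDR receiver models or from any tool.

```python
class CsdfActor:
    """Actor whose token quanta cycle through a fixed list of phases.

    Hypothetical helper for illustration; not part of HAPI or the thesis models.
    """

    def __init__(self, consume, produce):
        if len(consume) != len(produce):
            raise ValueError("consume and produce need one entry per phase")
        self.consume = list(consume)
        self.produce = list(produce)
        self.phase = 0

    def fire(self):
        """Return (tokens consumed, tokens produced) for the current phase."""
        quanta = (self.consume[self.phase], self.produce[self.phase])
        # Advance cyclically, so the behavior changes but stays predictable.
        self.phase = (self.phase + 1) % len(self.consume)
        return quanta


# An SDF actor is the special case with a single phase: fixed rates per firing.
sdf_actor = CsdfActor(consume=[1], produce=[1])

# Illustrative CSDF actor: consumes one token per firing and
# produces one token only on every fourth firing.
accumulator = CsdfActor(consume=[1, 1, 1, 1], produce=[0, 0, 0, 1])
trace = [accumulator.fire() for _ in range(8)]
total_consumed = sum(c for c, _ in trace)
total_produced = sum(p for _, p in trace)
print(trace, total_consumed, total_produced)
```

Because the sequence of quanta is known in advance, a static schedule can still be derived, which is the property the text relies on.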

1.3 Problem Definition

The goal of this project is to investigate the architecture options for a CDR receiver, including optimization of the scheduling policy and the resource usage. To make this investigation more efficient for industrial embedded architecture design, issues and solutions are explored with the help of data-flow modeling and a simulation tool. The research on design options includes:

1. Describing the CDR receiver use cases and setting up the requirements based on an existing Matlab signal processing chain and the de-multiplexing sub-system. The statistics on resource usage on the target platform are also taken as constraints.

2. Building the data-flow model for the CDR receiver. The model should be flexible enough to express the behaviors of the system in multiple transmission modes and spectrum modes for three use cases.


3. Running simulations with a data-flow simulator, using the receiver model and several graphs for different architecture design options, in order to generate analyzable data and results.

4. Analyzing and evaluating the designs with the simulation results, so that recommended design options can be given for the CDR receiver system.

To understand the requirements of the processing functions and of the advanced receiver use cases, one task of the project is to complete a CDR de-multiplexer, including the API design and testing, based on an initial CDR de-multiplexer implementation. The implementation of the CDR de-multiplexer includes:

1. Establishing the requirements. The functional requirements are set up according to the CDR multiplexing standard. The requirements for the de-multiplexer running in different use cases are also analyzed.

2. Designing a multiplexer in order to test the de-multiplexer in different use cases.

3. Implementing and verifying the design of the de-multiplexer.

For the analysis of the architecture design options, a simulator is used to generate results for the evaluation of resource usage and system performance. The reasons for choosing HAPI as the simulator are:

1. The HAPI simulator provides easily understood actors and buffer channels to describe a data-flow based system. The simulation model in HAPI is built by programming in C++, so no extra syntax has to be learned for modeling and simulation, which enables the designer to build the simulation model quickly.

2. The simulation model can be at a high level of abstraction, so the simulation is typically fast because it contains few details.

3. HAPI has built-in support for schedulers, which makes it easy to analyze the performance of the feasible scheduling policies. The arbitration policies include Fixed Priority Preemptive (FPP) scheduling, Time-Division Multiplex (TDM) scheduling and Round-Robin (RR) scheduling.

4. Execution that depends on the input and output can be expressed in HAPI. The actions of each actor in HAPI consist of three stages: Acquire, Execution, and Release. This is convenient for describing the behavior and temporal characteristics of a conditional model.

With these features, HAPI is well suited for generating analyzable simulation results. It is easy to learn, and the results are observable and accurate for the analysis.

1.4 Contribution

The contribution of this project includes:

1. Build data-flow models and graphs for the CDR receiver system, analyze the resource usage and explore optimal architecture design options via DF simulation in different use cases.

2. Implement the CDR de-multiplexing subsystem including the design of API and a flexible multiplexer for testing.


1.5 Outline

The rest of the thesis is organized as follows. Chapter 2 introduces the CDR standard, the CDR receiver system and the HAPI tool. Chapter 3 introduces the design of the CDR de-multiplexer, including the frame structure, the de-multiplexer functionality, and the multiplexer design. Chapter 4 describes the requirements and constraints for the CDR receiver architecture design for different use cases on the SDR platform. In Chapter 5, the CSDF models of the CDR receiver are built using the HAPI simulator. The simulations of several data-flow graphs and their results are described and analyzed in Chapter 6. Chapter 7 gives conclusions and future work, including architecture recommendations for the different CDR receiver use cases.

Chapter 2

CDR standard and HAPI simulator

The CDR standard is introduced first in this chapter. The functional tasks in an existing Matlab baseband signal processing chain are then described; the design of one of the subsystems in the CDR receiver is based on this chain. Finally, a data-flow simulation tool is introduced and explored to show its feasibility for analyzing the architecture design and scheduling options of the CDR receiver.

2.1 The CDR Standard

China Digital Radio, commonly referred to as CDR, is a digital radio broadcasting standard that operates in the FM band (87 MHz to 108 MHz) [6]. It is expected to cover over 500 cities in China before the end of 2017. The CDR standard is a type of In-Band-On-Channel (IBOC) system. IBOC is a hybrid method for simultaneously broadcasting digital and analog radio in the same frequency band by transmitting extra digital sub-carriers in the sidebands of an AM or FM channel [5]. The CDR standard was released in 2013 and is labeled GY/T268.1 (Physical Layer) and GY/T268.2 (Multiplexing). The main characteristics of the CDR standard include:

• Flexible spectrum utilization modes, as well as multiple transmission modes

• LDPC (Low Density Parity Check) coding is used as forward error correction of service data

• Audio compression uses the Chinese DRA+ coding scheme [18]

A CDR system diagram, including both the transmitting and the receiving system, is shown in Fig. 2.1. The CDR transmitter is on the left of Fig. 2.1. One type of input to the transmitter is the radio program containing multiple audio streams and data services, which is defined as the Main Service Data (MSD) in the CDR standard. The other type of input is the Service Description Information (SDI), defined to carry the program guide information, the configuration of the MSD and network information. The multiplexing subsystem generates Service Multiplexed Frames (SMF) with the MSD bits and Control Multiplexed Frames (CMF) with the SDI bits. System Information (SI) is required to determine the configuration of the modulation, coding, channel capacity and so on. After channel coding and modulation, the SI, MSD and SDI bits are transmitted on their specified sub-carriers, respectively.

In the receiving system on the right of Fig. 2.1, for each received physical layer frame, the SI is processed first in the de-modulation and de-coding subsystem to configure the modulation and coding settings of the SDI and MSD sub-carriers. Then the SDI bits in the CMFs are processed and sent to the de-multiplexing sub-system. After the SDI bits have been processed, the MSD bits in the SMFs in the physical layer frame can be processed. The selected audio stream extracted by the de-multiplexer then goes through the DRA+ decoder and is played to the user.

Figure 2.1: CDR system diagram

2.1.1 DRA+ coding scheme

In the CDR standard, the audio streams in the MSD can only be compressed with the DRA extension (DRA+) scheme. DRA+ is a low-bit-rate, high-quality multichannel audio coding algorithm.

2.1.2 Multiple spectral utilization and transmission modes

Figure 2.2: Spectral modes in the CDR standard

Being an IBOC system, the CDR standard [6] defines six spectral modes for broadcasting a hybrid of analog and digital signals in a channel. As shown in Fig. 2.2, each block is a half sub-band covering 50 kHz of bandwidth; the green blocks represent the half sub-bands carrying the digital signal, and those in grey are for the analog signal. The first two modes are defined for pure digital transmission, while the other four modes contain sub-bands for both analog and digital signals. Modes 2, 10 and 23 contain 4 half sub-bands for the digital signal, while the other three modes (1, 9, 22) contain only 2 half sub-bands for digital use. The label 'DA' denotes sub-bands with central frequencies of (100i + 50) kHz, i = 0, 1, while the label 'DB' means that the central frequencies of the sub-bands are integer multiples of 100 kHz.
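The relation between the spectral mode and the bandwidth used by the digital signal can be written down as a small lookup. The following Python sketch only encodes the numbers stated above (50 kHz per half sub-band, and 2 or 4 digital half sub-bands per mode); the names `DIGITAL_HALF_SUBBANDS` and `digital_bandwidth_khz` are chosen here for illustration.

```python
# Width of one half sub-band in kHz, as stated in the text.
HALF_SUBBAND_KHZ = 50

# Number of half sub-bands carrying the digital signal in each spectral mode.
DIGITAL_HALF_SUBBANDS = {1: 2, 9: 2, 22: 2, 2: 4, 10: 4, 23: 4}


def digital_bandwidth_khz(spectral_mode: int) -> int:
    """Return the bandwidth occupied by the digital half sub-bands, in kHz."""
    return DIGITAL_HALF_SUBBANDS[spectral_mode] * HALF_SUBBAND_KHZ


for mode in sorted(DIGITAL_HALF_SUBBANDS):
    print(f"spectral mode {mode:2d}: {digital_bandwidth_khz(mode)} kHz digital")
```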


Besides the spectral modes, the CDR standard also defines 3 transmission modes. The three modes differ in the configuration of the OFDM symbol and of the sub-carriers in each sub-band. The longer cyclic prefix of the OFDM symbol in Mode 1 provides better tolerance to multi-path delay. In contrast, the shorter cyclic prefix in Mode 3 provides a larger channel capacity, i.e. a higher data rate. Mode 2 uses wider sub-carrier spacing, which makes it more robust against Doppler shift and more suitable for high-speed mobile use [16].

2.1.3 LDPC

Low-Density Parity-Check (LDPC) coding is an advanced channel coding technique against noise and channel effects, with error correction performance close to the theoretical maximum (the Shannon limit) for a symmetric memoryless channel. LDPC is popular in modern standards, such as recent 5G and WiFi standards. Using iterative belief propagation techniques, LDPC codes can be decoded in time linear in their block length [17]. The CDR standard defines four LDPC code rates: 3/4, 1/2, 1/3 and 1/4. The coded block length is 9216 bits.
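To make the four code rates concrete, the sketch below computes the number of information and parity bits per coded block. It assumes the usual definition of the code rate as the ratio of information bits to coded bits; the function and variable names are illustrative and do not come from the standard.

```python
from fractions import Fraction

CODED_BLOCK_BITS = 9216
CODE_RATES = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]


def info_bits(rate: Fraction) -> int:
    """Information bits per coded block, assuming rate = info bits / coded bits."""
    bits = CODED_BLOCK_BITS * rate
    if bits.denominator != 1:
        raise ValueError("rate does not divide the block length evenly")
    return int(bits)


# Map each rate to (information bits, parity bits) per 9216-bit block.
block_layout = {
    str(rate): (info_bits(rate), CODED_BLOCK_BITS - info_bits(rate))
    for rate in CODE_RATES
}

for rate, (info, parity) in block_layout.items():
    print(f"rate {rate}: {info} information bits, {parity} parity bits")
```

Lower rates spend more of each block on parity bits, which trades throughput for robustness.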

2.2 Use cases of the CDR receiver

Three CDR receiver use cases are considered in this work. The requirements for resource usage and scheduling differ per use case. The analysis and evaluation of architecture design options, including processing load, memory usage and scheduling of tasks, depend on the use cases.

Use Case 1: Single Radio – single antenna in receiver.

Description: The Single Radio use case provides only one radio program to the user. Its processing load is the lowest of the three use cases. It is the mainstream use case and requires the most economical design in its embedded architecture implementation.

Functionality:

1. It must support all transmission and spectrum modes, i.e. processing OFDM symbols with different sizes and time periods, and hybrid content of digital and analog signals.

2. It must support decoding of audio information, for example to list the languages of a multi-language broadcast and let the user select a language for listening. When a radio program and a language are selected, only one audio stream is played out.

3. It must support decoding of all data services of a selected program and send the decoded data to another system for further processing. If a data service contains several data types, all data are categorized by type and sent to separate application systems for further processing.

4. It must process the emergency broadcasting data with the highest priority.

Use Case 2: Single Radio+ – plus Background Scanning

Description: Besides the Single Radio functionality, the Background Scanning use case provides information about the network and about other programs in other channels. It also helps when switching to another program. An extra channel needs to be processed in the background, so more resources are required to process two channels. Depending on the resources allocated to the background scanning, it can provide different levels of data processing, such as scanning the program guide of other channels or the details of the audio programs.

Functionality:

1. It must have the same functionality for processing the audio and data services as Use Case 1 (Single Radio).

2. It must finish the synchronization on the scanned channel.

3. For scanning the System Information (SI), it must be able to process and provide to the user the configuration information of other programs in the scanned channel, such as the coding and modulation method and the spectrum mode.

4. For scanning the Service Description Information (SDI), it must be able to provide the program guide information of other programs to the user.

5. For scanning the Main Service Data (MSD), it must be able to provide more details about the program in the scanned channel, for example the audio compression rate and the language of the audio service.

Use Case 3: Dual Radios – two antennas in receiver.

Description: The Dual Radios use case provides two programs to users simultaneously. It has the highest resource requirements in the embedded implementation.

Functionality:

1. It must be able to process two selected programs in two channels. In each channel, one language can be selected by the user.

2. It must be able to play two audio streams simultaneously. The data services in the two programs must be processed and categorized by type.

2.3 Design of the CDR receiver

Figure 2.3: Task flow in the CDR receiver

As shown in Fig. 2.3, the CDR receiver consists of two sub-systems. One is the de-modulation & de-coding sub-system, in which the CDR baseband signal obtained from the Analog-to-Digital Converter (ADC) is demodulated and decoded by the baseband signal processing chain. The other is the de-multiplexing sub-system, which extracts the wanted audio and data from the multiplexed frames delivered by the de-modulation & de-coding sub-system.

The baseband signal processing chain has two working modes: the Acquisition & Synchronization mode and the Demodulation & Decoding mode. The receiver always starts in the Acquisition & Synchronization mode when switching to a new channel. After the synchronization has finished, this sub-system switches to the Demodulation & Decoding mode.

1. Acquisition and Synchronization mode: The receiver works in this mode when it is switched to a new channel. The tasks in this mode are shown in Fig. 2.5. The transmission mode (Tx mode) is detected first. After the Tx mode is found, the sizes of the OFDM symbol and of the beacon are known, and the FM mode is detected. After this detection has finished, the Carrier Frequency Offset (CFO) is corrected. Starting from the next new sub-frame, the beacon at the beginning of the sub-frame (shown in Fig. 2.4) is found, then the OFDM symbol is processed and the synchronization finishes. The sub-system then switches to the Demodulation & Decoding mode.

Figure 2.4: Beacon and OFDM symbols in CDR frame structure

Figure 2.5: Tasks in Acquisition and Synchronization mode

2. Demodulation and Decoding mode: This mode contains a stream of tasks for baseband signal processing. The receiver stays in this mode after the Acquisition and Synchronization mode. Each beacon or OFDM symbol is passed as input through the signal processing tasks, so the input buffer size should be 4864 bytes, which is equal to the input buffer size in the previous mode.

• Each symbol, shown in Fig. 2.5, is first sent to the FM removal filter. To support the hybrid spectrum modes of the CDR standard, the FM sub-bands are removed here if the spectrum mode is a hybrid of digital and FM analog signals; otherwise the filter only works as a low-pass filter.

• The symbols in each sub-frame are then processed differently. The beacon is only used for frequency and time tracking. Beacon processing runs once at the beginning of each new sub-frame, according to the frame structure shown in Fig. 2.4. The length of one sub-frame varies across the three transmission modes: the number of OFDM symbols is N = 56, 111 and 61 in the three modes.

• OFDM symbols are processed by all demodulation and decoding tasks. Frequency tracking is required first for further correction of frequency offsets. One OFDM symbol contains hundreds of sub-carriers (122-484), which carry the scattered pilots, SI bits, MSD bits and SDI bits. The sub-carriers for the scattered pilots are extracted next and sent to the channel estimation task, after which all remaining sub-carriers can be equalized. The SI is collected from the equalized sub-carriers and processed. The SI bits contain the configuration of demodulation and decoding, which is required by the following processing tasks and does not change frequently. Moreover, the CRC checksums in the SI bits can be used to check whether an error has occurred in the symbol. If several consecutive errors are found, the receiver stops processing OFDM symbols and switches back to the Acquisition and Synchronization mode. The remaining sub-carriers are the data sub-carriers for the SDI and the MSD. The demapping task uses the modulation method (QPSK/16QAM/64QAM) given by the SI. All of the tasks introduced above require a buffer of one OFDM symbol in size, which is passed along the task flow.

Figure 2.6: Pre-processing tasks in Demodulation and Decoding mode

Figure 2.7: Tasks in OFDM symbol processing

• The SDI processing includes the de-interleaving task, the Viterbi decoding task and the SDI descrambling task, shown in Fig. 2.8. The de-interleaving task cannot finish until one logic frame (4 sub-frames) has been received: all processed OFDM symbols accumulate in a buffer until their number equals one logic frame. The following tasks, Viterbi decoding and SDI descrambling, also work on one logic frame. A large buffer for the SDI log-likelihood ratios (LLRs) demapped from several OFDM symbols is required as the input of the de-interleaving task. An output buffer of the same size is also required, and it can be released only after the SDI descrambling has finished.

• The MSD processing includes tasks similar to those of the SDI processing; the difference lies in the decoding task. The MSD bits are decoded by the LDPC decoding task, which can be designed to run a different number of iterations on one coded frame; more iterations improve the error-correction capability of the system. The de-interleaving buffer is even larger than the SDI de-interleaving buffer, since there are more MSD bits than SDI bits in one logic frame. Two large buffers are required by the MSD processing tasks as input and output buffers.
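The accumulation of OFDM symbols before de-interleaving can be sketched as follows. This is a minimal illustration, not the receiver implementation: the class `LogicFrameCollector` is hypothetical, and the mapping of N = 56, 111 and 61 to transmission modes 1, 2 and 3 assumes that the values are listed in mode order.

```python
SUBFRAMES_PER_LOGIC_FRAME = 4

# OFDM symbols per sub-frame. Assumption: listed in the order of
# transmission modes 1, 2 and 3.
OFDM_SYMBOLS_PER_SUBFRAME = {1: 56, 2: 111, 3: 61}


def symbols_per_logic_frame(tx_mode: int) -> int:
    """Number of OFDM symbols that make up one logic frame."""
    return SUBFRAMES_PER_LOGIC_FRAME * OFDM_SYMBOLS_PER_SUBFRAME[tx_mode]


class LogicFrameCollector:
    """Buffers per-symbol data until one logic frame is complete (hypothetical)."""

    def __init__(self, tx_mode: int):
        self.needed = symbols_per_logic_frame(tx_mode)
        self.buffer = []

    def push(self, symbol_llrs):
        """Store one symbol; return the whole frame once complete, else None."""
        self.buffer.append(symbol_llrs)
        if len(self.buffer) < self.needed:
            return None
        frame, self.buffer = self.buffer, []
        return frame  # ready for de-interleaving


collector = LogicFrameCollector(tx_mode=1)
results = [collector.push(index) for index in range(symbols_per_logic_frame(1))]
print(symbols_per_logic_frame(1), results[-1] is not None)
```

The buffer is handed over only when the last symbol of the logic frame arrives, which is why the de-interleaving input buffer must hold a full logic frame.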

Besides the signal processing sub-system introduced above, the receiver contains the other sub-system, for de-multiplexing, as seen in Fig. 2.1. The de-multiplexing sub-system is introduced in Chapter 3.
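The switching between the two working modes of the baseband processing chain described in this section can be summarized as a small state machine. The sketch below is an illustration under assumptions: the step names, the class `ModeController` and the threshold `max_crc_errors` are hypothetical, since the text only states that "several consecutive errors" trigger the switch back.

```python
from enum import Enum, auto


class Mode(Enum):
    ACQUISITION = auto()   # Acquisition & Synchronization mode
    DEMODULATION = auto()  # Demodulation & Decoding mode


# Acquisition steps in the order given in the text (names are illustrative).
ACQUISITION_STEPS = (
    "detect_tx_mode",
    "detect_fm_mode",
    "correct_cfo",
    "find_beacon",
    "process_ofdm_symbol",
)


class ModeController:
    def __init__(self, max_crc_errors=3):
        # Hypothetical threshold for "several consecutive errors".
        self.max_crc_errors = max_crc_errors
        self.start_acquisition()

    def start_acquisition(self):
        """Enter acquisition, as on a channel switch or after lost sync."""
        self.mode = Mode.ACQUISITION
        self.next_step = 0
        self.crc_errors = 0

    def complete_step(self):
        """Finish the next acquisition step and return its name."""
        if self.mode is not Mode.ACQUISITION:
            raise RuntimeError("not in acquisition mode")
        name = ACQUISITION_STEPS[self.next_step]
        self.next_step += 1
        if self.next_step == len(ACQUISITION_STEPS):
            self.mode = Mode.DEMODULATION  # synchronization finished
        return name

    def report_si_crc(self, ok):
        """Count consecutive SI CRC failures; fall back when too many occur."""
        if self.mode is not Mode.DEMODULATION:
            return
        self.crc_errors = 0 if ok else self.crc_errors + 1
        if self.crc_errors >= self.max_crc_errors:
            self.start_acquisition()


rx = ModeController(max_crc_errors=3)
for _ in ACQUISITION_STEPS:
    rx.complete_step()
print(rx.mode)
```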


Figure 2.8: Tasks in SDI sub-carriers processing

2.4 SDR platform

In this project, the CDR receiver will be implemented on an existing SDR platform, which contains two digital signal processors from Tensilica: the BBE16 and the HiFi. The BBE16 offers a baseband engine with a SIMD architecture, while the HiFi is optimized for audio and video processing. They provide flexible functional blocks to enable SDR and a range of optimized instructions to meet high throughput requirements, for example FFT, complex multiplication and vector division. The details of the platform, including the features of the processors and the memory, are introduced in Chapter 4 as constraints for the architecture design.

2.5 The HAPI Simulator

To improve the quality and efficiency of the analysis of the architecture options, a data-flow simulator is required. HAPI supports a notion of time at low complexity and focuses on the analysis of temporal behavior and the simulation of scheduling policies. It is used to validate and evaluate the architecture design options.

2.5.1 Elements in HAPI simulator

HAPI is a timed discrete-event simulator built on top of the SystemC library [11]. It supports modeling shared resources (memory, processor) with tokens and actors, as well as modeling parallel applications (actors) executed on multiprocessor systems. Using the API of HAPI, a data-flow graph can be defined in C++, in which data-flow elements are instantiated and connected. Several arbitration policies are also supported by HAPI.

Figure 2.9: Example of the data-flow graph in HAPI [2]

1. Actor: represents a function that processes data values (tokens). An actor:

• is enabled when there is a predefined number of input tokens in the input channel and of spaces in the output channel.

• fires immediately, in a self-timed manner [11], and has a firing duration ρ, which is also called the execution time.

• has input and output ports, which are used to connect it to channels; the number of ports can be defined.

• consumes tokens from the channel at the moment it fires and produces tokens on the output channel after the time ρ.

• has quanta that define the number of tokens consumed and produced in each firing.

• can perform so-called auto-concurrent execution, which means that multiple instances of the same actor can execute at the same time.

• can model a task with functional behavior: when the actor fires, the functionality of the task can be executed by the actor in the model.

• The sensor task in HAPI can fire periodically with a constant execution time.

For example, in Fig. 2.9, task v0 has the firing duration ρ0, while v1 has the firing duration ρ1. The quanta of the two tasks are defined as 1, so the number of tokens consumed and produced in each firing is 1. Initially, two tokens are in the channel. Task v0, defined as the producer, is enabled by one token in the channel; after it produces one token, v1 is enabled. The self-edge of task v1 prevents auto-concurrent execution of v1: no new execution can start before the current one has finished. In contrast to task v1, task v0 can execute auto-concurrently, and the number of concurrent executions depends on the tokens in the channel.

2. Channel: Actors can only communicate through the channels connected to their ports. One channel connects two tasks: one task is the producer, the other is the consumer. The FIFO buffer in the channel is modeled by two unbounded queues. It is assumed to be initially empty, and the capacity of the buffer is modeled by the number of initial tokens. The actions that an actor can perform on the channel are:

• Acquire: The producer, such as v0 in the example, acquires space for the token in the channel before it fires, and the consumer v1 acquires tokens in the channel before firing.

• Release: The producer releases the token into the space that it acquired in the channel. The tokens acquired by the consumer are consumed, and the space for these tokens is released in the channel.

3. Processor: models a shared resource in HAPI. Multiple actors can be scheduled on one processor, and the cycle counts are shared among these tasks. For sharing the resource, HAPI provides several arbitration policies, including Time-Division Multiplex (TDM), non-preemptive Round-Robin (RR) and Fixed Priority Preemptive (FPP).

• With the TDM scheduling policy, the actors sharing the resource execute in their scheduled time slices, and the remaining budget of each actor can be tracked in the current interval.

• With the RR scheduling policy, all actors sharing the resource are executed in a predefined order; there is no preemption in this case.

• With the FPP scheduling policy, each actor has a priority for sharing the resource, and an actor can be executed when no other actor with a higher priority is enabled. A running task is preempted when a higher-priority task becomes enabled. If two actors with the same priority are enabled simultaneously, they have an equal probability of being executed first.


2.5.2 Simulation steps and features of HAPI

When HAPI is used to simulate a data-flow model, the steps are as follows:

1. Define the data-flow graph in C++. In HAPI, the actors, channels and processors are implemented as C++ classes, from which the data-flow elements are instantiated and connected. Then specify the resource shared among a set of actors and set the scheduling policy.
2. Compile the source code with the g++ compiler; an executable file containing the simulation model is generated.
3. Run the executable file; the simulation results, including the traces of events, are stored in a VCD file.
4. View the VCD file with the open-source GTKWave tool. The timing graph shows the moment at which each actor fires, the firing duration, preemptions and firing delays.

Based on the description above, the features of HAPI are:

1. HAPI supports the simulation of both functional and temporal behavior; functionality can be implemented in the actors.

2. The number of tokens can be defined by the quanta of the actor, so HAPI supports the simulation of multi-rate data-flow graphs. The data memory usage can only be expressed by the number of tokens in the channel.

3. The execution time of an actor can be a constant or generated by a random generator, and the firing rule can depend on the number of input tokens, so a non-sequential firing rule can be simulated [2].

4. HAPI supports simulating the dynamics in the data-flow model through auto-concurrent execution and the preemptive scheduling policy.

5. The HAPI simulation model is at a high abstraction level: there is no notion of registers and instructions, communication bus or network on chip. It is therefore not suitable for analyzing the increase in task execution time caused by contention on memory ports, which can occur on both read and write accesses of processors [11]. However, the high level of abstraction makes the simulation fast.

6. The simulation results from HAPI can be used to validate upper and lower bounds that are obtained under a specific scenario.

When the simulation tool HAPI is used to search for the optimal architecture options in industry, the system can be simulated with a data-flow model at an abstract level. The simulation results shown directly in the timing graphs concern processing load and scheduling; the analysis of memory usage requires some calculation based on the timing graph.

Chapter 3

Demultiplexer design

In a CDR receiver, the de-multiplexing subsystem extracts the audio and data services from one or more selected radio programs. In this project, the de-multiplexing subsystem is designed and implemented to complete the receiver system. This chapter introduces the parts of the CDR standard [7] that cover the de-multiplexer and the frame specification, in order to derive the functional requirements. The use cases for the demultiplexer are also defined. In addition, a flexible multiplexer is developed to produce test frames for the verification of the de-multiplexer.

3.1 The CDR Multiplexing standard

To provide multiple radio programs with audio streams and data services in one FM channel, the CDR standard specifies a multiplexing framework. The multiplexing sub-system in the transmitter is shown in Fig. 3.1.

Figure 3.1: CDR multiplexing sub-system in the transmitter


3.1.1 The multiplex frame specification

Three types of information are defined by the CDR standard: the Main Service Data (MSD), the Service Description Information (SDI) and the System Information (SI).

1. MSD contains all service data, including audio services, data services and emergency broadcasting information, which are defined as MSD bits and shown in the upper left block of Fig. 3.1.

2. SDI contains the configuration information about the multiplexing and about the channels in the currently used network and in neighbouring networks, as shown in the lower left block of Fig. 3.1.

3. SI contains the configuration information about the modulation and coding, which is required by the receiver physical layer to demodulate and decode the MSD and SDI sub-carriers.

In a transmitter, the MSD bits and SDI bits are processed by the multiplexer into the Service Multiplex Frame (SMF) and the Control Multiplex Frame (CMF) respectively, shown on the right of the multiplexing subsystem in Fig. 3.1. Each physical layer logic frame contains one CMF and one or more SMFs. A receiver de-multiplexer must first de-multiplex the CMF to obtain the service information before it de-multiplexes the SMFs into service streams.

Figure 3.2: Multiplex Frame structure

The structure of the multiplex frames is shown in Fig. 3.2.

1. SMF: It is defined to contain the MSD bits of at most 15 radio programs as payload in one frame; each program provides audio and data services simultaneously. One audio service may have at most 8 streams for multi-language broadcasting. A data service may contain at most 256 types of data. One SMF may contain several Multiplexed Sub-Frames (MSF), each of which carries one radio or data program, as shown in Fig. 3.3.

• The audio section in the MSF contains the audio data divided into several units; each unit can have a time stamp. If the audio section contains several streams for different languages, the stream number of each unit is kept in the audio section header.

Figure 3.3: Multiplex Sub-Frame structure


• The data section of the radio program may contain several types of data service in units, for example service guide, digital data service, system test and so on. The service type of each unit is also kept in the data section header.
• The emergency broadcasting is also carried by the SMF. It is transmitted in the SMF header if its size is small; otherwise it is placed in the first MSF.

2. CMF: It contains several service description tables as payload, as shown in Fig. 3.2, such as the Service Multiplex Configuration Table (SMCT) and the Network Information Table (NIT). For example, the SMCT provides the information needed to find the selected service in an SMF by its service ID, which identifies a radio program.

The padding in the SMF and CMF fills a logic frame up to the size set by the channel capacity, which is determined by the transmission mode, modulation and coding rate.

3.1.2 Frames specification in CDR

The CDR standard defines four types of frames in the physical layer, as shown in Fig. 3.4:

Figure 3.4: Frames in CDR standard

1. Logic Frame: A logic frame contains the multiplexed frames that will be broadcast in a physical frame.

2. Sub-Frames: Sub-Frame is the basic physical layer structure to transmit services. It starts with a preamble and contains a fixed number of OFDM symbols defined by the transmission modes.

3. Physical Frame: A physical frame is 640 ms long, the same length as a logic frame. It contains four sub-frames, taken from 1, 2 or 4 logic frames according to the frame permutation mode.

4. Super Frame: Contains 4 physical frames to perform frame permutation. The physical layer processing in a receiver needs alignment with a super frame to process frame permutation.

There are 3 permutation modes defined in the CDR standard. In the first mode the sub-frames of the logic frames are not reordered, so the logic frame is identical to the physical frame. The other 2 modes are shown in Fig. 3.5; they generate physical frames with sub-frames from 2 or 4 logic frames.

• Type 1: Sub-Frames are not re-allocated.

• Type 2: Sub-Frames are re-allocated across 2 logic frames.


Figure 3.5: Permutation types

• Type 3: Sub-Frames are re-allocated across 4 logic frames.

Based on the frame specification introduced above, the functionality of the CDR broadcasting system should be:

1. Broadcasting multiple radio programs simultaneously in one FM channel. One radio program may include multiple audio streams, multiple types of data, or both services.
2. Broadcasting one audio program in multiple languages simultaneously. One audio service may contain multiple streams for different languages.
3. Broadcasting data services in one radio program, including: user program guide, emergency broadcasting data, digital radio information data, system test, etc.

3.2 Functionality of de-multiplexer

In a CDR receiver system, the inputs of the de-multiplexing sub-system are SMFs and CMFs, and the outputs are the audio streams with description information such as time stamps, and the data services, as shown in Fig. 3.6.

Figure 3.6: The output of de-multiplexer


1. Demultiplexing one CMF requires:
• Memory to store at most 15 updates of the SMCTs and of the NITs respectively.
• After a CMF is de-multiplexed, a check of whether the SMCT in memory needs to be updated, based on the SMCT update ID in the CMF.
• For every CMF, a check of whether the NIT in memory needs to be updated, based on the NIT update ID in the CMF.

2. Demultiplexing one SMF requires:

Figure 3.7: Look up the service ID in SMCT in SMF demultiplexing

• Determine the service ID of each MSF (radio program). The SMCT update ID and SMF ID in the SMF header are used to look up the corresponding SMCT, as shown in Fig. 3.7.
• Audio stream recovery, see Fig. 3.6:
– Audio stream description (e.g. unit time stamp, unit length, ...)
– Payloads of audio units in the same stream are appended to each other
• Data service recovery, see Fig. 3.6:
– Data type description (e.g. data type, unit length, ...)
– Payloads of data units of the same type are appended to each other

Figure 3.8: CRC generator framework

3. CRC checks in the CDR standard:
• CRC checksums exist in all the headers of the SMF and CMF. When frames are demultiplexed, the content is checked for errors; if an error is found, the frame is dropped and the demultiplexer stops processing and waits for the next frame.


• The CRC8 and CRC32 generators are defined in the CDR standard:

$G_n(x) = x^n + g_{n-1} x^{n-1} + \dots + g_2 x^2 + g_1 x + 1$ (3.1)

The framework is shown in Fig. 3.8 [4].
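A bitwise CRC check built on a generator polynomial of the form (3.1) can be sketched as follows. The coefficients, the initial register value and the bit order used by the CDR standard are not given in this chapter, so the sketch assumes a zero initial register and MSB-first processing, and uses the common CRC-8 polynomial x^8 + x^2 + x + 1 (0x07) purely as an illustrative example. The function names are hypothetical.

```python
def crc_remainder(data: bytes, poly: int, width: int) -> int:
    """Remainder of data(x) * x^width divided by G(x).

    `poly` holds the coefficients g_{width-1} .. g_0 of the generator; the
    leading x^width term is implicit. Assumes zero initial register and
    MSB-first bit order (assumptions, not taken from the CDR standard).
    """
    mask = (1 << width) - 1
    reg = 0
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            feedback = ((reg >> (width - 1)) & 1) ^ bit
            reg = (reg << 1) & mask
            if feedback:
                reg ^= poly
    return reg


def header_is_valid(header: bytes, received_crc: int, poly: int, width: int) -> bool:
    """A frame is kept only if the recomputed CRC matches the received one."""
    return crc_remainder(header, poly, width) == received_crc


EXAMPLE_POLY = 0x07  # illustrative CRC-8 generator, not the CDR one
header = b"123456789"
crc = crc_remainder(header, EXAMPLE_POLY, 8)
print(hex(crc))
print(header_is_valid(header, crc, EXAMPLE_POLY, 8))
corrupted = bytes([header[0] ^ 0x10]) + header[1:]
print(header_is_valid(corrupted, crc, EXAMPLE_POLY, 8))  # single-bit error is detected
```

In a demultiplexer this check would run on every SMF and CMF header, and a failed check would cause the frame to be dropped as described above.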

3.2.1 Design of multiplexer with flexible API

The multiplexer designed to test the de-multiplexer should be able to produce two types of multiplex frames: SMF and CMF. The functions of the multiplexer are:

1. Multiplex one or multiple radio programs into one SMF; each MSF may include several audio streams and data services.
2. Multiplex service description tables into one CMF, including the SMCT and the NIT tables.

To decide the frame size for multiplexing, an initialization function is required before multiplexing:

1. The channel capacity shared by one SMF and one CMF is initialized according to the physical layer configuration, such as the Tx mode, spectrum mode and coding rate.
2. The SDI bits are initialized into table form before being multiplexed into the CMF.

The MSD in the SMF can be configured by the user through an API, which is designed to support different use cases. The flexible API is designed as follows:

1. The configuration parameters used to initialize the channel capacity can be modified with the API; reading them from a configuration file is also possible.
2. For multiplexing a CMF, the API provides interfaces to:
• update the SMCT by adding or removing service IDs for an existing SMF ID.
• reconfigure the transmission mode of the existing services.
• increase the update ID when the table has been updated.
3. For multiplexing an SMF, the API provides interfaces to:
• assign the proper amount of audio data to each MSF according to the compression rate of each audio stream in one radio program. Each MSF is required to contain 640 ms of audio data for one program. When the compression rate is reconfigured, the data size of the audio changes, and the channel capacity is checked to see whether the compression rate is applicable in the current transmission mode. If it is not applicable, an error message is returned and the transmission mode should be set to a higher level defined in the standard.
• set the start time of the audio in one service; the audio data is then divided into several units of equal length. The time stamp of each unit is calculated in the API and kept in the audio section header of the MSF.
• check the capacity for data services. The audio service is designed to have a higher priority than the data service. After the audio data has been prepared for the frame, any capacity left is used to transmit data services.
• assign the data services to the remaining channel capacity in a best-effort way. All the capacity left for data in one SMF is occupied by data services; no padding is needed when there is enough data to be sent.
• set the priority of the data. Three types of data service are defined in the CDR standard. The emergency data has the highest priority, followed by the digital broadcasting data; the system test data has the lowest priority. The data with the highest priority is sent first in a best-effort way.
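The capacity rules listed above (audio first, then data services by priority in a best-effort way, padding only when data runs out) can be illustrated with a small sketch. The function `fill_smf`, its parameters and the byte counts are hypothetical and are not the API of the multiplexer developed in this project.

```python
def fill_smf(capacity, audio_sizes, data_queues):
    """Distribute one SMF payload among audio and data services.

    capacity     -- payload bytes available in this SMF (illustrative value)
    audio_sizes  -- bytes of audio per radio program for one 640 ms frame
    data_queues  -- list of (name, pending_bytes), highest priority first
    Returns (bytes sent per data service, padding bytes).
    """
    audio_total = sum(audio_sizes)
    if audio_total > capacity:
        # Mirrors the error case in the text: the compression rate does not
        # fit the current transmission mode.
        raise ValueError("audio does not fit; a higher transmission mode is needed")
    left = capacity - audio_total
    sent = {}
    for name, pending in data_queues:
        take = min(pending, left)  # best effort: send as much as still fits
        sent[name] = take
        left -= take
    return sent, left


queues = [("emergency", 100), ("digital broadcasting", 500), ("system test", 50)]
sent, padding = fill_smf(1000, [400, 300], queues)
print(sent, padding)
```

In this example the emergency data is sent completely, the digital broadcasting data fills the rest of the frame, the system test data has to wait, and no padding is needed.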


With the API and multiplexer, the de-multiplexing subsystem can be tested with different multiplexed frames, which can include different radio programs with different audio data sizes and data services.

3.3 Verification of the Demultiplexer

The demultiplexer is verified in three ways:

1. The CMFs and SMFs generated by the multiplexer with different configurations are all processed correctly by the demultiplexer; the recovered audio and data services match the source files.

2. A data file generated by a test CDR system from the CDR work group is demultiplexed; the audio data can be played after DRA+ decoding.

3. The information bits extracted from the frame headers are analyzed and compared with the frame specification in the CDR standard; the content is also correct.

The verification shows that the functionality of the demultiplexer meets the requirements of the CDR standard and the use cases.

Chapter 4

Requirements and constraints

In embedded architecture design, the requirements and constraints must be specified first. The functional requirements for the three use cases have been derived in Chapter 2 and Chapter 3. In the first section of this chapter, the hardware features are analyzed to explore the constraints. The transmission modes and modulation levels defined by the CDR standard, which require different processing loads, are introduced in the second section. The profiling statistics of the functional modules are then shown in the third section.

4.1 Platform description

In the project, the automotive CDR receiver system is implemented on a platform with two digital signal processors from Tensilica. The baseband signal processing sub-system defined in the standard uses the ConnX BBE16 DSP, a high-performance DSP for use in LTE/4G modems and multi-standard broadcast receivers. The de-multiplexing sub-system is implemented on the HiFi DSP, also from Tensilica, which is configurable for audio, voice and speech processing. The audio decoding application also runs on this processor, sharing the resource with de-multiplexing. The features of the BBE16 DSP are [9]:

Figure 4.1: Block Diagram of a ConnX BBE16 Baseband processor [9]


1. 8-way single-instruction, multiple-data (SIMD) and 3-issue very long instruction word (VLIW) architecture, with a 10-stage pipeline, customized FIFO, port and lookup interfaces, and dual 128-bit load/store units, as shown in Fig. 4.1

2. OFDM and MIMO optimized instruction set

3. Supports a rich variety of complex arithmetic operations

4. Single-cycle radix-4 fast Fourier transform (FFT) butterfly, 4 complex-tap FIR, and 16 real-tap FIR operations

5. High-performance C/C++ compiler with automatic vectorization of scalar C and full support for vector data

With the features above, the BBE16 DSP offers high performance and low power consumption for a broad range of algorithms. Software development on the processor is based on C programming, supported by several signal processing libraries. A comprehensive instruction set simulator (ISS) allows the software and its execution performance to be developed and evaluated quickly. This processor is suitable for baseband signal processing.

The HiFi processor is used for de-multiplexing and audio decoding. The features of the HiFi processor are:

1. 32-bit ISA

2. 3-issue VLIW

3. Support for user-defined instructions

The HiFi DSP is highly optimized for audio, video and speech. It also provides high performance with low energy consumption, especially for audio decoding. It is configurable with numerous pre-defined functions and features, and designers can add custom instructions. Development on the HiFi is done completely in C [10]. The memories provided on the platform are shown in Table 4.1.

Table 4.1: Memory in the system

| Memory | Tightly Coupled Memory, TCM (KBytes) | Cache (KBytes) |
|---|---|---|
| BBE16 DRAM | 296 | |
| BBE16 IRAM | 80 | |
| HiFi DRAM | 256 | 16 |
| HiFi IRAM | 2 | 16 |
| SRAM | 2496 | |
| SROM | 384 | |

The BBE16 and HiFi DSPs have their own local memories for data and instructions, shown in Table 4.1. The SRAM and SROM are shared memories on the platform. The clock frequency of both the BBE16 and the HiFi cores is 250 MHz.

4.2 Specific configuration of CDR broadcasting

The CDR standard defines a broadcasting system based on OFDM technology; the sampling rate for OFDM symbols is 816 kHz. The symbol periods and sizes vary with the transmission mode and spectrum mode. These parameters determine the time budget for signal processing, the cycle counts of the baseband DSP modules and the memory requirements. The symbol and sub-frame parameters for the three transmission modes are shown in Table 4.2, where T = 1/fs = 1/816000 s.

Table 4.2: Configurations in transmission modes

| Parameters | Symbol | Tx Mode 1 | Tx Mode 2 | Tx Mode 3 |
|---|---|---|---|---|
| OFDM body (ms) | Tu | 2.51 (2048T) | 1.255 (1024T) | 2.51 (2048T) |
| Cyclic Prefix 1 (ms) | Tcp | 0.2941 (240T) | 0.1716 (140T) | 0.0686 (56T) |
| OFDM symbol period (ms) | Ts = Tu + Tcp | 2.804 (2288T) | 1.426 (1164T) | 2.5786 (2104T) |
| Cyclic Prefix 2 (ms) | TBcp | 0.4706 (384T) | 0.4069 (332T) | 0.2059 (168T) |
| Beacon (ms) | TB = TBcp + Tu | 2.9804 (2432T) | 1.6618 (1356T) | 2.7157 (2216T) |
| OFDM symbols per sub-frame | SN | 56 | 111 | 61 |
| Sub-frame period (ms) | Tsf | 160 (130560T) | 160 (130560T) | 160 (130560T) |

In the baseband signal processing subsystem, most of the tasks work on one OFDM symbol. From Table 4.2, one OFDM symbol contains 1164 to 2288 samples; each sample is a 16-bit integer, so one OFDM symbol requires a buffer of up to 4.576 KBytes. The processing load of the same task differs between transmission modes because the symbol sizes differ. From the de-interleaving task onwards, however, the tasks require one complete logic frame as input, so their buffers are significantly larger than those of the preceding tasks. The specific size depends on the demapping method and the number of sub-carriers. The number of sub-carriers varies with the transmission mode and spectrum mode, as shown in Table 4.3.
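The timing figures of Table 4.2 can be cross-checked with a short script. It is a sketch that only uses the sample counts from the table; the observation that one beacon plus SN OFDM symbols add up to exactly one sub-frame is read off from the numbers and assumes that the preamble of a sub-frame is the beacon. All names are chosen for illustration.

```python
FS = 816_000  # OFDM sampling rate in Hz, so T = 1 / FS

# (Tu, Tcp, TBcp, SN) in units of T, copied from Table 4.2
MODES = {
    1: (2048, 240, 384, 56),
    2: (1024, 140, 332, 111),
    3: (2048, 56, 168, 61),
}


def symbol_samples(mode):
    tu, tcp, _, _ = MODES[mode]
    return tu + tcp  # Ts = Tu + Tcp


def beacon_samples(mode):
    tu, _, tbcp, _ = MODES[mode]
    return tbcp + tu  # TB = TBcp + Tu


def subframe_samples(mode):
    # Assumption: a sub-frame is one beacon followed by SN OFDM symbols.
    sn = MODES[mode][3]
    return beacon_samples(mode) + sn * symbol_samples(mode)


def to_ms(samples):
    return 1000 * samples / FS


for mode in MODES:
    print(mode, symbol_samples(mode), subframe_samples(mode), to_ms(subframe_samples(mode)))

# Largest symbol buffer: 16-bit samples, i.e. 2 bytes each
print(max(symbol_samples(m) for m in MODES) * 2)
```

All three transmission modes give 130560 samples per sub-frame, i.e. 160 ms, and the largest symbol (Tx mode 1) needs 4576 bytes, which matches the 4.576 KBytes stated above.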

Table 4.3: The number of sub-carriers per OFDM symbol in one sub-band(upper+lower)

| Parameters | Symbol | Tx Mode 1 | Tx Mode 2 | Tx Mode 3 |
|---|---|---|---|---|
| Number of sub-carriers in spectrum mode 1/9/22 | Nv1 | 242 | 122 | 242 |
| Number of sub-carriers in spectrum mode 2/10/23 | Nv2 | 484 | 244 | 484 |

The sub-carrier matrix M includes all the sub-carriers in one logic frame. A logic frame contains four sub-frames, and the sub-carrier matrix for one sub-frame is expressed as

$M_{s,t}, \quad s = 1, 2, \dots, S_N, \quad t = 1, 2, \dots, N_v$ (4.1)

The number of sub-carriers in one logic frame can be calculated from Table 4.3 and Table 4.2 for the different transmission and spectrum modes:

$N_{vi,\mathrm{frame}} = 4 \cdot S_N \cdot N_{vi}$ (4.2)
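As a quick check, equation (4.2) can be evaluated with the values of SN from Table 4.2 and Nv1 from Table 4.3. The dictionary and function names below are chosen for illustration only.

```python
SN = {1: 56, 2: 111, 3: 61}      # OFDM symbols per sub-frame (Table 4.2)
NV1 = {1: 242, 2: 122, 3: 242}   # sub-carriers per symbol, spectrum mode 1/9/22 (Table 4.3)


def subcarriers_per_frame(tx_mode, wide_spectrum=False):
    """Equation (4.2): four sub-frames of SN symbols with Nv sub-carriers each.

    wide_spectrum=True selects spectrum mode 2/10/23, where Nv2 = 2 * Nv1.
    """
    nv = NV1[tx_mode] * (2 if wide_spectrum else 1)
    return 4 * SN[tx_mode] * nv


for mode in (1, 2, 3):
    print(mode, subcarriers_per_frame(mode), subcarriers_per_frame(mode, wide_spectrum=True))
```

For spectrum mode 1/9/22 this gives 54208, 54168 and 59048 sub-carriers per logic frame for Tx modes 1, 2 and 3, the values reported in Table 4.4.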

These sub-carriers are divided into four groups carrying SIs, scattered pilots, SDIs and MSDs, as shown in Table 4.4.

Table 4.4: The number of sub-carriers in logic frame

| Parameters | Symbol | Tx Mode 1 | Tx Mode 2 | Tx Mode 3 |
|---|---|---|---|---|
| Num of sub-carriers in spectrum mode 1/9/22 | Nv1frame | 54208 | 54168 | 59048 |
| Num of sub-carriers for MSDs per frame 1/9/22 | Nv1MSD | 46080 | 46080 | 50688 |
| Num of sub-carriers for SDIs per frame 1/9/22 | Nv1SDI | 1704 | 1576 | 1360 |
| Num of sub-carriers for SIs per frame 1/9/22 | Nv1SI | 108 | 108 | 108 |

In Table 4.4, the numbers of sub-carriers are calculated for spectrum mode (1/9/22). For the other spectrum type (2/10/23), the numbers are obtained by doubling the results in Table 4.4. Each sub-carrier is a complex number processed in the signal processing chain. The input data size of each task in the chain can be calculated from the analysis above. Several constant buffers in the receiver system are required to hold the reference symbols:

1. Beacon reference symbol: it is generated by repeating a sequence of complex numbers twice: 240 complex numbers in spectrum mode (1/9/22), 480 in spectrum mode (2/10/23). The IQ samples are 16-bit integers and each complex number contains two samples, so the maximum size of the beacon reference symbol is 480 ∗ 4 = 1920 Bytes.

2. Pilot pattern: 124 bits complex numbers, the maximum size is 124 ∗ 2/8 = 31Bytes in spectrum mode (2/10/23).

3. Pilot buffer: there are scattered pilots in every symbol, 22 pilots in Tx modes 1 and 3. Each pilot is a complex number consisting of two 16-bit integers, and the pattern repeats every 3 symbols. The maximum size, for spectrum mode (2/10/23), is 22 ∗ 4 ∗ 3 ∗ 2 = 528 Bytes.
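The three reference buffer sizes above can be reproduced with the following sketch. The factors are taken directly from the text; the interpretation attached to each factor in the comments is an assumption where the text does not spell it out, and the names are illustrative.

```python
BYTES_PER_COMPLEX = 4  # two 16-bit integers (I and Q)


def reference_buffer_bytes():
    """Worst-case sizes in bytes, for spectrum mode (2/10/23)."""
    return {
        # 480 complex numbers in the wide spectrum mode
        "beacon_reference": 480 * BYTES_PER_COMPLEX,
        # 124 * 2 / 8 as written in the text (bit-packed pattern)
        "pilot_pattern": 124 * 2 // 8,
        # 22 pilots * 4 bytes * 3 symbols * 2 (assumed: doubling for the wide spectrum mode)
        "pilot_buffer": 22 * BYTES_PER_COMPLEX * 3 * 2,
    }


sizes = reference_buffer_bytes()
print(sizes)
```

The results are 1920, 31 and 528 bytes, the same values that appear for the reference beacon, reference pilot and received pilot entries of Table 4.6.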

The soft bits produced by the demapping task are the input of the de-interleaving task. Three demapping schemes are supported in the standard: QPSK, 16QAM and 64QAM, with mapping efficiencies of 2, 4 and 6 bits per sub-carrier respectively. The resulting size of the de-interleaving input in spectrum mode (1/9/22) is shown in Table 4.5. The maximum buffer sizes should be:


Table 4.5: The Number of softbits per sub-band in spectrum mode (1/9/22)

| Parameters | Symbol | Tx Mode 1 (QPSK/16QAM/64QAM) | Tx Mode 2 (QPSK/16QAM/64QAM) | Tx Mode 3 (QPSK/16QAM/64QAM) |
|---|---|---|---|---|
| Num of softbits per OFDM symbol | Nsbitssymbol | 484/968/1452 | 244/488/732 | 484/968/1452 |
| Num of softbits per frame | Nsbitsframe | 108416/216832/325248 | 108336/216672/325008 | 118096/236192/354288 |
| Num of SDI softbits | NsbitsSDI | 3408/6816/10224 | 3152/6304/9456 | 2720/5440/8160 |
| Num of MSD softbits | NsbitsMSD | 92160/184320/276480 | 92160/184320/276480 | 101376/202752/304128 |

1. MSD de-interleaving buffers: the maximum number of soft bits occurs in transmission mode 3 with 64QAM, 304128 as shown in Table 4.5. For spectrum mode (2/10/23) it is 304128 ∗ 2 = 608256. Each soft bit is 4 bits wide and two buffers are required, for the input and the output of de-interleaving, so the total size is 608256/2 ∗ 2 = 608256 Bytes.

2. MSD output buffer, used for the output of the descrambling task: with code rate 3/4 in Tx mode 3 with 64QAM, the maximum size occurs in spectrum mode (2/10/23): 304128 ∗ 3/4 ∗ 2/8 = 57024 Bytes.

3. SDI de-interleaving input and output buffers: the maximum total size is 10224 ∗ 2/2 ∗ 2 = 20448 Bytes.

4. SDI output buffer: the maximum size is 1704 ∗ 6 ∗ 1/4 ∗ 2/8 ≈ 640 Bytes.

5. SI buffers: SI repeats 2 times in 216 sub-carriers, so one complete SI block contains at most 108 sub-carriers. It is modulated only with QPSK, so the buffer holds 108 ∗ 2 = 216 soft bits; the total size of the two buffers for SI is 216/2 ∗ 2 = 216 Bytes.

The LDPC decoding requires a scratch buffer of at most 100000 Bytes and a permanent buffer of 29696 Bytes. The DRA+ decoder runs on the HiFi; it requires an input buffer in SRAM, which holds the audio output of the de-multiplexing subsystem. The memory usage of the system in use case 1 (Single Radio) is shown in Table 4.6. Compared with the memory listed in Table 4.1, the DRAM of the BBE16 and the shared SRAM are sufficient. The DRAM of the HiFi will be used for Demux and DRA+ decoding; because the DRA+ decoding module is not yet implemented on the platform, its memory usage cannot be analyzed. In use case 3 (Dual Radios), two channels are received and the buffer size required in SRAM increases to 1560952 Bytes; the memory on the platform is sufficient to implement all three use cases. The instruction memory is also sufficient for all the modules in the system.

Memory usage is analyzed by manual calculation here, because memory analysis in the HAPI simulator is based on the number of tokens. Configuring the number of tokens produced or consumed by the actors in each firing requires a similar calculation. Moreover, in order to benefit from the data-flow analysis, the simulation model is built at an abstract level, where the numbers of tokens produced and consumed express the firing rates of the actors. The tokens therefore represent different data sizes in the model.
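The manual buffer calculation for the de-interleaving and output buffers listed above can be reproduced with a short script. It is a sketch under the assumptions stated in the text (4-bit soft bits, separate input and output buffers, doubled sub-carrier count in spectrum mode (2/10/23)); the function names are illustrative.

```python
from fractions import Fraction

SOFTBIT_WIDTH = 4  # bits per soft bit, as stated in the text


def softbits(subcarriers, bits_per_subcarrier, wide_spectrum=True):
    """Soft bits per logic frame; spectrum mode (2/10/23) doubles the sub-carriers."""
    return subcarriers * bits_per_subcarrier * (2 if wide_spectrum else 1)


def deinterleave_bytes(n_softbits, n_buffers=2):
    """Input plus output buffer, with two 4-bit soft bits packed per byte."""
    return n_softbits * SOFTBIT_WIDTH // 8 * n_buffers


def decoded_bytes(n_softbits, code_rate):
    """Information bits after decoding at the given code rate, eight per byte."""
    return Fraction(n_softbits) * code_rate / 8


msd = softbits(50688, 6)                    # Tx mode 3, 64QAM (Table 4.4)
sdi = softbits(1704, 6)                     # Tx mode 1, 64QAM (Table 4.4)
si = softbits(108, 2, wide_spectrum=False)  # one SI block, QPSK

print(deinterleave_bytes(msd))              # MSD de-interleaving buffers
print(decoded_bytes(msd, Fraction(3, 4)))   # MSD output buffer
print(deinterleave_bytes(sdi))              # SDI de-interleaving buffers
print(decoded_bytes(sdi, Fraction(1, 4)))   # SDI output buffer
print(deinterleave_bytes(si))               # SI buffers
```

The script returns 608256, 57024, 20448, 639 and 216 bytes; the SDI output buffer evaluates to exactly 639 bytes, which the text gives as 640.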


Table 4.6: Memory requirements in CDR receiver

| Task | BBE-DRAM (Byte) | SRAM (Byte) |
|---|---|---|
| Acquisition spectrum mode | 6144*2=12288 | |
| Reference Beacon | 1920 | |
| Reference Pilot | 31 | |
| Pilot received | 528 | |
| One OFDM symbol buffer | 488 | |
| Interpolation (Wiener filter) | 928 | |
| SI LLR | 216 | |
| SDI de-interleaving buffer | | 20448 |
| SDI output buffer | | 640 |
| MSD de-interleaving buffer | | 608256 |
| MSD output buffer | | 57024 |
| LDPC Decoding Permanent buffer | 29696 | |
| Demux data memory (tables) | | 300000 |
| Demux code size | | 800000 |
| Demux MSD output | | 57024 |
| 1 audio packet | | 7868 |
| DRA+ data memory | | 160000 |
| DRA+ audio output | | 9216 |
| DRA+ library | | 300000 |
| Total | 46095 | 1330476 |

4.3 Functional tasks profiling on the platform

Functional tasks are implemented as modules on the platform. A comprehensive instruction set simulator (ISS) provided by the manufacturer allows quick and accurate simulation of the cycle count of each implemented module. The baseband processing chain is analyzed here, because one processor is shared by the modules in the chain. The Demux will later share the HiFi with DRA+ and its cycle count is relatively low, so it is not analyzed here. For the current module implementation, the cycle counts of the modules are collected by three methods:

1. Part of the cycle counts for Tx mode 1 and spectrum mode 9 are collected directly from simulation results.

2. Several cycle counts are approximated from simulation results obtained with several input sizes that differ from the sizes defined by the CDR standard.

3. All the cycle counts in Tx mode 2 are estimated to be about half of the cycle counts in Tx mode 1, because the OFDM symbol size in Tx mode 2 is half of that in Tx mode 1. Tx mode 2 is not yet supported by the modules. Spectrum mode (2/10/23) is not supported yet either, so its cycle count is estimated to be double that in spectrum mode 9, because the number of sub-carriers is doubled. Spectrum mode (2/10/23) is only used in simulation for validation of the worst case.

The cycle counts for each module are:

1. The filter module: FM signal removal in hybrid spectrum mode, or a low-pass filter in digital-only mode. The cycle counts for other input sizes are approximated by the equation:

cyclecount = 3.75 ∗ inputsize + 43 (4.3)

For transmission mode 1: when inputsize = 2288, cyclecount = 8623
For transmission mode 2: when inputsize = 1164, cyclecount = 4408
For transmission mode 3: when inputsize = 2104, cyclecount = 7933

The numbers of samples are the same in the two spectrum modes, so the cycle counts are also the same in mode (1/9/22) and mode (2/10/23).

2. Beacon processing: the beacon is processed after the filter. Two tasks can be simulated directly here: the frequency tracking task and the time tracking task, with cycle counts of 6234 and 16745 respectively. These are measured in Tx mode 1 and spectrum mode 9; the cycle count in Tx mode 2 is estimated to be half of that in Tx mode 1.

3. OFDM symbol tracking task: the same as the frequency tracking task for the beacon (6234 cycles). In Tx mode 2, the cycle count is estimated to be 3117.

4. The Scattered Pilot (SP) processing task: it includes the modules for SP extraction and removal. The cycles simulated with Tx mode 1 in spectrum mode 9 are 1261 + 2522 = 3783. For mode 2, it is approximately 1399 cycles.

5. Channel estimation: it costs 660 cycles in simulation. For mode 2, it is approximately 330 cycles.

6. Equalization task: the cycle count for Tx mode 1 and spectrum mode 9 is 4898 cycles. An equation can be used for estimation:

cyclecount = 3.75 ∗ inputsize + 43 (4.4)

7. SI processing: includes SI combining, de-mapping and Viterbi decoding. In Tx mode 1 and spectrum 9, the cycle count is 5872 based on the equations for cycle count of QPSK demapping and simulation results for combining and Viterbi decoding. 8. De-mapping task: three equations are given for QPSK, 16QAM and 64QAM to calculate cycle counts with different input size.

QPSK: numberofcycles = 45 + inputsize + doSymSat ∗ (24 + 1.5 ∗ inputsize)
16QAM: numberofcycles = 52 + inputsize + doSymSat ∗ (24 + 3 ∗ inputsize)
64QAM: numberofcycles = 58 + inputsize + doSymSat ∗ (24 + 4.5 ∗ inputsize)

Here 'doSymSat' means that the output is kept inside a symmetrical range.

9. De-interleaving task: the simulation results are collected from the MSD de-interleaving task. The cycle count of the SDI de-interleaving task is estimated from these simulation results. For Tx mode 1, spectrum mode 9 and QPSK demapping, the MSD de-interleaving task costs 3700 cycles in simulation; 16QAM requires 3900 cycles and 64QAM requires 4500 cycles. A special case is that, although the symbol size in Tx mode 2 is half of that in Tx modes 1 and 3, the number of OFDM symbols in one logic frame is almost doubled in Tx mode 2, so the de-interleaving task requires a similar number of cycles for processing one logic frame in all Tx modes.


10. SDI Viterbi decoding: the simulation results are for decoding SI bits, which consumes 5771 cycles. The cycle counts for the SDIs can be estimated according to the input size. With different demapping methods, the cycle counts of SDI Viterbi decoding differ in each Tx mode.

11. LDPC decoding for MSD: the cycle count depends on the number of iterations used for LDPC decoding. At least about 4500000 cycles are consumed for one logic frame when 10 decoding iterations are used per frame. If an improved algorithm is used that decodes block by block within one logic frame, then for a block of 9216 bits one LDPC decoding iteration requires 48400, 54100, 50100 and 56000 cycles for coding rates 1/2, 1/3, 1/4 and 3/4 respectively.

12. The descrambling task: the cycle consumption can be calculated by the equation:

cyclecount = 0.0786 ∗ inputsize + 35.1134 (4.5)
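The linear cycle-count models given in this section can be collected in one small script, which makes it easy to evaluate them for other input sizes. The functions mirror equations (4.3), (4.5) and the three de-mapping equations; the function names, the example input size for de-mapping and the per-symbol cycle budget (derived here from the 250 MHz clock and the 816 kHz sampling rate) are illustrative assumptions.

```python
CLOCK_HZ = 250_000_000  # BBE16 clock frequency
FS = 816_000            # OFDM sampling rate in Hz


def filter_cycles(input_size):
    """Equation (4.3)."""
    return 3.75 * input_size + 43


def demap_cycles(input_size, modulation, do_sym_sat):
    """De-mapping equations; do_sym_sat enables the symmetric saturation term."""
    base, slope = {"QPSK": (45, 1.5), "16QAM": (52, 3.0), "64QAM": (58, 4.5)}[modulation]
    cycles = base + input_size
    if do_sym_sat:
        cycles += 24 + slope * input_size
    return cycles


def descramble_cycles(input_size):
    """Equation (4.5)."""
    return 0.0786 * input_size + 35.1134


def symbol_cycle_budget(symbol_samples):
    """Whole cycles available on one core during one OFDM symbol period."""
    return CLOCK_HZ * symbol_samples // FS


for samples in (2288, 1164, 2104):  # Tx modes 1, 2 and 3
    print(samples, filter_cycles(samples), symbol_cycle_budget(samples))

print(demap_cycles(242, "QPSK", do_sym_sat=True))  # example input size only
```

For the three symbol sizes the filter model reproduces the 8623, 4408 and 7933 cycles quoted for item 1, which are small compared with the roughly 700000 cycles available per symbol period in Tx mode 1.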

The cycle counts for each task in the signal processing chain have been collected by simulation and estimation. Differences may exist between the data collected now and those of the final modules implemented on the platform to support all Tx modes; however, the differences are not expected to be significant. All signal processing tasks share the cycles of one processor. In the background scanning and dual radio use cases, symbols from two channels need to be processed. The different processing loads in the different Tx modes and spectrum modes will be simulated with the HAPI simulator, and the optimal design for tasks sharing the processor will be derived from the results.

Chapter 5

CDR receiver data-flow modeling in HAPI

To simulate and analyze the CDR receiver system, Data-Flow (DF) based simulation models are built with HAPI. In this chapter, the DF model is built first to describe the receiver system at an abstract level. Then the simulation models for the three use cases are established with HAPI; the details of the simulation model are introduced in the second section.

Data-Flow (DF) modeling is a widely used analysis technique in system design and implementation, especially for DSP systems. Signal processing tasks are expressed by the actors in a DF graph, and tokens in the channels express the data stream processed by the tasks. Dynamic Data-Flow (DDF) can express dynamic system behavior well. However, because of the run-time behavior in DDF, it cannot generate analytical results for evaluation. In the CDR system, symbols are processed by tasks in a predefined sequence and no run-time dynamic behavior exists in the system, so Static Data-Flow is more suitable for the CDR receiver. Within Static Data-Flow, several types can be considered for our analysis:

1. Single Rate DF (SRDF): requires exactly one token to be consumed and produced on each channel in each firing. In the CDR system, from the de-interleaving task onwards, received signals are accumulated and processed frame by frame, which is not allowed in SRDF.
2. Multi Rate DF: allows a constant number of tokens to be produced and consumed in each firing, but the executions of a task must be identical in every firing. In the CDR receiver, the filter task executes differently depending on the received signal.
3. Cyclo-Static DF (CSDF): every actor has an execution sequence, a firing rule can be evaluated before each firing, and the number of tokens can differ according to the condition. Tasks with data-dependent or state-dependent conditionals can be described by CSDF.

According to the features of the different Data-Flow types, CSDF is the most suitable choice for expressing and analyzing the CDR receiver.

An abstract CSDF model is built based on the modules implemented on the platform for the CDR receiver; it is used to analyze the processing load and scheduling. The functionality of these modules has been simulated and verified in Matlab, so it is unnecessary to check the correctness of the algorithms in this model.

5.1 CSDF model

The CSDF model of the CDR receiver system is shown in Fig. 5.1. In every sub-frame, the beacon is the first symbol, so after firing a number of times in state S2 (e.g. 56 OFDM symbols in Txmode 1), actor Filter transitions to state S1 and produces a token to actor Beacon for time and frequency tracking.


Figure 5.1: CSDF model of the CDR receiver

When 4 sub-frames (1 logic frame) have been processed, all tokens produced by actor Demap for a complete logic frame are held in the channel (for example, 224 tokens in transmission mode 1), and the actors SDI Deint and MSD Deint fire once. The actors following the de-interleaving actors also work on one logic frame at a time. Most of the other actors fire once per OFDM symbol and process the symbols in a static order. All actors for digital signal processing are assigned to the BBE16 processor; the demultiplexing actors are on the HiFi processor, which is also used for DRA+ audio decoding.
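The cyclic behavior described in this section can be sketched in Python as a phase sequence for the Filter actor and a token counter in front of the de-interleaving actors. This is a simplified illustration, not the HAPI model: it assumes Txmode 1 (one beacon followed by 56 OFDM symbols per sub-frame), collapses the chain between Filter and Demap into a single token count, and uses hypothetical names throughout.

```python
from itertools import cycle, islice

DATA_SYMBOLS_PER_SUBFRAME = 56   # Txmode 1, as quoted in the text
SUBFRAMES_PER_LOGIC_FRAME = 4
TOKENS_PER_LOGIC_FRAME = DATA_SYMBOLS_PER_SUBFRAME * SUBFRAMES_PER_LOGIC_FRAME  # 224

# CSDF phase sequence of actor Filter over one sub-frame:
# (tokens produced to Beacon, tokens produced towards Demap).
# State S1 handles the beacon, state S2 handles the 56 OFDM symbols.
FILTER_PHASES = [(1, 0)] + [(0, 1)] * DATA_SYMBOLS_PER_SUBFRAME


def run_filter(n_firings):
    """Fire Filter n_firings times; count Beacon and de-interleaver firings."""
    beacon_firings = 0
    deint_firings = 0
    tokens_waiting = 0  # tokens held in the channel in front of the de-interleaver
    for to_beacon, to_demap in islice(cycle(FILTER_PHASES), n_firings):
        beacon_firings += to_beacon
        tokens_waiting += to_demap
        if tokens_waiting == TOKENS_PER_LOGIC_FRAME:
            # A complete logic frame is available: the de-interleaver fires once.
            deint_firings += 1
            tokens_waiting = 0
    return beacon_firings, deint_firings


if __name__ == "__main__":
    symbols_in_logic_frame = SUBFRAMES_PER_LOGIC_FRAME * (1 + DATA_SYMBOLS_PER_SUBFRAME)
    print(run_filter(symbols_in_logic_frame))
```

Over one logic frame of 228 source symbols, the sketch gives four beacon firings and one de-interleaver firing, which matches the firing ratio described above.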

5.2 Simulation model design

In the simulation model, actors are designed with state-dependent behavior, based on the conditions in the functional modules implemented on the platform. The CDR receiver system always starts in Acquisition & Synchronization mode, in which the transmission mode and the spectrum mode are detected first. In this mode, no token is produced to the following demodulation tasks. Once the channel is synchronized, the receiver switches to Demodulation & Decoding mode. The design of the actors in the simulation model is introduced below.

5.2.1 Actor Sync

In the simulation model, an actor Sync is designed to express the Acquisition & Synchronization mode. In state S1, the Sync actor consumes tokens but produces none; this state is for Txmode detection, as shown in Fig. 5.2.

When a new sub-frame arrives and the beacon is found, the actor goes into state S2 for FM detection, in which it consumes tokens with a longer Execution Time (ET) than in S1 and still produces no token to the channel. After the FM mode detection is finished, the arriving tokens are dropped directly until the start of the next logic frame, so state S3 has no execution time and no production. After S3, synchronization is finished and the receiver starts to work in Demodulation & Decoding mode: the actor Sync stays in state S4, consuming and producing one token (OFDM symbol) in each firing with a very small execution time. In the Demodulation & Decoding mode of the receiver, each OFDM symbol is processed by many tasks in a static order and each task fires periodically; one token expresses one OFDM symbol or the sub-carriers from one OFDM symbol. The firing period is equal to the OFDM symbol period.
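The four states of the Sync actor can be walked through with a small Python state machine that counts how many OFDM symbols are consumed before state S4 is reached. This is a sketch under stated assumptions: symbol IDs run from 0 to 56 with ID 0 as the beacon and sub-frame IDs from 1 to 4 (the ranges used for the single radio validation in Chapter 6), and the number of symbols spent in FM detection (`fm_detect_symbols`) is a hypothetical parameter that is not given in the text.

```python
SYMBOLS_PER_SUBFRAME = 57        # symbol IDs 0..56, ID 0 is the beacon (Txmode 1)
SUBFRAMES_PER_LOGIC_FRAME = 4    # sub-frame IDs 1..4


def symbols_before_demodulation(start_symbol, start_frame, fm_detect_symbols=10):
    """Count the symbols consumed in S1, S2 and S3 before Sync enters S4."""
    state = "S1"
    symbol, frame = start_symbol, start_frame
    fm_left = fm_detect_symbols
    consumed = 0
    while True:
        is_beacon = symbol == 0
        if state == "S1" and is_beacon:
            state = "S2"                      # beacon found: start FM detection
        if state == "S3" and is_beacon and frame == 1:
            return consumed                   # next logic frame starts: enter S4
        consumed += 1                         # the symbol is consumed, nothing is produced
        if state == "S2":
            fm_left -= 1
            if fm_left == 0:
                state = "S3"                  # detection done: drop symbols from now on
        # Advance to the next symbol, wrapping sub-frame and logic frame.
        symbol += 1
        if symbol == SYMBOLS_PER_SUBFRAME:
            symbol = 0
            frame = frame % SUBFRAMES_PER_LOGIC_FRAME + 1


if __name__ == "__main__":
    n = symbols_before_demodulation(start_symbol=50, start_frame=3)
    print(n, hex(n))
```

For a start at symbol ID 50 in sub-frame 3, the sketch consumes the 7 remaining symbols of sub-frame 3 and the 57 symbols of sub-frame 4, i.e. 64 symbols, which agrees with the 64 symbols reported in the validation of the single radio use case.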


Figure 5.2: FSM of actor Sync and Actor in HAPI

5.2.2 Actor Filter

Figure 5.3: FSM of actor Filter and Actor in HAPI

The actor Filter expresses the task that removes the FM parts from the received signal in hybrid spectrum modes, or a Low Pass Filter in digital-only modes. The actor in the model is configured with different execution times to express the differences between spectrum modes. Once the spectrum mode is detected, the Filter starts in state S1, consuming a beacon symbol and producing to the beacon processing actor. The actor then goes into state S2, consuming OFDM symbols and producing to the F tracking actor until the next beacon. The behavior is data-dependent once the spectrum mode is decided. The beacon symbol is used for frequency and time tracking, so the actor Beacon proc in the model only consumes a token when one is produced by the actor Filter; after execution, the actor Beacon proc produces nothing.

5.2.3 Actor Equalz

The actor Equalz supports two modes for different uses. In the first mode, it equalizes the sub-carriers based on the estimation result for the current symbol. In the second mode, it requires estimation results from three consecutive symbols as input and then produces equalized sub-carriers for the second symbol. The sub-carriers of the first symbol are then dropped, so the output is delayed by one symbol length in this mode. For an OFDM symbol in the system, the actor F tracking fires when a token appears in the channel and then produces the OFDM symbol to the actor for channel estimation. After the actor Channel esti, the actor Equalz fires.
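The timing difference between the two equalization modes can be sketched with a sliding window in Python. The sketch only tracks which symbol is equalized and which channel estimates are available at that moment; the equalization arithmetic itself is omitted, and the function name and interface are hypothetical.

```python
from collections import deque


def equalize_schedule(n_symbols, three_symbol_mode):
    """Yield (equalized_symbol_index, indices_of_estimates_used) per firing."""
    if not three_symbol_mode:
        # Mode 1: each symbol is equalized with its own channel estimate.
        for i in range(n_symbols):
            yield i, (i,)
        return
    # Mode 2: wait for three consecutive estimates, then equalize the middle
    # symbol. The oldest entry leaves the window on the next arrival.
    window = deque(maxlen=3)
    for i in range(n_symbols):
        window.append(i)
        if len(window) == 3:
            yield window[1], tuple(window)


if __name__ == "__main__":
    print(list(equalize_schedule(5, three_symbol_mode=False)))
    print(list(equalize_schedule(5, three_symbol_mode=True)))
```

In the second mode the output for symbol k only becomes available once symbol k+1 has arrived, which is the one-symbol delay mentioned above.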

5.2.4 Actor SI proc

After a token is produced by Equalz, the actor SI proc fires.


Figure 5.4: FSMs of two equalization modes for actor Equalz and actor in HAPI

The SI content is always compared and checked in the real system. If several consecutive errors occur, the receiver restarts from the Acquisition and Synchronization mode. In the design of the simulation model, the maximum number of errors is therefore assumed to be 5; when SI is processed successfully once, a token is produced to the channel connected to the actor Demap.

Figure 5.5: FSMs of actor SI proc and actor in HAPI

One token is then consumed and produced into the channel by actor Demap. In the real receiver system, however, the output is a number of soft bits whose size depends on the mapping method (QPSK, 16QAM or 64QAM).

Figure 5.6: Simulation model in HAPI

The simulation model shown in Fig. 5.6 is for use case 1, Single Radio. The number of tokens is related to the firing rate, and the numbers of token buffers in the channels are the minimum sizes required by the actors. The de-interleaving actor can fire only when the number of tokens produced by de-mapping equals the number of OFDM symbols in one logic frame under the current Tx mode. After the de-interleaving, the SDI bits are processed first; the Viterbi decoding and descrambling tasks also fire once per logic frame.


The token produced by actor SDI discrambling is then consumed by the SDI de-multiplexing actor, which is on the other processor. The token of MSD bits after de-interleaving is consumed by the actor MSD dec. To achieve good error-correction capability, the LDPC decoding runs for several iterations on one logic frame, so the execution time of actor MSD LDPC is long. It then produces one token, which here expresses one logic frame, to actor MSD discrbl. The actor MSD discrbl produces to the actor for MSD de-multiplexing, which is also on the other processor and is connected to the SDI de-multiplexing actor to show the data dependency between MSD de-multiplexing and SDI de-multiplexing: the CDR standard requires SDI de-multiplexing to execute before the MSD de-multiplexing task.

src is a sensor task in HAPI. Its firing period is the OFDM symbol period, its ET can be ignored and it is not mapped to any processor. The value of the token produced by src is the symbol ID, defined as a global variable with a value in [0, SN]; its initial value can be set before compiling the model. A different initial ID gives a different delay for the first output from the receiver, because one complete logic frame is required to produce the output. The symbol ID in each token is checked by the actors to make sure every symbol is processed.


Chapter 6

CDR receiver data-flow graphs & simulation

With the profiling statistics on cycle counts from Chapter 4, the simulation model in HAPI is configured. Simulations have been run on the graphs for different use cases and transmission configurations. The usage of the processor cycles is analyzed based on the simulation results, the issues of the design options are identified, and proposals for design options and solutions to the issues are made.

Figure 6.1: Simulation graph for use case 1

6.1 Single radio use case study

Simulation starts from the basic use case, single radio; the data-flow graph is shown in Fig. 6.1. Only one channel is received and processed in the system. The tasks execute in an iterative pattern, and the LDPC decoding algorithm consumes the most processor cycles.

Table 6.1: Configuration for Use Case 1 Single Radio

| Parameter | Value |
|---|---|
| Txmode | 1 |
| Spectrum mode | 9 |
| LDPC | 1/2 |
| Iteration | 10 |
| Symbol ID | 50 |
| Frame ID | 3 |

1. Configuration: The configuration includes the transmission mode, the spectrum mode, the first received symbol ID and the first received frame ID. The receiver is assumed to start from symbol ID = 50 (range 0 to 56) and frame ID = 3 (range 1 to 4). The LDPC code rate is 1/2 and the total number of iterations on one coded block is 10.
2. Validation of the model: The first received symbol is the 51st in the third sub-frame, as shown in Table 6.1. The synchronization mode will finish after 64 OFDM symbols produced by


Figure 6.2: Start Symbol ID for demodulation and decoding mode

the actor src, and the demodulation and decoding mode will start from the 65th OFDM symbol, as shown in Fig. 6.2. In the timing graph of Fig. 6.2, the executions of each actor are counted from 0 and labeled with hexadecimal numbers, so 0x40 denotes the 65th symbol produced by the source: 64 symbols are processed before the filter actor is executed for the first time in demodulation and decoding mode. The task flow is correct compared with the Matlab model.

Figure 6.3: Actors firing on every OFDM symbol

3. Results: For this configuration, the simulation results in Fig. 6.3 depict the actors from filter to de-mapping in one symbol period; the other actors do not fire until one logic frame has been received. With the markers set in the graph, the cycles consumed (marker A to C) can be compared with one symbol period (marker A to B): only about 6.6% of one period is used.

Figure 6.4: Delayed OFDM symbols by LDPC decoding

4. Issues: In Fig. 6.4, while the MSD LDPC fires between the two markers in the graph, 7 OFDM symbols produced by Src are not processed. Extra buffers are required for these delayed symbols, with a total size of 7 ∗ 4576 = 32032 Bytes. The size of the extra buffer increases with the delay, especially with higher-order modulation and code rates such as 64QAM and LDPC rate 3/4.
5. Proposal: In order to eliminate the delay caused by LDPC decoding and save memory, the LDPC decoding module should be split into iterations and share the cycles in each symbol period


with the symbol demodulation tasks, because only 6.6% of each period is occupied now. Then no delay exists for symbol demodulation, so the extra buffers can be reduced. The LDPC is thus improved to run iteratively: it keeps its state after one iteration and resumes in the next symbol period. The de-interleaving actor requires a buffer of logic-frame size in both its input channel and its output channel. On the input side, the buffer holds the LLRs from de-mapping; the output buffer holds the input for the LDPC decoding. To save memory, if the LDPC decoding actor can process one logic frame within the duration of one logic frame, then no extra buffer is required for the output of the de-interleaving actor.
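The buffer size quoted in the Issues item follows directly from the number of delayed symbols and the symbol size. A short Python sketch of this arithmetic is given below; the function name is hypothetical, and the 4576 Bytes per OFDM symbol is the value used in the text for this configuration.

```python
BYTES_PER_SYMBOL = 4576  # OFDM symbol size used in the text for this configuration


def extra_buffer_bytes(delayed_symbols, bytes_per_symbol=BYTES_PER_SYMBOL):
    """Memory needed to hold symbols that arrive while LDPC blocks the processor."""
    return delayed_symbols * bytes_per_symbol


if __name__ == "__main__":
    # 7 delayed symbols as observed in Fig. 6.4; larger delays grow linearly.
    for delayed in (0, 7, 14):
        print(delayed, extra_buffer_bytes(delayed))
```

With the iterative LDPC proposed above, the number of delayed symbols drops to zero and so does the extra buffer.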

6.1.1 Iterative LDPC decoding study

When the LDPC can execute iteration by iteration and keep its state to resume in the next OFDM period, the memory for buffering the symbols delayed by LDPC can be saved, while the throughput decreases depending on how many periods are shared by LDPC decoding.

1. Configuration: The configuration is the same as in Table 6.1. When running the iterative LDPC decoding, the arbitration policy is Round Robin, so the LDPC task is not preempted within one iteration. Between two iterations, the other tasks are checked to see whether they are ready to fire. Processing of a new OFDM symbol can be delayed, but no extra buffer is required: the delay is always less than the execution time of one LDPC iteration, so all the tasks before LDPC decoding can finish before the next symbol arrives.

Figure 6.5: Improved LDPC decoding algorithm

2. Results:

(a) The throughput of the signal processing chain with non-iterative LDPC is 1.5035 frame/sec, including the one-symbol delay of the equalization mode; this can be used as a maximum reference. The throughput required by the CDR standard is larger than 0.78125 frame/sec.
(b) With the improved LDPC decoding in the ideal case, 100% of the cycles are used in every OFDM symbol period until LDPC decoding has finished on one logic frame (marker A in the graph), and the number of iterations in each period is not fixed. The simulation result is shown in Fig. 6.5, where the symbol '+' means delay. The throughput of the signal processing chain is then 1.5024 frame/sec, including the one-symbol delay of the equalization mode. Only 3.67% of the cycle budget for the LDPC task, from the time when the LDPC is ready to execute until the end of the current logic frame, is used.


3. Proposal: Since the receiver will support different transmission modes and spectrum modes, simulation results should be collected in the ideal case for different configurations of Txmode and de-mapping method, as a maximal reference.

The ideal case results for different Txmodes and code rates are collected in the following simulation as the maximum references, the number of iterations on one block is still set to 10.
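The non-preemptive sharing of a symbol period between symbol demodulation and LDPC iterations in the ideal case can be illustrated with a small time-stepped Python sketch. It is not the HAPI model: time is normalized to one OFDM symbol period, the demodulation cost of 0.066 corresponds to the 6.6% measured above, and the iteration cost of 0.08 periods and the workload of 20 iterations are hypothetical values chosen only for the demonstration.

```python
def simulate(period, demod_time, iter_time, total_iters, n_symbols):
    """Run symbol demodulation and LDPC iterations on one processor.

    An LDPC iteration is never preempted; a waiting symbol is served at the
    next iteration boundary. Returns (max symbol delay, iterations left).
    """
    t = 0.0
    iters_left = total_iters
    next_symbol = 0
    max_delay = 0.0
    while next_symbol < n_symbols:
        arrival = next_symbol * period
        if arrival <= t:
            # A symbol is waiting: demodulate it before any further iteration.
            max_delay = max(max_delay, t - arrival)
            t += demod_time
            next_symbol += 1
        elif iters_left > 0:
            # Processor is free and no symbol is waiting: run one full iteration.
            t += iter_time
            iters_left -= 1
        else:
            # Nothing to do: idle until the next symbol arrives.
            t = arrival
    return max_delay, iters_left


if __name__ == "__main__":
    delay, left = simulate(period=1.0, demod_time=0.066, iter_time=0.08,
                           total_iters=20, n_symbols=10)
    print(f"max symbol delay = {delay:.3f} periods, iterations left = {left}")
```

In this sketch the largest symbol delay stays below the duration of one iteration, so each symbol is still demodulated before the next one arrives, which is the property the Round Robin configuration relies on.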

Table 6.2: Throughput and budget usage of iterative LDPC in spectrum 2/10/23

| Settings | Throughput | Cycle budget usage |
|---|---|---|
| Txmode 1, QPSK, LDPC 3/4, 30 iters | 1.2349 | 26.31% |
| Txmode 1, 64QAM, LDPC 3/4, 30 iters | 0.9486 | 73% |
| Txmode 2, 64QAM, LDPC 3/4, 30 iters | 0.8879 | 76.243% |
| Txmode 3, 64QAM, LDPC 3/4, 30 iters | 0.9094 | 71.813% |

1. Configuration: the configurations are shown in Table 6.2; the three Tx modes are set in spectrum mode 10 for the simulation. For Txmode 1, two de-mapping methods are tested: one (QPSK) requires the fewest cycles for the tasks after de-mapping, the other (64QAM) requires the most. The LDPC coding rate affects the cycle count of each iteration: coding rate 1/2 requires the fewest cycles per iteration, whereas coding rate 3/4 requires the most. The number of iterations on one coded block is set to 30, which achieves the best error-correction performance.
2. Results: The throughput of the subsystem with QPSK is high and only 26.31% of the cycle budget is used, so no issue exists with this configuration. According to Table 6.2, the highest budget usage, about 76%, and the lowest throughput, 0.8879 frame/sec, both occur for Txmode 2 with 64QAM and LDPC coding rate 3/4.
3. Proposal: According to the budget usage, with 64QAM the budget usage is over 50% in spectrum mode 2/10/23, much larger than the usage for QPSK. In use case 3, Dual Radios, the processing load will therefore be an issue for 64QAM in spectrum mode 2/10/23. To make the design feasible, the cycles in one period should not be used up completely: some budget should be kept for situations such as register updates or memory accesses between tasks. The number of iterations in one period should also be fixed in the design. For spectrum mode 10, the cycle counts are estimated from those in mode 9, with the number of sub-carriers and the number of LDPC blocks doubled in mode 10. Simulation of the optimal LDPC design for the different configurations is required, in which the number of iterations in one period and the total number of iterations for one block must be set.
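The reasoning in the Proposal item can be checked against the values of Table 6.2 with a short Python sketch. It rests on a simplifying assumption that is not stated in the text: in the Dual Radios use case both channels are taken to have the same configuration, so the budget usage simply doubles. The names are hypothetical.

```python
REQUIRED_THROUGHPUT = 0.78125  # frame/sec, lower bound quoted for the CDR standard

# (settings, throughput in frame/sec, cycle budget usage in %) from Table 6.2.
TABLE_6_2 = [
    ("Txmode 1, QPSK", 1.2349, 26.31),
    ("Txmode 1, 64QAM", 0.9486, 73.0),
    ("Txmode 2, 64QAM", 0.8879, 76.243),
    ("Txmode 3, 64QAM", 0.9094, 71.813),
]


def assess(rows):
    """Return {settings: (meets required throughput, fits two identical channels)}."""
    report = {}
    for name, throughput, budget_pct in rows:
        meets_throughput = throughput > REQUIRED_THROUGHPUT
        fits_dual = 2 * budget_pct <= 100.0  # assumption: two identical channels
        report[name] = (meets_throughput, fits_dual)
    return report


if __name__ == "__main__":
    for name, flags in assess(TABLE_6_2).items():
        print(name, flags)
```

Under this assumption only the QPSK row leaves room for a second channel, while all rows meet the required throughput for a single channel.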

6.1.2 Study on the number of LDPC iterations per OFDM period

This section simulates the optimal number of LDPC iterations on one coded block that avoids delay, and the number of iterations that can be done in one period, for different broadcasting modes. The configurations and results are shown below.

Transmission Mode 1

1. Configuration: Txmode 1 is tested first with all de-mapping methods, as shown in Table 6.3; the LDPC coding rates are also tested in the basic situation. The iteration


Table 6.3: Simulation results with optimal design of improved LDPC decoding in Txmode 1

| Tx settings | LDPC setting | Throughput | cycle usage per period (%) | cycle budget usage (%) |
|---|---|---|---|---|
| Txmode 1, QPSK | 1/2, 12/84 | 1.0843 | 90.94 | 44 |
| Txmode 1, QPSK | 1/3, 10/84 | 1.0222 | 85.14 | 52.7938 |
| Txmode 1, QPSK | 3/4, 10/84 | 1.0222 | 87.8934 | 52.8060 |
| Txmode 1, 16QAM | 3/4, 10/60 | 1.0192 | 91.3045 | 53.2495 |
| Txmode 1, 64QAM | 3/4, 10/40 | 1.0163 | 93.0403 | 53.6943 |

number in one period is set to 10 or 12, and the total number of iterations required on every coded block starts from 84, which is large enough for good error-correction capability.
2. Results: In Txmode 1, more than 85% of the cycles in a period are occupied, so the efficiency of resource usage is high. When the coding rate is changed from 1/2, the number of iterations in one period has to be reduced to avoid delay, which would require an extra buffer for the arriving symbol; 12 iterations per period is therefore not feasible when the code rate is 1/3 or 3/4. The cycle budget usage is kept around 50%, which leaves enough budget for the other subsystem.

Figure 6.6: Symbols demodulation delayed by the SDI decoding iterations

The total number of iterations for one coded block is affected by the number of blocks waiting for decoding. For QPSK, the total number of iterations on one block is kept at 84: fewer blocks are produced with this de-mapping method, so more iterations can be done for each block. For 16QAM, the number of blocks waiting for LDPC decoding increases, so the total number of iterations is decreased (to 60) to keep the budget usage around 50%. When the de-mapping method changes, the cycle usage in one period does not change dramatically, so the number of iterations in one period can be kept at 10. For 64QAM, the number of blocks for LDPC decoding is the largest of the three de-mapping methods, so the total number of iterations on one block is decreased further to 40. The cycle usage in one period increases to 93%.
3. Issues: In the worst case, the LDPC fires in the same period as the tasks for SDI de-interleaving, Viterbi decoding, descrambling and MSD de-interleaving. The planned number of LDPC iterations then cannot be finished within this period. As a result, delay appears from that period on and affects the following 6 symbol periods, labeled with 'X' in Figure 6.6.


4. Proposal: No extra buffer is required in this configuration except under the worst-case assumption; the symbols are processed by all the previous tasks in their own periods. The delay can be eliminated by making the LDPC decoding fire in the period after the one used for MSD de-interleaving; this choice can be implemented with the LDPC module on the platform. The number of iterations for LDPC decoding is not an issue in Txmode 1: 84 iterations is more than needed, and the necessary number is about 30. It is feasible to set the iteration number to 30 for all spectrum modes (including the worst-case spectrum mode 2/10/23) in Txmode 1 with all code rates and de-mapping methods.

Transmission Mode 2 and 3

The simulation results are shown in Table 6.4.

Table 6.4: Simulation results about optimal design of LDPC decoding in Txmode 2 and 3

| Tx settings | LDPC setting | Throughput | cycle usage per period (%) | cycle budget usage (%) |
|---|---|---|---|---|
| Txmode 2, 64QAM | 3/4, 4/32 | 1.0151 | 82.3154 | 54.0846 |
| SP mode 10 | 3/4, 4/28 | 0.9517 | 86.5783 | 95.2883 |
| Txmode 3, 64QAM | 3/4, 9/36 | 1.04834 | 92.6993 | 48.9360 |
| SP mode 10 | 3/4, 9/27 | 0.9270 | 94.1487 | 80.5287 |

1. Configuration: Txmode 2 and 3 are first simulated in spectrum mode 9. According to the results for Txmode 1, the cycle usage in one period and the time budget usage increase with the de-mapping order and the coding rate. The worst-case configuration is 64QAM with coding rate 3/4 in spectrum mode 10, but the cycle counts in spectrum mode 10 are all based on estimation, so the results in spectrum mode 10 are only used to validate the design option in the worst case.

2. Results: In Txmode 2, the symbol period defined by the CDR standard is only half of that in Txmode 1, so the number of iterations in one period is set to 4, which uses 82% of the cycles in one period. Since the execution time of one iteration is the same as in Txmode 1 and 3, one LDPC iteration requires at least 14% of the cycles in one period; with 5 iterations per period, more than 96% of the cycles would be used. To keep the design feasible, such a high cycle usage in one period should be avoided. In Txmode 3, the symbol period is about 92% of that in Txmode 1. Although the execution times of most tasks are a little smaller than in Txmode 1, the LDPC decoding has the same execution time for each iteration, so the number of iterations that can be done in one period is 9. For SP mode 10, the total number of LDPC decoding iterations is smaller than in the same configuration in SP mode 9, because the number of coded blocks is doubled.
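The choice of 4 iterations per period in Txmode 2 can be reproduced with a small Python calculation. The sketch takes the 82.3154% period usage for 4 iterations from Table 6.4 and the 14% per iteration quoted above; the 95% usage cap is a hypothetical design margin introduced only for this example, as are the function names.

```python
import math

ITERATION_PCT = 14.0            # share of one Txmode 2 period used by one LDPC iteration
USAGE_WITH_4_ITERS = 82.3154    # Table 6.4, Txmode 2, 64QAM, setting 4/32

# Share of the period taken by all tasks other than LDPC, derived from the two values above.
OTHER_TASKS_PCT = USAGE_WITH_4_ITERS - 4 * ITERATION_PCT


def period_usage(iterations):
    """Cycle usage of one period (in %) for a given number of LDPC iterations."""
    return OTHER_TASKS_PCT + iterations * ITERATION_PCT


def max_iterations(usage_cap_pct):
    """Largest number of iterations whose period usage stays within the cap."""
    return math.floor((usage_cap_pct - OTHER_TASKS_PCT) / ITERATION_PCT)


if __name__ == "__main__":
    print("usage with 5 iterations:", round(period_usage(5), 4))
    print("iterations allowed under a 95% cap:", max_iterations(95.0))
```

With these numbers a fifth iteration pushes the usage above 96%, so a cap of 95% leaves 4 iterations per period.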


Figure 6.7: Simulation graph for scanning SDIs

6.2 Background scanning use case study

Based on the analysis of the single radio use case, enough budget is left, and at the same time the total number of iterations for each block is very high. As a result, it is feasible to implement background scanning in the CDR receiver. The simulations start from the basic scanning choice, i.e. scanning only the SDIs, as shown in Fig. 6.7.

6.2.1 Study of the design for scanning the SDIs

Table 6.5: Simulation results with optimal design for SDI scanning in Txmode 1

| Tx settings | LDPC setting | cycle usage per period (%) | Throughput | cycle budget usage (%) |
|---|---|---|---|---|
| Txmode 1, QPSK, 1/2 | 9/81 | 81.417 | 1.1155 | 39.7909 |
| scanned: Txmode 1, 64QAM | | 81.417 | 1.4993 | 3.6297 |
| Txmode 1, 16QAM, 3/4; scanned: Txmode 1, 64QAM | 9/54 | 93.6817 | 1.013402 | 54.1177 |
| Txmode 1, 16QAM, 3/4; scanned: Txmode 2, 64QAM | 9/54 | 96.6931 (102.884) | 0.9618 | 62.4637 |
| Txmode 1, 16QAM, 3/4; scanned: Txmode 3, 64QAM | 9/54 | 84.8271 | 0.9617 | 62.4753 |
| Txmode 1, 64QAM, 3/4; scanned: Txmode 1, 64QAM | 9/36 | 98.2271 (101.8503) | 1.01052 | 54.1902 |
| Txmode 1, 64QAM, 3/4; scanned: Txmode 2, 64QAM | 9/36 | 99.646 | 1.0135 | 45.8871 |
| Txmode 1, 64QAM, 3/4; scanned: Txmode 3, 64QAM | 9/36 | 94.6934 | 1.0135 | 51.1153 |

1. Configuration: The main channel is in Txmode 1, and background scanning is considered only for SDI bits, so no LDPC decoding is done for the scanned channel. The de-mapping method of the scanned channel is always set to 64QAM, the worst case with the highest cycle requirements, for all three Txmodes. All de-mapping methods for the main channel are simulated for the optimal design.
2. Results: For the basic configuration with QPSK in the main channel, the throughput and time budget usage of the scanned channel are quite good, as shown in the first row of Table


Figure 6.8: Period usage with 16QAM in main channel and Txmode 1 for scanning

6.5, because there is no LDPC decoding. The cycle usage per period is the same as for the main channel, because the usage is measured over the tasks of both channels. Two of the cycle usages per period in the table are above 100% (the values in parentheses); the usage in most periods is below 100%, as shown outside the parentheses. The overuse of a period happens when 2 scanned symbols fall in one period of the main channel. Before the second one arrives, the processor is idle (between the two markers C and D); in this case, processing of the next symbol in the main channel is delayed. The delay is less than 2% of the following period and can be eliminated within one period. The cycle usage in one period is shown in Fig. 6.8, while the delay caused by the overuse is shown in Fig. 6.9. Spectrum mode 10 is supported by reducing the number of iterations compared with the choice for use case 1, and the cycle budget is kept at about 40% in the table for spectrum mode 10.

3. Issues: Some extra buffers are required to hold the arrived OFDM symbols. When the Viterbi decoding fires in the same period as the LDPC decoding, the delay spans several periods. From the simulation results, with 16QAM in the main channel and Txmode 1 in the scanned channel, 3 buffers of one OFDM symbol each are required for each of the two channels. If the scanned channel is in Txmode 2, 2 buffers are required for the main channel and 3 for the scanned channel. If the scanned channel is in Txmode 3, one buffer is required for each channel. With 64QAM in the main channel, 3 buffers are required for each channel if the scanned channel is in Txmode 1. If the scanned channel is in Txmode 2, 1 buffer is required for the main channel and 3 for the scanned channel. If the scanned channel is in Txmode 3, as shown in Figure 6.10, 1 buffer is required for the main channel and 2 for the scanned channel. The extra buffers can be saved by not letting the LDPC decoding task fire in the same period as the Viterbi decoding.
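The buffer requirements listed in the Issues item can be collected in a small lookup table, from which the worst case over all simulated configurations follows directly. The Python sketch below only restates the numbers given in the text for Txmode 1 in the main channel; the data structure and function name are hypothetical.

```python
# (main channel modulation, scanned channel Txmode) ->
# (extra symbol buffers for the main channel, extra symbol buffers for the scanned channel)
EXTRA_BUFFERS = {
    ("16QAM", 1): (3, 3),
    ("16QAM", 2): (2, 3),
    ("16QAM", 3): (1, 1),
    ("64QAM", 1): (3, 3),
    ("64QAM", 2): (1, 3),
    ("64QAM", 3): (1, 2),
}


def worst_case_buffers(table):
    """Largest number of extra symbol buffers needed per channel."""
    main = max(m for m, _ in table.values())
    scanned = max(s for _, s in table.values())
    return main, scanned


if __name__ == "__main__":
    print(worst_case_buffers(EXTRA_BUFFERS))
```

A static allocation sized for the worst case would therefore reserve three symbol buffers per channel, unless the LDPC decoding is kept out of the period used by the Viterbi decoding as suggested above.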

When the main channel works in Txmode 2 or 3, the LDPC design options must also support all configurations in the scanned channel. The corresponding simulation results have also been collected and analyzed.


Figure 6.9: Period usage with 16QAM in main channel and Txmode 2 for scanning

Table 6.6: Simulation results with optimal design for SDI scanning in Txmode 2 & 3

| Tx settings | LDPC setting | cycle usage per period (%) | Throughput | cycle budget usage (%) |
|---|---|---|---|---|
| Txmode 2, 64QAM, 3/4; scanned: Txmode 1, 64QAM | 4/32 | 96.7490 | 1.0151 | 45.9249 |
| W.C Spectrum mode 10: Txmode 2, 64QAM, 3/4; scanned: Txmode 1, 64QAM | 4/28 | 98.9447 (112.1772) | 0.7938 | 97.1715 |
| Txmode 3, 64QAM, 3/4; scanned: Txmode 1, 64QAM | 7/35 | 87.1624 | 0.9348 | 67.1326 |
| Txmode 3, 64QAM, 3/4; scanned: Txmode 2, 64QAM | 7/35 | 88.3006 | 0.9327 | 67.5087 |

1. Configurations: The remaining two Txmodes in the main channel are simulated with the worst-case resource consumption in the scanned channel. When the main channel is in Txmode 2 (first row of Table 6.6), the symbol period is half of that in Txmode 1. The tasks consume the most cycles in Txmode 1 because of its longest OFDM symbol size, so the simulation is configured with Txmode 1, 64QAM in the scanned channel. Txmode 2 has a shorter period and a smaller OFDM symbol size, so when the scanned channel is in Txmode 2, in the worst case two scanned symbols fall in one period of the main channel and the period usage is at its worst.
2. Results: When the main channel is in Txmode 2, the highest cycle consumption is for 64QAM and code rate 3/4. Then at most one scanned symbol requires to be


Figure 6.10: Symbols delayed by the Viterbi decoding with 16QAM in main channel and Txmode2 in scanned channel

processed in a period of the main channel, and only 4 iterations can be done per period. Symbols delayed when the LDPC shares cycles with the Viterbi decoding and de-interleaving require buffers: at most 3 symbol buffers for each channel in all configurations. For example, in Figure 6.10, one extra buffer is required by the main channel and 2 extra buffers are required by the scanned channel.

3. Proposal: The LDPC decoding setting can be made adaptive in the design, according to the Txmode of the main channel. According to the simulation results, 9 iterations per period and 36 iterations in total on one block can be used with Txmode 1 in the main channel, 4/32 can be used in Txmode 2, and 7/35 in Txmode 3. With these settings the best decoding performance can be achieved. The LDPC decoding should be designed not to fire in the same period as the Viterbi decoding.
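The adaptive setting in the Proposal item amounts to a lookup keyed by the Txmode detected in the main channel. A minimal Python sketch is given below; the table holds the iterations-per-period and total-iterations values quoted above, while the function name and the derived number of symbol periods per block are illustrative additions.

```python
import math

# Txmode of the main channel -> (LDPC iterations per period, total iterations per block)
LDPC_SETTINGS = {
    1: (9, 36),
    2: (4, 32),
    3: (7, 35),
}


def ldpc_setting(main_txmode):
    """Return (iterations per period, total iterations, periods needed per block)."""
    per_period, total = LDPC_SETTINGS[main_txmode]
    periods_per_block = math.ceil(total / per_period)
    return per_period, total, periods_per_block


if __name__ == "__main__":
    for txmode in sorted(LDPC_SETTINGS):
        print(txmode, ldpc_setting(txmode))
```

At run-time the receiver would select the entry once the Txmode has been detected during acquisition and synchronization.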

6.2.2 Study for scanning part of the MSDs

Considering the frame specification in the CDR standard, if the remaining resources are used to process more bits from the scanned channel, the SDI bits and part of the MSD bits can be processed. More services can then be provided to the users, including the program description and information about the audio streams and data. The simulation graph is shown in Fig. 6.11.

1. Configurations: The main channel is simulated in all three Txmodes. For Txmode 1 (first three rows of Table 6.7), the simulation covers all demapping methods, and the scanned channel is in Txmode 2 only, because that is the worst case for cycle usage in one period. For Txmode 2 in the main channel, results are collected for 64QAM with the scanned channel configured in Txmode 1, which gives the highest cycle requirement for each task. For Txmode 3 in the main channel with Txmode 2 in the scanned channel, two scanned symbols arrive in each symbol period of the main channel. In all configurations, 10 coded blocks of the MSDs in the scanned channel are assumed to be processed successfully. These blocks contain at least 1/3 of the logic frame (more than 1 kB), so the headers of the SMF and of the first MSF (at most 255 bytes each) are certain to be processed, and all information about the audio program can be extracted from the scanned channel.

Figure 6.11: Simulation graph for scanning MSDs

Table 6.7: Simulation results with optimal design for scanning MSDs

| Tx settings | LDPC setting | Cycle usage per period (%) | Throughput | Cycle budget usage (%) |
|---|---|---|---|---|
| Txmode 1, QPSK, 1/2 | 6/60 | 79.3125 | 1.08179 | 44.3362 |
| scanned: Txmode 2, 64QAM, 10blks, 3/4 | 1/20 | 79.3125 | 1.0986 | 42.3385 |
| Txmode 1, 16QAM, 3/4 | 6/36 | 89.2438 | 1.0201 | 53.1224 |
| scanned: Txmode 2, 64QAM, 10blks, 3/4 | 1/20 | 89.2438 | 1.0986 | 42.3362 |
| Txmode 1, 64QAM, 3/4 | 6/30 | 98.662 (104.0147) | 0.937 | 66.7808 |
| scanned: Txmode 2, 64QAM, 10blks, 3/4 | 1/20 | 98.662 (104.0147) | 1.0986 | 42.3329 |
| Txmode 2, 64QAM, 3/4 | 3/30 | 96.8541 | 0.9341 | 67.48 |
| scanned: Txmode 1, 64QAM, 10blks, 3/4 | 1/15 | 96.8541 | 1.0986 | 61.3046 |
| Txmode 3, 64QAM, 3/4 | 5/30 | 90.5129 | 0.9006 | 73.6087 |
| scanned: Txmode 2, 64QAM, 10blks, 3/4 | 1/25 | 90.5129 | 1.0188 | 53.474 |

2. Results: To obtain good error-correction performance, the number of LDPC decoding iterations per block is around 30 in the first channel, and the number of LDPC decoding iterations is always designed to be above 10, which keeps the performance acceptable. With QPSK and 16QAM demapping and 10 blocks processed, the time budget can be kept around 50% and the cycle usage in one period stays under 97%, which means that the system can be designed for use case 3 (Dual Radio) with QPSK. The configuration with 16QAM can also support use case 3: the number of LDPC iterations can be set to 30, and the throughput remains larger than required by the CDR standard. However, 64QAM generates at least 30 blocks to be processed; judging from the budget usage for 10 blocks in the table, it is not feasible to implement use case 3 with 64QAM.

3. Issues: Processing of the arriving OFDM symbols is delayed in both channels. Three buffers can be allocated for each channel to hold the symbols when the Viterbi decoding and de-interleaving fire; otherwise the LDPC decoding should be designed not to fire within the same period as the Viterbi decoding.

4. Proposal: The platform and algorithms support the implementation of use case 3 with QPSK and 16QAM; 64QAM cannot be supported at present. When MSDs are processed in the scanned channel, a deinterleaving buffer is required for both channels, so memory usage increases considerably. Optimization of the memory required by the functional modules should be considered.
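The per-period statement in the results can be checked directly against the Txmode 1 rows of Table 6.7. The following minimal sketch only reproduces that comparison with the 97% figure quoted above; the names are hypothetical, and it does not model the additional block-count argument that rules out 64QAM for use case 3.

```python
# Txmode 1 main-channel rows of Table 6.7:
# (modulation, cycle usage per period in %). The parenthesised
# second figure given for 64QAM in the table is left out here.
TXMODE1_ROWS = [("QPSK", 79.3125), ("16QAM", 89.2438), ("64QAM", 98.662)]

# Per-period usage figure quoted in the results.
USAGE_LIMIT_PERCENT = 97.0


def modulations_under_limit(rows, limit=USAGE_LIMIT_PERCENT):
    """Return the modulations whose per-period cycle usage is below the limit."""
    return [modulation for modulation, usage in rows if usage < limit]


if __name__ == "__main__":
    print(modulations_under_limit(TXMODE1_ROWS))
```

With the table values, only QPSK and 16QAM stay below the limit, which matches the modulations listed for use case 3 in the proposal.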

Chapter 7

Conclusion

7.1 Conclusion

The China Digital Radio (CDR) standard, which operates in the FM band, will cover more than 500 cities in China, so CDR receiver chips are needed on the market. A CDR receiver architecture has been studied using CSDF data-flow modeling and simulation. CSDF models of the CDR receiving functions have been designed and implemented with the HAPI library, based on the CDR processing algorithms and their measured processing load and resource requirements on a real SDR platform. Several data-flow graphs have been built and simulated with HAPI to study the system requirements and optimizations for several CDR receiver use cases. The demultiplexing subsystem has been designed and implemented to complete the CDR receiver system. The simulation results have shown:

1. With limited effort, the system requirements can be simulated quantitatively to check the processor load, the memory usage, related issues and their bottlenecks.

2. The models and simulations can easily be modified to find improved options and to check design changes or other use cases quantitatively.

For the three CDR receiver use cases on a given NXP SDR platform, the architecture design options have been evaluated and the results have shown:

1. The LDPC decoding iteration limits have been found for several CDR modes; these limits will be used in a real CDR receiver implementation.

2. The platform can achieve very good reception quality with 30 LDPC decoding iterations for the Single Radio use case, although an additional OFDM symbol buffer is required if the Viterbi decoding and LDPC decoding tasks are allowed to run in the same OFDM period.

3. The platform has sufficient processing power for listening to a single radio channel while scanning the Service Description Information of another channel in the background, in all CDR modes.

4. The platform can support simultaneous listening to two CDR channels modulated in QPSK and 16QAM, but not in 64QAM.

7.2 Future work

The models and graphs can be further improved for CDR receiver product design, e.g. by using the measured resource usage of the CDR Txmode 2 processing functions. The current simulation uses an estimated processing load.


The memory usage results can be improved. Currently the token size is defined as a single value. The simulation can be further improved by implementing more real algorithms in the HAPI actors in the data flow graph.

48 Mapping a China Digital Radio (CDR) receiver on a Software-Defined-Radio platform Bibliography

[1] Mansour Ahmadian, ZJ Nazari, N Nakhaee, and Z Kostic. Model based design and sdr. In DSPenabledRadio, 2005. The 2nd IEE/EURASIP Conference on (Ref. No. 2005/11086), pages 8–pp. IET, 2005.1 [2] Marco JG Bekooij. Introduction to the HAPI Timed Discrete-Event Simulator, volume 10. 2015. 11, 13 [3] Greet Bilsen, Marc Engels, Rudy Lauwereins, and Jean Peperstraete. Cycle-static dataflow. IEEE Transactions on signal processing, 44(2):397–408, 1996.2 [4] Matlab R2016b Communications System Toolbox Documentation. [error detection and cor- rection]. 19 [5] Inc. Wikimedia Foundation. [in-band on-channel — wikipedia, the free encyclopedia].5 [6] GY/T2681. [Digital Audio Broadcasting In FM Band–Part 1:Framing Structure, Channel Coding and Modulation For Digital Broadcasting channel]. Diss.2013.2,5,6 [7] GY/T2682. [Digital Audio Broadcasting In FM Band–Part2: Multiplexing]. Diss.2013.2, 14 [8] Matthias Ihmig and Andreas Herkersdorf. Flexible multi-standard multi-channel system ar- chitecture for software defined radio receiver. In Intelligent Transport Systems Telecommu- nications,(ITST), 2009 9th International Conference on, pages 598–603. IEEE, 2009.2 [9] Cadence Design System Inc. Tensilica connx bbe16 dsp for baseband processing. 21 [10] Cadence Design System Inc. Tensilica hifi dsp. 22 [11] Philip S Kurtin, Joost PHM Hausmans, and Marco JG Bekooij. Hapi: an event-driven simu- lator for real-time multiprocessor systems. In Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, pages 60–66. ACM, 2016. 11, 12, 13 [12] Edward A Lee and David G Messerschmitt. Synchronous data flow. Proceedings of the IEEE, 75(9):1235–1245, 1987.2 [13] Chung-Ching Shen, William Plishker, Hsiang-Huang Wu, and Shuvra S Bhattacharyya. A lightweight dataflow approach for design and implementation of sdr systems. 
In Proceedings of the Wireless Innovation Conference and Product Exposition, pages 640–645, 2010.2 [14] Firew Siyoum, Marc Geilen, Orlando Moreira, Rick Nas, and Henk Corporaal. Analyzing synchronous dataflow scenarios for dynamic software-defined radio applications. In System on Chip (SoC), 2011 International Symposium on, pages 14–21. IEEE, 2011.2 [15] Bart D Theelen, Marc CW Geilen, Twan Basten, Jeroen PM Voeten, Stefan Valentin Ghe- orghita, and Sander Stuijk. A scenario-aware data flow model for combined long-run average and worst-case performance analysis. In Proceedings of the Fourth ACM and IEEE Inter- national Conference on Formal Methods and Models for Co-Design, 2006. MEMOCODE’06. Proceedings., pages 185–194. IEEE Computer Society, 2006.2

Mapping a China Digital Radio (CDR) receiver on a Software-Defined-Radio platform 49 BIBLIOGRAPHY

[16] Yun Wang. [Software-defined radio receiver design and development for China Digital Radio]. Master’s thesis, Delft University of Technology, July 2015.2,7 [17] Wikipedia. Low-density parity-check code — wikipedia, the free encyclopedia, 2017. [Online; accessed 6-June-2017 ].7

[18] Yuli You. Dra audio coding standard. In Audio Coding, pages 255–291. Springer, 2010.5

50 Mapping a China Digital Radio (CDR) receiver on a Software-Defined-Radio platform