A comparative study of PDSP and FPGA design methodologies for DSP system design

A thesis submitted

to the Graduate School of the University of Cincinnati

in partial fulfillment of the requirements for the Degree of

Master of Science

in the Department of and Computing Systems

of the College of Engineering and Applied Science

By

Prasad Deodhar

Bachelor of Engineering ( Engineering)

University of Mumbai, India. 2009.

Thesis Advisor and Committee Chair: Dr. Carla Purdy

ABSTRACT

In today’s globally interconnected world, electronic devices and systems have proliferated throughout daily life, in industrial automation, military, aerospace, aviation, medical, consumer, multimedia and entertainment products. The common thread that binds all these devices is that they involve some kind of human-computer interface that helps the end user interact with and control the computational system within each device. Such a human-computer interface typically involves some kind of Digital Signal Processing (DSP) module whose specific task is to accept a real-world signal as input, convert it into a digital signal and process the digital signal by extracting useful information through transformation, analysis and synthesis, eventually delivering a result that can help in making a decision. Hence DSP serves as the “interface” between the analog domain of real life and the computational world of digital signals. The most widely used hardware platform for DSP system implementation is the Programmable Digital Signal Processor (PDSP).

PDSPs are general-purpose microprocessors designed for embedded DSP applications. They contain special architectures and instructions that support more efficient execution of computation-intensive DSP algorithms. However, rapid advancements in CMOS technology have widened the options available to a hardware engineer for DSP system implementation. The advent of Field Programmable Gate Arrays (FPGAs) with built-in hardware blocks such as DSP multiplier cores, hard and soft IP cores and high-level synthesis tools has given the PDSP a strong competitor. A multitude of factors such as development effort, design time, performance in terms of power consumption and speed, time-to-market, prototyping capabilities, design methodologies and architectural flexibility should be considered for DSP system implementation.

This thesis makes a comparative study of the two hardware platforms – PDSP and FPGA – in terms of design methodologies, architectures, design time and effort and impact of high-level synthesis tools. The objective is to help a DSP hardware engineer make an informed decision on the pros and cons of selecting a particular hardware platform.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my gratitude to God, for his blessings and support in every phase of my life.

I express my appreciation and thanks to my advisor Dr. Carla Purdy, Associate Professor, University of Cincinnati, for her motivation, constant support and guidance that helped me in my research work and in writing this thesis.

I would also like to sincerely thank Dr. J. Adam Wilson, currently serving as Assistant Professor of Neurology at Cincinnati Children's Hospital and Medical Center, without whose valuable assistance, I would not have been able to work on the hardware and software resources required for the preparation and completion of this study. I express my immense gratitude to Dr. Wilson for his support and guidance.

I would like to thank my parents and friends, who kept me motivated; without their moral support this thesis would not have been possible. They have always been a pillar of strength, care and inspiration for me.

I dedicate this thesis to my sister Swati, who has been the greatest support for me throughout my graduate education.

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENTS

LIST OF FIGURES

LIST OF TABLES

LIST OF ABBREVIATIONS

1. INTRODUCTION

1.1 Motivation

1.2 Research Overview

1.3 Thesis Outline

2. BACKGROUND STUDY

2.1 Introduction

2.2 Impact of VLSI Technology on DSP and Vice-versa

2.3 Evolution of DSP Hardware

2.3.1 Programmable Digital Signal Processors (PDSPs)

2.3.2 Field Programmable Gate Arrays (FPGAs)

2.4 VLSI - DSP: An Insight into the Education Perspective

2.4.1 Present State of VLSI - DSP Education

2.4.2 Suggested Improvements in VLSI - DSP Education

3. PROBLEM STATEMENT

3.1 Design Methodologies: PDSP and FPGA Design Flows

3.1.1 DSP System Development Flow Using PDSPs

3.1.2 DSP System Development Flow Using FPGAs

3.2 PDSP and FPGA Comparison: Summary of Past Case Studies

3.3 Problem Definition: Scope, Goals & Objectives

4. EXPERIMENTS

4.1 Introduction

4.2 Hardware and Software Overview

4.2.1 Hardware – PDSP: C5515 Evaluation Board

4.2.2 Software – PDSP: Code Composer Studio IDE

4.2.3 Hardware – FPGA: Altera DE1 Development Board

4.2.4 Software – FPGA

4.3 Experiments

4.4 Procedure for Implementation

4.5 Observations and Results

4.6 Comparison of the Results

5. CONCLUSIONS AND FUTURE WORK

5.1 Introduction

5.2 Conclusions: Template for Hardware Platform Selection

5.3 Future Trends: Co-Processors – A Hybrid Approach

5.4 Concluding Remarks

REFERENCES

LIST OF FIGURES

Figure 1: Evolution of DSP Hardware

Figure 2: PDSP Design Flow [19]

Figure 3: FPGA Based DSP design approaches

Figure 4: Model Based Design Framework Using Simulink [6]

Figure 5: FPGA Verification [6]

Figure 6: Model Based Design: Complete Design Flow [6]

Figure 7: Texas Instruments C5515 Evaluation Board [25]

Figure 8: TI TMS320 C5515 DSP Processor [26]

Figure 9: Altera DE1 Development Board [28]

Figure 10: Quartus II Flow [30]

Figure 11: Altera SOPC Builder Tool [33]

Figure 12: FPGA Development Tools at a Glance [33]

Figure 13: Experiment 1 Simulink Model

Figure 14: Experiment 1 Simulink Simulation

Figure 15: Experiment 1 Quartus II

Figure 16: Experiment 1 PDSP

Figure 17: Experiment 2 Simulink Model_1

Figure 18: Experiment 2 Simulink Model_2

Figure 19: Experiment 2 Simulink Simulation

Figure 20: Experiment 2 Quartus II

Figure 21: Experiment 2 PDSP

Figure 22: Experiment 3 Simulink Model_1

Figure 23: Experiment 3 Simulink Model_2

Figure 24: Experiment 3 Simulink Simulation

Figure 25: Experiment 3 Quartus

Figure 26: TI DSP and Xilinx FPGA as Co-processor [36]

Figure 27: TI DSP and Altera FPGA as Co-processor [40]

Figure 28: Heterogeneous platform-based design [37]

LIST OF TABLES

Table 1: Results for Case Study # 2 [21]

Table 2: Internal Memory [26]

Table 3: External Memory [26]

Table 4: Comparing FPGA and PDSP Implementation Results

Table 5: Template for Hardware Platform Selection_1

Table 6: Template for Hardware Platform Selection_2

LIST OF ABBREVIATIONS

DSP Digital Signal Processing

PDSP Programmable Digital Signal Processor

FPGA Field Programmable Gate Array

TI Texas Instruments

DFT Discrete Fourier Transform

FIR Finite Impulse Response

IIR Infinite Impulse Response

CCS Code Composer Studio

VLSI Very Large Scale Integration

HDL Hardware Description Language

VHDL VHSIC Hardware Description Language

CHAPTER 1: INTRODUCTION

1.1 MOTIVATION

The past decade has witnessed exponential growth in the field of embedded systems, especially in the entertainment, portable computing and mobile devices sectors. The increasing trend towards portable, power-efficient systems with high processing power has compelled engineers and scientists to develop innovative design methodologies that can fulfill these requirements and meet stringent system specifications. Many of these systems, such as digital cameras, portable media players and tablets, perform Digital Signal Processing (DSP) operations that are mathematically intensive.

The conventional method of developing DSP applications is by using a programmable digital signal processor (PDSP) for prototype design and implementation. This is primarily due to the shorter development time, lower power consumption and lower cost.

Due to progress in CMOS semiconductor technology, complex DSP algorithms, communication protocols, and applications are now feasible, which, in turn, increase the complexity of the systems and products. As the complexity increases, the system reliability is no longer solely defined by the hardware platform reliability but is also increasingly determined by hardware and software architecture, design and verification processes, and the level of design maintainability. [1]

DSP capabilities are becoming ubiquitous. More and more common devices require some kind of signal processing with a high data throughput. As DSP is integrated into more devices, time-to-market and the ability to make late design changes become important. The challenge before engineers is to find ways to achieve higher processing performance with less design effort so that time-to-market is short.

Programmable DSP processors perform their arithmetic operations in software. Software gives flexibility in design and allows late design changes, but it executes sequentially, whereas hardware can execute in a truly parallel way; a software implementation is therefore slower than dedicated hardware but has the advantage of being modifiable. The idea of putting the arithmetic operations in hardware has been around for a long time, but creating a custom ASIC requires a lot of time and effort up front, and the computing logic on the ASIC cannot be modified after it has been fabricated. This is where a field programmable gate array (FPGA) becomes an attractive solution, because it combines the strengths of hardware and software. Reconfigurable hardware such as an FPGA offers high performance and can consequently be significantly faster than a microprocessor. [2]

1.2 RESEARCH OVERVIEW

Real-time implementation of DSP systems requires the design of hardware that can match the application sample rate to the hardware processing rate (which is related to the clock rate and the implementation style). Once the hardware has matched the sample rate, there is no advantage in making the hardware any faster or larger. Thus, real-time does not always mean high-speed.

Real-time architectures are capable of processing samples as they are received from the signal source, as opposed to storing them in buffers for later processing as done in batch processing.

Furthermore, real-time architectures operate on an infinite time series (since the number of the samples of the signal source is so large that it can be considered infinite). The sample rate information alone cannot be used to choose the architecture. The complexity of algorithms is also an important consideration. The requirements for high data rate and increased algorithmic complexity in next-generation devices present a difficulty for meeting the power budgets.

Therefore, in designing next-generation systems, designers, system architects and circuit designers face the challenge of how to optimally utilize the benefits of technology scaling within a short development cycle.
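The distinction between real-time and batch processing drawn above can be illustrated with a minimal Python sketch. It is not taken from the thesis experiments: the moving-average filter, the function and class names, and the simplified model "sustainable sample rate = clock rate / cycles per sample" are all assumptions chosen for illustration.

```python
from collections import deque


def max_sample_rate_hz(clock_hz, cycles_per_sample):
    """Highest sample rate the hardware can sustain (simplified, assumed model)."""
    return clock_hz / cycles_per_sample


def meets_real_time(clock_hz, cycles_per_sample, sample_rate_hz):
    """True when the hardware processing rate matches or exceeds the sample rate."""
    return max_sample_rate_hz(clock_hz, cycles_per_sample) >= sample_rate_hz


class StreamingMovingAverage:
    """Real-time style: each sample is processed as soon as it arrives."""

    def __init__(self, window):
        self.window = window
        # Fixed-size history, zero initial state, independent of stream length.
        self.history = deque([0.0] * window, maxlen=window)
        self.total = 0.0

    def push(self, sample):
        self.total += sample - self.history[0]  # drop oldest, add newest
        self.history.append(sample)
        return self.total / self.window


def batch_moving_average(samples, window):
    """Batch style: the whole record is buffered first, then processed."""
    out = []
    for n in range(len(samples)):
        chunk = samples[max(0, n - window + 1):n + 1]
        out.append(sum(chunk) / window)
    return out
```

Both versions produce the same output, but the streaming version needs only a fixed amount of state regardless of how long the input runs, which is what allows it to operate on an effectively infinite time series.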

Programmable Digital Signal Processors (PDSPs) are a specialized form of microprocessor with an architecture optimized for the fast operational needs of DSP applications. A PDSP works well in signal processing applications because it is optimized to process signals efficiently, is relatively inexpensive, and has a well-defined development path and a fixed hardware configuration. Since the different types of digital signals require only one set of hardware, a DSP processor can be “mass manufactured” so that the hardware is constant for all chips and “functionality is defined through software” [3] [4].

The fundamental difference between a PDSP and a general-purpose microprocessor is the DSP processor’s hardware multiply-accumulate (MAC) block and specialized hardware accelerator blocks (co-processors) to facilitate faster computation of commonly found DSP functions [5].

The MAC operation is usually the performance bottleneck in most DSP applications, and DSP processors generally incorporate MAC blocks in their architecture to minimize this bottleneck. While adding more MAC units may increase PDSP throughput, the PDSP still falls behind in raw data processing power for certain data-intensive DSP functions such as Fourier transforms and digital filters. To overcome this hurdle, PDSPs have also incorporated specialized hardware cores, or “hardware accelerator” blocks, such as the FFT coprocessor, the Viterbi coprocessor and the enhanced filter coprocessor. While such coprocessor blocks provide high DSP throughput, they do not cater to all DSP applications. Most DSP applications cannot benefit from the DSP vendors' predefined and limited set of hardware accelerator blocks. Additionally, such hardware accelerator blocks are permanent, do not allow any customization for particular design needs, and can quickly become obsolete as standards evolve [3]. The DSP processor’s fixed hardware architecture is therefore not suitable for applications that require customized DSP function-based implementations.
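To make the role of the MAC operation concrete, the following minimal Python sketch implements a direct-form FIR filter and counts the multiply-accumulate operations it performs. The function name and the MAC counter are illustrative assumptions; a PDSP would execute each inner-loop step on its hardware MAC unit rather than in interpreted software.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k], zero initial state.

    Returns the filtered samples and the number of MAC operations executed.
    """
    taps = len(coeffs)
    delay_line = [0.0] * taps
    output = []
    mac_count = 0
    for x in samples:
        # Shift the newest sample into the delay line.
        delay_line = [x] + delay_line[:-1]
        acc = 0.0
        for k in range(taps):
            acc += coeffs[k] * delay_line[k]  # one multiply-accumulate (MAC)
            mac_count += 1
        output.append(acc)
    return output, mac_count
```

Each output sample costs one MAC per filter tap, so the MAC rate required is the number of taps multiplied by the sample rate. A processor with a single MAC unit works through these steps serially, while parallel hardware can in principle compute several taps at once.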

A Field Programmable Gate Array (FPGA) is a highly customizable chip used to implement logic functions. The FPGA is programmed using a hardware description language (HDL), which is used to program the connections for the individual gates in the FPGA. An FPGA design is usually limited by the number of gates available on the chip. As the number of available gates increases, more and more complex designs can be placed on the chip. The FPGA can also be limited by the time it takes a signal to travel from one gate to another as well as the time it takes to pass through a single gate. Advances in FPGA technology have increased the number of available gates while reducing the time it takes a signal to travel through and between the gates [5].

Field programmable gate arrays (FPGAs) offer adequate performance and logic capacity to implement a number of digital signal processing (DSP) algorithms effectively. But the most significant and path-breaking step that FPGAs are taking with regard to signal processing lies in the “software tools” used to create the designs. Software development and hardware synthesis tools are essential for creating a structural and functional design and for producing either an optimized hardware layout when working with an FPGA or an optimized software routine when working with a DSP.

The tools used in PDSP software design have been refined so that the software runs efficiently on a structure that offers only a limited degree of parallelism, whereas the tools used in FPGA design optimize the design to run in a highly parallel manner. The tools for the PDSP generally require only knowledge of the C language, while the FPGA tools require the user to know less common hardware description languages such as VHDL or Verilog. Tools like System Generator from Xilinx or DSP Builder from Altera have reduced the user’s need to know VHDL and Verilog by using a block-based representation for large pieces of hardware design code. Advances in the FPGA design tools, along with the performance increase the FPGA provides, have opened the door for signal processing designers to increasingly use FPGAs over the traditional DSPs [5].

Recent improvements in the simulation capabilities of high-level modeling tools have opened new possibilities. High-level design tool support for algorithm modeling and simulation enables a designer to use a single environment to create floating-point and fixed-point models, and to make decisions early in the design process. At the same time, FPGA vendors have expanded commercial IP offerings to incorporate higher-level DSP functions. Together, these technologies enable a new flow for data path design that includes design iterations at the system level [6] [7].
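As an illustration of what moving from a floating-point model to a fixed-point model involves, the following Python sketch quantizes values to the Q15 format (16-bit signed, 15 fractional bits) and computes a dot product on the quantized integers. The choice of Q15, the helper names and the rounding behaviour are assumptions made for this sketch; the modeling tools discussed above handle such conversions with their own settings.

```python
Q15_SCALE = 1 << 15  # 32768: one sign bit, 15 fractional bits


def to_q15(value):
    """Quantize a float in [-1, 1) to a signed 16-bit Q15 integer, saturating."""
    scaled = int(round(value * Q15_SCALE))  # Python rounds ties to even
    return max(-Q15_SCALE, min(Q15_SCALE - 1, scaled))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_SCALE


def q15_dot(a_q15, b_q15):
    """Dot product of Q15 vectors using a wide accumulator, result in Q15."""
    acc = 0
    for a, b in zip(a_q15, b_q15):
        acc += a * b  # each product is in Q30
    return acc >> 15  # rescale Q30 -> Q15 (truncating toward minus infinity)
```

Comparing `from_q15(q15_dot(...))` against the floating-point dot product of the same vectors shows the quantization error that a designer has to budget for when choosing word lengths early in the design process.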

Previously, FPGA-based DSP implementation required the combined efforts of a DSP engineer and a hardware engineer familiar with HDL or schematic-based design. Such a combined effort, which couples algorithmic modeling tools with the hardware architecture generation tools offered by FPGA vendors, follows a typical set of stages: construction of an ideal mathematical model, investigation of implementation effects, test-bench creation and hardware netlist generation. Traditional HDL design methods are then used to complete the design implementation [6].

The issue of mapping a “DSP algorithm to VLSI architecture” and then translating the optimal architecture to silicon can be formulated in many ways. Commonly, there is a required performance level, an acceptable budget of power and silicon area, and a desired speed (the sampling rate in this case). Since the performance of a DSP application is often dictated by a standard, low power or small area is commonly the distinguishing feature of a particular implementation. The overall design process can therefore be viewed as a constrained optimization problem, where the power and/or area are minimized under performance constraints [6] [8].

The problem to be solved is multifaceted: to find the right algorithm that optimally uses the underlying technology to achieve the desired data rate, while staying within the power and die size limits. To answer this, every candidate algorithm has to be mapped into an architecture that is optimal for a particular technology. The architecture choice strongly depends on the required throughput, but also on the underlying technology options, usually defined by the choice of supply and threshold voltages. To meet targets, design teams routinely settle on the first architecture and VLSI technology platform that meets the specifications. The ultimate choice depends on finding a design that attains a low-power implementation with a shorter design time [8].
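The constrained-optimization view described above can be sketched in a few lines of Python. The `Candidate` fields, the function names and the example figures are hypothetical and purely illustrative; they are not measurements from this thesis or from the cited references. The sketch contrasts settling on the first architecture that meets the specification with selecting the lowest-power architecture among all feasible ones.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One algorithm-to-architecture mapping (all fields hypothetical)."""
    name: str
    throughput_msps: float  # achievable sample rate, in Msamples/s
    power_mw: float
    area_mm2: float


def is_feasible(c, min_throughput_msps, max_area_mm2):
    """Performance and area constraints of the optimization problem."""
    return c.throughput_msps >= min_throughput_msps and c.area_mm2 <= max_area_mm2


def first_feasible(candidates, min_throughput_msps, max_area_mm2):
    """What a team under schedule pressure does: stop at the first design that works."""
    for c in candidates:
        if is_feasible(c, min_throughput_msps, max_area_mm2):
            return c
    return None


def lowest_power(candidates, min_throughput_msps, max_area_mm2):
    """Minimize power subject to the same constraints."""
    feasible = [c for c in candidates
                if is_feasible(c, min_throughput_msps, max_area_mm2)]
    return min(feasible, key=lambda c: c.power_mw) if feasible else None


# Illustrative values only, not measured data.
EXAMPLE_CANDIDATES = [
    Candidate("serial", 5.0, 100.0, 4.0),
    Candidate("two-way parallel", 12.0, 300.0, 6.0),
    Candidate("pipelined", 15.0, 180.0, 8.0),
    Candidate("fully parallel", 40.0, 150.0, 20.0),
]
```

With a 10 Msamples/s requirement and a 10 mm² area limit, `first_feasible` returns the two-way parallel design while `lowest_power` returns the pipelined one, which shows how stopping at the first working architecture can leave power savings unexplored.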

Until recently, the use of programmable digital signal processors was nearly universal, but with the requirements of many applications exceeding the processing capabilities of digital signal processors, the use of FPGAs is growing rapidly. Equally vital is the FPGA’s intrinsic advantage in ease of customization, rapid prototyping and reliability.

1.3 THESIS OUTLINE

The goal of this thesis is to investigate the core issue of VLSI-based design and implementation of DSP systems. It explores the design methodologies and architectures for PDSP-based and FPGA-based design and gauges the impact of high-level synthesis tools on DSP system development.

This thesis presents a comparative analysis of the PDSP and FPGA implementations with regard to their processing performance, together with a measure of development effort and an evaluation of the critical high-level design tools of each implementation platform. It will be seen that the FPGA is capable of implementing the same design as the PDSP with the advantage of achieving much higher computational speeds. Several case studies have been used as the test-bed implementation for the PDSP and FPGA platforms.

This thesis is broken down into four parts: problem statement (definition, scope, objectives and limitations), algorithm design, architectural implementation and analysis of results. This chapter (Chapter 1) discusses the motivation behind this thesis, gives an overview of the current state of research on this topic and concludes with the thesis outline.

Chapter 2 gives the overview and background study involved in preparing this thesis. Its emphasis is on acquainting the reader with how advancements in VLSI technology have influenced the hardware implementation of DSP systems. This is followed by a detailed study of the evolution of DSP hardware, focusing primarily on the two hardware platforms most commonly used for DSP system design: the Programmable Digital Signal Processor (PDSP) and the Field Programmable Gate Array (FPGA). The chapter ends with an insight into VLSI-DSP from an education perspective, looking at the current state of VLSI and DSP coursework and teaching methods in academia. It makes a strong case for including the DSP hardware aspect in teaching advanced VLSI or DSP to graduate students and suggests ways to improve teaching methods in terms of course emphasis, course structure and syllabus.

Chapter 3 builds on the conceptual foundation of Chapter 2 and examines the various design methodologies that can be used to implement DSP systems on different hardware platforms. It also presents a brief summary of past case studies that explicitly address the comparison of PDSPs and FPGAs in the development of DSP systems. These case studies acquaint the reader with the broader arguments for and against each hardware platform, along with the finer points encountered when debating the contentious question of the best available hardware platform for DSP system implementation. The chapter concludes with a formal problem statement that defines the problem at hand with its scope and limitations, states the goals and objectives of this research, and gives an overview of the planned experiments.

Chapter 4 deals with the set of experiments performed. It examines separately the issues of implementing standard DSP algorithms using Texas Instruments’ DSP chip and Altera’s FPGA, elaborating in detail the development flow used by each platform in the system design. It also lays out the observations and results obtained during implementation, which serve as the basis for the comparison of the two hardware platforms.

Chapter 5 of this thesis discusses the conclusions of this thesis and emerging trends for future work possible in this area.

CHAPTER 2: BACKGROUND STUDY

2.1 INTRODUCTION

The first chapter of this thesis provided an introduction to the broad area of VLSI-based design and implementation of digital signal processing systems. This chapter and the next lay the theoretical groundwork of principles, concepts, techniques and elementary information required for understanding VLSI-based DSP. This has been accomplished through a study of the currently available literature along with a summary of past case studies that support the main topic being discussed. These case studies explicitly make a comparative analysis and critique of the two most commonly used hardware platforms for DSP systems, namely the Programmable Digital Signal Processor (PDSP) and the Field Programmable Gate Array (FPGA).

This chapter opens with an overview of the impact of rapid advancements in VLSI and its influence on DSP systems followed by a detailed explanation of the evolution of DSP hardware platforms mentioned above: PDSP and FPGA.

Digital Signal Processing (DSP) is the link from the real world to the computing world. DSP is used in applications that encompass digital communication, multimedia systems, satellite systems, biomedical devices, image processing and consumer electronic appliances. These applications cover a broad spectrum of performance and cost requirements and hence require different sampling rates. [1]

Due to progress in CMOS semiconductor technology, realizing complex DSP algorithms and applications is now feasible. This has led to an escalation in the performance demands on these algorithms, resulting in more complex systems and products. As the complexity increases, system reliability is no longer defined exclusively by the reliability of the hardware platform but is increasingly determined by the associated hardware and software architecture, the development and verification processes, and the level of design maintainability. [1]

Real-time signal processing applications are transforming the electronics segment of the high-tech industry market. On average, every six months, markets are swamped with new products and technologies that are smarter, faster, smaller and more interconnected than ever. This has led to a huge demand for greater speed, effectiveness and portability in any new product that hits the market. The changing market dynamics continue to propel growth, and the pace of change is accelerating. [9]

For the semiconductor provider, that fast adoption means fast time-to-market along with a quick ramp-to-volume. This puts tremendous pressure on DSP / VLSI design engineers to meet these varied demands. Costs must be controlled and power consumption needs to be reduced while increasing performance and flexibility within the ambit of an increasingly complex development environment and a design cycle that is ever shrinking [9].

The consequent objective is to examine the benefits of hardware platforms and associated architectures for realizing specific DSP applications and to compare the design trade-offs between them. The criteria used to evaluate the options for selecting an implementation platform are time to market, cost of production, processing performance, development effort, power consumption and the flexibility with which a substantial number of features can be accommodated.

DSPs from traditional vendors such as Texas Instruments have been the primary choice for signal processing applications for many years in academia and industry. While they are still widely used for many applications today, the insatiable demand for multimedia systems that require higher performance and algorithmic complexity is fueling a rate of growth that Moore’s Law is hard pressed to keep up with. As a result, FPGAs have evolved into reconfigurable signal processors that warrant serious consideration for many of today's signal processing design challenges.

2.2 IMPACT OF VLSI TECHNOLOGY ON DSP AND VICE-VERSA

The advent of VLSI (Very Large Scale Integration) enabled solutions to complex and previously intractable engineering problems. Advancements in VLSI have played a major role in realizing today's remarkable electronic appliances. The capability to place billions of transistors in a small silicon area has revolutionized the consumer high-technology market, with products regularly appearing with increasing computational power, improved battery life and reduced physical size.

Digital Signal Processing is generally performed using “specialized programmable signal processors”. The capability of a signal processor is determined by its “hardware” and “software”. By “hardware” we mean the physical implementation, which includes both individual ICs and the system architecture. “Software” is the computational procedure, which includes both the mathematical functionality and the particular algorithm by which it is implemented [10].

Progress in signal processing capability is the product of progress in IC devices, architectures, algorithms and mathematics. Advances in VLSI technology can be used to examine the relative impact of improved IC technology and computing architectures, or more generally computing “hardware”, versus fast algorithms and new mathematical techniques (computing “software”) in advancing the capability of digital signal processing [10].

The capability of the hardware implementation is affected by the performance of the individual ICs that comprise the processor, memory and communication elements, as well as by the architecture that defines the overall organization of these elements. Since all signal processors are realized using VLSI technology, advancements in VLSI have inherently influenced signal processing system architecture in a number of important ways, a few of which are elaborated below [11]:

• High speed: As IC manufacturing technology evolves, the feature dimensions of transistors continue to shrink. Smaller transistors mean faster switching speeds and, hence, higher clock rates. Faster processing speed means that more demanding signal processing algorithms can now be implemented for real-time processing.

• Parallelism: Higher device density and larger chip area make it possible to pack millions of transistors on a single chip. This makes it feasible to exploit parallel processing to achieve an even higher throughput rate by processing multiple data streams concurrently. To fully exploit the benefit of parallel processing, however, the formulation of signal processing algorithms must be reexamined. Algorithm transformation techniques have also been developed to extract maximum parallelism from a given DSP algorithm formulation.

• Local communication: As device dimensions continue to decrease and chip area continues to increase, the cost of intercommunication becomes significant in terms of both chip real estate and transmission delay. Hence, pipelined operation with a local bus is preferred to broadcasting using global interconnection links. Compiler and code generation methods need to be updated to maximize the efficiency of pipelining.

• Low-power architecture: Smaller feature size makes it possible to reduce the operating voltage and thereby significantly reduce the power consumption of an IC chip. This trend makes it possible to develop digital signal processing systems on portable or handheld mobile devices.

On the other hand, the stringent performance requirements and the regular, deterministic formulation of signal processing applications have also profoundly influenced VLSI design methodology:

• High-level synthesis design methodology: The quest to streamline the process of translating a complex algorithm into a functional piece of silicon that meets stringent performance and cost constraints has led to significant progress in the areas of high-level synthesis, system compilation and optimal code generation. Ideas such as dataflow modeling, loop unrolling and software pipelining, which were originally developed for general-purpose computing systems, have enjoyed great success when applied to the synthesis of an application-specific signal processing system from a high-level behavioral description.

• Multimedia processing architecture: With the maturity and popularity of multimedia signal processing applications, general-purpose microprocessors have incorporated special-purpose architecture, such as the multimedia extension instruction set (e.g., MMX). Signal processors have also led the wave of novel architectural concepts such as the very long instruction word (VLIW) architecture. In fact, it is argued that incorporating multimedia features is the only way to sustain the exponential growth in performance through the next decade.
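Loop unrolling, mentioned in the list above, can be sketched as follows. This is a hand-written illustration in Python under the assumption of an unroll factor of four; in practice the transformation is applied by a compiler or a high-level synthesis tool, and Python itself gains no speed from it. The point is that the four partial sums are independent of one another, so parallel hardware could compute them concurrently.

```python
def dot_rolled(a, b):
    """Reference dot product: one multiply-accumulate per loop iteration."""
    acc = 0.0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc


def dot_unrolled4(a, b):
    """Same computation unrolled by four, with four independent accumulators."""
    n = len(a)
    limit = n - (n % 4)  # largest multiple of 4 not exceeding n
    acc0 = acc1 = acc2 = acc3 = 0.0
    i = 0
    while i < limit:
        # These four MACs have no data dependence on each other.
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
        i += 4
    acc = (acc0 + acc1) + (acc2 + acc3)
    for j in range(limit, n):  # remainder when n is not a multiple of 4
        acc += a[j] * b[j]
    return acc
```

Because the unrolled version adds the products in a different order, its floating-point result can differ from the rolled version in the last bits for general inputs; the two agree exactly when all intermediate sums are exactly representable.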

2.3 EVOLUTION OF DSP HARDWARE

This section discusses two of the most commonly used hardware platforms for DSP, namely the Programmable Digital Signal Processor (PDSP) and the Field Programmable Gate Array (FPGA). The scope of the discussion is restricted to these platforms because the main objective of this thesis is a comparative study of the two. Other types of hardware that can be utilized for DSP implementation are beyond the scope of this thesis.


Figure 1: Evolution of DSP Hardware

2.3.1 Programmable Digital Signal Processors (PDSPs)

Programmable digital signal processors (PDSPs) are general-purpose microprocessors designed

specifically for DSP applications where the architecture is optimized for repetitive, numerically

intensive tasks at high rates. They are designed mainly for embedded DSP applications.

PDSPs fall between general-purpose processors (GPPs) and custom-designed chips such as application-specific integrated circuits (ASICs). GPPs have the advantage of ease of programming and development. However, GPPs often suffer from unsatisfactory performance for DSP applications due to overheads incurred in both the architecture and the instruction set. ASICs, on the other hand, lack the flexibility of programming, and the time-to-market delay due to chip development is longer. [4]

Recently the border between DSP processors and general-purpose processors has been blurring, as general-purpose processors have acquired DSP features to support various multimedia applications. On the other hand, DSP processors, which used to be programmed in hand-written assembly, have nowadays incorporated features from general-purpose computers to support software development in high-level languages. [3]

PDSPs can be classified as General Purpose PDSPs and Application-Specific PDSPs, typically based on the architecture they support.

General Purpose PDSPs

This class of PDSPs is characterized by the following features [3] [12]:

 Data path: The actual signal processing operations in a processor are carried out in various

functional units, such as arithmetic logic units (ALUs) and multipliers, and the collection of these units is called the data path. To store intermediate results, the data path also contains accumulators and registers. The data path in DSP processors can be expected to be tailored for computations inherent in typical DSP algorithms. Multiplication is involved in one of the most characteristic operations in DSP, multiply-accumulate (MAC), and DSP performance is often even characterized in MAC/s. Therefore, a fast multiplier is an essential unit in a PDSP. In addition, an adder is used in the MAC operation, and these resources form a MAC unit. A processor may also contain parallel MAC units to further boost the performance on DSP applications. As in general-purpose processors, PDSPs contain an arithmetic logic unit (ALU) which performs the basic operations: addition, subtraction, increment, negate, and, or, not, etc. DSP applications typically have very high computational requirements in comparison to other types of computing tasks, since they often must execute DSP algorithms in real time on lengthy segments of signals sampled at 10-100 kHz or higher. Hence, DSP processors may also be enhanced with special function units to improve performance for a group of applications; these may include several independent execution units that are capable of operating in parallel. As a result, efficiency can be improved by adding application-specific functions to the data path of a general-purpose DSP processor, and significant savings can be obtained if a system is tailored for the application at hand.


 Data format support: DSP processors are divided as fixed-point and floating-point processors

based on the type of arithmetic units in the processor. The floating-point processors contain more

complex logic and, therefore, consume more power and are more expensive. However, the

floating-point processors are easier to program as the dynamic range in floating-point

representation is larger and there is no need to scale and optimize the signal levels during

intermediate computations. Furthermore, high-level languages provide floating-point data types,

whereas integers are the only fixed-point data types they support, although signal processing calls for

fractional data types in fixed-point arithmetic.

  Harvard architecture: A key feature of PDSPs is the adoption of a Harvard memory

architecture that contains separate program and data memory so as to allow simultaneous

instruction fetch and data access. This is different from the conventional Von Neumann

architecture, where program and data are stored in the same memory space. This implies that, in

Harvard architecture, while operands for current instruction are accessed, the next instruction can

already be fetched. This approach doubles the memory bandwidth when one-operand instructions

are used.

 Dedicated address generator: Intensive access to memory in DSP applications implies that

address computations are performed frequently. As the data path is utilized by signal processing

arithmetic, DSP processors often contain ALUs dedicated to address computations. The

dedicated address generator allows rapid access of data with complex data arrangement without

interfering with the pipelined execution of main ALUs (arithmetic and logic units). This is useful

for situations such as two-dimensional (2D) digital filtering and . Some

address generators may include bit-reversal address calculation to support the efficient

implementation of the fast Fourier transform (FFT), and circular buffer addressing for the implementation of infinite impulse

response (IIR) digital filters.

 High bandwidth memory and I/O controller: To allow low-cost, intensive input and output

demands of most signal processing applications, most PDSPs incorporate one or more

specialized serial or parallel I/O interfaces, and streamlined I/O handling mechanisms, such as

low-overhead interrupts and direct memory access (DMA), to allow data transfers to proceed

with little or no intervention from the processor's computational units. Several PDSPs have

built-in multichannel DMA and dedicated DMA buses to handle data I/O without interfering

with CPU operations. To maximize data I/O efficiency, some modern PDSPs even include a

dedicated video and audio codec (coder/decoder) as well as a high-speed serial/parallel

communication port.
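Several of the features listed above (the MAC-oriented data path, fixed-point fractional data, and circular buffer addressing) can be seen working together in a small software model. The sketch below is written in Python purely for illustration and assumes the common Q15 fractional format (16-bit values with 15 fractional bits). The names `to_q15` and `FirQ15` are hypothetical and do not correspond to any vendor library. A finite impulse response (FIR) filter is used because its inner loop is a plain sequence of MAC operations.

```python
Q15_ONE = 1 << 15  # scale factor of the assumed Q15 fractional format


def to_q15(value):
    """Convert a float in [-1, 1) to a saturated Q15 integer."""
    scaled = int(round(value * Q15_ONE))
    return max(-Q15_ONE, min(Q15_ONE - 1, scaled))


class FirQ15:
    """FIR filter modelled as a MAC loop over a circular delay line."""

    def __init__(self, coeffs_q15):
        self.coeffs = list(coeffs_q15)
        self.delay = [0] * len(self.coeffs)  # circular buffer of past inputs
        self.head = 0  # index at which the newest sample is written

    def step(self, sample_q15):
        n = len(self.coeffs)
        self.delay[self.head] = sample_q15
        acc = 0  # wide accumulator, standing in for a MAC unit accumulator
        idx = self.head
        for k in range(n):
            acc += self.coeffs[k] * self.delay[idx]  # multiply-accumulate
            idx = (idx - 1) % n  # modulo (circular) address update
        self.head = (self.head + 1) % n
        # Q15 * Q15 yields Q30: round, shift back to Q15, then saturate.
        result = (acc + (1 << 14)) >> 15
        return max(-Q15_ONE, min(Q15_ONE - 1, result))


# Usage: a 4-tap moving average fed with a constant input of 0.5.
fir = FirQ15([to_q15(0.25)] * 4)
outputs = [fir.step(to_q15(0.5)) for _ in range(5)]
```

On a PDSP, the modulo address update would typically be performed by the dedicated address generator in parallel with the MAC operation, whereas this model performs both in sequence. The explicit rescaling and saturation steps are the kind of bookkeeping that floating-point processors spare the programmer.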

Application-Specific Instruction Set – Digital Signal Processors (ASIP DSPs) [3]

An ASIP DSP is an application specific digital signal processor designed for an application

domain to accelerate its computationally heavy and most frequently used functions. It is used for applications that are intensive in iterative data manipulation, transformation, and matrix computation. The ASIP architecture is designed to implement the assembly instruction set with minimum hardware cost. The main difference between a general-purpose processor and an ASIP DSP is the application domain. A general-purpose processor is not designed for a specific application class; it is instead optimized for performance across a broad mix of applications.

The design focus of an ASIP is on specific performance and specific flexibility with low cost for

solving problems in a specific domain. A general-purpose microprocessor aims for the maximum

average performance instead of specific performance. Designers of general-purpose processors

take into consideration both the maximum performance and maximum flexibility. The instruction

set must be general enough to support general applications. The compiler should offer compilation for all programs and adapt to all programmers’ coding behaviors. ASIP designers, in contrast, have to think about applications and cost first. Usually the primary challenges for ASIP designers are silicon cost and power consumption.

Based on the specified function coverage, the goal of an ASIP design is to reach the highest performance per unit of silicon cost, per unit of power consumption, and per unit of design cost. The flexibility should be sufficient rather than ultimate, and the performance target is application-specific rather than the highest achievable. To minimize the silicon cost, an ASIP design usually aims at a custom performance requirement instead of the highest possible performance. Programs running on an ASIP may be relatively short and simple, with ultra-high coding efficiency, so requirements on tool quality, such as the quality of the code compiler, can be application-specific. For example, for radio baseband processing, a compiler may not be mandatory at all.

2.3.2 Field Programmable Gate Arrays (FPGAs)

A Field Programmable Gate Array (FPGA) is a software-configurable hardware device that contains an array of programmable logic cells interconnected by a matrix of programmable connections. Each cell can implement a simple logic function defined by a designer’s CAD tool.

The programmability of an FPGA is based on two key principles: the use of programmable functional blocks, and a programmable interconnect which allows multiple blocks to be connected to form more complex logical functions. [13]

The fundamental building block of the FPGA is the Look-Up Table (LUT). A LUT is a Read-Only Memory (ROM) which may be programmed to emulate a logic function by storing, at the memory location addressed by each input combination, the output which that combination should produce. Loading the data into each LUT on the chip is known as configuring the FPGA. By connecting LUTs together using the programmable interconnect, networks of LUTs can implement higher-dimensionality logic functions, that is, functions of more inputs than a single LUT can take. The resulting networks are then connected to the outside world via the programmable pins. [14]
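The LUT principle described above can be modelled in a few lines of software. The sketch below is an illustrative Python model, not a description of any vendor's LUT primitive. The helper names `program_lut`, `read_lut` and `add2` are assumptions introduced here. "Programming" a LUT fills a table with one output bit per input combination, and evaluating it uses the input bits as the address into that table. Chaining several 3-input LUTs then yields a 2-bit adder, a function of more inputs than any single LUT in the model accepts.

```python
from itertools import product


def program_lut(func, num_inputs):
    """Build the LUT contents: one stored output bit per input combination."""
    table = []
    for bits in product((0, 1), repeat=num_inputs):
        table.append(func(*bits) & 1)
    return table


def read_lut(table, *inputs):
    """Evaluate the LUT: the input bits form the address into the table."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return table[address]


# "Configuration": two 3-input LUTs holding the sum and carry of a full adder.
sum_lut = program_lut(lambda a, b, c: a ^ b ^ c, 3)
carry_lut = program_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)


def add2(a, b):
    """Add two 2-bit numbers with a small network of chained LUTs."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = read_lut(sum_lut, a0, b0, 0)
    c0 = read_lut(carry_lut, a0, b0, 0)  # carry routed to the next stage
    s1 = read_lut(sum_lut, a1, b1, c0)
    c1 = read_lut(carry_lut, a1, b1, c0)
    return (c1 << 2) | (s1 << 1) | s0
```

In this model the calls that pass `c0` from one LUT to the next play the role of the programmable interconnect, and the contents of `sum_lut` and `carry_lut` play the role of the configuration data.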

FPGA implementation styles [3]

  Dedicated-core based: In this style, the underlying FPGA logic resources are used to “host”

the DSP processing engine or core in its entirety. DSP algorithms can be modeled and simulated

or pre-existing DSP IP cores can be used to design the whole DSP processing system.

Generating a hardware core from an algorithmic model requires tight coupling between the model and the core, with a node in the model translating to a component in the circuit. The high-level modeling and synthesis tool most commonly used in this implementation style is MathWorks Simulink, along with FPGA vendor tools (Altera’s DSP Builder or Xilinx’s System Generator). Dedicated core development is accomplished through algorithm modeling and simulation in Simulink and the associated FPGA vendor tool. After the intended algorithm has been modeled, the FPGA vendor tool generates a VHDL or Verilog HDL description of the algorithmic model that can be used for further FPGA-based design. Since RTL generation for the algorithm is a complex task, the high-level synthesis tools play a crucial role in saving development time and effort through automatic HDL code generation. The logic can then be synthesized, placed and routed on the FPGA using standard FPGA tools. In this manner, a dedicated core for the desired DSP logic can be created, generated and synthesized on the target FPGA hardware. FPGA tools may also be capable of generating test benches along with the HDL code. Such a test bench can be used for RTL simulation and functional verification. A detailed development flow for creating such a dedicated core is explained in the next chapter.


 Co-processor based: In this style, available microprocessor architecture is coupled with a

custom-designed co-processor for application-specific DSP. FPGA vendors provide soft-core embedded processors such as Xilinx’s MicroBlaze and Altera’s Nios II, along with a vast library of memory and peripheral IP that can be used in conjunction with the designed DSP logic. FPGAs allow the soft-core processors to be extended with custom co-processors, which results in a robust platform for developing an application-specific Multi-Processor System on Chip (MPSoC). The chief advantage is the software programmability of multiple processors delivering high performance for domain-specific applications. High-level modeling and synthesis tools like MathWorks Simulink are employed for algorithm modeling and simulation. These tools work in tandem with FPGA vendor-specific tools like Altera’s DSP Builder and Quartus II and Xilinx’s System Generator to convert an algorithmic description of a DSP application into an RTL module. This RTL DSP module can be coupled with a soft-core processor provided by the FPGA vendor, along with memory and other peripherals, to generate a configuration termed a System on a Programmable Chip. In such a configuration, the soft-core processor handles general-purpose computing tasks, leaving the DSP core logic free to execute its core DSP functions. As this configuration consists of a dedicated hardware accelerator combined with a soft-core processor, it is termed a co-processor based configuration.

Advantages of FPGA based implementation of DSP

 Digital signal processing (DSP) algorithms have traditionally been implemented using

application-specific integrated circuits (ASICs) or programmable digital signal processors

(PDSPs). However, with the introduction of large-capacity FPGAs, there has been a shift towards reconfigurable computing for DSP. The key motivating factors for choosing an FPGA as the target platform for DSP applications are flexibility, real-time performance and cost. While software processors allow functional flexibility, the application’s real-time performance requirements, or the physical constraints placed on the embedded realization, for example in terms of size or power consumption, may be beyond what they can achieve. In such a situation, unless volumes are sufficiently high, the Non-Recurring Engineering (NRE) costs associated with creating customized ASICs are such that an ASIC may not be commercially viable.

 For such high performance, low volume DSP systems, the ability of FPGA to host custom

computing architectures tailored to the real-time requirements of the application and physical

requirements of the implementation, at relatively low cost, is a key advantage. The fine-grained

parallelism of FPGAs, coupled with the inherent parallelism found in many DSP functions,

has made reconfigurable computing a viable alternative that offers a compromise between the

performance of fixed-functionality hardware and the flexibility of software-programmable

devices [3].

 Modern FPGAs play host to a range of complex processing resources which can only be

effectively exploited by heterogeneous processing architectures composed of microprocessors

with custom co-processors, parallel software processors and dedicated hardware units. The

complexity of these architectures, coupled with the need for frequent regeneration of the

implementation with each new application makes FPGA system design a highly complex and

unique design problem. The key to success in this process is the ability of the designer to best

exploit the FPGA resources in a custom architecture, and the ability of design tools to quickly

and efficiently generate these architectures [3].

  FPGAs offer performance targets not achievable by DSP processors. However, to achieve the

high performance, FPGA-based designs come at a cost. Efficient utilization of the possibilities provided by modern programmable devices requires knowledge of hardware-specific design methods. Designing a DSP system targeted at FPGA devices is very different from designing it for DSP processors. Most algorithms in use were developed for software implementation. Such algorithms can be difficult to translate into hardware. Thus the efficiency of FPGA-based DSP is heavily dependent on the experience of the designer and the designer's ability to tailor the algorithm to an efficient hardware implementation. [DSP design for FPGA Arch]. FPGA implementation has several key advantages: time-to-market is short, upgrading to a new architecture is relatively easy, and low-volume production is cost-effective. [4]

2.4 VLSI - DSP: AN INSIGHT INTO THE EDUCATION PERSPECTIVE

While discussing the evolution of DSP hardware in terms of various options available to an engineer, this author as a part of the background study of this thesis, also feels the need to delve into the issue of the current state of education with respect to VLSI and DSP subjects at the undergraduate and graduate level. The motivation for this is the intention to make a solid case of incorporating the teaching of hardware implementation of DSP algorithms and applications in university curricula for electrical and students.

In this section, we shall review the current state of graduate –level DSP education / VLSI education and discuss ways and means that can be adopted to incorporate the hardware implementation aspect in DSP education. This can be visualized as an attempt to bring about a convergence of VLSI and designers with DSP algorithm engineers with the intention of creating a possibility of knowledge sharing and joint development of DSP projects that benefits from the capabilities of both DSP architects and VLSI engineers.

Digital signal processing (DSP) is an area of engineering that has developed rapidly over the past

30 years. DSP is omnipresent as it is used in many important areas, from multimedia and digital

communication to consumer electronic products, such as and MP3 players. DSP has enabled the user to clean up noisy signals, speed up the communication rate, and store more data, and provides many advantages over its analog counterpart. [15]

We are entering an era when it is insufficient to be just a DSP designer, just as it is insufficient to be only a hardware implementation expert. Instead, engineers must be aware of the interactions between design and implementation, especially in the area of digital signal processing. The key to unlocking the huge potential that lies at the convergence of VLSI and DSP is integrating algorithm design and implementation. This approach is centered on producing engineers who understand the intricacies and interaction of the DSP algorithm and its implementation in real products. The question, however, is to what extent the theory and development of the algorithm should be integrated with its implementation. [16]

2.4.1 Present state of VLSI - DSP education

With regard to digital VLSI education at the graduate level, teaching has generally been restricted to VLSI design, verification, testing & validation and electronic design automation courses, with rarely a thought given to the diverse fields that Integrated Circuits (ICs) touch upon, such as Systems-on-Chip, or to the application aspect of Application-Specific Integrated Circuits (ASICs). It has come to the notice of this author that somewhere in the maze of VLSI education at the graduate level, the importance of “application-specific” has been reduced and the “integrated circuit” aspect has taken precedence. In this process, the core essence of learning and understanding the design, verification & validation of ASICs has been lost. This situation calls for more targeted action to include the study of a range of “application domains” when studying ASIC design. Nowadays,

ASICs are ubiquitous, they can be found in every gadget, appliance, product, system that is expected to work in a wide array of domains ranging from military systems, aerospace &


aviation, automobile, industrial automation, medicine & healthcare, entertainment &

multimedia gadgets, consumer electronic appliances, mobile computing products,

& networking products. Therefore it makes it necessary for a hardware

engineer to understand intricacies like:

  What function is the ASIC being designed to perform?

  In what domain shall it be deployed?

 What is the external environment & how does it affect the functioning of the ASIC?

 Does the domain influence in any way the ASIC design-verification-testing-validation

lifecycle?

 Is the domain consideration necessary only at the architectural level and irrelevant at the

logic/circuit/physical design levels of a typical ASIC design cycle?

DSP courses offered at many universities do not normally include hardware implementation case

scenarios of signal processing algorithms. The scope of these courses is limited to understanding

and giving a working knowledge of commonly used DSP algorithms like Fourier transforms and

digital filtering. Hardware aspects are rarely included,

partly due to the high costs associated with purchasing PDSP or FPGA evaluation kits and

full-version software tools. A major reason for the non-inclusion of FPGA implementation is the fact

that students majoring in signal processing are often not familiar with the VHDL or Verilog

hardware description languages used for FPGA design. It is of benefit to signal processing students if applied DSP

courses are structured in such a way that a hardware implementation aspect is included as an

alternative or complement to DSP theory. [17]

The primary benefit of including the DSP hardware aspect is to help students understand the

process of algorithm-to-architecture transformation. Restricting the study only to DSP theory may hinder the realization of how a DSP algorithm can be mapped to a fixed processor architecture like a Texas Instruments PDSP chip, or to a flexible user-defined architecture that can be synthesized on an FPGA using on-board hard-IP cores like logic blocks, memories and multipliers, or soft-IP cores that are provided by FPGA vendors for a plethora of common DSP tasks. Hence, including such a DSP hardware aspect in a DSP theory class or an advanced DSP class would help students learn a critical aspect of how DSP applications function: they interface with signals in the real world, perform computation on those signals according to some pre-defined algorithm, and generate output signals from which the end-user of the DSP system can extract information and make decisions.

2.4.2 Suggested improvements in VLSI - DSP education

DSP theory courses have been standard fare in electrical engineering departments for many years; however, DSP hardware courses are less common, and designing a course syllabus is not so straightforward. Two syllabus decisions involve course emphasis and course structure.

An all-inclusive DSP theory and hardware course can be split over two semesters. In the first semester, the emphasis would be on teaching basic DSP theory, as has traditionally been done in most universities. In the second semester, an advanced DSP hardware course would include a specific type of DSP hardware as a platform on which DSP algorithms and applications can be tried out. PDSPs and FPGAs are the natural choices when considering the DSP hardware aspect. In fact, the wide range of educational hardware evaluation boards offered by Texas Instruments, a leading vendor of PDSPs, and by Altera and Xilinx, leading vendors of FPGAs, has the academic community spoilt for choice.

In such a situation, the choice of hardware boils down to a competition between the differing design methodologies associated with each platform. Conventional wisdom would incline professors towards having PDSP kits in the labs for DSP hardware study. Texas Instruments offers a vast array of DSP families in its product portfolio, ranging from multicore DSPs (C6x) and power-optimized DSPs to low-power DSPs (C54x, C55x). Along with the hardware kit, Texas Instruments provides access to Code Composer Studio software. Code Composer Studio™ (CCStudio) is an Eclipse-based integrated development environment for TI's DSPs, microcontrollers and application processors. CCStudio includes a suite of tools used to develop and debug embedded applications. It includes compilers for each of TI's device families, a source code editor, a project build environment, a debugger, a profiler, simulators and many other features.

CCStudio provides a single user interface taking users through each step of the application development flow. Familiar tools and interfaces allow users to get started faster than ever before and add functionality to their application thanks to sophisticated productivity tools. [18]

The case for having the FPGA as a hardware platform in DSP hardware classes has become much stronger in recent years with the availability of high-level synthesis tools provided by FPGA vendors like Altera and Xilinx. These tools help engineers design and implement DSP applications using graphical modeling tools like Simulink. Vendors provide specific tools to convert a Simulink model to Verilog or VHDL code, which can be synthesized and downloaded onto an FPGA so that the design can be evaluated directly on hardware. Moreover, a complete processor-based system can also be designed using DSP modules that can be integrated with soft-IP cores like processors, memories and peripherals. Hence, a complete package of software tools and hardware boards can be used by a student not only in digital design or embedded systems classes but also in a DSP hardware design and deployment class.


CHAPTER 3: PROBLEM STATEMENT

The previous chapter presented the background needed for the detailed problem statement given in this chapter, including the scope, objectives and limitations of this thesis. It discussed the advent of VLSI and its profound impact on DSP system implementation, with a detailed account of the evolution of DSP hardware, specifically PDSPs and FPGAs, and the architectural styles associated with each of them.

This chapter continues where the preceding chapter concluded and expands on the overall design methodology associated with each hardware platform for DSP system development. The first section of this chapter provides a detailed analysis of PDSP and FPGA design methodologies, discussing the similarities and differences that emerge when working on a DSP application, starting from an algorithm, moving to an architecture and concluding with a circuit that realizes the intended application. Hence, the end result is the cumulative effort of implementing the intended algorithm on a target architecture and synthesizing a circuit that realizes the desired DSP application.

3.1 DESIGN METHODOLOGIES: PDSP AND FPGA DESIGN FLOWS

A design methodology is the overall strategy to organize and solve the design tasks at the different steps of the design process. It is not possible to construct a comprehensive design methodology that can be applied to all application domains or hardware platforms, but a sensible methodology can be crafted that spans the common features present within a domain or platform.

A comparative study of PDSP and FPGA development flows can be performed using three parameters: algorithm modeling and simulation, architecture design and analysis, and circuit synthesis and implementation. The development flow for each platform consists of these steps, which correspond to the three distinct phases of system design, namely algorithm, architecture, and circuit.

The first step in the development flow concerns the mapping or transformation of a DSP algorithm onto a suitable hardware architecture. This step consists of modeling the DSP algorithm in a high-level programming language like C or C++, in an assembly language, or in a hardware description language (HDL) like Verilog HDL or VHDL. In FPGA-based design, HDLs are generally used to produce a functional description of the DSP algorithm, also called the Register Transfer Level (RTL) description. Nowadays, with the advent of “high-level synthesis tools”, it has become possible to model algorithms in graphical tools like Simulink, combined with extensive support for algorithm simulation, verification and debug.

The second step involves architecture design for the underlying hardware, tailored to the DSP application and taking into consideration the hardware platform that will host the architecture.

In certain hardware platforms like PDSPs, the architecture is fixed at the time of manufacture and hence cannot be modified or reconfigured during DSP system development. In contrast, FPGAs are a class of reconfigurable and programmable logic that allows the development of custom architectures tailored to the computational, memory and power requirements of the application for which the DSP system is being designed. This has removed the preconception that FPGAs are only used as a ‘glue logic’ platform and more realistically shows that FPGAs are a collection of system components with which the user can create a DSP system.

The third step consists of circuit synthesis and implementation. In the case of FPGAs, this can be viewed as implementing the custom architecture designed in the previous step on the FPGA using logic synthesis tools. A logic synthesis tool takes an RTL hardware description and a standard cell library as input and produces a gate-level netlist as output. The resulting gate-level netlist is a completely structural description with only standard cells at the leaves of the design. Internally, a synthesis tool performs many steps, including high-level RTL optimizations, RTL to Boolean logic translation, technology-independent optimizations, and finally technology mapping to the available standard cells. In the case of PDSPs, since the architecture is fixed, there is no separate need for synthesizing it. The final step involves system integration of the software components and the hardware architecture. Therefore, software development in high-level languages or graphical tools for the DSP algorithms, using the target architecture on the desired hardware platform, is an essential and final step in deploying the envisioned DSP application on the target hardware platform.

Given below is a detailed account of the development flow for PDSP followed by FPGA.

3.1.1 DSP system development flow using PDSPs

As stated above, the generic steps for DSP system development using PDSPs remain the same.

Here each step has been elaborated in the context of PDSP as a hardware platform. The diagram below shows the design flow for PDSPs. [19]


Figure 2: PDSP design flow [19]

The flowchart above [19] depicts each stage of DSP system development, starting with system requirements and specifications, followed by algorithm development and simulation, hardware/software partitioning, development and prototyping, and concluding with system integration, testing, debug and final release.

Algorithm-level modeling and simulation is generally done using high-level languages like MATLAB or C/C++. Simulations are performed using tools that run on a general-purpose computer so that results can be analyzed and input data tested before migrating the algorithm to the target hardware platform. The advantage of this step is that it becomes easy to test, debug and, if required, modify programs in high-level languages. This significantly reduces software development time.
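A minimal example of such an algorithm-level simulation is sketched below in Python, used here as a stand-in for MATLAB or C. It compares a floating-point reference FIR filter against the same filter with coefficients rounded to 15 fractional bits, as might be done before moving to a fixed-point PDSP. The function names, the coefficient values and the test signal are all illustrative assumptions rather than data from this thesis.

```python
def fir(coeffs, samples):
    """Direct-form FIR reference model; samples before time zero count as 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def quantize(coeffs, frac_bits=15):
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return [round(c * scale) / scale for c in coeffs]


def max_abs_error(reference, candidate):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Illustrative coefficients and input signal (magnitudes at most 1.0).
coeffs = [0.1, 0.2, 0.4, 0.2, 0.1]
samples = [1.0, -0.5, 0.25, 0.75, -1.0, 0.5, 0.0, 0.3]

error = max_abs_error(fir(coeffs, samples), fir(quantize(coeffs), samples))
# Each coefficient moves by at most 2**-16 and |sample| <= 1, so the
# output error cannot exceed len(coeffs) * 2**-16.
bound = len(coeffs) * 2.0 ** -16
```

Checking `error` against `bound` on a general-purpose computer is the kind of inexpensive test that can be run before committing the algorithm to the target hardware. It covers only coefficient quantization, not the rounding of intermediate results that a real fixed-point data path would introduce.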


The choice of the PDSP depends upon a variety of factors, such as the kind of algorithm to be implemented, the processing requirements of the algorithm, code size, etc. The objective is to select the chip that best matches the project’s time scales and cost calculations. Moreover, the vast array of development tools supported by PDSP vendors has emerged as a key factor in the selection of PDSP chips. These tools range from compilers, assemblers and simulators in the software development and debug category to in-circuit emulators and logic analyzers in the hardware testing category, along with commercially available evaluation boards for prototyping and software development before the actual PDSP chip is purchased for mass deployment of applications.

The software development stage is the most important stage in DSP system development using PDSPs. This stage mirrors a typical software development life-cycle that involves the phases of requirements specification, analysis, design, implementation, integration, testing and debug, concluding with release. High-level programming languages like C/C++ are the most commonly used tools for software development in PDSP based designs. Assembly language programming, on the other hand, does empower the programmer to use various functions of the processor, resulting in a highly efficient mapping of the algorithm to the processor, but is generally discouraged due to its time-consuming and complex nature. Software simulators or hardware platforms are used for debug and testing purposes in software development. Emulators are used when software needs to be tested on the target hardware. DSP software development engineers largely prefer to use C because of the wide array of C compilers that are available for different hardware platforms. This is in addition to numerous inherent advantages of the C language itself, like its support for data structures and powerful commands.


Hence, the PDSP design flow for DSP system development shows a well-defined pathway for system realization with the software development stage acquiring primacy over other stages and as a key deciding factor in the selection of a particular PDSP chip.

Currently, Texas Instruments and Analog Devices are the biggest vendors of PDSP chips, along with an associated suite of development tools and end-to-end solutions across a broad spectrum of application domains like industrial, medical, entertainment, military, aerospace, aviation, etc.

3.1.2 DSP system development flow using FPGAs

In the previous chapter, it was mentioned how FPGAs have become a competitive alternative for high performance DSP applications, previously dominated by PDSP and ASIC devices. The chapter also explained the implementation styles of using an FPGA as a DSP co-processor as well as a stand-alone DSP engine, also called the dedicated core approach.

In the past few years, the electronics industry has witnessed upgraded versions of products being introduced frequently. This forces engineers to build a product development strategy that is flexible, fast and low cost, and that supports rapid product innovation and evolution. The strategy should help system designers react in real time to customer feedback and market changes; tailor features of a basic design for different users, regions, or price points; develop differentiated features before the competition; and maintain the first-mover advantage that is so critical to market success. This “design once, make many” approach improves productivity, saves development time, and ultimately saves money.

The main barrier to acceptance of FPGAs for new users has been ease of use and design flow, which is now being addressed with the emergence of new development platforms combined with high-level design methodologies and the software tools associated with them.


The role of high-level synthesis tools in FPGA based design has been noteworthy in that it has greatly simplified the process of algorithm modeling and simulation. In the absence of high-level modeling tools, a DSP system designer is expected to write HDL programs for any DSP algorithm that needs to be modeled and simulated. This is a time consuming process because HDL coding complicates and expands the design effort and, for modeling and simulation, provides no specific advantages over using high-level programming languages like C/C++.

Before examining DSP system design on FPGAs using high level synthesis tools and a model-based design framework, it is prudent to discuss the conventional approach of FPGA based design using Hardware Description Languages (HDLs). The section below discusses the conventional method of HDL based design, followed by a brief comment on the programming challenges encountered in HDL based design that have consequently compelled the adoption of a model based design framework using high level synthesis and verification tools.

Figure 3: FPGA based DSP design approaches


Conventional approach: HDL based design and its pitfalls

Hardware designers who program FPGAs predominantly use HDLs such as Verilog and VHDL, which operate at a lower level of abstraction than high level programming languages like C or C++. This is because a hardware designer thinks about a design in terms of low-level building blocks like basic Boolean gates (AND, OR, NAND, NOR, XOR), decoders, adders, multipliers, flip-flops and registers. Hence, employing Verilog or VHDL makes it easier for the designer to construct the structural logic required to perform the necessary function. Therefore, the emphasis is more on the structural representation of the intended application than on the behavioral aspects [24].

In contrast, programming general purpose CPUs using high level programming languages (HLLs) enjoys the advantages of the solid Instruction Set Architectures (ISAs) of the microprocessors and the availability of cutting-edge compilers that enable a simpler programming experience. Moreover, a higher level of abstraction drastically increases the programmer’s efficiency and reduces the possibility of bugs, resulting in faster time to market [24].

After an architectural and logic level description is created using HDLs, high level tools are used to synthesize the design and generate a configuration bitstream for the FPGA from the Verilog or VHDL code. This is the most significant and time consuming step, and it relies heavily on the performance of powerful EDA tools to generate a bitstream that adheres to the resource and timing constraints of the target FPGA device. This phase typically consists of several steps like synthesis, translation, target hardware resource mapping and timing analysis, eventually ending with the generation of a bitstream configuration file for downloading onto the FPGA.

This entire flow described above can be broken down into two major phases: “high-level synthesis” and “logic synthesis”. High level synthesis can be defined as the conversion of an algorithmic (behavioral) description of an application to a low-level RTL (structural) description using logic gates. Logic synthesis can be defined as the process of converting an RTL description into a low-level netlist specific to the target hardware with the extensive use of the target’s technology libraries.

To ease the programming of FPGAs, several frameworks have been proposed [24] with the chief objective of elevating the level of abstraction at which the hardware designer can write a program and then compile it down to VHDL or Verilog. Three frameworks (HDL-like frameworks, HLL based frameworks and model based frameworks) are the most widely used and shall be discussed here. The model based framework is explained separately in the next section.

HDL-like frameworks: An example of this is SystemVerilog, which contains two components: the synthesizable component, which extends the Verilog-2005 standard with several features, and the verification component, which uses an object-oriented model closer to C++ or Java than to Verilog.

HLL-based frameworks: An example of this is the use of high level programming languages like C, C++ or SystemC, where the conversion of a behavioral description to an RTL description is done by high-end compilers. EDA vendors have developed tools to support high level synthesis. Examples: Cadence C to Silicon Compiler, Synphony C Synthesis, Mentor Graphics Catapult C.

The next phase, “Design Verification”, often consumes a large portion of the overall design cycle time. In HDL based designs, a test bench must be created that is connected to the design under test. The design under test and the test bench are simulated using event based simulators like ModelSim from Mentor Graphics Corporation, wherein each signal transaction is recorded and displayed as a waveform. This waveform based debugging may work for smaller designs, but the complexity and simulation time required increase as the designs become larger and larger. To avoid this, hardware designers often turn to emulation using FPGAs to speed up simulation. This whole process is in sharp contrast to software verification, which is much less complicated and has numerous debugging tools to verify programs in a modest yet potent way [24].

Modern approach: model based design framework

High-level synthesis takes an abstract “behavioral specification” of a digital system and builds a register-transfer level (RTL) “structure” that realizes the given behavior. The task is to take a specification of the behavior required of a system and a set of constraints and goals to be satisfied, and to form a structure that implements the behavior while satisfying the goals and constraints. Hence High-level synthesis actually maps algorithms to architectures.
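The idea of mapping a behavior to a structure under constraints can be illustrated with a toy resource-constrained list scheduler. This is only a sketch under simplifying assumptions (every operation takes one clock cycle, and the names list_schedule, ops, deps and resources are hypothetical); it does not represent how any particular high-level synthesis tool works internally.

```python
def list_schedule(ops, deps, resources):
    """Assign each operation a clock cycle, respecting data dependencies
    and the number of functional units available per operation type.

    ops:       dict mapping operation name -> unit type (e.g. "mul", "add")
    deps:      dict mapping operation name -> list of operations it depends on
    resources: dict mapping unit type -> number of units (assumed >= 1)
    Assumes an acyclic dependency graph and single-cycle operations.
    """
    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        used = {unit: 0 for unit in resources}
        for name in sorted(ops):
            if name in schedule:
                continue
            unit = ops[name]
            # An operation is ready once all its inputs were produced in earlier cycles.
            ready = all(d in schedule and schedule[d] < cycle
                        for d in deps.get(name, []))
            if ready and used[unit] < resources[unit]:
                schedule[name] = cycle
                used[unit] += 1
        cycle += 1
    return schedule


# Behavior: y = a*b + c*d, i.e. two multiplications feeding one addition.
ops = {"m1": "mul", "m2": "mul", "s1": "add"}
deps = {"s1": ["m1", "m2"]}

small = list_schedule(ops, deps, {"mul": 1, "add": 1})
fast = list_schedule(ops, deps, {"mul": 2, "add": 1})
print(max(small.values()) + 1, max(fast.values()) + 1)
```

With a single multiplier the two products are serialized and the result takes three cycles; with two multipliers it takes two. The same behavior therefore yields different structures depending on the area and throughput goals supplied as constraints.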

Traditionally, algorithm designers used MATLAB or C to validate algorithms, without feedback about the practical feasibility of a hardware system. Hardware designers then re-entered the design using Hardware Description Languages (HDLs), which involved numerous changes in the algorithm. The result had to be re-validated by the algorithm designer, leading to multiple rounds of coding and verification of the design and significantly lengthening the development cycle [23].

The arrival of graphical high level modeling and synthesis tools like Simulink by MathWorks, in conjunction with FPGA vendor tools like Altera’s DSP Builder or Xilinx’s System Generator, has revolutionized the whole procedure of DSP system design and led to the emergence and adoption of the FPGA as a credible alternative to fixed architecture PDSP chips or high cost ASICs.

A unified Simulink design environment/ framework widely adopted by the algorithm designers can be depicted as below [23]


Figure 4 : Model based design Framework using Simulink [6]

Figure 5 : FPGA Verification [6]

Using this approach, a design needs to be entered only once. The environment enables both algorithm verification and hardware emulation, and also provides an abstract view of the design architecture. Finally, it allows FPGA-based verification, again using the same input description.


Behavioral HDL is produced, which allows algorithm mapping onto an FPGA for hardware emulation. EDA tools run an initial synthesis and HDL simulation to verify functional equivalency between the two hardware descriptions. Mapped HDL can then be synthesized into a GDSII format. Hence, graphical block-based design entry restores the missing link between algorithm and circuit designers. It enables a single design entry, architecture optimization, and final hardware verification within the Simulink environment, which is widely adopted by algorithm designers.

The above figure clearly shows that it is straightforward to map a fixed architecture to a target technology. With some architectural feedback from the underlying technology such as speed, power, and area of the building blocks, the architecture can be optimized in Simulink. Leveraging this flow, architectural trade-offs can then be explored, allowing the designer to minimize power and area for a given technology, for a specified throughput constraint for an algorithm. The final phase of the design flow is verification. Simulink is used in the entire design cycle: design entry, architecture optimization, and final verification. Behavioral HDL generated by Simulink can be used to emulate the design on the FPGA.

Model based design flow for FPGA based implementation: Goals

According to [6] the five major goals of model based design are listed below:

1. Provide DSP system modeling capability at a high level of abstraction, with the availability of simple arithmetic and logic operators together with specific operators for typical DSP computations like DFT, FFT, FIR and IIR filters, etc.

2. Ensure that the same model can be used throughout the design process, from performing initial theoretical design to simulation, RTL code generation and system integration.

3. Generate an FPGA implementation of the data path from the system model automatically, without requiring the addition of device-specific information.

4. Support the task of synthesizing control logic for the FPGA implementation. 


5. Automate the creation of test benches for performing logic simulation on the final FPGA implementation.

Model based design flow for FPGA based implementation: Process [6]

Model based design is defined as the process of systematic generation of a hardware representation for an intended DSP application, traversing through multiple phases. It starts with an algorithmic model, followed by RTL code generation to describe the functional representation of the system, and then a gate-level netlist as a structural representation that eventually gets mapped and synthesized to a specific target hardware platform, usually an FPGA.

Simulink is a graphical modeling tool; designers create algorithmic models using available blocksets from MathWorks and other arithmetic and logic blocksets provided by Altera’s DSP Builder. Simulink also provides extensive simulation capabilities including floating point types. DSP Builder is Altera’s high level modeling and synthesis tool that works in conjunction with Simulink and is integrated into it. DSP Builder simplifies hardware implementation of DSP functions, provides system verification capabilities to a system engineer who may not even be familiar with the HDL based design flow, and allows the system engineer to implement DSP functions on an FPGA without knowledge of HDLs. DSP Builder shortens design cycles by helping construct the hardware representation of a DSP design in an algorithm-friendly development environment. It integrates the algorithm development, simulation, and verification capabilities of MathWorks MATLAB and Simulink system-level design tools with the Altera Quartus II software and third-party synthesis and simulation tools.

Given below is the diagram [6] and description of the model based design flow that uses Simulink and DSP Builder as high level synthesis tools.


Figure 6: Model based design: complete design flow [6]

The first step in model based design involves model construction and simulation using Simulink and DSP Builder function blocksets for the intended DSP algorithms. The development environment is Simulink, into which the DSP Builder blocksets are integrated. When the model is converted into a form that can be realized on the FPGA, the system designer can invoke the netlister and testbench generator.

The netlister extracts a hierarchical representation of the model’s structure annotated with all the element parameters and signal data types. A mapper then analyzes the elements in the hierarchy and creates a VHDL description of the design. The test bench generator is an interactive tool that runs in the Simulink environment, where the designer can capture the input stimuli and system outputs of selected simulation runs for conversion to test patterns. The test bench generator converts the captured simulation data into VHDL code that will verify the algorithm being modeled and test its outputs against the expected results.
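The capture-and-replay idea behind such a test bench generator can be sketched in a few lines of Python. All names here (reference_model, integer_implementation, capture_vectors, replay_vectors) are hypothetical stand-ins chosen for illustration; the sketch does not reproduce DSP Builder’s interface or the VHDL it emits.

```python
def reference_model(x):
    """Stand-in for the simulated system model: two-sample average."""
    out = []
    prev = 0.0
    for sample in x:
        out.append(0.5 * (sample + prev))
        prev = sample
    return out


def integer_implementation(x):
    """Stand-in for the generated hardware: same filter with integer truncation."""
    out = []
    prev = 0
    for sample in x:
        out.append(float((int(sample) + prev) // 2))
        prev = int(sample)
    return out


def capture_vectors(model, stimuli):
    """Record (input, expected output) pairs from one simulation run."""
    return list(zip(stimuli, model(stimuli)))


def replay_vectors(implementation, vectors, tolerance=0.0):
    """Apply the recorded inputs and return the indices where outputs disagree."""
    inputs = [v[0] for v in vectors]
    actual = implementation(inputs)
    return [i for i, (v, a) in enumerate(zip(vectors, actual))
            if abs(v[1] - a) > tolerance]


vectors = capture_vectors(reference_model, [2.0, 4.0, 6.0, 7.0])
mismatches = replay_vectors(integer_implementation, vectors)
print(mismatches)
```

In this example only the last sample differs, because the integer version truncates 6.5 to 6. Allowing a tolerance of half a unit makes the replay pass, which mirrors the decision a designer faces when comparing a finite-precision implementation against its ideal model.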

The FPGA vendor specific tools take over from this stage to synthesize the control logic, combine all the pieces into a single fully-realized netlist, and place and route the design in an FPGA. The outputs of this back-end process are a bit-stream (FPGA configuration file) and an EDIF (Electronic Design Interchange Format) structural netlist of the hardware annotated with timing information. This netlist can be simulated with the test vectors produced previously from system simulations to verify the performance of the completed FPGA hardware realization.

The step-by-step process for Altera FPGAs is as follows [6]:

• Model Creation in Simulink and DSP Builder

• Simulate the model within Simulink

• Convert the Simulink model to a FPGA realizable form

• HDL Code generation & Test bench generation in DSP Builder

• Simulation of DSP model as a Quartus II (Altera’s FPGA design tool) project

• FPGA specific netlist generation


• Download the design on FPGA and test

From the above description of the FPGA model based design flow, it is clear that FPGA based design of DSP systems using high-level development tools has opened the floodgates for a large number of DSP algorithm engineers to contribute actively to the process of hardware system realization, shoulder to shoulder with their VLSI counterparts. In turn, it has created vast opportunities for VLSI engineers to gain domain specific knowledge of the DSP application domain and envision the chip design process from an algorithm developer’s perspective.

In the past, DSP FPGA design required the combined efforts of a DSP engineer and a VLSI engineer familiar with HDL based design. The FPGA model based design approach, however, helps a DSP algorithm engineer derive an HDL netlist for a data path directly from a system level tool. The steps include construction of an ideal mathematical model, investigation of implementation effects, test-bench creation, and hardware netlist generation. Hence it provides a seamless path from system-level algorithm design to FPGA implementation.
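One of the steps named above, the investigation of implementation effects, typically includes checking how finite word lengths disturb the ideal model. The following is a minimal sketch of that check for filter coefficients, assuming simple round-to-nearest fixed-point quantization; the coefficient values and the function names quantize and max_quantization_error are illustrative assumptions only.

```python
def quantize(value, frac_bits):
    """Round a real value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def max_quantization_error(coeffs, frac_bits):
    """Largest absolute difference between ideal and quantized coefficients."""
    return max(abs(c - quantize(c, frac_bits)) for c in coeffs)


# Hypothetical ideal coefficients of a small low-pass FIR filter.
ideal = [0.1, 0.2, 0.4, 0.2, 0.1]

for bits in (4, 8, 15):
    print(bits, max_quantization_error(ideal, bits))
```

The error shrinks as fractional bits are added and is bounded by half of the least significant bit, which gives the designer a direct handle on the word length versus accuracy trade-off before any hardware is generated.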

3.2 PDSP AND FPGA COMPARISON: SUMMARY OF PAST CASE STUDIES

Now that the design methodologies for DSP system development using PDSPs and FPGAs have been explained in the above two sections, it is useful to examine working examples of the two hardware platforms that have already been developed and deployed. In the following section, we will take a look at a limited set of examples in the area of DSP system implementation. These examples have been culled from available research conference proceedings and journal articles.

Presented below is a summary of three case studies from past literature that specifically deal with the topic of hardware platform comparison (PDSP vs. FPGA) for DSP system implementation. These DSP systems belong to varied application domains like sound signal processing and image processing. This survey, together with the material presented so far in this thesis, forms the theoretical foundation on which the problem statement of this research is based.

Case Study # 1

The first case study [20] deals with the FPGA implementation of a discrete wavelet transform (DWT) algorithm used for real-time image compression applications. The DWT is the core transform used in the JPEG2000 image compression standard. The goal of this paper was to compare the performance of a traditional DSP processor against an FPGA in terms of the development effort and processing performance.

The hardware platforms used were a TI C6416 DSP chip and Altera’s DE2 board with a Cyclone II FPGA. The FPGA implementation was accomplished using handwritten VHDL code and the PDSP implementation using C programming.

The Cyclone II FPGA executed the DWT in 164,354 clock cycles at a 50 MHz clock rate, resulting in an execution time of 3.2 milliseconds for a 128 x 128 pixel image. In contrast, the PDSP chip required 10,770,432 clock cycles at a 600 MHz clock rate, leading to 17.9 ms to do the same job. In terms of execution performance, the FPGA outperformed the PDSP by a wide margin. In terms of hardware utilization, the FPGA design required 742 Logic Elements out of a total of 33,216, resulting in 2.2% hardware utilization. The PDSP design occupied 67.2 KB of the 1024 KB on-chip memory, or 6.5% of the total size. In this case too, the FPGA design was more efficient than the PDSP design. The only metric where the PDSP came out ahead was lines of code. The PDSP design required 132 lines of C code whereas the FPGA design required 429 lines of VHDL code, 3.2 times as many as the PDSP. This factor has since been tackled, and new results have demonstrated that with the usage of high-level synthesis tools instead of hand-written VHDL code, the number of lines of code required for FPGA based designs can be reduced greatly. More details of this are presented in the next case study.
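The figures quoted for this case study follow from simple ratios, which can be re-derived as a quick check. The sketch below uses only the numbers stated above; the variable and function names are illustrative, and the recomputed values may differ in the last digit from the quoted ones depending on whether the source rounded or truncated.

```python
def execution_time_ms(clock_cycles, clock_hz):
    """Execution time in milliseconds for a given cycle count and clock rate."""
    return clock_cycles / clock_hz * 1e3


fpga_ms = execution_time_ms(164_354, 50e6)      # Cyclone II at 50 MHz
pdsp_ms = execution_time_ms(10_770_432, 600e6)  # C6416 at 600 MHz
speedup = pdsp_ms / fpga_ms

fpga_utilization_pct = 742 / 33_216 * 100       # logic elements used
pdsp_memory_pct = 67.2 / 1024 * 100             # on-chip memory used
loc_ratio = 429 / 132                           # VHDL lines per C line

print(f"FPGA {fpga_ms:.2f} ms, PDSP {pdsp_ms:.2f} ms, speedup {speedup:.2f}x")
print(f"FPGA logic {fpga_utilization_pct:.2f}%, PDSP memory {pdsp_memory_pct:.2f}%")
print(f"LOC ratio {loc_ratio:.2f}")
```

The FPGA finishes roughly five and a half times sooner even though its clock is twelve times slower, because it needs far fewer cycles for the same transform.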

Case Study # 2

The second case study [21] is an offshoot of the first case study presented above. It delves deeper into the issue of FPGA implementation techniques and proposes a new method of FPGA implementation using high-level synthesis tools like Simulink/DSP Builder. The author states that the goal was to study the feasibility of an approach based on high-level synthesis tools, using Simulink/DSP Builder, by comparing its performance with a handwritten VHDL implementation. It goes on to show that the Simulink/DSP Builder technique is more efficient and faster than the traditional VHDL coding technique.

The hardware platform used was a Cyclone II FPGA on an Altera DE2 board. To evaluate the quality of results of the Simulink/ DSP Builder tool as against a hand-written VHDL, performance in terms of hardware utilization and execution time was measured. Two algorithms of the JPEG2000 standard – Daubechies 5/3 and Daubechies 9/7 were implemented for comparison. The following results were obtained [21]

RESULTS

                  VHDL                            Simulink/DSP Builder
Algorithm         Logic Units   Clock Frequency   Logic Units   Clock Frequency
Daubechies 5/3    110           203.13            79            250.4
Daubechies 9/7    133           66.08             107           109

Table 1: Results for Case Study # 2 [21]
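To make the size of the differences in Table 1 easier to read, the relative change of each metric can be computed from the tabulated values. This is only an arithmetic sketch on the numbers reported in [21]; the dictionary layout and the helper relative_change_pct are illustrative, and the clock frequencies are used in whatever unit [21] reports them since only ratios are taken.

```python
# Values copied from Table 1 as (logic units, clock frequency).
results = {
    "Daubechies 5/3": {"vhdl": (110, 203.13), "dsp_builder": (79, 250.4)},
    "Daubechies 9/7": {"vhdl": (133, 66.08), "dsp_builder": (107, 109.0)},
}


def relative_change_pct(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


summary = {}
for name, row in results.items():
    vhdl_logic, vhdl_clock = row["vhdl"]
    hls_logic, hls_clock = row["dsp_builder"]
    summary[name] = (
        relative_change_pct(vhdl_logic, hls_logic),
        relative_change_pct(vhdl_clock, hls_clock),
    )
    print(f"{name}: logic {summary[name][0]:+.1f}%, clock {summary[name][1]:+.1f}%")
```

For both filters the Simulink/DSP Builder design uses roughly a fifth to a quarter fewer logic units while achieving a higher clock frequency, with the larger clock gain on the 9/7 filter.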

The above results show that Simulink/DSP Builder outperformed hand-written VHDL in the metrics of clock frequency and hardware utilization. This has led to the emergence of high-level synthesis tools as a competitive alternative to traditional VHDL based implementation in FPGAs, allowing for faster implementation and faster time to market.

Case Study # 3

This case study [22] discusses the comparison of FPGA and DSP development environments and performance for acoustic array processing.

The purpose of the sound localization system is to process signals from an array of microphones to determine the direction of arrival of an impulsive acoustic signal like a gunshot or a handclap.

The application uses a fixed planar array of microphones to capture impulsive acoustic signals from a point source. The sound source is assumed to be located at a distance that is sufficiently far so as to justify a far-field assumption. Additionally, the sound source is assumed to be located on the same plane as the microphone array. The objective is to locate the direction of the (far-field) sound source. This is accomplished by estimating the relative time delays for the arrival times of the acoustic impulse at each microphone. The system includes a filtering stage, a correlation stage, and a trigonometric math (angle of arrival calculation) stage. All of these stages are computationally intensive.
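The correlation and angle-of-arrival stages can be illustrated for a single pair of microphones. The sketch below is not the implementation from [22]: it assumes two microphones, a far-field source in the array plane, a speed of sound of 343 m/s, and the standard relation sin(theta) = c * delay / spacing. The sample rate, microphone spacing and function names are hypothetical.

```python
import math


def best_lag(ref, other, max_lag):
    """Lag (in samples) at which `other` best matches `ref` by cross-correlation.
    A positive lag means the impulse reached `other` later than `ref`."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += r * other[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def angle_of_arrival_deg(lag_samples, sample_rate_hz, mic_spacing_m,
                         speed_of_sound=343.0):
    """Far-field angle measured from the array broadside, in degrees."""
    delay_s = lag_samples / sample_rate_hz
    ratio = speed_of_sound * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against values just outside [-1, 1]
    return math.degrees(math.asin(ratio))


# Hypothetical capture: the same impulse arrives three samples later at mic B.
mic_a = [0.0] * 32
mic_b = [0.0] * 32
mic_a[10] = 1.0
mic_b[13] = 1.0

lag = best_lag(mic_a, mic_b, max_lag=8)
angle = angle_of_arrival_deg(lag, sample_rate_hz=48_000, mic_spacing_m=0.1)
print(lag, round(angle, 2))
```

Even in this reduced form the correlation loop dominates the work, which is consistent with the observation that the stages are computationally intensive and benefit from parallel hardware.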

The DSP implementation was accomplished using Texas Instruments’ Code Composer 3.3 development software with a TMS320 C6711 DSK as the target hardware, which has floating point capability that reduces the development time for most DSP applications. The C6711 DSK board operates at 100 MHz and is capable of completing eight 32-bit instructions per cycle. The DSP on the board has eight independent functional units: four floating point ALUs, two fixed point ALUs, and two fixed/floating point multipliers. The preprocessing, partial cross-correlation and angle calculation portions of the algorithm were programmed in C.


The Xilinx Virtex-II FPGA contains 96 18-bit x 18-bit multipliers, 96 18 KB block RAMs, and 3,584 configurable logic blocks (CLBs). Xilinx rates the part as the equivalent of 3,000,000 logic gates. The clocking available to the FPGA can run at speeds up to 120 MHz. The FPGA development was accomplished using MATLAB’s Simulink in conjunction with Xilinx’s FPGA System Generator Blockset. The functionality of the top-level design, consisting of three blocks, was specified using a lower-level diagram containing System Generator blocks. The preprocessing block and the angle of arrival block were designed exclusively using basic System Generator blocks. The System Generator block set does not provide a correlation block. Hence the correlation function can be implemented either with an assembly of smaller blocks or using the “black box” feature of System Generator. The black box feature allows the user to develop a custom block whose functionality is specified using either Verilog HDL or VHDL.

The major focus [22] was to determine how the development tools affected the development time and the processing performance. The software metric used to determine the “equivalent LOC” for the Simulink source code served as a rough estimate of the development time.

For a comparison of the development time, a lines-of-code metric was used. Since this cannot be applied directly to graphical languages such as Simulink, an “equivalent LOC” metric for Simulink code was developed. The FPGA design line count came to a total of 429 lines of code. By comparison, the DSP design contained 86. The final results [22] indicate that the FPGA design required 4.6 times more lines of code. In terms of timing performance, the FPGA implementation is significantly faster than the DSP. The DSP took 25,725,060 clock cycles to produce a final answer running at a clock rate of 100 MHz. The FPGA took 23,005 clock cycles to produce a final answer running at a clock rate of 40 MHz. This resulted in operating times of 257.3 ms and 0.575 ms respectively, a speedup factor of 447.
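The timing result can be decomposed to show where the speedup comes from: it is the product of the cycle-count ratio and the clock-rate ratio. The sketch below uses only the cycle counts and clock rates quoted above; the variable names are illustrative.

```python
dsp_cycles, dsp_clock_hz = 25_725_060, 100e6
fpga_cycles, fpga_clock_hz = 23_005, 40e6

dsp_ms = dsp_cycles / dsp_clock_hz * 1e3
fpga_ms = fpga_cycles / fpga_clock_hz * 1e3

cycle_ratio = dsp_cycles / fpga_cycles      # how many fewer cycles the FPGA needs
clock_ratio = fpga_clock_hz / dsp_clock_hz  # how much slower the FPGA clock is
speedup = cycle_ratio * clock_ratio         # identical to dsp_ms / fpga_ms

print(f"DSP {dsp_ms:.1f} ms, FPGA {fpga_ms:.3f} ms")
print(f"cycle ratio {cycle_ratio:.0f}, clock ratio {clock_ratio:.1f}, speedup {speedup:.0f}")
```

The FPGA clock is 2.5 times slower than the DSP clock, yet the design needs over a thousand times fewer cycles, so the advantage is architectural parallelism rather than clock speed.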


The above results indicate a wide trade-off between the two approaches in terms of hardware platforms, implementation methodologies and software tools. Development time for the FPGA version can be shortened and the quality of results can be increased when a design can be implemented using major functions that are described by standard blocks made available by high level synthesis tools.

The core objective of the above survey was to understand, in each of the case studies mentioned, the procedure that has been adopted for implementation on the PDSP and FPGA hardware platforms, the nature of the algorithm that has been implemented, the high level languages or synthesis tools that have been used and the impact they have caused on the development time and processing performance. Results of each case study can be compared on the basis of common metrics like execution time, hardware utilization and code size. These results eventually help in building a case for performing such a qualitative and quantitative analysis that can be applied to hitherto untouched application domains of DSP like biomedical applications.

3.3 PROBLEM DEFINITION: SCOPE, GOALS & OBJECTIVES

Based on the exhaustive background study presented in the previous chapter, the detailed explanation of the design methodologies associated with each hardware platform, and a summary of previous case studies specifically reflecting the subject of this research (VLSI implementation of DSP systems: a comparative study of PDSP and FPGA design methodologies for DSP system design), a comprehensive problem statement has been formulated and is presented below. The problem statement outlines its scope and limitations.

The principal focus of this research is to compare the design methodologies that come into play when a specific hardware platform is selected as target hardware for implementation. This research shall also expand its scope to evaluate the role that high-level synthesis tools have come to play in the course of DSP system development.

An attempt has been made to provide a basic level of comparison between the two platforms to help hardware engineers and algorithm architects decide on a specific hardware platform. The three case studies discussed earlier in this chapter have by and large focused their attention on comparing PDSP and FPGA implementations for a particular class of DSP applications like image processing and acoustics. However, in this thesis, the experiments have been restricted to generic DSP algorithms like DFT, FFT and digital filters. The key intention here is not to get bound to any particular class of DSP applications but to provide a “generic template” for engineers and architects alike to make an informed decision on hardware platform selection.

Considering this author’s primary academic background is in the area of VLSI and embedded systems, the comparative study and the resulting selection template that shall be the end-result of this thesis, should be viewed as an honest attempt to arrive at a conclusion as seen from a hardware engineer’s perspective with some degree of working knowledge of the domain of signal processing. The basic list of experiments along with the necessary hardware and software resources needed for an efficient and purposeful investigation, recording of results and drawing up of inferences have been presented in the next chapter.


CHAPTER 4: EXPERIMENTS

4.1 INTRODUCTION

The preceding two chapters have laid the groundwork of a strong theoretical foundation that consisted of a detailed overview and background survey of the broad area of research that this thesis discusses. It included the evolution of DSP hardware, the architectural options that accompany each type of DSP hardware platform, the design methodologies associated with each DSP hardware platform and their similarities and differences, concluding with a survey of past case studies that have explicitly discussed the comparison of hardware platforms for the VLSI implementation of DSP systems.

The previous chapter also formally stated the problem statement of this thesis, which essentially set out the goals to be achieved at the end of this research:

1. To compare the design methodologies for PDSP based design and FPGA based design

2. To analyze the PDSP and FPGA implementations with regard to their development effort for a generic set of algorithms

To validate these goals, a set of algorithms has been formulated to be implemented on the two DSP hardware platforms mentioned in the previous chapters: PDSP and FPGA. This set of algorithms represents the most commonly used mathematical computations, which are used in virtually all types of DSP algorithms in every plausible application domain, ranging from image processing, speech processing and video processing to complex systems like biomedical devices, military and aerospace systems.
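As an illustration of the kind of generic computation meant here, the discrete Fourier transform can be written directly from its definition. This is a reference sketch only, using the textbook O(N^2) formula; it is not the implementation used in the experiments, and the function name dft is an assumption.

```python
import cmath


def dft(x):
    """Direct evaluation of X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


# A unit impulse has a flat spectrum; a constant signal has energy only at k = 0.
impulse_spectrum = dft([1, 0, 0, 0])
constant_spectrum = dft([1, 1, 1, 1])
print([round(abs(v), 6) for v in impulse_spectrum])
print([round(abs(v), 6) for v in constant_spectrum])
```

The nested multiply-accumulate structure visible here is the pattern that both PDSP instruction sets and FPGA DSP blocks are designed to accelerate.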


4.2 HARDWARE AND SOFTWARE OVERVIEW

Before describing the proposed experiments and the techniques deployed for implementation, it is essential to explain in detail the specific hardware platforms and software technologies that have been utilized for the experiments. For this purpose, we have selected a Texas Instruments TMS320 C5515 Evaluation board as the host platform for PDSP implementation and an Altera DE-1 board with a Cyclone II FPGA as the platform for FPGA implementation. Given below is a brief summary of the hardware evaluation boards and the software associated with those platforms.

4.2.1 Hardware – PDSP: Texas Instruments C5515 Evaluation Board

The PDSP hardware platform used for this thesis is the Texas Instruments TMS320 C5515 Evaluation Module (EVM).

Figure 7: Texas Instruments C5515 Evaluation Board [25]


The C5515 EVM is a standalone development platform that enables users to evaluate and develop applications for the TI C5515 Digital Signal Processor (DSP). The EVM is designed to work with TI’s Code Composer Studio (CCS) Integrated Development Environment (IDE). Code Composer Studio communicates with the EVM board through the external emulator header, or on board emulation. The EVM operates from a +5V external power supply or battery. The EVM comes with a full complement of on-board devices that suit a wide variety of application environments. [25]

The key features of the C5515 EVM are [25]:

• A Texas Instruments TMS320C5515 DSP operating up to 100 MHz
• 128 Mbytes of Mobile SDRAM
• 16 Megabytes of NOR Flash
• 64 Megabytes of NAND Flash
• 128 x 128 bit mapped color LCD display
• 10 User push button
• External JTAG emulation interface
• Embedded JTAG controller
• RS-232 Interface
• MMC / SD Media Card Connector
• User USB 2.0 port via C5515
• I2C EEPROM (256Kbits) and SPI EEPROM (256Kbits)
• Expansion connectors for Bluetooth interface
• TPS65023 IC for individual C5515 power rail control
• TLV320AIC3204 stereo codec with line in, line out, headphone, mic in, on board microphones
• INA219 power measurement devices
• Optional battery power

TI TMS320 C5515 DSP Processor [26]

The TMS320C5515 digital-signal processor (DSP) contains a high-performance, low-power DSP to efficiently handle tasks required by portable audio, wireless audio devices, industrial controls, software defined radio, fingerprint biometrics, and medical applications. The functional block diagram of the C5515 DSP is shown below [26]


Figure 8: TI TMS320 C5515 DSP Processor [26]

The figure above shows the functional block diagram of the PDSP and how it connects to the rest of the device. The DSP consists of the following primary components [26]:

(1) CPU Core: The C5515 CPU is responsible for performing the digital signal processing tasks

required by the application. In addition, the CPU acts as the overall system controller,

responsible for handling many system functions such as system-level initialization,

configuration, user interface, user command execution, connectivity functions, and overall

system control. The CPU also manages/controls all peripherals on the device. The DSP

architecture uses the switched central resource (SCR) to transfer data within the system. [26]

Tightly coupled to the CPU are the following components:

• DSP internal memory: single- and dual-access RAM, and ROM
• FFT hardware accelerator
• Ports and buses

(2) FFT Hardware Accelerator: The C5515 CPU includes a tightly-coupled FFT hardware

accelerator that communicates with the C5515 CPU through the use of coprocessor instructions.

For ease of use, the ROM has a set of C-callable routines that use these coprocessor instructions

to perform 8, 16, 32, 64, 128, or 256-point FFTs.

(3) System Memory

Memory Type                    Memory Size
Dual-access RAM (DARAM)        64 KB
Single-access RAM (SARAM)      256 KB
Read-only memory (ROM)         128 KB

Table 2: Internal Memory [26]

Memory Type                    Memory Size
Mobile SDRAM                   128 MB
NOR Flash                      16 MB
NAND Flash                     64 MB

Table 3: External Memory [26]

(4) Peripherals: The C5515 PDSP includes the following peripherals [26]:

• One external memory interface (EMIF) with 21-bit address and 16-bit data. The EMIF supports mobile SDRAM and non-mobile SDRAM, single-level cell (SLC) NAND with 1-bit ECC, and multi-level cell (MLC) NAND with 4-bit ECC.
• Two serial buses, each configurable to support one Multimedia Card (MMC) / Secure Digital (SD/SDIO) controller, one inter-IC sound bus (I2S) interface with GPIO, or a full GPIO interface.
• One parallel bus configurable to support a 16-bit LCD bridge or a combination of an 8-bit LCD bridge, a serial peripheral interface (SPI), an I2S, a universal asynchronous receiver/transmitter (UART), and GPIO.
• Four direct memory access (DMA) controllers, each with four independent channels.
• One inter-integrated circuit (I2C) multi-master and slave interface with 7-bit and 10-bit addressing modes.
• Three 32-bit timers with 16-bit pre-scaler; one timer supports watchdog functionality.
• A USB 2.0 slave.
• A 10-bit successive approximation (SAR) analog-to-digital converter with conversion capability.
• One real-time clock (RTC) with associated low power mode.

4.2.2 Software – PDSP: Code Composer Studio IDE [27]

Code Composer Studio (CCS) is the integrated development environment for TI's DSPs,

microcontrollers and application processors. CCS includes a suite of tools used to develop and

debug embedded applications. CCS is based on the Eclipse open source software framework.

Code Composer Studio version 5 uses an unmodified version of Eclipse, and also includes

support for Linux, as well as Microsoft Windows.

CCS includes compilers for each of TI's device families, source code editor, project build

environment, debugger, profiler, simulators and many other features. CCS includes a real-time operating system called DSP/BIOS or SYS/BIOS. CCS includes support for OS-level application debug as well as low-level JTAG-based development.

Debugger: The integrated debugger in CCS has several capabilities and advanced breakpoints to simplify development. Conditional or hardware breakpoints are based on full C expressions, local variables or registers. CCS supports the development of complex systems with multiple processors or cores. Global breakpoints and synchronous operations provide control over multiple processors and cores.

Compiler: TI has developed C/C++ compilers specifically tuned to maximize the processor's usage and performance. TI compilers use a wide range of classical, application-oriented, and sophisticated device-specific optimizations that are tuned to all the supported architectures. With the program level view, the compiler is able to generate code similar to an assembly program developer who has the full system view. This application level view is leveraged by the compiler to make trade-offs that significantly increase the processor performance.

Before starting the debugger, it is necessary to select and configure the target on which the code will execute. The target can be a “software simulator” or an “emulator connected to a board”.

Simulation using Code Composer Studio [27]

An instruction set simulator is a software tool for developing applications on TI’s PDSPs.

Simulators are an excellent platform for application development because they provide greater visibility into application behavior, are readily available, and are easy to use. Additional simulator characteristics that are critical to application development are simulation speed, simulation accuracy, and the ability to run complete applications. TI offers instruction set simulators which ensure quick deployment of applications into end systems as an integral part of the CCS IDE. The rich integrated development environment (IDE) offers a number of features to speed up the various phases of application development, debug, and optimization. CCS IDE supports complete application simulation, easy migration between simulation and emulation environments, device level simulation, BIOS and RTDX.


Simulators provide an excellent development platform that helps the developer meet their goals.

The advantages of simulation are:

• Easy to use. No additional setup is required. Simulators, being software, can be distributed easily and are usually less expensive.
• Provide excellent control and repeatability to the user: a simulator can run in an identical manner time after time. In the hardware scenario, repeatability of external events like interrupts is almost impossible to guarantee.
• Flexibility. Some aspects could be ignored if necessary, to provide an environment more suited to the particular phase of development.
• Provide visibility into the application behavior as well as resource usage. The details, which can be provided on the simulators, may be difficult to obtain on the hardware.

Simulators have some limitations: they are not real systems and are therefore limited in the extent to which they can model hardware. TI provides different flavors of simulator, distinguished by the range of detail and the extent of hardware modeled.

Range of detail:
• Functional: provides a programmer's view of the model
• Cycle accurate: models 100% of the pipeline and latencies

Extent of hardware modeled:
• CPU/core simulator: models the CPU core only
• Device simulator: models the CPU, caches, DMA and peripherals
• System/SoC simulator: multi-core simulator with multiple cores, e.g. ARM + DSP


4.2.3 Hardware – FPGA: Altera DE1 Development Board

Figure 9: Altera DE1 Development Board [28]

The Altera DE1 Development and Education board features a state-of-the-art Cyclone® II 2C20 FPGA in a 484-pin package. All important components on the board are connected to pins of this chip, allowing the user to control all aspects of the board’s operation. For simple experiments, the DE1 board includes a sufficient number of robust switches (of both toggle and push-button type), LEDs, and 7-segment displays. For more advanced experiments, there are SRAM, SDRAM, and Flash memory chips.

For experiments that require a processor and simple I/O interfaces, it is easy to instantiate Altera’s Nios II processor and use interface standards such as RS-232 and PS/2. For experiments that involve sound or video signals, there are standard connectors for microphone, line-in, line-out (24-bit audio CODEC), SD memory card connector, and VGA; these features can be used to create CD-quality audio applications and video.

The following hardware is provided on the DE1 board [28]:

• Altera Cyclone® II 2C20 FPGA device
• Altera Serial Configuration device – EPCS4
• USB Blaster (on board) for programming and user API control; both JTAG and Active Serial (AS) programming modes are supported
• 512-Kbyte SRAM
• 8-Mbyte SDRAM
• 4-Mbyte Flash memory
• SD Card socket
• 4 pushbutton switches
• 10 toggle switches
• 10 red user LEDs
• 8 green user LEDs
• 50-MHz oscillator, 27-MHz oscillator and 24-MHz oscillator for clock sources
• 24-bit CD-quality audio CODEC with line-in, line-out, and microphone-in jacks
• VGA DAC (4-bit network) with VGA-out connector
• RS-232 transceiver and 9-pin connector
• PS/2 mouse/keyboard connector
• Two 40-pin Expansion Headers with resistor protection
• Powered by either a 7.5V DC adapter or a USB cable


CYCLONE II FPGA [28]

Altera® Cyclone II FPGAs extend the low-cost FPGA density range to 68,416 logic elements (LEs) and provide up to 622 usable I/O pins and up to 1.1 Mbits of embedded memory. Altera’s latest generation of low-cost FPGAs, the Cyclone II family, offers 60% higher performance and half the power consumption of competing 90-nm FPGAs. The low cost and optimized feature set of Cyclone II FPGAs make them ideal solutions for a wide array of automotive, consumer, communications, video processing, test and measurement, and other end-market solutions.

Features of Cyclone II FPGA

The Cyclone II device family offers the following features:

• High-density architecture with 4,608 to 68,416 LEs
• Embedded multipliers
• Advanced I/O support
• Flexible clock management circuitry
• Device configuration
• Intellectual property

4.2.4 Software – FPGA

Software Tools for DSP system development

(1) Simulink [29]: Simulink is a software tool from MathWorks that is used for modeling, simulating and analyzing dynamic systems. Altera’s DSP Builder runs as an integral part of Simulink. The DSP Builder Standard and Advanced blocksets appear in the Simulink Library browser. DSP Builder works within the model-based design methodology. An executable specification is created using standard Simulink blocksets. After the functionality and dataflow issues have been defined, DSP Builder can be used to specify the hardware implementation details for a specific Altera FPGA board or device. DSP Builder can run all downstream implementation steps by invoking Altera’s Quartus II EDA tool for place and route and for generating the bitstream that configures the FPGA.

(2) Quartus II [30]: Altera provides various tools for development of hardware and software for

embedded systems. Altera’s Quartus II design software provides a complete design environment

that easily adapts to your specific design requirements. The CAD flow involves the following

steps:

1. Design Entry – the desired circuit is specified either by means of a schematic diagram, or by

using a hardware description language, such as VHDL or Verilog

2. Synthesis – the entered design is synthesized into a circuit that consists of the logic elements

(LEs) provided in the FPGA chip

3. Functional Simulation – the synthesized circuit is tested to verify its functional correctness;

this simulation does not take into account any timing issues

4. Fitting – the CAD Fitter tool determines the placement of the LEs defined in the netlist into

the LEs in an actual FPGA chip; it also chooses routing wires in the chip to make the required

connections between specific LEs

5. Timing Analysis – propagation delays along the various paths in the fitted circuit are analyzed

to provide an indication of the expected performance of the circuit

6. Timing Simulation – the fitted circuit is tested to verify both its functional correctness and

timing


7. Programming and Configuration – the designed circuit is implemented in a physical FPGA

chip by programming the configuration switches that configure the LEs and establish the

required wiring connections

Figure 10: Quartus II Flow [30]


(3) SOPC Builder [31]: SOPC Builder is a powerful system development tool that enables the user to define and generate a complete system-on-a-programmable-chip (SOPC) in much less time than using traditional, manual integration methods. SOPC Builder is included as part of the

Quartus II software. SOPC Builder is a general-purpose tool for creating systems that may or may not contain a processor and may include a soft processor other than the Nios II processor.

SOPC Builder automates the task of integrating hardware components. In traditional design methods, HDL modules must be written manually to wire together the pieces of the system. In SOPC Builder, by contrast, the system components are specified in a GUI environment and SOPC Builder generates the interconnect logic automatically. SOPC Builder generates HDL files that define all components of the system, and a top-level HDL file that connects all the components together. SOPC Builder can generate either Verilog HDL or VHDL.

An SOPC Builder component is a design module that SOPC Builder recognizes and can automatically integrate into a system. Custom components can also be defined and added or selected from a list of provided components. SOPC Builder connects multiple modules together to create a top-level HDL file called the SOPC Builder system. SOPC Builder generates system interconnect fabric that contains logic to manage the connectivity of all modules in the system.

SOPC Builder modules are the building blocks for creating an SOPC Builder system. SOPC

Builder modules use Avalon interfaces, such as memory-mapped, streaming, and IRQ, for the physical connection of components.


Figure 11: Altera SOPC Builder Tool [33]

(4) Altera DSP Builder [32]: Digital signal processing (DSP) system design in Altera programmable logic devices (PLDs) requires both “high-level algorithm” and “hardware description language (HDL) development” tools. Altera’s DSP Builder integrates these tools by combining the algorithm development, simulation, and verification capabilities of The

MathWorks MATLAB and Simulink system-level design tools with VHDL and Verilog HDL design flows, including the Altera Quartus II software. DSP Builder shortens DSP design cycles by helping you create the hardware representation of a DSP design in an algorithm-friendly development environment. Existing MATLAB functions and Simulink blocks can be combined with Altera DSP Builder blocks and Altera intellectual property (IP) MegaCore functions to link system-level design and implementation with DSP algorithm development. In this way, DSP

Builder allows system, algorithm, and hardware designers to share a common development platform. The DSP Builder Signal Compiler block reads Simulink Model Files (.mdl) that contain other DSP Builder blocks and MegaCore functions. Signal Compiler then generates the

VHDL files and Tcl scripts for synthesis, hardware implementation, and simulation.


The DSP Builder standard blockset includes libraries of design building and interface blocks and

a library of blocks that represent each of the DSP MegaCore functions.

The standard blockset has the following features:

• Cycle-accurate behavioral models
• Multiple clock domain management
• Control rich with backpressure support
• Access to specific hardware device features
• Hardware-in-the-loop (HIL) support enables FPGA hardware co-simulation
• Support for importing VHDL or Verilog HDL design entities
• Tabular and graphical state machine support
• Rapid prototyping using Altera DSP development boards
• SignalTap II logic analyzer debugging support
• Direct instantiation of DSP IP cores

The DSP Builder advanced blockset does not interface directly with the DSP IP cores but instead

includes its own timing-driven IP blocks that can generate high performance FIR, CIC, NCO,

and FFT models.

The advanced blockset has the following features:

• Specification driven design with automatic pipelining and folding
• High level synthesis technology
• High performance timing-driven IP models
• Multichannel designs with automatically vectorized inputs
• Automatic generation of memory-mapped interfaces
• Simulink fixed-point types
• Single system clock for the main datapath logic
• Feed-forward datapath with minimum control
• Portability across different device families

Figure 12: FPGA development tools at a glance [33]

4.3 EXPERIMENTS

The previous chapter presented a snapshot of past literature on the comparison of hardware platforms for DSP system implementation and discussed case studies from various fields of DSP application, such as image compression and sound processing. The common factor in all these case studies is that none of them gives a “generic viewpoint” when comparing the FPGA and PDSP platforms. Each case study discusses and implements a DSP algorithm that is tailored to its intended application.


This thesis attempts to present a hardware platform comparison for the most commonly used DSP computations, such as frequency analysis using Fourier transforms, filter design and realization, and sampling rate conversion. These computations are ubiquitous in almost all DSP applications, ranging from image processing and audio/video processing to biomedical systems. The objective here is to design a set of experiments that cover these common DSP computations and implement them on both FPGA and PDSP using the design flows described in Chapter 3.

The experiments performed are explained in brief below.

Experiment # 1: Basic Sampling and Quantization

A signal is defined as a quantity that varies with time, space or any other independent variable. An analog signal is a signal that is continuous in both time and amplitude, meaning that it takes a value at every instant of time. Hence it is also called a continuous-time, continuous-valued signal. Analog-to-digital converters are used to convert an analog signal to a digital signal. A digital signal is a signal that is discrete in both time and amplitude, meaning that it takes values from a finite set only at discrete instants of time.

Sampling is defined as the process of recording the value of a signal at discrete and periodic instants of time. The time difference between two consecutive samples is called the sampling time and its reciprocal is the sampling frequency. Hence, to get the value of a signal at discrete instants, we have to “sample” the signal at periodic time intervals. By the sampling theorem, the sampling frequency must be at least twice the highest frequency present in the continuous-time signal to avoid aliasing.

Quantization is the process of mapping a large set of input values to a smaller set, for example by rounding values to some unit of precision. A device or algorithmic function that performs quantization is called a quantizer. The round-off error introduced by quantization is referred to as quantization error. Hence quantization can be thought of as a process of truncation or rounding off.

Mathematically, an analog signal can be represented as

$$x(t) = A \sin(\omega t) = A \sin(2\pi F_0 t)$$

where $A$ is the amplitude of the analog signal, $t$ is time and $F_0$ is the analog frequency. To convert the analog signal to a digital signal, we sample it at intervals of $T_s$, where $T_s$ is the sampling period.

If $n$ is the sample index, then $t = nT_s$. Hence the sampled signal can be written as

$$x(n) = A \sin(2\pi F_0 n T_s) = A \sin\left(2\pi \frac{F_0}{F_s} n\right) \quad \text{(because } T_s = 1/F_s\text{)}$$

Hence the equation of a digital signal can be represented as

$$x(n) = A \sin(2\pi F n)$$

where $F = F_0/F_s$ is the frequency of the digital signal, obtained by dividing the analog frequency by the sampling frequency.
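The sampled sinusoid can be generated directly from the digital frequency F. The following is a minimal sketch, not the code used in the experiments; the function name `sample_sinusoid` is hypothetical, and the values F0 = 20 Hz and Fs = 100 Hz are borrowed from the settings reported for Experiment 2.

```python
import math

def sample_sinusoid(amplitude, f0_hz, fs_hz, num_samples):
    """Return x(n) = A*sin(2*pi*F*n) for n = 0..num_samples-1, with F = F0/Fs."""
    digital_freq = f0_hz / fs_hz  # F, in cycles per sample
    return [amplitude * math.sin(2 * math.pi * digital_freq * n)
            for n in range(num_samples)]

# Illustrative values: F0 = 20 Hz, Fs = 100 Hz, so F = 0.2 cycles per sample.
samples = sample_sinusoid(amplitude=1.0, f0_hz=20.0, fs_hz=100.0, num_samples=10)
```

With F = 0.2 the sequence repeats every 1/F = 5 samples, so `samples[5]` returns to (numerically) zero.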

Experiment # 2: Discrete Fourier Transform

The Discrete Fourier Transform (DFT) is the frequency-domain representation of a discrete-time signal $x(n)$.

The Discrete-Time Fourier Transform (DTFT) of a discrete-time aperiodic signal is given by the following equation:

$$X(\omega) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$

The range of the DTFT is infinite. However, the range of the DFT is finite. The DFT is obtained by sampling the DTFT of $x(n)$ at $N$ equally spaced points over a period extending from $\omega = 0$ to $\omega = 2\pi$.

The DFT can be expressed as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}, \quad k = 0, 1, \ldots, N-1$$

The twiddle factor can be expressed as

$$W_N = e^{-j2\pi/N}$$

Hence $x_N$ is an $N \times 1$ matrix containing the $N$ elements $[x(0)\; x(1)\; x(2)\; \ldots\; x(N-1)]$, and $X_N$ is an $N \times 1$ matrix containing the $N$ elements $[X(0)\; X(1)\; X(2)\; \ldots\; X(N-1)]$.

Therefore, the DFT equation can be expressed as the matrix multiplication

$$X_N = [W_N]\, x_N$$

where $[W_N]$ is the $N \times N$ matrix whose entry in row $k$ and column $n$ is $W_N^{kn}$.
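The matrix form of the DFT can be illustrated with a short direct computation. This is a sketch for illustration only, assuming the twiddle matrix has entries W_N raised to the power k·n; the function names are hypothetical, and the direct product costs O(N²) operations, unlike an FFT.

```python
import cmath

def dft_matrix(n_points):
    """Build the N x N matrix whose (k, n) entry is W_N**(k*n)."""
    w = cmath.exp(-2j * cmath.pi / n_points)  # twiddle factor W_N
    return [[w ** (k * n) for n in range(n_points)] for k in range(n_points)]

def dft(x):
    """Compute X = [W_N] x as a plain matrix-vector product."""
    matrix = dft_matrix(len(x))
    return [sum(w_kn * x_n for w_kn, x_n in zip(row, x)) for row in matrix]

# A constant sequence has all of its energy in the k = 0 (DC) bin.
spectrum = dft([1, 1, 1, 1])
```

For the constant input, only X(0) is non-zero (up to rounding error), while a unit impulse would give a flat spectrum.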

Experiment # 3: Digital FIR Filters – design and realization

A digital filter is a system that performs mathematical operations on a sampled, discrete-time signal to reduce or enhance certain aspects of that signal. The primary functions of a digital filter are : to confine a signal to a prescribed frequency band like low pass or high pass, to decompose a signal into multiple sub-bands, to modify the frequency spectrum of a signal and to model the

I/O relationship of a system.


A digital filter is characterized by its transfer function obtained after taking a Z Transform of the difference equation. Mathematical analysis of the transfer function can describe how it will respond to any input. Filter Design consists of developing specifications appropriate to the required conditions like a low pass filter or high-pass filter with a specific cut-off frequency, and then producing a transfer function which meets the specifications.

There are two types of digital filters, classified based on their impulse response: Finite Impulse Response (FIR) filters and Infinite Impulse Response (IIR) filters. We shall restrict ourselves to FIR filters in this thesis. FIR filters are non-recursive, i.e. their output depends only on the present input and the past inputs. IIR filters are recursive: they have feedback, and their output depends not only on the present and past inputs but also on past outputs that are fed back into the filter.

The system transfer function of an FIR filter is given by

$$H(z) = \sum_{k=0}^{M-1} b_k z^{-k}$$

The objective here is to design an FIR filter (both low-pass and high pass) with a given cut-off frequency for both the filters. The filter shall be filtering a multi-channel signal having 4 channels.
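In the time domain, the FIR transfer function corresponds to the difference equation y(n) = Σ b_k x(n − k). The sketch below applies that equation directly; the function name and the three coefficients are hypothetical and are not the filters designed for the experiment.

```python
def fir_filter(coefficients, signal):
    """Apply y(n) = sum_k b_k * x(n - k), treating x(n) as zero for n < 0."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, b_k in enumerate(coefficients):
            if n - k >= 0:
                acc += b_k * signal[n - k]
        output.append(acc)
    return output

# Feeding a unit impulse returns the coefficients themselves, which is why
# the impulse response of an FIR filter is finite.
impulse_response = fir_filter([0.5, 0.3, 0.2], [1, 0, 0, 0, 0])
```

Because no past outputs appear in the sum, the structure is non-recursive, in line with the FIR definition given above.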

4.4 PROCEDURE FOR IMPLEMENTATION

FPGA

Simulink and DSP Builder were used as the graphical modeling and simulation tools for FPGA implementation. Simulink and DSP Builder provide toolboxes that contain pre-defined blocks required for each application domain. The toolboxes used for this project include the DSP System toolbox, the Altera DSP Builder blockset, and the Simulink General toolbox. Using the blocks provided in these toolboxes, graphical models have been constructed for each experiment. The SignalTap II Logic Analyzer method has been used to build these models.

The SignalTap II logic analyzer captures the signal activity at the output gates of the design loaded into the Altera device on the development board. The logic analyzer retrieves the values and displays them in the MATLAB workspace. A SignalTap II Logic Analyzer block in DSP Builder has a simple, easy-to-use interface, analyzes signals in the top-level design file, uses a single clock source and captures data around a trigger point: 88% of the data is pre-trigger and 12% of the data is post-trigger [32].

A SignalTap node represents a wire carrying a signal that travels between different logical components of a design file. The SignalTap II logic analyzer can capture signals from any internal device node in a design file, including I/O pins, and can analyze up to 128 internal nodes or I/O elements.

The trigger pattern describes a logic event in terms of logic levels or edges. The SignalTap II logic analyzer uses a comparison register to recognize the moment when the input signals match the data specified in the trigger pattern. The trigger pattern comprises a logic condition for each input signal. By default, all signal conditions for the trigger pattern are set to Don’t Care, masking them from trigger recognition. The designer can select one of the following logic conditions for each input signal in the trigger pattern: Don’t Care, Low, High, Rising edge, Falling edge, Either edge. The SignalTap II logic analyzer triggers when it detects the trigger pattern on the input signals [32].

Using the method described above, models can be constructed and simulated in the MATLAB workspace. Thereafter, the Simulink model is imported into Altera’s Quartus II tool using Tcl scripting commands. Altera’s tools convert the Simulink model to VHDL or Verilog code as required by the designer. The designer can then simulate the design using ModelSim. The design can be synthesized and downloaded onto the FPGA using Altera’s Quartus II tool.
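The trigger-pattern logic described above can be modeled in a few lines of software. This is only a conceptual sketch of how per-signal conditions combine into one trigger event; it is not Altera's implementation or API, and the condition names and functions are hypothetical.

```python
def condition_met(condition, previous, current):
    """Check one signal's trigger condition between two consecutive clock samples."""
    if condition == "dont_care":
        return True
    if condition == "low":
        return current == 0
    if condition == "high":
        return current == 1
    if condition == "rising":
        return previous == 0 and current == 1
    if condition == "falling":
        return previous == 1 and current == 0
    if condition == "either":
        return previous != current
    raise ValueError("unknown condition: " + condition)

def find_trigger(samples, pattern):
    """Return the first sample index where every signal meets its condition.

    samples: one tuple of 0/1 values per clock cycle, one value per signal.
    pattern: one condition name per signal.
    """
    for i in range(1, len(samples)):
        if all(condition_met(c, p, s)
               for c, p, s in zip(pattern, samples[i - 1], samples[i])):
            return i
    return None

captured = [(0, 0), (0, 1), (1, 1), (1, 0)]
trigger_index = find_trigger(captured, ("rising", "high"))
```

Here the first signal rises while the second is high at index 2, so that is where this model triggers; a pattern that never matches returns `None`.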


PDSP

The procedure for PDSP implementation is more straightforward than that for FPGA implementation. Code Composer Studio (CCS) is the software tool provided by Texas Instruments along with its C5515 Evaluation Module. CCS runs as an Eclipse-based GUI in a Microsoft Windows environment. Designers can choose to write either assembly language programs or C language programs to create projects in CCS. The user has to specify the device family (in this case C5515 EVM) and the type of debugging method. Two types of debugging methods are generally provided: simulation and emulation. On-board hardware emulation is possible through a USB cable connected to the host computer. Emulation provides a way for users to inspect the inner details of a TI digital signal processor and helps product development by means of a hardware device (the emulator). Emulation also has the benefit of providing the scenario that is closest to the end product while still maintaining control over the device. However, for the purposes of this thesis, emulation has not been used due to limitations of CCS version 5.3 and the non-availability of specific drivers compatible with that software version. Instead, the TI simulator for the C5515 EVM provided by CCS has been used to code, build, debug and simulate the designs. This drawback has limited the extent of analysis that can be performed on the PDSP-based design, and hence only parameters like lines of code, design time and have been recorded for comparison with FPGA.

4.5 OBSERVATIONS AND RESULTS

(1) Quantization :

FPGA :

Quantization Interval = 5

Quantized Values Range (0, 5, 10, 15, …)


The Quantizer block passes its input signal through a stair-step function so that many neighboring points on the input axis are mapped to one point on the output axis. The effect is to quantize a smooth signal into a stair-step output. The output is computed using the round-to-nearest method, which produces an output that is symmetric about zero:

$$y = q \cdot \mathrm{round}(u/q)$$

where $y$ is the output, $u$ the input, and $q$ the Quantization interval parameter.

Data Type Support: The Quantizer block accepts and outputs real or complex signals of type single or double. For more information, see Data Types Supported by Simulink in the Simulink documentation.

Quantization interval: The interval around which the output is quantized. Permissible output values for the Quantizer block are n*q, where n is an integer and q the Quantization interval. The default is 0.5.
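The quantizer equation y = q · round(u/q) can be reproduced in a few lines. This is a sketch rather than the Simulink block itself: it assumes that ties are rounded away from zero, which is consistent with the symmetry about zero described above, and the function names are hypothetical. Python's built-in `round` is avoided because it rounds ties to the nearest even integer.

```python
import math

def round_half_away(value):
    """Round to the nearest integer, with ties going away from zero (assumed)."""
    return math.copysign(math.floor(abs(value) + 0.5), value)

def quantize(u, q):
    """Return y = q * round(u / q) for a quantization interval q > 0."""
    return q * round_half_away(u / q)

# Quantization interval of 5, as used in Experiment 1.
staircase = [quantize(u, 5) for u in (0, 2, 3, 7, 8, 12, -7)]
```

Every output is an integer multiple of the interval, so inputs between 2.5 and 7.5 all map to 5, which is what produces the stair-step shape.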

Screenshots:

Figure 13: Experiment 1 Simulink Model


Figure 14: Experiment 1 Simulink Simulation

Figure 15: Experiment 1 Quartus II


Figure 16: Experiment 1 PDSP

(2) Discrete Fourier Transform

Fo= 20 Hz (Frequency of Continuous Time Signal)

Fs= 100 Hz (Sampling Frequency) Ts= 0.01 seconds

Number of Samples per Frame =256

Total Simulation time = 10 seconds

Total Number of Samples = (Total Simulation Time) / (Sampling Time) = 10 / 0.01 = 1000 samples

Total Number of Frames = (Total Number of Samples) / (Number of Samples per Frame) = 1000 / 256 ≈ 3.9, i.e. approximately 4 frames
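The frame arithmetic above can be checked with a short calculation. The sketch below also estimates where the 20 Hz tone would fall in the spectrum, under the assumption (not stated in the text) that a 256-point DFT is applied to each frame; all variable names are illustrative.

```python
import math

F0_HZ = 20          # frequency of the continuous-time signal
FS_HZ = 100         # sampling frequency
FRAME_SIZE = 256    # samples per frame
SIM_TIME_S = 10     # total simulation time in seconds

total_samples = FS_HZ * SIM_TIME_S            # equivalent to SIM_TIME_S / Ts
frames_exact = total_samples / FRAME_SIZE     # fractional number of frames
full_frames = total_samples // FRAME_SIZE     # completely filled frames
frames_rounded_up = math.ceil(frames_exact)   # frames if the last is partial

# Assumed 256-point DFT per frame: spacing between bins and location of F0.
bin_spacing_hz = FS_HZ / FRAME_SIZE
peak_bin = F0_HZ / bin_spacing_hz
```

Three frames are completely filled and the fourth is only partly filled, which is why the count is stated as approximately 4. Under the 256-point assumption the tone lies between bins 51 and 52, so some spectral leakage would be expected.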


Figure 17: Experiment 2 Simulink Model_1


Figure 18: Experiment 2 Simulink Model_2

Figure 19: Experiment 2 Simulink Simulation

Figure 20: Experiment 2 Quartus II


Figure 21: Experiment 2 PDSP

(3) Digital Filter

Using Filter Design Block in Simulink

Digital Filter Design block & Filter Realization Wizard [MATHWORKS HELP]

Overview of the Digital Filter Design Block

The Digital Filter Design block can be used to design and implement a digital filter. It can filter single-channel or multichannel signals. The Digital Filter Design block is ideal for simulating the numerical behavior of a filter on a floating-point system, such as a personal computer or DSP chip.

Filter Design and Analysis: All filter design and analysis can be performed within the Filter

Design and Analysis Tool (FDATool) GUI, which opens with the Digital Filter Design block.

FDATool provides extensive filter design parameters and analysis tools such as pole-zero and impulse response plots.


Filter Implementation: Once a filter is designed using FDATool, the block automatically

realizes the filter using the filter structure specified. The block can then be used to filter signals

in a Simulink model. The filter can also be fine-tuned by changing the filter specification

parameters during a simulation.

Guidelines when Selecting a Filter Design Block

Users can design and implement digital filters using either the Digital Filter Design block or the Filter Realization Wizard. The similarities and differences between these blocks, listed below, indicate which block is best suited for specific needs.

Similarities:

• Filter design and analysis options: Both blocks use the Filter Design and Analysis Tool (FDATool) GUI for filter design and analysis.
• Output values: If the output of both blocks is double-precision floating point, single-precision floating point, or fixed point, the output values of both blocks numerically match the output of the filter method of the dfilt object.

Differences:

• Filter implementation method: The Digital Filter Design block opens the FDATool GUI to the Design Filter panel. It implements filters using the Digital Filter block. These filters are optimized for both speed and memory use in simulation and in C code generation. The Filter Realization Wizard opens the FDATool GUI to the Realize Model panel. The block can implement filters in two different ways. It can use the Simulink Sum, Gain, and Delay blocks, or it can use the Digital Filter block. If a filter is implemented using the Digital Filter block, it is bound by the type of filters this block supports. If a filter is implemented by the Filter Realization Wizard using Sum, Gain, and Delay blocks, inputs to the filter must be sample based.
• Supported filter structures: Both blocks support many of the same basic filter structures, but the Filter Realization Wizard supports more structures than the Digital Filter Design block. This is because the block can implement filters using Sum, Gain, and Delay blocks.
• Multichannel filtering: The Digital Filter Design block can filter multichannel signals. Filters implemented by the Filter Realization Wizard can only filter single-channel signals.
• Data type support: The Digital Filter block supports single- and double-precision floating-point computation for all filter structures and fixed-point computation for some filter structures. The Digital Filter Design block only supports single- and double-precision floating-point computation.

Guidelines regarding when to use each block

Digital Filter Design Block

• Use to simulate single- and double-precision floating-point filters.

• Use to filter multichannel signals.

• Use to generate highly optimized ANSI® C code that implements floating-point filters for embedded systems.

Filter Realization Wizard

• Use to simulate numerical behavior of fixed-point filters in a DSP chip, a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).

• Use to simulate single- and double-precision floating-point filters with structures that the Digital Filter Design block does not support.

• Use to visualize the filter structure, as the block can build the filter from Sum, Gain, and Delay blocks.

• Use to generate multiple filter blocks rapidly.

A multi-channel signal with four channels at 30 Hz, 50 Hz, 95 Hz and 110 Hz is to be separated into a high-pass band (95 Hz and 110 Hz), using a high-pass filter with a cut-off frequency of 80 Hz, and a low-pass band (30 Hz and 50 Hz), using a low-pass filter with a cut-off frequency of 65 Hz. The sampling frequency used is 300 Hz.
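As a rough software cross-check of this band separation, the sketch below designs two windowed-sinc FIR filters in plain Python and evaluates their gain at the four channel frequencies. The filter length (101 taps), the Hamming window and all function names are assumptions made for illustration; the filters in the experiment itself were designed with FDATool inside Simulink, and their order and structure are not restated here.

```python
import math

FS = 300.0  # sampling frequency in Hz, from the experiment description


def lowpass_taps(cutoff_hz, num_taps=101, fs=FS):
    """Windowed-sinc low-pass FIR taps (Hamming window), normalized to unity DC gain."""
    fc = cutoff_hz / fs                 # cutoff in cycles per sample
    mid = (num_taps - 1) / 2            # centre tap index (num_taps is odd)
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def highpass_taps(cutoff_hz, num_taps=101, fs=FS):
    """High-pass taps obtained by spectral inversion of the low-pass design."""
    taps = [-t for t in lowpass_taps(cutoff_hz, num_taps, fs)]
    taps[(num_taps - 1) // 2] += 1.0    # add a unit impulse at the centre tap
    return taps


def gain_at(taps, freq_hz, fs=FS):
    """Magnitude of the FIR frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)


LOW = lowpass_taps(65.0)    # keeps the 30 Hz and 50 Hz channels
HIGH = highpass_taps(80.0)  # keeps the 95 Hz and 110 Hz channels

for f in (30, 50, 95, 110):
    print(f"{f:>3} Hz: low-pass gain {gain_at(LOW, f):.3f}, "
          f"high-pass gain {gain_at(HIGH, f):.3f}")
```

With these assumed settings the transition band of each filter is roughly 10 Hz wide, so every channel frequency lies well inside either a passband or a stopband.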

Figure 22: Experiment 3 Simulink Model_1


Figure 23: Experiment 3 Simulink Model_2

Figure 24: Experiment 3 Simulink Simulation


Figure 25: Experiment 3 Quartus

4.6 COMPARISON OF THE RESULTS

In the table given below, the PDSP and FPGA implementations have been compared on the basis of various factors listed. The procedure for implementation has already been described in detail in section 4.4 and the observations and results have been explained in section 4.5.

Multiple parameters have been used to compare the performance. The parameters have been grouped together into three groups based on their nature. In the “first group”, we look at the nature of each hardware platform, the type of hardware and software used and the programming method required for each platform.

In the “second group”, we compare based on performance. Performance is measured in terms of lines of code required, execution time and hardware resource utilization. Processing performance is one of the most important characteristics that distinguish one hardware platform from the other. These are the parameters that matter the most to DSP engineers, as they give a clear picture of the nature and type of hardware to be used for a specific class of DSP applications.

The “third group” consists of those parameters that are an offshoot of the architectural structure inherent to each hardware device. Architecture is the parent parameter: design size and design effort depend on, and are directly derived from, the architecture of each hardware platform.

In the table given below, we can see that the primary difference is the method of implementation associated with each hardware platform. For the FPGA implementations, graphical modeling using Simulink and DSP Builder results in an estimated 130, 77 and 130 lines of code for the three experiments, as compared to 43, 112 and 115 lines of code in C for the PDSP implementations. On average, therefore, the FPGA implementation requires approximately 112 lines of code versus 90 lines for the PDSP. Using a high-level language like C thus results in fewer lines of code on average, although not in every case: for the DFT experiment, the C program (112 lines) is longer than the Simulink estimate (77 lines). A strong caveat also needs to be added, as the Simulink figures are only rough estimates.
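The averages quoted above can be reproduced with a few lines of Python. The counts are taken directly from the text and Table 4; the variable names are illustrative only.

```python
experiments = ["Sampling & Quantization", "DFT", "Digital Filters"]
simulink_loc = [130, 77, 130]  # estimated Simulink lines of code (FPGA flow)
c_loc = [43, 112, 115]         # hand-written C lines of code (PDSP flow)

fpga_avg = sum(simulink_loc) / len(simulink_loc)
pdsp_avg = sum(c_loc) / len(c_loc)

# Experiments in which the C program is shorter than the Simulink estimate.
c_shorter = [name for name, s, c in zip(experiments, simulink_loc, c_loc) if c < s]

print(f"FPGA average: {fpga_avg:.1f} lines, PDSP average: {pdsp_avg:.1f} lines")
print("C is shorter for:", ", ".join(c_shorter))
```

The per-experiment comparison shows that the lower PDSP average is driven mainly by the first experiment.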

Simulink, being a graphical modeling tool, does not involve writing lines of code. The method adopted here is to estimate the lines of code for the Simulink-based FPGA implementation as if each block were a module having one input, one output and at least one parameter whose value is set by the block. This approach yields the lines-of-code values given in the table. The VHDL lines of code for the three experiments are 692, 430 and 554. The VHDL code was generated by Altera’s Quartus II tool when the Simulink and DSP Builder model was imported into Quartus II for hardware synthesis on the FPGA. It would not be wise to compare the C and VHDL lines of code in this case, because the conversion of a graphical model to HDL involves three tools (Simulink, DSP Builder and Quartus II) working in tandem to generate the VHDL code.

Resource utilization is a metric unique to the FPGA design, as the designer creates the intended design from scratch using the logic gates on the FPGA to host the DSP design. Hence a proper figure for resource utilization can be obtained for the FPGA implementation after compiling the imported design in Quartus II. The resource utilization was found to be less than 5%, owing to the small number of gates and the simple logic involved in the DSP computations.

However, a designer can construct a DSP design that involves a large number of elements such as multipliers, FFT and FIR computation blocks (dedicated DSP blocks) along with a processor core, memory and peripherals. Such a design will have much higher resource utilization, as a large number of gates on the FPGA will be used. Resource utilization cannot be explicitly measured for the PDSP chip, as the processor and other components are already fabricated on the chip and the designer makes use of these existing resources to build a DSP application.
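To put the "less than 5%" figure in concrete terms, the following sketch converts between a logic-element count and a utilization percentage for the EP2C20 device, whose capacity of 18,752 logic elements is listed in Table 4. The function names and the sample count of 600 elements are hypothetical; the actual counts are those reported by Quartus II for each experiment.

```python
EP2C20_LOGIC_ELEMENTS = 18_752  # device capacity listed in Table 4


def utilization_percent(used, available=EP2C20_LOGIC_ELEMENTS):
    """Share of the device's logic elements consumed by a design, in percent."""
    if not 0 <= used <= available:
        raise ValueError("used must lie between 0 and the device capacity")
    return 100.0 * used / available


def max_elements_for(percent, available=EP2C20_LOGIC_ELEMENTS):
    """Largest whole number of logic elements that stays within the given percentage."""
    return int(available * percent / 100.0)


print("Logic-element budget at 5%:", max_elements_for(5))
print(f"Hypothetical 600-element design: {utilization_percent(600):.2f}%")
```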

When we consider the architecture of each type of hardware, we notice a distinct difference. The PDSP has a pre-defined and pre-fabricated architecture consisting of a processor optimized for DSP needs, memory, peripherals and other DSP-specific components, whereas the FPGA is just a sea of logic gates that must be configured before any DSP application can be executed on it. Therefore, a DSP designer who is not overly concerned about the intricacies of the host architecture on which the DSP application runs should ideally be satisfied with buying an off-the-shelf DSP board and conducting experiments on it. However, the designer will be constrained to operate within the limits that the PDSP host architecture imposes on its users.

On the other hand, a DSP engineer opting for an FPGA-based design has the option of building an entirely new architecture. Usually, building a processor-based architecture on an FPGA is a complex and time-consuming process that requires writing thousands of lines of VHDL or Verilog code and testing it with test benches. However, with the advent of high-level synthesis tools like Simulink and Altera’s DSP Builder, the complexity of assembling an architecture can be cut down drastically by switching to graphical modeling and simulation using pre-defined building blocks made available to the designer.

Hence, the task of the designer is simplified to a large extent, which helps the designer build a custom architecture and optimize it for the needs of a specific DSP application domain. Such an option is not possible with a PDSP chip, which, even though it has been tailor-made for DSP applications, cannot be re-configured to fine-tune its architecture for a particular DSP application domain.

An extension of the architecture is the design size and design effort required for the hardware platforms. Generally, design size of PDSP designs is smaller as compared to FPGA designs because of the use of high-level programming language and fixed architecture that has already been optimized for area. In FPGA designs however, the designer has to fit the design within the logic fabric available on the FPGA chip itself and it has been observed that FPGA designs are large and also require some gates for configuring the logic.

In terms of design effort, PDSP designs are quick to implement if the design complexity is small and the designer has the requisite programming knowledge. If the designer is not concerned about the internal details of the architecture, PDSP based implementation is fast. In FPGA based design, the process becomes a lot easier due to almost complete elimination of hand-written coding. It relies on graphical modeling tools to make the process of design easier, simpler and faster.


Table 4: Comparing FPGA and PDSP Implementation Results

| PROPERTY | FPGA | PDSP |
|---|---|---|
| NATURE OF THE PLATFORM | | |
| Hardware | Altera DE-1 Board; Cyclone II FPGA | Texas Instruments C5515 DSP Evaluation Module |
| Software | MathWorks Simulink (R2011a); Altera DSP Builder; Altera Quartus II; ModelSim | TI Code Composer Studio 5.3 |
| Programming Method | Graphical modeling. Requires little or no hand-written programs. | C programming |
| PERFORMANCE | | |
| Lines of Code: Sampling & Quantization | Simulink* = 130 | 43 |
| Lines of Code: DFT | Simulink* = 77 | 112 |
| Lines of Code: Digital Filters | Simulink* = 130 | 115 |
| Lines of Code: Average | 112.3 | 90 |
| Execution Time [43] [44] | Execution time is shorter than PDSPs. Faster execution observed in cases where DSP logic is downloaded onto FPGA. Even if an embedded processor like Nios II is included, execution time is lesser as Nios II clock frequency is 200 MHz. | Execution time is longer than FPGAs. Maximum CPU clock frequency is 120 MHz. Also, instruction is executed in 12 pipelined stages. |
| Resource Utilization [43] [44] | Specific value is reported by Altera’s tools depending upon the number of logic elements used. EP2C20 chip (18,752 logic elements); 484-pin BGA package; 26 multipliers (18x18 bit), 250 MHz; 52 M4K blocks (1 block = 4 Kbit); Nios II processor (200, 185, 165 MHz) | Though no values reported due to tool limitations, pipelined instruction execution utilizes a large amount of CPU hardware. C5515 chip; 196-pin BGA package; dual multipliers (240 MHz); 320 KB RAM & 128 KB ROM; CPU (60, 75, 100, 120 MHz) |
| ARCHITECTURE | | |
| Architecture | Flexible and tailor-made architecture can be designed by the designer. | Fixed architecture cannot be changed. |
| Design Size | Design size shall be large if components like processors, memories and peripherals are included with DSP modules in the design. | Design size is small as processor architecture is pre-fabricated on the chip. |
| Design Effort | Faster only if simple blocks need to be plugged in and connected. Designs take longer time if an entire processor-based system needs to be created and synthesized on an FPGA board. | Fast if the programmer is adept at C or assembly. Also need not spend time coding for architecture. Only needs to make best use of it. |
| Familiarity | Helps hardware engineers familiar with design flow to help design DSP systems. Familiarity with MATLAB helps DSP engineers acclimatize to FPGA-based design. | DSP engineers prefer PDSPs over FPGAs due to limited exposure to hardware design tools. |


*Note: The Simulink / DSP builder design tools use graphical blocks supplemented with HDL code instead of C code. Therefore, to find an equivalent measurement for the total Lines of Code in the FPGA design, an assumption has to be made about the blocks used. The blocks required the designer to specify a minimum of three basic details to instantiate the block: the block function, the input(s), and the output(s). The methodology provided one line of code for each block placed and one line for each input and output parameter. Moreover, the blocks also contained user-defined parameters that have to be set for each block instantiation. Hence, for every parameter needed to define the block, another line of code was added to the count. In this manner, an estimate of the Lines of Code for Simulink modeling has been made.
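The counting rule described in this note can be sketched as follows: one line per block, one line per input and per output, and one line per user-defined parameter. The block list below is a hypothetical three-block model chosen only to exercise the rule; it is not one of the Simulink models from the experiments, and the class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A graphical block, described only by what the counting rule needs."""
    name: str
    inputs: int = 1
    outputs: int = 1
    parameters: int = 1


def estimated_loc(blocks):
    """One line per block, plus one per input, output and parameter."""
    return sum(1 + b.inputs + b.outputs + b.parameters for b in blocks)


# Hypothetical model: a source, a gain stage and a sink.
model = [
    Block("Sine Wave", inputs=0, outputs=1, parameters=3),
    Block("Gain"),
    Block("Scope", inputs=1, outputs=0, parameters=0),
]

print("Estimated lines of code:", estimated_loc(model))
```

Because the estimate grows with every port and parameter, models built from many small blocks score higher than an equivalent model built from a few large ones, which is one reason the Simulink figures should be read as rough estimates.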


CHAPTER 5: CONCLUSIONS AND FUTURE WORK

5.1 INTRODUCTION

In the previous chapters, most of the issues that the author came across while researching this subject have been covered extensively. This includes a background study and literature review that discussed the impact of VLSI technology on DSP, the evolution of DSP hardware such as the PDSP and FPGA, the design methodologies associated with each hardware platform, and three case studies from past research that deal specifically with the PDSP vs. FPGA comparison. The experiments formulated and described, and the consequent observations and results, have demonstrated the inherent advantages and disadvantages of each hardware platform.

The next section of this chapter lists the conclusions in a comparison format based on the various points the author has observed for the two hardware platforms. This chart should serve as a generic template that can help engineers working in this area to make informed decisions about hardware selection. Since the intention of the author was not to restrict the debate about this topic to the confines of an industry perspective, this template should also be helpful for students and academic researchers.

In the penultimate section, the core objective is to envision the future scope and direction of emerging trends and developments in the DSP hardware implementation field. The aim is to provide an insight into happenings in this space and to explore a few evolving trends and developments that may help us chart the future course of discussion of this research area.


5.2 CONCLUSIONS: TEMPLATE FOR HARDWARE PLATFORM SELECTION

In this section, we present a template for hardware platform selection for DSP design using either PDSPs or FPGAs. Over the course of this thesis, we have come across multiple factors that must be taken into account when the type of hardware needs to be zeroed in on for DSP system development.

In Table 5 below, we list the factors that are influenced by the nature of the hardware and that have an effect on the processing performance of the DSP application being implemented. These are speed, cost, power consumption, area/size, prototyping ability, field programmability and the availability of pre-designed hard and soft IP cores.

The second table (Table 6) lists the external or generic factors that indirectly affect the design and implementation of DSP systems. We have called them external or generic because they can be seen in virtually any kind of system development, whether it is purely software based or uses general-purpose microprocessors, ASICs, PDSPs, FPGAs or microcontrollers.

From the two tables, we can list the specific advantages associated with PDSPs and FPGAs as given below.

Advantages of a PDSP based design methodology:-

• Conventional design methodology that is well developed and mature.

• PDSP based DSP system design is beneficial when the designer is not concerned about the explicit internal details of the architecture and does not require an architecture that needs to be tailor-made and fine-tuned for a specific class of DSP applications.

• The PDSP is designed and optimized to support a larger and wider class of DSP algorithms and applications.

• PDSP is useful whenever there is a need of high production volumes and environments

• Useful when power needs to be conserved and area needs to be optimized.

• Knowledge of logic design or HDLs not required as architecture need not be designed separately.

• Using a PDSP is akin to using a personal computer (with a general purpose microprocessor) for writing programs to build a purely software based application, albeit with the notable difference that in the case of a PDSP, the user writes high-level programs to run DSP applications that are tightly coupled to the host hardware, with no operating system in between.

Advantages of an FPGA based design methodology:-

• A modern approach that eliminates the need to write lengthy code, replacing it with graphical modeling and high-level synthesis tools that translate the models.

• Designer can construct a whole new architecture with pre-defined blocks and a wide library of hard and soft IP cores made available by the FPGA vendors.

• Best suited for low-production volume applications.

• Design can be changed and re-configured frequently and refinements can be made easily and quickly.

• Models can be re-used for new designs.


| PROPERTY | PDSP | FPGA |
|---|---|---|
| Speed [21] [22] | Slower speed due to lower clock rates for CPU cores. Execution takes longer if pipeline has large number of stages. Case studies too have reported longer execution times for PDSPs. | Faster speed due to higher clock rates of embedded processors. Also case studies have reported shorter execution time than for PDSP. |
| Cost | Useful for complex and high production volume designs as a PDSP is a pre-manufactured device with a fixed architecture. | For smaller applications and/or lower production volumes. |
| Power | Can be optimized for low power consumption before it is fabricated later. | Consumes more power. |
| Size / Area | Applications typically use all components present on a PDSP chip as instructions are executed in a pipelined fashion. | Applications may use only a few logic elements for core DSP logic. In a processor-based system, large number of logic elements required. |
| Field Programmability | Cannot be reprogrammed. Ships with a fixed architecture. | Field programmable. A new code can be downloaded and an FPGA can be reprogrammed in a short time. |
| Prototyping | PDSP can be used for prototyping subject to architecture constraints. | Can be used as a prototyping device because of reusability. More freedom because user can design all blocks practically from scratch. |
| Pre-designed hard or soft IP cores/blocks | Not available unless specifically included in hardware. Need to be designed separately. | Modern FPGAs have additional hardware/software blocks like multipliers for DSP, hard-core or soft-core processors & peripherals. Possible to design a System-on-Chip (SoC) using these blocks. |
| Parallel Processing [37] | PDSPs deliver better performance for high-speed serial processing operations. | FPGAs deliver better performance for high-speed parallel processing. |
| Heterogeneous, reconfigurable DSP hardware platforms [37] | PDSP is main processor responsible for pipelined operations. | FPGA acts as a co-processor for parallel processing operations. |

Table 5: Template for Hardware Platform Selection -1


| PROPERTY | PDSP | FPGA |
|---|---|---|
| Design Methodology | Relies more on software programming knowledge to use the underlying hardware. | Requires deeper hardware knowledge since logic needs to be designed using existing pre-designed blocks. |
| Design Cycle | Application-specific. Shorter design cycle if architectural constraints are not a hindrance. | Application-specific & availability of pre-defined blocks. |
| Ease of design cycle | Extensive C programming or assembly language required. | Graphical tools like Simulink shorten the development cycle. |
| Reusability of design | Reusability limited to high-level C code. Assembly language code cannot be ported to a different architecture. | Reusability of FPGA is the main advantage. Prototype of the design can be implemented on FPGA which could be verified for almost accurate results. If the design has faults, the HDL code is modified, and FPGA can be reprogrammed to test the design. |
| Knowledge of HDLs | Not required. All programming in either C or assembly. | May be required in order to work with FPGA tools. |
| Knowledge of Digital Logic | Not required. Architecture is fixed. | Requires proficiency in logic design. |
| Non Recurring Engineering Costs | NRE refers to the one-time cost of researching, designing, and testing a new product. Since design effort and research costs are higher when designing a specialized PDSP chip architecture, NRE costs for PDSPs are higher. | Typically, FPGA vendors classify FPGAs within “families”. Developing a new family of FPGAs has high NRE costs; versions within a family are simple tweaks and hence have lower NRE costs. |
| Time-to-Market | Inconclusive. Depends on the design cycle for each application. | May have faster time-to-market due to smaller design cycle. However, large designs require more building blocks that increase complexity. Hence may not keep pace as compared to a PDSP with fixed hardware architecture. |
| Current State | Still popular among core DSP engineers who are more familiar with design cycle. | Newer approach. Slowly gaining foothold in academia and industry. Enables a hardware engineer to learn and implement DSP easily. |

Table 6: Template for Hardware Platform Selection -2


5.3 FUTURE TRENDS: CO-PROCESSORS – A HYBRID APPROACH

An emerging trend in the VLSI based implementation of DSP systems that has been observed is not to restrict the implementation to a single platform like PDSP or FPGA. A thoughtful effort in this direction has been to provide a “hybrid solution” with the objective of maximizing the DSP system performance through the use of an FPGA as a co-processor to PDSP. This hybrid or heterogeneous approach has been the subject of various white papers published by hardware industry vendors like Altera, Xilinx and Texas Instruments.

A common thread that this author has observed when reading numerous white papers by the FPGA vendors Altera and Xilinx is the ability to couple the FPGA as a co-processor to an existing PDSP manufactured by Texas Instruments. This alliance of PDSP and FPGA vendors has opened a new arena in the quest to find an optimal, low-cost and effective platform for DSP system development and implementation.

Presented below is a comprehensive analysis of the material published in selected white papers of Altera and Xilinx that stresses the topic of FPGA co-processing.

As stated before, DSP applications were conventionally either implemented on a general-purpose DSP processor or built using ASIC technology. Despite the availability of high-performance DSP processors, they may not be suited to all kinds of DSP applications. Their general-purpose architecture makes these DSP processors flexible, but they may not be fast enough or cost-effective for all systems. FPGAs and ASICs offer faster processing speed and more functionality to support more advanced features when compared to PDSPs. The choice between an ASIC and an FPGA depends on the application. ASICs were used whenever the DSP application required performance beyond the abilities of programmable DSPs, or when the expected system volumes justified a semi-custom or full-custom ASIC solution. However, an FPGA implementation can offer faster time-to-market and lower cost than an ASIC design. FPGAs also offer the added benefit of re-configurability when the design specification changes. On the other hand, an ASIC may be the right solution for a large-volume, very high-speed, or power-sensitive application [34] [35] [40] [41].

High-performance DSP platforms, based on general-purpose DSP processors running algorithms developed in C, have been migrating towards the use of an FPGA pre-processor or co-processor. The prime motivators for this migration are significant performance, power, and cost advantages. Despite these benefits, design teams accustomed to traditional DSP development in a high-level language may avoid using FPGAs because they lack the hardware skills necessary to use one as a co-processor. Unfamiliarity with traditional hardware design languages such as VHDL and Verilog limits or prevents the use of an FPGA, which may result in more expensive and power-hungry designs. A new group of emerging design tools called ESL (electronic system level) promises to address this methodology issue, allowing processor-based developers to accelerate their designs with FPGAs while maintaining a common design methodology for hardware and software. The performance and cost of the DSP system are optimized, and system power requirements are lowered, by offloading operations that require high-speed parallel processing onto the FPGA and leaving operations that require high-speed serial processing on the DSP. Another approach may entail the creation of an independent hardware accelerator for one of the Xilinx embedded processors. The processor remains the primary target for the C routines, except that performance-critical operations are pushed to the FPGA logic in the form of a hardware accelerator. This provides a more software-centric design methodology, albeit with a trade-off in processing performance [36].

FPGAs and DSP processors have fundamentally dissimilar architectures. An algorithm that is well suited for implementation on one may be very inefficient on the other. For instance, a hardware system based solely on DSP processors may require more area, cost, or power if the target application requires a large amount of parallel processing or a maximized multichannel throughput, because discrete DSPs do not scale well for parallel processing. An FPGA coprocessor can provide up to 550 parallel multiply and accumulate operations on a single device, delivering the same performance with fewer devices and lower power for many applications. On the other hand, while FPGAs excel at processing large amounts of data in parallel, they are not as optimized as DSP processors for tasks such as periodic coefficient updates, decision-making control tasks, or high-speed serial mathematical operations. Combining an FPGA with a DSP processor delivers successful solutions for a wide range of applications [37].

Nevertheless, FPGAs bring two key advantages to digital signal processing. First, their architectures are well suited for highly parallel implementation of DSP functions, allowing for very high performance. Second, user programmability allows designers to trade off device area against performance by selecting the appropriate level of parallelism to implement their functions. By programming the FPGA to use more on-chip resources, designers can achieve higher performance. By using fewer resources (and accepting a correspondingly lower performance), designers can optimize the design for low cost [39].
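The area versus performance trade-off can be illustrated with a simple cycle-count model for an FIR filter. The sketch assumes that each multiply-accumulate (MAC) unit completes one operation per clock cycle and ignores adder-tree latency, pipeline fill and memory bandwidth, so it is an idealized estimate rather than a description of any particular device. The 128-tap filter and 200 MHz clock are hypothetical; the figure of 26 multipliers is the EP2C20 count listed in Table 4.

```python
import math


def fir_cycles_per_sample(num_taps, mac_units):
    """Clock cycles per output sample when mac_units MACs operate in parallel."""
    if num_taps < 1 or mac_units < 1:
        raise ValueError("num_taps and mac_units must be positive")
    return math.ceil(num_taps / mac_units)


def max_sample_rate_hz(clock_hz, num_taps, mac_units):
    """Highest sustainable sample rate under the one-MAC-per-cycle assumption."""
    return clock_hz / fir_cycles_per_sample(num_taps, mac_units)


TAPS = 128             # hypothetical filter length
CLOCK_HZ = 200e6       # hypothetical clock frequency

for macs in (1, 2, 26, 128):
    cycles = fir_cycles_per_sample(TAPS, macs)
    rate = max_sample_rate_hz(CLOCK_HZ, TAPS, macs)
    print(f"{macs:>3} MAC unit(s): {cycles:>3} cycles/sample, "
          f"up to {rate / 1e6:.2f} MS/s")
```

Each additional MAC unit costs device area but reduces the cycle count, which is the choice of "appropriate level of parallelism" described above.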

While FPGA architecture has some nice capabilities, several areas must come together for a good co-processing solution:

• Silicon foundation: Logic, DSP, and power management

• Arithmetic foundation: Operator cores optimized for FPGAs and datapath compiler

• Library foundation: Function optimized for specific FPGA resources

• System level: Co-processing tool chain with CPU interface bandwidth and [38]


Heterogeneous, reconfigurable DSP hardware platforms, which include both a DSP processor and an FPGA and are supported by a platform-based design methodology, enable traditional DSP designers not familiar with FPGAs to quickly evaluate the benefits an FPGA coprocessor can bring to their applications. They provide off-the-shelf hardware that addresses the most important design challenges yet is still sufficiently customizable to allow for product differentiation. These platforms limit degrees of freedom in hardware, thereby providing greater automation in the design flow. This automation can help eliminate complexity, thus extending the advantages of heterogeneous platforms to the DSP design community [37].

A heterogeneous system improves exploitation of pipelining and parallel processing, which are essential to achieve high frame rates and low latency. Developing this type of system requires proficiency in both FPGA and DSP processor design, plus the systems engineering skills necessary for partitioning, a breadth of skills few designers possess. A heterogeneous platform-based design flow extends the design automation concepts adopted by the individual processor and FPGA design flows to the entire platform. The basic function of a platform-based design, abstracting away the hardware and software interface details between the FPGA and DSP processor, allows a DSP designer with little or no FPGA design experience to evaluate and exploit the benefits of adding an FPGA. This design flow should automatically generate memory maps, header and driver files for the software interface, and hardware interface and interrupt logic. Refining the overall system should have limited consequences on individual hardware and software components [37].
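As an illustration of the kind of artifact such a design flow might generate, the sketch below produces a C header of register addresses from a simple register list. Everything in it is hypothetical: the base address, the register names, the `FPGA_COPROC` prefix and the assumption of consecutive 32-bit registers are chosen for the example and do not come from any vendor tool.

```python
def generate_header(base_address, registers, prefix="FPGA_COPROC"):
    """Return the text of a C header mapping register names to addresses.

    Assumes 32-bit registers laid out at consecutive 4-byte offsets.
    """
    lines = [f"#ifndef {prefix}_H", f"#define {prefix}_H", ""]
    lines.append(f"#define {prefix}_BASE 0x{base_address:08X}u")
    for index, name in enumerate(registers):
        offset = index * 4
        lines.append(
            f"#define {prefix}_{name.upper()} ({prefix}_BASE + 0x{offset:02X}u)"
        )
    lines += ["", f"#endif /* {prefix}_H */"]
    return "\n".join(lines)


# Hypothetical coprocessor with four memory-mapped registers.
print(generate_header(0x60000000, ["control", "status", "data_in", "data_out"]))
```

Generating the header from a single register list keeps the hardware and software views of the interface consistent, which is the point of automating this step.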

Designers can use many methods to implement a DSP algorithm in any given technology. Target hardware often influences the algorithmic approach. When the target is a heterogeneous DSP hardware platform, selecting an implementation becomes a two-step process. The designer must first select the most appropriate hardware device and then determine which implementation method makes sense for that particular device. On a DSP hardware platform, the processor will be the master and control the FPGA [37].

Figure 26: TI DSP Xilinx FPGA as Co-processor [36]


Figure 27: TI DSP and Altera FPGA as Co-processor [40]


Figure 28: Heterogeneous platform-based design [37]

In many cases, FPGAs work in conjunction with a conventional DSP, typically integrating pre- and post-processing functions along with high-performance signal processing. FPGAs can also integrate all the logic, bus-bridging, and peripheral functions, thus reducing system costs and affording a higher level of system integration. The FPGA, in turn, will be used as either a co-processor (where data is sourced to and synched from the DSP processor) or as a pre- or post-processor (where the data is sourced from a high-speed interface). System data rates and operating parameters drive optimal FPGA usage [37].

DSP application developers will find the co-processing flow, in which an FPGA can be used to accelerate performance-critical functions, to be the most natural programming model. Tools such as Code Composer Studio (CCS) for Texas Instruments DSPs include code profilers that identify the software hot spots that can be offloaded to the FPGA. To use these tools efficiently and design a heterogeneous DSP/FPGA platform effectively, designers need an interface to connect the FPGA to a separate DSP processor on the hardware platform. DSP platforms will typically support more general-purpose interfaces, such as the Texas Instruments 16/32/64-bit C6x DSP extended memory interface (suitable for system control and co-processing tasks), and high-speed serial interfaces, such as Serial RapidIO or video interfaces (for pre- and post-processing operations). As designers add an FPGA to the system, the software implementation will change from an algorithmic description to data passing and function control. The FPGA coprocessor will appear as a hardware accelerator to the application software developer and will be accessible through function calls [37].
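The partitioning step described above can be sketched as a simple rule applied to profiler output: functions that dominate the run time and are suitable for parallel hardware go to the FPGA, while serial and control-oriented tasks stay on the DSP. The profile figures, function names and the 20% threshold below are invented for illustration and are not output from Code Composer Studio.

```python
def partition(profile, min_share=0.2):
    """Split functions into (fpga, dsp) lists from a cycle-count profile.

    A function is offloaded only if it is marked parallelizable and accounts
    for at least min_share of the total cycles.
    """
    total = sum(entry["cycles"] for entry in profile.values())
    fpga, dsp = [], []
    for name, entry in profile.items():
        share = entry["cycles"] / total
        if entry["parallel"] and share >= min_share:
            fpga.append(name)
        else:
            dsp.append(name)
    return fpga, dsp


# Hypothetical profile of a DSP application (cycle counts are made up).
profile = {
    "fir_filter":   {"cycles": 600_000, "parallel": True},
    "fft":          {"cycles": 250_000, "parallel": True},
    "coeff_update": {"cycles": 100_000, "parallel": False},
    "control_loop": {"cycles": 50_000,  "parallel": False},
}

fpga_functions, dsp_functions = partition(profile)
print("Offload to FPGA:", fpga_functions)
print("Keep on DSP:    ", dsp_functions)
```

In a real system, each offloaded function would then be replaced in the DSP software by a call that passes data to the FPGA and collects the result, matching the function-call view of the coprocessor described in the text.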

5.4 CONCLUDING REMARKS

To conclude, “human-intelligent machine” interaction on a day-to-day basis is now so ingrained in our lives (including the use of household electronic appliances, smartphones, tablets, computing systems, intelligent automobiles, office and industrial automation, and security systems) that “unplugging” is being suggested by the medical profession as an alternative therapy for reducing distractions and cleansing the mind, body and soul. This is because almost every point in our daily routine involves some interaction with the virtually connected world. It is always fascinating to learn and understand how theories can be transformed into real-world working products that make tasks easier in this increasingly interconnected, wired or virtually wired (wireless) world of ubiquitous computing devices.

All of this has become possible because engineers have been able to capture real-world analog signals, convert them to digital signals, analyze them and extract the information that is meaningful to the end-user, and transform them to synthesize an output signal that can be presented to an ordinary end-user in an accessible form. The essence of this capture, analysis, extraction, transformation, synthesis and delivery is what can be defined as digital signal processing in layman’s terms, and it is accomplished through a combination of hardware and software.

We can thus state that the field of VLSI signal processing not only brings about a convergence of two independent domains within Electrical Engineering, but also opens up a range of possibilities and ideas that can be applied to problems in the individual domains. This cross-pollination of knowledge builds expertise across both domains, which may help solve problems faced by hardware engineers, algorithm engineers and system architects alike.

REFERENCES

[1] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience Publishers, 2007.

[2] S. Mirzaei, Design Methodologies and Architectures for Digital Signal Processing on FPGAs, Ph.D. Dissertation, Department of Electrical and Computer Engineering, University of California Santa Barbara, 2010.

[3] S. S. Bhattacharyya, E. F. Deprettere, R. Leupers and J. Takala, Handbook of Signal Processing Systems, Springer, 2010.

[4] Y. H. Hu, Programmable Digital Signal Processors: Architecture, Programming, and Applications, CRC Press, 2007.

[5] R. Duren, J. Stevenson and M. Thompson, “A comparison of FPGA and DSP development environments and performance for acoustic array processing”, Proceedings 50th Midwest Symposium on Circuits and Systems, 2007.

[6] R. D. Turney, C. Dick, D. B. Parlour and J. Hwang, “Modeling and implementation of DSP FPGA solutions”, Proceedings International Conference on Signal Processing Applications and Technology, ICSPAT, 2000.

[7] J. Cong, Bin Liu, S. Neuendorffer, J. Noguera, K. Vissers and Zhiru Zhang, “High-level synthesis for FPGAs: From prototyping to deployment”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(4), pp. 473-491, 2011.

[8] D. Markovic, B. Nikolic and R. W. Brodersen, “Power and area minimization for multidimensional signal processing”, IEEE Journal of Solid-State Circuits, 42(4), pp. 922-934, 2007.

[9] Texas Instruments White Paper http://www.ti.com/lit/wp/spra879/spra879.pdf , accessed 07-09-2014.

[10] M. A. Richards and G. A. Shaw, “Chips, architectures and algorithms: Reflections on the exponential growth of digital signal processing capability”, http://users.ece.gatech.edu/mrichard/Richards&Shaw_Algorithms01204.pdf , accessed 07-09-2014.

[11] S. P. Chan, C. Sankaran, G. Ballou, M. Pecht, N. Angelopoulos, P. Lall, J. Cogdell, Z. Wan, C. R. Paul and R. C. Dorf, The Electrical Engineering Handbook, CRC Press, 2006.

[12] J. Eyre and J. Bier, “Evolution of DSP processors”, IEEE Signal Processing Magazine 17(2), pp. 43-50, 2000.

[13] M. Rawski, B. J. Falkowski and T. Łuba, “Digital Signal Processing: designing for FPGA architectures”, Facta Universitatis-Series: Electronics and Energetics, 20(3), pp. 437-459, 2007.

[14] C. Ho, M. Leong, P. Leong, J. Becker and M. Glesner, “Rapid prototyping of FPGA based floating point DSP systems”, Proceedings 13th IEEE International Workshop on Rapid System Prototyping, 2002.

[15] Woon-Seng Gan, “Teaching and learning the hows and whys of real-time digital signal processing”, IEEE Transactions on Education, 45(4), pp. 336-343, 2002.

[16] L. S. DeBrunner and V. DeBrunner, “The case for teaching DSP algorithms in conjunction with implementations”, Proceedings 2nd Signal Processing Education Workshop and 10th Digital Signal Processing Workshop, 2002.

[17] N. Kehtarnavaz and S. Mahotra, “FPGA implementation made easy for applied digital signal processing courses”, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.

[18] Texas Instruments DSP Tools http://www.ti.com/lsds/ti/dsp/toolsw.page, accessed 07-09-2014.

[19] S. M. Kuo, B. H. Lee and W. Tian, Real-Time Digital Signal Processing, Wiley, 2006.

[20] M. Shirvaikar and T. Bushnaq, “A comparison between DSP and FPGA platforms for real-time imaging applications”, Proceedings SPIE-IS&T Electronic Imaging: Real-Time Image and Video Processing, 7244, 2009.

[21] M. Shirvaikar and T. Bushnaq, “VHDL implementation of wavelet packet transforms using SIMULINK tools”, Proceedings International Society for Optical Engineering, 2008.

[22] R. Duren, J. Stevenson and M. Thompson, “A comparison of FPGA and DSP development environments and performance for acoustic array processing”, Proceedings 50th Midwest Symposium on Circuits and Systems, 2007.

[23] D. Markovic, B. Richards and R. W. Brodersen, “Technology driven DSP architecture optimization within a high-level block diagram based design flow”, Proceedings 40th Asilomar Conference on Signals, Systems and Computers, 2006.

[24] FPGA Programming for the Masses http://queue.acm.org/detail.cfm?id=2443836 , accessed 07-09-2014.

[25] TI C5515 EVM User Guide http://support.spectrumdigital.com/boards/evm5515/revb/files/evm5515_TechRef_revb.pdf, accessed 07-09-2014.

[26] TI C5515 Fixed-Point DSP Datasheet http://www.ti.com/lit/ds/symlink/tms320c5515.pdf, accessed 07-09-2014.

[27] TI website tutorial http://processors.wiki.ti.com/index.php/Category:Simulation, accessed 07-09-2014.

[28] Altera DE-1 Board User Manual ftp://ftp.altera.com/up/pub/Altera_Material/12.1/Boards/DE1/DE1_User_Manual.pdf, accessed 07-09-2014.

[29] Simulink User Guide http://www.mathworks.com/help/pdf_doc/simulink/sl_gs.pdf, accessed 07-09-2014.

[30] Quartus II User Manual http://www.altera.com/literature/hb/qts/quartusii_handbook.pdf, accessed 07-09-2014.

[31] SOPC Builder User Manual http://www.altera.com/literature/ug/ug_sopc_builder.pdf, accessed 07-09-2014.

[32] DSP Builder User Guide http://www.altera.co.jp/literature/ug/ug_dsp_builder.pdf, accessed 07-09-2014.

[33] P. Ekas and B. Jentz, “Developing and integrating FPGA coprocessors”, Embedded Computing Design Magazine, http://embedded-computing.com/pdfs/Altera.Fall03.pdf, accessed 07-09-2014.

[34] S. K. Knapp, Using Programmable Logic to Accelerate DSP Functions, Xilinx, Inc., pp. 1-8, 1995.

[35] S. Sharma and W. Chen, “Using model-based design to accelerate FPGA development for automotive applications”, The MathWorks, 2009.

[36] T. Hill, “The benefits of FPGA coprocessing”, Xcell Journal, 58, pp. 29-31, 2006.

[37] T. Hill, “Heterogeneous hardware platforms capitalize on DSP/FPGA capabilities”, http://dsp-fpga.com/pdfs/Xilinx.RG07.pdf, accessed 07-09-2014.

[38] FPGA Coprocessing Evolution: Sustained Performance Approaches Peak Performance http://www.altera.com/literature/wp/wp-01031-coprocessing-evolution.pdf, accessed 07-09-2014.

[39] S. Zack and S. Dhanani, “DSP co-processing in FPGAs: Embedding high-performance, low-cost DSP functions”, Xilinx White Paper, 2004, http://ohm.bu.edu/~pbohn/__Engineering_Reference/ECEU530_HDL/Digilent_S3_Board/wp212.pdf, accessed 07-09-2014.

[40] P. Ekas and B. Jentz, “Developing and integrating FPGA coprocessors”, Embedded Computing Design Magazine, http://embedded-computing.com/pdfs/Altera.Fall03.pdf, accessed 07-09-2014.

[41] FPGAs Provide Reconfigurable DSP Solutions http://www.altera.com/literature/wp/wp_dsp_fpga.pdf, accessed 07-09-2014.

[42] Increase Bandwidth in Medical & Industrial Applications with FPGA Coprocessors http://www.altera.com/literature/wp/wp_use_of_pld_as_cp5.pdf , accessed 07-09-2014.

[43] Altera Cyclone II Device Handbook, http://www.altera.com/literature/hb/cyc2/cyc2_cii5v1.pdf, accessed 07-09-2014.

[44] TI C55x CPU Architecture Reference Guide, http://www.ti.com/lit/ug/swpu073e/swpu073e.pdf, accessed 07-09-2014.
