A comparative study of PDSP and FPGA design
methodologies for DSP system design
A thesis submitted
to the Graduate School of the University of Cincinnati
in partial fulfillment of the requirements for the Degree of
Master of Science
in the Department of Electrical Engineering and Computing Systems
of the College of Engineering and Applied Science
By
Prasad Deodhar
Bachelor of Engineering (Computer Engineering)
University of Mumbai, India. 2009.
Thesis Advisor and Committee Chair: Dr. Carla Purdy
ABSTRACT
In today’s globally interconnected world, we notice a proliferation of a vast array of electronic devices and systems in our daily lives, spanning industrial automation, military, aerospace, aviation, medicine, consumer electronics, multimedia and entertainment products. The common thread that binds all these devices is that they involve some kind of human-computer interface that helps the end-user interact with and control the computational system within each device. Such a human-computer interface typically involves a Digital Signal Processing (DSP) module whose specific task is to accept a real-world analog signal as input, convert it into a digital signal and process the digital signal, extracting useful information through transformation, analysis and synthesis to eventually deliver a result that can help in making a decision. Hence DSP serves as the “interface” between the analog domain of real-life signals and the computational world of digital signals. The most widely used hardware platform for DSP system implementation is the Programmable Digital Signal Processor (PDSP).
PDSPs are general-purpose microprocessors designed for embedded DSP applications. They contain special architectures and instructions that support more efficient execution of computation-intensive DSP algorithms. However, rapid advancements in CMOS technology have widened the options available to a hardware engineer for DSP system implementation. The advent of Field Programmable Gate Arrays (FPGAs) with in-built hardware blocks like DSP multiplier cores, hard and soft IP cores and high-level synthesis tools has given the PDSP a strong competitor. A multitude of factors such as development effort, design time, performance in terms of power consumption and speed, time-to-market, prototyping capabilities, design methodologies and architectural flexibility should be considered for DSP system implementation.
This thesis makes a comparative study of the two hardware platforms – PDSP and FPGA – in terms of design methodologies, architectures, design time and effort and impact of high-level synthesis tools. The objective is to help a DSP hardware engineer make an informed decision on the pros and cons of selecting a particular hardware platform.
ACKNOWLEDGEMENTS
First and foremost, I would like to express my gratitude to God, for his blessings and support in every phase of my life.
I express my appreciation and thanks to my advisor Dr. Carla Purdy, Associate Professor,
University of Cincinnati for her motivation, constant support and guidance that helped me in my research work and writing this thesis.
I would also like to sincerely thank Dr. J. Adam Wilson, currently serving as Assistant Professor of Neurology at Cincinnati Children's Hospital and Medical Center, without whose valuable assistance, I would not have been able to work on the hardware and software resources required for the preparation and completion of this study. I express my immense gratitude to Dr. Wilson for his support and guidance.
I would like to thank my parents and friends, who kept me motivated; without their moral support this thesis would not have been possible. They have always been a pillar of strength, care and inspiration for me.
I dedicate this thesis to my sister Swati, who has been my greatest support throughout my graduate education.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 Motivation
1.2 Research Overview
1.3 Thesis Outline
2. BACKGROUND STUDY
2.1 Introduction
2.2 Impact of VLSI Technology on DSP and Vice-versa
2.3 Evolution of DSP Hardware
2.3.1 Programmable Digital Signal Processors (PDSPs)
2.3.2 Field Programmable Gate Arrays (FPGAs)
2.4 VLSI - DSP: An Insight into the Education Perspective
2.4.1 Present State of VLSI - DSP Education
2.4.2 Suggested Improvements in VLSI - DSP Education
3. PROBLEM STATEMENT
3.1 Design Methodologies: PDSP and FPGA Design Flows
3.1.1 DSP System Development Flow Using PDSPs
3.1.2 DSP System Development Flow Using FPGAs
3.2 PDSP and FPGA Comparison: Summary of Past Case Studies
3.3 Problem Definition: Scope, Goals & Objectives
4. EXPERIMENTS
4.1 Introduction
4.2 Hardware and Software Overview
4.2.1 Hardware – PDSP: Texas Instruments C5515 Evaluation Board
4.2.2 Software – PDSP: Code Composer Studio IDE
4.2.3 Hardware – FPGA: Altera DE1 Development Board
4.2.4 Software – FPGA
4.3 Experiments
4.4 Procedure for Implementation
4.5 Observations and Results
4.6 Comparison of the Results
5. CONCLUSIONS AND FUTURE WORK
5.1 Introduction
5.2 Conclusions: Template for Hardware Platform Selection
5.3 Future Trends: Co-Processors – A Hybrid Approach
5.4 Concluding Remarks
REFERENCES
LIST OF FIGURES
Figure 1: Evolution of DSP Hardware
Figure 2: PDSP Design Flow [19]
Figure 3: FPGA Based DSP design approaches
Figure 4: Model Based Design Framework Using Simulink [6]
Figure 5: FPGA Verification [6]
Figure 6: Model Based Design: Complete Design Flow [6]
Figure 7: Texas Instruments C5515 Evaluation Board [25]
Figure 8: TI TMS320 C5515 DSP Processor [26]
Figure 9: Altera DE1 Development Board [28]
Figure 10: Quartus II Flow [30]
Figure 11: Altera SOPC Builder Tool [33]
Figure 12: FPGA Development Tools at a Glance [33]
Figure 13: Experiment 1 Simulink Model
Figure 14: Experiment 1 Simulink Simulation
Figure 15: Experiment 1 Quartus II
Figure 16: Experiment 1 PDSP
Figure 17: Experiment 2 Simulink Model_1
Figure 18: Experiment 2 Simulink Model_2
Figure 19: Experiment 2 Simulink Simulation
Figure 20: Experiment 2 Quartus II
Figure 21: Experiment 2 PDSP
Figure 22: Experiment 3 Simulink Model_1
Figure 23: Experiment 3 Simulink Model_2
Figure 24: Experiment 3 Simulink Simulation
Figure 25: Experiment 3 Quartus
Figure 26: TI DSP and Xilinx FPGA as Co-processor [36]
Figure 27: TI DSP and Altera FPGA as Co-processor [40]
Figure 28: Heterogeneous platform-based design [37]
LIST OF TABLES
Table 1: Results for Case Study # 2 [21]
Table 2: Internal Memory [26]
Table 3: External Memory [26]
Table 4: Comparing FPGA and PDSP Implementation Results
Table 5: Template for Hardware Platform Selection_1
Table 6: Template for Hardware Platform Selection_2
LIST OF ABBREVIATIONS
DSP Digital Signal Processing
PDSP Programmable Digital Signal Processor
FPGA Field Programmable Gate Array
TI Texas Instruments
DFT Discrete Fourier Transform
FIR Finite Impulse Response
IIR Infinite Impulse Response
CCS Code Composer Studio
VLSI Very Large Scale Integration
HDL Hardware Description Language
VHDL VHSIC Hardware Description Language
CHAPTER 1: INTRODUCTION
1.1 MOTIVATION
The past decade has witnessed an exponential growth in the field of embedded systems, especially in the entertainment/portable computing/mobile devices sector. The increasing trend towards high processing power, portable, mobile and power-efficient systems has compelled engineers and scientists to develop innovative design methodologies that can fulfill these requirements and meet the stringent system specifications. Many of these systems, like digital cameras, portable media players, smartphones and tablets, perform Digital Signal Processing
(DSP) operations that require intensive mathematical operations.
The conventional method of developing DSP applications is by using a programmable digital signal processor (PDSP) for prototype design and implementation. This is primarily due to the shorter development time, lower power consumption and lower cost.
Due to progress in CMOS semiconductor technology, complex DSP algorithms, communication protocols, and applications are now feasible, which, in turn, increase the complexity of the systems and products. As the complexity increases, the system reliability is no longer solely defined by the hardware platform reliability but also increasingly determined by hardware and software architecture, design and verification processes, and the level of design maintainability [1].
DSP capabilities are becoming ubiquitous. More and more common devices require some kind of signal processing with a high data throughput. As DSP is integrated into more devices, time-to-market and the ability to make late design changes become important. The challenge before engineers is to find ways to achieve higher processing performance with less design effort so that time-to-market is short.
Programmable DSP processors perform their arithmetic operations in software. Software gives flexibility in design and allows late design changes, but it executes sequentially, whereas hardware can execute in a truly parallel way; software is therefore the slower approach, with the advantage of being modifiable. The idea of putting the arithmetic operations in hardware has been around for a long time, but creating a custom ASIC requires a lot of time and effort up front, and the computing logic on the ASIC cannot be modified after it has been fabricated. This is where a field programmable gate array (FPGA) becomes a great solution, combining the strengths of hardware and software. Reconfigurable hardware such as FPGAs offers high performance and can consequently be significantly faster than microprocessors [2].
1.2 RESEARCH OVERVIEW
Real-time implementation of DSP systems requires design of hardware which can match the application sample rate to the hardware processing rate (which is related to the clock rate and the implementation style). When the hardware has matched the sample rate, there is no advantage in making the hardware any faster or larger. Thus, real-time does not always mean high-speed. Real-time architectures are capable of processing samples as they are received from the signal source, as opposed to storing them in buffers for later processing as done in batch processing. Furthermore, real-time architectures operate on an infinite time series (since the number of samples from the signal source is so large that it can be considered infinite). The sample rate information alone cannot be used to choose the architecture; the complexity of the algorithms is also an important consideration. The requirements for high data rate and increased algorithmic complexity in next-generation devices make it difficult to meet power budgets. Therefore, in designing next-generation systems, algorithm designers, system architects and circuit designers face the challenge of optimally utilizing the benefits of technology scaling within a short development cycle.
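The relationship between sample rate and processing rate described above can be sketched as a simple cycle-budget check. This is a minimal illustration, not a model of any particular device: the function names, the 100 MHz clock and the 64-operation workload are all hypothetical values chosen for the example.

```python
def cycles_per_sample(clock_hz: float, sample_rate_hz: float) -> float:
    """Clock cycles available to process one input sample in real time."""
    return clock_hz / sample_rate_hz


def meets_real_time(clock_hz: float, sample_rate_hz: float,
                    ops_per_sample: int, ops_per_cycle: int = 1) -> bool:
    """True if one sample's work finishes before the next sample arrives.

    ops_per_cycle models the implementation style: 1 for a single
    sequential arithmetic unit, larger values for parallel hardware.
    """
    cycles_needed = ops_per_sample / ops_per_cycle
    return cycles_needed <= cycles_per_sample(clock_hz, sample_rate_hz)


# Hypothetical workload: 64 arithmetic operations per sample, 100 MHz clock.
CLOCK_HZ = 100e6
OPS = 64

# Audio-rate input (48 kHz): a sequential implementation has ample headroom.
print(meets_real_time(CLOCK_HZ, 48e3, OPS))                    # True
# 20 MHz input: only 5 cycles per sample, so sequential execution falls behind.
print(meets_real_time(CLOCK_HZ, 20e6, OPS))                    # False
# The same clock with 64 operations per cycle meets the rate again.
print(meets_real_time(CLOCK_HZ, 20e6, OPS, ops_per_cycle=64))  # True
```

The last two calls show why sample rate alone does not fix the architecture: at the same clock rate, the implementation style decides whether the required rate is met.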
Programmable Digital Signal Processors (PDSPs) are a specialized form of a microprocessor with an architecture optimized for the fast operational needs of DSP applications. A PDSP works well in signal processing applications because it is optimized to efficiently process signals, is relatively inexpensive, and has a well-defined development path and a fixed hardware configuration. Since the different types of digital signals require only one set of hardware, a DSP processor can be “mass manufactured” so that the hardware is constant for all chips and
“functionality is defined through software” [3] [4].
The fundamental difference between a PDSP and a general-purpose microprocessor is the DSP processor’s hardware multiply-accumulate (MAC) block and specialized hardware accelerator blocks (co-processors) to facilitate faster computation of commonly found DSP functions [5].
The MAC operation is usually the performance bottleneck in most DSP applications. DSP processors generally incorporate MAC blocks in their architecture to minimize this performance bottleneck. While adding more MAC units may provide more PDSP throughput, the PDSP falls behind in raw data processing power for certain data-intensive DSP functions such as Fourier transforms and digital filters. To overcome this hurdle, PDSP vendors have also incorporated specialized hardware cores, or “hardware accelerator” (coprocessor) blocks, such as the FFT coprocessor, the Viterbi coprocessor and the enhanced filter coprocessor. While such coprocessor blocks provide high DSP throughput, they do not cater to all DSP applications. Most DSP applications cannot benefit from the DSP vendors' predefined and limited set of hardware accelerator blocks. Additionally, such hardware accelerator blocks are permanent, do not allow for any level of customization for particular design needs, and can quickly become archaic as standards evolve [3]. The DSP processor’s fixed hardware architecture is not suitable for certain applications that might require customized DSP function-based implementations.
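To make the role of the MAC operation concrete, the sketch below writes a direct-form FIR filter as a loop of multiply-accumulate steps. It is an illustrative model in plain Python, not code for any PDSP; the function name and the four-tap moving-average coefficients are assumptions made for the example.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0  # the accumulator of the MAC unit
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-accumulate (MAC)
        output.append(acc)
    return output


# Hypothetical 4-tap moving-average filter applied to a short pulse.
moving_average = [0.25, 0.25, 0.25, 0.25]
print(fir_filter([4.0, 4.0, 4.0, 4.0, 0.0, 0.0], moving_average))
# [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
```

Each output sample costs one MAC per coefficient, so a processor with a single MAC unit needs roughly as many cycles per sample as the filter has taps, which is why the MAC block dominates throughput.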
A Field Programmable Gate Array (FPGA) is a highly customizable chip used for logic functions. The FPGA is programmed using a hardware description language (HDL), which is used to program the connections between the individual gates in the FPGA. An FPGA design is usually limited by the number of gates available on the chip. As the number of available gates on the FPGA increases, more and more complex designs can be placed on the chip. The FPGA can also be limited by the time it takes a signal to travel from one gate to another as well as the time it takes to pass through a single gate. Advances in FPGA technology have increased the number of available gates while reducing the time it takes a signal to travel through and between the gates [5].
Field programmable gate arrays (FPGAs) have adequate performance and logic capacity to implement a number of digital signal processing (DSP) algorithms effectively. But the most significant and path-breaking step that FPGAs are taking with regard to signal processing lies in the “software tools” used to create the designs. Software development and hardware synthesis tools are essential for creating a structural and functional design and for producing either an optimized hardware layout when working with an FPGA or an optimized software routine when working with a DSP.
The tools used in the PDSP software design have been modified so that the software used in the DSP runs efficiently using a pipeline structure that offers only a limited degree of parallelism. However, the tools used in the FPGA design optimize the design to run in a highly parallel manner. The tools for the PDSP generally only require knowledge of the C language, while the FPGA tools require the user to know less common hardware description languages such as VHDL or Verilog. Tools like System Generator from Xilinx or DSP Builder from Altera have reduced the user’s need to know VHDL and Verilog by using a block-based representation for large pieces of hardware design code. Advances in the FPGA design tools, along with the performance increase the FPGA provides, have opened the door for signal processing designers to increasingly use FPGAs over the traditional DSPs [5].
Recent improvements in the simulation capabilities of high-level modeling tools have opened new design flow possibilities. High-level design tool support for algorithm modeling and simulation enables a designer to use a single environment to create floating-point and fixed-point models, and to make decisions early in the design process. At the same time, FPGA vendors have expanded commercial IP offerings to incorporate higher-level DSP functions. Together, these technologies enable a new flow for data path design that includes design iterations at the system level [6] [7].
Previously, FPGA-based DSP implementation required the combined efforts of a DSP engineer and a hardware engineer familiar with HDL or schematic-based design. Such a combined effort, using algorithmic modeling tools coupled with the FPGA vendors' hardware architecture generation tools, follows a typical set of stages: construction of an ideal mathematical model, investigation of implementation effects, test-bench creation and hardware netlist generation. Traditional HDL design methods are then used to complete the design implementation [6].
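One of the "implementation effects" investigated when moving from an ideal floating-point model to a fixed-point one is coefficient quantization. The sketch below is a minimal illustration under assumed conditions: it uses the common 16-bit Q15 format, and the helper names and coefficient values are hypothetical, not taken from the tools or experiments in this thesis.

```python
Q15_SCALE = 32768  # 2**15: Q15 stores values in [-1, 1) as signed 16-bit integers


def to_q15(value: float) -> int:
    """Quantize a value to Q15, saturating at the 16-bit limits."""
    scaled = round(value * Q15_SCALE)
    return max(-32768, min(32767, scaled))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to its real-valued equivalent."""
    return q / Q15_SCALE


# Hypothetical floating-point filter coefficients (the "ideal" model).
coeffs = [0.1, 0.2, 0.4, 0.2, 0.1]

quantized = [to_q15(c) for c in coeffs]
errors = [abs(c - from_q15(q)) for c, q in zip(coeffs, quantized)]

print(quantized)      # integer coefficients as they would be stored in hardware
print(max(errors))    # worst-case coefficient error, at most half an LSB
```

With rounding, the error per coefficient is bounded by half of one least significant bit (0.5 / 32768); whether that is acceptable depends on the filter specification, which is the kind of decision the fixed-point model lets the designer make early.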
The issue of mapping “DSP algorithm to VLSI architecture” and then translating the optimal architecture to silicon can be formulated in many ways. Commonly, there is a requisite performance level, with an acceptable power and silicon-area budget along with a desired speed (sampling rate in this case). Since the performance of a DSP application is often dictated by a standard, low power or small area is commonly the distinguishing feature of a particular implementation. The overall design process, therefore, can be viewed as a constrained optimization problem, where the power and/or area are minimized under performance constraints [6] [8].
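One way to write this constrained optimization down is sketched below. The symbols are assumptions introduced here for illustration only: a is a candidate architecture drawn from a set of alternatives \mathcal{A}, P(a) is its power, A(a) its silicon area, f_proc(a) the sample rate it can sustain, f_s the sample rate the application requires, and A_max the area budget.

```latex
\begin{aligned}
\min_{a \in \mathcal{A}} \quad & P(a) \\
\text{subject to} \quad & f_{\text{proc}}(a) \ge f_s, \\
& A(a) \le A_{\max}.
\end{aligned}
```

Minimizing area under a power budget instead simply swaps the roles of P and A in this formulation.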
The problem that needs to be solved is multifaceted: to find the right algorithm that optimally uses the underlying technology to achieve the desired data rate, while staying within the power and die size limits. To answer this, every candidate algorithm has to be mapped into an architecture that is optimal for a particular technology. The architecture choice strongly depends on the required throughput, but also on the underlying technology options, usually defined by the choice of supply and threshold voltages. To meet targets, design teams routinely settle on the first architecture and VLSI technology platform that meets the specifications. The ultimate choice is dependent on a design that attains a low power implementation with shorter design time [8].
Until recently, the use of programmable digital signal processors was nearly universal, but with the requirements of many applications exceeding the processing capabilities of digital signal processors, the use of FPGAs is growing rapidly. Equally vital is the FPGA’s intrinsic advantage in ease of customization, rapid prototyping and reliability.
1.3 THESIS OUTLINE
The goal of this thesis is to research and investigate the core issue of VLSI-based design and implementation of DSP systems. It is an attempt to explore the design methodologies and architectures for PDSP-based and FPGA-based design and to gauge the impact of high-level synthesis tools on DSP system development.
This thesis presents a comparative analysis of the PDSP and FPGA implementations with regard to their processing performance, together with a measure of development effort and an evaluation of the critical high-level design tools of each implementation platform. It shall be observed that the FPGA is capable of implementing the same design as the PDSP with the advantage of achieving much higher computational speeds. Several case studies have been used as the test-bed implementation for the PDSP and FPGA platforms.
This thesis is broken down into four parts: problem statement (definition, scope, objectives and limitations), algorithm design, architectural implementation and analysis of results. This chapter (Chapter 1) discusses the motivation behind this thesis, gives an overview of the current state of research on this topic and concludes with the thesis outline.
Chapter 2 of this thesis gives the overview and background study involved in preparing this thesis. The emphasis of this chapter is to acquaint the reader with how the progress and advancements in VLSI technology have influenced the hardware implementation of DSP systems. This is followed by a detailed study of the evolution of DSP hardware, primarily focusing on the two hardware platforms that are most commonly used for DSP system design: the Programmable Digital Signal Processor (PDSP) and the Field Programmable Gate Array (FPGA). The chapter ends with an insight into VLSI-DSP from an education perspective. It takes a look at the current state of VLSI and DSP coursework and teaching methods in academia. It makes a strong case for including the DSP hardware aspect when teaching advanced VLSI or DSP to graduate students and suggests ways to improve teaching methods in terms of course emphasis, course structure and syllabus.
Chapter 3 of this thesis builds on the conceptual foundation of Chapter 2 and examines the various design methodologies that can be utilized for implementing DSP systems when targeting different hardware platforms. It also presents a brief summary of past case studies that have explicitly addressed the comparison of PDSPs and FPGAs in the development of DSP systems. These case studies acquaint the reader with the broader arguments for and against each hardware platform, along with the finer points encountered when debating the contentious topic of the best available hardware platform for DSP system implementation. The chapter concludes with a formal problem statement that defines the problem at hand, its scope and limitations, and the goals and objectives of this research, followed by an overview of the planned experiments.
Chapter 4 of this thesis deals with the set of experiments performed. It also delves separately into the issues of implementing the standard DSP algorithms using Texas Instruments’ DSP chip and
Altera’s FPGA, elaborating in detail the development flow used by each platform in the system design. It also lays down the observations and results obtained in the process of implementation that serve as the basis for the comparison of the two hardware platforms.
Chapter 5 of this thesis discusses the conclusions of this thesis and emerging trends for future work possible in this area.
CHAPTER 2: BACKGROUND STUDY
2.1 INTRODUCTION
The first chapter of this thesis provided an introduction to the broad area of VLSI based design and implementation of digital signal processing systems. This chapter and the next chapter principally lay the necessary groundwork for the theoretical foundation of the principles, concepts, techniques and the elementary information required for understanding subject matter concerning VLSI based DSP. This has been accomplished through the study of literature available currently along with a summary of past case studies to support the main topic being discussed. These case studies have explicitly made a comparative analysis and critique of the two most commonly available and utilized hardware platforms for DSP systems, namely the
Programmable Digital Signal Processor (PDSP) and Field Programmable Gate Array (FPGA).
This chapter opens with an overview of the impact of rapid advancements in VLSI and its influence on DSP systems followed by a detailed explanation of the evolution of DSP hardware platforms mentioned above: PDSP and FPGA.
Digital Signal Processing (DSP) is the link from the real world to the computing world. DSP is used in several applications that encompass digital communication, multimedia systems, radar and satellite systems, biomedical devices, image-processing applications and consumer electronic appliances. All these applications cover a broad spectrum of performance and cost requirements and hence require different sampling rates [1].
Due to progress in CMOS semiconductor technology, realizing complex DSP algorithms and applications is now feasible. This has led to an escalation in the performance demands of these algorithms, resulting in more complex systems and products. As the complexity increases, the system reliability is no longer defined exclusively by the hardware platform reliability but is also increasingly determined by the associated hardware and software architecture, development and verification processes, and the level of design maintainability [1].
Real-time signal processing applications are transforming the electronics segment of the high-tech industry market. On average, every six months, markets are swamped with new products and technologies that are smarter, faster, smaller and more interconnected than ever. This has led to a huge demand for greater speed, effectiveness and portability in any new product that hits the market. The changing market dynamics continue to propel growth, and the pace of change is accelerating [9].
For the semiconductor provider, that fast adoption means fast time-to-market along with a quick ramp-to-volume. This puts tremendous pressure on DSP / VLSI design engineers to meet these varied demands. Costs must be controlled and power consumption needs to be reduced while increasing performance and flexibility within the ambit of an increasingly complex development environment and a design cycle that is ever shrinking [9].
The consequent objective is to examine the benefits of hardware platforms and associated architectures for realizing specific DSP applications and to compare the design trade-offs between them. The criteria used to evaluate the options for selecting an implementation platform are time to market, cost of production, processing performance, development effort, power consumption and the flexibility with which a substantial number of features can be accommodated.
DSPs from traditional vendors like Texas Instruments and Analog Devices have been the primary choice for signal processing applications for many years in academia and industry. While they are still widely used for many applications today, the insatiable need for multimedia systems that require higher performance and algorithm complexity is fueling a rate of growth that Moore’s Law is hard pressed to keep up with. As such, FPGAs have evolved into reconfigurable signal processors that warrant serious consideration for many of today's signal processing design challenges.
2.2 IMPACT OF VLSI TECHNOLOGY ON DSP AND VICE-VERSA
The advent of VLSI (Very Large Scale Integration) enabled solutions to complex and intractable engineering problems. Advancements in VLSI have played a significant role in realizing the electronic appliances available today. The capability to place billions of transistors in a small silicon area has revolutionized the consumer high-technology market, with products regularly appearing with increasing computational power, improved battery life and reduced physical size.
Digital Signal Processing is generally performed using “specialized programmable signal processors”. The capability of a signal processor is determined by its “hardware” and “software”.
By “hardware” we mean the physical implementation, which includes both individual ICs and the system architecture. “Software” is the computational procedure, which includes both the mathematical functionality and the particular algorithm by which it is implemented [10].
Progress in signal processing capability is the product of progress in IC devices, architectures, algorithms and mathematics. Advances in VLSI technology can be used to examine the relative impact of improved IC technology and computing architectures, or more generally computing
“hardware”, versus fast algorithms and new mathematical techniques (computing “software”) in advancing the capability of digital signal processing [10].
The capability of the hardware implementation is affected by the performance of the individual
ICs that comprise the processor, memory, and communication elements, as well as the architecture that defines the overall organization of these elements. Since signal processors are all realized using VLSI technology, the progress and advancements in VLSI have inherently impacted signal processing system architecture in a number of important ways, a few of which are elaborated below [11]:
High speed: As the IC manufacturing technology evolves, the feature dimensions of transistors
continue to shrink. Smaller transistors mean faster switching speed and, hence, a higher clock
rate. Faster processing speed means more demanding signal processing algorithms can now be
implemented for real-time processing.
Parallelism: Higher device density and larger chip area promise to pack millions of transistors
on a single chip. This makes it feasible to exploit parallel processing to achieve an even higher
throughput rate by processing multiple data streams concurrently. To fully exploit the benefit of
parallel processing, however, the formulation of signal processing algorithms must be
reexamined. Algorithm transformation techniques are also developed to exploit maximum
parallelism from a given DSP algorithm formulation.
Local communication: As device dimensions continue to decrease and chip area continues to
increase, the cost of intercommunication becomes significant in terms of both chip real estate
and transmission delay. Hence, pipelined operation with a local bus is preferred to broadcasting
using global interconnection links. Compiler and code generation methods need to be updated to
maximize the efficiency of pipelining.
Low-power architecture: Smaller transistor feature size makes it possible to reduce the operating voltage and, thereby, significantly reduce the power consumption of an IC chip. This trend makes it possible to develop digital signal processing systems on portable or handheld mobile computers.
On the other hand, the stringent performance requirements and regular, deterministic formulation of signal processing applications have also profoundly influenced VLSI design methodology:
High-level synthesis design methodology: The quest to streamline the process of translating a
complex algorithm into a functional piece of silicon that meets the stringent performance and
cost constraints has led to significant progress in the area of high-level synthesis, system
compilation, and optimal code generation. Ideas such as dataflow modeling, loop unrolling,
software pipelining, which were originally developed for general purpose computing systems,
have enjoyed great success when applied to aiding the synthesis of an application-specific signal
processing system from a high-level behavioral description.
Multimedia processing architecture: With the maturity and popularity of multimedia signal
processing applications, general purpose microprocessors have incorporated special-purpose
architecture, such as the multimedia extension instruction set (e.g., MMX). Signal processors
also led the wave of novel architectural concepts such as very long instruction word (VLIW)
architecture. In fact, it is argued that incorporating multimedia features is the only way to sustain
the exponential growth in performance through the next decade.
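Of the techniques named in the list above, loop unrolling is the easiest to illustrate. The sketch below shows the transformation on a dot product, the core of most DSP kernels. It is written in plain Python purely to show the restructuring; the function names are hypothetical, and in practice a compiler or high-level synthesis tool would apply the transformation so that the independent accumulations can be mapped to parallel multipliers.

```python
def dot_rolled(x, h):
    """Reference dot product: one multiply-accumulate per loop iteration."""
    acc = 0.0
    for i in range(len(x)):
        acc += x[i] * h[i]
    return acc


def dot_unrolled_by_4(x, h):
    """Dot product unrolled by a factor of 4.

    The four accumulators do not depend on each other, so parallel
    hardware could compute them at the same time. This sketch assumes
    the input length is a multiple of 4.
    """
    if len(x) % 4 != 0:
        raise ValueError("length must be a multiple of 4 in this sketch")
    acc0 = acc1 = acc2 = acc3 = 0.0
    for i in range(0, len(x), 4):
        acc0 += x[i] * h[i]
        acc1 += x[i + 1] * h[i + 1]
        acc2 += x[i + 2] * h[i + 2]
        acc3 += x[i + 3] * h[i + 3]
    return (acc0 + acc1) + (acc2 + acc3)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
h = [0.5] * 8
print(dot_rolled(x, h), dot_unrolled_by_4(x, h))  # 18.0 18.0
```

The unrolled version performs the additions in a different order, so for general floating-point data the two results can differ in the last bits; for the values used here both are exact.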
2.3 EVOLUTION OF DSP HARDWARE
This section discusses two of the most commonly used hardware platforms for DSP, namely the
Programmable Digital Signal Processor (PDSP) and the Field Programmable Gate Array (FPGA). The
discussion is restricted to these two platforms because the main objective of this thesis is a
comparative study of them. Other types of hardware that can be used for DSP implementation are
beyond the scope of this thesis.
Figure 1: Evolution of DSP Hardware
2.3.1 Programmable Digital Signal Processors (PDSPs)
Programmable digital signal processors (PDSPs) are general-purpose microprocessors designed
specifically for DSP applications where the architecture is optimized for repetitive, numerically
intensive tasks at high rates. They are designed mainly for embedded DSP applications.
PDSPs fall between general-purpose processors (GPPs) and custom-designed chips such as
application-specific integrated circuits (ASICs). GPPs have the advantage of ease of programming
and development. However, GPPs often suffer from unsatisfactory performance for DSP applications
due to overheads incurred in both the architecture and the instruction set. ASICs, on the other
hand, lack the flexibility of programming, and the time-to-market delay due to chip development
is longer. [4]
Recently the border between DSP processors and general-purpose processors has been
blurring as general-purpose processors have acquired DSP features to support various
multimedia applications. On the other hand, DSP processors, which used to be programmed in
hand-written assembly, have nowadays incorporated features from general-purpose computers to
support software development in high-level languages. [3]
PDSPs can be classified as general-purpose PDSPs and application-specific PDSPs, typically based on the architecture they support.
General Purpose PDSPs
This class of PDSPs is characterized by the following features [3] [12]:
Data path: The actual signal processing operations in a processor are carried out in various
functional units, such as arithmetic logic units (ALUs) and multipliers; the collection of these
units is called the data path. To store intermediate results, the data path also contains
accumulators and registers. The data path in DSP processors can be expected to be tailored for the
computations inherent in typical DSP algorithms. Multiplication is involved in one of the most
characteristic operations in DSP, multiply-accumulate (MAC), and DSP performance is often even
characterized in MAC/s. Therefore, a fast multiplier is an essential unit in a PDSP. In addition,
an adder is used in the MAC operation, and together these resources form a MAC unit. A processor
may also contain parallel MAC units to further boost performance on DSP applications. As in
general-purpose processors, PDSPs contain an ALU that performs the basic operations: addition,
subtraction, increment, negate, and, or, not, etc. DSP applications typically have very high
computational requirements in comparison to other types of computing tasks, since they often must
execute DSP algorithms in real time on lengthy segments of signals sampled at 10-100 kHz or
higher. Hence, DSP processors may also be enhanced with special function units to improve
performance for a group of applications; these often include several independent execution units
that are capable of operating in parallel. As a result, efficiency can be improved by adding
application-specific functions to the data path of a general-purpose DSP processor, and
significant savings can be obtained if a system is tailored for the application at hand.
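As a minimal illustration of the multiply-accumulate pattern described above, the following Python sketch computes a finite impulse response (FIR) filter as a sum of products. The function names `mac_fir_sample` and `fir_filter` are hypothetical and chosen only for this example; on a PDSP, each iteration of the inner loop would typically map to a single MAC instruction.

```python
def mac_fir_sample(coeffs, samples):
    """Return one FIR output sample: the sum of coeffs[k] * samples[k]."""
    acc = 0  # plays the role of the accumulator register
    for c, x in zip(coeffs, samples):
        acc += c * x  # one multiply-accumulate (MAC) per filter tap
    return acc


def fir_filter(coeffs, signal):
    """Filter a whole signal; samples before the start are treated as zero."""
    history = [0] * len(coeffs)  # delay line, newest sample first
    out = []
    for x in signal:
        history = [x] + history[:-1]  # shift the new sample into the delay line
        out.append(mac_fir_sample(coeffs, history))
    return out
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a convenient sanity check. A filter with N taps needs N MAC operations per output sample, which is why throughput in MAC/s is a natural performance measure.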
Data format support: DSP processors are divided into fixed-point and floating-point processors
based on the type of arithmetic units in the processor. Floating-point processors contain more
complex logic and, therefore, consume more power and are more expensive. However,
floating-point processors are easier to program, as the dynamic range of the floating-point
representation is larger and there is no need to scale and optimize the signal levels during
intermediate computations. Furthermore, high-level languages have floating-point data types,
while integers are the only supported fixed-point data types, although signal processing calls for
fractional data types for fixed-point arithmetic.
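To make the notion of a fractional fixed-point data type concrete, the sketch below emulates the widely used Q15 format (16-bit signed integers interpreted as fractions in the range [-1, 1)) with plain Python integers. The helper names `to_q15`, `q15_mul` and `from_q15` are assumptions made for this illustration, not functions of any particular DSP library.

```python
Q15_ONE = 1 << 15  # scale factor: the value 1.0 corresponds to 32768


def saturate16(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def to_q15(value):
    """Convert a float in [-1, 1) to a saturated Q15 integer."""
    return saturate16(int(round(value * Q15_ONE)))


def q15_mul(a, b):
    """Multiply two Q15 integers and return a rounded, saturated Q15 result."""
    product = a * b  # Q30 intermediate result
    return saturate16((product + (1 << 14)) >> 15)  # round, rescale to Q15


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE
```

For instance, `q15_mul(to_q15(0.5), to_q15(0.5))` yields 8192, which `from_q15` maps back to 0.25. The explicit rescaling and saturation steps are exactly the bookkeeping that a floating-point processor spares the programmer.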
Memory architecture: A key feature of PDSPs is the adoption of a Harvard memory
architecture that contains separate program and data memories so as to allow simultaneous
instruction fetch and data access. This differs from the conventional von Neumann
architecture, where program and data are stored in the same memory space. This implies that, in a
Harvard architecture, while operands for the current instruction are accessed, the next instruction
can already be fetched. This approach doubles the memory bandwidth when one-operand instructions
are used.
Dedicated address generator: Intensive access to memory in DSP applications implies that
address computations are performed frequently. As the data path is occupied by signal processing
arithmetic, DSP processors often contain ALUs dedicated to memory address computations. The
dedicated address generator allows rapid access to data with complex arrangements without
interfering with the pipelined execution of the main ALUs. This is useful
for situations such as two-dimensional (2D) digital filtering and motion estimation. Some
address generators may include bit-reversal address calculation to support the efficient
implementation of the FFT, and circular buffer addressing for the implementation of infinite impulse
response (IIR) digital filters.
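The two addressing modes mentioned above can be sketched in software as follows. This is only an illustrative Python model, with the hypothetical names `bit_reverse` and `CircularBuffer`; on a PDSP the same index arithmetic is performed by the address generation hardware at no cost to the main data path.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index (FFT input/output ordering)."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # move the lowest bit into result
        index >>= 1
    return result


class CircularBuffer:
    """Fixed-length delay line whose write pointer wraps modulo its length."""

    def __init__(self, length):
        self.data = [0] * length
        self.head = 0  # position of the next write

    def push(self, sample):
        self.data[self.head] = sample
        self.head = (self.head + 1) % len(self.data)  # wrap-around update

    def delayed(self, k):
        """Return the sample written k pushes ago (k = 0 is the newest)."""
        return self.data[(self.head - 1 - k) % len(self.data)]
```

For an 8-point FFT, `[bit_reverse(i, 3) for i in range(8)]` gives the order 0, 4, 2, 6, 1, 5, 3, 7. The circular buffer keeps the most recent samples available without ever shifting data in memory, since only the pointer moves.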
High bandwidth memory and I/O controller: To meet the intensive input and output
demands of most signal processing applications at low cost, most PDSPs incorporate one or more
specialized serial or parallel I/O interfaces, and streamlined I/O handling mechanisms, such as
low-overhead interrupts and direct memory access (DMA), to allow data transfers to proceed
with little or no intervention from the processor's computational units. Several PDSPs have
built-in multichannel DMA controllers and dedicated DMA buses to handle data I/O without
interfering with CPU operations. To maximize data I/O efficiency, some modern PDSPs even include a
dedicated video and audio codec (coder/decoder) as well as a high-speed serial/parallel
communication port.
Application-Specific Instruction Set Digital Signal Processors (ASIP DSPs) [3]
An ASIP DSP is an application-specific digital signal processor designed for an application
domain to accelerate its computationally heavy and most frequently used functions. It is used for
applications that are intensive in iterative data manipulation, transformation, and matrix
computation. The ASIP architecture is designed to implement the assembly instruction set with
minimum hardware cost. The main difference between a general-purpose processor and an ASIP DSP is
the application domain. A general-purpose processor is not designed for a specific application
class, so it is optimized for average performance across applications rather than for any one of
them.
The design focus of an ASIP is on specific performance and specific flexibility at low cost for
solving problems in a specific domain. A general-purpose microprocessor aims for the maximum
average performance instead of specific performance. Designers of general-purpose processors
take into consideration both maximum performance and maximum flexibility. The instruction
set must be general enough to support general applications. The compiler should offer
compilation for all programs and adapt to all programmers’ coding behaviors. ASIP designers have to think about applications and cost first. Usually the primary challenges for ASIP designers are silicon cost and power consumption.
Based on the specified function coverage, the goal of an ASIP design is to reach the highest performance over silicon cost, the highest performance over power consumption, and the highest performance over design cost. The requirement on flexibility should be sufficient rather than ultimate, and the performance target is application specific rather than the highest achievable. To minimize silicon cost, an ASIP design usually aims at a custom performance requirement instead of the highest possible performance. Programs running on an ASIP may be relatively short and simple, with ultra-high coding efficiency, so requirements on tool quality, such as the quality of the compiler, can be application specific. For example, for radio baseband processing, a compiler may not be strictly required.
2.3.2 Field Programmable Gate Arrays (FPGAs)
A Field Programmable Gate Array (FPGA) is a software-configurable hardware device that contains an array of programmable logic cells interconnected by a matrix of programmable connections. Each cell can implement a simple logic function defined by a designer’s CAD tool.
The programmability of an FPGA is based on two key principles: the use of programmable functional blocks, and a programmable interconnect that allows multiple blocks to be connected to form more complex logical functions. [13]
The fundamental building block of the FPGA is the Look-Up Table (LUT). A LUT is a Read-Only
Memory (ROM) which may be programmed to emulate logical functions by storing the relevant output in the memory location corresponding to the inputs which produce that output.
Loading the data into each LUT on the chip is known as configuring the FPGA. By connecting
LUTs together using the programmable interconnect, networks of LUTs can implement logic
functions of higher dimensionality. The resulting networks are then connected to the outside world
via the programmable pins. [14]
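The LUT principle can be modeled in a few lines of Python. In this sketch, which is a software analogy and not a description of any vendor's device, the hypothetical class `LUT` stores a truth table at "configuration" time and afterwards only performs a memory read addressed by its input bits. Two 2-input LUTs are then chained to form a 3-input function, mirroring how the interconnect builds larger functions from small tables.

```python
class LUT:
    """k-input look-up table: a 2**k-entry truth table addressed by the inputs."""

    def __init__(self, num_inputs, func):
        self.num_inputs = num_inputs
        # "Configuration": store the output for every input combination.
        self.table = [
            func(*[(addr >> i) & 1 for i in range(num_inputs)]) & 1
            for addr in range(1 << num_inputs)
        ]

    def __call__(self, *bits):
        # Evaluation is a plain memory read; no logic is computed here.
        addr = sum(bit << i for i, bit in enumerate(bits))
        return self.table[addr]


# A 2-input LUT configured as XOR.
xor2 = LUT(2, lambda a, b: a ^ b)


def full_adder_sum(a, b, carry_in):
    """3-input sum bit built from a network of two 2-input LUTs."""
    return xor2(xor2(a, b), carry_in)
```

Reconfiguring the device corresponds to loading a different `table`; the evaluation path stays the same, which is what makes the fabric general purpose.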
FPGA implementation styles [3]
Dedicated-core based: In this style, the underlying FPGA logic resources are used to “host”
the DSP processing engine or core in its entirety. DSP algorithms can be modeled and simulated,
or pre-existing DSP IP cores can be used, to design the whole DSP processing system.
Generating a hardware core from an algorithmic model requires tight coupling between model and
core: a node in the model translates to a component in the circuit. The most commonly used
high-level modeling and synthesis tools for this implementation style are MathWorks
Simulink along with FPGA vendor tools (Altera’s DSP Builder or Xilinx’s System Generator).
Dedicated core development is accomplished through algorithm modeling and simulation in
Simulink and the associated FPGA vendor tool. After the intended algorithm has been modeled,
the FPGA vendor tool generates a VHDL or Verilog HDL description of the algorithmic model that
can be used for further FPGA-based design. Since RTL generation for the algorithm is a
complex task, the high-level synthesis tools play a crucial role in saving development time and
effort through automatic HDL code generation. The logic can then be synthesized, placed and routed
on the FPGA using standard FPGA logic synthesis tools. In this manner, a dedicated core for the
desired DSP logic can be created, generated and synthesized on the target FPGA hardware. FPGA
tools may also be capable of generating test benches along with the HDL code. Such a test
bench can be used for RTL simulation and functional verification. A detailed development flow
for creating such a dedicated core is explained in the next chapter.
Co-processor based: In this style, an available microprocessor architecture is coupled with a
custom-designed co-processor for application-specific DSP. FPGA vendors provide soft-core
embedded processors such as Xilinx’s MicroBlaze and Altera’s Nios II, along with a vast library of
memory and peripheral IP that can be used in conjunction with the designed DSP logic. FPGAs allow
the soft-core processors to be extended with custom datapath co-processors, which results in a
robust platform for developing an application-specific Multi-Processor System on Chip (MPSoC). The
chief advantage is the software programmability of multiple processors delivering high performance
for domain-specific applications. High-level synthesis tools like MathWorks Simulink are employed
for algorithm modeling and simulation. These tools work in tandem with FPGA-vendor-specific tools
like Altera’s DSP Builder and Quartus II and Xilinx’s System Generator to convert an algorithmic
description of a DSP application into an RTL module. This RTL DSP module can be coupled with a
soft-core processor provided by the FPGA vendor, along with memory and other peripherals, to
generate a configuration termed a System on a Programmable Chip. Such a configuration
harnesses the strengths of both the soft-core processor and the core DSP logic: the soft-core
processor handles general-purpose computing tasks while leaving the DSP
core logic free to manage and execute its core DSP functions. As this configuration consists of a
dedicated hardware accelerator combined with a soft-core processor, it is termed a co-processor
based configuration.
Advantages of FPGA based implementation of DSP
Digital signal processing (DSP) algorithms have traditionally been implemented using
application-specific integrated circuits (ASICs) or programmable digital signal processors
(PDSPs). However, with the introduction of large-capacity FPGAs, there has been a shift towards
reconfigurable computing for DSP. The key motivating factors for choosing an FPGA as the target
platform for DSP applications are flexibility, real-time performance and cost. While software
processors allow functional flexibility, the application’s real-time performance requirements, or
physical constraints placed on the embedded realization, for example in terms of size or power
consumption, may be beyond what they can achieve. In such a situation, unless volumes are
sufficiently high, the Non-Recurring Engineering (NRE) costs associated with creating
customized ASICs are such that an ASIC may not be commercially viable.
For such high-performance, low-volume DSP systems, the ability of an FPGA to host custom
computing architectures tailored to the real-time requirements of the application and the physical
requirements of the implementation, at relatively low cost, is a key advantage. The fine-grained
parallelism of FPGAs, coupled with the inherent data parallelism found in many DSP functions,
has made reconfigurable computing a viable alternative that offers a compromise between the
performance of fixed-functionality hardware and the flexibility of software-programmable
devices [3].
Modern FPGAs play host to a range of complex processing resources which can only be
effectively exploited by heterogeneous processing architectures composed of microprocessors
with custom co-processors, parallel software processors and dedicated hardware units. The
complexity of these architectures, coupled with the need for frequent regeneration of the
implementation with each new application makes FPGA system design a highly complex and
unique design problem. The key to success in this process is the ability of the designer to best
exploit the FPGA resources in a custom architecture, and the ability of design tools to quickly
and efficiently generate these architectures [3].
FPGAs offer performance targets not achievable by DSP processors. However, this high
performance comes at a cost. Efficient utilization of the possibilities
provided by modern programmable devices requires knowledge of hardware-specific design methods. Designing a DSP system targeted at FPGA devices is very different from designing it for DSP processors. Most algorithms in use were developed for software implementation.
Such algorithms can be difficult to translate into hardware. Thus the efficiency of FPGA-based
DSP is heavily dependent on the experience of the designer and the designer's ability to tailor the algorithm to an efficient hardware implementation. [DSP design for FPGA Arch]. FPGA implementation has several key advantages: time-to-market is short, upgrading to a new architecture is relatively easy, and low-volume production is cost effective. [4]
2.4 VLSI - DSP: AN INSIGHT INTO THE EDUCATION PERSPECTIVE
While discussing the evolution of DSP hardware in terms of the options available to an engineer, this author, as part of the background study for this thesis, also feels the need to examine the current state of education in VLSI and DSP at the undergraduate and graduate levels. The motivation is to make a solid case for incorporating the teaching of hardware implementation of DSP algorithms and applications in university curricula for electrical and computer engineering students.
In this section, we review the current state of graduate-level DSP and VLSI education and discuss ways to incorporate the hardware implementation aspect in DSP education. This can be seen as an attempt to bring VLSI and embedded system designers together with DSP algorithm engineers, creating the possibility of knowledge sharing and joint development of DSP projects that benefit from the capabilities of both DSP architects and VLSI engineers.
Digital signal processing (DSP) is an area of engineering that has developed rapidly over the past
30 years. DSP is omnipresent, as it is used in many important areas, from multimedia and digital
communication to consumer electronic products such as digital cameras and MP3 players. DSP has enabled the user to clean up noisy signals, speed up the communication rate, and store more data, and it provides many advantages over its analog counterpart. [15]
We are entering an era when it is insufficient to be just a DSP designer, just as it is insufficient to be only a hardware implementation expert. Instead, engineers must be aware of the interactions between design and implementation, especially in the area of digital signal processing. The key to unlocking the huge potential that lies at the convergence of VLSI and DSP is integrating algorithm design and implementation. This approach is centered on producing engineers who can understand the intricacies and interaction of the DSP algorithm and its implementation in real products. The question, however, is to what extent the theory and development of the algorithm should be integrated with its implementation. [16]
2.4.1 Present state of VLSI - DSP education
With regard to digital VLSI education at the graduate level, coursework has generally been restricted to
VLSI design, verification, testing and validation, and electronic design automation courses, with rarely a thought given to the diverse fields that Integrated Circuits (ICs) touch upon, such as Systems-on-Chip, or to the application aspect of Application-Specific
Integrated Circuits (ASICs). It has come to the notice of this author that somewhere in the maze of VLSI education at the graduate level, the importance of “application-specific” has been reduced and the “integrated circuit” aspect has taken precedence. In this process, the core essence of learning and understanding the design, verification and validation of ASICs has been lost. This situation calls for more targeted action to include the study of a range of
“application domains” that can be considered when studying ASIC design. Nowadays,
ASICs are ubiquitous: they can be found in every gadget, appliance, product and system expected to work in a wide array of domains ranging from military systems, aerospace and
aviation, automobiles, industrial automation, medicine and healthcare, entertainment and
multimedia gadgets, consumer electronic appliances and mobile computing products to
telecommunication and networking products. It is therefore necessary for a hardware
engineer to understand questions like:
What function is the ASIC being designed to perform?
In what domain will it be deployed?
What is the external environment, and how does it affect the functioning of the ASIC?
Does the domain influence in any way the ASIC design-verification-testing-validation
lifecycle?
Is the domain consideration necessary only at the architectural level and irrelevant at the
logic/circuit/physical design levels of a typical ASIC design cycle?
DSP courses offered at many universities do not normally include hardware implementation case
scenarios of signal processing algorithms. The scope of these courses is limited to providing an
understanding and working knowledge of commonly used DSP algorithms like Fourier transforms and
digital filtering. Hardware aspects are rarely included, partly due to the high costs associated
with purchasing PDSP or FPGA evaluation kits and full-version software tools. A major reason for
the non-inclusion of FPGA implementation is the fact
that students majoring in signal processing are often not familiar with the VHDL or Verilog
hardware description languages used for FPGA design. Signal processing students benefit if applied
DSP courses are structured in such a way that a hardware implementation aspect is included as an
alternative or complement to DSP theory. [17]
The primary benefit of including the DSP hardware aspect is to help students understand the
process of algorithm-to-architecture transformation. Restricting the study to DSP theory alone
may hinder the realization of how a DSP algorithm can be mapped to a fixed
processor architecture like a Texas Instruments PDSP chip, or to a flexible user-defined architecture synthesized on an FPGA using on-board hard-IP cores like logic blocks, memories and multipliers, or soft-IP cores that are provided by FPGA vendors for a plethora of common DSP tasks. Hence inclusion of such a DSP hardware aspect in a DSP theory class or an advanced DSP class will help students learn a critical aspect of how DSP applications function: they interface with signals in the real world, perform computation on those signals and generate output signals according to a pre-defined algorithm, bringing about a signal transformation that helps the end-user of the DSP system interpret the output signal and make decisions based on it.
2.4.2 Suggested improvements in VLSI - DSP education
DSP theory courses have been standard fare in electrical engineering departments for many years; however, DSP hardware courses are less common, and designing a course syllabus is not so straightforward. Two syllabus decisions involve course emphasis and course structure.
A comprehensive DSP theory and hardware course can be split over two semesters. In the first semester, the emphasis is on teaching the basic DSP theory that is traditionally covered in all universities. An advanced DSP hardware course in the second semester specifically includes a type of DSP hardware as a platform on which DSP algorithms and applications can be tried out. PDSPs and FPGAs are the natural choices when considering the DSP hardware aspect. In fact, the availability of a wide suite of educational hardware evaluation boards offered by Texas Instruments, a leading vendor of PDSPs, and by Altera and Xilinx, leading vendors of FPGAs, has the academic community spoilt for choice.
In such a situation, the choice of hardware boils down to a competition between the differing design methodologies associated with each platform. Conventional wisdom would incline
professors toward having PDSP kits in the labs for DSP hardware study. Texas
Instruments offers a vast array of DSP families in its product portfolio, ranging from multicore
DSPs (C6x) and power-optimized DSPs to low-power DSPs (C54x, C55x). Along with the hardware kit, Texas Instruments provides access to the Code Composer Studio software. Code Composer
Studio™ (CCStudio) is an Eclipse-based integrated development environment for TI's DSPs, microcontrollers and application processors. CCStudio includes a suite of tools used to develop and debug embedded applications. It includes compilers for each of TI's device families, a source code editor, a project build environment, a debugger, a profiler, simulators and many other features.
CCStudio provides a single user interface that takes users through each step of the application development flow. Familiar tools and interfaces allow users to get started quickly and add functionality to their applications using sophisticated productivity tools. [18]
The case for having the FPGA as a hardware platform in DSP hardware classes has become much stronger in recent years with the availability of high-level synthesis tools provided by
FPGA vendors like Altera and Xilinx. These tools help engineers design and implement DSP applications using graphical modeling tools like Simulink. Vendors provide specific tools to convert a Simulink model to Verilog or VHDL code, which can then be synthesized and downloaded onto an FPGA so that the design can be evaluated directly on hardware. Moreover, a complete processor-based system can also be designed using DSP modules that are integrated with soft-IP
cores like processors, memories and peripherals. Hence a complete package of software tools and hardware boards can serve students not only in digital design or embedded systems classes but also in DSP hardware design and deployment classes.
CHAPTER 3: PROBLEM STATEMENT
The previous chapter presented the background information that lays the groundwork for the detailed problem statement given in this chapter, including the scope, objectives and limitations of this thesis. It discussed the advent of VLSI and its profound impact on DSP system implementation, with a detailed account of the evolution of DSP hardware, specifically PDSPs and FPGAs, and the architectural styles associated with each of them.
This chapter continues where the preceding chapter concluded and introduces and expands on the overall design methodology associated with each hardware platform for DSP system development. The first section of this chapter provides a detailed analysis of
PDSP and FPGA design methodologies, discussing the similarities and differences that emerge when working on a DSP application starting from an algorithm, moving to an architecture and concluding with a circuit that realizes the intended application. Hence, the end result is the cumulative effort of implementing the intended algorithm on a target architecture and synthesizing a circuit that realizes the desired DSP application.
3.1 DESIGN METHODOLOGIES: PDSP AND FPGA DESIGN FLOWS
A design methodology is the overall strategy to organize and solve the design tasks at the different steps of the design process. It is not possible to construct a comprehensive design methodology that can be applied to all application domains or hardware platforms, but a sensible methodology can be crafted that spans the common features present within a domain or platform.
A comparative study of PDSP and FPGA development flows can be performed along three dimensions: algorithm modeling and simulation, architecture design and analysis, and circuit
synthesis and implementation. The development flow for each platform consists of steps that correspond to these three distinct phases of system design, namely algorithm, architecture, and circuit.
The first step in the development flow concerns the mapping or transformation of a
DSP algorithm onto a suitable hardware architecture. This step consists of modeling a DSP algorithm in a high-level programming language like C or C++, an assembly language, or a hardware description language (HDL) like Verilog HDL or VHDL. In FPGA-based design, HDLs are generally used to generate a functional description of the DSP algorithm, also called the Register Transfer Level (RTL) description. Nowadays, with the advent of “high-level synthesis tools”, it has become possible to model algorithms in graphical tools like
Simulink, combined with extensive support for algorithm simulation, verification and debugging.
The second step involves designing an architecture for the underlying hardware that is tailored to the DSP application, taking into consideration the hardware platform that will host the architecture.
In certain hardware platforms like PDSPs, the architecture is fixed at the time of manufacture and hence cannot be modified or reconfigured during DSP system development. In contrast,
FPGAs are a class of reconfigurable and programmable logic that allows the development of custom architectures tailored to the computational, memory and power requirements of the application for which the DSP system is designed. This has removed the preconception that FPGAs are only used as a ‘glue logic’ platform and more realistically shows that FPGAs are a collection of system components with which the user can create a DSP system.
The third step consists of circuit synthesis and implementation. In the case of FPGAs, this can be viewed as downloading the custom architecture designed in the previous step onto the FPGA using logic synthesis tools. A logic synthesis tool takes an RTL hardware description and a standard cell library as input and produces a gate-level netlist as output. The resulting gate-level
netlist is a completely structural description with only standard cells at the leaves of the design.
Internally, a synthesis tool performs many steps, including high-level RTL optimizations, RTL-to-Boolean-logic translation, technology-independent optimizations, and finally technology mapping to the available standard cells. In the case of PDSPs, since the architecture is fixed, there is no separate need for synthesizing it. The final step involves system integration of the software components and the hardware architecture. Therefore, software development in high-level languages or graphical tools for the DSP algorithms, using the target architecture on the desired hardware platform, is an essential and final step in deploying the envisioned DSP application on the target hardware platform.
Given below is a detailed account of the development flow for PDSP followed by FPGA.
3.1.1 DSP system development flow using PDSPs
As stated above, the generic steps for DSP system development using PDSPs remain the same.
Here each step has been elaborated in the context of PDSP as a hardware platform. The diagram below shows the design flow for PDSPs. [19]
Figure 2: PDSP design flow [19]
The flowchart above [19] depicts each stage of DSP system development, starting with system requirements and specifications, followed by algorithm development and simulation, hardware/software partitioning, development and prototyping, and concluding with system integration, testing, debug and final release.
Algorithm-level modeling and simulation is generally done using high-level languages like
MATLAB or C/C++. Simulations are performed using tools that run on a general-purpose computer so that results can be analyzed and input data tested before migrating the algorithm to the target hardware platform. The advantage of this step is that it is easy to test, debug and, if required, modify programs in high-level languages. This saves significant software development time.
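A minimal sketch of such an algorithm-level simulation is shown below, written in Python rather than MATLAB or C purely for illustration. It compares a floating-point reference model of a moving-average filter against the same filter fed with inputs quantized to 15 fractional bits, as a fixed-point PDSP might store them. The function names, the 50 Hz test tone, the 1 kHz sampling rate and the word length are all assumptions chosen for this example.

```python
import math


def moving_average(signal, length):
    """Reference model: average of the last `length` samples (zeros before start)."""
    out = []
    for n in range(len(signal)):
        window = signal[max(0, n - length + 1): n + 1]
        out.append(sum(window) / length)
    return out


def quantize(signal, frac_bits):
    """Round each sample to a grid of 2**-frac_bits, mimicking fixed-point storage."""
    scale = 1 << frac_bits
    return [round(x * scale) / scale for x in signal]


def max_abs_error(a, b):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Test stimulus: a hypothetical 50 Hz tone sampled at 1 kHz.
tone = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(200)]
reference = moving_average(tone, 4)
fixed_input = moving_average(quantize(tone, 15), 4)
error = max_abs_error(reference, fixed_input)
```

Checking that `error` stays within the expected quantization bound on the host computer is far cheaper than discovering a word-length problem after the code has been ported to the target processor.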
The choice of the PDSP depends upon a variety of factors, such as the kind of algorithm to be implemented, the processing requirements of the algorithm, code size, etc. The objective is to select the chip that best matches the project’s time scales and cost calculations. Moreover, the vast array of development tools supported by PDSP vendors has emerged as a key factor in the selection of PDSP chips. These tools range from compilers, assemblers and simulators in the software development and debug category to in-circuit emulators and logic analyzers in the hardware testing category, along with commercially available evaluation boards for prototyping and software development before the actual PDSP chip is purchased for mass deployment of applications.
The software development stage is the most important stage in DSP system development using PDSPs. This stage mirrors a typical software development life-cycle that involves the phases of requirements specification, analysis, design, implementation, integration, testing and debug, concluding with release. High-level programming languages like C/C++ are the most commonly used tools for software development in PDSP based designs. Assembly language programming, on the other hand, gives the programmer direct access to the various functions of the processor, resulting in a highly efficient mapping of the algorithm to the processor, but it is generally discouraged because it is time-consuming and complex. Software simulators or hardware platforms are used for debug and testing purposes in software development. Emulators are used when software needs to be tested on the target hardware. DSP software development engineers largely prefer C because of the wide array of C compilers available for different hardware platforms. This is in addition to the inherent advantages of the C language itself, such as its support for data structures and its powerful constructs.
Hence, the PDSP design flow for DSP system development shows a well-defined pathway for system realization with the software development stage acquiring primacy over other stages and as a key deciding factor in the selection of a particular PDSP chip.
Currently, Texas Instruments and Analog Devices are the biggest vendors of PDSP chips, each offering an associated suite of development tools and end-to-end solutions across a broad spectrum of application domains such as industrial, medical, entertainment, military, aerospace and aviation.
3.1.2 DSP system development flow using FPGAs
The previous chapter described how FPGAs have become a competitive alternative for high-performance DSP applications, previously dominated by PDSP and ASIC devices. It also explained the implementation styles of using an FPGA as a DSP co-processor as well as a stand-alone DSP engine, also called the dedicated core approach.
In the past few years, the electronics industry has seen upgraded versions of products introduced frequently. This forces engineers to build a product development strategy that is flexible, fast and low cost, and that supports rapid product innovation and evolution. The strategy should help system designers react in real time to customer feedback and market changes; tailor features of a basic design for different users, regions, or price points; develop differentiated features before the competition; and maintain the first-mover advantage that is so critical to market success. This “design once, make many” approach improves productivity, saves development time, and ultimately saves money.
The main barrier to acceptance of FPGAs for new users has been ease of use and design flow, which is now being addressed by the emergence of new development platforms combined with high-level design methodologies and their associated software tools.
The role of high-level synthesis tools in FPGA based design has been significant and noteworthy in that they have greatly simplified the process of algorithm modeling and simulation. In the absence of high-level modeling tools, a DSP system designer is expected to write HDL programs for any DSP algorithm that needs to be modeled and simulated. This is a time-consuming process because HDL coding complicates and expands the design effort and, for modeling and simulation, provides no specific advantages over using high-level programming languages like C/C++.
Before examining DSP system design on FPGAs using high-level synthesis tools and a model based design framework, it is prudent to discuss the conventional approach of FPGA based design using Hardware Description Languages (HDLs). The section below discusses the conventional method of HDL based design, followed by a brief comment on the programming challenges encountered in HDL based design that have compelled the adoption of a model based design framework using high-level synthesis and verification tools.
Figure 3: FPGA based DSP design approaches
Conventional approach: HDL based design and its pitfalls
Hardware designers who program FPGAs predominantly use HDLs such as Verilog and VHDL, which operate at a lower level of abstraction than high-level programming languages like C or C++. This is because a hardware designer thinks about a design in terms of low-level building blocks such as basic Boolean gates (AND, OR, NAND, NOR, XOR), multiplexers, decoders, adders, multipliers, flip-flops and registers. Employing Verilog or VHDL therefore makes it easier for the designer to construct the structural logic required to perform the necessary function, and the emphasis is more on the structural representation of the intended application than on its behavioral aspects [24].
In contrast, programming general purpose CPUs using high-level programming languages (HLLs) enjoys the advantages of the solid Instruction Set Architectures (ISAs) of the microprocessors and the availability of cutting-edge compilers that enable a simpler programming experience. Moreover, a higher level of abstraction drastically increases the programmer’s efficiency and reduces the possibility of bugs, resulting in faster time to market [24].
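The difference between a structural and a behavioral description can be illustrated with a small adder. The sketch below is written in Python purely for illustration (it is not HDL and is not taken from [24]); the function names and the 4-bit width are assumptions. The structural version wires together gate-level full adders, while the behavioral version simply states the intended arithmetic.

```python
def full_adder(a, b, cin):
    """One-bit full adder built only from XOR, AND and OR operations."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width=4):
    """Structural view: chain `width` full adders, passing the carry along."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


def behavioral_add(x, y, width=4):
    """Behavioral view: state the arithmetic and let the tools pick the structure."""
    total = x + y
    return total & ((1 << width) - 1), total >> width


print(ripple_carry_add(9, 8), behavioral_add(9, 8))
```

Both functions return the same sum and carry-out for every pair of 4-bit operands; only the level of abstraction at which the adder is described differs.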
After an architectural and logic-level description is created using HDLs, EDA tools are used to synthesize the design and generate a configuration bitstream for the FPGA from the Verilog or VHDL code. This is the most significant and time-consuming step, and it relies heavily on the performance of powerful EDA tools to generate a bitstream that adheres to the resource and timing constraints of the target FPGA device. This phase typically consists of several steps such as synthesis, translation, target hardware resource mapping, place and route, and timing analysis, ending with the generation of a bitstream configuration file for downloading onto the FPGA.
This entire flow described above can be broken down into two major phases: “high-level synthesis” and “logic synthesis”. High-level synthesis can be defined as the conversion of an algorithmic (behavioral) description of an application to a low-level RTL (structural) description using logic gates. Logic synthesis can be defined as the process of converting an RTL description into a low-level netlist specific to the target hardware with extensive use of the target’s technology libraries.
To ease the programming of FPGAs, several frameworks have been proposed [24] with the chief objective of elevating the level of abstraction at which the hardware designer writes a program, which is then compiled down to VHDL or Verilog. Three kinds of framework (HDL-like, HLL based and model based) are the most widely used and are discussed here.
The model based framework has been separately explained in the next section.
HDL-like frameworks: An example of this is SystemVerilog, which contains two components: the synthesizable component, which extends the Verilog-2005 standard with several features, and the verification component, which uses an object-oriented model closer to C++ or Java than to Verilog.
HLL based frameworks: An example of this is the use of high-level programming languages like C, C++ or SystemC, for which high-end compilers convert a behavioral description into an RTL description. EDA vendors have developed tools to support high-level synthesis, for example Cadence C-to-Silicon Compiler, Synopsys Synphony C Synthesis, and Mentor Graphics Catapult C.
The next phase, design verification, often consumes a large portion of the overall design cycle time. In HDL based designs, a test bench must be created and connected to the design under test. The design under test and the test bench are simulated using event-based simulators like ModelSim from Mentor Graphics Corporation, wherein each signal transaction is recorded and displayed as a waveform. This waveform based debugging may work for smaller designs, but the complexity and simulation time increase as designs become larger. To avoid this, hardware designers often turn to emulation using FPGAs to speed up simulation. This whole process is in sharp contrast to software verification, which is much less complicated and has numerous debugging tools to verify programs in a modest yet potent way [24].
Modern approach: model based design framework
High-level synthesis takes an abstract “behavioral specification” of a digital system and builds a register-transfer level (RTL) “structure” that realizes the given behavior. The task is to take a specification of the behavior required of a system and a set of constraints and goals to be satisfied, and to form a structure that implements the behavior while satisfying the goals and constraints. Hence High-level synthesis actually maps algorithms to architectures.
Traditionally, algorithm designers used MATLAB or C to validate algorithms, without feedback about the practical feasibility of a hardware system. Hardware designers then re-entered the design using Hardware Description Languages (HDLs), which involved numerous changes to the algorithm. The result had to be re-validated by the algorithm designer, leading to multiple rounds of coding and verification of the design and significantly lengthening the development cycle [23].
The arrival of graphical high-level modeling and synthesis tools like Simulink by MathWorks, in conjunction with FPGA vendor tools like Altera’s DSP Builder or Xilinx’s System Generator, has revolutionized the whole procedure of DSP system design and led to the emergence and adoption of the FPGA as a credible alternative to fixed architecture PDSP chips or high cost ASICs.
A unified Simulink design environment/ framework widely adopted by the algorithm designers can be depicted as below [23]
Figure 4: Model based design Framework using Simulink [6]
Figure 5: FPGA Verification [6]
Using this approach, a design needs to be entered only once. The environment enables both algorithm verification and hardware emulation, and also provides an abstract view of the design architecture. Finally, it allows FPGA-based verification, again using the same input description.
Behavioral HDL is produced which allows algorithm mapping onto an FPGA for hardware
emulation. EDA tools run an initial synthesis and HDL simulation to verify functional
equivalency between the two hardware descriptions. Mapped HDL can then be synthesized into a
GDSII format. Hence, graphical block-based design entry restores the missing link between
algorithm and circuit designers. It enables a single design entry, architecture optimization, and
final hardware verification within the Simulink environment, which is widely adopted by the
algorithmic designers.
The above figure clearly shows that it is straightforward to map a fixed architecture to a target
technology. With some architectural feedback from the underlying technology such as speed,
power, and area of the building blocks, the architecture can be optimized in Simulink.
Leveraging this flow, architectural trade-offs can then be explored, allowing the designer to
minimize power and area for a given technology, for a specified throughput constraint for an
algorithm. The final phase of the design flow is verification. Simulink is used in the entire design
cycle: design entry, architecture optimization, and final verification. Behavioral HDL generated
by Simulink can be used to emulate the design on the FPGA.
Model based design flow for FPGA based implementation: Goals
According to [6] the five major goals of model based design are listed below:
1. Provide DSP system modeling capability at a high level of abstraction, with the availability of simple arithmetic and logic operators together with specific operators for typical DSP computations like DFT, FFT, FIR and IIR filters etc.
2. Ensure that the same model can be used throughout the design process, from performing initial theoretical design, to simulation, RTL code generation and system integration.
3. Generate an FPGA implementation of the data path from the system model automatically, without requiring the addition of device-specific information.
4. Support the task of synthesizing control logic for the FPGA implementation.
5. Automate the creation of test benches for performing logic simulation on the final FPGA implementation.
Model based design flow for FPGA based implementation: Process [6]
Model based design is defined as the process of systematic generation of a hardware
representation for an intended DSP application traversing through multiple phases. It starts with
an algorithmic model followed by RTL code generation to describe the functional representation
of the system to a gate-level netlist as a structural representation that eventually gets mapped and
synthesized to a specific target hardware platform, usually an FPGA.
Simulink is a graphical modeling tool; designers create algorithmic models using available
blocksets from MathWorks and other arithmetic and logic blocksets provided by Altera’s DSP
Builder. Simulink also provides extensive simulation capabilities including floating point types.
DSP Builder is Altera’s high level modeling and synthesis tool that works in conjunction with
and is ingrained as a part of Simulink. DSP Builder simplifies hardware implementation of DSP
functions, provides system verification capabilities to the system engineer who may not even
be familiar with the HDL based design flow and allows the system engineer to implement DSP
functions on an FPGA without the knowledge of HDLs. DSP Builder shortens design cycles by
helping construct the hardware representation of a DSP design in an algorithm-friendly
development environment. It integrates the algorithm development, simulation, and verification
capabilities of MathWorks MATLAB and Simulink system-level design tools with the Altera
Quartus II software and third-party synthesis and simulation tools.
Given below is the diagram [6] and description of the model based design flow that uses
Simulink and DSP Builder as high level synthesis tools.
Figure 6: Model based design: complete design flow [6]
The first step in model based design involves model construction and simulation using Simulink and DSP Builder function blocksets for the intended DSP algorithms. The development environment is Simulink where DSP Builder blocksets are ingrained. When the model is
converted into a form that can be realized on the FPGA, the system designer can invoke the
netlister and testbench generator.
The netlister extracts a hierarchical representation of the model’s structure annotated with all the
element parameters and signal data types. A mapper then analyzes the elements in the hierarchy
and creates a VHDL description of the design. The test bench generator is an interactive tool that
runs in the Simulink environment, where the designer can capture the input stimuli and system
outputs of selected simulation runs for conversion to test patterns. The test bench generator
converts the captured simulation data into VHDL code that will verify the algorithm being
modeled and test its outputs against the expected results.
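The idea behind the test bench generator can be sketched in a few lines: stimuli and expected outputs are captured from a simulation run of the reference model, stored as test patterns, and later replayed against an implementation. The Python sketch below is a hypothetical illustration only; it does not reproduce the DSP Builder tool or its VHDL output, and the names `reference_model`, `capture_vectors`, `format_vectors` and `replay`, as well as the hexadecimal pattern format, are assumptions.

```python
def reference_model(samples):
    """Hypothetical golden model: sum of the current and the previous sample."""
    prev = 0
    out = []
    for x in samples:
        out.append(x + prev)
        prev = x
    return out


def capture_vectors(model, stimuli):
    """Run the model once and pair every stimulus with its expected output."""
    return list(zip(stimuli, model(stimuli)))


def format_vectors(vectors, width_bits=8):
    """Render the captured pairs as fixed-width hexadecimal test patterns."""
    digits = (width_bits + 3) // 4
    mask = (1 << width_bits) - 1
    return [f"{x & mask:0{digits}X} {y & mask:0{digits}X}" for x, y in vectors]


def replay(dut, vectors):
    """Apply the captured stimuli to `dut` and return the indices that mismatch."""
    stimuli = [x for x, _ in vectors]
    actual = dut(stimuli)
    return [i for i, ((_, expected), got) in enumerate(zip(vectors, actual))
            if expected != got]


vectors = capture_vectors(reference_model, [1, 2, 3, 4])
patterns = format_vectors(vectors)
print(patterns)
print(replay(reference_model, vectors))
```

An empty list from `replay` means the implementation under test reproduced every captured output; any index in the list points to a test pattern that failed.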
The FPGA vendor specific tools take over from this stage to synthesize the control logic and
combine all the pieces into a single fully-realized netlist, and place and route the design in an
FPGA. The outputs of this back-end process are a bit-stream (FPGA configuration file) and an
EDIF (Electronic Design Interchange Format) structural netlist of the hardware annotated with
timing information. This netlist can be simulated with the test vectors produced previously from
system simulations to verify the performance of the completed FPGA hardware realization.
The step by step process for Altera FPGAs is as follows [6]:
• Model Creation in Simulink and DSP Builder
• Simulate the model within Simulink
• Convert the Simulink model to an FPGA realizable form
• HDL Code generation & Test bench generation in DSP Builder
• Simulation of DSP model as a Quartus II (Altera’s FPGA design tool) project
• FPGA specific netlist generation
• Download the design on FPGA and test
From the above description of the FPGA Model based design flow it is clear that FPGA based
design of DSP systems using high-level development tools has opened the floodgates for a large
number of DSP algorithm engineers to actively contribute in the process of hardware system
realization shoulder to shoulder with their VLSI counterparts. In turn, it has created vast
opportunities for VLSI engineers to gain domain specific knowledge of the DSP application
domain and envision the chip design process from an algorithm developer’s perspective.
In the past, DSP FPGA design required the combined efforts of a DSP engineer and a VLSI
engineer familiar with HDL based design. But the FPGA model based design approach helps a DSP
algorithm engineer to derive an HDL netlist for a data path directly from a system level tool. The
steps include construction of an ideal mathematical model, investigation of implementation
effects, test-bench creation, and hardware netlist generation. Hence it provides a seamless path
from system-level algorithm design to FPGA implementation.
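One common example of an implementation effect is coefficient quantization, which arises when an ideal floating point model is converted to fixed-point arithmetic for hardware. The sketch below assumes, for illustration only, that the "implementation effects" mentioned above include such finite word length effects; the Q15 format, the helper names `to_q15` and `from_q15`, and the coefficient values are likewise assumptions and are not taken from [6].

```python
Q15_SCALE = 32768  # 2**15: one sign bit plus 15 fractional bits


def to_q15(value):
    """Round a real value in [-1, 1) to a 16-bit Q15 integer, saturating at the limits."""
    scaled = int(round(value * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, scaled))


def from_q15(q):
    """Convert a Q15 integer back to the real value it represents."""
    return q / Q15_SCALE


# Hypothetical ideal (floating point) filter coefficients.
ideal = [0.1, 0.2, 0.4, 0.2, 0.1]
quantized = [to_q15(c) for c in ideal]
errors = [abs(c - from_q15(q)) for c, q in zip(ideal, quantized)]

print(quantized)
print(max(errors))
```

Comparing the ideal and quantized coefficients in this way gives a first estimate of the precision lost before any hardware netlist is generated; for values that do not saturate, the error is bounded by half of one least significant bit.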
3.2 PDSP AND FPGA COMPARISON: SUMMARY OF PAST CASE STUDIES
Now that the design methodologies for DSP system development using PDSPs and FPGAs have
been explained in the above two sections, it shall be necessary to demonstrate working examples
of the two hardware platforms that have already been developed and deployed. In the following
section, we will take a look at a limited set of examples in the area of DSP system
implementation. These examples have been culled from available research conference
proceedings and journal articles.
Presented below is a summary from three case studies from past literature that specifically deal
with the topic of hardware platform comparison (PDSP vs. FPGA) for DSP system
implementation. These DSP systems belong to varied application domains like sound signal processing and image processing. This survey, together with the material presented so far in this thesis, forms the theoretical foundation on which the problem statement of this research is based.
Case Study # 1
The first case study [20] deals with the FPGA implementation of a discrete wavelet transform (DWT) algorithm used for real-time image compression applications. The DWT is the core transform used in the JPEG2000 image compression standard. The goal of this paper was to compare the performance of a traditional DSP processor against an FPGA in terms of development effort and processing performance.
The hardware platforms used were a TI C6416 DSP chip and Altera’s DE2 board with a Cyclone II FPGA. The FPGA implementation was accomplished using handwritten VHDL code and the PDSP implementation using C programming.
The Cyclone II FPGA executed the DWT in 164,354 clock cycles at a 50 MHz clock rate, resulting in an execution time of 3.2 milliseconds for a 128 x 128 pixel image. In contrast, the PDSP chip required 10,770,432 clock cycles at a 600 MHz clock rate, leading to 17.9 ms to do the same job. In terms of execution performance, the FPGA outperformed the PDSP by a wide margin. In terms of hardware utilization, the FPGA design required 742 Logic Elements out of a total of 33,216, resulting in 2.2% hardware utilization. The PDSP design occupied 67.2 KB of the 1024 KB on-chip memory, or 6.5% of the total size. In this case too, the FPGA design was more efficient than the PDSP design. The only metric where the PDSP raced ahead was the lines of code.
The PDSP design required 132 lines of C code whereas the FPGA design required 429 lines of VHDL code, 3.2 times as many lines as the PDSP. This factor has since been tackled, and new results have demonstrated that with the use of high-level synthesis tools instead of hand-written VHDL code, the number of lines of code required for FPGA based designs can be reduced greatly. More details are presented in the next case study.
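The derived figures in this case study follow from simple arithmetic on the reported cycle counts, clock rates and resource totals. The sketch below recomputes them in Python; the variable and function names are illustrative, and only the input numbers come from the text above. Note that the FPGA execution time evaluates to about 3.29 ms, so the quoted 3.2 ms appears to be a truncated value.

```python
def execution_time_ms(cycles, clock_hz):
    """Execution time in milliseconds = clock cycles / clock frequency."""
    return cycles / clock_hz * 1e3


# Execution time: cycle counts and clock rates as reported for [20].
fpga_ms = execution_time_ms(164_354, 50e6)
pdsp_ms = execution_time_ms(10_770_432, 600e6)
speedup = pdsp_ms / fpga_ms

# Resource utilization as a percentage of the available total.
le_utilization = 742 / 33_216 * 100   # FPGA logic elements
mem_utilization = 67.2 / 1024 * 100   # PDSP on-chip memory in KB

# Code size ratio: VHDL lines over C lines.
loc_ratio = 429 / 132

print(f"FPGA {fpga_ms:.2f} ms, PDSP {pdsp_ms:.2f} ms, speedup {speedup:.2f}x")
print(f"LE {le_utilization:.1f}%, memory {mem_utilization:.1f}%, LOC ratio {loc_ratio:.2f}")
```

The same three ratios (execution time, utilization and code size) are the common metrics on which the case studies in this section are compared.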
Case Study # 2
The second case study [21] is an offshoot of the first case study presented above. It delves deeper into the issue of FPGA implementation techniques and proposes a new method of FPGA implementation using high-level synthesis tools like Simulink/DSP Builder. The author states that the goal was to study the feasibility of an approach based on high-level synthesis tools, using Simulink/DSP Builder, by comparing its performance with a handwritten VHDL implementation. It goes on to show that the Simulink/DSP Builder technique is more efficient and faster than the traditional VHDL coding technique.
The hardware platform used was a Cyclone II FPGA on an Altera DE2 board. To evaluate the quality of results of the Simulink/DSP Builder tool against hand-written VHDL, performance in terms of hardware utilization and execution time was measured. Two algorithms of the JPEG2000 standard, Daubechies 5/3 and Daubechies 9/7, were implemented for comparison. The following results were obtained [21]:
RESULTS
                  VHDL                            Simulink/DSP Builder
Algorithm         Logic Units   Clock Frequency   Logic Units   Clock Frequency
Daubechies 5/3    110           203.13            79            250.4
Daubechies 9/7    133           66.08             107           109
Table 1: Results for Case Study # 2 [21]
The above results show that Simulink/DSP Builder outperformed hand-written VHDL in both clock frequency and hardware utilization. This has led to the emergence of high-level synthesis tools as a competitive alternative to traditional VHDL based implementation in FPGAs, allowing for faster implementation and faster time to market.
Case Study # 3
This case study [22] discusses the comparison of FPGA and DSP development environments and performance for acoustic array processing.
The purpose of the sound localization system is to process signals from an array of microphones to determine the direction of arrival of an impulsive acoustic signal like a gunshot or a handclap.
The application uses a fixed planar array of microphones to capture impulsive acoustic signals from a point source. The sound source is assumed to be far enough away to justify a far-field assumption. Additionally, the sound source is assumed to be located in the same plane as the microphone array. The objective is to locate the direction of the (far-field) sound source. This is accomplished by estimating the relative time delays between the arrivals of the acoustic impulse at each microphone. The system includes a filtering stage, a correlation stage, and a trigonometric math (angle of arrival calculation) stage. All of these stages are computationally intensive.
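The correlation and angle calculation stages can be illustrated with a simplified two-microphone sketch. This is not the algorithm of [22]: the reduction to a single microphone pair, the function names, the 48 kHz sample rate, the 0.1 m spacing and the 343 m/s speed of sound are all assumptions made for illustration. Under the far-field assumption, a delay tau between two microphones a distance d apart corresponds to an angle theta from broadside with sin(theta) = c * tau / d.

```python
import math


def cross_correlation_lag(ref, sig, max_lag):
    """Return the lag (in samples) that maximizes the sum of ref[n] * sig[n + lag]."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(ref)):
            m = n + lag
            if 0 <= m < len(sig):
                acc += ref[n] * sig[m]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def angle_of_arrival_deg(lag_samples, sample_rate_hz, mic_spacing_m,
                         speed_of_sound=343.0):
    """Far-field angle from broadside: sin(theta) = c * delay / spacing."""
    delay = lag_samples / sample_rate_hz
    ratio = delay * speed_of_sound / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding outside [-1, 1]
    return math.degrees(math.asin(ratio))


# Hypothetical impulse reaching microphone B three samples after microphone A.
mic_a = [0.0] * 32
mic_b = [0.0] * 32
mic_a[10] = 1.0
mic_b[13] = 1.0

lag = cross_correlation_lag(mic_a, mic_b, max_lag=8)
angle = angle_of_arrival_deg(lag, sample_rate_hz=48_000, mic_spacing_m=0.1)
print(lag, round(angle, 2))
```

The nested loop in `cross_correlation_lag` performs one multiply-accumulate per sample for every candidate lag, which indicates why the correlation stage is computationally intensive.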
The DSP implementation was accomplished using Texas Instruments’ Code Composer Studio 3.3 development software with a TMS320C6711 DSK as the target hardware, whose floating point capability reduces the development time for most DSP applications. The C6711 DSK board operates at 100 MHz and is capable of completing eight 32-bit instructions per cycle. The DSP on the board has eight independent functional units: four floating point ALUs (arithmetic logic units), two fixed point ALUs, and two fixed/floating point multipliers. Subroutines, programmed in C, were developed for the preprocessing, partial cross-correlation and angle calculation portions of the algorithm.
The Xilinx Virtex-II FPGA contains 96 18-bit x 18-bit multipliers, 96 18 Kbit block RAMs, and 3,584 configurable logic blocks (CLBs). Xilinx rates the part as the equivalent of 3,000,000 logic gates. The clocking available to the FPGA can run at speeds up to 120 MHz. The FPGA development was accomplished using MathWorks’ Simulink in conjunction with Xilinx’s System Generator blockset. The functionality of the top-level design, consisting of three blocks, was specified using a lower-level diagram containing System Generator blocks. The preprocessing block and the angle of arrival block were designed exclusively using basic System Generator blocks. The System Generator block set does not provide a correlation block. Hence the correlation function can be implemented either with an assembly of smaller blocks or using the “black box” feature of System Generator. The black box feature allows the user to develop a custom block whose functionality is specified using either Verilog HDL or VHDL.
The major focus [22] was to determine how the development tools affected the development time and the processing performance. For a comparison of the development time, a metric of equivalent lines of code (LOC) was used. Since this cannot be applied directly to graphical languages such as Simulink, an “equivalent LOC” metric for Simulink code was developed, which corresponds to the development time at least as a rough estimate. The FPGA design line count came to a total of 429 lines of code. By comparison, the DSP design contained 86. The final results [22] indicate that the FPGA design required 4.6 times more lines of code. In terms of timing performance, the FPGA implementation is significantly faster than the DSP. The DSP took 25,725,060 clock cycles to produce a final answer running at a clock rate of 100 MHz. The FPGA took 23,005 clock cycles to produce a final answer running at a clock rate of 40 MHz. This resulted in operating times of 257.3 ms and 0.575 ms respectively, a speedup factor of 447.
The above results indicate a wide trade-off between the two approaches in terms of hardware platforms, implementation methodologies and software tools. Development time for the FPGA version can be shortened and the quality of results can be increased when a design can be implemented using major functions that are described by standard blocks made available by high level synthesis tools.
The core objective of the above survey was to understand, in each of the case studies mentioned, the procedure that has been adopted for implementation on the PDSP and FPGA hardware platforms, the nature of the algorithm that has been implemented, the high level languages or synthesis tools that have been used and the impact they have caused on the development time and processing performance. Results of each case study can be compared on the basis of common metrics like execution time, hardware utilization and code size. These results eventually help in building a case for performing such a qualitative and quantitative analysis that can be applied to hitherto untouched application domains of DSP like biomedical applications.
3.3 PROBLEM DEFINITION: SCOPE, GOALS & OBJECTIVES
Based on the exhaustive background study presented in the previous chapter, the detailed explanation of the design methodologies associated with each hardware platform, and a summary of previous case studies specifically reflecting the subject of this research (VLSI implementation of DSP systems: a comparative study of PDSP and FPGA design methodologies for DSP system design), a comprehensive problem statement has been formulated and is presented below. The problem statement outlines its scope and limitations.
The principal focus of this research is to compare the design methodologies that come into play when a specific hardware platform is selected as target hardware for implementation. This research shall also expand its scope to evaluate the impact that high-level synthesis tools have come to have in the course of DSP system development.
An attempt has been made to provide a basic level of comparison between the two platforms to help hardware engineers and algorithm architects decide on a specific hardware platform. The three case studies discussed earlier in this chapter have by and large focused their attention on comparing PDSP and FPGA implementations for a particular class of DSP applications like image processing and acoustics. However, in this thesis, the experiments have been restricted to generic DSP algorithms like DFT, FFT and digital filters. The key intention here is not to get bound to any particular class of DSP applications but to provide a “generic template” for engineers and architects alike to make an informed decision on hardware platform selection.
Considering this author’s primary academic background is in the area of VLSI and embedded systems, the comparative study and the resulting selection template that shall be the end-result of this thesis, should be viewed as an honest attempt to arrive at a conclusion as seen from a hardware engineer’s perspective with some degree of working knowledge of the domain of signal processing. The basic list of experiments along with the necessary hardware and software resources needed for an efficient and purposeful investigation, recording of results and drawing up of inferences have been presented in the next chapter.
CHAPTER 4: EXPERIMENTS
4.1 INTRODUCTION
The preceding two chapters have laid the groundwork of a strong theoretical foundation that
consisted of a detailed overview and background survey of the broad area of research that this
thesis discusses. It included the evolution of DSP hardware, the architectural options that
accompany each type of DSP hardware platform, the design methodologies associated with each
DSP hardware platform and their similarities and differences concluding with a survey of past
case studies that have explicitly overseen and discussed the comparison of hardware platforms
for the VLSI implementation of DSP systems.
The previous chapter also formally stated the problem statement of this thesis that has essentially
set out the goals to be achieved at the end of this research:
1. To compare the design methodologies for PDSP based design and FPGA based design
2. To analyze the PDSP and FPGA implementations with regards to their development effort for
a generic set of algorithms
To accomplish the objective of validating these goals, a set of algorithms has been formulated
to be implemented on the two DSP hardware platforms mentioned in the previous chapters:
PDSP and FPGA. This set of algorithms represents the most commonly used mathematical
computations that are used in virtually all types of DSP algorithms in every plausible application
domain ranging from image processing, speech processing, video processing to complex systems
like biomedical devices, military and aerospace systems.
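As an example of the kind of generic computation meant here, the following is a minimal sketch of a direct (non-fast) discrete Fourier transform, X[k] = sum over t of x[t] * exp(-2*pi*j*k*t/N). It is written in Python for illustration only and is not the implementation used in the experiments; the function name `dft` and the test signal are assumptions.

```python
import cmath
import math


def dft(samples):
    """Direct O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical test signal: one full cosine cycle over 8 samples.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(signal)
magnitudes = [round(abs(x), 6) for x in spectrum]
print(magnitudes)
```

For this input the energy concentrates in bins 1 and 7, each with magnitude N/2 = 4, which is a convenient sanity check when the same kernel is later ported to a PDSP or an FPGA.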
4.2 HARDWARE AND SOFTWARE OVERVIEW
Before describing the proposed experiments and the techniques deployed for implementation, it is essential to explain in detail the specific hardware platforms and software technologies that have been utilized for the experiments. For this purpose, we have selected a Texas Instruments TMS320C5515 DSP evaluation board as the host platform for the PDSP implementation and an Altera DE-1 board with a Cyclone II FPGA as the platform for the FPGA implementation. Given below is a brief summary of the hardware evaluation boards and the software associated with those platforms.
4.2.1 Hardware – PDSP: Texas Instruments C5515 Evaluation Board
The PDSP hardware platform used for this thesis is the Texas Instruments TMS320 C5515
Evaluation Module (EVM).
Figure 7: Texas Instruments C5515 Evaluation Board [25]
The C5515 EVM is a standalone development platform that enables users to evaluate and develop applications for the TI C5515 Digital Signal Processor (DSP). The EVM is designed to work with TI’s Code Composer Studio (CCS) Integrated Development Environment (IDE). Code
Composer Studio communicates with the EVM board through the external emulator header, or on board emulation. The EVM operates from a +5V external power supply or battery. The EVM comes with a full complement of on-board devices that suit a wide variety of application environments. [25]
The key features of the C5515 EVM are [25]:
A Texas Instruments TMS320C5515 DSP operating up to 100 MHz
128 Mbytes of Mobile SDRAM
16 Megabytes of NOR Flash
64 Megabytes of NAND Flash
128 x 128 bit mapped color LCD display
10 User push button switches
External JTAG emulation interface
Embedded JTAG controller
RS-232 Interface
MMC / SD Media Card Connector
User USB 2.0 port via C5515
I2C EEPROM (256Kbits) and SPI EEPROM (256Kbits)
Expansion connectors for Bluetooth interface
TPS65023 Power Management IC for individual C5515 power rail control
TLV320AIC3204 stereo codec with line in, line out, headphone, mic in, on board microphones
INA219 power measurement devices
Optional battery power
TI TMS320 C5515 DSP Processor [26]
The TMS320C5515 digital-signal processor (DSP) contains a high-performance, low-power DSP to efficiently handle tasks required by portable audio, wireless audio devices, industrial controls, software defined radio, fingerprint biometrics, and medical applications. The functional block diagram of the C5515 DSP is shown below [26]
Figure 8: TI TMS320 C5515 DSP Processor [26]
The above figure shows the functional block diagram of the PDSP and how it connects to the rest
of the device. The DSP consists of the following primary components [26]:
(1) CPU Core: The C5515 CPU is responsible for performing the digital signal processing tasks
required by the application. In addition, the CPU acts as the overall system controller,
responsible for handling many system functions such as system-level initialization,
configuration, user interface, user command execution, connectivity functions, and overall
system control. The CPU also manages/controls all peripherals on the device. The DSP
architecture uses the switched central resource (SCR) to transfer data within the system. [26]
Tightly coupled to the CPU are the following components:
DSP internal memory: single- and dual-access RAM and ROM
FFT hardware accelerator
Ports and buses
(2) FFT Hardware Accelerator: The C5515 CPU includes a tightly-coupled FFT hardware
accelerator that communicates with the C5515 CPU through the use of coprocessor instructions.
For ease of use, the ROM has a set of C-callable routines that use these coprocessor instructions
to perform 8, 16, 32, 64, 128, or 256-point FFTs.
(3) System Memory
Memory Type                    Memory Size
Dual-access RAM (DARAM)        64 KB
Single-access RAM (SARAM)      256 KB
Read-only memory (ROM)         128 KB
Table 2: Internal Memory [26]
Memory Type                    Memory Size
Mobile SDRAM                   128 MB
NOR Flash                      16 MB
NAND Flash                     64 MB
Table 3: External Memory [26]
(4) Peripherals: The C5515 PDSP includes the following peripherals [26]:
One external memory interface (EMIF) with 21-bit address and 16-bit data. The EMIF has
support for mobile SDRAM and non-mobile SDRAM, single-level cell (SLC) NAND with 1-bit
ECC, and multi-level cell (MLC) NAND with 4-bit ECC.
Two serial busses each configurable to support one Multimedia Card (MMC) / Secure Digital
(SD/SDIO) controller, one inter-IC sound bus (I2S) interface with GPIO, or a full GPIO
interface.
One parallel bus configurable to support a 16-bit LCD bridge or a combination of an 8-bit LCD
bridge, a serial peripheral interface (SPI), an I2S, a universal asynchronous receiver/transmitter
(UART), and GPIO.
Four direct memory access (DMA) controllers, each with four independent channels.
One inter-integrated circuit (I2C) multi-master and slave interface with 7-bit and 10-bit
addressing modes.
Three 32-bit timers with 16-bit pre-scaler; one timer supports watchdog functionality.
A USB 2.0 slave.
A 10-bit successive approximation (SAR) analog-to-digital converter with touchscreen
conversion capability.
One real-time clock (RTC) with associated low power mode.
4.2.2 Software – PDSP: Code Composer Studio IDE [27]
Code Composer Studio (CCS) is the integrated development environment for TI's DSPs,
microcontrollers and application processors. CCS includes a suite of tools used to develop and
debug embedded applications. CCS is based on the Eclipse open source software framework.
Code Composer Studio version 5 uses an unmodified version of Eclipse, and also includes
support for Linux, as well as Microsoft Windows.
CCS includes compilers for each of TI's device families, source code editor, project build
environment, debugger, profiler, simulators and many other features. CCS includes a real time
operating system called DSP/BIOS or SYS/BIOS. CCS includes support for OS level application
debug as well as low-level JTAG based development.
Debugger: CCS's integrated debugger has several capabilities and advanced breakpoints to
simplify development. Conditional or hardware breakpoints are based on full C expressions,
local variables or registers. CCS supports the development of complex systems with multiple processors or cores. Global breakpoints and synchronous operations provide control over multiple processors and cores.
Compiler: TI has developed C/C++ compilers specifically tuned to maximize the processor's usage and performance. TI compilers use a wide range of classical, application-oriented, and sophisticated device-specific optimizations that are tuned to all the supported architectures. With the program level view, the compiler is able to generate code similar to an assembly program developer who has the full system view. This application level view is leveraged by the compiler to make trade-offs that significantly increase the processor performance.
Prior to starting the debugger, it is necessary to select and configure the target on which the code will execute. The target can be a “software simulator” or an “emulator connected to a board”.
Simulation using Code Composer Studio [27]
An instruction set simulator is a software tool for developing applications on TI’s PDSPs.
Simulators are an excellent platform for application development because they provide greater visibility into application behavior, are readily available, and are easy to use. Additional simulator characteristics that are critical to application development are simulation speed, simulation accuracy, and the ability to run complete applications. TI offers instruction set simulators which ensure quick deployment of applications into end systems as an integral part of the CCS IDE. The rich integrated development environment (IDE) offers a number of features to speed up the various phases of application development, debug, and optimization. CCS IDE supports complete application simulation, easy migration between simulation and emulation environments, device level simulation, BIOS and RTDX.
Simulators provide an excellent development platform that helps the developer meet their goals.
The advantages of simulation are:
Easy to use. No additional setup is required. Simulators, being software, can be distributed easily and are usually less expensive.
Provide excellent control and repeatability to the user – a simulator can run in an identical manner time after time. In the hardware scenario, repeatability of external events like interrupts is almost impossible to guarantee.
Flexibility. Some aspects could be ignored if necessary, to provide an environment more suited to the particular phase of development.
Provide visibility into the application behavior as well as resource usage. The details, which can be provided on the simulators, may be difficult to obtain on the hardware.
Simulators have some limitations: they are not real systems and are therefore normally limited in the extent to which they can model hardware. TI provides different flavors of simulator, classified by the range of details and the extent of hardware modeled.
Range of details:
Functional - Provides a programmer view of the model
Cycle Accurate - Models 100% pipeline and latencies
Extent of hardware modeled:
CPU/Core Simulator - Models the CPU core only
Device Simulator - Models the CPU, caches, DMA and peripherals
System/SOC Simulator - Multi-core simulator with multiple cores, e.g. ARM + DSP
4.2.3 Hardware – FPGA: Altera DE1 Development Board
Figure 9: Altera DE1 Development Board [28]
The Altera DE1 Development and Education board features a state-of-the-art Cyclone® II 2C20
FPGA in a 484-pin package. All important components on the board are connected to pins of this chip, allowing the user to control all aspects of the board’s operation. For simple experiments, the DE1 board includes a sufficient number of robust switches (of both toggle and push-button type), LEDs, and 7-segment displays. For more advanced experiments, there are SRAM,
SDRAM, and Flash memory chips.
For experiments that require a processor and simple I/O interfaces, it is easy to instantiate
Altera’s Nios II processor and use interface standards such as RS-232 and PS/2. For experiments
that involve sound or video signals, there are standard connectors for microphone, line-in, line-out (24-bit audio CODEC), SD memory card connector, and VGA; these features can be used to create CD-quality audio applications and video.
The following hardware is provided on the DE1 board [28]:
Altera Cyclone® II 2C20 FPGA device
Altera Serial Configuration device – EPCS4
USB Blaster (on board) for programming and user API control; both JTAG and Active Serial
(AS) programming modes are supported
512-Kbyte SRAM
8-Mbyte SDRAM
4-Mbyte Flash memory
SD Card socket
4 pushbutton switches
10 toggle switches
10 red user LEDs
8 green user LEDs
50-MHz oscillator, 27-MHz oscillator and 24-MHz oscillator for clock sources
24-bit CD-quality audio CODEC with line-in, line-out, and microphone-in jacks
VGA DAC (4-bit resistor network) with VGA-out connector
RS-232 transceiver and 9-pin connector
PS/2 mouse/keyboard connector
Two 40-pin Expansion Headers with resistor protection
Powered by either a 7.5V DC adapter or a USB cable
CYCLONE II FPGA [28]
Altera® Cyclone II FPGAs extend the low-cost FPGA density range to 68,416 logic elements
(LEs) and provide up to 622 usable I/O pins and up to 1.1 Mbits of embedded memory. Altera states that Cyclone II FPGAs, its latest generation of low-cost FPGAs, offer 60% higher performance and half the power consumption of competing 90-nm FPGAs. The low cost and optimized feature set of Cyclone II FPGAs make them well suited to a wide array of automotive, consumer, communications, video processing, test and measurement, and other end-market applications.
Features of Cyclone II FPGA
The Cyclone II device family offers the following features:
High-density architecture with 4,608 to 68,416 LEs
Embedded multipliers
Advanced I/O support
Flexible clock management circuitry
Device configuration
Intellectual property
4.2.4 Software – FPGA
Software Tools for DSP system development
(1) Simulink [29]: Simulink is a software tool from MathWorks that is used for modeling, simulating and analyzing dynamic systems. Altera’s DSP Builder runs as an integral part of
Simulink. The DSP Builder Standard and Advanced blocksets appear in the Simulink Library browser. DSP Builder works within the model-based design methodology. An executable
specification is created using standard Simulink blocksets. After the functionality and dataflow
issues have been defined, DSP Builder can be used to specify the hardware implementation
details for a specific Altera FPGA board/device. DSP Builder can execute all downstream
implementation tools by invoking Altera’s Quartus II EDA tool to perform place and route and
to generate the bitstream that configures the FPGA.
(2) Quartus II [30]: Altera provides various tools for development of hardware and software for
embedded systems. Altera’s Quartus II design software provides a complete design environment
that easily adapts to your specific design requirements. The CAD flow involves the following
steps:
1. Design Entry – the desired circuit is specified either by means of a schematic diagram, or by
using a hardware description language, such as VHDL or Verilog
2. Synthesis – the entered design is synthesized into a circuit that consists of the logic elements
(LEs) provided in the FPGA chip
3. Functional Simulation – the synthesized circuit is tested to verify its functional correctness;
this simulation does not take into account any timing issues
4. Fitting – the CAD Fitter tool determines the placement of the LEs defined in the netlist into
the LEs in an actual FPGA chip; it also chooses routing wires in the chip to make the required
connections between specific LEs
5. Timing Analysis – propagation delays along the various paths in the fitted circuit are analyzed
to provide an indication of the expected performance of the circuit
6. Timing Simulation – the fitted circuit is tested to verify both its functional correctness and
timing
7. Programming and Configuration – the designed circuit is implemented in a physical FPGA
chip by programming the configuration switches that configure the LEs and establish the
required wiring connections
Figure 10: Quartus II Flow [30]
(3) SOPC Builder [31]: SOPC Builder is a powerful system development tool that enables the user to define and generate a complete system-on-a-programmable-chip (SOPC) in much less time than using traditional, manual integration methods. SOPC Builder is included as part of the
Quartus II software. SOPC Builder is a general-purpose tool for creating systems that may or may not contain a processor and may include a soft processor other than the Nios II processor.
SOPC Builder automates the task of integrating hardware components. In traditional design methods, HDL modules must be written manually to wire together the pieces of the system. In contrast, in SOPC Builder, the system components are specified in a GUI environment and
SOPC Builder generates the interconnect logic automatically. SOPC Builder generates HDL files that define all components of the system, and a top-level HDL file that connects all the components together. SOPC Builder can generate either Verilog HDL or VHDL.
An SOPC Builder component is a design module that SOPC Builder recognizes and can automatically integrate into a system. Components can be selected from a list of provided components, or custom components can be defined and added. SOPC Builder connects multiple modules together to create a top-level HDL file called the SOPC Builder system. SOPC Builder generates system interconnect fabric that contains logic to manage the connectivity of all modules in the system.
SOPC Builder modules are the building blocks for creating an SOPC Builder system. SOPC
Builder modules use Avalon interfaces, such as memory-mapped, streaming, and IRQ, for the physical connection of components.
Figure 11: Altera SOPC Builder Tool [33]
(4) Altera DSP Builder [32]: Digital signal processing (DSP) system design in Altera programmable logic devices (PLDs) requires both “high-level algorithm” and “hardware description language (HDL) development” tools. Altera’s DSP Builder integrates these tools by combining the algorithm development, simulation, and verification capabilities of The
MathWorks MATLAB and Simulink system-level design tools with VHDL and Verilog HDL design flows, including the Altera Quartus II software. DSP Builder shortens DSP design cycles by helping you create the hardware representation of a DSP design in an algorithm-friendly development environment. Existing MATLAB functions and Simulink blocks can be combined with Altera DSP Builder blocks and Altera intellectual property (IP) MegaCore functions to link system-level design and implementation with DSP algorithm development. In this way, DSP
Builder allows system, algorithm, and hardware designers to share a common development platform. The DSP Builder Signal Compiler block reads Simulink Model Files (.mdl) that contain other DSP Builder blocks and MegaCore functions. Signal Compiler then generates the
VHDL files and Tcl scripts for synthesis, hardware implementation, and simulation.
The DSP Builder standard blockset includes libraries of design building and interface blocks and
a library of blocks that represent each of the DSP MegaCore functions.
The standard blockset has the following features:
Cycle-accurate behavioral models
Multiple clock domain management
Control rich with backpressure support
Access to specific hardware device features
Hardware-in-the-loop (HIL) support enables FPGA hardware co-simulation
Support for importing VHDL or Verilog HDL design entities
Tabular and graphical state machine support
Rapid prototyping using Altera DSP development boards
SignalTap II logic analyzer debugging support
Direct instantiation of DSP IP cores
The DSP Builder advanced blockset does not interface directly with the DSP IP cores but instead
includes its own timing-driven IP blocks that can generate high performance FIR, CIC, NCO,
and FFT models.
The advanced blockset has the following features:
Specification driven design with automatic pipelining and folding
High level synthesis technology
High performance timing-driven IP models
Multichannel designs with automatically vectorized inputs
Automatic generation of memory-mapped interfaces
Simulink fixed-point types
Single system clock for the main datapath logic
Feed-forward datapath with minimum control
Portability across different device families
Figure 12: FPGA development tools at a glance [33]
4.3 EXPERIMENTS
The previous chapter presented a snapshot of past literature on the comparison of
hardware platforms for DSP system implementation and discussed case studies from various
fields of DSP applications such as image compression and sound processing. The common factor
that binds all these previous case studies is that none of them gives a “generic viewpoint”
when comparing the FPGA and PDSP platforms. Each case study discusses and implements a
DSP algorithm that is tailored for the intended application.
This thesis attempts to present a hardware platform comparison for the most commonly used DSP computations, such as frequency analysis using Fourier transforms, digital filter design and realization, and sampling rate conversion. These computations are ubiquitous in almost all
DSP applications ranging from image processing and audio/video processing to biomedical systems.
The objective here is to design a set of experiments that cover these common DSP computations and implement them on both FPGA and PDSP using the design flows described in Chapter 3.
The experiments performed are explained in brief below.
Experiment 1: Basic Sampling and Quantization
A signal is defined as a quantity that varies with time, space or any other independent variable.
An analog signal is a signal that is continuous in time and amplitude: it takes a value at every instant of time. Hence it is also called a continuous-time, continuous-valued signal. Analog-to-digital converters are used to convert an analog signal to a digital signal. A digital signal is a signal that is discrete in time and amplitude: it takes values only at discrete instants of time, and those values are drawn from a finite set of levels.
Sampling is defined as the process of recording the value of a signal at discrete and periodic instants of time. The time difference between two consecutive samples is called the sampling time and its reciprocal is the sampling frequency. Hence to get the value of a signal at discrete instants, we have to “sample” the signal at periodic time intervals. By the sampling theorem, the sampling frequency must be at least twice the highest frequency present in the continuous-time signal to avoid aliasing.
Quantization is the process of mapping a large set of input values to a smaller set, such as rounding values to some unit of precision. A device or algorithmic function that performs quantization is called a quantizer. The round-off error introduced by quantization is referred to as
quantization error. Hence quantization can be thought of as a process of truncation or rounding off.
Mathematically, an analog signal can be represented as
$$x(t) = A \sin(\omega t) = A \sin(2\pi F_0 t)$$
where A = amplitude of the analog signal, t = time and F_0 = analog frequency.
To convert an analog signal to a digital signal, we sample the analog signal at
intervals of T_s, where T_s represents the sampling time period.
If n is the sample index, then t = nT_s.
Hence the sampled signal can be written as
$$x(n) = A \sin(2\pi F_0 n T_s) = A \sin\left(2\pi \frac{F_0}{F_s} n\right), \quad \text{because } T_s = 1/F_s$$
Hence the equation of a digital signal can be represented as
$$x(n) = A \sin(2\pi F n)$$
where F = F_0/F_s is the frequency of the digital signal, obtained by dividing the analog frequency by the sampling frequency.
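The sampling relation above can be illustrated with a minimal Python sketch. The function name `sample_sine` and the chosen values (F0 = 20 Hz, Fs = 100 Hz, as used later in Experiment 2) are assumptions for illustration only; this is not the code that was run on either platform.

```python
import math


def sample_sine(amplitude, f0_hz, fs_hz, num_samples):
    """Return x(n) = A*sin(2*pi*(F0/Fs)*n) for n = 0 .. num_samples-1."""
    f_digital = f0_hz / fs_hz  # F = F0/Fs, in cycles per sample
    return [amplitude * math.sin(2.0 * math.pi * f_digital * n)
            for n in range(num_samples)]


if __name__ == "__main__":
    # Illustrative values: with F = 0.2 the sequence repeats every 5 samples.
    x = sample_sine(amplitude=1.0, f0_hz=20.0, fs_hz=100.0, num_samples=10)
    print([round(v, 4) for v in x])
```

With these values F = 0.2 cycles per sample, so one period of the sine wave spans five samples.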
Experiment 2: Discrete Fourier Transform
The Discrete Fourier Transform (DFT) is the frequency-domain representation of a discrete-time signal x(n).
The Discrete Time Fourier Transform (DTFT) of a discrete-time aperiodic signal is given by the following equation:
$$X(\omega) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$
The range of the DTFT is infinite. However, the range of the DFT is finite. The DFT is obtained by sampling the DTFT of x(n) at N equally spaced points over a period extending from ω = 0 to 2π.
The DFT can be expressed as
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}, \quad k = 0, 1, \ldots, N-1$$
The twiddle factor can be expressed as
$$W_N = e^{-j2\pi/N}$$
Hence, x_N is an N×1 matrix containing the N elements [x(0) x(1) x(2) ... x(N-1)] and
X_N is an N×1 matrix containing the N elements [X(0) X(1) X(2) ... X(N-1)].
Therefore, the DFT equation can be expressed as the multiplication of matrices
$$X_N = [W_N]\, x_N$$
where [W_N] is the N×N matrix whose entry in row k and column n is W_N^{kn}.
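The matrix form of the DFT can be sketched directly in Python. The function names `twiddle_matrix` and `dft` are hypothetical, and the direct O(N²) multiplication is chosen for clarity; it is not the FFT-based implementation used in the experiments.

```python
import cmath


def twiddle_matrix(n_points):
    """Build the N x N matrix [W_N] whose (k, n) entry is W_N**(k*n)."""
    w = cmath.exp(-2j * cmath.pi / n_points)  # twiddle factor W_N
    return [[w ** (k * n) for n in range(n_points)]
            for k in range(n_points)]


def dft(x):
    """Compute X_N = [W_N] x_N by direct matrix-vector multiplication."""
    w_matrix = twiddle_matrix(len(x))
    return [sum(w_kn * x_n for w_kn, x_n in zip(row, x))
            for row in w_matrix]


if __name__ == "__main__":
    # One period of a sine sampled at four points: energy appears at k = 1 and k = 3.
    for k, value in enumerate(dft([0.0, 1.0, 0.0, -1.0])):
        print(k, round(abs(value), 6))
```

For a real input such as this one, the two non-zero bins are complex conjugates of each other, which is why the magnitudes at k = 1 and k = N-1 are equal.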
Experiment 3: Digital FIR Filters – design and realization
A digital filter is a system that performs mathematical operations on a sampled, discrete-time signal to reduce or enhance certain aspects of that signal. The primary functions of a digital filter are to confine a signal to a prescribed frequency band (as in a low-pass or high-pass filter), to decompose a signal into multiple sub-bands, to modify the frequency spectrum of a signal and to model the
I/O relationship of a system.
A digital filter is characterized by its transfer function obtained after taking a Z Transform of the difference equation. Mathematical analysis of the transfer function can describe how it will respond to any input. Filter Design consists of developing specifications appropriate to the required conditions like a low pass filter or high-pass filter with a specific cut-off frequency, and then producing a transfer function which meets the specifications.
There are two types of digital filters, classified based on their impulse response: Finite Impulse
Response (FIR) filters and Infinite Impulse Response (IIR) filters. We shall restrict ourselves to FIR filters in this thesis. FIR filters are non-recursive, i.e. their output depends only on the present input and the past inputs. IIR filters are recursive: they have feedback, and their output depends not only on the present and past inputs but also on past outputs that are fed back into the filter.
The system transfer function of an FIR filter is given by
$$H(z) = \sum_{k=0}^{M-1} b_k z^{-k}$$
The objective here is to design both a low-pass and a high-pass FIR filter, each with a given cut-off frequency. The filters are applied to a multi-channel signal having 4 channels.
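In the time domain, the FIR transfer function corresponds to the difference equation y(n) = b_0 x(n) + b_1 x(n-1) + ... + b_{M-1} x(n-M+1). A minimal direct-form sketch in Python follows. The names `fir_filter` and `filter_channels` are hypothetical, inputs before n = 0 are assumed to be zero, and the coefficients in the example are arbitrary rather than those designed for the experiments.

```python
def fir_filter(b, x):
    """Apply y(n) = sum_{k=0}^{M-1} b[k] * x(n-k), taking x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, b_k in enumerate(b):
            if n - k >= 0:  # skip taps that would reach before the first sample
                acc += b_k * x[n - k]
        y.append(acc)
    return y


def filter_channels(b, channels):
    """Filter each channel of a multi-channel signal independently."""
    return [fir_filter(b, channel) for channel in channels]


if __name__ == "__main__":
    # A unit impulse at the input reproduces the coefficients at the output.
    print(fir_filter([0.25, 0.5, 0.25], [1.0, 0.0, 0.0, 0.0, 0.0]))
```

Because the output uses only present and past inputs, no feedback state is required, which is the non-recursive property described above.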
4.4 PROCEDURE FOR IMPLEMENTATION
FPGA
Simulink and DSP Builder were used as the graphical modeling and simulation tools for FPGA implementation. Simulink and DSP Builder provide toolboxes that contain pre-defined blocks required for each application domain. The toolboxes used for this project include the DSP System Toolbox, the Altera DSP Builder blockset and the Simulink general toolbox. Using the blocks provided
in these toolboxes, graphical models have been constructed for each experiment. The SignalTap II logic analyzer method has been used to build these models.
The SignalTap II logic analyzer captures the signal activity at the output gates of the design loaded into the Altera device on the development board. The logic analyzer retrieves the values and displays them in the MATLAB workspace. A SignalTap II Logic Analyzer block in DSP Builder has a simple, easy-to-use interface, analyzes signals in the top-level design file, uses a single clock source and captures data around a trigger point; 88% of the data is pre-trigger and 12% of the data is post-trigger [32].
A SignalTap node represents a wire carrying a signal that travels between different logical components of a design file. The SignalTap II logic analyzer can capture signals from any internal device node in a design file, including I/O pins. The SignalTap II logic analyzer can analyze up to 128 internal nodes or I/O elements.
The trigger pattern describes a logic event in terms of logic levels or edges. The SignalTap II logic analyzer uses a comparison register to recognize the moment when the input signals match the data specified in the trigger pattern. The trigger pattern comprises a logic condition for each input signal. By default, all signal conditions for the trigger pattern are set to Don’t Care, masking them from trigger recognition. One of the following logic conditions can be selected for each input signal in the trigger pattern: Don’t care, Low, High, Rising edge, Falling edge, Either edge. The SignalTap II logic analyzer triggers when it detects the trigger pattern on the input signals [32].
Using the method described above, models can be constructed and simulated in the MATLAB workspace. Thereafter, the Simulink model is imported into Altera’s Quartus II tool using Tcl scripting commands. Altera’s tools convert the Simulink model to VHDL or Verilog code as required by the designer. The designer can then simulate the design using ModelSim. Finally, the design can be synthesized and downloaded onto the FPGA using Altera’s Quartus II tool.
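The trigger-pattern behaviour described above can be modelled in software for illustration. The sketch below is a simplified Python model, not the SignalTap II implementation or any Altera API: the condition names, the functions `condition_met` and `find_trigger`, and the representation of samples as tuples of 0/1 levels are all assumptions.

```python
# Hypothetical condition names mirroring the six options listed in the text.
DONT_CARE, LOW, HIGH, RISING, FALLING, EITHER = (
    "dont_care", "low", "high", "rising", "falling", "either")


def condition_met(condition, previous, current):
    """Check one signal's condition from its previous and current level (0 or 1)."""
    if condition == DONT_CARE:
        return True
    if condition == LOW:
        return current == 0
    if condition == HIGH:
        return current == 1
    if condition == RISING:
        return previous == 0 and current == 1
    if condition == FALLING:
        return previous == 1 and current == 0
    if condition == EITHER:
        return previous != current
    raise ValueError(f"unknown condition: {condition}")


def find_trigger(pattern, samples):
    """Return the index of the first sample at which every signal matches, else None.

    pattern holds one condition per signal; samples is a list of tuples of levels.
    The search starts at index 1 so that edge conditions have a previous sample.
    """
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if all(condition_met(c, p, v) for c, p, v in zip(pattern, prev, cur)):
            return i
    return None


if __name__ == "__main__":
    captured = [(0, 0), (0, 1), (1, 1), (1, 0)]
    print(find_trigger((RISING, HIGH), captured))
```

A pattern in which every condition is Don't Care matches at the first opportunity, which mirrors the default behaviour of masking all signals from trigger recognition.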
PDSP
The procedure for PDSP implementation is more straightforward than the FPGA implementation. Code Composer Studio (CCS) is the software tool provided by
Texas Instruments along with its C5515 Evaluation Module. CCS works in an Eclipse-based GUI in a
Microsoft Windows environment. Designers can choose to write either assembly language programs or C language programs to create projects in CCS. The user has to specify the device family (in this case C5515 EVM) and the type of debugging method. Two types of debugging methods are generally provided: simulation and emulation. On-board hardware emulation is possible through a USB cable connected to the host computer. Emulation provides a way for users to inspect the inner details of a TI digital signal processor and aids product development by means of a hardware device (the emulator). Emulation also has the benefit of providing the scenario that is closest to the end product while still maintaining control over the device. However, for the purposes of this thesis, emulation has not been used due to limitations of CCS version 5.3 and the non-availability of specific drivers compatible with that software version. Instead, the TI simulator for the C5515 EVM provided by CCS has been used to code, build, debug and simulate the designs. This drawback has limited the extent of analysis that can be performed on the PDSP-based design, and hence only parameters like lines of code and design time have been recorded for comparison with the FPGA.
4.5 OBSERVATIONS AND RESULTS
(1) Quantization:
FPGA:
Quantization interval = 5
Quantized values range: (0, 5, 10, 15, ...)
The Quantizer block passes its input signal through a stair-step function so that many neighboring points on the input axis are mapped to one point on the output axis. The effect is to quantize a smooth signal into a stair-step output. The output is computed using the round-to-nearest method, which produces an output that is symmetric about zero:
$$y = q \cdot \mathrm{round}(u/q)$$
where y is the output, u the input, and q the Quantization interval parameter.
Data Type Support: The Quantizer block accepts and outputs real or complex signals of type single or double. For more information, see Data Types Supported by Simulink in the Simulink documentation.
Quantization interval: The interval around which the output is quantized. Permissible output values for the Quantizer block are n*q, where n is an integer and q the Quantization interval. The default is 0.5.
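The quantizer equation y = q * round(u/q) can be sketched in Python as follows. The function names are hypothetical. Python's built-in `round` sends ties to the nearest even integer, so the sketch uses a helper that rounds ties away from zero; that tie rule is an assumption made here to obtain the symmetry about zero described above.

```python
import math


def round_half_away(value):
    """Round to the nearest integer, with ties going away from zero."""
    return math.copysign(math.floor(abs(value) + 0.5), value)


def quantize(u, q):
    """Return y = q * round(u / q) for quantization interval q."""
    return q * round_half_away(u / q)


if __name__ == "__main__":
    # Quantization interval of 5, as used in Experiment 1.
    for u in (2.4, 7.4, 7.5, -7.5, 12.0):
        y = quantize(u, 5)
        print(u, y, "error:", round(u - y, 6))
```

For any input, the quantization error u - y has a magnitude of at most q/2, which is 2.5 for the interval used in Experiment 1.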
Screenshots:
Figure 13: Experiment 1 Simulink Model
Figure 14: Experiment 1 Simulink Simulation
Figure 15: Experiment 1 Quartus II
Figure 16: Experiment 1 PDSP
(2) Discrete Fourier Transform
F0 = 20 Hz (frequency of the continuous-time signal)
Fs = 100 Hz (sampling frequency)
Ts = 0.01 seconds (sampling time)
Number of samples per frame = 256
Total simulation time = 10 seconds
Total number of samples = (total simulation time) / (sampling time)
= 10/0.01 = 1000 samples
Total number of frames = (total number of samples) / (number of samples per frame)
= 1000/256
≈ 3.9, i.e. approximately 4 frames
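The frame arithmetic above can be reproduced with a short Python sketch. The function name `frame_plan` and the returned keys are hypothetical. The sketch also reports the DFT bin nearest to F0, using the standard relation that bin k of an N-point DFT corresponds to the frequency k·Fs/N; this is an additional worked value, not a result recorded in the experiments.

```python
import math


def frame_plan(f0_hz, fs_hz, frame_len, sim_time_s):
    """Summarize sample and frame counts and the DFT bin nearest to f0_hz."""
    total_samples = round(sim_time_s * fs_hz)
    bin_spacing_hz = fs_hz / frame_len          # frequency step between DFT bins
    peak_bin = round(f0_hz / bin_spacing_hz)    # nearest bin index to F0
    return {
        "total_samples": total_samples,
        "full_frames": total_samples // frame_len,
        "frames_needed": math.ceil(total_samples / frame_len),
        "bin_spacing_hz": bin_spacing_hz,
        "peak_bin": peak_bin,
        "peak_bin_hz": peak_bin * bin_spacing_hz,
    }


if __name__ == "__main__":
    plan = frame_plan(f0_hz=20.0, fs_hz=100.0, frame_len=256, sim_time_s=10.0)
    for key, value in plan.items():
        print(key, value)
```

With these parameters, 1000 samples fill three complete 256-sample frames and part of a fourth, and the 20 Hz tone falls between bins 51 and 52 (20 / 0.390625 = 51.2).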
Figure 17: Experiment 2 Simulink Model_1
Figure 18: Experiment 2 Simulink Model_2
Figure 19: Experiment 2 Simulink Simulation
Figure 20: Experiment 2 Quartus II
Figure 21: Experiment 2 PDSP
(3) Digital Filter
Using Filter Design Block in Simulink
Digital Filter Design block & Filter Realization Wizard [MATHWORKS HELP]
Overview of the Digital Filter Design Block
The Digital Filter Design block can be used to design and implement a digital filter. It can filter single-channel or multichannel signals. The Digital Filter Design block is ideal for simulating the numerical behavior of a filter on a floating-point system, such as a personal computer or DSP chip.
Filter Design and Analysis: All filter design and analysis can be performed within the Filter
Design and Analysis Tool (FDATool) GUI, which opens with the Digital Filter Design block.
FDATool provides extensive filter design parameters and analysis tools such as pole-zero and impulse response plots.
Filter Implementation: Once a filter is designed using FDATool, the block automatically
realizes the filter using the filter structure specified. The block can then be used to filter signals
in a Simulink model. The filter can also be fine-tuned by changing the filter specification
parameters during a simulation.
Guidelines when Selecting a Filter Design Block
Users can design and implement digital filters using the Digital Filter Design block and the Filter
Realization Wizard. The similarities and differences between these blocks are summarized below, followed by guidelines on how
to choose the block that is best suited for specific needs.
Similarities:
Filter design and analysis options: Both blocks use the Filter Design and Analysis Tool
(FDATool) GUI for filter design and analysis.
Output values: If the output of both blocks is double-precision floating point, single-precision
floating point, or fixed point, the output values of both blocks numerically match the output of
the filter method of the dfilt object.
Differences:
Filter implementation method: The Digital Filter Design block opens the FDATool GUI to the
Design Filter panel. It implements filters using the Digital Filter block. These filters are
optimized for both speed and memory use in simulation and in C code generation. The Filter
Realization Wizard opens the FDATool GUI to the Realize Model panel. The block can
implement filters in two different ways. It can use the Simulink Sum, Gain, and Delay blocks, or
it can use the Digital Filter block. If a filter is implemented using the Digital Filter block, it is
bound by the type of filters this block supports. If a filter is implemented by the Filter
Realization Wizard using Sum, Gain, and Delay blocks, inputs to the filter must be sample based.
Supported filter structures: Both blocks support many of the same basic filter structures, but the
Filter Realization Wizard supports more structures than the Digital Filter Design block. This is
because the Filter Realization Wizard can implement filters using Sum, Gain, and Delay blocks.
Multichannel filtering: The Digital Filter Design block can filter multichannel signals. Filters
implemented by the Filter Realization Wizard can only filter single-channel signals.
Data type support: The Digital Filter block supports single- and double-precision floating-point
computation for all filter structures and fixed-point computation for some filter structures. The
Digital Filter Design block only supports single- and double-precision floating-point
computation.
Guidelines regarding when to use each block
Digital Filter Design Block
Use to simulate single- and double-precision floating-point filters.
Use to filter multichannel signals.
Use to generate highly optimized ANSI® C code that implements floating-point filters for
embedded systems.
Filter Realization Wizard
Use to simulate numerical behavior of fixed-point filters in a DSP chip, a field-programmable
gate array (FPGA), or an application-specific integrated circuit (ASIC).
Use to simulate single- and double-precision floating-point filters with structures that the
Digital Filter Design block does not support.
Use to visualize the filter structure, as the block can build the filter from Sum, Gain, and Delay
blocks.
Use to generate multiple filter blocks rapidly.
A multichannel signal having 4 channels with frequencies 30 Hz, 50 Hz, 95 Hz and 110 Hz is to be
separated into a high-pass band (95 Hz and 110 Hz), using a high-pass filter with a cut-off
frequency of 80 Hz, and a low-pass band (30 Hz and 50 Hz), using a low-pass filter with a cut-off
frequency of 65 Hz. The sampling frequency used is 300 Hz.
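The filtering task described above can be sketched in a few lines of Python. This is only an illustration of the specification (four tones, 65 Hz low-pass, 80 Hz high-pass, 300 Hz sampling), not the Simulink model used in the experiment. The windowed-sinc design method, the Hamming window, the filter length of 101 taps and all function names are assumptions made for this sketch.

```python
import math

FS = 300.0                          # sampling frequency from the text (Hz)
TONES = [30.0, 50.0, 95.0, 110.0]   # one tone per channel (Hz)

def lowpass_taps(cutoff_hz, fs, num_taps=101):
    """Windowed-sinc low-pass FIR taps (Hamming window; num_taps is assumed)."""
    fc = cutoff_hz / fs             # normalized cut-off in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC

def highpass_taps(cutoff_hz, fs, num_taps=101):
    """High-pass by spectral inversion of the low-pass (needs odd num_taps)."""
    taps = [-t for t in lowpass_taps(cutoff_hz, fs, num_taps)]
    taps[(num_taps - 1) // 2] += 1.0
    return taps

def fir_filter(taps, signal):
    """Direct-form FIR convolution, one output sample per input sample."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def channel_gains(taps, n_samples=600):
    """Output/input RMS ratio for each tone, skipping the start-up transient."""
    gains = {}
    for f in TONES:
        tone = [math.sin(2 * math.pi * f * n / FS) for n in range(n_samples)]
        filtered = fir_filter(taps, tone)
        gains[f] = rms(filtered[len(taps):]) / rms(tone[len(taps):])
    return gains

low_gains = channel_gains(lowpass_taps(65.0, FS))
high_gains = channel_gains(highpass_taps(80.0, FS))
for f in TONES:
    print(f"{f:5.0f} Hz  low-pass gain {low_gains[f]:.3f}  high-pass gain {high_gains[f]:.3f}")
```

With these assumed settings the low-pass filter should pass the 30 Hz and 50 Hz channels and suppress the other two, and the high-pass filter should do the opposite.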
Figure 22: Experiment 3 Simulink Model_1
Figure 23: Experiment 3 Simulink Model_2
Figure 24: Experiment 3 Simulink Simulation
Figure 25: Experiment 3 Quartus
4.6 COMPARISON OF THE RESULTS
In the table given below, the PDSP and FPGA implementations have been compared on the basis of various factors listed. The procedure for implementation has already been described in detail in section 4.4 and the observations and results have been explained in section 4.5.
Multiple parameters have been used to compare the performance. The parameters have been grouped together into three groups based on their nature. In the “first group”, we look at the nature of each hardware platform, the type of hardware and software used and the programming method required for each platform.
In the “second group”, we compare based on performance. Performance is measured in terms of lines of code required, execution time and hardware resource utilization. Processing performance is one of the most important characteristics that distinguishes one hardware platform from the other. These are the parameters that matter the most to DSP engineers, as they give a clear picture of the nature and type of hardware to be used for a specific class of DSP applications.
The “third group” comprises those parameters that are an offshoot of the architectural structure inherent to each hardware device. Architecture is the governing parameter here: design size and design effort depend on, and are directly derived from, the architecture of each hardware platform.
In the table given below, we can see that the primary difference is the method of implementation associated with each hardware platform. For the FPGA implementations, graphical modeling using Simulink and DSP Builder results in an estimated 130, 77 and 130 lines of code for the three experiments, as compared to 43, 112 and 115 lines of C code for the PDSP implementations. On average, therefore, the FPGA implementation requires approximately 112 lines of code versus 90 lines for the PDSP. In these experiments, a high-level language like C required fewer lines of code on average, although not in every case: the DFT experiment took 112 lines of C against an estimated 77 lines for Simulink. A strong caveat also needs to be added, as the Simulink figures are only a rough estimate.
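The averages quoted above can be checked directly from the per-experiment counts. The short sketch below only repeats that arithmetic; the variable names are chosen for illustration.

```python
# Lines of code per experiment, as reported in the text.
simulink_loc = {"Sampling & Quantization": 130, "DFT": 77, "Digital Filters": 130}
c_loc = {"Sampling & Quantization": 43, "DFT": 112, "Digital Filters": 115}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

avg_fpga = mean(simulink_loc.values())   # estimated Simulink lines of code
avg_pdsp = mean(c_loc.values())          # hand-written C lines of code

for name in simulink_loc:
    diff = simulink_loc[name] - c_loc[name]
    print(f"{name}: Simulink {simulink_loc[name]}, C {c_loc[name]}, difference {diff:+d}")
print(f"Average: Simulink {avg_fpga:.1f}, C {avg_pdsp:.1f}")
```

The per-experiment differences show that the average hides one experiment (the DFT) in which the Simulink estimate is the smaller of the two.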
Simulink, being a graphical modeling tool, does not involve writing lines of code. The method adopted here is to estimate the lines of code for the Simulink-based FPGA implementation as if each block were a module having one input, one output and at least one parameter whose value is set by the block. This approach yields the lines-of-code values given in the table. The VHDL lines of code for the three experiments are 692, 430 and 554. The VHDL code was generated by Altera’s Quartus II tool when the Simulink and DSP Builder model was imported into Quartus II for hardware synthesis on the FPGA. It would not be wise to compare the C and VHDL lines of code in this case, because the conversion of a graphical model to HDL involves three tools (Simulink, DSP Builder, Quartus II) working in tandem to generate the VHDL code.
Resource utilization is a metric unique to the FPGA design, as the designer creates the intended design from scratch using the logic gates on the FPGA to host the DSP design. Hence a proper metric for resource utilization can be obtained for the FPGA implementation after compilation of the imported design in Quartus II. The resource utilization was found to be less than 5%, owing to the small number of gates and the simple logic involved in the DSP computations. However, a designer can construct a DSP design that involves a large number of elements such as multipliers and FFT and FIR computation blocks (dedicated DSP blocks), along with a processor core, memory and peripherals. Such a design will have much higher resource utilization, as a large number of gates on the FPGA will be used. Resource utilization cannot be explicitly measured for the PDSP chip, as the processor and other components are already fabricated on the chip and the designer makes use of these existing resources to build a DSP application.
When we consider the type of architecture of each type of hardware, we can notice a distinct difference. The PDSP has a pre-defined and pre-fabricated architecture consisting of a processor optimized for DSP needs, memory, peripherals and other DSP-specific components, whereas the FPGA is just a sea of logic gates that must be configured before any DSP application can be executed on it. Therefore, a DSP designer who is not overly concerned about the intricacies of the host architecture on which the DSP application runs should ideally be satisfied with buying an off-the-shelf DSP board and conducting experiments on it. However, the designer will be constrained to operate within the limits that the PDSP host architecture imposes on its users.
On the other hand, a DSP engineer opting for an FPGA-based design has the option of building an entirely new architecture. Usually, building a processor-based architecture on an FPGA is a complex and time-consuming process that requires writing thousands of lines of VHDL or Verilog code and testing it by writing test benches. However, with the advent of high-level synthesis tools like Simulink and Altera’s DSP Builder, the complexity of assembling an architecture can be drastically reduced by switching to graphical modeling and simulation using pre-defined building blocks made available to the designer.
Hence, the task of the designer is simplified to a large extent, which helps the designer to build a custom architecture and optimize it for the needs of any specific DSP application domain. Such an option is not possible with a PDSP chip which, even though it has been manufactured to be tailor-made for DSP applications, cannot be re-configured to modify its architecture and fine-tune it for particular DSP application domains.
An extension of the architecture is the design size and design effort required for the hardware platforms. Generally, the design size of PDSP designs is smaller than that of FPGA designs because of the use of a high-level programming language and a fixed architecture that has already been optimized for area. In FPGA designs, however, the designer has to fit the design within the logic fabric available on the FPGA chip itself, and it has been observed that FPGA designs are large and also require some gates for configuring the logic.
In terms of design effort, PDSP designs are quick to implement if the design complexity is small and the designer has the requisite programming knowledge. If the designer is not concerned about the internal details of the architecture, PDSP based implementation is fast. In FPGA based design, the process becomes a lot easier due to almost complete elimination of hand-written coding. It relies on graphical modeling tools to make the process of design easier, simpler and faster.
Table 4: Comparing FPGA and PDSP Implementation Results
Columns: PROPERTY, FPGA, PDSP

NATURE OF THE PLATFORM

Hardware
FPGA: Altera DE-1 Board, Cyclone II FPGA
PDSP: Texas Instruments C5515 DSP Evaluation Module

Software
FPGA: MathWorks Simulink (R2011a), Altera DSP Builder, Altera Quartus II, ModelSim
PDSP: TI Code Composer Studio 5.3

Programming Method
FPGA: Graphical modeling. Requires little or no hand-written programs.
PDSP: C Programming

PERFORMANCE

Lines of Code, Sampling & Quantization
FPGA: Simulink* = 130
PDSP: 43

Lines of Code, DFT
FPGA: Simulink* = 77
PDSP: 112

Lines of Code, Digital Filters
FPGA: Simulink* = 130
PDSP: 115

Lines of Code, Average
FPGA: Average = 112.3
PDSP: Average = 90

Execution Time [43] [44]
FPGA: Execution time is shorter than PDSPs. Faster execution observed in cases where DSP logic is downloaded onto FPGA. Even if an embedded processor like Nios II is included, execution time is lesser as Nios II clock frequency is 200 MHz
PDSP: Execution time is longer than FPGAs. Maximum CPU clock frequency is 120 MHz. Also, instruction is executed in 12 pipelined stages.

Resource Utilization [43] [44]
FPGA: Specific value is reported by Altera’s tools depending upon the number of logic elements used.
PDSP: Though no values reported due to tool limitations, pipelined instruction execution utilizes a large amount of CPU hardware.

Device resources
FPGA: EP2C20 chip (18,752 logic elements); 484 Pin BGA package; 26 Multipliers (18X18 bit) 250 MHz; 52 M4K blocks (1 block = 4 Kbit); Nios II Processor (200, 185, 165 MHz)
PDSP: C5515 chip; 196 Pin BGA package; Dual Multipliers (240 MHz); 320 KB RAM & 128 KB ROM; CPU (60, 75, 100, 120 MHz)

ARCHITECTURE

Architecture
FPGA: Flexible and tailor-made architecture can be designed by the designer.
PDSP: Fixed Architecture cannot be changed

Design Size
FPGA: Design size shall be large if components like processors, memories and peripherals are included with DSP modules in the design.
PDSP: Design size is small as processor architecture is pre-fabricated on the chip.

Design Effort
FPGA: Faster only if simple blocks need to be plugged in and connected. Designs take longer time if an entire processor based system needs to be created and synthesized on an FPGA board
PDSP: Fast if the programmer is adept at C or assembly. Also need not spend time coding for architecture. Only needs to make best use of it.

Familiarity
FPGA: Helps hardware engineers familiar with design flow to help design DSP systems. Familiarity with MATLAB helps DSP engineers acclimatize to FPGA based design.
PDSP: DSP engineers prefer PDSPs over FPGAs due to limited exposure to hardware design tools.
*Note: The Simulink / DSP builder design tools use graphical blocks supplemented with HDL code instead of C code. Therefore, to find an equivalent measurement for the total Lines of Code in the FPGA design, an assumption has to be made about the blocks used. The blocks required the designer to specify a minimum of three basic details to instantiate the block: the block function, the input(s), and the output(s). The methodology provided one line of code for each block placed and one line for each input and output parameter. Moreover, the blocks also contained user-defined parameters that have to be set for each block instantiation. Hence, for every parameter needed to define the block, another line of code was added to the count. In this manner, an estimate of the Lines of Code for Simulink modeling has been made.
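The counting rule in the note can be written down as a small function. The sketch below is one reading of that rule (one line per block, one per input, one per output, one per user-set parameter). The example blocks and their port and parameter counts are hypothetical and are not taken from the thesis models.

```python
def estimate_block_loc(num_inputs, num_outputs, num_parameters):
    """Equivalent lines of code for one block under the counting rule:
    1 line for the block itself, plus 1 per input, output and parameter."""
    return 1 + num_inputs + num_outputs + num_parameters

# Hypothetical model: (block name, inputs, outputs, user-set parameters).
example_model = [
    ("Sine Wave source", 0, 1, 3),
    ("Gain", 1, 1, 1),
    ("Scope sink", 1, 0, 0),
]

total_loc = 0
for name, n_in, n_out, n_par in example_model:
    loc = estimate_block_loc(n_in, n_out, n_par)
    total_loc += loc
    print(f"{name}: {loc} equivalent lines")
print(f"Estimated total: {total_loc} lines")
```

Because the estimate depends on how many parameters are counted per block, it is sensitive to the counting convention, which is why the figures are treated as rough estimates.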
CHAPTER 5: CONCLUSIONS AND FUTURE WORK
5.1 INTRODUCTION
In the previous chapters, most of the issues that the author came across while researching this subject have been covered extensively. This includes a background study and literature review of this research area that discussed the impact of VLSI technology on DSP, the evolution of DSP hardware like the PDSP and FPGA, the design methodologies associated with each hardware platform and three case studies from past research that specifically deal with the issue of PDSP vs. FPGA comparison. The experiments described and formulated and the consequent observations and results have demonstrated the inherent advantages and disadvantages of each hardware platform.
The next section of this chapter lists all the conclusions in a comparison format based on various points the author has observed for the two hardware platforms. This chart should serve as a generic template that can help engineers working in this area to make informed decisions about hardware selection. Since the intention of the author was not to restrict the debate about this topic to the confines of an industry perspective, this template should also be helpful for students and academic researchers.
In the penultimate section, the core objective is to envision the future scope and direction of emerging trends and developments in the DSP hardware implementation field. The aim is to provide an insight into happenings in this space and to explore a few evolving trends and developments that may help us chart the future course of discussion of this research area.
5.2 CONCLUSIONS: TEMPLATE FOR HARDWARE PLATFORM SELECTION
In this section, we present a template for hardware platform selection for DSP design using either PDSPs or FPGAs. Throughout the course of this thesis, we have come across multiple factors that must be taken into account when the type of hardware needs to be zeroed in on for DSP system development.
In Table 5 below, we list the factors that are influenced by the nature of the hardware that have an effect on the processing performance of the DSP application being implemented. Those are speed, cost, power consumption, area/size, prototyping ability, field programmability and the availability of pre-designed hard and soft IP cores.
The second table (Table 6) lists all the external or generic factors that indirectly affect the design and implementation of DSP systems. We have called them external or generic because they can be seen in virtually any kind of system development, whether it is purely software based or uses general purpose microprocessors, ASICs, PDSPs, FPGAs or microcontrollers.
From the two tables, we can list the specific advantages associated with PDSPs and FPGAs as given below.
Advantages of a PDSP based design methodology:-
Conventional design methodology that is well developed and mature.
PDSP based DSP system design is beneficial when the designer is not concerned about
the explicit internal details of the architecture and does not require an architecture that needs
to be tailor-made and fine-tuned for a specific class of DSP applications.
The PDSP is designed and optimized to support a larger and wider class of DSP algorithms and applications.
PDSP is useful whenever there is a need of high production volumes and environments
Useful when power needs to be conserved and area needs to be optimized.
Knowledge of logic design or HDLs not required as architecture need not be designed separately.
Using a PDSP is akin to using a personal computer (with a general purpose microprocessor) for writing programs to build a purely software based application, with the notable difference that, in the case of a PDSP, the user writes high-level programs to run DSP applications that are tightly coupled to the host hardware, without any operating system.
Advantages of an FPGA based design methodology:-
A modern approach that eliminates the need to write long programs and instead replaces them with a graphical modeling approach and high-level synthesis tools to translate the models.
Designer can construct a whole new architecture with pre-defined blocks and a wide library of hard and soft IP cores made available by the FPGA vendors.
Best suited for low-production volume applications.
Design can be changed and re-configured frequently and refinements can be made easily and quickly.
Models can be re-used for new designs.
Columns: PROPERTY, PDSP, FPGA

Speed [21] [22]
PDSP: Slower speed due to lower clock rates for CPU cores. Execution takes longer if pipeline has large number of stages. Case studies too have reported longer execution times for PDSPs.
FPGA: Faster speed due to higher clock rates of embedded processors. Also case studies have reported shorter execution time than for PDSP.

Cost
PDSP: Useful for complex and high production volume designs as a PDSP is a pre-manufactured device with a fixed architecture.
FPGA: For smaller applications and/or lower production volumes.

Power
PDSP: Can be optimized for low power consumption before it is fabricated later.
FPGA: Consumes more power.

Size / Area
PDSP: Applications typically use all components present on a PDSP chip as instructions are executed in a pipelined fashion.
FPGA: Applications may use only a few logic elements for core DSP logic. In a processor-based system, large number of logic elements required.

Field Programmability
PDSP: Cannot be reprogrammed. Ships with a fixed architecture.
FPGA: Field Programmable. A new code can be downloaded and an FPGA can be reprogrammed in a short time.

Prototyping
PDSP: PDSP can be used for prototyping subject to architecture constraints.
FPGA: Can be used as a prototyping device because of reusability. More freedom because user can design all blocks practically from scratch.

Pre-designed hard or soft IP cores/blocks
PDSP: Not available unless specifically included in hardware. Need to be designed separately.
FPGA: Modern FPGAs have additional hardware/software blocks like multipliers for DSP, hard-core or soft-core processors & peripherals. Possible to design a System-on-Chip (SoC) using these blocks.

Parallel Processing [37]
PDSP: PDSPs deliver better performance for high-speed serial processing.
FPGA: FPGAs deliver better performance for high-speed parallel processing operations.

Heterogeneous, reconfigurable DSP hardware platforms [37]
PDSP: PDSP is main processor responsible for pipelined operations.
FPGA: FPGA acts as a co-processor for parallel processing operations.

Table 5: Template for Hardware Platform Selection -1
Columns: PROPERTY, PDSP, FPGA

Design Methodology
PDSP: Relies more on software programming knowledge to use the underlying hardware.
FPGA: Requires deeper hardware knowledge since logic needs to be designed using existing pre-designed blocks.

Design Cycle
PDSP: Application-specific. Shorter design cycle if architectural constraints not a hindrance.
FPGA: Application-specific & availability of pre-defined blocks.

Ease of design cycle
PDSP: Extensive C programming or assembly language required.
FPGA: Graphical tools like Simulink shorten the development cycle.

Reusability of design
PDSP: Reusability limited to high-level C code. Assembly language code cannot be ported to a different architecture.
FPGA: Reusability of FPGA is the main advantage. Prototype of the design can be implemented on FPGA which could be verified for almost accurate results. If the design has faults, the HDL code is modified, and FPGA can be reprogrammed to test the design.

Knowledge of HDLs
PDSP: Not required. All programming in either C or assembly.
FPGA: May be required in order to work with FPGA tools.

Knowledge of Digital Logic
PDSP: Not required. Architecture is fixed.
FPGA: Requires proficiency in logic design.

Non Recurring Engineering Costs
PDSP: NRE refers to the one-time cost of researching, designing, and testing a new product. Since design effort and research costs are higher when designing a specialized PDSP chip architecture, NRE costs for PDSPs are higher.
FPGA: Typically, FPGA vendors classify FPGAs within “families”. Developing a new family of FPGAs has high NRE costs; versions within a family are simple tweaks and hence have lower NRE costs.

Time-to-Market
PDSP: Inconclusive. May have faster time-to-market due to smaller design cycle.
FPGA: Depends on the design cycle for each application. However large designs require more building blocks that increase complexity. Hence may not keep pace as compared to a PDSP with fixed hardware architecture.

Current State
PDSP: Still popular among core DSP engineers who are more familiar with design cycle.
FPGA: Newer approach. Slowly gaining foothold in academia and industry. Enables a hardware engineer to learn and implement DSP easily.
Table 6: Template for Hardware Platform Selection -2
5.3 FUTURE TRENDS: CO-PROCESSORS – A HYBRID APPROACH
An emerging trend observed in the VLSI based implementation of DSP systems is not to restrict the implementation to a single platform like a PDSP or an FPGA. A thoughtful effort in this direction has been to provide a “hybrid solution” with the objective of maximizing DSP system performance by using an FPGA as a co-processor to a PDSP. This hybrid or heterogeneous approach has been the subject of various white papers published by hardware industry vendors like Altera, Xilinx and Texas Instruments.
A common thread that this author has observed when reading numerous white papers by the
FPGA vendors Altera and Xilinx is the ability to couple the FPGA as a co-processor to an existing Texas Instruments manufactured PDSP. This alliance of PDSP and FPGA vendors has opened a new arena in the quest to find an optimal, low-cost and effective platform for DSP system development and implementation.
Presented below is a comprehensive analysis of the material published in selected white papers of Altera and Xilinx that stresses the topic of FPGA co-processing.
As stated before, conventionally, DSP applications were either implemented in a general-purpose DSP processor or built using ASIC technology. Despite the availability of high-performance DSP processors, they may not be suited to all kinds of DSP applications. Their general-purpose architecture makes these DSP processors flexible, but they may not be fast enough or cost-effective for all systems. FPGAs and ASICs offer faster processing speed and more functionality to support more advanced features when compared to PDSPs. Making a choice between an ASIC and an FPGA depends on the application. ASICs were used whenever the DSP application required performance beyond the abilities of programmable DSPs, or when the expected system volumes justified a semi-custom or full-custom ASIC solution. However, an FPGA implementation can offer a faster time-to-market and a lower-cost solution than an ASIC design. FPGAs also offer the added benefit of re-configurability when the design specification changes. On the other hand, an ASIC may be the right solution for a large volume, very high-speed, or power-sensitive application [34] [35] [40] [41].
High-performance DSP platforms, based on general-purpose DSP processors running algorithms developed in C, have been migrating towards the use of an FPGA pre-processor or coprocessor. The prime motivators for this migration are significant performance, power, and cost advantages. Despite these benefits, design teams accustomed to traditional high-level language based DSP development may avoid using FPGAs because they lack the hardware skills necessary to use one as a co-processor. Unfamiliarity with traditional hardware design methodologies such as VHDL and Verilog limits or prevents the use of an FPGA, which may result in more expensive and power-hungry designs. A new group of emerging design tools called ESL (electronic system level) promises to address this methodology issue, allowing processor-based developers to accelerate their designs with an FPGA while maintaining a common design methodology for hardware and software. The performance and cost of the DSP system are optimized, and system power requirements are lowered, by offloading operations that require high-speed parallel processing onto the FPGA and leaving operations that require high-speed serial processing on the DSP. Another approach may entail the creation of an independent hardware accelerator for one of the Xilinx embedded processors. The processor remains the primary target for the C routines, with the exception that performance-critical operations are pushed to the FPGA logic in the form of a hardware accelerator. This provides a more software-centric design methodology, albeit with a trade-off in processing performance [36].
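The benefit of offloading only part of an application can be estimated with Amdahl's law. The sketch below is illustrative only: the fraction of run time spent in offloadable operations and the acceleration factor are assumed values, not measurements from this thesis or from [36].

```python
def overall_speedup(offload_fraction, accel_factor):
    """Amdahl's law: overall speedup when a fraction of the original run time
    is accelerated by accel_factor and the rest stays on the DSP processor."""
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must lie between 0 and 1")
    if accel_factor <= 0:
        raise ValueError("accel_factor must be positive")
    return 1.0 / ((1.0 - offload_fraction) + offload_fraction / accel_factor)

# Assumed scenarios: share of run time moved to the FPGA, with a 10x accelerator.
for fraction in (0.5, 0.8, 0.95):
    print(f"offload {fraction:.0%} at 10x -> overall {overall_speedup(fraction, 10.0):.2f}x")
```

The serial portion left on the DSP bounds the achievable gain, which is one reason the partitioning between the two devices matters as much as the raw speed of the accelerator.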
FPGAs and DSP processors have fundamentally dissimilar architectures. An algorithm that is well suited for implementation on one may be very inefficient on the other. For instance, a
hardware system based solely on DSP processors may require more area, cost, or power if the
target application requires a large amount of parallel processing or a maximized multichannel
throughput because discrete DSPs do not scale well for parallel processing. An FPGA
coprocessor can provide up to 550 parallel multiply and accumulate operations on a single
device, delivering the same performance with fewer devices and lower power for many
applications. On the other hand, while FPGAs excel at processing large amounts of data in
parallel, they are not as optimized as DSP processors for tasks such as periodic coefficient
updates, decision-making control tasks, or high-speed serial mathematical operations. Combining
an FPGA with a DSP processor delivers successful solutions for a wide range of applications.
[37]
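A rough way to see why discrete DSPs scale poorly for parallel workloads is to count how many devices are needed to sustain a given multiply-accumulate (MAC) rate. The sketch below is a simplified model that assumes every MAC unit completes one operation per clock cycle and ignores memory bandwidth and control overhead. The required rate and the clock frequencies are assumed values for illustration; only the figure of 550 parallel MACs comes from the text.

```python
def devices_needed(required_macs_per_s, mac_units, clock_hz):
    """Number of devices needed, assuming one MAC per unit per clock cycle."""
    per_device = mac_units * clock_hz
    return -(-required_macs_per_s // per_device)   # integer ceiling division

REQUIRED = 20_000_000_000          # assumed workload: 20 GMAC/s

# Assumed DSP processor: 2 MAC units at 120 MHz.
dsp_count = devices_needed(REQUIRED, mac_units=2, clock_hz=120_000_000)
# FPGA coprocessor: 550 parallel MACs (from the text) at an assumed 200 MHz.
fpga_count = devices_needed(REQUIRED, mac_units=550, clock_hz=200_000_000)

print(f"DSP processors needed: {dsp_count}")
print(f"FPGA devices needed:   {fpga_count}")
```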
Nevertheless, FPGAs bring two key advantages to digital signal processing. First, their
architectures are well suited for highly parallel implementation of DSP functions, allowing for
very high performance. Second, user programmability allows designers to trade off device area
against performance by selecting the appropriate level of parallelism to implement their functions.
By programming the FPGA to use more on-chip resources, designers can achieve higher
performance. By using fewer resources (and accepting a correspondingly lower performance),
designers can optimize the design for low cost. [39]
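The area versus performance trade-off can be illustrated with an FIR filter whose multiplications are shared among a chosen number of hardware multipliers. This is a simplified model: the filter length, the clock frequency and the function name are assumptions, and routing, memory and control costs are ignored.

```python
import math

def fir_tradeoff(num_taps, multipliers, clock_hz):
    """Return (clock cycles per output sample, output samples per second)
    when num_taps multiplications are shared among the given multipliers."""
    cycles = math.ceil(num_taps / multipliers)
    return cycles, clock_hz / cycles

NUM_TAPS = 64             # assumed filter length
CLOCK_HZ = 100_000_000    # assumed FPGA clock frequency

for multipliers in (1, 4, 16, 64):
    cycles, rate = fir_tradeoff(NUM_TAPS, multipliers, CLOCK_HZ)
    print(f"{multipliers:2d} multipliers: {cycles:2d} cycles/sample, {rate / 1e6:.2f} Msamples/s")
```

Using one multiplier gives the smallest design and the lowest throughput; using one multiplier per tap gives one output sample per clock cycle at the largest area.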
While FPGA architecture has some nice capabilities, several areas must come together for a
good co-processing solution:
Silicon foundation: Logic, DSP, and power management
Arithmetic foundation: Operator cores optimized for FPGAs and datapath compiler
Library foundation: Function optimized for specific FPGA resources
System level: Co-processing tool chain with CPU interface bandwidth and latency [38]
Heterogeneous, reconfigurable DSP hardware platforms, which include both a DSP processor and an FPGA and are supported by a platform-based design methodology, enable traditional DSP designers not familiar with FPGAs to quickly evaluate the benefits an FPGA coprocessor can bring to their applications. They provide off-the-shelf hardware that addresses the most important design challenges yet is still sufficiently customizable to allow for product differentiation. These platforms limit degrees of freedom in hardware, thereby providing greater automation in the design flow. This automation can help eliminate complexity, thus extending the advantages of heterogeneous platforms to the DSP design community [37].
A heterogeneous system improves exploitation of pipelining and parallel processing, which are essential to achieve high frame rates and low latency. Developing this type of system requires proficiency in both FPGA and DSP processor designs plus the systems engineering skills necessary for partitioning, a breadth of skills few designers possess. A heterogeneous platform-based design flow extends the design automation concepts adopted by the individual processor and FPGA design flows to the entire platform. The basic function of a platform-based design, abstracting away the hardware and software interface details between the FPGA and DSP processor, allows a DSP designer with little or no FPGA design experience to evaluate and exploit the benefits of adding an FPGA. This design flow should automatically generate memory maps, header and driver files for the software interface, and hardware interface and interrupt logic. Refining the overall system should have limited consequences on individual hardware and software components [37].
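As an illustration of what automatic generation of a memory map and header file might look like, the sketch below turns a small register description into the text of a C header. It is not the output of any vendor tool: the module name, base address and register names are hypothetical.

```python
def generate_header(module_name, base_address, registers):
    """Build the text of a C header from a list of (register name, byte offset)."""
    prefix = module_name.upper()
    guard = f"{prefix}_REGS_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    lines.append(f"#define {prefix}_BASE 0x{base_address:08X}u")
    for name, offset in registers:
        lines.append(f"#define {prefix}_{name.upper()} ({prefix}_BASE + 0x{offset:02X}u)")
    lines += ["", f"#endif /* {guard} */"]
    return "\n".join(lines)

# Hypothetical FPGA coprocessor register map as seen from the DSP processor.
header = generate_header(
    "fir_accel",
    0x60000000,
    [("ctrl", 0x00), ("status", 0x04), ("data_in", 0x08), ("data_out", 0x0C)],
)
print(header)
```

Keeping the register description in one place means that the software header and the hardware address decoder can be regenerated together when the partitioning changes.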
Designers can use many methods to implement a DSP algorithm in any given technology. Target hardware often influences the algorithmic approach. When the target is a heterogeneous DSP hardware platform, selecting an implementation becomes a two-step process. The designer must first select the most appropriate hardware device and then determine which implementation method makes sense for that particular device. On a DSP hardware platform, the processor will be the master and control the FPGA [37].
Figure 26: TI DSP Xilinx FPGA as Co-processor [36]
Figure 27: TI DSP and Altera FPGA as Co-processor [40]
Figure 28: Heterogeneous platform-based design [37]
In many cases, FPGAs work in conjunction with a conventional DSP, typically integrating pre- and post-processing functions, along with high performance signal processing. FPGAs can also integrate all the logic, bus-bridging, and peripheral functions, thus reducing system costs and affording a higher level of system integration. The FPGA, in turn, will be used as either a co-processor (where data is sourced to and synched from the DSP processor) or as a pre- or post-processor (where the data is sourced from a high-speed interface). System data rates and operating parameters drive optimal FPGA usage [37].
DSP application developers will find the co-processing flow, in which an FPGA can be used to accelerate performance-critical functions, to be the most natural programming model. Tools such as Code Composer Studio (CCS) for Texas Instruments DSPs include code profilers that identify the software hot spots that can be offloaded to the FPGA. To use these tools efficiently and design a heterogeneous DSP/FPGA platform effectively, designers need an interface to connect the FPGA to a separate DSP processor on the hardware platform. DSP platforms will typically support more general-purpose interfaces, such as the Texas Instruments 16/32/64-bit C6x DSP extended memory interface (suitable for system control and co-processing tasks), and high-speed serial interfaces, such as Serial RapidIO or video interfaces (for pre- and post-processing operations). As designers add FPGA coprocessors to the system, the software implementation will change from an algorithmic description to data passing and function control. The FPGA coprocessor will appear as a hardware accelerator to the application software developer and will be accessible through function calls [37].
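The idea that the coprocessor is accessible through function calls can be sketched as follows. The class below is a hypothetical stand-in that emulates the accelerator in software; a real system would move the data over the memory interface through vendor-supplied drivers, which are not modeled here. All names are assumptions made for illustration.

```python
def fir_software(taps, samples):
    """Reference FIR filter computed entirely in software."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out

class FirCoprocessor:
    """Hypothetical FPGA FIR accelerator, emulated in software for this sketch."""

    def __init__(self, taps):
        self.taps = list(taps)    # models loading the coefficients once
        self.calls = 0

    def process(self, samples):
        self.calls += 1           # models one data-passing transaction
        return fir_software(self.taps, samples)

def fir(taps, samples, coprocessor=None):
    """Application-level call: use the accelerator when one is present,
    otherwise fall back to the software routine."""
    if coprocessor is not None:
        return coprocessor.process(samples)
    return fir_software(taps, samples)

taps = [0.5, 0.5]
samples = [1.0, 2.0, 3.0, 4.0]
accelerator = FirCoprocessor(taps)
print(fir(taps, samples))                # software path
print(fir(taps, samples, accelerator))   # accelerated path, same call site
```

The application code changes only in how the call is dispatched, which reflects the shift from an algorithmic description to data passing and function control described above.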
5.4 CONCLUDING REMARKS
Finally, to conclude, “human-intelligent machine” interaction on a day-to-day basis is now so ingrained in our lives (including the use of household electronic appliances, smartphones, tablets, computing systems, intelligent automobiles, office and industrial automation, and security systems) that “unplugging” is being suggested as an alternative therapy by the medical profession as a means of reducing distractions and cleansing the mind, body and soul. This is because almost every point in our daily routine involves some interaction with the virtually connected world. It is always fascinating to learn and understand how theories can be transformed into real world working products that can help make tasks easier in this increasingly interconnected wired or virtually wired (wireless) world of ubiquitous computing devices.
All this has become possible because engineers have been able to interface with real world analog signals, convert them to digital signals, analyze and extract the relevant information contained in those signals that makes sense to the end-user, and transform the signals to synthesize an output that can be presented to an ordinary end-user in an easy manner. The essence of this capturing, analysis, extraction, transformation, synthesis and delivery is what can be defined as digital signal processing in layman’s terminology, and it is accomplished through a combination of hardware and software.
We can thus state that the field of VLSI Signal Processing not only brings about a convergence of two independent domains in the field of Electrical Engineering, but also throws open a range of possibilities and ideas that can be harvested to help solve problems in the individual domains with cross-pollination of knowledge sharing abilities, building cross expertise in each domain that may help solve problems faced by hardware engineers, algorithm engineers and system architects alike.
100
REFERENCES
[1] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience Publishers, 2007.
[2] S. Mirzaei, Design Methodologies and Architectures for Digital Signal Processing on FPGAs, Ph.D. Dissertation, Department of Electrical and Computer Engineering, University of California Santa Barbara, 2010.
[3] S. S. Bhattacharyya, E. F. Deprettere, R. Leupers and J. Takala, Handbook of Signal Processing Systems, Springer, 2010.
[4] Y. H. Hu, Programmable Digital Signal Processors: Architecture: Programming and applications, CRC Press, 2007.
[5] R. Duren, J. Stevenson and M. Thompson, “A comparison of FPGA and DSP development environments and performance for acoustic array processing”, Proceedings 50th Midwest Symposium on Circuits and Systems, 2007.
[6] R. D. Turney, C. Dick, D. B. Parlour and J. Hwang, “Modeling and implementation of DSP FPGA solutions”, Proceedings International Conference on Signal Processing Applications and Technology, ICSPAT, 2000.
[7] J. Cong, Bin Liu, S. Neuendorffer, J. Noguera, K. Vissers and Zhiru Zhang, “High-level synthesis for FPGAs: From prototyping to deployment”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(4), pp. 473-491, 2011.
[8] D. Markovic, B. Nikolic and R. W. Brodersen, “Power and area minimization for multidimensional signal processing”, IEEE Journal of Solid-State Circuits, 42(4), pp. 922-934. 2007.
[9] Texas Instruments White Paper http://www.ti.com/lit/wp/spra879/spra879.pdf , accessed 07-09-2014.
101
[10] M. A. Richards and G. A. Shaw, “Chips, architectures and algorithms: Reflections on the exponential growth of digital signal processing capability”, http://users.ece.gatech.edu/mrichard/Richards&Shaw_Algorithms01204.pdf , accessed 07-09-2014.
[11] S. P. Chan, C. Sankaran, G. Ballou, M. Pecht, N. Angelopoulos, P. Lall, J. Cogdell, Z. Wan, C. R. Paul and R. C. Dorf. The Electrical Engineering Handbook, CRC Press, 2006.
[12] J. Eyre and J. Bier, “Evolution of DSP processors”, IEEE Signal Processing Magazine 17(2), pp. 43-50, 2000.
[13] M. Rawski, B. J. Falkowski and T. Łuba, “Digital Signal Processing: designing for FPGA architectures”, Facta Universitatis-Series: Electronics and Energetics 20(3), pp. 437-459. 2007.
[14] C. Ho, M. Leong, P. Leong, J. Becker and M. Glesner, “Rapid prototyping of FPGA based floating point DSP systems”, Proceedings 13th IEEE International Workshop on Rapid System Prototyping, 2002.
[15] Woon-Seng Gan, “Teaching and learning the hows and whys of real-time digital signal processing”, IEEE Transactions on Education, 45(4), pp. 336-343. 2002.
[16] L. S. DeBrunner and V. DeBrunner, “The case for teaching DSP algorithms in conjunction with implementations”, Proceedings 2nd Signal Processing Education Workshop and 10th Digital Signal Processing Workshop, 2002.
[17] N. Kehtarnavaz and S. Mahotra, “FPGA implementation made easy for applied digital signal processing courses”, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.
[18] Texas Instruments DSP Tools http://www.ti.com/lsds/ti/dsp/toolsw.page, accessed 07- 09-2014.
[19] S. M. Kuo, B. H. Lee and W. Tian, Real-Time Digital Signal Processing, Wiley, 2006.
102
[20] M. Shirvaikar and T. Bushnaq, “A comparison between DSP and FPGA platforms for real-time imaging applications”, Proceedings SPIE-IS and T Electronic Imaging-Real- Time Image and Video Processing 7244, 2009.
[21] M. Shirvaikar and T. Bushnaq, “VHDL implementation of wavelet packet transforms using SIMULINK tools”, Proceedings International Society for Optical Engineering, 2008.
[22] R. Duren, J. Stevenson and M. Thompson, “A comparison of FPGA and DSP development environments and performance for acoustic array processing”, Proceedings 50th Midwest Symposium on Circuits and Systems, 2007.
[23] D. Markovic, B. Richards and R. W. Brodersen, “Technology driven DSP architecture optimization within a high-level block diagram based design flow”, Proceedings 40th Asilomar Conference on Signals, Systems and Computers, 2006.
[24] FPGA Programming for the Masses http://queue.acm.org/detail.cfm?id=2443836 , accessed 07-09-2014.
[25] TI C5515 EVM User Guide http://support.spectrumdigital.com/boards/evm5515/revb/files/evm5515_TechRef_revb.p df, accessed 07-09-2014.
[26] TI C5515 Fixed-Point DSP Datasheet http://www.ti.com/lit/ds/symlink/tms320c5515.pdf, accessed 07-09-2014.
[27] TI website tutorial http://processors.wiki.ti.com/index.php/Category:Simulation, accessed 07-09-2014.
[28] Altera DE-1 Board User Manual ftp://ftp.altera.com/up/pub/Altera_Material/12.1/Boards/DE1/DE1_User_Manual.pdf, accessed 07-09-2014.
[29] Simulink User Guide http://www.mathworks.com/help/pdf_doc/simulink/sl_gs.pdf, accessed 07-09-2014.
103
[30] Quartus II User Manual http://www.altera.com/literature/hb/qts/quartusii_handbook.pdf, accessed 07-09-2014.
[31] SOPC Builder User Manual http://www.altera.com/literature/ug/ug_sopc_builder.pdf, accessed 07-09-2014.
[32] DSP Builder User Guide http://www.altera.co.jp/literature/ug/ug_dsp_builder.pdf, accessed 07-09-2014.
[33] P. Ekas and B. Jentz, “Developing and integrating FPGA coprocessors”, Embedded Computing Design Magazine, http://embedded-computing.com/pdfs/Altera.Fall03.pdf, accessed 07-09-2014.
[34] S. K. Knapp, Using Programmable Logic to Accelerate DSP Functions, Xilinx, Inc., pp. 1-8. 1995.
[35] S. Sharma and W. Chen, “Using model-based design to accelerate FPGA development for automotive applications”, The MathWorks, 2009.
[36] T. Hill, “The benefits of FPGA coprocessing”, Xcell Journal.v58, pp. 29-31. 2006.
[37] T. Hill, “Heterogeneous hardware platforms capitalize on DSP/FPGA capabilities”, http://dsp-fpga.com/pdfs/Xilinx.RG07.pdf, accessed 07-09-2014.
[38] FPGA Coprocessing Evolution: Sustained Performance Approaches Peak Performance http://www.altera.com/literature/wp/wp-01031-coprocessing-evolution.pdf , accessed 07- 09-2014.
[39] S. Zack and S. Dhanani, “DSP co-processing in FPGAs: Embedding high-performance, low-cost DSP functions”, Xilinx White Paper 2004, http://ohm.bu.edu/~pbohn/__Engineering_Reference/ECEU530_HDL/Digilent_S3_Boar d/wp212.pdf , accessed 07-09-2014.
104
[40] P. Ekas and B. Jentz, “Developing and integrating FPGA coprocessors”, Embedded Computing Design Magazine, http://embedded-computing.com/pdfs/Altera.Fall03.pdf, accessed 07-09-2014.
[41] FPGAs Provide Reconfigurable DSP Solutions http://www.altera.com/literature/wp/wp_dsp_fpga.pdf, accessed 07-09-2014.
[42] Increase Bandwidth in Medical & Industrial Applications with FPGA Coprocessors http://www.altera.com/literature/wp/wp_use_of_pld_as_cp5.pdf , accessed 07-09-2014.
[43] Altera Cyclone II Device Handbook, http://www.altera.com/literature/hb/cyc2/cyc2_cii5v1.pdf, accessed 07-09-2014.
[44] TI C55x CPU Architecture Reference Guide, http://www.ti.com/lit/ug/swpu073e/swpu073e.pdf, accessed 07-09-2014.
105