Electrical and Computer Engineering Faculty Publications

6-11-2008

Significance of Logic Synthesis in FPGA-Based Design of Image and Signal Processing Systems

Mariusz Rawski, Warsaw University of Technology

Henry Selvaraj, University of Nevada, Las Vegas

Bogdan J. Falkowski Nanyang Technological University, Singapore

Tadeusz Luba Institute of Telecommunications Warsaw


Repository Citation Rawski, M., Selvaraj, H., Falkowski, B. J., Luba, T. (2008). Significance of Logic Synthesis in FPGA-Based Design of Image and Signal Processing Systems. Pattern Recognition Technologies and Applications: Recent Advances 265-283. Hershey, PA: IGI Global. https://digitalscholarship.unlv.edu/ece_fac_articles/303

This Chapter is protected by copyright and/or related rights. It has been brought to you by Digital Scholarship@UNLV with permission from the rights-holder(s). You are free to use this Chapter in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/or on the work itself.

This Chapter has been accepted for inclusion in Electrical and Computer Engineering Faculty Publications by an authorized administrator of Digital Scholarship@UNLV.

Pattern Recognition Technologies and Applications: Recent Advances

Brijesh Verma Central Queensland University, Australia

Michael Blumenstein Griffith University, Australia

Information Science Reference
Hershey • New York

Acquisitions Editor: Kristin Klinger
Development Editor: Kristin Roth
Assistant Development Editor: Jessica Thompson
Senior Managing Editor: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Copy Editor: Sue Vander Hook
Typesetter: Sean Woznicki
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.

Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
Web site: http://www.igi-global.com

and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com

Copyright © 2008 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Pattern recognition technologies and applications : recent advances / Brijesh Verma and Michael Blumenstein, editors.

p. cm.

Summary: “This book provides cutting-edge pattern recognition techniques and applications. Written by world-renowned experts in their field, this easy to understand book is a must have for those seeking explanation in topics such as on- and offline handwriting and speech recognition, signature verification, and gender classification”--Provided by publisher.

Includes bibliographical references and index.

ISBN-13: 978-1-59904-807-9 (hardcover)

ISBN-13: 978-1-59904-809-3 (e-book)

1. Pattern perception. 2. Pattern perception--Data processing. I. Verma, Brijesh. II. Blumenstein, Michael.

Q327.P383 2008

006.4--dc22

2007037396

British Cataloguing in Publication Data A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of the publisher. 

Chapter XII Significance of Logic Synthesis in FPGA-Based Design of Image and Signal Processing Systems

Mariusz Rawski Warsaw University of Technology, Poland

Henry Selvaraj University of Nevada, USA

Bogdan J. Falkowski Nanyang Technological University, Singapore

Tadeusz Łuba Warsaw University of Technology, Poland

ABSTRACT

This chapter, taking FIR filters as an example, discusses the efficiency of different implementation methodologies of DSP algorithms targeting modern FPGA architectures. Nowadays, programmable technology provides the possibility to implement digital systems with the use of specialized embedded DSP blocks. However, this technology also gives the designer the possibility to increase the efficiency of designed systems by exploiting the parallelism of the implemented algorithms. Moreover, it is possible to apply special techniques, such as distributed arithmetic (DA). Since in this approach general-purpose multipliers are replaced by combinational LUT blocks, it is possible to construct digital filters of very high performance. Additionally, the application of the functional decomposition-based method to LUT block optimization and mapping has been investigated. The chapter presents results of the comparison of various design approaches in these areas.

Copyright © 2008, IGI Global. Distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

The pattern recognition research field aims to design methods that allow recognition of patterns in data. It has important applications in image analysis, character recognition, speech analysis, and many others. A pattern recognition system is composed of sensors gathering observations that have to be classified, a feature extraction part that provides specific information from the gathered observations, and a classification mechanism that classifies observations on the basis of the extracted features. Feature extraction methods are responsible for reducing the resources required to describe observations accurately. In the case of image analysis, character recognition, or speech analysis, various digital signal processing (DSP) algorithms are used to detect desired features of a digitized image or speech signal. Efficient implementation of feature extraction-based DSP methods requires specific hardware solutions.

The commercial success of hardware implementations of image processing systems is due in large part to revolutionary development in microelectronic technologies. By taking advantage of the opportunities provided by modern microelectronic technology, we are in a position to build very complex digital circuits and systems at relatively low cost. There is a large variety of logic building blocks that can be exploited. The library of elements contains various types of gates, a lot of complex gates that can be generated in (semi-)custom CMOS design, and the field programmable logic families that include various types of (C)PLDs and FPGAs. The other no less important factors of the success are the automation of the design process and hardware description languages. Modern design tools have enabled us to move beyond putting together digital components in a schematic entry package to start writing code in an HDL specification. However, the opportunities created by modern microelectronic technology are not fully exploited because of weaknesses in traditional logic design methods. According to the International Technology Roadmap for Semiconductors (1997), the annual growth rate in design complexity is equal to 58%, while the annual growth rate in productivity is only 21% (Figure 1). This means that the number of logic gates available in modern devices grows faster than the ability to design them meaningfully. New methods are required to aid the design process so that the possibilities offered by modern microelectronics are utilized to the highest possible degree.

In recent years, digital filtering has been recognized as a primary digital signal processing (DSP) operation. With advances in technology, digital filters are rapidly replacing analogue filters, which were implemented with RLC components. Digital filters are used to modify attributes of a signal in the time or frequency domain through a process

Figure 1. Difference in growth of device complexity and productivity


called linear convolution. Traditionally, digital signal filtering algorithms are implemented using general-purpose programmable DSP chips. Alternatively, for high-performance applications, special-purpose fixed-function DSP chipsets and application-specific integrated circuits (ASICs) are used. Typical DSP devices are based on the concept of RISC processors with an architecture that consists of fast array multipliers. In spite of using a pipeline architecture, the speed of such an implementation is limited by the speed of the array multiplier. Digital filters are implemented in such devices as multiply-accumulate (MAC) algorithms (Lapsley, Bier, Shoham & Lee, 1997; Lee, 1988, 1989). However, the technological advancements in field programmable gate arrays (FPGAs) in the past decade have opened new paths for DSP design engineers.

Digital filtering plays an extremely important role in many signal and image processing algorithms. An excellent example is the wavelet transform, which has gained much attention in recent years. The discrete wavelet transform (DWT) is one of the useful and efficient signal and image decomposition methods with many interesting properties (Daubechies, 1992; Falkowski, 2004; Falkowski & Chang, 1997; Rao & Bopardikar, 1998; Rioul & Vetterli, 1991). This transformation, similar to the Fourier transform, can provide information about the frequency contents of signals. However, unlike the Fourier transform, this approach is more natural and fruitful when applied to nonstationary signals, like speech signals and images. The flexibility offered by the discrete wavelet transform allows researchers to develop and find the right wavelet filters for their particular application. For example, in fingerprint compression, a particular set of bio-orthogonal filters (Daubechies bio-orthogonal spline wavelet filters) is found to be very effective (Brislawn, Bradley, Onyshczak & Hopper, 1996). The computational complexity of the discrete wavelet transform is very high. Hence, efficient hardware implementation is required to achieve very good real-time performance. Application of the DWT requires convolution of the signal with the wavelet and scaling functions. Efficient hardware implementation of convolution is performed as a finite impulse response (FIR) filter. Two filters are used to evaluate a DWT: a high-pass and a low-pass filter, with the filter coefficients derived from the wavelet basis function.

Progress in the development of programmable architectures observed in recent years has resulted in digital devices that allow building very complex digital circuits and systems at relatively low cost in a single programmable structure. FPGAs are arrays of programmable logic cells interconnected by a matrix of wires and programmable switches. Each cell performs a simple logic function defined by a designer's program. An FPGA has a large number (64 to more than 300,000) of these cells available to use as building blocks in complex digital circuits. The ability to manipulate the logic at the gate level means that a designer can construct a custom processor to efficiently implement the desired function. FPGA manufacturers have for years been extending their chips' ability to implement digital signal processing efficiently; for example, by introducing low-latency carry-chain routing lines that speed up addition and subtraction operations spanning multiple logic blocks. Such a mechanism is relatively efficient when implementing addition and subtraction operations. However, it is not optimal in cost, performance, and power for multiplication and division functions. As a result, Altera (with Stratix), QuickLogic (with QuickDSP, now renamed Eclipse Plus), and Xilinx (with Virtex-II and Virtex-II Pro) embedded dedicated multiplier function blocks in their chips. Altera moved even further along the integration path, providing fully functional MAC blocks called DSP blocks.

Programmable technology makes it possible to increase the performance of a digital system by implementing multiple, parallel modules in one chip. This technology also allows the application of special techniques such as distributed


arithmetic (DA) (Croisier, Esteban, Levilion & Rizo, 1973; Meyer-Baese, 2004; Peled & Liu, 1974). The DA technique is extensively used in computing the sum of products in filters with constant coefficients. In such a case, a partial product term becomes a multiplication with a constant (i.e., scaling). The DA approach significantly increases the performance of an implemented filter by removing general-purpose multipliers and introducing combinational blocks that implement the scaling. These blocks have to be efficiently mapped onto the FPGA's logic cells. This can be done with the use of such advanced synthesis methods as functional decomposition (Rawski, Tomaszewicz, & Łuba, 2004; Rawski, Tomaszewicz, Selvaraj, & Łuba, 2005; Sasao, Iguchi, & Suzuki, 2005).

In the case of applications targeting FPGA structures based on look-up tables (LUTs), the influence of advanced logic synthesis procedures on the quality of hardware implementation of signal and information processing systems is especially important. The direct cause of this situation is the imperfection of the technology mapping methods that are widely used at present, such as minimization and factorization of Boolean functions, which are traditionally adapted to be used for structures based on standard cells. These methods transform Boolean formulas from a sum-of-products form into a multilevel, highly factorized form that is then mapped into LUT cells. This process is at variance with the nature of the LUT cell, which from the logic synthesis point of view is able to implement any logic function of a limited number of input variables. For this reason, for implementations targeting FPGA structures, decomposition is a much more efficient method. Decomposition allows synthesizing the Boolean function into a multilevel structure that is built of components, each of which is in the form of a LUT logic block specified by truth tables.

The efficiency of functional decomposition has been proved in many theoretical papers (Brzozowski & Łuba, 2003; Chang, Marek-Sadowska & Hwang, 1996; Rawski, Jóźwiak & Łuba, 2001; Scholl, 2001). However, there are relatively few papers in which functional decomposition procedures were compared with analogous synthesis methods used in commercial design tools. The reason behind this situation is the lack of appropriate interface software that would allow transforming a description of a project structure obtained outside a commercial design system into a description compatible with its rules. Moreover, the computational complexity of functional decomposition procedures makes it difficult to construct efficient automatic synthesis procedures. These difficulties have been eliminated at least partially in so-called balanced decomposition (Łuba, Selvaraj, Nowicka & Kraśniewski, 1995; Nowicka, Łuba & Rawski, 1999).

BASIC THEORY

Only the information necessary for an understanding of this chapter is reviewed here. A more detailed description of functional decomposition based on partition calculus can be found in Brzozowski and Łuba (2003).

Cube Representation of Boolean Functions

A Boolean function can be specified using the concept of cubes (e.g., input terms, patterns) representing some specific subsets of minterms. In a minterm, each input variable position has a well-specified value. In a cube, positions of some input variables can remain unspecified, and they represent "any value" or "don't care" (–). A cube may be interpreted as a p-dimensional subspace of the n-dimensional Boolean space or as a product of $n - p$ variables in Boolean algebra (p denotes the number of components that are "–"). Boolean functions are typically represented by truth tables. A description of a function using minterms requires $2^n$ rows for a function of n variables. For the function from Table 1, a truth


table with $2^6 = 64$ rows would be required. Since a cube represents a set of minterms, application of cubes allows for a much more compact description in comparison with minterm representation. For example, cube 0101–0 from row 1 of the truth table in Table 1 represents a set of two minterms {010100, 010110}.

Table 1. Boolean function F(x1, x2, x3, x4, x5, x6)

      x1 x2 x3 x4 x5 x6 | y1
  1    0  1  0  1  –  0 |  0
  2    0  1  0  –  0  0 |  0
  3    –  1  0  0  0  – |  0
  4    0  1  0  1  1  – |  0
  5    0  0  1  –  –  1 |  0
  6    –  –  1  1  –  1 |  0
  7    1  –  1  1  0  – |  0
  8    0  0  –  –  –  0 |  1
  9    –  1  0  0  1  – |  1
 10    1  –  1  0  –  – |  1

For pairs of cubes and for a certain input subset B, we define the compatibility relation COM as follows: two cubes S and T are compatible (i.e., $(S, T) \in COM(B)$) if and only if $x(S) \sim x(T)$ for every $x \in B$. The compatibility relation ~ on {0, –, 1} is defined as follows: 0 ~ 0, – ~ –, 1 ~ 1, 0 ~ –, 1 ~ –, – ~ 0, – ~ 1, but the pairs (1, 0) and (0, 1) are not related by ~. The compatibility relation on cubes is reflexive and symmetric, but not necessarily transitive. In general, it generates a "partition" with nondisjoint blocks on the set of cubes representing a certain Boolean function F. The cubes contained in a block of the "partition" are all compatible with each other.

"Partitions" with nondisjoint blocks are referred to as blankets (Brzozowski & Łuba, 2003). The concept of a blanket is a simple extension of an ordinary partition, and typical operations on blankets are strictly analogous to those used in ordinary partition algebra.

Representation and Analysis of Boolean Functions with Blankets

A blanket on a set S is a collection of (not necessarily disjoint) subsets $B_i$ of S, called blocks, such that:

$\bigcup_i B_i = S$.

The product of two blankets $\beta_1$ and $\beta_2$ is defined as follows:

$\beta_1 \bullet \beta_2 = \{ B_i \cap B_j \mid B_i \in \beta_1 \text{ and } B_j \in \beta_2 \}$.

For two blankets we write $\beta_1 \le \beta_2$ if and only if for each $B_i$ in $\beta_1$ there exists a $B_j$ in $\beta_2$ such that $B_i \subseteq B_j$. The relation ≤ is reflexive and transitive.

Example 1: Blanket-Based Representation of Boolean Functions

For function F from Table 1, the blankets induced by particular input and output variables on the set of function F's input patterns (cubes) are as follows:

$\beta_{x_1} = \{1, 2, 3, 4, 5, 6, 8, 9;\ 3, 6, 7, 9, 10\}$,
$\beta_{x_2} = \{5, 6, 7, 8, 10;\ 1, 2, 3, 4, 6, 7, 9, 10\}$,
$\beta_{x_3} = \{1, 2, 3, 4, 8, 9;\ 5, 6, 7, 8, 10\}$,
$\beta_{x_4} = \{2, 3, 5, 8, 9, 10;\ 1, 2, 4, 5, 6, 7, 8\}$,
$\beta_{x_5} = \{1, 2, 3, 5, 6, 7, 8, 10;\ 1, 4, 5, 6, 8, 9, 10\}$,
$\beta_{x_6} = \{1, 2, 3, 4, 7, 8, 9, 10;\ 3, 4, 5, 6, 7, 9, 10\}$,
$\beta_{y_1} = \{1, 2, 3, 4, 5, 6, 7;\ 8, 9, 10\}$.

The product of the two blankets $\beta_{x_2}$ and $\beta_{x_4}$ is:

$\beta_{x_2 x_4} = \beta_{x_2} \bullet \beta_{x_4} = \{5, 8, 10;\ 5, 6, 7, 8;\ 2, 3, 9, 10;\ 1, 2, 4, 6, 7\}$,
$\beta_{x_2 x_4} \le \beta_{x_2}$.

Information on the input patterns of a certain function F is delivered by the function's inputs and


used by its outputs with respect to the blocks of the input and output blankets. Knowing the block of a certain blanket, one is able to distinguish the elements of this block from all other elements, but is unable to distinguish between elements of the given block. In this way, information in various points and streams of discrete information systems can be modeled using blankets.

Serial Decomposition

The set X of a function's input variables is partitioned into two subsets: free variables U and bound variables V, such that $U \cup V = X$. Assume that the input variables $x_1, \ldots, x_n$ have been relabeled in such a way that:

$U = \{x_1, \ldots, x_r\}$ and
$V = \{x_{n-s+1}, \ldots, x_n\}$.

Consequently, for an n-tuple x, the first r components are denoted by $x^U$, and the last s components by $x^V$.

Let F be a Boolean function with n > 0 inputs and m > 0 outputs, and let (U, V) be as previously indicated. Assume that F is specified by a set F of the function's cubes. Let G be a function with s inputs and p outputs, and let H be a function with r + p inputs and m outputs. The pair (G, H) represents a serial decomposition of F with respect to (U, V) if, for every minterm b relevant to F, $G(b^V)$ is defined, $G(b^V) \in \{0, 1\}^p$, and $F(b) = H(b^U, G(b^V))$. G and H are called blocks of the decomposition (Figure 2).

Figure 2. Schematic representation of the serial decomposition

Theorem 1: Existence of Serial Decomposition (Brzozowski & Łuba, 2003)

Let $\beta_V$, $\beta_U$, and $\beta_F$ be blankets induced on the function F's input cubes by the input subsets V and U, and the outputs of F, respectively. If there exists a blanket $\beta_G$ on the set of function F's input cubes such that $\beta_V \le \beta_G$ and $\beta_U \bullet \beta_G \le \beta_F$, then F has a serial decomposition with respect to (U, V).

The proof of Theorem 1 can be found in Brzozowski and Łuba (2003).

As follows from Theorem 1, the main task in constructing a serial decomposition of a function F with given sets U and V is to find a blanket $\beta_G$ that satisfies the condition of the theorem. Since $\beta_G$ must be $\ge \beta_V$, it is constructed by merging blocks of $\beta_V$ as much as possible.

Two blocks $B_i$ and $B_j$ of blanket $\beta_V$ are compatible (mergeable) if the blanket $\gamma_{ij}$ obtained from blanket $\beta_V$ by merging $B_i$ and $B_j$ into a single block satisfies the second condition of Theorem 1; that is, if $\beta_U \bullet \gamma_{ij} \le \beta_F$. Otherwise, blocks $B_i$ and $B_j$ are incompatible (unmergeable). A subset δ of blocks of the blanket $\beta_V$ is a compatible class of blocks if the blocks in δ are pairwise compatible. A compatible class is maximal if it is not contained in any other compatible class.

From the computational point of view, finding maximal compatible classes is equivalent to finding maximal cliques in a graph Γ = (N, E), where the set N of nodes is the set of blocks of $\beta_V$ and the set E of edges is formed by the set of compatible pairs.

The next step in the calculation of $\beta_G$ is the selection of a set of maximal classes, with minimal cardinality, that covers all the blocks of $\beta_V$. The minimal cardinality ensures that the number of


blocks of $\beta_G$, and hence the number of outputs of the function G, is as small as possible.

In certain heuristic strategies, both procedures (finding maximal compatible classes and then finding the minimal cover) can be reduced to the graph coloring problem. Calculating $\beta_G$ corresponds to finding the minimal number k of colors for graph Γ = (N, E).

Example 2

For the function from Table 1 specified by a set F of cubes numbered 1 through 10, consider a serial decomposition with $U = \{x_2, x_4, x_5\}$ and $V = \{x_1, x_3, x_6\}$. We find:

$\beta_U = \beta_{x_2 x_4 x_5} = \beta_{x_2} \bullet \beta_{x_4} \bullet \beta_{x_5} = \{5, 8, 10;\ 5, 6, 7, 8;\ 2, 3, 10;\ 1, 2, 6, 7;\ 9, 10;\ 1, 4, 6\}$,
$\beta_V = \beta_{x_1 x_3 x_6} = \beta_{x_1} \bullet \beta_{x_3} \bullet \beta_{x_6} = \{1, 2, 3, 4, 8, 9;\ 3, 4, 9;\ 8;\ 5, 6;\ 3, 9;\ 7, 10;\ 6, 7, 10\}$,
$\beta_F = \beta_{y_1} = \{1, 2, 3, 4, 5, 6, 7;\ 8, 9, 10\}$.

For the blocks of $\beta_V$ labeled

$B_1 = \{1, 2, 3, 4, 8, 9\}$, $B_2 = \{3, 4, 9\}$, $B_3 = \{8\}$, $B_4 = \{5, 6\}$, $B_5 = \{3, 9\}$, $B_6 = \{7, 10\}$, $B_7 = \{6, 7, 10\}$,

the following are the unmergeable pairs: $(B_1, B_4)$, $(B_1, B_6)$, $(B_1, B_7)$, $(B_2, B_6)$, $(B_2, B_7)$, $(B_3, B_4)$, $(B_3, B_6)$, $(B_3, B_7)$, $(B_4, B_6)$, $(B_4, B_7)$, $(B_5, B_6)$, and $(B_5, B_7)$. Using the graph coloring procedure, we find that three colors are needed here (Figure 3).

Figure 3. Incompatibility graph of $\beta_V$'s blocks

Nodes $B_1$, $B_3$ are assigned one color, nodes $B_2$, $B_4$, $B_5$ are assigned a second color, and a third color is assigned to nodes $B_6$, $B_7$. The sets of nodes assigned to different colors form the blocks of $\beta_G$:

$\beta_G = \{1, 2, 3, 4, 8, 9;\ 3, 4, 5, 6, 9;\ 6, 7, 10\}$.

It is easily verified that $\beta_G$ satisfies the condition of Theorem 1. Thus, function F has a serial decomposition with respect to (U, V).

Since $\beta_G$ has 3 blocks, two encoding bits $g_1$ and $g_2$ have to be used to encode the blocks of this blanket. Let us assume that we use the encoding:

block {1, 2, 3, 4, 8, 9} is encoded 00,
block {3, 4, 5, 6, 9} is encoded 01,
block {6, 7, 10} is encoded 10.

To define a function G by a set of cubes, we calculate all the cubes $r(B_i)$ assigned to each block $B_i$ of $\beta_V$. The relationship between blocks of $\beta_V$ and their cube representatives $r(B_i)$ relies on containment of block $B_i$ in blocks of $\beta_{x_j}$ for $x_j \in V$.

Denoting blocks of $\beta_V$ from Example 2 as $B_1$ through $B_7$, we have $r(B_1) = 000$. This is because $B_1 = \{1, 2, 3, 4, 8, 9\}$ is included in the first blocks of $\beta_{x_1}$, $\beta_{x_3}$, and $\beta_{x_6}$. For $B_2 = \{3, 4, 9\}$, we have: $B_2$ is included in the first block of $\beta_{x_1}$, in the first block of $\beta_{x_3}$, and in both blocks of $\beta_{x_6}$. Hence, $r(B_2) = 00–$. Similarly, $r(B_3) = 0–0$, $r(B_4) = 011$, $r(B_5) = –0–$, $r(B_6) = 11–$, $r(B_7) = 111$.

Finally, the value of function G is obtained on the basis of containment of blocks $B_i$ in blocks of $\beta_G$. Block $B_1 = \{1, 2, 3, 4, 8, 9\}$ of blanket $\beta_V$ is contained in the block of $\beta_G$ that has been encoded with 00. Since $r(B_1) = 000$, we have $G(r(B_1)) = G(x_1 = 0, x_3 = 0, x_6 = 0) = 00$. Similarly, $G(r(B_3)) = 00$, $G(r(B_4)) = 01$, $G(r(B_6)) = 10$, and $G(r(B_7)) = 10$. However, block $B_2 = \{3, 4, 9\}$ is contained in two blocks of $\beta_G$ (one


encoded "00" and the second "01"). The representative "00–" of this block has a nonempty product with representative "000" of $B_1$ and representative "0–0" of $B_3$, which were assigned output "00." To avoid conflicts, we must subtract cubes "000" and "0–0" from cube "00–." The result is cube "001," which may be assigned output "01." The same applies to block $B_5$. The representative "–0–" of this block has a nonempty product with the representatives of $B_1$ and $B_3$, which were assigned output "00." We must subtract cubes "000" and "0–0" from cube "–0–," and the result in the form of cube "10–" may be assigned output "01."

The truth table of function G is presented in Table 2a. To compute the cubes for function H, we consider each block of the product $\beta_U \bullet \beta_G$. Their representatives are calculated in the same fashion. Finally, the outputs of H are calculated with respect to $\beta_F$ (Table 2b).

Table 2a. Function G of the serial decomposition

      x1 x3 x6 | g1 g2
  1    0  0  0 |  0  0
  2    0  0  1 |  0  1
  3    0  –  0 |  0  0
  4    0  1  1 |  0  1
  5    1  0  – |  0  1
  6    1  1  – |  1  0
  7    1  1  1 |  1  0

Table 2b. Function H of the serial decomposition

      x2 x4 x5 g1 g2 | y1
  1    1  1  –  0  0 |  0
  2    1  –  0  0  0 |  0
  3    1  0  0  0  0 |  0
  4    1  0  0  0  1 |  0
  5    1  1  1  0  0 |  0
  6    1  1  1  0  1 |  0
  7    0  1  1  0  1 |  0
  8    –  1  –  0  1 |  0
  9    –  1  –  1  0 |  0
 10    –  1  0  1  0 |  0
 11    0  –  –  0  0 |  1
 12    1  0  1  0  0 |  1
 13    1  0  1  0  1 |  1
 14    –  0  –  1  0 |  1

The process of functional decomposition consists of the following steps:

• Selection of an appropriate input support V for block G (input variable partitioning)
• Calculation of the blankets $\beta_U$, $\beta_V$, and $\beta_F$
• Construction of an appropriate multiblock blanket $\beta_G$ (corresponds to the construction of the multivalued function of block G)
• Creation of the binary functions H and G by representing the multiblock blanket $\beta_G$ as the product of a number of certain two-block blankets (equivalent to encoding the multivalued function of block G defined by blanket $\beta_G$ with a number of binary output variables)

In a multilevel decomposition, this process is applied to functions H and G repetitively, until each block in the network obtained in this way can be mapped directly to a logic block of a specific implementation structure (Łuba & Selvaraj, 1995).

The selection of an appropriate input variable partitioning is the main problem in functional decomposition (Rawski, Jóźwiak & Łuba, 1999a; Rawski, Selvaraj & Morawiecki, 2004). The choice of sets U and V from set X determines the construction of an appropriate blanket $\beta_G$, which satisfies Theorem 1. The existence of such a blanket $\beta_G$ implies the existence of a serial decomposition. Blankets $\beta_V$, $\beta_G$, $\beta_U \bullet \beta_G$, and $\beta_F$ constitute the basis for the construction of subfunctions H and G in serial decomposition. In other words, knowing $\beta_V$, $\beta_U$, and $\beta_F$, and having $\beta_G$, one can construct particular subfunctions G and H.
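The blanket calculations of Examples 1 and 2 can be reproduced with a short script. The Python sketch below is illustrative only and is not the authors' tool: the function names (`blanket`, `blanket_product`, `leq`, `has_serial_decomposition`, `representative`) are hypothetical, don't-cares are written with an ASCII "-", and duplicate blocks are dropped while blocks contained in other blocks are kept, which does not affect the ≤ tests of Theorem 1.

```python
from itertools import product as cartesian

# Table 1: cube number -> (input cube over x1..x6, value of y1); "-" is a don't-care.
CUBES = {
    1: ("0101-0", "0"), 2: ("010-00", "0"), 3: ("-1000-", "0"),
    4: ("01011-", "0"), 5: ("001--1", "0"), 6: ("--11-1", "0"),
    7: ("1-110-", "0"), 8: ("00---0", "1"), 9: ("-1001-", "1"),
    10: ("1-10--", "1"),
}
INPUTS = {n: cube for n, (cube, _) in CUBES.items()}
OUTPUTS = {n: y for n, (_, y) in CUBES.items()}


def blanket(cubes, positions):
    """Blocks induced on the cube numbers by the variables at 0-based positions."""
    blocks = []
    for values in cartesian("01", repeat=len(positions)):
        block = frozenset(
            n for n, cube in cubes.items()
            if all(cube[p] in (v, "-") for p, v in zip(positions, values))
        )
        if block and block not in blocks:  # skip empty and duplicate blocks
            blocks.append(block)
    return blocks


def blanket_product(b1, b2):
    """Distinct nonempty pairwise intersections of the blocks of b1 and b2."""
    blocks = []
    for x in b1:
        for y in b2:
            if x & y and (x & y) not in blocks:
                blocks.append(x & y)
    return blocks


def leq(b1, b2):
    """b1 <= b2: every block of b1 lies inside some block of b2."""
    return all(any(x <= y for y in b2) for x in b1)


def has_serial_decomposition(beta_u, beta_v, beta_f, beta_g):
    """Check both conditions of Theorem 1 for a candidate blanket beta_g."""
    return leq(beta_v, beta_g) and leq(blanket_product(beta_u, beta_g), beta_f)


def representative(block, positions):
    """Cube r(B) of a block; assumes the block comes from blanket(INPUTS, positions)."""
    chars = []
    for p in positions:
        in0 = all(INPUTS[n][p] in "0-" for n in block)
        in1 = all(INPUTS[n][p] in "1-" for n in block)
        chars.append("-" if in0 and in1 else "0" if in0 else "1")
    return "".join(chars)


U, V = (1, 3, 4), (0, 2, 5)  # U = {x2, x4, x5}, V = {x1, x3, x6}
beta_u = blanket(INPUTS, U)
beta_v = blanket(INPUTS, V)
beta_f = blanket(OUTPUTS, (0,))
beta_g = [frozenset({1, 2, 3, 4, 8, 9}), frozenset({3, 4, 5, 6, 9}),
          frozenset({6, 7, 10})]

print(has_serial_decomposition(beta_u, beta_v, beta_f, beta_g))  # True
print([representative(b, V) for b in beta_v])
print((len(beta_g) - 1).bit_length())  # 2 binary outputs of G for 3 blocks
```

The second `print` lists the representatives r(B1) through r(B7) in the order used in Example 2. Merging all cubes into a single block of the candidate blanket makes the check fail, which illustrates why blocks of the bound-set blanket cannot be merged arbitrarily.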


The input variables of block G and their corresponding blankets, and the output blanket $\beta_G$ of block G, define together the multivalued function of block G. The structure of $\beta_G$ obviously influences the shape of the subfunctions G and H (Figure 2). Blanket $\beta_G$ determines the output values of function G. Each value of this multivalued function corresponds to a certain block of the blanket $\beta_G$. Considering the number of values of the multivalued function of a certain subsystem in decomposition is therefore equivalent to considering the number of blocks in blanket $\beta_G$ of this subsystem. A minimum of $\lceil \log_2 q \rceil$ binary variables is required for encoding q values. Thus, if q denotes the number of blocks in $\beta_G$, then the minimum required number of binary outputs from G is equal to $k = \lceil \log_2 q \rceil$.

Since function H is constructed by substituting in the truth table of function F the patterns of values of the primary input variables from set V (bound variables) with the corresponding values of function G, it is obvious that the choice of $\beta_G$ influences the subfunction H. The outputs of G constitute a part of the input support for block H. Thus, the size of block G and the size of block H both grow with the number of blocks in blanket $\beta_G$. The minimum possible number of blocks in $\beta_G$ strongly depends on the input support chosen for block G, because $\beta_G$ is computed by merging some blocks of $\beta_V$, this being the blanket induced by the chosen support.

Function H is decomposed in the successive steps of the multilevel synthesis process. This is why blanket $\beta_G$ has a direct influence on the next steps of the process. The structure of the blanket $\beta_G$ determines the difficulty of the successive decomposition steps and influences the final result of the synthesis process (characterized by the number of logic blocks and number of logic levels). The number of blocks in blanket $\beta_G$ is the most decisive parameter. The strong correlation of the number of blanket $\beta_G$'s blocks with the decomposition's quality has been shown in Rawski, Jóźwiak, and Łuba (1999b), and this number can be used as a criterion for testing individual solutions.

In multilevel logic synthesis methods, the serial decomposition process is applied recursively to functions H and G obtained in the previous synthesis steps until each block of the resulting net can be mapped directly to a single logic block of a specific implementation structure (Łuba, 1995; Łuba & Selvaraj, 1995). In the case of look-up table FPGAs, the multilevel decomposition process ends when each block of the resulting net can be mapped directly into a configurable logic block (CLB) of a specific size (typically the CLB size is from 4 to 6 inputs and 1 or 2 outputs). Although algorithms of multilevel logic synthesis can also use parallel decomposition in order to assist the serial decomposition (Łuba et al., 1995), the final results of the synthesis process strongly depend on the quality of the serial decomposition.

Parallel Decomposition

Consider a multiple-output function F. Assume that F has to be decomposed into two components, G and H, with disjoint sets $Y_G$ and $Y_H$ of output variables. This problem occurs, for example, when we want to implement a large function using components with a limited number of outputs. Note that such a parallel decomposition can also alleviate the problem of an excessive number of inputs of F. This is because for typical functions, most outputs do not depend on all input variables. Therefore, the set $X_G$ of input variables on which the outputs of $Y_G$ depend may be smaller than X. Similarly, the set $X_H$ of input variables on which the outputs of $Y_H$ depend may be smaller than X. As a result, components G and H have not only fewer outputs but also fewer inputs than F. The exact formulation of the parallel decomposition problem depends on the constraints imposed by the implementation style. One possibility is to find sets $Y_G$ and $Y_H$ such that the combined cardinality of $X_G$ and $X_H$ is minimal. Partitioning the set of outputs into only two disjoint subsets is not an important limitation of the method, because the procedure can be applied again to components G and H.
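The objective mentioned above can be written compactly. The formulation below is one possible reading rather than a formula taken from the chapter; the symbol $Y$ (the set of all outputs of F) is introduced here only for illustration, and any limit on the number of outputs per component would enter as an additional constraint.

```latex
\begin{aligned}
\min_{Y_G,\,Y_H}\quad & \operatorname{card} X_G + \operatorname{card} X_H \\
\text{subject to}\quad & Y_G \cup Y_H = Y, \qquad Y_G \cap Y_H = \emptyset, \qquad
                         Y_G \neq \emptyset, \qquad Y_H \neq \emptyset,
\end{aligned}
```

where $X_G \subseteq X$ and $X_H \subseteq X$ denote smallest sets of input variables on which all outputs in $Y_G$ and $Y_H$, respectively, depend.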


Example 3

Consider the multiple-output function given in Table 3. The minimal sets of input variables on which each output of F depends are:

y1: {x1, x2, x6}
y2: {x3, x4}
y3: {x1, x2, x4, x5, x9}, {x1, x2, x4, x6, x9}
y4: {x1, x2, x3, x4, x7}
y5: {x1, x2, x4}
y6: {x1, x2, x6, x9}

An optimal two-block decomposition, minimizing card XG + card XH (where card X is the cardinality of X), is YG = {y2, y4, y5} and YH = {y1, y3, y6}, with XG = {x1, x2, x3, x4, x7} and XH = {x1, x2, x4, x6, x9}. The truth tables for components G and H are shown in Table 4.

The algorithm itself is general in the sense that the function to be parallel decomposed can be specified in compact cube notation. Calculation of the minimal sets of input variables for each individual output can be a complex task. Thus, in practical implementation, heuristic algorithms are used, which support calculations with the help of so-called indiscernible variables.
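The search for the two-block split in Example 3 can be illustrated with a small exhaustive sketch. It is not the heuristic algorithm mentioned above; it simply enumerates the splits using the minimal supports listed in the example. The names `SUPPORTS`, `smallest_union`, and `best_balanced_split` are hypothetical, and the restriction to three outputs per component is an assumption made here: without some such constraint, a lopsided split such as YG = {y2} scores lower on this objective (2 + 7 = 9).

```python
from itertools import combinations, product

# Minimal supports from Example 3, as variable indices; y3 has two alternatives.
SUPPORTS = {
    "y1": [{1, 2, 6}],
    "y2": [{3, 4}],
    "y3": [{1, 2, 4, 5, 9}, {1, 2, 4, 6, 9}],
    "y4": [{1, 2, 3, 4, 7}],
    "y5": [{1, 2, 4}],
    "y6": [{1, 2, 6, 9}],
}


def smallest_union(outputs):
    """Smallest union of supports, over all alternatives, for the given outputs."""
    unions = (set().union(*choice)
              for choice in product(*(SUPPORTS[y] for y in outputs)))
    return min(unions, key=len)


def best_balanced_split():
    """Exhaustive search over equal-size splits (assumed constraint)."""
    names = sorted(SUPPORTS)
    best = None
    # y1 is pinned to YH so that mirror-image splits are not visited twice.
    for yg in combinations(names[1:], len(names) // 2):
        yh = tuple(y for y in names if y not in yg)
        xg, xh = smallest_union(yg), smallest_union(yh)
        if best is None or len(xg) + len(xh) < len(best[2]) + len(best[3]):
            best = (yg, yh, xg, xh)
    return best


yg, yh, xg, xh = best_balanced_split()
print(yg, sorted(xg))
print(yh, sorted(xh))
```

Under the stated assumption, the search returns the split given in the example, with card XG + card XH = 10.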

Table 3. Function F

      x1 x2 x3 x4 x5 x6 x7 x8 x9    y1 y2 y3 y4 y5 y6
  1    0  0  0  1  1  1  0  0  0     0  0  0  0  –  0
  2    1  0  1  0  0  0  0  0  0     0  0  –  1  0  1
  3    1  0  1  1  1  0  0  0  0     0  1  1  0  1  1
  4    1  1  1  1  0  1  0  0  0     0  1  1  1  1  0
  5    1  0  1  0  1  0  0  0  0     0  0  0  –  0  1
  6    0  0  1  1  1  0  0  0  0     1  1  0  1  0  0
  7    1  1  1  0  0  0  0  0  0     1  0  –  0  1  0
  8    1  0  1  1  0  1  0  0  0     1  1  0  0  –  1
  9    1  0  1  1  0  1  1  0  0     –  1  0  1  –  1
 10    1  1  1  0  0  0  0  1  0     1  0  1  0  1  –
 11    0  0  0  1  1  1  0  0  1     0  0  1  0  –  1
 12    0  0  0  1  1  0  0  0  1     –  –  1  0  0  0
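The notion of an output depending on a set of input variables can be checked directly against Table 3. The sketch below treats a variable set as a valid support for an output when no two rows agree on those variables while carrying different specified values of that output (don't cares are ignored). The names `TABLE3` and `is_support` are hypothetical, and this is only a consistency check, not a procedure for finding minimal supports.

```python
# Table 3 rows: input bits x1..x9, then outputs y1..y6 ('-' marks a don't care).
TABLE3 = [
    ("000111000", "0000-0"),
    ("101000000", "00-101"),
    ("101110000", "011011"),
    ("111101000", "011110"),
    ("101010000", "000-01"),
    ("001110000", "110100"),
    ("111000000", "10-010"),
    ("101101000", "1100-1"),
    ("101101100", "-101-1"),
    ("111000010", "10101-"),
    ("000111001", "0010-1"),
    ("000110001", "--1000"),
]


def is_support(variables, output):
    """True if rows that agree on `variables` never disagree on output y<output>."""
    seen = {}
    for xs, ys in TABLE3:
        value = ys[output - 1]
        if value == "-":
            continue  # a don't care is compatible with any value
        key = tuple(xs[v - 1] for v in variables)
        if seen.setdefault(key, value) != value:
            return False
    return True


print(is_support([3, 4], 2))  # {x3, x4} for y2
print(is_support([3], 2))     # x3 alone
```

For y2, the pair {x3, x4} passes the check while neither variable alone does, which agrees with the minimal set listed in Example 3.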

Table 4a. Function G of parallel decomposition

      x1 x2 x3 x4 x7    y2 y4 y5
  1    0  0  0  1  0     0  0  0
  2    1  0  1  0  0     0  1  0
  3    1  0  1  1  0     1  0  1
  4    1  1  1  1  0     1  1  1
  5    0  0  1  1  0     1  1  0
  6    1  1  1  0  0     0  0  1
  7    1  0  1  1  1     –  1  –

Table 4b. Function H of parallel decomposition

      x1 x2 x4 x6 x9    y1 y3 y6
  1    0  0  1  1  0     0  0  0
  2    1  0  0  0  0     0  0  1
  3    1  0  1  0  0     0  1  1
  4    1  1  1  1  0     0  1  0
  5    0  0  1  0  0     1  0  0
  6    1  1  0  0  0     1  1  0
  7    1  0  1  1  0     1  0  1
  8    0  0  1  1  1     0  1  1
  9    0  0  1  0  1     –  1  0


Balanced Functional Decomposition

The balanced decomposition is an iterative process in which, at each step, either parallel or serial decomposition of a selected component is performed. The process is carried out until all resulting subfunctions are small enough to fit blocks with a given number of input variables.

Example 4

The influence of the parallel decomposition on the final result of the FPGA-based mapping process will be explained with the function F given in Table 5, for which cells with four inputs and one output are assumed (this is the size of Altera's FLEX FPGAs).

Table 5.

.type fr
.i 10
.o 2
.p 25
0101000000 00
1110100100 00
0010110000 10
0101001000 10
1110101101 01
0100010101 01
1100010001 00
0011101110 01
0001001110 01
0110000110 01
1110110010 10
0111100000 00
0100011011 00
0010111010 01
0110001110 00
0110110111 11
0001001011 11
1110001110 10
0011001011 10
0010011010 01
1010110010 00
0100110101 11
0001111010 00
1101100100 10
1001110111 11
.e

As F is a 10-input, two-output function, in the first step of the decomposition, particularly in automated mode, serial decomposition is performed. The algorithm extracts function g with inputs numbered 1, 3, 4, and 6; thus, the next step deals with seven-input function H, for which again serial decomposition is assumed, now resulting in block G, with four inputs and two outputs (implemented by two cells). It is worth noting that the obtained block G takes as its inputs variables denoted 0, 2, 5, and 7, which, fortunately, belong to primary variables, and therefore the number of levels is not increased in this step, as shown in Figure 4a. In the next step, we apply parallel decomposition. Parallel decomposition generates two components, both with one output but four and five inputs, respectively. The first one forms a cell (Figure 4b). The second component is subject to two-stage serial decomposition as shown in Figure 4c.
The obtained network can be built of seven cells with four inputs and one output each, and the number of levels in the critical path is three.

The same function decomposed with a strategy in which parallel decomposition is performed first (Figure 5) leads to a completely different structure. Parallel decomposition applied directly to function F generates two components, both with six inputs and one output. Each of them is subject to two-stage serial decomposition. For the first component, a disjoint serial decomposition with four inputs and one output can be applied (Figure 5a). The second component can be decomposed serially as well; however, the number of outputs of the extracted block G then equals two. Therefore, to minimize the total number of components, a nondisjoint decomposition strategy can be applied. The truth tables of the decomposed functions g1, h1, g2, and h2 are shown in Table 6. The columns in the tables denote variables in the order shown in Figure 5; for example, the leftmost column in Table 6b denotes the variable numbered 4, the second the variable numbered 6, and the third denotes variable g1. Such a considerable impact on the structure results from the fact that the parallel decomposition simultaneously reduces the number of inputs to both resulting components, leading to an additional improvement in the final representation.

Figure 4. Decomposition of function F obtained with a strategy in which serial decomposition is performed first

Figure 5. Decomposition of function F with a strategy in which parallel decomposition is performed first

The idea of intertwining parallel and serial decomposition has been implemented in a program called DEMAIN. DEMAIN has two modes: automatic and interactive. It can also be used for the reduction of the number of inputs of a function when an output depends on only a subset of the inputs. From this point of view, DEMAIN is a tool specially dedicated to FPGA-oriented technology mapping.

Given a function F with n inputs and m outputs, and a logic cell (LC) with Cin inputs and Cout outputs, a decomposition process is carried out by the following steps:

1. If n ≤ m, use parallel decomposition. Continue iteratively for each of the obtained components.
2. If n > m, try disjoint serial decomposition with the number of block G inputs equal to the number Cin of LC inputs, and the number of block G outputs equal to the number Cout of LC outputs. If such a serial decomposition is found, find the corresponding H. Continue iteratively with F = H. Otherwise, try to find G with fewer inputs than Cin and/or fewer outputs than Cout. If such a G is found, find H. Continue iteratively with F = H. In case a G that fits in one cell cannot be found, try a larger G. This step is repeated until a decomposition with a function G larger than the cell is found. Find H and continue with F = H. Function G will have to be decomposed later.

The decomposition is carried out until all resulting subfunctions are small enough to fit into logic cells available in the assumed implementation technology.
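The control flow of the steps above can be sketched as a worklist loop. Everything in this sketch is an assumption made for illustration: functions are modeled only by their sizes (`Fn`), the decomposition primitives are passed in as callables, the order in which block sizes for G are tried is one possible reading of step 2, and the toy primitives always succeed, which real serial and parallel decompositions do not. The resulting sizes therefore do not reproduce Example 4, and this is not DEMAIN's implementation.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Fn:
    """Size-only stand-in for a logic function: n inputs, m outputs."""
    n: int
    m: int


def serial_step(f, c_in, c_out, serial):
    """Step 2: try a block G that fills a cell, then smaller ones, then larger ones."""
    full = [(c_in, c_out)]
    smaller = [(i, o) for i in range(c_in, 1, -1) for o in range(c_out, 0, -1)
               if (i, o) != (c_in, c_out) and o < i]
    larger = [(i, o) for i in range(c_in, f.n) for o in range(1, i)
              if i > c_in or o > c_out]
    for g_in, g_out in full + smaller + larger:
        found = serial(f, g_in, g_out)  # expected to return (G, H) or None
        if found is not None:
            return found
    # The steps in the text do not say what happens here; raising is a choice.
    raise ValueError("no serial decomposition found")


def balanced_decompose(f, c_in, c_out, serial, parallel):
    """Return blocks with at most c_in inputs; each needs ceil(m / c_out) cells."""
    fitted, work = [], [f]
    while work:
        cur = work.pop()
        if cur.n <= c_in:
            fitted.append(cur)                                  # small enough
        elif cur.n <= cur.m:
            work.extend(parallel(cur))                          # step 1
        else:
            work.extend(serial_step(cur, c_in, c_out, serial))  # step 2: G and H
    return fitted


# Toy primitives (assumptions): they only track sizes and always succeed.
def toy_serial(f, g_in, g_out):
    if not (g_out < g_in < f.n):
        return None
    return Fn(g_in, g_out), Fn(f.n - g_in + g_out, f.m)


def toy_parallel(f):
    half = f.m // 2
    return [Fn(f.n, half), Fn(f.n, f.m - half)]


blocks = balanced_decompose(Fn(10, 2), 4, 1, toy_serial, toy_parallel)
cells = sum(ceil(b.m / 1) for b in blocks)
print(blocks, cells)
```

A larger block G returned by `serial_step` is pushed back onto the worklist, which mirrors the remark that such a G has to be decomposed later.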


Table 6.

a) function g1        b) function h1
   0110 1                -01 0
   1101 1                011 1
   1000 1                111 0
   0010 1                100 1
   0000 0                0-0 0
   0101 0                110 0
   1100 0
   0100 0
   0011 0
   1011 0
   1111 0

c) function g2        d) function h2
   0110 1                10-1 0
   0011 1                -101 1
   0100 1                -111 1
   1000 1                0011 0
   0101 1                0001 1
   1100 0                1-00 0
   0010 0                0000 0
   1010 0                1110 1
   1110 0                1010 0
   0001 0                0100 1
   0111 0                0010 1
   1111 0

DIGITAL FILTERS

Digital filters are typically used to modify the attributes of a signal in the time or frequency domain through a process called linear convolution (Meyer-Baese, 2004). This process is formally described by the following formula:

$$y[n] = x[n] * f[n] = \sum_{k} x[k] \cdot f[n-k] = \sum_{k} x[k] \cdot c[k] \qquad (1)$$

where the values c[i] ≠ 0 are called the filter's coefficients.

There are only a few applications (e.g., adaptive filters) where a general programmable filter architecture is required. In many cases, the coefficients do not change over time; such filters are called linear time-invariant (LTI) filters. Digital filters are generally classified as being finite impulse response (FIR) or infinite impulse response (IIR). As the names imply, an FIR filter consists of a finite number of sample values, reducing the previously presented convolution to a finite sum per output sample. An IIR filter requires that an infinite sum be performed. In this chapter, implementation of the LTI FIR filters will be discussed.

The output of an FIR filter of order (length) L, for input time samples x[n], is given by a finite version of the convolution sum:

$$y[n] = \sum_{k=0}^{L-1} x[k] \cdot c[k] \qquad (2)$$

The L-th order LTI FIR filter is schematically presented in Figure 6. It consists of a delay line, adders, and multipliers.

Much available digital filter software enables very easy computation of coefficients for a given filter. However, the challenge is mapping the FIR structure into a suitable architecture. Digital filters are typically implemented as multiply-accumulate (MAC) algorithms with the use of special DSP devices.

Efficient hardware implementation of a filter's structure in programmable devices is possible by optimizing the implementation of multipliers and adders. In modern programmable structures, specialized embedded blocks can be used to implement multipliers, increasing the performance of the designed system. Moreover, in the case of Altera's devices, a whole MAC unit can be implemented in an embedded DSP block, making the design methodology very similar to the one used in the case of DSP processors.

In the case of programmable devices, however, direct or transposed forms are preferred for maximum speed and lowest resource utilization.
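As a software illustration of the finite sum in equation (2), the sketch below models a direct-form FIR filter with an explicit delay line. It reads x[k] in equation (2) as the content of the k-th tap of the delay line at the current time step, which is an interpretation made here; the function name `fir_direct` and the zero initial state are assumptions.

```python
def fir_direct(samples, coeffs):
    """Direct-form FIR: each output is the sum of tap values times coefficients."""
    delay = [0] * len(coeffs)          # delay line, assumed to start at zero
    out = []
    for x in samples:
        delay = [x] + delay[:-1]       # shift the new sample into the delay line
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


# Feeding a unit impulse returns the coefficients, i.e. the impulse response.
print(fir_direct([1, 0, 0, 0], [3, -1, 2]))
```

Each output sample costs one multiplication per coefficient, which is the work a serial MAC unit spreads over several clock cycles and a parallel structure performs at once.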


Figure 6. Direct form FIR filter

These forms are preferred because they enable exploitation of the parallelism prevalent in the algorithm.

A completely different FIR architecture is based on the distributed arithmetic concept. In contrast to a conventional sum-of-products architecture, in the distributed arithmetic method, the sum of products of a specific bit of the input sample over all coefficients is computed in one step.

DISTRIBUTED ARITHMETIC METHOD

The distributed arithmetic (DA) method is a method of computing the sum of products. In many DSP applications, a general-purpose multiplication is not required. In the case of filter implementation, if filter coefficients are constant in time, then the partial product term x[n] ∙ c[n] becomes a multiplication with a constant. Then, taking into account the fact that the input variable is a binary number:

$$x[n] = \sum_{b=0}^{B-1} x_b[n] \cdot 2^b, \quad \text{where } x_b[n] \in \{0, 1\} \qquad (3)$$

The whole convolution sum can be described as shown next:

$$y[n] = \sum_{b=0}^{B-1} 2^b \cdot \sum_{k=0}^{L-1} x_b[k] \cdot c[k] = \sum_{b=0}^{B-1} 2^b \cdot \sum_{k=0}^{L-1} f(x_b[k], c[k]) \qquad (4)$$

The efficiency of filter implementation based on this concept strongly depends on the implementation of the function f(xb[k], c[k]). The preferred implementation method is to realize the mapping f(xb[k], c[k]) as a combinational module with L inputs. The schematic representation of the signed DA filter structure is shown in Figure 7, where the mapping f is presented as a look-up table that includes all the possible linear combinations of the filter coefficients and the bits of the incoming data samples (Meyer-Baese, 2004). Utility programs that generate the look-up tables for filters with given coefficients can be found in the literature.

Figure 7. DA architecture with look-up table (LUT)

In the experiments presented in this chapter, a variation of the DA architecture has been used. It increases the speed of a filter at the expense of additional LUTs, registers, and adders. The basic DA architecture for computing the length L sum of products accepts one bit from each of the L input words. The computation speed can be significantly increased by accepting more bits per word for the computation. Maximum speed can be achieved with a fully pipelined parallel architecture, as shown in Figure 8. Such an implementation can outperform all commercially available programmable signal processors.

Figure 8. Parallel implementation of a distributed arithmetic scheme

The HDL specification of the look-up table can be easily obtained for the filter described by its c[i] coefficients. Since the size of look-up tables grows exponentially with the number of inputs, efficient implementation of these blocks becomes crucial to the final resource utilization of the filter implementation. Here, advanced synthesis methods based on balanced decomposition can be successfully applied for technology mapping of DA circuits onto FPGA logic cells.

RESULTS

Experimental results for FIR filter implementation with different design methodologies are presented in this section. A filter with length (order) 15 has been chosen for the experiment. It has eight-bit signed input samples, and its coefficients can be found in Goodman and Carey (1977). For comparison, the filter has been implemented in Stratix EP1S10F484C5, Cyclone EP1C3T100C6, and CycloneII EP2C5T144C6 structures using Altera QuartusII v5.1 SP0.15.

Table 7 presents the comparison of implementation results for different design methodologies. The column falling under the "MAC" label presents the results obtained by implementing the multiply-and-accumulate strategy with the use of logic cell resources, without utilization of the embedded DSP blocks. Multipliers as well as accumulators were implemented in a circuit of logic cells. This implementation, due to its serial character, requires 15 clock cycles to compute the result. It requires a relatively large amount of resources, while delivering the worst performance in comparison to other implementations.

The next column, "MULT block," holds the implementation results of a method similar to "MAC," with the difference that multipliers were implemented in dedicated DSP-embedded blocks. It can be noticed that the performance of the filter increased at the cost of additional resources in the form of DSP-embedded blocks. Results in the column falling under "DSP block" were obtained by implementing the whole MAC unit in the embedded DSP block. A further increase in performance could be noticed, but still 15 clock cycles have to be used to compute the result.

Results given in the "Parallel" column were obtained by implementing the filter in a parallel manner. In this case, the results were obtained in a single clock cycle. Even though the maximum frequency of this implementation is lower than that of the previous ones, it outperforms these implementations due to its parallel character.

Application of the DA technique results in increased performance since the maximum frequency has increased. However, in this approach, more logic cell resources have been used since multipliers have been replaced by large combinational blocks and no DSP-embedded modules have been utilized.

Finally, results presented in the column "DA decomposed" demonstrate that the application of the DA technique combined with an advanced synthesis method based on balanced decomposition results in a circuit that not only outperforms any other implemented circuit but also reduces the necessary logic resources. The balanced decomposition technique was applied to decompose the combinational blocks of the DA implementation.

Table 7. Implementation results for different design methodologies. Chip: S – Stratix EP1S10F484C5; C – Cyclone EP1C3T100C6; CII – CycloneII EP2C5T144C6. 1) DSP blocks are not present in this device family.

Chip                MAC     MULT Block   DSP Block   Parallel   DA      DA Decomposed
S     LC            448     294          225         402        997     567
      DSP           0       2            4           30         0       0
      fmax [MHz]    74.6    85.06        107.30      53.3       65.14   78.75
C     LC            429     436          429         961        997     567
      DSP 1)        –       –            –           –          –       –
      fmax [MHz]    77.85   79.31        77.85       57.71      72.56   70.87
CII   LC            427     294          275         670        952     567
      DSP           0       2            2           26         0       0
      fmax [MHz]    82.62   97.5         105.59      67.02      78.21   81.46

In Table 8, the experimental results of Daubechies' dbN, coifN, symN, and 9/7-tap biorthogonal filter banks are presented. Filters 9/7 are in two versions: (a) analysis filter and (s) synthesis filter. Filters dbN, coifN, and symN are similar for analysis and synthesis (a/s). All filters have 16-bit signed samples and have been implemented with the use of the distributed arithmetic concept in a fully parallel way. Balanced decomposition software was also added to increase the efficiency of the DA tables' implementations.

Table 8 presents the results for filter implementations using the Stratix EP1S10F484C5 device, with a total count of 10,570 logic cells. In the implementation without decomposing the filters, the new method was modeled in AHDL, and QuartusII v6.0 SP1 was used to map the model into the target structure. In the implementation using decomposition, the automatic software was used to initially decompose DA tables, and then the Quartus system was applied to map the filters into the FPGA.

The application of the balanced decomposition concept significantly decreased the logic cell resource utilization and, for some filters (e.g., the 9/7 analysis low-pass filter), also increased the speed of the implementation.
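The look-up-table mapping behind the DA and DA decomposed implementations, that is, equation (4), can be checked numerically with a short sketch. The function names `build_lut` and `da_sum_of_products` are hypothetical. Equation (4) is written for unsigned bit weights; because the experiments use signed samples, the sketch gives the most significant bit plane a negative weight, which is the usual two's-complement convention and is an addition made here, not something taken from the equation as printed.

```python
def build_lut(coeffs):
    """Entry `addr` holds the sum of c[k] over all k whose bit is set in `addr`."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_sum_of_products(words, coeffs, bits):
    """Sum of words[k] * coeffs[k], computed one bit plane at a time.

    `words` are signed integers that fit in `bits`-bit two's complement.
    """
    lut = build_lut(coeffs)            # 2**L entries: grows exponentially with L
    acc = 0
    for b in range(bits):
        addr = 0
        for k, w in enumerate(words):  # bit b of every word forms the LUT address
            addr |= ((w >> b) & 1) << k
        weight = -(1 << b) if b == bits - 1 else (1 << b)  # sign bit is negative
        acc += weight * lut[addr]
    return acc


words, coeffs = [-3, 5, 127, -128], [2, -7, 1, 4]
print(da_sum_of_products(words, coeffs, bits=8))
print(sum(w * c for w, c in zip(words, coeffs)))
```

Both printed values agree, and no multiplier is used inside the loop: the table of size 2^L replaces the L multiplications, which is why the size of these combinational blocks dominates resource usage and why decomposing them pays off.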


Table 8. Implementation results of filters with and without decomposition

                               Without Decomposition     With Decomposition
Filter                 Order   LC       fmax [MHz]       LC       fmax [MHz]
db3, a/s low-pass      6       1596     278.63           1345     254.26
db4, a/s low-pass      8       3747     212.9            2891     201.73
db5, a/s low-pass      10      10057    169.81           7377     119.39
db6, a/s low-pass      12      –**      –                31153    –*
9/7, a low-pass        9       3406     206.61           1505     212.86
9/7, s low-pass        7       1483     273.37           881      263.5
9/7, a high-pass       7       2027     253.29           1229     223.16
9/7, s high-pass       9       4071     180.93           1616     189.47
coif6, a/s low-pass    6       1133     283.45           1041     260.62
coif12, a/s low-pass   12      –**      –                1614     196.85
sym8, a/s low-pass     8       3663     212.72           2249     197.94
sym12, a/s low-pass    12      –**      –                2313     198.61
sym14, a/s low-pass    14      –**      –                2345     200.24
sym16, a/s low-pass    16      –**      –                2377     206.83

* does not fit in EP1S10F484C5
** too long compilation time (more than 24 hours)

CONCLUSION

Modern programmable structures make it possible to implement DSP algorithms in dedicated embedded blocks. This makes designing such an algorithm an easy task. However, the flexibility of programmable structures enables more advanced implementation methods to be used. In particular, exploitation of parallelism in the algorithm to be implemented may yield very good results. Additionally, the application of advanced logic synthesis methods based on balanced decomposition, which is suitable for FPGA structures, leads to results that cannot be achieved with any other method.

The presented results lead to the conclusion that if the designer decides to use the methodology known from DSP processor application, the implementation quality will profit from the utilization of specialized DSP modules embedded in the programmable chip. However, the best results can be obtained by utilizing the parallelism in implemented algorithms and by applying advanced synthesis methods based on decomposition. The influence of the design methodology and the balanced decomposition synthesis method on the efficiency of practical digital filter implementation is particularly significant when the designed circuit contains complex combinational blocks. This is a typical situation when implementing digital filters using the DA concept.

The most efficient approach to logic synthesis of FIR filter algorithms discussed in this chapter relies on the effectiveness of the functional decomposition synthesis method. These methods were already used in decomposition algorithms; however, they were never applied together in a technology-specific mapper targeted at a look-up table FPGA structure. This chapter shows that it is possible to apply the balanced decomposition method for the synthesis of FPGA-based circuits directed toward area or delay optimization.


ACKNOWLEDGMENT

This work is supported by Ministry of Science and Higher Education financial grant for years 2006-2009 (Grant No. SINGAPUR/31/2006) as well as Agency for Science, Technology and Research in Singapore (Grant No. 0621200011).

REFERENCES

Brislawn, C.M., Bradley, C.B.J., Onyshczak, R., & Hopper, T. (1996). The FBI compression standard for digitized fingerprint images. Proceedings of the SPIE Conference 2847, Denver, Colorado, 344–355.

Brzozowski, J.A., & Łuba, T. (2003). Decomposition of Boolean functions specified by cubes. Journal of Multiple-Valued Logic and Soft Computing, 9, 377–417.

Chang, S.C., Marek-Sadowska, M., & Hwang, T.T. (1996). Technology mapping for TLU FPGAs based on decomposition of binary decision diagrams. IEEE Trans on CAD, 15(10), 1226–1236.

Croisier, A., Esteban, D., Levilion, M., & Rizo, V. (1973). Digital filter for PCM encoded signals. US Patent No. 3777130.

Daubechies, I. (1992). Ten lectures on wavelets. SIAM.

Falkowski, B.J. (2004). Compact representation of logic functions for lossless compression of grey scale images. IEEE Proceedings, and Digital Techniques, 151(3), 221–230.

Falkowski, B.J., & Chang, C.H. (1997). Forward and inverse transformations between Haar spectra and ordered binary decision diagrams of Boolean functions. IEEE Trans on Computers, 46(11), 1271–1279.

Goodman, D.J., & Carey, M.J. (1977). Nine digital filters for decimation and interpolation. IEEE Trans on Acoustics, Speech and Signal Processing, 25(2), 121–126.

Lapsley, P., Bier, J., Shoham, A., & Lee, E. (1997). DSP processor fundamentals. New York: IEEE Press.

Lee, E. (1988). Programmable DSP architectures: Part I. IEEE Transactions on Acoustics, Speech and Signal Processing Magazine, 4–19.

Lee, E. (1989). Programmable DSP architectures: Part II. IEEE Transactions on Acoustics, Speech and Signal Processing Magazine, 4–14.

Łuba, T. (1995). Decomposition of multiple-valued functions. Proceedings of the 25th International Symposium on Multiple-Valued Logic, Bloomington, Indiana, (pp. 256–261).

Łuba, T., & Selvaraj, H. (1995). A general approach to Boolean function decomposition and its applications in FPGA-based synthesis. VLSI Design, Special Issue on Decompositions in VLSI Design, 3(3-4), 289–300.

Łuba, T., Selvaraj, H., Nowicka, M., & Kraśniewski, A. (1995). Balanced multilevel decomposition and its applications in FPGA-based synthesis. In G. Saucier, & A. Mignotte (Eds.), Logic and architecture synthesis. Chapman & Hall.

Meyer-Baese, U. (2004). Digital signal processing with field programmable gate arrays. Second edition. Berlin: Springer Verlag.

Nowicka, M., Łuba, T., & Rawski, M. (1999). FPGA-based decomposition of Boolean functions: Algorithms and implementation. Proceedings of the Sixth International Conference on Advanced Computer Systems, Szczecin, Poland, 502–509.

Peled, A., & Liu, B. (1974). A new realization of digital filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 22(6), 456–462.

Rao, R.M., & Bopardikar, A.S. (1998). Wavelet transform: Introduction to theory and applications. Addison-Wesley.


Rawski, M., Jóźwiak, L., & Łuba, T. (1999a). Efficient input support selection for sub-functions in functional decomposition based on information relationship measures. Proceedings of the EUROMICRO'99 Conference, Milan, Italy.

Rawski, M., Jóźwiak, L., & Łuba, T. (1999b). The influence of the number of values in sub-functions on the effectiveness and efficiency of the functional decomposition. Proceedings of the EUROMICRO'99 Conference, Milan, Italy.

Rawski, M., Jóźwiak, L., & Łuba, T. (2001). Functional decomposition with an efficient input support selection for sub-functions based on information relationship measures. Journal of Systems Architecture, 47, 137–155.

Rawski, M., Selvaraj, H., & Morawiecki, P. (2004). Efficient method of input variable partitioning in functional decomposition based on evolutionary algorithms. Proceedings of the DSD 2004 Euromicro Symposium on Digital System Design, Architectures, Methods and Tools, Rennes, France, (pp. 136–143).

Rawski, M., Tomaszewicz, P., & Łuba, T. (2004). Logic synthesis importance in FPGA-based designing of information and signal processing systems. Proceedings of the International Conference on Signal and Electronics Systems, Poznań, Poland, (pp. 425–428).

Rawski, M., Tomaszewicz, P., Selvaraj, H., & Łuba, T. (2005). Efficient implementation of digital filters with use of advanced synthesis methods targeted FPGA architectures. Proceedings of the Eighth Euromicro Conference on DIGITAL SYSTEM DESIGN, Portugal, (pp. 460–466).

Rioul, O., & Vetterli, M. (1991). Wavelets and signal processing. IEEE Signal Processing Magazine, 14–38.

Sasao, T., Iguchi, Y., & Suzuki, T. (2005). On LUT cascade realizations of FIR filters. Proceedings of the Eighth Euromicro Conference on DIGITAL SYSTEM DESIGN, Portugal, (pp. 467–474).

Scholl, C. (2001). Functional decomposition with application to FPGA synthesis. Kluwer Academic Publishers.