ISSN 2319-8885 Vol.05,Issue.28 September-2016,

Pages:5991-6002

www.ijsetr.com

An Efficient Power Consumption of VITERBI Decoder for TCM System
D. KEERTHY1, R. ASHOK KUMAR2
1PG Scholar, Dept of ECE (VLSI), Ananthalakshmi Institute of Technology and Sciences, Anantapur, AP, India, E-mail: [email protected].
2Assistant Professor, Dept of ECE, Ananthalakshmi Institute of Technology and Sciences, Anantapur, AP, India.

Abstract: This paper presents a high-speed, low-power design of Viterbi decoders for trellis coded modulation (TCM) systems. It is well known that the Viterbi decoder (VD) is the dominant module determining the overall power consumption of TCM decoders. We propose a pre-computation architecture incorporated with the T-algorithm for the Viterbi decoder, which can effectively reduce the power consumption without degrading the decoding speed much. A general solution to derive the optimal pre-computation steps is also given. Implementation results of a VD for a rate-3/4 convolutional code used in a TCM system show that, compared with the full-trellis VD, the pre-computation architecture reduces the power consumption by as much as 70% without performance loss, while the degradation in clock speed is negligible.

Strong noise and interference in the channel frequently corrupt transmitted signals, so this work uses the Viterbi decoding technique to correct the corrupted signal. The main objective of the project is to correct signals corrupted by strong noise and interference in the communication channel. The approach applies to any digital communication channel: the transmitted data is presented in binary form, modulated to analog waveforms, and transmitted through a channel to a receiver. In the channel, noise and interference corrupt the transmitted signal, which is mapped back to binary bits in the receiver. Bit errors may occur if the interference is too strong, so channel coding is often used to prevent these errors. There are many methods for channel coding, such as linear block codes and convolutional codes; block codes are better suited for error detection, while convolutional codes are mainly used for error correction. In this project, the signal produced by the encoder and corrupted by strong interference in the channel is corrected at the decoder, which is described in Verilog, in order to recover the actual signal. The encoder is a conventional one and generates the input signals to the decoder. The decoder is designed using the Viterbi algorithm to recover the actual transmitted signals at its output.

Keywords: TCM, Viterbi Decoding Technique.

I. INTRODUCTION
VLSI stands for "Very Large Scale Integration". It is the process of designing, verifying, fabricating and testing a VLSI IC. A VLSI chip is an IC that has more than 40,000 transistors. The active devices used for fabricating an IC are CMOS FETs. Producing a VLSI chip is an extremely complex task with a number of design and verification steps. A design team comprising hundreds of engineers, scientists and technicians has to work on a modern VLSI project. It is important that each member of the team has a clear understanding of his or her contribution to the design. This is accomplished by means of the design hierarchy. Any complex digital system may be broken down into gates and memory elements by successively subdividing the system in a hierarchical manner. Highly automated and sophisticated tools are commercially available to achieve this decomposition. Each design domain may be specified at various levels of abstraction, such as circuit, logic and architectural.

Modern Integrated Circuit (IC) design is split up into Front-end design using HDLs, Verification, and Back-end Design or Physical Design. Front-end includes design specification, architectural description, logic design, verification, and synthesis. Back-end includes floor planning, placement, clock tree synthesis, routing, physical verification, and GDS II generation.

In integrated circuit design, physical design is the step in the standard design cycle that follows circuit design. At this step, circuit representations of the components (devices and interconnects) of the design are converted into geometric representations of shapes which, when manufactured in the corresponding layers of materials, will ensure the required functioning of the components. This geometric representation is called the integrated circuit layout. This step is usually split into several sub-steps, which include both design and verification and validation of the layout.

The next step after Physical Design is the manufacturing or fabrication process, which is done in wafer fabrication houses. Fab-houses fabricate designs onto silicon dies, which are then packaged into ICs. Each of the phases mentioned above has Design Flows associated with it. These Design Flows lay down the process and guidelines/framework for that phase. The Physical Design flow uses the technology libraries provided by the fabrication houses. These technology files provide information regarding the type of silicon wafer used, the standard cells used, the layout rules (like DRC in VLSI), etc.

Copyright @ 2016 IJSETR. All rights reserved.

Fig.1. Convolutional Encoder.

The T-algorithm has been shown to be very efficient in reducing the power consumption. However, searching for the optimal path metric (PM) in the feedback loop still reduces the decoding speed. To overcome this drawback, two variations of the T-algorithm have been proposed: the relaxed adaptive VD, which suggests using an estimated optimal PM instead of finding the real one each cycle, and the limited-search parallel state VD based on scarce state transition (SST). In our preliminary work, we have shown that when applied to high-rate convolutional codes, the relaxed adaptive VD suffers a severe degradation of bit-error-rate performance due to the inherent drifting error between the estimated optimal PM and the accurate one. In this work, we further analyze the pre-computation algorithm. A systematic way to determine the optimal pre-computation steps is presented, where the minimum number of steps for the critical path to achieve the theoretical iteration bound is calculated and the computational complexity overhead due to pre-computation is evaluated. Then, we discuss a complete low-power high-speed VD design for the rate-3/4 convolutional code. Finally, ASIC implementation results of the VD are reported, which have not been obtained in our previous work.

II. CONVOLUTIONAL ENCODER
A. Convolutional Coding
Convolutional coding has been used in communication systems including deep space communications and wireless communications. Convolutional codes offer an alternative to block codes for transmission over a noisy channel. Convolutional coding can be applied to a continuous input stream (which cannot be done with block codes), as well as to blocks of data. A convolutional encoder is a Mealy machine, where the output is a function of the current state and the current input. It consists of one or more shift registers and multiple XOR gates. The XOR gates are connected to some stages of the shift registers as well as to the current input to generate the output. The encoder in Fig.1 produces two bits of encoded information for each bit of input information, so it is called a rate-1/2 encoder. A convolutional encoder is generally characterized in (n, k, m) format with a rate of k/n, where
• n is the number of outputs of the encoder
• k is the number of inputs of the encoder
• m is the number of flip-flops of the longest shift register of the encoder

The stream of information bits flows into the shift register from one end and is shifted out at the other end. The location of the stages as well as the number of memory elements determines the minimum Hamming distance. The minimum Hamming distance determines the maximal number of correctable bits. Interconnection functions for different rates and different numbers of memory elements, together with their minimum Hamming distances, are available.

Fig.2. State Diagram for the Convolutional Encoder.

The operation of a convolutional encoder can be easily understood with the aid of a state diagram. The state diagram is a graph of the possible states of the encoder and the transitions from one state to another. Fig.2 represents the state diagram of the encoder shown in Fig.1 and depicts the state transitions and the corresponding encoded outputs. As there are two memory elements in the circuit, there are four possible states that the circuit can assume. These four states are represented as S0 through S3. Each state's information (i.e. the contents of the flip-flops for the state) along with an input generates an encoded output code. For each state, there are two outgoing transitions: one corresponding to a "0" input bit and the other corresponding to a "1" input bit.

B. 64-Bit Convolutional Encoder Calculations
We have concluded that two-step pre-computation is the optimal choice for the rate-3/4 code VD. For convenience of discussion, we define the left-most register in Fig.3 as the most-significant-bit (MSB) and the right-most register as the least-significant-bit (LSB).
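The rate-1/2 encoder and its four-state diagram from Section II.A can be illustrated with a short software model. This is only a sketch: the tap positions of Fig.1 are not recoverable from the text, so the generator pair `G = (0b111, 0b101)` is an assumption chosen for illustration, and the function names `encode` and `state_table` are hypothetical.

```python
# Assumed generator taps (octal 7, 5); the actual taps of Fig.1 are not given in the text.
G = (0b111, 0b101)


def encode(bits, generators=G, memory=2):
    """Rate-1/len(generators) convolutional encoder, starting in state S0."""
    state = 0
    out = []
    for b in bits:
        reg = (b << memory) | state  # current input followed by the stored bits
        for g in generators:
            # XOR of the tapped shift-register stages
            out.append(bin(reg & g).count("1") % 2)
        state = reg >> 1  # shift: the oldest stored bit drops out
    return out


def state_table(generators=G, memory=2):
    """Map (state, input bit) -> (next state, output bits), i.e. the state diagram."""
    table = {}
    for state in range(1 << memory):
        for b in (0, 1):
            reg = (b << memory) | state
            outs = tuple(bin(reg & g).count("1") % 2 for g in generators)
            table[(state, b)] = (reg >> 1, outs)
    return table


if __name__ == "__main__":
    print(encode([1, 0, 1, 1]))
    for key, value in sorted(state_table().items()):
        print(key, "->", value)
```

With two memory elements the table has eight entries: four states, each with one transition for input "0" and one for input "1", matching the description of Fig.2.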

International Journal of Scientific Engineering and Technology Research Volume.05, IssueNo.28, September-2016, Pages: 5991-6002

Fig.3. Pre-computation Algorithm.

TABLE I: States of Viterbi Decoder

The 64 states and PMs are labeled from 0 to 63. The two-step pre-computation is expressed as

III. VITERBI ALGORITHM
In telecommunication, trellis modulation (also known as trellis coded modulation, or simply TCM) is a modulation scheme which allows highly efficient transmission of information over band-limited channels such as telephone lines. Trellis modulation was invented by Gottfried Ungerboeck working for IBM in the 1970s, and first described in a conference paper in 1976; but it went largely unnoticed until he published a new detailed exposition in 1982, which achieved sudden widespread recognition. In the late 1980s, modems operating over plain old telephone service (POTS) typically achieved 9.6 kbit/s by employing 4 bits per symbol QAM modulation at 2,400 baud (symbols/second). This bit rate ceiling existed despite the best efforts of many researchers, and some engineers predicted that without a major upgrade of the public phone infrastructure, the maximum achievable rate for a POTS modem might be 14 kbit/s for two-way communication (3,429 baud × 4 bits/symbol, using QAM). 14 kbit/s is only 40% of the theoretical maximum bit rate predicted by Shannon's Theorem for POTS lines (approximately 35 kbit/s). Ungerboeck's theories demonstrated that there was considerable untapped potential in the system, and by applying the concept to new modem standards, speed rapidly increased to 14.4, 28.8 and ultimately 33.6 kbit/s.

The name trellis was coined because a state diagram of the technique, when drawn on paper, closely resembles the trellis lattice used in rose gardens. The scheme is basically a convolutional code of rates (r, r+1). Ungerboeck's unique contribution is to apply the parity check on a per-symbol basis instead of the older technique of applying it to the bit stream and then modulating the bits. The key idea he termed Mapping by Set Partitions. This idea was to group the symbols in a tree-like fashion and then separate them into two limbs of equal size. At each limb of the tree, the symbols were further apart. Although hard to visualize in multiple dimensions, a simple one-dimensional example illustrates the basic procedure. Suppose the symbols are located at [1, 2, 3, 4, ...]. Then take all odd symbols and place them in one group, and the even symbols in the second group. This is not quite accurate because Ungerboeck was looking at the two-dimensional problem, but the principle is the same: take every other one for each group and repeat the procedure for each tree limb. He next described a method of assigning the encoded bit stream onto the symbols in a very systematic procedure. Once this procedure was fully described, his next step was to program the algorithms into a computer and let the computer search for the best codes. The results were astonishing. Even the simplest code (4 states) produced error rates nearly one one-thousandth of an equivalent un-coded system. For two years Ungerboeck kept these results private and only conveyed them to close colleagues.

Finally, in 1982, Ungerboeck published a paper describing the principles of trellis modulation. A flurry of research activity ensued, and by 1990 the International Telecommunication Union had published modem standards for the first trellis-modulated modem at 14.4 kilobits/s (2,400 baud and 6 bits per symbol). Over the next several years, further advances in encoding, plus a corresponding symbol rate increase from 2,400 to 3,429 baud, allowed modems to achieve rates up to 34.3 kilobits/s (limited by maximum power regulations to 33.8 kilobits/s). Today, the most common trellis-modulated V.34 modems use a 4-dimensional set partition, which is achieved by treating two 2-dimensional symbols as a single lattice. This set uses 8, 16, or 32 state convolutional codes to squeeze the equivalent of 6 to 10 bits into each symbol sent by the modem (for example, 2,400 baud × 8 bits/symbol = 19,200 bit/s). Once manufacturers introduced modems with trellis modulation, transmission rates increased to the point where interactive transfer of multimedia over the telephone became feasible (a 200 kilobyte image and a 5 megabyte song could be downloaded in less than 1 minute and 30 minutes, respectively). Sharing a floppy disk via a BBS could be done in just a few minutes, instead of an hour. Thus Ungerboeck's invention played a key role in the Information Age.

B. Viterbi Decoder
A Viterbi decoder uses the Viterbi algorithm for decoding a bit stream that has been encoded using a convolutional code. There are other algorithms for decoding a convolutionally encoded stream (for example, the Fano algorithm). The Viterbi algorithm is the most resource-consuming of these, but it performs maximum-likelihood decoding. It is most often used for decoding convolutional codes with constraint lengths K ≤ 10, but values up to K = 15 are used in practice. Viterbi decoding was developed by Andrew J. Viterbi and published in the paper "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm", IEEE Transactions on Information Theory, Volume IT-13, pages 260-269, in April 1967. There are both hardware (in modems) and software implementations of a Viterbi decoder. This work implements the decoder in hardware, as shown in Fig.4.

Fig.4. Viterbi Decoder.

A hardware Viterbi decoder for a basic (not punctured) code usually consists of the following major blocks:
• Branch metric unit (BMU)
• Path metric unit (PMU)
• Trace back unit (TBU)

Branch Metric Unit (BMU):

Fig.5. Branch Metric Unit.

A sample implementation of a branch metric unit is shown in Fig.5. A branch metric unit's function is to calculate branch metrics, which are normed distances between every possible symbol in the code alphabet and the received symbol. There are hard-decision and soft-decision Viterbi decoders. A hard-decision Viterbi decoder receives a simple bit stream on its input, and a Hamming distance is used as a metric. A soft-decision Viterbi decoder receives a bit stream containing information about the reliability of each received symbol. For instance, in a 3-bit encoding, this reliability information is encoded as shown in Table II.

TABLE II: Branch Metric Unit Calculations

Fig.6. Add Compare Select Unit.

Path Metric Unit (PMU): A sample implementation of a path metric unit for a specific K=4 decoder is shown in Fig.6. A path metric unit summarizes branch metrics to get metrics for 2^(K-1) paths, where K is the constraint length of the code, one of which can eventually be chosen as optimal. Every clock cycle it makes 2^(K-1) decisions, discarding paths that are known to be non-optimal. The results of these decisions are written to the memory of a trace back unit. The core elements of a PMU are ACS (Add-Compare-Select) units. The way in which they are connected to each other is defined by the specific code's trellis diagram. Since branch metrics are always ≥ 0, the path metrics never decrease, so there must be an additional circuit preventing the metric counters from overflowing (it is not shown in the figure). An alternative method that eliminates the need to monitor the path metric growth is to allow the path metrics to "roll over". To use this method, it is necessary to make sure the path metric accumulators contain enough bits to prevent the "best" and "worst" values from coming within 2^(n-1) of each other. The compare circuit is essentially unchanged.

Add Compare Select Unit (ACS):

Fig.7. Internal Diagram of Add Compare Select Unit.

It is possible to monitor the noise level on the incoming bit stream by monitoring the rate of growth of the "best" path metric. A simpler way to do this is to monitor a single location or "state" and watch it pass "upward" through, say, four discrete levels within the range of the accumulator. As it passes upward through each of these thresholds, a counter is incremented that reflects the "noise" present on the incoming signal, as shown in Fig.7.

Trace back Unit (TBU): A sample implementation of a trace back unit is shown in Fig.8. The trace back unit restores an (almost) maximum-likelihood path from the decisions made by the PMU. Since it does so in the inverse direction, a Viterbi decoder comprises a FILO (first-in-last-out) buffer to reconstruct the correct order. Note that the implementation shown in the figure requires double frequency. There are some tricks that eliminate this requirement.

IV. VITERBI DECODER IMPLEMENTATION
This section discusses the different parts of the Viterbi decoding process. Analog signals are quantized and converted into digital signals in the quantization block. The synchronization block detects the frame boundaries of code words and the symbol boundaries. We assume that the Viterbi decoder receives successive code symbols in which the boundaries of the symbols and the frames have been identified. Trellis coded modulation (TCM) schemes are used in many bandwidth-efficient systems. Typically, a TCM system employs a high-rate convolutional code, which leads to a high complexity of the Viterbi decoder (VD) for the TCM decoder, even if the constraint length of the convolutional code is moderate. For example, the rate-3/4 convolutional code used in a 4-D TCM system for deep space communications has a constraint length of 7; however, the computational complexity of the corresponding VD is equivalent to that of a VD for a rate-1/2 convolutional code with a constraint length of 9, due to the large number of transitions in the trellis. Therefore, in terms of power consumption, the Viterbi decoder is the dominant module in a TCM decoder. In order to reduce the computational complexity as well as the power consumption, low-power schemes should be exploited for the VD in a TCM decoder.

General solutions for low-power VD design have been well studied in existing work. Power reduction in VDs can be achieved by reducing the number of states (for example, reduced-state sequence decoding (RSSD), the M-algorithm and the T-algorithm) or by over-scaling the supply voltage. Over-scaling of the supply voltage usually needs to take into consideration the whole system that includes the VD (whether the system allows such over-scaling or not), which is not the main focus of our research. RSSD is in general not as efficient as the M-algorithm, and the T-algorithm is more commonly used than the M-algorithm in practical applications, because the M-algorithm requires a sorting process in a feedback loop while the T-algorithm only searches for the optimal path metric (PM), that is, the minimum value or the maximum value of all PMs. The T-algorithm has been shown to be very efficient in reducing the power consumption. However, searching for the optimal PM in the feedback loop still reduces the decoding speed. To overcome this drawback, two variations of the T-algorithm have been proposed: the relaxed adaptive VD, which suggests using an estimated optimal PM instead of finding the real one each cycle, and the limited-search parallel state VD based on scarce state transition (SST).
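The state-purging idea behind the T-algorithm can be sketched in a few lines. This is a behavioural illustration under stated assumptions, not the paper's hardware: smaller path metrics are taken to be better, a state exactly at the threshold is kept, and the name `t_algorithm_purge` and the example metrics are hypothetical.

```python
def t_algorithm_purge(path_metrics, threshold):
    """Keep only the states whose PM lies within `threshold` of the optimal PM.

    path_metrics: dict mapping state index -> path metric (smaller is better,
    which is an assumption of this sketch).
    Returns (pm_opt, survivors), where survivors maps state -> PM.
    """
    pm_opt = min(path_metrics.values())  # the search that sits in the feedback loop
    limit = pm_opt + threshold           # PMopt + T
    survivors = {s: pm for s, pm in path_metrics.items() if pm <= limit}
    return pm_opt, survivors


if __name__ == "__main__":
    pms = {0: 5, 1: 9, 2: 6, 3: 12}
    pm_opt, alive = t_algorithm_purge(pms, threshold=2)
    print(pm_opt, alive)
```

The `min` over all path metrics is the step that limits clock speed in hardware, which is why the variants discussed above replace it with an estimate or a pre-computed value.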

Fig.8. Trace back unit.

In our preliminary work, we have shown that when applied to high-rate convolutional codes, the relaxed adaptive VD suffers a severe degradation of bit-error-rate (BER) performance due to the inherent drifting error between the estimated optimal PM and the accurate one. On the other hand, the SST based scheme requires pre-decoding and re-encoding processes and is not suitable for TCM decoders. In TCM, the encoded data are always associated with a complex multi-level modulation scheme like 8-ary phase-shift keying (8PSK) or 16/64-ary quadrature amplitude modulation (16/64 QAM) through a constellation point mapper. At the receiver, a soft-input VD should be employed to guarantee a good coding gain. Therefore, the computational overhead and decoding latency due to pre-decoding and re-encoding of the TCM signal become high. In our preliminary work, we proposed an add-compare-select unit (ACSU) architecture based on pre-computation for VDs incorporating the T-algorithm, which efficiently improves the clock speed of a VD with the T-algorithm for a rate-3/4 code.

B. Working of Viterbi Algorithm
The major tasks in the Viterbi decoding process are as follows:
• Branch metric computation
• State metric update
• Survivor path recording
• Output decision generation

C. Block Diagram of Decoder with Encoder
There are three major components in the Viterbi decoder: the branch metric unit (BMU), the add-compare-select unit (ACS), and the survivor memory unit (SMU) or trace back (TB) unit, as shown in Fig.9.

Fig.9. Block diagram of viterbi decoder.

D. Internal Architecture of Viterbi Decoder
The internal architecture of the conventional and the M-step look-ahead Viterbi decoder without the encoder block is shown in Fig.10. M parallel incoming branch metrics are used for the M-step look-ahead Viterbi decoder architecture. The circled symbols A, ACS, and ACSrx and the black box in the figure represent the adder, radix-2 ACS unit, and the pipelining latch, respectively. Since there are no parallel paths for each (K-1) look-ahead branch metric computation in a nest that combines K trellis steps, only additions are needed for them, as shown in the figure. Each adder in A, ACS, and ACSrx is a two-input adder. Each ACS unit in the conventional ACS pre-computation architecture and in the layer-1 processor (P1) of the proposed ACS pre-computation architecture consists of radix-2 compare-select (CS) units. Each ACS unit in the layer-2 and higher-layer processors of the proposed architecture consists of radix-2^(k-1) CS units. The main idea of the proposed method involves combining K trellis steps as a pipeline structure and then combining the resulting look-ahead branch metrics as a tree structure in a layered manner to decrease the ACS pre-computation latency. This leads to a regular and simple high-throughput Viterbi decoder architecture with a logarithmic increase in latency, as opposed to the linear increase with the look-ahead factor in the conventional architecture.

Fig.10. Internal architecture of viterbi decoder.

E. Branch Metric Unit
The branch metric unit calculates the branch metrics of the trellis structure from bit metrics, as shown in Fig.11. The branch metrics are difference values between the received code symbol and the corresponding branch words from the encoder trellis. The bit metrics can be calculated with a separate unit as shown in the figure, or a look-up table can be used. The inputs needed for this task are bit metrics, which in this case come from the convolutional encoder. The encoder branch words are the code symbols that would be expected to come from the encoder output as a result of the state transitions. In hard-decision decoding, the metric used is the Hamming distance. The Hamming distance d(X, Y) between two words X and Y is defined as the number of positions in which they differ. For soft-decision decoding, the Euclidean distance is used instead. When the input symbol is X and the encoder symbol is Y, the squared Euclidean distance is calculated from the formula (X-Y)^2.

Fig.11. Branch Metric computation Block.
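The two branch metrics named in Section E can be written out directly. The sketch below is illustrative only: the function names and the four example branch words are assumptions, and the soft metric sums (X-Y)^2 over the components of a symbol.

```python
def hamming_distance(x, y):
    """Number of positions in which two equal-length words differ (hard decision)."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))


def squared_euclidean(x, y):
    """Sum of (X - Y)^2 over the symbol components (soft decision)."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y))


def branch_metrics(received, branch_words, metric=hamming_distance):
    """Metric between the received symbol and every candidate branch word."""
    return {word: metric(received, word) for word in branch_words}


if __name__ == "__main__":
    words = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(branch_metrics((1, 0), words))
    print(squared_euclidean((0.9, -1.1), (1.0, -1.0)))
```

A look-up table, as mentioned in the text, would simply precompute `branch_metrics` for every possible received symbol.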
F. Path Metric Unit
The PMU calculates new path metric values and decision values, as shown in Fig.12. Because each state can be reached from two states of the earlier stage, there are two possible path metrics coming to the current state. The ACS unit, as shown in the figure, adds the branch metric of each of the two incoming branches to the path metric of the corresponding state, resulting in two new candidate path metrics. A new value of the state metrics has to be computed at each time instant. In other words, the state metrics have to be updated every clock cycle. Because of this recursion, pipelining, a common approach to increase the throughput of a system, is not applicable. The add-compare-select (ACS) unit is hence the module that consumes the most power and area. In order to obtain the required precision, a resolution of 7 bits for the state metrics is essential, while 5 bits are needed for the branch metrics. Since the state metrics are always positive numbers and only positive branch metrics are added to them, the accumulated metrics would grow indefinitely without normalization. In this project we have chosen to implement modulo normalization, which requires keeping an additional bit. The operation of the ACS unit is shown in the figure. The new branch metrics are added to the previous state metrics to form the candidates for the new state metrics. The comparison can be done by subtracting the two candidate state metrics; the MSB of the difference indicates which of the two is larger. The path with the better metric is chosen and stored as the new path metric for the current state, while generating a decision bit mathematically,

Fig.12. Path metric unit.

The decision bit indicates which branch was chosen. Because each state can be reached from two states of the earlier stage, the decision value is represented by one bit. If the bit is one, the selected path metric comes from the lower of the two possible states in the trellis diagram; if the decision bit is zero, the selected path metric comes from the upper state. As the ACS unit needs the results of the calculations of the previous steps, it forms a feedback loop with the external memory unit, where the results are stored.

G. Survival Path Unit
The survivor path unit stores the decisions of the ACS unit and uses them to compute the decoded output. The trace-back and register-exchange approaches are the two major techniques used for path history management. The former takes up less area but requires much more time than the latter, since it needs to trace the survivor path back sequentially. A relatively new approach, called the permutation network based path history (PNPH) unit, directly implements the trellis diagram of the given convolutional code. The resulting circuit has a smaller routing area than the register-exchange technique and a faster decoding speed than the trace-back method, regardless of the constraint length. In order to decode the input sequence, the survivor path, or shortest path through the trellis, must be traced. The minimum-metric path selected at the ACS output points from each state to its predecessor. In theory, decoding of the shortest path would require processing the entire input sequence. In practice, the survivor paths merge after some number of iterations, as shown by the bold lines in the 4-state example of the figure. From the point where they merge, the decoding is unique. The trellis depth at which all the survivor paths merge with high probability is referred to as the survivor path length. In the PNPH unit, the trace-back operation is carried out by an "all-path broadcast" from the rightmost to the leftmost end, rather than by exchanging path information via registers controlled by the decision bits as in the register-exchange method, or by reading previous decision bits recursively as in the trace-back method. Only the connected survivor paths set by the decision bits stored in the registers can reach their destinations. The trace-back delay is limited only by the propagation time of the combinational circuit.

H. Permutation Network Based Path History Unit
The PNPH unit for a convolutional code is a 5L-stage permutation network with each stage containing 1-to-2^k de-multiplexers, where each de-multiplexer corresponds to a node of the trellis diagram and is associated with a K-bit register and a 2^k-input OR gate. The K-bit register is used to store the decision bits associated with the state node and to determine the partial survivor path associated with the node. Thus, each register and de-multiplexer pair determines the part of the survivor path associated with its corresponding state node. The connection between two adjacent stages of the interconnection network is defined by the next-state function of the state diagram of the underlying encoder, as shown in Fig.13.

Fig.13. Permutation network based path history unit.

New decision-bit values for each state, calculated by the add-compare-select (ACS) unit, enter the rightmost end of the corresponding shift register. The shift registers then shift left one step, which corresponds to the decoding window moving right one position. To eliminate the need for a large number of comparators for deciding the minimum path metric, all inputs of all rightmost 1-to-2^k de-multiplexers are set to 1. In general, at least one of them will propagate to the output at the leftmost end, since all of them will merge into one according to the merging property of convolutional codes.

I. Proposed Design
In this work, we further analyze the pre-computation algorithm. A systematic way to determine the optimal pre-computation steps is presented, where the minimum number of steps for the critical path to achieve the theoretical iteration bound is calculated and the computational complexity overhead due to pre-computation is evaluated. Then, we discuss a complete low-power high-speed VD design for the rate-3/4 convolutional code [1]. Finally, ASIC implementation results of the VD are reported, which have not been obtained in our previous work. The functional block diagram of the VD with the two-step pre-computation T-algorithm is shown in Fig.14. The minimum value of each BM group (BMG) can be calculated in the BMU or TMU and then passed to the "Threshold Generator" unit (TGU) to calculate (PMopt+T). (PMopt+T) and the new PMs are then compared in the "Purge Unit".

The survivor memory unit (SMU) commonly follows one of two schemes: register-exchange (RE) and trace back (TB). In a regular VD without any low-power scheme, the SMU always outputs the decoded data from a fixed state (arbitrarily selected in advance) if the RE scheme is used, or traces back the survivor path from the fixed state if the TB scheme is used, for the sake of low complexity. For a VD incorporating the T-algorithm, no state is guaranteed to be active at all clock cycles. Thus it is impossible to appoint a fixed state for either outputting the decoded bit (RE scheme) or starting the trace-back process (TB scheme). In the conventional implementation of the T-algorithm, the decoder can use the optimal state (the state with PMopt), which is always enabled, to output or trace back data. The process of searching for PMopt yields the index of the optimal state as a by-product.

Fig.15. Architecture of TGU.

However, when the estimated PMopt is used, or in our case PMopt is calculated from PMs at the previous time slot, it is difficult to find the index of the optimal state. A practical method is to find the index of an enabled state through PMopt priority encoder. Suppose that we have labeled the states from 0 to 63. The output of the priority encoder would be the unpurged state with the lowest index. Assuming the purged states have the flag “0” and other states are assigned the flag “1”, the truth table of such a priority encoder is shown in Table I, where “flag” is the input and “index” is the output. Implementation of such a table is not trivial. In our design, we employ an efficient architecture for the 64-to-6 Fig.14.Viterbi Decoder with Two Step Pre-computation. priority encoder based on three 4-to-2 priority encoders. The 64 flags are first divided into 4 groups, each of which T-Algorithm: The architecture of the TGU is shown in contains 16 flags. The priority encoder at level 1 detects Fig.15, which implements the key functions of the two-step which group contains at least one “1” and determines. Then pre-computation scheme. In Fig.15, the “MIN 16” unit for MUX2 selects one group of flags based on The input of the finding the minimum value in each cluster is constructed with priority encoder at level 2 can be computed from the output two stages of four-input comparators. This architecture has of MUX2 by “OR” operations. We can also reuse the been optimized to meet the iteration bound. Compared with intermediate results by introducing another MUX (MUX1). the conventional T -algorithm, the computational overhead of The output of the priority encoder at level 2 is “index [3:2]”. this architecture is 12 addition operations and a comparison, Again, “index [3:2]” selects four flags (MUX3) as the input which is slightly more than the number obtained from the of the priority encoder at level 3. Finally, the last encoder evaluation. will determine “index [3:2]”. 
Implementing the 4-to-2 priority encoder is much simpler than implementing the 64- SMU Design: In this section, we address an important issue to-6 priority encoder. Its truth table is shown in Table II and regarding SMU design when T-algorithm is employed. There the corresponding logics. are two different types of SMU in the literature: register International Journal of Scientific Engineering and Technology Research Volume.05, IssueNo.28, September-2016, Pages: 5991-6002 An Efficient Power Consumption of VITERBI Decoder for TCM System TABLE III: Synthesis Results for Maximum Clock Speed A path metric memory is used to store the path metric values. This forms a feedback loop with path metric unit.
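The three-level 64-to-6 priority encoder from the SMU design can be modeled in software as a rough sketch. The function names `purge_flags`, `pe4` and `pe64` are hypothetical. The purge rule used here (keep a state when PM <= PMopt + T, with PMopt taken as the minimum of the supplied metrics) follows the conventional T-algorithm rather than the pre-computed estimate of the proposed design, and the reuse of intermediate results through MUX1 is not modeled.

```python
def purge_flags(path_metrics, threshold):
    """Flag each state: 1 = kept (unpurged), 0 = purged.

    Assumption: a state is kept when PM <= PMopt + T, where PMopt is the
    minimum of the supplied path metrics (conventional T-algorithm).
    """
    limit = min(path_metrics) + threshold
    return [1 if pm <= limit else 0 for pm in path_metrics]


def pe4(flags4):
    """4-to-2 priority encoder: index of the lowest-numbered flag set to 1.

    Returns 0 when no flag is set (a don't-care choice made for this sketch).
    """
    for i, flag in enumerate(flags4):
        if flag:
            return i
    return 0


def pe64(flags):
    """64-to-6 priority encoder built from three 4-to-2 stages."""
    if len(flags) != 64:
        raise ValueError("expected 64 flags")
    # Level 1: OR each group of 16 flags, then encode index[5:4].
    hi = pe4([int(any(flags[16 * g:16 * g + 16])) for g in range(4)])
    group = flags[16 * hi:16 * hi + 16]            # role of MUX2
    # Level 2: OR each sub-group of 4 flags, then encode index[3:2].
    mid = pe4([int(any(group[4 * s:4 * s + 4])) for s in range(4)])
    quad = group[4 * mid:4 * mid + 4]              # role of MUX3
    # Level 3: encode index[1:0] from the four selected flags.
    lo = pe4(quad)
    return (hi << 4) | (mid << 2) | lo


flags = [0] * 64
flags[37] = flags[50] = 1
print(pe64(flags))  # 37
```

Each stage only inspects four inputs, which mirrors why the hardware version is simpler than a flat 64-input truth table.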

Survival Path Unit: This module determines the output of the whole decoder. It compares the metrics of all the survivors and selects the survivor with the minimum metric; this survivor gives the output of the Viterbi decoder. The sp block module is the basic block.
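A small sketch of how a survivor could be selected and traced back from stored decision bits is given below. It assumes a four-state trellis whose state is numbered as (u[t-1] << 1) | u[t-2], and a decision bit that supplies the least significant bit of the predecessor state (0 for the upper state, 1 for the lower state). These conventions and the names `select_survivor` and `trace_back` are illustrative assumptions, not details taken from the paper.

```python
def select_survivor(path_metrics):
    """Return the index of the state with the minimum path metric."""
    return min(range(len(path_metrics)), key=path_metrics.__getitem__)


def trace_back(decisions, start_state):
    """Recover the decoded bits from per-stage decision bits.

    decisions[t][s] is the decision bit stored for state s at stage t.
    Assumed state numbering: state = (u[t-1] << 1) | u[t-2].
    """
    state = start_state
    bits = []
    for stage in reversed(decisions):
        bits.append(state >> 1)                     # input bit that led into this state
        state = ((state & 1) << 1) | stage[state]   # step back to the chosen predecessor
    bits.reverse()
    return bits


# Hand-built example: the path 0 -> 2 -> 1 -> 2 corresponds to inputs 1, 0, 1.
decisions = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
final_metrics = [5, 4, 0, 7]
best = select_survivor(final_metrics)
print(best, trace_back(decisions, best))  # 2 [1, 0, 1]
```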

V. SIMULATION IMPLEMENTATION AND PHYSICAL DESIGN
A. Simulation of Viterbi Algorithm
All the modules of the convolutional encoder and Viterbi decoder are implemented in Verilog HDL. The code comprises three levels of abstraction, namely behavioral, dataflow and structural. We give a brief description of all the modules we have used.

B. Modules Used In Implementation
Trellis Codec: The top module, consisting of the convolutional encoder and all the modules of the Viterbi decoder, with the message bits as input and the decoded sequence of data as output.

C. Algorithmic Flow of Implementation
The whole program for the convolutional encoder and Viterbi decoder can be summarized by the flowchart in Fig.16.

D. Simulation Results of Convolutional Encoder
The simulation results are taken from the XILINX tool. Fig.17 shows the simulation results of the convolutional encoder; the output follows the states of the convolutional encoder.

Convolutional Encoder: We have used a rate-1/2 convolutional encoder; it uses a state machine of four states and generates a 2-bit output for each 1-bit input.
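A rate-1/2, four-state encoder of this kind can be sketched in a few lines of Python. The generator polynomials (7, 5 in octal) are an assumption chosen because they are a common textbook pair for constraint length 3; the paper does not state which generators the Verilog design uses. The function names are hypothetical.

```python
def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits, g1=0b111, g2=0b101):
    """Rate-1/2, four-state convolutional encoder (constraint length 3).

    Assumption: generators g1 = 7 and g2 = 5 (octal); the encoder starts in state 0.
    The state holds the two previous input bits: (u[t-1] << 1) | u[t-2].
    Returns one (c1, c2) output pair per input bit.
    """
    state = 0
    out = []
    for u in bits:
        reg = (u << 2) | state                       # [u[t], u[t-1], u[t-2]]
        out.append((parity(reg & g1), parity(reg & g2)))
        state = reg >> 1                             # shift the register by one bit
    return out


print(conv_encode([1, 0, 1, 1]))  # [(1, 1), (1, 0), (0, 0), (0, 1)]
```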

Branch Metric Unit: This module computes the metric for each branch in the trellis. We have used a look-up table which gives the output branch metric values, based on the encoded sequence.
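One way such a look-up table might be organized is sketched below. It assumes hard-decision decoding with the Hamming distance as the branch metric and 2-bit symbols packed into integers 0 to 3; the paper does not specify the metric, so `BM_LUT` and `branch_metric` are illustrative names only.

```python
# BM_LUT[received][expected] = number of differing bits between two 2-bit symbols.
# Assumption: hard-decision decoding with the Hamming distance as the metric.
BM_LUT = [[bin(r ^ e).count("1") for e in range(4)] for r in range(4)]


def branch_metric(received, expected):
    """Look up the branch metric for a received and an expected 2-bit symbol."""
    return BM_LUT[received][expected]


print(branch_metric(0b11, 0b00))  # 2
print(branch_metric(0b10, 0b11))  # 1
```

Because the table has only 16 entries, it maps naturally onto a small combinational block in hardware.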

Fig.17. Simulation Results of Convolutional Encoder.

Simulation Results of Viterbi Decoder: The output waveform of the error-correcting Viterbi decoder for a given 15-bit data sequence is shown in Fig.18. These waveforms correspond to the data applied at the encoder input. Ideally, in any digital communication system the data at the decoder output is the same as the data applied at the encoder input. Accordingly, the output obtained from the Viterbi decoder is the same as the data applied at the encoder input. In this project, however, the output waveforms are generated with some delay, which is due to the convergence of the Viterbi algorithm. We can observe in the output waveforms of Fig.18 that the output starts to be generated only after the time at which the last input bit was corrupted.
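The behavior described above (a corrupted channel bit is corrected, and the output only becomes available after the whole block has been processed) can be reproduced with a compact software model. This is a sketch under stated assumptions, not the authors' Verilog implementation: the generators (7, 5 octal), the Hamming branch metric, the two flush zeros appended to the 15-bit message, and the trace back from the best final state are all choices made for illustration.

```python
G = (0b111, 0b101)  # assumed generator polynomials, not taken from the paper


def parity(x):
    return bin(x).count("1") & 1


def symbol(reg):
    """2-bit encoder output for the 3-bit register contents."""
    return (parity(reg & G[0]) << 1) | parity(reg & G[1])


def encode(bits):
    state, out = 0, []
    for u in bits:
        reg = (u << 2) | state
        out.append(symbol(reg))
        state = reg >> 1
    return out


def viterbi_decode(symbols):
    inf = float("inf")
    pm = [0, inf, inf, inf]               # the encoder is assumed to start in state 0
    decisions = []
    for r in symbols:
        new_pm, dec = [], []
        for s in range(4):
            u = s >> 1                     # input bit that leads into state s
            cands = []
            for d in (0, 1):               # d = 0: upper predecessor, d = 1: lower
                p = ((s & 1) << 1) | d
                bm = bin(r ^ symbol((u << 2) | p)).count("1")
                cands.append(pm[p] + bm)
            d = 1 if cands[1] < cands[0] else 0
            new_pm.append(cands[d])
            dec.append(d)
        pm = new_pm
        decisions.append(dec)
    state = pm.index(min(pm))              # survivor with the minimum metric
    bits = []
    for dec in reversed(decisions):        # trace back through the decisions
        bits.append(state >> 1)
        state = ((state & 1) << 1) | dec[state]
    return bits[::-1]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]   # 15-bit test data
sent = encode(message + [0, 0])                            # two flush zeros
received = list(sent)
received[6] ^= 0b10                                        # corrupt one channel bit
print(viterbi_decode(received)[:15] == message)            # True
```

In this block-oriented model the decoded bits appear only after the last symbol has been processed, which is a software analogue of the decoding delay visible in the waveforms.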

Fig.16. Algorithmic flow diagram.

Path Metric Unit: This module computes the new path metric value by summing the incoming path metric and branch metric values. It also compares the two metrics generated at each stage and selects the minimum of the two as the new path metric value for that particular state. Based on the path selected, it generates the selection bits, which the survivor path unit uses to trace back the original path.

Fig.18. Output wave forms of viterbi decoder.

E. Device Utilization Summary

TABLE VI: Timing Report

Design Report: The design report describes the operating conditions and wire load models, which are shown in Table IV.

TABLE IV: Design Reports

Power Report: The power report describes the power required by the design, calculated for both 90nm and 32nm and listed in Table VII.

TABLE VII: Power Reports

Area Report: The area report describes the total cell area for both 90nm and 32nm, which is shown in Table V.

TABLE V: Area Report

Timing Report: The timing report describes the time taken by the clock, i.e., it includes the delays, arrival time and required time, calculated for both 90nm and 32nm and shown in Table VI.

QOR Report: Quality of Results (QOR) is a term used in evaluating technological processes. It is generally represented as a vector of components, with the special case of a uni-dimensional value as a synthetic measure. Table VIII describes the QOR report of the design.

TABLE VIII: QOR Report

F. RTL Schematic
Fig.19 shows the RTL schematic of the Viterbi decoder, including the sub-modules of the design.

Fig.19. RTL Schematic of viterbi decoder.

The RTL schematic of the branch metric unit is shown in Fig.20.

Fig.20. RTL Schematic of Branch Metric Unit.

The RTL schematic of the encoder block is shown in Fig.21.

Fig.21. RTL Schematic of Encoder.

The RTL schematic of the path metric value generator block is shown in Fig.22.

Fig.22. RTL Schematic of Path Metric Unit.

The RTL schematic of the survival path unit is shown in Fig.23.

Fig.23. RTL Schematic of Survival Path Unit.

VI. CONCLUSION AND FUTURE SCOPE
A. Conclusion
We have proposed a high-speed low-power VD design for TCM systems. The pre-computation architecture that incorporates the T-algorithm efficiently reduces the power consumption of VDs without reducing the decoding speed appreciably. We have also analyzed the pre-computation algorithm, where the optimal pre-computation steps are calculated and discussed. Both the ACSU and SMU are modified to correctly decode the signal.

B. Future Scope
The Viterbi decoder is designed for 64 bits, so the design is complex. To overcome this problem in future, Reed-Solomon codes can be introduced; since these operate on bytes, this will reduce the design complexity.

VII. REFERENCES
[1] Chien-Ching Lin, Yen-Hsu Shih, Hsie-Chia Chang, and Chen-Yi Lee, 2005. Design of a Power-Reduction Viterbi Decoder for WLAN Applications, IEEE Transactions on Circuits and Systems-I: Regular Papers, 52(6), 321-328.
[2] Irfan Habib, Ozgun Paker, and Sergei Sawitzki, 2009. Design Space Exploration of Hard-Decision Viterbi Decoding:
[3] ann S. Yuan and Weidong Kuang, 2004. Teaching Asynchronous Design in Digital Integrated Circuits, IEEE Transactions on Education, 47(3), 397-404.
[4] Injin He, Zhongfeng Wang, Zhiqiang Cui, and Li Li, 2009. Towards an Optimal Trade-off of Viterbi Decoder Design, IEEE conference, 3030-3033.
[5] Joshi M.V., Gosavi S., Jegadeesan V., Basu A., Jaiswal S., Al-Assadi W.K. and Smith S.C., 2007. NCL Implementation of Dual-Rail 2s Complement 8×8 Booth2 Multiplier using Static and Semi-Static Primitives, IEEE Region 5 Technical Conference, April 20-21, Fayetteville, 59-64.
[6] Jun Jin Kong and Keshab K. Parhi, 2004. Low-Latency Architectures for High-Throughput Rate Viterbi Decoder, IEEE Transactions on VLSI Systems, 12(6), 642-651.
[7] Meilana Siswanto, Masuri Othman, and Edmond Zahedi, 2006. VLSI Implementation of 1/2 Viterbi Decoder for IEEE P802.15-3a UWB Communication, IEEE ICSE 2006 Proc., Kuala Lumpur, Malaysia, 666-670.
[8] Qing Li, Xuan-zhong Li, Han-hong Jiang and Wen-hao He, 2008. A High-Speed Viterbi Decoder, Fourth International Conference on Natural Computation, IEEE, pp. 313-316.
[9] Yao Gang, Ahmet T. Erdogan, and Tughrul Arslan, 2006. An Efficient Pre-Traceback Architecture for the Viterbi Decoder Targeting Wireless Communication Applications, IEEE Transactions on Circuits and Systems-I: Regular Papers, 53(9), 423-432.
[10] Yun-Ching Tang, Do-Chen Hu, Weiyi Wei, Wen-Chung Lin and Hongchin Lin, 2009. A Memory-Efficient Architecture for Low Latency Viterbi Decoders, IEEE, 335-338.
[11] Ajay Dholakia, 1994. Introduction to Convolutional Codes with Applications. Kluwer Academic Publishers.
[12] G. Forney, 1973. The Viterbi Algorithm, Proceedings of the IEEE, 61(3), 268-278.
[13] Dalia A. El-Dib and Elmasry M.I., 2004. Modified Register-Exchange Viterbi Decoder for Low-Power Wireless Communications, IEEE Transactions on Circuits and Systems I, 51(2), 371-378.
[14] Lang L., Tsui C.Y. and Cheng R.S., 1997. Low power soft output Viterbi decoder scheme for turbo code decoding, IEEE Conference-Paper, ISCAS '97, New York, USA, 24, 1369-1372.

International Journal of Scientific Engineering and Technology Research Volume.05, IssueNo.28, September-2016, Pages: 5991-6002