Razor Based Dynamic Voltage Scaling Multiplier with 32×32 Multi Precision 1 2 SYED SHANAVAJ , M

ISSN 2319-8885 Vol.04,Issue.38, September-2015,

Pages:8334-8337

www.ijsetr.com

Razor Based Dynamic Voltage Scaling Multiplier with 32×32 Multi Precision 1 2 SYED SHANAVAJ , M. CHANDRA SEKHAR 1PG Scholar, Dept of ECE(VLSI), SVCE, JNTUA, Tirupati, Chittor(Dt), AP, India, E-mail: [email protected]. 2Assistant Professor, Dept of ECE, SVCE, JNTUA, Tirupati, Chittor(Dt), AP, India, Email: [email protected].

Abstract: This paper present a multi-precision (MP) reconfigurable multiplier that incorporates flexible precision, parallel processing (PP), razor-based dynamic voltage scaling (DVS), and MP input operands scheduling to provide optimum performance for a variety of operating conditions. All these basic blocks of the proposed reconfigurable multiplier can either work as independent smaller-precision multipliers or work in parallel to perform higher-precision multiplications. Given the user’s requirements a dynamic voltage/ frequency scaling management unit configures the multiplier to operate at the proper precision and frequency. Dithering voltage unit has single-switch and razor flip-flops helps to reduce the voltage safety margins and overhead typically associated to DVS to the smallest level. The huge amount of power and silicon area overhead typically associated to reconfigurability features were removed. At Finally, the proposed new MP multiplier can further benefit from an operands scheduler that rearranges the input data, to determine or calculate the optimum voltage and frequency operating conditions for minimum power consumption. An AMIS 0.35-μm technology is used to fabricated this low-power mp multiplier. An Experimental results shows that the proposed MP design features a 28.2% and 15.8% reduction in circuit area and power consumption compared with conventional fixed-width multiplier. When combining this MP design with razor-based Dynamic Voltage Scaling, PP, and the proposed novel operands scheduler, then total power reduction 77.7%–86.3% is achieved with a total silicon area overhead as low as 11.1%.

Keywords: Computer Arithmetic, Dynamic Voltage Scaling, Low Power Design, Multi-Precision Multiplier.

I. INTRODUCTION application-specific integrated circuits (ASICs) are designed Consumers demand for increasingly portable yet high for a fixed maximum word-length so as to accommodate the performance multimedia and communication products worst case scenario. Therefore, an 8-bit multiplication imposes stringent constraints on the power consumption of computed on a 32-bit Booth multiplier would result in individual internal components. Of these, multipliers perform unnecessary switching activity and power loss. Several works one of the most frequently encountered Arithmetic For investigated this word-length optimization proposed an embedded applications, it has become essential to design ensemble of multipliers of different precisions, with each more power-aware multipliers. Given their fairly complex optimized to cater for a particular scenario. Each pair of structure and interconnections, multipliers can exhibit a large incoming operands is routed to the smallest multiplier that number of unbalanced paths, resulting in substantial glitch can compute the result to take advantage of the lower energy generation and propagation. This spurious switching activity consumption of the smaller circuit. This ensemble of point can be mitigated by balancing internal paths through a systems is reported to consume the least power but this came combination of architectural and transistor-level optimization at the cost of increased chip area given the used ensemble technique. In addition to equalizing internal path delays, structure. To address this issue, proposed to share and reuse dynamic power reduction can also be achieved by monitoring some functional modules within the ensemble. In, an 8-bit the effective dynamic range of the input operands so as to multiplier is reused for the 16-bit multiplication, adding disable unused sections of the multiplier and/or truncate the scalability without large area penalty. Reference extended output product at the cost of reduced precision. This is this method by implementing pipelining to further improve possible because, in most sensor applications, the actual the multiplier’s performance, with several multiplier inputs do not always occupy the entire magnitude of its elements grouped together to provide higher precisions and word-length. reconfigurability.

For example, in artificial neural network applications, the Reference analyzed the overhead associated to such weight precision used during the learning phase is reconfigurable multipliers. This analysis showed that around approximately twice that of the retrieval phase. Besides, 10%–20% of extra chip area is needed for 8–16 bits operations in lower precisions are the most frequently multipliers. Combining multi precision (MP) with dynamic required. In contrast, most of today’s full-custom DSPs and voltage scaling (DVS) can provide a dramatic reduction in

Copyright @ 2015 IJSETR. All rights reserved. SYED SHANAVAJ, M. CHANDRA SEKHAR power consumption by adjusting the supply voltage The main contributions of this paper can be summarized according to circuit’s run-time workload rather than fixing it follows. to cater for the worst case scenario [4]. When adjusting the  A novel MP multiplier architecture featuring, voltage, the actual performance of the multiplier running respectively, 28.2% and 15.8% reduction in silicon area under scaled voltage has to be characterized to guarantee a and power consumption compared with its conventional fail-safe operation. Conventional DVS techniques consist 32 × 32 bit fixed-width multiplier contrarypart. In this mainly of lookup table (LUT) and on-chip critical path paper, silicon area is optimized by applying an operation replica approaches. The LUT approach tunes the supply reduction technique that replaces a multiplier by voltage according to a predefined voltage-frequency adders/subtractors. relationship stored in a LUT, which is formed considering  A silicon implementation of this MP multiplier worst case conditions (process variations, power supply integrating an error-tolerant razor-based dynamic DVS droops, temperature hot-spots, coupling noise, and many approach. more). Therefore, large margins are necessarily added, which  A novel operand scheduler that rearranges the operations in turn significantly decrease the effectiveness of the DVS on input operands so as to reduce the number of technique. transitions of the supply voltage and, in turn, minimize the overall power consumption of the multiplier. II. EXISTING SYSTEM: Today’s full-custom DSPs and application specific The razor technology is a breakthrough work, which can integrated circuits (ASICs) are designed for a fixed largely eliminates the safety margins by achieving variable maximum word-length. Therefore, an 8-bit multiplication is tolerance through in-situ timing error detection and computed on a 32-bit Booth multiplier would result in correction ability .This approach is based on a razor flip-flop, unnecessary switching activity and loss of power. Some which can detects and corrects delay errors by double Several works are investigated this word length optimization. sampling shown in fig.2. Each pair of these incoming operands is routed to the smallest multiplier that can computed the result to take advantage of the lower energy consumption of the small circuits. Combining multi precision (MP) with dynamic voltage scaling (DVS) can provide a dramatic reduction in power consumption by adjusting the supply voltage according to circuit’s run-time workload rather than fixing it to cater. When adjusting this voltage, the actual performance of the multiplier running under scaled voltage has to be characterized to guarantee a fail-safe operation. These Conventional Dynamic Voltage Scaling techniques consists of lookup table (LUT) and on-chip critical path replica Fig. 2. Conceptual view of razor flip-flop. approaches. III. SYSTEM OVERVIEW AND OPERATION The proposed MP multiplier system (Fig. 1) comprises five different modules that are as follows:  The MP multiplier;  The input operands scheduler (IOS) Whose function is to be reorder the input data stream into a buffer, hence to reduce the required power supply voltage transitions.  The frequency scaling unit implemented using a voltage controlled oscillator (VCO). Its function is to generate the required operating frequency of the multiplier.  The voltage scaling unit (VSU) implemented using a voltage dithering technique to limit huge silicon area. Its function is to generate the supply voltage so as to minimize power consumption dynamically.  The dynamic voltage/frequency management unit (VFMU) that receives the user requirement. The VFMU sends control signals to the VSU and FSU to generate the required power supply voltage and clock frequency for the MP multiplier and it is useful for all computations. It is equipped or fit-out with razor flip- flops that can report timing errors associated to insufficiently high voltage supply levels.

The operation principle was, Initially, the multiplier operates at a standard supply voltage of 3.3 V. If the razor flip-flops of Fig. 1. Overall multiplier system architecture. the MP multiplier does not report any errors, hence the International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.38, September-2015, Pages: 8334--8337 Razor Based Dynamic Voltage Scaling Multiplier with 32×32 Multi Precision supply voltage can be minimized. It is achieved through the (3) becoming curves (4)–(6), respectively. Regions 1 and 2 VFMU, which sends control signals to the VSU, hence to show the power optimization space for the combined lower the supply voltage level. The razor flip-flops indicates approach. Based on PP, the operating frequency could be timing errors, then scaling of the power supply is stopped if decreased together with the supply voltage, as shown in the feedback is provided by flipflop. The proposed multiplier curves (7) and (8).Finally, region 3 shows the optimization (Fig. 3) not only combines MP and DVS but also parallel space for these proposed approach or method, which processing (PP). Our multiplier comprises 8 × 8 bit combines the MP, DVS with PP. reconfigurable multipliers. These blocks can work either as a nine independent multipliers or work in parallel to perform IV. MP AND RECONFIGURABILITY OVERHEAD one, two or three 16 × 16 bit multiplications or a single-32 × The role of the input interface unit is to distribute the 32 bit operation. PP process is used to increase the input data between the nine independent processing elements throughput or reduce the supply voltage level for low Power (PEs) (Fig. 3) of the 32 × 32 bit MP multiplier and consider Operation. the selected operation mode. A 3-bit control bus which indicates whether the inputs are 1/4/9 pair(s) of an 8-bit operands, or 1/2/3 pair(s) of a 16-bit operands, or 1 pair of a 32-bit operands, respectively. The input data flow or stream is distributed depending on the selected operating mode, between the PEs to perform the computation. To evaluate the overhead associated to rearrangability and MP, we define X and Y are the 2n-bits multiplicand and multiplier, respectively. XH, YH are the respective n most significant bits whereas XL, YL are the respective n least significant bits. XL YL , XH YL , XLYH, XHYH are the crosswise products. The product of X and Y can be represented as follows:

(1) Where 2n bit reconfigurable multiplier can be built using adders and four n bit × n bit multipliers to compute XHYH, XHYL, XLYH, and XLYL.

Fig. 3. Possible configuration modes of proposed MP (2) multiplier. (3) then (1) could be rewritten as follows:

(4)

Comparing (1) and (4), we have removed one n × n bit multiplier (for calculating XHYL or XLYH) and one 2n-bit adder (for calculating XHYL + XLYH).

The two adders are replaced with two n-bit adders (for n calculating XH + XL and YH + YL) and two (2 + 2)-bit sub tractors (for calculating X’Y’ − XHYH − XLYL). In a 32-bit multiplier, we can thus significantly reduce the design complexity by using two 34-bit subtractors to replace a 16 × 16 bit multiplier. Actually we need two 16 × 16 bit multipliers (for calculating XHYH and XLYL) and one 17 × 17 bit multiplier (for calculating X’Y’). To evaluate the proposed MP architecture, a 32-bit x32-bit fixed-width multiplier and four sub-block MP multipliers are designed

Fig. 4. Conceptual view of optimization spaces of MP, using a Booth Radix-4 Wallace tree structure similar to that DVS, and PP approaches. used for the building blocks of our MP three sub-block multiplier. Fig. 4 shows the benefits of the different approaches. Power consumption is a linear function of the workload, That is V. DYNAMIC VOLTAGE AND FREQUENCY normally represented by the input operands precision. Curve SCALING MANAGEMENT 1 corresponds to the case of a fixed-precision (FP) multiplier A. DVS Unit using a fixed power supply. Region 1 shows the power In our implementation (Fig. 1), a dynamic power supply and optimization space for MP techniques, which use various- a VCO are employed to achieve real-time dynamic voltage precision multiplications to reduce power. If MP is combined and frequency scaling under various operating conditions. In, with DVS, then the power is further reduced with curves (1)– near-optimal dynamic voltage scaling can be achieved when using dithering voltage, which exhibits faster response time International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.38, September-2015, Pages: 8334-8337 SYED SHANAVAJ, M. CHANDRA SEKHAR than conventional voltage regulators. Depending on the time VII. CONCLUSION slots, the dithering Voltage which uses the power switches to The proposed a novel MP multiplier architecture featuring, connect different supply Voltages to the load. The respectively, 28.2% and 15.8% reduction in silicon area and requirement for multiple supplies can also result in system power consumption compared with its 32 × 32 bit overhead. To address these issues, we implemented a single- conventional fixed-width multiplier contrarypart. When MP supply voltage dithering scheme. multiplier architecture is combined with a razor-based DVS approach and the proposed novel operands scheduler, total B. Dynamic Frequency Scaling Unit power reduction 77.7%–86.3 was achieved with a total In the proposed 32 × 32 bit MP multiplier, dynamic silicon area overhead as low as 11.1%. The fabricated chip frequency scaling is used to meet throughput requirements demonstrated run-time adaptation to the actual workload by and it is based on a VCO that is implemented as a seven- operating at the minimum supply voltage level and minimum stage current starved ring oscillator. clock frequency while meeting throughput requirements. The proposed new operand scheduler rearranges the operations on VI. SIMULATION RESULTS input operands, is to minimize the number of transitions of Existing system: The below Fig.5 shows the simulation the supply voltage and to minimized the overall power results for existing system and the proposed system consumption of the multiplier. The proposed MP razor-based simulation output results is shown in fig.6. DVS multiplier provided a solution toward achieving full computational flexibility and low power consumption for various general purpose low-power applications.

VIII. REFERENCES [1] R. Min, M. Bhardwaj, S.-H. Cho, N. Ickes, E. Shih, A. Sinha, A. Wang, and A. Chandrakasan, “Energy-centric enabling technologies for wireless sensor networks,” IEEE Wirel. Commun., vol. 9, no. 4, pp. 28–39, Aug. 2002. [2] M. Bhardwaj, R. Min, and A. Chandrakasan, “Quantifying and enhancing power awareness of VLSI systems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 6, pp. 757–772, Dec. 2001. [3] A. Wang and A. Chandrakasan, “Energy-aware architectures for a realvalued FFT implementation,” in Proc. EEE Int. Symp. Low Power Electron. Design, Aug. 2003, pp. 360–365. [4] T. Kuroda, “Low power CMOS digital design for multimedia processors,” in Proc. Int. Conf. VLSI CAD, Oct. 1999, pp. 359–367. [5] H. Lee, “A power-aware scalable pipelined booth multiplier,” in Proc. IEEE Int. SOC Conf., Sep. 2004, pp. 123–126.

Fig.5. Existing System Output Results. [6] S.-R. Kuang and J.-P. Wang, “Design of power-efficient configurable booth multiplier,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 568–580, Mar. 2010. [7] O. A. Pfander, R. Hacker, and H.-J. Pfleiderer, “A multiplexer-based concept for reconfigurable multiplier arrays,” in Proc. Int. Conf. Field Program. Logic Appl., vol. 3203. Sep. 2004, pp. 938–942. [8] F. Carbognani, F. Buergin, N. Felber, H. Kaeslin, and W. Fichtner, “Transmission gates combined with level-restoring CMOS gates reduce glitches in low-power low-frequency multipliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 7, pp. 830–836, Jul. 2008. [9] T. Yamanaka and V. G. Moshnyaga, “Reducing multiplier energy by data-driven voltage variation,” in Proc. IEEE Int. Symp. Circuits Syst., May 2004, pp. 285–288.

Fig.6. Proposed System Output Results. International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.38, September-2015, Pages: 8334--8337