Take Home Midterm #2 and Final-Project #2: POWER CONSUMPTION, PROCESS MISMATCH, and SRAM Design

OUT: Feb. 18; IN: Mar. 18 (complete report of Midterm-#2 and Final-Project) DONE IN GROUPS!!!!

I acknowledge that I did this work on my own, and did not copy or plagiarize the information from any other people or sources. I sign this to be true, and my signature with my name means I did this work honorably.

NAME: ______ SIGNATURE: ______ DATE: ______

Optimal Power Consumption vs. Process Sensitivity/Mismatch

Everything we’ve done so far assumes we want the fastest delay. However, for some applications, we want the longest delay the application can tolerate, in exchange for the minimum power consumed. This is a real-time requirement: the computation only has to finish by its deadline. This is true for ultra-low-power applications, e.g. sensor networks, portable electronics, etc.

Therefore, as the CTO of a sensor networking startup, you need to understand the power consumption values for a NAND-2. Assume the NAND-2 characterization is general enough that you can generalize your results to any logic gate in any technology. Simulate only in simple schematic form. Assume minimum-type sizing for the NAND-2 gate. Automate this as much as possible through scripting (e.g. Python, Perl), so you don’t waste too much time. Do this only for the TT corner.

AUTOMATION SCRIPT (Tom Ruggeri, Feb. 2011): Here is a script, written by graduate student Tom Ruggeri, that automates this simulation across multiple processes. It was originally written for a D-FF, so you’ll have to adapt it to work for the NAND-2 you have the schematic for. http://web.engr.oregonstate.edu/~ruggerit/ece471_mt_script.tar.gz

(Energy dissipated is: VDD * integral of I dt, where I is the current drawn from the supply)

Problem #1: Energy-Delay Product for Scaled Vdd Technologies with NO Process Mismatch

For each process and VDD below, fill in: Tplh; Td(avg); I(static); Energy(static); Energy(dynamic); Energy(TOT)/computation

0.25u (2.5V): VDD = 2.5V, 1.0V, 0.6V, 0.5V, 0.3V
65nm (1V): VDD = 1.0V, 0.7V, 0.5V, 0.4V, 0.3V

PLOT: 1) DELAY vs. VDD for each process; 2) ENERGY/COMPUTATION vs. VDD for each process; 3) ENERGY*DELAY Product vs. VDD for each process (EDP on Y-Axis; VDD and process node on X-Axis)

NOTE: ENERGY(TOT) is the total energy consumed (static AND dynamic) in any one clock period. This is equivalent to ENERGY consumed / computation. YOU NEED TO FIND STATIC ENERGY CORRECTLY, and also what the cycle time is.
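The bookkeeping in the NOTE above can be sketched in Python. This is only a minimal sketch under stated assumptions, not the required method: it assumes the cycle time is the average gate delay multiplied by a hypothetical logic depth (`logic_depth` is not given in the assignment), that static energy is VDD * I(static) * cycle time, and that the energy-delay product uses Td(avg) as the delay. All function and field names are illustrative.

```python
from typing import NamedTuple


class EnergyResult(NamedTuple):
    t_cycle: float   # s, clock period assumed for one computation
    e_static: float  # J, leakage energy burned over one clock period
    e_total: float   # J, static + dynamic energy per computation
    edp: float       # J*s, energy-delay product


def energy_per_computation(vdd, i_static, e_dynamic, t_d_avg, logic_depth=1):
    """Combine simulated quantities into Energy(TOT)/computation and EDP.

    vdd         supply voltage in volts
    i_static    leakage (static) supply current in amperes
    e_dynamic   switching energy per computation in joules
    t_d_avg     average propagation delay in seconds
    logic_depth ASSUMPTION: gate delays per clock period (hypothetical)
    """
    t_cycle = logic_depth * t_d_avg      # assumed cycle time
    e_static = vdd * i_static * t_cycle  # leakage energy in one period
    e_total = e_static + e_dynamic       # energy per computation
    edp = e_total * t_d_avg              # energy-delay product
    return EnergyResult(t_cycle, e_static, e_total, edp)
```

Calling `energy_per_computation` once per (process, VDD) row gives the numbers needed for the three plots. Because the cycle time stretches as VDD drops, the static term grows even when the leakage current itself shrinks.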

Process Variation

In any “real design”, devices, capacitances, resistances, and transistors do not behave exactly as drawn. In actuality, their parameters vary statistically, which makes the delay extremely difficult to predict. While traditionally we’d like to use a statistical methodology (Monte Carlo) for variation simulation/prediction, this is difficult. A simpler way is to implement a worst-case simulation for the variation. There are two offsets possible in a process: 1) Vt variation; 2) channel length variation. Vt variation is characterized as: Sigma(Vt) = Avt / sqrt(W*L), where Avt ~ 2 mV*um in general for most technologies (Pelgrom’s mismatch model).
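As a quick worked example of the Sigma(Vt) expression above, the following Python sketch evaluates it for one transistor. It treats Avt as having units of mV*um, so that the result comes out in mV; the function name and the device dimensions in the usage note are hypothetical and are not taken from the assignment.

```python
import math


def sigma_vt_mv(a_vt_mv_um, w_um, l_um):
    """Pelgrom mismatch: Sigma(Vt) = Avt / sqrt(W * L).

    a_vt_mv_um  Avt in mV*um (assumed unit)
    w_um, l_um  drawn gate width and length in um
    Returns the threshold-voltage standard deviation in mV.
    """
    if w_um <= 0 or l_um <= 0:
        raise ValueError("W and L must be positive")
    return a_vt_mv_um / math.sqrt(w_um * l_um)
```

For instance, `sigma_vt_mv(2.0, 0.5, 0.25)` (a hypothetical 0.5um x 0.25um device) gives about 5.66 mV. Shrinking W*L by 4x doubles the sigma, which is why minimum-size devices in scaled processes are the most sensitive to mismatch.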

For this part, still use your NAND-2 gate, but create the worst possible Tplh and Tphl conditions. For example, the worst-case Tphl is when: PMOS has smaller Vt; NMOS has larger Vt; process corner is fast PMOS, slow NMOS. The converse case (for Tplh) also holds, and needs to be simulated. (NOTE: Ignore the 10% gate length variation for this portion, to simplify your simulations. Just apply a sigma-Vth increase (worst case) to both the NMOS and PMOS for the FA delay, assuming W/L scaling across process technology.)

Problem #2: Energy-Delay Products for Scaled Vdd Technologies with WORST-CASE Process Mismatch

For each process and VDD below, fill in: Tplh; Td(avg); I(static); Energy(static); Energy(dynamic); Energy(TOT)/computation

0.25u (2.5V): VDD = 2.5V, 1.0V, 0.6V, 0.5V, 0.3V
65nm (1V): VDD = 1.0V, 0.7V, 0.5V, 0.4V, 0.3V

PLOT: 1) DELAY vs. process; 2) ENERGY(TOT) vs. process; 3) ENERGY(TOT)*Delay Product vs. process (EDP on Y-Axis; process node on X-Axis) (SAME AS PROBLEM-#1)

Problem 3: Qualitative Discussion (short answers)

1) How does the delay vs. power dissipation scale as Vdd is lowered? At what point can we stop scaling the supply voltage Vdd? Is there a limit? (HINT: Look at the leakage)

2) Vdd scaling is a great way to reduce power consumption, if the increase in delay can be tolerated. However, how does the delay vary between: a) the no-process-mismatch case; b) the worst-case process mismatch case?

3) While energy/computation seems to be reducing with reduced VDD, another important metric is energy/computation * delay (or, energy-delay product). How is Energy-Delay Product scaling? Are we seeing any benefit for low-VDD operation as we continue reducing VDD? Why or why not?

Problem 4: 2-Page Skim Paper

1) Ultralow-voltage, minimum-energy CMOS http://blaauw.eecs.umich.edu/getFile.php?id=247
2) Sub-threshold Sensor Network Processor http://blaauw.eecs.umich.edu/getFile.php?id=263
3) Razor-I paper http://blaauw.eecs.umich.edu/getFile.php?id=25

Read the three papers above, and write a 1-page synopsis/summary of low-voltage digital logic design. NOTE: I DO READ AND GRADE THESE CAREFULLY, TO DETERMINE UNDERSTANDING OF THIS MATERIAL. PLEASE DO WRITE WELL.

Problem 4a: Design of a 16 x 16b 6T-SRAM in 0.25um CMOS

GOAL: Design a 16-entry SRAM with word-size=16b. NOTE: No layout is required for this design! Your design procedure should be similar to the Berkeley final project, except you do NOT need to finish the layout: http://bwrc.eecs.berkeley.edu/classes/icdesign/ee141_f08/Project/EE141-Proj1.pdf

You need to build the following components to demonstrate this 16x16 SRAM:

A) 4:16 decoder for the one-hot word-select lines
B) 6T-SRAM bit cell (use min-size inverters)
C) Column sensing amplifier (i.e. either opamp or offset-cancelled sense amplifier)
D) Please write up your design using the 4-page IEEE format paper. There are examples on the class webpage.

NOTE: While you are NOT doing layout, you will need to estimate the parasitic loading of your long wires. Do this by multiplying the per-unit-length R and C of the wire by your total wire lengths, and use a PI-model to estimate the parasitic load your decoder and column lines will really see after layout. For example, you can draw the array in PowerPoint to estimate the wire lengths.
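The PI-model estimate described in the note above can be sketched in Python as follows. This is a minimal sketch, assuming a single-segment PI model (half of the wire capacitance at each end, the full wire resistance in between) and a first-order Elmore delay; the function names, the driver resistance, and the load capacitance are illustrative assumptions, and the per-unit-length values must come from your own process data.

```python
def pi_model(r_per_um, c_per_um, length_um):
    """Lump a wire into a PI model: (C_near, R_wire, C_far).

    r_per_um   wire resistance per um of length, in ohms/um
    c_per_um   wire capacitance per um of length, in F/um
    length_um  estimated wire length in um
    """
    r_wire = r_per_um * length_um
    c_wire = c_per_um * length_um
    return c_wire / 2.0, r_wire, c_wire / 2.0


def elmore_delay(r_driver, c_near, r_wire, c_far, c_load):
    """First-order (Elmore) delay of a driver feeding the PI model.

    The near-end capacitor charges through the driver only; the far-end
    capacitor and the load charge through the driver plus the wire.
    r_driver and c_load are ASSUMED inputs (driver output resistance and
    the total gate capacitance hanging on the far end).
    """
    return r_driver * c_near + (r_driver + r_wire) * (c_far + c_load)
```

A word line or bit line would be modeled by calling `pi_model` with its estimated length, then adding the summed gate or diffusion capacitance of the attached cells as `c_load`.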

In your final report, please estimate the leakage power, active power, maximum clock frequency, area, and noise margins for the SRAM cell.

EXTRA CREDIT-1: Scale it to 32nm-CMOS, and show the changes in leakage power, active power, maximum clock frequency, area, and noise margins.

EXTRA CREDIT-2: Add two power gates to your entire 16 x 16b SRAM:

a) PMOS power-gating switch, for leakage current reduction. Size this PMOS header so that the total delay degrades by less than 10% (compared with NO PMOS header). NOTE: No retention.

b) NMOS power-gate ‘drowsy’ switch. Saves leakage current by dropping the ‘virtual VDD’ node, while exhibiting retention.

Problem 4b: Pipelined 16b Adder with RC clock loading

GOAL: Design a 16-bit adder, with pipelined stages, and significant CLK skew.

[PART-A]: 16b-adder with LARGE CLK Wire Skew

Unfortunately, the clock routing tree has significant delay-skew. Assume the clock is routed backwards, from back-to-front. (The CLK wire model is below.)

1) Step-1: Design a simple D-FF (master-slave with pass-gates and inverters)
2) Step-2: Add D-FFs at the beginning/end of the 16b-adder.
3) Step-3: Add D-FFs every 4-bits.

In your final report, please estimate the leakage power, active power (LOGIC and D-FFs), and maximum clock frequency (assuming worst-case delay), for a LONG RC-wire for the clock path, from the back-to-front.

WIRE MODEL of CLK:

A) L=4mm, W=0.1um
B) Resistivity ρ=2.7e-8 (Ω-m)
C) Capacitance=10fF/um

[PART-B]: 16b-adder with INV-Buffers Inserted Into CLK Wire

In order to minimize the clock skew, inverter buffers are inserted into this long CLK wire. Insert INV buffers into this long clock, to break up the CLK delay.
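To see why breaking up the wire helps, here is a rough Python sketch that uses the CLK wire model values (L=4mm, W=0.1um, ρ=2.7e-8 Ω-m, 10fF/um). It is a first-order estimate under loud assumptions: the wire thickness is not given in the assignment, so `THICKNESS_M` is a hypothetical placeholder; the buffer delay `t_buf` is also hypothetical; the distributed-wire delay uses the common 0.38*R*C approximation; and the buffer input capacitance and drive resistance are ignored.

```python
import math

RHO_OHM_M = 2.7e-8     # resistivity, from the wire model
LENGTH_M = 4e-3        # wire length, from the wire model
WIDTH_M = 0.1e-6       # wire width, from the wire model
C_PER_UM = 10e-15      # capacitance per um, from the wire model
THICKNESS_M = 0.5e-6   # ASSUMPTION: thickness is not given in the handout


def wire_rc(thickness_m=THICKNESS_M):
    """Total wire resistance (ohms) and capacitance (F)."""
    r = RHO_OHM_M * LENGTH_M / (WIDTH_M * thickness_m)
    c = C_PER_UM * (LENGTH_M * 1e6)
    return r, c


def buffered_delay(r, c, n_seg, t_buf):
    """Delay of the wire split into n_seg equal buffered segments.

    Each segment has R/n and C/n, so the wire term drops as 1/n_seg,
    while every segment adds one buffer delay t_buf (assumed constant).
    """
    return 0.38 * r * c / n_seg + n_seg * t_buf


def best_segment_count(r, c, t_buf):
    """Integer segment count that minimizes buffered_delay."""
    n_real = math.sqrt(0.38 * r * c / t_buf)
    candidates = {max(1, math.floor(n_real)), max(1, math.ceil(n_real))}
    return min(candidates, key=lambda n: buffered_delay(r, c, n, t_buf))
```

The unbuffered case is `0.38 * r * c`, which grows with the square of the wire length. With repeaters the wire term falls as 1/n while the buffer term grows as n, so the best count sits where the two terms are roughly equal. Your simulated buffer delay should replace the placeholder `t_buf`.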

1) Step-1: Design a D-FF (master-slave with pass-gates and inverters)
2) Step-2: Add D-FFs at the beginning/end of the 16b-adder.
3) Step-3: Add D-FFs every 4-bits.

In your final report, please estimate the leakage power, active power (LOGIC and D-FFs), and maximum clock frequency (assuming worst-case delay), for this ‘optimal’ inverter-buffer repeater insertion.

EXTRA CREDIT-1: Scale it to 32nm-CMOS, and show the changes in leakage power, active power, maximum clock frequency, area, and noise margins.

EXTRA CREDIT-2: Add two power gates to your entire 16 x 16b SRAM:

a) PMOS power-gating switch, for leakage current reduction. Size this PMOS header so that the total delay degrades by less than 10% (compared with NO PMOS header).

Problem-5: Grad students ONLY

Design an inverter-based opamp/gain stage, using 65nm-TSMC. See how to do this at low-VDD (i.e. COME TALK WITH ME TO DISCUSS.)

Or, do both projects above. (Problem-4a and Problem-4b)