Integrated Circuits, Performance, Power

Alexander Nelson January 15, 2021

University of Arkansas - Department of Computer Science and Computer Engineering

Integrated Circuits

Technologies Over Time

1 Transistors and Integrated Circuits

Transistor – On/off switch controlled by an electric signal
Integrated Circuit – Many transistors on a single chip
VLSI/ULSI – Chips containing thousands/millions/billions of transistors

Intel 8080 (1974) – 6,000 transistors
AMD Ryzen 9 3900X (2019) – 9.89B transistors
45 years = about 1,648,333 times more transistors (9.89 × 10^9 ÷ 6,000)

2 Semiconductors

Semiconductor – Solid substance with conductivity between that of an insulator and a conductor
Silicon – Natural semiconductor
Can be chemically modified to be:

• Conductor
• Insulator
• Areas that can conduct or insulate as a switch

3 Manufacturing

Yield – Proportion of working dies per wafer

4 Core i7 Wafer

300mm wafer, 280 chips, 32nm technology
Each chip is 20.7 x 10.5mm

5 Integrated Circuit Cost

Nonlinear relation to area and defect rate

• Wafer cost and area are fixed
• Defect rate determined by manufacturing process
• Die area determined by architecture & circuit design
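The cost formulas themselves did not survive on this slide, so the sketch below assumes a commonly taught textbook model: dies per wafer approximated as wafer area over die area, and yield as 1 / (1 + defects per area × die area / 2)². The function names and the numbers (wafer cost, defect density, die areas) are hypothetical and chosen only to show the nonlinear effect of die area.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Approximate die count: wafer area / die area (ignores edge loss)."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2.0) ** 2
    return wafer_area_mm2 / die_area_mm2


def die_yield(defects_per_mm2: float, die_area_mm2: float) -> float:
    """Assumed empirical yield model: 1 / (1 + defects * area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2


def cost_per_die(wafer_cost: float, wafer_diameter_mm: float,
                 die_area_mm2: float, defects_per_mm2: float) -> float:
    """Wafer cost spread over the dies that actually work."""
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * die_yield(defects_per_mm2, die_area_mm2))
    return wafer_cost / good_dies


# Hypothetical inputs: $5000 wafer, 300 mm diameter, 0.001 defects per mm^2.
small = cost_per_die(5000.0, 300.0, 100.0, 0.001)
large = cost_per_die(5000.0, 300.0, 200.0, 0.001)
print(f"100 mm^2 die: ${small:.2f}")
print(f"200 mm^2 die: ${large:.2f}")
print(f"cost ratio for 2x the area: {large / small:.2f}")
```

Under these assumptions, doubling the die area more than doubles the cost per die, because fewer dies fit on the wafer and a smaller fraction of them work.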

6 Performance

How do you define CPU performance?

6 Example

Which airplane has the best performance?

7 Response Time vs. Throughput

Response Time – How long it takes to finish a task
Throughput – Total work done per unit time

• e.g. tasks/transactions per hour

How are these two metrics affected by:

• Replacing the processor with a faster version?
• Adding more processors?

8 Relative Performance

If Performance is defined as:

$$\text{Performance} = \frac{1}{\text{Execution Time}}$$

Then:

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$

Can be phrased as “Processor X is n times faster than Processor Y”

Example: Assume a program runs in:

• 10s on Processor A
• 15s on Processor B

Then:

$$\frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$$

So, A is 1.5 times faster than B
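The same ratio can be checked with a few lines of Python. This is a minimal sketch; the function name `relative_performance` is an illustrative choice, not something from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = time_Y / time_X)."""
    return time_y / time_x


# Processor A takes 10 s and Processor B takes 15 s for the same program.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n:.1f} times faster than B")
```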

10 Execution Time

How do you measure execution time?

Elapsed Real Time – Total response time, including all aspects

• Processing, I/O, OS overhead, idle time

Determines system performance

CPU Time – Time spent processing a given job

• Discounts I/O time, other jobs’ shares

Comprises user CPU time & system CPU time
Different programs are affected differently by CPU & system performance

11 CPU Clocking

Nearly all CPUs are governed by a clock

Clock Period – Duration of a clock cycle (in seconds)
Clock Rate (Frequency) – Number of clock cycles per second (in Hz)

12 CPU Time

CPU Time can be defined as:

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

How to improve performance?

• Reduce number of clock cycles per operation or per program
• Increase clock rate

Hardware designer may need to trade off clock rate against cycle count

13 CPU Time Example

Example: Computer A: 2GHz Clock Frequency results in 10s CPU Time

Design Computer B such that:

• Aiming for 6s CPU Time
• Clock frequency may be increased, but doing so requires 1.2X as many clock cycles

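$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$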
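The calculation for Computer B can be reproduced step by step. The helper names below are hypothetical; the numbers are the ones given in the example.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock rate needed to finish `cycles` cycles in `target_time_s` seconds."""
    return cycles / target_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)  # Computer A
cycles_b = 1.2 * cycles_a            # B needs 1.2x as many cycles as A
rate_b = required_clock_rate(cycles_b, target_time_s=6.0)
print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")
```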

15 Instruction Counts and CPI

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{CPI}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

Instruction Count – Number of instructions for a particular program
Depends on:

• Program
• ISA
• Compiler

Average Cycles Per Instruction (CPI) – Determined by ISA/CPU hardware
Different instructions may have different CPI
Average CPI affected by % of instruction classes in program

16 CPI Example

Computer A – Cycle Time = 250ps, CPI = 2.0
Computer B – Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster? By how much?

17 CPI Example

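Same ISA, so both computers execute the same Instruction Count:

$$\text{CPU Time}_A = \text{Instruction Count} \times 2.0 \times 250\,\text{ps} = \text{Instruction Count} \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times 1.2 \times 500\,\text{ps} = \text{Instruction Count} \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{600\,\text{ps}}{500\,\text{ps}} = 1.2$$

A is 1.2 times faster than B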

18 CPI in Detail

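Different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

Weighted Average CPI:

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

The ratio Instruction Count_i / Instruction Count is the relative frequency of each class of instruction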
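A short sketch of the weighted-average CPI calculation follows. The instruction classes, their CPI values, and the counts are hypothetical, made up only to exercise the formula; they are not taken from the slides.

```python
def total_clock_cycles(cpi_by_class: dict, count_by_class: dict) -> float:
    """Sum of CPI_i x Instruction Count_i over all instruction classes."""
    return sum(cpi_by_class[c] * count_by_class[c] for c in count_by_class)


def weighted_average_cpi(cpi_by_class: dict, count_by_class: dict) -> float:
    """Clock cycles divided by the total instruction count."""
    total_instructions = sum(count_by_class.values())
    return total_clock_cycles(cpi_by_class, count_by_class) / total_instructions


# Hypothetical instruction classes and mix (illustrative values only).
cpi_by_class = {"alu": 1, "load_store": 2, "branch": 3}
count_by_class = {"alu": 50, "load_store": 30, "branch": 20}

cycles = total_clock_cycles(cpi_by_class, count_by_class)
cpi = weighted_average_cpi(cpi_by_class, count_by_class)
print(f"clock cycles = {cycles}, average CPI = {cpi:.2f}")
```

With this mix, 50% of instructions cost 1 cycle, 30% cost 2, and 20% cost 3, so the average works out to 0.5 × 1 + 0.3 × 2 + 0.2 × 3 = 1.7 cycles per instruction.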

19 CPI Example

20 Performance Summary

Performance Depends on:

• Algorithm – Affects instruction count, possibly CPI
• Programming language – Affects instruction count & CPI
• Compiler – Affects instruction count & CPI
• ISA – Affects instruction count, CPI, & time per cycle

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock Cycle}}$$

21 Power

Power Trends

Intel Core i5-9400 9th Gen (2019) – 14nm, 2.90-4.1 GHz, 65W

22 Why isn’t clock frequency continuing to increase?

22 Battery Life Isn’t Keeping Up

23 Dynamic Power Consumption

CMOS Transistors – Complementary metal oxide semiconductor
Best since 1976
Gradually being replaced by FinFET technology (can go <20nm)

Primary energy consumption is dynamic energy

$$\text{Power} \propto \frac{1}{2} \times \text{Capacitive Load} \times \text{Voltage}^2 \times \text{Frequency Switched}$$

Reducing voltage from 5V→1V allowed a 1000x increase in clock frequency with only a 30x increase in power consumption

24 Reducing Power

Suppose a new CPU has:

• 85% of the capacitive load of the old CPU
• 15% voltage and 15% frequency reduction

Then:

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$

The new CPU uses 52% of the power of the old CPU
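The power ratio depends only on the three scale factors, since the old CPU's capacitance, voltage, and frequency cancel. A minimal sketch, with a hypothetical function name:

```python
def dynamic_power_ratio(cap_scale: float, volt_scale: float,
                        freq_scale: float) -> float:
    """P_new / P_old for dynamic power proportional to C x V^2 x F."""
    return cap_scale * volt_scale ** 2 * freq_scale


# 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = dynamic_power_ratio(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"New CPU uses {ratio:.0%} of the old CPU's power")
```

Voltage enters squared, which is why lowering it is the most effective of the three levers.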

However:

• Can’t reduce voltage further
• Can’t remove more heat

How else can we improve performance?

25 Uniprocessor Performance

Since 2003, constrained by power, instruction-level parallelism, and memory latency

26 Multiprocessors

Multicore – More than one processor per chip
Requires explicitly parallel programming

• Compare with instruction-level parallelism
  • Hardware executes multiple instructions at once
  • Hidden from the programmer!
• Hard to do
  • Programming for performance
  • Load balancing
  • Optimizing communication and synchronization

27 Benchmarking Processors

SPEC – System Performance Evaluation Cooperative – Funded by computer vendors for standard set of benchmarks

SPECINTC2006 benchmarks on 2.66 GHz i7 920

28 SPEC Power

Power consumption of a server at different workload levels

• Performance: ssj_ops/sec
• Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \div \left(\sum_{i=0}^{10} \text{power}_i\right)$$
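The overall metric is a ratio of sums over the 11 load levels, not an average of per-level ratios. The sketch below uses made-up measurements (hypothetical ssj_ops and wattage values for loads from 100% down to 0%) purely to show the arithmetic.

```python
def overall_ssj_ops_per_watt(ssj_ops: list, power_watts: list) -> float:
    """Sum of ssj_ops over all load levels divided by the sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical readings for load levels 100%, 90%, ..., 10%, 0% (active idle).
ssj_ops = [1000 * level for level in range(10, -1, -1)]
power_watts = [100 + 15 * level for level in range(10, -1, -1)]

score = overall_ssj_ops_per_watt(ssj_ops, power_watts)
print(f"overall ssj_ops per Watt = {score:.2f}")
```

Note that the idle level contributes power but no operations, so a server that draws a lot of power when idle is penalized by this metric.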

29 Power SPEC

SPECpower_ssj2008 for X5650

30 Pitfall: Amdahl’s Law

Improving one aspect does not guarantee a proportional improvement in overall performance

Amdahl’s Law:

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

Example: Multiply accounts for 80 of 100 seconds of execution
How much improvement in multiply performance is needed to get a 5x overall improvement?

A 5x improvement means a total time of 100/5 = 20 seconds:

$$20 = \frac{80}{n} + 20$$

This would require 80/n = 0, which no finite n achieves. Can’t be done!

Corollary: Make the common case fast!
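Amdahl’s Law can be explored numerically for the multiply example. This is a minimal sketch with a hypothetical function name; it shows the overall speedup approaching, but never reaching, 5x as the multiply improvement factor grows.

```python
def improved_time(t_affected: float, t_unaffected: float,
                  improvement_factor: float) -> float:
    """Amdahl's Law: T_improved = T_affected / factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected


T_AFFECTED = 80.0     # seconds spent in multiply
T_UNAFFECTED = 20.0   # seconds spent in everything else
T_ORIGINAL = T_AFFECTED + T_UNAFFECTED

for n in (2, 10, 100, 1_000_000):
    t = improved_time(T_AFFECTED, T_UNAFFECTED, n)
    print(f"multiply {n}x faster -> {t:.4f} s, "
          f"overall speedup {T_ORIGINAL / t:.4f}x")
```

The unaffected 20 seconds set a floor on execution time, so the overall speedup is bounded by 100/20 = 5x and only reaches it in the limit.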
