Integrated Circuits, Performance, Power
Alexander Nelson January 15, 2021
University of Arkansas - Department of Computer Science and Computer Engineering
Integrated Circuits Technologies Over Time
Transistors and Integrated Circuits
Transistor – On/off switch controlled by an electric signal
Integrated Circuit – Many transistors on a single chip
VLSI/ULSI – Chips containing thousands, millions, or billions of transistors
Intel 8080 (1974) – 6,000 transistors
AMD Ryzen 9 3900X (2019) – 9.89B transistors
45 years = roughly 1,648,333 times more transistors
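The growth factor can be checked directly from the two transistor counts quoted above. The sketch below also derives an implied doubling period, which is an illustration added here and assumes steady exponential growth between the two data points.

```python
import math

# Transistor counts quoted above.
intel_8080 = 6_000        # 1974
ryzen_9_3900x = 9.89e9    # 2019
years = 2019 - 1974       # 45 years

growth_factor = ryzen_9_3900x / intel_8080

# Assumption for illustration: growth was a steady exponential.
doublings = math.log2(growth_factor)
years_per_doubling = years / doublings

print(f"growth factor: {growth_factor:,.0f}x")
print(f"doublings: {doublings:.1f}")
print(f"years per doubling: {years_per_doubling:.2f}")
```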
Semiconductors
Semiconductor – Solid substance with conductivity between that of an insulator and a conductor
Silicon – Natural semiconductor
Can be chemically modified to be:
• Conductor
• Insulator
• Areas that can conduct or insulate as a switch
Manufacturing Process
Yield – Proportion of working dies per wafer
Intel Core i7 Wafer
300mm wafer, 280 chips, 32nm technology
Each chip is 20.7 x 10.5mm
Integrated Circuit Cost
Nonlinear relation to area and defect rate
• Wafer cost and area are fixed
• Defect rate determined by manufacturing process
• Die area determined by architecture & circuit design
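The nonlinear relation can be sketched with a commonly used textbook cost model. The slides do not state the formulas, so the model below is an assumption: dies per wafer is approximated as wafer area divided by die area (ignoring edge losses), and yield as 1 / (1 + defects per area × die area / 2)². The wafer cost and defect density are hypothetical values chosen only for illustration.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate die count: wafer area / die area (ignores edge losses)."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area_mm2 / die_area_mm2

def die_yield(defects_per_mm2, die_area_mm2):
    """Assumed empirical yield model; drops nonlinearly as die area grows."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    """Wafer cost spread over the dies that actually work."""
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * die_yield(defects_per_mm2, die_area_mm2))
    return wafer_cost / good_dies

# Hypothetical inputs: $5000 wafer, 300 mm diameter, 0.001 defects per mm^2.
for area_mm2 in (100, 200, 400):
    cost = cost_per_die(5000, 300, area_mm2, 0.001)
    print(f"die area {area_mm2} mm^2 -> cost per die ${cost:.2f}")
```

Under these assumptions, doubling the die area more than doubles the cost per die, because fewer dies fit on the wafer and a smaller fraction of them work.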
Performance
How do you define CPU performance?
Example
Which airplane has the best performance?
Response Time vs. Throughput
Response Time – How long it takes to finish a task
Throughput – Total work done per unit time
• e.g. tasks/transactions per hour
How are these two metrics affected by:
• Replacing the processor with a faster version?
• Adding more processors?
Relative Performance
If Performance is defined as:
$$\text{Performance} = \frac{1}{\text{Execution Time}}$$
Then:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$
Can be phrased as “Processor X is n times faster than Processor Y”
Example: Assume a program runs in:
• 10s on Processor A
• 15s on Processor B
Then, with X = A and Y = B:
$$\frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$$
So, A is 1.5 times faster than B
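The same comparison can be written as a small helper. This is a minimal sketch; the function names `performance` and `times_faster` are made up for illustration.

```python
def performance(execution_time_s):
    """Performance defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """Return n such that processor X is n times faster than processor Y."""
    return performance(time_x_s) / performance(time_y_s)

# Processor A takes 10 s, Processor B takes 15 s.
n = times_faster(10.0, 15.0)
print(f"A is {n:.1f} times faster than B")
```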
Execution Time
How do you measure execution time?
Elapsed Real Time – Total response time including all aspects
• Processing, I/O, OS overhead, idle time
Determines system performance
CPU Time – Time spent processing a given job
• Discounts I/O time, other jobs’ shares
Comprises user CPU time & system CPU time
Different programs are affected differently by CPU & system performance
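One way to see the difference between elapsed time and CPU time is to measure both around the same call. The sketch below uses Python's standard `time.perf_counter()` (wall clock) and `time.process_time()` (user plus system CPU time of the current process). The helper `measure` and the two sample jobs are hypothetical names for illustration.

```python
import time

def measure(fn):
    """Return (elapsed_seconds, cpu_seconds) for one call of fn."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    fn()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu

def compute_bound():
    # Keeps the CPU busy, so CPU time is close to elapsed time.
    sum(i * i for i in range(200_000))

def wait_bound():
    # Idle time counts toward elapsed time but not toward CPU time.
    time.sleep(0.05)

for job in (compute_bound, wait_bound):
    elapsed, cpu = measure(job)
    print(f"{job.__name__}: elapsed={elapsed:.4f}s cpu={cpu:.4f}s")
```

The sleeping job should report an elapsed time of about 50 ms with almost no CPU time, which is the gap between system performance and CPU performance described above.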
CPU Clocking
Nearly all CPUs are governed by a clock
Clock Period – Duration of a clock cycle (in seconds)
Clock Frequency – Number of clock cycles per second (in hertz)
CPU Time
CPU Time can be defined as:
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
How to improve performance?
• Reduce the number of clock cycles per operation or per program
• Increase the clock rate
Hardware designers may need to trade off clock rate against cycle count
CPU Time Example
Example: Computer A: 2GHz clock frequency results in 10s CPU Time
Design Computer B such that:
• The target is 6s CPU Time
• Clock frequency may be increased, but doing so requires 1.2X as many clock cycles
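$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$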
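The design calculation for Computer B can be reproduced step by step. This is a minimal sketch; the function names are illustrative and not part of the original slides.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """CPU Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz

def required_clock_rate(cycles, target_cpu_time_s):
    """Clock Rate = CPU Clock Cycles / CPU Time."""
    return cycles / target_cpu_time_s

# Computer A: 10 s of CPU time at 2 GHz.
cycles_a = clock_cycles(10.0, 2e9)

# Computer B needs 1.2x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, 6.0)

print(f"Clock cycles A: {cycles_a:.3g}")
print(f"Clock rate B: {rate_b / 1e9:.1f} GHz")
```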
Instruction Counts and CPI
$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles Per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$
Instruction Count – Number of instructions for a particular program
Depends on:
• Program
• ISA
• Compiler
Average Cycles Per Instruction – Determined by ISA/CPU hardware
Different instructions may have different CPI
Average CPI is affected by the % of instruction classes in the program
CPI Example
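Computer A – Cycle Time = 250ps, CPI = 2.0
Computer B – Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster? By how much?
With the same ISA, both computers execute the same instruction count I:
$$\text{CPU Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
A is faster, by a factor of 600/500 = 1.2 in relative performance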
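CPI in Detail
Different instruction classes take different numbers of cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$
Weighted Average CPI:
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
The fraction inside the sum is the relative frequency of each class of instruction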
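The weighted average can be sketched in a few lines. The instruction mix below is hypothetical, chosen only to make the arithmetic easy to follow; it is not taken from the slides.

```python
def total_cycles(mix):
    """mix maps an instruction class name to (cpi, instruction_count)."""
    return sum(cpi * count for cpi, count in mix.values())

def average_cpi(mix):
    """Weighted average CPI = total clock cycles / total instruction count."""
    instructions = sum(count for _, count in mix.values())
    return total_cycles(mix) / instructions

# Hypothetical mix: class -> (CPI of that class, instructions executed).
mix = {
    "alu": (1, 50),
    "load_store": (2, 30),
    "branch": (3, 20),
}

print(f"clock cycles: {total_cycles(mix)}")
print(f"average CPI: {average_cpi(mix):.2f}")
```

Shifting the mix toward the higher-CPI classes raises the average CPI even though no individual instruction became slower.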
Performance Summary
Performance depends on:
• Algorithm – Affects instruction count, possibly CPI
• Programming language – Affects instruction count & CPI
• Compiler – Affects instruction count & CPI
• ISA – Affects instruction count, CPI, & time per cycle
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock Cycle}}$$
Power
Power Trends
Intel Core i5-9400 9th Gen (2019) – 14nm, 2.90-4.1 GHz, 65W
Why isn’t clock frequency continuing to increase?
Battery Life Isn’t Keeping Up
Dynamic Power Consumption
CMOS Transistors – Complementary metal oxide semiconductor
Best performance per watt since 1976
Gradually being replaced by FinFET technology (can go <20nm)
Primary energy consumption is dynamic (switching) energy
$$\text{Power} \propto \frac{1}{2} \times \text{Capacitive Load} \times \text{Voltage}^2 \times \text{Frequency Switched}$$
Reducing voltage from 5V→1V allowed a 1000x increase in clock frequency with only a 30x increase in power consumption
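The effect of voltage scaling can be explored with a short sketch that reads the proportionality above as dynamic power, since it includes the switching frequency. The units are arbitrary and the capacitive load is held constant, which is an assumption made here for illustration.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power ~ 1/2 x capacitive load x voltage^2 x frequency switched."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency

# Arbitrary units; only the ratios matter.
at_5v = dynamic_power(1.0, 5.0, 1.0)
at_1v = dynamic_power(1.0, 1.0, 1.0)
at_1v_1000x = dynamic_power(1.0, 1.0, 1000.0)

print(f"5V -> 1V at the same frequency: {at_5v / at_1v:.0f}x less power")
print(f"1V at 1000x frequency vs 5V: {at_1v_1000x / at_5v:.0f}x more power")
```

With the load held constant this model gives a 40x increase rather than the 30x quoted above, which suggests other factors, such as capacitive load, also changed over that period. The sketch does not model them.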
Reducing Power
Suppose a new CPU has:
• 85% of the capacitive load of the old CPU
• 15% voltage and 15% frequency reduction
Then:
$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$
The new CPU uses 52% of the power of the old CPU
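Because the old CPU's load, voltage, and frequency cancel out, only the scale factors matter. A minimal sketch, with `power_ratio` as an illustrative name:

```python
def power_ratio(c_scale, v_scale, f_scale):
    """P_new / P_old when load, voltage, and frequency are each scaled."""
    return c_scale * v_scale ** 2 * f_scale

# 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = power_ratio(0.85, 0.85, 0.85)
print(f"new CPU uses {ratio:.0%} of the old CPU's power")
```

Voltage enters squared, so a voltage reduction contributes two of the four 0.85 factors.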
However:
• Can’t reduce voltage further
• Can’t remove more heat
How else can we improve performance?
Uniprocessor Performance
Since 2003, constrained by power, instruction-level parallelism, and memory latency
Multiprocessors
Multicore Microprocessors – More than one processor per chip
Requires explicitly parallel programming
• Compare with instruction-level parallelism
  • Hardware executes multiple instructions at once
  • Hidden from the programmer!
• Hard to do
  • Programming for performance
  • Load balancing
  • Optimizing communication and synchronization
Benchmarking Processors
SPEC – System Performance Evaluation Cooperative – Funded by computer vendors to provide a standard set of benchmarks
SPECINTC2006 benchmarks on 2.66 GHz Intel Core i7 920
SPEC Power Benchmark
Power consumption of a server at different workload levels
• Performance: ssj_ops/sec
• Power: Watts (Joules/sec)
$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \div \left(\sum_{i=0}^{10} \text{power}_i\right)$$
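The overall metric is a ratio of sums over the 11 workload levels (i = 0 to 10), not an average of per-level ratios. The sketch below uses made-up measurements; the numbers are hypothetical and do not come from any published result.

```python
def overall_ssj_ops_per_watt(ssj_ops, watts):
    """Sum of ssj_ops over all load levels divided by sum of average power."""
    if len(ssj_ops) != len(watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(watts)

# Hypothetical readings at 100%, 90%, ..., 10%, 0% target load (11 levels).
load_levels = list(range(100, -1, -10))
ssj_ops = [load * 1_000 for load in load_levels]   # throughput at each level
watts = [100 + load for load in load_levels]       # 100 W when idle

print(f"levels: {len(load_levels)}")
print(f"overall ssj_ops per Watt: {overall_ssj_ops_per_watt(ssj_ops, watts):.1f}")
```

Because the idle level contributes power but no throughput, a server that draws a lot of power when idle scores worse under this metric.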
Power SPEC
SPECpower_ssj2008 for Xeon X5650
Pitfall: Amdahl’s Law
Improving one aspect does not guarantee a proportional improvement in overall performance
Amdahl’s Law:
$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$
Example: Multiply accounts for 80 of 100 seconds of execution
How much improvement in multiply performance is needed to get a 5x overall improvement?
$$20 = \frac{80}{n} + 20$$
Can’t be done! This would require 80/n = 0, which no finite n satisfies
Corollary: Make the common case fast!
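The limit in this example can be seen numerically by evaluating Amdahl's Law for increasingly large improvement factors. This is a minimal sketch; the function names are illustrative.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Amdahl's Law: only the affected part of the time shrinks."""
    return t_affected / improvement_factor + t_unaffected

def overall_speedup(t_affected, t_unaffected, improvement_factor):
    """Original total time divided by improved total time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, improvement_factor)

# Multiply takes 80 s of a 100 s run; the other 20 s are unaffected.
for n in (2, 10, 100, 1_000_000):
    print(f"multiply {n}x faster -> overall {overall_speedup(80.0, 20.0, n):.3f}x")
```

The overall speedup approaches 5x as n grows but never reaches it, because the unaffected 20 seconds set a floor on the total time.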