Introduction to the Poulson (Intel 9500 Series) Processor
OpenVMS Advanced Technical Boot Camp 2015
Keith Parris / September 29, 2015
© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Information on Poulson from Intel’s ISSCC Paper

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/itanium-poulson-isscc-paper.pdf

Poulson information from Intel’s ISSCC Paper
• Intel presented a paper on Poulson at the International Solid-State Circuits Conference (ISSCC) in February 2011.
From this, we learned:
• Poulson would be built on a 32 nm process (two process generations ahead of Tukwila, which was at 65 nm, skipping the 45 nm process)
• The socket would be compatible with Tukwila
• Poulson would have 8 cores of a brand-new core design
• The front end (instruction fetch) would be decoupled from the back end (instruction execution)
• Poulson could execute and retire as many as 12 instructions per cycle, double Tukwila’s 6
• Power consumption would be 170 watts vs. 185 watts for Tukwila
• Delivery was slated for 2012

Information on Poulson from Intel’s announcement

Poulson information from Intel announcement
Poulson features
• 2 memory execution units
• 2 general purpose integer units
• 2 ALU units
• 2 floating-point units
• 3 branch units
• 1 NOP unit
• New instructions: 4-cycle integer multiply, count leading zeroes
• Better OS control of thread behavior; data access hints; new user-controlled register file to control these hints, allowing compilers much finer-grained control of data cache and TLB policies; multi-line software prefetch
• Increased memory queue sizes
• Scheduler changed to focus on performance and power
• Additional 32 entries in the integer register file
• L3 protected by ECC allowing double-bit error correction and triple-bit detection
• Turbo Boost 2.0 can increase the clock speed of active cores

Comparison between Tukwila and Poulson
Specification                                      | Tukwila                         | Poulson
Issue width (max. instructions executed per cycle) | 6 (2 bundles of 3)              | 12 (4 bundles of 3)
Number of cores                                    | 4                               | 8
Clock frequency                                    | 1.73 GHz                        | 2.53 GHz
Power (Thermal Design Power, or TDP)               | 185 W                           | 170 W
Hyperthreading type                                | Switch-on-Event-or-Timer        | Dual-Domain Fine-Grain
On-chip cache                                      | 30 MB                           | 54 MB
L3 cache                                           | 24 MB                           | 32 MB
Transistors                                        | 2 billion                       | 3.1 billion
QPI link speed                                     | 48 GB/s                         | 128 GB/s
Memory bandwidth                                   | 34 GB/s                         | 45 GB/s
Memory technology                                  | DDR3-800                        | DDR3-1067
Memory sparing                                     | DIMM                            | DIMM and Rank
Process technology                                 | 65 nm                           | 32 nm
QuickPath (QPI) transfer rate                      | 4.8 GT/s                        | 6.4 GT/s
Micro-architecture                                 | Global micro-stall              | Replay and flush mechanisms
Register file                                      | 128 integer; 128 floating-point | 160 integer; 128 floating-point
Die size                                           | 21.5 by 32.5 mm (699 mm²)       | 18.2 by 29.9 mm (544 mm²)
Memory per socket                                  | 256 GB                          | 512 GB
Virtualization                                     | VT-i 2                          | VT-i 3

Poulson processors and the i4 Platform

Different flavors of Poulson chip
• As with Tukwila, models of the chip are available at different prices, with different core counts, frequencies, and on-chip cache sizes.
• So a Poulson chip could be 8-core or only quad-core, and run at different clock rates (and thus execution speeds).
• HP will happily sell you your choice of CPU flavor in the rx2800 i4, BL860c i4, BL870c i4, and BL890c i4.

Performance
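As a rough sanity check on the comparison table, peak instruction throughput per socket can be estimated as issue width × cores × clock frequency. The sketch below does only that arithmetic with the table's figures. The function name is illustrative, not from the slides, and the result is a theoretical ceiling under the assumption that every core issues at full width on every cycle; it is not a benchmark result.

```python
def peak_ginstr_per_sec(issue_width, cores, clock_ghz):
    """Theoretical peak, in billions of instructions per second per socket.

    Assumes every core issues at full width on every cycle, which real
    workloads do not achieve; this is an upper bound only.
    """
    return issue_width * cores * clock_ghz


# Figures taken from the Tukwila/Poulson comparison table.
tukwila_peak = peak_ginstr_per_sec(issue_width=6, cores=4, clock_ghz=1.73)
poulson_peak = peak_ginstr_per_sec(issue_width=12, cores=8, clock_ghz=2.53)

peak_ratio = poulson_peak / tukwila_peak

print(f"Tukwila peak: {tukwila_peak:.2f} G instr/s")
print(f"Poulson peak: {poulson_peak:.2f} G instr/s")
print(f"Peak ratio:   {peak_ratio:.2f}x")
```

The ratio comes from doubling the issue width, doubling the core count, and the clock increase from 1.73 GHz to 2.53 GHz. Sustained performance also depends on memory bandwidth, cache behavior, and how well code fills the instruction bundles, so measured gains should be expected to fall well below this ceiling.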
Performance estimates from Intel and HP
• Intel claimed a 2.4x performance gain over Tukwila
• HP benchmarks on the i4 platform were better, and HP claimed 3x: “Based on HP labs testing that compared HP Integrity blade servers with the Intel Itanium processor 9500 series versus Intel Itanium processor 9300 series, resulting in a 3.29 times performance improvement over the previous generation, rounded to three times the improvement.”

NUMA Configuration of i2 and i4 Servers
i2 Integrity Blades and rx2800 i2 Server
Non-Uniform Memory Access (NUMA)
[Diagrams: CPUs and Memory in 2-socket Servers or Blades with I/O; BL870c i2; BL890c i2]

NUMA Configuration of i2 and i4 Servers
i2 Integrity Servers / Blades
Non-Uniform Memory Access (NUMA)
CPUs and Memory in 2-socket i2 Servers or i2 Blades with I/O
• Fastest memory access is within a socket; access is slower between sockets, and slower still as the number of hops increases, but the worst case is still less than 2X the best-case latency
• BL890c i2 (32p) memory access times from Socket 0 are shown below:

Access time:  217    319    319    425
Relative:     1.0x   1.47x  1.47x  1.96x

Access time:  319    406    430    304
Relative:     1.47x  1.87x  1.98x  1.40x

Implications of 8-core processor on i4 servers
• The Poulson (9500) can have twice as many cores as a Tukwila (9300).
• This means customers who needed a BL890c i2 server can use a BL870c i4 server instead, and still have the same number of cores, and
• Customers who previously needed a BL870c i2 server can use a BL860c i4 server instead.
• This also reduces the NUMA performance penalty, because fewer blades are needed and more memory accesses are local to a socket or local to a blade, and are thus faster.

Simple performance tests: Local lock requests

Simple performance tests: Spinlock acquisition

Simple performance tests: VUPS.COM

Kittson

Kittson information from Intel Poulson announcement
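Returning to the BL890c i2 access times listed in the NUMA section: the relative figures there can be reproduced by dividing each access time by the fastest (socket-local) one. The following is a minimal sketch with illustrative variable names. The slides give no units for the times, so none are assumed here, and the mean at the end assumes accesses are spread evenly across all eight sockets, which is a simplification and not something the slides state.

```python
# Access times from Socket 0 on a BL890c i2, in the order given on the slide.
access_times = [217, 319, 319, 425, 319, 406, 430, 304]

# The fastest access is the one that stays within the local socket.
local_time = min(access_times)

# Relative cost of each access compared with the local case.
relative = [round(t / local_time, 2) for t in access_times]
worst_case = max(relative)

# Assumption (not from the slides): accesses are spread evenly over all sockets.
mean_time = sum(access_times) / len(access_times)
mean_relative = round(mean_time / local_time, 2)

print("Relative latencies:", relative)
print("Worst case:", worst_case)
print("Mean under uniform access:", mean_relative)
```

The worst case stays below 2x, matching the claim on the slide. Halving the number of blades needed for a given core count removes the most distant sockets from the picture, which is why the penalty shrinks on i4 servers.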