Peter Hsu, Phd [email protected] Presented at Barcelona Supercomputer Center, Spain 25 June 2018

Total Page:16

File Type:pdf, Size:1020Kb

Peter Hsu, Phd Pete2222@Mac.Com Presented at Barcelona Supercomputer Center, Spain 25 June 2018 Using Cellphone Technology to Build AI Servers Peter Hsu, PhD [email protected] Presented at Barcelona Supercomputer Center, Spain 25 June 2018 C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved Energy is Real Limit of AI 30 GFLOPS / Watt (32-bit Float) 200 PFLOPS @ 13.3 MW 15.1 GFLOPS / Watt (64-bit Float) C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 2 IBM AC922 C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 3 IBM AC922 Where does the energy go? How efficient is it really? Can we do better? C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 4 Summary heat sink • 2× — near-data processing 280 mm = 11 inches 15.0 mm 15.0 mm SoC package • 1.6× — SoC/DRAM 3D Cache Coherent Parallel SoC die layout/package co-design Phone Processor 280 mm C4-bumps • 1.6× — vector accumulator PCB solder balls 15 mm • 1.4× — best consumer electronics process Airflow DRAM +CPU fine-pitch solder balls • 1.4× — 10nm → 7nm Power chip-scale package VRM 10x more energy efficient C4-bumps i.e. 150 GFLOPS/W 64-bit Float FLASFLASH 600 GFLOPS/W 16-bit Float SSD TSV DRAM die stack C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 5 112fJ Cell Array 0.54pJ 3% 12% DRAM 1% 1.48pJ IBM AC922 8% 33% Power Wires & Buffers 22% 24% 2.31pJ GPU Subsystem and Full Server Energy (pJ) Power (W) 52% Component Operation per Op per MADD per Op Chip System Misc read row activation (per bit) 0.54 1.0 3.9 GPUSRAM HBM2 0.025 data transfer within a chip 1.48 2.7 10.6 16% 42% 8.9 35 209 14% DRAM dword / data transfer within a stack 2.3 4.2 16.6 Other MADD data transfer across interposer 0.5 1.0 3.9 CPU Arithmetic cache global wire (35mm x 1 bit) 2.7 5.0 19.6 17% 7% 5.1 12% miss write L2 cache (72-bit SRAM) 2.7 0.1 0.3 read L2 cache (72-bit SRAM) 2.7 2.7 10.6 dload / 21.8 mm local wire (2.5mm x 1 bits) 0.2 12.5 17.6 49.0 37.3 mm NVIDIA MADD V100 write vector register file (64-bit) 2.4 2.4 9.4 215 1,293 GPU read 3 vector operands (SRAM) 7.2 7.2 28.2 300W 300W 300W 300W 300W 300W MADD floating-point MADD (64-bit FP) 8.7 8.7 18.3 34.1 write 1 vector result (SRAM) 2.4 2.4 9.4 other memory interface, control, etc. 14.0 14.0 14.0 55 VRM conversion efficiency 85% 9.6 9.6 38 225 PCIe Card 2 fan (12V, 0.5A) or water pump 6 3.1 3.1 12 72 190W 190W 6 GPU Card Subtotal 300 76.6 76.6 300 1,800 3W x 8 3W x 8 DDR4 row activation (per bit) 0.54 2.5 0.7 16 10 3.0 48 DRAM data transfer within chip, I/O 1.71 7.9 2.3 IBM SRAM 3.4 40 just Power 2 wires & buffers 4.3 16 50 190 380 guesses 9 CPU other 8.5 100 2 CPU power VRM 85% 2.5 29 58 58 Misc 414W ⬆ misc.—PCIe, fans/pumps, etc. 18 414 414 Chasis TOTAL #of primary power supply 85% 17.2 405 405 3,105W 15.1 GFLOP/W 550 140 Grand Total 3,105 C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 6 Cell Array DRAM 1% Power 8% Wires & Buffers 22% 24% Misc GPU 16% SRAM 42% 14% Other CPU Arithmetic 17% 12% 7% 21.8 mm 37.3 mm 8% DRAM access & arithmetic datapath most of 38% moving data & Where does⋏ the energy go? staging in SRAM How efficient is it really? Can we do better? C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 7 Data-Streaming Architecture 300 300 300 300 300 300 190W 190W 3W x 8 3W x 8 CPU+GPU Architecture Misc TOTAL 3,105W Power 9 DRAM DRAM DRAM DRAM out-of- V100 cell array cell array cell array cell array order arithmetic NVMe SSD core I/O I/O I/O I/O register last level file NOTE: coherence ≠ always cache level-2 HBM2 PCIe cache cell array communicating through shared memory i.e. asynchronous DMA memcpy C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 8 Data-Streaming Architecture Example IBM 7094, $2M USD 0.5 MHz clock cycle 0.15 MB main memory SSD to Main Main Memory to Memory Working Memory PCIe 4 Gen3 12 5,376 Cards 8Gt/s 340 GB/s NVLINKS 75 GB/s = 32 x8 GB/s GB/s 1 TB Main Memory Power 9 DRAM DRAM DRAM DRAM out-of- V100 Tape Storage Disk Drive Core Memory Logic Gates cell array cell array cell array cell array order arithmetic NVMe SSD core I/O I/O I/O I/O register last level file 96 GB cache level-2 HBM2 PCIe cell array Working 1962 cache Memory C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 9 near-data processing Architectures SSD SSD SSD SSD SSD SSD UFS 2.1 1 TB Main Memory 307 1.2GB/s DRAM DRAM DRAM DRAM DRAM DRAM GB/s cell array cell array cell array cell array cell array cell array I/O I/O I/O I/O I/O I/O LPDDR4 8,737 GB/s data-streaming X 4266 SoC SoC SoC SoC SoC SoC level-2 level-2 level-2 level-2 level-2 level-2 cache cache cache cache cache cache register register register register register register file file file file file file arithmetic arithmetic arithmetic arithmetic arithmetic arithmetic 64 GB/s Network Network Network Network Network Network Bisection (peak) Gen4 PCIe 16Gt/s x30 SSD to Main Main Memory to Memory Working Memory PCIe 4 Gen3 12 5,376 Cards 8Gt/s 340 GB/s NVLINKS 75 GB/s = 32 x8 GB/s GB/s 1 TB Main Memory Power 9 DRAM DRAM DRAM DRAM out-of- V100 cell array cell array cell array cell array order arithmetic NVMe SSD core I/O I/O I/O I/O register last level file 96 GB cache level-2 HBM2 PCIe cache cell array Working Memory Oracle Labs RAPID parallel computer C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 10 Cache Coherent Parallel Phone Processor 280 mm = 11 inches 15.0 mm 15.0 mm Gen 4 PCIe • 16 Gt/s @ 10-12 inches • Power efficient (≈5pJ/bit) • Strong industry support 280 mm • Volume manufacturing 16×16 Power & Cooling • Like 800W processor Airflow • But already spread out Industry has much experience with 4 GB DRAM, 30GB/s BW DRAM +CPU CC-NUMA, 256 nodes just bigger 0.567 TFLOPS 16-bit Float number than usual Intel PC Power Metric AC922 This Unit SGI Origin 2000 VRM DRAM Capacity 1.1 1.0 TB DRAM Bandwidth 5.7 7.6 TB/s FLASFLASH 16-Bit Compute 145 TFLOPS SSD 64-Bit Compute 47.0 36.3 TFLOPS C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 11 near-data processing Advantages SSD SSD SSD SSD SSD SSD UFS 2.1 1 TB Main Memory 307 1.2GB/s DRAM DRAM DRAM DRAM DRAM DRAM GB/s cell array cell array cell array cell array cell array cell array I/O I/O I/O I/O I/O I/O LPDDR4 8,737 GB/s 280 mm = 11 inches 15.0 mm X 4266 SoC SoC SoC SoC SoC SoC 15.0 mm level-2 level-2 level-2 level-2 level-2 level-2 cache cache cache cache cache cache register register register register register register file file file file file file arithmetic arithmetic arithmetic arithmetic arithmetic arithmetic 64 GB/s Network Network Network Network Network Network Bisection 280 mm (peak) Gen4 PCIe 16Gt/s x30 SSD to Main Main Memory to Memory Working Memory Airflow DRAM +CPU Power VRM Use cache coherency to make physically partitioned memory easy to program FLASFLASH SSD C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 12 Best Consumer Technology 280 mm = 11 inches 15.0 mm 15.0 mm 27 March 2017 280 mm 72.3mm2 7 December 2017 C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 13 Processor over Memory heat sink SoC package SoC die C4-bumps LPDDR4Y PCB solder balls 15 mm Best of both worlds Processor Tile interface 2 mm fine-pitch solder balls 2 mm <1 mm chip-scale package 1 GB DRAM C4-bumps TSV DRAM die stack C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 14 Vector Accumulator Architecture SRAM main memory DRAM main memory DRAM main memory 0.9pJ x 2.5% 0.9pJ x 2.5% L2 Cache Vector Register File 0.9pJ 0.9pJ 0.9pJ Vector Vector × Register File Register File v1 = A[1: N] r1 = A[1: N] 3.25pJ v2 = B[1:N] r2 = B[1:N] 0.9 pJ x 4 + vac = v1 * v2 × r3 = C[1:N] × v3 = C[1:N] r7 = r1 * r2 + r3 3.25pJ vac += v3 vac + r4 = D[1:N] + v4 = D[1:N] r5 = E[1:N] v5 = E[1:N] r8 = r4 * r5 + r7 vac += v4 * v5 r6 = F[1:N] v6 = F[1:N] r9 = r5 * r6 + r8 ∑=1.82pJ ∑=4.52pJ vac += v5 * v6 G[1:N] = r9 G[1:N] = vac CRAY-1 GPU C2P3 C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 15 Vector Accumulator Architecture DRAM main memory 0.9pJ x 2.5% Block floating-point DNN training Mario Drumond Ta o Li n Martin Jaggi Babak Falsafi Vector Register File MSR Contact: Eric Chung February 20th, 2018 0.9pJ 0.9pJ Custom arithmetic for DNN Block floating-point (BFP) for DNNs × Prior work shows mixed results Compromise between fixed- and floating-point 3.25pJ § Half-precision floating-point (FP16): § Limits range of values within a single tensor + § 10x worse area/power than fixed-point § Wide range of values across tensors § Fixed-point: § Dynamically pick quantization points § Limited range vac § Complex techniques to select quantization points ü § Quantization points are static Dot products in fixed-point ✗ Other operations degenerate to floating-point (FP) Key observation: § Large fraction of DNN computations appear in dot products Custom arithmetic for dot products is enough Great candidate for custom DNN representation 3 4 C2P3 C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 16 Dragonfly Network Key idea: leverage 280 mm 15.0 mm on-chip and off-chip 15.0 mm coherence networks 280 mm 256×256=65,536 processor SoC chips C2P3 @ BSC Copyright ©2018 Peter Hsu, All Rights Reserved 17 Together SSD SSD SSD SSD SSD SSD UFS 2.1 1 TB Main Memory 307 1.2GB/s DRAM DRAM DRAM DRAM DRAM DRAM GB/s cell array cell array cell array cell array cell array cell array I/O I/O I/O I/O I/O I/O LPDDR4 8,737 GB/s X 4266 SoC SoC SoC SoC SoC SoC level-2 level-2 level-2 level-2 level-2 level-2 cache cache cache cache cache cache register register register register register register C2P3 SoC and System Energy Power (W) file file file file file file arithmetic arithmetic arithmetic arithmetic arithmetic arithmetic 64 GB/s Component Operation pj pJ/MADD per Chip System Network Network Network Network Network Network Bisection read row activation (per bit) 0.54 0.91 0.07 (peak) LPDDR4X Gen4 0.026 data transfer within a chip 1.93 3.3 5.3 0.25 0.4 106 DRAM PCIe 16Gt/s DW/MADD off chip I/O SERDES 0.7 1.2 0.09 x30 Cell Array local wire (0.5 mm x 1 bits) 0.03 0.9 0.9 0.07 DRAM load DW 2% write register file (64-bit 1.8 0.05 0.05 0.00 14% SRAM) Wires & Buffers MADD read 2 operands (64-bit 3.6 3.6 0.28 Power 10.1 15% instruction SRAM) 25% C2P3 floating-point MADD (64-bit) 6.5 6.5 0.51 1.8 468 SRAM Processor memory interface, control, etc.
Recommended publications
  • Amd Filed: February 24, 2009 (Period: December 27, 2008)
    FORM 10-K ADVANCED MICRO DEVICES INC - amd Filed: February 24, 2009 (period: December 27, 2008) Annual report which provides a comprehensive overview of the company for the past year Table of Contents 10-K - FORM 10-K PART I ITEM 1. 1 PART I ITEM 1. BUSINESS ITEM 1A. RISK FACTORS ITEM 1B. UNRESOLVED STAFF COMMENTS ITEM 2. PROPERTIES ITEM 3. LEGAL PROCEEDINGS ITEM 4. SUBMISSION OF MATTERS TO A VOTE OF SECURITY HOLDERS PART II ITEM 5. MARKET FOR REGISTRANT S COMMON EQUITY, RELATED STOCKHOLDER MATTERS AND ISSUER PURCHASES OF EQUITY SECURITIES ITEM 6. SELECTED FINANCIAL DATA ITEM 7. MANAGEMENT S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS ITEM 7A. QUANTITATIVE AND QUALITATIVE DISCLOSURE ABOUT MARKET RISK ITEM 8. FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA ITEM 9. CHANGES IN AND DISAGREEMENTS WITH ACCOUNTANTS ON ACCOUNTING AND FINANCIAL DISCLOSURE ITEM 9A. CONTROLS AND PROCEDURES ITEM 9B. OTHER INFORMATION PART III ITEM 10. DIRECTORS, EXECUTIVE OFFICERS AND CORPORATE GOVERNANCE ITEM 11. EXECUTIVE COMPENSATION ITEM 12. SECURITY OWNERSHIP OF CERTAIN BENEFICIAL OWNERS AND MANAGEMENT AND RELATED STOCKHOLDER MATTERS ITEM 13. CERTAIN RELATIONSHIPS AND RELATED TRANSACTIONS AND DIRECTOR INDEPENDENCE ITEM 14. PRINCIPAL ACCOUNTANT FEES AND SERVICES PART IV ITEM 15. EXHIBITS, FINANCIAL STATEMENT SCHEDULES SIGNATURES EX-10.5(A) (OUTSIDE DIRECTOR EQUITY COMPENSATION POLICY) EX-10.19 (SEPARATION AGREEMENT AND GENERAL RELEASE) EX-21 (LIST OF AMD SUBSIDIARIES) EX-23.A (CONSENT OF ERNST YOUNG LLP - ADVANCED MICRO DEVICES) EX-23.B
    [Show full text]
  • 4010, 237 8514, 226 80486, 280 82786, 227, 280 a AA. See Anti-Aliasing (AA) Abacus, 16 Accelerated Graphics Port (AGP), 219 Acce
    Index 4010, 237 AIB. See Add-in board (AIB) 8514, 226 Air traffic control system, 303 80486, 280 Akeley, Kurt, 242 82786, 227, 280 Akkadian, 16 Algebra, 26 Alias Research, 169 Alienware, 186 A Alioscopy, 389 AA. See Anti-aliasing (AA) All-In-One computer, 352 Abacus, 16 All-points addressable (APA), 221 Accelerated Graphics Port (AGP), 219 Alpha channel, 328 AccelGraphics, 166, 273 Alpha Processor, 164 Accel-KKR, 170 ALT-256, 223 ACM. See Association for Computing Altair 680b, 181 Machinery (ACM) Alto, 158 Acorn, 156 AMD, 232, 257, 277, 410, 411 ACRTC. See Advanced CRT Controller AMD 2901 bit-slice, 318 (ACRTC) American national Standards Institute (ANSI), ACS, 158 239 Action Graphics, 164, 273 Anaglyph, 376 Acumos, 253 Anaglyph glasses, 385 A.D., 15 Analog computer, 140 Adage, 315 Anamorphic distortion, 377 Adage AGT-30, 317 Anatomic and Symbolic Mapper Engine Adams Associates, 102 (ASME), 110 Adams, Charles W., 81, 148 Anderson, Bob, 321 Add-in board (AIB), 217, 363 AN/FSQ-7, 302 Additive color, 328 Anisotropic filtering (AF), 65 Adobe, 280 ANSI. See American national Standards Adobe RGB, 328 Institute (ANSI) Advanced CRT Controller (ACRTC), 226 Anti-aliasing (AA), 63 Advanced Remote Display Station (ARDS), ANTIC graphics co-processor, 279 322 Antikythera device, 127 Advanced Visual Systems (AVS), 164 APA. See All-points addressable (APA) AED 512, 333 Apalatequi, 42 AF. See Anisotropic filtering (AF) Aperture grille, 326 AGP. See Accelerated Graphics Port (AGP) API. See Application program interface Ahiska, Yavuz, 260 standard (API) AI.
    [Show full text]
  • Printmgr File
    2010 ANNUAL REPORT ON FORM 10-K To Our Stockholders: 2010 was a year of significant accomplishments for AMD. We delivered differentiated products and platforms that expanded customer relationships and drove solid financial results. Our 2010 results demonstrate the benefits of our new business model as a fabless design company. The year also marked the start of the AMD Fusion Era of Computing as we shipped the first Accelerated Processing Units (APUs) to customers worldwide. APUs combine an unprecedented level of processing performance with low power consumption, enabling incredibly vivid and lifelike computing experiences. We believe that as we enter 2011, we are well positioned to drive profitable growth. We plan to capitalize on our position in the industry as the only company with both leading-edge graphics and x86 microprocessors. We plan to seize the opportunity ahead and lead this next era of the industry based on our AMD Fusion family of processors. Consistent financial results with a new fabless business model We achieved many financial milestones in 2010, increasing revenue 20 percent year-over-year, restructuring our balance sheet, reducing our overall debt, improving our gross margin and delivering positive adjusted non-GAAP free cash flow. Most importantly, we demonstrated that our fabless business model works, generating profitability at the operating income level every quarter in 2010. We improved our balance sheet by reducing our overall debt. Without taking into account GLOBALFOUNDRIES’ indebtedness, we reduced our debt by approximately $357 million during 2010. We remain focused on adhering to strong financial discipline and believe that we are positioned for continued profitable growth in 2011.
    [Show full text]
  • United States Securities and Exchange Commission Form
    Table of Contents UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 FORM 10-K (Mark One) x ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934. For the fiscal year ended December 29, 2012 OR ¨ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934. For the transition period from to Commission File Number 001-07882 ADVANCED MICRO DEVICES, INC. (Exact name of registrant as specified in its charter) Delaware 94-1692300 (State or other jurisdiction of incorporation or organization) (I.R.S. Employer Identification No.) One AMD Place, Sunnyvale, California 94088 (Address of principal executive offices) (Zip Code) (408) 749-4000 (Registrant’s telephone number, including area code) Securities registered pursuant to Section 12(b) of the Act: (Title of each class) (Name of each exchange on which registered) Common Stock per share $0.01 par value New York Stock Exchange Securities registered pursuant to Section 12(g) of the Act: None Indicate by check mark if the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act. Yes x No ¨ Indicate by check mark if the registrant is not required to file reports pursuant to Section 13 or Section 15(d) of the Exchange Act. Yes ¨ No x Indicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days.
    [Show full text]
  • ADVANCED MICRO DEVICES, INC. (Exact Name of Registrant As Specified in Its Charter)
    Table of Contents UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 FORM 10-K (Mark One) x ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934. For the fiscal year ended December 31, 2011 OR ¨ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934. For the transition period from to Commission File Number 001-07882 ADVANCED MICRO DEVICES, INC. (Exact name of registrant as specified in its charter) Delaware 94-1692300 (State or other jurisdiction of (I.R.S. Employer incorporation or organization) Identification No.) One AMD Place, Sunnyvale, California 94088 (Address of principal executive offices) (Zip Code) (408) 749-4000 (Registrant’s telephone number, including area code) Securities registered pursuant to Section 12(b) of the Act: (Title of each class) (Name of each exchange on which registered) Common Stock per share $0.01 par value New York Stock Exchange Securities registered pursuant to Section 12(g) of the Act: None Indicate by check mark if the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act. Yes x No ¨ Indicate by check mark if the registrant is not required to file reports pursuant to Section 13 or Section 15(d) of the Exchange Act. Yes ¨ No x Indicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days.
    [Show full text]
  • 2011 Annual Report on Form 10-K
    2011 ANNUAL REPORT ON FORM 10-K To Our Stockholders: AMD enters 2012 firmly focused on becoming a solid execution engine, while positioning ourselves to take advantage of growth opportunities driven by a fundamental shift in the computing ecosystem. We refreshed our product portfolio in 2011 to better align with the needs of our customers and consumers. We enabled a new era in vivid computing, with the well-received launch of the world’s first Accelerated Processing Units (APUs) and continued to offer the best HD entertainment experience on the planet with our AMD Radeon™ graphics solutions. In addition, we were pleased to exit the year with an improving server business with the introduction of a new generation of AMD Opteron™ processors. 2011 financial performance We executed well to our financial goals in 2011, and renewed our operational efficiency efforts implementing several initiatives to help reduce costs, streamline processes and speed time to market. While revenue was largely flat year-over-year, solid improvements in non-GAAP adjusted free cash flow allowed us to strategically reduce our debt by more than $200 million in principal value and strengthen our balance sheet. Further, we delivered non-GAAP net income in every quarter. We believe that consistent operational and financial performance will strengthen our ability to strategically invest in the areas of our business that will provide sustainable returns over the long-term. Improving momentum signals promising potential We shipped more than 30 million APUs in 2011, with 11 of the world’s top 12 global computer companies offering APU-based products, including Acer, Asus, Dell, HP, Lenovo, Samsung, Sony and Toshiba.
    [Show full text]
  • UCLA Anderson NVIDIA Corporation, 2001
    UCLA Anderson NVIDIA Corporation, 2001 Exhibit 1—GPUs Advance More Rapidly Than CPUs 100 Millions of Transistors GeForce3 Pentium 4 Pentium III+ GeForce256 GeForce2 Pentium II 10 Pentium III Riva TNT2 Intel CPUs Riva TNT Riva 128ZX Pentium NVIDIA Graphics Chips Riva 128 1 Millions of Operations/Second Pentium 4 Pentium III+ 1000 Intel CPUs Single Operations Pentium III Pentium II 100 GeForce3 Pentium GeForce2 GeForce256 Riva TNT2 10 Riva TNT Riva 128ZX NVIDIA Graphics Chips Riva 128 Floating-Point Operations 1 Jan-93 Jun-94 Oct-95 Mar-97 Jul-98 Dec-99 Apr-01 This case was prepared by Professor Richard P. Rumelt with the research assistance of Gregory Rose, UCLA Anderson M.B.A., 2004 and Oliver Costa, UCLA Anderson MBA Class of 2003. This case was prepared using information provided by NVIDIA Corporation, as well as public sources, to serve as the basis for class discussion rather than to illustrate either effective or ineffective management. © 2004 by Richard P. Rumelt. NVIDIA Corporation, 2001 NVIDIA1 was formed in April, 1993, by Jen‐Hsun Huang, Curtis Priem, and Chris Mala‐ chowsky. Huang had been an engineer‐manager at LSI Logic; Priem and Malachowsky had been CTO and VP for Hardware Engineering respectively, at Sun Microsystems. The founders believed that the time was ripe for the introduction of higher‐quality 3D graphics chips in the PC market. The early years were not auspicious. NVIDIA’s first two products failed and start‐up competitor 3Dfx Interactive came to dominate the 3D graphics market with a 1998 share of 85%. To make matters worse, rapid growth in 3D graphics processors brought industry giant Intel into the fray.
    [Show full text]
  • M&A Seminar Series–Session Eight: CEO View: Top Ten Business Points Summary
    M&A Seminar Series–Session Eight: CEO View: Top Ten Business Points Summary tony zingale, dave orton, ben horowitz, dave healy january 29, 2009 overview 1. Maximize valuation by building a great company 3. Generally, it is best to maximize M&A valuation (not just a great product) that is the leader in through a competitive deal process and by its market segment. You can become a market maintaining an ability to remain independent leader by buying companies to fill out your product and thus having an ability to walk from the deal suite, increase average sales per customer (Opsware; Clarify; Mercury; ATI). Sometimes, and improve your go-to market strategy (e.g., however, a thorough market check is not advisable, Cadence; Opsware; Clarify; Mercury Interactive) such as where a leak about the deal might upset or by using M&A to expand into new markets (ATI; a target’s competitive situation or where there is Opsware). The best deals help the buyer to start only a small number of viable bidders (ATI). Often, selling “business value” to customers’ senior strategic partner discussions will lead to M&A or executives rather than “technology value” to their make it easier to do a rapid market check (Opsware). R&D or IT groups (Mercury Interactive). Shedding weak businesses (e.g., Loudcloud’s web hosting 4. Macroeconomic factors should inform your business) or product lines (e.g., ATI’s low margin assessment of your own valuation. Valuation is graphics boards) to focus on core strengths is relative to that of peers and targets, and “markets critical.
    [Show full text]