Amd “Trinity” Apu

Total Page:16

File Type:pdf, Size:1020Kb

Amd “Trinity” Apu HOT CHIPS 2012 AMD “TRINITY” APU Sebastien Nussbaum AMD Fellow Trinity SOC Architect AMD APU “TRINITY” WITH AMD DISCRETE CLASS GRAPHICS ALL NEW ARCHITECTURE FOR UP TO 50% GPU1 AND UP TO 25% BETTER X86 PERFORMANCE2 . “Piledriver” Cores – Improved performance and power efficiency – 3rd-Gen Turbo Core technology – Quad CPU Core with total of 4MB L2 . 2nd-Gen AMD Radeon™ with DirectX® 11 support – 384 Radeon™ Cores 2.0 . HD Media Accelerator – Accelerates and improves HD playback – Accelerates media conversion – Improves streaming media – Allows for smooth wireless video . Enhanced Display Support – AMD Eyefinity Technology3 – 3 Simultaneous DisplayPort 1.2 or HDMI/DVI links – Up to 4 display heads with display multi-streaming 2 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 “TRINITY” FLOORPLAN 32nm SOI, 246mm2, 1.303BN TRANSISTORS DDR3 Controller Dual Channel DDR3 Memory Controller Memory Scheduler AMD HD Media Accelerator Channel (UVD, AMD Accelerated Accelerator Accelerator Video Converter) L2 L2 MediaHD AMD Northbridge Unified Northbridge Cache Cache GPU AMD Radeon™ GPU Dual Dual Up to 4 Core Core “Piledriver” x86 x86 Cores with total 4MB L2 Module Module HDMI, DisplayPort 1.2, DVI controllers PCI Express® I/O — PCIe Display DP/ Display Controller PCIe® ® 24 lanes, optional digital PCIe® PLL HDMI® display interfaces 3 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 AMD 2ND GENERATION “BULLDOZER” CORE: “PILEDRIVER” 32nm "PILEDRIVER" COMPUTE MODULE x86 CORE REDESIGN . Shared Fetcher / prediction pipeline - 64KB I-Cache . Shared 4-way x86 decoder Prediction Queue ICache L1 BTB . Shared Floating Point Unit - dual 128-bit FMA pipes Fetch Queue L2 BTB . Shared 16-way 2MB L2; Ucode ROM 4 x86 Decoders . Dedicated integer cores – Register renaming based on physical register file Int Int – Unified scheduler per core Scheduler FP Scheduler Scheduler – Way-predicted 16KB L1 D-cache Instr Instr Out-of-order Load-Store Unit Retire Retire -bit – -bit MMX MMX FMAC FMAC AGen AGen AGen AGen 128 128 EX, DIV EX, DIV EX, . ISA additions: FMA3, F16C MULEX, MULEX, . Lightweight profiling support in HW . “Piledriver” performance increase over “Stars” Ld/ST Unit L1 DTLB L1 DCache FP Ld Buffer Ld/ST Unit L1 DTLB L1 DCache – 14% improvement for desktop5 Data Prefetcher Shared L2 Cache – 25% improvement for notebook2 – AMD Turbo Core 3.0 5 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 “PILEDRIVER” CORE FLOOR PLAN Microcode ROM Branch Instruction 64KB L1 Predictor Decode I-Cache Integer Integer Integer Integer Datapath Scheduler Scheduler Datapath 16KB L1 16KB L1 2MB L2 Cache Unit D-Cache D-Cache Load/Store Load/Store Cache L2 Floating-Point Unit 6 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 “PILEDRIVER” IMPROVEMENTS & ENHANCEMENTS VS. “BULLDOZER” “Bulldozer” Hybrid Predictor Augmented with 2nd level predictor ISA extensions FMA3, F16C Improved scheduling FPU and INT Faster Instruction exe SYSCALL & SYSRET HW Divider 2x Larger L1 TLB Improved Store-to- Load Forwarding L2 efficiency and prefetching HW L1 Pre-fetcher improvements improvements 7 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 “PILEDRIVER” IMPROVEMENTS 30% higher . Design optimized for wide 10% lower . Loop Predictor CPU Freq 13 dynamic operational range (0.8V to 1.3V) . Way Predictor . 30% higher frequency at same power vs. “Bulldozer”13 . Dispatch gating based on voltage as “Stars” CPU Core in group size “Llano” . Clock Gating . Reduction in high power flops Efficient . 50% more base product Power . Intelligent L2 content operation management frequency vs. “Llano” at tracking to speed up L2 same 35W SOC TDP Latency reduction flush A10-4600M vs A8-3600M . State save/restore latency improvements to speed up power gating 8 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 MEDIA PROCESSING ACCELERATION AMD’S UNIFIED VIDEO DECODER (UVD) UVD UVD UVD 1st generation 2nd generation AMD A-Series APU H.264 / AVCHD H.264 / AVCHD H.264 / AVCHD VC-1 / WMV profile D VC-1 / WMV profile D VC-1 / WMV profile D Video MPEG-2 MPEG-2 Formats MPEG-4 / DivX Bitstream decode Bitstream decode Bitstream decode Picture-in-Picture Picture-in-Picture Dual stream HD+SD Dual stream HD+SD Features Dual stream HD+HD 10 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 “TRINITY” ACCELERATED VIDEO CONVERTER (“AVC”) Core . Multi-stream hardware H.264 HD Encoder functionality . Power-efficient and faster than real-time6 1080p @60fps Quality . 4:2:0 color sampling video features . Optimizations for scene changes (games and video) . Variable compression quality Interfacing . Audio / Video multiplexing features . Input from frame buffer for transcoding and video conferencing . Input from GPU display engine for wireless display7 11 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 VIDEO ENCODING SYSTEM OPERATION APU x86 Bitrate Feedback GPU Rate Control Current frame (Uncompressed YUV420) Forward Transform (e.g. FDCT) Reference frames in the Intra group of pictures (GOP) Prediction Quantization Entropy 0100110010110 Encode 10111100… Motion H.264 Estimation Compressed Stream AMD’s Accelerated Video Converter 12 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 GPU DESIGN UPDATES FOR GAMING AND COMPUTE 3D ENGINE Command Processor Thread Generator . DirectX® 11 – SM 5.0, OpenCL™ 1.1, DC 11 Instr. Cache Ultra-Threaded Dispatch Processor . GPU Core made of 384 Radeon™ Cores , each Constant capable of 1 SP FMAC per cycle Cache 32kB Local DataShare 32kB Local DataShare 32kB Local DataShare SIMD Engine 1 Engine SIMD 2 Engine SIMD 6 Engine SIMD – Organized as 96 stream processing units – each 4- BufferExport way VLIW (vs. 5-way in Llano) Memory Controller – 6 SIMDs (each contains 16 processing units) – Each SIMD share 1 texture unit – achieving 4:1 ALU:Texture rate Registers GlobalSynchronization . 32 depth / stencil per clock, 8 color per clock Fetch Fetch Fetch 64kB GlobalDataShare . 24x multi-sample and super sample, 16x 8kB 8kB 8kB anisotropic filtering L1 L1 L1 . Improved hardware tessellator vs. “Llano” . Compute improvements 128kB 128kB 128kB 128kB L2 L2 L2 L2 – Asynchronous dispatch: multiple compute kernels with independent address space simultaneously 14 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 PERFORMANCE ACHIEVEMENTS PERFORMANCE INCREASE ON CLIENT WORKLOADS (FOR 35W TDP ) “TRINITY” VS. “LLANO” CPU PERFORMANCE INCLUDING POWER MANAGEMENT, FREQ AND IPC GAINS Digital Media Sony Vegas PowerDirector 9 HD Transcode Handbrake HD Handbrake DVD iTunes Quicktime Pro Photoshop Elements 2.0 0% 10% 20% 30% 40% 50% 60% Web & Productivity: Compression & Cryptography ® PCMark7 Score ® PCMark7 Productivity Windows Send & Compress 7Zip GUIMark2 HTML5 GUIMark2 Bitmap 0% 10% 20% 30% 40% 50% 60% 70% Experimental setup: see footnote 10 16 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 PERFORMANCE AND POWER COMPARISON VS PRIOR-GENERATION ® Visual Performance - 3DMark Vantage Performance Battery Life Hours - Windows Idle See footnote 11 (Est. 62 Whr. Battery) A10 12.2 4530 A10 A8 12.2 3850 A8 Trinity A6 12.2 A6 2650 12.2 Llano A4 A4 2200 0 5 10 15 0 2000 4000 6000 See footnotes 4 and 8 or battery life measurement considerations General Performance - PCMark® Vantage Overall Compute Capacity - Calculated CTP SP GFLOPS See footnote 11 A10 6125 A10 603 A8 405 A8 5656 A6 306 A6 4882 A4 204 A4 4600 0 200 400 600 800 9 0 2000 4000 6000 8000 See footnote Trinity performance based on estimates and/or preliminary benchmarks and are subject to change. 17 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 AMD TURBO CORE 3.0 TECHNOLOGY AMD TURBO CORE 3.0 TECHNOLOGY : OVERVIEW UTILIZE CALCULATED AVAILABLE DYNAMIC THERMAL HEADROOM TO IMPROVE PERFORMANCE . 10-20 ºC variations across die during peak load GPU-dominated workload (3DMark®) CPU-dominated workload (Livermore Loop 1Thread) with single thread application on CPU CPU0=17W, GPU=4.2W CPU0=2.7W, GPU=23.9W CPU0 At Tjmax GPU At Tjmax Die Y Die X Die Y dim dim dim Die X dim Simulation results for engineering discussion – no claims made to applicability to specific configuration of sold products. Chip divided into “Thermal Entities” (TE) – Thermal Entity calculate power and thermal density Fan . Thermal RC network – Transfer coefficients that describe thermal transfer between TEs, substrate and package are characterized – Numerical analysis firmware runs on the management processor which calculates per TE temperatures – TEs are throttled using voltage/frequency adjustments according to workload heuristics 19 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 AMD TURBO CORE 3.0 TECHNOLOGY: CALCULATED VS. MEASURED TEMPERATURE CPU module 0 (Calculated) GPU (Calculated) Hotspot (measured) CPU module 1 (Calculated) 100ºC temp temp Estimated +/- 3- 5C difference in 80ºC 3DMark® Vantage Measured hotspot temperature calculated hotspot 0s time 1200s vs. measured hot 100ºC spot temperature, at steady thermal state temp temp Measured hotspot temperature 80ºC CPU Power Virus Experimental results for engineering review, no observable product functional operational difference results from thermal differences. No claims made to accuracy 20 AMD “Trinity” HotChips 2012 | Sebastien Nussbaum | July 2012 AMD TURBO CORE 3.0 TECHNOLOGY – PERFORMANCE . Workloads of moderate activity have high residency at maximum frequency – Thermal headroom allows hotspot to remain below maximum control temperature . Higher activity workloads offer fewer opportunities to raise frequency
Recommended publications
  • Improving Resource Utilization in Heterogeneous CPU-GPU Systems
    Improving Resource Utilization in Heterogeneous CPU-GPU Systems A Dissertation Presented to the Faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulfillment of the requirements for the Degree Doctor of Philosophy (Computer Engineering) by Michael Boyer May 2013 c 2013 Michael Boyer Abstract Graphics processing units (GPUs) have attracted enormous interest over the past decade due to substantial increases in both performance and programmability. Programmers can potentially leverage GPUs for substantial performance gains, but at the cost of significant software engineering effort. In practice, most GPU applications do not effectively utilize all of the available resources in a system: they either fail to use use a resource at all or use a resource to less than its full potential. This underutilization can hurt both performance and energy efficiency. In this dissertation, we address the underutilization of resources in heterogeneous CPU-GPU systems in three different contexts. First, we address the underutilization of a single GPU by reducing CPU-GPU interaction to improve performance. We use as a case study a computationally-intensive video-tracking application from systems biology. Because of the high cost of CPU-GPU coordination, our initial, straightforward attempts to accelerate this application failed to effectively utilize the GPU. By leveraging some non-obvious optimization strategies, we significantly decreased the amount of CPU-GPU interaction and improved the performance of the GPU implementation by 26x relative to the best CPU implementation. Based on the lessons we learned, we present general guidelines for optimizing GPU applications as well as recommendations for system-level changes that would simplify the development of high-performance GPU applications.
    [Show full text]
  • Oracle Solaris 11.4 Security Target, V1.3
    Oracle Solaris 11.4 Security Target Version 1.3 February 2021 Document prepared by www.lightshipsec.com Oracle Security Target Document History Version Date Author Description 1.0 09 Nov 2020 G Nickel Update TOE version 1.1 19 Nov 2020 G Nickel Update IDR version 1.2 25 Jan 2021 L Turner Update TLS and SSH. 1.3 8 Feb 2021 L Turner Finalize for certification. Page 2 of 40 Oracle Security Target Table of Contents 1 Introduction ........................................................................................................................... 5 1.1 Overview ........................................................................................................................ 5 1.2 Identification ................................................................................................................... 5 1.3 Conformance Claims ...................................................................................................... 5 1.4 Terminology ................................................................................................................... 6 2 TOE Description .................................................................................................................... 9 2.1 Type ............................................................................................................................... 9 2.2 Usage ............................................................................................................................. 9 2.3 Logical Scope ................................................................................................................
    [Show full text]
  • Cryptanalysis of the Random Number Generator of the Windows Operating System
    Cryptanalysis of the Random Number Generator of the Windows Operating System Leo Dorrendorf School of Engineering and Computer Science The Hebrew University of Jerusalem 91904 Jerusalem, Israel [email protected] Zvi Gutterman Benny Pinkas¤ School of Engineering and Computer Science Department of Computer Science The Hebrew University of Jerusalem University of Haifa 91904 Jerusalem, Israel 31905 Haifa, Israel [email protected] [email protected] November 4, 2007 Abstract The pseudo-random number generator (PRNG) used by the Windows operating system is the most commonly used PRNG. The pseudo-randomness of the output of this generator is crucial for the security of almost any application running in Windows. Nevertheless, its exact algorithm was never published. We examined the binary code of a distribution of Windows 2000, which is still the second most popular operating system after Windows XP. (This investigation was done without any help from Microsoft.) We reconstructed, for the ¯rst time, the algorithm used by the pseudo- random number generator (namely, the function CryptGenRandom). We analyzed the security of the algorithm and found a non-trivial attack: given the internal state of the generator, the previous state can be computed in O(223) work (this is an attack on the forward-security of the generator, an O(1) attack on backward security is trivial). The attack on forward-security demonstrates that the design of the generator is flawed, since it is well known how to prevent such attacks. We also analyzed the way in which the generator is run by the operating system, and found that it ampli¯es the e®ect of the attacks: The generator is run in user mode rather than in kernel mode, and therefore it is easy to access its state even without administrator privileges.
    [Show full text]
  • Application-Controlled Frequency Scaling Explained
    The TURBO Diaries: Application-controlled Frequency Scaling Explained Jons-Tobias Wamhoff, Stephan Diestelhorst, and Christof Fetzer, Technische Universät Dresden; Patrick Marlier and Pascal Felber, Université de Neuchâtel; Dave Dice, Oracle Labs https://www.usenix.org/conference/atc14/technical-sessions/presentation/wamhoff This paper is included in the Proceedings of USENIX ATC ’14: 2014 USENIX Annual Technical Conference. June 19–20, 2014 • Philadelphia, PA 978-1-931971-10-2 Open access to the Proceedings of USENIX ATC ’14: 2014 USENIX Annual Technical Conference is sponsored by USENIX. The TURBO Diaries: Application-controlled Frequency Scaling Explained Jons-Tobias Wamhoff Patrick Marlier Dave Dice Stephan Diestelhorst Pascal Felber Christof Fetzer Technische Universtat¨ Dresden, Germany Universite´ de Neuchatel,ˆ Switzerland Oracle Labs, USA Abstract these features from an application as needed. Examples in- Most multi-core architectures nowadays support dynamic volt- clude: accelerating the execution of key sections of code on age and frequency scaling (DVFS) to adapt their speed to the the critical path of multi-threaded applications [9]; boosting system’s load and save energy. Some recent architectures addi- time-critical operations or high-priority threads; or reducing tionally allow cores to operate at boosted speeds exceeding the the energy consumption of applications executing low-priority nominal base frequency but within their thermal design power. threads. Furthermore, workloads specifically designed to run In this paper, we propose a general-purpose library that on processors with heterogeneous cores (e.g., few fast and allows selective control of DVFS from user space to accelerate many slow cores) may take additional advantage of application- multi-threaded applications and expose the potential of hetero- level frequency scaling.
    [Show full text]
  • Vulnerabilities of the Linux Random Number Generator
    Black Hat 2006 Open to Attack Vulnerabilities of the Linux Random Number Generator Zvi Gutterman Chief Technology Officer with Benny Pinkas Tzachy Reinman Zvi Gutterman CTO, Safend Previously a chief architect in the IP infrastructure group for ECTEL (NASDAQ:ECTX) and an officer in the Israeli Defense Forces (IDF) Elite Intelligence unit. Master's and Bachelor's degrees in Computer Science from the Israeli Institute of Technology. Ph.D. candidate at the Hebrew University of Jerusalem, focusing on security, network protocols, and software engineering. - Proprietary & Confidential - Safend Safend is a leading provider of innovative endpoint security solutions that protect against corporate data leakage and penetration via physical and wireless ports. Safend Auditor and Safend Protector deliver complete visibility and granular control over all enterprise endpoints. Safend's robust, ultra- secure solutions are intuitive to manage, almost impossible to circumvent, and guarantee connectivity and productivity, without sacrificing security. For more information, visit www.safend.com. - Proprietary & Confidential - Pseudo-Random-Number-Generator (PRNG) Elementary and critical component in many cryptographic protocols Usually: “… Alice picks key K at random …” In practice looks like random.nextBytes(bytes); session_id = digest.digest(bytes); • Which is equal to session_id = md5(get next 16 random bytes) - Proprietary & Confidential - If the PRNG is predictable the cryptosystem is not secure Demonstrated in - Netscape SSL [GoldbergWagner 96] http://www.cs.berkeley.edu/~daw/papers/ddj-netscape.html Apache session-id’s [GuttermanMalkhi 05] http://www.gutterman.net/publications/2005/02/hold_your_sessions_an_attack_o.html - Proprietary & Confidential - General PRNG Scheme 0 0 01 Stateseed 110 100010 Properties: 1. Pseudo-randomness Output bits are indistinguishable from uniform random stream 2.
    [Show full text]
  • Analysis of Entropy Usage in Random Number Generators
    DEGREE PROJECT IN THE FIELD OF TECHNOLOGY ENGINEERING PHYSICS AND THE MAIN FIELD OF STUDY COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2017 Analysis of Entropy Usage in Random Number Generators KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION Analysis of Entropy Usage in Random Number Generators JOEL GÄRTNER Master in Computer Science Date: September 16, 2017 Supervisor: Douglas Wikström Examiner: Johan Håstad Principal: Omegapoint Swedish title: Analys av entropianvändning i slumptalsgeneratorer School of Computer Science and Communication i Abstract Cryptographically secure random number generators usually require an outside seed to be initialized. Other solutions instead use a continuous entropy stream to ensure that the internal state of the generator always remains unpredictable. This thesis analyses four such generators with entropy inputs. Furthermore, different ways to estimate entropy is presented and a new method useful for the generator analy- sis is developed. The developed entropy estimator performs well in tests and is used to analyse en- tropy gathered from the different generators. Furthermore, all the analysed generators exhibit some seemingly unintentional behaviour, but most should still be safe for use. ii Sammanfattning Kryptografiskt säkra slumptalsgeneratorer behöver ofta initialiseras med ett oförutsägbart frö. En annan lösning är att istället konstant ge slumptalsgeneratorer entropi. Detta gör det möjligt att garantera att det interna tillståndet i generatorn hålls oförutsägbart. I den här rapporten analyseras fyra sådana generatorer som matas med entropi. Dess- utom presenteras olika sätt att skatta entropi och en ny skattningsmetod utvecklas för att användas till analysen av generatorerna. Den framtagna metoden för entropiskattning lyckas bra i tester och används för att analysera entropin i de olika generatorerna.
    [Show full text]
  • Trusted Platform Module ST33TPHF20SPI FIPS 140-2 Security Policy Level 2
    STMICROELECTRONICS Trusted Platform Module ST33TPHF20SPI ST33HTPH2E28AAF0 / ST33HTPH2E32AAF0 / ST33HTPH2E28AAF1 / ST33HTPH2E32AAF1 ST33HTPH2028AAF3 / ST33HTPH2032AAF3 FIPS 140-2 Security Policy Level 2 Firmware revision: 49.00 / 4A.00 HW version: ST33HTPH revision A Date: 2017/07/19 Document Version: 01-11 NON-PROPRIETARY DOCUMENT FIPS 140-2 SECURITY POLICY NON-PROPRIETARY DOCUMENT Page 1 of 43 Table of Contents 1 MODULE DESCRIPTION .................................................................................................................... 3 1.1 DEFINITION ..................................................................................................................................... 3 1.2 MODULE IDENTIFICATION ................................................................................................................. 3 1.2.1 AAF0 / AAF1 ......................................................................................................................... 3 1.2.2 AAF3 ..................................................................................................................................... 4 1.3 PINOUT DESCRIPTION ...................................................................................................................... 6 1.4 BLOCK DIAGRAMS ........................................................................................................................... 8 1.4.1 HW block diagram ...............................................................................................................
    [Show full text]
  • AMD's Early Processor Lines, up to the Hammer Family (Families K8
    AMD’s early processor lines, up to the Hammer Family (Families K8 - K10.5h) Dezső Sima October 2018 (Ver. 1.1) Sima Dezső, 2018 AMD’s early processor lines, up to the Hammer Family (Families K8 - K10.5h) • 1. Introduction to AMD’s processor families • 2. AMD’s 32-bit x86 families • 3. Migration of 32-bit ISAs and microarchitectures to 64-bit • 4. Overview of AMD’s K8 – K10.5 (Hammer-based) families • 5. The K8 (Hammer) family • 6. The K10 Barcelona family • 7. The K10.5 Shanghai family • 8. The K10.5 Istambul family • 9. The K10.5-based Magny-Course/Lisbon family • 10. References 1. Introduction to AMD’s processor families 1. Introduction to AMD’s processor families (1) 1. Introduction to AMD’s processor families AMD’s early x86 processor history [1] AMD’s own processors Second sourced processors 1. Introduction to AMD’s processor families (2) Evolution of AMD’s early processors [2] 1. Introduction to AMD’s processor families (3) Historical remarks 1) Beyond x86 processors AMD also designed and marketed two embedded processor families; • the 2900 family of bipolar, 4-bit slice microprocessors (1975-?) used in a number of processors, such as particular DEC 11 family models, and • the 29000 family (29K family) of CMOS, 32-bit embedded microcontrollers (1987-95). In late 1995 AMD cancelled their 29K family development and transferred the related design team to the firm’s K5 effort, in order to focus on x86 processors [3]. 2) Initially, AMD designed the Am386/486 processors that were clones of Intel’s processors.
    [Show full text]
  • Dev/Random and FIPS
    /dev/random and Your FIPS 140-2 Validation Can Be Friends Yes, Really Valerie Fenwick Manager, Solaris Cryptographic Technologies team Oracle May 19, 2016 Photo by CGP Grey, http://www.cgpgrey.com/ Creative Commons Copyright © 2016, Oracle and/or its affiliates. All rights reserved. Not All /dev/random Implementations Are Alike • Your mileage may vary – Even across OS versions – Solaris 7‘s /dev/random is nothing like Solaris 11’s – Which look nothing like /dev/random in Linux, OpenBSD, MacOS, etc – Windows gets you a whole ‘nother ball of wax… • No common ancestry – Other than concept Copyright © 2016, Oracle and/or its affiliates. All rights reserved 3 /dev/random vs /dev/urandom • On most OSes, /dev/urandom is a PRNG (Pseudo-Random Number Generator) – In some, so is their /dev/random • Traditionally, /dev/urandom will never block – /dev/random will block • For fun, on some OSes /dev/urandom is a link to /dev/random Copyright © 2016, Oracle and/or its affiliates. All rights reserved 4 FreeBSD: /dev/random • /dev/urandom is a link to /dev/random • Only blocks until seeded • Based on Fortuna Copyright © 2016, Oracle and/or its affiliates. All rights reserved 5 OpenBSD: /dev/random • Called /dev/arandom • Does not block • Formerly based on ARCFOUR – Now based on ChaCha20 – C API still named arc4random() Copyright © 2016, Oracle and/or its affiliates. All rights reserved 6 MacOS: /dev/random • /dev/urandom is a link to /dev/random • 160-bit Yarrow PRNG, uses SHA1 and 3DES Copyright © 2016, Oracle and/or its affiliates. All rights reserved 7 Linux: /dev/random • Blocks when entropy is depleted • Has a separate non-blocking /dev/urandom Copyright © 2016, Oracle and/or its affiliates.
    [Show full text]
  • Linux Random Number Generator
    Not-So-Random Numbers in Virtualized Linux and the Whirlwind RNG Adam Everspaugh, Yan Zhai, Robert Jellinek, Thomas Ristenpart, Michael Swift Department of Computer Sciences University of Wisconsin-Madison {ace, yanzhai, jellinek, rist, swift}@cs.wisc.edu Abstract—Virtualized environments are widely thought to user-level cryptographic processes such as Apache TLS can cause problems for software-based random number generators suffer a catastrophic loss of security when run in a VM that (RNGs), due to use of virtual machine (VM) snapshots as is resumed multiple times from the same snapshot. Left as an well as fewer and believed-to-be lower quality entropy sources. Despite this, we are unaware of any published analysis of the open question in that work is whether reset vulnerabilities security of critical RNGs when running in VMs. We fill this also affect system RNGs. Finally, common folklore states gap, using measurements of Linux’s RNG systems (without the that software entropy sources are inherently worse on virtu- aid of hardware RNGs, the most common use case today) on alized platforms due to frequent lack of keyboard and mouse, Xen, VMware, and Amazon EC2. Despite CPU cycle counters interrupt coalescing by VM managers, and more. Despite providing a significant source of entropy, various deficiencies in the design of the Linux RNG makes its first output vulnerable all this, to date there have been no published measurement during VM boots and, more critically, makes it suffer from studies evaluating the security of Linux (or another common catastrophic reset vulnerabilities. We show cases in which the system RNG) in modern virtualized environments.
    [Show full text]
  • Analysis of the Linux Random Number Generator
    Analysis of the Linux Random Number Generator Zvi Gutterman Benny Pinkas Safend and The Hebrew University of Jerusalem University of Haifa Tzachy Reinman The Hebrew University of Jerusalem March 6, 2006 Abstract Linux is the most popular open source project. The Linux random number generator is part of the kernel of all Linux distributions and is based on generating randomness from entropy of operating system events. The output of this generator is used for almost every security protocol, including TLS/SSL key generation, choosing TCP sequence numbers, and file system and email encryption. Although the generator is part of an open source project, its source code (about 2500 lines of code) is poorly documented, and patched with hundreds of code patches. We used dynamic and static reverse engineering to learn the operation of this generator. This paper presents a description of the underlying algorithms and exposes several security vulnerabilities. In particular, we show an attack on the forward security of the generator which enables an adversary who exposes the state of the generator to compute previous states and outputs. In addition we present a few cryptographic flaws in the design of the generator, as well as measurements of the actual entropy collected by it, and a critical analysis of the use of the generator in Linux distributions on disk-less devices. 1 Introduction Randomness is a crucial resource for cryptography, and random number generators are therefore critical building blocks of almost all cryptographic systems. The security analysis of almost any system assumes a source of random bits, whose output can be used, for example, for the purpose of choosing keys or choosing random nonces.
    [Show full text]
  • Random Number Generation (Entropy)
    1 ì Random Number Generation (Entropy) Secure Software Systems Fall 2018 2 Entropy (Thermodynamics) Measure of disorder of any system Entropy will only increase over time! (2nd law of thermodynamics) Secure Software Systems Fall 2018 3 Entropy (Information Science) ì Shannon Entropy ì Measure of unpredictability of a state ì Average information content ì Low entropy ì Very predictable ì Not much information contained ì High entropy ì Unpredictable Desired for cryptography! ì Significant information contained Secure Software Systems Fall 2018 4 Donald Knuth ì Author, The Art of Computer Programming ì Algorithms! ì Creator of TeX typesetting system ì Winner, ACM Turing Award, 1974 “Random numbers should not be generated with a method chosen at random.” – Donald Knuth Secure Software Systems Fall 2018 5 The Market ì High demand for random numbers in cryptography ì Keys ì Nonces ì One-time pads ì Salts Secure Software Systems Fall 2018 6 Great Disasters in Random Numbers ì Netscape Navigator (1995) ì Random numbers used for SSL encryption ì Export laws (at time) limited international version to 40 bit key lengths ì Algorithm RNG_CreateContext() ì Generates seed for RNG ì Get time of day, process ID, parent process ID ì Hash with MD5 ì All predictable! ì Brute force in much fewer than 240 attempts, which was already weak (< 30 hours) https://people.eecs.berkeley.edu/~daw/papers/ddj-netscape.html Secure Software Systems Fall 2018 7 Great Disasters in Random Numbers ì Debian Linux (2006-2008) - CVE-2008-0166 ì OpenSSL package (specific Debian/Ubuntu
    [Show full text]