Cache Attacks on ARM
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Memory Hierarchy Memory Hierarchy
Memory Key challenge in modern computer architecture Lecture 2: different memory no point in blindingly fast computation if data can’t be and variable types moved in and out fast enough need lots of memory for big applications Prof. Mike Giles very fast memory is also very expensive [email protected] end up being pushed towards a hierarchical design Oxford University Mathematical Institute Oxford e-Research Centre Lecture 2 – p. 1 Lecture 2 – p. 2 CPU Memory Hierarchy Memory Hierarchy Execution speed relies on exploiting data locality 2 – 8 GB Main memory 1GHz DDR3 temporal locality: a data item just accessed is likely to be used again in the near future, so keep it in the cache ? 200+ cycle access, 20-30GB/s spatial locality: neighbouring data is also likely to be 6 used soon, so load them into the cache at the same 2–6MB time using a ‘wide’ bus (like a multi-lane motorway) L3 Cache 2GHz SRAM ??25-35 cycle access 66 This wide bus is only way to get high bandwidth to slow 32KB + 256KB main memory ? L1/L2 Cache faster 3GHz SRAM more expensive ??? 6665-12 cycle access smaller registers Lecture 2 – p. 3 Lecture 2 – p. 4 Caches Importance of Locality The cache line is the basic unit of data transfer; Typical workstation: typical size is 64 bytes 8 8-byte items. ≡ × 10 Gflops CPU 20 GB/s memory L2 cache bandwidth With a single cache, when the CPU loads data into a ←→ 64 bytes/line register: it looks for line in cache 20GB/s 300M line/s 2.4G double/s ≡ ≡ if there (hit), it gets data At worst, each flop requires 2 inputs and has 1 output, if not (miss), it gets entire line from main memory, forcing loading of 3 lines = 100 Mflops displacing an existing line in cache (usually least ⇒ recently used) If all 8 variables/line are used, then this increases to 800 Mflops. -
Make the Most out of Last Level Cache in Intel Processors In: Proceedings of the Fourteenth Eurosys Conference (Eurosys'19), Dresden, Germany, 25-28 March 2019
http://www.diva-portal.org Postprint This is the accepted version of a paper presented at EuroSys'19. Citation for the original published paper: Farshin, A., Roozbeh, A., Maguire Jr., G Q., Kostic, D. (2019) Make the Most out of Last Level Cache in Intel Processors In: Proceedings of the Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019. ACM Digital Library N.B. When citing this work, cite the original published paper. Permanent link to this version: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-244750 Make the Most out of Last Level Cache in Intel Processors Alireza Farshin∗† Amir Roozbeh∗ KTH Royal Institute of Technology KTH Royal Institute of Technology [email protected] Ericsson Research [email protected] Gerald Q. Maguire Jr. Dejan Kostić KTH Royal Institute of Technology KTH Royal Institute of Technology [email protected] [email protected] Abstract between Central Processing Unit (CPU) and Direct Random In modern (Intel) processors, Last Level Cache (LLC) is Access Memory (DRAM) speeds has been increasing. One divided into multiple slices and an undocumented hashing means to mitigate this problem is better utilization of cache algorithm (aka Complex Addressing) maps different parts memory (a faster, but smaller memory closer to the CPU) in of memory address space among these slices to increase order to reduce the number of DRAM accesses. the effective memory bandwidth. After a careful study This cache memory becomes even more valuable due to of Intel’s Complex Addressing, we introduce a slice- the explosion of data and the advent of hundred gigabit per aware memory management scheme, wherein frequently second networks (100/200/400 Gbps) [9]. -
Migration from IBM 750FX to MPC7447A by Douglas Hamilton European Applications Engineering Networking and Computing Systems Group Freescale Semiconductor, Inc
Freescale Semiconductor AN2808 Application Note Rev. 1, 06/2005 Migration from IBM 750FX to MPC7447A by Douglas Hamilton European Applications Engineering Networking and Computing Systems Group Freescale Semiconductor, Inc. Contents 1 Scope and Definitions 1. Scope and Definitions . 1 2. Feature Overview . 2 The purpose of this application note is to facilitate migration 3. 7447A Specific Features . 12 from IBM’s 750FX-based systems to Freescale’s 4. Programming Model . 16 MPC7447A. It addresses the differences between the 5. Hardware Considerations . 27 systems, explaining which features have changed and why, 6. Revision History . 30 before discussing the impact on migration in terms of hardware and software. Throughout this document the following references are used: • 750FX—which applies to Freescale’s MPC750, MPC740, MPC755, and MPC745 devices, as well as to IBM’s 750FX devices. Any features specific to IBM’s 750FX will be explicitly stated as such. • MPC7447A—which applies to Freescale’s MPC7450 family of products (MPC7450, MPC7451, MPC7441, MPC7455, MPC7445, MPC7457, MPC7447, and MPC7447A) except where otherwise stated. Because this document is to aid the migration from 750FX, which does not support L3 cache, the L3 cache features of the MPC745x devices are not mentioned. © Freescale Semiconductor, Inc., 2005. All rights reserved. Feature Overview 2 Feature Overview There are many differences between the 750FX and the MPC7447A devices, beyond the clear differences of the core complex. This section covers the differences between the cores and then other areas of interest including the cache configuration and system interfaces. 2.1 Cores The key processing elements of the G3 core complex used in the 750FX are shown below in Figure 1, and the G4 complex used in the 7447A is shown in Figure 2. -
Caches & Memory
Caches & Memory Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon, Bala, Bracy, McKee, and Sirer] Programs 101 C Code RISC-V Assembly int main (int argc, char* argv[ ]) { main: addi sp,sp,-48 int i; sw x1,44(sp) int m = n; sw fp,40(sp) int sum = 0; move fp,sp sw x10,-36(fp) for (i = 1; i <= m; i++) { sw x11,-40(fp) sum += i; la x15,n } lw x15,0(x15) printf (“...”, n, sum); sw x15,-28(fp) } sw x0,-24(fp) li x15,1 sw x15,-20(fp) Load/Store Architectures: L2: lw x14,-20(fp) lw x15,-28(fp) • Read data from memory blt x15,x14,L3 (put in registers) . • Manipulate it .Instructions that read from • Store it back to memory or write to memory… 2 Programs 101 C Code RISC-V Assembly int main (int argc, char* argv[ ]) { main: addi sp,sp,-48 int i; sw ra,44(sp) int m = n; sw fp,40(sp) int sum = 0; move fp,sp sw a0,-36(fp) for (i = 1; i <= m; i++) { sw a1,-40(fp) sum += i; la a5,n } lw a5,0(x15) printf (“...”, n, sum); sw a5,-28(fp) } sw x0,-24(fp) li a5,1 sw a5,-20(fp) Load/Store Architectures: L2: lw a4,-20(fp) lw a5,-28(fp) • Read data from memory blt a5,a4,L3 (put in registers) . • Manipulate it .Instructions that read from • Store it back to memory or write to memory… 3 1 Cycle Per Stage: the Biggest Lie (So Far) Code Stored in Memory (also, data and stack) compute jump/branch targets A memory register ALU D D file B +4 addr PC B control din dout M inst memory extend new imm forward pc Stack, Data, Code detect unit hazard Stored in Memory Instruction Instruction Write- ctrl ctrl ctrl Fetch Decode Execute Memory Back IF/ID -
Interconnect Solutions Short Form Catalog
Interconnect Solutions Short Form Catalog How to Search this Catalog This digital catalog provides you with three quick ways to find the products and information you are looking for. Just point and click on the bookmarks to the left, the linked images on the next page or the labeled sections of the table of contents. You can also use the “search” function built into Adobe Acrobat to jump directly to any text reference in this document. Acrobat “Search” function instructions: 1. Press CONTROL + F 2. When the dialog box appears, type in the word or words you are looking for and press ENTER. 3. Depending on your version of Acrobat, it will either take you directly to the first instance found, or display a list of pages where the text can be found. In the latter, click on the link to the pages provided. Interconnect Solutions Short Form Catalog Complete Solutions for the Electronics Industry 3M Electronics offers a comprehensive range of Interconnect Solutions for the electronics industry with a product portfolio that includes connectors, cables, cable assemblies and assembly tooling for a wide variety of applications. 3M is dedicated to innovation, continually developing new products that become an important part of everyday life across many diverse markets. A number of 3M solution categories are based on custom-designed products for specialized applications. 3M Electronics can help you design, modify and customize your product as well as help you to seamlessly integrate our products into your manufacturing process on a global basis. RoHS Compliant Statement “RoHS compliant” means that the product or part does not contain any of the following substances in excess of the following maximum concentration values in any homogeneous material, unless the substance is in an application that is exempt under RoHS: (a) 0.1% (by weight) for lead, mercury, hexavalent chromium, polybrominated biphenyls or polybrominated diphenyl ethers; or (b) 0.01% (by weight) for cadmium. -
The BG News April 15, 1999
Bowling Green State University ScholarWorks@BGSU BG News (Student Newspaper) University Publications 4-15-1999 The BG News April 15, 1999 Bowling Green State University Follow this and additional works at: https://scholarworks.bgsu.edu/bg-news Recommended Citation Bowling Green State University, "The BG News April 15, 1999" (1999). BG News (Student Newspaper). 6484. https://scholarworks.bgsu.edu/bg-news/6484 This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License. This Article is brought to you for free and open access by the University Publications at ScholarWorks@BGSU. It has been accepted for inclusion in BG News (Student Newspaper) by an authorized administrator of ScholarWorks@BGSU. ■" * he BG News Women rally to 'take back the night' sexual assault. istration vivors, they will be able to relate her family. By WENDY SUTO Celesta Haras/ti, a resident of buildings, rain to parts of her story, Kissinger "The more I tell my story, the The BG News BG and a W4W member, said the or shine. The said. less shame and guilt I feel," Kissinger said. "For my situa- Women (and some men) will rally is about issues that are con- keynote "When I decided to disclose tion, I'm glad I didn't tell my take to the streets tonight, pro- sidered taboo by society, such as speaker, my sexual abuse to my family, I parents right away because I claiming a public statement in an rape and incest. She has attended Kendel came out of the closet complete- think it would have been a worse attempt to "Take Back the Night" several TBTN marches at the Kissinger, a ly," Kissinger said. -
Publication Title 1-1962
publication_title print_identifier online_identifier publisher_name date_monograph_published_print 1-1962 - AIEE General Principles Upon Which Temperature 978-1-5044-0149-4 IEEE 1962 Limits Are Based in the rating of Electric Equipment 1-1969 - IEEE General Priniciples for Temperature Limits in the 978-1-5044-0150-0 IEEE 1968 Rating of Electric Equipment 1-1986 - IEEE Standard General Principles for Temperature Limits in the Rating of Electric Equipment and for the 978-0-7381-2985-3 IEEE 1986 Evaluation of Electrical Insulation 1-2000 - IEEE Recommended Practice - General Principles for Temperature Limits in the Rating of Electrical Equipment and 978-0-7381-2717-0 IEEE 2001 for the Evaluation of Electrical Insulation 100-2000 - The Authoritative Dictionary of IEEE Standards 978-0-7381-2601-2 IEEE 2000 Terms, Seventh Edition 1000-1987 - An American National Standard IEEE Standard for 0-7381-4593-9 IEEE 1988 Mechanical Core Specifications for Microcomputers 1000-1987 - IEEE Standard for an 8-Bit Backplane Interface: 978-0-7381-2756-9 IEEE 1988 STEbus 1001-1988 - IEEE Guide for Interfacing Dispersed Storage and 0-7381-4134-8 IEEE 1989 Generation Facilities With Electric Utility Systems 1002-1987 - IEEE Standard Taxonomy for Software Engineering 0-7381-0399-3 IEEE 1987 Standards 1003.0-1995 - Guide to the POSIX(R) Open System 978-0-7381-3138-2 IEEE 1994 Environment (OSE) 1003.1, 2004 Edition - IEEE Standard for Information Technology - Portable Operating System Interface (POSIX(R)) - 978-0-7381-4040-7 IEEE 2004 Base Definitions 1003.1, 2013 -
Opening Plenary March 2021
Opening Plenary March 2021 Glenn Parsons – IEEE 802.1 WG Chair [email protected] 802.1 plenary agenda Monday, March 8th opening Tuesday, March 16th closing • Copyright Policy • Copyright Policy • Call for Patents • Call for Patents • Participant behavior • Participant behavior • Administrative • Membership status • Membership status • Future Sessions • Future Sessions • Sanity check – current projects • 802 EC report • TG reports • Sanity check – current projects • Outgoing Liaisons • Incoming Liaisons • Motions for EC • TG agendas • Motions for 802.1 • Any other business • Any other business 2 INSTRUCTIONS FOR CHAIRS OF STANDARDS DEVELOPMENT ACTIVITIES At the beginning of each standards development meeting the chair or a designee is to: .Show the following slides (or provide them beforehand) .Advise the standards development group participants that: .IEEE SA’s copyright policy is described in Clause 7 of the IEEE SA Standards Board Bylaws and Clause 6.1 of the IEEE SA Standards Board Operations Manual; .Any material submitted during standards development, whether verbal, recorded, or in written form, is a Contribution and shall comply with the IEEE SA Copyright Policy; .Instruct the Secretary to record in the minutes of the relevant meeting: .That the foregoing information was provided and that the copyright slides were shown (or provided beforehand). .Ask participants to register attendance in IMAT: https://imat.ieee.org 3 IEEE SA COPYRIGHT POLICY By participating in this activity, you agree to comply with the IEEE Code of Ethics, all applicable laws, and all IEEE policies and procedures including, but not limited to, the IEEE SA Copyright Policy. .Previously Published material (copyright assertion indicated) shall not be presented/submitted to the Working Group nor incorporated into a Working Group draft unless permission is granted. -
Leverage and Asset Bubbles: Averting Armageddon with Chapter 11?*
The Economic Journal, 120 (May), 500–518. doi: 10.1111/j.1468-0297.2010.02357.x. Ó The Author(s). Journal compilation Ó Royal Economic Society 2010. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA. LEVERAGE AND ASSET BUBBLES: AVERTING ARMAGEDDON WITH CHAPTER 11?* Marcus Miller and Joseph Stiglitz An iconic model with high leverage and overvalued collateral assets is used to illustrate the ampli- fication mechanism driving asset prices to ÔovershootÕ equilibrium when an asset bubble bursts – threatening widespread insolvency and what Richard Koo calls a Ôbalance sheet recessionÕ. Besides interest rates cuts, asset purchases and capital restructuring are key to crisis resolution. The usual bankruptcy procedures for doing this fail to internalise the price effects of asset Ôfire-salesÕ to pay down debts, however. We discuss how official intervention in the form of ÔsuperÕ Chapter 11 actions can help prevent asset price correction causing widespread economic disruption. There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy. Hamlet From 2007 to 2009 a chain of events, beginning with unexpected losses in the US sub-prime mortgage market, was destined to bring the global financial system close to collapse and to drag the world economy into recession. ÔOne of the key challenges posed by this crisisÕ, says Williamson (2009), Ôis to understand how such major consequences can flow from such a seemingly minor eventÕ. Before describing an amplification mechanism involving overpriced assets and excessive leverage, we begin by looking, albeit briefly, at what the current macroeconomic paradigm may have to say. -
Architectural Support for Real-Time Computing Using Generalized Rate Monotonic Theory
744 計 測 と 制 御 シ テ ア Vol.31, No.7 集 ル タ イ ム 分 ス ム 特 リ 散 (1992年7月) ≪展 望 ≫ Architectural Support for Real-Time Computing using Generalized Rate Monotonic Theory Lui SHA* and Shirish S. SATHAYE** Abstract The rate monotonic theory and its generalizations have been adopted by national high tech- nology projects such as the Space Station and has recently been supported by major open stand- ards such as the IEEE Futurebus+ and POSIX. 4. In this paper, we focus on the architectural support necessary for scheduling activities using the generalized rate monotonic theory. We briefly review the theory and provide an application example. Finally we describe the architec- tural requirements for the use of the theory. Key Words: Real-Time Scheduling, Distributed Real-Time System, Rate Monotonic scheduling ● Stability under transient overload. When the 1. Introduction system is overloaded by events and it is im- Real-time computing systems are critical to an possible to meet all the deadlines, we must industrialized nation's technological infrastructure. still guarantee the deadines of selected critical Modern telecommunication systems, factories, de- tasks. fense systems, aircrafts and airports, space stations Generalized rate monotonic scheduling (GRMS) and high energy physics experiments cannot op- theory allows system developers to meet the erate .without them. In real-time applications, above requirements by managing system concur- the correctness of computation depends upon not rency and timing constraints at the level of task- only its results but also the time at which outputs ing and message passing. In essence, this theory are generated. -
NVIDIA Tegra 4 Family CPU Architecture 4-PLUS-1 Quad Core
Whitepaper NVIDIA Tegra 4 Family CPU Architecture 4-PLUS-1 Quad core 1 Table of Contents ...................................................................................................................................................................... 1 Introduction .............................................................................................................................................. 3 NVIDIA Tegra 4 Family of Mobile Processors ............................................................................................ 3 Benchmarking CPU Performance .............................................................................................................. 4 Tegra 4 Family CPUs Architected for High Performance and Power Efficiency ......................................... 6 Wider Issue Execution Units for Higher Throughput ............................................................................ 6 Better Memory Level Parallelism from a Larger Instruction Window for Out-of-Order Execution ...... 7 Fast Load-To-Use Logic allows larger L1 Data Cache ............................................................................. 8 Enhanced branch prediction for higher efficiency .............................................................................. 10 Advanced Prefetcher for higher MLP and lower latency .................................................................... 10 Large Unified L2 Cache ....................................................................................................................... -
Bi-Directional Optical Backplane Bus for General Purpose Multi-Processor B Oard-To-B Oard Optoelectronic Interconnects
JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 13, NO. 6, JUNE 1995 1031 Bi-Directional Optical Backplane Bus for General Purpose Multi-Processor B oard-to-B oard Optoelectronic Interconnects Srikanth Natarajan, Chunhe Zhao, and Ray. T. Chen Absfract- We report for the first time a bidirectional opti- cal backplane bus for a high performance system containing nine multi-chip module (MCM) boards, operating at 632.8 and 1300 nm. The backplane bus reported here employs arrays of multiplexed polymer-based waveguide holograms in conjunction with a waveguiding plate, within which 16 substrate guided waves for 72 (8 x 9) cascaded fanouts, are generated. Data transfer of 1.2 GbUs at 1.3-pm wavelength is demonstrated for a single bus line with 72 cascaded fanouts. Packaging-related issues such as Waveguiding Plate transceiver size and misalignment are embarked upon to provide n a reliable system with a wide bandwidth coverage. Theoretical U hocessor/Memory Board treatment to minimize intensity fluctuations among the nine modules in both directions is further presented and an optimum I High-speed Optoelectronic Transceiver design rule is provided. The backplane bus demonstrated, is for general-purpose and therefore compatible with such IEEE stan- - Waveguide Hologram For Bi-Directional Coupling dardized buses as VMEbus, Futurebus and FASTBUS, and can Fig. 1. Optical equivalent of a section of a single bidirectional electronic function as a backplane bus in existing computing environments. bus line. I. INTRODUCTION needed to preserve the rising and falling edges of the signals HE LIMITATIONS of current computer backplane buses increases. This makes using bulky, expensive, terminated Tstem from their purely electronic interconnects.