Analysis of Intel IA-64 Processor Support for Secure Systems
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Review Memory Disambiguation Review Explicit Register Renaming
5HYLHZ5HRUGHU%XIIHU 52% &6 *UDGXDWH&RPSXWHU$UFKLWHFWXUH 8VHRIUHRUGHUEXIIHU /HFWXUH ² ,QRUGHULVVXH2XWRIRUGHUH[HFXWLRQ,QRUGHUFRPPLW ² +ROGVUHVXOWVXQWLOWKH\FDQEHFRPPLWWHGLQRUGHU ,QVWUXFWLRQ/HYHO3DUDOOHOLVP ª 6HUYHVDVVRXUFHRIYDOXHVXQWLOLQVWUXFWLRQVFRPPLWWHG ² 3URYLGHVVXSSRUWIRUSUHFLVHH[FHSWLRQV6SHFXODWLRQVLPSO\WKURZRXW *HWWLQJWKH&3, LQVWUXFWLRQVODWHUWKDQH[FHSWHGLQVWUXFWLRQ ² &RPPLWVXVHUYLVLEOHVWDWHLQLQVWUXFWLRQRUGHU ² 6WRUHVVHQWWRPHPRU\V\VWHPRQO\ZKHQWKH\UHDFKKHDGRIEXIIHU 6HSWHPEHU ,Q2UGHU&RPPLW LVLPSRUWDQWEHFDXVH 3URI-RKQ.XELDWRZLF] ² $OORZVWKHJHQHUDWLRQRISUHFLVHH[FHSWLRQV ² $OORZVVSHFXODWLRQDFURVVEUDQFKHV &6.XELDWRZLF] &6.XELDWRZLF] /HF /HF 5HYLHZ0HPRU\'LVDPELJXDWLRQ 5HYLHZ([SOLFLW5HJLVWHU5HQDPLQJ 4XHVWLRQ*LYHQDORDGWKDWIROORZVDVWRUHLQSURJUDP 0DNHXVHRIDSK\VLFDO UHJLVWHUILOHWKDWLVODUJHUWKDQ RUGHUDUHWKHWZRUHODWHG" QXPEHURIUHJLVWHUVVSHFLILHGE\,6$ ² 7U\LQJWRGHWHFW5$:KD]DUGVWKURXJKPHPRU\ .H\LQVLJKW$OORFDWHDQHZSK\VLFDOGHVWLQDWLRQUHJLVWHU ² 6WRUHVFRPPLWLQRUGHU 52% VRQR:$5:$:PHPRU\KD]DUGV IRUHYHU\LQVWUXFWLRQWKDWZULWHV ,PSOHPHQWDWLRQ ² 5HPRYHVDOOFKDQFHRI:$5RU:$:KD]DUGV ² .HHSTXHXHRIVWRUHVLQSURJRUGHU ² 6LPLODUWRFRPSLOHUWUDQVIRUPDWLRQFDOOHG6WDWLF6LQJOH$VVLJQPHQW ² :DWFKIRUSRVLWLRQRIQHZORDGVUHODWLYHWRH[LVWLQJVWRUHV ª /LNHKDUGZDUHEDVHGG\QDPLFFRPSLODWLRQ" :KHQKDYHDGGUHVVIRUORDGFKHFNVWRUHTXHXH 0HFKDQLVP".HHSDWUDQVODWLRQWDEOH ² ,IDQ\ VWRUHSULRUWRORDGLVZDLWLQJIRULWVDGGUHVVVWDOOORDG ² ,6$UHJLVWHU⇒ SK\VLFDOUHJLVWHUPDSSLQJ ² ,IORDGDGGUHVVPDWFKHVHDUOLHUVWRUHDGGUHVV DVVRFLDWLYHORRNXS ² :KHQUHJLVWHUZULWWHQUHSODFHHQWU\ZLWKQHZUHJLVWHUIURPIUHHOLVW WKHQZHKDYHDPHPRU\LQGXFHG -
ARM Cortex-A* Brian Eccles, Riley Larkins, Kevin Mee, Fred Silberberg, Alex Solomon, Mitchell Wills
ARM Cortex-A* Brian Eccles, Riley Larkins, Kevin Mee, Fred Silberberg, Alex Solomon, Mitchell Wills The ARM CortexA product line has changed significantly since the introduction of the CortexA8 in 2005. ARM’s next major leap came with the CortexA9 which was the first design to incorporate multiple cores. The next advance was the development of the big.LITTLE architecture, which incorporates both high performance (A15) and high efficiency(A7) cores. Most recently the A57 and A53 have added 64bit support to the product line. The ARM Cortex series cores are all made up of the main processing unit, a L1 instruction cache, a L1 data cache, an advanced SIMD core and a floating point core. Each processor then has an additional L2 cache shared between all cores (if there are multiple), debug support and an interface bus for communicating with the rest of the system. Multicore processors (such as the A53 and A57) also include additional hardware to facilitate coherency between cores. The ARM CortexA57 is a 64bit processor that supports 1 to 4 cores. The instruction pipeline in each core supports fetching up to three instructions per cycle to send down the pipeline. The instruction pipeline is made up of a 12 stage in order pipeline and a collection of parallel pipelines that range in size from 3 to 15 stages as seen below. The ARM CortexA53 is similar to the A57, but is designed to be more power efficient at the cost of processing power. The A57 in order pipeline is made up of 5 stages of instruction fetch and 7 stages of instruction decode and register renaming. -
I386-Engine™ Technical Manual
i386-Engine™ C/C++ Programmable, 32-bit Microprocessor Module Based on the Intel386EX Technical Manual 1950 5 th Street, Davis, CA 95616, USA Tel: 530-758-0180 Fax: 530-758-0181 Email: [email protected] http://www.tern.com Internet Email: [email protected] http://www.tern.com COPYRIGHT i386-Engine, VE232, A-Engine, A-Core, C-Engine, V25-Engine, MotionC, BirdBox, PowerDrive, SensorWatch, Pc-Co, LittleDrive, MemCard, ACTF, and NT-Kit are trademarks of TERN, Inc. Intel386EX and Intel386SX are trademarks of Intel Coporation. Borland C/C++ are trademarks of Borland International. Microsoft, MS-DOS, Windows, and Windows 95 are trademarks of Microsoft Corporation. IBM is a trademark of International Business Machines Corporation. Version 2.00 October 28, 2010 No part of this document may be copied or reproduced in any form or by any means without the prior written consent of TERN, Inc. © 1998-2010 1950 5 th Street, Davis, CA 95616, USA Tel: 530-758-0180 Fax: 530-758-0181 Email: [email protected] http://www.tern.com Important Notice TERN is developing complex, high technology integration systems. These systems are integrated with software and hardware that are not 100% defect free. TERN products are not designed, intended, authorized, or warranted to be suitable for use in life-support applications, devices, or systems, or in other critical applications. TERN and the Buyer agree that TERN will not be liable for incidental or consequential damages arising from the use of TERN products. It is the Buyer's responsibility to protect life and property against incidental failure. TERN reserves the right to make changes and improvements to its products without providing notice. -
Out-Of-Order Execution & Register Renaming
Asanovic/Devadas Spring 2002 6.823 Out-of-Order Execution & Register Renaming Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Asanovic/Devadas Spring 2002 6.823 Scoreboard for In-order Issue Busy[unit#] : a bit-vector to indicate unit’s availability. (unit = Int, Add, Mult, Div) These bits are hardwired to FU's. WP[reg#] : a bit-vector to record the registers for which writes are pending Issue checks the instruction (opcode dest src1 src2) against the scoreboard (Busy & WP) to dispatch FU available? not Busy[FU#] RAW? WP[src1] or WP[src2] WAR? cannot arise WAW? WP[dest] Asanovic/Devadas Spring 2002 Out-of-Order Dispatch 6.823 ALU Mem IF ID Issue WB Fadd Fmul • Issue stage buffer holds multiple instructions waiting to issue. • Decode adds next instruction to buffer if there is space and the instruction does not cause a WAR or WAW hazard. • Any instruction in buffer whose RAW hazards are satisfied can be dispatched (for now, at most one dispatch per cycle). On a write back (WB), new instructions may get enabled. Asanovic/Devadas Spring 2002 6.823 Out-of-Order Issue: an example latency 1 LD F2, 34(R2) 1 1 2 2 LD F4, 45(R3) long 3 MULTD F6, F4, F2 3 4 3 4 SUBD F8, F2, F2 1 5 5 DIVD F4, F2, F8 4 6 ADDD F10, F6, F4 1 6 In-order: 1 (2,1) . 2 3 4 4 3 5 . 5 6 6 Out-of-order: 1 (2,1) 4 4 . 2 3 . 3 5 . -
Dynamic Register Renaming Through Virtual-Physical Registers
Dynamic Register Renaming Through Virtual-Physical Registers † Teresa Monreal [email protected] Antonio González* [email protected] Mateo Valero* [email protected] †† José González [email protected] † Víctor Viñals [email protected] †Departamento de Informática e Ing. de Sistemas. Centro Politécnico Superior - Univ. de Zaragoza, Zaragoza, Spain. *Departament d’ Arquitectura de Computadors. Universitat Politècnica de Catalunya, Barcelona, Spain. ††Departamento de Ingeniería y Tecnología de Computadores. Universidad de Murcia, Murcia, Spain. Abstract Register file access time represents one of the critical delays of current microprocessors, and it is expected to become more critical as future processors increase the instruction window size and the issue width. This paper present a novel dynamic register renaming scheme that delays the allocation of physical registers until a late stage in the pipeline. We show that it can provide important savings in number of physical registers so it can significantly shorter the register file access time. Delaying the allocation of physical registers requires some artifact to keep track of dependences. This is achieved by introducing the concept of virtual-physical registers, which are tags that do not require any storage location. The proposed renaming scheme shortens the average number of cycles that each physical register is allocated, and allows for an early execution of instructions since they can obtain a physical register for its destination earlier than with the conventional scheme. Early execution is especially beneficial for branches and memory operations, since the former can be resolved earlier and the latter can prefetch their data in advance. 1. Introduction Dynamically-scheduled superscalar processors exploit instruction-level parallelism (ILP) by overlapping the execution of instructions in an instruction window. -
Computer Architecture Out-Of-Order Execution
Computer Architecture Out-of-order Execution By Yoav Etsion With acknowledgement to Dan Tsafrir, Avi Mendelson, Lihu Rappoport, and Adi Yoaz 1 Computer Architecture 2013– Out-of-Order Execution The need for speed: Superscalar • Remember our goal: minimize CPU Time CPU Time = duration of clock cycle × CPI × IC • So far we have learned that in order to Minimize clock cycle ⇒ add more pipe stages Minimize CPI ⇒ utilize pipeline Minimize IC ⇒ change/improve the architecture • Why not make the pipeline deeper and deeper? Beyond some point, adding more pipe stages doesn’t help, because Control/data hazards increase, and become costlier • (Recall that in a pipelined CPU, CPI=1 only w/o hazards) • So what can we do next? Reduce the CPI by utilizing ILP (instruction level parallelism) We will need to duplicate HW for this purpose… 2 Computer Architecture 2013– Out-of-Order Execution A simple superscalar CPU • Duplicates the pipeline to accommodate ILP (IPC > 1) ILP=instruction-level parallelism • Note that duplicating HW in just one pipe stage doesn’t help e.g., when having 2 ALUs, the bottleneck moves to other stages IF ID EXE MEM WB • Conclusion: Getting IPC > 1 requires to fetch/decode/exe/retire >1 instruction per clock: IF ID EXE MEM WB 3 Computer Architecture 2013– Out-of-Order Execution Example: Pentium Processor • Pentium fetches & decodes 2 instructions per cycle • Before register file read, decide on pairing Can the two instructions be executed in parallel? (yes/no) u-pipe IF ID v-pipe • Pairing decision is based… On data -
Xv6 Booting: Transitioning from 16 to 32 Bit Mode
238P Operating Systems, Fall 2018 xv6 Boot Recap: Transitioning from 16 bit mode to 32 bit mode 3 November 2018 Aftab Hussain University of California, Irvine BIOS xv6 Boot loader what it does Sets up the hardware. Transfers control to the Boot Loader. BIOS xv6 Boot loader what it does Sets up the hardware. Transfers control to the Boot Loader. how it transfers control to the Boot Loader Boot loader is loaded from the 1st 512-byte sector of the boot disk. This 512-byte sector is known as the boot sector. Boot loader is loaded at 0x7c00. Sets processor’s ip register to 0x7c00. BIOS xv6 Boot loader 2 source source files bootasm.S - 16 and 32 bit assembly code. bootmain.c - C code. BIOS xv6 Boot loader 2 source source files bootasm.S - 16 and 32 bit assembly code. bootmain.c - C code. executing bootasm.S 1. Disable interrupts using cli instruction. (Code). > Done in case BIOS has initialized any of its interrupt handlers while setting up the hardware. Also, BIOS is not running anymore, so better to disable them. > Clear segment registers. Use xor for %ax, and copy it to the rest (Code). 2. Switch from real mode to protected mode. (References: a, b). > Note the difference between processor modes and kernel privilege modes > We do the above switch to increase the size of the memory we can address. BIOS xv6 Boot loader 2 source source file executing bootasm.S m. Let’s 2. Switch from real mode to protected mode. expand on this a little bit Addressing in Real Mode In real mode, the processor sends 20-bit addresses to the memory. -
Protected Mode - Wikipedia
2/12/2019 Protected mode - Wikipedia Protected mode In computing, protected mode, also called protected virtual address mode,[1] is an operational mode of x86- compatible central processing units (CPUs). It allows system software to use features such as virtual memory, paging and safe multi-tasking designed to increase an operating system's control over application software.[2][3] When a processor that supports x86 protected mode is powered on, it begins executing instructions in real mode, in order to maintain backward compatibility with earlier x86 processors.[4] Protected mode may only be entered after the system software sets up one descriptor table and enables the Protection Enable (PE) bit in the control register 0 (CR0).[5] Protected mode was first added to the x86 architecture in 1982,[6] with the release of Intel's 80286 (286) processor, and later extended with the release of the 80386 (386) in 1985.[7] Due to the enhancements added by protected mode, it has become widely adopted and has become the foundation for all subsequent enhancements to the x86 architecture,[8] although many of those enhancements, such as added instructions and new registers, also brought benefits to the real mode. Contents History The 286 The 386 386 additions to protected mode Entering and exiting protected mode Features Privilege levels Real mode application compatibility Virtual 8086 mode Segment addressing Protected mode 286 386 Structure of segment descriptor entry Paging Multitasking Operating systems See also References External links History https://en.wikipedia.org/wiki/Protected_mode -
Hardware-Sensitive Database Operations - II
Faculty of Computer Science Database and Software Engineering Group Hardware-Sensitive Database Operations - II Balasubramanian (Bala) Gurumurthy Advanced Topics in Databases, 2019/May/17 Otto-von-Guericke University of Magdeburg So Far... ● Hardware evolution and current challenges ● Hardware-oblivious vs. hardware-sensitive programming ● Pipelining in RISC computing ● Pipeline Hazards ○ Structural hazard ○ Data hazard ○ Control hazard ● Resolving hazards ○ Loop-Unrolling ○ Predication Bala Gurumurthy | Hardware-Sensitive Database Operations Part II We will see ● Vectorization ○ SIMD Execution ○ SIMD in DBMS Operation ● GPUs in DBMSs ○ Processing Model ○ Handling Synchronization Bala Gurumurthy | Hardware-Sensitive Database Operations Vectorization Leveraging Modern Processing Capabilities Hardware Parallelism One we know already: Pipelining ● Separate chip regions for individual tasks to execute independently ● Advantage: parallelism + sequential execution semantics ● We discussed problems of hazards ● VLSI tech. limits degree up to which pipelining is feasible [Kaeslin, 2008] Bala Gurumurthy | Hardware-Sensitive Database Operations Hardware Parallelism Chip area can be used for other types of parallelism: Computer systems typically use identical hardware circuits, but their function may be controlled by different instruction stream Si: Bala Gurumurthy | Hardware-Sensitive Database Operations Special instances Example of this architecture? Bala Gurumurthy | Hardware-Sensitive Database Operations Special instances Example of this architecture? -
Chapter 2 Powerpc Register Set
Chapter 2 PowerPC Register Set 20 20 This chapter describes the register organization defined by the three levels of the PowerPC U architecture—user instruction set architecture (UISA), virtual environment architecture V (VEA), and operating environment architecture (OEA). The PowerPC architecture defines O register-to-register operations for all computational instructions. Source data for these instructions are accessed from the on-chip registers or are provided as immediate values embedded in the opcode. The three-register instruction format allows specification of a target register distinct from the two source registers, thus preserving the original data for use by other instructions and reducing the number of instructions required for certain operations. Data is transferred between memory and registers with explicit load and store instructions only. Note that the handling of reserved bits in any register is implementation-dependent. Software is permitted to write any value to a reserved bit in a register. However, a subsequent reading of the reserved bit returns 0 if the value last written to the bit was 0 and returns an undefined value (may be 0 or 1) otherwise. This means that even if the last value written to a reserved bit was 1, reading that bit may return 0. 2.1 PowerPC UISA Register Set The PowerPC UISA registers, shown in Figure 2-1, can be accessed by either user- or U supervisor-level instructions (the architecture specification refers to user-level and supervisor-level as problem state and privileged state respectively). The general-purpose registers (GPRs) and floating-point registers (FPRs) are accessed as instruction operands. -
A Comparison of Protection Lookaside Buffers and the PA-RISC
rI3 HEWLETT ~~PACKARD A Comparison ofProtection Lookaside Buffers and the PA·RISC Protection Architecture John Wilkes, Bart Sears Computer Systems Laboratory HPL-92-55 March 1992 processor architecture, Eric Koldinger and others at the University of single address space, 64 Washington Department of Computer Science have bitaddressing, protection, proposed a new model for memory protection in PA-RISe, protection single-address-space architectures. This paper lookaside buffers, PLB, compares the Washington proposal with what translation lookaside already exists in PA-RISC, and suggests some buffers,TLB, cache incremental improvements to the latter that would indexing, virtually provide most of the benefits of the former. addressed caches Internal Accession Date Only © Copyright Hewlett-Packard Company 1992 1 Introduction A recent technical report by Eric Koldinger and others [Koldinger91] proposes a new model for managing protectioninformationinsingle address space processors. This paper is one outcome of discussions withthe authors about the real differences betweentheir proposal and the scheme usedinPA-RISC[Lee89,HPPA90]:it offers a restatement of the newproposal, a comparisonwith the existing PA-RISCarchitecture, and some thoughts onhow the PA-RISCarchitecture mightbe extended in the future to provide some of the benefits that the PLBidea was trying to achieve. The reader is assumed to be familiar withmodem TLB,cache and processor architectures; with luck, sufficient information is provided in this paper that detailed prior knowledge about either the Washington proposal or PA-RISCisn't required. Example To help illustrate some of the finer points of the different protectionmodels, anexample based on the LRPC work of Brian Bershad is used [Bershad90] (see Figure 1). -
Intel® 64 and IA-32 Architectures Software Developer's Manual
Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide, Part 2 NOTE: The Intel® 64 and IA-32 Architectures Software Developer's Manual consists of five volumes: Basic Architecture, Order Number 253665; Instruction Set Reference A-M, Order Number 253666; Instruction Set Reference N-Z, Order Number 253667; System Programming Guide, Part 1, Order Number 253668; System Programming Guide, Part 2, Order Number 253669. Refer to all five volumes when evaluating your design needs. Order Number: 253669-029US November 2008 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANT- ED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR IN- TENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUA- TION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "unde- fined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.