And 64-Bit Windows® Operating Systems

Total Page:16

File Type:pdf, Size:1020Kb

And 64-Bit Windows® Operating Systems DEV398 Porting Applications to Windows® for AMD64 Technology Mike Wall MTS Software Engineer, AMD Agenda AMD64 technology AMD64 Instruction Set Architecture AMD OpteronTM and AMD AthlonTM 64 Processor overview Platform features and multiprocessing 64-bit Windows® for AMD64 What to port, and how Maximize multiprocessor performance Tools and additional resources Windows® and AMD64 Technology Unifying theme: Compatibility Processor: Native hardware support for 32-bit and 64-bit x86 code OS: 64-bit Windows® runs 32-bit and 64- bit applications side by side, seamlessly Code: A single C/C++ source code tree compiles to both 32-bit and 64-bit binaries AMD64 Technology AMD64 Instruction Set Architecture AMD64 Technology AMD64 Programmer’s Model 63 31 15 7 0 In x86 Added by AMD64 RAX EAX AH AL 127 0 79 0 SS XMM0 GG EAX xx SS PP 88 EE RR 77 XMM7 EDI XMM8 R8 XMM8 EIP XMM15 R15 AMD64 Technology AMD64 Instruction Set Architecture Support for all x86 instruction extensions MMXTM, SSE, SSE2, 3DNow!TM Full performance with all code Native 32-bit x86 mode Enhanced capability in 64-bit mode 64-bit general purpose registers 64-bit addressing Twice as many general purpose registers Twice as many SSE registers Same familiar x86 instructions AMD64 Technology AMD OpteronTM and AMD AthlonTM 64 Processor Overview AMD64 Technology AMD64 CPU block diagram DDR Memory Controller L1 X86-based Instruction Cache 64-bit L2 Processor Cache Core L1 Data Cache HyperTransport™ . AMD64 Technology Integrated memory controller The word to remember: Latency 1,000’s of MHz 100’s of MHz & Always Increasing & Not Improving MemoryMemory MemoryMemory ControllerController ControllerController AMDAMD Athlon™Athlon™ 64 64 LegacyLegacy x86x86 ChipsetChipset Integrated Memory Controller runs at CPU Core Frequency As CPU frequency increases, the integrated memory controller becomes more efficient, but the Legacy x86 memory controller does not. AMD64 Technology HyperTransportTM Interface TM 16x16 8x8 AMD Athlon 64 TM Graphics HyperTransport HyperTransport I/OI/O HubHub processor @ 6.4GB/s Tunnel @ 800MB/s HyperTransportHyperTransportTM TechnologyTechnology AttributesAttributes UnidirectionalUnidirectional (a(a pairpair ofof links)links) DDRDDR--likelike performanceperformance (800MHz(800MHz == 1600MT/sec)1600MT/sec) 44 bytesbytes widewide …… 6.4GB/sec6.4GB/sec bandwidthbandwidth AMD64 Technology Platform features and multiprocessing AMD64 Technology Performance desktop PC TM AMDAMD Athlon AthlonTM6464 754 PGA 754µ µPGA 16x16 HyperTransportTM @ 1600 MT/s 266/333/400MHz 32bits @ 64-Bit Unbuffered DDR 533Mhz AMD-8151TMTM AGP8X AMD-8151 Graphics Tunnel Graphics Tunnel PCI 8x8 HyperTransport @ 400MT/s 32bits @ 33Mhz TM AMD-8111AMD-8111TM I/O Hub I/O Hub LPC FLASH USB 1.1 SIO USB 2.0 UDMA133 SMbus 1.1 SMbus 2.0 ACR MII AC’97 AC’97 CODEC Audio AMD64 Technology High performance workstation AMD Opteron AMDAMD Opteron™ Opteron™ AMD Opteron 200-333MHz 940 µPGA 16x16 cHyperTransport™ @ 940 µPGA 200-333MHz 144-Bit Reg DDR 940 µPGA 1600MT/s 940 µPGA 144-Bit Reg DDR 16x16 HyperTransport @ 16x16 HyperTransport @ 1600MT/s 1600MT/s 32bits @ 64bits @ TM 100Mhz PCI-X 533Mhz AMD-8151 TM AMD-8131TMTM AGP8X AMD-8151 AMD-8131 Graphics Tunnel PCIX Tunnel Graphics Tunnel PCIX Tunnel 1000 BaseT 8x8 HyperTransport @ 400MT/s Gbit Ethernet Legacy PCI U320 SCSI TM SCSI AMD-8111AMD-8111TM I/O Hub I/O Hub LPC FLASH SIO USB1.0,2.0 AC97 UDMA133 10/100 Ethernet 10/100 Phy 100 BaseT AMD64 Technology 4P server Possible expansion to another 4 processors 200-333MHz 16x16 cHyperTransport™ @ 144-Bit Reg DDR 1600MT/s AMD Opteron AMDAMD Opteron™ Opteron™ AMD Opteron 200-333MHz 940 µPGA Coherent 940940 µ µPGAPGA 144-Bit Reg DDR 940 µPGA HyperTransport Links 16x16 cHyperTransport @ 16x16 cHyperTransport @ 1600MT/s 1600MT/s AMD Opteron AMD Opteron 200-333MHz 200-333MHz AMD Opteron 16x16 cHyperTransport @ AMD Opteron 144-Bit Reg DDR 144-Bit Reg DDR 940940 µ µPGAPGA 1600MT/s 940940 µ µPGAPGA 16x16 HyperTransport @ 1600MT/s 8x8 HyperTransport @ 64bits @ 64bits @ 400MT/s 133Mhz TM 133Mhz AMD-8131 TM PCI-X AMD-8131 PCI-X HypTrans-PCI-X Legacy PCI HotPlug HypTrans-PCI-X HotPlug PCI VGA Graphics 8x8 HyperTransport @ TM 1600MT/s AMD-8111 TM AMD-8111 FLASH I/O Hub I/O Hub 64bits @ TM 64bits @ 1000 BaseT LPC 66Mhz AMD-8131 TM 66Mhz SIO AMD-8131 Gbit Ethernet Zircon BMC PCI-XHyperTransport PCI-X HyperTransport U320 USB1.0,2.0 PCI-X SCSI AC97 100 BaseT PCI-X UDMA100 1000 BaseT 10/100 Ethernet Management LAN Gbit Ethernet 64-bit Windows® for AMD64 Technology 64-bit Windows® for AMD64 32-bit and 64-bit on a single platform An AMD64-based Processor can run both 32- and 64-bit Windows® operating systems STARTSTART BOOTBOOT UPUP UsingUsing 3232 bitbit BIOSBIOS 32-bit LookLook 64-bit LoadLoad 3232 bitbit OSOS LoadLoad 6464 bitbit OSOS atat OSOS RunRun 3232 bitbit RunRun 3232 && 6464 ApplicationsApplications bitbit appsapps 64-bit Windows® for AMD64 Full backward compatibility with 32-bit Existing 32-bit applications run great WoW64 = Windows-on-Windows 64 32-bit applications run on the hardware no emulation anywhere with AMD64 Performance in WoW64 is not impaired slight overhead for OS calls sometimes compensated by faster OS Users can keep all their existing software 64-bit Windows® for AMD64 Technology WoW64 operation and compatibility 32-bit thread 64-bit thread 32-bit32-bit ApplicationApplication 6464-bit-bit ApplicationApplication WoW64 6464-bit-bit OperatingOperating SystemSystem 64-bit Device Drivers AMD64: What to port, and how What apps will benefit from 64 bits? Large memory! essentially unlimited virtual address space physical memory only limited by platform 8GB per CPU is expected to be common next year More registers and “big number” math Codecs, simulation, 3D, games Compression, encryption, finance Not everyone needs to port: 32-bit runs fine! AMD64: What to port, and how Some code really must be ported Drivers device drivers must “match the OS” 64-bit OS requires 64-bit drivers there are presentations focused on drivers Code libraries and .dll’s customers who port will require 64-bit versions You can’t mix 32 and 64-bit application code but OS IPC mechanisms work 32 ÅÆ 64 AMD64: What to port, and how Moving up to 64-bit mode A single source code tree Æ compile to 32-bit and 64-bit Windows® platforms! Programming for 64-bit Windows® is basically the same as 32-bit same API, a few different data types Types int and long remain 32 bits Only pointers expand to 64 bits “P64” Use size_t and new polymorphic types AMD64: What to port, and how Use portable / scalable data types Use int where only 32 bits are needed e.g. Index variable for a small array Use size_t where array may grow beyond the 32-bit limit Special polymorphic types for pointer math INT_PTR, UINT_PTR, LONG_PTR, ULONG_PTR These types scale to match the 32/64-bit mode AMD64: What to port, and how Code example #1 char *p; long Lval; Lval = p; // bad! truncated in 64-bit mode char *p; LONG_PTR Lval; // either 32 bits or 64 bits Lval = p; // works fine AMD64: What to port, and how Code example #2 for (int i = 0; i < S; i++) { A[i] += 3; } If S may exceed 32 bits, use: for (size_t i = 0; i < S; i++) { A[i] += 3; } AMD64: What to port, and how Code example #3 objectStart = (PVOID) ((ULONG)objectStart | 1); This is bad because we are casting a pointer to a long; it will get truncated when we compile for 64-bit objectStart = (PVOID) ((ULONG_PTR)objectStart | 1); Proper use of polymorphic data type AMD64: What to port, and how Assorted portability tips -1 != 0xFFFFFFFF (just use “-1”) Many Windows® apps use DWORD a DWORD is always 32 bits, make sure that is really what you want Use %p in a printf, to print a pointer Data alignment can affect performance there are alignment requirements in the ABI structure padding can affect portability AMD64: What to port, and how Explicitly sized types INT64 and UINT64 are always 64 bits INT32 and UINT32 are always 32 bits Think carefully before using these most often you really want INT_PTR etc. explicit size type are useful for shared data but you can typically just use int and long AMD64: What to port, and how Microsoft C compiler and libraries The 64-bit compiler emits SSE / SSE2 code The string and memory functions are optimized and should be used AMD provides optimized math libraries: ACML = AMD Core Math Libraries (BLAS, FFT, LAPACK, and more) AMD64: What to port, and how Microsoft C compiler and libraries (2) In-line assembly code is not supported put assembly code in a separate MASM file Compiler intrinsics are provided for effectively in-lining many SSE functions and other special functions AMD64: What to port, and how Porting x86 assembly code Assembly code is straightforward to port And worth doing for performance reasons In-line assembly no longer supported Use separate MASM file, compile .obj and link Take advantage of the new registers! GPR regs r8-r15, SSE regs xmm8-xmm15 New 64-bit stack frames, calling convention Carefully read the ABI document! ÆÆ “AMD64 Software Conventions” AMD64: What to port, and how Using the 64-bit registers 32-bit mov edx, 66 mov eax, [ecx + edx*4] mov ecx, [eax] //32-bit pointer 64-bit mov r10, 66 mov rax, [rcx + r10*8] mov rcx, [rax] //64-bit pointer AMD64: What to port, and how 64-bit regs: upper half gets cleared 32-bit code might look like this xor edx, edx // clear all bits mov dx, 66 // load 16-bit value 64-bit code looks like this mov edx, 66 // load 32-bit value // upper half is cleared AMD64: What to port, and how Function call args passed in regs 32-bit code might look like this push
Recommended publications
  • 07 Vectorization for Intel C++ & Fortran Compiler .Pdf
    Vectorization for Intel® C++ & Fortran Compiler Presenter: Georg Zitzlsberger Date: 10-07-2015 1 Agenda • Introduction to SIMD for Intel® Architecture • Compiler & Vectorization • Validating Vectorization Success • Reasons for Vectorization Fails • Intel® Cilk™ Plus • Summary 2 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Vectorization • Single Instruction Multiple Data (SIMD): . Processing vector with a single operation . Provides data level parallelism (DLP) . Because of DLP more efficient than scalar processing • Vector: . Consists of more than one element . Elements are of same scalar data types (e.g. floats, integers, …) • Vector length (VL): Elements of the vector A B AAi i BBi i A B Ai i Bi i Scalar Vector Processing + Processing + C CCi i C Ci i VL 3 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Evolution of SIMD for Intel Processors Present & Future: Goal: Intel® MIC Architecture, 8x peak FLOPs (FMA) over 4 generations! Intel® AVX-512: • 512 bit Vectors • 2x FP/Load/FMA 4th Generation Intel® Core™ Processors Intel® AVX2 (256 bit): • 2x FMA peak Performance/Core • Gather Instructions 2nd Generation 3rd Generation Intel® Core™ Processors Intel® Core™ Processors Intel® AVX (256 bit): • Half-float support • 2x FP Throughput • Random Numbers • 2x Load Throughput Since 1999: Now & 2010 2012 2013 128 bit Vectors Future Time 4 Optimization Notice
    [Show full text]
  • Exploring the X64
    Exploring the x64 Junichi Murakami Executive Officer, Director of Research Fourteenforty Research Institute, Inc. Who am I? • Junichi Murakami – @Fourteenforty Research Institute, Inc. – Both Windows and Linux kernel development – Reversing malware and P2P software, etc. – Speaker at: • Black Hat 2008 US and Japan, AVAR 2009, RSA Conference(2009-) – Instructor at Security & Programming Camp(2006-) 2 Environment • Windows 7 x64 Edition • Visual Studio 2008 • Windbg • IDA Pro Advanced – STD doesn’t support x64, an offering is needed! 4 Agenda • Windows x64 • ABI(Application Binary Interface) • API Hooking • Code Injection 5 Windows x64 • Native x64 and WoW64 • Virtual Address Space – 2^64 = 16 Exa Byte ( Exa: 10^18) – but, limited to 16TB by Microsoft • File/Registry reflection • New 64-bit APIs – IsWow64Process, GetNativeSystemInfo, etc. 6 ABI • Binary Format • Register • Calling Convention • Exception Handling • Systemcall(x64, WoW64) 11 Binary Format(Cont.) • Some fields were extended to 64-bits – IMAGE_NT_HEADERS.IMAGE_OPTIONAL_HEADER • ImageBase • SizeOfStackReserve • SizeOfStackCommit • SizeOfHeapReserve • SizeOfHeapCommit 13 Calling Convention • first 4 parameters are passed by RCX, RDX, R8, R9 – 5th and later are passed on the stack • caller allocates register home space on the stack • RAX is used for return values • leaf / non-leaf function – leaf function: never use stack – PE32+ contains non-leaf function’s information in its EXCEPTION DIRECTORY • Register’s volatility – volatile: RAX, RCX, RDX, R8-R11 15 Exception Handling •
    [Show full text]
  • Through the Looking Glass: Webcam Interception and Protection in Kernel
    VIRUS BULLETIN www.virusbulletin.com Covering the global threat landscape THROUGH THE LOOKING GLASS: and WIA (Windows Image Acquisition), which provides a WEBCAM INTERCEPTION AND still image acquisition API. PROTECTION IN KERNEL MODE ATTACK VECTORS Ronen Slavin & Michael Maltsev Reason Software, USA Let’s pretend for a moment that we’re the bad guys. We have gained control of a victim’s computer and we can run any code on it. We would like to use his camera to get a photo or a video to use for our nefarious purposes. What are our INTRODUCTION options? When we talk about digital privacy, the computer’s webcam The simplest option is just to use one of the user-mode APIs is one of the most relevant components. We all have a tiny mentioned previously. By default, Windows allows every fear that someone might be looking through our computer’s app to access the computer’s camera, with the exception of camera, spying on us and watching our every move [1]. And Store apps on Windows 10. The downside for the attackers is while some of us think this scenario is restricted to the realm that camera access will turn on the indicator LED, giving the of movies, the reality is that malware authors and threat victim an indication that somebody is watching him. actors don’t shy away from incorporating such capabilities A sneakier method is to spy on the victim when he turns on into their malware arsenals [2]. the camera himself. Patrick Wardle described a technique Camera manufacturers protect their customers by incorporating like this for Mac [8], but there’s no reason the principle into their devices an indicator LED that illuminates when can’t be applied to Windows, albeit with a slightly different the camera is in use.
    [Show full text]
  • Analysis of SIMD Applicability to SHA Algorithms O
    1 Analysis of SIMD Applicability to SHA Algorithms O. Aciicmez Abstract— It is possible to increase the speed and throughput of The remainder of the paper is organized as follows: In an algorithm using parallelization techniques. Single-Instruction section 2 and 3, we introduce SIMD concept and the SIMD Multiple-Data (SIMD) is a parallel computation model, which has architecture of Intel including MMX technology and SSE already employed by most of the current processor families. In this paper we will analyze four SHA algorithms and determine extensions. Section 4 describes SHA algorithm and Section possible performance gains that can be achieved using SIMD 5 discusses the possible improvements on SHA performance parallelism. We will point out the appropriate parts of each that can be achieved by using SIMD instructions. algorithm, where SIMD instructions can be used. II. SIMD PARALLEL PROCESSING I. INTRODUCTION Single-instruction multiple-data execution model allows Today the security of a cryptographic mechanism is not the several data elements to be processed at the same time. The only concern of cryptographers. The heavy communication conventional scalar execution model, which is called single- traffic on contemporary very large network of interconnected instruction single-data (SISD) deals only with one pair of data devices demands a great bandwidth for security protocols, and at a time. The programs using SIMD instructions can run much hence increasing the importance of speed and throughput of a faster than their scalar counterparts. However SIMD enabled cryptographic mechanism. programs are harder to design and implement. A straightforward approach to improve cryptographic per- The most common use of SIMD instructions is to perform formance is to implement cryptographic algorithms in hard- parallel arithmetic or logical operations on multiple data ware.
    [Show full text]
  • Fibre Channel Interface
    Fibre Channel Interface Fibre Channel Interface ©2006, Seagate Technology LLC All rights reserved Publication number: 100293070, Rev. A March 2006 Seagate and Seagate Technology are registered trademarks of Seagate Technology LLC. SeaTools, SeaFONE, SeaBOARD, SeaTDD, and the Wave logo are either registered trade- marks or trademarks of Seagate Technology LLC. Other product names are registered trade- marks or trademarks of their owners. Seagate reserves the right to change, without notice, product offerings or specifications. No part of this publication may be reproduced in any form without written permission of Seagate Technol- ogy LLC. Revision status summary sheet Revision Date Writer/Engineer Sheets Affected A 03/08/06 C. Chalupa/J. Coomes All iv Fibre Channel Interface Manual, Rev. A Contents 1.0 Contents . i 2.0 Publication overview . 1 2.1 Acknowledgements . 1 2.2 How to use this manual . 1 2.3 General interface description. 2 3.0 Introduction to Fibre Channel . 3 3.1 General information . 3 3.2 Channels vs. networks . 4 3.3 The advantages of Fibre Channel . 4 4.0 Fibre Channel standards . 5 4.1 General information . 6 4.1.1 Description of Fibre Channel levels . 6 4.1.1.1 FC-0 . .6 4.1.1.2 FC-1 . .6 4.1.1.3 FC-1.5 . .6 4.1.1.4 FC-2 . .6 4.1.1.5 FC-3 . .6 4.1.1.6 FC-4 . .7 4.1.2 Relationship between the levels. 7 4.1.3 Topology standards . 7 4.1.4 FC Implementation Guide (FC-IG) . 7 4.1.5 Applicable Documents .
    [Show full text]
  • AMD Opteron™ Shared Memory MP Systems Ardsher Ahmed Pat Conway Bill Hughes Fred Weber Agenda
    AMD Opteron™ Shared Memory MP Systems Ardsher Ahmed Pat Conway Bill Hughes Fred Weber Agenda • Glueless MP systems • MP system configurations • Cache coherence protocol • 2-, 4-, and 8-way MP system topologies • Beyond 8-way MP systems September 22, 2002 Hot Chips 14 2 AMD Opteron™ Processor Architecture DRAM 5.3 GB/s 128-bit MCT CPU SRQ XBAR HT HT HT 3.2 GB/s per direction 3.2 GB/s per direction @ 0+]'DWD5DWH @ 0+]'DWD5DWH 3.2 GB/s per direction @ 0+]'DWD5DWH HT = HyperTransport™ technology September 22, 2002 Hot Chips 14 3 Glueless MP System DRAM DRAM MCT CPU MCT CPU SRQ SRQ non-Coherent HyperTransport™ Link XBAR XBAR HT I/O I/O I/O HT cHT cHT cHT cHT Coherent HyperTransport ™ cHT cHT I/O I/O HT HT cHT cHT XBAR XBAR CPU MCT CPU MCT SRQ SRQ HT = HyperTransport™ technology DRAM DRAM September 22, 2002 Hot Chips 14 4 MP Architecture • Programming model of memory is effectively SMP – Physical address space is flat and fully coherent – Far to near memory latency ratio in a 4P system is designed to be < 1.4 – Latency difference between remote and local memory is comparable to the difference between a DRAM page hit and a DRAM page conflict – DRAM locations can be contiguous or interleaved – No processor affinity or NUMA tuning required • MP support designed in from the beginning – Lower overall chip count results in outstanding system reliability – Memory Controller and XBAR operate at the processor frequency – Memory subsystem scale with frequency improvements September 22, 2002 Hot Chips 14 5 MP Architecture (contd.) • Integrated Memory Controller
    [Show full text]
  • Minimum Hardware and Operating System
    Hardware and OS Specifications File Stream Document Management Software – System Requirements for v4.5 NB: please read through carefully, as it contains 4 separate specifications for a Workstation PC, a Web PC, a Server and a Web Server. Further notes are at the foot of this document. If you are in any doubt as to which specification is applicable, please contact our Document Management Technical Support team – we will be pleased to help. www.filestreamsystems.co.uk T Support +44 (0) 118 989 3771 E Support [email protected] For an in-depth list of all our features and specifications, please visit: http://www.filestreamsystems.co.uk/document-management-specification.htm Workstation PC Processor (CPU) ⁴ Supported AMD/Intel x86 (32bit) or x64 (64bit) Compatible Minimum Intel Pentium IV single core 1.0 GHz Recommended Intel Core 2 Duo E8400 3.0 GHz or better Operating System ⁴ Supported Windows 8, Windows 8 Pro, Windows 8 Enterprise (32bit, 64bit) Windows 10 (32bit, 64bit) Memory (RAM) ⁵ Minimum 2.0 GB Recommended 4.0 GB Storage Space (Disk) Minimum 50 GB Recommended 100 GB Disk Format NTFS Format Recommended Graphics Card Minimum 128 MB DirectX 9 Compatible Recommended 128 MB DirectX 9 Compatible Display Minimum 1024 x 768 16bit colour Recommended 1280 x 1024 32bit colour Widescreen Format Yes (minimum vertical resolution 800) Dual Monitor Yes Font Settings Only 96 DPI font settings are supported Explorer Internet Minimum Microsoft Internet Explorer 11 Network (LAN) Minimum 100 MB Ethernet (not required on standalone PC) Recommended
    [Show full text]
  • Architecture and Application of Infortrend Infiniband Design
    Architecture and Application of Infortrend InfiniBand Design Application Note Version: 1.3 Updated: October, 2018 Abstract: Focusing on the architecture and application of InfiniBand technology, this document introduces the architecture, application scenarios and highlights of the Infortrend InfiniBand host module design. Infortrend InfiniBand Host Module Design Contents Contents ............................................................................................................................................. 2 What is InfiniBand .............................................................................................................................. 3 Overview and Background .................................................................................................... 3 Basics of InfiniBand .............................................................................................................. 3 Hardware ....................................................................................................................... 3 Architecture ................................................................................................................... 4 Application Scenarios for HPC ............................................................................................................. 5 Current Limitation .............................................................................................................................. 6 Infortrend InfiniBand Host Board Design ............................................................................................
    [Show full text]
  • SIMD Extensions
    SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel.
    [Show full text]
  • Chapter 6 MIDI, SCSI, and Sample Dumps
    MIDI, SCSI, and Sample Dumps SCSI Guidelines Chapter 6 MIDI, SCSI, and Sample Dumps SCSI Guidelines The following sections contain information on using SCSI with the K2600, as well as speciÞc sections dealing with the Mac and the K2600. Disk Size Restrictions The K2600 accepts hard disks with up to 2 gigabytes of storage capacity. If you attach an unformatted disk that is larger than 2 gigabytes, the K2600 will still be able to format it, but only as a 2 gigabyte disk. If you attach a formatted disk larger than 2 gigabytes, the K2600 will not be able to work with it; you could reformat the disk, but thisÑof courseÑwould erase the disk entirely. Configuring a SCSI Chain Here are some basic guidelines to follow when conÞguring a SCSI chain: 1. According to the SCSI SpeciÞcation, the maximum SCSI cable length is 6 meters (19.69 feet). You should limit the total length of all SCSI cables connecting external SCSI devices with Kurzweil products to 17 feet (5.2 meters). To calculate the total SCSI cable length, add the lengths of all SCSI cables, plus eight inches for every external SCSI device connected. No single cable length in the chain should exceed eight feet. 2. The Þrst and last devices in the chain must be terminated. There is a single exception to this rule, however. A K2600 with an internal hard drive and no external SCSI devices attached should have its termination disabled. If you later add an external device to the K2600Õs SCSI chain, you must enable the K2600Õs termination at that time.
    [Show full text]
  • Application Note 904 an Introduction to the Differential SCSI Interface
    DS36954 Application Note 904 An Introduction to the Differential SCSI Interface Literature Number: SNLA033 An Introduction to the Differential SCSI Interface AN-904 National Semiconductor An Introduction to the Application Note 904 John Goldie Differential SCSI Interface August 1993 OVERVIEW different devices to be connected to the same daisy chained The scope of this application note is to provide an introduc- cable (SCSI-1 and 2 allows up to eight devices while the pro- tion to the SCSI Parallel Interface and insight into the differ- posed SCSI-3 standard will allow up to 32 devices). A typical ential option specified by the SCSI standards. This applica- SCSI bus configuration is shown in Figure 1. tion covers the following topics: WHY DIFFERENTIAL SCSI? • The SCSI Interface In comparison to single-ended SCSI, differential SCSI costs • Why Differential SCSI? more and has additional power and PC board space require- • The SCSI Bus ments. However, the gained benefits are well worth the addi- • SCSI Bus States tional IC cost, PCB space, and required power in many appli- • SCSI Options: Fast and Wide cations. Differential SCSI provides the following benefits over single-ended SCSI: • The SCSI Termination • Reliable High Transfer Rates — easily capable of operat- • SCSI Controller Requirements ing at 10MT/s (Fast SCSI) without special attention to termi- • Summary of SCSI Standards nations. Even higher data rates are currently being standard- • References/Standards ized (FAST-20 @ 20MT/s). The companion Application Note (AN-905) focuses on the THE SCSI INTERFACE features of National’s new RS-485 hex transceiver. The The Small Computer System Interface is an ANSI (American DS36BC956 specifically designed for use in differential SCSI National Standards Institute) interface standard defining a applications is also optimal for use in other high speed, par- peer to peer generic input/output bus (I/O bus).
    [Show full text]
  • Programming Model Intel Itanium 64
    11/11/2003 64-bit computing AMD Opteron 64 Application of Win32 Executable File Legacy 64 bit platforms Inbuilt 128-bit bus DDR memory controller with memory bandwidth speed up to 5.3GB/s. Infectors on Intel Itanium and AMD Benefits of 64-bit processors Opteron Based Win64 Systems Use of hyper transport protocol, “glueless” architecture. Oleg Petrovsky and Shali Hsieh Increased integer dynamic range Computer Associates International Inc. Available in up to 8 way configuration with the clock speeds 1 Computer Associates Plaza, Islandia, NY 11749, Much larger addressable memory space of 1.4 GHz, 1.6 GHz and 1.8 GHz . USA Benefits to database, scientific and cryptography Reuses already familiar 32-bit x86 instruction set and applications extends it to support 64-bit operands, registers and memory pointers. AMD64 Programming Model AMD64: Programming model Intel Itanium 64 X86 32-64 64 bit Itanium line of processors is being developed by Intel XMM8 X86 80-Bit Extends general use registers to 64-bit, adds additional eight 64-Bit X87 general purpose 64-bit registers. Itanium - 800 MHz, no on die L3 cache, Itanium 2 - 1GHz, RAX EAX AX 3MB L3 on die, Itanium 2003 (Madison) - 1.5 GHz, 6MB L3 on die cache, 410M transistors, largest integration on a RBX Reuses x86 instruction set. single silicon crystal today. XMM15 RCX Runs 32-bit code without emulation or translation to a native Itanium line of processors utilizes more efficient and robust XMM0 than legacy x86 instruction set architecture F instruction set. R8 L A Itanium has to use x86-to-IA-64 decoder a specifically Minimizes learning curve.
    [Show full text]