Introduction to Lab 4

Andreas Sandberg <[email protected]>
Division of Computer Systems
Dept. of Information Technology
Uppsala University
2011-11-18

AvDark'11 | Introduction to Lab 4
Introduction | Uppmax | SSE | Summary

What is lab 4?
Vectors? Who needs vectors anyway?

The purpose of this assignment is to give insights into:
• How vector instructions can be used for floating point code
• How integer operations can be performed using vector instructions
• How memory alignment affects performance and correctness

The Kalkyl Cluster: Specifications

Cluster specifications
• 348 nodes interconnected with Infiniband
• 2784 CPU cores
• 9404 GB RAM
• 113 TB disk

Node specifications
• Runs Scientific Linux (RedHat Enterprise Linux customized for scientific applications)
• 2x Quad-Core Intel Xeon 5520 (Nehalem based)
• At least 24 GB RAM

The Kalkyl Cluster: Logging in and transferring files

Connecting with SSH
• Always connect to kalkyl.uppmax.uu.se
• ssh -Y [email protected]
  • -Y: enables X forwarding

Transferring files
• Transfer files using the scp command
• Use the same server as for normal SSH logins
• scp ./foo [email protected]:bar/
  • Transfers the file ./foo to the directory bar in your home directory on Uppmax

The Kalkyl Cluster: Loading additional software

Uppmax provides optional software in modules that can be easily loaded and unloaded.
• module load gcc: Load the latest version of the GCC compiler
• module unload gcc: Unload the currently loaded GCC module
• module list: List loaded modules
• module whatis: List available modules

The Kalkyl Cluster: Submitting an interactive job

You won't measure correct results in your experiments unless you allocate an entire node for them.
• Use salloc -p node -n 1 -t 15:00 --qos=short -A g2011132 CMD
  • Runs CMD, or a shell if CMD is omitted
  • -p node -n 1: Request 1 node
  • -t 15:00: Expected runtime for the job
  • --qos=short: Use the queue for short jobs
  • -A g2011132: Use the course project for accounting
• Jobs running longer than the requested runtime will be terminated

x86 Terminology

Integers
  Byte      8 bits
  Word      16 bits
  DWord     32 bits
  QWord     64 bits
Floating point
  Single    32 bits
  Double    64 bits
  Extended  80 bits (only available in the x87)

SSE Registers

A 128-bit register (bits 127..0) can be viewed as:
  2x 64-bit (int|FP)    bits 127..64, 63..0
  4x 32-bit (int|FP)    bits 127..96, 95..64, 63..32, 31..0
  8x 16-bit int
  16x 8-bit int
• 16 new 128-bit registers (8 registers in 32-bit mode)
• Registers can hold either FP or integer values
• Number of elements depends on element type

SSE: Compared to classical x86

  Classical x86/x87                                  | SSE
  Stack based FP math                                | Register based FP math
  Uses extended 80-bit FP precision internally       | Uses standard 32-bit or 64-bit FP precision
  Some instructions have fixed operands              | All registers are general purpose
  Memory operations can generally be unaligned       | Memory operations must generally be aligned

New instructions: Loads and Stores

• Several new MOV instructions
  • Most of them can act as both loads and stores
• Behavior with respect to the memory system:
  Aligned        Requires aligned memory operands
  Unaligned      Allows unaligned memory operands
  Non-temporal   Accesses optimized for streaming data
• Different versions depending on data type
  • Can be used to optimize data placement inside the CPU
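
The three kinds of memory behavior listed above map onto separate intrinsics in C. The following is a minimal sketch, not part of the original slides: the helper name add_one is made up for illustration, and it assumes an x86 compiler with SSE enabled. It picks the aligned load (_mm_load_ps) when the source pointer is 16-byte aligned, falls back to the unaligned load (_mm_loadu_ps) otherwise, and writes the result with a non-temporal store (_mm_stream_ps), which itself requires an aligned destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_load_ps, _mm_loadu_ps, _mm_stream_ps */

/* Hypothetical helper (not from the slides): dst[i] = src[i] + 1.0f.
 * Assumptions: n is a multiple of 4 and dst is 16-byte aligned. */
static void add_one(float *dst, const float *src, size_t n)
{
    const __m128 one = _mm_set1_ps(1.0f);
    /* A pointer is 16-byte aligned when its low four address bits are zero. */
    const int src_aligned = ((uintptr_t)src & 0xF) == 0;

    assert((n & 0x3) == 0);
    assert(((uintptr_t)dst & 0xF) == 0);

    for (size_t i = 0; i < n; i += 4) {
        /* Aligned load faults on a misaligned address; unaligned load does not. */
        __m128 v = src_aligned ? _mm_load_ps(src + i)
                               : _mm_loadu_ps(src + i);
        /* Non-temporal store: hints that dst will not be reused soon. */
        _mm_stream_ps(dst + i, _mm_add_ps(v, one));
    }
    /* Order the non-temporal stores before any later stores. */
    _mm_sfence();
}
```

Calling add_one(out, in + 1, 8) with a 16-byte aligned array in takes the unaligned path, since in + 1 is offset by 4 bytes; passing such a pointer to _mm_load_ps directly would typically crash.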

New Instructions

• All common arithmetic operations are available
  • Operate on individual elements
  • At least one version per data type (8 versions of add!)
• Binary logic operators are available
  • Operate on entire 128-bit registers
  • Different versions for integer and FP
• Vector specific instructions
  • Dot-products
  • Horizontal add
  • ...
• Hordes of esoteric instructions

New Instructions: Horizontal Add

HADDPS (Horizontal ADD Packed Single fp)

  Input vectors:   a0 a1 a2 a3    b0 b1 b2 b3
  Output vector:   c0 c1 c2 c3
  where c0 = a0 + a1, c1 = a2 + a3, c2 = b0 + b1, c3 = b2 + b3

• Can be used to efficiently sum the elements of 4 vectors

New Instructions: Comparisons

PCMPGTW (Parallel CoMpare Greater Than Word)

  Input vectors:   a0 a1 ... a7  >  b0 b1 ... b7
  ci := ai > bi ? 0xFFFF : 0x0000
  Output vector:   c0 c1 ... c7

• Compares element-wise
• An element is binary all 1 if the predicate is true, 0 otherwise
• Can be used to generate bit masks

Detecting SSE: How it should be done

1 Can bit 21 in EFLAGS be toggled? If so, CPUID is present.
2 Execute CPUID with EAX=0. Check manufacturer and maximum CPUID function number.
3 Execute CPUID with EAX=1. Check the following bits:
    EDX:25   SSE
    EDX:26   SSE2
    ECX:0    SSE3
    ECX:9    SSSE3
    ECX:19   SSE4.1
    ECX:20   SSE4.2
4 Check for optional instructions (use CPUID)

Detecting SSE: What we do

This slide is intentionally left blank

C-support: Basics

Several different interfaces, no standard. Common approaches:
• Assembler libraries
• Inline assembler (no standard inline asm syntax)
• GCC Intrinsics
• ICC Intrinsics (supported by GCC)

Intrinsic names for ICC are documented in Intel's CPU manuals.
GCC's native instructions are "documented" in the GCC manual.

C-support: Headers

  Header        Extension
  xmmintrin.h   SSE
  emmintrin.h   SSE2
  pmmintrin.h   SSE3
  tmmintrin.h   SSSE3
  smmintrin.h   SSE4.1
  nmmintrin.h   SSE4.2
  immintrin.h   AVX

• Some header files include the headers from earlier SSE versions.
• GCC requires that SSE extensions are enabled through command line switches.
• Warning: This normally allows GCC to automatically generate code for those extensions.

C-support: Instruction and Type Naming

_mm_<op>_<suffix>

  <suffix>   Vector type   Element type
  epi8       __m128i       int8_t
  epi16      __m128i       int16_t
  epi32      __m128i       int32_t
  epi64      __m128i       int64_t
  ps         __m128        float
  pd         __m128d       double

Example: Loading and Storing

Load/store example using unaligned accesses

    #include <assert.h>
    #include <stddef.h>
    #include <pmmintrin.h>

    static void
    my_memcpy(char *dst, const char *src, size_t len)
    {
        /* Assume that length is a multiple of the vector size (16 bytes) */
        assert((len & 0xF) == 0);
        for (size_t i = 0; i < len; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
            _mm_storeu_si128((__m128i *)(dst + i), v);
        }
    }

A Small Vectorization Tutorial

1 Start with a simple serial version of your algorithm
2 Remove conditional control flow
3 Unroll loops
4 Vectorize!

Step 1: a simple serial version

    #include <stddef.h>
    #include <stdint.h>

    /* Count the number of zero elements in data[0..len-1] */
    static int
    count(const uint32_t *data, size_t len)
    {
        int c = 0;
        for (size_t i = 0; i < len; i++)
            if (data[i] == 0)
                c++;
        return c;
    }

Step 2: remove conditional control flow (function body only)

    int c = 0;
    for (size_t i = 0; i < len; i++)
        c += (data[i] == 0) ? 1 : 0;
    return c;

Step 3: unroll the loop four times (function body only)

    int c = 0;
    assert(!(len & 0x3));   /* len must be a multiple of 4 */
    for (size_t i = 0; i < len; i += 4)
        c += ((data[i + 0] == 0) ? 1 : 0) +
             ((data[i + 1] == 0) ? 1 : 0) +
             ((data[i + 2] == 0) ? 1 : 0) +
             ((data[i + 3] == 0) ? 1 : 0);
    return c;

Step 4: vectorize (function body only)

    /* _mm_extract_epi32 is an SSE4.1 intrinsic: include <smmintrin.h> */
    __m128i c = _mm_setzero_si128();               /* four partial counters */
    const __m128i one = _mm_set1_epi32(1);
    const __m128i zero = _mm_setzero_si128();
    assert(!(len & 0x3));                          /* len must be a multiple of 4 */
    for (size_t i = 0; i < len; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        const __m128i cond = _mm_cmpeq_epi32(v, zero);   /* all ones where data == 0 */
        c = _mm_add_epi32(c, _mm_and_si128(cond, one));  /* mask to 1, accumulate */
    }
    /* Sum the four partial counters */
    return _mm_extract_epi32(c, 0) + _mm_extract_epi32(c, 1) +
           _mm_extract_epi32(c, 2) + _mm_extract_epi32(c, 3);

Common Error Sources

Unaligned memory accesses
• Causes a Segmentation fault
• May be due to an unintentional memory operand
• Can be hard to spot in memory debuggers

Unsupported SSE instructions
• Causes an Illegal instruction error
• GCC may automatically emit SSE instructions if SSE has been enabled on the command line

Where to go from here

Intel
• C++ Compiler Manual
• Optimization Reference Manual
• Intel 64 and IA-32 Architectures Software Developer's Manual (vol. 1 & 2)
AMD
• Software Optimization Guide for AMD Family 10h
• AMD64 Architecture Programmer's Manual (vol. 1 & 3)
The GCC manual

Important dates

Groups:
  Prep.  Room 1515, now–17:00
  A      2011-11-21, Room 1412, 08:15–12:00
  B      2011-11-21, Room 1412, 13:15–17:00
  C      2011-11-22, Room 1412, 08:15–12:00
Deadline: 2011-11-28 15:14

Summary

And remember.
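
As a supplement to the "Detecting SSE" and "Common Error Sources" slides: the CPUID check in step 3 can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the lab's own detection code. It relies on the __get_cpuid() helper from the <cpuid.h> header shipped with GCC and Clang, which also covers step 2 by checking the maximum supported CPUID function; the struct sse_features type and the detect_sse function are hypothetical names introduced here.

```c
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#include <stdbool.h>

/* Hypothetical container for the feature bits reported by CPUID function 1. */
struct sse_features {
    bool sse, sse2, sse3, ssse3, sse4_1, sse4_2;
};

/* Fills *f and returns true, or returns false if CPUID function 1
 * cannot be queried on this CPU. */
static bool detect_sse(struct sse_features *f)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested function is unsupported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    f->sse    = (edx >> 25) & 1;
    f->sse2   = (edx >> 26) & 1;
    f->sse3   = (ecx >> 0)  & 1;
    f->ssse3  = (ecx >> 9)  & 1;
    f->sse4_1 = (ecx >> 19) & 1;
    f->sse4_2 = (ecx >> 20) & 1;
    return true;
}
```

A program could call detect_sse() once at startup and select a scalar fallback when, for example, sse4_1 is false, instead of failing later with an Illegal instruction error.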