Linux GPU Documentation

The kernel development community

Jul 14, 2020

CHAPTER ONE

INTRODUCTION

The Linux DRM layer contains code intended to support the needs of complex graphics devices, usually containing programmable pipelines well suited to 3D graphics acceleration. Graphics drivers in the kernel may make use of DRM functions to make tasks like memory management, interrupt handling and DMA easier, and provide a uniform interface to applications.

A note on versions: this guide covers features found in the DRM tree, including the TTM memory manager, output configuration and mode setting, and the new vblank internals, in addition to all the regular features found in current kernels.

[Insert diagram of typical DRM stack here]

1.1 Style Guidelines

For consistency this documentation uses American English. Abbreviations are written as all-uppercase, for example: DRM, KMS, IOCTL, CRTC, and so on.

To aid in reading, this documentation makes full use of the markup characters kerneldoc provides: @parameter for function parameters, @member for structure members (within the same structure), &struct structure to reference structures and function() for functions. These all get automatically hyperlinked if kerneldoc for the referenced objects exists. When referencing entries in function vtables (and structure members in general) please use &vtable_name.vfunc. Unfortunately this does not yet yield a direct link to the member, only to the structure.

Except in special situations (to separate locked from unlocked variants) locking requirements for functions aren't documented in the kerneldoc. Instead locking should be checked at runtime using e.g. WARN_ON(!mutex_is_locked(...));. Since it's much easier to ignore documentation than runtime noise, this provides more value. On top of that, runtime checks do need to be updated when the locking rules change, increasing the chances that they're correct. Within the documentation the locking rules should be explained in the relevant structures: either in the comment for the lock explaining what it protects, or in a note on the data fields saying which lock protects them, or both.

Functions which have a non-void return value should have a section called "Returns" explaining the expected return values in different cases and their meanings. Currently there's no consensus whether that section name should be all-uppercase or not, and whether it should end in a colon or not. Go with the file-local style. Other common section names are "Notes" with information for dangerous or tricky corner cases, and "FIXME" where the interface could be cleaned up.

Also read the guidelines for the kernel documentation at large.

1.1.1 Documentation Requirements for kAPI

All kernel APIs exported to other modules must be documented, including their data structures and at least a short introductory section explaining the overall concepts. Documentation should be put into the code itself as kerneldoc comments as much as reasonable.

Do not blindly document everything, but document only what's relevant for driver authors: internal functions of drm.ko and definitely static functions should not have formal kerneldoc comments. Use normal C comments if you feel a comment is warranted. You may use kerneldoc syntax in the comment, but it shall not start with a /** kerneldoc marker. Similarly for data structures, annotate anything entirely private with /* private: */ comments as per the documentation guide.
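As a minimal sketch of these conventions — the kerneldoc markup, the "Returns" section, the runtime locking check and the /* private: */ annotation — consider the following hypothetical drm_foo object. All names here are invented for illustration; they are not part of the DRM API.

	#include <linux/bug.h>
	#include <linux/errno.h>
	#include <linux/mutex.h>

	/**
	 * struct drm_foo - hypothetical example object
	 * @lock: protects @enabled
	 */
	struct drm_foo {
		struct mutex lock;
		/* private: */
		bool enabled;
	};

	/**
	 * drm_foo_enable - enable a foo object
	 * @foo: the &struct drm_foo to enable
	 * @flags: bitmask of FOO_* flags, currently unused
	 *
	 * Enables @foo. A vtable entry would be referenced as
	 * &drm_foo_funcs.enable.
	 *
	 * Returns:
	 * 0 on success, -EBUSY if @foo is already enabled.
	 */
	int drm_foo_enable(struct drm_foo *foo, unsigned int flags)
	{
		/* Locking rule checked at runtime rather than documented: */
		WARN_ON(!mutex_is_locked(&foo->lock));

		if (foo->enabled)
			return -EBUSY;
		foo->enabled = true;
		return 0;
	}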
1.2 Getting Started

Developers interested in helping out with the DRM subsystem are very welcome. Often people will resort to sending in patches for various issues reported by checkpatch or sparse. We welcome such contributions. Anyone looking to kick it up a notch can find a list of janitorial tasks on the TODO list.

1.3 Contribution Process

Mostly the DRM subsystem works like any other kernel subsystem, see the main process guidelines and documentation for how things work. Here we just document some of the specialities of the GPU subsystem.

1.3.1 Feature Merge Deadlines

All feature work must be in the linux-next tree by the -rc6 release of the current release cycle, otherwise it must be postponed and can't reach the next merge window. All patches must have landed in the drm-next tree by -rc7 at the latest, but if your branch is not in linux-next then this must have happened by -rc6 already.

After that point only bugfixes (like after the upstream merge window has closed with the -rc1 release) are allowed. No new platform enabling or new drivers are allowed.

This means that there's a blackout period of about one month where feature work can't be merged. The recommended way to deal with that is having a -next tree that's always open, but making sure not to feed it into linux-next during the blackout period. As an example, drm-misc works like that.

1.3.2 Code of Conduct

As a freedesktop.org project, dri-devel and the DRM community follow the Contributor Covenant, found at: https://www.freedesktop.org/wiki/CodeOfConduct

Please conduct yourself in a respectful and civilised manner when interacting with community members on mailing lists, IRC, or bug trackers. The community represents the project as a whole, and abusive or bullying behaviour is not tolerated by the project.

CHAPTER TWO

DRM INTERNALS

This chapter documents DRM internals relevant to driver authors and developers working to add support for the latest features to existing drivers.

First, we go over some typical driver initialization requirements, like setting up command buffers, creating an initial output configuration, and initializing core services. Subsequent sections cover core internals in more detail, providing implementation notes and examples.

The DRM layer provides several services to graphics drivers, many of them driven by the application interfaces it provides through libdrm, the library that wraps most of the DRM ioctls. These include vblank event handling, memory management, output management, framebuffer management, command submission & fencing, suspend/resume support, and DMA services.

2.1 Driver Initialization

At the core of every DRM driver is a struct drm_driver structure. Drivers typically statically initialize a drm_driver structure, and then pass it to drm_dev_alloc() to allocate a device instance. After the device instance is fully initialized it can be registered (which makes it accessible from userspace) using drm_dev_register(), as sketched below.

The struct drm_driver structure contains static information that describes the driver and features it supports, and pointers to methods that the DRM core will call to implement the DRM API. We will first go through the struct drm_driver static information fields, and will then describe individual operations in detail as they get used in later sections.
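A minimal sketch of this pattern for a hypothetical platform device named "foo" follows. The driver name, description, feature flags and fops are illustrative assumptions; check the struct drm_driver layout of your target kernel version.

	#include <linux/err.h>
	#include <linux/platform_device.h>
	#include <drm/drm_drv.h>
	#include <drm/drm_gem.h>

	DEFINE_DRM_GEM_FOPS(foo_fops);

	static struct drm_driver foo_driver = {
		.driver_features = DRIVER_GEM | DRIVER_MODESET,
		.fops = &foo_fops,
		/* Static information described in 2.1.1 below: */
		.name = "foo",
		.desc = "Hypothetical example DRM driver",
		.date = "20200714",
		.major = 1,
		.minor = 0,
	};

	static int foo_probe(struct platform_device *pdev)
	{
		struct drm_device *drm;
		int ret;

		/* Allocate a device instance against the static driver struct. */
		drm = drm_dev_alloc(&foo_driver, &pdev->dev);
		if (IS_ERR(drm))
			return PTR_ERR(drm);

		/*
		 * Initialize memory management, modesetting, vblank handling
		 * and the hardware itself here, before userspace can see the
		 * device.
		 */

		/* Publish the fully initialized instance to userspace. */
		ret = drm_dev_register(drm, 0);
		if (ret)
			drm_dev_put(drm);
		return ret;
	}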
2.1.1 Driver Information

Major, Minor and Patchlevel

int major;
int minor;
int patchlevel;

The DRM core identifies driver versions by a major, minor and patch level triplet. The information is printed to the kernel log at initialization time and passed to userspace through the DRM_IOCTL_VERSION ioctl.

The major and minor numbers are also used to verify the requested driver API version passed to DRM_IOCTL_SET_VERSION. When the driver API changes between minor versions, applications can call DRM_IOCTL_SET_VERSION to select a specific version of the API. If the requested major isn't equal to the driver major, or the requested minor is larger than the driver minor, the DRM_IOCTL_SET_VERSION call will return an error. Otherwise the driver's set_version() method will be called with the requested version.

Name, Description and Date

char *name;
char *desc;
char *date;

The driver name is printed to the kernel log at initialization time, used for IRQ registration and passed to userspace through DRM_IOCTL_VERSION.

The driver description is a purely informative string passed to userspace through the DRM_IOCTL_VERSION ioctl and otherwise unused by the kernel.

The driver date, formatted as YYYYMMDD, is meant to identify the date of the latest modification to the driver. However, as most drivers fail to update it, its value is mostly useless. The DRM core prints it to the kernel log at initialization time and passes it to userspace through the DRM_IOCTL_VERSION ioctl.

2.1.2 Device Instance and Driver Handling

A device instance for a drm driver is represented by struct drm_device. This is initialized with drm_dev_init(), usually from bus-specific ->probe() callbacks implemented by the driver. The driver then needs to initialize all the various subsystems for the drm device like memory management, vblank handling, modesetting support and initial output configuration, plus obviously initialize all the corresponding hardware bits. Finally, when everything is up and running and ready for userspace, the device instance can be published using drm_dev_register().

There is also deprecated support for initializing device instances using bus-specific helpers and the drm_driver.load callback. But due to backwards-compatibility needs the device instance has to be published too early, which requires unpretty global locking to make safe, and is therefore only supported for existing drivers not yet converted to the new scheme.

When cleaning up a device instance everything needs to be done in reverse: first unpublish the device instance with drm_dev_unregister(). Then clean up any other resources allocated at device initialization and drop the driver's reference to drm_device using drm_dev_put().

Note that any allocation or resource which is visible to userspace must be released only when the final drm_dev_put() is called, and not when the driver is unbound from the underlying physical struct device. It is best to use drm_device managed resources with drmm_add_action(), drmm_kmalloc() and related functions. devres managed resources like devm_kmalloc() can only be used for resources directly related to the underlying hardware device, and only in code paths fully protected by drm_dev_enter() and drm_dev_exit(). A sketch of both rules follows.
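The sketch below illustrates these rules with hypothetical foo names (foo_device, FOO_STATUS and the helpers are invented for this example): a drm_device managed allocation, a hardware access bracketed by drm_dev_enter()/drm_dev_exit(), and the reverse-order teardown described above.

	#include <linux/gfp.h>
	#include <linux/io.h>
	#include <linux/sizes.h>
	#include <drm/drm_drv.h>
	#include <drm/drm_managed.h>

	#define FOO_STATUS 0x04 /* hypothetical register offset */

	struct foo_device {
		struct drm_device drm;
		void __iomem *mmio;
		void *table; /* lives exactly as long as the drm_device */
	};

	static int foo_init_table(struct foo_device *fdev)
	{
		/* Freed automatically on the final drm_dev_put(), not when
		 * the driver is unbound from the physical device. */
		fdev->table = drmm_kmalloc(&fdev->drm, SZ_4K, GFP_KERNEL);
		return fdev->table ? 0 : -ENOMEM;
	}

	static u32 foo_read_status(struct foo_device *fdev)
	{
		u32 val = 0;
		int idx;

		/* Only touch hardware while the device isn't unplugged. */
		if (!drm_dev_enter(&fdev->drm, &idx))
			return val;
		val = readl(fdev->mmio + FOO_STATUS);
		drm_dev_exit(idx);
		return val;
	}

	static void foo_unbind(struct foo_device *fdev)
	{
		/* Reverse order of initialization: unpublish first, ... */
		drm_dev_unregister(&fdev->drm);
		/* ... then drop the driver's reference; userspace-visible
		 * resources go away only with the final drm_dev_put(). */
		drm_dev_put(&fdev->drm);
	}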