Processing Multimedia Workloads on Heterogeneous Multicore Architectures

Total Page:16

File Type:pdf, Size:1020Kb

Processing Multimedia Workloads on Heterogeneous Multicore Architectures Doctoral Dissertation Processing Multimedia Workloads on Heterogeneous Multicore Architectures H˚akon Kvale Stensland February 2015 Submitted to the Faculty of Mathematics and Natural Sciences at the University of Oslo in partial fulfilment of the requirements for the degree of Philosophiae Doctor © Håkon Kvale Stensland, 2015 Series of dissertations submitted to the Faculty of Mathematics and Natural Sciences, University of Oslo No. 1601 ISSN 1501-7710 All rights reserved. No part of this publication may be reproduced or transmitted, in any form or by any means, without permission. Cover: Hanne Baadsgaard Utigard. Printed in Norway: AIT Oslo AS. Produced in co-operation with Akademika Publishing. The thesis is produced by Akademika Publishing merely in connection with the thesis defence. Kindly direct all inquiries regarding the thesis to the copyright holder or the unit which grants the doctorate. Abstract Processor architectures have been evolving quickly since the introduction of the central processing unit. For a very long time, one of the important means of increasing per- formance was to increase the clock frequency. However, in the last decade, processor manufacturers have hit the so-called power wall, with high heat dissipation. To overcome this problem, processors were designed with reduced clock frequencies but with multiple cores and, later, heterogeneous processing elements. This shift introduced a new challenge for programmers: Legacy applications, written without parallelization in mind, gain no benefits from moving to multicore and heterogeneous architectures. Another challenge for the programmers is that heterogeneous architecture designs are very different with respect to caches, memory types, execution unit organization, and so forth and, in the worst case, a programmer must completely rewrite the application to obtain the best performance on the new architecture. Multimedia workloads, such as video encoding, are often time sensitive and interac- tive. These workloads differ from traditional batch processing workloads with no real-time requirements. This work investigates how to use modern heterogeneous architectures ef- ficiently to process multimedia workloads. To do so, we investigate both simple and complex workloads on multiple architectures to learn about the properties of these archi- tectures. When programing multimedia workloads, it is very important to know how the algorithms perform on the target architecture. In addition, achieving high performance on heterogeneous architectures is not a trivial task, often requiring detailed knowledge about the architecture. We therefore evaluate several optimizations so we can learn how best to write programs for these architectures and avoid potential pitfalls. We later use the knowledge gained to propose a framework design and language called Parallel Pro- cessing Graph (P2G). The P2G framework is designed for multimedia workloads and supports heterogeneous architectures. To demonstrate the feasibility of the framework, we construct a proof-of-concept implementation. Two simple workloads show that we can express multimedia workloads in the system. We also demonstrate the scalability of the designed solution. iii iv Acknowledgements Working with the PhD has been a long journey, at times, it has been both frustrating and stressful, but it has mostly been lots of fun. I would like to thank my supervisors, Professor Carsten Griwodz and Professor P˚alHalvorsen for interesting discussions, their deep insights and valuable feedback over the years. I would also like to thank my colleagues H˚avard Espeland and Paul Beskow who I have shared office with for good collaboration and inspiring discussions on the topic of processing multimedia workloads. During the work with this thesis I have supervised several master students. I would like to thank all of them, the discussions have been very inspiring, and have helped me with this thesis. The work environment at Simula Research Laboratory and the Media-department has also been excellent. Here, I would like to thank Andreas Petlund, Kristian Evensen, Ragnhild Eg, Preben Olsen and Vamsidhar Gaddam for making Simula a great place to be. Finally I would like to thank my family, friends and especially my wife Marianne, for being patient and always supporting me no matter what I decide to do. v vi Contents I Overview 1 1 Introduction 3 1.1 Background and Motivation . .3 1.1.1 Heterogeneous Architectures . .4 1.1.2 Multimedia Workloads . .5 1.2 Problem Statement . .6 1.3 Limitations . .7 1.4 Research Method . .7 1.5 Main Contributions . .8 1.6 Outline . 10 2 Heterogeneous Computing 11 2.1 Hardware Architectures . 11 2.1.1 Intel x86 Processor Architecture . 11 2.1.2 Intel IXP Network Processor . 16 2.1.3 Nvidia Graphics Processing Units . 18 2.1.4 STI Cell Broadband Engine . 22 2.1.5 Other Hardware Architectures . 24 2.1.6 Summary . 26 2.2 Hardware Abstractions and Programming Models . 27 2.2.1 SMT . 27 2.2.2 SIMD . 28 2.2.3 SIMT . 29 2.2.4 Summary . 30 2.3 Summary . 30 3 Using Heterogeneous Architectures for Simple Tasks 33 3.1 Intel IXP Network Processor . 33 3.1.1 Case Study: Network Protocol Translation . 34 3.1.2 Implications . 38 3.2 x86 Processor Architecture . 39 3.2.1 Case study: Motion JPEG Encoding . 39 3.2.2 Case Study: Multi-Rate Video Encoding with VP8 . 42 3.2.3 Case Study: Parallel Execution of a Game Server . 46 3.2.4 Implications . 52 3.3 Graphics Processing Units . 53 vii 3.3.1 Case Study: GPU Memory Spaces and Access Patterns . 53 3.3.2 Case Study: Host{Device Communication Optimization . 56 3.3.3 Case Study: Cheat Detection . 58 3.3.4 Case Study: MJPEG Encoding . 64 3.3.5 Implications . 68 3.4 Cell Broadband Engine . 69 3.4.1 Case Study: MJPEG Encoding . 69 3.4.2 Implications . 73 3.5 Architecture Comparison . 74 3.6 Summary . 75 4 Using Heterogeneous Architectures for Complex Workloads 77 4.1 Bagadus Sports Analysis System . 77 4.1.1 Bagadus: The Basic Idea . 78 4.1.2 Video Subsystem . 80 4.2 The Real-Time Bagadus Video Pipeline . 83 4.2.1 Performance Analysis . 90 4.2.2 Discussion . 92 4.3 Summary . 94 5 The P2G Framework and the Future 97 5.1 Summary of Challenges . 97 5.2 Design Ideas for a New Processing Framework . 98 5.3 Existing Processing Frameworks . 98 5.4 The P2G Framework . 99 5.4.1 Architecture . 102 5.4.2 Programming Model . 103 5.4.3 Prototype . 108 5.4.4 Workloads . 109 5.4.5 Evaluation . 110 5.4.6 Summary . 113 5.5 The Future . 113 6 Papers and Author's Contributions 115 6.1 Overview of Research Papers . 115 6.2 Paper I: Transparent Protocol Translation for Streaming . 115 6.3 Paper II: Evaluation of Multi-Core Scheduling Mechanisms for Heteroge- neous Processing Architectures . 116 6.4 Paper III: Tips, Tricks and Troubles: Optimizing for Cell and GPU . 117 6.5 Paper IV: Cheat Detection Processing: A GPU versus CPU Comparison . 118 6.6 Paper V: Reducing Processing Demands for Multi-Rate Video Encoding: Implementation and Evaluation . 119 6.7 Paper VI: LEARS: A Lockless, Relaxed-Atomicity State Model for Parallel Execution of a Game Server Partition . 120 6.8 Paper VII: P2G: A Framework for Distributed Real-Time Processing of Multimedia Data . 120 6.9 Paper VIII: Bagadus: An Integrated Real-Time System for Soccer Analytics122 viii 6.10 Paper IX: Processing Panorama Video in Real-Time . 122 6.11 Supervised Master's Students . 123 6.12 Other Publications . 127 7 Conclusion 129 7.1 Summary . 129 7.2 Concluding Remarks . 130 7.3 Future Work . 131 II Research Papers 145 Paper I: Transparent Protocol Translation for Streaming 147 Paper II: Evaluation of Multi-Core Scheduling Mechanisms for Hetero- geneous Processing Architectures 153 Paper III: Tips, Tricks and Troubles: Optimizing for Cell and GPU 161 Paper IV: Cheat Detection Processing: A GPU versus CPU Compari- son 169 Paper V: Reducing Processing Demands for Multi-Rate Video Encod- ing: Implementation and Evaluation 177 Paper VI: LEARS: A Lockless, Relaxed-Atomicity State Model for Par- allel Execution of a Game Server Partition 199 Paper VII: P2G: A Framework for Distributed Real-Time Processing of Multimedia Data 209 Paper VIII: Bagadus: An Integrated Real-Time System for Soccer An- alytics 223 Paper IX: Processing Panorama Video in Real-Time 247 Posters and live demonstrations 269 Other research papers 273 A BNF Grammar of the P2G Kernel Language 277 ix x List of Figures 1.1 The real-time panorama video stitching pipeline in the Bagadus soccer analysis system [114]. .6 2.1 Comparison of SMP architectures. 13 2.2 Intel Haswell architecture diagram. 14 2.3 Intel Xeon Phi MIC architecture. 16 2.4 Intel IXP2400 architecture diagram. 17 2.5 Comparison of transistor usage on a CPU and on a GPU. 18 2.6 A pre-DirectX 10 graphics pipeline [43], with a programmable vertex pro- cessor and fragment processor. 19 2.7 Nvidia GK110 SMX architecture [90], slightly modified. 21 2.8 The Kepler memory hierarchy. 22 2.9 CBE architecture. 23 2.10 Overview of an SPE. 24 2.11 A superscalar processor design with and without SMT. 28 2.12 SIMD programming model. 28 2.13 Nvidia CUDA programming model. 29 3.1 Overview of the streaming scenario. 35 3.2 Packet flow on the Intel IXP2400. 36 3.3 Achieved bandwidth, varying drop rate and link latency with 1% server{ proxy loss. ..
Recommended publications
  • Intel® Quick Sync Video Technology Guide
    WP80 Superguide 3: THE CLOUD VIDEO SUPERGUIDE JANUARY/FEBRUARY 2015 SPONSORED CONTENT Intel® Quick Sync Video Technology and Intel® Xeon® Processor Based Servers— Flexible Transcode Performance and Quality Video transcoding involves converting provides enterprise-quality HEVC and and boost image quality. Some key one compressed video format to audio codecs, Intel® VTune™ Amplifier improvements include the following: another. In the past this process has XE performance analysis tools and Video • Additional JPEG/MJPEG decode in been a compute-intensive task which Quality Caliper stream quality analyzer. the multi-format codec engine. This demanded a large amount of precious Additionally, product family members, support is on top of existing energy- CPU resources. Intel® Quick Synch Video Intel® Video Pro Analyzer and Intel® efficient, high-performance AVC (QSV) can enable hardware-accelerated Stress Bitstreams and Encoder bundles encode/decode that sustains multiple transcoding to deliver better performance enable production–scale validation and 4K and Ultra HD video streams. than transcoding on the CPU without debug of encode, transcode, and decode • A dedicated new video quality engine sacrificing quality. and playback applications. to provide extensive video processing First introduced in 2011, Intel Quick Intel Media Server Studio SDK at low power consumption Sync technology is available in the Intel® implements many codec and tools • Programmable and media-optimized Xeon® Processor E3-1200 v3 with Intel components initially in software, and EU (execution units)/samplers for HD Graphics P4600/4700 and Iris™ Pro later as hybrid (software and hardware) high quality P5200. (From here on, we’ll simply refer to or entirely in hardware.
    [Show full text]
  • Copyright by Jian He 2020 the Dissertation Committee for Jian He Certifies That This Is the Approved Version of the Following Dissertation
    Copyright by Jian He 2020 The Dissertation Committee for Jian He certifies that this is the approved version of the following dissertation: Empowering Video Applications for Mobile Devices Committee: Lili Qiu, Supervisor Mohamed G. Gouda Aloysius Mok Xiaoqing Zhu Empowering Video Applications for Mobile Devices by Jian He DISSERTATION Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY THE UNIVERSITY OF TEXAS AT AUSTIN May 2020 Acknowledgments First and foremost, I want to thank my advisor Prof. Lili Qiu, for the support and guidance I have received over the past few years. I appreciate all her contributions of time, ideas and funding to make my Ph.D. experience productive and stimulating. The enthusiasm she has for her research signifi- cantly motivated to concentrate on my research especially during tough times in my Ph.D. pursuit. She taught me how to crystallize ideas into solid and fancy research works. I definitely believe that working with her will help me have a more successful career in the future. I also want to thank all the members in my dissertation committee, Prof. Mohamed G. Gouda, Prof. Aloysius Mok and Dr. Xiaoqing Zhu. I owe many thanks to them for their insightful comments on my dissertation. I was very fortunate to collaborate with Wenguang Mao, Mubashir Qureshi, Ghufran Baig, Zaiwei Zhang, Yuchen Cui, Sangki Yun, Zhaoyuan He, Chenxi Yang, Wangyang Li and Yichao Chen on many interesting works. They always had time and passion to devote to my research projects.
    [Show full text]
  • GPU Developments 2018
    GPU Developments 2018 2018 GPU Developments 2018 © Copyright Jon Peddie Research 2019. All rights reserved. Reproduction in whole or in part is prohibited without written permission from Jon Peddie Research. This report is the property of Jon Peddie Research (JPR) and made available to a restricted number of clients only upon these terms and conditions. Agreement not to copy or disclose. This report and all future reports or other materials provided by JPR pursuant to this subscription (collectively, “Reports”) are protected by: (i) federal copyright, pursuant to the Copyright Act of 1976; and (ii) the nondisclosure provisions set forth immediately following. License, exclusive use, and agreement not to disclose. Reports are the trade secret property exclusively of JPR and are made available to a restricted number of clients, for their exclusive use and only upon the following terms and conditions. JPR grants site-wide license to read and utilize the information in the Reports, exclusively to the initial subscriber to the Reports, its subsidiaries, divisions, and employees (collectively, “Subscriber”). The Reports shall, at all times, be treated by Subscriber as proprietary and confidential documents, for internal use only. Subscriber agrees that it will not reproduce for or share any of the material in the Reports (“Material”) with any entity or individual other than Subscriber (“Shared Third Party”) (collectively, “Share” or “Sharing”), without the advance written permission of JPR. Subscriber shall be liable for any breach of this agreement and shall be subject to cancellation of its subscription to Reports. Without limiting this liability, Subscriber shall be liable for any damages suffered by JPR as a result of any Sharing of any Material, without advance written permission of JPR.
    [Show full text]
  • Keepixo’S Genova Live:Speed Is the Latest Addition to Genova Virtualizable Software Family of Products
    High Density Video Transcoding Keepixo’s Genova Live:Speed is the latest addition to Genova Virtualizable Software family of products. It runs on the Kontron SYMKLOUD platform and leverages Intel’s Quick Sync Video (QSV) technology to dramatically increase transcoding density; minimize power consumption while meeting professional service grade levels. Media Processing Acceleration Genova Live:Speed The explosive growth of Internet video traffic has put Keepixo worked with Kontron to leverage the Kontron pressure on video transcoding infrastructures to SYMKLOUD Converged Infrastructure platform and optimize bandwidth, power consumption and cost of developed a comprehensive package that optimizes the operations. High density transcoding solutions such as performance of QSV and allows smooth deployment of Intel’s Quick Sync Video technology have emerged to highly dense transcoding infrastructures comprising of address these challenges. They introduce various levels hundreds of services. of HW acceleration to off-load computationally intensive tasks. Initially targeted at consumer Keepixo selected the Kontron SymKloud MS2910 applications, high density transcoding has become a platform to implement its high density transcoding viable option for professional transcoding solution. The MS2910 platform comes in a 2U (21” infrastructures when included in suitable packages that depth) chassis, dual hot-swappable 10GbE switches, address the requirements of professional applications. and can accommodate up to 9 modular compute servers, each hosting 2 independent CPUs for a total of Intel Quick Sync Video (QSV) is available on a range of up to 18 CPUs per chassis. The SymKloud compute Intel i7 and Xeon E3 processors fitted with on-chip nodes can be of different mix-and-match processor graphics.
    [Show full text]
  • Rethinking Visual Cloud Workload Distribution
    WHITE PAPER Media and Communications Content Creation and Distribution RethinkingVisualCloud WorkloadDistribution Creating a New Model With visual computing workloads growing at an accelerating pace, cloud service providers (CSPs), communications service providers (CoSPs), and enterprises are rethinking the physical and virtual distribution of compute resources to more effectively balance cost and deployment efficiency while achieving exceptional performance. Visual cloud deployments accommodate a diverse range of streaming workloads, encompassing media processing and delivery, cloud graphics, cloud gaming, media analytics, and immersive media. Contending with the onslaught of new visual workloads will require more nimble, scalable, virtualized infrastructures; the capability of shifting workloads to the network edge when appropriate; and a collection of tools, software, and hardware components to support individual use cases fluidly. Advanced network technologies and cloud architectures are essential for agile distribution of visual cloud workloads. A 2017 report, Cisco Visual Networking Index: Forecast and Methodology, 2016–2021, projected strong growth in all Internet and managed IP video- related sectors. Compound annual growth rate (CAGR) figures during this time span, calculated in petabytes per month, included these predictions: • Content delivery network (CDN) traffic: 44 percent increase globally • Consumer-managed IP video traffic: 19,619 petabytes per month (14 percent increase) by 2021 • Consumer Internet video: 27 percent increase for fixed, 55 percent increase for mobile The impact of this media growth on cloud-based data centers will produce a burden on those CSPs, CoSPs, and enterprises that are not equipped to deal with TableofContents large-scale media workloads dynamically. Solutions to this challenge include: Creating a New Model . 1 • Increasingflexibilityandoptimizingprocessing: Virtualization and software- defined infrastructure (SDI) make it easier to balance workloads on available OpenSourceSoftware resources.
    [Show full text]
  • Transcoding SDK Combine Your Encoding Presets Into a Single Tool
    DATASHEET | Page 1 Transcoding SDK Combine your encoding presets into a single tool MainConcept Transcoding SDK is an all-in-one production tool offering HOW DOES IT WORK? developers the ability to manage multiple codecs and parameters in one • Transcoding SDK works as an place. This streamlined SDK supports the latest encoders and decoders additional layer above MainConcept from MainConcept, including HEVC/H.265, AVC/H.264, DVCPRO, and codecs. MPEG-2. The transcoder generates compliant streams across different • The easy-to-use API replaces the devices, media types, and camcorder formats, and includes support for need to set conversion parameters MPEG-DASH and Apple HLS adaptive bitstream formats. Compliance manually by allowing you to configure ensures content is delivered that meets each unique specification. the encoders with predefined profiles, letting the transcoding engine take Transcoding SDK was created to simplify the workflow for developers care of the rest. who frequently move between codecs and output to a multitude of • If needed, manual control of the configurations. conversion process is supported, including source/target destinations, export presets, transcoding, and filter AVAILABLE PACKAGES parameters. HEVC/H.265 HEVC/H.265 encoder for creating HLS, DASH-265, and other ENCODER PACKAGE generic 8-bit/10-bit 4:2:0 and 4:2:2 streams in ES, MP4 and TS file formats. Includes hardware encoding support using Intel Quick KEY FEATURES Sync Video (IQSV) and NVIDIA NVENC (including Hybrid GPU) for Windows and Linux. • Integrated SDKs for fast deployment HEVC/H.265 SABET HEVC/H.265 encoder package plus Smart Adaptive Bitrate Encod- of transcoding tools ENCODER PACKAGE ing Technology (SABET).
    [Show full text]
  • Intel® Quick Sync Video and Ffmpeg Installation and Validation Guide
    White paper Intel® Quick Sync Video and FFmpeg Installation and Validation Guide Introduction Intel® Quick Sync Video technology on Intel® Iris™ Pro Graphics and Intel® HD graphics provides transcode acceleration on Linux* systems in FFmpeg* 2.8 and later editions. This paper is a detailed step-by-step guide to enabling h264_qsv, mpeg2_qsv, and hevc_qsv hardware accelerated codecs in the FFmpeg framework. For a quicker overview, please see this article. Performance note: The *_qsv implementations are intended to provide easy access to Intel hardware capabilities for FFmpeg users, but are less efficient than custom applications optimized for Intel® Media Server Studio. Document note: Monospace type = command line inputs/outputs. Pink = highlights to call special attention to important command line I/O details. Getting Started 1. Install Intel Media Server Studio for Linux. Download from software.intel.com/intel-media-server- studio. This is a prerequisite for the *_qsv codecs as it provides the foundation for encode acceleration. See the next chapter for more info on edition choices. Note: Professional edition install is required for hevc_qsv. 2. Get the latest FFmpeg source from https://www.FFmpeg.org/download.html. Intel Quick Sync Video support is available in FFmpeg 2.8 and later editions. The install steps outlined below were verified with ffmpeg release 3.2.2 3. Configure FFmpeg with “--enable –libmfx –enable-nonfree”, build, and install. This requires copying include files to /opt/intel/mediasdk/include/mfx and adding a libmfx.pc file. More details below. 4. Test transcode with an accelerated codec such as “-vcodec h264_qsv” on the FFmpeg command line.
    [Show full text]
  • MSI Afterburner V4.6.4
    MSI Afterburner v4.6.4 MSI Afterburner is ultimate graphics card utility, co-developed by MSI and RivaTuner teams. Please visit https://msi.com/page/afterburner to get more information about the product and download new versions SYSTEM REQUIREMENTS: ...................................................................................................................................... 3 FEATURES: ............................................................................................................................................................. 3 KNOWN LIMITATIONS:........................................................................................................................................... 4 REVISION HISTORY: ................................................................................................................................................ 5 VERSION 4.6.4 .............................................................................................................................................................. 5 VERSION 4.6.3 (PUBLISHED ON 03.03.2021) .................................................................................................................... 5 VERSION 4.6.2 (PUBLISHED ON 29.10.2019) .................................................................................................................... 6 VERSION 4.6.1 (PUBLISHED ON 21.04.2019) .................................................................................................................... 7 VERSION 4.6.0 (PUBLISHED ON
    [Show full text]
  • System Design for Telecommunication Gateways
    P1: OTE/OTE/SPH P2: OTE FM BLBK307-Bachmutsky August 30, 2010 15:13 Printer Name: Yet to Come SYSTEM DESIGN FOR TELECOMMUNICATION GATEWAYS Alexander Bachmutsky Nokia Siemens Networks, USA A John Wiley and Sons, Ltd., Publication P1: OTE/OTE/SPH P2: OTE FM BLBK307-Bachmutsky August 30, 2010 15:13 Printer Name: Yet to Come P1: OTE/OTE/SPH P2: OTE FM BLBK307-Bachmutsky August 30, 2010 15:13 Printer Name: Yet to Come SYSTEM DESIGN FOR TELECOMMUNICATION GATEWAYS P1: OTE/OTE/SPH P2: OTE FM BLBK307-Bachmutsky August 30, 2010 15:13 Printer Name: Yet to Come P1: OTE/OTE/SPH P2: OTE FM BLBK307-Bachmutsky August 30, 2010 15:13 Printer Name: Yet to Come SYSTEM DESIGN FOR TELECOMMUNICATION GATEWAYS Alexander Bachmutsky Nokia Siemens Networks, USA A John Wiley and Sons, Ltd., Publication P1: OTE/OTE/SPH P2: OTE FM BLBK307-Bachmutsky August 30, 2010 15:13 Printer Name: Yet to Come This edition first published 2011 C 2011 John Wiley & Sons, Ltd Registered office John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com. The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
    [Show full text]
  • H P Intel® Quick Sync Video Tec Nology on Intel® Iris™ Graphics
    WHITE PAPER Intel Quick Sync Video Intel® Quick Sync Video Tech n ology on Intel® Iris™ Graphics and Intel® HD Grap h ics family—Flexible Transcode Performance an d Quality The 4th generation Intel® Core™ Processor introduces Intel® Iris™ Pro Graphics, Intel® Iris™ Graphics and Intel® HD Graphics (4200+ Series), featuring the newest iteration of Intel® Quick Sync Video technology. In addition, Intel Quick Sync technology is also available in the Intel Xeon® Processor E3-1200 v3 product family with Intel HD Graphics P4600/P4700. For the purposes of this whitepaper, we will refer simply to Intel Quick Sync technology. This advanced graphics solution sets a milestone in performance by delivering a blazingly fast transcode experience, and balances image quality and performance through a series of optimized user targets for easy adoption. This paper explains the changes in Intel Quick Sync Video transcode target usages, describes the new high-performance hardware acceleration engine, and sets performance and quality expectations for each target usage. Introduction In the past, transcoding has been a Video transcoding involves converting one compute-intensive task that demanded a compressed video format to another. The large amount of precious CPU resources. process can apply changes to the format, With multiple cores and more powerful such as moving from MPEG2 to H.264, or processors driving today’s systems, transcoding can change the properties of transcoding happens faster, and is a given format, like bit rate or resolution. commonly used to support the format Conversions cannot generally happen in a requirements of a whole range of video single step; rather, a series of transitions consumption devices.
    [Show full text]
  • Mechdyne-TGX-2.1-Installation-Guide
    TGX Install Guide Version 2.1.3 Mechdyne Corporation March 2021 TGX INSTALL GUIDE VERSION 2.1.3 Copyright© 2021 Mechdyne Corporation All Rights Reserved. Purchasers of TGX licenses are given limited permission to reproduce this manual, provided the copies are for their use only and are not sold or distributed to third parties. All such copies must contain the title page and this notice page in their entirety. The TGX software program and accompanying documentation described herein are sold under license agreement. Their use, duplication, and disclosure are subject to the restrictions stated in the license agreement. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. This publication is provided “as is” without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non- infringement. Any Mechdyne Corporation publication may include inaccuracies or typographical errors. Changes are periodically made to these publications, and changes may be incorporated in new editions. Mechdyne may improve or change its products described in any publication at any time without notice. Mechdyne assumes no responsibility for and disclaims all liability for any errors or omissions in this publication. Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply. TGX is a trademark of Mechdyne Corporation. Windows® is registered trademarks of Microsoft Corporation. Linux® is registered trademark of Linus Torvalds.
    [Show full text]
  • NVIDIA GRID™ Virtual PC and Virtual Apps
    NVIDIA VIRTUAL PC AND VIRTUAL APPS GPU-ACCELERATED PERFORMANCE FOR THE VIRTUAL ENTERPRISE Desktop virtualization has been around for many Reason 1: Flexible Workers Require a years, but some organizations still struggle to Seamless Experience deliver a user experience that stands up to what As organizations form return to work plans, workers have enjoyed on physical PCs. Remote it is clear that remote work will be part of the offices, virtual collaboration, and fast access long term solution. By allowing employees to seamlessly transition between the office and to information, coupled with the rising use of home, organizations are adopting the flexible modern, graphics-intensive applications, make office model. User experience has become more GPUs more relevant than ever. important than ever. Video collaboration, as well as simple productivity applications found in Microsoft NVIDIA® Virtual PC (vPC) and Virtual Apps (vApps) Windows 10 (Win 10), Office 365, web browsers, and improve virtual desktops and applications for streaming video can benefit from GPU acceleration. every user, with proven performance built on NVIDIA GPUs for exceptional productivity, security, and IT manageability. The virtualization software divides NVIDIA GPU resources, so the GPU can be 82% of organizations shared across multiple virtual machines running have the option of any application. working remotely. Here are three powerful reasons to deploy vPC and vApps in your data center. Traditional desktop and laptop PCs boost application performance with embedded or integrated GPUs. NVIDIA VIRTUAL PC AND VIRTUAL APPS | SOlutiON OvERVIEW | SEP21 However, when making the transition from physical to Reason 3: There Are More Users to Support virtual, IT has traditionally left the computer graphics Than Ever Before burden—such as from DirectX and OpenGL workloads Today’s virtual desktops and applications require and video streaming—to be handled by a server CPU.
    [Show full text]