WP80 Superguide 3: THE CLOUD VIDEO SUPERGUIDE JANUARY/FEBRUARY 2015 SPONSORED CONTENT

Intel® Quick Sync Video Technology and ® ® Processor Based Servers— Flexible Transcode Performance and Quality

Video transcoding involves converting provides enterprise-quality HEVC and and boost image quality. Some key one compressed video format to audio codecs, Intel® VTune™ Amplifier improvements include the following: another. In the past this process has XE performance analysis tools and Video • Additional JPEG/MJPEG decode in been a compute-intensive task which Quality Caliper stream quality analyzer. the multi-format codec engine. This demanded a large amount of precious Additionally, product family members, support is on top of existing energy- CPU resources. Intel® Quick Synch Video Intel® Video Pro Analyzer and Intel® efficient, high-performance AVC (QSV) can enable hardware-accelerated Stress Bitstreams and Encoder bundles encode/decode that sustains multiple transcoding to deliver better performance enable production–scale validation and 4K and Ultra HD video streams. than transcoding on the CPU without debug of encode, transcode, and decode • A dedicated new video quality engine sacrificing quality. and playback applications. to provide extensive video processing First introduced in 2011, Intel Quick Intel Media Server Studio SDK at low power consumption Sync technology is available in the Intel® implements many codec and tools • Programmable and media-optimized Xeon® Processor E3-1200 v3 with Intel components initially in software, and EU (execution units)/samplers for HD Graphics P4600/4700 and Iris™ Pro later as hybrid (software and hardware) high quality P5200. (From here on, we’ll simply refer to or entirely in hardware. The reason for • A scalable architecture with the Intel Quick Sync technology for Video as this approach is a faster time to market flexibility to enable acceleration based QSV.) This advanced graphics solution sets for software solutions, followed by hybrid on application demand a milestone in performance by delivering solutions that contain partial hardware a fast transcode experience and balances acceleration, and then finally fast hardware There are different benefits to software image quality with performance through solutions that scale. Intel Media Server and hardware approaches for encoding a set of optimized user targets facilitating Studio SDK 2015 is available for both solutions. Typically software encoders easy adoption. Windows and , and is an enabling have the luxury of performing extremely The Intel® Media Server Studio software for Intel Quick Sync technology. complicated estimation and provides software development tools and The new Intel libraries needed to develop, debug, and Iris and Intel Iris deploy media solutions on Intel server Pro Graphics (5100+ processors. Four bundles of tools, libraries, Series) and Intel HD and test content offer flexibility to Graphics (4600+ software developers, datacenter solution Series) have seen a providers, cloud solution providers, and number of significant OEMs. The Intel® Media Server Studio enhancements. Professional Edition provides access Many new features to Intel hardware through a software and capabilities are development kit (SDK) and media and included to improve graphics drivers for Linux and Windows performance, reduce Server. Along with these components, it power consumption, SPONSORED CONTENT JANUARY/FEBRUARY 2015 streamingmedia.com WP81

exhaustive rate-distortion optimization for a full range of users. New target usage into slices. Together, EU, subslices, and on the CPU in order to obtain the best design optimizations provide significant slices are the modular building blocks possible quality. The tradeoff comes performance headroom and enable a that are composed to create product in a very high computational cost. wider variety of applications. Developers variants based upon Intel Processor Hardware encoders have commonly been can take advantage of greater flexibility in Graphics. Next in the architecture is regarded as less flexible and incapable performance and quality to meet various sharing DRAM physical memory with the of delivering the same quality. However, application requirements. For example, CPU. This unified memory architecture with Intel advances in hardware and hardware-specific features on Intel’s offers a number of system design, software, developers should quickly learn on-die integrated Processor Graphics power efficiency, and programmability to facilitate state-of-the-art algorithms Architecture, such as shared physical advantages over PCI Express hosted that should improve the quality. Intel memory and zero copy buffer transfers, discrete memory systems. The obvious Quick Sync Video technology pre-defines are programmable through the buffer advantage is that shared, coherent various target usages (TU) ranging from allocation mechanisms in APIs such as physical memory enables zero copy TU1 to TU7. These are based on a range OpenCL 1.0+ and DirectX11.2+. The effect, buffer transfers between CPUs and GEN of different performance and quality as shown in the Encoding Performance compute architecture. The net effect of tradeoffs. Note that at the Intel Media diagram in purple, is an increase in these newest architectures is that they Server-API level there are additional performance with a GPU accelerated benefit performance, conserve memory options available for developers to further codec. in this case HEVC. footprint, and indirectly conserve system fine tune these targets. Intel’s on-die integrated Processor power not spent needlessly copying data. Target Usage 7 (TU7) is the fastest Graphics Architecture encode mode provided on Intel Quick (generically referred Sync Video. TU7 is meant to provide the to as GEN) offers fastest mode and quality improvements, outstanding real-time but only when doing so does not hinder 3D rendering and the targeted performance. The TU7 media performance. setting in the new Intel Quick Sync Video However, its technology improve both energy efficiency underlying compute and visual quality while still meeting a architecture also (12x) real-time speed target. offers general- purpose compute “The highlight of Intel’s 14nm Broadwell capabilities that processor is the Gen8 GPU, which offers approach teraFLOPS 20% better compute and 50% more performance. The architecture of SUMMARY texturing performance than Haswell’s … Intel Processors Graphics delivers a Video delivery has become a critical delivering 8x more performance per clock full complement of high-throughput service offering for cloud service providers, compared with ’s Gen6 GPU, floating-point and integer compute enterprises, and communications eclipsing even the rapid growth rate that capabilities, a layered high-bandwidth providers. With the proliferation of viewing Moore’s Law predicts.” —Linley Group memory hierarchy, and deep integration devices—ranging from TVs to laptops to with on-die CPUs. Additionally, this is smart phones—efficiently transcoding The latest version of Intel Quick accomplished in a modular design that video from one compressed video format Sync Video is a significant improvement provides scalability for a variety of uses. to another is essential. over previous generations in terms of The architecture begins with compute Developers of video delivery services performance and quality. The new high- components called execution units should consider Intel® Quick Sync Video— performance hardware acceleration (EUs). EUs are clustered into groups available in Intel Xeon Processor E3-1200 engine improves speed and quality targets called sub-slices which are grouped v3—to enable hardware-accelerated transcoding to deliver better performance

References: https://software.intel.com/sites/default/files/managed/41/7d/Intel_HEVCwhitepaper_v1.3_03Nov2014.pdf than transcoding on the CPU without http://newsroom.intel.com/servlet/JiveServlet/previewBody/3880-102-1-6896/WP-HD-Graphics-4200.pdf sacrificing quality. To get enabled quickly, https://software.intel.com/en-us/file/compute-architecture-of-intel-processor-graphics-gen8pdf use the Intel® Media Server Studio, which https://software.intel.com/en-us/articles/d3d9-media-surface-sharing-between-intel-quick-sync-video-and-opencl-on- intel-hd-graphics provides development tools and libraries https://software.intel.com/en-us/intel-media-server-studio needed to develop, debug, and deploy media solutions on Intel server processors. Quote Source: http://www.linleygroup.com/newsletters/newsletter_detail.php?num=5223