Zero-Copy Video Streaming on Embedded Systems the Easy Way
Total Page:16
File Type:pdf, Size:1020Kb
Zero-Copy Video Streaming on Embedded Systems the Easy Way Michael Tretter - [email protected] Embedded Linux Conference-Europe Philipp Zabel - [email protected] Oct 25, 2017 Slide 1 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Examples ● Presentation capture and streaming ● Augmented Reality ● UAV video downlink ● Intercom CC BY-SA 4.0: Unknownlfcg – Own work CC BY 4.0: kremlin.ru Slide 2 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Agenda ● Video and graphics on embedded devices ● Hardware acceleration units and Zero-Copy buffer sharing ● Case study: i.MX6 ● The easy way ● Open Issues & Future Work Slide 3 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Building Blocks ● Recording / Streaming ● Receiving / Projection / Compositing ● Lens correction / Warping ● Transcoding CC BY-SA 4.0: DXR - Own work Slide 4 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Embedded System Requirements ● Portable ● Soft real-time ● Energy efficient ● “High” data rates ● Lightweight Limited processing power vs. Audio/Video use case Slide 5 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Specialized Co-processors ● Graphics Processing Unit ● Supported or preferred format in ● Video encoder and decoder memory differ ● ● FPGA Copy and conversion between hardware units required ● Camera ● Display Controller ● Network Controller Slide 6 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Zero-Copy "Zero-copy" describes computer ● Copying in CPU is expensive operations in which the CPU does not ● CPU memory bandwidth smaller perform the task of copying data from than hardware acceleration units one memory area to another. ● CPU cache management (Wikipedia) Slide 7 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Putting it all Together, Case study: i.MX6 ● Memory bandwidth ● Up to quad-core Cortex A9, 1 GHz ● Up to 533 MHz DDR3 SDRAM ● CPU memcpy ~500 MiB/s (1066 MT/s @ 64-bit) ● 1080p30 YUYV: 120 MiB/s ● Realistically, up to 2.5 GiB/s on ● Cache management overhead i.MX6Q, more on i.MX6QP Slide 8 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Putting it all Together, Case study: i.MX6 ● GPU: Vivante GC2000 ● Display: IPUv3 display interface CPUs ● Camera: IPUv3 capture interface External ● GPU VPU: Chips&Media CODA960 Memory Display AXI/ Interface AHB Capture Interface Internal VPU SRAM Slide 9 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Device Drivers and Interfaces ● GPU: Vivante GC2000 ● etnaviv (DRM) ● Display: IPUv3 display interface ● imx-drm (DRM, KMS) ● Camera: IPUv3 capture interface ● imx-media (Video4Linux2), staging ● VPU: Chips&Media CODA960 ● coda (Video4Linux) ● Userspace: Mesa/etnaviv (OpenGL) ● DMABuf Slide 10 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Kernel Userspace Slide Slide BeforeDMABuf W 11 - - © Pengutronix -http://www.pengutronix.de - CaptureCSI, V4L2) (IPUv3 mmap R CPU copy CPU 10/25/2017 W Encode (CODA960, V4L2) Encode (CODA960, mmap R W Kernel Userspace Slide Slide DMABuf W 12 CaptureCSI, V4L2) (IPUv3 - ©Pengutronix - - http://www.pengutronix.de - fd dup 10/25/2017 Encode (CODA960, V4L2) Encode (CODA960, R W Kernel Userspace Slide Slide DMABuf 13 Decode V4L2) (CODA960, - ©Pengutronix - - http://www.pengutronix.de - R W fd dup 10/25/2017 Display(IPUv3, DRM) R Device Drivers and Interfaces (API) ● V4L2 ioctls: Import/export DMABuf handles VIDIOC_EXPBUF from/into video devices VIDIOC_QBUF ● DRM ioctls: Import/export DMABuf handles DRM_IOCTL_PRIME_HANDLE_TO_FD from/into GPU or display contoller DRM_IOCTL_PRIME_FD_TO_HANDLE devices, used by libdrm/Mesa ● EGL extensions: These sit on top of EGL_EXT_image_dma_buf_import DRM_IOCTL_PRIME_* EGL_EXT_image_dma_buf_import_modifiers EGL_MESA_image_dma_buf_export Slide 14 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Device Drivers and Interfaces: V4L2 /* V4L2 DMABuf export */ /* V4L2 DMABuf import */ int video_fd = open("/dev/v4l/by-name/csi", int dmabuf_fd; O_RDWR); int video_fd = open("/dev/v4l/by-name/coda", O_RDWR); struct v4l2_requestbuffers reqbuf = { .count = 1, struct v4l2_requestbuffers reqbuf = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE, .count = 1, .memory = V4L2_MEMORY_MMAP, .type = V4L2_BUF_TYPE_VIDEO_OUTPUT, }; .memory = V4L2_MEMORY_DMABUF, ioctl(video_fd, VIDIOC_REQBUFS, &reqbuf); }; ioctl(video_fd, VIDIOC_REQBUFS, &reqbuf); struct v4l2_exportbuffer expbuf = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE, struct v4l2_buffer buf = { .index = 0, .type = V4L2_BUF_TYPE_VIDEO_OUTPUT, }; .memory = V4L2_MEMORY_DMABUF, ioctl(video_fd, VIDIOC_EXPBUF, &expbuf); .index = 0, .m.fd = dmabuf_fd, int dmabuf_fd = expbuf.fd; }; ioctl(video_fd, VIDIOC_QBUF, &buf); https://linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/vidioc-expbuf.html https://linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/dmabuf.html Slide 15 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 Device Drivers and Interfaces: EGL/OpenGL ES EGLint attrib_list[] = { int dmabuf_fd; EGL_WIDTH, 1920, int stride; EGL_HEIGHT, 1280, EGL_LINUX_DRM_FOURCC_EXT, eglExportDMABUFImageMESA( DRM_FORMAT_YUYV, egl_display, EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_fd, egl_image, EGL_DMA_BUF_PLANE0_FD_OFFSET_EXT, 0, &dmabuf_fd, EGL_DMA_BUF_PLANE0_FD_PITCH_EXT, 3840, &stride); EGL_NONE, }; EGLImageKHR egl_image = eglCreateImageKHR( egl_display, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT, NULL, attrib_list); glEGLImageTargetTexture2DOES( GL_TEXTURE_EXTERNAL_OES, egl_image); https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import.txt https://www.khronos.org/registry/EGL/extensions/MESA/EGL_MESA_image_dma_buf_export.txt Slide 16 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 The Easy Way Slide 17 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 GStreamer “GStreamer is a library for constructing graphs of media-handling components. The applications it supports range from simple Ogg/Vorbis playback, audio/video streaming to complex audio (mixing) and video (non-linear editing) processing. Applications can take advantage of advances in codec and filter technology transparently. Developers can add new codecs and filters by writing a simple plugin with a clean, generic interface.” Slide 18 - © Pengutronix - http://www.pengutronix.de - 10/25/2017 GStreamer gst-launch-1.0 playbin uri=file:///home/mtr/Videos/tears_of_steel_720p.mkv <GstPlayBin> playbin0 [>] current-uri="file:////home/mtr/Videos/tears_of_steel_720p.mkv" source=(GstFileSrc) source n-video=1 current-video=0 n-audio=1 current-audio=0 audio-sink=(GstPulseSink) pulsesink0 video-sink=(GstXvImageSink) xvimagesink0 video-stream-combiner=(GstInputSelector) inputselector0 audio-stream-combiner=(GstInputSelector) inputselector1 sample=((GstSample*) 0x5590b656c030) GstPlaySink playsink [>] parent=(GstPlayBin) playbin0 flags=video+audio+text+soft-volume+deinterlace+soft-colorbalance sample=((GstSample*) 0x5590b656c110) video-sink=(GstXvImageSink) xvimagesink0 audio-sink=(GstPulseSink) pulsesink0 send-event-mode=first GstBin abin [>] parent=(GstPlaySink) playsink GstPlaySinkAudioConvert GstPulseSink aconv pulsesink0 [>] [>] parent=(GstBin) abin parent=(GstBin) abin GstQueue use-converters=TRUE enable-last-sample=FALSE aqueue audio/x-raw device="alsa_output.pci-0000_00_1b.0.analog-stereo" [>] format: F32LE current-device="alsa_output.pci-0000_00_1b.0.analog-stereo" parent=(GstBin) abin audio/x-raw layout: interleaved device-name="Built-in Audio Analog Stereo" current-level-buffers=44 format: F32LE GstAudioConvert GstAudioResample audio/x-raw rate: 44100 current-level-bytes=360448 layout: interleaved audio/x-raw conv audio/x-raw resample format: F32LE channels: 2 current-level-time=1022000000 rate: 44100 format: F32LE [>] format: F32LE [>] layout: interleaved src channel-mask: 0x0000000000000003 sink silent=TRUE channels: 2 layout: interleaved parent=(GstPlaySinkAudioConvert) aconv layout: interleaved parent=(GstPlaySinkAudioConvert) aconv sink rate: 44100 [>][bfb] [ >][ bfb] channel-mask: 0x0000000000000003 [>][bfb] rate: 44100 rate: 44100 channels: 2 channels: 2 channels: 2 sink src channel-mask: 0x0000000000000003 proxypad14 proxypad13 channel-mask: 0x0000000000000003 sink src channel-mask: 0x0000000000000003 sink src [>][bfb] audio/x-raw [ >][ bfb] [>][bfb][T] [>][bfb] [>][bfb] [ >][ bfb] [>][bfb] [ >][ bfb] format: F32LE layout: interleaved rate: 44100 channels: 2 GstIdentity identity audio/x-raw channel-mask: 0x0000000000000003 format: F32LE [>] proxypad15 parent=(GstPlaySinkAudioConvert) aconv layout: interleaved [>][bfb] rate: 44100 signal-handoffs=FALSE channels: 2 sink channel-mask: 0x0000000000000003 [>][bfb] sink src [>][bfb] [ >][ bfb] GstInputSelector inputselector0 video/x-raw [>] format: I420 parent=(GstPlayBin) playbin0 video/x-raw width: 1280 n-pads=1 GstURIDecodeBin format: I420 height: 534 uridecodebin0 video/x-raw active-pad=(GstSelectorPad) sink_0 GstBin width: 1280 interlace-mode: progressive GstStreamSynchronizer [>] format: I420 vbin height: 534 pixel-aspect-ratio: 1/1 streamsynchronizer0 parent=(GstPlayBin) playbin0 width: 1280 video/x-raw [>] [>] uri="file:////home/mtr/Videos/tears_of_steel_720p.mkv" height: 534 sink_0 interlace-mode: progressive chroma-site: mpeg2 format: I420 parent=(GstPlaySink) playsink parent=(GstPlaySink) playsink source=(GstFileSrc) source interlace-mode: progressive running-time=41000000 pixel-aspect-ratio: 1/1 colorimetry: bt709 width: 1280 caps=video/x-raw(ANY); audio/x-raw(ANY); text/x-raw(ANY); subpicture/x-dvd; subpictur… pixel-aspect-ratio: 1/1 t ags=(( Gst TagList *) 0x7fa4c40d7f20) chroma-site: