OPTO−ELECTRONICS REVIEW 21(1), 39–51

DOI: 10.2478/s11772−013−0072−z

Video coding and transmission standards for 3D – a survey

A. BUCHOWICZ*

Institute of Radioelectronics, Warsaw University of Technology, 15/19 Nowowiejska Str., 00-665 Warsaw, Poland
*e-mail: [email protected]

The emerging 3D television systems require effective techniques for the transmission and storage of data representing a 3-D scene. The 3-D scene representations based on multiple video sequences, or on multiple views plus depth maps, are especially important since they can be processed with existing video technologies. A review of the relevant video coding and transmission techniques is presented in this paper.

Keywords: television, video coding, video broadcasting, video streaming.

1. Introduction

The goal of a 3D television (3DTV) system is to provide a 3-D viewing experience to the viewers. Sophisticated information processing techniques are currently being developed to achieve this goal. The main areas of research are:
• 3-D scene capturing, to create a 3-D scene representation suitable for further processing [1];
• coding and transmission of the 3-D scene representation;
• display technologies, to present a 3-D scene reconstructed from its representation.

The simplest 3-D scene representation can be obtained with two video cameras in a stereoscopic setup (Fig. 1). Only a very limited 3-D viewing experience can be achieved with such a representation: the scene can be observed from only one fixed position. This 3-D mode is widely supported by currently available displays; however, wearing special 3-D glasses is required to perceive the depth of the scene.

Fig. 1. Stereo view capturing module (C – video camera).

The multiview scene representation obtained with a multi-camera capturing system (Fig. 2) offers higher flexibility – viewers can switch between views and select their viewing position in the scene. Stereoscopic displays with an eye/head tracking system or multiview autostereoscopic displays [2] can be used for view selection. The free viewpoint television (FTV) systems widely discussed in the literature [3,4] are based on this concept.

Fig. 2. Multiview capturing system (C – video camera).

The number of views in the multiview 3-D scene representation is fixed and limited by the throughput of the transmission channel. The 3-D viewing experience is thus limited to a very narrow angular range. Additional scene geometry information, e.g., depth maps, can be added to overcome those limitations. The 3-D scene representation based on multiple views with added depth maps (multiview plus depth, MVD) [5] allows for the synthesis of additional views, thus reducing the number of views to be transmitted. The depth maps can be estimated by the analysis of the neighbouring views (Fig. 3) [6].

Fig. 3. Multiview plus depth capturing system (C – video camera, DE – depth estimation unit).

Depth maps can also be generated with the use of distance measuring devices (Fig. 4), e.g., time-of-flight cameras or pattern projection techniques [1]. An interesting solution based on infrared light fringe patterns and fast video cameras is presented in Ref. 7.

Fig. 4. Multiview plus depth capturing system (C – video camera, DM – distance measuring device).

The other 3-D scene representations, such as surface-based or volumetric [8], are used in practice as well. However, 3-D scene representations based on multiple views, eventually enhanced by depth information, have an important advantage – they can be encoded and transmitted with the use of existing or enhanced techniques developed for standard video. A survey of these techniques is presented in this paper.

This paper is organized as follows. The review of the video coding standards is given in Sect. 2. Issues related to media stream multiplexing and storage are discussed in Sect. 3. Techniques used for video transmission are presented in Sect. 4. Conclusions and future research directions are discussed in Sect. 5.


2. Video coding

A video sequence is a representation of a natural scene sampled temporally and spatially. In most cases it requires a huge amount of data to reproduce the scene with good quality. The goal of video coding is to find a compact representation of the video sequence while preserving its quality. Video compression is achieved by the exploitation of the temporal and spatial redundancy in the video sequence [9].
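As a rough illustration of how temporal redundancy can be exploited, the sketch below (Python, assuming 8-bit grayscale frames held in NumPy arrays) performs a naive full-search block matching against the previous frame and returns the prediction residual that a hybrid coder would then transform code. The block size, search range and SAD criterion are illustrative assumptions, not the method prescribed by any particular standard.

```python
import numpy as np

def motion_compensated_residual(prev, cur, y, x, block=8, search=8):
    """Toy temporal prediction for one block: full-search block matching
    followed by residual computation (illustrative only)."""
    target = cur[y:y + block, x:x + block].astype(np.int16)
    best_sad, best_pred = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > prev.shape[0] or rx + block > prev.shape[1]:
                continue
            cand = prev[ry:ry + block, rx:rx + block].astype(np.int16)
            sad = np.abs(target - cand).sum()          # sum of absolute differences
            if best_sad is None or sad < best_sad:
                best_sad, best_pred = sad, cand
    residual = target - best_pred     # only this (plus the motion vector)
    return residual                   # would be transform coded and quantized
```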

2.1. MPEG-4 AVC/H.264

The MPEG-4 AVC/H.264 standard is currently the most important one in the area of video coding [10]. It is based on the hybrid motion compensation and transform coding algorithm, like many of its predecessors. Significant enhancements of the classic algorithm have been implemented in this standard to improve its coding efficiency [11–14].

The H.264/AVC encoder is divided into two layers: the video coding layer (VCL) and the network abstraction layer (NAL). The VCL processes video data: each frame of the input sequence is partitioned into a set of macroblocks, each macroblock is temporally or spatially predicted, and its prediction error is then transform coded. The VCL generates a stream of encoded macroblocks organized into slices. A slice covers a part of (or an entire) frame and can be parsed independently from other slices. The NAL formats the output stream of the encoder as a series of packets called NAL units. The set of consecutive NAL units decodable into a single frame is called an access unit (AU).

Each NAL unit is composed of a one-byte header followed by its payload (Fig. 5). The header contains three fields describing the payload:
• F – error flag (1 bit); NAL units with this field set to 1 should not be processed;
• NRI – NAL unit priority (2 bits); this field should be set to 0 in all NAL units not used as a reference by other NAL units. The higher the value of this field, the more important the NAL unit is for the video sequence reconstruction;
• TYPE – NAL unit type (5 bits); values 1–23 are restricted to be used only within the H.264/AVC standard. Values 0 and 24–31 may be used for other purposes, e.g., in transmission.

The length of a NAL unit is not encoded in its header. Therefore, NAL units must be prefixed with start codes (e.g., as defined in Annex B of the H.264/AVC standard) or encapsulated in additional data structures (e.g., transmission protocol packets) to allow their separation.

Fig. 5. NAL units in the H.264/AVC bitstream.
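Since the NAL unit header is a single byte, its three fields can be recovered with simple bit operations. The following is only a minimal sketch of the layout described above; the variable names are descriptive and not taken from any reference implementation.

```python
def parse_nal_header(first_byte: int):
    """Split the one-byte H.264/AVC NAL unit header into its three fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01   # F: must be 0 in a valid unit
    nal_ref_idc = (first_byte >> 5) & 0x03          # NRI: 0 = not used as a reference
    nal_unit_type = first_byte & 0x1F               # TYPE: 1..23 reserved for H.264/AVC
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

# Example: 0x67 = 0b0_11_00111 -> F = 0, NRI = 3, TYPE = 7 (sequence parameter set)
print(parse_nal_header(0x67))
```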

There are two main NAL unit categories:
• VCL NAL units, containing encoded video data; the following types have been defined:
– coded slice of an instantaneous data refresh (IDR) picture (TYPE = 5); an IDR picture allows the decoding process to be started. The first frame in the video sequence is always encoded in the IDR mode. IDR pictures may be repeated in the video sequence to allow stream switching or recovery from transmission errors;
– coded slice of a non-IDR picture (TYPE = 1);
– coded slice data partition (TYPE = 2, 3, 4); data partitioning is an error resilience tool available in the H.264/AVC standard; a more detailed description can be found in Ref. 10;
• non-VCL NAL units, carrying associated additional information; the most important types are:
– sequence parameter set (SPS, TYPE = 7), containing infrequently changing information relating to the entire video sequence;
– picture parameter set (PPS, TYPE = 8), containing infrequently changing information relating to one or more pictures in a video sequence;
– supplemental enhancement information (SEI, TYPE = 6), providing information supporting the decoding process or displaying the reconstructed video sequence;
– access unit delimiter (TYPE = 9);
– end of sequence (TYPE = 10);
– end of stream (TYPE = 11).

The syntax of the H.264/AVC bitstream is more flexible with respect to the previous video coding standards. The only required syntax element in every access unit is the VCL NAL unit containing at least one slice of the primary coded picture (Fig. 6). In certain profiles of the H.264/AVC standard the primary coded data may be followed by VCL NAL units with a redundant representation of this picture.

Fig. 6. Access unit representation in the H.264/AVC bitstream.

Each slice header refers directly to the PPS and indirectly to the SPS; both parameter sets must be known to the decoder to allow slice processing. Usually, non-VCL NAL units with the SPS and PPS are transmitted before any VCL NAL unit in the same channel ("in-band"). However, SPS/PPS NAL units may be transmitted in an additional, more reliable channel as well ("out-of-band"). SEI messages may be very useful in the VCL NAL unit processing, but they are not necessary to decode the access unit. Similarly, access unit delimiters are not required to detect the beginning of a new frame in the encoded video sequence. The frame boundaries can be derived from the slice headers in VCL NAL units; however, it is a resource-consuming process. The H.264/AVC bitstream processing may be significantly simplified if it contains access unit delimiters. Multiple SPSs and PPSs may be defined and used by an encoder in the same bitstream. It is only required that VCL NAL units using a different SPS should be preceded by the end of sequence NAL unit. The end of stream NAL unit may follow the very last access unit in the entire sequence.
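To make the byte-stream structure concrete, the sketch below scans an Annex B buffer for start codes and reports the type of each NAL unit it finds, using the TYPE values listed above. It is a simplified illustration (it ignores, e.g., emulation prevention bytes), not a complete parser.

```python
import re

START_CODE = re.compile(b"\x00\x00\x00\x01|\x00\x00\x01")

NAL_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS",
             9: "access unit delimiter", 10: "end of sequence", 11: "end of stream"}

def split_annex_b(data: bytes):
    """Yield the NAL units located between Annex B start codes."""
    matches = list(START_CODE.finditer(data))
    for cur, nxt in zip(matches, matches[1:] + [None]):
        end = len(data) if nxt is None else nxt.start()
        yield data[cur.end():end]

def describe(data: bytes):
    """Print the type and length of every NAL unit in the buffer."""
    for nal in split_annex_b(data):
        if nal:
            t = nal[0] & 0x1F
            print(NAL_NAMES.get(t, f"type {t}"), f"({len(nal)} bytes)")
```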

2.1.1. Scalable extension

In many applications it is desirable to deliver the same video content in multiple versions differing in spatial resolution, frame rate or image quality. This goal can be easily achieved by simulcast transmission of all required versions of the video content, but it is a highly inefficient solution. Scalable coding is another possibility. A scalable video bitstream is composed of two or more hierarchical layers (Fig. 7).

Fig. 7. Scalable encoded video sequence with two layers. Spatial and temporal scalability modes are used.

The lowest (base) layer carries the lowest quality video content. The upper (enhancement) layers contain data required to reconstruct the video at a better quality. There are the following scalability modes:
• temporal – enhancement layers allow a video sequence reconstruction with an increased frame rate;
• spatial – as above, but with an increased spatial resolution;
• quality (SNR) – as above, but with an increased signal to quantization noise ratio.

The above scalability modes can be combined and used jointly in a single scalable bitstream. Multiple representations of the video sequence with different spatio-temporal resolutions and quantization parameters can be created in this way.

Scalable video coding tools were available in preceding standards, e.g., MPEG-2 Video [15], H.263 [16] and MPEG-4 Visual [17]. However, these tools were inefficient and were rarely used in practice. The scalable video coding (SVC) extension was introduced in the fourth edition of the H.264/AVC standard [18,19]. The extension supports all scalability modes. Basic coding tools available in the single-layer H.264/AVC standard are used within each layer of a scalable bitstream. Additionally, the following coding tools were developed:
• interlayer intra prediction – the intra-coded macroblock in the enhancement layer is predicted from the co-located (and up-sampled, if spatial scalability is used) macroblock in the reference layer;
• interlayer motion prediction – motion parameters for the macroblock in the enhancement layer are derived from the co-located macroblock in the reference layer;
• interlayer residual prediction – the residual signal of the inter-coded macroblock in the enhancement layer is predicted from the residual signal of the co-located macroblock in the reference layer.

The above tools exploit the statistical dependencies between layers to increase the efficiency of the scalable encoder. The SVC extension provides high coding efficiency – the SVC stream bitrate is in most cases only slightly higher than that of an equivalent quality non-scalable AVC stream [20].

Three new NAL unit types have been defined to carry the scalable extension data. The structure of the new SVC NAL units is similar to that of the basic AVC NAL units [21]. The scalable video bitstream can be easily adapted to the transmission channel throughput or decoder capabilities by simply discarding unnecessary layers. The additional fields in the extended header of SVC NAL units are used to select the layers required to decode the video sequence at the requested quality level. However, if combined scalability is used, the information carried in this header is not sufficient to select all the required layers. It can be derived from the slice headers, but it means that all NAL units must be at least partially parsed. Special SEI messages describing the scalable hierarchy have been defined to simplify the scalable bitstream processing [21].
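The layer-dropping adaptation described above can be illustrated with a short sketch. It assumes the SVC NAL unit header extension has already been parsed into the dependency (D), quality (Q) and temporal (T) identifiers; the data class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SvcNalUnit:
    # Hypothetical, already-parsed view of the SVC NAL unit header extension:
    dependency_id: int   # spatial layer (D)
    quality_id: int      # SNR/quality layer (Q)
    temporal_id: int     # temporal layer (T)
    payload: bytes

def thin_bitstream(units, max_d, max_q, max_t):
    """Keep only the layers needed for the requested operation point;
    everything above (max_d, max_q, max_t) is simply dropped."""
    return [u for u in units
            if u.dependency_id <= max_d
            and u.quality_id <= max_q
            and u.temporal_id <= max_t]
```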
The base layer of the SVC bitstream is compatible with H.264/AVC. A single-layer AVC decoder may simply discard all SVC NAL units and reconstruct the base layer from the remaining NAL units. The SVC bitstream can also be transcoded into an AVC bitstream if certain constraints are met. An interesting example of this functionality is the lossless rewriting of a multilayer SVC bitstream into a single-layer AVC bitstream [22].

2.1.2. Multiview extension

Advances in display technologies and the increasing availability of 3-D content force the development of effective coding and transmission techniques for stereo or multiview video [2]. Stereoscopic video transmission can be implemented by a simulcast or sequential transmission of the full-resolution left eye and right eye views. However, it would require twice as much bandwidth as a single view. The bandwidth increase will be even higher if more views are to be transmitted. The views can also be transmitted with the use of frame-compatible formats [23]. In these formats the left and right eye views are first subsampled and then put together, e.g., side-by-side, into a single frame. Frame-compatible formats can be easily implemented in any existing video transmission infrastructure. However, the view resolution is reduced by 50% in such an approach and it cannot be extended to more views.

The multiview coding efficiency can be significantly increased if the redundancies between neighbouring views are exploited. A frame in a certain view can be predicted from a time-aligned frame in another view, similarly as a frame can be predicted from another frame in the same video sequence (Fig. 8). The view which is used as a reference for other views is called the base view; the other views are called the enhancement views, analogously to layers in scalable coding.

Fig. 8. Inter-view prediction.

The multiview profile had already been added to the second edition of the MPEG-2 Video standard [15]. However, it has not been deployed in market products owing to its limited efficiency. The multiview video coding (MVC) extension has been appended to the fifth edition of the H.264/AVC standard [24].

The MVC extension is based on the proven tools used for standard video coding, supplemented by new tools utilizing the inter-view dependencies. A significant improvement of the coding efficiency has been achieved with these tools. An enhancement view usually requires not more than half the bit rate necessary to encode this view with the same quality independently of the other views. Coding efficiency was not the only requirement for the MVC extension; backward compatibility with the single-view AVC encoder was requested as well. The AVC encoder architecture with the video coding (VCL) and network abstraction (NAL) layers was reused in the MVC encoder design.

The MVC bitstream provides view scalability – only selected NAL units may be decoded to reconstruct the requested views. The base view is encoded in conformance with the single-view H.264/AVC standard. The MVC bitstream may therefore be processed by a standard AVC decoder, which can simply ignore all MVC extension NAL units and reconstruct the base view.

Several new SEI messages have been defined in the MVC extension. The SEI messages are not necessary to decode the MVC bitstream, similarly as in the case of the AVC and SVC bitstreams. However, the MVC bitstream processing may be simplified if the SEI messages are available and used by the decoder.

2.2. High efficiency video coding

The MPEG-4 AVC/H.264 video coding standard is commonly used in many multimedia applications. However, the user requirements are still growing and more effective compression tools are demanded. A standardization process for a new video coding standard has therefore been initiated [25]. The new standard, called high efficiency video coding (HEVC), is expected to provide significantly higher compression effectiveness with respect to MPEG-4 AVC/H.264. It is especially important for the increasingly popular HD and emerging Ultra-HD (4k × 2k and more) video applications (e.g., digital cinema).

The proposals submitted in response to the call are based on the traditional hybrid video coding algorithm [25]. Several new compression tools have been proposed to fulfil the requirements for higher efficiency [26,27]:
• variable size coding units: frame partitioning into the fixed size macroblock structure used in the previous standards has been replaced by partitioning into a variable size (from 8 × 8 up to 64 × 64) structure of coding units (CU). The CU structure adapts to the picture characteristics – smaller CUs are used in regions with many details, larger CUs in uniform regions;
• quad-tree partitioning of coding units into variable size prediction units (PU) and transform units (TU). The prediction mode is selected at the PU level, the spatial transform is applied to the TU. The structure of the PUs may be independent from the structure of the TUs;
• adaptive loop filters: used in addition to the deblocking filter to improve image quality;
• adaptive interpolation filter for the samples in the sub-pixel positions, to improve the quality of the prediction signal;
• entropy coding with the use of probability interval partitioning: the unit interval is divided into a small number of probability intervals to decouple the probability modelling from the entropy coding.

The HEVC standard is still being developed. It has reached the Committee Draft level, and the standardization process is expected to be completed in the beginning of 2013. It is worthwhile to mention that, similarly as for MPEG-4 AVC/H.264, the scalable [28] and multiview [29] extensions for HEVC have been proposed.
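The variable-size coding unit concept can be sketched as a recursive quad-tree split. The example below replaces the encoder's rate-distortion decision with a crude variance test on a NumPy luma array, so the split criterion and its threshold are purely illustrative assumptions.

```python
def split_cu(frame, x, y, size, min_size=8, max_size=64, detail=None):
    """Toy quad-tree partitioning: recursively split a coding unit while a
    (hypothetical) detail measure says the region is not uniform."""
    if detail is None:
        detail = lambda f, x, y, s: f[y:y + s, x:x + s].std()  # crude stand-in
    if size > min_size and (size > max_size or detail(frame, x, y, size) > 10.0):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += split_cu(frame, x + dx, y + dy, half, min_size, max_size, detail)
        return cus
    return [(x, y, size)]   # leaf CU: prediction and transform would follow

# split_cu(luma, 0, 0, 64) returns a list of (x, y, size) leaf coding units
```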
2.3. 3D video

A multiview 3-D scene representation with a fixed number of views allows only a limited implementation of the free viewpoint television concept. On the contrary, the multiview video plus depth (MVD) representation allows for the synthesis of any view within a certain range [5], providing a much better viewing experience. Although the MVD representation may be encoded with the use of existing tools – depth maps may be handled as an unrelated monochrome image – better results may be achieved if views and depth maps are encoded jointly. MPEG initiated a standardization process for 3D video (3DV) based on the MVD representation in the beginning of 2011 [30,31]. The goal is to create effective compression tools allowing for the synthesis of high quality views. Another requirement is compatibility with the existing coding standards, as well as an ability to extract the data required for the presentation on traditional mono or stereo displays. Depth map extraction techniques are beyond the scope of the call [30]; the supplied test material for the proposal evaluation contains both views and depth maps.

The proposals submitted in response to the call fall into two categories: AVC-based and HEVC-based. It has been decided that the 3DV standard will be developed in two parallel tracks, one for each of the above categories [32]. The following coding tools have been selected for the AVC-based test model [33]:
• inter-view prediction based on view-synthesised pictures;
• motion vector prediction for texture coding with the use of associated depth information;
• joint inter-view depth filtering to denoise depth data;
• depth range-based prediction to compensate inconsistencies of depth maps generated for different views;
• reuse of texture motion information in depth coding;
• slice header prediction to remove redundancy between texture slice headers and depth slice headers.

The HEVC-based test model is based on the following coding tools [34]:
• dependent view encoding with the use of disparity compensated prediction (adapted from MVC);
• inter-view prediction with the use of a view-synthesized picture;
• an adaptive filter to remove artefacts created by the view synthesis process;
• inter-view motion prediction with the use of depth information;
• inter-view residual prediction based on the depth information;
• bit assignment (QP adjustment) for the texture component based on the depth information to improve perceptual quality;
• non-linear depth map representation to improve the quality of the synthesised views;
• modified motion information coding to preserve sharp edges in depth maps;
• depth map intra prediction modes providing a better representation of edges;
• motion parameter inheritance – reuse of texture motion information in depth coding.

The 3DV standard is currently at an early stage of development. The completion of the standardization process is expected in the middle of 2013.

3. Media streams multiplexing and storage

Multiple media streams are transmitted or stored in many multimedia systems. It is often required to combine the media streams into a single multiplexed stream to simplify the transmission or storage. The MPEG-2 transport stream (TS) [35] is widely used to multiplex media streams, mainly in digital television broadcasting and streaming in computer networks. It is also used for storage, but other file formats (multimedia containers) are more popular there. The MPEG MP4 file format is becoming the most important in this area [36].

3.1. MPEG-2 transport system

The MPEG-2 TS is a tool for multiplexing multiple elementary streams (ES) of compressed audio, video and auxiliary data. Additional information is also provided to enable synchronization of the media streams processing and their presentation. The elementary streams are encapsulated into packets before being multiplexed (Fig. 9). The packetized elementary stream (PES) is created in this way. The packet headers contain timing information related to the portions of the media streams carried as the packet payloads. The length of the packet is stored in its header. The PES packets may have fixed, variable or unspecified (signalled as equal to zero) lengths, depending on the application.

Fig. 9. MPEG-2 transport stream multiplexer.

All elementary streams combined into the transport stream may have a common time base. Such a stream is referred to as a single program transport stream (SPTS). However, groups of audio, video and data streams (television programs), each group with an independent time base, may be multiplexed as well. Such a stream is called a multiple program transport stream (MPTS). The MPEG-2 TS is a series of packets with a fixed length equal to 188 B. The fixed and relatively short length of the TS packets simplifies the error protection of the transport stream. Therefore, the MPEG-2 TS is used mainly for the transmission in lossy or noisy channels with high error probability, e.g., in broadcast networks. Each MPEG-2 TS packet starts with a 4-byte header containing several fields. The most important is the 13-bit packet identifier (PID) allowing proper interpretation of the packet payload.

The MPEG-2 TS also contains additional packets with metadata called program specific information (PSI). The PSI completely describes the television programs and their media streams multiplexed into the MPEG-2 TS. The PSI is organized into several tables. Additional data structures, called descriptors, have been defined to carry information on either the entire transport stream, or on a television program, or on the media streams. The MPEG-2 TS specification defines five types of PSI tables [35]:
• program association table (PAT): required, contains the list of all television programs multiplexed into the TS; transmitted in the TS packets with the PID value equal to zero;
• program map table (PMT): required for each television program multiplexed into the TS, describes the media streams and the stream containing synchronization data (program clock reference – PCR) for the television program; transmitted in the TS packets with the PID value specified in the PAT list entry for the television program;
• conditional access table (CAT): required if any of the media streams is scrambled;
• network information table (NIT): optional, its content is application specific, usually describes the TS provider;
• transport stream description table (TSDT): optional, contains descriptors relating to the entire TS.

The MPEG-2 TS specification was created before the development of the H.264/AVC standard had started. It was later extended [37,38] to allow multiplexing of H.264/AVC bitstreams into the transport stream.
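A minimal sketch of the fixed TS packet layout described above: each 188-byte packet is checked for the 0x47 sync byte and its 13-bit PID is extracted. Error handling and the adaptation field are deliberately omitted.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def iter_ts_packets(buf: bytes):
    """Walk a buffer of 188-byte MPEG-2 TS packets and report the PID of each."""
    for off in range(0, len(buf) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = buf[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            raise ValueError("lost TS packet synchronization")
        payload_unit_start = bool(pkt[1] & 0x40)
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]        # 13-bit packet identifier
        yield pid, payload_unit_start

# PID 0x0000 carries the PAT; the PAT in turn points at each program's PMT.
```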

The amendment [37] defines an identifier and two descriptors for the H.264/AVC bitstream. Additionally, some restrictions are imposed on the H.264/AVC bitstream. The most important are:
• the beginning of each access unit must be signalled by the AU delimiter NAL unit;
• the H.264/AVC bitstream must contain all the SPS and PPS necessary for its decoding.

The MPEG-2 TS specification was further amended to enable transmission of the scalable [39] and the multiview [40] extensions of the H.264/AVC standard.

3.2. MP4 file format

The ISO base media file format [41] (published also as a part of the JPEG-2000 standard [42]) is the basis of many file formats used to store multimedia content. It was generalized from the MP4 file format [36] which, in turn, was derived from Apple's QuickTime file format [43].

MP4 is a universal multimedia container designed to store multiple time-based media streams of different types. The media information is stored in the MP4 file in a series of data structures called boxes (atoms in the QuickTime file format). The MP4 file starts with a file-type box (ftyp, Fig. 10). This box identifies the specifications to which the file complies and describes how the file should be interpreted. The movie box (moov) contains metadata on all media streams stored in the file. Information for each media stream is stored in a separate track. Each track contains timing information for the media stream, specifies the stream parameters (e.g., width/height for video) and defines the coding standard. The movie box also contains information on where the encoded data can be found. This data may be stored in a single media data box (mdat). The media data may also be split into chunks and interleaved in the media data box. The timing information in the movie box allows for a random access to the requested point in any media stream. Furthermore, it is not required that all media data specified in the movie box are contained in the same file. The pointers in the movie box structure may refer to media data in the same MP4 file or in remote locations identified by their URL.

Fig. 10. MP4 file with a single media data box.

The entire multimedia presentation may also be divided into many fragments (Fig. 11). The metadata for each fragment is contained in the movie fragment box (moof). Its structure is analogous to the structure of the movie box. The media data for each fragment are stored in multiple media data boxes (mdat) interleaved with the movie fragment boxes. The last box in the fragmented MP4 file is the movie fragment random access box (mfra) containing information on all media fragments. The mfra box simplifies random access to a specified fragment of the multimedia presentation stored in the MP4 file. Each fragment can be accessed and decoded independently from other fragments. This feature is used in the HTTP-based streaming solutions that are presented in the next section.

Fig. 11. MP4 file with media data split into fragments.

The AVC file format is an extension of the ISO base media file format [44]. It is designed to store AVC, SVC or MVC video streams. It inherits all data structures defined in the base specification and defines new data structures to support features specific to the AVC/SVC/MVC codecs, such as parameter sets or switching pictures. An excellent overview of the new data structures related to the scalable video stream can be found in Ref. 45. The multiview video related issues are discussed in Ref. 46.
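Because every box starts with a 32-bit size and a four-character type, the top-level structure of an MP4 file can be listed with a few lines of code. The sketch below is a simplified walker; it skips box payloads and does not handle a size field of zero (which means "to end of file").

```python
import struct

def iter_boxes(f):
    """Walk the top-level boxes of an ISO base media (MP4) file: each box
    starts with a 32-bit size and a 4-character type."""
    while True:
        header = f.read(8)
        if len(header) < 8:
            break
        size, box_type = struct.unpack(">I4s", header)
        if size == 0:                                  # box extends to end of file
            break
        if size == 1:                                  # 64-bit "largesize" variant
            size = struct.unpack(">Q", f.read(8))[0]
            payload = size - 16
        else:
            payload = size - 8
        yield box_type.decode("ascii", "replace"), payload
        f.seek(payload, 1)                             # skip to the next box

# with open("movie.mp4", "rb") as f:            # hypothetical file name
#     for box, length in iter_boxes(f):
#         print(box, length)    # typically: ftyp, moov, mdat (or moof/mdat pairs)
```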
4. Video transmission

An effective transmission of the encoded video sequence is another important issue in multimedia systems [46,23]. In most cases the video, audio and auxiliary data streams multiplexed in the MPEG-2 TS [35] are transmitted. However, it is not the only solution. The elementary (video, audio, data) streams may be transmitted separately over IP networks, as will be discussed later in Sect. 4.2.

4.1. Broadcasting

Broadcasting is the way of distributing multimedia data through open radio networks. Television is probably the best known example of a broadcasting system. Analogue video and audio signals have been used in television for many years. The transition to the digital domain was initiated in the late 90's and is scheduled to be completed in EU countries in 2012. The digital switchover would not be possible without channel coding standards defining modulation parameters, protective coding, etc. [47]. Many standards crucial for digital television have been developed by the European Digital Video Broadcasting Project [48].

The DVB-S standard is suitable for satellite based transmission systems [49]. It uses QPSK modulation and error protection based on the Reed-Solomon block code and convolutional codes. The new DVB-S2 standard offers higher efficiency and flexibility than its predecessor DVB-S [50]. This has been achieved by a new error correction scheme based on LDPC codes concatenated with BCH codes, new modulation schemes, adaptive coding and modulation (ACM) functionality and many other enhancements.

The DVB-C standard has been developed for cable television (CATV) networks [51]. It reuses many functional blocks utilized in the DVB-S standard. However, there are also some important differences: QAM modulation is used instead of QPSK, and convolutional codes are not used for error protection. The second generation standard for cable systems, DVB-C2, has been adopted lately [52]. By the use of COFDM modulation and advanced coding techniques, the transmission efficiency has been significantly increased, allowing the deployment of new services in CATV networks.

The terrestrial transmission is more complex than satellite or cable due to the multipath propagation. The DVB-T standard [53] for digital terrestrial television was completed several years later than the DVB-S and DVB-C standards. It uses multi-carrier COFDM modulation to overcome multipath propagation related problems, and the Reed-Solomon block code combined with convolutional codes for error protective coding. Its enhancement is the new DVB-T2 standard, providing an increased number of carriers (up to 32k), additional modulation constellations, error protective coding based on LDPC/BCH codes, rotated constellations and many other improvements [54].

The DVB-H standard is an important extension of the DVB-T standard [55]. Its features allow for an implementation of media and data delivery services to mobile terminals [56,57]. It introduces time slicing to reduce power consumption in mobile terminals, novel MPE-FEC error protection coding and IP based data transmission. The DVB-H uses the same physical layer as the DVB-T, therefore it is possible to mix DVB-T and DVB-H services within one channel.

The channel coding techniques described in the DVB standards are used in conjunction with source coding algorithms and elementary stream multiplexing into the MPEG-2 transport stream. Besides the standard program specific information defined in the MPEG-2 system specification (presented in Sect. 3.1), additional tables and descriptors have been defined by the DVB consortium [58]. These additional data structures allow for the implementation of digital television specific services, e.g., the electronic program guide (EPG).

4.2. IP streaming

There are generally two approaches to media delivery over IP networks. The first one uses TCP as a transport protocol. The file download with the use of the HTTP protocol is the most obvious example. The other example is the so-called HTTP progressive download. In this case the file transmitted with the use of HTTP is split into many small fragments. Each fragment is transmitted in a separate HTTP request, allowing media playback after receiving only a small part of the entire file. The advantage of the HTTP-based solutions is the ability to traverse firewalls, so this approach is widely used in the Internet (e.g., YouTube). However, its applications in real-time systems are limited owing to the delays introduced by the TCP transmission.

The other approach is based on UDP as a transport protocol. It is preferred in real-time applications, e.g., video-conferencing or video surveillance. There are usually very strict requirements on the transmission delay in such applications. These requirements can be fulfilled only if UDP is used. However, since UDP is an unreliable protocol, some datagrams may be lost, duplicated or may arrive at the destination in the wrong order. The real time protocol (RTP), accompanied by the RTP control protocol (RTCP), were developed to eliminate these drawbacks [59,60]. The RTP provides a data transport mechanism, while the RTCP is a tool for data transmission monitoring. Both protocols are most often used on top of UDP; however, it is possible to use them with other transport protocols as well.

4.2.1. TCP-based streaming

There are many multimedia delivery solutions based on the TCP protocol. The real time messaging protocol (RTMP) [61] was developed by Macromedia (later acquired by Adobe) for its Flash Player. The multimedia data are split into small fragments sent as a series of RTMP messages. Each message contains a timestamp and a payload type identifier. The RTMP allows for simultaneous transmission of many time-synchronized audio and video streams. The RTMP defines its own control messages and allows higher-level protocols to embed application specific messages in the RTMP messages. It can be used for both interactive on-demand and live broadcast services. The TCP port 1935 is used by the RTMP by default, so the RTMP transmission may be blocked by firewalls. However, the RTMP messages can be embedded in HTTP or HTTPS requests with the use of the RTMPT or the RTMPS protocols [62], respectively, to bypass firewall restrictions.

The splitting of multimedia files into small fragments allows for their playback after receiving only a small part of the entire file. It is a big advantage over the classical HTTP or FTP download. However, if a network congestion occurs, the reception of the consecutive fragments is delayed and the playback is stopped. Several adaptive HTTP-based solutions have been proposed to eliminate this annoying effect.

The Apple Live Streaming is one of possible solutions [63]. It supports multiple variants of the same multimedia presentation. Each variant is encoded with a different parameter set, e.g., bitrate, and split into small time-aligned fragments (Fig. 12). Each fragment starts with a key frame and can be decoded independently from the other ones. It allows for seamless switching between variants according to the actual network throughput. Each fragment is stored in a separate MPEG-2 TS file. Extended M3U playlist files for each variant listing its fragments, and a variant playlist file that lists each variant, are created [64]. The playlists as well as the media fragments are downloaded by the client in consecutive HTTP requests. The media adaptation – switching between variants – is performed entirely by the client.

Fig. 12. Adaptation to network conditions by switching between media variants.
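The client-driven adaptation used by these solutions boils down to repeatedly choosing the best variant that fits the measured throughput. The sketch below illustrates the idea with a hypothetical variant-to-bitrate map and a safety margin; real players also take buffer occupancy and other factors into account.

```python
def pick_variant(variants, measured_bps, safety=0.8):
    """Choose the highest-bitrate variant that fits the measured throughput.
    `variants` maps a playlist URL to its advertised bitrate (bits per second)."""
    affordable = {url: bps for url, bps in variants.items()
                  if bps <= measured_bps * safety}
    if not affordable:
        return min(variants, key=variants.get)     # fall back to the lowest variant
    return max(affordable, key=affordable.get)

# Hypothetical variant playlists and bitrates:
variants = {"low.m3u8": 500_000, "mid.m3u8": 1_500_000, "high.m3u8": 4_000_000}
print(pick_variant(variants, measured_bps=2_000_000))   # -> "mid.m3u8"
```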

The Microsoft Silverlight Smooth Streaming [65] is based on the same concept as the Apple Live Streaming. However, the media fragments are not stored in separate files; instead, each media variant consisting of many fragments is stored in one MP4 file [41,36]. The entire media presentation with multiple variants encoded at different bitrates is stored in several MP4 files in this solution, while there are hundreds or thousands of small MPEG-2 TS files in the Apple solution. Each fragment is accessed by an extended URL containing both the variant identifier and the fragment timecode [66]. This URL is translated by the server into the file offset in the physical MP4 file containing the requested media variant. Two custom files, called the server manifest and the client manifest, are used in this process. The first file uses the SMIL 2.0 [67] syntax to describe the relationships between the media variants, their bitrates and their mapping to the physical files in the server filesystem. The second file is a custom XML document and is used by the client to determine the bitrate, resolutions and other parameters of the media streams available on the server. Currently, this solution is supported only by the Microsoft Internet Information Services 7.0 with the Smooth Streaming extension [68].

The Adobe HTTP Dynamic Streaming is very similar to the Microsoft solution [69]. Its great advantage is an integration with the Flash platform. The new version of the Flash Player, available as a plugin in almost every PC worldwide, supports HTTP Dynamic Streaming as well as the traditional RTMP streaming. The existing Flash content protection technique can be used with the new streaming solution. The MP4 file format extension called MP4 fragment format (F4F) [70] is used to store fragmented media files. The manifest file is a custom XML document with an open specification [71]. Adobe has developed a complete toolchain to prepare both fragment and manifest files for off-line and live media presentations. It also supports open source initiatives developing auxiliary tools for the Flash platform.

The above presented commercial HTTP-based streaming solutions are widely used to distribute multimedia in the Internet. However, this application area is so important that an international standard is highly required. The first standard was proposed by the 3rd generation partnership project (3GPP) [72]. The proposals issued later by the Open IPTV Forum [73] and the MPEG [74] were based on the initial 3GPP proposal. The dynamic adaptive streaming over HTTP (DASH) standard, currently developed by MPEG, is supposed to meet the following requirements [75,76]:
• support for any type of media, including Ultra HD video, 3-D video and interactive 3-D content;
• adaptivity to the network conditions and the user terminal capabilities;
• efficient HTTP-based delivery with support for progressive download, peer-to-peer networks, multi-channel transmission and content relaying;
• multiple media streams embedded in MP4 or MPEG-2 TS files.

The draft version of the standard provides the XML schema for the manifest file format called the media presentation description (MPD) [74]. The media streams are split into a hierarchical structure of periods, groups, presentations and segments. This structure is more general and more flexible than the fragment structure used in the previously presented solutions. An MPD update mechanism is also defined to support live/unbounded media content.

4.2.2. UDP-based streaming

The video streaming techniques utilizing UDP as a transport protocol provide the minimal transmission delay required in some applications. However, UDP datagrams are usually blocked by firewalls, so these techniques can be used only in local networks in most cases. If the network supports a quality of service mechanism, then the media stream may be simply encapsulated in the UDP datagrams without any additional error protection provisioning. The only limitation is the length of the resulting UDP datagrams: it should not exceed the length of the network maximum transmission unit (MTU). The MTU length depends on the network physical layer, e.g., in Ethernet networks it is approximately equal to 1500 B.

The RTP/RTCP protocols are most often used if the network performance cannot be guaranteed. They can be used in different network topologies. The unicast (point-to-point) transmission between two participants is the simplest case. Multicast transmission between a group of participants can also be easily realized. RTP translators can be used to bridge transmission between different networks. RTP mixers can be used to aggregate media streams from many participants into one stream transmitted to another participant or group of participants [60].

The RTP is a data transport protocol. The media data are split into smaller parts and encapsulated into RTP packets. The length of the RTP packet should not exceed the network MTU length. The RTP packet is composed of a header and a payload. The header contains several fields describing the payload [59]. The RTP can be used for transmission of different types of data. Media data may be either multiplexed in the MPEG-2 Transport Stream before transmission [77,78] or transmitted as separate streams [79,80].
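For reference, the 12-byte fixed RTP header defined in RFC 3550 can be decoded as follows; the dictionary keys are descriptive names for the header fields, not identifiers taken from any library.

```python
import struct

def parse_rtp_header(packet: bytes):
    """Decode the 12-byte fixed RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("truncated RTP packet")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    header = {
        "version": b0 >> 6,            # always 2
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # identifies the payload format
        "sequence_number": seq,        # detects loss and reordering
        "timestamp": timestamp,        # sampling instant in media clock units
        "ssrc": ssrc,                  # synchronization source identifier
    }
    payload = packet[12 + 4 * header["csrc_count"]:]
    return header, payload
```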


The payload formats are defined in separate documents. The payload formats for several types of audio streams, as well as data type identifiers and their timestamp clock rates, are defined in Ref. 81. The payload format for MPEG-4 AAC and other MPEG-4 elementary streams is defined in Ref. 82. The payload format for the MPEG-2 TS is specified in Ref. 83. The payload format for H.264/AVC and its extensions will be further presented in this section.

The RTCP is a control protocol associated with the RTP (Fig. 13). It defines a set of reports (control messages) exchanged between the RTP session participants. The RTCP reports allow for constant monitoring of the data transmission quality, session participant identification, notification of changes in the session membership, and media stream synchronization [59,60].

Fig. 13. Media streaming with the use of the RTP/RTCP protocols.

The first version of the RTP/RTCP specification required that adjacent UDP ports are used: an even numbered port for the unidirectional RTP transmission and an odd numbered port for the bidirectional RTCP transmission. This convention is used in most applications, although it is not required by the newer version of the RTP/RTCP specification.

The RTP payload format for the delivery of the H.264/AVC bitstream is defined in Ref. 84. Three encapsulation modes are specified:
• single NAL unit in the RTP packet;
• multiple NAL units in the RTP packet (aggregation mode);
• NAL unit split into multiple RTP packets (fragmentation mode).

The first mode is very simple: an entire NAL unit is inserted into the RTP packet as its payload. It is effective if the NAL unit length fits the network characteristics. The length of the RTP packet must not exceed the maximum length of the UDP datagram, equal to 64 kB. It should also not exceed the length of the maximum transfer unit (MTU) for the given network. If the RTP packet is longer than the MTU, it will be fragmented by the lower layers in the IP stack. The packet fragmentation increases the probability of the packet loss.

The length of an encoded slice depends on many factors. It may easily exceed the limit of 64 kB, e.g., if a high resolution sequence is encoded with good quality (high bitrate) and no frame slicing is used (e.g., the entire frame is encoded in one slice). In many cases it also exceeds the MTU value.

The fragmentation mode provides the way to handle NAL units containing such long slices. The NAL unit is split into fragments transmitted in consecutive RTP packets. There is also an option to change the order of the NAL unit fragments; in this option each NAL unit fragment is appended with an additional field containing its order.

The NAL units containing, e.g., parameter sets, SEI messages or encoded slices of a finely sliced frame can be very short. Their transmission in a single RTP packet is ineffective due to the header overhead. The aggregation mode allows joining such short NAL units in one longer RTP packet. The aggregated RTP packet can contain either NAL units with identical timestamps or with different timestamps. Similarly as in the fragmentation mode, NAL units do not have to be inserted into the aggregated packet in their decoding order.
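The fragmentation idea can be sketched as follows: a long NAL unit is cut into MTU-sized pieces, each prefixed with two bytes marking the fragment type and its start/end position. The sketch follows the spirit of the fragmentation units of Ref. 84 but is simplified and should not be taken as the exact wire format; the 1400-byte payload budget is an assumption.

```python
def fragment_nal(nal: bytes, mtu_payload: int = 1400):
    """Split one long NAL unit into MTU-sized pieces, in the spirit of the
    fragmentation units of RFC 3984 (simplified, not a wire-exact packetizer)."""
    header, body = nal[0], nal[1:]
    nal_type = header & 0x1F
    chunk = mtu_payload - 2                      # leave room for the two FU bytes
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    packets = []
    for i, piece in enumerate(pieces):
        fu_indicator = (header & 0xE0) | 28      # keep F/NRI, type 28 = fragment
        start = 0x80 if i == 0 else 0            # S bit: first fragment
        end = 0x40 if i == len(pieces) - 1 else 0    # E bit: last fragment
        fu_header = start | end | nal_type
        packets.append(bytes([fu_indicator, fu_header]) + piece)
    return packets                               # each goes into its own RTP packet
```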
The payload format for the scalable extension (SVC) of the H.264/AVC is proposed in the draft specification [85]. Two modes of the SVC bitstream transmission are defined:
• single-session: all layers of the SVC stream are transmitted in one RTP session. All packetization modes available for the H.264/AVC bitstream may be used in this mode;
• multi-session: layers of the SVC stream are transmitted in different RTP sessions. All sessions are synchronized to the same system clock. Four special packetization modes are defined for this transmission mode.

The multi-session mode is especially suitable for multicast transmission. The separation of the SVC bitstream layers simplifies the stream adaptation to the network conditions. Specialized network devices, so-called Media Aware Network Elements, can simply discard the layers which require higher throughput than is currently available.

The proposed payload format [86] for the multiview extension (MVC) of the H.264/AVC is very similar to the specification for the SVC. The views contained in the MVC bitstream can be transmitted in either one RTP session (single-session mode) or in multiple synchronized RTP sessions (multi-session mode).

5. Conclusions

The multiview 3-D scene representations may be encoded and transmitted with the existing video coding technology. The H.264/AVC standard and its scalable and multiview extensions offer high coding efficiency and flexibility in adjusting the coder configuration to the application requirements. The currently developed HEVC standard will offer the even higher efficiency required for the emerging HD/Ultra HD video applications. The 3D video standard, also under development, will support the multiview plus depth representations.

If video streams are to be broadcast, the proper channel coding standard must be chosen. The standards developed by the DVB consortium cover all possible radio channels. The multiple video streams must be multiplexed into the MPEG-2 transport stream prior to the transmission in a broadcast network. The MPEG-2 TS may be used for storage too; however, the MP4 file format offers higher flexibility.

The transmission of multiple high-rate video streams in IP networks is a challenging task. The UDP-based protocols, e.g., RTP/RTCP, should be used in real-time applications with strict requirements on the transmission delay. However, this approach is limited to local networks in most cases, because UDP datagrams are usually blocked by firewalls. The HTTP-based solutions should be used if the transmission in the Internet is required. The upcoming DASH standard currently developed by MPEG seems to be very promising, but it is not supported by any web browsers or multimedia players yet. In contrast, the existing HTTP-based commercial solutions, e.g., Adobe Flash or Microsoft Silverlight, offer a complete toolchain for the authoring, streaming and presentation of the multimedia content.

Acknowledgement

This work was supported by the Ministry of Science and Higher Education, Grant N516 415738.

References

1. E. Stoykova, A.A. Alatan, P. Benzie, N. Grammalidis, S. Malassiotis, J. Ostermann, S. Piekh, V. Sainov, C. Theobalt, T. Thevar, and X. Zabulis, "3-D time-varying scene capture technologies – a survey", IEEE T. Circ. Syst. Vid. 17, 1568–1586 (2007).
2. H. Urey, K.V. Chellappan, E. Erden, and P. Surman, "State of the art in stereoscopic and autostereoscopic displays", Proc. IEEE 99, 540–555 (2011).
3. M. Tanimoto, M.P. Tehrani, T. Fujii, and T. Yendo, "Free-viewpoint TV", IEEE Signal Process. Mag. 28, 67–76 (2011).
4. M. Tanimoto, "FTV and all-around 3DTV", IEEE P Soc. Photo-Opt. Ins., Tainan City, Taiwan, 2011.
5. K. Müller, P. Merkle, and T. Wiegand, "3-D video representation using depth maps", Proc. IEEE 99, 643–656 (2011).
6. P. Kauff, N. Atzpadin, C. Fehn, M. Müller, O. Schree, A. Smolic, and R. Tanger, "Depth map creation and image-based rendering for advanced 3DTV services providing interoperability and scalability", Signal Process. Image 22, 217–234 (2007).
7. W. Skarbek, "Structured light camera and integrated literate programming of GPU", Proc. Joint Conf. NTAV/SPA, pp. 25–36, Łódź, Poland, 2012.
8. A.A. Alatan, Y.Y. Yemez, U. Güdükbay, X. Zabulis, K. Müller, Ç.E. Erdem, C. Weigel, and A. Smolic, "Scene representation technologies for 3DTV – a survey", IEEE T. Circ. Syst. Vid. 17, 1587–1605 (2007).
9. Y.Q. Shi and H. Sun, Image and Video Compression for Multimedia Engineering. Fundamentals, Algorithms, and Standards, Second Edition, CRC Press, Boca Raton, 2008.
10. Information technology – coding of audio-visual objects – Part 10: advanced video coding, ISO/IEC 14496-10:2008.
11. I.E. Richardson, The H.264 Advanced Video Compression Standard, 2nd Edition, Wiley, Chichester, 2010.
12. T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard", IEEE T. Circ. Syst. Vid. 13, 560–576 (2003).
13. G.J. Sullivan and T. Wiegand, "Video compression – from concepts to the H.264/AVC standard", Proc. IEEE 93, 18–31 (2005).
14. D. Marpe, T. Wiegand, and G.J. Sullivan, "The H.264/MPEG4 advanced video coding standard and its applications", IEEE Commun. Mag. 44, 134–143 (2006).
15. Information technology – generic coding of moving pictures and associated audio information: video, ISO/IEC 13818-2:2000.
16. Video coding for low bit rate communication, ITU-T H.263, 2005.
17. Information technology – coding of audio-visual objects – Part 2: visual, ISO/IEC 14496-2:2003.
18. H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard", IEEE T. Circ. Syst. Vid. 17, 1103–1120 (2007).
19. H. Schwarz and M. Wien, "The scalable video coding extension of the H.264/AVC standard", IEEE Signal Process. Mag. 25, 135–141 (2008).
20. M. Wien, H. Schwarz, and T. Oelbaum, "Performance analysis of SVC", IEEE T. Circ. Syst. Vid. 17, 1194–1203 (2007).
21. Y.-K. Wang, M.K. Hannuksela, S. Pateux, A. Eleftheriadis, and S. Wenger, "System and transport interface of SVC", IEEE T. Circ. Syst. Vid. 17, 1149–1163 (2007).
22. A. Segall and J. Zhao, "Bit-stream rewriting for SVC-to-AVC conversion", Proc. Int. Conf. Image Process., pp. 2776–2779, San Diego, 2008.
23. C.G. Gürler, B. Güurkemli, G. Saygili, and A.M. Tekalp, "Flexible transport of 3-D video over networks", Proc. IEEE 99, 694–707 (2011).
24. A. Vetro, T. Wiegand, and G.J. Sullivan, "Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard", Proc. IEEE 99, 626–642 (2011).
25. Joint call for proposals on video compression technology, ISO/IEC JTC1/SC29/WG11, N11113, Kyoto, 2010.
26. J.-R. Ohm and G. Sullivan, "Description of high efficiency video coding (HEVC)", ISO/IEC JTC1/SC29/WG11, N11822, Daegu, 2011.
27. D. Marpe, H. Schwarz, T. Wiegand, S. Bosse, B. Bross, P. Helle, T. Hinz, H. Kirchhoffer, H. Lakshman, T. Nguyen, S. Oudin, M. Siekmann, K. Suhring, and M. Winken, "Improved video compression technology and the emerging high efficiency video coding standard", IEEE I. Symp. Consum. Electr., pp. 52–56, Berlin, 2011.
28. H. Choi, J. Nam, D. Sim, and I.V. Bajić, "Scalable video coding based on high efficiency video coding (HEVC)", IEEE Pacif., Victoria, Canada, 2011.
29. G. Van Wallendael, S. Van Leuven, J. De Cock, F. Bruls, and R. Van de Walle, "Multiview and depth map compression based on HEVC", IEEE I. Symp. Consum. Electr., pp. 6–8, Berlin, 2011.
30. Video and requirements, Call for proposals on 3D video coding technology, ISO/IEC JTC1/SC29/WG11, N12036, Geneva, 2011.
31. Video and requirements, "Applications and requirements on 3D video coding", ISO/IEC JTC1/SC29/WG11, N12035, Geneva, 2011.
32. K. Müller, A. Vetro, and V. Baroncini, "AHG report on 3D video coding", ISO/IEC JTC1/SC29/WG11, m21469, Geneva, 2011.


33. M.M. Hannuksela, "Test model under consideration for AVC-based 3D video coding (3DV-ATM)", ISO/IEC JTC1/SC29/WG11, N12349, Geneva, 2011.
34. H. Schwarz and K. Wegner, "Test model under consideration for HEVC based 3D video coding", ISO/IEC JTC1/SC29/WG11, N12350, Geneva, 2011.
35. Information technology – generic coding of moving pictures and associated audio information: systems, ISO/IEC 13818-1:2000.
36. Information technology – coding of audio-visual objects – Part 14: MP4 file format, ISO/IEC 14496-14:2003.
37. Transport of AVC video data over ITU-T Rec. H.222.0 – ISO/IEC 13818-1 streams, ISO/IEC 13818-1:2000/Amd 3:2004.
38. J.-B. Lee and H. Kalva, The VC-1 and H.264 Video Compression Standards for Broadband Video Services, Springer, Berlin, 2008.
39. Transport of scalable video over Rec. ITU-T H.222.0 – ISO/IEC 13818-1, ISO/IEC 13818-1:2007/Amd 3:2009.
40. Transport of multiview video over Rec. ITU-T H.222.0 – ISO/IEC 13818-1, ISO/IEC 13818-1:2007/Amd 4:2009.
41. Information technology – coding of audio-visual objects – Part 12: ISO base media file format, ISO/IEC 14496-12:2008.
42. Information technology – JPEG 2000 image coding system – Part 12: ISO base media file format, ISO/IEC 15444-12:2008.
43. Apple Inc., QuickTime File Format Specification, 2007.
44. Information technology – coding of audio-visual objects – Part 15: advanced video coding file format, ISO/IEC 14496-15:2010.
45. P. Amon, T. Rathgen, and D. Singer, "File format for scalable video coding", IEEE T. Circ. Syst. Vid. 17, 1174–1185 (2007).
46. T. Schierl and S. Narasimhan, "Transport and storage systems for 3-D video using MPEG-2 systems, RTP, and ISO file format", Proc. IEEE 99, 671–683 (2011).
47. U. Reimers, Digital Video Broadcasting (DVB). The International Standard for Digital Television, Springer, Berlin, 2001.
48. Digital video broadcasting (DVB) Project, http://www.dvb.org/.
49. Digital video broadcasting (DVB); Framing structure, channel coding and modulation for 11/12 GHz satellite services, ETSI EN 300 421.
50. Digital video broadcasting (DVB); Second generation framing structure, channel coding and modulation systems for broadcasting, interactive services, news gathering and other broadband satellite applications (DVB-S2), ETSI EN 302 307.
51. Digital video broadcasting; Framing structure, channel coding and modulation for cable systems, ETSI EN 300 429.
52. Digital video broadcasting; Frame structure, channel coding and modulation for a second generation digital transmission system for cable systems (DVB-C2), ETSI EN 302 769.
53. Digital video broadcasting; Framing structure, channel coding and modulation for digital terrestrial television, ETSI EN 300 744.
54. Digital video broadcasting; Frame structure, channel coding and modulation for a second generation digital terrestrial television broadcasting system (DVB-T2), ETSI EN 302 755.
55. Digital video broadcasting; Transmission system for handheld terminals (DVB-H), ETSI EN 302 304.
56. J.T.J. Penttinen, P. Jolma, E. Aaltonen, and J. Väre, The DVB-H Handbook. The Functioning and Planning of Mobile TV, Wiley, Chichester, 2009.
57. A. Kumar, Implementing Mobile TV: ATSC Mobile DTV, MediaFLO, DVB-H/SH, DMB, WiMAX, 3G Systems, and Rich Media Applications, Elsevier/Focal Press, Burlington, 2010.
58. Digital video broadcasting; Specification for service information (SI) in DVB systems, ETSI EN 300 468.
59. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: A Transport Protocol for Real-Time Applications, RFC 3550, 2003.
60. C. Perkins, RTP: Audio and Video for the Internet, Addison-Wesley, Boston, 2006.
61. Adobe Systems Inc., Real time messaging protocol chunk stream, 2009.
62. Adobe Inc., Adobe Flash Media Server configuration and administration guide, available online: http://livedocs.adobe.com/flashmediaserver/3.0/docs/help.htmlficontent=03configtasks25.html.
63. E.R. Pantos and W. May, HTTP Live Streaming, available online: http://tools.ietf.org/html/draft-pantos-http-live-streaming-06.
64. The M3U Playlist format, available online: http://wikipedia.org/wiki/M3U.
65. Microsoft Corporation, Microsoft Silverlight, available online: http://www.microsoft.com/silverlight/.
66. A. Zambelli, IIS Smooth Streaming technical overview, Microsoft Corporation, 2009.
67. The World Wide Web Consortium, Synchronized multimedia integration language (SMIL 2.0), available online: http://www.w3.org/TR/2005/REC-SMIL2-20050107/.
68. Microsoft Corporation, Smooth Streaming, available online: http://www.iis.net/download/SmoothStreaming.
69. Adobe Inc., Dynamic HTTP streaming, available online: http://www.adobe.com/products/httpdynamicstreaming/.
70. Adobe Inc., Adobe Flash Video File Format Specification, Version 10.1, available online: http://download.macromedia.com/f4v/video_file_format_spec_v10_1.pdf.
71. Adobe Open Source/Open Source Media Framework, Flash Media Manifest File Format Specification, available online: http://osmf.org/dev/osmf/specpdfs/FlashMediaManifestFileFormatSpecification.pdf.
72. 3GPP, Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs (Release 9), available online: http://www.3gpp.org/ftp/Specs/html-info/26234.htm.
73. Open IPTV Forum, HTTP Adaptive Streaming, available online: http://www.oipf.tv/docs/Release2/OIPF-T1-R2-Specification-Volume-2a-HTTP-Adaptive-Streaming-V2_0-2010-09-07.pdf.
74. Information technology – MPEG systems technologies – Part 6: Dynamic adaptive streaming over HTTP (DASH), ISO/IEC FCD 23001-6, 2010.
75. Requirements on HTTP Streaming of MPEG Media, ISO/IEC JTC1/SC29/WG11, N11340, Dresden, 2010.
76. Use Cases for HTTP Streaming of MPEG Media, ISO/IEC JTC1/SC29/WG11, N11339, Dresden, 2010.
77. Digital video broadcasting (DVB); Transport of MPEG-2 based DVB services over IP based networks, ETSI TS 102 034, 2006.

78. ITU Focus Group on IPTV, RTP/UDP/MPEG2 TS as a means of transmission for IPTV Streams, IPTV-ID-0087, 2006.
79. Internet Streaming Media Alliance, ISMA Implementation Specification, 2005.
80. Internet Streaming Media Alliance, Planning the Future of IPTV with ISMA, 2006.
81. RTP Profile for Audio and Video Conferences with Minimal Control, RFC 3551, 2006.
82. J. van der Meer, D. Mackie, V. Swaminathan, D. Singer, and P. Gentric, RTP payload format for transport of MPEG-4 elementary streams, RFC 3640, 2003.
83. D. Hoffman, G. Fernando, and M. Civanlar, RTP payload format for MPEG1/MPEG2 video, RFC 2250, 1998.
84. S. Wenger, M.M. Hannuksela, T. Stockhammer, M. Westerlund, and D. Singer, RTP payload format for H.264 video, RFC 3984, 2005.
85. S. Wenger, Y.-K. Wang, T. Schierl, and A. Eleftheriadis, RTP payload format for SVC video, available online: http://tools.ietf.org/html/draft-ietf-avt-rtp-svc-21.
86. Y.-K. Wang and T. Schierl, RTP payload format for MVC video, available online: http://tools.ietf.org/html/draft-wang-avt-rtp-mvc-05.
