ADAPTING ADVANCED CODING TO CABLE NETWORKS Mukta Kar, Ph.D. and Yasser Syed, Ph.D. Cable Television Laboratories, Inc.

Abstract researchers understood the power and flexibility of migrating to digital technology, This paper describes the current cable and it started being practical to do so. environment and what demands it has on development of a cable-friendly AVC video However, the delivery of uncompressed coding stream. It considers what the further digital audio/video to consumer homes is restrictions on AVC are for recreating the not an acceptable solution as it takes a huge same services already provided by MPEG-2 amount of bandwidth to deliver. To tackle plus coexistence in the same network with this problem, researchers had been working MPEG-2 video streams and other digital on how to compress digital audio/video services. The paper also looks at what are effectively for consumer-oriented the requirements for multiplexing, random applications. Towards the end of the last access to support existing broadcast century, audio-video compression services, and how this affects the coding technology matured to the level that it can design. Furthermore it looks at additional be practically used for consumer emerging needs to address services like applications, and MPEG-2 standards were Personal Video Recorders (PVRs), Video- created in 1994 to broadcast digital on-Demand (VOD), and Digital Program audio/video to consumer homes and offices. Insertion (DPI) and how this can constrain MPEG-2 technology takes much less the design on the video stream and the video bandwidth than analog delivery (today coding layer. This paper concludes by approximately 10:1 bandwidth efficiency) saying it is not enough to just have a highly while providing same or better fidelity efficient video coding standard developed, (quality of audio/video) even in a noisy the cable environment (or any application environment. environment) imposes its own set of demands that trades off the advanced video AVC: EMERGING DEMANDS coding design and some compression FOR VIDEO CODING efficiency for expected viewer experience. Over the last few years MPEG-2 INTRODUCTION technology has become very popular and received global acceptance which has led to In the last century television contents many new innovations such as HDTV, PVR (video and audio) have been delivered to (DVR), VOD, streaming media, interactive consumers mostly in analog format. It was TV, etc. These applications were realized then that the fidelity of analog unthinkable in the analog format and did not reception is good in noiseless or less noisy exist as MPEG-2 standards were created. environments, and also the analog system is While these new applications can help either inefficient or inflexible in the generate more revenues for the broadcast important areas such as service security, industries (cable, satellite and terrestrial), bandwidth usage, interactivity, etc. Around they also require more digital bandwidth to the 80’s and 90’s of the last century, deliver. For example, an HDTV channel requires a bandwidth 15-20 Mbits/sec ISO/MPEG (MPEG-4 part 10) and H.264 in whereas a SDTV channel takes only 2-3 ITU in May, 2003 [4]. AVC may be Mbits/sec using the MPEG-2 video standard considered as an aggregation of tools, some [1,2]. of which are enhanced MPEG-2 tools and a few additional new tools that have been Besides broadcast industries, other video added to increase compression efficiency industries such as DVD, streaming media, while providing the same or better picture etc. also needed an advanced video codec quality. In brief, AVC achieved two that has significantly better compression important objectives - better compression than MPEG-2. For example, an HD movie efficiency than MPEG-2 and convergence in cannot be stored on a current DVD disc. As hardware across major consumer video MPEG-2 video needs more bandwidth, applications - but sacrificed backward today’s streaming media technologies use compatibility with the MPEG-2 video proprietary codecs to deliver their content compression standard. However, backward but encounter storage and software compatibility has been addressed by the challenges due to a lack of convergence on a silicon designers at the AVC codec chip compression standard. In another industry, level such that it becomes transparent to conversational video applications (video users. conferencing and video telephony) use another video codec standard, but are also AVC tools consist of enhanced MPEG-2 facing similar challenges to streaming tools, some new tools, and error resiliency media. This shows that each of the major tools. MPEG-AVC or AVC is a large tool video industries today use diverse video box which can meet the needs of most major standards and different hardware platforms video applications such as broadcast video, for delivery and storage which impede conversational applications, and streaming consumer acceptance of multiple video video. The AVC system has created an applications. opportunity to change the paradigm of coding in AVC. The new tool box allows One of the answers to the problems multiple reference frames to be used for above is to compress digital audio/video predictions as opposed to the two reference even more efficiently than that in MPEG-2 frames allowed in MPEG-2. It also in order to be useable across different introduces spatial prediction in intra-coded industries. A better answer will be to create frame and adaptive . a new video standard that will not only Motion compensation resolutions’ block provide significantly better compression size may be 16x16, 8x16, 16x8, 8x8, 8x4, than MPEG-2 but also a convergence in 4x8 or 4x4 pixels. It supports block hardware platforms across major video transforms of 8x8 and 4x4 pixels for luma applications. which may be used adaptively. It also introduces in-loop filtering to smooth the To have better compression efficiency as sharp edges which are an artifact of using well as to eliminate divergent requirements block transforms. AVC also added a few in hardware, the video experts in ITU and error-resilient tools (not present in MPEG-2) ISO/MPEG formed a joint video team (JVT) to support video delivery over non-QOS to investigate a solution. Their collaborative networks such as public IP networks and work over two years resulted in the creation wireless networks. of an (AVC) in CURRENT STATE OF CABLE: and video telephony, using the same AVC THE NEED FOR AVC IN CABLE hardware platform. In a way, to future-proof our industry with respect to delivery of About ten years ago, the cable industry multiple applications, the cable industry was delivering premium TV programs needs to consider all these factors mentioned primarily in analog only format. Over the above. last ten years evolutionary changes have been taking place in cable. Today cable ADAPTING THE AVC STANDARD TO delivers not only analog video but also a few CABLE digital enabled services such as digital TV, high speed data, and telephony. Cable is also Although AVC created profiles and in the process of using non-linear video levels geared towards major video services in addition to scheduled types of applications, still it is not well tailored services. At present cable delivers linear towards the cable environment. The video applications like broadcast television but stream may have a much better compression also is adding non-linear services such as efficiency using all tools in the standard, but VOD, PVR, VOIP. Also we are shifting the expected viewer experience for cable TV from the broadcast television model to a has not been designed in the AVC standard. personal television one. As a result cable The adaptation of MPEG-2 to cable was an faces a real bandwidth crunch as it has to easier process than what is now being support the legacy analog and digital considered because the standard was broadcast television services due to optimized more towards a broadcast regulatory and logistical reasons, while application, while the AVC standard tries to adding an array of the new digital services encompass more than that. Additionally to actively engage with new competition. cable has also been changing at the same VOD, MOD, and PVR services require a time to address other applications than just significant amount of storage at the headend broadcast. and in client devices. So we need better compression of the digital content than Constraints need to be written on the MPEG-2 can provide. AVC will certainly AVC standard (or any future advanced help in delivering more channels over the coding standard) to maintain current same digital bandwidth as well as in storing expected services and viewer experience the digital content at the headend or in while improving the plant bandwidth usage PVRs. efficiencies and expanding on these services. This constraint document will be It was mentioned earlier that MPEG-2 similar to the standard document SCTE 43 technology has been accepted widely across 2005 [6] that is written for MPEG-2 video. the global broadcast industry. It provides The network infrastructure and legacy tremendous business continuity in terms of equipment need to continue operating content, equipment (transmission, storage, without disruption. Existing services such as and test & measurement), etc. It is expected the Electronic Program Guide, Emergency that the general video industry will migrate Alert Messages, and audio-video to AVC in a next few years, especially in the synchronization need to be maintained and area of HDTV delivery and storage. AVC be agnostic to any new video coding may also provide opportunity for adding standard. Viewer experiences like channel new services, such as video conferencing switching/surfing, PVR recording and interactive video services still need to be matured MPEG-2 infrastructure should not maintained transparent to different video be reinvented. Keeping that in mind, MPEG coding technologies. At the same time, the created an amendment to MPEG-2 Systems splicing of video streams to perform local [3] to deliver AVC-coded video over commercial insertion should be visibly existing MPEG-2 transport. Figure 1 shows seamless as done in MPEG-2. The cable how AVC coded video can be packetized in viewer should not be aware of any changes MPEG-2 transport packets which will in video coding technologies, but should enable multiplexing with other elementary benefit from the increased services. streams and other cable digital services. Designing constraints may have a minor Transport over MPEG-2 will ensure audio- negative effect on compression efficiency, video synchronization (no lip-synch error). but will allow the viewer experience to Additionally, it will support existing cable remain intact. applications such as channel change, trick- mode, and DPI. Some constraints will be DELIVERY OF AVC CODED VIDEO necessary at the video level (SCTE 43 [6] for MPEG-2 video). The digital transport / It is important to note that currently all multiplex standard SCTE 54 2004 [7] has to digital services are delivered over MPEG-2 be amended. These are discussed below. transport over the HFC cable network. To avoid expensive changes to the existing It is mentioned earlier that the AVC MPEG-2 transport-based infrastructure, any standard will not only benefit broadcast new digital services that will be added to applications, it will also be used in other cable should also be delivered using existing major video services such as conversational MPEG-2 multiplexing capability with other and streaming media applications. ITU services without causing any side effects. H.320 and Internet Protocol (IP) are used This means that the existing services based today to deliver conversational and on analog, MPEG-2 video, and high speed streaming media applications, and these data services will remain unaffected. So it is transports can be utilized for AVC as well. probably the most important requirement Referring back to Figure 1, AVC coded that any new service based on AVC coded video allows for supporting these 3 major video (or any advanced coded video that transport protocols (MPEG-2, H.320 & IP), may be added to cable) should be using a network abstraction layer (NAL). deliverable over the MPEG-2 transport First, the coded video and related infrastructure that exists in the cable information (PS and SEI) are packetized in network today. NAL packets which can be variable in size. Each NAL packet will have a type value that It may be noted that AVC is just an tells the type of payload it is carrying (VCL, advanced video coding standard. Unlike PS or SEI). Those NAL packets are then re- MPEG-2, no specific transport standard has packetized in MPEG-2, IP and H.320. been created for delivery of AVC coded video. The MPEG committee realized that a

Figure 1. Carriage of AVC Compressed Video

Figure 2. Random Access Points into Video Stream

RANDOM ACCESS REQUIREMENTS broadcast channel without an undue amount FOR CABLE of delay in receiving the video picture. The viewer after he switches the channel will A critical feature needed for cable is the normally tolerate 1-2 seconds at most of not ability to randomly access the video stream seeing video (because that is what a viewer at regular intervals [see Figure 2]. This is an already expects from existing analog change essential need for fast consistent channel times) before assuming something is wrong changing/ channel acquisition in the with the channel and switch away. The broadcast environment, but it also can be channel change time is actually due to a quite useful for DVR and commercial (ad) combination of factors such as physical RF insertion applications. As will be pointed tuning, conditional access, parsing streams, out in the following sections, constraints to and decoding the video stream. Decoding of design the video stream and transport the video is dependent upon the design of headers will affect how a video compression the video stream to allow for regular periods standard is used and will tradeoff some of random access while the rest of the potentially very high coding efficiency gains channel change contributing factors are in exchange for supporting a viewer outside factors. A good strategy will allow expected experience. channel changing to be fast, frequent, and output undisrupted clean video once the Channel Change requires random access decoding process starts or in other words a points to allow switching into an existing graceful transition from one channel to the other. Additionally to work in an existing an access point in the video stream. Once cable environment, a viewer should be able that transport packet is decoded, sequence to switch to channels that have alternate header and picture header information is video coding technologies (MPEG-2, given to identify the dimensions and frame analog) for the channel with again a graceful rate of the video. This also starts the process transition. of identifying the next transport packets in the sequence. Channel Change requires a combination of signaling at the transport layer to identify At the video coding layer, the random the pertinent video transport packets and access point identifies the start of a GOP designing of the video coding pattern in the (group of pictures) that consist of a pattern video layer to create a clean continuously of I, P, and B coded pictures [see Figure 4]. displayable output video only using data The next random access point will be at the after the access point. Some compression beginning of the next GOP which is efficiency in the video stream will be normally encoded at intervals of 15 to 30 sacrificed in order to allow for a consistent frames (1/2 second or 1 second) with some and fast channel change between video leeway for making some occurrences of the streams. interval smaller to adapt to stream changes or stream conditioning situations. The types HOW MPEG-2 DOES RANDOM ACCESS of pictures indicate the different type of prediction that is used in MPEG-2 video: An MPEG-2 stream uses the Random Access Indicator (RAI) bit in the MPEG-2 I-Pictures - These pictures use intra- transport (4byte) header adaptation field frame compression and are not shown in Figure 3 to identify that this is the predicted from other pictures (no transport packet (188 bytes) that identifies temporal compression). This is also a

Figure 3. MPEG-2 Transport Stream Packet with Header Information type of anchor picture (in AVC this There are a couple of things to note. could also be called a reference picture) Prediction is only dependent on the nearest which are pictures that can be used for anchor frames. This means that to decode a prediction. The I-picture can be the start picture, at a maximum, one needs to store is of a random access point in the video a segment of the video stream that contains stream. that particular picture and all pictures up to and including the nearest anchor pictures in P-Pictures - These pictures can either both directions (so long as the anchor use intra-frame compression or pictures are already decoded). In the prediction from the nearest past anchor reference picture buffer this allows for a frame (forward predicted). The P- natural “bumping” process to occur such picture is also another type of anchor that when an anchor picture gets stored and frame. decoded then the oldest anchor picture can be discarded. Also of note, is that any B-Pictures - In addition to intra-frame picture acting as an anchor picture must compression and forward prediction, come earlier in the transmit/decode order these pictures can also predict from the than the picture that requires that prediction nearest anchor frame in the future (even though in cases of backward (backward predicted) as well as predict prediction that anchor picture is actually both forward and backward at the same displayed later). Lastly, exiting (switching time (bi-directionally predicted). The channels) before any anchor picture in B-picture cannot be used in turn for transmit/decode order allows for the video prediction and is not an anchor frame. displayed before the point to have no jumps in time (temporal continuity).

Figure 4. Possible MPEG-2 and AVC Prediction and Coded Pictures of Video Stream The MPEG-2 GOP structure length and emulated by limiting the combination of pattern indicates the video stream segment slices in the picture. guarantees that all anchor pictures contained within are decodable and which allows for Predicted Pictures – These pictures any pictures in smaller segments (mini- whether forward or backward predicted do GOPS) predicted from the anchors pictures not need to predict from the nearest to be decodable. Having a random access reference (anchor) frame. They can refer to point start at a beginning of a GOP allows any reference frame stored in the Decoded for continuously displayable video from that Picture Buffer (DPB) and are limited to point on. This in turn directly affects maximum number of reference frames channel change time and the video quality in storable at any one time. Though one may the channel change. only need to predict only from one slice of the picture, the entire picture must be stored. ADAPTING RANDOM ACCESS TO AVC To emulate MPEG-2 behavior simply limit prediction to nearest anchor neighbors. The MPEG-2 GOP structure is a very powerful mechanism for random access and IDR – This is an intra-frame picture the AVC encoding standard can using intra compression techniques that also accommodate this GOP structure, but this use spatial prediction between macroblocks. substantially limits temporal compression An IDR (Instantaneous Decoder Refresh) is improvements that AVC can offer. Even a special intra-frame picture that indicates though the concept of I, P, B are called the that any pictures subsequent to an IDR same as in MPEG-2, AVC has changed what cannot reference (predict from) pictures they meant and also changed prediction before the IDR in transmission order (see rules between I,P, and Bs [see Figure 4]. Figure 5). Without an IDR tag, an intra- These changes are: frame picture allows for pictures after it to reference or predicted from one that came Pictures – These are made up of before the intra-frame picture in combination of slices which can be of the transmission order. An MPEG-2 I picture same type or a combination of different can be emulated by an IDR. Some types. Prediction from a slice requires differences between using an IDR tag or not storage of the entire picture. Pictures are is that the bit rate spikes up around an IDR differentiated between referenced (anchor) than if the tag was not used. pictures and non- referenced pictures which are not used as predicted pictures. Using P - These slices can now be both MPEG-2 Terminology (not really correct for forward or backward predicted. For MPEG- AVC but often used), an “I” picture can 2 behavior, this would need to be limited to have only I slices, “P” Picture can have I forward prediction to only I or P slices. slices and P slices, a “B” picture can have I, P, and B slices. B - These slices can now be reference (anchor) frames and predicted in either I, P, B – These refer to actual slice types direction from any two slices and can be in the pictures rather than the pictures weighted. To emulate MPEG-2 behavior, themselves. A single picture can contain B’s should not be able to predict from a B different combination of these slices and forward and backward dual prediction depending on the number of slices allocated should be used without weighting. to pictures. An MPEG-2 like picture can be

Figure 5. Difference of Intra-frame with and without IDR Signaling

The MPEG-2 GOP structure can be change time. Another thing would be to contained in the design of a cable-friendly limit the time segment/ stream segment that AVC video stream, but many of the contain reference pictures that could be used temporal compression improvement for decoding a particular picture. The use of techniques that the AVC standard provides backward prediction in AVC can be would be lost. extended over longer windows which can give more compression efficiency, but may Providing for random access in cable cause problems in some cable applications. should not make the AVC stream conform Backward prediction needs to be constrained to an MPEG-2 structure but rather such that it does not affect existing cable accommodate it and other approaches. In applications or such that stream conditioning order to do this, rules need to be developed can handle infrequent trouble spots that (not implementations) that allow for fast might occur. To keep the decode and display channel change times and good video order from getting too far out of alignment, quality during the change while not bounding the presentation time from the preventing too much loss in compression decode time of the random accessed picture improvements. In order to do this, a concept could be used. Additionally setting a bound of a segment still needs to exist and it is between the random accessed picture and important that the transmit/decode order and the first clean picture that allows for the display order do not get too far out of continuously displayable video from that alignment. point on would also be useful in this process. Lastly providing some rules to Some possible things to consider are allow for points of time continuity in the creating something similar to a GOP length stream would be helpful. in AVC which would bound the channel

Highlighted Pictures are reconstructable/displayable periodic pictures Trick mode pictures

Figure 6. Generation of Trick Modes using Small Subset of Pictures

EMERGING APPLICATIONS: Additionally slower FFWD/RWD rates DIGITAL VIDEO RECORDERS that approach real-time playback need to consider smooth motion techniques between In the present cable environment, cable these displayable frames which would friendly video streams may also have to warrant a periodicity between sampled address other applications often in the same frames. On the transport level, several things stream meant for broadcast. This would need to be signaled about the video stream. mean more stringent requirements on the One of these areas is identifying regular RAP interval for streams that address these video starting points such as a start of the additional applications. clean video display in the RAP interval. Another areas is the identification of A popular application that is used with reference versus non- reference pictures broadcast streams is the DVR storage and which identifies some pictures that may not navigation of that stream. This is a service be needed for a DVR FFWD/RWD decode. that viewers expect to have because it exists Lastly doing transport signaling in the clear in the analog television and MPEG-2 can allow for manipulation of the video domains today. In the AVC domain this is stream even though it may be encrypted on not as easy and unless constraints are used the hard drive. to ensure a compatible stream is provided, the DVR design may require a full AVC or any other advanced video codec decode/local encode of the stream prior to needs to have the ability to do this. This may storage. require consideration of the DVR application while designing the RAP For the stream itself, regular self interval. It would also require developing contained pictures or field pictures (for signaling in the transport layer which is a interlaced) need to be available at less than factor usually not considered in a video or equal to 1 second intervals. To avoid coding standard. running decoder chips at faster than normal decode rates, the displayable pictures for DVR FFWD/RWD need to depend on a smaller subset of pictures in the RAP interval [see Figure 6 above]. EMERGING APPLICATIONS: one is to ensure temporal completeness or DIGITAL COMMERCIAL INSERTION temporal continuity of the exited stream. This is mainly achieved in MPEG-2 splicing The digital program insertion (DPI) or by exiting on an anchor frame such as I or P insertion of local commercials or short picture. Sometimes this is called completing programs is achieved by splicing from a the mini-GOP. If one exits a stream in the network program stream to a local middle of a group of B-frames, there will be advertisement (ad) stream at a specific point a temporal jump in presentation space in time. At the end of the ad a second splice between the surviving B-frame(s) and the occurs to get back to the network program. earlier anchor frame in decode order. When The timing of splice points is signaled by splicing between two AVC coded streams, SCTE 35 [5] splice_info_section() messages this temporal continuity at an OUT point carried in the transport stream (TS) packets must exist as well. The use of IDR pictures of network program stream. In splicing, the can be helpful (but may not be necessary) to exit point from the network program stream create this temporal continuity. is called the OUT point or out of the network point. The start of the ad or the In the case of the IN point the stream is a second stream is called the IN point. new one to a decoder, so the stream should Similarly, at the end of the local ad-insertion have all the parameters such as profiles and window, the splicer exits the ad-stream and levels, picture resolution, etc., at the start returns back to the network stream. So one that is needed for a decoder before starting DPI opportunity involves two splicings or to decode any picture. This is insured in two pairs of IN and OUT points [see Figure MPEG-2 coded video stream starting with 7 below]. the sequence header, picture header, etc., followed by an I-frame. A similar solution The requirement of splicing from one in AVC would possibly be to start with a compressed stream to another compressed sequence parameter set (SPS), picture

National Ad Network Program OUT IN (M frames)

pk pk+1 pk+2 pk+3 pk+4 pk+5 ppk+67 pk+7 pN

p0 p1 p2 pM-1

Local Ad IN (M frames) OUT

Figure 7. Simplified Diagram of Local Commercial Insertion

parameter set (PPS), etc., and an IDR REFERENCES: picture. Also it may be recommended that one or more mini-GOPs may be encoded 1. ISO/IEC IS 13818-2, International around any splice points just to avoid any Standard (2000), MPEG-2 Video. splicing complexity due to delayed signaling for the splice points (ad-insertion). At the 2. ISO/IEC 13818-2: 1996/Cor. 1:2000 (E) end of a commercial when the splicer MPEG 2 Video Technical Corrigendum 1. returns to the network, the exited ad should also have- temporal continuity when 3. ITU-T Rec. H222.0 | ISO/IEC 13818-1 returning to the network feed. (2nd edition, 2000): “Information Technology – Generic Coding of moving CONCLUSIONS pictures and associated audio: Systems”- and ITU-T Rec. H.222.0 | ISO/IEC 13818- The AVC standardization process is in 1:2000/AMD.3: (2003): “Amendment 3: progress. Issues/requirements related to Transport of AVC video data over ITU-T major broadcast applications such as Rec. H222.0 | ISO/IEC 13818-1 streams”. channel change time, VOD, PVR, DPI have been investigated and defined. It has been 4. ITU-T Rec. 264 | ISO/IEC 14496-10 apparent that some amount of bindings will AVC: “Advanced Video Coding for Generic be required to make it cable application- Audiovisual Services” (May 2003) and friendly. However, constraints will be Amendment-1 to 14496-10 “Fidelity Range minimized to achieve maximum Extensions (2004) ”. compression efficiency while maintaining higher quality video than that produced by 5. ANSI/SCTE 35-2004, Digital Program other similar standards such as DVB, Blue- Insertion Cueing Message for Cable. ray DVD, HD-DVD, etc. The least amount of constraints will also leave room for 6. SCTE 43-2005, Digital Video Systems encoders to produce a bitstream with Characteristics Standard for Cable potentially higher compression efficiency Television. without compromising quality in the future. 7. ANSI/SCTE 54-2004, Digital Video ACKNOWLEDGEMENTS Service Multiplex and Transport System Standard for Cable Television. The authors wish to thank CableLabs management for their support and encouragement in performing this work.

They would also like to thank the members of the advance codec drafting group in SCTE for many helpful discussions.