FILE-BASED WORKFLOWS
High-quality video processing in the file domain

Bruce Devlin, AmberFin

File-based workflows require more format conversion than workflows in the SDI domain. The effects of interlace and scaling on file-based images are greater than their effects in the SDI domain, yet most file-based processing focuses on raw speed rather than media quality. Better media quality reduces the consumer churn rate and makes the end customers more loyal. This article shows what can be achieved today in the file domain with a little workflow care and some good deinterlacing and scaling.

In the middle of 2005, it was a real achievement to make a file-based workflow deliver business benefits. At the time, there was very poor interoperability between encoders, transcoders and edit platforms. Many systems that were built went for a “one-stop shop” approach, sourcing all the components from a single vendor, basing the system around MPEG-2 or QuickTime and working in SD. In the middle of 2005, few people had heard of YouTube. “Just making the system work” was considered a success.

Let’s fast-forward now to the middle of 2010. There are a great many file-based workflows delivering business benefits. YouTube, iTunes, NetFlix, DailyMotion and other portals are doing well. Tape is still the main mechanism for the interchange of programmes, but many content providers are switching to a file-only model. The success of formats like MXF and QuickTime has enabled this new way of working, and the abundance of new distribution platforms – whether they are broadcast, cable, internet, IPTV, web TV or mobile – provides a commercial incentive to create more content in 2010 for less money than was spent in 2009.

Interoperability issues are still with us, but “just making the system work” is no longer a good goal for a file-based workflow. Delivering high-quality video at low bitrates has been shown to reduce churn in IPTV and to increase viewers on the web (try Googling “video quality reduces churn” for hundreds of articles). It is also a way in which individual broadcasters and publishers can differentiate themselves. This article looks at the challenges of keeping the quality high and the bitrate down, and at why it is even more important to process the signal well in the file domain than in the traditional SDI domain.

Codecs, converters, interlace and standards

Before jumping into the signal processing technology, I will first define some terms. Many of these terms are used in a very loose fashion within the industry. This article will use them as follows:

EBU TECHNICAL REVIEW – 2010 Q3

Codec: A video, audio or data encoder or decoder. MPEG-2 is a codec whereas MXF is not. H.264 is a codec and so is Dolby D.

Wrapper: A multiplexing or interleaving specification that allows different video, audio and data codecs to be synchronized within a single file. MXF, GXF, QuickTime, Windows Media, AVI, PS and MP4 are all wrappers.

Delivery Profile: A combination of codecs and wrappers configured for delivery to a certain specification. Today, this is often determined by the target device: e.g. a particular playout server. As the number of delivery formats increases, this is more likely to become an Application Specification such as AMWA’s AS02 or AS03.

Format Converter: A format converter changes the spatial resolution of the video and may modify the frame rate by some integer ratio. For example, a format converter may convert SD (720 x 576 x 50i) to HD (1280 x 720 x 50p). In the USA, a format converter may insert 2:3 cadence to convert 23.98 fps (frames per second) material to 29.97 fps material.

Standards Converter: A standards converter changes the temporal resolution of the video and may also change the spatial resolution. For example, a typical conversion is from US material (29.97 fps at 480i) to European video (25 fps at 576i). In the file domain, conversion to filmic rates and to 1080 50p / 1080 60p is also possible.

Interlace: Interlace can be thought of as a very old compression tool. Every other line is displayed during the first half of the frame period. The remaining lines are displayed during the second half of the frame period. Virtually all standard-definition video is interlaced, as is a very large amount of HD material. Like any compression scheme, interlace has its benefits (bandwidth reduction) and its problems (vertical-temporal aliasing, poor digital compression characteristics).
It is worth noting that in 2010, the majority of source content is still interlaced but many display devices that consumers use, including flat-screen TVs, PCs, iPads and mobile devices, are progressive.
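The field structure described above can be sketched in a few lines. This is a minimal illustration, not code from any broadcast library – the function names `split_fields` and `weave` are mine – assuming a single-channel SD frame held in a NumPy array:

```python
import numpy as np

def split_fields(frame: np.ndarray):
    """Split an interlaced frame into its two fields.

    Field 1 holds the even-numbered lines (0, 2, 4, ...),
    field 2 the odd-numbered lines (1, 3, 5, ...).
    """
    return frame[0::2], frame[1::2]

def weave(field1: np.ndarray, field2: np.ndarray) -> np.ndarray:
    """Re-interleave two fields into a full frame ("weave" deinterlacing).

    Weave is only artefact-free when nothing moves between the two field
    sampling instants; with motion it produces the familiar comb/stripe look.
    """
    h, w = field1.shape
    frame = np.empty((2 * h, w), dtype=field1.dtype)
    frame[0::2] = field1
    frame[1::2] = field2
    return frame

# A 576-line SD luma frame: each field carries 288 lines.
frame = np.arange(576 * 720, dtype=np.uint8).reshape(576, 720)
f1, f2 = split_fields(frame)
assert f1.shape == (288, 720) and f2.shape == (288, 720)
assert np.array_equal(weave(f1, f2), frame)
```

Weave reconstruction is perfect for static content; the whole deinterlacing problem is deciding what to do on the lines where motion makes the two fields disagree.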

Comparing the SDI domain with the file domain

In the SDI domain, video signals required a physical transport and so the number of Format Conversions that were needed was limited to the conversions between the formats that could be carried on a SMPTE 292M physical link. During the planning phase of a facility build, the Format Converter devices could be planned and purchased to maximize quality and minimize the number of conversions. High-quality converters featured good deinterlacers and poor-quality converters didn’t. In the file domain, there are hundreds more formats that need to be supported but, until recently, there have not been any good software deinterlacers on the market to keep the picture quality high.

Deinterlacing – why you need it

A frame captured from a clip of video is shown at the top of the next page. This frame shows the typical “interlaced look” that should be very familiar. To our eye, it is quite clearly a skateboarder who had moved between field 1 being scanned and field 2 being scanned. To a piece of software analysing a portion of the scene, it sees horizontal stripes. The software must do a lot of calculations to work out if this pattern was intended to be stripes (i.e. vertical detail) or whether this pattern was due to motion (i.e. poor temporal sampling). This problem is known as vertical-temporal aliasing and, due to the very low frame rates of video compared to the speed of objects moving across the screen, it is impossible to get it right all the time.

Let’s consider the case of compressing an SD video signal. Assuming no scaling takes place, codecs like MPEG-2 and H.264 have knowledge of interlace built into them and they do a reasonable job. MPEG-1 and other 1st-generation codecs do not know about interlace and, when used incorrectly, can permanently stamp in the interlace artefacts so that they can (almost) never be removed.

What happens when scaling takes place? Well-designed software will scale each field independently and the pictures will look OK. Poor-quality scaling software will frame-blend and then scale the image. From a signal-processing point of view, this is a disaster – but it happens very frequently today.

One classic example provided to AmberFin’s engineering department was a clip from a movie that was originally shot at 24p (24 progressive frames per second). By looking at the artefacts, we believe the clip we were asked to process had been converted to 29.97 fps by inserting a 2:3 sequence and converting to a standard-definition MPEG-2 format. The MPEG-2 file had then been frame-blended and badly scaled to 720p (probably 59.94 fps) and encoded as progressive H.264. By performing this scaling and coding, the original interlace had been “stamped into” the picture like a watermark. The poor-quality scaler had not preserved the original interlaced line structure and it was now impossible to reverse the process.
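The difference between the two scaling strategies can be shown with a toy example. This is a sketch, not production code: a one-line "scaler" (line repetition) stands in for a real polyphase filter, and a single bright pixel per field stands in for the moving skateboarder:

```python
import numpy as np

H, W = 4, 40
frame = np.zeros((H, W))
frame[0::2, 10] = 255.0   # field 1: object at column 10
frame[1::2, 20] = 255.0   # field 2: object has moved to column 20

def vscale2(img):
    """Toy 2x vertical scale by line repetition (stand-in for a real scaler)."""
    return np.repeat(img, 2, axis=0)

# Correct: scale each field independently, keeping the two sampling
# instants separate, then re-interleave the scaled fields.
s1, s2 = vscale2(frame[0::2]), vscale2(frame[1::2])
good = np.empty((s1.shape[0] * 2, W))
good[0::2], good[1::2] = s1, s2

# Poor: blend the two fields into one "frame", then scale.  The two
# temporal samples are averaged together and can never be separated again.
blended = (frame[0::2] + frame[1::2]) / 2.0
bad = vscale2(blended)

# Per-field output keeps full-intensity objects at distinct positions...
assert good[0::2].max(axis=0)[10] == 255 and good[0::2].max(axis=0)[20] == 0
# ...blended output has a half-intensity double image at both positions.
assert bad[0, 10] == 127.5 and bad[0, 20] == 127.5
```

The per-field path can still be deinterlaced properly later; the blended path, like the movie clip above, has had two temporal samples averaged together and the damage is irreversible.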

Spotting quality clues in a test pattern

Many of today’s encoding and transcoding solutions are geared towards raw throughput rather than keeping the integrity of the original picture. In almost all industries, increasing the quality of a product increases its attractiveness to the end customer. This is true for free-to-air services as well as paid-for services. In order to get better throughput, deinterlacers and scalers are compromised in quality in order to run fast. Let’s look at some test patterns to see the effects of this on images.

The humble test pattern was once the mainstay of checking an analogue transmission system. However, it’s an unfortunate recent trend that the testing of equipment before purchase happens less than it did in the past. In fact, modern test patterns are still very useful because they can reveal what is going on behind the scenes when an image gets filtered (a scaler, a deinterlacer and a standards converter are all different examples of a filter). We at AmberFin have used a test pattern that has kindly been provided by OmniTek to reveal some obvious and some subtle effects of poor deinterlacing.

Test Pattern 1 – poor quality scaling and deinterlacing
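One reason the zone plates in such charts remain so useful is that they are easy to synthesise and mathematically exact: every spatial frequency up to the Nyquist limit is present at a known radius, so any frequency a filter invents shows up as an extra ring. A minimal sketch (my own construction, not OmniTek's actual pattern):

```python
import numpy as np

def zone_plate(n=256):
    """Circular zone plate: instantaneous frequency grows linearly with
    radius, sweeping from DC at the centre to Nyquist at the edge."""
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2] / (n / 2.0)
    return 0.5 + 0.5 * np.cos((np.pi * n / 4.0) * (x * x + y * y))

img = zone_plate()
c = 128  # centre row/column

# Near the centre the pattern is almost flat (low frequency)...
assert abs(img[c, c + 1] - img[c, c]) < 0.01
# ...while near the edge adjacent pixels swing strongly (high frequency).
assert abs(img[c, c + 121] - img[c, c + 120]) > 0.5
```

Run any scaler or deinterlacer over this image and rings appearing where the radius says they should not are frequencies the filter created.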

Hopefully, when you are reading this article, the PDF file has not been over-compressed or scaled with a poor-quality image filter on your screen. [Ed.: the two test patterns are duplicated at higher quality on a single page, inserted at the end of this PDF.]

If you look at Test Pattern 1, you will see that there are some obvious artefacts that have been introduced as a result of the poor performance of the deinterlacer:
1) The OmniTek URL in blue at the top of the chart looks “bold” and some of the letters merge.
2) The frequency lettering on the right-hand side is missing horizontal components.
3) The large “OmniTek” lettering at the bottom seems to have the wrong curvature at the top and bottom of the letter “O”.
4) The pictures of the car and plane look “crunchy” and have “staircases”.

Test Pattern 2 – high quality scaling and deinterlacing

There are additional, more subtle effects of the poor deinterlacer that can be seen when compared with the high-quality deinterlacing and scaling shown in Test Pattern 2:
5) In the lower row of the coloured zone plates (the circles at the bottom of the image), there are three clear circles per zone plate in Test Pattern 1, whereas there is only one circle per zone plate in Test Pattern 2. The poor-quality deinterlacer has introduced an extra two circles! This is because its filter has introduced new frequency components that were never in the original image. This will become important later.
6) On the right-hand side, you will see a number of vertical and diagonal coloured lines. These represent different horizontal and diagonal frequencies in the image. In Test Pattern 2, you will see that the lines remain diagonal when they get closer together (i.e. with higher frequency). In Test Pattern 1, you will see that the poor deinterlacer turns them into a chequerboard pattern, i.e. new high frequencies have been introduced.
7) Finally, you will see how the picture of the fruit and vegetables in Test Pattern 2 looks clear, whereas the picture in Test Pattern 1 looks “busy” and the fruit seem to blend into each other.
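Item 5's "frequency components that were never in the original image" are a direct consequence of resampling without filtering first. A one-dimensional sketch: drop every other sample of a clean 100-cycle sinusoid, as a crude line-dropping deinterlacer would, and a brand-new frequency appears (the helper `peak_bin` is mine):

```python
import numpy as np

N = 256
n = np.arange(N)
# Fine vertical detail: a clean 100-cycle sinusoid along one column.
column = np.cos(2 * np.pi * 100 * n / N)

# "Deinterlace" by simply throwing away every other line, with no
# band-limiting filter: the cheapest, worst possible approach.
decimated = column[::2]

def peak_bin(x):
    """Index of the strongest frequency component."""
    return int(np.argmax(np.abs(np.fft.rfft(x))))

assert peak_bin(column) == 100    # original energy sits at 100 cycles
assert peak_bin(decimated) == 28  # aliased: a frequency that was never there
```

A good deinterlacer band-limits the signal before resampling precisely to stop this energy folding back into the picture as a new, spurious pattern.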

If you were able to see the pictures in ‘play’ you would notice other artefacts; for example, the underline on http://www.omnitek.tv flickering on and off between fields.

How compression works

Nearly all files in a file-based workflow are compressed. Most files will have been compressed and decompressed several times before they are used in a playout or publishing system. Space prevents a full thesis on compression techniques, so I will focus on two of the major tools that are used by compression engines and on how their performance is reduced by poor deinterlacing and scaling.

Many compression schemes use transform coding to reduce the bitrate of a video stream. The theory is quite simple and can be illustrated by the DCT (Discrete Cosine Transform) image shown in Fig. 4.

The transform is applied to a block or a region of an image. The output of the transform will be a set of numbers that correspond to different spatial energies within the picture. The DCT image shows that the top-left “bin” represents the amount of DC energy in the image. The top-right bin represents the amount of high horizontal frequency in the image. The bottom-left bin represents the high vertical frequency and the bottom-right bin represents the high diagonal frequency.

Figure 4 – DCT image

When applied to small sections of most images, only a few of these bins will have a significant amount of energy. All the other bins, particularly those with high frequency, are likely to have little energy (i.e. a small value) and can therefore be coded less accurately or even discarded. Compression codecs such as MPEG-1, MPEG-2, DV, Windows Media Video (VC-1), H.264, JPEG2000 etc. all use this principle to reduce the bitrate. There is a built-in assumption that, on average, you can throw away data or mask artefacts by selectively reducing quality in high-frequency areas, by reducing coding accuracy or even discarding data.

We saw in the previous section, however, that poor deinterlacing and scaling introduce significant amounts of energy into an image, and they do so in the high-frequency areas. This causes problems for transform codecs because the built-in assumptions are now stressed. The codec control system will see lots of high-frequency energy and will attempt to allocate bitrate to it so that it can be properly represented in the decoded image. The codec has no way of knowing that those high frequencies were artefacts that you didn’t want in the first place. The net result of poor deinterlacing and scaling is that bitrates go up and picture quality goes down in any transform compression system.

Many compression schemes also use inter-frame encoding.
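Before moving on, the energy-compaction assumption of transform coding, and how a stamped-in interlace comb breaks it, can be sketched with a small experiment. This is a toy illustration: the 8-point orthonormal DCT below is the same family of transform used by MPEG-2/JPEG-style codecs, but the example blocks and thresholds are mine:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, as used in MPEG-2/JPEG-style codecs."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2.0 / n)

C = dct_matrix()

def dct2(block):
    """2-D DCT of an 8x8 block."""
    return C @ block @ C.T

def high_freq_energy(coeffs):
    """Energy outside the low-frequency 4x4 corner of the coefficient block."""
    e = coeffs ** 2
    return e.sum() - e[:4, :4].sum()

# A gently varying 8x8 block: all its energy sits in the low-frequency bins.
j = np.arange(8)
smooth = np.tile(110.0 + 10.0 * np.cos(np.pi * (2 * j + 1) / 16.0), (8, 1))

# The same block with an interlace comb stamped in: alternate lines offset.
comb = smooth.copy()
comb[1::2] += 40.0

assert high_freq_energy(dct2(smooth)) < 1e-6  # nothing for the codec to fight
assert high_freq_energy(dct2(comb)) > 1000.0  # comb forces energy into high bins
```

The encoder must now spend bits representing that high-frequency energy faithfully – exactly the "bitrates go up, quality goes down" effect described above. The second major tool, inter-frame encoding, suffers in an analogous way.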
This technique assumes that a video sequence consists of a series of frames where nothing much changes on a frame-by-frame basis. Motion vectors are generated to track blocks of the image and, instead of sending transformed image data to a decoder, motion vectors and difference pictures are sent instead. At first glance, it would appear that this mechanism would not be affected by deinterlacing and scaling. However, when a moving test pattern is used to investigate the behaviour of these filters, it becomes obvious that the effects of poor filtering are dynamic in nature. This means that the motion estimator in the compression encoder has a much harder job of matching blocks in one image with equivalent blocks in a future or previous image. The artefacts introduced by the poor scaling and deinterlacing have the same effect as noise and therefore will reduce the statistical likelihood of a good match. Reducing the quality of the motion vectors results in more image data having to be sent to the decoder, which in turn results in higher bitrate and / or poorer-quality pictures.

Figure 5 – Inter-frame decoding

One important current topic is the use of compression in future Stereoscopic (i.e. 3D) TV services. Stereoscopic TV is highly intolerant of poor-quality images, especially if they introduce non-parallax disparities between left- and right-eye views. At the same time, in order for Stereoscopic TV to be commercially practical, it will need to use economically practical amounts of data through production, post production and distribution. This is a topic worthy of much future discussion.

Abbreviations

AVI	(Microsoft) Audio-Video Interleaved
DC	Direct Current
DCT	Discrete Cosine Transform
GXF	General eXchange Format
HD	High-Definition
IPTV	Internet Protocol Television
MXF	Material eXchange Format
NTSC	National Television System Committee (USA)
PAL	Phase Alternation Line
PS	Programme Stream
SD	Standard-Definition
SDI	Serial Digital Interface
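The motion-estimation penalty described in the inter-frame section above can be sketched with an exhaustive block matcher. This is a toy model (a SAD search in one dimension only, with deinterlacing/scaling artefacts modelled as additive noise; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def best_match(block, search):
    """Exhaustive 1-D block matching: SAD at every horizontal offset."""
    w = block.shape[1]
    sads = {dx: float(np.abs(block - search[:, dx:dx + w]).sum())
            for dx in range(search.shape[1] - w + 1)}
    return min(sads, key=sads.get), sads

# An 8x8 block of texture that has moved 3 pixels right in the next frame.
content = rng.normal(size=(8, 8))
search = np.zeros((8, 24))
search[:, 3:11] = content

dx, sads = best_match(content, search)
assert dx == 3 and sads[3] == 0.0  # clean source: perfect match, zero residual

# Artefacts behave like noise added to both frames.
noisy_block = content + 0.8 * rng.normal(size=(8, 8))
noisy_search = search + 0.8 * rng.normal(size=(8, 24))
dx_n, sads_n = best_match(noisy_block, noisy_search)

# Whatever offset wins now, no candidate matches well any more, so large
# difference pictures must be coded: higher bitrate and/or poorer quality.
assert min(sads_n.values()) > 20.0
```

With clean pictures the motion vector carries almost all the information; with artefact "noise" the residual never approaches zero, and those residual bits are pure waste.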

Format conversion in the file domain

It is important to realise that more format conversion is done in the file domain than was ever done in the SDI domain. There are many more hours of interlaced TV source material in the world today than there are of progressive material, yet nearly every modern display is a progressive device. From flat panels to mobile phones to PCs to iPads – they are all inherently progressive.

The life cycle of media in the file domain is also increasingly more complicated. It is often the case that a web version of an HD asset will be made from the SD master for historical reasons. This may result in several scaling and deinterlacing steps that are not needed. Reviewing the life cycle of media within a facility may reveal areas where storage and bandwidth costs downstream can be reduced by improving the quality of upstream processing. Storage is cheap, but if you can reduce the size of your assets by 20% and make them look better at the same time, then recovering those costs is often easy to do.

Standards conversion in the file domain

I have discussed at length the issues around format conversion – i.e. scaling and deinterlacing – but I will finish off with some words on Standards Conversion. Fig. 6 shows the classic standards-conversion problem that needs to be solved: how do you convert video that was sampled at 50 frames per second into video that is sampled at 60 frames per second? The answer is quite simple – you measure the motion of every pixel in the scene and use the motion vectors to project those pixels into the new temporal position. In other words, move all the objects to positions as though the camera were operating at the new frame rate. As an extra complexity, in standard definition, you also have to scale the image because the shapes of PAL and NTSC pixels are different.

Figure 6 – Standards conversion

It should be obvious from the rest of this article that interlace and scaling will have a massive impact on the performance of a standards converter. It should also be of no surprise that the world’s best deinterlacers come from companies that have mastered the art of standards conversion. Standards conversion is a computationally intense process and it interacts with compression codecs in the same way as a deinterlacer does. If you have source material that has been poorly deinterlaced and scaled, and you attempt to standards-convert it, then you will end up with very poor output pictures that in turn will be hard to compress. Taking care upstream will deliver cost benefits downstream.
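The "measure motion, project pixels" idea can be sketched for a single object moving at constant speed. A toy model (the numbers and names are mine; a real converter must estimate dense motion fields from the pictures themselves, which is where the computational expense lies):

```python
src_rate, dst_rate = 50.0, 60.0
velocity = 200.0                      # object speed, pixels per second

def true_position(t):
    """Ground-truth trajectory of a moving object."""
    return velocity * t

errors_projected, errors_repeated = [], []
for k in range(12):                   # first 12 output frames at 60 fps
    t_out = k / dst_rate
    n = round(t_out * src_rate)       # nearest 50 fps source frame
    t_src = n / src_rate

    # Motion-compensated: project the pixel along its measured motion
    # vector (pixels per source frame) to the new sampling instant.
    mv = velocity / src_rate
    projected = true_position(t_src) + mv * (t_out - t_src) * src_rate

    # Naive: just repeat the nearest source frame.
    repeated = true_position(t_src)

    errors_projected.append(abs(projected - true_position(t_out)))
    errors_repeated.append(abs(repeated - true_position(t_out)))

assert max(errors_projected) < 1e-9   # motion projection lands exactly
assert max(errors_repeated) > 1.9     # frame repetition judders by ~2 pixels
```

Repeating (or blending) the nearest source frames leaves a periodic position error, visible as judder or double imaging, which motion projection removes – provided the motion vectors are good, which poorly deinterlaced source material makes much harder.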

Conclusions

File-based workflows require more format conversion than workflows in the SDI domain. The effects of interlace and scaling on file-based images are greater than their effects in the SDI domain, yet most file-based processing focuses on raw speed rather than media quality. Better media quality reduces the consumer churn rate and makes end customers more loyal – this can be achieved today in the file domain with a little workflow care and some good deinterlacing and scaling.

Bruce Devlin graduated from Queens’ College Cambridge in 1986 and has been working in the broadcast industry ever since. He joined the BBC Research Department to work on Radio-Camera systems before moving to France where he worked on sub-band and MPEG coding for Thomson. He joined Snell & Wilcox in 1993, where he started the company's work on compression coding.

Mr Devlin joined AmberFin (the software division of Snell & Wilcox) in 2008, where he guides the company on file, workflow, and systems issues. He holds several patents in the field of compression and files, and has written international standards and contributed to books on MPEG and file formats.

Bruce Devlin is co-author of the MXF file format specification and an active contributor to the work of the SMPTE (Society of Motion Picture and Television Engineers) and the AMWA (Advanced Media Workflow Association). He is a Fellow Member of the SMPTE.

This version: 8 July 2010

Published by the European Broadcasting Union, Geneva, Switzerland ISSN: 1609-1469

Editeur Responsable: Lieven Vermaele Editor: Mike Meyer E-mail: [email protected]

The responsibility for views expressed in this article rests solely with the author

Test Pattern 1

Test Pattern 2