The archivist's video codec/container FAQ http://download.das-werkstatt.com/pb/mthk/info/video/FAQ-digital_vi...

The archivist's video codec and container FAQ

Started: September 4th, 2013
Last update: September 23rd, 2016
Written by Peter B., Hermann Lewetz and Marion Jaks

Disclaimer

The topic of digital video formats is huge, technical and complex, and the decision of which one to choose is often linked to given preconditions that differ from institution to institution. Choosing a container and codec for preserving video in digital form has a number of implications. Therefore, this choice should be made wisely.

Since we started developing methods for video digitization in 2009, we have continuously spent a lot of time evaluating and developing methods for responsibly storing our content to the best of our knowledge. We'd like to document, share and discuss our findings with others.

This FAQ is backed up by arguments and technical evaluations, comparisons and tests wherever possible. Please question our statements! We welcome feedback, comments and discussions about this subject, because we believe in exchanging information and experiences rather than blindly following someone else's suggestions.

We would like to thank Ian Henderson from the National Archives U.K., who unintentionally inspired us to finally start writing this down by e-mailing us a long list of questions about this topic.

This article represents our personal experience at the Austrian Mediathek. Topics:

- Pros and cons regarding video formats for archiving
- Audio/video codecs the Mediathek uses
- Video container the Mediathek uses
- FFV1 as video codec
- Handling born-digital video
- Video file size (large files/segments)
- Codec error resilience
- JPEG2000/MXF
- Container: MKV/MOV/MXF
- Archiving DVDs
- Performance of image sequences as format (e.g. DPX/TIFF)

Q: What are the pros and cons regarding these video formats for archiving?

    Container  Video codec   Audio codec
1.  MXF        JPEG2000      PCM (uncompressed)
2.  MXF        Uncompressed  PCM (uncompressed)
3.  MOV        ProRes        PCM (uncompressed)
4.  MKV        FFV1          PCM (uncompressed)
5.  AVI        FFV1          PCM (uncompressed)

This answer is so long that we've put a detailed comparison of the "usual suspects" on its own page: Comparing video codecs and containers for archives

Q: Which audio/video codecs does the Mediathek use?

For captured material, we use FFV1 for video and uncompressed PCM for audio. Audio resolution depends on the source material, but analogue sources are captured at 48 kHz / 24-bit, as this is the SDI standard.
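
Uncompressed PCM has a fixed data rate: sample rate × bit depth × number of channels. The small Python sketch below puts the 48 kHz / 24-bit figure into numbers. It assumes a stereo signal because the channel count is not stated above, and it ignores container overhead.

```python
def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed (linear) PCM audio."""
    return sample_rate_hz * bit_depth * channels


def pcm_bytes_per_minute(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bytes needed for one minute of PCM audio, ignoring container overhead."""
    return pcm_bitrate(sample_rate_hz, bit_depth, channels) * 60 // 8


# 48 kHz / 24-bit; stereo is an assumption made only for this example
print(pcm_bitrate(48_000, 24, 2))           # 2304000 bit/s
print(pcm_bytes_per_minute(48_000, 24, 2))  # 17280000 bytes (about 17.3 MB)
```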

"Born digital" material, coming in as a file already, is a whole different story. Please read up on born digital details, below.

Q: Which container does the Mediathek use?

We use AVI (Audio Video Interleaved).

Almost every application that deals with video can handle AVI files, ranging from Free Software (Open Source) to proprietary tools, professional and consumer alike. AVI has a rather limited set of features, but for long-term preservation this is also its main advantage: simple = robust.

Also: more features = more possible points of failure. Some archives argue the case for containers with complex features, but do not acknowledge the dangers that come with this decision: the more features a container has, the more possible points of failure are added to your archive copy. With this in mind, we decided on a simple container.

In practice, we've had almost no interoperability issues using AVI across different tools from different vendors, even across different operating system platforms.

We do not currently know of any other container that is as widely and stably supported.
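
To show how little structure AVI has: an AVI file is a RIFF file, which is a sequence of chunks. Each chunk consists of a four-character code, a 32-bit little-endian length and the payload. The Python sketch below lists the top-level chunks of such a file. It is a simplified reader for illustration only and not a validator. For example, it ignores OpenDML (AVI 2.0) extensions such as the additional "AVIX" RIFF chunks used by large files. The helper `make_chunk` and the synthetic sample data are hypothetical and exist only for the example.

```python
import struct
from typing import Iterator, List, Tuple


def iter_chunks(data: bytes, offset: int, end: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (fourcc, payload_offset, payload_size) for each chunk in data[offset:end]."""
    while offset + 8 <= end:
        fourcc, size = struct.unpack_from("<4sI", data, offset)
        yield fourcc.decode("ascii"), offset + 8, size
        offset += 8 + size + (size & 1)  # payloads are padded to an even length


def top_level_chunks(data: bytes) -> Tuple[str, List[str]]:
    """Return the RIFF form type (e.g. 'AVI ') and the IDs of its top-level chunks."""
    riff, size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF":
        raise ValueError("not a RIFF file")
    end = min(len(data), 8 + size)
    ids = []
    for fourcc, start, _ in iter_chunks(data, 12, end):
        if fourcc == "LIST":
            # LIST chunks begin with a 4-byte list type such as 'hdrl' or 'movi'
            ids.append("LIST:" + data[start:start + 4].decode("ascii"))
        else:
            ids.append(fourcc)
    return form.decode("ascii"), ids


def make_chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Build one chunk (hypothetical helper for the example below)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad


# A tiny synthetic AVI-like skeleton (contains no real video data)
body = (b"AVI "
        + make_chunk(b"LIST", b"hdrl" + make_chunk(b"avih", bytes(56)))
        + make_chunk(b"JUNK", b"abc")
        + make_chunk(b"LIST", b"movi"))
riff = b"RIFF" + struct.pack("<I", len(body)) + body
print(top_level_chunks(riff))  # ('AVI ', ['LIST:hdrl', 'JUNK', 'LIST:movi'])
```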

Q: Would you always recommend FFV1 for capture from analogue?

Yes. To be more precise: We recommend a lossless codec for capturing from analogue.

The reason why we use (and recommend) FFV1 is that the choice of lossless codecs is very small. We need something that works and is safe for long-term preservation.

In the last few years FFV1 has proven itself to offer that:

- It's currently one of the most easily accessible lossless codecs.
- It's incredibly widespread, due to its out-of-the-box availability in FFmpeg's libraries.
- It's probably the fastest lossless codec with compression comparable to lossless JPEG2000.

With FFV1, we can capture directly into our archiving format, without requiring any special hardware. Thanks to FFmpeg, we can transcode to virtually any format. We even run automated mass transcodings, for example for the video content we make available online.
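
Here is a minimal Python sketch of what such FFmpeg-based automation could look like. This is not the Mediathek's actual tooling. The encoder names `ffv1`, `pcm_s24le`, `libx264` and `aac` are standard FFmpeg encoders. The access-copy settings (H.264 + AAC in MP4), the folder layout and the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def archive_command(src: str, dst: Path) -> List[str]:
    """FFV1 video + uncompressed 24-bit PCM audio (the archive format described above)."""
    return ["ffmpeg", "-i", src,
            "-c:v", "ffv1",        # lossless FFV1 video
            "-c:a", "pcm_s24le",   # uncompressed 24-bit little-endian PCM
            str(dst)]


def access_command(master: Path, out_dir: Path) -> List[str]:
    """Hypothetical access copy: H.264 + AAC in MP4, e.g. for online viewing."""
    return ["ffmpeg", "-i", str(master),
            "-c:v", "libx264", "-pix_fmt", "yuv420p",  # widely playable H.264
            "-c:a", "aac",
            str(out_dir / (master.stem + ".mp4"))]


def batch_access_commands(archive_dir: Path, out_dir: Path) -> List[List[str]]:
    """One access-copy command per archived AVI file, in alphabetical order."""
    return [access_command(p, out_dir) for p in sorted(archive_dir.glob("*.avi"))]
```

Each command list could then be run with `subprocess.run(cmd, check=True)`. Building argument lists instead of shell strings avoids quoting problems with unusual filenames.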

Q: How sustainable is FFV1? It's not even standardized.

It's not yet standardized, that's true. In theory, that sounds bad. In practice it doesn't matter, because FFV1 was released as Free Software (some call it "Open Source") from the very beginning.

What does this mean?

Quoting Jason Garrett-Glaser, the lead developer of "x264" (the most widely used H.264 implementation), on the question of the long-term accessibility of FFV1:

Portable C code is not going to stop working in 20, 40, or 100 years. Something encoded in a format with open source encoders and decoders will be available for as long as computers exist, as long as it's packaged with source code. Nobody cares about how "esoteric" anything is: if a computer can run C code, it can decode anything.

Even if Jason were wrong, there are no artificial restrictions that would prevent anyone from translating C code to work in any future environment.

No artificial restrictions:

FFV1 was developed as part of the FFmpeg project, which is Free Software. Free Software is defined by four freedoms:

You may…

1. …run the program for any purpose (USE).
2. …study how the program works (STUDY).
3. …redistribute copies so you can help others (SHARE).
4. …improve the program, and release your improvements to the public (IMPROVE).

For long-term preservation, this means that we can even archive the source code of the encoding/decoding tool that was used to create all FFV1 files ever in existence. Literally. Compared to analogue video, this is the equivalent of archiving our recorder and player along with our content, and even more: archiving the blueprints, schematics and all parts required to build it from scratch. Now, and under any yet-unknown technical conditions in the future.

Technically, this is a viable solution against format obsolescence. Please form your own opinion about this, and evaluate existing applications or hardware producing standardized formats (like JPEG2000 or MXF) against the properties mentioned above.

For more information about how software licensing affects archiving institutions, and its impact on the sustainability of digital solutions and formats, there is a video of a talk about this subject, given at the EuropeanaTech conference in 2011.

Q: Would you always recommend FFV1 for capture from digital video?

It depends.

When the original stream can be retrieved from tape and the codec fulfills our requirements for mid-term preservation (e.g. DV), we would rather keep this codec than transcode it to FFV1, especially if the codec is lossy.

See the question about born-digital files below for more details on how we handle born-digital material.

Q: Would you always transfer born digital files into your archive format (FFV1)?

As most born-digital files are lossy-compressed, and some of them store extra information in their original form (header info, additional data files, folder structure, etc.), we always keep the original source files and use BagIt to preserve the original file/folder structure. We also checksum the original files.
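
The following Python sketch shows the basic idea of a BagIt bag as described in RFC 8493: the payload is copied unchanged into a `data/` folder, and a `bagit.txt` declaration and a `manifest-<algorithm>.txt` file list one checksum per payload file. It is a minimal illustration and not a complete implementation. For example, it does not escape unusual characters in filenames or write tag manifests. The choice of SHA-256 is an assumption, because the text does not say which algorithm is used. For production use, a maintained implementation such as the Library of Congress's bagit-python is the safer choice.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks, so large video files need not fit into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def make_minimal_bag(source_dir: Path, bag_dir: Path, algorithm: str = "sha256") -> Path:
    """Copy source_dir into bag_dir/data and write bagit.txt plus a payload manifest."""
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # keeps the original folder structure; data_dir must not exist
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(bag_dir).as_posix()  # e.g. "data/sub/clip.mts"
        lines.append(f"{file_digest(path, algorithm)}  {rel}\n")
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest
```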

Until the required storage becomes affordable, only video files that do not fulfill the requirements for mid-term preservation (e.g. the codec is proprietary or owned by a single vendor) will be transcoded to FFV1, as a copy in addition to the original file.

So, mainly due to size considerations (for the time being), our current approach is a "whitelist" system: codecs on the whitelist are kept in their original format and not transcoded.

Currently, our whitelist contains the following video codecs (audio is always transcoded to uncompressed PCM):

- FFV1 (Full name: FFmpeg Video Codec 1)
- H.264 / x264 (Full name: MPEG-4 AVC (Advanced Video Coding))
- DivX (Full name: MPEG-4 ASP (Advanced Simple Profile))
- DV (Full name: Digital Video)
- MPEG-1 / MPEG-2 (classic MPEG-1 or MPEG-2, as used for VCDs or DVDs)

Files using a whitelisted codec are just "re-multiplexed" (rewrapped) into a suitable container, without re-encoding the video bitstream. Therefore, no quality is lost.

Rewrapping has also proven useful for detecting container/wrapper issues as early as possible in the ingest process. For example, if rewrapping causes audio and video to drift out of sync, this may indicate that the timing information in the source was not 100% clean.
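
The text does not say how such sync problems are detected. A crude heuristic would be to compare the audio and video stream durations before and after rewrapping, as reported by `ffprobe -v error -show_streams -of json <file>`. The Python sketch below does exactly that. The function names and the one-PAL-frame tolerance (0.04 s) are hypothetical. Some containers (e.g. Matroska) may not report per-stream durations at all, and such streams are simply skipped here.

```python
import json
from typing import Dict


def stream_durations(ffprobe_json: str) -> Dict[str, float]:
    """Map 'video'/'audio' to the duration (seconds) of the first stream of that type."""
    durations: Dict[str, float] = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        kind, dur = stream.get("codec_type"), stream.get("duration")
        if kind in ("video", "audio") and dur is not None and kind not in durations:
            durations[kind] = float(dur)
    return durations


def av_gap(ffprobe_json: str) -> float:
    """Absolute difference between audio and video stream durations, in seconds."""
    d = stream_durations(ffprobe_json)
    return abs(d["video"] - d["audio"])


def looks_out_of_sync(source_json: str, rewrapped_json: str, tolerance_s: float = 0.04) -> bool:
    """Flag a file if rewrapping changed the audio/video duration gap by more than tolerance_s."""
    return abs(av_gap(rewrapped_json) - av_gap(source_json)) > tolerance_s
```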

All other video codecs are transcoded to FFV1, preserving as much of the original as possible (bit depth, pixel format, colorspace, ...).
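
The whitelist decision can be sketched in Python as follows, assuming FFmpeg/ffprobe are used for probing and rewrapping. This is an assumption, not the Mediathek's documented procedure. The codec names in `WHITELIST` are FFmpeg's internal names for the whitelisted codecs above. The destination container and the 24-bit PCM setting are placeholders.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical mapping of the whitelist above to FFmpeg's internal codec names.
WHITELIST = {
    "ffv1",        # FFV1
    "h264",        # H.264 / MPEG-4 AVC
    "mpeg4",       # MPEG-4 Part 2 (ASP), e.g. DivX
    "dvvideo",     # DV
    "mpeg1video",  # MPEG-1
    "mpeg2video",  # MPEG-2
}


def probe_video_codec(src: Path) -> str:
    """Ask ffprobe for the codec name of the first video stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", str(src)],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


def ingest_command(src: Path, dst: Path, video_codec: str) -> List[str]:
    """Rewrap whitelisted video unchanged; transcode all other video to FFV1.

    Audio is always converted to uncompressed PCM (24-bit here, which is an assumption).
    """
    if video_codec in WHITELIST:
        video_args = ["-c:v", "copy"]  # rewrap: the video bitstream is not re-encoded
    else:
        # lossless; FFmpeg picks the FFV1-supported pixel format closest to the input
        video_args = ["-c:v", "ffv1"]
    return ["ffmpeg", "-i", str(src), *video_args, "-c:a", "pcm_s24le", str(dst)]
```

In use, `ingest_command(src, dst, probe_video_codec(src))` would produce either a rewrap or a transcode command for each incoming file.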

Q: Is it possible to capture HD in FFV1?

FFV1 version 1 is fast enough to encode and decode SD material in realtime. FFV1 version 3 supports multithreaded encoding and decoding and can therefore be faster. Tests have shown that with FFV1 version 3 it should be possible to capture even full HD (1920x1080) on reasonable PC hardware (quad-core @ 3.4 GHz).
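
In FFmpeg, the FFV1 version and its multithreading are controlled by encoder options. The Python sketch below collects a commonly used set: `-level 3` selects FFV1 version 3, `-slices` splits each frame into independently coded slices that can be processed in parallel, `-slicecrc 1` stores a CRC per slice, and `-g 1` makes every frame a keyframe. These specific values (16 slices, thread count) are assumptions for illustration and not settings reported by the Mediathek.

```python
import os
from typing import List


def ffv1_v3_output_args(threads: int = 0, slices: int = 16) -> List[str]:
    """FFmpeg output options for intra-only, multithreaded FFV1 version 3."""
    return [
        "-c:v", "ffv1",
        "-level", "3",                 # FFV1 version 3: slices, multithreading, slice CRCs
        "-threads", str(threads or os.cpu_count() or 1),  # threads=0 here: use all CPU cores
        "-slices", str(slices),        # each frame is split into independently coded slices
        "-slicecrc", "1",              # store a CRC per slice
        "-g", "1",                     # GOP size 1: every frame is a keyframe
    ]
```

These options would go between the input and the output file of an FFmpeg command. Whether a given machine really sustains full-HD realtime encoding still has to be tested on that hardware.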

This might be an interesting solution for capturing HDCAM.

Q: Video files are huge! How do you deal with several-gigabytes per file?

We don't. Where we can, we avoid creating files that large by splitting our captured material into segments of 1500 frames (= 1 minute of PAL at 25 fps) each.

In our everyday pracce, this has turned out to have more advantages than disadvantages:

- An application is needed to concatenate the segments. For automation, we use FFmpeg. For post-processing, this works great with VirtualDub, and any video-editing application is built by design to handle video segments anyway.
- When ordering a video from the archive, only the minimum amount of data must be transferred, regardless of the underlying architecture (disk, tape, HSM, etc.), because no file parsing and seeking is necessary to order with minute precision.
- Having a file header in every segment file makes those files more robust and less vulnerable to bit errors than a single large file.
- No problems with file allocation beyond 2 GB.
- Having one file checksum for every segment of the archive video copy enables archives to perform more granular integrity checks.
- Less network load when copying video sections for production.

The sorting order of the video segment files is stored in their filenames (zero-padded), so the files appear in the correct order when sorted alphabetically. We have been working this way for years now, without problems. The segment index number is also stored in each file's header, so it could be restored from there if needed.
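
A Python sketch of the naming scheme and of the concatenation step follows. Zero-padded indices keep alphabetical order identical to playback order, and the segment list is written in the text format read by FFmpeg's concat demuxer (`ffmpeg -f concat -i list.txt -c copy out.avi`). The basename, the 4-digit padding, numbering from 1 and the `.avi` extension are hypothetical choices. Names containing single quotes would need escaping, which this sketch does not do.

```python
from typing import List

FRAMES_PER_SEGMENT = 1500  # 1 minute of PAL at 25 frames per second
FPS = 25


def segment_names(basename: str, total_frames: int, digits: int = 4, ext: str = ".avi") -> List[str]:
    """Zero-padded segment filenames, so alphabetical order equals playback order."""
    count = -(-total_frames // FRAMES_PER_SEGMENT)  # ceiling division: last segment may be shorter
    return [f"{basename}_{i:0{digits}d}{ext}" for i in range(1, count + 1)]


def concat_list(names: List[str]) -> str:
    """Input list for FFmpeg's concat demuxer, one 'file' line per segment."""
    return "".join(f"file '{name}'\n" for name in sorted(names))


# Without padding, "tape_10" would sort before "tape_2"; with padding it cannot.
names = segment_names("tape", 90 * 60 * FPS + 1)  # 90 minutes plus one extra frame
print(len(names), names[0], names[-1])  # 91 tape_0001.avi tape_0091.avi
```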

Q: I've heard/seen that bit-errors are almost invisible in Uncompressed/J2K/etc. What about FFV1?

In compressed formats, subsequent bytes in the bitstream always depend on each other. Therefore, errors in a compressed bitstream will always cause some kind of cascade effect.
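
The cascade effect can be illustrated with a deliberately simple toy in Python: storing each pixel as the difference to its predecessor (delta coding), a basic form of prediction. This is not how FFV1 works in detail, since FFV1 uses more elaborate prediction and context-adaptive entropy coding. The toy shows the principle: one corrupted value in raw data damages one pixel, while the same corruption in predicted data propagates to everything decoded after it.

```python
from typing import List


def delta_encode(samples: List[int]) -> List[int]:
    """Store each sample as the difference to its predecessor (toy prediction step)."""
    return [s - p for s, p in zip(samples, [0] + samples[:-1])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild samples by accumulating the differences."""
    out, prev = [], 0
    for d in deltas:
        prev += d
        out.append(prev)
    return out


pixels = [100, 101, 103, 103, 104, 106, 107, 107]

# Flip one bit in the raw (uncompressed) data: only that pixel is wrong.
raw = list(pixels)
raw[2] ^= 0b1000
# Flip the same bit at the same position in the delta-coded data.
coded = delta_encode(pixels)
coded[2] ^= 0b1000
decoded = delta_decode(coded)

print(sum(a != b for a, b in zip(raw, pixels)))      # 1 damaged pixel
print(sum(a != b for a, b in zip(decoded, pixels)))  # 6 damaged pixels
```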

How robust a certain bitstream is against bit errors is often pointed out as very important when choosing a video codec. Such error resilience and concealment techniques are indeed very important for live-streaming video, for example in broadcast playout systems.

NOTE: With uncompressed video, there is usually no means of automatically detecting pixel errors. Some compressed codecs contain integrity information that allows their integrity to be verified.
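
FFV1 version 3, for example, can store a CRC for each slice. The Python sketch below shows the general principle with the standard library's CRC-32 (`zlib.crc32`). It is not necessarily the exact CRC variant or layout used inside FFV1. A checksum stored alongside the data makes even a single flipped bit detectable, which plain uncompressed pixel data cannot offer without an external checksum.

```python
import zlib
from typing import Tuple


def protect(slice_bytes: bytes) -> Tuple[bytes, int]:
    """Return the data together with its CRC-32 (similar in spirit to a per-slice CRC)."""
    return slice_bytes, zlib.crc32(slice_bytes)


def verify(slice_bytes: bytes, stored_crc: int) -> bool:
    """True if the data still matches the CRC that was stored with it."""
    return zlib.crc32(slice_bytes) == stored_crc


data, crc = protect(bytes(range(256)) * 4)
corrupted = bytearray(data)
corrupted[500] ^= 0x01  # flip a single bit
print(verify(data, crc))              # True
print(verify(bytes(corrupted), crc))  # False
```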

Bit errors mean losing information, no matter how small (or big) the visual impact of that error may be: they must not happen.

Q: Why not JPEG2000/MXF?

To answer this question, we would like to quote an e-mail we received from another national A/V archive:

"Though initially keen I've become a little cautious in regards to JP2000/MXF. I think the way MXF has been documented and standardised for use in specific areas of broadcasting and post-production etc is very positive but it appears at the moment that real world application seems to be rather random with some bespoke tailoring going on. JP2000, though its been around for years also there's a lack of robust codec support and the combination of JP2/MXF is underrepresented in terms of simple tools to generate, play and validate."

This pretty much sums up the current status of lossless JPEG2000 wrapped in MXF.

Apart from these issues, JPEG2000 is very slow compared to other codecs with a similar compression ratio.

For more details, please take a look at the JPEG2000 (lossless) section on our codec comparison page, as well as the test results for speed/size.

Links:

Jim Lindner's comments about JPEG2000/MXF (AMIA-L mailing list).

Q: What about MKV/MOV/MXF as container?

As mentioned above, AVI causes the fewest problems and is still the best-supported container for video. However, when storing content originating from a lossy codec source, preserving the original video bitstream might not always be possible with AVI.

In the last few years, Matroska (MKV) has been widely used as a replacement for AVI. This has already made it more interoperable, well documented and well supported, even compared to Apple's QuickTime container (MOV). MXF is the least supported, with the most compatibility and accessibility problems. It also supports only a very limited set of codecs at the moment (e.g. no FFV1 yet), so it is completely out of the question for us.

Unfortunately, vendors of proprietary video production and editing products have not yet considered Matroska support desirable. Transcoding tools and playback hardware, on the other hand, have widely adopted MKV support over the last few years, and their number is increasing.

For more details about these containers, please take a look at the video container section on our comparison page.

Q: How do you archive video DVDs?

Video DVDs are actually data DVDs with a defined, standardized folder/file layout. Because the bitstream on the disc is actually a filesystem, we can read (and archive) the whole filesystem as-is.

For DVDs, this usually boils down to 2 filesystem formats:

- Universal Disc Format (UDF)
- ISO 9660
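
Both filesystems can be recognized from the volume descriptor area of an image, which starts at sector 16 (byte offset 32768 with 2048-byte sectors). ISO 9660 descriptors carry the identifier "CD001". UDF is announced by a volume recognition sequence ("BEA01", "NSR02" or "NSR03", "TEA01"). The Python sketch below is a quick sanity check of an .iso file under these assumptions. It does not replace a recovery tool such as dvdisaster, and the function names are hypothetical. Video DVDs are often "UDF bridge" discs that carry both filesystems.

```python
from typing import List, Set

SECTOR = 2048
FIRST_DESCRIPTOR_SECTOR = 16
KNOWN_IDS = (b"CD001", b"BEA01", b"NSR02", b"NSR03", b"TEA01")


def volume_identifiers(path: str, max_sectors: int = 16) -> Set[str]:
    """Collect the 5-byte standard identifiers found in the volume descriptor area."""
    found = set()
    with open(path, "rb") as f:
        for n in range(FIRST_DESCRIPTOR_SECTOR, FIRST_DESCRIPTOR_SECTOR + max_sectors):
            f.seek(n * SECTOR)
            block = f.read(SECTOR)
            if len(block) < 6:
                break  # end of image
            ident = block[1:6]  # byte 0 is the descriptor type
            if ident in KNOWN_IDS:
                found.add(ident.decode("ascii"))
    return found


def filesystems(path: str) -> List[str]:
    """Name the filesystems whose signatures are present in the image."""
    ids = volume_identifiers(path)
    result = []
    if "CD001" in ids:
        result.append("ISO 9660")
    if ids & {"NSR02", "NSR03"}:
        result.append("UDF")
    return result
```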

For the actual extraction of data from optical media, we use "dvdisaster", a tool designed for recovering data stored on optical discs. We run dvdisaster on GNU/Linux, because the Windows version does not allow reading original, pressed discs.

In both cases, the preserved master format is then an "ISO image" file (.iso), which represents every written sector of an optical disc. Of course, we also checksum these images during the ingest process.
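
Because an ISO image is a sector-by-sector copy, its size should be an exact multiple of 2048 bytes. The Python sketch below checks that and computes the checksum in the same pass, reading 1 MiB at a time so that multi-gigabyte images need little memory. The function name and the choice of SHA-256 are assumptions. The text does not say which checksum algorithm is used.

```python
import hashlib
from typing import Tuple

SECTOR = 2048


def image_summary(path: str, algorithm: str = "sha256") -> Tuple[int, str]:
    """Read an ISO image in chunks; return (sector_count, hex_digest)."""
    h = hashlib.new(algorithm)
    sectors = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(SECTOR * 512)  # 1 MiB, a whole number of sectors
            if not block:
                break
            if len(block) % SECTOR:
                raise ValueError("image size is not a multiple of 2048-byte sectors")
            sectors += len(block) // SECTOR
            h.update(block)
    return sectors, h.hexdigest()
```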

This is also our approach for all optical discs that contain these filesystems, such as:

- Video DVDs
- Data DVDs
- Data CDs

Q: Performance of image sequences as format?

For digital film, it is very common to work with one-file-per-frame image sequences and separate audio files. The formats of choice are usually DPX or TIFF for the images and linear WAV for audio. No video container or streams are involved. This way of storing moving images has its pros and cons.

At the Mediathek, we have run performance tests comparing uncompressed image sequences (e.g. DPX) with uncompressed video in AVI. We transcoded both sources to the same output format. As a source, the image sequence was tremendously slower than the video file.

The reason for this performance difference is not the data size or […], because these are almost identical for both. For each file that is read, the operating system must read certain file properties, such as access rights, the file header, etc. Additionally, depending on the implementation of the tools being used, loading a new file might trigger additional actions within the application, adding even more waiting time.
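
The per-file overhead can be observed with a small Python experiment: write the same payload once as a single file and once as one file per frame, then time reading both. This sketch is not the Mediathek's test setup. The frame count, frame size and file names are arbitrary assumptions, and absolute timings depend heavily on the filesystem, the operating system cache and the storage. Because the freshly written files are probably still cached, the measurement mostly shows open/close and metadata overhead rather than disk speed.

```python
import os
import tempfile
import time
from pathlib import Path


def read_one_file(path: Path) -> int:
    with open(path, "rb") as f:
        return len(f.read())


def read_sequence(directory: Path) -> int:
    total = 0
    for frame in sorted(directory.iterdir()):  # one open/read/close per frame
        with open(frame, "rb") as f:
            total += len(f.read())
    return total


def compare(frames: int = 500, frame_size: int = 64 * 1024) -> dict:
    """Time reading identical data stored as one file vs. one file per frame."""
    payload = os.urandom(frame_size)
    with tempfile.TemporaryDirectory() as tmp:
        single = Path(tmp) / "video.bin"
        seq = Path(tmp) / "seq"
        seq.mkdir()
        with open(single, "wb") as f:
            for _ in range(frames):
                f.write(payload)
        for i in range(frames):
            (seq / f"frame_{i:06d}.raw").write_bytes(payload)

        t0 = time.perf_counter()
        a = read_one_file(single)
        t1 = time.perf_counter()
        b = read_sequence(seq)
        t2 = time.perf_counter()
    return {"bytes_single": a, "bytes_sequence": b,
            "seconds_single": t1 - t0, "seconds_sequence": t2 - t1}
```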

Österreichische Mediathek
