
A short guide to choosing a digital format for archiving masters | S... hps://www..be/?q=en/content/short‐guide‐choosing‐digital‐forma...


A short guide to choosing a digital format for video archiving masters

Author: Emanuel Lorrain (PACKED vzw) Publication date: March 2014

Hundreds of thousands of hours of audio-visual material are still being held by Flemish cultural heritage institutions and broadcast archives on carriers that are already obsolete or soon will be. From the end of 2013, the Flemish Institute for Archiving (VIAA)1 will operate as a service provider that organises the digitisation and storage of audio-visual content for owners and caretakers. The digital files produced will ultimately replace old tape-based formats and become the new archiving masters2.

File-based video formats have brought a number of new terms (wrapper, codec, compression, etc.) and facets to video preservation that collection caretakers have to learn. Confusion regarding these technologies can make heritage institutions reluctant to entrust their collections and devote their resources to large-scale digitisation projects. In such a context, choosing the destination format and specifications is always a very complex phase, due to the lack of a real consensus in the archival world as to which formats and specifications should be used for the long-term preservation of video. This decision is, however, a critical step that will have consequences for the future use and accessibility of the digitised content.

In preparation for the VIAA digitisation projects, PACKED vzw has conducted research into common practices in broadcast organisations and audio-visual archives in order to determine the best solution for the digitisation of audio-visual collections in Flanders' cultural heritage institutions. The following document gives an overview of the different elements that should be taken into account when choosing the destination format and related specifications, listing the various options available.

1. Video formats

1.1 Codecs and containers

Video files are composed of different streams encapsulated in a container or wrapper. Video and audio signals are encoded using codecs. A codec is a piece of hardware or software needed to encode a data stream or signal for transmission, storage or encryption, and to decode it for playback or other purposes such as editing. 'Codec' is a portmanteau constructed from the words coding/decoding. The term 'codec' is commonly used to refer directly to a coding or compression format. Video and audio essences (the streams) can be encoded with different codecs, with or without compression. Some examples of codecs for video are: H264, MPEG2, JPEG2000, IV41 and Sorenson. In order to create a video file readable by computer software, the encoded video and audio streams are wrapped together in a container, along with a number of other data streams such as descriptive metadata and subtitles. The number and type of data streams that a container can hold are specific to the container format used. Some examples of containers for video are: AVI, MOV, MP4, WMV and MXF.
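The split between the wrapper and the encoded streams can be sketched as a small data model. This is a hypothetical Python illustration, not a real library: the class names are invented, and the codec labels simply borrow FFmpeg's naming for readability.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    kind: str   # e.g. "video", "audio", "subtitle", "metadata"
    codec: str  # how this essence is encoded, e.g. "v210" or "pcm_s24le"


@dataclass
class ContainerFile:
    container: str                                    # the wrapper, e.g. "avi", "mov", "mxf"
    streams: list[Stream] = field(default_factory=list)

    def codecs(self) -> dict[str, list[str]]:
        """Group the codec names by stream kind."""
        grouped: dict[str, list[str]] = {}
        for stream in self.streams:
            grouped.setdefault(stream.kind, []).append(stream.codec)
        return grouped


# Hypothetical master: one uncompressed video stream and two PCM audio streams in MOV.
master = ContainerFile("mov", [Stream("video", "v210"),
                               Stream("audio", "pcm_s24le"),
                               Stream("audio", "pcm_s24le")])

# Rewrapping changes the container only; the encoded streams stay untouched.
rewrapped = ContainerFile("mxf", list(master.streams))
print(master.codecs() == rewrapped.codecs())
```

Changing the container (rewrapping) and changing the codec (transcoding) are therefore separate operations, and only the second touches the encoded essence.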

1.2 Uncompressed, lossless and lossy compression

As mentioned, audio and video can be encoded with or without compression. In an uncompressed video file, the entire signal of the digitised source is captured and encoded without any compression. Uncompressed video leads to very large files, which require substantial storage capacity when a great amount of content needs to be digitised. In order to generate smaller file sizes and bit rates, video compression is used to re-encode the original content more compactly. Compression codecs can be lossless or lossy. When using a lossless codec, a bit-identical copy of the data can be achieved (as in an uncompressed file). When using a lossy codec, not all of the data is retained. Video compression can be brought about using different methods and algorithms (e.g., the discrete cosine transform or DCT). Compression methods are usually divided into three main categories: lossy compression; visually lossless compression; mathematically lossless compression.

With lossy compression, some of the data is removed in order to reduce the size of the video file. Most of the time, this is done by reducing the amount of colour information. This implies that part of the image information, including details of its colour (the chroma subsampling and the colour bit depth) and sometimes also its luminance, is lost permanently. MPEG-2/D10, Apple ProRes, DVCPro and H264 are examples of codecs performing lossy compression. The majority of digital cameras capture video natively with lossy compression codecs, and lossy compressed formats are routinely used for production and for access (e.g., web, TV and DVD). Manufacturers sometimes label technically lossy compression schemes as 'visually lossless', because the difference between the compressed video and the original is supposed to be imperceptible to the (common) human eye. Despite its name, 'visually lossless' is a compression method in which part of the data is permanently discarded. For this reason, 'visually lossless' is sometimes more accurately called 'near-lossless compression'. In the remainder of this document, the term 'lossy' will also be used to refer to 'visually lossless' compression. 'Mathematically lossless' compression is also a method used to reduce the size of a file, but here the encoded data remain exactly the same once decoded. In real lossless compression, no information is lost: the file size is reduced by representing exactly the same information more concisely, for instance by exploiting statistical redundancy.
Lossless compression codecs can't achieve the same compression ratios as lossy (and 'visually lossless') codecs, but they result in smaller files than uncompressed video while retaining all of the information. In the remainder of this document, 'lossless' will be used to refer to 'mathematically lossless compression'.
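The practical difference between the two families can be shown with Python's standard library. In the sketch below, zlib (a general-purpose lossless coder, not a video codec) stands in for lossless compression, and dropping the lowest bits of each sample is a deliberately crude, hypothetical stand-in for the irreversible step of a lossy codec.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with DEFLATE (zlib); the result is bit-identical."""
    return zlib.decompress(zlib.compress(data, 9))


def lossy_roundtrip(samples: list[int], dropped_bits: int = 2) -> list[int]:
    """Hypothetical lossy stand-in: discard the lowest bits of each 10-bit sample.

    Shifting back restores the scale, but the discarded bits cannot be recovered.
    """
    return [(s >> dropped_bits) << dropped_bits for s in samples]


# Hypothetical 10-bit luma values (0..1023) along a smooth ramp.
ramp = list(range(64, 940, 7))
raw = b"".join(v.to_bytes(2, "little") for v in ramp)

print(lossless_roundtrip(raw) == raw)   # every bit comes back
print(lossy_roundtrip(ramp) == ramp)    # some values have changed for good
```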

1.3 Compression ratios

The compression ratio is the ratio between the uncompressed size of a file and its compressed size. Different compression algorithms and methods produce different compression ratios. The examples below show the differences in storage space required when uncompressed, lossless and lossy video codecs are used: uncompressed (e.g., v210) 10-bit -> approx. 100 GB per hour of video; lossless compression (FFV1 and JPEG 2000) 10-bit -> approx. 45-50 GB per hour of video; lossy compression: MPEG-2 (50 Mbps) -> approx. 25 GB per hour of video; DV (DV25) -> approx. 12 GB per hour of video; MPEG-2 (DVD quality) -> approx. 3.6 GB per hour of video.
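The uncompressed figure can be checked with simple arithmetic, and the same arithmetic converts a codec's bit rate into storage per hour. The sketch assumes PAL SD (720 x 576, 25 frames per second) and Apple's v210 layout (6 pixels in 16 bytes, each line padded to a multiple of 128 bytes), and it counts the video essence only, without audio or container overhead.

```python
import math


def v210_frame_bytes(width: int, height: int) -> int:
    """Bytes per frame for v210: 6 pixels (10-bit 4:2:2) in 16 bytes, lines padded to 128 bytes."""
    groups = math.ceil(width / 6)
    line_bytes = math.ceil(groups * 16 / 128) * 128
    return line_bytes * height


def gb_per_hour_from_frame(frame_bytes: int, fps: float) -> float:
    """Decimal gigabytes per hour for a constant frame size."""
    return frame_bytes * fps * 3600 / 1e9


def gb_per_hour_from_mbps(mbps: float) -> float:
    """Decimal gigabytes per hour for a constant bit rate in megabits per second."""
    return mbps * 1e6 * 3600 / 8 / 1e9


def compression_ratio(uncompressed: float, compressed: float) -> float:
    """Uncompressed size divided by compressed size."""
    return uncompressed / compressed


# Assumptions: PAL SD, 720 x 576 at 25 fps, video essence only.
uncompressed = gb_per_hour_from_frame(v210_frame_bytes(720, 576), 25)
mpeg2_50 = gb_per_hour_from_mbps(50)
dv25 = gb_per_hour_from_mbps(25)
print(f"v210: {uncompressed:.1f} GB/h, MPEG-2 50 Mbps: {mpeg2_50:.1f} GB/h, DV25: {dv25:.2f} GB/h")
print(f"ratio v210 : MPEG-2 50 Mbps = {compression_ratio(uncompressed, mpeg2_50):.1f} : 1")
```

Run as is, it gives about 99.5 GB per hour for v210, 22.5 GB for MPEG-2 at 50 Mbps and 11.25 GB for DV25. The approximate figures above are somewhat higher for the lossy formats, which is consistent with the audio and wrapper overhead that the sketch leaves out.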

2. Choosing a format for long-term preservation

2.1 Differences between broadcast and heritage archives

Broadcast and cultural heritage sectors often have different views on how audio-visual material should be preserved. While both are tasked with preserving audio-visual heritage and making it accessible, they deal with different types and quantities of audio-visual material. This leads to different requirements, views and approaches regarding what preservation means and how to do it. Within the context of VIAA, the material to be digitised comes from a wide range of very different institutions, with approximately seventy per cent originating from the broadcast sector (public, commercial and local televisions) and the rest from various heritage institutions (museums, cultural archives and heritage libraries). As broadcast archives store large quantities of material, they often require speed, efficiency and a format that fits their technical tool chain and workflow. Their use cases are clearly defined, i.e., the need to reuse material they have produced themselves for their own broadcast activities or to make it available to others. Typically, the message conveyed by the content takes precedence over the quality of the image. Cultural heritage organisations, on the other hand, consider themselves custodians rather than owners of the audio-visual heritage. In most cases, they did not produce the material they preserve, which makes them accountable to the donors and means they have a duty to preserve it in the best possible manner.
While access is also a crucial aspect for heritage institutions, their approach is determined by conservation principles that favour authenticity, integrity and long-term sustainability over short-term efficiency. In its definition of a museum, UNESCO says: “Today they are non-profit-making, permanent institutions in the service of society and its development, and open to the public, which acquire, conserve, research, communicate and exhibit, for purposes of study, education and enjoyment, material evidence of people and their environment. […] A museum’s primary purpose is to safeguard and preserve the heritage as a whole.” What is said here about museums applies to cultural archives and heritage libraries as well. While they are also interested in providing access to audio-visual material efficiently, they also have an institutional mandate to preserve it.

2.2 Criteria

The archiving community has defined sets of principles to evaluate the sustainability factors of file formats for long-term archiving. One example of these evaluation tools is the list created by the Library of Congress for its own collection3. The criteria considered by PACKED vzw during its research in the framework of VIAA are largely based on this list and on others, such as those developed by the InterPARES 2 project4 or the United Kingdom's National Archives5. They comprise: quality: the quality of the file should be high enough to anticipate future use and to avoid the risk of quality loss over time; openness: there should be no restriction on the use and reuse of the file, such as licences, that could threaten the adoption and support of the format; adoption: the format should be widely used by archives or in different domains and have sufficient support in existing tools; transparency: the file format should be easy to analyse; durability: the format shouldn't be expected to become obsolete or need transcoding too quickly, and backward compatibility should be ensured in the short term; functionality: the file should be able to deal with complex objects; handling: the file should be handled easily, efficiently and without (or with very limited) risk of error and threat to workflows.

While the criteria are clear, a format combining all the listed requirements, such as openness (important for cultural heritage institutions) and efficient handling (critical for the broadcast sector), is not yet apparent. As a result, the choice of archiving format is always a compromise, and different types of institutions do not necessarily prioritise the same criteria. Unlike in the realm of audio digitisation, for which LPCM and Broadcast Wave (BWAV) are widely considered the de facto standards for long-term preservation, no consensus has been found amongst archivists for video content. However, the digitisation of all remaining analogue material can't wait for the ideal video format to emerge while obsolete videotapes and playback equipment are slowly degrading on shelves. Different archives have already chosen different codecs, containers and specifications for their file-based archiving masters. Just as with tapes before, no formats, containers or codecs are expected to last forever. The archiving masters will most likely have to be migrated to another format at some point. From a cultural heritage point of view, it should be possible to migrate and transcode a file in the future without loss of information and quality.
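One way to make this compromise explicit is a weighted score per criterion. The sketch below uses invented scores for two unnamed formats and invented weights for two institutional profiles; it says nothing about any real format, only that the same scores can produce opposite rankings once priorities differ.

```python
# Hypothetical scores (0-5) for two unnamed candidate formats: NOT measured data.
CANDIDATES = {
    "format_a": {"quality": 5, "openness": 5, "adoption": 2, "transparency": 4,
                 "durability": 4, "functionality": 3, "handling": 2},
    "format_b": {"quality": 4, "openness": 2, "adoption": 4, "transparency": 3,
                 "durability": 3, "functionality": 4, "handling": 5},
}

# Hypothetical weights: heritage stresses openness and durability, broadcast stresses handling.
PROFILES = {
    "heritage": {"quality": 3, "openness": 3, "adoption": 1, "transparency": 2,
                 "durability": 3, "functionality": 1, "handling": 1},
    "broadcast": {"quality": 1, "openness": 1, "adoption": 2, "transparency": 1,
                  "durability": 1, "functionality": 2, "handling": 3},
}


def weighted_score(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted mean of the criterion scores."""
    return sum(weights[c] * scores[c] for c in weights) / sum(weights.values())


def rank(profile: str) -> list[str]:
    """Candidate names sorted from best to worst for one profile."""
    weights = PROFILES[profile]
    return sorted(CANDIDATES, key=lambda f: weighted_score(CANDIDATES[f], weights), reverse=True)


for profile in PROFILES:
    print(profile, rank(profile))
```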

3. Risks of lossy compression

3.1 Lossy compression threatens the quality of the content

Lossless compression can't achieve compression ratios equal to those of lossy compression. This is why lossy codecs have been chosen by some digitisation projects when storage costs were a main concern. However, while lossy compression results in smaller image files, the information loss can also result in severe digital artefacts, which are especially visible at high compression ratios and on certain types of material. Technically speaking, a compression artefact is a particular type of data error. These artefacts appear because too much data was discarded from the original. A compression scheme such as those used by MPEG formats cannot always distinguish between small variations and distortions that will be visible to the naked eye. This results in visual errors such as blurring, blockiness, shimmering and colour aberration.6
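The mechanism behind these artefacts can be illustrated with the discrete cosine transform used by DCT-based codecs such as MPEG-2, which work on 8 x 8 blocks. The sketch below is a simplification under stated assumptions: a single 8-sample row instead of a 2-D block, one uniform quantisation step instead of a quantisation matrix, and no entropy coding. The larger the step, the more detail is rounded away.

```python
import math

N = 8  # block length; MPEG-2 applies the DCT to 8 x 8 blocks


def _scale(k: int) -> float:
    """Normalisation factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(x: list[float]) -> list[float]:
    """Orthonormal 1-D DCT-II of N samples."""
    return [_scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def idct(c: list[float]) -> list[float]:
    """Inverse of dct() (orthonormal DCT-III)."""
    return [sum(_scale(k) * c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for k in range(N))
            for n in range(N)]


def quantise(coeffs: list[float], step: float) -> list[float]:
    """The lossy step: round every coefficient to a multiple of `step`."""
    return [round(c / step) * step for c in coeffs]


def max_error(block: list[float], step: float) -> float:
    """Largest per-sample difference after transform, quantisation and inverse transform."""
    rec = idct(quantise(dct(block), step))
    return max(abs(a - b) for a, b in zip(block, rec))


# Hypothetical row of an 8-bit image: fine alternating detail next to a bright edge.
row = [100, 104, 100, 104, 100, 104, 160, 160]
for step in (1, 16, 64):
    print(step, round(max_error(row, step), 2))
```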

3.2 Lossy compression can threaten future uses of the content

Moreover, when specific work such as colour correction or image restoration has to be carried out, the absence of the discarded data can be a serious problem. Even if 'visually lossless' compression algorithms are considered good enough for today's uses (i.e., TV and web broadcast), they might not preserve sufficient information for future applications that we can't yet anticipate. Display standards and technologies have kept evolving ever since the recording of moving images became possible. A compressed image that is acceptable today might look terrible on tomorrow's end users' devices and screens. Choosing a lossy compression codec for long-term preservation is therefore a risk: by decreasing image quality, one necessarily throws away some of the potential of the digitised material.

3.3 Lossy compression increases generation loss

Generation loss refers to the loss of quality between copies. This can happen when a tape is copied to another tape or when a file is transcoded to a different format. During the conversion from an analogue to a digital format, a certain amount of loss inevitably occurs, and even with uncompressed digitisation the digital file is not an exact copy of the original analogue source. Even digital tape formats such as Digital Betacam, which replaced previous analogue formats like Betacam SP as the standard in audio-visual archives for the long-term preservation of video, didn't make it technically possible to avoid generation loss during migration from one tape to another. However, best practice in audio-visual archives has always been to avoid migrating content to a poorer medium or format. At the time, a copy to Digital Betacam was the closest to a lossless copy that one could get. Today, lossless codecs make it possible to keep all of the information while using less storage capacity than uncompressed video. The merits of this established good practice with tape formats should be maintained in tapeless video archiving. Digital formats, just like tapes, are not expected to last forever. Digital files created today will also become obsolete and will need to be transcoded to another format at some point in the future. Choosing to digitise heritage material with lossy compression means deciding to delete some information from the original. Once lossy compression has been applied, there is no way back, as the deleted data is lost forever. By contrast, lossless encoding ensures that all of the information is still there for the next migration. With lossy compressed formats, the image quality will decrease further with every migration or transcoding procedure. If problems appear in the future, the only option is to digitise the tapes again.
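The difference between the two migration paths can be sketched in Python. Three standard-library lossless coders stand in for successive archival formats, and a SHA-256 checksum taken before the first migration serves as a fixity check. The repeated smoothing filter with rounding is a hypothetical stand-in for successive lossy transcodes, not a model of any real codec.

```python
import bz2
import hashlib
import lzma
import zlib

# Each (encode, decode) pair plays the role of one lossless archival format.
LOSSLESS_CODERS = [
    (zlib.compress, zlib.decompress),
    (bz2.compress, bz2.decompress),
    (lzma.compress, lzma.decompress),
]


def sha256(data: bytes) -> str:
    """Fixity value recorded before the first migration."""
    return hashlib.sha256(data).hexdigest()


def migrate_losslessly(payload: bytes, generations: int) -> bytes:
    """Pass the payload through successive lossless coders, as in repeated migrations."""
    for g in range(generations):
        encode, decode = LOSSLESS_CODERS[g % len(LOSSLESS_CODERS)]
        payload = decode(encode(payload))
    return payload


def migrate_lossily(samples: list[int], generations: int) -> list[int]:
    """Hypothetical stand-in for repeated lossy transcoding: each generation applies a
    3-tap smoothing filter and rounds the result, so small losses compound."""
    for _ in range(generations):
        padded = [samples[0]] + samples + [samples[-1]]   # repeat the edge samples
        samples = [round((padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4)
                   for i in range(1, len(padded) - 1)]
    return samples


master = bytes(range(256)) * 4          # hypothetical stand-in for a master file's bytes
checksum = sha256(master)
print(sha256(migrate_losslessly(master, 6)) == checksum)

signal = [100, 104, 100, 104, 100, 104, 160, 160]
for generations in (1, 3, 5):
    print(generations, migrate_lossily(signal, generations))
```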

3.4 Should different tape formats be digitised differently?

3.4.1 Quality requirements

When choosing a format to digitise video, the question of whether quality requirements should differ depending on the original source material often comes up. Using a compressed format or a lower bit depth for certain formats is often considered because of the inherently lower quality of analogue formats such as VHS, VCR, 1/2" EIAJ or even U-matic tapes compared to broadcast standards like Digital Betacam. Because these analogue formats have fewer lines of resolution, it may seem logical to use a lower bit depth or chroma resolution for their capture. In practice, this means that the digitisation settings would use, for instance, an 8-bit depth instead of a 10-bit depth, or 4:2:1 instead of 4:2:2 chroma subsampling. Converting analogue signals into digital values involves sampling them at discrete points and quantising each sample into one of a finite number of levels. In the case of images, resolution defines the unit of area, and bit depth defines the unit of luminance. In analogue video, the range between black and white is expressed in IRE7 and is fixed between 0 and 100 IRE for PAL. All properly recorded video will contain video levels between 0 and 100 IRE, with black at one end of the range and white at the other. The higher the bit depth used to digitise video, the finer the steps between digital sample values, leading to a continuous, smooth transition from black to white8.
For maximal quality retention of the original source, a 10-bit digital sample is required. This is true for any videotape format, even U-matic, Hi8 or VHS. In fact, retaining the maximum chrominance and luminance information from these formats might be even more important than for high-quality standards such as Betacam SP or 2" Quad tapes. The same goes for an already poor analogue source, for which any type of compression will only make the low quality of the image worse.
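To put numbers on the difference between 8 and 10 bits, the sketch below maps 0 to 100 IRE onto digital luma codes. It assumes the ITU-R BT.601 studio range (black at code 16 and white at 235 in 8 bits, 64 and 940 in 10 bits) and that 0 IRE corresponds to nominal black, as for PAL without setup.

```python
def luma_range(bits: int) -> tuple[int, int]:
    """Nominal black and white luma codes in the BT.601 studio range."""
    shift = bits - 8
    return 16 << shift, 235 << shift


def ire_to_code(ire: float, bits: int) -> int:
    """Nearest code value for a level between 0 IRE (black) and 100 IRE (white).

    Assumption: 0 IRE maps to nominal black (PAL, no setup/pedestal).
    """
    black, white = luma_range(bits)
    return round(black + ire / 100 * (white - black))


def levels_between_black_and_white(bits: int) -> int:
    """Number of distinct code values from black to white inclusive."""
    black, white = luma_range(bits)
    return white - black + 1


for bits in (8, 10):
    levels = levels_between_black_and_white(bits)
    print(f"{bits}-bit: {levels} levels, {100 / (levels - 1):.3f} IRE per step")
```

It reports 220 levels (about 0.457 IRE per step) for 8 bits and 877 levels (about 0.114 IRE per step) for 10 bits, i.e. gradations four times finer, whatever the quality of the tape being captured.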

3.4.2 Keeping the native encodings

Storing digitised video material in one single format makes a file-based archive easier to manage than storing it in different formats. For example, monitoring the obsolescence of formats and managing future transcoding procedures is more complicated if several formats are used. However, for some digital tape-based formats like DV or HDCAM, it is possible to keep the original encoding of the content without further transcoding. Other digital tape-based formats such as Digital Betacam don't allow one to keep the original encoding9 and should be digitised using the same codec as analogue tapes.

4. Best practices and available options

4.1 Uncompressed video formats

4.1.1 Uncompressed MXF files

The BBC is the only broadcast archive of such importance known to have chosen uncompressed video for the digital preservation master files of part of its collection. To digitise material that is still kept on physical carriers (mainly D3 and Digital Betacam tapes), it uses an archive system developed by its own R&D department. Files produced by this system consist of an 8-bit uncompressed YUYV or 10-bit uncompressed v210 essence wrapped in an MXF container.
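The v210 essence mentioned here packs 10-bit 4:2:2 samples tightly into 32-bit words. The sketch below packs and unpacks one group of six pixels following the layout in Apple's published description of v210; it is meant to show the structure, not to replace a tested codec implementation.

```python
import struct


def pack_v210_group(cb: list[int], y: list[int], cr: list[int]) -> bytes:
    """Pack 6 pixels of 10-bit 4:2:2 video (3 Cb, 6 Y, 3 Cr samples) into 16 bytes.

    Each 32-bit little-endian word holds three 10-bit values in bits 0-9, 10-19, 20-29:
      word 0: Cb0 Y0 Cr0 | word 1: Y1 Cb1 Y2 | word 2: Cr1 Y3 Cb2 | word 3: Y4 Cr2 Y5
    The top 2 bits of each word are unused.
    """
    order = [cb[0], y[0], cr[0], y[1], cb[1], y[2], cr[1], y[3], cb[2], y[4], cr[2], y[5]]
    if any(not 0 <= v <= 1023 for v in order):
        raise ValueError("v210 samples must be 10-bit values")
    words = [order[i] | (order[i + 1] << 10) | (order[i + 2] << 20) for i in range(0, 12, 3)]
    return struct.pack("<4I", *words)


def unpack_v210_group(data: bytes) -> tuple[list[int], list[int], list[int]]:
    """Inverse of pack_v210_group: return (cb, y, cr)."""
    words = struct.unpack("<4I", data)
    order = [(w >> shift) & 0x3FF for w in words for shift in (0, 10, 20)]
    cb = [order[0], order[4], order[8]]
    y = [order[1], order[3], order[5], order[7], order[9], order[11]]
    cr = [order[2], order[6], order[10]]
    return cb, y, cr


# Hypothetical sample values for six pixels.
cb, y, cr = [512, 500, 520], [64, 100, 200, 300, 400, 940], [510, 530, 540]
group = pack_v210_group(cb, y, cr)
print(len(group), unpack_v210_group(group) == (cb, y, cr))
```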

4.1.2 Uncompressed AVI and QuickTime files

Apart from the BBC, uncompressed video is used almost exclusively by small or middle-sized collections holding very valuable works. For example, institutions with art collections, such as LIMA10 in the Netherlands, the ZKM11 in Germany and others, started digitising their analogue videotape works several years ago using uncompressed video. AJA and Blackmagic are the most common hardware brands of capture cards used by these collections to encode video with the v210 codec in combination with an AVI or QuickTime (MOV) container. They chose this combination because they absolutely wanted to avoid (lossy) compression, and it was a good and affordable combination at that time. The AVI and MOV containers and the v210 codec are all very well supported by the majority of current media players and editing software (e.g., Final Cut Pro). While both AVI and MOV are proprietary formats, their specifications are made available by the manufacturers, and they are implemented in a wide variety of tools available under an open licence (e.g., FFmpeg).

4.2 Lossless video codecs

For institutions and archives that can't afford to store uncompressed video but want to keep the maximum quality from the original source, lossless compression is the only other solution. A number of different codecs encode video with real mathematically lossless compression, among them SheerVideo, YULS, JPEG2000 and FFV1.

When the proprietary codecs are removed from this list, only a few are left. Several of the remaining open source ones are still in a completely or partially experimental state and are maintained by only very small communities. This is of course a threat to their long-term availability, and their low support in software tools also makes them hard to use for non-technicians or institutions without in-house developers. This basically leaves heritage institutions that want to use a lossless codec with only two options: JPEG2000 and FFV1.

4.2.1 JPEG2000

JPEG2000 is an image codec and a suite of ISO/IEC standards developed by the Joint Photographic Experts Group. JPEG2000 can be used to compress images in either lossless or lossy mode and is also used to encode audio-visual content from video capture or scanning.12 JPEG2000 encodes video material frame by frame and does not use inter-frame coding techniques13. In its lossless mode, JPEG2000 has been chosen by a number of large institutions to archive their audio-visual material in combination with an MXF container. Different codec libraries can be used to encode and decode JPEG2000 files: OpenJPEG and Kakadu are two examples of JPEG2000 implementations used by open source and proprietary software programs. JPEG2000 supports several resolutions, sample bit depths and chroma subsampling schemes. However, unlike video codecs such as DV or FFV1, JPEG2000 relies on its container (e.g., MXF, QuickTime and Motion JPEG 2000) for some of its technical metadata (e.g., colour space).
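In lossless mode, JPEG2000 Part 1 uses the reversible 5/3 (LeGall) integer wavelet, implemented as lifting steps that can be undone exactly. The one-dimensional, single-level sketch below shows why no information is lost; a real encoder applies the transform in two dimensions over several levels, adds a reversible colour transform and entropy-codes the coefficients, none of which is shown here.

```python
def fwd_53(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the reversible 5/3 lifting transform on an even-length integer signal.

    Returns (low, high). Edges use whole-sample symmetric extension.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even length of at least 2")
    half = n // 2

    def right(i: int) -> int:          # x[i], mirrored past the end (x[n] -> x[n-2])
        return x[i] if i < n else x[2 * (n - 1) - i]

    # Predict step: detail = odd sample minus the floor-average of its even neighbours.
    high = [x[2 * k + 1] - (x[2 * k] + right(2 * k + 2)) // 2 for k in range(half)]
    # Update step: smooth = even sample plus a rounded quarter of the neighbouring details.
    low = [x[2 * k] + (high[max(k - 1, 0)] + high[k] + 2) // 4 for k in range(half)]
    return low, high


def inv_53(low: list[int], high: list[int]) -> list[int]:
    """Exact integer inverse of fwd_53: undo the update step, then the predict step."""
    half = len(low)
    even = [low[k] - (high[max(k - 1, 0)] + high[k] + 2) // 4 for k in range(half)]
    x: list[int] = []
    for k in range(half):
        nxt = even[k + 1] if k + 1 < half else even[k]   # same mirroring as fwd_53
        x += [even[k], high[k] + (even[k] + nxt) // 2]
    return x


row = [100, 104, 100, 104, 100, 104, 160, 160]
low, high = fwd_53(row)
print(low, high)
print(inv_53(low, high) == row)
```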

4.2.2 FFV1

'FF video codec 1', known as FFV1, is the most promising open source codec for long-term preservation. This codec, which only supports mathematically lossless compression, is included in the libavcodec library as part of the FFmpeg project14. Version 3 of the codec was developed with input from archivists in order to address the specific requirements of the heritage sector. It is being successfully used by the Austrian Mediathek15, the Vancouver City Archive16 and, more recently, the MUMOK17 in Vienna for their long-term preservation files. FFV1 has a compression ratio similar to that of JPEG2000 and decreases the amount of storage space needed to almost thirty per cent compared to uncompressed video. The Austrian Mediathek has already been using it successfully for three years with all current colour spaces such as YUV, YV12 and RGB, including different subsampling schemes (4:4:4, 4:2:2, etc.), with PAL SD material in 4:3 and 16:9 aspect ratios as well as with HD content at 1920 x 1080 resolution. Archivematica18 and DVA-Profession19, open source tools used by the aforementioned archives, both support this codec, and while the adoption of FFV1 remains modest in archives, manufacturers such as NOA Audio Solutions20 are starting to incorporate it in their products. FFV1 is increasingly discussed on forums (e.g., AMIA) and in expert groups (e.g., Presto4U), and presented in articles (e.g., AV Insider) and at conferences (e.g., IASA 2013) as an alternative to JPEG2000 for mathematically lossless encoding of video.
Some media art collections are currently running tests to evaluate whether FFV1 could offer a good alternative to uncompressed video files. When used in combination with an open source container, it has the benefit of creating fully open source files. If the adoption of FFV1 increases, it might become the preferred choice of institutions wanting to digitise video without any information loss.
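As an illustration of how FFV1 version 3 files are commonly produced with FFmpeg, the sketch below only builds command lines and does not run them. The option values (16 slices, per-slice CRCs, intra-only coding) and the Matroska (.mkv) output are assumptions chosen for illustration, not settings prescribed by the archives mentioned above. FFmpeg's framemd5 output can then be compared between the source and the FFV1 file to confirm that the decoded frames are identical, provided both decode to the same pixel format.

```python
import shlex


def ffv1_command(source: str, target: str, slices: int = 16) -> list[str]:
    """Build an FFmpeg command line for an FFV1 version 3 master (not executed here).

    -level 3     : FFV1 version 3
    -g 1         : every frame is a keyframe (intra-only)
    -slices N    : split each frame into N slices
    -slicecrc 1  : store a CRC per slice so that damage can be detected
    -c:a copy    : keep the audio streams as they are
    """
    return ["ffmpeg", "-i", source,
            "-c:v", "ffv1", "-level", "3", "-g", "1",
            "-slices", str(slices), "-slicecrc", "1",
            "-c:a", "copy", target]


def framemd5_command(path: str, report: str) -> list[str]:
    """Ask FFmpeg's framemd5 muxer to write one MD5 checksum per decoded frame."""
    return ["ffmpeg", "-i", path, "-f", "framemd5", report]


# Hypothetical file names.
print(shlex.join(ffv1_command("capture.mov", "master.mkv")))
print(shlex.join(framemd5_command("capture.mov", "capture.framemd5")))
print(shlex.join(framemd5_command("master.mkv", "master.framemd5")))
```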

4.3 Containers

4.3.1 AVI

AVI stands for Audio Video Interleave. It is a video container format introduced by Microsoft in November 1992 as part of its 'Video for Windows' framework. AVI is a simple container with a limited set of features. For instance, AVI does not provide a standardised way to encode information about the aspect ratio of the video essence, so when a file is played in VLC or the QuickTime player, the correct display aspect ratio is not selected automatically. AVI relies on the codec to express the display aspect ratio: some codecs, such as DV, FFV1 or MPEG-2, can do this, while uncompressed video and some other codecs cannot. AVI is used by the Austrian Mediathek to wrap FFV1 video essence and by several media art collections to store uncompressed video. It is a proprietary container format but, as said earlier, even if its legal situation is unclear, Microsoft makes its documentation freely available.
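AVI's simplicity is visible in its file structure: an AVI file is a RIFF file whose first 12 bytes are the ASCII code 'RIFF', a little-endian 32-bit size and the form type 'AVI '. The Python sketch below checks for that header. It is a quick identification test, not a validator, and the function names are hypothetical.

```python
import struct
from typing import Optional, Tuple


def read_riff_header(data: bytes) -> Optional[Tuple[bytes, int]]:
    """Return (form_type, declared_size) if `data` starts with a RIFF header.

    The 4-byte size field is little-endian and counts every byte after
    itself, i.e. the file length minus 8 for a single-RIFF file.
    """
    if len(data) < 12 or data[:4] != b"RIFF":
        return None
    (size,) = struct.unpack("<I", data[4:8])
    return data[8:12], size


def is_avi(data: bytes) -> bool:
    """True if the first RIFF form type is 'AVI ' (note the trailing space)."""
    header = read_riff_header(data)
    return header is not None and header[0] == b"AVI "


if __name__ == "__main__":
    # Synthetic 12-byte header instead of a real file.
    sample = b"RIFF" + struct.pack("<I", 4) + b"AVI "
    print(is_avi(sample), read_riff_header(sample))
```

To test a real file, one could pass the first 12 bytes of a file opened in binary mode, e.g. the result of `f.read(12)`.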

4.3.2 QuickTime

QuickTime is a proprietary multimedia container format developed by Apple. The format specifies a container file that contains one or more streams, each of which stores a particular type of data, e.g., audio, video or text (for subtitles, for instance). QuickTime files can use two different extensions: .mov or .qt. Like AVI, QuickTime is a proprietary container, but Apple makes its documentation available. Despite being proprietary, it is widely used and supported by the vast majority of tools on the market. MOV is used by several collections to store uncompressed video or to create access files with lossy codecs such as Apple ProRes or H.264.
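QuickTime stores its streams (tracks) and metadata in nested 'atoms', each starting with a size and a four-character type. A typical file has a 'moov' atom describing the movie and its tracks and an 'mdat' atom holding the media data, often preceded by an 'ftyp' atom. As a minimal sketch, assuming the whole file fits in memory, the following Python lists the top-level atoms; the function name is illustrative, and it does not descend into nested atoms.

```python
import struct
from typing import Iterator, Tuple


def iter_top_level_atoms(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (atom_type, offset, size) for each top-level QuickTime atom.

    Each atom begins with a 32-bit big-endian size and a four-character
    type. Size 1 means a 64-bit size follows the type; size 0 means the
    atom extends to the end of the data.
    """
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[offset:offset + 8])
        if size == 1:
            if offset + 16 > len(data):
                break  # truncated extended-size header
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
        elif size == 0:
            size = len(data) - offset
        if size < 8:
            break  # malformed size; stop instead of looping forever
        yield kind.decode("latin-1"), offset, size
        offset += size


if __name__ == "__main__":
    # Synthetic file: an 'ftyp' atom (brand 'qt  ') followed by an empty 'moov'.
    ftyp = struct.pack(">I4s4sI4s", 20, b"ftyp", b"qt  ", 0, b"qt  ")
    moov = struct.pack(">I4s", 8, b"moov")
    for atom in iter_top_level_atoms(ftyp + moov):
        print(atom)
```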

4.3.3 MXF

MXF21 is a container format used to wrap a number of different audio and video streams, subtitles and descriptive metadata. In theory MXF is codec agnostic and, as seen earlier, it can be used to wrap video in different encodings such as uncompressed, MPEG-2 or JPEG2000 in lossy and lossless modes.22 The specifications of its video profiles are, however, still closely tied to the hardware and software used to ingest and create the video files. In recent years archivists and digitisation labs have reported a number of interoperability problems with MXF/JPEG2000 files created with different encoders. Although MXF is a SMPTE standard, MXF video files remain manufacturer dependent, which has led to different flavours of MXF and to compatibility issues between playback applications. The high flexibility of MXF and the lack of a standard profile for preservation make it a complex container to handle, and its widespread use does not rule out interoperability problems. While it is technically an open standard, a number of aspects of MXF are only documented in specifications for which a fee has to be paid. This is one of the reasons why MXF is not as widespread as AVI or MOV in open source tools.
The Advanced Media Workflow Association (AMWA)23 is a group that brings together mainly broadcasters and manufacturers, but also some large American cultural heritage institutions such as the Library of Congress and the National Archives and Records Administration. It is working on these issues by specifying a number of MXF profiles for specific applications. The AS-07 profile now in development is one of them: it is intended to be manufacturer independent and specifically designed for long-term preservation requirements. Beyond better handling of lossless JPEG2000, the AS-07 profile should include, amongst others, multiple timecodes from different sources24 as well as captions and subtitles. However, this profile is still a work in progress, with no fixed time frame as to when its final version will be ready and implemented by manufacturers.
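MXF's flexibility comes from its generic KLV (Key-Length-Value) coding, standardised by SMPTE: every item begins with a 16-byte key (a SMPTE Universal Label starting with the bytes 06 0E 2B 34), followed by a BER-encoded length and the value itself. The Python sketch below walks consecutive KLV triplets, assuming the data begins directly with a key (MXF allows an optional run-in before the header partition). It does not interpret partitions, metadata or essence, so it says nothing about conformance to a profile such as AS-07; the names are illustrative.

```python
from typing import Iterator, Tuple

# First four bytes of every SMPTE Universal Label (UL) used as a KLV key.
SMPTE_UL_PREFIX = bytes([0x06, 0x0E, 0x2B, 0x34])


def read_ber_length(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a BER-encoded length at `pos`; return (length, next_pos)."""
    first = data[pos]
    if first < 0x80:  # short form: the byte itself is the length
        return first, pos + 1
    count = first & 0x7F  # long form: low 7 bits give the number of length bytes
    if count == 0 or pos + 1 + count > len(data):
        raise ValueError(f"unsupported or truncated BER length at {pos}")
    return int.from_bytes(data[pos + 1:pos + 1 + count], "big"), pos + 1 + count


def iter_klv(data: bytes) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (key, value_offset, value_length) for consecutive KLV triplets.

    Assumes the data starts directly on a KLV key (no MXF run-in).
    """
    pos = 0
    while pos + 17 <= len(data):  # 16-byte key + at least 1 length byte
        key = data[pos:pos + 16]
        if key[:4] != SMPTE_UL_PREFIX:
            raise ValueError(f"no SMPTE UL key at offset {pos}")
        length, value_pos = read_ber_length(data, pos + 16)
        if value_pos + length > len(data):
            raise ValueError(f"truncated value at offset {value_pos}")
        yield key, value_pos, length
        pos = value_pos + length


if __name__ == "__main__":
    # Synthetic triplets with a placeholder key (UL prefix + zero bytes).
    key = SMPTE_UL_PREFIX + bytes(12)
    data = key + bytes([0x83, 0, 0, 3]) + b"abc" + key + bytes([2]) + b"hi"
    for _, offset, length in iter_klv(data):
        print(offset, length, data[offset:offset + length])
```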

4.3.4 Matroska

The Matroska Multimedia Container is a free, open source format that can hold an unlimited number of video, audio, picture or subtitle tracks in one file. Matroska is similar to other containers but is entirely open in specification, with implementations consisting mostly of open source software. It is intended to serve as a universal format for storing any type of multimedia content, such as video. The Matroska website states that the container "can support all known audio and video compression formats by design. To make sure it will also be capable of coping with the future standards it is based on a very flexible underlying framework called EBML, allowing for more functionalities to be added to the container format without breaking backwards compatibility with older and files."25 Matroska uses the .mkv extension and is known to the wider public as a container used to wrap content extracted from DVDs with open source software such as HandBrake26. The Vancouver City Archives is using Matroska as a preservation format to store FFV1 essence together with audio essence and metadata. A free, open source command-line validator for Matroska is available27, and the container is supported by Archivematica.
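EBML, the framework Matroska is built on, encodes both element IDs and element sizes as variable-length integers whose length is signalled by the position of the first set bit in the first byte. A Matroska file opens with the EBML header element, ID 0x1A45DFA3. As a hedged sketch (the function names are hypothetical), the following Python decodes those integers and checks for the header. WebM uses the same header, and telling the two apart would require reading the DocType element inside it; for real validation, a tool such as mkvalidator remains the appropriate choice.

```python
from typing import Tuple

EBML_HEADER_ID = 0x1A45DFA3  # ID of the EBML header that opens a Matroska file


def vint_length(first_byte: int) -> int:
    """Byte length of an EBML variable-length integer, from its first byte.

    The first set bit (the length marker) gives the length:
    1xxxxxxx -> 1 byte, 01xxxxxx -> 2 bytes, ... 00000001 -> 8 bytes.
    """
    for length in range(1, 9):
        if first_byte & (0x80 >> (length - 1)):
            return length
    raise ValueError("invalid VINT: first byte is zero")


def _read_vint(data: bytes, pos: int) -> Tuple[int, int, int]:
    """Return (raw_value, length, next_pos) for the VINT at `pos`."""
    if pos >= len(data):
        raise ValueError("no data at position")
    n = vint_length(data[pos])
    if pos + n > len(data):
        raise ValueError("truncated VINT")
    return int.from_bytes(data[pos:pos + n], "big"), n, pos + n


def read_element_id(data: bytes, pos: int) -> Tuple[int, int]:
    """Element IDs are compared with their length marker kept."""
    raw, _, next_pos = _read_vint(data, pos)
    return raw, next_pos


def read_element_size(data: bytes, pos: int) -> Tuple[int, int]:
    """Element sizes drop the marker: keep only the low 7*n bits.
    An all-ones size (meaning 'unknown size') is not treated specially here."""
    raw, n, next_pos = _read_vint(data, pos)
    return raw & ((1 << (7 * n)) - 1), next_pos


def looks_like_ebml(data: bytes) -> bool:
    """Cheap sniff: does the data open with the EBML header element?"""
    try:
        element_id, _ = read_element_id(data, 0)
    except ValueError:
        return False
    return element_id == EBML_HEADER_ID


if __name__ == "__main__":
    header = bytes([0x1A, 0x45, 0xDF, 0xA3, 0x9F])  # EBML ID + 1-byte size
    element_id, pos = read_element_id(header, 0)
    size, _ = read_element_size(header, pos)
    print(hex(element_id), size, looks_like_ebml(header))
```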

4.4 Codecs and containers evaluation

Containers (wrappers)

AVI
+ Simple; widely used; widely supported by open source and other tools
- Proprietary, but well supported; can't wrap complex objects

QuickTime (.mov, .qt)
+ Widely used; widely supported by open source and other tools; can wrap complex objects
- Proprietary, but well supported

MXF
+ SMPTE non-proprietary standard; can wrap complex objects; widely adopted by the audio-visual community (especially by the broadcast sector)
- Interoperability problems between manufacturers and profiles; lacks a specific profile for preservation; not yet widely supported by software tools

Matroska
+ Open source; can wrap complex objects
- Not widely adopted by archives or the broadcast sector; lacks a specific profile for preservation

Codecs (encodings)

V210
+ No loss of quality; widely supported by software tools; used by several archives
- Proprietary, but well supported by open source and other tools; results in big files

JPEG2000
+ ISO non-proprietary standard; smaller files than uncompressed video; adopted by important AV archives; lossless compression possible28
- Complex; not widely supported in editing software; not ideal for YUV material

FFV1
+ Open source; lossless compression; smaller files than uncompressed video
- Only used by a small number of collections; not popular in the broadcast sector

D10 & AVC/H264
+ Results in considerably smaller files than lossless codecs; widely adopted by broadcast archives; widely supported by tool chains
- Lossy compression; not widely adopted by heritage collections

5. What is the best digital format for video preservation?

As seen throughout this text, there are only two options for making sure that the best quality is captured digitally from the original video source: an uncompressed or a lossless codec. As also seen, uncompressed video requires a lot of storage capacity and the capability to handle large amounts of data efficiently. For the same amount of storage needed for one uncompressed copy, lossless encoding with FFV1 or lossless JPEG2000 allows one to store around one and a half to two copies. Although lossless compression results in smaller files, it may require more processing power, since the files have to be compressed and decompressed. In a digitisation project, financial means and storage capacity are often considered the crux of decision-making. Lossy formats such as MPEG-2/D10 or lossy JPEG2000 reduce storage requirements even further, but represent a big risk for the (future) quality of archival material. FFV1 is the only 'real' open source codec that could be used by an audio-visual archive such as VIAA but, as we have mentioned, FFV1 is still in the process of gaining acceptance within the archival community and is unknown to a large part of it. Not many archives are keen on being pioneers in adopting long-term preservation formats. Once the combination of FFV1 and Matroska becomes more widely used and better supported by software tools and manufacturers, it might be the best option for collections that can't rely on specific proprietary hardware and software to use their files.
However, its limited adoption compared to JPEG2000 and MPEG-2/D10 makes it difficult to convince heritage institutions and broadcast archives to choose it as their long-term preservation format. VIAA will provide transcoding services to create derivatives for production and access for all the institutions involved. This is crucial for small institutions that cannot afford the software needed to edit material in MXF/JPEG2000 or to transcode it29. The Institut National de l'Audiovisuel, the Library of Congress, the Library and Archives of , and the National Archives of Australia are amongst the biggest audio-visual archives in the world digitising their material with a lossless JPEG2000 codec wrapped in MXF. Adoption by such large institutions constitutes a community with enough material and resources to suggest that there will always be a way to migrate and access these archives in the future. While there are still disparities in the MXF profiles they use, the work being done on the AS-07 specification to create a common MXF profile for audio-visual archives is encouraging, given the number of large institutions and manufacturers involved in it. However, this profile is still a work in progress that started back in 200930, and there is still no fixed time frame as to when it will be ready.
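The 'one and a half to two copies' figure corresponds to a lossless compression ratio of roughly 1.5:1 to 2:1, since each compressed copy takes 1/ratio of the uncompressed space. The Python sketch below applies this to a hypothetical collection; the collection size, the per-hour rate (a nominal figure for uncompressed 10-bit 4:2:2 PAL SD) and the function name are assumptions for illustration, and processing costs are not modelled.

```python
def collection_storage_tb(hours: float, gb_per_hour: float,
                          copies: int = 1, ratio: float = 1.0) -> float:
    """Storage in TB (1 TB = 1000 GB) for a collection of `hours` of video.

    `ratio` is the lossless compression ratio (uncompressed size divided by
    compressed size); 1.0 means uncompressed. `copies` is how many
    redundant copies are kept.
    """
    if hours < 0 or gb_per_hour < 0 or copies < 1 or ratio <= 0:
        raise ValueError("invalid parameters")
    return hours * gb_per_hour * copies / ratio / 1000


if __name__ == "__main__":
    HOURS = 10_000      # hypothetical collection size
    GB_PER_HOUR = 93.3  # approx. nominal rate for uncompressed 10-bit 4:2:2 PAL SD
    budget = collection_storage_tb(HOURS, GB_PER_HOUR)  # one uncompressed copy
    for ratio in (1.5, 2.0):
        lossless = collection_storage_tb(HOURS, GB_PER_HOUR, ratio=ratio)
        print(f"ratio {ratio}: {budget / lossless:.1f} lossless copies "
              f"fit in {budget:.0f} TB")
```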


6. Financial considerations

To prepare the digital archive infrastructure properly, costs for (amongst others) storage, transcoding hardware and network requirements need to be evaluated before the digitisation process has even started. Uncompressed and losslessly compressed formats generate bigger files than lossy compression and therefore require more storage space and processing power. Costs will accumulate over the years, but storage and processing capacity also become cheaper every year, in line with Moore's law. For a large digitisation project, costs will probably decrease before the digitisation is completed. From an archival and conservation point of view, storage costs should not be a decisive criterion, and certainly not one to favour over quality and long-term sustainability. In the future, the additional cost of storing losslessly compressed files is likely to become a lesser concern than the quality of the digitised content: use cases that we can't even imagine today might require a very high-quality video file. Lossy compression is a risky path that could lead to serious quality problems, and the safest decision is to avoid any type of information loss. If the digitisation has to be redone because the quality standards chosen were too low, the costs would be far greater than the supplementary investment needed to store lossless video files today. More importantly, there is a good chance that renewed digitisation would no longer be possible because of the advanced deterioration of the carriers, the obsolescence of the necessary playback equipment and the lack of skilled operators.
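The argument that costs fall during a long project can be made concrete with a simple model. The Python sketch below assumes a constant yearly percentage drop in the price of storage and compares buying all capacity up front with buying it year by year as material is digitised. The prices, volumes and 20% decline rate are hypothetical, real prices do not fall smoothly, and recurring costs such as power, replacement and migration are left out.

```python
from typing import Sequence


def price_per_tb(initial_price: float, annual_decline: float, years: int) -> float:
    """Price per TB after `years`, assuming a constant fractional yearly decline."""
    if not 0 <= annual_decline < 1:
        raise ValueError("annual_decline must be in [0, 1)")
    return initial_price * (1 - annual_decline) ** years


def phased_purchase_cost(tb_per_year: Sequence[float], initial_price: float,
                         annual_decline: float) -> float:
    """Total spend when storage is bought each year as it is needed
    (year 0 at today's price), instead of all up front."""
    return sum(tb * price_per_tb(initial_price, annual_decline, year)
               for year, tb in enumerate(tb_per_year))


if __name__ == "__main__":
    plan = [200, 200, 200, 200, 200]  # hypothetical TB ingested per year
    price = 50.0                      # hypothetical price per TB today
    decline = 0.2                     # hypothetical 20% yearly price drop
    upfront = sum(plan) * price
    phased = phased_purchase_cost(plan, price, decline)
    print(f"bought up front: {upfront:.0f}, bought as needed: {phased:.0f}")
```

With these made-up figures, buying storage as it is needed costs about a third less than buying it all at once; the point is the shape of the effect, not the numbers.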

Conclusion

The VIAA digitisation projects are a unique opportunity for Flemish heritage institutions to digitise the audio-visual material they still hold on obsolete formats. These master tapes should be digitised only once, and therefore with the best possible quality. As said earlier, once the necessary playback equipment is no longer available and the carriers have aged further, digitising the tapes again will be more difficult and more expensive, will achieve lower quality and, in some cases, will simply be impossible. An ideal video format combining all the criteria for long-term preservation may not yet exist; however, several initiatives encourage us to think that one will soon emerge. Uncertainty as to which formats will or will not become the future standard makes it difficult to commit to one codec and one container. Yet digitisation needs to take place now, and it is not possible to wait for the perfect format to appear. Choosing a format should therefore be a trade-off in which the best quality requirements and long-term sustainability are ensured.

Thanks to: Sue Bigelow (Vancouver City Archive), Carl Fleischhauer (Library of Congress), Hermann Lewetz (Austrian Mediathek) and Dave Rice (CUNY TV), for sharing their experience by email and for providing some of the information used in this text. Thanks also to Peter Bubestinger (Austrian Mediathek) for his constructive and technical feedback.

References:

A Guide to Understanding BBC Archive MXF Files by Glanville and T. Heritage: http://downloads.bbc.co.uk/rd/pubs/whp/whp--files/WHP241.pdf

Comparison of Lossless compression codecs: http://compression.ru/video/codec_comparison/lossless_codecs_2007_en.html

A Primer on Codecs for Moving Image and Sound Archives & 10 Recommendations for Codec Selection and Management by Chris Lacinak: http://www.avpreserve.com/wp-content/uploads/2010/04/AVPS_Codec_Primer.pdf

Lossless Video Compression for Archives: Motion JPEG2k and Other Options: http://web.archive.org/web/20080512040824/http://www.media-matters.net/d...

Wikipedia page on FFV1: http://en.wikipedia.org/wiki/FFV1

FFV1 codec specifications: http://www.ffmpeg.org/~michael/ffv1.html

FFV1 on the Library of Congress preservation website: http://www.digitalpreservation.gov/formats/fdd/fdd000341.shtml

A very useful discussion on preservation formats on the Archivematica Google Group: https://groups.google.com/forum/#!topic/archivematica/HulV96gJ0go

Refining Conversion Contract Specifications: Determining Suitable Digital Video Formats for Medium-term Storage by George Blood: http://www.digitizationguidelines.gov/audio-visual/documents/IntrmMastVi...

Notes


1. See: http://www.viaa.be

2. See: https://www.projectcest.be/wiki/Glossarium:Archiveringsbestand

3. See: http://www.digitalpreservation.gov/formats/sustain/sustain.shtml

4. See: http://www.interpares.org/display_file.cfm?doc=ip2_gs11_final_report_eng...

5. See: http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf

6. For further information about digital artefacts created by lossy compression see: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/sab/report.html

7. IRE is the acronym of the North American Institute of Radio Engineers. The term IRE is also used as a video level measurement unit.

8. For a more detailed and technical explanation of analogue video levels, see the presentation "What Should We Do Today: Toward an Interim-Master for the Preservation of Digital Audiovisual Materials" for the Association of Moving Image Archivists Annual Conference 2011, Austin, TX, presented by George Blood, George Blood Audio and Video; Courtney Egan, National Archives and Records Administration; and Jimi Jones, Library of Congress - Office of Strategic Initiatives.

9. This could be done if Sony made the technical specifications of Digital Betacam's bitstream format available.

10. LIMA, formerly known as the Netherlands Media Art Institute. For more information about their preservation work, read the following case study: http://scart.be/?q=en/content/case-study-report%C2%A0-digitisation-media...

11. See the interview with Christoph Blase (ZKM): http://scart.be/?q=en/content/interview-christoph-blase-zkm

12. A well-known example of the use of (lossy) JPEG2000 is the Digital Cinema Package (DCP), which is used worldwide for digital projection in cinemas.

13. Inter-frame compression is a video compression technique in which, for most frames, only data about the difference between that particular frame and some other frame is stored explicitly.

14. See the technical specification of the FFV1 codec: http://www.ffmpeg.org/~michael/ffv1.html

15. See: http://www.mediathek.at/

16. See: http://vancouver.ca/your-government/city-of-vancouver-archives.aspx

17. See: http://www.mumok.at/

18. See: https://www.projectcest.be/wiki/Software:Archivematica

19. See: http://www.dva-profession.mediathek.at/

20. See: http://www.theiabm.org/article.php?group_id=9168

21. MXF is the acronym for Material Exchange Format.

22. In combination with lossy JPEG2000, MXF is also used in the Digital Cinema Package (DCP), which is used worldwide for digital projection in cinemas. MXF is also very popular in the broadcast sector, especially because interchange with lossy D10 (IMX) is fairly effective.

23. See: http://www.amwa.tv/

24. See the EVS white paper 'The Integrity Of Technical Information In Tapeless Production And Long-Term Archives': http://www.evs.com/sites/default/files/download_area/resource/White%20Pa...

25. See: http://matroska.org/technical/guides/faq/index.html

26. This has also given Matroska a reputation as a 'pirate format'.

27. See mkvalidator tool: http://www.matroska.org/downloads/mkvalidator.html

28. Lossy compression is also possible, depending on the settings chosen.

29. Lossless JPEG2000 is not suited for video editing purposes. This means that a so-called mezzanine format is needed to use the video content.

30. See: http://www.digitizationguidelines.gov/guidelines/MXF_app_spec.html

