AtAutomatic ti pprocessiningg off CERN CHEP 2007 videovideo,id , audiodi andd phophotot archiveshi International Conference on Computing in Highg Energygy and Nuclear Physicsy MiMicha h ł KitkKwiatek, CERN IT DttDepartment,p email:il MiMich [email protected]@cern.ch h 2­7 September 2007,, Victoria BC Canada

Abstract The digitalization of CERN audio‐visual archivesarchives, a major task currently in progressprogress, will generateg over 40 TB of video,, audio and photop files. Storingg these files is one issueissue, but a far more important challenge is to provide long‐time coherence of the archive and to make these ffiles available on‐line with minimum manpowerp investment. An infrastructure, based on standard CERN services, has been implemented, whereby master filesfiles, stored in the CERN DistribDistributed ted File SstemSystem (DFS)(DFS), are disdiscovered o ered and scheduled for encoding into lightweight web formats based on predefined profiles. Changes in master filesfiles, conversion profiles or in the metadata database (read from CDS, the CERN Document Server) are automatically detected and the media re‐ encoded whenever necessary. The encoding processes are run on virtual servers providedpo dedono ‐demandde a d by theteCERNC ServerSe e SelfSe ServiceSe ce Centre,Ce t e, so thattatnewe serversse e s canca be easily configured to adapt to higher load. FinallyFinally, the generated files are made available from the CERN standard web servers with streamingg implementedp usingg Services. Architecture Overview

MdiMedia AhiArchiver softwaref Encoding of multimedia files into light‐weight formats is done using the Media Archiver software developed within CERN IT‐IS group. Media Archiver automatically discovers new or modified files and starts off‐line encoding. Changes to encoding profiles or metadata information are also automatically discovered. What is moremore, Media Archiver also enforces that files stored on DFS are described in CDS, which is important to make sure that the archive will be coherent in the long run. The Media Archiver software is based on Windows Media Encoder 9 seriesseries, which providesid APIAPIs to encoded audiodi andd videoid filfiles iinto WiWindows d MdiMedia AdiAudio andd VidVideo formats. Such files can then be streamed from standard CERN web servers. However, the architecture of Media Archiver is open and enables third‐party encoders to be used. In factfact, two open source products are already incorporated in the processp : FFmpegpg, for creatingg thumbnails of MPEG‐2 videos and Ogggg to encode audio files in the OGG format. The Media Archiver software creates added‐value by effectively integrating standard CERN servicesi : CERN DtDocument SServer supportedtdbby IT‐UDS group andd DFS andd webb servers supported by IT‐IS group. ConsequentlyConsequently, the synergies created allow to run ththe servicesi effectivelyff ti l andd ththe endd‐user getst a consistentit texperiencei .

Supportedpp formats

.WAVWAV .MP3MP3 MP3 Audio Files PCM Audio Files .OGGOGG OGG AdiAudio FilFiles .JPGJPG .WMA JPEG Photoh Filesl Files MediaArchiverMediaArchiver.exe exe .JPGJPG .MPGMPG JPEG Photo Files Archivingg workflow MPEG‐2 Video Files .WMVWMV CDSCDS, theh CERN DDocument SServer, iis theh centrall dbdatabase off CERN audio,di videoid andd photo archives. Information about media files is entered into CDS by members of WiWindows d MdiMedia VidVideo FilFiles .AVIAVI PhPhotolab t l b andd AdiAudio‐visualiltteams. MdiMedia filfiles are putt on DFS, CERN’CERN’s standardtdd .JPGJPG storage system. The Media Archiver software discovers new files immediately and iititinitiates ththe offff‐liline encodingdi process. DV AVI Video Files JPEG Video Thumbnails Media Archiver first queries CDS for metadata: formatformat, titletitle, description and date of recordingg. Encodingg profilesp are attributed based on the format. The metadata is References then included in the generated files. When the files are readyready, CDS is notified and 11. CERN Document Server: http://cdswebhttp://cdsweb.cern.ch/ cern ch/ links to the generatedg files are added to the CDS record. 2. CERN Windows On Demand service: ThThe endd‐user didiscovers audiodi ‐visualilrecordsd bby bibrowsing ththrough h CDSCDS, availableil bl https://wwwhttps://www.cern.ch/winservices/Services/WoD/Default.aspx cern ch/winservices/Services/WoD/Default aspx directly from CERN’sCERN s public welcome page. The files are linked from CDSCDS, but served 33. Windows Media Encoder 9 Series ffrom ththe MdiMedia AhiArchive webb server runningi ItInternet tIfInformation ti SServer andd h//ihttp://microsoft.com/windows/windowsmedia/forpros/encoder/default.mspxppp f /id/id di/f/ d/dfl . 44. Ogg Vorbis encoder: httphttp://www.rarewares.org/ogg //www rarewares org/ogg‐oggencoggenc.php php 55. FFmpeg encoder: http://ffmpeghttp://ffmpeg.mplayerhq.hu/ mplayerhq hu/