
Video Formats

A. List of Most Common Codecs

MPEG (Moving Picture Experts Group): three formats, MPEG 1, 2, and 4.
MPEG-1: Old, supported by everything (at least up to 352x240), reasonably efficient. A good format for the web.
MPEG-2: A version of MPEG-1, with better compression. 720x480. Used in HDTV, DVD, and SVCD.
MPEG-4: A family of codecs, some of which are open, others proprietary.
H.264: The most commonly used codec for video uploaded to the web. Part of the MPEG-4 family.
MPEG spinoffs: MP3 (for music) and VideoCD.
MJPEG (Motion JPEG): A codec consisting of a stream of JPEG images. Common in video from digital cameras, and a reasonable format for editing videos, but it doesn't compress well, so it's not good for web distribution.
DV (Digital Video): Usually used for video grabbed via FireWire off a video camera. Fixed at 720x480 @ 29.97 FPS, or 720x576 @ 25 FPS. Not very highly compressed.
WMV (Windows Media Video): A collection of Microsoft proprietary video codecs. Since version 7, it has used a special version of MPEG-4.
RM (Real Media): A closed codec developed by Real Networks for streaming video and audio.
DivX: In early versions, essentially an ASF (incomplete early MPEG-4) codec inside an AVI container; DivX 4 and later are a more complete MPEG-4 codec with no resolution limit. Requires more horsepower to play than MPEG-1, but less than MPEG-2. Hard to find Mac and Windows players.
Sorenson 3: Apple's proprietary codec, commonly used for distributing movie trailers (inside a QuickTime container).
QuickTime 6: Apple's implementation of an MPEG-4 codec.
RP9: A very efficient proprietary streaming codec from Real (not MPEG-4).
WMV9: A proprietary, non-MPEG-4 codec from Microsoft.
Theora: A relatively new open format from Xiph.org.
Dirac: A very new open format under development by the BBC.

B. List of Most Common Containers

AVI (Audio Video Interleave): a Windows container.
MPEG-4 Part 14 (known as .mp4): the standardized container for MPEG-4.
FLV (Flash Video): the format used to deliver MPEG video through Flash Player.
MOV: Apple's QuickTime container format.
OGG, OGM & OGV: open-standard containers.
MKV (Matroska): another open-specification container that you've seen if you've ever downloaded anime.
VOB (DVD Video Object): DVD's standard container.
ASF: a Microsoft format designed for WMV and WMA; files can end in .wmv or .asf.
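Because a container is separate from the codec inside it, and because extensions such as .wmv and .asf can name the same container, it can help to identify a container from the first bytes of the file instead of its name. The following is a minimal sketch only: the function name `guess_container` is hypothetical, the signatures are the commonly documented leading bytes for each container, and real files can deviate (for example, older QuickTime files may not start with an `ftyp` box, and the Matroska signature is also used by WebM).

```python
def guess_container(header: bytes) -> str:
    """Guess a video container from the first bytes of a file.

    Illustrative only: checks a handful of well-known leading
    signatures and returns "unknown" for anything else.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[4:8] == b"ftyp":
        # MP4 and newer MOV files share this layout; the brand "qt  " marks MOV.
        return "MOV" if header[8:12] == b"qt  " else "MP4"
    if header[:3] == b"FLV":
        return "FLV"
    if header[:4] == b"OggS":
        return "OGG"
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "MKV"  # EBML header, also used by WebM
    if header[:4] == b"\x00\x00\x01\xba":
        return "VOB"  # MPEG program stream pack header, as found in DVD VOB files
    if header[:4] == b"\x30\x26\xb2\x75":
        return "ASF"  # start of the ASF header object GUID
    return "unknown"
```

In practice the header would come from something like `open(path, "rb").read(16)`; a dedicated media library is a better choice when a reliable answer is needed.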

Image Formats

• BMP (Windows Bitmap)--Windows Paint: Microsoft Paint's native format.
• CompuServe GIF (Graphics Interchange Format): 8-bit images that can't handle more than 256 colors. Reducing an image to 256 colors can lose data, although GIF's own LZW compression is lossless. It is used for online imagery, especially on the Web. There is an option to save interlaced images, which is useful on the Web.
• DCS (Desktop Color Separation)--QuarkXPress: This format enables color separations of an image to be printed.
• EPS (Encapsulated PostScript): A way of saving object-oriented graphics that are intended to be printed to a PostScript printing device. Many different applications use different versions of EPS, including Adobe Illustrator, FreeHand, Canvas, and CorelDraw.
• HAM (Hold And Modify)--Amiga: A compressed version of IFF, but the images have to conform to one of two preset sizes.
• IFF (Interchange File Format)--Amiga: A general graphics format that serves a similar function to PICT on the Macintosh.
• JPEG (Joint Photographic Experts Group): The most effective compression technique, which can be used at different levels of compression. It subdivides the image and averages the values in each subdivision. It only saves relative differences within each of the subdivisions. This is a very effective technique. It looks the worst on images that contain very large, sharp differences. It is useful for photographs, where the changes in value are not abrupt.
• LZW (Lempel-Ziv-Welch): A compression technique that substitutes shorter strings of data for often-repeated code describing the image. There is no loss of quality.
• MacPaint: A file format for MacPaint, now considered pretty much obsolete. There is still a lot of clip art in MacPaint format.
• PCX (doesn't stand for anything)--PC Paintbrush: The extension assigned to images saved in PC Paintbrush's native format.
• Photo CD: These have their own file format supporting a YCC color model (supposed to be better than other models), but they also store compressed PICT versions of each image.
• Photoshop Native Formats: 2.0 and 3.0. Retain all of the data, including masking channels. There is some compression in the 3.0 version, but no loss of data at all.
• PICT (Macintosh Picture): Native to the Macintosh system; handles object-oriented and bit-mapped images equally well.
• PIXAR: A format for use in PIXAR workstations, for 3D rendering. Photoshop can open stills saved as PIXAR or save images as PIXAR so that they can be incorporated into 3D renderings. Supports RGB and Greyscale images.
• PixelPaint: There are three native formats, 1.0, 2.0, and 3.0.
• Premiere Filmstrip: A format for exporting an Adobe Premiere file to allow for frame-by-frame editing in Photoshop.
• RLE (Run-Length Encoding): A lossless compression scheme for BMP files. Saves some disk space without losing data.
• Scitex CT (Continuous Tone): A file format for use with Scitex systems for printing and scanning. Supports Greyscale and CMYK images.
• TGA (Targa)--TrueVision: A format that allows you to overlay graphics onto live video.
• TIFF (Tag Image File Format): Widely used across different platforms. Can't handle object-oriented files, and doesn't support JPEG compression. Can be saved to be IBM or Macintosh compatible, and uses LZW compression.
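The RLE entry above describes compression that saves space without losing data. The idea can be sketched as follows, assuming a simple list of (count, value) pairs. This is a generic illustration of run-length encoding, not the exact byte layout that BMP files use, and the function names are hypothetical.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse runs of identical bytes into (count, value) pairs."""
    runs: list[tuple[int, int]] = []
    for value in data:
        # Extend the current run if the byte repeats (runs are capped at 255).
        if runs and runs[-1][1] == value and runs[-1][0] < 255:
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            runs.append((1, value))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in runs)


# One row of pixels: ten of color 7, three of color 0, one of color 7.
row = bytes([7] * 10 + [0] * 3 + [7])
encoded = rle_encode(row)        # [(10, 7), (3, 0), (1, 7)]
assert rle_decode(encoded) == row  # lossless round trip
```

Images with large flat areas compress well this way; photographs, where neighboring pixels rarely repeat exactly, gain little, which is one reason JPEG is preferred for them.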

List of Audio File Formats

Open File Formats (supported by, and most likely to work with, our software)

• wav - the standard audio file format used mainly on Windows PCs. Commonly used for storing uncompressed (PCM), CD-quality sound files, which means that they can be large in size: around 10MB per minute of music. It is less well known that wave files can also be encoded with a variety of codecs to reduce the file size (for example the GSM or mp3 codecs).
• mp3 - the MPEG Layer-3 format is the most popular format for downloading and storing music. By eliminating portions of the audio file that are essentially inaudible, mp3 files are compressed to roughly one-tenth the size of an equivalent PCM file while maintaining good audio quality. We recommend the mp3 format for music storage. It is not that good for voice storage.
• ogg - a free, open container format supporting a variety of codecs, the most popular of which is the audio codec Vorbis. Vorbis files are often compared to mp3 files in terms of quality, but the simple fact that mp3 files are so much more broadly supported makes it difficult to recommend ogg files.
• gsm - designed for telephony use in Europe, gsm is a very practical format for telephone-quality voice. It makes a good compromise between file size and quality. We recommend this format for voice. Note that wav files can also be encoded with the gsm codec.
• dct - a variable codec format designed for dictation. It has a dictation header and can be encrypted (often required by medical confidentiality laws). The standard dct player is the Express Scribe Transcription Player.
• flac - a lossless compression codec. You can think of lossless compression as like zip but for audio. If you compress a PCM file to flac and then restore it again, it will be a perfect copy of the original. (All the other codecs discussed here are lossy, which means a small part of the quality is lost.) The cost of this losslessness is that the compression ratio is not good. But we recommend flac for archiving PCM files where quality is important (e.g. broadcast or music use).
• au - the standard audio file format used by Sun, Unix and Java. The audio in au files can be PCM or compressed with the ulaw, alaw or G729 codecs.
• aiff - the standard audio file format used by Apple. It is like a wav file for the Mac.
• vox - the vox format most commonly uses the Dialogic ADPCM (Adaptive Differential Pulse Code Modulation) codec. Similar to other ADPCM formats, it compresses to 4 bits. Vox files are similar to wave files except that they contain no information about the file itself, so the codec, sample rate and number of channels must first be specified in order to play a vox file. Vox is a very old file type and is pretty poor. We do not recommend it for anything except supporting legacy systems.
• raw - a raw file can contain audio in any codec but is usually used with PCM audio data. It is rarely used except for technical tests.

Proprietary Formats (supported by our software)

• wma - the popular Windows Media Audio format owned by Microsoft. Designed with Digital Rights Management (DRM) abilities for copy protection.
• aac - the Advanced Audio Coding format is based on the MPEG4 audio standard owned by Dolby. A copy-protected version of this format has been developed by Apple for use in music downloaded from their iTunes Music Store.
• atrac (.wav) - the older style ATRAC format. It always has a .wav file extension. To open these files simply install the ATRAC3 drivers.
• ra - a Real Audio format designed for streaming audio over the Internet. The .ra format allows files to be stored in a self-contained fashion on a computer, with all of the audio data contained inside the file itself.
• ram - a text file that contains a link to the Internet address where the Real Audio file is stored. The .ram file contains no audio data itself.
• dss - Digital Speech Standard files are an Olympus proprietary format. It is a fairly old and poor codec. Prefer gsm or mp3 where the recorder allows.
• msv - a Sony proprietary format for Memory Stick compressed voice files. You might need a Sony plugin to load this.
• dvf - a Sony proprietary format for compressed voice files; commonly used by Sony dictation recorders. You might need a Sony plugin to load this.

Formats not supported at this stage

• none

Other Formats

• atrac (.oma, .omg, .atp) - the newer style Sony proprietary format designed for minidisc use. It always has a .oma, .omg or .atp file extension. It is similar to mp3 and probably only useful if you are reading files from or writing for minidiscs. Note that most of these files are rights managed, so you cannot open them in any software programs.
• mid - the midi file is not an audio file format at all. It is just a list of musical notes which a synthesizer can play.
• ape - the file format from Monkey's Audio is claimed to give about 50% compression without loss in audio quality.
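The figure of around 10MB per minute quoted for uncompressed wav files can be checked with a short calculation. The sketch below assumes the usual CD-quality parameters (44,100 samples per second, 16 bits per sample, two channels), which the list above does not state explicitly; the function name is hypothetical.

```python
def pcm_bytes_per_minute(sample_rate_hz: int = 44_100,
                         bits_per_sample: int = 16,
                         channels: int = 2) -> int:
    """Size in bytes of one minute of uncompressed PCM audio."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


cd_quality = pcm_bytes_per_minute()
print(cd_quality)        # 10584000 bytes, i.e. roughly 10MB per minute
print(cd_quality // 10)  # 1058400 bytes: the "one-tenth" estimate for mp3
```

The same function shows why telephone-quality voice is so much smaller: `pcm_bytes_per_minute(8_000, 8, 1)` gives 480,000 bytes per minute before any codec is applied.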

File Formats

An Overview of File Formats

JSON

JSON is a simple file format that is very easy for any programming language to read. Its simplicity means that it is generally easier for computers to process than others, such as XML.

XML

XML is a widely used format for data exchange because it preserves the structure of the data and the way files are built, and lets developers write parts of the documentation in with the data without interfering with how the data is read.

RDF

A W3C-recommended format called RDF makes it possible to represent data in a form that makes it easier to combine data from multiple sources. RDF data can be stored in XML and JSON, among other serializations. RDF encourages the use of URLs as identifiers, which provides a convenient way to directly interconnect existing open data initiatives on the Web. RDF is still not widespread, but it has been a trend among Open Government initiatives, including the British and Spanish Government Linked Open Data projects. The inventor of the Web, Tim Berners-Lee, has recently proposed a five-star scheme that includes linked RDF data as a goal to be sought for open data initiatives.

Spreadsheets

Many authorities hold information in spreadsheets, for example Microsoft Excel. This data can often be used immediately, given correct descriptions of what the different columns mean. However, in some cases there can be macros and formulas in spreadsheets, which may be somewhat more cumbersome to handle. It is therefore advisable to document such calculations next to the spreadsheet, since that is generally more accessible for users to read.

Comma Separated Files

CSV files can be a very useful format because they are compact and thus suitable for transferring large sets of data with the same structure. However, the format is so spartan that data are often useless without documentation, since it can be almost impossible to guess the significance of the different columns. It is therefore particularly important for comma-separated formats that the documentation of the individual fields is accurate. Furthermore, it is essential that the structure of the file is respected, as a single omission of a field may disturb the reading of all remaining data in the file without any real opportunity to rectify it, because it cannot be determined how the remaining data should be interpreted.

Text Document

Classic documents in formats like Word, ODF, OOXML, or PDF may be sufficient to show certain kinds of data, for example relatively stable mailing lists or equivalent. Publishing in these formats can be cheap, as it is often the format the data is born in. The format gives no support for keeping the structure consistent, which often means that it is difficult to enter data by automated means. Be sure to use templates as the basis of documents that will display data for reuse, so it is at least possible to pull information out of documents. It can also support the further use of data to use typographic markup as much as possible, so that it becomes easier for a machine to distinguish headings (of any type specified) from the content and so on. Generally it is recommended not to publish in word processing format if the data exists in a different format.
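The point made under Comma Separated Files, that one omitted field can disturb the reading of a file, can be guarded against by checking every record against the documented number of columns. The following is a minimal sketch using Python's standard csv module; the function name and the sample columns (station, date, count) are invented for illustration.

```python
import csv
import io


def find_malformed_rows(text: str, expected_fields: int) -> list[tuple[int, list[str]]]:
    """Return (record number, row) for every record with the wrong field count."""
    bad = []
    for record_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != expected_fields:
            bad.append((record_no, row))
    return bad


# Hypothetical data: the third record is missing its "count" field.
sample = (
    "station,date,count\n"
    "A1,2010-01-01,120\n"
    "A2,2010-01-01\n"
    "A3,2010-01-02,98\n"
)
print(find_malformed_rows(sample, expected_fields=3))
```

A check like this only reports where the structure breaks; deciding which field is missing still depends on accurate documentation of the columns.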

Plain Text

Plain text documents (.txt) are very easy for computers to read. However, they generally exclude structural metadata from inside the document, meaning that developers will need to create a parser that can interpret each document as it appears. Some problems can be caused by switching plain text files between operating systems: MS Windows marks the end of a line differently from Mac OS X and other Unix variants.
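The end-of-line problem mentioned above comes down to a few byte sequences: Windows ends lines with carriage return plus line feed (`\r\n`), Unix variants including Mac OS X use a line feed alone (`\n`), and classic Mac OS used a carriage return alone (`\r`). A minimal sketch of normalizing text to the Unix convention, with a hypothetical function name, might look like this:

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (\\r\\n) and classic Mac (\\r) line endings to \\n."""
    # Replace the two-character sequence first so it is not split in two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "first line\r\nsecond line\rthird line\n"
print(normalize_newlines(mixed).split("\n"))
```

When reading files, Python's `open()` in text mode already translates these endings by default, so explicit normalization is mainly needed for text that arrives as a string or as raw bytes.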

Scanned image

Probably the least suitable form for most data, but both TIFF and JPEG-2000 can at least be marked up with documentation of what is in the picture, right up to marking up an image of a document with the full text content of the document. It may be relevant to display data as images when the data was not born electronically (an obvious example is old church records and other archival material), and a picture is better than nothing.

HTML

Nowadays much data is available in HTML format on various sites. This may well be sufficient if the data is very stable and limited in scope. In some cases, it could be preferable to have data in a form easier to download and manipulate, but as it is cheap and easy to refer to a page on a website, it might be a good starting point in the display of data.

Open File Formats

Even if information is provided in electronic, machine-readable format, and in detail, there may be issues relating to the format of the file itself.

The formats in which information is published (in other words, the digital base in which the information is stored) can either be "open" or "closed". An open format is one where the specifications for the software are available to anyone, free of charge, so that anyone can use these specifications in their own software without any limitations on reuse imposed by intellectual property rights.

If a file format is "closed", this may be either because the file format is proprietary and the specification is not publicly available, or because the file format is proprietary and even though the specification has been made public, reuse is limited. If information is released in a closed file format, this can cause significant obstacles to reusing the information encoded in it, forcing those who wish to use the information to buy the necessary software.

The benefit of open file formats is that they permit developers to produce multiple software packages and services using these formats. This then minimises the obstacles to reusing the information they contain.

Using proprietary file formats for which the specification is not publicly available can create dependence on third-party software or file format license holders. In worst-case scenarios, this can mean that information can only be read using certain software packages, which can be prohibitively expensive, or which may become obsolete.

The preference from the open government data perspective therefore is that information be released in open file formats which are machine-readable.

Example: UK traffic data

Andrew Nicolson is a software developer who was involved in an (ultimately successful) campaign against the construction of a new road, the Westbury Eastern bypass, in the UK. Andrew was interested in accessing and using the road traffic data that was being used to justify the proposals. He managed to obtain some of the relevant data via freedom of information requests, but the local government provided the data in a proprietary format which can only be read using software produced by a company called Saturn, who specialise in traffic modelling and forecasting.