Part I - Images
• COLOR IMAGES
  – Image Formats
• IMAGE STANDARDS
  – JPEG
  – JPEG 2000

Color image formats
• There are three basic methods, or graphic formats, for a computer to render (store and display) an image:
  – raster
  – vector
  – metafile

Raster image format
• A raster format breaks the image into a series of colored dots called pixels. The number of ones and zeros (bits) used to create each pixel, the pixel depth, determines how much color you can put into your images.
  – Raster image formats can save at 8, 16, 24, and 32 bits per pixel. At the two highest depths, the pixels themselves can carry up to 16,777,216 (2^24) different colors.
  – The main Internet formats (Bitmap, GIF, PNG, JPEG) are all raster formats.
Enlargement of a raster image: the quality is not improved.

Vector image format
• An image stored in a vector format is defined by lines, curves, circles etc., which are stored as mathematical formulas. Compared to raster images, only the formulas are stored. This makes the size of the file very small. The images do not lose focus when you zoom, since the lines are re-rendered.
• Vector formats fall into open standards and proprietary formats made for specific programs:
  – SVG (Scalable Vector Graphics), an open standard created and developed by the World Wide Web Consortium
  – AI (Adobe Illustrator)
  – CDR (CorelDRAW)
  – …
• A vector image gives very high quality, requires little storage space and is easy to edit. You should always try to save your vector images in a vector format. It is not possible to save photos, scanned images etc. in a vector format.
• Examples of vector images are drawings, diagrams and illustrations.
Enlargement of a vector image: the quality is still good.

Metafile format
• An image in metafile format is a combination of the two basic formats, vector and raster. Metafile formats are portable formats that can include both vector and raster information.
• Photos are stored in raster format. In some cases you might want to put descriptive numbers, text and arrows in these images. Both text and arrows should be saved as vector information, not as raster, to keep the good quality, but the photos still need to be in a raster format. The storage is done in a meta format.
• Examples of metafile formats:
  – WMF (Windows Metafile)
  – EMF and EMF+ (Windows Enhanced Metafile). EMF+ is a 32-bit format used by Windows after Windows XP. It stores a list of function calls that are issued to the Windows GDI to display an image on screen and for printer drivers. It is the native vector format for the Word, PowerPoint and Publisher MS applications.
Enlargement of a meta image: the quality is good for the vector information.

Bitmap image format
• The Bitmap image format was invented by Microsoft as a device-independent bitmap (DIB) format. It allows storing 2D digital images of any width, height and resolution, both monochrome and color. Typically images are stored in uncompressed form, but optionally they can also be compressed.
• Bitmap images can have a pixel depth of 1, 4, 8, 16, 24 or 32 bits. Bitmap images of 1, 4 and 8 bits have a table for color conversion; images with higher depths have the color directly encoded with the three RGB components.
• A bitmap in memory is loaded as a DIB structure (a parsing sketch follows the figure). It includes:
  – Header (file size, …)
  – Bitmap info (size, pixel depth, …)
  – Color table
  – Pixel map, stored upside-down and packed in rows (each row rounded up to a multiple of 4 bytes)
A 24-bit depth bitmap.
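To make the DIB layout above concrete, here is a minimal Python sketch that reads the two fixed headers of a .bmp file. The helper name `read_bmp_headers` and the returned dictionary are illustrative choices; the byte layout follows the standard 14-byte file header and 40-byte BITMAPINFOHEADER.

```python
import struct

def read_bmp_headers(path):
    """Read the fixed part of a BMP's DIB structure: the 14-byte file
    header and the 40-byte BITMAPINFOHEADER (both little-endian)."""
    with open(path, "rb") as f:
        # File header: magic 'BM', file size, two reserved fields, pixel-data offset
        magic, file_size, _r1, _r2, data_offset = struct.unpack("<2sIHHI", f.read(14))
        if magic != b"BM":
            raise ValueError("not a BMP file")
        # Bitmap info: dimensions, pixel depth, compression, palette size, ...
        (_hdr_size, width, height, _planes, bpp, compression,
         _img_size, _xppm, _yppm, colors_used, _important) = struct.unpack(
            "<IiiHHIIiiII", f.read(40))
    # Rows are packed and rounded up to a multiple of 4 bytes
    stride = ((width * bpp + 31) // 32) * 4
    return {
        "file_size": file_size,
        "data_offset": data_offset,   # where the pixel map starts
        "width": width,
        "height": height,             # positive height: rows stored bottom-up ("upside-down")
        "bits_per_pixel": bpp,        # 1/4/8 imply a color table; 16/24/32 encode RGB directly
        "compressed": compression != 0,
        "row_stride_bytes": stride,
        "colors_in_table": colors_used,
    }
```

For example, a 24-bit image 5 pixels wide has a stride of ((5 * 24 + 31) // 32) * 4 = 16 bytes: 15 bytes of pixel data plus one byte of padding, matching the multiple-of-4 rule above.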
Image compression
• Image compression aims at reducing the number of bits used to represent raster image content. Compression can be either lossless or lossy. Lossless compression schemes are reversible, so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.
• Lossless compression algorithms usually exploit statistical redundancy to represent the sender's data more concisely, but nevertheless perfectly. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
• Lossy compression instead assumes that some loss of fidelity is acceptable. For example, a person viewing a picture or television video scene might not notice if some of its finest details are removed or not represented perfectly.
• Which compression technique should be used depends on the application. In general:
  – Text documents: lossless compression
  – Data for numerical analysis: lossless compression
  – Programs: lossless compression
  – Typographic images: lossless compression
  – Web images: lossy compression
  – Video: lossy compression
  – Audio: lossy compression
• Lossy compression guarantees higher compression rates.

Lossless compression: run-length encoding
• Run-length encoding is a fixed-length coding scheme. With run-length encoding, a run of equal symbols is encoded as a special symbol, followed by the number of consecutive occurrences and by the symbol itself. A special symbol is required.
• For example, consider a text string containing three sequences of 11 (eleven) characters 'r' each, followed by three sequences of 11 'p' each and three sequences of 11 'c' each. With '$' as the special symbol, the whole string is encoded as (a runnable sketch of this scheme follows):
  $11r $11r $11r $11p $11p $11p $11c $11c $11c
  Total characters: 6 + 6 + 6 = 18 (each token contributes '$' and a letter); total numbers: 3 + 3 + 3 = 9 (one byte each); total size: 27 bytes = 216 bits.
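A minimal Python sketch of this escape-symbol scheme. The names `rle_encode`/`rle_decode`, the '$' escape, the decimal run count and the `min_run` threshold (below which literal repetition is cheaper than a token) are illustrative assumptions; the input is assumed to contain neither '$' nor digits.

```python
def rle_encode(text, escape="$", min_run=4):
    """Encode each maximal run of equal symbols as: escape + run length + symbol.
    Runs shorter than min_run are left as-is, since the token would be longer."""
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                     # extend the run of text[i]
        run = j - i
        out.append(f"{escape}{run}{text[i]}" if run >= min_run else text[i] * run)
        i = j
    return "".join(out)

def rle_decode(data, escape="$"):
    """Invert rle_encode: expand escape + digits + symbol back into a run."""
    out, i = [], 0
    while i < len(data):
        if data[i] == escape:
            i += 1
            start = i
            while i < len(data) and data[i].isdigit():
                i += 1                 # collect the run length
            out.append(data[i] * int(data[start:i]))
            i += 1
        else:
            out.append(data[i])
            i += 1
    return "".join(out)

# The slide's example: a run of eleven 'r's becomes the token "$11r"
assert rle_encode("r" * 11) == "$11r"
assert rle_decode("$11r$11p$11c") == "r" * 11 + "p" * 11 + "c" * 11
```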
Prefix-free coding
• Fixed-length codes are always uniquely decipherable. However, they do not always give the best compression, and variable-length codes are preferred.
• Prefix-free coding is a coding scheme where no codeword is a prefix of another one. Every message encoded by a prefix-free code is uniquely decipherable: since no codeword is a prefix of any other, we can always find the first codeword in a message, peel it off, and continue decoding.
• We are therefore interested in finding good (best-compression) prefix-free codes.

Huffman coding
• Huffman coding refers to the use of a variable-length code for encoding a source symbol, where the variable-length code has been derived on the basis of the estimated probability of occurrence of the source symbol.
• Huffman coding uses a specific method for choosing the representation of each symbol:
  – a prefix code that expresses the most common source symbols with shorter strings of bits than the less common source symbols;
  – the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol (a prefix-free code).
• It is the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size with the same symbol frequencies.
• The Huffman code for an alphabet (set of symbols) may be generated by constructing a binary tree with nodes containing the symbols to be encoded and their frequencies of occurrence. The tree may be constructed as follows (a priority-queue sketch of these steps appears at the end of the section):
  – Step 1. Select the two parentless nodes with the lowest frequencies.
  – Step 2. Create a new node which is the parent of the two lowest-frequency nodes.
  – Step 3. Assign the new node a frequency equal to the sum of its children's frequencies.
  – Step 4. Repeat from Step 1 until there is only one parentless node left.
• Huffman code example: consider a set of five different symbols with frequencies A = 24, B = 12, C = 10, D = 8, E = 8 (62 symbols in total). If not encoded, this results in a total of 186 bits (3 bits per codeword). Huffman encoding:
  – Step 1. Combine D and E into DE with a frequency of 16.
  – Step 2. Combine B and C into BC with a frequency of 22.
  – Step 3. Combine BC and DE into BCDE with a frequency of 38.
  – Step 4. Combine A with BCDE into ABCDE with a frequency of 62.
• The code for each symbol may be obtained by tracing a path to the symbol from the root of the tree. A 1 is assigned for a branch in one direction and a 0 is assigned for a branch in the other direction.
• The running time of Huffman's method is fairly efficient: it takes O(n log n) operations to construct the tree.
• Building the code tree:

  Symbol   Frequency   Code   Code length   Total length
  A        24          0      1             24
  B        12          100    3             36
  C        10          101    3             30
  D        8           110    3             24
  E        8           111    3             24
  ------------------------------------------------------
  Total: 186 bit (3-bit code) vs. 138 bit (Huffman encoding)

Decoding Huffman-encoded files
• In order to decode Huffman-encoded files, the decoding algorithm must know what code was used to encode the data: a table containing the symbols and their codes, or the Huffman tree itself, should be used.
• Decoding a file is a two-step process:
  – the header data is read in and the Huffman code for each symbol is reconstructed;
  – the encoded data is read and decoded.
• The fastest method for decoding symbols is to read the encoded file one bit at a time and traverse the Huffman tree according to each of the bits until a leaf containing a symbol is reached. When a bit causes a leaf of the tree to be reached, the symbol contained in that leaf is written to the decoded file, and traversal starts again from the root of the tree.
• Example (see the decoding sketch at the end of the section): with the tree above, the input sequence 0111100110111 decodes to A E B D E.
• Although Huffman's original algorithm is optimal for a stream of unrelated symbols with a known input probability distribution, it is not optimal when the probability mass functions are unknown, not identically distributed, …
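A minimal Python sketch of the four construction steps above, using a heapq-based priority queue. The 0/1 branch labeling is an arbitrary choice, so the individual codewords may differ from the slide's table, but the code lengths and the 138-bit total are the same.

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code from {symbol: frequency} by repeatedly merging
    the two parentless nodes with the lowest frequencies (Steps 1-4)."""
    # Heap entries: (frequency, tie-breaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, s) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # Step 1: two lowest-frequency nodes
        f2, _, t2 = heapq.heappop(heap)
        # Steps 2-3: create a parent node carrying the summed frequency
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1                      # Step 4: repeat until one node is left
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: label branches 0 and 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: the accumulated path is the codeword
            codes[tree] = prefix or "0"   # guard for a one-symbol alphabet
    walk(heap[0][2], "")
    return codes

freqs = {"A": 24, "B": 12, "C": 10, "D": 8, "E": 8}
codes = huffman_codes(freqs)
print(codes)  # {'A': '0', 'D': '100', 'E': '101', 'C': '110', 'B': '111'}
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # 138, vs. 62 * 3 = 186
```

The integer tie-breaker in each heap entry keeps heapq from ever comparing two tree tuples when frequencies are equal, which also makes the output deterministic.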
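And a matching sketch of the bit-by-bit decoding loop described above, assuming the code table has already been recovered from the file header; the tree is rebuilt from the table as nested dicts whose leaves hold symbols.

```python
def huffman_decode(bits, codes):
    """Decode a bit string by walking the Huffman tree one bit at a time,
    emitting a symbol and restarting at the root whenever a leaf is reached."""
    # Rebuild the tree from {symbol: codeword}: nested dicts, leaves hold symbols.
    root = {}
    for symbol, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = symbol
    out, node = [], root
    for bit in bits:
        node = node[bit]                # follow one branch per input bit
        if not isinstance(node, dict):  # leaf reached: emit and restart at the root
            out.append(node)
            node = root
    return "".join(out)

# The slide's example, with its code table A=0, B=100, C=101, D=110, E=111:
table = {"A": "0", "B": "100", "C": "101", "D": "110", "E": "111"}
print(huffman_decode("0111100110111", table))  # -> AEBDE
```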