Today's Topics: Main Memory Cells; Figure 1.7, The Organization of a Byte-Size Memory Cell; Main Memory Addresses
Today's topics
- Data representation
  - Memory organization
  - Compression
- Upcoming: systems, connections, algorithms
- Reading: Computer Science, Chapters 1-3

CompSci 001 3.1; Brookshear 1-2

Main Memory Cells
- Cell: a unit of main memory (typically 8 bits, which is one byte)
- Most significant bit: the bit at the left (high-order) end of the conceptual row of bits in a memory cell
- Least significant bit: the bit at the right (low-order) end of the conceptual row of bits in a memory cell

Figure 1.7: The organization of a byte-size memory cell

Main Memory Addresses
- Address: a "name" that uniquely identifies one cell in the computer's main memory
  - The names are actually numbers.
  - These numbers are assigned consecutively, starting at zero.
  - Numbering the cells in this manner associates an order with the memory cells.

Figure 1.8: Memory cells arranged by address

Memory Terminology
- Random Access Memory (RAM): memory in which individual cells can be easily accessed in any order
- Dynamic Memory (DRAM): RAM composed of volatile memory

Mass Storage
- On-line versus off-line
- Magnetic systems: disk, tape
- Optical systems: CD, DVD
- Flash drives

Mass Storage Systems
- Typically larger than main memory
- Typically less volatile than main memory
- Typically slower than main memory

Figure 1.9: A magnetic disk storage system
Figure 1.10: Magnetic tape storage
Figure 1.11: CD storage

Files
- File: a unit of data stored in a mass storage system
  - Fields and key fields
- Physical record versus logical record
- Buffer: a memory area used for the temporary storage of data (usually as a step in transferring the data)

Figure 1.12: Logical records versus physical records on a disk

Data Compression
- Compression is a high-profile application
  - .zip, .mp3, .jpg, .gif, .gz, ...
  - What property of MP3 was a significant factor in making Napster work? (Why did Napster ultimately fail?)
- Why do we care?
  - Secondary storage capacity doubles every year
  - Disk space fills up quickly on every computer system
  - More data to compress than ever before

More on Compression
- What's the difference between compression techniques?
  - .mp3 files and .zip files? .gif and .jpg?
  - Lossless and lossy
- Is it possible to compress (losslessly) every file? Why or why not?

Text Compression
- Lossy methods
  - Good for pictures, video, and audio (JPEG, MPEG, etc.)
- Lossless methods
  - Run-length encoding, Huffman, LZW, ...
- Input: string S; output: string S'
  - S' is shorter than S
  - S can be reconstructed from S'

Figure 1.13: The message "Hello." in ASCII

Text Compression: Examples
Encoding "abcde" in different formats (ASCII uses 8 bits/character; Unicode uses 16 bits/character):

  Symbol  ASCII     Fixed  Var.
  a       01100001  000    000
  b       01100010  001    11
  c       01100011  010    01
  d       01100100  011    001
  e       01100101  100    10

  ASCII:           0110000101100010...  (40 bits)
  Fixed-length:    000001010011100      (15 bits)
  Variable-length: 000110100110         (12 bits)

(The slide also draws the binary code trees for the fixed- and variable-length codes.)

Huffman Coding: "go go gophers"

  Char  ASCII          3-bit  Huffman
  g     103 = 1100111  000    00
  o     111 = 1101111  001    01
  p     112 = 1110000  010    1100
  h     104 = 1101000  011    1101
  e     101 = 1100101  100    1110
  r     114 = 1110010  101    1111
  s     115 = 1110011  110    100
  sp     32 = 0100000  111    101

Huffman Coding
- D.A. Huffman, early 1950s
- Before compressing data, analyze the input stream
- Represent data using variable-length codes, specifically prefix codes
  - Each letter is assigned a codeword
  - The codeword for a given letter is produced by traversing the Huffman tree
  - Encoding uses the tree: 0 = go left, 1 = go right
  - Property: no codeword produced is the prefix of another
  - Letters appearing frequently have short codewords, while those that appear rarely have longer ones
- How many bits for "go go gophers"? 37. Savings? Worth it?
- Huffman coding is an optimal per-character coding method

(The slide draws the Huffman tree for "go go gophers", built from the character counts.)

Building a Huffman Tree
"A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS"
- Begin with a forest of single-node trees (leaves)
  - Each node/tree/leaf is weighted with its character count
  - A node stores two values: character and count
  - There are n nodes in the forest, where n is the size of the alphabet
- Repeat until there is only one node left, the root of the tree:
  - Remove the two minimally weighted trees from the forest
  - Create a new tree with the minimal trees as children; the new root's weight is the sum of its children's weights (the character is ignored)
- Does this process terminate? How do we get the minimal trees?

Initial character counts for the example string:

  sp 11, I 6, E 5, N 5, S 4, M 4, A 3, B 3, O 3, T 3,
  G 2, D 2, L 2, R 2, U 2, P 1, F 1, C 1

Properties of Huffman Coding
- Want to minimize the weighted path length L(T) of the tree T:

  L(T) = Σ_{i ∈ Leaf(T)} d_i w_i

  where w_i is the weight (count) of codeword i and d_i is the depth of the leaf corresponding to codeword i.
- How do we calculate character (codeword) frequencies?
- Huffman coding creates pretty full, bushy trees; when would it produce a "bad" tree?
- How do we produce coded (compressed) data from the input efficiently?
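The tree-building loop described above can be sketched in Python with a priority queue. This is a minimal sketch, not the slides' implementation; tie-breaking between equal weights may produce a different tree shape than the one drawn on the slide, but the total encoded length comes out the same.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman tree by repeatedly merging the two minimally
    weighted trees, then return a {char: codeword} table."""
    counts = Counter(text)
    # Forest of single-node trees: (weight, tiebreak, tree).
    # A tree is either a char (leaf) or a (left, right) pair.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # two lightest trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, i, (t1, t2)))  # weight = sum
        i += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: 0 left, 1 right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # single-symbol edge case
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("go go gophers")
print(sum(len(codes[ch]) for ch in "go go gophers"))  # 37, as on the slide
```

The (weight, tiebreak, tree) triples keep `heapq` from ever comparing two trees directly, which answers the slide's "how do we get the minimal trees?" question: a heap gives the two lightest trees in O(log n) per removal.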
(The slide draws the finished Huffman tree for the example string; its root weight is 60, the total character count.)

Decoding a Message
- Using the Huffman tree for "A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS", decode: 01100000100001001101
- E.g., "A SIMPLE" encodes as 10101101001000101001110011100000

(The same Huffman tree is shown again for the decoding example.)

Other Methods
- Adaptive Huffman coding
- Lempel-Ziv algorithms
  - Build the coding table on the fly while reading the document
  - The coding table changes dynamically
  - Protocol between encoder and decoder so that everyone is always using the right coding scheme
  - Works well in practice (compress, gzip, etc.)
- More complicated methods
  - Burrows-Wheeler (bzip2)
  - PPM statistical methods

Data Compression
- Why is data compression important?
- How well can you compress files losslessly? Is there a limit? How do you compare schemes?
- How do you measure how much information a file contains?

  Year  Scheme                                 Bits/Char
  1967  ASCII                                  7.00
  1950  Huffman                                4.70
  1977  Lempel-Ziv (LZ77)                      3.94
  1984  Lempel-Ziv-Welch (LZW), Unix compress  3.32
  1987  LZH, used by zip and unzip             3.30
  1987  Move-to-front                          3.24
  1987  gzip                                   2.71
  1995  Burrows-Wheeler                        2.29
  1997  BOA (statistical data compression)     1.99
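The decoding process illustrated above, reading bits until they match a complete codeword, can be sketched as a small Python routine. As a minimal sketch it uses the variable-length "abcde" code table from the earlier slide rather than the full Huffman tree:

```python
def decode(bits, codes):
    """Scan the bit string, emitting a character whenever the
    accumulated prefix matches a codeword. Prefix-freeness
    guarantees at most one codeword can match."""
    inverse = {code: ch for ch, code in codes.items()}
    out, prefix = [], ""
    for bit in bits:
        prefix += bit
        if prefix in inverse:
            out.append(inverse[prefix])
            prefix = ""
    if prefix:
        raise ValueError("leftover bits: not a valid encoding")
    return "".join(out)

# Variable-length code for "abcde" from the earlier slide
var = {"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"}
encoded = "".join(var[ch] for ch in "abcde")
print(encoded)               # 000110100110 (12 bits)
print(decode(encoded, var))  # abcde
```

An equivalent tree-based decoder walks left on 0 and right on 1 from the root, emitting a character and restarting at the root each time it reaches a leaf; both work only because the code is prefix-free.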