Chapter 1: Data Storage

 1.1 Bits and Their Storage  1.2 Main Memory  1.3 Mass Storage  1.4 Representing Information as Bit Patterns  1.5 The Binary System  1.6 Storing Integers  1.7 Storing Fractions  1.8 Data Compression  1.9 Communications Errors

Computer == 0 and 1?

 Hmm… every movie says so…

Why can't I see any 0s and 1s when opening up a computer?

What's really in the computer? (With the help of an oscilloscope)

Bits

[Oscilloscope trace: HIGH and LOW voltage levels]

 Bit = Binary Digit
 The basic unit of information in a computer
 A symbol whose meaning depends on the application at hand:
 Numeric value (1 or 0)
 Boolean value (true or false)
 Voltage (high or low)

 For TTL, high=5V, low=0V

More bits?

 Unit             Decimal   Binary
 kilobit  (kbit)  10³       2¹⁰
 megabit  (Mbit)  10⁶       2²⁰
 gigabit  (Gbit)  10⁹       2³⁰
 terabit  (Tbit)  10¹²      2⁴⁰
 petabit  (Pbit)  10¹⁵      2⁵⁰
 exabit   (Ebit)  10¹⁸      2⁶⁰
 zettabit (Zbit)  10²¹      2⁷⁰

Bit Patterns

 All data stored in a computer are represented by patterns of bits:
 Numbers
 Text characters
 Images
 Sound
 Anything else…

Now you know what a 'bit' is; what's next?

 How to 'compute' with bits?
 How to 'implement' the computation?
 How to 'store' bits in a computer?

How to do computation on bits?

Boolean Operations

 Boolean operation
 Any operation that manipulates one or more true/false values (e.g., 1 = true, 0 = false)
 Can be used to operate on bits
 Specific operations
 AND
 OR
 XOR
 NOT
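These four operations on single bits, sketched with Python's bitwise operators:

```python
# Boolean operations on single bits (1 = true, 0 = false):
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->",
              "AND:", a & b,   # 1 only if both inputs are 1
              "OR:",  a | b,   # 1 if at least one input is 1
              "XOR:", a ^ b)   # 1 exactly when the inputs differ

# NOT takes a single input and inverts it:
for a in (0, 1):
    print("NOT", a, "->", a ^ 1)
```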

AND, OR, XOR

Implementing the Boolean operations on a computer

Gates

 Gates
 Devices that produce the outputs of Boolean operations when given the operations' input values
 Often implemented as electronic circuits
 Provide the building blocks from which computers are constructed

AND, OR, XOR, NOT Gates

NAND, NOR, XNOR

 NAND truth table (INPUT: A, B; OUTPUT: A NAND B):

   A  B  |  A NAND B
   0  0  |     1
   0  1  |     1
   1  0  |     1
   1  1  |     0

 NOR and XNOR are the negations of OR and XOR, respectively.

Implement a gate

 You can implement it with transistors
 But today gates are normally packaged in Integrated Circuit (IC) form

How many gates in this picture?

 NOT = ?   NAND = ?   NOR = ?

Classification of ICs

 Complexity                             Number of gates
 Small-Scale Integration (SSI)          < 12
 Medium-Scale Integration (MSI)         12 - 99
 Large-Scale Integration (LSI)          100 - 9,999
 Very Large-Scale Integration (VLSI)    10,000 - 99,999
 Ultra Large-Scale Integration (ULSI)   > 100,000

How to 'store' the bit?

Store the bit

 What do we mean by 'storing' the data?

Bits in the computer at the i-th second = bits in the computer at the (i+1)-th second

 For example
 If the current state of the bit is '1', it should still be '1' at the next second

Using a Flip-Flop to Store the Bit

Flip-flops?

NOT THIS!! [photo of flip-flop sandals]

Flip-Flops

 Flip-flop
 A circuit built from gates that can store one bit of data
 Two input lines
 One input line sets its stored value to 1
 Another input line sets its stored value to 0
 When both input lines are 0, the most recently stored value is preserved
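A behavioral sketch of this flip-flop in Python (a state machine mirroring the three rules above, not a gate-level model; the class name is ours):

```python
class SRLatch:
    """One stored bit: one input sets it, the other resets it,
    and when both inputs are 0 the previous value is preserved."""

    def __init__(self):
        self.stored = 0

    def step(self, set_line, reset_line):
        if set_line and reset_line:
            raise ValueError("both inputs 1 is not a legal combination")
        if set_line:
            self.stored = 1
        elif reset_line:
            self.stored = 0
        # both inputs 0: keep the most recently stored value
        return self.stored

ff = SRLatch()
print(ff.step(1, 0))  # set   -> 1
print(ff.step(0, 0))  # hold  -> 1
print(ff.step(0, 1))  # reset -> 0
```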

Flip-Flops Illustrated

[Block diagram: inputs X and Y, output Z]

 X  Y  |  Z
 0  0  |  no change
 0  1  |  0
 1  0  |  1
 1  1  |  * (not used)

A Simple Flip-Flop Circuit

[Circuit diagram: a flip-flop built from gates, with inputs X and Y]

Setting to 1 – Step a.

Setting to 1 – Step b.

Setting to 1 – Step c.

Try This

 Show how the output can be turned to 0

Another Flip-Flop Circuit

[Circuit diagram with inputs X and Y]

Try this: when X = 0 and Y = 1, what would be the output?

Flip-Flops

 A kind of storage
 Holds its value until the power is turned off

Types of Memory

 Volatile memory
 A.k.a. dynamic memory
 Holds its value until the power is turned off
 Non-volatile memory
 Holds its value even after the power is off

Capacitors

 Two metallic plates positioned in parallel, with a tiny space between them
 Principle
 Apply a voltage: the plates become charged
 Remove the voltage: the charge remains on the plates
 Connect the two plates: the charge drains off
 Many such capacitors, with their circuits, sit on a wafer (chip)
 Even more volatile than flip-flops: the charge must be replenished periodically

Flash Memory

 Originally made from EEPROM (Electrically Erasable Programmable Read-Only Memory)
 Works by trapping electrons in silicon dioxide (SiO2) chambers, one chamber per cell
 Each chamber is very small: about 90 angstroms (1 angstrom = 1/10,000,000,000 of a meter)
 Non-volatile

History of Flash Memory

 PROM → EPROM → EEPROM → Flash memory

Comparison

 EPROM (Erasable Programmable Read-Only Memory)
 Uses UV light to erase the cells/bits
 Erases the whole chip at a time
 EEPROM (Electrically Erasable Programmable Read-Only Memory)
 Uses an electric field
 Erases one bit at a time
 Flash memory
 Erases one chunk/block of bits at a time

Types of Memory

 Volatile memory
 A.k.a. dynamic memory
 Holds its value until the power is turned off
 Examples: flip-flops, capacitors
 Non-volatile memory
 Holds its value even after the power is off
 Examples: flash memory, magnetic storage, compact disks, etc.

Hexadecimal Notation

 Why? Data comes as bit streams (a stream = a long string of bits), and bit streams get long
 E.g., 1000 = 1111101000₂
 Hexadecimal (base-16) notation
 A shorthand notation for streams of bits
 More compact
 E.g., 1000 = 3E8₁₆

The Hexadecimal System

1000 = 1111101000₂ = 3E8₁₆
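Python's built-in conversions reproduce this example:

```python
n = 1000
print(bin(n))                # 0b1111101000
print(hex(n))                # 0x3e8
print(int("1111101000", 2))  # 1000: binary back to decimal
print(int("3E8", 16))        # 1000: hexadecimal back to decimal
```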

Chapter 1: Data Storage

 1.1 Bits and Their Storage  1.2 Main Memory  1.3 Mass Storage  1.4 Representing Information as Bit Patterns  1.5 The Binary System  1.6 Storing Integers  1.7 Storing Fractions  1.8 Data Compression  1.9 Communications Errors

Storage Hierarchy

From fastest, most expensive, and smallest to slowest, cheapest, and largest:

 Cache
 Main Memory
 On-line Storage (e.g., hard disk)
 Near-line Storage (e.g., optical disk)
 Off-line Storage (e.g., magnetic tape)

Moving down the hierarchy, speed drops, cost per bit drops, and capacity grows.

CPU Cache

 One type of memory, located on the CPU
 Has fast access time
 Stores copies of the data from frequently used main memory locations
 As long as most memory accesses hit the cache, the average latency of memory accesses stays close to the cache latency

Main Memory: Basics

 Cell
 The manageable unit of memory, typically 8 bits
 Byte
 A string of 8 bits
 High-order end: the left end
 Low-order end: the right end
 Least significant bit: the bit at the low-order end

A Byte-Size Memory Cell

Main Memory: Addresses

 Address
 A "name" to uniquely identify one cell
 Consecutive numbers, usually starting at zero
 I.e., the cells have an order: "previous cell" and "next cell" have reasonable meanings
 Random Access Memory
 Memory where any cell can be accessed independently

Memory Cells

Not Quite the Metric System

 K: "Kilo-" normally means 1,000, but Kilobyte = 2¹⁰ = 1,024
 M: "Mega-" normally means 1,000,000, but Megabyte = 2²⁰ = 1,048,576
 G: "Giga-" normally means 1,000,000,000, but Gigabyte = 2³⁰ = 1,073,741,824
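These binary values can be checked directly:

```python
# "Kilo", "Mega", "Giga" in their binary sense:
print(2**10)  # 1024
print(2**20)  # 1048576
print(2**30)  # 1073741824
```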

Mass Storage Systems

 Non-volatile: data remains when the computer is off
 Usually much bigger than main memory
 Traditionally rotating disks: hard disk, floppy disk, CD-ROM
 Much slower than main memory (mechanical vs. electrical)
 Data access must wait for seek time (head positioning)
 Data access must wait for rotational latency

[Diagram: disk arm with read/write head]

Hard Disk

[Diagram: platter surface divided into tracks and sectors]

Solid-State Disk (SSD)

 No mechanical parts
 Based on flash memory technology
 Pros
 Quiet, low power, immune to vibration
 Cons
 More expensive (smaller capacity)
 Shorter lifetime (limited number of read-write cycles)
 Data can't be recovered from a damaged disk

Compact Disk

Multi-layered CD

[Cross-section diagram showing the pits in the disc layers]

How about CD-R (writable CD)?

 CD-Rs don't have bumps on the surface
 An organic dye layer covers the CD
 Two lasers:
 The write laser heats up the dye layer enough to make it opaque
 The read laser senses the difference between clear and opaque dye the same way it senses bumps: it picks up on the difference in reflectivity

CD-RW

 Uses a special dye material
 Can change its transparency depending on the temperature

Magnetic Tape

Files

 File
 The unit of data stored on a mass storage system
 Logical record and field
 Natural groups of data within a file
 Physical record
 A block of data conforming to the physical characteristics of the storage device
 E.g., a disk sector

Logical vs. Physical Records

[Example logical record with fields: name, address, phone number]

 Two or more logical records might be stored in the same physical record

 One logical record might be split across two physical records

Buffers

 Storage area used to hold data on a temporary basis
 Usually during transfer from one device to another
 Example: printers with memory
 Hold portions of a document that have been transferred to the printer but not yet printed
 Why do we need buffers?

Why use buffers?

 Device sharing
 E.g., 3 computers sharing the same printer
 Connecting devices of different speeds
 E.g., computer -> buffer -> printer

Chapter 1: Data Storage

 1.1 Bits and Their Storage  1.2 Main Memory  1.3 Mass Storage  1.4 Representing Information as Bit Patterns  1.5 The Binary System  1.6 Storing Integers  1.7 Storing Fractions  1.8 Data Compression  1.9 Communications Errors

Bit Patterns

 All data stored in a computer are represented by patterns of bits:
 Text characters: "I love computer"
 Numbers: 123456
 Images: JPEG, GIF, etc.
 Sound

How do we interpret "0110101"?

 We use 'coding' to define the meaning of binary bits
 Human language is one kind of coding
 e.g., "gut" = gastrointestinal tract (English) = good (German)
 Many different coding schemes
 ASCII
 EBCDIC
 ISO/IEC
 …

"CODE" = translation rule

What does this code say?

ASCII - "Hello."

Representing Text

 Printable character
 Letter, punctuation, etc.
 Assigned a unique bit pattern
 ASCII
 7-bit values
 For most symbols used in written English text
 Unicode
 16-bit values
 For most symbols used in most world languages
 ISO
 32-bit values
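For instance, the 7-bit ASCII patterns behind the "Hello." slide can be listed directly (a small Python sketch):

```python
for ch in "Hello.":
    # character, decimal ASCII code, and the 7-bit pattern
    print(ch, ord(ch), format(ord(ch), "07b"))
```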

Representing Numbers

 Binary notation
 Uses bits to represent a number in base two
 In practice
 Not all numbers can be represented
 E.g., 0.1666666666666666…
 Depends on the capability of your computer (CPU and OS)
 How many bits are used in Mac OS?

Limitations

 Overflow
 Happens when a number is too big to be represented
 Truncation
 Happens when a number falls between two representable numbers
 Think of the real numbers…
 More on this in a little bit
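Truncation is easy to observe with ordinary floating-point numbers, since values like 1/6, 0.1, and 0.2 have no exact finite binary representation:

```python
print(1/6)              # 0.16666666666666666 (a nearby representable value)
print(0.1 + 0.2)        # 0.30000000000000004
print(0.1 + 0.2 == 0.3) # False: each operand was truncated when stored
```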

Representing Images

 Bitmap techniques
 Points, pixels
 e.g., the Paint program
 Representing colors
 BMP, DIB, GIF, and JPEG formats
 Scanners, digital cameras, computer displays
 Vector techniques
 Lines, scalable fonts (e.g., TrueType)
 e.g., Acrobat Reader
 PostScript
 CAD systems, printers

Vector vs. Bitmap

Representing Sound

Chapter 1: Data Storage

 1.1 Bits and Their Storage  1.2 Main Memory  1.3 Mass Storage  1.4 Representing Information as Bit Patterns  1.5 The Binary System  1.6 Storing Integers  1.7 Storing Fractions  1.8 Data Compression  1.9 Communications Errors

Decimal vs. Binary Systems

 Decimal place values: 10² 10¹ 10⁰
 Binary place values: 2³ 2² 2¹ 2⁰

From Binary to Decimal

From Decimal to Binary

An Example

13 = A×2³ + B×2² + C×2 + D   (what are the bits A, B, C, D?)
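A sketch of the usual repeated-division conversion (the slide's own worked steps aren't reproduced in the text, so this is the standard approach), applied to 13:

```python
def to_binary(n):
    """Convert a non-negative integer to a binary string by
    repeatedly dividing by 2 and collecting the remainders."""
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits  # the remainder is the next bit
        n //= 2
    return bits or "0"

print(to_binary(13))   # 1101, so A=1, B=1, C=0, D=1
print(int("1101", 2))  # 13: and back from binary to decimal
```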

Adding Binary Numbers

    00111010
  + 00011011
  ----------
    01010101
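Python confirms the sum:

```python
print(format(0b00111010 + 0b00011011, "08b"))  # 01010101
```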

Fractions in the Binary System

 Place values: 2² 2¹ 2⁰ . 2⁻¹ 2⁻² 2⁻³

Adding Fractions

    00010.011
  + 00100.110
  -----------
    00111.001
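Reading each operand as its integer bits divided by 2³ verifies the sum:

```python
a = int("00010011", 2) / 2**3  # 00010.011 -> 2.375
b = int("00100110", 2) / 2**3  # 00100.110 -> 4.75
print(a + b)                   # 7.125, i.e. 00111.001 in binary
```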

Chapter 1: Data Storage

 1.1 Bits and Their Storage  1.2 Main Memory  1.3 Mass Storage  1.4 Representing Information as Bit Patterns  1.5 The Binary System  1.6 Storing Integers  1.7 Storing Fractions  1.8 Data Compression  1.9 Communications Errors

Two Kinds of Integers

 Unsigned integers
 Numbers that are never negative
 Signed integers
 Numbers that can be positive or negative

Representing Signed Integers

 Two's complement notation
 The most popular representation
 Excess notation
 A less popular representation
 These are only examples; different computers/OSes may represent signed integers in slightly different ways

Two's Complement Notation

Coding -6 Using Four Bits

Adding Numbers

Try This

 Decimal   Two's complement
    5         0101
  + 4       + 0100
  ---         ----
    ?         ????

Overflow

 Decimal   Two's complement
    5         0101
  + 4       + 0100
  ---         ----
   -7         1001   ← overflow!
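A sketch reproducing this in Python with 4-bit patterns (the helper name is ours); it also shows the earlier coding of -6:

```python
def as_signed_4bit(value):
    value &= 0b1111                              # keep only the low 4 bits
    return value - 16 if value >= 8 else value   # high bit means negative

print(format(-6 & 0b1111, "04b"))  # 1010: coding -6 in four bits
total = 0b0101 + 0b0100            # 5 + 4
print(format(total, "04b"))        # 1001
print(as_signed_4bit(total))       # -7: overflow!
```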

Excess Notation

Representing Floating-Point Using Excess Notation (8 Bits)

 Format: 1 sign bit | 3-bit exponent (excess notation) | 4-bit mantissa
 Value = ± .mantissa × 2^exponent
 Example: 01101011 → sign 0, exponent 110 (= 6 - 4 = 2), mantissa .1011
 → .1011₂ × 2² = 10.11₂ = 2 3/4

Fraction to Floating Point

 3/8 = .0110₂ × 2⁰  → 0 100 0110 → 01000110
 3/8 = .1100₂ × 2⁻¹ → 0 011 1100 → 00111100
 0   = .0000₂       → 0 000 0000 → 00000000
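A small decoder for this 8-bit layout (sign, excess-four exponent, mantissa; the field split is inferred from the examples above), checked against them:

```python
def decode(pattern):
    """Decode an 8-bit float: sign | exponent (excess 4) | mantissa."""
    sign = -1 if pattern[0] == "1" else 1
    exponent = int(pattern[1:4], 2) - 4  # excess-four notation
    mantissa = int(pattern[4:], 2) / 16  # .xxxx read as a fraction
    return sign * mantissa * 2**exponent

print(decode("01101011"))  # 2.75  = 2 3/4
print(decode("01000110"))  # 0.375 = 3/8
print(decode("00111100"))  # 0.375 = 3/8 (normalized form)
```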

'Number of bits' matters!

Loss of Digits

When adding two floating-point numbers, the 'Exponent' components must first be made equal, and aligning them can result in a 'loss of digits'.
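The same loss shows up with ordinary Python floats: at magnitude 10¹⁶, adjacent doubles are spaced 2 apart, so the order of addition matters:

```python
x = 1e16
print(x + 1 + 1 - x)  # 0.0: each 1 is absorbed when added to x first
print(1 + 1 + x - x)  # 2.0: adding the small terms first preserves them
```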

 Shift rule: A × 2^B = (A right-shifted by n bits) × 2^(B+n)
 e.g., 1000 × 2⁰ = 0001 × 2³

Example: 2½ + 1/8 + 1/8

 2½ = 10.1₂, encoded as 01101010; 1/8 = 0.001₂
 To add, the exponents must match: shifting 1/8 up to exponent 2 pushes all of its 1-bits out of the 4-bit mantissa, truncating it to 01100000 (zero)
 So the sum is 01101010 + 01100000 + 01100000 = 01101010 = 2½; both 1/8s are lost
 (Adding the two 1/8s first would preserve them: 1/8 + 1/8 = 1/4, and 2½ + 1/4 = 2¾)

Chapter 1: Data Storage

 1.1 Bits and Their Storage  1.2 Main Memory  1.3 Mass Storage  1.4 Representing Information as Bit Patterns  1.5 The Binary System  1.6 Storing Integers  1.7 Storing Fractions  1.8 Data Compression  1.9 Communications Errors

Why compression?

 Save storage
 Save energy/power
 Save money

Generic Data Compression

 Run-length encoding
 Relative encoding
 Frequency-dependent encoding
 Adaptive dictionary encoding

Run-Length Encoding

 Replace a sequence with:
 The repeating value
 The number of times it occurs in the sequence
 Example:
 1111111111100000000001111111100
 → (1, 11), (0, 10), (1, 8), (0, 2)
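A minimal run-length encoder reproducing the example (the function name is ours):

```python
def run_length_encode(bits):
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((bits[i], j - i))  # (repeating value, run length)
        i = j
    return runs

print(run_length_encode("1111111111100000000001111111100"))
# [('1', 11), ('0', 10), ('1', 8), ('0', 2)]
```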

Relative Encoding

 Record the differences between data blocks
 Each block is encoded in terms of its relationship to a reference
 e.g., relative to a base value of 100:
 91, 94, 95, 95, 98, 100, 101, 102, 105, 110 → -9, -6, -5, -5, -2, 0, 1, 2, 5, 10
 Real-life example
 Consecutive frames of a motion picture (MP4)

English Letter Frequency
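A sketch matching the reading above (encoding relative to an assumed base of 100; the base choice is ours):

```python
def relative_encode(values, base=100):
    # Encode each value as its difference from the base value.
    return [v - base for v in values]

data = [91, 94, 95, 95, 98, 100, 101, 102, 105, 110]
print(relative_encode(data))
# [-9, -6, -5, -5, -2, 0, 1, 2, 5, 10]
```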

Frequency-Dependent Encoding

 Variable-length codes
 "a" represented by 01
 "q" represented by 10101
 The length of the bit pattern used to represent a data item is inversely related to the frequency of the item's use
 x: a data item
 F(x): frequency with which x appears
 C(x): the code representing x
 |C(x)| ∝ 1/F(x)

The Idea

 Letters e, t, a, i are used more frequently than letters z, q, x
 Use short bit patterns for e, t, a, i, and longer patterns for z, q, x

A Simple Example

 A 4-symbol case
 Symbols: A, B, C, D
 Without special coding
 Codes: A = 00, B = 01, C = 10, D = 11

Huffman Coding

 A 4-symbol case
 Symbol:      A     B    C      D
 Probability: 0.75  0.1  0.075  0.075
 Results
 Symbol: A   B    C     D
 Code:   0   10   110   111
 Average rate R = 0.75×1 + 0.1×2 + 0.15×3 = 1.4 bits

Saving with Huffman Coding

 Normally
 2 bits/sample for 4 symbols
 Huffman coding
 1.4 bits/sample on average

Dictionary Encoding

 Search for matches between the text to be compressed and a set of strings contained in a data structure (called the 'dictionary')
 When the encoder finds such a match, it substitutes a reference to the string's position in the data structure
 Static dictionary vs. adaptive dictionary

Static Dictionary Encoding

 The full set of strings is determined before coding begins and does not change during the coding process
 Often used when the message to be encoded is fixed and large

Adaptive Dictionary Encoding

 Dictionary
 Collection of building blocks from which the message being compressed is constructed
 Frequent patterns
 The dictionary is allowed to change during the encoding process
 A future occurrence is encoded as a single reference to the dictionary

Lempel-Ziv (LZ77) Decoding: xyxxyzy (5, 4, x)

Another Example

 Decoding xyxxyzy (5, 4, x)(0, 0, w)(8, 6, y)
 → xyxxyzyxxyzx (0, 0, w)(8, 6, y)
 → xyxxyzyxxyzxw (8, 6, y)
 → xyxxyzyxxyzxwzyxxyzy
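A sketch of the decoding procedure (each triple reads: go back `distance` symbols, copy `length` symbols, then append the extra symbol; the names are ours):

```python
def lz77_decode(seed, triples):
    text = list(seed)
    for distance, length, symbol in triples:
        start = len(text) - distance
        for k in range(length):
            text.append(text[start + k])  # a copy may overlap its own output
        text.append(symbol)
    return "".join(text)

print(lz77_decode("xyxxyzy", [(5, 4, "x"), (0, 0, "w"), (8, 6, "y")]))
# xyxxyzyxxyzxwzyxxyzy
```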

Lempel-Ziv (LZ77) Encoding

 For a message, divide it into
 A text window (~ Kbytes) followed by
 A look-ahead buffer (10~1000)
 Then find the longest segment in the text window (searching from the beginning of the window) that agrees with the pattern (longer than 2 symbols) at the start of the look-ahead buffer
 Encode that segment of the look-ahead buffer as a triple: (how far back, how long, next symbol)
 Slide the window and buffer forward and repeat

Lempel-Ziv Encoding: LZ77

 Look-ahead buffer: begins with 7 symbols; text window: 8 symbols
 Encoding xyxxyzyxxyzxwzyxxyzy, starting from the prefix xyxxyzy:
 xxyz + x matches 5 symbols back, length 4 → (5, 4, x)
 w has no match → (0, 0, w)
 zyxxyz + y matches 8 symbols back, length 6 → (8, 6, y)
 Final encoding: xyxxyzy (5, 4, x)(0, 0, w)(8, 6, y)

Compressing Images

 Images
 3 bytes per pixel (RGB)
 Large, unmanageable bitmaps
 Compression techniques
 GIF (Graphics Interchange Format)
 Uses LZW compression (a Lempel-Ziv variant)
 JPEG (Joint Photographic Experts Group)

JPEG's Lossless Mode

 No information is lost in encoding the picture
 Codes the difference between adjacent pixels (relative encoding)
 The differences are then encoded using a variable-length code (frequency-dependent encoding)
 Still produces files too large to be manageable
 Seldom used

JPEG's Baseline Standard

 The human eye is more sensitive to changes in brightness than to changes in color
 Encodes each pixel's brightness, but divides the image into four-pixel blocks and records only the average color of each block

JPEG's Baseline Standard (cont.)

 Saves additional space by recording how brightness and color change between neighboring components (relative encoding)
 The final bit pattern is further compressed by applying a variable-length encoding
 Compression ratio: about 20 to 1

Compressing Sound

 MP3
 MPEG-1 Audio Layer 3
 Sound is composed of sine waves of different frequencies
 Allocates bits according to the human ear's response to sound at different frequencies (Huffman coding)
 Compression ratio: about 12 to 1

Chapter 1: Data Storage

 1.1 Bits and Their Storage  1.2 Main Memory  1.3 Mass Storage  1.4 Representing Information as Bit Patterns  1.5 The Binary System  1.6 Storing Integers  1.7 Storing Fractions  1.8 Data Compression  1.9 Communications Errors

Why Errors Happen

 Temperature
 Communication media
 Cable vs. wireless
 Interference

Communication Errors

 Communication: transmission of data
 Data = bits
 Error: the bit pattern received differs from the bit pattern transferred
 Error encoding techniques
 Allow detection of errors
 Allow correction of errors

Adjusted for Odd Parity

 Making sure there is an odd number of 1s in each pattern

[Example: the ASCII patterns for 'A' and 'F', each extended with a parity bit]
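A sketch of odd parity on 7-bit ASCII (the function names are ours):

```python
def add_odd_parity(bits):
    # Append a parity bit so the total number of 1s is odd.
    return bits + ("0" if bits.count("1") % 2 == 1 else "1")

def passes_odd_parity(bits):
    return bits.count("1") % 2 == 1

word = add_odd_parity(format(ord("A"), "07b"))  # 'A' = 1000001
print(word)                                     # 10000011
print(passes_odd_parity(word))                  # True
corrupted = "00000011"                          # one bit flipped in transit
print(passes_odd_parity(corrupted))             # False: error detected
```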

Problem: parity can only detect single-bit errors! (An even number of flipped bits goes unnoticed.)

Error Correction Codes

 So that errors can be
 Not only detected
 But also corrected (probabilistically)
 Hamming distance
 The number of bits in which two bit patterns differ

Error-Correcting Codes

 Hamming distance between the patterns for A and B is 4

 In this code, any two patterns are separated by a Hamming distance of at least 3

 A single error bit results in an illegal pattern, so the error is detected; decoding to the nearest legal pattern corrects it

Decoding 010100
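A sketch of nearest-pattern decoding. The 6-bit code table here is the standard textbook example (an assumption, since the original figure isn't reproduced in the text); it does match the facts above, e.g. the patterns for A and B differ in 4 bits:

```python
# Assumed 6-bit error-correcting code (pairwise Hamming distance >= 3):
CODE = {"A": "000000", "B": "001111", "C": "010011", "D": "011100",
        "E": "100110", "F": "101001", "G": "110101", "H": "111010"}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode(received):
    # Pick the legal pattern closest to the received pattern.
    return min(CODE, key=lambda s: hamming(CODE[s], received))

print(hamming(CODE["A"], CODE["B"]))  # 4
print(decode("010100"))               # D (one bit away from 011100)
```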

Questions?
