Network Tools, Data Compression

1. Networking in Practical Terms

We first explored, in terms of a 1750 network, the principles of round-robin scheduled, packet-switched networks with confirmation. We then saw how those principles are essential elements of the design of the modern Internet. We saw how the Internet delivers data by preceding the bits of information with headers that describe, in order, (1) the address of the next router on the journey, (2) the address of the ultimate destination, (3) the sequence number, destination program, and request for confirmation, and (4) a data-specific description.

We now move from basic principles and system design to some practical how-to’s. In particular, we look at:

a) How the Internet attaches names (like www.xyz.com) to IP addresses
b) How to find out network details/information
c) Methods for compressing data

2. Getting to Know Your Network

We saw that the Internet is a vast collection of local area networks connected by routers. Each network has a network ID number, and each machine on that network is given a computer ID number. This combination of network-ID.computer-ID (the IP address) identifies every single computer on the Internet. Furthermore, every computer can send packets of data to every other computer, simply by specifying the IP address of the destination.

Once you start to understand this model of data communication, a few questions arise:

a. How do names like www.tufts.edu work? Where is its IP address?
b. What is my IP address? What is your IP address?
c. What route does a packet take to get from my machine to its destination?
d. What is the MAC address of my computer?
e. What are the routers to get me out of this local network?
f. How can I get a lot of data through a connection with limited capacity?

3. DNS - Names for IP Numbers

Every computer on the Internet is assigned an IP address. An IP address is a four-part number of the form network#.computer#. Each part is an 8-bit number, which, in theory, allows for about four billion (2^32) addresses.

PROBLEM
How can people remember IP addresses?

PROCEDURE
Assign names to numbers. For example, google.com is at IP address 74.125.47.147. You could just type 74.125.47.147 into your browser and connect to google.com. But it is easier to remember a name than a sequence of four 8-bit numbers. It is easier, for example, to remember www.tufts.edu than it is to remember 130.64.1.83. When you send email, you use a name for the machine to which you are sending email, but you could use the IP address, whatever it is.

PROBLEM
How does a computer convert www.tufts.edu to 130.64.1.83?

PROCEDURE
It calls directory assistance (aka a DNS server).

DNS stands for domain name system. Scattered all over the Internet are computers that do for computers what the telephone number 411 does for people: look up numbers for names. When you type www.google.com into your browser, your browser connects to a DNS server and asks for the IP number of www.google.com. The DNS server might know the IP address of the name you request, in which case it sends back the IP number. Or it might not, in which case it says something like, "That’s a .com address. Ask the server in charge of the .com addresses. That server is at the following IP address."

Then your computer makes a second connection, this time to a server in charge of the .com addresses. That second server might know where www.google.com is, or it might not. If it does not, it says "That’s a google.com address. Ask the server over at google.com. That server knows the IP address of www.google.com." And your computer makes a third call, this time to the server at google.com. That server tells your computer the IP address for www.google.com.

Dec 01, 2008 13:47 page 1 Network tools, Data Compression

Each domain, like google.com, just like each business, keeps its own directory of the computers within its organization. For example, the DNS servers at Tufts keep a list of the IP addresses assigned to computers within the Tufts network. The .edu server just has to know the address of the DNS server at Tufts.

Note: the Internet would exist and would work without DNS. It would just not be as convenient for humans.
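The chain of referrals described above can be modeled in a few lines of Python. This is only a toy sketch: the server names ("local", "edu-server", "tufts-server") and the table layout are made up for illustration, and nothing here talks to a real DNS server. The one real datum is the www.tufts.edu address quoted in the text.

```python
# Toy model of the DNS referral chain. Server names and table layout
# are hypothetical; no real network traffic is involved.
SERVERS = {
    "local": {"referrals": {".edu": "edu-server"}, "answers": {}},
    "edu-server": {"referrals": {"tufts.edu": "tufts-server"}, "answers": {}},
    "tufts-server": {"referrals": {},
                     "answers": {"www.tufts.edu": "130.64.1.83"}},
}

def resolve(name, server="local"):
    """Follow referrals until some server knows the address.

    Returns (ip_address, servers_asked). ip_address is None if no
    server can answer the question or refer us onward.
    """
    asked = []
    while server is not None:
        asked.append(server)
        table = SERVERS[server]
        if name in table["answers"]:
            return table["answers"][name], asked
        # No answer here: look for a referral whose suffix matches.
        server = next(
            (target for suffix, target in table["referrals"].items()
             if name.endswith(suffix)),
            None,
        )
    return None, asked

ip, path = resolve("www.tufts.edu")
print(ip, "found after asking", path)
```

Each pass through the loop corresponds to one "call to directory assistance": the first server refers us to the .edu server, which refers us to the Tufts server, which finally answers.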

4. Unix Commands to View Network Information

How can we look up a name ourselves? How can we see what is going on in our network? Unix has commands like ls, who, ps, and date to tell you what is happening on the local machine, and Unix has commands to tell you about the Internet connections.

hostname
hostname tells you the name of the computer you are using. The -i option asks for the IP address of the computer.

ifconfig
ifconfig tells you about the network card(s) in the machine you are using. It tells you the MAC address and the IP address. If there are multiple cards, ifconfig lists them all.

dig
dig asks DNS for the IP address for a name.

traceroute
traceroute shows you the path of a packet from one machine to another. Try /sbin/traceroute times.co.uk to see how you read news from the Times of London, or /sbin/traceroute www.lemonde.fr.

ping
ping tells you if a remote machine is alive, how long it takes to get a packet there, and how many packets get lost.

route
route tells you how the local machine figures out which router to use to get to which addresses.

arp
arp tells you which IP addresses are assigned to local MAC addresses.
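The same questions can be asked from a program. The sketch below uses Python's standard socket module to do roughly what hostname and a simple name lookup do. The function names local_name and lookup are our own, and the lookup goes through whatever resolver the operating system is configured to use, which is not necessarily the same path dig takes.

```python
import socket

def local_name():
    """Return this machine's host name, much like the hostname command."""
    return socket.gethostname()

def lookup(name):
    """Ask the system resolver for an IPv4 address for a name.

    Returns a dotted string such as "130.64.1.83", or None if the
    lookup fails (for example, when there is no network connection).
    """
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None

print("host name:", local_name())
print("localhost ->", lookup("localhost"))
```

Calling lookup("www.tufts.edu") on a machine with a working network connection should return whatever address DNS currently gives for that name.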

5. Sending Images - Fax vs Digital Images

A fax machine scans a piece of paper and creates a black and white bitmap. The machine can scan at 98 lines per inch, or at 196 lines per inch. Fax always scans at 203 bits per inch horizontally. A page is 8.5" wide x 11" tall.

☞ How many pixels does one row use?
☞ How many pixels does a complete page use?
☞ Fax machines transmit up to 14,400 bits/second. How many seconds should it take to send a complete page?

But fax machines do not usually take that long. How do they reduce the time?
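One way to work through the three questions above is to let Python do the arithmetic. This sketch uses only the figures given in the text and assumes one bit per pixel with no compression; the constant and function names are our own.

```python
# Figures taken from the text above.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11
HORIZONTAL_BITS_PER_INCH = 203
MODEM_BITS_PER_SECOND = 14_400

def page_stats(lines_per_inch):
    """Return (pixels per row, pixels per page, seconds uncompressed)."""
    pixels_per_row = HORIZONTAL_BITS_PER_INCH * PAGE_WIDTH_IN
    rows = lines_per_inch * PAGE_HEIGHT_IN
    pixels_per_page = pixels_per_row * rows
    seconds = pixels_per_page / MODEM_BITS_PER_SECOND
    return pixels_per_row, pixels_per_page, seconds

for lpi in (98, 196):
    row, page, secs = page_stats(lpi)
    print(f"{lpi} lines/inch: {row:.1f} pixels/row, "
          f"{page:,.0f} pixels/page, about {secs:.0f} s uncompressed")
```

With these figures, a page scanned at 98 lines per inch comes to 1,860,089 pixels, a little over two minutes at 14,400 bits/second, and the 196 lines per inch page takes twice as long.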

6. Compressing Data

The Internet sends data as 1’s and 0’s represented as pulses of electricity, flashes of light, and bursts of radio waves. And these 1’s and 0’s are ultimately stored on hard disks, RAM, flash drives, and compact discs. The capacity of a wire, as measured in how many pulses can be sent per second, and the capacity of a hard disk, as measured in how many magnetic regions fit per square inch, are limited. Since the beginning of data storage and transmission, people have looked for ways to store more information in less space. What procedures have people devised to compress data?

Why Compress Data?

If some information can be represented in fewer bits, that information takes less time to transmit and takes less space to store. We know how to use bits to represent sound, text, and images. How can we compress songs, documents, and pictures?

6.1. Compressing Music

MP3 is a format that compresses digital music. We have studied some of the techniques MP3 uses to store a piece of music using fewer bits than the full 16-bit, 2-channel, 44,100 samples per second CD standard. Some of the techniques, such as dropping quieter data that would be hard to hear along with louder sounds, are based on the way human hearing works. Other techniques, such as dropping the sample rate, change the resolution but leave it within acceptable ranges.

6.2. Compressing Text

A double-spaced page of text at an average of 10 characters per inch contains about 1800 characters. Therefore a 10-page paper uses about 18,000 bytes, which is 144,000 bits. How can one store the same 18,000 characters worth of text in fewer bits? There are two major techniques: one is based on characters, the other is based on words and phrases. First, let us look at compression in the earliest days of sending bits using electricity -- the Morse Code.

Samuel Morse designed his Morse Code to reduce the number of dits and dahs sent over the wire. The letter ’e’ is the most common letter in typical English text. Morse assigned a single dit to the letter E and a single dah to the letter T. Less common letters, like Q, were assigned longer codes. Thus, data compression has been built into electric communication systems since the beginning. What other techniques can we use to compress data?

Morse Code

A .-      J .---    S ...     2 ..---
B -...    K -.-     T -       3 ...--
C -.-.    L .-..    U ..-     4 ....-
D -..     M --      V ...-    5 .....
E .       N -.      W .--     6 -....
F ..-.    O ---     X -..-    7 --...
G --.     P .--.    Y -.--    8 ---..
H ....    Q --.-    Z --..    9 ----.
I ..      R .-.     1 .----   0 -----

7. Designing Your Own Compression System: Image Compression

Consider, in the spirit of the 1750 Fax System, an image transmitter from Earth to a colony on Mars. The transmission station charges one dollar per bit. You have an 8x8 digital image you want to send to a pal of yours on a semester abroad on Mars. Draw a picture in the grid below and we shall discuss systems to use to cut the cost down from $64.

A picture:

A compressed version:

______0-14

______15-29

______30-44

______45-59
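As one possible system for the Mars transmitter (a sketch, not the only answer), we can send run lengths instead of pixels: read the 64 pixels row by row, and send the length of each run of white, then black, then white, and so on. The choice of 7 bits per run length and all the names below are our own assumptions.

```python
BITS_PER_RUN = 7  # assumed: enough to write any run length from 0 to 64

def run_lengths(pixels):
    """Return run lengths for a flat list of 0/1 pixels.

    Runs alternate white (0), black (1), white, ... and always start
    with white, so a picture that begins with black gets a leading 0.
    """
    runs = []
    current, count = 0, 0
    for p in pixels:
        if p == current:
            count += 1
        else:
            runs.append(count)
            current, count = p, 1
    runs.append(count)
    return runs

def decode(runs):
    """Rebuild the pixel list from run lengths (the receiver's job)."""
    pixels, color = [], 0
    for n in runs:
        pixels.extend([color] * n)
        color = 1 - color
    return pixels

def cost_in_dollars(pixels):
    """Cost at one dollar per bit, using BITS_PER_RUN bits for every run."""
    return len(run_lengths(pixels)) * BITS_PER_RUN

# An 8x8 picture with a black bar across rows 3 and 4 (0 = white, 1 = black).
bar = [1 if row in (3, 4) else 0 for row in range(8) for _ in range(8)]
# A checkerboard, which has very short runs.
checker = [(row + col) % 2 for row in range(8) for col in range(8)]

print(run_lengths(bar), cost_in_dollars(bar))   # [24, 16, 24] 21
print(cost_in_dollars(checker))
```

The bar picture costs $21 instead of $64, but the checkerboard costs far more than $64 under this scheme. How well a compression system works depends on the data it is given.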

8. Motivation for Text Compression (from 1857 text)

We come to consider the relative expense of the transmission of messages in England and the States. In the foregoing lines we have shown, that England possesses, miles of line, 8,298; miles of wire, 44,845; the United States possesses, miles of lines, 16,735; miles of wire, 23,281. We thus see, that the telegraph in the United States extends over more than twice as much ground as the British lines; while on the other hand the system of telegraph in England is so much more fully developed, that nearly double the quantity of wire is in actual use. On the English lines, which are in the hands of three companies only, from 25,000 to 30,000 miles are worked on Cook and Wheatstone’s system; 10,000 on the magnetic system--without batteries;--3000 on Bain’s chemical principle--which is rapidly extending;--and the remainder on Morse’s plan.

The price of the transmission of messages is less in America than in England, especially if we regard the distance of transmission. In America a message is limited to ten words; in England to twenty words; and the message is delivered free within a certain distance from the station. In both countries the names and addresses of the sender and receiver are sent free of charge. The average cost of transmission from London to every station in Great Britain is 13/10 of a penny per word per 100 miles. The average cost from Washington to all the principal towns in America is about 8/10 of a penny per word per 100 miles. The ordinary scale of charges for twenty words in England is 1s. for fifty miles and under; as 2s. 6d. between fifty miles and 100 miles; all distances beyond that, 5s., with a few exceptions, where there is great competition. Having received the foregoing statement from a most competent authority, its accuracy may be confidently relied upon.


In conclusion, I would observe that the competition which is gradually growing up in this country must eventually compel a reduction of the present charges; but even before that desirable opposition arrives, the companies would, in my humble opinion, exercise a wise and profitable discretion by modifying their present system of charges. Originally the addresses of both parties were included in the number of words allowed; that absurdity is now given up, but one scarcely less ridiculous still remains--viz., twenty words being the shortest message upon which their charges are based. A merchant in New York can send a message to New Orleans, a distance of 2000 miles, and transact important business in ten words--say "Buy me a thousand bales of cotton--ship to ;" but if I want to telegraph from Windsor to London, a distance of twenty miles, "Send me my portmanteau," I must pay for twenty words. Surely telegraph companies would show a sound discretion by lowering the scale to ten words, and charging two-thirds of the present price for twenty. Opposition would soon compel such a manifestly useful change; but, independent of all coercion, I believe those companies that strive the most to meet the reasonable demands of the public will always show the best balance-sheet at the end of the year.--Thirteenpence is mere more than one shilling.

9. Fax Machine: Basic Principles and History

Fax machines do not transmit each bit on a page. Instead, they examine each scan line for patterns. A pattern is a sequence of black pixels or a sequence of white pixels. Each pattern has a special code, just as each letter in the Morse Code has a special code. And, just as the Morse Code gives short codes to commonly appearing letters, the fax code gives short codes to commonly appearing patterns. Guess what sort of patterns are very common. Consult the run-length coding table below to see the coding system fax machines use.

Fax: Origins and History

The use of the fax machine to transmit images via telephone lines did not become common in American businesses until the late 1980s, but the technology dates back to the nineteenth century.

In 1843 in England, Alexander Bain (1818-1903) devised an apparatus comprised of two pens connected to two pendulums, which in turn were joined to a wire, that was able to reproduce writing on an electrically conductive surface. (image from: http://home.maine.rr.com/randylinscott/fax.htm)

In 1862, the Italian physicist Giovanni Caselli built a machine he called a pantelegraph (implying a hybrid of pantograph and telegraph), which was based on Bain’s invention but also included a synchronizing apparatus. His pantelegraph was used by the French Post & Telegraph agency between and from 1856 to 1870.

Elisha Gray (1835-1901), an American inventor born in Barnesville, Ohio, invented and patented many electrical devices, including a facsimile transmission system. He also organized a company that later became the Western Electric Company.

In 1902, Arthur Korn (1870-1945) in Germany invented telephotography, a means for manually breaking down and transmitting still photographs by means of electrical wires. In 1907, Korn sent the first inter-city fax when he transmitted a photograph from Munich to Berlin.

In 1925, Edouard Belin (1876-1963) in France constructed the Belinograph. His invention involved placing an image on a cylinder and scanning it with a powerful light beam that had a photoelectric cell which could convert light, or the absence of light, into transmittable electrical impulses. The Belinograph process used the basic principle upon which all subsequent facsimile transmission machines would be based. In 1934, the Associated Press introduced the first system for routinely transmitting "wire photos," and 30 years later, in 1964, the Xerox Corporation introduced Long Distance Xerography (LDX).


For many years, facsimile machines remained cumbersome, expensive and difficult to operate, but in 1966 Xerox introduced the Magnafax Telecopier, a smaller, 46-pound facsimile machine that was easier to use and could be connected to any telephone line. Using this machine, a letter-sized document took about six minutes to transmit. The process was slow, but it represented a major technological step. In the late 1970s, Japanese companies entered the market, and soon a new generation of faster, smaller and more efficient fax machines became available. (from http://www.ideafinder.com/history/inventions/fax.htm)

Fax Machine Run Length Coding System (for runs up to 64)

run  white code    len  black code    len
  0  00110101        8  0000110111     10
  1  000111          6  010             3
  2  0111            4  11              2
  3  1000            4  10              2
  4  1011            4  011             3
  5  1100            4  0011            4
  6  1110            4  0010            4
  7  1111            4  00011           5
  8  10011           5  000101          6
  9  10100           5  000100          6
 10  00111           5  0000100         7
 11  01000           5  0000101         7
 12  001000          6  0000111         7
 13  000011          6  00000100        8
 14  110100          6  00000111        8
 15  110101          6  000011000       9
 16  101010          6  0000010111     10
 17  101011          6  0000011000     10
 18  0100111         7  0000001000     10
 19  0001100         7  00001100111    11
 20  0001000         7  00001101000    11
 21  0010111         7  00001101100    11
 22  0000011         7  00000110111    11
 23  0000100         7  00000101000    11
 24  0101000         7  00000010111    11
 25  0101011         7  00000011000    11
 26  0010011         7  000011001010   12
 27  0100100         7  000011001011   12
 28  0011000         7  000011001100   12
 29  00000010        8  000011001101   12
 30  00000011        8  000001101000   12
 31  00011010        8  000001101001   12
 32  00011011        8  000001101010   12
 33  00010010        8  000001101011   12
 34  00010011        8  000011010010   12
 35  00010100        8  000011010011   12
 36  00010101        8  000011010100   12
 37  00010110        8  000011010101   12
 38  00010111        8  000011010110   12
 39  00101000        8  000011010111   12
 40  00101001        8  000001101100   12
 41  00101010        8  000001101101   12
 42  00101011        8  000011011010   12
 43  00101100        8  000011011011   12
 44  00101101        8  000001010100   12
 45  00000100        8  000001010101   12
 46  00000101        8  000001010110   12
 47  00001010        8  000001010111   12
 48  00001011        8  000001100100   12
 49  01010010        8  000001100101   12
 50  01010011        8  000001010010   12
 51  01010100        8  000001010011   12
 52  01010101        8  000000100100   12
 53  00100100        8  000000110111   12
 54  00100101        8  000000111000   12
 55  01011000        8  000000100111   12
 56  01011001        8  000000101000   12
 57  01011010        8  000001011000   12
 58  01011011        8  000001011001   12
 59  01001010        8  000000101011   12
 60  01001011        8  000000101100   12
 61  00110010        8  000001011010   12
 62  00110011        8  000001100110   12
 63  00110100        8  000001100111   12
EOL  000000000001   12  00000000000    11
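To see how the table is used, here is a small sketch that encodes one scan line. It copies only the codes for runs 0 through 8 from the table, so it handles short runs only. It assumes every line starts with a white run (possibly of length 0) and it leaves out the EOL code. The function name encode_line is our own.

```python
# Codes for runs 0-8 copied from the table above; longer runs are left
# out of this sketch to keep it short.
WHITE = {0: "00110101", 1: "000111", 2: "0111", 3: "1000", 4: "1011",
         5: "1100", 6: "1110", 7: "1111", 8: "10011"}
BLACK = {0: "0000110111", 1: "010", 2: "11", 3: "10", 4: "011",
         5: "0011", 6: "0010", 7: "00011", 8: "000101"}

def encode_line(pixels):
    """Encode a string of '0' (white) and '1' (black) pixels.

    The line is split into alternating runs, starting with white, and
    each run is replaced by its code from the matching table.
    """
    codes = []
    color, count = "0", 0
    for p in pixels:
        if p == color:
            count += 1
        else:
            codes.append((WHITE if color == "0" else BLACK)[count])
            color, count = p, 1
    codes.append((WHITE if color == "0" else BLACK)[count])
    return "".join(codes)

line = "0000011100000011"      # white 5, black 3, white 6, black 2
encoded = encode_line(line)
print(encoded, ":", len(line), "pixels ->", len(encoded), "bits")
```

The 16-pixel line becomes 1100 10 1110 11, which is 12 bits. On a real page, where long white runs are common, the saving is much larger.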

10. Summary: Three Compression Techniques You Should Know

(a) run-length
(b) Huffman
(c) LZW
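Of the three, Huffman coding is the closest to what Morse did by hand: it counts how often each symbol appears and gives the frequent ones short codes. The sketch below is one common way to build such a code with a priority queue. It is meant as an illustration, not as the exact procedure any particular file format uses. The sample message is borrowed from the 1857 text.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code in which frequent characters get short codes."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:                    # only one distinct character
        return {ch: "0" for ch in counts}
    # Heap entries: (frequency, tie_breaker, {char: code_so_far}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # the two least frequent groups
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "send me my portmanteau"
codes = huffman_codes(text)
bits = sum(len(codes[ch]) for ch in text)
print(bits, "bits instead of", 8 * len(text))
```

No code is the beginning of another code, so the receiver can tell where each character ends without any separators.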
