Using TCP Through Sockets
David Mazières
[email protected]

1 File descriptors

Most I/O on Unix systems takes place through the read and write system calls [1]. Before discussing network I/O, it helps to understand how these functions work even on simple files. If you are already familiar with file descriptors and the read and write system calls, you can skip to the next section.

Section 1.1 shows a very simple program that prints the contents of files to the standard output, just like the UNIX cat command. The function typefile uses four system calls to copy the contents of a file to the standard output.

    int open(char *path, int flags, ...);

The open system call requests access to a particular file. path specifies the name of the file to access; flags determines the type of access being requested (in this case, read-only access). open ensures that the named file exists (or can be created, depending on flags) and checks that the invoking user has sufficient permission for the mode of access.

If successful, open returns a non-negative integer known as a file descriptor. All read and write operations must be performed on file descriptors. File descriptors remain bound to files even when files are renamed or deleted or undergo permission changes that revoke access [2].

By convention, file descriptor numbers 0, 1, and 2 correspond to standard input, standard output, and standard error respectively. Thus a call to printf will result in a write to file descriptor 1.

If unsuccessful, open returns -1 and sets the global variable errno to indicate the nature of the error. The routine perror will print "filename: error message" to the standard error based on errno.

    int read(int fd, void *buf, int nbytes);

read will read up to nbytes bytes of data into memory starting at buf. It returns the number of bytes actually read, which may very well be less than nbytes. If it returns 0, this indicates an end of file. If it returns -1, this indicates an error.
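
Because read may return fewer bytes than requested, a caller that needs exactly nbytes bytes has to call it in a loop. The following is a minimal sketch of such a loop, not code from the original program: the name read_full is hypothetical, and retrying on EINTR is an assumption about how the caller wants interrupted calls handled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_full is a hypothetical helper (not part of the original text).
 * It calls read repeatedly until nbytes bytes have arrived, end of
 * file is reached, or an error occurs.
 * Returns the number of bytes stored in buf, or -1 on error.
 */
ssize_t
read_full(int fd, void *buf, size_t nbytes)
{
    char *p = buf;
    size_t total = 0;

    while (total < nbytes) {
        ssize_t n = read(fd, p + total, nbytes - total);
        if (n == 0)              /* end of file: return what we have */
            break;
        if (n == -1) {
            if (errno == EINTR)  /* interrupted by a signal: retry */
                continue;
            return -1;           /* real error; errno says why */
        }
        total += (size_t) n;     /* short read: keep going */
    }
    return (ssize_t) total;
}
```

A return value smaller than nbytes then means that end of file was reached early, rather than that read merely returned a short count.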

    int write(int fd, void *buf, int nbytes);

write will write up to nbytes bytes of data at buf to file descriptor fd. It returns the number of bytes actually written, which unfortunately may be less than nbytes in some circumstances. write returns -1 to indicate an error; a return value of 0 means that no bytes were written.

    int close(int fd);

close deallocates a file descriptor. Systems typically limit each process to 64 file descriptors by default (though the limit can sometimes be raised substantially with the setrlimit system call). Thus, it is a good idea to close file descriptors after their last use so as to prevent "too many open files" errors.

1.1 type.c: Copy file to standard output

    #include <stdio.h>
    #include <stdlib.h>   /* exit */
    #include <unistd.h>   /* read, write, close */
    #include <fcntl.h>    /* open, O_RDONLY */

    /* Copy the contents of the named file to standard output. */
    void
    typefile(char *filename)
    {
        int fd, nread;
        char buf[1024];

        fd = open(filename, O_RDONLY);
        if (fd == -1) {
            perror(filename);
            return;
        }

        /* read returns 0 at end of file and -1 on error. */
        while ((nread = read(fd, buf, sizeof(buf))) > 0)
            write(1, buf, nread);   /* 1 is standard output */

        close(fd);
    }

    int
    main(int argc, char **argv)
    {
        int argno;

        for (argno = 1; argno < argc; argno++)
            typefile(argv[argno]);
        exit(0);
    }

[1] High-level I/O functions such as fread and fprintf are implemented in terms of read and write.
[2] Note that not all network file systems properly implement these semantics.

2 TCP/IP Connections

2.1 Introduction

TCP is the reliable protocol many applications use to communicate over the Internet. TCP provides a stream abstraction: two processes, possibly on different machines, each have a file descriptor. Data written to either descriptor will be returned by a read from the other. Such network file descriptors are called sockets in Unix.

Every machine on the Internet has a unique, 32-bit IP (Internet Protocol) address. An IP address is sufficient to route network packets to a machine from anywhere on the Internet. However, since multiple applications can use TCP simultaneously on the same machine, another level of addressing is needed to disambiguate which process and file descriptor incoming TCP packets correspond to.
For this reason, each end of a TCP connection is named by a 16-bit port number in addition to its 32-bit IP address.

So how does a TCP connection get set up? Typically, a server will listen for connections on an IP address and port number. Clients can then allocate their own ports and connect to that server. Servers usually listen on well-known ports. For instance, finger servers listen on port 79, web servers on port 80, and mail servers on port 25. A list of well-known port numbers can be found in the file /etc/services on any Unix machine.

The Unix telnet utility will allow you to connect to TCP servers and interact with them. By default, telnet connects to port 23 and speaks to a telnet daemon that runs login. However, you can specify a different port number. For instance, port 7 on many machines runs a TCP echo server:

    athena% telnet athena.dialup.mit.edu 7
    ...including Athena's default telnet options: "-ax"
    Trying 18.184.0.39...
    Connected to ten-thousand-dollar-bill.dialup.mit.edu.
    Escape character is '^]'.
    repeat after me...
    repeat after me...
    The echo server works!
    The echo server works!
    quit
    quit
    ^]
    telnet> q
    Connection closed.
    athena%

Note that in order to quit telnet, you must type Control-] followed by q and return. The echo server will happily echo anything you type, including quit.

As another example, let's look at the finger protocol, one of the simplest widely used TCP protocols. The Unix finger command takes a single argument of the form user@host. It then connects to port 79 of host, writes the user string and a carriage-return line-feed over the connection, and dumps whatever the server sends back to the standard output.

We can simulate the finger command using telnet. For instance, using telnet to do the equivalent of the command finger [email protected], we get:

    athena% telnet mit.edu 79
    ...including Athena's default telnet options: "-ax"
    Trying 18.72.0.100...
    Connected to mit.edu.
    Escape character is '^]'.
    help
    These help topics are available:
    about        general  options  restrictions  url
    change-info  motd     policy   services      wildcards
    To view one of these topics, enter "help name-of-topic-you-want".
    ...
    Connection closed by foreign host.
    athena%

2.2 TCP client programming

Now let's see how to make use of sockets in C. Section 2.3 shows the source code to a simple finger client that does the equivalent of the last telnet example of the previous section. The function tcpconnect shows all the steps necessary to connect to a TCP server. It makes the following system calls:

    int socket(int domain, int type, int protocol);

The socket system call creates a new socket, just as open creates a new file descriptor. socket returns a non-negative file descriptor number on success, or -1 on an error.

When creating a TCP socket, domain should be AF_INET, signifying an IP socket, and type should be SOCK_STREAM, signifying a reliable stream. Since the reliable stream protocol for IP is TCP, the first two arguments already effectively specify TCP. Thus, the third argument can be left 0, letting the operating system assign a default protocol (which will be IPPROTO_TCP).

Unlike file descriptors returned by open, you can't immediately read and write data to a socket returned by socket. You must first assign the socket a local IP address and port number, and in the case of TCP you need to connect the other end of the socket to a remote machine. The bind and connect system calls accomplish these tasks.

    int bind(int s, struct sockaddr *addr, int addrlen);

bind sets the local address and port number of a socket. s is the file descriptor number of a socket. For IP sockets, addr must be a structure of type sockaddr_in, usually as follows (in /usr/include/netinet/in.h). addrlen must be the size of struct sockaddr_in (or whichever structure one is using).

    struct in_addr {
        u_int32_t s_addr;
    };

    struct sockaddr_in {
        short sin_family;
        u_short sin_port;
        struct in_addr sin_addr;
        char sin_zero[8];
    };

Different versions of Unix may have slightly different structures. However, all will have the fields sin_family, sin_port, and sin_addr. All other fields should be set to zero. Thus, before using a struct sockaddr_in, you must call bzero on it, as is done in tcpconnect. Once a struct sockaddr_in has been zeroed, the sin_family field must be set to the value AF_INET to indicate that this is indeed a sockaddr_in. (bind cannot take this for granted, as its argument is a more generic struct sockaddr *.)

sin_port specifies which 16-bit port number to use. It is given in network (big-endian) byte order, and so must be converted from host to network byte order with htons. It is often the case when writing a TCP client that one wants a port number but doesn't care which one.
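
The preparation steps described above (zero the structure, set sin_family, convert the port with htons) can be sketched as follows. This is an illustration under stated assumptions rather than the tcpconnect code itself: the helper name init_sockaddr_in is hypothetical, memset stands in for the bzero call the text mentions, and sin_addr is simply left zeroed.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * init_sockaddr_in is a hypothetical helper (not from the original
 * text). It prepares a struct sockaddr_in for the given port, which
 * the caller supplies in host byte order.
 */
void
init_sockaddr_in(struct sockaddr_in *sin, unsigned short port)
{
    /* Zero every field, including sin_zero and any
       platform-specific extras. memset stands in for bzero. */
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;    /* mark this as an IP address */
    sin->sin_port = htons(port);  /* host -> network byte order */
    /* sin_addr is left as zero here; this sketch does not set it. */
}
```

The resulting structure would be passed to bind as (struct sockaddr *) &sin, with sizeof(sin) as the addrlen argument.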