Web Services
Total Page:16
File Type:pdf, Size:1020Kb
Carnegie Mellon Web Services 15-213/18-243: Introduc3on to Computer Systems 23rd Lecture, 20 April 2010 Instructors: Bill Nace and Gregory Kesden (c) 1998 - 2010. All Rights Reserved. All work contained herein is copyrighted and used by permission of the authors. Contact [email protected] for permission or for more information. Carnegie Mellon Today HTTP and Sta4c Content Serving Dynamic Content Proxies Carnegie Mellon Client-server web communicaon Client Server Clients and servers communicate using the HyperText Transfer initiate TCP Protocol (HTTP) connection . Client and server establish TCP connec3on accept . Client requests content connection request . Server responds with requested content file . Client and server close connec3on send file (usually) Current version is HTTP/1.1 file . RFC 2616, June, 1999 received; close . hUp://www.ieW.org/rfc/rfc2616.txt connection closed Carnegie Mellon Web Content Web servers return content to clients . content: a sequence of bytes with an associated MIME (Mul3purpose Internet Mail Extensions) type Example MIME types . text/html HTML document . text/plain UnformaUed text . image/gif Binary image encoded in GIF format . image/jpeg Binary image encoded in JPEG format . video/mpeg Video encoded with MPEG Carnegie Mellon URL: Universal Resource Locator URL: A name to iden4fy an object managed by a server URLs for sta4c content: . http://www.cs.cmu.edu:80/index.html . http://www.cs.cmu.edu/index.html . http://www.cs.cmu.edu . Iden3fies a file called index.html, managed by a Web server at www.cs.cmu.edu that is listening on port 80 URLs for dynamic content: . http://www.imdb.com/find?s=all&q=the+net . Iden3fies an executable file called find, managed by a Web server at www.imdb.com, that has 2 input parameters: – s, which has a value of “all” – q, which has a value of “the net” (note whitespace changed to ‘+’ because of URL syntax rules) Carnegie Mellon Anatomy of an HTTP Transac4on unix> telnet www.aol.com 80 Client: open connection to server Trying 205.188.146.23... Telnet prints 3 lines to the terminal Connected to aol.com. Escape character is '^]'. GET / HTTP/1.1 Client: request line host: www.aol.com Client: required HTTP/1.1 HOST header Client: empty line terminates headers. HTTP/1.0 200 OK Server: response line MIME-Version: 1.0 Server: followed by five response headers Date: Mon, 08 Jan 2001 04:59:42 GMT Server: NaviServer/2.0 AOLserver/2.3.3 Content-Type: text/html Server: expect HTML in the response body Content-Length: 42092 Server: expect 42,092 bytes Server: empty line (“\r\n”) terminates hdr <html> Server: first HTML line in response body ... Server: 766 lines of HTML not shown. </html> Server: last HTML line in response body Connection closed by foreign host. Server: closes connection unix> Client: closes connection and terminates Carnegie Mellon HTTP Requests HTTP request is a request line, followed by zero or more request headers, followed by a blank line (CRLF), followed by an op=onal message body Request = Request-Line *(header CRLF) CRLF [message-body] Request line describes the object that is desired Request-Line = Method SP Request-URI SP HTTP-Version CRLF . Method is a verb describing the ac3on to be taken (details on next slide) . Request-URI is typically the URL naming the object desired . A URL is a type of URI (Uniform Resource Iden3fier) . See hUp://www.ieW.org/rfc/rfc2396.txt . HTTP-Version is the HTTP version of request (HTTP/1.0 or HTTP/1.1) Carnegie Mellon HTTP Requests (cont) HTTP methods: . GET: Retrieve an object (web page, etc) . Workhorse method (99% of requests) . “Condi3onal GET”if header includes If-Modified-Since, If-Match, etc. POST: Send data to the server (can also be used to retrieve dynamic content) . Arguments for dynamic content are in the request body . HEAD: Retrieve metadata about an object (validity, modifica3on 3me, etc) . Like GET but no data in response body . OPTIONS: Get server or object aUributes . PUT: Write a file to the server! . DELETE: Delete an object (file) on the server! Request = Request-Line . TRACE: Echo request in response body *(header CRLF) CRLF . Useful for debugging [message-body] Request-Line = Method SP Request-URI SP HTTP-Version CRLF Carnegie Mellon HTTP Requests (cont) Request Headers provide addi4onal informa4on to the server . Modify the request in some form header = Header-Name COLON SP Value Examples: . Host: www.w3.org . If-Modified-Since: Mon, 19 Aug 2010 19:43:31 GMT . User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7...Safari/533.4 . Connecon: Keep-Alive . Transfer-Encoding: chunked Request = Request-Line *(header CRLF) CRLF Message Body contains data objects [message-body] . Data for POST, PUT, TRACE methods Carnegie Mellon HTTP Responses HTTP response format is similar to request, except first line is a “status line” Response = status-line *(header CRLF) CRLF [message-body] status-line = HTTP-Version SP Status-code SP Reason-Phrase CRLF . HTTP-Version is the HTTP version of request (HTTP/1.0 or HTTP/1.1) . Status-code is numeric status, Reason-Phrase is corresponding English . 200 OK Request was handled without error . 301 Moved Permanently Object is in new loca5on . 404 Not found Server couldn’t find the file Response headers, similar to request headers . Content-Type: text/html . Content-Length: 22629 (in bytes) . Location: new-folder/new-object-name.html Carnegie Mellon HTTP Versions HTTP/1.0 (1990) got the web up and running . HTTP/1.0 uses a new TCP connec3on for each transac3on (i.e. request-reply) HTTP/1.1 (1999) added performance features . Supports persistent connec3ons . Mul3ple transac3ons over the same connec3on, amor3zing TCP startup . Connection: Keep-Alive . Supports pipelined connec3ons . Mul3ple transac3ons at the same 3me over single a TCP connec3on . Requires HOST header . Host: www.chatroulette.com . Supports chunked encoding (described later) . Transfer-Encoding: chunked . Adds addi3onal support for caching web objects Carnegie Mellon GET Request to www.cmu.edu from Safari Request-URI is just the suffix, not the en0re URL GET /index.shtml HTTP/1.1 Host: www.cmu.edu Referer: http://www.cmu.edu/index.shtml User-Agent: Mozilla/5.0 (Macintosh...Safari... Accept: */* Accept-Language: en-us Accept-Encoding: gzip, deflate Cookie: __unam=74ba...webstats-cmu=cmu128.2... Connection: keep-alive CRLF (\r\n) Carnegie Mellon GET Response From www.cmu.edu Server HTTP/1.1 200 OK Date: Tue, 20 Apr 2010 12:50:27 GMT Server: Apache/1.3.39 (Unix) ... mod_ssl/2.8.30 OpenSSL/0.9.6m+ Keep-Alive: timeout=5, max=99 Connection: Keep-Alive Transfer-Encoding: chunked Content-Type: text/html CRLF (\r\n) <!DOCTYPE html PUBLIC ...> <html xmlns="http://www.w3.org/1999/xhtml"> <head> ... <title>Carnegie Mellon University</title> </head> <body> <table border="0" cellpadding="0" cellspacing="0" width="100%"> <tr> <td align="left" class="home_leftnav_bg" valign="top" width="252"> ... </body> </html> Carnegie Mellon Today HTTP and Sta4c Content Serving Dynamic Content Proxies Serving Dynamic Content Client sends request to GET /cgi-bin/env.pl HTTP/1.1 server. If request URI contains the Client Server string “/cgi-bin”, then the server assumes that the request is for dynamic content. – – 15-213, Fʼ09 Serving Dynamic Content (cont) The server creates a child process and runs the Client Server program identified by the URI in that process fork/exec env.pl – – 15-213, Fʼ09 Serving Dynamic Content (cont) The child runs and Client Server generates the dynamic Content content. Content The server captures the content of the child and env.pl forwards it without modification to the client – – 15-213, Fʼ09 Issues in Serving Dynamic Content How does the client pass program Request arguments to the server? Client Content Server How does the server pass these arguments to the child? Content Create How does the server pass other info relevant to the request to the env.pl child? How does the server capture the content produced by the child? These issues are addressed by the Common Gateway Interface (CGI) specification. – – 15-213, Fʼ09 CGI Because the children are written according to the CGI spec, they are often called CGI programs. Because many CGI programs are written in Perl, they are often called CGI scripts. However, CGI really defines a simple standard for transferring information between the client (browser), the server, and the child process. – – 15-213, Fʼ09 add.com: THE Internet addition portal! Ever need to add two numbers together and you just canʼt find your calculator? Try Dr. Daveʼs addition service at “add.com: THE Internet addition portal!” Takes as input the two numbers you want to add together. Returns their sum in a tasteful personalized message. After the IPO weʼll expand to multiplication! – – 15-213, Fʼ09 The add.com Experience input URL host port CGI program args Output page – – 15-213, Fʼ09 Serving Dynamic Content With GET Question: How does the client pass arguments to the server? Answer: The arguments are appended to the URI Can be encoded directly in a URL typed to a browser or a URL in an HTML link http://add.com/cgi-bin/adder?1&2 adder is the CGI program on the server that will do the addition. argument list starts with “?” arguments separated by “&” spaces represented by “+” or “%20” Can also be generated by an HTML form <form method=get action="http://add.com/cgi-bin/postadder"> – – 15-213, Fʼ09 Serving Dynamic Content With GET URL: http://add.com/cgi-bin/adder?1&2 Result displayed on browser: Welcome to add.com: THE Internet addition portal. The answer is: 1 + 2 = 3 Thanks for visiting! – – 15-213, Fʼ09 Serving Dynamic Content With GET Question: How does the server pass these arguments to the child? Answer: In environment variable QUERY_STRING A single string containing everything after the “?” For add.com: QUERY_STRING = “1&2” /* child code that accesses the argument list */ if ((buf = getenv("QUERY_STRING")) == NULL) { exit(1); } /* extract arg1 and arg2 from buf and convert */ ..