HTTP

By Bardia, Patit, and Rozheh

HTTP - Introduction

• Hypertext Transfer Protocol
• Uses TCP/IP technology
• Has had the most impact on the World Wide Web (WWW)
• Specified in RFC 2616

HTTP - Importance of The Web

• Before HTTP, FTP data transfers accounted for approximately one third of the traffic.
• HTTP emerged in the 1990s and by 2000 had completely overshadowed other applications.

HTTP - Importance of The Web (continued)

• Companies have websites and online catalogs.
• The Internet and the Web are indistinguishable for most users.
• Uses of the Web include graphical design of information, dissemination of research (http://info.cern.ch/, the world's first-ever website, at the European Organization for Nuclear Research), browsing and ordering of products, customer support, and display of creative arts.

HTTP - Architectural Components

• The Web consists of a large set of documents called web pages.
• Web pages are considered hypermedia documents.
• The suffix "media" indicates that a document can contain items other than text, such as graphics.
• The prefix "hyper" indicates that a document can contain selectable links.
• Hypertext Markup Language (HTML) is used to present a mixture of text and images.

HTTP - Sample HTML Page

• MyPage.html - My Home Page
• Welcome to My Home Page

HTTP - Uniform Resource Locator (URL)

• Each page is assigned a unique URL that is used to identify it:
  http://hostname[:port]/path[;parameters][?query]
• The scheme (http, ftp) specifies the transfer protocol.
• The hostname string specifies the domain name or IP address of the server.
• :port is an optional protocol port number, needed only when the server does not use the default port 80.

HTTP - simple URL

• Example: http://www.csun.edu/

URL - Query

• Example: http://www.google.com/search?hl=en&lr=&safe=off&q=the+last+page+on+the+internet&btnG=Search
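The URL template above can be explored with a short sketch. It uses Python's standard urllib.parse module; the function name describe_url and the DEFAULT_HTTP_PORT constant are made up for this illustration and are not part of the slides.

```python
from urllib.parse import urlparse, parse_qs

DEFAULT_HTTP_PORT = 80  # the default port named in the slides


def describe_url(url):
    """Split a URL into the components named in the URL template."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        # urlparse reports None when the URL carries no explicit :port
        "port": parts.port if parts.port is not None else DEFAULT_HTTP_PORT,
        "path": parts.path,
        "parameters": parts.params,
        # parse_qs decodes '+' as a space and drops empty values such as lr=
        "query": parse_qs(parts.query),
    }


simple = describe_url("http://www.csun.edu/")
search = describe_url(
    "http://www.google.com/search?hl=en&lr=&safe=off"
    "&q=the+last+page+on+the+internet&btnG=Search"
)
print(simple)
print(search["query"]["q"])
```

Both example URLs omit the port, so the sketch falls back to port 80 for each of them.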


URL - last comment

• Each web page is assigned a unique identifier known as a Uniform Resource Locator (URL). The absolute form of a URL contains a full specification; the relative form, which omits the address of the server, is only useful when the server is implicitly known.

Fully validated URL

• Good for www.externalsite.com to www.othersite.com
• Access key details

Internal URL

• Good for www.internalsite.com www.internsite.com
• Local server validated URL:
• Accessibility

HTTP - Header Definition

• HTTP/1.1 header fields. For entity-header fields, both sender and recipient refer to either the client or the server, depending on who sends and who receives the entity.
• Example: the most common usage is a clear-text request by the client followed by a server demand to upgrade the connection.
• Client:
  GET /encrypted-area HTTP/1.1
  Host: www.example.com
• Server:
  HTTP/1.1 426 Upgrade Required
  Upgrade: TLS/1.0, HTTP/1.1
  Connection: Upgrade

HTTP - Header GET Example

• Below is a sample conversation between an HTTP client and an HTTP server running on www.example.com, port 80.
• Client request (followed by a blank line, so that the request ends with a double newline, each in the form of a carriage return followed by a line feed):
  GET /index.html HTTP/1.1
  Host: www.example.com
• The "Host" header distinguishes between various DNS names sharing a single IP address, allowing name-based virtual hosting. While optional in HTTP/1.0, it is mandatory in HTTP/1.1.
• Server response (followed by a blank line and the text of the requested page):
  HTTP/1.1 200 OK
  Date: Mon, 23 May 2005 22:38:34 GMT
  Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
  Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
  ETag: "3f80f-1b6-3e1cb03b"
  Accept-Ranges: bytes
  Content-Length: 438
  Connection: close
  Content-Type: text/html; charset=UTF-8

HTTP Status Codes

• 1xx Informational
• 2xx Success
• 3xx Redirection
• 4xx Client Error
• 5xx Server Error

HTTP Status Code - 1xx Informational

• Request received, continuing process.
• 100: Continue
• 101: Switching Protocols

HTTP Status Code - 2xx Success

• The action was successfully received, understood, and accepted.
• 200: OK
• 201: Created
• 202: Accepted
• 203: Non-Authoritative Information
• 204: No Content
• 205: Reset Content
• 206: Partial Content
• 207: Multi-Status
  For use with XML-based responses when a number of actions could have been requested; details of the separate statuses are given in the message body. See WebDAV for associated specifications.

HTTP Status Code - 3xx Redirection

• The client must take additional action to complete the request.
• 300: Multiple Choices
• 301: Moved Permanently
  This and all future requests should be directed to another URI.
• 302: Found
  This is the most popular redirect code, but also an example of industrial practice contradicting the standard. The HTTP/1.0 specification (RFC 1945) required the client to perform a temporary redirect (the original describing phrase was "Moved Temporarily"), but popular browsers implemented it as a 303 See Other. Therefore, HTTP/1.1 added status codes 303 and 307 to disambiguate between the two behaviors. However, the majority of web applications and frameworks still use the 302 status code as if it were the 303.
  See also 302 Google Jacking.
• 303: See Other (since HTTP/1.1)
  The response to the request can be found under another URI using a GET method.
• 304: Not Modified
• 305: Use Proxy (since HTTP/1.1)
  Many HTTP clients (such as Mozilla) don't correctly handle responses with this status code.
• 306: no longer used, but reserved. Was used for "Switch Proxy".
• 307: Temporary Redirect (since HTTP/1.1)
  In this case, the request should be repeated with another URI, but future requests can still be directed to the original URI. In contrast to 303, the original POST request must be repeated with another POST request.
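The difference between 302, 303, and 307 comes down to which method the client uses for the follow-up request. The sketch below is a minimal illustration of that choice; the function name follow_up_request is hypothetical, and treating 302 like 303 is the common browser practice described above, not what RFC 1945 originally required.

```python
def follow_up_request(status, method, location):
    """Return the (method, url) a client would send next, or None.

    Covers only the redirect codes whose behaviour the slides spell out.
    """
    if status == 303:
        # 303 See Other: fetch the other URI with GET, whatever the original method was
        return ("GET", location)
    if status == 307:
        # 307 Temporary Redirect: repeat the request with the same method
        return (method, location)
    if status == 302:
        # 302 Found: assumption, mimic the popular browsers and treat it like 303
        return ("GET", location)
    return None  # not a redirect handled by this sketch


after_303 = follow_up_request(303, "POST", "/result")
after_307 = follow_up_request(307, "POST", "/retry")
after_302 = follow_up_request(302, "POST", "/moved")
print(after_303, after_307, after_302)
```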
HTTP Status Code - 4xx Client Error

• The request contains bad syntax or cannot be fulfilled.
• 400: Bad Request
• 401: Unauthorized
  Similar to 403 Forbidden, but specifically for use when authentication is possible but has failed or has not yet been provided. See basic authentication scheme and digest access authentication.
• 402: Payment Required
  The original intention was that this code might be used as part of some form of digital cash/micropayment scheme, but that has never eventuated, and thus this code has never been used.
• 403: Forbidden
• 404: Not Found
• 405: Method Not Allowed
• 406: Not Acceptable
• 407: Proxy Authentication Required
• 408: Request Timeout
• 409: Conflict
• 410: Gone
• 411: Length Required
• 412: Precondition Failed
• 413: Request Entity Too Large
• 414: Request-URI Too Long
• 415: Unsupported Media Type
• 416: Requested Range Not Satisfiable
• 417: Expectation Failed
• 449: Retry With
  A Microsoft extension: the request should be retried after doing the appropriate action.

HTTP Status Code - 5xx Server Error

• The server failed to fulfil an apparently valid request.
• 500: Internal Server Error
• 501: Not Implemented
• 502: Bad Gateway
• 503: Service Unavailable
• 504: Gateway Timeout
• 505: HTTP Version Not Supported
• 509: Bandwidth Limit Exceeded
  This status code, while used by many servers, is not an official HTTP status code.

How does a browser contact a web server?

• The browser begins with a URL, extracts the hostname section, uses DNS to map the name into an equivalent IP address, and uses the IP address to form a TCP connection to the server.
• Once the TCP connection is in place, the browser and web server use HTTP to communicate; the browser sends a request to retrieve a specific page, and the server responds by sending a copy of the page.

HTTP GET REQUEST

A browser sends an HTTP GET command to request a web page from a server. The request consists of a single line of text that begins with the keyword "GET", followed by a URL and an HTTP version number.

Example: to retrieve the web page for comp429 from the server www.csun.edu, a browser can send the following request:

• GET http://www.csun.edu/comp429/officehour HTTP/1.1
• Once a TCP connection is in place, there is no need to send an absolute URL; the following relative URL will retrieve the same page:
  GET /comp429/officehour HTTP/1.1

TO SUMMARIZE:
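The steps a browser takes (name lookup, TCP connection, GET request, reading the reply) can be sketched with raw sockets. So that the sketch runs without Internet access, it starts a small stand-in server on the local machine; DemoHandler, fetch, the page body, and the "Connection: close" request header (added so that reading the reply terminates) are assumptions made for this illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in web server so the sketch runs without Internet access."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"<html><body>Office hours</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(hostname, port, path):
    """Resolve the name, open a TCP connection, send one GET, return the raw reply."""
    # Step 1: map the hostname to an IP address
    family, socktype, proto, _, address = socket.getaddrinfo(
        hostname, port, socket.AF_INET, socket.SOCK_STREAM
    )[0]
    # Step 2: form a TCP connection to that address
    with socket.socket(family, socktype, proto) as conn:
        conn.connect(address)
        # Step 3: request line with a relative URL, headers, then a blank line
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {hostname}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(request.encode("ascii"))
        # Step 4: read until the server closes the connection
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = fetch("localhost", server.server_port, "/comp429/officehour")
status_line = reply.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)

server.shutdown()
server.server_close()
```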

• HTTP, the Hypertext Transfer Protocol, is used between a browser and a web server. The browser sends a GET request, to which the server responds by sending the requested item.

What should a web server respond when it receives an illegal request?

The answer is simple: the server sends an error message to the browser as HTML. Why? Because the request was sent by a browser, the browser will attempt to display whatever the server returns.

Example of an error message: 400 Bad Request

Bad Request

Your browser sent a request that this server could not understand.

It will appear on the user's screen like this:

Bad Request
Your browser sent a request that this server could not understand.

Persistent Connections

• The first version of HTTP used one TCP connection per data transfer.
• As a result, it increased the load on HTTP servers and caused congestion on the Internet.
• A new version of the protocol, HTTP version 1.1, was therefore introduced.

What is new in HTTP version 1.1?

• It uses the persistent connection approach as the default: once a client opens a TCP connection to a server, the client leaves the connection in place during multiple requests and responses. When either the client or the server is ready to close the connection, it informs the other side, and the connection is closed.

The advantages of persistent connections

• Fewer TCP connections mean lower response latency, less overhead on the underlying networks, less memory used for buffers, and less CPU time.
• HTTP requests and responses can be pipelined. Pipelining allows a browser to send multiple requests without waiting for each response, which is more efficient and reduces elapsed time.

The disadvantage of persistent connections

• The beginning and end of each item sent over the connection must be identified.
• Two techniques handle the situation:
  1) Send a length followed by the item.
  2) Send a sentinel value after the item to mark the end.

Is it possible for a server to know the length of an item before sending?

The answer is no, not always.

• Some web pages are generated on request (think of a page that is created or updated all the time, such as a news page).
• Requiring the server to determine the length in advance is therefore not a good idea: it delays transmission, because the data must be saved to a file before sending.

How does the server handle this situation?

• If the server does not know the length of an item a priori, it informs the browser that it will close the connection after transmitting the item.

To summarize:

• To allow a TCP connection to persist through multiple requests and responses, HTTP sends a length before each response. If it does not know the length, a server informs the client, sends the response, and then closes the connection.

What representation should a server use to send length information?

• Interestingly, HTTP borrows the basic format from e-mail, using the RFC 2822 format and MIME extensions.
• Each HTTP transmission therefore contains a header, a blank line, and the item being sent.
• Each header line contains a keyword, a colon, and information.

Examples of items that appear in the header:

Header            Meaning
Content-Length    Size of item in octets
Content-Type      Type of the item
Content-Encoding  Encoding used for item
Content-Language  Language(s) used in item

Example of the header when an HTML document is transferred across a persistent TCP connection:

Content-Length: 34
Content-Language: en
Content-Encoding: ascii
(blank line)
(the document follows)

An example.

• In addition, HTTP includes a wide variety of headers that allow a browser and server to exchange meta information.

Close Connection.
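The two ways of marking where an item ends, a Content-Length header when the length is known and a Connection: close header when it is not, can be sketched as follows. An in-memory byte stream stands in for the TCP connection; the names send_item and receive_item are hypothetical and chosen only for this illustration.

```python
import io


def send_item(body, length_known):
    """Build the bytes a server would send for one item."""
    if length_known:
        header = f"Content-Length: {len(body)}\r\n"
    else:
        # Length not known in advance: announce that the connection will be closed
        header = "Connection: close\r\n"
    return b"HTTP/1.1 200 OK\r\n" + header.encode("ascii") + b"\r\n" + body


def receive_item(stream):
    """Read one item from a file-like connection; return (body, connection_stays_open)."""
    stream.readline()  # status line, ignored in this sketch
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # the blank line ends the header
            break
        keyword, _, information = line.decode("ascii").partition(":")
        headers[keyword.strip().lower()] = information.strip()
    if "content-length" in headers:
        # Read exactly the announced number of octets; another response may follow
        return stream.read(int(headers["content-length"])), True
    # No length: the item ends where the server closes the connection (end of stream)
    return stream.read(), False


# Two length-delimited items followed by one close-delimited item on one "connection"
connection = io.BytesIO(
    send_item(b"first page", True)
    + send_item(b"second page", True)
    + send_item(b"generated page", False)
)
results = [receive_item(connection) for _ in range(3)]
print(results)
```

Only the last item can omit the length, because nothing after it could be told apart from its body.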

We said that if a server does not know the length of an item, the server will close the connection after sending the item. Here is how the server informs the browser to expect a close: the server includes a Connection header before the item in place of a Content-Length header:

Connection: close

When it receives this Connection header, the browser knows that the server intends to close the connection; the browser is forbidden from sending further requests.

HTTP CACHING

• A cache in HTTP is a program's local store of response messages, together with the subsystem that controls message storage, retrieval, and deletion.
• The objective of HTTP caching is to improve performance by saving copies of the results of requests, which reduces response time and network bandwidth consumption for future, equivalent requests.

Caching Advantages:

• Reduced user-experienced latency
• Reduced load on the network
• Reduced load on the origin server
• Reduces or eliminates entire request/response cycles and the sending of full responses.
• Also enables offline access to web pages through the browser cache.

Conditional Request in Client Caching

• With If-Modified-Since in the header of the GET request ("Conditional GET")
• Allows the browser to check a cached copy for freshness with a conditional GET
• Eliminates useless latency
• Client: specifies the date of the cached object copy in the HTTP request:
  If-Modified-Since: <date>
• Server: the response contains no object if the cached copy is up to date:
  HTTP/1.1 304 Not Modified
• If the object has been modified, the HTTP response carries the object:
  HTTP/1.1 200 OK …
• Example:
  If-Modified-Since: Wed, 22 Nov 2006 16:20:01 GMT

What is Cacheable?

• Protocol Specific Considerations
  – Responses to "OPTIONS", "PUT", and "DELETE" methods are not cached.
  – Directive "No-store" prevents caching.
  – Directive "No-cache" forces revalidation.
  – Presence of "Authorization" can prevent caching.
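The conditional GET and the protocol-specific rules above can be sketched in a few functions. This is a simplified illustration, not a full cache: headers are assumed to be plain dictionaries with canonical names, the function names are hypothetical, and refusing to store any response to a request that carried Authorization is a conservative reading of "can prevent caching".

```python
from email.utils import parsedate_to_datetime

UNCACHED_METHODS = {"OPTIONS", "PUT", "DELETE"}


def cache_directives(response_headers):
    """Return the Cache-Control directives of a response as a lowercase set."""
    value = response_headers.get("Cache-Control", "")
    return {d.strip().lower() for d in value.split(",") if d.strip()}


def may_store(method, request_headers, response_headers):
    """Apply the protocol-specific rules to decide whether a response may be cached."""
    if method.upper() in UNCACHED_METHODS:
        return False
    if "no-store" in cache_directives(response_headers):
        return False
    if "Authorization" in request_headers:
        return False  # assumption: take the conservative choice
    return True


def must_revalidate(response_headers):
    """no-cache allows storing but forces a check with the server before reuse."""
    return "no-cache" in cache_directives(response_headers)


def answer_conditional_get(request_headers, last_modified, body):
    """Server side of a conditional GET: return (status_line, body)."""
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return "HTTP/1.1 304 Not Modified", b""  # cached copy is up to date
    return "HTTP/1.1 200 OK", body


page = b"<html>page</html>"
last_modified = parsedate_to_datetime("Wed, 08 Jan 2003 23:11:55 GMT")
fresh = answer_conditional_get(
    {"If-Modified-Since": "Wed, 08 Jan 2003 23:11:55 GMT"}, last_modified, page
)
stale = answer_conditional_get(
    {"If-Modified-Since": "Tue, 07 Jan 2003 00:00:00 GMT"}, last_modified, page
)
print(fresh[0], stale[0])
```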
• Content Specific Considerations
  – Cacheable content is not always cached. A cache generally has its own set of additional rules.
  – Things that are prone to change: dynamically generated files, cookies, scripted responses.
  – Things which may not change: such as electronic books and media files.
  – Things which are draining: large and less frequently requested.

Types of HTTP Caching:

• Browser cache: objects stored on the hard disk of the client
• Proxy cache: one cache serves multiple users; hit rates of 50% are sometimes possible

HTTP COOKIES

• The HTTP protocol is stateless, meaning that it does not keep track of requests made to the server.
• In HTTP, each request is independent and unrelated to the others.
• When state information needs to be preserved across requests, one may use HTTP cookies.
• A cookie is a (name, value) pair that a web server (an application running on the web server) can ask the client to remember.
• The client sends this (name, value) pair along with every request to the web server.
• The web server then passes it to the application that requires it.

Cookies are HTTP headers.

[Diagram: User Computer and CSUN Server www.csun.edu]

• A server gives the browser a cookie by sending a Set-Cookie header line with the response.
• A cookie is set as follows:
  Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure
  Example:
  Set-Cookie: MyColour=lavender; expires=Wednesday, 22-Nov-2006 00:00:00 GMT
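The round trip of a cookie (server sets it, client remembers it, client sends it back) can be sketched with Python's standard http.cookies module. The cookie name and value come from the example above; the path value "/" and the variable names are assumptions made for this illustration.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header line for the response
jar = SimpleCookie()
jar["MyColour"] = "lavender"
jar["MyColour"]["path"] = "/"
set_cookie_line = jar.output()

# Client side: remember the (name, value) pair from the Set-Cookie line
received = SimpleCookie()
received.load(set_cookie_line.split(":", 1)[1].strip())
cookie_line = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in received.items()
)

# Server side again: recover the pair from the Cookie header of the next request
returned = SimpleCookie()
returned.load(cookie_line.split(":", 1)[1].strip())

print(set_cookie_line)
print(cookie_line)
print(returned["MyColour"].value)
```

Only the name and value travel back to the server; attributes such as path stay with the client and decide where the cookie is sent.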

• A client sends back a cookie by sending a Cookie header line with the request:
  Cookie: NAME1=OPAQUE_STRING1; NAME2=OPAQUE_STRING2 ...
• Contains: client ID, session ID, session state

Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME

• The expires option tells the browser when to expire the cookie. If the expires option is omitted, the browser should never save the cookie to disk. The format for the expiry date is:
  Weekday, Day-Month-Year Hour:Minute:Second GMT
  Wednesday, 22-Nov-2006 00:00:00 GMT
  The month is three letters, and the weekday is spelt out in full.
• The path option tells the browser which URLs the cookie must be sent to. If no path is specified in the header, the cookie is sent only to those URLs that have the same path as the URL that set the cookie. If the cookie path is set to /, then all URLs at the server will receive the cookie.
• The domain option tells the browser the domains to which it should send the cookie:
  domain=.csun.edu
  The domain must start with "." and contain at least one additional ".", as in .csun.edu.ca
  The server that sends the Set-Cookie header must be in the domain specified. If no domain option is in the header, the cookie will only be sent to the same server.

Limits on cookies:

• Each cookie can be up to 4 KB in size.
• Each site can store up to 20 cookies.

More about cookies:

• Create sessions.
• Can be used to track user browsing behavior and preferences within a website.
• Can store personal information or passwords.
• On the user's computer, cookies can be rejected by the browser or erased by the user.
• Can be used to avoid logins and provide authorization.
• Servers can require that cookies be enabled before the client can use a website.

HTTP PROXY

• An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them on, with possible translation, to other servers. A proxy must implement both the client and server requirements of this specification.
• A proxy server can therefore satisfy a client request without involving the origin server, resulting in reduced server and network load and lower response latency.
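How a proxy services a request "internally or by passing it on" can be sketched with a small in-memory class. This is a toy model under stated assumptions: CachingProxy, origin, and origin_hits are hypothetical names, the origin server is a plain function rather than a network peer, and cached entries never expire.

```python
class CachingProxy:
    """Acts as a server to its clients and as a client to the origin server."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable(url) -> response body
        self.cache = {}

    def handle(self, url):
        if url in self.cache:
            # Serviced internally: the origin server is not involved
            return self.cache[url]
        # Passed on to the origin server, then remembered for later requests
        response = self.fetch_from_origin(url)
        self.cache[url] = response
        return response


origin_hits = []


def origin(url):
    """Stand-in origin server that records every request it receives."""
    origin_hits.append(url)
    return f"page for {url}"


proxy = CachingProxy(origin)
first = proxy.handle("http://www.csun.edu/")
second = proxy.handle("http://www.csun.edu/")
print(first == second, len(origin_hits))
```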
Three primary uses of proxies

• Security
• Performance
• Content Filtering

• Two forms of proxy server exist: nontransparent and transparent.
• Nontransparent proxy: is visible to the user, and the user can configure a browser to contact the proxy server instead of the original source.
• Transparent proxy: caches network traffic without requiring user configuration or knowledge. It is a way to simplify caching for the end user, and it forces all users to use the cache.
• Some drawbacks of transparent proxies: only uses port 80, FTP is not supported, and there are stability and reliability issues.

More Uses of Proxies:

• Restricting access to the Internet on IP address
• Restricting access based on URL
• Allowing Internet access to non-IP networks
• It is possible to have multiple proxies

Security in HTTP:

– HTTP does not provide security. Security is needed for transferring some information, such as a credit card number.
– HTTPS: HTTP over SSL (Secure Sockets Layer protocol)
– HTTPS is used to ensure confidentiality. HTTPS solved problems related to e-commerce. In HTTPS, encrypted data is not cacheable, data transfers are confidential, and SSL uses a certificate tree.

Thank You

Questions?