Hypertext Transfer Protocol (HTTP) Fundamentals

Péter Jeszenszky, Faculty of Informatics, University of Debrecen

Last modified: September 10, 2021

Contents

● Introduction, concepts ● Messages ● Methods ● Status codes ● Content negotiation ● software

2 Hypertext Transfer Protocol

● A stateless application-level protocol for distributed, collaborative, hypertext information systems. ● Was developed as a joint effort between IETF and W3C. ● See: IETF HTTP Working Group https://httpwg.org/

3 Characteristics

● A request/response protocol based on the client-server model. ● Stateless – I.e., subsequent requests are treated independently of each other. ● Extensible – E.g., methods, status codes, header fields. ● General-purpose – Although mainly used for communication between clients and web servers, in principle it can be used for any other purpose.

4 History (1)

● The first documented version: – HTTP/0.9 (Tim Berners-Lee) https://www.w3.org/Protocols/HTTP/AsImplemented.html ● Very simple, supports only GET requests, in response to which an HTML document consisting of ASCII characters is sent back. ● HTTP/1.0: – Tim Berners-Lee, Roy T. Fielding, Henrik Frystyk Nielsen, Hypertext Transfer Protocol -- HTTP/1.0, RFC 1945, May 1996. https://www.rfc-editor.org/rfc/rfc1945 ● Uses MIME-like messages that also contain meta-information about the enclosed content. – Supports the transmission not only of HTML documents but also of any other media type. ● Supports multiple methods (GET, HEAD, POST, PUT, DELETE, LINK, UNLINK). ● Authentication (basic authentication) ● …

5 History (2)

● HTTP/1.1: – Roy T. Fielding, James Gettys, Jeffrey C. Mogul, Henrik Frystyk Nielsen, Tim Berners-Lee, Hypertext Transfer Protocol -- HTTP/1.1, RFC 2068, January 1997. https://www.rfc-editor.org/rfc/rfc2068 ● New features: persistent connections, content negotiation, more sophisticated caching, range requests, … – Roy T. Fielding, James Gettys, Jeffrey C. Mogul, Henrik Frystyk Nielsen, Larry Masinter, Paul J. Leach, Tim Berners-Lee, Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616, June 1999. https://www.rfc-editor.org/rfc/rfc2616 ● Obsoletes RFC 2068.

6 Current Standard

● Roy T. Fielding (ed.), Julian F. Reschke (ed.), Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, RFC 7230, June 2014. https://www.rfc-editor.org/rfc/rfc7230 ● Roy T. Fielding (ed.), Julian F. Reschke (ed.), Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content, RFC 7231, June 2014. https://www.rfc-editor.org/rfc/rfc7231 ● Roy T. Fielding (ed.), Julian F. Reschke (ed.), Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests, RFC 7232, June 2014. https://www.rfc-editor.org/rfc/rfc7232 ● Roy T. Fielding (ed.), Yves Lafon (ed.), Julian F. Reschke (ed.), Hypertext Transfer Protocol (HTTP/1.1): Range Requests, RFC 7233, June 2014. https://www.rfc-editor.org/rfc/rfc7233 ● Roy T. Fielding (ed.), Mark Nottingham (ed.), Julian F. Reschke (ed.), Hypertext Transfer Protocol (HTTP/1.1): Caching, RFC 7234, June 2014. https://www.rfc-editor.org/rfc/rfc7234 ● Roy T. Fielding (ed.), Julian F. Reschke (ed.), Hypertext Transfer Protocol (HTTP/1.1): Authentication, RFC 7235, June 2014. https://www.rfc-editor.org/rfc/rfc7235

7 Secure HTTP

● Eric Rescorla, HTTP Over TLS, RFC 2818, May 2000. https://www.rfc-editor.org/rfc/rfc2818 – Originally, this specification defined the https URI scheme, which is now defined by RFC 7230. ● Rohit Khare, Scott Lawrence, Upgrading to TLS Within HTTP/1.1, RFC 2817, May 2000. https://www.rfc-editor.org/rfc/rfc2817 ● Eric Rescorla, The Transport Layer Security (TLS) Protocol Version 1.3, RFC 8446, August 2018. https://www.rfc-editor.org/rfc/rfc8446

8 HTTP/2

● The next major version of HTTP after HTTP/1.1. ● Web page: https://http2.github.io/ ● Specifications: – Mike Belshe, Roberto Peon, Martin Thomson (ed.), Hypertext Transfer Protocol Version 2 (HTTP/2), RFC 7540, May 2015. https://www.rfc-editor.org/rfc/rfc7540 – Roberto Peon, Herve Ruellan, HPACK: Header Compression for HTTP/2, RFC 7541, May 2015. https://www.rfc-editor.org/rfc/rfc7541

9 Sessions

● A session is a sequence of requests and responses between a client and a server. ● The HTTP protocol, by nature, is stateless and does not provide support for session management. ● Session management can be implemented with the help of cookies.
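The cookie-based approach can be sketched as follows. This is a minimal illustration, not a production design: the cookie name sid, the in-memory SESSIONS store, and the handle function are all hypothetical names chosen for the example. Header fields are modelled as a plain dictionary.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> session data.
SESSIONS = {}

def handle(request_headers):
    """Return (visit count, response header fields) for one request."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    sid = cookie["sid"].value if "sid" in cookie else None
    response_headers = {}
    if sid not in SESSIONS:
        # Unknown or missing session id: start a new session and hand the
        # identifier to the client in a Set-Cookie header field.
        sid = secrets.token_hex(16)
        SESSIONS[sid] = {"visits": 0}
        response_headers["Set-Cookie"] = f"sid={sid}; HttpOnly; Path=/"
    SESSIONS[sid]["visits"] += 1
    return SESSIONS[sid]["visits"], response_headers
```

The protocol itself stays stateless: the state lives on the server, and the client merely echoes the identifier back in the Cookie header field of each subsequent request.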

10 How it Works

GET /index. HTTP/1.1
User-Agent: Browser
Host: www.example.com
Accept: */*

HTTP/1.1 200 OK
Date: Fri, 23 Aug 2019 13:15:42 GMT
Content-Type: text/html
Content-Length: 1024

Hello, world! ...

11 curl

● Command line tool (curl) and library (libcurl) for transferring data that supports a number of protocols. https://curl.se/ https://github.com/curl/curl – Written in: C – Platform: Linux, macOS, Windows, … – License: X11 License ● Supported protocols: FTP, HTTP, HTTPS, SCP, SFTP, …

12 HTTPie

● Command line HTTP client. https://httpie.io/ https://github.com/jakubroztocil/httpie/ – Written in: Python – Platform: Linux, macOS, Windows – License: New BSD License

13 Web Developer Tools

● Chromium, Google Chrome, Opera: – Chrome Developer Tools (DevTools) https://developer.chrome.com/devtools ● Firefox: – Firefox Developer Tools https://developer.mozilla.org/docs/Tools ● Safari: – Tools https://developer.apple.com/safari/tools/ https://support.apple.com/guide/safari-developer ● Chromium-based Edge: – Microsoft Edge (Chromium) Developer Tools https://docs.microsoft.com/en-us/microsoft-edge/devtools-guide-chromium

14 Further Tools

● Postman https://www.postman.com/ – Available as a native application. ● Platform: macOS, Linux, Windows ● License: non-free ● Further information: Postman Learning Center https://learning.postman.com/

15 Terminology (1)

● Resource: The target of an HTTP request identified by a URI. ● Representation: – Information that is intended to reflect a past, current, or desired state of a given resource. – Can be readily communicated via the protocol. – Consists of a set of representation metadata and a potentially unbounded stream of representation data. ● Content negotiation: – An origin server might be provided with, or be capable of generating, multiple representations that are each intended to reflect the current state of a target resource. – Content negotiation is a mechanism for selecting the most appropriate representation to a given request. – This representation is called the selected representation.

16 Terminology (2)

● Message: The basic unit of HTTP communication. ● Payload: – A representation transmitted in a message.

17 Terminology (3)

● The terms client and server refer only to the roles that programs perform for a particular connection. The same program might act as a client on some connections and a server on others. – Client: A program that establishes a connection to a server for the purpose of sending one or more HTTP requests. – Server: A program that accepts connections in order to service HTTP requests by sending HTTP responses.

18 Terminology (4)

● User agent: A client program that initiates an HTTP request. – E.g., web browser, web crawler, command line tool (curl, wget), custom application, … ● Origin server: A program that can originate authoritative responses for a given target resource. ● Sender/recipient: A program that sends or receives a given message, respectively.

19 Terminology (5)

● Intermediary: allows requests to be satisfied through a chain of connections. – There are three types of intermediaries: proxy, gateway, tunnel.

20 Intermediaries

[Figure: chain of connections: User agent, Intermediary, Intermediary, Origin server]

21 Implementation Diversity

● Both user agents and origin servers can be of many kinds. – User agents: general-purpose browsers, household appliances, entertainment devices, command line tools, mobile apps, … – Origin servers: web servers, configurable networking components, office machines, autonomous robots, traffic cameras, …

22 The http and https URI Schemes (1)

● Defined for the purpose of identifying resources on a potential origin server listening for connections on a given TCP port. – https uses a TLS-secured connection for communication. ● Syntax: – 'http://' host [':' port] [path] ['?' query] ● If the port subcomponent is not given, TCP port 80 is the default. – 'https://' host [':' port] [path] ['?' query] ● If the port subcomponent is not given, TCP port 443 is the default. ● The path must start with a '/' character or must be empty.

23 The http and https URI Schemes (2)

● The origin server for a URI is identified by the host component and the optional port component. – The path and the optional query component identifies a potential target resource within that origin server's namespace. ● Note that the presence of a URI does not imply that there is always an HTTP server listening for connections on the given host and port.

24 The http and https URI Schemes (3)

● URI comparison: – An empty path component is equivalent to a path of '/'. – The scheme and host components are case-insensitive and normally provided in lowercase. All other components are compared in a case-sensitive manner. – Characters other than those in the “reserved” set are equivalent to their percent-encoded octets. ● For example, the following URIs are equivalent: – http://www.inf.unideb.hu/, http://www.inf.unideb.hu:80/, http://www.inf.unideb.hu, http://www.inf.unideb.hu:80 – http://www.inf.unideb.hu/~jeszy/, http://www.inf.unideb.hu/%7Ejeszy/, HTTP://www.INF.UNIDEB.hu/~jeszy/
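The comparison rules above can be sketched with Python's standard urllib.parse module. The function names normalize and _decode_unreserved are hypothetical. The sketch decodes only percent-encoded unreserved characters (letters, digits, '-', '.', '_', '~'), a conservative reading of the third rule, and it ignores userinfo and fragments.

```python
import re
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

# Unreserved characters: their percent-encoded form is equivalent to the literal.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def _decode_unreserved(text):
    """Decode %XX only where it encodes an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)

def normalize(uri):
    """Return a tuple that is equal for equivalent http(s) URIs."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()              # scheme is case-insensitive
    host = (parts.hostname or "").lower()      # host is case-insensitive
    port = parts.port or DEFAULT_PORTS.get(scheme)  # fill in the default port
    path = _decode_unreserved(parts.path) or "/"    # empty path means '/'
    query = _decode_unreserved(parts.query)
    return (scheme, host, port, path, query)
```

Two URIs are then considered equivalent when normalize returns the same tuple for both, for example for http://www.inf.unideb.hu and http://www.inf.unideb.hu:80/.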

25 Messages

● There are two types of messages: – Request – Response ● Messages must be parsed as a sequence of octets.

26 Message Format

● Syntactically, requests and responses differ only in their first line. – The first line is called the start-line. ● The start-line is followed by zero or more header fields, each of which is terminated with CRLF. ● An empty line (CRLF) indicates the end of the header section. ● Optionally, a message body may appear at the end of the message.
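The message layout described above can be illustrated with a small parser. This is a simplified sketch: the function name parse_message is hypothetical, and the code assumes a complete, well-formed message held in memory and skips the validation a real implementation would need.

```python
def parse_message(raw: bytes):
    """Split an HTTP/1.1 message into start-line, header fields and body."""
    # The first empty line (CRLF CRLF) ends the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    start_line = lines[0].decode("ascii")
    headers = []
    for line in lines[1:]:
        name, _, value = line.decode("latin-1").partition(":")
        # Spaces and horizontal tabs around the field value are not part of it.
        headers.append((name, value.strip(" \t")))
    return start_line, headers, body
```

The same function handles requests and responses, because the two differ only in the start-line.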

27 Requests (1)

● The start-line of requests has the following syntax: method request-target HTTP-version CRLF – Components within the line must be separated by a single space. ● The request-target identifies the target resource upon which to apply the request.

28 Requests (2)

● Example:

> GET /licenses/ HTTP/1.1
> Host: www.gnu.org
> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
> Accept-Language: hu-HU,hu;q=0.8,en-US;q=0.5,en;q=0.3
> Accept-Encoding: gzip, deflate, br
> Upgrade-Insecure-Requests: 1
> Connection: keep-alive
>

29 Requests (3)

● The most common form of request-target is the following: path ['?' query] – If the target URI's path component is empty, the client must send '/' as the path. – The host and port components of the target URI are sent in the Host header field. – Example: ● GET /copyleft/gpl.html HTTP/1.1 Host: www.gnu.org
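How a client might derive the request-target and the Host header field from a target URI can be sketched as follows. The function name build_request is hypothetical, and the sketch emits only the start-line and the Host field, ignoring userinfo and fragments.

```python
from urllib.parse import urlsplit

def build_request(method, uri):
    """Build the start-line and Host header field for a target URI."""
    parts = urlsplit(uri)
    target = parts.path or "/"      # an empty path is sent as '/'
    if parts.query:
        target += "?" + parts.query
    host = parts.hostname or ""
    if parts.port is not None:      # host and port travel in the Host field
        host += f":{parts.port}"
    return f"{method} {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"
```

For the URI http://www.gnu.org/copyleft/gpl.html this produces the request line and Host field shown in the example above.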

30 Requests (4)

● The request-target can be '*' only for a server-wide OPTIONS request (see the discussion of the OPTIONS method). – Example: OPTIONS * HTTP/1.1

31 Responses (1)

● The first line of a response message is called the status-line; it has the following syntax: HTTP-version status_code reason_phrase CRLF – Components within the line must be separated by a single space. ● The status code consists of three decimal digits and is followed by a short textual description associated with the numeric status code. ● Examples: – HTTP/1.1 200 OK – HTTP/1.1 404 Not Found

32 Responses (2)

● Example:

< HTTP/1.1 200 OK
< Date: Tue, 10 Jan 2017 09:18:05 GMT
< Server: Apache/2.4.7
< Content-Location: home.html
< Vary: negotiate,accept-language,Accept-Encoding
< TCN: choice
< Accept-Ranges: bytes
< Content-Encoding: gzip
< Cache-Control: max-age=0
< Expires: Tue, 10 Jan 2017 09:18:05 GMT
< Content-Length: 9181
< Keep-Alive: timeout=3, max=100
< Connection: Keep-Alive
< Content-Type: text/html
< Content-Language: en
<
< 〈gzip compressed data〉

33 Header Fields (1)

● They have the following syntax: field_name ':' value – Field name: ● Consists of one or more characters, only a subset of ASCII characters is allowed (letters, digits, '!', '#', '$', …). ● Case-insensitive. – Value: ● Consists of printable ASCII characters, spaces and horizontal tab characters. ● It is recommended to add a single space before the value. ● Spaces and horizontal tabs before and after the value are ignored.

34 Header Fields (2)

● The order of header fields is not significant. ● Multiple header fields with the same field name may appear only if the field value for that header field is defined as a comma-separated list. – A recipient may combine the values of the header fields with the same field name into a list. – An exception is the Set-Cookie header field that often appears multiple times in a response message and does not use the list syntax.
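The combination rule can be sketched as follows. The function name combine_headers is hypothetical, header fields are modelled as (name, value) pairs, and Set-Cookie values are kept in a separate list because they must not be merged.

```python
def combine_headers(fields):
    """Merge repeated header fields into comma-separated lists."""
    combined = {}
    cookies = []
    for name, value in fields:
        key = name.lower()            # field names are case-insensitive
        if key == "set-cookie":
            cookies.append(value)     # Set-Cookie does not use the list syntax
        elif key in combined:
            combined[key] += ", " + value
        else:
            combined[key] = value
    return combined, cookies
```

Values are appended in the order in which the fields were received, so the list keeps the order of the original values for a given field name.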

35 Header Fields (3)

● The four types of header fields: – Representation header fields: provide metadata about the representation ● E.g., Content-Type, Content-Encoding, Content-Language – Payload header fields: describe the payload ● E.g., Content-Length, Transfer-Encoding – Request header fields: header fields sent by clients in requests ● E.g., Accept, Accept-Language, Host, User-Agent – Response header fields: allow the server to pass additional information about the response beyond what is placed in the status-line ● E.g., Date, ETag, Last-Modified, Location

36 Header Fields (4)

● Many other specifications define header fields beyond the ones specified by HTTP. ● IANA maintains the registry of header fields. – See: Message Headers https://www.iana.org/assignments/message-headers/message-headers.xhtml

37 The User-Agent Header Field (1)

● Contains information about the user agent originating the request. ● Can be used for content negotiation or analytics regarding browser or operating system use. ● A user agent should send a User-Agent field in each request. ● The field value consists of one or more product identifiers, each followed by zero or more comments. – Product identifiers are listed in decreasing order of their significance. – Each product identifier consists of a name and optional version. – Comments are delimited by '(' and ')'.

38 The User-Agent Header Field (2)

● curl: curl/7.71.1 ● Firefox: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0 ● Google Chrome: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36 ● …

39 The User-Agent Header Field (3)

● Further links: – UserAgentString.com http://www.useragentstring.com/ – Firefox user agent string reference https://developer.mozilla.org/docs/Web/HTTP/Headers/User-Agent/Firefox – What's my user agent? https://www.whatsmyua.info/ – Latest user agents for Web Browsers & Operating Systems https://www.whatismybrowser.com/guides/the-latest-user-agent/

40 The q Parameter

● Many of the request header fields (Accept, Accept-Charset, Accept-Encoding, Accept-Language) use a parameter, named q, to assign a relative weight to the preference for the associated kind of content. ● The weight is a real number in the range 0 through 1; up to 3 digits are allowed after the decimal point. – A value of 0 means “not acceptable”. – 0.001 is the least preferred. – 1 is the most preferred. ● The default weight is 1.

41 The Accept Header Field (1)

● Can be used by user agents to specify response media types that are acceptable. ● The field value is a list of media ranges, where each media range might be followed by zero or more media type parameters (e.g., charset), and an optional q parameter. ● Media range: – type/subtype: indicates that media type – type/*: indicates all subtypes of that type – */*: indicates all media types ● A request without any Accept header field implies that the user agent will accept any media type in response.

42 The Accept Header Field (2)

● The value can vary: it can be different when fetching a document entered in the address bar and when fetching an image linked via an <img> element. – See: MDN Web Docs – Content negotiation – The Accept: header https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation#The_Accept_header ● Default value (user agent-dependent): – List of default Accept values https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation/List_of_default_Accept_values

43 The Accept Header Field (3)

● Firefox: uses the following default value – Accept: text/html, application/xhtml+xml, application/xml;q=0.9,*/*;q=0.8

Media Range              q
text/html                1
application/xhtml+xml    1
application/xml          0.9
*/*                      0.8
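The mapping from an Accept field value to weighted media ranges can be sketched as follows. The function name parse_accept is hypothetical. The sketch drops media type parameters other than q and does not handle quoted strings that contain ',' or ';'.

```python
def parse_accept(value):
    """Return (media range, q) pairs, most preferred first."""
    ranges = []
    for item in value.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0]
        q = 1.0                                   # the default weight is 1
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        ranges.append((media_range, q))
    # sorted() is stable, so equally weighted ranges keep their order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)
```

Applied to the Firefox default value, the function yields the four rows of the table above in the same order.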

44 Message Body (1)

● Carries the payload body of a request or response. ● Consists of the payload body or the payload body encoded with the transfer coding indicated by the Transfer-Encoding header field. ● Is an arbitrary sequence of octets.

45 Message Body (2)

● The presence of a message body in a request is signaled by a Content-Length or Transfer-Encoding header field. – A message body in a GET, HEAD, DELETE, CONNECT or OPTIONS request message has no defined semantics. – A client must not send a message body in a TRACE request. ● The presence of a message body in a response depends on both the request method to which it is responding and the response status code. – Responses to the HEAD request method never include a message body, even if the header fields indicate otherwise.

46 Payload Semantics (1)

● The purpose of a payload in a request is defined by the method semantics. – For example, a representation in the payload of a POST request represents information to be processed by the target resource. ● The purpose of a payload in a response is defined by both the request method and the response status code. – For example, responses with an error status code usually contain a payload that represents the error condition.

47 Payload Semantics (2)

● The representation data is in a format and encoding defined by the representation metadata header fields. – The Content-Type header field indicates the media type of the representation. ● Example: – Content-Type: text/html; charset=utf-8 – The Content-Encoding header field indicates what content codings have been applied to the representation. ● Data in the media type referenced by the Content-Type header field can be obtained by applying appropriate decoding mechanisms.

48 Content Coding (1)

● Content codings are used to allow a representation to be compressed or otherwise usefully transformed without losing the identity of its underlying media type and without loss of information. – Frequently, the representation is stored in coded form, transmitted directly, and only decoded by the final recipient. ● The content codings applied are a characteristic of the representation. – All other metadata about the representation is about the coded form.

49 Content Coding (2)

● HTTP/1.1 defines the following content codings: – compress: data format commonly produced by the Unix file compression program compress – deflate: DEFLATE compressed data inside the zlib format – gzip: GZIP compression – x-compress: deprecated, alias for compress – x-gzip: deprecated, alias for gzip ● Other content codings: – br: Brotli compression

50 Content Coding (3)

● The Accept-Encoding header field can be used in requests by user agents to indicate what content-codings are acceptable in the response. – Example: ● Accept-Encoding: compress, gzip ● Accept-Encoding: * ● The Content-Encoding header field indicates what content codings have been applied to the representation, beyond those inherent in the media type. – Data in the media type referenced by the Content-Type header field can be obtained by applying appropriate decoding mechanisms. – Example: ● Content-Encoding: gzip
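The round trip of a gzip-coded representation can be sketched with Python's standard gzip module. The names body, headers, and decode_body are hypothetical, header fields are modelled as a plain dictionary, and the sketch supports only a single gzip coding although Content-Encoding may list several.

```python
import gzip

# Sender side: apply the content coding and describe it in the header fields.
body = b"<html>Hello, world!</html>" * 50
encoded = gzip.compress(body)
headers = {
    "Content-Type": "text/html",          # media type of the decoded data
    "Content-Encoding": "gzip",           # coding applied to the representation
    "Content-Length": str(len(encoded)),  # length of the coded form
}

def decode_body(headers, data):
    """Recipient side: undo the content coding named in Content-Encoding."""
    coding = headers.get("Content-Encoding", "").strip().lower()
    if coding in ("gzip", "x-gzip"):      # coding names are case-insensitive
        return gzip.decompress(data)
    if coding == "":
        return data
    raise ValueError(f"unsupported content coding: {coding}")
```

Note that Content-Length describes the coded octets that are actually transmitted, whereas Content-Type describes the data obtained after decoding.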

51 Content Coding (4)

● IANA maintains the registry of content codings. – Content coding names are case-insensitive. – See: HTTP Content Coding Registry https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding

52 Methods (1)

● HTTP/1.1 defines the following methods: – GET – HEAD – POST – PUT – DELETE – CONNECT – OPTIONS – TRACE ● Method names are case-sensitive. ● Method semantics might be further refined by the presence of header fields in the request. ● All general-purpose servers must support the methods GET and HEAD, all other methods are optional.

53 Methods (2)

● Additional methods, beyond the ones defined by HTTP, have been standardized, too. ● IANA maintains the registry of HTTP methods. – See: Hypertext Transfer Protocol (HTTP) Method Registry https://www.iana.org/assignments/http-methods/http-methods.xhtml

54 Methods (3)

● The 501 (Not Implemented) status code in a response indicates that the origin server does not recognize or implement the request method. ● The 405 (Method Not Allowed) status code in a response indicates that the request method is known by the origin server but is not supported by the target resource.
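The distinction between the two status codes can be sketched as a simple check. The function name check_method and the KNOWN_METHODS set are hypothetical; the sketch assumes a server that recognizes exactly the HTTP/1.1 methods and a per-resource set of allowed methods. A 405 response carries an Allow header field that lists the methods the resource supports.

```python
# Methods this hypothetical server recognizes (names are case-sensitive).
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE",
                 "CONNECT", "OPTIONS", "TRACE"}

def check_method(method, allowed):
    """Return (status code, extra header fields) for a request method."""
    if method not in KNOWN_METHODS:
        return 501, {}                 # method not recognized by the server
    if method not in allowed:
        # Known method, but not supported by this particular resource.
        return 405, {"Allow": ", ".join(sorted(allowed))}
    return 200, {}
```

Because method names are case-sensitive, a request with the method get is treated as unrecognized in this sketch.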

55 GET

● Requests transfer of a current selected representation for the target resource. ● Serves information retrieval purposes. ● A client can alter the semantics of GET to be a “range request”, requesting transfer of only some part(s) of the selected representation, by sending a Range header field in the request.

56 HEAD

● Identical to the GET method except that the server must not send a message body in the response. ● Can be used for obtaining metadata about the selected representation without transferring the representation data.

57 POST

● Requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics. ● Possible applications include: – Submitting data (e.g., form data) to a data-handling process. – Posting a message to a newsgroup, mailing list, or blog. – Creating a new resource. – Appending data to a resource's existing representation(s). – …

58 PUT (1)

● Requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload. ● A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.

59 PUT (2)

● The fundamental difference between the POST and PUT methods is highlighted by the different intent for the enclosed representation: – The target resource in a POST request is intended to handle the enclosed representation. – The enclosed representation in a PUT request is defined as replacing the state of the target resource.

60 DELETE

● Requests that the origin server remove the association between the target resource and its current functionality. – If the target resource has one or more current representations, they might or might not be destroyed by the origin server, and the associated storage might or might not be reclaimed, depending entirely on the nature of the resource and its implementation by the origin server.

61 OPTIONS (1)

● Requests information about the communication options available for the target resource, at either the origin server or an intervening intermediary. ● '*' as the request-target applies to the server in general rather than to a specific resource. ● In a successful response to an OPTIONS request a server should send any header fields that might indicate optional features implemented by the server and applicable to the target resource. – For example, the Allow header field lists the set of methods advertised as supported by the target resource.

62 OPTIONS (2)

● Example: – curl -v --request OPTIONS \
    http://apache.org/foundation/contact.html

> OPTIONS /foundation/contact.html HTTP/1.1
> Host: apache.org
> User-Agent: curl/7.78.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Apache
< Allow: GET,POST,OPTIONS,HEAD,TRACE
< Cache-Control: max-age=3600
< Expires: Sun, 15 Aug 2021 11:52:48 GMT
< Content-Type: text/html
< Content-Length: 0
< Date: Sun, 15 Aug 2021 10:52:48 GMT
<

63 TRACE

● Requests the request message to be sent back. – The final recipient of the request should reflect the message received back to the client as the message body of a 200 (OK) response with a Content-Type of message/http. ● In general, the method is not allowed on servers for security reasons.

64 Status Codes (1)

● Three-digit decimal integer numbers. ● The first digit defines the class of status code (response). ● Clients are not required to understand the meaning of all registered status codes. – However, a client must understand the class of any status code, as indicated by the first digit. – An unrecognized status code must be treated as being equivalent to the x00 status code, where x is the first digit of the unrecognized status code.

65 Status Codes (2)

● Status codes are extensible. ● IANA maintains the registry of status codes. – See: Hypertext Transfer Protocol (HTTP) Status Code Registry https://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml

66 Classes of Status Codes

● 1xx: Informational – Indicates an interim response prior to sending a final response. ● 2xx: Success – The request was successfully received, understood, and accepted by the server. ● 3xx: Redirection – Further action needs to be taken by the user agent in order to fulfill the request, this can happen automatically. ● 4xx: Client Error ● 5xx: Server Error
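How a client might handle status codes it does not recognize can be sketched as follows. The function name interpret and the KNOWN set are hypothetical; the set stands for whatever codes a particular client happens to understand.

```python
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

# Status codes this hypothetical client understands.
KNOWN = {100, 200, 201, 202, 204, 206, 300, 301, 302, 303, 304,
         400, 401, 403, 404, 405, 406, 500, 501, 503}

def interpret(code):
    """Return (effective status code, class name) for a response."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    first_digit = code // 100
    # An unrecognized code is treated like the x00 code of its class.
    effective = code if code in KNOWN else first_digit * 100
    return effective, CLASSES[first_digit]
```

With this rule, a client that has never heard of 418 still handles it as a generic client error (400).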

67 Client and Server Errors

● Except when responding to a HEAD request, the server should send, in a response with a 4xx or 5xx status code, a representation that explains the error situation and states whether it is a temporary or permanent condition.

68 Major Status Codes (1)

Status Code | Reason | Description
100 | Continue | The initial part of a request has been received and has not yet been rejected by the server. The server intends to send a final response after the request has been fully received and acted upon. A client that is about to send a (presumably large) message body may wish to receive a 100 (Continue) interim response before actually transmitting the message body. A client that will wait for a 100 (Continue) response before sending the request message body must send an Expect: 100-continue header field.

69 Major Status Codes (2)

Status Code | Reason | Description
200 | OK | The request has been fulfilled. The payload sent in the response depends on the request method. For example, in a response to a GET request the payload is a representation of the target resource.
201 | Created | The request has been fulfilled and has resulted in one or more new resources being created.
202 | Accepted | The request has been accepted for processing, but the processing has not been completed.
204 | No Content | The request has been fulfilled and there is no payload body in the response.
206 | Partial Content | A range request has been fulfilled.

70 Major Status Codes (3)

Status Code | Reason | Description
300 | Multiple Choices | The target resource has more than one representation. For methods other than HEAD, the server should generate a payload in the response containing a list of representation metadata and URI reference(s) from which the user or user agent can choose the one most preferred.
301 | Moved Permanently | The target resource has been assigned a new permanent URI and any future references to this resource ought to use the new URI.
302 | Found | The target resource resides temporarily under a different URI.
303 | See Other | The server is redirecting the user agent to a different resource, which is intended to provide an indirect response to the original request.
304 | Not Modified | The server has received a conditional GET or HEAD request. There is no need for the server to transfer a representation of the target resource because the request indicates that the client already has a valid representation.

71 Major Status Codes (4)

Status Code | Reason | Description
400 | Bad Request | The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax).
401 | Unauthorized | The request lacks valid authentication credentials for the target resource.
403 | Forbidden | The server understood the request but refuses to authorize it. If authentication credentials were provided in the request, the server considers them insufficient to grant access.
404 | Not Found | The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.
405 | Method Not Allowed | The method is known by the origin server but not supported by the target resource.
406 | Not Acceptable | The target resource does not have a current representation that would be acceptable to the user agent and the server is unwilling to supply a default representation (content negotiation).

72 Major Status Codes (5)

Status Code | Reason | Description
500 | Internal Server Error | The server encountered an unexpected condition that prevented it from fulfilling the request.
501 | Not Implemented | The server does not support the functionality (method) required to fulfill the request.
503 | Service Unavailable | The server is currently unable to handle the request (due to, e.g., a temporary overload or scheduled maintenance).

73 Fun with Status Codes

● Let’s have some fun: – 418 (I'm a teapot) ● Larry Masinter, Hyper Text Coffee Pot Control Protocol (HTCPCP/1.0), RFC 2324, 1 April 1998. https://www.rfc-editor.org/rfc/rfc2324 ● Imran Nazar, The Hyper Text Coffee Pot Control Protocol for Tea Efflux Appliances (HTCPCP-TEA), RFC 7168, 1 April 2014. https://www.rfc-editor.org/rfc/rfc7168 – HTTP Cats https://http.cat/ – HTTP Status Dogs https://httpstatusdogs.com/

74 Content Negotiation (1)

● An origin server might be provided with, or be capable of generating, multiple representations that are each intended to reflect the current state of a target resource. – For example, representations might have different formats, languages, or encodings. – Likewise, different users or user agents might have differing capabilities, characteristics, or preferences. ● Content negotiation is a mechanism for selecting the most appropriate representation to a given request.

75 Content Negotiation (2)

● HTTP/1.1 defines the following two patterns of content negotiation: – Proactive: the server selects the representation based upon the user agent's stated preferences. ● This is also known as server-driven negotiation. – Reactive: the server provides a list of representations for the user agent to choose from. ● This is also known as agent-driven negotiation. ● Further patterns exist, and the patterns are not mutually exclusive.

76 Proactive Negotiation (1)

● The origin server uses an algorithm to select the preferred representation. ● Selection is based on the available representations for a response compared to various information supplied in the request, including certain header fields and implicit characteristics, such as the client's network address. ● The following header fields can be used for the selection: – Accept, Accept-Charset, Accept-Encoding, Accept-Language, User-Agent ● A Vary header field in a response subject to proactive negotiation indicates what parts of the request (what request header fields) might influence the origin server for selecting the representation.
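A very small selection algorithm based on Accept-Language alone can be sketched as follows. The function names parse_weighted and select_language are hypothetical, and the matching is deliberately naive: a language range matches only an identical tag or the wildcard '*', which is simpler than the matching schemes real servers use.

```python
def parse_weighted(value):
    """Map each listed token (lowercased) to its q weight."""
    prefs = {}
    for item in value.split(","):
        token, _, params = item.strip().partition(";")
        q = 1.0                                   # the default weight is 1
        name, _, val = params.strip().partition("=")
        if name.strip().lower() == "q" and val:
            q = float(val)
        prefs[token.strip().lower()] = q
    return prefs

def select_language(accept_language, available, default):
    """Pick the available language the user agent prefers most."""
    prefs = parse_weighted(accept_language)

    def weight(lang):
        # Exact match first, then the wildcard, otherwise not acceptable.
        return prefs.get(lang.lower(), prefs.get("*", 0.0))

    best = max(available, key=weight)
    chosen = best if weight(best) > 0 else default
    # Vary tells caches which request header field influenced the selection.
    return chosen, {"Content-Language": chosen, "Vary": "Accept-Language"}
```

Falling back to a default representation when nothing matches is one possible policy; a server could instead answer with 406 (Not Acceptable).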

77 Proactive Negotiation (2)

● Advantageous: – When the algorithm for selecting from among the available representations is difficult to describe to a user agent. – When the server desires to send its “best guess” to the user agent along with the first response, to avoid a subsequent request.

78 Proactive Negotiation (3)

● Disadvantages: – It is impossible for the server to accurately determine what might be “best” for any given user, since that would require complete knowledge of both the capabilities of the user agent and the intended use for the response. – Having the user agent describe its capabilities in every request can be both very inefficient and a potential risk to the user's privacy. – Complicates the implementation of an origin server and the algorithms for generating responses to a request. – Limits the reusability of responses for shared caching.

79 Proactive Negotiation (4)

● Example: – curl -v http://www.gnu.org/ -H "Accept-Language: fr"

> GET / HTTP/1.1
> User-Agent: curl/7.78.0
> Host: www.gnu.org
> Accept: */*
> Accept-Language: fr
>
< HTTP/1.1 200 OK
< Date: Sun, 15 Aug 2021 11:37:57 GMT
< Server: Apache/2.4.7
< Content-Location: home.fr.html
< Vary: negotiate,accept-language,Accept-Encoding
< TCN: choice
< Access-Control-Allow-Origin: (null)
< Accept-Ranges: bytes
< Cache-Control: max-age=0
< Expires: Sun, 15 Aug 2021 11:37:57 GMT
< Transfer-Encoding: chunked
< Content-Type: text/html
< Content-Language: fr
<
< ...

80 Reactive Negotiation (1)

● Selection of the best response representation is performed by the user agent after receiving an initial response from the origin server that contains a list of resources for alternative representations. – Selection of alternatives might be performed automatically by the user agent or manually by the user.

81 Reactive Negotiation (2)

● Advantageous: – When the response would vary over commonly used dimensions (such as type, language, or encoding). – When the origin server is unable to determine a user agent's capabilities. ● Disadvantages: – After obtaining the list of alternative representations the user agent must make a second request to obtain the desired representation. – HTTP/1.1 does not define a mechanism for supporting automatic selection.
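The client-side selection step can be sketched as follows. Since HTTP/1.1 does not define a format for the list of alternatives, the list structure used here (dictionaries with url, type and language keys), the function name, and the example URLs are all hypothetical and chosen only for illustration.

```python
def choose_alternative(alternatives, preferred_types, preferred_languages):
    """Pick the alternative that ranks best in the user agent's preferences.

    Media type is compared first, language second; alternatives that are
    not mentioned in a preference list rank after all mentioned ones.
    """
    def position(value, preferences):
        return preferences.index(value) if value in preferences else len(preferences)

    def rank(alternative):
        return (position(alternative["type"], preferred_types),
                position(alternative["language"], preferred_languages))

    return min(alternatives, key=rank) if alternatives else None


# Hypothetical list of alternatives taken from an initial response
alternatives = [
    {"url": "/report.en.html", "type": "text/html", "language": "en"},
    {"url": "/report.fr.html", "type": "text/html", "language": "fr"},
    {"url": "/report.en.pdf", "type": "application/pdf", "language": "en"},
]
best = choose_alternative(alternatives, ["text/html"], ["fr", "en"])
```

The user agent would then issue the second request for best["url"], which is the extra round trip listed among the disadvantages.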

82 Redirection (1)

● Example: – curl -v http://w3.org/

> GET / HTTP/1.1
> Host: w3.org
> User-Agent: curl/7.78.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Content-length: 0
< Location: http://www.w3.org/
<

83 Redirection (2)

● Example: – curl -v -L http://w3.org/

> GET / HTTP/1.1
> User-Agent: curl/7.78.0
> Host: w3.org
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Content-length: 0
< Location: http://www.w3.org/
<
> GET / HTTP/1.1
> Host: www.w3.org
> User-Agent: curl/7.78.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ...

84 Redirection (3)

● Example: – curl -v http://dbpedia.org/resource/Hungary

> GET /resource/Hungary HTTP/1.1
> User-Agent: curl/7.78.0
> Host: dbpedia.org
> Accept: */*
>
< HTTP/1.1 303 See Other
< Server: nginx/1.18.0
< Date: Sun, 15 Aug 2021 12:23:18 GMT
< Content-Type: text/html
< Content-Length: 153
< Connection: keep-alive
< Location: https://dbpedia.org/resource/Hungary
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Methods: GET, POST, OPTIONS
< Access-Control-Allow-Headers: Depth,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Encoding
<
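What the -L option of curl does in the traces above can be approximated by the following Python sketch. It is an illustration under assumptions, not curl's implementation: the function name and the fetch callable (which returns a status code and a dictionary of header fields for a URL) are hypothetical, and the example replays the w3.org trace from a table instead of touching the network.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_redirects=10):
    """Follow Location header fields until a non-redirect response arrives.

    fetch(url) must return (status, headers). Returns (final_url, status, hops),
    where hops lists the URLs visited after the initial one.
    """
    hops = []
    for _ in range(max_redirects + 1):
        status, headers = fetch(url)
        location = headers.get("Location")
        if status not in REDIRECT_STATUSES or location is None:
            return url, status, hops
        # Location may be a relative reference, so resolve it against the current URL
        url = urljoin(url, location)
        hops.append(url)
    raise RuntimeError("too many redirects")


# Replay of the w3.org trace shown earlier, used in place of a real HTTP client
recorded = {
    "http://w3.org/": (301, {"Location": "http://www.w3.org/"}),
    "http://www.w3.org/": (200, {}),
}
final_url, status, hops = follow_redirects("http://w3.org/", recorded.__getitem__)
```

The limit on the number of hops guards against redirect loops. For a 303 response the follow-up request is a GET regardless of the original method, which this GET-only sketch does not need to model.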

85 Content Negotiation (1)

● Example: – curl --http1.1 -H "Accept-Language: it" -L -v \
https://www.mozilla.org/

> GET / HTTP/1.1
> Host: www.mozilla.org
> User-Agent: curl/7.78.0
> Accept: */*
> Accept-Language: it
>
< HTTP/1.1 301 Moved Permanently
< Date: Fri, 10 Sep 2021 19:53:54 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Location: /it/
< Vary: Accept-Language
< ...
<

86 Content Negotiation (2)

● Example (continued):

> GET /it/ HTTP/1.1
> Host: www.mozilla.org
> User-Agent: curl/7.78.0
> Accept: */*
> Accept-Language: it
>
< HTTP/1.1 200 OK
< Date: Fri, 10 Sep 2021 19:53:54 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 102402
< ...
< Server: cloudflare
<

87 Content Negotiation (3)

● Example: – curl -v -L -H "Accept: application/json" \
http://dbpedia.org/resource/Hungary

> GET /resource/Hungary HTTP/1.1
> Host: dbpedia.org
> User-Agent: curl/7.78.0
> Accept: application/json
>
< HTTP/1.1 303 See Other
< Server: nginx/1.18.0
< Date: Mon, 16 Aug 2021 13:59:32 GMT
< Content-Type: text/html
< Content-Length: 153
< Connection: keep-alive
< Location: https://dbpedia.org/data/Hungary.json
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Methods: GET, POST, OPTIONS
< Access-Control-Allow-Headers: Depth,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Encoding
<
〈HTML body: 303 See Other, nginx/1.18.0〉

88 Content Negotiation (4)

● Example (continued):

> GET /data/Hungary.json HTTP/1.1
> Host: dbpedia.org
> User-Agent: curl/7.78.0
> Accept: application/json
>
< HTTP/1.1 200 OK
< Date: Mon, 16 Aug 2021 13:59:32 GMT
< Content-Type: application/json
< Content-Length: 1585877
< Connection: keep-alive
< Vary: Accept-Encoding
< Server: Virtuoso/08.03.3322 (Linux) x86_64-generic-linux-glibc25 VDB
< Expires: Mon, 23 Aug 2021 14:03:03 GMT
< ...
<
〈JSON data〉

89 Content Negotiation (5)

● Example (continued):
– curl -v -L -H "Accept: application/rdf+xml" \
  http://dbpedia.org/resource/Hungary \
  -o Hungary.rdf
– curl -v -L -H "Accept: text/turtle" \
  http://dbpedia.org/resource/Hungary \
  -o Hungary.ttl
– curl -v -L -H "Accept: text/html" \
  http://dbpedia.org/resource/Hungary \
  -o Hungary.html
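The same requests can be issued from Python with the standard urllib.request module, as in the sketch below. The helper names and the FORMATS table are assumptions made for this example; urlopen follows redirects by default, which roughly corresponds to the -L option of curl. Nothing is sent until download is called.

```python
import urllib.request

RESOURCE = "http://dbpedia.org/resource/Hungary"

# Output file name -> media type requested in the Accept header field
FORMATS = {
    "Hungary.rdf": "application/rdf+xml",
    "Hungary.ttl": "text/turtle",
    "Hungary.html": "text/html",
}


def build_request(url, media_type):
    """Create a GET request that states the preferred media type."""
    return urllib.request.Request(url, headers={"Accept": media_type})


def download(url, media_type, filename):
    """Send the request, save the body, and return the response Content-Type."""
    with urllib.request.urlopen(build_request(url, media_type)) as response:
        with open(filename, "wb") as out:
            out.write(response.read())
        return response.headers.get("Content-Type")
```

Calling download(RESOURCE, media_type, filename) for each entry of FORMATS would mirror the three curl commands; the returned Content-Type shows which representation the server actually selected.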

90 Web Server Software (1)

● Market share of web servers: – Netcraft Web Server Survey https://news.netcraft.com/archives/category/web-server-survey/ ● See the following to understand the figures: How many active sites are there? https://www.netcraft.com/active-sites/

91 Web Server Software (2)

● The most widely used web servers:

Name | Platform | License | Comment
Apache HTTP Server http://httpd.apache.org/ | cross-platform | Apache License 2.0 |
nginx https://nginx.org/en/ | cross-platform | Simplified BSD License | Pronunciation: engine x. Web server and reverse proxy. Customers: Dropbox, Last.fm, Netflix, SourceForge, …
Internet Information Services (IIS) https://www.iis.net/ | Windows | non-free software |
Google Web Server (GWS) | Linux? | non-free software | Custom-developed web server used by Google.

92 Web Server Software (3)

● A few other notable web servers:

Name | Platform | License | Comment
Apache Traffic Server https://trafficserver.apache.org/ | Unix | Apache License 2.0 | Caching reverse proxy server developed originally by Yahoo!
Jetty https://www.eclipse.org/jetty/ | Java | Apache License 2.0/Eclipse Public License v1.0 |
Lighttpd https://www.lighttpd.net/ | Linux | New BSD License |
Squid http://www.squid-cache.org/ | cross-platform | GPLv2 | Caching proxy server
Varnish https://www.varnish-cache.org/ | Unix | Simplified BSD License | Caching reverse proxy server

93 Java Support

● Java SE 8: – java.net.HttpURLConnection https://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html ● JDK 11: – See the java.net.http.HttpClient class provided by the java.net.http module. https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/package-summary.html

94 Client Libraries

● Java: – Apache HttpComponents – HttpClient (license: Apache License 2.0) https://hc.apache.org/httpcomponents-client-ga/

● API documentation: https://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/ – Google HTTP Client Library for Java (license: Apache License 2.0) https://googleapis.github.io/google-http-java-client/ https://github.com/google/google-http-java-client – Jetty HTTP client (license: Apache License 2.0/Eclipse Public License v1.0) https://www.eclipse.org/jetty/ ● Documentation: https://www.eclipse.org/jetty/documentation/jetty-11/programming-guide/index.html#pg-client-http – OkHttp (license: Apache License 2.0) https://square.github.io/okhttp/ https://github.com/square/okhttp – Unirest for Java (license: MIT License) http://kong.github.io/unirest-java/ https://github.com/kong/unirest-java ● Python: – Requests: HTTP for Humans (license: Apache License 2.0) https://requests.readthedocs.io/ https://github.com/psf/requests – urllib3 https://urllib3.readthedocs.io https://github.com/urllib3/urllib3

95 Further Recommended Reading

● MDN Web Docs – HTTP https://developer.mozilla.org/docs/Web/HTTP ● Apache HTTP Server Documentation https://httpd.apache.org/docs/ ● Ilya Grigorik, High Performance Browser Networking. O'Reilly, 2013. https://hpbn.co/
