Web Protocols UR What? URI Syntax Internationalisation

Web Protocols UR What? URI Syntax Internationalisation

CS314 - 29 UR what? Web Protocols ! URI: Uniform Resource Identifier ! URI, URN, URL " Uniquely identifies a data entity " ! Internationalisation Obeys a specific syntax " schemeName:specificStuff ! Role of HTML and XML ! URN: Uniform Resource Name ! HTTP and HTTPS " A URI that only names something " interacting via the Web " Example: urn:isbn:0-534-38317-3 ! URL: Uniform Resource Locator " A URI that points to an actual resource " Example: http://en.wikipedia.org/wiki/URL 1 2 URI syntax Internationalisation ! The Unicode standard defines character sets for any URI = scheme ":" hier-part script [ "?" query ] [ "#" fragment ] " variable length character codes, usually encoded in bytes in UTF-8 format ! The hierarchical part can start with // and uses / to separate components. There are other reserved " 8-bit ASCII is a proper subset of UTF-8 characters ! Internationalising DNS names, URIs and email addresses in UTF-8 is not simple http://en.wikipedia.org/wiki/URL " Yet most people in the world have names like Fältström, scheme top of hierarchy (note reversal - next next name DNS writes right to left!) level level ! (DNS is case-independent but URI is case-sensitive) internationalised content and interaction 3 4 *ML *ML parsers ! In 1969, three IBMers invented GML (Generalised ! Strictly speaking, a pure SGML parser can parse Markup Language) HTML, XML or XHTML ! In the early 1980s it became SGML (Standard GML) ! In practice, HTML is written sloppily with proprietary ! Around 1990, Tim Berners-Lee and Robert Cailliau extensions invented HTML (HyperText Markup Language) as an application of SGML " browsers and XML consumers have to be more tolerant than a strict SGML parser " HTML is the format for hypertext documents on the Web ! The WorldWide Web Consortium developed XML " different browsers tolerate different deviations from the (eXtensible Markup Language) as a subset of SGML standards " primary format for data sharing in Web-based services " HTML files that don't cite a specific DTD (Document ! XHTML (eXtensible HTML) is an XML-conformant Type Definition) or omit some syntax elements are often redefinition of HTML tolerated by browsers 5 6 SGML DTDs *ML document format ! SGML text starts with a DTD declaration such as <!DOCTYPE elem1 PUBLIC "fpi" "path"> ! A document must obey its declared DTD elem1= the first SGML element in the document ! Thus, with the previous DTD the document must start with <html> and end with </html> fpi = formal public identifier of the DTD " Internally, all elements must conform to the syntax path = where to find the DTD text defined in the DTD ! Example " The semantics expressed in comments in the DTD <!DOCTYPE html PUBLIC need to be coded into whatever software is interpreting the document "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> 7 8 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" Getting Hyper "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> ! The key property of an HTML document is that it may <title>*ML document format</title> contain links to other HTML documents </head> <body> " These are called hyperlinks because they may jump <h1><font color=blue>*ML document format</font></h1> anywhere on the Internet, and the resulting interlinked <ul> documents are known as hypertext <li>A document must obey its declared DTD <li>Thus, with the previous DTD the document must start with ! A link is formally an anchor in HTML terminology: <html> and end with </html> <ul> <a href="http://www.google.com">Click for Google</a> <li>Internally, all elements must conform to the syntax defined in a for Anchor the DTD <li>The semantics expressed in comments in the DTD need to be href for Hypertext Reference coded into whatever software is interpreting the document </ul> </ul> </body> </html> 9 10 HTTP: Hypertext Transfer Protocol Important HTTP Methods ! Used for communication between Web clients ! GET (“browsers”) and Web servers " retrieve the entity indicated by the URI, e.g. an HTML " Principal use is to carry HTML documents identified by a URL document " Request/response protocol running over TCP (usually port 80) ! ! Request includes: POST " Method (see below) " post (send) the content in the request message to the server " URI (typically a URL) for processing, e.g. filled in forms from an HTML page " Request modifiers and optional content ! There are various other methods defined, but GET and ! Response includes: POST are by far the most important " Status or error code " some methods allow for remote content update " Meta-information about content " Content (typically an HTML document) 11 12 Content Encoding in HTTP Nested content causes repeated HTTP transactions ! HTTP message format generally resembles email <html><body> format, and similar methods are used to encode Here's some text and an image in the same content (international character sets, graphics, etc.) directory <img src="CoffeeBreak.jpg">. ! Thus, an image represented in an HTML document Here's <a href="http://w3.org">a like this: link.</a> <img src="CoffeeBreak.jpg"> </body></html> leads to HTTP content headers like this: ! There will be Content-Length: 49398 " a GET for the initial document, " a second automatic GET on the same HTTP connection for Connection: close the image, Content-Type: image/jpeg " a third GET on a new HTTP connection when the user clicks on the link followed by encoded JPEG format 13 14 POST and the Interaction Model HTML Form Options ! Possible input fields ! HTML documents can include forms, and browsers support data entry into forms " <form action="myScript.cgi" " method=POST> " <p>Enter your choice:<br> <input type=text name=Choice size=50 " value="Anything"></p> " <p><input type=submit value="Send"> " </p></form> ! Browser generates a POST message when user clicks the Send button, and the data entered is delivered to the script 15 16 Thankyou, Wikipedia! Scripting (Server Side) Scripting (Client Side) ! Data entered in a form is delivered to a script at the ! HTML content can also include scripts that run on the server client machine " The script must know the fields in the form and their " which, by the way, could include malicious code meanings " mainly written in JavaScript " The web server reaches the script through the CGI (Common Gateway Interface) ! JavaScript is not Java, but can be embedded in HTML " Many choices of scripting language, and ways to use documents general languages like C and Java " weakly typed and rather vaguely defined " Most common today (for complex applications) are Perl, " art rather than science PHP and Ruby (on Rails) ! HTML content can also trigger Java code " PHP5 is an object-oriented language 17 18 HTTP Caches HTTP Proxies ! ! HTTP supports caching An intermediate system can proxy HTTP requests and responses " locally, in the client system " a proxy can be configured as caching or non-caching " remotely, in an intermediate system ! Caching is obviously beneficial for large images etc. " dynamic content must not be cached " POST invalidates cached data ! HTTP cache-control directives include: no-cache #dynamic content max-age = <seconds>#short-lived data 19 20 HTTP security References ! An HTTP connection can be opened securely over TLS ! Shay 12.3 - 12.5 (SSL) by using the https: scheme ! URIs: RFC 3305, RFC 3986 " HTTPS generally listens on port 443 ! Internationalised DNS: RFC 4690 ! An insecure HTTP connection can use the HTTP upgrade ! Internationalised URIs: RFC 3987 header to upgrade to TLS, even when using port 80 ! HTML, XML and XHTML: " the HTTP CONNECT method can be used to upgrade to TLS through a proxy http://www.w3.org/TR/1999/REC-html401-19991224/ ! Beware! Client authentication can be secured via TLS, http://www.w3.org/TR/xml/ but HTTP server authentication is a minefield http://www.w3.org/TR/html/ ! HTTP(S): RFC 2616, 2817, 2818 Congratulations! You have connected securely ! PHP: http://php.net/ to StealMyPassword.net. ! JavaScript: http://www.openjs.com/ 21 22.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us