Web Information System Design No.5 Accessing Web Documents

Tatsuya Hagino ([email protected])

1 Web Basic Components

- Prepare documents using HTML and CSS
- Address documents by URL
- Get documents from server using HTTP

[Figure: a Web browser gets HTML documents, each addressed by a URL, from a Web server over the Internet using HTTP.]

2 URI (Uniform Resource Identifier)

- Identifier
  - Names and numbers that identify things (objects) and concepts
- Identifiers which we use:
  - identifier for students
  - identifier for books
  - identifier for phones
  - identifier for PCs
  - identifier on the Internet

[Figure: an identifier refers to an object or a concept.]

3 URL, URN and IRI

- URL (Uniform Resource Locator)
  - URI that points to the address of a resource
  - Example: http, ftp, file, ...

- URN (Uniform Resource Name)
  - urn:<NID>:<NSS>
  - The NID (namespace identifier) needs to be registered with IANA
    - http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xml
  - 50 URN namespaces were registered as of 2014-04-17
  - Example: 3gpp, cablelabs, cgi, clei, dgiwg, dvb, ebu, epc, epcglobal, eurosystem, example, fdc, fipa, geant, gsma, ietf, iptc, isan, isbn, iso, issn, ivis, liberty, mace, mpeg, nbn, nena, newsml, nfc, nzl, oasis, ogc, ogf, oid, oipf, oma, pin, publicid, s1000d, schac, service, smpte, swift, tva, uci, ucode, uuid, web3d, xmlorg, ...

- IRI (Internationalized Resource Identifier)
  - Internationalized URI
  - URI only allows ASCII characters
  - IRI allows Unicode characters
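
An IRI can be turned into a URI by percent-encoding its non-ASCII characters as UTF-8 bytes. The following is a minimal sketch of that step, not a complete converter: the function name `iri_to_uri` is hypothetical, the host name is assumed to be ASCII already, and the sample address is the DBpedia one used as an example elsewhere in these slides.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII characters of an IRI as UTF-8 bytes.

    Assumption: the host name is already ASCII. Internationalized host
    names need separate (IDNA) handling that this sketch leaves out.
    """
    parts = urlsplit(iri)
    # Keep URI delimiters and existing %XX escapes; encode everything else.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


uri = iri_to_uri("http://ja.dbpedia.org/page/東京都")
print(uri)  # http://ja.dbpedia.org/page/%E6%9D%B1%E4%BA%AC%E9%83%BD
```

A string that is already a plain ASCII URI passes through unchanged, so the function can be applied to either form.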

4 URI Generic Syntax

http://www.sfc.keio.ac.jp/teacher/hagino.html?title=web#lecture

Scheme: http
Authority: www.sfc.keio.ac.jp
Path: /teacher/hagino.html
Query: title=web
Fragment: lecture

- Scheme
  - Type of URI
  - Protocol
- Authority
  - Host name
  - Server name
- Path
  - Address inside the authority
  - File name
- Query
  - Query words
  - Parameters for interaction
- Fragment
  - Position inside the document
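
The five components can be checked with Python's standard `urllib.parse` module. This is only an illustration of the generic syntax on the example URI above; note that Python calls the authority `netloc`.

```python
from urllib.parse import parse_qs, urlsplit

parts = urlsplit("http://www.sfc.keio.ac.jp/teacher/hagino.html?title=web#lecture")

print(parts.scheme)    # http
print(parts.netloc)    # www.sfc.keio.ac.jp  (the authority)
print(parts.path)      # /teacher/hagino.html
print(parts.query)     # title=web
print(parts.fragment)  # lecture

# The query can be decoded further into parameters.
print(parse_qs(parts.query))  # {'title': ['web']}
```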

5 Axioms for URI

- Universality
  - Every resource has a URI.

- Global Scope
  - A URI always has the same meaning regardless of context.
  - It is unique in the global scope.

- Sameness
  - The same URI always means the same thing.
  - Representations may vary, but the meanings are the same.

- Opacity
  - A URI itself does not show its resource type.
  - Resource type and details need to be obtained from its representation.

6 Concept, Identifier and Representation

- Concept
  - Web Resource
- Identifier
  - URI (Uniform Resource Identifier)
- Representation
  - HTML+CSS
  - XML
  - RDF
  - ......

[Figure: the resource Keio University, its identifier http://www.keio.ac.jp/index.html, and its representations.]

7 Use of URI

- Web page address
  - URL: http://www.sfc.keio.ac.jp/about_sfc/video.html

- Specification
  - DTD: http://www.w3.org/TR/html4/loose.dtd
  - RFC: urn:ietf:rfc:2141

- Any Web resource
  - City: http://ja.dbpedia.org/page/東京都 (Tokyo)
  - Person: http://ja.dbpedia.org/page/安倍晋三 (Shinzo Abe)
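
The URN among the examples above, urn:ietf:rfc:2141, follows the layout "urn", namespace identifier (NID), namespace-specific string (NSS), separated by colons. A minimal sketch of splitting it, with a hypothetical helper name `split_urn` and no validation of the characters allowed in each part:

```python
def split_urn(urn: str) -> tuple:
    """Split a URN into (NID, NSS), following the urn:<NID>:<NSS> layout."""
    # Split on the first two colons only; the NSS may contain more colons.
    prefix, nid, nss = urn.split(":", 2)
    if prefix.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # The NID is case-insensitive, so normalize it to lower case.
    return nid.lower(), nss


nid, nss = split_urn("urn:ietf:rfc:2141")
print(nid)  # ietf
print(nss)  # rfc:2141
```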

8 HTTP (Hypertext Transfer Protocol)

- Protocol for manipulating Web resources

- Five main methods
  - HEAD
    - Get information of the resource
  - GET
    - Get a representation of the resource
  - PUT
    - Create or update the resource
  - DELETE
    - Delete the resource
  - POST
    - Send data for process

[Figure: HEAD and GET obtain from the Web resource, PUT and DELETE update it, and POST sends data to it for processing.]

9 HTTP and FTP

- FTP
  - For remote file manipulation
  - Used since the beginning of the Internet
  - User needs to log in to the FTP server
  - Uses two TCP connections (control and data)
  - Only supports two transfer types: text or binary
  - Anonymous FTP for software distribution

- HTTP
  - For Web resource manipulation
  - No need for user authentication
  - Uses a single TCP connection
  - Multimedia support

10 HTTP Request and Response

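HTTP request:

GET <URI> HTTP/1.0
<header>: <value>
<header>: <value>
...

[Figure: the user agent (browser) sends an HTTP request to the Web server, and the Web server returns an HTTP response for the Web resource.]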
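
The exchange in the figure can be sketched as plain string handling: the user agent writes a request line followed by header lines, and reads back a status line. The helper names `build_request` and `parse_status_line` and the sample host are assumptions made for illustration; no network connection is opened.

```python
def build_request(method: str, target: str, headers: dict) -> str:
    """Return the text of an HTTP/1.0 request without a body."""
    lines = [f"{method} {target} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines end with CRLF, and a blank line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line: str) -> tuple:
    """Split a response status line into (version, status code, reason)."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


request = build_request("GET", "/index.html", {"Host": "www.keio.ac.jp"})
print(request)
print(parse_status_line("HTTP/1.1 404 Not Found"))  # ('HTTP/1.1', 404, 'Not Found')
```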

HTTP response:

HTTP/1.1 <status code> <reason phrase>
<header>: <value>
<header>: <value>
...

status code  meaning
200          OK
301          Moved Permanently
303          See Other
401          Unauthorized
403          Forbidden
404          Not Found

11 GET and HEAD Methods

- GET method
  - Obtain a representation of the resource
  - Content negotiation
  - Language negotiation
- HEAD method
  - Obtain information about the resource and its representation
  - Subset of GET

[Figure: GET returns a representation of the Web resource, such as an HTML document or a movie, in Japanese or English.]

- Property of GET method
  - GET is safe to use multiple times
  - GET is idempotent
  - No side effect for GET
  - GET × GET = GET
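
The relation between HEAD and GET can be tried against a small local server. The sketch below is an illustration under stated assumptions, not a real Web server: the handler, the page body and the path /index.html are made up, and the server listens only on 127.0.0.1 on a port chosen by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>Keio University</body></html>"  # made-up page


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_HEAD(self):
        # HEAD: same status line and headers as GET, but no body.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch(port: int, method: str) -> tuple:
    """Send one request and return (status, Content-Length, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/index.html")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Length"), resp.read()
    finally:
        conn.close()


def demo() -> tuple:
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(port, "HEAD"), fetch(port, "GET"), fetch(port, "GET")
    finally:
        server.shutdown()
        server.server_close()


head, first_get, second_get = demo()
print(head)                     # headers only, empty body
print(first_get == second_get)  # repeating GET gives the same result
```

HEAD reports the same Content-Length as GET without transferring the body, and the two GET calls return identical results because the handler changes nothing on the server.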

12 Content Negotiation

- Document/image format

[Figure: the user agent sends "GET keioPenMark" with "Accept: image/png"; among keioPenMark.jpg, keioPenMark.png and keioPenMark.gif, the Web resource returns keioPenMark.png.]
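
On the server side, this choice can be sketched as matching the header values against the available variants. The code below is a simplified illustration: the function names and the mapping from media types to files are assumptions, only exact matches are considered, and wildcards such as `*/*` are ignored. Values may carry a weight `q` between 0 and 1, with 1 as the default.

```python
def parse_accept(header: str) -> dict:
    """Map each value in an Accept-style header to its q weight (default 1.0)."""
    weights = {}
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        value, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if value:
            weights[value.lower()] = q
    return weights


def negotiate(header: str, variants: dict):
    """Return the file of the acceptable variant with the highest weight, or None.

    `variants` maps a media type (or language tag) to a file name.
    """
    weights = parse_accept(header)
    best_file, best_q = None, 0.0
    for key, filename in variants.items():
        q = weights.get(key.lower(), 0.0)
        if q > best_q:
            best_file, best_q = filename, q
    return best_file


images = {
    "image/jpeg": "keioPenMark.jpg",
    "image/png": "keioPenMark.png",
    "image/gif": "keioPenMark.gif",
}
print(negotiate("image/png", images))  # keioPenMark.png
```

The same function works for an Accept-Language value such as `ja, en-us;q=0.8, en;q=0.7` when the variants are keyed by language tag instead of media type.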

- Language format

[Figure: the user agent sends "GET index.html" with "Accept-Language: ja, en-us;q=0.8, en;q=0.7"; among index.html.en, index.html.ja and index.html.kr, the Web resource returns index.html.ja.]

13 PUT and POST Methods

- PUT method
  - Create or update the resource of URI
  - Create or update a Web page
  - Inverse of GET
  - Browsers do not use PUT method
- POST method
  - Send data to URI resource
  - The resource processes the data
  - Used for FORM interaction

[Figure: GET and PUT on a Web resource.]

- GET vs POST method
  - Choose GET or POST in the FORM method attribute
  - Use GET for operations without side effects
  - Use POST for updating a resource with side effects
  - POST is not idempotent
  - POST × POST ≠ POST
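
The difference in idempotency can be seen with a toy model. `ResourceStore` below is a hypothetical in-memory stand-in for a server's resources, not an HTTP implementation; in particular, filing each POST under a new child URI is just one assumed way for a resource to process data.

```python
class ResourceStore:
    """Hypothetical in-memory stand-in for a Web server's resources."""

    def __init__(self):
        self.resources = {}
        self._next_id = 1

    def get(self, uri):
        # Reading never changes the store, so GET has no side effect.
        return self.resources.get(uri)

    def put(self, uri, representation):
        # The client names the URI; repeating the call leaves the same state.
        self.resources[uri] = representation

    def post(self, uri, data):
        # The resource at `uri` processes the data. Here it files each
        # submission under a new child URI, so every call changes the state.
        new_uri = f"{uri}/{self._next_id}"
        self._next_id += 1
        self.resources[new_uri] = data
        return new_uri


store = ResourceStore()
store.put("/pages/about", "<p>v1</p>")
store.put("/pages/about", "<p>v1</p>")
print(len(store.resources))  # 1: PUT twice equals PUT once

store.post("/comments", "hello")
store.post("/comments", "hello")
print(len(store.resources))  # 3: each POST added a resource
```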

14 Other Features of HTTP

- Transfer
  - Page move
- Authentication
  - User name and password
  - Basic and Digest authentication
- Virtual Host
  - Serve multiple hosts by single server
  - Use DNS alias
- Effective Use of TCP/IP
  - Persistent connection (keep-alive)
  - Pipeline processing
- Proxy cache control
  - max-age
  - no-cache
  - public and private
- Extension to WebDAV
  - COPY, MOVE, LOCK, UNLOCK
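
Of the features above, Basic authentication is simple enough to sketch: the user agent joins user name and password with a colon, Base64-encodes the result, and sends it in an Authorization header. The helper names and the credentials "user" and "pass" are made up for illustration. Base64 is an encoding, not encryption, so the password is not protected by this step itself.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an Authorization header for Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value: str) -> tuple:
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, token = header_value.split(" ", 1)
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    # Split on the first colon only; the password may contain colons.
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


value = basic_auth_header("user", "pass")
print("Authorization:", value)   # Authorization: Basic dXNlcjpwYXNz
print(decode_basic_auth(value))  # ('user', 'pass')
```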

15 Summary

- URI
  - Identifier for Web
  - URL, URN, IRI

- HTTP
  - Protocol for Web resource manipulation
  - HTTP URI
  - Methods: HEAD, GET, PUT, DELETE, POST

16 Group Work

- Group Work
  - Three students in one group

- Title: Proposal of HTML6
  - List problems of current HTML and propose a solution as HTML6
  - Problems regarding HTML document format, features, server environment, programming environment, and so on.

- Presentation
  - 5-minute presentation
  - The presentation material must be uploaded to SFC-SFS before noon of the presentation day.

- Date
  - Presentation on June 1st.
  - If there is no class because of the Keio-Waseda baseball match, presentation on June 8th.

- Note
  - The title slide must include the list of members (who contributed to the work).
