Web Information System Design No.5 Accessing Web Documents
Tatsuya Hagino ([email protected])
1 Web Basic Components
Prepare documents using HTML and CSS Address documents by URL Get documents from server using HTTP
HTML documentHTML HTML Internet documentHTML document documentHTML document URL
HTTP Web Browser Web Server
2 URI (Uniform Resource Identifier)
Identifier names and numbers that identify things (objects) and concepts
Identifiers which we use: identifier for students identifier for books Identifier identifier for phones identifier for PC Object identifier in Internet Concept
3 URL, URN and IRI
URL (Uniform Resource Locator) URI to point the address of resource Example: http, ftp, file, mailto
URN (Uniform Resource Name) urn:
IRI (Internationalized Resource Identifier) Internationalized URI URI only allows to use ascii characters IRI allows to use unicode characters
4 URI Generic Syntax
http://www.sfc.keio.ac.jp/teacher/hagino.html?title=web#lecture
Schema Authority Path Query Fragment
Schema Path Type of URI Address inside the authority Protocol File name
Authority Query Host name Query words Server name Parameters for interaction
Fragment Position inside the document
5 Axioms for URI
Universality Every Web resource has URI.
Global Scope URI always has the same meaning regardless of context. It is unique in the global scope.
Sameness URI means the same thing. Representations may vary, but the meanings are the same.
Opacity URI itself does not show its resource type. Resource type and details need to be obtained from its representation.
6 Concept, Identifier and Representation
Concept Web Resource http://www.keio.ac.jp/index.html Identifier Identifier Resource Keio University URI (Uniform Resource Identifier) Representation
Representation HTML+CSS
Keio University
RDF ......7 Use of URI
Web page address URL http://www.sfc.keio.ac.jp/about_sfc/video.html
Specification DTD http://www.w3.org/TR/html4/loose.dtd urn:ietf:rfc:2141 RFC
Any Web resource Person http://ja.dbpedia.org/page/東京都 City http://ja.dbpedia.org/page/安倍晋三
8 HTTP (Hypertext Transfer Protocol)
Protocol for manipulating Web resource
Five main methods HEAD GET HEAD Obtain Get information of the resource Process GET Web Get a representation of the resource Resource POST PUT Create or update the resource Update
DELETE PUT DELETE Delete the resource POST Send data for process
9 HTTP and FTP
FTP For remote file manipulation Used from the beginning of Internet User needs to login to FTP server Use two TCP connections (control and data) Only support two types: text or binary Anonymous FTP for software distribution
HTTP For Web resource manipulation No need for user authentication Use single TCP connection Multimedia support
10 HTTP Request and Response
GET
HTTP/1.1
GET method Obtain a representation of the resource HTML Document Movie Content negotiation GET Language negotiation Web Resource HEAD method Obtain information about the resource and its presentation GET Subset of GET Japanese English
Property of GET method GET is safe to use multiple times GET is idempotent No side effect for GET GET × GET = GET
12 Content Negotiation
Document/image format
GET keioPenMark Web Resource Accept: image/png keioPenMark.jpg User keioPenMark Agent keioPenMark.png
keioPenMark.png keioPenMark.gif
Language format
GET index.html index.html.en Accept-Language: ja, en-us;q=0.8, en;q=0.7 User index.html Agent index.html.ja index.html.ja index.html.kr 13 PUT and POST Methods
PUT method GET PUT Create or update the resource of URI Create or update a Web page Inverse of GET Browsers do not use PUT method Web Resource POST method Send data to URI resource The resource process the data Used for FORM interaction
GET vs POST method Choose GET or POST in FORM method attribute Use GET for non side effect operations Use POST for updating resource with side effect POST is not idempotent POST × POST ≠ POST
14 Other Features of HTTP
Transfer Effective Use of TCP/IP Page move Persistent connection (keep- alive) Pipeline processing Authentication User name and password Proxy cache control Basic and Digest authentication max-age no-cache public and private Virtual Host Serve multiple hosts by single server Extension to WebDAV Use DNS alias COPY, MOVE, LOCK, UNLOCK
15 Summary
URI Identifier for Web URL, URN, IRI
HTTP Protocol for Web resource manipulation HTTP URI Methods: HEAD, GET, PUT, DELETE, POST
16 Group Work
Group Work Three students in one group
Title: Proposal of HTML6 List problems of current HTML and propose a solution as HTML6 Problems regarding HTML document format, features, server environment, programming environment, and so on.
Presentation 5 minutes presentation The presentation material must be uploaded to SFC-SFS before noon of the presentation day.
Date Presentation on June 1st. If there is no class because of Keio Waseda baseball match, presentation on June 8th.
Note The title slide must include the list of members (who contribute the work).
17