Uniform Resource Identifier (URI)

Péter Jeszenszky
Faculty of Informatics, University of Debrecen
[email protected]
Last modified: September 9, 2019

URI (1)

● Uniform Resource Identifier (URI):
  – A compact sequence of characters that identifies an abstract or physical resource.
● A resource is not necessarily available on the Web.
● URIs can be assigned even to objects from the real world or to concepts.
● Current standard:
  – Tim Berners-Lee, Roy Fielding, Larry Masinter, Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, January 2005. https://tools.ietf.org/html/rfc3986

URI (2)

● Each URI begins with a scheme name that is separated by a ':' character from the scheme-specific part of the URI.
  – Scheme specifications can define their scheme-specific syntax within certain limits.
● The organization responsible for the administration of the URI schemes:
  – Internet Assigned Numbers Authority (IANA) https://www.iana.org/
● See: Uniform Resource Identifier (URI) Schemes https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

Well-Known URI Schemes

● file:
  – Matthew Kerwin, The "file" URI Scheme, RFC 8089, February 2017. https://tools.ietf.org/html/rfc8089
● http/https:
  – Roy T. Fielding (ed.), Julian F. Reschke (ed.), Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, RFC 7230, June 2014. https://tools.ietf.org/html/rfc7230
● mailto:
  – Martin Dürst, Larry Masinter, Jamie Zawinski, The 'mailto' URI Scheme, RFC 6068, October 2010. https://tools.ietf.org/html/rfc6068

Dereferencing

● Accessing the resource identified by a URI.
  – In most cases, "access" means the retrieval of a representation of the resource.

URL vs URN (1)

● Historically, two disjoint types of URIs were distinguished:
  – Uniform Resource Locator (URL):
    ● Identifying resources by their location.
    ● Tim Berners-Lee, Larry Masinter, Mark P. McCahill, Uniform Resource Locators (URL), RFC 1738, December 1994. https://tools.ietf.org/html/rfc1738
  – Uniform Resource Name (URN):
    ● Persistent and location-independent resource identifiers.
    ● Ryan Moats, URN Syntax, RFC 2141, May 1997. https://tools.ietf.org/html/rfc2141

URL vs URN (2)

● This former classification is now obsolete:
  – Michael Mealling (ed.), Ray Denenberg (ed.), Report from the Joint W3C/IETF URI Planning Interest Group: Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names (URNs): Clarifications and Recommendations, RFC 3305, August 2002. https://tools.ietf.org/html/rfc3305
  – URIs, URLs, and URNs: Clarifications and Recommendations 1.0, Report from the joint W3C/IETF URI Planning Interest Group (W3C Note, 21 September 2001) https://www.w3.org/TR/uri-clarification/

URL vs URN (3)

● According to the contemporary view, a URI can be a locator, a name, or both at the same time.
  – A URI scheme does not need to be cast into one of a discrete set of URI types, such as URL or URN.
● URL is an informal concept: it means a URI that identifies a resource via a representation of its primary access mechanism (for example, its network "location").

URN

● A Uniform Resource Name (URN) is a persistent, location-independent resource identifier.
● A URN is a URI that is assigned under the urn URI scheme.
● See:
  – Peter Saint-Andre, John Klensin, Uniform Resource Names (URNs), RFC 8141, April 2017. https://tools.ietf.org/html/rfc8141

WHATWG standard (1)

● URL Living Standard (last updated: 19 August 2019) https://url.spec.whatwg.org/
● Goals:
  – Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process.
  – Standardize on the term URL.
  – Enhance URL's existing JavaScript API.

WHATWG standard (2)

● Handle URIs and IRIs uniformly.
● A URL is a universal identifier.

URI vs URL (IETF vs WHATWG)

● See:
  – Daniel Stenberg. My URL isn't your URL. May 11, 2016. https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/
  – Daniel Stenberg. One URL standard please. January 30, 2017. https://daniel.haxx.se/blog/2017/01/30/one-url-standard-please/

URI Examples

● http://www.ietf.org/rfc/rfc3986.txt
● https://url.spec.whatwg.org/#references
● file:///usr/lib/R/library
● mailto:[email protected]
● ldap://ldap..com/dc=example,dc=com
● tel:+36-52-512-900
● news:comp.lang.c
● urn:isbn:0-395-36341-1
● urn:ietf:std:66
● urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
● geo:47.5539464,21.6215658

URI Characters (1)

● Characters allowed in URIs:
  – The following are reserved characters (used as delimiters):
    ● ':', '/', '?', '#', '[', ']', '@', '!', '$', '&', ''', '(', ')', '*', '+', ',', ';', '='
  – The following are unreserved characters:
    ● 'A', …, 'Z', 'a', …, 'z'
    ● '0', …, '9'
    ● '-', '.', '_', '~'

URI Characters (2)

● Percent-encoding: used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.
  – A percent-encoded octet is encoded as a character triplet %hh, consisting of the '%' character followed by the two hexadecimal digits representing that octet's numeric value.
    ● For example, %20 is the percent-encoding of the space character.
    ● Both the uppercase ('A', …, 'F') and the lowercase ('a', …, 'f') hexadecimal digits can be used.
    ● If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent.

URI Characters (3)

● Percent-encoding examples:
  – file:///media/Movies/What's Up, Tiger Lily? (1966)/ → file:///media/Movies/What%27s%20Up%2C%20Tiger%20Lily%3F%20%281966%29/
  – Assuming UTF-8 character encoding: http://www.w3.org/People/Dürst/ → http://www.w3.org/People/D%C3%BCrst/
● The specification does not mandate any particular character encoding.
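The two percent-encoding examples can be reproduced with Python's standard library. This is only a sketch: the helper name percent_encode is made up for illustration, and the choice to leave '/' and ':' unencoded is an assumption that suits these two URIs rather than a general rule.

```python
from urllib.parse import quote, unquote


def percent_encode(uri: str) -> str:
    """Percent-encode every character except the unreserved ones, '/' and ':'.

    Hypothetical helper: quote() always leaves letters, digits and '-._~'
    untouched; '/' and ':' are kept so the scheme and path separators of the
    examples survive. Non-ASCII characters are encoded as UTF-8 octets first.
    """
    return quote(uri, safe="/:", encoding="utf-8")


movie = "file:///media/Movies/What's Up, Tiger Lily? (1966)/"
person = "http://www.w3.org/People/Dürst/"

print(percent_encode(movie))
print(percent_encode(person))

# The case of the hexadecimal digits does not matter when decoding.
print(unquote("D%c3%bcrst") == unquote("D%C3%BCrst"))
```

Note that quote() emits uppercase hexadecimal digits, while unquote() accepts both cases.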

URI Syntax (1)

● Syntax is organized hierarchically.
  – Components are listed in order of decreasing significance from left to right.
● Generic syntax: scheme ':' hier-part ['?' query] ['#' fragment]
  – The hier-part component may consist of an authority and a path component; its syntax is: '//' authority path or path
    ● When authority is present, the path must either be empty or begin with a '/' character.
    ● When authority is not present, the path cannot begin with two '/' characters.

URI Syntax (2)

● Example:

  https://wordery.com/search?term=scotland#header
  \___/   \_________/\_____/ \___________/ \____/
    |          |        |          |         |
  scheme   authority   path      query    fragment

● Example:

  mailto:[email protected]?subject=XML
  \____/ \_______________/ \_________/
    |            |              |
  scheme        path          query

Authority

● The name comes from the fact that the name space defined by the remainder of the URI is under the authority's jurisdiction.
● Syntax: [userinfo '@'] host [':' port]
  – A URI scheme may define a default port.
    ● For example, the http scheme defines a default port of 80.

Path

● A sequence of path segments separated by a '/' character.
● Terminated by the first '?' or '#', or by the end of the URI.
● The path segments '.' and '..' can be used just as in some operating systems' file directory structures.
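The generic syntax and the authority syntax can be explored with urlsplit from Python's standard library. This is a minimal sketch: the URI with userinfo and port (user@example.org:8080) is a made-up illustration, and the DEFAULT_PORTS table is an assumption that lists only the http default named on the slide.

```python
from urllib.parse import urlsplit

# scheme ':' hier-part ['?' query] ['#' fragment]
parts = urlsplit("https://wordery.com/search?term=scotland#header")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Authority: [userinfo '@'] host [':' port] (hypothetical example URI)
auth = urlsplit("http://user@example.org:8080/a/b")
print(auth.username, auth.hostname, auth.port)

# When no port is given, .port is None and the scheme default applies.
DEFAULT_PORTS = {"http": 80}  # assumption: only the default named on the slide
w3 = urlsplit("http://www.w3.org/")
effective_port = w3.port if w3.port is not None else DEFAULT_PORTS[w3.scheme]
print(effective_port)
```

In urlsplit's result the netloc attribute holds the whole authority component, while username, hostname and port expose its parts.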

Query

● Indicated by the first '?' character and terminated by a '#' character or by the end of the URI.
● Contains non-hierarchical data.
● Often contains name/value pairs of the form name '=' value delimited by an '&' character.
  – In the case of the http and https URI schemes the query component is used for submitting form data (see the application/x-www-form-urlencoded format).
  – Example:
    ● https://www.bookdepository.com/search?searchTerm=sherlock&search=Find+book
  – See: HTML Standard – URL-encoded form data https://html.spec.whatwg.org/multipage/forms.html#url-encoded-form-data

Fragment Identifier (1)

● Indicated by a '#' character and terminated by the end of the URI.
● Allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information.
  – The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations.
● The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource.
  – Media types may also define their own restrictions on or structures within the fragment identifier syntax.
● The fragment identifier is separated from the rest of the URI prior to a dereference.

Fragment Identifier (2)

● URI scheme specifications must define their own syntax so that every string matching their scheme-specific syntax is also an absolute URI, that is, a URI without a fragment identifier.
  – Scheme specifications will not define fragment identifier syntax or usage, regardless of its applicability to resources identifiable via that scheme, as fragment identification is orthogonal to scheme definition.

Meaning of the Fragment Identifier (1)

● text/html media type:
  – Fragment identifiers either refer to the indicated part of the document or provide state information for in-page scripts. https://www.iana.org/assignments/media-types/text/html
  – Detailed processing for fragment identifiers is defined in the HTML5 specification.
    ● See: Navigating to a fragment https://html.spec.whatwg.org/multipage/browsing-the-web.html#scroll-to-fragid
  – For example, the fragment identifier in the https://www.w3.org/blog/news/#w3c_footer URI refers to the element with id="w3c_footer".
  – For example, the fragment identifier in the https://www.youtube.com/watch?v=w0ffwDYo00Q#t=77 URI indicates the position from which playback will be started (at the 77th second).

Meaning of the Fragment Identifier (2)

● application/xml, text/xml and */*+xml media types:
  – The latter includes, for example, the following media types: application/xhtml+xml, image/svg+xml, model/x3d+xml
  – The syntax and semantics of fragment identifiers are based on the XPointer Framework specification. https://www.iana.org/assignments/media-types/text/xml
    ● XPointer Framework (W3C Recommendation, 25 March 2003) https://www.w3.org/TR/xptr-framework/
  – For example, the fragment identifier in the https://www.w3.org/TR/xml/#sec-bibliography URI refers to the element with identifier sec-bibliography in the document.

Absolute URI, URI-reference, relative reference

● Absolute URI: a URI without a fragment identifier.
  – Only absolute URIs can be used as a base URI.
● URI-reference: a URI or a relative reference.
● Relative reference: a scheme-specific subpart of a URI or a suffix of it (can be empty).
  – The specification does not use the term "relative URI" at all!
  – URIs are interpreted consistently regardless of context, whereas relative references are interpreted in a context.
  – Relative references are resolved to a URI against a base URI. The resulting URI is also known as the target URI.
  – The specification describes an algorithm for resolving relative references.

URI-reference Examples

● http://www.gnu.org/licenses/licenses.html
● http://www.w3.org/TR/xml/#abstract
● http://en.wikipedia.org/wiki/The_Beatles#History
● /pub/linux/kernel/v3.x/testing/
● ../../images/bullet.png
● index.html#contents
● contacts.xml#element(/1/2)
● #nav
● gpl.html
● empty string

Same-document reference

● A URI-reference that refers to a URI that is, aside from its fragment component (if any), identical to the base URI.
  – Example: empty string, #nav
  – A dereference should not result in a new retrieval action.

Establishing a Base URI

● Within certain media types, a base URI for relative references can be embedded within the content itself.
  – Thus, for example, documents can define their base URI.
    ● XML: the base URI can be specified by the xml:base attribute (see later).
    ● HTML: the base URI can be provided by the base element. https://html.spec.whatwg.org/multipage/semantics.html#the-base-element
● If no base URI is embedded and a representation is enclosed within another entity (for example, another document), then the base URI is the base URI of the entity in which the representation is encapsulated.
● If no base URI is embedded and the representation is not encapsulated within some other entity, then, if a URI was used to retrieve the representation, that URI is considered the base URI.
  – If the retrieval was the result of a redirected request, the last URI used is the base URI.
● Otherwise, the base URI is application-dependent.

Relative Reference Resolution Examples (1)

● Example:
  – [example document not preserved in this copy]
  – Resolution of the relative references:
    ● theme.css → http://example/docs/howto/theme.css
    ● /about → http://example/about
    ● ../images/logo.png → http://example/docs/images/logo.png

Relative Reference Resolution Examples (2)

● Let http://example/a/b/c?q be the base URI

  Relative Reference    Target URI
  #z                    http://example/a/b/c?q#z
  "" (empty string)     http://example/a/b/c?q
  .                     http://example/a/b/
  ./                    http://example/a/b/
  ..                    http://example/a/
  ../d                  http://example/a/d
  ../../d               http://example/d

Relative Reference Resolution Examples (3)

● Let http://example/a/b/c?q be the base URI

  Relative Reference    Target URI
  d                     http://example/a/b/d
  ./d                   http://example/a/b/d
  /d                    http://example/d
  //localhost           http://localhost
  ?y                    http://example/a/b/c?y
  d?y                   http://example/a/b/d?y
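The resolution tables for the base URI http://example/a/b/c?q can be checked with urljoin from Python's standard library, which resolves relative references for hierarchical schemes such as http. This is a sketch for experimenting, not a full implementation of the resolution algorithm in the specification; the names BASE and EXPECTED are illustrative.

```python
from urllib.parse import urljoin

BASE = "http://example/a/b/c?q"

# Relative reference -> target URI, copied from the two tables.
EXPECTED = {
    "#z": "http://example/a/b/c?q#z",
    "": "http://example/a/b/c?q",
    ".": "http://example/a/b/",
    "./": "http://example/a/b/",
    "..": "http://example/a/",
    "../d": "http://example/a/d",
    "../../d": "http://example/d",
    "d": "http://example/a/b/d",
    "./d": "http://example/a/b/d",
    "/d": "http://example/d",
    "//localhost": "http://localhost",
    "?y": "http://example/a/b/c?y",
    "d?y": "http://example/a/b/d?y",
}

for reference, target in EXPECTED.items():
    resolved = urljoin(BASE, reference)
    status = "ok" if resolved == target else "MISMATCH"
    print(f"{reference!r:16} -> {resolved}  [{status}]")
```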

URI Comparison (1)

● The scheme and host components are case-insensitive.
● The other syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme.
● For example, the http://www.w3.org/ and HTTP://www.W3.org/ URIs are equivalent.

URI Comparison (2)

● A possible definition of equivalence:
  – URIs should be considered equivalent when they identify the same resource.
  – This definition is not of much practical use, because in general there is no way to compare two resources.
● In practice, equivalence is determined by string comparison.
  – Normalization is applied before comparison, for example, uppercase letters are converted to lowercase letters in case-insensitive components.

Java Support

● java.net.URI class https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/net/URI.html
  – Based on the previous version of the standard.
    ● Tim Berners-Lee, Roy Fielding, Larry Masinter, Uniform Resource Identifiers (URI): Generic Syntax, RFC 2396, August 1998. https://tools.ietf.org/html/rfc2396

Internationalized Resource Identifier (1)

● IRIs consist of Unicode/UCS characters instead of US-ASCII characters.
● The syntax and use of components and reserved characters is the same as that in the URI specification.
● The range of unreserved characters is expanded to include Unicode/UCS characters.
● Example: http://駅街ガイド.jp/ガイド.jp/jp/
● Current standard:
  – Martin Dürst, Michel Suignard, Internationalized Resource Identifiers (IRIs), RFC 3987, January 2005. https://tools.ietf.org/html/rfc3987
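The comparison rules from the URI Comparison slides (case-insensitive scheme and host, equivalent hexadecimal digit case in percent-encoded octets) can be sketched as a small normalization step applied before string comparison. The function name normalize is hypothetical, and the sketch covers only these two rules, not every normalization a specification may allow.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def _upper_hex(text: str) -> str:
    """Uppercase the hex digits of every percent-encoded triplet."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def normalize(uri: str) -> str:
    """Hypothetical helper: lowercase scheme and host, unify %hh case."""
    parts = urlsplit(uri)
    # Keep userinfo as-is (it is case-sensitive); lowercase only host[:port].
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((
        parts.scheme.lower(),
        _upper_hex(netloc),
        _upper_hex(parts.path),
        _upper_hex(parts.query),
        _upper_hex(parts.fragment),
    ))


print(normalize("HTTP://www.W3.org/") == normalize("http://www.w3.org/"))
print(normalize("http://www.w3.org/People/D%c3%bcrst/"))
# The path stays case-sensitive:
print(normalize("http://www.w3.org/People") == normalize("http://www.w3.org/people"))
```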

Internationalized Resource Identifier (2)

● An IRI reference can be mapped to an equivalent URI reference:
  – For each character of the IRI reference that is not allowed in URI references, perform the following steps:
    ● Convert the character to a sequence of one or more octets using UTF-8.
    ● Convert each octet to %HH, where HH is the hexadecimal notation of the octet value (percent-encoding).
    ● Replace the original character with the resulting character sequence.
  – Example: http://www.w3.org/People/Dürst/ → http://www.w3.org/People/D%C3%BCrst/

Internationalized Resource Identifier (3)

● Advantages:
  – Suitable where the language uses an alphabet other than Latin.
● Risks:
  – Homograph attacks: tricking users by exploiting the fact that there are different Unicode characters that look alike.
    ● The risk also exists with URIs, see, for example, lame and 1ame, broken and br0ken.

XML Base (1)

● XML Base (Second Edition) (W3C Recommendation, 28 January 2009) https://www.w3.org/TR/xmlbase/
  – Provides a mechanism for defining base URIs in XML documents.
  – Introduces the xml:base attribute to specify a base URI.
    ● The attribute is inherited by descendant elements until another element with an xml:base attribute is encountered.

XML Base (2)

● Example:
  – [example XML document not preserved in this copy]
  – Resolution of the relative references:
    ● info.xml → http://example/books/untitled/info.xml, chapter1.xml → http://example/books/untitled/chapter1.xml
    ● thanks.xml → http://example/books/thanks.xml
    ● unsorted.xml → http://example/biblio/unsorted.xml

Media Fragment Addressing (1)

● Media fragments are subparts of media resources of type audio/*, image/* and video/*.
● Media-format independent standard means of addressing media fragments on the Web using URIs:
  – Media Fragments URI 1.0 (basic) (W3C Recommendation, 25 September 2012) https://www.w3.org/TR/media-frags/
● Media fragments are regarded along several different dimensions.

Media Fragment Addressing (2)

● Media fragment identification information in URIs can be provided in the query or the fragment identifier component.
  – Example:
    ● http://www.example.com/video.ogv?t=60,100
    ● http://www.example.com/video.ogv#t=60,100
  – The difference is that the query produces a new resource, whereas the fragment identifier refers to a secondary resource that has a relationship to the primary resource.
    ● URI fragments are resolved from the primary resource without another retrieval action.

Media Fragment Addressing (3)

● The media type of the retrieved fragment should be the same as the media type of the primary resource.
  – For example, a URI fragment that points to a single video frame out of a longer video results in a one-frame video, not in a still image.

Media Fragment Addressing (4)

● Media fragment identification in the fragment identifier component:
  – In traditional URI fragment retrieval, a user agent requests the complete primary resource from the server and then applies the fragmentation locally.
  – In the case of media fragments it is inefficient to retrieve the entire primary resource from the server.
    ● In practice a user agent may perform multiple requests (including range requests) to extract the media fragment.
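The difference between the query form and the fragment form of a media fragment URI can be seen by separating the fragment identifier, as a user agent does before dereferencing. A minimal sketch using urldefrag from Python's standard library; the variable names are illustrative.

```python
from urllib.parse import urldefrag

query_form = "http://www.example.com/video.ogv?t=60,100"
fragment_form = "http://www.example.com/video.ogv#t=60,100"

for uri in (query_form, fragment_form):
    # The part before '#' identifies what is retrieved; the fragment
    # identifier is kept by the user agent and interpreted locally.
    retrieved, fragment = urldefrag(uri)
    print(f"retrieve {retrieved!r}, keep fragment {fragment!r}")
```

With the query form the time range is part of the URI that is dereferenced, so it identifies a different resource; with the fragment form the dereferenced URI is that of the primary resource.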

Media Fragment Addressing (5)

● Media fragment addressing in the query component:
  – The media fragment is extracted by the server and is delivered to the user agent as a completely new resource.

Media Fragment Addressing (6)

● Syntax for media fragment addressing:
  – name=value pairs delimited with '&' characters, where name denotes a dimension.

Media Fragment Addressing (7)

● The specification defines the following dimensions:
  – Temporal (t): denotes a specific time range.
    ● Use case: start/stop playback at a specific time position.
  – Spatial (xywh): denotes a rectangular area in the original media specified by the coordinates of its top-left corner, its width and height, where (0,0) is the top-left corner of the original media.
    ● Two different use cases: highlight, crop.

Media Fragment Addressing (8)

● The specification defines the following dimensions: (continued)
  – Track (track): denotes one or more tracks in the original media.
    ● For example, "the english audio and the video track".
  – Id (id): denotes a named temporal fragment within the original media.
    ● For example, chapter2.

Media Fragment Addressing (9)

● An example for using the t dimension: time in the second column of the table is measured in seconds.

  Media fragment      Meaning
  t=10,20             [10,20)
  t=,10               [0,10)
  t=10                [10,end)
  t=01:38,03:52       [98,232)
  t=0:02:00,121.5     [120,121.5)

Media Fragment Addressing (10)

● An example for using the xywh dimension:

  Media fragment                 Meaning
  xywh=100,150,320,200           A rectangle of size 320×200 with top-left at x=100 and y=150
  xywh=pixel:100,150,320,200     The same as the previous one.
  xywh=percent:25,25,50,50       A rectangle of size 50%×50% with top-left at x=25% and y=25%

Media Fragment Addressing (11)

● An example for using the track dimension:
  – http://www.example.com/example.ogv#track=audio
● An example for using the id dimension:
  – http://www.example.com/example.ogv#id=chapter2

Media Fragment Addressing (12)

● All dimensions are logically independent and can be combined; the outcome is independent of the order of the dimensions.
  – Example: http://www.example.com/example.ogv#track=audio&t=10,20
● If the same dimension occurs more than once, only the last occurrence is considered.
  – The only exception is the track dimension: multiple track dimensions are allowed.
    ● Example: #track=1&track=2
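The rules above (name=value pairs, last occurrence wins except for track, time values given in seconds or as mm:ss and h:mm:ss, an optional unit prefix for xywh) can be sketched in a few lines of Python. All function names are hypothetical, and the sketch handles only the forms shown in the tables, not the full grammar of the specification.

```python
from urllib.parse import unquote, urlsplit


def parse_media_fragment(uri: str) -> dict:
    """Split the fragment identifier into dimensions (hypothetical helper)."""
    dimensions = {}
    for pair in urlsplit(uri).fragment.split("&"):
        name, sep, value = pair.partition("=")
        if not sep:
            continue  # not a name=value pair
        name, value = unquote(name), unquote(value)
        if name == "track":
            # track is the exception: every occurrence is kept
            dimensions.setdefault("track", []).append(value)
        else:
            dimensions[name] = value  # a later occurrence overrides an earlier one
    return dimensions


def to_seconds(text: str) -> float:
    """Convert '121.5', '01:38' or '0:02:00' to seconds."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def parse_t(value: str):
    """Return (start, end) in seconds; end is None when playback runs to the end."""
    start, _, end = value.partition(",")
    return (to_seconds(start) if start else 0.0,
            to_seconds(end) if end else None)


def parse_xywh(value: str):
    """Return (unit, x, y, w, h); the unit defaults to 'pixel'."""
    unit, sep, numbers = value.rpartition(":")
    x, y, w, h = (int(n) for n in numbers.split(","))
    return (unit if sep else "pixel", x, y, w, h)


dims = parse_media_fragment("http://www.example.com/example.ogv#track=audio&t=10,20")
print(dims, parse_t(dims["t"]))
print(parse_media_fragment("#track=1&track=2"))
print(parse_t("01:38,03:52"), parse_t("0:02:00,121.5"), parse_t("10"))
print(parse_xywh("percent:25,25,50,50"))
```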

Media Fragment Addressing (13)

● List of implementations:
  – Showcase – Media Fragments Working Group Wiki https://www.w3.org/2008/WebVideo/Fragments/wiki/Showcase
● An example for experimenting:
  – http://video.webmfiles.org/elephants-dream.webm#t=60,70

Media Fragment Addressing (14)

● Browser support:
  – See: https://caniuse.com/#feat=media-fragments
● Implementations (incomplete):
  – Firefox: supports only the time dimension
    ● See: Audio and Video Delivery – Specifying playback range https://developer.mozilla.org/en-US/Apps/Fundamentals/Audio_and_video_delivery#Specifying_playback_range
  – Chromium (Blink): supports only the time dimension
    ● See: Media Fragments support has landed in Chromium http://lists.w3.org/Archives/Public/public-media-fragment/2012Jan/0021.html

URL Shortening (1)

● Long URIs can be shortened using HTTP redirection.
● The aim of URL shortening is to create an http URI that points to the same resource, but is more aesthetic, more compact and can be displayed and communicated more easily.
  – Originally, the length of messages on Twitter was limited to 140 characters.
    ● See: Giving you more characters to express yourself https://blog.twitter.com/official/en_us/topics/product/2017/Giving-you-more-characters-to-express-yourself.html

URL Shortening (2)

● List of URL Shorteners https://bit.do/list-of-url-shorteners.php
● A number of websites provide URL shortening functionality:
  – Twitter (https://t.co/)
  – Google Maps (https://goo.gl/)
  – YouTube (http://y2u.be/)
  – …

URL Shortening: TinyURL

  Owner:             TinyURL, LLC.
  Web page:          https://tinyurl.com/
  HTTP status code:  301 (Moved Permanently)
  Registration:      no
  Custom URI:        yes
  URL information:   yes
  Tracking:          no
  API:               http://tinyurl.com/api-create.php?url
  Example:           http://tinyurl.com/qeo69bs
                     http://preview.tinyurl.com/qeo69bs
                     http://tinyurl.com/XmlJavaTypeAdapter
                     http://preview.tinyurl.com/XmlJavaTypeAdapter

URL Shortening: Bitly

  Owner:             Bitly, Inc.
  Web page:          https://bitly.com/
  HTTP status code:  301 (Moved Permanently)
  Registration:      optional
  Custom URI:        yes
  URL information:   yes
  Tracking:          yes
  API:               https://dev.bitly.com/
  Example:           http://bit.ly/1DueXyD
                     http://bitly.com/1DueXyD+
                     http://amzn.to/1EQFqrW
                     http://amzn.to/1EQFqrW+

URL Shortening: Google URL Shortener (1)

  Owner:             Google Inc.
  Web page:          https://goo.gl/
  HTTP status code:  301 (Moved Permanently)
  Registration:      optional
  Custom URI:        no
  URL information:   yes
  Tracking:          yes
  API:               https://developers.google.com/url-shortener/
  Example:           http://goo.gl/C5TblB
                     http://goo.gl/C5TblB.info

URL Shortening: Google URL Shortener (2)

● Starting March 30, 2018, Google began turning down support for the service.
  – See: Transitioning Google URL Shortener to Firebase Dynamic Links https://developers.googleblog.com/2018/03/transitioning-google-url-shortener.html