<<

arXiv:1907.00259v1 [cs.DL] 29 Jun 2019 h,iflecdb i ok,sae h optrwrdo tod of world the shaped works, his by influenced who, h rgnlvso fhpreta rpsdb e esn[1 Nelson Ted by proposed as of vision original The documents , , Xanadu, hspprpeet oe n omlitrrtto fthe of interpretation formal and novel a presents paper This iil once ouet aldXnd,ge eodthe beyond goes Xanadu, called documents connected visibly hspprtist e akt h rgnlvso fhyperte of vision original the to back get to tries paper This [30]. tools) on focused (that 090-20:6 ae1o 1–4. of 1 Page 01:06. 2019-07-02 INTRODUCTION KEYWORDS media. n tteeryWChmpg http://www.w3.org/Xanadu.ht homepage W3C 2 early the at and 1 ABSTRACT atbtntlatteWb[20]. Web the least not but last tl at ob elzd i nunei iil hog p through visible is influence His realized. be to waits still resentation • CONCEPTS CCS Int the as in such infrastructures and information implementation current in for tion allow refer that and examples technologies with existing illustrated netw is and model formats Its data as protocols. such standards specific from hypert pendent infrastructure-agnostic hypertext: of vision inal oue nsml ik) n ntehpretrsac comm research hypertext the in and community links), literary simple the b on in focused also both has Nelson, ‘hypermedia’ from general differently [18 used more links build or to hypertext documents of quotations using concept of versions, instead (with documents markup...) build lay to links non-breaking uses promises it it particular In aspects. several in iie ytesaeo ouetpoesn ol n ysub- hypertext. of by demo and a tools not processing is paper document this of guidelines, state mission the by models Limited link-based beyond systems hypertext and 25] 6, f 2, technologies [1, transclusion p existing this 26], conten [12, 22] 28], identifiers [11, 19, analysis [14–17, format data works on research Nelson’s from draws from Apart heart. doc its transcludeable puts at that model formal a of specification enpoesdt hsppr e gr o noeve and overview an for 2 figure See a details paper. for this ://github.com/jakobib/hypertext2019 to t links processed transclusion of been traces are there however look closer e 5 o odeapeo hta culdm ae ih lo might paper demo actual an what of example good a for [5] See that proposal the in both Nelson references Berners-Lee Tim nomto systems Information • ; ua-etrdcomputing Human-centered → yetx languages Hypertext 1 nrsrcueAnsi Hypertext Infrastructure-Agnostic esnscr da ewr of network a idea, core Nelson’s yetx hyper- / Hypertext → l(929)[17]. (1992/93) ml e oteWb[,15] [4, Web the to led ebnznrl e B (VZG) GBV des Verbundzentrale ; ouetrep- Document klike. ok dsources. nd rteWeb the or x sinde- is ext ik and links ernet. a have hat [email protected] ne to ences uments t-based 2 over- , .The ]. tegra- eople 22] 4, unity tby xt na On [3]. ao Voß Jakob (that Web orig- aper een ork ay, rcia yetx ytmneseeual implementat executable needs system hypertext practical A ouetietfiri eaieysotdcmn htref that document short relatively a is identifier document A oueti nt euneo ye.Ti ento rough definition This bytes. of sequence finite a is document A h rhtcueo nifatutr-gotchypertext infrastructure-agnostic an of architecture The yetx ytmi tuple a is system hypertext A h lmnsadterrltosaeec ecie ntefo the in described each are relations their and elements The bination h functions the nte ouet dnieshv rprisdpnigon depending properties have Identifiers document. another ouetidentifiers Document UTF- as more. such many many formats and data PDF SVG, non-disjoint CSV, by dynamically grouped processed be are may uments they but [24] content by Docu static n datasets). k other The all transclude 29]. subsumes that therefore (datasets [8, paper this documents in as used hypertext data of of tion definition the with equates Documents ouetst a ahb rue no(osbyoverlappin of: (possibly consists further into system grouped hypertext The be formats. data each can sets Document ossso orbsceeet n hi relations: their and elements basic four of consists ntne fec lmn a ute egopdby grouped be further can element each of Instances OUTLINE omlmodel Formal definition. formal a after sections T U tag">C 4) 3) 2) 1) A S R E I D sa yetx sebefnto with function assemble hypertext an is sataslso ucinwith function transclusion a is sartivlfnto with function retrieval a is sasto ouetsget with segments document of set a is sasto dtlsswith lists edit of set a is dtlist edit locators content identifiers document documents sasto otn oaoswith locators content of set a is sasto ouetietfir with identifiers document of set a is documents of set a is sasget sg function usage segments a is h c , d obn at feitn ouetit e ones new into document existing of parts combine ∈ i R , U nld l nt,dgtlobjects digital finite, all include C , T × , D A eeec emnswti documents within segments reference spr of part is n ehdt elwehragvncom- given a whether tell to method a and eeec niiuldocuments individual reference E h ⊂ D R , D : S U I I , T oalwisuewith use its allow to C : C → E : , E S ⊂ P( → D , S → S I D , ⊂ ⊂ R D A C , D S U : ) × E , T D → , A i D aaformats data where: T et are ments . llowing system osof ions nsof inds r to ers Doc- . the ly g) o- 8, . Jakob Voß particular identifier system they belong to [29, pp. 59-71]. Identi- Data formats fiers in infrastructure-agnostic hypertext must first be unambigu- ous (an identifier must reference only one document), persistent (the An often neglected fundamental property of digital documents is reference must not change over time), and actionable (hypertext their grounding in data formats. A data format is a set of docu- systems should provide methods to retrieve documents via the re- ments that share a common data model, also known as their docu- trieval function R). Properties that should be fulfilled at least to ment model, and a common serialization (see fig. 1). Models define some degree include uniqueness (a document should not be refer- elements of a document in terms of sets, strings, tuples, graphs or enced by too many identifiers), performance (identifiers should be similar structures. These structures are mathematically rigorous in easy to compute and to validate), and distributedness (identifiers theory [24] but more based on descriptive patterns in practice [28]. should not require a central institution). The actual choice of an The meaning of these elements (for instance “words”, “sentences”, identifier system depends on weighting of specific requirements. A and “paragraphs” in a document model) is based on ideas that we promising choice is the application of content-based identifiers as at least assume to be consistent among different people. proposed at least once in the context of hypertext systems [12]. idea model format serialization Content locators 01101...

A content locator is a document that can be used to select parts of another document via transclusion. Nelsons refers to these locators Figure 1: levels of data modeling as “reference pointers” [19], exemplified with spans of bytes or char- acters in a document. Content locators depend on data formats and Data modeling, the act of mapping between ideas, models, and for- document models. For instance locator languages XPath, XPointer, mats is an unsolved problem because ideas can be expressed in and XQuery act on XML documents, which can be serialized in dif- many models and models can be interpreted in many ways [11]. ferent forms (therefore it makes no sense to locate parts of an XML Data models can further be expressed in multiple formats although document with positions of bytes). Other locator languages apply these formats should fully be convertible between each other, at to tabular data (SQL, RFC 7111), to graphs (SPARQL, GraphQL), or least in theory.4 Formats can further be serialized in multiple forms, to two-dimensional images (IIIF), to name a few. Whether and how which for their part are based on other data models. For instance the partsofa document can be selectedwith acontent locatorlanguage RDF model can be serialized in RDF/XML format which is based on depends on which data format the document is interpreted in. For the XML model. XML can be serialized XML syntax which is based instance an SVG image file can be processed at least as image, as on the model, and Unicode can be serialized in UTF-8. At XML document, or as Unicode string, each with its own methods the end of these chains of abstraction eventually all documents can of locating document segments. Content locators can be extended be serialized as sequence of bytes. Serializations are seldom sim- to all executable programs that reproducibly process some docu- ple mappings as most serialization formats allow insignificant vari- ments into other documents. This generalization can be useful to ances such as additional whitespace. To check whether a data ob- track data processing pipelines as hyperdata such as discussed for ject conforms to a serialization, formats are often described with executable papers and reproducible research. Restriction of content a formal grammar that can also give insights about the format’s locators to less powerful query languages might make sense from model.5 Infrastructure-agnostic hypertext does not impose any lim- a security point of view. its on possible data formats and their models.

Edit Lists EXAMPLE

An edit list is a document that describes how to construct a new doc- The following example may illustrate the formal model and its el- ument from parts of other documents. Edit lists are known as Edit ements. Let hD, I,C, E, S, R,U ,T, Ai be a hypertext system with D Decision Lists in Xanadu and the idea was borrowed from film mak- the set of printable ASCII character strings and some documents: ing [15]. Simplified forms of edit lists are implemented in version d1 = ‘My name is Alice’ control systems and in collaborative tools such as and real- d2 = ‘Alice’ time editing. Hypertext edit lists go beyond this one-dimensional c1 = ‘char=11,15’ case by support of multiple source documents and by more flex- d3 = ‘Hello, !’ ible methods of document processing in addition to basic opera- c2 = ‘char=7’ tions such as insert, delete, and replace. The actual processing steps d4 = ‘Hello, Alice!’ tracked by an edit list depend on data formats of transcluded docu- ments. Just like content locators, edit lists could be extended to arbi- If c1 and c2 are read in content locator syntax as defined by = trary executable programs that implement the hypertext assemble RFC 5147 so that T (hc1,d1i) d2 then d4 can be constructed by function A for some subset of edit lists. To ensure reproducibility transcluding a document segment of d1 into d3 at position c2. The and reliable transclusion,3 these programs must not access unstable corresponding edit list e1 ∈ E with A(e1) = d4 could look like this: external information such as documents that may change [24]. 4In practice it’s often unknown whether two data formats actually share the same model, especially if models are only given implicitly by definition of their formats. 5Formats may also exist purely implicit in of sample instances which grammar, 3If documents models/segments change, locators may not be applicable anymore [6]. model, and ideas must be guessed from by reverse engineering [28]. 2019-07-02 01:06. Page 2 of 1–4. Infrastructure-Agnostic Hypertext

take 995f37f2e066b7d8893873ca4d780da5bf017184 normalization: most documents (including identifiers) can be insert at 48ba94c47b45390b6dd27824cfc0d8468c2cbc71 serialized in different forms. To support unique document iden- from fcb59267e2e6641140578235c8cb6d38eaf6abc1 tifiers, a hypertext system should support normalization of doc- segment c5b794c7ae5d490f52a414d9d19311b9a19f61b3 uments to canonical forms. link services: of links have been proposed as central The values in e are SHA-1 hashes of d , c , d , and c respectively.6 1 3 2 1 1 part of Open Hypermedia Systems [3] but they are not available Retrieval function R maps them back to strings. are for the Web because of commercial interest.8 Links are ideally given by U (e ) = {hc ,d i, hc ,d i} used for editing d to d (ver- 1 2 3 1 1 3 4 derived from edit lists with segments usage function U . Recent sioning) and for referencing of segment of d in d (transclusion). 1 4 development such as Webmention and OpenCitation may help to improve collection of links. IMPLEMENTATIONS visualization and navigation: this most recognizable element of hypertext has mostly been reduced to simple links while Nel- One of the problems faced by was it long required son’s ideas seem to have been forgotten or ignored [27]. Nev- new developments such as computer networks, document process- ertheless the creation of tools for visualization and navigation ing, and graphical user interfaces ahead of their time. Today we can in hypertext structures is less challenging then getting hold of build on a many existing technologies: the underlying documents and hyperlinks. edit list formats despite edit lists being the very core of the networks: storage and communication networks are ubiquitous idea of hypertext [15], they have rarely been implemented with several protocols (HTTP, IPFS, ...). in reusable data formats. Proper hypertext implementations identifier systems: document identifiers should be part of the therefore require to establish new formats with support of URI/IRI identifier system. More specific candidates of relevant hypertext assemble function A and segments usage function identifier systems include and content-based identifiers. U . formats: hypertext systems should not be limited to their own editing tools applications to create and modify digital objects document formats (such as the Web’s focus on HTML/DOM) track changes don’t provide this information in form of but allow for integration of all kinds of digital objects. reusable edit lists, if at all. Hypermedia authoring needs to be content locators as shown above, several content locator and integrated into existing editing tools to succed [7]. query formats exist, at least for some document models. copyright and control: who should be allowed to use which Access to documents via a retrieval function R can be implemented documents under which conditions? The answers primarily with existing network and identifier technologies. Obvious solu- depend on legal, social, and political requirements. tions build on top of HTTP and URL but these identifiers are far from unambiguous and persistent. Content-based identifiers are Differences to Xanadu guaranteed to always reference the same document but they re- 7 quire network and storage systems to be actionable. The set of Project Xanadu promised a comprehensive hypertext system in- supported data formats is only limited by availability of applica- cluding elements for content (xanadocs), network (servers), rights tions to view and to edit documents. Full integration into a hyper- (micropayment), and interfaces (viewers) – years before each these text system however requires appropriate content locator formats concept made it into the computer mainstream. Today a xanalogi- to select, transclude, and link to segments from these documents. cal hypertext system can more build on existing technologies. The Existing content locator technologies include URI Fragment Iden- infrastructure-agnostic model of hypertext tries to capture the core tifiers [25], patch formats (JSON Patch, XML Patch, LD Patch...), parts of the original vision of hypertext by concentrating onits doc- and domain-specific query languages as long as they can guaran- uments and document formats. For this reason some requirements tee reproducible builds. The IIIF Image API, with focus on content listed by Xanadu Australia [23] or mentioned by Nelson in other locators in images [2], and hypothes.is, with a combination of loca- publications are not incorporated explicitly: tor methods [6], popularized at least simple forms of transclusion on the Web. a) identified servers as canonical sources of documents b) identified users and access control c) copyright and royalty system via micropayment Challenges d) user interfaces to navigate and edit

Despite the availability of technologies to build on, creation of a Meeting these requirements in actual implementations is possible xanalogical hypertext system is challenging for several reasons. nevertheless. Identified servers (a) and users (b) were part of The general problems involved with transclusion have been Tumbler identifers (that combined document identifiers and identified [1]. Other or more specific challenges include (ordered content locators) [17] but the current OpenXanadu implementa- by severity): tion uses plain URLs as part of its Xanadoc edit list format [21]. Canonical sources of documents (a) could also be implemented storage: data storage is cheap, but someone has to pay for it. by or alternative technology to prove that a specific document existed on a specific server at a specific time. Such 6A more practicaledit list syntax E could also allow the direct embedding of small doc- ument instances which SHA-1 hashes can be computed from. If implemented carefully, knowledge of a document’s first insertion into the hypertext this could also reconcile transclusion with copy-and-paste. 7New standards such as IPFS Mulihashes and BitTorrent Merkle-Hashes look promis- 8In particular by search engines and by spammers. A criterion to judge the success of ing but these types of identifiers are not specified as part of the URI system (yet) [26]. a hypertext system is whether it is popular enough to attract link spam. 2019-07-02 01:06. Page 3 of 1–4. Jakob Voß system would also allow for royalty systems (c).9 Identification REFERENCES of users and access control (b) could also be implemented in [1] Robert Akscyn. 2015. The Future of Transclusion. Intertwingled (2015), 113–122. several ways but this feature much more depends on network https://doi.org/10.1007/978-3-319-16925-5_15 infrastructures and socio-technical environments, including rules [2] Michael Appleby, Tom Crane, Robert Sanderson, Jon Stroop, and Simeon Warner of privacy, intellectual property, and censorship. Last but not least, (Eds.). 2017. IIIF Image API (2.1.1 ed.). IIIF Consortium. https://iiif.io/api/image/ [3] Claus Atzenbeck, Thomas Schedel, Manolis Tzagarakis, Daniel Roßner, a hypertext system needs applications to visualize, navigate, and and Lucas Mages. 2017. Revisiting Hypertext Infrastructure. Proceed- edit hypermedia (d)10 Several user interface have been invented in ings of the 28th ACM Conference on Hypertext and Social Media, 35–44. https://doi.org/10.1145/3078714.3078718 the [13] and there will unlikely be one final [4] Tim Berners-Lee. 1989. Information Management: A Proposal. CERN. application because user interfaces depend on use-cases and file https://www.w3.org/History/1989/proposal.html formats. [5] Sarven Capadisli, Sören Auer, and Reinhard Riedl. 2015. This ‘Paper’ is a Demo. The : ESWC 2015 Satellite Events (2015), 26–30. https://doi.org/10.1007/978-3-319-25639-9_5 [6] Kristóf Csillag. 2013. Fuzzy Anchoring. Hypothesis (22 4 2013). Differences to other hypertext models https://web.hypothes.is/blog/fuzzy-anchoring/ [7] Angelo Di Iorio and Fabio Vitali. 2005. From the writable web to global editability. Proceedings of the sixteenth ACM conference on Hypertext and hypermedia (2005), The focus of models from the hypertext research community 35–45. https://doi.org/10.1145/1083356.1083365 [3] is more on services and tools than on Nelson’s requirements [8] Jonathan Furner. 2016. “Data”: The data. Information Cultures in the Digital Age: a Festschrift in honor of Rafael Capurro (2016), 287–306. [30]. This paper rather looks at the neglected “within-component http://www.jonathanfurner.info/wp-content/uploads/2016/12/Furner-Final-Proof-18.4.16. layer”11 of the Dexter Hypertext Reference Model [10] than on [9] Kaj Grønbæk and Randall H. Trigg. 1996. Toward a Dexter-based model for open hypermedia: Unifying embedded references and link objects. Proceedings of the issues of storage, presentation and interaction with a hypertext the seventh ACM conference on Hypertext, 149–160. system. Extension with content locators (“locSpecs” in [9]) could [10] Frank G. Halasz and Mayer D. Schwartz. 1990. The Dexter Hypertext Reference more align Dexter with infrastructure agnostic hypertext but Model. Technical Report. [11] William Kent. 1988. The Many Forms of a Single Fact. Proceedings of IEEE COM- existing models rarely put traceable edit-lists and transclusion into PCON 89, 438–443. https://doi.org/10.1109/CMPCON.1989.301972 their core. [12] Tuomas Lukka and Benja Fallenstein. 2002. Freenet-like GUIDs for implementing xanalogical hypertext. Proceedings of the thirteenth ACM conference on Hypertext and hypermedia (2002), 194–195. https://doi.org/10.1145/513338.513386 [13] Matthias Müller-Prove. 2002. Vision and Reality of Hypertext and Graphical User SUMMARY AND CONCLUSION Interfaces. Ph.D. Dissertation. https://mprove.de/visionreality/ [14] . 1965. Complex information processing: a file structure for the com- plex, the changing and the indeterminate. Proceedings of the 1965 20th national This paper presents a novel interpretation of the original vision of conference, 84–100. https://doi.org/10.1145/800197.806036 hypertext [14, 15]. Its infrastructure-agnostic model does not re- [15] Ted Nelson. 1967. Getting It Out of Our System. : A Critical quire or exclude specific data formats or network protocols. Ab- Review, 191–210. [16] Ted Nelson. 1974. Computer Lib / Dream Machines. stract from these ever-changing technologies, the focus is on hy- [17] Ted Nelson. 1980. Literary Machines. Mindful Press. permedia content (documents) and connections (hyperlinks). Core [18] Ted Nelson. 1997. Embedded Markup Considered Harmful. Journal 2, 4 (1997), 129–134. elements of hypertext systems are identified as documents, docu- [19] Ted Nelson. 1999. Xanalogical structure, needed now more than ever: parallel ment identifiers, content locators, and edit lists. A formal model de- documents, deep links to content, deep versioning, and deep re-use. Comput. fines their relations based on knowledge of data formats and mod- Surveys 31, 4es (12 1999). https://doi.org/10.1145/345966.346033 [20] Ted Nelson. 2008. Geeks Bearing Gifts. Mindful Press. els. It is shown which technologies can be used to implement such [21] Ted Nelson and Nicholas Levin. 2014. OpenXanadu. a hypertext system integrated into current information infrastruc- http://xanadu.com/xanademos/MoeJusteOrigins.html tures (especially the and the Web) and which challenges [22] Ted Nelson, Robert Adamson Smith, and Marlene Mallicoat. 2007. Back to the future: hypertext the way it used to be. Proceedings of the eighteenth conference still exist (in particular support of edit lists in editing tools). on Hypertext and hypermedia, 227–228. https://doi.org/10.1145/1286240.1286303 [23] Andrew David Pam. 2002. Xanadu FAQ (1.56 ed.). http://www.aus.xanadu.com/xanadu/faq.html [24] Allen H. Renear, David Dubin, and Karen M. Wickett. 2009. When dig- Pandoc template (TEX) ital objects change — exactly what changes? Proceedings of the Amer- ican Society for and Technology 45, 1 (3 6 2009). T X https://doi.org/10.1002/meet.2008.14504503143 Markdown E PDF [25] Jeni Tennison. 2012. Best Practices for Fragment Identifiers and Media Type Definitions. World Wide Web Consortium. http://www.w3.org/TR/fragid-best-practices/ TikZ SVG HTML [26] Ben Trask. 2016. Hash URI Specification (Initial Draft). Technical Report. https://github.com/hash-uri/hash-uri [27] Fernanda Viégas, Martin M. Wattenberg, and Kushal Dave. [n. d.]. Studying Co- Pandoc template (HTML) CSS operation and Conflict between Authors with history flow Visualizations. CHI ’04 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Wikidata CSL JSON BibT X 575–582. https://doi.org/10.1145/985692.985765 E [28] Jakob Voss. 2013. Describing Data Patterns. Ph.D. Dissertation. http://aboutdata.org/ Figure 2: Proto-transclusion links of this paper [29] Jakob Voss. 2013. Was sind eigentlich Daten? LIBREAS 23 (2013). https://doi.org/10.18452/9038 [30] Noah Wardrip-Fruin. 2004. What hypertext is. Proceedings of the fifteenth ACM conference on Hypertext and hypermedia, 126–127. https://doi.org/10.1145/1012807.1012844 9Copyright detection was easy to implement with mandatory registration such as partly required in the United States until 1976. Authors might also register documents with cryptographic hashes without making them public in the first place. 10Tim Berners-Lee’s first originally supported editing. 11“It would be folly to attempt a generic model covering all of these data types.” [10] 2019-07-02 01:06. Page 4 of 1–4.