Fragment Identifiers for Plain Text Files

Total Page:16

File Type:pdf, Size:1020Kb

Fragment Identifiers for Plain Text Files Fragment Identifiers for Plain Text Files Erik Wilde and Marcel Baschnagel Computer Engineering and Networks Laboratory (TIK) Swiss Federal Institute of Technology (ETH Zurich)¨ [email protected] ABSTRACT Fragment identifiers have been defined for other widely Hypermedia systems like the Web heavily depend on their used resource types, such as XML (where fragment iden- ability to link resources. One of the key features of the tifiers are XPointers [4]) and PDF (where it is possible to Web’s URIs is their ability to not only specify a resource, identify pages and other subresources). However, since tool but to also identify a subresource within that resource, by support for working with URIs with fragment identifiers is using a fragment identifier. Fragment identification enables weak, and fragment identifiers often are less robust then ref- user to create better hypermedia. We present a proposal erences to the resource only, only a small fraction of links for fragment identifiers for plain text files, which makes it on the Web specify fragment identifiers. possible to identify character or line ranges, or subresources As a proof of concept, a prototypical implementation for identified by regular expressions. Using these fragment iden- plain text fragment identifiers has been developed, and there- tifiers, it is possible to create more specific hyperlinks, by fore it is possible to test and use these fragment identifiers not only linking to a complete plain text resource, but only in a server-based scenario as well as in the form of a Firefox the relevant part of it. Along with this proposal, a proto- extension. type implementation is described which can be used both as a server-side testbed and as a client-side extension for the 2. Plain Text Files and Hypertext Firefox browser. In terms of hypertext, plain text files have severe lim- Categories and Subject Descriptors: I.7.2 [Document itations, because they cannot contain outgoing links, and and Text Processing]: Document Preparation — Hy- they cannot serve as targets for links to subresources. Inter- pertext/Hypermedia; H.5.4 [Information Interfaces and estingly, defining fragment identifiers solves both problems, Presentation]: Hypertext/Hypermedia — Navigation because outgoing links can then be defined by using exter- 1 General Terms: Design, Standardization nal links (for example XLinks [3] ), and incoming links can point to subresources of the plain text file. Keywords: Plain Text, URI, XLink, Firefox The majority of files used on the Web are media types other than plain text, but there still is a large number of 1. Introduction plain text files available. These are either legacy files, which The Web heavily depends on its ability to link resources. were never converted to a richer media type (a non-trivial Even though the vast majority of links on the Web is us- task, if structure needs to be added), or they are output ing references to complete documents, the ability to refer- files from tools or systems which have plain text output, a ence document fragments greatly enhances the power of a popular example is the Comma-Separated Values (CSV) for- hypermedia system, and its usefulness for the users. The mat (a weakly structured format, but in terms of hypertext generic URI syntax [1] specifies fragment identifiers, and usage simply treated as unstructured plain text). For all thus enables Web links to point to document fragments. On these plain text resources, fragment identification makes it the Web, fragment identification depends on a resource’s possible to integrate them better into the hypermedia world MIME type, and the most popular fragment identification of the Web, even if only with incoming links. method is that of HTML. URIs with fragment identifiers In a Web with plain text fragment identification, it would such as http://www.w3.org/TR/html/#C_8 reference a par- for example be possible to have a link pointing to a partic- ticular subresource within the resource, in case of HTML ular line in a log file and thus be able to make a statement fragment identifiers the element with the id attribute set to about this subresource, such as the comment that this log C 8 (for example the anchor element <a id="C 8">...</a>). entry requires further investigation. 3. Fragment Identifiers for Plain Text Files Permission to make digital or hard copies of all or part of this work for Fragment identifiers are strings appended to a URI, us- personal or classroom use is granted without fee provided that copies are ing the # character as a separation between the resource not made or distributed for profit or commercial advantage and that copies identification and the fragment identifier. The generic URI bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific 1As pointed out by Bry and Eckstein [2], however, even after 4 permission and/or a fee. years the XLink specification has not had any success as a Web HT’05, September 6–9, 2005, Salzburg, Austria. technology. Copyright 2005 ACM 1-59593-168-6/05/0009 ...$5.00. syntax [1] states that “the fragment identifier component of a disjoint subresource (if there is more than one occurrence a URI allows indirect identification of a secondary resource of the case-insensitive string “RFC” within the resource). by reference to a primary resource and additional identify- ing information. [...] The fragment’s format and resolution 4. Implementation is [...] dependent on the media type of a potentially re- trieved representation, even though such a retrieval is only Implementations of the fragment identification scheme for performed if the URI is dereferenced.” plain text resources should be included in future Web brows- In an Internet Draft [6] currently under development (and ers, and in other tools handling plain text files (such as text open for discussion on the relevant mailing lists), a fragment viewers and editors), so that these tools can generate frag- identifier syntax and semantics for plain text resources has ment identifiers for inclusion into Web resources. As a pro- been defined. The basic building blocks of these fragment totype, we have implemented fragment identifier interpreta- identifiers are the following: tion code that operates in three steps: • Positions and Ranges: Positions identify fragments of 1. Identifier Analysis: The fragment identifier is analyzed length zero, which may be a useful concept if a link ex- and it is tested whether it conforms to the syntax def- presses the concept of “insert some text here”. Ranges, inition. on the other hand, are fragments that span between 2. Resource Analysis: The resource is retrieved, tested two positions, and thus can be thought of what users for its MIME type, tested for its character encoding, normally can select with a mouse. and tested against the fragment identifier’s hash sum • Characters and Lines: The only plain text structures (if present). that can be identified reasonably across different char- 3. Identifier Application: If the fragment identifier is cor- acter encodings are individual characters and lines. rect and applicable, the subresource is located and pre- Both concept are important when dealing with plain sented to the user. text files, and are usually supported by text tools (for example providing a “go to line” function). Since a prototype should be available for as many inter- ested users as possible, it is available in two configurations, • Regular Expressions: While characters and lines easily as server-side and client-side implementation. break when the file is modified, it may be more robust to identify a subresource through string matching. By 4.1 Server-Side Implementation using a regular expression, a fragment identifier selects The server-side implementation is based on Perl, which all character ranges matching this regular expression. provides excellent string handling functionality and thus is The concepts of positions/ranges and characters/lines may ideally suited to support the interpretation of fragment iden- be combined to use line ranges and similar concepts. Also, tifiers. The server-side implementation is available for test- different identification methods can be combined, and the ing at http://dret.net/text-fragment/ and can be used fragment identified by such a combination is the union of all with any text file. individually identified fragments. As with regular expres- sions, this may easily lead to disjoint subresources, which is not a problem and can be easily represented by using high- lighting or similar concepts. Finally, a hash value can be used to be able to detect changes in the resource. This may be useful to be able to detect potentially broken fragment identifiers. As long as the hash value (with may be the length or the MD5 check- sum) has not changed, the resource probably has not been changed and the fragment identifier can safely be applied. The following examples show how the concepts are used in actual fragment identifiers. They are using some of the frag- ment identification schemes defined by the fragment identi- fier syntax. Both examples assume that the resource re- trieved by dereferencing the URI is a plain text file. 1. http://example.com/text.txt#line=10,20 2. http://example.com/text.txt#match=[rR][fF][cC] In the first example, the URI identifies a fragment consist- Figure 1: Demo Text with Fragment Identifier ing of lines 11–20 of the identified resource. If the resource does not have that many lines, the fragment is smaller or A simple example is shown in Figure 1, which is the re- may even be empty. The URI of the second example2 iden- sult of applying a three-part fragment identifier to a text tifies a fragment of the resource which consists of a case- file containing this paper’s abstract.
Recommended publications
  • Customizing and Extending Powerdesigner SAP Powerdesigner Documentation Collection Content
    User Guide PUBLIC SAP PowerDesigner Document Version: 16.6.2 – 2017-01-05 Customizing and Extending PowerDesigner SAP PowerDesigner Documentation Collection Content 1 PowerDesigner Resource Files.................................................... 9 1.1 Opening Resource Files in the Editor.................................................10 1.2 Navigating and Searching in Resource Files............................................ 11 1.3 Editing Resource Files........................................................... 13 1.4 Saving Changes................................................................13 1.5 Sharing and Embedding Resource Files...............................................13 1.6 Creating and Copying Resource Files.................................................14 1.7 Specifying Directories to Search for Resource Files.......................................15 1.8 Comparing Resource Files........................................................ 15 1.9 Merging Resource Files.......................................................... 16 2 Extension Files................................................................18 2.1 Creating an Extension File.........................................................19 2.2 Attaching Extensions to a Model....................................................20 2.3 Exporting an Embedded Extension File for Sharing.......................................21 2.4 Extension File Properties......................................................... 21 2.5 Example: Adding a New Attribute from a Property
    [Show full text]
  • IPBES Workshop on Biodiversity and Pandemics Report
    IPBES Workshop on Biodiversity and Pandemics WORKSHOP REPORT *** Strictly Confidential and Embargoed until 3 p.m. CET on 29 October 2020 *** Please note: This workshop report is provided to you on condition of strictest confidentiality. It must not be shared, cited, referenced, summarized, published or commented on, in whole or in part, until the embargo is lifted at 3 p.m. CET/2 p.m. GMT/10 a.m. EDT on Thursday, 29 October 2020 This workshop report is released in a non-laid out format. It will undergo minor editing before being released in a laid-out format. Intergovernmental Platform on Biodiversity and Ecosystem Services 1 The IPBES Bureau and Multidisciplinary Expert Panel (MEP) authorized a workshop on biodiversity and pandemics that was held virtually on 27-31 July 2020 in accordance with the provisions on “Platform workshops” in support of Plenary- approved activities, set out in section 6.1 of the procedures for the preparation of Platform deliverables (IPBES-3/3, annex I). This workshop report and any recommendations or conclusions contained therein have not been reviewed, endorsed or approved by the IPBES Plenary. The workshop report is considered supporting material available to authors in the preparation of ongoing or future IPBES assessments. While undergoing a scientific peer-review, this material has not been subjected to formal IPBES review processes. 2 Contents 4 Preamble 5 Executive Summary 12 Sections 1 to 5 14 Section 1: The relationship between people and biodiversity underpins disease emergence and provides opportunities
    [Show full text]
  • Writing Mathematical Expressions in Plain Text – Examples and Cautions Copyright © 2009 Sally J
    Writing Mathematical Expressions in Plain Text – Examples and Cautions Copyright © 2009 Sally J. Keely. All Rights Reserved. Mathematical expressions can be typed online in a number of ways including plain text, ASCII codes, HTML tags, or using an equation editor (see Writing Mathematical Notation Online for overview). If the application in which you are working does not have an equation editor built in, then a common option is to write expressions horizontally in plain text. In doing so you have to format the expressions very carefully using appropriately placed parentheses and accurate notation. This document provides examples and important cautions for writing mathematical expressions in plain text. Section 1. How to Write Exponents Just as on a graphing calculator, when writing in plain text the caret key ^ (above the 6 on a qwerty keyboard) means that an exponent follows. For example x2 would be written as x^2. Example 1a. 4xy23 would be written as 4 x^2 y^3 or with the multiplication mark as 4*x^2*y^3. Example 1b. With more than one item in the exponent you must enclose the entire exponent in parentheses to indicate exactly what is in the power. x2n must be written as x^(2n) and NOT as x^2n. Writing x^2n means xn2 . Example 1c. When using the quotient rule of exponents you often have to perform subtraction within an exponent. In such cases you must enclose the entire exponent in parentheses to indicate exactly what is in the power. x5 The middle step of ==xx52− 3 must be written as x^(5-2) and NOT as x^5-2 which means x5 − 2 .
    [Show full text]
  • Xincluding Plain-Text Fragments for Symmetry and Profit Piotr Bański, University of Warsaw, [email protected]
    XIncluding plain-text fragments for symmetry and profit Piotr Bański, University of Warsaw, [email protected] <xi:include href=”resource.xml” parse=”text|xml” Summary xpointer=”fragment_identifier” /> XML for the annotation of images, audio or video is by necessity of the stand-off kind: the annotations are virtually anchored to the parse=“xml” parse=“text” binary stream (typically by x,y coordinates or time offsets). Work on language resource annotation has extended this approach to plain text, where annotations are anchored by character offsets. This is referred to as extreme stand-off markup. No generic XML tools or standards support it, but... they could, fairly easily. inclusion of inclusion of inclusion of inclusion of Rationale the entire the entire a fragment a fragment ISO Language Annotation Framework (e.g. Ide & Romary, 2007) re- XML text of an XML of a text commends that data be kept as read-only plain text, with one or mo- resource resource resource resource re layers of XML annotation identifying individual spans of charac- ters and adding metadata to them (identifying e.g. segments of va- rious levels of granularity, parts of speech, syntactic or discourse @xpointer present functions, etc.) no @xpointer However, the only way to handle this (as evidenced by the American National Corpus), is to use in-house tools working on correspon- dence semantics for hyperlinks (see Bański, 2010, for terminology What makes it impossible right now? and more remarks). 1. XInclude Ban: XInclude Rec, 3.1: “The xpointer attribute must not be present when parse="text".” But character streams can do better! Because text is what XML is built out of and what it contains, instead of correspondence 2.
    [Show full text]
  • W3C Working Draft: Scalable Vector
    next contents properties index Scalable Vector Graphics (SVG) 1.0 Specification W3C Working Draft 03 March 2000 This version: http://www.w3.org/TR/2000/03/WD-SVG-20000303 (Available as: PDF, zip archive of HTML) Previous public version: http://www.w3.org/TR/1999/WD-SVG-19991203/ Latest public version: http://www.w3.org/TR/SVG/ Editor: Jon Ferraiolo <[email protected]> Authors: See author list Copyright ©1998, 1999, 2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Abstract This specification defines the features and syntax for Scalable Vector Graphics (SVG), a language for describing two-dimensional vector and mixed vector/raster graphics in XML. Status of this document This document is a public review draft version of the SVG specification. This working draft attempts to address review comments that were received during the initial Last Call period, which started 12 August 1999, and also incorporates other modifications resulting from continuing collaboration with other working groups and continuing work within the SVG working group. With the publication of this draft, the SVG specification enters a second "Last Call". The second Last Call period will end on 31 March, 2000. The SVG specification is going through a second Last Call review process to provide the public and other working groups an opportunity to review the changes to the specification since the initial Last Call period. A complete list of all changes since the initial Last Call version of the specification is available in Appendix L: Change History. Last call comments should be sent to [email protected].
    [Show full text]
  • How Many Bits Are in a Byte in Computer Terms
    How Many Bits Are In A Byte In Computer Terms Periosteal and aluminum Dario memorizes her pigeonhole collieshangie count and nagging seductively. measurably.Auriculated and Pyromaniacal ferrous Gunter Jessie addict intersperse her glockenspiels nutritiously. glimpse rough-dries and outreddens Featured or two nibbles, gigabytes and videos, are the terms bits are in many byte computer, browse to gain comfort with a kilobyte est une unité de armazenamento de armazenamento de almacenamiento de dados digitais. Large denominations of computer memory are composed of bits, Terabyte, then a larger amount of nightmare can be accessed using an address of had given size at sensible cost of added complexity to access individual characters. The binary arithmetic with two sets render everything into one digit, in many bits are a byte computer, not used in detail. Supercomputers are its back and are in foreign languages are brainwashed into plain text. Understanding the Difference Between Bits and Bytes Lifewire. RAM, any sixteen distinct values can be represented with a nibble, I already love a Papst fan since my hybrid head amp. So in ham of transmitting or storing bits and bytes it takes times as much. Bytes and bits are the starting point hospital the computer world Find arrogant about the Base-2 and bit bytes the ASCII character set byte prefixes and binary math. Its size can vary depending on spark machine itself the computing language In most contexts a byte is futile to bits or 1 octet In 1956 this leaf was named by. Pages Bytes and Other Units of Measure Robelle. This function is used in conversion forms where we are one series two inputs.
    [Show full text]
  • Automatically Adapting Programs for Mixed-Precision Floating-Point Computation
    Automatically Adapting Programs for Mixed-Precision Floating-Point Computation Michael O. Lam and Bronis R. de Supinski and Jeffrey K. Hollingsworth Matthew P. LeGendre Dept. of Computer Science Center for Applied Scientific Computing University of Maryland Lawrence Livermore National Laboratory College Park, MD, USA Livermore, CA, USA [email protected], [email protected] [email protected], [email protected] ABSTRACT IEEE standard provides for different levels of precision by As scientific computation continues to scale, efficient use of varying the field width, with the most common widths being floating-point arithmetic processors is critical. Lower preci- 32 bits (\single" precision) and 64 bits (\double" precision). sion allows streaming architectures to perform more opera- Figure 1 graphically represents these formats. tions per second and can reduce memory bandwidth pressure Double-precision arithmetic generally results in more ac- on all architectures. However, using a precision that is too curate computations, but with several costs. The main cost low for a given algorithm and data set leads to inaccurate re- is the higher memory bandwidth and storage requirement, sults. In this paper, we present a framework that uses binary which are twice that of single precision. Another cost is instrumentation and modification to build mixed-precision the reduced opportunity for parallelization, such as on the configurations of existing binaries that were originally devel- x86 architecture, where packed 128-bit XMM registers can oped to use only double-precision. This framework allows only hold and operate on two double-precision numbers si- developers to explore mixed-precision configurations with- multaneously compared to four numbers with single preci- out modifying their source code, and it enables automatic sion.
    [Show full text]
  • Understanding JSON Schema Release 2020-12
    Understanding JSON Schema Release 2020-12 Michael Droettboom, et al Space Telescope Science Institute Sep 14, 2021 Contents 1 Conventions used in this book3 1.1 Language-specific notes.........................................3 1.2 Draft-specific notes............................................4 1.3 Examples.................................................4 2 What is a schema? 7 3 The basics 11 3.1 Hello, World!............................................... 11 3.2 The type keyword............................................ 12 3.3 Declaring a JSON Schema........................................ 13 3.4 Declaring a unique identifier....................................... 13 4 JSON Schema Reference 15 4.1 Type-specific keywords......................................... 15 4.2 string................................................... 17 4.2.1 Length.............................................. 19 4.2.2 Regular Expressions...................................... 19 4.2.3 Format.............................................. 20 4.3 Regular Expressions........................................... 22 4.3.1 Example............................................. 23 4.4 Numeric types.............................................. 23 4.4.1 integer.............................................. 24 4.4.2 number............................................. 25 4.4.3 Multiples............................................ 26 4.4.4 Range.............................................. 26 4.5 object................................................... 29 4.5.1 Properties...........................................
    [Show full text]
  • Programming Curriculum
    PARC VLP PROGRAMMING CURRICULUM WWW.PARCROBOTICS.ORG Overview Study of programming languages, paradigms and data structures. Chapter 1: Programming Basics Sections A. What is programming? B. What is a programming language? C. Writing source code D. Running your code E. Using IDE Chapter 2: Programming Syntax Sections A. Why Python?A. Why Python? B. Basic statementsB. Basic statements and expressions and expressions C. Troubleshooting issues C. Troubleshooting issues Chapter 3: Variables and Data Types Sections A. IntroductionA. to Introductionvariables and to data variables types and data types B. WorkingB.W withorking variables with variables across Languages across Languages C. Working with numbers C. Working with numbers D. Working with strings E. WorkingD. with commentsWorking with strings E. Working with comments Chapter 4: Conditional Code Sections: A. Making decisions in code B. Exploring conditional code C. Working with simple conditions D. Conditionals across languages PAN-AFRICAN ROBOTICS COMPETITION 1 Chapter 1 SECTION A What is programming? Programming is the process of converting ideas into instructions that a computer can understand and execute. These instructions are specific and sequential. You can think of it as a recipe. Let's you want to prepare your favorite food; you would need first a list of ingredients and then a set of instructions as to which ingredients go in first. If you have ever cooked before or watched someone cook before you will know that the amount of each ingredient can dramatically affect the outcome. Computers are very literal. They try to execute our commands exactly. When we give them bad instructions, we might introduce bugs or even make the computer crash.
    [Show full text]
  • Text File Text File Example Text Reading Overview
    CS106A, Stanford Handout #53 Fall, 2003-04 Nick Parlante Files Text File The simple "text file" is one of the oldest and simplest types of file. For that reason, text files are one of the most universally standard ways to store information. A text file is something you could type up in a word processor, and save with the "plain text" option. A text file is made of a sequence of characters, organized as a series of horizontal lines of text. There is no notion of bold or italic or color applied to the characters; they are just plain text characters, as we would store in a String. Historically, text files have been made of the familiar roman keyboard alphabet – abcdef..xyz012..89!@#$.. – with the extremely standard ASCII encoding (American Standard Code for Information Interchange). More recently, text has grown to include the idea of unicode characters like ø and !, but the encodings for those are not yet as standard as ASCII. The .java files where we write our Java code are examples of text files. The text file is such a simple, flexible way to store information, it will be with us for a long time to come. If you want to store information in a simple, non-proprietary way, the text file is a great choice. XML files, which are a very modern way to store information, are a type of text file. Text File Example This is the first line of my 4 line text file, and this here is the 2nd. The above line is blank! The above example text file is 4 lines long.
    [Show full text]
  • Plain Text & Character Encoding
    Journal of eScience Librarianship Volume 10 Issue 3 Data Curation in Practice Article 12 2021-08-11 Plain Text & Character Encoding: A Primer for Data Curators Seth Erickson Pennsylvania State University Let us know how access to this document benefits ou.y Follow this and additional works at: https://escholarship.umassmed.edu/jeslib Part of the Scholarly Communication Commons, and the Scholarly Publishing Commons Repository Citation Erickson S. Plain Text & Character Encoding: A Primer for Data Curators. Journal of eScience Librarianship 2021;10(3): e1211. https://doi.org/10.7191/jeslib.2021.1211. Retrieved from https://escholarship.umassmed.edu/jeslib/vol10/iss3/12 Creative Commons License This work is licensed under a Creative Commons Attribution 4.0 License. This material is brought to you by eScholarship@UMMS. It has been accepted for inclusion in Journal of eScience Librarianship by an authorized administrator of eScholarship@UMMS. For more information, please contact [email protected]. ISSN 2161-3974 JeSLIB 2021; 10(3): e1211 https://doi.org/10.7191/jeslib.2021.1211 Full-Length Paper Plain Text & Character Encoding: A Primer for Data Curators Seth Erickson The Pennsylvania State University, University Park, PA, USA Abstract Plain text data consists of a sequence of encoded characters or “code points” from a given standard such as the Unicode Standard. Some of the most common file formats for digital data used in eScience (CSV, XML, and JSON, for example) are built atop plain text standards. Plain text representations of digital data are often preferred because plain text formats are relatively stable, and they facilitate reuse and interoperability.
    [Show full text]
  • Limiting the Exposure of Plain Text Passwords in C
    Limiting The Exposure of Plain Text Passwords in C# written by Austin Altmann | October 24, 2019 One vulnerability that we frequently look for when testing thick client applications is plain text passwords that are exposed in memory. Microsoft provides the SecureString to help protect passwords in memory, but what it does not provide is a perfect solution to actually using the SecureString when sending web requests. And as you’ll see below, there is none, but there are some safer ways to operate. NetworkCredential Might Be Good Enough If you’re handling passwords in forms other than the login, NetworkCredential won’t be the best tool for the job. But if you want to use an authentication scheme like Basic or Digest, you might be satisfied implementing a CredentialCache and NetworkCredential as seen here. After all, there’s a NetworkCredential constructor with a SecureString password argument. However, the password will be exposed in memory as a string when preparing to send a WebRequest with the credentials. Each authentication scheme makes a call to the NetworkCredential’s InternalGetPassword function which will return the password in plain text. internal string InternalGetPassword() { string decryptedString = UnsafeNclNativeMethods.SecureStringHelper.CreateString(m_passw ord); return decryptedString; } The String Problem The entire point of using a SecureString is to limit the exposure of the plain text password in memory (see the blog title). Strings are immutable and therefore cannot be cleared on command. Their values will exist in memory until the garbage collector rolls around. If strings are used to handle the plain text password, it may exist in more addresses and for longer than anticipated.
    [Show full text]