Notes on the Transclusion Finder the Html Version of This Document Badly Botches the Special Symbols

Total Page:16

File Type:pdf, Size:1020Kb

Notes on the Transclusion Finder the Html Version of This Document Badly Botches the Special Symbols Notes on the Transclusion Finder The html version of this document badly botches the special symbols. The .pdf or .ps versions are good in this regard. The text with the formulae may be of interest to some however. The main idea is to define a "landmark" that is a property of a string of bits that is suitably rare. Each file is scanned for landmarks and a checksum of the data around each landmark is made and included in a file along with the address of where it was found. The file is them sorted by checksum and then landmark sites with the same checksum are compared by reading the disk. The main difficulty is finding a suitable landmark definition. I invented a stream hash function that is fairly efficient to compute. It amounts to a convolution of the bit stream of the file with a string of about 500 bits. I look for strings of about 11 ones in the hash and call this a landmark. Bits surrounding this string of ones are used as the checksum. This has the convenient property that if two file portions are alike except for different offsets then the coincidence will be found even if the offset difference is not a multiple of one byte. This scheme is likely to find coincidences of more than 256 characters as such a stream is likely to have a landmark. Shorter coincidences are less likely to be found. One piece of hair is removal of multiple reports resulting from multiple landmarks in one coincidence. Local Stationary Bit Stream Functions This is the theory of a string hash function. It is a function on a long string of bits. Its value is a few hundred bits longer than the argument. It is local, stationary and fairly fast to compute. We use “string” here to refer to arrays of bits indexed by the integers. If b is such a string then bj is bit j of that string. The strings are infinitely long in both directions but the strings will be zero outside some finite range. Both the argument and value of string hash is a string. b << n is the string b shifted n to the left. (b<<n)i= bi–n for any integers i and n. Note that we picture bits with a larger index on the left! e.g. b2 b1 b0 b–1 b–2 ... —this is the classic order for polynomial coefficients. We also write “b(x)” to represent the polynomial Σbixi over the field GF(2). A string function s is stationary iff s(b<<1) = s(b)<<1 for all b. In this case s(b<<n) = s(b)<<n for all b and n. A function s is local when there are integers a and b such that a<b and s(b)i is determined by bj only when i+a < j < i+b. This means that such functions depends on only bits in the neighborhood. More precisely L is local if there are integers a and b such that a<b and for any strings s and t (for all i, if (for all j if (i+a < j < i+b) then sj = tj) then L(s)i = L(t)i). We want a stationary function so that properties of a string that are preserved under shifting will remain visible in the yield of S if they were visible before. We want the string to local so that equality of strings of modest length will lead to equality of the yield of S. We will look for “landmarks” in the yield such as strings of ones. No particular pattern can be presumed to have a useful density in the original data—zeros, for instance, are excluded from ascii text. Such patterns should have known densities in the “randomized” data that comes from S. Our particular string hash function is based on a polynomial h over GF(2) of degree P≅50. We identify a bit string b with the polynomial h(x) = Σbixi summed over non-negative i. When polynomials are used to checksum transmission blocks a “checksum” is computed and transmitted that would have caused the block to be a multiple of the polynomial had it been exclusive-ored into the right end of the block. In this case the checksum is a function of all of the bits in the block and is thus not local. Our string hash function S is a convolution function. S(b)i = Σbi–jpj. (In this context Σ means the one bit exclusive-or: ⊕) If h(x) = x64+x61+1 then p may be recursively defined as pj = 0 for j<0 or j≥B and p0 = 1 and pj = pj–64 ⊕ pj–61 for 0 < j < B. (B is some constant ≅ 500). Alternatively p may be defined by the relations xB + r(x) = p(x)h(x) and deg(r)≤deg(h). r(x) is the remainder after dividing xB by h(x) and p(x) = (xB + r(x))/ h(x). It may be seen that S is stationary and local for the range a = –B, b = 0. The “polynomial” associated with p for infinite B is the reciprocal of h in the sense that the terms of the product of p and h are the coefficients of the polynomial 1 despite the fact there are infinitely many terms in p. This is like 1/(1–x) = 1+x–1+x–2+x–3... You may have seen the old fallacious proof that 1+2+4+8... = -1. That proof is valid in this funny world. The desired convolution is m(x)p(x) where mi are the bits of the argument to S. If there are N bits in the argument then the first bit is mN-1 and the last bit is m0. We will process the bits from first to last. m(x)p(x) = m(x)[(xB + r(x))/h(x)] = [m(x)(xB + r(x))]/h(x) We first describe a conceptual method that does not have the factor B in its cost. Then we show how to do it several bits at a time instead of one. The first conceptual method of computation proceeds as follows. The string argument is delivered one bit at a time, largest index first. The algorithm does not know the value of that index. S being stationary, we can compute the answer as soon as the data begin to arrive. By the first and last bits of the string argument we refer to some range of the infinite input outside of which is all zero. We buy three left infinite accumulators m, s and q. m accumulates the argument, s will accumulate m(x)(xB + r(x)) and q will accumulate the quotient [m(x)(xB + r(x))]/h(x). We prepare the initial answer as if the input string were empty (all zero) by setting m, s and q to all zeros. When ever a bit of m arrives: 1) Shift m, s and q each to the left one bit and deposit the new bit in m0. 2) If the new bit is 1 we exclusive-or xB + r(x) into s. 3) Set q0 to sB. 4) If sB then exclusive-or h(x)x(B–P) into s. (This turns sB off.) When an oracle tells us that there are no new bits we supply B–P zeros after which s will be all zeros. Note that the last step above keeps sj = 0 for j>B. Supplying zeros continues to increase the number of low bits in s. The loop invariant for the above is: s(x) = (xB + r(x))m(x) – q(x)h(x)x(B–P) and deg(s)<B. Since s is driven to zero we have q(x) = (xB + r(x))m(x)x(B–P)/(h(x)x(B–P)) upon exit from the loop. The extra factor x(B–P) in m(x)x(B–P) above arose when we fed B–P more zero bits into the algorithm after the last bit of the argument arrived. Steps 1 and 2 constitute the multiplication (xB + r(x))m(x) and steps 3 and 4 perform the division (xB + r(x))m (x)/h(x). We now have: q(x) = (xB + r(x))m(x)/h(x) = m(x)p(x) which is the desired answer. ....We reformulate the above in notation without mutation such as shifting and “exclusive-or”ing. To accelerate the above computations we consider the data flow. The only parts of the computation that are proportional to B are the shifts of s, m and q. m is entirely unneeded—it was computed only to make the the loop invariant testable and thus make the computation easier to debug. The bits of q are produced as the answer and are unused again in this computation. Only the first B bits of s are used—indeed the rest are all zero. After sP is produced it is not used again, aside for shifting until it becomes sB–P. This is thus a fixed length FIFO queue and may be implemented without shifting. Alternatively we can dispense with the queue and perform steps 3 and 4 sooner if we can get an early edition of the input bits so as to perform the xB element of step 2. The latter is the only thing delaying the computation in steps 3 and 4. In either case there remain no factors of B in the computation.
Recommended publications
  • Approved DITA 2.0 Proposals
    DITA Technical Committee DITA 2.0 proposals DITA TC work product Page 1 of 189 Table of contents 1 Overview....................................................................................................................................................3 2 DITA 2.0: Stage two proposals.................................................................................................................. 3 2.1 Stage two: #08 <include> element.................................................................................................... 3 2.2 Stage two: #15 Relax specialization rules......................................................................................... 7 2.3 Stage two: #17 Make @outputclass universal...................................................................................9 2.4 Stage two: #18 Make audience, platform, product, otherprops into specializations........................12 2.5 Stage two: #27 Multimedia domain..................................................................................................16 2.6 Stage two: #29 Update bookmap.................................................................................................... 20 2.7 Stage two: #36 Remove deprecated elements and attributes.........................................................23 2.8 Stage two: #46: Remove @xtrf and @xtrc...................................................................................... 31 2.9 Stage 2: #73 Remove delayed conref domain.................................................................................36
    [Show full text]
  • XRI 2.0 FAQ 1 December 2005
    XRI 2.0 FAQ 1 December 2005 This document is a comprehensive FAQ on the XRI 2.0 suite of specifications, with a particular emphasis on the XRI Syntax 2.0 Committee Specification which was submitted for consideration as an OASIS Standard on November 14, 2005. 1 General..................................................................................... 3 1.1 What does the acronym XRI stand for? ................................................................3 1.2 What is the relationship of XRI to URI and IRI? ....................................................3 1.3 Why was XRI needed?..........................................................................................3 1.4 Who is involved in the XRI specification effort? ....................................................4 1.5 What is the XRI 2.0 specification suite? ................................................................4 1.6 Are there any intellectual property restrictions on XRI? ........................................4 2 Uses of XRI .............................................................................. 5 2.1 What things do XRIs identify? ...............................................................................5 2.2 What are some example uses of XRI?..................................................................5 2.3 What are some applications that use XRI? ...........................................................5 3 Features of XRI Syntax ........................................................... 6 3.1 What were some of the design requirements
    [Show full text]
  • List of Different Digital Practices 3
    Categories of Digital Poetics Practices (from the Electronic Literature Collection) http://collection.eliterature.org/1/ (Electronic Literature Collection, Vol 1) http://collection.eliterature.org/2/ (Electronic Literature Collection, Vol 2) Ambient: Work that plays by itself, meant to evoke or engage intermittent attention, as a painting or scrolling feed would; in John Cayley’s words, “a dynamic linguistic wall- hanging.” Such work does not require or particularly invite a focused reading session. Kinetic (Animated): Kinetic work is composed with moving images and/or text but is rarely an actual animated cartoon. Transclusion, Mash-Up, or Appropriation: When the supply text for a piece is not composed by the authors, but rather collected or mined from online or print sources, it is appropriated. The result of appropriation may be a “mashup,” a website or other piece of digital media that uses content from more than one source in a new configuration. Audio: Any work with an audio component, including speech, music, or sound effects. CAVE: An immersive, shared virtual reality environment created using goggles and several pairs of projectors, each pair pointing to the wall of a small room. Chatterbot/Conversational Character: A chatterbot is a computer program designed to simulate a conversation with one or more human users, usually in text. Chatterbots sometimes seem to offer intelligent responses by matching keywords in input, using statistical methods, or building models of conversation topic and emotional state. Early systems such as Eliza and Parry demonstrated that simple programs could be effective in many ways. Chatterbots may also be referred to as talk bots, chat bots, simply “bots,” or chatterboxes.
    [Show full text]
  • Web Boot Camp Week 1 (Custom for Liberty)
    "Charting the Course ... ... to Your Success!" Web Boot Camp Week 1 (Custom for Liberty) Course Summary Description This class is designed for students that have experience with HTML5 and CSS3 and wish to learn about, JavaScript and jQuery and AngularJS. Topics Introduction to the Course Designing Responsive Web Applications Introduction to JavaScript Data Types and Assignment Operators Flow Control JavaScript Events JavaScript Objects JavaScript Arrays JavaScript Functions The JavaScript Window and Document Objects JavaScript and CSS Introduction to jQuery Introduction to jQuery UI jQuery and Ajax Introduction to AngularJS Additional AngularJS Topics The Angular $http Service AngularJS Filters AngularJS Directives AngularJS Forms Testing JavaScript Introduction to AngularJS 2 Prerequisites Prior knowledge of HTML5 and CSS3 is required. Duration Five days Due to the nature of this material, this document refers to numerous hardware and software products by their trade names. References to other companies and their products are for informational purposes only, and all trademarks are the properties of their respective companies. It is not the intent of ProTech Professional Technical Services, Inc. to use any of these names generically "Charting the Course ... ... to Your Success!" TDP Web Week 1: HTML5/CSS3/JavaScript Programming Course Outline I. Introduction to the Course O. Client-Side JavaScript Objects A. TDP Web Bootcamp Week 1, 2016 P. Embedding JavaScript in HTML B. Legal Information Q. Using the script Tag C. TDP Web Bootcamp Week 1, 2016 R. Using an External File D. Introductions S. Defining Functions E. Course Description T. Modifying Page Elements F. Course Objectives U. The Form Submission Event G. Course Logistics V.
    [Show full text]
  • Autonomous Agents on the Web
    Report from Dagstuhl Seminar 21072 Autonomous Agents on the Web Edited by Olivier Boissier1, Andrei Ciortea2, Andreas Harth3, and Alessandro Ricci4 1 Ecole des Mines – St. Etienne, FR, [email protected] 2 Universität St. Gallen, CH, [email protected] 3 Fraunhofer IIS – Nürnberg, DE, [email protected] 4 Università di Bologna, IT, [email protected] Abstract The World Wide Web has emerged as the middleware of choice for most distributed systems. Recent standardization efforts for the Web of Things and Linked Data are now turning hypermedia into a homogeneous information fabric that interconnects everything – devices, information resources, abstract concepts, etc. The latest standards allow clients not only to browse and query, but also to observe and act on this hypermedia fabric. Researchers and practitioners are already looking for means to build more sophisticated clients able to meet their design objectives through flexible autonomous use of this hypermedia fabric. Such autonomous agents have been studied to large extent in research on distributed artificial intelligence and, in particular, multi-agent systems. These recent developments thus motivate the need for a broader perspective that can only be achieved through a concerted effort of the research communities on the Web Architecture and the Web of Things, Semantic Web and Linked Data, and Autonomous Agents and Multi-Agent Systems. The Dagstuhl Seminar 21072 on “Autonomous Agents on the Web” brought together leading scholars and practitioners across these research areas in order to support the transfer of knowledge and results – and to discuss new opportunities for research on Web-based autonomous systems. This report documents the seminar’s program and outcomes.
    [Show full text]
  • Freenet-Like Guids for Implementing Xanalogical Hypertext
    Freenet-like GUIDs for Implementing Xanalogical Hypertext Tuomas J. Lukka Benja Fallenstein Hyperstructure Group Oberstufen-Kolleg Dept. of Mathematical Information Technology University of Bielefeld, PO. Box 100131 University of Jyvaskyl¨ a,¨ PO. Box 35 D-33501 Bielefeld FIN-40351 Jyvaskyl¨ a¨ Germany Finland [email protected] lukka@iki.fi ABSTRACT For example, an email quoting another email would be automati- We discuss the use of Freenet-like content hash GUIDs as a prim- cally and implicitly connected to the original via the transclusion. itive for implementing the Xanadu model in a peer-to-peer frame- Bidirectional, non-breaking external links (content linking) can be work. Our current prototype is able to display the implicit con- resolved through the same mechanism. Nelson[9] argues that con- nection (transclusion) between two different references to the same ventional software, unable to reflect such interconnectivity of doc- permanent ID. We discuss the next layers required in the implemen- uments, is unsuited to most human thinking and creative work. tation of the Xanadu model on a world-wide peer-to-peer network. In order to implement the Xanadu model, it must be possible to efficiently search for references to permanent IDs on a large scale. The original Xanadu design organized content IDs in a DNS-like Categories and Subject Descriptors hierarchical structure (tumblers), making content references arbi- H.5.4 [Information Interfaces and Presentation]: Hypertext/Hy- trary intervals (spans) in the hierarchy. Advanced tree-like data permedia—architectures; H.3.4 [Information Storage and Retrie- structures[6] were used to retrieve the content efficiently.
    [Show full text]
  • Angularjs Workbook
    AngularJS Workbook Exercises take you to mastery By Nicholas Johnson version 0.1.0 - beta Image credit: San Diego Air and Space Museum Archives Welcome to the Angular Workbook This little book accompanies the Angular course taught over 3 days. Each exercise builds on the last, so it's best to work through these exercises in order. We start out from the front end with templates, moving back to controllers. We'll tick off AJAX, and then get into building custom components, filters, services and directives. After covering directives in some depth we'll build a real app backed by an open API. By the end you should have a pretty decent idea of what is what and what goes where. Template Exercises The goal of this exercise is just to get some Angular running in a browser. Getting Angular We download Angular from angularjs.org Alternately we can use a CDN such as the Google Angular CDN. Activating the compiler Angular is driven by the template. This is different from other MVC frameworks where the template is driven by the app. In Angular we modify our app by modifying our template. The JavaScript we write simply supports this process. We use the ng-app attribute (directive) to tell Angular to begin compiling our DOM. We can attach this attribute to any DOM node, typically the html or body tags: 1 <body ng-app> 2 Hello! 3 </body> All the HTML5 within this directive is an Angular template and will be compiled as such. Linking from a CDN Delivering common libraries from a shared CDN can be a good idea.
    [Show full text]
  • Connecting Topincs Using Transclusion to Connect Proxy Spaces
    Connecting Topincs Using transclusion to connect proxy spaces Robert Cerny Anton-Kubernat-Straße 15, A-2512 Oeynhausen, Austria [email protected] http://www.cerny-online.com Abstract. Topincs is a software system for agile and distributed knowledge management on top of the common web infrastructure and the Topic Maps Data Model. It segments knowledge into stores and offers users a way to establish a temporary connection between stores through transclusion and merging. This allows to easily copy topics and statements. Through this mechanism, later acts of integration are simplified, due to matching item identifiers. Furthermore, transclusion can be used to create permanent connections between stores. 1 Introduction A Topincs store [5, 6, 10] is a knowledge repository for an individual or a group to make statements about subjects. It uses the Topic Maps Data Model [7] for representing knowledge. Throughout this paper we will illustrate crucial aspects by the example of a small fictional software company, called Modestsoft. It employs around 10 people and has various departments, including development, testing, and sales & services. Modestsoft uses several shared Topincs stores to manage its organizational knowledge. Those stores deal with Modestsoft’s staff, products, issues, clients, and their configurations, to name only a few. Additionally, every employee has his own store. While the shared stores have a clear boundary around their domain, the personal stores may contain topics and statements of a wider variety and are considered a repository for note taking, and as such they function as a memory extender [4]. Figure 1 illustrates Modestsoft’s setup and the Topincs stores they use to manage their spontaneous knowledge recording demands.
    [Show full text]
  • What Can Software Learn from Hypermedia?
    What Can Software Learn From Hypermedia? Philip Tchernavskij Clemens Nylandsted Klokmose Michel Beaudouin-Lafon LRI, Univ. Paris-Sud, CNRS, Digital Deisgn & Information Studies, LRI, Univ. Paris-Sud, CNRS, Inria, Université Paris-Saclay Aarhus University Inria, Université Paris-Saclay Orsay 43017-6221, France Aarhus N 8200, Denmark Orsay 43017-6221, France [email protected] [email protected] [email protected] ABSTRACT procedures, or working materials in the physical world, these dimen- Most of our interactions with the digital world are mediated by sions of flexibility are limited or nonexistent in computer-mediated apps: desktop, web, or mobile applications. Apps impose artificial activity. These limitations are especially evident as software has limitations on collaboration among users, distribution across de- become ubiquitous in our social and professional lives. vices, and the changing procedures that constantly occur in real In general, apps model procedures, in the sense that they couple work. These limitations are partially due to the engineering prin- what users can do, e.g., changing color, with what they can do it ciples of encapsulation and program-data separation. By contrast, to, e.g., text. Some apps supports collaborating on a particular task, the field of hypermedia envisions collaboration, distribution and but limit the available tools and working materials. Some apps can flexible practices as fundamental features of software. We discuss be extended with new tools, but are highly specialized, e.g., for shareable dynamic media, an alternative model for software that illustration or programming. Apps can only be combined in limited unifies hypermedia and interactive systems, and Webstrates, an ways, e.g., it is possible to work on an image in two image treatment experimental implementation of that model.
    [Show full text]
  • Chapter 9: the Forms of Resource Descriptions
    Chapter 9 The Forms of Resource Descriptions Ryan Shaw Murray Maloney 9.1. Introduction . 467 9.2. Structuring Descriptions . 469 9.3. Writing Descriptions . 492 9.4. Worlds of Description . 501 9.5. Key Points in Chapter Nine . 508 9.1 Introduction Throughout this book, we have emphasized the importance of separately consid­ ering fundamental organizing principles, application-specific concepts, and de­ tails of implementation. The three-tier architecture we introduced in §1.6 is one way to conceptualize this separation. In §6.7, we contrasted the implementation-focused perspective for analyzing relationships with other per­ spectives that focus on the meaning and abstract structure of relationships. In this chapter, we present this contrast between conceptualization and implemen­ tation in terms of separating the content and form of resource descriptions. In the previous chapters, we have considered principles and concepts of organ­ izing in many different contexts, ranging from personal organizing systems to cultural and institutional ones. We have noted that some organizing systems have limited scope and expected lifetime, such as a task-oriented personal or­ ganizing system like a shopping list. Other organizing systems support broad uses that rely on standard categories developed through rigorous processes, like a product catalog. By this point you should have a good sense of the various conceptual issues you need to consider when deciding how to describe a resource in order to meet the The Discipline of Organizing goals of your organizing system. Considering those issues will give you some sense of what the content of your descriptions should be. In order to focus on the conceptual issues, we have deferred discussion of specific implementation issues.
    [Show full text]
  • Historically, Cross-Origin Interactions, Through Hyperlinks
    Historically, cross-origin interactions, through hyperlinks, transclusion and form submission, were the most important and distinguishing features of HTTP, HTML and the World Wide Web. As the Web moved from being composed of static markup and resources rendered by the user agent to also include active content, through plug-ins and embedded scripting, and client-side state, through cookies, it quickly became clear that unrestricted cross-origin interactions presented serious privacy and security risks. To deal with these risks, user agents and plugin technologies introduced a set of restrictions generally known as the Same Origin Policy. (SOP) Though many variants of the SOP exist, they all generally 1) preserved the existing cross-origin interactions allowed by HTML and HTTP while 2) restricting the ability of active content to read or make new types of requests across origins. This specification allows resources to voluntarily relax these restrictions. To do so safely, it is important to understand both 1) the pre-existing security impacts of cross- origin requests allowed by the legacy architecture of the Web as well as 2) the distinction between the goal of this specification: authorizing cross-origin read access to a resource in the user agent, and the possibly unintended consequences of authorizing write/execute access to resources by applications from foreign origins executing in the user agent. In this specification, A simple cross-origin requestst has beenare defined as congruent the set of HTTP methods, headers and data which may with those which may be generated sent cross-origin by currently deployed user agents that do not implement CORS.
    [Show full text]
  • Hyperwell: Local-First, Collaborative Notebooks for Digital Annotation
    Hyperwell: Local-First, Collaborative Notebooks for Digital Annotation A thesis presented for the degree of Master of Science by Jan Kaßel [email protected] 3724135 at the Institute of Computer Science Universität Leipzig, Germany May 4, 2020 Advisor & First Examiner Second Examiner Dr. Thomas Köntges Prof. Gregory Crane Chair of Digital Humanities Department of Classical Studies Leipzig University Tufts University Except where otherwise noted, content in this thesis is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License¹, which permits un- restricted adaption, use, and distribution in any medium, provided the original work is properly cited the resulting work is licensed under the same terms as the original. The source code of all Hyperwell repositories is available as open-source soft- ware, licensed under the more permissive MIT License. Copyright 2020, Jan Kaßel. 1. https://creativecommons.org/licenses/by-sa/4.0/ Table of Contents Abstract i Acknowledgements ii List of Abbreviations iii List of Figures iv 1 Introduction 1 1.1 Motivation: A Question of Ownership ................ 3 1.2 Research Goals and Affiliated Work ................. 4 1.3 Synopsis ................................ 5 2 Related Work 7 2.1 Hypertext and Annotation ....................... 7 2.2 Digital Real-Time Collaboration ................... 10 2.3 Linked Data and Digital Humanities . 12 2.4 Peer-to-Peer Networks ........................ 16 2.5 Local-First Applications ........................ 20 3 Study: Exploring Collaborative Workflows 22 3.1 Study Framework ........................... 25 3.2 Analyzing Digital Workflows ..................... 27 3.3 Setting and Observations ....................... 30 3.4 Results ................................. 32 4 Peer-to-Peer Annotation 35 4.1 What’s (Not) Wrong with Servers? .
    [Show full text]