Automated Metadata Extraction

Masaryk University
Faculty of Informatics

Master’s Thesis

Bc. Martin Šmíd

Brno, Spring 2021

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. Martin Šmíd

Advisor: RNDr. Lukáš Němec

Acknowledgements

I would like to thank my advisor, RNDr. Lukáš Němec, for his friendly approach, helpful advice, and guidance of my work. I would like to express my gratitude to my family for their support.

Abstract

This thesis aims to create a modular tool that can extract important patterns of interest from binary data of generally unknown origin. Binary data can come in many different forms. Therefore, several standard encodings used to represent binary data are introduced in the first part. A section introducing selected interesting data patterns whose detection allows for more detailed identification of certain virtual or physical entities follows. The thesis also includes a description of the implemented tool that automates data decoding and data pattern extraction, along with the results of processing a collected dataset.

Keywords

binary-to-text encoding, metadata extraction, Python, data decoding, pattern detection

Contents

1 Introduction
2 Binary-to-text Encoding
  2.1 UUEncoding
  2.2 Base encoding
    2.2.1 Base16
    2.2.2 Base32
    2.2.3 Base58
    2.2.4 Base64
    2.2.5 Base85
    2.2.6 Base91
  2.3 BinHex
  2.4 Percent-encoding
  2.5 Quoted-printable
  2.6 yEnc
  2.7 Bech32
  2.8 MIME
  2.9 PEM
3 Metadata Patterns
  3.1 Cryptographic Hash Functions
    3.1.1 Message-Digest Algorithm 5
    3.1.2 Secure Hash Algorithms
  3.2 Uniform Resource Identifier
    3.2.1 Uniform Resource Locator
    3.2.2 Uniform Resource Name
    3.2.3 Data URI
    3.2.4 Magnet URI
  3.3 International Standard Book Number
  3.4 International Standard Serial Number
  3.5 Digital Object Identifier
  3.6 IP Address
  3.7 E-mail
  3.8 MAC Address
  3.9 Cryptocurrency
    3.9.1 Bitcoin
    3.9.2 Ethereum
    3.9.3 Tether
    3.9.4 Polkadot
    3.9.5 XRP (Ripple)
    3.9.6 Cardano
    3.9.7 Litecoin
    3.9.8 Bitcoin Cash
    3.9.9 Chainlink
    3.9.10 Stellar
4 MetExt
  4.1 Analysis
    4.1.1 Balbuzard
    4.1.2 CyberChef
    4.1.3 Chepy
    4.1.4 Ciphey
  4.2 Design
  4.3 Plugins
    4.3.1 Decoders
    4.3.2 Extractors
    4.3.3 Printers
  4.4 API
  4.5 CLI
5 Evaluation
  5.1 Test Data Set
    5.1.1 Generated Data
    5.1.2 Collected Data
  5.2 Results
    5.2.1 Generated Data
    5.2.2 Collected Data
  5.3 Evaluation Discussion
6 Conclusion and Future Work
  6.1 Future Work
A Encoding Mapping Tables
B Magnet Links Parameters
C Pattern Extraction Results
D Code extracts
Bibliography

1 Introduction

The Internet has progressively become a very accessible global network for people. Web applications enable creating and sharing large amounts of data dedicated to individual participants in network communication. This data is often unstructured, and thus, without knowledge of its content or nature, it is not easy to explore or read it correctly.

Conversely, unawareness of the structure, or its complete absence, plays into the hands of the creator or sender of the data. They may apply methods to obfuscate the actual content of the data. In worse cases, the obfuscation may be the work of an actor seeking to spread malicious software or otherwise compromise the data.

Therefore, it is advisable to inspect the unknown data acquired and ascertain its origin, characteristics, structure, or any other information that can help provide more in-depth insight into the data. Such inspection can be done partly manually, often using software that displays the data in a form known as a hex dump, i.e. as a sequence of bytes in both hexadecimal and simplified text form. This view of the data dramatically simplifies the manual inspection of binary data.

Nevertheless, since manual data processing is time-consuming with respect to the size and form of the data being examined, and human error can quickly occur, it is desirable to eliminate manual steps and automate data processing as much as possible.

This thesis focuses on creating an extensible tool designed to process binary data in different formats and find meaningful data patterns in these data. The goal is to meet the initial requirements: support for Python version 3.5, ease of use, and functional extensibility through modules. The tool is designed so that its output is easily machine-processable and additional information about the patterns found can be added to the output.

However, before it is possible to realistically analyse the data and determine whether the data contains the information sought, it is necessary to process it and recognise its correct format.

Standard encodings used in sending and sharing binary data and the selected data patterns to search for are described in the first half of the paper. The second half then focuses on the tool itself, its description and implemented parts, and sample data output from the tool.

2 Binary-to-text Encoding

It is necessary to ensure that the data recipients can receive and process the data without malformation.
In environments that were not, or still are not, designed to process binary data in 8-bit encoding, one must utilise an encoding that enables sending such data. Therefore, several binary-to-text encodings that encode binary data into a sequence of printable characters have been created to mitigate this problem.

Although such encodings ensure seamless data transmission in a particular system, it is also necessary to keep in mind that the use of binary-to-text encodings entails certain compromises, e.g. the size of the data sent is usually larger. Hence, means of data compression may need to be applied.

In this section, we will list commonly used binary-to-text encodings.

2.1 UUEncoding

UUEncoding (Unix-to-Unix encoding) originated in 1980 in Unix for encoding binary data for transmission in e-mail [1]. It was designed to transfer files from one Unix system to another through systems with different character sets. UUEncoding was also used to post files to Usenet newsgroups.

Historically, the uuencode program utilised ASCII (American Standard Code for Information Interchange) [2] characters with decimal codes from 32 to 95 to encode three octets of data into four printable characters. Each encoded line starts with a length character that encodes the number of data bytes on that line. Encoded data is encapsulated within a header line begin <mode> <filename><NL> and the trailer lines `<NL>end<NL>, where the character ` (backtick) is used as a zero-data character and <NL> is a line break.

Later on, the uuencode tool added a Base64 (Section 2.2.4) variant conforming to MIME. The header line was changed to contain the information about the used Base64 encoding, and the trailer line was replaced with a sequence of four equals signs “====”, as it is a valid Base64-decodable sequence [3].

However, the MIME standard for e-mails (Section 2.8) and the yEnc encoding (Section 2.6) created for newsgroup posting replaced UUEncoding in later years.
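The line format described above can be sketched with Python's standard binascii module. This is only an illustrative sketch, not the tool's implementation: the helper name uuencode_line is hypothetical, and the sample line is taken from the last data line of Listing 2.1 under the assumption that the typographic quote printed there stands for a backtick.

```python
import binascii


def uuencode_line(chunk: bytes) -> bytes:
    """Encode at most 45 bytes as one uuencoded line (newline stripped).

    Hypothetical helper for illustration only.
    """
    if len(chunk) > 45:
        raise ValueError("a uuencoded line carries at most 45 data bytes")
    return binascii.b2a_uu(chunk).rstrip(b"\n")


line = uuencode_line(b"Cat")

# The first character is the length character: its code minus 32
# gives the number of data bytes on the line.
data_length = line[0] - 32

# Decoding a line; the backtick acts as the zero-data (padding) character.
decoded = binascii.a2b_uu(b"1=6QL86T@:G5S=&\\@96YI;2X`")
```

Three input octets become four printable characters, so a full 45-byte line is 61 characters long including the length character.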

Listing 2.1: An example of uuencoded data

    begin 0744 example.dat
    M3&]R96T@:7!S=6T@9&]L;W(@<VET(&%M970L(&-O;G-E8W1E='5E<B!A9&EP
    M:7-C:6YG(&5L:70N($5T:6%M(&)I8F5N9'5M(&5L:70@96=E="!E<F%T+B!.
    1=6QL86T@:G5S=&\@96YI;2X`
    `
    end

2.2 Base encoding

Base encodings utilise a specific alphabet to encode binary data and fundamentally use the modulus operation for encoding and decoding. Characters from the ASCII set are usually used as the encoding alphabet in common encodings to avoid data misinterpretation in text-based systems.

Depending on the specification, different restrictions apply, such as line length constraints or the handling of characters outside the defined encoding alphabet.

Base encodings tailored for specific use cases utilising UTF encodings have also been invented, e.g. Base122 [4] (an alternative to Base64 utilising properties of UTF-8 encoding), Base1024 [5] (encoding with emoji characters), Base2048 [6] (encoding optimised for Twitter), or Base32768 [7] (encoding optimised for UTF-16 text). Nevertheless, they have been neither standardised nor commonly used so far. Thus, this section only describes commonly used base encodings with alphabets that comprise printable ASCII characters.

2.2.1 Base16

Base16 (also referred to as hex) encoding is the standard case-insensitive hex encoding that uses a set of 16 characters: digits and letters from “A” to “F” (see Table A.1).
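As a rough illustration of how a base encoding maps bytes onto its alphabet, the following Python sketch encodes each byte into two Base16 characters using integer division and the modulus operation. The function names are hypothetical and chosen for this example only; decoding is delegated to the standard base64 module with case folding enabled to reflect the case-insensitivity mentioned above.

```python
import base64

BASE16_ALPHABET = "0123456789ABCDEF"


def base16_encode(data: bytes) -> str:
    """Encode bytes as Base16; each byte yields two alphabet characters."""
    chars = []
    for byte in data:
        high, low = divmod(byte, 16)  # quotient and remainder modulo 16
        chars.append(BASE16_ALPHABET[high])
        chars.append(BASE16_ALPHABET[low])
    return "".join(chars)


def base16_decode(text: str) -> bytes:
    """Decode Base16 text, accepting both upper-case and lower-case letters."""
    return base64.b16decode(text, casefold=True)


encoded = base16_encode(b"Cat")
round_trip = base16_decode(encoded.lower())
```

The encoded form is always twice as long as the input, which illustrates the size overhead of binary-to-text encodings noted earlier in this chapter.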