Automated Metadata Extraction
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
-
Specification for JSON Abstract Data Notation Version
Standards Track Work Product Specification for JSON Abstract Data Notation (JADN) Version 1.0 Committee Specification 01 17 August 2021 This stage: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.html https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.pdf Previous stage: https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.html https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.pdf Latest stage: https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.html https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.pdf Technical Committee: OASIS Open Command and Control (OpenC2) TC Chair: Duncan Sparrell ([email protected]), sFractal Consulting LLC Editor: David Kemp ([email protected]), National Security Agency Additional artifacts: This prose specification is one component of a Work Product that also includes: JSON schema for JADN documents: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/schemas/jadn-v1.0.json JADN schema for JADN documents: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/schemas/jadn-v1.0.jadn Abstract: JSON Abstract Data Notation (JADN) is a UML-based information modeling language that defines data structure independently of data format. -
Characterizing Pixel Tracking Through the Lens of Disposable Email Services
Characterizing Pixel Tracking through the Lens of Disposable Email Services Hang Hu, Peng Peng, Gang Wang Department of Computer Science, Virginia Tech fhanghu, pengp17, [email protected] Abstract: Disposable email services provide temporary email addresses, which allows people to register online accounts without exposing their real email addresses. In this paper, we perform the first measurement study on disposable email services with two main goals. First, we aim to understand what disposable email services are used for, and what risks (if any) are involved in the common use cases. Second, we use the disposable email services as a public gateway to collect a large-scale email dataset for measuring email tracking. Over three months, we collected a dataset from 7 popular disposable email services which contain 2.3 million emails sent by 210K domains. We show that online accounts registered through disposable email addresses can be easily hijacked, leading to potential information leakage and financial loss. By empirically analyzing email tracking, we find that third-party tracking is highly prevalent, especially in the emails sent by popular services. [...] services are highly popular. For example, Guerrilla Mail, one of the earliest services, has processed 8 billion emails in the past decade [3]. While disposable email services allow users to hide their real identities, the email communication itself is not necessarily private. More specifically, most disposable email services maintain a public inbox, allowing any user to access any disposable email addresses at any time [6], [5]. Essentially disposable email services are acting as a public email gateway to receive emails. The “public” nature not only raises interesting questions about the security of the disposable email service itself, but also presents a rare opportunity to empirically collect [...] -
PDFlib Text and Image Extraction Toolkit (TET) Manual
ABC Text and Image Extraction Toolkit (TET) Version 5.2 Toolkit for extracting Text, Images, and other items from PDF Copyright © 2002–2019 PDFlib GmbH. All rights reserved. Protected by European and U.S. patents. PDFlib GmbH Franziska-Bilek-Weg 9, 80339 München, Germany www.pdflib.com phone +49 • 89 • 452 33 84-0 If you have questions check the PDFlib mailing list and archive at groups.yahoo.com/neo/groups/pdflib/info Licensing contact: [email protected] Support for commercial PDFlib licensees: [email protected] (please include your license number) This publication and the information herein is furnished as is, is subject to change without notice, and should not be construed as a commitment by PDFlib GmbH. PDFlib GmbH assumes no responsibility or liability for any errors or inaccuracies, makes no warranty of any kind (express, implied or statutory) with respect to this publication, and expressly disclaims any and all warranties of merchantability, fitness for particular purposes and noninfringement of third party rights. TET contains modified parts of the following third-party software: CMap resources. Copyright © 1990-2019 Adobe Zlib compression library, Copyright © 1995-2017 Jean-loup Gailly and Mark Adler TIFFlib image library, Copyright © 1988-1997 Sam Leffler, Copyright © 1991-1997 Silicon Graphics, Inc. Cryptographic software written by Eric Young, Copyright © 1995-1998 Eric Young ([email protected]) Independent JPEG Group’s JPEG software, Copyright © 1991-2017, Thomas G. Lane, Guido Vollbeding Cryptographic software, Copyright © 1998-2002 The OpenSSL Project (www.openssl.org) Expat XML parser, Copyright © 2001-2017 Expat maintainers ICU International Components for Unicode, Copyright © 1995-2012 International Business Machines Corporation and others OpenJPEG library, Copyright © 2002-2014, Université catholique de Louvain (UCL), Belgium TET contains the RSA Security, Inc. -
Understanding JSON Schema Release 2020-12
Understanding JSON Schema Release 2020-12 Michael Droettboom, et al Space Telescope Science Institute Sep 14, 2021 Contents: 1 Conventions used in this book; 1.1 Language-specific notes; 1.2 Draft-specific notes; 1.3 Examples; 2 What is a schema?; 3 The basics; 3.1 Hello, World!; 3.2 The type keyword; 3.3 Declaring a JSON Schema; 3.4 Declaring a unique identifier; 4 JSON Schema Reference; 4.1 Type-specific keywords; 4.2 string; 4.2.1 Length; 4.2.2 Regular Expressions; 4.2.3 Format; 4.3 Regular Expressions; 4.3.1 Example; 4.4 Numeric types; 4.4.1 integer; 4.4.2 number; 4.4.3 Multiples; 4.4.4 Range; 4.5 object; 4.5.1 Properties -
DLI Implementation and Reference Guide
DATALOGICS INTERFACE Implementation and Reference Guide. This guide is part of the Adobe® PDF Library v6.1.1Plus suite; 02/15/05. Copyright 1999-2005 Datalogics Incorporated. All Rights Reserved. Use of Datalogics software is subject to the applicable license agreement. DL Interface is a trademark of Datalogics Incorporated. Other products mentioned herein as Datalogics products are also trademarks or registered trademarks of Datalogics, Incorporated. Adobe, Adobe PDF Library, Portable Document Format (PDF), PostScript, Acrobat, Distiller, Exchange and Reader are trademarks of Adobe Systems Incorporated. HP and HP-UX are registered trademarks of Hewlett Packard Corporation. IBM, AIX, AS/400, OS/400, MVS, and OS/390 are registered trademarks of International Business Machines. Java, J2EE, J2SE, J2ME, all Java-based marks, Sun and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Linux is a registered trademark of Linus Torvalds. Microsoft, Windows and Windows NT are trademarks or registered trademarks of Microsoft Corporation. SAS/C is a registered trademark of SAS Institute Inc. UNIX is a registered trademark of The Open Group. VeriSign® is a registered trademark of VeriSign, Inc. in the United States and/or other countries. All other trademarks and registered trademarks are the property of their respective owners. For additional information, contact: Datalogics, Incorporated 101 North Wacker -
Efficient Sorting of Search Results by String Attributes
Efficient sorting of search results by string attributes Nicholas Sherlock, Andrew Trotman Department of Computer Science, University of Otago, Otago 9054, New Zealand [email protected] [email protected] Abstract: It is sometimes required to order search results using textual document attributes such as titles. This is problematic for performance because of the memory required to store these long text strings at indexing and search time. We create a method for compressing strings which may be used for approximate ordering of search results on textual attributes. We create a metric for analyzing its performance. We then use this metric to show that, for document collections containing tens of millions of documents, we can sort document titles using 64 bits of storage per title to within 100 positions of error per document. Keywords: Information Retrieval, Web Documents, [...] In addition, the search engine must allocate memory to an index structure which allows it to efficiently retrieve those post titles by document index, which, using a simplistic scheme with 4 bytes required per document offset, would require an additional 50 megabytes of storage. For search terms which occur in many documents, most of the memory allocated to storing text fields like post titles must be examined during result list sorting. As 550 megabytes is vastly larger than the cache memory available inside the CPU, sorting the list of documents by post title requires the CPU to load that data from main memory, which adds substantial latency to query processing and competes for memory bandwidth with other processes running on the same system. -
[MS-LISTSWS]: Lists Web Service Protocol
[MS-LISTSWS]: Lists Web Service Protocol Intellectual Property Rights Notice for Open Specifications Documentation: Technical Documentation. Microsoft publishes Open Specifications documentation (“this documentation”) for protocols, file formats, data portability, computer languages, and standards support. Additionally, overview documents cover inter-protocol relationships and interactions. Copyrights. This documentation is covered by Microsoft copyrights. Regardless of any other terms that are contained in the terms of use for the Microsoft website that hosts this documentation, you can make copies of it in order to develop implementations of the technologies that are described in this documentation and can distribute portions of it in your implementations that use these technologies or in your documentation as necessary to properly document the implementation. You can also distribute in your implementation, with or without modification, any schemas, IDLs, or code samples that are included in the documentation. This permission also applies to any documents that are referenced in the Open Specifications documentation. No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation. Patents. Microsoft has patents that might cover your implementations of the technologies described in the Open Specifications documentation. Neither this notice nor Microsoft's delivery of this documentation grants any licenses under those patents or any other Microsoft patents. However, a given Open Specifications document might be covered by the Microsoft Open Specifications Promise or the Microsoft Community Promise. If you would prefer a written license, or if the technologies described in this documentation are not covered by the Open Specifications Promise or Community Promise, as applicable, patent licenses are available by contacting [email protected]. -
V10.5.0 (2013-07)
ETSI TS 126 234 V10.5.0 (2013-07) Technical Specification Universal Mobile Telecommunications System (UMTS); LTE; Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs (3GPP TS 26.234 version 10.5.0 Release 10) Reference RTS/TSGS-0426234va50 Keywords LTE, UMTS ETSI 650 Route des Lucioles F-06921 Sophia Antipolis Cedex - FRANCE Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16 Siret N° 348 623 562 00017 - NAF 742 C Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88 Important notice Individual copies of the present document can be downloaded from: http://www.etsi.org The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat. Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp Copyright Notification No part may be reproduced except as authorized by written permission. -
Federal Implementation Guideline for Electronic Data Interchange
NIST Special Publication 881-38 Federal Implementation Guideline for Electronic Data Interchange ASC X12 Version/Release 003070 FEDERAL CONVENTIONS FOR USING ASC X12 TRANSACTION SETS Implementation Convention NIST U.S. DEPARTMENT OF COMMERCE Technology Administration National Institute of Standards and Technology The National Institute of Standards and Technology was established in 1988 by Congress to "assist industry in the development of technology . . . needed to improve product quality, to modernize manufacturing processes, to ensure product reliability . . . and to facilitate rapid commercialization . . . of products based on new scientific discoveries." NIST, originally founded as the National Bureau of Standards in 1901, works to strengthen U.S. industry's competitiveness; advance science and engineering; and improve public health, safety, and the environment. One of the agency's basic functions is to develop, maintain, and retain custody of the national standards of measurement, and provide the means and methods for comparing standards used in science, engineering, manufacturing, commerce, industry, and education with the standards adopted or recognized by the Federal Government. As an agency of the U.S. Commerce Department's Technology Administration, NIST conducts basic and applied research in the physical sciences and engineering, and develops measurement techniques, test methods, standards, and related services. The Institute does generic and precompetitive work on new and advanced technologies. -
Python Language
Python Language #python Table of Contents: About. Chapter 1: Getting started with Python Language: Remarks; Versions; Python 3.x; Python 2.x; Examples; Getting Started; Verify if Python is installed; Hello, World in Python using IDLE; Hello World Python file; Launch an interactive Python shell; Other Online Shells; Run commands as a string; Shells and Beyond; Creating variables and assigning values; User Input; IDLE - Python GUI; Troubleshooting; Datatypes; Built-in Types; Booleans; Numbers; Strings; Sequences and collections; Built-in constants; Testing the type of variables; Converting between datatypes; Explicit string type at definition of literals; Mutable and Immutable Data Types; Built in Modules and Functions; Block Indentation; Spaces vs. Tabs; Collection Types; Help Utility; Creating a module; String function - str() and repr(); repr(); str(); Installing external modules using pip; Finding / installing a package; Upgrading installed packages; Upgrading pip; Installation of Python 2.7.x and 3.x. Chapter 2: *args and **kwargs: Remarks; h11; h12; h13; Examples; Using *args when writing functions; Using **kwargs when writing functions; Using *args when calling functions; Using **kwargs when calling functions; Using *args when calling functions; Keyword-only and Keyword-required arguments; Populating kwarg values with a dictionary; **kwargs and default values. Chapter 3: 2to3 tool: Syntax; Parameters; Remarks; Examples; Basic -
A Proposal of Substitute for Base85/64 – Base91
A Proposal of Substitute for Base85/64 – Base91 Dake He School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China College of Informatics, South China Agricultural University, Guangzhou 510642, China [email protected] Yu Sun, Zhen Jia, Xiuying Yu, Wei Guo, Wei He, Chao Qi School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China Xianhui Lu Key Lab. of Information Security, Chinese Academy of Sciences, Beijing 100039, China ABSTRACT The coding transformation method, called Base91, is characterized by its output of 91 printable ASCII characters. Base91 has a higher encoding efficiency than Base85/64, and higher encoding rate than Base85. Besides, Base91 provides compatibility with any bit-length input sequence without additional filling declaration except for his codeword self. One can use Base91 as a substitute for Base85 and Base64 to get some benefits in restricted situations. Keywords: Base91; Base85; Base64; printable ASCII characters; IPv6 1. [...] not control character or “-” (hyphen). There are totally 94 of such ASCII characters, their corresponding digital coding being all integers ranging from 32 through 126 with the exception of 45. E-mail written in these ASCII characters is compatible with the Internet standard SMTP, and can be transferred in nearly all the E-mail systems. Nowadays, as Content-Transfer-Encoding to provide compatibility with the E-mail, Base64[1,2] code is usually employed. Base64 coding divides the input sequence into blocks being 6-bits long to be used as variable implementation mapping, the mapping is denoted by Base64[ ]: X → Y where the variable or original image set X includes all 64 -
Answers to Exercises
Answers to Exercises A bird does not sing because he has an answer, he sings because he has a song. —Chinese Proverb Intro.1: abstemious, abstentious, adventitious, annelidous, arsenious, arterious, facetious, sacrilegious. Intro.2: When a software house has a popular product they tend to come up with new versions. A user can update an old version to a new one, and the update usually comes as a compressed file on a floppy disk. Over time the updates get bigger and, at a certain point, an update may not fit on a single floppy. This is why good compression is important in the case of software updates. The time it takes to compress and decompress the update is unimportant since these operations are typically done just once. Recently, software makers have taken to providing updates over the Internet, but even in such cases it is important to have small files because of the download times involved. 1.1: (1) ask a question, (2) absolutely necessary, (3) advance warning, (4) boiling hot, (5) climb up, (6) close scrutiny, (7) exactly the same, (8) free gift, (9) hot water heater, (10) my personal opinion, (11) newborn baby, (12) postponed until later, (13) unexpected surprise, (14) unsolved mysteries. 1.2: A reasonable way to use them is to code the five most-common strings in the text. Because irreversible text compression is a special-purpose method, the user may know what strings are common in any particular text to be compressed. The user may specify five such strings to the encoder, and they should also be written at the start of the output stream, for the decoder’s use.