Pdflib Text and Image Extraction Toolkit (TET) 5.3 Manual
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Pdflib Text and Image Extraction Toolkit (TET) Manual
ABC Text and Image Extraction Toolkit (TET) Version 5.2 Toolkit for extracting Text, Images, and other items from PDF Copyright © 2002–2019 PDFlib GmbH. All rights reserved. Protected by European and U.S. patents. PDFlib GmbH Franziska-Bilek-Weg 9, 80339 München, Germany www.pdflib.com phone +49 • 89 • 452 33 84-0 If you have questions check the PDFlib mailing list and archive at groups.yahoo.com/neo/groups/pdflib/info Licensing contact: [email protected] Support for commercial PDFlib licensees: [email protected] (please include your license number) This publication and the information herein is furnished as is, is subject to change without notice, and should not be construed as a commitment by PDFlib GmbH. PDFlib GmbH assumes no responsibility or lia- bility for any errors or inaccuracies, makes no warranty of any kind (express, implied or statutory) with re- spect to this publication, and expressly disclaims any and all warranties of merchantability, fitness for par- ticular purposes and noninfringement of third party rights. TET contains modified parts of the following third-party software: CMap resources. Copyright © 1990-2019 Adobe Zlib compression library, Copyright © 1995-2017 Jean-loup Gailly and Mark Adler TIFFlib image library, Copyright © 1988-1997 Sam Leffler, Copyright © 1991-1997 Silicon Graphics, Inc. Cryptographic software written by Eric Young, Copyright © 1995-1998 Eric Young ([email protected]) Independent JPEG Group’s JPEG software, Copyright © Copyright © 1991-2017, Thomas G. Lane, Guido Vollbeding Cryptographic software, Copyright © 1998-2002 The OpenSSL Project (www.openssl.org) Expat XML parser, Copyright © 2001-2017 Expat maintainers ICU International Components for Unicode, Copyright © 1995-2012 International Business Machines Corpo- ration and others OpenJPEG library, Copyright © 2002-2014, Université catholique de Louvain (UCL), Belgium TET contains the RSA Security, Inc. -
DLI Implementation and Reference Guide
Implementation and Reference Guide Datalogics Interface Datalogics® Datalogics DATALOGICS INTERFACE Implementation and Reference Guide This guide is part of the Adobe® PDF Library v6.1.1Plus suite; 02/15/05. Copyright 1999-2005 Datalogics Incorporated. All Rights Reserved. Use of Datalogics software is subject to the applicable license agreement. DL Interface is a trademark of Datalogics Incorporated. Other products mentioned herein as Datalogics prod- ucts are also trademarks or registered trademarks of Datalogics, Incorporated. Adobe, Adobe PDF Library, Portable Document Format (PDF), PostScript, Acrobat, Distiller, Exchange and Reader are trademarks of Adobe Systems Incorporated. HP and HP-UX are registered trademarks of Hewlett Packard Corporation. IBM, AIX, AS/400, OS/400, MVS, and OS/390 are registered trademarks of International Business Machines. Java, J2EE, J2SE, J2ME, all Java-based marks, Sun and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Linux is a registered trademark of Linus Torvalds. Microsoft, Windows and Windows NT are trademarks or registered trademarks of Microsoft Corporation. SAS/C is a registered trademark of SAS Institute Inc. UNIX is a registered trademark of The Open Group. VeriSign® is a registered trademark of VeriSign, Inc. in the United States and/or other countries. All other trademarks and registered trademarks are the property of their respective owners. For additional information, contact: Datalogics, Incorporated 101 North Wacker -
Python Language
Python Language #python Table of Contents About 1 Chapter 1: Getting started with Python Language 2 Remarks 2 Versions 3 Python 3.x 3 Python 2.x 3 Examples 4 Getting Started 4 Verify if Python is installed 4 Hello, World in Python using IDLE 5 Hello World Python file 5 Launch an interactive Python shell 6 Other Online Shells 7 Run commands as a string 7 Shells and Beyond 8 Creating variables and assigning values 8 User Input 12 IDLE - Python GUI 13 Troubleshooting 14 Datatypes 15 Built-in Types 15 Booleans 15 Numbers 15 Strings 16 Sequences and collections 16 Built-in constants 17 Testing the type of variables 18 Converting between datatypes 18 Explicit string type at definition of literals 19 Mutable and Immutable Data Types 19 Built in Modules and Functions 20 Block Indentation 24 Spaces vs. Tabs 25 Collection Types 25 Help Utility 30 Creating a module 31 String function - str() and repr() 32 repr() 33 str() 33 Installing external modules using pip 34 Finding / installing a package 34 Upgrading installed packages 34 Upgrading pip 35 Installation of Python 2.7.x and 3.x 35 Chapter 2: *args and **kwargs 38 Remarks 38 h11 38 h12 38 h13 38 Examples 39 Using *args when writing functions 39 Using **kwargs when writing functions 39 Using *args when calling functions 40 Using **kwargs when calling functions 41 Using *args when calling functions 41 Keyword-only and Keyword-required arguments 42 Populating kwarg values with a dictionary 42 **kwargs and default values 42 Chapter 3: 2to3 tool 43 Syntax 43 Parameters 43 Remarks 44 Examples 44 Basic -
Parallel Heterogeneous Computing a Case Study on Accelerating JPEG2000 Coder
Parallel Heterogeneous Computing A Case Study On Accelerating JPEG2000 Coder by Ro-To Le M.Sc., Brown University; Providence, RI, USA, 2009 B.Sc., Hanoi University of Technology; Hanoi, Vietnam, 2007 A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The School of Engineering at Brown University PROVIDENCE, RHODE ISLAND May 2013 c Copyright 2013 by Ro-To Le This dissertation by Ro-To Le is accepted in its present form by The School of Engineering as satisfying the dissertation requirement for the degree of Doctor of Philosophy. Date R. Iris Bahar, Ph.D., Advisor Date Joseph L. Mundy, Ph.D., Advisor Recommended to the Graduate Council Date Vishal Jain, Ph.D., Reader Approved by the Graduate Council Date Peter M. Weber, Dean of the Graduate School iii Vitae Roto Le was born in Duc-Tho, Ha-Tinh, a countryside area in the Midland of Vietnam. He received his B.Sc., with Excellent Classification, in Electronics and Telecommunications from Hanoi University of Technology in 2007. Soon after re- ceiving his B.Sc., Roto came to Brown University to start a Ph.D. program in Com- puter Engineering in Fall 2007. His Ph.D. program was sponsored by a fellowship from the Vietnam Education Foundation, which was selectively nominated by the National Academies’ scientists. During his Ph.D. program, he earned a M.Sc. degree in Computer Engineering in 2009. Roto has been studying several aspects of modern computing systems, from hardware architecture and VLSI system design to high-performance software design. He has published several articles in designing a parallel JPEG2000 coder based on heterogeneous CPU-GPGPU systems and designing novel Three-Dimensional (3D) FPGA architectures. -
A Proposal of Substitute for Base85/64 – Base91
A Proposal of Substitute for Base85/64 – Base91 Dake He School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031,China College of Informatics, South China Agricultural University, Guangzhou 510642, China [email protected] Yu Sun, Zhen Jia, Xiuying Yu, Wei Guo, Wei He, Chao Qi School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031,China Xianhui Lu Key Lab. of Information Security, Chinese Academy of Sciences, Beijing 100039,China ABSTRACT not control character or “-”(hyphen). There are totally 94 of such ASCII characters, their corresponding digital The coding transformation method, called Base91, is coding being all integers ranging from 32 through 126 characterized by its output of 91 printable ASCII with the exception of 45. E-mail written in these ASCII characters. Base91 has a higher encoding efficiency than characters is compatible with the Internet standard SMTP, Base85/64, and higher encoding rate than Base85. and can be transferred in nearly all the E-mail systems. Besides, Base91 provides compatibility with any Nowadays, as Content-Transfer-Encoding to provide bit-length input sequence without additional filling compatibility with the E-mail, Base64[1,2] code is usually declaration except for his codeword self. One can use employed. Base91 as a substitute for Base85 and Base64 to get some Base64 coding divides the input sequence into blocks benefits in restricted situations. being 6-bits long to be used as variable implementation Keywords: Base91; Base85; Base64; printable ASCII mapping, the mapping is denoted by characters; IPv6 Base64[ ]: X →Y where the variable or original image set X includes all 64 1. -
Answers to Exercises
Answers to Exercises A bird does not sing because he has an answer, he sings because he has a song. —Chinese Proverb Intro.1: abstemious, abstentious, adventitious, annelidous, arsenious, arterious, face- tious, sacrilegious. Intro.2: When a software house has a popular product they tend to come up with new versions. A user can update an old version to a new one, and the update usually comes as a compressed file on a floppy disk. Over time the updates get bigger and, at a certain point, an update may not fit on a single floppy. This is why good compression is important in the case of software updates. The time it takes to compress and decompress the update is unimportant since these operations are typically done just once. Recently, software makers have taken to providing updates over the Internet, but even in such cases it is important to have small files because of the download times involved. 1.1: (1) ask a question, (2) absolutely necessary, (3) advance warning, (4) boiling hot, (5) climb up, (6) close scrutiny, (7) exactly the same, (8) free gift, (9) hot water heater, (10) my personal opinion, (11) newborn baby, (12) postponed until later, (13) unexpected surprise, (14) unsolved mysteries. 1.2: A reasonable way to use them is to code the five most-common strings in the text. Because irreversible text compression is a special-purpose method, the user may know what strings are common in any particular text to be compressed. The user may specify five such strings to the encoder, and they should also be written at the start of the output stream, for the decoder’s use. -
Scape D10.1 Keeps V1.0
Identification and selection of large‐scale migration tools and services Authors Rui Castro, Luís Faria (KEEP Solutions), Christoph Becker, Markus Hamm (Vienna University of Technology) June 2011 This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137). This work is licensed under a CC-BY-SA International License Table of Contents 1 Introduction 1 1.1 Scope of this document 1 2 Related work 2 2.1 Preservation action tools 3 2.1.1 PLANETS 3 2.1.2 RODA 5 2.1.3 CRiB 6 2.2 Software quality models 6 2.2.1 ISO standard 25010 7 2.2.2 Decision criteria in digital preservation 7 3 Criteria for evaluating action tools 9 3.1 Functional suitability 10 3.2 Performance efficiency 11 3.3 Compatibility 11 3.4 Usability 11 3.5 Reliability 12 3.6 Security 12 3.7 Maintainability 13 3.8 Portability 13 4 Methodology 14 4.1 Analysis of requirements 14 4.2 Definition of the evaluation framework 14 4.3 Identification, evaluation and selection of action tools 14 5 Analysis of requirements 15 5.1 Requirements for the SCAPE platform 16 5.2 Requirements of the testbed scenarios 16 5.2.1 Scenario 1: Normalize document formats contained in the web archive 16 5.2.2 Scenario 2: Deep characterisation of huge media files 17 v 5.2.3 Scenario 3: Migrate digitised TIFFs to JPEG2000 17 5.2.4 Scenario 4: Migrate archive to new archiving system? 17 5.2.5 Scenario 5: RAW to NEXUS migration 18 6 Evaluation framework 18 6.1 Suitability for testbeds 19 6.2 Suitability for platform 19 6.3 Technical instalability 20 6.4 Legal constrains 20 6.5 Summary 20 7 Results 21 7.1 Identification of candidate tools 21 7.2 Evaluation and selection of tools 22 8 Conclusions 24 9 References 25 10 Appendix 28 10.1 List of identified action tools 28 vi 1 Introduction A preservation action is a concrete action, usually implemented by a software tool, that is performed on digital content in order to achieve some preservation goal. -
Automated Patch Transplantation
Automated Patch Transplantation RIDWAN SALIHIN SHARIFFDEEN, National University of Singapore, Singapore SHIN HWEI TAN∗, Southern University of Science and Technology, China MINGYUAN GAO, National University of Singapore, Singapore ABHIK ROYCHOUDHURY, National University of Singapore, Singapore Automated program repair is an emerging area which attempts to patch software errors and vulnerabilities. In this paper, we formulate and study a problem related to automated repair, namely automated patch transplantation. A patch for an error in a donor program is automatically adapted and inserted into a “similar” target program. We observe that despite standard procedures for vulnerability disclosures and publishing of patches, many un-patched occurrences remain in the wild. One of the main reasons is the fact that various implementations of the same functionality may exist and, hence, published patches need to be modified and adapted. In this paper, we therefore propose and implement a workflow for transplanting patches. Our approach centers on identifying patch insertion points, as well as namespaces translation across programs via symbolic execution. Experimental results to eliminate five classes of errors highlight our ability to fix recurring vulnerabilities across various programs through transplantation. We report that in20of24 fixing tasks involving eight application subjects mostly involving file processing programs, we successfully transplanted thepatchand validated the transplantation through differential testing. Since the publication -
Cisco Unified Attendant Console Standard 14.0.1 Open Source
Open Source Used In Cisco Unified Attendant Console Standard (14.0.1) Cisco Systems, Inc. www.cisco.com Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco website at www.cisco.com/go/offices. Open Source Used In Cisco Unified Attendant Console Standard (14.0.1) 1 Text Part Number: 78EE117C99-1135983409 Open Source Used In Cisco Unified Attendant Console Standard (14.0.1) 2 This document contains licenses and notices for open source software used in this product. With respect to the free/open source software listed in this document, if you have any questions or wish to receive a copy of any source code to which you may be entitled under the applicable free/open source license(s) (such as the GNU Lesser/General Public License), please contact us at [email protected]. In your requests please include the following reference number 78EE117C99-1135983409 Contents 1.1 dotnetzip-reduced 1.9.1.8 1.1.1 Available under license 1.2 zlib 1.2.3 1.2.1 Available under license 1.3 commonservicelocator 1.0.0.0 1.3.1 Available under license 1.4 libjpeg 6a 1.4.1 Available under license 1.5 openssl 1.0.2s 1.5.1 Notifications 1.5.2 Available under license 1.6 dotnetzip 1.9.1.8 1.6.1 Available under license 1.7 ionic-zip 1.9.1.8 1.7.1 Available under license 1.8 xerces-c 3.2.3 1.8.1 Available under license 1.9 log4net 2.0.8.0-.NET 1.9.1 Available under license 1.10 zlib 1.1.4 1.10.1 Available under license 1.11 sqlite 3.7.16.2 1.11.1 Available under license Open Source Used In Cisco Unified Attendant Console Standard (14.0.1) 3 1.1 dotnetzip-reduced 1.9.1.8 1.1.1 Available under license : The ZLIB library, available as Ionic.Zlib.dll or as part of DotNetZip, is a ported-then-modified version of jzlib, which itself is based on zlib-1.1.3, the well-known C-language compression library. -
JPEG 2000 Jako Archivní Formát Obrazových Dat1
Název článku JPEG 2000 jako archivní formát obrazových dat1 JPEG 2000 as Archival Format for still Images Mgr. Natalie Ostráková / Národní knihovna České republiky (The National Library of Czech Republic), Klementinum 190, 110 00 Praha 1, Česká republika Resumé: Formát JPEG 2000 byl představen v roce 2000 a od té doby se zvažuje jeho vhodnost pro dlouhodobé uchovávání digitálních obrazových dat. V projektu Národní digitální knihovna se používá pro dlouhodobé uchovávání obrazových dat vzniklých digitalizací sbírek knihoven. V člán- ku je formát krátce představen, pozornost se věnuje jednotlivým parametrům profilu formátu, exis- tujícím nástrojům a hlavně využití formátu ve významných zahraničních institucích s cílem ukázat, že se jedná o vhodný a ve světě užívaný archivační formát. Klíčová slova: JPEG 2000, JP2, dlouhodobé uchovávání, digitální archivace, souborové formáty Summary: Format JPEG 2000 was introduced in 2000 and since then we can see discussions about its suitability for long term preservation of still images. Actually JPEG 2000 is used in project National digital library as archival master for images created during digitization of library collections. In the text the format is briefly introduced, then focusing on its profile parameters, on tools able to work with the format and especially text focuses on use of the format in important foreign institu- tions with aim to show, that it is suitable and in use archival format. Keywords: JPEG 2000, JP2, long term preservation, digital preservation, file formats Téma formátů vhodných k dlouhodobé archivaci digitálních dokumentů je jedním ze stěžejních témat digitální archivace. Volba vhodných formátů, jejich identifikace, vali- dace a charakterizace jsou častým tématem příspěvků oborových událostí (konferencí, webinářů apod.) a otázkou doporučených formátů se zabývají pracovníci paměťových institucí. -
JPEG 2000 CODEC Certification Guidance for 1000 Ppi Fingerprint Friction Ridge Imagery
Withdrawn NIST Technical Series Publication Warning Notice The attached publication has been withdrawn (archived), and is provided solely for historical purposes. It may have been superseded by another publication (indicated below). Withdrawn Publication Series/Number NIST SP 500-300 Title JPEG 2000 CODEC Certification Guidance for 1000 ppi Fingerprint Friction Ridge Imagery Publication Date(s) April 2016 Withdrawal Date May 2021 Withdrawal Note Replaced by updated version Superseding Publication(s) (if applicable) The attached publication has been superseded by the following publication(s): Series/Number NIST SP 500-300 Title JPEG 2000 CODEC Certification Guidance for 1000 ppi Fingerprint Friction Ridge Imagery Author(s) Shahram Orandi; John Libert; John Grantham; Michael Garris; Fred Byers Publication Date(s) May 2021 URL/DOI https://doi.org/10.6028/NIST.SP.500-300-upd Additional Information (if applicable) Contact Latest revision of the attached publication Related Information Withdrawal Announcement Link Updated: May 2021 NIST Special Publication 500-300 JPEG 2000 CODEC Certification Guidance for 1000 ppi Fingerprint Friction Ridge Imagery Shahram Orandi John Libert John Grantham Mike Garris Fred Byers This publication is available for free at: http://dx.doi.org/10.6028/NIST.SP.500-300 NIST Special Publication 500-300 JPEG 2000 CODEC Certification Guidance for 1000 ppi Fingerprint Friction Ridge Imagery Shahram Orandi John Libert Michael Garris Fred Byers Information Access Division Information Technology Laboratory John Grantham Systems Plus, Inc. Rockville, MD This publication is available for free at: http://dx.doi.org/10.6028/NIST.SP.500-300 April 2016 U.S. Department of Commerce Penny Pritzker, Secretary National Institute of Standards and Technology Willie E. -
International Standard Iso/Iec 15444-5
This is a previewINTERNATIONAL - click here to buy the full publication ISO/IEC STANDARD 15444-5 Second edition 2015-10-15 Information technology — JPEG 2000 image coding system: Reference software Technologies de l'information — Système de codage d'images JPEG 2000: Logiciel de référence Reference number ISO/IEC 15444-5:2015(E) © ISO/IEC 2015 ISO/IEC 15444-5:2015(E) This is a preview - click here to buy the full publication COPYRIGHT PROTECTED DOCUMENT © ISO/IEC 2015 All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester. ISO copyright office Ch. de Blandonnet 8 CP 401 CH-1214 Vernier, Geneva, Switzerland Tel. + 41 22 749 01 11 Fax + 41 22 749 09 47 E-mail [email protected] Web www.iso.org Published in Switzerland ii © ISO/IEC 2015 – All rights reserved ISO/IEC 15444-5:2015(E) This is a preview - click here to buy the full publication Foreword ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.