Webinar
PDF/A for Scanned Documents
Paper Becomes Digital
Mark McKinney, LuraTech, Inc., President
Armin Ortmann, LuraTech, CTO
© 2009 PDF/A Competence Center, www.pdfa.org

Existing Solutions for Scanned Documents
• Black & White: TIFF G4
• Color: Mostly JPEG, but sometimes PNG, BMP and other raster graphics formats
• Often special format variants like “JPEG in TIFF”
Disadvantages:
• Several formats already for scanned documents
• Even more formats for born-digital documents
• Loss of information, e.g. with TIFF G4
• Bad image quality and huge file size, e.g. with JPEG
• No standardized metadata across all formats
• No full-text search (OCR) inside the files
Black/White:
- TIFF FAX G4
- TIFF LZW
Color:
- TIFF
- JPEG
- PDF

Existing Solutions for Scanned Documents
Bad image quality vs. file size
Format      File size
TIFF/BMP    23.8 MB
JPEG        180 kB
TIFF G4     60 kB

Alternative Solution: PDF
PDF is already widely used to:
• Unify file formats
  - Image → PDF
  - “Office” documents → PDF
  - Other sources → PDF
• Create full-text searchable files
• Apply modern compression technology (e.g. the JPEG2000 file format family)
• Harmonize metadata
Conclusion: PDF avoids the disadvantages of the legacy formats
“So if you are already using PDF as an archival format, why not use PDF/A with its many advantages?”

PDF/A
What is PDF/A?
• ISO 19005-1, Document Management
• Electronic document file format for long-term preservation
Goals of PDF/A:
• Maintain static visual representation of documents
• Consistent handling of metadata
• Option to maintain structure and semantic meaning of content
• Transparency to guarantee access
• Limit the number of restrictions

PDF/A – Full-Text Searchability (OCR)
Benefit: Searchable at the file level
• Digital library: “after book download”
• Large manuals / multi-page construction documents
• Documents downloaded from archive databases
• Documents sent to customers, suppliers, lawyers, etc. as email attachments

PDF/A – Enhanced Compression
For Black & White Documents
• JBIG2 - ISO/IEC 14492
• Used as an alternative to TIFF G4
• Lossless and visually lossless modes
• Embedded in PDF/A, available in Acrobat Reader
Format            File size
FAX G4            60 kB
JBIG2/lossless    46 kB
JBIG2/lossy       29 kB

PDF/A – Enhanced Compression
For Color Documents
• MRC (Mixed Raster Content) compression, the layered approach also used by JPEG2000 Part 6 (JPM)
• Splits documents into three layers that are compressed independently and stored in PDF/A

PDF/A – Enhanced Compression
For Color Documents
• Extreme compression, fully legible
• Preserves the color and the visual quality
Format     File size
TIFF       23.8 MB
TIFF G4    60 kB
JPEG       180 kB
PDF/A      65 kB

PDF Compressor Basics: How it works
[Diagram: paper, scanner, TIFF/JPEG images, network/workflow, LuraDocument PDF Compressor (PDF conversion and optimization process), PDF, storage/ECM]
• Convert scanned documents
• Batch conversion, “unattended”
• Fully automated

Demo
Armin, let’s have a look!

Question:
PDF/A: hype or the future archiving format?
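
The file sizes quoted on the compression slides can be turned into ratios with a few lines of arithmetic. The following is only a minimal sketch: the numbers are the ones shown on the slides, the conversion 1 MB = 1000 kB is an assumption, and the function and variable names are hypothetical, not part of any LuraTech tool.

```python
# Sizes in kB as quoted on the slides; 1 MB = 1000 kB is an assumption.
COLOR_KB = {"TIFF": 23_800, "JPEG": 180, "PDF/A": 65}
BITONAL_KB = {"FAX G4": 60, "JBIG2/lossless": 46, "JBIG2/lossy": 29}


def compression_ratio(original_kb, compressed_kb):
    """Return how many times smaller the compressed file is."""
    return original_kb / compressed_kb


def percent_saved(original_kb, compressed_kb):
    """Return the size reduction as a percentage of the original size."""
    return 100.0 * (1.0 - compressed_kb / original_kb)


def summarize(sizes_kb, baseline):
    """Map each non-baseline format to (ratio, percent saved) vs. the baseline."""
    base = sizes_kb[baseline]
    return {
        name: (compression_ratio(base, kb), percent_saved(base, kb))
        for name, kb in sizes_kb.items()
        if name != baseline
    }


if __name__ == "__main__":
    # Color page compared with uncompressed TIFF, bitonal page with FAX G4.
    for table, baseline in ((COLOR_KB, "TIFF"), (BITONAL_KB, "FAX G4")):
        for name, (ratio, saved) in summarize(table, baseline).items():
            print(f"{name} vs {baseline}: {ratio:.1f}x smaller, {saved:.1f}% saved")
```

The color and bitonal figures are kept in separate tables because the slides compare them against different baselines.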

PDF/A – Example e-Government
Medical and Student Records
State of New York Long-term Archive
• Department of Health
• Department of Education
Project Outline
• Previously using 1 terabyte of storage every 2 weeks
• Capture all documents through a scan service provider using Fujitsu and Kodak scanners
• Convert images to optimized PDF/A with LuraDocument PDF Compressor
• Deliver and store PDF/A documents with ECM
Results
• Highly compressed PDF/A files reduce storage costs and bandwidth needs by 90%
• Long-term readability of all files with retention time of over 40 years
• Files are now available quickly for daily research
AIIM 2008: Best Practices Award
GTC West 2008: Best Solutions Award

PDF/A – Example Credit Files
Mailroom for credit files and international checks
Example: HeLaBa (German State Bank)
• Revenue: 168B Euros
• Employees: 5,700
Project Outline
• Convert a paper-based archive of 20 million pages to PDF/A
• Convert all daily incoming mail to PDF/A
• Create complete electronic credit files
• Tools used: LuraTech PDF Compressor, Kofax Ascent, EMC Centera, Wincor Nixdorf archive:net (Taxnet)
Results
• Full-color scans in the electronic archive
• Highly compressed PDF/A files
• Full-text searchable credit files
• Long-term readability of credit files
• First step on the way to a single archiving format

Billions of Pages Preserved
A few of the projects that LuraTech knows about.
• Airbus (D)
• AOK (D)
• APO-Bank (D)
• Bank Julius Baer (CH)
• Blohm & Voss (D)
• Bosch Rexroth (D)
• British Library (UK)
• City of Arlington (USA)
• City of Toronto (CA)
• DAK Insurance (D)
• Department of Defense (USA)
• Harvard Library (USA)
• Het Utrechts Archief (NL)
• International Labor Organization (CH)
• Library of Congress (USA)
• OCE (NL/D)
• RWE Energy (D)
• Siemens (D)
• Southern Nuclear (USA)
• Southern CA Edison (USA)
• West LB (D)
• Sparkassen Informatik (D)
• State of New York (USA)
• Swiss RE (CH)
• Universa Insurance (D)
• Vattenfall (D)

PDF/A for Scanned Documents
Thanks for your interest!
Please fill out our questionnaire.
Demo software or more information? [email protected]
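
As a rough illustration of the State of New York figures (1 terabyte every 2 weeks, a 90% reduction, retention of over 40 years), the sketch below estimates the yearly and long-term storage volume. It assumes the quoted 90% applies directly to stored volume, a constant scan rate, and 52 weeks per year; none of these assumptions is stated in the slides, and the function names are hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumption: a plain 52-week year


def yearly_storage_tb(tb_per_period, period_weeks, reduction=0.0):
    """Estimate storage added per year, given the volume added per period.

    reduction is the fraction saved by compression (0.9 means 90% smaller).
    """
    periods_per_year = WEEKS_PER_YEAR / period_weeks
    return tb_per_period * periods_per_year * (1.0 - reduction)


def retained_storage_tb(yearly_tb, retention_years):
    """Estimate total volume held once a full retention period has accumulated."""
    return yearly_tb * retention_years


if __name__ == "__main__":
    before = yearly_storage_tb(1.0, 2)                 # uncompressed images
    after = yearly_storage_tb(1.0, 2, reduction=0.9)   # assumed 90% smaller
    print(f"Per year: {before:.1f} TB before, {after:.1f} TB after")
    print(
        f"Over 40 years: {retained_storage_tb(before, 40):.0f} TB before, "
        f"{retained_storage_tb(after, 40):.0f} TB after"
    )
```

The estimate ignores growth in scan volume and any storage overhead from the ECM system, so it is only an order-of-magnitude figure.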