Webinar
www.pdfa.org
PDF/A for Scanned Documents
Paper Becomes Digital
Mark McKinney, LuraTech, Inc., President Armin Ortmann, LuraTech, CTO
Mark McKinney President, LuraTech, Inc.
© 2009 PDF/A Competence Center, www.pdfa.org
Existing Solutions for Scanned Documents
Black & White: TIFF G4
Color: Mostly JPEG, but sometimes PNG, BMP and other raster graphics formats
Often special variants such as “JPEG in TIFF”
Disadvantages:
• Several formats already in use for scanned documents
• Even more formats for born-digital documents
• Loss of information, e.g. with TIFF G4
• Bad image quality and huge file size, e.g. with JPEG
• No standardized metadata across all formats
• No full-text search (OCR) inside the files
Formats in use (Black/White and Color): TIFF FAX G4, TIFF, TIFF LZW, JPEG, PDF
Existing Solutions for Scanned Documents
Bad image quality vs. file size
TIFF/BMP: 23.8 MB
JPEG: 180 kB
TIFF G4: 60 kB
Alternative Solution: PDF
PDF is already widely used to:
• Unify file formats
  Image → PDF
  “Office” Documents → PDF
  Other sources → PDF
• Create full-text searchable files
• Apply modern compression technology (e.g. the JPEG2000 file format family)
• Harmonize metadata
Conclusion:
PDF avoids the disadvantages of the legacy formats
“So if you are already using PDF as archival format, why not use PDF/A with its many advantages?”
PDF/A
What is PDF/A?
• ISO 19005-1, Document Management
• Electronic document file format for long-term preservation

Goals of PDF/A:
• Maintain static visual representation of documents
• Consistent handling of metadata
• Option to maintain structure and semantic meaning of content
• Transparency to guarantee access
• Limit the number of restrictions
PDF/A – Full-Text Searchability (OCR)
Benefit: Searchable at the File Level
• Digital Library - “after book download”
• Large Manuals / Multi-Page Construction Documents
• Downloaded Documents from Archive Databases
• Documents sent to customers, suppliers, lawyers, etc. as email attachments
PDF/A – Enhanced Compression
For Black & White Documents
• JBIG2 - ISO/IEC 14492
• Used as an alternative to TIFF G4
• Lossless and visually lossless modes
• Embedded in PDF/A, readable in Acrobat Reader
FAX G4: 60 kB
JBIG2/lossless: 46 kB
JBIG2/lossy: 29 kB
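The relative saving of JBIG2 over FAX G4 follows directly from the three file sizes on this slide. A minimal sketch of that arithmetic; the function name and dictionary layout are illustrative only, and the sizes apply to the single sample page shown, not to scanned documents in general:

```python
# File sizes in kB, taken from the slide (one sample page).
SIZES_KB = {
    "FAX G4": 60,
    "JBIG2/lossless": 46,
    "JBIG2/lossy": 29,
}


def savings_percent(sizes_kb, baseline="FAX G4"):
    """Return how much smaller each entry is than the baseline, in percent."""
    base = sizes_kb[baseline]
    return {name: (base - size) / base * 100 for name, size in sizes_kb.items()}


savings = savings_percent(SIZES_KB)
for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% smaller than FAX G4")
```

For this sample, lossless JBIG2 comes out roughly a quarter smaller than FAX G4 and lossy JBIG2 roughly half.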
PDF/A – Enhanced Compression
For Color Documents
• MRC (Mixed Raster Content) compression, as standardized in the JPEG2000 family (JPM)
• Splits each document into three layers that are compressed independently and stored in PDF/A
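The three-layer idea can be illustrated with a toy sketch. It assumes the usual MRC layer model (a binary mask, a foreground layer and a background layer) and uses a plain brightness threshold to build the mask; real products use far more sophisticated segmentation, and the function names, the threshold and the list-of-rows image format are all hypothetical. The per-layer compression step itself is omitted.

```python
def mrc_split(image, threshold=128, fg_fill=0, bg_fill=255):
    """Split a grayscale image (rows of 0-255 ints) into mask, foreground, background.

    Assumption: dark pixels (below the threshold) count as text/foreground.
    """
    mask = [[pixel < threshold for pixel in row] for row in image]
    foreground = [
        [p if m else fg_fill for p, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]
    background = [
        [bg_fill if m else p for p, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]
    return mask, foreground, background


def mrc_merge(mask, foreground, background):
    """Recombine the layers: the mask selects foreground or background per pixel."""
    return [
        [f if m else b for m, f, b in zip(mrow, frow, brow)]
        for mrow, frow, brow in zip(mask, foreground, background)
    ]


# A tiny "page": light paper (values near 250) with a few dark text pixels.
page = [
    [250, 248, 20, 251],
    [249, 15, 18, 250],
    [247, 246, 22, 252],
]
mask, fg, bg = mrc_split(page)
restored = mrc_merge(mask, fg, bg)
print(restored == page)
```

The point of the split is that each layer can then go to a codec suited to it, for example a bi-level codec for the mask and a continuous-tone codec for the two image layers.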
PDF/A – Enhanced Compression
For Color Documents
• Extreme compression, fully legible
• Saves the color and the visual quality
TIFF: 23.8 MB
TIFF G4: 60 kB
JPEG: 180 kB
PDF/A: 65 kB
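To put these four sizes on a common scale, the compression ratio relative to the uncompressed TIFF can be computed as below. This is only a sketch of the arithmetic: it assumes 1 MB = 1000 kB, the names are illustrative, and the figures describe the single sample page on the slide. Note that the ratio says nothing about quality: TIFF G4 is slightly smaller than PDF/A here but is black and white only.

```python
# File sizes in kB, taken from the slide (one sample page).
SIZES_KB = {
    "TIFF": 23_800,  # 23.8 MB, assuming 1 MB = 1000 kB
    "TIFF G4": 60,
    "JPEG": 180,
    "PDF/A": 65,
}


def compression_ratios(sizes_kb, baseline="TIFF"):
    """Return baseline size divided by each format's size."""
    base = sizes_kb[baseline]
    return {name: base / size for name, size in sizes_kb.items()}


ratios = compression_ratios(SIZES_KB)
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}:1")
```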
PDF Compressor Basics: How it works
[Diagram: Paper, Scanner, TIFF / JPEG, Network / Workflow, LuraDocument PDF Compressor (Conversion and Optimization Process), Storage / ECM]
• Convert scanned documents
• Batch conversion “unattended”
• Fully automated
Demo
Armin, let’s have a look!
Question:
PDF/A: hype or the future archiving format?
PDF/A – Example e-Government
Medical and Student Records
State of New York Long-term Archive
• Department of Health
• Department of Education

Project Outline
• Previously using 1 terabyte of storage every 2 weeks
• Capture all documents through a scan service provider using Fujitsu and Kodak scanners
• Convert images to optimized PDF/A with LuraDocument PDF Compressor
• Deliver and store PDF/A documents with ECM

Results
• Highly compressed PDF/A files reduce storage costs and bandwidth needs by 90%
• Long-term readability of all files with a retention time of over 40 years
• Files are now available quickly for daily research
• AIIM 2008: Best Practices Award
• GTC West 2008: Best Solutions Award
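A rough feel for what these figures mean per year can be worked out as follows. This is a back-of-the-envelope sketch, not project data: it assumes the intake of 1 terabyte every 2 weeks is constant over a 52-week year, and that the 90% reduction quoted for storage costs applies equally to stored volume. All names are illustrative.

```python
TB_PER_TWO_WEEKS = 1.0  # from the slide: 1 terabyte every 2 weeks
REDUCTION = 0.90        # from the slide; assumed here to apply to stored volume
WEEKS_PER_YEAR = 52     # assumption: constant intake all year


def yearly_storage_tb(tb_per_two_weeks, reduction=0.0):
    """Estimate yearly storage growth in TB for a given reduction factor."""
    periods_per_year = WEEKS_PER_YEAR / 2
    return tb_per_two_weeks * periods_per_year * (1.0 - reduction)


before = yearly_storage_tb(TB_PER_TWO_WEEKS)
after = yearly_storage_tb(TB_PER_TWO_WEEKS, REDUCTION)
print(f"before: {before:.1f} TB/year, after: {after:.1f} TB/year")
```

Under these assumptions the archive would grow by about 26 TB per year before conversion and about 2.6 TB per year afterwards.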
PDF/A – Example Credit Files
Mailroom for credit files and international checks
Example: HeLaBa (German State Bank) Mailroom
• Revenue: 168B Euros
• Employees: 5,700

Project Outline
• Convert a paper-based archive of 20 million pages to PDF/A
• Convert all daily incoming mail to PDF/A
• Create complete electronic credit files
• Tools used: LuraTech PDF Compressor, Kofax Ascent, EMC Centera, Wincor Nixdorf archive:net (Taxnet)

Results
• Full color scans in electronic archive
• Highly compressed PDF/A files
• Full-text searchable credit files
• Long-term readability of credit files
• First step on the way to a single archiving format

Billions of Pages Preserved
• Airbus (D)
• AOK (D)
• APO-Bank (D)
• Bank Julius Baer (CH)
• Blohm & Voss (D)
• Bosch Rexroth (D)
• British Library (UK)
• City of Arlington (USA)
• City of Toronto (CA)
• DAK Insurance (D)
• Department of Defense (USA)
• Harvard Library (USA)
• Het Utrechts Archief (NL)
• International Labor Organization (CH)
• Library of Congress (USA)
• OCE (NL/D)
• RWE Energy (D)
• Siemens (D)
• Southern Nuclear (USA)
• Southern CA Edison (USA)
• West LB (D)
• Sparkassen Informatik (D)
• State of New York (USA)
• Swiss RE (CH)
• Universa Insurance (D)
• Vattenfall (D)
A few of the projects that LuraTech knows about.
PDF/A for Scanned Documents
Thanks for your interest!
Please fill out our questionnaire.
Demo software or more information?