Webinar
PDF/A for Scanned Documents
Paper Becomes Digital
Mark McKinney, LuraTech, Inc., President
Armin Ortmann, LuraTech, CTO
Mark McKinney
President, LuraTech, Inc.
© 2009 PDF/A Competence Center, www.pdfa.org
Existing Solutions for Scanned Documents
Black & White: TIFF G4
Color: Mostly JPEG, but sometimes PNG, BMP and other raster graphics formats
Often special format variants like “JPEG in TIFF”
Disadvantages:
• Several formats already for scanned documents
• Even more formats for born-digital documents
• Loss of information, e.g. with TIFF G4
• Bad image quality and huge file size, e.g. with JPEG
• No standardized metadata across formats
• No full-text search (OCR) inside the files
Black/White: TIFF FAX G4, TIFF LZW
Color: TIFF, JPEG, PDF
Existing Solutions for Scanned Documents
Bad image quality vs. file size
• TIFF G4: 60 kB
• JPEG: 180 kB
• TIFF/BMP: 23.8 MB
Alternative Solution: PDF
PDF is already widely used to:
• Unify file formats
  - Image → PDF
  - “Office” documents → PDF
  - Other sources → PDF
• Create full-text searchable files
• Apply modern compression technology (e.g. the JPEG2000 family of file formats)
• Harmonize metadata
Conclusion: PDF avoids the disadvantages of the legacy formats
“So if you are already using PDF as archival format, why not use PDF/A with its many advantages?”
PDF/A
What is PDF/A?
• ISO 19005-1, Document Management
• Electronic document file format for long-term preservation
Goals of PDF/A:
• Maintain static visual representation of documents
• Consistent handling of metadata
• Option to maintain structure and semantic meaning of content
• Transparency to guarantee access
• Limit the number of restrictions
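As an illustration of consistent metadata handling, a PDF/A file declares its own level in its XMP metadata under the `pdfaid` namespace (`pdfaid:part` and `pdfaid:conformance`). The sketch below is a heuristic only: it assumes the XMP packet is stored unfiltered so a plain byte search can find it, the function name `claimed_pdfa_level` is hypothetical, and reading the declaration is not the same as validating conformance.

```python
import re

# The declaration may be written as an XML attribute or as an XML element.
_PART = re.compile(
    rb'pdfaid:part\s*=\s*["\'](\d)["\']|<pdfaid:part>\s*(\d)\s*</pdfaid:part>'
)
_CONF = re.compile(
    rb'pdfaid:conformance\s*=\s*["\']([A-Za-z])["\']'
    rb'|<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>'
)

def claimed_pdfa_level(data: bytes):
    """Return e.g. 'PDF/A-1b' if the bytes declare a PDF/A level, else None.

    Hypothetical helper: it reads the claim, it does not check the file.
    """
    part = _PART.search(data)
    if part is None:
        return None
    conf = _CONF.search(data)
    level = (part.group(1) or part.group(2)).decode()
    letter = (conf.group(1) or conf.group(2)).decode().lower() if conf else ""
    return f"PDF/A-{level}{letter}"

# Made-up XMP fragment for demonstration purposes.
sample = (
    b'<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" '
    b'pdfaid:part="1" pdfaid:conformance="B"/>'
)
print(claimed_pdfa_level(sample))
```

For a file on disk, one would pass `open(path, "rb").read()`; a full check needs a dedicated PDF/A validator.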
PDF/A – Full-Text Searchability (OCR)
Benefit: Searchable at the File Level
• Digital Library - “after book download”
• Large Manuals / Multi-Page Construction Documents
• Downloaded Documents from Archive Databases
• Documents sent to customers, suppliers, lawyers, etc. as email attachments
PDF/A – Enhanced Compression
For Black & White Documents
JBIG2 - ISO/IEC 14492
• Used as alternative to TIFF G4
• Full and visual lossless mode
• Embedded in PDF/A, available in Acrobat Reader
File size comparison:
• FAX G4: 60 kB
• JBIG2/lossless: 46 kB
• JBIG2/lossy: 29 kB
PDF/A – Enhanced Compression
For Color Documents
• MRC (Mixed Raster Content) compression, as used in JPEG2000 Part 6 (JPM)
• Splits documents into three layers that are compressed independently and stored in PDF/A
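The layer idea can be sketched in a few lines. This is a toy illustration under strong assumptions, not LuraTech's method: it works on a tiny grayscale grid, uses a fixed threshold to build the mask, and collapses the foreground and background layers to one average value each, whereas a real MRC codec keeps them as images. The names `split_mrc_layers` and `recombine` are hypothetical.

```python
def split_mrc_layers(pixels, threshold=128):
    """Split a grayscale page into a mask, a foreground and a background.

    pixels: list of rows, each a list of 0-255 gray values.
    Pixels darker than `threshold` are treated as text (assumption).
    """
    # Bi-level mask: 1 where the pixel belongs to the text layer.
    mask = [[1 if p < threshold else 0 for p in row] for row in pixels]
    text = [p for row in pixels for p in row if p < threshold]
    paper = [p for row in pixels for p in row if p >= threshold]
    # Toy simplification: one average value per layer instead of an image.
    foreground = sum(text) // len(text) if text else 0
    background = sum(paper) // len(paper) if paper else 255
    return mask, foreground, background

def recombine(mask, foreground, background):
    """Rebuild the page: the mask selects foreground or background per pixel."""
    return [[foreground if m else background for m in row] for row in mask]

page = [[250, 20, 245],
        [30, 240, 25]]
mask, fg, bg = split_mrc_layers(page)
print(mask, fg, bg)
print(recombine(mask, fg, bg))
```

The point of the split is that each layer can be given a codec suited to it, for example a bi-level codec for the mask and a lossy image codec for the other two layers.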
PDF/A – Enhanced Compression
For Color Documents
• Extreme compression, fully legible
• Saves the color and the visual quality
File size comparison:
• TIFF: 23.8 MB
• JPEG: 180 kB
• PDF/A: 65 kB
• TIFF G4: 60 kB
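To put these figures in proportion, the following sketch computes size ratios relative to PDF/A. It assumes the slide's numbers pair up as TIFF 23.8 MB, JPEG 180 kB, PDF/A 65 kB and TIFF G4 60 kB, and that 1 MB = 1000 kB; the helper `ratio_vs` is hypothetical.

```python
# Figures from the slide, under the pairing assumed above (1 MB = 1000 kB).
SIZES_KB = {
    "TIFF": 23.8 * 1000,
    "JPEG": 180,
    "PDF/A": 65,
    "TIFF G4": 60,
}

def ratio_vs(reference: str, other: str) -> float:
    """Return how many times larger `other` is than `reference`."""
    return SIZES_KB[other] / SIZES_KB[reference]

for name in ("TIFF", "JPEG", "TIFF G4"):
    print(f"{name} is {ratio_vs('PDF/A', name):.1f}x the size of PDF/A")
```

Under these assumptions the color PDF/A is roughly 366 times smaller than the uncompressed TIFF and close in size to the black-and-white TIFF G4.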
PDF Compressor Basics: How it works
[Diagram: Paper → Scanner → TIFF / JPEG → Network / Workflow → LuraDocument PDF Compressor (Conversion and Optimization Process) → Storage / ECM]
Convert scanned documents
• Batch conversion, “unattended”
• Fully automated
Demo
Armin, let’s have a look!
Question:
PDF/A: hype or the future archiving format?
PDF/A – Example e-Government
Medical and Student Records
State of New York Long-term Archive
• Department of Health
• Department of Education
Project Outline
• Previously using 1 terabyte of storage every 2 weeks
• Capture all documents with Scan Service Provider with Fujitsu and Kodak scanners
• Convert images to optimized PDF/A with LuraDocument PDF Compressor
• Deliver and store PDF/A documents with ECM
Results
• Highly compressed PDF/A files reduce storage costs and bandwidth needs by 90%
• Long-term readability of all files with retention time of over 40 years
• Files are now available quickly for daily research
AIIM 2008: Best Practices Award
GTC West 2008: Best Solutions Award
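As a rough worked example of the storage figures in this case study, the sketch below projects yearly storage growth. It assumes a constant intake of 1 TB every 2 weeks, 52 weeks per year, and that the quoted 90% reduction applies directly to stored volume; none of these assumptions are stated on the slide, and `yearly_storage_tb` is a hypothetical helper.

```python
WEEKS_PER_YEAR = 52  # assumption for the projection

def yearly_storage_tb(tb_per_period, weeks_per_period, reduction_percent=0):
    """Storage added per year in TB after a percentage size reduction."""
    periods_per_year = WEEKS_PER_YEAR / weeks_per_period
    return tb_per_period * periods_per_year * (100 - reduction_percent) / 100

before = yearly_storage_tb(1, 2)
after = yearly_storage_tb(1, 2, reduction_percent=90)
print(f"before: {before:.1f} TB/year, after: {after:.1f} TB/year")
```

Under these assumptions, yearly growth drops from 26 TB to 2.6 TB.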
PDF/A – Example Credit Files
Mailroom for credit files and international checks
Example: HeLaBa (German State Bank) Mailroom
• Revenue: 168B Euros
• Employees: 5,700
Project Outline
• Convert a paper-based archive of 20 million pages to PDF/A
• Convert all daily incoming mail to PDF/A
• Create complete electronic credit files
• Tools used: LuraTech PDF Compressor, Kofax Ascent, EMC Centera, Wincor Nixdorf archive:net (Taxnet)
Results
• Full color scans in electronic archive
• Highly compressed PDF/A files
• Full-text searchable credit files
• Long-term readability of credit files
• First step on the way to a single archiving format
Billions of Pages Preserved
• Airbus (D)
• AOK (D)
• APO-Bank (D)
• Bank Julius Baer (CH)
• Blohm & Voss (D)
• Bosch Rexroth (D)
• British Library (UK)
• City of Arlington (USA)
• City of Toronto (CA)
• DAK Insurance (D)
• Department of Defense (USA)
• Harvard Library (USA)
• Het Utrechts Archief (NL)
• International Labor Organization (CH)
• Library of Congress (USA)
• OCE (NL/D)
• RWE Energy (D)
• Siemens (D)
• Southern Nuclear (USA)
• Southern CA Edison (USA)
• Sparkassen Informatik (D)
• State of New York (USA)
• Swiss RE (CH)
• Universa Insurance (D)
• Vattenfall (D)
• West LB (D)
A few of the projects that LuraTech knows about.
PDF/A for Scanned Documents
Thank you for your interest!
Please fill out our questionnaire.
Demo software or more information?