Webinar

www.pdfa.org

PDF/A for Scanned Documents

Paper Becomes Digital

Mark McKinney, President, LuraTech, Inc.
Armin Ortmann, CTO, LuraTech

Mark McKinney President, LuraTech, Inc.

© 2009 PDF/A Competence Center, www.pdfa.org

Existing Solutions for Scanned Documents

Black & White: TIFF G4

Color: Mostly JPEG, but sometimes PNG, BMP and other formats

Often special format variants such as “JPEG in TIFF”

Disadvantages:
• Several formats already in use for scanned documents
• Even more formats for born-digital documents
• Loss of information, e.g. with TIFF G4
• Bad image quality and huge file size, e.g. with JPEG
• No single standard that covers all formats
• Files are not full-text searchable (no OCR text inside the files)

Formats in use (black/white and color): TIFF G4, TIFF, TIFF LZW, JPEG, PDF

Existing Solutions for Scanned Documents

Bad image quality vs. file size

Format      File size
TIFF/BMP    23.8 MB
JPEG        180 kB
TIFF G4     60 kB

Alternative Solution: PDF

PDF is already widely used to:
• Unify file formats
  Image → PDF
  “Office” documents → PDF
  Other sources → PDF
• Create full-text searchable files
• Apply modern compression technology (e.g. the JPEG2000 family of file formats)
• Harmonize metadata

Conclusion:

PDF avoids the disadvantages of the legacy formats

“So if you are already using PDF as an archival format, why not use PDF/A with its many advantages?”

PDF/A

What is PDF/A?
• ISO 19005-1, Document Management
• Electronic document file format for long-term preservation

Goals of PDF/A:
• Maintain static visual representation of documents
• Consistent handling of metadata
• Option to maintain structure and semantic meaning of content
• Guarantee access
• Limit the number of restrictions

PDF/A – Full-Text Searchability (OCR)

Benefit: Searchable at the File Level
• Digital Library - “after book download”
• Large manuals / multi-page construction documents
• Documents downloaded from archive databases
• Documents sent to customers, suppliers, lawyers, etc. as email attachments

PDF/A – Enhanced Compression

For Black & White Documents
• JBIG2 (ISO/IEC 14492)
• Used as an alternative to TIFF G4
• Lossless and visually lossless modes
• Embedded in PDF/A, available in Acrobat Reader

Format            File size
FAX G4            60 kB
JBIG2/lossless    46 kB
JBIG2/lossy       29 kB

PDF/A – Enhanced Compression

For Color Documents
• MRC (Mixed Raster Content) compression, as used in JPEG2000 Part 6 (JPM)
• Splits each page into three layers (foreground, background and mask) that are compressed independently and stored in PDF/A
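The layer idea can be illustrated with a very small sketch. This is not how LuraDocument PDF Compressor segments a page; it assumes a plain luminance threshold as a stand-in for real text detection, and the names `split_layers` and `recombine` are hypothetical. In practice the bi-level mask would typically go to a bi-level coder such as JBIG2, and the two color layers to a continuous-tone coder such as JPEG2000.

```python
# Toy illustration of Mixed Raster Content (MRC) layer separation.
# Assumption: a simple luminance threshold stands in for real text segmentation.

from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]              # (R, G, B), each 0..255
Image = List[List[Pixel]]
Layer = List[List[Optional[Pixel]]]       # None marks "not in this layer"


def luminance(p: Pixel) -> float:
    """Approximate perceived brightness (ITU-R BT.601 weights)."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def split_layers(page: Image, threshold: float = 128.0):
    """Split a page into (mask, foreground, background).

    mask[y][x] is True where the pixel is treated as text or line art.
    The foreground keeps the color of masked pixels, the background the rest,
    so each layer could be handed to a different compressor.
    """
    mask = [[luminance(p) < threshold for p in row] for row in page]
    foreground: Layer = [[p if m else None for p, m in zip(row, mrow)]
                         for row, mrow in zip(page, mask)]
    background: Layer = [[None if m else p for p, m in zip(row, mrow)]
                         for row, mrow in zip(page, mask)]
    return mask, foreground, background


def recombine(mask, foreground: Layer, background: Layer) -> Image:
    """Rebuild the page: the mask selects foreground or background per pixel."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]


if __name__ == "__main__":
    white, black, cream = (255, 255, 255), (0, 0, 0), (250, 240, 200)
    page = [[white, black, white],
            [cream, black, white]]
    mask, fg, bg = split_layers(page)
    print("mask:", mask)
    print("round trip ok:", recombine(mask, fg, bg) == page)
```

The round trip is exact here only because nothing is compressed; a real MRC encoder trades some fidelity in the color layers for size while keeping the mask sharp so that text stays legible.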

PDF/A – Enhanced Compression

For Color Documents
• Extreme compression, fully legible
• Preserves color and visual quality

Format     File size
TIFF       23.8 MB
TIFF G4    60 kB
JPEG       180 kB
PDF/A      65 kB
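To put the quoted file sizes in proportion, the following sketch computes how each format compares with the PDF/A file. The sizes are the ones on the slide; treating 1 MB as 1000 kB is an assumption, and `ratio_to` is a hypothetical helper name.

```python
# File sizes as quoted on the slide, in kB.
# Assumption: 1 MB is taken as 1000 kB.
SIZES_KB = {
    "TIFF": 23.8 * 1000,
    "TIFF G4": 60,
    "JPEG": 180,
    "PDF/A": 65,
}


def ratio_to(reference: str, other: str, sizes=SIZES_KB) -> float:
    """Return how many times larger `other` is than `reference`."""
    return sizes[other] / sizes[reference]


if __name__ == "__main__":
    for name in ("TIFF", "JPEG", "TIFF G4"):
        print(f"{name}: {ratio_to('PDF/A', name):.2f}x the size of the PDF/A file")
```

On these figures the color PDF/A file is only slightly larger than the black-and-white TIFF G4 file, while the uncompressed TIFF is several hundred times larger.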

PDF Compressor Basics: How it works

[Diagram labels: Paper, Scanner, TIFF / JPEG / PDF, Network / Workflow, LuraDocument PDF Compressor (Conversion and Optimization Process), Storage / ECM]

• Convert scanned documents
• Batch conversion “unattended”
• Fully automated

Demo

Armin, let’s have a look!

Question:

PDF/A: hype or the future archiving format?

PDF/A – Example e-Government

Medical and Student Records

State of New York Long-term Archive
• Department of Health
• Department of Education

Project Outline
• Previously using 1 terabyte of storage every 2 weeks
• Capture all documents through a scan service provider using Fujitsu and Kodak scanners
• Convert images to optimized PDF/A with LuraDocument PDF Compressor
• Deliver and store PDF/A documents with ECM

Results
• Highly compressed PDF/A files reduce storage costs and bandwidth needs by 90%
• Long-term readability of all files with a retention time of over 40 years
• Files are now available quickly for daily research

AIIM 2008: Best Practices Award
GTC West 2008: Best Solutions Award
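As a rough illustration of the storage figures in this example, the sketch below projects yearly storage growth before and after conversion. It assumes that the 90% reduction applies uniformly to stored volume and that a year has 52 weeks; neither assumption comes from the project itself, and `yearly_storage_tb` is a hypothetical helper.

```python
WEEKS_PER_YEAR = 52  # assumption: a 52-week year


def yearly_storage_tb(tb_per_period: float, period_weeks: float,
                      reduction: float = 0.0) -> float:
    """Project yearly storage growth in terabytes.

    tb_per_period: storage consumed in one period (here 1 TB)
    period_weeks:  length of that period in weeks (here 2)
    reduction:     fraction saved by compression, 0.0 to 1.0 (here 0.90)
    """
    periods_per_year = WEEKS_PER_YEAR / period_weeks
    return tb_per_period * periods_per_year * (1.0 - reduction)


if __name__ == "__main__":
    before = yearly_storage_tb(1.0, 2)
    after = yearly_storage_tb(1.0, 2, reduction=0.90)
    print(f"before conversion: {before:.1f} TB per year")
    print(f"after conversion:  {after:.1f} TB per year")
```

Under these assumptions the archive would grow by 26 TB per year before conversion and by about 2.6 TB per year afterwards.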

PDF/A – Example Credit Files

Mailroom for credit files and international checks

Example: HeLaBa (German State Bank) Mailroom
• Revenue: 168B Euros
• Employees: 5,700

Project Outline
• Convert a paper-based archive of 20 million pages to PDF/A
• Convert all daily incoming mail to PDF/A
• Create complete electronic credit files
• Tools used: LuraTech PDF Compressor, Kofax Ascent, EMC Centera, Wincor Nixdorf archive:net (Taxnet)

Results
• Full-color scans in electronic archive
• Highly compressed PDF/A files
• Full-text searchable credit files
• Long-term readability of credit files
• First step on the way to a single archiving format

Billions of Pages Preserved

Airbus (D)
AOK (D)
APO-Bank (D)
Bank Julius Baer (CH)
Blohm & Voss (D)
Bosch Rexroth (D)
British Library (UK)
City of Arlington (USA)
City of Toronto (CA)
DAK Insurance (D)
Department of Defense (USA)
Harvard Library (USA)
Het Utrechts Archief (NL)
International Labor Organization (CH)
Library of Congress (USA)
OCE (NL/D)
RWE Energy (D)
Siemens (D)
Southern Nuclear (USA)
Southern CA Edison (USA)
West LB (D)
Sparkassen Informatik (D)
State of New York (USA)
Swiss RE (CH)
Universa Insurance (D)
Vattenfall (D)

A few of the projects that LuraTech knows about.

PDF/A for Scanned Documents

Thanks for your interest!

Please fill out our questionnaire.

Demo software or more information?

[email protected]
