PDF/A in a Nutshell 2.0 PDF for long-term archiving

Alexandra Oettler

■■ The history of the ISO standard

■■ All versions – from PDF/A-1 to PDF/A-3

■■ How users benefit from PDF/A

■■ The technical background

■■ Tools for creating PDF/A files

■■ Validating PDF/A files

■■ PDF/A in law and administration

■■ PDF/A in finance and industry

PDF/A in a Nutshell 2.0
PDF for long-term archiving
The ISO Standard – from PDF/A-1 to PDF/A-3

This work, including all its component parts, is copyright protected. All rights based thereupon are reserved, including those of translation, reprinting, presentation, extraction of illustrations or tables, broadcasting, microfilming or reproduction by any other means, or storage in any data-processing device, in whole or in part. Reproduction of this work or any part of this work is only permitted where legally specified in the Copyright Act of the Federal Republic of Germany dated the 9th of September 1965.

© 2013 Association for Digital Document Standards e. V., Berlin
Printed in Germany

The use of any names, trade names, trade descriptions etc. in this work, even those not specially identified as such, does not justify the assumption that these names are free according to trademark protection law and thus usable by anyone.

Text: Alexandra Oettler
Layout, cover design, design and composition: Alexandra Oettler
Cover image: Paulgeor, Dreamstime.com
Picture credits: Page 5: Photocase; Page 6: Sepp Huberbauer, Photocase; Page 8: aoe; Page 13: EU Publications Office; Page 14: Rui Frias, Istockphoto.com; Page 15: MBPHOTO, Istockphoto.com; Page 18: Photocase.
Printed by: Galrev Druck- und Verlagsgesellschaft Hesse & Partner OHG

Contents

PDF/A – the ISO standard for long-term archiving 5
  The decisive advantages of PDF/A
  Widespread acceptance of PDF/A
PDF/A facts – an introduction to the standard 6
  An archiving format
  Why PDF/A and not just PDF?
A short history of PDF/A 7
  Becoming an ISO standard
  PDF/A catches on
The technical side of the PDF/A standard 8
  PDF/A-1: The first archiving standard
  PDF/A-2: Based on PDF 1.7
  PDF/A-3: One more feature
  Conformance levels: A, B, U
The most important reasons to use PDF/A 9
Typical uses for PDF/A 10
PDF/A creation tools 11
  Desktop software
  Server-based solutions
  Programming libraries
  Integrated PDF/A functions
Validation: Is it really PDF/A? 12
  When do I need to validate?
  Finding the right validation solution
PDF/A in public administration 13
PDF/A in finance and industry 14
  Industry documentation
  Banking and insurance
  Healthcare
  E-invoicing
PDF/A in legislation and justice 15
  Federal jurisdiction in the United States courts
  The Italian Chamber of Commerce
  Austria: BAIK
  Germany: Land registration
What the users and experts say 16
PDF/A and the other PDF standards 17
  PDF/X
  PDF/A
  PDF/E
  PDF
  PDF/VT
  PDF/UA
The myths and legends surrounding PDF/A 18
Further information on PDF/A 19
  The portal to the PDF Association
  PDF Association events
  Membership


Introduction
PDF/A – the ISO standard for long-term archiving

Up to the end of the 20th century, physical media formats (paper, microfilms and microfiches) were the only option for businesses and public authorities storing documents for the long term in a reproducible format. The major drawback to these analogue approaches was the significant time and effort required: documents are hard to search through, trained personnel are required, specialist equipment is needed to read microfilms, and entire climate-controlled rooms are needed to store documents.

The first digital archiving format to gain ground in many countries was the TIFF image format. In 1993, however, a modern, more powerful format became available in the form of PDF. This became the basis on which the standard archive format PDF/A was developed (see page 7).

For companies, public authorities and private users needing to store digital information for a long period of time – be it 5 years, 50 or 500 – the PDF/A standard is now the clear choice of file format.

PDF/A is a multi-part ISO standard developed over many years of committee work by industry associations, businesses and public authorities around the world. The result is “a file format based on PDF, known as PDF/A, which provides a mechanism for representing electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files.” (ISO 19005-1, quoted from the introduction).

The first part of the standard, PDF/A-1, has been available since the 1st of October 2005. Its official designation is “ISO 19005-1:2005. Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1)”.

Since then, two further parts have been made available to users: PDF/A-2 (since 2011) and PDF/A-3 (since 2012). These parts exist in parallel and are optimised to meet particular needs (see page 8).

The PDF/A standards family regulates how to create electronic documents to ensure they can be reliably reproduced for decades to come. The standard does not describe how to build a revision-safe archive, nor the theory behind one.

The ISO (International Organization for Standardization) is the largest organisation in the world for developing and publishing international standards.

The decisive advantages of PDF/A

■■ A PDF/A file contains everything needed to display it and nothing which could negatively impact the display.

■■ PDF/A files can be used on any platform.

■■ Free programs exist for displaying PDF/A files.

■■ The multi-part PDF/A standard offers great flexibility to users.

Widespread acceptance of PDF/A

PDF/A is becoming more and more common, be it in industry, public administration, financial services or academia. A large number of authorities and institutions worldwide recommend PDF/A or specifically require the use of the standard (see page 13).

Introduction
PDF/A facts – an introduction to the standard

Current file formats used by popular applications are simply not suitable for public authorities, businesses and individual users needing to store unalterable digital documents for long periods of time. Word processors such as Microsoft Word or OpenOffice Writer create files which can look very different depending on the platform used to view them. Text and images may appear different than intended – or they may not appear at all. Nowadays, there are also the questions of how these programs will develop in the future, and whether or not it will still be possible to open and view older files – an unacceptable risk when considering the timescales involved in long-term archiving.

An archiving format

When using email or the internet to distribute carefully designed documents containing text and images, users are increasingly choosing PDF. After all, the Portable Document Format can embed all elements of a document within itself. This can include fonts and images, but also 3D objects, audio and video. Embedded fonts are optional; it is also possible (in order to save on file size, for example) to link to a font instead. This, however, carries the risk that not all machines will correctly display the PDF.

PDF has also gained such broad worldwide acceptance because free programs exist for all devices and operating systems to view PDF documents. Whether viewed on a tablet, a smartphone or a desktop computer, a PDF file will usually look the same.

Document archives, however, require an exceptionally high standard: the content must always appear exactly the same under all circumstances. Particularly because of its universal availability and worldwide acceptance, it makes sense to build on PDF to create an archiving standard for digital documents.

PDF/A is an industry-recognised ISO standard. Future software development must reflect the need to work reliably with these documents.

Why PDF/A and not just PDF?

Put in the simplest possible terms, PDF/A is a PDF which forbids certain functions which could hinder long-term archiving. PDF/A also demands that the file meet certain requirements which guarantee reliable reproduction.

For example, files must not be encrypted with a password, as all content must always be fully available. Embedded video and audio data are also prohibited: PDF/A consciously avoids anything that requires external software for display or playback. JavaScript and certain actions are also forbidden, as executing them could potentially alter the PDF.

PDF/A also places higher demands on the information it contains. All required fonts (or at least all glyphs for the specific characters used) must be embedded within the PDF. To ensure a uniform colour appearance on a variety of platforms and devices, colour information must be given in a platform-independent format using ICC colour profiles. The software must also use the XMP format for metadata (which is used to store the data identifying the file as a PDF/A, for example).

PDF/A also sets technical limits: for example, the page size is limited to an edge length of either 5.08 metres (PDF/A-1) or up to 381 kilometres (PDF/A-2 and PDF/A-3).
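
The two page-size limits quoted above can be reproduced with a little arithmetic. The following sketch relies on figures that come from the underlying PDF specifications rather than from this booklet, so treat them as assumptions: a maximum page edge of 14,400 user-space units, a default unit of 1/72 inch, and a UserUnit scaling factor of up to 75,000 in later PDF versions. The constant and function names are illustrative only.

```python
# Assumed constants from the PDF specifications (not stated in this booklet).
MAX_EDGE_UNITS = 14_400    # largest page edge, in user-space units
UNITS_PER_INCH = 72        # default user-space unit is 1/72 inch
MAX_USER_UNIT = 75_000     # largest UserUnit scaling factor
METRES_PER_INCH = 0.0254


def max_edge_metres(user_unit: float = 1.0) -> float:
    """Return the longest permitted page edge in metres for a given UserUnit."""
    inches = MAX_EDGE_UNITS * user_unit / UNITS_PER_INCH
    return inches * METRES_PER_INCH


pdfa1_limit = max_edge_metres()                # PDF/A-1: no UserUnit scaling
pdfa2_limit = max_edge_metres(MAX_USER_UNIT)   # PDF/A-2 and PDF/A-3

print(f"PDF/A-1:   {pdfa1_limit:.2f} m")
print(f"PDF/A-2/3: {pdfa2_limit / 1000:.0f} km")
```

Under these assumptions the results match the 5.08 metres and 381 kilometres given in the text.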

Introduction

A short history of PDF/A

Those who first needed to store documents in a future-proof digital format used the popular image format TIFF (Tagged Image File Format). This format was used for a long time, particularly for scanned documents, but it has a number of drawbacks.

For example, the TIFF raster format contains no text-based information, meaning files cannot be searched by their text content. And if the TIFF file contains colour images or pages, it will become significantly larger; effective compression is all but impossible. Only black-and-white line images (which are sometimes enough for scanned text pages) can save much space in TIFF format.

Contrary to popular belief, TIFF is not an ISO standard. The resolution, colour and metadata settings for TIFF files are mostly left to the individual user’s discretion.

Becoming an ISO standard

As Adobe Systems’ Portable Document Format (PDF), published in 1993, grew in popularity, users and developers began to recognise its potential for long-term archiving.

In 2002, specialists from libraries and archives, from administrative bodies, from industry and from the judicial system assembled in order to develop a purpose-built file format for standardised archiving. A working group within the ISO (International Organization for Standardization) took up the task: representatives from a wide range of US-based associations and federal authorities including AIIM (Association for Information and Image Management), NPES (Association for Suppliers of Printing, Publishing and Converting Technologies) and NARA (National Archives and Records Administration) met with experts from the library sector (Harvard University Libraries, Library of Congress), the judicial system (Administrative Office of the United States Courts) and industry developers (including Adobe Systems and Kodak). After a number of meetings and a comprehensive testing and approval phase, the ISO published PDF/A on the 1st of October 2005 under the designation “ISO 19005-1:2005”. It was the world’s first standard file format for digital long-term archiving.

PDF/A catches on

In 2006, to promote recognition of PDF/A, a group of software developers founded the PDF/A Competence Centre (today a part of the PDF Association) as an industry association for digital document standards. Through seminars, conferences, publications and not least through its website www.pdfa.org, the association has helped spread practical information about the ISO standard (see page 19).

Initially active in Germany and Switzerland in particular, within a few years the PDF Association was able to expand its area of operations across Europe, America, the Middle East, Asia and Australia. By the end of 2012, the Association had 143 members across 25 countries.

Today, PDF/A has found broad acceptance in all sectors where documents are stored long-term. Numerous document management solutions provide direct support for archiving with PDF/A. More and more countries are recommending the standard in public administration, or even specifically requiring it (see page 13). Meanwhile, a correspondingly broad selection of PDF/A creation and validation software is now available (see page 12), from single-workstation solutions to automated server-based systems.

PDF/A’s wide-ranging everyday use is also seen in the number of common programs which support it. Free word processing software such as OpenOffice and LibreOffice can create PDF/A files at the click of a button, and Adobe Reader faithfully displays PDF/A documents as they were intended to be seen. Microsoft Office has also supported saving directly as PDF/A since 2007.

Technical Information
The technical side of the PDF/A standard

After the first part of PDF/A was published, two more parts arrived. These are not replacements for part 1, however; rather, they offer additional options for archiving PDF documents. All existing PDF/A files remain fully valid.

PDF/A-1: The first archiving standard

PDF/A-1 is based on PDF version 1.4, which first appeared in 2001. All resources (images, graphics, typographic characters) must be embedded within the PDF/A document itself. A PDF/A file requires precise, platform-independent colour data using ICC profiles, and XMP for the document metadata. Transparent elements, some forms of compression (LZW, JPEG2000), PDF layers, and certain actions or JavaScript are forbidden. A PDF/A file must not be password-protected. PDF/A-1 expressly supports embedded digital signatures and the use of hyperlinks.

PDF/A-2: Based on PDF 1.7

PDF/A-2 was published in 2011 as “ISO 19005-2”. Based on PDF version 1.7 (see page 17), which has since been standardised as “ISO 32000-1”, it makes use of this version’s new features. This means PDF/A-2 allows JPEG2000 compression, transparent elements and PDF layers. PDF/A-2 also allows you to embed OpenType fonts and supports PAdES (PDF Advanced Electronic Signatures)-compliant digital signatures. One particularly important innovation is the “container” function: PDF/A files can be embedded within a PDF/A-2 document.

PDF/A-3: One more feature

PDF/A-3 has been available since October 2012. A PDF/A-3 document allows you to embed any file format desired – not just PDF/A documents. For example, a PDF/A-3 file can contain the original file from which it was generated. The PDF/A standard does not regulate the suitability of these embedded files for archiving.

Conformance levels: A, B, U

The different conformance levels reflect the quality of the archived document and depend on the input material and the document’s purpose.

■■ Level A (Accessible) meets all requirements for the standard, including the logical structure of the document and its correct reading order. Text must be extractable and the logical structure must match the natural reading order. Fonts used must meet stringent requirements. This PDF/A level can usually only be met by converting born-digital documents.

■■ Level B (Basic) guarantees that the content of the document can be unambiguously reproduced. Level B files are easier to create than Level A, but Level B does not guarantee 100% text extraction or searchability. It does not necessarily mean that the content can be reused without any problems. Scanned paper documents can usually be converted to PDF/A Conformance Level B without any extra work.

■■ Level U (Unicode) was introduced along with PDF/A-2. It expands Conformance Level B to specify that all text can be mapped to standard Unicode character codes.

Nomenclature: PDF/A versions and levels are simply given one after another. A PDF/A-1b file, for example, is a PDF file for long-term archiving, of the first generation, with visually reproducible content.
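
The naming scheme can be made concrete with a short sketch. The code below is only an illustration of the nomenclature described above, with hypothetical function and type names; it splits a label such as “PDF/A-1b” into its part and level, and rejects combinations the text rules out (Level U did not exist before PDF/A-2).

```python
import re
from typing import NamedTuple

# Levels available in each part, following the description in the text:
# Level U was only introduced along with PDF/A-2.
LEVELS_BY_PART = {1: {"a", "b"}, 2: {"a", "b", "u"}, 3: {"a", "b", "u"}}
LEVEL_NAMES = {"a": "Accessible", "b": "Basic", "u": "Unicode"}


class Conformance(NamedTuple):
    part: int
    level: str


def parse_label(label: str) -> Conformance:
    """Split a label such as 'PDF/A-1b' into its part number and level letter."""
    match = re.fullmatch(r"PDF/A-(\d)([A-Za-z])", label.strip())
    if match is None:
        raise ValueError(f"not a PDF/A label: {label!r}")
    part, level = int(match.group(1)), match.group(2).lower()
    if level not in LEVELS_BY_PART.get(part, set()):
        raise ValueError(f"PDF/A-{part} has no conformance level {level!r}")
    return Conformance(part, level)


def describe(label: str) -> str:
    """Return a readable description of a PDF/A label."""
    conf = parse_label(label)
    name = LEVEL_NAMES[conf.level]
    return f"PDF/A part {conf.part}, level {conf.level.upper()} ({name})"


print(describe("PDF/A-1b"))
print(describe("PDF/A-2u"))
```

A label such as “PDF/A-1u” raises a `ValueError` here, since the first part of the standard only defines Levels A and B.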

Uses and Benefits
The most important reasons to use PDF/A

The PDF/A standard offers practical solutions for a wide variety of tasks, bringing advantages to many areas of application.

■■ Long-term archiving: PDF/A provides an ISO-standardised format to all those who need to store digital documents for long periods of time. This can include archives, libraries, banks, insurance firms and others.

■■ Legally binding documents: PDF/A is an excellent option for digitally signed documents and records. The ISO standard allows embedded electronic signatures and specifies only their minimum requirements. This means that PDF/A documents can always be digitally signed using the very latest technology, even as it develops in the future.

■■ Science and research: PDF/A reliably displays special characters for mathematical formulas or old languages, as all required symbols are embedded into the file itself. ICC profiles provide total colour control, supporting research work in fields such as medicine, archaeology or cultural history. As a result, people are always finding more uses for PDF/A in the academic sector: some universities now only accept assignments and dissertations in this ISO-standard format.

■■ Global integration: storing information in different languages requires comprehensive support for all kinds of writing systems around the world. In Japanese, Arabic, or Cyrillic, PDF/A makes sure that texts can always be correctly displayed on any device, including the reading direction. It also allows fixed-layout printing.

■■ Platform-independent: PDF, and so PDF/A too, is platform-independent. Thanks to PDF/A, documents such as invoices, brochures, manuals or research reports can be made reliably available through a wide range of channels.

■■ Full text searching: PDF/A helps you to find and access specific information within a data set. This is even possible with scanned documents, as the standard permits searchable text created through optical character recognition (OCR). Even Conformance Level B (Basic) supports this feature.

■■ Extra search options: XMP metadata can be used to add additional structured information to the document, such as the author, description of the content, or source and copyright information. As a result, the user can search for additional stored keywords, categories or values within the data set.

■■ Use content again and again: PDF/A Conformance Level A makes it easier to reuse content. Such files are very easy to convert to Word, HTML or eBook formats.

■■ Use PDF/A in combination with other standards: PDF/A is closely related to the ISO’s other PDF standards. As a result, a PDF/A file can often meet the requirements for universally accessible documents (for disabled users) as defined in the PDF/UA standard. Digital books in PDF/A format are well-suited for printing on demand if they also meet the PDF/X standard for digital print documents. For an overview of PDF standards, see page 17.

Uses and Benefits

Typical uses for PDF/A

The PDF/A standard has proved its suitability for a wide variety of tasks. Here we can show you just a few brief practical examples.

■■ Scanned documents for archiving: PDF/A is widely used to digitise paper-based files and records. A document scanner reads the original text, and specially designed software automatically converts the data into a searchable PDF/A file.

■■ Archive migration: Solutions exist to help digital archives which are still using older formats to migrate to PDF/A. In most cases, the process can even be automated.

■■ Incoming and outgoing mail: Whether a company receives letters or emails, PDF/A provides a reliable storage format for them. Letters can be automatically scanned and archived as PDF/A files, and emails and their attachments can be stored in PDF/A too. There are also great advantages to storing a copy of all outgoing mail in PDF/A format. Outgoing mail data can be retrieved from popular print data streams such as AFP (Advanced Function Presentation).

■■ Office documents: If your presentations, spreadsheets and text documents are likely to have long-term relevance and need to remain available for long periods of time, then PDF/A is the perfect format in which to archive them. The original program used may directly support this to some extent, or you can use additional software packages. In both cases, the process can be automated.

■■ Documentation and typesetting: Brochures and instruction books are usually created using layout programs and editorial systems. As a result, an archivable PDF can be created at the same time without significant extra work. This can either be done using external solutions (rather than using the original program that created the document) or using a print-ready PDF which is often already available in the PDF/X format, the ISO standard for print documents (see page 17).

■■ Creating documents from databases: Many PDF/A files were originally created from databases or were created using XML data. This structured input data often allows you to create PDF/A Conformance Level A documents. You can also convert forms to PDF/A.

■■ Digital document folders: As of PDF/A-3, source documents can also be embedded directly into a PDF/A file in their original format. This eliminates time-consuming hybrid archiving processes in which additional documents (Excel tables, image files, CAD drawings) had to be managed separately from the archived PDF/A file in their original formats. Thanks to PDF/A-3, all relevant information is now contained within a single file.

■■ Team collaboration: PDF/A-3 in particular is exceptionally powerful and flexible when used within modern collaboration frameworks. Its hybrid approach means that each document can contain the current working version of a document and the final, archive-ready version. As a result, PDF/A-3 provides ideal support for all the most important functions within a Microsoft SharePoint environment, for example. In particular, this includes collaborative work on documents as well as distribution and archiving.

Tools

PDF/A creation tools

PDF/A documents can be created in a variety of ways:

■■ From scanned documents

■■ By direct conversion of the source data

■■ By exporting from the program used to create the source document

■■ Using an intermediate step which turns a PDF file into PDF/A

■■ Using print output formats or print data streams such as GDI, PCL, PC PostScript, AFP and XPS.

This section will sketch out just a few typical approaches. For a more extensive list, including specific products which allow you to create PDF/A files, visit the PDF Association’s website at www.pdfa.org.

Desktop software

On a standard workstation, office applications in particular will already offer inbuilt tools (or can easily be retrofitted) to export word-processed files, spreadsheets or presentations directly to PDF/A. If the “Tagged PDF” option is enabled, then Microsoft Office, OpenOffice and LibreOffice will even support PDF/A Conformance Level A for semantically structured data.

Some PDF/A conversion solutions use print data creation tools to generate PDF or PDF/A files. Another approach is to use programming libraries to convert data or directly write to PDF.

Adobe Acrobat is used for PDFs in many industries, and it provides comprehensive support for PDF/A. This software can be used to examine PDF/A files to ensure they actually meet the PDF/A standard (see page 12).

Individual-workstation products also exist which allow users to scan to PDF/A, including OCR. This software is sometimes supplied with the scanner itself.

Some word processing software (such as OpenOffice, shown here) allows you to create PDF/A-1, including “Tagged PDF” as a prerequisite for Conformance Level A.

Server-based solutions

Server-based solutions exist for mass PDF/A creation. This allows business-wide standardisation of your working processes and lets you manage large volumes of data. Some desktop products also have server-based versions for high-volume processing.

Programming libraries

Programming libraries allow developers to add PDF/A functionality to their own applications without having to develop the needed technology from scratch. Some desktop or server-based products are also available as programming libraries. Suppliers can thus integrate extra functionality into their solutions with minimal development work on their part. These extra functions may include PDF/A creation, validation and management. A business’s IT department can also add PDF/A features to the company’s own software environment for internal projects.

Integrated PDF/A functions

Many document and output management solutions providers offer modules which can be used to perform PDF/A functions. Many systems are already available for high-volume management of a wide variety of input and output channels in PDF and PDF/A format.

Validation
Validation: Is it really PDF/A?

It is not always easy to tell at first glance whether an existing PDF file actually meets the ISO’s PDF/A standard. Applications such as Adobe Acrobat and Adobe Reader do use a pale blue banner to indicate when a file claims to be PDF/A compliant, but this is only an indicator and should not be used in place of a full examination to ensure the document meets the PDF/A standard.

To be absolutely certain, you can perform a validation check which examines all relevant parts of a document.

Adobe Acrobat and Adobe Reader can indicate whether a document may be PDF/A compliant, but this is not a replacement for a full validation.

Acrobat’s Preflight function checks for compliance with PDF standards.

When do I need to validate?

During the typical life cycle of a PDF/A file, there are particular points when it should be checked for complete ISO compliance.

■■ After creation: A PDF/A file should be validated immediately after creation, to ensure that the process was carried out successfully.

■■ On receipt: If a company receives a PDF/A file (by email, for example, or by other means), validation is advised. After all, it is impossible to know how the PDF/A file was created.

■■ Prior to transmission/distribution: When sending a PDF/A file by email or making it available online, validation in advance is recommended.

■■ Before archiving: You must validate data before placing it in a digital archive.

■■ At the end of certain processes: Certain processing stages which should not adversely affect a PDF/A file under normal circumstances (such as inserting extra pages) may, in rare cases, cause a PDF/A to become invalid. Validation will clarify the situation for you.

Note regarding process validation: during an automated stage of processing, such as scanning to PDF/A, the process as a whole is validated rather than each individual PDF in the process.

If the validation process detects a violation of the PDF/A standard, the file can often be repaired using the appropriate software. If this is not possible, the only other option is to recreate the PDF document from scratch.

Finding the right validation solution

Several validation programs are available on the market. As with PDF/A creation software, you can choose between workstation applications, server-based solutions and modules for workflow systems. PDF/A files can also be validated using programming libraries.

Some PDF/A creation tools can also perform a test after conversion to ensure the result meets the ISO standard. Naturally, these solutions can also validate PDF/A documents delivered from elsewhere.

Adobe Acrobat Pro, already used in many industries, can test almost all PDF ISO standards including the three parts of PDF/A, using its Preflight function.
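
The difference between a file that claims to be PDF/A and a file that has been validated can be illustrated in code. The sketch below reads only the claim from an XMP metadata packet. It assumes the packet has already been extracted from the PDF as text, and that the claim is stored under the PDF/A identification namespace `http://www.aiim.org/pdfa/ns/id/` with the properties `part` and `conformance`. These details are not given in this booklet, and the function name and sample packet are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: namespace of the PDF/A identification schema (not given in this booklet).
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def claimed_conformance(xmp_packet: str) -> Optional[Tuple[str, str]]:
    """Return (part, level) as claimed in an XMP packet, or None if there is no claim."""
    root = ET.fromstring(xmp_packet)
    part_key = f"{{{PDFAID_NS}}}part"
    level_key = f"{{{PDFAID_NS}}}conformance"
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as attributes or as child elements.
        part = desc.get(part_key) or desc.findtext(part_key)
        level = desc.get(level_key) or desc.findtext(level_key)
        if part:
            return part.strip(), (level or "").strip().upper()
    return None


# Hypothetical sample packet for illustration.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

claim = claimed_conformance(SAMPLE_XMP)
print(claim)
```

A result such as `('1', 'B')` only shows what the file says about itself. Whether fonts are embedded, colour is device-independent and forbidden features are absent still has to be established by a full validation.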

Areas of Application
PDF/A in public administration

Many government authorities and public institutions worldwide now specify formats to use for digital data. Government offices often recommend that working documents use open file formats. More and more often, PDF/A is the only format accepted for final-version files.

■■ EU Publications Office: The EU Publications Office is tasked with providing access to all laws, declarations and publications. Since 2007, the EU Digital Library has been tasked with storing printed texts – some of which date back to 1957 – in digital form as well. In a pilot project, an external digitisation team took two years to turn 130,000 paper documents in eleven languages into PDF/A-1b files with searchable text. An important factor in choosing PDF/A was that XMP metadata can be used for keywords and other bibliographic information. To simplify print-on-demand book orders, the archive files are now also available in the ISO standard format for digital print data, PDF/X-3.

■■ The European Patent Office: Since April 2010, the European Patent Office has published patent documents not just in PDF format, but also in PDF/A. For the Patent Office, an important feature of the PDF/A format is found in the way it uses metadata: the XMP metadata fields can include the publication number, the patentee and the international patent classification.

■■ “Comply or Explain” in the Netherlands: The government of the Netherlands has a “Comply or Explain” policy regarding open standard software. The national action plan “Nederland Open in Verbinding” enforces the use of open standards and requests the use of standard file formats, namely ODF, PDF and PDF/A. All public institutions in the country must use such software, as must all companies which take on public contracts. Any entity which cannot meet these requirements must fully justify this decision. In many cases, it is generally easier and ultimately more cost-effective to switch over to a standardised process.

■■ Brazil: In 2007, the Brazilian government introduced the e-PING architecture which regulates the provision of digital services. For final versions of a document to be transmitted or archived, Brazil prefers PDF/A.

■■ Denmark: Since April 2011, all Danish government bodies have been required to save non-editable documents in PDF/A format.

■■ France: Since early 2009, the French authorities have recommended the ISO’s PDF/A standard for archiving administrative documents with static, unchanging content.

■■ Switzerland: Due to archiving requirements, all electronic communication between citizens and administrative authorities is required to use the PDF/A file format. This regulation has been in force since 2008.

■■ Germany: German registry offices have run an electronic register of births, marriages and deaths since 2009; for registered data, they use PDF/A and XML. By 2014, these offices are expected to have switched over to an all-digital system.

For further user reports and the latest PDF/A recommendations, visit the website of the PDF Association: www.pdfa.org.

The EU Publications Office in Luxembourg.

Libraries and archives are taking a leading role in implementing and developing PDF/A. In the USA and Europe in particular, these institutions are choosing the ISO standard for long-term archiving.

Areas of Application
PDF/A in finance and industry

Businesses benefit from the ISO standard for long-term archiving because it helps them to store digital documents in compliance with legal requirements. PDF/A-3 has further increased adoption rates within the financial sector, as this new part of the standard can also be used to keep source documents organised (see page 8).

Industry documentation

The aeroplane manufacturer Airbus was a pioneer in the use of PDF/A. Aeroplane blueprints must be preserved for at least 99 years. Back in 2002, one of the manufacturer’s working groups recognised that PDF was in some respects well-suited to long-term archiving, but also that it contained a number of problematic functions. The team therefore first developed a “minimal PDF” which was used for digital archiving until PDF/A became available.

The construction industry has recognised the advantages of features like PDF/A-3’s “container” functionality. Mechanical engineering companies, for example, can preserve original 3D models in any format as part of the PDF/A-3 file.

NIRMA (the Nuclear Information and Records Management Association) also recommends PDF/A when working with nuclear technology in the United States. The US-based energy provider Southern Co. has been using PDF/A for years to ensure that all digital documents relating to nuclear installations will remain readable into the future.

Banking and insurance

PDF/A helps the German health insurance provider Techniker Krankenkasse with a bonus programme for its customers. It begins with the company scanning existing bonus booklets in colour. This file is used to create a PDF/A with compressed images and searchable text, which can then be archived and sent onwards.

Helaba, the state bank of Hesse and Thuringia, uses PDF/A to handle incoming post and to archive emails. It also stores digital credit documents in PDF/A format.

The banking and insurance sector often requires credit and insurance files to be retained for 50 or more years.

Healthcare

As a rule, documents such as doctors’ notes, statements, lab reports, and X-ray and tomographic images must be retained for 30 years or more.

The medical centre at Greifswald University Hospital, for example, uses PDF/A to archive its patient records – including digital signatures and timestamps. Due to requirements of legal certainty, digital signatures play a critical role in medical statements.

PDF/A is used in doctors’ practices as well as in clinics. The Lake Constance Radiation Oncology Centre uses PDF/A to process and archive digital patient records.

E-invoicing

The standardised document and data format ZUGFeRD was created to make it easier to exchange digital invoices. It is the result of a joint initiative between BITKOM (Bundesverband Informationswirtschaft, Telekommunikation und neue Medien e.V.) and FeRD (Forum elektronische Rechnung Deutschland).

The ZUGFeRD exchange format also uses PDF/A-3. It embeds invoice data in XML format to allow the recipient of an invoice to process it automatically.

Areas of Application

PDF/A in legislation and justice

Digital documents are increasingly replacing traditional paper documents. Existing paper documents are being scanned and digitised, while digital-only processes are becoming more and more common – and PDF/A is at the very forefront of this trend.

PDF/A also plays a significant role in legislation and justice in many countries. Legal bills and court records usually have to be stored for exceptionally long periods of time. The ability to search text and XMP metadata in PDF/A files can make it much quicker and easier to find and allocate digital records.

Federal jurisdiction in the United States courts

To see the huge significance of PDF/A in the legal field, you need only look at the Administrative Office of the United States Courts, which is taking a leading role in standardising PDF/A. Work has been ongoing since 2002 with the American associations AIIM (Association for Information and Image Management) and NPES (the National Printing Equipment Association) to develop the standard for archived documents. The goal was to turn large quantities of paper-based documents into a reliable, future-proof digital format with no expensive specialist viewing software required.

The Italian Chamber of Commerce

Since 2010, Italian businesses have been required to send reports to the appropriate commercial register in PDF/A format. This includes balances, certificates and reports of business transactions, acquisitions, mergers and insolvencies.

A data transfer platform is used for input; software is used to convert text or existing PDF documents to PDF/A-format certificates. The CGN, a specialist network for the financial, legal, fiscal and labour sectors, was closely involved in establishing the platform.

Austria: BAIK

The “Bundeskammer der Architekten und Ingenieurkonsulenten” in Austria, the Federal Chamber of Architects and Consulting Engineers, requires publicly available digital certificates to conform to the PDF/A-1b standard. This guarantees the authenticity of all digital documents accepted into the title register, thanks to a qualified digital signature.

Germany: Land registration

The “Decree of the Baden-Württemberg Ministry of Justice for the introduction of digital legal procedures and digital records of land registration procedures” (ERGA-VO) requires that all digital data (ASCII, Unicode, RTF, PDF, TIFF, Word) submitted must be convertible to PDF/A. This decree has been in force since March 2012.

Expert Opinions

What the users and experts say

Stephen Levenson, U.S. District Courts:

“PDF/A now provides a full decade of design ideas from the best and brightest digital preservation practitioners. Rich opportunities exist for having all the features of PDF that you have come to expect in presentation and now with PDF/A-3 the ability to have machine-processible data as XML or text. PDF/A-3 will also allow for keeping the original editable version of your content.

Most of the world has adopted PDF/A as their long-term static format. It has become the true replacement for paper, as its designers envisioned. Other formats will result in the loss of content. It used to be said, no one was ever fired for picking IBM. It will be said some day, no one was ever fired for picking PDF/A. It is that dependable.”

Anton Zagar, EU Publications Office:

“The European Union Publications Office stores its digital archive of over 150,000 publications, some of them stretching back to 1952, in the PDF/A format. We also use the PDF/A format to publish the Official Journal of the European Union in 23 languages every day: in 2012 alone we produced 1.2 million pages.

The Office has set itself the goal of making all publications, by all bodies of the European Community and the European Union, available in digital form.”

Kai Volmar, Landesbank Hessen-Thüringen (Helaba):

“In the world of IT, sometimes you don’t want to be the first to use a new technology. But the advantages of PDF/A convinced us straight away, when compared with the TIFF format we had previously used. As a result, the decision in 2006 to use PDF/A to digitise our records was not a hard one.

Meanwhile, the state bank of Hesse and Thuringia now uses nothing but PDF/A for digital archiving, whether the documents first need to be digitised or were born digital in the first place.”

Jacob Bielfeldt, Techniker Krankenkasse:

“In an internal workshop in 2006, the Techniker Krankenkasse identified PDF/A as a promising future-proof document format; today, this is confirmed by the many advantages PDF/A brings.

The Techniker Krankenkasse is introducing PDF/A into its ongoing projects step by step. Our first project was to digitise staff records; our second was to use PDF/A in output management. We have increasingly used PDF/A for input management since 2011 and are planning to use PDF/A further.”

The Family of PDF Standards

PDF/A and the other PDF standards

Specialist ISO standards based on the Portable Document Format are available for a wide range of purposes.

■■ PDF/X since 2001: “Prepress digital data exchange using PDF”. ISO standard for the printing industry.

■■ PDF/A since 2005: “PDF Archive”. Standardised long-term archiving with PDF.

■■ PDF/E since 2008: “PDF Engineering”. Construction diagrams with moving 3D models where required.

■■ PDF since 2008: “Portable Document Format”. The ISO standard corresponds with PDF version 1.7.

■■ PDF/VT since 2010: “PDF for Variable Data and Transactional Printing”. Used for variable data printing.

■■ PDF/UA since 2012: “PDF for Universal Access”. ISO standard for universally accessible PDF documents.

PDF/X

Back in 2001, an ISO working group developed a pre-press PDF standard, “ISO 15930”. At this time, customers usually sent printers “open files” from layout software. This method, however, always carried the risk of fonts and images going missing. PDF/X is able to eliminate all of these problems; it also has the advantage of carrying reliable colour information thanks to colour management settings.

The “X” identifier stands for “Exchange”, as PDF/X is intended for reliable print data exchange. Additional standardisation for PDF/X versions 4 and 5 has taken into account the newer features available to the PDF file format, including transparent elements and JPEG2000 image compression. PDF/X-5 also supports externally referenced elements.

PDF/A

PDF was also recognised early on as having great potential for archiving digital documents. In 2005, the ISO published the first part of the PDF standard for long-term archiving, PDF/A.

PDF/E

This standard has been available since 2008 as “ISO 24517”; it is aimed at engineering documents such as construction drawings. The original data often comes from CAD software used for digital drafting. PDF/E can display rotating and folding 3D objects on-screen, using tools like the free Adobe Reader.

PDF

PDF itself was also standardised in 2008 as “ISO 32000”. The basis of the standard was the then-current PDF version 1.7. With this, PDF became an open standard. PDF 2.0 is expected to be published in 2014.

PDF/VT

PDF/VT is a standard based on PDF/X-4 and PDF/X-5, supporting variable data printing. It was published in August 2010.

The abbreviation “VT” stands for “Variable data and transactional printing”. This includes invoices and personalised advertisements, for example.

PDF/UA

The PDF/UA (Universal Access) standard, approved in 2012, allows universal access to PDF files’ content. This is useful for users with disabilities (for example the partially sighted) and others. Of particular importance is a clear, coherent logical structure of the PDF’s elements, to ensure that navigational aids, reading software or Braille displays can handle all content including text, images and diagrams.

PDF/UA builds on proven concepts for accessible web content and adds concrete demands on the semantic structure of PDF documents (which PDF/A Conformance Level A had previously only given in a very general sense).

PDF/UA offers users with disabilities the best possible access to content. It also makes it easier for mobile devices to use this content and supports its flexible reuse in other forms of presentation.

Tidbits

The myths and legends surrounding PDF/A

A number of critics have spoken out against PDF/A, especially when the standard was first introduced. Many criticisms of the format, however, are based on misunderstandings.

These are some of the most commonly encountered myths and legends:

■■ PDF/A files are too large: PDF/A actually allows exceptionally small file sizes thanks to its sophisticated use of powerful compression algorithms such as JBIG2 and JPEG (and JPEG2000, from PDF/A-2 onwards). Embedded fonts can slightly increase the size of a PDF/A file. When archiving a very large number of individual, fairly similar documents, this can in some cases (such as for mass mailings) prove problematic.

■■ PDF/A is not as revision-safe as TIFF: TIFF files are easier to alter than PDF and PDF/A documents. In any case, however, revision safety is not achieved through your choice of file format. It can only be achieved by using an appropriate document management or archiving system.

■■ PDF/A does not allow signatures: Quite the opposite. PDF/A expressly supports embedded digital signatures. PDF/A-2 requires PAdES-standard compliance here.

■■ Links are not allowed: This claim is also false. Hyperlinks are allowed in principle. The PDF/A standard sets no requirements as to whether an external link should lead to a valid destination.

■■ PDF is a proprietary format: PDF was originally developed by Adobe Systems, but since then PDF (ISO 32000) and PDF/A (ISO 19005) have become ISO standards. TIFF, on the other hand, is a specification belonging to Adobe Systems alone, and it has not achieved the status of ISO standard.

■■ Scanned documents cannot be searched by text: PDF/A permits text recognition processes, meaning that even scanned PDF/A documents can be searched.

■■ PDF/A is not supported by DMS systems: Any ECM system which works with PDF can also handle PDF/A in principle. Many DMS suppliers offer solutions which support PDF/A.

■■ PDF/A does not allow metadata: Not at all: PDF/A specifically requires embedded standardised metadata corresponding to the modern XMP metadata standard, which was published in February 2012 as “ISO 16684-1”. XMP metadata can be directly embedded into the PDF/A document.

■■ PDF/A is not globally relevant: This statement is false. Although the very first PDF/A initiatives and products did come from German-speaking countries, the ISO standard has since become a recommendation or even a legal requirement in many countries and industries.

■■ PDF/A is expensive to implement: Yes and no. Implementing PDF/A solutions and training staff will incur costs at first, but these investments very often pay for themselves within months.
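To make the metadata point more concrete, the following is a minimal sketch of what an embedded XMP packet can look like. It is an illustration only, not a complete PDF/A writer: the helper names `q` and `build_xmp_packet` and the sample title are hypothetical, and the sketch assumes that the resulting packet would be stored by a separate PDF library as the document's metadata stream. The `pdfaid:part` and `pdfaid:conformance` properties are the fields in which a PDF/A file declares its part and conformance level (for example, 1 and B for PDF/A-1b).

```python
import xml.etree.ElementTree as ET

# Namespaces used in the XMP packet.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Register prefixes so the serialised XML uses x:, rdf:, dc: and pdfaid:.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return a namespace-qualified tag name (hypothetical helper)."""
    return f"{{{NS[prefix]}}}{name}"


def build_xmp_packet(title, part, conformance):
    """Build an XMP packet declaring a PDF/A part and conformance level.

    Illustrative sketch only: a real PDF/A file needs this packet stored
    in its metadata stream and must satisfy many further requirements.
    """
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # PDF/A identification: part (1, 2 or 3) and conformance level.
    ET.SubElement(desc, q("pdfaid", "part")).text = str(part)
    ET.SubElement(desc, q("pdfaid", "conformance")).text = conformance

    # Dublin Core title, stored as a language alternative.
    title_el = ET.SubElement(desc, q("dc", "title"))
    alt = ET.SubElement(title_el, q("rdf", "Alt"))
    li = ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"})
    li.text = title

    body = ET.tostring(xmpmeta, encoding="unicode")
    header = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    trailer = '<?xpacket end="w"?>'
    return header + "\n" + body + "\n" + trailer


if __name__ == "__main__":
    # Sample values are hypothetical.
    print(build_xmp_packet("Annual report", 1, "B"))
```

Because the packet is plain XML, archive and search systems can read fields such as the title without having to interpret the page content itself.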

Further Information

Further information on PDF/A

The PDF/A Competence Centre, today a part of the PDF Association, was founded very shortly after PDF/A first appeared as an ISO standard. This international organisation aims to promote the development and usage of PDF standards. To that end, the PDF Association targets users, developers and decision-makers equally and helps its members exchange information worldwide.

The portal to the PDF Association

A good starting point for anyone with questions about PDF/A or PDF in general is the PDF Association’s website. At www.pdfa.org, you can find information in English and German about current development, from all relevant industries and from suppliers worldwide. Comprehensive background information and example applications explain the technology behind PDF/A and its use in practice. You can view a video-on-demand series about PDF/A and other PDF standards. The website offers an overview of PDF- and PDF/A-related software products and services.

Users can contact the Association with specific questions. To do so, simply register on the website and describe your request on the discussion forum. Specialists and practitioners from around the world will provide informed answers and suggestions.

PDF Association events

The PDF Association attends international trade shows and other events on subjects such as document management, digital media and electronic archiving. For years the Association has also organised specialist seminars and technical conferences around the world. Companies and public authorities seeking a speaker on PDF/A for their event can find support at the PDF Association: interested parties can simply enquire about a presentation using a form at www.pdfa.org.

Some countries also have direct contact persons in the Association’s local chapters. For a complete list, including contact details, please visit the Association’s website.

Membership

Anyone aiming to work actively to develop and expand use of the PDF standard can become a member of the PDF Association, whether an individual or an entire organisation.

Membership allows a company to present itself on the PDF Association’s portal and to publish its own announcements, press releases and articles on the website. Members can also present their software solutions and services within the Association’s product showcase. The Association also has especially favourable terms for presenting products and strategies at trade shows and other events. Members also have exclusive access to the Association’s intranet.

The PDF Association’s website can be found at www.pdfa.org.

About the author

Alexandra Oettler has worked for years as a freelance journalist in the areas of software, print and media. Her work is regularly published in specialist journals on the subject of prepress in practice, software technology and financial developments in the publishing sector. She regularly writes news and background reports for the online editions of several journals. She also was one of the co-authors of the first edition of “PDF/A in a Nutshell”.

PDF/A in a Nutshell 2.0 – PDF for long-term archiving

PDF/A is an ISO standard for using the PDF format for long-term archiving of digital documents. Since its publication in 2005, PDF/A has become the format of choice for archiving digital documents in a wide range of industries and applications. “PDF/A in a Nutshell 2.0” provides a comprehensive introduction to the material and shows off the latest developments available with PDF/A-2 and PDF/A-3. The brochure provides information about PDF/A tools and strategies for creating and validating PDF/A files. Examples from around the world demonstrate how users in the areas of finance, administration, academia and law can benefit from PDF/A.

Contents:

■■ Facts about PDF/A

■■The history of PDF/A

■■The technical side

■■Who can benefit from PDF/A and why

■■Typical applications

■■PDF/A creation tools

■■PDF/A validation

■■PDF/A in public administration

■■PDF/A in finance and industry

■■PDF/A in legislation and justice

■■What the users and experts have to say

■■PDF/A and the other PDF standards