LINUXUSER PDF Toolkit

Multifunctional tool for PDF files PDF TO THE MAX To manage the mountains of paper that cross our desks every day, we need to file, retrieve, copy, stamp, investigate, and classify documents. A special tool can help users keep on top of their electronic paperwork: pdftk – the PDF toolkit. BY STEFAN LAGOTZKI

ative PDF utilities such as the PDF toolkit from one of the Sid Stew- operation [option] U GhostScript are very useful if ard’s [1] websites. The GPL program is output outputfile U Nyou’re willing to click through available for Linux, Mac OS X (Panther), [passwords] [userpermissions] the menus. But if you’re looking for FreeBSD, Solaris, and Windows. The something faster, or if you would like to platform-specific install proved to be Input files have to be in PDF format. automate a recurring task, try pdftk (the quite simple on the platforms we tested The tool additionally needs text files in PDF Toolkit). pdftk is a convenient com- (including Debian and SuSE Linux). a special format for some operations. mand-line program for processing PDF After completing the install, you can pdftk outputs one or more PDF files and files. According to creator Sid Steward, run pdftk from a shell. The pdftk --help also the text files in special cases. “If PDF is electronic paper, then pdftk is command gives you a list of commands In the following sections, I have put an electronic staple-remover, hole- and options with short help texts. Table together a few examples that demon- punch, binder, secret-decoder-ring, and 1 lists and explains the major operations. strate a few of pdftk’s more interesting X-Ray-glasses.” The generic syntax for processing PDF uses. These examples by no means files with the program is: explore the program’s limits. Installation and Use You can download the latest version of pdftk inputfile(s) U Adding Attachments to PDF Files You can add an attachment to a PDF file just as you can add an attachment to an email message. Adobe Reader (version 6 or newer) stores the attachments at the recipient. pdftk allows users to attach files to PDF documents and also to save the attachments. Before the release of Adobe Reader 7, this was the only way Linux users could save attachments. Attachments can be used to add source code or excerpts from literature databases to PDF files. The following example shows how to forward a PDF file and the accompanying source code. To add a source code file to a PDF docu- ment using pdftk, type the following commands:

pdftk form. attach_files U form.tex output new.pdf

Alternatively, you could use pdfLaTeX and attachfile to add the source code to the finished PDF file. The recipient would then use pdftk to unpack the source code file and other attachments using in any directory:

86 ISSUE 59 OCTOBER 2005 WWW.LINUX - MAGAZINE.COM PDF Toolkit LINUXUSER

pdftk example_attachment.pdf U showpage Listing 1: Typical PDF unpack_files output Source Meta-Data It is easy to change the background color In this example, pdftk saves the attach- in the EPS code. You can then convert 01 InfoKey: Title ments in a directory named Source. Add- the EPS file to a PDF using epstopdf and 02 InfoValue: Working with pdftk ing the directory name always makes then run pdftk to use the file as a back- 03 InfoKey: Subject sense if you are handling multiple ground: 04 InfoValue: Practical pdftk attachments. examples pdftk example.pdf background U 05 InfoKey: Keywords Watermarks and Bg.pdf output eg_color.pdf Background Colors 06 InfoValue: pdftk, iText, pdftk uses a similar approach to the Splitting and OpenSource applications LaTeX eso-pic package to add a water- Assembling PDF Files 07 InfoKey: Author mark to a document. The background The burst operation allows you to split a 08 InfoValue: Stefan Lagotzki option handles this and also allows the PDF file into its component pages. To do 09 InfoKey: City user to assign a PDF background color. so, you need to provide a generic name The image you will be using as your for the pages and specify the numbering 10 InfoValue: Dresden watermark must be a PDF file. You could format: create the image with a vector graphics by linking part of a PDF file with parts of tool or write a PostScript program. If the pdftk example.pdf burst U other to create a new document. watermark is not the same size as the output Page%03d.pdf document, pdftk will scale the water- pdftk example.pdf burst U Querying and mark. Let’s assume you want to stamp output ./Pages/Page%03d.pdf Updating Meta-Information the word “DRAFT” on a document. The Most PDF files co ntain meta-informa- first step would be to create a PDF with In both examples a three-digit page num- tion with details of the author, the topic, the right page size, before going on to ber will be added to the page names. In or the software used to create the docu- call pdftk as follows: the second example, pdftk will store the ment. Pdftk allows you to send this data PDF files in an existing subdirectory. to standard output or to redirect it into a pdftk example.pdf background U The cat operation tells pdftk to concat- file: draft.pdf output draft1.pdf enate multiple PDF files to create a new document. You can use wildcards to pdftk example.pdf U The watermark looks like a stamp on specify the filenames of the individual dump_data output info.txt any part of the document without con- source files. tent. You can create a small EPS file with This command saves the meta-informa- a background color of your choice. The pdftk example.pdf form.pdf U tion from the PDF document to a file PostScript commands for an A4 size attachment.pdf U titled info.txt. The information com- page look like this: cat output example_concat.pdf prises a key field and the matching value pdftk D=coversheet.pdf U (see Listing 1). Before forwarding or %!PS-Adobe-2.0 B=example.pdf U archiving PDF documents, it often %%BoundingBox: 0 0 595 842 cat D B1-4 output U makes sense to update the meta-data. 0.95 0.95 0.90 setrgbcolor example_coversheet.pdf Pdftk allows you to do so without having 0 0 moveto 595 0 rlineto 0 842 U to recreate or translate the PDF file. rlineto -595 0 rlineto As the second example demonstrates, To update the meta-information, first closepath fill you can use cat to rearrange documents create a text file with the meta-data; the file should look something like this Table 1: pdftk Operations (shortened for the sake of brevity): Operation Explanation InfoKey: Creator attach_files Adds files as attachments to a PDF document. This allows you to add an archive file to the PDF file. InfoValue: TeX background Adds a watermark to each page of the PDF document. Also allows you to affix InfoKey: Corporation an electronic stamp to empty spaces. InfoValue: Sample and Sons burst Splits a PDF document into individual pages. cat Concatenates a new PDF file from multiple files or pages from different PDF This file does not need to contain all the documents. information that a PDF file can store. dump_data Outputs information about a PDF file on standard output. Fields that already contain values are not dump_data_fields Outputs information about the form fields in a PDF file on standard output. touched by the update if the text file fill_form Fills out PDF forms or links form data with the document. does not specify them. You can even add unpack_files Unpacks the enclose attachments of a PDF document in a directory. new key fields (Corporation in our exam- update_info Updates the meta-information (e.g. author, title, topic) in a PDF file. ple) and assign values to them. The fol-

WWW.LINUX - MAGAZINE.COM ISSUE 59 OCTOBER 2005 87 LINUXUSER PDF Toolkit

lowing call updates the meta-informa- fills out the form fields in Table 2: PDF Permissions tion: his or her browser. Then a PHP or Perl script run- Option Explanation pdftk example.pdf U ning in the background Printing The document can be printed in best quality. DegradedPrinting The document print out will be of limited update_info info.txt U creates the FDF file; quality. finally, pdftk combines output eg_meta.pdf ModifyContents The document content can be changed. the two parts. The com- Assembly The PDF document can be concatenated The input and output files are not per- pleted PDF file can then with other PDF documents. mitted to have the same name. In other be mailed. CopyContents Text and images can be copied from the file. words, you either need to manually ModifyAnnotations Comments and annotations can be rename the output file, or use a shell Passwords and changed. script to do so. User Permissions FillIn Forms in the PDF file can be filled out. in PDFs AllFeatures The user has all specified privileges. Filling Out PDF Forms PDF files can be pro- PDF files can contain forms with known tected by user and owner passwords. A=Lie5quai cat output U form fields. Adobe developed the propri- pdftk allows you to set both the pass- egl_pw.pdf user_pw Abraxas etary but open FDF format for PDF form words and the permissions for a PDF data. Listing 2 shows an example of a file. The following example sets both As the previous example does not allow short FDF file. passwords: you to concatenate the PDF file for file_ In Listing 2, T is the title, and V is the new.pdf, you need to supply the owner form field value. You can now merge the pdftk file.pdf output U password. PDF file with the FDF file and decide file_new.pdf owner_pw U whether the form data should remain Lie5quai user_pw phupaefu Conclusions editable or be indelibly merged with the If you are looking for a quick, simple, document: The passwords in this example were and efficient tool for editing PDF files generated using the pwgen tool. You from the command line, try the PDF pdftk form.pdf fill_form U must choose different strings for the Toolkit. pdftk is a versatile, multifunc- eg.fdf output edit.pdf user and owner passwords. tional PDF manipulation tool without pdftk form.pdf fill_form U The owner of a PDF file can assign the burden of a GUI. If you want to dig eg.fdf output end.pdf flatten specific permissions. Table 2 has a list of deeper into the subject of manipulating permissions that you can set with pdftk. PDF files, see Sid Steward’s book on PDF The first example gives you editable The following example first creates a Hacks [2]. results; whereas the flatten option in the PDF file that can only be printed. The pdftk is written in C++ and based second file indicates that the form fields second line creates a PDF file that can be on the iText library [3], which in turn should be indelibly merged with the PDF printed and also copied. was written in Java. The whole program file. was complied and linked with tools from The form feature allows you to use pdftk example.pdf output U the free GNU Compiler Collection [4], pdftk to create completed PDF forms on file_new.pdf owner_pw U This makes pdftk easily portable and an Internet or Intranet server. The user Lie5quai user_pw phupaefu U extensible. The pdftk website has links allow printing to ports. Listing 2: Example FDF File pdftk example.pdf output U Development work on the pdftk pro- file_new.pdf owner_pw U gram still continues. The program’s 01 %FDF-1.2 Lie5quai user_pw phupaefu U author, Sid Steward, will answer queries 02 1 0 obj << allow printing CopyContents on pdftk and PDF programming posted 03 /FDF << /Fields [ in the comp.text.pdf newsgroup and in ■ 04 << /V (Dresden)/T (city) >> PDF files can be encrypted with different his own PDF forum [1]. levels of encryption. To encrypt a file 05 << /V (Stefan Lagotzki)/T with pdftk, add one of the following as INFO (author)>> the final option: encrypt_40bit or [1] Sid Steward: pdftk; Version 1.12 06 ]/F (form.pdf) >> encrypt_128bit. You also need to supply (Nov. 2004): 07 >> a password for a password-protected http:// www. accesspdf. com/ pdftk/ PDF file. If you are processing multiple 08 endobj [2] Sid Steward, PDF Hacks; O’Reilly, files, you can bind variables to the file- 09 trailer 2004. names and then assign a password to [3] Bruno Lowagie, Paulo Soares: 10 << each file. In the following example, only iText-Library; Version 1.1 (Nov. 2004): 11 /Root 1 0 R file A is password protected: http:// . sourceforge. net 12 >> [4] GNU Compiler Collection, Version pdftk A=file_new.pdf U 13 %%EOF 3.4.3 (Nov. 2004): http:// gcc. gnu. org B=eg_color.pdf input_pw U

88 ISSUE 59 OCTOBER 2005 WWW.LINUX - MAGAZINE.COM