Multifunctional Tool for PDF Files
LINUXUSER: PDF Toolkit

PDF TO THE MAX

To manage the mountains of paper that cross our desks every day, we need to file, retrieve, copy, stamp, investigate, and classify documents. A special tool can help users keep on top of their electronic paperwork: pdftk, the PDF toolkit.

BY STEFAN LAGOTZKI

Native Linux PDF utilities such as Ghostscript are very useful if you're willing to click through the menus. But if you're looking for something faster, or if you would like to automate a recurring task, try pdftk (the PDF Toolkit). pdftk is a convenient command-line program for processing PDF files. According to creator Sid Steward, "If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses."

Installation and Use

You can download the latest version of the PDF toolkit from one of Sid Steward's websites [1]. The GPL program is available for Linux, Mac OS X (Panther), FreeBSD, Solaris, and Windows. The platform-specific install proved to be quite simple on the platforms we tested (including Debian and SuSE Linux).

After completing the install, you can run pdftk from a shell. The pdftk --help command gives you a list of commands and options with short help texts. Table 1 lists and explains the major operations. The generic syntax for processing PDF files with the program is:

    pdftk inputfile(s) operation [option] output outputfile [passwords] [userpermissions]

Input files have to be in PDF format. For some operations, the tool additionally needs text files in a special format. pdftk outputs one or more PDF files and, in special cases, text files as well.

In the following sections, I have put together a few examples that demonstrate some of pdftk's more interesting uses. These examples by no means explore the program's limits.

Adding Attachments to PDF Files

You can add an attachment to a PDF file just as you can add an attachment to an email message.
Adobe Reader (version 6 or newer) can save the attachments on the recipient's machine. pdftk allows users to attach files to PDF documents and also to save the attachments. Before the release of Adobe Reader 7, this was the only way Linux users could save attachments.

Attachments can be used to add source code or excerpts from literature databases to PDF files. The following example shows how to forward a PDF file and the accompanying source code. To add a source code file to a PDF document using pdftk, type the following command:

    pdftk form.pdf attach_files form.tex output new.pdf

Alternatively, you could use pdfLaTeX and attachfile to add the source code to the finished PDF file. The recipient would then use pdftk to unpack the source code file and other attachments in any directory:

    pdftk example_attachment.pdf unpack_files output Source

In this example, pdftk saves the attachments in a directory named Source. Specifying a directory name always makes sense if you are handling multiple attachments.

ISSUE 59, OCTOBER 2005, WWW.LINUX-MAGAZINE.COM

Table 1: pdftk Operations

Operation         Explanation
attach_files      Adds files as attachments to a PDF document. This allows you to add an archive file to the PDF file.
background        Adds a watermark to each page of the PDF document. Also allows you to affix an electronic stamp to empty spaces.
burst             Splits a PDF document into individual pages.
cat               Concatenates a new PDF file from multiple files or pages from different PDF documents.
dump_data         Outputs information about a PDF file on standard output.
dump_data_fields  Outputs information about the form fields in a PDF file on standard output.
fill_form         Fills out PDF forms or links form data with the document.
unpack_files      Unpacks the enclosed attachments of a PDF document in a directory.
update_info       Updates the meta-information (e.g. author, title, topic) in a PDF file.

Watermarks and Background Colors

pdftk uses a similar approach to the LaTeX eso-pic package to add a watermark to a document. The background operation handles this and also allows the user to assign a PDF background color. The image you will be using as your watermark must be a PDF file. You could create the image with a vector graphics tool or write a PostScript program. If the watermark is not the same size as the document, pdftk will scale the watermark.

Let's assume you want to stamp the word "DRAFT" on a document. The first step would be to create a PDF with the right page size, before going on to call pdftk as follows:

    pdftk example.pdf background draft.pdf output draft1.pdf

The watermark looks like a stamp on any part of the document without content.

You can create a small EPS file with a background color of your choice. The PostScript commands for an A4 size page look like this:

    %!PS-Adobe-2.0
    %%BoundingBox: 0 0 595 842
    % fill color as RGB values between 0 and 1
    0.95 0.95 0.90 setrgbcolor
    % rectangle covering the whole A4 page (595 x 842 points)
    0 0 moveto 595 0 rlineto 0 842 rlineto -595 0 rlineto
    closepath fill
    showpage

It is easy to change the background color in the EPS code. You can then convert the EPS file to a PDF using epstopdf and then run pdftk to use the file as a background:

    pdftk example.pdf background Bg.pdf output eg_color.pdf

Splitting and Assembling PDF Files

The burst operation allows you to split a PDF file into its component pages. To do so, you need to provide a generic name for the pages and specify the numbering format:

    pdftk example.pdf burst output Page%03d.pdf
    pdftk example.pdf burst output ./Pages/Page%03d.pdf

In both examples, a three-digit page number will be added to the page names. In the second example, pdftk will store the PDF files in an existing subdirectory.

The cat operation tells pdftk to concatenate multiple PDF files to create a new document. You can use wildcards to specify the filenames of the individual source files.

    pdftk example.pdf form.pdf attachment.pdf cat output example_concat.pdf
    pdftk D=coversheet.pdf B=example.pdf cat D B1-4 output example_coversheet.pdf

As the second example demonstrates, you can use cat to rearrange documents by combining part of a PDF file with parts of other PDFs to create a new document.

Querying and Updating Meta-Information

Most PDF files contain meta-information with details of the author, the topic, or the software used to create the document. pdftk allows you to send this data to standard output or to redirect it into a file:

    pdftk example.pdf dump_data output info.txt

This command saves the meta-information from the PDF document to a file titled info.txt. The information comprises a key field and the matching value (see Listing 1).

Listing 1: Typical PDF Meta-Data

    InfoKey: Title
    InfoValue: Working with pdftk
    InfoKey: Subject
    InfoValue: Practical pdftk examples
    InfoKey: Keywords
    InfoValue: pdftk, iText, OpenSource applications
    InfoKey: Author
    InfoValue: Stefan Lagotzki
    InfoKey: City
    InfoValue: Dresden

Before forwarding or archiving PDF documents, it often makes sense to update the meta-data. pdftk allows you to do so without having to recreate or translate the PDF file. To update the meta-information, first create a text file with the meta-data; the file should look something like this (shortened for the sake of brevity):

    InfoKey: Creator
    InfoValue: TeX
    InfoKey: Corporation
    InfoValue: Sample and Sons

This file does not need to contain all the information that a PDF file can store. Fields that already contain values are not touched by the update if the text file does not specify them. You can even add new key fields (Corporation in our example) and assign values to them. The following call updates the meta-information:

    pdftk example.pdf update_info info.txt output eg_meta.pdf

The input and output files are not permitted to have the same name. In other words, you either need to manually rename the output file, or use a shell script to do so.

Filling Out PDF Forms

PDF files can contain forms with known form fields. Adobe developed the proprietary but open FDF format for PDF form data. Listing 2 shows an example of a short FDF file.

[...] fills out the form fields in his or her browser. Then a PHP or Perl script running in the background creates the FDF file; finally, pdftk combines the two parts. The completed PDF file can then be mailed.

Passwords and User Permissions in PDFs

PDF files can be protected by user and owner passwords. pdftk allows you to set both the passwords and the permissions for a PDF file. The following example sets both [...]

    [...] A=Lie5quai cat output egl_pw.pdf user_pw Abraxas

As the previous example does not allow [...]

Table 2: PDF Permissions

Option             Explanation
Printing           The document can be printed in best quality.
DegradedPrinting   The document can be printed only in limited quality.
ModifyContents     The document content can be changed.
Assembly           The PDF document can be concatenated with other PDF documents.
CopyContents       Text and images can be copied from the file.
ModifyAnnotations  Comments and annotations can be changed.
FillIn             Forms in the PDF file can be filled out.
AllFeatures        The user has all of the privileges listed above.
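
The permission names in Table 2 are given to pdftk by name on the command line. As a rough sketch, and not one of the article's own examples, the shell function below assembles such a call and prints it instead of running it, so you can inspect the command first. The function name pdftk_protect_cmd, the file names, and the passwords are hypothetical placeholders; the owner_pw, user_pw, and allow keywords are the ones listed in pdftk's help text, so it is worth checking pdftk --help on your installed version before relying on them.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints a pdftk call that sets an owner
# password, a user password, and optional permissions from Table 2.
# It does not run pdftk and does not quote names that contain spaces.
pdftk_protect_cmd() {
    if [ "$#" -lt 4 ]; then
        echo "usage: pdftk_protect_cmd IN OUT OWNER_PW USER_PW [PERMISSION...]" >&2
        return 1
    fi
    input=$1; output=$2; owner=$3; user=$4
    shift 4
    cmd="pdftk $input output $output owner_pw $owner user_pw $user"
    # Any remaining arguments are permission names such as Printing.
    if [ "$#" -gt 0 ]; then
        cmd="$cmd allow $*"
    fi
    printf '%s\n' "$cmd"
}

# Placeholder file names and passwords, for illustration only.
pdftk_protect_cmd example.pdf example_pw.pdf OwnerSecret UserSecret Printing CopyContents
# prints: pdftk example.pdf output example_pw.pdf owner_pw OwnerSecret user_pw UserSecret allow Printing CopyContents
```

Printing the command rather than executing it keeps the sketch safe to try without a PDF at hand; once the printed line looks right, it can be pasted into the shell. The input and output names differ on purpose, because pdftk does not accept the same file for both.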