
Modifying PDFs | Linux Starter Kit

How to modify PDF documents?
Seth Kenlon

The Portable Document Format, or PDF, is a document standard developed by Adobe to help ensure that when a user prints, they get exactly what they see on the screen. PDFs are used as “pre-flight” tests in graphic design, or as a convenient way to send documents (with embedded fonts and graphics) across a network.

PDFs also get abused quite a lot; they are often considered an e-book format even though they don’t feature the re-sizing and re-flowing capabilities of true e-book formats like e-pub. PDFs are, as their name states, meant to be digital versions of paper, with all of its advantages as well as its disadvantages.

Sometimes you’ll need to modify PDFs. Adobe itself successfully pushed PDF as a universal, cross-platform standardized format, and yet their Acrobat Pro application, which allows a user to open and modify PDF documents, is not available on Linux. As a result, many Linux users have opted for e-pub (based on entirely free technology like html and zip) over PDF, but PDF still has distinct advantages when you need a document to be an inflexible print-ready proof. Luckily, there are a number of tools that will allow you to work with, create, and modify PDFs on Linux.

As a part of the GNOME Desktop Environment (which Ubuntu’s desktop uses as its base), Evince is the default PDF viewer on Ubuntu Linux. It allows you to do all of the usual PDF things like read, rotate, and resize for viewing, as well as a few advanced features.

Figure 1. The Evince viewer brings Adobe PDF functionality to your desktop

One problem with PDFs is that there are many different ways to create them but not much of a way to tell what feature set a particular document actually contains. For instance, it’s possible to embed text into a scanned set of images in a PDF (using OCR technology), but not everyone does this. So you might open one PDF document of a scanned textbook and discover that Evince can highlight and copy all of the text on a page. You might then open a separate document that looks basically the same and yet Evince will be unable to select or copy the text. It’s important to know that this is not something Evince is or is not doing; it’s information that is or is not embedded invisibly into the PDF file itself.

Another confusing feature is PDF Forms (FDF). You might download a document that you are required to fill out (such as a job application or a school form) and find that Evince allows you to click into each field and type in the data. You could then save the PDF with the form data included and submit it back to the organization requesting the information. You might then open a different form and find that Evince will not allow you to fill in the data. Once again, this isn’t Evince arbitrarily deciding whether or not you can fill out a form; it’s how the PDF itself was created. Some have form data while others do not, and there’s no good way to know for sure which is which except by trying to perform an action and watching the results.
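If you would rather not find out by trial and error, pdftk (the PDF Tool Kit covered later in this article) can list a document’s form fields from the command line. The following is only a minimal sketch, assuming pdftk is installed; the function name has_form_fields and the file name application.pdf are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether a PDF carries fillable form fields.
# Assumes pdftk is installed; its dump_data_fields operation prints
# one "FieldName:" line for every form field it finds.

has_form_fields() {
    # Exit status 0 when at least one form field is reported.
    pdftk "$1" dump_data_fields 2>/dev/null | grep -q '^FieldName:'
}

# Usage: sh check_form.sh application.pdf   (hypothetical file names)
if [ "$#" -gt 0 ]; then
    if has_form_fields "$1"; then
        echo "$1: form fields found"
    else
        echo "$1: no form fields found"
    fi
fi
```

This only answers the form question. Whether a scanned document carries embedded text is a separate matter that this sketch does not cover.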


For all the common tasks, however, you’ll find Evince a fine PDF viewer that easily matches Reader in features and performance.

Figure 2. Filling out Form Data with Evince

Modification

Even though Adobe doesn’t bother releasing Acrobat Pro for Linux, there are plenty of tools that we Linux users can use to modify PDFs. The first is Inkscape, a digital illustration application that also happens to translate PDFs into their graphical and textual components. Retaining the layout of the page, Inkscape is able to open PDFs as fully modifiable page layouts.

Inkscape is a traditional illustration program that is powerful and yet intuitive. Install it via the Ubuntu Software Center; for a full, free series on using the tools of Inkscape, see screencasters.heathenx.org

Figure 3. Open any PDF page in Inkscape

If the PDF contains multiple pages, then you may need to stitch the pages back together. For instance, if you modify the second page of a three page document, you could open page 2 in Inkscape, modify it, and export it as a stand-alone modified page 2, but then you’ll want to integrate it back into the original three page document.

Once you’ve made the modifications to the PDF page, you can export the page as a PDF just as you would with any other application; choose Print from the File menu, and choose to Print to File.

Figure 4. Modifying PDFs with Inkscape!

There is a handy command-line tool for this slicing and dicing of PDF files called pdftk (PDF Tool Kit), available from the Ubuntu Software Center. There’s a lot you can do with pdftk, including splitting up the pages in a PDF and then re-constructing it. To break the PDF into pages, you can use the burst option:

pdftk bigfile.pdf burst

To stitch it back together again with the new page 2:

pdftk pg_0001.pdf modified_2.pdf pg_0003.pdf cat output newfile.pdf

And this will leave you with a single PDF file (“newfile.pdf” in the example) as though nothing had changed.

Generating a PDF

There are many ways to send your own documents out in the PDF format. Nearly every program that can print can also “print” to PDF; in other words, the application thinks it’s printing, but instead of printing to paper it saves the results to a file.

This is a great way to quickly get PDF versions of anything that can be printed. If you do any kind of document authoring, whether in LibreOffice, OpenOffice.org, or even just a basic text editor, you have this method to produce attractive and functional digital documents. The disadvantage to this, however, is that it does not take advantage of PDF’s ability to do internal linking or follow external hyperlinks; i.e., if you reference a web page in your document, the user cannot click on the weblink and be whisked off to that webpage in their browser. At best, they’d need to select the text in the PDF, open a browser, and paste the text into the URL bar; this is simple enough on a computer but can be difficult on a mobile device.

The answer is an html-like markup language called docbook, which allows you to create documents in any plain text editor, such as Emacs, process the document with a stylesheet, and then output to a PDF that will open in PDF viewers with all hotlinks enabled.

To install the docbook toolchain, visit the Ubuntu Software Center and install the following programs:

sudo apt-get install fop xmlto docbook

If you’ve ever used HTML, then docbook will come naturally. If not, then you’ll probably find docbook a little technical at first, but once you’ve tried it for a few basic documents, you’ll find the learning curve pretty gentle. Docbook documents use obvious markup tags that label significant elements in a document. These include <para> for paragraphs, <itemizedlist> for bullet lists, <orderedlist> for numbered lists, and so on. Once you know a few tags, it’s fairly intuitive.

To get started on a basic document, open gedit and type in the following sample text:

Linux Identity Sample
Linux Identity is an informative magazine. Visit their website today!

That is the basic structure of a basic docbook document. To process it, you must first add a header line or two so that the stylesheet processor knows how to interpret it; these two lines go at the very top of your document.

Save the file as magazine.xml to your Documents folder, and you’re ready to process the document to apply some (very) basic styles:

xmlto fo ~/Documents/magazine.xml -o ~/Documents/fo

The “fo” filetype is a PDF-ready format that would look mostly like gibberish if you were to look at it. So the final step is to process the “.fo” document that xmlto has just created into a proper PDF:

fop ~/Documents/fo/magazine.fo ~/Documents/magazine.pdf

Now look in your Documents folder. You’ll find magazine.pdf there, which you can open in Evince. It won’t be much to look at, since it is, after all, very basic, but try clicking the hyperlink and notice how it automatically opens your web browser and takes you to the appropriate website.

Figure 5. Hyperlinked PDFs from tools

Additional features of docbook include embedding media like graphics, providing an automatically hotlinked Table of Contents and Index, blockquotes, code boxes, and obviously all the styles and fonts you could ever want. It has been used to create ebook and printed versions of school textbooks, technical manuals, articles (including this one), scientific papers, works of fiction, and much more.

Conclusion

PDFs are powerful tools for proofing and for delivering rich paperless documents. They can be over-used and mis-used, so think twice before you generate PDFs when you really mean to send text files or .odt files. Whatever you use, you can be sure that Linux has plenty of tools to manipulate PDFs; all you need to do is explore them.
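If you find yourself swapping pages often, the burst and cat steps from the Modification section could be wrapped in a small shell function. Treat this as a rough sketch rather than a finished tool: the function name replace_page, the temporary working directory, and the explicit output pattern passed to burst are my own assumptions, and it presumes pdftk is installed.

```shell
#!/bin/sh
# Sketch: swap one page of a PDF for an edited copy using pdftk burst and cat.
# Assumes pdftk is installed. replace_page is a hypothetical helper name.

replace_page() {
    original=$1   # e.g. bigfile.pdf
    page=$2       # 1-based number of the page to replace
    edited=$3     # single-page PDF exported from Inkscape
    output=$4     # e.g. newfile.pdf

    workdir=$(mktemp -d) || return 1

    # Split the document into pg_0001.pdf, pg_0002.pdf, ... inside workdir.
    pdftk "$original" burst output "$workdir/pg_%04d.pdf" || return 1

    target=$(printf '%s/pg_%04d.pdf' "$workdir" "$page")
    [ -f "$target" ] || { echo "page $page not found" >&2; return 1; }

    # Overwrite the burst copy of that page with the edited one.
    cp "$edited" "$target" || return 1

    # Zero-padded names sort in page order, so the glob keeps the sequence.
    pdftk "$workdir"/pg_*.pdf cat output "$output"
    status=$?
    rm -rf "$workdir"
    return $status
}

# Usage (file names follow the article's example):
#   replace_page bigfile.pdf 2 modified_2.pdf newfile.pdf
```

Working in a temporary directory keeps the burst pages from cluttering the folder that holds your original document.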

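To tie the docbook steps together, here is one way the whole workflow might look as a script. It is a sketch under several assumptions: the markup around the sample text (article, title, para, and ulink), the DocBook 4.5 DOCTYPE header, and the example.com link target are my guesses at a workable minimal document, not the article’s original listing, and the function names write_sample and build_pdf are made up for illustration.

```shell
#!/bin/sh
# Sketch: write a small docbook article and turn it into a PDF with xmlto and fop.
# The markup, the DOCTYPE header, and the link target are assumptions.

write_sample() {
    # $1 is the path of the docbook file to create.
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<article>
  <title>Linux Identity Sample</title>
  <para>Linux Identity is an informative magazine. Visit
  <ulink url="http://www.example.com/">their website</ulink> today!</para>
</article>
EOF
}

build_pdf() {
    # $1 is the docbook source, $2 the directory for the intermediate .fo file.
    src=$1
    fodir=$2
    name=$(basename "$src" .xml)

    # Step 1: apply the stylesheet, producing $fodir/$name.fo
    xmlto -o "$fodir" fo "$src" || return 1

    # Step 2: render the .fo file as a PDF next to the source document.
    fop -fo "$fodir/$name.fo" -pdf "$(dirname "$src")/$name.pdf"
}

# Usage (paths follow the article; requires xmlto and fop to be installed):
#   write_sample ~/Documents/magazine.xml
#   build_pdf ~/Documents/magazine.xml ~/Documents/fo
```

Replace the placeholder link with a real address before building, then open the resulting PDF in Evince and click the link to check that it is live.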