Pdftk Extract Pages from Pdf
As it turns out, I can do it with ImageMagick. If you don't have it, simply install it with:

$ sudo apt-get install imagemagick

Note 1: I've tried this with a one-page PDF (I'm learning to use ImageMagick, so I didn't want any more problems than necessary). I don't know if or how it will work with multiple pages, but you can extract a single page with pdftk:

$ pdftk A=myfile.pdf cat A1 output page1.pdf

The number after the handle is the page to be split out (in the example above, A1 selects the first page).

Note 2: The image resulting from this procedure will be a raster.

Open the PDF with the display command, which is part of the ImageMagick suite:

$ display file.pdf

Now click on the window and a menu will appear next to it. There, select Transform | Crop. Back in the main window, you can select the area you want to crop simply by dragging the pointer (classic corner-to-corner selection). Note the hand-shaped pointer around the image while selecting. The selection can be refined before continuing to the next step.
Once you have done so, take note of the small rectangle that appears in the top left corner. It displays first the dimensions of the selected area (e.g. 281x218) and second the coordinates of its first corner (e.g. +256+215). Note the dimensions of the selected area; you will need them when saving the cropped image.

Now, again in the pop-up menu (which is now the specific crop menu), click the Crop button.

Finally, once you are satisfied with the crop result, click File | Save in the menu. Browse to the folder where you want to save the cropped PDF, type a name, click the Format button, select PDF in the Select Image Format Type window, and click the Select button. Back in the Browse and Select a File window, click the Save button. Before saving, ImageMagick will ask you to select the page geometry. Here, type the dimensions of the cropped image, using a plain letter x to separate the width and the height.

You can also do all of this from the command line (the convert command with the -crop option). It is probably faster, but you need to know in advance the coordinates of the area you want to extract. See man convert and the example on the ImageMagick website.

From time to time, I need to extract some pages from a multi-page PDF document. Let's say you have a 6-page PDF document called myoldfile.pdf. You want to extract into a new PDF file, mynewfile.pdf, only pages 1, 2, 4 and 5 of myoldfile.pdf. I did exactly that using pdftk, a command-line tool.

If pdftk is not yet installed, install it this way on a Debian or Ubuntu-based computer:

$ sudo apt-get update
$ sudo apt-get install pdftk

Then, to make a new PDF with only pages 1, 2, 4 and 5 of the old PDF, do this:

$ pdftk myoldfile.pdf cat 1 2 4 5 output mynewfile.pdf

Note that cat and output are special pdftk keywords. cat specifies the operation to be performed on the input file. output signals that what follows is the name of the output PDF file.
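As a minimal sketch of the cat operation described above, the call can be wrapped in a small shell function. The function name extract_pages and the PDFTK variable are hypothetical names introduced here for illustration; they are not part of pdftk. The sketch assumes pdftk is installed and on the PATH.

```shell
# Hypothetical helper, not part of pdftk: copy selected pages into a new PDF.
# PDFTK is an assumed variable that lets the command be swapped out
# (for example with "echo" for a dry run).
PDFTK="${PDFTK:-pdftk}"

# Usage: extract_pages INPUT OUTPUT PAGE...
extract_pages() {
    if [ "$#" -lt 3 ]; then
        echo "usage: extract_pages INPUT OUTPUT PAGE..." >&2
        return 2
    fi
    local input="$1" output="$2"
    shift 2
    # Every remaining argument is passed to "cat" as one page number or
    # page range, followed by the "output" keyword and the target file.
    "$PDFTK" "$input" cat "$@" output "$output"
}
```

With this in place, extract_pages myoldfile.pdf mynewfile.pdf 1 2 4 5 runs the same pdftk command as above. Setting PDFTK=echo first prints the command arguments instead of running them, which is a cheap way to check what would be executed.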
You can specify page ranges like this:

$ pdftk myoldfile.pdf cat 1-2 4-5 output mynewfile.pdf

pdftk has some more tricks up its sleeve. For example, you can specify a burst operation to split each page of the input file into a separate output file.

$ pdftk myoldfile.pdf burst

By default, the output files are called pg_0001.pdf, pg_0002.pdf, etc.

pdftk is also able to merge multiple PDF files into one PDF.

$ pdftk pg_0001.pdf pg_0002.pdf pg_0004.pdf pg_0005.pdf cat output mynewfile.pdf

That merges the files corresponding to the first, second, fourth and fifth pages into a single output PDF.

If you know another easy way to split the pages of a PDF file, please tell us in a comment. Much appreciated. Two updates (part 2, part 3) are available for this post.

I have some PDF files of about 2000 pages each. They are generated randomly. I need to extract the pages that contain some specific patterns, and their page numbers change for each PDF. With some steps using pdftotext and awk, I can get the page numbers and store the information in a CSV file like this:

PatternA ; 1 3 5 7
Pattern ; 1 8 10 22

I've been trying to write a loop that reads each line of this CSV and passes it to the cat option of the pdftk command, but pdftk returns an error:

IFS=$(printf '\t')
for line in `cat work.csv`
do
  pattern=`echo $line | cut -d ';' -f 1`
  pages=`echo $line | cut -d ';' -f 2`
  pdftk input.pdf cat $pages output $pattern
done

When I echo the pattern and pages variables, everything is fine. But the command returns an error if I try to take the pages from a variable:

Error: Unexpected text in page range end, here: 1 3 5 7
Exiting.
Acceptable keywords, for example: "even" or "odd".
To rotate pages, use: "north" "south" "east" "west" "left" "right" or "down"
Errors encountered. No output created.
Done. Input errors, so no output created.

What am I doing wrong?
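One likely explanation for the error in the question, assuming the loop is as shown: with IFS set to a tab, the unquoted $pages is no longer split at spaces, so pdftk receives "1 3 5 7" as a single argument instead of four page numbers. The following is only a sketch of one way around it. The function name split_by_pattern, the PDFTK variable and the ".pdf" suffix on the output name are assumptions made here, not something given in the question.

```shell
# Assumed variable: lets the command be replaced for a dry run.
PDFTK="${PDFTK:-pdftk}"

# Hypothetical helper. Usage: split_by_pattern INPUT_PDF CSV_FILE
# Each CSV line is expected to look like:  PatternA ; 1 3 5 7
split_by_pattern() {
    local input="$1" csv="$2" pattern pages name
    # IFS=';' applies only to this read, so the rest of the loop keeps
    # the default IFS (space, tab, newline).
    while IFS=';' read -r pattern pages; do
        # Unquoted on purpose: default word splitting trims the spaces
        # around the pattern name.
        set -- $pattern
        name=${1:-}
        [ -n "$name" ] || continue
        # $pages is also unquoted on purpose, so "1 3 5 7" reaches pdftk
        # as four separate arguments. stdin is redirected so the command
        # cannot consume the remaining CSV lines.
        "$PDFTK" "$input" cat $pages output "$name.pdf" < /dev/null
    done < "$csv"
}
```

Calling split_by_pattern input.pdf work.csv would then run one pdftk command per CSV line. Leaving $pages unquoted is safe only as long as the field contains nothing but page numbers and ranges, since unquoted expansion is also subject to filename globbing.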
I would like to extract page ranges from a PDF document into a new PDF document using the command line in Linux. Note that:

pdftk (the PDF toolkit) fails for me with:

$ pdftk input.pdf cat 1 verbal output.pdf
Error: Failed to open PDF file:
   input.pdf
Errors encountered. No output created.
Done. Input errors, so no output created.

It turns out that "You (should) know that pdftk is nothing more than a very old version of iText... The key words in the previous sentence are VERY OLD."

Since pdftk cannot open the PDF file, I tried Multivalent:

$ java -classpath /path/to/Multivalent20091027.jar tool.pdf.Split -page 1 input.pdf
Exception in thread "main" java.lang.NoClassDefFoundError: tool/pdf/Split
Caused by: java.lang.ClassNotFoundException: tool.pdf.Split
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: tool.pdf.Split. Program will exit.

Turns out this is a somewhat tricky piece of software: even though it is hosted on SourceForge, and one page says that Practical Thinking generously provides these tools for free use on the command line, another page says: "The browser is open source. Document tools are a free bonus and not open source." That finally clarifies the comment in "Paste (Imposition) PDF documents" on Stack Overflow: all Multivalent releases linked from the official SourceForge site are missing the tools package.
(Edit: there seems to be an old Multivalent version with the tools included, see the SO link; but as it looks a bit like abandonware, I'd rather not use it.)

Finally, I would like to avoid tools that are essentially front ends for LaTeX, such as PDFjam. So, are there any options for such a command-line PDF splitting tool under Linux?

Let's say I created a 100-page book with the book document class. What is the quickest way to extract, for example, pages 3, 67-70 and 80 from the book into six separate PDF files? Is there any standard tool or script out there that does it very quickly? Doing it manually through a user interface is quite tedious.

There are several ways to extract a range of pages from a PDF file: there are PDF-related toolkits to do so, or you can use Ghostscript directly.

For example, to extract pages 22-36 from a 100-page PDF file using pdftk:

$ pdftk A=100p-inputfile.pdf cat A22-36 output outfile_p22-p36.pdf

Or use a combination of xpdf-utils (or poppler-utils) with psutils and the ps2pdf command (which ships as part of Ghostscript):

$ pdftops 100p-inputfile.pdf - | psselect -p22-36 | \
    ps2pdf14 - outfile_p22-p36.pdf

Or just use Ghostscript (which, unlike pdftk, is installed almost everywhere; and you have been using it in the last command anyway):

$ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
    -dFirstPage=22 -dLastPage=36 \
    -sOutputFile=outfile_p22-p36.pdf 100p-inputfile.pdf

In terms of processing speed and efficiency and, most importantly, the quality of the output file, the second method above is surely the worst of the three.
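For the case asked about earlier (pages 3, 67-70 and 80 as six separate files), the Ghostscript call above can be looped over single pages. What follows is a rough sketch only: the function names expand_pages and split_pages, the GS variable and the page_N.pdf naming scheme are all hypothetical choices made for this example. The Ghostscript options are the ones already used above.

```shell
# Assumed variable: lets the Ghostscript binary be replaced for a dry run.
GS="${GS:-gs}"

# Hypothetical helper: print one page number per line for arguments
# such as: 3 67-70 80
expand_pages() {
    local spec first last
    for spec in "$@"; do
        case $spec in
            *-*) first=${spec%-*}; last=${spec#*-} ;;  # range "A-B"
            *)   first=$spec; last=$spec ;;            # single page
        esac
        while [ "$first" -le "$last" ]; do
            echo "$first"
            first=$((first + 1))
        done
    done
}

# Hypothetical helper. Usage: split_pages INPUT PAGE_OR_RANGE...
# Writes each requested page to its own file, page_N.pdf.
split_pages() {
    local input="$1" page
    shift
    for page in $(expand_pages "$@"); do
        "$GS" -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
            -dFirstPage="$page" -dLastPage="$page" \
            -sOutputFile="page_$page.pdf" "$input"
    done
}
```

Running split_pages book.pdf 3 67-70 80 (book.pdf being a placeholder name) would start Ghostscript six times, once per page. That keeps the script simple, but it reopens the input file for every page, so it is not necessarily the fastest approach for large documents.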