Pymupdf Documentation Release 1.18.19
Total Page:16
File Type:pdf, Size:1020Kb
PyMuPDF Documentation Release 1.18.19 Jorj X. McKie Sep 17, 2021 Contents 1 Introduction 1 1.1 Note on the Name fitz ..........................................2 1.2 License and Copyright..........................................2 1.3 Covered Version.............................................2 2 Installation 3 2.1 Step 1: Install MuPDF..........................................3 2.2 Step 2: Download and Generate PyMuPDF...............................3 3 Tutorial 5 3.1 Importing the Bindings..........................................5 3.2 Opening a Document...........................................5 3.3 Some Document Methods and Attributes................................6 3.4 Accessing Meta Data...........................................6 3.5 Working with Outlines..........................................6 3.6 Working with Pages...........................................7 3.6.1 Inspecting the Links, Annotations or Form Fields of a Page..................7 3.6.2 Rendering a Page........................................8 3.6.3 Saving the Page Image in a File................................8 3.6.4 Displaying the Image in GUIs.................................8 3.6.4.1 wxPython.......................................8 3.6.4.2 Tkinter.........................................9 3.6.4.3 PyQt4, PyQt5, PySide.................................9 3.6.5 Extracting Text and Images...................................9 3.6.6 Searching for Text....................................... 10 3.7 PDF Maintenance............................................ 10 3.7.1 Modifying, Creating, Re-arranging and Deleting Pages.................... 11 3.7.2 Joining and Splitting PDF Documents............................. 11 3.7.3 Embedding Data........................................ 11 3.7.4 Saving.............................................. 12 3.8 Closing.................................................. 12 3.9 Further Reading............................................. 12 4 Collection of Recipes 13 4.1 Images.................................................. 13 4.1.1 How to Make Images from Document Pages.......................... 13 4.1.2 How to Increase Image Resolution............................... 14 i 4.1.3 How to Create Partial Pixmaps (Clips)............................. 14 4.1.4 How to Zoom a Clip to a GUI Window............................. 15 4.1.5 How to Create or Suppress Annotation Images......................... 15 4.1.6 How to Extract Images: Non-PDF Documents......................... 15 4.1.7 How to Extract Images: PDF Documents........................... 16 4.1.8 How to Handle Image Masks.................................. 17 4.1.9 How to Make one PDF of all your Pictures (or Files)..................... 18 4.1.10 How to Create Vector Images.................................. 21 4.1.11 How to Convert Images..................................... 21 4.1.12 How to Use Pixmaps: Glueing Images............................. 22 4.1.13 How to Use Pixmaps: Making a Fractal............................ 23 4.1.14 How to Interface with NumPy................................. 25 4.1.15 How to Add Images to a PDF Page............................... 25 4.1.16 How to Control the Size of Inserted Images.......................... 27 4.2 Text.................................................... 27 4.2.1 How to Extract all Document Text............................... 27 4.2.2 How to Extract Text from within a Rectangle......................... 28 4.2.3 How to Extract Text in Natural Reading Order......................... 28 4.2.4 How to Extract Tables from Documents............................ 30 4.2.5 How to Mark Extracted Text.................................. 31 4.2.6 How to Mark Searched Text.................................. 32 4.2.7 How to Mark Non-horizontal Text............................... 33 4.2.8 How to Analyze Font Characteristics.............................. 34 4.2.9 How to Insert Text....................................... 35 4.2.9.1 How to Write Text Lines............................... 36 4.2.9.2 How to Fill a Text Box................................ 37 4.2.9.3 How to Use Non-Standard Encoding......................... 38 4.3 Annotations................................................ 39 4.3.1 How to Add and Modify Annotations............................. 40 4.3.2 How to Use FreeText...................................... 44 4.3.3 Using Buttons and JavaScript.................................. 45 4.3.4 How to Use Ink Annotations.................................. 46 4.4 Drawing and Graphics.......................................... 47 4.5 Extracting Drawings........................................... 49 4.6 Multiprocessing............................................. 52 4.7 General.................................................. 57 4.7.1 How to Open with a Wrong File Extension.......................... 57 4.7.2 How to Embed or Attach Files................................. 57 4.7.3 How to Delete and Re-Arrange Pages............................. 57 4.7.4 How to Join PDFs........................................ 58 4.7.5 How to Add Pages....................................... 59 4.7.6 How To Dynamically Clean Up Corrupt PDFs......................... 60 4.7.7 How to Split Single Pages................................... 61 4.7.8 How to Combine Single Pages................................. 62 4.7.9 How to Convert Any Document to PDF............................ 64 4.7.10 How to Deal with Messages Issued by MuPDF........................ 65 4.7.11 How to Deal with PDF Encryption............................... 66 4.8 Common Issues and their Solutions................................... 68 4.8.1 Changing Annotations: Unexpected Behaviour........................ 68 4.8.1.1 Problem........................................ 68 4.8.1.2 Cause......................................... 68 4.8.1.3 Solutions........................................ 68 4.8.2 Misplaced Item Insertions on PDF Pages............................ 69 4.8.2.1 Problem........................................ 69 ii 4.8.2.2 Cause......................................... 69 4.8.2.3 Solutions........................................ 70 4.8.3 Missing or Unreadable Extracted Text............................. 70 4.8.3.1 Problem: no text is extracted............................. 70 4.8.3.2 Cause......................................... 70 4.8.3.3 Solution........................................ 71 4.8.3.4 Problem: unreadable text............................... 71 4.8.3.5 Cause......................................... 71 4.8.3.6 Solution........................................ 71 4.9 Low-Level Interfaces........................................... 71 4.9.1 How to Iterate through the xref Table............................ 71 4.9.2 How to Handle Object Streams................................. 72 4.9.3 How to Handle Page Contents................................. 73 4.9.4 How to Access the PDF Catalog................................ 74 4.9.5 How to Access the PDF File Trailer.............................. 74 4.9.6 How to Access XML Metadata................................. 75 4.9.7 How to Extend PDF Metadata................................. 75 4.9.8 How to Read and Update PDF Objects............................. 77 5 Module fitz 81 5.1 Invocation................................................ 81 5.2 Cleaning and Copying.......................................... 82 5.3 Extracting Fonts and Images....................................... 83 5.4 Joining PDF Documents......................................... 83 5.5 Low Level Information.......................................... 84 5.6 Embedded Files Commands....................................... 85 5.6.1 Information........................................... 85 5.6.2 Extraction............................................ 86 5.6.3 Deletion............................................. 87 5.6.4 Insertion............................................. 87 5.6.5 Updates............................................. 87 5.6.6 Copying............................................. 88 5.7 Text Extraction.............................................. 88 6 Classes 91 6.1 Annot................................................... 91 6.1.1 Annotation Icons in MuPDF.................................. 101 6.1.2 Example............................................. 102 6.2 Colorspace................................................ 102 6.3 DisplayList................................................ 103 6.4 Document................................................ 104 6.4.1 set_metadata() Example................................. 140 6.4.2 set_toc() Demonstration.................................. 140 6.4.3 insert_pdf() Examples.................................. 141 6.4.4 Other Examples......................................... 141 6.5 Font.................................................... 142 6.6 Identity.................................................. 149 6.7 IRect................................................... 149 6.8 Link.................................................... 152 6.9 linkDest.................................................. 154 6.10 Matrix.................................................. 156 6.10.1 Examples............................................ 160 6.10.2 Shifting............................................. 160 6.10.3 Flipping............................................. 161 iii 6.10.4 Shearing............................................. 162 6.10.5 Rotating............................................. 163 6.11 Outline.................................................. 164