IDOL Keyview Filter SDK 12.5 C Programming Guide
KeyView
Software Version 12.5
Filter SDK C Programming Guide
Document Release Date: February 2020
Software Release Date: February 2020

Legal notices

Copyright notice

© Copyright 2016-2020 Micro Focus or one of its affiliates.

The only warranties for products and services of Micro Focus and its affiliates and licensors (“Micro Focus”) are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Documentation updates

The title page of this document contains the following identifying information:
- Software Version number, which indicates the software version.
- Document Release Date, which changes each time the document is updated.
- Software Release Date, which indicates the release date of this version of the software.

To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/.

Support

Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers.

This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to:
- Search for knowledge documents of interest
- Access product documentation
- View software vulnerability alerts
- Enter into discussions with other software customers
- Download software patches
- Manage software licenses, downloads, and support contracts
- Submit and track service requests
- Contact customer support
- View information about all services that Support offers

Many areas of the portal require you to sign in.
If you need an account, you can create one when prompted to sign in. To learn about the different access levels the portal uses, see the Access Levels descriptions.

KeyView (12.5) Page 2 of 355

Contents

Part I: Overview of Filter SDK
  Chapter 1: Introducing Filter SDK
    Overview
    Features
    Platforms, Compilers, and Dependencies
    Supported Platforms
    Supported Compilers
    Software Dependencies
    Windows Installation
    UNIX Installation
    Package Contents
    License Information
    Enable Advanced Document Readers
    Update License Information
    Directory Structure
  Chapter 2: Getting Started
    Architectural Overview
    Enhance Performance
    File Caching
    Filtering
    Subfile Extraction
    Memory Abstraction
    Use the C-Language Implementation of the API
    Input/Output Operations
    Filtering in File Mode
    Filtering in Stream Mode
    Multithreaded Filtering
    The Filter Process Model
    Filter API
    File Extraction API
    Persist the Child Process
    In the API
    In the formats.ini File
    Run Filter In Process
    In the API
    In the formats.ini File
    Run File Extraction Functions Out of Process
    Restart the File Extraction Server
    Out-of-Process Logging
    Enable Out-of-Process Logging
    Set the Verbosity Level
    Enable Windows Minidump
    Keep Log Files
    Run File Detection In or Out of Process
    Specify the Process Type In the formats.ini File
    Specify the Process Type In the API
    Stream Data to Filter
Part II: Use Filter SDK
  Chapter 3: Use the File Extraction API
    Introduction
    Extract Subfiles
    Sanitize Absolute Paths
    Extract Images
    Recreate a File’s Hierarchy
    Create a Root Node
    Recreate a File’s Hierarchy—Example
    Extract Mail Metadata
    Default Metadata Set
    Extract the Default Metadata Set
    Extract All Metadata
    Microsoft Outlook (MSG) Metadata
    Extract MSG-Specific Metadata
    Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
    Extract EML- or MBX-Specific Metadata
    Lotus Notes Database (NSF) Metadata
    Extract NSF-Specific Metadata
    Microsoft Personal Folders File (PST) Metadata
    MAPI Properties
    Extract PST-Specific Metadata
    Exclude Metadata from the Extracted Text File
    Extract Subfiles from Outlook Files
    Extract Subfiles from Outlook Express Files
    Extract Subfiles from Mailbox Files
    Extract Subfiles from Outlook Personal Folders Files
    Choose the Reader to use for PST Files
    MAPI Attachment Methods
    Open Secured PST Files
    Detect PST Files While the Outlook Client is Running
    Extract Subfiles from Lotus Domino XML Language Files
    Extract .DXL Files to HTML
    Extract Subfiles from Lotus Notes Database Files
    System Requirements
    Installation and Configuration
    Windows
    Solaris
    AIX 5.x
    Linux
    Open Secured NSF Files
    Format Note Subfiles
    Extract Subfiles from PDF Files
    Improve Performance for PDFs with Many Small Images
    Extract Embedded OLE Objects
    Extract Subfiles from ZIP Files
    Default File Names for Extracted Subfiles
    Default File Name for Mail Formats
    Default File Name for Embedded OLE Objects
  Chapter 4: Use the Filter API
    Generate an Error Log
    Enable or Disable Error Logging
    Use the API
    Use Environment Variables
    Change the Path and File Name of the Log File
    Report Memory Errors
    Use the API
    Use Environment Variables
    Specify a Memory Guard
    Report the File Name in Stream Mode
    Report Extended Error Codes
    Specify the Maximum Size of the Log File
    Extract Metadata
    Extract Metadata for File Filtering
    Extract Metadata for Stream Filtering
    Example
    Convert Character Sets
    Determine the Character Set of the Output Text
    Guidelines for Character Set Conversion
    Set the Character Set During Filtering
    Set the Character Set During Subfile Extraction
    Prevent the Default Conversion of a Character Set
    Extract Deleted Text Marked by Tracked Changes
    Filter PDF Files
    Filter PDF Files to a Logical Reading Order
    Enable Logical Reading Order
    Use the C API
    Use the formats.ini File
    Rotated Text
    Extract Custom Metadata from PDF Files
    Extract Custom Metadata by Tag
    Extract All Custom Metadata
    Filter Tagged PDF Content
    Skip Embedded Fonts
    Use the formats.ini File
    Use the C API
    Control Hyphenation
    Use the formats.ini File
    Use the C API
    Filter Portfolio PDF Files
    Filter Spreadsheet Files
    Filter Worksheet Names
    Filter Hidden Text in Microsoft Excel Files
    Specify Date and Time Format on UNIX Systems
    Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers
    Extract Microsoft Excel Formulas
    Standardize Cell Formats
    Numbers
    Text
    Dates
    Filter XML Files
    Configure Element Extraction for XML Documents
    Modify Element Extraction Settings
    Modify Element Extraction Settings in the kvxconfig.ini File
    Specify an Element's Namespace and Attribute
    Add Configuration Settings for Custom XML Document Types
    Configure Headers and Footers
    Filter Hidden Data
    Hidden Data in Microsoft Excel Documents
    Example
    Toggle Hidden Excel Data Settings in the formats.ini File
    Hidden Data in HTML Documents
    Tab Delimited Output for Embedded Tables
    Table Detection for PDF Files
    Exclude Japanese Guide Text
    Source Code Identification
  Chapter 5: Sample Programs
    Introduction
    tstxtract
    filter
Part III: C API Reference
  Chapter 6: File Extraction API Functions
    KVGetExtractInterface()
    fpCloseFile()
    fpExtractSubFile()
    fpFreeStruct()
    fpGetMainFileInfo()
    fpGetSubFileInfo()
    fpGetSubFileMetaData()
    fpOpenFile()
    fpSetExtractionTimeout()
  Chapter 7: File Extraction API Structures
    KVCredential
    KVCredentialComponent
    KVExtractInterface
    KVExtractSubFileArg
    KVGetSubFileMetaArg
    KVMainFileInfo
    KVMetadataElem
    KVMetaName
    KVOpenFileArg
    KVOutputStream
    KVSubFileExtractInfo
    KVSubFileInfo
    KVSubFileMetaData
  Chapter 8: Filter API Functions
    KV_GetFilterInterfaceEx()
    fpCanFilterFile()
    fpCanFilterStream()
    fpCloseStream()
    fpFiletoInputStreamCreate()
    fpFileToInputStreamFree()
    fpFilterConfig()
    fpFilterFile()
    fpFilterStream()
    fpFreeFilterOutput()
    fpFreeOLESummaryInfo()
    fpFreeXmpInfo()
    fpGetDocInfoFile()
    fpGetDocInfoStream()
    fpGetKvErrorCodeEx()
    fpGetOLESummaryInfo()
    fpGetOLESummaryInfoFile()
    fpGetTrgCharSet()
    fpGetXmpInfo()
    fpGetXmpInfoFile()
    fpInit()
    fpInitWithLicenseData()
    fpOpenStream()
    fpOpenStreamEx2()
    fpRefreshFilterKVOOP()
    fpSetReplacementChar()
    fpSetSrcCharSet()
    fpSetTimeout()
    fpShutdown()
  Chapter 9: Filter API Structures
    KVFltInterfaceEx
    ADDOCINFO
    KV_CONFIG_Arg
    KVFilterOutput
    KVInputStream
    KVMemoryStream
    KVStructHead
    KVSumInfoElemEx
    KVSummaryInfoEx
    KVXConfigInfo
    KVXmpInfo
    KVXmpInfoElems
  Chapter 10: Enumerated Types
    Introduction
    Programming Guidelines
    ENDocAttributes
    KVCredKeyType
    KVErrorCode
    KVErrorCodeEx
    KVMetadataType
    KVMetaNameType
    KVSumInfoType
    KVSumType
    LPDF_DIRECTION
Appendixes
  Appendix A: Supported Formats
    Key to Supported Formats Table
    Supported Formats