IDOL KeyView Filter SDK 12.8 C Programming Guide
IDOL KeyView
Software Version 12.8

Filter SDK C Programming Guide

Document Release Date: February 2021
Software Release Date: February 2021

Legal notices

Copyright notice

© Copyright 2016-2021 Micro Focus or one of its affiliates.

The only warranties for products and services of Micro Focus and its affiliates and licensors (“Micro Focus”) are as may be set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Documentation updates

The title page of this document contains the following identifying information:

• Software Version number, which indicates the software version.
• Document Release Date, which changes each time the document is updated.
• Software Release Date, which indicates the release date of this version of the software.

To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/.

Support

Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers.

This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to:

• Search for knowledge documents of interest
• Access product documentation
• View software vulnerability alerts
• Enter into discussions with other software customers
• Download software patches
• Manage software licenses, downloads, and support contracts
• Submit and track service requests
• Contact customer support
• View information about all services that Support offers

Many areas of the portal require you to sign in.
If you need an account, you can create one when prompted to sign in. To learn about the different access levels the portal uses, see the Access Levels descriptions.

Contents

Part I: Overview of Filter SDK

  Chapter 1: Introducing Filter SDK
    Overview
    Features
    Platforms, Compilers, and Dependencies
    Supported Platforms
    Supported Compilers
    Software Dependencies
    Windows Installation
    UNIX Installation
    Package Contents
    License Information
    Enable Advanced Document Readers
    Pass License Information to KeyView
    Directory Structure

  Chapter 2: Getting Started
    Architectural Overview
    File Caching
    Filtering
    Subfile Extraction
    Memory Abstraction
    Use the C-Language Implementation of the API
    Input/Output Operations
    Filtering in File Mode
    Filtering in Stream Mode
    Multithreaded Filtering
    The Filter Process Model
    Filter API
    File Extraction API
    Persist the Child Process
      In the API
      In the formats.ini File
    Run Filter In Process
      In the API
      In the formats.ini File
    Run File Extraction Functions Out of Process
    Restart the File Extraction Server
    Out-of-Process Logging
    Enable Out-of-Process Logging
    Set the Verbosity Level
    Enable Windows Minidump
    Keep Log Files
    Run File Detection In or Out of Process
    Specify the Process Type In the formats.ini File
    Specify the Process Type In the API
    Stream Data to Filter

Part II: Use Filter SDK

  Chapter 3: Use the File Extraction API
    Introduction
    Extract Subfiles
    Sanitize Absolute Paths
    Extract Images
    Recreate a File’s Hierarchy
    Create a Root Node
    Recreate a File’s Hierarchy—Example
    Extract Mail Metadata
    Default Metadata Set
    Extract the Default Metadata Set
    Extract All Metadata
    Microsoft Outlook (MSG) Metadata
    Extract MSG-Specific Metadata
    Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
    Extract EML- or MBX-Specific Metadata
    Lotus Notes Database (NSF) Metadata
    Extract NSF-Specific Metadata
    Microsoft Personal Folders File (PST) Metadata
    MAPI Properties
    Extract PST-Specific Metadata
    Exclude Metadata from the Extracted Text File
    Extract Subfiles from Outlook Files
    Extract Subfiles from Outlook Express Files
    Extract Subfiles from Mailbox Files
    Extract Subfiles from Outlook Personal Folders Files
    Choose the Reader to use for PST Files
    MAPI Attachment Methods
    Open Secured PST Files
    Detect PST Files While the Outlook Client is Running
    Extract Subfiles from Lotus Domino XML Language Files
    Extract .DXL Files to HTML
    Extract Subfiles from Lotus Notes Database Files
    System Requirements
    Installation and Configuration
      Windows
      Solaris
      AIX 5.x
      Linux
    Open Secured NSF Files
    Format Note Subfiles
    Extract Subfiles from PDF Files
    Improve Performance for PDFs with Many Small Images
    Extract Embedded OLE Objects
    Extract Subfiles from ZIP Files
    Default File Names for Extracted Subfiles
    Default File Name for Mail Formats
    Default File Name for Embedded OLE Objects

  Chapter 4: Use the Filter API
    Generate an Error Log
    Enable or Disable Error Logging
      Use the API
      Use Environment Variables
    Change the Path and File Name of the Log File
    Report Memory Errors
      Use the API
      Use Environment Variables
    Specify a Memory Guard
    Report the File Name in Stream Mode
    Report Extended Error Codes
    Specify the Maximum Size of the Log File
    Extract Metadata
    Extract Metadata for File Filtering
    Extract Metadata for Stream Filtering
    Example
    Convert Character Sets
    Determine the Character Set of the Output Text
    Guidelines for Character Set Conversion
    Set the Character Set During Filtering
    Set the Character Set During Subfile Extraction
    Customize Character Set Detection and Conversion
    Extract Deleted Text Marked by Tracked Changes
    Filter PDF Files
    Filter PDF Files to a Logical Reading Order
    Enable Logical Reading Order
      Use the C API
      Use the formats.ini File
    Rotated Text
    Extract Custom Metadata from PDF Files
    Extract Custom Metadata by Tag
    Extract All Custom Metadata
    Filter Tagged PDF Content
    Skip Embedded Fonts
      Use the formats.ini File
      Use the C API
    Control Hyphenation
      Use the formats.ini File
      Use the C API
    Filter Portfolio PDF Files
    Filter Spreadsheet Files
    Filter Worksheet Names
    Filter Hidden Text in Microsoft Excel Files
    Specify Date and Time Format on UNIX Systems
    Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers
    Extract Microsoft Excel Formulas
    Standardize Cell Formats
      Numbers
      Text
      Dates
    Filter XML Files
    Configure Element Extraction for XML Documents
    Modify Element Extraction Settings
    Explore XML Extraction Settings with the Sample Program
    Specify an Element's Namespace and Attribute
    Configure Headers and Footers
    Filter Hidden Data
    Hidden Data in Microsoft Excel Documents
    Example
    Toggle Hidden Excel Data Settings in the formats.ini File
    Hidden Data in HTML Documents
    Tab Delimited Output for Embedded Tables
    Table Detection for PDF Files
    Exclude Japanese Guide Text
    Source Code Identification
    Configure the Proxy for RMS

  Chapter 5: Sample Programs
    Introduction
    tstxtract
    filter

Part III: C API Reference

  Chapter 6: File Extraction API Functions
    KVGetExtractInterface()
    fpCloseFile()
    fpExtractSubFile()
    fpFreeStruct()
    fpGetMainFileInfo()
    fpGetSubFileInfo()
    fpGetSubFileMetaData()
    fpOpenFile()
    fpSetExtractionTimeout()

  Chapter 7: File Extraction API Structures
    KVCredential
    KVCredentialComponent
    KVExtractInterface
    KVExtractSubFileArg
    KVGetSubFileMetaArg
    KVMainFileInfo
    KVMetadataElem
    KVMetaName
    KVOpenFileArg
    KVOutputStream
    KVSubFileExtractInfo
    KVSubFileInfo
    KVSubFileMetaData

  Chapter 8: Filter API Functions
    KV_GetFilterInterfaceEx()
    fpCanFilterFile()
    fpCanFilterStream()
    fpCloseStream()
    fpConfigureRMS()
    fpFiletoInputStreamCreate()
    fpFileToInputStreamFree()
    fpFilterConfig()
    fpFilterFile()
    fpFilterStream()
    fpFreeFilterOutput()
    fpFreeOLESummaryInfo()
    fpFreeXmpInfo()
    fpGetDocInfoFile()
    fpGetDocInfoStream()
    fpGetKvErrorCodeEx()
    fpGetOLESummaryInfo()
    fpGetOLESummaryInfoFile()
    fpGetTrgCharSet()
    fpGetXmpInfo()
    fpGetXmpInfoFile()
    fpInit()
    fpInitWithLicenseData()
    fpOpenStream()
    fpOpenStreamEx2()
    fpRefreshFilterKVOOP()
    fpSetReplacementChar()
    fpSetSrcCharSet()
    fpSetTimeout()
    fpShutdown()

  Chapter 9: Filter API Structures
    KVFltInterfaceEx
    ADDOCINFO
    KV_CONFIG_Arg
    KVFilterOutput
    KVInputStream
    KVMemoryStream
    KVRMSCredentials
    KVStructHead
    KVSumInfoElemEx
    KVSummaryInfoEx
    KVXConfigInfo
    KVXmpInfo
    KVXmpInfoElems

  Chapter 10: Enumerated Types
    Introduction
    Programming Guidelines
    ENDocAttributes
    KVCredKeyType
    KVErrorCode
    KVErrorCodeEx
    KVMetadataType
    KVMetaNameType
    KVSumInfoType
    KVSumType
    LPDF_DIRECTION

Appendixes

  Appendix A: Supported Formats
    Key to Supported