IDOL KeyView Filter SDK 12.7 C Programming Guide
KeyView
Software Version 12.7

Filter SDK C Programming Guide

Document Release Date: October 2020
Software Release Date: October 2020

Legal notices

Copyright notice

© Copyright 2016-2020 Micro Focus or one of its affiliates.

The only warranties for products and services of Micro Focus and its affiliates and licensors (“Micro Focus”) are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Documentation updates

The title page of this document contains the following identifying information:
- Software Version number, which indicates the software version.
- Document Release Date, which changes each time the document is updated.
- Software Release Date, which indicates the release date of this version of the software.

To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/.

Support

Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers.

This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to:
- Search for knowledge documents of interest
- Access product documentation
- View software vulnerability alerts
- Enter into discussions with other software customers
- Download software patches
- Manage software licenses, downloads, and support contracts
- Submit and track service requests
- Contact customer support
- View information about all services that Support offers

Many areas of the portal require you to sign in.
If you need an account, you can create one when prompted to sign in. To learn about the different access levels the portal uses, see the Access Levels descriptions.

KeyView (12.7) Page 2 of 373

Contents

Part I: Overview of Filter SDK
  Chapter 1: Introducing Filter SDK
    Overview
    Features
    Platforms, Compilers, and Dependencies
    Supported Platforms
    Supported Compilers
    Software Dependencies
    Windows Installation
    UNIX Installation
    Package Contents
    License Information
    Enable Advanced Document Readers
    Pass License Information to KeyView
    Directory Structure
  Chapter 2: Getting Started
    Architectural Overview
    File Caching
    Filtering
    Subfile Extraction
    Memory Abstraction
    Use the C-Language Implementation of the API
    Input/Output Operations
    Filtering in File Mode
    Filtering in Stream Mode
    Multithreaded Filtering
    The Filter Process Model
    Filter API
    File Extraction API
    Persist the Child Process
    In the API
    In the formats.ini File
    Run Filter In Process
    In the API
    In the formats.ini File
    Run File Extraction Functions Out of Process
    Restart the File Extraction Server
    Out-of-Process Logging
    Enable Out-of-Process Logging
    Set the Verbosity Level
    Enable Windows Minidump
    Keep Log Files
    Run File Detection In or Out of Process
    Specify the Process Type In the formats.ini File
    Specify the Process Type In the API
    Stream Data to Filter

Part II: Use Filter SDK
  Chapter 3: Use the File Extraction API
    Introduction
    Extract Subfiles
    Sanitize Absolute Paths
    Extract Images
    Recreate a File’s Hierarchy
    Create a Root Node
    Recreate a File’s Hierarchy—Example
    Extract Mail Metadata
    Default Metadata Set
    Extract the Default Metadata Set
    Extract All Metadata
    Microsoft Outlook (MSG) Metadata
    Extract MSG-Specific Metadata
    Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
    Extract EML- or MBX-Specific Metadata
    Lotus Notes Database (NSF) Metadata
    Extract NSF-Specific Metadata
    Microsoft Personal Folders File (PST) Metadata
    MAPI Properties
    Extract PST-Specific Metadata
    Exclude Metadata from the Extracted Text File
    Extract Subfiles from Outlook Files
    Extract Subfiles from Outlook Express Files
    Extract Subfiles from Mailbox Files
    Extract Subfiles from Outlook Personal Folders Files
    Choose the Reader to use for PST Files
    MAPI Attachment Methods
    Open Secured PST Files
    Detect PST Files While the Outlook Client is Running
    Extract Subfiles from Lotus Domino XML Language Files
    Extract .DXL Files to HTML
    Extract Subfiles from Lotus Notes Database Files
    System Requirements
    Installation and Configuration
    Windows
    Solaris
    AIX 5.x
    Linux
    Open Secured NSF Files
    Format Note Subfiles
    Extract Subfiles from PDF Files
    Improve Performance for PDFs with Many Small Images
    Extract Embedded OLE Objects
    Extract Subfiles from ZIP Files
    Default File Names for Extracted Subfiles
    Default File Name for Mail Formats
    Default File Name for Embedded OLE Objects
  Chapter 4: Use the Filter API
    Generate an Error Log
    Enable or Disable Error Logging
    Use the API
    Use Environment Variables
    Change the Path and File Name of the Log File
    Report Memory Errors
    Use the API
    Use Environment Variables
    Specify a Memory Guard
    Report the File Name in Stream Mode
    Report Extended Error Codes
    Specify the Maximum Size of the Log File
    Extract Metadata
    Extract Metadata for File Filtering
    Extract Metadata for Stream Filtering
    Example
    Convert Character Sets
    Determine the Character Set of the Output Text
    Guidelines for Character Set Conversion
    Set the Character Set During Filtering
    Set the Character Set During Subfile Extraction
    Customize Character Set Detection and Conversion
    Extract Deleted Text Marked by Tracked Changes
    Filter PDF Files
    Filter PDF Files to a Logical Reading Order
    Enable Logical Reading Order
    Use the C API
    Use the formats.ini File
    Rotated Text
    Extract Custom Metadata from PDF Files
    Extract Custom Metadata by Tag
    Extract All Custom Metadata
    Filter Tagged PDF Content
    Skip Embedded Fonts
    Use the formats.ini File
    Use the C API
    Control Hyphenation
    Use the formats.ini File
    Use the C API
    Filter Portfolio PDF Files
    Filter Spreadsheet Files
    Filter Worksheet Names
    Filter Hidden Text in Microsoft Excel Files
    Specify Date and Time Format on UNIX Systems
    Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers
    Extract Microsoft Excel Formulas
    Standardize Cell Formats
    Numbers
    Text
    Dates
    Filter XML Files
    Configure Element Extraction for XML Documents
    Modify Element Extraction Settings
    Explore XML Extraction Settings with the Sample Program
    Specify an Element's Namespace and Attribute
    Configure Headers and Footers
    Filter Hidden Data
    Hidden Data in Microsoft Excel Documents
    Example
    Toggle Hidden Excel Data Settings in the formats.ini File
    Hidden Data in HTML Documents
    Tab Delimited Output for Embedded Tables
    Table Detection for PDF Files
    Exclude Japanese Guide Text
    Source Code Identification
    Configure the Proxy for RMS
  Chapter 5: Sample Programs
    Introduction
    tstxtract
    filter

Part III: C API Reference
  Chapter 6: File Extraction API Functions
    KVGetExtractInterface()
    fpCloseFile()
    fpExtractSubFile()
    fpFreeStruct()
    fpGetMainFileInfo()
    fpGetSubFileInfo()
    fpGetSubFileMetaData()
    fpOpenFile()
    fpSetExtractionTimeout()
  Chapter 7: File Extraction API Structures
    KVCredential
    KVCredentialComponent
    KVExtractInterface
    KVExtractSubFileArg
    KVGetSubFileMetaArg
    KVMainFileInfo
    KVMetadataElem
    KVMetaName
    KVOpenFileArg
    KVOutputStream
    KVSubFileExtractInfo
    KVSubFileInfo
    KVSubFileMetaData
  Chapter 8: Filter API Functions
    KV_GetFilterInterfaceEx()
    fpCanFilterFile()
    fpCanFilterStream()
    fpCloseStream()
    fpConfigureRMS()
    fpFiletoInputStreamCreate()
    fpFileToInputStreamFree()
    fpFilterConfig()
    fpFilterFile()
    fpFilterStream()
    fpFreeFilterOutput()
    fpFreeOLESummaryInfo()
    fpFreeXmpInfo()
    fpGetDocInfoFile()
    fpGetDocInfoStream()
    fpGetKvErrorCodeEx()
    fpGetOLESummaryInfo()
    fpGetOLESummaryInfoFile()
    fpGetTrgCharSet()
    fpGetXmpInfo()
    fpGetXmpInfoFile()
    fpInit()
    fpInitWithLicenseData()
    fpOpenStream()
    fpOpenStreamEx2()
    fpRefreshFilterKVOOP()
    fpSetReplacementChar()
    fpSetSrcCharSet()
    fpSetTimeout()
    fpShutdown()
  Chapter 9: Filter API Structures
    KVFltInterfaceEx
    ADDOCINFO
    KV_CONFIG_Arg
    KVFilterOutput
    KVInputStream
    KVMemoryStream
    KVRMSCredentials
    KVStructHead
    KVSumInfoElemEx
    KVSummaryInfoEx
    KVXConfigInfo
    KVXmpInfo
    KVXmpInfoElems
  Chapter 10: Enumerated Types
    Introduction
    Programming Guidelines
    ENDocAttributes
    KVCredKeyType
    KVErrorCode
    KVErrorCodeEx
    KVMetadataType
    KVMetaNameType
    KVSumInfoType
    KVSumType
    LPDF_DIRECTION

Appendixes
  Appendix A: Supported Formats
    Key to Supported Formats Table
    Supported Formats
  Appendix