IDOL Keyview Filter SDK 12.6 C++ Programming Guide
Total Page:16
File Type:pdf, Size:1020Kb
KeyView Software Version 12.6 Filter SDK C++ Programming Guide Document Release Date: June 2020 Software Release Date: June 2020 Filter SDK C++ Programming Guide Legal notices Copyright notice © Copyright 2016-2020 Micro Focus or one of its affiliates. The only warranties for products and services of Micro Focus and its affiliates and licensors (“Micro Focus”) are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice. Documentation updates The title page of this document contains the following identifying information: l Software Version number, which indicates the software version. l Document Release Date, which changes each time the document is updated. l Software Release Date, which indicates the release date of this version of the software. To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/. Support Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers. This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to: l Search for knowledge documents of interest l Access product documentation l View software vulnerability alerts l Enter into discussions with other software customers l Download software patches l Manage software licenses, downloads, and support contracts l Submit and track service requests l Contact customer support l View information about all services that Support offers Many areas of the portal require you to sign in. If you need an account, you can create one when prompted to sign in. To learn about the different access levels the portal uses, see the Access Levels descriptions. KeyView (12.6) Page 2 of 261 Filter SDK C++ Programming Guide Contents Part I: Overview of Filter SDK 11 Chapter 1: Introducing Filter SDK 12 Overview 12 Features 12 Platforms, Compilers, and Dependencies 13 Supported Platforms 13 Supported Compilers 14 C++ Filter SDK 14 Software Dependencies 14 Windows Installation 15 UNIX Installation 16 Package Contents 17 License Information 18 Enable Advanced Document Readers 18 Update License Information 18 Directory Structure 19 Chapter 2: Getting Started 22 Use the C++ Language Implementation of the API 22 Build the C++ API 22 Create a KeyView Session 23 Configure your session 23 Detect the Format of a File 24 Filter a File 24 Extract Subfiles 24 Extract Metadata 25 Exceptions 25 Generic IO Types 25 Part II: Use Filter SDK 28 Chapter 3: Use the File Extraction API 29 Introduction 29 Extract Subfiles 30 Extract Images 31 Extract Mail Metadata 31 Default Metadata Set 31 Extract the Default Metadata Set 32 Extract Subfiles from Outlook Express Files 32 Extract Subfiles from Mailbox Files 32 Extract Subfiles from Outlook Personal Folders Files 33 Choose the Reader to use for PST Files 33 KeyView (12.6) Page 3 of 261 Filter SDK C++ Programming Guide MAPI Attachment Methods 35 Open Secured PST Files 35 Detect PST Files While the Outlook Client is Running 35 Extract Subfiles from Lotus Domino XML Language Files 36 Extract .DXL Files to HTML 36 Extract Subfiles from Lotus Notes Database Files 37 System Requirements 37 Installation and Configuration 37 Windows 38 Solaris 38 AIX 5.x 38 Linux 39 Open Secured NSF Files 39 Format Note Subfiles 39 Extract Subfiles from PDF Files 40 Improve Performance for PDFs with Many Small Images 40 Extract Embedded OLE Objects 40 Extract Subfiles from ZIP Files 41 Extract Metadata 41 Chapter 4: Use the Filter API 42 Generate an Error Log 42 Enable or Disable Error Logging 43 Use the API 43 Use Environment Variables 43 Change the Path and File Name of the Log File 43 Report Memory Errors 43 Use the API 44 Use Environment Variables 44 Specify a Memory Guard 44 Specify the Maximum Size of the Log File 44 Extract Metadata 45 Convert Character Sets 45 Determine the Character Set of the Output Text 45 Guidelines for Character Set Conversion 46 Set the Character Set During Filtering 46 Set the Character Set During Subfile Extraction 47 Extract Deleted Text Marked by Tracked Changes 47 Filter a File 47 Filter PDF Files 48 Filter PDF Files to a Logical Reading Order 48 Enable Logical Reading Order 49 Use the C++ API 49 Use the formats.ini File 49 Rotated Text 50 Extract Custom Metadata from PDF Files 50 Extract All Custom Metadata 50 KeyView (12.6) Page 4 of 261 Filter SDK C++ Programming Guide Filter Tagged PDF Content 50 Skip Embedded Fonts 51 Use the formats.ini File 51 Use the C++ API 51 Control Hyphenation 52 Use the formats.ini File 52 Use the C++ API 52 Filter Portfolio PDF Files 52 Filter Spreadsheet Files 52 Filter Worksheet Names 52 Filter Hidden Text in Microsoft Excel Files 53 Specify Date and Time Format on UNIX Systems 53 Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers 53 Extract Microsoft Excel Formulas 54 Configure Headers and Footers 55 Filter Hidden Data 56 Hidden Data in HTML Documents 56 Tab Delimited Output for Embedded Tables 56 Table Detection for PDF Files 57 Exclude Japanese Guide Text 57 Source Code Identification 57 Chapter 5: Sample Programs 60 Introduction 60 Build the Sample Programs 60 Run the Sample Programs 61 detect 61 extract 62 filter_document 62 metadata 63 subfiles 63 filter_container 63 Part III: C++ API Reference 64 Chapter 7: InputTypes and OutputTypes 66 Chapter 8: The keyview Namespace 68 The Session Class 68 Constructor 68 config 69 detect 69 filter 69 get_summary_information 69 metadata_map 70 subfiles 70 The Configuration Class 70 Constructor 70 KeyView (12.6) Page 5 of 261 Filter SDK C++ Programming Guide custom_pdf_metadata 70 date_time_field_codes 71 extraction_timeout 71 filename_field_code 71 formatted_mail 71 header_and_footer 72 header_and_footer_tags 72 hidden_text 72 no_encoding_conversion 72 out_of_process_log 73 out_of_process_memory_log 73 password 73 pdf_logical_reading 73 revision_marks 74 skip_comments 74 skip_embedded_fonts 74 skip_thumbnail 74 soft_hyphens 75 source_encoding 75 tagged_pdf_content 75 target_encoding 75 string& temporary_directory 76 timeout 76 unicode_byte_order_marker 76 The DetectionInfo Class 76 appleDoubleEncoded 77 appleSingleEncoded 77 category 77 category_name 77 description 77 encrypted 77 extension 78 format 78 macBinaryEncoded 78 version 78 wangGDLencoded 78 windowRMSEncrypted 79 The Container Class 79 The Subfile Class 79 extract 79 children 79 index 80 is_folder 80 mail_metadata 80 parent 80 rawname 80 KeyView (12.6) Page 6 of 261 Filter SDK C++ Programming Guide size 81 time 81 type 81 The SummaryInfoItem Class 81 apply_visitor 82 convert_to_string 82 name 82 type 82 The SummaryInfoVisitorBase Class 82 visit_boolean 83 visit_datetime 83 visit_double 83 visit_integer 83 visit_target_encoding_string 84 visit_utf8_string 84 Enumerations 84 LogicalPDFDirection 85 SubFile::Type 85 SummaryInfoType 86 Exceptions 86 Chapter 9: The keyview::io Namespace 88 InputFile 88 Constructors 88 OutputFile 88 Constructors 88 OutputStdout 88 Constructors 89 InMemoryFile 89 Constructors 89 Appendixes 90 Appendix A: Supported Formats 91 Key to Supported Formats Table 91 Supported Formats 93 Appendix B: Document Readers 164 Key to Document Readers Table 164 Document Readers 166 Appendix C: Character Sets 195 Multibyte and Bidirectional Support 195 Coded Character Sets 203 Appendix D: Extract and Format Lotus Notes Subfiles 209 Overview 209 Customize XML Templates 209 Use Demo Templates 210 KeyView (12.6) Page 7 of 261 Filter SDK C++ Programming Guide Use Old Templates 210 Disable XML Templates 210 Template Elements and Attributes 211 Conditional Elements 211 Control Elements 212 Data Elements 213 Date and Time Formats 216 Lotus Notes Date and Time Formats 216 KeyView Date and Time Formats 217 Appendix E: File Format Detection 222 Introduction 222 Extract Format Information 222 Determine Format Support 222 Example formats.ini file entries 223 Refine Detection of Text Files 223 Allow Consecutive NULL Bytes in a Text File 224 Translate Format Information 225 Distinguish Between Formats 225 Determine a Document Reader 226 Category Values in formats.ini 226 Appendix F: List of Required Files for Redistribution 230 Core Files 230 Support Files 231 Document Readers 232 Appendix G: Develop a Custom Reader 239 Introduction 239 How to Write a Custom Reader 240 Naming Conventions 240 Basic Steps 241 Token Buffer 241 Macros 242 Reader Interface 243 Function Flow 243 Example Development of fffFillBuffer() 244 Implementation 1—fpFillBuffer() Function 244 Structure of Implementation 1 245 Problems with Implementation 1 245 Implementation 2—Processing a Large Token Stream 246 Structure of Implementation 2 247 Problems with Implementation 2 247 Boundary Conditions 247 Implementation 3—Interrupting Structured Access Layer Calls 248 Structure of Implementation 3 250 Development Tips 250 Functions 251 KeyView (12.6) Page 8 of 261 Filter SDK C++ Programming Guide xxxsrAutoDet() 251 xxxAllocateContext() 252 xxxFreeContext() 253 xxxInitDoc() 254 xxxFillBuffer() 254 xxxGetSummaryInfo() 255 xxxOpenStream() 256 xxxCloseStream() 257 xxxCharSet() 257 Appendix H: Password Protected Files 259 Supported Password Protected