IDOL KeyView Filter SDK 12.4 C++ Programming Guide
KeyView
Software Version 12.4

Filter SDK C++ Programming Guide

Document Release Date: October 2019
Software Release Date: October 2019

Legal notices

Copyright notice

© Copyright 2016-2019 Micro Focus or one of its affiliates.

The only warranties for products and services of Micro Focus and its affiliates and licensors (“Micro Focus”) are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Documentation updates

The title page of this document contains the following identifying information:

• Software Version number, which indicates the software version.
• Document Release Date, which changes each time the document is updated.
• Software Release Date, which indicates the release date of this version of the software.

To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/.

Support

Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers.

This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to:

• Search for knowledge documents of interest
• Access product documentation
• View software vulnerability alerts
• Enter into discussions with other software customers
• Download software patches
• Manage software licenses, downloads, and support contracts
• Submit and track service requests
• Contact customer support
• View information about all services that Support offers

Many areas of the portal require you to sign in.
If you need an account, you can create one when prompted to sign in. To learn about the different access levels the portal uses, see the Access Levels descriptions.

KeyView (12.4) Page 2 of 240

Contents

Part I: Overview of Filter SDK 11
  Chapter 1: Introducing Filter SDK 12
    Overview 12
    Features 12
    Platforms, Compilers, and Dependencies 13
      Supported Platforms 13
      Supported Compilers 14
        C++ Filter SDK 14
      Software Dependencies 14
    Windows Installation 15
    UNIX Installation 16
    Package Contents 17
    License Information 18
      Enable Advanced Document Readers 18
      Update License Information 18
    Directory Structure 19
  Chapter 2: Getting Started 21
    Use the C++ Language Implementation of the API 21
    Build the C++ API 21
    Create a KeyView Session 22
    Configure your session 22
    Detect the Format of a File 23
    Filter a File 23
    Extract Subfiles 23
    Extract Metadata 24
    Exceptions 24
    Generic IO Types 24
Part II: Use Filter SDK 27
  Chapter 3: Use the File Extraction API 28
    Introduction 28
    Extract Subfiles 29
    Extract Images 30
    Extract Mail Metadata 30
      Default Metadata Set 30
      Extract the Default Metadata Set 31
    Extract Subfiles from Outlook Express Files 31
    Extract Subfiles from Mailbox Files 31
    Extract Subfiles from Outlook Personal Folders Files 32
      Choose the Reader to use for PST Files 32
      MAPI Attachment Methods 34
      Open Secured PST Files 34
      Detect PST Files While the Outlook Client is Running 34
    Extract Subfiles from Lotus Domino XML Language Files 35
      Extract .DXL Files to HTML 35
    Extract Subfiles from Lotus Notes Database Files 36
      System Requirements 36
      Installation and Configuration 36
        Windows 37
        Solaris 37
        AIX 5.x 37
        Linux 38
      Open Secured NSF Files 38
      Format Note Subfiles 38
    Extract Subfiles from PDF Files 39
      Improve Performance for PDFs with Many Small Images 39
    Extract Embedded OLE Objects 39
    Extract Subfiles from ZIP Files 40
    Extract Metadata 40
  Chapter 4: Use the Filter API 41
    Generate an Error Log 41
      Enable or Disable Error Logging 42
        Use the API 42
        Use Environment Variables 42
      Change the Path and File Name of the Log File 42
      Report Memory Errors 42
        Use the API 43
        Use Environment Variables 43
      Specify a Memory Guard 43
      Specify the Maximum Size of the Log File 43
    Extract Metadata 44
    Convert Character Sets 44
      Determine the Character Set of the Output Text 44
      Guidelines for Character Set Conversion 45
      Set the Character Set During Filtering 45
      Set the Character Set During Subfile Extraction 46
    Extract Deleted Text Marked by Tracked Changes 46
    Filter a File 46
    Filter PDF Files 47
      Filter PDF Files to a Logical Reading Order 47
        Enable Logical Reading Order 48
          Use the C++ API 48
          Use the formats.ini File 48
      Rotated Text 49
      Extract Custom Metadata from PDF Files 49
        Extract All Custom Metadata 49
      Filter Tagged PDF Content 49
      Skip Embedded Fonts 50
        Use the formats.ini File 50
        Use the C++ API 50
      Control Hyphenation 51
        Use the formats.ini File 51
        Use the C++ API 51
    Filter Spreadsheet Files 51
      Filter Worksheet Names 51
      Filter Hidden Text in Microsoft Excel Files 51
      Specify Date and Time Format on UNIX Systems 52
      Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers 52
      Extract Microsoft Excel Formulas 52
    Configure Headers and Footers 54
    Filter Hidden Data 55
      Hidden Data in HTML Documents 55
    Tab Delimited Output for Embedded Tables 55
    Table Detection for PDF Files 56
    Exclude Japanese Guide Text 56
    Source Code Identification 56
  Chapter 5: Sample Programs 59
    Introduction 59
    Build the Sample Programs 59
    Run the Sample Programs 60
      detect 60
      extract 61
      filter_document 61
      metadata 62
      subfiles 62
      filter_container 62
Part III: C++ API Reference 63
  Chapter 7: InputTypes and OutputTypes 65
  Chapter 8: The keyview Namespace 67
    The Session Class 67
      Constructor 67
      config 68
      detect 68
      filter 68
      get_summary_information 68
      metadata_map 69
      subfiles 69
    The Configuration Class 69
      Constructor 69
      custom_pdf_metadata 69
      date_time_field_codes 70
      extraction_timeout 70
      filename_field_code 70
      formatted_mail 70
      header_and_footer 71
      header_and_footer_tags 71
      hidden_text 71
      no_encoding_conversion 71
      out_of_process_log 72
      out_of_process_memory_log 72
      password 72
      pdf_logical_reading 72
      revision_marks 73
      skip_comments 73
      skip_embedded_fonts 73
      skip_thumbnail 73
      soft_hyphens 74
      source_encoding 74
      tagged_pdf_content 74
      target_encoding 74
      string& temporary_directory 75
      timeout 75
      unicode_byte_order_marker 75
    The DetectionInfo Class 75
      appleDoubleEncoded 76
      appleSingleEncoded 76
      category 76
      category_name 76
      description 76
      encrypted 76
      extension 77
      format 77
      macBinaryEncoded 77
      version 77
      wangGDLencoded 77
      windowRMSEncrypted 78
    The Container Class 78
    The Subfile Class 78
      extract 78
      children 78
      index 79
      is_folder 79
      mail_metadata 79
      parent 79
      rawname 79
      size 80
      time 80
      type 80
    The SummaryInfoItem Class 80
      apply_visitor 81
      convert_to_string 81
      name 81
      type 81
    The SummaryInfoVisitorBase Class 81
      visit_boolean 82
      visit_datetime 82
      visit_double 82
      visit_integer 82
      visit_target_encoding_string 83
      visit_utf8_string 83
    Enumerations 83
      LogicalPDFDirection 84
      SubFile::Type 84
      SummaryInfoType 85
    Exceptions 85
  Chapter 9: The keyview::io Namespace 87
    InputFile 87
      Constructors 87
    OutputFile 87
      Constructors 87
    OutputStdout 87
      Constructors 88
    InMemoryFile 88
      Constructors 88
Appendixes 89
  Appendix A: Supported Formats 90
    Supported Formats 90
    Archive Formats 91
    Binary Format 94
    Computer-Aided Design Formats 95
    Database Formats 96
    Desktop Publishing 97
    Display Formats 97
    Graphic Formats 98
    Mail Formats 102
    Multimedia Formats 105
    Presentation Formats 108
    Spreadsheet Formats 111
    Text and Markup Formats 113
    Word Processing Formats 114
  Appendix B: Detected Formats 120
    Key to Detected Formats Table 120
    Detected Formats 122
  Appendix C: Character Sets 174
    Multibyte and Bidirectional Support 174
    Coded Character Sets 182
  Appendix D: Extract and Format Lotus Notes Subfiles 188
    Overview 188
    Customize XML Templates 188
      Use Demo Templates 189
      Use Old Templates 189
      Disable XML Templates 189
    Template Elements and Attributes 190
      Conditional Elements 190
      Control Elements 191
      Data Elements 192
    Date and Time Formats 195
      Lotus Notes Date and Time Formats 195
      KeyView Date and Time Formats 196
  Appendix E: File Format Detection 201
    Introduction 201
    Extract Format Information 201
    Determine Format Support 201
    Example formats.ini file entries 202
    Refine Detection of Text Files 202
    Allow Consecutive NULL Bytes in a Text File 203
    Translate Format Information 204
      Distinguish Between Formats 204
      Determine a Document Reader 205
    Category Values in formats.ini 205
  Appendix F: List of Required Files for Redistribution 209
    Core Files 209
    Support Files 210
    Document Readers 211
  Appendix G: Develop a Custom Reader 218
    Introduction 218
    How to Write a Custom Reader 219
      Naming Conventions 219
      Basic Steps 220
    Token Buffer 220
    Macros 221
    Reader Interface 222
    Function Flow 222
    Example Development of fffFillBuffer() 223
      Implementation 1: fpFillBuffer() Function 223
        Structure of Implementation 1 224
        Problems with Implementation 1 224
      Implementation 2: Processing a Large Token Stream 225
        Structure of Implementation 2 226
        Problems with Implementation 2 226
        Boundary Conditions 226
      Implementation 3: Interrupting Structured Access Layer Calls 227
        Structure of Implementation 3 229
    Development Tips 229
    Functions 230
      xxxsrAutoDet() 230