IDOL KeyView Filter SDK 12.8 C++ Programming Guide
IDOL KeyView
Software Version 12.8

Filter SDK C++ Programming Guide

Document Release Date: February 2021
Software Release Date: February 2021

Legal notices

Copyright notice

© Copyright 2016-2021 Micro Focus or one of its affiliates.

The only warranties for products and services of Micro Focus and its affiliates and licensors (“Micro Focus”) are as may be set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Documentation updates

The title page of this document contains the following identifying information:

- Software Version number, which indicates the software version.
- Document Release Date, which changes each time the document is updated.
- Software Release Date, which indicates the release date of this version of the software.

To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/.

Support

Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers.

This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to:

- Search for knowledge documents of interest
- Access product documentation
- View software vulnerability alerts
- Enter into discussions with other software customers
- Download software patches
- Manage software licenses, downloads, and support contracts
- Submit and track service requests
- Contact customer support
- View information about all services that Support offers

Many areas of the portal require you to sign in.
If you need an account, you can create one when prompted to sign in. To learn about the different access levels the portal uses, see the Access Levels descriptions.

IDOL KeyView (12.8) Page 2 of 266

Contents

Part I: Overview of Filter SDK
  Chapter 1: Introducing Filter SDK
    Overview
    Features
    Platforms, Compilers, and Dependencies
    Supported Platforms
    Supported Compilers
    C++ Filter SDK
    Software Dependencies
    Windows Installation
    UNIX Installation
    Package Contents
    License Information
    Enable Advanced Document Readers
    Pass License Information to KeyView
    Directory Structure
  Chapter 2: Getting Started
    Use the C++ Language Implementation of the API
    Build the C++ API
    Create a KeyView Session
    Configure your session
    Detect the Format of a File
    Filter a File
    Extract Subfiles
    Extract Metadata
    Exceptions
    Generic IO Types
Part II: Use Filter SDK
  Chapter 3: Use the File Extraction API
    Introduction
    Extract Subfiles
    Extract Images
    Extract Mail Metadata
    Default Metadata Set
    Extract the Default Metadata Set
    Extract Subfiles from Outlook Express Files
    Extract Subfiles from Mailbox Files
    Extract Subfiles from Outlook Personal Folders Files
    Choose the Reader to use for PST Files
    MAPI Attachment Methods
    Open Secured PST Files
    Detect PST Files While the Outlook Client is Running
    Extract Subfiles from Lotus Domino XML Language Files
    Extract .DXL Files to HTML
    Extract Subfiles from Lotus Notes Database Files
    System Requirements
    Installation and Configuration
      Windows
      Solaris
      AIX 5.x
      Linux
    Open Secured NSF Files
    Format Note Subfiles
    Extract Subfiles from PDF Files
    Improve Performance for PDFs with Many Small Images
    Extract Embedded OLE Objects
    Extract Subfiles from ZIP Files
    Extract Metadata
  Chapter 4: Use the Filter API
    Generate an Error Log
    Enable or Disable Error Logging
      Use the API
      Use Environment Variables
    Change the Path and File Name of the Log File
    Report Memory Errors
      Use the API
      Use Environment Variables
    Specify a Memory Guard
    Specify the Maximum Size of the Log File
    Extract Metadata
    Convert Character Sets
    Determine the Character Set of the Output Text
    Guidelines for Character Set Conversion
    Set the Character Set During Filtering
    Set the Character Set During Subfile Extraction
    Customize Character Set Detection and Conversion
    Extract Deleted Text Marked by Tracked Changes
    Filter a File
    Filter PDF Files
    Filter PDF Files to a Logical Reading Order
    Enable Logical Reading Order
      Use the C++ API
      Use the formats.ini File
    Rotated Text
    Extract Custom Metadata from PDF Files
    Extract All Custom Metadata
    Filter Tagged PDF Content
    Skip Embedded Fonts
      Use the formats.ini File
      Use the C++ API
    Control Hyphenation
      Use the formats.ini File
      Use the C++ API
    Filter Portfolio PDF Files
    Filter Spreadsheet Files
    Filter Worksheet Names
    Filter Hidden Text in Microsoft Excel Files
    Specify Date and Time Format on UNIX Systems
    Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers
    Extract Microsoft Excel Formulas
    Configure Headers and Footers
    Filter Hidden Data
    Hidden Data in HTML Documents
    Tab Delimited Output for Embedded Tables
    Table Detection for PDF Files
    Exclude Japanese Guide Text
    Source Code Identification
  Chapter 5: Sample Programs
    Introduction
    Build the Sample Programs
    Run the Sample Programs
      detect
      extract
      filter_document
      metadata
      subfiles
      filter_container
Part III: C++ API Reference
  Chapter 7: InputTypes and OutputTypes
  Chapter 8: The keyview Namespace
    The Session Class
      Constructor
      config
      detect
      filter
      get_summary_information
      metadata_map
      subfiles
    The Configuration Class
      Constructor
      character_set_detection
      custom_pdf_metadata
      date_time_field_codes
      extraction_timeout
      filename_field_code
      formatted_mail
      header_and_footer
      header_and_footer_tags
      hidden_text
      no_encoding_conversion
      out_of_process_log
      out_of_process_memory_log
      password
      pdf_logical_reading
      revision_marks
      skip_comments
      skip_embedded_fonts
      skip_thumbnail
      soft_hyphens
      source_encoding
      tagged_pdf_content
      target_encoding
      string& temporary_directory
      timeout
      unicode_byte_order_marker
    The DetectionInfo Class
      appleDoubleEncoded
      appleSingleEncoded
      category
      category_name
      description
      encrypted
      extension
      format
      macBinaryEncoded
      version
      wangGDLencoded
      windowRMSEncrypted
    The Container Class
    The Subfile Class
      extract
      children
      index
      is_folder
      mail_metadata
      parent
      rawname
      size
      time
      type
    The SummaryInfoItem Class
      apply_visitor
      convert_to_string
      name
      type
    The SummaryInfoVisitorBase Class
      visit_boolean
      visit_datetime
      visit_double
      visit_integer
      visit_target_encoding_string
      visit_utf8_string
    Enumerations
      LogicalPDFDirection
      SubFile::Type
      SummaryInfoType
    Exceptions
  Chapter 9: The keyview::io Namespace
    InputFile
      Constructors
    OutputFile
      Constructors
    OutputStdout
      Constructors
    InMemoryFile
      Constructors
Appendixes
  Appendix A: Supported Formats
    Key to Supported Formats Table
    Supported Formats
  Appendix B: Document Readers
    Key to Document Readers Table
    Document Readers
  Appendix C: Character Sets
    Multibyte and Bidirectional Support
    Coded Character Sets
  Appendix D: Extract and Format Lotus Notes Subfiles
    Overview
    Customize XML Templates
    Use Demo Templates
    Use Old Templates
    Disable XML Templates
    Template Elements and Attributes
    Conditional Elements
    Control Elements
    Data Elements
    Date and Time Formats
    Lotus Notes Date and Time Formats
    KeyView Date and Time Formats
  Appendix E: File Format Detection
    Introduction
    Extract Format Information
    Determine Format Support
    Example formats.ini file entries
    Refine Detection of Text Files
    Allow Consecutive NULL Bytes in a Text File
    Translate Format Information
    Distinguish Between Formats
    Determine a Document Reader
    Category Values in formats.ini
  Appendix F: List of Required Files for Redistribution
    Core Files
    Support Files
    Document Readers
  Appendix G: Develop a Custom Reader
    Introduction
    How to Write a Custom Reader
    Naming Conventions
    Basic Steps
    Token Buffer
    Macros
    Reader Interface
    Function Flow
    Example Development of fffFillBuffer()
    Implementation 1: fpFillBuffer() Function
    Structure of Implementation 1
    Problems with Implementation 1
    Implementation 2: Processing a Large Token Stream
    Structure of Implementation 2
    Problems with Implementation 2
    Boundary Conditions
    Implementation 3: Interrupting Structured Access Layer Calls
    Structure of Implementation 3
    Development Tips
    Functions
      xxxsrAutoDet()
      xxxAllocateContext()
      xxxFreeContext()
      xxxInitDoc()
      xxxFillBuffer()
      xxxGetSummaryInfo()
      xxxOpenStream()
      xxxCloseStream()
      xxxCharSet()
  Appendix H: Password Protected Files
    Supported Password Protected File Types
Send documentation feedback

Part I: Overview of Filter SDK

This section provides an overview of the Micro Focus KeyView Filter SDK and describes how to use the C++ implementation of the API.

- Introducing Filter SDK
- Getting Started

Chapter 1: Introducing Filter SDK

This section describes the Filter SDK package.