IDOL KeyView XML Export SDK 12.6 Java Programming Guide
KeyView
Software Version 12.6
XML Export SDK Java Programming Guide

Document Release Date: June 2020
Software Release Date: June 2020

Legal notices

Copyright notice
© Copyright 1997-2020 Micro Focus or one of its affiliates.
The only warranties for products and services of Micro Focus and its affiliates and licensors (“Micro Focus”) are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Documentation updates
The title page of this document contains the following identifying information:
• Software Version number, which indicates the software version.
• Document Release Date, which changes each time the document is updated.
• Software Release Date, which indicates the release date of this version of the software.
To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/.

Support
Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers.
This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to:
• Search for knowledge documents of interest
• Access product documentation
• View software vulnerability alerts
• Enter into discussions with other software customers
• Download software patches
• Manage software licenses, downloads, and support contracts
• Submit and track service requests
• Contact customer support
• View information about all services that Support offers
Many areas of the portal require you to sign in.
If you need an account, you can create one when prompted to sign in. To learn about the different access levels the portal uses, see the Access Levels descriptions.

KeyView (12.6) Page 2 of 276 XML Export SDK Java Programming Guide

Contents

Part I: Overview of XML Export
  Chapter 1: Introducing XML Export
    Overview
    Features
    Platforms, Compilers, and Dependencies
    Supported Platforms
    Supported Compilers
    Software Dependencies
    Windows Installation
    UNIX Installation
    Package Contents
    License Information
    Enable Advanced Document Readers
    Update License Information
    Directory Structure
    Definition of Terms
  Chapter 2: Getting Started
    Architectural Overview
    Enhance Performance
    File Caching
    Convert Files
    Convert Files Out of Process
    Configure Out-of-Process Conversions
    Run Export Out of Process - Overview
    Recommendations
    Run Export Out of Process
    Subfile Extraction
    Convert Outlook Email without Using the Extraction API
    Set Conversion Options
    Set Conversion Options by Using the API
    Explore Conversion Options with the Sample Programs
    Templates
    Use the Export Demo Program
    Change Input/Output Directories
    Set Configuration Options
    Suppress Images
    Use PDF Position Information
    Convert Files
    Use the XML Export API
    Input/Output Operations
    Convert Files
    Multithreaded Conversions
    Use Methods in the XmlExportReader Class
    Example
    Use Callbacks
    Example
    Before Running Your Application
    Use the KeyView Document Type Definition (DTD)
    Use XML Style Language Transformation (XSLT)
    Add Elements and Attributes to the DTD
    Move the DTD
Part II: Use the Export API
  Chapter 3: Use the File Extraction API
    Introduction
    Extract Subfiles
    Sanitize Absolute Paths
    Extract Images
    Recreate a File Hierarchy
    Create a Root Node
    Example
    Extract Mail Metadata
    Default Metadata Set
    Extract the Default Metadata Set
    Microsoft Outlook (MSG) Metadata
    Extract MSG-Specific Metadata
    Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
    Extract EML- or MBX-Specific Metadata
    Lotus Notes Database (NSF) Metadata
    Extract NSF-Specific Metadata
    Microsoft Personal Folders File (PST) Metadata
    MAPI Properties
    Extract PST-Specific Metadata
    Exclude Metadata from the Extracted Text File
    Extract Subfiles from Outlook Files
    Extract Subfiles from Outlook Express Files
    Extract Subfiles from Mailbox Files
    Extract Subfiles from Outlook Personal Folders Files
    Choose the Reader to use for PST Files
    MAPI Attachment Methods
    Open Secured PST Files
    Detect PST Files While the Outlook Client is Running
    Extract Subfiles from Lotus Domino XML Language Files
    Extract .DXL Files to HTML
    Extract Subfiles from Lotus Notes Database Files
    System Requirements
    Installation and Configuration
    Windows
    Solaris
    AIX 5.x
    Linux
    Open Secured NSF Files
    Format Note Subfiles
    Extract Subfiles from PDF Files
    Improve Performance for PDFs with Many Small Images
    Extract Embedded OLE Objects
    Extract Subfiles from ZIP Files
    Default File Names for Extracted Subfiles
    Default File Name for Mail Formats
    Default File Name for Embedded OLE Objects
    Exclude Japanese Guide Text
  Chapter 4: Use the XML Export API
    Extract Metadata
    Extract Metadata by Using the API
    Example
    Extract Metadata by Using a Template File
    Examples
    $SUMMARYNN
    $SUMMARY
    $USERSUMMARY
    Extract File Format Information
    Example
    Convert Character Sets
    Determine the Character Set of the Output Text
    Guidelines for Character Set Conversion
    Examples of Character Set Conversion
    Document Character Set Can be Determined
    Document Character Set Cannot be Determined
    Set the Character Set During Conversion
    Set the Character Set During File Extraction from a Container
    Map Styles
    Use the Java API
    Use a Template file
    Use Style Sheets
    Use Extensible Style Sheet Language (XSL)
    Use Cascading Style Sheets (CSS)
    Display Vector Graphics on UNIX and Linux
    Convert Revision Tracking Information
    Convert PDF Files
    Convert PDF Files to a Logical Reading Order
    Logical Reading Order and Paragraph Direction
    Enable Logical Reading Order
    Use the Java API
    Use the formats_e.ini File
    Control Hyphenation
    Extract Custom Metadata from PDF Files
    Configure the Size of Exported Images
    Convert Spreadsheet Files
    Convert Hidden Text in Microsoft Excel Files
    Convert Headers and Footers in Microsoft Excel 2003 Files
    Specify Date and Time Format on UNIX Systems
    Convert Very Large Numbers in Spreadsheet Cells to Precision Numbers
    Extract Microsoft Excel Formulas
    Convert XML Files
    Configure Element Extraction for XML Documents
    Modify Element Extraction Settings
    Use the Java API
    Modify Element Extraction Settings in the kvxconfig.ini File
    Specify an Element's Namespace and Attribute
    Add Configuration Settings for Custom XML Document Types
    Error Messages
    Show Hidden Data
    Hidden Data in Microsoft Documents
    Toggle Word Comment Settings in the formats_e.ini File
    Toggle PowerPoint Slide Note Settings in the formats_e.ini File
    Exclude Japanese Guide Text
    Source Code Identification
  Chapter 5: Sample Programs
    Introduction
    ExtractExport
    XmlTest
    XmlConvFileToFile
    Run XmlConvFileToFile on Windows
    Run XmlConvFileToFile on UNIX
    XmlConvStreamToStream
    Run XmlConvStreamToStream on Windows
    Run XmlConvStreamToStream on UNIX
    XmlParseIt
    Run XmlParseIt on Windows
    Run XmlParseIt on UNIX
Part III: Appendixes
  Appendix A: Supported Formats
    Key to Supported Formats Table
    Supported Formats
  Appendix B: Document Readers
    Key to Document Readers Table
    Document Readers
  Appendix C: Character Sets
    Multibyte and Bidirectional Support
    Coded Character Sets
  Appendix D: Extract and Format Lotus Notes Subfiles
    Overview
    Customize XML Templates
    Use Demo Templates
    Use Old Templates
    Disable XML Templates
    Template Elements and Attributes
    Conditional Elements
    Control Elements
    Data Elements
    Date and Time Formats
    Lotus Notes Date and Time Formats
    KeyView Date and Time Formats
  Appendix E: Export Tokens
  Appendix F: File Format Detection
    Introduction
    Extract Format Information
    Determine Format Support
    Refine Detection of Text Files
    Change the Amount of File Data to Read
    Change the Percentage of Allowed Non-ASCII Characters
    Use the File Extension for Detection
    Allow Consecutive NULL Bytes in a Text File
    Translate Format Information
    Distinguish Between Formats
    Determine a Document Reader
    Category Values in formats_e.ini
  Appendix G: Files Required for Redistribution
    Core Files
    Support Files
    Document Readers and Writers
    Document Type Definition Files
  Appendix H: Password Protected Files
    Supported Password Protected File Types
    Export Password Protected Files
    Open Password Protected Container Files
Send documentation feedback

Part I: Overview of XML Export

This section provides an overview of the Micro Focus IDOL KeyView Export SDK and describes how to use the Java implementation of the API.
• Introducing XML Export
• Getting Started

Chapter 1: Introducing XML Export

This guide is for developers who want to incorporate Micro Focus KeyView XML conversion technology into their applications using a Java development environment.