IDOL Keyview XML Export SDK 12.4 Java Programming Guide
Total Page:16
File Type:pdf, Size:1020Kb
KeyView Software Version 12.4 XML Export SDK Java Programming Guide Document Release Date: October 2019 Software Release Date: October 2019 XML Export SDK Java Programming Guide Legal notices Copyright notice © Copyright 2006-2019 Micro Focus or one of its affiliates. The only warranties for products and services of Micro Focus and its affiliates and licensors (“Micro Focus”) are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice. Documentation updates The title page of this document contains the following identifying information: l Software Version number, which indicates the software version. l Document Release Date, which changes each time the document is updated. l Software Release Date, which indicates the release date of this version of the software. To check for updated documentation, visit https://www.microfocus.com/support-and-services/documentation/. Support Visit the MySupport portal to access contact information and details about the products, services, and support that Micro Focus offers. This portal also provides customer self-solve capabilities. It gives you a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the MySupport portal to: l Search for knowledge documents of interest l Access product documentation l View software vulnerability alerts l Enter into discussions with other software customers l Download software patches l Manage software licenses, downloads, and support contracts l Submit and track service requests l Contact customer support l View information about all services that Support offers Many areas of the portal require you to sign in. If you need an account, you can create one when prompted to sign in. To learn about the different access levels the portal uses, see the Access Levels descriptions. KeyView (12.4) Page 2 of 255 XML Export SDK Java Programming Guide Contents Part I: Overview of XML Export 9 Chapter 1: Introducing XML Export 11 Overview 11 Features 12 Platforms, Compilers, and Dependencies 12 Supported Platforms 13 Supported Compilers 13 Software Dependencies 14 Windows Installation 14 UNIX Installation 15 Package Contents 16 License Information 17 Enable Advanced Document Readers 18 Update License Information 18 Directory Structure 19 Definition of Terms 20 Chapter 2: Getting Started 22 Architectural Overview 22 Enhance Performance 24 File Caching 24 Convert Files 24 Convert Files Out of Process 25 Configure Out-of-Process Conversions 26 Run Export Out of Process—Overview 28 Recommendations 28 Run Export Out of Process 29 Subfile Extraction 30 Convert Outlook Email without Using the Extraction API 31 Set Conversion Options 31 Set Conversion Options by Using the API 31 Set Conversion Options by Using the Template Files 32 Templates 32 Use the Export Demo Program 33 Change Input/Output Directories 34 Set Configuration Options 35 Suppress Images 35 Use PDF Position Information 35 Convert Files 36 Use the XML Export API 37 Input/Output Operations 37 Convert Files 37 KeyView (12.4) Page 3 of 255 XML Export SDK Java Programming Guide Multithreaded Conversions 40 Use Methods in the XmlExportReader Class 41 Example 42 Use Callbacks 42 Example 43 Before Running Your Application 43 Use the KeyView Document Type Definition (DTD) 43 Use XML Style Language Transformation (XSLT) 44 Add Elements and Attributes to the DTD 44 Move the DTD 44 Part II: Use the Export API 46 Chapter 3: Use the File Extraction API 48 Introduction 48 Extract Subfiles 49 Sanitize Absolute Paths 50 Extract Images 51 Recreate a File Hierarchy 51 Create a Root Node 51 Example 52 Extract Mail Metadata 53 Default Metadata Set 53 Extract the Default Metadata Set 53 Microsoft Outlook (MSG) Metadata 54 Extract MSG-Specific Metadata 55 Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata 55 Extract EML- or MBX-Specific Metadata 55 Lotus Notes Database (NSF) Metadata 56 Extract NSF-Specific Metadata 56 Microsoft Personal Folders File (PST) Metadata 56 MAPI Properties 56 Extract PST-Specific Metadata 57 Exclude Metadata from the Extracted Text File 58 Extract Subfiles from Outlook Files 58 Extract Subfiles from Outlook Express Files 58 Extract Subfiles from Mailbox Files 58 Extract Subfiles from Outlook Personal Folders Files 59 Choose the Reader to use for PST Files 59 MAPI Attachment Methods 61 Open Secured PST Files 61 Detect PST Files While the Outlook Client is Running 62 Extract Subfiles from Lotus Domino XML Language Files 62 Extract .DXL Files to HTML 63 Extract Subfiles from Lotus Notes Database Files 63 System Requirements 63 KeyView (12.4) Page 4 of 255 XML Export SDK Java Programming Guide Installation and Configuration 64 Windows 64 Solaris 64 AIX 5.x 65 Linux 65 Open Secured NSF Files 66 Format Note Subfiles 66 Extract Subfiles from PDF Files 66 Improve Performance for PDFs with Many Small Images 66 Extract Embedded OLE Objects 66 Extract Subfiles from ZIP Files 67 Default File Names for Extracted Subfiles 67 Default File Name for Mail Formats 67 Default File Name for Embedded OLE Objects 68 Exclude Japanese Guide Text 69 Chapter 4: Use the XML Export API 70 Extract Metadata 70 Extract Metadata by Using the API 70 Example 71 Extract Metadata by Using a Template File 73 Examples 73 $SUMMARYNN 73 $SUMMARY 74 $USERSUMMARY 74 Extract File Format Information 75 Example 75 Convert Character Sets 76 Determine the Character Set of the Output Text 76 Guidelines for Character Set Conversion 77 Examples of Character Set Conversion 78 Document Character Set Can be Determined 78 Document Character Set Cannot be Determined 78 Set the Character Set During Conversion 79 Set the Character Set During File Extraction from a Container 79 Map Styles 80 Use the Java API 80 Use a Template file 81 Use Style Sheets 82 Use Extensible Style Sheet Language (XSL) 83 Use Cascading Style Sheets (CSS) 83 Display Vector Graphics on UNIX and Linux 84 Convert Revision Tracking Information 85 Convert PDF Files 86 Convert PDF Files to a Logical Reading Order 86 Logical Reading Order and Paragraph Direction 87 Enable Logical Reading Order 88 KeyView (12.4) Page 5 of 255 XML Export SDK Java Programming Guide Use the Java API 88 Use the formats_e.ini File 88 Control Hyphenation 89 Extract Custom Metadata from PDF Files 90 Configure the Size of Exported Images 91 Convert Spreadsheet Files 92 Convert Hidden Text in Microsoft Excel Files 92 Convert Headers and Footers in Microsoft Excel 2003 Files 92 Specify Date and Time Format on UNIX Systems 92 Convert Very Large Numbers in Spreadsheet Cells to Precision Numbers 93 Extract Microsoft Excel Formulas 93 Convert XML Files 95 Configure Element Extraction for XML Documents 95 Modify Element Extraction Settings 96 Use the Java API 96 Use an Initialization File 96 Modify Element Extraction Settings in the kvxconfig.ini File 97 Specify an Element's Namespace and Attribute 98 Add Configuration Settings for Custom XML Document Types 99 Error Messages 99 Show Hidden Data 102 Hidden Data in Microsoft Documents 103 Toggle Word Comment Settings in the formats_e.ini File 104 Toggle PowerPoint Slide Note Settings in the formats_e.ini File 104 Exclude Japanese Guide Text 105 Source Code Identification 105 Chapter 5: Sample Programs 106 Introduction 106 ExtractExport 107 XmlTest 109 XmlConvFileToFile 111 Run XmlConvFileToFile on Windows 112 Run XmlConvFileToFile on UNIX 112 XmlConvStreamToStream 113 Run XmlConvStreamToStream on Windows 113 Run XmlConvStreamToStream on UNIX 114 XmlParseIt 114 Run XmlParseIt on Windows 115 Run XmlParseIt on UNIX 115 Part III: Appendixes 117 Appendix A: Supported Formats 119 Supported Formats 119 Archive Formats 120 Binary Format 123 KeyView (12.4) Page 6 of 255 XML Export SDK Java Programming Guide Computer-Aided Design Formats 124 Database Formats 125 Desktop Publishing 126 Display Formats 126 Graphic Formats 127 Mail Formats 131 Multimedia Formats 134 Presentation Formats 137 Spreadsheet Formats 140 Text and Markup Formats 142 Word Processing Formats 143 Appendix B: Detected Formats 149 Key to Detected Formats Table 149 Detected Formats 151 Appendix C: Character Sets 203 Multibyte and Bidirectional Support 203 Coded Character Sets 211 Appendix D: Extract and Format Lotus Notes Subfiles 217 Overview 217 Customize XML Templates 217 Use Demo Templates 218 Use Old Templates 218 Disable XML Templates 218 Template Elements and Attributes 219 Conditional Elements 219 Control Elements 220 Data Elements 221 Date and Time Formats 224 Lotus Notes Date and Time Formats 224 KeyView Date and Time Formats 225 Appendix E: Export Tokens 230 Appendix F: File Format Detection 233 Introduction 233 Extract Format Information 233 Determine Format Support 233 Refine Detection of Text Files 234 Change the Amount of File Data to Read 234 Change the Percentage of Allowed Non-ASCII Characters 235 Use the File Extension for Detection 235 Allow Consecutive NULL Bytes in a Text File 235 Translate Format Information 235 Distinguish Between Formats 236 Determine a Document Reader 237 Category Values in formats_e.ini 237 KeyView (12.4) Page 7 of 255 XML Export SDK Java Programming Guide Appendix G: Files Required for Redistribution 241 Core Files 241 Support Files 242 Document Readers and Writers 243 Document Type Definition Files 250