IDOL Keyview XML Export SDK 11.5 C Programming Guide
Total Page:16
File Type:pdf, Size:1020Kb
KeyView Software Version: 11.5 XML Export SDK C Programming Guide Document Release Date: October 2017 Software Release Date: October 2017 XML Export SDK C Programming Guide Legal notices Warranty The only warranties for Hewlett Packard Enterprise Development LP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HPE shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice. Restricted rights legend Confidential computer software. Valid license from HPE required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. Copyright notice © Copyright 2006-2017 Hewlett Packard Enterprise Development LP Trademark notices Adobe™ is a trademark of Adobe Systems Incorporated. Microsoft® and Windows® are U.S. registered trademarks of Microsoft Corporation. UNIX® is a registered trademark of The Open Group. This product includes an interface of the 'zlib' general purpose compression library, which is Copyright © 1995-2002 Jean-loup Gailly and Mark Adler. Documentation updates The title page of this document contains the following identifying information: l Software Version number, which indicates the software version. l Document Release Date, which changes each time the document is updated. l Software Release Date, which indicates the release date of this version of the software. To check for recent software updates, go to https://downloads.autonomy.com/productDownloads.jsp. To verify that you are using the most recent edition of a document, go to https://softwaresupport.hpe.com/group/softwaresupport/search-result?doctype=online help. This site requires that you register for an HPE Passport and sign in. To register for an HPE Passport ID, go to https://hpp12.passport.hpe.com/hppcf/login.do. You will also receive updated or new editions if you subscribe to the appropriate product support service. Contact your HPE sales representative for details. Support Visit the HPE Software Support Online web site at https://softwaresupport.hpe.com. This web site provides contact information and details about the products, services, and support that HPE Software offers. KeyView (11.5) Page 2 of 339 XML Export SDK C Programming Guide HPE Software online support provides customer self-solve capabilities. It provides a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the support web site to: l Search for knowledge documents of interest l Submit and track support cases and enhancement requests l Access product documentation l Manage support contracts l Look up HPE support contacts l Review information about available services l Enter into discussions with other software customers l Research and register for software training Most of the support areas require that you register as an HPE Passport user and sign in. Many also require a support contract. To register for an HPE Passport ID, go to https://hpp12.passport.hpe.com/hppcf/login.do. To find more information about access levels, go to https://softwaresupport.hpe.com/web/softwaresupport/access-levels. To check for recent software updates, go to https://downloads.autonomy.com/productDownloads.jsp. KeyView (11.5) Page 3 of 339 XML Export SDK C Programming Guide Contents Part I: Overview of XML Export 13 Chapter 1: Introducing XML Export 14 Overview 14 Features 15 Platforms, Compilers, and Dependencies 15 Supported Platforms 15 Supported Compilers 16 C++ Filter SDK 17 Software Dependencies 17 Windows Installation 17 UNIX Installation 18 Package Contents 19 License Information 20 Enable Advanced Document Readers 20 Update License Information 20 Directory Structure 21 Definition of Terms 23 Chapter 2: Getting Started 24 Architectural Overview 24 Memory Abstraction 25 Enhance Performance 26 File Caching 26 Convert Files Out of Process 26 Configure Out-of-Process Conversions 27 Run Export Out of Process—Overview 29 Recommendations 29 Run Export Out of Process in the C API 30 Example—KVXMLStartOOPSession 31 Example—KVXMLEndOOPSession 32 Convert Files 33 Subfile Extraction 33 Convert Outlook Email without Using the Extraction API 34 Set Conversion Options 34 Set Conversion Options by Using the API 34 Set Conversion Options by Using the Template Files 34 Templates 35 Use the Export Demo Program 36 Change Input/Output Directories 37 Set Configuration Options 38 Suppress Images 38 Use PDF Position Information 38 KeyView (11.5) Page 4 of 339 XML Export SDK C Programming Guide Convert Files 39 Use the C-Language Implementation of the API 39 Input/Output Operations 40 Convert Files 40 Multithreaded Conversions 42 Use the Verity Document Type Definition (DTD) 42 Use XML Style Language Transformation (XSLT) 43 Add Elements and Attributes to the DTD 43 Move the DTD 43 Part II: Use the Export API 44 Chapter 3: Use the File Extraction API 45 Introduction 45 Extract Subfiles 46 Extract Images 47 Recreate a File’s Hierarchy 47 Create a Root Node 47 Recreate a File’s Hierarchy—Example 48 Extract Mail Metadata 48 Default Metadata Set 49 Extract the Default Metadata Set 49 Microsoft Outlook (MSG) Metadata 50 Extract MSG-Specific Metadata 51 Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata 52 Extract EML- or MBX-Specific Metadata 52 Lotus Notes Database (NSF) Metadata 52 Extract NSF-Specific Metadata 53 Microsoft Personal Folders File (PST) Metadata 53 MAPI Properties 53 Extract PST-Specific Metadata 54 Exclude Metadata from the Extracted Text File 55 Extract Subfiles from Outlook Files 55 Extract Subfiles from Outlook Express Files 55 Extract Subfiles from Mailbox Files 56 Extract Subfiles from Outlook Personal Folders Files 56 Use the Native or MAPI-based Reader 56 Use the Native PST Reader (pstnsr) 57 Use the MAPI Reader (pstsr) 57 System Requirements 58 MAPI Attachment Methods 58 Open Secured PST Files 59 Detect PST Files While the Outlook Client is Running 59 Extract Subfiles from Lotus Domino XML Language Files 59 Extract .DXL Files to HTML 60 Extract Subfiles from Lotus Notes Database Files 60 KeyView (11.5) Page 5 of 339 XML Export SDK C Programming Guide System Requirements 61 Installation and Configuration 61 Windows 61 Solaris 62 AIX 5.x 62 Linux 63 Open Secured NSF Files 63 Format Note Subfiles 63 Extract Subfiles from PDF Files 63 Improve Performance for PDFs with Many Small Images 63 Extract Embedded OLE Objects 64 Extract Subfiles from ZIP Files 64 Default File Names for Extracted Subfiles 64 Default File Name for Mail Formats 65 Default File Name for Embedded OLE Objects 66 Chapter 4: Use the XML Export API 67 Extract Metadata 67 Extract Metadata by Using the API 67 Use the C API 68 Extract Metadata by Using a Template File 68 Examples 69 $SUMMARYNN 69 $SUMMARY 69 $USERSUMMARY 70 Extract File Format Information 70 Use the C API 70 Convert Character Sets 70 Determine the Character Set of the Output Text 70 Guidelines for Character Set Conversion 71 Examples of Character Set Conversion 72 Document Character Set Can be Determined 72 Document Character Set Cannot be Determined 73 Set the Character Set During Conversion 73 Set the Character Set During File Extraction from a Container 74 Map Styles 74 Use the C API 75 Use a Template file 75 Use Style Sheets 77 Use Extensible Style Sheet Language (XSL) 77 Use Cascading Style Sheets (CSS) 78 Display Vector Graphics on UNIX and Linux 78 Convert Revision Tracking Information 79 Convert PDF Files 80 Use the pdf2sr Reader 80 Convert PDF Files to a Logical Reading Order 81 Logical Reading Order and Paragraph Direction 81 KeyView (11.5) Page 6 of 339 XML Export SDK C Programming Guide Enable Logical Reading Order 82 Use the C API 82 Use the formats_e.ini File 83 Control Hyphenation 84 Extract Custom Metadata from PDF Files 84 Configure the Size of Exported Images 85 Convert Spreadsheet Files 86 Convert Hidden Text in Microsoft Excel Files 86 Convert Headers and Footers in Microsoft Excel 2003 Files 86 Specify Date and Time Format on UNIX Systems 86 Convert Very Large Numbers in Spreadsheet Cells to Precision Numbers 87 Extract Microsoft Excel Formulas 87 Convert XML Files 89 Configure Element Extraction for XML Documents 89 Modify Element Extraction Settings 90 Use the C API 90 Use an Initialization File 91 Modify Element Extraction Settings in the kvxconfig.ini File 91 Specify an Element’s Namespace and Attribute 93 Add Configuration Settings for Custom XML Document Types 93 Show Hidden Data 94 Hidden Data in Microsoft Documents 94 Toggle Word Comment Settings in the formats_e.ini File 95 Toggle PowerPoint Slide Note Settings in the formats_e.ini File 96 Exclude Japanese Guide Text 96 Chapter 5: Sample Programs 97 Introduction 97 C Sample Programs 97 Compile the Visual Basic Sample Program 98 tstxtract 98 cnv2xml 99 cnv2xmloop 100 metadata 101 xmlindex 101 xmlini 101 Use Style Sheets with xmlini 102 xmlcallback 103 xmlonefile 103 xmlmulti 103 Export Demo 104 Part III: C API Reference 105 Chapter 6: File Extraction API Functions 106 KVGetExtractInterface() 106 fpCloseFile() 107 KeyView (11.5) Page 7 of 339 XML Export SDK C Programming Guide fpExtractSubFile() 107 fpFreeStruct() 109 fpGetMainFileInfo() 110 fpGetSubFileInfo() 111 fpGetSubFileMetaData()