IDOL Connector Framework Server 11.0 Administration Guide
Total Page:16
File Type:pdf, Size:1020Kb
HPE Connector Framework Server Software Version: 11.0 Connector Framework Server Administration Guide Document Release Date: March 2016 Software Release Date: March 2016 Connector Framework Server Administration Guide Legal Notices Warranty The only warranties for Hewlett Packard Enterprise Development LP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HPE shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice. Restricted Rights Legend Confidential computer software. Valid license from HPE required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. Copyright Notice © Copyright 2016 Hewlett Packard Enterprise Development LP Trademark Notices Adobe™ is a trademark of Adobe Systems Incorporated. Microsoft® and Windows® are U.S. registered trademarks of Microsoft Corporation. UNIX® is a registered trademark of The Open Group. This product includes an interface of the 'zlib' general purpose compression library, which is Copyright © 1995-2002 Jean-loup Gailly and Mark Adler. Documentation Updates HPE Big Data Support provides prompt and accurate support to help you quickly and effectively resolve any issue you may encounter while using HPE Big Data products. Support services include access to the Customer Support Site (CSS) for online answers, expertise-based service by HPE Big Data support engineers, and software maintenance to ensure you have the most up-to-date technology. To access the Customer Support Site l go to https://customers.autonomy.com The Customer Support Site includes: l Knowledge Base. An extensive library of end user documentation, FAQs, and technical articles that is easy to navigate and search. l Support Cases. A central location to create, monitor, and manage all your cases that are open with technical support. l Downloads. A location to download or request products and product updates. l Requests. A place to request products to download or product licenses. To contact HPE Big Data Customer Support by email or phone l go to http://www.autonomy.com/work/services/customer-support Support The title page of this document contains the following identifying information: HPE Connector Framework Server (11.0) Page 2 of 159 Connector Framework Server Administration Guide l Software Version number, which indicates the software version. l Document Release Date, which changes each time the document is updated. l Software Release Date, which indicates the release date of this version of the software. To check for recent updates or to verify that you are using the most recent edition of a document, visit the Knowledge Base on the HPE Big Data Customer Support Site. To do so, go to https://customers.autonomy.com, and then click Knowledge Base. The Knowledge Base contains documents in PDF and HTML format as well as collections of related documents in ZIP packages. You can view PDF and HTML documents online or download ZIP packages and open PDF documents to your computer. About this PDF Version of Online Help This document is a PDF version of the online help. This PDF file is provided so you can easily print multiple topics from the help information or read the online help in PDF format. Because this content was originally created to be viewed as online help in a web browser, some topics may not be formatted properly. Some interactive topics may not be present in this PDF version. Those topics can be successfully printed from within the online help. HPE Connector Framework Server (11.0) Page 3 of 159 Connector Framework Server Administration Guide Contents Chapter 1: Introduction 9 Connector Framework Server 9 Filter Documents and Extract Subfiles 9 Manipulate and Enrich Documents 10 Index Documents 10 Import Process 11 Display Online Help 11 OEM Certification 12 Related Documentation 12 HPE's IDOL Platform 12 System Architecture 13 Chapter 2: Configure Connector Framework Server 15 Connector Framework Server Configuration File 15 Modify Configuration Parameter Values 16 Configure Connector Framework Server 17 Include an External Configuration File 18 Include the Whole External Configuration File 19 Include Sections of an External Configuration File 19 Include a Parameter from an External Configuration File 19 Merge a Section from an External Configuration File 20 Customize Logging 21 Encrypt Passwords 22 Create a Key File 22 Encrypt a Password 22 Decrypt a Password 24 Example Configuration File 24 Chapter 3: Start and Stop Connector Framework Server 27 Start Connector Framework Server 27 Stop Connector Framework Server 27 Chapter 4: Send Actions to Connector Framework Server 29 Send Actions to Connector Framework Server 29 Asynchronous Actions 29 HPE Connector Framework Server (11.0) Page 4 of 159 Connector Framework Server Administration Guide Check the Status of an Asynchronous Action 30 Cancel an Asynchronous Action that is Queued 30 Stop an Asynchronous Action that is Running 30 Store Action Queues in an External Database 31 Prerequisites 31 Configure Connector Framework Server 31 Use XSL Templates to Transform Action Responses 33 Example XSL Templates 34 Chapter 5: Ingest Data 35 Ingest Data using Connectors 35 Ingest an IDX File 35 Ingest XML 36 Ingest PST Files 38 Ingest Password-Protected Files 38 Ingest Data for Testing 40 Chapter 6: Filter Documents and Extract Subfiles 41 Customize KeyView Filtering 41 Disable Filtering or Extraction for Specific Documents 41 Chapter 7: Manipulate and Enrich Documents 43 Introduction 43 Choose When to Run a Task 44 Create Import and Index Tasks 46 Document Fields for Import Tasks 47 Write and Run Lua Scripts 48 Write a Lua Script 49 Run a Lua Script 49 Debug a Lua Script 49 Lua Scripts Included With CFS 53 Use Named Parameters 54 Enable or Disable Lua Scripts During Testing 55 Example Lua Scripts 55 Add a Field to a Document 55 Count Sections 56 Merge Document Fields 56 Add Titles to Documents 57 Analyze Images 58 Run Image Analysis on All Supported Images 58 HPE Connector Framework Server (11.0) Page 5 of 159 Connector Framework Server Administration Guide Run Image Analysis on Specific Documents 59 Analyze Speech 60 Run Analysis on All Audio and Video Files 61 Run Analysis on Specific Documents 61 Use Multiple Speech Servers 62 Language Identification 63 Transcode Audio 63 Speech-To-Text Results 64 Analyze Video 65 Create a Media Server Configuration 65 Run Analysis on All Supported Video Files 67 Run Analysis on Specific Documents 68 Troubleshoot Video Analysis 70 Categorize Documents 71 Customize the Query 72 Customize the Output 73 Run Eduction 74 Redact Documents 75 Lua Post Processing 75 Extract Content from HTML 76 Extract Metadata from Files 77 Import Content Into a Document 78 Reject Invalid Documents 78 Reject Documents with Binary Content 78 Reject Documents with Import Errors 79 Reject Documents with Symbolic Content 79 Reject Documents by Word Length 80 Reject All Invalid Documents 80 Split Document Content into Sections 81 Split Data into Multiple Documents 81 Write Documents to Disk 82 Write Documents to Disk in IDX Format 82 Write Documents to Disk in XML Format 82 Write Documents to Disk in JSON Format 83 Write Documents to Disk in CSV Format 83 Write Documents to Disk as SQL INSERT Statements 84 Standardize Document Fields 85 Normalize E-mail Addresses 85 Chapter 8: Index Documents 87 Introduction 87 HPE Connector Framework Server (11.0) Page 6 of 159 Connector Framework Server Administration Guide Configure the Batch Size and Time Interval 88 Index Documents into an IDOL Server 88 Index Documents into Haven OnDemand 89 Index Documents into Vertica 90 Prepare the Vertica Database 91 Configure CFS to Index into Vertica 92 Troubleshooting 93 Index Documents into MetaStore 93 Document Fields for Indexing 94 AUTN_INDEXPRIORITY 94 Manipulate Documents Before Indexing 94 Set Up Document Tracking 95 Appendix A: KeyView Supported Formats 97 Supported Formats 97 Archive Formats 99 Binary Format 101 Computer-Aided Design Formats 101 Database Formats 103 Desktop Publishing 103 Display Formats 104 Graphic Formats 104 Mail Formats 107 Multimedia Formats 110 Presentation Formats 111 Spreadsheet Formats 113 Text and Markup Formats 115 Word Processing Formats 116 Supported Formats (Detected) 121 Appendix B: KeyView Format Codes 126 KeyView Classes 126 KeyView Formats 127 Appendix C: Document Fields 152 Document Fields 152 AUTN_IDENTIFIER 153 Sub File Indexes 154 Append Sub File Indexes to the Document Identifier 155 HPE Connector Framework Server (11.0) Page 7 of 159 Connector Framework Server Administration Guide Glossary 156 Send Documentation Feedback 159 HPE Connector Framework Server (11.0) Page 8 of 159 Chapter 1: Introduction This section provides an overview of Connector Framework Server. • Connector Framework Server 9 • Related Documentation 12 • HPE's IDOL Platform 12 • System Architecture 13 Connector Framework Server Connector Framework Server (CFS) processes the information that is retrieved by connectors, and then indexes the information into one or more indexes, such as IDOL Server or Haven OnDemand. When connectors send documents to CFS, the documents contain only metadata extracted from the repository, such as the location of a file or record that the connector has retrieved. CFS extracts the file- specific metadata and file content from the file and adds it to the document. This allows IDOL to search and extract meaning from the information contained in the repository, without needing to process the information in its native format. CFS also provides features to manipulate and enrich documents before they are indexed. CFS includes customizable import tasks that you can run, and supports the Lua scripting language so that you can write your own tasks and develop custom processing rules.