Integrator Acquisition System Developer's Guide
Total Page:16
File Type:pdf, Size:1020Kb
Oracle® Endeca Information Discovery Integrator Integrator Acquisition System Developer's Guide Version 3.0.0 • May 2013 Copyright and disclaimer Copyright © 2003, 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. UNIX is a registered trademark of The Open Group. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency- specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government. This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications. This software or hardware and documentation may provide access to or information on content, products and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services. Oracle® Endeca Information Discovery Integrator: Integrator Version 3.0.0 • May 2013 Acquisition System Developer's Guide Table of Contents Copyright and disclaimer ..........................................................2 Preface ..........................................................................8 About this guide ................................................................8 Who should use this guide .........................................................8 Conventions used in this guide......................................................8 Contacting Oracle Customer Support .................................................9 Part I: Introduction to IAS Chapter 1: Introduction ...........................................................11 Overview of the Integrator Acquisition System..........................................12 About the Endeca IAS Service .....................................................14 About the IAS Server ...........................................................14 About the Component Instance Manager .............................................15 About the Record Store ..........................................................15 About using SSL with IAS ........................................................16 Overview of the default IAS crawls and manipulators.....................................16 Chapter 2: Running the IAS Sample Applications.....................................17 About the sample IAS applications ..................................................17 Using the IAS Server Java client ...................................................17 IAS Server Java client sample files and directories ..................................17 About the IAS Server Java client program.........................................18 Building and running the Java client with Ant ......................................18 Opening the ias-server-java-client project in Eclipse .................................19 Running the operations of the Java client .........................................19 Using the Record Store Java client..................................................21 Record Store client sample files and directories ....................................21 About the Record Store sample client applications ..................................22 Building and running the sample writer client with Ant ................................22 Building and running the sample reader client with Ant ...............................23 Opening the recordstore-java-client project in Eclipse ................................24 Running the operations of the sample writer client...................................25 Running the operations of the sample reader client ..................................25 Part II: Crawling Data Sources Chapter 3: Creating a Crawl .......................................................28 About creating a crawl ...........................................................28 Creating a Delimited File crawl ....................................................29 Oracle® Endeca Information Discovery Integrator: Integrator Version 3.0.0 • May 2013 Acquisition System Developer's Guide Table of Contents 4 Creating a Documentum Content Server crawl .........................................32 Supported versions of Documentum Content Server .................................35 Setting up IAS for Documentum Content Server ....................................35 Limitations of a Documentum Content Server crawl..................................36 Permission mapping in a Documentum Content Server crawl...........................36 Creating a File System crawl ......................................................36 Creating a JDBC crawl ..........................................................39 Feature notes and known limitations of JDBC crawls.................................42 Determining which SharePoint crawl to use ...........................................43 Creating a SharePoint Object Model crawl ............................................43 SharePoint versions supported by a SharePoint Object Model crawl .....................46 Installing a SharePoint solution on the SharePoint server .............................47 Additional configuration notes for a SharePoint Object Model crawl ......................48 Permission mapping for SharePoint Object Model properties ..........................49 Uninstalling the SharePoint solution from the SharePoint server.........................50 Creating a SharePoint Web Services crawl ............................................50 SharePoint versions supported by a SharePoint Web Services crawl ....................53 Additional configuration notes for a SharePoint Web Services crawl......................53 Permission mapping for SharePoint Web Services properties ..........................55 About filters...................................................................55 Setting document conversion options ................................................57 Configuring document conversion filters ..............................................58 Adding a Filtering Script manipulator to a crawl.........................................60 Adding a Modifying Script manipulator to a crawl........................................61 Modifying a crawl ..............................................................62 Writing crawl output to a file .......................................................63 Chapter 4: Configuring a Record Store Instance......................................65 About record generations.........................................................65 About transactions..............................................................66 About the last read generation for a client.............................................66 About deleted records ...........................................................67 Configuring a Record Store instance ................................................68 Configuration properties for a Record Store instance.....................................69 Change properties and new Record Store instances .....................................74 Deleting stale generations of records ................................................74 Disabling automatic management of a Record Store instance ..............................74 Performance considerations when using a Record Store instance ...........................75 Chapter 5: Running a Crawl .......................................................76 Running a crawl