Content Acquisition System Developer's Guide Version 11.1 • July 2014
Oracle Commerce Content Acquisition System Developer's Guide Version 11.1 • July 2014

Contents

Preface
    About this guide
    Who should use this guide
    Conventions used in this guide
    Contacting Oracle Support

Part I: Introduction to CAS and Crawling Data Sources

Chapter 1: Introduction
    Overview of the Endeca Content Acquisition System
    About the Endeca CAS Service
    About the CAS Server
    About the Component Instance Manager
    About the Record Store
    About the Dimension Value Id Manager
    Overview of the default CAS data sources and manipulators
    Security requirements

Chapter 2: Creating a crawl
    About creating a crawl
    About filters
    About CAS output types and the Deployment Template
    Creating a crawl using the CAS Server Command-line Utility
    Creating a crawl using CAS Console
    Creating a crawl using the CAS Server API
    Setting document conversion options
    Configuring document conversion filters
    Modifying a crawl using the CAS Server Command-line Utility

Chapter 3: Configuring a Record Store instance
    About record generations
    About transactions
    About the last read generation for a client
    About deleted records
    Configuring a Record Store instance
    Configuration properties for a Record Store instance
    Change properties and new Record Store instances
    Deleting stale generations of records
    Disabling automatic management of a Record Store instance
    Performance considerations when using a Record Store instance

Chapter 4: Running a crawl
    Running a crawl
    Order of execution in a crawl configuration
    Full and incremental crawling modes
    Crawls and archive files
    About writing records to a Record Store instance
    About the record output file

Chapter 5: Running the CAS sample applications
    About the sample CAS applications

Part II: Using CAS data sources

Chapter 6: Using the Delimited File data source
    Configuration properties for the Delimited File data source

Chapter 7: Using the Endeca Record File data source
    Configuration properties for the Endeca Record File data source

Chapter 8: Using the File System data source
    Configuration properties for the File System data source

Chapter 9: Using the JDBC data source
    Installing a JDBC driver into CAS
    Configuration properties for the JDBC data source
    Feature notes and known limitations of the JDBC data source

Part III: Loading data into an MDEX Engine

Chapter 10: Creating a Forge pipeline to read from or write to a Record Store
    Overview of a Forge pipeline
    Creating a Forge pipeline

Chapter 11: Creating a CAS crawl to write MDEX-compatible output
    Overview of a CAS crawl that produces MDEX-compatible output
    Loading dimensions, properties, and precedence rules
    Loading dimension value records into a Record Store instance
    Loading data records into a Record Store instance
    About configuring application features in a CAS-based application
    Creating a crawl to write MDEX-compatible output

Part IV: CAS Command Line Utilities

Chapter 12: CAS Server Command-line Utility
    Overview of the CAS Server Command-line Utility