Content Processing Framework Guide (PDF)
Total Page:16
File Type:pdf, Size:1020Kb
MarkLogic Server Content Processing Framework Guide 2 MarkLogic 9 May, 2017 Last Revised: 9.0-7, September 2018 Copyright © 2019 MarkLogic Corporation. All rights reserved. MarkLogic Server Version MarkLogic 9—May, 2017 Page 2—Content Processing Framework Guide MarkLogic Server Table of Contents Table of Contents Content Processing Framework Guide 1.0 Overview of the Content Processing Framework ..........................................7 1.1 Making Content More Useful .................................................................................7 1.1.1 Getting Your Content Into XML Format ....................................................7 1.1.2 Striving For Clean, Well-Structured XML .................................................8 1.1.3 Enriching Content With Semantic Tagging, Metadata, etc. .......................8 1.2 Access Internal and External Web Services ...........................................................8 1.3 Components of the Content Processing Framework ...............................................9 1.3.1 Domains ......................................................................................................9 1.3.2 Pipelines ......................................................................................................9 1.3.3 XQuery Functions and Modules .................................................................9 1.3.4 Pre-Commit and Post-Commit Triggers ...................................................10 1.3.5 Creating Custom Applications With the Content Processing Framework 11 1.4 Default Conversion Option ...................................................................................11 1.4.1 Microsoft Office and Adobe PDF Conversion .........................................11 1.4.2 HTML Conversion and Enrichment .........................................................11 2.0 Getting Started with a Simple CPF Application ..........................................13 2.1 Overview of Example CPF Application ...............................................................14 2.2 Create the Action Modules ...................................................................................15 2.2.1 add-copyright.xqy .....................................................................................15 2.2.2 add-last-updated.xqy .................................................................................16 2.3 Insert the Action Modules into the Modules Database .........................................16 2.4 Create the Pipeline ................................................................................................17 2.5 Configure a Database for Content Processing ......................................................19 2.6 Insert and Update a Document in the Database ....................................................22 2.7 View the Properties Document .............................................................................23 2.8 Extend the CPF Application .................................................................................24 3.0 Understanding and Using Domains .............................................................29 3.1 Overview of Domains ...........................................................................................29 3.2 Domain Scope and Code Evaluation Context .......................................................29 3.2.1 Domain Scope ...........................................................................................30 3.2.2 Evaluation Context ...................................................................................30 3.2.3 Domain Scope Can Encapsulate Processing Logic ..................................31 3.3 Rules for Domains ................................................................................................31 3.3.1 Do Not Overlap Domains .........................................................................31 3.3.2 Collection Domain Scope Notes ...............................................................32 3.4 Creating and Modifying Domains ........................................................................32 MarkLogic 9—May, 2017 Content Processing Framework Guide—Page 3 MarkLogic Server Version MarkLogic 9—May, 2017 Table of Contents 3.5 Attaching and Detaching Pipelines to Domains ...................................................33 4.0 Understanding and Using Pipelines .............................................................35 4.1 Pipeline Architecture ............................................................................................35 4.1.1 Automatic Status (Event) Handling ..........................................................35 4.1.2 Transitioning Between States ...................................................................36 4.1.3 Pipelines Can Flow Through Other Pipelines ...........................................37 4.2 Viewing Pipelines in the Admin Interface ............................................................38 4.3 Loading Pipelines With the Admin Interface .......................................................39 4.4 XML Format of a Pipeline ....................................................................................40 4.4.1 Sample Pipeline XML Document .............................................................41 4.4.2 Success-Action and Failure-Action ..........................................................42 4.4.3 Status Transitions ......................................................................................42 4.4.3.1 Status .........................................................................................43 4.4.3.2 On Success and On Failure .......................................................43 4.4.3.3 Default Action ...........................................................................43 4.4.3.4 Priority .......................................................................................43 4.4.3.5 Execute ......................................................................................44 4.4.4 State Transitions .......................................................................................44 4.4.4.1 State ...........................................................................................44 4.4.4.2 On Success and On Failure .......................................................45 4.4.4.3 Default Action ...........................................................................45 4.4.4.4 Priority .......................................................................................45 4.4.4.5 Execute ......................................................................................46 4.5 XQuery Functions to Manage Pipelines ...............................................................46 5.0 Developing Modules to Process Content .....................................................47 5.1 Overview of Modules ...........................................................................................47 5.2 Loading Modules Into the Database .....................................................................47 5.3 External Variables Available to Modules .............................................................48 5.4 Design Patterns and Rules ....................................................................................49 5.4.1 Condition Modules Must Return a Boolean .............................................49 5.4.2 Condition Modules Should Not Update Documents ................................49 5.4.3 Action Modules Use try/catch With cpf:success and cpf:failure ..............49 5.4.4 Action Modules Operate On a Single Document .....................................50 5.4.5 Action Modules Must Be a Single Transaction ........................................51 5.4.6 Using XSLT Stylesheets Instead of Action Modules ...............................51 6.0 Using the Framework to Create Custom Applications ................................53 6.1 Install Content Processing Framework in Database .............................................53 6.2 Decide on the Stages and Logic for Your Processing ..........................................54 6.3 Create and Load Your Pipelines ...........................................................................55 6.4 Create and Load Your Modules ............................................................................55 6.5 Design Patterns For Content Applications ............................................................55 6.6 Microsoft Office 2007 and Later Documents .......................................................56 Page 4—Content Processing Framework Guide MarkLogic Server Table of Contents 6.7 Other CPF Applications Included With MarkLogic Server .................................56 7.0 Security Considerations With Content Processing ......................................57 7.1 Security Requirements for Users Who Create or Modify Documents .................57 7.2 Security Requirements When Modules Perform Privileged Operations ..............57 7.3 Security Roles for Managing Content Processing ................................................58 8.0 Debugging and Recovering from Error Conditions .....................................59 8.1 Database Online Events ........................................................................................59 8.2 Disabling Content Processing Triggers ................................................................59 8.3 Content Processing Framework Trace Events ......................................................61 8.3.1 List of Trace Events ..................................................................................61