Loading Content Into Marklogic Server

MarkLogic Server Loading Content Into MarkLogic Server 1 MarkLogic 10 May, 2019 Last Revised: 10.0, May, 2019 Copyright © 2019 MarkLogic Corporation. All rights reserved. MarkLogic Server Table of Contents Table of Contents Loading Content Into MarkLogic Server 1.0 Designing a Content Loading Strategy ..........................................................4 1.1 Available Content Loading Interfaces ....................................................................5 1.2 Loading Activities ...................................................................................................6 1.3 What to Consider Before Loading Content ............................................................7 1.3.1 Setting Document Permissions ...................................................................7 1.3.2 Schemas ......................................................................................................7 1.3.3 Fragments ....................................................................................................8 1.3.4 Indexing ......................................................................................................8 2.0 Controlling Document Format .......................................................................9 2.1 Terminology ............................................................................................................9 2.2 Supported Document Formats ..............................................................................10 2.2.1 JSON Format ............................................................................................10 2.2.2 XML Format .............................................................................................11 2.2.3 Binary Format ...........................................................................................11 2.2.4 Text (CLOB) Format ................................................................................11 2.3 Choosing a Binary Format ....................................................................................12 2.3.1 Loading Binary Documents ......................................................................15 2.3.2 Configuring MarkLogic Server for Binary Documents ............................15 2.4 Implicitly Setting the Format Based on the MIME Type .....................................15 2.5 Explicitly Setting the Format ................................................................................16 2.6 Determining the Format of a Document ...............................................................17 3.0 Specifying Encoding and Language ............................................................18 3.1 Understanding Character Encoding ......................................................................18 3.2 Explicitly Specifying Character Encoding While Loading ..................................19 3.3 Automatically Detecting the Encoding .................................................................19 3.4 Inferring the Language and Encoding of a Node in XQuery with xdmp:encoding- language-detect 20 3.5 Specifying the Default Language for XML Documents .......................................21 4.0 Loading Content Using XQuery ..................................................................22 4.1 Built-In Document Loading Functions .................................................................22 4.2 Specifying a Forest in Which to Load a Document ..............................................23 4.2.1 Consider If You Really Want to Specify a Forest ....................................23 4.2.2 Some Potential Advantages of Specifying a Forest ..................................24 4.2.3 Example: Examining a Document to Decide Which Forest to Specify ....24 MarkLogic 10—May, 2019 Loading Content Into MarkLogic Server—Page 1 MarkLogic Server Table of Contents 4.2.4 More Examples .........................................................................................25 4.3 Creating External Binary References Using XQuery ...........................................26 5.0 Loading Content Using REST, Java or Node.js ...........................................27 6.0 Loading Content Using MarkLogic Content Pump ....................................28 7.0 Loading Content Using WebDAV ...............................................................29 8.0 Repairing XML Content During Loading ....................................................30 8.1 Programming Interfaces and Supported Content Repair Capabilities ..................31 8.2 Enabling Content Repair .......................................................................................31 8.3 General-Purpose Tag Repair .................................................................................32 8.3.1 How General-Purpose Tag Repair Works ................................................32 8.3.2 Pitfalls of General-Purpose Tag Repair ....................................................33 8.3.3 Limitations ................................................................................................34 8.3.3.1 XQuery Functions .....................................................................34 8.3.3.2 Root Element .............................................................................34 8.3.3.3 Previous Marklogic Versions ....................................................34 8.3.4 Controlling General-Purpose Tag Repair .................................................34 8.4 Auto-Close Repair of Empty Tags ........................................................................35 8.4.1 What Empty Tag Auto-Close Repair Does ...............................................35 8.4.2 Defining a Schema to Support Empty Tag Auto-Close Repair ................36 8.4.3 Invoking Empty Tag Auto-Close Repair ..................................................37 8.4.4 Scope of Application ................................................................................39 8.4.5 Disabling Empty Tag Auto-Close .............................................................40 8.5 Schema-Driven Tag Repair ..................................................................................40 8.5.1 What Schema-Driven Tag Repair Does ....................................................40 8.5.2 How to Invoke Schema-Driven Tag Repair .............................................42 8.5.3 Scope of Application ................................................................................43 8.5.4 Disabling Schema-Driven Tag Repair ......................................................43 8.6 Load-Time Default Namespace Assignment ........................................................44 8.6.1 How Default Namespace Assignments Work ..........................................44 8.6.2 Scope of Application ................................................................................45 8.7 Load-Time Namespace Prefix Binding ................................................................45 8.7.1 How Load-Time Namespace Prefix Binding Works ................................46 8.7.2 Interaction with Load-Time Default Namespace Assignment .................47 8.7.3 Scope of Application ................................................................................49 8.7.4 Disabling Load-Time Namespace Prefix Binding ....................................49 8.8 Query-Driven Content Repair ...............................................................................49 8.8.1 Point Repair ..............................................................................................50 8.8.2 Document Walkers ...................................................................................50 9.0 Modifying Content During Loading ............................................................53 MarkLogic 10—May, 2019 Loading Content Into MarkLogic Server—Page 2 MarkLogic Server Table of Contents 9.1 Converting Microsoft Office and Adobe PDF Into XML ....................................53 9.2 Converting to XHTML .........................................................................................54 9.3 Automating Metadata Extraction ..........................................................................54 9.4 Transforming XML Structures .............................................................................54 10.0 Performance Considerations ........................................................................55 10.1 Understanding the Locking and Journaling Database Settings for Bulk Loads ...55 10.2 Fragmentation .......................................................................................................56 11.0 Technical Support ........................................................................................57 12.0 Copyright .....................................................................................................59 MarkLogic 10—May, 2019 Loading Content Into MarkLogic Server—Page 3 MarkLogic Server Designing a Content Loading Strategy 1.0 Designing a Content Loading Strategy 8 MarkLogic Server provides many ways to load content into a database including built-in XQuery functions, the REST Client API, and the command-line tool, MarkLogic Content Pump (mlcp). Choosing the appropriate method for a specific use case depends on many factors, including the characteristics of your content, the source of the content, the frequency of loading, and whether the content needs to be repaired or modified during loading. In addition, environmental and operational factors such as workflow integration, development resources, performance

Load more