Concepts Guide


MarkLogic Server
Concepts Guide
MarkLogic 10
May, 2019
Last Revised: 10.0, May, 2019
Copyright © 2019 MarkLogic Corporation. All rights reserved.

Table of Contents

1.0 Overview of MarkLogic Server
  1.1 Relational Data Model vs. Document Data Model
  1.2 XML Schemas
  1.3 High Performance Transactional Database
  1.4 Rich Search Features
  1.5 Text and Structure Indexes
  1.6 Semantics Support
  1.7 Binary Document Support
  1.8 MarkLogic APIs and Communication Protocols
  1.9 High Performance
  1.10 Clustered
  1.11 Cloud Capable
2.0 How is MarkLogic Server Used?
  2.1 Publishing/Media Industry
  2.2 Government / Public Sector
  2.3 Financial Services Industry
  2.4 Healthcare Industry
  2.5 Other Industries
3.0 Indexing in MarkLogic
  3.1 The Universal Index
    3.1.1 Word Indexing
    3.1.2 Phrase Indexing
    3.1.3 Relationship Indexing
    3.1.4 Value Indexing
    3.1.5 Word and Phrase Indexing
  3.2 Other Types of Indexes
    3.2.1 Range Indexing
      3.2.1.1 Range Queries
      3.2.1.2 Extracting Values
      3.2.1.3 Optimized "Order By"
      3.2.1.4 Using Range Indexes for Joins
    3.2.2 Word Lexicons
    3.2.3 Reverse Indexing
      3.2.3.1 Reverse Query Constructor
      3.2.3.2 Reverse Query Use Cases
      3.2.3.3 A Reverse Query Carpool Match
      3.2.3.4 The Reverse Index
      3.2.3.5 Range Queries in Reverse Indexes
    3.2.4 Triple Index
      3.2.4.1 Triple Index Basics
      3.2.4.2 Triple Data and Value Caches
      3.2.4.3 Triple Values and Type Information
      3.2.4.4 Triple Positions
      3.2.4.5 Index Files
      3.2.4.6 Permutations
  3.3 Index Size
  3.4 Fields
  3.5 Reindexing
  3.6 Relevance
  3.7 Indexing Document Metadata
    3.7.1 Collection Indexes
    3.7.2 Directory Indexes
    3.7.3 Security Indexes
    3.7.4 Properties Indexes
  3.8 Fragmentation of XML Documents
4.0 Data Management
  4.1 What's on Disk
    4.1.1 Databases, Forests, and Stands
    4.1.2 Tiered Storage
    4.1.3 Super Databases and Super Clusters
    4.1.4 Partitions, Partition Keys, and Partition Ranges
  4.2 Ingesting Data
  4.3 Modifying Data
  4.4 Multi-Version Concurrency Control
  4.5 Point-in-time Queries
  4.6 Locking
  4.7 Updates
  4.8 Isolating an update
  4.9 Documents are Like Rows
  4.10 MarkLogic Data Loading Mechanisms
  4.11 Content Processing Framework (CPF)
  4.12 Organizing Documents
    4.12.1 Directories
    4.12.2 Collections
    4.12.3 Unprotected Collections
    4.12.4 Protected Collections
  4.13 Database Rebalancing
  4.14 Bitemporal Documents
    4.14.1 Bitemporal Data Management
    4.14.2 Bitemporal Queries
  4.15 Managing Semantic Triples
5.0 Searching in MarkLogic Server
  5.1 High Performance Full Text Search
Recommended publications
  • Semantics Developer's Guide
    MarkLogic Server Semantic Graph Developer’s Guide
    MarkLogic 10, May, 2019. Last Revised: 10.0-8, October, 2021. Copyright © 2021 MarkLogic Corporation. All rights reserved.
    Table of Contents
    1.0 Introduction to Semantic Graphs in MarkLogic
      1.1 Terminology
      1.2 Linked Open Data
      1.3 RDF Implementation in MarkLogic
        1.3.1 Using RDF in MarkLogic
          1.3.1.1 Storing RDF Triples in MarkLogic
          1.3.1.2 Querying Triples
        1.3.2 RDF Data Model
        1.3.3 Blank Node Identifiers
        1.3.4 RDF Datatypes
        1.3.5 IRIs and Prefixes
          1.3.5.1 IRIs
  • Introduction How Marklogic Works?
    MarkLogic Database – Only Enterprise NoSQL DB
    Sanket V. Patel, Aashi Rastogi, Department of Computer Science, University of Bridgeport, Bridgeport, CT
    Abstract
    MarkLogic DB is an enterprise NoSQL database that supports a multi-model database design. It is optimized for structured and unstructured data: it can store, manage, query, and search across JSON, XML, and RDF (triple store), it can handle schema-free data, and it leads to faster time-to-results by handling different types of data. It provides ACID transactions using MVCC (multi-version concurrency control). A key feature of MarkLogic is its bitemporal behavior, which provides data at every point in time. Because of its shared-nothing architecture, it is highly available and easily and massively scalable, with no single point of failure, which makes structured data integration easier. It also supports incremental backup, which backs up only the updated data. MarkLogic provides Hadoop integration, and Hadoop is designed to store large amounts of data in the Hadoop Distributed File System (HDFS) and works better with transactional applications.
    Introduction
    NoSQL means non-SQL or non-relational databases, which provide a mechanism to store and retrieve data other than that of relational databases. NoSQL databases are in use nowadays because of their simplicity of design, ease of scaling out, and control over availability.
    Implementation
    Types of NoSQL databases: key-value based, column oriented, graph oriented, document based, multi-model.
    A multi-model database is designed to support multiple data models against a single application. MarkLogic DB is one of the NoSQL databases that uses a multi-model database design.
  • Content Processing Framework Guide (PDF)
    MarkLogic Server Content Processing Framework Guide
    MarkLogic 9, May, 2017. Last Revised: 9.0-7, September 2018. Copyright © 2019 MarkLogic Corporation. All rights reserved.
    Table of Contents
    1.0 Overview of the Content Processing Framework
      1.1 Making Content More Useful
        1.1.1 Getting Your Content Into XML Format
        1.1.2 Striving For Clean, Well-Structured XML
        1.1.3 Enriching Content With Semantic Tagging, Metadata, etc.
      1.2 Access Internal and External Web Services
      1.3 Components of the Content Processing Framework
        1.3.1 Domains
        1.3.2 Pipelines
        1.3.3 XQuery Functions and Modules
        1.3.4 Pre-Commit and Post-Commit Triggers
        1.3.5 Creating Custom Applications With the Content Processing Framework
      1.4
  • Access Control Models for XML
    Access Control Models for XML
    Abdessamad Imine, Lorraine University & INRIA-LORIA Grand-Est, Nancy, France
    [email protected]
    Outline
    • Overview on XML
    • Why XML Security?
    • Querying Views-based XML Data
    • Updating Views-based XML Data
    What is XML?
    • eXtensible Markup Language [W3C 1998]
    <files>
      <record>
        <name>Robert</name>
        <diagnosis>Pneumonia</diagnosis>
      </record>
      <record>
        <name>Franck</name>
        <diagnosis>Ulcer</diagnosis>
      </record>
    </files>
    (Tree view of the same document; node labels: /files, /record, /record, /name, /diagnosis, with values Robert and Pneumonia.)
    XML for Documents
    • SGML
    • HTML - hypertext markup language
    • TEI - Text markup, language technology
    • DocBook - documents -> html, pdf, ...
    • SMIL - Multimedia
    • SVG - Vector graphics
    • MathML - Mathematical formulas
    XML for Semi-Structured Data
    • MusicXML
    • NewsML
    • iTunes
    • DBLP http://dblp.uni-trier.de
    • CIA World Factbook
    • IMDB http://www.imdb.com/
    • XBEL - bookmark files (in your browser)
    • KML - geographical annotation (Google Maps)
    • XACML - XML Access Control Markup Language
    XML as Description Language
    • Java servlet config (web.xml)
    • Apache Tomcat, Google App Engine, ...
    • Web Services - WSDL, SOAP, XML-RPC
    • XUL - XML User Interface Language (Mozilla/Firefox)
    • BPEL - Business process execution language
  • Multi-Model Databases: a New Journey to Handle the Variety of Data
    Multi-model Databases: A New Journey to Handle the Variety of Data
    JIAHENG LU, Department of Computer Science, University of Helsinki
    IRENA HOLUBOVÁ, Department of Software Engineering, Charles University, Prague
    The variety of data is one of the most challenging issues for the research and practice in data management systems. The data are naturally organized in different formats and models, including structured data, semi-structured data and unstructured data. In this survey, we introduce the area of multi-model DBMSs which build a single database platform to manage multi-model data. Even though multi-model databases are a newly emerging area, in recent years we have witnessed many database systems to embrace this category. We provide a general classification and multi-dimensional comparisons for the most popular multi-model databases. This comprehensive introduction on existing approaches and open problems, from the technique and application perspective, makes this survey useful for motivating new multi-model database approaches, as well as serving as a technical reference for developing multi-model database applications.
    CCS Concepts: Information systems → Database design and models; Data model extensions; Semi-structured data; Database query processing; Query languages for non-relational engines; Extraction, transformation and loading; Object-relational mapping facilities;
    Additional Key Words and Phrases: Big Data management, multi-model databases, NoSQL database management systems.
    ACM Reference Format: Jiaheng Lu and Irena Holubová, 2019. Multi-model Databases: A New Journey to Handle the Variety of Data. ACM CSUR 0, 0, Article 0 (2019), 38 pages. DOI: http://dx.doi.org/10.1145/0000000.0000000
    1.
  • Monitoring Marklogic Guide (PDF)
    MarkLogic Server Monitoring MarkLogic Guide
    MarkLogic 10, May, 2019. Last Revised: 10.0-6, February, 2021. Copyright © 2021 MarkLogic Corporation. All rights reserved.
    Table of Contents
    1.0 Monitoring MarkLogic Server
      1.1 Overview
      1.2 Selecting a Monitoring Tool
      1.3 Monitoring Architecture, a High-level View
      1.4 Monitoring Tools and Security
      1.5 Guidelines for Configuring your Monitoring Tools
        1.5.1 Establish a Performance Baseline
        1.5.2 Balance Completeness Against Performance
      1.6 Monitoring Metrics of Interest to MarkLogic Server
        1.6.1 Does MarkLogic Have Adequate Resources?
        1.6.2 What is the State of the System Overall?
        1.6.3 What is Happening on the MarkLogic Server Cluster Now?
        1.6.4 Are There Signs of a Serious Problem?
  • Data Lifecycle and Analytics in the AWS Cloud
    Data Lifecycle and Analytics in the AWS Cloud
    A Reference Guide for Enabling Data-Driven Decision-Making
    Contents
    Purpose
    1. Introduction
    2. Common Data Management Challenges
    3. The Data Lifecycle in Detail
      Stage 1 – Data Ingestion
      Stage 2 – Data Staging
      Stage 3 – Data Cleansing
      Stage 4 – Data Analytics and Visualization
      Stage 5 – Data Archiving
    4. Data Security, Privacy, and Compliance
    5. Conclusion
    6. Further Reading
    Appendix 1: AWS GovCloud
    Appendix 2: A Selection of AWS Data and Analytics Partners
    Contributors
    Public Sector Case Studies
      Financial Industry Regulation Authority (FINRA)
      Brain Power
      DigitalGlobe
      US Department of Veterans Affairs
      Healthdirect Australia
      Ivy Tech Community College
      UMUC
      UK Home Office
    © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Purpose of this guide
    Data is an organization’s most valuable asset and the volume and variety of data that organizations amass continues to grow. The demand for simpler data analytics, cheaper data storage, advanced predictive tools like artificial intelligence (AI), and data visualization is necessary for better data-driven decisions.
    The Data Lifecycle and Analytics in the AWS Cloud guide helps organizations of all sizes better understand the data lifecycle so they can optimize or establish an advanced data analytics practice in their organization.
  • Release Notes (PDF)
    MarkLogic Server Release Notes
    MarkLogic 9, May, 2017. Last Revised: 9.0-13, July, 2020. Copyright © 2020 MarkLogic Corporation. All rights reserved.
    Table of Contents
    1.0 Introduction
      1.1 Bug Fixes
    2.0 Installation and Upgrade
      2.1 Supported Platforms
      2.2 Supported Filesystems
      2.3 Upgrade Support
    3.0 New Features in MarkLogic 9
      3.1 Template Driven Extraction (TDE)
      3.2 SQL Enhancements
      3.3 Optic API
      3.4 Enhanced Tiered Storage
  • BEYOND the RDBMS: WORKING with RELATIONAL DATA in MARKLOGIC Ken Tune, Senior Principal Consultant, Marklogic
    BEYOND THE RDBMS: WORKING WITH RELATIONAL DATA IN MARKLOGIC
    Ken Tune, Senior Principal Consultant, MarkLogic
    © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
    Agenda
    • Personal introduction
    • Motivation – the Operational Data Hub
    • Modeling data in MarkLogic
    • Three worked examples demonstrating different migration patterns
    • BI Tool integration
    • Wrap Up & Q&A
    Introduction
    • Ken Tune
    • Senior Principal Consultant at MarkLogic for ~5 years
    • Background: dev lead / system architecture and design
    • Strong background in relational technology
    • Source code for this talk - https://github.com/rjrudin/marklogic-sakila-demo
    OVERVIEW
    MarkLogic as an Operational Data Hub
    • Integrate data from multiple heterogeneous stand-alone sources
    • Do more with that data in aggregate
    • Use extensive MarkLogic feature set
    • Some of those silos will be RDBMSs
    Tables and Documents
    • Tables – Tabular!
    • Documents – Tree Structures – hierarchical & nested
    • Higher dimension implies greater representative power
    • We can intuitively transform tables into documents
    • Documents offer additional possibilities
    (Diagram labels: Document, Title, Metadata, Author, First, Last, Section.)
    Entities and relationships
    • Our applications have conceptual models
    • A conceptual model has entities with different kinds of relationships
    • Relational forces us to break up our model into separate tables for every 1:many relation
    • Sometimes makes sense, but not always
    • We never have a choice
    (Diagram labels: Association: Person, Person; Aggregation: Person, Car; Composition: Person, Alias.)
    Documents – greater flexibility
    • Choose how you model relationships
    Person • Name .
  • Master Thesis
    Version Control of structured data: A study on different approaches in XML
    Master Thesis in Software Engineering
    ERIK AXELSSON, SÉRGIO BATISTA
    Department of Computer Science and Engineering, Division of Software Engineering, Chalmers University of Technology, Gothenburg, Sweden 2015
    Abstract
    The structured authoring environment has been changing towards a decentralised form of authoring. Existing version control systems do not handle these documents adequately, making it very difficult to have parallel authoring in a structured environment. This study attempts to find better alternatives to the existing paradigms and tools for versioning XML documents.
    To achieve this, the DESMET methodology for evaluating software engineering methods and tools was applied, with both a Feature Analysis and a Benchmark Analysis being performed. Concerning the feature analysis, the results demonstrate that the XML-aware tools are, as expected, better at XML specific concerns, such as considering the history of a specific node. Conversely, the non-XML-aware ones are not able to achieve good results in the XML specific concerns, but do achieve a high score when considering project maturity or general repository management features. Regarding performance, this study concludes that XML-aware tools bring a considerable overhead when compared to the non-XML-aware tools. This study concludes that the selection of an approach to versioning XML should be dependent on the priorities of the documentation project.
    Keywords: Structured documentation, XML, Version Control, distributed collaboration, Git, Sirix, XChronicler, temporal databases, DESMET.
    Acknowledgements
    The authors would like to thank both the academic supervisor Morgan Ericsson and examiner Matthias Tichy for their feedback and support throughout this thesis work.
  • Marklogic 10 Upgrade Accelerator Datasheet
    MarkLogic 10 Upgrade Accelerator DATASHEET The MarkLogic 10 Upgrade Accelerator is designed to help our existing customers put the powerful features of MarkLogic 10 into action and to educate you on the art of the possible with this significant release. MarkLogic 10 is a free upgrade to all customers with an active support contract. The accelerator starts with a collaboration between MarkLogic Consulting and your team to determine which new features apply to your business and how they can best benefit you. Then we will work together to plan and implement your upgrade. MarkLogic Consulting has been working with MarkLogic 10 throughout the product development lifecycle and is uniquely positioned with deep reachback capability to MarkLogic Engineering. This ensures our customers will have access to the most extensive and current information to take full advantage of MarkLogic 10. Gain All the Benefits of MarkLogic 10 MarkLogic 10 includes a variety of robust new features that we can help you take advantage of. The following focus areas highlight some, but not all, of the new capabilities in MarkLogic 10. Embedded Machine Learning MarkLogic 10 ships with a complete machine-learning toolkit. This enables data scientists to develop and train machine-learning models in MarkLogic. The machine learning modules are deeply embedded into the MarkLogic engine, meaning inference models can be woven into the fabric of our MarkLogic application and run as part of an ACID-compliant transaction, close to the data for optimal performance and under the best-of-breed security of the MarkLogic database. MarkLogic 10 also includes Nvidia’s CUDA libraries, enabling GPU acceleration for machine learning training and execution.
  • Xquery 1 Xquery
    XQuery Examples Collection
    Welcome to the XQuery Examples Collection Wikibook! XQuery is a World Wide Web Consortium recommendation for selecting data from documents and databases.
    Current Status
    A new release of eXist (1.4) is currently installed and under test. Please note any problems with these examples in the discussion.
    Recent Changes
    About this Project
    This is a collaborative project and we encourage everyone who is using XQuery to contribute their XQuery examples. All example programs must conform to the creative-commons-2.5 share-alike with attribution license agreement [1]. Execution of examples uses an eXist demo server.
    1. Instructors: please sign our Guest Registry if you are using this book for learning or teaching XQuery
    2. Contributors: please see our Naming Conventions to ensure your examples are consistent with the textbook
    3. Learners: If you are looking for an example of a specific XQuery language construct, technique or problem but can't find an example, please add a suggestion to the Examples Wanted section.
    Introduction
    1. Background - A brief history and motivation for the XQuery standard.
    2. Benefits - Why use XQuery?
    3. Installing and Testing - How to install an XQuery server on your .
    4. Naming Conventions - Naming standards used throughout this book.
    Example Scripts
    Beginning Examples
    Examples that do not assume knowledge of functions and modules.
    1. HelloWorld - A simple test to see if XQuery is installed correctly.
    2. FLWOR Expression - A basic example of how XQuery FLWOR statements work.
    3. Sequences - Working with sequences is central to XQuery.
    4. XPath examples - Sample XPath samples for people new to XML and XPath
    5.