
Ontology-driven corporate software: a cookbook

TriniData LLC

Contents

Why Use Ontologies in Corporate Applications?
The Data Structure
Data Storage
Properties of the Classes and Properties
Access Rights and Security
Architecture of the Applications Using ArchiGraph Platform
ArchiGraph Platform Deployment
Working With ArchiGraph REST API
Data Processing Using Inference Rules
Subscribing to Receive Data
Synchronization Between Several Platform Instances
SPARQL Endpoint
Geography and Geometry Support
Languages Support
Full-Text Search in Ontologies
Glossaries, Lexical Models and Text-to-Facts Transformation
Querying External Data With ArchiGraph

Ontology-based corporate software is gaining popularity. Ontologies are a way of formalizing conceptual knowledge models. They can be represented in a machine-readable form according to formal languages such as OWL, the SHACL constraints and inference rules syntax, etc. The upper-level ontologies, such as BFO, GFO, SUMO and DOLCE, provide tools for knowledge modeling in various domains. Ontology-enabled software is developed by a number of vendors, among which are TopQuadrant, Semantic Web Company, Ontotext and many others. Their products often become a basis for the applied systems implemented in the world's leading corporations: knowledge management systems, information collection and analysis tools, situation centers and many others.

This document is not a tutorial on ontologies itself, or a detailed guide on using them; there is a lot of literature covering these matters (Maria Keet, 2018). We have tried to give an overview of how ontologies can enrich applications functionality, and to give some particular recipes of their use on our platform. In other words, we explain why and how to use ontologies in business applications.

The TriniData company has created the ArchiGraph platform, which enables using the full power of ontologies for building corporate applications, relying on the world standards and the widespread modeling methods. Our instruments are interoperable with other ontology-based tools at the model and the technical level, and at the same time are tailored for use in corporate applications. Having six years of experience in the implementation of ontology-based corporate solutions, we would like to share the recipes we have accumulated. This document describes a set of technologies, methods and tools necessary for designing modern and effective corporate software.

In this document we consider only one scenario of using ontologies in corporate software, but we hope that the reader can translate the described design patterns to other domains.

Why Use Ontologies in Corporate Applications?

Ontologies are often used in software for representing the structure of the data and the data itself, and as a tool for representing parts of the logic of program execution. The ontology is situated outside the program code, so the data structures and the logic of their processing can be changed without changing the code and even without restarting an application. Ontologies are expressed in open standards, so they are supported by and can be processed with a wide range of tools. It means that ontology-based applied software is not vendor-locked to particular platforms or solutions and does not depend on versions or vendor policies: in most cases you can switch the applied system from one ontological framework to another, probably more modern and more functional, ontology processing platform. And, obviously, ontology-based software developers are not limited in their choice of programming languages, operating systems and development tools.

A skeptical reader can argue that there are a lot of software development platforms which allow customizing data structure and application logic. What is the principal difference between ontological software development and these platforms? The ontological way of representing the world rests on philosophical, mathematical and logical foundations: description logic, set theory, etc. This allows abstracting from the details of data storage design and focusing on reproducing the business user's point of view on objects and processes. The models built with ontological methods can have fewer distortions of the business user's concepts than, for example, relational database designs. Such distortions made in the design phase can be costly when implementing and supporting software; ontological modeling, on the contrary, lowers the time and cost of any possible software modernization.

The functional advantages of ontology-based software development are paramount. The modern world is volatile and enterprises have to adapt quickly: customers' needs can change instantly, products and services emerge and disappear in response to them, and business processes change accordingly. In other words, business processes and objects are very unstable, and any piece of business logic rigidly represented in a data structure or program algorithm can tomorrow become the fatal obstacle which won't allow an enterprise to adapt to the changes, expand to new markets or just keep on running.

For the following narrative we need an example which demonstrates the described advantages of ontologies but is simple enough to understand without excessive explanations. Consider a simple automated system, a "situation center" of a building, which gathers telemetry from the temperature and smoke sensors. Initially the system is designed to process only these kinds of data, but it should be able to extend further to process many other types of building telemetry and events. Consider also that the building should host public events whose organizers can impose their own requirements. It means that the automated system cannot be limited to a fixed set of processed event types, response orders and participants. The system can be further used by the personnel of the new services of the building, processing new types of data and building new kinds of reports.

As an enterprise will not have time and money to implement these changes in code, it needs a system which can be customized on the run by analysts, not developers. It encourages an enterprise to set the task of using tools which will simplify integration with new data sources, customization of the user interfaces and reports, and changing the set of the processed event types and the typical orders of response to them without amending program code. This motivates a system developer to use ontologies and to put as much logic on the model level as possible.

The rest of the document demonstrates how to achieve this.

The Data Structure

In the systems that we build, the data structure is a part of an ontology, so the ontology can be regarded as the core around which the application is built. Our analysts often use the BFO upper level ontology, importing parts of the applicable domain ontologies and constructing the rest of the model themselves. This is only one possible approach, and it can be varied according to the customer's needs; the allowable customization limits depend on the correctness of the ontology design.

There are no commonly accepted and formalized rules of ontology design. Several "schools" offer their own modeling methods; moreover, following too strict methods can lead to a lack of solution flexibility.

There is a wide range of domain ontologies, such as FOAF for people and their relations, SSN and SOSA for sensors, QUDT for units of measure, IFC for constructions etc. When creating an ontology you can import and reuse some parts of these ontologies. On the other hand, we don't think that reusing ontologies is an end in itself, as it could complicate the model structure and lead to trade-offs that will cost much at the time of system modernization. An opposite approach is designing an ontology from scratch, just as a description of some part of reality. This could cause littering of the ontology with unnecessary entities with unclear distinctions between them, problems when expanding the ontology to the adjacent domains, and, in the end, a return to the practices criticized above.

The ontology of the situation center of the building described above should contain the following parts:

•	the devices installed in the building (the sensors and detectors),
•	the signals that can be generated by these devices: measures and alarms,
•	the events recognized on the basis of these signals,
•	the reaction orders and instructions according to which the actions should be taken.

All these business objects describe the real-world events and their participants. The data on the particular objects and events (individuals) will be represented according to the ontology classes and properties. The ontology should also contain entities describing:

•	data transformation rules (for example, rules of transforming signals into events),
•	actions performed by the automated system in order to react on the events,
•	analytic dashboards and interactive maps building rules, etc.

As we will show later, a part of the data processing logic (algorithms) will also be represented in the form of constraints or inference rules. So the ontologies allow representing the logic of the automated system execution using ontology elements structured according to some metamodel, as well as using inference rules; the SHACL inference rules are themselves a part of an ontology.

In the ArchiGraph platform, ontologies are composed using the ArchiGraph.Mir editor ("Mir" means "world" in Russian). ArchiGraph is an ontology-based data management platform offering an API to access the ontology and the data represented according to it; ArchiGraph.Mir uses this API to work with an ontology. We recommend referring to its user manual for more information.

To start working with an ArchiGraph.Mir on-premise installation, you should create endpoints and storages (in the demo version, which you can get from TriniData, they are already prepared, unlike in a productive installation). First of all, an endpoint should be created. An endpoint is a root ontology element: the top-level ontology classes should be its subclasses. To create it, execute the next command in the command line of the server where ArchiGraph is deployed (ArchiGraph is usually deployed in containers, so a command allowing to get into the container should be prepended; we will show an example later, when discussing the ArchiGraph deployment):
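The exact syntax is given in the platform documentation; a minimal sketch, assuming the endpoint code Demo and the mdmctl utility used throughout this document (the "add endpoint" subcommand and the parameter order are assumptions):

    # get into the platform container first (deployment-specific)
    kubectl exec -it deploy/mdm-dev -n trinidata -- bash
    # register a new endpoint with a code and a readable name (syntax assumed)
    mdmctl add endpoint Demo "Demo endpoint"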

Then you should create a core storage in which ArchiGraph will store the ontology TBox (classes and properties) and, by default, all the data; binding particular classes to other storages will be discussed later. In general, you should deploy a graph database (an RDF triple store with a SPARQL interface, for example Apache Fuseki) and register it in ArchiGraph with the next command:
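Judging by the parameter explanation below, the registration command might look like this sketch (the exact order of the mdmctl add storage arguments is an assumption):

    # register an Apache Fuseki triple store as the core storage (parameter order assumed)
    mdmctl add storage fuseki "Demo model storage" 127.0.0.1 8080 mdm <password> \
        http://127.0.0.1:8080/demo/query http://127.0.0.1:8080/demo/update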

In this command, "Demo model storage" is the storage's readable name, 127.0.0.1 is the storage IP address, 8080 is a port number, mdm is a login for the HTTP authorization, and the last parameters are the SPARQL query and update service URLs.

The storage types available in the default platform version are listed in Table 1. This list can be extended with adapters for Oracle, SQL Server, MySQL and other databases.

Table 1. The storage types available in the default ArchiGraph version

RDF triple stores
    fuseki          Apache Fuseki
    allegrograph    AllegroGraph
    blazegraph      BlazeGraph
Document-oriented databases
    mongo           MongoDB through the MongoClient library
    mongodb         MongoDB through the MongoDB library
    mongodba        MongoDB through the MongoDB library, supporting multi-language string literals
Relational databases
    PostgreSQL adapters: storing the whole object in a separate column; storing property
    values in separate columns for indexing; the same, supporting multi-language string literals

Then you should bind the new storage to the endpoint as a TBox (model) storage with the next command:
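A sketch of the binding command, under the same assumptions about the mdmctl syntax (the trailing model flag marking the storage as the TBox storage is illustrative):

    # bind the storage to the Demo endpoint as the model (TBox) storage
    mdmctl bind storage Demo "Demo model storage" model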

After performing these settings ArchiGraph.Mir, connected with the endpoint, is ready to work.

Now let us turn to the examples of creating ontology elements for the above described entities, and let us describe the structure of the ontology considered in this document. We will demonstrate the creation of an ontology fragment based on the BFO upper level ontology, using some elements of the SOSA, SSN and QUDT ontologies.

Upper level ontologies divide everything existent in the world into general classes (sets) according to the rigid (constant for the whole period of existence of an object), "fundamental" properties. The top level of BFO contains the classes "Continuant" and "Occurrent". The common fundamental property of all Continuants is that they exist completely at each moment of time and can exist for some period of time. Occurrents, in contrast, represent changes that unfold in time and cannot be considered disregarding it. In our model the sensors are continuants, while the occurrents are the measurements performed by these devices and the alarms that they produce.

Continuants, in turn, are divided into three subclasses: independent, generically dependent and specifically dependent. We won't go deep into the philosophical foundations of this distinction; roughly, an object can exist independently of others, or can manifest (exist) only in relations with other objects. All material objects are independent continuants, while various informational entities are generically dependent continuants. Occurrents also have a further division. For an analyst interested in this theory we recommend browsing the BFO content and reading the papers on this topic.

The process of ontology elements creation in the ArchiGraph.Mir interface is described in its user documentation. The result of the construction of a part of our sample ontology is shown in Fig. 1.

Fig. 1. A part of sample ontology classes in tree view in the ArchiGraph.Mir interface

It is possible to import an external ontology instead of copying its elements manually. We have not done that here, because we only need a small number of classes from the external ontologies.

Fig. 2. Diagram of the ontology fragment

Beside the mentioned BFO classes, this ontology fragment includes the "Sensor" (sosa:Sensor) and "Result of measurement" (sosa:Result) classes, borrowed from the SSN and SOSA ontologies respectively. We have bound them to the appropriate place in the BFO classes structure because the SSN and SOSA ontologies are not based on any upper level ontology. Creating subclasses of these classes, such as "Temperature sensor" or "Smoke alarm", as the ontology grows is a normal practice.

When the set of classes is created, we have to proceed with the properties. We recommend analysts to make model diagrams to simplify navigation through the model. There is no commonly accepted notation for these diagrams, so we have devised, as we like to think, an intuitively comprehensible visual syntax. A sample diagram of our ontology is shown in Fig. 2. The boxes on it represent classes; the arrows between them mean reference properties (owl:ObjectProperty), which may connect individuals of these classes. The parallelogram represents a literal property (owl:DatatypeProperty), connected with the class whose individuals may have this property's values; the data type is shown inside the parallelogram.

Beside the above listed classes, the "Unit of measure" class appears on this diagram. It is borrowed from the QUDT ontology, which describes units of measure of physical values. Most of the properties (predicates) on this diagram are also imported from the QUDT and SOSA ontologies.

The model may also be kept in an external format which can be processed in external applications such as the Protégé editor. The TBox elements may be imported from external data sources into the RDF triple store in which ArchiGraph stores the model structure description; after this operation, the platform API and editor cache should be reloaded. The TBox elements can also be created through the platform API (JSON form is also allowed).

Relying on the diagram, let us create three object properties and one "datatype" (literal) property. After that, the properties set of the sosa:Result class will look in the ArchiGraph.Mir interface (in the "Attributes" area at the class properties page) as shown in Fig. 3.

Fig. 3. The list of sosa:Result class attributes

To rebuild the editor cache, the following URL should be invoked:
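The exact URL is installation-specific; as an illustrative sketch (the host and path are assumptions):

    # rebuild the ArchiGraph.Mir editor cache
    curl http://agmir.trinidata.local/rebuild-cache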

The next task is to populate the ontology with individuals. We can start with the reference data: it contains the units of measure and the features of interest. We could import the full set of units of measure from QUDT, but we will rather prepare a file containing information on the two units of measure we need.

We will use the Excel import tool to create the units of measure set. Open the Excel import/export page from the main menu, then choose in the left menu one or more classes which objects we want to create. In the middle part of the page check the "Individuals" checkbox and press the "Export" button, as shown in Fig. 4. Fill the exported file with the data, then upload it in the "Import" area on the same page of the editor and press the "Import" button. The import results will be displayed below the import form, and the imported units of measure will be displayed in the editor interface, as shown in Fig. 6.

Fig. 4. Excel import/export interface


The reference data individuals formally belong to the ABox, the part of an ontology containing facts on the particular objects. But as there are not many of them, and they rarely change, we can store them in the graph database along with the TBox. All the other data would be better stored in a relational or document-oriented database: graph databases are, in general, not intended to serve as high-load storages for large, quickly growing datasets.

Data Storage

ArchiGraph allows storing the individuals in external non-graph storages according to the logical structure of the ontology. The platform can use several data storages of different types simultaneously.

The "native" type of storage for ontologies is the graph databases supporting the SPARQL access protocol. There are several proprietary implementations of such databases (Stardog, AllegroGraph, Virtuoso, etc.), as well as open source projects such as Apache Fuseki; ArchiGraph stores the model and the reference data in it. Storing a large amount of data in the graph database is ineffective, as graph databases usually don't allow custom indexing in respect of the nature of the data (for example, indexing time series by the timestamp). The key feature of ArchiGraph is its ability to store the individuals of some branches of the classes tree in relational (usually PostgreSQL) or document-oriented (such as MongoDB) databases, which can index data in respect of its nature. The ArchiGraph platform offers its own SPARQL endpoint for accessing the individuals in these storages, making the customer component indifferent to the actual storage mode. It also allows applying inference rules to these individuals, as we will discuss later.

In our example, the signals coming from the sensors and the events are permanently growing data arrays which can reach millions of records. They should be kept in segmented Postgres tables, indexed at least by the event time, feature of interest and sensor; this will allow system scaling.

How can we implement this schema? We can use different tables to store the objects of various classes, or place all the objects into a single table. The choice depends on the indexed attributes set: if it is the same for all the classes, there is no need in different tables. In our example the ssn:Stimulus and Event classes need different indexes, and different segmentation policies should be applied to them; this makes creating different tables for the individuals of these classes the best choice.

After the table is created, the following columns should be created in it:

•	the column containing the object URI,
•	the column containing the list of the object's classes (as each object can belong to several classes simultaneously),
•	the special "data" column for all the other properties: ArchiGraph packs their values into a JSON collection and stores it here.

Separate columns should also be created for storing the values of the indexed properties or the properties used for table segmentation. In our case such are the columns storing the date and time of a signal, the sensor which has produced it, and an event type. You can choose arbitrary names for these columns.

First, we have to register the storage itself. This is done by calling the mdmctl command:
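A sketch under the same syntax assumptions as above, with hypothetical connection parameters:

    # register a PostgreSQL storage for the fast-growing data (parameter order assumed)
    mdmctl add storage postgres "Signals storage" 127.0.0.1 5432 mdm <password> signals_db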

Then we have to bind the storage to the endpoint which will use it, and bind the ontology classes to the new storage of our endpoint. If a class's short name is unambiguous, it can be used in the command; otherwise you need to indicate the full URI. You can also list the data storages bound to the endpoint and the classes bound to the storages with the corresponding commands:
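A sketch of these commands (the subcommand names "bind storage", "bind class" and "list" are assumptions modeled on the commands attested in this document):

    # bind the data storage to the endpoint
    mdmctl bind storage Demo "Signals storage"
    # show the storages bound to the endpoint
    mdmctl list storages Demo
    # bind a class to the storage; a short name works if unambiguous
    mdmctl bind class Demo "Signals storage" ssn:Stimulus
    # show the classes bound to the storages
    mdmctl list classes Demo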

We already know all the command parameters mentioned above, except the new ones: the parameter indicating the data table in the storage, and the parameter controlling the tracking of property values changes.

When the data table is created in the relational storage, its column names have to be put into the ArchiGraph platform settings: each column is bound to the property whose values it stores. After the "bind property" command in this syntax, the property URI is referenced; this property's values will be stored in the table and column indicated in the next parameters ("test" is the table, "type" is its column in this example). The next command can be used to bind the column containing the object URI. Commands like these have to be executed for all the other columns created in the data table; the special "data" column containing the whole object has to be bound with the special "data" pseudo-attribute. If we don't know which attributes have to be indexed at the beginning of development, we can start working with non-indexed tables, then analyze the platform logs and create the appropriate tables and indexes later. These are the tables and storages mappings which we recommend in this scenario.
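A sketch of these commands, using the table and column names from the example above (the syntax of the URI and "data" bindings is an assumption):

    # store hasEventType values in the "type" column of the "test" table
    mdmctl bind property Demo hasEventType test type
    # bind the column holding the object URI
    mdmctl bind property Demo uri test uri
    # bind the "data" pseudo-attribute to the column holding the whole object
    mdmctl bind property Demo data test data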

The attributes mapping for a certain table can be viewed with the next command:
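A sketch of this command under the same assumptions:

    # show the attribute-to-column mapping; the only parameter is the table name
    mdmctl list mapping test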

Its only parameter is the table name.

After these settings are done, you can still work with the individuals of the classes bound to the relational data storages through the ArchiGraph platform API, although these individuals are no longer situated in the RDF triple store.

Which settings have to be done in our example? Looking at our ontology classes tree we can conclude that it is reasonable to split the individuals between at least three collections: the physical objects (sensors, etc.), the events, and the measurement results (observations). Each collection in our case will be represented by a Postgres table indexed by the attributes listed in Table 2.

Table 2. Ontology classes distribution between storages

Class name            Class URI       Indexes
Material object
Signal (Stimulus)     ssn:Stimulus    dateTime, sosa:madeBySensor
Event                                 dateTime, hasEventType
Measurement result    sosa:Result

Properties of the Classes and Properties

The semantic-web aware reader knows that the semantic web standards define predicates like rdfs:label or owl:sameAs. Such predicates are applicable not only to the individuals, but also to the classes or properties of an ontology (sometimes only to the latter). It means that the classes and properties may have values of their own properties, such as labels, comments and other annotations, and various linkages between them. Some of these predicates are supported by the ArchiGraph platform by default.

What if we need to use a predicate not supported by default? For example, to declare one property as the inverse of another, or to state that two individuals denote the same real world object with owl:sameAs? Then we need to register a so-called "standard property". A standard property can be added with the following command:

In this line, the "add property" command is followed by the parameters defining:

•	which classes' individuals may possess values of this property,
•	which classes' individuals may be the values of this property,
•	whether a value is required for each individual,
•	whether the property may have only one value for each individual,
•	whether the values may be set in different languages,
•	whether the value can be changed after it is set (used for the system properties such as archigraph:SourceSystem and archigraph:LocalCode).

In this and all the similar command examples, the parameters that are not needed can be left empty. Here is an example of registering such a property for the Demo endpoint:
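A sketch of such a registration, with the flags following the parameter list above (the flag order and encoding are assumptions):

    # register owl:sameAs as a standard property on the Demo endpoint;
    # the parameters are: domain classes, range classes, required,
    # single-valued, multilingual, immutable (order assumed)
    mdmctl add property Demo http://www.w3.org/2002/07/owl#sameAs owl:Thing owl:Thing 0 0 0 0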

When the "standard properties" are registered in ArchiGraph, they will be automatically displayed in the object editing forms in ArchiGraph.Mir. It does not mean that ArchiGraph will automatically perform entailment based on these properties: for example, if property A is declared as the inverse of property B, assigning a value of property A will not entail a value of property B, as might be expected. Entailment is performed only for the built-in properties; entailment rules for the user-added standard properties are not applied automatically.

Access Rights and Security

Most of the operations with the ontology-based data will be performed not with ArchiGraph.Mir, but with the ArchiGraph platform API. The platform offers several APIs, among which we will consider REST¹. The main kernel is PHP-based and deployed in Kubernetes pods, as described in the Deployment section of this document. The second kernel is now under active development, and it is already available for use as an extra component of the platform. This kernel is a C++-written Linux service; its API and behavior are identical to the main kernel's, but it currently works in read-only mode. The secondary kernel is completely functional with the core API methods, including the multi-language data support, but it does not currently support geodata, and it supports only the Fuseki and Postgres storages now.

In the infrastructure built around ArchiGraph, the authorization and authentication are implemented using the KeyCloak product. This open-source software has a rich functionality of integration with security providers such as Active Directory. It also provides its own instruments of accounts, user groups and roles management. KeyCloak allows setting up seamless Kerberos authorization in order not to ask a user for login and password. It also allows implementing the SSO (Single Sign-On) mechanism, using which a user authorized in one of the ArchiGraph applications becomes also authorized in the others. KeyCloak can be used both for interactive user authorization and for API clients authorization.

It is important to mention that the ArchiGraph platform itself does not handle user accounts and "does not know" anything of them. The platform API clients are named "Originators" in ArchiGraph; their IDs are indicated in the API calls and are used to control access rights at the platform side. The typical Originators are the applications providing GUI, such as ArchiGraph.Mir, our ontology editor. In trusted environments it is possible to set up anonymous access to the platform's API; in highly secure environments it is possible to use the JSON Web Tokens (JWT) mechanism to authorize the API clients. This is the concept of access rights control in ArchiGraph. We recommend creating a separate Originator for each role or group of users; to achieve this, we have to associate the KeyCloak user groups with the ArchiGraph Originators, as it will be described below in the Deployment section.

We recommend using the same paradigm in all the applications providing GUI for working with the ontologies and data stored in the ArchiGraph platform. For the applications performing operations without user participation, a single Originator for each component can be used. We do not recommend using the same Originator for several components, as this will make log analysis harder: the Originator is indicated in every log entry. It will also make it impossible for one component to receive the model and data updates made by another component using the subscription mechanism, which we will consider below.

Let us consider the commands which are used to manage Originators in ArchiGraph. To register a new Originator, the "add originator" command has to be issued; in the example below, Demo client is a readable name of the Originator, and demo is its code used in the platform API. After the Originator is created, we should allow it to access one or more endpoints with the bind system command, which is followed by the Originator's code (demo) and the endpoint code (Demo).

An Originator by default can perform all kinds of operations with all the elements of the model structure and data. The access can be limited at the level of particular classes by issuing the allow command: after it, the Originator's code, the class URI and the allowed operations are consecutively indicated. For example, the demo Originator can be allowed to read the individuals of a class but not to update them. The rights set for a superclass are inherited by its subclasses; if a class is a subclass of several classes, the strictest option is selected. To set up the rights for working with the classes and properties of the model themselves, a system URI (such as owl:Class) should be indicated in place of the class URI in the command syntax.

¹ ArchiGraph also has experimental support for GraphQL and some other protocols.

23 We can list the Originators and their access rights with the following command:
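A sketch of the whole Originator management sequence, using the command names attested above (the parameter order is an assumption):

    # register an Originator with a code and a readable name
    mdmctl add originator demo "Demo client"
    # allow the Originator to access the Demo endpoint
    mdmctl bind system demo Demo
    # the demo Originator can read Alarm individuals but not update them
    mdmctl allow demo Alarm read
    # list the Originators and their access rights
    mdmctl list originators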

In our example we should create Originators for the application adapters and for the situation center UI. If users with different access rights will work with the UI, then a separate Originator for each group should be created, and their access rights to the ontology elements should be configured as described above.

Architecture of the Applications Using ArchiGraph Platform

Now, when we know how ArchiGraph stores data and how to set it up for working with client applications, we can design a scenario of the situation center incoming data processing. The platform supports several methods of interaction with users and application components, as shown in Fig. 8; the arrows on this diagram show the directions of data transfer, not the direction of the API calls. Synchronous calls suit interactive scenarios, such as forming a data feed to a user, while working in the asynchronous mode by subscription allows avoiding permanent polling of the platform for the changes. We will consider all the scenarios of interaction with the platform below.

As this diagram shows, all the data processing goes through the ArchiGraph platform. The applied components are labelled as the "Data processing component" and the "User interface component" (in real systems there may be a lot of components of various purposes). The platform offers several ways of accessing information for the components:

•	in the synchronous mode, through the platform API calls;
•	in the asynchronous mode, by subscription through the message broker.


ArchiGraph Platform Deployment

The following components are necessary for the platform deployment; we assume that all of them are already deployed in the Kubernetes cluster or separately. As the rest of this document shows, they include at least a PostgreSQL server, an RDF triple store (for example, Apache Fuseki), a Kafka message broker, a KeyCloak server, and a Redis instance in standalone or cluster mode.

For gathering and analyzing logs, the following components are also used:

• Timber.io Vector

• Prometheus

• php-fpm-exporter

Applications Databases

The ArchiGraph platform components use several databases (DBs); each component uses its own DB for storing settings and other metadata. Before starting the platform, the databases should be rolled out from the templates supplied with the delivery: create the roles and databases listed below, then import the database templates into the respective databases. Note that the ArchiGraph.Mir database is initiated automatically on the first start; among other things, it stores the user sessions data.

Create the following roles in the PostgreSQL server:

Create four databases in PostgreSQL with respective owners:
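The actual role and database names are supplied with the delivery package; a sketch with hypothetical names:

    # roles and databases (the names agmdm/agmir are illustrative)
    psql -U postgres -c "CREATE ROLE agmdm LOGIN PASSWORD 'change-me';"
    psql -U postgres -c "CREATE ROLE agmir LOGIN PASSWORD 'change-me';"
    psql -U postgres -c "CREATE DATABASE agmdm OWNER agmdm;"
    psql -U postgres -c "CREATE DATABASE agmir OWNER agmir;"
    # import a database template into the respective database
    psql -U agmdm -d agmdm -f agmdm_template.sql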

Fuseki Datasets

Create Fuseki dataset using its web panel. Go to the Manage datasets tab and choose Add new dataset:

Fig. 9. Dataset creation in Fuseki

Type the dataset name, choose the Persistent type and press the Create dataset button. Wait until the dataset is created, then press the Upload data button:

Choose the file with the model (or a file with an empty dataset), and press the Upload now button. Wait for the dataset loading.

ArchiGraph Platform Deployment in Kubernetes

For the ArchiGraph deployment you will need access to the Kubernetes cluster and the deployment manifests. By default all the ArchiGraph platform components are deployed in the "trinidata" namespace, which has to be created before deployment. In the manifests:

1. Indicate the server name or address, the username and password, and the DB name;
2. Provide the references to the images (we assume they are already loaded to the local repositories);
3. Provide the Docker repository and the authorization credentials in the user:password format, base64 encoded (set to base64encode(user:password)).
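The namespace and the registry secret can be created with the standard kubectl commands (the secret name is an assumption):

    kubectl create namespace trinidata
    # registry credentials used by the manifests; kubectl encodes them itself
    kubectl create secret docker-registry registry-credentials \
        --docker-server=registry.example.com \
        --docker-username=user --docker-password=password \
        -n trinidata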

Deploy ArchiGraph by issuing the kubectl apply command from the manifests folder. Wait for the images loading and the containers deployment, which may take some time; you may monitor the process with the kubectl get pods command. After the successful deployment, 6 containers will be run in two pods; one of them hosts the Elasticsearch server for logs indexation (indicate the Elasticsearch server name or network address in the manifests).

The pod named "mdm-dev" can be run in several instances: the platform is scaled by running several instances of the server. The synchronization pod should not be run as more than one instance; if you are not using the ArchiGraph synchronization between several clusters, you can temporarily switch it off by scaling it to zero instances.
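A sketch of these operations with the standard kubectl commands (the deployment names besides mdm-dev are placeholders):

    # deploy from the manifests folder and watch the pods come up
    kubectl apply -f . -n trinidata
    kubectl get pods -n trinidata --watch
    # run 4 instances of the scalable pod
    kubectl scale deployment mdm-dev --replicas=4 -n trinidata
    # switch off the synchronization pod if cluster-to-cluster sync is not used
    kubectl scale deployment <sync-deployment> --replicas=0 -n trinidata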

The platform API is accessed through the proxy situated on the master node, using the Kubernetes internal DNS mechanism. If we have deployed ArchiGraph using the "trinidata" namespace, its address will be formed from the service name in that namespace.

ArchiGraph.KMS Deployment

For the ArchiGraph.KMS deployment you will need access to the Kubernetes cluster and the manifests referencing the KMS images.

By default all the ArchiGraph platform components, including KMS, are deployed in the "trinidata" namespace, which has to be created before deployment. KMS uses the platform endpoint as its ontology data source (see the Data Structure chapter). In the manifests:

Indicate server name or address, username and password and DB name:

Provide the references to the images (we assume they are already loaded to the local repositories), the Docker repository and the base64-encoded authorization credentials in the user:password format. If the KMS is deployed by the described schema using the "trinidata" namespace, its address will be formed from its service name in that namespace. Please note that for the correct functioning of KMS in the described schema, it should be deployed in the same namespace as the platform, and the manifests should be applied from the manifests folder.

Deploy ArchiGraph.KMS by issuing the same kind of kubectl apply command from the manifests folder. Wait for the images loading and the containers deployment, which may take some time; you may monitor the process with the kubectl get pods command.

After the successful deployment, the two containers should be run in one pod.

ArchiGraph.Mir Deployment

For the ArchiGraph.Mir deployment you will need access to the Kubernetes cluster and the Docker repository credentials.

By default all the ArchiGraph platform components are deployed in the "trinidata" namespace, which has to be created before deployment; the ArchiGraph.Mir deployment is named agmir. In the manifests, provide the references to the images (we assume they are already loaded to the local repositories), the Docker repository and the base64-encoded credentials in the user:password form. The manifests also list the Originators with their platform access tokens (do not mess these with the KeyCloak tokens); in the newer versions of ArchiGraph the correspondence between the Originators and KeyCloak groups is stored in the KeyCloak group comments.

Deploy ArchiGraph.Mir by running the kubectl apply command from the folder containing the manifests. Wait for the images build and deployment, which may take some time; you can monitor the process by running the kubectl get pods command. After the successful deployment, two containers should be run in a single pod. If we have deployed ArchiGraph.Mir by the above described schema using the "trinidata" namespace, its target address will be formed from the agmir service name; please note that the ArchiGraph.Mir images are built to be opened at a particular address, so the manifests should be consistent with it.

Using Redis Cache in ArchiGraph

The objects caching is used to speed up the platform and to lower the load on the storages. The cache settings are changed from the Kubernetes master node console using the kubectl command allowing to get into the ArchiGraph container; further we will omit the starting part of the command, assuming that you know how to execute it inside the container.

Indicate the Redis host and port, the keys space number (selector), the password, and the cluster mode if a Redis cluster is used; please note that in the cluster mode the selector has no value. Set the cache type constant to "redis", set the constant switching on model caching to "1", and check the result:
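The constant names below are illustrative assumptions; the actual names are listed in the platform documentation:

    # container environment fragment (names assumed)
    REDIS_HOST=redis.trinidata.svc:6379
    REDIS_SELECTOR=0        # keys space number; leave empty in cluster mode
    REDIS_PASSWORD=change-me
    CACHE_TYPE=redis        # switch the cache type to Redis
    MODEL_CACHE=1           # turn on model caching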

After all the components are deployed, we should set up the logs collection into Kibana.

Working With ArchiGraph REST API

In our example we should solve at least two tasks of incoming data processing: create and maintain the list of the sensors (which can generate alarms) in ArchiGraph, and process the alarms in real time¹. Suppose the list of the sensors is maintained in an external system.

¹ See the complete ArchiGraph platform API reference at ArchiGraphAPIReference.pdf

The alarms come in the form of JSON objects routed to the REST service implemented at our system's side. We have to implement two components: one will be scheduled to process the sensors list, another one will listen for the POST requests with the alarms. The sensors list import script should implement the following algorithm:

•	obtain the list of the sensors from the external system;
•	for each sensor, check if it is already registered in the platform by its code in the source system (this value is stored as a value of the special archigraph:LocalCode attribute, while the special archigraph:SourceSystem attribute identifies the system the object has come from);
•	map the values of the sensor's attributes to the values of the corresponding ontology properties (the mapping between them can also be stored in the ontology);
•	create or update the sensor object in the platform.

When queried by the LocalCode value, the platform returns either a message which signals the absence of an object with this code, or an Items packet with a nested Item element whose Code property contains the object URI. If the sensor is absent, we create it, passing its attributes and the LocalCode; the platform responds with a message describing the operation results (an OperationResults packet) which will contain the URI that the platform has assigned to the new object. In a real-world scenario the sensor will have many more properties, including its coordinates; we will consider working with the geodata later.

If the sensor is already registered in the platform and some property values differ from each other, we should issue an update request. It will be identical to the creation request with the only difference: we should pass the Code attribute containing the object URI instead of the LocalCode. We can list only the changed attribute values in the request; the other values will be left intact.

If some of the sensors which existed in the platform are not listed in the data obtained from the external system, they were removed. We can either delete them from the platform, or mark them as decommissioned by setting one of their properties to false.
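A sketch of a creation request in the JSON form, with a hypothetical endpoint path and field names modeled on the packet elements mentioned above (Items, Item, Code, LocalCode):

    # create a sensor; the URI assigned by the platform comes back in OperationResults
    curl -X POST http://mdm.trinidata.local/api -H "Content-Type: application/json" -d '{
      "Items": [{
        "Item": {
          "Class": "sosa:Sensor",
          "LocalCode": "bms-sensor-42",
          "SourceSystem": "building-bms",
          "label": "Temperature sensor, room 42"
        }
      }]
    }'
    # an update request is identical, but passes "Code": "<object URI>" instead of LocalCode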

As a result of these actions the platform will contain an actual list of sensors. We can proceed to constructing an algorithm of writing time series to the platform: sensor measurement results, alarms and the events they have produced.

Data Processing Using Inference Rules

The easiest version of time series processing is simply writing the incoming data into the platform, but in practice it is often needed to perform some kind of logical processing of the data. One of the advantages of using ontologies in corporate automated systems is that this processing can be moved from the program code to the model level using logical inference rules. These rules can be changed during program execution, which means we can change the system's algorithms without amending code or restarting any of the applications.

In the ArchiGraph platform the rules are composed using the constructor provided by ArchiGraph.KMS. Let us consider as an example the rule which controls that an alarm is generated by a sensor not marked as faulty. The standard for expressing constraints and rules in ontologies is SHACL. Beside the constraints, it offers extended features (known as SHACL Advanced Features), among which is the creation of inference rules augmenting the facts which are put into an ontology as axioms with the inferred facts. The ArchiGraph platform offers SHACL rules execution not only for the data situated in the RDF triple store, but for all the data stored in the platform.

The SHACL constraints work by the following principle. A rule is created for one or more model classes to check some logical constraint; a rule can contain several logical conditions united with the logical AND and OR operators and brackets, and the rule itself is represented as an ontology entity. When the individuals of the class to which the rule is bound are created or updated, the constraint check is performed. If the constraint is not met, a special individual is created representing the validation result; it references the violated constraint and the individual for which it is not met. The application components working with the ontology can handle these validation results. The form of the rule creation in the ArchiGraph.KMS interface is shown in Fig. 12:

Fig. 12. Rule properties form in the ArchiGraph.KMS interface

Then we have to construct the rule conditions in the same form. The first condition states that an individual of the "Alarm" class in the variable $this is produced by a sensor in the variable A. The constraint check returns a boolean: true if the object meets the constraint and false otherwise. If false, the platform creates a special object of the "Violation" class.

Only the alarms produced by non-broken sensors will meet this constraint. The SPARQL query generated for the constraint check can be seen at the class properties page in ArchiGraph.Mir, as shown in Fig. 14:

Fig. 14. SPARQL query for logical constraint check
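The generated query is similar in spirit to the following sketch (the prefixes and the isBroken namespace are assumptions; $this is the SHACL convention for the checked node). Per the note below, rows returned by the SELECT mean the constraint is violated:

    PREFIX sosa: <http://www.w3.org/ns/sosa/>
    SELECT $this WHERE {
        $this sosa:madeBySensor ?A .    # the checked Alarm and its sensor
        ?A :isBroken true .             # the sensor is marked as faulty
    }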

The sample rule shown in Fig. 13 consists of three logical conditions. The second and third check that the sensor in the variable A either does not have any value of the "isBroken" property, or has its value set to false; these conditions are united by the logical OR. (The generated query uses SELECT temporarily, due to the limited functionality of the ArchiGraph internal SPARQL endpoint which is under active development; the absence of SELECT results is interpreted as the constraint being met.)

Let us note that ArchiGraph implements some features beyond the SHACL specification. For example, a constraint condition can be formed as a positive or a negative statement: a desirable result may be the situation where the described condition is not met, as well as where it is met (see the "Should not be met" checkbox in the rule properties).

A special parameter has to be passed with an object creation or update request in order to turn on constraints checking. If the conditions are not met, the object will be updated anyway, but the operation result will be returned with an error message, in the XML or JSON syntax depending on the request format.

ArchiGraph.Mir will also display the violation result, as it is shown in Fig. 15:

Fig. 15. Constraint violation in the ArchiGraph.Mir interface

In the platform data the following "Violation" object will be recorded: it is a "Violation" class individual whose property points to the checked object. The application components can use these objects to present the constraints checking results to a data consumer.

SHACL inference rules work in a similar way. They are applied on creation or updating of the individuals of the class to which they are bound. If the logical condition is met, the rule updates the existing objects or creates new ones.

Let us consider the following rule as an example: if a smoke detector has produced an alarm and at the same time the temperature sensor reports high temperature, the "Fire" event has to be formed. To relate the two signals, the place where the two sensors are installed should be checked. As this example shows, the variables in the constraints and inference rules may mean the individuals related to the checked object with the object properties, as well as the values of its literal properties. The latter can be constrained considering the literal data type: for the numeric values the comparison operators are applicable, while "contains" and "not contains" are applicable to the strings. The literal values may be compared with constants as well as with the other variables of the same type. The rule condition can be represented graphically as shown in Fig. 16, and the rule will have the following structure:
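Under the hood such a rule corresponds to a SHACL Advanced Features rule. A minimal sketch in Turtle, with all ex: names and the temperature threshold assumed:

    @prefix sh:   <http://www.w3.org/ns/shacl#> .
    @prefix sosa: <http://www.w3.org/ns/sosa/> .
    @prefix ex:   <http://example.com/model#> .

    ex:FireRuleShape a sh:NodeShape ;
        sh:targetClass ex:Alarm ;        # the rule fires on Alarm individuals
        sh:rule [
            a sh:SPARQLRule ;
            sh:construct """
                PREFIX sosa: <http://www.w3.org/ns/sosa/>
                PREFIX ex:   <http://example.com/model#>
                CONSTRUCT { [] a ex:Event ; ex:hasEventType ex:Fire . }
                WHERE {
                    $this sosa:madeBySensor ?smoke .
                    ?smoke ex:installedAt ?place .
                    ?m sosa:madeBySensor ?temp ; sosa:hasSimpleResult ?v .
                    ?temp ex:installedAt ?place .   # the same installation place
                    FILTER (?v > 50)                # high temperature threshold
                }
            """ ;
        ] .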

47 Fig. 16. Inference rule structure

Let us pay attention to the time interval check between the alarm and the measurement. The rules editor handles this as shown in Fig. 17: having bound the variables E and F to the "resultTime" values of the Alarm and the Measurement respectively, in the next rule condition we choose the "variables set" selector, choose the E and F variables in the drop-down menus, choose "has Interval, sec" in the "Relation" column, and set the condition for the interval value in the right column (it should not exceed 60 seconds).

On each object creation or update the platform will apply all the rules which are bound to the class(es) to which the updated object belongs. If the rule condition is met, its conclusion will be applied. The conclusion is represented in ArchiGraph.KMS as shown in Fig. 18:

Fig. 18. Rule conclusion in the ArchiGraph.KMS interface

So, as the result of the inference, an individual of the Event class will be created. It will have the "Fire" type (which is selected from the event types reference data) and the place where the Alarm was registered. We can augment its properties by relating it with the Alarm, the sensor installation place, etc. The UI components displaying the events should be subscribed to the Event class individuals.

The inference rules are convenient for implementing incoming data processing pipelines, which are completely designed at the ontology level and independent of the program code.

Subscribing to Receive Data

In our scenario we have to build a situation center UI which should display the list of incoming alarms and the event created on its basis.

Subscriptions are used to implement this. A component can subscribe to the create, update and delete events of the individuals of any available class; in our example the UI component should subscribe to the Alarm and Measurement classes. The messages are passed through the Kafka message broker at any node available to the platform. This way is preferable when processing large arrays of data, as it guarantees message delivery.

The format of the messages which the platform sends to the subscribed components is identical to the packets which it sends in response to the API calls, with a different root tag name: SubscriptionItems instead of Items. The parameters of the Subscribe tag specify the messages passed to the subscribing system, while the Objects parameter indicates if the component should receive the ABox (individuals) updates. As in the other packets, the Originator has to be indicated.

To set up a subscription, the following parameters have to be specified:

•	the address of the message broker,
•	the pair of topics used for the exchange: one of them will be used to send messages from a component to the platform, another one will pass the responses back to the component,
•	the topic in which the subscription messages will be placed,
•	the topic in which the platform will respond,
•	the classes whose individuals' events the component listens to.

The subscription for processing messages can be created with the following command:
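A sketch of the Subscribe packet, keeping the elements mentioned above; the attribute and tag names besides Objects and Subscribe are assumptions:

    <Subscribe Originator="demo" Objects="1">
        <!-- the classes whose individuals' events we listen to -->
        <Class>Alarm</Class>
        <Class>Measurement</Class>
        <!-- Kafka topics for messages and responses (names illustrative) -->
        <Topic>situation-center-in</Topic>
        <ResponseTopic>situation-center-out</ResponseTopic>
    </Subscribe>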

51 In our example the situation center UI back-end should subscribe to the updates of the Alarm and Measurement classes. The information on these objects should be passed to the front-end using websocket protocol. The front-end built using reactive layout principle should be updated to display the actual information to the users.

It is important to note that we can subscribe to the model (classes and properties) updates too. This allows designing model-driven applications; they should be ready for the model updates and react by changing their data processing algorithms. At the least, an application should be ready for changes of the properties set and add new controls to the UI forms as new properties emerge. A special meta-model can be designed to describe the interface rendering rules, including controls ordering and grouping, visibility rules, etc. The more advanced applications can also be ready for new classes' appearance. They can process these classes' individuals using the logic applicable to their superclasses: for example, an application can display the sensors of any new subclass of the Sensor class. Alternatively, the logic of data handling can be described in a special part of the model, such as the "object display rules".

Synchronization Between Several Platform Instances

Different platform instances can subscribe to each other's messages on the model and data updates. This allows implementing a full or partial synchronization between several instances of the platform.

To be precise, this synchronization is performed between platform clusters run under Kubernetes or Docker Swarm, each of which contains a number of pods with the platform services (see the Deployment chapter). One of the clusters is usually selected as the main one, while the others serve particular organizational units or divisions. The model and master data changes are performed on the main cluster only, and they are propagated to all the secondary clusters. The data exchange schema is shown in Fig. 19.

Depending on the functional tasks, the subscription may be set up not for a whole model but for several classes only (although the platform supports full replication as well). For example, imagine that in our example an organization serves several buildings in different cities, and each building has its own "situation center". It would be risky for all the buildings to use a single ArchiGraph cluster, as the Internet connection loss would lead to the system's shutdown. It is necessary to provide the autonomous mode, when the ArchiGraph cluster situated in the building is reserved, has an uninterruptible power supply and is still functional without the Internet connection. In this case the main cluster can be situated in the management company datacenter, and the secondary clusters are deployed in each served building. The model changes can be performed only from the management company's side, and these changes are propagated to the secondary clusters automatically. The data input (for example, sensors editing), on the contrary, is performed on the secondary clusters and replicated to the main one. A part of the data series can migrate in the same way: for example, the important events and alarms can be replicated to the main cluster, but it does not make sense to translate all the sensors data into it.

A similar schema can be implemented for a reserve cluster. In this case two identical clusters have to be set up, and the mutual subscription for the whole ontology is set up between them. A proxy server should be set up between these clusters and the API users, switching them to the reserve cluster in case the main one is down (or distributing the load between the two clusters permanently). If one of the clusters is down, the users will be seamlessly switched to another one. As the subscription uses Kafka, the messages are retained; when the failed cluster is up again, it will start processing these messages and synchronize its state.

Fig. 19. Data exchange between several ArchiGraph platform instances

SPARQL Endpoint

We have considered the main ways to use the ArchiGraph platform; they are enough for building business applications with it, using ontologies and inference. Now let us describe some special platform features.

The REST API and the message broker subscriptions are the main ways of interacting with the platform, although the "native" data access method for ontology storages is the SPARQL protocol. ArchiGraph offers a SPARQL endpoint allowing to extract any data from the platform, including the data physically situated in the relational or document-oriented databases.

At the moment while this Cookbook is being prepared, the SPARQL endpoint in ArchiGraph allows only data extraction, not providing its updating or deletion. The SPARQL endpoint is under active development, does not yet support all the protocol features and has an experimental status. However, it can be used in real-world scenarios: for example, for extracting the alarms registered in a time interval, the sensors that have produced these alarms, and the related events.

We will use the Apache Fuseki panel to demonstrate the ArchiGraph SPARQL endpoint use: in this panel an arbitrary endpoint URL can be provided, so it can serve as a client for the ArchiGraph endpoint. The query and its results are shown in the Apache Fuseki panel in Fig. 20.
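A query extracting the alarms registered after a given moment and the sensors that produced them might look like this sketch (the class and property names follow our sample model; the ex: namespace is an assumption):

    PREFIX sosa: <http://www.w3.org/ns/sosa/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX ex:   <http://example.com/model#>
    SELECT ?alarm ?sensor ?time WHERE {
        ?alarm a ex:Alarm ;
               sosa:madeBySensor ?sensor ;
               sosa:resultTime ?time .
        FILTER (?time >= "2022-01-01T00:00:00"^^xsd:dateTime)
    }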

Please note that in our example the Alarm and Sensor classes individuals are situated in the Postgres database. This does not change the way we query them: the platform extracts the data from the non-graph storages transparently for the SPARQL client.

Geography and Geometry Support

In business applications such as the situation centers it is often necessary to work with geographic data. ArchiGraph supports the GeoJSON-based data types: PointType, LineType, PolygonType and others. To store such values efficiently, we can create the Point and Polygon type columns in the Postgres database and map them to the ontology properties.

In our sample situation center use case it is necessary to keep the plans of the buildings and the surrounding territory, as well as the locations of the sensors situated within a particular building. To represent the coordinates and shapes in the ontology, we have to create a DatatypeProperty having the PointType range, applicable to the Sensor class individuals, and a property having the PolygonType range, applicable to the individuals of the class representing buildings. We will represent these properties values in the form of WGS 84 coordinates in the GeoJSON format, where the latitudes and longitudes are represented as real numbers; each point of the building polygon has to be represented as a single coordinates pair in the JSON form.

We have to keep in mind that the platform will store the properties values twice: in the text form, as in the above examples, together with the other attributes, and in the geometry column which we have to create additionally; the geodata column mapping looks similar to the attribute mappings shown above.

When the sensor coordinates (a value of the property with the PointType range) and the building shape (a value of the property with the PolygonType range) are stored in the platform, we can perform the search of the sensors situated within the building: we extract the building shape coordinates and use them in the query. Such search is available when the physical data storage offers this possibility; all the other storages will keep the geometry data but will not be able to perform the spatial search.

Languages Support

The ArchiGraph platform allows storing the values of the string literals in various languages. It means that the same property of the same object can have language-annotated values, for example, in English and German. Any languages can be registered in the platform.

Fig. 22. Language switches in ArchiGraph.Mir interface

The following command registers a language in ArchiGraph:
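For example (the exact invocation context is an assumption):

    # register German: the two-letter ISO code, then the readable name
    mdmctl add language de German
    # show the registered languages
    mdmctl list languages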

The add language command is followed by the two-letter ISO code of the language and its readable name; the list languages command shows the list of the registered languages.

There are two language switches in the top-right part of the ArchiGraph.Mir interface, shown in Fig. 22. The control marked as "2" allows choosing the interface language: the interface will be displayed in this language. The control marked as "1" allows choosing the content language. When a language other than the default platform language is chosen, the platform starts to display the literal string properties values in the chosen language, if they are set. When saving an object, all the stored string values are also labelled with the chosen language. These language labels can be further used by any application working with the platform through any API version.

Full-Text Search in Ontologies

At the moment the main version of ArchiGraph platform does not support full-text search by default, but there is a solution allowing to implement it using additional components.

For the full-text search implementation, a Solr cluster has to be deployed. Usually there is a limited number of ontology classes whose objects have literal properties that need full-text indexing. For example, it can be a Document class whose objects represent the documents from the corporate storages. The Solr schema has to be implemented allowing to store the structured object properties as well as the indexed string properties.

The indexing is performed by a special crawler which processes the documents in their source, creates the data objects in the ArchiGraph platform and passes the texts to Solr. The platform's storage adapter provides search functions which consider morphology and relevance scoring, as well as snippets generation and their transfer to the client application. We omit the details of tuning these components here, as they are not included in the main ArchiGraph version.

Glossaries, Lexical Models and Text-to-Facts Transformation

The ontologies are intended for structured facts processing; each fact represents a basic element of knowledge. But the human thinking is strictly bound to the language, so it is hard to build knowledge models not taking into account the lexical component. In this section we will consider the instruments that ontology modeling and the ArchiGraph platform offer for such tasks.

Among the ontologies used for the lexical models creation, SKOS (Simple Knowledge Organization System) should be noted, as it is a popular ontology for glossaries and thesauri development. We use SKOS for building lexical models related with some domain ontologies: adding relations between the skos:Concept individuals and the domain ontology elements allows binding them. It is useful for building glossary applications which allow users browsing the lexicon of some subject matter domain, viewing the related domain entities for each term, and the different senses of every term in various contexts.

The NIF ontology (NLP Interchange Format) is intended for the representation of the text parsing results produced by the NLP (Natural Language Processing) tools. NIF allows modeling the elements of the text: paragraphs, sentences, words and their relations. There are also the templates for named entities mentions discovery (NERs) and for literal values parsing (such as dates, intervals, etc.).

We have developed an additional component of our platform, ArchiGraph.Logos, which is a service for text transformation into structured data and back. It uses lexical ontologies to implement scenarios of structured information retrieval by a natural language query. Consider a query typed in the situation center interface: "The sensors in the A building which were showing temperature above 20 Celsius degree today after 12:00". If the domain ontology is accompanied by the lexical model which relates the terms with the ontology elements, the result obtained from the ArchiGraph platform will be the structured data set matching the query. The service is tolerant to the absence of a part of the terms in the lexical ontology and tries to augment the phrase sense by constructing the most probable interpretation.
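A minimal sketch of such a lexical binding in Turtle; the linking property ex:denotes is an assumption, while the SKOS terms are standard:

    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix sosa: <http://www.w3.org/ns/sosa/> .
    @prefix ex:   <http://example.com/lexicon#> .

    ex:TemperatureSensorTerm a skos:Concept ;
        skos:prefLabel "temperature sensor"@en ;
        skos:altLabel  "thermometer"@en ;
        ex:denotes sosa:Sensor .   # link from the lexical model to the domain ontology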

61 Fig. 23. The result of the query transformation into SPARQL rule

The reverse scenario, the text to facts transformation, is also possible. One of the situation center tasks is registering the incoming calls: the operator can type the caller's speech into the system. Consider that the caller has told the customer service personnel that "there is smoke in the building". Our sample text allows easily determining the event type ("Smoke") and the place where it has occurred, so the service can extract the structured facts from the text and create a new ontology object or amend the existing one.

The third scenario of the natural language processing is the automated or semi-automated inference rules generation. Working with the rules constructor requires some training, so it may be easier for a user to transform the phrases like "If there is smoke in the building, inform the 911 service" into the SHACL rules. In this case a rule can create a task for the employee to inform the 911 service when an Event of the Smoke category appears. The result of a query transformation is shown in the demonstration interface in Fig. 23.

As the above example shows, the system has recognized the objects mentioned in the phrase, and an appropriate ontology class was found for each of them. The relations were established between the objects, following the grammatical structure of the phrase, and the additional object properties were discovered. As a result, the platform returns the objects matching the conditions set in the phrase. In the same way the service can transform natural language texts into sets of facts which can augment the existing information.

Querying External Data With ArchiGraph

External relational and document-oriented databases and even web-services can be plugged into the ArchiGraph platform. There are two ways to do this: using the "transparent" platform storage adapters, or using a special platform add-on, the Logical data mart.

The "transparent" adapters are called this way because they are invisible for the API clients. They are implemented by plugging in a special platform add-on for each external data source. This add-on should implement a standard program interface which is not considered here. The add-on provides an additional type of platform storage, and after its implementation the new data source can be registered in the platform using the mdmctl add storage command. Then one or more classes can be mapped to this adapter.

The Logical data mart is a more universal option which allows setting up the data extraction rules in the ontology. The Logical data mart is a special storage type registered with the mdmctl add storage ldm command (please note that this adapter is not supplied in the main platform version and should be purchased separately). The structured data of the external systems becomes available through mapping rules which relate the ontology elements with the elements of the external systems data structure. The analysts have to set up the particular mappings (in the case of a relational database):

•	between the database tables and the ontology classes,
•	between the database columns and the ontology properties,
•	between the database tables representing relations between other tables and the ontology properties,
•	between the external records and the master data records.
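The registration itself uses the already familiar command; the remaining parameters are installation-specific:

    # register the Logical data mart adapter as a storage
    mdmctl add storage ldm "External ERP data" <connection parameters>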
