PUBLIC 2021-07-07

Document Classification

© 2021 SAP SE or an SAP affiliate company. All rights reserved.

Content

1 What Is Document Classification?...... 3

2 What's New for Document Classification...... 5
2.1 2020 What's New for Document Classification (Archive)...... 6
2.2 2019 What's New for Document Classification (Archive)...... 10

3 Concepts...... 12

4 Metering and Pricing...... 13

5 Supported File Formats...... 15

6 Supported Languages...... 16

7 Initial Setup...... 17
7.1 Tutorials...... 18

8 Best Practices...... 19

9 API Reference...... 20
9.1 Get Access Token...... 20
9.2 Training...... 22
    Classification Scenarios and Document Data Format...... 24
    Training Data Stratification...... 29
9.3 Inference...... 30
    Pre-trained Classification Model...... 31
9.4 Document Language Hint Supported Languages...... 31
9.5 Input Limits...... 32
    Trial Account Input Limits...... 33
9.6 Common Status and Error Codes...... 33

10 Security Guide...... 34
10.1 User Administration, Authentication and Authorization...... 34
10.2 Data Protection and Privacy...... 35
10.3 Auditing and Logging Information...... 37
10.4 Front-End Security...... 38

11 Monitoring and Troubleshooting...... 39

1 What Is Document Classification?

Classify business documents into user-defined categories.

Document Classification is an artificial intelligence business service that analyzes business documents and proposes classifications based on previous customized classification models. To do so, the service includes training and inference capabilities to fit a model using a custom dataset.

Document Classification helps you to apply machine learning to automate the management and processing of large amounts of business documents. With customized classification models, you can use the service in a wide range of business scenarios and adapt it to your special requirements. Use the service in critical business processes such as enterprise mail inbox processing, contract management and invoice processing. Document Classification is part of the SAP AI Business Services portfolio.

Document Classification is an SAP Business Technology Platform (SAP BTP) service available under CPEA (Cloud Platform Enterprise Agreement). The service is also part of the SAP AI Business Services portfolio, and it is integrated into document management in SAP S/4HANA.

Environment

This service runs in the Cloud Foundry environment.

Features

Classify Business Documents
Process business documents and receive classification proposals, based on either custom or pre-trained classification models.

Train Custom Classification Model
Train a custom classification model by providing sample business documents and the corresponding classification labels (ground truth).

Business Benefits

With Document Classification you can:

● Extract information from documents to classify them
● Receive proposals of the best classification outcome for uploaded documents together with a corresponding value of confidence
● Save costs and increase efficiency
● Replace manual work through automatic document classification
● Reduce human errors when processing an increasing volume of documents

Prerequisites

See Initial Setup [page 17].

Scope and Limitations

For information on technical limits, see Input Limits [page 32].

Regional Availability

Get an overview of the availability of Document Classification by region, infrastructure provider, and release status in the Pricing tab of the SAP Discovery Center.

Trial Scope

Document Classification is available for trial use. A trial account lets you try out SAP BTP for free and is open to everyone. Trial accounts are intended for personal exploration, and not for productive use or team development. They allow restricted use of the platform resources and services. The trial period varies depending on the environment.

To activate your trial account, go to Welcome to SAP BTP Trial.

Note

See also the following information: Trial Accounts.

In the Cloud Foundry environment, you get a free trial account for Document Classification with the following constraints: Trial Account Input Limits [page 33].

2 What's New for Document Classification

Technical Component | Capability | Environment | Title | Description | Type | Available as of
--- | --- | --- | --- | --- | --- | ---
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Python Client Library | The Python client library has been updated and now also includes the Document service. Go to Python Client Library. | Changed | 2021-07-26
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2021-07-07
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Security Guide | Auditing and logging information is now available in the Security Guide [page 34]. | New | 2021-07-07
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2021-05-12
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2021-03-16
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Training | Training Data Stratification [page 29] is now available. | New | 2021-03-16
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2021-02-01

2.1 2020 What's New for Document Classification (Archive)

Technical Component | Capability | Environment | Title | Description | Type | Available as of
--- | --- | --- | --- | --- | --- | ---
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Supported File Formats | Document Classification now supports additional file formats. See Supported File Formats [page 15]. | New | 2020-12-07
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Pre-trained Classification Model | A new pre-trained classification model for invoices, payment advices and purchase orders is now available. See Pre-trained Classification Model [page 31]. | New | 2020-12-07
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-12-07
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. The Feature Scope Description for Document Classification has been updated. | Changed | 2020-11-03
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Service Guide Improvements | Supported Languages [page 16] documentation is now available. To get better classification results with Document Classification, see Best Practices [page 19]. | New | 2020-11-03
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Metering and Pricing | A new service plan is available for Document Classification. See Metering and Pricing [page 13]. | New | 2020-10-21
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-10-13
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-09-11
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Tutorials | A new tutorial group is now available for Document Classification. See Use Machine Learning to Classify Documents (Trial Account). | New | 2020-08-28
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-08-28
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Tutorials | A new tutorial group is now available for Document Classification. See Use Machine Learning to Classify Documents (Enterprise Account). | New | 2020-06-18
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Training | Understand how to label a training document. See Classification Scenarios and Document Data Format [page 24]. | New | 2020-06-18
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Metering and Pricing | Metering and Pricing [page 13] documentation has been updated. | Changed | 2020-06-18
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-05-12
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Trial Account | You can now try out Document Classification on SAP Cloud Platform Trial. See Get a Trial Account. Restriction: See Trial Account Input Limits [page 33]. | New | 2020-04-09
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Python Client Library | A Python client library is now available for Document Classification. It provides easy access to the REST API and facilitates the service onboarding process. See Python Client Library. | New | 2020-03-24
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Document Language Hint | The document language hint has been enabled. This new optional parameter can improve the classification results for image-based documents. See Training [page 22], Inference [page 30] and Document Language Hint Supported Languages [page 31]. | Changed | 2020-03-24
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Input Limits | The maximum size of the documents that Document Classification can process has been increased to 25 MB. See Input Limits [page 32]. | Changed | 2020-03-02
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code and stabilization improvements. | Changed | 2020-02-03

2.2 2019 What's New for Document Classification (Archive)

Technical Component | Capability | Environment | Title | Description | Type | Available as of
--- | --- | --- | --- | --- | --- | ---
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several stability and usability improvements. New XSUAA implementation based on sap-xssec library is now available. | Changed | 2019-12-15
Document Classification | Extension Suite - Development Efficiency | Cloud Foundry | New Service | A new service that applies machine learning to automate the management and processing of large amounts of business documents is now available. Use Document Classification in a wide range of business scenarios and adapt it to your special requirements. See Document Classification documentation. | Announcement | 2019-11-12

3 Concepts

See a glossary of definitions for artificial intelligence (AI) and machine learning (ML), and Document Classification concepts in AI & ML Glossary. In the third column Filter, select Document Classification.

4 Metering and Pricing

Usage Metric

Document Classification is metered based on a predefined usage metric consisting of documents, defined as unique records processed by the cloud service. One document can consist of a maximum of three pages. If a document has more than three pages, each additional three pages are charged as an additional document.

Block Size

1 block = 100 documents. The final price is based on the total number of documents uploaded to the service.

Basic Service

Document Classification allows users to train and deploy customer-specific classification models. The first deployed model per customer global account can be used at no extra cost. Only inference requests are charged.

Metric | Tiers | Block Price per Month
--- | --- | ---
Blocks of 100 documents. Minimum consumption: 2 blocks. One document = 3 pages. | 2 to 300 blocks | EUR 10.00
 | 301 to 600 blocks | EUR 8.00
 | More than 601 blocks | EUR 6.50

Example

Cost for 7 blocks = 7 * EUR 10.00 = EUR 70.00.

Cost for 310 blocks = 310 * EUR 8.00 = EUR 2,480.

Cost for 610 blocks = 610 * EUR 6.50 = EUR 3,965.
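The three examples above can be reproduced with a short calculation. The following is a minimal sketch, not an official pricing tool: the function name is hypothetical, and it assumes that the whole monthly volume is billed at the price of the tier it falls into (which is what the examples imply) and that any count above 600 blocks belongs to the top tier.

```python
from decimal import Decimal

# Tier table from the section above: (upper block bound, price per block in EUR).
# Assumption: every count above 600 is billed at the top tier; the source says
# "More than 601 blocks", which leaves exactly 601 unspecified.
TIERS = [(300, Decimal("10.00")), (600, Decimal("8.00")), (None, Decimal("6.50"))]
MINIMUM_BLOCKS = 2  # minimum consumption stated in the tier table


def monthly_block_cost(blocks: int) -> Decimal:
    """Return the monthly inference charge for a number of 100-document blocks."""
    billable = max(blocks, MINIMUM_BLOCKS)
    for upper, price in TIERS:
        if upper is None or billable <= upper:
            # The whole volume is billed at the matching tier's price.
            return billable * price
    raise AssertionError("unreachable: the last tier has no upper bound")


for blocks in (7, 310, 610):
    print(blocks, "blocks -> EUR", monthly_block_cost(blocks))
```

For binding figures, use the pricing information in the SAP Discovery Center rather than this sketch.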

Extended Service

Additional models are charged EUR 0.601 per used hour.

If the customer has more than one model deployed at any point in time, the additional model hours are charged.

Example

Total Charge = Deployed model hours charge + Inference requests in blocks of 100 documents.

48 Blocks are used (unit price/month = EUR 10.00) = 48 * EUR 10.00 = EUR 480.00.

In total: EUR 13.823 + EUR 480.00 = EUR 493.823.
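The total above can be checked with a small calculation. This is a sketch under one inferred assumption: the example does not state the number of additional model hours, but EUR 13.823 is consistent with 23 hours at EUR 0.601 per hour. The function name is hypothetical.

```python
from decimal import Decimal

HOURLY_MODEL_RATE = Decimal("0.601")  # EUR per additional deployed model hour
BLOCK_PRICE = Decimal("10.00")        # EUR per block in the 2-to-300 tier


def total_charge(additional_model_hours: int, blocks: int) -> Decimal:
    """Deployed model hours charge plus inference charge for the given blocks."""
    return additional_model_hours * HOURLY_MODEL_RATE + blocks * BLOCK_PRICE


# 23 hours is inferred from EUR 13.823 / EUR 0.601; the source does not state it.
print(total_charge(23, 48))
```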

Tip

Use the pricing estimator tool.

Note

This is the latest service plan available for Document Classification. Other service plan options might be available for you in the Pricing tab of the SAP Discovery Center depending on when you got your SAP Business Technology Platform global account.

Related Information

SAP Discovery Center
SAP BTP Service Description Guide and Agreement

5 Supported File Formats

Document Classification supports the following document file formats as input:

● document files in PDF format
● document files in single-page PNG and JPEG format

Note

The file name should contain a file extension. For example: “invoice” only, without a file extension, is not a valid file name.

The file name cannot be empty even if a file extension is provided. For example: “.pdf” is not a valid file name.
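The two file name rules in the note can be checked on the client side before uploading. The following is a minimal sketch; the function name is hypothetical, and the set of accepted extensions (in particular both "jpg" and "jpeg") is an assumption derived from the format list above, not a documented list.

```python
# Assumption: these extensions correspond to the supported PDF, PNG and JPEG formats.
ALLOWED_EXTENSIONS = {"pdf", "png", "jpg", "jpeg"}


def is_valid_file_name(file_name: str) -> bool:
    """Check that the name has a non-empty base part and a known extension."""
    stem, dot, extension = file_name.rpartition(".")
    # No dot at all ("invoice") or nothing before the dot (".pdf") is invalid.
    return bool(dot) and bool(stem) and extension.lower() in ALLOWED_EXTENSIONS


for name in ("invoice", ".pdf", "invoice.pdf"):
    print(repr(name), is_valid_file_name(name))
```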

6 Supported Languages

Document Classification supports the following languages:

● Arabic
● Japanese
  ○ Hiragana
  ○ Katakana
● Korean (Hangul)
● Latin
  ○ Croatian
  ○ Czech
  ○ Danish
  ○ Dutch
  ○ English
  ○ Estonian
  ○ Finnish
  ○ French
  ○ German
  ○ Icelandic
  ○ Indonesian
  ○ Italian
  ○ Latvian
  ○ Malaysian
  ○ Polish
  ○ Portuguese
  ○ Serbian
  ○ Slovak
  ○ Spanish
  ○ Swedish
  ○ Turkish
● Russian (Cyrillic)
● Simplified and Traditional Chinese (Hans)
● Thai

7 Initial Setup

To be able to use Document Classification for productive purposes, you must complete some steps in the SAP BTP cockpit.

Tip

See Tutorials [page 18] to find out how to use a trial account (or an enterprise account) to try out the service.

Prerequisites

● You have an enterprise global account on SAP BTP. See Enterprise Accounts.
● You are entitled to use the service.

1. Create a Subaccount in the Cloud Foundry Environment.

To be able to use Document Classification for productive purposes, you need to create a subaccount in your global account using the SAP BTP cockpit.

Tip

See Create a Subaccount in the Cloud Foundry Environment.

2. Enable the Document Classification Service.

To enable Document Classification in the service catalog, using the SAP BTP cockpit, perform the following steps:

1. Configure Entitlements and Quotas.
2. Create Space.
3. Create Service Instance.

Note

In the New Instance wizard, enter only the Basic Info details and leave the Parameters details empty. Instance parameters are not mandatory for this service.

4. Create Service Key.

Tip

See Using Services in the Cloud Foundry Environment.

7.1 Tutorials

Follow a tutorial to get familiar with the Document Classification APIs and functionalities.

Tutorial Group | Description
--- | ---
Use Machine Learning to Classify Documents (Trial Account) | Use a trial account to classify business documents by type (invoice, purchase order or payment advice).
Use Machine Learning to Classify Documents (Enterprise Account) | Use an enterprise account to classify business documents by their content language.

Related Information

Tutorial Navigator

8 Best Practices

To get better classification results with Document Classification, observe the following recommendations:

● Use A4-style format (Europe) or letter format (United States)
● Portrait orientation is preferable (full support of landscape is planned)
● Use high-quality scans
● Avoid handwritten text, which has very limited support
● The ideal resolution is 300 dpi. Good quality needs at least 150 dpi, and higher resolutions (> 300 dpi) should be indistinguishable from 300 dpi. Extremely large files take more time to preprocess (they are scaled back to 300 dpi). Similarly, colors are ignored, and images are converted to grayscale.
● Dark text on a light background is recognized more accurately than light text on a dark background
● Documents marked with markers might lead to bad OCR (Optical Character Recognition) results
● If some parts of the text are oriented differently (for example, rotated 90 degrees) or have a much larger or much smaller font than the rest of the page, they are not detected

Tip

● See Supported File Formats [page 15].
● See also Supported Languages [page 16] and Input Limits [page 32].

9 API Reference

Explore the Document Classification API.

Document Classification is a customizable service which allows customers to create classification models that fit their specific business needs. Therefore, in order to use the service to classify documents, you first have to train a machine learning model based on examples of classified documents.

Before using the Document Classification training and inference endpoints listed in the sections below, you need to retrieve your OAuth access token as described in Get Access Token [page 20].

● Training [page 22]
● Inference [page 30]

To display the comprehensive specification of the Document Classification endpoints in Swagger UI, add the URL path extension /document-classification/v1 to the Document Classification base URL (that is, the url value from outside the uaa section of your service key).

Related Information

Input Limits [page 32]
Common Status and Error Codes [page 33]

9.1 Get Access Token

Retrieve your OAuth access token, which will grant you access to the Document Classification endpoints.

Note

The token is valid for 12 hours. After that, you need to generate a new one.

Request

Base URL: url value from inside the uaa section of the service key

URL Path: /oauth/token

HTTP Method: POST

Request Headers

Header Required Values

Content-Type Yes

Request Parameters

Parameter | Required | Data Type | Description
--- | --- | --- | ---
client_id | Yes | String | The clientid value from the service key.
client_secret | Yes | String | The clientsecret value from the service key.
grant_type | Yes | String | Token grant type. Set it to client_credentials.
response_type | Yes | String | Token response type. Set it to token.

Response

The response is given as a status (200 or 401). See Common Status and Error Codes [page 33].

Response Example 200 “Success”

{
  "access_token": "<< your access token >>",
  "token_type": "bearer",
  "expires_in": 43199,
  "scope": "uaa.resource",
  "jti": "8d00c157058949daab714a44c04c416b"
}

Tip

Alternatively, you can follow the steps in this tutorial to Get OAuth Access Token for Document Classification via Web Browser.
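The token request described in this section could be issued from Python roughly as follows. This is a minimal sketch using only the standard library; the function names are hypothetical. It assumes that the four request parameters are sent as a form-encoded body and that the Content-Type value is application/x-www-form-urlencoded, neither of which is stated explicitly in this section.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(uaa_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Prepare the POST /oauth/token request without sending it."""
    body = urllib.parse.urlencode({
        "client_id": client_id,          # clientid value from the service key
        "client_secret": client_secret,  # clientsecret value from the service key
        "grant_type": "client_credentials",
        "response_type": "token",
    }).encode("ascii")
    return urllib.request.Request(
        uaa_url.rstrip("/") + "/oauth/token",
        data=body,
        method="POST",
        # Assumption: form encoding; the required header value is not given above.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the prepared request and return the access_token field of the response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

A typical call would pass the url, clientid and clientsecret values from the uaa section of the service key to build_token_request and then hand the result to fetch_access_token.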

9.2 Training

See all the training endpoints, divided into groups and listed in the suggested order for first-time use of Document Classification.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-classification/v1

Datasets

The following endpoints relate to datasets (used to train new classification models):

HTTP Method | URL Endpoint Path | Description
--- | --- | ---
GET | /datasets | Get information about all created datasets for a tenant
POST | /datasets | Create a new training dataset entry
DELETE | /datasets/{datasetId} | Delete a specific dataset
GET | /datasets/{datasetId} | Get the summary information of a specific dataset
GET | /datasets/{datasetId}/documents | Get information about all documents in a dataset
POST | /datasets/{datasetId}/documents | Upload a training document containing classification labels (ground truth) to a particular training dataset. See Classification Scenarios and Document Data Format [page 24] to understand how to label a training document. Note: Use the optional parameter document language hint lang to potentially improve classification results for image-based documents. See Document Language Hint Supported Languages [page 31]. See also Training Data Stratification [page 29].
DELETE | /datasets/{datasetId}/documents/{documentId} | Delete a specific training document of the given {datasetId}
GET | /datasets/{datasetId}/documents/{documentId} | Get information about a document in a dataset

Models

The following endpoints relate to models:

HTTP Method | URL Endpoint Path | Description
--- | --- | ---
GET | /models | Get information about trained models
POST | /models/{modelName}/versions | Create a new model for a particular dataset
DELETE | /models/{modelName}/versions/{modelVersion} | Delete a trained model with the given {modelName} and {modelVersion}
GET | /models/{modelName}/versions/{modelVersion} | Get information about a trained model, including training logs, model accuracy and dataset statistics

Deployments

The following endpoints relate to deployments:

HTTP Method | URL Endpoint Path | Description
--- | --- | ---
GET | /deployments | Get information about all deployed models
POST | /deployments | Deploy a model specified by {modelName} and {modelVersion}
DELETE | /deployments/{deploymentId} | Undeploy a successfully deployed model
GET | /deployments/{deploymentId} | Get information about a deployed model with the given {modelName} and {modelVersion}
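As an orientation aid, the POST endpoints from the three groups above can be chained into a first-time training flow: create a dataset, upload labeled documents, train a model, deploy it. The sketch below only assembles the method and URL for each step; the function name and the example identifiers are hypothetical, and request bodies are omitted because they are not described in this section.

```python
from urllib.parse import quote

API_PATH = "/document-classification/v1"  # URL path extension from the Request section


def training_steps(base_url: str, dataset_id: str, model_name: str) -> list:
    """Return (HTTP method, URL) pairs for a first training run, in order."""
    root = base_url.rstrip("/") + API_PATH
    dataset = quote(dataset_id, safe="")
    model = quote(model_name, safe="")
    return [
        ("POST", f"{root}/datasets"),                      # create a dataset
        ("POST", f"{root}/datasets/{dataset}/documents"),  # upload labeled documents
        ("POST", f"{root}/models/{model}/versions"),       # train a model version
        ("POST", f"{root}/deployments"),                   # deploy the trained model
    ]


# Hypothetical values; the base URL comes from outside the uaa section of the service key.
for method, url in training_steps("https://api.example.invalid", "my-dataset-id", "my-model"):
    print(method, url)
```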

9.2.1 Classification Scenarios and Document Data Format

You can use Document Classification to create a machine learning model for customized document classifications, and automate the management and processing of your business documents. Understand how to label documents and corresponding metadata used for training, in the classification scenario examples listed below.

Data Format of the Ground Truth (Labels)

In order to train a customer-specific classification model, you have to create a dataset with sample documents and corresponding ground truth (labels) for each document. With the POST endpoint /datasets/{datasetId}/documents, you can upload a document plus the associated label for the document into an existing dataset. The label has to be provided as a JSON object and describes the class(es) to which a document will be assigned. The different possibilities to define the labels depend on the type of classification and are described in the following sections:

● Single Characteristic, Binary Classification [page 25]
● Single Characteristic, Multi-class Classification [page 25]
● Two Characteristics, Two Binary Classification Tasks [page 26]
● Single Characteristic, Multi-label Classification [page 28]
● Two Characteristics, One Multi-label and One Binary Classification [page 29]

9.2.1.1 Single Characteristic, Binary Classification

For each characteristic of a binary or multi-class classification, there can always be only exactly one value per document (in contrast to multi-label classification).

Labels Example

Consider (all) documents to be classified into one of the following two classes:

Suitable for children, does not include content appropriate only for adults

{
  "classification": [
    { "characteristic": "SuitableForChildren", "value": "yes" }
  ]
}

Not suitable for children, includes content not intended for children

{
  "classification": [
    { "characteristic": "SuitableForChildren", "value": "no" }
  ]
}

9.2.1.2 Single Characteristic, Multi-class Classification

Like binary classification tasks, multi-class classifications are mutually exclusive, but you have more than just two classes.

Labels Example

Consider (all) documents to be classified into one of the following four classes:

Language of text is English

{
  "classification": [
    { "characteristic": "Language", "value": "English" }
  ]
}

Language of text is German

{
  "classification": [
    { "characteristic": "Language", "value": "German" }
  ]
}

Language of text is French

{
  "classification": [
    { "characteristic": "Language", "value": "French" }
  ]
}

Language of text is not English, German or French

{
  "classification": [
    { "characteristic": "Language", "value": "Other" }
  ]
}

9.2.1.3 Two Characteristics, Two Binary Classification Tasks

The resulting number of classes in which a document can fall in the case of mutually exclusive characteristics is the product of the number of classes for each characteristic.

Consider documents to be classified into the following two classes for the first characteristic:

● Suitable for children (does not include content appropriate only for adults)
● Not suitable for children (includes content not intended for children)

Consider documents to be classified into the following two classes for the second characteristic:

● Document includes images
● Document does not include images

Labels Example

Suitable for children (does not include content appropriate only for adults) + Document does not include images

{
  "classification": [
    { "characteristic": "SuitableForChildren", "value": "yes" },
    { "characteristic": "IncludesImages", "value": "no" }
  ]
}

Not suitable for children (includes content not intended for children) + Document includes images

{
  "classification": [
    { "characteristic": "SuitableForChildren", "value": "no" },
    { "characteristic": "IncludesImages", "value": "yes" }
  ]
}

Not suitable for children (includes content not intended for children) + Document does not include images

{
  "classification": [
    { "characteristic": "SuitableForChildren", "value": "no" },
    { "characteristic": "IncludesImages", "value": "no" }
  ]
}

Suitable for children (does not include content appropriate only for adults) + Document includes images

{
  "classification": [
    { "characteristic": "SuitableForChildren", "value": "yes" },
    { "characteristic": "IncludesImages", "value": "yes" }
  ]
}

9.2.1.4 Single Characteristic, Multi-label Classification

Multi-label classification uses the keyword "values" (instead of the singular "value") and brackets […] to indicate that a variable number of labels is possible for each characteristic.

Consider documents to be classified based on the following characteristic with three classes:

● Document content contains animals
● Document content contains humans
● Document content contains plants

Labels Example

Document content contains animals + Document content contains plants

{
  "classification": [
    { "characteristic": "Contains", "values": ["animals", "plants"] }
  ]
}

Document content contains animals + Document content contains humans + Document content contains plants

{
  "classification": [
    { "characteristic": "Contains", "values": ["animals", "humans", "plants"] }
  ]
}

Document content contains no label

{
  "classification": [
    { "characteristic": "Contains", "values": [] }
  ]
}

9.2.1.5 Two Characteristics, One Multi-label and One Binary Classification

For the first characteristic (multi-label classification), the keyword "values" and brackets […] indicate that a variable number of labels is possible. For the second characteristic (binary classification), exactly one of the values "yes" or "no" is used.

Consider documents to be classified based on the following first characteristic with three classes:

● Document content contains animals
● Document content contains humans
● Document content contains plants

Consider documents additionally to be classified into the following two classes for the second characteristic:

● Document includes images
● Document does not include images

Labels Example

Document content contains animals + Document content contains plants + Document does not include images

{
  "classification": [
    { "characteristic": "Contains", "values": ["animals", "plants"] },
    { "characteristic": "IncludesImages", "value": "no" }
  ]
}
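The label objects shown in the scenarios above follow one pattern: a single string goes under "value" (binary and multi-class), and a list goes under "values" (multi-label). A small helper can generate them. This is a sketch only; the function name is hypothetical and not part of the service or its client library.

```python
import json


def build_label(characteristics: dict) -> dict:
    """Build a ground truth label object from {characteristic: label(s)} pairs."""
    entries = []
    for name, labels in characteristics.items():
        if isinstance(labels, (list, tuple)):
            # Multi-label characteristic: variable number of labels under "values".
            entries.append({"characteristic": name, "values": list(labels)})
        else:
            # Binary or multi-class characteristic: exactly one label under "value".
            entries.append({"characteristic": name, "value": labels})
    return {"classification": entries}


label = build_label({"Contains": ["animals", "plants"], "IncludesImages": "no"})
print(json.dumps(label, indent=2))
```

The printed object corresponds to the combined multi-label and binary example in this section.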

9.2.2 Training Data Stratification

By default, the Document Classification service performs dataset stratification for you. In this case, the hash of the byte-wise content of the document will be used to assign 80% of documents to training and 10% each to validation and test set. You do not have to make any configuration if you prefer this approach.

You can also define your own custom stratification for each document in the POST /datasets/{datasetId}/documents endpoint. To use this feature, you have to supply an optional key-value pair in the parameters field of this endpoint. The key has to be stratificationSet, and the valid values are training, validation and test. The document is then put into this set regardless of its hash. When choosing this approach, make sure each stratification set contains documents, otherwise the service will raise an error.
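Both approaches can be illustrated in a few lines. The sketch below is not the service's implementation: the actual hash function is not documented here, so SHA-256 is used purely to show how a content hash can yield a deterministic 80/10/10 split. All function names are hypothetical, and how the parameters field is encoded in the upload request is not covered.

```python
import hashlib
from collections import Counter

VALID_SETS = ("training", "validation", "test")


def hash_based_set(content: bytes) -> str:
    """Deterministically map document bytes to a set with an 80/10/10 split."""
    # Assumption: SHA-256 stands in for the undocumented hash used by the service.
    bucket = int.from_bytes(hashlib.sha256(content).digest()[:8], "big") % 10
    if bucket < 8:
        return "training"
    return "validation" if bucket == 8 else "test"


def stratification_parameters(target_set: str) -> dict:
    """Build the key-value pair for custom stratification of one document."""
    if target_set not in VALID_SETS:
        raise ValueError(f"stratificationSet must be one of {VALID_SETS}")
    return {"stratificationSet": target_set}


def empty_sets(assigned_sets: list) -> list:
    """Return the sets that received no documents, to catch the error case early."""
    counts = Counter(assigned_sets)
    return [name for name in VALID_SETS if counts[name] == 0]


print(stratification_parameters("validation"))
print(empty_sets(["training", "training", "test"]))
```

Running a check like empty_sets before training helps avoid the error the service raises when a stratification set has no documents.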

9.3 Inference

See all the classification endpoints in the suggested order when first using Document Classification.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-classification/v1

Document Classification

The following endpoints relate to document classification:

HTTP Method | URL Endpoint Path | Description
--- | --- | ---
GET | /classification/models/{modelName}/versions/{modelVersion}/documents | Get the processing status of uploaded documents
POST | /classification/models/{modelName}/versions/{modelVersion}/documents | Create a job for document classification. Note: Use the optional parameter document language hint to potentially improve classification results for image-based documents. See Document Language Hint Supported Languages [page 31].
GET | /classification/models/{modelName}/versions/{modelVersion}/documents/{documentId} | Get the result for a specific classification job
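The three inference endpoints share one path pattern, so a single helper can build all of their URLs. This is a minimal sketch: the function name is hypothetical, and the model version "1" and document ID used in the example call are placeholders, not documented values.

```python
from typing import Optional
from urllib.parse import quote

API_PATH = "/document-classification/v1"  # URL path extension from the Request section


def classification_url(base_url: str, model_name: str, model_version: str,
                       document_id: Optional[str] = None) -> str:
    """Build the documents URL for a model, or the result URL for one job."""
    url = (
        f"{base_url.rstrip('/')}{API_PATH}/classification/models/"
        f"{quote(model_name, safe='')}/versions/{quote(str(model_version), safe='')}/documents"
    )
    if document_id is not None:
        # With a document ID, the URL addresses the result of a specific job.
        url += "/" + quote(document_id, safe="")
    return url


# POST to the first URL to create a job, then GET the second URL for its result.
print(classification_url("https://api.example.invalid", "sap_document_type", "1"))
print(classification_url("https://api.example.invalid", "sap_document_type", "1", "doc-123"))
```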

Related Information

Pre-trained Classification Model [page 31]

9.3.1 Pre-trained Classification Model

sap_document_type

This pre-trained classification model predicts a multi-class characteristic called document_type with the 3 classes listed in the table below. See also Single Characteristic, Multi-class Classification [page 25].

Class Description

Invoice Time-stamped commercial document that itemizes and records a transaction between a buyer and a seller.

Payment Advice Document sent by a customer to a supplier to inform the supplier that their invoices have been paid.

Purchase Order Commercial document and first official offer issued by a buyer to a seller indicating types, quantities, and agreed prices for products or services. It is used to control the purchasing of products and services from external suppliers.

Supported Languages

● English
● German

Note

● The sap_document_type model was trained on sample data from an SAP system and might not generalize well to your data. Also, no “Other” category has been included in the training of the model so far, so documents of any other class are likely to be classified unsatisfactorily. If you have a large set of documents, use the service to train your own custom document type model instead.
● If you upload documents in an unsupported language, the request is still processed, but the classification result may not be accurate.

9.4 Document Language Hint Supported Languages

The optional document language hint parameter lang supports the following languages:

Language Code Language

en English (default)

de German


es Spanish

fr French

ja Japanese

ja-vert Japanese (vertical text)

ko Korean

ko-vert Korean (vertical text)

pt Portuguese

ru Russian

zh-Hans Chinese Simplified

zh-Hant Chinese Traditional
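A client can validate the language hint before sending a request. The sketch below copies the codes from the table; the helper name is hypothetical, and treating the codes as case-sensitive is an assumption.

```python
from typing import Optional

# Codes and names copied from the table above.
SUPPORTED_LANG_HINTS = {
    "en": "English",
    "de": "German",
    "es": "Spanish",
    "fr": "French",
    "ja": "Japanese",
    "ja-vert": "Japanese (vertical text)",
    "ko": "Korean",
    "ko-vert": "Korean (vertical text)",
    "pt": "Portuguese",
    "ru": "Russian",
    "zh-Hans": "Chinese Simplified",
    "zh-Hant": "Chinese Traditional",
}
DEFAULT_LANG_HINT = "en"


def resolve_lang_hint(lang: Optional[str] = None) -> str:
    """Return a supported language code, using the default when none is given."""
    if lang is None:
        return DEFAULT_LANG_HINT
    if lang not in SUPPORTED_LANG_HINTS:
        raise ValueError(f"Unsupported language hint: {lang!r}")
    return lang


print(resolve_lang_hint("ja-vert"))
```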

9.5 Input Limits

All Document Classification endpoints exposed to the end user have strict limits on the inputs. See details in the table below.

Input Maximum Limit

Document size 25 MB

Pages per document 100

Number of documents per dataset 10,000

Total amount of data per dataset 100 GB

Total number of datasets per tenant 10

Trained models that are not deployed 10

Note

Datasets are not permanent storage. It is strongly recommended that you clean up datasets no later than 3 months after their creation. The same applies to undeployed models that have not been used (deployed) for 3 months. SAP reserves the right to delete datasets and undeployed models older than 3 months without prior notice.
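The per-document limits can be checked on the client before upload, as in this sketch. The function name is hypothetical, and interpreting 25 MB as 25 * 1024 * 1024 bytes is an assumption, since the table does not define the unit precisely.

```python
from typing import List

# Limits copied from the table above.
MAX_DOCUMENT_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_PAGES_PER_DOCUMENT = 100
MAX_DOCUMENTS_PER_DATASET = 10_000


def check_document(size_bytes: int, page_count: int) -> List[str]:
    """Return a list of limit violations; an empty list means the document fits."""
    problems = []
    if size_bytes > MAX_DOCUMENT_BYTES:
        problems.append(f"document size {size_bytes} bytes exceeds 25 MB")
    if page_count > MAX_PAGES_PER_DOCUMENT:
        problems.append(f"page count {page_count} exceeds {MAX_PAGES_PER_DOCUMENT}")
    return problems


print(check_document(size_bytes=3_000_000, page_count=12))
print(check_document(size_bytes=30 * 1024 * 1024, page_count=250))
```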

9.5.1 Trial Account Input Limits

When using the Document Classification trial account, be aware of the following input limit:

Input Maximum Limit

Uploaded documents per week 40

Restriction

The Document Classification trial account in the Cloud Foundry environment makes a pre-trained classification model available. Therefore, you cannot train a customized machine learning model using the Document Classification trial account.

If you want to try out the Document Classification training endpoints to create your own classification model, you can use an Enterprise Account. See Get an Enterprise Account.

9.6 Common Status and Error Codes

Code Reason

200 Request was successful

202 Request was accepted

400 Bad request

401 Authorization token was not found

404 Unknown training dataset ID, or document ID, or modelName, or modelVersion, or deployment model name and version

409 Request cannot be processed due to a conflict in the current state of the resource, for example a model was not successfully trained, is being deployed or is already deployed, or document processing is still in progress

413 Request exceeds 25 MB

415 MIME type is not supported

429 Rate limit is violated

500 Error occurred while submitting this request
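A client can map these codes to readable messages, as in the sketch below. The reasons are shortened from the table. Treating 429 as the only retry candidate and using exponential backoff are assumptions of this sketch, not documented service behavior.

```python
# Reasons shortened from the table above.
REASONS = {
    200: "Request was successful",
    202: "Request was accepted",
    400: "Bad request",
    401: "Authorization token was not found",
    404: "Unknown dataset, document, or model",
    409: "Conflict in the current state of the resource",
    413: "Request exceeds 25 MB",
    415: "MIME type is not supported",
    429: "Rate limit is violated",
    500: "Error occurred while submitting this request",
}
RETRY_CANDIDATES = {429}  # assumption: only rate limiting is retried automatically


def describe(code: int) -> str:
    """Return the documented reason for a status code."""
    return REASONS.get(code, "Code not listed in the table")


def is_success(code: int) -> bool:
    """200 and 202 are the two success codes in the table."""
    return code in (200, 202)


def backoff_delay(attempt: int, base_seconds: float = 1.0, cap_seconds: float = 30.0) -> float:
    """Exponential backoff delay for retry attempt 0, 1, 2, ... (assumed policy)."""
    return min(cap_seconds, base_seconds * 2 ** attempt)


print(describe(415), is_success(202), backoff_delay(3))
```

A 409 response is deliberately not retried here, because the table lists causes (such as a model that is already deployed) that waiting does not resolve.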

10 Security Guide

Get an overview of the security information that applies to Document Classification. Learn about the main security aspects of the service and its components.

General Information

Document Classification provides a set of RESTful application programming interfaces (APIs) over which client applications can communicate directly with the service in order to classify documents based on customer-specific requirements. All communication with the APIs is secured via the HTTPS protocol. Document Classification is a pure API-based service: it provides no graphical user interface and has no dedicated front end.

Related Information

User Administration, Authentication and Authorization [page 34]
Data Protection and Privacy [page 35]
Auditing and Logging Information [page 37]
Front-End Security [page 38]

10.1 User Administration, Authentication and Authorization

Introduction

For Document Classification, the standard user authentication and authorization mechanisms provided by SAP Business Technology Platform (SAP BTP) for Cloud Foundry are used. Following the standard mechanism of Cloud Foundry, the service consumer can create an instance of the Document Classification service and then generate credentials to communicate with the service instance (see Using Services in the Cloud Foundry Environment in the SAP BTP documentation for details on this process).

The credentials enable the user to retrieve a JSON Web Token (JWT), which is necessary for the secure communication between any client and the service. Overall, the communication with the service is secured by the OAuth 2.0 protocol. For more information on this topic, see Data Privacy and Security in the SAP BTP documentation.
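The token retrieval described above can be sketched as follows. This is an illustration under assumptions, not the documented procedure: it assumes that the service key contains a uaa section with the fields clientid, clientsecret, and url, and that the token endpoint is /oauth/token with the client credentials grant. The request is only constructed, not sent, and all example values are placeholders.

```python
import base64
import urllib.request
from typing import Dict


def build_token_request(service_key: dict) -> urllib.request.Request:
    """Build (but do not send) a token request from a service key."""
    uaa = service_key["uaa"]  # assumed field names: clientid, clientsecret, url
    credentials = f"{uaa['clientid']}:{uaa['clientsecret']}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    url = uaa["url"].rstrip("/") + "/oauth/token?grant_type=client_credentials"
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {basic}"}, method="GET"
    )


def bearer_header(access_token: str) -> Dict[str, str]:
    """Header to attach the JWT to subsequent calls to the service."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder service key; real values come from your own service instance.
example_key = {
    "uaa": {
        "clientid": "id",
        "clientsecret": "secret",
        "url": "https://uaa.example.invalid",
    }
}
request = build_token_request(example_key)
print(request.full_url)
```

Keeping the client secret out of source code (for example, by loading the service key from a protected file) is advisable in any real client.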

User Administration and Provisioning

The application does not manage or provision users.

10.2 Data Protection and Privacy

Introduction

Data protection is associated with numerous legal requirements and privacy concerns. In addition to compliance with general data privacy acts, it is necessary to consider compliance with industry-specific legislation in different countries. This section describes the specific features and functions that SAP provides to support compliance with the relevant legal requirements and data privacy.

This section and any other sections in this Security Guide do not give any advice on whether these features and functions are the best method to support company, industry, regional or country-specific requirements. Furthermore, this guide does not give any advice or recommendations with regard to additional features that would be required in a particular environment; decisions related to data protection must be made on a case-by-case basis and under consideration of the given system landscape and the applicable legal requirements.

Note

In the majority of cases, compliance with data privacy laws is not a product feature.

SAP software supports data privacy by providing security features and specific data-protection-relevant functions such as functions for the simplified blocking and deletion of personal data.

SAP does not provide legal advice in any form. The definitions and other terms used in this guide are not taken from any given legal source.

Document Classification generally requires the following types of data:

Data Purpose

Document data for inference Consists of single documents, which are submitted by a user in order to receive machine learning predictions. This data is stored temporarily in SAP Business Technology Platform (SAP BTP) Cloud Foundry and removed automatically 1 hour after the machine learning predictions are generated.

Document data for training Consists of documents and corresponding metadata containing classification labels (ground truth) and is used in the machine learning training process to create a model for customized document classifications. This data is stored in SAP BTP Cloud Foundry and is under full control of the service user. It is removed by user request via a REST API endpoint or when the instance of the Document Classification service is deleted.

Read Access Logging

The training data used by Document Classification is controlled and managed by the consuming application/ customer which calls the service APIs. However, Document Classification does not have any means to verify whether the data uploaded to the service contains any personal information. Therefore, Document Classification does not support logging of read access to (sensitive) personal data.

Information Report

The data used by Document Classification is controlled and managed by the consuming application/customer which calls the service APIs. Document Classification does not have any means to verify whether the data uploaded to the service contains any personal information. Therefore, the service does not provide any means to retrieve personal data of specific individuals. It is recommended that the consuming application/customer that uses the service provide personal data reports to its users about the data being stored and transferred to Document Classification for processing.

Deletion of Personal Data

Document Classification does not explicitly process any personal data. The service does not have any means to verify whether the data uploaded to the service contains any personal information. Therefore, no dedicated functionality for the deletion of personal data is available. However, customers may delete data on demand, or the data is deleted automatically after the defined retention period. See the table below for more details.

Data Deletion

Document data for inference The client system sends documents 24 hours a day, 7 days a week. The data retention period is 1 hour.

Document data for training The training documents and corresponding metadata are uploaded via a controlled manual or semi-automatic process. There is no data retention period, and the data can be deleted whenever the customer/consuming application wants by using the respective API endpoints or as a part of account termination.

Change Log

The data used by Document Classification is controlled and managed by the consuming application/customer which calls the service APIs. The service itself does not allow any change to the content of the uploaded data. Therefore, Document Classification does not support logging of data change.

Consent

According to the Personal Data Processing Agreement for SAP Cloud Services, SAP acts as a data processor. Thus, customers are responsible for obtaining the relevant consent to process personal data, including, when applicable, approval by controllers to use SAP as a processor.

10.3 Auditing and Logging Information

Here you can find a list of the security events that are logged by the Document Classification service.

Security events written in audit logs

The events are listed by event grouping. Each entry names the event that is logged, followed by the log messages that identify it.

Authentication related events
● Authentication failure
  Failed login attempt for tenant {tenant_id} on {time}

Dataset related events
● Deletion of datasets
  Deletion of dataset:{dataset_id} successful
  Deletion of dataset:{dataset_id} failed

Model related events
● Deletion of models
  Deletion of model:{model_id} successful
  Deletion of model:{model_id} failed
● Deployment of trained models
  Deployment of model: {model_id} successful
  Deployment of model: {model_id} failed
● Un-deployment of models
  Un-deployment of model: {model_id} successful
  Un-deployment of model: {model_id} failed

Tenant related events
● Tenant provision
  "Tenant provisioned" and ID consisting of: targetTenant {tenant_id}
  Attribute with name "state" was changed from "DOES_NOT_EXIST" to "SUCCESSFULLY_PROVISIONED"
● Tenant de-provision
  "Tenant de-provisioned" and ID consisting of: targetTenant {tenant_id}
  Attribute with name "state" was changed from "SUCCESSFULLY_PROVISIONED" to "SUCCESSFULLY_DEPROVISIONED"

Additional information

See below the definitions of the notations used in the log events.
● {dataset_id}: ID of the dataset.
● {model_id}: ID of the machine learning model.
● {tenant_id}: ID of the tenant used to access the service.
● {time}: time stamp of when a log was created. You can use time stamps to sort the logs by time.
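The dataset and model messages listed above follow a regular shape, so they can be matched with a single pattern. The sketch below is a hypothetical helper for filtering exported audit log messages. It assumes that the message text appears exactly as listed and accepts an optional space after the colon, since both forms occur in the list.

```python
import re
from typing import Dict, Optional

# Matches e.g. "Deletion of dataset:abc successful" or "Deployment of model: m1 failed".
EVENT_PATTERN = re.compile(
    r"^(?P<action>Deletion|Deployment|Un-deployment) of "
    r"(?P<kind>dataset|model):\s*(?P<id>\S+) (?P<outcome>successful|failed)$"
)


def parse_event(message: str) -> Optional[Dict[str, str]]:
    """Return the parts of a dataset or model event, or None if it does not match."""
    match = EVENT_PATTERN.match(message.strip())
    return match.groupdict() if match else None


print(parse_event("Deletion of dataset:abc123 successful"))
print(parse_event("Un-deployment of model: m1 failed"))
```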

Related Information

Audit Logging in the Cloud Foundry Environment

10.4 Front-End Security

Document Classification does not have any user interface component. All functionalities are delivered via web services, JSON over HTTPS.

The service is a backend-only service component and is not designed to be invoked by a web browser. Additionally, outputs returned by the service depend on the data submitted to it. Therefore, a consumer of the service should sanitize both the data submitted to the service and the data returned by it to avoid script injection attacks.
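If a consuming application renders service output in a web page, one common safeguard is to escape all strings before display. The sketch below uses Python's standard html.escape; the shape of the example result is hypothetical and does not reflect the service's actual response format.

```python
import html
from typing import Any


def escape_for_html(value: Any) -> Any:
    """Recursively HTML-escape every string in a JSON-like structure."""
    if isinstance(value, str):
        return html.escape(value, quote=True)
    if isinstance(value, dict):
        return {escape_for_html(k): escape_for_html(v) for k, v in value.items()}
    if isinstance(value, list):
        return [escape_for_html(item) for item in value]
    return value  # numbers, booleans, and None are returned unchanged


# Hypothetical result structure, used only to demonstrate the escaping.
result = {"label": "<script>alert(1)</script>", "score": 0.9}
print(escape_for_html(result))
```

Escaping at the point of rendering, rather than at storage time, keeps the original data intact for other consumers.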

11 Monitoring and Troubleshooting

Find out how to get support.

Getting Support

If you encounter an issue with this service, we recommend that you follow the procedure below.

Check Platform Status

Check the availability of the platform at SAP Trust Center.

For more information about selected platform incidents, see Root Cause Analyses.

Check Guided Answers

In the SAP Support Portal, check the Guided Answers section for SAP Business Technology Platform. You can find solutions for general platform issues as well as for specific services there.

Contact SAP Support

You can report an incident or error through the SAP Support Portal. For more information, see Getting Support.

Please use the following component for your incident:

Component Name Component Description

CA-ML-BDP Services related to Business Document Processing

When submitting the incident, we recommend including the following information:

● Region information (Canary, EU10, US10, for example)
● Subaccount technical name
● The URL of the page where the incident or error occurs
● The steps or clicks used to replicate the error
● Screenshots, videos, or the code entered

Important Disclaimers and Legal Information

Hyperlinks

Some links are classified by an icon and/or a mouseover text. These links provide additional information. About the icons:

● Links with the icon : You are entering a Web site that is not hosted by SAP. By using such links, you agree (unless expressly stated otherwise in your agreements with SAP) to this:

● The content of the linked-to site is not SAP documentation. You may not infer any product claims against SAP based on this information.
● SAP does not agree or disagree with the content on the linked-to site, nor does SAP warrant the availability and correctness. SAP shall not be liable for any damages caused by the use of such content unless damages have been caused by SAP's gross negligence or willful misconduct.

● Links with the icon : You are leaving the documentation for that particular SAP product or service and are entering a SAP-hosted Web site. By using such links, you agree that (unless expressly stated otherwise in your agreements with SAP) you may not infer any product claims against SAP based on this information.

Videos Hosted on External Platforms

Some videos may point to third-party video hosting platforms. SAP cannot guarantee the future availability of videos stored on these platforms. Furthermore, any advertisements or other content hosted on these platforms (for example, suggested videos or by navigating to other videos hosted on the same site), are not within the control or responsibility of SAP.

Beta and Other Experimental Features

Experimental features are not part of the officially delivered scope that SAP guarantees for future releases. This means that experimental features may be changed by SAP at any time for any reason without notice. Experimental features are not for productive use. You may not demonstrate, test, examine, evaluate or otherwise use the experimental features in a live operating environment or with data that has not been sufficiently backed up. The purpose of experimental features is to get feedback early on, allowing customers and partners to influence the future product accordingly. By providing your feedback (e.g. in the SAP Community), you accept that intellectual property rights of the contributions or derivative works shall remain the exclusive property of SAP.

Example Code

Any software coding and/or code snippets are examples. They are not for productive use. The example code is only intended to better explain and visualize the syntax and phrasing rules. SAP does not warrant the correctness and completeness of the example code. SAP shall not be liable for errors or damages caused by the use of example code unless damages have been caused by SAP's gross negligence or willful misconduct.

Gender-Related Language

We try not to use gender-specific forms and formulations. As appropriate for context and readability, SAP may use masculine word forms to refer to all genders.

www.sap.com/contactsap

© 2021 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.

Please see https://www.sap.com/about/legal/trademark.html for additional trademark information and notices.
