Analysis and Improvement on Knowledge Representation Model of XBRL


Ji Ma, Shuang Zhang
Research Center of Finance Sciences and Technology, Graduate University of the Chinese Academy of Sciences

Abstract: In this paper, we introduce the basic concepts and principles of XBRL (eXtensible Business Reporting Language) and describe the knowledge representation models of the specifications XBRL 2.1, Dimensions 1.0, XBRL Generic Links 1.0 and Formula 1.0. We analyze the limitations of these models and then develop a more generic knowledge representation model based on the requirements of business report representation. Our aim is to improve the extensibility and representational capability of the model and to reduce its redundancy.

Keywords: XBRL, Knowledge Representation, Financial Report, XML

1 Introduction

As computer systems are applied more widely, software systems face increasingly complex requirements. Beginning in 2000, in the field of financial report representation, the American Institute of Certified Public Accountants (AICPA) launched the XBRL (eXtensible Business Reporting Language) initiative. Its goal is that software can identify and handle different financial reports and adapt to changes in financial reporting rules without redevelopment. Today, software applications focus mainly on displaying, editing, storing and transferring financial reports. After several years of development, XBRL has grown into a knowledge representation architecture that comprises the XBRL 2.1 Specification and several other specifications. This paper analyzes the XBRL specifications and their limitations and proposes ideas for overcoming those limitations.

2 Knowledge and Knowledge Representation

There are many definitions of knowledge. Generally speaking, knowledge is considered a subset of information, namely refined information. In the field of knowledge representation there are two classical theories: the Semantic Network and Ontology. Both are ways of representing knowledge, and both can be described by labeled directed graphs. Ontology is originally a philosophical concept concerning how human beings understand the world. In 1991, Neches and his colleagues introduced Ontology into the field of artificial intelligence (AI)[9]. Guarino uses the degree of precision and the level of domain dependence as criteria for classifying ontologies[11]. There are four levels of ontology, namely top-level, domain, task and application. Ontology applies a reuse mechanism to describe knowledge, focusing on the common features of knowledge. In other words, an ontology is a specification of a shared conceptualization; it is a conceptual model oriented toward domain knowledge. Compared with Ontology, the Semantic Network is more flexible for describing knowledge and semantics, even natural language. However, Ontology is better suited to domain knowledge representation, that is, to describing precise and well-organized knowledge. XBRL is an application of Ontology: it arose from the need to express financial reports and was then extended to other similar reporting domains to describe the organized knowledge in those domains.

Domain knowledge usually includes a conceptual model, a set of facts and a set of rules; the conceptual model contains the concepts, the attributes of the concepts and the relations between the concepts. In this paper we use "reporting domain" to denote the domain knowledge described by XBRL, which originated from the requirements of financial reporting. In this domain, the knowledge model is centered on facts and has three levels of content: fact, concept and rule. Facts are the body of the content and can be further divided into data facts and non-data facts. Concepts constrain facts by identifying the domain of a fact; for example, time is a concept. Rules relate concepts and facts and describe the relations between facts. Rules include logical rules and operational rules.

The main task of domain knowledge representation is to express organized knowledge so that an agent can recognize and handle it. The main criteria of knowledge representation are representation ability (extensibility, precision), reasoning process (validity, efficiency) and user experience (readability, modularity). This paper analyzes the XBRL representation model mainly with respect to representation ability and reasoning process.

3 XBRL Representation Model Analysis

The XBRL specifications are a family of specifications that adopt XML as the knowledge representation method and describe the knowledge of financial reports through XML and XML Schema documents. According to the classification in "XBRL Specification and Guidance Stack (SGS) 1.0"[2], the XBRL specifications are divided into three levels. The first level is the technological base, including the XBRL Specification, its conformance suites and the Formula linkbase. The second level consists of modeling rules, which are constraints imposed on models. The third level is application guidance, covering taxonomies and instance documents.

3.1 XBRL2.1

The XBRL 2.1 Specification[3] is the core of XBRL. It supplies a knowledge representation architecture for describing business reports. XBRL 2.1 consists of three layers. The first is the metadata layer, which includes four XML Schema documents. The second is the taxonomy, and the third is the XBRL instance.

The following figure illustrates the semantic elements on which XBRL focuses.

Figure 1. Semantic elements of XBRL 2.1 (diagram not reproduced; its nodes are Item/Concept with the balance attribute, Tuple, Label with the lang attribute, Reference, Facts, Unit, and Context, which includes Period and Entity; its arcs are the calculation, presentation and definition arcs).

An XBRL 2.1 taxonomy includes the definitions of concepts and relationships and consists of a schema and linkbases. The schema defines concepts, and the linkbases express relationships. Facts are held in instance documents.

3.2 Dimensions 1.0

Dimensions 1.0 is based on XBRL 2.1. It enhances the ability to represent multidimensional data. We group dimensions into two types. The first type includes the dimensions related to the entity, for example organization structure. All other dimensions belong to the second type; they may include region, product and so on. For example, the region dimension has the members AG, APAC and EMEA, and AG in turn has the members America and Canada. The following diagram illustrates a simple model of Dimensions 1.0.

Figure 2. Multidimensional data in Dimensions 1.0 (diagram not reproduced; a primary item is attached to a multidimensional data set whose hypercube holds a Region dimension, with members such as All, Apac, China, Japan and EMEA, and a Product dimension with its own members).

A hypercube is a collection of dimensions that organizes the multidimensional data set. A dimension is an abstract concept that is used to organize domain members but not to restrict facts. Domain members are concepts used to describe facts, and there are parent-child relationships between them. All of these concepts are defined in the dimensions taxonomy.
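The parent-child structure of domain members can be sketched in a few lines of Python. This is only an illustration of the region example from the text, not the Dimensions 1.0 syntax: the class name `Member`, the method `descendants` and the dictionary used for the hypercube are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """A domain member; `children` models the parent-child relationships."""
    name: str
    children: list["Member"] = field(default_factory=list)

    def descendants(self) -> list[str]:
        """Return the names of all members below this one, depth first."""
        names = []
        for child in self.children:
            names.append(child.name)
            names.extend(child.descendants())
        return names


# Region dimension from the text: AG, APAC and EMEA, with AG split further.
region = Member("All", [
    Member("AG", [Member("America"), Member("Canada")]),
    Member("APAC"),
    Member("EMEA"),
])

# Assumption: a hypercube is modelled simply as a named collection of dimensions.
hypercube = {"Region": region, "Product": Member("All", [Member("hardware")])}

print(region.descendants())
```

In this sketch the dimension itself only organizes members; nothing in it restricts which facts may exist, which mirrors the role the text assigns to a dimension.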

3.3 Formula Specification 1.0

As XBRL was applied in practice, new requirements emerged, in particular how to represent the relationships between facts. The Formula specification[5] was proposed to address them. This specification has three components: formula, fact variable and filter. Agent software can extract the values of facts from an instance document via fact variables and filters. A formula expresses a mathematical formula in terms of fact variables.
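The interplay of the three components can be sketched as follows. This Python fragment is not Formula 1.0 syntax; it only imitates the idea that filters select facts, a fact variable binds the selected fact, and a formula computes over the bound values. The names `Fact`, `concept_is`, `period_is` and `bind`, as well as the sample values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fact:
    concept: str
    period: str
    value: float


# A filter is a predicate over facts; a fact variable is a conjunction of filters.
Filter = Callable[[Fact], bool]


def concept_is(name: str) -> Filter:
    return lambda fact: fact.concept == name


def period_is(period: str) -> Filter:
    return lambda fact: fact.period == period


def bind(facts: list[Fact], filters: list[Filter]) -> float:
    """Evaluate a fact variable: return the value of the single matching fact."""
    matches = [f for f in facts if all(flt(f) for flt in filters)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one fact, found {len(matches)}")
    return matches[0].value


# Hypothetical instance data, invented for illustration.
instance = [
    Fact("Assets", "2007", 150.0),
    Fact("Liabilities", "2007", 90.0),
    Fact("Equity", "2007", 60.0),
]

# Formula over three fact variables: Assets = Liabilities + Equity.
assets = bind(instance, [concept_is("Assets"), period_is("2007")])
liabilities = bind(instance, [concept_is("Liabilities"), period_is("2007")])
equity = bind(instance, [concept_is("Equity"), period_is("2007")])
balanced = assets == liabilities + equity
print(balanced)
```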

3.4 Generic Links

XBRL 2.1 defines five types of linkbase. XBRL International later developed generic links so that users can extend linkbases. However, generic links have not replaced the existing linkbases.

4 Problem Analysis

An analysis of the requirements and the representation models of the XBRL specifications shows problems in three main aspects: the accuracy of representation, the flexibility of the model and the logic ability.

4.1 Accuracy in Representation

With respect to accuracy of representation, XBRL meets the present needs of business reporting, but some requirements remain unmet in the definition of language labels, the demarcation between concepts and facts, and the representation of concept relationships.

• In the XBRL 2.1 Specification, human-language labels cannot be added to elements such as unit and balance, which reduces human-computer interaction ability. In instance documents, a unit is defined with a measure such as iso4217:CNY. No label can be attached to the unit because it is not defined in a taxonomy; only its id attribute and its measure can be displayed. However, the id may be just a meaningless number, and the measure is also difficult to understand. The same holds for the predefined attribute balance. Furthermore, XBRL cannot provide labels for some non-concept elements such as role/arcRole.

• Dimensions 1.0 defines dimension facts in the schema document. Generally speaking, a schema document defines concepts, for example the region dimension, the product dimension and the domain member concept, whereas a member fact should belong to the fact level. For example, L.A. is a fact element. The definitions of concepts and facts should be placed in different layers.

4.2 Model Flexibility

Model flexibility includes extensibility and reusability. As representation requirements have grown, several design problems have appeared in XBRL. Although the XBRL specifications have met the new requirements, they have done so by designing new representation models rather than by extending the existing one.

• The XBRL 2.1 Specification defines financial domain knowledge in the metadata layer (specification layer), which blurs the difference between the representation model and the domain knowledge and reduces the extensibility of the representation model.

• After Dimensions 1.0 was added, the reusability of the context in the XBRL 2.1 Specification was reduced and its redundancy increased. For example, if we want to define "incoming" for the dimension members "all-regions" and "hardware" and also for "apac" and "hardware", we must define two contexts as follows:

Context 1: entity abc; period 2007-01-01 to 2008-01-01; dimension members all-region, hardware

Context 2: entity 晨光公司 (Chenguang Company); period 2007-01-01 to 2008-01-01; dimension members apac, hardware

From the above we can see that the vast majority of the code is redundant; only one line, the one for the differing dimension member, is different. As the number of dimensions increases, this produces a large amount of redundancy, and it becomes difficult to guarantee the consistency of the code.
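The redundancy can be made concrete with a short Python sketch that builds two such contexts and counts how many values actually differ. The element names follow the usual XBRL context structure, but namespace declarations are omitted, and the helper names `make_context` and `leaves`, the identifier scheme URL and the dimension names `region` and `product` are assumptions made for illustration. The sketch also assumes that both contexts describe the same entity, so that only the region member differs.

```python
import xml.etree.ElementTree as ET


def make_context(context_id, entity, start, end, members):
    """Build one context element; `members` maps dimension name to member name."""
    ctx = ET.Element("context", id=context_id)
    ent = ET.SubElement(ctx, "entity")
    # Hypothetical identifier scheme, used only for this sketch.
    ident = ET.SubElement(ent, "identifier", scheme="http://example.com/entities")
    ident.text = entity
    segment = ET.SubElement(ent, "segment")
    for dimension, member in members.items():
        em = ET.SubElement(segment, "explicitMember", dimension=dimension)
        em.text = member
    period = ET.SubElement(ctx, "period")
    ET.SubElement(period, "startDate").text = start
    ET.SubElement(period, "endDate").text = end
    return ctx


def leaves(ctx):
    """Collect every element that carries a text value."""
    return [(el.tag, el.get("dimension"), el.text) for el in ctx.iter() if el.text]


c1 = make_context("c1", "abc", "2007-01-01", "2008-01-01",
                  {"region": "all-region", "product": "hardware"})
c2 = make_context("c2", "abc", "2007-01-01", "2008-01-01",
                  {"region": "apac", "product": "hardware"})

same = sum(a == b for a, b in zip(leaves(c1), leaves(c2)))
differing = len(leaves(c1)) - same
print(f"{differing} of {len(leaves(c1))} values differ")
print(ET.tostring(c2, encoding="unicode"))
```

Every additional dimension member combination requires another full copy of the entity and period, which is the growth in redundancy described above.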

4.3 Logic Ability

XBRL was originally designed mainly for representing concepts, facts and the relations between them. Today, a growing number of logical relations must be represented, which poses a challenge to XBRL. At present, XBRL represents calculations through the calculation linkbase and Formula, but it cannot represent logical relations. As a result, agent software developed for XBRL lacks logical reasoning ability. Moreover, even the ability of XBRL to represent calculations is still limited.

• The calculation linkbase can only represent operational rules between concepts, and Formula was created to overcome this limitation. Formula represents calculation relations under different contexts, which means it can represent calculation relations between facts of different concepts, different entities, different periods and different dimensions. However, in terms of representation, Formula and the calculation linkbase overlap and are therefore redundant.

• In Formula, rules are described as expressions, which makes software development and extension difficult, and the ability to represent logical rules is weak.

4.4 How to improve

To resolve these challenges thoroughly, the XBRL knowledge representation model needs to be improved in the following aspects:

• Identify the division between concept and fact in each specification and, based on these divisions, rebuild the representation method. Concepts in different specifications should be represented by one uniform method, and so should facts. This work forms the base framework of the representation model and requires an appropriate division based on the classification of ontology. A suitable framework benefits the extensibility of the model.

• Distinguish domain knowledge from meta-knowledge. To satisfy the requirement of extensibility, we need to divide knowledge into meta-knowledge, static domain knowledge and dynamic domain knowledge.

• Use triples, each consisting of a source node, an arc relation and a target node, to represent all kinds of knowledge. All nodes and relations are elements. Relations between facts can also express the constraints that were originally carried by the context: creating constraints between item facts and dimension facts reduces representational redundancy in multi-dimensional situations and increases extensibility at the same time.

• Separate calculation relations from the linkbase and build a uniform reasoning model of relations that includes arithmetic operations and logical operations. All relations should take the form of a binary relation, which is an element with a return value, such as y = f(x, z), in which y represents the relation element; f is the rule of the relation, which may be an arithmetic or a logical operation; x is the left element of the binary relation; and z is the right element. By combining such relation elements, the reasoning model can represent more complicated relations.

• Build a uniform readable label model that is independent of the other models and can supply readable resources such as labels and references for any element. In this way, the readability of the represented knowledge increases.
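The binary relation form y = f(x, z) and its composition can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `Relation`, its `value` method and the sample numbers are hypothetical, and Python's `operator` module stands in for the arithmetic and logical rules.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Union

Operand = Union["Relation", float, bool]


@dataclass
class Relation:
    """A binary relation element y = f(x, z) that itself yields a value."""
    f: Callable      # rule of the relation: an arithmetic or logical operation
    x: Operand       # left element
    z: Operand       # right element

    def value(self):
        # An operand may be a plain value or another relation element.
        left = self.x.value() if isinstance(self.x, Relation) else self.x
        right = self.z.value() if isinstance(self.z, Relation) else self.z
        return self.f(left, right)


# Arithmetic relation: liabilities + equity (values invented for illustration).
total = Relation(operator.add, 90.0, 60.0)
# Comparison relation that reuses the arithmetic one: assets == total.
balanced = Relation(operator.eq, 150.0, total)
# Logical relation combining two relation elements: balanced AND assets > 0.
rule = Relation(operator.and_, balanced, Relation(operator.gt, 150.0, 0.0))

print(total.value(), rule.value())
```

Because every relation returns a value, a relation can serve as the left or right element of another relation, which is how more complicated arithmetic and logical rules are composed from the same element type.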

5 Improvement

Based on the foregoing analysis, we propose several improvements to the XBRL knowledge representation model. The knowledge representation model can be divided into four layers: the representation model layer, the domain model layer, the fact layer and the application layer. The representation model layer, which evolves from the XBRL specifications, defines the carrier form and the representation form of knowledge. The domain model layer focuses on the business rules of the reporting domain and takes the place of the XBRL taxonomy. This layer is composed of an ontology and rules: the ontology defines the concepts of the domain knowledge and their relations, while the rules define complex logical reasoning and calculation. The fact layer corresponds to instance documents. Next we sort out the domain knowledge, abstract its general characteristics, and then derive the elements needed in the representation model and the domain model.

The representation of knowledge in XBRL can be divided into facts and concepts. The following figure describes a simple topology of them.

Figure 3. Topology of concepts and facts (diagram not reproduced; its nodes include Abstract Concept, Structure Concept, Resource, Hypercube, Tuple, Item, Explicit Dimension and Member Concept on the concept side, and abstract facts, item facts, member facts, unit, period, entity, label and reference on the fact side; its arcs are extend, nest, define, constraint, extend2, label arc and reference arc).

In the diagram above, knowledge is divided into elements and relations. Elements are divided into two parts: one is the concept, which is the ontology of all concepts, and the other is the abstract fact, which is the ontology of all facts. Unlike Dimensions 1.0, we extract facts from the concept domain, add them to the fact domain, and restrict them through constraint relations. Note that label and reference resources serve human-computer interaction and carry no meaning for knowledge representation itself. Therefore, we supply a set of resource labels for the knowledge that is independent of the domain model.

The knowledge has now been abstracted. In the representation model layer, the hypercube, the explicit dimension and the tuple, which share the same attributes, are unified as tuple, while subjects and domains that share the same attributes are unified as item. We next consider how to represent the basic data types, tuples, items, relations and so on. Relations fall into three main groups: relations between concepts, relations between facts, and relations between concepts and facts. The following table shows the relations.

Table 1

No | Relation | Arcs | Scenario | Type
r1 | Extend concept | extend | Relationships between concepts | Many to one
r2 | Nest concept | nest | Relationships between concepts | One to many
r3 | Define | define | Relationships between concept and fact | One to many
r4 | Constraint | constraint | Relationships between facts | Many to many
r5 | Extend fact | extend2 | Relationships between facts | Many to one
r6 | Refer to label | label arc | Refer to labels, from both concepts and facts | Many to many
r7 | Refer to reference | reference arc | Refer to references, from both concepts and facts | Many to many

In Table 1, the relation r5 is included for explanation only and does not exist in the representation model.
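The arcs of Table 1 can be read as labels on (source, arc, target) triples. The Python sketch below stores a few such triples and queries the constraints on one fact. The arc names come from Table 1; the class `Triple`, the helper `constraints_on` and the sample node names (`Income`, `income-2007`, `CNY`) are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    source: str
    arc: str
    target: str


# Arc names from Table 1; r5 (extend2) is left out because the text
# states that it is explanatory only.
ARCS = {"extend", "nest", "define", "constraint", "label arc", "reference arc"}

triples = [
    Triple("Item", "extend", "Structure Concept"),     # concept to concept
    Triple("Hypercube", "nest", "Explicit Dimension"),  # concept to concept
    Triple("Income", "define", "income-2007"),          # concept to fact
    Triple("apac", "constraint", "income-2007"),        # fact to fact
    Triple("CNY", "constraint", "income-2007"),         # fact to fact
]

# Reject any triple whose arc is not part of the representation model.
unknown = [t for t in triples if t.arc not in ARCS]
if unknown:
    raise ValueError(f"unknown arcs: {unknown}")


def constraints_on(fact: str) -> list[str]:
    """Return the facts that constrain `fact`, in sorted order."""
    return sorted(t.source for t in triples
                  if t.arc == "constraint" and t.target == fact)


print(constraints_on("income-2007"))
```

In this form the unit and the dimension member constrain the item fact directly through arcs, so no separate context element has to be repeated for each combination.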

Some business relations between fact sets must also be considered. These mainly include relationships between facts of different concepts, between facts under different dimensions, and between facts under various other constraints. These new requirements cover the representation requirements of the XBRL 2.1 linkbases, the Dimensions 1.0 linkbase and Formula. Because Formula has powerful representation ability, we follow its design and adopt its filter method. However, we use arcs instead of the expressions of Formula, so that extensibility is improved.

Figure 4. Rule model (diagram not reproduced; rules refer to other rules through R-R arcs and to factVariables through R-V arcs; each factVariable refers to filters, namely the conceptName filter and the factArc filter, through V-F arcs and to a value through a V-V arc; the filters extract facts; label resources are attached to the elements).

The conceptName filter selects facts by specified concept names; the factArc filter selects facts by specified arcs between facts. The constraints that dimension facts, entity facts, currency and time place on main facts can be resolved by the factArc filter. A factVariable extracts facts through a combination of different filters and can be assigned a value. Within one factVariable, the conceptName filter can appear only once.

Rules refer to factVariables through R-V arcs. The factVariables are independent of each other and can be reused. Furthermore, rules can be nested through R-R arcs so that they compose into more complex rules.
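The following Python sketch illustrates how such arc-based rules might be evaluated. It is an assumption-laden illustration rather than the proposed model itself: the classes `FactVariable` and `Rule`, the dictionaries `FACTS` and `CONSTRAINT_ARCS`, the concept names and all values are hypothetical, and a factVariable that matches several facts is simply summed.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical facts: fact id -> (concept name, value).
FACTS = {"f1": ("Income", 100.0), "f2": ("Income", 40.0), "f3": ("Cost", 25.0)}

# Hypothetical constraint arcs between facts: (constraining fact, item fact).
CONSTRAINT_ARCS = {
    ("all-region", "f1"), ("hardware", "f1"),
    ("apac", "f2"), ("hardware", "f2"),
    ("apac", "f3"), ("hardware", "f3"),
}


@dataclass
class FactVariable:
    concept_name: str                                         # conceptName filter, once
    constrained_by: list[str] = field(default_factory=list)   # factArc filters (V-F)

    def extract(self) -> list[float]:
        """Return the values of all facts that pass every filter."""
        return [
            value
            for fact_id, (concept, value) in FACTS.items()
            if concept == self.concept_name
            and all((c, fact_id) in CONSTRAINT_ARCS for c in self.constrained_by)
        ]


@dataclass
class Rule:
    op: Callable[..., object]
    variables: list[FactVariable] = field(default_factory=list)  # R-V arcs
    rules: list["Rule"] = field(default_factory=list)            # R-R arcs

    def evaluate(self):
        args = [sum(v.extract()) for v in self.variables]
        args += [r.evaluate() for r in self.rules]
        return self.op(*args)


apac_income = FactVariable("Income", ["apac", "hardware"])
apac_cost = FactVariable("Cost", ["apac", "hardware"])

# Arithmetic rule over two factVariables, then a logical rule nested on top of it.
margin = Rule(operator.sub, variables=[apac_income, apac_cost])
positive = Rule(lambda m: m > 0, rules=[margin])

print(margin.evaluate(), positive.evaluate())
```

The same `apac_income` variable could be attached to any number of rules, which reflects the reuse of factVariables, and `positive` shows one rule nested inside another through an R-R reference.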

Table 2

No | Relation | Arcs | Scenario | Type
r8 | RV | R-V | From rule to factVariable | Many to many
r9 | RR | R-R | From rule to rule | Many to many
r10 | VF | V-F | From factVariable to filter | One to many
r11 | VV | V-V | From factVariable to value | One to one

The architecture of the improved representation model is as follows:

Figure 5. Architecture of the improved representation model (diagram not reproduced; it has three columns. The Representation Model Layer holds the concepts Tuple, Item and Resource, the relations, including the fact constraint, the resource reference and the shared relations RV, RR, VF and VV, and the data types. The Domain Model Layer has a static and a dynamic part and holds elements such as Unit, Context, Balance, Hypercube, Time Dimension, Region Dimension, Product Dimension, Domain, Entity, Asset, Liability and Ownership Interest. The Fact Layer holds the facts, such as item facts and dimension facts, and the relations between facts).

In the improved representation model, some elements such as unit, context and balance are classified into the domain model layer, the domain members of a dimension are classified into the fact layer, and the relation between facts is classified into the representation model layer. In effect, the meta-knowledge in the improved model represents the taxonomy, the domain model layer represents the Dimensions 1.0 specification, and the fact layer represents the Dimensions 1.0 taxonomy. The constraints on facts are described by the arcs between facts.

References:

[1] Deng Zhihong, Tang Shiwei, Zhang Ming, Yang Dongqing, Chen Jie. A Survey of Ontology Research. Acta Scientiarum Naturalium Universitatis Pekinensis, Vol. 38, No. 5. 09/2002.
[2] XBRL Specification and Guidance Stack (SGS) 1.0. http://www.xbrl.org/technical/SGS-PWD-2005-05-17.htm
[3] Extensible Business Reporting Language (XBRL) 2.1. http://www.xbrl.org/Specification/XBRL-RECOMMENDATION-2003-12-31+Corrected-Errata-2006-12-18.htm
[4] XBRL Dimensions 1.0. http://www.xbrl.org/Specification/XDT-REC-2006-09-18.htm
[5] Formula 1.0. http://www.xbrl.org/Specification/formula-PWD-2007-12-31.html
[6] XBRL Generic Links 1.0. http://www.xbrl.org/Specification/XGL-PWD-2007-04-24.htm
[7] Charles Hoffman. Financial Reporting Using XBRL: IFRS and US GAAP Edition. Lulu.com. 2006.
[8] Nils J Nilsson. Artificial Intelligence, A New Synthesis. Morgan Kaufmann Publishers. 1998.
[9] Neches R, Fikes R E, Gruber T R, et al. Enabling Technology for Knowledge Sharing. AI Magazine. 1991, 12(3): 36~56.
[10] Gruber T R. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition. 1993, 5: 199~220.
[11] Guarino N. Semantic Matching: Formal Ontological Distinctions for Information Organization, Extraction, and Integration. In: Pazienza M T, eds. Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology, Springer Verlag. 1997: 139~170.
[12] Berners-Lee and the Semantic Web Vision. http://www.xml.com/pub/a/2000/12/xm1200/timbl.html
[13] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American. 05/2001: 34~43.
[14] Jacco van Ossenbruggen, Lynda Hardman and Lloyd Rutledge. Hypermedia and the Semantic Web: A Research Agenda. Journal of Digital Information, volume 3, issue 1. 17/05/2002.
[15] W3C Semantic Web Activity. http://www.w3.org/2001/sw/