Instructions for ACL-2013 Proceedings
Using Natural Language and SBVR to Author Unambiguous Business Governance Documents

Donald Chapin
Business Semantics Ltd
Flat 21, Dovedale Cottages
240a Battersea Park Road
London SW 11 4LN
United Kingdom
[email protected]

John Hall
Model Systems
17 Melcombe Court
Dorset Square
London NW1 5EP
United Kingdom
[email protected]

Abstract

The Object Management Group standard “Semantics of Business Vocabulary and Business Rules” (SBVR) was designed to enable natural language sentences to be written so they can be read unambiguously by business people, and interpreted unambiguously in formal logic by computers. This paper discusses the key factors that need to be present in addition to SBVR to realize this SBVR design goal.

1 Introduction

Ambiguity in business communication, especially in business governance documents, introduces avoidable business risks. Sometimes these business risks are very costly, even catastrophic, to the organization involved.

The key challenge is to remove ambiguity in governance documents without business authors having to learn new grammar rules.

2 Business Audience – Not IT Audience

SBVR Terminological Dictionaries and Rulebooks “document the meaning of terms and other representations that business authors intend when they use them in their business communications, as evidenced in their written documentation, such as contracts, product/service specifications, and governance and regulatory compliance documents.”1

SBVR “is conceptualized optimally for business people rather than automated processing. It is designed to be used for business purposes, independent of information systems designs.”2

A key aspect of natural language simplification is to choose one of the synonyms as the preferred term and use it consistently within a document for each distinct audience.

An organization (a semantic community) has speech communities (audiences that share terms), each using a given natural language. Typically at least three speech communities use the same natural language, each with a distinct vocabulary and its own preferred terms for the same concepts:
• Employees: jargon, abbreviations, transaction codes, form numbers, etc. But much of the vocabulary would be in understandable business language. It would usually be the most comprehensive vocabulary, providing default terms for the others.
• Legal, for contracts, product and service specifications, compliance reporting, etc. The vocabulary would be formal, include standard legal and industry terminology, and be strictly policed.
• Public, for advertisements, public-facing web sites, scripts for helpdesks, etc. The vocabulary would be everyday language, and would probably also be strictly policed.

There would probably also be smaller, specialized speech communities, such as accountancy and finance. Their vocabularies would usually be drawn from the employees’ and legal vocabularies, supplemented by terms adopted from their practices.

1 SBVR Clause 1.4 “Terminological Dictionaries and Rulebooks”
2 SBVR Clause 1.2 “Applicability”

2.1 Different Terminological Dictionaries for Different Audiences

Each speech community within the business would have a terminological dictionary, in its own language, that would be a view of the terminological database – a structured subset delivered as a report, or the output from a canned query, or a live view via a custom interface.

In a given language within a given business:
• Each concept must have a preferred designation
• Preferred designations may have synonyms:
  • A synonym for a given concept in one terminological dictionary may be a preferred term for that concept in another terminological dictionary.
  • A synonym might not be a preferred term in any terminological dictionary – but may be a synonym in more than one terminological dictionary.

2.2 Terminological Dictionaries are for People; Data Models are for IT Systems

Terminological dictionaries document the meanings intended by business authors for words and phrases they use in their business documents. These documents are used by business people to operate the business.

For example, ISO 1087-1:2000 Terminology work – Vocabulary – Part 1: Theory and application defines the meaning of the terms used in ISO 704:2009 Terminology work – Principles and methods.

Data Models and their data definitions document the data maintained in IT systems. These models are used by IT professionals for design of IT systems.

For example, ISO 30042 Systems to manage terminology, knowledge, and content – TermBase eXchange (TBX) is a data model that documents XML data structures for exchanging terminological database content.

Terminological dictionaries and data models are both important but serve very different audiences and purposes. Neither is an adequate substitute for the other.

2.3 Importance of Context

Dealing with homonyms is essential to removing ambiguity. At the heart of terminology science is the principle that there is a one-to-one relation, in a given context, between a given word or phrase and the concept that it designates.

ISO TC 37 Terminology standards and SBVR together can support several kinds of context for disambiguating part-of-speech words and phrases:
• Subject field
• Part of speech
• Speech community
• Context concept (disambiguation context)
• Subject concept
• Period of time

In terminological dictionaries the context within which the preferred term and its synonyms have exactly one meaning is explicitly stated in the terminological entry.

When authoring business documents, there are a number of techniques to make the context explicit, thus minimizing the likelihood of linguistic analysis engines getting it wrong:
• Including the intended audience (speech community), the subject field(s), and document applicability dates as document properties
• Including a subject field and/or context concept as metadata in the document’s outline headings
• Noting the subject field and/or context concept in a (xxxx, yyyy) notation after the word or phrase.

3 Subset of Natural Language Grammar – Not Artificial Grammar

The approach this paper advocates is to use a selected subset of natural language grammar structures and terms defined in a terminological dictionary. It is not to define a new artificial language or artificial extensions to natural language.

This means not changing the natural language syntax of sentences in any way that requires business users to learn new syntax or different interpretations of syntax from what they already know or could know from natural language grammar.

3.1 Keeping Natural Language Grammar Natural

While there is a continuum from:
1. “sloppily, even wrongly, used natural language grammar” through
2. “good quality simple, plain natural language” through (and across the boundary to)
3. “additional artificial grammar having to be learned and remembered” through
4. a “fully formal language that looks as much like natural language as possible” (but is deceiving like COBOL was) to
5. “formal logic programming languages” (like Prolog and Datalog),
the transition from stage 2 to stage 3 is a clearly identifiable boundary that, when crossed, moves from pure natural language to some form of artificial language.

Business people need to be able to use natural language with the help of tools to express business definitions and sentences unambiguously – without being required to learn something that is not part of natural language grammar.

Of course, natural language grammar can be supplemented with good practices that identify which subsets of natural language grammar structures and patterns are least ambiguous. The “Plain English” requirement of the US Government is an example of this.3

Once one crosses this boundary, the whole approach is on a slippery slope from making it easy for business people to communicate unambiguously to making it easy for IT developers.

If business people have to learn artificial grammar / syntax / notation, there is a shift of responsibility – and effort – for unambiguous communication. Rather than business people working with good semantic authoring tools in their own natural language, they have to speak the language of IT professionals. The more that happens, the greater the risk to the clarity of the documents that people in the business use.

3.2 “Plain Language” as Basis for Least Ambiguous Subset of Natural Language

There is a large international “Plain Language” community that is rich with good practice materials, training, tools and practical experience that support writing using plain language.4 5 6 7

The knowledge, know-how and involvement in unambiguous business document authoring of the Plain Language community fits exactly with the business audience of this approach. This is in sharp contrast to an audience of logicians and IT professionals.

The document author is always the final authority for intended meaning when linguistic analysis can’t do the job correctly alone.

An example of a sentence where a software tool should ask for clarification is London Underground’s rule:
“Dogs must be carried on escalators”.
This could be interpreted either as:
“A person who is accompanied by a dog must carry the dog when riding an escalator”
or as
“A person may ride an escalator only if the person is carrying a dog”
Compare this with “A hard hat must be worn when visiting a construction site”.

4 Unambiguous Words/Phrases

Ambiguous words and phrases are one of the two major sources of ambiguity in business documentation. Removing ambiguity from part-of-speech words and phrases is the focus of the discipline of terminology science.

Terminology work is standardized in the ISO TC 37 terminology standards with ISO 704:2000 and ISO 1087-1 being the core standards. SBVR builds on the foundation of these standards and adds:
• Semantic features to terminological dictionaries so that the definitions of concepts can be grounded in formal logic.
• The ability to define the skeleton of a sentence clause; i.e. sentence clauses without their quantifications, typically “subject verb object [preposition object]”. These skeleton clauses are known as “verb concept wordings” in SBVR.
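
As an illustration of the one-to-one principle from Section 2.3 (within a given context, a word or phrase designates exactly one concept), the following minimal Python sketch shows how an authoring tool might detect that a homonym needs clarification from the author. The class names `Context` and `TermDictionary`, the choice of speech community plus subject field as the context key, and the sample entries are all assumptions made for illustration; none of them is taken from SBVR or the ISO TC 37 standards.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Context:
    """Hypothetical context key built from two of the context kinds in Section 2.3."""
    speech_community: str
    subject_field: str


@dataclass
class TermDictionary:
    """Maps (designation, context) to exactly one concept identifier."""
    entries: Dict[Tuple[str, Context], str] = field(default_factory=dict)

    def add(self, designation: str, context: Context, concept: str) -> None:
        key = (designation.lower(), context)
        existing = self.entries.get(key)
        if existing is not None and existing != concept:
            # Enforce the one-to-one relation within a single context.
            raise ValueError(
                f"{designation!r} already designates {existing!r} in {context}"
            )
        self.entries[key] = concept

    def candidates(self, designation: str,
                   context: Optional[Context] = None) -> Set[str]:
        """Concepts the designation could mean; every context is searched if none is given."""
        name = designation.lower()
        return {
            concept
            for (term, ctx), concept in self.entries.items()
            if term == name and (context is None or ctx == context)
        }

    def needs_clarification(self, designation: str,
                            context: Optional[Context] = None) -> bool:
        """True when zero or several concepts match, so the author should be asked."""
        return len(self.candidates(designation, context)) != 1


# Illustrative (invented) entries: the same word in two speech communities.
finance = Context("Employees", "finance")
helpdesk = Context("Public", "customer service")

terms = TermDictionary()
terms.add("account", finance, "ledger account")
terms.add("account", helpdesk, "customer account")

print(terms.needs_clarification("account"))           # True: no context stated
print(terms.needs_clarification("account", finance))  # False: context stated
```

In this sketch the word alone is ambiguous, while the word together with its stated context resolves to a single concept, which mirrors the paper's point that making context explicit in the document reduces the chance of a linguistic analysis engine choosing the wrong meaning.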