On the Formal Standardization of Terminology Resources: the Case Study of Trimed

Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 4903–4910 Marseille, 11–16 May 2020 c European Language Resources Association (ELRA), licensed under CC-BY-NC On the Formal Standardization of Terminology Resources: The Case Study of TriMED F. Vezzani1, G. M. Di Nunzio2;3 1Dept. of Linguistic and Literary Studies, 2Dept. of Information Engineering, 3Dept. of Mathematics University of Padua, Italy [email protected], [email protected] Abstract The process of standardization plays an important role in the management of terminological resources. In this context, we present the work of re-modeling an existing multilingual terminological database for the medical domain, named TriMED. This resource was conceived in order to tackle some problems related to the complexity of medical terminology and to respond to different users’ needs. We provide a methodology that should be followed in order to make a termbase compliant to the three most recent ISO/TC 37 standards. In particular, we focus on the definition of i) the structural meta-model of the resource, ii) the data categories provided, and iii) the TBX format for its implementation. In addition to the formal standardization of the resource, we describe the realization of a new data category repository for the management of the TriMED terminological data and a Web application that can be used to access the multilingual terminological records. Keywords: terminology standardization, medical language, methodologies for LRs construction. 1. Introduction ples (Wilkinson et al., 2016) of the European Open Science 2 Terminology standardization has long been known as a fun- Cloud (ESOC). damental requirement in dealing with special languages 1.1. Our Contribution (Wuster,¨ 1971; Galinski and Nedobity, 1988; Sager, 1998). In this paper, we focus on the terminology standardiza- This process reflects on two aspects related to terminology, tion by restricting our field of analysis to the medical do- as the study of the “micro-language” used by specialists in main. In this context, we describe the process of formal particular domains of the human activity (Balboni, 2000). standardization of an existing multilingual database named Firstly, from a semantic perspective, the standardization of TriMED (Vezzani et al., 2018), which was conceived in terminology consists in dealing with the meaning of terms. order to tackle some problems related to the complexity This process aims to 1) create a shared common consensus of medical terminology, such as the transmission and the on the use of technical terms among a community of ex- comprehension of medical information. Our resource is perts, and 2) disambiguate their meaning to the advantage designed in order to respond to the ISO standards for the of monoreferentiality (a term has a single referent in the implementation and management of terminology resources. real world) and monosemy (a term has a single and univo- In particular, we present: cal meaning) (Magris et al., 2002). The importance of this aspect is reflected in such contexts where the transmission 1. The standardization of the structure of TriMED ac- of information is “vital”. Communication in the healthcare cording to Terminological Markup Framework (TMF) domain, for example, is often characterized by an opaque meta-model (ISO-16642, 2017); lexicon which needs to be disambiguated (Rouleau, 2003; Castro et al., 2007). 2. The description of its Data Categories Specifications Secondly, from a formal perspective, the standardization of through the implementation of the TriMED Data Cat- terminology, interpreted in this context as a collection of egory Repository (ISO-12620, 2019); terminological data, reflects on the organization of struc- 3. The implementation of the resource in the TermBase tured data for linguistic resources. To this purpose, the eXchange (TBX) format (ISO-30042, 2019); International Organization for Standardization (ISO),1 and in particular the Technical Committee ISO/TC 37 work- 4. The Web application to access the data stored in the ing on Language and Terminology, provides a large num- Termbase. ber of standards for “language resource management” (sub- The reminder of this paper is organized as follows: in Sec- committee 4) and “management of terminology resources” tion 2., we present, in terms of both semantic and formal (sub-committee 3). One of the main goal is to provide struc- standardization, the existing linguistic resources available tured and standardized terminological data which can be for the medical domain and the purposes behind their im- reused in different terminology management systems. This plementation. In Section 3., we describe the application aspect is important not only for the correct implementation of the above mentioned standards applied to TriMED re- of a terminological resource, but also in a broader sense source by focusing on the 1) structural meta-model 2) data of ‘Open Science’, where the concept of “reusability” is categories, and 3) TBX implementation. In Section 4. we one of the fundamental keystone of the FAIR (Findabil- present our Web Application and, finally, in Section 5., we ity, Accessibility, Interoperability and Reusability) princi- give our conclusions and some hints on future works. 1https://www.iso.org/home.html 2https://www.eosc-portal.eu 4903 2. Related Works velopment of the terminological database VariMed10: this The standardization of medical terminology aims, first of resource focuses on terminological variations in the medi- all, at the optimization of the communication between ex- cal field along with pragmatic information and is intended perts working in their sub-domains of specialization (Wer- to facilitate communication between healthcare profession- muth and Verplaetse, 2019). To this purpose, nomencla- als and patients (Sanchez´ and Velasco, 2013). tures, vocabularies, terminologies and codes have been de- From a multilingual viewpoint, several initiatives have been veloped to enable effective communication between med- carried out at the European level. The European Commis- ical experts and to facilitate the recording of patient data. sion (DG III) commissioned the Multilingual Glossary of 11 The Unified Medical Language System3 (UMLS) is a com- Popular and Technical Medical Terms project which was pendium gathering a large number of biomedical resources, realized by the Department of applied Linguistics of the such as: 1) SNOMED CT,4 the reference terminology for Heymans Institute of Pharmacology and Mercator School clinical concepts which is used for clinical documentation in 1995-2000. The resource groups nine glossaries of and reporting; 2) MeSH Thesaurus,5 a controlled vocab- 1830 scientific and popular medical terms about medici- ulary for the indexing and retrieval of the biomedical lit- nal product package inserts in nine official languages of erature; 3) ICPC2,6 a classification system which allows the European Union. In 2018, the Terminology Coordi- to classify patient data and clinical activity; 4) ICD11,7 a nation Unit (TermCoord) of the European Parliament in classification system used to codify diagnoses for record- collaboration with the Paris-Diderot University, the Uni- ing morbidity and mortality statistics. The formal stan- versity of Granada and the University of Padua, engaged 12 dardization of these medical resources is regulated by the in the YourTerm MED project consisting in the realiza- international standard ISO 17117-1:2018 developed by the tion of a structured linguistic database based on the cog- Technical Committee 215 working on Health informatics nitive structure of the Frame-Based Terminology (Faber, (ISO-171171, 2018). According to the standard, these re- 2015). This multilingual tool is conceived in order to facil- sources should have a conceptual orientation: they are con- itate communication between healthcare professionals on figured as ontologies, that is collections of clinical concepts mission and their patients: in particular, it is intended to and the corresponding medical terms which designate their meet the immediate terminological needs of Medecins´ sans meaning. Frontieres/Doctors` Without Borders (MSF) in their inter- The above mentioned resources are mainly used by experts national missions, so that physicians can consult a multilin- for purely medical purposes in order to promote peer com- gual resource on site which allows to take care of patients. munication and patient care. However, many studies focus Finally, we want to mention also two important initiatives also on the problem of comprehension of medical infor- which have led to the development of structured and stan- 13 mation in the patient-physician dialogue (McCray, 2005; dardized medical resources. The TermSciences initiative Jucks and Bromme, 2007; Tran et al., 2009; Elhadad and deals with the construction of a multi-purpose and multi- Sutaria, 2007). Indeed, physicians tend to use medical jar- lingual terminological database from various source vo- gon, also referred to as medicalese (Hadden et al., 2018), cabularies produced by major French research institutions which consists in words, phrases or concepts that can be (Khayari et al., 2006). In the context of the formal stan- misunderstood or misinterpreted by patients, or in general dardization of terminological resource, this was the first non-experts (Deuster et al., 2008; Castro et al., 2007). public initiative to implement the ISO standard dealing In this context, there are numerous resources

On the Formal Standardization of Terminology Resources: the Case Study of Trimed

Proceedings of the Workshop on Multilingual Language Resources and Interoperability, Pages 1–8, Sydney, July 2006

Part 2: Data Category Selection (DCS) for Electronic Terminological Resources (ETR)

A Multi-View Hyperlexicon Resource for Speech and Language System Development

Iso 24613:2008(E)

ESFRI / E-IRG Report on Data Management, January 2010

Tomaž Erjavec: Platforms and Standards for Data Sharing

Accessibility of Multilingual Terminological Resources - Current Problems and Prospects for the Future

Terminology for Translators — an Implementation of ISO 12620 Robert Bononno

Corpus Query Lingua Franca Part II: Ontology

Janne Bondi Johannessen (Ed.)

Of Parallèles

ISO 12616 Translation-Oriented Terminography