Ecolexicon: New Features and Challenges
Total Page:16
File Type:pdf, Size:1020Kb
EcoLexicon: New Features and Challenges Pamela Faber, Pilar León-Araúz, Arianne Reimerink Department of Translation and Interpreting, Universidad de Granada Buensuceso 11, 18071 Granada (Spain) E-mail: [email protected], [email protected], [email protected] Abstract EcoLexicon is a terminological knowledge base (TKB) on the environment with terms in six languages: English, French, German, Modern Greek, Russian, and Spanish. It is the practical application of Frame-based Terminology, which uses a modified version of Fillmore’s frames coupled with premises from Cognitive Linguistics to configure specialized domains on the basis of definitional templates and create situated representations for specialized knowledge concepts. The specification of the conceptual structure of (sub)events and the description of the lexical units are the result of a top-down and bottom-up approach that extracts information from a wide range of resources. This includes the use of corpora, the factorization of definitions from specialized resources and the extraction of conceptual relations with knowledge patterns. Similarly to a specialized visual thesaurus, EcoLexicon provides entries in the form of semantic networks that specify relations between environmental concepts. All entries are linked to a corresponding (sub)event and conceptual category. In other words, the structure of the conceptual, graphical, and linguistic information relative to entries is based on an underlying conceptual frame. Graphical information includes photos, images, and videos, whereas linguistic information not only specifies the grammatical category of each term, but also phraseological, and contextual information. The TKB also provides access to the specialized corpus created for its development and a search engine to query it. One of the challenges for EcoLexicon in the near future is its inclusion in the Linguistic Linked Open Data Cloud. Keywords: Terminology, knowledge representation, terminological knowledge base being added. This terminological resource is conceived for 1. Introduction language and domain experts as well as for the general EcoLexicon (ecolexicon.ugr.es) is a multilingual visual public. It targets users such as translators, technical writers, thesaurus of environmental science (Faber, León-Araúz, and environmental experts who need to understand and Reimerink 2014). It is the practical application of specialized environmental concepts with a view to writing Frame-based Terminology (FBT; Faber et al. 2011; Faber and/or translating specialized and semi-specialized texts. 2012, 2015), a theory of specialized knowledge representation that uses certain aspects of Frame Semantics 2. Frame-based Terminology (Fillmore 1985; Fillmore and Atkins 1992) to structure Frame-based Terminology (FBT) is the theoretical specialized domains and create non-language-specific approach used to create EcoLexicon. Based on cognitive representations. FBT focuses on: (i) conceptual semantics (Geeraerts 2010) and situated cognition organization; (ii) the multidimensional nature of (Barsalou 2008), specialized environmental knowledge is specialized knowledge units; and (iii) the extraction of stored and structured in the form of propositions and semantic and syntactic information through the use of knowledge frames, which are organized in an ontological multilingual corpora. EcoLexicon is an internally coherent structure. information system, which is organized according to FBT is a cognitively-oriented terminology theory that conceptual and linguistic premises at the macro- as well as operates on the premise that, in scientific and technical the micro-structural level. communication, specialized knowledge units activate From a visual perspective, each concept appears in a domain-specific semantic frames that are in consonance network that links it to all related concepts. The semantic with the users’ background knowledge. The specification networks in EcoLexicon are based on an underlying of such frames is based on the following set of domain event, which generates templates for the most micro-theories: (i) a semantic micro-theory; (ii) a prototypical states and events that characterize the syntactic micro-theory; and (iii) a pragmatic specialized field of the Environment as well as the entities micro-theory. Each micro-theory is related to the that participate in these states and events. This type of information in term entries, the relations between visualization was selected because a semantic network is specialized knowledge units, and the concepts that they an effective representation method for capturing and designate (Faber 2015). encapsulating large amounts of semantic information in an More concretely, the semantic micro-theory involves an intelligent environment (Peters and Shrobe 2003). The internal and external representation. The internal representations generated for each concept are obtained representation is reflected in a definition template used to from the information extracted from static knowledge structure the meaning components and semantic relations sources such as a multilingual corpus of texts and other in the description of each specialized knowledge unit (see environmental resources. Section 5). The external representation is a domain-specific EcoLexicon currently has 3,599 concepts and 20,106 ontology whose top-level concepts are OBJECT, EVENT, terms in Spanish, English, German, French, Modern Greek, ATTRIBUTE, and RELATION. The ontology is based on the and Russian, though terms in more languages are currently conceptual representations of physical objects and 73 processes (e.g. ALLUVIAL FAN, GROYNE, EROSION, derive their meaning from the characteristics of a given WEATHERING, etc.). This set of concepts acts as a scaffold, geographic area or region and, for example, the weather and their natural language descriptions provide the phenomena that typically occur there semantic foundation for data querying, integration, and Based on these theoretical premises, EcoLexicon has inferencing (Samwald et al. 2010). evolved and has made significant advances since it was The syntactic micro-theory is event-based and takes first created a decade ago. Section 3 explains the interface the form of predicate-argument structures. The nature of of the application, the knowledge provided to users, and an event depends on the predicates that activate the the various interaction options. Section 4 describes the relationships between entities. According to FBT, terms contextualization of knowledge to avoid information and their relations to other terms have a syntax, as overload. Section 5 explains how natural language depicted in graph-based micro-grammars, which not only definitions are created according to FBT premises. show how hierarchical and non-hierarchical relations are Section 6 shows the search possibilities of the expressed in different languages, but can also tag corpus EcoLexicon corpus. Section 7 addresses one of the future texts for information retrieval (León and Faber 2012). challenges of the resource, its inclusion in the Linguistic Finally, the pragmatic micro-theory is a theory of Linked Open Data Cloud, and Section 8 draws some final contexts, which can be linguistic or extralinguistic. conclusions. Linguistic contexts are generally regarded as spans of +5 items before and after term occurrence. They are crucial 3. User interface in the design stage of a terminological knowledge base Users interact with EcoLexicon through a visual interface (TKB) for a wide variety of reasons, which include: (i) with different modules that provide conceptual, linguistic, term disambiguation; (ii) definition formulation; (iii) and graphical information. Instead of viewing all linguistic usage; (iv) conceptual modeling; and (v) term information simultaneously, they can browse through the extraction. Such contextual information is important windows and select the data that is most relevant for their because it shows how terms are activated and used in needs. specialized texts in the form of collocations and Figure 1 shows the entry in EcoLexicon for FAN. collocational patterns. When users open the application, three zones appear. The In contrast, extralinguistic contexts are pointers to top horizontal bar gives users access to the term/concept cultural knowledge, perceptions, and beliefs since many search engine. The vertical bar on the left of the screen specialized knowledge units possess an important cultural provides information regarding the search concept, dimension. Cultural situatedness has an impact on namely its definition, term designations, associated semantic networks since certain conceptual categories are resources, general conceptual role, and phraseology. linked to the habitat of the speakers of a language and Figure 1: EcoLexicon user interface 74 The topmost box shows the definition of the concept. arranged; (iv) the shortest path between two concepts; and Each definition makes category membership explicit, (v) concordances for a term (see Section 6). reflects a concept’s relations with other concepts, and On the center of the screen, the conceptual map is specifies essential attributes and features (see Section 5). shown as well as the icons that permit users to configure Accordingly, the definition is the linguistic codification of and personalize it for their needs (see Section 4). The the relational structure shown in the concept map. The standard representation mode shows a multi-level words in each definition also have hyperlinks to their semantic network