ONTOLOGY LEXICALIZATION

Transinformação, Campinas, 29(1):57-72, jan./abr., 2017
https://doi.org/10.1590/2318-08892017000100006

Ontology lexicalization: Relationship between content and meaning in the context of Information Retrieval 1

Marcelo SCHIESSL 2
Marisa BRÄSCHER 3

1 Article based on the doctoral dissertation of SCHIESSL, M. entitled “Lexicalização de Ontologias: o relacionamento entre conteúdo e significado no contexto da Recuperação da Informação”. Universidade de Brasília, 2015.
2 Universidade de Brasília, Faculdade de Ciência da Informação, Programa de Pós-Graduação em Ciência da Informação. Campus Universitário Darcy Ribeiro, Edifício da Biblioteca Central, 70910-900, Brasília, DF, Brasil. Correspondência para/Correspondence to: M. SCHIESSL.
3 Universidade Federal de Santa Catarina, Departamento de Ciência da Informação, Programa de Pós-Graduação em Ciência da Informação. Florianópolis, SC, Brasil.
Received on 19/10/2015, resubmitted on 23/6/2016 and approved on 7/7/2016.

Abstract

The proposal presented in this study seeks to properly represent natural language to ontologies and vice-versa. Therefore, the semi-automatic creation of a lexical database in Brazilian Portuguese containing morphological, syntactic, and semantic information that can be read by machines was proposed, allowing the link between structured and unstructured data and its integration into an information retrieval model to improve precision. The results obtained demonstrated that the methodology can be used in the risco financeiro (financial risk) domain in Portuguese for the construction of an ontology and the lexical-semantic database and the proposal of a semantic information retrieval model. In order to evaluate the performance of the proposed model, documents containing the main definitions of the financial risk domain were selected and indexed with and without semantic annotation. To enable the comparison between the approaches, two databases were created. The first one represents the traditional search and the second contained the index built based on the texts with the semantic annotations to represent the semantic search. The evaluation of the proposal was based on recall and precision. The queries submitted to the model showed that the semantic search outperforms the traditional search and validates the methodology used. Although more complex, the procedure proposed can be used in all kinds of domains.

Keywords: Information retrieval. Information Science. Ontology. Representation of information. Semantic Web.

Introduction

The Web revolution has led to widespread access to information. Another revolution, still in progress, is the Semantic Web revolution, which is based on the principle that electronic information will not be ambiguous, data will be readily available, reusable, and interoperable, and devices will be ubiquitous. The idea is to bring Web ubiquity to the everyday lives of users with documents enriched with semantic information of Web pages, thus creating an environment in which agents, in the form of computer software programs, can surf through the Internet, collect information, and perform complex tasks on behalf of users.

Thus, even if complex systems, such as ontologies, ambitiously aim for semantic information processing, current technologies are restricted to the ability of computers to run only syntactic processing, i.e., search for patterns. In this case, the initial and unique proposal of ontologies to interact with both man and machine can be affected, and human involvement in the preparation, organization, and content indexing cannot be waived.

Despite the semantic web promise to establish a relationship between people and machines, Wilks and Brewster (2009) argue that knowledge representation, ontological in this case, must be combined with any natural language to be justified. The authors add that natural language is a system of rare events, but it is a complete model. Therefore, they quote Spärck Jones, who claimed that words are self-representing and that no other symbol can substitute words or codify them with similar meaning. Charniak (1973) and Wilks (1977) corroborate this statement in different ways, and they state that words retain essential information that is not present in any other representation.

It is evident that the world of semantic web and natural language need to be connected. In order [...] knowledge, it is necessary to create a bridge between the components of an ontology (classes, properties, individuals) and their correspondents in natural language. Therefore, to capture simple and complex verbalizations of linguistically rich elements of an ontology, lexical knowledge is needed, that is, knowledge about the set of words related to the domain of interest. Furthermore, this knowledge should be made accessible in machines and should be published to facilitate its reuse. The effective bridge between these two worlds would allow queries submitted in natural language to seek semantics available in the semantic web and provide alternatives to use and to address a central problem of interest in Information Science: Ambiguity.

The use of a natural language interface with any language processing and information retrieval systems allows direct interaction and therefore allows raising questions that accurately reflect the user’s needs. However, this characteristic influences the complexity of the representation of the contents of the document and the quality of information recovery associated with it (BRÄSCHER, 1999).

It seems natural that the exploitation of resources and technology is the way to create the balance between lexical elements present in the documents and ontologies that are at the level of knowledge representation. Therefore, the proposal presented in this study aims to properly represent the natural language to ontologies and vice versa. The semi-automatic creation of a lexical database in Brazilian Portuguese containing morphological, syntactic, and semantic information that can be read by machines was proposed, allowing the link between structured and unstructured data and its integration into an information retrieval model to improve precision.

The inclusion of language resources in a natural language processing system can provide better interaction with the user and improve the quality of information retrieval systems.

Methodological procedures

M. SCHIESSL & M. BRÄSCHER 58 ONTOLOGY LEXICALIZATION 59 , ork in cannot understand cannot BERNERS-LEE; HENDLER; LASSILA HENDLER; BERNERS-LEE; , , Campinas, 29(1):57-72, 2017 jan./abr., of the Semantic Web is to help . (1998), this will transform the current abling computers and people to w Informação et al […] is an extension of the current web in which in web current theextension of an is […] better meaning, well-defined given is information en co-operation( 2001, p.3). Trans of characters that used to identify resources machines into conscious human beings, but s using a simple mechanism to express facts or facts express to mechanism simple a using s The uniform resource identifier concept is Resource Description Framework (RDF), as its If knowledge becomes explicit through Web The major goal All information added to the Web should be ). All of them correspond to the standard mechanism standard the to correspond them of All ). DOI statements. The idea behind the RDF is clear, the whole the clear, is RDF the behind idea The statements. concept is represented by the triple: subject, property (or predicate), and object. In fact, this combination is languages because it Western all speakersfamiliar to of is the intuitive way to form simple sentences. Subject done using the Uniform Resource Identifier, which refers which Identifier, Resource Uniform the using done a to string are built on standards. According to Heath and simple a Bizer provides identifier resource uniform the (2011), and extensible means for Furthermore, it is identifying intended to distinguish and identify a resource. via URI, such as texts, can that be represented anything images, videos, sounds, and concrete (car, moon) or concepts. 
divinity) (love, abstract widespread in the Information Science such as in the specification the location of books Web of pagesidentification the in via and (URL) Uniform Locator Resource via International Standard Book Number (ISBN), serial publications via International Standard Serial Number (ISSN), and digital contents via Digital Object Identifier ( objects. individualizes and identifies that name suggests, provides a framework for describing resource exchanged between them, but they they but them, between exchanged messages. these of meaning the created: is Web Semantic technologies, machines to “read” and use the web. Berners-Lee According to a Web, giant global book, into a giant global database. Such technology does not providetransform intelligence interpret and or exchange, find, to them for tools provides it information. named to be identifiable and retrievable. This can be caixa , agência ge in natural language. (agency, Automatic Teller Automatic (agency, o, Oberle o, and Staab (2009), depósito entation model that encompasses the encompasses that model entation , and , In human communication, people use contextual use people communication, human In Guarino (1998) argues that ontologies capture According to Guarin According meaning of information. Consequently, the machines deal only with syntax in order for information to be https://doi.org/10.1590/2318-08892017000100006 to facilitate utterance interpretation. On the other hand, other the On interpretation. utterance facilitate to the communication between machines is established for developed methods standardized and artificial using Markup HyperText the on based is Web The purpose. this Language (HTML), which cannot explain the real lexical level is also necessary for proper use of ontologies of use proper for necessary also is level lexical in language processing and as a way to integrate the levels. 
ontological and terminological experiences personal and knowledge, world knowledge, semantics of ontology, the terminology used to express to used terminology the ontology, of semantics this knowledge in natural language,This units. lexicaltheir and terms the about information and linguistic model allows the participation addition in Therefore, of process. inference and translation machines in the the representing levels, terminological and semantic to Paradoxically, Paradoxically, researchers have given less attention to issues to related the lexicon and linguistics in the fields of knowledge organization and information retrieval. formal a requires problem this of solution the Therefore, repres knowledge knowledge but fail to capture the structure and use of terms that are objects of Terminology and Lexicology. The structure and use of terms are essential to express and refer to the same knowled example, example, the bank word alone is ambiguous, but if it is combined with other words, such as saque eletrônico, Machinen, withdrawal, and deposit), it falls within the semantics. its reveals and institution financial of context data because the human reader has to interpret the gaps the interpret to has reader human the because data and relationships present in the texts. The availablesources usually have only keywords visible in search engines, which can be seen as a limited However, if semantics.the keywords are related to other defined For semantics. the revealing formed is context the links, Semantic WebSemantic in the context of Semantic Web, semantics underlyingconveysof use effective more enables This meaning. e inconsistencies. e ., it should be explicitly be should it ., i.e https://doi.org/10.1590/2318-08892017000100006 ent. Both directly influence the effectiveness the influence directly Both ent. 
ble to determine whether there ar there whether determine to ble Baeza-Yates and Ribeiro-Neto (1999) propose the propose (1999) Ribeiro-Neto and Baeza-Yates There have been undeniable advances in However, the difficulty in predicting relationships predicting in difficulty the However, for designed was (OWL) Language Ontology Web (1998), which tries to respond to the challenges of challenges the to respond to tries which (1998), possi the Web. the of view logical the and task user the between distinction docum the more complex class structures and properties. It extends It properties. and structures class complex more describing for vocabulary more adds and RDFS and RDF classes, these about facts classes, as such things, of groups relationships between the on focused classesis It relationships. these of characteristics and read be instances,to intended is and content and Web the of processing by computer applications. Moreover, it enables creationthe of rules, axioms, and inferences(W3C, tools 2014). logical using deductions to enable Informationretrieval the Web, the to due years recent in retrieval information popularization of Graphical Use Interfaces (GUI), and inexpensive mass storage devices. Incontinuous addition, optimizationthe of search improvesengines, users’ experience,which has madeespecially information, of source the preferred and Web standard the and Brin by engine search Google the of launch the after Page designing a system that gathers Web documents and of growth of rate the to according updated, them keeps information systems can go a long way with semantics. a little involving conflict or incompatibility still remains. For Man classes the consider we if classes: disjoint example, and Woman, we know that no individual can be instance of an both classes. This means that, in RDFS, it is im assumption (OWA) World the Open the other On hand, is what is database the in stated is what that view the is known; everything else is unknown. 
Similarly, there is names, single of assumption no expressed that person A is not person B. Finally, there and entities of specification comprehensive a be should relationships unless they add inference rules in a more abstract layer thatdatabase. the to restrictions generalized can set limits and introduce . i.e ., it ., i.e Heitor Villa-Lobos was ntation of specialization of ntation ” ” in Japanese would use the , , Campinas, 29(1):57-72, 2017 jan./abr., al level only. With these Semantic Web tools, Web Semantic these With only. level al e 1 shows the represe the shows 1 e Description Framework Schema (RDFS) is a and range. Therefore, RDF and RDFS provide The semantics of the elements of the RDF (S) Figur According According to Nardi and Brachman (2003), RDFS Allemang and Hendler (2008) state that the Informação sufficient semantics to represent knowledge, although superfici a at Trans knowledge is based on their properties and values, values, and properties their on based is knowledge is possible to make inferences about the hierarchicalbased and properties and classes between relationships on restrictions connected to the properties, such asdomain the semantics in the data model, which leads ontology to and an translates into the computer world the intension and extension about (1978) Dahlberg of ideas of the concepts that are the Science. basis Information of ontologies in born in class Rio the of de specification Janeiroor instantiation, particularization, Box, Terminological or TBox called is second The ‘person’. which contains the domain abstractions that enable the in this layer, Thus, about the data model. inferences introduces properties and classes between relationship base. and generalization layers. The first one is commonly the example, or as Assertionalto ABox For Box. 
referred representation of the sentence “ the relationship between general entities, such as classes as such entities, general between relationship the knowledge (specific) extensional 2) and properties; and that deals with the specification of the entities or class instantiation. As a result, the relationships entitiesbetween in the specialization layersgeneralization layer which areforms a RDF (S)reflected knowledge in (subproperties and superproperties). and (subproperties intensional 1) knowledge: of types two combine RDF and knowledge (general), which remains at the conceptual abstract level and deals with the actual data model, Resource RDF. in used be to vocabulary the defines that language It allows the definition of classes of entities that have something in common. Moreover, it enables defining hierarchy the as well as restrictions, their and properties of classes (subclasses and superclasses) and properties refers to the concept to be described; property refers to refers propertydescribed; be to concept the to refers to refers object and subject; the to related attributes the property. Anything can be described using this simple triple.

M. SCHIESSL & M. BRÄSCHER 60 ONTOLOGY LEXICALIZATION 61 . . et al et al . (2004), but Ex:City Class et al ave been made in made been ave rdfs: Class rdfs: Class subClassOf rdf:type rdf:type , , Campinas, 29(1):57-72, 2017 jan./abr., Ex: who suggested a solution that . (2004a; 2004b) and Contreras t al Ex:District e e

g Informação tic portals discussed by Maedche

n (2004),

a

r .

: vities involving the Semantic Web have been have Web Semantic the involving vities

s ce ranking is provided. In some systems, links systems, some In provided. is ranking ce subClassOf f Trans Class

d

r et al The relevance ranking issue was addressed by Acti Seman the user interface next to each returned instance in the in instance interface user returned the each nextto query answer according to Contreras ranked. are documents the nor instances the neither Rocha Semantic searchSemantic h proposals many and studied widely an attempt to create a Web of distributable, machine- readable data. Since the concept of semantic web has but solved been have problems many introduced, been more complex ones are still approached differently by different researchers generalized view thatof semantic web, which is discussed contribute tobelow. a more (2001), Castells (2004) essentially provide simple search functionalities Searches retrieval. data semantic as characterized are that return ontology instances rather than documents and relevan no in added are instances the reference that documents to rdf:type Ex:origin Ex:

Rdfs: Property Rdfs:

n

i

a

m

o

d

:

s f

Class

d r subClassOf rdf:type · rdf:type rdfs: Class rdfs: subClassOf Ex: Class mation Retrieval System (IRS). User task implies task User (IRS). System Retrieval mation . Intensional x Extensional Knowledge. From the From traditional information retrieval to the nfor Class RDF RDFS in the infrastructure involved in the entire information process. management https://doi.org/10.1590/2318-08892017000100006 on well-structured and well-known collections have been have collections well-known and well-structured on replaced with ordinary people who tend to ignore or disregard the heterogeneity of the contents, languages,query complexity increased or to led has any This conceptualSystems. Information foundation about logical logical view of a document, but its usage leads to poor retrieval. quality Web today, there has been a significant change in the queries perform to trained Professionals profile. user Web document refers to a sequence of transformations aimed transformations of sequence a to refers document terms index of set a through documents representing at texts full although because justified is which keywords, or their document, a of view logical complete most the are usage implies high computational cost. On concise the most other the provides categories of set small a hand, of the I the of user the of semantics the convey which terms specifying need and that meet the user information needs when browsing retrieved documents. Logical view of the Figure 1 Source: Created by the Author. . (2007) . . (2014). et al et et al nd nd Cimiano measure. measure. The . (2009) introduced (2009) . . (2013) a et al et et al . (2011) and improved by m. . (2013) and Walter Walter . (2013) and et al interconnect semantic web with web semantic interconnect etween text and structured data structured and text etween . (2003) and Reymonet and (2003) . in order to automate the lexicon et al https://doi.org/10.1590/2318-08892017000100006 et al et (2012), Unger . 
et al Finally, Finally, the integration between semantic web Some initiatives such as those introduced in the in introduced those as Somesuch initiatives Given the large volume of Web content, it is The exploitation of associated with to attempt an In using metadata in all stages of the process: . (2014) reflects the urgent need to establish a Crae originally proposed model. proposed originally and WWW will make it possible to obtain appropriate user the about information unstructured and structured levels that enable various searches, Heath and Bizer (2011) Bizer and Heath searches, various enable that levels b gap the that mention web semantic of popularization the to barrier a remains environment. this for designed tools of use the to and Navigli by studies include ontologymodel The levels. ontological and lexicalizationlexical the integrating models proposed without by Buitelaar Mc et al connection between the knowledge of the world of describingaccurately terms, of world the and concepts the between difference the impossible to develop solutions without the help of machines. Therefore, Walter construction, and semantics the provide to databases structured used The variants. morphological and lexical find to corpus the aim is to induce the creation of aknowledge lexicon fromrepresented the in ontologies to feed the noise, unreliability, and possible conflicts of data collected data of conflicts possible and unreliability, noise, sources. of number large a from of precision the increase can documents web semantic Silva systems. retrieval information a generic information retrieval model for the semantic web representation, matching, and similarity model uses semantic representationkeywords. that The cases” ratherdocuments “semantic in clustered than instances are and concepts described through represent the user interest. In order to achieve more precise results, the matching and similaritycompare modelsthe same “semantic cases” of queriesdocuments. 
and especially proposed, been have processes various WWW, lately. Despite the growth of structured databases to . et al et ngines. . (2004) . . (2003) . (2008) et al et ved objects ved et al et al d the range of web, retrie web, . (2005) and Castells Castells and (2005) . et al et . (2003) and Popov Popov and (2003) . et al et . (2004) believe that the combination the that believe (2004) . , , Campinas, 29(1):57-72, 2017 jan./abr., et al et . (2011) proposed a Semantic Web Search et al Exploring the Linked (LOD) potential, (LOD) Data Open Linked the Exploring Seeking to overcome the limitations of specific The study by Vallet Vallet by study The Popov Popov Guha Guha and McCool (2003) and Guha Informação vestigated the combination an Moreover, this system must scale to large amounts of data of amounts large to scale must system this Moreover, heterogeneity, with deal to enough robust be must and Trans Engine (SWSE) for searching and browsing RDF Web data. Web RDF browsing and searching for (SWSE) Engine semantic the of flexibility the Given can represent people, companies, cities, proteins, or anything that has been published without predefined e search traditional in that as such categorization semantic data and (2) bridging the gap betweeninformation textual the unstructured and data web semantic Web. the on available Hogan organizational ontologies, Fernandezin information spaces providedtowards step important by an semanticrepresents study Their web WWW. and open the to technologies retrieval semantic of design the Web by: (1) bridging the gap between the users and by introducing a ranking algorithm especially designed especially rankingalgorithm a introducing by semantic a using model retrieval ontology-based an for indexing scheme based on annotationtechniques. weighting information retrieval can retrieval. 
semantic automatic address and annotation the problem in and Guha by out carried studies the complements (2007) Guha (2003), McCool corresponds to a resource and each arc is labeled with a with labeled is arc each and resource a to corresponds model. data RDFS propertylikea type of information retrievaland representation, knowledge ontologies, lightweight techniques, semantically determined according to the relevance provided by the by provided relevance the to according determined weights. associated assumed that semantic web data are modeled asdirected a and labeled graph, in which each node provides provides a ranked list in response to user queries. The authors proposed a semantic network in which relation the instances have semantic labels and numerical The weights. query terms are mapped to the semantic network nodes, and the order of the search results is

M. SCHIESSL & M. BRÄSCHER 62 ONTOLOGY LEXICALIZATION 63 Threat ntity; LE- Legal Entity) is a is Entity) Legal ntity;LE- E is a , , Campinas, 29(1):57-72, 2017 jan./abr.,

n of texts about financial risk s se

a explores re c in t a th Informação Trans Figure 2 shows the top level view of the financial the of view level top the shows 2 Figure The collectio

risk domain. The variousrelationships that conceptscontradict the forces arebetween the linkedthreat and byprotection of the assets of an entity. Each dimension of this diagram gives rise to increasinglyspecific concepts. type Thethe establish that arrows the following interpreted set of concepts of must relationship between one concept and For another. be Individual (IE- IE/LE example, defense agent that imposes risk.asset the defense mitigate measures to in are which Portuguese, in documents 2,978 contained various formats known by most The users. formats are: (.doc MicrosoftWord and .docx) (.ppt), and PowerPoint Markup HyperText and (.pdf), Format Document Portable WikipediaLanguage (HTML). In in Portuguese addition, was also used containing 1,385,451 documents in

n Asset o i- Vulnerability e in Portuguese in e

s

e

u

l

a s e

v c

u

is a d (2010, p. 12).

e

r

t

a

h t et al.

profile. The new generation of IRS will be able to search indistinctly either in databases containing ontological structures of formal knowledge that are not understandable to people or in textual databases that are not understandable to intelligent computer programs, and provide good quality results. Thus, only with this free and unrestricted communication between the two worlds will the claimed potential of the semantic web be available to the common user.

Figure 2. Top level view of the Financial Risk Domain.
Source: Adapted from Gresser

Semantic information retrieval model proposal

In the present study, we propose the semi-automatic construction of a lexical database based on an ontology of risk and its corresponding corpus, as described below. This database was created for the Financial Risk domain and, for the purpose of this study, was called RiscoLex.

Lexicalization Approach

In order to represent the linguistic information, the principles defined by McCrae et al. (2011) for the proposal of the Lexicon Model for Ontologies (lemon) were applied. This model was designed to develop a standard RDF format of linguistic information, which includes declarative specifications of a machine readable lexicon that captures morphological, syntactic, and semantic aspects of the lexical items related to an ontology.

The proposal for the construction of RiscoLex is to extract the labels of classes and properties of the ontology, identify and retrieve their respective synonyms and the morphosyntactic features of each term, convert them into RDF format, and provide the lexical database with the Lemon model. Figure 3 shows the steps of the RiscoLex generation process.

Figure 3. Riscolex construction flowchart (Ontology: Classes/Properties; Label Extraction; Format; Search for Synonyms in Thesaurus and Dictionary; RDF Mapping; Lexical Database).
Source: Created by the author.

The approach includes the proposal of one or more lexical entries for each class and property of the ontology. The first step involves the extraction of the labels of the ontology and of additional information, such as synonyms and syntactical features, from external resources. The task steps were configured to do the following:

(1) All s and p labels are extracted from the ontology triple (s, p, o) to create a list of terms in natural language;
(2) Labels in CamelCase (nascimentoLocal, paísDeOrigem), hyphenated words (presidentes-do-Brazil), or words separated by underscore (instituições_financeiras) must be represented in the NL formats found in texts. This step aims to transform formats such as paísDeOrigem into país de origem and gerenciamento_de_risco into gerenciamento de risco;
(3) These terms are searched in the corpora for validation. This step aims to characterize frequent terms, which are, therefore, preferred in the domain and in the Portuguese language;
(4) In natural language, it is common to use more than one word to convey the same meaning. Thus, the aim is to find the greatest possible number of synonyms for the terms of the list. Linguistic ontologies for the Portuguese language were used in this task; and
(5) The Lesk (1986) approach was used to treat polysemous terms and collect those that are more relevant to the domain.

Semantic similarity was determined using the following lexical resources, which are structured in groups of semantically related lexical items and can be used freely because they are in the public domain: Princeton WordNet (PWN), proposed by Fellbaum (1998); OpenWN-PT, proposed by Paiva et al. (2012), which resulted in the Open Multilingual Wordnet (OMW) proposed by Bond and Foster (2013); Onto.PT, proposed by Oliveira (2013); and DBnary, proposed by Sérasset (2014). These resources combined are the key sources for the selection of semantically related lexicons in the domain of interest, financial risk, in the present study.

For the corpus search, we used the corpus tool that is incorporated into the Natural Language Toolkit (NLTK). There were 9,266 phrases corresponding to the Bosque part of the "Corpus Floresta Sintá(c)tica", version 7.4, in eXtensible Markup Language (XML) format. The reason for this choice was the ability to find different types of lexicalizations or properties that enable better generalization of ontology standards.

The following computational resources were used for the processing: Protégé version 4.3; the Python 2.7 programming language with the Natural Language Toolkit library (NLTK), by Bird, Klein, and Loper (2009); SciKit-Learn (<...-learn.org/stable/>), by Pedregosa et al. (2011); and applications such as Apache Jena.
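Step (2) of the label processing, which turns ontology labels into the natural language forms found in texts, can be illustrated with a minimal sketch. The function name `normalize_label` is hypothetical and is not taken from the study; the sketch also lowercases the result, which is an assumption made here for simplicity.

```python
import re


def normalize_label(label):
    """Turn an ontology label into a natural-language phrase (illustrative only).

    Handles the three formats named in step (2): underscores, hyphens,
    and CamelCase. Lowercasing the result is an assumption of this sketch.
    """
    # Underscores and hyphens become single spaces.
    text = re.sub(r"[_\-]+", " ", label)
    pieces = []
    # Pair every character with its predecessor to detect lower->upper boundaries.
    for previous, current in zip(" " + text, text):
        if current.isupper() and previous.islower():
            pieces.append(" ")
        pieces.append(current)
    return "".join(pieces).lower()


if __name__ == "__main__":
    for example in ("paísDeOrigem", "gerenciamento_de_risco", "presidentes-do-Brazil"):
        print(example, "->", normalize_label(example))
```

Because the boundary test relies on `str.isupper` and `str.islower`, accented Portuguese letters are handled without a hand-written character class.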

Riscolex and information retrieval

Traditional IRS rely on keywords or descriptors to index documents, but this is not enough. The problem is that if the query term does not match the keywords, the document will not be retrieved. For example, if the query term is perigo (danger) and the synonym used in the representation of the document is risco (risk), no mechanism based on measuring similarity between terms will retrieve that document.

Therefore, a corpus and an ontology represent the same domain to different users: machines and people. In general, there is no correspondence between the labels available in ontological entities and the document descriptors. For example, the label of the class pessoaFísica would be expressed in the descriptors as Pessoa Física (Individual Entity). In this case, the RiscoLex, linked to the ontology, provides the lemma and the synonyms to the descriptors. If the descriptor is not inserted in the RiscoLex or in the ontology, it can be semi-automatically inserted in both of them, which emphasizes the dynamic nature of knowledge.

Moreover, automatic hypernym resolution, such as that between banco (bank) and instituição financeira (financial institution), and other forms of dependence between words increase precision in the representation of a given document.

The semantic indexation can be seen as a dynamic extension of the document descriptors. For example, from the class Especialista (Specialist) it can be inferred that the members also belong to the Stakeholder and Pessoa Física classes. Therefore, they inherit all of their attributes and restrictions through axioms, without these being explicitly expressed. The indexation system can be increased through inferences that explicitly provide semantic meanings. In this case, the IRS can be made more comprehensive to the extent that it supports the inclusion of ontology meanings through inferential engines.

SIRM: An overview

In the Semantic Information Retrieval Model (SIRM), it is assumed that the ontology was constructed and associated with the textual information sources that include the concepts to be represented. In addition, it is also assumed that although this search is restricted to the financial risk domain, the model can be applied to any other domain, since there is structured and unstructured information that could represent the concepts understood by the domain.

The domain is represented by ontologies and corpora. On the one hand, the ontological entities represent concepts, and inference engines automatically infer non-explicit information. On the other hand, people interact using natural language to describe contents, and document descriptors infer unexpressed meanings. They complement each other for the task of providing information, but in different or even incompatible formats.

Influenced by Fernández et al. (2011) and Kara et al. (2012), the model has the information retrieval structure based on the descriptors, including a semantic module. Documents and ontological entities are indexed together. This modeling option facilitates the interaction with the end user, as it keeps searching in the same way it does in traditional search engines. Additionally, the final result is at least as good as that of the traditional approach; i.e., if the query does not find relevant correspondents in the knowledge base, the system retrieves the information related to the document descriptors.

The semantic annotation process is therefore essential to link documents to the semantic space created by the domain ontology. NLP is the main tool for document identification, comparison, and annotation. However, seeking to minimize possible ambiguity effects, it is complemented by human validation.

Figure 4 illustrates the information retrieval process with the addition of the semantic module. The user interacts in a traditional way to submit the query. The query processing standardizes the terms for the search. The lexicon-ontological knowledge includes the ontology and the RiscoLex. The corpus characterizes the database containing the documents to be retrieved. The joint indexation of the databases involved provides the lexical-semantic index, which is used in the retrieval and ranking of retrieved documents to be presented to the user.

Figure 4. SIRM overview (Query; Query Processing; Search; Lexical-semantic Index; Indexation; Lexical-ontological Knowledge; Documents; Unranked Documents; Ranking; Ranked Documents).
Source: Adapted from Fernández et al. (2011, p.438).

Results

The first result to be highlighted is the creation of RiscoLex, the first lexical database in Brazilian Portuguese built with the Lemon model, which differs from the others by the interpretation of language restricted to a well-defined domain. Additionally, the ontology, as a resource for natural language interpretation, puts the lexical database at the center of the interpretation process. In this line, the granularity, i.e., the level at which the meaning of natural language is captured, is not driven by language but by the semantic distinctions made in an ontology. Thus, these representational distinctions are relevant only in the context of a specific domain.

Another result is the construction of the first ontology for risk management in Portuguese. Difficulties in building this type of resource have often been reported in the academic literature. In our study, it was not different. Although different resources were used as the starting point, the specificity of the topic demanded the construction of this financial ontology as if it were new. The diversity between national and international markets has led us to rethink the concepts and their relationships according to the Brazilian market, especially for public companies. This adaptation required great effort to represent such knowledge.

Thus, the final ontology in the domain of Risco Financeiro e Corporativo (Finance and Corporate Risk), OntoRisco, was developed in Portuguese based on the combination of existing ontologies in English adapted to the Brazilian needs. This resulted in 2,178 triples comprising the subject, predicate, and object; 65 classes or concepts; and 47 properties or relationships between concepts. In addition, 476 axioms were created for the inference engine, which enables logical deductions from existing information and knowledge discovery about the domain.

Therefore, the labels of 65 classes and 47 properties, i.e., 112 entries, were searched to find lexical variations or synonyms in the dictionaries and lexical ontologies to compose the RiscoLex. A total of 122 new terms were found and validated. Thus, the final version was increased by 109%, totaling 234 terms to compose the RiscoLex.

On the one hand, the ontology provides the labels to start the search for synonyms in the support databases. On the other hand, the corpus is segmented and the group that contains the terms that best represent the domain is transformed into a BoW. Therefore, the synonyms and terms in the BoW are validated using the similarity measure. The terms with similarity equal to 1, i.e., identical terms, were automatically added to the RiscoLex. For the terms that were not found in the BoW, a manual analysis was carried out to verify whether they are in fact related to the domain; if so, they were inserted into the RiscoLex.

The clustering procedure generated three groups. The group chosen by the specialists was the one with the largest number of words related to Risco Financeiro (financial risk). Then, the most representative one was used as the source for the creation of the bag of words (BoW) of the risk. The group went through several processing steps until the appropriate BoW for the comparison with the synonyms was found. In each processing step, a reduction was observed in the number of terms that would be part of the BoW, resulting in a 53% reduction, i.e., from 90,533 terms to 42,394 terms in the group.

In the present study, the results of the validation of the ontology labels with the corpus were below expectations. Only 50.7% of subjects or classes were found in the corpus, and only 20.0% of the properties. These low results reflect the poor choice of terms to be included in the labels of the classes and properties. This indicates that the process of selecting synonyms should improve the representation of knowledge in relation to the written material available. The participation of experts or specialists in this field is essential for choosing the most appropriate terms.
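The difference between a descriptor-only search and a search that consults the lexicon can be sketched as follows. This is only an illustration under stated assumptions: `LEXICON` is a hypothetical miniature stand-in for RiscoLex, `INDEX` is a made-up two-document index, and the functions `expand` and `search` are not part of the study. The synonym group reuses the terms risco, perigo, and ameaça mentioned in the text.

```python
# Hypothetical miniature lexicon: lemma -> set of synonyms (stand-in for RiscoLex).
LEXICON = {"risco": {"perigo", "ameaça"}}

# Hypothetical index: document identifier -> set of descriptors.
INDEX = {
    "doc1": {"risco", "mercado"},
    "doc2": {"crédito"},
}


def expand(term, lexicon):
    """Return the term together with every term that shares a synonym group with it."""
    related = {term}
    for lemma, synonyms in lexicon.items():
        group = {lemma} | synonyms
        if term in group:
            related |= group
    return related


def search(query, index, lexicon=None):
    """Descriptor match; with a lexicon, the query is first expanded by synonyms."""
    terms = expand(query, lexicon) if lexicon else {query}
    return sorted(doc for doc, descriptors in index.items() if terms & descriptors)


if __name__ == "__main__":
    print("traditional:", search("perigo", INDEX))
    print("semantic:   ", search("perigo", INDEX, LEXICON))
```

In this toy setting the query perigo finds nothing by descriptor matching alone, while the expanded query reaches the document indexed under risco, which mirrors the perigo/risco example given above.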

Discussion

According to the objectives of our investigation, it was considered that there was no need for a vast number of documents, but rather for a set of data that enabled examining the advantages and disadvantages of the methodology used and investigating the entire corpus manually to verify the feasibility and validate the results of the automatic procedures. Consequently, a total of 785 documents containing the main definitions of the risk domain were selected. These documents were indexed with and without semantic annotation to enable comparison between the approaches.

Retrieving information

The terms identified in the corpus that corresponded to the ontology labels were assigned a term weight to increase the relevance of the document and make it more visible to the search engine. In the Solr platform, term weighting is based on the tf-idf algorithm, which presents an ordered list with detailed information on the scores assigned to each retrieved document to rank the relevance.

In the search for the same term, for example 'ameaça', there is the semantic space provided by the RiscoLex, which also means searching for the terms risco, perigo, and ameaça. As previously explained in section 3.3.1, the semantic similarity between these terms was obtained from the lexical resources used in the search that are related to the financial risk domain. Thus, the syntactic search looks for explicit terms only, whereas the semantic search looks for any syntactic type of term. For example, in one such search the syntactic search retrieves only one document, but the semantic search retrieves 159 documents.

The term 'risco' (risk), the most frequent term, was present in most documents, and its variants were used in only three documents. In the case of a syntactic search for the term 'ameaça' (threat), there would be an absence of 99% of semantically related texts. Therefore, the semantic result is the set of texts containing any semantically related term present in the RiscoLex database. The first benefit of this technique is the recall increase.

As highlighted in the literature, recall and precision are inversely correlated, and therefore a balance should be sought to achieve maximum recall and precision. For instance, an ambiguous behavior is commonly observed for terms that take different syntactic functions according to their use. For example, let's take the noun 'bem' (asset), which in the RiscoLex has the same concept as 'ativo' (asset), 'propriedade' (property), 'posse' (ownership), and 'recurso' (resource), that is, something that is owned or possessed. This shows that the syntactic differentiation of terms helps to remove the ambiguity caused by policategorization and to improve precision.

A third procedure, to deal with ambiguity by homography, was also used. It refers to the semantic identification of terms that have the same syntactic category but different meanings. An example is the search for the term 'produto' (product), which is a synonym for 'artigo' (article). This term is also common in the risk domain, but it usually refers to a part of a law or legal agreement that deals with a particular point.

In addition, in terms of semantic similarity, the procedure refers to the identification of terms with related meanings, aiming at measuring their semantic similarity. Given a particular term, the entire collection can be scrolled through to identify others that have the same semantic category. Identifying semantically related terms is very useful in indexing a corpus so that a search for a broad meaning such as 'mercadoria' (goods) also retrieves documents with specific terms such as 'artigo' (article).

Finally, it is known that human supervision increases the annotation accuracy. However, the task is not feasible for the several million annotations that can be obtained in textual databases. The automatic annotation processing described can present a list of terms to be investigated by domain experts and about which there is some uncertainty regarding annotation. This list is a debugging tool to identify polysemy cases, semantic annotations that do not correspond to the concept or to the syntactic category, and the absence of important terms for the domain in the knowledge base.

Evaluation

To evaluate the performance of our proposal, two databases with the same documents were indexed. The first one represents the traditional search, i.e., an index built based on unprocessed texts, and it was used as a starting point for comparison. The second contained the index built based on texts with the semantic annotations, and it was used to represent the semantic search.

The evaluation was based on recall and precision. Therefore, the relevance of each document should be determined according to the user's interest, in this case, the query made into the system. Since these databases have not been previously classified, it is necessary to first evaluate the references relevant to the topic of the query. Therefore, in order to determine the relevance, the databases were evaluated by 5 experts, who also assessed the documents that were not retrieved, according to the following queries:

- P-1 Documents related to ameaça (threat) (query: ameaça).
- P-2 Documents related to risco operacional (operational risk) (query: risco operacional).
- P-3 Documents related to risco de crédito (credit risk) (query: risco de crédito).
- P-4 Documents related to bens (goods) (query: bens).
- P-5 Documents related to crime (crime) (query: crime).

For each query, Table 1 shows: the number of documents retrieved by the traditional search; the number of documents retrieved by the semantic search; the documents considered relevant; the values for precision, recall, and F measurement; and the number of documents in the database that were considered relevant by the experts (last column).

Table 1. Results' evaluation.

Query | ts  | RRD | P%     | R%     | F%    | ss  | RRD | P%     | R%     | F%     | RDD
P-1   | 2   | 2   | 100.00 | 1.26   | 2.48  | 159 | 159 | 100.00 | 100.00 | 100.00 | 159
P-2   | 175 | 14  | 8.00   | 93.33  | 14.74 | 15  | 15  | 100.00 | 100.00 | 100.00 | 15
P-3   | 720 | 18  | 2.50   | 100.00 | 4.88  | 19  | 18  | 94.74  | 100.00 | 97.30  | 18
P-4   | 30  | 10  | 33.33  | 6.54   | 10.93 | 154 | 153 | 99.35  | 100.00 | 99.67  | 153
P-5   | 2   | 2   | 100.00 | 25.00  | 40.00 | 7   | 7   | 100.00 | 87.50  | 93.33  | 8

Notes: ts: documents retrieved by the traditional search; ss: documents retrieved by the semantic search; Relevant Retrieved Documents (RRD); Relevant Documents in the Database (RDD).
Source: Created by the author.

All queries have characteristics that may influence the results obtained by search engines. From the language processing point of view, the complexity of the queries increases, demonstrating an improvement with the use of a semantic information retrieval system. The results, the linguistic complexities that affect the performance of traditional search engines, and the way the proposed approach dealt with them are discussed below.

There was a considerable difference between the traditional search and the semantic search in query P-1 due to the preference for the term 'risco' (risk) in the documents that compose the database, which was reflected in the low recall, only 1.26. The traditional index is not able to retrieve the term 'risco' (risk) because it is not explicit in the query. However, the semantic search can recognize other terms that are present in the RiscoLex, and thus they were added to the index. 'Ameaça' (threat), 'risco' (risk), and 'perigo' (danger) were therefore indexed and considered as access points.

The P-2 query showed good recall but low precision in the traditional search, i.e., many documents were retrieved but most were irrelevant. This is due to the lack of identification of compound terms, resulting in the search for the isolated terms 'risco' (risk) and 'operacional' (operational). Furthermore, there is no processing for plural terms. Thus, a document containing the compound term 'riscos operacionais' (operational risks) was not retrieved in the ts either. On the other hand, the ss finds a single term, 'risco operacional' (operational risk), which leads to the accurate retrieval of all documents containing the compound term, including the one with its plural form.

The P-3 query indicated the importance of proper identification of compound terms and stop words, as well as of processing the diacritical marks characteristic of the Portuguese language. Traditional searches usually eliminate the accent marks and, therefore, the words 'crédito' (credit) and 'credito' (to credit; the verb 'creditar' conjugated in the first person of the present indicative), without the acute accent, will have the same form. Both words are found in the domain investigated, and thus recognition and change to the canonical form before annotation is convenient. Therefore, the high recall and low precision in the traditional search indicate the lack of processing of the mentioned topics. In the semantic search, high recall and precision values were found, as expected.

In the P-3 query, there was one extra document retrieved in the ss, but it was considered irrelevant by the experts because it is a text in which the term appears only in a list of several risks in the context of risco operacional (operational risk). The reference is included in the document as follows: '... além dos riscos de crédito e de mercado, introduziu-se o risco operacional...'. This is a typical case in which only human judgment can determine the relevance of the term, and there is no way to treat it automatically, which makes automatic processing more difficult.

The P-4 query showed an improvement in the results when the syntactic categories are considered as a way of defining the meaning of the terms. The traditional search retrieved documents that have the term 'bem', but it did not include its plural form 'bens', resulting in 16 references in the database. It is important to mention that out of the 30 documents retrieved by the ts, 20 had the comparative phrase 'bem como' (as well as), and therefore, with the exception of 3 references that besides this comparative phrase also contained other relevant terms and terms with the same meaning, they should not be retrieved. With the normalization of the plural form, 154 documents that also included the synonyms 'ativo' (asset), 'propriedade' (property), 'posse' (ownership), and 'recurso' (resource) were retrieved in the semantic search. Thus, there was one extra document retrieved in the ss, which indicated a mistaken annotation that was observed due to this result. Moreover, of the 20 references to the term 'bem como' (as well as), 17 were excluded because they did not have the synonyms in the text.

The last query, P-5, showed the need for more attention to details, since it also involves the inference process that includes the search for hyponyms. The meaning adopted refers to an action that breaks a law, principle, or agreement from the perspective of the ordinary citizen; i.e., it does not include all technical aspects that the word expresses in terms of 'Direito' (Law). 'Violação' (violation) is the synonym for 'crime' (crime). Thus, it includes the following hyponyms for the domain: 'roubo' (theft), 'assalto' (robbery), 'ataque' (assault), 'falsificação' (counterfeit), 'rapto' (kidnapping), and 'infração' (infringement).

It is noteworthy that, in general, a search includes concepts from the most general to the most specific; thus, in the existential path towards the concept, an ordinary user does not take into account the complexities and existing linguistic relations between his/her query and the expected result. When a general term is searched, the user is satisfied by the retrieved documents that meet his/her information needs without realizing that the results include more specific concepts, such as 'roubo' (theft), which is an extension of the concept of 'crime'. Duly noted hyponyms play this "specifying" role in a semantic query.

Conversely, in the intentional path towards the concept, there is an increase in the vagueness and scope of the search, leading to an increase in recall but a reduction in precision, which is not the objective of most search engines. The example below illustrates the taxonomy of the concepts of the term 'crime', according to WordNet; from left to right, from the most general to the most specific meaning:

entidade → abstração → característica psicológica → evento → ato → atividade → transgressão → crime
(entity → abstraction → psychological characteristic → event → action → activity → transgression → crime)

It was observed that the more general the term, the more comprehensive the search would be. For example, the search for the term 'atividade' (activity) would certainly retrieve documents that are not related to 'crime' (crime) at all, which is not desirable in this example. Thus, in this semantic search approach, only the hyponymic relationships that aim to improve the specificity of the search, and consequently its precision, are considered.

It can be seen from Table 1 that the traditional search retrieved 2 documents with 100.0% precision but with low recall, only 25.0%. The semantic search, however, also had great precision but with much higher recall, 87.5%, due to the fact that this search also identified hyponyms annotated in the database.

An important fact that deserves attention is the failure to identify all relevant documents in the database, since there was one more document in addition to the 7 documents retrieved. In the manual analysis, one document was found that had the term 'atentado' (attack), which is also a hyponym of 'crime', but it was not annotated or identified to be inserted into the RiscoLex database.

This case could be easily solved with the introduction of the annotation and the insertion of the term into the RiscoLex. However, this fact is important to emphasize that human verification is a very important part of the process for debugging the lexicon-ontological databases, OntoRisco and RiscoLex, resulting in a substantial information retrieval improvement. Furthermore, it shows the dynamic nature of knowledge bases, which require periodic maintenance.

Finally, it can be said that the queries submitted to the model showed that the semantic search outperforms the traditional search and validates the methodology. The process involving the preparation of the ontolexical databases and the annotation of the corpus of the domain to be indexed is complex, but it promotes considerable improvement in the task of providing the most appropriate information to the user. Although more complex, the procedure proposed can be used in all kinds of domains, optimizing the results obtained.

Conclusion

The present study addressed the use of Semantic Web technologies for processing textual information and the construction of a lexical-semantic database that is in accordance with the standard adopted by the W3C. The aim of this is to support the proposed information retrieval model, which uses linguistic and semantic information that improves precision in the information retrieval process.

From the perspective of Information Science, research in this field is still scanty because the literature is mostly produced in the Computer Science field. There is "plenty of room" for scientific research that would foster the development of the information science field and promote the popularization of the Semantic Web, also including the view of information scientists. Therefore, one of the contributions of the present study is to bridge the gap between the development of computational resources and the management and organization of information, from the perspective of information science.

Another contribution of this study is to provide resources in Portuguese for the financial segment. The first resource is the construction of the RiscoLex, a novel lexical database containing morphological, syntactic, and semantic information about terms related to risco financeiro (financial risk). The second resource is the development of the ontology OntoRisco for the same domain.

For all of these reasons, it can be said that the objective of this study was achieved, since the proposal was to create a lexical database, RiscoLex, in Brazilian Portuguese containing morphological, syntactic, and semantic information that can be read by machines in the RDF format, allowing the link between structured data and an unstructured textual corpus, and to integrate it into a semantic information retrieval model in order to improve precision.

As a suggestion for future research, we recommend a study on users in order to collect ideas to improve the vocabulary and, at the same time, to consider the idiosyncrasies and the jargons of the domains to be explored. This could contribute to improving the search results of lexical databases with semantic IRS. Moreover, the adoption of different weighting factors, other than the tf-idf, to build a lexical-semantic index would be highly useful. Moreover, the lexical-semantic databases created by ontology lexicalization could be used as tools to improve automatic summarization or automatic text writing. Finally, the participation of lexicographers,

M. SCHIESSL & M. BRÄSCHER 70 ONTOLOGY LEXICALIZATION 71 : , … ed. IEEE , v., 7, nd 2 … Santa Computer . p. 445-458. Proceedings from:

et al SYSTEMS : ICSC, 2008. p.253-260. Available p.253-260. 2008. ICSC, : et al

, 14 , Teoriado conceito. . Formal ontology in information systems. In: 42, n. 5, p. 557-577, 2003. Av 2003. 557-577, p. 5, n. 42, Trans encies within & outside the financial sector. [S.l]: , v. lications All authors contributed to the conception and design and conception the to contributed authors All nd e, Services and Agents on the World Proceedings... a (CA):a USA NAGEMENT , 2008, Santa Clara, California, USA. nd STAAB, S.; STUDER Berlin: Springer, 2009. p. 1-17. Available from: .Cited: Sept. 27, 2011. R. MCCOOL, R.; GUHA, Networks DAHLBERG,I. .Acesso em: 22 out. 2011. FELLBAUM, C. Cambridge(MA): The MIT Press, 1998. FERNÁNDEZ, M. retrieval: An ontology-basedScienc approach. n. 4, p. 434-452, 2011. Available10.1016/j.websem.2010.11.003>. Cited: from: Feb. 20, 2011.
