A multi-layer markup language for geospatial semantic annotations
Ludovic Moncla, Mauro Gaio

To cite this version:

Ludovic Moncla, Mauro Gaio. A multi-layer markup language for geospatial semantic annotations. 9th Workshop on Geographic Information Retrieval, Nov 2015, Paris, France. ⟨10.1145/2837689.2837700⟩. ⟨hal-01256370⟩

HAL Id: hal-01256370 https://hal.archives-ouvertes.fr/hal-01256370 Submitted on 1 Jan 2018

A Multi-Layer Markup Language for Geospatial Semantic Annotations

Ludovic Moncla∗
Universidad de Zaragoza
C/ María de Luna, 1 - Zaragoza, Spain

Mauro Gaio†
LIUPPA - EA 3000
Av. de l'Université - Pau, France

ABSTRACT
In this paper we describe a markup language for semantically annotating raw texts. We define a formal representation of text documents written in natural language that can be applied to the tasks of Named Entity Recognition and Spatial Role Labeling.

The proposal relies on a multi-layer process based on a core generic layer, which can be freely adapted into more specific layers depending on the intended goal. Our markup language is based on the TEI Guidelines¹ so that it remains generic and extensible. The language is particularly dedicated to text mining tasks and is ready to be layered with more semantic relationships between elements of the text.

We show the feasibility of this proposal from a generic annotation of texts describing itineraries toward a geospatial semantic annotation.

CCS Concepts
•Information systems → Data encoding and canonicalization; Language models; Information extraction; •Applied computing → Extensible Markup Language (XML);

Keywords
geo-semantic tagging; text annotation; expanded named entity

∗ Laboratoire COGIT, 73 av. de Paris - 94160 Saint-Mandé, France
† LIUPPA - EA 3000, Av. de l'Université - Pau, France
¹ TEI is an international standard for textual markup defined by the TEI Consortium. It provides a guide to best practices for interchange and encoding of textual material in digital form. http://www.tei-c.org/Guidelines/P5/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
GIR '15, November 26-27, 2015, Paris, France
© 2015 ACM. ISBN 978-1-4503-3937-7/15/11. . . $15.00
DOI: http://dx.doi.org/10.1145/2837689.2837700

1 Motivation and Background
Two categories of markup languages can be considered in geospatial tasks: those dedicated to the encoding of spatial data (e.g., GML, KML) and those dedicated to the annotation of spatial or spatio-temporal information in texts, such as in semantic role parsing [4, 10]. SpatialML [6] is more focused on the annotation of static spatial information and does not provide any support for identifying spatio-temporal information such as motion. ISO-Space [12] annotates spatio-temporal information and has also been designed for capturing implicit spatial information. Although ISO-Space seems a comprehensive standard for capturing spatio-temporal information, some of its elements remain complex and are not really suited to a fully automatic process. TEI, on the other hand, is a standard for textual markup and provides a guide to best practices for interchanging and encoding textual information. TEI has been designed to be as generic and adaptable as possible and provides a generic framework particularly suited to customization.

The existing markup languages dedicated to annotation focus more on the specification of relations (spatial or spatio-temporal) than on named entities (NE). Standard markup languages consider an NE to be composed only of a pure proper name, whereas we consider more complex expressions known as expanded named entities (ENE) [7, 15]. The ENE concept is essential in any task that aims to retrieve all or part of the evocative context of the NE, such as disambiguation. For those reasons, we propose the specification of a multi-layer markup language adapted to the annotation of both ENEs and spatial/spatio-temporal relations. The proposed TEI-compliant markup language is based on a core generic layer, dedicated to the annotation of NE, which can be used to share pre-processed corpora of documents. Finally, the language is designed to be compatible with an automatic annotation process.

Although still at an early stage of development, the proposed markup language has been applied to a problem of automatic information extraction and toponym resolution described in [8] and to a problem of automatic itinerary reconstruction described in [7].

The remainder of the paper is structured as follows. Section 2 describes the 'core generic layer' of our multi-layer annotation language. Then, Section 3 describes the adaptation of the generic language for a specific semantic role. Finally, Section 4 summarises and concludes this paper.

2 A Generic Markup Language for Expanded Named Entity Representation
The main elements of our proposed generic markup specification are categorized in two groups: the first refers to the elements contained within expanded named entities, the second refers to the standard elements describing the text segmentation and a set of global attributes defined by the TEI P5 Guidelines which are available for all TEI elements.

2.1 Expanded Named Entity Representation
2.1.1 Principles
According to [5] there are two categories of proper names: pure and descriptive. Pure proper names can be simple (i.e., composed of a single lexeme) or complex (i.e., composed of several lexemes) and are only composed of proper names. Descriptive proper names refer to a composition of proper names and common names (i.e., expansion). In other words, descriptive proper names overlap pure proper names. Descriptive proper names refer to an NE built with a pure proper name and a descriptive expansion. This expansion can change the type (e.g., location, person, etc.) of the initial pure proper name.

An ENE may contain an entity built with both categories of proper names (i.e., pure and descriptive), and can be composed of one or more concepts, whereas most NER work considers only pure proper names. This allows us, in particular, to move beyond the reduction of place to a name and a set of coordinates, a model still predominant in Geographic Information Science as noted by [11]. We define several levels of overlapping (0, 1, 2, etc.) for the representation of ENE. Each level encapsulates the previous one.

Level 0. This level refers to pure proper names. It can be seen as the core component of an ENE. Thus, we consider NE as a special kind of ENE. Examples (1 - 3) show entities of level 0:

(1) Cardiff → one entity (location)
(2) Balaruc-le-Vieux → one entity (location)
(3) Charles de Gaulle → one entity (person)

Level 1. This level refers to descriptive proper names composed of a pure proper name (i.e., an entity of level 0) and a common noun (i.e., expansion). Examples (4 - 6) show the representation of such ENE. Note that in these cases, descriptive expansions do not change the intrinsic nature of the object described by the proper name.

(4) commune de Balaruc-le-Vieux
(5) comunidad autónoma de Aragón
(6) général Charles de Gaulle

However, when the associated term is not equal to the intrinsic or default type of the pure proper name, it defines a new entity that overlaps the pure proper name. Examples (7 - 10) illustrate that an entity may contain the name of another entity, and that the new entity may have a different type. For instance, a sentence containing the expanded named entity "sindaco di Venezia" (sindaco = mayor) is referring to the person and not necessarily to the city.

(7) Cardiff Downtown → two entities, Cardiff (location) and Cardiff Downtown (location)
(8) deltà del Ebro → two entities, Ebro (location) and deltà del Ebro (location)
(9) château de Versailles → two entities, Versailles (location) and château de Versailles (location)
(10) sindaco di Venezia → two entities, Venezia (location) and sindaco di Venezia (person)

Level 2. This level refers to a descriptive proper name composed of another descriptive proper name. ENE of level 2 are built with an ENE of level 1 and a descriptive expansion. Examples (11 - 13) show some ENE of level 2. The behaviour is the same as for the previous level: the expansion can change the nature of the object described by the ENE of level 1. For instance, in example (13) the person ENE 'général Charles de Gaulle' becomes a reference to a location with the expansion 'avenue', which has a geographical sense.

(11) Parque Natural del Delta del Ebro
(12) Cardiff City Football Club
(13) avenue du général Charles de Gaulle

Level 3. ENE at this level are built with an ENE of level 2 plus a descriptive expansion. In principle there is no limit to the overlapping. However, it is rare to find an ENE of level 3 or more. Examples (14 - 15) show some ENE of level 3.

(14) la valle glaciale del lago di monte Acuto
(15) the owner of the restaurant of Neuvic Lake

The proposed hierarchy of overlapping of ENE introduces more detailed entities and allows the annotation of finer-grained NE with fewer classification errors.

In our language each level of the ENE can be marked, from the pure proper name to the whole ENE. For instance, considering example (15), using the NER method described in [8] produces the following results:
• 'Neuvic' as proper name (ENE level 0)
• 'Neuvic Lake' as geographical name (ENE level 1)
• 'restaurant of Neuvic Lake' as place name (ENE level 2)
• 'owner of the restaurant of Neuvic Lake' as person (ENE level 3)

To illustrate the advantage of using the introduced concept of ENE, we have tested four well-known English NER tools (Stanford, Open Calais, Illinois and FreeLing) with example (15). The results are shown below:
• Stanford Named Entity Recognizer² annotates one entity:
  – 'Neuvic Lake' as location
• Open Calais³ annotates two NE:
  – 'Neuvic Lake' as natural feature
  – 'restaurant of Neuvic Lake' as facility
• NER tool of the Cognitive Computation Group⁴ [13] of the University of Illinois annotates only one NE:
  – 'Neuvic Lake' as organization
• FreeLing⁵ annotates one NE:
  – 'Neuvic Lake' as geographical location

We argue that for a powerful task, of marking and of cate[…]

Table 1: Attributes for the named entity tag
type      category of NE (location, person, organization, etc.)
subType   semantic sub-categorization

[…] category of the NE (location, person, organisation, etc.) such as the ENAMEX types proposed in the MUC-6 typology or such as those defined by [15, 16]. The subType attribute indicates a second level in the classification such as roadName if the type attribute value is equal to 'location'. Figure 1 illustrates the annotation of pure proper names (i.e., ENE of level 0) using a TEI element.

Figure 1: Examples of Named Entities (NE) (annotated strings: "Vanoise"; "GR55")

In our annotation scheme, we also take advantage of tags provided by TEI for dates and time. These tags are described in the Core module of the TEI Guidelines and are available for all TEI documents. They refer to the expressions of date, time or duration in texts. Figure 2 shows some examples of annotation of date and time entities.

Figure 2 (annotated string: "July 2015")
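The level mechanism of Section 2.1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each level is serialised as a nested `rs` element carrying the global `n` attribute, the helper names (`level0`, `expand`, `describe`) are hypothetical, and the type labels simply reuse the wording of the result list for example (15). It is not the annotation tool described in [8].

```python
import xml.etree.ElementTree as ET


def level0(name, ne_type):
    """Level 0: a pure proper name, the core component of an ENE."""
    element = ET.Element("rs", {"type": ne_type, "n": "0"})
    element.text = name
    return element


def expand(inner, ene_type, before="", after=""):
    """Wrap `inner` in a new entity one level higher.

    `before` and `after` hold the descriptive expansion. The new entity may
    have a different type from the entity it contains.
    """
    outer = ET.Element("rs", {"type": ene_type, "n": str(int(inner.get("n")) + 1)})
    outer.text = before
    inner.tail = after
    outer.append(inner)
    return outer


def describe(ene):
    """List (level, type, surface string) for every entity, outermost first."""
    return [(int(e.get("n")), e.get("type"), "".join(e.itertext()))
            for e in ene.iter("rs")]


# Example (15): "the owner of the restaurant of Neuvic Lake"
neuvic = level0("Neuvic", "proper name")
lake = expand(neuvic, "geographical name", after=" Lake")
restaurant = expand(lake, "place name", before="restaurant of ")
owner = expand(restaurant, "person", before="owner of the ")

for level, ene_type, text in describe(owner):
    print(level, ene_type, repr(text))
print(ET.tostring(owner, encoding="unicode"))
```

Because every level stays addressable, a consumer can pick the granularity it needs, for instance the level 1 entity for toponym resolution and the level 3 entity for classification.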

Table 4: Attributes for the <w> tag
lemma     canonical form of the word
type      part-of-speech
subType   semantic sub-categorization

Table 4 shows the current set of <w> attributes defined in our specification and Figure 7 shows the result of the annotation of sentence (16).

Figure 7: Annotation of words, grammatical phrases and punctuation (annotated sentence: "Êtes-vous parvenu au refuge du Col de la Vanoise ?")

The lemma attribute for <w> is optional and may contain the canonical form of the word. The type attribute for <w> is mandatory and contains the lexical category of the word (part of speech). Although there are significant variations depending on the language, common parts of speech are: noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, numeral, article or determiner. The label for each category is usually an abbreviation such as n, v, adj, adv, etc. Different POS tagsets have been defined and used in well-known projects such as the Brown Corpus⁶, the Penn Treebank⁷ [14] and the French Treebank⁸ [1]. The subType attribute for <w> is optional and may contain semantic information. For instance, in the current version of the language, subType is used to classify verbs. The possible values are: motion initial, motion median, motion final, perception and topographic.

The <phr> element defined by the TEI Guidelines in the analysis module represents a grammatical phrase. The type attribute may be used to indicate the type of phrase such as noun phrases, prepositional phrases, verbal phrases, etc. Figure 7 shows such annotation on sentence (16).

The <pc> element contains a character or string of characters regarded as constituting a single punctuation mark. Table 5 shows the current set of <pc> attributes. All the attributes of the <pc> element are optional. The force attribute for <pc> indicates whether the punctuation mark is a separator for words or phrases. The type attribute for <pc> indicates the kind of punctuation. Figure 7 illustrates the annotation of punctuation.

All these elements used for the text segmentation and the representation of ENE define the generic layer of the multi-layer markup language. They have been applied to text mining and NER tasks.

⁶ http://www.comp.leeds.ac.uk/ccalas/tagsets/brown.html
⁷ http://www.cis.upenn.edu/~treebank/
⁸ http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php

3 Towards a Geospatial Semantic ML
This section describes the adaptation that transforms the generic core layer into a geospatial semantic markup language: the second layer.

We propose some guidelines for a TEI-compliant markup language for encoding spatial information. Some elements belonging to the generic layer of our multi-layer markup language are turned into more specific elements embedding geospatial semantics.

According to the Namesdates module of the TEI Guidelines, the content of <geogFeat> elements is defined as a common noun identifying some geographical feature (e.g., valley, mount, etc.) contained within a spatial NE. This is the equivalent of the definition of a descriptive expansion that is part of an ENE having a geographical denotation and associated with a spatial NE. Thus, in our customized specification, the generic elements having a geographical sense (i.e., city, lake, river, etc.) which are used in conjunction with a spatial NE are turned into <geogFeat> elements. Figure 8 shows two examples of <geogFeat> elements.

Figure 8: Annotation of geographical feature names (annotated strings: "lac Grattaleu"; "torrent de la Leisse")

According to the Namesdates module, the <geogName> element identifies a name associated with some geographical feature such as 'River Thames' or 'col de la Vanoise'. Thus, the generic elements which refer to geographical names are turned into <geogName> elements. Figure 9 shows an example of annotation of a <geogName> element.

Figure 9: Annotation of geographical names (annotated string: "col de la Vanoise")

As we have seen in the definition of the concept of ENE and in the description of the generic layer, we defined several levels of encapsulation. The global n attribute may also be used to indicate the level of encapsulation of <geogName> elements (in the same way as in the generic layer). Figure 10 shows an example of encapsulation of two <geogName> elements.

Figure 10: Encapsulation of <geogName> elements (annotated string: "refuge du Col de la Vanoise")

According to our customized specification, the attributes type and subtype are optional and their values refer to the nature of the geographical feature such as lake, mountain, valley, city, etc. In the current version of the language, we propose to follow the classification introduced for the GeoNames Ontology⁹. The nine categories of feature classes of GeoNames are shown in Table 6. The code column lists the nine possible values for the type attribute and the feature code column shows some examples of feature classes which are used for the subtype attribute. According to our customized specification, these two attributes can also be used for the <geogFeat> element when it is not included in a <geogName> element.

Table 6: GeoNames feature classes
Name                        Code   Feature code
Administrative boundaries   A      first-order administrative division (ADM1). . .
Area                        L      locality (LCTY), park (PRK). . .
Hydrographic                H      stream (STM), lake (LK). . .
Hypsographic                T      valley (VAL), pass (PASS). . .
Populated place             P      populated place (PPL), farm village (PPLF). . .
Road / Railroad             R      trail (TRL), street (ST), road (RD). . .
Spot                        S      school (SCH), resthouse (RHSE). . .
Undersea                    U      canyon (CNYU), reef (RFU). . .
Vegetation                  V      forest (FRST), cultivated area (CULT). . .

⁹ http://www.geonames.org/ontology/

According to the TEI Guidelines, the <offset> element marks that part of a relative temporal or spatial expression which indicates the direction of the offset. Thus, the generic elements referring to spatial relations are turned into <offset> elements in the geospatial semantic layer of our proposal.

According to our specification of the geospatial semantic layer, the <offset> element annotates spatial relations expressed in texts. At the current level of development, we distinguish three categories of spatial relations: topological relations [2], directional relations [3] and distances. Currently we consider two main types of topological relations: adjacency and inclusion. Furthermore, distance relations are described with the <measure> element in the paragraph below.

Table 7: Attributes for the <offset> tag
type      orientation, direction initial, direction final, meet, inside, . . .
subtype   sub-categorization

Table 7 shows the current set of <offset> attributes defined by our customized specification and Figure 11 shows three examples of annotation of the <offset> element.

Figure 11: Annotation of <offset> elements (annotated strings: "près des"; "au nord de"; "jusqu'au")

The type attribute is mandatory and its value shall be: orientation, direction initial or direction final, meet, inside... The value of the optional subtype attribute depends on the value of the type attribute. For the orientation type, the value of the subtype attribute shall be: south, east, north-west, above, behind, etc. For the adjacency type, the value of the subtype attribute shall be: next, near, etc. And for the inclusion type, the value of the subtype attribute shall be: in, inside, etc.

According to the TEI Guidelines, the <measure> element contains a word or phrase referring to some quantity, usually comprising a number, a unit, and a commodity name. Thus, according to our specification of the geospatial semantic layer, the generic elements referring to distance relations are turned into <measure> elements in the geospatial semantic layer of the language.

Table 8: Attributes for the <measure> tag
unit       units used for the measurement
quantity   numeric value

Table 8 shows the current set of <measure> attributes defined in our customized specification. All attributes are optional. The unit attribute indicates the units used for the measurement. The value shall be expressed in the International System of Units (SI)¹⁰ such as meter and second. The value of the quantity attribute shall be a numeric value. Figure 12 shows an example of annotation of a <measure> element.

Figure 12: Annotation of <measure> element (listing fragment: quantity="200"; annotated string: "two hundred meters")

¹⁰ http://www.bipm.org/en/publications/si-brochure/

The <placeName> element is defined by the TEI Guidelines as containing an absolute or relative place name. Thus, according to our specification of the geospatial semantic layer, the elements of the generic layer of our proposal referring to geographical places are turned into <placeName> elements. Furthermore, we specify that <geogName> elements must be included in <placeName> elements.

Table 9: Attributes for the <placeName> tag
type   absolute or relative

Figure 13: Annotation of an absolute <placeName> (listing fragment: <placeName type="absolute">; annotated string: "refuge du Col de la Vanoise")

We distinguish between two types of <placeName>, absolute and relative, and we define the type attribute as optional (Table 9). Absolute <placeName> elements refer to standard spatial ENE and relative <placeName> elements refer to spatial ENE associated with spatial relations (i.e., <offset> and <measure> elements). In other words, the corresponding elements defined in the generic layer are turned into either <placeName type="absolute"> or <placeName type="relative"> elements. Figure 13 and Figure 14 show an example of annotation of an absolute and a relative <placeName> element respectively.

Figure 14: Annotation of a relative <placeName> (listing fragment: <placeName type="relative">; annotated string: "200 mètres au nord de Pau")

The <place> element is defined by the TEI Guidelines as a generic element containing data about a geographic location. In the current version of our specification of the geospatial semantic layer, we use this element in place of its generic-layer counterpart. Thus, we consider that the <place> element refers to the definition of a spatial area from the association of several locations (Figure 15). According to our specification, <place> elements can carry the type and subtype attributes described for the <geogName> element and referring to the feature class of the spatial object.

Figure 15: Annotation of a <place> element (listing fragments: <placeName xml:id="pn2">, <placeName xml:id="pn1">; annotated string: "les bourgs de Barioz et de Bieux")

We customize the <phr> element described in the generic layer of our multi-layer markup language for annotating motion events and perception expressions. According to our specification of the geospatial semantic layer, the subtype attribute (optional) indicates the semantics of the verbal phrase; its value shall be: motion or perception. The function attribute (optional) indicates the motion class. In the current version of our proposal we consider a set of six motion classes based on the classification of verbs of motion proposed by [9]: leave, hit, reach, external, internal, and cross. Figure 16 shows the result of the annotation of sentence (16) using the elements available in our geospatial semantic layer.

Table 10: Attributes for the <phr> tag
type       type of the phrase
subtype    semantic sub-categorization (e.g., motion or perception)
function   a second level of semantic sub-categorization

Figure 16: Annotation of sentence (16) in the geospatial semantic layer (annotated sentence: "Êtes-vous parvenu au refuge du Col de la Vanoise ?")

3.2 Encoding Geometric Properties of Spatial Features
The TEI Guidelines also describe some elements for encoding geometric properties of spatial features. According to the Namesdates module, the <location> element defines the location of a place as a set of geographical coordinates and the <geo> element contains any expression of a set of geographic coordinates. However, geographic coordinates such as latitude and longitude values are not often available directly in the textual description and must be retrieved from external geographic resources.

Figure 17 shows an example of annotation using the <location> and <geo> elements. Two optional elements indicate the continent and the country, respectively, to which the location belongs. Because of the several levels of expansion (see Section 2.1), according to our specification the <location> and <geo> elements can be nested in various elements depending on the ENE to which they refer. For instance, in Figure 17 the <location> element is nested in the element annotating the spatial NE 'Pau' and refers to its location, whereas for relative ENE (e.g., 'sud de Pau') the <location> element is nested in another element. Furthermore, a location may be specified by using a non-TEI XML vocabulary such as GML and KML. We also use RDF identifiers to interlink with resources on the Web of Linked Data such as GeoNames or DBpedia.

Figure 17: Annotation of the <location> element (listing fragment: <placeName type="absolute">; annotated string: "Pau"; coordinates: "43.301667 -0.368611")

3.3 Indication of Uncertainty
The <certainty> element indicates the degree of certainty associated with some aspects of the text markup. This element is described in the Certainty module of the TEI Guidelines.

Table 11: Attributes for the <certainty> tag
target          URI data pointer
locus           name, start, end, location, value
assertedValue   alternative value
degree          degree of confidence

Table 11 shows the current set of <certainty> attributes defined in our specification. The target attribute indicates the element to which the certainty is applied using the URI syntax. The target attribute is optional and, if it is not expressed, the certainty applies to the parent element. The locus attribute indicates the aspect concerning which certainty is being expressed. The locus attribute is mandatory and its value shall be one of the following: name, start, end, location, value. The name value indicates that the uncertainty concerns the name of the element to which the <certainty> element refers. The start, end and location values indicate respectively whether the start, the end or both the start and the end of the element are correctly identified. The value value indicates that the uncertainty concerns the content of the element. The assertedValue attribute provides an alternate value for the aspect of the considered markup. For instance, when the value of the locus attribute is equal to name, the value of the assertedValue attribute refers to the al[…] and finally, the degree attribute indicates the degree of confidence expressed by the <certainty> element.

With respect to the problem of NE classification, the <certainty> element can be used to indicate the degree of certainty of the type assigned to an NE. Figure 18 shows an example of a <certainty> element applied to a <placeName> element. The name value of the locus attribute and the rs value of the assertedValue attribute indicate that the <placeName> element may be an <rs> element.
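The closed attribute constraints stated in Section 3 lend themselves to a small automatic check. The Python sketch below is an illustration only, under several assumptions: the function name `check_layer` and the sample documents are hypothetical, the nesting used in the samples is a plausible reading of Figures 14 and 18 rather than a copy of them, and open-ended value lists (those given with "etc.") are deliberately left unchecked.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Closed value sets stated in the text. All attributes listed here are optional.
OPTIONAL_CLOSED = {
    "geogName": {"type": set("ALHTPRSUV")},           # nine GeoNames classes
    "placeName": {"type": {"absolute", "relative"}},
    "phr": {"subtype": {"motion", "perception"},
            "function": {"leave", "hit", "reach", "external", "internal", "cross"}},
}
LOCUS_VALUES = {"name", "start", "end", "location", "value"}


def check_layer(root):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    for el in root.iter():
        for attr, allowed in OPTIONAL_CLOSED.get(el.tag, {}).items():
            value = el.get(attr)
            if value is not None and value not in allowed:
                problems.append(f"{el.tag}/@{attr}: unexpected value {value!r}")
        if el.tag == "offset" and el.get("type") is None:
            problems.append("offset: @type is mandatory")
        if el.tag == "measure" and el.get("quantity") is not None:
            try:
                float(el.get("quantity"))
            except ValueError:
                problems.append("measure/@quantity: not a numeric value")
        if el.tag == "certainty":
            if el.get("locus") not in LOCUS_VALUES:
                problems.append("certainty: @locus is mandatory and closed")
            target = el.get("target")
            # An absent @target means the certainty applies to the parent element.
            if target is not None and target.lstrip("#") not in ids:
                problems.append(f"certainty/@target: no element with id {target!r}")
    return problems


# Hypothetical encoding of "200 mètres au nord de Pau" as a relative place name.
RELATIVE = ET.fromstring(
    '<placeName type="relative">'
    '<measure unit="m" quantity="200">200 mètres</measure>'
    '<offset type="orientation" subtype="north">au nord de</offset>'
    '<placeName xml:id="pn1" type="absolute"><geogName type="P">Pau</geogName>'
    '<certainty target="#pn1" locus="name" assertedValue="rs" degree="1.0"/>'
    '</placeName></placeName>'
)
print(check_layer(RELATIVE))
```

A check of this kind only covers what the prose states as mandatory or closed. A full validation would more naturally rely on a schema generated from the TEI customization itself.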
Figure 18: Annotation using the <certainty> element (listing fragments: <placeName xml:id="pl1">, <certainty target="#pl1" locus="name" assertedValue="rs" degree="0.6"/>; annotated string: "Paris")

Furthermore, the <certainty> element can also be used to indicate the certainty degree of the toponym disambiguation task. In this case, the <certainty> element must be associated with the <geo> element. Figure 19 shows an example in which the uncertainty concerns the geographical location of the spatial entity.

Figure 19: Annotation using the <certainty> element (listing fragment: <placeName xml:id="pn1" type="absolute">; annotated string: "Pau"; coordinates: "43.301667 -0.368611")

4 Discussion
The main idea of this paper is to propose a general framework for people to create their own specific markup language based on a core generic layer particularly dedicated to text mining tasks such as information extraction and data mining, but also to deeper linguistic processing such as semantic parsing and co-reference resolution. The core generic layer is also ready to be layered with more semantic relationships between elements of the text.

To our knowledge, NER is an important pre-processing step for most of these tasks, and the automatic NER process for Indo-European languages can be more or less challenging (e.g. for German it is especially challenging). This is the reason why our contribution is based on examples from this task and our proposal relies on the TEI standard, which is widely used in and for Indo-European languages. Thus, the objective is to define several specific languages, each one adapted to a specific need and all based on the same generic core layer. The proposed generic core layer may be used to create and share pre-processed corpora.

Table 12 shows a summary of the different elements defined by the generic and the geospatial layers of our multi-layer markup language.

Table 12: Summary of the tagset defined for the Generic and the Geospatial layer of our multi-layer markup language
Columns: Textual elements; Tagset of the Generic layer; Tagset of the Geospatial layer
Textual elements: word; sentence; punctuation; spatial and temporal relations; measure expressions; expansion of ENE; NE (level 0); ENE (level > 0); verbal phrase

Unlike a great deal of current research dealing with NER, which considers only pure proper names or very few entities that we define as ENE of level 1 such as Eiffel Tower and River Thames, we consider in our proposal both categories of proper names (i.e., pure and descriptive).

Figure 20 shows the result of the annotation of sentence (16) using all the elements and attributes defined in our customized specification of a geospatial semantic language.

Figure 20: Example of annotation (listing fragments: <placeName xml:id="pn1" type="absolute">, <certainty target="#pn1" locus="name" assertedValue="rs" degree="1.0"/>, "51.969604 -2.893146", <certainty target="#geo1" locus="value" degree="1.0"/>; annotated sentence: "Êtes-vous parvenu au refuge du Col de la Vanoise ?")

We applied our proposal of a multi-layer markup language to the annotation of geospatial information from textual descriptions of itineraries. Then, we use this encoding of geospatial information in order to automatically reconstruct the described itinerary. The automatic reconstruction of itineraries is based on a calculation (multi-criteria approach) using all the elements annotated from the textual description. Given that our modeling work is still in progress, the information concerning the reconstructed itinerary is not yet introduced in the annotation. Despite this we have shown how it is possible to define new layers according to needs and aims. For instance, users can define a third layer derived from the second layer of the multi-layer markup language using non-TEI elements and also non-consuming tags which may interlink already tagged elements such as NE or ENE with verbs or any other kind of element (e.g., spatial or temporal relations, etc.). On our side, we are considering in the short term specifying a third layer introducing more semantics and dedicated to the representation of motion using some ISO-Space tags such as the spatial link elements.

5 Acknowledgments
This work has been partially supported by: the Communauté d'Agglomération Pau Pyrénées (CDAPP) and the Institut National de l'Information Géographique et Forestière (IGN) through the PERDIDO project; the Spanish Government (project TIN2012-37826-C02-01); and the Aragon and Aquitaine Regional Governments through the transborder Aragon-Aquitaine cooperation programme 2014.

References

[1] A. Abeillé, L. Clément, and F. Toussenel. Building a treebank for French. In A. Abeillé, editor, Treebanks, number 20 in Text, Speech and Language Technology, pages 165–187. Springer Netherlands, Jan. 2003.

[2] M. Egenhofer and R. Franzosa. Point-set topological spatial relations. International Journal of Geographical Information Systems, 5(2):161–174, 1991.

[3] A. U. Frank. Qualitative spatial reasoning with cardinal directions. In Proc. of the Seventh Austrian Conference on Artificial Intelligence, pages 157–167. Springer, 1991.

[4] D. Gildea and D. Jurafsky. Automatic Labeling of Semantic Roles. Comput. Linguist., 28(3):245–288, Sept. 2002.

[5] K. Jonasson. Le nom propre. Duculot, Belgique, Louvain-la-Neuve, 1994.

[6] I. Mani, J. Hitzeman, J. Richer, D. Harris, R. Quimby, and B. Wellner. SpatialML: Annotation scheme, corpora, and tools. In Proc. of LREC'08, European Language Resources Association (ELRA), Marrakech, Morocco, May 2008.

[7] L. Moncla, M. Gaio, and S. Mustière. Automatic itinerary reconstruction from texts. In Proc. of GIScience 2014, Geographic Information Science, pages 253–267, Vienna, Austria, Sept. 2014.

[8] L. Moncla, W. Renteria-Agualimpia, J. Nogueras-Iso, and M. Gaio. Geocoding for texts with fine-grain toponyms: An experiment on a geoparsed hiking descriptions corpus. In Proc. of the SIGSPATIAL '14, pages 183–192, New York, NY, USA, 2014. ACM.

[9] P. Muller. A qualitative theory of motion based on spatio-temporal primitives. In Proc. of the Sixth International Conference on Knowledge Representation and Reasoning (KR98), pages 131–141, 1998.

[10] S. Pradhan, K. Hacioglu, W. Ward, J. H. Martin, and D. Jurafsky. Semantic Role Parsing: Adding Semantic Structure to Unstructured Text. In Proc. of ICDM '03, pages 629–632, Washington, DC, USA, 2003. IEEE Computer Society.

[11] R. S. Purves and C. Derungs. From space to place: place-based explorations of texts. International Journal of Humanities and Arts Computing, 1(9):74–94, 2015.

[12] J. Pustejovsky, J. L. Moszkowicz, and M. Verhagen. A linguistically grounded annotation language for spatial information. TAL, 53(2):87–113, 2012.

[13] L. Ratinov and D. Roth. Design challenges and misconceptions in named entity recognition. In CoNLL, 6 2009.

[14] B. Santorini. Part-of-speech tagging guidelines for the Penn Treebank project (3rd revision). Technical report, Department of Computer and Information Science, University of Pennsylvania, 1990.

[15] S. Sekine, K. Sudo, and C. Nobata. Extended named entity hierarchy. In Proc. of LREC'02, May 29-31, 2002, Las Palmas, Canary Islands, Spain, 2002.

[16] M. Tran and D. Maurel. Prolexbase : Un dictionnaire relationnel multilingue de noms propres. Traitement Automatique des Langues, 47(3):115–139, 2006.