The Problem of Using Persistent Identifiers for Historical

Zapiski Komisji Geografii Historycznej PTH

The problem of using persistent identifiers for historical geographical objects

Grzegorz Myrda, https://orcid.org/0000-0002-2756-8654
Tadeusz Manteuffel Institute of History, Polish Academy of Sciences

Tomasz Panecki, https://orcid.org/0000-0003-3483-2035
Tadeusz Manteuffel Institute of History, Polish Academy of Sciences

Studia Geohistorica • Nr 08. 2020
http://dx.doi.org/10.12775/SG.2020.08

Abstract: The authors describe the usage of persistent identifiers (PIDs) for historical geographical objects. They provide PIDs' definition and scope of use as well as characterise the process of data harmonisation and PIDs' creation. The article describes and assesses certain approaches used in different projects. Most often, internal identifiers are used, although their stability is not guaranteed. References are also made to external data stores such as Geonames and Wikidata.

Keywords: identifier, harmonisation, PID, historical geographical object

Data harmonisation

The resource identifiers we use when conducting research play a key role in harmonising data from different sources, such as national and academic, contemporary and historical sources. The identifiers make it possible to establish direct and numerical links to the objects introduced into the system, based on references made to them in different historical sources covering different time periods. It is time (meaning the diachronic approach) that causes the most difficulties in identifying objects. However, we should not forget about the issues resulting from the synchronic approach (similar time, different source). The identified objects, meaning those assigned the same identifier, are the same objects regardless of attributes that change over time. For instance, the attributes of a settlement may include its name, type, location, area occupied, etc.[1]

The identity of geographical objects, and the ascertainment of their continuity over time, becomes a key issue from the perspective of conducting historical and geographical research using GIS tools. It is also the subject of an interdisciplinary discussion among historians, geographers, philosophers and computer scientists. Identification as such is an attempt at a compromise between the intuition of a researcher, who points to the same objects in different sources, and the need to identify these objects numerically in information systems. Based on the research conducted so far on historical and contemporary settlements, we can indicate four main constitutive properties of geographical objects that show their identity over time: proper name, location, type and mereological relations. The first two are taken into account most frequently.[2]

The issues relating to a proper name include its unchangeability, its varieties and variants, but also the problem of different languages. While there is no doubt that the names Grodzisko, Gratez and Grodzisk Wielkopolski refer to the same object, the names Kąty and Winkel may arouse doubts, even though it is the same expression rendered in different languages. However, if we take the names Zielonka and Przyłęk, we must use an additional criterion for their identification. In this case, this criterion will be the identical location of the objects. We can assume that geographical objects which are located in the same place according to different sources (assuming that their object classes are compatible) are the same objects, even if they have different proper names. However, this assumption may turn out to be incorrect, as it is difficult to define the compatibility of location. It is affected by, among other things, the level of generalisation used (whether a settlement is seen as a point or an area), the accuracy of the map, and the spatial development or even actual relocation of a settlement (for example Nieszawa[3]), which does not affect its identity.

When identifying an object, we also sometimes consider the compatibility of object types, such as towns, villages, grain milling hamlets and smithery settlements in the case of a settlement. Of course, the nature of a settlement may have changed over time, but if we have references from similar periods, a difference in the type of a settlement with the same name and location should raise doubts as to its identification.[4] Another example is the diachronic identification of objects from different classes, in this case a physiographic object and a settlement. In Nowy Tomyśl Poviat, in the place where the former settlement called Bobrówka was located, there is now a swamp bearing the same name.[5] Therefore, we have a change of type here (and even of object class), but not of name and location, so we can say that it is the same object.

The identification is also significantly influenced by mereological relations, especially in the diachronic context. There are many cases where, during the settlement process, a settlement was divided into two or more parts or became part of another settlement. If a reference resource provides an identifier for two equivalent settlements (for example with the annotations -Dolny, 'Lower', and -Górny, 'Upper'), and earlier the settlement was mentioned as one entity, we have a problem with its identification. An example may be Psary (Będzin Poviat): its diachronic identification is ambiguous because of complicated mereological relations.[6]

Even though much experience has been gathered in historical and geographical research and certain solutions for numerical identification of objects have been developed, misidentification scenarios may occur and lead to negative consequences for the semantic coherence of data. Two such scenarios are: (1) same settlements, different identifiers and (2) different settlements, same identifiers. The correct scenario is (3) same settlements, same identifiers. In order to achieve it, a conceptually correct data harmonisation is needed.

Notes

[1] P. Garbacz, A. Ławrynowicz, B. Szady, Identity criteria for localities, in: Formal ontology in information systems, ed. S. Borgo, P. Hitzler, O. Kutz, Amsterdam–Berlin–Washington 2018 (Frontiers in Artificial Intelligence and Applications, 306), pp. 47–54.
[2] Ibidem.
[3] W. Duży, Powiat m. Toruń, in: Metodologia tworzenia czasowo-przestrzennych baz danych dla rozwoju osadnictwa oraz podziałów terytorialnych, ed. B. Szady, [Warszawa 2019] (project report, 10.5281/zenodo.3751266), pp. 377–378.
[4] T. Panecki, Powiat nowotomyski, in: Metodologia tworzenia, pp. 257–258.
[5] Ibidem, p. 274.
[6] W. Duży, Powiat będziński, in: Metodologia tworzenia, pp. 45–49.
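The three identification scenarios named at the end of the Data harmonisation section can be illustrated with a minimal Python sketch. It is not the authors' method: the `Reference` record, the identifier values (`id-1`, `id-2`, ...) and the `real_object` labels standing in for the researcher's judgement are all hypothetical, and a fourth, unnumbered case (different settlements, different identifiers) is added only so that the function is total.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    SAME_OBJECT_DIFFERENT_IDS = 1        # misidentification scenario (1)
    DIFFERENT_OBJECTS_SAME_ID = 2        # misidentification scenario (2)
    SAME_OBJECT_SAME_ID = 3              # the correct scenario (3)
    DIFFERENT_OBJECTS_DIFFERENT_IDS = 4  # consistent, but not numbered in the text


@dataclass(frozen=True)
class Reference:
    name: str         # name as it appears in a source
    identifier: str   # identifier assigned during harmonisation (hypothetical value)
    real_object: str  # researcher's judgement of which object is meant (hypothetical label)


def classify(a: Reference, b: Reference) -> Scenario:
    """Compare the assigned identifiers with the researcher's judgement."""
    same_object = a.real_object == b.real_object
    same_id = a.identifier == b.identifier
    if same_object and same_id:
        return Scenario.SAME_OBJECT_SAME_ID
    if same_object:
        return Scenario.SAME_OBJECT_DIFFERENT_IDS
    if same_id:
        return Scenario.DIFFERENT_OBJECTS_SAME_ID
    return Scenario.DIFFERENT_OBJECTS_DIFFERENT_IDS


# Name variants of one settlement that were given one identifier.
grodzisko = Reference("Grodzisko", "id-1", "settlement-A")
grodzisk = Reference("Grodzisk Wielkopolski", "id-1", "settlement-A")

# Different names for the same location that were given two identifiers.
zielonka = Reference("Zielonka", "id-2", "settlement-B")
przylek = Reference("Przyłęk", "id-3", "settlement-B")

# A different settlement wrongly given an identifier already in use
# (an invented error, for illustration only).
katy = Reference("Kąty", "id-1", "settlement-C")

print(classify(grodzisko, grodzisk).name)
print(classify(zielonka, przylek).name)
print(classify(grodzisko, katy).name)
```

In practice the `real_object` column is exactly what is unknown; the sketch only shows how the scenarios differ once a researcher has decided, using name, location, type and mereological relations, which references point to the same object.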
What is a PID (persistent identifier) and what is it used for?

We have all probably encountered the situation when, at a given URL (Uniform Resource Locator), instead of the expected content we see the message: "404 – Page not found." As long as such a situation concerns data that are not very important, it does not matter much. Since the Internet is developing quickly and is being used everywhere, for everything and by everyone, such situations will become increasingly common in everyday life. However, they should not occur in the cases for which the World Wide Web was invented: disseminating scientific achievements and coordinating research carried out in different research centres. The 404 error may be a result of not using persistent identifiers (PIDs), which identify resources in a unique and persistent manner. A lot of online and offline data do not have such identifiers or are given identifiers that only have some of the characteristics of persistent identifiers. That is why it is difficult to use them in any context other than the project under which they were created.

Gradually, more and more data types within [...]

[...] we share data or pieces of information that may need to be referenced (quotation) or referred to by external computer systems (harmonisation), it is very important to share these resources together with identifiers that make a long-term and stable reference to the specific resource.

We can give a few examples that show how identifiers may look, without determining whether these are persistent identifiers that meet all the pertinent criteria. Examples of identifiers from the National Register of Geographical Names (Polish: Państwowy Rejestr Nazw Geograficznych, PRNG),[7] Geonames[8] and Wikidata[9] concerning the same settlement are set out below. These identifiers meet most of the criteria for persistent identifiers. In fact, we can only state that something is a persistent identifier once it has stood the test of time; before that, persistence is only its aim.

Table 1. Identifiers of Adamowice in different systems

System      Identifier
PRNG        100
Geonames    776782
Wikidata    Q4680202

Source: authors' own elaboration
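As a hedged illustration of how the identifiers in Table 1 might be held together in one record, the Python sketch below stores them as a simple crosswalk and expands two of them into resolvable URIs. The URI templates for Geonames and Wikidata are assumptions based on the public conventions of those services, not something stated in the article; no URI pattern is assumed for PRNG, so its identifier stays a bare value. The names `ADAMOWICE`, `URI_TEMPLATES` and `to_uri` are hypothetical.

```python
from typing import Dict, Optional

# Identifiers of Adamowice exactly as listed in Table 1.
ADAMOWICE: Dict[str, str] = {
    "PRNG": "100",
    "Geonames": "776782",
    "Wikidata": "Q4680202",
}

# Assumed URI templates (not taken from the article).
# PRNG is deliberately absent: no template is assumed for it.
URI_TEMPLATES: Dict[str, str] = {
    "Geonames": "https://sws.geonames.org/{id}/",
    "Wikidata": "http://www.wikidata.org/entity/{id}",
}


def to_uri(system: str, identifier: str) -> Optional[str]:
    """Expand an identifier into a URI, or return None if no template is known."""
    template = URI_TEMPLATES.get(system)
    if template is None:
        return None
    return template.format(id=identifier)


for system, identifier in ADAMOWICE.items():
    print(system, identifier, to_uri(system, identifier))
```

Keeping the bare identifier separate from the URI template reflects the point made in the text: the identifier itself is what should stay stable, while the address under which it resolves is a property of the service that may change.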
