Modelling vagueness – A criteria-based system for the qualitative assessment of reading proposals for the deciphering of Classic Mayan hieroglyphs

Franziska Diehr1, Sven Gronemeyer2, 3, Elisabeth Wagner2, Christian Prager2, Katja Diederichs2, Uwe Sikora1, Maximilian Brodhun1, Nikolai Grube2 1 State and University Library Göttingen, Germany 2 Department for Anthropology of the Americas, University of Bonn, Germany 3 Department of Archaeology and History, La Trobe University, Australia [email protected]

Abstract office, located at the Department for the Anthropo- logy of the Americas, University of Bonn,3 works The project ‘Text Database and Diction- in close collaboration with the State and University ary of Classic Mayan’ aims at creating a Library Göttingen.4 Since 2014, the hieroglyphic machine-readable corpus of all Maya texts texts have been prepared, evaluated, and interpreted and compiling a dictionary on this basis. in interdisciplinary cooperation using methods and The characteristics of this complex writ- tools from the humanities and information techno- ing system pose particular challenges to logy (Prager, 2014c). research, resulting in contradictory and am- One result of this collaboration, and at the same biguous deciphering hypotheses. In this time an important milestone of the project, is the paper, we present a system for the qualitat- digital Sign Catalogue which is the subject of this ive evaluation of reading proposals that is paper. For the creation of a text corpus and a dic- integrated into a digital Sign Catalogue for tionary of an only partially understood language Mayan hieroglyphs, establishing a novel and script, a Sign Catalogue, an inventory of all concept for sign systematisation and clas- used signs, is an indispensable instrument. At the sification. The paper focuses in particular same time, the catalogue also forms the core com- on the modelling process and thus emphas- ponent for constructing a machine-readable text. ises the role of knowledge representation The project deals with a that is only in digital humanities research. partially deciphered. The investigation of the script 1 Modelling vagueness in the Maya and language has produced numerous assumptions and interpretations about the reading of its signs. Dictionary project For a heuristic deciphering work, these reading hy- The not yet completely deciphered script of the pre- potheses are to be included for further investigation. Columbian Maya culture is the research subject During the development of the Sign Catalogue, a of the interdisciplinary project ‘Text Database and system for the qualitative evaluation of decipher- Dictionary of Classic Mayan’.1 The aim of this ment hypotheses and reading proposals was de- long-term project is to record the approximately veloped. Among other things, it offers assistance 10,000 known text carriers and their inscriptions with the linguistic analyses of the texts. The chal- in a machine-readable corpus and based on this lenge in modelling this system was to represent to compile a dictionary, which reflects the entire several reading proposals, as well as to determine vocabulary and its use in context (Prager, 2014b).2 the factors that led to the formulation of the respect- As a joint effort, the specially established project ive deciphering hypothesis. The goal of knowledge representation is also the explicit description of 1. http://mayadictionary.de 2. The project started in 2014 and is expected to end in 2028. the methods applied. This enables the scholar to It is funded by the North Rhine-Westphalian Academy of carry out analyses and draw conclusions on the Sciences and Arts http://www.awk.nrw.de and the Union of German Academies of Sciences and Humanities https: 3. https://www.iae.uni-bonn.de/ //www.akademienunion.de. 4. https://www.sub.uni-goettingen.de

33 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)

Figure 2: Writing of Maya Hieroglyphs

which is characterised by two main sign classes: and syllabograms. Logograms stand for linguistic terms, like e.g. PAKAL for ‘shield’, and refer, with only a few exceptions, only to one Figure 1: Example for a Maya inscription denotation. Syllabograms represent single vowels and syllabic components and serve to write lex- ical and grammatical morphemes. They are also basis of the model. The modelling of reading pro- used as phonetic complements of logograms, as posals and their plausibility act as examples for in PAKAL-la. Words could thus also be written the representation of complex knowledge objects exclusively with syllabic signs pa-ka-la (Figure2), with interpretive character, as they are typical for but mostly, both sign classes were combined in research data in the humanities. In this context, writing (Diehr et al., 2017, 1186). the paper emphasizes the role of knowledge repres- entation in projects that are located in the digital The signs were arranged in almost rectangular humanities and considers the challenges posed by blocks. Such a hieroglyphic block probably cor- the modelling of vague and uncertain information. responds to the emic idea of a word. Within a block, the signs could be arranged in many differ- 2 Characteristics of Classic Maya ent ways. Depending on space requirements and writing aesthetics, they could merge, overlap, be infixed, or rotated. The blocks were usually arranged in Compared to other Mesoamerican writing systems, double columns and read from left to right and such as Isthmic or Aztec, Maya writing has a con- top to bottom. The sentences constructed this way siderably long time of use, about 2,000 years. The form complex texts with a syntax and structure first pre-Classic texts were written in the 3rd cen- still preserved in modern Mayan languages (Prager, tury B.C. The writing tradition reached its peak in 2014a). Maya writing shows a distinct calligraphic the Classic period (100 - 810 A.D.). The arrival of complexity, as we can observe a broad graphic vari- the Spaniards lead to a deep cut in Maya culture ability of signs. It allowed to write aesthetically that also affected the use of their writing. It was sophisticated texts without necessarily repeating only possible for the Maya to use it in secret. In the identical variants. For a linguistic expression, there underground, hieroglyphic writing continued until is not only a single graphic correspondence, but the late 17th century, and ceased to exist thereafter. usually several, sometimes very different, variants. It was only due to 18th and especially the 19th The bipartite syllabogram u could either be writ- century explorers who brought Maya texts back ten in its full form or alternatively with only one to light, and with that further and further into the of its two segments. These segments could also focus of research. To date, an estimated 10,000 be reduced or multiplied. As a further example, text carriers (Figure1) have survived, mainly mo- the syllablic sign yi shows how simple forms can numental inscriptions such as altars and stelae. Nu- be transformed into so-called head variants. Even merous texts can also be found on ceramics and diagnostic features (marked in Figure3) are not small artefacts, like jewellery, implements or boxes. present in all graph variants. It was even possible Of special importance are the codices written on to insert or parenthesise other graphs, as the ex- paper-like material, of which unfortunately only ample ma shows (Figure3). four have survived. The way in which variants are formed has not The Maya writing system is characterised by yet been the subject of systematic investigations an iconic character, hence labelled a hieroglyphic before. There are a few individual studies by Beyer script. Typologically it is a logo-syllabic system (1934a; 1934b; 1936; 1937) and Lacadena (1995,

34 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)

of published sign inventories shows between 700 and 1,000 signs with a phonemic value. These comprise, according to own estimates, about 3,000 distinctive graph variants.

3 Sign catalogues as classification aids

The study of the hieroglyphs is carried out on the basis of previously discovered text carriers. How- ever, many text carriers have been destroyed over time, are forgotten in archives and museums, hid- den in private collections, or have simply not yet been discovered. Therefore, there will always be a small number of signs that could not be invent- oried. On the other hand, it can be assumed that new discoveries will be made in the coming years. Early in 2018, the results of a large-scale aerial Lidar prospection in Guatemala were presented, Figure 3: Graph variants showing a much denser concentration of archaeolo- gical sites in the tropical lowlands than previously assumed, also implying a larger number of text 204 ff.) that investigate substitutions, rotation prin- carriers (Clyens, 2018). ciples, or symmetries. In the course of the pre- The discovery of new inscriptions and thus the paratory work for the digital Sign Catalogue, we discovery of previously unknown signs represents recognised for the first time that there are general a challenge for the classification of Maya signs. rules and principles for the formation of graphs Previous sign inventories could, of course, only which can be derived and defined by graphotactic consider the signs known at the time for a printed 5 analyses . In total, we have identified 45 individual publication. The discovery of new signs has to be variation principles that can be divided into nine anticipated in the digital Sign Catalogue, also a pos- classes (mono-, bi-, tri-, and variopartite, division, sible re-classification of already known signs. We animation head, animation figure, multiplication, therefore need flexible data management processes extraction). for new research results. Another feature is the polyvalence of signs. A Compared to other ancient languages and writ- single glyph can either be read as a or a ing systems, the of Maya writing is syllabogram, depending on context. For example, relatively recent. Although it was known since the the sign with catalogue number 528 can be used 19th century that many texts contain readable dates as the logograms TUN (‘stone’) and CHAHUK (Morley, 1915), a historical character and thus a (name of the 19th day in the ), or as representation of spoken language was rejected the syllabogram ku. The graph variant used does (Thompson, 1956, 169). It was not until the 1950s not indicate which of the readings is to be applied. that Yuri Knorozov recognised the logo-syllabic This impressively illustrates how free Maya scribes character of Maya writing (Knorozov, 1952) and were in the design of their texts. It also explains the was able to present the first linguistically viable challenges faced in the still ongoing decipherment readings. Although his research appeared in Eng- process. Especially the graph variations make the lish translation in 1958 (Knorozov and Coe, 1958), determination of and their allocation the left his work mostly unnoticed by to a specific linguistic expression difficult. Until researchers for a long time. today, the exact number of signs and their vari- The hieroglyph catalogue by J. Eric S. Thompson ations could not be determined exactly. A review (Thompson, 1962) contains several multiple and false classifications, but it still serves as the stand- 5. This classification was first presented at the conference ard reference for sign classification and subsequent “Ägyptologische ‘Binsen’-Weisheiten III” which took place at the Academy of Sciences and Literature in Mainz on April 8, transliteration in preparation for linguistic analyses, 2016 (Prager and Gronemeyer, 2018). even though Thompson did not provide any read-

35 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018) ings in rejection of Knorozov’s works (Prager, 2014a). humanities that aims to represent objects and the A total of nine other sign inventories have been knowledge about them in a data model. In Sowa’s published so far. They all have incorrect classi- sense, this means making the semantics of know- fications, particularly problematic are the multiple ledge objects explicit and transferring them into a classifications of signs in which several allographs data model (Sowa, 2000, 132). We call the ‘know- of a were inventoried as different signs ledge object’ the epistemic notion of specific entit- (Grube, 1990; Kelley, 1962; Kurbjuhn, 1989; Riese, ies that constitute themselves in a specific know- 2006; Ringle and Smith-Stark, 1996). The cata- ledge context. Knowledge objects do not exist logues are mostly simple graph inventories. It was by themselves. They are created by knowledge- not until Knorozov’s “Compendio Xcaret”, pub- generating processes; they are formed from state- lished posthumously in 1999 (Knorozov, 1999), ments, analyses and interpretations about specific that readings were linked to the corresponding entities that are the object of interest. We call the graphemes for the first time. However, these are specific knowledge context from which questions based on the state of research of the 1960s. Even to the object arise ‘domain’. The process of model- the two most recent catalogues show readings only ling knowledge objects and their specific domain in an unreflected manner. can be subdivided into the following steps: (1) ana- With our digital Sign Catalogue we want to con- lysis of domain-specific requirements, (2) know- sider both expression levels of a sign, the functional- ledge representation by conceptual modelling and linguistic and the graphemic, and model them in (3) construction of a machine-readable model. such a way that the assignment of both levels to each other is accurate and flexible. A further disad- 4.1 Analysis of domain-specific requirements vantage of traditional character catalogues is that they are unchangeable due to their printed edi- Requirements for the Maya Sign Catalogue are tion and therefore cannot be dynamically exten- to be determined by intensive exchange and trans- ded. This prevents misclassifications from being fer of knowledge between domain experts and the corrected or new relations between signs from be- modeller. In the process, specific requirements are ing established. Here, too, a catalogue in digital defined for the model, its implementation in a data format can remedy the situation by being able to cataloguing system and technical environment. To react flexibly to changes while at the same time do so, we chose a procedure based on the method offering persistent identification possibilities. of an expert interview. According to Reinhold, this represents an adequate method of information 4 Modelling of the digital sign catalogue needs analysis particularly in the context of mod- elling research processes and research data in the We are developing our digital Sign Catalogue with digital humanities, since a high degree of implicit the aim of carrying out a complete new inventory knowledge is to be expected on part of the research- of Maya signs and thus making a reliable statement ers (Reinhold, 2015, 330). However, it should be on the number of currently known signs. With the noted that experts only share goal-oriented know- Sign Catalogue we establish a new concept for the ledge if the questioner already has a high level systematisation and classification of signs. The of expertise on the subject (Flick, 2007, 218 ff.). specific characteristics of the complex Maya writ- The process thus requires a high degree of famili- ing system are explicitly represented in the model: arisation with subject on the part of the modeller, Graph variants, multifunctional characters and mul- who must be able to describe the respective domain tiple transliteration values are defined and related from a disciplinary point of view. to each other. Particular attention is paid to reading The process of defining requirements represents hypotheses, which are not only documented in the the most interdisciplinary area in joint project work. catalogue, but also qualitatively evaluated accord- According to an explorative-hermeneutic working ing to explicit criteria, so that they can be prepared method, we proceeded as follows: The first step for later analysis. was to incorporate the basic concepts of the do- In order to develop a Sign Catalogue in a virtual main, i.e. linguistics and grammatology. We also environment, the signs and their properties must be analysed the subject, the Sign Catalogue, for its represented in a machine-readable model. We un- importance as an essential tool for decipherment, derstand modelling as a research method of digital as well as its function specific to the project and

36 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)

discipline. Furthermore, an intensive analysis of In our understanding, a sign is constituted by the the structure of the Maya writing system was car- conjunction of two different levels: 1) a linguistic- ried out with the aim of explicitly representing its functional level, which, according to Ferdinand de signs and their function in the model. The resulting Saussure, contains the notion and the pronunciation minutes of the discussions formed the basis for a (de Saussure, 1931, 28 ff.) as well as the specific catalogue of requirements, which serves as a work- function of the sign in its writing system; and 2) a ing paper to determine further requirements and to level of graphic representation which contains all specify existing ones. possible forms of expression reflecting the concept of the linguistic-functional level. 4.2 Knowledge representation by conceptual Consider a Maya sign as an example: it has the modelling verbal utterance yi and thus fulfills the function of a syllabic sign. For the syllabogram yi there are at We are convinced that the process of knowledge least three different graphical forms of represent- representation is a hermeneutic method aimed at ation (see Figure3). This form of representation constructing a machine-readable model. Concep- is called a graph.6 A graph is an abstract, typed tual modelling defines what Sowa calls ‘ontological form of an individually realised sign. The graph of categories’. They determine everything that can be yi in the variation ‘anthropomorphic head variant’ represented in a computer application (Sowa, 2000, recorded in the catalogue represents a type which 51). The creation of an ontological model aims to prototypically represents all individual writing vari- explicitly describe knowledge objects, their rela- ants and thus all actual occurrences for yi in the tionship to each other, and to their domain. The form of a ‘head variant’. definition of these categories is particularly difficult All graphs that are assigned to a shared linguistic when it comes to vague and uncertain information: expression stand in an allographic relationship to “any incompleteness, distortions, or restrictions in each other and in their totality form variants of the the framework of categories must inevitably limit grapheme of the sign (Bussmann, 2002, 294). A the generality of every program and database that graph can only be assigned to exactly one linguistic- uses those categories” (Sowa, 2000, 51). Since functional expression (idiomcat:Sign). This rela- ‘knowledge’ about objects can be questioned or in- tion (idiomcat:isGraphOf) is optional, so that also terpreted differently, it is necessary to represent graphs can be inventoried which could not be as- the various levels of knowledge in the model in signed to a ‘sign’ yet. order to counteract such distortions and to limit The function of a sign within the writ- the knowledge base precisely in the sense of the ing system is called a sign function (idiom- defined ontological categories. cat:SignFunction). For Classic Mayan, besides First, we scanned through specialised literature the two main classes logogram and syllabogram and linguistic vocabularies such as the SIL Gloss- (idiomcat:SyllabicReading), two additional func- ary of Linguistic Terms (Loos et al., 2003) for defin- tions were defined, since Rogers (2005, 10) also in- itions and concepts to describe writing systems and cludes numerals (idiomcat:NumericFunction) and signs. However, the materials showed that most diacritics (idiomcat:DiacriticFunction). Logo- concepts are not reusable for our model because grams are further subdivided by their function: We they already focus too much on applicability in a differentiate between those that have a deciphered particular linguistic context. Our aim, however, phonetic value (idiomcat:LogographicReading) is to define ontological categories for the repres- and those for which only a semantic field can be entation of signs and their function in a writing narrowed down (idiomcat:LogographicMeaning).7 system. Linguistic categories can only be applied on a meta-level (Diehr et al., 2018, 38). 6. The term ‘graph’ chosen by the project as the concept of the abstract, typified form of a realised character is still under Based on the literature and the outcomes of the discussion within the project. Since in linguistic discourse requirement analysis, we modelled concepts and the realised character is commonly referred to as ‘graph’, the abstract form represents a kind of prototype of the graph. their relationships to each other and to their domain Here it is still to be considered how such a typification can in an ontology using OWL syntax. Figure4 shows be considered in an intermediate step between the realised the domain model of the ontology and illustrates graph and the graphem. For example, the typed form could be considered as meta-graph or proto-graph in contrast to the the core concepts and their relationships to each graph. other. 7. This distinction may be irrelevant for sign classification

37 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)

Figure 4: Domain Model of the Sign Catalogue

In the catalogue, we also document reading hy- Grid (Neuroth et al., 2015) for the administration, potheses and evaluate their plausibility (idiom- creation, and presentation of the data generated in cat:ConfidenceLevel). Therefore, a distinction the project. For data acquisition, we use the RDF of the logograms is necessary because different input mask of the TextGrid Lab, which we adapted evaluation criteria (idiomcat:parameter) have to to our project-specific needs. The mask renders be applied to each. In the linguistic analysis HTML forms to enter data on the basis of an RDF of the corpus it also makes sense to distinguish schema in TURTLE syntax. In order to make the in- between phonetic values and a meaning. Both put mask usable for the Sign Catalogue, the model are represented as transliteration values (idiom- designed as an ontology must be transferred into cat:transliterationValue) in the model. an RDF schema. This illustrates that data modelling often requires 4.3 Construction of a machine-readable pragmatic decisions. Since the functionality of the model input mask is based on an RDF schema in TURTLE Once the objects have been defined, their structure syntax, it was not possible to directly use the on- and relationships to each other have been represen- tology written in OWL. This is not a problem in ted in a conceptual model, the data model can be the present case, since the OWL schema could be developed. Concepts and structures are translated transferred lossless into an RDF schema. In our into a machine-readable form by formulating them scope, the expressiveness of the triple structure is in a syntax suitable for representing the conceptual sufficient to represent the complexity of the on- model. Since the Sign Catalogue was designed as tological relationships defined by the conceptual an ontology, a data structure is required that can model. However, the generated data is stored in a represent semantic relations between uniquely ref- triple store. With the use of the Query Language erenceable entities. SPARQL sophisticated queries are possible. We use the virtual research environment Text- In addition to the aim of representing knowledge in general, since the function of the character is one and the objects and their domain in a machine-readable same for the writing system. However, with the ontology we model, interoperability with other systems and model a specific use case: The Sign Catalogue serves as an instrument purposing the deciphering of the Classic Mayan schemata plays an important role in the develop- writing system. ment of data models. The adoption of already

38 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018) defined concepts enables the exploitation of the po- be modelled as a subclass of gold:FeatureStructure. tential of existing metadata standards: Their integ- The evaluation of suitable top-level ontologies ration improves the searchability of ontologies and showed that the CIDOC Conceptual Reference increases both their quality and that of the applic- Model (CIDOC CRM), despite its focus on de- ations accessing them, since the knowledge base scribing processes for documenting cultural herit- is continuously enriched (Gradmann et al., 2013, age objects, contains many meta-concepts that are 275 ff.). Simperl names the following steps for the suitable for modelling our catalogue. In this re- reuse and integration of ontologies: 1) Search for gard, most classes have been defined as subclasses reusable ontologies, 2) integration-oriented evalu- of CIDOC CRM, for example classes ‘Sign’ and ation, and 3) integration of ontologies into the own ‘Graph’ were defined as specific sub-concepts of model (Simperl, 2010, 246). crm:E33_Linguistic_Object as they are “identifi- The chances of finding already existing ontolo- able expressions in natural language” (Doerr et al., gies are low if they are subject-specific and thus 2011). often only known in the respective domains. The ontologies developed in these domains are obvi- 5 Development of a system for the ously very specific and/or geared to a specific ap- qualitative evaluation of decipherment plication, which makes their immediate reuse and hypotheses the integration of concepts more difficult.8 In our Sign Catalogue we want to consider pub- Therefore, it makes sense to use so-called top- lished as well as own reading hypotheses, espe- level ontologies in addition to subject-specific on- cially since the readings are used for the linguistic tologies. Top-level ontologies make it possible to analysis of the text corpus for the compilation of reach agreement on generally valid concepts that the dictionary. do not have to be redefined in one’s own scheme Even though research is working with erroneous (Milton and Smith, 2004, 85). catalogues, the decipherment of Maya writing has While searching for reusable ontologies for the significantly advanced in recent decades. In the Sign Catalogue we came across ‘General Ontology 1980s, presented an important study 9 for Linguistic Description’ (GOLD). With the (Stuart, 1987) that gave the decipherment process aim to define basic categories and relations for the new impetus. Nonetheless, there is no consensus scholarly description of human language, the onto- about the reading for all signs. The reasons why logy seemed to be reusable for our purposes. In the signs are not equally deciphered can be manifold, integrated evaluation it turned out that it focuses on for example if a sign is only attested once, or grammatical rules with morphosyntax as a starting if there are no indicators pointing to phonemics. point (Farrar and Langendoen, 2003, 100). In our Since decipherment has mostly been carried out by context, the concepts defined in GOLD could only individual epigraphers using isolated examples in be used to a limited extent, since our concept rep- ‘hand-picked’ texts, they have never been checked resents a meta-level that serves for the organisation and compared using a complete text corpus, since of signs. Nevertheless, some concepts provided a it does not yet exist and is only being developed good starting point for the definition of our schema. by the project now. Not only for these reasons dif- The definition of gold:FeatureStructure as “a kind ferent reading hypotheses were and are presented of information structure, a container or data struc- for many signs. Individual interpretations of the ture, used to group together qualities or features context or linguistic basics also contribute hereto. of some object” (Farrar, 2010) is so general that our definition of the class idiomcat:SignFunction 5.1 Development and plausibility of reading as “a feature assessed to a Sign[,] [t]he nature of hypotheses the feature is specified by the subclasses” 10 could From a grammatological point of view, different categories of deciphering can be defined that are 8. This is also true for the ontology of the Sign Catalogue: It was developed specifically for the needs of the project. Even linguistic and semantic, extended after the sign though, it can principally be abstracted for other applications. function by Riese (1971, 20-23) and deciphering 9. http://linguistics-ontology.org/gold criteria by Houston (2001, 9 ff.). The phonetic 10. Ontology of the Sign Catalogue for Classic Mayan https://classicmayan.org/documentations/ content of linguistically secured signs can be veri- catalogue.html fied in a number of contexts, often with semantic

39 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018) and lexical agreement in morphographs, sometimes s = Correspondence with image also with polyvalent readings. With operational q = Correspondence with object readings, the sound value can be inferred on the b = Doubling is possible basis of certain indications, such as vowel harmony rules, phonemic complements, or the semantic field. v = Vowel harmony However, multiple reading proposals that are not i = Lexical correspondence with graph icon polyvalent and have different plausibilities may These are combined in four propositional logics also occur, based on the individual interpretation to determine four plausibility levels: of the context or the knowledge of the material. In some cases, only parts of the phonetic content can 1 (excellent) = d ∨ u ∨ (k ∧ (s ∨ q)) ∧ (a ∧ ( f ∨ be isolated due to a lack of indications, such as m)∧(s∨q))∨((o∨ f ∨m∨y)∧(g∨r∨c)∧(s∨q)) the final sound through complements. However, se- mantics and almost always the word class can often 2 (good) = (t ∨ o ∨ c) ∧ v ∧ (s ∨ q)) ∨ (a ∨ o ∨ f ∨ be narrowed down from only partially readable or m) ∧ (g ∨ r ∨ c)) ∨ (g ∨ r ∨ c) ∧ (s ∨ q)) ∨ ( f ∨ m) ∧ undeciphered signs, either because of the context (s ∨ q)) or also because of the iconicity of the graph. About 3 (partial) = t ∨ c ∨ l ∨ r¬v one third of the sign inventory resists any reason- able interpretation so far, mostly morphographs 4 (weak) = g ∨ m ∨ r and among these mostly nouns. The criteria and propositional logics were 5.2 Definition of propositional logics to developed under critical consideration of the determine confidence levels existing decipherment practice. In an account Since each sign function fulfills a different lin- from about 1566, we actually have a contemporary guistic function, each requires the creation of its description of the (de Landa, 1959, own set of criteria to test a decipherment hypo- 104 ff.), even if it was considered as an thesis. For example, a logogram cannot appear in a then. Nonetheless, the manuscript contains stem medial position of a word, whereas a syllable annotations to certain hieroglyphics with a syllabic character can (criterion ‘m’). The evaluation of a value, which then brought the key to decipherment. decipherment is qualitatively done by combining Even before Knorozov it was known that there certain criteria in propositional logics, which in- was a strong text-image relation in the codices. dicate the plausibility in disjunct numerical values. Thus Paul Schellhas was able to isolate the proper Each sign function has a different number of plaus- names of different gods in a structuralistic way ibilities. If new texts add new occurrences with (Schellhas, 1897). This also turned out to be criteria that have not been considered so far, these helpful for decipherment. Among the examples can be supplemented. For example, the following with a syllabic annotation in said manuscript there criteria have been defined for the syllabic signs: are also and (ku and k’u in today’s orthography). These examples are therefore d = Landa’s Manuscript deciphered only by criterion ‘d’, as they are l = Consonant in Landa’s Alphabet supported by a contemporary witness, further u = Vowel in Landa’s Alphabet criteria would be exclusively complementary. Both k = Complete logographic substitution signs can be found in the initial position (criterion ‘f’) in hieroglyphs that identify animals that are t = Partial logographic substitution also depicted in the corresponding image: a turkey a = Allographic substitution and a vulture (Figure5). Colonial dictionaries o = Preposed to deciphered logogram provide kutz and k’uch for the two animals. Thus c = Postposed to deciphered logogram we also have a hypothesis to the sound of the f = Stem initial second syllabograms (tzV and chV), but without knowing the vowel, because both hieroglyphics are m = Stem medial not present in Landas list. Criterion ‘r’ is fulfilled g = Word medial and level 3 is reached. r = Word final But the sign tzV appears in the initial word posi- y = Ergative spellings tion among the depiction of a dog, attested as tzul

40 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)

Figure 5: Decipherment of Syllabograms in the dictionaries. Thus the syllabic value tzu is out investigations for the use of the writing in its confirmed and the sign becomes level 2, in connec- spatial-temporal development and its use depend- tion with the image even to level 1 (criterion ‘s’). ing on the text carrier, its location as well as the The second sign is therefore lV; the initial sound text contents. is confirmed by the annotation for a very sim- We encode the text corpus in XML using the ilar sign in the manuscript. The examples have so TEI-P5 guidelines (TEI, 2018). In order to distin- far shown vowel harmony (CV1-CV1), so that the guish the structure, the arrangement of the glyphs reading might be lu, which would be level 2. and other inscription-specific phenomena, we have Final proof comes from a stem medial attesta- developed a TEI-compliant, application-specific tion. In the calendrical structure of an almanach, schema. Instead of using phonemic-transliterated we find three hieroglyphs where the number ‘11’ values, we use Semantic Web technology to refer should appear, which is spoken buluk. Indeed, with in the XML document to the resource stored in the spelling bu-lu-ku, the assumed vowel can be RDF. In the Sign Catalogue, each graph variant is confirmed here, and the sign is deciphered on level recorded as an independent resource and thus has 1. a URI. In the TEI-encoded text we tag each glyph with the element (character or glyph) and use 6 Methods and techniques for the the attribute @ref (“points to a description of the 11 creation of a machine-readable corpus character or glyph intended”) to refer to the re- spective URI (Figure6). The text itself consists One objective of the project is to create a machine- of external resources and together with the Sign readable corpus of all inscriptions. Due to the com- Catalogue forms an ontologically linked system. plex graphemics, the sign polyvalence and the low With this approach, we also take into account the decipherment status, it is not possible to capture the requirement to consider different deciphering hy- text in phonemic-transliterated values. Therefore, potheses. Since the text corpus refers to the graph, it is not surprising that there is currently no stand- which is potentially associated with several pos- ardised machine-readable font, such as Unicode, sible reading proposals in the digital catalogue, it is available for the Maya script. There are efforts in now possible to analyse the inscription under con- this direction (Pallan Gayol and Anderson, 2018), sideration of various hypotheses. A further advant- but in their current form they do not meet the clas- age of this approach is that new can sification requirements of the Mayan script. be flexibly integrated. Even in the case of newly

In the text corpus we would like to refer to the 11. http://www.tei-c.org/release/doc/ graph variants used in each case, in order to carry tei-p5-doc/en/html/ref-g.html

41 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)

Figure 6: Exemplary encoding of a hieroglyphic block in TEI/XML formulated, reliable statements, these do not have which poses questions to knowledge objects and to be elaborately incorporated into the corpus. By their domain. The representation of this knowledge using URIs, it is possible to clearly reference the in a machine-readable model requires the precise corresponding resource in the Sign Catalogue. The definition of the objects as well as their relation- encoding of the corpus text itself does not change, ships to each other and to their knowledge base. it remains stable. Specialist traditions are questioned and applied methods are checked for their fundamentals. In the 7 Data publication and reuse possibilities case of modelling the deciphering hypotheses, it became clear that statements about the plausibility The cataloguing and classification of Maya signs of reading proposals can only be made on the basis is a continuous process. Through ongoing re- of formal evaluation criteria. Without reflection search, new findings will repeatedly lead to re- on the intuitive-pragmatic evaluation mechanisms, classifications. In the scope of our project, a first no evaluation system could have developed that new inventory will be completed in the course of operates on the basis of formalised and logical- 2018. As soon as all graph variants and signs have categorised parameters. been recorded, the encoding of the inscriptions and Modelling thus makes a central contribution to the compilation of the corpus will begin. The docu- the research methods of the Digital Humanities mentation of the texts and text carriers will extend by stimulating the reflection on the generation of over the remaining duration of the project until discipline-specific knowledge through conscious 2028. The data will successively be made access- questioning. The modelling process produces ex- ible on our project portal,12 which is currently in plicitly defined objects and relationships that are the conception stage. Furthermore, the corpus data transformed into machine-readable data and data will also be published in the TextGrid Repository structures. This transformation process makes the (TG Rep),13 where they can also be accessed by knowledge objects and their knowledge base avail- external users via OAI-PMH. The RDF data of the able for further analysis, whether hermeneutical or Sign Catalogue will be also retrievable at the portal quantitative. via a SPARQL endpoint and also at the TG Rep. All schemata created in the project can be down- loaded from the public area of our Git repository14 References and can be used under a CC BY-4.0 license. The Beyer, Hermann (1934a). The position of the affixes in documentation of the Sign Catalogue ontology is Maya writing I. Maya Research, 1:20–29. also available as a website.15 Beyer, Hermann (1934b). The position of the affixes in 8 Conclusion: modelling as a research Maya writing II. Maya Research, 1:101–108. method Beyer, Hermann (1936). The position of the affixes in Maya writing III. Maya Research, 3(1):102–104. Modelling processes bring about a reflection of the Beyer, Hermann (1937). Studies on the inscriptions of definitions and methods of the respective discipline, Chichen Itza. Contributions to American Anthropology and History, 4(21):29–175. 12. https://www.classicmayan.org/ 13. https://textgridrep.org/ Bussmann, Hadumod (2002). Lexikon der Sprachwis- 14. https://projects.gwdg.de/projects/ senschaft. Stuttgart: Alfred Kröner Verlag. documentations/repository 15. Ontology of the Sign Catalogue for Classic Mayan Clyens, Tom (2018). Exclusive: Laser scans https://classicmayan.org/documentations/ reveal Maya “megalopolis” below Guatem- catalogue.html alan jungle. National Geographic. URL

42 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)

https://news.nationalgeographic.com/2018/ Houston, Stephen D., Oswaldo Chinchilla Mazariegos, 02/maya-laser-lidar-guatemala-pacunam/. and David Stuart (2001). Introduction. In Stephen D. Houston, Oswaldo Chinchilla Mazariegos, and David de Landa, Diego (1959). Relación de las cosas de Stuart, eds., The Decipherment of Ancient Maya Writ- Yucatán. Introducción por Angel María Garibay K. ing, pages 3–19. Norman, OK: University of Oklahoma México, D.F.: Biblioteca Porrúa 13. Editorial Porrúa. Press. de Saussure, Ferdinand (1931). Cours de linguistique Kelley, David H. (1962). Review of A Cata- générale. Paris: Payot. log of Maya Hieroglyphs, by J. Eric S. Thompson. American Journal of Archaeology, 66(4):436–438. Diehr, Franziska, Sven Gronemeyer, Christian doi:10.2307/502055. Prager, Maximilian Brodhun, Elisabeth Wagner, Katja Diederichs, and Nikolai Grube (2017). Mo- Knorozov, Juri (1952). Drevnyaya pis’mennost’ Tsen- dellierung eines digitalen Zeichenkatalogs für die tral’noy Ameriki. Sovietskaya Etnografiya, 3(2):100– Hieroglyphen des Klassischen Maya. In Maximilian 118. Eibl and Martin Gaedke, eds., INFORMATIK 2017, pages 1185–1196. Bonn: Gesellschaft für Informatik. Knorozov, Juri (1999). Compendio Xcaret de la Es- doi:10.18420/in2017_120. critura Jeroglífica Maya. Universidad de Quintana Roo, Promotora Xcaret, Chetumal. Diehr, Franziska, Sven Gronemeyer, Christian Prager, Maximilian Brodhun, Elisabeth Wagner, Katja Knorozov, Juri and Sophie D. Coe (1958). The prob- Diederichs, and Nikolai Grube (2018). Ein digita- lem of the study of the Maya hieroglyphic writing. ler Zeichenkatalog als Organisationssystem für die American Antiquity, 23(3):284–291. noch nicht entzifferte Schrift der Klassischen Maya. In Christian Wartena, Michael Franke-Maier, and Kurbjuhn, Kornelia (1989). Maya: The Complete Ernesto De Luca, eds., Knowledge Organization Catalogue of Glyph Readings. Kassel: Schneider & for Digital Humanities. Proceedings of the 15th Weber. Conference on Knowledge Organization WissOrg’17 of the German Chapter of the International Society Lacadena, Alfonso (1995). Evolución formal for Knowledge Organization (ISKO), pages 37–43. de las grafías escriturarias mayas: Implicaciones doi:10.17169/FUDOCS_document_000000028863. históricas y culturales. Madrid: Universidad Com- plutense. URL https://eprints.ucm.es/2433/1/ Doerr, Martin, Matthew Stiff, Nick Crofts, Stephen H0028201.pdf. Stead, and Tony Gill (2011). CIDOC Conceptual Ref- erence Model Version 5.0.4. Tech. rep., ICOM/CIDOC Loos, Eugene E., Susan Anderson, Dwight H. Day, CRM Special Interest Group. URL http://www. Paul C. Jordan, and J. Douglas Wingate (2003). Gloss- cidoc-crm.org/get-last-official-release. ary of Linguistic Terms 29. Dallas: SIL International. URL https://glossary.sil.org/. Farrar, Scott (2010). General ontology for linguistic description (GOLD). Tech. rep., Department of Milton, Simon K. and Barry Smith (2004). Top-level Linguistics (The LINGUIST List). URL http:// ontology: The problem with naturalism. In Achille C. linguistics-ontology.org/. Varzi and Laure Vieu, eds., Formal Ontology in Inform- ation Systems, pages 85–94. IOS Press. Farrar, Scott and D. Terrence Langendoen (2003). A linguistic ontology for the Semantic Web. GLOT Inter- Morley, Sylvanus G. (1915). An Introduction to the national, 7(3):97–100. Study of the Maya Hieroglyphs. Washington, D.C.: Government Printing Office. Flick, Uwe (2007). Qualitative Sozialforschung: Eine Einführung. Rowohlt, Reinbek. Neuroth, Heike, Andrea Rapp, and Sibylle Söring, eds. (2015). TextGrid: Von der Community – für die Gradmann, Stefan, Evelyn Dröge, Julia Iwanowa, Vi- Community. Göttingen: Universitätsverlag Göttingen. oleta Trkulja, and Steffen Hennicke (2013). Wege zur doi:10.3249/webdoc-3947. Integration von Ontologien am Beispiel einer Spezifi- zierung des Europeana Data Model. In Hans-Christian Pallan Gayol, Carlos and Deborah Anderson (2018). Hobohm, ed., Informationswissenschaft zwischen vir- Achieving machine-readable Mayan text via Unicode: tueller Infrastruktur und materiellen Lebenswelten. Blending “old world” script-encoding with novel di- Proceedings. 13. Internationalen Symposiums für In- gital approaches. In Élika Ortega, Glen Worthey, Isabel formationswissenschaft (ISI 2013), pages 273–284. Galina, and Ernesto Priani, eds., Book of Abstracts Di- Werner Hülsbusch. gital Humanities 2018, Puentes-Bridges, Mexico City, pages 256–261. Grube, Nikolai (1990). Die Entwicklung der Mayas- chrift: Grundlagen zur Erforschung des Wandels der Prager, Christian (2014a). Textdatenbank und Wörter- Mayaschrift von der Protoklassik bis zur spanischen buch des Klassischen Maya: Objectives. Tech. rep., Eroberung. In Acta Mesoamericana 3. Von Flemming. Arbeitsstelle der Nordrhein-Westfälischen Akademie der Wissenschaften und der Künste an der Rheinischen Friedrich-Wilhelms-Universität Bonn. URL http:// mayawoerterbuch.de/objectives/. 43 Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)

Prager, Christian (2014b). Textdatenbank und Sowa, John F. (2000). Knowledge Representation. Lo- Wörterbuch des Klassischen Maya: Project archi- gical, Philosophical, and Computational Foundations. tecture. Tech. rep., Arbeitsstelle der Nordrhein- Pacific Grove, CA: Brooks Cole Publishing. Westfälischen Akademie der Wissenschaften und der Künste an der Rheinischen Friedrich-Wilhelms- Stuart, David (1987). Ten phonetic syllables. Research Universität Bonn. URL http://mayawoerterbuch. Reports on Ancient Maya Writing 14. de/architecture/. TEI (2018). P5: Guidelines for Electronic Text Prager, Christian (2014c). Textdatenbank und Wör- Encoding and Interchange. Text Encoding Initiat- terbuch des Klassischen Maya: Sign classes and ive. URL http://www.tei-c.org/release/doc/ catalogues. Tech. rep., Arbeitsstelle der Nordrhein- tei-p5-doc/en/html/. Westfälischen Akademie der Wissenschaften und Thompson, J. Eric S. (1956). The rise and fall of Maya der Künste an der Rheinischen Friedrich-Wilhelms- civilization. In The Civilization of the American Indian Universität Bonn. URL http://mayawoerterbuch. Series 39. Norman, OK: University of Oklahoma Press. de/aufbau-2/. Thompson, J. Eric S. (1962). A catalog of Maya hiero- Prager, Christian and Sven Gronemeyer (2018). Neue glyphs. In The Civilization of the American Indian Ergebnisse in der Erforschung der Graphemik und Gra- Series 62. Norman, OK: University of Oklahoma Press. phetik des Klassischen Maya. In Svenja A. Gülden, Kyra V. J. van der Moezel, and Ursula Verhoeven-van Elsbergen, eds., Ägyptologische “Binsen”-Weisheiten Figure credits III: Formen und Funktionen von Zeichenliste und Paläographie, no. 15 in Akademie der Wissenschaften Fig.1: Example for a Maya Inscription, La und der Literatur, Abhandlungen der Geistes- und sozi- Corona Panel 1, Photo: Sven Gronemeyer. alwissenschaftlichen Klasse, pages 135–181. Stuttgart: Franz Steiner. Fig.2: Writing of Hieroglyphs. John Mont- gomery: How to Read Maya Hieroglyphs. Hip- Reinhold, Anke (2015). Das Experteninterview als zen- pocrene, New York, NY, 2002. trale Methode der Wissensmodellierung in den Digital Humanities. Information Wissenschaft Praxis, 66 (5- Fig.3: Graph Variants. Concept: Franziska 6):327–333. doi:10.1515/iwp-2015-0057. Diehr; Drawings: Christian Prager. Riese, Berthold (1971). Grundlagen zur Entzifferung der Mayahieroglyphen, dargestellt an den Inschriften Fig.4: Domain-Model of Sign Catalogue Onto- von Copan. In Beiträge zur mittelamerikanischen logy. Concept: Franziska Diehr. Völkerkunde 11. Kommissionsverlag K. Renner. Fig.5: Decipherment of Syllabograms. Concept: Riese, Berthold (2006). Drei neue Maya-Hieroglyphen Sven Gronemeyer; Drawings: , Kataloge. Anthropos, 101(1):238–246. Relación de las cosas de Yucatán. Fray Di[eg]o Ringle, William M. and Thomas C. Smith-Stark (1996). de Landa: MDLXVI. Unpublished Manuscript. A concordance to the inscriptions of , Chiapas, Madrid, Biblioteca de la Real Academia de Mexico. In Middle American Research Institute, Pub- Historia, 1566; Faksimile Codex Dresden: lication 62 . Tulane University. SLUB http://digital.slub-dresden.de/ Rogers, Henry (2005). Writing systems: A linguistic werkansicht/dlf/2967/1/; Faksimile Codex approach. In Blackwell Textbooks in Linguistics 18. Madrid: Ferdinand Anders Codex Tro-Cortesianus Blackwell Publishing. (Codex Madrid): Museo de America Madrid. Co- Schellhas, Paul (1897). Die Göttergestalten der dices Selecti Phototypice Impressi 8. Akademische Mayahandschriften: Ein mythologisches Kulturbild Druck- u. Verlagsanstalt, Graz, 1967. aus dem alten Amerika. Dresden: Richard Bertling. Fig.6: Exemplary encoding of a hieroglyphic Simperl, Elena (2010). Guidelines for reusing ontolo- block in TEI/XML. Screenshot of internal project gies on the Semantic Web. International Journal of Semantic Computing, 4(2):239–283. space of ‘Text Database and Dictionary of Classic Mayan’ in TextGrid Lab.

44