SiLOrB and Signotate: A Proposal for Lexicography and Corpus-building via the Transcription, Annotation, and Writing of Signs

Brenda Clark, Greg Clark [email protected], [email protected]

Abstract This paper proposes a system of standardized transcription and orthographic representation for sign languages ( Orthography Builder) with a corresponding text-based corpus-building and annotation tool (Signotate). The transcription system aims to be analogous to IPA in using ASCII characters as a standardized way to represent the phonetic aspects of any sign, and the writing system aims to be transparent and easily readable, using pictographic symbols which combine to create a 'signer' in front of the reader. The proposed software can be used to convert transcriptions to written signs, and to create annotated corpora or lexicons. Its text-based human- and machine-readable format gives a user the ability to search large quantities of data for a variety of features and contributes to sources, such as dictionaries and transcription corpora.

Keywords: transcription, orthography, annotation tools Both currently exist as early versions which aim to 1. Introduction become a standardized and inclusive system for building sign language corpora. This paper proposes a system of standardized transcription and orthographic representation (Sign Language Orthography Builder; SiLOrB), which 2. Sign Language Orthography Builder corresponds to a text-based and searchable dictionary- or (SiLOrB) corpus-building and annotation tool (Signotate). Though a Past attempts at transcription or writing systems for sign few notation and transcription systems have been created languages have had shortcomings such as lack of for sign languages in the past (c.f. Stokoe, 1960; Hanke, completeness (e.g. limited non-manuals), use of non- 2004, and Sutton, 2010), none have yet become the standard characters (as in HamNoSys and SignWriting), standard, perhaps due to difficulty of use and structure and linear organization (as in HamNoSys and Stokoe that does not resemble the structure of signs. notation). The SiLOrB transcription system is designed to Consequently, sign language data is often presented with be analogous to the IPA used for spoken languages: a sign-by-sign or morpheme-by-morpheme glosses as a universal set of ASCII characters and multi-character substitute for phonetic or phonemic transcriptions, even ‘codes’ which correspond to individual phonetic though this would be considered to be unacceptable for components of a language (Clark, 2018). It is transparent, spoken language research. It also means that sign customizable, and creates sign language texts which can language corpora generally cannot be searched on the be searched based on phonetic or phonemic aspects. The basis of phonetic aspects and that notes about articulation conventions it establishes can be easily incorporated as a are not standardized. It is roadblock to lasting system of organization and presentation for lexicons and documentation and collaborative description. other texts. It is both machine-readable and human- While increasing use of any type of transcription and line readable, and has been created with both signers and drawings in sign language descriptions, dictionaries, and linguists in mind. analyses is certainly a step in the right direction, line The SiLOrB system also corresponds to pictographic drawings and even videos do not indicate the same level symbols which can be used to create orthographic of abstraction as a writing system. No matter how close it representations of signs. Each code has a direct and is to citation form, a line drawing or video clip is one predictable impact on the appearance of a sign's written example of a single signer with a specific dialect form, though most symbols are combinatory. For example, performing a sign. A writing system like SiLOrB has the a single complex symbol is used to depict the orientation potential to consistently represent the phonemic and shape of a hand (see Figures 1-3). This type of components of a sign with no distractors, and Signotate symbology allows for a less linear representation which software allows for easy creation of such representations. more accurately reflects a sign's articulation and Not only is this valuable for linguistic analysis, but it can phonemic structure. The sections below discuss basic be used to build easily-searchable and visibly transparent aspects of the transcription (2.1) and writing (2.2) systems. corpora and to create printed literature. While video or Those who are interested in the full current version can live signing is certainly the most reliable mode for visit https://bleegiimuusclark.com/home/silorb-sign- communication (just as audio or live speaking is ideal for language-writing/. spoken languages), written language is a way to spread information in communities with limited access to video 2.1 Transcription resources and to allow signers who are hesitant about being recorded to contribute to the conversation. The system described here is based on what is known about phonemic distinctions in sign languages (see Jepsen The sections below discuss the transcription and et. al., 2015; Crasborn et. al., 2000), and aims to improve orthographic conventions used in SiLOrB (2), which aim on existing forms of transcription. An early version of to improve on existing systems, and the software that can SiLOrB was used to write Sivia Sign Language (Clark, be used to annotate texts with this type of notation (3). 2017), and the current version (2.0) is expanded based on

LREC 2018 Sign Language Workshop 29 additional phonological distinctions used in ASL, Hawai‘i , and allows both types to be described with the Sign Language, and a few others. SiLOrB breaks a sign's same value for both hands (e.g. both hands inward or both articulation into the familiar categories of 1) shape and hands toward the dominant side). This is one of the ways orientation of the hands, 2) , 3) movement, and 4) SiLOrB is geared toward phonemic representation and non-manuals. Each category is specified with a capital searchability, along with its hierarchical organization. letter code followed by the applicable phonetic information in a set order. A user can choose to describe a 2.2 Written Representations sign at the level of detail necessary for their objective, in As with transcription, the objectives for the writing order to fit the language’s phonology or morphology, or system are clarity, consistency, and ease of use. Some even to record contrasting phonemic and surface forms of conventions are inspired by existing systems, such as the a sign. Because categories and descriptors are additive use of white for the palm and black for the back of the rather than mandatory, morphemes consisting of fewer hand, as in SignWriting (Sutton 2010). The basic structure components, such as a modification to , a type combines SignWriting’s ‘drawing of a signer’ approach of movement, or a facial expression, can be depicted with some linear elements which add to its consistency individually as well. and readability for longer texts. SiLOrB is described as After the specification of the dominant or non-dominant ‘non-linear’ in contrast to systems like HamNoSys or hand (D or ND), palm and finger orientation are given, , which simply list handshape, orientation, followed by groups of fingers and their positions. etc. from the left to right. SiLOrB instead uses many ‘DND^Vc*%A^+’, for example, means that both hands combinatory symbols and works largely from the center (DND) are in a palms upward (^), fingers forward (V) outward. It does have linear components, however, due to position with all the fingers rounded (c) and making handshape changes which are listed from left to right on contact (*). Then both hands change (%) to palms both sides, the depiction of non-manuals on the far right, bodyward (A), fingers upward (^) orientation with all the and formatting which standardizes the height of each fingers extended (+). Locations (L) consist of a regional component to resemble a line of text. code and optional further specification of placement and Full orthographic representations of signs (created with contact, as in ‘Lzv>< o’ describing a location in zero Signotate; see Section 3) combine pictographic symbols space (z) below the waist (v) and near the vertical center which are arranged as a ‘signer’ facing the reader. A (><), which is close to the torso but does not make location symbol is placed in the center with the hands (a contact (o). Movements (M) often consist of a direction combined shape and orientation symbol) on either side and a path, as in ‘M^sm’ for upward movement (^) with a and movement to the outside of each hand. Non-manuals short trajectory (sm). Non-manuals (NM) give a part of are given with the location if applicable (e.g. a central the body followed by a position or movement code. torso image may have markers for hands making contact ‘NMM*’ describes the mouth (M) in a pursed position. with that location and for movement of the torso itself), Thus, the sign for ‘fire’ in Sivia Sign Language (LSSiv) is and additional non-manuals occurring on other parts of transcribed as ‘DND^Vc*%A^+; Lzv o; M^sm; NMM*’ the body are represented on the far right. Figure 1 shows (see Figure 1). the orthographic representation of the Sivia Sign For simpler signs which do not utilize every aspect, Language (LSSiv) sign for ‘fire’ described in the previous unnecessary categories are simply deleted from the section: 1) the center drawing of a torso indicates zero transcription. The LSSiv sign for , for example, uses space and circles show that the hands are near a low and only the dominant hand (D) in a consistent orientation and central part of the torso; 2) the hand symbols on either shape: forward palm (V) and upward fingers (^) with the side show a change from palm up with fingers in a index (1) and middle (2) spread (w). Its location (L) is rounded position to palm bodyward with fingers extended, simply the forehead (fh) with contact (x), and there are no 3) movement arrows depict a short upward path, and 4) movement or non-manual components. Thus, the the face on the right edge shows pursed lips. (See B. transcription for ‘Peru’ is ‘DV^12w; Lfhx’. Clark 2018 for a full description of the current system and examples with corresponding videos.) SiLOrB transcription has also changed from its original version to use more iconic coding conventions which limit language barriers for users who are not fluent in English. Though top-level category codes and some specifiers are based on English terms, arrow-like characters (^, v, <>, ><, >>, <<, A, V) are used for orientation, movement, and location specifiers. Similarly, emoticons are the inspiration for many mouth shapes, Figure 1: Written representation of the LSSiv sign for such as ‘)’ for a smile and ‘P’ for an exposed tongue. ‘fire’ (DND^Vc*%A^+; Lzv>< o; M^sm; NMM*) Other codes are chosen to resemble a corresponding As in the transcription system, simpler signs may not use orthographic symbol, such as ‘%’ for a change in position all of the available parameters, and appear with fewer (as in Figure 1) or ‘*’ for contact made with the fingertips. symbols. The sign for 'coca' uses only the dominant hand Distinctions such as ‘in’ (toward the center) versus and has no movement, so its orthographic representation ‘toward the dominant side’ are clarified using digraphs is much shorter, as seen in Figure 2. The sign consists of (>< versus <<). Instead of requiring the absolute direction an extended index finger touching a puffed out cheek. of each hand, ‘inward’ and ‘outward’ options allow a user This is also an example of symbology that combines to reference a vertical center line. This creates a distinction between mirrored and purely directional

30 LREC 2018 Sign Language Workshop locative and non-manual aspects in a single region (here, the head).

Figure 2: Written representation of the LSSiv sign for ‘coca’ (D><^1+; Lchk*; NMchk<>d) Figure 3 shows the sign for 'cacao', which uses the non- dominant hand as its location and only moves on one side. Figure 4: Screenshot of Signotate The dominant hand with palm down and bent fingers moves outward repeatedly over the non-dominant hand Symbols appear in the upper box as codes are entered with palm up and extended fingers. Again, a single unit below to help ensure that the desired configuration is depicts the features of the non-dominant hand and serves achieved. Additional signs in the same transcript or as the location for the dominant hand. lexicon appear on the right panel with their transcriptions and glosses, and metadata can be entered below transcription codes. These fields allow a user to follow glossing and annotation conventions such as those outlined in Crasborn, Bank, & Cormier 2015, with the addition of standardized phonetic notation. Future Figure 3: Written representation of the LSSiv sign for implementations of Signotate may also include plugins ‘cacao’ (Dv><+; Lnd; M<>#) for programs such as iLex or ELAN to allow written signs to appear along with time-aligned transcriptions. 3. Signotate Software 3.2 Searchability Software called Signotate is currently being developed to The system is indexed by articulatory features that a user create written signs based on their SiLOrB transcriptions. inputs, so a Signotate corpus is instantly searchable by It is also a tool for creating documents such as a transcript phonetic or phonemic components. Aspects like one- or or a lexicon consisting of many annotated signs. Like two-handedness and symmetry or asymmetry are also SiLOrB, Signotate is designed to be intuitive for a variety included in searchable components, as well as some of users and customizable for a variety of tasks. The implications which are not explicitly expressed in following sections discuss specific features of sign entry transcription. For example, fingers described as ‘bent’ are (3.1) and search functions (3.2). Those who are interested also marked as extended, though SiLOrB coding only in the project can visit https://github.com/Signotate to find requires that ‘bent’ (r) be specified. Any field in metadata out more and keep up with the latest updates. (e.g. glosses, morphemes, participants, location, timestamp, etc.) or code in the form of a sign (e.g. 3.1 Sign Entry extended fingers, location, type of contact, movement While SiLOrB transcription code and orthographic direction, eye gaze, brow position, etc.) is searchable as symbols can be created by hand as video annotations or well. Signotate is also able to perform SQL style searches. entries in a lexicon, the Signotate software application For example, one could search for signs that begin at any provides an easy way to convert transcriptions into location below the waist, with the fingers oriented upward. written signs and to create a lexicon or a transcript from Similarly, one could search for transcriptions that involve multiple entries. It facilitates quick transcription in the a person from Cusco who is between 25 and 30 years of age, and is not a native user of Peruvian Sign Language. field, allowing transcribers to rapidly add collected data to a corpus. Figure 4 shows the application’s interface, Signotate exists as both a web implementation and which guides a user to enter a sign’s transcription in the standalone desktop implementation. The desktop version, four main categories of hand, location, movement, and which stores its data locally in an SQLite database, can be non-manuals. The default form is a one-handed sign used offline, while still enabling search across small to which occurs on the dominant side of the body, though a moderately-sized corpora. The web implementation, user can choose to switch to the non-dominant side. For which is backed by Elasticsearch, is capable of searching two-handed symmetrical signs, a ‘dual-sided’ option across very large corpora. In the future, Signotate could automatically copies a transcription to the non-dominant support more complex searches and aggregations, such as side as well, and for asymmetrical signs, an ‘asymmetrical’ phrasal search, or searches for grammatical or syntactic option allows a user to edit both sides individually. patterns. Signotate lexicons and transcripts can be imported from and exported to a human-readable yaml formatted file, as shown in Figure 5. (See https://bleegiimuusclark.com/signotate-v0-1-yaml/ for a complete example of this format.)

LREC 2018 Sign Language Workshop 31 transcript: 4. Conclusion - times: start: 1.51.416 The culmination of SiLOrB and Signotate is the ability to stop: 1.51.897 build corpora of sign languages which include not only usage: "typical" glosses and translations for videos, but annotation at transcriptGlosses: several levels, including anything from phonetic features, - language: "English" morphemes, and single signs to extended texts such as gloss: "bee" morphStruct: "bee" narratives or conversations. The resulting corpora would signClass: "noun" utilize a detailed and universal format for talking about usage: "typical" sign languages which is machine readable and easily used otherGlosses: by language researchers and computational linguists for a - language: "English" label: "meaning1" variety of tasks including automated sign language gloss: "sting" transcription or analysis. morphStruct: "sting" signClass: "verb/noun" Descriptive, searchable, and standardized annotations usage: "typical" combined with Signotate software open the door to new sign: collaborative possibilities for sign linguists, for natural silorbCode: "D><^t*; Lchk; Mx#" language processing researchers, and for signing properties: communities. Aside from its descriptive and analytical singleSided: "dominant" nonManual: false advantages, these solutions will enable a user to create hands: typed, alphabetized, and printed media for sign languages. silorbCode: "D><^t*" Not only is this important for data preservation, dominant: presentation, and organization, but it can provide options silorbCode: "><^t*" singleSided: "dominant" for communities with limited access to video and online stages: resources. The idiomatic nature, extensibility through - stageCode: "D><^t*" documentation, and software suite help ensure high features: quality and long lasting documentation of sign languages - palmIn for a variety of purposes. - fingersUp fingerFeatures: - fingers: Bibliographical References - 0 - 1 Clark, B. (2018). Sign Language Orthography Builder. - 2 Retrieved from https://bleegiimuusclark.com/home/ - 3 silorb-sign-language-writing/ - 4 Clark, B. (2017). A grammatical sketch of Sivia Sign features: - extended Language (Doctoral dissertation). University of Hawaiʻi - tapered at Mānoa. - contact Crasborn, O., Bank, R., and Cormier, K. (2015). Digging nonDominant: into Signs: Towards a gloss annotation standard for singleSided: "dominant" sign language corpora. Nijmegen: Radboud University location: silorbCode: "Lchk" & London: University College London. dominant: Crasborn, O., van der Hulst, H. and van de Kooij, E. silorbCode: "chk" (2000). Phonetic and phonological distinctions in sign singleSided: "dominant" languages. In A paper presented at Intersign: Workshop stages: - stageCode: "chk" (Vol. 2). region: "chk" Hanke, T. (2004). HamNoSys-representing sign language proximity: "near" data in language resources and language processing nonDominant: contexts. In LREC, 4. singleSided: "dominant" Jepsen, J., De Clerck, G., Lutalo-Kiingi, S., and movement: silorbCode: "Mx#" McGregor, W. B. (Eds). (2015). Sign languages of the dominant: world: a comparative handbook. Walter de Gruyter. silorbCode: "x#" Stokoe, W. C. Jr. (1960). Sign Language Structure: An singleSided: "dominant" outline of the visual communication systems of the stages: American deaf. In Studies in Linguistics, Occasional - stageCode: "x#" other: Papers, 8. Department of Anthropology and Linguistics, - tap University of Buffalo Buffalo, NY. - repeated Sutton, V. (2010). Sutton's SymbolBank: International nonDominant: SignWriting Alphabet. http://movementwriting.org singleSided: "dominant"

Figure 5: Signotate yaml snippet for one sign

32 LREC 2018 Sign Language Workshop