Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 6079–6085, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC.

TheRuSLan: Database of Russian Sign Language

Ildar Kagirov, Denis Ivanko, Dmitry Ryumin, Alexander Axyonov, Alexey Karpov
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
{kagirov, karpov}@iias.spb.su, [email protected], {a.aksenov95, dl_03.03.1991}@mail.ru

Abstract
In this paper, a new Russian sign language multimedia database, TheRuSLan, is presented. The database includes lexical units (single words and phrases) of Russian sign language within one subject area, namely "food products at the supermarket", and was collected with an MS Kinect 2.0 device in both FullHD video and depth-map modes, which provides new opportunities for the lexicographical description of the Russian sign language vocabulary and enhances research in the field of automatic gesture recognition. Russian sign language has an official status in Russia, and over 120,000 deaf people in Russia and its neighboring countries use it as their first language. Russian sign language has no writing system, is poorly described, and belongs to the low-resource languages. The authors formulate the basic principles of annotation of sign words based on the collected data and describe the content of the collected database. In the future, the database will be expanded to comprise more lexical units. The database is explicitly made for the task of creating an automatic system for Russian sign language recognition.

Keywords: Russian sign language, low-resourced languages, corpora annotation, image recognition, machine learning

1. Introduction

This paper presents an electronic lexical database of Russian sign language (RSL) collected for one subject area. The database is called TheRuSLan (Thesaurus of Russian Sign Language (TheRuSLan, 2019)) and is the first of its kind for Russian sign language. As the authors believe, the collected database can be helpful in tasks of machine learning, automatic gesture and sign language recognition, and sign language linguistics.

The primary objective of the investigation was to create a database of RSL that could be of interest both for linguistic research and for machine learning. This was done with the use of two annotation layers: a linguistic ("phonological") layer and an "image recognition-oriented" layer, i.e. class labeling.
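As a minimal sketch of how a single database entry could carry both annotation layers, consider the following Python snippet; the field names and values are hypothetical and only mirror the two-layer scheme described above, not the actual file format used in TheRuSLan.

```python
from dataclasses import dataclass

@dataclass
class SignAnnotation:
    """One lexical unit with both annotation layers (hypothetical field names)."""
    gloss: str        # class label: the "image recognition-oriented" layer
    handshape: str    # linguistic ("phonological") layer: handshape
    location: str     # linguistic layer: place of articulation
    trajectory: str   # linguistic layer: movement type
    video_path: str   # path to the corresponding Kinect recording

# Illustrative entry with made-up values
entry = SignAnnotation(
    gloss="MILK",
    handshape="fist",
    location="chin",
    trajectory="downward",
    video_path="clips/milk_signer01.mp4",
)
print(entry.gloss, entry.handshape, entry.location)
```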
The paper is structured as follows: after the Introduction, Section 2 provides some information about RSL as a low-resourced natural language. Section 3 sketches the state of the art in the field of corpora and databases of RSL. In Section 4, the authors present their view on building RSL corpora and describe the basic distinctive features used to annotate the collected database. Following this, Sections 5 and 6 contain information on data collection and the general features of the database. Finally, Section 7 serves as a round-up of the paper, where we discuss the obtained results as well as the main conclusions.

2. Russian Sign Language as a Low-Resourced Language

RSL is the language of communication used by the deaf community in Russia and some of its neighboring countries. According to the statistics of the Ethnologue international catalog (www.ethnologue.com), the number of RSL speakers was over 122 thousand people in 2010. In 2012, RSL was officially recognized as a language of communication in the Russian Federation.

Gestures as a form of communication are of great importance in everyday life and constitute different language systems and sub-systems. Most of them share the property of being independent communication systems based on gesticulation. According to the data provided by the World Health Organization, today around 466 million people worldwide have disabling hearing loss, and this number is expected to grow to 900 million over the next 30 years. There are no data on how many of these people use sign languages in everyday life, but their number must be considerable.

Sign language (SL) is, like any natural language, a structured form of communication involving gestures and motions. SLs make use of different parts of the body to convey meaning. Unlike spoken languages, SLs exploit space to build up strings of gestures and express semantics. That is why SL gestures are classified, inter alia, according to position and orientation; other distinctive features are handshape and trajectory.

In spite of its official status and active use by the Russian deaf community, RSL has not been sufficiently described and presented to the scientific community. Linguistic research on RSL is still in its infancy. Accordingly, RSL can be classified as a low-resourced language, since not much RSL data has been collected. Databases, however, are essential for training, testing and comparing automatic (sign) language recognition systems.

3. SL Databases: State of the Art

Nowadays there are many hand gesture databases collected by different teams for different purposes. In this paper, the most widely used databases for sign language and hand gesture recognition are given in Table 1. In contrast to linguistic resources, these corpora were explicitly created for SL recognition, natural language processing and computer vision tasks. Typically, there is a lack of suitable video corpora for the task of sign language recognition, and this kind of data differs significantly from the real language encountered outside the research lab. In the presented table, the two largest publicly available SL video corpora are listed (entries 7 and 8). The other corpora are meant for gesture recognition tasks, which are closely related to SL recognition and are usually used to train parts of sign language recognition systems. A description of the datasets and details such as the number of classes, subjects, and samples are given in the table. More gesture recognition databases relevant to SL recognition research can be found in (Pisharady and Saerbeck, 2015).

Table 1: Available sign language and hand gesture databases (sample images omitted)

1. ChaLearn (2011). Task: game playing, remote control, learning SL. Subjects: 20; classes: many; duration: -; samples per subject: varies; samples: 50,000. Sensor: Kinect; video: 320x240, 10 fps; data: RGB-D; free: yes. Ref.: (Malgireddy et al., 2012).
2. MSRC-12 (2012). Task: gestures to be recognized by the MS system. Subjects: 30; classes: 12; duration: 6 h 40 min; samples per subject: ~200; samples: 6,244. Sensor: Kinect; video: 640x480, fps: -; data: RGB-D; free: yes. Ref.: (Simon et al., 2012).
3. ChaLearn Multi-modal (2013). Task: multi-modal Italian gesture recognition. Subjects: 27; classes: 20; duration: -; samples per subject: ~500; samples: 13,858. Sensor: Kinect; video: 640x480, 30 fps; data: RGB-D + audio; free: yes. Ref.: (Escalera et al., 2013).
4. NUS hand posture dataset-II (2012). Task: hand posture recognition. Subjects: 40; classes: 10; duration: -; samples per subject: -; samples: 2,750. Sensor: camera; images: 320x240 (160x120), fps: N/A; data: RGB images; free: yes. Ref.: (Pisharady and Saerbeck, 2013).
5. 6D Motion gesture dataset (2011). Task: gesture recognition. Subjects: 28; classes: 20; duration: -; samples per subject: 200; samples: 5,600. Sensor: camera; video: -; data: raw binary; free: no. Ref.: (Chen et al., 2012).
6. Sheffield Kinect gesture dataset (2013). Task: learning discriminative representations. Subjects: 6; classes: 10; duration: -; samples per subject: 360; samples: 2,160. Sensor: Kinect; video: 640x480, 30 fps; data: RGB-D; free: yes. Ref.: (Liu and Shao, 2013).
7. SIGNUM database (2008). Task: sign language recognition. Subjects: 50; classes: 455; duration: 42.7 h; samples per subject: -; samples: 19,500. Sensor: camera; video: 780x580, 30 fps; data: RGB; free: yes. Ref.: (Agris et al., 2008).
8. RWTH-PHOENIX-Weather (2014). Task: sign language recognition. Subjects: 27; classes: 1,081; duration: 12.54 h; samples per subject: -; samples: 6,841. Sensor: camera; video: 210x260, 25 fps; data: RGB; free: no. Ref.: (Forster et al., 2014).
9. ChaLearn Isolated gesture recognition (2016). Task: isolated gesture recognition. Subjects: 21; classes: 249; duration: -; samples per subject: -; samples: 47,933. Sensor: Kinect; video: 640x480, 30 fps; data: RGB-D; free: no. Ref.: (Li et al., 2016).

However, only a few databases are available for RSL, and they were created either for educational or for linguistic tasks. The only annotated linguistic database is the Russian Sign Language Corpus (RSLC, 2014) by Novosibirsk State Technical University (NSTU). The NSTU corpus contains over 230 samples by 43 signers and was annotated with the ELAN tools, being equipped with metadata search filters. In general, this corpus is an effective linguistic tool for researching RSL. It should be noted, however, that most of the signers use local dialects (primarily Moscow and Novosibirsk), thus the data provided by this corpus are, in some aspects, not of high relevance for other RSL idioms. In addition to the NSTU corpus, there are other databases of RSL. Most of them are video tutorials showing single words or phrases.

analysis and automatic recognition. A feature of the proposed annotation system is that the parameters laid down in the basis of the actual phonological interpretation of the gesture are implemented in the classes that are allocated for constructing probabilistic models.

In (Stokoe, 1960), an innovative approach was introduced, implying decomposition of gestures into three components: 1) handshape and hand orientation; 2) location; 3) trajectory. The set of combinations is not very large ((Battison, 1978) distinguishes 45 handshapes, 25 localizations, and 12 types of performing a gesture in American SL). Handshapes are determined by "active" fingers and the operations applied to them (fingers can be bent, hooked, flattened, etc.). The principles of gesture description formulated by Stokoe have proven to be good
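As a rough illustration of Stokoe-style decomposition, the sketch below encodes a gesture as a triple of handshape, location and trajectory drawn from small closed inventories; the inventory sizes follow Battison's figures for American SL quoted above, while the concrete labels are hypothetical.

```python
from itertools import product

# Hypothetical (truncated) inventories; Battison (1978) reports 45 handshapes,
# 25 localizations and 12 movement types for American SL.
HANDSHAPES = ["fist", "flat", "index", "hook"]        # subset of the 45 handshapes
LOCATIONS = ["neutral_space", "chin", "chest"]        # subset of the 25 localizations
TRAJECTORIES = ["straight", "circular", "downward"]   # subset of the 12 movement types

def is_valid(gesture):
    """Check that a (handshape, location, trajectory) triple uses known components."""
    handshape, location, trajectory = gesture
    return (handshape in HANDSHAPES
            and location in LOCATIONS
            and trajectory in TRAJECTORIES)

# The full combinatorial space is modest: at most 45 * 25 * 12 = 13,500 triples.
print(len(list(product(HANDSHAPES, LOCATIONS, TRAJECTORIES))))  # 36 for this toy subset
print(is_valid(("fist", "chin", "downward")))                   # True
```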