Xamobile Mobile text entry for transcribing historical

Sunkanmi Olaleye Supervisor: Prof. Hussein Suleman

Digital Libraries Laboratory, University of Cape Town

Figure 5. Xamobile text entry methods

Outline

• Introduction

• Research Questions

• Evaluation

• Results

Introduction

• Preserving languages is vital to saving the cultural wealth and ancestral knowledge embedded in them.

Introduction: |Xam

• |Xam is one of the African languages classified as extinct (ISO 639-3).

• Records of the language currently exist in a digitized dictionary.

• Text of the language should be preserved; mobile text entry could help.

Introduction: |Xam Transcription Tools

Transcription tools tested and outcomes

• Automatic Handwriting Recognition: 45.10% at line level [Williams and Suleman, 2011]

• Web Transcription: 69.69% at line level [Ngoni and Suleman, 2013]

Research Questions

1. How do the Xwerty, T9, Pinyin Script and Hierarchical input methods compare in terms of accuracy for |Xam text?

2. How do the Xwerty, T9, Pinyin Script and Hierarchical input methods compare in terms of speed of entry for |Xam text?

Input methods: Xwerty, T9, Pinyin Script, Hierarchical

Evaluation

Population Size

• 15 participants (within-group design)

Apparatus

• |Xam Line Text (Source: gold standard data used in AHR and TBL)

• Android Touchscreen mobile phones

• 4 Prototype Input Methods

AHR: Automatic Handwriting Recognition [Williams and Suleman 2011]

TBL: Transcribe Bleek & Lloyd [Ngoni and Suleman 2013]

Evaluation: Metrics – RQ 1

Words per Minute (WPM)

$$\mathrm{WPM} = \frac{|T| - 1}{S} \times \frac{60}{5}$$

Keystrokes per Character (KSPC)

$$\mathrm{KSPC} = \frac{|\mathit{InputStream}|}{|\mathit{TranscribedText}|}$$

• |T| is the length of the transcribed |Xam text

• S is the duration in seconds

• The constants 60 and 5 are the number of seconds per minute and the average word length (in characters) assumed for |Xam text

• |InputStream| is the total number of key presses made for the presented |Xam text

• |TranscribedText| is the total number of key presses required using a particular input method for the presented |Xam text

(Koivisto and Urbaczewski 2005; MacKenzie and Soukoreff 2002; Mike et al. 2007)
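As a rough illustration of the two speed metrics on this slide, the sketch below computes WPM and KSPC from raw counts. The function and parameter names are hypothetical, and it assumes that the length of the transcribed text can be taken as the number of characters in a Python string (a |Xam character with combining diacritics may count as more than one).

```python
def words_per_minute(transcribed: str, seconds: float, chars_per_word: int = 5) -> float:
    """WPM = (|T| - 1) / S * 60 / 5, taking |T| as len(transcribed) (an assumption)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(transcribed) - 1) / seconds * 60 / chars_per_word


def keystrokes_per_character(input_stream_len: int, transcribed_text_len: int) -> float:
    """KSPC = |InputStream| / |TranscribedText|, both supplied as plain counts."""
    if transcribed_text_len <= 0:
        raise ValueError("transcribed_text_len must be positive")
    return input_stream_len / transcribed_text_len
```

For example, a 51-character line entered in 60 seconds gives about 10 WPM, and 30 key presses against a count of 20 gives a KSPC of 1.5.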

Evaluation: Metrics – RQ 2

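$$\text{Old MSD Error Rate} = \frac{\mathrm{MSD}(A, B)}{\max(|A|, |B|)} \times 100\%$$

$$\text{New MSD Error Rate} = \frac{\mathrm{MSD}(A, B)}{S_Q} \times 100\%$$

A represents the presented |Xam text

B represents the transcribed |Xam text

MSD(A, B) is the minimum string distance between A and B

$S_Q$ is the mean length of the alignment strings in the set, with $S_Q \geq \max(|A|, |B|)$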

(Koivisto and Urbaczewski 2005; MacKenzie and Soukoreff 2002; Mike et al. 2007)
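The two error rates can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes MSD is the unit-cost Levenshtein distance and that the alignment set consists of all minimum-cost alignments of the presented and transcribed strings. All function names are hypothetical.

```python
from typing import Tuple


def msd_stats(presented: str, transcribed: str) -> Tuple[int, float]:
    """Return (MSD, mean length of all minimum-cost alignments)."""
    m, n = len(presented), len(transcribed)
    # dist[i][j]:  minimum string distance between the first i and first j characters
    # count[i][j]: number of minimum-cost alignments of those prefixes
    # total[i][j]: summed length (alignment columns) over those alignments
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    count = [[0] * (n + 1) for _ in range(m + 1)]
    total = [[0] * (n + 1) for _ in range(m + 1)]
    count[0][0] = 1
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            candidates = []  # (cost, predecessor cell)
            if i > 0:
                candidates.append((dist[i - 1][j] + 1, (i - 1, j)))  # deletion
            if j > 0:
                candidates.append((dist[i][j - 1] + 1, (i, j - 1)))  # insertion
            if i > 0 and j > 0:
                sub = 0 if presented[i - 1] == transcribed[j - 1] else 1
                candidates.append((dist[i - 1][j - 1] + sub, (i - 1, j - 1)))
            best = min(cost for cost, _ in candidates)
            dist[i][j] = best
            for cost, (pi, pj) in candidates:
                if cost == best:
                    # Every optimal alignment of the predecessor grows by one column.
                    count[i][j] += count[pi][pj]
                    total[i][j] += total[pi][pj] + count[pi][pj]
    return dist[m][n], total[m][n] / count[m][n]


def old_msd_error_rate(presented: str, transcribed: str) -> float:
    """MSD(A, B) / max(|A|, |B|) * 100."""
    longest = max(len(presented), len(transcribed))
    if longest == 0:
        return 0.0
    return msd_stats(presented, transcribed)[0] / longest * 100


def new_msd_error_rate(presented: str, transcribed: str) -> float:
    """MSD(A, B) / mean alignment length * 100."""
    msd, mean_len = msd_stats(presented, transcribed)
    if mean_len == 0:
        return 0.0
    return msd / mean_len * 100
```

For the presented text "ab" and the transcribed text "ba", MSD is 2 and the three optimal alignments have lengths 2, 3 and 3, so the old error rate is 100% while the new error rate is 75%. Because the mean alignment length is never shorter than the longer string, the new rate never exceeds the old one.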

Evaluation: Results RQ 1

Input technique had a significant effect on text entry speed (F(3, 56) = 5.32, p < .0005).

Xwerty was the fastest text entry method, with the highest words per minute (WPM) for |Xam text.

There was no significant difference in mean keystrokes per character (KSPC) among Xwerty, T9 and Hierarchical; Pinyin Script recorded the highest KSPC and Hierarchical the lowest.

[Figure: bar chart of speed (wpm), axis 0 to 10, by input method (X, T, P, H)]

Evaluation: Results RQ 2

The error rate differed significantly among the input techniques (F(3, 52) = 6.66, p < .001).

The Hierarchical input technique was the most accurate, with the lowest error rate; Pinyin Script had the highest error rate.

[Figure: bar chart of error rate, axis 0.00 to 45.00, by input method (X, 9, P, H)]

References

• Kyle Williams, Hussein Suleman (2011) 'Creating a Handwriting Recognition Corpus for Bushman Languages'. ICADL 2011: pp. 222-23

• Ngoni Munyaradzi, Hussein Suleman (2013) 'Quality Assessment in Crowdsourced Indigenous Language Transcription'. TPDL 2013: pp. 13-22

• MacKenzie, I. S. and Soukoreff, R. W. (2002) 'Text Entry for Mobile Computing: Models and Methods, Theory and Practice', Human-Computer Interaction, pp. 147-198

• Matti Koivisto and Andrew Urbaczewski (2005) 'Accuracy metrics in mobile text entry', Human-Computer Interaction - 2005, pp. 1-4.

• Mike Tian-Jian Jiang, James Zhan, Jaimie Lin, Jerry Lin, Wen-Lien Hsu (2007) 'An Automated Evaluation Metric for Chinese Text Entry', 'Robustness analysis of adaptive Chinese input methods', Advances in Text Input Methods (WTIM 2011) (2011), pp. 2-4.

• UNESCO Atlas of the World's Languages in Danger. Available at: www.unesco.org/new/en/culture/themes/endangered-languages/ (Accessed: 1st March 2014)
