Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI
VOLUME 4, 2016
arXiv:1811.11983v1 [cs.SI] 29 Nov 2018

Analysing Emergent Users’ Text Messages Data and Exploring its Benefits

ANAS BILAL1, AIMAL REXTIN1, AHMAD KAKAKHAIL1, AND MEHWISH NASIM2
1 Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
2 ARC Centre of Excellence for Mathematical and Statistical Frontiers, University of Adelaide, Australia
Corresponding author: Aimal Rextin (e-mail: [email protected]).

ABSTRACT While users in the developed world can choose to adopt the technology that suits their needs, emergent users cannot afford this luxury; hence, they adapt themselves to the technology that is readily available. When a technology such as the mobile phone is designed, it is implicitly assumed that emergent users will adopt it in due course. However, such user groups have different needs, and they follow different usage patterns compared to users from the developed world. In this work, we target an emergent user base, i.e., users from a university in Pakistan, and analyse their texting behaviour on mobile phones. We observe interesting results, such as the long-term linguistic adaptation of users in the absence of reasonable Urdu keyboards, the overt preference for communicating in Roman Urdu, and the social forces related to textual interaction. We also present two case studies on how a single dataset can effectively help understand emergent users, improve the usability of some tasks, and help users perform previously difficult tasks with ease.

INDEX TERMS Emergent Users, Roman Urdu, Text Message Analysis, Word Completion

I. INTRODUCTION
Low cost mobile phones have allowed the wide scale adoption of smartphones in developing countries; for example, South Asian users alone constitute one of the largest mobile phone user bases in the world. In the last few years, researchers have referred to user groups from developing countries as emergent users. Such user groups are either less educated, economically disadvantaged, geographically dispersed, or have a culturally heterogeneous background [1]. Millions of people in the developing world own mobile phones, and large proportions of users have access to smartphones, owing to the availability of low cost smartphones. Technology is designed primarily for the needs of a user from the developed world, and it is assumed that emergent users will adopt it in a similar fashion. However, studies show that emergent users adopt these technologies in different ways [1], [24], with their unique usage peculiarities [2], [3]. Jones et al. [4] argued that emergent users should be engaged in technology design to yield designs of greater value.

Studies recommend that interface design follow the traditional life cycle of HCI. However, this approach needs both time and financial resources, both of which are lacking in the case of emergent users. We propose an alternative approach, i.e., data driven usability improvement. This approach is feasible in today’s world due to the large amount of digital footprints left by users on their computing devices. We present two case studies on how a single dataset can help understand emergent users and improve the usability of some tasks.

We decided to use data from text messages for our analysis because the use of text messaging has increased in recent years. The Pakistan Telecommunication Authority [5] estimates that there were 133.24 million mobile subscribers in 2016 and that more than 301.7 billion text messages were sent in 2014. We observed that Pakistani users generally use Roman Urdu when communicating by text messages. Roman Urdu is the colloquial term given to the practice of writing Urdu (the national language of Pakistan) in Roman script. It is generally agreed by local users that this practice evolved when users who were not comfortable in English started to communicate over text messages [13], [31], [32].

A. CONTRIBUTIONS
In this paper, we extend the work that we presented earlier [6] by performing additional analysis on the collected dataset. This dataset (footnote 1) is a corpus of original text messages from 116 mobile phone users in Pakistan. The dataset also includes a hand-crafted file that groups the various spelling variations of the same word. The following are our contributions for this extended study:
1) Texting behaviour: Our first contribution is studying the texting behaviour of Pakistani users, such as the predominant language used by the users and the various spelling variations of different words. This will be covered in Section III.
2) Spelling Variations: Since Roman Urdu has evolved naturally, there are no standardized spellings in place. In Section IV, we will show that spelling variations fall into three main categories.
3) Case Studies: The marked contribution of this paper is to show the importance of collecting datasets from developing countries in order to understand the use of technology in understudied populations. In this regard, we will discuss the following case studies:
   a) The availability of a word completion feature is a major convenience for smartphone users. In Section V, we show that this corpus can help emergent smartphone users by providing them with more accurate word completion.
   b) In Section VI, we will discuss how users’ text data can be used to understand the social dynamics governing the interactions among people in developing countries, especially the ones involving intimate relations.

Footnote 1: Our dataset is available for academic use at https://github.com/CIIT-HCI/Roman-Urdu-SMS-Corpus

B. BACKGROUND
Users of computing systems who are disadvantaged are generally referred to as emergent users [1]. These include users from developing countries who face many challenges due to their different cultures and languages, low education, and late access to technology. The number of mobile phone users from developing countries has increased dramatically in recent years due to the decreasing cost of smartphones and mobile internet. This has led to a steady stream of studies that analyse problems and potential solutions for these users. These include a study of text messages from a structural and functional point of view [6]; a study exploring the use of computing technology by young migrant workers in China [7]; and a job searching website designed for illiterate people in Pakistan [8].

It has been argued that emergent users, especially illiterate users, find it difficult to use text based features on phones [9]. Various studies suggest that voice based access to such features should be made easier, e.g., voice-based use of text messages [10] and a voice-based job searching mobile application for illiterate Pakistani users [11]. However, text based communication has its own advantages, such as its asynchronous nature and privacy, making it more convenient under certain social settings. The benefits of text messaging, and users’ difficulty in using it, led Pakistani users to adapt to the situation by using Roman script to convey their messages in the local language; this style of writing is known as Roman Urdu [6]. Any kind of analysis on Roman Urdu is made complex by the fact that Roman Urdu has no standard spelling defined and different people may use different spellings for the same word [12]–[14].

Text messaging is becoming more popular day by day and has hence attracted the attention of researchers. For example, some studies suggest that text messages can be classified into two disjoint groups. The first class is called informational messages, which contain information of a practical nature, and the second class is called relational messages, which contain greetings, personal conversations, etc. [15]–[17]. Some other studies compare the structure and function of the text messages of users with different characteristics such as age, gender, and experience [15], [18]–[21]. There are even studies that look into how the language used on Facebook, WhatsApp, etc. differs from standard languages [22], and studies that examine the functions of emojis in one-to-one messaging via text [23].

Roman Urdu Corpora
Urdu is not only the national language of Pakistan but also the official language of many Indian states, and it is among the most widely spoken languages of the sub-continent [12], [14], [29]. Researchers have collected various corpora of Roman Urdu in natural settings, including a collection of 82,000 tweets from Pakistan that was used to design an algorithm to separate English words from Roman Urdu words [30]. Hussain collected a corpus of 5,000 text messages and analysed the linguistic patterns of emergent users [31]. There is also a corpus of 50,000 Roman Urdu SMS and a Roman Urdu corpus containing 1 million messages from chat rooms [32]. We can see that the larger datasets are from web chat rooms, which represent a totally different setting than text messages.

There have also been a few studies that investigate possible applications such as bilingual classification and sentiment analysis [33]; transliteration [13], [34]; word prediction [35]; and part-of-speech tagging.

There are several applications of natural language corpora, such as detecting spam emails and messages [25]–[27] and other natural language processing tasks. Hence it is no surprise that various researchers have collected natural language corpora. For example, Tagg [28] collected a large scale corpus and performed a linguistic investigation on about 11,000 text messages.

C. ORGANIZATION
Section I is the introduction and motivation. Section II describes our dataset. In Section III, we discuss the texting behaviour of our user group, whereas in Section IV we analyse spelling variations in Roman Urdu text messages. In Sections V and VI we report our two case studies.
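As a rough illustration of the word completion case study previewed in the contributions, the following Python sketch shows how a hand-crafted grouping of spelling variations could be combined with corpus word frequencies to rank completions. It is not the method of Section V; the variant groups, the sample messages, and the function names (build_variant_index, count_words, complete) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical variant groups: canonical spelling -> observed variants.
# These example words are illustrative assumptions, not taken from the dataset.
VARIANT_GROUPS = {
    "kya": ["kya", "kia"],
    "nahi": ["nahi", "nahin", "nhi"],
}


def build_variant_index(groups):
    """Map every variant spelling to its canonical form."""
    return {
        variant: canonical
        for canonical, variants in groups.items()
        for variant in variants
    }


def count_words(messages, variant_index):
    """Count word frequencies after folding variants into canonical forms."""
    counts = Counter()
    for message in messages:
        for token in message.lower().split():
            counts[variant_index.get(token, token)] += 1
    return counts


def complete(prefix, counts, k=3):
    """Return up to k corpus words starting with prefix, most frequent first."""
    prefix = prefix.lower()
    matches = [(w, c) for w, c in counts.items() if w.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a stable order.
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in matches[:k]]


# Made-up sample messages (assumption, for demonstration only).
messages = ["kia haal hai", "nhi yaar", "kya kar rahe ho", "nahin pata"]
index = build_variant_index(VARIANT_GROUPS)
counts = count_words(messages, index)
suggestions = complete("k", counts)
```

Folding variants into one canonical form pools their counts, so a frequent word is not ranked low merely because its occurrences are split across several spellings. One limitation of this simple design is that a prefix typed in a non-canonical spelling (for example "nh") matches nothing; a fuller version would also need to match prefixes against the variants themselves.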
