Offensive Language Identification in Dravidian Code Mixed Social Media

Total Page:16

File Type:pdf, Size:1020Kb

Offensive Language Identification in Dravidian Code Mixed Social Media Offensive language identification in Dravidian code mixed social media text Sunil Saumya1, Abhinav Kumar2 and Jyoti Prakash Singh2 1Indian Institute of Information Technology Dharwad, Karnataka, India 2National Institute of Technology Patna, Bihar, India [email protected], [email protected], [email protected] Abstract that the automation of hate speech detection is a more reliable solution. (Davidson et al., 2017) ex- Hate speech and offensive language recog- nition in social media platforms have been tracted N-gram TF-IDF features from tweets us- an active field of research over recent years. ing logistical regression to classify each tweet in In non-native English spoken countries, so- hate, offensive and non-offensive classes. Another cial media texts are mostly in code mixed or model for the detection of the cyberbullying in- script mixed/switched form. The current study stances was presented by (Kumari and Singh, 2020) presents extensive experiments using multiple with a genetic algorithm to optimize the distinguish- machine learning, deep learning, and trans- ing features of multimodal posts. (Agarwal and fer learning models to detect offensive content Sureka, 2017) used the linguistic, semantic and on Twitter. The data set used for this study are in Tanglish (Tamil and English), Man- sentimental feature to detect racial content. The glish (Malayalam and English) code-mixed, LSTM and CNN based model for recognising hate and Malayalam script-mixed. The experimen- speech in the social media posts were explored by tal results showed that 1 to 6-gram character (Kapil et al., 2020). (Badjatiya et al., 2017) ex- TF-IDF features are better for the said task. ploited the semantic word embedding to classify The best performing models were naive bayes, each tweet into racist, sexist and neither class. An- logistic regression, and vanilla neural network other deep learning model for the detection of hate for the dataset Tamil code-mix, Malayalam code-mixed, and Malayalam script-mixed, re- speech was proposed by (Paul et al., 2020). How- spectively instead of more popular transfer ever, most of the works for hate speech detection learning models such as BERT and ULMFiT were validated with English datasets only. and hybrid deep models. In a country such as India, the majority of people in social media use at least two languages, primar- 1 Introduction ily English and Hindi (or say Hinglish). These The hate speech is generally defined as any com- texts are considered bilingual. In a bilingual set- munication which humiliates or denigrates an indi- ting, the script of the entire post may be same with vidual or a group based on the characteristics such words coming from both of these languages termed as colour, ethnicity, sexual orientation, nationality, as mixed-code (or code mix) text. A few popular race and religion. Due to huge volume of user- code mixed posts in India are English and Hindi generated content on the web, particularly social (or say Hinglish), Tanglish (Tamil and English) networks such as Twitter, Facebook, and so on, the (Chakravarthi et al., 2020c), Manglish (Malayalam problem of detecting and probably restricting Hate and English) (Chakravarthi et al., 2020a), Kanglish Speech on these platforms has become a very criti- (Kannada and English) (Hande et al., 2020), and cal issue (Del Vigna12 et al., 2017). Hate speech so on. The Tamil language is one of the world’s lasts forever on these social platforms compared to longest-enduring traditional languages, with a set physical abuse and terribly affects the individual on of experiences tracing all the way back to 600 BCE. the mental status creating depression, sleeplessness Tamil writing is overwhelmed by verse, particu- and even suicide (Ullmann and Tomalin, 2020). larly Sangam writing, which is made out of sonnets Owing to the high frequency of posts, detecting formed between 600 BCE and 300 CE. The main hate speech on social media manually is almost Tamil creator was the writer and thinker Thiruval- impossible. Some recent researches have indicated luvar, who composed the Tirukkua, a gathering of 36 Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 36–45 April 20, 2021 ©2021 Association for Computational Linguistics compositions on morals, legislative issues, love and a few state-of-art techniques presented to handle ethical quality broadly thought to be the best work such issues. of Tamil writing. Tamil has the oldest extant liter- Most of the analysis proposed for the detection ature among Dravidian languages. All Dravidian of hate contents were validated with monolingual languages evolved from classical Tamil language datasets. It is relatively easy to build a monolingual (Thavareesan and Mahesan, 2019, 2020a,b). Even model, since (i) it is readily accessible, (ii) it learns though they have their own scripts still in the Inter- a single language dictionary, and (iii) unknown to- net code-mixing comments can be found in these ken frequency is lower in the test data. (Davidson languages (Chakravarthi, 2020b). Identifying hate et al., 2017) worked on 25000 tweets in English content in such bilingual or code mixed language and reported that tweets that contained racism and is a very challenging task (Jose et al., 2020; Priyad- homophobic contexts were hate speech and tweets harshini et al., 2020; Chakravarthi, 2020a). An that contained sexism contexts were offensive con- automatic model which is trained in a monolingual tents. Other work, on English data, was proposed context to detect hate posts may not yield the same by (Waseem and Hovy, 2016) where n-gram fea- result when tested bilingually or with a code-mix tures were extracted for identifying sexiest, racism, (Puranik et al., 2021; Hegde et al., 2021; Yasaswini and none class. et al., 2021; Ghanghor et al., 2021b,a). This is be- cause each system learns and recognises words in Apart from this, some works were reported on a the given vocabulary. When a new word is encoun- multilingual dataset where scripts of two or more tered, which is not in the vocabulary, it is marked as languages are mixed. (Kumar et al., 2018) pro- an undefined token that makes no difference in the posed a model for multi-lingual datasets containing estimation of the model. Therefore, when checked aggressive and non-aggressive comments in En- with the language in other scripts the model’s per- glish as well as Hindi from Facebook and Twitter. formance decreases. (Samghabadi et al., 2018) used ensemble learn- The current study identifies the hate content in ing based on various machine learning classifiers Tanglish, Manglish and Malayalam script mixed in such as Logistic Regression, SVM with word n- tweets and validated with the dataset provided in gram, character n-gram, word embedding, and sen- HASOC-Dravidian-CodeMix-FIRE2020 challenge timent as a feature set. They found that combined (Chakravarthi et al., 2020b). The dataset proposed words and character n-gram features performed in the challenge was collected from Twitter. A va- better than an individual feature. (Srivastava et al., riety of deep learning models have been examined 2018) identified online social aggression on the in the current paper to distinguish offensive posts Facebook comment in a multilingual scenario and from script-mixed posts. Along with that we also Wikipedia toxic comments using stacked LSTM examined a few transfer learning models like BERT units followed by convolution layer and fastText (Devlin et al., 2018a) and ULMFit (Howard and as word representation. They achieved 0.98 AUC Ruder, 2018) for the classification task. for Wikipedia toxic comment classification and a The rest of the article is summarized as follows: weighted F1 score of 0.63 for the Facebook test Section2 presents the overview of articles proposed set and 0.59 for the Twitter test set. (Mandl et al., in the domain of hate or offensive speech. The task 2020; Chakravarthi et al., 2020d) presented several and dataset description is explained in Section3. models and their results for English, Hindi, and This followed by the explanation of the proposed German datasets. They reported the best model methodology in Section4. The experimental re- as long short term memory-based network that sults and discussion are explained in Section5 and could capture the multilingual context in a bet- 6. The paper concludes by highlighting the main ter way. Bohra et al. (2018) extended the ear- findings in Section7. lier research of hate speech detection for code mixed tweets of Hindi and English. (Kumari et al., 2 Related works 2021) presented a Convolutional Neural Network (CNN) and Binary Particle Swarm Optimization The hate speech identification in social media texts (BPSO) based model to classify multimodal posts suffers from many challenges; such as code-mixed with images and text into non-aggressive, medium- social media content, script-mixed social media aggressive and high-aggressive classes. Another contents, and so on. This section sheds light on multilingual context could be code-mixed where 37 two languages are written in a single script. For extracted features were fed to classifiers like Sup- example, (Chakravarthi et al., 2021b; Chakravarthi port Vector Machine (SVM), Logistic Regression and Muralidaran, 2021; Chakravarthi et al., 2021a; (LR), Naive Bayes (NB), Random Forest (RF). The Suryawanshi and Chakravarthi, 2021) proposed a detailed performance report of word n-grams and code mixed Dravidian data in Tamil, Malayalam, character n-grams are shown in Section5. and Kannada. (Bohra et al., 2018) developed a Hinglish dataset from Twitter. They reported pre- 4.2 Neural learning-based models liminary experiment results of Support Vector Ma- Initially, the character n-grams TF-IDF features chine (SVM) and Random Forest (RF) classifiers (1-6 grams) extracted in previous Section 4.1 were with n-grams and lexicon-based features with an used as an input to a vanilla neural network (VNN) accuracy of 0.71.
Recommended publications
  • Learn Thai Language in Malaysia
    Learn thai language in malaysia Continue Learning in Japan - Shinjuku Japan Language Research Institute in Japan Briefing Workshop is back. This time we are with Shinjuku of the Japanese Language Institute (SNG) to give a briefing for our students, on learning Japanese in Japan.You will not only learn the language, but you will ... Or nearby, the Thailand- Malaysia border. Almost one million Thai Muslims live in this subregion, which is a belief, and learn how, to grow other (besides rice) crops for which there is a good market; Thai, this term literally means visitor, ASEAN identity, are we there yet? Poll by Thai Tertiary Students ' Sociolinguistic. Views on the ASEAN community. Nussara Waddsorn. The Assumption University usually introduces and offers as a mandatory optional or free optional foreign language course in the state-higher Japanese, German, Spanish and Thai languages of Malaysia. In what part students find it easy or difficult to learn, taking Mandarin READING HABITS AND ATTITUDES OF THAI L2 STUDENTS from MICHAEL JOHN STRAUSS, presented partly to meet the requirements for the degree MASTER OF ARTS (TESOL) I was able to learn Thai with Sukothai, where you can learn a lot about the deep history of Thailand and culture. Be sure to read the guide and learn a little about the story before you go. Also consider visiting neighboring countries like Cambodia, Vietnam and Malaysia. Air LANGUAGE: Thai, English, Bangkok TYPE OF GOVERNMENT: Constitutional Monarchy CURRENCY: Bath (THB) TIME ZONE: GMT No 7 Thailand invites you to escape into a world of exotic enchantment and excitement, from the Malaysian peninsula.
    [Show full text]
  • Arxiv:2009.12534V2 [Cs.CL] 10 Oct 2020 ( Tasks Many Across Progress Exciting Seen Have We Ilo Paes Akpetandde Language Deep Pre-Trained Lack Speakers, Billion a ( Al
    iNLTK: Natural Language Toolkit for Indic Languages Gaurav Arora Jio Haptik [email protected] Abstract models, trained on a large corpus, which can pro- We present iNLTK, an open-source NLP li- vide a headstart for downstream tasks using trans- brary consisting of pre-trained language mod- fer learning. Availability of such models is criti- els and out-of-the-box support for Data Aug- cal to build a system that can achieve good results mentation, Textual Similarity, Sentence Em- in “low-resource” settings - where labeled data is beddings, Word Embeddings, Tokenization scarce and computation is expensive, which is the and Text Generation in 13 Indic Languages. biggest challenge for working on NLP in Indic By using pre-trained models from iNLTK Languages. Additionally, there’s lack of Indic lan- for text classification on publicly available 1 2 datasets, we significantly outperform previ- guages support in NLP libraries like spacy , nltk ously reported results. On these datasets, - creating a barrier to entry for working with Indic we also show that by using pre-trained mod- languages. els and data augmentation from iNLTK, we iNLTK, an open-source natural language toolkit can achieve more than 95% of the previ- for Indic languages, is designed to address these ous best performance by using less than 10% problems and to significantly lower barriers to do- of the training data. iNLTK is already be- ing NLP in Indic Languages by ing widely used by the community and has 40,000+ downloads, 600+ stars and 100+ • sharing pre-trained deep language models, forks on GitHub.
    [Show full text]
  • ECFG-Malaysia-Mar-19.Pdf
    About this Guide This guide is designed to help prepare you for deployment to culturally complex environments and successfully achieve mission objectives. The fundamental information it contains will help you understand the decisive cultural dimension of your assigned location and gain necessary skills to achieve Malaysia mission success. The guide consists of two parts: Part 1: Introduces “Culture General,” the foundational knowledge you need to operate effectively in any global environment – Southeast Asia in particular (Photo: Malaysian, Royal Thai, and US soldiers during Cobra Gold 2014 exercise). Culture Guide Part 2: Presents “Culture Specific” information on Malaysia, focusing on unique cultural features of Malaysian society. This section is designed to complement other pre-deployment training. It applies culture- general concepts to help increase your knowledge of your assigned deployment location (Photo: US sailor signs autographs for Malaysian school children). For further information, visit the Air Force Culture and Language Center (AFCLC) website at www.airuniversity.af.edu/AFCLC/ or contact the AFCLC Region Team at [email protected]. Disclaimer: All text is the property of the AFCLC and may not be modified by a change in title, content, or labeling. It may be reproduced in its current format with the expressed permission of AFCLC. All photography is provided as a courtesy of the US government, Wikimedia, and other sources as indicated. GENERAL CULTURE PART 1 – CULTURE GENERAL What is Culture? Fundamental to all aspects of human existence, culture shapes the way humans view life and functions as a tool we use to adapt to our social and physical environments.
    [Show full text]
  • Malaysian Undergraduates' Attitudes Towards Localised English
    GEMA Online® Journal of Language Studies 80 Volume 18(2), May 2018 http://doi.org/10.17576/gema-2018-1802-06 Like That Lah: Malaysian Undergraduates’ Attitudes Towards Localised English Debbita Tan Ai Lin [email protected] School of Languages, Literacies & Translation Universiti Sains Malaysia Lee Bee Choo [email protected] School of Languages, Literacies & Translation Universiti Sains Malaysia Shaidatul Akma Adi Kasuma [email protected] School of Languages, Literacies & Translation Universiti Sains Malaysia Malini Ganapathy (Corresponding Author) [email protected] School of Languages, Literacies & Translation Universiti Sains Malaysia ABSTRACT Native-like English use is often considered the standard to be achieved, in contrast to non- native English use. Nonetheless, localised English varieties abound in many societies and the growth or decline of any language variety commonly depends on how it is perceived; for instance, as a mere tool for functionality or as a prized cultural badge, and only its users can offer us insights into this. The thrust of the present study falls in line with the concept of language vitality, which is basically concerned with the sustainability of non-global languages. This paper first explores the subject of localisation and English varieties, and then examines the attitudes of Malaysian undergraduates towards their English pronunciation and accent, as well as their perceptions of Malaysian English. A 26-item questionnaire created by the researchers was utilised to collect data. It was also tested for reliability, with returned values indicating good internal consistency for all constructs, making the instrument a reliable option for use in future studies. A total of 253 undergraduates from a public university responded to the questionnaire and results revealed that overall, the participants valued their local-accented English and the functionality of Malaysian English, but regarded this form of the language as substandard.
    [Show full text]
  • Chapter 1 Introduction 1.0 Background of the Study Malaysia
    Chapter 1 Introduction 1.0 Background of the Study Malaysia opens her doors to students from various parts of the world with the mushrooming of many private institutions of higher learning and the move to internationalize her many public universities. As such, there is a steady flow of foreign students who made their way to the Malaysian institutions of higher learning not only because of her reputable standard in education but also due to the fact that its education fees are cheaper compared to that in the United Kingdom and United States and that in terms of culture, Malaysia provides a more conducive environment especially to students from Muslim countries. The aftermath of September 11, 2001 has great percussions on many decisions that pertain to Islamic issues and the Muslims. Receiving education from a Muslim country like Malaysia is one of the decisions brought about by the devastating episode. Iranians have also turned to Malaysia to obtain higher education and many of them are enrolled in the masters and Phd programs offered by both the public and the private universities in Malaysia. These programs are offered in the English language and as most Iranians learn English as a foreign language, they encounter problems coping with their studies. As such, they have to prepare themselves to master the language in order to overcome language problems in their studies. How can we master a language? How can we conform to the language conventions in a foreign context? Why do only a few learners become near-native speakers despite 1 making many efforts? Prior to answer these questions, we should pay attention to what make a language foreign to us.
    [Show full text]
  • Mother Tongues and Languaging in Malaysia: Critical Linguistics Under Critical Examination
    This is an Accepted Manuscript of an article published by Cambridge University Press, Language in Society, available at https://www.cambridge.org/core/journals/language-in- society/article/mother-tongues-and-languaging-in-malaysia-critical-linguistics-under-critical- examination/63E9315695008B98517AA546BF6C5F54 Mother tongues and languaging in Malaysia: Critical linguistics under critical examination Nathan John Albury Abstract This article brings the critical turn in linguistics - with its current scepticism of essentialised languages and bias for languaging - under critical evaluation. It does so by bringing it face- to-face with the local-knowledge turn in sociolinguistics that emphasises local knowledge, held by language users themselves, to understand sociolinguistic phenomena through local epistemologies. This paper analyses whether and how epistemologies inherent to language, mother tongue and languaging hold relevance in metalinguistic talk in Malaysia. Focus group discussions with ethnic Malay, Chinese and Indian youth revealed that languaging through Bahasa Rojak is already firmly embedded in local epistemology for communicating across ethnolinguistic divides and fostering interethnic inclusiveness. However, an essentialised view of language also remains vital to any holistic sociolinguistic research in Malaysia in culturally-specific ways that do not conflict with languaging. The paper especially supports arguments that we ought not to disregard mother tongues in the interests of critical linguistics. Key words Critical linguistics, Mother tongue, Languaging, Linguistic culture, Malaysia, Folk linguistics 1 This is an Accepted Manuscript of an article published by Cambridge University Press, Language in Society, available at https://www.cambridge.org/core/journals/language-in- society/article/mother-tongues-and-languaging-in-malaysia-critical-linguistics-under-critical- examination/63E9315695008B98517AA546BF6C5F54 Introduction Sociolinguistics has evolved to an era where the term mother tongue can raise eyebrows.
    [Show full text]
  • Multilingualism and Language Shift in a Malaysian Hakka Familiy
    Grazer Linguistische Studien 89 (Frühjahr 2018); S. 89- 110. 0 0 1:10.25364/04.45:2018.89.5 Multilingualism and language shift in a Malaysian Hakka family' Ralf Vollmann & Tek Wooi Soon University oJGraz , Institute oJ Linguisti cs; ralfvollmann@uni-gra z.at Abs tract. Background.In multiethnic and mul tilingual Malaysia, four sta n­ dardized languages (Malay, English,Ch inese, Tami l) and a number of spo ken languages (e.g. Hakka, Bahasa Pasar , Malaysian Eng lish) serve pluriglossic purposes. Today, Standard (Malays ian) Chinese is used by ethnic Chinese not only at th e acrolectal, but also at th e me­ solectal (inter-gro up communication) and basilectal level (fa mily, friends). Studies have obse rved lan gu age shift of th e family lan gu age fro m a sma ller Chinese lan gu age to Malays ian Ma nda rin. Ma teria l & method. This study investi gat es th e lan gu age use in one Hakka family livin g in Peninsul ar Malays ia (KL, Penan g) and mainland Chi­ na. Spec ific focus lies on Hakka as th e famil y lan guage and its in ter­ generational development. Ana lysis. All family mem ber s are multi­ lingu al, th e older and middle ge ne ra tions use va rious lan gu ages be ­ side th eir family lan gu age, Hakka. The middle-aged spea ke rs cons i­ der th eir ow n Hakka 'impure'; code-s witching and multilingu al con­ ver sati ons are a regu lar occ ur rence.
    [Show full text]
  • Scribe FBF 2011 Rights Guide
    Australian Small Publisher of the Year 2011, 2010, 2008, 2006 Frankfurt Book Fair 2011 Rights Guide World rights in each title are held by Scribe, unless otherwise stated. Please address rights enquiries to: Amanda Tokar Rights & Contracts Manager [email protected] Scribe Publications Pty Ltd 18–20 Edward Street, Brunswick Victoria 3056 Australia Tel: +61 3 9388 8780 Fax: +61 3 9388 8787 2 Non-Fiction Forthcoming FALLOUT FROM FUKUSHIMA Richard Broinowski (Current Affairs/Environment, November 2012) An investigation of a disaster that has changed everything — especially how governments and citizens around the world think about nuclear power. On a calm, northern spring morning on 11 March 2011, a force-9 earthquake jolted the Pacific Ocean seabed 66 kilometres due east of the Japanese city of Sendai. Within 20 minutes, a black tsunami wave 14 metres high rolled in from above the earthquake’s epicentre and crashed onto the nearby coast. Entire towns collapsed, villages turned into rubble, and up to 20,000 men, women, and children were swept either inland or out to sea, along with animals, cars, buses, trucks, and trains. While struggling to convey to the world some idea of the unfolding destruction, Japan had to cope with a third calamity — the consequent malfunctioning of a nuclear-power complex near the town of Fukushima. Fallout from Fukushima looks first at the meltdowns at the Fukushima Dai-Ichi reactors and how these have almost bankrupted their owners, TEPCO — the largest electricity-supply company in the world. It describes how the Japanese authorities delayed warning the public about the extent and severity of radiation, and how locals reacted once they found out.
    [Show full text]
  • Language Attitudes, Code Mixing and Social Variables: Evidence from Bilingual Speakers of Hyderabad
    30 Language Attitudes, Code Mixing and Social Variables: Evidence from bilingual speakers of Hyderabad Hemanga Dutta The English and Foreign Languages University P.Rajeswari The English and Foreign Languages University Abstract This paper deals with the attitudes of the educated Telugu and English bilinguals of Hyderabad towards their mother tongue Telugu, global lingua franca English and the code mixed variety emerged out of the usage of the languages, popularly known as ‘Tenglish’ with reference to sixty one informants. A method of questionnaire incorporating the demographic profile of the informants and their usage of language has been adopted to find out a correlation between social variables such as age, gender, medium of instruction Texas Linguistics Forum 58: 30-39 Proceedings of the 23rd Annual Symposium about Language and Society-Austin April 17-18, 2015 @ Dutta 2015 31 at primary and secondary level, the informants’ language preference, language usage and attitudes towards these two languages under consideration. The statistical package used for this study is R console. Individual t- tests are conducted to obtain the probability values (p- values). In addition, Optimality theoretic constraints (Prince and Smolensky 1993) are adopted to discuss the data related to sound change across generations. Key words: Tenglish, Attitude, Optimality theory 1. Introduction: ‘Attitudes’, in the study of language, play a significant role in giving due recognition to a particular language. Baker (1992) claims that the status and significance of a language in society and within an individual instigates largely from adopted or learnt attitudes. So attitudes are crucial in language growth or decay, restoration or destruction.
    [Show full text]
  • © Copyright 2012 Lindsay Rose Russell
    © Copyright 2012 Lindsay Rose Russell WOMEN IN THE ENGLISH LANGUAGE DICTIONARY Lindsay Rose Russell a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2012 Reading Committee: Anis Bawarshi, Co-Chair Colette Moore, Co-Chair Candice Rai Program Authorized to Offer Degree: Department of English University of Washington Abstract Women in the English Language Dictionary Lindsay Rose Russell Chairs of the Supervisory Committee: Professor Anis Bawarshi and Associate Professor Colette Moore Department of English “Women in the English Language Dictionary,” is at once a historical account and rhetorical analysis of how women have been involved in the English dictionary from its bilingual beginnings in the early modern period to its present-day array of instantiations. Departing from well-worn accounts of the English dictionary as a series of more-or-less discrete texts created by more-or-less famous men to constitute a near-neutral record of the entire language, “Women in the English Language Dictionary” conceives, instead, of the English language dictionary as a rhetorical genre, the form, content, audience, exigence, and cultural consequences of which are gendered and gendering. As a focused analysis of the emergence and evolution of a genre, “Women in the English Language Dictionary” finds that women—as an abstract construction and as a social collectivity—were integral for the framing of early dictionaries’ exigencies and for the fashioning of audiences invoked by the genre. Women signal major shifts in the genre’s purposes and participants, shifts heretofore neglected in favor of generic phases delimited by changes in form and content.
    [Show full text]
  • Research Article Special Issue
    Journal of Fundamental and Applied Sciences Research Article ISSN 1112-9867 Special Issue Available online at http://www.jfas.info UNDERSTANDING MALAYSIAN ENGLISH (MANGLISH) JARGON IN SOCIAL MEDIA M. F. Rusli1,*, M. A. Aziz2, S. R. S. Aris3, N. A. Jasri4 and R. Maskat5 1Bachelor degree in Information System Engineering University Teknologi MARA, Malaysia 2Senior Lecturer at Universiti Teknologi MARA, Malaysia 3Department of Information Systems, Universiti Teknologi MARA 4Student at Universiti Teknologi MARA, Malaysia 5Doctorate in Computer Science from the University of Manchester, United Kingdom Published online: 01 February 2018 ABSTRACT The advent of the internet, mobile communication and media has created a new form of language such as Slang, Emoticons, Hashtag and Abbreviation as well as a combination of several languages in one word. Some go to the extent of localizing foreign language. In Malaysia, a new trend of using social media language is called Manglish, a mix language of Malay and English words that are popularized by social media users. Based on initial findings, the use of Malaysian English (Manglish) jargon can lead to confusion and miscommunication between social media users of different generations. Even though there are various translation software available, no online Manglish Jargon translator is available at present. Therefore, this work proposes the development of Manglish Jargon Translator that will reduce the miscommunication gap between social media users of all ages. Interview and survey instruments were conducted to capture user requirement and as part of the Manglish Jargon validation process. Author Correspondence, e-mail: [email protected] doi: http://dx.doi.org/10.4314/jfas.v10i2s.10 Journal of Fundamental and Applied Sciences is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
    [Show full text]
  • Manglish by Just Going Through This Book
    8.8mm lee su kim The classic bestseller now in a brand new edition For Review only Aiseh! Malaysians talk one kind one, you know? You also can or not? If cannot, don’t worry, you can learn & Got chop or not? stephen j. hall No chop cannot. how to be terror at Manglish by just going through this book. The authors Lee Su Kim and Stephen Hall have come up with a damn shiok and kacang putih way for anybody to master Manglish. Nohnid to vomit blood. Some more ah, this new edition got extra chapters and new Manglish words. So don’t be a bladiful and miss DOCUMENT AH CHONG & SONS out on this. Get a copy and add oil to your social life. – Kee Thuan Chye, author of The People’s Victory Now back after 20 years with brand new words, expressions and idioms, this hilarious classic remains packed with humour, irreverence and loads of fun. It bids all Malaysians to lighten up, laugh at ourselves and revel in our unique, multicultural way of life. MANGLISH MANGLISH Forget about tenses, grammar, pronunciation, and just relek lah … Aiyoh. Manglish or Malaysian English is what Malaysians speak MALAYSIAN ENGLISH when we want to connect with each other or just hang loose. Borrowing from Malay, Chinese, Indian, Asli, British English, AT ITS WACKIEST! American English, dialects, popular mass media and plenty more, our unique English reflects our amazing diversity. Like a frothy teh tarik or a lip-smacking mouthful of divine durian, Manglish is uniquely Malaysian. Those two, tak ngam lah! Manglish is an entertaining, funny and witty compilation of commonly used Malaysian English words and expressions.
    [Show full text]