DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Designing and using gamification elements to improve students’ user experience in a video-based mobile language learning app

THOR GALLE

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

1 Abstract

With the rise of the smartphone industry, the domain of mobile-assisted language learning (MALL) has grown steadily. A large number of language learning applications have been developed to support individuals’ second language acquisition on various levels, e.g., by teaching vocabulary and grammar to improve reading and listening comprehension. The viability of these applications has been examined in the literature and shows overall positive but mixed results. On the one hand, their success is partly attributed to gamified design elements, which are reported to improve the user experience (UX) and boost learners’ motivation. On the other hand, the primary reliance on decontextualized vocabulary and grammar exercises is criticized. In response, one such application, SVT Språkplay, developed by the Swedish non-profit Språkkraft, incorporated television programs as a longer form of context. This introduced novel video-based learning functions. The first aim of this thesis was to start filling a gap in research by evaluating the usability and user experience of these functions. This was done through user tests and interviews with seven second language students who used the app to learn Swedish over a period of at least two weeks. The second aim of the thesis was to improve the usability and user experience of the problematic learning functions through a user-centred design process, with the ultimate goal of improving learner support and vocabulary acquisition outcomes. The study participants consisted of doctoral researchers and students recruited from a basic Swedish course at KTH. They represent a demographic that benefits from learning Swedish to improve their job opportunities. The initial evaluation results were analysed through the lens of the MUUX-E theoretical framework [10], a framework for evaluating the “user experience and educational features of m-learning environments”.
The evaluation showed that the core vocabulary learning aids directly integrated into the video watching experience were perceived as useful. Conversely, the gamified learning functions outside of the video watching experience were scarcely used as intended. The subsequent user-centred design process improved upon the design of the problematic learning functions by adhering to the principles of the MUUX-E framework. Concretely, more varied contextualized vocabulary exercises were designed, more options for user customization were included, and feedback and progress metrics such as “streaks” were highlighted. An evaluation of the design with the same participants as the initial evaluation suggests that these changes would improve the usability and user experience of the application. Further research should evaluate an implemented end-product based on the proposed designs in a real-life setting; in that case, its pedagogical merit should also be evaluated. In summary, this thesis found that video-based MALL apps such as Språkplay can provide usable and enjoyable language learning functions.


1.1 Keywords

mobile-assisted language learning; video-based language learning; usability; UX; usability, user experience and educational features of m-learning environments (MUUX-E); gamification


2 Sammanfattning

Med tillväxten av mobiltelefonbranschen har domänen för mobilassisterad språkinlärning (MALL) vuxit alltmer. Ett stort antal språkinlärningsapplikationer har utvecklats för att stödja individers förvärv av andraspråk på olika nivåer, t.ex. genom att lära ut ordförråd och grammatik samt för att förbättra läs- och hörförståelsen. Dessa applikationer har undersökts i litteraturen och visar överlag positiva men blandade resultat. Å ena sidan tillskrivs deras framgång delvis de spelifierade designelementen. Dessa rapporteras förbättra användarupplevelsen (UX) och öka elevernas motivation. Å andra sidan kritiseras det primära förlitandet på dekontextualiserade ordförråds- och grammatikövningar. Som ett svar på detta inkluderade en sådan applikation, SVT Språkplay, utvecklad av den svenska ideella föreningen Språkkraft, TV-program som en längre form av språkligt sammanhang. Detta introducerade nya videobaserade inlärningsfunktioner. Det första syftet med denna uppsats var att börja fylla en lucka i forskningen genom att utvärdera användbarheten och användarupplevelsen av dessa funktioner. Det gjordes genom användartester och intervjuer med sju andraspråksstudenter som använde appen för att lära sig svenska under en period av minst två veckor. Det andra syftet med arbetet var att förbättra användbarheten och användarupplevelsen för dessa inlärningsfunktioner genom en användarcentrerad designprocess, med det slutliga målet att förbättra stödet för studenterna och deras ordinlärning. Studiedeltagarna bestod av doktorander och studenter rekryterade från en nybörjarkurs i svenska på KTH. De representerar en grupp som har nytta av att lära sig svenska för att öka sin tillgång till den svenska arbetsmarknaden. De första utvärderingsresultaten analyserades genom tillämpning av MUUX-E-ramverket [10], ett ramverk för att utvärdera “user experience and educational features of m-learning environments”.
Utvärderingen visade att de grundläggande hjälpmedlen för ordinlärning, som var direkt integrerade i videotittandet, upplevdes som användbara. Omvänt användes de spelifierade inlärningsfunktionerna utanför videotittandet knappt som avsett. Den efterföljande användarcentrerade designprocessen förbättrade designen av de problematiska inlärningsfunktionerna genom att följa principerna i MUUX-E-ramverket. Konkret utformades mer varierade kontextualiserade ordförrådsövningar, fler alternativ för användaranpassning inkluderades och feedback- och framstegsmått som “streaks” lyftes fram. En utvärdering av designen med samma deltagare som i den första utvärderingen tyder på att dessa förändringar skulle förbättra applikationens användbarhet och användarupplevelse. Vidare forskning bör utvärdera en implementerad slutprodukt baserad på de föreslagna designförbättringarna i en verklig miljö. I så fall bör även dess pedagogiska meriter utvärderas. Sammanfattningsvis fann denna uppsats att videobaserade MALL-appar som Språkplay kan erbjuda användbara och roliga språkinlärningsfunktioner.


2.1 Nyckelord

mobile-assisted language learning; video-based language learning; usability; UX; usability, user experience and educational features of m-learning environments (MUUX-E); gamification


3 Acknowledgments

First of all, I would like to thank Niss Jonas Carlsson, CEO of Språkkraft and industrial adviser to this thesis, for allowing me to examine and reimagine the app developed by his non-profit. The regular discussions we had and insights from Språkkraft’s earlier R&D efforts were more than once an inspiration for my work. My views did not always align with the direction of the app development, so I appreciate Niss Jonas’ willingness to welcome a different perspective on the design of the app. I genuinely believe Språkkraft is addressing an important societal challenge in an innovative way, and I am grateful that my thesis could contribute to that.

Next, I would like to thank my supervisor Olga Viberg for her continuous support in the planning and writing of this thesis. Her knowledge of the field and her regular, precise feedback were an invaluable source of help. In this context, I should also mention the group supervision sessions, where my peers Edward Lindén, Xinyan Luo and Martin Wedberg taught me useful lessons, and where I could share my progress and concerns once in a while.

Finally, I want to express my gratitude to the seven study participants who voluntarily spent their time on interviews and user tests to help improve Språkplay. Without them, this study would not have been possible.

Stockholm, May 2020
Thor Galle


4 Table of contents

1 Abstract
1.1 Keywords
2 Sammanfattning
2.1 Nyckelord
3 Acknowledgments
4 Table of contents
5 List of acronyms and abbreviations
6 Introduction
6.1 Context of this work
6.1.1 Societal background
6.1.2 Learning language with television content
6.1.3 SVT Språkplay: the solution?
6.1.4 Reception of Språkplay
6.2 Problem
6.3 Purpose
6.4 Goals
6.5 Research Methodology
6.6 Delimitations
6.7 Summary
6.8 Structure of the thesis
7 Description of the examined application
7.1 Video watching functions
7.1.1 Control functions (pause, jump forward, jump backward)
7.1.2 The translation pop-up
7.1.3 The color system
    Initial color determination
    Changing word color
7.2 Active learning functions
7.2.1 Word test
7.2.2 Word list
7.3 Self-monitoring functions
7.3.1 Profile page
7.3.2 Activity statistics dashboard
8 Background
8.1 Video and language learning
8.1.1 Media and language learning
8.1.2 Learning with Closed Captions
8.1.3 Control and “glossed” captions
8.1.4 Implications for a video-based MALL application
8.2 Mobile assisted language learning (MALL)
8.2.1 Defining MALL
8.2.2 MALL with immigrants
8.2.3 Video-based learning in MALL
8.3 The role of gamification
8.3.1 Defining gamification
8.3.2 Game-based learning and gamification within MALL
8.3.3 The value of gamification for vocabulary learning
8.3.4 Implications for this project
8.4 Usability, User Experience (UX) and User-centered design (UCD)
8.4.1 Why Usability and UX?
8.4.2 Usability and User Experience in MALL
8.5 Summary
9 Theory
9.1 The Double Diamond framework
9.2 The evaluation framework MUUX-E
10 Methods
10.1 Discover
10.1.1 Literature review
10.1.2 State-of-the-art review
10.1.3 Self-guided usability evaluation of the examined app
    Initial evaluation
    Cognitive evaluation
10.1.4 User test of the examined app
    Participants
    Initial survey
    Meeting modality
    User test process
    Data analysis
10.1.5 Semi-structured interviews
    Data analysis
10.2 Define
10.3 Develop
10.3.1 Low-fidelity prototype
10.3.2 High-fidelity prototype
10.3.3 Prototype evaluation with users
    Group interview process
    Data analysis
10.4 Ethics
11 Prototype development
11.1 Initial evaluation results of the examined app (Define)
    Participants
11.1.1 General usability
    User errors in the word test
    The initial state of the color system
    Efficiency of color switching
11.1.2 Web-based learning principles
    Obstacles for the discoverability of learning features
    Non-verbatim captions as imperfect learning media
11.1.3 m-Learning features
    Useful translation pop-up
    Missing ability to hide captions
    Recommendations for learning & viewing are not personalised
    Limited encouragement of active learning
    Missing features to integrate with learning context
11.1.4 Educational usability
    Clarity of the color system at first use
    The clarity of learning goals
11.1.5 User Experience
    Aesthetic appeal of the application
    Gamified elements as a fulfiller of motivational needs
    Emotional viewing context
11.1.6 Summary of the MUUX-E findings
11.2 Design of the improved prototype (Develop)
11.2.1 Low-fidelity prototype sketches
11.2.2 High-fidelity prototype
    Home screen
    Navigation bar & information architecture
    ‘Learn’ screen & word list management
    Video watching screen
    Collaborative caption quality assurance
    Translation pop-up module
    Word list screens
    Word tests
    Flash cards
    Mixed exercises
    Feedback and progression
12 Results
12.1 General usability
    Video screen and transcript
    New translation pop-up
    Dashed underlines and hidden green underline
12.2 Web-based learning principles
    Navigational architecture
    Improving media quality by mismatch reporting
12.3 m-Learning features
    Personalized “Continue watching”
    Customizable word lists
    Hiding captions
12.4 Educational usability
    Modularised exercises
    Video-based exercises
    Learning goal reminders
12.5 User Experience
    Aesthetic experience
    Gamified elements
12.6 Summary
13 Discussion
13.1 Evaluation of the original app
13.1.1 Limitations
13.2 Design process and results
13.2.1 Limitations
13.3 Further work
13.4 Ethics
14 Conclusions and Future work
15 References
16 Appendix A: The initial survey for study participants
16.1 Hey, thanks for your interest in the Språkplay study!
16.1.1 Do I have your permission?
16.2 A bit more about you
16.3 Done!
17 Appendix B: Semi-structured interview questions
17.1 A. Contextual inquiry
17.2 B. Learning scenario follow-up (examined application)
18 Appendix C: Prototype evaluation questions
18.1 Generic Usability
18.2 Web-based learning principles
18.3 Educational usability
18.4 m-Learning features
18.5 User Experience


5 List of acronyms and abbreviations

CEFR    Common European Framework of Reference for Languages
DGBL    digital game-based language learning
HCI     human-computer interaction
L2      second language
MALL    mobile-assisted language learning
MGBL    mobile game-based learning
MUUX-E  usability and user experience of mobile educational environments (evaluation framework)
RQ      research question
SLA     second language acquisition
SVT     Sveriges Television (company)
UCD     user-centred design
UX      user experience


6 Introduction

This chapter presents the specific problem that this thesis addresses, the context of the problem, and the goals of this thesis project, and finally it outlines the structure of the thesis.

In recent years, mobile-assisted language learning (MALL) has risen as a means to support learners’ second language (L2) acquisition in informal language learning on various levels [2,18,25,31]. Compared to other learning methods, MALL brings flexible use of time and space for learning, good alignment with personal needs and preferences, and the possibility of keeping learning efforts ongoing while waiting or commuting [22]. Popular state-of-the-art MALL applications have shown considerable merit with regard to explicit learning outcomes such as vocabulary acquisition and grammar understanding [2,25]. Conversely, challenges have also been identified, such as faltering learner motivation and persistence [2,15,25]. Moreover, prior research found limitations regarding the development of implicit skills such as listening and speaking comprehension [25]. The impaired viability of using MALL for improving implicit learning outcomes has been attributed to a primary reliance on decontextualized exercises at the individual sentence level [11,25]. According to Krashen, such exercises promote conscious language learning as opposed to subconscious language acquisition [21]. He states that methods encouraging the latter, such as the extensive reading of L2 texts [20], have been proven to perform better on communicative tests where real-time, implicit language knowledge is required [21]. Recently, new mobile technology, such as that developed by the Swedish non-profit organization Språkkraft1, attempts to address this critique by merging more extensive language context (television) into language learning activities.
In their application SVT Språkplay2, this concretely entails a combination of (1) an interactive video watching experience with “glossed” captions, i.e., captions where the translation of a word can be requested at any time [27], and (2) exercises with digital game-based learning elements that are created from the context of the video transcript. In this process, however, challenges remain. First, the area of context-based MALL is largely unexplored in the literature, and few state-of-the-art precedents are available for comparison. At the same time, the domain of L2 learning has seen a growing body of research on video-based learning with glossed captions that suggests promising results regarding vocabulary acquisition [26,34,37]. This calls for research into the applicability of these findings to MALL. Second, the usability and user experience (UX) of other state-of-the-art MALL applications have been praised, notably that of Duolingo [32,35,36]. Furthermore, digital game-based learning (DGBL) elements and gamification have been linked to their success in vocabulary acquisition outcomes and UX [4,11,25,32]. However, these game-based exercises are also the elements that were criticized above for being ‘decontextualized’. This raises the questions: are decontextualized exercises related to a good UX? What is the usability and UX of the contextualized exercises that Språkplay aims to provide?
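To illustrate the “glossed captions” concept described above, the sketch below mimics the tap-to-translate interaction: a caption is split into tappable word tokens, and a translation is looked up only when a word is requested. This is a hypothetical toy example; the dictionary, function names and tokenization are invented for illustration and do not reflect Språkkraft’s actual implementation.

```python
# Hypothetical sketch of "glossed" captions: any word in a caption can be
# tapped to request a translation into the learner's first language.
# The tiny Swedish-English dictionary below is illustrative only.
GLOSSES_SV_EN = {
    "hunden": "the dog",
    "springer": "runs",
    "snabbt": "quickly",
}

def tokenize(caption):
    """Split a caption line into tappable, punctuation-free word tokens."""
    return [word.strip(".,!?") for word in caption.lower().split()]

def gloss(word):
    """Return the on-demand translation for a tapped word."""
    return GLOSSES_SV_EN.get(word, "<no gloss available>")

for token in tokenize("Hunden springer snabbt."):
    print(token, "->", gloss(token))
# hunden -> the dog
# springer -> runs
# snabbt -> quickly
```

In a real app the lookup would of course consult a full dictionary service and disambiguate inflected forms; the sketch only shows the interaction pattern of on-demand, per-word glossing.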

1 The website of the organization may be found at https://sprakkraft.org
2 A website describing SVT Språkplay is found at https://sprakkraft.org/svt-sprakplay/


This degree project specifically takes up that last question. It studies the usability and user experience of the aforementioned novel vocabulary learning functions found in the MALL app SVT Språkplay, a smartphone app developed by Språkkraft. The app and the non-profit behind it are discussed in more detail in section 6.1.3, and a description of the learning functions of the app can be found in chapter 7. This thesis examines the app within the context of its use by immigrant students and researchers in Swedish higher education, a target group with a need for more immersive language learning, as will be explained in the next section. It aims to find out which learning functions provide good usability and a good user experience, and how, from a human-computer interaction (HCI) perspective, they could be improved and extended to optimize second language learning.

6.1 Context of this work

6.1.1 Societal background

For immigrants, speaking the main language of a host country is key to accessing job markets and social and cultural facilities. Yet upon arrival, many of them do not speak that language. One such immigrant learner group in Sweden is foreign students and researchers, for whom good English skills are often sufficient to enter academic environments as a student or doctoral researcher. Yet not knowing Swedish puts this group at a clear disadvantage when looking for work. Statistics Sweden found in a survey of foreign-born, highly educated people aged 24 to 64 that they are 22% less likely to be employed than native-born Swedes [33]. A study based on this research concludes that “a lack of contacts was by far the most common reason why foreign born persons had difficulties in getting positions that they applied for” and that “the second most common reason was difficulty with the Swedish language” [33 p. 23]. It is not only these immigrants that ‘suffer’, but also Swedish society. For example, the management of KTH in Stockholm has recently made it a requirement for all foreign staff members to have a learning plan for Swedish [24]. The reasons given are twofold. First, limited knowledge of Swedish may risk decreasing Swedish research output, reducing the relevance of research for the local society. Second, certain university roles, especially more senior ones such as dean, require command of the Swedish language. Not knowing the language may prevent researchers from advancing in their academic career, and puts more pressure on Swedish-speaking employees to fulfill those roles. From the aforementioned challenges, it follows that learning Swedish is critical for immigrant students and researchers who wish to work in and quickly integrate into their host country. It removes barriers in both academic and industrial career paths.
Additionally, it may be indirectly useful for establishing contact with potential employers and building a network. Finally, it benefits society at large. To address this need, the Swedish government-sponsored program Utbildning i svenska för invandrare (SFI) offers free Swedish education for immigrants. However, this formal educational system is only one way to address the problem, and it may not be sufficient.


6.1.2 Learning language with television content

Modern digital technology could increase the availability of tailored Swedish language content for second language learners. Most people in Sweden carry a smartphone wherever they go, which provides an opportunity to find and explore content such as news articles and television programmes outside of classrooms. One potential source is the public television provider SVT. SVT offers free programmes that can familiarize the viewer with real-life spoken Swedish, national news and Swedish culture.

However, watching regular television programmes may prove a steep challenge for basic language learners. Indeed, the Common European Framework of Reference for Languages (CEFR), a framework that categorizes levels of language mastery, deems this a challenge. There are six CEFR levels: A1, A2, B1, B2, C1 and C2. A1 denotes the basic level of an absolute beginner, while C2 denotes near-native proficiency in writing, speaking and listening. In the framework’s level descriptors, learners at the A1 level are not expected to understand any second language video material [5]. Learners at the A2 level should be able to identify changes of topic and certain points of commentary supported by visuals [ibid.]. For full comprehension, obstacles remain: speech may be too fast, words and sentences too complicated, or the subject matter too unfamiliar. Only at the upper-intermediate B2 level does the framework expect full comprehension of video material [ibid.].

This may lead basic learners to refrain from studying such challenging language content and to stick with the level-tailored material from formal education resources. But is this not a missed opportunity? How could video content be adapted to fit the learner’s level? Here, too, technology may provide an answer. In the last decade, mobile devices have been increasingly used to facilitate self-regulated second language learning.
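The CEFR expectations for video comprehension discussed above can be condensed into a small lookup. The sketch below paraphrases the cited descriptors; note that the B1 and C1 entries are interpolations by the author rather than part of the cited text:

```python
# CEFR levels ordered from absolute beginner (A1) to near-native (C2),
# with the video-comprehension expectation paraphrased from the level
# descriptors discussed above. B1/C1 entries are author interpolations.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

VIDEO_COMPREHENSION = {
    "A1": "not expected to understand L2 video material",
    "A2": "identifies topic changes and points supported by visuals",
    "B1": "partial comprehension (interpolated)",
    "B2": "full comprehension of video material expected",
    "C1": "full comprehension (interpolated)",
    "C2": "full comprehension, near-native proficiency",
}

def expects_full_comprehension(level):
    """The framework expects full comprehension from B2 upwards."""
    return CEFR_LEVELS.index(level) >= CEFR_LEVELS.index("B2")

print(expects_full_comprehension("A2"))  # False
print(expects_full_comprehension("B2"))  # True
```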
Apps such as Duolingo and Babbel, with millions of users, aim to teach vocabulary and grammar by means of interactive exercises, substituting or complementing formal education, and they achieve remarkable results in doing so. They also employ gamification, i.e., a way of designing experiences where progress, achievements and play are introduced into a non-game setting. Could this mobile technology also aid the consumption of Swedish television programmes? Is a marriage possible between this interactive, mobile way of learning vocabulary and the mobile consumption of television content? A non-profit organization in Sweden called Språkkraft partnered with the Swedish public broadcaster SVT to develop such an application, called SVT Språkplay. This degree project was conducted in collaboration with Språkkraft and evaluated the usability and user experience of Språkplay.

6.1.3 SVT Språkplay: the solution?

Språkkraft is a Swedish NGO that aims to accelerate the integration of immigrants in Sweden by focusing on their development of Swedish language skills. Their main activity is the development of freely available digital applications to facilitate Swedish language learning. They are primarily active in Sweden but plan to expand their activities to other countries in the near future.

SVT Språkplay is a mobile app developed by Språkkraft and published in partnership with SVT, the Swedish public broadcaster. As of April 2020, it has 390 000 downloads across the iOS and Android platforms, and as of February 2020, 50 000 monthly active users. It offers mobile access to Swedish television programs from SVT with built-in language learning aids: the closed captions can be translated on demand into the learner’s first (known) language. This is the crucial feature that provides learners with the means to increase their consumption of Swedish media content. A description of all learning functions of Språkplay is found in the next chapter (7).

It is Språkkraft’s intention that by frequently using the app, learners could improve their vocabulary and grammar knowledge, but also gain awareness of Swedish culture and customs through the television programmes. But does this intention materialize?

6.1.4 Reception of Språkplay

Learners have been overwhelmingly appreciative of Språkplay: 80% of 762 user reviews on Google Play gave the app a 5-star rating (out of 5)3. The reviews mainly highlight the concept of combining learning with television watching, and users reportedly benefit from it:

“This application teaches me more than school” (translated by the author)
“Den här applikationen lära mig mer än skola” (original)
- Mohamad A., 1 June 2017, 5-star rating

“Really really nice idea and implementation. It makes it so much easier to learn new vocabulary. And the exercises are quite engaging. Great overall” - Marco A., 1 February 2017, 5-star rating

“Fantastic way to help with my Swedish. I have found this much more beneficial than any SFI course I have taken. One of the most useful language apps I have encountered...love it!” - Angelia S., 27 May 2017, 5-star rating

However, this is not the end of the story. Beneath this broad surface of appreciation, there are non-negligible problems, including ones relating to the design, user experience (UX) and usability of the app:

“Good idea. Great contents. Horrible UX. Supporting only African or Arabic languages mostly. Impossible to read the auto translation.” - Christopher R., 24 April 2019, 2-star rating

3 Reviews retrieved Tuesday 21 April 2020 from the app’s Google Play store listing: https://play.google.com/store/apps/details?id=se.svt.sprakplay. The percentage was estimated by comparing the pixel widths of the review distribution bars. Pixels per star count: 5*: 100, 4*: 17, 3*: 0, 2*: 2, 1*: 7.
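The 80% figure cited above can be reproduced from the pixel widths reported in the footnote; a short sketch of that calculation:

```python
# Star-rating share estimated from the pixel widths of the Google Play
# review distribution bars, as reported in the footnote above.
bar_width_px = {5: 100, 4: 17, 3: 0, 2: 2, 1: 7}

total_px = sum(bar_width_px.values())          # 126 pixels in total
five_star_share = bar_width_px[5] / total_px   # 100 / 126

print(f"{five_star_share:.1%}")  # 79.4%, i.e. roughly 80%
```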


“App has a lot of bugs but I have to give it 5 stars because the concept is awesome.” - Anonymous, 10 August 2019, 5-star rating

“Awesome concept, a little buggy but hope it will be fixed soon. Layout as well needs makeover :)” - Sharvil B., 23 September 2019, 4-star rating

“Wow, you have no idea how much I was looking for something like this. I think it opens in a web view. It is so slow at least on my old phone. I hope that it will change soon to like more app style with more features. Thanks a lot.” - Farzad Z., 10 December 2016, 5-star rating

“Very good. I like this application [but] I hope that they have another [test] than the drag-and-drop test, it is a little dull [...]” (translated by the author)
“Jättebra. Jag tycker om applikationen [men] jag hoppas att de har annat [test] än drag och dropp test det är lite tråkigt [...]” (original)
- Jack A., 23 October 2016, 5-star rating

While users are reportedly satisfied with the main functionality, they stumble on (often unspecified) bugs and performance issues. Leaving aside remarks that are less relevant to this thesis4, there seem to be varied opinions regarding the exercises and learning tasks available in the app, with one user calling them “quite engaging” and another hoping for “another [test] than the drag-and-drop test” because it is “a little dull”. The feedback above reveals concrete issues, but statements like “horrible UX” allude to a wider problem: a general usage experience that may not be satisfactory to some users.

6.2 Problem

The previous sections established that learning Swedish is critical for quick integration into Swedish society. At the same time, formal and traditional language learning systems are expensive in terms of time and money, and are typically limited to classroom environments. This imposes limits on the ways in which students can learn Swedish. Mobile-assisted language learning (MALL) could provide an answer by making flexible learning possible, accessible “anywhere and anytime” [11]. However, state-of-the-art mobile language learning technology lacks components of contextual language practice. Providing television content through this technology, as attempted by Språkplay, may bring a form of rich audiovisual context to mobile language learning, but in practice there is little academic evidence for the viability of this concept.

In the literature, few studies have focused on video-based language learning in mobile contexts. Hsu et al. [14] compared the effects of fully-captioned, keyword-captioned and uncaptioned mobile video on the vocabulary acquisition of 5th graders learning English as a foreign language. The study reported increased vocabulary comprehension in

4 The remark “supporting only African or Arabic languages” is not an issue discussed in this thesis. Språkkraft mainly supports these languages since most immigrants in Sweden are of Arabic or African origin. This thesis was conducted with English as the known language.


the case of captioned video. In another study, Hsu et al. [13] evaluated the effect of different caption keyword selection strategies on the motivation of mobile video learners; a selection strategy adaptive to the learner’s skill was found to be most motivating for student learning. To the best of the author’s knowledge, no studies are available on the usability and user experience of video-based mobile language learning, and this thesis aims to start filling this gap.

In the state of the art as well, no other mobile video-based language learning solutions were found. Exceptions on non-mobile platforms were the browser extension “Language Learning with Netflix”5 and the web application Lingopie6. The former is a community-made extension that adds multilingual captions and dictionary aids to Netflix videos. The latter offers functions similar to Språkplay’s, including a game; it is available as a web app and projected to become available on mobile devices.

Alongside the mentioned precedents, Språkkraft has released a novel concept with novel design decisions, combining DGBL techniques with television watching. This introduces two challenges. First, given the lack of research and precedents, it is unclear which of the functions in Språkplay support users in their second language acquisition from a usability perspective.

Second, the previous account of the reception of Språkplay suggests that knowledge of user preferences and aims with regard to mobile game-based language learning in a video-based context may be incomplete, resulting in a “horrible UX” for at least one user. For example, it remains a question what kinds of tests and “app layouts” users would prefer.

6.2.1 Research questions

The aforementioned challenges and existing gaps bring us to the following research questions of this work:

RQ1: What is the usability and user experience of the vocabulary learning functions available in the examined application for students learning Swedish in higher education?

RQ2: How can gamification be used and designed in a video-based language learning application to improve the (perceived) usability and user experience of vocabulary learning functions for students learning Swedish in higher education?

The first question concerns the implementation of Språkplay as it was available from February to April 2020. It inquires into the usability and user experience of the learning functions, specifically those that aim to support vocabulary acquisition. The second question focuses on the game-based techniques that are already part of many MALL applications, and explores how they can be integrated into novel video-based MALL applications such as Språkplay to better conform with the needs and preferences of users. In terms of users, “students” should be understood broadly as both students and researchers, that is, all members of higher education who receive language education.

5 https://languagelearningwithnetflix.com/
6 https://lingopie.com/


Concretely, this work first examines how the existing game-based functions have been used in Språkplay, and then aims to improve and extend them. RQ2 can therefore be split into two sub-research questions:

RQ2a) How well, in terms of usability and UX, do the video-based language learning functions in Språkplay based on gamification support the vocabulary acquisition for students learning Swedish in higher education?

RQ2b) How can they be improved by drawing on the benefits of game-based learning?

6.3 Purpose

From a societal perspective, this degree project aims to contribute to the integration of foreign students into Swedish society by aiding in the research and development of novel digital tools that facilitate Swedish language learning. From an industrial perspective, this project answers a need from the partner Språkkraft to evaluate the design of SVT Språkplay, and to design ways in which game-based learning techniques can be better matched with video-based learning to ultimately provide better learner support. A tangible outcome of this work for Språkkraft is a set of designs that could be implemented in their app. From an academic perspective, this degree project aims to identify the components of a successful integration of mobile game-based language learning with video-based language learning from a usability and UX perspective, which could provide a base for further research and development.

6.4 Goals

The general goal of this design-oriented research project is to study the usability concerns involved in the design of a video-based MALL application, with specific attention to its primary use for vocabulary learning rather than entertainment. It concretely examines Språkplay as a paradigm for a new set of MALL applications. Three sub-goals can be defined. Firstly (1), this work aims to evaluate the usability and user experience of the learning functions currently available in Språkplay; these functions may or may not involve digital game-based elements. Secondly (2), this work aims to further design gamified learning functions, guided by MALL and design research. Thirdly (3), this work aims to evaluate the designs produced in (2) in a user-centred way, so that the functions match the expectations and preferences of users.

6.5 Research Methodology

The research questions were examined with a mixed-methods approach combining complementary qualitative methods in a user-centred design process. This process was structured according to the Double Diamond framework, which is further discussed in


section 9, Theory. As part of the process, structured methods such as a cognitive walkthrough and semi-structured interviews were used, alongside unstructured methods consisting of free exploration of state-of-the-art apps and the examined app. For an overview of the data collection methods, see table 1. For the methods involving users, a recurring set of participants at the A2 CEFR language level was recruited from the course Svenska för Ingenjörer at KTH. More details on the data collection and data analysis methods are presented in section 10, Methods.

| Targeted Research Question | Method                              | Double Diamond Phase | Users   | Primary Aim |
|----------------------------|-------------------------------------|----------------------|---------|-------------|
| RQ2                        | State-of-the-art review             | Discover             | no      | Design      |
| RQ1, RQ2                   | Free app exploration                | Discover             | no      | Evaluation  |
| RQ1                        | Cognitive walkthrough               | Discover             | no      | Evaluation  |
| RQ1, RQ2a                  | User test (30 min.)                 | Discover             | yes (7) | Evaluation  |
| RQ1, RQ2                   | Semi-structured interview (15 min.) | Discover             | yes (7) | Design      |
| RQ1, RQ2                   | Focus group discussion (1 hr.)      | Develop              | yes (4) | Design      |

Table 1: Overview of data collection methods

6.6 Delimitations

To keep this study manageable as a degree project, certain choices were made. Some are already embedded in the research questions: the project only studies a video-based MALL app for a target group of students and researchers learning Swedish in higher education. Inferences of the results to the learning of other languages, or to other societal groups of learners, are therefore not warranted. Next, the evaluation and design in this work focused on scenarios in which vocabulary can be learned within a video-based context. While there may have been potential for implicit or explicit instruction on grammar, cultural knowledge or other language-related skills, these possibilities were left out of the scope of this project. The intention of the evaluation and design needs to be clarified as well: the methods did not attempt to provide evidence for the language-pedagogical merit of the discussed learning functions, but rather aimed to understand the actual or perceived usability and user experience of the functions. Finally, this degree project did not produce an implemented interactive application, but halted the user-centred design process at the design and evaluation of an interactive, high-fidelity mock-up. This imposes restrictions on the evaluation findings for that design.


6.7 Summary

In this introduction it has been established that Språkplay is a new kind of MALL app, focused on L2 video context rather than decontextualized exercises. This approach seems to be widely appreciated among its users. Moreover, literature suggests the value of learning with glossed captions for increasing vocabulary acquisition. However, some usability and user experience issues remain in the application, as evidenced by user reports. This may stem from a limited understanding of user preferences and goals in the context of the app. In response, this thesis seeks to evaluate the application’s learning functions in terms of usability and user experience, and then to resolve these issues in design. The evaluation will be carried out in a user-centred way with students and researchers learning Swedish in higher education, a demographic in need of Swedish skills to increase their job opportunities.

6.8 Structure of the thesis

Chapter 7 explains the features of the evaluated application. Chapter 8 presents background knowledge from relevant literature and a review of state-of-the-art digital learning technology. Chapter 9 discusses the background theory used in the methods. Chapter 10 describes the methods used for data collection and data analysis in the user-centred design process. Chapter 11 presents the intermediary evaluation results and the design of the improved prototype. Chapter 12 presents the results of the final prototype evaluation. Chapter 13 discusses the results, their connection to the research questions, and their validity. Chapter 14 concludes the degree project. Chapter 15 lists the references used in this work. Chapter 16 and onwards are appendices.


7 Description of the examined application

This overview describes the learning functions of the original version of the examined application, SVT Språkplay, as of April 2020. It is presented here because an understanding of these functions is helpful to grasp references to them in the following thesis chapters. The presented screenshots are taken from the application running on an Android smartphone. However, the application is also available, with similar functionality and very similar layouts, on Android tablets and iOS devices.

7.1 Video watching functions

7.1.1 Control functions (pause, jump forward, jump backward)

Figure 1: Video watching functions

The application has a video control interface (figure 1) resembling that of common video applications such as YouTube, with a few important differences. The pausing behavior is similar to other video players, as is the bottom scrub bar. In contrast to other video-watching applications, however, the app has more features related to captioning. In portrait mode, captions roll from the bottom to the top as the video plays, and the previous two captions are shown together with the current one and the next two. It is possible to quick-scrub (jump) through the video either by tapping the left-arrow or right-arrow icons, or by tapping the next or previous captions. In landscape mode, only the current caption is shown.

10

Next to the play/pause button, there is an “auto-translate” toggle button on the left, which is described further in the next section. On the right lies the “auto-pause” toggle, which will automatically pause the video at the start of every new caption.

7.1.2 The translation pop-up

Figure 2: The translation pop-up

When tapping a word, its translation is shown (see figure 2) together with the Swedish definition and links to external resources7 with information about the word, such as Google Translate8 or the Folkets Lexikon9 dictionary. The top bar shows the canonical form of the word (e.g. the infinitive form for a verb), together with the CEFR level of the word (see the next section). On the right, there are icons to favorite the word with a star (see later), to play back a voice recording of the word's correct pronunciation, and to close the dialog. Closing the dialog resumes the video. The pop-up can be opened wherever words appear:
1. In video captions (as here)
2. In word test transcripts (see Active learning functions)
3. In the zoomed detail view of the “visual vocabulary” visualization (see Self-monitoring functions)
4. In the word list (see Active learning functions)
Tapping the top bar changes the color of the word (see the next section).

7 For certain words Språkplay does not have a translation in the app, e.g. neologisms or slang words. These may be found more quickly in external resources that are frequently updated.
8 https://translate.google.com/ - a web app from Google for translation of words and full sentences between any two languages.
9 http://folkets-lexikon.csc.kth.se/folkets/ - a community-built, Creative Commons-licensed Swedish-English, English-Swedish dictionary.


7.1.3 The color system

Figure 3: The color system

At the core of the application is a system that assigns a learning status to each Swedish word known to the application. This status should ideally reflect the degree of knowledge the user has of the word. Three statuses are relevant to a user: a word can be known (green), unknown and of interest to be learned (yellow), or unknown and not of immediate interest to be learned (red). Initial statuses are based on an assumption and may not reflect the user's actual vocabulary knowledge. By changing word colors, the user can adapt the system to their situation (see below). The color system appears in many functions of the application:
● as an underline in captions (see the previous section)
● as a status in the translation pop-up (see figure 3)
● as an underline in the transcript displayed in the word test, when no test is active (see Active learning functions)
● as the basis for statistics such as “words learned” on the profile page (see Self-monitoring functions)
One more way in which a color can change is when the user completes a video word test (see later). After correctly filling in a word to be learned (yellow) there, the word gains a dot underneath. At the third dot, the yellow color automatically switches to green.
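The described promotion rule (three correct fill-ins turn a yellow word green) can be sketched as follows. This is a hypothetical illustration of the behavior described above; the class and method names are the author's assumptions, not Språkkraft's actual implementation.

```python
# Status constants mirroring the three colors described in the text.
GREEN, YELLOW, RED = "known", "learn-now", "learn-later"

class WordStatus:
    """Hypothetical sketch of one word's learning status."""

    def __init__(self, color: str):
        self.color = color
        self.dots = 0  # correct fill-ins accumulated in word tests

    def record_correct_fill_in(self) -> None:
        """A yellow word gains a dot per correct fill-in in a word test;
        at the third dot it is automatically promoted to green."""
        if self.color == YELLOW:
            self.dots += 1
            if self.dots >= 3:
                self.color = GREEN

# A yellow word becomes green after three correct fill-ins.
word = WordStatus(YELLOW)
for _ in range(3):
    word.record_correct_fill_in()
assert word.color == GREEN
```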

Initial color determination

When a user first starts the app, she is required to choose a CEFR level that fits her current language knowledge. The initial status of words is adjusted based on that level, on the assumption that the user wants to focus on learning words at her current level now and leave more advanced words for later. For example, common verbs such


as “att äta” (to eat) would be associated with the A1 level, and would be marked as green when the user selects that she is at the A2 level. Next, the app would set words from the current level (A2) to yellow, and words from later levels (B1-C2) to red. This initial mapping is, importantly, an approximation of the vocabulary knowledge typically associated with that CEFR level; it will not reflect the user's real vocabulary knowledge.
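The initial mapping described above amounts to a simple comparison of the word's CEFR level with the user's chosen level. The following sketch illustrates this; the function name and the example word levels are illustrative assumptions, not the app's actual code or word database.

```python
# CEFR levels in increasing order of difficulty.
CEFR = ["A1", "A2", "B1", "B2", "C1", "C2"]

def initial_color(word_level: str, user_level: str) -> str:
    """Words below the user's level start green (assumed known),
    words at the user's level start yellow (learn now), and
    words above it start red (learn later)."""
    w, u = CEFR.index(word_level), CEFR.index(user_level)
    if w < u:
        return "green"
    if w == u:
        return "yellow"
    return "red"

# For a user at the A2 level, an A1 word like "att äta" starts green:
assert initial_color("A1", "A2") == "green"
assert initial_color("A2", "A2") == "yellow"
assert initial_color("B1", "A2") == "red"
```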

Changing word color

The color of a word can be changed by the user to better reflect the user's knowledge. This is mainly achieved by tapping the bar of the translation pop-up after selecting the word somewhere (see the previous sections). One peculiarity is the change to yellow: this change also marks (stars) the word as a “favorite”, which adds it to the word list (see later). The star can be removed manually if desired. The learning statuses assigned to words for a particular user are embedded in the user's Learner Profile and shared across Språkkraft's applications.

7.2 Active learning functions

7.2.1 Word test

Figure 4: The word test

In the word test (see figure 4) the user can practice the unfamiliar words she encountered in a video. The word test is a fill-in-the-gap test with translation hints. The whole video transcript serves as the text base, and the yellow (learn now) words are blanked out in groups of about five, called units. Words should be dragged from the word bank and filled in to the correct


gap. A user can choose up to which unit the test should run. The word test is started from the pause menu of a video.

7.2.2 Word list

Figure 5: The word list

The word list (see figure 5) gathers, for reference, the words that were favorited (starred) via the translation pop-up. It can be configured by sorting the words by date or learning status (green, yellow), and the words can be filtered by the video from which they came. A dedicated sorting functionality orders the words according to a spaced repetition algorithm, in which older and less difficult words are put at the bottom in order to exploit the psychological spacing effect, which states that knowledge is retained better when it is repeated in spaced-out sessions [43]. Word list items can be tapped to show the sentence from which they were marked as favorite, and it is possible to open the video containing that sentence, or to delete the word.
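The ordering described above, where older and less difficult words sink to the bottom, could be sketched as a priority sort. This is a minimal illustration under the author's own assumptions (the priority formula and field names are invented for the example), not Språkplay's actual spaced repetition algorithm.

```python
from datetime import datetime

def review_order(words, now=None):
    """Sort word-list entries so that newer, more difficult words
    come first, and older, less difficult words sink to the bottom.
    `words` is a list of dicts with 'term', 'added' (datetime) and
    'difficulty' (0.0 = easy .. 1.0 = hard)."""
    now = now or datetime.now()

    def priority(w):
        age_days = (now - w["added"]).days
        # Difficulty divided by age: recent, hard words score highest.
        return w["difficulty"] / (1 + age_days)

    return sorted(words, key=priority, reverse=True)

words = [
    {"term": "hus", "added": datetime(2020, 3, 1), "difficulty": 0.2},
    {"term": "tillstånd", "added": datetime(2020, 3, 30), "difficulty": 0.9},
]
ordered = review_order(words, now=datetime(2020, 4, 1))
# The recent, difficult word surfaces first; the old, easy word sinks.
assert [w["term"] for w in ordered] == ["tillstånd", "hus"]
```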


7.3 Self-monitoring functions

Figure 6: Learning profile (left), visual vocabulary for level B1 (middle), activity page (right)

7.3.1 Profile page

The learning profile page (see figure 6, left) summarizes some app activity statistics. “Points” denote the number of correct word fill-ins in the word tests. Accomplished achievements are also displayed here, as well as an overview of the status of all words known to the app (initial statuses follow from the CEFR level given by the user; individual words can then be changed). In the visual vocabulary detail view (see figure 6, middle) each word associated with a certain CEFR level (e.g. B1) is represented by a rectangle. The user can zoom in to see the words.

7.3.2 Activity statistics dashboard

The Activity page (see figure 6, right) displays certain activity statistics over time: the words read (1), the words learned (2) and the time spent in the app (3). It also offers the possibility to set daily goals for each of these statistics. Achieving a goal increases the count of a “streak”, which is lost (reset to zero) if the user skips a day of using the app.


8 Background

This background section expands on several concepts mentioned in the introduction: video and language learning, mobile-assisted language learning, gamification, usability, user experience and user-centred design. All of these are research areas with an important relation to the examined application or this thesis.

8.1 Video and language learning

8.1.1 Media and language learning

One way in which language can be acquired is through extensive media consumption, an activity that has been intertwined with language learning for centuries. Consuming second-language media can be seen as a goal of learning that language, but also as an aid in learning it. Consider, for example, how 19th-century scholars relearned to read Egyptian hieroglyphs by studying a multilingual decree carved on a stone, the Rosetta Stone [40]. More recently, Stephen Krashen developed a hypothesis relating the consumption of foreign-language media to the acquisition of vocabulary and spelling. His Input Hypothesis “assumes that we acquire language by understanding messages. More precisely, comprehensible input is the essential environmental ingredient” [20 p. 1]. A message may be transmitted by any medium, yet Krashen studied mostly textual media. Since the rise of audiovisual media, however, messages have been extensively transmitted through media that are not primarily textual. This has also sparked research into language learning with these new media.

8.1.2 Learning with Closed Captions

Particularly relevant for this thesis is research on language learning conducted with video-based media. Today, people may binge-watch a series on Netflix and follow their favorite vlogger on YouTube. Video content such as TV programmes, internet videos and films is available in enormous volumes at the tap of a finger, including in the examined application. This is material brimming with spoken language, often also captioned with text. Vanderplank started researching language learning with video in the late 80s and is still active in the domain today. In the recent Handbook of Informal Learning [8], he recalls that “TV, film, and video were seen by second language acquisition (SLA) researchers as an excellent potential source of comprehensible input, since they fulfill Krashen's idea of exposure to natural language”. Indeed, he further discusses that more exposure to foreign-language video content is correlated with better L2 (second language) comprehension skills [8]. But not all exposure to video is equally helpful for learning. One important aspect is the presence and language of captions: “viewing captioned video material compared to uncaptioned or subtitled material increases comprehension and promotes the learning of vocabulary” [8 p. 341]. In this quote, captions translated to a known language are referred to as (L1) “subtitles”, while untranslated second-language (L2) captions are simply named “captions”. Moreover, Vanderplank specified the most desirable type of captions by concluding that “full captions come out very well against keyword captions” [37 p. 116].


Full captions intend to render what is said in a video in full sentences, or if possible, verbatim. Keyword captions omit commonly occurring words such as prepositions and articles with the intention of lowering the cognitive reading load for learners [37 p. 106].

8.1.3 Control and “glossed” captions

The positive results from learning with closed captions should be nuanced by pointing to a dependency on the mode of learning. Vanderplank sheds light on this when he discusses a study on the benefits, affordances and downsides for individuals watching a range of foreign-language DVDs (the EURECAP project [37]): “evidence from diaries kept by participants showed quite clearly that only if participants made full use of their control of viewing in terms of stopping, rewinding, and checking words and phrases that were difficult to follow or were unknown did they have any chance of taking in these words and phrases.” [8 p. 343]. Furthermore, recent research [27] has compared a form of captioning that integrates this “checking of words” to other forms of captioning, namely full captions, keyword captions and no captions. The “glossed keyword captions” they used are defined as “keyword captions with access to meaning: each keyword is linked to its corresponding L1 context-bound translation” [26, p. 8]. L1 is the first or known language of the learner, and the learner could see the L1 meaning of an L2 word by pausing the video (pressing the spacebar). The study found that glossed keyword captions clearly outperformed the other forms of captioning in terms of vocabulary acquisition (as measured by meaning recall). This is a noteworthy result for this thesis, as the captioning of Språkplay can be seen as glossed full captions, where a meaning is shown by tapping a word in the captions. While the merits of glossed full captions remain open to study, the combined reports of Montero Perez et al. (glossed keyword captions are best) and Vanderplank (full captions are preferred over keyword captions) suggest their value.

8.1.4 Implications for a video-based MALL application

Implications for the project at hand are twofold. On one hand, the previous section predicts the viability of the examined application: the concept of learning language from captioned video (with gloss) makes sense. Even though most related studies are based primarily on other technologies such as teletext TV, captioned DVDs and computer media players, transplanting this concept onto a mobile device should not be different in essence. On the other hand, this literature points to desirable technological affordances of the video-playing environment. It is stressed that learner control functions, such as pausing, stopping and rewinding video, should be available

8.2 Mobile assisted language learning (MALL)

8.2.1 Defining MALL

Mobile-assisted language learning (MALL) is a term that originated in the late 2000s to denote language learning via mobile interactive or audiovisual technologies, which included devices such as handheld computers, tablet PCs and MP3 players [23]. When


smartphones superseded these technologies in the last decade, many MALL studies have focused on the use of mobile applications (apps) on smartphones for language learning [11,38]. Kukulska-Hulme (the author of a seminal MALL paper, [23]) defines MALL as “the use of smartphones and other mobile technologies in language learning, especially in situations where portability and situated learning offer specific advantages” [22 p. 1]. She further mentions specific advantages, of which the following are pertinent to the studied case: flexible use of time and space for learning, good alignment with personal needs and preferences, and the possibility of keeping learning efforts ongoing while waiting or commuting. MALL may “support learners in reading, listening, speaking and writing in the target language, either individually or in collaboration with one another” [22 p. 1]. MALL has generally been seen as a sub-field of mobile learning (mLearning) [38], which was defined by Sharples et al. as a “process of coming to know through conversations across multiple contexts among people and personal interactive technologies” [39], as cited by Viberg & Grönlund [20 p. 1]. Kukulska-Hulme further characterizes mLearning as ubiquitous and contextual, noting that the mobility of the learner is increasingly highlighted over the mobility of the technology [22]. From these definitions it follows that the examined application can be situated within MALL. It specifically addresses the reading and listening capabilities mentioned by Kukulska-Hulme, but combines them with a viewing aspect through the captioned video. The application may be used in mobile contexts such as commuting, and focuses on individual use.

8.2.2 MALL with immigrants

A number of studies have examined the uses of MALL specifically for immigrant language learners. Jones et al. [18] suggest that the MALL application they examined helps immigrants acquire relevant language learning skills and practice them. Another conclusion from this study is that MALL as practiced by immigrants was not performed “anywhere, anytime”, but was situated with more specificity. The application they studied “was used when appropriate and convenient, i.e., when participants had sufficient time, reliable internet connection and a suitable location, such as at home. The concept of anywhere anytime learning idea is not always borne out in practice.” [18 p. 249]. The home was seen as a good location since “It allows learners a private space in which to practice and make mistakes and appears to address some known challenges, such as pronouncing words ‘correctly’, understanding local accents and understanding and using colloquial language which includes contractions. Such support and practice was motivating and improved confidence.” This highlights the benefits that MALL could have over learning in non-private locations, such as a typical classroom.

8.2.3 Video-based learning in MALL

A few studies have specifically investigated the opportunities that lie at the intersection of video-based learning and MALL. Section 6.2, Problem, already reported the results of two studies ([13,14]), namely an increased vocabulary comprehension where (adaptive) captioned video was used on mobile devices, compared to non-captioned video. The first of the two studies motivated the integration of captioned video into a


MALL application a priori by arguing (1) that learners nowadays spend more time on mobile devices than on PCs, and (2) that MALL has the flexibility to support in-class and out-of-class learning. However, there seem to be no studies comparing a video-based MALL approach to video-based learning on PCs, or to a different vocabulary learning method entirely (e.g., the reading of texts on mobile devices, PCs or books). Thus, while the individual evidence from video-based learning (see above) and MALL seems promising, more empirical research is required to ground the field of video-based MALL.

8.3 The role of gamification

The most popular mobile language learning application of today, Duolingo, is used by about 300 million people [41]. Its popularity has been partly attributed to “gamification” [11,25,32,35], which is why this section describes that domain and its relation to MALL in more detail.

8.3.1 Defining gamification

In studies of MALL, gamification has surfaced as a desirable characteristic alongside others such as flexibility [25]. But what is it exactly? Gamification can be defined as “the use of game design elements in non-game contexts” [7 p. 10]. The same paper further specifies it as a design practice that aims to incorporate “game elements”, rather than producing full-fledged games. The paper categorizes game elements into five levels of abstraction, where examples of the most concrete elements are “reward and reputation systems with points, badges, levels and leaderboards” (p. 1). More abstract elements include the consideration of “time constraints” and “limited resources”, or even processes to develop games such as “playtesting” and “playcentric design”.

8.3.2 Game-based learning and gamification within MALL

In the domain of (language) learning technology, the use of game elements has been discussed under the term digital game-based learning (DGBL), of which mobile game-based learning (MGBL) is a subdomain [9]. This body of research has links to mLearning and MALL, and is thus of relevance to this thesis. The difference between this concept and “gamification” as defined before may be that (M)GBL starts from a (mobile) game and looks at how learning is or can be achieved through that game. Gamification, on the other hand, starts from a traditionally non-game context (e.g. language learning) and implements game elements in that context in order to increase outcomes such as the users' motivation to learn [1]; that is, it gamifies a non-game application. In practice, this conceptual distinction is demonstrated in a study on the effects of using “gamified features” in a “mobile game-based English vocabulary learning app” [4]. In this experimental study, two versions of the same app were compared. One version had gamified elements such as leaderboards and minigames as a means of vocabulary assessment; it was thus called a “mobile game-based app”. In the other version, these elements were omitted, so it did not qualify as a game-based app.


8.3.3 The value of gamification for vocabulary learning

The study above found that, specifically for (English) vocabulary learning in a mobile app, the use of gamified features has a strong positive correlation with both the acquisition and retention of new vocabulary [4]. In particular, the following gamified elements contributed most to this result:
(1) a pre-established learning path that visualizes progression,
(2) gamified assessment methods, e.g. a tic-tac-toe game where the learner can only move when she/he provides a correct answer to a vocabulary (translation) exercise,
(3) a ranking of learning peers (leaderboard), an element of social competition.
Moreover, the study found that users spent more time in the app version with gamified features than in the version without. Another study found that gamification elements such as progression tracking with experience points (XP) could be used to adapt the interface to the learner's needs, e.g. when the learner wrongly guesses a word, that word could be made to appear more often in the app [13]. Similarly, a study on Duolingo [25] praised its XP leaderboards as a social gamification aspect. The presence of this element had a positive correlation with app usage, which was itself correlated with greater language mastery. However, it was also noted that the feature “may have led some participants to focus less on language learning and more on reaching the top of the XP leaderboard” [25 p. 305].

8.3.4 Implications for this project

Since the aim of this project is to improve the usability and UX of the examined app with the ultimate goal of increasing vocabulary acquisition for its users, attention should be given to game design elements such as presented in the last sections. It is clear that these elements are connected to improved vocabulary acquisition on their own merit. In the next background section, they are also related to an improved UX.

8.4 Usability, User Experience (UX) and User-centered design (UCD)

8.4.1 Why Usability and UX?

ISO 9241-210 states that good usability and UX are determinants of the technical and commercial success of any interactive application [17]. Therefore, gaining insights into how good usability and UX can be achieved in the domain of MALL is important. The ISO defines usability as the “extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [17 p. 17]. User experience is defined as a “person's perceptions and responses resulting from the use and/or anticipated use of a product, system or service” [ibid.]. Furthermore, the ISO specifies that good usability and UX are achieved through a user-centred design (UCD) process. This is an iterative process of understanding the context of use, specifying user requirements, producing design solutions to meet those requirements and evaluating the designs against the requirements. All of these steps would therefore ideally also apply to the design of MALL applications. When employing the concepts of usability and user experience in a UCD process, it is necessary to note that they are intertwined. The ISO specification notes that “usability, when interpreted from the perspective of the users' personal goals, can include the kind of


perceptual and emotional aspects typically associated with user experience. Usability criteria can be used to assess aspects of user experience.” [ibid.]. Ritter et al. further clarify the difference: “usability and usability engineering focus on task related aspects (getting the job done); user experience and experience design focus on and foreground the users’ feelings, emotions, values, and their immediate and delayed responses.” [29 p. 44]. Given this complementary yet overlapping connection, this work aims to evaluate both aspects simultaneously.

8.4.2 Usability and User Experience in MALL

Few research efforts have focused on the examination of “user experience” (UX) or “usability” within the MALL domain. Limited exceptions were found, starting with a 2016 blog article that puts forward gamification as a solution to UX challenges, taking Duolingo as a successful example [35]. Sendurur et al. in 2017 [32] discuss the UX of Duolingo in more detail, finding for example that high usability, a step-by-step flow, gamification and immediate feedback positively affect the UX, while a lack of challenge and the overuse of notifications can be detrimental. Finally, Triando and Arhippainen released a design-focused study in 2019 on the UX of a self-developed “mobile web game” for learning Viena Karelian, a Finnic language dialect, taking inspiration from Duolingo. This thesis aims to widen the limited insights on usability and UX in MALL, focusing on a video-based mobile learning app as opposed to Duolingo or a web-based solution.

8.5 Summary

The application Språkplay can be viewed from the different angles covered in the literature. From the angle of video-based learning, the glossed full captions and video control functions available in the app are promising for vocabulary acquisition. From the angle of MALL, the aspect of flexible use has been praised, but the context of use for immigrant learners has been found to be more localized than the tenet “anywhere, anytime” would suggest. Next, gamified design elements have been a part of many MALL applications and are cited as reasons for their success; elements conveying progression and social competitive features were discussed most. Finally, the concepts of usability and user experience were brought forward as important factors in the success of an application, and the lack of studies relating to these factors in video-based MALL was noted. The next chapter introduces two theoretical frameworks before discussing the methodology of this work in more detail.


9 Theory

This chapter briefly introduces two frameworks that were instrumental to the setup of this thesis. First, the Double Diamond is presented, a framework that implements the UCD process described in the previous chapter. Next, the MUUX-E framework is explained, a framework for evaluating the usability, user experience and educational features of m-learning environments.

9.1 The Double Diamond framework

Figure 7: A visualization of the Double Diamond design process, adapted from Design Council UK [6]

The user-centred design process as seen in section 8.4 has been implemented in various ways by parties across the globe [3]. One interpretation is the Double Diamond framework defined by the Design Council, an advisory organisation to the UK government. The framework segments the user-centred design process into four phases: Discover, Define, Develop and Deliver [6] (see also figure 7). Importantly, Discover and Develop contain expanding methods for researching the problem domain (Discover) and designing a chosen solution (Develop). Define and Deliver respectively narrow down the scope of the problem domain (Define) and the implementation paths (Deliver), for example through evaluation methods. In the context of this project it is important to note that the Discover phase mostly gathers information about an existing application, while the Double Diamond is generally used to design new solutions from scratch. Nevertheless, any user-centred design process expects iteration, which means that the process can also be applied to improve an existing app. In


section 10, Methods, all methods used in this thesis are framed along the Double Diamond phases.

9.2 The evaluation framework MUUX-E

As seen, one critical aspect of the UCD process is the evaluation of the usability and user experience of designed applications or intermediary prototypes. While no related work seems available in the MALL domain specifically, the parent domain of m-learning has received some attention regarding this topic. Salient for this thesis is the work of Harpur and de Villiers [10], who synthesized “MUUX-E”, a framework for evaluating the usability, user experience and educational features of m-learning environments. They put forward five categories of design principles pertinent to evaluation activities of this type: (1) generic usability criteria inspired by Nielsen [28], (2) web-based learning principles concerning information architecture and content quality, (3) principles of educational usability, (4) features specifically expected for successful m-learning, and finally, (5) principles of user experience. MUUX-E’s principles and categories are leveraged in this thesis to evaluate the examined video-based MALL app and its iterative redesign.


10 Methods

The methods used are presented following the Double Diamond framework [6], which was discussed in chapter 9, Theory. First, this chapter details the data collection methods used to understand the problem domain, in line with the Double Diamond “Discover” phase. A significant part of these methods pertains to the user-centred evaluation of the existing examined application, whose features were explained in the previous chapter. Next, the methods for analysing this broad data set to obtain design requirements are described, corresponding to the Double Diamond “Define” phase. Finally, the methods to develop and evaluate the improved prototype are covered.

The concrete outcomes of the Discover and Define phases are covered in the Prototype development chapter following this chapter. The final results of the Develop phase, the evaluation of the final prototype, are presented in the Results chapter. A final Ethics section under Methods describes the ethical considerations for this work.

10.1 Discover

The discovery of the problem domain consisted of a literature and state-of-the-art review and a series of usability studies on the examined app.

10.1.1 Literature review

To start, a literature study was conducted to become acquainted with prior knowledge on “gamification”, “video-based language learning” and “MALL”. These terms were searched in various combinations in academic paper databases. Titles and abstracts were read to determine relevance for this UCD project. The main findings from this study were already discussed in the Background chapter.

10.1.2 State-of-the-art review

Next, relevant state-of-the-art solutions with gamification elements in mobile-assisted language learning (MALL) were surveyed. The goal of this survey was to aid the UCD process by providing an idea of which gamification elements are common practice in MALL apps. This is relevant knowledge for design, since adherence to standards is an established usability principle, popularised by Nielsen in the 1990s: when a user recognizes an exercise type from another app they have used, this benefits the usability of the application because the learning cost is reduced [28].

Five MALL applications were selected by their popularity in terms of downloads in app stores, namely Duolingo, Babbel, Quizlet, AnkiDroid and . These apps were installed by the author and tested briefly with Swedish course content. The test sessions were screen-recorded. A conventional coding technique [12] was employed to discover commonalities. One specific outcome was the creation of a collection of exercise types found in the examined applications for later reference in exercise design. A total of 24 exercise types were found, and these were categorized in terms of their “Areas of Instructional Assessment”, a categorization defined by Heil et al. [3]: vocabulary in


isolation, vocabulary in context, grammatical form, pragmatics, pronunciation, and no assessment. To keep the thesis focused, the results of this survey are not discussed separately. However, certain findings are referenced and explained when discussing design decisions in section 11.2, Design of the improved prototype.

10.1.3 Self-guided usability evaluation of the examined app

The main part of the discovery process concerned the evaluation of the examined app, since in the case where a produced artifact is already available, evaluation is the next step in the UCD process [16]. This evaluation pertains to the first research question of this thesis, which inquires into the usability of the examined app.

A usability evaluation was carried out in two iterations. First, a heuristic walkthrough as defined by Karat et al. [19] was conducted individually by the author of this thesis. The outcome was (a) an identification of the most important learning functions in Språkplay and (b) an initial list of potential usability problems associated with these functions. Second, a set of user tests was conducted to enrich and correct this data. It is important to note that the intermediary results from these methods are not directly described in this thesis. After the data processing described in section 10.2, Define, the results from each method contributed to the triangulated results that are presented in section 11.1, Initial evaluation results. The heuristic walkthrough is discussed hereafter; the user tests are discussed in the next section.

Sears et al. describe the heuristic walkthrough method proposed by Karat et al. as consisting of two passes [15]:
1. A self-guided exploration of the interface, much like a heuristic evaluation
2. A scenario-based walkthrough of the interface.

Initial evaluation

In the initial stages of this thesis the author had already explored the interface of Språkplay and had informally documented usability problems with Nielsen’s heuristics [28] that came to mind. These observations were based on prior experience, discussions with student peers and comparison with other video and MALL apps. The author relied on these experiences to identify the main learning functions of the app to be evaluated; four were identified. This list was then updated with feedback from the principal, adding one more learning function. The documented informal evaluation was taken as the first pass of the heuristic walkthrough evaluation. The evaluated learning functions were:
1. Discovering the meaning of an unknown word while watching a video
2. Automatic translations and reflecting one's vocabulary knowledge in the app
3. Constructing a word list while watching a video
4. Reviewing a saved word list
5. Playing a word game based on the video transcript.


The context of these functions can be found in chapter 7, Description of the examined prototype.

Cognitive evaluation

For the second pass, a cognitive walkthrough was performed as described by Lewis and Rieman [8]. The input for this method is threefold:
1) a task scenario: a small story that explains the need for a user to perform a certain task;
2) an action list that details the most efficient path of actions needed to reach the end state of the task;
3) a cognitive walkthrough that asks four questions about the user's state of mind for each action in the action list:
a) Will users be trying to produce the effect the action has? Are they aware of the feature or need?
b) Will users see the control (button, menu, switch, etc.) for the action?
c) Once users find the control, will they recognize that it produces the effect they want?
d) After the action is taken, will users understand the feedback they get, so they can go on to the next action with confidence?

Task scenarios and action lists were constructed for each learning function, then answers to the “state of mind” questions (3) were written down in free form by the author. From there, problems related to awareness (3a), discoverability (3b), affordance (3c) and feedback (3d) were identified. The outcome of the heuristic walkthrough as a whole was two sets of problems from different angles (Nielsen’s usability principles and cognitive considerations). Each problem was ranked by the author with a severity of low, medium or high, by answering the question: how much is this problem likely to affect the effectiveness, efficiency or satisfaction of the app?
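The mapping from the four walkthrough questions to problem categories can be illustrated as a small data structure. The following is a hypothetical Python sketch only; the actual walkthrough in this thesis was recorded as free-form notes, and the record shown is invented:

```python
from dataclasses import dataclass

# Maps each "state of mind" question (3a-3d) to the problem category
# identified when the answer to that question is "no".
PROBLEM_TYPE = {
    "a": "awareness",        # will users try to produce the effect?
    "b": "discoverability",  # will users see the control?
    "c": "affordance",       # will users recognize what the control does?
    "d": "feedback",         # will users understand the feedback?
}

@dataclass
class ActionRecord:
    action: str
    answers: dict          # question id ("a".."d") -> bool (True = no problem)
    severity: str = "low"  # low / medium / high

    def problems(self) -> list:
        """Return the problem categories for every negative answer."""
        return [PROBLEM_TYPE[q] for q, ok in self.answers.items() if not ok]

# Invented record for the "reviewing a saved word list" learning function.
record = ActionRecord(
    action="Open the word list from the Profile menu",
    answers={"a": True, "b": False, "c": True, "d": True},
    severity="medium",
)
print(record.problems())  # -> ['discoverability']
```

A record of this kind directly yields the problem categories listed above (awareness, discoverability, affordance, feedback), together with the severity grade used in the ranking step.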

10.1.4 User test of the examined app

To enhance the validity of the findings, a second evaluation method was employed: a user test based on the recommendations of Lewis and Rieman [24]. This section describes the recruitment of participants, the way meetings were arranged, a pre-survey, the user test process and the data analysis method.

Participants

Participants were recruited from the course Svenska för Ingenjörer at KTH, a course taken by students with a CEFR A2 level in Swedish. The course teacher invited her students by email. They were asked to participate in three remote sessions with the author: the user test (this section), an open-ended interview (see Define) and a


final co-design session (see Develop). It was assumed that the main motivation to participate was the opportunity to use SVT Språkplay as a voluntary study aid.

Ten students showed interest in the study by responding to the first invitation. Of those, nine students responded to the initial survey (described next). Seven students proceeded to schedule a user test (the first user session, described here). The same seven students also participated in a follow-up interview two weeks later (described in “Define”). Finally, of those seven, four participants took part in the final design evaluation session (described in “Develop”).

Initial survey

To prepare for the user test, each potential participant who showed interest in the study was sent a custom (non-standardized) survey. It included questions about their prior experience with language learning apps, their language skills and basic demographic data. The questions can be found in appendix A. Characteristics of the participants are described where the evaluation results are reported in Prototype development.

After participants had responded to the survey, they were sent an email with instructions to install SVT Språkplay and a means to schedule the first session (the user test).

Meeting modality

All sessions with users were conducted through video conferencing software. This was due to a force majeure during the making of this degree project: the global COVID-19 crisis prevented in-person meetings in Sweden starting in February 2020. However, the author believes the ramifications were limited. Either Skype10 or Zoom11 was used as the video conferencing application; both offer mobile screen sharing in addition to webcam video sharing. This made it possible to approximate the experience of in-person user tests in the most relevant factors: the users’ app usage was visible on a video feed, the participants’ facial expressions could be seen on a webcam feed, and instant audio communication was available. The video calls were recorded. While calling, the author also took notes on paper as a guideline for later video analysis. Paper note-taking was chosen over keyboard note-taking to prevent the sound of typing, which may be perceived as annoying by the participant.

User test process

The user test took about 30 minutes per person and was loosely structured, with tasks to evaluate the same learning functions evaluated in the prior cognitive walkthrough. Before starting, participants were informed that the goal was to evaluate the application, not them. They were also asked to follow the think-aloud protocol and reminded that the video call would be recorded.

While the users were performing tasks, the author allowed them to move as organically as possible to other tasks that fit their current location in the app. There was therefore no

10 https://www.skype.com/en/
11 https://zoom.us/


fixed order in the tasks. If the users did something “interesting”, that is, stumbled on a problem that the author had not yet identified in the heuristic walkthrough (previous section), or mentioned something like “I would like this button to do A but it does B”, the author tried to dive deeper and ask questions about these issues. The users regularly compared the app to other MALL apps they had used before, which was encouraged.

In some cases the users had already explored the app before the user test. Initial communication had advised them not to, but this could not be enforced; the original intention was to test the first-time user experience of the app. However, after conducting a few sessions with users who had already explored the app, the author noticed that these experienced users could answer questions such as: “how did you understand this feature?”. Answers from these users seemed less biased, since they were not instructed to do something that might not have come naturally to them, but instead recounted their spontaneous experiences from a few days earlier. Therefore, the author embraced prior exploration as a source of enriching data points.

The video call ended with a debrief asking for any additional comments or general feedback, and with instructions for the next two weeks (“use the app at your own pace, though I recommend watching one video daily”).

Data analysis

To analyse the data, a directed coding technique [12] was employed using the video coding software ATLAS.ti. Transcripts were generated by the tool otter.ai12 and included alongside the video to aid the coding process. Known issues found in the prior heuristic walkthrough were taken as the primary predetermined codes, and their occurrences were counted. However, the process allowed other codes to emerge, such as other apparent issues, the commonalities previously found in other MALL apps (see “State-of-the-art review”), and codes reflecting digressions covering various language learning strategies of users. A set of auxiliary codes was built to aid later iterations over the coded material, such as codes tagging emotional responses and indications of task success or failure. The outcome of this analysis was a list of usability issues and other observations, each with an occurrence count.
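The tallying step of this directed coding can be sketched as follows. This is a hedged illustration only: the code names and coded segments below are invented, and the real analysis was performed inside the coding software rather than in code:

```python
from collections import Counter

# Predetermined codes stem from the heuristic walkthrough; emergent codes
# are allowed to appear during coding. All names and data are invented.
predetermined = {"word_test_error", "color_switch_taps", "word_list_hidden"}

# Coded segments as they might be exported from the coding software:
# (participant, session, code)
coded_segments = [
    ("P1", "S1", "word_list_hidden"),
    ("P5", "S1", "word_test_error"),
    ("P5", "S1", "word_list_hidden"),
    ("P7", "S1", "captions_not_verbatim"),  # an emergent code
]

# Count occurrences per code (the outcome described above: each issue with
# an occurrence count) and separate the emergent codes.
counts = Counter(code for _, _, code in coded_segments)
emergent = {code for code in counts if code not in predetermined}

print(counts["word_list_hidden"])  # -> 2
print(sorted(emergent))            # -> ['captions_not_verbatim']
```

The occurrence counts produced this way feed directly into the prioritization described in the Define phase.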

10.1.5 Semi-structured interviews

The user test was deemed able to capture only part of the usability considerations of the app, since for most participants it was the first time using the app, and perspectives and usage preferences can evolve over time. Therefore, a second session was set up with the participants two weeks after their initial session. This session was conducted in the form of a 15-minute semi-structured interview. The questions (see Appendix B) further probed the learning context and habits of the users and their longer-term perspectives on the previously evaluated learning functions of the app. The interview also explicitly asked for feedback and ideas for potential improvements.

12 https://otter.ai/ - an audio-transcribing web app driven by artificial intelligence


Data analysis

The interviews were audio-recorded and analysed with the same software and methods as the user test analysis. The set of issue-based codes from the user test (see the previous section, 10.1.4) was expanded, and more codes were added relating to ideas for improvement suggested by participants. As mentioned, the result set is not directly presented in this thesis, but rather contributed to the triangulated results presented in section 11.1, Initial evaluation results of the examined app. The triangulation procedure used to produce these results is described next, in section 10.2, Define.

10.2 Define

The Double Diamond framework presents the Define phase as a way to “review and narrow down your insights and establish your project’s main challenge” [6]. One method mentioned is to utilize a set of assessment criteria to advance the most important ideas and issues to the Develop phase. This project attempted to improve the all-round usability and user experience of the examined app; therefore it attempted to consider equally all the related aspects established by the MUUX-E framework: general usability, web-based learning principles, educational usability, m-learning principles and user experience principles [10]. The data from all separate evaluation methods in Discover were triangulated as follows:
1. Each code (e.g., idea, usability issue, learning strategy) from the Discover phase was included in a spreadsheet as a row, with additional related data in columns, such as its severity ranking (initial issues) and occurrences (coded user tests and interviews).
2. Rows were perused several times to eliminate redundancies from the different coding sessions, resulting in many rows being merged.
3. Each row was tagged with the relevant principles from the MUUX-E framework.
4. Next, the rows were narrowed down and prioritized: the ranking of issues and their occurrences were used to drop issues with below-average occurrences and a “low” priority, effectively lending more importance to issues that were shared by the author’s initial observations and several user test results.
5. The data was re-sorted from the angle of the MUUX-E categories. For each category, the remaining set of related issues and ideas was treated as topics for further design and development.
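Steps 4 and 5 of this triangulation can be illustrated with a short sketch. The rows, severities and occurrence counts below are invented for illustration; the actual triangulation was carried out manually in a spreadsheet:

```python
# Invented spreadsheet rows: (description, severity, occurrences, MUUX-E category)
rows = [
    ("Word test responds unclearly to input",  "high",   5, "general usability"),
    ("Word list is hard to discover",          "medium", 4, "web-based learning"),
    ("Minor icon misalignment",                "low",    1, "general usability"),
    ("Initial CEFR colors mismatch knowledge", "high",   6, "educational usability"),
]

# Step 4: drop rows ranked "low" or occurring less often than average.
mean_occ = sum(occ for _, _, occ, _ in rows) / len(rows)
kept = [r for r in rows if r[1] != "low" and r[2] >= mean_occ]

# Step 5: re-sort the surviving rows by MUUX-E category, yielding topics
# for further design and development.
by_category = {}
for desc, severity, occ, category in kept:
    by_category.setdefault(category, []).append(desc)

print(sorted(by_category))
```

Under this filter, the low-priority, rarely occurring row is dropped, and the remaining issues are grouped per MUUX-E category as design topics.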


10.3 Develop

According to the Double Diamond framework, the aim of the Develop phase is “to brainstorm design concepts, test out what works and discard what doesn’t” [6].

10.3.1 Low-fidelity prototype

To brainstorm design concepts, the author first reviewed the issues and ideas that emerged from the Define phase for each MUUX-E category. Next, paper-based low-fidelity wireframes were sketched to address each of these issues. Various options were left open, and were finally narrowed down with limited feedback from a design-student peer. Some sketches and design choices are discussed in section 11.2.1, Low-fidelity prototype sketches.

10.3.2 High-fidelity prototype

After the feedback on the low-fidelity prototype, a clickable high-fidelity prototype was built with the design tool Figma13, based on the low-fidelity sketches (see figure 8). The Material Design14 icon set was used to attain a look and feel associated with the Android mobile platform. Thumbnail images were taken from Unsplash and combined with realistic content titles taken from the actual app.

Figure 8: The development of a clickable prototype in Figma

13 https://www.figma.com/ - a cloud-based design tool
14 https://material.io/ - a cross-platform design framework by Google


The outcome of this design was a mock iPhone SE (see figure 9) in which one could navigate between the designed screens. It was also possible to scroll on scrollable elements, such as the vertical scroll on the home screen revealing an extra section, or the horizontal scroll of recommended videos. The design decisions in the process are detailed in chapter 11, Prototype development. The design targeted both iOS and Android mobile devices.

Figure 9: The mock iPhone SE running the prototype in a web browser

10.3.3 Prototype evaluation with users

The goal of the final prototype evaluation was to assess whether the designed improvements solved the issues as seen from the MUUX-E framework, and responded well to the wishes of the users. The format of this evaluation was originally intended to be a focus group with all seven participants, but the participants could not be brought together at the same time. Instead, one semi-structured group interview was held with three of the participants, and a similar interview was held with a single other participant.

To prepare the interviews, questions were constructed to gauge the users’ willingness to use the designed elements, and to assess whether the elements resolved the issues reported by users. The questions were initially sorted according to the MUUX-E categories to ensure a balance of usability aspects; they can be found in appendix C. Later, they were reordered according to the design elements they aimed to evaluate, to ensure the interview


could follow a demonstration of the prototype without jumping back and forth between screens.

Group interview process

The group and single-participant interviews took one hour each. The author’s screen was shared, displaying the clickable prototype. A web link to the prototype was also sent to the participants so that they could independently interact with it on their own computers during the call. Opportunities were given for participants to discuss among each other, and the author (who led the discussion) attempted to request views from all participants and to inquire about the motivations behind simple yes/no answers.

Data analysis

The transcripts of the discussions were generated with otter.ai. Next, a directed coding approach [12] was used to code the answers to the questions in relation to MUUX-E principles. Direct keyword-based search in the transcript was used to identify passages related to the question at hand. These results are presented in chapter 12, Results.

10.4 Ethics

This thesis proceeded in accordance with the GDPR. Participants were first invited to the study by their teacher. Students who were interested sent their email address to the researcher, thereby only consenting to receive further information. Next, a detailed initial questionnaire was sent that (1) outlined the structure and aims of the study, (2) presented the ways in which participant data would be processed and shared (including video/audio recording) and (3) explained the purposes of this data processing. Participants provided their voluntary, explicit and informed consent before enrolling; see the “Do I have your permission?” section in Appendix A for details. Participants could drop out of the study at any time.

Participants were offered no compensation for enrolling in the study; the study was framed as an opportunity to help in the development of the app. To protect the participants’ identities, each was given a number (e.g., P3); in this thesis they are only referred to by this number. A potential problem regarding the ethics of data processing with regard to the examined app is discussed in section 13.4, Ethics.


11 Prototype development

This chapter first contextualizes the design decisions in the prototype by covering the concrete outcomes of the Define phase according to the MUUX-E categories. Next, it describes the intermediary and final forms of the developed prototype.

11.1 Initial evaluation results of the examined app (Define)

This section discusses the results from the combined initial evaluation according to the five categories of the MUUX-E framework: (1) generic usability, (2) web-based learning principles, (3) principles of educational usability, (4) m-learning features, and finally, (5) principles of user experience. It may be helpful to read the Description of the examined application to understand which functions of the application were evaluated. As a reminder, statements from the participants are referred to with a participant number (e.g., P3) to preserve the participants’ anonymity. To clarify whether a statement came from the user test in which the user was first introduced to the app, it is annotated with session 1 (S1). If the statement originated from the open-ended interview after two weeks of app usage, it is annotated with session 2 (S2). Finally, some statements from the last semi-structured interview on the final prototype (S3) are also discussed retroactively here when they relate directly to the existing app (e.g., when a user compared the existing app to the prototype).

Participants

Of the seven participants in the first two evaluation sessions, two were female and five were male. Most were doctoral students or post-doctoral researchers, with the exception of one undergraduate student and one graduate student. The motivation for learning Swedish for most participants was to integrate better into Swedish society, specifically to facilitate continued living and working in the country. Only the graduate student had a more ephemeral motivation, learning the language for the duration of his double-degree year in Sweden as a form of cultural curiosity.

All participants spoke a different native language (Serbian, Russian, Italian, Portuguese, French, Spanish, English) and all spoke English at an advanced CEFR C level (the sessions were conducted in English). Four participants also spoke at least one more language at a B level. Five out of seven participants had never used SVT Språkplay before the study, while two had (P2 and P9). In terms of prior experience with MALL apps, all participants had used or were still using Duolingo to learn Swedish. Two people had also used Tinycards15, one other person Quizlet16, and another AnkiDroid17. The average and median participant age were both 26, with a minimum of 21 and a maximum of 33.

15 https://tinycards.duolingo.com/ - a flashcard mobile app and web app by Duolingo
16 https://quizlet.com/ - a flashcard mobile app and web app
17 https://github.com/ankidroid/Anki-Android - self-described “semi-official port of the open source Anki spaced repetition flashcard system to Android”


11.1.1 General usability

This section describes themes regarding the general usability of Språkplay.

User errors in the word test

One recurring issue was that the word test activity responded to user input in ways that were unclear to participants. For example, when the error message “There are not enough words to learn (yellow) to start a test” appeared, it was not clear to participant P5 (S1) how to solve this issue, which led her to abandon the word test activity. Similarly, P1 and P7 (S2) reported being confused when a unit they had not tested before appeared “checked” and could not be activated without restarting the whole word test. The reason was that they had already tested a unit that came after the desired unit, which triggered the specified app behaviour. In another related issue, the word bank for “words-to-fill” disappeared out of view when the transcript was scrolled (e.g., for P1, P5 and P9). The “prevention of usability-related errors” principle in MUUX-E states that error-prone conditions such as these should be eliminated. In summary, the word test approach of having a full transcript with separate test units led to usability problems. User input should be more directed and constrained to prevent errors, for example with a step-by-step flow for user input.

The initial state of the color system

For almost all participants, the initial mapping of words to CEFR levels caused some form of annoyance or confusion, since the mapping often did not match their vocabulary knowledge. This mismatch can be seen as a problem instance of the “Match to the real world” principle of MUUX-E. As an illustration, P1 mentions: “I wonder how he actually measures that things should be easy, because for instance this red here, I can understand it. I already know what smittan [contagion in English] means. But I don’t know what begravning [burial in English] means [begravning was marked as green/known].”

P7 (S1) makes a remark on this feature from an educational viewpoint: “I don’t think it is so straightforward to correlate a number of words or a specific type of words you have to know, and the level you reached in a language. If compared to English for example, I think I’ve learned let’s say 80% of the words that I know in the A1, A2 level and then all the rest come with B1, B2, which is mostly grammar and C1, C2 would be mostly sentence construction. Okay, this app is using this means of classification. And yeah, it’s okay. It’s one possibility. Let’s see how it works.” However, in the follow-up interview two weeks later he addresses the same issue as his primary concern with the app.

P2 (S2): “most of the verbs are addressed [assigned] to [the] A1 level, which is in my opinion a little bit wrong. Because yes, verbs are important, but you don’t learn them all in A1. And at the same time, I found some vocabulary or names related to family or food, that would sit instead in level C1 or B2. But actually, family and food are such an easy topic that normally is placed in A1. So there is a lot of malposition of words, and this system of words could work, but it needs to be elaborated a lot. It’s really low [quality] in my opinion, still.” P5 and P6 had similar remarks.


Efficiency of color switching

The core learning system of the examined app depends on users’ active interaction with words: a user needs to change a word’s color to match the words they know (green) and want to learn (yellow). For such a frequent action, one of Nielsen’s usability principles is to enable a more efficient way to complete the action for so-called power users, also known as shortcuts [28]. However, the examined app requires at least three taps for a color switch: 1. tap the word to open a dialog, 2. tap the bar to toggle to the desired color, 3. tap the close icon to close the dialog. Executing those actions may take considerable time, preventing a fluid video-watching experience.

11.1.2 Web-based learning principles

The principles of web-based learning posit that suitable language learning content should be available. Moreover, content should be easily accessible through a well-designed information architecture.

Obstacles for the discoverability of learning features

For both the word test and word list features, discoverability was a major issue. In the case of the word list, which is located behind the “Profile” menu, all users first looked at the star icon in the bottom right to find their favorite words, while this icon actually led to their favorite videos. Next, most users aimlessly tried opening every page visible from the home screen to scan for the word list. This points to an issue of information architecture: the word list could be positioned better. Similarly, the video word test page was not easy to discover. It only appears as a small “academic hat” icon in the top-right corner of a paused video, a placement typically associated with video controls. The most striking report of this came from P2 (S2), who asked the interviewer where he could find the word test. During the two weeks after the first interview he had forgotten where it was and had not been able to find it.

Non-verbatim captions as imperfect learning media

One challenge that appeared is that captions are often not verbatim, which threatened their perceived suitability as a learning resource for most participants. This observation also has grounding in literature. Captions were originally intended as a medium for deaf viewers to follow a video, not for learning a language. It commonly occurs that the captions do not match what is said by actors or speakers in the video. Words may be replaced with shorter synonyms, or sentences that were rephrased midway may be omitted, for reasons such as limiting the line length and preserving readability [37]. In contrast, participants reported a desire to see fully verbatim captions that match what is said, as this would help them with their listening skills. P1 comments that developing listening skills is “just as necessary, besides the vocabulary and the grammar”; P3 agrees that “it’s the listening that I usually don’t get”, which is a major reason for him to use the app.


With those motivations in mind, participants explicitly criticized the mismatch in captions. P1 (S1): “People don’t speak slow or something, which is nothing that I find bad, but the problem is that the subtitles don’t match”. He later gives an example: “the [sub]title tries to be formal, while the people are speaking informal”. P7 explicitly mentions being confused in these situations: “I will follow the sound of the voice and I would like to follow it on the text. But if all of a sudden a sentence, which is probably useless, but it’s a sentence, is skipped in the subtitles, then I get a little bit confused, you know, and that’s something that annoys me”. P6 and P3 have similar experiences. However, in literature it has been argued that the mismatch makes the viewer more wary of potential mistakes, which may be beneficial for learning [8].

11.1.3 m-Learning features

“M-learning features” is a MUUX-E category that evaluates whether the typical affordances of the mobile device, such as its personal nature and its ability to be used “anywhere and anytime”, are used to their full extent in the MALL app. In this study, a focus was put on the principles of pragmatic user-centricity and contextual factors, which include aspects of personalised learning, customization, the option to complete tasks over a longer duration, encouragement of active learning and access to student-centric material.

Useful translation pop-up

The dictionary/translation pop-up was found intuitive and usable, even being cited as the favorite feature of the application by some participants (P7, P1; S2). When asked to find the translation of a word, P1 responded “I just click it”, vocalizing that he found clicking the word an obvious action to reach that goal. P7 (S2) highlights the usefulness of the feature: “it’s just like watching a [video] with subtitles, but with Språkplay, what is nice, if I don’t know the meaning of one word, I can just tap on it, and I will have the translation”. P1 (S1) claims that it is “one of the best features” and “95% of the time it is super helpful”. The other 5% would be times where the translations are not accurate in the context. P2 (S2) remarked that clicking the pop-up pauses the video, which he appreciated because it gives him the time to consider the translation. However, he further expresses a problem he experienced when wanting to review a word from a previous sentence. When tapping that word, the video rewinds to the position of that caption sentence (the step forward/backward action): “[it] was a cool experience, but at the same time you keep listening to the same thing over and over again. Maybe it could be an optional thing, I like it, but I would probably disable it.”

Missing ability to hide captions

One feature that was missing for five out of the seven participants is the ability to hide captions on demand. While captions were mostly appreciated as a learning aid (see Non-verbatim captions as imperfect learning media), some participants (P9, P1) explicitly expressed a desire to practice their listening abilities by watching a section without captions first, only enabling them when they could not understand what was said. This was coded under the principle of “customization” in MUUX-E.


Recommendations for learning & viewing are not personalised

A feature commonly found in other MALL applications is the ability to quickly start a relevant learning session from the home screen, often in a single tap (e.g., Memrise, Duolingo). Similarly, other mobile media streaming applications (e.g., Netflix, Spotify, YouTube) include on their first screen a set of personalized recommendations, such as uncompleted episodes and next episodes of series in progress (Netflix) or recently listened-to artists (Spotify). The examined application did not exhibit such easily accessible personalized recommended content, even though it is relevant for enhancing usability and user experience. These features in other apps adapt a part of the user interface to the user’s prior activities and probable intentions, conforming with the principles of user-centricity. Shorter attention spans are associated with mobile devices in the context of MALL [11], so one more advantage of a possible set of “Continue watching” recommendations is that it facilitates the interrupted viewing of longer (e.g., one-hour) videos by offering a quick way to “jump back in” after an interruption.

Limited encouragement of active learning

While the examined app can clearly be classified as a MALL application given its language learning-centered color system and word review features, it does not highlight the active learning features (word test and word list) to the extent presented in the MUUX-E framework. Prompts for opening a word test while a video was being watched existed, but only P7 (S2) acted on them. Users mentioned the unpredictable behavior of word tests (discussed in General Usability) as a reason for abandoning the word test while watching a video. The discoverability of the learning features was already criticized from the angle of web-based learning above. However, in this context it should also be mentioned that five out of seven users reported regularly completing a word test after a video. P3 (S2) mentioned that he found this feature useful to “consolidate” vocabulary. Nevertheless, these five participants also admitted to not completing more than two to three units per post-video testing session, with P3 explaining that this transcript-based test is too long (“the quantity is quite a bit”). Finally, P7 reported that he missed a larger variety of exercises, an aspect he liked in Duolingo. Word lists were not actively used for learning, with no participant reporting reviewing them regularly (S2). The reason was that a plain list without interactivity did not seem fit for active learning to most participants. P2 (S2) states that he couldn’t “actively learn with it”, expecting a word exercise similar to other MALL apps. P3 (S1) comments that “it’s always nice to go over stuff, but I’m a bit confused as to how to use it”. A second reason found for its underuse was the lack of word list customizability. P7 (S2) wished to create multiple separate word lists himself in order to study them. The only way to create a list was by favoriting words, which resulted in a single collection of favorites.


In other MALL applications, active learning encouragement exists in the form of notifications with learning reminders, easily accessible learning sessions (see previous section) and varied gamified learning exercises with visible rewards (see later).

Missing features to integrate with learning context

Four participants actively used other flashcard-based MALL apps, namely Tinycards, AnkiDroid and Quizlet, to practice Swedish vocabulary. P2 (S2) mentioned he would like a word list export feature that allowed him to transfer his favorite words from Språkplay into a flashcard deck; P9 voiced a similar request, and P1, P3 and P5 (all S2) responded positively to the idea. When asked whether they would prefer more active learning features for word lists in Språkplay itself versus an integration with their existing learning apps, participants’ opinions varied. P2 wished for an extensive export and did not care about the built-in learning features. P9 was doubtful that a built-in solution would offer a user experience similar to the Quizlet app he was using, and in that case would also prefer an export feature. P3 had no strong opinion but could see benefit in both options.

11.1.4 Educational usability

The “Educational usability” category of MUUX-E is concerned with the clarity of learning goals, objectives and outcomes. It further evaluates the way feedback and guidance are given in the learning process, and how progress assessments are included in the app.

Clarity of the color system at first use

A core educational aspect in Språkplay is a system that assigns to each word a status of user knowledge (see the description of the examined application for more details). As a reminder of the meaning: green meant “I know this word”, yellow meant “I want to learn this word now” and red meant “I want to learn this word later”. To leverage the learning functions such as the word list and video word test, it is essential to understand this system. However, when participants were asked to explain the meaning of the colors in their own words (S1), there were mixed results regarding the accuracy of their understanding. Importantly, the variance in these results was connected to whether the participant had read the prior onboarding help text about the color system, and whether this information was retained. The meaning of a green caption underline as a known word was perhaps the most intuitive; only P5 (S1) confessed that she “sees no logic in the color system”, later wrongly interpreting that “the color system is emphasizing the speech”. This can be connected to the observation that P5 skipped the initial help text without reading it. P5 reached a correct understanding only when pointed to the help text labels while changing the word. When asked to find information about this, she did not think to tap the help text menu. But even when the user read the help text, the information was not always retained fully. P1 (S1) stated: “I saw a small help dialog about this, and I know that green must signify that this word should be easy”. He further mentions that he thinks that yellow means “something he kind of already knows”, and red a word he is “completely alien to”. P2 (S1) had a similar conception, saying that “I read somewhere that they go for complexity”. While P1’s and P2’s interpretations are correct in the sense that the initial color determination is based on complexity, they miss the nuance of the “learning” status. Later, P1 (S1) expresses his related confusion when tapping the “favorite” button on a word: “Yellow before was something that I didn’t know that much. But if the star is supposed to be something that I really want to learn, hmm.. [...] It doesn’t seem completely logical”. Importantly, these confusions seem to disappear over time. P7 comments in the second session that “to me the color system is actually pretty OK”. Other results from session 2 show that all users actively switch the colors of words to make them “testable” in the word test, which suggests that the system is usable for learning.
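To make the discussed interplay concrete, the relation between the CEFR-based initial assignment and the three user-facing statuses can be sketched as follows. This is a hypothetical model for illustration only; the actual assignment logic of Språkplay is not documented here, and the level thresholds are assumptions:

```python
from enum import Enum

class WordStatus(Enum):
    KNOWN = "green"      # "I know this word"
    LEARNING = "yellow"  # "I want to learn this word now"
    LATER = "red"        # "I want to learn this word later"

CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

def initial_status(word_level: str, user_level: str) -> WordStatus:
    """Guess a word's initial status from its CEFR level relative to the
    user's self-reported level. The user can override this later, which
    distinguishes an assumed status from a confirmed one."""
    diff = CEFR_ORDER.index(word_level) - CEFR_ORDER.index(user_level)
    if diff <= 0:
        return WordStatus.KNOWN    # at or below the user's level: assumed known
    if diff == 1:
        return WordStatus.LEARNING # one level above: learn now
    return WordStatus.LATER        # further above: learn later
```

Under such a model, a word like begravning (if classified below the user’s level) would be marked green even when the user does not know it, which is exactly the kind of mismatch the participants criticized.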

The clarity of learning goals

To stimulate self-regulated learning monitoring, the application measures learning activity and enables customizable learning goals in three dimensions: words read (1), words learned (2) and learning time (3). When any of these goals is achieved, a streak is won. The streak is discussed further in the User Experience section. These metrics (and the streak) are visible as graphs in the Activity screen, as seen in the description of the application, where they can also be customized. During S1, participants seemed either positive or indifferent towards seeing and setting learning goals, as tracking learning progress would work in a motivating way (P9). Some issues were identified with the learning goals, such as confusion from P7 and P9 about what constitutes a “learned word”: is it a word demonstrated to be known by repeated word test practice, or does a mere switch to green suffice to increase this count? A description next to the progress graph explains that a word counts as “learned” as soon as it is once filled in correctly in a word test, yet the participants overlooked this information. Another issue is the measuring of words “read”: it was unclear to users (and the author) how, or where exactly, the “words read” count was measured. Despite the positive reactions during S1, in S2 only P7 reported actively referring to his learning goals to follow up on his learning. This may be related to the visibility of the goals and the absence of notifications, which are discussed further with the streak in the User Experience section.
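The described goal-and-streak mechanic can be sketched as follows. The three goal dimensions come from the text above; the default target values and the rule that any one met goal sustains the streak are illustrative assumptions, not the app’s documented behavior:

```python
from dataclasses import dataclass

@dataclass
class DailyActivity:
    words_read: int = 0
    words_learned: int = 0
    minutes: int = 0

@dataclass
class Goals:
    words_read: int = 100   # hypothetical default target
    words_learned: int = 5  # hypothetical default target
    minutes: int = 15       # hypothetical default target

def goal_met(day: DailyActivity, goals: Goals) -> bool:
    """A streak day is won when any of the three goals is achieved."""
    return (day.words_read >= goals.words_read
            or day.words_learned >= goals.words_learned
            or day.minutes >= goals.minutes)

def current_streak(history: list, goals: Goals) -> int:
    """Count consecutive goal-met days, ending at the most recent day."""
    streak = 0
    for day in reversed(history):
        if not goal_met(day, goals):
            break
        streak += 1
    return streak
```

A sketch like this also makes the participants’ confusion tangible: whether a “learned word” increments on a word test success or on a manual switch to green is a single counting rule that the interface never surfaced clearly.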

11.1.5 User Experience

This MUUX-E evaluation category focuses on users’ perceptions from a more emotional, feelings-based perspective.

Aesthetic appeal of the application

The initial evaluation reported visual inconsistencies in the application, notably the use of mixed light- and dark-themed screens, the mixed use of custom styles originating from other Språkkraft apps versus styles associated with the SVT brand, and the inconsistent use of certain icon shapes and types for the same functions. Consistency is a long-standing usability principle, and inconsistencies can make a user interface seem messy or confusing [28].


From the participants’ perspective, however, few explicit comments were spontaneously made about the appeal of the application, with some exceptions. P9 found the light green interface accents “ugly”. P2 (S1) commented that the general look and feel made him assume that the application was buggy overall.

Gamified elements as a fulfiller of motivational needs

In literature, the ability of gamification elements to increase engagement with MALL learning has been explained with self-determination theory, a theory that explains intrinsic motivation as a result of three psychological needs: competence (feeling capable of doing something), autonomy (feeling free to choose how to do something) and relatedness (feeling connected with other people) [35]. These needs are incorporated as UX aspects in the MUUX-E framework, which leads this thesis to evaluate the gamification elements of Språkplay in this section. Språkplay includes several gamification elements such as streaks (a visualization of consecutive days that a user was active in the app), learning goals (see Educational Usability) and achievements. All of these elements cater to the “competence” need, as they affirm acquired skills and exerted efforts. App elements related to “autonomy” are, for example, the freedom to choose when and where to watch a video and which video to watch, or the possibility to adjust one’s learning goals. App elements connected to “relatedness” were scarce, as no social features exist except the possibility to share a link to the app. Looking at the competence-based functions, a first problem was identified with the learning streaks. They were reported to not be visible enough for participants to work as a motivating force. P9 (S2) comments that “you see it only if you want to see it”, which does not induce a desire in him to “get the streak of the day”. In S3, all four participants had forgotten about their streaks when asked about them (e.g. “Was there a streak?”, P7). A solution suggested in S2 by P9 was for the app to send notifications reminding the learner to complete their planned learning for the day. P2 echoed that thought. However, when this idea was discussed with P3, P5 and P7, it became clear that such notifications should be optional. P7 mentions that “if there were notifications, I would disable them”. P3 further explains that notification reminders from Duolingo had made him feel guilty for not learning, which he finds an unpleasant way of being encouraged. Another competence-based element, receiving achievements, was viewed as either a positive or a neutral, but invariably surprising experience. P2, P5, and P9 voiced explicit appreciation when they got an achievement in S1. P7 was indifferent towards it. In S2, P9 pointed out that he was missing a way of seeing upcoming achievements, and suggested including more meaningful achievements. So far, achievements are based solely on the word count of viewed captions.

Emotional viewing context

The emotional state and energy of the participants seemed to influence their viewing and learning behavior. Moreover, the app seemed to accommodate both passive and active viewing approaches. P3 mentioned that he appreciated that he “did not feel pressurized to finish things in the application”, in contrast to Duolingo. He further mentioned that he did not always interact with the learning system, but engaged in active learning based on his “mood”: sometimes he just wanted to passively watch for leisure without using active learning features, sometimes he spent more time actively retrieving the meaning of words and completing word tests. P1 called using the app his “before bed learning” activity, mentioning a relaxed and passive approach to consuming video.

11.1.6 Summary of the MUUX-E findings

The mixed-method MUUX-E evaluation results presented in this section revealed desirable aspects of the examined application, but mostly focused on problems experienced by the users. In the General usability category, preventable issues were found in the user test. Furthermore, the initial state of the color system was not fully understood by users at first. Finally, an interaction efficiency problem was spotted in the translation pop-up.

In the category Web-based learning, obstacles were found for the discovery of the word test and word list. Moreover, non-verbatim captions were found to be suboptimal learning material for video-based learning. In the category m-Learning features, the translation pop-up was praised, but personalization features such as the ability to hide captions, export word lists or see personalized recommendations were missing. Active learning opportunities, while present and appreciated, were also not found to be encouraged by the interface.

In the Educational usability category, the color system itself was found to be understood correctly only after users had some experience with the app. Next, the learning goals were found to be unclear. Finally, in the category User Experience, the aesthetic appeal of the application was criticized. Gamified elements were identified as powerful motivators in light of self-determination theory, but lacking in discoverability. The emotional state of the learner was also identified as a factor affecting levels of learning engagement in the app.

The next section will discuss the proposed design that was created to address the issues.

11.2 Design of the improved prototype (Develop)

11.2.1 Low-fidelity prototype sketches

A selection of photographs of low-fidelity paper prototype sketches is displayed below (figure 10 & figure 11). In some cases, information architectures were written down next to the screen designs in an attempt to address the discoverability issues raised in the category “Web-based learning” and elsewhere.


Figure 10: Two variations of the learning screen with information architecture annotations

Figure 11: Variations on the home screen and video testing exercises


Different designs for various elements were inspired by design elements presented in apps such as Spotify, Duolingo, Quizlet and Memrise. These references will be discussed with the high-fidelity design versions in the next section. As an evaluation of the low-fidelity sketches, one design student peer expressed preferences regarding the different versions of the video pre-test and post-test interface elements and the overall navigation architecture of the test process.

11.2.2 High-fidelity prototype

Figure 12: The redesigned Home screen on the left, and Learn screen on the right

Home screen

The home screen, the initial screen seen when the app is opened, was adjusted (see figure 12, left). First, the streak was included in the top bar to address the visibility issue (see 11.1.5 User Experience). Next, a section with horizontally scrollable thumbnails was added (“Continue watching”) that would contain two types of videos: 1) videos that were aborted by closing the phone, and 2) new episodes of series that are watched often. A green progress bar indicates where the viewer left off. This section is a design solution for the issue of missing personalized content (see the m-Learning section). Next, a section is displayed with a call-to-action that leads to a word test. This was included to more visibly encourage active learning, a lack that was identified earlier. Below that follows a section of generic recommendations as they existed before. These were left in the application because they may perform functions of general public interest; for example, at the time this thesis was written, programs about the new coronavirus epidemic were recommended [42].

Navigation bar & information architecture

The navigation bar was redesigned (see figure 12) in an attempt to clarify and simplify the functions of the app. The “discovery” (browse content categories) and “search” pages in the original app were merged, since they afford conceptually similar actions (finding a program) and minimalism in design is a MUUX-E principle. A reference for this was the Spotify app, where a search box is displayed on top of content in a single page. The Search page itself was not designed, in order to save time, as it was not directly related to learning features. Next, a “Learn” page was added. This page is an information-architectural reorganization, aggregating elements related to vocabulary learning from the Profile and video pages. The page makes the existing learning functions more accessible, emphasising their importance, and leaves space for an expansion of learning functions in the Learn screen. This addresses the identified issues of discoverability and lack of active learning encouragement. Finally, the star icon that afforded access to a page with a watch history and favorite programs was renamed to a “Library” page. This was done because all participants had confused the star icon with the starring of words, expecting to find favorited words (a word list) behind it. The contents of the page itself were not changed.

‘Learn’ screen & word list management

As explained before, the Learn page aggregates the learning functions of the app in a single place, displaying the last available video word tests and the word lists. At the top, the streak is repeated from the home page, again to increase its visibility. By highlighting the streak in the Learn screen, it is also implied that the streak is connected to learning activities, which may help users understand the system by recognition (a MUUX-E principle). Next follows a “quick action” button to encourage active learning. Video tests were previously only available from their respective videos, which made them hard to return to when the app was opened with the intention to learn words. The Learn page simultaneously acts as an indexing and management page for word lists, functionality that was moved here from the Profile section to increase discoverability. Additionally, multiple word lists can be created by the user, addressing a raised customization concern. A visualization of colors indicates the learning progress of each list.


Figure 13: The redesigned video watching screen with captions enabled (left) and disabled (right)

Video watching screen

The video watching screen was first reconfigured to resemble the popular interface of YouTube, which is more minimal in terms of video controls (minimalism being a MUUX-E principle) (see figure 13). Top and bottom bars were stripped and the video scrubbing bar was moved to the bottom of the video. This opens up screen real estate without sacrificing recognizability for experienced smartphone users. Next, the video watching toggle settings (auto-translate and auto-pause) were moved to a permanent, horizontally scrollable bar of Material Design “chips”. The labels clarify their functions (their initial unclarity was a problem in the user test). One more function, “Show captions”, was added to address the need that emerged from the initial evaluation. An icon strip was added to open the transcript, bookmark the video and share the video. A more subtle but important change is the extension of the colored caption underlines: they become dashed when the learning status is an assumption based on the initial CEFR level selection. This is a design attempt to distinguish a user-confirmed status (full underline) from an assumed, uncertain and possibly faulty status, in order to communicate uncertainty and avoid the annoyance discussed in the General Usability evaluation. Similarly, words unknown to the system are underlined in gray. Words that are confirmed to be known (full green) are not underlined at all to minimize visual clutter, although this could be made a user setting. Finally, an experimental alternative means of reaching video word tests was designed at the bottom. The full-width bar represents a video timeline, with the bars inside representing the chopped-up transcript units that can be tested. This way, a user can open a testing unit ahead of (pre-test), behind (post-test) or at the current playing location right from the video screen, without navigating through the transcript.
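The proposed underline logic can be summarized in a small decision function. This is a sketch of the design rule described above; the status names and style strings are illustrative, not taken from the prototype’s implementation:

```python
def underline_style(status: str, confirmed: bool) -> str:
    """Map a word's learning status to a caption underline style.

    - unknown to the system   -> gray solid underline
    - confirmed known (green) -> no underline (minimize visual clutter)
    - otherwise               -> colored underline, dashed while the status
                                 is only assumed from the initial CEFR level
    """
    if status == "unknown":
        return "gray solid"
    if status == "green" and confirmed:
        return "none"
    line = "solid" if confirmed else "dashed"
    return f"{status} {line}"
```

Encoding the rule this way highlights its purpose: the dash communicates that the system is guessing, inviting the user to correct it rather than blaming them for the mismatch.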

Collaborative caption quality assurance

Non-verbatim captions were often found annoying in the current app (see Initial evaluation). The media quality is not directly under the control of the principal (the captions come from SVT), but the principal communicated an idea to allow users to report a mismatch in a caption. Swedish-speaking volunteers could then make verbatim transcriptions of the problematic caption sections, which would be pushed to users. In case a section was reported but not yet corrected (made verbatim), an indication of these reports could be shown to other users. This system was not designed due to thesis scope and time constraints, but it was discussed with participants (see Results).

Figure 14: The redesigned word translation pop-up module (left) and “word add” screen (right)


Translation pop-up module

The pop-up module (figure 14) was redesigned to allow for more efficient and clearer operation, an identified inadequacy. First, the pop-up content on an initial tap was reduced to contain only the translation, which disturbs the video context less. Tapping an arrow leads to a pop-up with more details (e.g. grammatical information) if desired. Switching a word’s color is now accomplished by tapping a single icon, instead of toggling a status bar, which was an uncommon UI pattern. This pattern enables much quicker interactions: to switch a color and resume viewing the video, one can hold a word, drag the finger to the desired color, and release. That constitutes one drag motion compared to the 3-6 taps required by the original mechanism. This mechanism was inspired by the “quick reactions” UI pattern in Facebook Messenger. The three differently colored icons also help to signal the possibility of changing the color, and the meaning of each color. Next, a plus icon replaces the prior star icon that caused some confusion. Its role is to open a dialog where the word can be added to custom lists (e.g. “Favorited words”, but also others). It involves more steps than before, but this was necessary to support multiple word lists.

Figure 15: The redesigned word list (left) and word-list export dialog (right)

Word list screens

The word list screen was modified (see figure 15) to include a visual indication of the learning progress for words (color) and an easily visible way to practice the list. This is a design attempt to make word lists more usable for word learning purposes. Tapping any single word opens a detail pop-up as before. Additionally, a feature to export the word list was included as a response to a user request (see m-Learning features). Different formats and API connections are suggested.

Word tests

Figure 16: the word test selection dialog

So far, several locations have been seen where a (video) word test can be started in the prototype. The prior research called for more active learning encouragement (e.g. MUUX-E), more exercise types (from the interviews) and more context in exercises (e.g. [25]). In the new design, each word test is imagined to be either a flashcard exercise or a mixed set of exercises (see figure 16), in a fashion inspired by Duolingo. These were the two modalities of active learning in MALL that the participants were most familiar with. A dialog prompts the user to make a choice for the test. The tested words of these exercises depend on the context where the exercise is started; e.g., yellow words from a selected word list, yellow words from a transcript unit in a video test, or a random selection of all yellow words in the home/learn screen.


Figure 17: an example flash card cycle

Flash cards

The flash cards follow standard interaction mechanisms common to other flashcard-based MALL apps. The card can be flipped by pressing it or the button at the bottom (see figure 17). One peculiarity is the hint of the part of speech for the word on the Swedish side, a feature that P7 (S2) claimed was essential to his vocabulary learning strategy. Another element preserved from the prior word lists is the source sentence from the video, which is displayed to provide context to the exercise. A top bar is intended to show the test progress, the number of yellow words to be tested, and the number of green words learned as the test progresses. A word becomes learned when it has been correctly translated three times. Tests come with a configurable number of questions.
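The “learned after three correct translations” rule can be sketched as a small progress tracker. The reset-on-mistake behavior is an assumption; the design above only specifies the threshold of three correct answers:

```python
class FlashcardProgress:
    """Track correct answers per word; a word counts as 'learned'
    once it has been translated correctly three times."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.correct = {}  # word -> consecutive correct answers

    def record(self, word: str, was_correct: bool) -> bool:
        """Record one answer; return True if the word just became learned."""
        if not was_correct:
            self.correct[word] = 0  # assumption: a mistake resets the count
            return False
        self.correct[word] = self.correct.get(word, 0) + 1
        return self.correct[word] == self.threshold

    def is_learned(self, word: str) -> bool:
        return self.correct.get(word, 0) >= self.threshold
```

The boolean return of `record` is what would drive the yellow-to-green transition (and the “learned words” count) in the progress bar described above.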


Figure 18: the three proposed mixed exercises: 1) fill-in-the-gap, 2) reorder the sentence, 3) write-what-you-hear

Mixed exercises

Three context-based exercises were selected from an analysis of existing exercises in MALL applications (see figure 18). Context-based exercises were preferred to cater to the described critique in literature (see Background) and to optimally utilize the available video context as exercise input (audio and transcript). Next to reading, they also train writing and listening skills in a limited, sentence-based way. First comes the fill-in-the-gap “cloze” test that was already available. Changed in this design is the optional hinting: a hint appears when a gap is tapped, instead of hints hovering permanently, which optionally increases the exercise difficulty. The length of the transcript section to test is also limited, which prevents scrolling errors (see General Usability). The second exercise is a drag & drop sentence construction test. Finally, a listening and writing (spelling) test plays the related audio and asks the user to write an omitted yellow word.
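A minimal sketch of how such a fill-in-the-gap exercise could be generated from a transcript sentence, blanking out only words the user marked yellow. The tokenization and punctuation handling are deliberately simplified; a real implementation would need proper morphological matching for Swedish:

```python
def make_cloze(sentence: str, yellow_words: set):
    """Build a cloze exercise: replace yellow words with gaps and
    return the gapped sentence together with the expected answers."""
    gapped, answers = [], []
    for token in sentence.split():
        core = token.strip(".,!?")          # keep trailing punctuation intact
        if core.lower() in yellow_words:
            answers.append(core)
            gapped.append(token.replace(core, "____"))
        else:
            gapped.append(token)
    return " ".join(gapped), answers
```

For example, `make_cloze("Jag gillar kaffe.", {"kaffe"})` yields `("Jag gillar ____.", ["kaffe"])`, i.e. the exercise sentence plus the answer key that the feedback step (see Feedback and progression) would check against.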


Figure 19: example feedback notice in exercises

Figure 20: post-exercise debrief feedback


Feedback and progression

Explicit and corrective feedback is given in exercises (see figure 19), according to the educational usability principle. After an exercise session is completed, two successive debrief screens remind the learner of their progress (see figure 20): first by reviewing the concrete outcomes of the exercise, then by showing the progress towards the learning goals. In the initial evaluation it was criticized that these were not prominent enough to be motivating. This design solution repeats them more often, together with an explicit reminder about the streak goal. Since tests are modularized, a button is required to allow easy continuation of practice.


12 Results

This chapter describes the results of the final prototype evaluation with users (see Methods). They are reported in a similar way to the initial evaluation results, according to the key categories of MUUX-E (see Prototype development).

12.1 General usability

Figure 21: original video screen (left) vs. designed video screen (right)

Overall, the new design resolved most existing issues related to general usability; however, it also introduced new ones.

Video screen and transcript

In the word test, certain previously found errors were prevented by modularizing the word tests into a set of exercises decoupled from the transcript. As a result, no participant reported still needing a full transcript available behind a permanent button in the video screen. The other buttons in the strip of the transcript (Save and Share, see figure 21) were also not seen as essential, suggesting that they can be removed according to the principle of minimalism. P9: “It is kind of pure, more simple. You have your video and the text and what else do you need?” P2 and P7, however, stated that these features “could be useful” at some point. P9 agreed that they could be moved behind a “more options” menu in the video: “the first thing I would try is click on the [more options icon]. I mean, that's really standard”.


The “Save” function was seen as ambiguous (P2), because it could denote either bookmarking the whole video or bookmarking a timestamp within it.

New translation pop-up

The redesigned translation pop-up (see figure 21) was appreciated for three reasons. P7 liked that it is smaller, which would induce less cognitive load (“the less, the better”). Next, P9 and P2 mentioned that the three icons for switching color are easier to understand (“it's faster for the onboarding experience as well. [...] The first time you tap on the word, you understand what the color means.”). Finally, from a perspective of efficiency, most participants liked that switching a color would be faster (“That’s perfect. [...] That makes it faster”, P7). However, P9 wondered whether a quicker preview would make him learn the words less (“the problem is, maybe you end up not learning because it goes too fast. I don't know. I'm not sure.”). In response, P1 suggested that the pop-up behavior could be configurable (e.g., to control whether it pauses the video or not, and whether the old pop-up is shown or not).

Dashed underlines and hidden green underline

The dashed underlines were received positively, but their meaning was not always immediately clear. The meaning of the dashed underlines (initial word statuses) was not clear to P2; P7, however, correctly interpreted the difference without any help text. P1 liked it “much better”, and P9 mentioned that it is a harmless change in the sense that “it's good for people who understand it, and it doesn't hurt for those who don't”. The participants established that by interacting with words, the dashed underline and color would change, which makes the interface learnable. All participants reported an indifference towards hiding full green underlines. However, P9 noted from experience that “some words are not in the dictionary”, which makes them appear as not underlined in the current app. This could be ambiguous. The design already accounted for this by underlining out-of-dictionary words in gray, which P9 liked. P7 summarized: “I don't really care about the green underlining, but I like the idea of the gray one.”

12.2 Web-based learning principles

Navigational architecture

The reorganization of the menu seems to achieve its goals but evidence is limited. P1 thinks the search button presents “some sort of exploration”, which is correct. The underlying page was not accessible in the prototype. The changes introduced by the Learn screen were not immediately clear to all users experienced with the previous version. P7 thought the word lists could still be found in the Profile section, but selected the Learn screen as a probable second option. That is still better than the indeterminate navigational “guessing” that most users exhibited in the first user test. P2 directly associated the Learn screen with word lists.


Improving media quality by mismatch reporting

All participants responded positively to a caption quality reporting function, saying that they would report mismatched caption sentences if it were available. P9 comments that “you need someone to care about it”, alluding to the possibility that the proposed volunteers might not be able to fix all captions. When asked about their desire to see an indication of often-reported problematic captions with a red background, some considerations came up. P9 likes that it makes reporting problems more useful for other users, recognizing that this is a “way to use this information even if nobody [volunteers] cares”. From a design perspective, P7 noted that the design should avoid confusion with the red color code of the existing learning system, but likes the system otherwise. P2 thinks such an indication would be “helpful not to waste time”, since he reported having “wasted time” by re-listening to a confusing mismatched passage to check whether he had correctly assessed the mismatch.
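The proposed reporting-and-flagging behavior could be sketched as follows; the threshold, class and method names are assumptions for illustration only, as the thesis describes the feature at the design level.

```python
# Sketch of the proposed caption-quality reporting: captions reported by
# enough distinct users are flagged (e.g. shown with a warning background),
# making the reports useful even before a volunteer fixes the caption.
from collections import defaultdict

REPORT_THRESHOLD = 3  # assumed number of distinct reporters before flagging

class CaptionReports:
    def __init__(self) -> None:
        self._reports = defaultdict(set)  # caption id -> set of reporter ids

    def report(self, caption_id: str, user_id: str) -> None:
        # One report per user counts; repeated reports are deduplicated.
        self._reports[caption_id].add(user_id)

    def is_flagged(self, caption_id: str) -> bool:
        return len(self._reports[caption_id]) >= REPORT_THRESHOLD
```

Counting distinct users rather than raw reports would prevent a single user from flagging a caption on their own.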

12.3 m-Learning features

Figure 22: original home screen (left) vs. designed home screen (right)

Personalized “Continue watching”

The affordance of the “Continue watching” section was clear and liked (see figure 22). When asked what kinds of videos they expect to see in the section, P2, P7 and P9 first identify “episodes that you stopped in the middle”. P7 adds: “Or some kind of program you normally follow. For example, I'm used to listen to the news.” Both kinds of episodes were indeed imagined to be presented. P7 prefers the section to the current home screen: “it is definitely better than the setup that the app has right now. That shows you what things are suggested, but it does not mean that those are the ones you follow yourself [...] they're not personalized at all”. In that sense, this element is successful in light of the MUUX-E personalization principle.

Customizable word lists

The customizability of word lists was also positively received, as it had been requested by P7 and P9 in previous interviews. P7 comments that he expects to be able to add a new word list from a word pop-up dialog, an option that was not included in the design. The word list export feature was clear and also enthusiastically received, with the versatility of a .csv export being praised most. This feature had been a prior request of P2 and P9, but P1 also commented: “I like .csv. And .csv I can import to whatever I want”. P2 explained his existing vocabulary export workflow from Rememberry18: “I export [to] .csv, and then I use it in Tinycards.” He looked forward to doing this in Språkplay as well.
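The .csv export praised here could be realized along the following lines; the column layout is an assumption, since the thesis does not specify the exact export format.

```python
# Sketch of a word-list .csv export; the resulting text could be imported
# into third-party flashcard tools (as P2 does with Tinycards).
# The column names are hypothetical.
import csv
import io

def export_word_list(words: list) -> str:
    """Serialize a list of word dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["swedish", "translation", "status"])
    writer.writeheader()
    writer.writerows(words)
    return buf.getvalue()

print(export_word_list([
    {"swedish": "hund", "translation": "dog", "status": "yellow"},
]))
```

A plain header row plus one row per word keeps the format readable by virtually any spreadsheet or flashcard tool, which matches the versatility the participants praised.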

Hiding captions

The ability to hide captions was received positively (see figure 13), with P2 mentioning he “would definitely use it”, especially when he felt more confident that he could understand the words in a particular video (e.g. the easy program Nyheter på lätt svenska) to sharpen his listening skills. P1 wanted to enable it by default. P9 saw it as a way to sometimes escape the implied responsibility he feels to change word colors or retrieve translations of unknown words: “just watch the video fullscreen. No disturbance”.

18 Rememberry is a browser extension that allows users to “translate words and phrases while browsing the web, and easily replenish your foreign languages dictionary using […]”. https://chrome.google.com/webstore/detail/rememberry-translate-and/dipiagiiohfljcicegpgffpbnjmgjcnf


12.4 Educational usability

Figure 23: original word test screen (left) vs. designed cloze test screen (right). Two more exercises were added, see figure 18.

Modularized exercises

Participants liked the increased variety of exercises (see figure 23). P7 comments on the designed flash card system that “I'm not doing flashcards normally, and this looks very good to me”. Furthermore, he appreciates the writing exercise as an addition: “these kinds of exercises where you have to type yourself the word are super difficult but really important.” P1 did not use the word test in the current app, but comments that these exercises “look so great this way and I may even try it out.” He also sees potential in additional speech-recognizing exercises like Duolingo has. However, the interaction between practicing a word in an exercise and the word's status was controversial. The design dictated that words automatically change their status after being correctly filled in three times (as in the current app). P7 did not like this because he wanted to remain in control of the word colors. P9 had a similar preference.

Video-based exercises

Participants liked the bar that afforded pre- and post-tests for video sections (see figure 14), although its purpose was not immediately clear. The colors of the inner bar were not seen as necessary. Pre-testing sections was clearly favored over post-testing as a form of preparation. P7 would like to pretest a section, then check if he “got it” by watching the section afterwards. P9 sees potential in pretesting a section to prime its yellow contents, then watching it while hiding captions to practice his listening skills.

Learning goal reminders

The added reminders of learning goals after exercises were received positively or neutrally. P2 finds the summary “super cool” as it tells him “the time he still has to explore” to reach his learning goal. P7 claims that “It’s OK, but I don’t need it”. The prompt to do a full video test at the end of a video was also either received neutrally or positively, with P9 calling it a “good ending” that he expected to be there, while P7 was indifferent to its presence.

12.5 User Experience

Aesthetic experience

The designed app has a consistent light design throughout all screens, in contrast to the mixed green, dark and light frames of the original app. P2 likes it, saying it feels “properly designed”. However, when asked whether they preferred an overall dark or light design, participants preferred dark, because it is easier on the eyes and better suited for night-time viewing (P7 and P1). All participants agreed that the app should adhere to the system-wide dark-mode switch that is common in recent smartphone operating systems.

Gamified elements

Participants liked the increased visibility of the streak, commenting that it was the first time they noticed it (P7: “Uh, I've never noticed that before? And this way it looks very clear.”). The plan to keep achievements in the app and expand them with different types (not designed) was received well, with P2 relating it to the achievements in his sports app, which he finds a useful motivational aid. He also expressed a wish for a social leaderboard that compares streaks; this was not included in the design. P9 likes the wording on the video debrief screen, such as “Good job!”, which he experiences as motivating.

12.6 Summary

The results presented the outcomes of the final prototype evaluation along the categories of the MUUX-E framework. In terms of the category General Usability, the new translation pop-up was especially appreciated for its increased efficiency, the dashed underlines were not seen as important and the strip with side-functions was seen as unnecessary. In terms of Web-based learning, the participants succeeded in finding their way through the reorganized layout; they also reported interest in a collaborative caption-improvement feature. Regarding the m-Learning features, participants overwhelmingly praised the new Continue watching section, customizable word lists and hideable captions. The word list export was found useful by those who had requested it. In the category Educational usability, modularized exercises were seen as more enticing than the previously available exercises, and video-based exercises were also seen as useful preparation. However, participants desired full control over the under-the-hood interaction between exercises and the color status of words. Extra screens reminding of progression were positively or neutrally received. Finally, in terms of User Experience, the new app layout was received well, but a request for a dynamic dark mode surfaced. Participants also praised the increased visibility of gamified elements such as the streak, but expressed a desire for social gamification features, which were not implemented. The next chapter will, among other things, discuss these results.

13 Discussion

This thesis set out to answer two research questions. The first question inquired about the usability and user experience of novel video-based learning functions found in Språkplay. The second question was more explorative, asking for ways in which such video-based learning functions can be designed to integrate gamification elements, and how they can be evaluated. This chapter connects the findings of this work to the research questions, background and aims of this thesis. Finally, it identifies limitations of the work and points out opportunities for further research. In order to answer the aforementioned questions, the work formulated three goals:

1. To evaluate the usability and user experience of video-based learning functions found in Språkplay
2. To further design video-based learning functions based on insights from design research and MALL precedents
3. To evaluate the produced design in terms of usability and user experience

The process of completing goals 2 and 3 itself would provide one possible way in which video-based learning functions can be designed, answering the second research question. This chapter will address whether these goals were adequately realized by the work, pointing out results and limitations.

13.1 Evaluation of the original app

To evaluate the learning functions (goal 1) the study undertook several steps of qualitative data collection and analysis, including a heuristic walkthrough, user tests and semi-structured interviews. The large number of findings was condensed using conventional and directed coding techniques. The evaluation results exhibit a negative emphasis, with most of the reported points consisting of usability issues, missing features or sub-optimal user experiences. These points should not be mistaken to represent the value or quality of the app, as it has been reported that usability studies tend to overrepresent negative aspects in general [30]. Still, awareness of the negative points proved useful for later design explorations. An important counterweight to the negative points was found as well. When asked about their overall opinion, all participants were satisfied with the core functionality of the app:


(1) having access to controllable, glossed captions, and (2) being able to test those. Regular use of the former feature especially was reported, which can be seen as a UX accomplishment in its own right. This is an encouraging result because this interactive mode of viewing with glossed captions has been identified as advantageous for vocabulary acquisition compared to other video watching strategies [26]. More relations can be drawn to prior research. First, it was found that the participants' desire to actively learn with the application shifted depending on their emotional context and energy. This can be related to the “paradox of effort” identified by Vanderplank [8]: watching TV is often regarded as a leisure activity, which may make it harder to muster the necessary commitment for (active) learning if this is not encouraged. Captions provide such encouragement: they make the learner realize their missing vocabulary knowledge. In that light, the request to be able to temporarily hide captions can be seen as a call for leisurely watching (e.g. P3 and P9), or as a way in which increased control would optimize learning (as seen in the background and suggested by P1, P7). Next, it was found that the word test activity missed a clearly defined step-by-step flow, which led to usability issues. This could be predicted from the findings of Sendurur et al. [32], who identified a step-by-step flow in vocabulary exercises as conducive to good usability.

13.1.1 Limitations

Some limitations should be mentioned regarding the internal validity of the initial evaluation process. First of all, there is the subjectivity of the heuristic walkthrough, as it was conducted solely by the author. For that reason, users were also involved in the evaluation. However, this second method suffers from certain issues as well. Conducting two user sessions with two weeks in between, one for initial experiences and one for reflective evaluation, can be seen as a good practice to evaluate actual usage behavior, as opposed to a single user test. However, the number of seven recurring participants was low, which may have affected the reliability of the findings. Next, the author is aware that most of the participants were (post-)doctoral researchers who spoke at least two and often three languages. It is possible that this high level of education and prior language learning experience influenced the results, limiting possibilities to extrapolate the findings to the intended general population of “foreign students in Swedish higher education”. The unequal distribution of prior experience with the app may have been similarly problematic, in that it may have influenced the first user test results. Nevertheless, these effects were less likely to influence the triangulated final results, because a second interview session was included that sought post-experience feedback from all participants. In terms of the analysis of the data, the subjectivity of a single analyst (the author) should again be mentioned as a weak point reducing internal validity. Moreover, it should be mentioned that the framework used to classify the findings (MUUX-E) was not developed specifically for MALL applications, but for m-learning in general. It may therefore have missed nuanced evaluation criteria relating to language learning in particular.


13.2 Design process and results

For the second and third goals, inspiration was taken from the user-centred Double Diamond framework to organize the design process. The designs were primarily based on the feedback from the initial evaluation, while the author's experience with other MALL and mobile media applications was used as a secondary reference. The author produced a high-fidelity prototype addressing about three-fourths of the usability issues and desires that were encountered. Certain pages, such as the Search, Library and Profile pages, were not designed. These were omitted because they either did not change much compared to the original app, or consisted of minor functions that are common to most mobile apps (e.g., log-in functionality). The design evaluation (goal 3) was intended to be conducted in the format of a focus group with the same participants as the app evaluation. Its aim was to evaluate the perceived usability and UX of the revised and added learning functions in the design. While six participants registered for the one-hour session, only one participant appeared (P2). Three participants joined a rescheduled session (P1, P7, P9). Hence, with the limited number of participants, the evaluation took the form of two semi-structured (group) interviews. In the group of three, relevant discussions between participants still occurred. The evaluation results report that most design changes were either positively or neutrally received, suggesting that a redesign according to MUUX-E principles did improve the perceived usability and user experience of the application. Most notably, the “Continue watching” section and word list customization features were received with unanimous appreciation. Negative feedback existed as well, for example on the designed automatic word color switching in exercises. Several features were not intuitively clear to users and had to be explained, such as the new video word test bar.
In these cases, it can be argued that the designs also introduced new violations against the MUUX-E principles, such as the principle of “minimalism in design”. One instance of such a violation is where P7 and P9 found the new video screen to contain unnecessary elements. Finally, the positive reception of the learning progression visualization after each exercise (a gamification element) was also found in Chen et al. [4], suggesting that this design change might motivate learners to use the app more.

13.2.1 Limitations

Limitations also existed in the design phase. First of all, the completion of only a single design iteration (design & evaluation) can be criticized; in a user-centred design process, more iterations are expected. Here the design process halted after the first evaluation of the new prototype, leaving certain known opportunities for improvement unexplored. This situation was a result of the limited time scope for the thesis and the time spent on the initial evaluation of the existing app. Next, a caveat of the prototype is that it was by its nature a model that did not provide a fully realistic interactive experience. It could thus not be relied upon to evaluate the actual usability and UX of the envisioned product. Instead, perceived usability and UX were evaluated.


In terms of the data analysis, comments similar to those on the initial evaluation can be made relating to its internal validity: there were even fewer participants (now also all male) and the data was still analyzed by the author only. In this case it should also be taken into account that the evaluation was performed by the designer, which more readily opens the door to confirmation bias. This may have been reflected in the conduct of the group interview, the data analysis and the reporting of results.

13.3 Further work

This work leaves several opportunities for further research and development. First of all, the design evaluation results may be iterated upon by following the suggestions from participants to further improve the usability and UX of the designed features. The work also found routes for further design and development that were not explored, such as social gamification features, extended achievements and the design of a collaborative, volunteer-based system for caption quality control. Since this work focused on the usability and UX aspects of vocabulary learning features in a mobile video-based environment, it left out an evaluation of the pedagogical merits of these features. For example, their effects on reading and listening comprehension were not studied. Some prior research has been done on this (e.g. [14]), but the conditions are slightly different in the case of Språkplay, notably because of the added gamification features. Another condition that was not examined in depth is the mobile nature of SVT Språkplay, which raises the question of whether a mobile interface would be used and experienced differently from a desktop or laptop (e.g. web app) version of the same learning functions. Two participants (P2 and P9) expressed a wish for SVT Språkplay to be available on desktop, which points to the relevance of this question. Interestingly, the developer of Språkplay has also made a web app for the educational TV provider UR19. Evaluation of the current designs may also be carried out with different target groups (e.g., immigrant learners not enrolled in higher education). A larger number of study participants and more structured methods, such as quantitative analysis of app usage data and standardised surveys, may also be desirable for increased validity of the findings. Språkkraft intends to refer to this work in upcoming iterations of the design and development of SVT Språkplay.

13.4 Ethics

Ethical considerations with regard to the methods used in this thesis were already described in section 10.4 Ethics. However, the users were not explicitly informed of the data processing practices of the examined app SVT Språkplay. This was considered to be part of the private agreement between the participant and the application publisher SVT. Although usage of SVT Språkplay was necessary for participation in this study, the terms of the data processing agreement20 could be reviewed upon installation. Dropping out of the study for reasons of disagreement with these terms was possible at all times. Finally, data logged by SVT Språkplay about the participants' usage was never requested from Språkkraft, nor was it accessed or used in this thesis work.

19 https://www.ur.se/sprakplay/ has similar functionality to SVT Språkplay
20 https://kontakt.svt.se/guide/about-svt-sprakplay-in-english-and-swedish Under the header Personuppgiftsbehandling, the data processing activities and purposes of SVT Språkplay are described. This information is referred to from within the application.

14 Conclusions and Future work

The first aim of this thesis was to start filling a gap in research by evaluating the usability and user experience of the novel (mobile) video-based learning functions found in Språkplay. It did this by conducting user tests and interviews with seven second language students who used the app over a period of at least two weeks. The initial evaluation results were analysed through the lens of the MUUX-E framework and showed that the core vocabulary learning aids directly integrated into the video watching experience were perceived as useful. Conversely, the gamified learning functions outside of the video watching experience were found to be scarcely used as intended. The second aim of the thesis was to improve the usability and user experience of these learning functions through a user-centred design process, with the ultimate goal of improving learner support and vocabulary acquisition outcomes. The outcomes of this user-centred design process were twofold. First, it produced tentative solutions for selected usability problems found in the prior evaluation. Second, it improved upon the design of the gamified learning functions by adhering to the principles of the MUUX-E framework. Concretely, more varied contextualized vocabulary exercises were designed, more options for user customization were included, and feedback and progress metrics such as “streaks” were highlighted. An evaluation of the design with the same participants as the initial evaluation suggests that these changes improved the perceived usability and user experience of the application. This result suggests that evaluating and designing according to the MUUX-E principles is a viable method for designing usable gamified learning functions for mobile video-based learning environments. Further research could evaluate an implemented end product based on the proposed designs in a real-life setting. In that case, its pedagogical merit should also be evaluated.



15 References

[1] Rula Al Azawi, Mazin Bulshi, and Fatma Farsi. 2016. Educational Gamification Vs. Game Based Learning: Comparative Study.

[2] Gustavo García Botero, Frederik Questier, and Chang Zhu. 2019. Self-directed language learning in a mobile-assisted, out-of-class context: do students walk the talk? Comput. Assist. Lang. Learn. 32, 1–2 (January 2019), 71–97. DOI:https://doi.org/10.1080/09588221.2018.1485707

[3] J. Cassim. 2015. 17 - Issues and techniques in the inclusive design of apparel for the active ageing population. In Textile-Led Design for the Active Ageing Population, Jane McCann and David Bryson (eds.). Woodhead Publishing, 283–305. DOI:https://doi.org/10.1016/B978-0-85709-538-1.00017-1

[4] Chih-Ming Chen, Huimei Liu, and Hong-Bin Huang. 2019. Effects of a mobile game-based English vocabulary learning app on learners’ perceptions and learning performance: A case study of Taiwanese EFL learners. ReCALL 31, 2 (May 2019), 170–188. DOI:https://doi.org/10.1017/S0958344018000228

[5] Council of Europe. 2011. Common European Framework of Reference for Languages: Learning, teaching, assessment. Retrieved May 26, 2020 from http://ebcl.eu.com/wp-content/uploads/2011/11/CEFR-all-scales-and-all-skills.pdf

[6] Design Council. 2015. What is the framework for innovation? Design Council’s evolved Double Diamond. Design Council. Retrieved May 24, 2020 from https://www.designcouncil.org.uk/news-opinion/what-framework-innovation-design-councils-evolved-double-diamond

[7] Sebastian Deterding, Dan Dixon, Rilla Khaled, and Lennart Nacke. 2011. From game design elements to gamefulness: defining “gamification.” In Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments (MindTrek ’11), Association for Computing Machinery, Tampere, Finland, 9–15. DOI:https://doi.org/10.1145/2181037.2181040

[8] Mark Dressman and Randall William Sadler (Eds.). 2019. The Handbook of Informal Language Learning. Wiley. Retrieved February 4, 2020 from https://www.wiley.com/en-gb/The+Handbook+of+Informal+Language+Learning-p-9781119472308

[9] Filippos Giannakas, Georgios Kambourakis, Andreas Papasalouros, and Stefanos Gritzalis. 2018. A critical review of 13 years of mobile game-based learning. Educ. Technol. Res. Dev. 66, 2 (April 2018), 341–384. DOI:https://doi.org/10.1007/s11423-017-9552-z

[10] Patricia-Ann Harpur and Ruth de Villiers. 2015. MUUX-E, a framework of criteria for evaluating the usability, user experience and educational features of m-learning environments. South Afr. Comput. J. 0, 56 (July 2015). DOI:https://doi.org/10.18489/sacj.v56i1.240

[11] Catherine Regina Heil, Jason S. Wu, Joey J. Lee, and Torben Schmidt. 2016. A Review of Mobile Language Learning Applications: Trends, Challenges, and Opportunities. EuroCALL Rev. 24, 2 (September 2016), 32. DOI:https://doi.org/10.4995/eurocall.2016.6402

[12] Hsiu-Fang Hsieh and Sarah E. Shannon. 2005. Three Approaches to Qualitative Content Analysis. Qual. Health Res. 15, 9 (November 2005), 1277–1288. DOI:https://doi.org/10.1177/1049732305276687

[13] Ching-Kun Hsu. 2015. Learning motivation and adaptive video caption filtering for EFL learners using handheld devices. ReCALL 27, 1 (January 2015), 84–103. DOI:https://doi.org/10.1017/S0958344014000214

[14] Ching-Kun Hsu, Gwo-Jen Hwang, Yu-Tzu Chang, and Chih-Kai Chang. 2012. Effects of Video Caption Modes on English Listening Comprehension and Vocabulary Acquisition Using Handheld Devices. (2012), 13.

[15] Daniel R. Isbell, Hima Rawal, Rachelle Oh, and Shawn Loewen. 2017. Narrative Perspectives on Self-Directed Foreign Language Learning in a Computer- and Mobile-Assisted Language Learning Context. Languages 2, 2 (June 2017), 4. DOI:https://doi.org/10.3390/languages2020004

[16] ISO. 2006. ISO 9241-110: Ergonomics of human-system interaction. Part 110: Dialogue principles.

[17] ISO. 2006. ISO 9241-210:2010 Ergonomics of human-system interaction. Part 210: Human-centred design for interactive systems.

[18] Ann Jones, Agnes Kukulska-Hulme, Lucy Norris, Mark Gaved, Eileen Scanlon, Jan Jones, and Andrew Brasher. 2017. Supporting immigrant language learning on smartphones: A field trial. Stud. Educ. Adults 49, 2 (July 2017), 228–252. DOI:https://doi.org/10.1080/02660830.2018.1463655

[19] Claire-Marie Karat, Robert Campbell, and Tarra Fiegel. 1992. Comparison of empirical testing and walkthrough methods in user interface evaluation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’92), Association for Computing Machinery, Monterey, California, USA, 397–404. DOI:https://doi.org/10.1145/142750.142873

[20] Stephen Krashen. 1989. We Acquire Vocabulary and Spelling by Reading: Additional Evidence for the Input Hypothesis. Mod. Lang. J. 73, 4 (1989), 440–464. DOI:https://doi.org/10.2307/326879

[21] Stephen Krashen. 2014. Does Duolingo “Trump” University-Level Language Learning? Int. J. Foreign Lang. (2014), 3.

[22] Agnes Kukulska-Hulme. 2018. Mobile-assisted language learning [Revised and updated version]. In The Concise Encyclopedia of Applied Linguistics, Carol A. Chapelle (ed.). Wiley. Retrieved April 27, 2020 from http://oro.open.ac.uk/57023/

[23] Agnes Kukulska-Hulme and Lesley Shield. 2008. An overview of mobile assisted language learning: From content delivery to supported collaboration and interaction. ReCALL 20, 3 (September 2008), 271–289. DOI:https://doi.org/10.1017/S0958344008000335

[24] Clayton Lewis and John Rieman. 1994. 5. Evaluating the Design With Users. In Task-Centered User Interface Design. 19. Retrieved from http://hcibib.org/tcuid

[25] Shawn Loewen, Dustin Crowther, Daniel R. Isbell, Kathy Minhye Kim, Jeffrey Maloney, Zachary F. Miller, and Hima Rawal. 2019. Mobile-assisted language learning: A Duolingo case study. ReCALL 31, 3 (September 2019), 293–311. DOI:https://doi.org/10.1017/S0958344019000065

[26] Maribel Montero Perez, Elke Peters, Geraldine Clarebout, and Piet Desmet. 2014. Effects of Captioning on Video Comprehension and Incidental Vocabulary Learning. Lang. Learn. Technol. 18, (February 2014), 118–141.

[27] Maribel Montero Perez, Elke Peters, and Piet Desmet. 2018. Vocabulary learning through viewing video: the effect of two enhancement techniques. Comput. Assist. Lang. Learn. 31, 1–2 (January 2018), 1–26. DOI:https://doi.org/10.1080/09588221.2017.1375960

[28] Jakob Nielsen. 10 Heuristics for User Interface Design: Article by Jakob Nielsen. Nielsen Norman Group. Retrieved February 20, 2020 from https://www.nngroup.com/articles/ten-usability-heuristics/

[29] Frank E. Ritter, Gordon D. Baxter, and Elizabeth F. Churchill. 2014. Foundations for Designing User-Centered Systems. Springer London, London. DOI:https://doi.org/10.1007/978-1-4471-5134-0

[30] Frank E. Ritter, Gordon D. Baxter, and Elizabeth F. Churchill. 2014. Methodology III: Empirical Evaluation. In Foundations for Designing User-Centered Systems. Springer London, London, 353–380. DOI:https://doi.org/10.1007/978-1-4471-5134-0_13

[31] Fernando Rosell-Aguilar. 2018. Autonomous language learning through a mobile application: a user evaluation of the busuu app. Comput. Assist. Lang. Learn. 31, 8 (November 2018), 854–881. DOI:https://doi.org/10.1080/09588221.2018.1456465

[32] Emine Sendurur, Esra Efendioglu, Neslihan Yondemir Çaliskan, Nomin Boldbaatar, Emine Kandin, and Sevinç Namazli. 2017. The M-Learning Experience of Language Learners in Informal Settings. International Association for the Development of the Information Society. Retrieved May 11, 2020 from https://eric.ed.gov/?id=ED579194

[33] Statistics Sweden. 2018. The labour market in 2018 for foreign born persons with a higher education. (2018), 40.

[34] (Mark) Feng Teng. 2020. Vocabulary learning through videos: captions, advance-organizer strategy, and their combination. Comput. Assist. Lang. Learn. 0, 0 (February 2020), 1–33. DOI:https://doi.org/10.1080/09588221.2020.1720253

[35] Gustavo Fortes Tondello. 2016. An introduction to gamification in human-computer interaction. XRDS Crossroads ACM Mag. Stud. 23, 1 (September 2016), 15–17. DOI:https://doi.org/10.1145/2983457

[36] Triando and Leena Arhippainen. 2019. Development and User Experiences of the Learn Viena Karelian Mobile Web Game. In 2019 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 465–470. DOI:https://doi.org/10.1109/ICACSIS47736.2019.8979925

[37] Robert Vanderplank. 2016.
Captioned Media in Foreign Language Learning and ​ Teaching: Subtitles for the Deaf and Hard-of-Hearing as Tools for Language Learning. Palgrave Macmillan UK, London. ​ DOI:https://doi.org/10.1057/978-1-137-50045-8 [38] Olga Viberg and Åke Grönlund. 2012. Mobile Assisted Language Learning: A Literature Review. (2012), 8. [39] Scott Wilson, Oleg Liber, Mark Johnson, Phil Beauvoir, Paul Sharples, and Colin Milligan. 2007. Personal Learning Environments: Challenging the dominant design of educational systems. J. E-Learn. Knowl. Soc. 3, 2 (2007), 27–38. ​ ​ [40] 2020. Rosetta Stone. Wikipedia. Retrieved February 13, 2020 from ​ ​ https://en.wikipedia.org/w/index.php?title=Rosetta_Stone&oldid=938876679 [41]2020. Duolingo. Wikipedia. Retrieved February 14, 2020 from ​ ​ https://en.wikipedia.org/w/index.php?title=Duolingo&oldid=940561364 [42] 2020. Språkkraft offers vital information to immigrants and refugees during the coronavirus crisis. Impact Hub Stockholm. Retrieved May 14, 2020 from ​ ​ https://stockholm.impacthub.net/sprakkraft-offers-vital-information-to-immigrants-a nd-refugees-during-the-coronavirus-crisis/ [43] 2020. Spaced repetition. Wikipedia. Retrieved May 10, 2020 from ​ ​ https://en.wikipedia.org/w/index.php?title=Spaced_repetition&oldid=955844736

66

16 Appendix A: The initial survey for study participants

This appendix replicates the Google Form that participants were asked to fill in, including frontmatter. For each multiple-choice question, participants could also enter an alternative reply not listed among the options.

16.1 Hey, thanks for your interest in the Språkplay study!

This follow-up form is here to:
✅ Ask for your consent on the study process
✅ Get to know you a bit better!

A reminder of what we'll do in the study: The idea is that you'll supplement your Swedish learning using the Språkplay app at your own pace in the coming month. We'll schedule 3 remote video calls so I can follow up on how you're using the app, and you can help me improve it.
1. End of March (this week or next week): introduction session (user test) (30 or 15 min.)
2. You can now freely use the app for 2 to 3 weeks at home. Enjoy Swedish TV!
3. Mid-April: interview about your Språkplay learning experiences (30 min.)
4. End of April: interview to discuss a new gamified vocabulary learning feature (1 h)

16.1.1 Do I have your permission?

→ I conduct this study to collect information on how you use or wish to use SVT Språkplay.
→ Your email address will only be used for communication about the study.
→ For later reference, I will record the audio of all the sessions.
→ For later reference, I will ask you to share your mobile screen during our first session so I can see and record how you first use the app. This can be done with the apps Skype or Zoom. Your face will not be recorded.
→ The video or audio recordings will not be shared or published outside of Språkkraft (the non-profit behind Språkplay).
→ Findings from the interviews will only be published in anonymized form.


→ You may withdraw from this study at any time by contacting me at [email protected] or [redacted phone number].

16.2 A bit more about you

Some questions about your language knowledge, learning habits, and background.

● What is your native language? (short answer question)
● What other languages do you speak & at what level? (A1–C2) (short answer question)
  ○ Example: Portuguese A2, German B2. Use this if you don't know → A1: beginner, A2: Elementary, B1: Intermediate, B2: Upper intermediate, C1: Proficient, C2: Mastery (native skill).
● What is your motivation for learning Swedish? (short answer question)
● Did you use Språkplay before? (multiple choice question)
  ○ I have used it before
  ○ I have heard of it before
  ○ I did not know of SVT Språkplay before I was invited to this study
● Where did you first hear about SVT Språkplay? (if you have) (short answer question)
● Did you watch Swedish television/movies/series before? (multiple choice question)
  ○ Yes
  ○ No
● Have you used other apps for language learning? (any language) Which one(s)? (long answer question)
● What is your gender? Explanation: Why? I need this as a standard background variable for my study.
  ○ Female
  ○ Male
  ○ Prefer not to say
  ○ Other…
● What is your age? Explanation: Same story, also an important background variable.


16.3 Done!

After you fill in this form I'll send you an email with:
1. ⬇ Installation instructions for SVT Språkplay
2. A way to schedule our first interview for the end of this week or next week!

17 Appendix B: Semi-structured interview questions

All yes/no questions served as a starting point for follow-up questions such as “What did you think of it?” or “Why not?”, aiming to explore the role the application's features played in each user's learning strategies.

17.1 A. Contextual inquiry

1. In which ways do you practice Swedish?
2. How often do you practice Swedish?
3. How do you learn vocabulary?
4. How long is a typical practice session?

17.2 B. Learning scenario follow-up (examined application)

● What did you like most about the app?
● What do you think should be improved?
● How often did you use Språkplay?
● Which videos did you watch on Språkplay?
● Did you use the auto-translate / auto-pause functionality?
● Did you change the colors of the words while watching?
● Did you save words using the "favorite" functionality?
● Did you take a word test?
● Did you set watching goals?
● Were you motivated by the streak?
● Did you see the achievements?


● What more could this application do to help you learn vocabulary?
● Anything else?

18 Appendix C: Prototype evaluation questions

Semi-structured questions according to the MUUX-E principles.

18.1 Generic Usability

● Is the "+" icon & subsequent "favorites" list (as designed) clearer to you than the current app?
● Are the video control functions (as designed) recognisable to you? (What do you think they mean?)
● What do you think the dashed underlined captions mean? (as designed)
● Would you want to confirm your knowledge of every word?

18.2 Web-based learning principles

● Is “Continue watching” an easy way to access relevant info?
● Where would you find the word list?
● Would you use a "report" functionality (as designed) to report problems in captions?
● Would it be helpful for you to see if people reported that a section has problematic captions (as designed)?
● Would the indication of word colors on video items (as designed) help you select videos that match your level/wishes?

18.3 Educational usability

● How do you want to actively study word lists in the app?
● Are the flashcard/random exercise options (as designed) sufficient? Do you want more customization? If not, why not? If yes, what should be customizable?
● Do you prefer the modularized guided tests (as designed) compared to the transcript-based tests?
● Would you want to pre-test a section of the video while watching it?
● Is the video testing screen (as designed) making these functions clear?

18.4 m-Learning features

● Would you use the "Recent videos" section in the home screen? (as designed)
● Would you use the word list export feature (as designed)?
● Would you construct your own word lists (as designed)? Would you prefer to use the auto-generated ones from videos? (as designed)


● Do you like the available customization options in the video viewer? (as designed) (e.g. no captions)

18.5 User Experience

● Would you be motivated by more kinds of achievements? (as designed)
● Would you be motivated to use the app by a social leaderboard/competition?
● What kind of notification would not make you feel guilty?
● Do you want to see a streak count and/or experience points in the home screen? Would you be motivated by that (connection to gamification)? (as designed)
● In a video-watching app, do you like to have it in dark mode (like Netflix)? Or light mode (as designed)?


TRITA-EECS-EX-2020:423

www.kth.se