Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning
Jieshan Chen (Australian National University, Australia), Chunyang Chen∗ (Monash University, Australia), Zhenchang Xing† (Australian National University, Australia), Xiwei Xu (Data61, CSIRO, Australia), Liming Zhu†‡ (Australian National University, Australia), Guoqiang Li∗ (Shanghai Jiao Tong University, China), Jinshui Wang∗ (Fujian University of Technology, China)

∗Corresponding author. †Also with Data61, CSIRO. ‡Also with University of New South Wales.

ABSTRACT

According to the World Health Organization (WHO), it is estimated that approximately 1.3 billion people live with some form of vision impairment globally, of whom 36 million are blind. Due to their disability, engaging this minority in society is a challenging problem. The recent rise of smartphones provides a new solution by enabling blind users' convenient access to the information and services needed to understand the world. Users with vision impairment can adopt the screen reader embedded in the mobile operating system to read the content of each screen within an app, and use gestures to interact with the phone. However, the prerequisite of using screen readers is that developers have to add natural-language labels to image-based components when they are developing the app. Unfortunately, more than 77% of apps have missing-label issues, according to our analysis of 10,408 Android apps. Most of these issues are caused by developers' lack of awareness and knowledge in considering this minority. And even if developers want to add labels to UI components, they may not come up with concise and clear descriptions, as most of them have no visual impairment themselves. To overcome these challenges, we develop a deep-learning based model, called LabelDroid, to automatically predict the labels of image-based buttons by learning from large-scale commercial apps in Google Play. The experimental results show that our model can make accurate predictions and the generated labels are of higher quality than those from real Android developers.

CCS CONCEPTS

• Human-centered computing → Accessibility systems and tools; Empirical studies in accessibility; • Software and its engineering → Software usability.

KEYWORDS

Accessibility, neural networks, user interface, image-based buttons, content description

ACM Reference Format:
Jieshan Chen, Chunyang Chen, Zhenchang Xing, Xiwei Xu, Liming Zhu, Guoqiang Li, and Jinshui Wang. 2020. Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning. In 42nd International Conference on Software Engineering (ICSE '20), May 23–29, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3377811.3380327

1 INTRODUCTION

Given the millions of mobile apps in Google Play [11] and the App Store [8], smartphones are playing increasingly important roles in daily life. They are conveniently used to access a wide variety of services such as reading, shopping, chatting, etc. Unfortunately, many apps remain difficult or impossible to access for people with disabilities. For example, a well-designed user interface (UI) like the one in Figure 1 often has elements that do not require an explicit label to indicate their purpose to the user. A checkbox next to an item in a task list application has a fairly obvious purpose for sighted users, as does a trash can in a file manager application. However, for users with vision impairment, especially the blind, other UI cues are needed. According to the World Health Organization (WHO) [4], it is estimated that approximately 1.3 billion people live with some form of vision impairment globally, of whom 36 million are blind. Compared with sighted users, they may be even more eager to use mobile apps to enrich their lives, as they need those apps to act as their eyes. Ensuring full access to the wealth of information and services provided by mobile apps is a matter of social justice [54].
Figure 1: Example of UI components and labels.

Figure 2: Source code for setting up labels for the "add playlist" button (which is indeed a clickable ImageView).

Fortunately, mobile platforms have begun to support app accessibility through screen readers (e.g., TalkBack in Android [12] and VoiceOver in iOS [20]) for users with vision impairment to interact with apps. Once developers add labels to the UI elements in their apps, the UI can be read out loud to the user by those screen readers. For example, when browsing the screen in Figure 1 with a screen reader, users will hear the clickable options such as "navigate up", "play", "add to queue", etc. for interaction. The screen readers also allow users to explore the view using gestures, while audibly describing what is on the screen. This is useful for people with vision impairments who cannot see the screen well enough to understand what is there, or to select what they need.

Despite the usefulness of screen readers for accessibility, there is a prerequisite for them to function well, i.e., the existence of labels for the UI components within the apps. In detail, the Android Accessibility Developer Checklist guides developers to "provide content descriptions for UI components that do not have visible text" [6]. Without such content descriptions, the Android TalkBack screen reader cannot provide meaningful feedback to a user for interacting with the app.

Although individual apps can improve their accessibility in many ways, the most fundamental principle is adhering to the platform accessibility guidelines [6]. However, according to our empirical study in Section 3, more than 77% of the 10,408 apps we analyzed miss such labels for their image-based buttons, making those apps inaccessible to blind users. Considering that most app designers and developers have no vision issues themselves, they may lack awareness or knowledge of those guidelines targeting blind users. To assist developers with spotting such accessibility issues, many practical tools have been developed, such as Android Lint [1], Accessibility Scanner [5], and other accessibility testing frameworks [10, 18]. However, none of these tools can help fix the label-missing issues. Even if developers or designers can locate these issues, they may still not know how to add concise, easy-to-understand descriptions to the GUI components for users with vision impairment. For example, many developers may add the label "add" to the button in Figure 2 rather than "add playlist", which is more informative about the action.
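In Android, such a label is supplied through the component's content description, set either with the android:contentDescription attribute in the layout XML or with View#setContentDescription() at runtime. The following is only a minimal sketch of the kind of code Figure 2 illustrates; the activity, layout, view id, and string resource names are hypothetical placeholders, not taken from the paper.

```java
import android.app.Activity;
import android.os.Bundle;
import android.widget.ImageView;

// Minimal sketch of labeling a clickable ImageView for TalkBack. The class,
// layout, view id, and string resource names are illustrative only, not the
// code shown in the paper's Figure 2.
public class PlaylistActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_playlist);

        // An image-based button has no visible text, so the screen reader has
        // nothing to announce unless a content description is set explicitly
        // (here in code; android:contentDescription in the layout XML also works).
        ImageView addPlaylistButton = (ImageView) findViewById(R.id.btn_add_playlist);

        // Describe the action ("add playlist") rather than the icon ("add"),
        // so TalkBack users hear what the button actually does.
        addPlaylistButton.setContentDescription(getString(R.string.add_playlist));
    }
}
```

With such a label in place, TalkBack can announce "add playlist" when the button gains focus; without it, the reader typically falls back to a generic announcement such as an unlabelled button.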
To overcome those challenges, we develop a deep-learning based model to automatically predict the content description. Note that we only target image-based buttons in this work, as these buttons are important proxies for users to interact with apps and cannot be read directly by the screen reader without labels. Given a UI image-based component, our model can understand its semantics based on the collected big data, and return a possible label for components missing content descriptions. We believe that it can not only assist developers in efficiently filling in the content descriptions of UI components when developing the app, but also enable users with vision impairment to access mobile apps. Inspired by image captioning, we adopt a CNN and transformer encoder-decoder for predicting the labels based on the large-scale dataset. The experiments show that our LabelDroid can achieve 65% exact match and a 0.697 ROUGE-L score, which outperforms both state-of-the-art baselines. We also demonstrate that the predictions from our model are of higher quality than those from junior Android developers. The experimental results and the feedback from these developers confirm the effectiveness of our LabelDroid.

Our contributions can be summarized as follows:

• To the best of our knowledge, this is the first work to automatically predict the labels of UI components for supporting app accessibility. We hope this work can draw the community's attention to maintaining the accessibility of mobile apps.
• We carry out a motivational empirical study investigating how well current apps support accessibility for users with vision impairment.
• We construct a large-scale dataset of high-quality content descriptions for existing UI components. We release the dataset to enable other researchers' future research.

2 ANDROID ACCESSIBILITY BACKGROUND

2.1 Content Description of UI Component

Android UI is composed of many different components to achieve the interaction between the app and the users. For each