arXiv:2104.08301v2 [cs.CL] 7 Jul 2021
Text2App: A Framework for Creating Android Apps from Text Descriptions

Masum Hasan¹*, Kazi Sajeed Mehrab¹*, Wasi Uddin Ahmad², and Rifat Shahriyar¹
¹Bangladesh University of Engineering and Technology (BUET)
²University of California, Los Angeles (UCLA)
* Equal contribution

Abstract

We present Text2App, a framework that allows users to create functional Android applications from natural language specifications. The conventional method of source code generation tries to generate source code directly, which is impractical for creating complex software. We overcome this limitation by transforming natural language into an abstract intermediate formal language representing an application with a substantially smaller number of tokens. The intermediate formal representation is then compiled into target source code. This abstraction of programming details allows seq2seq networks to learn complex application structures with less overhead. In order to train sequence models, we introduce a data synthesis method grounded in a human survey. We demonstrate that Text2App generalizes well to unseen combinations of app components and that it is capable of handling noisy natural language instructions. We explore the possibility of creating applications from highly abstract instructions by coupling our system with GPT-3, a large pretrained language model. We perform an extensive human evaluation and identify the capabilities and limitations of our system. The source code, a ready-to-run demo notebook, and a demo video are publicly available at https://github.com/text2app/Text2App.

Natural language description:
  Create an app with a textbox, a button named “Speak”, and a text2speech. When the button is clicked, speak the text in the text box.

Simplified App Representation:
  <complist> <textbox> <button> STRING0 </button> <text2speech> </complist>
  <code> <button1_clicked> <speak> <textbox1text> </speak> </button1_clicked> </code>

Literal Dictionary:
  { “STRING0”: “Speak” }

Figure 1: An example app created by our system that speaks the textbox text on button press. The natural language is machine translated to a simpler intermediate formal language, which is compiled into app source code. Literals are separated before machine translation.

1 Introduction

Mobile application developers often have to build applications from natural language requirements provided by their clients or managers. An automated tool to build functional applications from such natural language descriptions would add significant value to this application development process. For many years, researchers have been trying to generate source code from natural language descriptions (Yin and Neubig, 2017; Ling et al., 2016), with the aspiration to automatically generate full-fledged software systems further down the road. To date, however, the task of source code generation has turned out to be highly difficult: the best deep neural networks, consisting of hundreds of millions of parameters and trained with hundreds of gigabytes of data, fail to achieve an accuracy higher than 20% (Ahmad et al., 2021; Lu et al., 2021). So far, the ambition to produce software automatically from natural language descriptions has remained a distant reality.

In this work, we present Text2App, a novel pipeline to generate Android mobile applications (apps) from natural language (NL) descriptions. Instead of an end-to-end learning-based model, we break down the challenging task of app development into modular components and apply learning-based methods only where necessary.
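The caption of Figure 1 states that literals are separated before machine translation, and Figure 2 shows the quoted button name "Speak" replaced by the placeholder STRING0. The following is a minimal Python sketch of such a separation step, assuming that only double-quoted strings are treated as literals. The function names `separate_literals` and `restore_literals` are hypothetical and are not taken from the Text2App code base; number literals, which the paper also mentions, are not handled here.

```python
import re

# Matches text in straight ("...") or curly (“...”) double quotes.
# Assumption: quoted spans are the only literals we extract.
_QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')

# Matches the placeholders produced below (STRING0, STRING1, ...).
_PLACEHOLDER = re.compile(r"\bSTRING\d+\b")


def separate_literals(description):
    """Replace each quoted string with a STRING<n> placeholder.

    Returns the rewritten description and a dictionary that maps
    every placeholder to the literal it replaced.
    """
    literals = {}

    def _swap(match):
        key = f"STRING{len(literals)}"
        straight, curly = match.group(1), match.group(2)
        literals[key] = straight if straight is not None else curly
        return key

    return _QUOTED.sub(_swap, description), literals


def restore_literals(text, literals):
    """Put the literals back in place of their placeholders.

    Unknown placeholders are left untouched. Literals are restored
    with straight double quotes.
    """
    def _unswap(match):
        key = match.group(0)
        return f'"{literals[key]}"' if key in literals else key

    return _PLACEHOLDER.sub(_unswap, text)


if __name__ == "__main__":
    text = 'Create an app with a textbox and a button named "Speak".'
    processed, literal_dict = separate_literals(text)
    print(processed)
    print(literal_dict)
    print(restore_literals(processed, literal_dict))
```

Keeping literals out of the sequence means the translation model only has to emit a small, fixed placeholder vocabulary, while arbitrary user strings bypass it and are re-introduced afterwards.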
We create a formal language named Simplified App Representation (SAR) to represent an app with a minimal number of tokens, and train a sequence-to-sequence neural network to generate this formal representation from an NL description. Fig. 1 shows an example of a formal representation created from a given NL description. Using a custom-made compiler, we convert the simplified formal representation to the application source code, from which a functional app can be built. We create a data synthesis method and a BERT-based NL augmentation method to synthesize a realistic NL-SAR parallel corpus.

We demonstrate that the compact app representation allows seq2seq models to generate apps from significantly noisy input, and even to predict combinations they have not seen during training. Moreover, we open source our implementation to the community and lay down the groundwork to extend the features and functionalities of Text2App beyond what we demonstrate in this paper.

2 Related Works

Historically, deep learning based program generation tended to focus on generating unit functions or methods from natural language instructions using sequence-to-sequence or sequence-to-tree architectures (Ling et al., 2016; Yin and Neubig, 2017; Brockschmidt et al., 2019; Parisotto et al., 2017; Rabinovich et al., 2017; Ahmad et al., 2021; Lu et al., 2021). The other line of work in program generation that has sparked researchers' interest is generating GUI source code from a screenshot, hand-drawn image, or text description of the GUI (Beltramelli, 2018; Jain et al., 2019; Robinson, 2019; Zhu et al., 2019; Moran et al., 2020; Kolthoff, 2019). These works are limited to generating GUI design only, and do not naturally extend to functionality-based programming. To the best of our knowledge, ours is the first work on developing working software with interdependent functional components from natural language descriptions.

3 Text2App

Text2App is a framework that aims to build operational mobile applications from natural language (NL) specifications. We make this possible by translating a specification to an intermediate, compact, formal representation, which is compiled into the application source code in a later step. This intermediate language helps our system represent an application with a substantially smaller number of tokens, allowing seq2seq models to generate intricate apps in a few decoding steps, a task that would otherwise be unsolvable by current sequential models.

We design a formal language named Simplified App Representation (SAR) that captures the app design, components, and functionalities in a small number of tokens (Section 3.1). We further develop a SAR compiler that converts a SAR to application source code (Section 3.2). Using MIT App Inventor¹, a popular, accessible application development tool, the source code can be compiled to a functional app in a matter of minutes. Training a sequence-to-sequence neural network for translating natural language to SAR requires a parallel NL-SAR corpus. However, human annotation of such a corpus is difficult, and it limits our capability to add new components and functionalities. Instead, we conduct a human survey to understand user perception of text-based app development and app description patterns (Section 3.3), and based on this survey, we create a data synthesis method to automatically generate fluent natural language descriptions of apps along with corresponding SARs (Section 3.4). To make sure our synthetic dataset is not monotonous, we introduce a BERT-based data augmentation method (Section 3.5). Using the synthesized and augmented parallel NL-SAR data, we train multiple sequence-to-sequence neural networks to predict SAR from a given text description of an app, which is then compiled into a functional app (Section 3.6). Fig. 2 describes each step in our natural language to app generation process. We also discuss how a pretrained language model, such as GPT-3, can be used with our system as an external knowledge base for simplifying abstract human instructions (Section 3.7). Our system is built to be modular, where each module is self-contained: independent and with a single, well-defined purpose. This allows us to modify one part of the system without affecting the others and to debug the system to pinpoint any error.

Literals such as strings and numbers are separated during preprocessing and are re-introduced during compilation. Contrary to conventional programming languages, unless a user specifies a detail of an app component, a suitable default is assumed. This allows the user to describe an app more naturally and also removes unnecessary overhead from the sequential model.

¹ https://appinventor.mit.edu/

[Figure 2 diagram: Natural Language Description of App (Create an app with a textbox and a button named "Speak". When ...) -> Tokenized and Pre-processed (create an app with a text box and a button named STRING0 . when ...), with the literal dictionary { "STRING0": "Speak" } set aside -> Seq2seq Neural Network (Encoder, Decoder) -> Simplified App Representation (SAR) (<complist> <textbox> <button> STRING0 </button> </complist> <code> ...) -> SAR Compiler, with { "STRING0": "Speak" } -> MIT App Inventor Source (.scm, .bky) -> MIT App Inventor backend Compiler -> .apk, Ready to Use!]

Figure 2: Text2App Prediction Pipeline. A given text is formatted and passed to a seq2seq network to be translated into SAR. Using a SAR Compiler, it is converted to an App Inventor project, which can be built into an application.

3.1 Simplified App Representation (SAR)

SAR is an abstract, intermediate, formal language that represents a mobile application in our system. We design SAR to be minimal and compact, at the [...]

[...] an animated ball has properties ‘color’, ‘speed’, ‘radius’, etc. Such a property is called an argument (<ARG>). The values of such arguments are indicated by <VAL>. As an example,