Deep Learning Models in Software Requirements Engineering

Maria Naumcheva
Innopolis University, Innopolis, Russia
[email protected]

Abstract—Requirements elicitation is an important phase of any software project: errors in requirements are more expensive to fix than errors introduced at later stages of the software life cycle. Nevertheless, many projects do not devote sufficient time to requirements. Automated requirements generation can improve the quality of software projects. In this article we accomplish the first step of the research on this topic: we apply a vanilla sentence autoencoder to the sentence generation task and evaluate its performance. The generated sentences are not plausible English and contain only a few meaningful words. We believe that applying the model to a larger dataset may produce significantly better results. Further research is needed to improve the quality of the generated data.

Index Terms—Requirements engineering, variational autoencoder, software requirements.

I. INTRODUCTION

Machine learning (ML) is the field of study that gives computers the ability to learn without being explicitly programmed [1]. ML models make it possible to approach complex problems and large amounts of data. Instead of being explicitly programmed with complex algorithms, they are trained on a training set to learn patterns from the data. Machine learning has been applied to such tasks as anomaly detection [2], [3], car accident detection [4], [5], scene detection [6], image classification [7], hyperspectral image analysis and classification [8], [9], human activity recognition [10], [11], [12], [13], object recognition [14], medical image analysis [15], [16], and machine translation [17], [18], [19].

Deep learning models are ML models composed of multiple levels of non-linear operations, such as neural nets with many hidden layers [20]. Deep learning can be supervised, semi-supervised, or unsupervised. In supervised learning the data in the training set has labels, such as a class name or a target numeric value. Common supervised learning tasks are classification and regression (predicting a target numerical value). In semi-supervised learning only part of the data is labeled. In unsupervised learning the data is unlabeled. Unsupervised learning is computationally more complex and solves harder problems than supervised learning.

Deep generative models are a rapidly evolving area of ML that enables modeling of such complex and high-dimensional data as text, images, and speech. Important examples of deep generative models are Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). GANs and VAEs have been applied to such tasks as image generation [21], image-to-image translation [22], sound synthesis [23], data augmentation [24], music generation, and natural language processing [25].

Requirements elicitation is an important phase of any software project. Errors in requirements are up to 200 times more expensive to fix than errors introduced at later stages of the software life cycle [26]. Nevertheless, with the growing popularity of Agile software development, many projects quickly proceed to the implementation stage without devoting sufficient time to requirements. Undocumented requirements pose threats to software project success: assumptions will have to be made during the development process, and they will not be agreed with the customer. If non-functional requirements are not documented, the developed software may have poor quality.

Automated requirements generation can solve the issue of poorly documented requirements, as it may significantly reduce the time required for requirements elicitation. Automated requirements generation may involve extracting requirements from project documentation, conditional generation of non-functional requirements, or machine translation of requirements in a programming language into a natural language.

Software requirements can be expressed in a natural language, in a semi-formal notation (such as UML), in a graph- or automata-based notation, in mathematical notation, or in a programming language [27]. According to an industrial survey [28], 89% of software projects specify requirements in natural language, so without loss of generality we can say that requirements are texts.

Among deep generative models, Variational Autoencoders, unlike GANs, can work directly with discrete data such as text, so they are a natural fit for text processing. Moreover, they seem more suitable for the task of requirements generation, as they are able to learn the semantic features and important attributes of the input text.

In our work we aim to take the first step on the journey to automated requirements generation. We apply a current generative model to the requirements generation task and outline the results. Although the application of VAEs to natural language generation tasks is of interest to the research community, to the best of our knowledge the application of VAEs in the software requirements engineering domain has not yet been explored.

II. RELATED WORK

A. NLP in Requirements engineering

The applications of natural language processing (NLP) techniques to requirements engineering have been studied for decades. The systematic mapping study [29] identifies the following NLP tasks in requirements engineering: detecting linguistic issues, identifying domain abstractions and concepts, requirements classification, constructing conceptual models, establishing traceability links, and requirements retrieval from existing repositories. However, none of these tasks has been efficiently solved so far. Although the systematic mapping study identified 130 tools for requirements natural language processing, only 15 of them are available online and most of them are not supported anymore. One of the problems with applying NLP techniques to requirements engineering is that there are no large or even medium-size datasets available. Creating datasets is costly, especially for labeled data. Deep generative models do not require annotated datasets, so they may solve problems that are otherwise intractable due to the lack of a sufficient amount of labeled data.

B. VAE for text generation

In the sentence-generating VAE introduced by Bowman et al. [30], the decoder performs sequential word generation, similar to the standard recurrent neural network language model (RNNLM); however, the latent space also captures global characteristics of the sentence, such as its topic or semantic properties. Yang et al. [31] empirically showed that using a dilated CNN decoder instead of the LSTM decoder suggested in [30] can improve VAE performance for text modeling and classification and prevent posterior collapse.

The Semi-supervised Sequential Variational Autoencoder (SSVAE), suggested by Xu et al. in [32], targets the text classification task. The authors suggest feeding the input data label to the decoder at each time step. As a result, the classification accuracy significantly improves. Moreover, the trained model was able to generate plausible data: for the same latent variable z and different sentiment labels, the model generated syntactically similar sentences with opposite sentiment connotations.

ml-VAE, introduced by Shen et al. [33], aims to cope with the task of long text generation. A multi-level LSTM structure is introduced into the VAE in order to capture text features at different levels of abstraction (sentence level, word level).

The task of conditional machine translation is explored in [34], [35], and [36]. A conditional VAE for machine translation explicitly models the input semantics and uses it for generating the translation. Ye et al. [37] proposed the Variational Template Machine (VTM), a modification of the vanilla VAE, for data-to-text generation. VTM has two disentangled latent variables: z, modeling the sentence template, and c, modeling the content. Hu et al. address the problem of controlled text generation by combining a VAE with attribute discriminators. In the vanilla VAE the data is generated conditioned on the latent code z, which has entangled dimensions. The proposed model combines z with a structured code c targeting sentence attributes. The resulting disentangled representation enables generation of sentences with given attributes [38].

III. METHOD

A. Model

The Variational Autoencoder, introduced in [39], is an unsupervised machine learning generative model that consists of two connected networks, an encoder and a decoder. The encoder encodes the input as a distribution over the latent space, while the decoder generates new data by reconstructing data points sampled from this distribution. The objective function has the form [39]:

$$\mathcal{L}(\theta, \phi; x^{(i)}) = -D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right) + \frac{1}{L}\sum_{l=1}^{L} \log p_\theta\!\left(x^{(i)} \mid z^{(i,l)}\right)$$
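The paper does not include source code for this objective; the following is a minimal TensorFlow sketch, not the author's implementation, of how the two terms could be computed for a batch of token sequences, assuming a diagonal Gaussian approximate posterior parameterized by z_mean and z_log_var and a softmax decoder over word indices. All names and shapes are placeholder assumptions.

```python
import tensorflow as tf

def vae_loss(tokens, decoder_logits, z_mean, z_log_var):
    """Monte Carlo estimate of the (negated) VAE objective for minimization:
    KL(q_phi(z|x) || p_theta(z)) - E_q[log p_theta(x|z)],
    with a standard normal prior and a categorical decoder over the vocabulary."""
    # Reconstruction term: token-level cross-entropy summed over the sequence.
    reconstruction = tf.reduce_sum(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=tokens, logits=decoder_logits),
        axis=-1)
    # Closed-form KL divergence between N(mu, sigma^2 I) and N(0, I).
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
    return tf.reduce_mean(reconstruction + kl)
```

In this reading, the reconstruction term is the cross-entropy of the decoder's word predictions, and the KL term regularizes the latent codes towards the standard normal prior.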
Our model is a sentence-generating VAE, based on a sequence-to-sequence RNN architecture [30], [40]. Recurrent Neural Networks are used to handle sequential data and can be utilized for inputs or outputs of variable length. The model consists of a bidirectional LSTM encoder that maps the incoming sentence to latent random variables, a normal RNN variational inference module, and a recurrent LSTM decoder that receives the latent representation as input at every step. Pre-trained GloVe word embeddings [41] are used for word vector representation. The model architectural diagram is presented in Fig. 1.

[Fig. 1. Model architectural diagram]
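Layer-level details beyond the description above and Fig. 1 are not given in the paper; the following Keras sketch shows one plausible realization of such an encoder-decoder setup (a bidirectional LSTM encoder producing Gaussian latent parameters, and an LSTM decoder that receives the repeated latent vector at every time step). All dimensions, layer sizes, and names are illustrative assumptions; a GloVe embedding matrix would be loaded into the Embedding layer.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

vocab_size, max_len, embed_dim, latent_dim = 5000, 30, 100, 64  # illustrative values

# Encoder: embedded tokens -> bidirectional LSTM -> Gaussian latent parameters.
encoder_inputs = layers.Input(shape=(max_len,), dtype="int32")
embedded = layers.Embedding(vocab_size, embed_dim, mask_zero=True)(encoder_inputs)
hidden = layers.Bidirectional(layers.LSTM(128))(embedded)
z_mean = layers.Dense(latent_dim)(hidden)
z_log_var = layers.Dense(latent_dim)(hidden)

# Reparameterization trick: z = mu + sigma * epsilon, epsilon ~ N(0, I).
def sample_z(args):
    mu, log_var = args
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

z = layers.Lambda(sample_z)([z_mean, z_log_var])

# Decoder: the latent vector is fed to the LSTM at every time step.
repeated_z = layers.RepeatVector(max_len)(z)
decoder_hidden = layers.LSTM(128, return_sequences=True)(repeated_z)
decoder_logits = layers.TimeDistributed(layers.Dense(vocab_size))(decoder_hidden)

vae = Model(encoder_inputs, [decoder_logits, z_mean, z_log_var])
```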
B. Dataset

Requirements texts have a specific structure: they are not the same as sentences in fiction books or news articles. One of the industry standards is so-called "shall statements", e.g., "The system shall notify the user when no matching product is found on the search." Other requirements keywords, such as "should", "will", and "must", may be used instead of "shall". Another industrial practice is to specify requirements as use cases. In the experiment section we do not consider use cases as requirements, for the following reasons. First, requirements formulated as shall statements have a common structure, and each requirement (sentence) describes one atomic unit of functionality (or a non-functional constraint), while some parts of use cases cannot be treated as atomic requirements (such as, for example, stakeholders). Second, we believe that use cases are not a proper way to specify a system: they may serve as test cases for the requirements, but not as the requirements themselves (a detailed discussion of the topic is presented in [42]).

For the VAE implementation we used the requirements dataset created for the ReqExp project, led by Sadovykh A., Ivanov V., and Naumchev A., trimmed of the projects with poorly specified requirements and enriched with data that we collected on our own from publicly available software requirements specifications. The dataset was checked for correctness; duplicated entries were removed, and single requirements consisting of several sentences were split into several requirements, trimmed of excessive text, or removed. The resulting dataset has about 3000 entries. Each entry is one sentence. The average length of one requirement is about 21 words. An example of dataset entries is listed in Table I.

TABLE I: Dataset excerpt

Requirements:
The system should be able to create test environment of weborder system.
The system hardware shall be fixed and patched via an internet connection.
Yoggie shall coordinate on future enhancement and features with our organization.

The data was preprocessed with the Keras Tokenizer class. The Tokenizer class turns each sentence into a sequence of integers (word indices in a dictionary). All punctuation is removed. The num_words argument sets the maximum number of words: only the (num_words - 1) most frequent words are kept.
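As an illustration of this preprocessing step, a minimal sketch with the Keras Tokenizer is shown below; the vocabulary size and the padded sequence length are placeholder values, since the paper does not report them, and the example sentences are taken from Table I.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

requirements = [
    "The system should be able to create test environment of weborder system.",
    "The system hardware shall be fixed and patched via an internet connection.",
]

# Keep only the (num_words - 1) most frequent words; punctuation is stripped by default.
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(requirements)
sequences = tokenizer.texts_to_sequences(requirements)

# Pad the integer sequences to a fixed length so that sentences can be batched.
padded = pad_sequences(sequences, maxlen=30, padding="post")
print(padded.shape)  # (2, 30)
```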

IV. EVALUATION AND FUTURE WORK

After training for 100 epochs, which took about 1.5 hours, we checked the ability of the trained model to generate requirements. To do so, we interpolated in the latent space between two requirements. The generated sentences are presented in Table II.

TABLE II: Requirements generated by the model

Real sentence: Customers shall be able to confirm the order after checkout
Generated sentence: the system shall shall shall to to to to the the the

Real sentence: Users shall be able to turn the sound on or off at any stage during the game
Generated sentence: the shall shall to to the the the the the the the the the the

Real sentence: The system must allow only admin-users to set up user profiles and allocate users to groups
Generated sentence: the shall shall to the the the the the the the the the the the

Real sentence: The system should support multilingual interface
Generated sentence: the shall shall shall the the

Real sentence: the system should be developed on open standards
Generated sentence: the system shall shall shall to to to the the

As we see, the only meaningful words that appear in the generated requirements are "system" and "shall". The rest are the stop words "the" and "to". At the same time, "system" and "shall" are indeed the most important words in software requirements. As the dataset was relatively small (about 3000 entries), we believe that better results could be achieved on a substantially larger dataset. Nevertheless, we have to admit that the current results of the model are not satisfactory.

We see the following future work directions. First, and most important, collect a substantially larger software requirements dataset. Second, experiment with the existing model on a larger dataset. Third, experiment with other VAE models, such as ml-VAE, conditional VAE, the Variational Template Machine, and the VAE for controlled text generation.
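For illustration, the latent-space interpolation used to produce the sentences in Table II could look roughly as follows. The paper does not describe the exact procedure or model interfaces, so the linear interpolation scheme, the use of posterior means as latent codes, and the encoder, decoder, and tokenizer objects below are assumptions.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def interpolate_requirements(encoder, decoder, tokenizer, sent_a, sent_b,
                             steps=5, max_len=30):
    """Decode sentences along a straight line between the latent codes of two requirements."""
    seqs = pad_sequences(tokenizer.texts_to_sequences([sent_a, sent_b]),
                         maxlen=max_len, padding="post")
    z_mean, _ = encoder.predict(seqs)  # assumed encoder outputs: (z_mean, z_log_var)
    index_word = {i: w for w, i in tokenizer.word_index.items()}
    sentences = []
    for alpha in np.linspace(0.0, 1.0, steps):
        z = (1.0 - alpha) * z_mean[0] + alpha * z_mean[1]
        logits = decoder.predict(z[np.newaxis, :])  # assumed shape: (1, max_len, vocab)
        tokens = logits[0].argmax(axis=-1)
        sentences.append(" ".join(index_word.get(int(t), "") for t in tokens if t > 0))
    return sentences
```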
REFERENCES

[1] A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3):210–229, 1959.
[2] Adín Ramírez Rivera, Adil Khan, Imad Eddine Ibrahim Bekkouch, and Taimoor Shakeel Sheikh. Anomaly detection based on zero-shot outlier synthesis and hierarchical feature distillation. IEEE Transactions on Neural Networks and Learning Systems, 2020.
[3] Kirill Yakovlev, Imad Eddine Ibrahim Bekkouch, Adil Mehmood Khan, and Asad Masood Khattak. Abstraction-based outlier detection for image data. In Proceedings of SAI Intelligent Systems Conference, pages 540–552. Springer, 2020.
[4] Mikhail Bortnikov, Adil Khan, Asad Masood Khattak, and Muhammad Ahmad. Accident recognition via 3d cnns for automated traffic monitoring in smart cities. In Science and Information Conference, pages 256–264. Springer, 2019.
[5] Elizaveta Batanina, Imad Eddine Ibrahim Bekkouch, Youssef Youssry, Adil Khan, Asad Masood Khattak, and Mikhail Bortnikov. Domain adaptation for car accident detection in videos. In 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), pages 1–6. IEEE, 2019.
[6] Stanislav Protasov, Adil Mehmood Khan, Konstantin Sozykin, and Muhammad Ahmad. Using deep features for video scene detection and annotation. Signal, Image and Video Processing, 12(5):991–999, 2018.
[7] Muhammad Ahmad, Adil Mehmood Khan, Manuel Mazzara, Salvatore Distefano, Mohsin Ali, and Muhammad Shahzad Sarfraz. A fast and compact 3-d cnn for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters, 2020.
[8] Muhammad Ahmad, Mohammed A Alqarni, Adil Mehmood Khan, Rasheed Hussain, Manuel Mazzara, and Salvatore Distefano. Segmented and non-segmented stacked denoising autoencoder for hyperspectral band reduction. Optik, 180:370–378, 2019.
[9] Muhammad Ahmad, Adil Mehmood Khan, Manuel Mazzara, and Salvatore Distefano. Multi-layer extreme learning machine-based autoencoder for hyperspectral image classification. In VISIGRAPP (4: VISAPP), pages 75–82, 2019.
[10] Yuriy Gavrilin and Adil Khan. Across-sensor feature learning for energy-efficient activity recognition on mobile devices. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2019.
[11] Konstantin Sozykin, Stanislav Protasov, Adil Khan, Rasheed Hussain, and Jooyoung Lee. Multi-label class-imbalanced action recognition in hockey videos via 3d convolutional neural networks. In 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pages 146–151. IEEE, 2018.
[12] Adil Mehmood Khan, Young-Koo Lee, Sungyoung Lee, and Tae-Seong Kim. Accelerometer's position independent physical activity recognition system for long-term activity monitoring in the elderly. Medical & Biological Engineering & Computing, 48(12):1271–1279, 2010.
[13] Adil Mehmood Khan, Young-Koo Lee, Sungyoung Y Lee, and Tae-Seong Kim. A triaxial accelerometer-based physical-activity recognition via augmented-signal features and a hierarchical recognizer. IEEE Transactions on Information Technology in Biomedicine, 14(5):1166–1172, 2010.
[14] Adil Khan and Khadija Fraz. Post-training iterative hierarchical data augmentation for deep networks. Advances in Neural Information Processing Systems, 33, 2020.
[15] Maxim Gusarev, Ramil Kuleev, Adil Khan, Adín Ramírez Rivera, and Asad Masood Khattak. Deep learning models for bone suppression in chest radiographs. In 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pages 1–7. IEEE, 2017.
[16] Anton Dobrenkii, Ramil Kuleev, Adil Khan, Adín Ramírez Rivera, and Asad Masood Khattak. Large residual multiple view 3d cnn for false positive reduction in pulmonary nodule detection. In 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pages 1–6. IEEE, 2017.
[17] Albina Khusainova, Adil Khan, and Adín Ramírez Rivera. Sart - similarity, analogies, and relatedness for tatar language: New benchmark datasets for word embeddings evaluation. arXiv preprint arXiv:1904.00365, 2019.
[18] Albina Khusainova, Adil Khan, Adín Ramírez Rivera, and Vitaly Romanov. Hierarchical transformer for multilingual machine translation. arXiv preprint arXiv:2103.03589, 2021.
[19] Aidar Valeev, Ilshat Gibadullin, Albina Khusainova, and Adil Khan. Application of low-resource machine translation techniques to russian-tatar language pair. arXiv preprint arXiv:1910.00368, 2019.
[20] Yoshua Bengio. Learning Deep Architectures for AI. Now Publishers Inc, 2009.
[21] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
[22] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1125–1134, 2017.
[23] Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, and Adam Roberts. Gansynth: Adversarial neural audio synthesis. arXiv preprint arXiv:1902.08710, 2019.
[24] Antreas Antoniou, Amos Storkey, and Harrison Edwards. Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340, 2017.
[25] Jie Gui, Zhenan Sun, Yonggang Wen, Dacheng Tao, and Jieping Ye. A review on generative adversarial networks: Algorithms, theory, and applications. arXiv preprint arXiv:2001.06937, 2020.
[26] Barry W. Boehm. Software Engineering Economics, pages 99–150. Springer Berlin Heidelberg, Berlin, Heidelberg, 2001.
[27] Jean-Michel Bruel, Sophie Ebersold, Florian Galinier, Alexandr Naumchev, Manuel Mazzara, and Bertrand Meyer. The role of formalism in system requirements (full version). arXiv preprint arXiv:1911.02564, 2019.
[28] Samuel A Fricker, Rainer Grau, and Adrian Zwingli. Requirements engineering: best practice. In Requirements Engineering for Digital Health, pages 25–46. Springer, 2015.
[29] Liping Zhao, Waad Alhoshan, Alessio Ferrari, Keletso J Letsholo, Muideen A Ajagbe, Erol-Valeriu Chioasca, and Riza T Batista-Navarro. Natural language processing (nlp) for requirements engineering: A systematic mapping study. arXiv preprint arXiv:2004.01099, 2020.
[30] Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349, 2015.
[31] Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, and Taylor Berg-Kirkpatrick. Improved variational autoencoders for text modeling using dilated convolutions. In International Conference on Machine Learning, pages 3881–3890. PMLR, 2017.
[32] Weidi Xu, Haoze Sun, Chao Deng, and Ying Tan. Variational autoencoder for semi-supervised text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
[33] Dinghan Shen, Asli Celikyilmaz, Yizhe Zhang, Liqun Chen, Xin Wang, Jianfeng Gao, and Lawrence Carin. Towards generating long and coherent text with multi-level latent variable models. arXiv preprint arXiv:1902.00154, 2019.
[34] Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, and Min Zhang. Variational neural machine translation. Pages 521–530, 2016.
[35] Artidoro Pagnoni, Kevin Liu, and Shangyan Li. Conditional variational autoencoder for neural machine translation. arXiv preprint arXiv:1812.04405, 2018.
[36] Jinsong Su, Shan Wu, Deyi Xiong, Yaojie Lu, Xianpei Han, and Biao Zhang. Variational recurrent neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[37] Rong Ye, Wenxian Shi, Hao Zhou, Zhongyu Wei, and Lei Li. Variational template machine for data-to-text generation, 2020.
[38] Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P Xing. Toward controlled generation of text. In International Conference on Machine Learning, pages 1587–1596. PMLR, 2017.
[39] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[40] Giancarlo Nicola. Text generation with a variational autoencoder. https://nicgian.github.io/text-generation-vae/, 2018.
[41] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[42] Bertrand Meyer. Uml: The positive spin. American Programmer, 1997.