Measuring Architectural Degeneration – in Systems Written in the Interpreted Dynamically Typed Multi-Paradigm Language Python
Linköping University | Department of Computer and Information Science
Master thesis, 30 ECTS | Datateknik
2019 | LIU-IDA/LITH-EX-A--19/059--SE

Measuring Architectural Degeneration – In Systems Written in the Interpreted Dynamically Typed Multi-Paradigm Language Python

Anton Mo Eriksson & Hampus Dunström

Supervisor: Anders Fröberg
Examiner: Erik Berglund

Linköpings universitet
SE–581 83 Linköping
+46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes.
Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Anton Mo Eriksson & Hampus Dunström

Abstract

Architectural degeneration is an ever-present threat to software systems, regardless of the domain or the tools used. This thesis focuses on architectural degeneration in systems written in multi-paradigm, run-time evaluated languages like Python. To our knowledge, this is the first investigation of its kind to focus on Python; the thesis therefore investigates whether the methods for measuring architectural degeneration also apply to run-time evaluated languages like Python, as other researchers believe. Those researchers, in contrast to us, have only studied this phenomenon in systems written in compiled languages such as Java, C, C++ and C#. In our research, a tool called PySmell has been developed to recover architectures and identify the presence of architectural smells in a system. PySmell has been used and evaluated on three different projects: Django, Flask and PySmell itself. The results of PySmell are promising and of great interest, but need further investigation and fine-tuning to reach the same level as the architectural recovery tools available for compiled languages.
The thesis presents the first step into this new area of detecting architectural degeneration in interpreted languages, revealing issues such as that of extracting dependencies and how it may affect architectural smell detection.

Acknowledgments

We would like to thank our supervisors from both Linköping University and FindOut Technologies for believing in us and supporting us when everything did not go our way. Anders Fröberg at Linköping University has always been there for us, especially for questions regarding the report and academic requirements. David Lindahl and Nils Kronqvist at FindOut Technologies, thank you for our weekly meetings, where you have helped us keep a continuous workflow and given us another perspective on our work. We are also thankful to Marco Kuhlmann, a machine learning and language technology researcher at Linköping University, for his time and insights into our work in machine learning. Last but not least, Helena Gällerdal, you have made our time at FindOut Technologies brighter with your energy and commitment towards us, thank you!

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
    1.1 Purpose
    1.2 Research Question
    1.3 Delimitations
    1.4 Expected Results
    1.5 FindOut Technologies
2 Theory
    2.1 Architecture
    2.2 Architectural Degeneration
    2.3 Architectural Smells
        2.3.1 Dependency Cycles
        2.3.2 Link Overload
        2.3.3 Unused Interface
        2.3.4 Sloppy Delegation
        2.3.5 Concern Overload
    2.4 Threshold
    2.5 Architecture Recovery
        2.5.1 Architecture Recovery using Concerns
    2.6 Text Mining
        2.6.1 Latent Dirichlet Allocation
        2.6.2 N-gram model
        2.6.3 Term Frequency-Inverse Document Frequency
        2.6.4 Stochastic Gradient Descent
        2.6.5 Logistic Regression
    2.7 Clustering
        2.7.1 Agglomerative Clustering
        2.7.2 K-Medoids
    2.8 Distance Measurements
        2.8.1 Jaccard Distance
        2.8.2 Jensen-Shannon Distance
        2.8.3 Combined Distance
    2.9 Multi-Paradigm Programming Languages
        2.9.1 Python
        2.9.2 Abstract Syntax Tree
3 Related Work
    3.1 Architectural Degeneration
    3.2 Architectural Recovery
4 Method
    4.1 Developing PySmell
    4.2 Architecture Recovery
        4.2.1 Structural Information Extraction
        4.2.2 Concern Extraction
        4.2.3 Brick Recovery
    4.3 Measuring Architectural Smells
        4.3.1 Dependency Cycles
        4.3.2 Link Overload
        4.3.3 Unused Interfaces/Entities
        4.3.4 Sloppy Delegation
        4.3.5 Concern Overload
    4.4 Validation
        4.4.1 Parameters
    4.5 Analysing the Architectural Degeneration
5 Result
    5.1 PySmell
    5.2 Investigated Projects
    5.3 Architectural Smells
        5.3.1 Parameters
        5.3.2 Dependency Cycles
        5.3.3 Link Overload
        5.3.4 Unused Entities
        5.3.5 Sloppy Delegation
        5.3.6 Concern Overload
    5.4 Validation
        5.4.1 Parameters
        5.4.2 Architectural Recovery
        5.4.3 Architectural Smells
6 Discussion
    6.1 Method
        6.1.1 ARC
        6.1.2 Structural Extraction
        6.1.3 Concern Extraction
        6.1.4 Distance Measurement
        6.1.5 Clustering
        6.1.6 Measuring Architectural Smells
        6.1.7 Parameters
        6.1.8 Threats to Validity
    6.2 Result
        6.2.1 Dependency Cycles
        6.2.2 Link Overload
        6.2.3 Unused Entities
        6.2.4 Sloppy Delegation
        6.2.5 Concern Overload
        6.2.6 Validation
    6.3 Research Question
7 Conclusion