An Empirical Study Investigating the Predictors of Software Metric Correlation in Application Code and Test Code

by Daniel Kwame Dapaah Afriyie

A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical and Computer Engineering

Carleton University
Ottawa, Ontario

© 2019, Daniel Kwame Dapaah Afriyie

Abstract

On non-trivial software, the large test code base needs adequate maintenance, just as the application code does. It has often been argued that test code should be simple, with some authors even arguing that a test script should be reduced to a single control flow. If this is indeed what happens in practice, then we believe test code maintenance will be very different from application code maintenance, despite the fact that both are source code. Using a number of large open source software systems, we compare application code and test code by means of a series of well-known source code complexity metrics. The results reveal that application code is, as expected, more complex than test code, but not necessarily much more complex. The study also confirms the assertion that test code is not as simple as it should be, at least as advocated in textbooks, and may therefore be very complex to maintain.

Using complexity metrics to determine the difference, our findings also reveal that the kind of code determines the extent of monotonicity between the number of lines of code (LOC) and cyclomatic complexity (CC). While a number of authors hypothesize and experimentally confirm that CC has a very strong correlation with LOC, justifying the use of LOC in place of CC (and Halstead Effort), we believe that this strong correlation is prevalent only in production code, as the results from test code indicate otherwise.
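The monotonicity between LOC and CC referred to above is typically quantified with a rank correlation such as Spearman's coefficient. The following is a minimal sketch of that computation, not the measurement procedure used in this study: the helper names (`average_ranks`, `pearson`, `spearman`) and all numeric values are assumptions invented for illustration and are not taken from the studied systems.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant metric")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson's correlation applied to average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-method measurements, invented for illustration only.
app_loc = [12, 30, 45, 80, 150, 220]
app_cc = [2, 4, 6, 9, 15, 21]
test_loc = [10, 25, 40, 60, 90, 130]
test_cc = [1, 1, 2, 1, 1, 2]

print("application code rho:", spearman(app_loc, app_cc))
print("test code rho:", spearman(test_loc, test_cc))
```

With these invented values, the application-code pair is perfectly monotonic, while the test-code pair, whose CC stays nearly flat as LOC grows, yields a lower coefficient. The numbers only show how the coefficient behaves and say nothing about the systems analysed in the thesis.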
In test code, there exists a very weak (or almost no) correlation between (a) LOC and CC, (b) Halstead Effort and CC, and (c) LOC and Halstead Effort. We therefore argue that the level of correlation depends on at least three factors, namely the kind of code, the kind of software and the kind of metric. We believe it would be inappropriate to substitute one metric for another on the basis of some generalized correlation hypothesis without considering these factors. The results obtained thus far contradict the notion that some metrics are correlated (that is, if the right factors are not considered). Given the weak monotonicity between CC and LOC, and between CC and Halstead Effort, for test code, we disagree that CC and Halstead metrics are redundant, as some studies suggest. Rather, we advocate the use of CC over LOC (or the use of both, or of cyclomatic density) because of the numerous advantages CC has over LOC: CC is perceived to better reflect cognitive complexity, numerical complexity, adequacy of testing, interdependency and code refactoring, none of which can be accounted for simply by LOC.

Acknowledgements

One great discovery of climbing mountains and winning battles is the realization that there are even greater mountains to climb and tougher battles to win. In doing so, we are faced with hard choices, one of which, at the very least, is flinching back. The true test of our resilience is our ability to endure climbing higher mountains and ultimately win. Without it, many have gone through life without ever having lived. I am therefore grateful to Almighty God that this project is finally completed. A few years ago, a bleary-eyed stare at what lay in the future was not reassuring, but the joining of the dots as time went on gives us the delight of being forever grateful and the assurance to be hopeful. I owe special thanks, most of all, to my supervisor, Prof. Yvan Labiche. Such a great inspiration.
Everything he does, from what is believed to be conceptually complex to the least significant, is tackled with a great deal of perfection and a high level of academic proficiency. A mentor, a father, and a friend. You only get better by heeding his advice. Very helpful, insightful and full of brilliance. I am privileged to work with him. It was a great pleasure to collaborate with Prof. Natalia Stepanova of the School of Mathematics and Statistics, who shared ideas on complicated topics while I was organizing my thoughts on statistical principles and methodologies. I feel so favored to have enjoyed special treatment from Jennifer Poll, Jenna McConnell and Darlene Hebert. Many thanks to them and to the members of the Department and Faculty. I am indebted to the many individuals who generously, and without hesitation, supported me in various ways. The last stages of my master's had moments of turmoil, to say the least. Thankfully, I had my fair share of great and reliable people who stood with me: a mentor, Mr. George Cole; my sweetheart and good friend, Abigail Oduro; and a brother, Eric Obeng. I am grateful for the support and encouragement from my family, Samuel Afriyie, Mercy Afriyie and Isaac Afriyie. Where water fails, blood sticks better. I love you.

Table of Contents

Chapter 1 INTRODUCTION
  1.1 Contribution of this Study
  1.2 Summary of This Research
  1.3 Organization of this Document
Chapter 2 RELATED WORK
Chapter 3 BACKGROUND INFORMATION
  3.1 Cyclomatic Complexity, Lines of Code and Halstead Metrics
  3.2 On the Use of Maintainability Index
  3.3 Measuring Correlation
    3.3.1 Spearman’s Correlation Coefficient
    3.3.2 On Pearson’s Correlation Coefficient
    3.3.3 Kendall’s Correlation Coefficient
    3.3.4 Conclusion on correlation tests
  3.4 Test for Normality: Anderson Darling Test, Shapiro-Wilk, D’Agostino K-squared, Kolmogorov Smirnov
  3.5 Homogeneity of Variance: T-Test and ANOVA
  3.6 Statistical Analysis – Wilcoxon Rank Sum Test
    3.6.1 Kolmogorov Smirnov Tests
Chapter 4 EXPERIMENTAL DESIGN
  4.1 Selected Applications and Rationale
  4.2 Measurement tool
  4.3 Selected metrics
  4.4 Selection of Statistical Methods
Chapter 5 RESULTS (ANALYSIS OF DATA)
  5.1 Statistical Analysis of Extracted Data
    5.1.1 Descriptive statistics on Lines of Code
    5.1.2 Descriptive statistics on Cyclomatic Complexity
    5.1.3 Descriptive statistics on Using Distributions, Graphs and Average Values for Cyclomatic Complexity
  5.2 Inferential Statistics
  5.3 On the Use of Non-Parametric Statistical Tests
  5.4 On Correlation among Software Metric
Chapter 6 THREATS TO VALIDITY
Chapter 7 CONCLUSIONS
REFERENCES
APPENDICES
Appendix A Distributions of cyclomatic complexity values for objects
  A.1 Sample representative distributions
  A.2 Boxplots for all objects
Appendix B : Linear Scale Zoomed In On The