Constrained Clustering Approach to Aid in Remodularisation of Object-Oriented Software Systems

Total Page:16

File Type:pdf, Size:1020Kb

Constrained Clustering Approach to Aid in Remodularisation of Object-Oriented Software Systems CONSTRAINED CLUSTERING APPROACH TO AID IN REMODULARISATION OF OBJECT-ORIENTED SOFTWARE SYSTEMS CHONG CHUN YONG Malaya of FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY UNIVERSITY OF MALAYA KUALA LUMPUR University 2016 CONSTRAINED CLUSTERING APPROACH TO AID IN REMODULARISATION OF OBJECT-ORIENTED SOFTWARE SYSTEMS CHONG CHUN YONG Malaya of THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY UNIVERSITY OF MALAYA KUALA LUMPUR University 2016 UNIVERSITY OF MALAYA ORIGINAL LITERARY WORK DECLARATION Name of Candidate: Chong Chun Yong Registration/Matric No: WHA130005 Name of Degree: Doctor of Philosophy Title of Project Paper/Research Report/Dissertation/Thesis: Constrained Clustering Approach to aid in Remodularisation of Object-oriented Software Systems Field of Study: Software Engineering I do solemnly and sincerely declare that: (1) I am the sole author/writer of this Work; (2) This Work is original; (3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the WorkMalaya and its authorship have been acknowledged in this Work; (4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes anof infringement of any copyright work; (5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya (“UM”), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained; (6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM. Candidate’s Signature Date: University Subscribed and solemnly declared before, Witness’s Signature Date: Name: Designation: ABSTRACT Effective execution of software maintenance requires knowledge of the detailed working of software. The structure of a software, however, may not be clear to software maintainers because it is poorly designed or, worse, there is no updated software documentation. To effectively address this issue, researchers have proposed to apply software clustering to help in recovering a high-level semantic representation of the software design by grouping sets of collaborating software components into meaningful subsystems. This high-level semantic representation serves to help bridge the dichotomy between the perceived software design from the maintainers’ view and the actual code structure. However, software clustering is typically conducted in an unsupervised and rigid manner, where maintainers have no influence Malayaon the clustering results and only a single solution is produced for any given dataset. Even if maintainers possess additional information that could be useful to guide andof improve the clustering results, traditional clustering algorithms have no way to take advantage of this information. These practical concerns have led the researcher to propose the idea of integrating domain knowledge into traditional unsupervised clustering algorithms, herewith referred as constrained clustering, a semi-supervised clustering technique where domain experts can explicitly exert their opinions in the form of explicit clustering constraints to restrict whether a pair of software components should or should not be clustered into the same subsystem. Apart fromUniversity the explicit clustering constraints from domain experts, other sources of information to guide and improve clustering results can be derived implicitly from the source code itself. To help maintainers effectively identify and interpret the implicit information hidden in the source code, this study proposes representing software using weighted complex network in conjunction with graph theory to help in understanding and analysing the structure, behaviour, as well as the complexity of the software components and their ii relationships from the graph theory’s point of view. The results of the analysis can be subsequently converted into implicit clustering constraints. Hence, maintainers can make use of both the explicit and implicit constraints to help in creating a high-level semantic representation of the software design that is coherent and consistent with the actual code structure. This thesis proposes a constrained clustering approach to aid in remodularisation of poorly designed or poorly documented object-oriented software systems. The source code of an object-oriented software system is first converted into UML class diagrams. Next, information from the class diagrams are extracted to measure the strength of cohesion among related classes together with their relationships, and then transform them into a weighted complex network with its nodes and edgesMalaya associated with measured weights. Graph theory metrics are subsequently applied onto the constructed weighted complex network so that the structure, behaviour, andof the complexity of software components and their relationships can be analysed. The results are then converted into sets of clustering constraints. Guided by the explicit and implicit clustering constraints, sets of cohesive clusters are progressively derived to act as a high-level semantic representation of the software design. This research follows an empirical research methodology, where the proposed approach is validatedUniversity using 40 object-oriented open -source software systems written in Java. Using MoJoFM, which is a well-established technique used to compare the similarity between multiple clustering results, the proposed approach achieves an aggregated average of 80.33% accuracy when compared against the original package diagrams of the 40 software systems, thus considerably outperforms conventional unconstrained clustering approach. The clustering results serve as supplementary information for software iii maintainers to aid in making critical decisions for re-engineering, maintaining and evolving software systems. Ultimately, this research helps in reducing the cost of software maintenance through better comprehension of the recovered software design. Malaya of University iv ABSTRAK Penyelenggaraan perisian yang berkesan memerlukan pengetahuan tentang operasi perisian tersebut. Bagaimanapun, struktur perisian mungkin tidak jelas kepada penyelenggara perisian kerana perisian tersebut direka dengan buruk, atau lebih teruk lagi, tidak ada dokumentasi yang dikemaskini. Bagi menangani isu ini dengan berkesan, penyelidik telah mencadangkan untuk melaksanakan pengkelompokan perisian untuk membantu dalam memulihkan perwakilan semantik peringkat tinggi secara rekabentuk perisian dengan mengumpulkan komponen-komponen perisian yang bekerjasama ke dalam subsistem yang bermakna. Perwakilan semantik peringkat tinggi ini berfungsi untuk merapatkan dikotomi antara reka bentuk perisian yang dilihat dari pandangan penyelenggara dan struktur kod yang sebenMalayaarnya. Walau bagaimanapun, pengkelompokan perisian biasanya dijalankan secara tidak terselia dan tegar, di mana penyelenggara tidak mempunyai pengaruhof ke atas keputusan kelompok dan hanya satu penyelesaian yang dihasilkan untuk sebarang set data yang diberikan. Walaupun penyelenggara mempunyai maklumat tambahan yang boleh membantu dan meningkatkan keputusan pengelompokan, algoritma pengelompokan tradisional tidak mempunyai cara untuk mengambil kesempatan daripada maklumat tersebut. Kebimbangan yang praktikal ini telah mendorong penyelidik kepada idea untuk menyepadukan pengetahuan domain ke dalam algoritma pengelompokan tradisional tanpa pengawasan, bersama-sama ini dirujukUniversity sebagai pengkelompokan secara kekangan, teknik pengelompokan separuh selia dimana pakar-pakar domain boleh memberi pendapat mereka dalam bentuk kekangan kelompok untuk menyekat sama ada sepasang komponen perisian perlu atau tidak dikelompokkan ke dalam subsistem yang sama. Selain daripada kekangan pengelompokan yang jelas daripada pakar-pakar domain, sumber maklumat lain untuk membimbing dan meningkatkan hasil pengelompokan boleh diperolehi secara tersirat v dari kod sumber perisian itu sendiri. Untuk membantu penyelenggara mengenal pasti dan mentafsir maklumat yang tersirat tersembunyi dalam kod sumber secara berkesan, kajian ini mencadangkan untuk mewakili perisian menggunakan rangkaian kompleks berwajaran sempena dengan teori graf untuk membantu dalam memahami dan menganalisis struktur, kelakuan, dan juga kerumitan komponen perisian dan hubungan mereka dari sudut pandangan teori graf. Keputusan analisis boleh kemudiannya ditukar menjadi kekangan pengelompokan tersirat. Oleh itu, penyelenggara boleh menggunakan kedua-dua kekangan tersurat dan tersirat untuk membantu dalam mewujudkan perwakilan rekabentuk perisian semantik berperingkat tinggi yang koheren dan konsisten dengan struktur kod yang sebenarnya. Tesis ini mencadangkan satu kaedah pengelompokanMalaya kekangan untuk membantu dalam remodularisasi sistem perisian berorientasikan objek yang direka secara buruk atau tidak didokumenkan. Pada mulanya, kod sumberof sistem perisian berorientasikan objek ditukar kepada gambar rajah kelas UML. Seterusnya, maklumat daripada gambar rajah kelas diekstrak untuk mengukur kekuatan perpaduan
Recommended publications
  • Roller: an Open Source Javatm Ee Blogging Platform
    ROLLER: AN OPEN SOURCE JAVATM EE BLOGGING PLATFORM • Dave Johnson – Staff Engineer S/W – Sun Microsystems, Inc. Agenda • Roller history • Roller features • Roller community • Roller internals: backend • Roller internals: frontend • Customizing Roller • Roller futures Roller started as an EJB example... • Homeport – a home page / portal (2001) ... became an O'Reilly article • Ditched EJBs and HAHTsite IDE (2002) • Used all open source tools instead and thus... ... and escaped into the wild I am allowing others to use my installation of Roller for their weblogging. Hopefully this will provide a means for enhancing the Roller user base as well as provide a nice environment for communication and expression. Anthony Eden August 8, 2002 ... to find a new home at Apache • Apache Roller (incubating) – Incubation period: June 2005 - ??? Agenda • Roller history • Roller features • Roller community • Roller internals: backend • Roller internals: frontend • Customizing Roller • Roller futures Roller features: standard blog stuff • Individual and group blogs • Hierarchical categories • Comments, trackbacks and referrers • File-upload and Podcasting support • User editable page templates • RSS and Atom feeds • Blog client support (Blogger/MetaWeblog API) • Built-in search engine Multiple blogs per user Multiple users per blog Blog client support • XML-RPC based Blogger and MetaWeblog API • Lots of blog clients work with Roller, for example: ecto http://ecto.kung-foo.tv For Mac OSX and Windows Roller 3.0: What's new • Big new release, 3 months in dev
    [Show full text]
  • Return of Organization Exempt from Income
    OMB No. 1545-0047 Return of Organization Exempt From Income Tax Form 990 Under section 501(c), 527, or 4947(a)(1) of the Internal Revenue Code (except black lung benefit trust or private foundation) Open to Public Department of the Treasury Internal Revenue Service The organization may have to use a copy of this return to satisfy state reporting requirements. Inspection A For the 2011 calendar year, or tax year beginning 5/1/2011 , and ending 4/30/2012 B Check if applicable: C Name of organization The Apache Software Foundation D Employer identification number Address change Doing Business As 47-0825376 Name change Number and street (or P.O. box if mail is not delivered to street address) Room/suite E Telephone number Initial return 1901 Munsey Drive (909) 374-9776 Terminated City or town, state or country, and ZIP + 4 Amended return Forest Hill MD 21050-2747 G Gross receipts $ 554,439 Application pending F Name and address of principal officer: H(a) Is this a group return for affiliates? Yes X No Jim Jagielski 1901 Munsey Drive, Forest Hill, MD 21050-2747 H(b) Are all affiliates included? Yes No I Tax-exempt status: X 501(c)(3) 501(c) ( ) (insert no.) 4947(a)(1) or 527 If "No," attach a list. (see instructions) J Website: http://www.apache.org/ H(c) Group exemption number K Form of organization: X Corporation Trust Association Other L Year of formation: 1999 M State of legal domicile: MD Part I Summary 1 Briefly describe the organization's mission or most significant activities: to provide open source software to the public that we sponsor free of charge 2 Check this box if the organization discontinued its operations or disposed of more than 25% of its net assets.
    [Show full text]
  • Projects – Other Than Hadoop! Created By:-Samarjit Mahapatra [email protected]
    Projects – other than Hadoop! Created By:-Samarjit Mahapatra [email protected] Mostly compatible with Hadoop/HDFS Apache Drill - provides low latency ad-hoc queries to many different data sources, including nested data. Inspired by Google's Dremel, Drill is designed to scale to 10,000 servers and query petabytes of data in seconds. Apache Hama - is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS for massive scientific computations such as matrix, graph and network algorithms. Akka - a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on the JVM. ML-Hadoop - Hadoop implementation of Machine learning algorithms Shark - is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones. Apache Crunch - Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run Azkaban - batch workflow job scheduler created at LinkedIn to run their Hadoop Jobs Apache Mesos - is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark, and other applications on a dynamically shared pool of nodes.
    [Show full text]
  • Reflexión Académica En Diseño & Comunicación
    ISSN 1668-1673 XXXII • 2017 Año XVIII. Vol 32. Noviembre 2017. Buenos Aires. Argentina Reflexión Académica en Diseño & Comunicación IV Congreso de Creatividad, Diseño y Comunicación para Profesores y Autoridades de Nivel Medio. `Interfaces Palermo´ Reflexión Académica en Diseño y Comunicación Comité Editorial Universidad de Palermo. Lucia Acar. Universidade Estácio de Sá. Brasil. Facultad de Diseño y Comunicación. Gonzalo Javier Alarcón Vital. Universidad Autónoma Metropolitana. México. Centro de Estudios en Diseño y Comunicación. Mercedes Alfonsín. Universidad de Buenos Aires. Argentina. Mario Bravo 1050. Fernando Alberto Alvarez Romero. Pontificia Universidad Católica del C1175ABT. Ciudad Autónoma de Buenos Aires, Argentina. Ecuador. Ecuador. www.palermo.edu Gonzalo Aranda Toro. Universidad Santo Tomás. Chile. [email protected] Christian Atance. Universidad de Lomas de Zamora. Argentina. Mónica Balabani. Universidad de Palermo. Argentina. Director Alberto Beckers Argomedo. Universidad Santo Tomás. Chile. Oscar Echevarría Renato Antonio Bertao. Universidade Positivo. Brasil. Allan Castelnuovo. Market Research Society. Reino Unido. Coordinadora de la Publicación Jorge Manuel Castro Falero. Universidad de la Empresa. Uruguay. Diana Divasto Raúl Castro Zuñeda. Universidad de Palermo. Argentina. Michael Dinwiddie. New York University. USA. Mario Rubén Dorochesi Fernandois. Universidad Técnica Federico Santa María. Chile. Adriana Inés Echeverria. Universidad de la Cuenca del Plata. Argentina. Universidad de Palermo Jimena Mariana García Ascolani. Universidad Comunera. Paraguay. Rector Marcelo Ghio. Instituto San Ignacio. Perú. Ricardo Popovsky Clara Lucia Grisales Montoya. Academia Superior de Artes. Colombia. Haenz Gutiérrez Quintana. Universidad Federal de Santa Catarina. Brasil. Facultad de Diseño y Comunicación José Korn Bruzzone. Universidad Tecnológica de Chile. Chile. Decano Zulema Marzorati. Universidad de Buenos Aires. Argentina. Oscar Echevarría Denisse Morales.
    [Show full text]
  • PDF Download Scaling Big Data with Hadoop and Solr
    SCALING BIG DATA WITH HADOOP AND SOLR - PDF, EPUB, EBOOK Hrishikesh Vijay Karambelkar | 166 pages | 30 Apr 2015 | Packt Publishing Limited | 9781783553396 | English | Birmingham, United Kingdom Scaling Big Data with Hadoop and Solr - PDF Book The default duration between two heartbeats is 3 seconds. Some other SQL-based distributed query engines to certainly bear in mind and consider for your use cases are:. What Can We Help With? Check out some of the job opportunities currently listed that match the professional profile, many of which seek experience with Search and Solr. This mode can be turned off manually by running the following command:. Has the notion of parent-child document relationships These exist as separate documents within the index, limiting their aggregation functionality in deeply- nested data structures. This step will actually create an authorization key with ssh, bypassing the passphrase check as shown in the following screenshot:. Fields may be split into individual tokens and indexed separately. Any key starting with a will go in the first region, with c the third region and z the last region. Now comes the interesting part. After the jobs are complete, the results are returned to the remote client via HiveServer2. Finally, Hadoop can accept data in just about any format, which eliminates much of the data transformation involved with the data processing. The difference in ingestion performance between Solr and Rocana Search is striking. Aptude has been working with our team for the past four years and we continue to use them and are satisfied with their work Warren E. These tables support most of the common data types that you know from the relational database world.
    [Show full text]
  • Classifying, Evaluating and Advancing Big Data Benchmarks
    Classifying, Evaluating and Advancing Big Data Benchmarks Dissertation zur Erlangung des Doktorgrades der Naturwissenschaften vorgelegt beim Fachbereich 12 Informatik der Johann Wolfgang Goethe-Universität in Frankfurt am Main von Todor Ivanov aus Stara Zagora Frankfurt am Main 2019 (D 30) vom Fachbereich 12 Informatik der Johann Wolfgang Goethe-Universität als Dissertation angenommen. Dekan: Prof. Dr. Andreas Bernig Gutachter: Prof. Dott. -Ing. Roberto V. Zicari Prof. Dr. Carsten Binnig Datum der Disputation: 23.07.2019 Abstract The main contribution of the thesis is in helping to understand which software system parameters mostly affect the performance of Big Data Platforms under realistic workloads. In detail, the main research contributions of the thesis are: 1. Definition of the new concept of heterogeneity for Big Data Architectures (Chapter 2); 2. Investigation of the performance of Big Data systems (e.g. Hadoop) in virtual- ized environments (Section 3.1); 3. Investigation of the performance of NoSQL databases versus Hadoop distribu- tions (Section 3.2); 4. Execution and evaluation of the TPCx-HS benchmark (Section 3.3); 5. Evaluation and comparison of Hive and Spark SQL engines using benchmark queries (Section 3.4); 6. Evaluation of the impact of compression techniques on SQL-on-Hadoop engine performance (Section 3.5); 7. Extensions of the standardized Big Data benchmark BigBench (TPCx-BB) (Section 4.1 and 4.3); 8. Definition of a new benchmark, called ABench (Big Data Architecture Stack Benchmark), that takes into account the heterogeneity of Big Data architectures (Section 4.5). The thesis is an attempt to re-define system benchmarking taking into account the new requirements posed by the Big Data applications.
    [Show full text]
  • Listado De Libros Virtuales Base De Datos De Investigación Ebrary-Engineering Total De Libros: 8127
    LISTADO DE LIBROS VIRTUALES BASE DE DATOS DE INVESTIGACIÓN EBRARY-ENGINEERING TOTAL DE LIBROS: 8127 TIPO CODIGO CODIGO CODIGO NUMERO TIPO TITULO MEDIO IES BIBLIOTECA LIBRO EJEMPLA SOPORTE 1018 UAE-BV4 5008030 LIBRO Turbulent Combustion DIGITAL 1 1018 UAE-BV4 5006991 LIBRO Waste Incineration and the Environment DIGITAL 1 1018 UAE-BV4 5006985 LIBRO Volatile Organic Compounds in the Atmosphere DIGITAL 1 1018 UAE-BV4 5006982 LIBRO Contaminated Land and its Reclamation DIGITAL 1 1018 UAE-BV4 5006980 LIBRO Risk Assessment and Risk Management DIGITAL 1 1018 UAE-BV4 5006976 LIBRO Chlorinated Organic Micropollutants DIGITAL 1 1018 UAE-BV4 5006973 LIBRO Environmental Impact of Power Generation DIGITAL 1 1018 UAE-BV4 5006970 LIBRO Mining and its Environmental Impact DIGITAL 1 1018 UAE-BV4 5006969 LIBRO Air Quality Management DIGITAL 1 1018 UAE-BV4 5006963 LIBRO Waste Treatment and Disposal DIGITAL 1 1018 UAE-BV4 5006426 LIBRO Home Recording Power! : Set up Your Own Recording Studio for Personal & ProfessionalDIGITAL Use 1 1018 UAE-BV4 5006424 LIBRO Graphics Tablet Solutions DIGITAL 1 1018 UAE-BV4 5006422 LIBRO Paint Shop Pro Web Graphics DIGITAL 1 1018 UAE-BV4 5006014 LIBRO Stochastic Models in Reliability DIGITAL 1 1018 UAE-BV4 5006013 LIBRO Inequalities : With Applications to Engineering DIGITAL 1 1018 UAE-BV4 5005105 LIBRO Issues & Dilemmas of Biotechnology : A Reference Guide DIGITAL 1 1018 UAE-BV4 5004961 LIBRO Web Site Design is Communication Design DIGITAL 1 1018 UAE-BV4 5004620 LIBRO On Video DIGITAL 1 1018 UAE-BV4 5003092 LIBRO Windows
    [Show full text]
  • HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack
    HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack Geoffrey C. Fox, Judy Qiu, Supun Kamburugamuve Shantenu Jha, Andre Luckow School of Informatics and Computing RADICAL Indiana University Rutgers University Bloomington, IN 47408, USA Piscataway, NJ 08854, USA fgcf, xqiu, [email protected] [email protected], [email protected] Abstract—We review the High Performance Computing En- systems as they illustrate key capabilities and often motivate hanced Apache Big Data Stack HPC-ABDS and summarize open source equivalents. the capabilities in 21 identified architecture layers. These The software is broken up into layers so that one can dis- cover Message and Data Protocols, Distributed Coordination, Security & Privacy, Monitoring, Infrastructure Management, cuss software systems in smaller groups. The layers where DevOps, Interoperability, File Systems, Cluster & Resource there is especial opportunity to integrate HPC are colored management, Data Transport, File management, NoSQL, SQL green in figure. We note that data systems that we construct (NewSQL), Extraction Tools, Object-relational mapping, In- from this software can run interoperably on virtualized or memory caching and databases, Inter-process Communication, non-virtualized environments aimed at key scientific data Batch Programming model and Runtime, Stream Processing, High-level Programming, Application Hosting and PaaS, Li- analysis problems. Most of ABDS emphasizes scalability braries and Applications, Workflow and Orchestration. We but not performance and one of our goals is to produce summarize status of these layers focusing on issues of impor- high performance environments. Here there is clear need tance for data analytics. We highlight areas where HPC and for better node performance and support of accelerators like ABDS have good opportunities for integration.
    [Show full text]
  • Mission Critical Messaging with Apache Kafka
    Jiangjie (Becket) Qin LinkedIn 1 2 Introduction to Apache Kafka Kafka based replication in Espresso Message Integrity guarantees Performance Large message handling Security Q&A 3 Introduction to Apache Kafka Kafka based replication in Espresso Message Integrity guarantees Performance Large message handling Security Q&A 4 5 6 7 Tracking events App2App msg. Online/Nearline Offline Metrics Data deployment ETL / Data deployment Logging Kafka (Messaging) Samza Media upload/Download Change Log (Stream proc.) (Images/Docs/Videos) Ambry Databus (Blob store) streams Media Brooklin Change Logs Online Media processing processing (Images/Docs/Videos) (Change capture) Applications Vector Hadoop Voldemort Nuage User data update /Venice Processed data (Our AWS) (K-V store) User data update Espresso Processed data (NoSQL DB) Stream Media Storage Nuage 8 Tracking events App2App msg. Online/Nearline Offline Metrics Data deployment ETL / Data deployment Logging Kafka (Messaging) Samza Media upload/Download Change Log (Stream proc.) (Images/Docs/Videos) Ambry Databus (Blob store) streams Media Brooklin Change Logs Online Media processing processing (Images/Docs/Videos) (Change capture) Applications Vector Hadoop Voldemort Nuage User data update /Venice Processed data (Our AWS) (K-V store) User data update Espresso Processed data (NoSQL DB) Stream Media Storage Nuage 下午16:40,百宴厅1,LinkedIn基于Kafka和ElasticSearch的实时日志分析 9 Tracking events App2App msg. Online/Nearline Offline Metrics Data deployment ETL / Data deployment Logging Kafka (Messaging) Samza Media upload/Download Change Log (Stream proc.) (Images/Docs/Videos) Ambry Databus (Blob store) streams Media Brooklin Change Logs Online Media processing processing (Images/Docs/Videos) (Change capture) Applications Vector Hadoop Voldemort Nuage User data update /Venice Processed data (Our AWS) (K-V store) User data update Espresso Processed data (NoSQL DB) Stream Media Storage Nuage 10 Tracking events App2App msg.
    [Show full text]
  • Code Smell Prediction Employing Machine Learning Meets Emerging Java Language Constructs"
    Appendix to the paper "Code smell prediction employing machine learning meets emerging Java language constructs" Hanna Grodzicka, Michał Kawa, Zofia Łakomiak, Arkadiusz Ziobrowski, Lech Madeyski (B) The Appendix includes two tables containing the dataset used in the paper "Code smell prediction employing machine learning meets emerging Java lan- guage constructs". The first table contains information about 792 projects selected for R package reproducer [Madeyski and Kitchenham(2019)]. Projects were the base dataset for cre- ating the dataset used in the study (Table I). The second table contains information about 281 projects filtered by Java version from build tool Maven (Table II) which were directly used in the paper. TABLE I: Base projects used to create the new dataset # Orgasation Project name GitHub link Commit hash Build tool Java version 1 adobe aem-core-wcm- www.github.com/adobe/ 1d1f1d70844c9e07cd694f028e87f85d926aba94 other or lack of unknown components aem-core-wcm-components 2 adobe S3Mock www.github.com/adobe/ 5aa299c2b6d0f0fd00f8d03fda560502270afb82 MAVEN 8 S3Mock 3 alexa alexa-skills- www.github.com/alexa/ bf1e9ccc50d1f3f8408f887f70197ee288fd4bd9 MAVEN 8 kit-sdk-for- alexa-skills-kit-sdk- java for-java 4 alibaba ARouter www.github.com/alibaba/ 93b328569bbdbf75e4aa87f0ecf48c69600591b2 GRADLE unknown ARouter 5 alibaba atlas www.github.com/alibaba/ e8c7b3f1ff14b2a1df64321c6992b796cae7d732 GRADLE unknown atlas 6 alibaba canal www.github.com/alibaba/ 08167c95c767fd3c9879584c0230820a8476a7a7 MAVEN 7 canal 7 alibaba cobar www.github.com/alibaba/
    [Show full text]
  • Interactive Programming Support for Secure Software Development
    INTERACTIVE PROGRAMMING SUPPORT FOR SECURE SOFTWARE DEVELOPMENT by Jing Xie A dissertation submitted to the faculty of The University of North Carolina at Charlotte in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computing and Information Systems Charlotte 2012 Approved by: Dr. Bill (Bei-Tseng) Chu Dr. Heather Richter Lipford Dr. Andrew J. Ko Dr. Xintao Wu Dr. Mary Maureen Brown ii c 2012 Jing Xie ALL RIGHTS RESERVED iii ABSTRACT JING XIE. Interactive programming support for secure software development. (Under the direction of DR. BILL (BEI-TSENG) CHU) Software vulnerabilities originating from insecure code are one of the leading causes of security problems people face today. Unfortunately, many software developers have not been adequately trained in writing secure programs that are resistant from attacks violating program confidentiality, integrity, and availability, a style of programming which I refer to as secure programming. Worse, even well-trained developers can still make programming errors, including security ones. This may be either because of their lack of understanding of secure programming practices, and/or their lapses of attention on security. Much work on software security has focused on detecting software vulnerabilities through automated analysis techniques. While they are effective, they are neither sufficient nor optimal. For instance, current tool support for secure programming, both from tool vendors as well as within the research community, focuses on catching security errors after the program is written. Static and dynamic analyzers work in a similar way as early compilers: developers must first run the tool, obtain and analyze results, diagnose programs, and finally fix the code if necessary.
    [Show full text]
  • Luciano Sampaio Martins De Souza Early Vulnerability Detection For
    Luciano Sampaio Martins de Souza Early Vulnerability Detection for Supporting Secure Programming DISSERTAÇÃO DE MESTRADO DEPARTAMENTO DE INFORMÁTICA Programa de Pós-Graduação em Informática Rio de Janeiro January 2015 Luciano Sampaio Martins de Souza Early Vulnerability Detection for Supporting Secure Programming DISSERTAÇÃO DE MESTRADO Dissertation presented to the Programa de Pós- Graduação em Informática of the Departamento de Informática, PUC-Rio, as partial fulfillment of the requirements for the degree of Mestre em Informática. Advisor: Prof. Alessandro Fabricio Garcia Rio de Janeiro January 2015 Luciano Sampaio Martins de Souza Early Vulnerability Detection for Supporting Secure Programming Dissertation presented to the Programa de Pós- Graduação em Informática of the Departamento de Informática, PUC-Rio, as partial fulfillment of the requirements for the degree of Master in Informatics. Prof. Alessandro Fabricio Garcia Advisor Departamento de Informática – PUC-Rio Prof. Anderson Oliveira da Silva Departamento de Informática – PUC-Rio Prof. Marcelo Blois Ribeiro GE Global Research Prof. Marcos Kalinowski UFJF Prof. José Eugenio Leal Coordinator of the Centro Técnico Científico da PUC-Rio Rio de Janeiro, January 15th, 2015 All rights reserved Luciano Sampaio Martins de Souza The author graduated in Computer Science from the University Tiradentes (UNIT) in 2006. He received a Graduate Degree with emphasis on Web Development from the University Tiradentes (UNIT) in 2011. His main research interest is: Software Development. Bibliographic data Souza, Luciano Sampaio Martins de Early vulnerability detection for supporting secure programming / Luciano Sampaio Martins de Souza; advisor: Alessandro Fabricio Garcia. – 2015. 132 f. :il. (color.) ; 30 cm Dissertação (mestrado) Pontifícia Universidade Católica do Rio de Janeiro, Departamento de Informática, 2015.
    [Show full text]