A Large Scale Empirical Study

Total Page:16

File Type:pdf, Size:1020Kb

A Large Scale Empirical Study Co-Change Patterns: A Large Scale Empirical Study Luciana L. Silvaa,∗, Marco Tulio Valentea, Marcelo de Almeida Maiab aDepartment of Computer Science, UFMG, Brazil bFaculty of Computer Science, UFU, Brazil Abstract Co-Change Clustering is a modularity assessment technique that reveals how often changes are localized in modules and whether a change propagation represents design problems. This technique is centered on co-change clusters, which are highly inter-related source code files considering co-change relations. In this paper, we conduct a series of empirical analysis in a large corpus of 133 popular software projects on GitHub. We describe six co-change patterns by projecting them over the directory structure. We mine 1,802 co-change clusters and 1,719 co-change clusters (95%) are covered by the six co-change patterns. In this study, we aim to answer two central questions: (i) Are co-change patterns detected in different programming languages? (ii) How do different co- change patterns relate to rippling, activity density, ownership, and team diversity on clusters? We conclude that Encapsulated and Well-Confined clusters (Wrapped) implement well-defined and confined concerns. Octopus clusters are proportionally numerous regarding to other patterns. They relate significantly with ripple effect, activity, ownership, and diversity in development teams. Although Crosscutting are scattered over directories, they implement well-defined concerns. Despite they present higher activity compared to Wrapped clusters, it is not necessarily easy to get rid of them, suggesting that support tools may play a crucial role. Keywords: Modularity, co-change clusters, co-change patterns 1. Introduction the standard approach is based on the analysis of structural mea- sures, e.g., coupling and cohesion (Chidamber and Kemerer, Parnas developed the criteria that “modules should hide de- 1991; Stevens et al., 1974). Over the years, several alterna- cisions or decisions that are likely to change (Parnas, 1972). In tive attempts have been proposed, such as semantic approaches, addition, according to Parnas, a module represents a responsi- which analyze the source code vocabulary using Information bility assignment. However, the initial planned modules may Retrieval techniques (Santos et al., 2014; Maletic and Marcus, suffer changes overtime to adapt to their new responsibilities, 2000), co-change approaches, which mine historical data to de- otherwise their modularity tends to decay. During software de- tect software artifacts that usually change together (Zimmer- velopment, changes are performed constantly in tasks related mann et al., 2007; Alali et al., 2013), or hybrid approaches, to new features, code refactoring, and bug fixing. In a modular which combine these types of information (Kagdi et al., 2013; software, when those tasks are required, they should change Bavota et al., 2014). a single module with minimal—if possible none—impact in other modules (Aggarwal and Singh, 2005). In contrast, im- Recently, we proposed the Co-Change Clustering technique proper modularization may cause ripple effects during software to assess modularity by capturing logical modules from history maintenance. For such situation, it would be beneficial whether of software changes (Silva et al., 2014, 2015a). Co-change clus- the system expert could semi-automatically analyze modularity ters consist of source code artifacts that frequently change to- by retrieving set of classes that usually change together to com- gether between themselves but not with others artifacts in dif- pare and contrast co-change clusters with directory structure. ferent clusters. Later, we reported a study with experts of six object-oriented systems to investigate the developers’ percep- The importance of modular systems has motivated re- tion of co-change clusters (Silva et al., 2015b). These clusters searchers and practitioners to investigate different dimensions were classified in three patterns to represent common instances to assess system modularity. Despite modularity being an es- of co-change clusters, regarding their projection over the direc- sential principle in software development, effective approaches tory structure: Encapsulated Clusters (clusters that match the to assess modularity still remain an open problem. Typically, directory structure, i.e., they dominate all co-change classes in the directory structures they touch), Crosscutting Clusters (clus- ∗Corresponding author at Departamento de Cienciaˆ da Computac¸ao,˜ Av. ters whose co-change classes are scattered across several direc- Antonioˆ Carlos, 6627 - Pampulha CEP: 31270-010, UFMG, Belo Horizonte, tory structures), and Octopus Clusters (most co-change classes Brazil. Tel.: +55 31 3409-5865. are confined in one directory structure with some arms—or Email addresses: “tentacles”—in others). Our proposed co-change patterns could [email protected],[email protected] (Luciana L. Silva), [email protected] (Marco Tulio Valente), cover 52% of the 102 mined clusters. Although unitary changes [email protected] (Marcelo de Almeida Maia) are preferable than co-changes, indeed, those co-change pat- Preprint submitted to Journal of Systems and Software March 15, 2019 terns with propagation and scattering revealed to have different we conduct a qualitative and semantic analysis and Section ?? perspectives on the impact they may cause in software main- we discuss our findings. We discuss threats to validity in Sec- tenance. In summary, our first results indicate that: (i) Encap- tion 7 and related work in Section 8. Finally, we conclude the sulated Clusters tend to be more controllable; (ii) around 50% paper in Section 9. of the Crosscutting Clusters were associated to design anoma- lies; (iii) Octopus Clusters represent expected class distribu- 2. Co-Change Clustering tions, which are difficult to implement in an encapsulated way. Nonetheless, we did not evaluate whether programming lan- The ultimate goal of our technique is to support develop- guages, application domains, thresholds, and commit density ers on modularity assessment using co-change relations. The (number of commits divided by the number of source code files) technique relies on historical information to generate co-change impact co-change pattern’s detection. We also did not analyze graphs and then mine co-change clusters. As illustrated in Fig- whether co-change patterns relate to ripple-effect, activity den- ure 1, we propose three phases to extract co-change clusters. In sity, ownership, and team diversity on clusters. the first phase, we extract commit transactions and apply pre- In this paper, we extend our previous work to answer these processing steps to build co-change graphs. After that, a post- aforementioned open questions in five directions: processing phase is applied on co-change graphs to prune edges with small weights. Finally, in the last phase, the co-change 1. We conduct a new study in a large corpus of 133 popular graph is clustered several times to selected the best clustering. projects hosted in GitHub. We consider projects in differ- A detailed description of this section can be found in previous ent languages (C/C++, Java, PHP, Ruby, JavaScript, and work (Silva et al., 2014, 2015a). Python) and application domains. 2. We investigate the threshold used to group commits (ap- 2.1. Co-Change Graph plied by the same developer in a period of time during the data preprocessing) and the process of co-change cluster Beyer and Noack (2005) proposed co-change graphs, an ab- extraction. In this analysis, we intend to evaluate the ef- straction to represent commit transactions from version control fectiveness of co-change clusters. system (VCS). Conceptually, a co-change graph is an undi- rected and weighted graph fV, Eg, where V is a set of source 3. We propose three new co-change patterns (Section 3): (i) code files and E is a set of edges. If there is an edge between Well-Confined Clusters (clusters which touch a single di- two vertices (source code files) F and F , then at least one com- rectory structure but do not dominate it); (ii) Black Sheep i j mit in the VCS changes Fi and F j, where i , j. Finally, we Clusters (similar to Crosscutting, however, they touch very conduct some experiments with relative edge weights to define few code files in each one directory structure); and (iii) the weighting measure. We concluded that co-change modifi- Squid Clusters (similar to Octopus, but they have smaller cations are usually much less frequent than single changes in a bodies and arms). We mine 1,802 co-change clusters from source code file. For example, in our previous work (Silva et al., version histories of such projects, which were then cate- 2014, 2015a), we extract co-change clusters for Lucene project1 gorized in six patterns regarding their projection over the to present the concept of co-change clustering. We observe that directory structure. From the initially computed clusters, the maximal edges’ weight is seven and the maximum size is 1,719 co-change clusters (95%) are covered by the pro- 27. For this reason, we do not adopt a relative edge weights on posed co-change patterns. co-change graphs. Instead, the edges’ weights represent how 4. We conduct a series of statistical analysis to evaluate the many commits changed the connected source code files simul- categorized co-change clusters on the 133 projects (Sec- taneously represent. tion 5.4). This study aims to answer two central questions: Before extracting co-change graphs, we perform Phase 1 (i) Are co-change patterns detected in different program- (Figure 1) to preprocess commits extracted from version his- ming languages? (ii) How do different co-change patterns tory. This phase consists of the following three steps: relate to rippling, activity density, ownership, and team di- versity on clusters? 1. We discard commits which do not change source code files, 5. We conduct a qualitative analysis by investigating under- e.g., documentation and configuration files. In addition, test- lying natural language topics in commit messages on clas- ing files are eliminated because co-changes between testing sified co-change clusters.
Recommended publications
  • Histcoroy Pyright for Online Information and Ordering of This and Other Manning Books, Please Visit Topwicws W.Manning.Com
    www.allitebooks.com HistCoroy pyright For online information and ordering of this and other Manning books, please visit Topwicws w.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Tutorials Special Sales Department Offers & D e al s Manning Publications Co. 20 Baldwin Road Highligh ts PO Box 761 Shelter Island, NY 11964 Email: [email protected] Settings ©2017 by Manning Publications Co. All rights reserved. Support No part of this publication may be reproduced, stored in a retrieval system, or Sign Out transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps. Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid­free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine. Manning Publications Co. PO Box 761 Shelter Island, NY 11964 www.allitebooks.com Development editor: Cynthia Kane Review editor: Aleksandar Dragosavljević Technical development editor: Stan Bice Project editors: Kevin Sullivan, David Novak Copyeditor: Sharon Wilkey Proofreader: Melody Dolab Technical proofreader: Doug Warren Typesetter and cover design: Marija Tudor ISBN 9781617292576 Printed in the United States of America 1 2 3 4 5 6 7 8 9 10 – EBM – 22 21 20 19 18 17 www.allitebooks.com HistPoray rt 1.
    [Show full text]
  • Beyond Relational Databases
    EXPERT ANALYSIS BY MARCOS ALBE, SUPPORT ENGINEER, PERCONA Beyond Relational Databases: A Focus on Redis, MongoDB, and ClickHouse Many of us use and love relational databases… until we try and use them for purposes which aren’t their strong point. Queues, caches, catalogs, unstructured data, counters, and many other use cases, can be solved with relational databases, but are better served by alternative options. In this expert analysis, we examine the goals, pros and cons, and the good and bad use cases of the most popular alternatives on the market, and look into some modern open source implementations. Beyond Relational Databases Developers frequently choose the backend store for the applications they produce. Amidst dozens of options, buzzwords, industry preferences, and vendor offers, it’s not always easy to make the right choice… Even with a map! !# O# d# "# a# `# @R*7-# @94FA6)6 =F(*I-76#A4+)74/*2(:# ( JA$:+49>)# &-)6+16F-# (M#@E61>-#W6e6# &6EH#;)7-6<+# &6EH# J(7)(:X(78+# !"#$%&'( S-76I6)6#'4+)-:-7# A((E-N# ##@E61>-#;E678# ;)762(# .01.%2%+'.('.$%,3( @E61>-#;(F7# D((9F-#=F(*I## =(:c*-:)U@E61>-#W6e6# @F2+16F-# G*/(F-# @Q;# $%&## @R*7-## A6)6S(77-:)U@E61>-#@E-N# K4E-F4:-A%# A6)6E7(1# %49$:+49>)+# @E61>-#'*1-:-# @E61>-#;6<R6# L&H# A6)6#'68-# $%&#@:6F521+#M(7#@E61>-#;E678# .761F-#;)7-6<#LNEF(7-7# S-76I6)6#=F(*I# A6)6/7418+# @ !"#$%&'( ;H=JO# ;(\X67-#@D# M(7#J6I((E# .761F-#%49#A6)6#=F(*I# @ )*&+',"-.%/( S$%=.#;)7-6<%6+-# =F(*I-76# LF6+21+-671># ;G';)7-6<# LF6+21#[(*:I# @E61>-#;"# @E61>-#;)(7<# H618+E61-# *&'+,"#$%&'$#( .761F-#%49#A6)6#@EEF46:1-#
    [Show full text]
  • Data Platforms Map from 451 Research
    1 2 3 4 5 6 Azure AgilData Cloudera Distribu2on HDInsight Metascale of Apache Kaa MapR Streams MapR Hortonworks Towards Teradata Listener Doopex Apache Spark Strao enterprise search Apache Solr Google Cloud Confluent/Apache Kaa Al2scale Qubole AWS IBM Azure DataTorrent/Apache Apex PipelineDB Dataproc BigInsights Apache Lucene Apache Samza EMR Data Lake IBM Analy2cs for Apache Spark Oracle Stream Explorer Teradata Cloud Databricks A Towards SRCH2 So\ware AG for Hadoop Oracle Big Data Cloud A E-discovery TIBCO StreamBase Cloudera Elas2csearch SQLStream Data Elas2c Found Apache S4 Apache Storm Rackspace Non-relaonal Oracle Big Data Appliance ObjectRocket for IBM InfoSphere Streams xPlenty Apache Hadoop HP IDOL Elas2csearch Google Azure Stream Analy2cs Data Ar2sans Apache Flink Azure Cloud EsgnDB/ zone Platforms Oracle Dataflow Endeca Server Search AWS Apache Apache IBM Ac2an Treasure Avio Kinesis LeanXcale Trafodion Splice Machine MammothDB Drill Presto Big SQL Vortex Data SciDB HPCC AsterixDB IBM InfoSphere Towards LucidWorks Starcounter SQLite Apache Teradata Map Data Explorer Firebird Apache Apache JethroData Pivotal HD/ Apache Cazena CitusDB SIEM Big Data Tajo Hive Impala Apache HAWQ Kudu Aster Loggly Ac2an Ingres Sumo Cloudera SAP Sybase ASE IBM PureData January 2016 Logic Search for Analy2cs/dashDB Logentries SAP Sybase SQL Anywhere Key: B TIBCO Splunk Maana Rela%onal zone B LogLogic EnterpriseDB SQream General purpose Postgres-XL Microso\ Ry\ X15 So\ware Oracle IBM SAP SQL Server Oracle Teradata Specialist analy2c PostgreSQL Exadata
    [Show full text]
  • Research Document
    Research Document Student Name: James Heneghan Student ID: C00155861 Course ID: CW_KCSOF_B Project Name: Take Me There Supervisor: Joseph Kehoe Institute: Institute of Technology Carlow Date: 16/10/2014 1 Contents Introduction ............................................................................................................................................ 3 Galileo Board ........................................................................................................................................... 3 Details of the Architecture .................................................................................................................. 3 Communication ................................................................................................................................... 3 Accelerometer......................................................................................................................................... 4 Cloud Hosting .......................................................................................................................................... 4 Google Maps API ..................................................................................................................................... 4 Google Maps JavaScript API ................................................................................................................ 4 Google Static Maps API ......................................................................................................................
    [Show full text]
  • Data Managementmanagementdata for Scientific Research
    DDaattaa MMaannaaggeemmeenntt For Scientific Research Photo: © Stanza. Used with permission. CCoouurrssee IInnttrroodduuccttiioonn Welcome to a course in data management for scientific research projects. CCoouurrssee SSttrruuccttuurree Casual “guided” study-group approach Presentations, demos, hands-on exercises, discussions and “homework” Materials: A textbook, eBooks, websites, and online videos WWhhyy TTaakkee TThhiiss CCoouurrssee?? PracticalComputing.org Researchers work with increasing amounts of data. Many students do not have training in data management. Science degree programs generally do not address this gap. It is difficult for “non-majors” to get into IT courses. This leaves students and research teams struggling to cope. And therefore places a heavy burden on IT support. Our data management course provides the needed skills to address these issues. Exciting new discoveries await those who can effectively sift through mounds of data! Participant IntroductionsIntroductionsParticipant Please introduce yourself and share your: Degree program and emphasis Research area (general topic) Your current research project (specific topic) The types of data or data systems you use in this project What you hope to get out of this course Session 1: Data System Essentials How will you manage your data? You need a data system. There are many choices. To pick the best one, you need to state your requirements. Photo: NASA Today's Learning Objectives In this session, you will … Become familiar with common types of data systems Learn to differentiate between flat files and relational databases Learn to differentiate between spreadsheets and databases Learn how to model system functions and interactions Learn how to create system diagrams Learn how to state system requirements Ultimately, this knowledge will help you select or design the best data system for your needs.
    [Show full text]
  • Webscalesql : Basic Details and Installation
    The Customize Windows Technology Blog http://thecustomizewindows.com WebScaleSQL : Basic Details and Installation Author : Abhishek WebScaleSQL is a modified MySQL Database developed by Engineers at Facebook, Google, LinkedIn, and Twitter for making kind of scalable SQL. Everyone knows about MySQL Database. Those who do not know what is MySQL, they can read our previously published article - What is MySQL, Why We Need MySQL. At the time of publishing this article; WebScaleSQL is on everyone's lips. MySQL is great, but it is not intended to be Scalable when it was written, that is the basic reason why we try to make Scalable MySQL Database for CMS like WordPress in a kind of legacy mode. WebScaleSQL is kind of fork of MySQL. WebScaleSQL : Basics Need for the Approach Flexibility and Scalability are the Key Benefits of Cloud Computing. Again, Scalability and Service Continuity are Not equivalent. There is another a MySQL Improved Extension - MySQLi. There are reasons why NoSQL Movement was started. 1 / 4 The Customize Windows Technology Blog http://thecustomizewindows.com Official website of WebScaleSQL is : http://webscalesql.org/ But, we do not need the official website, but the Github repository : https://github.com/webscalesql/webscalesql-5.6/tree/webscalesql-5.6.16 2 / 4 The Customize Windows Technology Blog http://thecustomizewindows.com Do you know that, PostgreSQL; one of the most popular noSQL database actually can be used instead of MySQL for WordPress? MariaDB is constantly merged with MySQL community edition. Instead of writing vague collected data on WebScaleSQL, let us install WebScaleSQL. WebScaleSQL : Installation In order to test WebScaleSQL, we will need the following dependencies : + cmake + gcc + libaio-dev + libncurses5-dev + libreadline- dev + bison + git + perl I believe, WebScaleSQL is not intended for use on a VirtualBox as a Virtual Appliance.
    [Show full text]
  • A Novel Approach for Estimating Truck Factors
    A Novel Approach for Estimating Truck Factors Guilherme Avelino∗y, Leonardo Passosz, Andre Hora∗ and Marco Tulio Valente∗ ∗ASERG Group, Department of Computer Science (DCC) Federal University of Minas Gerais (UFMG), Brazil Email: fgaa, mtov, [email protected] y Department of Computing (DC) Federal University of Piaui (UFPI), Brazil zUniversity of Waterloo, Canada Email: [email protected] Abstract—Truck Factor (TF) is a metric proposed by the agile for TF-estimation for which we apply to a target corpus community as a tool to identify concentration of knowledge comprising 133 systems in GitHub. In total, such systems in software development environments. It states the minimal have over 373K files and 41 MLOC; their combined evolution number of developers that have to be hit by a truck (or quit) before a project is incapacitated. In other words, TF helps history sums to over 2 million commits. By surveying and to measure how prepared is a project to deal with developer analyzing answers from 67 target systems, we evidence that in turnover. Despite its clear relevance, few studies explore this 84% of valid answers developers agree or partially agree that metric. Altogether there is no consensus about how to calculate it, the TF’s authors are the main authors of their systems; in 53% and no supporting evidence backing estimates for systems in the we receive a positive or partially positive answer regarding our wild. To mitigate both issues, we propose a novel (and automated) approach for estimating TF-values, which we execute against estimated truck factors. a corpus of 133 popular project in GitHub.
    [Show full text]
  • Facebook Mysql Async.Pdf
    Asynchronous MySQL How Facebook Queries Databases Chip Turner – [email protected] 2014-04-02 Our Codebase § A fancy website written in PHP (which became Hack) § Grew organically over time § Accrued technical debt, then paid it off as we scaled § Many backend services written in C++ § Operations tools (99% Python, 1% PHP, 1% Perl, 1% …) Our Servers and Network § Hundreds of thousands of servers § “Many, many” webservers § “Many” databases § Sharded data model § Single master, multiple replicas § One copy of each shard in each datacenter § Multiple datacenters worldwide A Sense of Scale § PHP code issues 10,000 QPS … of errors § Connection refused, failovers, solar flares, timeouts, you name it § Retries usually make these invisible to the application § 12,000,000 QPS of actual queries from webservers § 8,400,000 QPS of Async MySQL queries (up from 0 one year ago) § Average query time: 9ms § 30 hours of queries executed per second § This is just PHP – does not include TAO, warehousing, or other use cases DB Client Team § Formed a team in early 2013 to focus on database client issues § Most original database client code came along as necessary, not designed § Problem space is both querying databases and finding the right database to query; this is surprisingly tricky § Primarily OLTP workload § Usability, security, reliability are all goals Security - http://xkcd.com/327/ Security team also focuses on this area; joint responsibility. They make it secure, we make it easier to use. Different Kinds of Performance § Throughput § Queries per second or
    [Show full text]
  • Alibaba Open-Source Fast Facts 2019 November
    Alibaba Open-source Fast Facts 2019 November Overview Since the first batch of open-source projects launched in 2011, developers at Alibaba have been actively contributing to open-source communities. Alibaba now boasts over 180 open-source projects, contributing codes to all aspects of software solutions, including cloud infrastructure and machine learning, container, enterprise-class system, database and network. Alibaba and its computing arm Alibaba Cloud have been taking prominent role in a number of worlds’ leading open-source communities, such as Linux Foundation, Apache Software Foundation, MariaDB Foundation, Cloud Foundry Foundation, Cloud Native Computing Foundation ⚫ In the star ranking of world’s largest developer community Github, Alibaba has gained over 690,000 stars, with about 20,000 contributors, as one of the top ten organizations. ⚫ Since 2011, Alibaba has contributed to open source communities including Cloud Native Computing Foundation, Aliance for Open Media, Cloud Foundry, Hyperledger, Open Container Initiative, Continuous Delivery Foundation, The Apache Software Foundation, MariaDB Foundation, The Linux Foundation. ⚫ Linux Community, Alibaba has contributed over 290 patches ⚫ Alibaba’s open-source project “Dubbo” “RocketMQ” “Weex” “JStorm” were donated to Apache Foundation, “Dubbo” and RockeMQ became the top-level project at Apache ⚫ Alibaba Cloud’s OSS has become the third official recognized cloud storage by Hadoop ⚫ Alibaba Cloud has helped MySQL to identify nearly 300 bugs, contributed all the patches to MariaDB
    [Show full text]
  • Recent Trends in Data Persistence and Analysis
    SQL Strikes Back Recent Trends in Data Persistence and Analysis Codemesh 2014 November 4, 2014 Dean Wampler, Ph.D [email protected] ©Typesafe 2014 – All Rights Reserved Tuesday, November 4, 14 All photos are copyright (C) 2008 - 2014, Dean Wampler. All Rights Reserved. Photo: Before dawn above the Western USA [email protected] polyglotprogramming.com/talks @deanwampler ©Typesafe 2014 – All Rights Reserved 2 Tuesday, November 4, 14 Typesafe provides products and services for building Reactive, Big Data applications typesafe.com/reactive-big-data ©Typesafe 2014 – All Rights Reserved 3 Tuesday, November 4, 14 For @coderoshi... Cat meme! ©Typesafe 2014 – All Rights Reserved 4 Tuesday, November 4, 14 If you were at Eric Raymond’s (@coderoshi’s) talk... My cat Oberon. Three Trends ©Typesafe 2014 – All Rights Reserved 5 Tuesday, November 4, 14 Three trends to organizer our thinking… Photo: Dusk over the American Midwest, in Winter Data Size ⬆ ©Typesafe 2014 – All Rights Reserved 6 Tuesday, November 4, 14 Data volumes are obviously growing… rapidly. Facebook now has over 600PB (Petabytes) of data in Hadoop clusters! Formal Schemas ⬇ ©Typesafe 2014 – All Rights Reserved 7 Tuesday, November 4, 14 There is less emphasis on “formal” schemas and domain models, i.e., both relaonal models of data and OO models, because data schemas and sources change rapidly, and we need to integrate so many disparate sources of data. So, using relavely-agnos^c so_ware, e.g., collec^ons of things where the so_ware is more agnos^c about the structure of the data and the domain, tends to be faster to develop, test, and deploy.
    [Show full text]
  • Data Platforms
    1 2 3 4 5 6 Towards Apache Storm SQLStream enterprise search Treasure AWS Azure Apache S4 HDInsight DataTorrent Qubole Data EMR Hortonworks Metascale Lucene/Solr Feedzai Infochimps Strao Doopex Teradata Cloud T-Systems MapR Apache Spark A Towards So`ware AG ZeUaset IBM Azure Databricks A SRCH2 IBM for Hadoop E-discovery Al/scale BigInsights Data Lake Oracle Big Data Cloud Guavus InfoSphere CenturyLink Data Streams Cloudera Elas/c Lokad Rackspace HP Found Non-relaonal Oracle Big Data Appliance Autonomy Elas/csearch TIBCO IBM So`layer Google Cloud StreamBase Joyent Apache Hadoop Platforms Oracle Azure Dataflow Data Ar/sans Apache Flink Endeca Server Search AWS xPlenty zone IBM Avio Kinesis Trafodion Splice Machine MammothDB Presto Big SQL CitusDB Hadapt SciDB HPCC AsterixDB IBM InfoSphere Starcounter Towards NGDATA SQLite Apache Teradata Map Data Explorer Firebird Apache Apache Crate Cloudera JethroData Pivotal SIEM Tajo Hive Drill Impala HD/HAWQ Aster Loggly Sumo LucidWorks Ac/an Ingres Big Data SAP Sybase ASE IBM PureData June 2015 Logic for Analy/cs/dashDB Logentries SAP Sybase SQL Anywhere Key: B TIBCO EnterpriseDB B LogLogic Rela%onal zone SQream General purpose Postgres-XL Microso` vFabric Postgres Oracle IBM SAP SQL Server Oracle Teradata Specialist analy/c Splunk PostgreSQL Exadata PureData HANA PDW Exaly/cs -as-a-Service Percona Server MySQL MarkLogic CortexDB ArangoDB Ac/an PSQL XtremeData BigTables OrientDB MariaDB Enterprise MariaDB Oracle IBM Informix SQL HP NonStop SQL Metamarkets Druid Orchestrate Sqrrl Database DB2 Server
    [Show full text]
  • Co-Change Clustering
    LUCIANA LOURDES SILVA CO-CHANGE CLUSTERING Thesis presented to the Graduate Program in Computer Science of the Federal Univer- sity of Minas Gerais in partial fulfillment of the requirements for the degree of Doctor in Computer Science. Advisor: Marco Túlio de Oliveira Valente Co-Advisor: Marcelo de Almeida Maia Belo Horizonte October 30, 2015 Abstract Modularity is a key concept to embrace when designing complex software systems. Nonetheless, modular decomposition is still a challenge after decades of research on new techniques for software modularization. One reason is that modularization might not be viewed with single lens due to the multiple facets that a software must deal with. Research in programming languages still tries to define new modularization mechanisms to deal with these different facets, such as aspects and features. Addition- ally, the traditional modular structure defined by the package hierarchy suffers from the dominant decomposition problem and it is widely accepted that alternative forms of modularization are necessary to increase developer’s productivity. In order to con- tribute with a solution to this problem, in this thesis we propose a novel technique to assess package modularity based on co-change clusters, which are highly inter-related source code files considering co-change relations. The co-change clusters are classified in six patterns regarding their projection to the package structure: Encapsulated, Well- Confined, Crosscutting, Black-Sheep, Octopus, and Squid. We evaluated our technique in three different fronts: (i) a quantitative analysis on four real-world systems, (ii) a qualitative analysis on six systems implemented in two languages to reveal developer’s perception of co-change clusters, (iii) a large scale study in a corpus of 133 GitHub projects implemented in six programming languages.
    [Show full text]