Course 2: «Open Source Software (OSS) Engineering Data»
1st Day: Metrics and Tools for Software Engineering in Open Source Software
1. Open Software / Hardware Technologies: Introduction to Open Source Software and related technologies.
2. Software Engineering FLOSS: Free Libre Open Source Software in Software Engineering.
3. Metrics for Open Source Software: product and project metrics and related tooling.

2nd Day: Research based on Open Source Software Data
4. Facilitating Metric for White-Box Reuse: we will discuss a metric, derived from the analysis of Open Source projects, that facilitates white-box reuse (reuse based on source code analysis).
5. Extracting components from Open-Source: The Component Adaptation Environment Approach (COPE): we will discuss the results of the OPEN-SME EU Research Project, in particular the COPE tool for extracting reusable components from Open Source Software.
6. Software Engineering Research based on Open Source Software Data: A recent example: we will discuss a recent study of open source repository mailing lists that derives implications for the evolution of an open source project.
7. Improving Component Coupling Information with Dynamic Profiling: we will discuss how dynamic profiling of an open source program can contribute to its comprehension.

At various points during the lectures students will be asked to carry out activities.

Open Software / Hardware Technologies
Ioannis Stamelos, Professor
Nikolaos Konofaos, Associate Professor
School of Informatics, Aristotle University of Thessaloniki
George Kakarontzas, Assistant Professor
University of Thessaly
2018-2019

F/OSS - FLOSS Definition
● The traditional SW development model calls for a “closed member” team that develops proprietary source code.
● FLOSS is an alternative model for developing, distributing and using software
● FLOSS provides three basic “freedoms” to its users:
– Use the software as they wish
– Copy it and distribute as many copies as they want
– Change it and distribute the changes as they want
● These freedoms require the source code to be openly available

Free Software
● Free Software Foundation, under the leadership of Richard Stallman (http://www.fsf.org)
● Free software licenses cover the three freedoms mentioned above
● “Free” refers to the freedom to use the source code
● “Free as in freedom, not as in free beer”

Open Source Software
● Open Source Software is similar to Free Software (http://www.opensource.org)
● The license addresses more issues:
– Integrity of the individual programmer’s code
– Distribution of new developments
– Obligation not to exclude specific groups of people
– ...
● For more, follow the link above

Development Process (1)
● Different from traditional software development approaches (e.g. the Waterfall model)
● Considered similar to a kind of “extreme spiral model”, extreme programming or other agile methods
● None of the above accurately represents the F/OSS development process: they neglect the fact that the code is open to everyone

Development Process (2)
● In a typical scenario, an individual computer user has a problem to solve (a “personal itch”):
– Wants a brand new app
– Wants a non-commercial app
– Existing commercial or F/OSS apps are not satisfactory
● This person produces an initial version and opens the code under an appropriate license

Development Process (3)
● If there is interest in the project, people start using it, reporting issues and sending new pieces of code (commits)
● The coordinator (or the coordination team), typically the original programmer(s), integrates the contributions and produces new releases, openly available on the Internet
● This cycle repeats as long as the project is alive, i.e. as long as there are interested users and developers who support it

Development Process (4)
● The most significant advantage is the availability of a multitude of volunteers who continuously develop and inspect the source code (“peer review”)
● Because of the number of users/developers, finding and fixing bugs and releasing the new code is fast, much faster than with closed code and closed teams. This is epitomized by “given enough eyeballs, all bugs are shallow” (Raymond)
● This type of collaborative and open development has produced dedicated software tools that have been adopted by traditional teams as well (CVS/SVN/Git, Bugzilla etc.)

F/OSS today
● Several modern F/OSS projects (e.g. Odoo, Alfresco, SugarCRM) are hybrid, meaning in general that:
– The project is initiated by a company, or the community has evolved into a company
– There is an open-source core (community) version and a commercial (enterprise) edition
– There are paid services around the code (installation, training, parameterization, adaptation, support, etc.)

Known F/OSS projects (1)
● The flagship OS: the Linux kernel, also Android (~80% open sourced). The core of macOS (Darwin, including the XNU kernel) is also open source
● Web servers: Apache Web Server, ~50% globally, also nginx ~20%
● PC Software: Thunderbird, Webmail – Firefox, Chromium – LibreOffice/OpenOffice – VLC
● RDBMS: MySQL, PostgreSQL. Surprisingly, also SAPDB

Known F/OSS projects (2)
● Programming: Perl, Python, PHP, Java (open source for some years now)
● The well-known LAMP stack of software development:
– Linux
– Apache HTTP Server
– MySQL
– Perl/Python/PHP
● ...

Where F/OSS resides
● On self-managed web sites (e.g. Apache.org)
● Collectively, in forges: SourceForge, GitHub, RubyForge, Tigris.org, BountySource, Launchpad, BerliOS, JavaForge, GNU Savannah, Gitorious...

Some current statistics on GitHub
● https://octoverse.github.com/

People Location

Repositories

Advantages
● Interoperability
● Ease of extension
● Transparency
● Security (!?), Reliability
● Cost savings (zero acquisition cost, free updates, reduced vendor lock-in risk)

Drawbacks - Weaknesses
● Development continuity is not guaranteed (however, the same issue appears in closed source as well)
● Possibility of forking
● Hard to assess the maturity of a specific FLOSS project
● Lack of usability (in recent years this issue has been addressed in many cases)
● Possible lack of support

Open Source Issue Survey: GitHub
● Incomplete or confusing documentation: ~90%
● Unresponsiveness: ~80%
● Dismissive responses: ~55%
● Conflict: ~45%
● Unexplained rejection: ~30%
● Unwelcoming language or content: ~15%

Open Source criticism
● "Ubuntu Spyware: What to do?", by Richard Stallman: https://www.fsf.org/blogs/rms/ubuntu-spyware-what-to-do
● Lennart Poettering's statement about open source communities: https://plus.google.com/+LennartPoetteringTheOneAndOnly/posts/J2TZrTvu7vd
● Criticism of "given enough eyeballs, all bugs are shallow", by Bob Glass: http://books.google.gr/books?id=3Ntz-UJzZN0C&pg=PA174&redir_esc=y#v=onepage&q&f=false

Technical Infrastructure: web site
● Open source projects should have two different websites:
– A user website
– A developers’ website
● These two websites contain different types of information and serve different purposes.
● There should be a way to reach the developers’ website from the user-oriented website.
● An example is LibreOffice, with its users’ website (top) and developers’ website (bottom).

Technical Infrastructure: “Canned” Hosting
● A “canned” hosting site offers most of the necessary collaboration tools for an open source project.
For example, it could offer some or all of the following:
– Public version control repositories
– Bug tracking
– Wiki space
– Mailing list hosting
– Continuous integration testing and other services
● For many projects, canned hosting is an adequate and perfectly acceptable solution.
● The most popular such service today is GitHub (https://github.com/)

Technical Infrastructure: Mailing Lists/Forums
● Mailing lists and forums are message-based communication platforms where posts are organized in threads (i.e. topics).
● Users can subscribe, create posts and get notified when answers become available.
● Usually there is web-based access to old messages, with search facilities.
● Some things to consider when choosing a mailing list/forum solution:
– It should provide mail and web-based access
– It should provide spam checking and moderation capabilities
– It should provide archiving and searching

Technical infrastructure: forum software
● Discourse: https://discourse.org/
– You can install it yourself. Provides both mail and web interfaces. It is open source software, with commercial support available.
● Google Groups: https://groups.google.com/
– Not an open source service. Provides searchable archives, moderation and spam prevention, as well as both mail and web access.
● And many others:
– GroupServer: http://groupserver.org/
– Sympa: https://www.sympa.org/
– Mailman: http://www.list.org/

Technical infrastructure: version control
● A version control system tracks and controls changes to a project’s files: primarily the source code, but also other files (e.g. web pages).
● Currently the most widely used version control system is Git (https://git-scm.com/), with repositories usually hosted on GitHub
● Other options include the following:
– Mercurial: https://www.mercurial-scm.org/
– Subversion: https://subversion.apache.org/

Technical infrastructure: bug tracker
● A more generic platform than