The Evolution of C Programming Practices: a Study of the Unix Operating System 1973–2015

The Evolution of C Programming Practices: A Study of the Unix Operating System 1973–2015 Diomidis Spinellis Panos Louridas Maria Kechagia [email protected] [email protected] [email protected] Department of Management Science and Technology Athens University of Economics and Business Patision 76, GR-104 34 Athens, Greece ABSTRACT or at the resulting artefacts (outputs). Furthermore, we can Tracking long-term progress in engineering and applied sci- examine both using either qualitative or quantitative means. ence allows us to take stock of things we have achieved, ap- The objective of this work is to study the long term evo- preciate the factors that led to them, and set realistic goals lution of C programming in the context of the Unix oper- for where we want to go. We formulate seven hypotheses as- ating system development. The practice of programming sociated with the long term evolution of C programming in is affected by tools, languages, ergonomics, guidelines, pro- the Unix operating system, and examine them by extracting, cessing power, conventions, as well as business and soci- aggregating, and synthesising metrics from 66 snapshots ob- etal trends and developments. Specific factors that can tained from a synthetic software configuration management drive long term progress in programming practices include repository covering a period of four decades. We found that the affordances and constraints of computer architecture, over the years developers of the Unix operating system ap- programming languages, development frameworks, compiler pear to have evolved their coding style in tandem with ad- technology, the ergonomics of interfacing devices, program- vancements in hardware technology, promoted modularity ming guidelines, processing memory and speed, and social to tame rising complexity, adopted valuable new language conventions. These might allow, among other things, the features, allowed compilers to allocate registers on their be- more liberal use of memory, the improved use of types, the half, and reached broad agreement regarding code format- avoidance of micro-optimisations, the writing of more de- ting. The progress we have observed appears to be slowing scriptive code, the choice of appropriate encapsulation mech- or even reversing prompting the need for new sources of in- anisms, and the convergence toward a common coding style. novation to be discovered and followed. Here are a few examples. The gradual replacement of clunky teletypewriters with addressable-cursor visual dis- play terminals in the 1970s may have promoted the use of CCS Concepts longer, more descriptive identifiers and comments. Compil- •Software and its engineering ! Software evolution; ers using sophisticated graph colouring algorithms for regis- Imperative languages; Software creation and management; ter allocation and spilling [12] may have made it unnecessary Open source model; •General and reference ! Empir- to allocate registers in the source code by hand. The real- ical studies; Measurement; •Social and professional isation that the overuse of the goto statement can lead to topics ! Software maintenance; History of software; spaghetti code [13] might have discouraged its use. Simi- larly, one might hope that the recognition of the complexity and problems associated with the (mis)use of the C prepro- Keywords cessor [15,34,48,49] may have led to a reduced and more dis- C; coding style; coding practices; Unix; BSD; FreeBSD ciplined application of its facilities. Also, one would expect that the introduction and standardisation of new language 1. INTRODUCTION features [2,23,45] would lead to their adoption by practition- ers. Finally, the formation of strong developer communities, Tracking long-term progress in engineering and applied the maturing of the field, and improved communication fa- science allows us to take stock of things we have achieved, cilities may lead to a convergence on code style. appreciate the factors that led to them, and set realistic In more formal terms, based on a simple-regression ex- goals for where we want to go. Progress can be tracked along ploratory study [54], we established the following hypothe- two orthogonal axes. We can look at the processes (inputs) ses, which we then proceeded to test with our data. H1: Programming practices reflect technology affordances If screen resolutions rise we expect developers to become ACM acknowledges that this contribution was authored or co-authored by an em- ployee, contractor or affiliate of a national government. As such, the Government more liberal with their use of screen space, as they are no retains a nonexclusive, royalty-free right to publish or reproduce this article, or to al- longer constrained to use shorter identifiers and shorter lines. low others to do so, for Government purposes only. Higher communication bandwidth (think of the progress from ICSE ’16, May 14-22, 2016, Austin, TX, USA a 110 bps asr-33 teletypewriter, to a 9600 bps vt-100 char- c 2016 ACM. ISBN 978-1-4503-3900-1/16/05. $15.00 acter addressable terminal, to a 10mb Ethernet-connected DOI: http://dx.doi.org/10.1145/2884781.2884799 bitmap-screen workstation) makes typing more responsive they indeed change over time, as well as the direction and and the refresh of large code bodies faster. Similarly, if rate of change. barriers imposed to compilation unit sizes by the lack of The results of the study on long term evolution of pro- memory or processing capacity are removed, we expect de- gramming practices can be used to allocate the investment velopers to abandon artificially-imposed file size limits, and of effort in areas where progress has been efficiently achieved, move towards longer files packed with more functionality. and to look for new ways to tackle problems in areas showing a lack of significant progress. Also, given the hypothesis that H2: Modularity increases with code size the structure and internal quality attributes of a working, As the Unix source code evolves and multiplies in size non-trivial software artefact will represent first and foremost and complexity we expect developers to manage this growth the engineering requirements of its construction [51], the re- through mechanisms that promote modularity. These in- sults can also indicate areas where developers rationally al- clude the use of the static keyword to limit the visibility located improvement effort and areas where developers did of globally visible identifiers, and the pairing of header files not see a reason to invest. with C implementation into discrete modules. This paper builds on the earlier short exploratory study [54] H3: New language features are increasingly used by explicitly stating and testing hypotheses, by using a more to saturation point sophisticated statistical method for the analysis, by a de- The C language has been changing over time with new tailed explanation of the employed methods, and by present- features being added, albeit sparingly. We expect them to ing possible causality links between external changes, the be increasingly used after being introduced, up to a point metrics, and the hypotheses. To formulate the hypotheses | certainly, a language feature can only be used a certain we took into account the general technological trends, such number of times in a program. For example, when the un- as improved screen technologies and software techniques, signed keyword was introduced, it could in theory, be used combined with metrics, which we could measure from our at most on all integer definitions and declarations of a pro- data and could bear on the trends. In the following sec- gram, and in practice only on those where the underlying tions we describe the methods of our study (Section 2), we value was indeed non-negative. We call this limit the fea- present and discuss the results we obtained (Section 3), and ture's saturation point. outline related work (Section 4). Section 5 summarises our H4: Programmers trust the compiler for register conclusions and provides directions for further work. allocation In the days of yore, programmers had to deal themselves 2. METHODS with the nitty-gritty details of register allocation by declar- Our study is based on a synthetic software configuration ing variables with the register keyword. As compiler tech- management repository tracking the long term evolution of nology has improved, compilers have become more adept the Unix operating system. At successive time points of at the task. We expect that developers have been noticing significant releases we process the C source code files with this and have therefore been more and more trusting of the a custom-developed tool to extract a variety of metrics for compilers with handling register allocation optimizations. each file. We then synthesise these metrics into weighted H5: Code formatting practices converge to a com- values that are related to our hypotheses, and analyse the mon standard results over time using established statistical techniques. We expect developers working on a single project, such 2.1 Data Collection and Primary Metrics as Unix, to adopt a common coding standard, making the code base more homogeneous over time. This can be aided The analysis of the Unix source code over the long term through the increased availability and use of collaboration was made possible by the fact that important Unix material mechanisms such as version control systems and online com- of historical importance has survived and is nowadays openly munities. available. Although Unix was initially distributed with rel- atively restrictive licenses, significant parts of its early de- H6: Software complexity evolution follows self cor- velopment have been released by one of its rights-holders rection feedback mechanisms (Caldera International) under a liberal license. Additional We expect that as software evolves it becomes more com- parts were developed or released as open source software by plex, to a degree. A successful project cannot grow in com- (among others) the University of California, Berkeley and plexity beyond the confines of human comprehension.

The Evolution of C Programming Practices: a Study of the Unix Operating System 1973–2015

Megaraid 6Gb/S SAS RAID Controllers Users Guide

Software Rejuvenation Model for Cloud Computing Platform

Retrocomputing As Preservation and Remix

The Past, Present, and Future of Software Evolution

A Brief History of Software Engineering Niklaus Wirth ([email protected]) (25.2.2008)

SWARE: a Methodology for Software Aging and Rejuvenation Experiments

Software Evolution of Legacy Systems a Case Study of Soft-Migration

Software Design Introduction

Govt Linux Appreciation Day 09/18/01

Netiq® Cloudaccess and Mobileaccess Installation and Configuration Guide

The History of Computing in the History of Technology

An Overview of Software Evolution