University of Calgary PRISM: University of Calgary's Digital Repository

Graduate Studies Legacy Theses

2001

A descriptive process model for open-source software development

Johnson, Kim

Johnson, K. (2001). A descriptive process model for open-source software development (Unpublished master's thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/22282 http://hdl.handle.net/1880/41007

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca

The author of this thesis has granted the University of Calgary a non-exclusive license to reproduce and distribute copies of this thesis to users of the University of Calgary Archives.

Copyright remains with the author.

Theses and dissertations available in the University of Calgary Institutional Repository are solely for the purpose of private study and research. They may not be copied or reproduced, except as permitted by copyright laws, without written authority of the copyright owner. Any commercial use or publication is strictly prohibited.

The original Partial Copyright License attesting to these terms and signed by the author of this thesis may be found in the original print version of the thesis, held by the University of Calgary Archives.

The thesis approval page signed by the examining committee may also be found in the original print version of the thesis held in the University of Calgary Archives.

Please contact the University of Calgary Archives for further information. E-mail: [email protected] Telephone: (403) 220-7271 Website: http://www.ucalgary.ca/archives/

THE UNIVERSITY OF CALGARY

A Descriptive Process Model for Open-Source Software Development

by

Kim Johnson

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF COMPUTER SCIENCE

CALGARY, ALBERTA

JUNE, 2001

© Kim Johnson 2001

Abstract

Open Source is a term used to describe a tradition of open standards, shared source code, and collaborative software development. However, the methodology itself has yet to be captured definitively in writing. The single best description is Eric Raymond's (1998a) The Cathedral and the Bazaar, and while excellent, it is not an academic work but more a pseudo-evangelical report from the field. Consequently, the current perception of what constitutes open-source software development remains somewhat subjective.

This thesis attempts to describe an introductory process model for open-source software development. Common characteristics are identified and discussed with specific examples from various open-source projects. The results lend support to suggestions that open-source software development follows an adaptive lifecycle, with a flexible management model emphasizing leadership, collaboration, and accountability.

Moreover, open-source software development would seem to represent an alternative approach to distributed software development, able to offer useful information about common problems as well as possible solutions.

Acknowledgements

This work would not have been possible without the guidance and support of many people. I would first like to thank my supervisor, Dr. Rob Kremer, for giving me an opportunity to research a somewhat unconventional subject. Thanks especially for a flexible yet supportive advisory style.

My appreciation and respect go to those who have pioneered open-source software development. It is a truly unique approach and a fascinating area for research. In particular, thanks to the following people for taking time to review an early draft of this work: Brian Behlendorf, Roy Fielding, Michael Johnson, David Lawrence, Jason Robbins, Guido van Rossum, Erik Troan, and Paul Vixie.

I would also like to thank Mildred Shaw, Alfred Hussein, and the other early adopters at SERN for an excellent introduction to the complex subject of software engineering. It has provided me with a solid foundation for continued learning, and I hope it has made me a better practitioner.

And last but certainly not least, most heartfelt thanks to Tera and Kylan for their motivation and continued tolerance of long hours at the keyboard.

... when men were men and wrote their own device drivers ...

Table of Contents

Abstract iii

Acknowledgements iv

Table of Contents vi

List of Tables viii

List of Figures ix

List of Abbreviations and Nomenclature x

Chapter 1 Introduction 1
1.1 Aim 1
1.2 Motivation 1
1.3 Open-Source Software 3
1.4 Software Process Models 4
1.5 Approach 6
1.6 Objectives 8
1.7 Thesis Structure 9
1.8 Summary 9

Chapter 2 Open-Source Software Development 10
2.1 History 10
2.2 Definition 13
2.3 The Cathedral and the Bazaar 19
2.4 Projects 23
2.5 Summary 29

Chapter 3 State View 30
3.1 Closed Prototyping 31
3.2 Iterative and Incremental Enhancement 35
3.3 Concurrent Development 41
3.4 Large-Scale Peer Review 45
3.5 User-Driven Requirements 50
3.6 Summary 54

Chapter 4 Organizational View 55
4.1 Decentralized Collaboration 56
4.2 Trusted Leadership 60
4.3 Internal Motivation 64
4.4 Asynchronous Communication 68
4.5 Summary 74

Chapter 5 Control View 76
5.1 Informal Planning 77
5.2 Tiered Participation 79
5.3 Modular Design 86
5.4 Ubiquitous Tool Support 91
5.5 Shared Information Space 96
5.6 Summary 99

Chapter 6 Evaluation 100
6.1 Key Strengths 100
6.2 Key Weaknesses 104
6.3 Summary 108

Chapter 7 Conclusions 109
7.1 Addressing the Objectives 109
7.2 Future Directions 110
7.3 Thesis Summary 112

Bibliography 117

Appendices 132
A.1 Open Source Chronology (Selected Events) 132
A.2 Open Source Projects 136
A.3 Open Source Definition 148
A.4 GNU General Public License 149

List of Tables

Table 1. Characteristics of selected open-source projects 6
Table 2. Distribution of sources by software engineering validation method 7
Table 3. Comparison of various licensing practices 15
Table 4. Typical change request 37
Table 5. Comparison of defect density measures between commercial projects and Apache 47
Table 6. Timeline of a bug fix 49
Table 7. Comparison of code productivity of the top Apache developers and the top developers in several commercial projects 79
Table 8. Levels of participation in open-source projects 80
Table 9. Top 5 languages and testing tools used in a small-scale survey on quality-related activities in open-source development 92
Table 10. Apache shared information space 96

List of Figures

Figure 1. Various categories of free software 14
Figure 2. Market share for top HTTP servers across all domains 23
Figure 3. Comparison of evolutionary development vs. waterfall life cycle 36
Figure 4. Growth of the compressed tar file for the full kernel source release 40
Figure 5. Typical build cycle 42
Figure 6. Proportion of changes closed within a given number of days for Apache 50
Figure 7. E-mail discourse 69
Figure 8. List server discourse 69
Figure 9. Activity for the Python mailing list 71
Figure 10. Milestone schedule for 2001 78
Figure 11. Cumulative distribution of contributions to the Apache code base 83
Figure 12. Histogram of LOC added per programmer for the GNOME project 84
Figure 13. Cumulative distribution of PR-related changes to the Apache code base 85
Figure 14. Mozilla ownership architecture 89
Figure 15. Linux ownership architecture 89

List of Abbreviations and Nomenclature

API (Application Programming Interface) - An interface prescribed by an operating system or application, defining the rules for interaction with other software.
build - A compiled program intended for distribution.
Brooks's Law - "Adding manpower to a late software project makes it later." The perceived benefit of adding more people to a project is outweighed by the cost of coordinating and merging their work.
bus syndrome - Refers to a process that has become too dependent on the input of one individual.
C2Net - A software company whose flagship product is a commercial version of the Apache Web server. Acquired by Red Hat in 2000.
CGI (Common Gateway Interface) - A standard for interfacing external applications with Web servers.
Conway's Hypothesis - States that the organization of a software system will be congruent to the organization of the group that designed the system.
copyleft - A general method for making a program free software, and requiring all modified and extended versions to be free software as well.
cost - Effort cost, or the number of hours required to perform a task.
CPAN (Comprehensive Perl Archive Network) - A large collection of Perl software and documentation.
CVS (Concurrent Versions System) - The dominant version control system for open-source software development.
Cyclic - A software company that originally sold support for CVS. Acquired by SourceGear in 1999.
Cygnus - A software company credited with pioneering the commercialization of open-source software. Acquired by Red Hat in 2000.
commit-then-review - Changes are deemed inherently acceptable and are applied, with testing and review afterwards.
GPL (General Public License) - A license typically used for free software.
GNU (GNU's Not Unix) - Used to reference the GNU Project, a development effort to produce a free Unix-like operating system. It is pronounced "guh-NEW." See also: FSF.
FAQ (Frequently Asked Questions) - Documents that list and answer the common questions on a particular subject.
feature creep - The tendency to continually add features at the expense of elegance and simplicity.
Free Software - Refers to the users' freedom to run, copy, distribute, study, change, and improve software. See also: GNU.
FSF (Free Software Foundation) - A non-profit organization that raises funds for work on the GNU Project. See: GNU.
KDelta - Thousand lines of code changed.
KLOCA - Thousand lines of code added.
Linus's Law - "Given enough eyeballs, all bugs are shallow." Given a large enough pool of peer reviewers, every problem will be obvious to someone.
Meritocracy - A system of management in which the amount of access and participation allowed is based on the opinions of one's peers.
MR (Modification Request) - A request to change a program, either for enhancement or defect correction.
Open Source - In the broadest sense, a term used to describe a tradition of open standards, shared source code, and collaborative development. See also: OSD, OSI.
OSD (Open Source Definition) - A formal definition of what is meant by the term "Open Source," according to the Open Source Initiative. See: OSI.
OSI (Open Source Initiative) - A non-profit organization dedicated to managing and promoting the Open Source Definition. See: OSD.
Red Hat - A software company best known for its popular Linux distribution.
review-then-commit - Changes are proposed, discussed, considered, and reviewed before being applied.
patch - A utility that takes a "patch" file containing a difference listing produced by diff and applies those differences to one or more original files, producing a "patched" version.
SMB (Server Message Block) - A protocol used by most PC-related machines to share files, printers, and various other services.
tarball - A compressed archive.
Town Council Effect - A situation described by Alan Cox, in which there is a high ratio of "potentially useful wannabe programmers" to active developers. The result is that it becomes difficult to get work done under a pure bazaar model.
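Several of the terms above (diff, patch, tarball) describe the basic mechanics by which open-source changes are exchanged. As a minimal, hypothetical sketch (file and patch names are invented for illustration), a contributor might capture a change as a difference listing and a maintainer apply it as follows:

```shell
# Contributor side: keep a pristine copy, edit the working copy,
# then capture the change as a unified difference listing.
printf 'int main(void) { return 1; }\n' > hello.c.orig
printf 'int main(void) { return 0; }\n' > hello.c
diff -u hello.c.orig hello.c > fix-exit-code.patch || true  # diff exits 1 when files differ

# Maintainer side: apply the difference listing to an untouched
# copy of the original, producing a "patched" version.
cp hello.c.orig hello-maintainer.c
patch hello-maintainer.c < fix-exit-code.patch

# Release: bundle the result as a compressed archive (a "tarball").
tar czf hello-1.0.tar.gz hello-maintainer.c
```

The review-then-commit and commit-then-review policies defined above differ only in when this patch application happens relative to peer review.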


Chapter 1 Introduction

The idea is to make programming a more public practice, under common scrutiny of many team members, rather than a private art.

Harlan Mills

Objectives

• To state the aim of the thesis

• To explain the motivation for the thesis

• To introduce Open-Source Software

• To introduce software process modeling

• To explain the approach used in researching the thesis

• To state the objectives of the thesis

• To preview the chapters of the thesis

1.1 Aim

The aim of the thesis is to provide a descriptive process model for open-source software development.

1.2 Motivation

Open Source is a term used to describe a tradition of open standards, shared source code, and collaborative development (O'Reilly, 1999). Software such as the Linux and

FreeBSD operating systems, the Apache Web server, the Perl, Tcl, and Python languages, and much of the Internet infrastructure, can all be categorized as Open Source.

In 1998, Netscape made international news when it decided to release the next version of its Communicator browser as open-source software (Charles, 1998). Soon after, IBM adopted Apache as the core of its e-commerce product line (IBM, 1998). Investments by Intel and others have helped propel companies such as Red Hat, a well-known Linux distributor, to stock market prominence (O'Brian, 1999). Apple (1999), IBM (2000), and Sun (2000) have either adopted or attempted to emulate open-source licensing for certain projects. Even Microsoft, which is known for its competitive view of Linux (Valloppillil, 1998a, 1998b), funds Perl development through a cooperative venture with ActiveState (1999).

Why has Open Source prompted so much interest? For software developers, the most compelling answer is the Internet. Reaching the market first is critical and, as a ubiquitous communications infrastructure, the Internet inherently supports globally distributed product development. It is quickly becoming recognized as an enabling factor that allows companies to meet the challenges of developing software under tightening market conditions (Maurer and Kaiser, 1998).

However, developing software over the Internet, while potentially advantageous, is not always easy. In this regard, an increasing amount of research has been directed toward distributed software processes. Unfortunately, much of this work focuses on adapting conventional models, and the monolithic processes underlying these methodologies are not well suited for heterogeneous, rapidly changing environments. Open Source represents an alternative approach to software development that has evolved within the Internet itself, offering useful information about common problems as well as some possible solutions.

But what is open-source software development? Advocates suggest that it produces more reliable software in less time than closed models. Giving users of a product access to the source code encourages natural product evolution as well as pre-planned product design (OSI, 1999a). Moreover, this is achieved in a setting of continual change.

Yet these broad claims remain largely unsubstantiated, and critics have some hard questions. For example, is this really a new way of building software? Is each of the successes a fluke of circumstance, or is there a repeatable process at work? How reliant is the whole development model on the hobbyist hacker or computer science student who just happens to put the right pieces together to make something work well? (Behlendorf, 1999)

The most immediate problem in answering these questions is that the methodology has yet to be captured definitively in writing. The single best description is Eric Raymond's (1998a) The Cathedral and the Bazaar, and while excellent, it is not an academic work but more a pseudo-evangelical report from the field.

Although much anecdotal evidence exists for a wide range of projects, this information has not been placed within a common framework. Consequently, the current perception of what constitutes open-source software development remains somewhat subjective.

Frequently cited practices are neither well documented nor universally agreed upon.

Without a consistent format for discussion, it is difficult for researchers and practitioners alike to attempt to emulate or even assess open-source projects.

It is therefore important to establish at least an introductory process model for open- source software development. Common practices can be documented and used to develop a descriptive model that discusses the methodology in the context of contemporary software engineering. By improving the process definition, it becomes easier to study the dynamics of open-source software development objectively and in more detail.

1.3 Open-Source Software

The term "open source" is alternately used to refer to a philosophy, a way of doing business, and a software development methodology. These each represent different aspects of open source, and they can be discussed both exclusively and interchangeably.

The philosophy behind open source is based on the concept of free software. Free software refers not to price but to liberty, or the freedom to modify and redistribute source code. This belief is founded on a social ideal advocating collaboration through knowledge sharing.

Open source has achieved widespread recognition largely as a result of industry resistance to the notion of free software. Many free software practitioners felt that their business and development practices were not being given fair consideration due to misconceptions about the underlying philosophy. The open source label was established to market the commercial viability of free software, while maintaining the same basic approach.

Since traditional licenses and fees cannot be used with open-source software, several business models have been proposed for companies that are creating or leveraging open-source products (Hecker, 1998). These include Support Sellers, in which revenue comes from media distribution, branding, consulting, custom development and post-sales support. Red Hat and Cygnus are popular examples.

Loss Leaders use a no-charge open-source product to establish market position. Netscape is applying this model with Communicator, attempting to regain a greater share of the browser market. Companies that are in business primarily to sell hardware can use Widget Frosting, where open-source software is used to enable driver and interface code. Accessorizing involves selling products that support open-source software, such as books and hardware. Examples include O'Reilly, VA Linux, and SSC. Other models have also been described, including some hybrids, in which the constraints surrounding open source are relaxed somewhat to interact with more conventional business practices.

Open-source software development itself has been advocated as a way of building highly reliable products that more closely approximate actual user requirements. Ingrained in the hacker culture, open source appears to represent the antithesis of software engineering. Consequently, it is commonly disregarded as lacking any repeatable methodology. However, despite this historical stereotype, open-source projects follow surprisingly consistent development practices.

Projects commonly deal with issues such as distributed coordination, large-scale collaboration, and rapid incremental product evolution. Generally speaking, an individual or small group develops some software, makes it available on the Internet for review, screens changes to the code base, and delegates responsibility as participation grows. This approach has been proven to be highly effective in many different projects.

1.4 Software Process Models

A software process comprises the activities, methods, and practices necessary to develop a software system (Humphrey, 1989). Software process models are abstractions of a particular development approach. The purpose of a process model is to reduce complexity of understanding by removing unnecessary detail.

Process models can be either prescriptive or descriptive. A prescriptive model characterizes what is supposed to be done, whereas a descriptive model captures what is actually done. Prescriptive models suggest minimal process improvement. The model is considered correct as it is represented.

Descriptive models provide useful information about a process and its behaviour. They can be used to facilitate discussion, where a group requires a common representational format. A descriptive model can also support process management and improvement, helping to identify potential problems before they occur.

It is often useful to structure a process model using different views. These views underlie separate yet interrelated representations for analyzing and presenting information. They are analogous to "different vantage points from which one may view an observable process" (Curtis et al., 1992). Each view is an essential perspective that must be understood, defined, and managed. Ideally, when combined, these perspectives will produce an integrated, consistent, and complete process model.

Humphrey (1989) proposes three basic views for software process models: the state view, the organizational view, and the control view. The state view covers the various stages of product development. This includes both tasks and product states relating to design, coding, and testing. The organizational view of a software process model addresses the social aspect of development. This includes factors relating to communication and coordination, key roles and responsibilities, and motivation. The control view of a software process model focuses on direction. More specifically, it deals with mechanisms for guiding development. This includes planning, approval, data gathering, support, and reporting.

Each of these views can be represented with varying levels of formality and granularity.

Granularity is driven by the need to ensure precision, or the degree to which a process model specifies repeatable steps. A larger grained model is more suitable when individual steps are well understood. Emphasis is on understanding the interaction and relationships between steps.

There has been considerable discussion regarding the level of formality needed in software process modeling. Osterweil (1987) takes a strong position on formality, arguing that "software processes are software too." However, Lehman (1987) maintains that formal representations such as computer programming are too deterministic for processes enacted by humans. In practice, the level of formality depends on the purpose of the model. Informal representations are more appropriate for human enactment, as opposed to process automation.

1.5 Approach

The approach taken in researching the thesis was as follows. A brief introduction to the topic area was developed, and the aim and objectives were formalized and presented for review. Based on comments received, a literature survey was performed, touching on subjects such as the history of free software, licensing, and some relevant definitions.

The survey also included a critical synopsis of Eric Raymond's (1998a) The Cathedral and the Bazaar, widely recognized as the de facto treatment of the open-source methodology.

Approximately 50 projects were then catalogued according to the Open Source Definition. These were selected on the basis of personal recommendations, exposure within the open-source community, and trade press coverage. The selection covered a range of application domains, and was meant to capture a suitable level of diversity.

Summary information was collected for each project as listed in Appendix A.2. This included a review of goals, licensing, community, history, and current status. Of the roughly 50 projects, 10 were selected for more detailed study as described in Table 1. Backgrounds were compiled, and the products themselves were also tried.

Table 1. Characteristics of selected open-source projects.

Project    Size (KLOC)1    Application Domain
Apache     100             HTTP server
Linux      800             Operating system
INN        150             NNTP server
BIND       150             DNS server
KDE        250             Desktop environment
GNOME      150             Desktop environment
Mozilla    1500            Web browser
Perl       150             Programming language
Python     160             Programming language
Samba      150             SMB server

1 KDE and GNOME KLOC estimates include the base product only.

With the literature survey as a starting point, additional information was collected for each of the short-listed projects. A breakdown of sources by validation method is shown in Table 2.2

Emphasis was placed on sources relating to observational and historical methods. This included first person accounts, as well as numerous published interviews with various core developers. Where necessary, this information was clarified with additional questioning via e-mail. To minimize selection bias, critical comments were taken into consideration wherever possible.

Table 2. Distribution of sources by software engineering validation method.

Validation method     Category        Description                                        %
Project monitoring    Observational   Collect development data                           8
Case study            Observational   Monitor project in depth                          14
Assertion             Observational   Use ad hoc validation techniques                   8
Field study           Observational   Monitor multiple projects                          2
Literature search     Historical      Examine previously published studies               2
Legacy                Historical      Examine data from completed projects              16
Lessons learned       Historical      Examine qualitative data from completed projects  44
Static analysis       Historical      Examine structure of developed product             4
Replicated            Controlled      Develop multiple versions of product               0
Synthetic             Controlled      Replicate one factor in laboratory setting         0
Dynamic analysis      Controlled      Execute developed product for performance          0
Simulation            Controlled      Execute product with artificial data               2

Sources relying on legacy data were especially useful. Open-source projects record nearly everything in electronic form, making data collection reasonably straightforward. Artifacts also tend to be fairly consistent. Several supporting studies used data mining3 techniques for information recovery and analysis.

In addition to these sources, short-listed projects were periodically monitored throughout the duration of the study. This was accomplished by passively subscribing to various mailing lists and newsgroups, where most development activity could be observed. CVS archives and problem report databases were also helpful in retracing project histories.
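The repository mining used in the supporting studies can be illustrated with a small script. The sketch below is hypothetical (the log excerpt is fabricated and the author names are placeholders): it counts revisions per author in output of the general shape produced by the cvs log command, roughly the kind of extraction those studies automated against full project archives.

```python
import re
from collections import Counter

# Fabricated excerpt in the general shape of "cvs log" output;
# a real study would feed in the full log of a repository.
SAMPLE_LOG = """\
revision 1.3
date: 2000/05/04 12:01:55;  author: alice;  state: Exp;  lines: +4 -1
revision 1.2
date: 2000/04/30 09:17:02;  author: bob;  state: Exp;  lines: +10 -2
revision 1.1
date: 2000/04/29 18:44:10;  author: alice;  state: Exp;
"""

def commits_per_author(log_text: str) -> Counter:
    """Count how many revisions each author committed."""
    return Counter(re.findall(r"author: (\w+);", log_text))

print(commits_per_author(SAMPLE_LOG))  # Counter({'alice': 2, 'bob': 1})
```

From counts like these, per-developer contribution distributions (such as those reported for Apache and GNOME) can be tabulated directly.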

2 See Zelkowitz and Wallace (1998) for a detailed taxonomy of these software engineering validation methods.
3 Mockus et al. (2000) used scripts to extract data from the Apache developer mailing list archive, CVS repository, and problem reporting database. Koch and Schneider (2000) used a similar approach, retrieving data from the GNOME CVS repository and several discussion lists.

Combining these resources, meta-analysis was performed across projects to identify common attributes. Characteristics were then organized into state, organizational, and control process views by asking the following questions respectively: How is the work produced? How is the work organized? How is the work controlled? This made it possible to build a process framework using a stepwise, repeatable approach.

The model itself is described informally with a high level of abstraction. Practical examples together with excerpted comments from various project participants were used to illustrate fundamental points. Key strengths and weaknesses were also outlined.

Lastly, a preliminary draft was distributed to select members of the open-source community, eliciting feedback. This allowed for some validation, and contributed to a final critique. Wherever possible, remarks were folded back into the model to improve the accuracy of the overall interpretation.

1.6 Objectives

The objectives of the thesis are:

1. To survey the current literature relating to open-source software.

2. To review a range of open-source projects, selecting several for more detailed investigation.

3. To compile information about these projects, both through observational and historical study.

4. To identify common characteristics across projects, consistent with state, organizational, and control views of the development process.

5. To discuss these characteristics with examples from various projects.

6. To critique the resulting model, both independently and through exposure to the open-source community.

1.7 Thesis Structure

Chapter 1 introduces the thesis, stating the aim and objectives. The motivation for the thesis is explained and the approach used in researching the thesis is discussed. Brief overviews are provided for open-source software and software process modeling.

Chapter 2 surveys the subject area in more detail, discussing various aspects of open-source software. The history of free software and the definition of Open Source are reviewed. A critical synopsis of The Cathedral and the Bazaar is provided, as well as profiles of several open-source projects.

Chapter 3 presents a state view of the open-source software development process. Characteristics are discussed relating to design, coding, and testing.

Chapter 4 presents an organizational view of the open-source software development process. Characteristics are discussed relating to communication and coordination, key roles and responsibilities, and motivation.

Chapter 5 presents a control view of the open-source software development process. Characteristics are discussed relating to planning, approval, data gathering, support, and documentation.

Chapter 6 reviews the key strengths of the open-source software development methodology, highlighting various aspects of the process that make it useful. Weaknesses are also discussed, both actual and potential.

Chapter 7 summarizes the thesis, discussing how the objectives have been met and presenting potential directions for future work.

1.8 Summary

This chapter introduced the thesis, stating the aim and objectives. Open-source software and software process modeling were also briefly discussed. The motivation and approach used for research were explained, and the chapters of the thesis were previewed.

Chapter 2 Open-Source Software Development

... think of "free speech," not "free beer"

Richard M. Stallman

Objectives

• To present a brief history of free software

• To review the definition of Open Source

• To provide a critical synopsis of The Cathedral and the Bazaar

• To briefly describe several open-source projects

2.1 History

Open Source is firmly rooted in the Hacker Ethic.4 In the late 1950's, MIT's computer culture originated the term "hacker",5 defined today as "a person who enjoys exploring the details of programmable systems ..." (Raymond, 1996a). Various members of the Tech Model Railroad Club, or TMRC, formed the nucleus of MIT's Artificial Intelligence Laboratory. These individuals were obsessed with the way systems worked.

The word hack had long been used to describe elaborate college pranks devised by MIT students; however, TMRC members used the word to describe a task "imbued with innovation, style, and technical virtuosity" (Levy, 1984). A project undertaken not solely to fulfill some constructive goal, but with some intense creative interest, was called a hack.

Projects encompassed everything electronic, including constant improvements to the elaborate switching system controlling the TMRC's model railroad. Increasingly though, attentions were directed toward writing computer programs, initially for an IBM 704 and later on the TX-0, one of the first transistor-run computers in the world. Early hackers would spend days working on programs intended to explore the limits of these machines.

4 The Hacker Ethic (Levy, 1984, p. 40) states that: "Access to computers - and anything which might teach you something about the way the world works - should be unlimited and total. Always yield to the Hands-On Imperative!"
5 Unfortunately, the term "hacker" suffers from some widespread misconceptions. Popular usage denotes a person involved in mischief or criminal activities using computers. These people are more accurately referred to as "crackers."

In 1961, MIT acquired a PDP-1, the first minicomputer, designed not for huge number- crunching tasks but for scientific inquiry, mathematical formulations, and of course hacking. Manufactured by Digital Equipment Corporation, the PDP series of computers pioneered commercial interactive computing and time-sharing operating systems. MIT hackers developed software that was freely distributed by DEC to other PDP owners.

Programming at MIT became a rigorous application of the Hacker Ethic, a belief that "access to computers - and anything which might teach you something about the way the world works - should be unlimited and total" (Levy, 1984).

MIT was soon joined by Stanford University's Artificial Intelligence Laboratory and later Carnegie-Mellon University. All were thriving centres of software development able to communicate with each other through the ARPAnet, the first transcontinental, high-speed data network. Built by the Defense Department in the late 1960's, it was originally designed as an experiment in digital communication. However, the ARPAnet quickly grew to link hundreds of universities, defense contractors, and research laboratories, allowing for the free exchange of information, particularly software, with unprecedented speed and flexibility. (Raymond, 1996b)

Programmers began to actively contribute to various shared projects. These early collaborative efforts led to informal principles and guidelines for distributed software development stemming from the Hacker Ethic. The most widely known of these projects was Unix, which contributed to the ongoing growth of what would eventually become the Internet.

Unix was originally developed at AT&T Bell Labs, and was not strictly speaking a freely available product. However, it was licensed to universities for a nominal sum, which resulted in an explosion of creativity as programmers built on each other's work.

Traditionally, operating systems had been written in assembler to maximize hardware efficiency, but by the early 1970's hardware and compiler technology had become good enough that an entire operating system could be written in a higher-level language. Unix was written in C, and this provided unheard-of portability between hardware platforms, allowing programmers to write software that could be more easily shared and dispersed.

The most significant source of Unix development outside of Bell Labs was the University of California at Berkeley. UC Berkeley's Computer Science Research Group folded their own changes and other contributions into a succession of releases. Berkeley Unix came to be known as BSD, or Berkeley Software Distribution, and included a rewritten file system, networking capabilities, virtual memory support, and a variety of utilities (Ritchie, 1979).

A few of the BSD contributors founded Sun Microsystems, marketing Unix on 68000-based hardware. Rivalry ensued between supporters of Berkeley Unix and AT&T versions. This intensified in 1984, when AT&T divested and Unix was sold as a commercial product for the first time through Unix System Laboratories.

The commercialization of Unix not only fractured the developer community, but it resulted in a confusing mass of competing standards that made it increasingly difficult to develop portable software. Other companies had entered the marketplace, selling various proprietary versions of Unix. Development largely stagnated, and Unix System Laboratories was sold to Novell after efforts to create a canonical commercial version failed. The GNU project was conceived in 1983 to rekindle the cooperative spirit that had previously dominated software development.

GNU, which stands for GNU's Not Unix, was initiated under the direction of Richard Stallman, who had been a later participant in MIT's Artificial Intelligence Lab and believed strongly in the Hacker Ethic. The GNU project had the ambitious goal of developing a freely available Unix-like operating system that would include command processors, assemblers, compilers, interpreters, debuggers, text editors, mailers, and much more. (FSF, 1998a)

Stallman created the Free Software Foundation, an organization that promotes the development and use of free software, in particular the GNU operating system (FSF, 1998c). Hundreds of programmers created new, freely available versions of all major Unix utility programs. Many of these utilities were so powerful that they became the de facto standard on all Unix systems. However, a project to create a replacement for the Unix kernel itself faltered.

By the early 1990's, the proliferation of low-cost, high-performance personal computers along with the rapid growth of the Internet had reduced entry barriers to participation in collaborative projects. Free software development extended to reach a much larger community of potential contributors, and projects such as Linux and Apache became immensely successful, prompting further formalization of hacker best practices.

The Cathedral and the Bazaar was first presented at Linux Kongress 97 and made widely available on the Web shortly thereafter. Written by Eric Raymond (1998a), the paper contrasts two different styles of software development, "the cathedral model of the commercial world and the bazaar model of the Linux world." The cathedral model is tightly organized and centrally planned. In contrast, Linux development resembles "a great babbling bazaar of differing agendas and approaches."

The paper acted as a catalyst for a widespread grassroots movement, eventually culminating in Netscape's 1998 announcement that it planned to give away the source of its browser. Soon after, a strategy session was held with representatives from the free software community, and the Open Source Software label was coined (OSI, 1999b).6

2.2 Definition

The term "Open Source" was adopted in large part because of the ambiguous nature of the expression free software. The notion of free software does not mean free in the financial sense, but instead refers to the users' freedom to run, copy, distribute, study, change, and improve software. Confusion over the meaning can be traced to the problem that, in English, free can mean no cost as well as freedom. In most other languages, free and freedom do not share the same root; gratuit and libre, for instance. "To understand the concept, you should think of free speech, not free beer," writes Richard Stallman (FSF, 1999a).

Due to the inherent ambiguity of the terminology, various wordings are used interchangeably. This is misleading, as software may be interpreted as something it is not. As shown in Figure 1 (FSF, 1998b), even closely related terms such as free software and Open Source have developed subtle distinctions.7

6 As an historical note, Christine Peterson of the Foresight Institute is credited with coining the term "Open Source Software."

Figure 1. Various categories of free software.

Free software is often confused with public domain software. If software is in the public domain, then it is not subject to ownership and there are no restrictions on its use or distribution. More specifically, public domain software is not copyrighted. If a developer places software in the public domain, then he or she has relinquished control over it. Someone else can take the software, modify it, and restrict access to the source code.

Freeware is commonly used to describe software that can be redistributed but not modified. The source code is not available, and consequently freeware should not be used to refer to free software.

Shareware is distributed at no initial cost, like freeware. Users can redistribute shareware, however anyone who continues to use a copy is required to pay a modest license fee. Shareware is seldom accompanied by the source code, and is not free software.

7 According to the Free Software Foundation (FSF, 1998d): "The Free Software movement and the Open Source movement are like two political camps within the free software community ... We disagree on the basic principles, but agree more or less on the practical recommendations."

Open Source is used to mean more or less the same thing as free software. Free software is "software that comes with permission for anyone to use, copy, and distribute, either verbatim or with modifications, either gratis or for a fee." (FSF, 1999a) In particular, this means that source code must be available.

Free software is often used in an ideological context, whereas Open Source is a more commercially oriented term. The Free Software Foundation advocates free software as a right, emphasizing the ethical obligations associated with software distribution (Stallman, 1999). Open Source is more commonly used to describe the business case for free software, focusing more on the development process than on any underlying moral requirements.

Various free software licenses have been developed. The licenses each disclaim all warranties. The intent is to protect the author from any liability associated with the software. Since the software is provided free of charge, this would seem to be a reasonable request. Table 3 (Perens, 1999) (p. 185) provides a comparison of several common licensing practices.

Table 3. Comparison of various free software licensing practices.

License        Can be mixed    Modifications can   Can be        Contains special
               with non-free   be taken private    relicensed    privileges for the
               software        and not returned    by anyone     original copyright
                               to you                            holder over your
                                                                 modifications
GPL
LGPL           X
BSD            X               X
NPL            X               X                                 X
MPL            X               X
Public Domain  X               X                   X

Copyleft is a concept originated by Richard Stallman to address problems associated with placing software in the public domain. As mentioned previously, public domain software is not copyrighted. Someone can make changes to the software, many or few, and distribute the result as a proprietary product. People who receive the modified product may not have the same freedoms that the original author provided. Copyleft says that "anyone who redistributes the software, with or without changes, must pass along the freedom to further copy and change it." (FSF, 1999b)

To copyleft a program, first it is copyrighted and then specific distribution terms are added. These terms are a legal instrument that provides rights to "use, modify, and redistribute the program's code or any program derived from it but only if the distribution terms are unchanged." (FSF, 1999b)

In the GNU project, copyleft distribution terms are contained in the GNU General Public License, or GPL (see Appendix A.4). The GPL does not allow private modifications. Any changes must also be distributed under the GPL. This not only protects the original author, but it also encourages collaboration, as any improvements are made freely available. (Stallman, 1993)

Additionally, the GPL does not allow the incorporation of licensed programs into proprietary software. Any software that does not grant as many rights as the GPL is defined as proprietary. However, the GPL contains certain loopholes that allow it to be used with software that is not entirely free. Software libraries that are normally distributed with the compiler or operating system may be linked with programs licensed under the GPL. The result is a partially-free program. The copyright holder has the right to violate the license, but this right does not extend to any third parties who redistribute the program. Subsequent distributions must follow all of the terms of the license, even those that the copyright holder violates.

An alternate form of the GPL, the GNU Lesser General Public License or LGPL, allows the linking of free software libraries into proprietary executables under certain conditions.

In this way, commercial development can also benefit from free software. A program covered by the LGPL can be converted to the GPL at any time, but that program, or anything derived from it, cannot be converted back to the LGPL.

The GPL is a political manifesto as well as a software license, and much of the text is concerned with explaining the rationale behind the license. Unfortunately this has alienated some developers. For example, Larry Wall (Lash, 1998), creator of Perl and the Artistic license, says: "the FSF [Free Software Foundation] has religious aspects that I don't care for." As a result, some free software advocates have created more liberal licensing terms, avoiding the political rhetoric associated with the GPL.

The X license and the related BSD and Apache licenses are very different from the GPL and LGPL. The software originally covered by the X and BSD licenses was funded by monetary grants from the US government. In this sense, the public owned the software, and the X and BSD licenses therefore grant relatively broad permissions.

The most important difference is that X-licensed modifications can be made private. An X-licensed program can be modified and redistributed without including the source or applying the X license to the modifications. Other developers have adopted the X license and its variants, including the BSD systems and the Apache web server.

The Artistic license was originally developed for Perl, however it has since been used for other software. The terms are more loosely defined in comparison with other licensing agreements, and the license is more commercially oriented. For instance, under certain conditions modifications can be made private. Furthermore, although sale of the software is prohibited, the software can be bundled with other programs, which may or may not be commercial, and sold.

The Netscape Public License, or NPL, was developed by Netscape. The NPL contains special privileges that apply only to Netscape. Specifically, it allows Netscape to re-license code covered by the NPL to third parties under different terms. This provision was necessary to satisfy proprietary contracts between Netscape and other companies. The NPL also allows Netscape to use code covered by the NPL in other Netscape products without those products falling under the NPL.

Not surprisingly, the free software community was somewhat critical of the NPL. Netscape subsequently released the MPL, or Mozilla Public License. The MPL is similar to the NPL, but it does not contain these special exemptions. Both the NPL and the MPL allow private modifications.

The Open Source Definition is not a software license. Instead it is a specification of what is permissible in a software license for that software to be considered Open Source. The Open Source Definition is based on the Debian free software guidelines, or social contract, which provides a framework for evaluating other free software licenses.

The Open Source Definition (see Appendix A.3) includes several criteria, which can be paraphrased as follows (OSI, 1999c):

1. Free Redistribution - Copies of the software can be made at no cost.
2. Source Code - The source code must be distributed with the original work, as well as all derived works.
3. Derived Works - Modifications are allowed; however, it is not required that the derived work be subject to the same license terms as the original work.
4. Integrity of the Author's Source Code - Modifications to the original work may be restricted only if the distribution of patches is allowed. Derived works may be required to carry a different name or version number from the original software.
5. No Discrimination Against Persons or Groups - Discrimination against any person or group of persons is not allowed.
6. No Discrimination Against Fields of Endeavor - Restrictions preventing use of the software by a certain business or area of research are not allowed.
7. Distribution of License - Any terms should apply automatically without written authorization.
8. License Must Not Be Specific to a Product - Rights attached to a program must not depend on that program being part of a specific software distribution.
9. License Must Not Contaminate Other Software - Restrictions on other software distributed with the licensed software are not allowed.

The GNU GPL, BSD, X Consortium, MPL, and Artistic licenses are all examples of licenses that conform to the Open Source Definition.

The evaluation of a proposed license elicits considerable debate in the free software community. Many companies are developing licenses intended to capitalize on the growing popularity of Open Source. Some of these licenses conform to the Open Source Definition; however, others do not. For example, the Sun Community Source License (Sun, 2000) approximates some Open Source concepts, but it does not conform to the Open Source Definition. The Apple Public Source License, or APSL (Apple, 1999), has been alternately endorsed and rejected by various members of the open-source community, whereas the IBM Public License (IBM, 2000) has been approved.

2.3 The Cathedral and the Bazaar

The Cathedral and the Bazaar is recognized as the canon of the open-source movement. Written by Eric Raymond (1998a), it outlines lessons learned from "a successful open-source project, fetchmail, that was run as a deliberate test of some surprising theories about software engineering suggested by the history of Linux."

While the paper has been widely acclaimed, it has also been criticized for its broad propositions and somewhat misleading approach (Eunice, 1998). For instance, Raymond essentially ignores more contemporary practices in software engineering. As a result, some consider his analysis too simplistic. In any case, The Cathedral and the Bazaar has been the cause of much discussion, and offers a unique and insightful perspective, enumerating various principles that drive the bazaar model. These "tenets of Open Source" are reviewed below.

1. Every good work of software starts by scratching a developer's personal itch.

Free software developers are able to choose the projects they work on, and are thus highly motivated. This correlation has also been confirmed in studies of some commercial projects (DeMarco and Lister, 1999) and carries many potential benefits, including better code quality.

2. Good programmers know what to write. Great ones know what to rewrite (and reuse).

The importance of reuse is not unique to free software development. However, Raymond does note that "the source-sharing tradition of the Unix world has always been friendly to code reuse." With code made freely accessible to everyone, it seems plausible that reuse will be made easier.

3. "Plan to throw one away; you will, anyhow." (Fred Brooks, "The Mythical Man-Month", Chapter 11)

Raymond borrows from Brooks to illustrate an important point, that redesign is often preferable to reworking a flawed concept. This is particularly important in evolutionary prototyping, to which parts of the bazaar model bear some resemblance.

4. If you have the right attitude, interesting problems will find you.

The implication is that a project will evolve more naturally in a software culture where code sharing is encouraged.

5. When you lose interest in a program, your last duty to it is to hand it off to a competent successor.

For code reuse to be effective, it must be maintained. In a voluntary environment it is therefore important for projects to be transferred to someone new when the current owner becomes too busy or tired.

6. Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.

This is one of the key principles of free software development. Projects rely on users to evaluate and improve the software. Moreover, users as co-developers can be extremely effective because of their familiarity with the application domain.

7. Release early. Release often. And listen to your customers.

Another key principle of the bazaar model, although it is not necessarily specific to this approach. In order to improve progress visibility and establish a tight feedback loop, changes should be frequent and small.

8. Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.

Possibly the most important, and in many ways controversial, principle put forward by Raymond. Also known as Linus's Law, it can be stated informally as "Given enough eyeballs, all bugs are shallow." The basic idea is that a larger community of diverse participants will find more errors. While there is little doubt that this approach is fast, its efficiency remains unproven.

9. Smart data structures and dumb code works a lot better than the other way around.

This is a general statement, in which Raymond notes that data structures support understanding through abstraction.
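The spirit of this tenet can be shown with a small, generic example (not taken from Raymond's paper): moving knowledge out of branching logic and into a data structure leaves code that is short, obvious, and easy to verify. The function below is a common illustration, assuming months are numbered 1 through 12:

```c
/* "Smart data structure, dumb code": the knowledge lives in the
 * table, so the logic collapses to a lookup plus one special case. */
static const int days_in_month[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

int month_length(int month, int is_leap_year)
{
    if (month == 2 && is_leap_year)
        return 29;                      /* February in a leap year */
    return days_in_month[month - 1];    /* everything else: table lookup */
}
```

The alternative, a twelve-branch conditional, would encode the same facts but scatter them through control flow, where errors are harder to spot.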

10. If you treat your beta-testers as if they're your most valuable resource, they will respond by becoming your most valuable resource.

Again, Raymond stresses the importance of users as co-developers, noting that the success of any free software product relies on the interest and commitment of its users. Participants need to see their contributions recognized.

11. The next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.

Raymond continues to elaborate on the importance of feedback from a competent user base. A large, diverse community is as important for product evolution as it is for debugging. More people have more ideas, and often develop better solutions.

12. Often, the most striking and innovative solutions come from realizing that your concept of the problem was wrong.

A restatement of (3), emphasizing that developers must not be afraid to discard features in favour of clean design.

13. "Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away."

Raymond references a well-known design principle, expanding on (3) and (12). As in (11), it is also suggested that co-developers are valuable for design as well as debugging. "It is not only debugging that is parallelizable; development and (to a perhaps surprising extent) exploration of design space is, too."

14. Any tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.

By encouraging meaningful input, software is more likely to approximate actual user needs. Good design decisions often follow from diverse input.

15. When writing gateway software of any kind, take pains to disturb the data stream as little as possible — and *never* throw away information unless the recipient forces you to!

This does not really apply to the bazaar model, but instead to the development of fetchmail itself.

16. When your language is nowhere near Turing-complete, syntactic sugar can be your friend.

Another technical point directed more toward fetchmail. Raymond explains why an "English-like" syntax was appropriate for the rc file parser.

17. A security system is only as secure as its secret. Beware of pseudo-secrets.

Raymond cautions against using insecure methods to implement security.

18. To solve an interesting problem, start by finding a problem that is interesting to you.

Essentially a restatement of (1), emphasizing that developers who are able to choose their work are more effective than those who cannot.

19. Provided the development coordinator has a medium at least as good as the Internet, and knows how to lead without coercion, many heads are inevitably better than one.

Raymond notes that the bazaar model is dependent on the Internet as a process framework. The Internet provides easy access to a large, diverse pool of potential contributors. It also supports open channels of communication and a decentralized group structure.

2.4 Projects

There are literally thousands of open-source projects currently in existence. These projects include operating systems, programming languages, utilities, Internet applications and many more. The following 10 projects are notable for their influence, size, and success.

Apache

Apache is a commercial-grade Web server. It originated as a series of patches8 to the public domain HTTP daemon developed at the National Center for Supercomputing Applications (NCSA). Brian Behlendorf (Dougherty, 1998c) recalls: "It started around January of 1995, when a couple members of a Web-related IETF mailing list (or it could have been www-talk, I'm not sure exactly) were commiserating about the loss of NCSA developers to Netscape and the lack of responsiveness from NCSA to bugfixes and patches we were sending in. Since other people had patches I wanted, and I had patches they wanted, we decided to band together."

The first official release was made later that year, and Apache now consistently ranks as the most popular Web server on the Internet, as shown in Figure 2 (Netcraft, 2000). Apache dominates the market and is more widely used than all other Web servers combined. Several companies, including C2Net, distribute commercial versions of Apache, earning money for support services and added utilities.


Figure 2. Market share for top HTTP servers across all domains.

8 Hence the name, A PAtCHy server.

Ongoing development is overseen by the Apache Group, a group of about 20 core developers, who now focus more on business issues and security problems. Development is coordinated through the new-httpd mailing list, and a voting process exists for conflict resolution. Apache operates as a meritocracy, in a format similar to most open-source projects. Responsibility is based on contribution, where the more work you have done, the more work you are allowed to do. The product is licensed under the Apache License, a BSD variant.

Linux

Linux9 is a Unix-like operating system that runs on several hardware platforms, including Intel processors, Motorola MC68K, and DEC Alphas (Linux Journal, 1998). It is a superset of the POSIX specification, with SYS V and BSD extensions. Linux began as a hobby project of Linus Torvalds, then a graduate student at the University of Helsinki, and was inspired by his interest in Minix, a small Unix system developed primarily as an educational tool by Andrew Tanenbaum. Linus (Torvalds, 1992) set out to create, in his own words, "a better Minix than Minix."

In October 1991, Linus (Torvalds, 1992) announced the first official release of Linux, version 0.02, with the following posting to the comp.os.minix newsgroup:

Do you pine for the nice days of minix-1.1, when men were men and wrote their own device drivers? Are you without a nice project and just dying to cut your teeth on a OS you can try to modify for your needs? Are you finding it frustrating when everything works on minix? No more all-nighters to get a nifty program working? Then this post might be just for you :-)

As I mentioned a month (?) ago, I'm working on a free version of a minix-lookalike for AT-386 computers. It has finally reached the stage where it's even usable (though may not be depending on what you want), and I am willing to put out the sources for wider distribution. It is just version 0.02 (+1 (very small) patch already), but I've successfully run bash/gcc/gnu-make/gnu-sed/compress etc under it...

9 Strictly speaking, Linux refers to a Unix-like operating system consisting of many different programs. The majority of these are GNU programs, and some therefore consider "GNU/Linux" to be a more accurate term (for an extended discussion, see Stallman, 1998). Here, "Linux" is taken to refer primarily to the kernel.

Since then, hundreds of programmers have contributed to the ongoing development of Linux. Kernel development is largely coordinated through the linux-kernel mailing list. The list is high volume, and currently includes over 200 active developers as well as many other debuggers and testers. With growth, Linus has relinquished control over certain areas of the kernel, such as file systems and networking, to certain "trusted lieutenants." However, Linus remains the final authority on decisions related to kernel development. The kernel is licensed under the GPL, and releases are made available via ftp at http://www.kernel.org.

Arguably the most well-known open-source project, Linux has quietly gained popularity in academia as well as among scientific researchers and Internet service providers. Recently it has made notable commercial inroads, and is currently marketed as the only viable alternative to Microsoft's Windows NT. A study by International Data Corporation reported that Linux accounted for 17.2% of server operating system shipments in 1998, an increase of 212% over the previous year (Shankland, 1998). The kernel is typically packaged with the various other programs that comprise a Unix operating system. Several companies, such as Red Hat and Caldera, currently sell these packages as Linux distributions.

Although other projects such as Apache and Mozilla have attracted a fair share of media attention, it is Linux that has resulted in so much mainstream interest in open-source software development.

INN and BIND

INN and BIND are both maintained by the Internet Software Consortium (ISC), a non-profit organization dedicated to maintaining reference implementations for core Internet protocols.

BIND, or the Berkeley Internet Name Domain, is a DNS server. It is used on most name serving machines on the Internet. Originally developed as a graduate project at UC Berkeley, early versions were maintained by the Computer Systems Research Group, with later work sponsored by Digital Equipment Corporation. Paul Vixie has acted as the principal maintainer for BIND since 1989.

INN, or the InterNetNews package, is a Usenet news server. Usenet is most simply described as a worldwide bulletin board system. According to David Lawrence (1998): "Rich Salz wrote and released the INN system in the early 1990s, but his ability to support it gradually succumbed to career and family obligations. Although he still nominally controlled the software, no updates were being released. Users took it upon themselves to share patches and generate official releases. After four such releases, Salz turned INN over to the ISC."

Both projects can be considered mature in that they are both stable and widely used. Participation remains steady. Recent versions include various protocol and architectural enhancements.

KDE and GNOME

KDE and GNOME are X11-based desktop environments. The two are similar in many ways; however, GNOME uses the GTK+ toolkit, whereas KDE uses Qt, a graphical library from Troll Tech.

Unfortunately, Qt was initially a proprietary product, and its use was met with mixed reaction, to say the least. The prospect of a free graphical desktop was so attractive that many were willing to overlook the contradictory nature of the KDE Project. However, others initiated GNOME, a fully open-source competitor. Early on, this resulted in some fairly acrimonious debates.

Eventually, Troll Tech relicensed Qt as open source, defusing the conflict. The two projects have both continued, aiming to best each other in terms of functionality and usability. (Perens, 1999)

GNOME, or the GNU Network Object Model Environment, includes a windows-based desktop, a development platform, and a set of office productivity applications. The development platform provides a set of core libraries supporting GUI construction. GNOME Office is based on Bonobo, a component system designed to promote reuse.

KDE also includes a desktop, an application development framework, and an office suite. The application framework is based on KParts compound document technology and leverages open standards such as CORBA. KOffice, the office suite, comprises a word processor, a spreadsheet, and a presentation program, together with numerous other tools.

Mozilla

Mozilla is an open-source deployment of Netscape's popular Web browsing suite, Netscape Communicator. In January 1998 Netscape announced that the source code for the next generation of Communicator would be made freely available. This decision was strongly influenced by a whitepaper written by Frank Hecker, which referenced The Cathedral and the Bazaar (Hamerly et al., 1999) (p. 198):

When Netscape first made Navigator available for unrestricted download over the Internet, many saw this as flying in the face of conventional wisdom for the commercial software business, and questioned how we could possibly make money "giving our software away." Now of course this strategy is seen in retrospect as a successful innovation that was a key factor in Netscape's rapid growth, and rare is the software company today that does not emulate our strategy in one way or another. Among other things, this provokes the following question: What if we were to repeat this scenario, only this time with source code?

The first developer release of the source code was made in late March (Charles, 1998).

Since then, the project has expanded to encompass about 60 owners, each responsible for a particular module. Development is coordinated through mozilla.org, a group providing a central point of contact for those interested in using or improving the code.

In this regard, an extensive web site has been established that includes tools for problem reporting and version management. Discussion forums are available through various newsgroups and mailing lists. All code issued in March was released under the NPL.

New code can be released under the MPL or any compatible license.

Although it has benefited from widespread media exposure, Mozilla has met with its share of challenges. The merger of AOL and Netscape introduced early uncertainty, and the vocal departure of Jamie Zawinski (1999), one of the project leads, led some to label Mozilla as a failure. Still, the project has persevered and now represents a growing set of Internet technologies. Many continue to feel confident that Mozilla will produce a leading web browser, designed for standards compliance, performance, and portability.

Perl and Python

Perl and Python are both interpreted, high-level programming languages that have gained widespread usage on the Internet. While the implementations are quite different from each other, there is a great deal of overlap in functionality between the two.

Consequently, they are often compared and share a somewhat unusual rivalry.

Python originated as a hobby project of Guido van Rossum, who in 1989 began writing a descendant of ABC, a teaching language aimed at non-professional programmers.

Released for free distribution in 1991, Python is now used throughout the software industry. It is an object-oriented language with an elegant, clear syntax and a highly extensible architecture. Guido van Rossum is responsible for almost all of the core implementation, and continues to oversee ongoing development.

Originally developed in 1986 by Larry Wall, Perl has become extremely popular for system and network administration, as well as CGI programming. Larry Wall (Dougherty, 1997) recalls: "It came to me way back when ... the UNIX universe at that time consisted of C and Shell ... C was good at getting down into the innards of things, but wasn't very good at whipping things up quickly. Whereas Shell was good at whipping things up quickly, but couldn't get down into the nitty-gritty stuff. So, if those are two-dimensional on a graph, then there's this big blank area out there where, well, where Perl is now. So that's the origin of Perl."

Perl is maintained by a core group of programmers via the perl5-porters mailing list. Larry Wall retains artistic control of the language; however, a well-defined extension mechanism allows for the development of add-on modules by independent programmers (Wall, 1999b).

Samba

Samba is an implementation of the CIFS protocol suite. CIFS, or the Common Internet File System, was introduced by Microsoft and relies on the SMB protocol, which is used by most PC-related machines to share files, printers, and various other services. Samba allows Unix operating systems to interact with platforms that support SMB natively.

Originated by Andrew Tridgell, Samba is now maintained by the Samba Team, a group of about 20 people who make regular contributions and have write access to the source tree. Development is coordinated through the samba-technical mailing list.

Samba is extremely popular for integrating Unix with other systems, most notably those that are Microsoft-based. It is currently shipped with all major Linux distributions.

Additionally, Silicon Graphics offers commercial support for Samba, as do several other providers. Samba has become so successful that in some configurations it actually exceeds the performance of native SMB platforms.

2.5 Summary

This chapter presented a brief history of free software. The meanings behind free software and Open Source were also discussed, with a review of various licenses as well as the Open Source Definition. A critical synopsis of Eric Raymond's The Cathedral and the Bazaar was provided, and several open-source projects were profiled.

Chapter 3 State View

Release early, Release often. Eric S. Raymond

Objectives

• To identify characteristics consistent with a state view of the open-source software development process

• To discuss these characteristics using examples from various open-source projects

The state view of a software process model covers the various stages of product development. This includes both tasks and states of the product relating to design, coding, and testing. Characteristics consistent with this view can be identified by asking: how is the work produced?

Open-source projects begin with a closed prototype, either developed from scratch or based on some extant older product. Following widespread release, volunteers begin to incrementally evolve this early version through rapid iteration, while concurrently managing as many design, build, and testing activities as possible. Requirements are user-driven and projects rely on large-scale peer review to remove errors. This approach can be broken into five characteristics.

1. Prototyping is closed. An individual or small group develops an early version of the product. Upon release, this is used to present plausible promise and establish a conceptual design.

2. Enhancement is iterative and incremental. The prototype, or "build 0," is evolved incrementally through a series of regular iterations. Increments are small and iterations are frequent.

3. Development operates concurrently at many levels. Design, build, and testing are managed in parallel. Change is stabilized in stages.

4. Peer review is large-scale. Changes are subject to review by a diverse and highly motivated user base.

5. Requirements are strongly user-driven. Requirements are tacitly understood by developers who are themselves users of the product.

3.1 Closed Prototyping

A prototype is a simplified version of a product created rapidly and early in a project.

Prototyping is often used to facilitate a better understanding of user interaction, especially when requirements are vague or unstable. Prototypes can also be used for experimenting with new design ideas, as a safety factor in high-risk environments, and to manage organizational change (Hekmatpour, 1987).

Most open-source projects begin with a prototype. An individual or small group will start building a system from scratch or by reusing an extant older product.10 Once the originator is ready to invite others into the project, a prototype is typically released over the Internet. The intent is to use this early version as a catalyst in establishing a user community.

In this regard, the prototype must present "plausible promise,"11 demonstrating enough potential that users will volunteer effort towards evolving it. Eric Raymond (1998a) notes: "Your program doesn't have to work particularly well. It can be crude, buggy, incomplete, and poorly documented. What it must not fail to do is (a) run, and (b) convince potential co-developers that it can be evolved into something really neat in the foreseeable future."

Open-source projects rely on volunteerism, and people are needed for ongoing product maintenance and enhancement. Karl Fogel (1999) (p79), cofounder of Cyclic and an early CVS developer, explains:

... all projects begin with the expectation of success and the hope that the software will be immediately adopted by hordes of enthusiastic users, some of whom will contribute bug reports and patches. In most cases, the project finds a kind of comfortable middle ground: A modest number of users grow to depend on the software, they find each other and band together, usually on a mailing list or newsgroup, and stay in close contact with the maintainer or maintainers. Some of them are able to help out with debugging, creating new features, and generally keeping the code healthy. This is what I like to call a "fireside user community"; every program needs one if it is to stay alive.

10 For example, Apache was based on the NCSA httpd server, and Mozilla was derived from the Communicator code base. With regard to Linux, Eric Raymond (1998a) explains: "Linus Torvalds ... didn't actually try to write Linux from scratch. Instead, he started by reusing code and ideas from Minix, a tiny Unix-like operating system for PC clones. Eventually all the Minix code went away or was completely rewritten — but while it was there, it provided scaffolding for the infant that would eventually become Linux." 11 Raymond (Cavalier, 1998) refers to both technical and sociological promise: "The promise was partly technical (this code will be wonderful with a little effort) and sociological (if you join our gang, you'll have as much fun as we're having)."

Participation is more likely if a running product is available. Any development team is much more productive when they can build and test a working version of the software.

Raymond (1998a) observes: "It's fairly clear that one cannot code from the ground up in bazaar style. One can test, debug, and improve in bazaar style, but it would be very hard to originate a project in bazaar mode ... Your nascent developer community needs to have something runnable and testable to play with."

A working version attracts users by filling a niche. Karl Fogel (1999) (p80) comments:

"... a program is doomed to extinction unless it has some committed users who depend on it to get things done. Many projects that appear initially promising end up as failures - that is, they don't spread beyond their original authors - because they don't fulfill this basic requirement."

The key is to release at the right time - when the prototype is functional but still needs improvement. It must be reasonably stable and actually do something before anyone will take the time to try it out. Fogel (1999) (p80) remarks: "This is the chicken-and-egg dilemma faced by every new project and is probably where the majority of fatalities occur."

Given the risk of waiting too long and someone else announcing a similar project, it is tempting to release a prototype sooner rather than later. Although open-source projects such as KDE and GNOME promote friendly competition while occupying the same niche, this is far from ideal. However, as Fogel (1999) (p84) points out, there is also the danger of releasing too early and wasting other people's time: "The majority of potential contributors often feel that they can patch in a desired missing feature themselves, but that if the fundamental code is fragile, there's no point in their putting new code on top of it. Even the most charitable reaction to crashes is, 'Well, I guess it's not really ready for outsiders yet. I'll come back in six months and see what the code looks like.' "

For example, since its inception Mozilla has been criticized for a disproportionate number of outside contributors versus Netscape employees (McHugh, 1999). This can be partly attributed to weak prototyping. Soon after release, observers commented on the incompleteness of the code (Patrizio, 1998). As Jamie Zawinski (1999), one of the early core developers who later left the Mozilla project, recounts:

People only really contribute when they get something out of it. When someone is first beginning to contribute, they especially need to see some kind of payback, some kind of positive reinforcement, right away. For example, if someone were running a web browser, then stopped, added a simple new command to the source, recompiled, and had that same web browser plus their addition, they would be motivated to do this again, and possibly to tackle even larger projects.

We never got there. We never distributed the source code to a working web browser, more importantly, to the web browser that people were actually using. We didn't release the source code to the most-previous-release of Communicator: instead, we released what we had at the time, which had a number of incomplete features, and lots and lots of bugs. And of course we weren't able to release any Java or crypto code at all.

What we released was a large pile of interesting code, but it didn't much resemble something you could actually use.

Consequently, external participation was lower than anticipated at first. Yet as Mozilla improves, more outside developers are now joining in (mozilla.org, 2000). One of the current leads, Christopher Blizzard (Schaller, 2000), emphasizes:

People who work on open source projects usually do their best work on previously released products. Once Mozilla reaches the point where it's usable for everyone day to day ... people will start fixing their favorite bugs and adding features to the browser. It's true for almost every open project that some company or person has to do most of the initial work and can expect more support once that project is out and being used. Mozilla is no different.

... as the product reaches the point where it's usable the number of external contributors is likely to increase. That's held true for us.

Mozilla was able to recover because Netscape programmers are paid to write code.

Unfortunately, other projects have met with less success under similar circumstances.

FreeCASE, for instance, announced plans to develop an object-oriented analysis and design tool. A small community initially showed considerable interest, discussing requirements and debating various design issues. Yet after more than a year, very little had actually been implemented. There was no prototype, and the project stagnated. In contrast, Argo/UML, a similar project, started with working code. Participation remains strong with steady progress. Not surprisingly, some FreeCASE members have since suggested adopting Argo/UML as a starting point (Robbins, 1999).

Plausible promise is critical for establishing early participation. However, assuming there is sufficient interest, the prototype also needs to present an implicit fundamental design in order to scale properly. A well-thought-out prototype solves many issues up front, giving new developers an easily understood, extensible framework. Eric Raymond (1998a) notes: "Linux and fetchmail both went public with strong, attractive basic designs. Many people thinking about the bazaar model ... have correctly considered this critical."

With a consistent design, it is easier to maintain conceptual integrity as features are added. Brooks (1995) (p42) argues: "Conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas."

It is largely for this reason that prototyping in open-source development is often closed at first. Reaching consensus in decentralized groups can be difficult (Mantei, 1981), and it is easier for fewer people to establish an initial architecture. Brooks (1995) (p44) suggests that to achieve conceptual integrity, design must "proceed from one mind, or from a very small number of agreeing resonant minds." Chip Salzenberg (1999) describes his work on Topaz, a project to re-implement Perl in C++: "... it's me mostly [working on the project] for now because when you're starting on something like this, there's really not a lot of room to fit more than one or two people. The core design decisions can't be done in a bazaar fashion ..."

Ideally, the prototype should be designed for understanding and extensibility, defining major structural components and how they fit together. Any product that is sufficiently complex to require the effort of many participants will encounter challenges in maintaining a coherent conceptual model. The design needs to support evolutionary change without being overly complex.

Unfortunately, this is largely dependent on the expertise of the originator. In reality, most developers prefer to delay as many design decisions as possible until the code has undergone some real use (Fogel, 1999). Often a design that looked good when the project started may turn out to be wrong later on. Only fundamental design decisions tend to be made early, focusing subsequent development effort without stifling creativity.

3.2 Iterative and Incremental Enhancement

Requirements for different types of software are often so difficult to understand that it is nearly impossible, or at least unwise, to attempt to design a system entirely in advance.

Processes must be agile enough to handle rapid, continuous change.

Mills (1971) suggested that software should be advanced incrementally. Brooks (1995) would later elaborate on this concept, proposing that developers add more functions to systems as they are run, used, and tested. Basili and Turner (1975) originated the practice of iterative enhancement in large-scale software development, and Boehm (1988) created the spiral model, an evolutionary lifecycle incorporating risk management.

As shown in Figure 3, an evolutionary style contrasts with a more sequential, "waterfall" approach (Royce, 1970) to product development. The slope of the solution curve is steeper, meaning that the process is more adaptable. For this particular example, the evolutionary path is stepped because functionality is advanced incrementally.

12 Adapted from Comer (1991).


Figure 3. Comparison of evolutionary development vs. waterfall life cycle.

In open-source projects, the prototype represents "build 0," or the minimum version of the product that can be assembled and tested. This initial version is then evolved incrementally through a series of regular iterations. Small increments and rapid iteration typify open-source development. As Eric Raymond (1998a) prescribes: "Release early, Release often."

Most projects tend to make a lot of small changes. New users are often just interested in a specific enhancement, and even core developers responsible for much of the code base prefer several stepwise refinements to one large modification.13 This is more effective, especially given the number of contributors and part-time nature of the work.

The lifecycle of a typical change request is outlined in Table 4.14 A contributor begins by volunteering for some task, usually a bug fix or enhancement. Core developers are ordinarily responsible for implementing most new features, working through requests in order of priority and familiarity.

13 For instance, in a study of Apache development (Mockus et al, 2000), change requests were found to be much smaller than in comparable commercial projects. 14 Extrapolated from Gooch et al (2001).

Table 4. Typical change request.

1. Volunteer. Accept responsibility for a given task, typically a bug fix or enhancement.

2. Copy source. Obtain a working copy of the source code.

3. Implement change. Make changes to the working copy by modifying the source code. Because the working copy is separate, there is no interference.

4. Create patch. Create a patch representing the differences between the old copy and the new one.

5. Submit source. Post the patch and a brief description explaining its relevance to the developer mailing list.

After obtaining a working copy of the code, either through ftp or a version control system, a contributor will begin making changes. Upon completion, these changes are usually encapsulated as a patch, or an ASCII text file that contains the differences between original and new code, with some extra information such as filenames and line numbers (Gooch et al, 2001).
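The patch format described above can be sketched with Python's standard difflib module, which emits the same unified-diff format produced by the Unix diff tool. The file contents and names below are hypothetical, purely for illustration:

```python
import difflib

# Hypothetical "before" and "after" versions of a source file,
# as lists of lines (each retaining its newline).
original = [
    "def greet(name):\n",
    "    print('Hello ' + name)\n",
]
modified = [
    "def greet(name):\n",
    "    # Guard against an empty argument.\n",
    "    if not name:\n",
    "        name = 'world'\n",
    "    print('Hello ' + name)\n",
]

# A patch records only the differences, together with filenames and
# line-number context -- the form in which a change is posted to a
# developer mailing list for review.
patch_text = "".join(difflib.unified_diff(
    original, modified,
    fromfile="greet.py.orig", tofile="greet.py",
))
print(patch_text)
```

Because the patch carries its own context lines, a maintainer can apply it to the source tree even if unrelated parts of the file have since changed.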

Patches are submitted for review, integration into the source tree, and subsequent release.

In some cases, changes can be committed directly without generating a patch. For example, core developers often have write access to the source tree, and changes can be made at their discretion.15

Each patch constitutes an incremental addition to the code. Many small increments necessitate frequent iteration, particularly in a distributed group. Lehman and Belady (1971) (cited by Brooks, 1995) (p150) have offered evidence that change should be very large and widely spaced or else very small and frequent. The latter approach is more subject to instability, but it can be extremely effective.

A rapid release cycle is needed to synchronize change, motivate contributors, and encourage continuous feedback. Linux kernel development is the most obvious example of this approach, with iterations averaging every few days over the past six years. A recent development thread, version 2.1.*, iterated over 132 times. As of 2.3.39 and 2.2.14 respectively, there had been over 369 development releases along four main threads (1.1.*, 1.3.*, 2.1.*, and 2.3.*) and 67 stable kernel releases (1.0, 1.2.*, 2.0.*, and 2.2.*) (Godfrey and Tu, 2000).16

15 Apache (Fielding, 1998) distinguishes between "commit-then-review" (where changes are deemed inherently acceptable and are applied for review afterward) and "review-then-commit" (where changes are discussed before being applied).

Although other projects release less frequently, the relative amount of change is also lower. Most projects find an iterative cycle that accommodates the volume of submissions. The key is to maintain a predictable schedule of builds, whether daily, weekly, or somewhere in between. When asked what was required to get "critical mass" behind Python's use and development, Guido van Rossum (Dougherty, 1998b) responded: "I think widespread distribution was the key, plus regular solid releases."

Frequent iteration helps to ensure that new code is habitually merged into the source tree. It is an incentive for developers to stay current. By not keeping pace with the latest build, they risk having submissions rejected for inconsistency. Periodic stabilizations force coordination while still allowing developers to work independently.

Regular builds also improve motivation. Developers are more likely to contribute to a project when there is visible progress. Eric Raymond (1998a) explains: "Linus was keeping his hacker/users constantly stimulated and rewarded - stimulated by the prospect of having an ego-satisfying piece of the action, rewarded by the sight of constant (even daily) improvement in their work." David Lawrence (1998) of INN similarly comments:

"... everyone can see the work progressing, even when a development cycle appears to run a bit long. Periods of inactivity tend to emerge only when the many participants see nothing that really needs doing."

By releasing often, a tight feedback loop is established through which users are able to evaluate recently implemented features. Bugs can be caught more quickly, and corrections are easier to make. This is in contrast to extended release cycles, where the design is apt to drift between periods of user review.

16 A detailed history of Linux kernel releases is available from Williams, R. (1999). Perl development follows a similar pattern, as described in Hietaniemi (2000).

However, this approach also carries certain risks. For instance, there is a higher tendency to retain poorly structured, inefficient code (McConnell, 1996). Developers often move on to new features after existing code is minimally functional. Retaining obsolete or poor quality code across successive iterations can lead to design deterioration, increasing the entropy of a system and adversely affecting maintainability. Chip Salzenberg (1999) discusses Perl:

It really is hard to maintain Perl 5. Considering how many people have had their hands in it, it's not surprising that this is the situation. And you really need indoctrination in all the mysteries and magic structures and so on - before you can really hope to make significant changes to the Perl core without breaking more things than you're adding.

Some design decisions have made certain bugs really hard to get rid of... Really, when you think about it, the number of people who can do that sort of deep work because they're willing to or have been forced to put enough time into understanding it, is very limited, and that's bad for Perl, I think. It would be better if the barrier to entry to working on the core were lower. Right now the only thing that's really accessible to everyone is the surface language, so anytime anybody has the feeling that they want to contribute to Perl, the only thing they know how to do is suggest a new feature.

Fortunately, open-source projects are not subject to rigorous schedule pressures, so there is more opportunity for rework. For example, Perl 6 is only the latest in a series of fundamental architectural revisions. Larry Wall (Beaver et al, 2000) recalls: "At the end of Perl 4, I realized it was time to scrap the prototype and rewrite it. So Perl 5 is really pretty near a total rewrite. And at that point I realized it was both my first and last chance to do it right, so I put a lot of effort into defining an architecture that would be extensible and scalable and all the good buzzwords."17

As another example, the original layout engine for Mozilla was discarded in favour of Gecko, a next-generation engine based entirely on open Internet standards (LinuxWorld, 1999b). This decision was made in spite of mounting criticism over the project's apparent inability to deliver a production release, along with issues concerning backwards compatibility.

As Alan Cox (McMillan, 1999) of Linux remarks: "The most annoying one [lesson] you eventually learn is that sometimes you have to just throw everything in the bin because you screwed up and did it the wrong way. And now the only way you're ever going to get your project right is just to throw everything away and start again."

Generally though, many open-source projects have managed to avoid problems associated with design deterioration. Godfrey and Tu (2000) observe that, contrary to their expectations, the Linux kernel has scaled remarkably well, especially given the rapid growth rate shown in Figure 4.

[Figure: growth in size of the compressed tar file for the full Linux kernel source, January 1993 to April 2001, plotted separately for development releases (1.1, 1.3, 2.1, 2.3) and stable releases (1.0, 1.2, 2.0, 2.2).]

Figure 4. Growth of the compressed tar file for the full Linux kernel source release.

This success can largely be attributed to a well-defined prototype that emphasizes modularity and extensibility.18 When asked if there are any original design assumptions in the Linux kernel that are limiting development today, Alan Cox (McMillan, 1999) replied:

I don't think so. There are lots of cases where, to get the best performance, you want to really think how to redo stuff. What we've been doing there, especially with 2.2, is making drivers with all the nice locking stuff in them run much faster on a multiprocessor machine. If they don't have these features, things still work. And that's really important. It's possible to write a very naive driver. And once you've got it working, you can improve it, optimize it, and effectively you can then tell the kernel, "Hey I'm clever, I want to take on all these roles. I'm going to screw up on my own without your help."

18 Eric Youngdale (Moody, 1997), a Linux hacker, notes: "Linus is omnipresent through the development approach he created. Yet he almost never intervenes - in a way, he solved all of the problems up front."

Still, in situations where the initial design does not inherently support extensibility, it can be difficult to promote rework. Most effort in open-source development is spent performing corrective rather than preventative maintenance. This is particularly true in the early stages of a project, where there is less overall commitment and immediate results tend to carry more weight. Preventative maintenance impedes the momentum of the development process, which depends on the interest of potential contributors.

3.3 Concurrent Development

Sequential approaches to product development often require longer periods of time because work is scheduled in phases. Testing typically occurs late in the development cycle, and as a result it can be difficult to assess progress accurately. In contrast, concurrent development tends to proceed more rapidly, with greater tolerance for change.

Although there is sequential initiation of each activity, tasks are carried out in parallel once started (Davis and Sitaram, 1994).

In open-source projects, as many design, build, and testing activities as possible are performed concurrently. There are no distinct phases. Instead, participants have the flexibility to work on whatever task they find interesting. Some write code while others debug changes or discuss new features.

Development operates concurrently at many levels. As previously mentioned, most contributors iterate through a common series of actions. They discover a requirement, identify a solution, develop and test within their own local copy of the source, and submit a change. At any given time, different developers can be working on a range of tasks in parallel, each at various stages of completion.

To synchronize change, at some point an overlying iterative cycle ends and all outstanding patches are merged into the main distribution for release. Testing intensifies as bugs are fixed and a new round of features is implemented, beginning the next cycle.

At the highest level, some projects also maintain parallel code branches.19 One is for ongoing development and the other is for stability, or widespread use. Again, release cycles overlap and requirements identified in the stable branch are implemented in development, which eventually leads to a new stable release.

This basic concept, which works the same both within and between branches, is represented in Figure 5.20 Evaluation and testing for the first iteration are folded into the next, and so on. Development streams evolve in parallel, as contributors design, code, and test simultaneously.

[Figure: overlapping build cycle with three iterations feeding Releases 1, 2, and 3; design and coding of one iteration proceed in parallel with review and debugging of the previous, each cycle ending in a merge.]

Figure 5. Typical build cycle.

There is some overhead associated with merges. Not only must code conflicts be resolved, but there is also extra effort required for packaging and distribution. Paul Vixie (1999) (p98) of BIND elaborates: "Integration of an open-source project usually involves writing some manpages, making sure that it builds on every kind of system the developer has access to, cleaning up the Makefile to remove the random hair that creeps in during the implementation phase, writing a README, making a tarball, putting it up for anonymous FTP somewhere, and posting a note to some mailing list or newsgroup where interested users can find it."

19 Linux is perhaps best known for this approach, where the middle number of the kernel version identifies the release path. Odd numbers are for development kernels and even numbers are for stable kernels. 20 Adapted from Aoyama (1998).
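The odd/even kernel numbering convention noted above (an odd middle number marks a development kernel, an even one a stable kernel) can be captured in a short helper function. This is only an illustrative sketch; the function name is my own:

```python
def kernel_branch(version: str) -> str:
    """Classify a Linux kernel version string by its middle number:
    odd means a development kernel, even means a stable kernel."""
    minor = int(version.split(".")[1])
    return "development" if minor % 2 else "stable"

# The release threads cited in this chapter fall out as expected:
for v in ("2.3.39", "2.2.14", "1.3.100", "1.0"):
    print(v, kernel_branch(v))
```

This convention let users choose between the two parallel branches at a glance, without consulting release notes.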

Certain guidelines are commonly used to help offset this workload. For example, projects regularly enforce feature and code freezes. During a feature freeze, no new functionality is added to the code base; however, bug fixes are usually permitted. Minor improvements are allowed as long as they are relatively isolated. A code freeze is more restrictive than a feature freeze. No changes are made to the code, except possibly severe bug fixes. The intent of a freeze is to create a progressively static buffer around the merge.

Many open-source projects also rely on version control systems to reduce integration costs. These tools can make tracking changes and packaging a release much easier.

Some projects, like Mozilla, even force automated builds. A notable exception is the Linux kernel, where Linus Torvalds (LinuxWorld, 1999a) still does most of this work manually: "When it comes to the kernel, the way things are done is that we have one single source tree that is the mother of all source trees, and that sits on whatever machine I happen to use as a development machine, and then people send me patches towards that source tree. If I'm in a good mood and the patches look good, they make it into the holy shrine of the official kernel, and if I'm not they just wait for a better day."

Integration costs in Linux kernel development continue to increase with the number of contributors, and there have been some fairly acrimonious debates over the sustainability of such an approach. This has resulted in at least one "Linus burnout episode,"

(Torvalds, 1998) and several proposals for various patch management or version control solutions. Alan Cox (LWN, 1999) remarks:

If I was the kernel organiser, there are quite a few things I'd do differently. Right now Linus applies all the patches and builds the trees, I'd much rather there were a group of people directly merging patches into the kernel tree and Linus sitting watching it and vetoing things rather than doing all the merge work too.

The [development] model has changed over 2.1.x and it has evolved into a kind of compromise that seems to work very well. Linus is still applying all the patches but there are people now collating and feeding Linus tested sets of patches in small cleanly organized groups.21

With merging simplified as much as possible, either through tool support, delegation, or both, the question then becomes when to merge, and how often? Aoyama (1998) suggests that development time is minimized when the concurrency ratio is 0.5. That is, each release should be pipelined in the middle of an iteration. For most open-source projects, it is difficult to determine whether this is actually the case. Builds and releases are typically made on an "as needed" basis. Ken Coar (1999) (p8) explains:

New versions of the Apache HTTP server software aren't released according to any calendar schedule; instead, they are made available when there's consensus among the developers that it's "ready." Every now and then, someone on the development mailing list notices that, "gee, it's been a long time since we did a release; the src/CHANGES file has about fifty new entries since the last one." After a few weeks of discussion, general agreement will probably form that yes, it's about time for a new release, and someone will volunteer to be the release manager.

Still, it is apparent that where more activities are being performed concurrently, regular merges are required to encourage synchronization. This is certainly the case where many different contributors are writing a lot of code. For development and stable branches, the release cycle is typically much less aggressive. Linus Torvalds (Goodman et al, 1999) remarks:

You want to make new releases often enough that people don't start worrying about it. Quite frankly it was too long between 2.0 and 2.2 — two and a half years, and there were a lot of people who had to upgrade to one of the development kernels in the interim because they did SMP better, or something else. But it's not supposed to be that you have to upgrade to a development kernel because you want a feature. The situation should be that users want to be on the latest stable release, and that every year or year and a half you get a new stable release that has new features and then you move on. It's really hard to make a release. I've been wanting to make a new release for almost a year now because it was time for it. But there was always something in flux. This is why everybody has problems making new releases.

21 Alan Cox (McMillan, 1999) describes his own role in Linux kernel development: "Most of what I do with the kernel involves testing and coordinating patches. I look for code that fits together, stuff that is good but needs testing - the stuff that Linus has missed. Basically, I hoover up all of the patches that appear, test them together, throw out bad stuff and then feed the good stuff to Linus."

With concurrent development, the goal is to merge often enough to synchronize change and create a suitably tight feedback loop. There is a trade-off, as merge costs accumulate with the number of releases, lengthening development time. In preliminary examinations, projects such as Linux and Apache seem to approximate these assertions.

Development is initially extremely rapid and then gradually slows as integration becomes more difficult. For mature open-source projects, discussions have focused more on reducing merge costs and maintaining consistent cycle times (Fielding, 1998). At the very least, this suggests a concurrency ratio optimized through experience.
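The trade-off described here can be illustrated with a toy cost model, assuming (purely for illustration) a fixed per-merge overhead plus rework that grows with the gap between merges. None of the coefficients come from Aoyama or from the projects discussed.

```python
# Toy model of concurrent development: more merges add overhead, but
# longer gaps between merges leave more divergence to reconcile.

def total_time(work: float, merges: int, merge_cost: float,
               rework_rate: float) -> float:
    gap = work / merges  # nominal time between merges
    return work + merges * merge_cost + merges * rework_rate * gap ** 2

# Sweep the merge count: development time is high at both extremes.
costs = {m: total_time(100.0, m, 2.0, 0.01) for m in (1, 2, 5, 10, 25, 50)}
print(costs)
```

In this sketch the minimum falls at an intermediate merge frequency, which is loosely analogous to Aoyama's optimal concurrency ratio of 0.5; the exact optimum depends entirely on the assumed costs.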

3.4 Large-Scale Peer Review

Reviews can be applied throughout the software development process, helping to uncover errors that can then be removed (Pressman, 1997). Technical work needs reviewing simply because it is inherently prone to error. Freedman and Weinberg (1990) suggest that any type of review is a way of using the diversity of a group of people to:

1. Point out needed improvements in the product of a single person or team

2. Confirm those parts of a product in which improvement is either not desired or not needed

3. Achieve technical work of more uniform, or at least more predictable, quality than can be achieved without reviews, in order to make it more manageable

Particularly effective is peer review. Although people are good at catching some of their own errors, large classes of errors escape the originator more easily than anyone else.

Mills (1971) argues that exposing all of the work to everybody's gaze helps quality control, both by peer pressure to do things well, and by peers actually spotting flaws and bugs.

Peer review is implicit in open-source development. Source code is available to everyone, and technical communications such as bug reports are conducted in public.

This encourages developers to be thorough and think twice before releasing faulty code.

Even so, it is assumed that releases will not be free of defects. Rather, the product will continually improve based on enhancements, revisions, and corrections generated by a large number of actively involved users. Paul Vixie (1999) (p98) of BIND elaborates:

An additional advantage enjoyed by open-source projects is the "peer review" of dozens or hundreds of other programmers looking for bugs by reading the source code rather than just by executing packaged executables. Some of the readers will be looking for security flaws and some of those found will not be reported ... but this danger does not take away from the overall advantage of having uncounted strangers reading the source code. These strangers can really keep an Open Source developer on his or her toes in a way that no manager or mentor ever could.

Open-source projects rely almost singularly on large-scale peer review to remove errors.

There is usually no system-level test plan or unit testing (Vixie, 1999). In a study on quality related activities in open-source development, Zhao and Elbaum (2000) found that more than 80% of projects surveyed did not have a testing plan. Some notable exceptions exist. For example, a regression testing clause is included as part of the Artistic License (perl.com, 2000).

Generally though, peer review by itself appears to work reasonably well, and the absence of other types of testing does not seem to cause much concern. Most of the Internet infrastructure is open-source software known for its reliability (O'Reilly, 1999). There is also quantitative evidence to suggest that GNU utilities are more reliable than many comparable commercial alternatives (Miller et al, 1995).

Table 5 (Mockus et al, 2000) (p270) compares defect density measures in commercial projects with Apache.22 While the user-perceived defect density of Apache is higher than that of the commercial products, this can be partly attributed to higher usage intensity.

More importantly, the lower defect density of the code before system testing suggests earlier reduction of errors. Based on this data, Mockus et al hypothesize that the defect density in open-source projects will be lower than commercial code that has received a comparable level of testing.

22 Mockus et al use two different measures for defect density. The incremental nature of deliveries (where only a small fraction of the code is actually changed) is taken into account using defects per thousand lines of code added (KLOCA). Defects per thousand deltas (KDelta) is used to handle the problem of bloated code (considered bad but with an artificially low defect rate).

Table 5. Comparison of defect density measures between commercial projects and Apache.

Measure                             Apache   A      C      D      E
Post-release Defects/KLOCA          2.64     0.11   0.1    0.7    0.1
Post-release Defects/KDelta         40.8     4.3    14     28     10
Post-feature test Defects/KLOCA     2.64     t      5.7    6.0    6.9
Post-feature test Defects/KDelta    40.8     *      164    196    256
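The two measures defined by Mockus et al can be written out directly. The function names and the sample figures below are invented for illustration; only the definitions follow their footnote.

```python
# Defect density per Mockus et al: normalize either by code added
# (KLOCA, which accounts for incremental deliveries) or by number of
# changes (KDelta, which avoids rewarding bloated code).

def defects_per_kloca(defects: int, lines_added: int) -> float:
    return defects / (lines_added / 1000.0)

def defects_per_kdelta(defects: int, deltas: int) -> float:
    return defects / (deltas / 1000.0)

# Hypothetical release: 132 defects, 50,000 lines added, 3,200 deltas.
print(round(defects_per_kloca(132, 50_000), 2))  # 2.64
print(round(defects_per_kdelta(132, 3_200), 2))  # 41.25
```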

Open-source development assumes that bugs become "shallow," or easier to find "when exposed to a thousand eager co-developers pounding on every new release." Eric Raymond (1998a) dubs this "Linus's Law," which states that "given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone." Or, less formally, "Given enough eyeballs, all bugs are shallow."

It should be noted however, that the efficiency of a "strength in numbers" approach to quality control remains unclear. Critics argue that although Linus's Law may be fast and effective, it is not necessarily efficient. McConnell (1999) suggests that for open-source projects there may be a duplication of effort hidden in peer review. One person may eventually find the bug, but many others will nonetheless spend time looking for it and trying to fix it.

Raymond (1998a) counters that, "the theoretical loss of efficiency due to duplication of work by debuggers almost never seems to be an issue in the Linux world." Duplication is minimized through frequent iteration, which quickly propagates fixes. Raymond also suggests that the costs of duplicated work scale more slowly than the planning and management overhead that would be needed to eliminate them.

In any case, the basic premise remains the same. More users find different ways of stressing the program. Debugging is an activity that works better when split among people with diverse knowledge. This can be attributed to the Delphi effect, which states that the averaged opinion of a group is more reliable than any single recommendation.

As Paul Vixie (1999) (p98) observes: "The essence of field testing is its lack of rigor. What software engineering is looking for from its field testers is patterns of use which are inherently unpredictable at the time the system is being designed and built - in other words, real world experiences of real users. Unfunded open-source projects are simply unbeatable in this area." David Lawrence (1998) (p50), one of the core INN contributors, comments: "... an open development model's strength lies in numbers. In an open effort, many server administrators may be compiling the development tree on different operating systems and running different configurations. Thus, bugs are often found much faster. Although the person finding a bug may not know how to fix it, help is readily available."

By leveraging the Internet, open-source projects are able to reach a large number of potential reviewers, each with access to the source code. Moreover, reviewers are self-selected and therefore highly motivated to use the software, find out how it works, and develop solutions to problems. David Lawrence (1998) (p50) observes: "Bugs get quick attention in open-source projects because the participants tend to have a significant interest in the package, which may not be the case with commercial programming. As end users, they have a lot of real-world experience and know how they want the programs to operate. They can identify missing features and point out sometimes quirky and troublesome behaviour. When they see a potential problem, they respond quickly."

Paul Vixie (1999) (p98) emphasizes: "... open-source software enjoys the best system-level testing in the industry ... The reason is simply that users tend to be much friendlier when they aren't being charged any money, and power users (often developers themselves) are much more helpful when they can read, and fix, the source code to something they're running."

McConnell (1999) cautions that by fixing a bug downstream, at the source code level, the cost could be higher. He adds that this might not be immediately apparent because the downstream effort on an open-source project is spread across so many people.

Proponents argue that providing users with access to the source code reduces the time needed for downstream error detection (Bollinger et al, 1999).

A tight feedback loop helps to ensure that bugs are found as quickly as possible. Short iterations allow users to review the code early, and developers are able to release fixes shortly afterward. This minimizes loss of context for the original development effort, making a correction easier.

Table 6 (Russell, 1999) shows the timeline of a bug fix for the Linux kernel. Linus Torvalds (bootNet.com, 1999) elaborates: "... Bugs in Linux are minimal and fixed very quickly. As an example, a networking bug that allowed you to send illegal packets was corrected within four hours. Most commercial [vendors] had fixes within a week or two."

Table 6. Timeline of a bug fix.

7 Jun 1999 (GMT)
Mailed to bugtraq                             15:43
Hit Alan's mailbox                            15:56
Alan patches bug                              21:11
Alan posts preliminary fix to linux-kernel    21:23
Alan posts to bugtraq                         22:30
Slashdot runs story                           22:36

Figure 6 (Mockus et al, 2000) (p271) shows the proportion of changes closed within a given number of days for Apache. Fifty percent of problem reports were resolved within a day, 75% within 42 days, and 90% within 140 days. These numbers were influenced by priority, time period, and whether or not the bug caused a change to the code. For example, higher priority items such as the core (kernel, protocol) were solved as quickly as possible.

Figure 6. Proportion of changes closed within a given number of days for Apache.
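The proportions reported for Figure 6 are an empirical cumulative distribution of resolution times. A minimal sketch, using invented sample data rather than the actual Apache problem reports:

```python
# Fraction of problem reports resolved within a given number of days.

def proportion_closed_within(days_open, limit):
    return sum(1 for d in days_open if d <= limit) / len(days_open)

# Hypothetical resolution times (days) for ten problem reports.
sample = [0.2, 0.5, 0.9, 1.0, 3, 12, 40, 90, 135, 400]

print(proportion_closed_within(sample, 1))    # 0.4
print(proportion_closed_within(sample, 42))   # 0.7
print(proportion_closed_within(sample, 140))  # 0.9
```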

The end result seems to be that, in open-source projects, large-scale peer review helps to find more bugs faster. The user base is both diverse and highly motivated, and many projects see critical bugs fixed with impressive speed. However, the efficiency of this approach is still in doubt, and some question whether it is being accurately reflected in the overall cost of development.

3.5 User-Driven Requirements

Understanding customer requirements is one of the most challenging aspects of software development. Requirements can change frequently and are often notoriously ambiguous. Not surprisingly, active customer participation has been shown to noticeably enhance transfer of domain knowledge (Vosburgh et al, 1984). Working closely with customers helps to avoid rework and increase productivity.

Open-source development is exceptionally strong in this regard because, as Roy Fielding (1998) of Apache observes: "the developers of the system are also its biggest customers." Contributors to open-source projects actively use the software, and code is typically written because someone wants to "scratch a personal itch" (Raymond, 1998a). Linus Torvalds (ABCNEWS.com, 1999) observes: "Instead of developing by looking at what marketing wants, Linux code tends to get developed by the people who actually need new features."

In general, open-source developers are experienced users of the software they write, self-selected by their interest and knowledge of the application domain. Since a lack of domain knowledge is frequently a problem in large software projects (Curtis, 1988), one of the main sources of error is eliminated when domain experts write the code.

However, customers with direct access to a product during development often have an increased desire for features (McConnell, 1996). In open-source projects, participants are motivated by the opportunity to write code. As Guido van Rossum (1998) of Python notes: "From the user's point of view, the big advantage is flexibility."

Unfortunately, this can sometimes lead to "feature creep." Alan Cox (McMillan, 1999) discusses the Linux kernel: "My main concern is that however hard you try to keep the kernel lean, things gradually creep in. It's very easy to put something in a kernel; it's really hard to get it out. That's the real thing: making sure you keep the gradual growth under control. Getting that wrong could be very bad."

Linus Torvalds (Goodman et al, 1999) further emphasizes: "It's just a basic fact of software engineering that it's a lot easier to add features than it is to remove them!

Adding features never breaks old programs, unless you have a bug — and even then the bug may be so subtle that you never notice. Eventually it will happen to Linux too. The only thing you can do is to make that process as slow as you can make it."

Risk of feature creep tends to increase as a project scales because more users bring more demands. Brooks (1995) (p258) notes: "For products that survive and evolve through many generations, the temptation [to add features] is especially strong ... the larger and more amorphous the user set, the more necessary it is to define it explicitly if one is to achieve conceptual integrity."

In open-source development, there are no formal specifications. Instead, projects try to minimize feature creep by publicly debating proposed changes, either through a newsgroup or mailing list. Consensus is whatever participants remember or agree with (Vixie, 1999). Core developers retain the right to reject new features or ask that they be reworked. For the most part, product integrity is therefore dependent on the overall design philosophy shared by this core group. With regard to Linux kernel development, Linus Torvalds (Goodman et al, 1999) explains:

... You have to have a basic philosophy. You have to have some fairly basic rules about how things should work so that whenever someone adds new features you can ask, "Does this make sense in the larger picture?" If you don't have a larger picture, you can never ask yourself whether some new feature makes sense. UNIX has a philosophy that makes it less susceptible to these random additions. But the Windows philosophy is to add whatever it takes - to do whatever you have to do for marketing reasons. It's not written down anywhere, but trust me, that's what the Windows philosophy is. UNIX, however, was not quite a research operating system. A lot of research operating systems go to the other extreme, which is, "We don't care about how the world looks, because we want it to look this way, and that's the way it's going to look." You can't do that either — that's putting blindfolds on yourself!

The strong technical orientation of most participants, and the fact that in a volunteer environment the software is often being run on lower-end hardware, makes feature control somewhat easier. Emphasis tends to be on performance and a lean code base.

David Lawrence (1998) (p51) notes: "For something like INN, where software speed is an extremely important aspect of suitability for production use, and where the amount of material to be processed increases at an alarming rate ... an open project is much more likely to result in concise and elegant solutions because the system's users want to squeeze all they can out of their machines."

The disadvantage is that while a technical emphasis benefits feature-set control, it can adversely impact other areas, most notably end-user design. Participants in open-source projects are not really users, but programmer-users. Linus Torvalds (Ghosh, 1998) remarks: "The thing that makes 'real users' so interesting is that they have so [sic] different usage patterns from most developers, which is why a product that is solely targeted to developers tends to lack a certain stability and finish."23

23 Jamie Zawinski (Linuxpower, 1999) notes: "Netscape's customers are users, whereas mozilla.org's customers are developers." Christopher Blizzard (Schaller, 2000) further comments: "At mozilla.org, we're really targeting developers more than end users. We're hoping that we can get a lot of developers using Mozilla, for sure. As for end-users, I think that most of them would benefit most from sticking to a released branded version of Netscape or some other released browser version from another vendor."

Open-source projects are therefore inclined to offer workable imitations of popular commercial interfaces. Features address user activities rather than user behaviour.

Innovation that is routinely found in the underlying code is not present in the interface because there is no feedback loop to true end-users, and no imperative to create one

(Kuniavsky, 1999).

Open-source software has tended to be more infrastructure-oriented. Traditionally, most projects have focused on operating systems and network services. This is because end-user applications are hard to write. Graphical, windowed environments are relatively complex, and most programmers are not good interface designers. Open-source projects also tend to thrive where incremental change is rewarded, which has meant back-end systems more than front-ends (Behlendorf, 1999).

There are certainly some notable counterexamples. The GIMP is particularly strong, and KDE and GNOME both support a wide variety of end-user desktop applications. Still, the interfaces remain somewhat rough, with more emphasis on the underlying architecture.24 Linus Torvalds (Yamagata, 1997) comments:

... I tend to think that some things work better as commercial software, mainly because a lot of the program is that "final polish" that commercial software is so good at.

For example, user interfaces are usually better in commercial software. I'm not saying that this is always true, but in many cases the [interface] to a program is the most important part for a commercial company: whether the programs [sic] works correctly or not seems to be secondary ...

Things like word processors tend to be better when they are commercial, because in a word processor the most important part really is the user interface.

Although open-source development can reduce problems associated with transfer of domain knowledge, a hands-on approach may heighten the risk of feature creep. A technical emphasis can offset this by encouraging lean feature-sets, but often at the expense of end-user design. In general, where developers are not also experienced users of the software, they are highly unlikely to have the necessary level of domain expertise or motivation to succeed as an open-source project. At the very least, certain areas of the product will be weak.

24 Even the GIMP is lauded more for its plug-in architecture. Similarly, GNOME's component technology is commonly emphasized over its user interface (in fairness however, the GNOME Usability Project seems to be a serious attempt to address the user interface).

3.6 Summary

This chapter presented a state view of the open-source software development process. A number of characteristics consistent with this view were identified and discussed with examples from various open-source projects. For instance, prototyping is closed. An individual or small group typically develops an early version of the product. Upon release, volunteers with firsthand knowledge of the application domain evolve this initial build incrementally through a series of regular iterations. Design, build, and testing are managed in parallel. Quality control is handled through large-scale peer review.

Chapter 4 Organizational View

We reject kings, presidents, and voting. We believe in rough consensus and running code. Dave Clark

Objectives

• To identify characteristics consistent with an organizational view of the open-source software development process

• To discuss these characteristics using examples from various open-source projects

The organizational view of a software process model addresses the social aspect of development. This includes factors relating to communication and coordination, key roles and responsibilities, and motivation. Characteristics consistent with this view can be identified by asking: how is the work organized?

The organizational structure of an open-source project is decentralized. Participants are self-regulated, for the most part able to work on whatever interests them. They communicate asynchronously and are internally motivated. A project leader coordinates overall development, while secondary leaders are responsible for specific subsystems.

Leaders control integration, and their authority is based on trust earned through competence. This approach can be broken into four characteristics.

1. Collaboration is decentralized. Responsibility is pushed downward as a project scales, allowing many small projects to work like one big project. Integration is controlled.

2. Leadership is trusted. The control hierarchy is built up on a personal network of trust. Authority and responsibility move towards those who demonstrate the most competence.

3. Motivation is internal. External motivators such as financial compensation are secondary. People contribute for reasons such as opportunity, community, and status.

4. Communication is asynchronous. Geographical distribution makes synchronous communication impractical. Most communication is through electronic mail.

4.1 Decentralized Collaboration

As Pressman (1997) (p61) wryly observes, "there are almost as many organizational structures for software development as there are organizations developing software." For simplicity however, Mantei (1981) has suggested three generic teams. The first is Democratic Decentralized, based on Weinberg's (1998) concept of "egoless programming," in which there is no permanent leader. Rather, task coordinators are appointed for short durations and then replaced by others who may coordinate different tasks. Decisions are made by group consensus, and communication is horizontal.

In contrast, the Controlled Centralized team has a leader responsible for top-level problem solving and internal coordination. Communication between the leader and team members is vertical. This is similar to the "chief programmer" approach first proposed by Mills (1971).

The Controlled Decentralized team is a compromise. The team has a leader who coordinates specific tasks and secondary leaders that have responsibility for defined subtasks. Problem solving remains a group activity, but implementation is partitioned among subgroups. Communication among subgroups and individuals is horizontal, but vertical communication along the control hierarchy also occurs.

A common view of open-source software development is that of the Democratic Decentralized team. Many see open-source projects as anarchical, somehow able to miraculously produce an integrated product (the "anyone-can-hack-anything consensus theory"25). This is an easy assumption, since software developers are known to abhor bureaucratic rules and procedures, unnecessary documents, and overly formalized modes of communication. Hackers in particular are commonly characterized as independent and free-thinking26, and even the term "bazaar" itself suggests chaos.

25 Raymond (1998b) notes: "In fact (and in contradiction to the anyone-can-hack-anything consensus theory) the open-source culture has an elaborate but largely unadmitted set of ownership customs. These customs regulate who can modify software, the circumstances under which it can be modified, and (especially) who has the right to redistribute modified versions back to the community."

In practice, although open-source projects certainly retain some of these characteristics, most are closer to a Controlled Decentralized group structure, with some democratic aspects. While participants are largely self-regulated, there are also implicit controls. Linus Torvalds (bootNet, 1999) remarks: "It's a chaos with some external constraints put on it ... it allows a chaos, but at the same time it has certain built-in things that just make it very stable."

Control is needed as the number of participants grows. A purely democratic group is simply not viable for large tasks because of its cooperation requirements. In open-source development, people work on whatever aspect of the code interests them, "scratching their own itches." However, they are eventually faced with the prospect of integrating their own improvements with changes made by others. Although this is a relatively simple task for a few developers, it quickly becomes unmanageable with more people.

Changes will inevitably begin to overlap, resulting in disagreements over who should adapt or defer their code. The result is that someone has to take on the role of code integrator, deciding what gets into the main distribution.

A classic example is the Linux kernel, in which Linus Torvalds originated the project and continues to oversee ongoing development. Eric Youngdale (Moody, 1997), a coder who led the development team for Linux's SCSI drivers, emphasizes: "Free-flowing self-regulation is all very well, but without the right person to act as a focus, this energy will just be dissipated."

It is for this reason that open-source projects are often characterized as "benevolent dictatorships" (Raymond, 1998b). The idea is that a project owner, or benevolent dictator, has the right to make binding decisions, so long as any choices reflect the community's broader interests. This is not always easy, and as Jeremy Allison (LinuxWorld, 1999a) of Samba relates: "sometimes the emphasis is on benevolent, sometimes the emphasis is on dictatorship."

26 According to the Hacker Ethic (Levy, 1984) (p41): "Mistrust authority - promote decentralization ... Bureaucracies ... are flawed systems, dangerous in that they cannot accommodate the exploratory impulse of true hackers."

Not all projects follow this model exactly.27 For instance, Apache operates with a voting committee, otherwise known as the Apache Group. In contrast, KDE works without a formal advisory, even stating (KDE, 2000): "the KDE Project is possibly the only large Open Source project that has neither a 'benevolent dictator' nor an elected governing board (or any voting at all). In a sense, we are the only large 'pure' Bazaar-style project out there."28

According to Chip Salzenberg (LinuxWorld, 1999a), Perl "originally was a benevolent dictatorship pretty much along the lines of Linux." However, as project owner Larry Wall explains (Beaver et al, 2000), it has progressed to the point where he now advises a sort of rotating inner council29:

It's interesting how the governance of the Perl community has evolved over time. It has actually turned out to be somewhat like the United States federal government. It used to be, way back in the Dark Ages, that I just ran the whole thing. I was Mr. Perl - judge, jury, and executioner. But these days the perl5-porters mailing list serves as the legislature. And we have an executive - which is not me, actually. It's whoever is currently the integration manager, essentially - the patch manager. We call that person the Patch Pumpkin Holder. That's the executive, and so that title moves from person to person, and that leaves me to be the Supreme Court. So I get to rule on what's constitutional or not.

Projects simply tend to evolve, shifting to handle new organizational requirements. Larry Wall (Beaver et al, 2000) explains: "... Different kinds of programs need different kinds of models. Perl, being a language as well as a computer program, needs a certain kind of design oversight that something like Apache doesn't necessarily need. So Apache can get away with an oligarchy; Perl needs more of a monarchy."

27 Raymond (1998b) remarks that "[an] understanding of large projects that don't follow a benevolent-dictator model is weak." Perl, Apache, and KDE are mentioned specifically.

28 The KDE Project has since clarified this approach (KDE, 2001): "The KDE Core group decides on the overall direction of the KDE Project and manages the release schedule. Contrary to the development of other free software projects, most notably Linux, we do not have one single 'benevolent dictator' who decides on important questions. Rather, the KDE core group consisting of about 20 developers decides by means of democratic voting procedures on important questions."

29 With Perl 6, the organizational structure is changing again. Development topics will be assigned to working groups that will work under a central project manager.

In any case, the general approach remains largely the same. Development is decentralized, with a core group responsible for integration. Eric Allman (Lourier, 1999), the original developer of Sendmail, comments: "I believe that all the open source projects that have really succeeded have done so by having some strong core. It's not necessarily an individual, [as in the case of the] Apache core team. But it's not a pure bazaar. Pure bazaars tend to devolve into chaos way too easily. You need a little bit of control, but not too much."

With growth, the core pushes out and downward. Secondary leaders begin to emerge once it becomes too much for one person, or even a small group, to coordinate everything. As Chip Salzenberg (LinuxWorld, 1999a) relates, Perl changed because of the demands placed on Larry Wall: "[It changed] partly because Larry burned out. I mean anybody would burn out trying to manage the development of a language with such a large group of users over such a long period of time. Up to Perl 5 it was pretty much Larry's show."

Secondary leaders are module owners rather than project owners.30 They are responsible for some subset of the code base, reviewing contributions and handling integration. The core still essentially manages the project, but without having to become involved in every detail. Linus Torvalds (Yamagata, 1997) notes: "I've been very successful in delegating off any work that I cannot or do not want to handle, which has allowed me to keep on managing the basic kernel and set down the milestones and generally deciding on the basic picture of the kernel."

Linus (Gillmor, 2000) recalls early kernel development: "It started out very flat - no organization, no hierarchy, because there weren't that many people involved. It was mainly an issue of my having told people, 'Hey, I'm working on this project ...' I did it all by e-mail and, more than that, I did it all by personal e-mail ... It was all one-to-one."

30 Terminology varies from project to project; however, secondary leaders are most commonly referred to as "module owners" or "maintainers." Perl uses the term "pumpkings."

Gradually, Linus came to rely on what Michael Johnson (Moody, 1997) calls "a few trusted lieutenants, from whom he will take larger patches and trust those patches. The lieutenants more or less own relatively large pieces of the kernel."

In this way, open-source projects are able to promote decentralized collaboration while still maintaining necessary controls. Responsibility is typically pushed as low as possible, allowing many small projects to work like one big project. For the most part, developers can function autonomously, relying on the control hierarchy to manage integration. As Alan Cox (Anderson, 1999) explains: "There's no waiting around for a manager to give the go-ahead on a project. If someone doesn't think something is working right, and he wants it fixed, he just goes ahead and fixes it."

4.2 Trusted Leadership

Leadership refers to the ability to influence other people (Weinberg, 1998). In open-source development, leaders try to push work in a consistent direction. Linus Torvalds (Gillmor, 2000) notes: "Good leadership is always a matter of making people want to do things because of their own reasons rather than due to any external pressure, and, probably, realizing when you're wrong and deciding that it's not worth it."

Leaders in open-source projects do not function in a conventional sense, largely because of a group dynamic driven by volunteerism. Influence follows people, not positions. Alan Cox (Anderson, 1999) emphasizes: "It [Linux development] doesn't really work that way. We're not organized along corporate lines. We don't have titles."

Instead, projects operate as meritocracies, following "a system of management in which the amount of access and participation allowed are based on the opinion of one's peers" (Coar, 1999) (p2). The more someone participates, the more merit or trust they earn from their peers, and the more they are allowed to do.

31 Linus (Gillmor, 2000) further explains: "... the moment somebody stepped up and said, 'Hey, I think you should handle this that way,' I said, 'Go ahead! Do it!' [As the kernel evolves] someone always steps up and becomes the leader for a particular sub-system ... Sometimes it's one person having multiple sub-systems, sometimes it's just one person for one sub-system."

Influence is therefore dependent on trust earned through competence. Responsibility moves toward those who demonstrate the most competence and, as Eric Raymond (1998b) notes, "Hackers like to say that authority follows responsibility." Generally speaking, programmers tend to value people they perceive to be good at the things they do. Thus it is easier to exert leadership over programmers "by being a soft-spoken programming wizard than by being the world's fastest talking salesman" (Weinberg, 1998) (p80).

The control hierarchy in an open-source project is held together by trust. There is really no other reason for participants to follow the direction of one person over another. Linus Torvalds (LinuxWorld, 1999a) remarks: "... the whole development model is really built up on kind of a personal network of trust, and nothing else." Chip Salzenberg (LinuxWorld, 1999a) of Perl further emphasizes: "People basically have to trust at least the intentions and usually the skill of each other, or else it just all falls apart."

The originator of a project typically assumes an initial leadership role as the "benevolent dictator," or project owner. This only makes sense, since the originating individual or group will be most familiar with the code and best able to offer direction. In this case, trust is implicit. Linus Torvalds (Gillmor, 2000) recalls:

It just happened by a kind of natural selection. I'd been doing Linux as my own personal project, and I put it out just because I wanted comments and because I thought that somebody else was interested, and obviously, partly because I thought it was a really interesting project and it's a way of just showing off ... there were a lot of things that people asked for and also implemented themselves. They started out just asking for small things and then asking for larger things or doing them themselves, and none of this was very planned for. The leadership part came by default, because nobody wants to make decisions, right? Things just happened, and it wasn't really planned. And I was the obvious person for it.

As the project scales, certain people are usually invited to join the originator, possibly managing certain subsystems as secondary leaders, or maintainers. Raymond (1998a) makes the distinction between ordinary contributors and co-developers. Contributors who make a substantial ongoing commitment earn the trust of senior members and are asked to take on more responsibility. Ken Coar (1999) (p4) notes: "The innermost circle of involvement in the Apache server project is comprised of the people who have shown the most dedication to it, either through actual code submitted or by advocacy, infrastructure support, or other types of contributions."

Some projects are more formal about this than others. Apache requires group consensus in gauging merit and deciding what someone is allowed to do (Fielding, 1998). The Linux kernel is fairly casual in comparison. As Linus Torvalds (Gillmor, 2000) explains: "... it's very much an organic process ... It's not as if there has been any voting (on who should be a maintainer). People just know who's been active there and who they trust, and it just happens."

In any case, the process applies not just to new leaders, but to changes in leadership as well. It is here that open-source projects exhibit clear democratic organizational traits. In a democratic group, leadership is not confined to a single person, but moves around from team member to team member. Weinberg (1998) notes that it is not important for every member to exert equal leadership, but that the determinants of leadership are based on the inner realities of team life.

For example, if a project or module owner is not doing a good job, perhaps by releasing buggy code or by not being responsive to bug fixes or suggestions, then someone will decide that they can do better. However, the community still needs to agree that a change is in the best interests of the project. Linus Torvalds (bootNet.com, 1999) relates: "... the only entity that can really succeed in developing Linux is the entity trusted to do the right thing. And as it stands right now, I'm the only person/entity that has that degree of trust. And even if somebody thought I was doing a bad job (which is fairly rare) and that somebody decides that 'I really want to fix this feature,' there's a really big hurdle to convince everybody else that he CAN fix that feature."

Realistically, disagreements over leadership are rare. It is more common for the original owner to seek out a replacement. As Raymond (1998b) notes: "It is well understood in the community that project owners have a duty to pass projects to competent successors when they are no longer willing or able to invest needed time in development or maintenance work."

Leadership is therefore self-regulating, able to shift with the project's internal demands. Even dictators are accountable. In this regard, the bazaar model owes much to the early successes of the IETF, or the Internet Engineering Task Force, which created and maintains the open standards on which the Internet was built. The unofficial motto of the IETF was originally uttered by MIT professor Dave Clark: "We reject kings, presidents, and voting. We believe in rough consensus and running code."32

One of the disadvantages of trusted leadership is that owners tend to be strongly identified with their work. There has also been speculation that a certain amount of charisma is needed to be project leader. Trust may not be based solely on competence, but on other less tangible factors as well.

Certainly, many open-source leaders tend to assume an overstated persona, whether intentionally or not. The expression BDFL, or "Benevolent Dictator for Life," reflects this. Linux is strongly associated with Linus Torvalds. The same can also be said of Perl and Larry Wall, or Python and Guido van Rossum.

The risk is that the community will be unable to cope with an inevitable change in leadership, and the project will falter. For example, Linus Torvalds (ABCNews.com, 1999) notes: "It's very obvious that the current 'Linus personality cult' has to go. It's clear that there are a lot of other developers who have the technical expertise to take Linux where it needs to go, and while I'm happy to be the 'posterboy of Linux' eventually people will have to realize that when I'm 80 and have altzheimers or whatever, you'd better be looking for somebody else for leadership."

This problem is jokingly referred to as "bus syndrome." The expression refers to a thread that has alternately appeared on various mailing lists, including the Python developer list, where someone posted the semi-serious question (McLay, 1994): "What would happen if Guido was hit by a bus?" Various projects have attempted to address the issue. Guido van Rossum initially formed the Python Software Activity, which was later superseded by the Python Software Foundation. Similarly, the Apache Group is now known as the Apache Software Foundation.33 Each of these organizations exists at least partly for reasons of continuity. With regard to Linux, Linus Torvalds (Goodman et al, 1999) merely comments:

32 Widely attributed to Dave Clark at an IETF meeting in 1992.

The last stable release for the last year hasn't been maintained by me. That part isn't interesting any more. I didn't want to be there as a maintainer for a product that wasn't technically interesting, but where you just had to make sure it was stable. Alan Cox maintained the last stable release. I haven't done anything at all for 2.0 in a year. When a bug is found, Alan sends me a patch, and I sprinkle holy penguin pee on it, and it magically becomes official. If I died tomorrow because a bus struck me - which is highly unlikely, because there aren't many buses here - what would happen is psychologically a lot of people would be running in circles screaming. And a lot of people would feel nervous about Linux, and maybe that would delay a few Linux projects by a year or so. But what would happen eventually is that Alan Cox or someone else would pick it up.

So, to some extent the risk of "bus syndrome" may be overstated. According to David Lawrence (1998) (p51) of INN: "... the real need is for talented programmers who are genuinely interested in the product and can interact well with the many contributors. This is hardly a task as monumental as finding a charismatic leader, as proven by the many open projects that succeed without such leaders."

4.3 Internal Motivation

Motivation is undoubtedly the single greatest influence on how well people perform. Performance is largely measured by quality and productivity, both of which are strongly affected by motivation. In fact, many studies have found that, in software development, motivation has a stronger influence on productivity than any other factor (Boehm, 1981).

Participants in open-source software development tend to be highly motivated. This is not surprising, since projects rely primarily on the effort of volunteers. Consequently, tasks must be interesting enough that people are willing to donate their time. As Alan Cox (LWN, 1999) observes: "The free software world doesn't really work like a managed corporate structure. If someone is going to do something as a volunteer, it has to be something they find fun." Linus Torvalds (ABCNEWS, 1999) similarly notes: "I generally don't ask people to do anything special. I want people to work on projects because they WANT to work on them, not because they feel that they SHOULD. That's how you maintain good morale and code quality."

33 According to the Apache Project (2000): "... the Foundation has been incorporated as a membership-based, not-for-profit corporation in order to ensure that the Apache projects continue to exist beyond the participation of individual volunteers, to enable contributions of intellectual property and funds on a sound basis, and to provide a vehicle for limiting legal exposure while participating in open-source software projects."

Open-source projects elicit participation on their own merit. External motivators such as financial compensation are secondary. Until recently, few developers were actually paid to participate in projects such as Linux or Apache. People contribute for other reasons, such as opportunity, community, and status.34 These internal motivators have been shown to have a more noticeable impact on individual performance (McConnell, 1996).

The opportunity to write code is one of the strongest internal motivators in open-source development. According to Weinberg (1998) (p184), "programming itself, if the programmer is given a chance to do it his way, is the biggest motivation in programming." Boehm (1981) counted "the work itself" as a top motivational factor for programmer analysts, and Raymond (1998b) refers to "the joy of craftsmanship." Paul Vixie (1999) (p97) of BIND comments: "The opportunity to write code is the primary motivation for almost all open-source effort ever expended. If one focuses on this aspect of software engineering to the exclusion of all others, there's a huge freedom of expression."

Zawacki (1993) (cited by McConnell, 1996) reported that about 60 percent of a developer's motivation comes from a match-up between the job and the individual. Contributors to open-source projects are self-selected, able to "scratch a personal itch" by volunteering for tasks that interest them (Raymond, 1998a).

According to McConnell (1996) (p255), the best way to motivate developers "is to provide them with an environment that makes it easy for them to focus on what they like doing most, which is developing software." In this regard, most open-source projects operate with very few organizational constraints. The opportunity to work autonomously seems to appeal to participants. "A happy programmer is one who is neither underutilized nor weighed down with ill-formulated goals and stressful process friction," Raymond (1998a) suggests. "Enjoyment predicts efficiency."

34 See Hars and Ou (2001) for a quantitative study of the motivational factors driving participation in open-source development.

McConnell (1996) also observes that the importance of the work itself is one reason that quality is more motivating to software developers than external influences such as schedule. Creating something on the leading edge is a "rush" to a technically oriented person. Open-source projects interest many people because of the perceived challenges associated with producing something unconventional. Participants like to feel that they are pushing boundaries. Erik Troan (McHugh, 1998), a core Linux developer, remarks: "For engineers, it's all about the cool hack."

For example, many questioned the practicality of porting the Linux kernel to the Palm handheld. Yet a group of developers obviously felt this was a worthwhile project. In this regard, Linus Torvalds (bootNet.com, 1999) notes: "... Linux would have just stopped being after a year because it would have reached my own personal needs. And by reaching those needs, it wouldn't have been interesting anymore. No program is interesting in itself to a programmer. It's only interesting as long as there are new challenges and ideas coming up."

A sense of community is another strong internal motivator in open-source development. As Karl Fogel (1999) (p10) of CVS observes: "The sheer pleasure of working in partnership with a group of committed developers is a strong motivation in itself. The fact that little or no money is involved merely attests to the strength of the group's desire to make the program work, and the presence of collaborators also confirms that the work is valuable outside one's own narrow situation."

DeMarco and Lister (1999) emphasize the importance of community in software development. An organization that succeeds in building a satisfying community tends to keep its people. When the sense of community is strong enough, no one wants to leave.

As Larry Wall (Dougherty, 1997) remarks: "... I wanted the Perl community to function like a little bit of Heaven, where people are naturally helping each other. They encourage each other, give each other cool things ..." Wall (1999a) (p41) also notes: "... A language without a culture is dead. A sense of participation is important for any open-source project, but is utterly crucial for a language. I didn't want people to merely say, 'I know how to program in Perl.' I wanted people to say, 'I am a Perl programmer.' When people achieve such a cultural identity, many things suddenly become easy. In particular, a kind of self-organizing criticality takes place, and the proper number of leaders (and followers) seems to appear as if by magic."

Status among peers is one of the more interesting internal motivators in open-source development. Boehm (1981) ranked status among the lowest motivational factors for programmer analysts. However, Raymond (1998b) argues that, although hackers may not openly admit to desiring status, it is nevertheless a driving force.35 Those with even a peripheral involvement in Open Source are likely to recognize the names Linus Torvalds or Larry Wall.

In the free software culture, sharing work rather than restricting it achieves status. Participants compete for prestige by giving away time, energy, and creativity (Raymond, 1998b). "Without rivalry - at least the potential for rivalry - you don't get anything done," Linus Torvalds (Gillmor, 2000) explains. "So we've often had these cases where there's been two people maintaining very similar kinds of things, and what ends up happening is that I often accept both of them and see which one ends up getting used ..."

Yet although competition can be effective, it is not without risk. DeMarco and Lister (1999) argue that a long-term effect of heightened competition is a lack of effective peer coaching. In open-source projects, developers working on different approaches to the same problem may be less inclined to collaborate. As David Lawrence (1998) (p51) of INN explains, a lack of external motivators can exacerbate rivalries, particularly when one contribution is favoured over another: "Personality clashes sometimes undermine productivity. This is especially distressing when those involved are highly valued for their many useful contributions. Without the stabilizing incentive of a paycheck, little turf wars can easily cause people to quit the group altogether. Even people not directly involved in the battles may decide they don't want to be subjected to the squabbles in their e-mail."

35 Eric Raymond comments extensively on status in Homesteading the Noosphere (1998b).

Fortunately, while rivalries are common, bitter disputes are not. Linus Torvalds (Gillmor, 2000) comments: "... Sometimes what happens is that the two [submissions] end up being two completely different things - they just evolve into different directions, and suddenly they aren't very similar any more and they have very distinct uses." Still, group cohesiveness is more volatile given many individual agendas, and consequently there is always the potential for conflict: "The only time I remember it getting nasty - people started sending patches that fought the other person's patches - I just told them 'OK, I'm not accepting patches from either of you, and this driver's dead as far as I'm concerned.' A few months later, one of the maintainers had just lost interest ... the problem went away on its own basically ... It happens, but it really is very rare."

4.4 Asynchronous Communication

Communication in open-source development is constrained by geographical distribution and a voluntary work environment. Many large projects are multinational,36 and most volunteers have other jobs.37 Variation in work schedules and network latency make synchronous communication difficult.

Accordingly, communication is predominantly asynchronous. Participants exchange information via e-mail, as Ken Coar (1999) (p4), a senior Apache developer, explains:

Since Apache development is so highly distributed, with lots of work being done in Europe and Oceania in addition to the various time-zones in the United States, real-time communications just don't work very well. Someone is going to be asleep or close to it at almost any possible time.

The solution currently in use by the Apache project is to have most everything happen in electronic mail. There are a few mailing lists dedicated to the project ... and even the most impatient participants learn quickly to allow at least 24 hours to pass before assuming any sort of consensus or conclusion. Discussions can often span several days or even weeks as people who were offline come back and join in, possibly re-opening a discussion that others thought concluded.

36 For an example, see Torvalds (2000). 37 More information about the geographic distribution and work hours of Linux kernel developers is provided in a study by Hermann et al (2000). Williams, R. (2000) also provides an interesting distribution of postings to the Linux kernel developers list by timezone.

Mailing lists and private e-mail are both used. Chen and Gaines (1997) characterize e-mail discourse as a cycle of origination and response between agents communicating through a computer-mediated channel. As shown in Figure 7, private e-mail includes only an originator and a recipient.

Figure 7. E-mail discourse.

However, list server discourse, shown in Figure 8, also includes a community. Communication patterns are more complex in that the originator may not direct a message to a particular recipient, there may be multiple responses to a message, and the response from the recipient may itself trigger responses from others.

Figure 8. List server discourse.

Nearly all open-source projects use mailing lists as a primary means of communication, where every message is distributed to all subscribers. Private e-mail is typically reserved for resolving localized conflicts, or when a discussion contains confidential information (Fielding and Kaiser, 1997).

Mailing lists are important in a decentralized organizational structure because they promote open dialogue. However, a weakness of decentralized groups is that communication channels can eventually become unmanageable (Mantei, 1981).

Eric Youngdale (Moody, 1997), an early Linux contributor, remarks that "[as projects increase in size] it gets harder for key developers to communicate because of the enormous noise on the mailing lists." Less experienced participants can sometimes overrun certain threads, essentially drowning out any meaningful discussion, resulting in what Alan Cox (1998) characterizes as the "Town Council" effect:

The problem that started to arise was the arrival of a lot of (mostly well meaning) and dangerously half clued people with opinions - not code, opinions. They knew enough to know how it should be written but most of them couldn't write "hello world" in C. So they argue for weeks about it and they vote about what compiler to use and whether to write one - a year after the project started using a perfectly adequate compiler. They were busy debating how to generate large model binaries while ignoring the kernel swapper design.

Linux 8086 went on, the real developers have many of the other list members in their kill files so they can communicate via the list and there are simply too many half clued people milling around. It ceased to be a bazaar model and turns [sic] into a core team, which to a lot of people is a polite word for a clique. It is an inevitable defensive position in the circumstances [sic].

Of course, this is problematic in open-source development because it relies on a "strength-in-numbers" approach for testing and feedback. By excluding potential contributors, particularly in a volunteer-driven community, a project may inadvertently sacrifice valuable input.

Figure 9. Activity for the Python mailing list (participants per year, 1991-1999).

Figure 9 (python.org, 2000) summarizes the number of participants by year for the Python mailing list. It is obvious that with more participants, list traffic, meaningful or otherwise, increases dramatically. For Apache, Ken Coar (1999) (p5) similarly relates: "The new-httpd list can achieve amazingly high traffic levels; at peaks of development activity (or controversy) it may involve hundreds of messages per day. Even during the quietest of times there are usually at least half a dozen messages in any 24-hour period."

Jeremy Allison (Tamiya, 1999), one of the core Samba developers, comments:

Some days, [I write] no code at all. Some days, it's all email. I get around 300 in just a day. I'm drowning in email, I really am. I just have to relate. I just cannot do anything about it.

... Most days, it's an average of probably half time coding, half time just managing stuff. Trying to manage the release of Samba, merging other people's patches and communicating with those team members, it just takes an enormous amount of time ...

... I used to have a job where I could go write code eight hours a day everyday. I can't do that anymore. I still try to have at least one day in a week to do that, but it's very hard, it really is.

The number of communication paths grows multiplicatively, proportional to the square of the number of people. More paths usually result in more communication, with less opportunity for actual work. This is popularly known as "Brooks's Law," which states: "Adding manpower to a late software project makes it later" (Brooks, 1995) (p25).
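The arithmetic behind this observation can be sketched briefly: among n people there are n(n-1)/2 distinct pairwise channels, so channels grow roughly with the square of team size. The short illustration below is ours, not Brooks's; the function name is chosen for clarity.

```python
# Pairwise communication channels in a fully connected group of n people.
# paths(n) = n * (n - 1) / 2, i.e. quadratic growth in team size.

def communication_paths(n: int) -> int:
    """Number of distinct person-to-person channels among n people."""
    return n * (n - 1) // 2

if __name__ == "__main__":
    for team_size in (2, 5, 10, 50, 100):
        print(f"{team_size:4d} people -> {communication_paths(team_size):5d} paths")
```

Doubling a ten-person team to twenty, for example, more than quadruples the number of channels (45 to 190), which is why mailing-list traffic can swamp a growing project even when each individual writes no more than before.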

Guido van Rossum (1998) of Python echoes the sentiments of several other prominent open-source developers: "... as a package becomes more popular, the developer spends more time on helping users than on developing software. While there are ways to avoid this (e.g. don't answer email), it remains a problem - without a support organization, you're it! This can be summarized as 'you're crushed by your own success.' "

Fortunately, Brooks's Law is intentionally simplistic. Although an increase in list activity is inevitable, most open-source projects remain productive by taking preventative steps early on to minimize redundant communication. For example, larger projects often have numerous mailing lists, each specializing in a different topic. FreeBSD uses over 40 mailing lists, as well as several non-English lists. In well-established projects it is common to have, at a minimum, mailing lists for technical discussion, reporting bugs, and general inquiries.

Certain lists can also be closed, or available "by invitation only." Threads are transferred to a public list as appropriate, where the bulk of communication still takes place. To subscribe to the GNUstep developers list, for instance, you must contact the maintainers directly. The list has been "established to keep the noise low when it comes to core topics." Ken Coar (1999) (p5) notes: "There are two discussion lists associated with the Apache HTTP server project: a public technical list called new-httpd, and the private apache-core list which is intended for discussions by the core team about the project itself rather than about technical issues."38

List digests such as linux-kernel-digest are sometimes used to condense activity into a format more suitable for quick review. Multiple postings are combined, and subscribers typically receive only 3 or 4 mailings per day, rather than hundreds. This makes it easier for participants to keep apprised of what is happening without reading every message. The disadvantage is that it is difficult to reply to individual postings. So digests are most useful to those who wish to stay current, but not to active developers.

Many mailing lists are also automatically archived. New participants can review past discussions, learning more about the project and what has already transpired. Furthermore, archiving reduces the chance that various threads will be reintroduced or revisited (Fielding and Kaiser, 1997).

Most effective though, is list etiquette. Informal rules and guidelines are used to keep everyone, particularly those new to the project, focused on the task at hand. Etiquette is strongly enforced, sometimes almost too much so. Ken Coar (1999) elaborates:

The [new-httpd] list is definitely for serious technical discussion; there is not a lot of tolerance exhibited toward newcomers seeking consulting help with server operation ...

Even messages containing legitimate technical development-related content aren't guaranteed a welcome; the list could easily be described as 'high-octane.' Someone new who posts on a subject that has been discussed (and concluded) several times before, or who weighs into the middle of a vigorous technical discussion with something ill-considered or poorly presented, is likely to get 'flamed.' Even messages that pass the flame-bait test might be given little credence at first.

38 According to the KDE Project (2001): "About 6 months into the development of KDE, the project started to grow very large and began to attract a great amount of passionate vocal interest. Seemingly endless discussions and bickering began to stifle development. In an effort to maintain a healthy decision-making process it became necessary for the core group of developers to open internal communication channels and to limit write access to kde-devel, the developers mailing list, to actual developers rather than opinionated spectators. Read access to kde-devel is still open to anyone."

This is because the community of participants on the new-httpd list is just that - a community. The people who have been there for awhile are familiar with each other, and have developed respect for each other's opinions; newcomers are generally greeted with reticence until they have established a bit of a presence and the long-term residents feel comfortable with them.

This makes it sound as though new contributors aren't welcome, but that's actually very far from the truth. The project is always glad to get more help - but would-be contributors need to show that they're there for the course before they'll be taken seriously. That means proving themselves by showing awareness of current and past discussions (through lurking for awhile and reviewing the mail archives), technical ability, perseverance, and the courage to stand up for their opinions.

Projects typically recommend that new participants become at least marginally familiar with mail archives, and passively subscribe to an appropriate list or digest, "lurking" on it for a few weeks before posting anything.

For the most part, communities are able to maintain fairly open channels of communication for collaboration and knowledge sharing. In this regard, Brooks (1995) observes that "the communication in an organization is a network, not a tree." For open- source projects, Raymond (1998a) suggests that communication is not only a network, but a redundant network as well: "... in the open-source community, organizational form and function match on many levels. The network is everything and everywhere; not just the Internet, but the people doing the work form a distributed, loosely coupled peer-to- peer network which provides multiple redundancy and degrades very gracefully. In both networks, each node is important only to the extent that other nodes want to cooperate with it."

4.5 Summary

This chapter presented an organizational view of the open-source software development process. A number of characteristics consistent with this view were identified and discussed with examples from various open-source projects. For instance, collaboration is decentralized and participants work more or less autonomously; however, integration is controlled. Leaders maintain conceptual integrity, relying on trust earned through competence. Participants communicate asynchronously, and are motivated to contribute for reasons such as opportunity, community, and status.

Chapter 5 Control View

Managing a distributed development team ... is a bit like herding cats.
Guido van Rossum

Objectives

• To identify characteristics consistent with a control view of the open-source software development process

• To discuss these characteristics using examples from various open-source projects

The control view of a software process model focuses on direction. More specifically, it deals with mechanisms for guiding development. This includes planning, approval, data gathering, support, and documentation. Characteristics consistent with this view can be identified by asking: how is the work controlled?

In open-source development, control is largely implicit. There are virtually no written rules or guidelines. Volunteers work on tasks until they become busy or tired, and as a result strict deadlines are impractical. The integrity of the code base is maintained through a participatory model based on competence. This is typically supported by a modular architecture, which helps to promote a high division of labour. Tools and information are made readily available to encourage consistency among new participants.

This approach can be broken into five characteristics.

1. Planning is informal. There are no concrete plans or visions. The only long-term goal is to improve the product.

2. Participation is tiered. Participants work at different levels, reflecting a natural gradient of competence and commitment.

3. Architectures are designed for modularity. Modular designs reduce interdependencies, allowing development to proceed more cleanly.

4. Tool support is ubiquitous. Tools are freely available and reasonably consistent across projects, lowering the entry barrier for participation.

5. Information space is shared. Web sites provide easy access to information resources such as user documentation, discussion forums, and problem report databases.

5.1 Informal Planning

Planning for product development and delivery schedules is a challenging task, especially in software projects. There are many factors that affect the schedule, and progress is difficult to measure. A substantial amount of effort is spent trying to estimate and track work performed.

Open-source development is much more informal. There are typically no concrete plans. Instead, the only long-term goal is to improve the product. Linus Torvalds (Yamagata, 1997) relates: "I try to avoid long-range plans and visions - that way I can more easily deal with anything new that comes up without having pre-conceptions of how I should deal with it. My only long-range plan has been and still is the very general plan of making Linux better."

The dynamic nature of many open-source projects makes it virtually impossible to enforce strict plans or schedules. Since most contributors are volunteers, there is no real commitment to deliver anything within a fixed timeframe. Well-meaning promises are made, but these are always subject to change. Developers may become busy or tired, or simply move on to other interests.

Change is also constant, and open-source projects typically do not have the close vendor relationships that many commercial companies rely on to anticipate industry trends. Users drive requirements, and it is easier to be reactive than proactive. As Linus Torvalds (Abreu, 1999) notes:

In the sense that the Linux user base has been changing fairly rapidly, making a five-year plan just would not work. A year ago the main user for this was still on a kind of technical workstation, a small scale Web server. And suddenly the enterprise-like large scale computing came. It wasn't something that Linux had really been used in but it meant that suddenly a lot of new user interest was in a completely new area. So we're moving on to doing better and better things and it's not really planned. It's more of a reaction to what people need.

At most, project owners set general milestones intended to focus interest on particular areas. Feature sets are emphasized more than delivery dates. For example, Apache posts a high-level project plan, listing goals for future releases. Each of Apache's source code repositories also contains a file called "STATUS," which is used to keep track of the agenda and plans for work within that repository. Linux has something similar called a ChangeLog. Mozilla currently provides a development roadmap, discussing the schedule shown in Figure 10 (Eich, 2001). Contributors have been asked to vote on the features they think should be included in each milestone.

Figure 10. Mozilla milestone schedule for 2001 (timeline spanning Q4 2000 to Q4 2001, marking the Mozilla 0.6 and 0.9 milestones, the Netscape PR 3 branch point, and vendor branches as required).

A lack of formal planning does not seem to adversely impact open-source projects. In fact, it would seem to be more of a strength than a weakness. This could be because the effect of scheduling pressure on productivity is questionable. DeMarco and Lister (1999) note that in a study of productivity by estimation approach, projects in which no estimate was prepared at all outperformed any other.39

Table 7 (Mockus et al, 2000) (p269) compares code productivity between top Apache developers and top developers in several commercial projects. The Apache core developers appear to be very productive, especially given that Apache is a voluntary, part-time activity with a relatively lean code base.

39 In a revision to The Cathedral and the Bazaar, Raymond states: "I have come to suspect... that in earlier versions of this paper I severely underestimated the importance of the 'wake me up when it's done' anti-deadline policy to the open-source community's productivity and quality. General experience with the rushed GNOME 1.0 in 1999 suggests that pressure for a premature release can neutralize many of the quality benefits open source normally confers. It may well turn out to be that the process transparency of open source is one of three coequal drivers of its quality, along with 'wake me up when it's done' scheduling and developer self-selection."

Table 7. Comparison of code productivity of the top Apache developers and the top developers in several commercial projects.

                      Apache     A      B      C      D      E
KMR/developer/year      .11    .03    .03    .09    .02    .06
KLOC/developer/year     4.3   38.6   11.7    6.1    5.4   10

Measured in KLOC per year, Apache developers achieve a level of production that is within a factor of 1.5 of the top full-time developers in projects C and D. Moreover, they handle more modification requests (MRs) than the top developers in any of the commercial projects. Based on this information, Mockus et al suggest that rates of production in Apache are at least comparable to commercial efforts.

Open-source projects work well with loosely defined goals and deadlines. They are not subject to rigid delivery or shipping dates, and still remain very productive. It seems likely that this can at least partly be attributed to lack of schedule pressure. Even recent criticism over the delay in releasing Linux 2.4 has largely been ignored, as developers adopt an "it will be ready when it's ready" attitude. In discussing a potential release date,

Alan Cox (2000) simply remarks: "It's [still just] a target. If you don't have a target you never finish. If you do have a target you miss it sometimes. The good thing is we are missing by a smaller margin each time."

5.2 Tiered Participation

Open-source projects have different levels of participation. Levels are synonymous with trust, controlling ownership, or modification and distribution rights. As Eric Raymond (2000) notes: "Many bazaar projects have inner and outer circles ... This simply reflects a natural gradient of interest and competence and commitment."

In INN for example, a two-tier model controls access to the source code. David Lawrence (1998) (p49) elaborates: "Anyone on the Internet can see the entire working development tree ... General access is read only; permission to alter the tree is granted to only a handful of lead programmers intimately familiar with the entire INN system. These programmers are responsible for vetting changes submitted by the INN community at large and then integrating them into the tree."

Table 8 outlines several levels of participation commonly seen in open-source software development.40 Not all projects are the same. For instance, some are less restrictive than others,41 but the general idea remains consistent. The control hierarchy is based on trust, and higher levels are trusted with more responsibility.

Table 8. Levels of participation in open-source projects.

High: Project owner. Individual or core group responsible for design decisions concerning overall direction of the product. Contributes hundreds of hours per year to the project.

People with write access to all or some subset of the source tree. Typically secondary leaders responsible for a particular subsystem. Review code submitted by others, route to project owner.

Contribute bug fixes and small enhancements. May or may not participate on an ongoing basis.

Use the product and debug it. Identify and report bugs. Participate in the mailing lists.

Use the product and also suggest new features. Participate in the mailing lists.

Low: Use the product.

Ken Coar (1999) (p4) of Apache explains: "Access to each successive level of participation is controlled by the opinions of your peers. The more you submit, and the more your submissions are considered to be of value, the more 'merit' you acquire. Accumulate enough merit and you'll be admitted to the next level of participation."

For the most part, this process is fairly relaxed. Eric Raymond (2000) emphasizes:

"Anybody can originate and send a patch. And anybody, by displaying sufficient ability and commitment, can work his or her way into the informal inner circle." Still, it should be noted that in longer-term projects, a strong bond often develops between established members, and it can be difficult for new people to join in.

40 Adapted from Coar (1999). Cavalier (1998) refers to 3 levels of participation: the need-driven consumer, the user-developer, and the core developer.

41 Project restrictions vary. At a minimum most restrict write access to the source tree; however, some allow anyone to commit changes.

Levels of participation are needed to satisfy quality, integration, and cooperation demands. The quality of code submissions in open-source development varies drastically. There also tend to be many highly opinionated individuals, which can sometimes lead to conflicts over implementation. Consequently, limited groups of experienced developers are more effective where design decisions have far-reaching implications. Alan Cox (1998) of Linux comments:

The first thing you have to understand is that really good programmers are relatively unusual. Not only that but the difference between a true "real programmer" and the masses is significantly greater than that between "great" and "average" in many other professions. Studies have quoted 30 to 1 differences in productivity between the best and the rest.42

Secondly you need to understand that a lot of wannabe real programmers are also very good at having opinions. Many of them also catch buzzword disease or have some specialty they consider the "one true path." On the Internet talk is cheap ...

There are a very small number of real programmers with the time and the right (or is that wrong) kind of mental state to contribute to a project whose sole real worth is "Hack Value." As a result of this at any given time a project has two or three core contributing people.

At the highest level is the project owner. This individual or core group maintains the overall design philosophy by controlling what makes it into the main distribution. Larry Wall (Dougherty, 1997) recalls: "People have made many suggestions, and I've taken some of them and have rejected many more, though some people wouldn't think so."

Alan Cox (McMillan, 1999) similarly observes: "A large part of Linus's job is to say 'no' to things. And that can be quite a hard job. I have it much easier because I'm looking for stuff that perhaps ought to be done, and Linus's job is to say 'No that's not going in. No that's crap, rewrite it.' "43

42 Brooks (1995) (p30) makes a similar statement: "In one of their studies, Sackman, Erikson, and Grant were measuring performances of a group of experienced programmers. Within just this group the ratios between best and worst performances averaged between 10:1 on productivity measurements, and an amazing 5:1 on program speed and space measurements!"

Different levels of participation promote the separation of architecture and implementation. Brooks (1995) (p44) argues that the "separation of architectural effort from implementation is a powerful way of getting conceptual integration on very large projects." With respect to Perl, Chip Salzenberg (1999) comments: "The language changes only when Larry [Wall] says so. What he has said on this subject is that anything that is officially deprecated is fair game for removal. Beyond that I really need to leave things as is. He's the language designer. I'm the language implementer, at least for this particular project. It seems like a good separation of responsibilities."44

Architectural responsibility is delegated downward as a project scales. Brooks (1995) refers to the "recursion of architects." For example at the next highest level, secondary leaders typically control specific subsystems, maintaining consistency by reviewing submissions. The basic approach is the same at any level, or as Karl Fogel (1999) (p88) of CVS notes: "The primary role of a maintainer is to say 'yes,' to say 'no,' to code and help others do the same."

Generally speaking, upper tiers generate and maintain most of the code in an open-source project. For instance, Figure 11 (Mockus et al, 2000) (p267) shows the cumulative distribution of contributions to the Apache code base. The top 15 developers contributed more than 83% of the modifications requests (MRs), 88% of added lines, and 91% of deleted lines. Developers outside of the top 15 contributed very little code.

43 Alan Cox (McMillan, 1999) elaborates: "Sometimes you get stuff that's a great idea, but dreadfully implemented. Other times people have done a good job, but they've done it the wrong way. Sometimes people write code that fights the kernel. They give you long complicated alternative ways to achieve something. There, the right thing to do is say, 'Okay that's a great idea, great interface. Throw the bit in the middle away; rewrite it to fit the way the kernel wants to work.' "

44 Brooks (1995) (p233) further notes: "Remember that the builder has creative responsibility for the implementation; the architect only suggests ... Always be ready to suggest a way of implementing anything one specifies; be prepared to accept any other equally good way. Deal quietly and privately in such suggestions. Be ready to forego credit for suggested improvements. Listen to the builder's suggestions for improvements." In this regard, Raymond (1998a) observes that: "...it is absolutely critical that the coordinator be able to recognize good design ideas from others."

Figure 11. Cumulative distribution of contributions to the Apache code base (x-axis: number of individuals).

As another example, Figure 12 (Koch and Schneider, 2000) (p3) shows the lines of code added per programmer for GNOME (since the beginning of the project). According to the supporting data, the mean LOC added by a given programmer was 21,000 with a standard deviation of 67,000. The mean LOC deleted was 15,000 with a standard deviation of 58,000. The results indicate significant differences between programmers, with the majority contributing a relatively small amount.45

45 Similar distributions can also be observed for Linux kernel development, as reported in surveys by Dempsey et al (1999) and Hermann et al (2000).

Figure 12. Histogram of LOC added per programmer for the GNOME project.

This is in contrast to a more conventional arrangement, in which managers are typically furthest removed from the code. Weinberg (1998) suggests that the differences between first-level and n-th level managers can be described by proximity to the work being done. The first-level manager maintains at least some direct contact with the code, whereas the n-th level manager only sees the work indirectly.

The unfortunate effect is that confidence is undermined at lower levels. Even though an n-th level manager may have originally been a programmer, given rapid change in technology, this person will likely have lost any immediately relevant technical ability. Weinberg (1998) notes that in a small survey of working programmers, only 15% of first-line managers were thought to be as skilful as the programmers themselves.

Open-source projects seem to conveniently avoid this problem, since leaders are also experienced developers. Such are the inherent benefits of a participatory model based on competence. This also contradicts the commonly held perception that open-source development involves more programmers than other approaches, at least for new code.

While upper tiers are responsible for most of the implementation work, debugging occurs at mid to lower levels of participation. Consequently, overall effort is more evenly distributed. Error detection is not a trivial task, or as Linus Torvalds (Raymond, 1998a) observes: "Somebody finds the problem, and somebody else understands it. And I'll go on record as saying that finding it is the bigger challenge."

Figure 13 (Mockus et al, 2000) (p268) shows the cumulative distribution of just problem report, or PR, related changes to the Apache code base. It is evident that participation is broader in defect repair than in the development of new functionality. Only 66% of the PR related changes were produced by the top 15 contributors. The participation rate was 26 developers per 100 PR changes and 4 developers per 100 non-PR changes.

[Figure plots the fraction of MRs, fraction of delta, fraction of lines added, and fraction of lines deleted against the number of individuals (1 to 388).]

Figure 13. Cumulative distribution of PR related changes to the Apache code base.

Based on this data, Mockus et al hypothesize that "a group larger by an order of magnitude than the core will repair defects, and yet a larger group (by another order of magnitude) will report problems." Eric Raymond (2000) emphasizes: "... the peer-review effect at the heart of Linux's success uses the brains and eyeballs of the *entire* developer population, regardless of what concentric circles or cliques they self-organize into."

So although senior developers, or those at higher levels of participation, usually implement most of the new functionality in an open-source project, the principle of large-scale peer review remains intact. Middle tiers, correspondingly greater in number, contribute bug fixes, and lower tiers, still greater in number, identify and report bugs. New features are discussed and the product is critiqued at the entry level. This encompasses the project community.

5.3 Modular Design

Modularity refers to the practice of dividing software into separate components, or modules, that are integrated to satisfy problem requirements (Pressman, 1997). The concept of modularity is important for system design, improving comprehensibility and helping to minimize the effects of change.

It becomes easier to approach a complex task when it is broken into smaller pieces. Myers (1978) (cited by Pressman, 1997) (p349) states that "modularity is the single attribute of software that allows a program to be intellectually manageable." By designing a modular architecture, modifications are also less likely to ripple through the entire program. The impact of change is lessened through a concept known as "information hiding," in which design decisions are isolated inside their own modules (Parnas, 1972). Boehm (1987) reported that information hiding is a powerful technique for eliminating rework, and pointed out that it is particularly effective for incremental, evolutionary development styles.
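The information-hiding principle described above can be illustrated with a minimal sketch (Python is used purely for illustration; the class and names below are hypothetical and do not come from any project discussed in this thesis). The design decision of how entries are stored is isolated inside the module, so changing the representation cannot ripple into client code:

```python
# Hypothetical sketch of information hiding (Parnas, 1972): the storage
# representation is a private design decision; clients see only the
# narrow interface of define() and lookup().

class SymbolTable:
    """Maps names to values; how they are stored is hidden."""

    def __init__(self):
        # Internal representation -- could be swapped for a sorted list
        # or an on-disk index without touching any client code.
        self._entries = {}

    def define(self, name, value):
        self._entries[name] = value

    def lookup(self, name):
        # Returns None when the name is not defined.
        return self._entries.get(name)

# Client code depends only on the interface, never on _entries.
table = SymbolTable()
table.define("version", "2.4")
print(table.lookup("version"))  # -> 2.4
```

Because no code outside the module refers to `_entries`, a modification to the representation is localized, which is exactly the property that lets open-source contributors change one module without surveying the whole program.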

Modularity is strongly emphasized in open-source development. It helps participants to contribute effectively while understanding only a subset of the system, and reduces problems associated with low quality submissions. This is important because open-source projects do not consistently practice clean or elegant design.

Clean design involves looking at a program as a mathematical construct or proof, and is more common in commercial software development where companies can afford to make a large investment early in the lifecycle. However, theoretical correctness holds less influence in open-source development, and although there is a tremendous amount of discussion among free software enthusiasts about what constitutes good design, developer lists are not always an accurate description of what is actually being implemented (Fogel, 1999). Clean design in open-source projects is secondary to running code.

Most contributors are interested in a specific enhancement or modification. Changes are made in small increments, and as many design decisions as possible are postponed until the code has had some real use. As Karl Fogel (1999) (p141) of CVS explains, contributors tend to work quickly at implementing a solution, without necessarily examining how changes might affect other areas:

In a free project, many small initial investments are made at different times by various contributors ... These individual contributors became involved in the first place because of their interest in what the code does ... Their first job is to install and run the code, not peruse it...

The majority of contributors to free software have no need to carefully inspect the code, although that is really the only way to develop a truly informed opinion about a program's design. New contributors usually just want some small, incremental addition to the program's behaviour ... For such quick, minor changes, the best strategy is usually for them to dive in and start hacking. There usually isn't time - or, to be honest, inclination - to take a broad survey of the code and then make changes in the most theoretically pleasing way.

Not surprisingly then, contributions tend to be rather inconsistent. Most participants in large open-source projects do not have the necessary expertise to be familiar with all aspects of the design.46 With regard to the Linux kernel, Ken Thompson (Cooke et al, 1999) (p61), one of the original creators of Unix, observes: "I've looked at the source and there are pieces that are good and pieces that are not. A whole bunch of random people have contributed to this source, and the quality varies drastically."

The impact of low quality submissions is minimized through ownership. Code is filtered through the control hierarchy, and modularity makes it easier for developers at different levels of participation to assume control of certain subsystems. Overall review can then be less stringent because the impact of change is localized.

In this way, architecture mirrors organizational structure. This behaviour was originally described as Conway's hypothesis (1968), formulated by Brooks (1995) as Conway's Law, which states that the organization of a software system will be congruent to the organization of the group that designed the system. Open-source projects follow highly modular designs because a decentralized organizational structure necessitates this approach.

46 David Lawrence (1998) of INN relates: "Contributions to the project vary in quality, and much time can be spent trying to make sense of a submission. Although the time lost is almost always regained by having bugs rapidly identified and proposed fixes offered, it can still be frustrating to sort through confusing or awkwardly coded submissions that are not up to the usual standards of the project ..."

For instance, the Linux kernel was developed by a large number of volunteers. The number of developers, and the fact that they are volunteers, had an impact on how the system should be designed. With such a large number of geographically dispersed people, a tightly coupled system would be difficult to coordinate. Developers would risk constantly treading on each other's code. Consequently, the subsystems anticipated to need the most modification were designed to be highly modular. Linus Torvalds (1999) elaborates:

With the Linux kernel it became clear very quickly that we want to have a system which is as modular as possible. The open-source development model really requires this, because otherwise you can't easily have people working in parallel. It's too painful when you have people working on the same part of the kernel and they clash.

Without modularity I would have to check every file that changed, which would be a lot, to make sure nothing was changed that would effect (sic) anything else. With modularity, when someone sends me patches to do a new filesystem and 1 don't necessarily trust the patches per se, I can still trust the fact that if nobody's using this filesystem, it's not going to impact anything else.

For example, is working on a new filesystem, and he just got it working. I don't think it's worth trying to get into the 2.2 kernel at this point. But because of the modularity of the kernel I could if I really wanted to, and it wouldn't be too difficult. The key is to keep people from stepping on each other's toes.

In a study of ownership as a predictor of architecture, Bowman and Holt (1998) examined both Linux and Mozilla. Figure 14 shows the "ownership architecture" for Mozilla, summarizing how many developers have worked on each subsystem, and also which subsystems had developers in common. As expected, only a few developers are associated with more than one subsystem.

Figure 14. Mozilla ownership architecture (subsystem labels include developer counts, e.g. Support Libraries (32)).

The ownership architecture of the Linux kernel is shown in Figure 15. Subsystems with a higher number of participants require more structure. For instance, the file system has a strictly defined interface with many independent sub-subsystems. In contrast, the inter• process communication module has a small number of developers, and is less rigorously controlled.

Figure 15. Linux ownership architecture (subsystems include Memory Manager (5), Process Scheduler (16), Network Interface (31), and Inter-Process Communication (2)).

In addition to functionality available in the core executable, Linux supports loadable kernel modules (de Goyeneche and de Sousa, 1999). Loadable modules contain code that can be optionally included in a particular configuration of the kernel. Hardware device drivers are implemented as loadable modules so that they are only included in the kernel if the hardware device is available.47 As Linus Torvalds (1999) explains, loadable kernel modules help to further reduce the impact of change-induced side effects:

With the 2.0 kernel Linux really grew up a lot. This was the point that we added loadable kernel modules. This obviously improved modularity by making an explicit structure for writing modules. Programmers could work on different modules without risk of interference. I could keep control over what was written into the kernel properly. So once again managing people and managing code led to the same design decision. To keep the number of people working on Linux coordinated, we needed something like kernel modules. But from a design point of view, it was also the right thing to do.
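The mechanism Torvalds describes, independently written modules plugged in behind an explicit structure, can be sketched in miniature. (Python is used purely for illustration; real kernel modules are C code loaded into a running kernel, and every name below is hypothetical.) The core exposes one narrow registration interface and dispatches through it, so a module that nobody uses cannot affect anything else:

```python
# Hypothetical sketch of loadable-module-style extension: each "module"
# implements one narrow interface and registers itself with the core.

FILESYSTEMS = {}  # registry maintained by the core

def register_filesystem(name, read_fn):
    """Narrow interface: a filesystem is a name plus a read function."""
    FILESYSTEMS[name] = read_fn

def core_read(fs_name, path):
    # The core dispatches through the interface; it never looks inside
    # a module, so a buggy or unused filesystem cannot break the others.
    return FILESYSTEMS[fs_name](path)

# An independently contributed module registers itself:
register_filesystem("ramfs", lambda path: f"<ramfs contents of {path}>")

print(core_read("ramfs", "/tmp/x"))  # -> <ramfs contents of /tmp/x>
```

The design point is the one made in the quote above: the core only has to vet the registration interface, not every line of every module, which is what lets many contributors work in parallel without stepping on each other's toes.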

This approach is common in other projects. Apache uses a modular extension mechanism to maintain the integrity of the core server. Sameer Parekh (1999), founder of C2Net, notes: "The Apache Group designed the Apache web server with modularity in mind. When the Apache Group rewrote the server core for the 0.8.x release of Apache they built into the core an extensible module API in order to provide a consistent interface for functionality. They seperated [sic] out the bulk of the server's operations into a set of modules, so that the server core would be a minimal set of operations."

Perl 5 was redesigned to allow user extensions, or Perl modules. These modules, archived within the Comprehensive Perl Archive Network, or CPAN, allow for easier addition of new functionality. Python employs something similar.

The key is to control interaction between modules through well-defined, narrow interfaces. Other areas of the program use the interface when something is required from the module. All code outside the module avoids referring to any part of the module not explicitly defined as an interface. A relative amount of creative freedom can then be granted within a module as long as the interfaces are controlled. Guido van Rossum (1998) of Python writes: "Linus has an extreme but clear point of view: the interfaces need to be designed carefully by the main developer; the implementations may be buggy. For example, Linus doesn't mind if there are some buggy device drivers - that only affects a small number of people, and only until they get fixed - while a bad [interface] design will haunt you until the end of times [sic] ..."

47 In a study of the Linux kernel, Godfrey and Tu (2000) note that the growth and size of the drivers subsystem distorts the idea of how large and complicated the system is. The small, core kernel subsystems comprise only a small part of the source tree. By far, the greatest portion of the code is made up of device drivers.
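This interface discipline can be illustrated with a minimal sketch in Python. The registry below is hypothetical (it is not taken from any of the projects discussed); it shows the essential idea behind loadable modules: outside code interacts with a module only through a small, explicitly exported surface, so a buggy implementation stays contained while the interface remains stable.

```python
# Hypothetical sketch of a narrow module interface. All names are
# illustrative; this is not the Linux kernel's actual mechanism.

class DriverRegistry:
    """Narrow interface: register() and dispatch() are the only entry points."""

    def __init__(self):
        self._drivers = {}  # internal detail; callers must not touch this

    def register(self, name, handler):
        """Add a driver. A buggy handler affects only its own callers."""
        self._drivers[name] = handler

    def dispatch(self, name, *args):
        """Invoke a driver through the stable, controlled interface."""
        if name not in self._drivers:
            raise KeyError("no such driver: %s" % name)
        return self._drivers[name](*args)


registry = DriverRegistry()
registry.register("echo", lambda text: text)
print(registry.dispatch("echo", "hello"))  # -> hello
```

Because callers never reach past `register()` and `dispatch()`, implementations behind the interface can be rewritten freely, which is precisely the creative freedom described above.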

In general, modular design in open-source development is driven by a decentralized organizational structure. There is a high division of labour in most projects, and a modular architecture makes this easier. Again, many small projects work like one big project.

5.4 Ubiquitous Tool Support

Tool support is important in any software project. Useful tools allow developers to work more effectively, supporting coordination and standardization. In open-source development, tools improve control by "retaining project history, tracking problems and revisions, providing and controlling remote access, and preventing change collisions" (Fielding and Kaiser, 1997).

While there is no official toolset for open-source development, projects are remarkably consistent. The same tools tend to be used for the same tasks, and even different utilities are fairly similar in terms of visible functionality. For instance, to the average participant, one problem reporting system behaves more or less like another.

The use of a common toolset provides a distinct advantage. Contributors do not need to spend a substantial amount of time learning new tools when moving from project to project. This is undoubtedly one of the reasons why most open-source projects are not eager to stray from established tools. Owners rely on what works, shortcomings and all.

Linus Torvalds (Goodman et al, 1999) comments: "When I released the first version [of Linux] it was about 10,000 lines of code. Now it's about 15 million. The largest portion is by far the device drivers."

Implemented with the Internet in mind, tools used in open-source development are lightweight and portable, though there is an obvious bias toward the Unix platform. Most are freely available and can be easily downloaded, helping to lower the entry barrier for participation. It is also relatively straightforward to find tools, enlist advice, and set them up when starting a project. Nearly all tools are open source, bootstrapped as projects of their own.

There are of course development tools, such as compilers and debuggers. The most common compiler seems to be gcc, which is not surprising given that the C language is a popular choice for many projects. Similarly, gdb is frequently used for debugging. Table 9 shows the top 5 languages and debugging tools reported in a survey of open-source development activities (Zhao and Elbaum, 2000).

Table 9. Top 5 languages and testing tools used in a small-scale survey on quality-related activities in open-source development.

Language   Times Used   Testing Tool     Times Used
C          105          GDB              30
Perl       46           Perl Debugger    10
C++        35           Electric Fence   7
Java       13           DDD              4
PHP        9            Purify           4

More interesting are tools intended for process support rather than actual development. For instance, mailing list managers are vital to any open-source project. As developers move beyond private e-mail, these tools facilitate communication among a growing number of participants. Majordomo and Smartmail are popular, with Mailman gaining more recent acceptance. List managers are essentially used to control lists of addresses for an underlying mail transport system, such as Sendmail.48 Some also include integrated support for the Web. (Barr, 1996)

48 See http://www.greatcircle.com/majordomo/mhonarc.html for Majordomo, http://www.list.org for Mailman, and http://www.sendmail.org for Sendmail.

List archiving tools such as MHonArc and Hypermail provide a history of project communication.49 These programs are typically set up to convert list activity to a set of cross-referenced HTML documents. New participants can browse or search an archive to learn more about a project.
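The cross-referencing performed by these archivers can be sketched in a few lines of Python. This is an illustrative toy, not MHonArc or Hypermail themselves; the message data and file names are hypothetical. It shows the core idea: each message's reply header is turned into an HTML link, producing a browsable history.

```python
# Toy sketch of list archiving: convert mail messages into a set of
# cross-referenced HTML pages. Messages and filenames are hypothetical.
messages = [
    {"id": "<1@list>", "in_reply_to": None,       "subject": "RFC: new module API"},
    {"id": "<2@list>", "in_reply_to": "<1@list>", "subject": "Re: RFC: new module API"},
]

def archive_to_html(msgs):
    # Map each Message-ID to the page that will hold it.
    anchors = {m["id"]: "msg%d.html" % i for i, m in enumerate(msgs)}
    pages = []
    for m in msgs:
        links = ""
        if m["in_reply_to"] in anchors:
            # Cross-reference: link a reply back to its parent message.
            links = '<a href="%s">In reply to</a>' % anchors[m["in_reply_to"]]
        pages.append("<html><body><h1>%s</h1>%s</body></html>" % (m["subject"], links))
    return pages

pages = archive_to_html(messages)
print(pages[1])
```

A new participant browsing such pages can follow the reply links to reconstruct a discussion thread, which is the behaviour the archivers above provide at scale.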

Source control tools are widely used for obvious reasons. Decentralized collaboration necessitates good configuration management practices. Contributions to open-source projects are made up of changes to various files in the master source tree. Many small contributions can be submitted at various times by different contributors, making synchronization a challenge.

At the simplest level are tools like diff and patch. The standard Unix diff utility is used to reveal the differences between two files. Patch50 augments diff, allowing a developer to reconstruct a file based on the differences between it and another. It has also contributed to the open-source vernacular, as contributions are now commonly referred to as "patches."
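What such a "patch" actually contains can be shown with Python's standard difflib module, which reproduces the unified-diff format that the Unix diff utility emits. The file contents and names below are invented for illustration.

```python
import difflib

# Two hypothetical versions of a source file, as lists of lines.
old = ["int main() {\n", "    return 0;\n", "}\n"]
new = ["int main() {\n", '    puts("hello");\n', "    return 0;\n", "}\n"]

# Generate a unified diff: the same format diff -u produces and
# patch consumes. Header names are illustrative.
patch = list(difflib.unified_diff(old, new, fromfile="a/main.c", tofile="b/main.c"))
print("".join(patch))
```

The output records only the changed region and its context (lines prefixed with `+`, `-`, and a space), which is why a patch is a compact way to submit a contribution: the maintainer reconstructs the new file from the old one plus the differences.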

While the diff and patch utilities provide a convenient way to submit contributions, given the number of people working on a typical open-source project, it is also necessary to track the history of changes. This allows for the retrieval of previous versions for comparison with the current version. Maintainers are better able to find out who applied a patch and when, as well as to resolve conflicts between overlapping patches.

The Concurrent Versioning System, or CVS, is the de facto version control tool for open-source development. In over 50 projects surveyed, more than 60% used CVS. The configuration management policy embedded in CVS closely matches the open-source methodology, making it a natural fit. Projects are organized around a central CVS repository, from which developers can retrieve copies of the source tree. Changes are made within these copies, and the repository is updated. To stay current, developers periodically download new versions and resolve any local conflicts manually. (Cederqvist, 1993)

49 See http://www.oac.uci.edu/indiv/ehood/mhonarc.html for MHonArc and http://www.hypermail.org for Hypermail.
50 As a point of interest, patch was also written by Larry Wall, the originator of Perl.

Various tools can be used to complement CVS. For instance, Secure Shell, or SSH, allows developers to execute remote CVS commands without sending system passwords in the clear over the Internet. CVSup and rsync can be used to transfer and update collections of files, most notably CVS repositories, across a network. Tools such as CVSWeb and ViewCVS provide a browsable Web interface to a CVS repository.51

Bonsai and Tinderbox, contributed by Netscape through the Mozilla project, can also be used with CVS. Bonsai is used to perform queries on the contents of a CVS repository. Tinderbox can be used to examine the state of the tree, showing which platforms have built successfully, which are broken, and exactly how they are broken. (LinuxWorld, 1999b)

Tools for problem tracking are often used in conjunction with source control systems. The Debian bug tracking system52 is popular, and again, Mozilla has provided Bugzilla (LinuxWorld, 1999b). However, the most well-known is the GNU Problem Report Management System, or GNATS. All of these tools help maintainers to create and administer a database of problem reports. Responsible parties can take possession of submitted bugs or suggestions, and close them once they are resolved. Notification is handled through e-mail. (Osier and Kehoe, 1996)

It should be noted that most of the tools used in open-source development are not as feature rich as their commercial counterparts. This is not to say that they are necessarily feature poor. Following the hacker tradition, open-source tools tend to be utilitarian but certainly functional. Software developers in general have traditionally directed more effort at constructing good tools for others than for themselves (Fielding et al, 1998).

For instance, CVS has some longstanding shortcomings, including the lack of private versioning and native replication capabilities (van der Hoek, 2000). This has resulted in competing products such as Bitkeeper.53 Also, as Fielding and Kaiser (1997) (p89) note, high volumes of list activity can quickly outpace the capabilities of tools like MHonArc and Hypermail:

51 See http://www.ssh.com for SSH, http://www.polstra.com/projects/freeware/CVSup/ for CVSup, http://rsync.samba.org/ for rsync, http://stud.fh-heilbronn.de/~zeller/cgi/cvsweb.cgi/ for CVSWeb, and http://www.lyra.org/viewcvs/ for ViewCVS.
52 See http://www.chiark.greenend.org.uk/~ian/debbugs/ for the Debian Bug Tracking System.

Tracking progress and potential conflicts has often resulted in a deluge of e-mail. Maintaining an adequate archive of that communication requires more sophisticated data management and retrieval techniques than string pattern-searching or hypertext reply-threading can supply. Although most e-mail discussions can be automatically categorized and threaded, Internet mail applications vary to such an extent that an "e-mail librarian" interface is needed to manually fix incorrectly categorized or threaded discussions.

Unfortunately, existing hypertext e-mail archival systems ... are batch-oriented and do not provide such an interface. To handle a list with such a large volume of traffic, an archival system must provide views at varying levels of abstraction, such that the archive can be browsed without displaying too much information at once.

In practice, however, the weaknesses of open-source tools are not overly significant. Generally speaking, project control tools and repository data are ranked relatively low in terms of value and use (Kraut and Streeter, 1995). "Best projects do not necessarily have state-of-the-art methodologies or extensive automation and tooling," Hetzel (1993) (cited by McConnell, 1996) (p351) remarks: "They do rely on basic principles such as strong team work, project communication, and project controls. Good organization and management appears to be far more of a critical success factor than technology." Other studies have also concluded that tool support is only a secondary contributor to an organization's overall level of productivity (Zelkowitz et al, 1984; Zawacki, 1993) (cited by McConnell, 1996).

Still, there is a concerted effort to develop next-generation toolsets for open-source projects and collaborative development in general. The Software Carpentry Project is one example, and Tigris is another. SourceForge has leveraged existing tools to provide an enhanced framework tailored for open-source development, and CollabNet's SourceCast is an example of a more commercially oriented toolset based on the open-source approach.54

53 See http://www.bitkeeper.com for Bitkeeper.

5.5 Shared Information Space

Throughout the duration of a software project, a substantial amount of information is produced. This can include historical data, documentation, debugging logs, and more. Sharing this information so that it is readily accessible is not always easy. Data must be integrated from various sources and presented in a useful format that remains current. Making this information available from a central repository is also particularly important for distributed development.

A shared information space can be used as a framework to manage this task. Participants can exploit the information base both as a means of sharing context and as a way to coordinate their work activities. Brooks (1995) (p75) refers to the concept of a "project workbook." This is not so much a document as it is a structure imposed on the information that the project will be producing.

Shared information spaces are an integral part of open-source development, where projects use the Web to establish a central point of contact for those interested in participating. Project Web sites are essentially portals, or administrative centres that provide easy access to information resources such as user documentation, discussion forums, and problem report databases.

For example, Table 10 describes the shared information space for Apache. The Web site is divided into different areas, each targeting a particular group. General users, for instance, are directed to www.apache.org, whereas developers use dev.apache.org.

Table 10. Apache shared information space.

www.apache.org       Information for users, official public releases
dev.apache.org       Project guidelines and information for developers, tips for development and building a release, mailing list and tool information
bugs.apache.org      Problem report database
modules.apache.org   Third-party module registry

54 See http://www.software-carpentry.com/ for The Software Carpentry Project, http://www.tigris.org for Tigris, http://www.sourceforge.org for SourceForge, and http://www.collab.net for SourceCast.

In open-source development, shared information spaces can be critical to the success of a project. With regard to documentation, Brian Behlendorf (1999) (p163), one of the core Apache developers, remarks: "... locating dedicated resources to make sure that non-technical people can understand and appreciate the tools they are deploying is essential to widespread usage. It helps cut down on having to answer bug reports which are really just misunderstandings, and it also helps encourage new people to learn their way around the code and become future contributors."

Perhaps even more important is the issue of standardization. With so many people making contributions at different times, it can become a challenge to maintain consistency. David Lawrence (1998) (p51) of INN explains: "... keeping quality high, or at least consistent, is difficult. Even relatively good programmers working on the same source code can lose coherency if they are not careful to use a consistent style ... each of the lead programmers authorized to commit changes understands that future maintainability depends on continuing with the style already established for the code, even if it differs from their personal style. With this in mind, they rework submissions as necessary."

For this reason, project Web sites often include style guides and other references, making it easier to orient new participants. Mozilla and Apache both maintain style guides, and other projects simply reference the GNU Coding Standards (Stallman, 2000). On the value of this information in running a project, Guido van Rossum of Python (1998) writes:

... we [a panel of Open Source leaders] didn't say much more about the control issues, except to note that managing a distributed development team like the contributors to the average open source package is a bit like herding cats ... The best contribution (for me) came from Eric Raymond and Cygnus' John Gilmore, who noted that it's possible to train your contributors (e.g. through style guides, coding standards, etc.), and that this is actually an effective way to improve the quality of your contributions. One way to go at it is simply saving scraps of "internal documentation" as you are producing them, e.g. in response to email questions from other developers, and in a couple of years, voila, an internal manual!

As mentioned previously, many projects also use list archiving tools. These can be configured to convert list activity to a set of linked documents. In this way, a shared information space can also represent the organizational memory of a project. Roy Fielding (Fielding and Kaiser, 1997) (p88) elaborates: "Because project communication is limited to e-mail, it can be automatically archived for later use. There is no need to take meeting notes or to transcribe design decisions after the fact, and thus nothing is irretrievably lost. Ideas can be revisited over time, and new project members can read the entire archive when they join, thereby gaining an understanding of the project history, which is often more complete than the memory of the original contributors."

The importance of a project Web site, or shared information space, means that resources are regularly committed to properly managing it. Brian Behlendorf (1999) estimates that approximately 160 hours are needed to set up the initial infrastructure and site content, with about 30 hours/week thereafter to maintain it.

Unfortunately, it is often easier to procure resources for maintaining the framework than to have people write the actual documentation contained within it. Although some documentation is created automatically, there is still a certain amount that needs to be written. David Lawrence (1998) (p51) notes: "Documentation [in open-source projects] is almost always a problem. The participants are usually programmers first and foremost. They write code to meet a particular need; when that need is met they move on to another issue. Since they understand what they wrote (we hope), they see little need for documentation. No technical writer is assigned to the task, and requiring programmers to write relevant end-user documentation for all their changes would tend to have a strong damping effect on development."

So while open-source projects provide fairly sophisticated frameworks for information storage and retrieval, content remains a challenge. Shared information spaces such as project Web sites tend to focus on automatically generated data sources. Written documentation is often limited.

5.6 Summary

This chapter presented a control view of the open-source software development process. A number of characteristics consistent with this view were identified and discussed with examples from various open-source projects. For instance, planning is informal. Most projects set no concrete plans or visions. Instead, the only long-term goal is to improve the product. Participation is tiered. This reflects a natural gradient of competence and commitment, making management of the code base easier. Modular architectures are common, supporting a high division of labour while also minimizing change impact. Tools are reasonably consistent across projects, and information is readily accessible. Both help to improve standardization by orienting new contributors.

Chapter 6 Evaluation

... you can't take a dying project, sprinkle it with the magic pixie dust of "open source," and have everything work out. Software is hard. The issues aren't that simple.
Jamie Zawinski

Objectives

• To review the key strengths and weaknesses of the open-source software development methodology

In describing the open-source approach to software development, a number of strengths have become apparent. These strengths will be reviewed, highlighting various aspects of the process that make it useful. Weaknesses, both actual and potential, will also be discussed.

6.1 Key Strengths

The 14 characteristics enumerated in this thesis reflect fundamental strengths that enable open-source projects to function effectively. These are summarized below.

Closed prototyping is common to almost all open-source projects. It is widely acknowledged that the bazaar model does not work very well for early development. Participants need something runnable to evaluate and improve. Thus, an individual or small group will typically construct a prototype to act as a catalyst in attracting a nascent user community. The prototype is closed at first because it is easier for fewer people to maintain conceptual integrity.

This approach is very effective. By closing the prototype, there is a better chance that the initial design will be kept consistent. However, at the same time, as many design decisions as possible are postponed until the product can be exposed to a larger and more diverse community. This is particularly important when requirements are vague or unstable.

Open-source projects use a form of evolutionary prototyping, in which the initial build is enhanced incrementally through a series of regular iterations. By definition, evolutionary practices are adaptable. Designs can be altered during development according to new requirements. By moving forward in smaller increments, the process remains flexible enough to handle this type of spontaneous change.

With the number of people typically involved in an open-source project, each with different ideas and needs, the ability to react to change can be critical. Short and regular iterations keep increments small, helping to synchronize contributions and minimize risk of divergence. This approach establishes a tight feedback loop, and also improves progress visibility. Developers are motivated by continual improvement.

Open-source projects leverage evolutionary development even further by managing as many design, build, and testing activities as possible in parallel. Iterations overlap as testing from one phase influences design in the next. Development is therefore able to proceed much more quickly than if work was performed sequentially.

Many small continual changes are made to the code, and extensive peer review is used to minimize error. Peer review is particularly effective in open-source development because more users are able to find more bugs. Communities tend to be fairly diverse, increasing the probability that a bug will be apparent to at least one person. Moreover, reviewers are self-selected and therefore highly motivated to find problems. Access to the source code and short iterations reduce loss of context, making bugs easier to fix once found.

Reputation among peers is also very important in open-source projects, and top contributors are careful to be as thorough as possible up front.

One of the major sources of error is eliminated because domain experts are typically responsible for writing the code. Requirements are tacitly understood by developers who are themselves users of the product. Emphasis is on features that are known to be useful.

A diverse community helps to ensure that the design accommodates a range of opinions.

Communities tend to be self-organizing, and development is decentralized. Open-source projects are averse to large teams, and instead endorse smaller groups of functional specialists. If someone has a better idea of how to do something and can demonstrate it, they are free to do so. Responsibility is pushed downward as much as possible; however, integration is still controlled. This allows many small projects to work like one big project.

The control hierarchy is made up of developers responsible for deciding what contributions will be accepted and integrated into the source tree. This arrangement is built on a personal network of trust. Authority and responsibility are synonymous with trust, moving towards those who demonstrate the most competence. Leaders are essentially trusted to make the right decisions.

An environment driven by volunteerism necessitates this system of management. It is effective because the most qualified participants tend to lead a project. Leaders are technically proficient and commonly demonstrate a commitment that elicits considerable respect from the broader community. As a result, conflicts are rare. The organizational structure is also very adaptable, as determinants of leadership are based on the needs of the project itself.

This concept also applies to motivation. External motivators are secondary in open- source development. Developers participate because they are able to write code, experience a sense of community while doing so, and achieve recognition for their contributions. In this regard, motivation is internal as projects encourage participation on their own merit.

Communication among participants in an open-source project is asynchronous. This is a result of geographical distribution, which introduces variation in work schedules and network latency. Asynchronous communication maps well onto a decentralized group structure. Channels are open, and communication is able to occur both vertically and horizontally across the control hierarchy.

Planning in open-source projects is informal. There are no concrete plans or visions. Rather, the only long-term goal is to improve the product. This is not necessarily a problem however, as loosely defined goals and deadlines may actually improve productivity. Many open-source projects are very productive, especially considering the voluntary, part-time nature of the work. Lack of schedule pressure may be yet another motivating factor for participants. It also frees the development process from any artificially enforced constraints. Developers can take as much time as deemed necessary to implement enhancements.

While anyone can contribute to an open-source project, in actuality there are different levels of participation. These levels represent a natural gradient of competence and commitment. Most new code is written by a relatively small group of core developers, typically the leaders. A larger group will repair defects, and still a larger group will report problems. The whole community will use and evaluate the product.

In this way, open-source projects allow for unlimited participation while maintaining integrity of design and implementation. Anyone can use the software, finding errors and recommending new features. However, the ability to have new code introduced into the source tree is governed by a few rigid rules. This approach is tailored to support a decentralized process involving a large number of participants with varying skills and experience.

Product architectures in open-source projects are highly modular, mirroring organizational structure. Modular designs support a high division of labour by allowing developers to work on different parts of the code without substantially impacting each other. Interdependencies are reduced, and development can proceed more cleanly. Products also tend to be more portable and extensible as a result.

A modular architecture provides code integrators with greater flexibility in approving contributions. They can rely more on peer review to identify localized errors, without worrying as much about how modifications will affect other subsystems. Comprehensibility is also improved, as most developers only need to become familiar with a specific area. With clearly defined interfaces, an in-depth knowledge of the entire product is optional.

Open-source development is supported by a variety of tools. These address functions such as information management, source control, and problem reporting. Toolsets are freely available and reasonably consistent across projects, lowering the entry barrier for participation. New contributors can easily obtain tools and do not need to spend a lot of time learning how to use them.

These tools produce a substantial amount of data, much of it auto-generated. Most open-source projects establish an easily accessible framework through the Web to share this information. Project homepages are essentially portals, or administrative centres that provide a central point of contact for participants. This speeds up orientation for new developers, improves standardization, and helps with overall collaboration.

6.2 Key Weaknesses

Potential weaknesses in open-source software development are many because, carried too far, each of the 14 described characteristics can become a liability.

Prototyping is highly dependent on the expertise of the originator. The initial conceptual design is very important, as it can be difficult for developers to compensate for fundamental weaknesses later on. Experience is also required to decide when the prototype should be released. By releasing too early, a project might not have a clearly defined direction. Conflicting opinions and stagnation are the usual result. Yet by releasing too late, the design may not accurately anticipate the user requirements of a broader community. It can therefore be challenging to effectively manage the prototyping activity.

Mozilla is a well-known example, where Netscape released the code to an early version of Communicator 6. Although the intention was good, many developers found it frustrating to work with source that was incomplete and unstable. As a result, external participation dropped sharply soon after the initial release. The project has recovered to some extent, but credibility remains a problem.

Once released, iterative and incremental enhancement usually results in rapid evolution of the initial build. However, as the product matures, many small changes can gradually erode design. Maintenance and performance problems grow with the entropy of the system, and an increasing amount of effort may be spent on restructuring. In some cases, a complete rewrite is required. Senior developers offset this risk to some extent by carefully monitoring what is added to the source tree, but they are still obligated by the user community to incorporate requested features.

It is also difficult to move from incremental innovation to truly radical innovation or invention. Many open-source projects, such as Linux, Apache, and Netscape, are based on existing designs. As the number of participants in a project grows, it can be challenging to take major leaps. The decentralized aspect of open-source development means that it is often problematic for one person, no matter how influential, to push development in a particular direction.

Concurrency allows development to proceed more rapidly; however, there is also some associated overhead. With many development activities occurring in parallel, frequent merges are necessary to synchronize change. Merges are time consuming, and can result in code approval bottlenecks if not properly managed.

Open-source projects rely almost singularly on peer review for quality control. Other types of testing are uncommon, particularly at the design stage. Projects need to build in quality as well as test it in. Peer review is sometimes insufficient for identifying high-level architectural flaws. It is also often ineffective at finding obscure flaws, including errors that do not happen often and are difficult to identify through source inspection.

Alan Cox (2001) remarks: "... the eyeball count on a bug tends to depend heavily on the commonness of the use of that bit of code. I've never measured it but my suspicion is that the bugs in open source code tend to be concentrated much more in the less common drivers/features - especially after a 'stable' release."

Furthermore, the efficiency of large-scale peer review is unclear. Some argue that although one person will eventually find a bug, many others will nonetheless spend time looking for it. Open-source projects also rely heavily on downstream error detection, and corrections are more costly to make at the source code level than at the design stage. So although this approach appears very effective, it may also be somewhat labour intensive.

Having domain experts write code eliminates the possibility of misunderstood requirements, however it can also introduce feature creep. Customers with direct access to the product often have an increased desire for features. Coupled with the risks associated with an evolutionary approach to development, a design can quickly bloat or wander in conflicting directions as too many quick additions are made to the code base.

Perhaps more importantly, there is no feedback loop to true end users and no imperative to create one. There is a noticeable technical bias, as product concepts tend to emphasize user activities rather than user behaviour. Moreover, when developers are not experienced users of the software, they are unlikely to have the necessary expertise or motivation to succeed in an open-source project.

Decentralization is a necessity in open-source projects, if only because contributors tend to dislike overly bureaucratic rules and procedures. However, these same attitudes can lead to unnecessary reinvention. For the most part, developers work in isolation on competing solutions. Many small projects invariably tend to overlap. Trial and error is common, often resulting in redundant code.

Competition can exacerbate personality conflicts or rivalries, causing turf wars where one or more developers refuse to cede their code in favour of another solution. In extreme cases, the project can fork. Disagreeing contributors may make their own copy of the code and start distributing a divergent version of the product.

Fortunately, forking is relatively uncommon. Although the exact reason is not well understood, it seems likely that this can be at least partly attributed to the political skill of many project owners. It should also be noted that forking is not always a bad thing. In some cases, a fork is the best way to repair damaged relationships. For example, in the case of Samba and Samba TNG, developers realized that their aims were incompatible within a single source base and they forked. However, almost immediately they found themselves able to work effectively as friends again. (Tridgell, 2000)

Trusted leadership works well in open-source development, stabilizing informal collaboration in an environment built on volunteerism. However, there are very few people with the right qualities to lead open-source projects. Mentoring is also rare, and most knowledge is tacit. It is therefore critical to retain key people. Leaders tend to be strongly identified with their work. This can cause problems with change management, where the community may inevitably have difficulty accepting new leadership.

Status is an important motivator in open-source development. Contributors compete for recognition from their peers by trying to provide the best solution. However, in the long-term this can strain a project. Without the stabilizing influence of a paycheck, disputes over code acceptance are more likely to escalate and good developers may leave.

Internal motivators are strong, but they are also less predictable.

See http://www.samba-tng.org for Samba TNG.

Asynchronous communication works well for many open-source projects, but as communities grow it can potentially become unmanageable. List traffic tends to increase dramatically, and less experienced participants can overrun certain threads, drowning out any meaningful discussion. For work to continue, senior developers may circumvent the public list, relying more on closed discussion. In such cases, the feedback loop between testers and developers is weakened.

The Linux 8086 project is one example, where core developers were forced to essentially exclude input from certain list members so that work could proceed without constant interruption. Of course, the project then ceased to be a true bazaar.

Informal planning helps to motivate contributors and improve productivity, but it also makes forecasting virtually impossible. The process itself is not readily visible, and it can therefore be difficult to assess progress. There is no real commitment to deliver anything within a given timeframe.

Different levels of participation help to maintain control in open-source projects, but a management hierarchy incurs overhead. Productivity can be noticeably impacted by coordination demands since the best developers are also leaders. Given the aforementioned shortage of qualified leaders, the scalability of such an approach is questionable.

Additionally, projects tend to move toward an unintentionally closed participatory model over time. As a product becomes more complex, there are fewer people who have a detailed understanding of the underlying architecture. Tiers can sometimes resemble cliques, as experienced members develop a comfortable working relationship. Together with the substantial effort required for orientation, there is less chance that new developers will join in at higher levels.

While tools used in open-source projects are freely available and widely distributed, they are also lacking relative to commercial alternatives. Generally speaking, these tools are weaker than the products they are used to develop. Many have longstanding bugs and lack features needed at scale. Projects must expend effort to compensate for these problems.

With regard to shared information spaces, most open-source projects struggle to produce adequate documentation. It is often easier to procure resources for maintaining an information framework than to have people supply the documentation contained within it. Consequently, much of the knowledge in a typical open-source project is tacit and unwritten. This encourages learning by doing, which can be unnecessarily time consuming.

6.3 Summary

This chapter reviewed some key strengths and weaknesses of the open-source software development methodology.

Chapter 7 Conclusions

If you don't know where you are, a map won't help. Watts S. Humphrey

Objectives

• To discuss how the objectives of the thesis have been addressed

• To present potential directions for future work

• To summarize the thesis

7.1 Addressing the Objectives

The overall aim of the thesis has been to provide a descriptive process model for open-source software development. The intent was to achieve this aim by meeting several objectives.

As described earlier, the first objective was to survey the current literature relating to open-source software. This was done in Chapter 2, providing an introduction to the subject area, not just to open-source software development, but also to the more general concept of Open Source. The introduction included a brief history of free software, along with an overview of various licensing practices and definitions.

The second objective was to review a range of open-source projects, selecting several for more detailed investigation. This assessment is outlined in Appendix A.2. Roughly 50 projects were catalogued with regard to goals, licensing, community, history, and current status. Of these 50 projects, the 10 shown in Table 1 (p6) were shortlisted for further study.

The third objective was to compile additional information about the shortlisted projects, through both observational and historical study. Introductory profiles of each project were made available in Chapter 2. Additional information was accumulated through numerous published interviews with various core participants, as well as legacy data and several related studies. Projects were also monitored by passively subscribing to various newsgroups and mailing lists.

The fourth objective was to identify common characteristics across projects, consistent with state, organizational, and control views of the development process. Through a comparative analysis, 14 characteristics were recognized and broken out into different process views by asking the following questions: How is the work produced? How is the work organized? How is the work controlled?

The fifth objective was to discuss these characteristics with specific examples. Chapters 3, 4, and 5 addressed the state, organizational, and control views respectively. Characteristics were informally discussed in each chapter, with supporting illustrations from various projects. Common attributes were presented in the broader context of software engineering.

Lastly, the sixth objective was to critique the process model, both independently and through exposure to the open-source community. A review of the key strengths and weaknesses of the open-source development methodology was included in Chapter 6. A preliminary draft of the thesis was also distributed to select members of the open-source community. This included Alan Cox (Linux), Brian Behlendorf (Apache), Roy Fielding (Apache), Michael Johnson (Linux), David Lawrence (INN), Jason Robbins (Argo/UML), Guido van Rossum (Python), Erik Troan (Linux), and Paul Vixie (BIND).

7.2 Future Directions

Many software practitioners have shown considerable interest in adapting some of the techniques that have worked so well in projects such as Linux and Apache for use in other areas. This raises a few interesting questions, together with potential directions for future work.

For instance, the characteristics enumerated in this thesis can be taken to represent a broad description of the open-source software development process. However, to what extent are these characteristics dependent on each other? Must they be applied together, or can some be taken individually, perhaps in conjunction with more conventional approaches?

It has already been pointed out that incremental development is dependent on a highly modular architecture. Similarly, modular designs tend to mirror a decentralized organizational structure. The software research community has documented these dependencies, but others are less clearly defined.

A notable example is the concept of large-scale peer review. Peer review is fundamental to open-source development. Projects rely on the efforts of a large, diverse community for quality control. But when is a community large enough? Is there some minimum number of participants necessary for peer review to be effective in an open-source project? Must all users also be developers?

Some of these questions have been partly addressed in a discussion of bazaar size (Cavalier, 1998). A distinction is made between the effective and total size of a bazaar. Effective size refers to the number of participants able to contribute to a specific activity. This is in contrast to total size, or the total number of participants in a project.

The implications of bazaar size are significant because they introduce the possibility that some of the practices common to open-source development can be applied in closed environments. Would large-scale peer review work as well for a software company with an effective number of participants comparable to that of a smaller open-source project? This points to some potential, albeit controversial, commercial applications for the open-source model.

A further question is to what extent the development process depends on other aspects of the open-source approach. Licensing, for instance: does software need to be licensed as open source in order for a project to practice open-source development?

Tim O'Reilly (2000) suggests that "... it is possible to apply open source collaborative principles inside a large company, even without the intention to release the resulting software to the outside world."

Companies such as CollabNet are already exploring these issues. CollabNet leverages expertise in the open-source community to provide companies with collaborative development solutions. Bill Portelli (Williams S., 2000), CEO of CollabNet, explains: "A lot of what we've done is look at the practices and processes that have worked: how developers have worked together and how projects have managed to achieve the growth rates of a Linux or Apache."

Other efforts include "inner source" variants, in which some of the principles of open-source development are applied to other types of projects. One example is the concept of "gated communities," a form of shared development in which licensed developers are able to exchange modifications with other licensed developers, but not with the general public (Perens, 2000).

The work reported in this thesis could therefore be extended to provide a broader understanding of the open-source development process. Specifically: to what extent are common development practices dependent on each other and on other aspects of the open-source approach, and can they be applied selectively?

7.3 Thesis Summary

In describing the open-source software development process, the thesis identified 14 characteristics. These are reiterated below.

1. Prototyping is closed. An individual or small group develops an early version of the product. Upon release, this is used to present plausible promise and establish a conceptual design.

2. Enhancement is iterative and incremental. The prototype, or "build 0," is evolved incrementally through a series of regular iterations. Increments are small and iterations are frequent.

3. Development operates concurrently at many levels. Design, build, and testing are managed in parallel. Change is stabilized in stages.

4. Peer review is large-scale. Changes are subject to review by a diverse and highly motivated user base.

5. Requirements are strongly user-driven. Requirements are tacitly understood by developers who are themselves users of the product.

6. Collaboration is decentralized. Responsibility is pushed downward as a project scales, allowing many small projects to work like one big project. Integration is controlled.

7. Leadership is trusted. The control hierarchy is built up on a personal network of trust. Authority and responsibility move towards those who demonstrate the most competence.

8. Motivation is internal. External motivators such as financial compensation are secondary. People contribute for reasons such as opportunity, community, and status.

9. Communication is asynchronous. Geographical distribution makes synchronous communication impractical. Most communication is through electronic mail.

10. Planning is informal. There are no concrete plans or visions. The only long-term goal is to improve the product.

11. Participation is tiered. Participants work at different levels, reflecting a natural gradient of competence and commitment.

12. Architectures are designed for modularity. Modular designs reduce interdependencies, allowing development to proceed more cleanly.

13. Tool support is ubiquitous. Tools are freely available and reasonably consistent across projects, lowering the entry barrier for participation.

14. Information space is shared. Web sites provide easy access to information resources such as user documentation, discussion forums, and problem report databases.

For discussion, these characteristics were broken out into state, organizational, and control views. Each view was intended to provide a different, but complementary perspective of the development process. Key strengths and weaknesses of the process were also discussed, both actual and potential.

Reviewing this work, several important points should be noted. First is that very few of these characteristics are necessarily unique to open-source development. For instance, in software and other industries, there are now many companies that use prototyping as well as multiple cycles of concurrent design, build, and testing. Similarly, topics such as peer review and democratic team organization are not new. 114

What makes open-source development interesting is how these common principles can be applied together to produce high-quality software in a rapidly changing environment, namely the Internet. Projects such as Linux, Apache, and many others have consistently demonstrated that it is possible to produce reliable software quickly using an adaptable, distributed approach.

Looking back at The Mythical Man-Month 20 years later, Brooks (1995) (p281) observes that "the microcomputer revolution has changed how everybody builds software." The software processes of the 1970s have been altered by this revolution, together with the technological advances that enabled it. As a result, many of the difficulties associated with these older processes have been eliminated.

With this in mind, today it can likewise be stated that the Internet is again changing how software is built. The Internet is pervasive, forcing a re-examination of not just conventional business concepts, but also of how software is developed. The Internet inherently supports globally distributed product development, and it is becoming increasingly recognized as a way of improving time to market.

For software engineering, lightweight, agile processes are being used more and more in an attempt to produce better software faster. These approaches are based on an adaptive lifecycle similar to evolutionary or spiral lifecycles, together with a management model that emphasizes leadership, collaboration, and accountability, rather than command and control.

There has been a move away from processes that are hierarchical and management driven. The trend is toward cooperative styles of development where management dictate is replaced by ethical considerations of community membership. In many ways, open source has pioneered this approach.

Lightweight methodologies are more adaptive than predictive, and are people-oriented rather than process-oriented. Agile methods attempt a useful compromise between too much and not enough process (Fowler, 2001). As Booch (2001) (p121) observes: "The CMM makes us wonder what else shall we add? Whereas the light methodologists are always asking what else can we take away?"

In some ways, open-source development scarcely even qualifies as a methodology. An implicit informality and strong social emphasis make it somewhat unpredictable. Larry Wall (1999b) (p127), the creator of Perl, says: "I won't tell you everything about how Open Source works; that would be like trying to explain why English works."

However, as Bollinger et al. (1999) (p9) note: "If nothing else, it acts as a sort of Occam's Razor56 for the rest of software engineering. Instead of asking, 'How many more controls will this project need before it becomes predictable?' the Open Source Razor demands that a new question be asked: 'Can you justify adding a new control, method, or metric to the process when open-source methods already work fine without it?' "

This is not to say that open-source development is a perfect solution. Contrary to some widespread misconceptions about its application, open source is not meant to best all existing forms of software development.

In comparing open source with other approaches it is helpful to visualize a spectrum, with process intensive methodologies such as the Capability Maturity Model at one extreme, and lightweight methodologies such as open source at the other. Approaches that incorporate control with some flexibility, like Rational's Unified Process, can be positioned in the middle. Of course, what the spectrum illustrates is that no one solution is made to fit all problems. In some cases the CMM might be optimal. Software for air-traffic control or a nuclear power plant would undoubtedly be good candidates for a development team relying on the CMM. However in other cases, process is less important. Software development is fundamentally a human activity, and lightweight methodologies emphasize this.

The irony is that, for an approach that is commonly thought of as somewhat contrary, open source shares many similarities with more traditional forms of software development. Again, very few of the principles that characterize open source can be considered unique. Instead, it is the way in which different aspects of the process have been emphasized to solve certain types of problems. Open source happens to work very well for infrastructure projects where participants have an in-depth knowledge of the application domain. Moreover, the inherent adaptability of the development model continues to surprise critics as it tests the boundaries of this problem space.

56 Occam's Razor is a principle attributed to the 14th century logician and Franciscan friar William of Occam. It emphasizes parsimony, stating that "Entities should not be multiplied unnecessarily" or "Pluralitas non est ponenda sine necessitate."

Yet open-source development is by no means a magic bullet. Linus Torvalds (Scannell, 1999) remarks: "People think just because it is open-source, the result is going to be automatically better. Not true. You have to lead it in the right directions to succeed. Open source is not the answer to world hunger."

Rather, open source represents an alternative approach to software development that has evolved within the Internet itself, offering useful information about common problems as well as some possible solutions. As Tim O'Reilly (Booch, 2001) (p121) observes: "Open source software is important because it represents the first stage of something that will eventually become far more widespread: shared projects carried out over the Net by people who are geographically and organizationally independent."

The work presented in this thesis provides a consistent framework for analysis and discussion, ideally describing a set of principles that define the absolute minimal process by which a distributed group of people can produce high-quality software. In this regard, Open Source mirrors contemporary trends in software engineering by emphasizing social factors and lightweight methods as a way of developing software.

Bibliography

ABCNews.com. (1999). Linus Torvalds: Leader of the Revolution [online]. Available from: http://www.abcnews.go.com/sections/tech/DailyNews/LinusChat990505. [Accessed 11 March 2000]. Linus Torvalds covers a range of topics in this online chat, including his future involvement in Linux development and competition with Microsoft.

Abreu, E.M. (1999). Interview: Torvalds on Linux directions, open source. IDG News [online]. Available from: http://www.sunworld.com/sunworldonline/swol-06-1999/f_swol-06-torvalds_p.html [Accessed 17 April 2000]. Linus Torvalds comments on Linux 2.4, the Linux Standard Base, and vendor support.

ActiveState. (1999). ActiveState deepens Open Source Perl relationship with Microsoft. ActiveState Press Release. 1 June 1999. As part of a 3 year support contract, ActiveState agrees to add features missing from previous Windows ports of Perl.

Anderson, M. (1999). Number two - with a bullet. Ottawa Citizen Online [online]. Available from: http://www.ottawacitizen.com/hightech/990726/2648940.html [Accessed 17 April 2000]. An interview with Alan Cox, in which he discusses his role in the Linux community.

Aoyama, M. (1998). Agile Software Process and Its Experience. Proceedings of the 20th International Conference on Software Engineering. IEEE Computer Society, 3-12. Proposes the Agile Software Process model (ASP), and discusses its experience in large- scale software development.

Apache. (2000). The Apache Software Foundation [online]. Available from: http://www.apache.org/ [Accessed 3 May 2001]. A brief introduction to The Apache Software Foundation.

Apple Computer. (1999). Apple Public Source License [online]. Available from: http://www.publicsource.apple.com/apsl/ [Accessed 21 April 1999]. Text of the Apple Public Source License, which is intended to support Apple's open development efforts, including Darwin, the core of Mac OS X.

Barahona, J.G., Quiros, P.H., and Bollinger, T. (1999). A Brief History of Free Software and Open Source. IEEE Software, 16(1), 32-33. A free software timeline spanning 1969 to 1998.

Barr, D. (1996). Majordomo FAQ [online]. Available from: http://www.greatcircle.com/majordomo/majordomo-faq.html [Accessed 17 October 2000]. Frequently asked questions about the Majordomo list manager.

Basili, V.R. and Turner, A.J. (1975). Iterative Enhancement: A Practical Technique for Software Development. IEEE Transactions on Software Engineering, December 1975, 390-396. Recommends iterative enhancement as a top-down approach to software development.

Beaver, D., Buckendorff, J., Edelman, A., Mace, T., Vaamonde, C., and Yan, A. (2000). Amazon.com Interview: Larry Wall [online]. Available from: http://www.amazon.com/exec/obidos/ts/feature/7137/104-2330356-5587641 [Accessed 17 April 2000]. Larry Wall discusses various technical issues relating to Perl, as well as the philosophy behind the language.

Behlendorf, B. (1999). Open Source as a Business Strategy. In: DiBona, C., Ockman, S., and Stone, M. eds. Open Sources: Voices from the Open Source Revolution. O'Reilly and Associates, 149-170. Brian Behlendorf discusses Open Source and its potential as a reliable model for conducting commercial software development.

Boehm, B.W. (1981). Software Engineering Economics. Prentice Hall. A classic guide to economic analysis in software development.

Boehm, B.W. (1987). Improving Software Productivity. IEEE Computer, 20(9), 43-57. A discussion of factors affecting software productivity and the need to improve it.

Boehm, B.W. (1988). A Spiral Model of Software Development and Enhancement. IEEE Computer, 21(5), 61-72. Describes the spiral model, an alternative software life cycle emphasizing risk management.

Bollinger, T., Nelson, R., Self, K.M., and Turnbull, S.J. (1999). Open-Source Methods: Peering Through the Clutter. IEEE Software, 16(4). A rebuttal to McConnell (1999).

Booch, G. (2001). Software Solutions. Communications of the ACM, 44(3). A discussion about the future of software engineering.

bootNet.com. (1999). Linux Manifesto [online]. Available from: http://www.bootnet.com/youaskedforit/lip_linux_manifesto.html [Accessed 11 March 2000]. A comprehensive interview with Linus Torvalds, in which he discusses the origins, motivation, and development style of Linux.

Bowman, I.T., Holt, R.C., and Brewster, N.V. (1999). Linux as a Case Study: Its Extracted Software Architecture. In: Proceedings of the 21st International Conference on Software Engineering. IEEE Computer Society, 555-563. A study of Linux kernel architecture.

Brooks, F.P., Jr. (1995). The Mythical Man-Month: 20th Anniversary Edition. Addison-Wesley. Original chapters, accompanied by a summary and retrospective.

Cavalier, F.J. (1998). Some Implications of Bazaar Size [online]. Available from: http://www.mibsoftware.com/bazdev/ [Accessed 7 April 2001]. Discusses the difference between total size and the effective working size of a bazaar.

Cederquist, P. (1993). Version Management with CVS. User manual for the Concurrent Versioning System.

Charles, J. (1998). Open Source: Netscape Pops the Hood. IEEE Software, 15(4), 79-81. Discusses Netscape's decision to release the source code to Communicator.

Chen, L. and Gaines, R. (1997). Modeling and Supporting Virtual Cooperative Interaction Through the Web. In: F. Sudweeks, M. McLaughlin, and S. Rafaeli eds. Network and Netplay: Virtual Groups on the Internet. Cambridge, MA: AAAI/MIT Press. Presents a conceptual model for knowledge acquisition over the Web.

Coar, K. (1999). Apache and Open-Source Development. The Bazaar, 14-16 December 1999. Covers various topics, including mailing lists, source control, voting, and the release cycle.

Comer, E.R. (1991). Alternative Software Life Cycle Models. Aerospace Software Engineering: A Collection of Concepts. American Institute of Aeronautics. Contrasts approaches such as rapid prototyping, incremental development, evolutionary prototyping, reuse, and automated software synthesis with conventional life cycle models.

Conway, M.E. (1968). How do committees invent? Datamation, 14(4), 28-31. Introduces Conway's hypothesis, which states that the organization of a software system will be congruent to the organization of the group that designed the system.

Cooke, D., Urban, J., and Hamilton, S. (1999). Unix and Beyond: An Interview with Ken Thompson. IEEE Computer, 32(5), 58-64. One of the co-creators of Unix briefly comments on Linux.

Cox, A. (1998). Cathedrals, Bazaars and the Town Council. Slashdot [online]. Available from: http://slashdot.org/features/98/10/13/1423253.shtml [Accessed 17 April 2000]. Alan Cox shares his thoughts on the Bazaar model, including an example dubbed the "Town Council" effect.

Cox, A. 5 January 2000. Re: Linux 2.4 before 2001? Linux Kernel Mailing List [online]. Available from: [email protected] [Accessed 4 Feb 2001]. Alan Cox comments on release dates for the 2.4 kernel release.

Cox, A. ([email protected]), 11 June 2001. Re: graduate thesis on open-source software development. E-mail to K. Johnson ([email protected]).

Curtis, B. (1988). A Field Study of the Software Design Process for Large Systems. Communications of the ACM, 31(11), 1268-1287. Describes the design problems associated with 17 large software projects.

Curtis, B., Kellner, M.I., and Over, J. (1992). Process Modeling. Communications of the ACM, 35(9), 75-90. An introduction to software process modeling.

Davis, A.M. and Sitaram, P. (1994). A Concurrent Process Model for Software Development. Software Engineering Notes, 19(2), 38-51. Presents a model capturing concurrency among activities in software development.

DeMarco, T. and Lister, T. (1999). Peopleware, 2nd Edition. Dorset House. Discusses social aspects of software development and project management.

Dempsey, B.J., Weiss, D., Jones, P. and Greenberg, J. (1999). A Quantitative Profile of a Community of Open Source Linux Developers. School of Information and Library Science, University of North Carolina at Chapel Hill. A quantitative study of the UNC Metalab Linux Archives.

Dougherty, D. (1997). The Rebels of Perl. Web Review [online], 28 February 1997. Available from: http://webreview.com/wr/pub/97/02/28/feature/perl.html [Accessed 17 April 2000]. Larry Wall and Tom Christiansen comment on a variety of topics, including the origin, philosophy, and culture of Perl.

Dougherty, D. (1998a). The Origins of Free and Open Source Software. Web Review [online], 10 April 1998. Available from: http://webreview.com/wr/pub/freeware/origins.html [Accessed 13 Nov 1998]. Outlines the different meanings of freeware, free software, shareware, public domain, and open-source software. Also includes a timeline spanning 1969 to 1998.

Dougherty, D. (1998b). Python's Guido van Rossum. Web Review [online], 10 April 1998. Available from: http://webreview.com/wr/pub/freeware/vanrossum.html [Accessed 13 Nov 1998]. A short interview with Guido van Rossum, discussing the growth and future of Python.

Dougherty, D. (1998c). Apache's Brian Behlendorf. Web Review [online], 10 April 1998. Available from: http://webreview.com/wr/pub/freeware/behlendorf.html [Accessed 13 Nov 1998]. A short interview with Brian Behlendorf, discussing the origins of Apache.

Eich, B. (2001). Mozilla development roadmap. [online]. Available from: http://www.mozilla.org/roadmap.html [Accessed 6 Feb 2001]. Describes where the Mozilla project has been and where it is going.

Eunice, J. (1998). Beyond the Cathedral, Beyond the Bazaar [online]. Available from: http://www.illuminata.com/public/all/catalog.cgi/cathedral [Accessed 21 April 1999]. An early critique of The Cathedral and the Bazaar.

Fielding, R.T. (1998). The Apache HTTP Server Project: Lessons Learned from Collaborative Software Development [online]. Available from: http://www.ics.uci.edu/~fielding/talks/apache98/index.htm [Accessed 28 Feb 1999]. A slide presentation outlining how Apache development has changed since the project's inception.

Fielding, R.T., and Kaiser, G. (1997). The Apache HTTP server project. IEEE Internet Computing, 1(4), 88-90. Discusses large-scale collaboration over the Internet with specific examples from Apache.

Fielding, R., Whitehead, J., Anderson, K., Bolcer, G.A., Oreizy, P., and Taylor, R.N. (1998). Software Engineering and the WWW: The Cobbler's Barefoot Children, Revisited [online]. Department of Information & Computer Science. The University of California, Irvine. Available from: http://gbolcer.ics.uci.edu/papers/cobbler.html [Accessed 1 Dec 1998]. A discussion of software engineering tool support.

Fogel, K. (1999). Open Source Development with CVS. The Coriolis Group. Chapters alternating between CVS and open-source software development.

Fowler, M. (2001). The New Methodology [online]. Available from: http://www.martinfowler.com/articles/newMethodology.html [Accessed 8 April 2001]. An introduction to agile, or lightweight, methodologies.

Freedman, D.P. and Weinberg, G.M. (1990). Handbook of Walkthroughs, Inspections and Technical Reviews, 3rd Edition. Dorset House. Procedures and checklists for conducting reviews.

The Free Software Foundation. (1998a). Overview of the GNU Project [online]. Available from: http://www.fsf.org/gnu/gnu-history.html [Accessed 5 April 1999]. A brief overview of the GNU Project.

The Free Software Foundation. (1998b). Categories of Free and Non-Free Software [online]. Available from: http://www.fsf.org/philosophy/categories.html [Accessed 18 April 1999]. Describes various categories of free and non-free software, including freeware, shareware, public domain software, and Open Source software.

The Free Software Foundation. (1998c). What is the Free Software Foundation [online]. Available from: http://www.fsf.org/fsf/fsf.html [Accessed 5 April 1999]. A brief introduction to the FSF.

The Free Software Foundation. (1998d). Why "Free Software" is better than "Open Source" [online]. Available from: http://www.fsf.org/philosophy/free-software-for-freedom.html [Accessed 28 June 2001]. Explains the differences between free software and open source.

The Free Software Foundation. (1999a). What is Free Software? [online]. Available from: http://www.fsf.org/philosophy/free-sw.html [Accessed 18 April 1999]. A definition of free software.

The Free Software Foundation. (1999b). What is Copyleft? [online]. Available from: http://www.gnu.org/copyleft/copyleft.html [Accessed 19 April 1999]. Describes the concept of copyleft.

Freshmeat.net. (1999). Freshmeat appindex [online]. Available from: http://freshmeat.net/appindex/ [Accessed 21 April 1999]. Comprehensive index of free software.

Ghosh, R.A. (1998). Interview with Linus Torvalds: What motivates free software developers? First Monday [online]. Available from: http://www.firstmonday.org/issues/issue3_3/torvalds/ [Accessed 12 June 2000]. Linus Torvalds talks about motivation in open-source development.

Gillmor, D. (2000). He wrote the book on Linux revolution and added chapters on motivating legions. SiliconValley.com [online], 26 February 2000. Available from: http://www.mercurycenter.com/svtech/columns/gillmor/leaders/torvalds.htm [Accessed 11 March 2000]. An interview with Linus Torvalds, in which he discusses leadership in Linux kernel development.

Godfrey, M. and Tu, Q. (2000). Evolution in Open Source Software: A Case Study. University of Waterloo. Summarizes some preliminary investigations into the evolution of the Linux kernel.

Gooch, R.E. et al. (2001). The linux-kernel mailing list FAQ [online]. Available from: http://www.tux.org/lkml/ [Accessed 22 April 2001]. Frequently asked questions about the linux-kernel mailing list.

Goodman, A., Welsh, M. and Gomes, L. (1999). The Linux Interview: It's Linus' World, We Just Live In It. Linux Magazine [online], May 1999. Available from: http://www.linux-mag.com/ [Accessed 4 April 2000]. Linus Torvalds discusses adding features, timing releases, and continuity.

De Goyeneche, J.M. and de Sousa, E.A.F. (1999). Loadable Kernel Modules. IEEE Software, 16(1), 65-71. An explanation of how the Linux kernel uses dynamically loadable modules.

Hamerly, J., Paquin, T., and Walton, S. (1999). Free the Source: The Story of Mozilla. In: DiBona, C., Ockman, S., and Stone, M. eds. Open Sources: Voices from the Open Source Revolution. O'Reilly and Associates, 197-206. Recalls events surrounding Netscape's decision to release the source code for Communicator.

Hars, A. and Ou, S. (2001). Working for Free? Motivations of Participating in Open-Source Projects. Proceedings of the 34th Hawaii International Conference on System Sciences. A study of the motivational factors driving participation in open-source projects.

Hecker, F. (1998). Setting Up Shop: The Business of Open-Source Software [online]. Available from: http://people.netscape.com/hecker/setting-up-shop.html [Accessed 18 Sept 1998]. The whitepaper that led to Netscape's Open Source announcement in 1998.

Hekmatpour, S. (1987). Experience with evolutionary prototyping in a large software project. Software Engineering Notes, 12(1). Takes a broad look at different prototyping approaches, and then describes a large project that applied evolutionary prototyping.

Hermann, S., Hertel, G. and Niedner, S. (2000). Linux Study Homepage [online]. Available from: http://www.psychologie.uni-kiel.de/linux-study/writeup.html [Accessed 15 Oct 2000]. A survey of Linux kernel developers.

Hietaniemi, J. (2000). The Perl history records [online]. Available from: http://www.perl.com/pub/doc/manual/html/pod/perlhist.html [Accessed 28 June 2000]. Historical information related to Perl development, including selected patch and release sizes.

Humphrey, W.S. (1989). Managing the Software Process. Addison-Wesley. Discusses software management through the Capability Maturity Model, or CMM.

IBM. (1998). IBM Enhances and Expands WebSphere Product Line in Collaboration with Apache and NetObjects. IBM Press Release, 22 June 1998. Announcement regarding plans to include the Apache Web server as part of the WebSphere product line.

IBM. (2000). IBM Public License [online]. Available from: http://oss.software.ibm.com/developerworks/opensource/license10.html [Accessed 21 August 2000]. Text of the IBM Public License.

KDE. (2000). KDE Response to GNOME Foundation [online]. Available from: http://www.kde.org/announcements/gfresponse.html [Accessed 9 September 2000]. KDE responds to the Sun/HP decision to use GNOME as their standard desktop.

KDE. (2001). KDE Project Management [online]. Available from: http://www.kde.org/whatiskde/management.html [Accessed 3 May 2001]. A brief introduction to how the KDE project is run.

Koch, S. and Schneider, G. (2000). Results from Software Engineering Research into Open Source Development Projects Using Public Data. Vienna University of Economics and BA. Quantitative study of GNOME.

Kraut, R. and Streeter, L. (1995). Coordination in Software Development. Communications of the ACM, 38(3), 69-81. Compares the use and value of various coordination techniques in software development.

Kuniavsky, M. (2000). It's the User, Stupid. Sendmail.net [online], 20 January 2000. Available from: http://sendmail.net/?feed=interviewkuniavsky [Accessed 11 March 2000]. An opinion paper discussing lack of user interface innovation in open-source development.

Lash, A. (1998). Source code for the masses. CNET News.com [online], 2 February 1998. Available from: http://www.news.com/SpecialFeatures/0,5,18652,00.html [Accessed 27 July 1998]. A brief introduction to free software.

Lawrence, D.C. (1998). InterNetNews Server: Inside an Open-Source Project. Internet Computing, 6(15). David Lawrence discusses the INN project.

Lehman, M.M. (1987). Process models, process programs, programming support. Proceedings of the 9th International Conference on Software Engineering. IEEE Computer Society, 14-16. Response to Osterweil, 1987, questioning the feasibility of using programming to model processes.

Levy, S. (1984). Hackers: Heroes of the Computer Revolution. Anchor Press/Doubleday. The classic underground history of the computer age.

Linux Journal. (1998). What is Linux? Linux Journal [online]. Available from: http://www.ssc.com/linux/what.html [Accessed 9 September 1998]. A brief definition of Linux.

LinuxPower. (1999). Jwz Explains It All [online]. Available from: http://linuxpower.org/display_item.phtml?id=197 [Accessed 11 March 2000]. Jamie Zawinski comments on mozilla, including his role, future plans, and Netscape.

Linux Weekly News. (1999). LWN interviews Alan Cox. Linux Weekly News [online]. Available from: http://lwn.net/1999/features/ACInterview/ [Accessed 11 March 2000]. Alan Cox discusses Linux kernel development, commenting on his increasingly organizational role.

LinuxWorld. (1999a). Open Source Methodologies. LinuxWorld Conference & Expo, 9-12 August, 1999. Linus Torvalds, Brian Behlendorf, Jeremy Allison, Jordan Hubbard, and Dirk Hohndel talk about the organization and processes behind their projects.

LinuxWorld. (1999b). Inside Mozilla.org. LinuxWorld Conference & Expo, 9-12 August, 1999. Eric Krock, Rick Gessner, and Dan Mosedale discuss the tools and processes used in mozilla development.

Lourier, P. (1999). Interview with Eric Allman. 1999 USENIX Technical Conference, 6-11 June, 1999. Eric Allman talks about the history of Sendmail.

Mantei, M. (1981). The Effect of Programming Team Structures on Programming Tasks. Communications of the ACM, 24(3), 106-113. Outlines three generic team structures for software development, including recommendations for when and how they should be applied.

Maurer, F. and Kaiser, G. (1998). Software Engineering in the Internet Age. IEEE Internet Computing, 2(5). Editorial discussing how the Internet is influencing today's software development activities.

McConnell, S. (1996). Rapid Development. Microsoft Press. Covers best practices in rapid development and software engineering.

McConnell, S. (1999). Open-Source Methodology: Ready for Prime Time? IEEE Software, 16(4). A critical look at open-source software development.

McHugh, J. (1998). For the Love of Hacking. Forbes, 10 August 1998. Introduces Open Source to a business audience.

McHugh, J. (1999). Open Sourcery. Forbes, 3 May 1999. Discusses the apparent commercial failure of Mozilla.

McLay, M. 29 June 1994. If Guido was hit by a bus? Python Mailing List [online]. Available from: [email protected] [Accessed 11 February 2001]. Infamous thread discussing the dependence of Python on Guido van Rossum.

McMillan, R. (1999). Kernel Driver. Linux Magazine [online], September 1999. Available from: http://www.linux-mag.com/ [Accessed 17 April 2000]. An interview with Alan Cox discussing his involvement in Linux kernel development.

Miller, B.P., Koski, D., Lee, C.P., Maganty, V., Murthy, R., Natarajan, A., and Steidl, J. (1995). Fuzz Revisited: A Re-examination of the Reliability of UNIX Utilities and Services. The University of Wisconsin. Follow-up to a previous study conducted in 1990, now including free software.

Mills, H.D. (1971). Top-down programming in large systems. In: R. Rustin ed. Debugging Techniques in Large Systems. Prentice-Hall. An early discussion of top-down design.

Mockus, A., Fielding, R.T., and Herbsleb, J. (2000). A Case Study of Open Source Software Development: The Apache Server. Proceedings of the 22nd International Conference on Software Engineering. IEEE Computer Society, 263-272. A quantitative study of Apache development.

Moody, G. (1997). The Greatest OS that (N)ever Was. Wired [online], 5 August 1997. Available from: http://www.wired.com/wired/5.08/linux.html [Accessed 10 Feb 1999]. An overview of the origin and evolution of Linux.

mozilla.org. (2000). mozilla.org statistics [online]. Available from: http://webtools.mozilla.org/miscstats/ [Accessed 30 July 2000]. Various statistics for work done by Mozilla contributors other than Netscape.

Netcraft. (2000). Web Server Survey [online]. Available from: http://www.netcraft.com/survey/ [Accessed 15 May 2000]. A survey of Web server software usage on Internet connected computers.

O'Brian, R. (1999). High-Tech Rally Boosts Market: Red Hat Soars in Trading Debut. Wall Street Journal. 12 August 1999. Recaps the Red Hat IPO.

Open Source Initiative. (1999a). Frequently Asked Questions [online]. Available from: http://www.opensource.org/advocacy/faq.html [Accessed 01 April 2001]. Some frequently asked questions about open-source software.

Open Source Initiative. (1999b). History of the OSI [online]. Available from: http://www.opensource.org/docs/history.html [Accessed 01 April 2001]. A brief history of the Open Source Initiative, including how the Open Source label was developed.

Open Source Initiative. (1999c). The Open Source Definition, Version 1.4 [online]. Available from: http://www.opensource.org/osd.html [Accessed 20 April 1999]. Text of the Open Source Definition, intended as a specification of what is permissible in a software license for that software to be considered Open Source.

O'Reilly, T. (1999). Lessons from Open-Source Software Development. Communications of the ACM, 42(4), 33-37. An overview of open-source software development, including lessons learned.

O'Reilly, T. (2000). Open Source: The Model for Collaboration in the Age of the Internet [online]. Available from: http://www.oreillynet.com/pub/a/network/2000/04/13/CFPkeynote.html [Accessed 3 May 2001]. Keynote discussing why open source is central to the success of the Internet.

Osier, J.M. and Kehoe, B. (1996). Keeping Track: Managing Messages with GNATS [online]. Available from: http://sourceware.cygnus.com/gnats/gnats_toc.html [Accessed 31 May 1999]. User manual for the GNU Problem Report Management System.

Osterweil, L. (1987). Software processes are software too. Proceedings of the 9th International Conference on Software Engineering. IEEE Computer Society, 2-13. Suggests that processes can be modeled through programming.

Parekh, S. (1999). Introduction to programming for the Apache API [online]. Available from: http://modules.apache.org/doc/Intro_API_Prog.html [Accessed 9 Oct 2000]. Discusses the module API for the Apache server.

Parnas, D.L. (1972). On the Criteria To Be Used in Decomposing Systems Into Modules. Communications of the ACM, 15(12), 1053-1058. Classic paper advocating decomposition and modularization through design decisions.

Patrizio, A. (1998). Communicator Code Gets Mixed Developer Reaction. TechWeb [online], 3 April 1998. Available from: http://www.techweb.com/wire/story/TWB19980403S0024 [Accessed 5 September 1998]. A short article discussing the initial reaction from developers to Mozilla.

Perens, B. (1999). The Open Source Definition. In: DiBona, C., Ockman, S., and Stone, M. eds. Open Sources: Voices from the Open Source Revolution. O'Reilly and Associates, 171-188. An analysis of the Open Source Definition together with various licensing practices.

Perens, B. (2000). Gated Communities [online]. Available from: http://www.linux.com/news/columns.phtml?aid=9939 [Accessed 8 April 2001]. Discusses gated communities and their relationship to open source.

perl.com. (2000). The "Artistic License" [online]. Available from: http://www.perl.com/pub/language/misc/Artistic.html [Accessed 9 Oct 2000]. Text of the Artistic License.

Pressman, R. (1997). Software Engineering: A Practitioner's Approach, 4th Edition. McGraw-Hill. A comprehensive introduction to software engineering.

python.org. (2000). Stats for comp.lang.python [online]. Available from: http://starship.python.net/~just/FindMailStats/ [Accessed 3 March 2001]. Summary of traffic on the Python mailing list.

Raymond, E.S. (1996a). The New Hacker's Dictionary. MIT Press. Definitions for technical jargon and slang.

Raymond, E.S. (1996b). A Brief History of Hackerdom [online]. Available from: http://www.tuxedo.org/~esr/faqs/hacker-hist.html [Accessed 5 April 1999]. An outline of the history of the hacker culture.

Raymond, E.S. (1998a). The Cathedral and the Bazaar [online]. Available from: http://sagan.earthspace.net/~esr/writings/cathedral-bazaar/cathedral-bazaar.html [Accessed 18 Sept 1998]. The canon of the open-source movement.

Raymond, E.S. (1998b). Homesteading the Noosphere [online]. Available from: http://sagan.earthspace.net/~esr/writings/homesteading/homesteading.html [Accessed 18 Sept 1998]. Follow-up to The Cathedral and the Bazaar, analyzing the property and ownership customs of the open-source culture.

Raymond, E.S. 12 August 2000. Re: Fwd: Closed-door development. Linux Kernel Mailing List [online]. Available from: [email protected] [Accessed 10 October 2000]. Response to a suggestion that open-source development is in fact closed.

Ritchie, D.M. (1979). The Evolution of the Unix Time-Sharing System [online]. Available from: http://cm.bell-labs.com/cm/cs/who/dmr/hist.html [Accessed 4 April 1999]. A brief history of Unix authored by one of the co-creators.

Robbins, J.E. 4 March 1999. Cooperation between FreeCASE and Argo/UML. FreeCASE Mailing List [online]. Available from: [email protected] [Accessed 29 July 2000]. Thread discussing the possibility of merging FreeCASE and Argo/UML.

Royce, W. W. (1970). Managing the Development of Large Software Systems: Concepts and Techniques. WESCON Technical Papers, Vol. 14. Also available in: Proceedings of the 9th International Conference on Software Engineering. Classic paper presenting the waterfall lifecycle model.

Russell, P. (1999). Anatomy of a Patch. Linux Magazine, September 1999. Traces the discovery and resolution of a network bug affecting the Linux kernel.

Salzenberg, C. (1999). Topaz: Perl for the 22nd Century [online]. Available from: http://www.perl.com/pub/1999/09/topaz.html [Accessed 25 March 2000]. Chip Salzenberg describes his work on Topaz, a project to completely rewrite the internals of Perl in C++.

Scannell, E. (1999). Linus Torvalds says open source not a guarantee of success. InfoWorld.com [online], 6 October 1999. Available from: http://www.infoworld.com/cgi-bin/displayStory.pl?99106.pitorvalds.htm [Accessed 17 April 2000]. Highlights from question and answer session with Linus Torvalds at Internet World Expo.

Schaller, C. (2000). Mozilla and Linux, the Road Ahead. LinuxPower [online], 7 March 2000. Available from: http://www.linuxpower.org/display.php?id=168 [Accessed 11 March 2000]. An interview with Christopher Blizzard, in which he discusses the Mozilla project.

Shankland, S. (1998). Linux shipments up 212 percent. CNET News.com [online], 16 February 1998. Available from: http://news.cnet.com/news/0-1003-200-336510.html?tag=st.ne.ni.rnbot.rn.ni [Accessed 21 April 1999]. Discusses the market share of Linux.

Stallman, R.M. (1993). The GNU Manifesto [online]. Available from: http://www.fsf.org/gnu/manifesto.html [Accessed 18 September 1998]. Background to the GNU Project.

Stallman, R.M. (1998). Linux and the GNU Project [online]. Available from: http://www.fsf.org/gnu/linux-and-gnu.html [Accessed 8 April 2001]. A discussion of the term "Linux" and its relationship to the GNU Project.

Stallman, R.M. (1999). Why "Free Software" is better than "Open Source" [online]. Available from: http://www.fsf.org/philosophy/free-software-for-freedom.html [Accessed 5 April 1999]. A discussion of the differences between the terms "free software" and "open source."

Stallman, R.M. (2000). GNU Coding Standards [online]. Available from: http://www.gnu.org/prep/standards_toc.html [Accessed 3 February 2001]. Style guide for GNU.

Sun Microsystems. (2000). Sun Community Source Licensing [online]. Available from: http://www.sun.com/software/communitysource/index.html [Accessed 30 September, 2000]. Information on the Sun Community Source License.

Tamiya, M. (1999). ChangeLog interviews Jeremy Allison. Linux Weekly News [online]. Available from: http://lwn.net/2000/features/lc99/jeremy.phtml [Accessed 11 March 2000]. Jeremy Allison discusses his involvement with Samba, free software vs. Open Source, and the Mindcraft benchmarks.

Torvalds, L. 31 July 1992. Linux's History [online]. Available from: http://www.li.org/li/linuxhistory.shtml [Accessed 9 September 1998]. Linus Torvalds recalls early Linux kernel development.

Torvalds, L. 29 September 1998. Re: 2.1.123 and fbcon.c. Linux Kernel Mailing List [online]. Available from: [email protected] [Accessed 11 February 2001]. The infamous "Linus burnout episode."

Torvalds, L. (1999). The Linux Edge. In: DiBona, C., Ockman, S., and Stone, M. eds. Open Sources: Voices from the Open Source Revolution. O'Reilly and Associates, 101-111. Linus Torvalds discusses Linux kernel design.

Torvalds, L. 18 March 2000. Re: 2.3.51 tulip broken. Linux Kernel Mailing List [online]. Available from: [email protected] [Accessed 25 March 2000]. Linus Torvalds responds to comments regarding Donald Becker's approach to maintaining Linux ethernet drivers.

Tridgell, A. (2000). Samba-TNG fork [online]. Available from: http://www.samba.org/samba/tng.html [Accessed 18 June 2001].

Valloppillil, V. (1998a). Open Source Software: A New(?) Development Methodology [online]. Available from: http://www.opensource.org/halloween1.html [Accessed 6 November 1998]. An infamous competitive analysis of the open-source software development methodology, which originated as an internal Microsoft document.

Valloppillil, V. (1998b). Linux OS Competitive Analysis: The Next Java VM? [online]. Available from: http://www.opensource.org/halloween2.html [Accessed 6 November 1998]. A second document dealing specifically with Linux.

van der Hoek, A. (2000). Configuration Management and Open Source Projects. 3rd Workshop on Software Engineering over the Internet, ICSE 2000. Examines why CVS plays such a prominent role in open-source development.

van Rossum, G. (1998). O'Reilly's "Open Source" summit. Linux Weekly News [online]. Available from: http://lwn.net/lwn/980416/a/guido-oss.html [Accessed 11 March 2000]. Guido van Rossum summarizes O'Reilly's Open Source Summit and press conference.

Vixie, P. (1999). Software Engineering. In: DiBona, C., Ockman, S., and Stone, M. eds. Open Sources: Voices from the Open Source Revolution. O'Reilly and Associates, 91-100. Paul Vixie of BIND discusses various aspects of open-source development as they relate to software engineering.

Vosburg, J., Curtis, B., Wolverton, R., Albert, B., Malec, M., Hoben, S., and Liu, Y. (1984). Productivity Factors and Programming Environments. Proceedings of the 7th International Conference on Software Engineering. IEEE Computer Society, 143-152. A study of large-scale software productivity.

Wall, L. (1999a). The Origin of the Camel Lot in the Breakdown of the Bilingual Unix. Communications of the ACM, 42(4), 40-41. Larry Wall explains the motivation and culture behind Perl.

Wall, L. (1999b). Diligence, Patience, and Humility. In: DiBona, C., Ockman, S., and Stone, M. eds. Open Sources: Voices from the Open Source Revolution. O'Reilly and Associates, 127-147. A broad discussion of language and Perl.

Weinberg, G.M. (1998). The Psychology of Computer Programming: Silver Anniversary Edition. Dorset House Publishing. Landmark book describing software development as a social activity.

Williams, R. (1999). Linux Kernel Version History [online]. Available from: http://ps.cus.umist.ac.uk/~rhw/kernel.versions.html [Accessed 9 February 1999]. A comprehensive listing of all kernel builds.

Williams, R. 19 August 2000. Timezones. Linux Kernel Mailing List [online]. Available from: [email protected] [Accessed 15 Oct 2000]. Distribution of postings to the Linux kernel developers list by timezone.

Williams, S. (2000). Open Season: Marc Andreessen, prodigal hacker. Upside Today [online]. Available from: http://www.upsidetoday.com/texis/mvm/story?id=38e0f6630 [Accessed 10 April 2000]. An interview with Marc Andreessen, discussing his addition to the CollabNet board of directors.

Yamagata, H. (1997). Linux's Linus Torvalds. Tokyo Linux User's Group [online]. Available from: http://www.tlug.gr.jp/linus.html [Accessed 13 November 1998]. An interview with Linus Torvalds, in which he discusses licensing, GNU, success factors, and future plans.

Zawinski, J. (1999). resignation and postmortem [online]. Available from: http://www.jwz.org/gruntle/nomo.html [Accessed 2 July 2000]. Jamie Zawinski comments on his resignation from Mozilla.

Zelkowitz, M.V. and Wallace, D.R. (1998). Experimental models for validating technology. Computer, 31(5), 23-31. Provides a taxonomy of validation methods for experimentation in software engineering.

Zhao, L. and Elbaum, S. (2000). A Survey on Quality Related Activities in Open Source. Software Engineering Notes, 25(3), 54-57. Analysis of responses from a small-scale quality assurance survey.

Appendices

A.1 Open Source Chronology (Selected Events) 57

1950's and 1960's

Software is distributed with source code and without restrictions in forums such as the "Algorithms" section of CACM, the IBM Share User Group, and the DEC Decus user group.

1969

April: RFC 1, describing the first software for the Internet (then ARPAnet), is published. Free accessibility to RFCs, and especially to the protocol specifications, was key to Internet development.

The birth and infancy of Unix, developed as a replacement for the failed Multics operating system.

1971

Richard Stallman begins his career at MIT in a group that uses only free software (nonproprietary software which includes publicly accessible and redistributable source code).

1974

First published report on Unix.

1975

Richard Stallman writes the first Emacs editor.

1978

Donald Knuth of Stanford University begins working on the TeX system, and distributes it as free software.

1979

Eric Allman writes a precursor to Sendmail, called Delivermail. It is shipped with 4.0 and 4.1 BSD Unix.

1980

57 Adapted from Barahona et al. (1999) and Dougherty (1998a).

Early era of nonproprietary software for academic use is largely over. Most software has become proprietary; that is, it is privately owned and its source code is not publicly available.

1983

Richard Stallman writes the GNU Manifesto, in which he calls for a return to publicly shareable software and source code.

GNU Project begins. Developers begin creating a wide range of generally Unix-like tools and software such as compilers. The kernel is not covered by these early efforts.

1985

MIT's X Window System is distributed as free software under a minimally restrictive license.

1987

Larry Wall releases the first version of Perl, a Unix-based programming language he designed to scan, manipulate, and print text files. The first version is released under the GPL, but Wall feels the terms are too restrictive and writes his own distribution rules, which he calls the "Artistic License."

1988

Tcl developed by John Ousterhout.

1989

Cygnus, the first company to identify business opportunities in free software, is founded.

1990

The Free Software Foundation announces its intent to build a powerful Unix-like kernel called Hurd. Their goal is to fill in the last major hole in the GNU suite of software and create a nonproprietary development system. However, the scope of Hurd is so large that many fear it will never be completed.

Python invented by Guido van Rossum at CWI in Amsterdam.

1991

William and Lynne Jolitz write a series in Dr. Dobb's Journal on how to port BSD Unix to 386-based PCs. This is the start of the BSD family of free OSs. The BSD-oriented free-software movement keeps comparatively tight control of the technical content of its freely distributed source code.

August: Finnish graduate student Linus Torvalds creates a very limited version of Linux, version 0.01. Initially, Torvalds uses Minix-386 (an academically oriented proprietary Unix-like operating system developed by Andrew Tanenbaum) as his development kernel.

October: The first "official" version of Linux, version 0.02, is released.

December: Torvalds releases the first self-supporting release of Linux, version 0.11. Developers can now work on Linux without using any proprietary tools or operating systems.

1992

The US Air Force awards New York University a contract to build a free compiler for what is now called Ada 95. The NYU team chooses GNU gcc for code generation and calls their compiler GNAT, for GNU NYU Ada 95 Translator.

January: Tanenbaum publicly criticizes Linux as technically obsolete and overly architecture-specific. In the ensuing Usenet dialog Torvalds adopts an adamantly open-source stance that helps attract new developers to Linux.

July: 386BSD 1.0 is released by William and Lynne Jolitz. A legal battle begins to determine if there is any proprietary code in Berkeley Net Release/2, on which 386BSD is based. Some months later, a settlement is achieved with the release of Berkeley Net Release/3, on which all modern free BSD operating systems are based.

1993

August: Ian Murdock creates a new Linux distribution called Debian Linux, developed by a distributed group of volunteers. Because Debian software can be folded back into other Linux distributions, it does not create any significant long-term split in the basic Linux distribution.

December: FreeBSD 1.0, one of the first stable descendants of the Jolitzes' early 386BSD, is released on the Web.

First release of Samba by Andrew Tridgell.

1994

Free Ada compiler receives a commercial boost with the incorporation of Ada Core Technologies (ACT), by the original NYU creators of GNAT. ACT decides to make money by evolving GNAT and selling support services, rather than by selling GNAT itself. Over time and with the help of ACT, GNAT becomes the dominant Ada 95 compiler for most commercial applications.

Marc Ewing begins the Red Hat Linux distribution. Like the Debian distribution, it is intended to improve on the then-dominant Softlanding Linux System (SLS) distribution.

January: Debian Linux (version 0.91), developed by 12 volunteers, is released.

March: Linux 1.0 is released. First issue of Linux Journal is published.

October: Bryan Sparks founds Caldera with start-up money from former Novell chief executive Ray Noorda. Perl 5 is released with extensions, which give Perl programmers a much more flexible framework for adding new features. NetBSD 1.0 is released.

1995

January: FreeBSD 2.0 is released.

February: A group of webmasters gathers to coordinate changes to NCSA's httpd, developed by Rob McCool. Eight core contributors form the Apache Group.

April: The first official public release (0.6.2) of Apache is distributed.

December: Apache 1.0 is released.

1996

First Conference on Freely Redistributable Software is held in Cambridge, Massachusetts, USA.

August: First test release of the GNU system, using the GNU Hurd as the kernel.

1997

June: Eric Raymond presents his paper "The Cathedral and the Bazaar" at Linux Kongress 97.

1998

January: Persuaded by a whitepaper written by Frank Hecker citing The Cathedral and the Bazaar, Netscape declares its intent to release the source code for its Navigator browser.

February: Chris Peterson and others coin the phrase "open source" to help differentiate the more business-compatible approach to free software found, for example, in the Red Hat Linux and Netscape releases. The actual Linux distributions remain dependent on GNU and free software, however, and are perhaps better understood as a different way of bundling and promoting multiple strands of both free and open source software.

April: Netscape source code is released, and initial fixes and enhancements begin arriving within hours.

July: Debian 2.0 is released by more than 300 volunteer package maintainers working on more than 1,500 packages.

August: Linus Torvalds appears on the front cover of Forbes magazine.

October: IBM incorporates Apache into its commercial WebSphere product. Intel and Netscape invest in Red Hat Linux.

November: Internal Microsoft "Halloween" documents analyzing strengths and weaknesses of Open Source software and Linux are leaked to the public.

1999

April: Jamie Zawinski, leader of the Mozilla project, resigns in apparent frustration over inability to ship a first release.

July: Brian Behlendorf founds CollabNet.

August: Red Hat Software, a popular commercial Linux distributor, goes public, raising over 68 million dollars in its initial public offering.

2000

October: Sun Microsystems open sources StarOffice, its productivity application suite.

A.2 Open Source Projects 38

Name: Alliance
URL: http://www.allos.org/
License: GPL
Description: "The goal of the Alliance project is to create a stable operating system based on a modified version of the Stanford Caching Model of Operating System Functionality. The primary modification makes it usable on Intel compatible hardware (the Stanford kernel was implemented on a PaRadiGM system built for that purpose). Alliance introduces a modern system for distributed computing, with a CORBA ORB as a communication backbone for the external kernels."

38 Compiled from project homepages, as well as Freshmeat.net (1999).

Comments: Well-organized project, currently coordinated by Lloyd Duhon. Specifications and source tree available at ftp://ftp.allos.org/pub/allos/. General mailing list at [email protected]. CVS repository currently accessible only by Alliance members. Various module owners.

Name: Apache
URL: http://www.apache.org/ or http://dev.apache.org/
License: The Apache Software License
Description: "The Apache Project is a collaborative software development effort aimed at creating a robust, commercial-grade, featureful, and freely-available source code implementation of an HTTP (Web) server. The project is jointly managed by a group of volunteers located around the world, using the Internet and the Web to communicate, plan, and develop the server and its related documentation. These volunteers are known as the Apache Group. In addition, hundreds of users have contributed ideas, code, and documentation to the project."
Comments: One of the most widely recognized open-source projects. Developer mailing list at [email protected]. CVS repository and GNATS bug tracking.

Name: Argo/UML
URL: http://argouml.tigris.org/ or http://www.ics.uci.edu/pub/arch/uml/index.html or http://www.ArgoUML.org/
License: BSD type
Description: "Argo/UML is a pure Java open source CASE tool that provides cognitive support of object-oriented design. Argo/UML provides some of the same editing and code generation features of a commercial CASE tool, but it focuses on features that enhance usability and support the cognitive needs of designers. Uses XML file formats: XMI and PGML."
Comments: Originated by Jason Robbins as an academic research project. Continues to scale well, now affiliated with Collab.Net. Developer mailing list at [email protected]. CVS repository and Bugzilla bug tracking. Various module owners.

Name: Berlin
URL: http://www.berlin-consortium.org/
License: LGPL
Description: "Berlin is an experimental windowing system that is a logical extension of the integrated layout and structured graphics model developed in InterViews and Fresco. It makes heavy use of CORBA for transparent inter-process control, and utilizes a loadable module framework to maintain good drawing speed and extensive customizability."
Comments: Developer mailing list at [email protected]. CVS repository and SourceForge bug tracking.

Name: BIND
URL: http://www.isc.org/products/BIND/
License: Free to use but restricted
Description: "BIND (Berkeley Internet Name Domain) is an implementation of the Domain Name System (DNS) protocols and provides an openly redistributable reference implementation of the major components of the Domain Name System. The BIND DNS Server is used on the vast majority of name serving machines on the Internet."
Comments: Relatively closed development. Source available via FTP.

Name: The Casbah Project
URL: http://www.casbah.org/
License: OpenSource
Description: "The goal of The Casbah Project is to build a free software Web application framework, allowing developers to build Web applications rapidly and effectively."
Comments: Developer mailing list at [email protected]. CVS repository and Request Tracker bug tracking.

Name: The Open Directory Project
URL: http://www.dmoz.org/
License: Open Directory License
Description: "The Open Directory Project's goal is to produce the most comprehensive directory of the web, by relying on a vast army of volunteer editors."
Comments: Not a software project in the conventional sense, but still a good example of distributed collaboration.

Name: Emacs
URL: http://www.gnu.org/software/emacs/
License: GPL
Description: "Emacs is the extensible, customizable, self-documenting real-time display editor. Emacs has special code editing modes, a scripting language (elisp), and comes with many packages for doing mail, news and more, all in your editor."
Comments: The original free software project. Mailing list at [email protected].

Name: Enhydra
URL: http://www.enhydra.org/
License: BSD type
Description: "Enhydra is an Open Source Java/XML application server run-time and development environment. In development for nearly three years, Enhydra is the result of pragmatic, hands-on consulting engagements. It is the only Open Source application server that features full XML support, including Enhydra XMLC for turning HTML presentations into Java classes based on the Document Object Model. Enhydra includes additional developer tools including Enhydra Debugger and Enhydra DODS for creating object to relational database mappings."
Comments: Corporately sponsored by Lutris Technologies. Developer mailing list at [email protected]. CVS repository but no bug tracking.

Name: Enlightenment
URL: http://www.enlightenment.org/
License: BSD type
Description: "Enlightenment is a window manager designed to be fast, extremely powerful, flexible, configurable, themeable and also usable."
Comments: Popular window manager for X. Coordinated by Carsten Haitzler and Geoff Harrison. Mailing list at [email protected]. CVS repository but no bug tracking.

Name: FreeBSD
URL: http://www.freebsd.org/
License: BSD type
Description: "FreeBSD is a UNIX operating system based on U.C. Berkeley's 4.4BSD-Lite release for the i386 platform (and recently the alpha platform). It is also based indirectly on William Jolitz's port of U.C. Berkeley's Net/2 to the i386, known as 386BSD, though very little of the 386BSD code remains."
Comments: Coordinated by the FreeBSD core team, notably Jordan Hubbard. Numerous mailing lists, including [email protected]. CVS repository and GNATS bug tracking. Various maintainers.

Name: FreeBuilder
URL: http://www.freebuilder.org/
License: GPL?
Description: "Free Builder is a free Java Integrated Development Environment. It's got all the tools you would expect from an IDE."
Comments: Very little activity. After a promising start, participation quickly dissipated in favour of competing projects such as jEdit. Developer mailing list at [email protected].

Name: FreeCASE
URL: http://www.freecase.seul.org/
License: GPL?
Description: "FreeCASE is/will be a team-oriented tool for object-oriented analysis and design. It will support UML. It will forward-generate and reverse engineer source code in multiple languages. It will support a networked repository, allowing for development over the Internet. It will also provide versioning and code management capabilities. Additionally, it will support a client running on multiple platforms."
Comments: Very little activity. Project leader Jeff Wolfe no longer involved in project. Mailing list at [email protected].

Name: FreeDOS
URL: http://www.freedos.org/
License: GPL
Description: "FreeDOS aims to be a complete, free, 100% MS-DOS compatible operating system."
Comments: Surprisingly active project, originated and currently coordinated by Jim Hall. Mailing list at [email protected]. Jitterbug bug tracking. Various maintainers.

Name: FreeType
URL: http://www.freetype.org/
License: Free to use but restricted
Description: "The FreeType engine is a free and portable TrueType font rendering engine. It has been developed to provide TT support to a great variety of platforms and environments. Notice that FreeType is a library. It is not a font server for your preferred environment, even though it has been written to allow the design of many font servers."
Comments: Developer mailing list at [email protected]. CVS repository.

Name: gIDE
URL: http://gide.pn.org/
License: GPL
Description: "gIDE is a GTK-based Integrated Development Environment for C. gIDE already features a powerful editor with syntax highlighting and (one-line) indenting. GNU C and GDB are (partly) integrated as well."
Comments: Originated by Steffan Kern. Developer mailing list at [email protected]. CVS repository and Gnome bug tracking.

Name: The GIMP
URL: http://www.gimp.org/
License: GPL
Description: "The GIMP is the GNU Image Manipulation Program. It is a freely distributed piece of software suitable for such tasks as photo retouching, image composition and image authoring. It can be used as a simple paint program, an expert quality photo retouching program, an online batch processing system, a mass production image renderer, an image format converter, etc."
Comments: Originated by Peter Mattis and Spencer Kimball, currently coordinated by Manish Singh. Developer mailing list at [email protected]. CVS repository and Gnome bug tracking.

Name: GNOME
URL: http://www.gnome.org/
License: GPL
Description: "GNOME is the GNU Network Object Model Environment. This project is building a complete, user-friendly desktop based entirely on free software. This desktop consists of small utilities and larger applications that share a consistent look and feel. It uses GTK+ as the GUI toolkit for all GNOME-compliant applications."
Comments: Altogether a fairly large effort comprised of multiple subprojects relating to the desktop. Developer mailing list at [email protected]. CVS repository and Gnome bug tracking. Various maintainers.

Name: GNUstep
URL: http://www.gnustep.org/
License: GPL
Description: "GNUstep is a set of general-purpose Objective-C libraries based on the OpenStep standard developed by NeXT (now Apple) Inc. The libraries consist of everything from foundation classes, such as dictionaries and arrays, to GUI interface classes such as windows, sliders, buttons, etc."
Comments: Source tree available at ftp://ftp..org/pub/gnustep. Closed developer mailing list. GNATS bug tracking but no CVS repository. Various maintainers.

Name: Gzilla
URL: http://www.gzilla.com/
License: GPL
Description: "Gzilla is a free web browser written in the GTK+ framework. Right now, it's still in early alpha, but you might have fun playing with it anyway."
Comments: Now called Armadillo, currently coordinated by Christopher Reid Palmer. Little activity. Developer mailing list at [email protected].

Name: Harmony
URL: http://www.gnu.org/software/harmony/
License: GPL
Description: "The Harmony project was born to make KDE free software, as defined by the Free Software Foundation and the GPL. It aims to be API-compatible with Troll Tech's Qt toolkit, currently used by KDE, but will expand on Qt's functionality by adding support for multi-threaded applications and pluggable themes on the model of the Offix widget set and Enlightenment window manager."
Comments: Discontinued.

Name: Hurd
URL: http://www.gnu.org/software/hurd/
License: GPL
Description: "The GNU Hurd is the GNU project's replacement for the Unix kernel. The Hurd is a collection of servers that run on the Mach microkernel to implement file systems, network protocols, file access control, and other features that are implemented by the Unix kernel or similar kernels (such as Linux). Currently, the Hurd runs on i386 machines. The Hurd should, and probably will, be ported to other hardware architectures or other microkernels in the future."
Comments: In development since 1990, last official release in 1997. Remains active. CVS repository and bug reporting at [email protected].

Name: INN
URL: http://www.isc.org/inn.html
License: Free to use but restricted
Description: "InterNetNews is a complete Usenet system. The cornerstone of the package is innd, an NNTP server that multiplexes all I/O. Think of it as an nntpd merged with the B News inews, or as a C News relaynews that reads multiple NNTP streams. Newsreading is handled by a separate server, nnrpd, that is spawned for each client. Both innd and nnrpd have some slight variances from the NNTP protocol (although in normal use you will never notice); see the manpages. INN separates hosts that feed you news from those that have users reading news."
Comments: Source available via FTP. General mailing list at [email protected]. Bug reporting at [email protected], patch submission at [email protected].

Name: Jazilla
URL: http://www.jazilla.org/
License: MPL
Description: "Jazilla is an ongoing work by a group of Java programmers to create a 100% Java version of the Mozilla browser. It's still a work in progress and so should only be downloaded by developers, but can read a web page from a remote web server and display it."
Comments: Ambitious project. Developer mailing list at [email protected]. CVS repository and SourceForge bug tracking.

Name: Jigsaw
URL: http://www.w3.org/Jigsaw/
License: BSD type
Description: "Jigsaw is W3C's leading-edge Web server platform, providing a sample HTTP 1.1 implementation based on the latest IETF drafts and a variety of other features on top of an advanced architecture implemented in Java. Jigsaw provides both client and server HTTP/1.1 implementations and is also packaged as a ready-to-run HTTP/1.1 proxy-cache."
Comments: Source available via FTP. General mailing list at [email protected]. Bug reporting at [email protected].

Name: Kaffe
URL: http://www.kaffe.org/
License: GPL
Description: "Kaffe is a complete, Personal Java 1.1 compliant Java environment. As an independent implementation, it was written from scratch and is free from all third party royalties and license restrictions. It comes with its own standard class libraries, including Beans and Abstract Window Toolkit (AWT), native libraries, and a highly configurable virtual machine with a just-in-time (JIT) compiler for enhanced performance."
Comments: Originated by Tim Wilkinson, now administered through Transvirtual Technologies. CVS repository and Kaffe bug tracking.

Name: KDE
URL: http://www.kde.org/ or http://developer.kde.org/
License: GPL
Description: "KDE is a powerful graphical desktop environment for Unix workstations. It combines ease of use, contemporary functionality and outstanding graphical design with the technological superiority of the Unix operating system. KDE is a completely new desktop, incorporating a large suite of applications for Unix workstations. While KDE includes a window manager, file manager, panel, control center and many other components that one would expect to be part of a contemporary desktop environment, the true strength of this exceptional environment lies in the interoperability of its components."
Comments: Well organized, multiple subprojects relating to the desktop. Developer mailing list at [email protected]. CVS repository and Debian bug tracking. Various maintainers.

Name: Linux
URL: http://www.linux.org/ or http://www.kernel.org/
License: GPL
Description: "Linux is a clone of the operating system Unix, written from scratch by Linus Torvalds with assistance from a loosely-knit team of hackers across the Net. It aims towards POSIX compliance. It has all the features you would expect in a modern fully-fledged Unix, including true multitasking, virtual memory, shared libraries, demand loading, shared copy-on-write executables, proper memory management, and TCP/IP networking. Linux was first developed for x86-based PCs (386 or higher). These days it also runs on Compaq Alpha AXP, Sun SPARC, Motorola 68000 machines (like Atari ST and Amiga), MIPS, PowerPC, ARM and SuperH. Additional ports are in progress, including PA-RISC and IA-64."
Comments: Source tree at http://www.kernel.org. Developer mailing list at [email protected]. Unofficial CVS repository at http://vger.samba.org. Various maintainers.

Name: Mnemonic
URL: http://www.mnemonic.org/
License: GPL
Description: "Mnemonic is a free GPL'ed WWW browser designed from the ground up, with an emphasis on modularity, small size, speed, runtime extensibility and user configurability."
Comments: Apparently discontinued.

Name: Mozilla
URL: http://www.mozilla.org/
License: MPL
Description: "Mozilla is an open-source web browser, designed for standards compliance, performance and portability. We coordinate the development and testing of the browser by providing discussion forums, software engineering tools, releases and bug tracking."
Comments: General mailing list at [email protected]. CVS repository and Bugzilla bug tracking. Various module owners.

Name: MySQL
URL: http://www.mysql.org/
License: Free to use but restricted
Description: "MySQL is a SQL (Structured Query Language) database server. SQL is the most popular database language in the world. MySQL is a client/server implementation that consists of a server daemon mysqld and many different client programs/libraries."
Comments: Very popular RDBMS. Developer mailing list at [email protected]. Bug reporting at [email protected]. Source available via FTP.

Name: NetBSD
URL: http://www.netbsd.org/
License: BSD type
Description: "The NetBSD Operating System is a fully functional Open Source Unix-like operating system descended from the Berkeley Networking Release 2 (Net/2), 4.4BSD-Lite, and 4.4BSD-Lite2 sources. NetBSD runs on twenty different system architectures featuring eight distinct families of CPUs, and is being ported to more."
Comments: Emphasis on portability. Developer mailing list at [email protected]. CVS repository and bug reporting at [email protected]. Various port maintainers.

Name: OpenBIOS
URL: http://www.freiburg.linux.de/OpenBIOS/
License: ?
Description: "Our goal is to create an IEEE 1275-1994 compliant firmware (referred to as OpenFirmware). OpenFirmware (IEEE-1275-1994) is used by SUN, Apple and others."
Comments: Developer mailing list at [email protected]. Source available via FTP.

Name: OpenBSD
URL: http://www.openbsd.org/
License: BSD type
Description: "The OpenBSD project produces a FREE, multi-platform 4.4BSD-based UNIX-like operating system. Our efforts place emphasis on portability, standardization, correctness, security, and cryptography. OpenBSD supports binary emulation of most programs from SVR4 (Solaris), FreeBSD, Linux, BSDI, SunOS, and HPUX."
Comments: Very active project, with emphasis on security and cryptography. Developer mailing list at [email protected]. CVS repository and GNATS bug tracking. Various maintainers.

Name: OpenLDAP
URL: http://www.openldap.org/
License: OpenLDAP Public License
Description: "The OpenLDAP Project is a collaborative effort to provide a robust, commercial-grade, fully featured, and open source LDAP suite of applications and development tools. The project is managed by a worldwide community of volunteers that use the Internet to communicate, plan, and develop the OpenLDAP suite and its related documentation. OpenLDAP also provides a complete LDAPv2+ implementation including server, clients, and a C SDK."
Comments: Coordinated by the OpenLDAP Core Team. Developer mailing list at [email protected]. CVS repository and Jitterbug bug tracking.

Name: Perl
URL: http://www.perl.com/ or http://cpan.perl.org/
License: Artistic and GPL
Description: "Perl is a high-level, general-purpose programming language that makes easy things easy and hard things possible. It is optimized for scanning arbitrary text files and system administration. It has built-in extended regular expression matching and replacement, a dataflow mechanism to improve security with setuid scripts and is extendable via modules that can interface to C libraries."
Comments: Very popular. Originated by Larry Wall. Developer mailing list at [email protected]. Source code available via FTP. Various maintainers or pumpkins.

Name: PostgreSQL
URL: http://www.postgresql.org/
License: BSD type
Description: "PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS), derived from the Berkeley Postgres database management system. While PostgreSQL retains the powerful object-relational data model, rich data types and easy extensibility of Postgres, it replaces the PostQuel query language with an extended subset of SQL."
Comments: Currently coordinated by Marc Fournier. Source code available via FTP or through a CVS repository.

Name: ProjectCenter
URL: http://www.projectcenter.ch/
License: GPL
Description: "ProjectCenter should one time become the GNUstep counterpart to Apple/NeXT's ProjectBuilder. It will integrate the simplicity of the elder ProjectBuilder found in NEXTSTEP, the power of the new one found in OPENSTEP and MacOS X Server and it will adopt other new features as well!"
Comments: Designed and written by Philippe C.D. Robert. Source currently available only to core contributors.

Name: Python
URL: http://www.python.org/
License: BSD type
Description: "Python is an interpreted, interactive, object-oriented programming language. It combines remarkable power with very clear syntax, and isn't difficult to learn. It has modules, classes, exceptions, very high-level data types, and dynamic typing. There are interfaces to many system calls and libraries, as well as to various windowing systems (X11, Mac, MFC, GTK, QT, wxWindows). New built-in modules are easily written in C or C++. Python is also usable as an extension language for applications that need a programmable interface."
Comments: Originated by Guido van Rossum. Core development moved to BeOpen.com. General mailing list at [email protected]. CVS repository and SourceForge bug tracking.

Name: Samba
URL: http://www.samba.org/
License: GPL
Description: "The Samba software suite is a collection of programs that implements the SMB protocol for Unix systems, allowing you to serve files and printers to Windows, NT, OS/2 and DOS clients. This protocol is sometimes also referred to as the LanManager or Netbios protocol."
Comments: Originated by Andrew Tridgell. Currently coordinated by the Samba Team. Developer mailing list at [email protected]. CVS repository and Jitterbug bug tracking.

Name: Sendmail
URL: http://www.sendmail.org/
License: OpenSource
Description: "Sendmail is a Mail Transfer Agent, which is the program that moves mail from one machine to another. Sendmail implements a general internetwork mail routing facility, featuring aliasing and forwarding, automatic routing to network gateways, and flexible configuration."
Comments: Originated by Eric Allman. General mailing list at [email protected]. Source available via FTP. Commercial support available through Sendmail, Inc.

Name: SHELF
URL: http://www.applixware.org/
License: LGPL
Description: "Applix SHELF enables application developers to increase customization and extensibility of their applications by embedding SHELF in their products. It also allows users to rapidly develop graphical applications and is ideally suited for developing graphical interfaces to legacy and Internet/Intranet applications. SHELF does not need to be embedded within applications and can be used to create true cross platform, standalone, robust graphical applications."
Comments: Corporately sponsored by VistaSource. Developer mailing list at [email protected]. Source available via FTP.

Name: Tcl/Tk
URL: http://dev.scriptics.com/software/tcltk/
License: Freely distributable
Description: "Tcl provides a portable scripting environment for Unix, Windows, and Macintosh that supports string processing and pattern matching, native file system access, shell-like control over other programs, TCP/IP networking, timers, and event-driven I/O. Tcl has traditional programming constructs like variables, loops, procedures, namespaces, error handling, script packages, and dynamic loading of DLLs. Tk provides portable GUIs on UNIX, Windows, and Macintosh. A powerful widget set and the concise scripting interface to Tk make it a breeze to develop sophisticated user interfaces."
Comments: Created and developed by John Ousterhout. Developer mailing list at [email protected]. CVS repository and bug tracking. Commercial support available through Scriptics.

Name: Window Maker
URL: http://www.windowmaker.org/
License: GPL
Description: "Window Maker is an X11 window manager designed to give additional integration support to the GNUstep Desktop Environment. In every way possible, it reproduces the elegant look and feel of the NeXTSTEP[tm] GUI. It is fast, feature rich, easy to configure, and easy to use. In addition, Window Maker works with GNOME and KDE, making it one of the most useful and universal window managers available."
Comments: Developer mailing list at [email protected]. CVS repository and Jitterbug bug tracking. Various maintainers.

Name: Wine
URL: http://www.winehq.com/
License: BSD type
Description: "Wine Is Not an Emulator, it is an alternative implementation of the Windows 3.x and Win32 APIs. Wine provides both a development toolkit (Winelib) for porting legacy Windows sources to Unix and a program loader, allowing unmodified Windows 3.1/95/NT binaries to run under Intel Unixes. Wine does not require Microsoft Windows, as it is a completely alternative implementation consisting of 100% Microsoft-free code, but it can optionally use native system DLLs if they are available."
Comments: Currently coordinated by Alexandre Julliard. Developer mailing list at [email protected]. CVS repository and GNATS bug tracking.

Name: wxWindows
URL: http://www.wxwindows.org/
License: BSD type
Description: "wxWindows/Gtk is the GTK+ port of the C++ cross-platform wxWindows GUI library, offering classes for all common GUI controls as well as a comprehensive set of helper classes for most common application tasks, ranging from networking to HTML display and image manipulation. There are also Python bindings available for the GTK and the MSW port."
Comments: Developer mailing list at [email protected]. CVS repository and Bugzilla bug tracking. Various maintainers.

Name: XEmacs
URL: http://www.xemacs.org/
License: GPL
Description: "XEmacs (formerly known as Lucid Emacs) is a powerful, extensible text editor with full GUI support, initially based on an early version of GNU Emacs 19 from the Free Software Foundation and since kept up to date with recent versions of that product. XEmacs stems from a collaboration of Lucid, Inc. with Sun Microsystems, Inc. and the University of Illinois with additional support having been provided by Amdahl Corporation, INS Engineering Corporation, and a huge amount of volunteer effort."
Comments: Currently coordinated by the XEmacs Review Board. General mailing list at @xemacs.org. Source available via FTP. Jitterbug bug tracking. Various maintainers.

Name: XFree86
URL: http://www.xfree86.org/
License: Freely distributable
Description: "XFree86 is a freely redistributable implementation of the X Window System that runs on UNIX and UNIX-like operating systems."
Comments: Closed developer mailing list. CVS repository. Patch submission at [email protected].

Name: Zope
URL: http://www.zope.org/
License: OpenSource
Description: "Zope is a free, Open Source web application platform used for building high-performance, dynamic web sites. It contains a powerful and simple scripting object model and high-performance, integrated object database."
Comments: Corporately sponsored by Digital Creations. Developer mailing list at [email protected]. CVS repository and bug tracking.

A.3 Open Source Definition

Version 1.7

Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria:

1. Free Redistribution

The license may not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license may not require a royalty or other fee for such sale.

2. Source Code

The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost — preferably, downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.

3. Derived Works

The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.

4. Integrity of The Author's Source Code

The license may restrict source-code from being distributed in modified form only if the license allows the distribution of "patch files" with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.

5. No Discrimination Against Persons or Groups

The license must not discriminate against any person or group of persons.

6. No Discrimination Against Fields of Endeavor

The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

7. Distribution of License

The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.

8. License Must Not Be Specific to a Product

The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.

9. License Must Not Contaminate Other Software

The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.

A.4 GNU General Public License

Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software—to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and modification follow.

TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION, AND MODIFICATION

0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.

In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.

7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.

10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

one line to give the program's name and an idea of what it does.
Copyright (C) yyyy name of author

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type 'show w'.
This is free software, and you are welcome to redistribute it under certain conditions; type 'show c' for details.

The hypothetical commands 'show w' and 'show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than 'show w' and 'show c'; they could even be mouse-clicks or menu items—whatever suits your program.
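As an illustration only (this sketch is not part of the license text), the hypothetical 'show w' and 'show c' commands described above might be handled like this. All names here are invented for the example:

```python
# Illustrative sketch only: a minimal interactive program that prints the
# startup notice and answers the hypothetical 'show w' / 'show c' commands.
# The program name, version, and placeholder texts are all assumptions.

NOTICE = (
    "Gnomovision version 69, Copyright (C) year name of author\n"
    "Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type 'show w'.\n"
    "This is free software, and you are welcome to redistribute it\n"
    "under certain conditions; type 'show c' for details."
)

# In a real program these would hold the corresponding parts of the GPL.
WARRANTY_TEXT = "Sections 11 and 12 of the GNU GPL would be shown here."
CONDITIONS_TEXT = "Sections 0 through 10 of the GNU GPL would be shown here."


def handle_command(command: str) -> str:
    """Return the text the program should display for a given command."""
    if command == "show w":
        return WARRANTY_TEXT
    if command == "show c":
        return CONDITIONS_TEXT
    return "Unknown command; type 'show w' or 'show c'."


if __name__ == "__main__":
    print(NOTICE)
    # A real interactive loop would read commands from the user here, e.g.:
    #     while (line := input("> ")):
    #         print(handle_command(line.strip()))
```

As the license notes, the actual commands need not be textual at all; a graphical program might bind the same two displays to menu items instead.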

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program 'Gnomovision' (which makes passes at compilers) written by James Hacker.

signature of Ty Coon, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License.