Computer Supported Cooperative Work (CSCW) https://doi.org/10.1007/s10606-018-9333-1 © The Author(s) 2018 The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries A Collaborative Ethnography of Documentation Work R. Stuart Geiger1 , Nelle Varoquaux1,2 , Charlotte Mazel-Cabasse1 & Chris Holdgraf1,3 1Berkeley Institute for Data Science, University of California, Berkeley, 190 Doe Library, Berkeley, CA, 94730, USA (E-mail:
[email protected]); 2Department of Statistics, Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, USA; 3Berkeley Institute for Data Science, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA Abstract. Computational research and data analytics increasingly relies on complex ecosystems of open source software (OSS) “libraries” – curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to use them. Yet documentation for open source software libraries is widely considered low-quality. This article is a collaboration between CSCW researchers and contributors to data analytics OSS libraries, based on ethnographic fieldwork and qualitative interviews. We examine several issues around the formats, practices, and challenges around documentation in these largely volunteer-based projects. There are many dif- ferent kinds and formats of documentation that exist around such libraries, which play a variety of educational, promotional, and organizational roles. The work behind documentation is similarly multifaceted, including writing, reviewing, maintaining, and organizing documentation. Different aspects of documentation work require contributors to have different sets of skills and overcome various social and technical barriers.