Inside the Sequence Universe: the Amazing Life of Data and the People Who Look After Them

Inside the sequence universe: The amazing life of data and the people who look after them Tahani Nadim Goldsmiths, University of London Thesis submitted in fulfilment of the requirements for the degree of Ph.D. July 2012 I, Tahani Nadim, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. 2 Abstract This thesis provides an ethnographic exploration of two large nucleotide sequence databases, the European Molecular Biology Laboratory Bank, UK and GenBank, US. It describes and analyses their complex bioinformatic environments as well as their material-discursive environments – the objects, narratives and practices that recursively constitute these databases. In doing so, it unravels a rich bioinformational ecology – the “sequence universe”. Here, mosquitoes have mumps, the louse is “huge” and self-styled information plumbers patch-up high-throughput data pipelines while data curators battle the indiscriminate coming-to-life caused by metagenomics. Given the intensification of data production, the biosciences have reached a point where concerns have squarely turned to fundamental questions about how to know within and between all that data. This thesis assembles a database imaginary, recovering inventive terms of scholarly engagement with bioinformational databases and data, terms that remain critical without necessarily reverting to a database logic. Science studies and related disciplines, investigating illustrious projects like the UK Biobank, have developed a sustained critique of the perceived conflation of bodies and data. This thesis argues that these accounts forego an engagement with the database sui generis, as a situated arrangement of people, things, routines and spaces. It shows that databases have histories and continue established practices of collecting and curating. At the same time, it maps entanglements of the databases with experiments and discovery thereby demonstrates the vibrancy of data. Focusing on the question of what happens at these databases, the thesis follows data curators and programmers but also database records and the entities documented by them, such as uncultured bacteria. It contextualises ethnographic findings within the literature on the sociology and philosophy of science and technology while also making references to works of art and literature in order to bring into relief the boundary- defying scope of the issues raised. 3 Table of contents Abstract..............................................................................................................................................3 Table of contents ............................................................................................................................4 List of figures ...................................................................................................................................8 List of abbreviations......................................................................................................................9 Acknowledgments....................................................................................................................... 10 Chapter 1. Upsetting the database logic, towards the database imaginary.... 11 Introduction .................................................................................................................................. 11 From database logic to database imaginary...................................................................................14 How biology learned to love the database.......................................................................... 16 Critical responses: data begets life........................................................................................ 17 The databases: EMBL-Bank and GenBank .......................................................................... 21 GenBank .........................................................................................................................................................24 EMBL-Bank ...................................................................................................................................................25 Beyond the databases, the sequence universe..............................................................................26 Making sense of the sequence universe...........................................................................................28 The present research ................................................................................................................. 29 Note on limitations and terms..............................................................................................................31 Chapter overview.......................................................................................................................................32 Chapter 2. Figures seen twice: from archive to database and laboratory....... 35 Introduction .................................................................................................................................. 35 Figure seen twice .......................................................................................................................................36 Initial situations........................................................................................................................... 38 Situating databases ...................................................................................................................................42 Laboratory work: mundane actions ..................................................................................... 44 Working with data.....................................................................................................................................45 Laboratory objects: inscriptions............................................................................................ 47 Bioinformational artefacts.....................................................................................................................49 The field.......................................................................................................................................... 51 4 Sequence universe.....................................................................................................................................52 Raising worlds: Posthuman politics...................................................................................... 54 Surprises towards a database imaginary............................................................................ 56 Chapter 3: Meeting the sequence universe................................................................ 58 Introduction .................................................................................................................................. 58 Cosmic encounters ....................................................................................................................................59 Inventive diffractions................................................................................................................. 60 Co-presence amidst infrastructural assemblages............................................................ 63 Multi-sited co-presence...........................................................................................................................64 Imagining methods ..................................................................................................................... 66 Idiotic pace....................................................................................................................................................67 On form ........................................................................................................................................... 69 The present research ................................................................................................................. 71 Observations and interviews ................................................................................................................73 Chapter 4. Viral and valent trails: a visitor’s guide to the sequence universe .................................................................................................................................................. 75 Introduction .................................................................................................................................. 75 The doubtful guest in the sequence universe................................................................................78 Into the Wellcome Trust Genome Campus ......................................................................... 80 Landscape with database........................................................................................................................81 Performative integration........................................................................................................................83 Entrez: Playground with mumps............................................................................................ 86 Travels to the NIH......................................................................................................................................88 Ways into the sequence universe .........................................................................................

Inside the Sequence Universe: the Amazing Life of Data and the People Who Look After Them

Ensembl Genomes: Extending Ensembl Across the Taxonomic Space P

Abstracts Genome 10K & Genome Science 29 Aug - 1 Sept 2017 Norwich Research Park, Norwich, Uk

Rare Variant Contribution to Human Disease in 281,104 UK Biobank Exomes W 1,19 1,19 2,19 2 2 Quanli Wang , Ryan S

The ELIXIR Core Data Resources: Fundamental Infrastructure for The

Article Reference

Open Targets Genetics

Select Committee on Science and Technology Corrected Oral Evidence: Ageing: Science, Technology and Healthy Living

95 CHAPTER 5 Looking at Science, Looking at You! the Feminist Re-Visions of Nature

Annual Scientific Report 2013 on the Cover Structure 3Fof in the Protein Data Bank, Determined by Laponogov, I

Strategic Plan 2011-2016

General Assembly and Consortium Meeting 2020

Advanced Sectioned Images of a Cadaver Head with Voxel Size Of