Research Paper

Co-occurrence of Cell Lines, Basal Media and Supplementation in the Biomedical Research Literature

Jessica Cox†, Darin McBeath, Corey Harper, Ron Daniel
Elsevier Labs, 230 Park Avenue, New York, NY 10169, USA

Citation: Cox, Jessica, Darin McBeath, Corey Harper, and Ron Daniel. “Co-occurrence of cell lines, basal media and supplementation in the biomedical research literature.” Journal of Data and Information Science, vol. 5, no. 3, 2020, pp. 161–177. https://doi.org/10.2478/jdis-2020-0016

Received: Jan. 31, 2020
Accepted: Apr. 23, 2020

Abstract

Purpose: The use of in vitro cell culture and experimentation is a cornerstone of biomedical research; however, more attention has recently been given to the potential consequences of using such artificial basal media and undefined supplements. As a first step towards better understanding and measuring the impact these systems have on experimental results, we use text mining to capture typical research practices and trends around cell culture.
Design/methodology/approach: To measure the scale of in vitro cell culture use, we analyzed a corpus of 94,695 research articles that appear in biomedical research journals published in ScienceDirect from 2000 to 2018. Central to our investigation is the observation that studies using cell culture typically describe their conditions in a sentence that names the cell line, the basal medium, and the supplemented compounds. We tag our corpus with a curated list of basal media and with the Cellosaurus ontology using the Aho-Corasick algorithm. We also processed the corpus with Stanford CoreNLP to find nouns that follow mentions of basal media, in an attempt to identify the supplements used.
Findings: Interestingly, we find that researchers frequently use DMEM even when a cell line's vendor recommends a less concentrated medium. We see long-tailed distributions for the usage of media and cell lines, with DMEM and RPMI dominating the media, and HEK293, HEK293T, and HeLa dominating the cell lines used.
Research limitations: Our analysis was restricted to documents in ScienceDirect, and our text mining method achieved high recall but low precision, which required manual inspection of many tokens.
Practical implications: Our findings document current cell culture practices in the biomedical research community and can serve as a resource for future experimental design.
Originality/value: No other work has taken a text mining approach to surveying cell culture practices in biomedical research.

Keywords: Cell culture; Biomedical research; Text mining

† Corresponding author: Jessica Cox (E-mail: [email protected]).

JDIS Journal of Data and Information Science, Vol. 5 No. 3, 2020. http://www.jdis.org https://www.degruyter.com/view/j/jdis

1 Introduction

Experimentation using in vitro cell culture serves as a foundation of biomedical research. Immortalized cell lines and primary cells are cultured and maintained in cell culture medium, a specially formulated mixture of metabolites that supports cellular growth and proliferation. Widely used basal media (e.g. DMEM, RPMI, MEM) are manufactured to minimize variability between culture techniques (Asayama, 2017). These basal media are further supplemented with various compounds to fit the needs of the specific cell line or cell type. However, the composition of these media rarely resembles the physiological conditions these cells would encounter in vivo.

The development of defined culture media began as a mission to identify the minimal set of metabolites required to sustain cell viability outside the body. Experiments conducted by Harry Eagle in the 1950s resulted in the development of Basal Medium Eagle (BME) (Eagle, 1955) and its more nutritionally dense derivative, Minimum Essential Medium (MEM) (Eagle, 1959).
In 1959, Dulbecco and Freeman reported the use of a concentrated version of MEM, now known as DMEM (Dulbecco & Freeman, 1959). RPMI 1640 was developed in 1967 specifically for the culture of white blood cells and supports the growth of various immune cell types (Moore, Gerner, & Franklin, 1967). These basal media are typically supplemented with fetal bovine/calf serum (FBS/FCS), an undefined supplement first described in 1958 (Puck, Cieciura, & Robinson, 1958). FBS is a source of growth factors, trace elements, vitamins, hormones, and proteins that stimulate and sustain in vitro cell growth. Aside from the ethical concerns around the collection of serum, serum varies between batches, which has downstream effects on experimental results (Sikora et al., 2016; Zheng et al., 2008). Beyond FBS, basal media may be additionally supplemented with antibiotics and/or non-essential amino acids, among other compounds necessary for cellular growth within a specific system. The combinations of cell line, basal medium, and supplements used to build cell culture systems are seemingly limitless. Several studies have shown that the composition of basal media has a significant impact on experimental results (Ariffin et al., 2016; Kim et al., 2015; Pirsko et al., 2018; Selenius et al., 2019; Tomoya Kawakami, 2016). However, culture media and supplementation remain largely overlooked experimental parameters. Within the past decade, awareness has grown of how these environmental conditions influence cell metabolism, and more questions are being raised about whether the simplistic environments in which cells are cultured can appropriately model in vivo conditions (Adams, 2019; Cantor, 2019; Hirsch & Schildknecht, 2019; Mckee & Komarova, 2017; Vande Voorde et al., 2019).
These questions are particularly significant given the broader attention paid to reproducibility in science over the past decade. While there is some debate over how serious the issue is in biomedical research, the community agrees that quality work must be reproducible. In 2012, Amgen reported that it could reproduce the findings of only 11% of 53 landmark papers in cancer biology (Begley & Ellis, 2012), and in 2018 The Reproducibility Project: Cancer Biology reduced the number of studies it aimed to replicate from 50 to 18 for a variety of reasons, and could replicate only 5 of these (Kaiser, 2018). Irreproducible experiments may result from dishonest reporting, but more likely from laboratory and methodological conditions that a paper leaves unreported. Some of these conditions may come down to variability between lots or batches of the materials used to culture cells.

To understand how biomedical researchers use cell culture media in practice, and the downstream effects of these choices, we analyzed a corpus of nearly 100,000 biomedical research articles published in ScienceDirect since 2000. From that larger corpus we selected 12,732 sentences from full-length articles that mention known basal media and cell lines. We found only one study that assessed the use of media types in biomedical research (Arora, 2013), and we believe our work strengthens the findings of that review. Our contributions are basic counts of media types, cell lines, and supplement types; the co-occurrences of those items; and data on how their usage has changed since 2000. Knowing which combinations co-occur most frequently can pave the way towards understanding and establishing community standards, and can fuel a larger discussion of the value, or risk, of using such artificial media in biomedical research.
This also provides a baseline of current practice that can help identify future trends towards more physiologically representative media.

2 Methodology

2.1 Corpus

We sourced articles from a manually curated list of biology journals developed by the authors in 2017 (Groth & Cox, 2017). From these journals, we selected all full-length research articles published in 2000 or later, 174,971 papers in total. From these papers, we selected all sentences that appeared in the methods section. To do this, we first used our open-source AnnotationQuery tool (AQ) (McBeath, 2017) to identify sections whose titles contained the terms “experiment”, “procedure”, “method”, “in vitro”, “cell culture”, or “cell”. Using AQ, we then selected all sentences that appeared in these sections. This returned 6.99 million sentences from 94,695 unique documents.

2.2 Dictionary development

We downloaded version 29 of Cellosaurus in February 2019. Cellosaurus is an ontology of cell lines developed by the Swiss Institute of Bioinformatics (Bairoch, 2018); in total, this version covers 109,135 cell lines. Each of these cell lines also has associated synonyms. For example, HeLa cells may also be represented as “Hela”, “He La”, “hela”, etc. Inspection of the entries revealed several noisy tokens. To improve the precision of our tagging, we filtered out terms that consisted of all numerals; a string of one, two, or three letters (e.g. “A”, “AB”, or “ABC”); a single letter and a single digit (e.g. “1H” or “H1”); numbers joined by “-” or “.” (e.g. “2-2” or “2.2”); a single letter followed by a single digit, separated by “-” or “.” (e.g. “A.1”); or two letters followed by a single digit (e.g. “CH3”).
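The section-title filter in 2.1 and the noise rules in 2.2 are simple enough to express as string tests and regular expressions. The following is a minimal Python sketch of one possible encoding, not the authors' code or AQ's query syntax. It assumes that a section qualifies when its title contains a keyword as a case-insensitive substring, and that a noise rule rejects a term only when the rule matches the whole term; the function names `is_methods_section`, `is_noisy`, and `filter_terms` are hypothetical.

```python
import re

# Section-title keywords from 2.1. Assumption: case-insensitive substring
# matching; AQ's actual matching behaviour may differ.
SECTION_KEYWORDS = ("experiment", "procedure", "method", "in vitro", "cell culture", "cell")

# One pattern per noise rule from 2.2; each must match the WHOLE term.
NOISE_PATTERNS = [
    re.compile(r"\d+"),                    # all numerals, e.g. "12345"
    re.compile(r"[A-Za-z]{1,3}"),          # one to three letters, e.g. "A", "AB", "ABC"
    re.compile(r"[A-Za-z]\d|\d[A-Za-z]"),  # one letter and one digit, e.g. "H1", "1H"
    re.compile(r"\d+(?:[-.]\d+)+"),        # numbers joined by "-" or ".", e.g. "2-2", "2.2"
    re.compile(r"[A-Za-z][-.]\d"),         # letter, separator, digit, e.g. "A.1"
    re.compile(r"[A-Za-z]{2}\d"),          # two letters then one digit, e.g. "CH3"
]


def is_methods_section(title: str) -> bool:
    """Return True if the section title contains any of the keywords."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in SECTION_KEYWORDS)


def is_noisy(term: str) -> bool:
    """Return True if the whole term matches at least one noise rule."""
    return any(pattern.fullmatch(term) for pattern in NOISE_PATTERNS)


def filter_terms(terms):
    """Keep only dictionary terms that no noise rule rejects."""
    return [term for term in terms if not is_noisy(term)]


titles = ["Materials and Methods", "Cell culture", "Results", "Discussion"]
kept_titles = [t for t in titles if is_methods_section(t)]

candidates = ["HeLa", "Hela", "He La", "A", "AB", "ABC", "1H", "H1",
              "2-2", "2.2", "A.1", "CH3", "12345", "HEK293"]
kept_terms = filter_terms(candidates)
print(kept_titles)  # ['Materials and Methods', 'Cell culture']
print(kept_terms)   # ['HeLa', 'Hela', 'He La', 'HEK293']
```

Under the substring assumption, the keyword “cell” already covers “cell culture”. Rules like these trade recall for precision: a genuine cell line whose name happens to fit one of the patterns is discarded along with the noise.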
We also performed a substantial manual review of the remaining terms and their synonyms. We filtered out names (e.g. “Fisher”) and terms that overlap with medium types (e.g. “F12”). Any inappropriate tokens that appeared in our analysis were reviewed by JC and excluded. One example is the cell line RERF-LC-MS, which was annotated several thousand times in our corpus.
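The abstract states that the corpus was tagged with the curated media list and the filtered Cellosaurus terms using the Aho-Corasick algorithm, which finds every occurrence of many dictionary terms in a single pass over the text. The following is a minimal, from-scratch Python sketch of such a tagger, not the authors' implementation. It assumes exact, case-sensitive matching (so a synonym such as “hela” would be its own dictionary entry), applies a hypothetical manual exclusion set before the automaton is built, and adds a word-boundary check, also an assumption, so that “MEM” is not reported inside “DMEM”. The dictionary, the exclusion set, and the sentence are illustrative.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton for exact, case-sensitive multi-pattern search."""

    def __init__(self, patterns):
        self.goto = [{}]   # trie edges: state -> {character: next state}
        self.fail = [0]    # failure link for each state (root is state 0)
        self.out = [[]]    # patterns recognized on reaching each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so each failure target is complete before it is used.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, child in self.goto[state].items():
                queue.append(child)
                link = self.fail[state]
                while link and ch not in self.goto[link]:
                    link = self.fail[link]
                self.fail[child] = self.goto[link].get(ch, 0)
                # Inherit matches ending at the failure target ("MEM" inside "DMEM").
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def search(self, text):
        """Yield (start, pattern) for every occurrence, in order of end position."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield i - len(pattern) + 1, pattern


def tag(text, automaton):
    """Keep only hits bounded by non-alphanumeric characters (an assumption)."""
    hits = []
    for start, term in automaton.search(text):
        end = start + len(term)
        left_ok = start == 0 or not text[start - 1].isalnum()
        right_ok = end == len(text) or not text[end].isalnum()
        if left_ok and right_ok:
            hits.append((start, term))
    return hits


# Hypothetical dictionary: a few cell line names, curated media, and a name
# ("Fisher") that manual review would exclude.
dictionary = ["HeLa", "HEK293", "HEK293T", "DMEM", "MEM", "RPMI 1640", "Fisher"]
excluded = {"Fisher"}
automaton = AhoCorasick(term for term in dictionary if term not in excluded)

sentence = "HEK293T and HeLa cells were grown in DMEM; Fisher supplied RPMI 1640."
hits = tag(sentence, automaton)
print([term for _, term in hits])  # ['HEK293T', 'HeLa', 'DMEM', 'RPMI 1640']
```

Because exclusions are applied to the dictionary here, each newly flagged token means rebuilding the automaton and re-tagging; dropping flagged hits after tagging would be an equally valid design.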