Median Absolute Deviation to Improve Hit Selection for Genome-Scale RNAi Screens

NAMJIN CHUNG,1 XIAOHUA DOUGLAS ZHANG,2 ANTHONY KREAMER,1 LOUIS LOCCO,1 PEI-FEN KUAN,2,3 STEVEN BARTZ,4 PETER S. LINSLEY,4 MARC FERRER,1 and BERTA STRULOVICI1

High-throughput screening (HTS) of large-scale RNA interference (RNAi) libraries has become an increasingly popular method of functional genomics in recent years. Cell-based assays used for RNAi screening often produce small dynamic ranges and significant variability because of the combination of cellular heterogeneity, transfection efficiency, and the intrinsic nature of the genes being targeted. These properties make reliable hit selection in the RNAi screen a difficult task. The use of robust methods based on median and median absolute deviation (MAD) has been suggested to improve hit selection in such cases, but mean and standard deviation (SD)–based methods are still predominantly used in many RNAi HTS. In an experimental approach to compare these 2 methods, a genome-scale small interfering RNA (siRNA) screen was performed, in which the identification of novel targets increasing the therapeutic index of the chemotherapeutic agent mitomycin C (MMC) was sought. MAD values were resistant to the presence of outliers, and the hits selected by the MAD-based method included all the hits that would be selected by SD-based method as well as a significant number of additional hits. When retested in triplicate, a similar percentage of these siRNAs were shown to genuinely sensitize cells to MMC compared with the hits shared between SD- and MAD-based methods. Confirmed hits were enriched with the genes involved in the DNA damage response and cell cycle regulation, validating the overall hit selection strategy. Finally, computer simulations showed the superiority and generality of the MAD-based method in various RNAi HTS data models.
In conclusion, the authors demonstrate that the MAD-based hit selection method rescued physiologically relevant false negatives that would have been missed in the SD-based method, and they believe it to be the desirable 1st-choice hit selection method for RNAi screen results. (Journal of Biomolecular Screening 2008:149-158)

Key words: RNAi, RNA interference, siRNA, high-throughput screen, functional genomics, data analysis, MAD, median absolute deviation, hit selection

1Department of Automated Biotechnology, Merck Research Laboratories, North Wales, Pennsylvania.
2Department of Biometrics Research, Merck Research Laboratories, West Point, Pennsylvania.
3Department of Statistics, University of Wisconsin, Madison.
4Department of Biology, Rosetta Inpharmatics, a wholly owned subsidiary of Merck & Co., Inc., Seattle, Washington.

Received Jun 1, 2007, and in revised form Oct 6, 2007. Accepted for publication Oct 26, 2007.

Journal of Biomolecular Screening 13(2); 2008
DOI: 10.1177/1087057107312035
© 2008 Society for Biomolecular Sciences www.sbsonline.org

INTRODUCTION

RNA INTERFERENCE (RNAi) refers to posttranscriptional gene silencing that involves the endonucleolytic cleavage and subsequent degradation of a specific mRNA transcript by homologous double-stranded RNA.1 Discovery of small interfering RNA (siRNA) and its utility in mammalian cells have enabled both academia and industry to adopt RNAi technology as one of the most popular investigative tools ever for drug target identification and validation.2 With the advances in genome sequencing, large-scale siRNA or short hairpin (shRNA) libraries have been built and screened to identify novel therapeutic targets.3-11

A genome-scale RNAi library of about 20,000 genes can be screened in, depending on configuration, 200 to 250 microplates with 96 wells or 50 to 80 plates with 384 wells. RNAi screens are, in principle, cell-based high-throughput screens (HTS) that involve siRNA (or shRNA) transduction. It is siRNA transfection, however, that makes an RNAi screen markedly different from and much more complicated than a conventional HTS. siRNA transfection can be, even with automation, a slow process and requires lengthy incubation times before the effects of silencing are observed and can be measured.12 Considering that RNAi assays typically last 48 to 96 h and often entail screening under multiple conditions (e.g., with or without DNA damage), the entire process of a genome-scale RNAi screen can take up to several weeks of full operations. Therefore, RNAi screens are expensive, time-consuming, and resource intensive. For these reasons, many genome-scale RNAi screens have been conducted in a nonreplicate manner, followed by the retest of a relatively small number (100s to 1000s) of preliminary hits in replicate.13-15 As a result, confidence levels in preliminary hits selected from the primary screen are generally low. In addition, compared with in vitro biochemical assays, the assays used in an RNAi screen typically measure cellular phenotypes with narrower dynamic ranges and greater variability due to cellular heterogeneity. Moreover, variable siRNA transfection efficiency from plate to plate increases data variability. Taken together, it is not uncommon to observe a 20% to 30% coefficient of variation for control siRNAs in an RNAi screen.15 Under these circumstances, it is critical to reduce the number of false-negative hits rather than false-positive hits from the nonreplicate, initial screen because false positives can be eliminated during the validation process whereas false negatives represent missed opportunities.

A simplistic hit selection method in HTS is to use an arbitrary numerical cutoff such as 30% inhibition or 50% activation. Another widely popular method is to use the z score, which is defined by the difference between individual well data and sample mean (or median) divided by standard deviation (SD).16 Therefore, a z score of 3 translates into the distance of 3 × SD extended from sample mean (or median), and the wells with a z score of 3 or greater represent an extreme 0.27% subset in a normal distribution. One way to reduce false negatives is to lower the stringency for hit selection threshold (e.g., 2 × SD instead of 3 × SD) and select more preliminary hits.13,14 However, this inevitably increases the number of false positives as well, thus increasing the cost of retesting the siRNAs that would eventually turn out to be negative.16

Recently, there have been more sophisticated approaches to HTS hit selection to improve overall confirmation rates.16-24 One way is to use median absolute deviation (MAD) rather than popular SD. MAD can be defined as follows:

$$\mathrm{MAD} = 1.4826 \times \operatorname{median}\left(\left| x_{ij} - \operatorname{median}(x) \right|\right),$$

where $x$ indicates all the values in the sample wells of a plate and $x_{ij}$ indicates the sample well at the row $i$ and the column $j$. The constant 1.4826 is used to make MAD comparable to SD when data distribute normally.25 In HTS assays, SD is often inflated by the presence of a few strong outliers (potential hits or detection errors), which could increase the number of false negatives. In contrast, MAD is more robust to outliers, and MAD-based methods are expected to obtain fewer false negatives. Using data in real RNAi HTS experiments, we have demonstrated that MAD-based methods (e.g., median ± 3 × MAD) identify more hits than SD-based methods (e.g., mean ± 3 × SD).22 However, although previous studies have analyzed existing HTS data to develop and compare new hit selection methods, there have been no experimental investigations specifically designed to show whether the additional hits selected using MAD-based methods are mostly true or false positives. [...] genome-scale siRNA screen, selected 2 different types of hits based on either SD- or MAD-based method, and retested them in triplicate to compare their performances.

For this purpose, we used an siRNA screen that measured cell viability under different mitomycin C (MMC) conditions to identify chemosensitizing genes. MMC is a common chemotherapeutic agent for cancer patients. It mediates the formation of covalent bonds between DNA strands, causing cell cycle arrest during the S phase and, when left unrepaired, apoptosis. Most cancer cells proliferate in an uncontrolled manner and, as a result, become particularly susceptible to apoptotic cell death by MMC. However, like many other chemotherapeutic agents such as cisplatin, camptothecin, and doxorubicin, MMC is also cytotoxic to normal cells such as epithelial cells lining the digestive tract, hematopoietic cells, and other actively proliferating cells. Therefore, the ultimate goal of the screen is to identify therapeutic targets that can be used together with a lower dose of MMC in combination therapy.

MATERIALS AND METHODS

Cell lines and chemicals

HeLa cells were purchased from the American Type Culture Collection (Rockville, MD) and maintained in Dulbecco’s Minimal Essential Media (DMEM) supplemented with 10% fetal bovine serum (FBS) and penicillin/streptomycin (Invitrogen, Carlsbad, CA). MMC was obtained from Calbiochem (San Diego, CA).

siRNA design and siRNA library composition

siRNA sequences were designed with an algorithm developed to increase efficiency of the siRNAs for silencing while minimizing their off-target effects.26 siRNAs were manufactured by Sigma-Proligo (Boulder, CO). The custom siRNA library is composed of 22,108 unique siRNA pools, in which each pool consists of an equimolar mixture of 3 siRNAs targeting different sequences of the same mRNA transcript. The siRNA library includes the druggable genome,27 membrane proteins, enzymes, pathways of therapeutic interest, and RefSeq genes (releases 6-8, http://ncbi.nih.gov/RefSeq/).

siRNA transfection

HeLa cells were transfected with siRNAs using the Optifect transfection reagent (Invitrogen). Briefly, HeLa cells (440 cells/40 μl/well in 384-well microplates) were grown for 24 h in DMEM supplemented with 10% FBS and penicillin/streptomycin in a SelecT automated cell culture system (Hertfordshire, UK) before transfection with the siRNA library described above. For screening, 3 siRNAs targeting the same gene were pooled at equal molar-
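
As an illustration of the two selection rules contrasted in the Introduction (mean ± 3 × SD versus median ± 3 × MAD), the following minimal Python sketch applies both to a single plate. It is not the authors' analysis code. The function names `scaled_mad` and `select_hits` are hypothetical, the plate values are invented for illustration and are not data from this screen, and the use of the sample SD is an assumption.

```python
import statistics

MAD_SCALE = 1.4826  # constant from the text: makes MAD comparable to SD for normal data


def scaled_mad(values):
    """Return 1.4826 * median(|x - median(x)|) over the sample wells."""
    center = statistics.median(values)
    return MAD_SCALE * statistics.median([abs(v - center) for v in values])


def select_hits(values, k=3.0, robust=True):
    """Return indices of wells farther than k spread units from the center.

    robust=True  -> median +/- k * MAD
    robust=False -> mean   +/- k * SD (sample SD, an assumption of this sketch)
    """
    if robust:
        center, spread = statistics.median(values), scaled_mad(values)
    else:
        center, spread = statistics.mean(values), statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - center) > k * spread]


# Invented plate: 40 unremarkable wells, one moderate hit (80),
# and two strong outliers (20 and 15).
plate = [96, 97, 98, 99, 100, 100, 101, 102, 103, 104] * 4 + [80, 20, 15]

print(select_hits(plate, robust=False))  # [41, 42]
print(select_hits(plate, robust=True))   # [40, 41, 42]
```

In this toy plate the two strong outliers inflate the SD to roughly 18, so the moderate well at index 40 stays inside mean ± 3 × SD, whereas the MAD (about 3) is barely affected and the same well is selected. The MAD-based selection contains every SD-based hit plus one additional well, which mirrors the pattern described in the abstract.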
