An Open-Access Long Oligonucleotide Microarray Resource for Analysis of the Human and Mouse Transcriptomes

Kévin Le Brigand1,2, Roslin Russell3, Chimène Moreilhon1,2, Jean-Marie Rouillard4,5, Bernard Jost6, Franck Amiot7, Virginie Magnone1,2, Christine Bole-Feysot6, Philippe Rostagno1,2, Virginie Virolle1,2, Virginie Defamie1,2, Philippe Dessen8, Gary Williams3, Paul Lyons3, Géraldine Rios1,2, Bernard Mari1,2, Erdogan Gulari4,5, Philippe Kastner6, Xavier Gidrol7, Tom C. Freeman3 and Pascal Barbry1,2,*

1 CNRS, Institut de Pharmacologie Moléculaire et Cellulaire, UMR6097, 660, route des Lucioles F-06560 Sophia Antipolis, France
2 University of Nice Sophia Antipolis, Institut de Pharmacologie Moléculaire et Cellulaire, UMR6097, 660, route des Lucioles F-06560 Sophia Antipolis, France
3 MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SB, UK
4 Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
5 Biodiscovery LLC, 3886 Penberton Dr, Ann Arbor, MI 48109, USA
6 IGBMC, BP163, F67404 Illkirch Cédex, France
7 CEA, Service de Génomique Fonctionnelle, Genopole d'Evry, F91057 Evry Cédex, France
8 Laboratoire de Génétique Oncologique, UMR 1599 CNRS, Institut Gustave Roussy, F-94805 Villejuif Cedex, France

Received April 28, 2006; Revised June 5, 2006; Accepted June 23, 2006

Published online July 19, 2006. Nucleic Acids Research, 2006, Vol. 34, No. 12 e87. doi:10.1093/nar/gkl485

To cite this version: Kévin Le Brigand, Roslin Russell, Chimène Moreilhon, Jean-Marie Rouillard, Bernard Jost, et al. An open-access long oligonucleotide microarray resource for analysis of the human and mouse transcriptomes. Nucleic Acids Research, Oxford University Press, 2006, 34 (12), pp.e87. 10.1093/nar/gkl485. hal-00088266

HAL Id: hal-00088266
https://hal.archives-ouvertes.fr/hal-00088266
Submitted on 16 Aug 2006

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

*To whom correspondence should be addressed. Tel: +33 4 9395 7793; Fax: +33 4 9395 7794; Email: [email protected]
*Correspondence may also be addressed to Tom C. Freeman. Tel: +44 131 242 6242; Fax: +44 131 242 6244; Email: [email protected]
Present address: Tom C. Freeman, Scottish Centre for Genomic Technology and Informatics, University of Edinburgh Medical School, The Chancellor's Building, Edinburgh EH16 4SB, UK

© 2006 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Two collections of oligonucleotides have been designed for preparing pangenomic human and mouse microarrays. A total of 148 993 and 121 703 oligonucleotides were designed against human and mouse transcripts. Quality scores were created in order to select 25 342 human and 24 109 mouse oligonucleotides. They correspond to: (i) a BLAST-specificity score; (ii) the number of expressed sequence tags matching each probe; (iii) the distance to the 3′ end of the target mRNA. Scores were also used to compare in silico the two microarrays with commercial microarrays. The sets described here, called RNG/MRC collections, appear at least as specific and sensitive as those from the commercial platforms. The RNG/MRC collections have now been used by an Anglo-French consortium to distribute more than 3500 microarrays to the academic community. Ad hoc identification of tissue-specific transcripts and an 80% correlation with hybridizations performed on Affymetrix GeneChip suggest that the RNG/MRC microarrays perform well. This work provides a comprehensive open resource for investigators working on human and mouse transcriptomes, as well as a generic method to generate new microarray collections in other organisms. All information related to these probes, as well as additional information about commercial microarrays, has been stored in a freely-accessible database called MEDIANTE.

INTRODUCTION

Microarray technologies for expression profiling may be split into two broad categories: platforms that are based on in situ synthesis of oligonucleotide probes and those that are based on the deposition of preassembled DNA probes. The first class of array platforms is dominated by the commercial sector with a number of companies, e.g. Affymetrix (1), Nimblegen (2), Agilent (3), offering a range of off-the-shelf or custom arrays to their customers. Microarrays fabricated using preassembled probes have traditionally been favoured by many academic laboratories and are also available from a number of commercial sources, e.g. GE Healthcare's Codelink platform (4), Illumina's 'BeadChip' arrays (5). Primarily for reasons of flexibility and cost, many academic laboratories still favour the use of spotted arrays made in-house for their research.

For a number of years the fabrication of spotted microarrays largely relied on the attachment of gene fragments amplified from cDNA libraries (6). Whilst this approach clearly works and can provide useable tools for expression analysis, it suffers from several fundamental limitations: gene representation within cDNA libraries is incomplete; there is often a significant degree of redundancy within clone collections; annotation of clones can be flawed; and cDNA libraries often come with legal restrictions on their distribution and use. Furthermore, the relatively large size of the cDNA amplicons can be associated with the presence of repeat sequences or homology to related genes, which can compromise the specificity of the probes in an unpredictable way (7). An alternative approach that addresses this issue involves the production of gene-specific DNA fragments by PCR amplification using specific primers (8–10). The existence of a significant fraction of genes for which a specific PCR amplicon cannot be designed or generated, as well as the high costs and technical difficulty of DNA production, makes this approach impractical for the fabrication of mammalian whole genome expression microarrays.

An alternative approach for probe synthesis for spotted microarray production has come through the use of long (50–70mers) oligonucleotides (11,12). A significant reduction in the cost of production of the synthetic oligonucleotides, an improvement of the quality control provided by the different suppliers and the ability to design one or several specific probes to any given target sequence have made the use of long oligonucleotides for the fabrication of microarrays a very attractive option. As a result, the last few years have seen a number of companies offering aliquots of oligonucleotide libraries for array fabrication. Transcript coverage has thus increased alongside our knowledge of transcript diversity. However, these sets have been relatively

Here we describe the bioinformatic pipeline that has been used in the design of two pangenomic oligonucleotide collections for expression profiling of human and mouse systems. This includes in silico validation steps and benchmark comparisons with commercial human and mouse oligonucleotide probe collections, and the creation of an open-access database called MEDIANTE, which integrates information about the RNG/MRC, Affymetrix, Agilent and Illumina probe sets. Lastly, we present experimental validation data obtained after hybridizing distinct RNAs originating from human or mouse tissues on microarrays spotted with the RNG/MRC probe collections.

MATERIALS AND METHODS

Oligonucleotide design

Transcript selection. Two non-redundant sets of mRNA sequences (one for human and one for mouse) were assembled from RefSeq, a database derived from GenBank. These were subjected to BLAST sequence analysis (20) against UniGene. Out of the 105 680 representative sequences from human UniGene clusters (build #167), 87 386 did not match this first RefSeq selection (build #33 for human). When UniGene clusters corresponding to fewer than 4 sequences were excluded, there were 2979 UniGene clusters of more than 4 sequences associated with at least 1 RNA sequence, which did not match any RefSeq transcript. The representative RNA from each of these UniGene clusters was then introduced into the list of transcripts selected for oligo design. Sequences defined in Affymetrix and Agilent human microarray annotations were then compared to this second list in order to identify sequences which were not represented. Following this selection the final number of human transcripts selected for oligo design was 29 894. BLAT analyses (21) ensured that each sequence was correctly positioned on
