SOFTWARE REVIEW

Genomics software: The view from 10,000 feet Michael E. Weale*

Department of Medical and Molecular Genetics, King’s College London, 8th Floor, Tower Wing, Guy’s Hospital, London SE1 9RT, UK *Correspondence to: Tel: þ44 (0)207 188 2601; Fax: þ44 (0)207 188 2585; E-mail: [email protected]

Date received: 18th September 2009

Abstract The rate of change in , and ’omics generally, shows no signs of slowing down. Related analysis software is struggling to keep apace. This paper provides a brief review of the field.

Keywords: genomics, software, systems genetics, ‘omics, -wide association studies, SNP annotation, networks

For the bioinformaticist, and still more so the tra- being expanded continuously. In addition to SNP ditional genetic epidemiologist, the big view on probes, modern GWAS panels are equipped with how to tackle genomic data analysis looks daunting. additional probe sets for interrogating copy number Only a few years ago, the genome-wide association variation (CNV). PennCNV2 is a popular software study (GWAS) represented the overshadowing for calling these. CNV call uncertainty poses Everest on the landscape, and commentators fretted downstream problems for association analysis, and about the computational feasibility of analysing software for dealing with this has been reviewed 500,000 or so single nucleotide polymorphisms recently in this journal.3 Another trend is towards (SNPs) against one phenotype variable. Now, the imputation of SNPs that are not present in the single-phenotype GWAS is a foothill from which GWAS panel but can be inferred via linkage dise- to launch attacks on datasets of much greater scale quilibrium (LD), also reviewed here recently.4 and variety. A new term — systems genetics — has Popular choices are Mach,5 Impute6 and Beagle.7 A emerged to describe this expanded world view. more specialist imputation problem, but one of Analytical tools for dealing with molecular data general interest due to the role of the immune have always lagged behind the acceleration of high- response system in many diseases, is to call classical throughput methods for generating them, and that leukocyte antigen (HLA) genotypes from seems especially true at the present time. Here, I SNPs typed in the HLA region of chromosome provide an overview of the software currently avail- 6. Recently improved software from Gil McVean able, and look ahead to future developments. and colleagues is available for this.8 First, a look at more familiar territory. The soft- SNP annotation tools provide the most straight- ware package PLINK1 has become the favoured forward window from GWAS hits and also work-horse of GWAS analysis, thanks to the untir- sequence data into the wider ‘omic universe. A ing efforts of Shaun Purcell to keep the software recent review is by Rachel Karchin.9 The SNP well documented, flexible, fast and compact in its Function Portal10 provides one of the more com- use of data structures. Few other packages surpass prehensive lists of annotation for each SNP, including PLINK as far as basic quality control and first-pass those arising via LD proxy or ‘tagging’. Other SNP-by-SNP analysis are concerned, and many options include FastSNP,11 PupaSuite,12 SNPnexus,13 other, more advanced features are available and are SNPinfo,14 SNPselector,15 F-SNP16 and TAMAL.17

56 # HENRY STEWART PUBLICATIONS 1479–7364. HUMAN GENOMICS. VOL 4. NO. 1. 56–58 OCTOBER 2009 Genomics software: The view from 10,000 feet SOFTWARE REVIEW

WGAviewer18 is geared specifically towards the no means only, location for primary literature on analysis of GWAS results, and has a nice visual new software. Looking ahead, software for handling interface. All these tools struggle to keep up with high-throughput sequencing is an area where we the rapidly expanding set of available annotations. can expect much development in the coming For example, several different datasets are now pub- months. has a useful online ‘virtual lically available that combine GWAS SNP data with issue’ on tools for next generation sequencing which genome-wide expression data (so-called gen- they are recurrently updating (http://www.oxford etical genomics or expression quantitative trait locus journals.org/our_journals/bioinformatics/nextgene [eQTL] data). Currently, however, no one tool inte- rationsequencing.html). One wonders whether grates the ability to search all these datasets simul- 10,000 feet will be high enough for a synoptic taneously. One option for the more proficient view in 12 months time. investigator is to keep one step ahead by using the Galaxy web tool19 to design their own application for integrating different annotation tracks with their References GWAS hits. SNAP20 is a useful tool for feeding LD 1. Purcell, S. et al. (2007), ‘PLINK: A tool set for whole-genome associ- ation and population-based linkage analyses’, Am. J. Hum. Genet. Vol. proxy into such a custom-made Galaxy 81, pp. 559–575. application. 2. Wang, K. et al. (2007), ‘PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection Beyond SNP annotation, there are more formal in whole-genome SNP genotyping data’, Genome Res. Vol. 17, attempts at linking genetic data into functional net- pp. 1665–1674. 3. Plagnol, V. (2009), ‘Association tests and software for copy number works. These may be created from internal sources, variant data’, Hum. Genomics Vol. 3, pp. 191–194. such as p-values for SNP–SNP interactions, or 4. Ellinghaus, D., Schreiber, S., Franke, A. and Nothnagel, M. (2009), ‘Current software for genotype imputation’, Hum. Genomics Vo l . 3 , extrinsic sources, such as protein–protein inter- Vol. 371–380. 21 actions (reviewed here recently ) and gene ontol- 5. Li, Y. and Abecasis, G.R. (2006), ‘Mach 1.0: Rapid haplotype recon- struction and missing genotype inference’, Am. J. Hum. Genet. Vol. S79, ogy categories. A repository of types of network p. 2290. data is available at http://www.pathwaycommons. 6. Marchini, J., Howie, B., Myers, S., McVean, G. et al. (2007), ‘A new multipoint method for genome-wide association studies by imputation of org. While network visualisation tools were pre- genotypes’, Nat. Genet. Vol. 39, pp. 906–913. viously the domain of expensive commercial soft- 7. Browning, B.L. and Browning, S.R. (2009), ‘A unified approach to geno- ware, Cytoscape22 has become an excellent type imputation and haplotype-phase inference for large data sets of trios and unrelated individuals’, Am. J. Hum. Genet. Vol. 84, pp. 210–223. freeware alternative. For formal statistical signifi- 8. Leslie, S., Donnelly, P.and McVean, G. (2008), ‘A statistical method for pre- cance of coincident patterns within these networks, dicting classical HLA alleles from SNP data’, Am.J.Hum.Genet.Vol. 82, pp. 48–56. there is a rapidly expanding literature and no con- 9. Karchin, R. (2009), ‘Next generation tools for the annotation of human sensus yet on the best approach to take. Two SNPs’, Brief. Bioinform. Vol. 10, pp. 35–52. 23 10. Wang, P. et al. (2006), ‘SNP Function Portal: A web database for explor- examples are ALIGATOR and gene-set enrich- ing the function implication of SNP alleles’, Bioinformatics Vol. 22, ment analysis (GSEA). The latter has been adapted pp. e523–529. 11. Yuan, H.Y. et al. (2006), ‘FASTSNP: An always up-to-date and extend- from gene expression studies and applied to GWAS able service for SNP function analysis and prioritization’, Nucleic Acids p-values. Web-based implementations are available Res. Vol. 34, pp. W635–W641. 12. Conde, L. et al. (2006), ‘PupaSuite: Finding functional single nucleotide at http://bioinfo.vanderbilt.edu/webgestalt and polymorphisms for large-scale genotyping purposes’, Nucleic Acids Res. http://www.broadinstitute.org/gsea. Vol. 34 (Web Server issue), pp. W621–W625. 13. Chelala, C., Khan, A. and Lemoine, N.R. (2009), ‘SNPnexus: A web How can one keep up to date in the rapidly database for functional annotation of newly discovered and public domain changing world of genomic software? Certainly, single nucleotide polymorphisms’, Bioinformatics Vol. 25, pp. 655–661. 14. Xu, Z. and Taylor, J.A. (2009), ‘SNPinfo: Integrating GWAS and review sections such as the one here in Human candidate gene information into functional SNP selection for genetic Genomics will help. Nucleic Acids Research publishes a association studies’, Nucleic Acids Res. Vol. 37 (Web Server issue), 24 pp. W600–W605. useful annual review of web server applications, 15. Xu, H. et al. (2005), ‘SNPselector: A web tool for selecting SNPs for now also available online (http://bioinformatics.ca/ genetic association studies’, Bioinformatics Vol. 21, pp. 4181–4186. 16. Lee, P.H. and Shatkay, H. (2008), ‘F-SNP: Computationally predicted links_directory). The Applications Note section of functional SNPs for disease association studies’, Nucleic Acids Res. Vol. 36 the journal Bioinformatics provides the best, but by (Database issue), pp. D820–D824.

# HENRY STEWART PUBLICATIONS 1479–7364. HUMAN GENOMICS. VOL 4. NO 1. 56–58 OCTOBER 2009 57 SOFTWARE REVIEW Weale

17. Hemminger, B.M., Saelim, B. and Sullivan, P.F. (2006), ‘TAMAL: An 21. Lehne, B. and Schlitt, T. (2009), ‘Protein-protein interaction databases: integrated approach to choosing SNPs for genetic studies of human Keeping up with growing interactomes. Hum. Genomics Vol. 3, pp. 291–297. complex traits’, Bioinformatics Vol. 22, pp. 626–627. 22. Shannon, P. et al. (2003), ‘Cytoscape: A software environment for 18. Ge, D. et al. (2008), ‘WGAViewer: Software for genomic annotation of integrated models of biomolecular interaction networks’, Genome Res. whole genome association studies’, Genome Res. Vol. 18, pp. 640–643. Vol. 13, pp. 2498–2504. 19. Giardine, B. et al. (2005), ‘Galaxy: A platform for interactive large-scale 23. Holmans, P. et al. (2009), ‘Gene ontology analysis of GWA study data sets genome analysis. Genome Res. Vol. 15, pp. 1451–1455. provides insights into the biology of bipolar disorder’, Am. J. Hum. 20. Johnson, A.D. et al. (2008), ‘SNAP: A web-based tool for identification Genet. Vol. 85, pp. 13–24. and annotation of proxy SNPs using HapMap’, Bioinformatics Vol. 24, 24. Benson, G. (2009), ‘Nucleic Acids Research Annual Web Server Issue in pp. 2938–2939. 2009’, Nucleic Acids Res. Vol. 37, pp. W1–W2.

58 # HENRY STEWART PUBLICATIONS 1479–7364. HUMAN GENOMICS. VOL 4. NO. 1. 56–58 OCTOBER 2009