BMC Genomics Biomed Central
Total Page:16
File Type:pdf, Size:1020Kb
BMC Genomics BioMed Central Methodology article Open Access Computational selection and prioritization of candidate genes for Fetal Alcohol Syndrome Zané Lombard1, Nicki Tiffin2,3, Oliver Hofmann2, Vladimir B Bajic2, Winston Hide2 and Michèle Ramsay*1 Address: 1Division of Human Genetics, National Health Laboratory Service & School of Pathology, University of the Witwatersrand, Johannesburg, 2001, South Africa, 2South African National Bioinformatics Institute (SANBI) Research Group, University of the Western Cape, Bellville, 7530, South Africa and 3Division of Human Genetics, University of Cape Town, Cape Town, 8001, South Africa Email: Zané Lombard - [email protected]; Nicki Tiffin - [email protected]; Oliver Hofmann - [email protected]; Vladimir B Bajic - [email protected]; Winston Hide - [email protected]; Michèle Ramsay* - [email protected] * Corresponding author Published: 25 October 2007 Received: 28 March 2007 Accepted: 25 October 2007 BMC Genomics 2007, 8:389 doi:10.1186/1471-2164-8-389 This article is available from: http://www.biomedcentral.com/1471-2164/8/389 © 2007 Lombard et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: Fetal alcohol syndrome (FAS) is a serious global health problem and is observed at high frequencies in certain South African communities. Although in utero alcohol exposure is the primary trigger, there is evidence for genetic- and other susceptibility factors in FAS development. No genome-wide association or linkage studies have been performed for FAS, making computational selection and -prioritization of candidate disease genes an attractive approach. Results: 10174 Candidate genes were initially selected from the whole genome using a previously described method, which selects candidate genes according to their expression in disease-affected tissues. Hereafter candidates were prioritized for experimental investigation by investigating criteria pertinent to FAS and binary filtering. 29 Criteria were assessed by mining various database sources to populate criteria-specific gene lists. Candidate genes were then prioritized for experimental investigation using a binary system that assessed the criteria gene lists against the candidate list, and candidate genes were scored accordingly. A group of 87 genes was prioritized as candidates and for future experimental validation. The validity of the binary prioritization method was assessed by investigating the protein-protein interactions, functional enrichment and common promoter element binding sites of the top-ranked genes. Conclusion: This analysis highlighted a list of strong candidate genes from the TGF-β, MAPK and Hedgehog signalling pathways, which are all integral to fetal development and potential targets for alcohol's teratogenic effect. We conclude that this novel bioinformatics approach effectively prioritizes credible candidate genes for further experimental analysis. Background range of prevalence rates reported in two different primary Case Study Disease: Fetal Alcohol Syndrome school cohorts from this community were 65.2–74.2 per Fetal alcohol syndrome (FAS) is the most common pre- 1 000 [2] and 68.0–89.2 per 1000 [1] respectively. This ventable cause of mental retardation globally, and is a rate is alarmingly higher than the average observed for the serious public health problem in South Africa [1]. The developed world of 0.97 per 1000 live births [3]. Page 1 of 15 (page number not for citation purposes) BMC Genomics 2007, 8:389 http://www.biomedcentral.com/1471-2164/8/389 The teratogenic effect of alcohol is well established and the critical factor being the generally weak genotype-phe- exposure to alcohol in utero is known to result in a widely notype association in multi-factorial disorders [16]. variable phenotype. Fetal alcohol spectrum disorder (FASD) is an umbrella term used to describe the irreversi- Few candidate gene association studies investigating the ble array of anomalies associated with in utero alcohol effect of specific genetic polymorphisms on the risk of FAS exposure [4]. These anomalies include prenatal and post- development have been published. These studies have natal growth retardation, central nervous system (CNS) generally focused on the alcohol dehydrogenase enzyme dysfunction, characteristic craniofacial malformation and family members and conflicting results have been other organ abnormalities [5-7]. The term FAS is a clinical obtained. Stoler et al. [17] observed that the absence of the description for children at the most severe end of the ADH1B*3 allele was protective for fetal outcome, in con- FASD spectrum, who display the full phenotype associ- flict with two other studies showing the presence of this ated with in utero alcohol exposure. allele to be protective [18,19]. The ADH1B*2 allele has been proposed to play a possible protective role, or to be Although alcohol consumption during pregnancy is the a marker for protection in the South African mixed-ances- primary trigger for the presentation of FAS, the exact try population [20]. However, the sample size for this mechanisms for alcohol-induced teratogenic effects have association study was small, and results have not yet been not been elucidated. Research has shown that secondary replicated in other populations. Many other genes are factors, like genetic, epigenetic and environmental factors likely to contribute towards the development of FAS and influence the outcome and severity of the disorder. Fur- further investigation is required. thermore, a dose- and time-dependant relationship has been observed, where exposure to higher concentrations Candidate gene association studies remain the most prac- of alcohol at critical developmental stages resulted in tical and frequently employed approach in disease gene more severe anomalies [8]. An association between a var- investigation for complex disorders. However, the main iable genetic background and FAS development is prima- challenge when using this approach is to select suitable rily supported by the observation that FAS does not occur genes to test, especially for diseases with poorly under- in all children exposed to alcohol during the prenatal stood aetiology. Recently, many computational candidate period [9]. This observation suggests that certain individ- gene selection and -prioritization methods have been uals may have a genetic predisposition to infliction of developed [21-31]. These tools aim to identify and prior- more severe damage by gestational alcohol consumption; itize putative disease genes by modelling specific charac- and the varied phenotype observed in FASD may be a teristics of known disease genes, or by focusing on known reflection of the varied susceptibility quotients in the disease features (such as gene expression profiles or phe- genetic background of the individual. Streissguth and notype). However, there is a vast quantity of information Dehaene [10] studied twin pairs with alcoholic mothers, and data sources available currently, and it is expected and found the rate of concordance for FASD to be 100% that a tool with the flexibility to include a large array of for monozygotic twins, whereas digygotic twins showed data sources would positively aid disease gene discovery. only 64% concordance. Further support for the role of The freely accessible tool Endeavour offers such an appli- genetics in FAS development is obtained from animal cation [22]. This tool is based on the premise of ranking model studies [11]. Several studies in different mouse unknown candidate genes according to their similarity strains have shown variation in the extent and pattern of with a known set of training genes. In the absence of a alcohol-induced malformation, as well as behavioural linked genetic region (which is the case with FAS), all outcome [12-15]. FAS can therefore be considered to be a genes in the genome must be included as a starting point multi-factorial or complex disease, suggesting that there for candidate gene selection, which is not feasible when are multiple genetic factors underlying susceptibility to using this approach. FAS and the interactions between these factors as well as other factors are likely to be intricate. Convergent Functional Genomics (CFG) is an approach used to identify and prioritize candidate genes, which Disease gene identification for FAS relies on the cross-matching of animal model gene expres- To date, no FAS family linkage studies or genome wide sion data with human genetic linkage data, as well as association studies have been performed. Linkage studies human tissue data and biological roles data [32,33] This require large family samples and this poses a significant approach has many parallels to the approach described in challenge. Countries with the highest FAS rates are mostly this paper, as it prescribes a Bayesian-like methodology of resource-poor, possibly contributing to the reason why reducing uncertainty through the combination of multi- such studies have not yet been performed. Furthermore, ple independent lines of evidence, each by itself lacking linkage studies have not proven to be particularly success- sufficient power to confirm that a gene is a putative candi- ful in discovering the genetic causes of complex diseases, date gene,