Analysis of Multiple Single Nucleotide Polymorphisms of Candidate Genes Related to Coronary Heart Disease Susceptibility by Using Support Vector Machines

Clin Chem Lab Med 2003; 41(4):529–534 © 2003 by Walter de Gruyter · Berlin · New York

Yeomin Yoon¹, Junghan Song², Seung Ho Hong³ and Jin Q. Kim²*

¹ Department of Laboratory Medicine, Cheju National University College of Medicine, Jeju, South Korea
² Department of Laboratory Medicine, Seoul National University College of Medicine, Seoul, South Korea
³ Jeju National University of Education, Jeju, South Korea

Coronary heart disease (CHD) is a complex genetic disease involving gene-environment interaction. Many association studies between single nucleotide polymorphisms (SNPs) of candidate genes and CHD have been reported. We have applied a new method to analyze such relationships using support vector machines (SVMs), a machine learning method for pattern classification. We assumed that the common haplotypes implicit in genotypes would differ between cases and controls, and that this would allow SVM-derived patterns to be classified according to subject genotypes. Fourteen SNPs of ten candidate genes in 86 CHD patients and 119 controls were investigated. Genotypes were transformed to a numerical vector by assigning scores based on the difference between the genotypes of each subject and the reference genotypes, which represent the healthy normal population. Overall classification accuracy by SVMs was 64.4%, with a receiver operating characteristic (ROC) area of 0.639. By conventional analysis using the χ² test, the association between CHD and the SNP of the scavenger receptor B1 gene was the most significant in terms of allele frequencies in cases vs. controls (p = 0.0001). In conclusion, we suggest that the application of SVMs to association studies of SNPs in candidate genes shows considerable promise and that further work on estimating CHD susceptibility in high-risk individuals would be useful.

Key words: Support vector machines; Coronary heart disease; Single nucleotide polymorphisms.

Abbreviations: apo, apolipoprotein; BMI, body mass index; CHD, coronary heart disease; HDL-C, high density lipoprotein-cholesterol; LDL-C, low density lipoprotein-cholesterol; Lp(a), lipoprotein(a); ROC, receiver operating characteristic; SNP, single nucleotide polymorphism; SRB1, scavenger receptor B1; SVMs, support vector machines.

*E-mail of the corresponding author: [email protected]

Introduction

Coronary heart disease (CHD) is the leading cause of death in developed countries (1). CHD is a complex genetic disease involving many genes, environmental influences, and important gene-environment interactions (2). CHD patients show varying clinical and angiographic features, and the importance of the different pathogenetic components may differ between patients (3). Prevention of CHD is made difficult not only by the multiple predisposing causes of CHD but also by the individual's susceptibility to these causes. Emerging evidence suggests that common variations in genes, called single nucleotide polymorphisms (SNPs), are associated with CHD and can modulate the effect of environmental risk factors on the development of CHD (4). However, identifying SNPs in genes associated with complex diseases, including CHD, is often very difficult because their individual contributions are likely to be small (5). Therefore, we not only analyzed the association between a single SNP and CHD using the χ² test for allele frequency in cases and controls but also estimated the combinational effects of multiple SNPs in an attempt to detect CHD using support vector machines (SVMs).

SVMs (6, 7) have been successfully applied to a wide range of pattern recognition problems, including microarray gene expression data. SVMs can classify genes into functional categories based on expression data obtained from a microarray and have allowed predictions to be made concerning the functions of unannotated yeast genes (8). SVMs, which are based on a solid mathematical foundation, address the general classification problem of determining which item belongs to which group. Applied to multiple SNP data, the process of constructing SVMs begins with the transformation of the genotypes of multiple SNPs of each individual into a numerical vector. Vectors are labeled positively if the individuals are in the CHD group and negatively if they are in the control group. Using this training set of SNP vectors, SVMs would learn to discriminate between the CHD and control groups (9). Having learned the vector features of each class, SVMs can recognize a new individual as a member of the CHD group or of the control group based on their SNP data. Moreover, the SVMs could also be retrained to identify outliers that may have previously been assigned to the incorrect class in the training set. SVMs would then use the information in the training set to determine which SNP features are characteristic of the CHD or control group, and use this information to decide whether any given SNP data are likely to belong to the CHD or control group.

We describe here not only the use of χ² statistics to determine the association between a single SNP and CHD but also the use of SVMs to classify subjects as members of the CHD or of the control group based on a set of multiple SNP data. We genotyped 14 SNPs in 10 candidate genes related to CHD risk in 86 CHD patients and 119 age-matched healthy controls.

Materials and Methods

Study subjects and samples

Eighty-six patients (54 males and 32 females) with CHD, as documented by coronary angiography because of recent myocardial infarction or angina, were selected at Seoul National University Hospital. The normal control group consisted of 119 age-matched individuals (63 males and 56 females) who were selected by health-screening at the same hospital in order to exclude those with a history of chest pain, diabetes, hypertension, and general illnesses. Blood samples were placed into EDTA tubes and stored at –70 °C until assay.

Lipid and apolipoprotein analysis

The concentrations of plasma cholesterol and triglycerides were determined using enzymatic methods (Roche Diagnostics, Mannheim, Germany). High density lipoprotein-cholesterol (HDL-C) was measured directly with HDL-C diagnostic kits (Kyowa Medex, Tokyo, Japan) using a Hitachi 747 automatic chemistry analyzer. The level of low density lipoprotein-cholesterol (LDL-C) was calculated using the formula of Friedewald et al. (10), and the levels of apolipoprotein (apo)A-I and apoB were measured by immunonephelometric assay (Behring Nephelometer, Behringwerke AG, Germany). Lipoprotein(a) (Lp(a)) was measured using a commercially available enzyme-linked immunosorbent assay kit (IMMUNO GmbH, Heidelberg, Germany). Body mass index (BMI) was calculated by dividing weight by (height)².

Selection of candidate genes

We selected 10 genes probably related to CHD, based on encoded molecules that have roles in thrombosis, thrombolysis, vasodilator tone, and lipid metabolism. The ten genes were: apoCIII, apoE, lipoprotein lipase, scavenger receptor B1 (SRB1), lipoprotein receptor-related protein, factor VII, plasminogen activator inhibitor 1 (PAI-1), glycoprotein 1b α-polypeptide (GP1BA), superoxide dismutase (SOD), and the endothelial nitric oxide synthase (eNOS) genes. The accession numbers of the appropriate GenBank reference sequences, the locations of the SNPs, the bases potentially substituted in the 14 SNPs, and the dbSNP numbers are summarized in Table 1.

Genotyping SNPs

DNA samples were extracted from peripheral blood by standard methods. Genomic DNA was subjected to PCR, and the identity of the PCR products was confirmed by digestion with a restriction enzyme and subsequent agarose electrophoresis. Fourteen pairs of oligomers were chosen to serve as PCR primers to amplify regions containing each of the SNPs in the 10 candidate genes. The nucleotide sequences of these primers, the restriction enzymes, and the expected sizes of the PCR products are indicated in Table 2.

Statistical analysis

Statistical analyses were performed with the Statistical Package for the Social Sciences (SPSS, SPSS Inc., Chicago, IL, USA), version 9.01 for Windows. Variables in two or three groups were compared using the Mann-Whitney U-test or the Kruskal-Wallis test. The χ² test and Fisher's exact test were used to test the independence of variables. Differences in allele frequencies between CHD patients and controls were evaluated using the χ² test. The Hardy-Weinberg equilibrium of alleles at individual loci was assessed using χ² statistics.

Table 1 Genotyped SNPs.
Gene        GenBank accession number     Location of SNP  dbSNP number  Major allele  Minor allele
SRB1        NM_005505.2 (gi21361199)     1158             rs5888        C             T
ApoE        K00396.1 (gi178850)          586              rs7412        C             T
ApoE        K00396.1 (gi178850)          448              rs429358      T             C
eNOS        D26607.1 (gi558523)          7002             rs1799983     G             T
eNOS        D26607.1 (gi558523)          20454            rs1799985     G             T
SOD         U10116.1 (gi529149)          5256             rs2536512     T             G
ApoCIII     J00098.1 (gi178765)          5163             rs5128        G             C
LPL         AF050163.1 (gi3293304)       4509             rs285         T             C
LPL         AF050163.1 (gi3293304)       8393             rs320         T             G
LPL         AF050163.1 (gi3293304)       9040             rs328         C             G
Factor VII  NM_019616.1 (gi10518502)     1223             rs6046        G             A
PAI-1       AF386492.2 (gi14488407)      837              rs1799889     G             –
LRP1        AF058399.1 (gi3493546)       516              rs1799986     C             T
GP1BA       AF395009.1 (gi14600281)
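
The allele-frequency comparison and the Hardy-Weinberg check described under Statistical analysis can be sketched as follows. This is a minimal illustration and not the SPSS procedure the authors ran. It assumes biallelic SNPs, genotypes written as two-character strings such as "CT", and a Pearson χ² without continuity correction. The function names and the toy genotypes are hypothetical and are not study data.

```python
import math


def allele_counts(genotypes, major, minor):
    """Count major and minor alleles in a list of two-character genotypes."""
    n_major = sum(g.count(major) for g in genotypes)
    n_minor = sum(g.count(minor) for g in genotypes)
    return n_major, n_minor


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are cases and controls, columns are major and minor allele counts.
    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def hardy_weinberg_chi2(n_major_hom, n_het, n_minor_hom):
    """Chi-square test of Hardy-Weinberg equilibrium at one biallelic locus."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p                              # minor allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One degree of freedom: 3 genotype classes - 1 - 1 estimated frequency.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy genotypes for illustration only (not taken from the study).
cases = ["CC", "CT", "TT", "CT"]
controls = ["CC", "CC", "CT", "CC"]
case_major, case_minor = allele_counts(cases, "C", "T")
ctrl_major, ctrl_minor = allele_counts(controls, "C", "T")
stat, p_value = chi2_2x2(case_major, case_minor, ctrl_major, ctrl_minor)
```

With counts as small as the toy example, Fisher's exact test, which the authors also list, would be the more appropriate choice. The Hardy-Weinberg function divides by the expected counts, so it is not defined for a locus with only one allele observed.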

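
The transformation of genotypes into labeled numerical vectors, as outlined in the Introduction, might look like the sketch below. The exact scoring scheme is not given in this part of the paper, so the code assumes that the score for each SNP is the number of alleles that differ from a reference genotype, and that the reference is homozygous for the major allele listed in Table 1. The function names are hypothetical. The `train_svm` helper assumes scikit-learn is installed and uses a linear kernel purely as an example, since the kernel is not stated here.

```python
def encode_genotype(genotype, reference):
    """Score one SNP as the number of alleles that differ from the reference.

    Assumed scheme: with reference "CC", "CC" -> 0, "CT" -> 1, "TT" -> 2.
    """
    remaining = list(reference)
    score = 0
    for allele in genotype:
        if allele in remaining:
            remaining.remove(allele)  # this allele matches one reference allele
        else:
            score += 1                # this allele differs from the reference
    return score


def encode_subject(genotypes, references):
    """Turn one subject's genotypes (one per SNP) into a numerical vector."""
    if len(genotypes) != len(references):
        raise ValueError("one reference genotype is needed per SNP")
    return [encode_genotype(g, r) for g, r in zip(genotypes, references)]


def label_for(group):
    """Label vectors positively for the CHD group and negatively for controls."""
    return 1 if group == "CHD" else -1


def train_svm(vectors, labels):
    """Fit a linear-kernel SVM (assumption: scikit-learn is available)."""
    from sklearn.svm import SVC  # imported lazily so the encoding runs without it

    classifier = SVC(kernel="linear")
    classifier.fit(vectors, labels)
    return classifier


# Toy example for three SNPs (rs5888, rs7412, rs1799983); genotypes are made up.
references = ["CC", "CC", "GG"]  # assumed: homozygous for the major allele
subjects = [
    ("CHD", ["CT", "CC", "TT"]),
    ("control", ["CC", "CC", "GT"]),
]
vectors = [encode_subject(genotypes, references) for _, genotypes in subjects]
labels = [label_for(group) for group, _ in subjects]
```

Calling `train_svm(vectors, labels)` on a full training set would return a classifier whose `predict` method assigns a new subject's vector to the CHD or control class.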