Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. resadmngncdsae.Tog hr aebe infiatcnrbtoso ouaingenomics population of dis- understanding of level genetic contributions genome comprehensive rare significant a of provided been have understanding & have studies few our there Bocchini, very advanced populations Though Amberger, Indian significantly of Scott, diseases. research have OMIM (Hamosh, monogenic efforts 7000 delineated Currently and global been orders diseases. These have accelerated Mendelian has monogenic etiologies 2005). methods with genetic McKusick, detection associated distinct variants throughput with and high and phenotypes novel tools of genetic pace in the decade last the over Advancements abroad. living INTRODUCTION Indians and populations Indian the to relevant tools (https://databases.lovd.nl/shared/variants?search screening and genetic appropriate [(http://clinindb.igib.res.in) Metabolism, of ClinIndb Conclusion: Inborn-errors database, (South-Asians). owned viz. ge- a project is genome disorders, cohort created in1000 study representation to have our its linked underrepresented that than We shown or were populations also earlier Indian variants have We of known These representatives not conditions. better hereditary either netically frequencies. other were various their and and informative of cancers Results: hereditary be terms Monogenic-diabetes, to populations. in found Indian databases were distinct ˜12% public geographically phenotype GSA, in and monogenic from variants ethnic of monogenic analyzed diverse individ- profile of 2795 the representing burden frequency from Of cohorts) Illumina) the the (global-screening-array, genomics estimate dataset assessed in-house to genotype have (multiple utilized was we study uals study The information this, this Towards its Methods: of or variants. aim populations. ClinVar The underrepresented associated Indian are datasets. in public populations variants using in Indian causing sparse population disease The world are variants different methods. relevant in clinically based variants sequencing deleterious w.r.t. and and rare genotyping cataloging throughput towards high efforts concerted been have Purpose:There Abstract 2020 28, April 7 6 5 4 3 2 1 Mukerji Mitali Varma Seth Malika Singhal Khushboo Narang Ankita genomic for resource in A medicine in (ClinIndb): markers populations relevant Indian clinically multi-ethnic and rare of spectrum Frequency nttt fGnmc n nertv Biology Integrative and Genomics biology of integrative Institute Delhi and New genomics Sciences, of Medical Institute of Institute Ladakh UK India Leh London All Way, Hospital, Grafton Memorial 25 Norboo Health Sonam Women’s (IGIB) for Biology Institute Integrative UCL & Biology Genomics Integrative of & Institute Genomics of Institute CSIR by 1 %D2Mhmd2Frq2),t epciiin n eerhr ndanss oneigaddvlpetof development and counseling diagnosis, in researchers and clinicians help to =%3D%22Mohamed%20Faruq%22)], adn Jain Vandana , 1 hit Parveen Shaista , 6 1 n oamdFaruq Mohammed and , hrtrmUppilli Bharathram , 1 zaShamim Uzma , 5 rstaConsortium Trisutra , 1 rhn Vats Archana , 2 oj Sharma Pooja , 1 ie Anand Vivek , 7 1 1 1 hvn Parasher Bhavana , aaHillman Sara , 1 1 aaZahra Sana , aw Naushin Salwa , 3 am Dolma Padma , 1 1 rdaaMathur Aradhana , hnauSengupta Shantanu , 1 riYadav Arti , 4 Binuja , 1 , 1 , 1 , - Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. . act fkoldefrmttossetu n hi rqece,2)lc fsseai characterization systematic of have lack we Therefore, 2.) frequencies, populations. their Indian and in spectrum conducted of studies for implementation knowledge earlier of in for paucity size challenges sample unmet 1.) low few either or a Primarily, India are of populations. there populations genetics Indian these clinical Despite in of needs research medicine India. the genomic genomics in meet translational to system enterprise and healthcare commercial public basic of and in segment funded hospitals government tertiary information of from unit level ongoing genetics are laboratories, genome efforts coun- Asian wide Asian of country different Multiple representation from adequate individuals 2019). for 1700 Consortium, over need (GenomeAsia100K of the data databases data (GAsP) highlighting level thus Project genome 100k tries, covered ail- GenomeAsia disorders, comprehensively neurogenetic storage published other a Lysosomal recently and provided infor- in genodermatosis A aggregate studies immunodeficiencies, that reports (http://guardian.meragenome.com/). primary disorders case of ments genetic multiple disorders based Indian and sequencing of dysplasias single knowledgebase skeletal and representative Dystrophy NGS also et Muscular from now (Pradhan mation Duchenne is cardiomyopathies anemia), There disorders, on Mitochondrial cell 2010). information ataxias, sickle made al., provides spinocerebellar and been v1.0 (CF), have (thalassemia database fibrosis contributions cystic noteworthy hemoglobinopathies disease (DMD), Other Genetic of (#IGDD). Indian genetics patients South The Indian the 2018), 3500 others. in al., over few (INDEX-db) et in individuals a (Hariprakash disease Indians and genetic Exome) south 2019) 1000 and the 2015), al., Consortium, Genome With P. informa- et Asian variant (G. P provide (South Project populations. (Ahmed to SAGE Genomes efforts Indian 1000 - additional from of scale diverse in genomes wide spectrum Asian the put genome have the across groups the understand research out at et global to tion carried other Sinha efforts Indian not 2009; systematic NGS, have Habib, P. of platforms variants advent 2007; & throughput monogenic al., Agarwal, high et Arya, and of Gupta 2007; Sinha, Mendelian availability 2010; 2009; al., limited al., al., et to et et Biswas Grover Due Kumar A 2017). 2014; 2015; al., al., 2008; al., et et al., Talwar et Giri et 2008; Kanchan 2011; al., Bhattacharjee al., 2012; et 2010; al., Chaki et al., and related 2010; Jha et adaptation associations al., Aggarwal altitude et days disease high 2015; Biswas et pre-NGS studies, and Arindam al., during Thanseem cutaneous pharmacogenomics et demonstrated al., neurological, 1999; HIV), (Aggarwal various been al., et for , disorders has et (Bamshad populations variability example, Majumder at-risk baseline populations (For of 2003; specific Indian diseases identification India al., diverse infectious an et of in of Kivisild based utility - pool 2011; haplogroup The distinct Agrawal, Y- are 2006). & and of al., group Khan, mitochondrial characterization there Ahmad, addition, AA in and Borkar, In and helped admixture 2001; DR 2008). of also Consortium, from degree have V. large specifically studies a G. on populations, exhibit (I. based isolated to unique (TB)) known however, and Tibeto-Burman are genetic sub-clusters, determinants large four and multiple major into IE (IE) are broadly are and Indo-European divided DR language be (DR), and classification. Consortium, can Dravidian ethnicity ethno-linguistic V. populations that (AA), G. Indian revealed (I. (Austro-Asiatic that analysis populations highlighted clusters Indian studies Genetic diverse These For 55 2010). geography. of across diversity. than al., polymorphisms regions genetic et associated single its Narang disease of recent catalogue understand to 2008; in a map events to provided that Patterson, founder have undertaken efforts genes for Thangaraj, level been 900 reservoir consortium Reich, a have a 2008; and IGVdb efforts in et gene-variant-pool Consortium, genomic instance, unique (Basu V. nationwide a population G. provides extensive local by I. This with past, shaped 2016; 2009). admixtures been Singh, Majumder, also migration & also & as of Price, Sarkar-Roy, has art history Basu, and vast populations 2003; trade its the clines, of al., of geographical exchange consanguinity, diversity intercontinental and during ancestries genetic endogamy shared events the as and lineages Further, such linguistic factors and socio-cultural populations. cultural ethnic, global diverse many from diagnosis people with rapid billion facilitate 1.3 delivery of that healthcare comprises India populations guided need across medicine solutions. urgent biomarkers genomic healthcare an for genomic gen 2 u to due ako ersnaino ieetethnic different of representation of lack Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. aaaotgne,aeaddsaesau fsmlsicue nCrimdadGMDchr sprovided is cohort GOMED and Cardiomed included. in were setting, included disorders clinical samples hereditary in of various status chip (AIIMS, of disease genotyping in regions investigations and throughput age genetic Indian high gender, for North of about altitude clinicians utility from Data high from the lineage from referred understand IE subjects subjects to coro- Indo-European 19-40 included 287 study: of and study from GOMED 552patients Leh HAPS were iv) from The and Delhi). subjects iii) (TB) individuals based lineage the details). comprised897healthy control -Burman cohort of it case for Tibeto for group and (Table-S1 recruited disease age disease subjects artery artery 1449 the conditions. nary included coronary and cohort morbid for CARDIOMED genders non-mendelian ii) study other both GWAS 2017). related of al., and et representation (IPGTRA) health (Prasher –West equal various years IE of near (KLE), study was DR-South genetic (CBPACS), (IE) There for various IE-North Indo-European (JBR) for from from following IE-East asso- individuals lineages (in-house) from and included phenotype linguistic datasets studies cohorts (DR) mongenic genotype Ayurgenomics cohort Dravidian The and TRISUTRA and based conditions. i) Mendelian physiological genomics analyzed: of related were ongoing other analysis cohorts from or frequency polygenic outcome) populations and or pregnancy Indian non-mendelian curation and in at adaptation variants aimed (Hypoxia cohorts (n=287). ciated cohort HAP study Ayurgenomics study diverse TRISUTRA this GOMED 3.) represent Since, 1.) 4.) 1449), cohorts and – (n= these ) were cohort four (n=438 in cohorts CARDIOMED of cohort included These part study were subjects 2.) India. who The 958), of subjects (n= background Indian studies. geographic 3132 cohort and of ethnic based inc.) GWAS Illumina genome- array, in-house from variants screening different pathogenic (Global likely and data pathogenic genotyping reported ClinVar wide of frequencies the analyzed have We details subjects/samples Study much genetically opportunities provide is METHODS to cohort dataset AND relevant our MATERIAL genome clinically that 1000 research. of demonstrate in future information to populations for the Asian able gaps register South were and representative and we than catalogue Further, Consortium diverse to (GenomeAsia100K more population. database i GenomeAsia100K Indian unique and study Aggregation a for 2016) Exome our variants created al., The have in The and et We 2015), higher 2017) Karczewski Consortium, Francioli, (2019). J. P. is & (G. (K. variants Karczewski populations (ExAC) (K. diverse pathogenic Genome Consortium covers (gnomAD) 1000 SAS like it Database of repositories Aggregation i) Mendelian global other Genome representation : other and (iii) as with disorders compared unique hematological populations, distribution when Metabolism, and frequency Indian of novel provides errors ii) is in Inborn subjects, study disorders healthy for our variants 2795 ge- of of pathogenic size known content affordable sample of the and large Screening with brief, Global throughput cohorts using In high variants Indian annotated multiethnic a Illumina. clinical utilized global from 19,538 linked study (GSA) over disease of Array Our information monogenic provides of (n=2795). that novel catalogue tool populations comprehensive nomics of Indian a characterization provides diverse study in our 4.) variants primarily, issues disorders, genetic these of genetic address knowledge prevalent To of other scarcity and 3.) phenotypes disorders, OMIM mutations. monogenic 7000 various to of linked spectrum mutations pathogenic known of o ,3 ape eelae nteGnmSui eso . o lseigadcliggntps Out genotypes. calling and data clustering Raw for processing. 2.0 data version illumina GenomeStudio for the practices in best loaded follow were to tried samples we 3,132 the errors, for per calling GWAS as genotype ClinVar, reduce from performed To markers were ( important PharmaGKB. Experiments clinically from Screen- of information probes. representation Global pharmacogenomics genome-wide has Infinium and chip 642,824 throughput This high contains the that protocol. using manufacturer’s 1.0 system version iScan Array Illumina the ing in performed was (QC) Genotyping control Quality and processing Data Genotyping, al S1 Table . 3 Figure-1 ) Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. h ain sdltrosi IT osbyDmgn P oyhn, Polyphen Probably n Uncertain SIFT, (P) of Damaging in Variant Possibly deleterious assigned SIFT, - been in VUS-II. criteria has deleterious following is 3 with variant of , pathogenicity the Polyphen score predict in A tools (D) Polyphen three Damaging 2010). scores, all CADD Hakonarson, if used & com- We Significance a Li, effect. using such (Wang, their analysed other ANNOVAR and ascertain and selected to from were pathogenicity tools affects” of three factor, interpretation” of risk bination uncertain association, pathogenic”. or significance, Likely manuscript. “uncertain “no or as this and query “Pathogenic keywords significance throughout of “conflicting” our clinical pathogenic keywords filter narrowed as used keyword with further designated Variants we applied are we variants, then variants variants these and pathogenic relevant retrieve database likely clinically ClinVar To or in and Pathogenic variants. given rare pathogenic 1 [?]0.05. only Clin- likely frequency of in was allele RSID value and matches alternate dbSNP analysis pathogenic exact with and our with only Indels) coordinates alleles and of to (SNPs those the focus variants only Both 19,538 the retained selected we . As we variants, QC, 2015) multi-allelic manual al., of After case et Var. In (Landrum 2019 used. (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab were 1, database ClinVar names in April variants the on with database markers. downloaded subjects ClinVar delimited/, our common with in genotyped for GSA variants the in data mapped variants We the relevant / merge important to F clinically used of plots. populations. Mapping was PCA among customized PLINK differentiation study, create genetic this to compare data. in used to samples IGV was used i.e. R reference datasets in as three (n=2,504). package across well data ggplot common genomes as markers 1000 161,484 genomes of with used representation 1000 we as the analysis, well compared PCA as We perform samples To study. were study the1000 samples Consortium our These in (IGV) with diversity. Variation diversity ethnic populations genomic Genome genomic Indian Indian Indian with Indian diverse of OMINI of representation w.r.t cohorts on part a genotyped subjects study as individuals as study healthy unpublished) collected our 471 our (Data of of of dataset Inc. to mapping reference Illumina structure package genome-wide genetic array, used For genetic EIGENSOFT we 2006).. groups, the in linguistic al., and compare module et (Price smartpca to project using as genomes performed well was as (PCA) elucidate analysis component Principal genomes subjects 1000 of the structure/architecture indi- –freq genetic in 2795 of defined using missing in Assessment allele genotypes. computed variants of alternate heterozygous were and 597,979 the rate variants (ref/alt) to included high the homozygous respect set of with the with final frequencies Individual computed Allele was QC, Frequency (http://grch37.ensembl.org/Homo all genotypes. computed. Plink. After missing was in of frequency option two. rate which For the (IBD) for the Identity-by-descent among PI viduals function. on of excluded with –genome proportion based was using individuals estimate was genotypes samples 181 to samples removed among pruned of We LD relatedness exclusion were for individuals. 0.1 checked two –maf also between with we SNPs that addition, autosomal SNPs p-value In removed this, with We excluded SNPs. values, level. were missing rare variants sample 10% for 12,970 with at (HWE). samples exclusion format, equilibrium and no variants Weinberg PLINK < was the Hardy the there out from in while filter deviate removed output to significantly were 2015) retrieved SNPs al., and et 1,373 8 (Chang additional used of 2012). was al., threshold (v1.9) et was z-score PLINK (Goldstein that used caller analysis variant We and a QC is SNPs. for Zcall rare preferable (Version3.4). zcall calling by of for processed rate suited was call criteria GenomeStudio genotype through passed filtered that score samples GenTrain with 2,976 SNPs were there 3,132, of 10 -6. eapidHEfitrol o omnSP –a .5 si sntaporaet s hsfilter this use to appropriate not is it as 0.05) (–maf SNPs common for only filter HWE applied We > 07 Nswt GenTrain with SNPs =0.7. > 2 ADadti ecasfida VSI.Asoeo . a sindif assigned was 2.5 of score A (VUS-I). as classified we this and CADD =20 ain/noIdx.A nhuedvlpdpr citwsue ocount to used was script perl developed in-house An sapiens/Info/Index). 4 > 07aeepce ob lsee orcl.Data correctly. clustered be to expected are =0.7 HAT ST ausfo mrparslswr also were results smartpca from values > 09.Frhr eslce 612,322 selected we Further, =0.95. > > .,a siaeo B.Further, IBD. of estimate an 0.2, 2 ADadwsasge as assigned was and CADD =20 I n ITpredictions SIFT and DIV - Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. l,21) ute,w xlddte10 eoe F,ARadERsprppltosa ela the as et well Shah as 2011; populations al., super et EUR (Narang and populations AMR African AFR, super genomes and western other 1000 Indian from any the between population excluded than cline Indo-African we a population Further, admixed in super an 2011). present be EAS al., is of to to Indian majority demonstrated closer populations of to earlier IGV part is proximal was as respect which group is well with descent), TB SAS as cohorts African genomes while Though study 1000 distant our of are populations. of population populations large diversity ( super genetic IGV populations (AFR) the the IGV assess African and and and genomes compare American(AMR) 1000 to used the PCA to analysis, this cohorts In study the of analysis diversity likely Genetic or pathogenic the to DISCUSSION belong AND information. markers and RESULTS the 9853 Annotation, retrieve GSA, to HTML5, and graphs. PHP ClinVar using used the between We class. developed plot markers pathogenic database. was to 19,538 mysql end common in used front stored of was are The Out Highcharts variants all Jquery. tools. of information and source frequency Javascript open Bootstrap.3.2.1, using 5, designed PHP was browser ClinIndb The Implementation Database resources. different using (IEM), conditions metabolism hereditary of other errors and Inborn cancers disorders: hereditary complex and fibrosis, rare Cystic associated MODY, variants and genes of Annotation W losotitd3 ainsrlvn rmpamcgnmc esetv hc r agdwt the with tagged are which perspective pharmacogenomics from relevant variants disorders 30 for shortlisted analyzed also We were ([?]5) 6. occurrence high with variants conditions, hereditary other Among 5. Grln ainsi eeiaycnes ito 5 eei ainsi 9cne rdsoiggenes predisposing cancer 99 Fibrosis Cystic in with variants associated Variants Genetic Genetic 851 of 4. List cancers: Hereditary in Variants Germline two 3. from compiled is data This genes: associated (MODY) young the of database diabetes Initiative onset Monarch Maturity The 2. from retrieved were classes IEM different with associated Genes 1. ewr du epne nCiVr( ClinVar in response” syndrome “drug Lange keyword de Cornelia disorders, Cardiac disorders. disorders, syndromic using neuromuscular other estimated and other was variant and this haplotypes Neurological of inferred viz. segregation the 2001) the Donnelly, of we identify & frequency and to Smith, SNPs The (Stephens, SNPs tag algorithm identify tag backgrounds. PHASE to (209) with haplotype variants used variant those was datasets different F508del Tagger selected genotype on frequent first populations. These We less European populations. included in project. of [?]0.05 also haplotype genomes group of performed 1000 major frequency we have four from gene, the that variants CFTR for 4389 separately in investi- on divided To were data F508del database. common genotype CFTR2 most using in of analysis frequency origin ˜70% Pheny- haplotype has causing the which (CF) (rs113993960) gate Fibrosis Cystic deletion classical (F508) transmem- included 508 This fibrosis lalanine cystic gene. (CFTR) from regulator variants conductance pathogenic brane 28 prioritized We 11March2019%20%281%29.xlsx. rm8,6 ains(onye l,21) aawsdwlae rm-https://www.cftr2.org/sites/default/files/CFTR2 - from downloaded was Data gene 2013). (CFTR) al., regulator et (Sosnay conductance patients transmembrane 88,664 fibrosis by from cystic study in variants the pathogenic in reports which provided is cancers hereditary with associated are that B: MODY. with Source MIDD associated and like genes form lipodystrophy 35 of related partial annotation its or or deafness) MODY and 2018 in diabetes (https://www.diabetesgenes.org/tests-for-diabetes-subtypes/a- implicated inherited DiabetesGenes genes, (maternally 33 – houses A new-test-for-all-mody-genes/) four Source under to defined related subclasses sources. as IEM in well for provided as is genes metabolism class energy unique IEM and every 419 thyroid different acid, amino 2016). carbohydrate, al., classes- et (Mungall (https://monarchinitiative.org/) opldadcasfidgnsit 4MD utps(idu ta. 2018). al., et (Firdous subtypes MODY 14 into genes classified and compiled al S7 Table Figure-S2 al S2 Table 5 al S5 Table ). (Figure-2). and .A xetd 00gnmsErpa (EUR,) European genomes 1000 expected, As ). iueS1 Figure al S6 Table FR hts/wwct2og)database (https://www.cftr2.org/) CFTR2 : GWI a ugoppplto of population outgroup (an OG-W-IP . . un tal. et Huang al S3 Table al S4 Table iru tal. et Fidrous provides - Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. ncroyrt eaoim ain r27088 nGG eehsfeunyo %(rqec f = (f) (frequency 3% of frequency has well gene hypoplasia. thyroid as GYG1 with Clindb in in associated (rs267606858) associated is frequency variant variant is asymptomatic 0.01% a This variant of majorly metabolism, study. This frequency is carbohydrate under has and In databases databases. SLC2A4 higher other in global be in variant other populations to a (f=0.03) in report SAS metabolism, reported gene populations as thyroid doesn’t is BTD In SAS databases deficiency in as nature. partial other (rs13078881) well in whose variant while as deficiency frequent This cohorts cohorts Biotinidase another our our with of homocystinuria. in Frequency in with (f=0.04) similar variant. 4% associated is this pathogenic screening of is for these a genetic frequency frequency gene metabolism, of in carrier any acid (CBS) Majority disorders amino highest beta-synthase In IEM classes. has interventions. cystathionine in variant IEM dietary in different implicated or variants in treatment (rs5742905) amino variants early of and variant by pathogenic energy inclusion patients of carbohydrate, Therefore, benefit thyroid, list might to provides treatable. related – Cystic are Table genes (MODY), disorders compiled young classes. have We the IEM acid of metabolism: of diabetes errors onset different Inborn to Maturity related genes metabolism, cancers. in of hereditary frequency carrier and errors 5 fibrosis Inborn [?] with classes: variants IEM pathogenic phenotypic different compared to and analysed related have below. We are discussed populations that is Indian frequency categories in amino carrier variants IEM relevant of different high Clinically errors in with variants Inborn genes of category: other distribution gene, The few (CBS are classes. beta-synthase there ( cystathionine metabolism), 1% from acid [?] Apart frequency frequencies. carrier lower with highest is genes Clindb 12 of between found of difference We difference frequency difference overall frequency average that frequency revealed the average ( analysis compared the GAsP Our also, than compared and, databases. we study other Clindb, under with of databases GAsP other estimates and frequency Clindb carriers of between of reliability number the with variants evaluate include To if even GAsP. higher and remains 9853 number ExAc of This gnomAD, > distribution genomes, databases. variants Frequency 1000 other pathogenic in to populations. groups comparison 9853 Indian SAS known with diverse of 4a compared in Figure was spectrum ClinVar) Clindb frequency in in variants houses variants pathogenic mapped that 19,538 “Clindb” of databases resource (out genomic a other created and have Clindb We in mutations pathogenic known of Spectrum 1kg than cohorts study 3). our our (Figure in to groups closer isolated more DR genetically are and AA populations 1Kg F of than However, Representation populations lacking. cohorts. IGV also our genomes is in 1000 samples as populations) well study altitude as (high populations IGV group with TB cohorts group. study group. EAS our and SAS of samples diversity SAS in of genetic published estimations number the recently frequency less compared we comparatively bias Also, Lastly, has might moreover, which 2016). and (n=724) Ramsay, group group & TB SAS from Basu, in representation Choudhury, lacks (Sengupta, EAS project findings GAsP genomes. 1000 our the substantiated in (1kg also under-represented group are SAS cluster S3) genomes majority genetic 1000 (Figure that TB the observed as to well (1kg clearly proximal as We group are populations populations structure. isolated large DR genetic DR and map and fine IE to the of (OG-W-IP) population outgroup Indian .Ti eesttsteueo ag n ies oot rmIda ouain nfrhrgnmcstudies. genomic further in populations Indian from cohorts diverse and large of use the necessitates This 1. hw htCid a aiu nqevrat 12)wt rqec fptoei ainsin variants pathogenic of frequency with (1128) variants unique maximum has Clindb that shows – A)i 00gnmsi eeial itntfo ouain nT lse (F cluster TB in populations from distinct genetically is genomes 1000 in EAS) iue4b Figure nerpeetto fIda eoi iest n10 eoe a ale eotdand reported earlier was genomes 1000 in diversity genomic Indian of Underrepresentation . iue3 Figure ). A.Mr pcfial,A n Rioae rusa ela Blwaltitude low TB as well as groups isolated DR and AA specifically, More SAS. hw ehv ersnaino EadD ag ouain swl sfrom as well as populations large DR and IE of representation have we shows ST nlsssget htorsuychrsaemr rxmlto proximal more are cohorts study our that suggests analysis al S8 Table 6 .ML,CSadZR1aetpgnswith genes top are ZGRF1 and CBS MBL2, ). SAS A) oee,AA However, SAS). ST =0.01-0.02) Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. s4694 nAA1gn a rqec f00%(=.0)i lnbwieteei orepresentation no is there while Clindb in in variant Mutations (f=0.004) metabolism, 0% 0.04% energy of in of Similarly, frequency frequency populations. has disorders. global has reported XV from gene resources disease ACAT1 global storage in glycogen in rs148639841 with populations associated SAS are while GYG1 cohorts our in 0.03) aiaino eoyn eut eedn ydigGAgntpn fslce mpe nrpiae (n=13) replicates in smaples selected of GSA-genotyping doing by done were results genotying of Validation Validation Vrat ngn eae oohrsse iodr:teosre aao h ooei ies related disease monogenic the of data observed the disorders: system other to related gene in Variants 4. genes) predisposing cancer 99 (in variants genetic pathogenic 853 mapping On Cancers: Hereditary 3. con- transmembrane fibrosis cystic from variants pathogenic 6 prioritized have We Fibrosis: Cystic 2. autosomal an as inherited diseases the of one (MODY), young the of diabetes Maturity-onset MODY: 1. f1 ahgncvrat ngnsrltdt ada rhtma Canlptygenes, (Channelopathy arrhythmias cardiac to childhood occurrences rare related heterozygous various genes 133 of in instance, cases For variants KCNQ1 confirmed population. pathogenic genetically Indian 11 in the disorders of of Mendelian ( occurrence onset etc adult true heart and their brain, and organs, knowledge critical the GJB2, other for and responsible (f=0.005) genes 2 familial MAP2K2, cancer, our Breast-ovarian (f=0.01), than BRAC2, Neuroblastoma (f=0.005). retrieved less ALK, we and related genes, in is deafness (f=0.007) predisposing frequency which cancer types 99 carrier genes) n=5, tumor (Familial mapping high (13 to CHEK2 On multiple with 18variants and respectively. n=1 variants related) 4 of from pathogenic and (Deafness set 5 ranges GJB2 26other are retrieved in variants’ genes variants we these breast) for of in carriers dataset, cancer of carriers our Number of with threshold. Number cases described 10,389 study. in our subtypes in CFTR of cancer function 33 of from loss complete severe less a has is function there of where loss fibrosis, complete CFTR sino- cystic to in gene. classical lead and mutations to doesn’t infertility of compared that type male Mutations as the insufficiency, phenotypes 2001). upon pancreatic Knowles, variants depends & example, CFTR phenotypes global (Noone For CFTR The gene in in in absent databases). Heterogeneity phenotypes. deletion nearly global other classical disease. or in pulmonary to this less groups linked as SAS either frequent also in is as are variants (f=0.001-0.03 Fibrosis are these rs193922500 Cystic CFTR recessive of expect autosomal in Representation databases of variants cause (f=0.001-0.002). other causes common cohorts that most It our suggests the database. be analysis CFTR2 to Our Phenylalanine in known frequency causing (CF). is ˜70% (CF) and has CFTR Fibrosis which of Cystic f=0.0016) misfolding classical (rs113993960, included deletion This (F508) 508 gene. (CFTR) patients. not regulator in does ductance fewer underrepresented and on non-consequential remains conducted are therefore, been variants have and pathogenic studies intervention so-called earlier represented pharmacological the the more need that that are necessarily possible is mutations is possible GCK It Equally However, earlier. samples. are 2018). reported study than al., recent cases cohorts et a the our However, (Mohan in of in (˜7%) populations. Majority patients mutations European among in GCK of databases. mutation mutated from observed global diabetes HNF1A ranging they in or (Maturity-onset frequency that GCK absence having gene highlighted either variants with have GCK These study be genes in criteria. to in our 14 variants known frequency in rate screened three defined frequency prevalence We our found have high with We 4). 0.0009-0.04 has 2) (Table type It subtypes subtypes. young, many MODY the 2013). has 14 and Thomas, sometimes to heterogeneous & or related genetically misdiagnosis Arulappan, is of Chapla, because Indians, either (Nair, Asian underrepresented undetected is remains diabetes of it form This trait. dominant ahgncvrat eesttsterlvn dnicto fptet arigsc eet nclinics diagnosis. in rapid syn- defects for approach such various centric carrying to mutation patients linked rare appropriate multiple of genes allow of identification also data the relevant and frequency in the this observed necessitates Thus variants were disorders. pathogenic hemoglobinpathies calls and heterozygous in renal neurological, multiple mainly dromic, genes; addition dystrophy In muscular various observed. in count heterozygous 145 and SCN5A ,crimptis( cardiomypathies ), MYBPC3 7 < and %o ninptet n N1 scommonly is HNF1A and patients Indian of 1% TNNT2 Table-2 eeosre.Svnvrat with variants Seven observed. were ) .Tu,i ihihstegpin gap the highlights it Thus, ). CAPN3 and SELENON KCNH2 were , Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. 00gnmsadGs.Teeoe rqec siae rmti td r niiae ob oereliable. in more group be SAS to even anticipated population are study world this any included ClinIndb from than Cohorts named estimates populations frequency populations. compendium IGVC Therefore, Indian GAsP. of a diverse and representatives developed genomes in closer 1000 variants we are relevant study knowledge, clinically this this of in data with frequency researchers houses and which variants clinicians cataloged also benefit we To of addition, In part not users. 5,984 the are of variants for S8). Out database (Table These perspective the effect. pharmacogenetics detrimental. in their from be provided ascertain important is to to are list significance predicted a that tools clinical are but predictive whose that analysis three variants variants main of those 3,751 the consensus of are effect used there only We variants, the as such evaluated ClinVar. well also in as we uncertain variants, cataloging is pathogenic the based to estimate literature to addition from required In estimated are frequency accurately carrier be uninformative of cannot data. / estimates patient which reliable informative load that using for fact mutation costs researchers the real screening and highlighted is also clinicians genetic study study reducing benefit This this in will Importantly, which aid catalog. chips. may studies our genotype screening as from or well genetic estimates assays as conducting pathogenic commercial rare making for variants in specific decision guide pathogenic as population known priori well including of a and ˜88% as an for cataloging databases frequency of public of need prior- in absence for the variants hand, candidates demonstrate other important populations and are the Indian frequency On Neuro in carrier testing. and high genetic with diabetes in variants itization pathogenic monogenic that in genes, suggests metabolism analysed study associated prevalent This have of underrepresented cancer we errors of hereditary Inborn knowledge example, comprehensive in of elaborated disorders. a variants classes Neuromuscular an provides, genetic different study As and common our fibrosis addition, and populations. cystic In Indian mutational in context. in the mutations relevantIndian evaluating disorders clinically of by complex or applicability spectrum pathogenic its and frequency known showcases rare study of Our burden in mutational load populations. the Indian or estimated different and and in cataloged variants diagnosis we study, clinical have this referred could In ( GSA the cohort application, overall for genetic our variants ( clinical from diagnosis For clinical identified DISCUSSION the variants chip. of at pathogenic GSA arrive representation common on could low for we mutations application to samples specific markers further patients’ clinical Indian due (3%) of known of 287 be for diagnosis abundance of rate could low clinical 9 positivity various This only low for very for total a ). referred In us were yielded chip. which cohort, GSA GOMED on samples under patients diseases database of genetic in investigations rare as genetic well based as GSA supplementary category in VUS provided of variants Utility is unique Clinical list 1974 (hg19), This are browser There genome cohorts. UCSC provided. our to been S11. Links in also as frequency database. such has the have options dbSNP explore which search and to various Genome rsid using batch 1000 researchers gene, ExAC, and batch clinicians rsid, facilitate location, to gene, database this developed have We concordance genotyping database 99.69% ClinIndb shown, analysis have concordance variants samples The of two set test. which common significance S10. of a Table HWE out 1274 score (n=48), data, quality data Exome genotyping sequencing and poor exome GSA to between whole due the eliminated with been analysis has concordance by also and ainso neti infiac eeas vlae n aaoe ntedtbs o users. for database the in cataloged and evaluated also were significance uncertain of Variants 8 Table-2 Table-S12 ) Table Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. gawl . ei . h,P,Snh .K,Sodn . ah,M . uej,M 21) EGLN1 (2010). M. types Mukerji, constitution . extreme . of . analysis Q., M. genetic Pasha, through T., . revealed Stobdan, in K., adaptation defined P. high-altitude Singh, P., in Jha, genetic involvement approach. S., Combined Ayurgenomics Negi, by (2015). S., revealed M. Aggarwal, hypoxia Mukerji, in 13 & outcome medicine, B., thrombotic translational Prasher, modulate of VWF S., Journal and Ghosh, Sara. EGLN1 A., from of Agrawal, hear effects A., I Gheware, once Leh S., for Aggarwal, no assistance grant financial the the give all would for REFERENCES I MLP901 and supported grant CSIR project facility. a genotyping as their for IARI acknowledge We work. respective their – for Data Variation Genome Indian work. respective analysis, their acknowledged. and for duly genotyping edged are quantitation, Cohort Khanna QC, Ayurgenomics Sangeeta Sample Kunj, TRISUTRA For Kalpana and AIIMS. Bhattacharyya and Aniket Leh Mukerji, from Mitali coordination and collection BSC0122, – MLP1802, Cohort MLP1601, - HAPS grants from funding restrictions the ethical acknowledge or Authors privacy to due available publicly not ACKNOWLEGEMNTS are data The author. sponding interest of Sharing: conflict Data no is there that declare Authors ailments. INTERST determining genetic OF in suspected and CONFLICT diagnostics with genetic patients rapid 2010 evolving in after for variants implications evident actionable direct quite has clinically study is our relevant growth of clinically content this yet. data of India, The updated growth in not and exponential is sequencing been database generation comparison has data this next Frequency there Importantly, and of Moreover, computed. advent biases. be absent. the from cannot after is 1993-2010 variants suffer estimates populations during might frequency global published carrier studies other literature therefore individual with from genetic patients, These collated from Indian was collated communications. data limitations. is personal This own in as diseases 2010). their well 52 al., have as from et they mutations (Pradhan however 6647 individuals catalogs of 5760 data such houses create (http://www.igdd.iicb.res.in/home.htm) to database disease efforts earlier were There Mtl uej,Bnj am,Rsn hms nbuiTiatiadAkt aag-ON data OMNI - Narang Ankita and management Triparthi and Anubhuti development Thomas, resource Roshni for Varma, Mukerji Binuja Mitali Mukerji, and Mitali Consortium 2. Variation Genome Indian study 1. field the supervising for Prasher analysis Bhavana and and genotyping study array genomic GSA the for supervising Khanna for Sangeeta Mukerji quan- and repository, Mitali Singhal DNA 6. Khushboo processing, and Vats, coordination, Archana identification collection, 5. sample sample for establishment, Yadav Arti field and for Varma Binuja Mukerji 4. Mitali studies Prasher, field Bhavana from study development Varma, the resource Binuja of for 3. coordination consortium for Ayurgenomics Mukerji TRISUTRA Mitali 2. and Prasher Bhavana 1. eoyigadanalysis and genotyping titation collection h aata upr h nig fti td r vial nrqetfo h corre- the from request on available are study this of findings the support that data The eakoldetecnrbto fMtl uej n hvn rse nSample in Prasher Bhavana and Mukerji Mitali of contribution the acknowledge We rceig fteNtoa cdm fSine,107 Sciences, of Academy National the of Proceedings 1,184. (1), otiuino niiul mnindblw sdl acknowledged duly is below) (mentioned individuals of Contribution otiuino niiul mnindblw sdl acknowl- duly is below) (mentioned individuals of Contribution – 9 4) 18961-18966. (44), Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. osrim .G .(08.Gntclnsaeo h epeo ni:acna o ies eeexplorati- gene M susceptibility. D., disease P. TB for Helden, in canvas Van interactions a L., gene-gene Merwe, India: der of Van people M., Daya, the of landscape Genetic (2008). variation. on. V. genetic human G. for I. Second-generation reference Consortium, (2015). global J. A J. datasets. (2015). Lee, richer P. & and M., G. larger S. Consortium, of Purcell, challenge S., Vattikuti, the C., patients. to L. Tellier, 1 rising C., PLINK: type C. Chow, albinism C., C. oculocutaneous Chang, Indian 131 among Molecular dermatology, (2011). variants K. investigative Ray, tyrosinase and & of R., of Journal MBL2 Bhadra, S., studies of Mallick, functional A., polymorphisms Bhattacharya, and M., Genetic Mondal, (2018). M., Sengupta, X. M., Chaki, Cheng, & studies. case-control D., of Wu, 22 lineage 1212. of C., (6), Y-chromosomal Wu, meta-analysis of a Z., susceptibility: Cao, spread X., Paleolithic Wang, (2011). Y., S. Cao, Agrawal, India. northeastern & and F., eastern Khan, in (2010). F., tribes V. Ahmad, G. I. M., Consortium, Borkar, . . patients. . disease K., polymorphisms: Ray, Parkinson’s K., Parkin Indian 167-171. S. (2007). in (3), Das, variants J. K., A. Ray, PINK1 Misra, of & S., Evaluation K., Majumder, Ray, T., Sadhukhan, V., A., population. G. Biswas, Indian I. in Consortium, disease S., Parkinson’s primary for Das, to risk G. M., predisposition I. Maulik, towards Consortium, A., factor . Biswas, susceptible . a . A., as Ray, CYP1B1 A., glaucoma. in Banerjee, open-angle polymorphism M., Acharya, Leu432Val structure. S., po- (2008). Mookherjee, extant complex V. D., of a Banerjee, history and A., the Bhattacharjee, components of ancestral 113 reconstruction Sciences, distinct Genomic of five (2016). Academy reveals P. National P. P. India N. Majumder, of Bhattacharyya, & pulations . N., Sarkar-Roy, . structure. and A., . peopling Basu, M., to Chakraborty, reference S., special Banerjee, with 2277-2290. view, S., genomic (10), Sengupta, a S., India: Roy, Thailand- Ethnic northwestern (2003). N., . the Mukherjee, . on A., . phenotypes P., Basu, and Charunwatthana, A. genotypes M., Rasanayagam, G6PD D. . Parker, of . border. N., Characterization . Myanmar Chowwiwat, (2014). B., R., H. B. Somsakchaicharoen, F. Rao, S., Nosten, populations. E., C. caste Chu, C. Indian Ricker, G., of E., Bancone, tuberculosis. origins M. the in Dixon, on polymorphisms S., evidence gene W. Genetic Watkins, immune (2001). T., Innate Kivisild, (2012). M., S. Bamshad, L. Garcia-Barcelo, Schlesinger, 80 . & immunity, . W., and . Infection Sadee, S., K., A. Borrego, disease Anti˜nolo,Azad, G., diseases—Hirschsprung J., complex determine Amiel, variants model. W., common a and R. as (2019). rare Brouwer, of A. Contribution Y., Consortium, (2013). & Sribudiani, M.-M. O., M., Mukherjee, M. S., Alves, I). M. (Phase Rao, Database S., Reference Jain, Exome B., 225-234. Indian Viswanath, P., The R. INDEX-db: More, H., P, Ahmed ora fgntc,87 genetics, of Journal eeomna ilg,382 biology, Developmental lSoe 9 one, PloS oeua iin 14 vision, Molecular 1,3-20. (1), 1) 3343-3359. (10), 1) e116063. (12), 6,1594-1599. (6), 1,320-329. (1), naso ua ilg,38 biology, human of Annals 1,260-262. (1), 841. , lSoe 10 one, PloS lnclgntc,72 genetics, Clinical le,M,&Ha,E .(05.Ivsiaigterl of role the Investigating (2015). G. E. Hoal, & M., ¨ oller, 10 4,e0123970. (4), iacec,4 Gigascience, ora fCmuainlBooy 26 Biology, Computational of Journal eoersac,11 research, Genome rhvso eia cec:AS 14 AMS, science: medical of Archives aknoim&rltddsres 16 disorders, related & Parkinsonism 6,736-746. (6), 5,484-486. (5), aue 526 Nature, 1,7. (1), eoersac,13 research, Genome 77) 68. (7571), rceig fthe of Proceedings 6,994-1004. (6), The (3), Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. iiid . oti . esau . atn,S,Klm,K,Prk . tpnv .(03.The (2003). V. populations. Stepanov, caste . and . tribal . Indian J., in Parik, both K., persists Kaldma, settlers exomes. S., earliest 000 Mastana, the 60 M., of Cum- over Metspalu, heritage . from S., genetic . information Rootsi, . data T., D., reference Kavanagh, Kivisild, displaying M., browser: D. 45 ExAC Ruderfer, research, M., The acids Solomonson, (2016). Nucleic B., B. Thomas, B. (gnomAD). B., mings, Database Weisburd, Aggregation J., Genome K. The Karczewski, (2017). perspectives. L. Current Francioli, India: & K., in Karczewski, screening Newborn (2010). M. 219-224. (2015). Kabra, (3), India. V. & in G. S., malaria I. Kapoor, falciparum Consortium, thrombospondin . Plasmodium 34 molecules . Diseases, for adhesion Infectious . factors and & S., risk Microbiology , Awasthi, C-reactive are S., NOSII, E-selectin Sharma, encoding and S., genes Mishra, host S., in Mohanty, Polymorphisms S., Pati, the of K., Deletion Kanchan, (2012). K. S. malaria. Sharma, . falciparum . to . K. susceptibility K., 12 P. Thangaraj, impacts Singh, A., . strongly Narang, . gene T., Qidwai, . APOBEC3B K., G., Kanchan, P. S., Sinha, Kremsner, P., Jha, K., populations. P. Indian Patra, in S., germline susceptibility Pathogenic S. (2018). malaria Pati, 52-61. N. K., and Oak, variations V. . MBL2 Singh, . P., (2014). . Sundaravadivel, C., N., Oh, A. J., Jha, Wang, I., cancers. D. adult Ritter, 10,389 Y., Wu, in A. and J., variants Kashyap, genomes R. . whole Mashl, . Asian K.-l., . South Huang, R., integrating Ravi, variants R., genetic Jayarajan, of S., resource A. comprehensive exomes. Ranawat, a A., SAGE: Verma, (2018). K., K. S. Vellarikkal, M., J. Hariprakash, aoh . ct,A . megr .S,Bchn,C . cuik .A 20) nieMendelian Online (2005). A. V. McKusick, disorders. & genetic and A., genes C. human Bocchini, of knowledgebase S., 33 a J. (OMIM), Amberger, Man in F., Inheritance A. Scott, A., polymor- (2007). single-nucleotide Hamosh, K. Ray, informative & and K., mutations P. prevalent Genetic Gangopadhyay, using personalized markers. K., (2010). disease S. phism for K. Wilson Das, Kaur, directions I., of Chattopadhyay, diagnosis P., . potential Molecular Nasipuri, . and M., . Maulik, drugs A., M., antiepileptic Gupta, Gupta, first-line K., on Bala, epilepsy S., with treatment. Sharma, patients R., of Baghel, profile M., Gourie-Devi, S., Grover, humans. in (2014). W. function loss Kimberling, K. cell . hearing hair S. . disrupt progressive . protein, 328-337. Brahmachari, cause J., stereociliary Velasco, and . conserved A., evolutionarily mice . Kolatkar, an in LOXHD1, A., . Sczaniecka, in S., Mutations N., (2009). M. Tandon, J. Hildebrand, associated A., M., genes Schwander, Basu, VKORC1 N., population. Grillet, and I., Indian CYP4F2 the Kaur, CYP2C9, in S., dosage in Grover, warfarin variations with pharmacogenetic M., of across N. epidemiology discoveries Khan, Genetic genetic K., enables A. Giri, Project 100K GenomeAsia Dec;576(7785):106-111. The Asia.Nature. (2019). testing perspectives. Genetic Consortium (2018). future GenomeAsia100K R. and S. status Masoodi, & current T., young Hassan, the U., 9 Shabir, of A., diabetes B. maturity-onset Ganai, S., of Ali, K., Nissar, P., Firdous, . 1,142-148. (1), (suppl aaae 2018 Database, ) D514-D517. 1), hraoeois 11 Pharmacogenomics, lnclceity 53 chemistry, Clinical . D) D840-D845. (D1), el 173 Cell, 7,927-941. (7), 9,1601-1608. (9), 1) 2029-2039. (10), 2,3530 e314. 355-370. (2), hraoeois 15 Pharmacogenomics, 11 h mrcnJunlo ua eeis 85 Genetics, Human of Journal American The 1) 1337-1354. (10), neto,Gntc n Evolution, and Genetics Infection, neto n muiy 82 immunity, and Infection uoenJunlo Clinical of Journal European rnir nendocrinology, in Frontiers uli cd research, acids Nucleic ninpdarc,47 pediatrics, Indian aAtu Lab MacArthur h American The (1), (3), . Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. akr .K,Ilm .T,Eko,G,Hsan .A,Qdi .K,Mrdzaa,A,...Thr,S. Tahura, . . . A., Muraduzzaman, K., S. Qadri, individuals. A., Bangladeshi M. in 11 mutations Hossain, one, gene dehydrogenase G., PloS Glucose-6-phosphate Eckhoff, of T., analysis population Molecular M. Indian (2016). Reconstructing Islam, (2009). K., L. S. Singh, Sarker, & L., A. Price, N., Patterson, history. Principal K., (2006). Thangaraj, D. D., Reich, (2017). Reich, & studies. R. A., association N. Kukreti, genome-wide Shadick, E., in . M. stratification Weinblatt, 904. . for M., R. corrects . Plenge, analysis A., J., components N. Narang, Patterson, L., R., A. Pandey, Price, geographically K., and ethnically B. across populations. initiative Khuntia, Indian Indian consortium diverse (2010). A., TRISUTRA medicine: K. Kumar, Ray, stratified for & B., Ayurgenomics C., Varma, Dutta, B., K., S. Prasher, Bag, K., Bhattacharyya, database. A., disease Dutta, genetic M., Raskind, speech. . Sengupta, . of S., . apraxia Pradhan, B., childhood I. Stanaway, with L., families K. multigenerational 11 two Chapman, M., in one, variants M. PloS candidate Matsushita, Genetic Q., A. (2016). Jr, H. Nato fibrosis W. M., cystic E. with Wijsman, associated B., Peter, phenotypes disease ’CFTR-opathies’: mutations. (2001). gene R. IGVBrowser–a regulator M. transmembrane (2010). Knowles, D. & G., Dash, P. & populations. Noone, M., Indian Mukerji, diverse A., from Mukhopadhyay, (2011). A., resource V. variation Chaurasia, G. genomic D., I. R. Consortium, Roy, . A., Narang, ancestry. . . African A., of Basu, population D., Indian Dash, 89 an A., Mukhopadhayay, in V., admixture Rawat, Recent diabetes P., onset Jha, maturity A., of diagnosis Narang, Molecular (2013). India. N. in Thomas, young & N., the Arulappan, of A., Chapla, V., V. across Nair, genotypes to Bhangale, phenotypes . connecting platform . analytic and . species. data integrative D., an L. Initiative: Monarch K Goldstein, The A., the J. B., McMurry, of J., K. diabetes C. Pahuja, maturity-onset Mungall, India. in W., South variants (1999). E. in pathogenic patients Stawiski, K. identifies (MODY) analysis T., S. young genomic Sil, T. Comprehensive Nguyen, . (2018). V., . T. . Radha, N., V., im- evolutionary Mukherjee, Mohan, possible B., their Dey, and M., populations Chakraborty, Indian (2016). plications. risk. S., in M. fibrosis L. polymorphisms Banerjee, Silver, insertion/deletion cystic B., . Human-specific Roy, of . P., detection . P. C., in T. Majumder, bias Petrossian, B., population 18 Spurrier, systematic medicine, C., expose in Borroto, J., Genetics panels M. ClinVar: screening Silver, (2015). mutation J., J. Hoover, A. Targeted . Silver, . M., . R. variants. S., relevant Lim, Chitipiralla, C., clinically Chao, of G., interpretations Brown, of M., archive Benson, (2009). M., public J. S. Lee, A119C Sengupta, J., CHDH . M. of . Landrum, . association hyperhomocysteinemia. S., genes: with Ghosh, pathway C677T R., metabolism MTHFR K. homocysteine and Sanapala, in E., Sundaramoorthy, polymorphisms A., nucleotide Kumar, Single G., Garg, J., Kumar, 72 Genetics, Human of Journal 1,111-120. (1), uli cd eerh 45 research, acids Nucleic aue 461 Nature, uoenJunlo ua eeis 7 Genetics, Human of Journal European 1) e0166977. (11), e0153864. (4), 76) 489. (7263), ninjunlo nornlg n eaoim 17 metabolism, and endocrinology of journal Indian uli cd eerh 39 research, acids Nucleic 2,174. (2), ora fehohraooy 197 ethnopharmacology, of Journal 2,313-332. (2), he,S,Blo,J . ormo . rs,M,...Egltd .(2016). M. Engelstad, . . . M., Brush, C., Borromeo, P., J. Balhoff, S., ¨ ohler, D) D712-D722. (D1), M eia eeis 19 genetics, medical BMC eprtr eerh 2 research, Respiratory iclto:Crivsua eeis 2 Genetics, Cardiovascular Circulation: 4,435. (4), (suppl 12 ) D933-D938. 1), aaae 2010 Database, uli cd eerh 44 research, acids Nucleic 274-293. , h mrcnJunlo ua Genetics, Human of Journal American The 6,328. (6), 1,22. (1), 3,430. (3), . auegntc,38 genetics, Nature D) D862-D868. (D1), 6,599-606. (6), (8), Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. A rus )Aslt vrg rqec ieecso lnbadGs SS eecmae with compared were (SAS) GAsP and Clindb of across groups differences SAS compared gnoMAD frequency were and average term ExAc Absolute pathogenic” 1000G, gnomAD B) “likely databases. or groups. global “pathogenic” SAS in either populations containing SAS variants different of Number genomes A) -1000 different sign represent plus figure populations, main Figure-4: IGV the - populations. outside circles super Unfilled Keys study, EAS groups. this and isolated in SAS Cohorts of - exception circles the Filled genomes with populations: study 1000 plus this the populations, in of IGV genomes. cohorts any 1000 - by from Circles 3: covered populations distinct populations: Figure super not is different EAS group populations represent and EAS DR figure SAS group. and main - SAS AA sign the by covered Isolated outside are India. Keys populations of populations. large populations DR and TB IE from the of Majority populations. 2: Figure databse. ClinIndb shot (2012). Bioinformatics M. B. analysis. Figure-1 Neale, population . and . . genetics M., legends: Fromer, genotyping: Figure J., array-based doi:10.1093/bioinformatics/bts479 Maguire, for 2543–2545. B., 28(19), caller G. from England), variant Grant, variants (Oxford, J., rare genetic Carey, a of A., L. zCall: Crenshaw, annotation Singh, I., functional . J. ANNOVAR: Goldstein, . (2010). . H. chromosome Y Hakonarson, M., data. from & B. sequencing inference Reddy, M., high-throughput India: Li, V., of K., L. groups Wang, Bhaskar, tribal and K., castes V. DNA. lower Genetic Singh, mitochondrial the (2017). interethnicand among G., M. and affinities Chaubey, Singh, Genetic functional K., . (2006). a . Thangaraj, patients: . I., epilepsy G., Thanseem, in Arora, outcome S., treatment Grover, on R., reconstruction variant perspective. Baghel, haplotype CYP1A1 S., for of Mahendru, method (2013). contribution N., statistical J. new Kanojia, Zielenski, A P., Talwar, . (2001). P. . Donnelly, gene. . & data. regulator N., J., population conductance Sharma, from N. transmembrane Smith, H., fibrosis Yu, M., cystic Stephens, K., the Kaniecki, in F., variants 45 Goor, of genetics, Van liability Nature Variations R., disease (2008). the K. K. S. Defining Siklosi, Sharma, R., . . P. India. . Sosnay, in S., malaria falciparum S. to Pati, susceptibility N., and G. molecules 7 Jha, adhesion P., encoding areas Anand, genes in K., host residing Kanchan, in populations T., of Qidwai, differentiation S., Genetic Sinha, India. (2009). in S. endemicity Habib, malaria & (2011). high S., G. Agarwal, of A. V., Reddy, Arya, . S., . Sinha, . G., admixture. Kulkarni, P., Indian Govindaraj, with 154-161. S., descendants (1), D. African Rani, P., siddis: dataset. Moorjani, Indian R., project Tamang, genomes M., A. 1000 Shah, the underrepresen- in and diversity stratification Population genetic 8 (2016). subcontinent evolution, M. Ramsay, Indian & of A., tation Basu, A., Choudhury, D., Sengupta, 1,250. (1), okflwtesuyaddtiso eoyig aapoesn n ult otosadscreen and controls quality and processing data Genotyping, of details and study the flow Work : iest n eaens fIda ouain ihrsett 00gnmsSSadESsuper EAS and SAS genomes 1000 to respect with populations Indian of relatedness and Diversity ucetcvrg fI n Rlreppltosa ela Bcutr(yhglnes your by highlanders) (by cluster TB as well as populations large DR and IE of coverage Sufficient pcrmo ahgncVrat nCid n te eoi aaae SSPopulations) (SAS databases genomic other and Clindb in Variants Pathogenic of Spectrum h hraoeoisjunl 17 journal, pharmacogenomics The 1) 3460-3470. (11), 1) 1160. (10), h mrcnJunlo ua eeis 68 Genetics, Human of Journal American The M eeis 7 genetics, BMC uli cd eerh 38 research, acids Nucleic ora fgntc,88 genetics, of Journal 1,42. (1), 3,242. (3), 13 A srpeettv fbt nmDadExAc and gnomAD both of representative is SAS h mrcnJunlo ua eeis 89 Genetics, Human of Journal American The 1,77-80. (1), 1) e164-e164. (16), 4,978-989. (4), eoebooyand biology Genome aai journal, Malaria Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. figures/Figure-1/Figure-1-eps-converted-to.pdf 14 Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. figures/Fig-2/Fig-2-eps-converted-to.pdf 15 Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. figures/Fig-3/Fig-3-eps-converted-to.pdf 16 Posted on Authorea 4 Apr 2020 — CC BY 4.0 — https://doi.org/10.22541/au.158602488.82271538 — This a preprint and has not been peer reviewed. Data may be preliminary. figures/Fig-4/Fig-4-eps-converted-to.pdf 17