Four Novel Polymorphisms Cause Nuclear Age Related Cataract in Chinese People

Lin Wang (First Affiliated Hospital of Harbin Medical University); Wencheng Zhao (First Affiliated Hospital of Harbin Medical University); Yongbin Yu (First Affiliated Hospital of Harbin Medical University); Ping Liu (Eye Hospital, The First Affiliated Hospital of Harbin Medical University, Harbin, P.R. China)

Research article

Keywords: nuclear age-related cataract (NARC), single nucleotide polymorphism (SNP), structure

Posted Date: February 13th, 2021

DOI: https://doi.org/10.21203/rs.3.rs-219650/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background

More information on genetic variation can be obtained by exon sequencing for the diagnosis of nuclear age-related cataract (NARC).

Methods

In our present study, genomes of 12 DNA samples were sequenced. The average effective depth was 10× when using Illumina sequencing. After conducting whole-exon sequencing, we further performed depth analysis and spectrum analysis to determine the gene polymorphism sites closely associated with NARC.

Results

Among the single nucleotide polymorphisms (SNPs) in the coding region, there were 18,699 synonymous mutations and 17,975 missense mutations. A total of 4,944 insertions and deletions (indels) were found; among them, 1,329 indels exhibited polymorphism and were further analyzed. Whole-exon sequencing showed polymorphisms associated with ARC and known pathways associated with protein synthesis and metabolism. Following depth analysis (GO and KEGG analysis), we identified 20 promising candidate genes that were closely related to NARC. We further performed spectrum analysis for 26 polymorphism sites and found that ZNF573 (rs3095726, SNP), ZNF862 (rs62621204, SNP), SYNE3 (rs76499929, indel), and GAS2L2 (rs78557458, SNP) had a statistically significant relationship with NARC. The 3D protein structure showed obvious changes for ZNF573 (rs3095726, SNP) and GAS2L2 (rs78557458, SNP).

Conclusions

Our findings provide the basis for further studies and discovery of key genes associated with NARC.

Key Messages

1 For studying the pathogenesis of nuclear age-related cataract

2 We found that ZNF573 (rs3095726, SNP), ZNF862 (rs62621204, SNP), SYNE3 (rs76499929, indel), and GAS2L2 (rs78557458, SNP) had a statistically significant relationship with NARC

3 Our findings provide the basis for further studies and discovery of key genes associated with NARC.

Background

Age-related cataracts (ARCs) are characterized by visual impairment and lens opacities, and they are the leading cause of blindness worldwide [1]. Both genetic variations and environmental factors can lead to ARCs [2]. Various interacting factors have been confirmed to be involved in the complex process of cataract formation.

Both genetic and environmental factors have been confirmed to be involved in the pathogenesis of ARC [3]. Genetic polymorphism is considered to be an integral part of the genetic risk for developing ARC. Many researchers have studied the relationship between genetic polymorphism and ARC susceptibility [4,5].

Alterations in the molecular architecture of lens lead to the development of cataract [1]. The annual worldwide incidence of cataract has been estimated to be 17.7 million [2]. To date, there is no effective pharmacological treatment for cataract because it is an irreversible age-related process [3].

ARCs are classified into cortical cataract, nuclear cataract, and posterior subcapsular cataract (PSC). In cortical cataract, the cytoplasm of mature fiber cells is damaged in the outer third of the lens. Nuclear cataract is characterized by increased light scattering with yellow or brown coloration [4]. Furthermore, less than 10% of ARC cases are PSC [5].

We assessed the cataract status of participants and examined their eye structure with a slit lamp (BQ-900; Haag-Streit AG, Köniz, Switzerland) [6]. We used the standard Lens Opacities Classification System III (LOCS III) to classify cataract types. On the basis of the standard criteria, we classified cataracts as cortical (LOCS III score ≥2), nuclear (LOCS III score ≥4), or mixed type. In the present study, we analyzed blood samples of patients with nuclear cataract.

It is generally believed that genetic factors play an important role in the formation of ARC; however, the exact cause of ARC remains unclear [7]. The etiology of ARC is multifactorial, and both genetic variations and environmental factors are associated with the disease. Genetic factors are closely related to the pathogenesis of ARCs. In twin studies, the heritability of nuclear subtypes was found to be 48% [8]. Genetic variations may increase the sensitivity of the lens to environmental risk factors and may be directly involved in the formation of ARC [9].

Whole-exome sequencing (WES) is a high-throughput genomic technology that uses a target enrichment strategy to selectively capture the coding region of the genome [10-12]. WES uses oligonucleotide probes that perform selective hybridization [10,12,13], and target enrichment is achieved by capturing the entire coding region of the genome. Because exons constitute approximately 2% of the genome [14], WES provides high coverage at low cost and with faster speed [14]. After its first successful application in the discovery of candidate genes for Miller syndrome [15], WES has been adopted for investigating a number of complex and Mendelian disorders [16-17]. WES was used in the NHLBI GO Exome Sequencing Project, the Exome Aggregation Consortium (ExAC), the 1000 Genomes Project, and other projects to identify rare disease-related variants and to classify the variations in the population [18–22]. Since 2011, WES has also been frequently used as a diagnostic tool in clinical genetics laboratories [23–25].

WES is a useful clinical diagnostic tool to identify disease-related variants in patients [26–27]. Currently, to determine whether disease-related mutations in patients are related to mutations in coding regions, researchers are focusing on sequencing exons rather than entire genomes.

Recently, several studies have identified variants strongly associated with disease phenotypes [26–27] by successful application of WES technology.

However, the coverage of WES datasets is uneven along the length of exons under high-resolution detection, and this unevenness affects variant calling and can hinder the identification of new variants that may be of clinical significance.

In the present study, we analyzed and determined the key issues related to sequence structure, which lead to low coverage, and systematically studied the different parameters that may affect WES. To date, WES has become a major genetic tool, with over 100,000 exons sequenced in several diagnostic centers [28].

To fill this gap and identify mutations accurately, it is very important to improve the mapping algorithm, modify the design of target sequence capture technology, and refine the estimation of genetic disease heritability. Large numbers of insertions and deletions (indels) and single-nucleotide polymorphisms (SNPs) have been identified using next-generation sequencing (NGS) technology. In many species and human diseases, short indels are involved in phenotypic diversity and are regarded as the second most common form of genomic variation [29].

It is feasible to identify and study the molecular basis of SNPs and indels for ARC. However, to date, very limited investigations have been reported using the method of whole-genome resequencing for genes related to ARC.

The present study aimed to detect polymorphic SNP sites and indels, including short indels [1–49 base pairs (bp)], across the whole exon sequence in 8 patients with NARC. In patients with NARC, it is important to identify differences in SNPs and indels that affect functional genes.

Methods

Subjects

A total of eight subjects were recruited from the Eye Hospital of Harbin Medical University. All the subjects received comprehensive ophthalmic examinations, including vision, slit lamp microscopy, and ophthalmoscopy. None of the subjects had blood relations (at least not among the four grandparents). All the subjects claimed to be Han (all four grandparents were Han). The study was approved by the institutional review committee, following the principles of the Helsinki declaration, and informed consent was signed by all subjects.

Lens opacity grading

According to LOCS III, a trained ophthalmologist graded the lens opacity of each right eye as cortical (C), nuclear color (NC), nuclear opalescence (NO), posterior subcapsular (P), or mixed type after pupil dilation with 1% tropicamide.

Nuclear ARC group and control group

All subjects with NARC were included in this study, and the case and control groups were recruited according to the grading conditions. The following exclusion criteria were used: (1) history of diseases such as tumor, cancer, respiratory disease, kidney disease, or history of diabetes; (2) pseudophakia or aphakia in both eyes; (3) ocular surgery history in either eye; (4) complications with other eye diseases such as fundus diseases, dislocated lens, glaucoma, trauma, high myopia, and uveitis; and (5) under 45 years of age.

Blood sample collection and DNA isolation

Peripheral blood (12 ml) samples of all subjects (8 disease cases and 4 controls) were collected in EDTA tubes and stored at -80°C before use. DNA was extracted from whole blood cells by using the mammalian blood genomic DNA extraction kit (Shanghai Life Biotech Co., Ltd., China) following the manufacturer’s instructions and stored at -20°C until it was used for genotyping.

DNA library construction and sequencing

Qualified genomic DNA samples were randomly sheared into fragments with a main peak of approximately 200–300 bp by an ultrasonic high-performance sample processing system (Covaris). The DNA fragments were then end-repaired, an “A” base was added to the 3ʹ-end, and library adapters were ligated to both ends. The library was prepared by ligation-mediated PCR (LM-PCR). The libraries were hybridized with the exon chip for capture and enrichment, unenriched fragments were washed away, and the enriched products were amplified. The amplified products were quality-checked with an Agilent 2100 Bioanalyzer (Agilent DNA 1000 Reagents) and qPCR and then sequenced. We used the Illumina HiSeq platform to perform high-throughput sequencing of each qualified library and ensured that the amount of data for each sample met the standard. The original image data obtained by sequencing were transformed into raw reads (paired-end reads) by Illumina base calling software. The data, referred to as raw data, were stored in FASTQ file format.

Genomic DNA was extracted from frozen blood samples by the standard phenol/chloroform method. DNA contamination and degradation were assessed on 1% agarose gels, and the purity and concentration were tested using a NanoDrop 2000 (Thermo Scientific Inc., Waltham, DE, USA). High-quality DNA was used in library construction. For each individual, two paired-end libraries were constructed, and the read length was 2×100 bp. The sequencing was then performed on the Illumina HiSeq 2000 instrument (Illumina Inc., San Diego, CA, USA).

Read mapping and variant calling

Information analysis started with the raw data. Raw data contain adapter sequences, bases with low sequencing quality, and undetermined bases (represented by N), which can interfere with subsequent information analysis. Therefore, it is necessary to first filter the raw data to obtain clean data (clean reads). Then, by using the alignment software Burrows-Wheeler Aligner (BWA) [30,31], the clean data of each sample were aligned to the human reference genome (GRCh37/HG19), and the initial alignment result file in BAM format was obtained. To ensure the accuracy of variant detection, we followed the best-practice variant detection and analysis workflow recommended on the official Genome Analysis Toolkit (GATK) website. For the alignment results, the Picard tool [32] was used to remove duplicate reads, and GATK [33,34] was used to perform local realignment and base quality score recalibration.

On the basis of the alignment results, evaluation indices such as sequencing depth, coverage, and alignment rate of each sample were statistically analyzed. In addition, to ensure high-quality sequencing data, a strict quality control (QC) system was established for the entire analysis process. We used the HaplotypeCaller of GATK v3.3.0 to detect genomic variations, including SNPs and indels, and filtered the original variant calls to yield highly reliable results. Next, SnpEff (http://snpeff.sourceforge.net/SnpEff_manual.html) software was used to annotate the variants and predict their impact. The final variant and annotation results were used for downstream analysis. To remove noise from the sequencing data, we first filtered the raw data. After filtering, “clean data” were obtained, and the sequencing data were statistically analyzed, including the number of sequencing reads, data output, and quality value distribution. BWA (v0.7.12) was used to align all clean reads to the human reference genome (GRCh37/HG19) with the BWA-MEM method. The sequence data of each lane were aligned, and a read group ID was added to the alignment results. The criteria were as follows: overall base quality score = 20; read depth <100 for each individual; and alternative allele on either forward or reverse supporting reads >3.
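The three criteria listed above can be read as a simple per-site filter. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `VariantCall` record and its field names are hypothetical, the quality score of 20 is treated as a lower bound, and the strand criterion is interpreted as "more than 3 supporting reads on the forward or the reverse strand".

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-individual summary of one candidate variant site."""
    base_quality: float  # overall Phred-scaled base quality at the site
    read_depth: int      # number of reads covering the site
    alt_forward: int     # forward-strand reads supporting the alternative allele
    alt_reverse: int     # reverse-strand reads supporting the alternative allele


def passes_filter(call, min_quality=20, max_depth=100, min_alt_reads=3):
    """Return True if the call meets all three criteria (assumed interpretation)."""
    quality_ok = call.base_quality >= min_quality  # assumption: 20 is a lower bound
    depth_ok = call.read_depth < max_depth         # read depth < 100
    # More than 3 supporting reads on either the forward or the reverse strand.
    strand_ok = call.alt_forward > min_alt_reads or call.alt_reverse > min_alt_reads
    return quality_ok and depth_ok and strand_ok


# Illustrative calls only; these values are not taken from the study.
calls = [
    VariantCall(base_quality=35, read_depth=60, alt_forward=5, alt_reverse=1),
    VariantCall(base_quality=35, read_depth=150, alt_forward=5, alt_reverse=5),
    VariantCall(base_quality=18, read_depth=60, alt_forward=5, alt_reverse=5),
]
kept = [c for c in calls if passes_filter(c)]
```

Only the first illustrative call is retained: the second exceeds the depth limit and the third falls below the quality threshold.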

GO and KEGG pathway analysis

We then used KOBAS tools to perform Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis for the genes with identified indels.

In the process of gene function annotation, a p value of <0.05 as determined by Fisher’s exact test was considered significant. In addition, we downloaded genes related to known pathways from the KEGG pathway website (http://www.kegg.jp/), including growth arrest-specific 2 like 2 genes, spectrin repeat-containing nuclear envelope family member 3 genes, and zinc finger protein families, and we further determined whether these genes were present in the list of genes identified by WES data for integrated analysis.
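For readers unfamiliar with how Fisher's exact test is applied to term enrichment, the sketch below computes a one-sided p value from the hypergeometric distribution. It is an illustration only: whether KOBAS uses exactly this one-sided form is an assumption here, and the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, list_size, term_size, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= hits), where X is the number of genes annotated with a
    term when `list_size` genes are drawn without replacement from a
    background of `background_size` genes, `term_size` of which carry the term.
    """
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    favourable = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical example: 5 of 5 candidate genes carry a term that annotates
# 5 genes in a background of 10 genes.
p = enrichment_p_value(hits=5, list_size=5, term_size=5, background_size=10)
is_significant = p < 0.05
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that arise when factorials are evaluated in floating point for large gene lists.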

Mass spectrometry analysis

SNP typing with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was performed as follows. The target sequence was amplified by PCR, and a single base was then extended at the SNP site by adding SNP sequence-specific extension primers. The prepared sample analyte was co-crystallized with the matrix on the chip. The crystal was placed in the vacuum tube of the mass spectrometer and excited by an intense nanosecond (10⁻⁹ s) laser pulse. The matrix molecules absorbed the radiation energy, which led to energy accumulation and rapid heat generation. Consequently, the matrix crystal sublimated, and the nucleic acid molecules were desorbed and converted into metastable ions, most of which were singly charged. The ions acquired the same kinetic energy in the accelerating electric field, were separated according to their mass-to-charge ratios in a field-free drift region, and then reached the detector in the vacuum tube. Time-of-flight (TOF) detectors are commonly used to detect ions produced by MALDI: the smaller the ion mass, the sooner the ion arrives. Because mass spectrometry is highly sensitive to mass, it is easy to distinguish two gene sequences that differ by only one base and thereby deduce SNP and indel genotypes.
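The statement that lighter ions arrive sooner follows from energy conservation: an ion of mass m and charge ze accelerated through a potential U gains kinetic energy zeU = mv²/2, so its flight time over a drift length L is t = L·sqrt(m / (2zeU)). The sketch below evaluates this relation. The acceleration voltage, drift length, and the two product masses are hypothetical values chosen for illustration, not instrument settings from this study.

```python
from math import sqrt

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge=1, voltage=20_000.0, drift_length=1.0):
    """Idealized time of flight (seconds) through a field-free drift region.

    voltage (V) and drift_length (m) are assumed, illustrative values.
    """
    mass_kg = mass_da * DALTON
    kinetic_energy = charge * ELEMENTARY_CHARGE * voltage  # z * e * U
    velocity = sqrt(2.0 * kinetic_energy / mass_kg)
    return drift_length / velocity


# Two hypothetical extension products that differ in mass by 16 Da.
t_light = flight_time(6000.0)
t_heavy = flight_time(6016.0)
arrival_gap = t_heavy - t_light  # positive: the heavier product arrives later
```

Because flight time scales with the square root of mass, even a single-base mass difference produces a distinct, ordered pair of arrival times, which is what allows the two alleles to be resolved as separate peaks.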

Protein structure analysis

The 3-D structural changes of the ZNF573, ZNF862, GAS2L2, and SYNE3 genes resulting from rs3095726, rs62621204, rs78557458, and rs76499929, respectively, were analyzed using SWISS-MODEL software.

Results

Data production and read mapping

In this exon sequencing project, 12 DNA samples were sequenced with the Illumina sequencer, and 14,095.68 Mb of raw bases was obtained per sample. After removing low-quality reads, 134,176,861 clean reads (13,402.31 Mb) per sample were obtained. The clean reads of each sample had high Q20 and Q30 values, which indicated good sequencing quality. The average GC content was 45.47%. The results of the exon sequencing data are shown in Table 1.
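Q20 and Q30 denote the fractions of bases whose Phred quality score is at least 20 or 30, where a score Q corresponds to an error probability of 10^(-Q/10). The sketch below shows how these fractions can be derived from a FASTQ quality string. The Phred+33 offset is an assumption (some older Illumina pipelines used +64), and the example quality string is made up.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (offset is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(score):
    """Base-calling error probability implied by a Phred score."""
    return 10 ** (-score / 10)


def fraction_at_least(quality_string, threshold, offset=33):
    """Fraction of bases with Phred score >= threshold (e.g. 20 for Q20)."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


# Made-up quality string whose scores decode to 40, 20, 10 and 0.
example = "I5+!"
q20 = fraction_at_least(example, 20)
q30 = fraction_at_least(example, 30)
```

A Q20 base has a 1% chance of being wrong and a Q30 base a 0.1% chance, which is why high Q20 and Q30 fractions are taken as evidence of good sequencing quality.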

The distribution of the content and quality values of the sequence bases on clean reads is shown in Figures 1 and 2, respectively.

Indel detection

Among the patients with severe lens opacity, we focused on the unique patient who showed gene polymorphisms that were inferred to be related to nuclear cataract.

Finally, after filtering, we obtained an average of 103,623,705 common differential indels that exhibited polymorphisms between patients with nuclear cataract and normal people. In this project, a target region of approximately 43.99 Mb was captured by the chip, and variant detection was performed on that target region. We used BWA to align the clean reads of each sample to the human reference genome sequence (GRCh37/HG19); on average, 99.94% of the reads mapped to the genome. The number of indels detected in each patient varied from 102,188,315 to 104,851,747, with an average of 103,623,705 (Table 2).

After removing duplicate reads, 103,623,705 effective reads (10,290.98 Mb effective bases) were obtained on average. Of the effective bases, 54.01% fell within the target region (i.e., the capture efficiency, or capture specificity). The average sequencing depth of the target region was approximately 126.34×; 99.24% of the target region was covered by at least one read per sample, and 97.70% of the target region was covered by at least 10 reads. In addition, the single-base sequencing depth profiles and cumulative sequencing depth profiles of each sample in the target region are shown in Figures 3a, 3b, and 3c, respectively. The insert fragment length (the length of the sequenced DNA fragments) of the paired-end reads is shown in Figures 3a, 3b, and 3c.
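The depth and coverage figures above are summary statistics over per-base depths in the target region. As a minimal sketch of how such numbers are derived, the function below takes a list of per-base depths and reports the mean depth together with the fraction of bases covered at or above each threshold. The depth values in the example are hypothetical and far shorter than a real 43.99 Mb target.

```python
def coverage_summary(depths, thresholds=(1, 10)):
    """Return (mean depth, {threshold: fraction of bases with depth >= threshold})."""
    n_bases = len(depths)
    if n_bases == 0:
        raise ValueError("depths must contain at least one position")
    mean_depth = sum(depths) / n_bases
    covered = {
        threshold: sum(depth >= threshold for depth in depths) / n_bases
        for threshold in thresholds
    }
    return mean_depth, covered


# Hypothetical per-base depths for a four-base target region.
mean_depth, covered = coverage_summary([0, 5, 10, 25])
```

In this toy region the mean depth is 10×, three of four bases are covered by at least one read, and two of four by at least 10 reads, mirroring the "≥1×" and "≥10×" fractions reported for the real data.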

Functional annotation and genomic distribution

Overall, 4,944 InDels were found in all samples; 69.68% of them were present in the dbSNP database and 57.93% in the database of the 1000 Genomes Project. A total of 1329 InDels were newly discovered. The statistical data of InDel distribution for each sample and the general population are shown in Table 3.

Among the total InDels, 415 frame-shifting mutations occurred in the coding region, eight InDels formed the termination codon and nontermination codon, six InDels formed the initiation codon and noninitiation codon, and 58 InDels changed the splice acceptor or donor in the splice site region (see Table 4).

The length distribution of the InDel mutation in the coding region of each sample is shown in Figure 4.

SNP detection

Overall, 63,896 SNPs were found in all samples; 94.01% of them were present in the dbSNP database and 91.16% in the database of the 1000 Genomes Project. There were 3,178 newly discovered SNPs. The ratio of transitions to transversions was 2.69. Among the SNPs in the coding region, there were 18,966 synonymous mutations and 17,975 missense mutations; 39 SNPs changed a stop codon into a nonstop codon, 170 SNPs changed a codon into a termination codon, 29 SNPs changed a start codon into a noninitiating codon, and 119 SNPs changed the splice acceptor or splice donor in the splice site region (see Table 6). The statistical data of SNP distribution for each sample and the general population are shown in Table 5.
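The ratio of base conversion to base substitution reported in this subsection is interpreted here as the transition/transversion (Ti/Tv) ratio; that reading is an assumption. Transitions exchange a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), and all other single-base changes are transversions. The sketch below computes the ratio for a list of (reference, alternative) base pairs; the example SNPs are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T changes, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not ({ref, alt} <= (PURINES | PYRIMIDINES)):
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(snps):
    """Transition/transversion ratio for a list of (ref, alt) base pairs."""
    transitions = sum(is_transition(ref, alt) for ref, alt in snps)
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions; the ratio is undefined")
    return transitions / transversions


# Hypothetical SNP list: three transitions and one transversion.
ratio = ti_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")])
```

The ratio is a dimensionless number rather than a percentage, and it is commonly used as a sanity check on the quality of a SNP call set.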

GO and pathway analyses

We used KOBAS tools for GO enrichment and KEGG pathway analysis based on 1,936 genes (SNPs) and 275 genes (indels). For SNP genes, 342 GO terms and 58 KEGG pathways were significantly enriched (P<0.05). For indel genes, 266 GO terms and 48 KEGG pathways were significantly enriched (P<0.05). For SNPs, the significant terms in the GO analysis included “intracellular part,” “nucleic acid binding,” “nucleobase-containing compound metabolic process,” “nucleic acid metabolic process,” “DNA-templated,” “DNA binding,” “RNA metabolic process,” “RNA biosynthetic process,” “nucleobase-containing compound biosynthetic process,” “nucleic acid-templated transcription,” “regulation of macromolecule biosynthetic process,” “regulation of RNA metabolic process, transcription,” “regulation of cellular process,” “regulation of cellular macromolecule biosynthetic process,” “regulation of nucleic acid-templated transcription,” “regulation of cellular biosynthetic process,” “regulation of nitrogen compound metabolic process,” “regulation of biosynthetic process,” “regulation of RNA biosynthetic process,” “regulation of transcription,” “heterocyclic compound binding,” “organic cyclic compound binding,” “heterocycle metabolic process,” “heterocycle biosynthetic process, aromatic compound biosynthetic process,” “organic cyclic compound metabolic process,” “organic cyclic compound biosynthetic process,” “cellular nitrogen compound metabolic process,” “nitrogen compound metabolic process,” “metal ion binding,” “ion binding,” “cation binding,” and others.

The KEGG analysis mainly included the following terms: microtubule organizing center attachment site and others. For indel polymorphisms, the GO analysis mainly included the following significant terms: microtubule organizing center attachment site; cytoskeletal anchoring at nuclear membrane; nuclear outer membrane; cytoskeletal protein binding; single-organism membrane organization; rough endoplasmic reticulum; membrane organization; cytoskeleton organization; maintenance of protein location in cell; maintenance of protein location; maintenance of location in cell; actin filament binding; regulation of cell shape; organelle outer membrane; single-organism organelle organization; nuclear membrane; maintenance of location; establishment of protein localization to membrane; regulation of cellular component organization; actin binding; nuclear envelope; organelle membrane; organelle; regulation of cell morphogenesis; nucleus; intracellular membrane-bounded organelle; organelle organization; endomembrane system; cellular developmental process; protein complex; regulation of anatomical structure morphogenesis; nuclear outer membrane-endoplasmic reticulum membrane network; intracellular organelle; and others. After depth analysis, we found the presence of 20 genes (containing SNP and indel polymorphism sites) in six or seven NARC samples, and the gene heatmaps are shown in Figure 5. The GO and KEGG analyses of these 20 genes are shown in Tables 7 and 8.

Identifcation of candidate genes associated with NARC

Significant SNPs were identified based on known QTLs, mass spectrometry, and the biological function of genes. Twenty genes with at least one common SNP or indel were considered to be potential genes related to ARC. The results are shown in Figure 5. Depth analysis based on the raw data of whole exon sequences revealed that 15 SNPs and 11 indel polymorphisms were closely associated with the occurrence of cataract because of their significantly high expression in the cataract group. The SNP polymorphisms included myosin binding protein C, slow type; nucleolar protein 11; growth arrest-specific 2 like 2; regulator of G-protein signaling 8; zinc finger protein 862; zinc finger protein 573; minichromosome maintenance complex component 3; atlastin GTPase 2; 1 open reading frame 68; and methionine sulfoxide reductase B2. The indel polymorphisms included dihydrodiol dehydrogenase; FLII, actin remodeling protein; spectrin repeat-containing nuclear envelope family member 3; ameloblastin; zinc finger protein 527; G-protein signaling modulator 2; ATP binding cassette subfamily A member 10; succinate dehydrogenase complex flavoprotein subunit A; zinc finger protein 107; myc target 1; and uridine phosphorylase 2.

Indel and SNP validation

To evaluate the reliability of the resequencing data, 26 randomly selected indels and SNPs were validated by mass spectrometry. For each polymorphic locus, primers were designed and optimized using Assay Design 3.1 software (Sequenom). The synthesized primers were quality-checked by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to assess whether the actual molecular weight was consistent with the theoretical molecular weight and whether the primer purity met the experimental requirements. To each well of the 384-well plate containing the reaction product, 16 μL of triple-distilled water was added, and the plate was centrifuged at 2,000 rpm for 3 min; resin was then added, and the samples were desalted on a reverse shaker for 35 min. This was followed by centrifugation at 2,000 rpm for 3 min. The desalted sample was placed on the sample target and allowed to crystallize naturally. MALDI-TOF MS was then performed. Typer 4 software was used to detect the mass spectrum peaks and to interpret the genotype of each sample at the target site according to the mass spectrum peak map. SHEsis software (http://analysis.bio-x.cn/SHEsisMain.htm) was used to analyze differences in allele frequency and haplotype distribution between the normal group and the case group. The chi-square test was performed to analyze the association between the SNPs of each gene and disease risk, and P < 0.05 was considered significant. After specific amplification and extension of the SNP loci in peripheral blood DNA from all case and control samples, four polymorphic loci were successfully genotyped.
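As an illustration of the chi-square test on allele frequencies, the sketch below evaluates a 2×2 table of allele counts (case versus control, reference versus alternative allele) without continuity correction. It is not the SHEsis implementation, and the counts are hypothetical rather than taken from Table 9. For one degree of freedom, the p value equals erfc(sqrt(χ²/2)), so only the standard library is needed.

```python
from math import erfc, sqrt


def chi_square_2x2(case_ref, case_alt, control_ref, control_alt):
    """Pearson chi-square statistic and p value (1 degree of freedom)."""
    a, b, c, d = case_ref, case_alt, control_ref, control_alt
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])
    p_value = erfc(sqrt(chi2 / 2.0))  # chi-square survival function for 1 df
    return chi2, p_value


# Hypothetical allele counts: 8 cases (16 alleles) and 4 controls (8 alleles).
chi2, p_value = chi_square_2x2(case_ref=12, case_alt=4, control_ref=2, control_alt=6)
significant = p_value < 0.05
```

With counts this small, expected cell frequencies can fall below 5, in which case an exact test is generally considered more reliable than the chi-square approximation.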

Of the 12 indels that were retained after screening, we chose those with polymorphism in patients with severe lens opacification, known as “common differential variants.” The process was as follows: we first collected all polymorphic variations in patients and then retained the common variations with the same allele distribution pattern across individuals.

Statistical analysis

In addition, we compared the variants identified above with those in the human SNP database (NCBI dbSNP, http://www.ncbi.nlm.nih.gov, updated on July 11, 2016) and found that three SNP sites and one indel site were significant. The frequencies of genotypes and alleles of the 26 polymorphism sites in patients with ARC and in controls are shown in Table 9. Mass spectrometry analysis revealed that ZNF573 (rs3095726, SNP), ZNF862 (rs62621204, SNP), SYNE3 (rs76499929, indel), and GAS2L2 (rs78557458, SNP) had a close relationship with ARC.

3D protein structure analysis

We used the SWISS-MODEL website (https://www.swissmodel.expasy.org/) to examine how each SNP site changes the encoded protein and to understand the effect of the mutation on the protein. The comparison results can predict whether the two proteins have similar functions and help us to find new targets, including the final 3D structure location. We analyzed the 3D structural changes of the protein caused by the missense mutation Met465Val (ATG → GTG) in the coding region of the ZNF573 gene. The original Met residue in the coding region of the ZNF573 gene showed a change in the composition of carbon atoms: when ATG → GTG occurs, the original Met becomes Val. The position of the primary two carbon atoms and a hydrogen bond of the rs3095726 protein changed from the tail position to the middle position. The analysis of the space-filling (sphere) structure showed that, because of the ATG → GTG change, the Met → Val substitution moves the residue from an initially exposed position to a hidden position, so that the ZNF573 protein is not easily activated, thus affecting the function of ZNF573. The results are shown in Figure 6. We also analyzed the 3D structural changes of the protein caused by the missense mutation Val71Phe (GTT → TTT) in the coding region of the GAS2L2 gene. The original Val residue in the coding region of the GAS2L2 gene showed a change in the composition of carbon atoms: when GTT → TTT occurs, the original Val becomes Phe. The 3D protein variation analysis showed that the position of the primary hydrogen bond changed from an open to a ring conformation at the protein position corresponding to rs78557458. The space-filling structural analysis showed that the change at this site from the primary single position to a ring shape may make GAS2L2 less easily activated and thus influence the function of GAS2L2. The results are shown in Figure 7.
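The two codon changes discussed in this subsection can be checked against the standard genetic code. The sketch below uses a deliberately partial codon table containing only the four codons mentioned in the text, and the helper name `describe_substitution` is hypothetical. It confirms that ATG → GTG and GTT → TTT are single-base changes that alter the encoded amino acid.

```python
# Partial standard genetic code: only the codons discussed in the text.
CODON_TO_AMINO_ACID = {
    "ATG": "Met",
    "GTG": "Val",
    "GTT": "Val",
    "TTT": "Phe",
}


def describe_substitution(ref_codon, alt_codon, residue_number):
    """Return (protein-level label, effect) for a single-base codon change."""
    changed = [i for i in range(3) if ref_codon[i] != alt_codon[i]]
    if len(ref_codon) != 3 or len(alt_codon) != 3 or len(changed) != 1:
        raise ValueError("codons must differ at exactly one of three positions")
    ref_aa = CODON_TO_AMINO_ACID[ref_codon]
    alt_aa = CODON_TO_AMINO_ACID[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "missense"
    return f"{ref_aa}{residue_number}{alt_aa}", effect


znf573_change = describe_substitution("ATG", "GTG", 465)
gas2l2_change = describe_substitution("GTT", "TTT", 71)
```

Both substitutions affect the first base of the codon and are missense changes, consistent with the Met465Val and Val71Phe labels used above.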

Discussion

In eukaryotic genomes, zinc finger (ZNF) proteins are among the most abundant proteins. Their functions include DNA recognition, RNA packaging, transcriptional activation, apoptosis regulation, protein folding and assembly, and lipid binding. The structures of ZNF proteins are as diverse as their functions. Recently, many ZNF domains with new topological structures have been reported, which provide important insights into the structure/function relationship. In eukaryotic systems, ZNFs are among the most common proteins, with a wide range of biological functions, including apoptosis, protein structure, DNA recognition, and RNA transcription [30].

Transcriptional repression is an important part of the molecular mechanisms of gene regulation in mammals. Some studies have shown that there are approximately 2,000 transcription factors in 5, and there are ZNF motifs in nearly 800 transcription inhibitors [31]. In the regulatory region of the genome, these motifs recognize specific DNA sequences, and the interaction between ZNF proteins and their DNA targets is key to the regulation of gene expression [32].

The most common of these motifs is the C2H2 or Krüppel type of ZNF (KZNF). A motif termed the Krüppel-associated box (KRAB), which is reported to recruit histone deacetylase complexes to the DNA region to which the ZNF is attached, is present in about one-third of KZNFs [33].

Recently, some structural studies of ZNF proteins have provided new insights into their extraordinary diversity of structure and function. Although a large number of putative zinc finger motifs have been identified, the structures of only a few of them have been characterized. Some studies have shown that there are novel folds, while recent studies have shown that uncharacterized zinc finger domains are built on common structural cores.

Although remarkable progress has been made in this area, the structure, function, and mechanism of ZNF proteins need further study. It is, however, very clear that these small, independently folded protein domains play a key role in regulating a series of significant biological functions. The other functions of ZNF proteins besides DNA and RNA recognition and packaging are gradually being recognized.

In our study, after whole-exon sequencing, we further performed depth analysis including GO and KEGG analysis, and we found the presence of 20 sites (containing SNP and indel polymorphism sites) in 6 or 7 NARC samples. Next, we performed mass spectrometry analysis and verified that there are significant differences between the disease group and the control group for rs3095726 (ZNF573). We also investigated the structural variation caused by rs3095726 in the 3D protein. Analysis with the SWISS-MODEL software showed that rs3095726 (Met523Val) caused a structural variation in the ZNF573 protein. The 3D protein structural variation analysis showed that the position of the primary two carbon atoms and a hydrogen bond changed from the tail position to the middle position for the protein corresponding to rs3095726. The space-filling structure analysis showed that the residue moved from an initially exposed position to a hidden position, which may make ZNF573 less easily activated and thereby influence the function of ZNF573. Because ZNF573 is involved in a broad range of biological functions including apoptosis, protein structure, DNA recognition, and RNA transcription, the structural variation in the ZNF573 protein may result in abnormal growth and development of the human lens, further leading to the occurrence of cataract.

The molecular mechanism of KRAB domain inhibition is not clear. In our study, we found significant differences between the disease group and the control group for rs62621204 (ZNF862). A PubMed search for rs62621204 (Arg56Cys) indicated that this site is located in the Krüppel-associated box (KRAB), which has been confirmed as a transcriptional inhibition region and is conserved in several Krüppel-type ZNF proteins [34].

KRAB is rich in charged amino acids and can be divided into two subfields: A and B. These two subfields can fold into two amphiphilic α helices. The KRAB A and B boxes can be separated by interval fragments of varying length. Many KRAB proteins contain only one box [35].

The functions of the known members of the KRAB protein family include transcriptional inhibition of RNA polymerase I, II, and III promoters; RNA binding and splicing; and nucleolar function control. When a KRAB-containing protein binds to the template DNA through its DNA binding domain, the KRAB domain acts as a transcription inhibitor. A sequence of 45 amino acids in the KRAB A subfield has been proven to be necessary and sufficient for transcriptional inhibition. The B box does not repress transcription by itself, but it strengthens the repression imposed by the KRAB A subfield [36].

The KRAB domain is usually encoded by two exons. The regions encoded by these two exons are called KRAB-A and KRAB-B. Although the functions of KRAB-ZFPs are largely unknown, they seem to play an important role in cell differentiation and development, organ development, and regulation of virus replication and transcription.

The molecular mechanisms of repression by the KRAB domain are not known. In our study, we found significant differences between the disease group and the control group for rs62621204 (ZNF862), and we further investigated the 3D protein variation caused by rs62621204. Analysis with SWISS-MODEL software revealed that rs62621204 (Arg56Cys) was not in the region of protein coding. On the basis of this result, we speculated that, although this site lies beyond the region of protein coding, rs62621204 (Arg56Cys) causes some variation in the KRAB domain.

Gas2 [MIM: 602835] is expressed in many human tissues, and it is involved in the regulation of microfilament dynamics during the cell cycle and apoptosis [37,38]. Gas2 belongs to the microfilament network system; its expression is regulated during the growth arrest of diploid fibroblasts, and it has remained conserved during evolution. Gas2L2 is a member of the Gas2 family, which includes Gas2, Gas2L1, Gas2L2, and Gas2L3 [39,40]. Gas2L2 has six exons that encode a 97-kDa protein. Previous studies have shown that Gas2L2 localizes to actin stress fibers and microtubules and thus promotes the coordinated arrangement of actin and microtubules to different extents [40]. However, little is known about the function and location of Gas2L2 in natural tissues.

Cell transformation leads to changes in cell morphology, cell metabolism, gene expression, and growth control. Consequently, the cells become defective in reaching growth stagnation [41]. Growth stagnation, cell cycle arrest, or G0 is generally considered to exist only as the “negative” phase of the cell cycle. The isolation of genes that are highly expressed during growth arrest (Gas) has proved its existence and provides a new tool for studying the cell biology of growth stagnation.

The Gas2 protein is induced in cultured cells during growth arrest [42]. It is also associated with apoptosis-related rearrangement of the cytoskeleton in cultured cells [43] and possibly with the development of mammalian tissues [44]. During the transition from G0 to G1, phosphorylation of the Gas2 protein on serine and threonine residues is believed to be a mechanism for the rapid inactivation of Gas2 after serum stimulation of arrested cells [45].

Therefore, understanding the function of well-characterized components of the microfilament system opens up a new research field for linking the regulation of the microfilament network to cell growth.

Cell shape is mainly determined by a complicated network of factors, including membrane-cytoskeleton coupling factors, cytoskeleton-related elements, and cytoskeleton components [46]. It remains unclear whether the microfilament network plays a direct role in the generation of growth control signals or is only needed to generate motion and chemotaxis responses, which are coupled but independent events in the growth cycle process.

The identification and characterization of new components in the microfilament system that are closely related to the growth condition may be an important step toward clarifying the role of this system in growth control [47–49]. A good candidate for this kind of protein is Gas2, which has been proven to be an integral part of the microfilament system and whose expression in mouse and human fibroblasts is highly induced when they are growth arrested [50]. There is increasing interest in understanding the molecular processes that lead to growth arrest.

A previous study [51] demonstrated that Gas2 is an evolutionarily conserved component of the microfilament system. When the growth of cells is restricted, the level of the Gas2 protein increases steadily, but because of its long half-life, there is no significant downregulation during the G0-G1 transition.

In fact, cell responses that depend on the microfilament system (e.g., cell movement) are significantly restricted in stationary cells [52]. This probably implies that specific elements are needed for the tissue to have such cytoskeleton-related constraints; Gas2 may be one of these elements.

In our study, after whole-exon sequencing, we performed depth analysis, including GO and KEGG analysis, and found that 20 sites (SNP and indel polymorphism sites) were present in six or seven NARC samples. We then performed mass spectrometry analysis and found significant differences between the disease group and the control group for rs78557458 (GAS2L2). We also investigated the 3D structural variation caused by rs78557458 in the protein. Analysis with SWISS-MODEL software revealed that rs78557458 (Val71Phe) caused a structural variation in the GAS2L2 protein. The structural variation analysis showed that the position of the primary hydrogen bond changed from an open to a circle condition in the protein corresponding to rs78557458. The space ball structural analysis showed a shift from the initial single position to a circle shape; this may be related to GAS2L2 not being easily activated, which influences the function of GAS2L2.
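To make the group comparison concrete, the sketch below computes an allelic odds ratio with a 95% confidence interval from a 2×2 table of allele counts. It assumes the standard log-based (Woolf) approximation for the interval; the text does not state which method was actually used, and the function name `odds_ratio_ci` is hypothetical. The input counts are the rs78557458 (GAS2L2) allele counts listed in Table 9.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Odds ratio for the alternative allele with a Woolf (log-based) CI.

    Assumes all four cell counts are non-zero.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# rs78557458 allele counts from Table 9: A = 57, C = 1109 in ARC patients;
# A = 12, C = 1010 in controls.
or_a, lo_a, hi_a = odds_ratio_ci(57, 1109, 12, 1010)
print(f"OR = {or_a:.3f}, 95% CI = ({lo_a:.3f}, {hi_a:.3f})")
```

Under these assumptions, the result comes out close to the A-allele odds ratio and interval reported for rs78557458 in Table 9.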

There is strong evidence that nesprin-3 (SYNE3) interacts with intermediate filaments in vivo and in vitro through plectin [53,54]. However, the role of nesprin-3 in the pathogenesis of glucocorticoid-induced cataract has not yet been confirmed. In the present study, we assessed the expression and function of nesprin-3 in human lens epithelial cells (HLECs). The results showed that the SYNE3 gene polymorphism differed between the disease group and the control group. Therefore, the functional modifications resulting from SYNE3-mediated 3D protein structural changes in HLECs merit further investigation.

The loss of nesprin-3 changed the cytoskeleton around the nucleus, but not the entire cytoskeleton. This finding was consistent with previous studies [55–57].

We believe that nesprin-3 provides a scaffold for the perinuclear organization of plectin and that it is involved in the link between the centrosome and the nucleus. This nesprin-3/plectin/vimentin linkage is very similar to the nesprin-3/plectin/keratin linkage proposed in other cell lines [58].

The importance of nesprin-3 is not necessarily limited to the nuclear membrane, because the sequestration of plectin and vimentin at the nuclear membrane may affect their availability for other roles. In fact, recent studies have shown that plectin/vimentin complexes are key to the regulation of focal adhesion and the shape of mouse fibroblasts [59]. In addition, in vivo results from plectin-deficient and vimentin-deficient mice further emphasize the potential importance of the proposed nesprin-3/plectin/vimentin linkage.

The function of nesprin-3 is to maintain the normal nuclear localization of HLECs, which adds new content to the already complex cytoskeleton and tissue structure around the nucleus. Our study showed significant differences between the disease group and the control group for rs76499929 (SYNE3). We then investigated the 3D protein structural variation caused by rs76499929. The SWISS-MODEL software showed that rs76499929 (Leu796del) was not in the region of protein coding. On the basis of this result, we speculated that, although this site lies beyond the region of protein coding, rs76499929 (Leu796del) influences the function of the SYNE3 protein, resulting in abnormality of the cytoskeleton and tissue structure around the nucleus.

In conclusion, we found four new polymorphism sites associated with ARC, which could be the cause of the disease. Our findings were supported by the results of studies on gene and protein structural variation. In this study, we used whole-exon sequencing data to explore the molecular mechanisms involved in nuclear ARC. The data were verified through several filters and databases, and the results of depth analysis were further validated by mass spectrometry experiments. The corresponding gene expression was measured and compared with that of healthy people, and the results showed that the variations in the 3D protein structure caused by ZNF573 (rs3095726, SNP), GAS2L2 (rs78557458, SNP), ZNF862 (rs62621204, SNP), and SYNE3 (rs76499929, indel) were involved in the mechanism of nuclear ARC formation.

Conclusions

Our findings provide the basis for further studies and discovery of key genes associated with NARC.

Abbreviations

Nuclear Age-related Cataract NARC

Single Nucleotide Polymorphism SNP

Insertions and deletions Indels

Age-Related Cataracts ARCs

Posterior Subcapsular Cataract PSC

Lens Opacification Classification System III LOCS III

Whole-Exome Sequencing WES

Exome Aggregation Consortium EXAC

Next-Generation Sequencing NGS

Genome Analysis Toolkit GATK

Gene Ontology GO

Kyoto Encyclopedia of Genes and Genomes KEGG

Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry MALDI-TOF/MS

Time-Of-Flight TOF

Zinc Finger ZNF

Krüppel type of ZNF KZNF

Krüppel-Associated Box KRAB

Growth arrest Gas

Nesprin-3 SYNE3

Human Lens Epithelial Cells HLECs

Declarations

Ethics approval and consent to participate: The experiments were approved by the Ethics Committee of the First Affiliated Hospital of Harbin Medical University. All research was performed in accordance with relevant guidelines, and informed consent was obtained from all participants.

Consent for publication: Not applicable

Availability of data and material: Not applicable

Competing interests: The authors declare that they have no competing interests.

Funding: This work was supported by grants from the National Natural Science Foundation of China (Grant No. 81700818) to Lin Wang, the Heilongjiang Postdoctoral Research Initiation Fund (Grant No. LBH-Q15105) to Lin Wang, and the Research and Innovation Fund of the First Hospital of Harbin Medical University (Grant No. 2017B016) to Lin Wang. The funding sources were non-profit scientific research management and academic institutions, they had the roles in the design of this study, and did have the role during its execution, analyses, interpretation of the data, or decision to submit results.

Authors' contributions: LW, YB Y, and PL planned the study. LW, WC Z, and YB Y conducted the experiment. LW, WC Z, YB Y, and PL wrote and submitted the article. All authors have read and approved the manuscript.

Acknowledgements: Not applicable

References

1. Manoj Kumar, Tushar Agarwal, Sudarshan Khokhar, Manoj Kumar, Punit Kaur, Tara Sankar Roy, Rima Dada. Mutation screening and genotype phenotype correlation of alpha-crystallin, gamma-crystallin and GJA8 gene in congenital cataract. Mol Vis. 2011;17:693–707.
2. Resnikoff S, Pascolini D, Mariotti SP, Pokharel GP. Global magnitude of visual impairment caused by uncorrected refractive errors in 2004. Bull World Health Organ. 2008;86(1):63–70.
3. Gupta SK, Selvan VK, Agrawal SS, Saxena R. Advances in pharmacological strategies for the prevention of cataract development. Indian J Ophthalmol. 2009;57(3):175–83.
4. Beebe DC, Shui Y-B. Progress in preventing age-related cataract. In: Yorio T, Clark AF, Wax MB, editors. Ocular therapeutics. New York, NY: Academic Press; 2008. pp. 143–66.
5. Spector A. Oxidative stress-induced cataract: Mechanism of action. FASEB J. 1995;9(12):1173–82.
6. Na K-S, Park Y-G, Han K, Mok JW, Joo C-K. Prevalence of and risk factors for age-related and anterior polar cataracts in a Korean population. PLOS ONE. 2014;9:e96461.
7. Su S, Yao Y, Zhu RR, Liang CK, Jiang SQ, Hu N, Zhou J, Yang M, Xing Q, Guan HJ. The associations between single nucleotide polymorphisms of DNA repair genes, DNA damage, and age-related cataract: Jiangsu Eye Study. Investig Ophthalmol Vis Sci. 2013;54:1201–1207.
8. Hammond CJ, Snieder H, Spector TD, Gilbert CE. Genetic and environmental factors in age-related nuclear cataracts in monozygotic and dizygotic twins. N Engl J Med. 2000;342:1786–1790.
9. Hammond CJ, Duncan DD, Snieder H, de Lange M, West SK, Spector TD, Gilbert CE. The heritability of age-related cortical cataract: the twin eye study. Investig Ophthalmol Vis Sci. 2001;42:601–605.
10. Hodges E, Xuan Z, Balija V. Genome-wide in situ exon capture for selective resequencing. Nature genetics. 2007;39:1522–1527.
11. Resnikoff S, Pascolini D, Mariotti SP, Pokharel GP. Global magnitude of visual impairment caused by uncorrected refractive errors in 2004. Bull World Health Organ. 2008;86(1):63–70.
12. Murim Choi, Ute I Scholl, Weizhen Ji. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:19096–19101.
13. Ng SB, Turner EH, Robertson PD. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009;461:272–276.
14. Gilissen C, Hoischen A, Brunner HG, Veltman JA. Disease gene identification strategies for exome sequencing. European journal of human genetics: EJHG. 2012;20:490–497.
15. Sarah B Ng, Kati J Buckingham, Choli Lee. Exome sequencing identifies the cause of a mendelian disorder. Nature genetics. 2010;42:30–35.
16. Jessica X. Chong, Kati J. Buckingham, Shalini N. Jhangiani, Corinne Boehm, Nara Sobreira, Joshua D. Smith, Tanya M. Harrell, Margaret J. McMillin, WojciechWiszn Fong, Debra Mathews, P. Dane Witmer, Michael J. Bamshad. The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities. American journal of human genetics. 2015;97:199–215.
17. O'Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, Coe BP. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012;485:246–250.
18. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65.
19. Regalado ES, Guo DC, Villamizar C, Avidan N, Gilchrist D, McGillivray B. Exome sequencing identifies SMAD3 mutations as a cause of familial thoracic aortic aneurysm and dissection with intracranial and other arterial aneurysms. Circ Res. 2011;109:680–686.
20. Emond MJ, Louie T, Emerson J. Exome sequencing of extreme phenotypes identifies DCTN4 as a modifier of chronic Pseudomonas aeruginosa infection in cystic fibrosis. Nature genetics. 2012;44:886–889.
21. Catherine Boileau, Dong-Chuan Guo, Nadine Hanna. TGFB2 mutations cause familial thoracic aortic aneurysms and dissections associated with mild systemic features of Marfan syndrome. Nature genetics. 2012;44:916–921.
22. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291.
23. Pierson TM, Adams D, Bonn F, Martinelli P, Cherukuri PF, Teer JK. Whole-exome sequencing identifies homozygous AFG3L2 mutations in a spastic ataxia-neuropathy syndrome linked to mitochondrial m-AAA proteases. PLoS Genet. 2011;7:e1002325.
24. Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, Willsey AJ, Ercan-Sencicek AG, DiLullo NM, Parikshak NN, Stein JL, Walker MF, Ober GT, Teran NA, Song Y, El-Fishawy P, Murtha RC, Choi M, Overton JD, Bjornson RD, Carriero NJ, Meyer KA, Bilguvar K, Mane SM, Sestan N, Lifton RP, Günel M, Roeder K, Geschwind DH, Devlin B, State MW. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. 2012;485:237–241.
25. Yang Y, Muzny DM, Reid JG. Clinical whole-exome sequencing for the diagnosis of mendelian disorders. The New England journal of medicine. 2013;369:1502–1511.
26. Talkowski ME, Minikel EV, Gusella JF. Autism spectrum disorder genetics: diverse genes with diverse clinical outcomes. Harv Rev Psychiatry. 2014;22:65–75.
27. Berg JS. Genome-scale sequencing in clinical care: establishing molecular diagnoses and measuring value. JAMA. 2014;312:1865–1867.
28. Monroe GR, Frederix GW, Savelberg SM. Effectiveness of whole-exome sequencing and costs of the traditional diagnostic trajectory in children with intellectual disability. Genet Med. 2016;18(9):949–56.
29. Montgomery SB, Goode DL, Kvikstad E, Albers CA, Zhang ZD, Mu XJ, Ananda G, Howie B, Karczewski KJ, Smith KS, Anaya V, Richardson R, Davis J, MacArthur DG, Sidow A, Duret L, Gerstein M, Makova KD, Lunter G. The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome research. 2013;23(5):749–61.
30. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760.
31. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–595.
32. Picard Tools (http://broadinstitute.github.io/picard/).
33. Mark A DePristo, Eric Banks, Ryan Poplin. A framework for variation discovery and genotyping using next generation DNA sequencing data. Nature genetics. 2011;43:491–498.

34. McKenna A, Hanna M, Banks E. The Genome Analysis Toolkit: a MapReduce framework for analyzing next generation DNA sequencing data. Genome Research. 2010;20(9):1297–1303.
35. Bellefroid EJ, Poncelet DA, Lecocq PJ, Revelant O, Martial JA. The evolutionarily conserved Krüppel-associated box domain defines a subfamily of eukaryotic multifingered proteins. Proc Natl Acad Sci U S A. 1991;88(9):3608–3612.
36. Margolin JF, Friedman JR, Meyer WK, Vissing H, Thiesen HJ, Rauscher FJ 3rd. Krüppel-associated boxes are potent transcriptional repression domains. Proc Natl Acad Sci U S A. 1994;91(10):4509–4513.
37. Brancolini C, Benedetti M, Schneider C. Microfilament reorganization during apoptosis: the role of Gas2, a possible substrate for ICE-like proteases. EMBO J. 1995;14:5179–5190.
38. Zhang T, Dayanandan B, Rouiller I, Lawrence EJ, Mandato CA. Growth-arrest-specific protein 2 inhibits cell division in Xenopus embryos. PLoS ONE. 2011;6:e24698.
39. Goriounov D, Leung CL, Liem RK. Protein products of human Gas2-related genes on 17 and 22 (hGAR17 and hGAR22) associate with both microfilaments and microtubules. J Cell Sci. 2003;116:1045–1058.
40. Stroud MJ, Nazgiewicz A, McKenzie EA, Wang Y, Kammerer RA, Ballestrem C. GAS2-like proteins mediate communication between microtubules and actin through interactions with end-binding proteins. J Cell Sci. 2014;127:2672–2682.
41. Land H, Parada LF, Weinberg RA. Tumorigenic conversion of primary embryo fibroblasts requires at least two cooperating oncogenes. Nature (Lond.). 1983;304:596–602.
42. Brancolini C, Bottega S, Schneider C. Gas2, a growth arrest-specific protein, is a component of the microfilament network system. J Cell Biol. 1992;117:1251–1261.
43. Brancolini C, Benedetti M, Schneider C. Microfilament reorganization during apoptosis: the role of Gas2, a possible substrate for ICE-like proteases. EMBO J. 1995;14:5179–5190.
44. Lee KK, Tang MK, Yew DT, Chow PH, Yee SP, Schneider C, Brancolini C. gas2 is a multifunctional gene involved in the regulation of apoptosis and chondrogenesis in the developing mouse limb. Dev Biol. 1999;207:14–25.

45. Brancolini C, Schneider C. Phosphorylation of the growth arrest-specific protein Gas2 is coupled to actin rearrangements during G0→G1 transition in NIH 3T3 cells. J Cell Biol. 1994;124:743–756.
46. Stossel TP. On the crawling of animal cells. Science (Wash. DC). 1993;260:1086–1094.
47. Brooks SF, Herget T, Broad S, Rozengurt E. The expression of 80K/MARCKS, a major substrate of protein kinase C (PKC), is down-regulated through both PKC-dependent and -independent pathways. J Biol Chem. 1992;267:14212–14218.
48. Gluck U, Rodriguez-Fernandez JL, Pankov R, Ben-Ze'ev A. Regulation of adherens junction protein expression in growth activated 3T3 cells and regenerating liver. Exp Cell Res. 1992;202:477–486.
49. Ungar F, Geiger B, Ben-Ze'ev A. Cell contact- and shape-dependent regulation of vinculin synthesis in cultured fibroblasts. Nature (Lond.). 1986;319:787–791.
50. Brancolini C, Bottega S, Schneider C. Gas2, a growth arrest-specific protein, is a component of the microfilament network system. J Cell Biol. 1992;117:1251–1261.

51. Brancolini C, Schneider C. Phosphorylation of the growth arrest-specific protein Gas2 is coupled to actin rearrangements during G0→G1 transition in NIH 3T3 cells. J Cell Biol. 1994;124:743–756.
52. Conrad PA, Giuliano KA, Fisher G, Collins K, Matsudaira PT, Taylor DL. Relative distribution of actin, myosin I, myosin II during the wound healing response to fibroblasts. J Cell Biol. 1993;120:1381–1391.
53. Wilhelmsen K, Litjens SH, Kuikman I, Tshimbalanga N, Janssen H, van den Bout I, Raymond K, Sonnenberg A. Nesprin-3, a novel outer nuclear membrane protein, associates with the cytoskeletal linker protein plectin. J Cell Biol. 2005;171(5):799–810.
54. Postel R, Ketema M, Kuikman I, de Pereda JM, Sonnenberg A. Nesprin-3 augments peripheral nuclear localization of intermediate filaments in zebrafish. J Cell Sci. 2011;124:755–764.
55. Wilhelmsen K, Litjens SH, Kuikman I, Tshimbalanga N, Janssen H, van den Bout I, Raymond K, Sonnenberg A. Nesprin-3, a novel outer nuclear membrane protein, associates with the cytoskeletal linker protein plectin. J Cell Biol. 2005;171:799–810.
56. Ketema M, Wilhelmsen K, Kuikman I, Janssen H, Hodzic D, Sonnenberg A. Requirements for the localization of nesprin-3 at the nuclear envelope and its interaction with plectin. J Cell Sci. 2007;120:3384–3394.
57. Postel R, Ketema M, Kuikman I, de Pereda JM, Sonnenberg A. Nesprin-3 augments peripheral nuclear localization of intermediate filaments in zebrafish. J Cell Sci. 2011;124:755–764.
58. Wilhelmsen K, Litjens SH, Kuikman I, Tshimbalanga N, Janssen H, van den Bout I, Raymond K, Sonnenberg A. Nesprin-3, a novel outer nuclear membrane protein, associates with the cytoskeletal linker protein plectin. J Cell Biol. 2005;171:799–810.
59. Burgstaller G, Gregor M, Winter L, Wiche G. Keeping the vimentin network under control: cell-matrix adhesion-associated plectin 1f affects cell shape and polarity of fibroblasts. Mol Biol Cell. 2010;21:3362–3375.

Tables

Table 1 The results of exon sequencing data.

Columns: Samples; Raw reads; Raw bases (Mb); Clean reads; Clean bases (Mb); Clean data rate (%); Clean read1 Q20 (%); Clean read2 Q20 (%); Clean read1 Q30 (%); Clean read2 Q30 (%); GC content (%)

1 142,744,716 14274.28 136,740,308 13657.90 95.68 98.69 96.49 96.09 91.38 45.68

2 147,264,980 14726.38 139,434,016 13928.23 94.58 98.62 96.19 95.95 90.72 45.28

3 145,010,526 14500.86 137,801,272 13764.08 94.92 98.66 96.33 96.03 91.02 45.42

4 142,742,068 14274.01 136,409,362 13625.27 95.46 98.53 96.20 95.68 90.71 45.45

5 135,957,452 13595.58 129,215,428 12906.12 94.93 98.65 96.34 96.01 91.04 45.46

6 141,237,565 14123.57 134,585,669 13442.70 95.18 98.67 96.39 96.04 91.15 45.52

7 143,748,204 14374.65 136,809,682 13665.40 95.07 98.61 96.26 95.89 90.86 45.42

8 141,572,061 14157.03 134,608,794 13445.20 94.97 98.64 96.31 95.98 90.97 45.43

9 135,951,979 13595.02 129,061,180 12891.35 94.82 98.67 96.15 96.06 90.69 45.38

10 135,932,011 13593.07 129,593,778 12945.50 95.24 98.69 96.16 96.10 90.71 45.64

11 139,580,545 13957.89 132,929,414 13277.80 95.13 98.67 96.29 96.04 90.94 45.53

12 139,760,243 13975.86 132,933,425 13278.18 95.01 98.65 96.23 96.00 90.83 45.44

Average 140,958,529 14095.68 134,176,861 13402.31 95.08 98.64 96.28 95.99 90.92 45.47
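As a quick consistency check on Table 1, the clean data rate column appears to be the ratio of clean bases to raw bases. This reading is an assumption inferred from the numbers, since the table does not define the column, and the helper name `clean_data_rate` is hypothetical. A minimal Python sketch using the values for sample 1:

```python
def clean_data_rate(raw_bases_mb, clean_bases_mb):
    """Percentage of raw bases retained after read cleaning."""
    return 100.0 * clean_bases_mb / raw_bases_mb


# Sample 1 in Table 1: 14274.28 Mb raw bases, 13657.90 Mb clean bases.
rate = clean_data_rate(14274.28, 13657.90)
print(round(rate, 2))  # 95.68
```

The value agrees with the clean data rate listed for sample 1.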

Due to technical limitations, table 2 is only available as a download in the Supplemental Files section.

Table 3 The statistical table of InDel distribution for each sample and the general population

Columns: Samples; Total InDels; Fraction of InDels in dbSNP (%); Fraction of InDels in 1000 genomes (%); Novel; Homozygous; Heterozygous; Intron; 5'UTR

1 2,696 81.82 62.24 428 959 1,737 1,615 49

2 2,651 82.80 63.22 396 991 1,660 1,574 51

3 2,676 80.98 61.66 449 995 1,681 1,610 55

4 2,656 83.70 63.18 375 977 1,679 1,582 54

5 2,802 80.76 61.81 468 1,018 1,784 1,662 56

6 2,725 81.19 61.90 448 991 1,734 1,629 53

7 2,677 82.56 62.77 406 986 1,691 1,595 53

8 2,718 81.43 62.08 441 1,000 1,719 1,622 55

9 2,744 81.45 62.28 442 1,035 1,709 1,621 48

10 2,669 82.43 63.21 401 973 1,696 1,608 49

11 2,704 81.68 62.40 430 988 1,716 1,620 52

12 2,708 81.90 62.48 426 1,003 1,705 1,612 51

Overall 4,944 69.68 57.93 1,329 NA NA 2,944 101

Table 4 Functional classification statistics of indels in the coding region. Among all indels, 415 frameshift mutations occurred in the coding region, 8 indels changed a termination codon into a non-termination codon, 6 indels changed an initiation codon into a non-initiation codon, and 58 indels changed the splice acceptor or donor in the splice site region.

Columns: Samples; Frameshift; Non-frameshift insertion; Non-frameshift deletion; Stoploss; Startloss; Splicing

1 230 98 108 7 3 40

2 237 78 98 6 4 42

3 241 83 99 6 4 37

4 221 85 102 5 4 39

5 238 100 117 4 2 44

6 236 94 108 6 3 40

7 231 86 103 6 4 40

8 237 90 106 5 3 40

9 226 91 104 6 4 41

10 209 101 110 5 3 40

11 227 95 108 5 3 40

12 228 90 105 6 4 41

Overall 415 163 235 8 6 58

Table 5 Functional classification statistics of SNPs in the coding region. The SNP distribution for each sample and for the general population is shown.

Columns: Samples; Total SNPs; Fraction of SNPs in dbSNP (%); Fraction of SNPs in 1000 genomes (%); Novel; Homozygous; Heterozygous; Intron; 5'UTR; 3'UTR

1 33,171 97.66 93.67 666.00 13,407 19,764 10,052 532 568

2 32,388 97.59 93.70 659.00 13,394 18,994 9,894 493 564

3 32,891 97.43 93.23 714.00 13,545 19,346 9,951 507 555

4 33,014 97.74 93.73 627.00 13,412 19,602 10,023 527 577

5 33,029 97.54 93.20 704.00 13,517 19,512 10,085 537 567

6 33,030 97.54 93.37 694.67 13,490 19,541 10,029 525 563

7 32,811 97.62 93.60 660.22 13,432 19,379 9,982 515 568

8 32,910 97.53 93.34 692.74 13,498 19,412 10,006 520 563

9 33,450 97.32 93.03 752.00 13,351 20,099 10,115 526 545

10 33,239 97.42 93.35 710.00 13,430 19,809 10,052 516 565

11 33,060 97.50 93.35 699.14 13,473 19,587 10,029 520 564

12 33,107 97.48 93.33 703.79 13,418 19,688 10,042 520 559

Overall 63,896 94.01 91.16 3178.00 NA NA 18,966 1,042 1,061

Table 6 Distribution statistics of the identified SNPs. Splicing: a SNP that changes the splice acceptor or splice donor in the splice site region.

Columns: Samples; Synonymous; Missense; Stopgain; Stoploss; Startloss; Splicing

1 9,934 8,853 74 28 13 53

2 9,570 8,649 62 30 14 61

3 9,784 8,798 74 28 14 60

4 9,938 8,705 70 29 13 49

5 9,734 8,742 74 28 15 55

6 9,817 8,798 74 28 14 56

7 9,775 8,717 69 29 14 55

8 9,764 8,752 72 28 14 57

9 10,020 8,878 71 27 18 58

10 9,794 9,005 71 29 14 60

11 9,792 8,852 72 28 14 58

12 9,862 8,816 71 28 15 57

Overall 18,699 17,975 170 39 29 119

Table 7 The GO analysis for twenty related genes

Term Database ID Input number Background number P-Value

MYBPC1

protein complex Gene Ontology GO:0043234 146 3913 0.020315

NOL11

protein complex Gene Ontology GO:0043234 146 3913 0.020315

GAS2L2

microtubule cytoskeleton organization Gene Ontology GO:0000226 26 396 0.0006

RGS8

GTPase regulator activity Gene Ontology GO:0030695 17 296 0.015631

ZNF862

biosynthetic process Gene Ontology GO:0009058 211 6087 0.020895

C1orf68

metabolic process Gene Ontology GO:0008152 354 10642 0.120141

ZNF573

biosynthetic process Gene Ontology GO:0009058 211 6087 0.010895

MCM3AP

protein complex Gene Ontology GO:0043234 146 3913 0.020315

ATL2

endoplasmic reticulum part Gene Ontology GO:0044432 47 1158 0.049237

MSRB2

oxidoreductase activity, acting on a sulfur group of donors Gene Ontology GO:0016667 7 50 0.001621

FLII

actin filament severing Gene Ontology GO:0051014 1 9 0.022971

DHDH

oxidoreductase activity, acting on the CH-CH group of donors Gene Ontology GO:0016627 2 57 0.008442

SYNE3

microtubule organizing center attachment site Gene Ontology GO:0034992 1 7 0.018419

AMBN

structural constituent of tooth enamel Gene Ontology GO:0030345 1 5 0.013847

ZNF527

intracellular organelle Gene Ontology GO:0043229 36 11680 0.059674

GPSM2

GTPase regulator activity Gene Ontology GO:0030695 4 296 0.005571

ABCA10

hydrolase activity Gene Ontology GO:0016787 11 2382 0.02661

ZNF107

intracellular organelle Gene Ontology GO:0043229 36 11680 0.059674

MYCT1

intracellular organelle Gene Ontology GO:0043229 36 11680 0.059674

UPP2

pyrimidine nucleotide salvage Gene Ontology GO:0032262 1 5 0.013847

Table 8 The KEGG analysis for twenty related genes

Term Database ID Input number Background number P-Value

ABCA10

ABC transporters KEGG PATHWAY hsa02010 1 45 0.0358043

SDHA

Citrate cycle (TCA cycle) KEGG PATHWAY hsa00020 1 30 0.0109433

UPP2

Drug metabolism - other enzymes KEGG PATHWAY hsa00983 1 46 0.0261188

DHDH

Pentose and glucuronate interconversions KEGG PATHWAY hsa00040 1 36 0.0129201

Table 9 Frequencies of genotypes and alleles of 26 polymorphic sites in ARC patients and controls

Genotype/Allele  ARC (n=583), n (%)  Controls (n=511), n (%)  p value  OR (95% CI)

SNPs

rs368680155 (MYBPC1)
TT  547 (93.88%)  473 (92.63%)  0.407  1.221 (0.761-1.957)
TC  21 (3.54%)  22 (4.26%)  0.550  0.831 (0.451-1.529)
CC  15 (2.58%)  16 (3.11%)  0.579  0.817 (0.400-1.670)
T  1115 (96%)  968 (94.7%)  0.321  1.220 (0.824-1.805)
C  51 (4.4%)  54 (5.3%)  0.321  0.820 (0.554-1.214)

rs3817552 (MYBPC1)
CC  382 (65.56%)  337 (66.0%)  0.882  0.981 (0.764-1.260)
CG  165 (28.33%)  133 (26.00%)  0.399  1.122 (0.859-1.466)
GG  36 (6.11%)  41 (8.00%)  0.233  0.754 (0.474-1.200)
C  929 (80%)  807 (80%)  0.682  1.044 (0.849-1.285)
G  237 (20%)  215 (20%)  0.682  0.958 (0.778-1.178)

rs2293468 (MYBPC1)
CC  288 (49.44%)  255 (50.00%)  0.868  0.980 (0.773-1.243)
CT  204 (35.00%)  123 (24.00%)  0.000  1.698 (1.303-2.213)
TT  91 (15.56%)  133 (26.00%)  0.000  0.526 (0.390-0.708)
C  780 (67%)  633 (62%)  0.016  1.242 (1.042-1.480)
T  386 (33%)  389 (38%)  0.016  0.805 (0.676-0.960)

rs2291284 (NOL11)
TT  288 (49.44%)  255 (50.00%)  0.868  0.980 (0.773-1.243)
TC  75 (12.78%)  72 (14.00%)  0.553  0.9 (0.636-1.274)
CC  220 (37.78%)  184 (36.00%)  0.555  1.077 (0.842-1.378)
T  651 (56%)  582 (57%)  0.6  0.956 (0.807-1.132)
C  515 (44%)  440 (43%)  0.6  1.046 (0.883-1.240)

rs12602590 (GAS2L2)
CC  324 (55.56%)  358 (70%)  0.000  0.535 (0.416-0.687)
CT  230 (39.44%)  137 (26.78%)  0.000  1.799 (1.391-2.326)
TT  29 (5.00%)  16 (3.22%)  0.126  1.619 (0.869-3.017)
C  878 (75%)  853 (83%)  0.000  0.694 (0.489-0.747)
T  288 (25%)  169 (17%)  0.000  1.656 (1.339-2.047)

rs78557458 (GAS2L2)
CC  548 (94.00%)  502 (98.18%)  0.000  0.281 (0.134-0.590)
CA  13 (2.18%)  6 (1.26%)  0.182  1.920 (0.724-5.088)
AA  22 (3.82%)  3 (0.56%)  0.000  6.641 (1.976-22.319)
C  1109 (95%)  1010 (99%)  0.000  0.231 (0.123-0.433)
A  57 (5%)  12 (1%)  0.000  4.326 (2.308-8.109)

rs569956 (RGS8)
TT  379 (65.00%)  368 (72.00%)  0.013  0.722 (0.558-0.934)
GT  23 (3.89%)  20 (4.00%)  0.979  1.008 (0.547-1.858)
GG  181 (31.11%)  123 (24.00%)  0.01  1.420 (1.086-1.857)
T  781 (67%)  756 (74%)  0.000  0.714 (0.593-0.859)
G  385 (33%)  266 (26%)  0.000  1.401 (1.164-1.687)

rs3735328 (ZNF862)
CC  268 (46.00%)  290 (56.67%)  0.000  0.648 (0.510-0.823)
CT  257 (44.00%)  176 (34.44%)  0.000  2.164 (1.677-2.794)
TT  58 (10.00%)  45 (8.89%)  0.519  1.144 (0.760-1.722)
C  793 (68%)  756 (74%)  0.002  0.748 (0.621-0.901)
T  373 (32%)  266 (26%)  0.002  1.337 (1.110-1.610)

rs62621204 (ZNF862)
CC  489 (83.89%)  501 (98.00%)  0.000  0.104 (0.053-0.202)
CT  71 (12.19%)  7 (1.44%)  0.000  9.984 (4.549-21.915)
TT  23 (3.92%)  3 (0.56%)  0.000  6.955 (2.076-23.302)
C  1049 (90%)  1009 (99%)  0.000  0.116 (0.065-0.206)
T  117 (10%)  13 (1%)  0.000  8.657 (4.849-15.453)

rs1332500 (C1orf68)
GG  424 (72.78%)  378 (74.00%)  0.642  0.938 (0.717-1.228)
GA  140 (23.96%)  102 (20.00%)  0.107  1.267 (0.950-1.691)
AA  19 (3.26%)  31 (6.00%)  0.027  0.522 (0.291-0.935)
G  988 (85%)  858 (84%)  0.616  1.061 (0.842-1.337)
A  178 (15%)  164 (16%)  0.616  0.943 (0.748-1.188)

rs3095726 (ZNF573)
TT  163 (28%)  204 (40%)  0.000  0.584 (0.453-0.752)
TC  303 (52%)  233 (45.56%)  0.035  1.291 (1.018-1.638)
CC  117 (20%)  74 (14.44%)  0.015  1.483 (1.078-2.040)
T  629 (54%)  641 (63%)  0.000  0.696 (0.587-0.826)
C  537 (46%)  381 (37%)  0.000  1.436 (1.210-1.705)

rs9975588 (MCM3AP)
GG  473 (81.11%)  440 (86.00%)  0.027  0.694 (0.501-0.961)
GA  107 (18.33%)  61 (12.00%)  0.003  1.658 (1.181-2.329)
AA  3 (0.56%)  10 (2.00%)  0.028  0.259 (0.071-0.947)
G  1053 (90%)  941 (92%)  0.147  0.802 (0.595-1.081)
A  113 (10%)  81 (8%)  0.147  1.247 (0.925-1.680)

rs2839166 (MCM3AP)
TT  457 (78.33%)  409 (80.00%)  0.502  0.905 (0.675-1.213)
TC  123 (21.11%)  92 (18.00%)  0.199  1.218 (0.901-1.645)
CC  3 (0.56%)  10 (2.00%)  0.028  0.259 (0.071-0.947)
T  1037 (89%)  910 (89%)  0.938  0.989 (0.757-1.294)
C  129 (11%)  112 (11%)  0.938  1.011 (0.773-1.322)

rs2305243 (ATL2)
TT  375 (64.44%)  317 (62.00%)  0.434  1.103 (0.863-1.411)
TA  185 (31.67%)  163 (32.00%)  0.953  0.992 (0.769-1.281)
AA  23 (3.89%)  31 (6.00%)  0.106  0.636 (0.366-1.106)
T  935 (80%)  797 (78%)  0.205  1.143 (0.930-1.405)
A  231 (20%)  225 (22%)  0.205  0.875 (0.712-1.076)

rs1555804 (MSRB2)
AA  453 (77.70%)  389 (76.00%)  0.537  1.093 (0.825-1.449)
AG  120 (20.58%)  102 (20.00%)  0.798  1.039 (0.773-1.397)
GG  10 (1.72%)  20 (4.00%)  0.026  0.428 (0.199-0.924)
A  1026 (88%)  880 (86%)  0.189  1.183 (0.921-1.519)
G  140 (12%)  142 (14%)  0.189  0.846 (0.658-1.086)

Indels

rs139095003 CTCCTCCT>CTCCT (FLII)
CTC/CTC  545 (93.57%)  471 (92.2%)  0.401  1.218 (0.768-1.931)
CTC/DEL  13 (2.22%)  22 (4.24%)  0.052  0.507 (0.253-1.017)
DEL/DEL  25 (4.21%)  18 (3.56%)  0.516  1.227 (0.662-2.276)
CTC  1103 (95%)  964 (94%)  0.781  1.053 (0.730-1.520)
DEL  63 (5%)  58 (6%)  0.781  0.949 (0.658-1.370)

rs3835153 GAG>G (DHDH)
GA/GA  487 (83.55%)  455 (89.18%)  0.009  0.624 (0.438-0.889)
GA/DEL  52 (8.89%)  31 (6.00%)  0.075  1.516 (0.956-2.406)
DEL/DEL  44 (7.56%)  25 (4.82%)  0.072  1.587 (0.957-2.632)
GA  1026 (88%)  941 (92%)  0.002  0.631 (0.473-0.841)
DEL  140 (12%)  81 (8%)  0.002  1.585 (1.189-2.113)

rs76499929 AGAAGA>AGA (SYNE3)
AGA/AGA  299 (52.00%)  295 (57.78%)  0.033  0.771 (0.607-0.979)
AGA/DEL  233 (40.00%)  196 (38.33%)  0.586  1.070 (0.839-1.365)
DEL/DEL  51 (8.00%)  20 (3.89%)  0.001  2.353 (1.383-4.004)
AGA  831 (71%)  786 (77%)  0.003  0.745 (0.614-0.903)
DEL  335 (29%)  236 (23%)  0.003  1.343 (1.107-1.628)

rs141384720 AGGAG>AG (SYNE3)
AGGAG/AGGAG  385 (66.11%)  317 (62.00%)  0.157  1.196 (0.934-1.532)
AGGAG/DEL  178 (30.56%)  184 (36.00%)  0.057  0.783 (0.608-1.008)
DEL/DEL  19 (3.33%)  10 (2.00%)  0.180  1.691 (0.779-3.670)
AGGAG  948 (81%)  818 (80%)  0.406  1.095 (0.885-1.354)
DEL  216 (19%)  204 (20%)  0.406  0.914 (0.738-1.130)

rs141384720 AGGAG>AG (AMBN)
AGG/AGG  386 (66.11%)  317 (62.00%)  0.151  1.199 (0.936-1.536)
AGG/DEL  178 (30.56%)  184 (36.00%)  0.055  0.781 (0.607-1.005)
DEL/DEL  19 (3.33%)  10 (2.00%)  0.181  1.688 (0.777-3.664)
AGG  950 (81%)  818 (80%)  0.395  1.097 (0.886-1.357)
DEL  216 (19%)  204 (20%)  0.395  0.912 (0.737-1.128)

rs200420244 T>- (ZNF527)
T/T  528 (90.56%)  473 (92.5%)  0.237  0.771 (0.501-1.188)
T/Null  33 (5.58%)  22 (4.38%)  0.306  1.334 (0.767-2.319)
Null/Null  22 (3.86%)  16 (3.12%)  0.563  1.213 (0.630-2.336)
T  1089 (93%)  968 (95%)  0.194  0.789 (0.551-1.129)
Null  77 (7%)  54 (5%)  0.194  1.267 (0.886-1.814)

rs199964596 CAACAACAAC>CAACAAC (GPSM2)
CAA/CAA  523 (89.66%)  449 (87.88%)  0.334  1.204 (0.826-1.754)
CAA/DEL  16 (2.78%)  31 (6.00%)  0.007  0.437 (0.236-0.809)
DEL/DEL  44 (7.56%)  31 (6.12%)  0.334  1.264 (0.785-2.034)
CAA  1062 (91%)  929 (91%)  0.883  1.022 (0.762-1.371)
DEL  104 (9%)  93 (9%)  0.883  0.978 (0.730-1.312)

rs113082690 ACAGACA>ACA (ABCA10)
ACAG/ACAG  473 (81.11%)  457 (89.44%)  0.000  0.508 (0.358-0.721)
ACAG/DEL  104 (17.78%)  51 (10.00%)  0.000  1.958 (1.368-2.803)
DEL/DEL  6 (1.11%)  3 (0.56%)  0.419  1.761 (0.438-7.077)
ACAG  1050 (90%)  965 (94%)  0.000  0.535 (0.385-0.743)
DEL  116 (10%)  57 (6%)  0.000  1.870 (1.346-2.599)

rs372662724 CTCT>CT (ZNF107)
CT/CT  523 (89.66%)  462 (90.44%)  0.699  0.924 (0.621-1.376)
CT/DEL  38 (6.56%)  28 (5.45%)  0.472  1.203 (0.727-1.990)
DEL/DEL  22 (3.78%)  21 (4.11%)  0.775  0.915 (0.497-1.684)
CT  1084 (93%)  952 (93%)  0.866  0.972 (0.698-1.353)
DEL  82 (7%)  70 (7%)  0.866  1.029 (0.739-1.432)

rs71713548 AGATAGA>AGA (MYCT1)
AGAT/AGAT  376 (64.44%)  358 (70.00%)  0.051  0.776 (0.602-1.001)
AGAT/DEL  188 (32.22%)  143 (28.00%)  0.126  1.225 (0.945-1.588)
DEL/DEL  19 (3.33%)  10 (2.00%)  0.181  1.688 (0.777-3.664)
AGAT  940 (81%)  859 (84%)  0.036  0.789 (0.632-0.985)
DEL  226 (19%)  163 (16%)  0.036  1.267 (1.015-1.518)

rs11368509 ->A (UPP2)
A/A  450 (77.22%)  426 (83.44%)  0.011  0.675 (0.499-0.914)
A/DEL  126 (21.67%)  72 (14.00%)  0.001  1.681 (1.223-2.310)
DEL/DEL  7 (1.11%)  13 (2.56%)  0.098  0.466 (0.184-1.176)
A  1026 (88%)  924 (90%)  0.070  0.777 (0.592-1.021)
DEL  140 (12%)  98 (10%)  0.070  1.287 (0.979-1.690)
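The odds ratios and 95% confidence intervals in Table 9 compare one genotype (or allele) against all others in ARC patients versus controls. The paper does not state how the intervals were computed; the sketch below assumes a 2x2 table with the Woolf (log-based) interval, and the function names `allele_counts` and `odds_ratio_ci` are illustrative, not taken from the authors' analysis.

```python
import math


def allele_counts(hom_ref, het, hom_alt):
    """Return (ref_allele_count, alt_allele_count) from diploid genotype counts."""
    return 2 * hom_ref + het, 2 * hom_alt + het


def odds_ratio_ci(case_with, case_total, ctrl_with, ctrl_total, z=1.96):
    """Odds ratio with a Woolf (log-method) confidence interval.

    Assumption: 'with' vs 'without' the genotype/allele forms a 2x2 table;
    z=1.96 gives an approximate 95% interval.
    """
    a = case_with                 # cases carrying the genotype/allele
    b = case_total - case_with    # cases not carrying it
    c = ctrl_with                 # controls carrying it
    d = ctrl_total - ctrl_with    # controls not carrying it
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# rs62621204 (ZNF862), CT genotype: 71 of 583 ARC patients, 7 of 511 controls.
or_ct, lo_ct, hi_ct = odds_ratio_ci(71, 583, 7, 511)
print(f"CT: OR={or_ct:.3f} ({lo_ct:.3f}-{hi_ct:.3f})")

# Allele counts follow from genotype counts (CC, CT, TT).
case_c, case_t = allele_counts(489, 71, 23)
ctrl_c, ctrl_t = allele_counts(501, 7, 3)
or_t, lo_t, hi_t = odds_ratio_ci(case_t, case_c + case_t, ctrl_t, ctrl_c + ctrl_t)
print(f"T allele: OR={or_t:.3f} ({lo_t:.3f}-{hi_t:.3f})")
```

Under these assumptions the CT genotype of rs62621204 gives values that agree with the tabulated 9.984 (4.549-21.915) to within rounding, and the derived allele counts match the C and T rows of the same site.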

Figures

Figure 1

Sequencing base content distribution on clean reads. The X-axis represents the position of each base on the reads. The Y-axis represents the content of each base. Normally, apart from fluctuation over the first few base positions of the reads, the curve for base A should overlap that for base T, and the curve for base G should overlap that for base C, without large fluctuations.

Figure 2

Distribution of base quality values on clean reads. The X-axis represents the position of each base on the reads. The Y-axis represents the sequencing quality value. Each point in the graph represents the sequencing quality value of the base at the corresponding position on the reads.

Figure 3

Sequencing depth over the target region and insert fragment length (the length of the sequenced DNA fragments) of paired-end reads. a: The depth distribution of single-base sequencing in the target region. The X-axis represents the sequencing depth, and the Y-axis represents the proportion of the target region with the corresponding sequencing depth; b: Cumulative sequencing depth distribution on the target region. The X-axis represents the sequencing depth, and the Y-axis represents the proportion of the target region that reaches the corresponding sequencing depth or above; c: The length distribution of insert fragments of paired-end reads. The X-axis represents the insert fragment length, and the Y-axis represents the proportion of reads with the corresponding fragment length.

Figure 4

The length distribution of indels in the coding region. The X-axis represents the length of the insertion/deletion, and the Y-axis represents the number of insertions/deletions.

Figure 5

After in-depth analysis, 20 genes (containing SNP and indel polymorphism sites) were found to be present in 6 or 7 of the nuclear age-related cataract samples. Heatmaps of these genes are shown in Figure 5.

Figure 6

The change in 3D protein structure caused by the SNP site rs3095726 of the ZNF573 gene. SWISS-MODEL was used to analyze the three-dimensional structural changes caused by the missense mutation Met465Val (ATG → GTG) in the coding region of the ZNF573 gene. The substitution changes the carbon-atom composition of the residue: when ATG → GTG occurs, the original Met becomes Val. The position of the first-order bicarbonate atom and of a hydrogen bond in the rs3095726 protein shifts from the tail to the middle. Analysis of the space-ball structure suggests that, because of the ATG → GTG change, the residue moves from the initially exposed position of Met to a hidden position for Val, so that the ZNF573 protein is not easily activated and its function is affected. A: The protein line structure of ZNF573 (rs3095726: ATG → Met); B: The protein line structure of ZNF573 (rs3095726: GTG → Val); C: The protein space-ball structure of ZNF573 (rs3095726: ATG → Met); D: The protein space-ball structure of ZNF573 (rs3095726: GTG → Val).

Figure 7

The change in 3D protein structure caused by the SNP site rs78557458 of the GAS2L2 gene. SWISS-MODEL was used to analyze the three-dimensional structural changes caused by the missense mutation Val71Phe (GTT → TTT) in the coding region of the GAS2L2 gene. The substitution changes the carbon-atom composition of the residue: when GTT → TTT occurs, the original Val becomes Phe. The 3D structure analysis showed that the primary hydrogen bond changed from an open to a circular conformation at the protein position corresponding to rs78557458. The space-ball structure analysis showed that the site changed from its original single position to a circular shape, which may make GAS2L2 harder to activate and thereby affect its function. A: The protein line structure of GAS2L2 (rs78557458: GTT → Val); B: The protein line structure of GAS2L2 (rs78557458: TTT → Phe); C1: The protein space-ball structure of GAS2L2 (rs78557458: GTT → Val); C2: The protein space-ball structure of GAS2L2 (rs78557458: TTT → Phe).
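The codon changes named in the captions of Figures 6 and 7 (ATG → GTG and GTT → TTT) can be checked against the standard genetic code. The sketch below is only an illustration of that lookup; `classify_substitution` is a hypothetical helper, and it assumes the codons are given on the coding strand as DNA.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (ref_aa, alt_aa, kind) for a single-codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


# Codon changes as stated in the captions.
for label, ref, alt in [
    ("ZNF573 rs3095726", "ATG", "GTG"),
    ("GAS2L2 rs78557458", "GTT", "TTT"),
]:
    print(label, ref, "->", alt, classify_substitution(ref, alt))
```

Both changes come out as missense (M to V and V to F in one-letter codes), consistent with the Met465Val and Val71Phe substitutions described in the captions.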

Supplementary Files

This is a list of supplementary files associated with this preprint.

Table2.docx
