Amino-terminal acetylaon of by N- terminal acetyltransferases: mechanisms and relevance to human c diseases

Gholson J. Lyon, M.D. Ph.D.

Acknowledgments

Michael Klingener Jason O’Rawe Yiyang Wu Michael Schatz Giuseppe Narzisi Thomas Arnesen Rune Evjenth Johan R. Lillehaug

Max Doerfel

our study families

Figure 4.

Figure 4. NAT activity of recombinant hNaa10p WT or p.Ser37Pro towards synthetic N-terminal peptides. A) and B) Purified MBP-hNaa10p WT or p.Ser37Pro were mixed with the indicated oligopeptide substrates (200 µM for SESSS and 250 µM for DDDIA) and saturated levels of acetyl-CoA (400 µM). Aliquots were collected at indicated time points and the acetylation reactions were quantified using reverse phase HPLC peptide separation. Error bars indicate the standard deviation based on three independent experiments. The five first amino acids in the peptides are indicated, for further details see materials and methods. Time dependent acetylation reactions were performed to determine initial velocity conditions when comparing the WT and Ser37Pro NAT-activities towards different oligopeptides. C) Purified MBP-hNaa10p WT or p.Ser37Pro were mixed with the indicated oligopeptide substrates (200 µM for SESSS and AVFAD, and 250 µM for DDDIA and EEEIA) and saturated levels of acetyl-CoA (400 µM) and incubated for 15 minutes (DDDIA and EEEIA) or 20 minutes (SESSS and AVFAD), at 37°C in acetylation buffer. The acetylation activity was determined as above. Error bars indicate the standard deviation based on three independent experiments. Black bars indicate the acetylation capacity of the MBP-hNaa10p wild type (WT), while white bars indicate the acetylation capacity of the MBP-hNaa10p mutant p.Ser37Pro. The five first amino acids in the peptides are indicated.

The role of thyroid hormone in crenism, which is caused by lack of iodine during maternal pregnancy, so this is an environmentally triggered disease. A LA T L E T R E NA R A L N T T E A E TR R E N O A U R T T O E E U S R T O E U S T E S I I I I I I NH2 I I I 5' I 5 NH2 NH2 I I5' 5 5' 5 5' 5 DIT + I + 5' 5 5' 5 SO O CH2 CH COOH DIT + I + DIT + I + HO O CH2 COOH SO4 4 3' O SO4 3 CH2 OCH COOHCH2 CH COOH HO O 3'HO CH2 3COO OH CH2 COOH 3' 3 3' 3 HydroquiHynonedroquiHynonedroquinone 3' 3 3' 3 I I I I I I I II I I I T4TS4 orS orT4 GT4TG4S or T4G TetracTetrTaectrac Ether-linkEther-linkEther-link ConjugatConjugation ionConjugatto: Thyroid Hormone and to: ion to: Deiodinaoncleavage cleavagecleavage Deamination Deaminat+ ion + - S- ulfSatulfeate - Sulfate Pathways Deamination + - Glucuonide- Glucuonide DecarboxylatDecarboxylatDecarboxylation ion ion - Glucuonide I I I I I I NH2 NH2 5' 5 5' 5 NH2 HO O5' HO CH5 CHO COOHCH2 CH COOH HO3' 3 O 3' 2 CH3 CH COOH T4 3' 3 2 I I I I I I T4 T4 ’ T 5 -DEIODINATION D1 4D1 5-DEIODINATION 5'-Deiodin5a't-iDoeniodinDa1t i&on D2 D1 &D D1 2& D3 D15 -&D eDi3odina5t-iDoeniodination 5'-Deiodinaactivatingtion D 1 & DD22 D3D1 & Dinactivating3 5-Dei odination

I NH2I NH2 I I NH2 NH2 5' 5 5' 5 5' 5 5' 5 HO O HO I CH2 OCH CONHO2 HCH2 CH COOH HO O I HO CH2 CHO COOHNHCH2 CH COOH 3' 5' 3 5 3' 3 3' 5' 3 3' 5 3 HO O CH CH COOH HO O CH CH COOH I 3' I 3I 2 I I 3'I I 3 I 2 I TI3 T3T3 reI rT3ver seI rTe3verse T3 T3 activeD E I O D I ND A E T I O I O D N I N A Tinactive I O Nre verse T3 D E I O D I N A T I O N Bacterial communicaon via small molecules and pepdes

Acyl Homoserine Autoinducing Lactones (AHL’s) = Peptides (AIP’s) = O H N ~6-10 amino acid peptides R O Modifications include O geranylation, thiolactone formation (cyclization) R = (CH2)nCH3, CO(CH2)nCH3 Lyon, G. & Muir, T.W., Chem. Biol. 2004 Bassler, B., Cell, 2004, 109, 421

I moved to Utah in July 2009 to find at least one new human disease, thus revealing new biology. u July 2009-December 2009: Aended weekly genecs case conference in which 10-30 genec cases are presented weekly, led by Dr. Alan Rope and aended by Drs. John Carey and John Opitz. u There are indeed MANY idiopathic disorders not described in the literature, many of which have neuropsychiatric manifestaons. I thought about hundreds of such cases, looking for the ideal first family to sequence. Discovering a new syndrome and its genec basis. I met the enre family on March 29, 2010

Photo of mother with son in late 1970’s This is the first boy in the late 1970’s.

First boy. Called “a lile old man” by the family. Died around ~1 year of age, from cardiac arrhythmias. This is the “Proband” photograph presented at Case Conference.

prominence of eyes, down-sloping palpebral fissures, thickened eyelids, large ears, beaking of nose, flared nares, hypoplasc nasal alae, short columella, protruding upper lip, micro-retrognathia

Family now in October 2011, with five mutaon- posive boys dying from the disease.

I 1 +/ 2 +/mt

II SB

1 mt/ 2 +/mt 3 +/mt 4 5 +/mt 6 7 +/+ 8 +/

III

1 2 +/ 3 +/+ 4 mt/ 5 6 mt/ 7 mt/ An unrelated second family was also idenfied, due to sharing the same genotype, i.e. the same mutaon.

II-1 III-2

Contributed by Les Biesecker and colleagues at NIH Ogden Syndrome – in 2011

We found the SAME mutaon in two unrelated families, with a very similar phenotype in both families, helping prove that this genotype contributes to the phenotype observed. These are the Major Features of the Syndrome. A

B

II-1 II-6 III-4 III-6 III-7

C D

II-1 III-2 u We performed X- exon capture with Agilent, followed by Next Gen Sequencing with Illumina. u We analyzed the data with ANNOVAR and VAAST (Variant Annotaon, Analysis and Search Tool). New computaonal tools for idenfying disease-causing mutaons by individual genome sequencing.

Yandell, M. et al. 2011. “A probabilisc disease-gene finder for personal genomes.” Genome Res. 21 (2011). doi:10.1101/gr.123158.111.

Wang, K., Li, M., and Hakonarson, H. (2010). ANNOVAR: funconal annotaon of genec variants from high-throughput sequencing data. Nucleic Acids Res 38, e164.

VAAST integrates AAS & Variant frequencies in a single probabilisc framework

• non-coding variants scored using allele frequency differences

• ni : frequency of variant type among all variants observed in Background and Target genomes

• ai: frequency of variant type among disease causing mutations in OMIM

• This approach means that every variant can be scored, non-synonymous, synonymous, coding, and non-coding. Phylogenetic conservation not required.

This is the mutation we found… one nucleotide change out of 6 billion nucleotides in a diploid genome…

Pro37 Mutation C C C

Ser37 C T T T C C T G G WT

T Unaffected Brother

C Proband Proving Relevance of the mutation uPresent in two unrelated families with very similar phenotype of affected boys. uBlinded Sanger sequencing showed perfect segregaon of the mutaon with the disease. Mutaon present in Proband, Carrier Mother, Carrier Grandmother and other carrier mothers. Absent in unaffected brother and unaffected uncle. uAlso present in DNA from formalin-fixed paraffin- embedded ssue from two other deceased affected boys, found in pathology department, saved in one case for 30 years. uMutaon NOT present in ~6000 exomes or genomes sequenced at BGI, CHOP and Utah for other projects. Ogden Syndrome, in honor of where the first family lives, in Ogden, Utah

The mutation is a missense resulting in Serine to Proline change in Naa10p

- Ser 37 is conserved from yeast to human - Ser37Pro is predicted to affect functionality (SIFT and other prediction programs) - Structural modelling of hNaa10p wt (cyan) and S37P (pink)

The mutaon disrupts the N-terminal acetylaon machinery (NatA) in human cells.

Slide courtesy of Thomas Arnesen Nε-acetylaon: Lysine Acetylaon (KATs, HATs) Introduction:Revealing acetyltransferases the secrets of proteins

A triple-nuclear and multi-dimensional high-resolution NMR spectroscopy study Revealing the secrets of proteins

A triple-nuclear and multi-dimensional high-resolution NMR spectroscopy study Annette Katharina Brenner

Annette Katharina Brenner

Revealing the secrets of proteins

Figure 1.3: N-acetylation. Lysine acetyltransferasesA triple-nuclear (KATs) and multi-dimensional transfer the acetyl high-resolution group of acetyl-CoA NMR to the side chain of lysine, and thus remove the positive chargespectroscopy of the study amino acid. Deacetylases (HDACs) catalyze the reverse reaction. Dissertation for the degree philosophiae doctor (PhD) at the University of Bergen

Annette Katharina Brenner 1.2.3 N -acetylation Dissertation for the degree philosophiae doctor (PhD) 2012 In N -acetylation, the acetyl group of acetyl-CoA is attransferred the University to of the Bergen backbone of the N-terminal amino acid of the target (figure 1.4).

2012

Dissertation for the degree philosophiae doctor (PhD) Figure 1.4: N-acetylation. The acetyl group of acetyl-CoA is irreversibly transferred to the N- terminal amino acid of the target protein by an N-terminal acetyltransferaseat the University of (NAT). Bergen

The reaction takes place post-initial rather than post-translational2012 [34] as the nascent protein is acetylated when 20-50 amino acids protrude from the [39]. N- acetylation neutralizes the positive charge of the protein N-terminal, which may affect the protein’s function, its stability, its interaction with other molecules and its susceptibility to further modifications [37]. Unlike N-acetylation, N-terminal acetylation appears to be irreversible [37]. N-acetylation occurs in all kingdoms of life, but the fraction of acetylated proteins varies widely. In mammals, supposedly 80- 90 % of all proteins are acetylated; more simple eukaryotes like yeast only show about 50 % of acetylated proteins, while proteins in prokaryotes and archaea are rarely acetylated [40]. The methionine that resides at the N-terminal of most proteins is

5 Introduction: acetyltransferases

Figure 1.3: N-acetylation. Lysine acetyltransferases (KATs) transfer the acetyl group of acetyl-CoA to the side α chain of lysine, and thus remove the positive charge of the amino acid. Deacetylases (HDACs)N catalyze-acetylaon: Amino-terminal Acetylaon the reverse reaction. Revealing the secrets of proteins

A triple-nuclear and multi-dimensional high-resolution NMR 1.2.3 N-acetylation spectroscopy study Revealing the secrets of proteins

In N -acetylation, the acetyl group ofA triple-nuclearacetyl-CoA and is multi-dimensionaltransferred to high-resolutionthe backbone NMR of the spectroscopy study Annette Katharina Brenner N-terminal amino acid of the target protein (figure 1.4).

Annette Katharina Brenner

Revealing the secrets of proteins Figure 1.4: N-acetylation. The acetylA triple-nuclear group of acetyl-CoA and multi-dimensional is irreversibly high-resolution transferred NMR to the N- terminal amino acid of the target protein by an N-terminal spectroscopyacetyltransferase study (NAT). Dissertation for the degree philosophiae doctor (PhD) at the University of Bergen The reaction takes place post-initial rather than post-translational [34] as the nascent

Annette Katharina Brenner protein is acetylated when 20-50 amino acids protrude from the ribosome [39]. N - Dissertation for the degree philosophiae doctor (PhD) 2012 acetylation neutralizes the positive charge of the proteinat the University N-terminal, of Bergen which may affect the protein’s function, its stability, its interaction with other molecules and its susceptibility to further modifications [37]. Unlike 2012 N -acetylation, N-terminal acetylation appears to be irreversible [37]. N-acetylation occurs in all kingdoms of life, but the fraction of acetylated proteins varies widely. In mammals, supposedly 80- 90 % of all proteins are acetylated; more simple eukaryotes like yeast only show about 50 % of acetylated proteins, while Dissertation proteins for in the prokaryotes degree philosophiae and doctor archaea (PhD) are rarely at the University of Bergen acetylated [40]. The methionine that resides at the N-terminal of most proteins is

5 2012 N-terminal processing in human cells

Van Damme P et al., PLoS Genet, 2011 Protein acetylation in higher eukaryotes

-Irreversible 1. -N-terminus -Cytoplasm

posttranslational N-epsilon acetylation -Reversible -Lysine 2. KAT KAT Ac Ac -Nucleus-cytoplasm

KDAC KDAC

CYTOPLASM NUCLEUS

-Irreversible? 3. -N-terminus -Vesicles etc. NAT acvity of recombinant hNaa10p WT or p.Ser37Pro towards synthec N-terminal pepdes

Assay performed in Thomas Arnesen lab GST GST- GST- Naa50 wt Naa15 wt

F1 F2 F3 F4 F1 F2 F3 F4 F1 F2 F3 F4 kDa kDa kDa 100 70 100 100 55 70 70 40 55 GST-Naa50 55 GST-Naa15 35 40 40 25 GST 35 35 25 25 15

GST- GST- GST- Naa10 wt Naa10 S37P Naa10 R116W

F1 F2 F3 F4 F1 F2 F3 F4 F1 F2 F3 F4 kDa kDa kDa

100 100 100 70 70 70 55 GST-Naa10 55 GST-Naa10 S37P 55 GST-Naa10 R116W 40 40 40 35 35 35 25 Protein Expression and purificaon 25 25

GST GST- GST- Avi/His Naa50 wt Naa15 wt 6 Naa10 wt

F1 F2 F3 F4 F1 F2 F3 F4 F1 F2 F3 F4 kDa kDa kDa F1 F2 F3 F4 100 kDa 70 100 100 55 70 70 100 40 55 GST-Naa50 55 GST-Naa15 70 35 40 40 55 25 GST 35 35 25 25 40 Avi-Naa10-His6 15 35

GST- GST- ExpressionGST- plasmid was transformed in chemically competent E. coli BL21 DE3 (NEB). A starting culture (100 ml LB media + antibiotic) was incubated for 12 h at 37°C. 30 ml of this culture were

Naa10 wt Naa10 S37P addedNaa10 to 400 R116Wml expression culture (LB media + 1 % (m/v) glucose + antibiotic) and were incubated at 37°C until an optical density of 0.8 was reached. Expression was induced with 0.5 mM IPTG

(Milipore) at 30°C for 1h. The bacteria were pelleted at 6000 x g for 15 min and resuspended in 8 ml lysis buffer (40 mM Tris/Cl, pH 8.0; 100 mM NaCl) or HIS-lysis buffer (50 mM NaH2PO4, 300 mM F1 F2 F3 F4 F1 F2 F3 F4 F1 F2 F3 F4 kDa kDa kDaNaCl, pH 8,0). The cells were ultrasonificated on ice and cellular debris was pelleted at 20,800 x g for 30 min at 4°C.

100 100 Protein100 was bound to 0.7 ml GSH beads/Glutathione sepharose (Sigma-Aldrich) or Ni-NTA (Qiagen), the beads were washed twice with lysis buffer or HIS-lysis buffer + 50 mM imidazole and purified 70 70 70 protein was eluted with 20 mM L-Glutathione (Sigma-Aldrich) or 250 mM imidazole. Fractions of 0.5 ml were collected and aliquots separated on SDS-PAGE. 55 GST-Naa10 55 GST-Naa10 S37P 55 GST-Naa10 R116W 40 40 40 35 35 35 25 25 25

Avi/His6 Naa10 wt

F1 F2 F3 F4 kDa

100 70 55 40 Avi-Naa10-His6 35

Expression plasmid was transformed in chemically competent E. coli BL21 DE3 (NEB). A starting culture (100 ml LB media + antibiotic) was incubated for 12 h at 37°C. 30 ml of this culture were

added to 400 ml expression culture (LB media + 1 % (m/v) glucose + antibiotic) and were incubated at 37°C until an optical density of 0.8 was reached. Expression was induced with 0.5 mM IPTG

(Milipore) at 30°C for 1h. The bacteria were pelleted at 6000 x g for 15 min and resuspended in 8 ml lysis buffer (40 mM Tris/Cl, pH 8.0; 100 mM NaCl) or HIS-lysis buffer (50 mM NaH2PO4, 300 mM

NaCl, pH 8,0). The cells were ultrasonificated on ice and cellular debris was pelleted at 20,800 x g for 30 min at 4°C.

Protein was bound to 0.7 ml GSH beads/Glutathione sepharose (Sigma-Aldrich) or Ni-NTA (Qiagen), the beads were washed twice with lysis buffer or HIS-lysis buffer + 50 mM imidazole and purified

protein was eluted with 20 mM L-Glutathione (Sigma-Aldrich) or 250 mM imidazole. Fractions of 0.5 ml were collected and aliquots separated on SDS-PAGE.

Big Quesons though:

?

Simulated structure of S37P mutant

What is the molecular basis of Ogden syndrome? • Naa10/Naa15 complex • Naa10 localisaon • Naa10 funcon what can we learn from ogden syndrome? • characterizing different model systems (fibroblasts, yeast, C. elegans) These are the Major Features of the Syndrome. Family 1: II-1 Family 1: II-1 Family 1: II-6 Family 1: III-4 Family 1: III-6 Family 1: III-6 Family 1: III-6 These are the Major Features of the Syndrome. Family 2: II-1 Family 2: II-1 Family 2: III-2 Family 2: III-2 Family 2: III-4 NAA10/NatA is essential for life

C. elegans

T. Brucei

D. melanogaster

Wang et al., Dev Dyn, 2010; Ingram et al., Mol Biochem Parasitol, 2000; Sonnichsen et al., Nature, 2005 The human N-Acetylome

Free

Ser-, Ala-, Thr-, Val- NatA ~8500 substrates Met-Lys-/Gln-/Met- >40% proteome NatF+ NatC NatE

NatF NatD NatB

• A majority of soluble human proteins are N-terminally acetylated • NatA is a major protein modifying enzyme of the human proteome

Arnesen T et al., Proc Natl Acad Sci USA, 2009; Van Damme P et al., PLoS Genet, 2011 Protein N-terminal acetyltransferases in cancer TV Kalvik and T Arnesen 2

Protein N-terminal acetyltransferases in cancer TV Kalvik and T Arnesen 2

Figure 1. Overview of human NATs and their substrate specificities. The human NATs are composed of catalytic subunits (yellow) and auxiliary subunits (green),Figure and 1. Overview all are associated of human NATs with and their substrate (blue). specificities. NatA potentially The human NATs acetylates are composed Ser-, Ala-, of catalytic Thr-, Gly- subunits and (yellow) Val- N-termini and auxiliary after the iMet has been removedsubunits by (green), methionine and all are aminopeptidases associated with ribosomes (MetAPs). (blue). NatB NatA potentially potentially acetylates acetylates Ser-, Met-Asp-, Ala-, Thr-, Met-Glu- Gly- and Val- and N-termini Met-Asn-, after whereasthe iMet NatC may target Met-Leu-,has been Met-Ile-, removed Met-Phe- by methionine and Met-Trp-. aminopeptidases NatD apparently (MetAPs). NatB only potentially acetylates acetylates the Met-Asp-, Ser- N-termini Met-Glu- of and histones Met-Asn-,Oncogene H2A whereas and (2012) NatC 1 H4. --8 may NatE and NatF target Met-Leu-, Met-Ile-, Met-Phe- and Met-Trp-. NatD apparently only acetylates the Ser- N-termini of histones H2A and& 2012 H4. Macmillan NatE Publishers and NatF Limited All rights reserved 0950-9232/12 demonstratedemonstrate some specificity some specificity towards towards NatC-type NatC-type substrates substrates as as well well as as Met-Lys-, Met-Lys-, Met-Ala-, Met-Ala-, Met-Met- Met-Met- and Met-Val-. and Met-Val-. *NatA orwww.nature.com/onc *NatA hNaa10p or may hNaa10p also may also mediate post-translationalmediate post-translational acetylation acetylation of mature of mature actins actins harboring harboring Asp- Asp- and and Glu- Glu- N-termini. N-termini.

REVIEW

and the multiple cellular pathways under its control. A growing Proteinconsequently N-terminal loss of hNatA acetyltransferases activity result in in DNA cancer damage and the multiple cellular pathways under its control. A growing consequently loss of hNatA activity result in DNA damage number of studies have reported that a high expression of hNaa10p TV Kalviksignaling1 and T Arnesen through1,2 gH2A.X/chk2 module and phosphorylation of number of studiesin certain have cancers reported seems that to correlate a high expression to a low survival of hNaa10p rate and signalingSer15 of p53 through leadingg toH2A.X/chk2 the ultimate module fate of apoptosis. and phosphorylation Three of 39,40 15 in certain cancersaggressiveness seems of to tumors. correlateNAA10 to awas low found survival overexpressed rate and in TheSeranaplastic human N-terminalof thyroid p53 acetyltransferases leading carcinoma to (NATs) cell the catalyze lines ultimate the with transfer non-functional of fate acetyl moieties of apoptosis. TP53to the N-termini Three of 80--90% of all aggressivenesscancers of tumors. of different39,40 NAA10 types andwas tissues found such overexpressed as hepatocellular in humananaplasticwere proteins. in the Six same thyroid NAT types study are carcinoma present reported in humans, to cell be NatA--NatF, sensitized lines each with is to composed chemother- non-functional of specific subunits TP53 and each acetylates a carcinoma,41 colorectal cancer,40,42 lung cancer39,42 and breast setapeutic of substrates drugs defined in by a drugthe N-terminal and cell amino-acid type sequence.specific NATs manner have been by suggested hNAA10 to act as oncoproteins as well cancers of different42,43 types and tissues such as hepatocellular aswere tumor suppressors in the47 in same human cancers, study and reported NAT expression to may be be both sensitized elevated and decreasedto chemother- in cancer versus non-cancer 41cancer. Overexpression40,42 of hNaa10p was39,42 also reported in the tissues.knockdown. Manipulation of NATs in cancer cells induced cell-cycle arrest, apoptosis or autophagy, implying that these enzymes carcinoma, urinarycolorectal bladder cancer, cancer, breastlung cancer cancer and cervicaland carcinoma. breast42 apeuticIn non-small drugs lung in a cell drug cancer and cells, cell knockdown type specific of hNAA10 mannerwas by hNAA10 42,43 target a variety of pathways.47 Of particular interest is hNaa10p (human ARD1), the catalytic subunit of the NatA complex, which cancer. OverexpressionExperiments with of the hNaa10p MCF-7 breast was cancer also reported cell line show in the that wasknockdown.shown coupled to to a suppress number of signaling cell proliferation molecules including due hypoxia to reducedinducible factor-1 cyclina, b-catenin/cyclin D1 D1, TSC2/mammalian urinary bladderexpression cancer, of hNaa10p breast increases cancer the and proliferation cervical of carcinoma. this cell line42 by targetexpression.In of rapamycin, non-small48 myosinThis tendencylung light chain cell kinase was cancer , DNA also methyltransferase1/E-cadherin reported cells, knockdown in two anaplastic and of p21-activated hNAA10 kinase-interactingwas exchange factors (PIX)/Cdc42/Rac1. The variety of mechanistic links where hNaa10p47 acts as a NAT, a lysine acetyltransferase or promoting passage of the G1/S and G2/M checkpoints as suggested displayingthyroid a non-catalytic carcinoma role, cell provide lines insights with to non-functional how hNaa10p may act TP53. as both a tumor suppressor and oncoprotein. Experiments with the MCF-7 breast43 cancer cell line show that shown to suppress cell proliferation due to reduced cyclin D1 by flow cytometry analysis. The same study also confirmed that In addition,48 the latter study also reported reduced levels of expression of hNaa10p increases the proliferation of this cell line by Oncogeneexpression.advance onlineThis publication, tendency 5 March 2012; was doi:10.1038/onc.2012.82 also reported in two anaplastic hNaa10p expression is significantly higher in the breast cancer tissue cyclin E and an upregulation of cyclin-dependent kinase inhibitor47 promoting passagethan in tissue of the adjacent G1/S and to the G2/M cancerous checkpoints tissue.43 as suggested Keywords:thyroidp27/Kip1,N-terminal carcinoma possibly acetylation; contributing cell NAT; Naa10p; lines to oncoprotein; with cell-cycle non-functional tumor arrest. suppressor47 Apparently, TP53. by flow cytometryIndeed, analysis. ectopic43 expressionThe same of hNaa10p study also increased confirmed cell prolifera- that theIn observed addition, effects the in latter non-small study lung also cell reported cancer cells reduced are levels of tion and upregulation of involved in cell survival, mediated via hNaa10p catalyzed N-e-acetylation of one or more hNaa10p expression is significantly higher44 in the breast cancer tissue cyclin E and an upregulation of cyclin-dependent kinase inhibitor proliferation and metabolism. Among43 them was Beclin 1, a PROTEINlysine N-TERMINAL residues ACETYLATION of b-catenin, a known regulatorevolution of from cyclin yeast47 toD1 humans with respect to substrate than in tissue adjacent to the cancerous tissue. 45 p27/Kip1, possibly contributing to48 cell-cycle arrest. Apparently, 25,34 --36 protein involved in autophagy and anti-apoptotic pathways. In Mostthrough eukaryotic binding proteins underg to theocotranslationaland/orpost- cyclin D1 promoter. Inspecificity addition, and hNaa10pthe composition of the complexes. NatF, Indeed, ectopicagreement expression with these of findings,hNaa10p it increased was observed cell that prolifera- the cell translationalthesynergetically observed modifications, activates and effects this the is often cyclin in crucial non-small D1 for promoter their on lung by the activating other cell hand, cancer is AP-1 present only cells in higher are eukaryotes and its regulation and function. One of the most common covalent presence may partially explain the higher level of Nt-acetylation 2 tion and upregulationproliferation was of markedly genes reduced involved in hNAA10 inknockdown cell survival, cells. modificationsmediatedproteins in eukaryotes (c-Jun via is and N hNaa10pa-terminal c-Fos) acetylation via catalyzed the (Nt-acetylation), ERK1/2 N-pathwaye-acetylationseen in and humans in complex as compared of one with or yeast. more 44 which is believed to occur on B80--90% of all soluble human Some of the catalytic NAT subunits were also demonstrated to proliferationThese and studies metabolism. strongly supportAmong a role them for hNaa10p was Beclin in promoting 1, a lysinewith b-catenin residues induce of b-catenin, the1--4 transcription a known of the regulator cyclin D1 of cyclin D1 45 proteins and 50--70%49 of soluble yeast proteins. catalyze48 N-e-acetylation, which is the acetylation of e-lysine groups cell proliferation and cell survival. Nt-acetylationpromoter. wasCyclin long believed D1 is to a prominentprotect proteins regulator from on of internal cell proliferation sites of proteins. This reaction is normally catalyzed by protein involved in autophagy and anti-apoptotic pathways. In through binding to the cyclin D1 promoter.50,51 In addition,37 hNaa10p degradation by increasing their stability.5 However, a recent study lysine acetyltransferases (KATs). The fact that NAT subunits are Furthermore, early studies reported that siRNA mediated and has been coupled to tumorigenesis. In a separate study 15,28 -- agreement with these findings, it was observed that the cell suggestssynergetically that Nt-acetylation activates of proteins may the act cyclinas a general D1 promoteralso found in non-ribosomal by activating forms, including AP-1 in the nucleus, 30 knockdown of hNAA10 and hNAA15 resulted in a significant degradationwhere signal hNaa10p in yeast, being was a depleted central part of from the N-end anaplastic rule thyroidmay suggest carcinoma that they have functions besides the co- proliferationreduction was markedly of viable reduced HeLa cells. in h ThisNAA10 reductionknockdown in viability cells. was pathway.proteinscells6,7 Additional by RNAi, (c-Jun roles it range and was from c-Fos) reported regulation via ofthat the protein-- the ERK1/2 acetylationtranslational pathway Nt-acetyltransferase status and of incomplex activity. protein interactions,8--11 sub-cellular targeting47 to membranous Very recently, a mutation in hNAA10 was shown to be the These studiesapparently strongly caused support by a caspase-dependent role for hNaa10p apoptosis in promoting of G0/G1 withb-catenin12b --17-catenin was unchanged. induceThis the could transcription indicate that of growth the cyclin D1 46 compartments and49 inhibition of underlying cause of Ogden syndrome, a lethal infantile syndrome cell proliferationphase and cells. cellThis survival. phenotype was recently also confirmed for translocation,promoter.inhibition18 making and thisCyclin cell-cycle a very diverse D1 arrest isprotein a prominent modifier. mediated19 by regulatorin reduced males. The of levels resulting cell of proliferation hNaa10p-Ser37Pro protein displayed reduced NAT-activity, in agreement with a functional hNaa10p human colon carcinoma HCT116 p53 þ / þ cells treated with cyclin D1 can be independent of the hNaa10p-mediated50,51 regula- Furthermore, early studies reported that siRNA mediated and has been coupled to tumorigenesis.being essentialIn for developmenta separate and life study in humans.38 This is so far sihNAA10, whereas HCT116 p53À/À cells showed reduced cell tion of b-catenin, perhaps solely mediated viathe only the NAT-mutation ERK1/2-AP-1 directly linked to disease. knockdown of hNAA1047 and hNAA15 resulted in a significant PROTEINwhere N-TERMINAL hNaa10p ACETYLTRANSFERASES was depleted (NATs) from anaplastic thyroid carcinoma growth. This latter study also pointed to pathways implicated in pathway. a The hallmarks of cancerous cells are their ability to escape Nt-acetylation is a catalytic process where N -terminal acetyl- apoptosis, sustain an active cell proliferation and eventually reduction ofthe viable sihNAA10 HeLa-induced cells. apoptosis. This reduction Depletion of in hNaa10p viability leads was to transferasescellsStudies (NATs) by transferby RNAi, Lee anet acetyl it al.39 groupwassuggest from reported acetyl yet coenzyme another that interaction the acetylation partner of status of 47 invade other tissues. Induction of apoptosis and/or reducing the A (Ac-CoA) to the a-amino group of a polypeptide N-terminus. cell proliferation in cancer cells could retard the cancerous tissue apparently causedactivation by and caspase-dependent stabilization of p53, specifically apoptosis by phosphorylation of G0/G1 This processbhNaa10p-catenin occurs cotranslationallythat was may unchanged. influenceafter the first 20--50 its residuesoncogenicThis could properties, indicate DNA that growth 46 15 20 --22 from developing and cure the patient. In the search for ways to of Ser . This indicates that the apoptosis observed is p53 havemethyltransferase1 left the ribosomal exit tunnel, (DNMT1).but Nt-acetylation DNMT1 has is responsible for the phase cells. This phenotype was recently also confirmed for inhibition and cell-cycle23,24 arrest mediatedcombat cancer, by reducedNATs and especially levels hNaa10p of have emerged as dependent, and depletion of hNaa10p indeed leads to increased also beenmethylation shown to occur of post-translationally.DNA and aberrant methylationpotential is characteristic candidates for this of task. In this review we will summarize human colon carcinoma HCT116 p53 þ / þ cells treated with Thecyclin amino-acid D1 sequences can be of the52,53 independent N-termini determine ofwhich thethe hNaa10p-mediated current knowledge of the NATs regula- in relation to cancer and transcription of p53-dependent pro-apotopic genes. The same proteinstumor are Nt-acetylated development. and by whichhNaa10p NAT the acetylation directly is interacts with DNMT1 sihNAA10, whereas HCT116 p53À/À cells showed reduced cell tion2,25,26 of b-catenin, perhaps solely mediatedprovide an overview via the of the ERK1/2-AP-1 properties of hNaa10p supporting its study shed light on the events upstream of p53 activation. They catalyzed.and increasesIn humans, its six enzymatic different NATs capability have been defined in an acetyltransferasecandidacies as an oncoprotein (AT)- and tumor suppressor. 47 on the background of their unique subunits and their defined set growth. Thisreported latter that studygH2A.X also and pointed chk2 were to pathwaysactivated by implicated phosphorylation in pathway.independent manner by facilitating the interaction between of substrates, although in recent years it was39 discovered that there the sihNAA10independent-induced apoptosis. of the functional Depletion status of of hNaa10pTP53. As activation leads to of existsDNMT1 someStudies overlap and in by the its substrate Lee substrateet profiles al. ofsuggest the DNA. different Specifically, NATs. yet another the interaction oncogenic partner of The human NATs have been classified as hNatA, hNatB, hNatC, Naa10p AS A PRO-PROLIFERATIVE AND ANTI-APOPTOTIC gH2A.X and chk2 is normally seen with double-strand DNA breaks, potential of hNaa10p lies within its capabilityPROTEIN to recruit IN CANCER DNMT1 CELLS activation and stabilization47 of p53, specifically by phosphorylation hNatD,hNaa10p hNatE and hNatF, that where mayall have been influence found to associate its oncogenic properties, DNA 15 Gromyko et al. postulated that depletion of hNaa10p and withto ribosomes. the E-cadherin Figure 1 summarizes promoter, the current where knowledge DNMT1During can the last silence couple of the years, several studies have shed light on of Ser . This indicates that the apoptosis observed is p53 aboutmethyltransferase1 their subunit composition and (DNMT1). substrate specifici- DNMT1the biological is responsible importance of hNaa10p for theand the human NatA dependent, and depletion of hNaa10p indeed leads to increased ties.1,2,15,27methylation --33 The NatA-E of complexes DNA are and conserved aberrant throughout methylationcomplex in cancer is cells, characteristic both with respect to of expression patterns 52,53 1 & 2 transcriptionOncogene of p53-dependent (2012), 1 --8 pro-apotopic genes. The same Departmenttumor of Molecular development. Biology, University of Bergen, Bergen,hNaa10p Norway2012 and MacmillanDepartment directly of Publishers Surgery, interacts Haukeland Limited University with Hospital, DNMT1 Bergen, Norway. Correspondence: Dr T Arnesen, Department of Molecular Biology, University of Bergen, Bergen, Hordaland N-5020, Norway. E-mail: [email protected] study shed light on the events upstream of p53 activation. They Receivedand 2 December increases 2011; revised and its accepted enzymatic 6 February 2012 capability in an acetyltransferase (AT)- reported that gH2A.X and chk2 were activated by phosphorylation independent manner by facilitating the interaction between independent of the functional status of TP53. As activation of DNMT1 and its substrate DNA. Specifically, the oncogenic gH2A.X and chk2 is normally seen with double-strand DNA breaks, potential of hNaa10p lies within its capability to recruit DNMT1 Gromyko et al.47 postulated that depletion of hNaa10p and to the E-cadherin promoter, where DNMT1 can silence the

Oncogene (2012), 1 --8 & 2012 Macmillan Publishers Limited Naa10 funcon

Ribosome

Naa15 Naa50 HYPK Naa10

NH2

cytosolic (co-translaonal) • Rho (Rac/Cdc42 cell migraon) • Caspase2 (apoptosis) • TSC2 (mTOR cell proliferaon) • HIF-1α (angiogenesis) • α -Tubulin protein stability, acvity and sorng Naa10 funcon

Ribosome

Naa15 Naa50 HYPK Naa10

NH2 Naa10 ?

cytosolic (co-translaonal) • Rho (Rac/Cdc42 cell migraon) nuclear • Caspase2 (apoptosis) transcripon factors • TSC2 (mTOR cell proliferaon) • DNMT1 (E-Cad transformaon) • HIF-1α (angiogenesis) • RelA/p65 (MCL-1 survival) • α -Tubulin • Rip1/NEMO (DNA damage) • β-Catenin/Lef-1/TEF (wnt-pathway) protein stability, acvity and sorng • AR (proliferaon) localizaon of Naa10 Immunofluorescence • HEK293 cells +/- V5-hNaa10 wt or mutants • IF with an-V5

V5-Naa10 wt

α-V5 DAPI merge interacon of Naa10 and Naa15

Co-IP of Naa10 and Naa15 in HEK293 cells • precipitang anbody: α-Myc

reduced interacon of Naa15 and Naa10 S37P? interacon of Naa10 and Naa15

Co-IP of Naa10 and Naa15 in HEK293 cells • precipitang anbody: α-V5

reduced interacon of Naa15 and Naa10 S37P? ARTICLES

Molecular basis for N-terminal acetylation by the heterodimeric NatA complex

Glen Liszczak1,2, Jacob M Goldberg2, Håvard Foyn3, E James Petersson2, Thomas Arnesen3,4 & Ronen Marmorstein1,2

N-terminal acetylation is ubiquitous among eukaryotic proteins and controls a myriad of biological processes. Of the N-terminal acetyltransferases (NATs) that facilitate this cotranslational modification, the heterodimeric NatA complex has the most diversity for substrate selection and modifies the majority of all N-terminally acetylated proteins. Here, we report the X-ray crystal structure of the 100-kDa holo-NatA complex from Schizosaccharomyces pombe, in the absence and presence of a bisubstrate peptide-CoA–conjugate inhibitor, as well as the structure of the uncomplexed Naa10p catalytic subunit. The NatA-Naa15p auxiliary subunit contains 13 tetratricopeptide motifs and adopts a ring-like topology that wraps around the NatA-Naa10p subunit, an interaction that alters the Naa10p active site for substrate-specific acetylation. These studies have implications for understanding the mechanistic details of other NAT complexes and how regulatory subunits modulate the activity of the broader family of protein acetyltransferases.

The cotranslational process of N-terminal acetylation occurs on ~85% no acetyltransferase has been structurally characterized in the of human proteins and ~50% of yeast proteins and mediates a wide presence of its activating partner28–30. range of biological processes including cellular apoptosis, enzyme Three additional NAT enzymes, NatD–NatF, have been identified regulation, protein localization and the N-end rule for protein degra- and appear to be independently active. They also have a more limited dation1–6. At least three NAT complexes that perform this modifica- set of biologically relevant substrates and are not well characterized tion, NatA, NatB and NatC, exist as heterodimers with one unique across eukaryotes31–35. Currently, there is no structure of a NAT catalytic subunit and an additional unique auxiliary subunit that both complex or any of the integral subunits, and the mechanism of sub- activates the enzymatic component and anchors the complex to the strate-specific catalytic regulation by the auxiliary subunits remains ribosome during translation7–13. Aberrant expression of the proteins uncharacterized. The only eukaryotic NAT for which structural data ARTICLESthat make up the NatA complex has been observed in a number of is available is the human NatE (NAA50) enzyme, which is independ- cancer-cell tissues; consequently, NAT enzymes are emerging targets ently active and has only one known biologically relevant substrate; for chemotherapeutic development14–21. this substrate, containing a 1-MLGP-4 N-terminal sequence, was The three NAT complexes are highly conserved in eukaryotes present in an X-ray crystal structure1,36. from yeast to humans and are differentiated from one another on the In this study we set out to determine the molecular basis for sub- basis of their substrate specificities1,5,22–24. NatA, which is composed strate binding, acetylation specificity and mode of catalysis of the of the catalytic NAA10 subunit and the auxiliary NAA15 subunit, is NatA complex and to determine the role of the auxiliary subunit in the most promiscuous of all NAT enzymes; classically, it acetylates an these activities. Toward this goal, we determined the X-ray crystal A-amino group on nascent peptide chains with an N-terminal alanine, structures of the holo-NatA complex bound to acetyl CoA and a cysteine, glycine, serine, threonine or valine residue1,22,24,25. Notably, bisubstrate inhibitor, determined the structure of the Naa10p sub- recent studies demonstrate that NAA10 also exists as a monomer unit bound to acetyl CoA in its uncomplexed form that preferentially in cells and that it can acetylate the A-amino group of substrates acetylates a unique subset of substrates, and carried out structure- with N-terminal aspartate and glutamate residues that can be gen- guided mutational analysis to derive structure-function correlations erated post-translationally, but it will not acetylate traditional underlying NatA acetylation. NatA substrates26. The NatB and NatC complexes acetylate the MolecularN termini of proteins basis with an N-terminal for methionine N-terminal with further RESULTS acetylation by the specificity dependent upon the identity of the second residue13,23,25,27. Overall structure of NatA Although NATs as well as many other lysine side chain acetyl- In an attempt to prepare the NatA complex for X-ray heterodimerictransferases require binding partners NatA for optimal complex catalytic activity, structure determination, we overexpressed the human complex

1,2 2 3 2 3,4 Glen Liszczak1Program in Gene ,Expression Jacob andM Regulation,Goldberg Wistar, Institute, Håvard Philadelphia, Foyn Pennsylvania,, E James USA. Petersson 2Department of, Chemistry,Thomas University Arnesen of Pennsylvania, & Philadelphia, Pennsylvania, USA. 3Department1,2 of Molecular Biology, University of Bergen, Bergen, Norway. 4Department of Surgery, Haukeland University Hospital, Bergen, Norway. RonenCorrespondence Marmorstein should be addressed to R.M. ([email protected]).

Received 12 April; accepted 19 June; published online 4 August 2013; doi:10.1038/nsmb.2636 N-terminal acetylation is ubiquitous among eukaryotic proteins and controls a myriad of biological processes. Of the N-terminal acetyltransferases (NATs) that facilitate this cotranslational modification, the heterodimeric NatA complex has the most diversity 1098 VOLUME 20 NUMBER 9 SEPTEMBER 2013 NATURE STRUCTURAL & MOLECULAR BIOLOGY ARTICLESfor substrate selection and modifies the majority of all N-terminally acetylated proteins. Here, we report the X-ray crystal structure of the 100-kDa holo-NatA complex from Schizosaccharomyces pombe, in the absence and presence of a bisubstrate peptide-CoA–conjugate inhibitor, as well as the structure of the uncomplexed Naa10p catalytic subunit. The NatA-Naa15p a auxiliaryN subunit containsb 13 tetratricopeptideN motifs andc adopts a ring-like topology that wraps daround the NatA-Naa10p subunit, an interaction that alters the Naa10p active site for30 substrate-specific acetylation. These studies have implications27 for understanding the mechanistic details of other NAT complexes and how regulatory subunits modulate the activity of the 28 Naa10p Q491 2 broader family of protein acetyltransferases. D532 Naa15p W552 F533 L549 F536 W494 Q15 29 The cotranslational process of N-terminal acetylation occurs on ~85% no acetyltransferaseI36 has been structurally characterized1 in the 29 30 30 28–30 C of human2 proteins and ~50% of yeast proteins and2 mediates a wide presence of its activating partner . 27 3 3 29 L32 6 5 I8 25 N25 range of 7biological13 processes90 including cellular27 apoptosis, enzyme Three additional NAT enzymes, NatD–NatF, have been identified 28 1 C 1 26 2 28  H20 regulation,4 protein14 localization and the N-end rule for proteinW526 degra- and appear to be independentlyY33 active. They also have a more limited 25 K29  4 15 25 1–6  L11 L28 24dation . At least three NAT complexes that perform this modifica- set of biologically relevant substrates and are not well characterized 19 31–35 tion,23 NatA,17 NatB and NatC, exist as heterodimers with one unique across eukaryotes . Currently, there isR448 no structure of a NAT 1 2 W38 F449 catalytic subunit and an additional unique auxiliary subunit that both complex or any of the integral subunits, and the mechanism of sub- activates the enzymatic component and anchors the complex to the strate-specific catalytic regulation by the auxiliary subunits remains ribosome during translation7–13. Aberrant expression of the proteins uncharacterized. The only eukaryotic NAT for which structural data Figure 1 Overallthat structure make upof thethe NatANatA complex complex bound has been to acetyl observed CoA. (ina) aNaa10p number (teal) of andis available Naa15p is(brown) the human subunits, NatE shown (NAA50) in cartoon, enzyme, bound which to is independ- acetyl CoA (CPKcancer-cell coloring and tissues; stick consequently,format). Only Naa15pNAT enzymes helices are that emerging contact Naa10ptargets areently labeled. active The and dashed has only brown one line known represents biologically a disordered relevant substrate; loop region in Naa15p.for chemotherapeutic The dimensions development of the complex14–2 1are. 107 Å × 85 Å × 70 Å. (b) A 90°this rotation substrate, of the containing view in a. aHelices 1-MLGP-4 that are N-terminal depicted in sequence, c and was d are labeled. (c) TheZoom three view highlightingNAT complexes key residues are highly that conservedcompose the in predominantly eukaryotes hydrophobicpresent in an interface X-ray crystal between structure Naa10p1, 3A61–. A2 and Naa15p A29–A30. (d) Zoomfrom viewyeast of to the humans intersubunit and are interface differentiated at the fromC-terminal one another region onof Naa10p the AIn1 andthis thestudy Naa15p we set A out25– toA27– determineA28 helices. the molecular basis for sub- basis of their substrate specificities1,5,22–24. NatA, which is composed strate binding, acetylation specificity and mode of catalysis of the between Lys29of andthe catalytic Gln15 ofNAA10 Naa10p subunit and andAsp532 the auxiliary and Gln491 NAA15 of subunit, of traditional is NatA substrates complex and(alanine, to determine cysteine, the glycine, role of serine,the auxiliary threonine subunit in Naa15p. Singlethe point most mutationspromiscuous in ofthis all regionNAT enzymes; did not classically, break up itthe acetylates or valine) an these (Supplementary activities. Toward Videos this 2goal, and we 3 ).determined Specifically, the Naa10p X-ray crystal complex and hadA-amino only modestgroup on effects nascent on peptide substrate chains binding with anand N-terminal cataly- alanine,residues Leu22structures and ofTyr26 the holo-NatA shift about complex 5.0 Å from bound surface-exposed to acetyl CoA and a 1,22,24,25 sis (Table 2), cysteine,probably glycine, owing serine, to the threonine extensiveness or valine of theresidue interface. . Notably,positions btoisubstrate buried positionsinhibitor, determinedin the active the site, structure and Naa10p of the Naa10pGlu24 sub- A smaller hydrophobicrecent studies interface demonstrate is formed that betweenNAA10 alsoNaa10p exists His20 as a monomermoves by unitabout bound 4.0 Å, to substantially acetyl CoA in altering its uncomplexed the landscape form ofthat the preferentially NatA in cells and that it can acetylate the A-amino group of substrates acetylates a unique subset of substrates, and carried out structure- and Naa15p Phe449 and Trp494 and is supplemented with a hydro- active site (Fig. 2d). All of these residues are well ordered in both with N-terminal aspartate and glutamate residues that can be gen- guided mutational analysis to derive structure-function correlations gen bond between Naa10p Gln25 and Naa15p Arg448. This region structures (Supplementary Fig. 4). Our comparison of the com- erated post-translationally, but it will not acetylate traditional underlying NatA acetylation. A of Naa15p directlyNatA stabilizes substrates the26. Theposition NatB of and the NatCNaa10p complexes 1 helix acetylateas plexed the and uncomplexed structures suggests that the auxiliary subunit well as the Naa10pN termini A1– Aof2 proteins loop and with as aan result N-terminal is crucial methionine for proper with furtherinduces anRESULTS allosteric change in the Naa10p active site to an extent that complex formation.specificity This dependent is evident upon from the the identity observation of the second that alanine residue 13,is23 ,required25,27. Overall for the structure mechanism of NatA of catalysis by the NatA complex, and point mutationsAlthough at either NATs Naa15p as well Arg448 as many or Naa15p other lysine Phe449 side were chain Naa10p acetyl- isIn likely an to attempt represent to an prepare active GNAT the NatA fold. Consistent complex forwith X-ray able to disrupttransferases NatA complex require formation. binding Several partners additional for optimal scattered catalytic activity,this hypothesis, structure a backbone determination, alignment we overexpressedof key active site the elements human complexin intermolecular interactions serve to supplement the Naa10p-Naa15p active Naa10p with the corresponding region in the independently interface (Supplementary1Program in Gene Fig. Expression 3). and Regulation, Wistar Institute, Philadelphia,active Pennsylvania, human NAA50 USA. 2Department that selects of Chemistry, a 1-Met-Leu-2 University N-terminal of Pennsylvania, sequence Philadelphia, Pennsylvania, USA. 3Department of Molecular Biology, University of Bergen,shows Bergen, a high Norway. degree 4Department of structural of Surgery, conservation Haukeland University (r.m.s. Hospital, deviation Bergen, of Norway. Molecular basisCorrespondence for Naa15p should modulation be addressed of to R.M.Naa10p ([email protected] acetylation). 1.52 Å). The corresponding alignment of the complexed and uncom- To explore theReceived molecular 12 April; basis accepted for Naa15p19 June; published modulation online of4 August Naa10p 2013; doplexedi:10.1038/nsmb.263 forms of Naa10p6 showed less structural conservation, with an acetyltransferase activity, we determined the X-ray crystal structure of r.m.s. deviation of 2.43 Å (Fig. 2e). Naa10p of S. pombe in the absence of Naa15p, for comparison with the 1098 VOLUME 20 NUMBER 9 SEPTEMBER 2013 NATURE STRUCTURAL & MOLECULAR BIOLOGY holo-NatA complex (Table 1). We determined the structure of Naa10p Substrate peptide binding and NatA inhibition (residues 1–156) to 2.00-Å resolution, using a combination of single- To determine the molecular basis for substrate-specific peptide bind- wavelength anomalous diffraction and molecular replacement (model, ing by NatA, and in particular how, unlike most other NATs, it is able NAA50) to phase data collected on a selenomethionine-derivatized to accommodate a number of nonmethionine N-terminal substrates, Naa10p protein. An alignment of the complexed and uncomplexed we synthesized a bisubstrate conjugate in which CoA is covalently forms of Naa10p revealed that the A1–loop–A2 segment assumes linked to a biologically relevant substrate peptide fragment with the a substantially different conformation in the presence of Naa15p sequence 1-SASEA-5 (CoA-SASEA). We performed inhibition stud- (Fig. 2a). Notably, this conformational change is driven by the move- ies with CoA-SASEA and with a control compound, acetonyl CoA, ment of several hydrophobic residues in Naa10p A2 (Leu28, Leu32 which is a nonhydrolyzable acetyl CoA analog (Fig. 3a). Half-maxi- and Ile36), which make intramolecular interactions with residues in mum inhibitory concentration (IC50) determinations revealed that Naa10p A1 and B3 (Ile8, Leu11, Met14, Tyr55 and Tyr57) in apo- CoA-SASEA had an IC50 of 1.4 o 1.0 MM, whereas acetonyl CoA Naa10p but shift to make alternative intermolecular interactions had an IC50 of 380 o 10 MM (Fig. 3b). To assess the specificity of this with helices of Naa15p in the NatA complex (Figs. 1c and 2b). As inhibitor toward NatA, we also calculated IC50 values of CoA-SASEA a result of this interaction, the C-terminal region of the A1 helix and acetonyl CoA with NAA50, a NAT that requires a substrate undergoes an additional helical turn, which helps to reposition the N-terminal methionine residue. We found that CoA-SASEA has an A1–A2 loop. Notably, docking of the apo-Naa10p structure into the IC50 of 11 o 2 MM, whereas acetonyl CoA has an IC50 of 130 o 12 MM corresponding binding pocket of Naa15p showed a clash between the (Fig. 3b). With NAA50, the addition of a SASEA peptide portion Naa10p A1–loop–A2 and Naa15p Arg448 and Naa15p Phe449 of A25 was able to increase the potency of acetonyl CoA by only about ten- (Fig. 2c)—the same interface that we have shown to be necessary for fold, whereas with NatA this peptide addition exhibited an increase proper complex formation (Fig. 1d and Table 2). in potency of about 300-fold. The greater potency of acetonyl CoA As a result of the Naa15p interaction along one side of the Naa10p with NAA50 over Naa10p can be explained by the stronger binding of A1–loop–A2 region, residues on the opposite side of this loop region acetyl CoA to NAA50 (Km = 27 o 2 MM) than to Naa10p (Km = 59 o appear to adopt a specific conformation that is essential for catalysis 5 MM). The markedly higher potency of the CoA-SASEA inhibitor

1100 VOLUME 20 NUMBER 9 SEPTEMBER 2013 NATURE STRUCTURAL & MOLECULAR BIOLOGY LETTER doi:10.1038/nature12141

De novo mutations in histone-modifying genes in congenital heart disease

Samir Zaidi1,2*, Murim Choi1,2*, Hiroko Wakimoto3, Lijiang Ma4, Jianming Jiang3,5, John D. Overton1,6,7, Angela Romano-Adesman8, Robert D. Bjornson7,9, Roger E. Breitbart10, Kerry K. Brown3, Nicholas J. Carriero7,9, Yee Him Cheung11, John Deanfield12, Steve DePalma3, Khalid A. Fakhro1,2, Joseph Glessner13, Hakon Hakonarson13,14, Michael J. Italia15, Jonathan R. Kaltman16, Juan Kaski12, Richard Kim17, Jennie K. Kline18, Teresa Lee4, Jeremy Leipzig15, Alexander Lopez1,6,7, Shrikant M. Mane1,6,7, Laura E. Mitchell19, Jane W. Newburger10, Michael Parfenov3, Itsik Pe’er20, George Porter21, Amy E. Roberts10, Ravi Sachidanandam22, Stephan J. Sanders1,23, Howard S. Seiden24, Mathew W. State1,23, Sailakshmi Subramanian22, Irina R. Tikhonova1,6,7, Wei Wang15,25, Dorothy Warburton4,26, Peter S. White14,15, Ismee A. Williams4, Hongyu Zhao1,27, Jonathan G. Seidman3, Martina Brueckner1,28, Wendy K. Chung4,29, Bruce D. Gelb22,24,30, Elizabeth Goldmuntz14,31, Christine E. Seidman3,5,32 & Richard P. Lifton1,2,6,7,33

Congenital heart disease (CHD) is the most frequent birth defect, Genomic DNA samples from trios underwent exome sequencing8 affecting 0.8% of live births1. Many cases occur sporadically and (see Methods). Targeted bases in each sample were sequenced a mean of impair reproductive fitness, suggesting a role for de novo muta- 107 times by independent reads, with 96.0% read eightor more times. In tions. Here we compare the incidence of de novo mutations in 362 parallel, 264 trios comprising unaffected siblings of autism cases and severe CHD cases and 264 controls by analysing exome sequencing their unaffected parents (Supplementary Table 1) were sequenced in the of parent–offspring trios. CHD cases show a significant excess of same facility using the same protocol and were analysed as a control protein-altering de novo mutations in genes expressed in the deve- group9 (Supplementary Table 2 and Supplementary Fig. 1). Family loping heart, with an odds ratio of 7.5 for damaging (premature relationships were confirmed from sequence data in all trios. termination, frameshift, splice site) mutations. Similar odds ratios High-probability de novo variants in probands were identified using are seen across the main classes of severe CHD. We find a marked a Bayesian quality score (QS; see Methods). Sanger sequencing of 181 excess of de novo mutations in genes involved in the production, putative de novo mutations across the QS spectrum demonstrated removal or reading of histone 3 lysine 4 (H3K4) methylation, strong correlation of confirmation with QS (R2 5 0.89), with 100% or ubiquitination of H2BK120, which is required for H3K4 confirmation of 90 calls with QS . 50 (Supplementary Table 3 and methylation2–4. There are also two de novo mutations in SMAD2, Supplementary Fig. 2). Consequently, de novo mutation calls with which regulates H3K27 methylation in the embryonic left–right QS $ 50 were included in the study; this set is estimated to include organizer5. The combination of both activating (H3K4 methyla- 90% of mutations with QS . 0, with ,100% specificity; 90% of these tion) and inactivating (H3K27 methylation) chromatin marks have the maximum QS of 100 (Supplementary Fig. 3). Sensitivity is characterizes ‘poised’ promoters and enhancers, which regulate further diminished by ,5% owing to bases with very low read cover- expression of key developmental genes6. These findings implicate age. We found 0.88 de novo mutations per subject in CHD cases and de novo point mutations in several hundreds of genes that collec- 0.85 in controls. These mutation rates (1.34 and 1.29 3 1028 per tar- tively contribute to approximately 10% of severe CHD. geted base) are not significantly different (P 5 0.63, binomial test) and From more than 5,000 probands enrolled in the Congenital Heart are similar to previous estimates10. The set of de novo mutations is Disease Genetic Network Study of theNationalHeart,Lung,andBlood shown in Supplementary Table 4. Institute Paediatric Cardiac Genomics Consortium7,weselected362 CHD cases and controls had very similar maternal and paternal parent–offspring trios comprising a child (proband) with severe CHD ages, which had a small effect on the mutation rate (Supplementary and no first-degree relative with identified structural heart disease. Pro- Fig. 4). We found no significant effect of geographic ancestry on the bands with an established genetic diagnosis were excluded. There were 154 mutation rate (Supplementary Fig. 5). The number of de novo muta- probands with conotruncal defects, 132 with left ventricular obstruction, tions per subject closely approximated the Poisson distribution, pro- 70 with heterotaxy and six with other diagnoses (Supplementary Table 1). viding no evidence for mutation clustering (Supplementary Fig. 6).

1Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06510, USA. 2Howard Hughes Medical Institute, Yale University, Connecticut 06510, USA. 3Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 4Department of Pediatrics, Columbia University Medical Center, New York, New York 10032, USA. 5Howard Hughes Medical Institute, Harvard University, Boston, Massachusetts 02115, USA. 6Yale Center for Mendelian Genomics, New Haven, Connecticut 06510, USA. 7Yale Center for Genome Analysis, Yale University, New Haven, Connecticut 06511, USA. 8Steven and Alexandra Cohen Children’s Medical Center of New York, New Hyde Park, New York 11040, USA. 9Department of Computer Science, Yale University, New Haven, Connecticut 06511, USA. 10Department of Cardiology, Children’s Hospital Boston, Boston, Massachusetts 02115, USA. 11Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York 10032, USA. 12Department of Cardiology, University College London, Great Ormond Street Hospital, London WC1N 3JH, UK. 13Center for Applied Genomics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA. 14Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA. 15The Center for Biomedical Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA. 16National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA. 17Section of Cardiothoracic Surgery, University of Southern California Keck School of Medicine, Los Angeles, California 90089, USA. 18Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, New York 10032, USA. 19Division of Epidemiology, Human Genetics and Environmental Sciences, University of Texas School of Public Health, Houston, Texas 77030, USA. 20Department of Computer Science, Columbia University, New York, New York 10032, USA. 21Department of Pediatrics, University of Rochester Medical Center, The School of Medicine and Dentistry, Rochester, New York 14611, USA. 22Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA. 23Program on Neurogenetics, Child Study Center, Department of Psychiatry, Yale University, New Haven, Connecticut 06510, USA. 24Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA. 25Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey 07102, USA. 26Department of Pathology, Columbia University Medical Center, New York, New Jersey 10032, USA. 27Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06510, USA. 28Department of Pediatrics Yale University School of Medicine, New Haven, Connecticut 06510, USA. 29Department of Medicine, Columbia University Medical Center, New York, New York 10032, USA. 30The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA. 31Division of Cardiology, The Children’s Hospital of Philadelphia, The University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA. 32Cardiovascular Division, Brigham & Women’s Hospital, Harvard University, Boston, Massachusetts 02115, USA. 33Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut 06510, USA. *These authors contributed equally to this work.

00 MONTH 2013 | VOL 000 | NATURE | 1 ©2013 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Table 2 | Genes of interest with de novo mutations in probands WDR5 UBE2B ID Gene Mutation Dx Other structural/neuro/ht-wt MLL2 RNF20 1-00596 MLL2{ p.Ser1722Arg fs*9LVOY/Y/N SMAD2 (2) 1-00853 WDR5{ p.Lys7Gln CTD N/Y/N 1-00534 CHD7{ p.Gln1599* CTD Y/Y/Y 1-00230 KDM5A{ p.Arg1508Trp LVO N/N/Y Me K120 Ub 1-01965 KDM5B{ p.IVS12 1 1G.ALVON/N/Y CHD7 Me 1-01907 UBE2B p.Arg8Thr CTD N/N/N { K4 K27 1-00075 RNF20{ p.Gln83* HTX Y/Y/Y H3 H2B 1-01260 USP44{ p.Glu71Asp LVO N/N/N 1-02020 SMAD2{{ p.IVS6 11G.AHTXY/N/N USP44 1-02621 SMAD2{{ p.Trp244Cys HTX Y/NA/N KDM5A 1-01451 MED20 p.IVS2 1 2T.CHTXN/Y/Y KDM5B 1-01151 SUV420H1 p.Arg143Cys CTD N/Y/N H4 H2A 1-00750 HUWE1 p.Arg3219Cys LVO N/Y/N 1-00577 CUL3 p.Iso145Phe fs*23 LVO Y/Y/N 1-00116 NUB1 p.Asp310His CTD Y/Y/Y 1-01828 DAPK3 p.Pro193Leu CTD N/N/NA 1-03151 SUPT5H p.Glu451Asp LVO N/NA/N 1-00455 NAA15 p.Lys336Lys fs*6HTXY/Y/N 1-00141 NAA15 p.Ser761* CTD N/NA/Y 1-01138 USP34 p.Leu432Pro LVO N/NA/N 1-00448 NF1 p.IVS6 14delA CTD N/NA/N Figure 2 | de novo mutations in the H3K4 and H3K27 methylation 1-00802 PTCH1 p.Arg831Gln LVO N/NA/N pathways. Nucleosome with histone octamer and DNA, with H3K4 1-02458 SOS1 p.Thr266Lys Other Y/Y/Y methylation bound by CHD7, H3K27 methylation and H2BK120 1-02952 PITX2 p.Ala47Val LVO N/NA/N ubiquitination is shown. Genes mutated in CHD that affect the production, 1-01913 RAB10 p.Asn112Ser Other N/NA/N removal and reading of these histone modifications are shown; genes with 1-00638 FBN2 p.Asp2191Asn CTD N/NA/N 1-00197 BCL9 p.Met1395Lys LVO N/NA/N damaging mutations are shown in red, those with missense mutations are shown 1-02598 LRP2 p.Glu4372Lys HTX N/NA/N in blue. SMAD2 (2) indicates there are two patients with a mutation in this gene. Genes whose products are found together in a complex are enclosed in a box. Gene symbols are as in NCBI RefSeq database. Other structural/neuro/ht-wt denotes presence (Y) or absence (N) of other structural abnormalities, impaired cognitive speech or motor development, and height (ht) and/or weight (wt) less than 5th percentile for age, respectively. Further clinical details in complex conotruncal defect, an unusual finding in neurofibromatosis. Supplementary Tables 10 and 11. Associated syndromes: MLL2,Kabukisyndrome;CHD7,CHARGE These findings support variable expressivity and a broader phenotypic syndrome; CUL3,pseudohypoaldosteronism,type2E. spectrum resulting from mutations at known disease loci. Other genes * Premature termination mutation. { Gene involved in production, removal or reading of H3K4 methylation mark. of interest in this set included RAB10 and BCL9, identified as candi- 14 {{ Gene involved in removal of H3K27 methylation mark. dates by rare de novo copy-number variants . Del, deletion; Dx, diagnosis; fs, frameshift mutation; fs*n,frameshiftmutationfollowedbypremature termination n codons later; NA, data not available. Our results implicate de novo point/insertion–deletion (indel) mutations that by chance occur in genes required for normal heart (missense), encoding a histone H4 methylase; MED20 (splice site), a development in the pathogenesis of diverse CHDs. Consistent with component of the mediator complex; HUWE1 (missense), a ubiquitin this inference, genes with damaging and conserved missense muta- ligase targeting histones and TP53; CUL3 (frameshift), a scaffold for tions in CHD probands showed higher expression in E14.5 mouse assembly of many RING ubiquitin ligases8;andNUB1 (missense), which heart compared to controls (Supplementary Fig. 8; median 45 versus inhibits NEDD8, a cofactor for cullin-based ubiquitin ligases. Last, 16 r.p.m.; P 5 5 3 1024, Wilcoxon signed-rank test), whereas expres- NAA15,anN-acetyltransferase13,hadtwodamagingmutations,unlikely sion of genes with silent mutations show no significant difference a chance event (P 5 0.01, Monte Carlo simulation). Among the 17 above (median 21 versus 19 r.p.m.; P 5 0.7, Wilcoxon signed-rank test). genes, ten have no damaging variants and seven have one to five among Expression at E9.5 shows similar results (Supplementary Fig. 8). The .9,500 exomes in National Heart, Lung, and Blood Exome Sequencing increased mutation burden of HHE genes in cases is not due to a higher Project, 1000 Genomes and Yale exome databases. intrinsic mutation rate of these genes because the rate is significantly Phenotypes of the eight patients with de novo mutations in the higher than in controls; moreover, there is no significant difference in H3K4me pathway revealed diverse cardiac phenotypes (Table 2 and mutation rate between HHE and LHE genes in controls. Further, par- Supplementary Table 10). Other structural, neurodevelopmental and titioning genes into analogous high- and low-expression groups for growth abnormalities were common. In addition, consistent with a four control adult tissues (brain, heart, liver and lung) showed no role in left–right axis determination5, both patients with SMAD2 significant differences in mutation burden between cases and controls mutations had dextrocardia with unbalanced complete atrioventricu- or between high- and low-expression groups (Supplementary Fig. 9). lar canal and pulmonary stenosis. For other genes mutated more than From the increased fraction of patients with protein-altering muta- once (for example, NAA15), probands had dissimilar cardiac pheno- tions in HHE genes in CHD patients (0.22) versus controls (0.12), we types (Supplementary Table 11). estimate that such mutations have a role in about 10% of these patients Before initiating exome sequencing, we defined a set of 277 candid- (95% confidence interval, 5–15%). This could be somewhat underesti- ate CHD genes (Supplementary Table 12) from human and model mated, as mutation detection is incomplete, analysis is limited to genes system studies. There were 13 CHD probands with de novo mutations with identified mouse orthologues, and the HHE set may not include all in these genes (Table 2 and Supplementary Table 13), more than trait loci. Similarly, the observed odds ratios may be somewhat under- expected by chance (P 5 7 3 1024, Monte Carlo simulation) or in estimated as not all mutations in cases are likely to confer risk. controls (n 5 1; P 5 0.006, binomial test). This set included several These findings establish that mutations in many genes in the genes known to cause Mendelian CHD; however, affected subjects H3K4me–H3K27me pathway disrupt cardiac development and are lacked cardinal disease manifestations or had atypical cardiac features. consistent with previous evidence implicating these chromatin marks For example, the patient with the CHD7 mutation had none of the in regulating key developmental genes6, including those involved in main criteria (coloboma, choanal atresia or hypoplastic semicircular cardiac development15,16. Targeted sequencing in larger CHD cohorts canals) for CHARGE syndrome12. Similarly, the patient with the MLL2 will enable assessment of the role of each individual gene in this path- mutation was not prospectively diagnosed with Kabuki syndrome; way. These findings imply dosage sensitivity for these chromatin however, re-evaluation at age 2 after sequencing identified character- marks in CHD, similar to recent findings implicating haploin- istic facial features. Additionally, a patient with an NF1 mutation had a sufficiency for chromatin modifying/remodelling genes in diverse

00 MONTH 2013 | VOL 000 | NATURE | 3 ©2013 Macmillan Publishers Limited. All rights reserved SUPPLEMENTARY INFORMATION RESEARCH SUPPLEMENTARYSUPPLEMENTARY INFORMATION INFORMATIONRESEARCHRESEARCH Table S10. Chromatin modifying and other genes of interest with de novo mutations in CHD probands

Neuro- Table S10. ChromatinHeart modifying and otherPrimary genes Classification: of interest Specific with de novo mutations in CHD probandsSomatic Growth ID Gene † Mutation § Extracardiac Structural Anomalies Develop- Exp Cardiovascular Diagnoses Ht(%) Wt(%) mental Neuro- Heart Primary Classification: Specific Epicanthal folds, telecanthus, large Somatic Growth ID Gene Mutation Extracardiac Structural Anomalies Develop- † § low-set ears, excess nuchal skin, motor Exp p.Ser1722 LVO:Cardiovascular Mitral atresia, Diagnoses HLHS, aortic Ht(%) Wt(%) 1-00596 MLL2 216 high arched palate, wide-spaced delay, mental 50 <5 Argfs*9 atresia, dbl AA nipples,Epicanthal undescended folds, testes,telecanthus, club largehypotonia low-set ears, excess nuchal skin, motor p.Ser1722 LVO: Mitral atresia, HLHS, aorticfoot, hyperpigmented lesions, 1-00596 MLL2 216 high arched palate, wide-spaced delay, 50 <5 Argfs*9 CTD:atresia, TOF, dbl right AA aortic arch, 1-00853 WDR5 39 p.Lys7Gln aberrant LSA, coronary No nipples, undescended testes, clubabnormal hypotonia 90 90 abnormality foot, hyperpigmented lesions, CTD: TOF, right aortic arch, Cleft lip, cleft palate, inguinal 1-008531-00534 WDR5CHD7 39125 p.p.LysGln15997Gln* CTD:aberrant TOF- PALSA, coronary hernia,No micropenis, sensorineural abnormalabnormal <5 90<5 90 1-00305 CTD SHANK3 Silent C290C abnormalityNovel NM_033517.1 NP_277052.1hearing22 loss51117841 T>C 65.68 111 28 9 55 1 38 1 91 LVO: LSVC, Primum ASD, cleft 1-00230 KDM5A 70 p.Arg1508Trp No Cleft lip, cleft palate, inguinal normal <5 <5 1-1-0209300534 CHD7CTD LIFR 125 Silentp.Gln1599Q527Q* MV,CTD: Novelsub TOF-AS, -BAVPANM_002310 NP_002301 hernia,5 38502758 micropenis,T>C 42.39sensorineural228 69 abnormal62 142 0 121<5 0 <5100 p.IVS12+1 1-01965 KDM5B 68 LVO: Coarctation No hearing loss n/a <5 <5 G>A 1-01341 CTD NISCH Silent H989H LVO:Novel LSVC,NM_007184 Primum ASD,NP_009115 cleft 3 52522475 C>T 490.14 228 15 14 24 0 34 0 77 1-002301-01907 KDM5AUBE2B 70146 p.ArgArg8Thr1508 Trp CTD: TOF No No normal normal 50 <510 <5 MV, sub-AS, BAV Low-set ears,excess nuchal skin, 1-02786 OTHER CCNL1 Silent A19A HTX:Novel Dextrocardia,NM_020307 RAI, TAPVR,NP_064703 3 156877827 G>A 40.50 199 13 14 32 0 28 0 76 p.IVS12+1 hydronephrosis, wide-spaced 2nd 1-019651-00075 KDM5BRNF20 6858 p.Gln83* L-LVO:ventricular Coarctation loop, CAVC No abnormaln/a <5 <5<5 <5 G>A toe, Asplenia, primary cilia 1-01933 LVOTO FBLN2 Silent P312P unbalancedNovel -rightNM_001165035 dominant,NP_001158507 PA 3 13612791 A>C 46.48 130 20 10 18 0 18 0 55 RESEARCH1-01907 UBE2BSUPPLEMENTARY 146 p.Arg INFORMATION8Thr CTD: TOF dyskNoinesia normal 50 10 LVO: ASD, mitral atresia, aortic Low-set ears,excess nuchal skin, 1-01260 USP44 0 p.Glu71Asp HTX: Dextrocardia, RAI, TAPVR,No normal 25 25 1-01365 CTD C20orf152 Silent G297G atresia,Novel HLHSNM_080834.2 NP_543024.2 hydronephrosis,20 34582995 G>T wide45.23-spaced228 2nd110 91 226 0 207 2 100 1-00075 RNF20 58 p.Gln83* L-ventricular loop, CAVC abnormal <5 <5 HTX: Dextrocardia, ASD, CAVC- toe, Asplenia, primary cilia 1-02020 SMAD2 38 p.IVS6+1 G>A unbalanced-right dominant, PA Asplenia normal 95 10 Damaging mutations in LHE genes in CHD Cases unbalanced, DORV, D-TGA, PS dyskinesia HTX:LVO: Dextrocardia, ASD, mitral LSVC atresia, to LA,aortic 1-01260 USP44 0 p.Glu71Asp AbnormalNo nose, foot syndactyly, normal 25 25 1-006041-02621 CTDSMAD2 SLC5A2 38 Nonsensep.Trp244CW172Xys PAPVR,atresia,Novel CAVC HLHSNM_003041 unbalanced -rightNP_003032 16 31497537 G>A 3.43 228 n/a20 23 100 50 0 7850 0 100 malrotation dominant,HTX: Dextrocardia, DORV, PS ASD, CAVC- 1-1-0228202020 LVOTOSMAD2 DNAH1038 Nonsensep.IVS6+1W1406X G>A HTX:Novel Dextrocardia,NM_207437 PAPVR,NP_997320 Asplenia12 124317687 G>A 0.29 228 27 normal31 65 0 6595 0 10100 unbalanced, DORV, D-TGA, PS 1-01451 MED20 25 p.IVS2+2 T>C mitral atresia, HLHS, aortic No abnormal <5 10 HTX: Dextrocardia, LSVC to LA, 1-00323 HTX CATSPERG Nonsense Y752X atresia,Novel hypoplasticNM_021185 AA NP_067008 Abnormal19 38853114 nose,C>G foot0.24 syndactyly,228 48 26 47 0 43 0 100 1-02621 SMAD2 38 p.Trp244Cys PAPVR, CAVC unbalanced-right n/a 50 50 1-01151 SUV420H1 44 p.Arg143Cys CTD: dbl AA No malrotation abnormal 50 10 dominant, DORV, PS 1-02020 HTX SMAD2 Splice 1 bp beyond exon 6 LVO:Novel Mitral stenosis,NM_001135937 aortic NP_001129409 18 45374845 C>T 38.29 228 mild38 35 114 2 96 0 100 1-00750 HUWE1 260 p.Arg3219Cys No 25 25 RESEARCH SUPPLEMENTARY INFORMATIONstenosis,HTX: Dextrocardia, HLHS PAPVR, abnormal 1-01451 MED20 25 p.IVS2+2 T>C LVO:mitral Hypoplastic atresia, HLHS, mitral valve,aortic No abnormal <5 10 1-01451 HTX MED20 Splicep.Iso144Phe2 bp beyond exon 2 Novel NM_004275 NP_004266Congenital6 41884521 hip dysplasia,A>G 25.19 168 118 33 153 0 158 0 100 1-00577 CUL3 57 hypoplasticatresia, hypoplastic aortic annulus, AA aortic n/a 12 60 fs*23 congenital scoliosis 1-1-0330001151 LVOTOSUV420H1NTM 44 Frameshiftp.Arg143C204/344ys stenosis,CTD:Novel dbl coarctation AANM_016522 NP_057606 No11 132177668 -A 14.02 Indel-Pass 25abnormal 70 95 0 8050 0 10100 Table S4. All de novo mutations with Bayesian QS50 CTD:LVO: LSVC, Mitral sinus stenosis, venosus aortic ASD, Proband mild Father Mother 1-00750 HUWE1 260 p.Arg3219Cys No SUPPLEMENTARY INFORMATION RESEARCH25 25 1-00116 NUB1 45 p.Asp310His truncusstenosis, arteriosus, HLHS VSD- Spine lipoma abnormalabnormal <5 5 1-00185 LVOTO NAT8L Frameshift S217 muscularNovel NM_178557 NP_848652 4 2065595 +A 9.34 Indel-Pass 47 20 35 0 37 0 100 LVO: Hypoplastic mitral valve, p.Iso144Phe Congenital hip dysplasia, 1-005771-01828 CUL3DAPK3 5755 p.Pro193Leu CTD:hypoplastic TOF aortic annulus, aorticNo n/a n/a n/a 12n/a 60 1-00141 PrimaryCTD GREB1L Nonsensefs*23 W1373X Novel NM_001142966 NP_001136438 congenital18 19085419 scoliosisG>A Mean3.96 Variant 228 94 87 159 0 151 0 Bayesian100 1-03151 SUPT5H 133 p.Glu451Asp LVO:stenosis, BAV, aorticcoarctationRefSeq stenosis NM RefSeq NPNo Position Base n/aRef Nonref Ref75 Nonref Ref90 Nonref HTX:CTD: Dextrocardia, LSVC, sinus TAPVR, venosus ASD, 1-00393 CTD PLCZ1 Frameshiftp.Lys335Lys Y603 Novel NM_033123 NP_149114Hydronephrosis,12 18836193 asplenia,+AAAC 0.00 Indel-Pass 321 91 250 0 190 0 100 1-001161-IDTable00455 CardiacS10NUB1NAA15 . Chromatin Gene 45214 Mutation modifyingp.Asp Amino310H Acid and isChange otherLSVC,truncus dbSNP hypoplasticgenes arteriosus, of TV, interest VSD DORV,- with deSpineChr novo lipoma mutations Heart in QualityCHD normal probands abnormal 50 <550 Quality5 fs*6 malrotation hypoplasticmuscular RV,Accesion D-TGA, IDs PS Accesion IDs (hg19) change Cov Cov Cov Cov Cov Cov 1-01664 HTX IQCB1 Nonsense Q51X Novel NM_001023571 NP_001018865 3 121547429 G>A 9.73 163 35Neuro 14 49- 0 14 0 57 Classification Heart Primary Classification: Specific Exp. Score Somatic GrowthScore 1-018281-ID00141 DAPK3NAA15Gene 55214 † p.p.PSerMutationro761193Leu* CTD:CTD: TOF, TOF single LCA § No NoExtracardiac Structural Anomaliesn/a Developn/a -<5 n/a20 n/a Exp Cardiovascular Diagnoses Ht(%) Wt(%) 1-Mutations031511-01138 at conserved SUPT5HUSP34 position in LHE133 genes65 in CHDp.Leu432Pp.Glu Cases451Aspro LVO:LVO: supra BAV, MS, aortic BAV, stenosis CoA No No n/a mentaln/a 25 75>95 90 Epicanthal folds, telecanthus, large Damaging1-00448 mutations NF1 in HHE genes in CHD55 Casesp.IVS6+4 del A CTD:HTX: PA Dextrocardia, VSD-MAPCAs TAPVR, No n/a 5 5 p.Lys335Lys Hydronephrosis,low-set ears, excess asplenia, nuchal skin, motor 1-1-02956004551-00802 NAA15HTXPTCH1 HYDIN21432 Missense p.Argp.Ser1722831I2216NGln LVO:LSVC,LVO:Novel HLHS hypoplasticMitral NM_001270974.1 atresia, TV, HLHS, DORV, NP_001257903.1 aortic No 16 70977734 A>T 0.98 188n/a 54 normal 19 66 50 0 515050 0 50100 1-00596 MLL2 216 fs*6 malrotationhigh arched palate, wide-spacedProband delay, Father 50Mother <5 Table1-00534 S4. All deCTD novo mutations CHD7 withNonsense BayesianArgfs*9 QSQ1599X50 Other:atresia,Novel ASD dbl(multiple), NM_017780 AA dysplastic NP_060250 Macrocephaly, 8 61754556 dolichocephaly,lowC>T 124.85 228- 53 68 128 0 121 0 100 hypoplastic RV, D-TGA, PS nipples, undescended testes, club hypotonia 1-014121-02458 LVOTOSOS1 ODZ4 28Missense p.Thr266LysR1444K mitral,Novel tricuspid NM_001098816.2 and pulmonic NP_001092286.2 set ears,11 78413327 hyperextensibleC>T 22.75fingers, 228abnormal 60 61 95 <5 0 1155 2 100 1-00141 NAA15 214 p.Ser761* CTD: TOF, single LCA Nofoot, hyperpigmented lesions, n/a <5 20 1-02279 LVOTO AHNAK Nonsense G254X valvesNovel NM_001620 NP_001611foot syndactyly,11 62301129 caféC>A-au-lait175.75 spots 228 27 23 88 0 96 0 100 1-1-01179011381-02952 USP34CTDPITX2 C11orf416518 Missense p.p.Leu432PAla47Val S303C ro LVO:LVO:CTD:Novel CoA supra TOF, NM_012194.2MS, right BAV, aortic CoA arch, NP_036326.2 No No11 33564908 C>G 0.02 228n/a 77 n/a 75 138 75 0 20725>95 0 >95100 1-00853Primary WDR5 39 p.Lys7Gln aberrant LSA, coronary No Mean Variant abnormal 90 90Bayesian 1-1-0214400448 NF1CTD LIG1 55Nonsense p.IVS6+4 Y765X del A CTD:Novel PA VSD NM_000234-MAPCAs NP_000225 No19 48624517 G>T 118.41 228 33n/a 27 34 0 585 0 5100 1-01913 RAB10 119 p.Asn112Ser Other:abnormality DILV, DRefSeq-TGA, NM BAV, CoARefSeq NPNo Position Base n/aRef Nonref Ref 30Nonref Ref95 Nonref 1-00381† LVOTO COL4A3BP Missense G131D Novel NM_005713 NP_005704 5 74722260 C>T 39.13 228 76 57 122 0 111 0 100 1-1-0136000802HeartID expressionCardiac LVOTOPTCH1 refers NCKAP1Gene to #32 reads NonsenseMutation p.perArg Aminomillion831 E1057XAcidGln at Change murine LVO: e dbSNP14.5.Novel HLHS Mutation NM_013436 denotes the NP_038464 impact onChrNo 2Cleftencoded 183792856 lip, cleftprotein C>Apalate, in three103.97Heart inguinal letterQuality 228 code; 70 * denotesn/a 76 175 termination0 12850 4 Quality50100 mutation. Frameshift mutation in MLL2, CUL3 and NAA15. ‘IVS’ stands for intervening sequence. ‘fs’ stands for frameshift. Splice site mutation in KDM5B, 1-00534 CHD7 125 p.Gln1599* Other:CTD: ASD TOFAccesion -(multiple),PA IDs dysplasticAccesion IDs Macrocephaly,hernia,(hg19) micropenis,change dolichocephaly,low sensorineuralCov - abnormalCov Cov Cov <5Cov Cov <5 1-02621MED20, andHTX SMAD2SMAD2 occur at Missense1st base of canonicaW244C l spliceNovel donor ofNM_001135937 intron 12, at 2nd NP_001129409 base of canonical18 45375021 splice C>Gdonor 38.29of intron 2282 and 1 45st base 39 of 134canon0ical splice42 0 100 1-1-0196502458 Classification LVOTOSOS1 KDM5B28 Splicep.Thr266Lys 1 bp beyond exon 12mitral, Novel tricuspid NM_006618 and pulmonic NP_006609 set 1hearing ears,202722032 losshyperextensible C>T Exp.68.06 fingers,Score 228 47abnormal 39 112 0 76<5 0 Score5100 donor of intron 6 respectively. valves foot syndactyly, café-au-lait spots RESEARCH§ SUPPLEMENTARY INFORMATIONLVO: LSVC, Primum ASD, cleft 1-1-010531-0007502952HLHS1-00230 -hypoplasticPITX2 CTDHTXKDM5A leftFAM135A RNF20 heart18 syndrome;70 MissenseNonsense p.Alap. ArgDbl47ValR1138H 1508Q83XAA -doubleTrp LVO:aorticNovelNovel CoAarch; NM_001105531TOF NM_019592-tetralogy NP_001099001of NP_062538 Fallot; PAPVRNo 6 9No 104302602 71245998- partial anomalousG>AC>T 37.7358.23 pulmonary 228 228 venous 156 66normaln/a113 47return; 205 129 LSVC00 -left<519011475 00 <5>95100 100 Damagingsuperior mutations vena incava; HHE genesLA-left in CHD atrium; Cases CAVC-complete atrioventricularMV, sub- AS,canal BAV defect; TAPVR- total anomalous pulmonary venous return; MV-mitral valve; BAV- 1-01913bicuspid aorticRAB10 valve; ASD-atrial119 septalp.Asnp.IVS12+1 defect;112Ser VSD - ventricularOther: DILV, septal D defect;-TGA, PABAV,-pulmonary CoA atresia;No RAI- right atrial isomerization. n/a 30 95 1-007531-010281-01965 LVOTO CTDKDM5B KIAA1737GTPBP4 68 MissenseNonsense P132LK332X LVO:NovelNovel Coarctation NM_033426 NM_012341 NP_219494 NP_036473 1410No 77579856 1051839 C>TA>T 36.1767.83 228 228 13265n/a 13858 23591 00 <526336 00 <5100 100 †Heart1-00534 expressionCTD refers CHD7 to # readsNonsense perG>A millionQ1599X at murine Novele14.5. Mutation NM_017780 denotes NP_060250 the impact 8 on61754556 encodedC>T protein124.85 in three 228 letter 53 code; 68 * 128denotes0 termination121 0 100 mutation.1-003731-005961-01907 F LVOTO LVOTOrameshift UBE2B PAPSS1mutation MLL2146FrameshiftMissense in MLL2 p., ArgCUL3T399S8Thr S1722 and NAA15CTD:NovelNovel. ‘IVS’ TOF stands NM_005443 NM_003482 for intervening NP_005434 NP_003473 sequence.12 4No108574688 49438005 ‘fs’ standsG>C-A for216.1233.63 frameshift.Indel-Pass 228 Splice 8980normal 78 44site 178mutation 131 00 in50181126 KDM5B 00 10, 100 100 Low-set ears,excess nuchal skin, st MED201-02279, and LVOTO SMAD2 AHNAKoccur at Nonsense1st base of canonicaG254X l spliceHTX:Novel donor Dextrocardia, of NM_001620 intron 12, RAI, at NP_001611 TAPVR,2nd base of11 canonical62301129 spliceC>A donor175.75 of 228intron 272 and 23 1 base 88 of0 canon96 ical0 splice100 hydronephrosis, wide-spaced 2nd donor1-004551-00075 of intron HTX RNF206 respectively.NAA15 58 Frameshift p.Gln83D335* L-Novelventricular NM_057175 loop, CAVC NP_476516 4 140272757 -AAAG 213.74 Indel-Pass 68abnormal 17 57 0 <572 0 <5 100 § 1-01943 CTD PES1 Missense P409T Novel NM_001243225 NP_001230154 22toe, 30975867 Asplenia,G>T primary33.10 cilia 228 12 22 80 0 88 0 100 HLHS1-02144-hypoplasticCTD left LIG1heart syndrome;Nonsense Dbl Y765X AA-double aorticunbalancedNovel arch; NM_000234 -TOFright- tetralogdominant, NP_000225y of PA Fallot ;19 PAPVR 48624517-partialG>T anomalous118.41 228pulmonary 33 27venous 34 return;0 58LSVC0-left 100 superior1-019951-00577 vena LVOTOCTD cava; LATM2D2 OS9-left atrium; FrameshiftMissense CAVC-completeT74AT158 atrioventricularNovelNovel NM_001024381 NM_001017956 canal defect; NP_001019552 NP_001017956 TAPVR- total12 8dysk 38851146 58089814anomalousinesia T>C-A pulmonary27.6967.99 Indel-Pass 228venous 4078 return; 23 39 MV 81 72-mitral00 valve;7674 0BAV0 - 100100 bicuspid1-01360Table S4. aortic All LVOTO de valve;novo mutations NCKAP1ASD-atrial with Nonsense septal Bayesian defect; QS E1057X50 VSD- ventricularLVO:Novel ASD, septal NM_013436 mitral defect; atresia, PA NP_038464 aortic-pulmonary 2 atresia;183792856 RAIC>A- rightWWW.NATURE.COM/NATURE103.97 atrial isomerization 228Proband 70 76. 175Father0 | 19128Mother4 100 1-01260 USP44 0 p.Glu71Asp No normal 25 25 1-025151-02227 LVOTOHTX KCNH6 FTSJ3 FrameshiftMissense 786/847T274M atresia,NovelNovel HLHS NM_030779.2 NM_017647 NP_110406.1 NP_060117 1717 61611392 61897350 -GA/+CC>T 59.290.38Indel-Pass 228 2862 25 69 118 47 00 12537 00 10083 1-01965 LVOTOPrimary KDM5B Splice 1 bp beyond exon 12HTX: Novel Dextrocardia, NM_006618 ASD, NP_006609 CAVC- 1 202722032 C>T Mean68.06 Variant 228 47 39 112 0 76 0 Bayesian100 1-02020 SMAD2 38 p.IVS6+1 G>A RefSeq NM RefSeq NP AspleniaPosition Base Ref Nonrefnormal Ref Nonref Ref95 Nonref 10 ID Cardiac Gene Mutation Amino Acid Changeunbalanced, dbSNP DORV, D-TGA, PS Chr Heart Quality Quality 1-029551-00577 LVOTOCTD MAK16CUL3 Frameshift Missense R122LI144 NovelNovelAccesion NM_032509 NM_003590 IDs Accesion NP_115898 NP_003581 IDs 8 2 22537943433346630(hg19) change-TAATG>T 21.5557.06 Indel-Pass 228Cov 101163Cov 83 49Cov 265 172 Cov10 Cov365149 Cov00 100100 1-00075 ClassificationHTX RNF20 Nonsense Q83X HTX:Novel Dextrocardia, NM_019592 LSVC NP_062538 to LA, 9 104302602 C>T Exp.58.23Score 228 66 47 129 0 114 0 Score100 Abnormal nose, foot syndactyly, 1-02621 SMAD2 38 p.Trp244Cys PAPVR, CAVC unbalanced -right n/a 50 50 1-00448 CTD NF1 Splice 4 bp beyond exon 6 Novel NM_001128147 NP_001121619 17malrotation 29508511 -A 54.87 Indel-Pass 139 48 110 0 145 0 100 1-009981-01028Damaging mutationsCTD in HHEGTPBP4 genesINTS6 in CHDNonsense Missense Cases K332XT86R dominant,Novel NM_012341NM_012141DORV, PS NP_036473NP_036273 1013 52025243 1051839 G>CA>T 67.8320.84 228 132109 138 83 235251 0 263246 0 100 1-001411-00534 CTDCTDNAA15 CHD7 NonsenseNonsense Q1599X S761X HTX:Novel Dextrocardia, NM_017780NM_057175 PAPVR, NP_060250 NP_476516 8 4 61754556140306112 C>TC>A 124.85213.74 228 196 53 16 68 12 128 30 0 0 12141 0 0 10092 1-007501-005961-022791-01451 LVOTO LVOTOMED20 SSH2AHNAKMLL2 Frameshift25 Missense Nonsense p.IVS2+ V108LS1722G254X2 T>C mitralNovelNovel atresia, NM_003482NM_033389NM_001620 HLHS, aortic NP_003473NP_001611NP_203747 111217No62301129 4943800528011657 C>AC>A-A 175.75216.1220.74WWW.NATURE.COM/NATUREIndel-Pass 228 228 2780 70abnormal 23 4464 88 131149 00 96126189<5 0|0 1910100100 1-019071-02144 CTDCTDSERPINH1 LIG1 NonsenseNonsenseR415X Y765X atresia,Novel hypoplastic NM_000234NM_001235 AA NP_000225 NP_001226 1911 4862451775283114 G>TC>T 118.41847.33228 137 335 2715 34 32 0 0 5821 0 0 10055 1-029551-004551-013601-01151 LVOTOCTDHTXSUV420H1XRCC5 NCKAP1NAA15 Frameshift44 Missense Nonsense p.Arg K238QE1057X143CD335 ys CTD:Novel dbl AA NM_057175NM_021141NM_013436 NP_038464NP_476516NP_066964 24 2No183792856140272757216990668 -AAAGC>AA>C 103.97213.7420.47Indel-Pass 228 228 7068 58abnormal 76 1752 175 166 57 00 1282265072 40 10100100 Mutations1-01965 at conserved LVOTO position KDM5B in HHE genes Splice in CHD 1Cases bp beyond exon 12LVO: Novel Mitral NM_006618 stenosis, aortic NP_006609 1 202722032 C>T 68.06 228 47mild 39 112 0 76 0 100 1-005771-00750 LVOTO HUWE1 OS9 Frameshift260 p.Arg3219CT158 ys Novel NM_001017956 NP_001017956 12No 58089814 -A 67.99 Indel-Pass 78 39 72 0 2574 0 25100 1-011191-00075 CTDHTXNAA16 RNF20 MissenseNonsense R70CQ83X stenosis,Novel HLHS NM_024561NM_019592 NP_062538NP_078837 13 9 104302602 41893010 C>TC>T 58.2312.47 228 228 66 142abnormal 47 121 129 234 00 114219 00 100100 1-00148 LVOTO LAMC1 Missense G170E LVO:Novel Hypoplastic NM_002293 mitral NP_002284valve, 1 183072553 G>A 510.65 169 22 15 54 0 63 0 100 1-022271-01028 LVOTOCTD GTPBP4 FTSJ3 Frameshift Nonsensep.Iso144Phe786/847 K332X Novel NM_017647NM_012341 NP_036473NP_060117 1017Congenital 61897350 1051839 hip-GA/+CA>T dysplasia,67.8359.29 Indel-Pass 228 13262 138 69 235118 00 263125 00 100100 1-013121-00577 CTDCUL3 SESTD157 Missense R24Q hypoplasticNovel NM_178123 aortic annulus, NP_835224 aortic 2 180047900 C>T 11.25 228 77n/a 101 191 0 19512 0 60100 1-00596 LVOTO MLL2 Frameshiftfs*23 S1722 Novel NM_003482 NP_003473 12congenital 49438005 scoliosis-A 216.12 Indel-Pass 80 44 131 0 126 0 100 1-00522 LVOTO TLN1 Missense L684V stenosis,Novel coarctation NM_006289 NP_006280 9 35717729 G>C 321.59 228 53 49 96 0 132 0 100 1-005771-00455 LVOTOHTX CUL3NAA15Frameshift Frameshift I144D335 Novel NM_003590NM_057175 NP_476516NP_003581 42 140272757225379434 -AAAG-TAAT 213.7457.06 Indel-PassIndel-Pass 68163 17 49 57 172 00 72149 00 100100 1-02461 CTD DTNA Missense P295S CTD:Novel LSVC, NM_001392 sinus venosus NP_001383 ASD, 18 32400761 C>T 10.09 228 56 49 97 0 133 0 100 1-004481-016641-005771-00116 LVOTOCTDHTXNUB1 OBSCN NF1 OS945 FrameshiftMissenseSplice p. 4 bpAsp beyondF5295S310HT158 exonis 6truncus NovelNovel arteriosus, NM_001128147NM_001017956 NM_001098623 VSD NP_001017956NP_001121619 NP_001092093- 1217 1Spine 58089814228521311 29508511 lipoma -AT>C-A 67.99298.1454.87 Indel-PassIndel-Pass 22878139 43abnormal 39 48 52 72110 125 001 74145<596 000 5100 100100 1-011751-02227 LVOTOHTX ITGA7 FTSJ3 MissenseFrameshift R279W786/847 muscularNovel NM_001144997 NM_017647 NP_001138469 NP_060117 1712 61897350 56092245 -GA/+CG>A 59.297.79Indel-Pass 22862 53 69 40118 91 00 12594 00 100100 1-007501-005771-01828 LVOTO LVOTODAPK3 HUWE1 CUL3 55Frameshift Missense p.ProR3219C193LeuI144 CTD:Novel TOF NM_031407.5NM_003590 NP_113584.3NP_003581 2 XNo225379434 53576300 -TAATG>A 57.06259.79Indel-Pass 228163 38n/a 49 56 172 48 0 0 149n/a146 0 0 n/a100100 1-001411-00448 CTDCTDNAA15 NF1 NonsenseSplice 4 bp beyond S761X exon 6 Novel NM_001128147 NM_057175 NP_001121619 NP_47651617 4 140306112 29508511 C>A-A 213.7454.87 Indel-Pass 196139 16 48 12110 30 00 14541 00 10092 1-000961-03151 CTDSUPT5HPIK3CD 133 Missense p.GluL347V451Asp LVO:Novel BAV, NM_005026 aortic stenosis NP_005017 1No 9778770 C>G 4.38 210 40n/a 19 107 0 11757 0 90100 1-005221-00141 LVOTOCTD LAMA5NAA15 MissenseNonsenseC1625Y S761X Novel NM_057175NM_005560 NP_476516 NP_005551 420140306112 60902649 C>AC>T 213.74208.82 196 228 16 39 12 39 30 73 0 0 4196 0 0 92100 1-01907 CTD SERPINH1 Nonsense R415X HTX:Novel Dextrocardia, NM_001235 TAPVR, NP_001226 11 75283114 C>T 847.33 137 5 15 32 0 21 0 55 1-01907 CTD SERPINH1 Nonsense p.Lys335LysR415X Novel NM_001235 NP_001226 11Hydronephrosis,75283114 C>T asplenia,847.33 137 5 15 32 0 21 0 55 1-023641-00455 CTDNAA15NR6A1 214 Missense C120R LSVC,Novel hypoplastic NM_033334 TV, DORV, NP_201591 9 127316634 A>G 3.40 228 44normal 62 145 0 19850 0 50100 1-01637 CTD KIAA0664 Missense fs*6 H823Y Novel NM_015229 NP_056044 17malrotation 2598535 G>A 182.48 228 70 61 186 0 135 0 100 MutationsMutations at at conserved conserved position position in in HHE HHE genes genes in in CHD CHD Cases Cases hypoplastic RV, D-TGA, PS 1-021261-00148 LVOTOCTD BICD1 LAMC1 Missense Missense D760EG170E Novel NM_001714NM_002293 NP_002284NP_001705 12 1 183072553 32490460 G>AT>A 510.653.39 169 228 22 22 15 29 54 70 00 6376 00 100100 1-019071-00141 CTDNAA15UBE2B 214 Missense p.Ser761R8T* CTD:Novel TOF, NM_003337 single LCA NP_003328 5No133707309 G>C 146.23 228 26n/a 21 101 0 <563 0 20 100 1-001481-00522 LVOTO LVOTO LAMC1TLN1 MissenseMissense G170EL684V Novel NM_002293NM_006289 NP_006280NP_002284 91 18307255335717729 G>CG>A 321.59510.65 228 169 53 22 49 15 96 54 00 13263 00 100100 1-031711-008081-016641-01138 CTDCTDHTXUSP34RAVER1OBSCN ALPL Missense Missense Missense65 p.Leu432PF5295S A102TH93R ro LVO:NovelNovel supra NM_001177520NM_001098623 NM_133452 MS, BAV, CoA NP_001092093NP_001170991 NP_597709 119 1No228521311 21890596 10441195 T>CG>AT>C 298.14130.733.10 228 215 228 43 12434n/a 52 19 82 125 12540 100 962513287 000 >95100100100 1-005221-007501-00448 LVOTO LVOTONF1 HUWE1TLN1 Missense Missense55 p.IVSR3219CL684V6+4 del A CTD:Novel PA VSD NM_031407.5 NM_006289-MAPCAs NP_113584.3NP_006280 X 9No53576300 35717729 G>AG>C 259.79321.59 228 228 38 53n/a 56 49 48 96 00 1461325 00 5100 100 1-021071-016641-004791-005221-00802 LVOTO LVOTO LVOTOHTXPTCH1OBSCN GANABRDH5 LAMA5Missense Missense Missense32 p.ArgF5295SC1625YR280SN171S831Gln LVO:Novel HLHS NM_001098623NM_001199771 NM_005560NM_198334 NP_001092093NP_001186700 NP_005551 NP_938148 201211 1No228521311 60902649 5611821062402341 C>TT>CC>AT>C 208.82298.14129.522.42 228 228 228 39 4321 81n/a 39 5228 83 73 125 12359 0100 9650966269 000 50100100 100 1-01637 CTD KIAA0664 Missense H823Y Other:Novel ASD NM_015229 (multiple), dysplastic NP_056044 17Macrocephaly, 2598535 G>A dolichocephaly,low182.48 228 70- 61 186 0 135 0 100 1-007341-007501-023941-019071-02458 LVOTO LVOTO CTDSOS1 HUWE1MYO16UBE2B DST Missense Missense Missense28 p.Thr266LysR3219CR1164H K2653IR8T mitral,Novel tricuspid NM_001198950 NM_031407.5 NM_003337NM_015548 and pulmonic NP_001185879 NP_113584.3 NP_003328 NP_056363 13 5X 6set133707309 10977748153576300 56417763ears, hyperextensibleG>CG>AT>A 146.23259.79121.781.45 fingers, 228 228 189 26 3828 46abnormal 21 5624 67 101 4810468 000 63146<55958 001 5100 100100 1-00808 CTD RAVER1 Missense H93R valvesNovel NM_133452 NP_597709 19foot 10441195 syndactyly,T>C café130.73-au-lait 228 spots 124 82 125 0 132 0 100 1-021891-02952 LVOTO PITX2 EIF3H Missense18 p.Ala47ValH109R LVO:Novel CoA NM_003756 NP_003747 8No 117671183 T>C 118.19 228 66n/a 60 116 0 75147 0 >95100 1-009381-005221-00479 LVOTO LVOTOFGFR4 LAMA5 GANABMissense Missense C1625YD297NN171S Novel NM_005560NM_002011NM_198334 NP_938148NP_005551NP_0020021120 5 17651948362402341 60902649 T>CG>AC>T 129.52208.820.55 228 228 81 3934 83 3918 123 7342 00 699686 00 100100 1-023941-01913 LVOTORAB10 DST119 Missense p.Asn K2653I112Ser Other:Novel DILV, NM_015548 D-TGA, BAV, NP_056363 CoA 6No56417763 T>A 121.78 189 46n/a 67 104 0 5830 1 95100 1-00161† HTX SBNO1 Missense T1339M Novel NM_001167856 NP_001161328 12 123782548 G>A 108.82 228 84 114 239 0 240 0 100 1-016371-011791-02189Heart expression LVOTOCTD KIAA0664TDRD12 refers EIF3H to Missense # Missense reads per millionH823Y A155EH109R at murineNovel e14.5. NM_001110822 Mutation NM_015229NM_003756 denotes NP_001104292 NP_003747NP_056044 the impact1719 8 117671183on 33239405 2598535 encodedT>CG>AC>A protein118.19182.480.20 in three228 228 letter 66 7060 code; 60 6158 *116 186denotes 92 00 termination147135171 00 100100 mutation. Frameshift mutation in MLL2, CUL3 and NAA15. ‘IVS’ stands for intervening sequence. ‘fs’ stands for frameshift. Splice site mutation in KDM5B, 1-007531-00161 LVOTOHTX FYCO1SBNO1 Missense Missense E1286KT1339M Novel NM_001167856 NM_024513 NP_001161328 NP_07878912 3 12378254845996829 G>AC>T 108.8296.55 228 228 84 78114 80st 239 124 0 0 24060 0 0 100100 1-00323MED20, HTXand SMAD2IL2RB occurMissense at 1st base ofR170Q canonical spliceNovel donor NM_000878 of intron 12, at NP_000869 2nd base 22of canonical 37533655 spliceC>T donor0.16 of 228intron 252 and 34 1 base 35 of0 canon35 ical0 splice100 1-019071-00753 LVOTOCTD UBE2B FYCO1 Missense Missense E1286KR8T Novel NM_003337NM_024513 NP_078789NP_003328 35 13370730945996829 C>TG>C 146.2396.55 228 228 78 26 80 21 124 101 00 6063 00 100100 donor of intron 6 respectively. 1-02153§ CTD RNF44 Missense R421Q Novel NM_014901 NP_055716 5 175956066 C>T 88.67 228 15 17 43 0 54 0 100 1-025271-008081-02153HLHS-hypoplasticCTDCTDRAVER1GRM8 RNF44left heart MissenseMissense syndrome; DblN778SR421QH93R AA-doubleNovel aortic arch; NM_001127323 NM_133452NM_014901 TOF-tetralog NP_001120795 NP_055716NP_597709y of Fallot19 5; 7 PAPVR175956066126173103 10441195-partialC>TT>C anomalous130.7388.670.12 228 228pulmonary 15124 65 17venous 8268 43 125269 return;00 54132LSVC150 00-left 100100 1-00197superior LVOTOvena cava; BCL9LA-left atrium; Missense CAVC M1395K-complete atrioventricularNovel NM_004326 canal defect; NP_004317 TAPVR- 1total147096663 anomalousT>A pulmonary87.75228 venous 12 return; 16 MV 47 -mitral0 valve;71 0BAV- 100 1-001971-00325bicuspid LVOTO LVOTOaortic valve; TSHZ1BCL9 ASD-atrial Missense Missense septal defect; M1395KQ288R VSD- ventricularNovel septal NM_005786NM_004326 defect; NP_005777PA NP_004317-pulmonary18 1 atresia; 72998360147096663 RAIA>GT>A- right82.0387.75 atrial isomerization 228 228 30 12 24. 16 65 47 0 0 4971 0 0 100100 1-021441-00479 LVOTOCTD GANABKNDC1 Missense N171ST81M Novel NM_198334NM_152643 NP_938148NP_689856 1011 13498102462402341 T>CC>T 129.520.02 228 8118 8316 123 30 0 6944 0 100 1-003251-01026 LVOTO LVOTO TSHZ1 RUFY2 Missense MissenseQ288R P621L Novel NM_017987NM_005786 NP_060457 NP_005777 1018 70105589 72998360 G>AA>G 77.0382.03 228 228 90 30 105 24 216 65 0 0 22149 0 0 100100 1-020231-023941-00541 LVOTOHTX SIGLEC5EFHD2 DST Missense Missense K2653IC269Y A230V Novel NM_015548NM_003830NM_024329 NP_077305NP_056363NP_003821 19 16 1575518656417763 52131278 C>TT>AC>T 121.7876.900.02 228 189228 73 4615 86 6724 143 104 84 00 995867 010 100100 1-010261-00230 LVOTO LVOTO RUFY2 KDM5A Missense Missense R1508W P621L Novel NM_001042603 NM_017987 NP_001036068 NP_060457 1210 70105589 402269 G>AG>A 69.8977.03 228 228 53 90 36 105 90 216 0 0 108221 0 0 100100 1-021891-02923 LVOTO LVOTO EIF3H PHIP Missense MissenseH109R S674C Novel NM_003756NM_017934 NP_060404NP_003747 68 11767118379707311 G>CT>C 67.31118.19 193228 34 66 30 60 91116 00 106147 00 100100 1-01536 CTD C1orf94 Missense R222Q Novel NM_032884 NP_116273 1 34666598 G>A 0.00 228 36 32 121 0 58 0 100 1-001611-005411-02264 LVOTOHTX SBNO1EFHD2 C11orf9 Missense Missense Missense T1339M A230VF387S Novel NM_001167856NM_001127392 NM_024329 NP_001120864NP_001161328 NP_0773051112 1 1237825486154148315755186 T>CG>AC>T 108.8266.1776.90 228 228 228 22 84 73 18114 86 48239 143 100 8724099 000 100100100 1-006191-01783 LVOTO LVOTO KLF2FADS3 Missense Missense C334YG412S Novel NM_016270NM_021727 NP_068373NP_057354 111961643375 16437775 C>TG>A 65.7121.00 228 104 22 17 198 4529 00 8447 00 10086 1-00230 LVOTO KDM5A Missense R1508W Novel NM_001042603 NP_001036068 12 402269 G>A 69.89WWW.NATURE.COM/NATURE 228 53 36 90 0 108 | 019 100 1-031731-007531-01138 LVOTO LVOTOCTD FYCO1IGFN1 USP34Missense Missense E1286KI1864ML432P Novel NM_001164586 NM_024513NM_014709 NP_001158058 NP_055524NP_078789 23 1 2011796136157778545996829 A>GA>GC>T 64.4896.550.06 228 228109 56 7840 69 8017 154 124 26 001 1156024 10 10010082 1-029231-02133 LVOTOCTD CPSF1 PHIP Missense Missense S674CN29K Novel NM_013291NM_017934 NP_037423 NP_060404 8 6 14563445679707311 G>CG>C 63.0467.31 228 193 41 34 29 30 80 91 0 0 82106 0 0 100100 1-002811-021531-02437 CTDHTXHTXSPATA2 RNF44 LZTR1 MissenseMissense R421QD203VG248R Novel NM_001135773 NM_014901NM_006767 NP_001129245 NP_006758NP_0557162220 5 175956066 21344765 48523111 G>AC>TT>A 60.7288.6723.64 228 228194 35 1521 27 1714 107 4327 00 995429 00 10010064 1-022641-01365 LVOTOCTD C11orf9GTPBP1 Missense MissenseF387S E291K Novel NM_001127392 NM_004286 NP_001120864 NP_004277 2211 3911778361541483 G>AT>C 56.5966.17 228 228 45 22 55 18 129 48 0 1 15087 0 0 100100 1-007881-001971-00934 LVOTO LVOTOCTD MARCH5 FREM2BCL9 Missense Missense M1395KD2206N E28K Novel NM_017824.4 NM_004326NM_207361 NP_060294.1 NP_997244NP_0043171310 1 147096663 39425119 94070938 G>AG>AT>A 51.8787.7538.87 228 228 68 1267 69 1647 132 143 47 10 15716271 00 100100 1-003251-017831-01341 LVOTO LVOTOCTD KIAA0196 TSHZ1 FADS3 Missense Missense MissenseQ288RG412S V167D Novel NM_005786NM_014846 NM_021727 NP_055661NP_005777 NP_068373 18 811 126093921 7299836061643375 A>TA>GC>T 48.0782.0365.71 228 228 228 95 30 22 72 24 19 138 65 45 000 1344984 000 100100100 1-014111-00587 LVOTO LVOTO EPDR1 SMAD4 Missense Missense T209MI500V Novel NM_017549.4 NM_005359 NP_060019.2 NP_005350 18 7 4860467637989949 A>GC>T 45.7815.67 228 228 47 57 56 59 82 78 00 13976 00 100100 1-010261-011381-00491 LVOTO LVOTO LVOTO RUFY2 USP34 KPNA1 Missense Missense Missense P621LL432P P350S Novel NM_017987NM_002264 NM_014709 NP_002255NP_060457 NP_05552410 3 2 122156091 7010558961577785 G>AG>AA>G 45.3577.0364.48 228 228 228 101 90 56110 105 69199 216 154 000 144221115 001 100100100 Mutations1-03300 at nonconserved LVOTO position DHX38 in LHE Missensegenes in CHD CasesG332D Novel NM_014003 NP_054722 16 72133665 G>A 40.76 228 34 28 84 0 103 0 100 1-005411-021331-02093 HTXCTDCTD EFHD2CPSF1LOXL2 Missense Missense Missense A230VR327QN29K Novel NM_024329NM_002318 NM_013291 NP_002309NP_077305 NP_037423 81 8 2318606514563445615755186 C>TC>TG>C 110.9576.9063.04190 228 228 27 73 41 17 86 29 27 143 80 000 259982 000 91100100 1-024111-01828 CTDCTD CDH23DAPK3Missense MissenseR1136C P193L Novel NM_001171930.1 NM_001348 NP_001165401.1 NP_001339 1910 734838383963893 G>AC>T 54.610.30 228 228 34 43 38 45 50 158 00 63199 00 90100 1-002301-024371-01984 LVOTO LVOTOHTX PCDHGA2 KDM5A LZTR1 MissenseMissense Missense R1508WG248RL172F Novel NM_001042603 NM_032009NM_006767 NP_001036068 NP_061738 NP_006758 12 522140719052 21344765 402269 C>TG>AG>A 182.8469.8960.72 228 228 228 28 53 35 29 36 27 33 90107 000 4310899 000 89100100 1-008781-013651-03151 LVOTOHTXCTD CCDC57GTPBP1 SUPT5HMissense Missense Missense E291KE451DA77T NovelNovel NM_001130825 NM_198082.2 NM_004286 NP_001124297 NP_932348.2 NP_004277 191722 39960029 80159592 39117783 G>CC>TG>A 132.7210.1856.59 228 228 228 22 18 45 23 23 55 33 12958 000 2515048 000 72100100 1-029231-02788 LVOTOCTD PHIPMINK1 Missense Missense S674CR299C Novel NM_017934NM_153827 NP_722549NP_06040417 6 79707311 4789867 C>TG>C 77.7767.31 150 193 16 34 16 30 23 91 00 27106 00 66100 1-011-009341-01696184 LVOTOHTXCTD GLT25D1APLP1 FREM2 Missense MissenseMissense D2206NR330CR471W NovelNovel NM_005166.3 NM_024656NM_207361 NP_005157.1 NP_078932 NP_997244 191913 17691524 36364547 39425119 C>TC>TG>A 73.6411.4351.87 228228 228 35 66 68 33 52 69 47 13286 001 57208157 000 55100100 1-022641-00116 LVOTOCTD C11orf9NUB1 Missense Missense F387SD310H Novel NM_001127392 NM_016118.4 NP_001120864 NP_057202.311 7 15106408061541483 G>CT>C 45.0766.17 228 228 95 22 86 18 149 48 01 17687 00 100100 1-008531-017831-013411-01036 LVOTOCTDCTDCTD KIAA0196BCL2L1 FADS3WDR5 1Missense Missense MissenseG412S V167D K7QP59S Novel NM_001204113.1 NM_021727NM_052821 NM_014846 NP_001191042.1 NP_068373NP_438172 NP_05566111 2 9 8 11188167713700501812609392161643375 C>TA>CC>TA>T 41.1465.7139.4748.07 228 228 228 51 2235 95 48 1925 72 104 4513875 000 1198413462 000 100100100 Mutations at nonconserved position in HHE genes in CHD Cases 1-015381-011381-005871-01505 LVOTO LVOTO LVOTOHTX USP34SMAD4MPITTN Missense MissenseMissense T4852NL432P A38VI500V Novel NM_001256850.1 NM_014709NM_002435 NM_005359 NP_001243779 NP_055524NP_002426 NP_005350 15 21817959861061577785 75182964 48604676 G>TA>GC>TA>G 2093.2664.4839.0445.78 154 228 228 12 5636 47 10 6939 56 35 154 99 82 000 2311513984 0100 71100100 1-00638 CTD FBN2 Missense D2191N Novel NM_001999 NP_001990 5 127624885 C>T 263.94 228 127 111 325 0 312 0 100 1-020131-021331-004911-00258 LVOTO LVOTOCTDCTD TFIP11CPSF1 KPNA1PFKM Missense Missense Missense M432T P350SA522GN29K Novel NM_001008697NM_001166686 NM_013291 NM_002264 NP_001008697NP_001160158 NP_037423 NP_0022551222 8 3 145634456 48535104122156091 26895104 C>GG>CA>GG>A 218.0363.0436.1045.35 228 228 228 88 1014151 81 2911045 208 8019990 000 15110182144 000 100100100 1-033001-01817 LVOTOCTD MAPK8IP3 DHX38 Missense MissenseG332D P852R Novel NM_001040439 NM_014003 NP_001035529 NP_054722 1616 181609072133665 C>GG>A 134.1840.76 228 228 16 34 16 28 51 84 0 0 45103 0 0 100100 1-027721-024371-01432 LVOTOHTXCTD TARS2LZTR1LAMB2 Missense Missense R1661WG248R P155R Novel NM_006767NM_025150NM_002292 NP_002283NP_006758NP_07942622 3 1 15046315349159236 21344765 G>AG>AC>G 129.8660.7235.84 176 228 26 355111 273750 107148 00 4313099 00 100100 1-013651-020931-00222 LVOTOCTD GTPBP1LOXL2 NUCB1 Missense Missense E291KR327QR189C Novel NM_004286NM_006184 NM_002318 NP_006175NP_004277 NP_0023091922 8 49416352 3911778323186065 C>TG>AC>T 123.3156.59110.95 228 228190115 45 2770 55 17 258 129 27 000 16415025 000 10010091 1-008021-00381 LVOTO LVOTO PTCH1 STAB1 Missense MissenseR831Q A1102V Novel NM_001083607 NM_015136 NP_001077076 NP_055951 3 9 5254776798220518 C>TC>T 122.2132.19 228 228 71 21 47 28 89 34 00 12851 00 100100 1-027721-009341-018281-02121 LVOTOCTDCTD TBC1D4 FREM2DAPK3DST Missense Missense Missense D2206NG2936D P634R P193L Novel NM_207361NM_014832NM_015548 NM_001348 NP_056363NP_997244NP_055647 NP_001339 13 61956401671 7590046539425119 3963893 C>TG>AG>CG>A 121.7851.8729.4454.61 228 228 228 104 131 68 34 64 6985 38 229 132273 50 0100 18015723963 000 10010090 1-019841-01913 LVOTOOther PCDHGA2RAB10 Missense Missense N112SL172F Novel NM_016131NM_032009 NP_057215 NP_061738 2 5 26350020140719052 A>GC>T 118.86182.84225 228 23 28 21 29 59 33 0 0 5643 0 0 10089 1-009801-013411-01933 LVOTOCTD KIAA0196NCAPD3 CPD Missense Missense A1041V V167D P425R Novel NM_014846NM_015261NM_001304 NP_001295NP_055661NP_056076 1711 8 126093921134038929 28748818 C>GG>AA>T 98.8648.0729.32 228 228 108 9562 69 7238 169 138 93 20 142134101 00 100100 1-031511-02141 LVOTOCTD SUPT5HLRPPRC Missense Missense E451DD486N Novel NM_001130825 NM_133259 NP_001124297 NP_573566 21944190759 39960029 C>TG>C 93.03132.72 228 228 100 22 98 23 278 33 0 0 16925 0 0 10072 1-010481-005871-00186 LVOTOCTD SMAD4 SDK1DSG2 Missense Missense V625MI500VL499Q Novel NM_005359NM_152744NM_001943 NP_001934NP_005350NP_6899571818 7 29116237486046764014056 T>AA>GG>A 89.1145.7829.25 228 228 79 4793 69 104 56 157 155 82 00 90139138 001 100100 1-004911-027881-01788 LVOTO LVOTOCTD KPNA1MINK1 MYEF2 Missense Missense P350SR299CI264V Novel NM_002264NM_016132 NM_153827 NP_057216NP_002255 NP_722549 1517 3 122156091 48451047 4789867 T>CG>AC>T 69.6845.3577.77 228 228 150 87101 16 106110 16 191199 23 000 20114427 000 10010066 1-024581-02888 LVOTOother SOS1 AP3B1 Missense MissenseT266K E771K Novel NM_005633NM_003664 NP_003655NP_005624 5 2 7740611739278352 C>TG>T 63.5628.25 228 228 81 97 74 70 108 228 00 116125 00 100100 1-033001-016961-00323 LVOTOCTDHTX GLT25D1 DHX38NUP62 MissenseMissense Missense G332DR471WQ70R Novel NM_001193357 NM_014003 NM_024656 NP_001180286 NP_054722 NP_078932 191619 50412856 72133665 17691524 T>CG>AC>T 62.5540.7673.64 228 228 228 55 34 35 64 28 33 99 84 47 000 9310357 000 10010055 1-020201-00465 LVOTOHTX SLC38A3 TOMM40L Missense MissenseT174I S171I Novel NM_006841.4 NM_032174 NP_006832.1 NP_115550 1 3 16119800350253226 G>TC>T 62.291.34 228 228 69 24 67 18 96 50 00 7345 00 100100 1-020931-001161-00174 CTDCTD LOXL2ZNF326NUB1 Missense MissenseMissenseR327QD310H E338A Novel NM_016118.4 NM_002318NM_182976 NP_057202.3 NP_892021NP_002309 18 7 9048296215106408023186065 A>CC>TG>C 54.95110.9545.07 228190 228 137 27 95 100 17 86 125 27149 000 11025176 000 10091100 1-004821-00934 LVOTOHTX NFATC2 TRIM41Missense MissenseD584A P167S Novel NM_001258296 NM_033549 NP_001245225 NP_29102720 5 180651498 50071183 C>TT>G 53.7326.75 228 228 28 97 20 86 56 210 00 62205 00 100100 1-018281-010361-00824 CTDHTX BCL2L1DAPK3MAP2K71 Missense Missense Missense P193L V409IP59S Novel NM_001204113.1 NM_001348NM_145185 NP_001191042.1 NP_660186NP_001339 1919 2 111881677 7977281 3963893 G>AG>AC>T 52.5854.6141.14 228 228 228 26 34 51 25 38 48 41 50104 000 4363119 000 10090100 Mutations1-022541-01151 at nonconserved LVOTOCTD positionSUV420H1 ZNFX1 in HHE Missense Missense genes in CHD Cases Y1789CR143C Novel NM_021035NM_017635 NP_060105NP_066363 112067942601 47864195 G>AT>C 44.4925.62 228 228110 46105 70 295 143 00 41170 00 100100 1-019841-02254 LVOTO LVOTO PCDHGA2 ELMO2 Missense Missense L172FN332S Novel NM_032009NM_133171 NP_573403NP_06173820 5 140719052 45003945 T>CC>T 182.8444.31 228 228 87 28 71 29 201 33 00 11143 00 10089 1-017951-015051-01941 LVOTOCTDCTD DDX10NOP2TTN MissenseMissense Missense T4852N V427LI351V NovelNovel NM_001256850.1 NM_004398NM_006170 NP_001243779 NP_006161NP_004389 1211 2 108577521179598610 6671053 T>CG>TG>T 2093.2642.0425.53 228 228 154 55 128 12 44 92 10 129 188 35 010 21029523 000 10010071 1-031511-03190 LVOTOCTD SUPT5HOBSCN Missense Missense T4421M E451D Novel NM_001130825NM_001098623 NP_001092093NP_001124297 19 1 228503797 39960029 C>TG>C 298.14132.72 228 228 15 22 30 23 21 33 00 4525 00 8772 1-027881-011061-006381-00373 LVOTOCTDHTX TMEM63AMINK1DGCR2FBN2 Missense MissenseMissenseD2191NR299CI413V A64T Novel NM_001184781 NM_153827NM_014698 NM_001999 NP_001171710 NP_722549NP_055513 NP_0019902217 1 5 226047036 19076893127624885 4789867 C>TC>TT>CC>T 77.33263.9477.7725.32 228 150228 228 22 1271673 17 1662111 28 155 23325 000 4721627312 000 7210066100 1-01997 CTD PRPF4B Missense E14Q Novel NM_003913.4 NP_003904.3 6 4021699 G>C 129.14 228 26 21 58 0 71 0 100 1-001401-016961-002581-00344 CTDCTDCTD KIAA1468GLT25D1SERINC4PFKM Missense Missense Missense R471W S416PA522GR65H Novel NM_001258032.1 NM_001166686 NM_024656NM_020854 NP_001244961.1 NP_001160158 NP_078932NP_065905 15191812 44089057 1769152459895629 48535104 C>TC>TT>CC>G 43.79218.0373.6423.72 228 228 228 68 3555 88 73 3373 81112 129 47208 000 12614857151 000 10010055100 1-00305 CTD GRIP2 Missense T954M Novel NM_001080423.2 NP_001073892.2 3 14547126 G>A 51.51 228 38 24 70 0 53 0 100 1-003471-001161-01817Silent mutations LVOTOCTD in HHE genesMAPK8IP3 C9orf64NUB1 in CHD cases Missense MissenseD310HL144F P852R NovelNovel NM_001040439 NM_016118.4 NM_032307 NP_001035529 NP_057202.3 NP_11568316 79 15106408086570461 1816090 G>CT>AC>G 134.1845.0723.30 228 228 9545 16 8636 16 149105 51 00 1769545 00 100100 1-014321-03234 CTDHTX NDUFA13LAMB2 Missense Silent R1661WI50I Novel NM_015965NM_002292 NP_057049 NP_00228319 3 1963704649159236 A>CG>A 428.99129.86 228 176 23 26 2711 8450 0 0 13943 0 0 100100 1-004481-010361-00650 CTDHTX BCL2L1ITPR3RYR21 Missense Silent R1027HT1464T P59S Novel NM_001204113.1 NM_002224NM_001035 NP_001191042.1 NP_001026NP_002215 12 6 23775689211188167733642006 A>CG>AC>T 343.2041.1423.06 228 228 46 5160 29 4864 105 104191 00 90119232 00 100100 Mutations1-014571-002221-01816 at nonconserved LVOTO LVOTOHTX positionVPS13C ARHGDIA NUCB1 in HHE Missense Missense genes Silent in CHD CasesT423AR189CT132T NovelNovel NM_001185077 NM_017684 NM_006184 NP_001172006 NP_060154 NP_006175 171519 79827068 62283959 49416352 C>TT>CC>T 289.36123.3122.39 228 228 228 32 121115 34 7970 77 229 258 000 46228164 000 100100100 1-01637 CTD PSMB1 Silent T5T Novel NM_002793 NP_002784 6 170862316 T>C 243.58 194 23 11 94 0 66 0 100 1-003121-015051-003811-01696 LVOTO LVOTOCTD FAM76A STAB1TTNPKP4 Missense Missense SilentT4852N A1102VI228V S812S Novel NM_001256850.1 NM_001143912 NM_003628NM_015136 NP_001243779NP_001137384 NP_003619 NP_055951 2 1 3 1595198161795986102807564652547767 G>AG>TA>GC>T 2093.26173.79122.2121.69 228 154228 228 67 1254 71 68 1043 47 97 143 35 89 000 11912323128 000 10010071100 M003-10 #N/A SARS Silent P384P Novel NM_006513 NP_006504 1 109779065 G>A 137.97 228 57 72 153 0 132 0 100 1-013601-006381-021211-01505 LVOTO LVOTOCTD NEURL2FBN2 AKAP8DST Missense Missense Silent D2191NG2936D S92TY53Y Novel NM_001999NM_080749NM_005858 NM_015548 NP_005849NP_001990NP_542787 NP_0563631920 5 6 127624885 15484809 4451935656401671 G>AC>GC>TC>T 121.48263.94121.7821.38 228 228 228 57127 10423 31111 21 64 98325 22947 000 7531218036 000 100100100 1-019131-00465 LVOTOOther RAB10 CAND1 Missense Silent N112SL180L Novel NM_018448NM_016131 NP_060918 NP_05721512 2 6769123526350020 A>GA>G 101.74118.86 228225 129 23 108 21 122 59 0 0 11856 0 0 100100 1-020201-002581-00425 LVOTOCTDHTX PFKMWIBG ANLN Missense Silent G203V A522GT232T Novel NM_001166686 NM_032345NM_018685 NP_001160158 NP_061155NP_115721 12 7 36445998 4853510456295663 C>TC>GC>A 218.0397.6520.50 228 228 65 8886 61 8178 198 208191 00 152151169 00 100100 1-01933M004-11 LVOTOHTX TAX1BP1 CPD Missense Silent G325G P425R Novel NM_001206902 NM_001304 NP_001193831 NP_001295 71727833977 28748818 G>AC>G 93.8198.86 228 228 229 108 218 69 322 169 0 2 494142 0 0 100100 1-016211-018171-00344 CTDHTXCTD MAPK8IP3TWF2UBR5Missense Silent E185QP852RL291L Novel NM_001040439 NM_007284NM_015902 NP_001035529 NP_056986NP_00921516 8 3 10335763952264942 1816090 G>AC>G 134.1892.9220.49 228 228 145 1672 55 1668 220 184 51 00 14931145 00 100100 1-014321-021411-01816 LVOTOCTD LRPPRC SPARCL1LAMB2 Missense Missense Silent R1661WD486NQ448Q Novel NM_001128310 NM_002292 NM_133259 NP_001121782 NP_002283 NP_573566 43 2 884119784915923644190759 T>CG>AC>T 129.8683.4793.03 228 176 228 90 10026 8511 98 21050 278 000 15343169 000 100100100 1-002301-00738 LVOTOCTD BACH2WWC2 MissenseSilent T803AT157T Novel NM_021813NM_024949 NP_079225NP_068585 4 6 18413012790642246 T>CT>C 63.6520.11 228 228 40 60 46 33111 107 00 14978 01 100100 1-001861-00534 CTDCTD FDSG2AM111A Missense SilentL499Q E219E Novel NM_022074NM_001943 NP_071357 NP_001934 111858919798 29116237 A>GT>A 54.7189.11 228 228 46 79 52 69 131 157 0 0 11590 0 0 100100 1-002221-00111 LVOTO PPWD1NUCB1 Missense R189CI190V Novel NM_006184NM_015342 NP_006175NP_05615719 5 4941635264867712 A>GC>T 123.3119.27 228 115 4770 51 258 55 0 164103 01 100 1-017881-02708 LVOTO LVOTO MYEF2 FOXM1 Missense Silent P451PI264V Novel NM_001243089 NM_016132 NP_001230018 NP_057216 1215 296869848451047 A>GT>C 52.1669.68 228 228 74 87 66 106 124 191 0 0 160201 0 0 100100 1-022651-00381 LVOTO STAB1 PKN3 Missense A1102VR255Q Novel NM_015136NM_013355 NP_055951NP_037487 39 13146961352547767 G>AC>T 122.2119.26 228 7125 4721 8953 0 12862 0 100 1-021211-02888 LVOTOCTD AP3B1DST Missense MissenseG2936D E771K NovelNovel NM_015548 NM_003664 NP_056363 NP_003655 6 5 5640167177406117 C>TC>T 121.7863.56 228 228 104 81 64 74 229 108 00 180116 00 100100 1-00982 HTX CREB5 Missense T236M Novel NM_182899.3 NP_878902.2 7 28843919 C>T 27.24 228 125 125 193 1 111 0 100 1-019131-00323 OtherHTX RAB10NUP62 Missense Missense N112SQ70R NovelNovel NM_001193357 NM_016131 NP_001180286 NP_05721519 2 26350020 50412856 A>GT>C 118.8662.55225 228 23 55 21 64 59 99 00 5693 00 100100 1-00230 LVOTO MKRN2 Missense R50W Novel NM_001271707 NP_001258636 3 12610495 C>T 18.87 228 62 59 110 0 115 0 100 1-019331-00465 LVOTO LVOTO TOMM40L CPD Missense Missense P425R S171I NovelNovel NM_001304 NM_032174 NP_001295 NP_11555017 1 161198003 28748818 C>GG>T 98.8662.29 228 228 108 69 69 67 169 96 20 14273 00 100100 1-01538 HTX MKRN2 Missense A251V Novel NM_001271707 NP_001258636 3 12616400 C>T 18.87 228 35 38 100 0 81 0 100 1-021411-00174 CTD LRPPRCZNF326 MissenseMissenseD486N E338A NovelNovel NM_133259 NM_182976 NP_573566 NP_892021 2 1 4419075990482962 C>TA>C 93.0354.95 228 228 100 137 10098 278 125 00 169110 00 100100 1-002931-001861-00934 LVOTO LVOTOCTD HIVEP2 TRIM41DSG2 Missense MissenseL499Q P123L P167S NovelNovel NM_001943NM_006734 NM_033549 NP_001934NP_006725 NP_29102718 6 5 143095508180651498 29116237 G>AT>AC>T 89.1118.5453.73 228 228 7963 28 6962 20 157141 56 00 1099062 00 100100 8 |1-017881-011061-00824 WWW.NATURE.COM/NATURE LVOTOHTXHTX GPRC5BMAP2K7 MYEF2 Missense MissenseI264VI205T V409I NovelNovel NM_016132NM_016235 NM_145185 NP_057216NP_057319 NP_660186 151619 4845104719883554 7977281 A>GT>CG>A 69.6818.0652.58 228 228 8729 26 106 23 25 191 54 41 00 2018443 00 100100 1-029521-028881-01151 LVOTOCTD SUV420H1 AP3B1 PITX2 Missense Missense E771KR143C A47V NovelNovel NM_003664NM_153426 NM_017635 NP_003655NP_700475 NP_060105 11 54 1115535437740611767942601 G>AC>TG>A 63.5617.9944.49 228 228 8111077 7410552 108156 295 00 116101411 00 100100 1-018361-003231-02254 LVOTOCTDHTX KIAA2018NUP62 ELMO2 Missense Missense S1168LN332SQ70R NovelNovel NM_001009899NM_001193357 NM_133171 NP_001009899NP_001180286 NP_573403 1920 3 113377026 50412856 45003945 G>AT>CT>C 62.5517.9644.31 228 228 103 55 87 6435 71 215 99201 010 16593111 00 100100 1-010191-004651-01941 LVOTOHTXCTD TOMM40LPCDH1NOP2 Missense Missense K1203R S171II351V NovelNovel NM_032174NM_032420 NM_006170 NP_115550NP_115796 NP_006161 12 15 161198003141233713 6671053 G>TT>CT>C 62.2917.6642.04 228 228 6988 55 6790 44 194 96129 00 17273210 00 100100 M007-201-001741-03190 CTDHTX ZNF326SBNO2OBSCN Missense Missense T4421M E338A V78M NovelNovel NM_001098623 NM_182976NM_014963 NP_001092093 NP_892021NP_055778 19 1 1 228503797904829621147355 A>CC>TC>T 298.1454.9516.34 228 228 137 84 15 100 40 30 125115 21 00 1106845 00 10087 1-031711-009341-00373 LVOTO LVOTOCTD TRIM41EEPD1DGCR2 MissenseMissense E414DP167S A64T NovelNovel NM_001184781 NM_033549NM_030636 NP_001171710 NP_291027NP_08513922 57 18065149836327313 19076893 G>TC>TC>T 53.7316.1077.33 228 228 2834 22 2038 17 5647 28 00 628447 00 10072 1-008241-011901-01997 HTXCTD MAP2K7PRPF4BATM Missense Missense S821N V409I E14Q NovelNovel NM_003913.4 NM_145185NM_000051 NP_003904.3 NP_660186NP_000042 1911 6 108129798 79772814021699 G>AG>C 129.1452.5815.91 228 228116 26 26 2596 21 313 41 58 00 1114371 010 100100 1-000961-011-00344151 CTDCTD SUV420H1SERINC4LPHN3 Missense Missense K1406RR143CR65H NovelNovel NM_001258032.1 NM_017635NM_015236 NP_001244961.1 NP_060105NP_0560511115 4 6794260162936433 44089057 G>AA>GC>T 44.4915.4443.79 228 228110 50 68105 42 73 295150112 00 411154126 00 100100 M007-201-022541-00305 LVOTOHTXCTD ELMO2MASTLGRIP2 Missense Missense D537NN332ST954M NovelNovel NM_001080423.2 NM_001172304 NM_133171 NP_001073892.2 NP_001165775 NP_573403 2010 3 450039452745949714547126 G>AT>CG>A 44.3112.8251.51 228 228 162 87 38 143 71 24 201276 70 00 16811153 00 100100 Silent1-022121-01941 mutationsCTD in HHE genesZNF536NOP2 in CHD casesMissense E727KI351V Novel NM_006170NM_014717 NP_006161NP_055532 1219 31025762 6671053 G>AT>C 42.0411.70 228129 5524 4410 129 53 0 21039 0 100 1-03234 HTX NDUFA13 Silent I50I Novel NM_015965 NP_057049 19 19637046 A>C 428.99 228 23 27 84 0 139 0 100 1-008781-03190 CTDHTX OBSCNCRB2 Missense R1T4421M189Q Novel NM_001098623 NM_173689 NP_001092093 NP_775960 19 228503797126137555 G>AC>T 298.1411.59 228 1545 3036 153 21 0 4599 0 10087 1-003731-00650 LVOTOHTX DGCR2RYR2Missense Silent T1464T A64T NovelNovel NM_001184781 NM_001035 NP_001171710 NP_00102622 1 237756892 19076893 C>TA>C 343.2077.33 228 228 22 46 17 29 28105 00 4790 00 72100 1-00820 LVOTO PABPC4L Missense K224Q Novel NM_001114734.1 NP_001108206.2 4 135121679 T>G 9.05 228 53 46 78 0 96 0 100 1-019971-01816 LVOTOCTD ARHGDIAPRPF4B Missense Silent T132T E14Q NovelNovel NM_001185077 NM_003913.4 NP_001172006 NP_003904.317 6 798270684021699 G>CC>T 129.14289.36 228 228 26 32 21 34 58 77 00 7146 00 100100 1-01327 CTD KLHDC4 Missense V443I Novel NM_001184854 NP_001171783 16 87742022 C>T 10.94 228 36 34 81 0 69 0 100 1-003441-01637 CTD SERINC4PSMB1 Missense Silent R65HT5T NovelNovel NM_001258032.1 NM_002793 NP_001244961.1 NP_00278415 6 170862316 44089057 C>TT>C 243.5843.79 228 194 68 23 7311 11294 00 12666 00 100100 1-008181-003051-01696 LVOTOCTD C16orf48GRIP2PKP4 Missense SilentT954M A192T S812S NovelNovel NM_001080423.2 NM_032140.1 NM_003628 NP_001073892.2 NP_115516.1 NP_00361916 3 2 15951981614547126 67697845 G>AC>TG>A 173.7951.519.31 228 228 3839 67 2427 68 100 70 97 00 12453119 00 100100 Silent1-02070M003-10 mutationsCTD #N/Ain HHE genesHAUS3 SARSin CHD cases Missense SilentF572L P384P NovelNovel NM_024511 NM_006513 NP_078787 NP_006504 4 1 1097790652233750 G>TG>A 137.978.44 228 228 27 57 17 72 15372 00 13290 00 100100 1-002351-032341-01505 LVOTOCTDHTX NDUFA13ZC3HAV1 AKAP8 Missense Silent Silent A371TY53YI50I NovelNovel NM_015965NM_020119 NM_005858 NP_057049NP_064504 NP_005849 1919 7 138764576 19637046 15484809 A>CC>TG>A 428.99121.488.33 228 228 2391 57 2779 31 151 84 98 00 13917475 00 100100 1-003251-006501-00465 LVOTO LVOTOHTX CAND1RYR2 LYST Missense Silent Silent N3017TT1464TL180L NovelNovel NM_001035NM_000081 NM_018448 NP_001026NP_000072 NP_060918 12 1 237756892235892952 67691235 A>CT>GA>G 343.20101.746.80 228 228 1294651 1082924 105118 122 00 9089118 00 100100 1-001511-018161-00425 LVOTO LVOTOCTD ARHGDIAFAN1 ANLN Missense Silent Silent T905MT132TT232T NovelNovel NM_001185077 NM_014967 NM_018685 NP_001172006 NP_055782 NP_0611551715 7 798270683122152736445998 C>TC>T 289.3697.656.18 228 228 3261 65 3456 61 7719883 00 4615275 00 100100 1-019941-01637M004-11 LVOTOCTDHTX TAX1BP1PSMB1DFFB Missense Silent Silent G325G A91ST5T NovelNovel NM_001206902 NM_002793NM_004402 NP_001193831 NP_002784NP_004393 61 7 170862316278339773782405 G>TT>CG>A 243.5893.815.85 194218 228 2292325 218111794 32250 00 6649430 00 100100 1-016961-00344 CTDCTD PKP4UBR5 Silent Silent S812SL291L NovelNovel NM_003628 NM_015902 NP_003619 NP_056986 2 8 159519816103357639 G>AG>A 173.7992.92 228 228 14567 68 55 97220 00 119149 00 100100 M003-101-01816 LVOTO#N/A SPARCL1SARS Silent Silent P384PQ448Q NovelNovel NM_001128310 NM_006513 NP_001121782 NP_006504 1 4 10977906588411978 G>AT>C 137.9783.47 228 228 57 90 72 85 153 210 00 132153 00 100100 1-015051-00738 LVOTOCTD AKAP8WWC2 SilentSilent T157TY53Y NovelNovel NM_005858 NM_024949 NP_005849 NP_07922519 4 184130127 15484809 G>AT>C 121.4863.65 228 228 57 40 31 46 98111 00 75149 00 100100 1-004651-00534 LVOTOCTD F CAND1AM111A Silent SilentL180L E219E NovelNovel NM_018448 NM_022074 NP_060918 NP_071357 1211 6769123558919798 A>GA>G 101.7454.71 228 228 129 46 108 52 122 131 00 118115 00 100100 1-004251-02708 LVOTO LVOTO FOXM1 ANLN Silent SilentT232T P451P NovelNovel NM_001243089 NM_018685 NP_001230018 NP_06115512 7 36445998 2968698 C>TA>G 97.6552.16 228 228 65 74 61 66 198 124 00 152160 00 100100 M004-11 HTX TAX1BP1 Silent G325G Novel NM_001206902 NP_001193831 7 27833977 G>A 93.81 228 229 218 322 0 494 0 100 1-00344 CTD UBR5 Silent L291L Novel NM_015902 NP_056986 8 103357639 G>A 92.92 228 145 55 220 0 149 0 100 1-01816 LVOTO SPARCL1 Silent Q448Q Novel NM_001128310 NP_001121782 4 88411978 T>C 83.47 228 90 85 210 0 153 0 100 1-00738 CTD WWC2 Silent T157T Novel NM_024949 NP_079225 4 184130127 T>C 63.65 228 40 46 111 0 149 0 100 1-00534 CTD FAM111A Silent E219E Novel NM_022074 NP_071357 11 58919798 A>G 54.71 228 46 52 131 0 115 0 100 1-02708 LVOTO FOXM1 Silent P451P Novel NM_001243089 NP_001230018 12 2968698 A>G 52.16 228WWW.NATURE.COM/NATURE 74 66 124 0 160 0 100 | 9

8 | WWW.NATURE.COM/NATURE

8 | WWW.NATURE.COM/NATURE “Model Organisms” to study this in?

• S. cerevisiae (yeast) • S. pombe (yeast) • C. elegans (worms) • D. melanogaster (flies) • M. musculus (mouse) • Microcebus (mouse lemur- primate) • C. Jacchus (marmoset- primate) • Cell lines from the Ogden Syndrome boys • Induced Pluripotent Stem (iPS) cells - human branched chains outward from the center of the colony, and invasive growth under the surface of agar medium. “Normal” laboratory haploid strains have a doubling time of approximately 90 min. in complete YPD (1% yeast extract, 2% peptone, and 2% glucose) medium and approximately 140 min. in synthetic media during the exponential phase of growth at the optimum temperature of 30°C. However, strains with greatly reduced growth rates in synthetic media are often encountered. Usually strains reach a maximum density of 2 x 108 cells/ml in YPD medium. Titers 10 times this value can be achieved with special conditions, such as pH control, continuous additions of balanced nutrients, filtered-sterilized media and extreme aeration that can be delivered in fermenters. S. cerevisiae can be stably maintained as either heterothallic or homothallic strains, as illustrated in Figure 1. Both heterothallic and homothallic diploid strains sporulate under conditions of nutrient deficiency, and especially in special media, such as potassium acetate medium. During sporulation, the diploid cell undergoes meiosis yielding four progeny haploid cells, which become encapsulated as spores (or ascospores) within a sac-like structure called an ascus (plural asci). The percent sporulation varies with the particular strain, ranging from no or little sporulation to nearly 100%. Many laboratory strains sporulate to over 50%. The majority of asci contains four haploid ascospores, although varying proportions asci with three or less spores are also observed. Because the a and mating types are under control of a pair of MATa/MAT heterozygous alleles, each ascus contains two MATa and two MAT haploid cells. Upon exposure to nutrient condition, the spores germinate, vegetative growth commences and mating of the MATa and MAT can occur. However, if the haploid spores are mechanically separated by micromanipulation,S. the cerevesiae haplophase of heterothallic strains (yeast can be stably )maintained, thus allowing the preparation of haploid strains. In contrast, the presence of the HO allele in

Diplophase Diplophase HETEROTHALLIC HOMOTHALLIC Mating MATa/ MAT MATa/ MAT

Sporulation MATa Sporulation

MATa MAT

Mating-type MAT Germination switching and Germination Mating

Haplophase

Figure 1. Life cycles of heterothallic and homothallic strains of S. cerevisiae. Heterothallic strains can be stably maintained as diploids and haploids, whereas homothallic strains are stable only as diploids, because the transient haploid cells switch their mating type, and mate.

7 hNaa10p-S37P is funconally impaired in vivo using a yeast model.

Unpublished data, do not further distribute. Open queson: Funcon of N-terminal acetylaon?

Protein stability? Protein secreon?

Figure courtesy of Kris Gevaert ARTICLES

a N b N c d 30 27

28 Naa10p Q491 2 D532 Naa15p W552 F533 L549 F536 W494 Q15 29  I36 1 29 30 30 C 2  2 27 3 3 29 L32 6 5 I8 25 N25 13 90 27  28 1 7 C NAA10 + NAA15 1 26 2 28  H20 4 14 W526 Y33 25 K29  4 15 25  L11 L28 24 19 23 17 R448 1 2 W38 F449

Figure 1 Overall structure of the NatA complex bound to acetyl CoA. (a) Naa10p (teal) and Naa15p (brown) subunits, shown in cartoon, bound to acetyl CoA (CPK coloring and stick format). Only Naa15p helices that contact Naa10p are labeled. The dashed brown line represents a disordered loop region in Naa15p. The dimensions of the complex are 107 Å × 85 Å × 70 Å. (b) A 90° rotation of the view in a. Helices that are depicted in c and d are labeled. (c) Zoom view highlighting key residues that compose the predominantly hydrophobic interface between Naa10p A1–A2 and Naa15p A29–A30. (d) Zoom view of the intersubunit interface at the C-terminal region of Naa10p A1 and the Naa15p A25–A27–A28 helices. between Lys29 and Gln15 of Naa10p and Asp532 and Gln491 of of traditional substrates (alanine, cysteine, glycine, serine, threonine Naa15p. Single point mutations in this region did not break up the or valine) (Supplementary Videos 2 and 3). Specifically, Naa10p complex and had only modest effects on substrate binding and cataly- residues Leu22 and Tyr26 shift about 5.0 Å from surface-exposed sis (Table 2), probably owing to the extensiveness of the interface. positions to buried positions in the active site, and Naa10p Glu24 A smaller hydrophobic interface is formed between Naa10p His20 moves by about 4.0 Å, substantially altering the landscape of the NatA and Naa15p Phe449 and Trp494 and is supplemented with a hydro- active site (Fig. 2d). All of these residues are well ordered in both gen bond between Naa10p Gln25 and Naa15p Arg448. This region structures (Supplementary Fig. 4). Our comparison of the com- of Naa15p directly stabilizes the position of the Naa10p A1 helix as plexed and uncomplexed structures suggests that the auxiliary subunit well as the Naa10p A1–A2 loop and as a result is crucial for proper induces an allosteric change in the Naa10p active site to an extent that complex formation. This is evident from the observation that alanine is required for the mechanism of catalysis by the NatA complex, and point mutations at either Naa15p Arg448 or Naa15p Phe449 were Naa10p is likely to represent an active GNAT fold. Consistent with able to disrupt NatA complex formation. Several additional scattered this hypothesis, a backbone alignment of key active site elements in intermolecular interactions serve to supplement the Naa10p-Naa15p active Naa10p with the corresponding region in the independently interface (Supplementary Fig. 3). active human NAA50 that selects a 1-Met-Leu-2 N-terminal sequence shows a high degree of structural conservation (r.m.s. deviation of Molecular basis for Naa15p modulation of Naa10p acetylation 1.52 Å). The corresponding alignment of the complexed and uncom- To explore the molecular basis for Naa15p modulation of Naa10p plexed forms of Naa10p showed less structural conservation, with an acetyltransferase activity, we determined the X-ray crystal structure of r.m.s. deviation of 2.43 Å (Fig. 2e). Naa10p of S. pombe in the absence of Naa15p, for comparison with the holo-NatA complex (Table 1). We determined the structure of Naa10p Substrate peptide binding and NatA inhibition (residues 1–156) to 2.00-Å resolution, using a combination of single- To determine the molecular basis for substrate-specific peptide bind- wavelength anomalous diffraction and molecular replacement (model, ing by NatA, and in particular how, unlike most other NATs, it is able NAA50) to phase data collected on a selenomethionine-derivatized to accommodate a number of nonmethionine N-terminal substrates, Naa10p protein. An alignment of the complexed and uncomplexed we synthesized a bisubstrate conjugate in which CoA is covalently forms of Naa10p revealed that the A1–loop–A2 segment assumes linked to a biologically relevant substrate peptide fragment with the a substantially different conformation in the presence of Naa15p sequence 1-SASEA-5 (CoA-SASEA). We performed inhibition stud- (Fig. 2a). Notably, this conformational change is driven by the move- ies with CoA-SASEA and with a control compound, acetonyl CoA, ment of several hydrophobic residues in Naa10p A2 (Leu28, Leu32 which is a nonhydrolyzable acetyl CoA analog (Fig. 3a). Half-maxi- and Ile36), which make intramolecular interactions with residues in mum inhibitory concentration (IC50) determinations revealed that Naa10p A1 and B3 (Ile8, Leu11, Met14, Tyr55 and Tyr57) in apo- CoA-SASEA had an IC50 of 1.4 o 1.0 MM, whereas acetonyl CoA Naa10p but shift to make alternative intermolecular interactions had an IC50 of 380 o 10 MM (Fig. 3b). To assess the specificity of this with helices of Naa15p in the NatA complex (Figs. 1c and 2b). As inhibitor toward NatA, we also calculated IC50 values of CoA-SASEA a result of this interaction, the C-terminal region of the A1 helix and acetonyl CoA with NAA50, a NAT that requires a substrate undergoes an additional helical turn, which helps to reposition the N-terminal methionine residue. We found that CoA-SASEA has an A1–A2 loop. Notably, docking of the apo-Naa10p structure into the IC50 of 11 o 2 MM, whereas acetonyl CoA has an IC50 of 130 o 12 MM corresponding binding pocket of Naa15p showed a clash between the (Fig. 3b). With NAA50, the addition of a SASEA peptide portion Naa10p A1–loop–A2 and Naa15p Arg448 and Naa15p Phe449 of A25 was able to increase the potency of acetonyl CoA by only about ten- (Fig. 2c)—the same interface that we have shown to be necessary for fold, whereas with NatA this peptide addition exhibited an increase proper complex formation (Fig. 1d and Table 2). in potency of about 300-fold. The greater potency of acetonyl CoA As a result of the Naa15p interaction along one side of the Naa10p with NAA50 over Naa10p can be explained by the stronger binding of A1–loop–A2 region, residues on the opposite side of this loop region acetyl CoA to NAA50 (Km = 27 o 2 MM) than to Naa10p (Km = 59 o appear to adopt a specific conformation that is essential for catalysis 5 MM). The markedly higher potency of the CoA-SASEA inhibitor

1100 VOLUME 20 NUMBER 9 SEPTEMBER 2013 NATURE STRUCTURAL & MOLECULAR BIOLOGY Cell Fraconaon

Proteomics Analysis of EBV-transformed cell lines from family members WT carrier

I +| +/mut 1. 2.

WT MUT carrier carrier carrier MUT WT WT mut | II +| +/mut +/mut SB +/mut +/+ +| B. A. 1. 2. 3. 4. 5. 6. 7. 8.

WT WT MUT MUT MUT mut | mut | mut | III +/+ +| 1. 2. 3. 4. 5. 6. 7.

= male III.4. proband hemizygous, mutant (89323) (#1a) (#1b) = female II.2. mother of proband, carrier (89324) (#2) = FFPE DNA (for paent III.7.) or DNA from blood available (and for II.A. married-in father of proband, WT(89325) some of them: EBV transformed cell lines available + skin fibroblast of III.2. brother of proband, WT(90526) (#3) paent III.6.) III.1. sister of proband, WT (90527) (#4) SB = sllborn I.2. grandmother of proband, carrier (90528) I.1. married-in grandfather of proband, WT(90529) (#5) = proband II.7. aunt of proband, WT (90530) (#6) II.3. aunt of proband, carrier (90531) = paent samples analyzed by N-terminal COFRADIC analyses (#1 to #5) II.B. married-in uncle of proband, WT(90532) (#8) II.8. uncle of proband, WT(90688) (#9) = paent samples prepared for N-terminal COFRADIC analyses (but sll II.5. aunt of proband, carrier with deceased boy (90797) (#7) to be analyzed) (#8 and #9) N-terminal COFRADIC

Staes A et al. (2011) Nat. Protoc. 6, 1130-1141 Gevaert K et al. (2003) Nat. Biotechnol. 21, 566-569 Proteomics Strategy With Thomas Arnesen, Petra van Damme And Kris Gevaert Some downstream substrates for NatA Letters

pubs.acs.org/acschemicalbiology

Design, Synthesis, and Kinetic Characterization of Protein N‑Terminal Acetyltransferase Inhibitors †,‡,§ § § † ‡,∥ Havard Foyn, Justin E. Jones, Dan Lewallen, Rashmi Narawane, Jan Erik Varhaug, Letters ̊ pubs.acs.org/acschemicalbiology Paul R. Thompson,*,§ and Thomas Arnesen*,†,‡ †Department of Molecular Biology, University of Bergen, N-5020Design, Bergen, Synthesis, Norway and Kinetic Characterization of Protein N‑Terminal ‡ Acetyltransferase Inhibitors Department of Surgery, Haukeland University Hospital, N-5021 Bergen,†,‡,§ Norway § § † ‡,∥ Havard̊ Foyn, Justin E. Jones, Dan Lewallen, Rashmi Narawane, Jan Erik Varhaug, §Department of Chemistry, The Scripps Research Institute, 120Paul R. Scripps Thompson, Way,*,§ and Jupiter, Thomas Arnesen Florida*,†,‡ 33458, United States ∥ †Department of Molecular Biology, University of Bergen, N-5020 Bergen, Norway Department of Surgical Sciences, University of Bergen, N-5020‡Department Bergen, of Surgery, Norway Haukeland University Hospital, N-5021 Bergen, Norway §Department of Chemistry, The Scripps Research Institute, 120 Scripps Way, Jupiter, Florida 33458, United States *S Supporting Information ∥Department of Surgical Sciences, University of Bergen, N-5020 Bergen, Norway *S Supporting Information

ABSTRACT: The N-termini of 80−90% of human proteins ABSTRACT: The N-termini of 80−90% of human proteinsare acetylated by the N-terminal acetyltransferases (NATs), NatA−NatF. The major NAT complex, NatA, and particularly are acetylated by the N-terminal acetyltransferases (NATs),the catalytic subunit hNaa10 (ARD1) has been implicated in cancer development. For example, knockdown of hNaa10 NatA−NatF. The major NAT complex, NatA, and particularlyresults in cancer cell death and the arrest of cell proliferation. It also sensitized cancer cells to drug-induced cytotoxicity. the catalytic subunit hNaa10 (ARD1) has been implicated inHuman NatE has a distinct substrate specificity and is essential for normal chromosome segregation. Thus, NAT inhibitors cancer development. For example, knockdown of hNaa10may potentially be valuable anticancer therapeutics, either results in cancer cell death and the arrest of cell proliferation.directly It or as adjuvants. Herein, we report the design and synthesis of the first inhibitors targeting these enzymes. Using also sensitized cancer cells to drug-induced cytotoxicity.the substrate specificity of the enzymes as a guide, we synthesized three bisubstrate analogues that potently and selectively inhibit μ μ fi the NatA complex (CoA-Ac-SES4; IC50 = 15.1 M), hNaa10, the catalytic subunit of NatA (CoA-Ac-EEE4; Ki = 1.6 M), and Human NatE has a distinct substrate speci city and is essentialNatE/hNaa50 (CoA-Ac-MLG7; Ki* = 8 nM); CoA-Ac-EEE4 is a reversible competitive inhibitor of hNaa10, and CoA-Ac-MLG7 is a slow tight binding inhibitor of hNaa50. Our demonstration that it is possible to develop NAT selective inhibitors should for normal chromosome segregation. Thus, NAT inhibitorsassist future efforts to develop NAT inhibitors with more drug-like properties that can be used to chemically interrogate in vivo may potentially be valuable anticancer therapeutics, eitherNAT function. directly or as adjuvants. Herein, we report the design androtein acetylation has received significant attention in the subsets of proteins that are largely defined by their N-terminal synthesis of the first inhibitors targeting these enzymes. UsingP past decade due to its important roles in regulating amino acid sequence.1 For example, the highly conserved NatA eukaryotic cell signaling. Of the various residues known to be complex, which is composed of the catalytic subunit hNaa10 the substrate specificity of the enzymes as a guide, we synthesizedacetylated in three vivo, lysine bisubstrate acetylation has receivedanalogues the most that(Ard1) potently and the nonenzymatic and selectively auxiliary subunit inhibit Naa15 (Nat1/ attention due to its critical role in controlling gene transcription NATH),8−11 acetylates N-termini starting with Ser, Ala, Thr, the NatA complex (CoA-Ac-SES4; IC50 = 15.1 μM), hNaa10,and how the dysregulation catalytic of this processsubunit contributes of to NatA the onset (CoA-Ac-EEE4;Gly, Val, and Cys; NatAK actsi = on 1.6 theseμ N-terminiM), and after the and progression of disease. While the importance of lysine initiator Met (iMet) has been removed by methionine NatE/hNaa50 (CoA-Ac-MLG7; Ki* = 8 nM); CoA-Ac-EEE4acetylation is a reversible is clear, it is now competitive evident that N-terminal inhibitor (Nt)- aminopeptidases. of hNaa10,11,12 Interestingly, and CoA-Ac-MLG7 the substrate specificity of acetylation, the second major type of protein acetylation, also hNaa10 shifts when tested independently of hNaa15, such that is a slow tight binding inhibitor of hNaa50. Our demonstrationplays a critical that role in it eukaryotic is possible cell signaling. to This develop is the case NATon its own, selective hNaa10 acetylates inhibitors acidic actin N-termini shouldin vitro.13 ff because the majority of eukaryotic proteins are Nt-acetylated, Although hNaa50 (Nat5/San), the catalytic subunit of NatE, assist future e orts to develop NAT inhibitors with more drug-like properties that can be used to chemically interrogate14,15 in vivo mostly cotranslationally,1 and the functional consequences of interacts with the NatA complex, it is considered a separate this modification are quite diverse,2 including (i) degradation of NAT because it acetylates a distinct group of substrates starting NAT function. 13,16 Nt-acetylated proteins by a novel branch of the N-end rule with the iMet. pathway;3 (ii) inhibition of post-translational ER-transloca- Agrowingnumberofstudiesreporttheupregulated tion;4 (iii) protein complex formation;5 and (iv) protein expression, at both the mRNA and protein level, of the two targeting to membranes.6,7 main components of the NatA complex (i.e., hNaa10 and fi Nt-acetylation occurs when the acetyl moiety of acetyl hNaa15) in severalfi different cancer types, suggesting that rotein acetylation has received signi cant attention in thecoenzyme A (Ac-CoA)subsets is transferred of proteins to the α-amino that group of are a largely de ned by their N-terminal polypeptide by one of the six N-terminal acetyltransferases1 Received: February 24, 2013 past decade due to its important roles in regulating(NATs), i.e., NatAamino−NatF. Eachacid NAT sequence. is composed of oneFor or example,Accepted: March the 29, highly2013 conserved NatA P fi eukaryotic cell signaling. Of the various residues known to bemore distinctcomplex, subunits, and these which enzymes acetylate is composed speci c of the catalytic subunit hNaa10 acetylated in vivo, lysine acetylation has received the most (Ard1)© and XXXX American the Chemical nonenzymatic Society A auxiliarydx.doi.org/10.1021/cb400136s subunit Naa15| ACS Chem. Biol. XXXX, (Nat1/ XXX, XXX−XXX attention due to its critical role in controlling gene transcription NATH),8−11 acetylates N-termini starting with Ser, Ala, Thr, and how dysregulation of this process contributes to the onset Gly, Val, and Cys; NatA acts on these N-termini after the and progression of disease. While the importance of lysine initiator Met (iMet) has been removed by methionine acetylation is clear, it is now evident that N-terminal (Nt)- aminopeptidases.11,12 Interestingly, the substrate specificity of acetylation, the second major type of protein acetylation, also hNaa10 shifts when tested independently of hNaa15, such that plays a critical role in eukaryotic cell signaling. This is the case on its own, hNaa10 acetylates acidic actin N-termini in vitro.13 because the majority of eukaryotic proteins are Nt-acetylated, Although hNaa50 (Nat5/San), the catalytic subunit of NatE, mostly cotranslationally,1 and the functional consequences of interacts with the NatA complex,14,15 it is considered a separate this modification are quite diverse,2 including (i) degradation of NAT because it acetylates a distinct group of substrates starting 13,16 Nt-acetylated proteins by a novel branch of the N-end rule with the iMet. pathway;3 (ii) inhibition of post-translational ER-transloca- Agrowingnumberofstudiesreporttheupregulated tion;4 (iii) protein complex formation;5 and (iv) protein expression, at both the mRNA and protein level, of the two targeting to membranes.6,7 main components of the NatA complex (i.e., hNaa10 and Nt-acetylation occurs when the acetyl moiety of acetyl hNaa15) in several different cancer types, suggesting that coenzyme A (Ac-CoA) is transferred to the α-amino group of a polypeptide by one of the six N-terminal acetyltransferases Received: February 24, 2013 (NATs), i.e., NatA−NatF. Each NAT is composed of one or Accepted: March 29, 2013 more distinct subunits, and these enzymes acetylate specific

© XXXX American Chemical Society A dx.doi.org/10.1021/cb400136s | ACS Chem. Biol. XXXX, XXX, XXX−XXX Summary

• Found first human genec disease involving Nt-acetylaon of proteins

• Characterizing the Nt-acetylaon pathway both in vitro and in vivo

• Working toward idenfying interacng components and more downstream substrates of NatA complex.