bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Deep whole genome sequencing of multiple proband tissues and parental blood reveals

the complex genetic etiology of congenital diaphragmatic hernias

EL Bogenschutz1,

ZD Fox1

A Farrell1,2

J Wynn3

B Moore1,2

L Yu3

G Aspelund4,

G Marth1,2

MY Yandell1,2

Y Shen5,6,7

WK Chung3,8,9

G Kardon1

1 Dept of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT

2 USTAR Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT

3 Dept of Pediatrics, Columbia University Irving Medical Center, New York, NY.

4 Dept of Surgery, Columbia University Irving Medical Center, New York, NY

5 Dept of Systems Biology, Columbia University Irving Medical Center, New York, NY.

6 Dept of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY.

7 JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY.

8 Dept of Medicine, Columbia University Medical Center Irving Medical Center, New York, NY

9 Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York,

NY

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

ABSTRACT

The diaphragm is a mammalian muscle critical for respiration and separation of the thoracic and

abdominal cavities. Defects in the development of the diaphragm are the cause of congenital

diaphragmatic hernia (CDH), a common birth defect. In CDH, weaknesses in the developing diaphragm

allow abdominal contents to herniate into the thoracic cavity and impair lung development, leading to a

high neonatal mortality. The genetic etiology of CDH is complex. Single nucleotide variants (SNVs),

insertion/deletions (indels), and structural/copy number variants in more than 150 have been

associated with CDH, although few genes are recurrently mutated in multiple patients and recurrently

mutated genes can be incompletely penetrant. This suggests that multiple genetic variants in

combination, other not yet investigated classes of variants, and/or nongenetic factors contribute to CDH

susceptibility. However, to date no studies have comprehensively investigated the contribution of all

possible classes of variants throughout the genome to the etiology of CDH. In our study, we used a

unique cohort of four patients with isolated CDH with samples from blood, skin, and diaphragm

connective tissue and parental blood samples and deep whole genome sequencing to assess germline

and somatic de novo and inherited variants of various sizes (SNVs, indels, and structural variants) in

exons, introns, UTRs, and intergenic regions. In each patient we found a different mutational landscape

that included germline de novo, and inherited SNVs and indels in multiple genes. We also found in two

patients an inherited 343 bp deletion interrupting an annotated enhancer of the CDH associated ,

GATA4, and we hypothesize that this common deletion (found in 1-2% of the population) acts as a

sensitizing allele for CDH. Overall, our comprehensive reconstruction of the genetic architecture of four

CDH individuals demonstrates that the etiology of CDH is heterogeneous and multifactorial.

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

AUTHOR SUMMARY

Deep whole genome sequencing of family trios shows that etiology of congenital diaphragmatic hernias

is heterogeneous and multifactorial.

INTRODUCTION

The diaphragm is a mammalian-specific muscle critical for respiration and separation of the abdominal

and thoracic cavities [1]. Defects in diaphragm development lead to congenital diaphragmatic hernias

(CDHs), a common structural birth defect [1 in 3000-3500 births; 2, 3-5] in which the barrier function of

the diaphragm is compromised. In CDH, a weakness develops in the diaphragm, allowing the

abdominal contents to herniate into the thoracic cavity and impede lung development. The resulting

lung hypoplasia and pulmonary hypertension are important causes of the neonatal mortality and long-

term morbidity associated with CDH [5-7]. The phenotype of CDH is highly variable and the clinical

outcomes are diverse, in part due to other associated congenital anomalies [7, 8]. Underlying this

phenotypic diversity is a complex genetic etiology [9].

Genetic variants in many chromosomal regions and over 150 genes have been implicated in CDH.

Molecular cytogenetic studies of individuals with CDH have identified multiple aneuploidies,

chromosomal rearrangements, and copy number variants in different chromosomal regions [9, 10].

Chromosomal abnormalities are found in 3.5-13% of CDH cases and are most frequently associated

with complex cases in which hernias appear in conjunction with other comorbidities [9]. In addition, a

multitude of individual genes have been identified through analyses of chromosomal regions commonly

associated with CDH [e.g. 11], recent exome sequencing studies [12-15], and analyses of mouse

mutants [e.g. 16, 17]. Variants in these genes can lead to either isolated or complex CDH. Altogether,

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

the diversity of chromosomal regions and genes associated with CDH demonstrates the heterogeneous

genetic origins of CDH.

Most genetic studies of the etiology of CDH have focused on the role of germline de novo variants. The

preponderance of CDH cases that occur sporadically without a family history of CDH [7] and the low

sibling recurrence rate [0.7%; 18] have argued for the importance of this class of genetic variants.

Indeed studies of trios of CDH-affected children and their unaffected parents have identified de novo

pathogenic copy number variants (affecting at least one gene) in 4% of CDH patients [19]. In addition,

trio studies employing exome or genome sequencing have discovered candidate de novo deleterious

gene variants [12-15]. Nevertheless one of these exome sequencing studies [15] estimated that only

15% of sporadic non-isolated CDH cases can be attributed to de novo gene-disrupting or deleterious

missense variants. Particularly given the small number of cases sequenced to date, most identified

genes recur in none or only a few CDH cases [20]. In addition, variants in particular CDH-associated

chromosomal regions or genes are often incompletely penetrant for CDH or cause subtle subclinical

diaphragm defects [11, 21]. Thus, while de novo copy number variants and variants in individual genes

undoubtedly are important, the genetic etiology of CDH is more complex and likely polygenic and

multifactorial.

Another class of variants that may contribute to CDH etiology are somatic de novo variants. A potential

role of somatic variants has been suggested by the discordant appearance of CDH in monozygotic

twins [18, 22] and the finding of tissue-specific genetic mosaicism in CDH patients [23, 24]. More

recently, our functional studies using mouse conditional mutants found that development of localized

muscleless regions leads to CDH and suggest that in humans somatic variants in the diaphragm may

cause muscleless regions that ultimately herniate [17]. However, to date only one human study has

explicitly examined whether somatic variants contribute to CDH [25].

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Although less commonly investigated, inherited variants have been linked to CDH. Analyses of families

with multiple members affected by CDH revealed that autosomal recessive alleles can cause CDH [26-

29]. Other familial CDH cases exhibit an inheritance pattern of autosomal dominance with incomplete

penetrance. For instance, two families have been reported with multiple CDH offspring that inherited

either a large deletion or frameshift variant in ZFPM2, but with unaffected carrier parents [30]. In another

case, monoallelic missense variants in GATA4 were inherited in three generations of one family and

associated with a range of diaphragm defects, but only one family member had symptomatic CDH [21].

Thus, these familial cases demonstrate that inherited variants can contribute to CDH etiology, but these

genetic variants often exhibit incomplete penetrance and variable expressivity. This suggests that some

inherited variants may only lead to CDH in the context of other interacting genetic variants.

While human genetics studies have been essential for identifying candidate CDH chromosomal regions

and genes and providing insights into the genetic architecture underlying CDH, experiments with

rodents have been critical for determining mechanistically how the diaphragm and CDH develop and

explicitly testing whether candidate genes cause CDH. Embryological and genetic lineage experiments

[17, 31-33] have shown that the diaphragm develops primarily from two transient embryonic tissues: the

somites and the pleuroperitoneal folds (PPFs). The somites are the source of the diaphragm’s muscle,

as muscle progenitors migrate from cervical somites into the nascent diaphragm [33]. The PPFs give

rise to the diaphragm’s muscle connective tissue and central tendon [17]. Importantly, the PPFs

regulate the development of the diaphragm’s muscle and control overall diaphragm morphogenesis,

which takes place between embryonic day (E) 9.5 and E16.5 in the mouse [corresponding to E30-60 in

humans; 17, 33]. Engineered mutations in the mouse of candidate CDH genes have definitively

established that these genes are functionally important in CDH [17]. In addition, conditional

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

mutagenesis experiments indicate that the PPFs are an important cellular source of CDH as

inactivation of Gata4, WT1, or b-catenin in the PPFs results in hernias [17, 34, 35], while Gata4

inactivation in somites does not affect diaphragm development [17]. Furthermore, these experiments

established that mutations in CDH genes initiate aberrations in the earliest development of the PPFs

[by E12.5 in mouse; 17, 34, 35]. In contrast, mutations in the diaphragm’s muscle lead to diaphragms

that are muscle-less or with thin or aberrant muscle, but have never been found to lead to CDH [17, 36-

45]. Altogether these data indicate that the PPFs are critical for diaphragm morphogenesis and a

cellular source of CDH, while a direct role in CDH for genes expressed in the diaphragm’s muscle is

less clear. Given the importance of the PPFs in CDH, prioritization of genes expressed in the early

mouse PPFs is likely to be an effective strategy for evaluating new candidate CDH genes derived from

human genetic studies.

In this study, we take a novel approach to studying the etiology of CDH. Complementing recent studies

using large cohorts of CDH patients that focus on one class of possible variants - de novo germline

variants [12-15], we comprehensively examine the genome of four CDH patients with multiple tissue

samples and their unaffected parents. Using deep whole genome sequencing and a sophisticated

bioinformatics toolkit, we determine the contribution of germline and somatic de novo and inherited

variants to CDH etiology. Our analysis includes variants of different sizes – single nucleotide variants

(SNVs), small insertions and deletions (indels), and larger structural variants (SVs) – in all genomic

regions (exons, introns, UTR, and intergenic). We prioritize implicated genes not only based on their

frequency in the general population and predicted effect on gene function, but on their expression in

early PPFs and diaphragm muscle progenitors, using a newly generated mouse RNA-seq dataset.

Altogether we reconstruct the diverse genetic architecture underlying isolated CDH in four individuals,

revealing the heterogenous and multifactorial genetic etiology of CDH.

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

RESULTS

Strongly supported CDH genes are expressed at high levels in early mouse PPF fibroblasts or

diaphragm muscle progenitors

Studies of CDH patients and mouse models have implicated many genes in the etiology of CDH.

However, not all CDH-implicated genes are equally well-supported by evidence. To systematically and

comprehensively analyze published CDH-implicated genes, we compiled a list of 153 CDH-implicated

genes from three large cohort studies [13-15], previous literature reviews [9, 46] and recently published

studies [47, 48] (TableS1). We included genes that either had mouse functional data or human genetic

data found in more than 1 CDH individual, associated with other developmental disorders or structural

birth defects that co-occur in patients with complex CDH, and/or implicated [by 13] via their interaction

with known CDH genes and expression in the developing diaphragm [as determined by 46]. We ranked

each gene based on the nature of the variants, penetrance, and frequency reported in mouse and

human data (for details, see Materials and Methods). For mouse data, genes in which variants resulted

in herniation of abdominal contents into the thoracic cavity were ranked more highly than those which

simply resulted in muscleless regions or entirely muscleless diaphragms. In addition, genes in which

variants lead to highly penetrant phenotypes were more highly ranked. For human data, genes in which

inherited homozygous or biallelic putative deleterious (as described by original publication) variants or

de novo deleterious variants were found in more than one CDH patient (either more than one patient in

one study or across multiple studies) were ranked more highly than genes with putative deleterious

variants of unknown inheritance or found in only one patient. The scores for mouse and human data

were added to produce a final score and ranking. Twenty seven genes had a score of 10-19 and were

deemed highly likely to contribute to CDH etiology, fifty one genes had a score of 5-9, and seventy five

genes had a score of < 5.

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Previous mouse studies of the development of CDH demonstrated that the PPFs are a critical cellular

source of CDH, and many CDH-implicated genes are expressed and required in early PPF cells [17,

34, 35, 49]. To test whether the CDH genes identified in the literature are expressed in the PPFs or

associated diaphragm muscle progenitors, we isolated E12.5 PPF fibroblasts and diaphragm muscle

progenitors from wild-type mice and performed RNA sequencing. We found that nearly all 26/27 (96%)

highly ranked genes are expressed at levels of at least 10 transcripts per million (TPM) reads (which

includes 29% of total transcripts), while 45/51 (88%) of moderately ranked genes and 63/75 (84%) of

lowly ranked genes are expressed at this level (Fig 1A). These data suggest that genes involved in

CDH are expressed at levels of at least 10 TPM in E12.5 PPFs or diaphragm muscle progenitors. In

evaluating the significance of newly identified putative CDH-genes, a finding that such genes are

expressed at > 10 TPM increases confidence that such genes indeed are important to CDH etiology.

Analysis of the highly ranked CDH genes reveals gene families and pathways that likely lead to CDH.

To discover networks, we inputted the 153 genes into STRING [50] (using all active interaction

sources except text mining and requiring a minimum interaction score of 0.4) to generate a protein

network (Fig 1B). The two highest scoring genes, GATA4 and ZFPM2 (FOG2), encode a transcription

factor and a co-factor that directly interact with each other [51]. In addition, in the heart GATA4 and

ZFPM2 interact with the protein encoded by the highly ranked gene, NR2F2 [52] and GATA4 interacts

with the protein encoded by the highly ranked gene, TBX5 [53, 54]. Thus, not only are variants in

GATA4, GATA6, ZFPM2, NR2F2, and TBX5 highly implicated in CDH, but GATA4 (GATA6), ZFPM2,

NR2F2, and TBX5 may function together in a complex to regulate diaphragm development.

Another class of highly ranked genes are those involved in the extracellular matrix (EFEMP2, FREM1,

FREM2, FRAS1, FBN1, HSPG2, COL3A1, COL20A1, LAMA5, and ELN) as well as genes that modify

matrix components (MMP2, MMP14, NDST1, and LOX). Also, prominent are genes involved in several

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

critical developmental pathways: -ROBO signaling (SLIT3, ROBO1, ROBO2), Retinoic Acid

signaling (RARB, RARA, STRA6), SHH signaling (GLI2, GLI3, KIF7, DISP1, STK36), and WNT

signaling (CTNNB1, FZD2, PORCN), MET signaling (MET, GAB1, PTPN11), and FGF signaling

(FGFRL1, FGFR2).

Present in the list of CDH-implicated genes are those expressed in myogenic cells and critical for

myogenesis (SIX4, SIX1, EYA1, EYA2, MEF2A, MSC, PAX7, PAX3, MYOD1, MYOG, DES, TNNT3).

However, while these genes are important for myogenesis and variants lead to muscle-less regions or

muscle-less diaphragms, it is not clear that they lead to diaphragmatic hernias. For instance, variants in

Pax3 in mouse lead to completely muscle-less diaphragms, and while the diaphragms are highly

domed, they do not allow abdominal contents to herniate into the thoracic cavity [17]. Thus, while

multiple reviews have included these genes as potentially implicated in CDH [e.g. 9, 13, 46], their true

role in CDH is less clear.

Cohort of 4 CDH probands and their parents with multiple proband tissue samples enable

unique genetic insights into CDH etiology

To gain greater insight into the genetic etiology of CDH, we analyzed four probands with CDH and their

parents with a unique array of tissue samples (Fig 2). Our cohort consists of four unrelated individuals

(male probands 411 and 967 and female probands 716 and 809) who had an isolated CDH in which the

diaphragmatic hernias were encased by a connective tissue sac. Reflecting the increased prevalence

of left versus right CDH [7], three probands (411, 809, and 967) had left CDH and one proband (716)

had right CDH. Three of the probands (411, 809 and 967) had hernias < 50% of chest wall devoid of

diaphragmatic tissue, and one (716) had a large hernia (> 50% of chest wall devoid of diaphragmatic

tissue). From each CDH proband, the sac was surgically removed during corrective surgery and saved

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

and skin biopsies and blood draws taken. In addition, blood samples were taken from each parent.

Altogether five samples (sac, skin, and blood from CDH proband and blood from 2 parents), from each

family were paired-end, whole genome sequenced to an average coverage > 50 reads across each

genome. Using Peddy [55], for each pentad of samples we confirmed sample quality, sex, relatedness,

and reported ancestry (Table S2). The DNA samples from the connective tissue sac uniquely enable us

to test the hypothesis, suggested by our previous mouse functional studies [17], that somatic variants in

the diaphragm’s connective tissue contribute to the etiology of CDH. In addition, the high depth of

whole genome sequencing and availability of new bioinformatic tools enable us to examine a wide

spectrum of potential genetic variants including SNVs, indels, and SVs in coding and non-coding

regions of the genome.

Analysis of germline de novo variants identifies THSD7A as a novel CDH candidate gene

Because most CDH patients have no family history of CDH, previous human genetic studies on the

etiology of CDH have concentrated on de novo variants [12-15]. Using blood samples from CDH

probands and their parents, these studies have inferred that variants present in the proband and not the

parents arose in the egg or sperm giving rising to the child and thus designated the discovered variants

as germline de novo variants. However, blood samples may also harbor somatic variants that arose

later in development and such somatic variants could be erroneously designated as a germline variant.

Because we have samples from three different tissues of each CDH proband as well as the parents,

variants present in the proband diaphragm sac, skin, and blood but not present in the parents are

confidently identified as germline (or at least early embryonic) variants. To discover such germline de

novo variants, we analyzed whole genome sequences from the pentad of samples using the variant

calling tool RUFUS (https://github.com/jandrewrfarrell/RUFUS). RUFUS subdivides each genome into

a series of k-mers and aligns k-mers from samples of interest to control samples to identify unique

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

SNVs and INDELs in the sample of interest. CDH proband genomes were compared with parental

genomes, and variants present in > 20% of reads (with GQ scores >20 and read depths > 15) of

proband diaphragm sac, skin, and blood genomes but not in parental genomes were designated as

germline de novo variants and confirmed by visual inspection in the Integrated Genome Viewer [56].

Germline de novo SNVs and INDELs were found in all four CDH probands (shown in intersection of

diaphragm sac, skin, and blood of 4 CDH probands in Fig 3A and detailed in Table S3). CDH probands

have 48-116 germline de novo variants, falling close to the 70 de novo SNVs expected in each

newborn in an average population [57, 58]. As expected, the largest number of de novo germline

variants were in intergenic regions (24-63 variants) and fewer were in UTR or intronic regions (20-50

variants). In the 4 probands, germline de novo non-synonymous coding (exon) variants in six genes

(PEX6, SCARB1, OLFM3, ZNF792, AR, and THSD7A) were discovered. De novo coding variants

impacting these genes have not been found in other CDH patients, and these genes are not located

within CDH-associated chromosomal regions. All of these coding variants are rare (gnomAD frequency

< 0.0001). The PEX6 variant is a damaging frameshift variant, while the THSD7A variant is a damaging

nonsense variant (stop gain, p.Trp260Ter). SCARB1, OLFM3, ZNF792, and AR are all missense

variants, but only the OLFM3 and ZNF792 variants are predicted to be damaging by PROVEAN (which

takes into account conservation of homologous sequences [59]). THSD7A is the only one of these four

genes (PEX6, THSD7A, OLFM3, and ZNF792) with damaging variants predicted by ExAC Pli [60] to be

haploinsufficient (intolerant of loss of one allele) and also substantially expressed in mouse PPFs or

diaphragm muscle progenitors (TPM of 7.0; Fig 3B). Thus THSD7A is a promising new candidate gene

in which variants lead to CDH. THSD7A (Thrombospondin Type I Domain Containing 7A) is a protein

containing 10 thrombospondin type I repeats and through its co-localization with avb3 integrin and

paxillin has been shown to promote endothelial cell migration during development [61-63]. In diaphragm

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

development, THSD7A may similarly regulate diaphragm vascularization or it may regulate PPF

fibroblast migration, which is essential for diaphragm morphogenesis [17].

Somatic de novo variants are present in diaphragm, skin and blood, but diaphragm somatic

variants are unlikely to contribute to CDH in the probands analyzed

Our previous mouse genetic functional study of CDH [17] suggested the hypothesis that somatic

variants in the PPF-derived muscle connective tissue contribute to CDH etiology. Our cohort of CDH

proband samples that include the diaphragm sac, which is composed of PPF-derived connective tissue,

uniquely allow us to test this hypothesis. In addition, because the samples were collected within a few

weeks of birth, we are uniquely positioned to determine the background frequency of somatic variants

in the skin and blood prior to exposure of any potential environmental mutagens (e.g. UV light).

To identify somatic de novo variants, diaphragm sac, skin, or blood CDH proband genomes were

compared via RUFUS with the other 2 genomes derived from the same CDH proband, and variants

present in > 20% of reads (with GQ scores >20 and read depths > 15) in only 1 or 2 tissues of each

proband genome and not present in parental genomes were designated as somatic de novo variants.

Variants were only included in which all three tissue samples had at least 15X coverage and Phred

quality score [64, 65] above 20. Variants were designated as tissue-specific if > 20% alternate variants

reads were present in one or two tissue samples and no alternate reads were present in the other

samples.

Somatic de novo variants were found in all four CDH probands (Fig 3, Table S3). Across both

individuals and tissues, the alternative allele read depth of somatic variants was significantly lower than

germline de novo variants (Fig S1) [66-68]. This indicates that the somatic de novo variants are present

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

in a subset of the sampled cells; in the diaphragm, this reflects either that not all PPF-derived

connective cells harbor the variant and/or that the sampled tissue includes several cells types (e.g.

connective tissue fibroblasts and endothelial cells ) of which only one cell type (e.g. fibroblasts) harbors

the variant. An analysis of the mutational spectrum (Fig S2A) reveals that the spectrum was generally

similar across all probands (although proband 411 and 967 harbor a larger number of deletions) and,

as expected, transitions (C/T or A/G changes) are more frequent than transversions (C/G, C/A, A/T,

A/C changes). The mutational spectrum also did not vary widely amongst diaphragm, skin, and blood

(Fig S2B).

Diaphragm sac, skin, and blood were all found to harbor private variants but no variants shared

between two tissues (Fig 3A, Table S3). Intergenic variants were the most common class of somatic

variants with variants in UTR and introns as the next most common. Somatic coding variant were only

found in the blood of Proband 411 (missense variant in MIR4717) and in the skin of Proband 809

(missense variants in DHX57 and TNFAIP8L3 and a synonymous variant in CXorf57). In all four

probands, the diaphragm contained somatic variants, but none of these were in coding regions and

variants in UTR and intron regions did not overlap any annotated enhancers. Thus, while somatic

variants were found in the diaphragm, none of the variants are likely to contribute to the etiology of

CDH in these children.

One striking feature of our analysis was the extremely high number of somatic variants in the

diaphragm sac and skin, but not blood, of proband 809. This high number of variants was not an artifact

of technical issues, as the samples passed all quality control filters of the standardized Utah Genome

Project pipeline (see Materials and Methods). Furthermore, not only does this child harbor more

somatic variants (Fig 3A), but the alternate allele frequency is significantly higher than that found in the

somatic variants in the other probands (Fig S1). This child also harbors a missense germline de novo in

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

the androgen receptor gene AR. AR has been well characterized as a tumor suppressor gene,

specifically in prostate cancer [69], and shown to be critical in DNA repair through activation of

transcriptional targets [70, 71]. However, arguing against a causative role of AR is that the particular AR

missense variant in proband 809 is not predicted by Provean to be damaging.

Our analysis of somatic variants also provides insights into how representative variants in the blood are

of germline variants. Blood is the most commonly sampled human tissue and variants in the blood are

typically designated as “germline” variants. However, our analysis shows that blood harbors somatic

variants that would typically be erroneously tallied as germline variants. In the four children analyzed,

an average of 1.5% (range of 1-4%) of variants in the blood are unique to blood. Thus our analysis

suggests that in general 1.5% of variants found in the blood are not germline variants, but instead

somatic variants.

Inherited variants in genes regulating muscle structure and function potentially contribute to

the etiology of CDH in one proband

Damaging variants inherited from parents may also be a genetic source of CDH. As the parents of the

four probands investigated do not have CDH, it is unlikely that an inherited variant of one allele would

directly cause CDH, while homozygous or biallelic variants, in which each parent contributes a gene

damaging allele, are more likely to contribute to CDH. Given this, recessive homozygous and biallelic

inherited variants were identified and prioritized using the GATK variant calling pipeline and the variant

prioritizing tool VAAST3 [72, 73] (Table S4). VAAST identifies and ranks genes based on whether they

are predicted to be damaging based on protein impact and rare compared against a mixed control

population from 1000 Genomes phase 3 [74]. Given that the significance level VAAST assigns to a

gene is limited by the size of the background population, genes were further prioritized as to their

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

likelihood to be involved in CDH etiology. Pseudogenes [75], highly mutable genes [76, 77], and genes

with multiple alleles in the variant region reported in gnomAD [78] were removed and genes expressed

at higher than 10 TPM in E12.5 mouse PPFs or diaphragm muscle progenitors prioritized (Fig 4A, B

and Table S4).

Using these criteria, we identified 13 genes with biallelic or homozygous variants in the three CDH

probands (Fig 4A and Table S4 with VAAST, CADD, PROVEAN impact, and ExAC Pli scores). None

of these genes has been identified in previous CDH studies. Of these 13 genes, 3 genes (MPEG1,

MYOF, SACS) harbor 1 deleterious frameshift allele and 1 missense allele predicted by Provean to be

neutral; 5 genes (UPF3A, CCDC136, NOP9, ZNF646, SH3PXD2A) harbor 1 missense allele predicted

to be deleterious and 1 missense allele predicted to be neutral by Provean; and 1 gene (ADAMTS2) is

homozygous for an inherited in-frame insertion, predicted by Provean to be neutral, but predicted by

ExAC to be intolerant of loss-of-function (Pli score of 0.99). Because each of these genes has at least

one predicted functional allele, these genes are unlikely to directly cause CDH, but may act as

sensitizing alleles that act in combination with other genetic variants produce CDH.

Proband 716, with a large right hernia, harbored four genes – ALG2, HRC, AHNAK, and MYO1H – in

which both inherited maternal and paternal alleles were frameshift null or missense predicted damaging

alleles and therefore more likely to contribute to CDH etiology. All four genes contain variants that are

rare compared to the background population (1000 Genomes Phase 3), based on the probabilistic

framework underlying VAAST, and all variants are rare (AF < 0.01) in gnomAD (Table S4).

Interestingly, three of these genes are involved in skeletal muscle function. ALG2 encodes an a1,3-

mannosyltransferase that catalyzes early steps of asparagine-linked glycosylation and is expressed at

neuromuscular junctions [79]. Human ALG2 variants have been found to affect the function of the

neuromuscular junction and mitochondria organization in myofibers, leading to congenital myasthenic

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

syndrome (fatigable muscle weakness) and mitochondrial myopathy [79-81]. HRC encodes a histidine-

rich calcium-binding protein that is expressed in the sarcoplasmic reticulum of skeletal, cardiac, and

smooth muscle [82] and in the heart has been shown to regulate calcium cycling [83, 84]. AHNAK

encodes a large nucleoprotein that acts as a structural scaffold in multi-protein complexes [85]. In

particular, AHNAK interacts with dysferlin, which is a transmembrane protein critical for skeletal muscle

membrane repair and loss of dysferlin causes several types of muscular dystrophy [86, 87]. AHNAK is

proposed to play a role in dysferlin-mediated membrane repair [86, 87]. The finding of deleterious

biallelic variants in these three genes suggests that aberrations in the diaphragm muscle’s

neuromuscular junctions, mitochondria, calcium handling, or membrane integrity contributes to the

development of CDH in this child. In addition, proband 716 has one predicted null and one missense

deleterious allele in MYO1H. MYO1H is a motor protein involved in intracellular transport and vesicle

trafficking, expressed in retrotrapezoid neurons critical for sensing C02 and regulating respiration, and

variants in MYO1H cause a recessive form of central hypoventilation [88]. While unlikely to directly

contribute to CDH, the MYO1H variants’ potentially deleterious effect on neuronal regulation of

respiration would have a detrimental impact on a CDH child.

An inherited deletion in intron 2 of Gata4 in two probands is a candidate common sensitizing

allele for CDH

Another important potential source of genetic variants underlying CDH are structural variants [SVs; 47]

and include > 50 bp insertions, deletions, inversions, and translocations. To identify structural variants

that could contribute to the etiology of the four CDH probands, we used the Lumpy smooth pipeline

[89], which uses three copy number variant callers: cn.MOPS [90], CNVkit [91] and CNVnator [92]. We

then determined whether identified SVs were de novo or inherited in the CDH probands using the tool

GQT [93] and confirmed SVs visually using IGV [56]. Though de novo SVs were discovered in multiple

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

probands (Table S5), all were common (allele frequency > 0.1) in a large SV dataset of 14,623

ancestrally diverse individuals [94] or located in an intergenic region. Thus, no discovered de novo SVs

are likely to contribute to CDH etiology.

To discover inherited SVs potentially contributing to CDH, we analyzed variants within chromosomal

regions highly associated with CDH [10]. Multiple candidate inherited SVs were identified (with allele

frequencies < 0.1 and not found in intergenic or repetitive regions, Table S5), but in no case were the

probands homozygous for the SV or biallelic with any of the identified SNVs. However, one SV, a 343

deletion within the second intron of the highly ranked CDH-associated transcription factor,

GATA4 [Table S1; 16, 17, 21], was discovered in two probands (Fig 5) and subsequently confirmed by

PCR. In proband 411 the deletion is paternally inherited (Fig 5A), while in proband 967 the deletion is

maternally inherited (Fig 5D).

Comparison of the 343 bp deleted region in human with the orthologous region in mouse reveals that

this region lies within an enhancer (element 2205) annotated in the VISTA enhancer database [Fig 5E;

95]. This enhancer drives reporter expression at E10.5 in the developing heart in a domain similar to

the GATA4 expression domain [96, 97] and demonstrates that this region is a bona fide GATA4

enhancer. Although yet to be tested, this enhancer may also be important for GATA4 expression in the

PPF cells that are critical for diaphragm development. As a parent of proband 411 and 967 possesses

the deletion and does not have any reported CDH, this deletion alone is unlikely to cause CDH in the

two probands. However, this deletion may sensitize the two probands to develop CDH.

Given that two of the CDH patients in this cohort harbor the 343 bp deletion, this deletion within the

GATA4 intron 2 enhancer may be a sensitizing variant enriched in CDH individuals. To test this, we

used PCR to identify the presence of the 343 bp deletion on a small cohort of 141 CDH individuals

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

(including the 4 CDH individuals from this study) for which DNA was available. In addition, we

compared the frequency of this deletion in the CDH cohort with that of a larger control population

(which includes an ancestrally diverse population, but also includes individuals with common

cardiovascular, neuropsychiatric, and immune- related diseases) with SVs discovered using a similar,

Lumpy software-based pipeline [94]. We found that CDH individuals had an allele frequency of 0.98%

(4 heterozygous individuals of 204 tested individuals, of which 141 had isolated CDH and 63 had

complex CDH) and in the larger control population an allele frequency of 1.9%. These data suggest that

this deletion has a roughly similar frequency of 1- 2% in CDH and control populations and is a common

variant in the population. This relatively common 343 bp deletion may be a sensitizing variant that, in

conjunction with other variants, contributes to the genetic etiology of CDH.

DISCUSSION

In this study, we comprehensively assessed the contribution of germline and somatic de novo and

inherited SNVs, indels, and SVs to CDH etiology and reconstruct for the first time the genetic

architecture of four individuals with isolated CDH. Our ability to perform such a comprehensive analysis

of CDH, identifying genetic variants of different sizes (SNVs, indels, and SVs) in various genomic

regions (exons, introns, UTR, and intergenic) with different inheritance patterns (inherited and germline

and somatic de novo) was enabled by three factors. First, the unique collection of diaphragm sac, skin

and blood samples from individuals with isolated CDH and blood from their parents allowed us to 1.

confidently identify germline variants (present in all three proband samples and absent in parent

samples), 2. discover de novo somatic variants (present in sac, but not in other samples) in the

diaphragm’s connective tissue, which has previously been shown to be an important cellular source of

CDH [9], and 3. determine which variants are inherited (present in all three proband samples and

present in at least one parent). Second, whole genome sequencing DNA samples to an average of >

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

50X coverage was essential for identifying non-coding DNA regions which contribute to CDH etiology

and positively calling somatic variants, which have relatively low numbers of reads. Finally, we

employed a collection of computational tools that uses Illumina pair-end, whole genome sequences to

discover de novo variants (RUFUS), identify and prioritize inherited variants (VAAST), and discover

SVs (Lumpy pipeline). Altogether, the unique cohort of samples, high-depth whole genome sequences,

and computational toolkit were essential to comprehensively interrogate the genetic architecture of the

four CDH individuals. Our pipeline lays the groundwork for future, larger scale studies investigating the

genetic etiology of CDH. In addition, our methodology should be useful for investigating other birth

defects with complex genetic etiologies, such as congenital heart defects [98].

Our analysis of germline de novo variants revealed a potential pitfall of using blood samples to infer

germline de novo variants and also identified a new candidate CDH-causative gene. Previous studies

searching for de novo variants that cause CDH have used DNA from blood samples and then inferred

that such variants are germline de novo variants [12-15]. However, while blood is the most readily

available source of DNA, variants in its DNA may not have originally arisen in the germline, but instead

may have arisen later in somatic cells. Our cohort, with three proband-derived tissue samples, allows

us to explicitly test this alternative hypothesis. We found that an average of 1.5% of the de novo

variants in the proband’s blood were not germline in origin (not present in all three proband tissues) and

is a similar rate as found in other recent papers [58, 99]. In fact, in proband 411 a blood-specific

somatic variant in MIR4717 would have been classified as a germline variant. Thus, researchers should

be cautious about inferring that all de novo variants in the blood are germline in origin. With DNA

samples from three tissues from each CDH proband, we are able to confidently identify germline de

novo variants because such variants will be present in all proband tissues, but not in parental samples.

We found an average of 68 germline de novo variants per proband, and this is similar to the 70

germline de novo variants found in the average population [58, 100-103]. Of these, only a few are in

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

gene-coding regions and only one of these genes, THSD7A, harbors a deleterious variant and is

predicted to be haploinsufficient and thus is a strong candidate CDH-causative allele. A limited number

of studies have investigated the function of Thsd7a in mice and zebrafish, focusing on its role in

regulating endothelial cell migration [61-63]. As Thsd7a is expressed in PPF cells in mouse and

migration of PPF cells is critical for normal diaphragm development and aberrant in CDH [17, 49], an

intriguing hypothesis is that Thsd7a regulates PPF migration.

The largely discordant appearance of CDH in monozygotic twins [22, 104] and our previous mouse

genetic studies [17] suggested that somatic variants in the diaphragm’s connective tissue may be a

genetic feature of some CDH individuals. In particular, our mouse conditional mutagenesis experiments

suggested that the connective tissue fibroblasts, which ultimately form a sac around some

diaphragmatic hernias, may harbor somatic variants [17]. Previous studies of the role of somatic

variants in other structural birth defects, such as congenital heart defects, have relied on blood, saliva,

or skin samples and inferred that low frequency (< 30%) alternate alleles represent somatic variants

that may potentially contribute to birth defect etiology [e.g. 105]. In our study, we have DNA directly

derived from the proband tissue, the PPF-derived diaphragm connective sac, hypothesized to harbor

somatic variants. Because we also have DNA derived from skin and blood proband samples, we were

able to positively identify any alternate allele, regardless of its frequency, present in the sac, but not in

skin or blood (or parental blood), as a somatic variant. Using similar logic, we were able to identify

somatic variants in blood and skin. To confidently identify these somatic variants, we conservatively

included alleles present in at least 20% of the reads (and not present in the other tissues). Using this

strategy, we identified in three of the probands 1-7 private somatic variants in the sac, skin, and blood.

Proband 809 has an aberrantly high number of somatic variants, but we currently have no mechanistic

explanation (e.g. variants in DNA repair genes) for this individual’s high somatic mutational load. In all

probands, the frequency of somatic alternate alleles was significantly lower than germline alleles,

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

indicating that these variants are present in a subset of sampled cells; in the diaphragm sac, either not

all PPF-derived connective tissue fibroblasts harbor the variants and/or other non-fibroblast cells are

included in the sample. Because no somatic variants were shared between two of the proband tissues,

all of the somatic variants must have arisen after the developmental divergence of diaphragm

connective tissue, skin, and blood. Importantly, no variants are shared between blood and diaphragm.

Thus, blood samples are unlikely to be informative about somatic variants in the diaphragm. Our

analysis of the diaphragm revealed multiple somatic variants in the diaphragm’s connective tissue, but

none in coding or annotated enhancers and so these somatic variants are unlikely to be deleterious. A

previous study [25] also found no evidence of damaging somatic variants, although this study examined

tissue sampled around the periphery of the herniated region and so did not specifically sample the

connective tissue that mouse genetic studies [17, 34, 35] predict to be a cellular source of CDH. While

our study did not find potentially damaging somatic variants in the diaphragm’s connective tissue, we

have established an effective discovery strategy. A more definitive test of the role of somatic variants in

CDH etiology awaits a future larger study.

The role of inherited variants in the etiology of CDH has received relatively little attention. Using

VAAST, we identified multiple compound heterozygous or homozygous inherited, presumably

recessive, SNVs and indel variants in all four probands. However, only a small number of these

variants were rare, predicted damaging, and in genes expressed in mouse PPFs or associated

diaphragm muscle progenitors. Of particular note is proband 716 who inherited multiple damaging

variants, including variants in ALG2, HRC, AHNAK, and MYO1H. AHNAK is involved in skeletal muscle

membrane structure and repair [86, 87], while HRC is functionally important in regulating calcium

cycling in muscle [82-84]. ALG2 regulates the function of the neuromuscular junction and organization

of mitochondria in myofibers and human variants in ALG2 have been linked to congenital myasthenic

syndrome and mitochondrial myopathy [79-81]. Together, variants in these three genes may weaken

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

muscle structure and function and lead to CDH. This is a surprising finding as mouse genetic studies

have found that while variants in muscle-specific genes lead to muscle-less or diaphragms with

aberrant muscle, none cause CDH (see Table S1). However, the inherited damaging variants in three

muscle-related genes in proband 716, with an unusually large hernia, suggests that genetic alterations

in muscle may lead to CDH. Another interesting aspect of the genetic profile of proband 716 is the

variant in MYO1H. MYO1H regulates the function of neurons critical for sensing C02 and respiration

[88], and so this MYO1H variant may further compound the respiratory issues introduced by CDH.

In our search for de novo or inherited SVs that could contribute to CDH etiology, we discovered in two

probands a 343 bp deletion in intron 2 of GATA4, a highly ranked CDH-associated gene, that disrupts

an annotated enhancer regulating GATA4 expression [106]. We hypothesize that disruption of this

enhancer leads to lower levels of GATA4 expression. GATA4 has notably dosage sensitive effects on

heart development [107] and likely also on diaphragm development. Given that this deletion is inherited

from unaffected parents and has an allele frequency of 1-2% in the general population, we hypothesize

that this deletion is a relatively common SV that acts as a sensitizing allele for CDH. We hypothesize

that decreased expression of GATA4 expression resulting from the 343 bp deletion confers CDH

susceptibility and in the context of other genetic variants (or environmental factors) leads to CDH. To

test this hypothesis, future experiments in our lab will test in mice whether this 343bp region regulates

GATA4 expression in the PPFs and whether a deletion in this region sensitizes mice to develop CDH.

Our comprehensive analysis of the genomes of four individuals with isolated CDH allows us to

reconstruct the diverse genetic architectures underlying CDH (Fig 6). Proband 809 is the most

enigmatic of the four cases. She harbors no obvious candidate genetic variants leading to CDH, as she

has no predicted damaging inherited recessive variants and one germline de novo missense variant in

AR that is predicted to be neutral. Her genome is particularly unusual in that it contains an abnormally

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

high somatic mutational load in her skin and diaphragm connective tissue, but the variants in the

diaphragm do not affect coding or annotated enhancer regions. The source of large number of somatic

mutations is unclear as she harbors no mutations in DNA repair genes. Proband 411 harbors the 343

bp intron 2 GATA4 deletion that we hypothesize acts as a sensitizing CDH allele, but the collaborating

variants that drive CDH are unclear. The germline de novo frameshift variant in PEX6 is deleterious but

PEX6 is not predicted to be haploinsufficient, and only one of the inherited variants in UPF3A is

predicted deleterious and UPF3A is also not predicted to be haploinsufficient. Proband 716 differs from

the other probands in that she has inherited multiple rare and damaging variants in myogenic genes

that likely lead to CDH. Notably, while three of the probands in our cohort have small left hernias, she is

the only proband who has a large (where > 50% of the chest wall is devoid of diaphragm tissue) right

hernia. Potentially, the origin of her atypical large right hernia [most hernias are found on the left side;

7] is linked to the variants in myogenic genes as opposed to genes expressed in PPF fibroblasts.

Proband 967 is the individual for which we have the strongest hypothesis about the genetic origin of

CDH. This individual harbors the 343 bp intron 2 GATA4 deletion that we postulate acts as a sensitizing

CDH allele and a rare germline de novo damaging missense variant in the haploinsufficient-intolerant

gene, THSD7A. These two variants suggest the hypothesis that during the early development of

proband 967 the PPFs of his nascent diaphragm were prone to apoptosis, unable to proliferate

sufficiently [as GATA4 promotes proliferation and survival; 17], and had defects in migration (due to low

expression levels of THSD7A) that ultimately lead to defects in diaphragm morphogenesis and CDH.

In summary, our comprehensive analysis demonstrates that the genetic etiology of CDH is

heterogeneous and likely multifactorial. A challenge for future studies will be to determine whether,

despite a diverse array of initiating genetic variants, a small set of molecular pathways are consistently

impacted in CDH. Identification of a few key molecular pathways common to all CDH patients would be

critical for designing potential in utero therapies to rescue or minimize the severity of herniation.

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

MATERIALS AND METHODS

Ranking of CDH-implicated genes

CDH genes reported in the literature were gathered from recent reviews [9, 46], large human patient

cohort studies [12-15, 47] and recent studies [47, 48]. Original publications implicating genes in CDH

were checked for the level of evidence (i.e. variants likely impacting gene function or deleterious as

described in original studies) and the following ranking system used.

Mouse Data Ranking: 10 = CDH, > 80% frequency; 9 = CDH, 40-60% frequency; 8 = CDH, < 40%

frequency; 7 = muscle-less patches, muscle-less diaphragm, thin diaphragm, > 80% frequency; 6 =

muscle-less patches, muscle-less diaphragm, thin diaphragm, 40-60% frequency; 5 = muscle-less

patches, muscle-less diaphragm, thin diaphragm, < 40% frequency. Human Data Ranking:9 = inherited

compound heterozygous or homozygous deleterious variant (2 alleles in 1 gene) + de novo deleterious

variant > 1 patient; 8 = inherited compound heterozygous or homozygous deleterious variant (2 alleles

in 1 gene), > 1 patient; 7 = de novo deleterious variant (1 allele in 1 gene), > 1 patient; 6 = inherited

compound heterozygous or homozygous deleterious variant (2 alleles in 1 gene), 1 patient; 5 =

inherited deleterious variant (1 allele in 1 gene), > 1 patient; 4 = unknown inheritance deleterious

variant (1 allele in 1 gene), > 1 patient; 3 = de novo deleterious variant (1 allele in 1 gene), 1 patient; 2

= inherited deleterious variant (1 allele in 1 gene), 1 patient; 1 = unknown inheritance deleterious

variants (1 allele in 1 gene), 1 patient. The final ranking of each gene was determined as the sum of the

mouse data and the human data ranking and constitutes the order of genes found in Table S1. CDH-

implicated genes were queried for expression in the mouse E12.5 PPF RNA-seq dataset and

expression plotted using the ggplot2 R package [108]. Gene networks within the list of CDH-associated

genes were visualized using STRING [50], visualizing high and medium levels of evidence to connect

gene nodes, using all evidence (except the “text mining” option was not used).

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Patient sample collection and DNA isolation

Participants were enrolled in a Columbia University IRB protocol AAAB2063 and provided informed

consent/assent for participation in this study as previously described [15]. Whole genome sequencing

and data analysis at the University of Utah was covered by IRB protocol 00085165. All four probands

had isolated CDH. Probands 411, 809, and 967 had left hernias < 50% of chest wall devoid of

diaphragmatic tissue, while proband 716 had a large (> 50% of chest wall devoid of diaphragmatic

tissue) right hernia. The self-reported ancestries are probands 411 and 809 are European (non-

Hispanic), proband 716 is Asian, and proband 967 is African, Hispanic, and European. The 4 CDH

probands and parents are included in previous exome sequencing analysis [14].

Whole Genome Sequencing

DNA was prepared for sequencing using TruSeq DNA PCR-free libraries (Illumina) and run on the

Illumina HiSeq X Ten System at a minimum of 60× median whole-genome coverage

Whole genome alignment, variant calling and quality checks

Genomic sequencing reads were aligned with BWA-MEM [109] against human GRCh37 reference

genome (including decoy sequences from the GATK resource bundle). Aligned BAM files were de-

duplicated using samblaster [110]. Base quality score recalibration and realignment of small insertions

and deletions was performed with the GATK package [111]. Alignment quality was checked with

samtools [112] ‘stats’ and ‘flagstats’ functions. Variants (SNVs, indels) were called with GATK

Haplotypecaller [111]. Sample relatedness, sex and reported ancestry were confirmed with Peddy [55].

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Germline and somatic de novo SNV and indel variant analysis

De novo SNVs and indels were called with RUFUS (https://github.com/jandrewrfarrell/RUFUS) using

standard parameters (25 length k-mers and 40 threads) and realigning k-mers to the human GRCh37

reference genome. Each proband tissue sample was run against both parents as control samples to

call germline de novo variants, and all possible combinations of one proband tissue against the other

two were run to call tissue-specific variants. Variants flagged “DeNovo” by RUFUS were retained and

any variants found in all three proband tissues were called germline de novo. Variants were further

filtered, and only variants were retained with GQ scores >20, read depths >15 at the variant site, and

the variant was found in > 20% of reads in the sample using annotations from the GATK

haplotypecaller [111] via a python script written with the cyvcf2 package [113] and annotations in IGV

[56]. Variant quality, alignment, and sample specificity were confirmed visually in IGV [56]. 5-20bp

indels located at the start or end of single nucleotide repeats were filtered out, as these are potential

false positives due to alignment error. Genetic locations were annotated with the UCSC Known Gene

Annotation [114]. Noncoding variants were intersected against a bed file of enhancers from the VISTA

Enhancer Database [106] using bedtools [115] to determine whether variants were within annotated

enhancers. Coding variant predicted damage was determined by both Provean [59] and CADD scores

[116] and allele frequency within a large, healthy population was determined by gnomAD [78]. Genes

containing coding variants were annotated as intolerant for loss-of-function with ExAC PLi scores [60].

Inherited SNV and indel variant analysis

The VAAST3 pipeline was used to call predicted damaging inherited recessive variants [73, 117]. The

pipeline includes the following steps. First, GATK haplotypecaller [111] identified-variants were

decomposed and normalized using VT [118], and variants with a GQ score < 30 or with > 25% of the

samples in the VCF file not genotyped (“no-call”) were filtered out using vcftools [119]. Variant effect

predicted annotations were added to the filtered VCF file using VEP [120] with version 83 of the hg19

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

vep-cache. As a background population, a VCF file with variants from 1000 Genomes phase 3 [74]

were run through the same workflow. A VVP background database was made from the 1000 Genome

filtered variants using the build_background function of VVP [121]. This background database and the

filtered VCF file of variants discovered in this cohort were used as inputs to VVP to prioritize cohort-

discovered variants, which were then passed to VAAST3 to be scored and ranked. Blood samples from

parents and the proband were used in VAAST3’s trio recessive inherited model. Genes with a p-value >

0.05 from VAAST3 were filtered out. The remaining genes were annotated with the predicted effect

from VEP, the location of the variant from VVP, and the parental origin of the allele from VVP. Ensembl

IDs given in the VAAST3 output were converted to gene names using GeneCard [122]. Genes were

then filtered out if 1) the genes were expressed at < 10 TPM in E12.5 mouse PPFs (human gene

names were converted to mouse orthologs using OrthoRetriever

(http://lighthouse.ucsf.edu/orthoretriever/)), 2) if they were a pseudogene annotated by GENCODE [75],

3) if they belonged to a highly mutable gene family, 4) if the allele called is found in > 0.01 of individuals

in gnomAD, or 5) there are multiple alleles at the site in gnomAD. Variant impact was predicted by

Provean [59], CADD [116] and ExAC pli scores [60], then genes were ranked based on the number and

severity of damaging alleles.

Inherited and de novo structural variant analysis

SVs were called with the Lumpy Smooth pipeline [89]. Regions with possible copy number variants

based on read depth were called using CNVkit [91], CNVnator [92] and CN.MOPS [90] with sample

BAM files. Outputs from the three tools were converted into BEDPE files and merged together as one

“deletions” file (evidence of a decrease in copy number) and one “duplications” (evidence of an

increase in copy number) per sample. The copy number call BEDPE files and aligned BAM files were

used as input to Lumpy [89] and called with the lumpy_smooth script (in the LUMPY scripts directory).

Lumpy output variants were used as inputs to SVTYPER [123] to annotate each variant as an insertion,

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

deletion, translocation or inversion within a VCF file. De novo and inherited SVs were identified with

GQT [124] using the LUMPY-created VCF file and a PED file describing family relations. We kept only

variants with either spilt or discordant supporting read counts above 8 and but below 400 (to exclude

noisy regions with high read mapping). SVs were confirmed in IGV [56], keeping variants with visible

evidence in read coverage change and discordant reads. de novo and inherited structural variants were

queried in the complete VCF file from [94] to determine allele frequency in the general population, and

we filtered out SVs with an allele frequency > 10%. Variants located in intergenic regions, overlapping

annotated repetitive element or elements [125] or homozygous in parental DNA were excluded. The

discovered 343 bp deletion was confirmed with PCR using primers: forward, 5'-

TTCCTCTACCATTGGGCGTTT-3' and reverse, 5'-AGGTAGTACGGCTGACTTGC-3'.

E12.5 PPF RNA sequencing

Pleuroperitoneal folds (PPFs) were isolated from wild-type E12.5 mouse embryos by cutting embryos

just above the hindlimbs and below the forelimbs and removing the heart and lungs cranially, leaving

the trunk with attached nascent diaphragm. The PPFs were manually dissected and stored in RNAlater

(Thermo Fisher, #AM7020) at -80°c. RNA was isolated using a RNeasy Micro Kit (Qiagen, #74004),

RNA quality confirmed with RNA TapeStation ScreenTape Assay (Agilent, # 5067-5576), and

sequenced using TruSeq Stranded mRNA Library Preparation Kit with polyA selection (Illumina) and

sequenced using HiSeq 50 Cycle Single-Read Sequencing v4 (Illumina) through the High-Throughput

Genomics and Bioinformatic Analysis Shared Resource at Huntsman Cancer Institute at the University

of Utah. Two biological replicates were sequenced (two pooled PPF pairs per replicated) and analyzed.

Sequencing reads were aligned to the mouse genome (mm9) with STAR [126], using standard

parameters. featureCounts [127] was used to count reads per gene and then normalized by TPM

(transcript per million) using R.

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Figure creation and statistics

Figures 1A, 3B, 4B, and S1 were created with R package ggplot2 [108]. Fig S2 was created with

Prism 7 (Graphpad). Figures 2, 3A, 4A and 6 were created with Adobe Illustrator. Fig 5 was created

with samplot (https://github.com/ryanlayer/samplot) and Adobe Photoshop. Genome tracks were plotted

using the Gviz R package [128], with UCSC Known Gene [114], GERP conservation scores [129],

ENCODE DNase 1 hypersensitivity clusters, H3K27 acetylation in normal human lung fibroblasts

(NHLF) cells [130] and VISTA enhancer element tracks [106]. Except for the VISTA elements (bed file

downloaded from https://enhancer.lbl.gov/) all data was downloaded from the UCSC Genome Table

Browser. The 343 bp deletion sequence and the homologous sequence in mouse (mm9 reference

genome) were downloaded from the UCSC Genome Browser and aligned with Geneious [131].

Statistics for Fig S1 were generated with Prism 7, using a one-way ANOVA with multiple comparisons

to test differences between somatic or germline de novo variants found across probands or tissues and

unpaired t-tests to compare somatic and germline de novo variants within probands or tissues.

ACKNOWLEDGEMENTS

We would like to thank the patients and their families for their generous contribution. We are grateful for

the technical assistance provided by Patricia Lanzano, Jiangyuan Hu, Jiancheng Guo, and Liyong

Deng. We thank A Quinlan and RM Layer for help with LUMPY, D Neklason at Utah Genome Project

for coordinating sequencing, and CY Chow, LB Jorde, A Quinlan, EM Sefton, and B Collins for critical

reading of the manuscript. The support and resources from the Center for High Performance

Computing at the University of Utah are gratefully acknowledged. EL Bogenschutz was supported by

the University of Utah Genetics Training grant (NIH T32 GM007464). This research was supported by

NIH R01HD087360, March of Dimes 6FY15203, Utah Genome Project, and Wheeler Foundation grants

to GK; NIH R01GM120609 and R03HL138352 to YS; and NIH R01HD057036,UL1 RR024156,

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

and P01HD068250 to WKC. Additional funding support was provided by grants from CHERUBS,

CDHUK, and the National Greek Orthodox Ladies Philoptochos Society, Inc. and generous donations

from the Williams Family, Wheeler Foundation, Vanech Family Foundation, Larsen Family, Wilke

Family and many other families.

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

REFERENCES

1. Perry SF, Similowski T, Klein W, Codd JR. The evolutionary origin of the mammalian

diaphragm. Respir Physiol Neurobiol. 2010;171(1):1-16. Epub 2010/01/19. doi:

10.1016/j.resp.2010.01.004. PubMed PMID: 20080210.

2. Stege G, Fenton A, Jaffray B. Nihilism in the 1990s: the true mortality of congenital

diaphragmatic hernia. Pediatrics. 2003;112(3 Pt 1):532-5. Epub 2003/09/02. PubMed PMID:

12949279.

3. Torfs CP, Curry CJ, Bateson TF, Honore LH. A population-based study of congenital

diaphragmatic hernia. Teratology. 1992;46(6):555-65. Epub 1992/12/01. doi:

10.1002/tera.1420460605. PubMed PMID: 1290156.

4. Parker SE, Mai CT, Canfield MA, Rickard R, Wang Y, Meyer RE, et al. Updated

National Birth Prevalence estimates for selected birth defects in the United States, 2004-2006.

Birth Defects Res A Clin Mol Teratol. 2010;88(12):1008-16. Epub 2010/09/30. doi:

10.1002/bdra.20735. PubMed PMID: 20878909.

5. Shanmugam H, Brunelli L, Botto LD, Krikov S, Feldkamp ML. Epidemiology and

Prognosis of Congenital Diaphragmatic Hernia: A Population-Based Cohort Study in Utah.

Birth Defects Res. 2017;109(18):1451-9. Epub 2017/09/20. doi: 10.1002/bdr2.1106. PubMed

PMID: 28925604.

6. Lally KP. Congenital diaphragmatic hernia - the past 25 (or so) years. J Pediatr Surg.

2016;51(5):695-8. Epub 2016/03/02. doi: 10.1016/j.jpedsurg.2016.02.005. PubMed PMID:

26926207.

7. Pober BR. Overview of epidemiology, genetics, birth defects, and

abnormalities associated with CDH. Am J Med Genet C Semin Med Genet. 2007;145C(2):158-

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

71. Epub 2007/04/17. doi: 10.1002/ajmg.c.30126. PubMed PMID: 17436298; PubMed Central

PMCID: PMCPMC2891729.

8. Ackerman KG, Vargas SO, Wilson JA, Jennings RW, Kozakewich HP, Pober BR.

Congenital diaphragmatic defects: proposal for a new classification based on observations in

234 patients. Pediatric and developmental pathology : the official journal of the Society for

Pediatric Pathology and the Paediatric Pathology Society. 2012;15(4):265-74. Epub

2012/01/20. doi: 10.2350/11-05-1041-OA.1. PubMed PMID: 22257294; PubMed Central

PMCID: PMC3761363.

9. Kardon G, Ackerman KG, McCulley DJ, Shen Y, Wynn J, Shang L, et al. Congenital

diaphragmatic hernias: from genes to mechanisms to therapies. Dis Model Mech.

2017;10(8):955-70. Epub 2017/08/05. doi: 10.1242/dmm.028365. PubMed PMID: 28768736;

PubMed Central PMCID: PMCPMC5560060.

10. Holder AM, Klaassens M, Tibboel D, de Klein A, Lee B, Scott DA. Genetic factors in

congenital diaphragmatic hernia. Am J Hum Genet. 2007;80(5):825-45. Epub 2007/04/17. doi:

10.1086/513442. PubMed PMID: 17436238; PubMed Central PMCID: PMCPMC1852742.

11. Longoni M, Lage K, Russell MK, Loscertales M, Abdul-Rahman OA, Baynam G, et al.

Congenital diaphragmatic hernia interval on chromosome 8p23.1 characterized by genetics

and protein interaction networks. Am J Med Genet A. 2012;158A(12):3148-58. Epub

2012/11/21. doi: 10.1002/ajmg.a.35665. PubMed PMID: 23165946; PubMed Central PMCID:

PMCPMC3761361.

12. Longoni M, High FA, Qi H, Joy MP, Hila R, Coletti CM, et al. Genome-wide enrichment

of damaging de novo variants in patients with isolated and complex congenital diaphragmatic

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

hernia. Hum Genet. 2017;136(6):679-91. Epub 2017/03/18. doi: 10.1007/s00439-017-1774-y.

PubMed PMID: 28303347; PubMed Central PMCID: PMCPMC5453716.

13. Longoni M, High FA, Russell MK, Kashani A, Tracy AA, Coletti CM, et al. Molecular

pathogenesis of congenital diaphragmatic hernia revealed by exome sequencing,

developmental data, and bioinformatics. Proc Natl Acad Sci U S A. 2014;111(34):12450-5.

Epub 2014/08/12. doi: 10.1073/pnas.1412509111. PubMed PMID: 25107291; PubMed Central

PMCID: PMCPMC4151769.

14. Qi H, Yu L, Zhou X, Wynn J, Zhao H, Guo Y, et al. De novo variants in congenital

diaphragmatic hernia identify MYRF as a new syndrome and reveal genetic overlaps with other

developmental disorders. PLoS Genet. 2018;14(12):e1007822. Epub 2018/12/12. doi:

10.1371/journal.pgen.1007822. PubMed PMID: 30532227; PubMed Central PMCID:

PMCPMC6301721.

15. Yu L, Sawle AD, Wynn J, Aspelund G, Stolar CJ, Arkovitz MS, et al. Increased burden

of de novo predicted deleterious variants in complex congenital diaphragmatic hernia. Hum

Mol Genet. 2015;24(16):4764-73. Epub 2015/06/03. doi: 10.1093/hmg/ddv196. PubMed PMID:

26034137; PubMed Central PMCID: PMCPMC4512631.

16. Jay PY, Bielinska M, Erlich JM, Mannisto S, Pu WT, Heikinheimo M, et al. Impaired

mesenchymal cell function in Gata4 mutant mice leads to diaphragmatic hernias and primary

lung defects. Dev Biol. 2007;301(2):602-14. Epub 2006/10/31. doi:

10.1016/j.ydbio.2006.09.050. PubMed PMID: 17069789; PubMed Central PMCID:

PMCPMC1808541.

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

17. Merrell AJ, Ellis BJ, Fox ZD, Lawson JA, Weiss JA, Kardon G. Muscle connective tissue

controls development of the diaphragm and is a source of congenital diaphragmatic hernias.

Nat Genet. 2015;47(5):496-504. Epub 2015/03/26. doi: 10.1038/ng.3250. PubMed PMID:

25807280; PubMed Central PMCID: PMCPMC4414795.

18. Pober BR, Lin A, Russell M, Ackerman KG, Chakravorty S, Strauss B, et al. Infants with

Bochdalek diaphragmatic hernia: sibling precurrence and monozygotic twin discordance in a

hospital-based malformation surveillance program. Am J Med Genet A. 2005;138A(2):81-8.

Epub 2005/08/12. doi: 10.1002/ajmg.a.30904. PubMed PMID: 16094667; PubMed Central

PMCID: PMCPMC2891716.

19. Yu L, Wynn J, Ma L, Guha S, Mychaliska GB, Crombleholme TM, et al. De novo copy

number variants are associated with congenital diaphragmatic hernia. J Med Genet.

2012;49(10):650-9. Epub 2012/10/12. doi: 10.1136/jmedgenet-2012-101135. PubMed PMID:

23054247; PubMed Central PMCID: PMCPMC3696999.

20. Yu L, Hernan RR, Wynn J, Chung WK. The influence of genetics in congenital

diaphragmatic hernia. Semin Perinatol. 2019:151169. Epub 2019/08/25. doi:

10.1053/j.semperi.2019.07.008. PubMed PMID: 31443905.

21. Yu L, Wynn J, Cheung YH, Shen Y, Mychaliska GB, Crombleholme TM, et al. Variants

in GATA4 are a rare cause of familial and sporadic congenital diaphragmatic hernia. Hum

Genet. 2013;132(3):285-92. Epub 2012/11/10. doi: 10.1007/s00439-012-1249-0. PubMed

PMID: 23138528; PubMed Central PMCID: PMCPMC3570587.

22. Veenma D, Brosens E, de Jong E, van de Ven C, Meeussen C, Cohen-Overbeek T, et

al. Copy number detection in discordant monozygotic twins of Congenital Diaphragmatic

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Hernia (CDH) and Esophageal Atresia (EA) cohorts. Eur J Hum Genet. 2012;20(3):298-304.

Epub 2011/11/11. doi: 10.1038/ejhg.2011.194. PubMed PMID: 22071887; PubMed Central

PMCID: PMCPMC3283183.

23. Kantarci S, Ackerman KG, Russell MK, Longoni M, Sougnez C, Noonan KM, et al.

Characterization of the chromosome 1q41q42.12 region, and the candidate gene DISP1, in

patients with CDH. Am J Med Genet A. 2010;152A(10):2493-504. Epub 2010/08/28. doi:

10.1002/ajmg.a.33618. PubMed PMID: 20799323; PubMed Central PMCID:

PMCPMC3797530.

24. Veenma D, Beurskens N, Douben H, Eussen B, Noomen P, Govaerts L, et al.

Comparable low-level mosaicism in affected and non affected tissue of a complex CDH

patient. PLoS One. 2010;5(12):e15348. Epub 2011/01/05. doi: 10.1371/journal.pone.0015348.

PubMed PMID: 21203572; PubMed Central PMCID: PMCPMC3006223.

25. Matsunami N, Shanmugam H, Baird L, Stevens J, Byrne JL, Barnhart DC, et al.

Germline but not somatic de novo mutations are common in human congenital diaphragmatic

hernia. Birth Defects Res. 2018;110(7):610-7. Epub 2018/03/24. doi: 10.1002/bdr2.1223.

PubMed PMID: 29570242; PubMed Central PMCID: PMCPMC5903934.

26. Farag TI, Bastaki L, Marafie M, al-Awadi SA, Krsz J. Autosomal recessive congenital

diaphragmatic defects in the Arabs. Am J Med Genet. 1994;50(3):300-1. Epub 1994/04/15.

doi: 10.1002/ajmg.1320500316. PubMed PMID: 8042677.

27. Hitch DC, Carson JA, Smith EI, Sarale DC, Rennert OM. Familial congenital

diaphragmatic hernia is an autosomal recessive variant. J Pediatr Surg. 1989;24(9):860-4.

Epub 1989/09/01. doi: 10.1016/s0022-3468(89)80582-2. PubMed PMID: 2674388.

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

28. Kantarci S, Al-Gazali L, Hill RS, Donnai D, Black GC, Bieth E, et al. Mutations in LRP2,

which encodes the multiligand receptor megalin, cause Donnai-Barrow and facio-oculo-

acoustico-renal syndromes. Nat Genet. 2007;39(8):957-9. Epub 2007/07/17. doi:

10.1038/ng2063. PubMed PMID: 17632512; PubMed Central PMCID: PMCPMC2891728.

29. Mitchell SJ, Cole T, Redford DH. Congenital diaphragmatic hernia with probable

autosomal recessive inheritance in an extended consanguineous Pakistani pedigree. J Med

Genet. 1997;34(7):601-3. Epub 1997/07/01. doi: 10.1136/jmg.34.7.601. PubMed PMID:

9222974; PubMed Central PMCID: PMCPMC1051006.

30. Longoni M, Russell MK, High FA, Darvishi K, Maalouf FI, Kashani A, et al. Prevalence

and penetrance of ZFPM2 mutations and deletions causing congenital diaphragmatic hernia.

Clin Genet. 2015;87(4):362-7. Epub 2014/04/08. doi: 10.1111/cge.12395. PubMed PMID:

24702427; PubMed Central PMCID: PMCPMC4410767.

31. Allan DW, Greer JJ. Embryogenesis of the phrenic nerve and diaphragm in the fetal rat.

J Comp Neurol. 1997;382(4):459-68. Epub 1997/06/16. PubMed PMID: 9184993.

32. Babiuk RP, Zhang W, Clugston R, Allan DW, Greer JJ. Embryological origins and

development of the rat diaphragm. J Comp Neurol. 2003;455(4):477-87. Epub 2003/01/01. doi:

10.1002/cne.10503. PubMed PMID: 12508321.

33. Sefton EM, Gallardo M, Kardon G. Developmental origin and morphogenesis of the

diaphragm, an essential mammalian muscle. Dev Biol. 2018;440(2):64-73. Epub 2018/04/22.

doi: 10.1016/j.ydbio.2018.04.010. PubMed PMID: 29679560; PubMed Central PMCID:

PMCPMC6089379.

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

34. Carmona R, Canete A, Cano E, Ariza L, Rojas A, Munoz-Chapuli R. Conditional

deletion of WT1 in the septum transversum mesenchyme causes congenital diaphragmatic

hernia in mice. Elife. 2016;5. Epub 2016/09/20. doi: 10.7554/eLife.16009. PubMed PMID:

27642710; PubMed Central PMCID: PMCPMC5028188.

35. Paris ND, Coles GL, Ackerman KG. Wt1 and beta-catenin cooperatively regulate

diaphragm development in the mouse. Dev Biol. 2015;407(1):40-56. Epub 2015/08/19. doi:

10.1016/j.ydbio.2015.08.009. PubMed PMID: 26278035; PubMed Central PMCID:

PMCPMC4641796.

36. Grifone R, Demignon J, Giordani J, Niro C, Souil E, Bertin F, et al. Eya1 and Eya2

proteins are required for hypaxial somitic myogenesis in the mouse embryo. Dev Biol.

2007;302(2):602-16. Epub 2006/11/14. doi: 10.1016/j.ydbio.2006.08.059. PubMed PMID:

17098221.

37. Grifone R, Demignon J, Houbron C, Souil E, Niro C, Seller MJ, et al. Six1 and Six4

homeoproteins are required for Pax3 and Mrf expression during myogenesis in the mouse

embryo. Development. 2005;132(9):2235-49. Epub 2005/03/25. doi: 10.1242/dev.01773.

PubMed PMID: 15788460.

38. Inanlou MR, Dhillon GS, Belliveau AC, Reid GA, Ying C, Rudnicki MA, et al. A

significant reduction of the diaphragm in mdx:MyoD-/-(9th) embryos suggests a role for MyoD

in the diaphragm development. Dev Biol. 2003;261(2):324-36. Epub 2003/09/23. PubMed

PMID: 14499644.

39. Ju Y, Li J, Xie C, Ritchlin CT, Xing L, Hilton MJ, et al. Troponin T3 expression in skeletal

and smooth muscle is required for growth and postnatal survival: characterization of

37 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Tnnt3(tm2a(KOMP)Wtsi) mice. Genesis. 2013;51(9):667-75. Epub 2013/06/19. doi:

10.1002/dvg.22407. PubMed PMID: 23775847; PubMed Central PMCID: PMCPMC3787964.

40. Laclef C, Hamard G, Demignon J, Souil E, Houbron C, Maire P. Altered myogenesis in

Six1-deficient mice. Development. 2003;130(10):2239-52. Epub 2003/04/02. PubMed PMID:

12668636.

41. Li J, Liu KC, Jin F, Lu MM, Epstein JA. Transgenic rescue of congenital heart disease

and spina bifida in Splotch mice. Development. 1999;126(11):2495-503. Epub 1999/05/05.

PubMed PMID: 10226008.

42. Li Z, Colucci-Guyon E, Pincon-Raymond M, Mericskay M, Pournin S, Paulin D, et al.

Cardiovascular lesions and skeletal myopathy in mice lacking desmin. Dev Biol.

1996;175(2):362-6. Epub 1996/05/01. doi: 10.1006/dbio.1996.0122. PubMed PMID: 8626040.

43. Lu JR, Bassel-Duby R, Hawkins A, Chang P, Valdez R, Wu H, et al. Control of facial

muscle development by MyoR and capsulin. Science. 2002;298(5602):2378-81. Epub

2002/12/21. doi: 10.1126/science.1078273. PubMed PMID: 12493912.

44. Seale P, Sabourin LA, Girgis-Gabardo A, Mansouri A, Gruss P, Rudnicki MA. Pax7 is

required for the specification of myogenic satellite cells. Cell. 2000;102(6):777-86. Epub

2000/10/13. PubMed PMID: 11030621.

45. Tseng BS, Cavin ST, Booth FW, Olson EN, Marin MC, McDonnell TJ, et al. Pulmonary

hypoplasia in the myogenin null mouse embryo. Am J Respir Cell Mol Biol. 2000;22(3):304-15.

Epub 2000/03/01. doi: 10.1165/ajrcmb.22.3.3708. PubMed PMID: 10696067.

46. Russell MK, Longoni M, Wells J, Maalouf FI, Tracy AA, Loscertales M, et al. Congenital

diaphragmatic hernia candidate genes derived from embryonic transcriptomes. Proc Natl Acad

38 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Sci U S A. 2012;109(8):2978-83. Epub 2012/02/09. doi: 10.1073/pnas.1121621109. PubMed

PMID: 22315423; PubMed Central PMCID: PMCPMC3286948.

47. Zhu Q, High FA, Zhang C, Cerveira E, Russell MK, Longoni M, et al. Systematic

analysis of copy number variation associated with congenital diaphragmatic hernia. Proc Natl

Acad Sci U S A. 2018;115(20):5247-52. Epub 2018/05/02. doi: 10.1073/pnas.1714885115.

PubMed PMID: 29712845; PubMed Central PMCID: PMCPMC5960281.

48. Jordan VK, Beck TF, Hernandez-Garcia A, Kundert PN, Kim BJ, Jhangiani SN, et al.

The Role of FREM2 and FRAS1 in the Development of Congenital Diaphragmatic Hernia. Hum

Mol Genet. 2018. Epub 2018/04/05. doi: 10.1093/hmg/ddy110. PubMed PMID: 29618029.

49. Clugston RD, Zhang W, Greer JJ. Early development of the primordial mammalian

diaphragm and cellular mechanisms of nitrofen-induced congenital diaphragmatic hernia. Birth

Defects Res A Clin Mol Teratol. 2010;88(1):15-24. Epub 2009/08/28. doi: 10.1002/bdra.20613.

PubMed PMID: 19711422.

50. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al. STRING

v11: protein-protein association networks with increased coverage, supporting functional

discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47(D1):D607-D13.

Epub 2018/11/27. doi: 10.1093/nar/gky1131. PubMed PMID: 30476243; PubMed Central

PMCID: PMCPMC6323986.

51. Chlon TM, Crispino JD. Combinatorial regulation of tissue specification by GATA and

FOG factors. Development. 2012;139(21):3905-16. Epub 2012/10/11. doi:

10.1242/dev.080440. PubMed PMID: 23048181; PubMed Central PMCID: PMCPMC3472596.

39 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

52. Huggins GS, Bacani CJ, Boltax J, Aikawa R, Leiden JM. Friend of GATA 2 physically

interacts with chicken ovalbumin upstream promoter-TF2 (COUP-TF2) and COUP-TF3 and

represses COUP-TF2-dependent activation of the atrial natriuretic factor promoter. J Biol

Chem. 2001;276(30):28029-36. Epub 2001/05/31. doi: 10.1074/jbc.M103577200. PubMed

PMID: 11382775.

53. Ang YS, Rivas RN, Ribeiro AJS, Srivas R, Rivera J, Stone NR, et al. Disease Model of

GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis. Cell.

2016;167(7):1734-49 e22. Epub 2016/12/17. doi: 10.1016/j.cell.2016.11.033. PubMed PMID:

27984724; PubMed Central PMCID: PMCPMC5180611.

54. Luna-Zurita L, Stirnimann CU, Glatt S, Kaynak BL, Thomas S, Baudin F, et al. Complex

Interdependence Regulates Heterotypic Transcription Factor Distribution and Coordinates

Cardiogenesis. Cell. 2016;164(5):999-1014. Epub 2016/02/16. doi: 10.1016/j.cell.2016.01.004.

PubMed PMID: 26875865; PubMed Central PMCID: PMCPMC4769693.

55. Pedersen BS, Quinlan AR. Who's Who? Detecting and Resolving Sample Anomalies in

Human DNA Sequencing Studies with Peddy. Am J Hum Genet. 2017;100(3):406-13. Epub

2017/02/14. doi: 10.1016/j.ajhg.2017.01.017. PubMed PMID: 28190455; PubMed Central

PMCID: PMCPMC5339084.

56. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-

performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178-92.

Epub 2012/04/21. doi: 10.1093/bib/bbs017. PubMed PMID: 22517427; PubMed Central

PMCID: PMCPMC3603213.

40 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

57. Besenbacher S, Liu S, Izarzugaza JM, Grove J, Belling K, Bork-Jensen J, et al. Novel

variation and de novo mutation rates in population-wide de novo assembled Danish trios. Nat

Commun. 2015;6:5969. Epub 2015/01/20. doi: 10.1038/ncomms6969. PubMed PMID:

25597990; PubMed Central PMCID: PMCPMC4309431.

58. Sasani TA, Pedersen BS, Gao Z, Baird L, Przeworski M, Jorde LB, et al. Large, three-

generation human families reveal post-zygotic mosaicism and variability in germline mutation

accumulation. Elife. 2019;8. Epub 2019/09/25. doi: 10.7554/eLife.46922. PubMed PMID:

31549960; PubMed Central PMCID: PMCPMC6759356.

59. Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the functional effect of

amino acid substitutions and indels. PLoS One. 2012;7(10):e46688. Epub 2012/10/12. doi:

10.1371/journal.pone.0046688. PubMed PMID: 23056405; PubMed Central PMCID:

PMCPMC3466303.

60. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of

protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285-91. Epub

2016/08/19. doi: 10.1038/nature19057. PubMed PMID: 27535533; PubMed Central PMCID:

PMCPMC5018207.

61. Kuo MW, Wang CH, Wu HC, Chang SJ, Chuang YJ. Soluble THSD7A is an N-

glycoprotein that promotes endothelial cell migration and tube formation in angiogenesis. PLoS

One. 2011;6(12):e29000. Epub 2011/12/24. doi: 10.1371/journal.pone.0029000. PubMed

PMID: 22194972; PubMed Central PMCID: PMCPMC3237571.

41 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

62. Wang CH, Chen IH, Kuo MW, Su PT, Lai ZY, Wang CH, et al. Zebrafish Thsd7a is a

neural protein required for angiogenic patterning during development. Dev Dyn.

2011;240(6):1412-21. Epub 2011/04/27. doi: 10.1002/dvdy.22641. PubMed PMID: 21520329.

63. Wang CH, Su PT, Du XY, Kuo MW, Lin CY, Yang CC, et al. Thrombospondin type I

domain containing 7A (THSD7A) mediates endothelial cell migration and tube formation. J Cell

Physiol. 2010;222(3):685-94. Epub 2009/12/19. doi: 10.1002/jcp.21990. PubMed PMID:

20020485.

64. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error

probabilities. Genome Res. 1998;8(3):186-94. Epub 1998/05/16. PubMed PMID: 9521922.

65. Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces

using phred. I. Accuracy assessment. Genome Res. 1998;8(3):175-85. Epub 1998/05/16. doi:

10.1101/gr.8.3.175. PubMed PMID: 9521921.

66. Dong X, Zhang L, Milholland B, Lee M, Maslov AY, Wang T, et al. Accurate

identification of single-nucleotide variants in whole-genome-amplified single cells. Nat

Methods. 2017;14(5):491-3. Epub 2017/03/21. doi: 10.1038/nmeth.4227. PubMed PMID:

28319112; PubMed Central PMCID: PMCPMC5408311.

67. Behjati S, Huch M, van Boxtel R, Karthaus W, Wedge DC, Tamuri AU, et al. Genome

sequencing of normal cells reveals developmental lineages and mutational processes. Nature.

2014;513(7518):422-5. Epub 2014/07/22. doi: 10.1038/nature13448. PubMed PMID:

25043003; PubMed Central PMCID: PMCPMC4227286.

68. Vijg J, Dong X, Zhang L. A high-fidelity method for genomic sequencing of single

somatic cells reveals a very high mutational burden. Exp Biol Med (Maywood).

42 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

2017;242(13):1318-24. Epub 2017/07/25. doi: 10.1177/1535370217717696. PubMed PMID:

28737476; PubMed Central PMCID: PMCPMC5529006.

69. Culig Z, Santer FR. Androgen receptor signaling in prostate cancer. Cancer Metastasis

Rev. 2014;33(2-3):413-27. Epub 2014/01/05. doi: 10.1007/s10555-013-9474-0. PubMed

PMID: 24384911.

70. Polkinghorn WR, Parker JS, Lee MX, Kass EM, Spratt DE, Iaquinta PJ, et al. Androgen

receptor signaling regulates DNA repair in prostate cancers. Cancer Discov. 2013;3(11):1245-

53. Epub 2013/09/13. doi: 10.1158/2159-8290.CD-13-0172. PubMed PMID: 24027196;

PubMed Central PMCID: PMCPMC3888815.

71. Schiewer MJ, Goodwin JF, Han S, Brenner JC, Augello MA, Dean JL, et al. Dual roles

of PARP-1 promote cancer growth and progression. Cancer Discov. 2012;2(12):1134-49. Epub

2012/09/21. doi: 10.1158/2159-8290.CD-12-0120. PubMed PMID: 22993403; PubMed Central

PMCID: PMCPMC3519969.

72. Hu H, Roach JC, Coon H, Guthery SL, Voelkerding KV, Margraf RL, et al. A unified test

of linkage analysis and rare-variant association for analysis of pedigree sequence data. Nat

Biotechnol. 2014;32(7):663-9. Epub 2014/05/20. doi: 10.1038/nbt.2895. PubMed PMID:

24837662; PubMed Central PMCID: PMCPMC4157619.

73. Yandell M, Huff C, Hu H, Singleton M, Moore B, Xing J, et al. A probabilistic disease-

gene finder for personal genomes. Genome Res. 2011;21(9):1529-42. Epub 2011/06/28. doi:

10.1101/gr.123158.111. PubMed PMID: 21700766; PubMed Central PMCID:

PMCPMC3166837.

43 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

74. Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A

global reference for human genetic variation. Nature. 2015;526(7571):68-74. Epub 2015/10/04.

doi: 10.1038/nature15393. PubMed PMID: 26432245; PubMed Central PMCID:

PMCPMC4750478.

75. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al.

GENCODE: the reference annotation for The ENCODE Project. Genome Res.

2012;22(9):1760-74. Epub 2012/09/08. doi: 10.1101/gr.135350.111. PubMed PMID:

22955987; PubMed Central PMCID: PMCPMC3431492.

76. Lenz TL, Spirin V, Jordan DM, Sunyaev SR. Excess of Deleterious Mutations around

HLA Genes Reveals Evolutionary Cost of Balancing Selection. Mol Biol Evol.

2016;33(10):2555-64. Epub 2016/07/21. doi: 10.1093/molbev/msw127. PubMed PMID:

27436009; PubMed Central PMCID: PMCPMC5026253.

77. Shyr C, Tarailo-Graovac M, Gottlieb M, Lee JJ, van Karnebeek C, Wasserman WW.

FLAGS, frequently mutated genes in public exomes. BMC Med Genomics. 2014;7:64. Epub

2014/12/04. doi: 10.1186/s12920-014-0064-y. PubMed PMID: 25466818; PubMed Central

PMCID: PMCPMC4267152.

78. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. Variation

across 141,456 human exomes and genomes reveals the spectrum of loss-of-function

intolerance across human protein-coding genes. bioRxiv. 2019:531210. doi: 10.1101/531210.

79. Cossins J, Belaya K, Hicks D, Salih MA, Finlayson S, Carboni N, et al. Congenital

myasthenic syndromes due to mutations in ALG2 and ALG14. Brain. 2013;136(Pt 3):944-56.

44 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Epub 2013/02/14. doi: 10.1093/brain/awt010. PubMed PMID: 23404334; PubMed Central

PMCID: PMCPMC3580273.

80. Monies DM, Al-Hindi HN, Al-Muhaizea MA, Jaroudi DJ, Al-Younes B, Naim EA, et al.

Clinical and pathological heterogeneity of a congenital disorder of glycosylation manifesting as

a myasthenic/myopathic syndrome. Neuromuscul Disord. 2014;24(4):353-9. Epub 2014/01/28.

doi: 10.1016/j.nmd.2013.12.010. PubMed PMID: 24461433.

81. Thiel C, Schwarz M, Peng J, Grzmil M, Hasilik M, Braulke T, et al. A new type of

congenital disorders of glycosylation (CDG-Ii) provides new insights into the early steps of

dolichol-linked oligosaccharide biosynthesis. J Biol Chem. 2003;278(25):22498-505. Epub

2003/04/10. doi: 10.1074/jbc.M302850200. PubMed PMID: 12684507.

82. Pathak RK, Anderson RG, Hofmann SL. Histidine-rich calcium binding protein, a

sarcoplasmic reticulum protein of striated muscle, is also abundant in arteriolar smooth muscle

cells. J Muscle Res Cell Motil. 1992;13(3):366-76. Epub 1992/06/01. doi: 10.1007/bf01766464.

PubMed PMID: 1527222.

83. Arvanitis DA, Vafiadaki E, Fan GC, Mitton BA, Gregory KN, Del Monte F, et al.

Histidine-rich Ca-binding protein interacts with sarcoplasmic reticulum Ca-ATPase. Am J

Physiol Heart Circ Physiol. 2007;293(3):H1581-9. Epub 2007/05/29. doi:

10.1152/ajpheart.00278.2007. PubMed PMID: 17526652.

84. Tzimas C, Johnson DM, Santiago DJ, Vafiadaki E, Arvanitis DA, Davos CH, et al.

Impaired calcium homeostasis is associated with sudden cardiac death and arrhythmias in a

genetic equivalent mouse model of the human HRC-Ser96Ala variant. Cardiovasc Res.

2017;113(11):1403-17. Epub 2017/09/02. doi: 10.1093/cvr/cvx113. PubMed PMID: 28859293.

45 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

85. Davis TA, Loos B, Engelbrecht AM. AHNAK: the giant jack of all trades. Cell Signal.

2014;26(12):2683-93. Epub 2014/08/31. doi: 10.1016/j.cellsig.2014.08.017. PubMed PMID:

25172424.

86. Huang Y, Laval SH, van Remoortere A, Baudier J, Benaud C, Anderson LV, et al.

AHNAK, a novel component of the dysferlin protein complex, redistributes to the cytoplasm

with dysferlin during skeletal muscle regeneration. FASEB J. 2007;21(3):732-42. Epub

2006/12/23. doi: 10.1096/fj.06-6628com. PubMed PMID: 17185750.

87. Zacharias U, Purfurst B, Schowel V, Morano I, Spuler S, Haase H. Ahnak1 abnormally

localizes in muscular dystrophies and contributes to muscle vesicle release. J Muscle Res Cell

Motil. 2011;32(4-5):271-80. Epub 2011/11/08. doi: 10.1007/s10974-011-9271-8. PubMed

PMID: 22057634; PubMed Central PMCID: PMCPMC3230764.

88. Spielmann M, Hernandez-Miranda LR, Ceccherini I, Weese-Mayer DE, Kragesteen BK,

Harabula I, et al. Mutations in MYO1H cause a recessive form of central hypoventilation with

autonomic dysfunction. J Med Genet. 2017;54(11):754-61. Epub 2017/08/06. doi:

10.1136/jmedgenet-2017-104765. PubMed PMID: 28779001.

89. Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for

structural variant discovery. Genome Biol. 2014;15(6):R84. Epub 2014/06/28. doi: 10.1186/gb-

2014-15-6-r84. PubMed PMID: 24970577; PubMed Central PMCID: PMCPMC4197822.

90. Klambauer G, Schwarzbauer K, Mayr A, Clevert DA, Mitterecker A, Bodenhofer U, et al.

cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation

sequencing data with a low false discovery rate. Nucleic Acids Res. 2012;40(9):e69. Epub

46 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

2012/02/04. doi: 10.1093/nar/gks003. PubMed PMID: 22302147; PubMed Central PMCID:

PMCPMC3351174.

91. Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number

Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol.

2016;12(4):e1004873. Epub 2016/04/23. doi: 10.1371/journal.pcbi.1004873. PubMed PMID:

27100738; PubMed Central PMCID: PMCPMC4839673.

92. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover,

genotype, and characterize typical and atypical CNVs from family and population genome

sequencing. Genome Res. 2011;21(6):974-84. Epub 2011/02/18. doi: 10.1101/gr.114876.110.

PubMed PMID: 21324876; PubMed Central PMCID: PMCPMC3106330.

93. Layer RM, Kindlon N, Karczewski KJ, Exome Aggregation C, Quinlan AR. Efficient

genotype compression and analysis of large genetic-variation data sets. Nat Methods.

2016;13(1):63-5. Epub 2015/11/10. doi: 10.1038/nmeth.3654. PubMed PMID: 26550772;

PubMed Central PMCID: PMCPMC4697868.

94. Abel HJ, Larson DE, Chiang C, Das I, Kanchi KL, Layer RM, et al. Mapping and

characterization of structural variation in 17,795 deeply sequenced human genomes. bioRxiv.

2018:508515. doi: 10.1101/508515.

95. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, et al. In

vivo enhancer analysis of human conserved non-coding sequences. Nature.

2006;444(7118):499-502. Epub 2006/11/07. doi: 10.1038/nature05295. PubMed PMID:

17086198.

47 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

96. Watt AJ, Battle MA, Li J, Duncan SA. GATA4 is essential for formation of the

proepicardium and regulates cardiogenesis. Proc Natl Acad Sci U S A. 2004;101(34):12573-8.

Epub 2004/08/18. doi: 10.1073/pnas.0400752101. PubMed PMID: 15310850; PubMed Central

PMCID: PMCPMC515098.

97. Kuo CT, Morrisey EE, Anandappa R, Sigrist K, Lu MM, Parmacek MS, et al. GATA4

transcription factor is required for ventral morphogenesis and heart tube formation. Genes

Dev. 1997;11(8):1048-60. Epub 1997/04/15. PubMed PMID: 9136932.

98. Homsy J, Zaidi S, Shen Y, Ware JS, Samocha KE, Karczewski KJ, et al. De novo

mutations in congenital heart disease with neurodevelopmental and other congenital

anomalies. Science. 2015;350(6265):1262-6. Epub 2016/01/20. doi: 10.1126/science.aac9396.

PubMed PMID: 26785492; PubMed Central PMCID: PMCPMC4890146.

99. Hsieh A, Morton SU, Willcox JAL, Gorham JM, Tai AC, Qi H, et al. Early post-zygotic

mutations contribute to congenital heart disease. bioRxiv. 2019.

100. Besenbacher S, Sulem P, Helgason A, Helgason H, Kristjansson H, Jonasdottir A, et al.

Multi-nucleotide de novo Mutations in Humans. PLoS Genet. 2016;12(11):e1006315. Epub

2016/11/16. doi: 10.1371/journal.pgen.1006315. PubMed PMID: 27846220; PubMed Central

PMCID: PMCPMC5147774.

101. Jonsson H, Sulem P, Arnadottir GA, Palsson G, Eggertsson HP, Kristmundsdottir S, et

al. Multiple transmissions of de novo mutations in families. Nat Genet. 2018;50(12):1674-80.

Epub 2018/11/07. doi: 10.1038/s41588-018-0259-9. PubMed PMID: 30397338.

102. Kong A, Frigge ML, Masson G, Besenbacher S, Sulem P, Magnusson G, et al. Rate of

de novo mutations and the importance of father's age to disease risk. Nature.

48 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

2012;488(7412):471-5. Epub 2012/08/24. doi: 10.1038/nature11396. PubMed PMID:

22914163; PubMed Central PMCID: PMCPMC3548427.

103. Rahbari R, Wuster A, Lindsay SJ, Hardwick RJ, Alexandrov LB, Turki SA, et al. Timing,

rates and spectra of human germline mutation. Nat Genet. 2016;48(2):126-33. Epub

2015/12/15. doi: 10.1038/ng.3469. PubMed PMID: 26656846; PubMed Central PMCID:

PMCPMC4731925.

104. Wat MJ, Shchelochkov OA, Holder AM, Breman AM, Dagli A, Bacino C, et al.

Chromosome 8p23.1 deletions as a cause of complex congenital heart defects and

diaphragmatic hernia. Am J Med Genet A. 2009;149A(8):1661-77. Epub 2009/07/17. doi:

10.1002/ajmg.a.32896. PubMed PMID: 19606479; PubMed Central PMCID:

PMCPMC2765374.

105. Manheimer KB, Richter F, Edelmann LJ, D'Souza SL, Shi L, Shen Y, et al. Robust

identification of mosaic variants in congenital heart disease. Hum Genet. 2018;137(2):183-93.

Epub 2018/02/09. doi: 10.1007/s00439-018-1871-6. PubMed PMID: 29417219; PubMed

Central PMCID: PMCPMC5997246.

106. Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA Enhancer Browser--a

database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35(Database

issue):D88-92. Epub 2006/11/30. doi: 10.1093/nar/gkl822. PubMed PMID: 17130149; PubMed

Central PMCID: PMCPMC1716724.

107. Pu WT, Ishiwata T, Juraszek AL, Ma Q, Izumo S. GATA4 is a dosage-sensitive

regulator of cardiac morphogenesis. Dev Biol. 2004;275(1):235-44. Epub 2004/10/07. doi:

10.1016/j.ydbio.2004.08.008. PubMed PMID: 15464586.

49 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

108. Wickham H, Sievert C. ggplot2 : Elegant Graphics for Data Analysis. 2nd ed. Cham:

Springer International Publishing; 2016. 1 online resource (266 pages) p.

109. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.

arXiv. 2013. doi: 1303.3997.

110. Faust GG, Hall IM. SAMBLASTER: fast duplicate marking and structural variant read

extraction. Bioinformatics. 2014;30(17):2503-5. Epub 2014/05/09. doi:

10.1093/bioinformatics/btu314. PubMed PMID: 24812344; PubMed Central PMCID:

PMCPMC4147885.

111. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework

for variation discovery and genotyping using next-generation DNA sequencing data. Nat

Genet. 2011;43(5):491-8. Epub 2011/04/12. doi: 10.1038/ng.806. PubMed PMID: 21478889;

PubMed Central PMCID: PMCPMC3083463.

112. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence

Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9. Epub 2009/06/10.

doi: 10.1093/bioinformatics/btp352. PubMed PMID: 19505943; PubMed Central PMCID:

PMCPMC2723002.

113. Pedersen BS, Quinlan AR. cyvcf2: fast, flexible variant analysis with Python.

Bioinformatics. 2017;33(12):1867-9. Epub 2017/02/07. doi: 10.1093/bioinformatics/btx057.

PubMed PMID: 28165109; PubMed Central PMCID: PMCPMC5870853.

114. Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. The UCSC Known

Genes. Bioinformatics. 2006;22(9):1036-46. Epub 2006/02/28. doi:

10.1093/bioinformatics/btl048. PubMed PMID: 16500937.

50 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

115. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic

features. Bioinformatics. 2010;26(6):841-2. Epub 2010/01/30. doi:

10.1093/bioinformatics/btq033. PubMed PMID: 20110278; PubMed Central PMCID:

PMCPMC2832824.

116. Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the

deleteriousness of variants throughout the human genome. Nucleic Acids Res.

2019;47(D1):D886-D94. Epub 2018/10/30. doi: 10.1093/nar/gky1016. PubMed PMID:

30371827; PubMed Central PMCID: PMCPMC6323892.

117. Hu H, Huff CD, Moore B, Flygare S, Reese MG, Yandell M. VAAST 2.0: improved

variant classification and disease-gene identification using a conservation-controlled amino

acid substitution matrix. Genet Epidemiol. 2013;37(6):622-34. Epub 2013/07/10. doi:

10.1002/gepi.21743. PubMed PMID: 23836555; PubMed Central PMCID: PMCPMC3791556.

118. Tan A, Abecasis GR, Kang HM. Unified representation of genetic variants.

Bioinformatics. 2015;31(13):2202-4. Epub 2015/02/24. doi: 10.1093/bioinformatics/btv112.

PubMed PMID: 25701572; PubMed Central PMCID: PMCPMC4481842.

119. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant

call format and VCFtools. Bioinformatics. 2011;27(15):2156-8. Epub 2011/06/10. doi:

10.1093/bioinformatics/btr330. PubMed PMID: 21653522; PubMed Central PMCID:

PMCPMC3137218.

120. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl

Variant Effect Predictor. Genome Biol. 2016;17(1):122. Epub 2016/06/09. doi:

51 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

10.1186/s13059-016-0974-4. PubMed PMID: 27268795; PubMed Central PMCID:

PMCPMC4893825.

121. Flygare S, Hernandez EJ, Phan L, Moore B, Li M, Fejes A, et al. The VAAST Variant

Prioritizer (VVP): ultrafast, easy to use whole genome variant prioritization tool. BMC

Bioinformatics. 2018;19(1):57. Epub 2018/02/22. doi: 10.1186/s12859-018-2056-y. PubMed

PMID: 29463208; PubMed Central PMCID: PMCPMC5819680.

122. Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish M, et al. GeneCards

Version 3: the human gene integrator. Database (Oxford). 2010;2010:baq020. Epub

2010/08/07. doi: 10.1093/database/baq020. PubMed PMID: 20689021; PubMed Central

PMCID: PMCPMC2938269.

123. Chiang C, Layer RM, Faust GG, Lindberg MR, Rose DB, Garrison EP, et al. SpeedSeq:

ultra-fast personal genome analysis and interpretation. Nat Methods. 2015;12(10):966-8. Epub

2015/08/11. doi: 10.1038/nmeth.3505. PubMed PMID: 26258291; PubMed Central PMCID:

PMCPMC4589466.

124. Layer RM, Kindlon N, Karczewski KJ, Quinlan AR, Consortium EA. Efficient genotype

compression and analysis of large genetic-variation data sets. Nat Methods. 2016;13(1):63-+.

doi: 10.1038/Nmeth.3654. PubMed PMID: WOS:000367463600029.

125. Jurka J. Repbase update: a database and an electronic journal of repetitive elements.

Trends Genet. 2000;16(9):418-20. Epub 2000/09/06. doi: 10.1016/s0168-9525(00)02093-x.

PubMed PMID: 10973072.

126. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast

universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21. Epub 2012/10/30. doi:

52 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

10.1093/bioinformatics/bts635. PubMed PMID: 23104886; PubMed Central PMCID:

PMCPMC3530905.

127. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for

assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923-30. Epub

2013/11/15. doi: 10.1093/bioinformatics/btt656. PubMed PMID: 24227677.

128. Hahne F, Ivanek R. Visualizing Genomic Data Using Gviz and Bioconductor. Methods

Mol Biol. 2016;1418:335-51. Epub 2016/03/24. doi: 10.1007/978-1-4939-3578-9_16. PubMed

PMID: 27008022.

129. Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a high

fraction of the human genome to be under selective constraint using GERP++. PLoS Comput

Biol. 2010;6(12):e1001025. Epub 2010/12/15. doi: 10.1371/journal.pcbi.1001025. PubMed

PMID: 21152010; PubMed Central PMCID: PMCPMC2996323.

130. Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, et al. Architecture

of the human regulatory network derived from ENCODE data. Nature. 2012;489(7414):91-100.

Epub 2012/09/08. doi: 10.1038/nature11245. PubMed PMID: 22955619; PubMed Central

PMCID: PMCPMC4154057.

131. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious

Basic: an integrated and extendable desktop software platform for the organization and

analysis of sequence data. Bioinformatics. 2012;28(12):1647-9. Epub 2012/05/01. doi:

10.1093/bioinformatics/bts199. PubMed PMID: 22543367; PubMed Central PMCID:

PMCPMC3371832.

53 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

132. Hucthagowder V, Sausgruber N, Kim KH, Angle B, Marmorstein LY, Urban Z. Fibulin-4:

a novel gene for an autosomal recessive cutis laxa syndrome. Am J Hum Genet.

2006;78(6):1075-80. Epub 2006/05/11. doi: 10.1086/504304. PubMed PMID: 16685658;

PubMed Central PMCID: PMCPMC1474103.

133. Devriendt K, Deloof E, Moerman P, Legius E, Vanhole C, de Zegher F, et al.

Diaphragmatic hernia in Denys-Drash syndrome. Am J Med Genet. 1995;57(1):97-101. Epub

1995/05/22. doi: 10.1002/ajmg.1320570120. PubMed PMID: 7645607.

134. Slavotinek AM, Moshrefi A, Lopez Jiminez N, Chao R, Mendell A, Shaw GM, et al.

Sequence variants in the HLX gene at chromosome 1q41-1q42 in patients with diaphragmatic

hernia. Clin Genet. 2009;75(5):429-39. Epub 2009/05/23. doi: 10.1111/j.1399-

0004.2009.01182.x. PubMed PMID: 19459883; PubMed Central PMCID: PMCPMC2874832.

135. Wat MJ, Beck TF, Hernandez-Garcia A, Yu Z, Veenma D, Garcia M, et al. Mouse model

reveals the role of SOX7 in the development of congenital diaphragmatic hernia associated

with recurrent deletions of 8p23.1. Hum Mol Genet. 2012;21(18):4115-25. Epub 2012/06/23.

doi: 10.1093/hmg/dds241. PubMed PMID: 22723016; PubMed Central PMCID:

PMCPMC3428158.

136. High FA, Bhayani P, Wilson JM, Bult CJ, Donahoe PK, Longoni M. De novo frameshift

mutation in COUP-TFII (NR2F2) in human congenital diaphragmatic hernia. Am J Med Genet

A. 2016;170(9):2457-61. Epub 2016/07/02. doi: 10.1002/ajmg.a.37830. PubMed PMID:

27363585; PubMed Central PMCID: PMCPMC5003181.

137. Tautz J, Veenma D, Eussen B, Joosen L, Poddighe P, Tibboel D, et al. Congenital

diaphragmatic hernia and a complex heart defect in association with Wolf-Hirschhorn

54 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

syndrome. Am J Med Genet A. 2010;152A(11):2891-4. Epub 2010/09/11. doi:

10.1002/ajmg.a.33660. PubMed PMID: 20830802.

138. Srour M, Chitayat D, Caron V, Chassaing N, Bitoun P, Patry L, et al. Recessive and

dominant mutations in retinoic acid receptor beta in cases with microphthalmia and

diaphragmatic hernia. Am J Hum Genet. 2013;93(4):765-72. Epub 2013/10/01. doi:

10.1016/j.ajhg.2013.08.014. PubMed PMID: 24075189; PubMed Central PMCID:

PMCPMC3791254.

139. Jacobs AM, Toudjarska I, Racine A, Tsipouras P, Kilpatrick MW, Shanske A. A

recurring FBN1 gene mutation in neonatal Marfan syndrome. Arch Pediatr Adolesc Med.

2002;156(11):1081-5. Epub 2002/11/07. PubMed PMID: 12413333.

140. Bleyl SB, Moshrefi A, Shaw GM, Saijoh Y, Schoenwolf GC, Pennacchio LA, et al.

Candidate genes for congenital diaphragmatic hernia from animal models: sequencing of

FOG2 and PDGFRalpha reveals rare variants in diaphragmatic hernia patients. Eur J Hum

Genet. 2007;15(9):950-8. Epub 2007/06/15. doi: 10.1038/sj.ejhg.5201872. PubMed PMID:

17568391.

141. Chassaing N, Ragge N, Kariminejad A, Buffet A, Ghaderi-Sohi S, Martinovic J, et al.

Mutation analysis of the STRA6 gene in isolated and non-isolated

anophthalmia/microphthalmia. Clin Genet. 2013;83(3):244-50. Epub 2012/06/13. doi:

10.1111/j.1399-0004.2012.01904.x. PubMed PMID: 22686418.

142. Urban Z, Hucthagowder V, Schurmann N, Todorovic V, Zilberberg L, Choi J, et al.

Mutations in LTBP4 cause a syndrome of impaired pulmonary, gastrointestinal, genitourinary,

musculoskeletal, and dermal development. Am J Hum Genet. 2009;85(5):593-605. Epub

55 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

2009/10/20. doi: 10.1016/j.ajhg.2009.09.013. PubMed PMID: 19836010; PubMed Central

PMCID: PMCPMC2775835.

143. Hogue J, Shankar S, Perry H, Patel R, Vargervik K, Slavotinek A. A novel EFNB1

mutation (c.712delG) in a family with craniofrontonasal syndrome and diaphragmatic hernia.

Am J Med Genet A. 2010;152A(10):2574-7. Epub 2010/08/25. doi: 10.1002/ajmg.a.33596.

PubMed PMID: 20734337.

144. Maas SM, Lombardi MP, van Essen AJ, Wakeling EL, Castle B, Temple IK, et al.

Phenotype and genotype in 17 patients with Goltz-Gorlin syndrome. J Med Genet.

2009;46(10):716-20. Epub 2009/07/10. doi: 10.1136/jmg.2009.068403. PubMed PMID:

19586929.

145. Tuzovic L, Yu L, Zeng W, Li X, Lu H, Lu HM, et al. A human de novo mutation in

MYH10 phenocopies the loss of function mutation in mice. Rare Dis. 2013;1:e26144. Epub

2013/01/01. doi: 10.4161/rdis.26144. PubMed PMID: 25003005; PubMed Central PMCID:

PMCPMC3927488.

146. Yano S, Baskin B, Bagheri A, Watanabe Y, Moseley K, Nishimura A, et al. Familial

Simpson-Golabi-Behmel syndrome: studies of X-chromosome inactivation and clinical

phenotypes in two female individuals with GPC3 mutations. Clin Genet. 2011;80(5):466-71.

Epub 2010/10/19. doi: 10.1111/j.1399-0004.2010.01554.x. PubMed PMID: 20950395.

147. Sarajlija A, Magner M, Djordjevic M, Kecman B, Grujic B, Tesarova M, et al. Late-

presenting congenital diaphragmatic hernia in a child with TMEM70 deficiency. Congenit Anom

(Kyoto). 2017;57(2):64-5. Epub 2016/09/21. doi: 10.1111/cga.12194. PubMed PMID:

27649480.

56 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

148. Kammoun M, Souche E, Brady P, Ding J, Cosemans N, Gratacos E, et al. Genetic

profile of isolated congenital diaphragmatic hernia revealed by targeted next-generation

sequencing. Prenat Diagn. 2018;38(9):654-63. Epub 2018/07/03. doi: 10.1002/pd.5327.

PubMed PMID: 29966037.

149. Jongmans MC, Admiraal RJ, van der Donk KP, Vissers LE, Baas AF, Kapusta L, et al.

CHARGE syndrome: the phenotypic spectrum of mutations in the CHD7 gene. J Med Genet.

2006;43(4):306-14. Epub 2005/09/13. doi: 10.1136/jmg.2005.036061. PubMed PMID:

16155193; PubMed Central PMCID: PMCPMC2563221.

150. Qidwai K, Pearson DM, Patel GS, Pober BR, Immken LL, Cheung SW, et al. Deletions

of Xp provide evidence for the role of holocytochrome C-type synthase (HCCS) in congenital

diaphragmatic hernia. Am J Med Genet A. 2010;152A(6):1588-90. Epub 2010/05/27. doi:

10.1002/ajmg.a.33410. PubMed PMID: 20503342; PubMed Central PMCID:

PMCPMC2909827.

151. Yu L, Bennett JT, Wynn J, Carvill GL, Cheung YH, Shen Y, et al. Whole exome

sequencing identifies de novo mutations in GATA6 associated with congenital diaphragmatic

hernia. J Med Genet. 2014;51(3):197-202. Epub 2014/01/05. doi: 10.1136/jmedgenet-2013-

101989. PubMed PMID: 24385578; PubMed Central PMCID: PMCPMC3955383.

152. Lin IC, Ko SF, Shieh CS, Huang CF, Chien SJ, Liang CD. Recurrent congenital

diaphragmatic hernia in Ehlers-Danlos syndrome. Cardiovasc Intervent Radiol.

2006;29(5):920-3. doi: 10.1007/s00270-005-0154-5. PubMed PMID: 16447004.

57 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

153. Bulfamante G, Gana S, Avagliano L, Fabietti I, Gentilin B, Lalatta F. Congenital

diaphragmatic hernia as prenatal presentation of Apert syndrome. Prenat Diagn.

2011;31(9):910-1. Epub 2011/06/28. doi: 10.1002/pd.2788. PubMed PMID: 21706505.

154. McVeigh TP, Banka S, Reardon W. Kabuki syndrome: expanding the phenotype to

include microphthalmia and anophthalmia. Clin Dysmorphol. 2015;24(4):135-9. Epub

2015/06/08. doi: 10.1097/MCD.0000000000000092. PubMed PMID: 26049589.

155. Reis LM, Tyler RC, Schilter KF, Abdul-Rahman O, Innis JW, Kozel BA, et al. BMP4

loss-of-function mutations in developmental eye disorders including SHORT syndrome. Hum

Genet. 2011;130(4):495-504. Epub 2011/02/23. doi: 10.1007/s00439-011-0968-y. PubMed

PMID: 21340693; PubMed Central PMCID: PMCPMC3178759.

156. Wat MJ, Veenma D, Hogue J, Holder AM, Yu Z, Wat JJ, et al. Genomic alterations that

contribute to the development of isolated and non-isolated congenital diaphragmatic hernia. J

Med Genet. 2011;48(5):299-307. Epub 2011/04/29. doi: 10.1136/jmg.2011.089680. PubMed

PMID: 21525063; PubMed Central PMCID: PMCPMC3227222.

157. Wilmink FA, Papatsonis DN, Grijseels EW, Wessels MW. Cornelia de lange syndrome:

a recognizable fetal phenotype. Fetal Diagn Ther. 2009;26(1):50-3. Epub 2009/10/10. doi:

10.1159/000236361. PubMed PMID: 19816032.

158. Piard J, Collet C, Arbez-Gindre F, Nirhy-Lanto A, Van Maldergem L. Coronal

craniosynostosis and radial ray hypoplasia: a third report of Twist mutation in a 33 weeks fetus

with diaphragmatic hernia. Eur J Med Genet. 2012;55(12):719-22. Epub 2012/09/18. doi:

10.1016/j.ejmg.2012.08.007. PubMed PMID: 22982246.

58 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

159. Graul-Neumann LM, Hausser I, Essayie M, Rauch A, Kraus C. Highly variable cutis laxa

resulting from a dominant splicing mutation of the elastin gene. Am J Med Genet A.

2008;146A(8):977-83. Epub 2008/03/19. doi: 10.1002/ajmg.a.32242. PubMed PMID:

18348261.

160. Niemi AK, Northrup H, Hudgins L, Bernstein JA. Horseshoe kidney and a rare TSC2

variant in two unrelated individuals with tuberous sclerosis complex. Am J Med Genet A.

2011;155A(10):2534-7. Epub 2011/09/13. doi: 10.1002/ajmg.a.34197. PubMed PMID:

21910228.

161. Zayed H, Chao R, Moshrefi A, Lopezjimenez N, Delaney A, Chen J, et al. A maternally

inherited chromosome 18q22.1 deletion in a male with late-presenting diaphragmatic hernia

and microphthalmia-evaluation of DSEL as a candidate gene for the diaphragmatic defect. Am

J Med Genet A. 2010;152A(4):916-23. Epub 2010/04/02. doi: 10.1002/ajmg.a.33341. PubMed

PMID: 20358601; PubMed Central PMCID: PMCPMC2922899.

162. Bulman MP, Kusumi K, Frayling TM, McKeown C, Garrett C, Lander ES, et al.

Mutations in the human delta homologue, DLL3, cause axial skeletal defects in spondylocostal

dysostosis. Nat Genet. 2000;24(4):438-41. Epub 2000/03/31. doi: 10.1038/74307. PubMed

PMID: 10742114.

163. Race H, Hall CM, Harrison MG, Quarrell OW, Wakeling EL. A distinct autosomal

recessive disorder of limb development with preaxial brachydactyly, phalangeal duplication,

symphalangism and hyperphalangism. Clin Dysmorphol. 2010;19(1):23-7. Epub 2009/12/03.

doi: 10.1097/MCD.0b013e328334557e. PubMed PMID: 19952732.

59 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

164. Salonen R, Herva R. Hydrolethalus syndrome. J Med Genet. 1990;27(12):756-9. Epub

1990/12/01. PubMed PMID: 2074561; PubMed Central PMCID: PMCPMC1017280.

165. Taylor J, Aftimos S. Congenital diaphragmatic hernia is a feature of Opitz G/BBB

syndrome. Clin Dysmorphol. 2010;19(4):225-6. Epub 2010/09/09. doi:

10.1097/MCD.0b013e32833b2bd3. PubMed PMID: 20823703.

166. Gupta N, Shastri S, Singh PK, Jana M, Mridha A, Verma G, et al. Nasopharyngeal

teratoma, congenital diaphragmatic hernia and Dandy-Walker malformation - a yet

uncharacterized syndrome. Clin Genet. 2016;90(5):470-1. Epub 2016/10/26. doi:

10.1111/cge.12830. PubMed PMID: 27506516.

167. Ackerman KG, Herron BJ, Vargas SO, Huang H, Tevosian SG, Kochilas L, et al. Fog2

is required for normal diaphragm and lung development in mice and humans. PLoS Genet.

2005;1(1):58-65. Epub 2005/08/17. doi: 10.1371/journal.pgen.0010010. PubMed PMID:

16103912; PubMed Central PMCID: PMCPMC1183529.

168. Antonius T, van Bon B, Eggink A, van der Burgt I, Noordam K, van Heijst A. Denys-

Drash syndrome and congenital diaphragmatic hernia: another case with the 1097G >

A(Arg366His) mutation. Am J Med Genet A. 2008;146A(4):496-9. Epub 2008/01/19. doi:

10.1002/ajmg.a.32168. PubMed PMID: 18203154.

169. van Dooren MF, Brooks AS, Hoogeboom AJ, van den Hoonaard TL, de Klein JE,

Wouters CH, et al. Early diagnosis of Wolf-Hirschhorn syndrome triggered by a life-threatening

event: congenital diaphragmatic hernia. Am J Med Genet A. 2004;127A(2):194-6. Epub

2004/04/27. doi: 10.1002/ajmg.a.20613. PubMed PMID: 15108210.

60 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

170. Revencu N, Quenum G, Detaille T, Verellen G, De Paepe A, Verellen-Dumoulin C.

Congenital diaphragmatic eventration and bilateral uretero-hydronephrosis in a patient with

neonatal Marfan syndrome caused by a mutation in exon 25 of the FBN1 gene and review of

the literature. Eur J Pediatr. 2004;163(1):33-7. Epub 2003/10/31. doi: 10.1007/s00431-003-

1330-8. PubMed PMID: 14586646.

171. Pasutto F, Sticht H, Hammersen G, Gillessen-Kaesbach G, Fitzpatrick DR, Nurnberg G,

et al. Mutations in STRA6 cause a broad spectrum of malformations including anophthalmia,

congenital heart defects, diaphragmatic hernia, alveolar capillary dysplasia, lung hypoplasia,

and mental retardation. Am J Hum Genet. 2007;80(3):550-60. Epub 2007/02/03. doi:

10.1086/512203. PubMed PMID: 17273977; PubMed Central PMCID: PMCPMC1821097.

172. Twigg SR, Kan R, Babbs C, Bochukova EG, Robertson SP, Wall SA, et al. Mutations of

ephrin-B1 (EFNB1), a marker of tissue boundary formation, cause craniofrontonasal

syndrome. Proc Natl Acad Sci U S A. 2004;101(23):8652-7. Epub 2004/05/29. doi:

10.1073/pnas.0402819101. PubMed PMID: 15166289; PubMed Central PMCID:

PMCPMC423250.

173. Smigiel R, Jakubiak A, Lombardi MP, Jaworski W, Slezak R, Patkowski D, et al. Co-

occurrence of severe Goltz-Gorlin syndrome and pentalogy of Cantrell - Case report and

review of the literature. Am J Med Genet A. 2011;155A(5):1102-5. Epub 2011/04/13. doi:

10.1002/ajmg.a.33895. PubMed PMID: 21484999.

174. Hughes-Benzie RM, Pilia G, Xuan JY, Hunter AG, Chen E, Golabi M, et al. Simpson-

Golabi-Behmel syndrome: genotype/phenotype analysis of 18 affected males from 7 unrelated

61 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

families. Am J Med Genet. 1996;66(2):227-34. Epub 1996/12/11. doi: 10.1002/(SICI)1096-

8628(19961211)66:2<227::AID-AJMG20>3.0.CO;2-U. PubMed PMID: 8958336.

175. Catteruccia M, Verrigni D, Martinelli D, Torraco A, Agovino T, Bonafe L, et al. Persistent

pulmonary arterial hypertension in the newborn (PPHN): a frequent manifestation of TMEM70

defective patients. Mol Genet Metab. 2014;111(3):353-9. Epub 2014/02/04. doi:

10.1016/j.ymgme.2014.01.001. PubMed PMID: 24485043.

176. Hosokawa S, Takahashi N, Kitajima H, Nakayama M, Kosaki K, Okamoto N.

Brachmann-de Lange syndrome with congenital diaphragmatic hernia and NIPBL gene

mutation. Congenit Anom (Kyoto). 2010;50(2):129-32. Epub 2010/02/17. doi: 10.1111/j.1741-

4520.2010.00270.x. PubMed PMID: 20156239.

177. Suri M, Kelehan P, O'Neill D, Vadeyar S, Grant J, Ahmed SF, et al. WT1 mutations in

Meacham syndrome suggest a coelomic mesothelial origin of the cardiac and diaphragmatic

malformations. Am J Med Genet A. 2007;143A(19):2312-20. Epub 2007/09/14. doi:

10.1002/ajmg.a.31924. PubMed PMID: 17853480.

178. Stheneur C, Faivre L, Collod-Beroud G, Gautier E, Binquet C, Bonithon-Kopp C, et al.

Prognosis factors in probands with an FBN1 mutation diagnosed before the age of 1 year.

Pediatr Res. 2011;69(3):265-70. Epub 2010/12/08. doi: 10.1203/PDR.0b013e3182097219.

PubMed PMID: 21135753.

179. Vasudevan PC, Twigg SR, Mulliken JB, Cook JA, Quarrell OW, Wilkie AO. Expanding

the phenotype of craniofrontonasal syndrome: two unrelated boys with EFNB1 mutations and

congenital diaphragmatic hernia. Eur J Hum Genet. 2006;14(7):884-7. Epub 2006/04/28. doi:

10.1038/sj.ejhg.5201633. PubMed PMID: 16639408.

62 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

180. Dias C, Basto J, Pinho O, Barbedo C, Martins M, Bornholdt D, et al. A nonsense porcn

mutation in severe focal dermal hypoplasia with natal teeth. Fetal Pediatr Pathol.

2010;29(5):305-13. Epub 2010/08/14. doi: 10.3109/15513811003796912. PubMed PMID:

20704476.

181. Veugelers M, Cat BD, Muyldermans SY, Reekmans G, Delande N, Frints S, et al.

Mutational analysis of the GPC3/GPC4 glypican gene cluster on Xq26 in patients with

Simpson-Golabi-Behmel syndrome: identification of loss-of-function mutations in the GPC3

gene. Hum Mol Genet. 2000;9(9):1321-8. Epub 2000/05/18. PubMed PMID: 10814714.

182. Denamur E, Bocquet N, Baudouin V, Da Silva F, Veitia R, Peuchmaur M, et al. WT1

splice-site mutations are rarely associated with primary steroid-resistant focal and segmental

glomerulosclerosis. Kidney Int. 2000;57(5):1868-72. Epub 2000/05/03. doi: 10.1046/j.1523-

1755.2000.00036.x. PubMed PMID: 10792605.

183. Chassaing N, Golzio C, Odent S, Lequeux L, Vigouroux A, Martinovic-Bouriel J, et al.

Phenotypic spectrum of STRA6 mutations: from Matthew-Wood syndrome to non-lethal

anophthalmia. Hum Mutat. 2009;30(5):E673-81. Epub 2009/03/25. doi: 10.1002/humu.21023.

PubMed PMID: 19309693.

184. Twigg SR, Matsumoto K, Kidd AM, Goriely A, Taylor IB, Fisher RB, et al. The origin of

EFNB1 mutations in craniofrontonasal syndrome: frequent somatic mosaicism and explanation

of the paucity of carrier males. Am J Hum Genet. 2006;78(6):999-1010. Epub 2006/05/11. doi:

10.1086/504440. PubMed PMID: 16685650; PubMed Central PMCID: PMCPMC1474108.

185. Li M, Shuman C, Fei YL, Cutiongco E, Bender HA, Stevens C, et al. GPC3 mutation

analysis in a spectrum of patients with overgrowth expands the phenotype of Simpson-Golabi-

63 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Behmel syndrome. Am J Med Genet. 2001;102(2):161-8. Epub 2001/07/31. PubMed PMID:

11477610.

186. Horiguchi M, Inoue T, Ohbayashi T, Hirai M, Noda K, Marmorstein LY, et al. Fibulin-4

conducts proper elastogenesis via interaction with cross-linking lysyl oxidase. Proc

Natl Acad Sci U S A. 2009;106(45):19029-34. Epub 2009/10/27. doi:

10.1073/pnas.0908268106. PubMed PMID: 19855011; PubMed Central PMCID:

PMCPMC2776456.

187. Kreidberg JA, Sariola H, Loring JM, Maeda M, Pelletier J, Housman D, et al. WT-1 is

required for early kidney development. Cell. 1993;74(4):679-91. Epub 1993/08/27. PubMed

PMID: 8395349.

188. Clugston RD, Klattig J, Englert C, Clagett-Dame M, Martinovic J, Benachi A, et al.

Teratogen-induced, dietary and genetic models of congenital diaphragmatic hernia share a

common mechanism of pathogenesis. Am J Pathol. 2006;169(5):1541-9. Epub 2006/10/31.

doi: 10.2353/ajpath.2006.060445. PubMed PMID: 17071579; PubMed Central PMCID:

PMCPMC1780206.

189. Yuan W, Rao Y, Babiuk RP, Greer JJ, Wu JY, Ornitz DM. A genetic model for a central

(septum transversum) congenital diaphragmatic hernia in mice lacking Slit3. Proc Natl Acad

Sci U S A. 2003;100(9):5217-22. Epub 2003/04/19. doi: 10.1073/pnas.0730709100. PubMed

PMID: 12702769; PubMed Central PMCID: PMCPMC154325.

190. Coles GL, Ackerman KG. Kif7 is required for the patterning and differentiation of the

diaphragm in a model of syndromic congenital diaphragmatic hernia. Proc Natl Acad Sci U S

64 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

A. 2013;110(21):E1898-905. Epub 2013/05/08. doi: 10.1073/pnas.1222797110. PubMed

PMID: 23650387; PubMed Central PMCID: PMCPMC3666741.

191. Hentsch B, Lyons I, Li R, Hartley L, Lints TJ, Adams JM, et al. Hlx homeo box gene is

essential for an inductive tissue interaction that drives expansion of embryonic liver and gut.

Genes Dev. 1996;10(1):70-9. Epub 1996/01/01. PubMed PMID: 8557196.

192. Kim PC, Mo R, Hui Cc C. Murine models of VACTERL syndrome: Role of sonic

hedgehog signaling pathway. J Pediatr Surg. 2001;36(2):381-4. Epub 2001/02/15. PubMed

PMID: 11172440.

193. Beck TF, Veenma D, Shchelochkov OA, Yu Z, Kim BJ, Zaveri HP, et al. Deficiency of

FRAS1-related extracellular matrix 1 (FREM1) causes congenital diaphragmatic hernia in

humans and mice. Hum Mol Genet. 2013;22(5):1026-38. Epub 2012/12/12. doi:

10.1093/hmg/dds507. PubMed PMID: 23221805; PubMed Central PMCID: PMCPMC3561915.

194. You LR, Takamoto N, Yu CT, Tanaka T, Kodama T, Demayo FJ, et al. Mouse lacking

COUP-TFII as an animal model of Bochdalek-type congenital diaphragmatic hernia. Proc Natl

Acad Sci U S A. 2005;102(45):16351-6. Epub 2005/10/28. doi: 10.1073/pnas.0507832102.

PubMed PMID: 16251273; PubMed Central PMCID: PMCPMC1283449.

195. Babiuk RP, Greer JJ. Diaphragm defects occur in a CDH hernia model independently of

myogenesis and lung formation. Am J Physiol Lung Cell Mol Physiol. 2002;283(6):L1310-4.

Epub 2002/10/22. doi: 10.1152/ajplung.00257.2002. PubMed PMID: 12388344.

196. Baertschi S, Zhuang L, Trueb B. Mice with a targeted disruption of the Fgfrl1 gene die at

birth due to alterations in the diaphragm. FEBS J. 2007;274(23):6241-53. Epub 2007/11/08.

doi: 10.1111/j.1742-4658.2007.06143.x. PubMed PMID: 17986259.

65 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

197. Mendelsohn C, Lohnes D, Decimo D, Lufkin T, LeMeur M, Chambon P, et al. Function

of the retinoic acid receptors (RARs) during development (II). Multiple abnormalities at various

stages of organogenesis in RAR double mutants. Development. 1994;120(10):2749-71. Epub

1994/10/01. PubMed PMID: 7607068.

198. Pereira L, Lee SY, Gayraud B, Andrikopoulos K, Shapiro SD, Bunton T, et al.

Pathogenetic sequence for aneurysm revealed in mice underexpressing fibrillin-1. Proc Natl

Acad Sci U S A. 1999;96(7):3819-23. Epub 1999/03/31. PubMed PMID: 10097121; PubMed

Central PMCID: PMCPMC22378.

199. Ramirez-Solis R, Zheng H, Whiting J, Krumlauf R, Bradley A. Hoxb-4 (Hox-2.6) mutant

mice show homeotic transformation of a cervical vertebra and defects in the closure of the

sternal rudiments. Cell. 1993;73(2):279-94. Epub 1993/04/23. PubMed PMID: 8097432.

200. Liu Y, Oppenheim RW, Sugiura Y, Lin W. Abnormal development of the neuromuscular

junction in Nedd4-deficient mice. Dev Biol. 2009;330(1):153-66. Epub 2009/04/07. doi:

10.1016/j.ydbio.2009.03.023. PubMed PMID: 19345204; PubMed Central PMCID:

PMCPMC2810636.

201. Hildebrand JD, Soriano P. Overlapping and unique roles for C-terminal binding protein 1

(CtBP1) and CtBP2 during mouse development. Mol Cell Biol. 2002;22(15):5296-307. Epub

2002/07/09. PubMed PMID: 12101226; PubMed Central PMCID: PMCPMC133942.

202. Valasek P, Theis S, DeLaurier A, Hinits Y, Luke GN, Otto AM, et al. Cellular and

molecular investigations into the development of the pectoral girdle. Dev Biol.

2011;357(1):108-16. Epub 2011/07/12. doi: 10.1016/j.ydbio.2011.06.031. PubMed PMID:

21741963.

66 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

203. Misgeld T, Burgess RW, Lewis RM, Cunningham JM, Lichtman JW, Sanes JR. Roles of

neurotransmitter in synapse formation: development of neuromuscular junctions lacking

choline acetyltransferase. Neuron. 2002;36(4):635-48. Epub 2002/11/21. PubMed PMID:

12441053.

204. Domyan ET, Branchfield K, Gibson DA, Naiche LA, Lewandoski M, Tessier-Lavigne M,

et al. Roundabout receptors are critical for foregut separation from the body wall. Dev Cell.

2013;24(1):52-63. Epub 2013/01/19. doi: 10.1016/j.devcel.2012.11.018. PubMed PMID:

23328398; PubMed Central PMCID: PMCPMC3551250.

205. Oh J, Takahashi R, Adachi E, Kondo S, Kuratomi S, Noma A, et al. Mutations in two

matrix metalloproteinase genes, MMP-2 and MT1-MMP, are synthetic lethal in mice.

Oncogene. 2004;23(29):5041-8. Epub 2004/04/06. doi: 10.1038/sj.onc.1207688. PubMed

PMID: 15064723.

206. Zhang B, Xiao W, Qiu H, Zhang F, Moniz HA, Jaworski A, et al. Heparan sulfate

deficiency disrupts developmental angiogenesis and causes congenital diaphragmatic hernia.

J Clin Invest. 2014;124(1):209-21. Epub 2013/12/21. doi: 10.1172/JCI71090. PubMed PMID:

24355925; PubMed Central PMCID: PMCPMC3871243.

207. Krieser RJ, MacLea KS, Longnecker DS, Fields JL, Fiering S, Eastman A.

Deoxyribonuclease IIalpha is required during the phagocytic phase of apoptosis and its loss

causes perinatal lethality. Cell Death Differ. 2002;9(9):956-62. Epub 2002/08/16. doi:

10.1038/sj.cdd.4401056. PubMed PMID: 12181746.

208. Hornstra IK, Birge S, Starcher B, Bailey AJ, Mecham RP, Shapiro SD. Lysyl oxidase is

required for vascular and diaphragmatic development in mice. J Biol Chem.

67 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

2003;278(16):14387-93. Epub 2002/12/11. doi: 10.1074/jbc.M210144200. PubMed PMID:

12473682.

209. Shi L, Zhao G, Qiu D, Godfrey WR, Vogel H, Rando TA, et al. NF90 regulates cell cycle

exit and terminal myogenic differentiation by direct binding to the 3'-untranslated region of

MyoD and p21WAF1/CIP1 mRNAs. J Biol Chem. 2005;280(19):18981-9. Epub 2005/03/05.

doi: 10.1074/jbc.M411034200. PubMed PMID: 15746098.

210. Mill P, Lockhart PJ, Fitzpatrick E, Mountford HS, Hall EA, Reijns MA, et al. Human and

mouse mutations in WDR35 cause short-rib polydactyly syndromes due to abnormal

ciliogenesis. Am J Hum Genet. 2011;88(4):508-15. Epub 2011/04/09. doi:

10.1016/j.ajhg.2011.03.015. PubMed PMID: 21473986; PubMed Central PMCID:

PMCPMC3071922.

211. Kim Y, Sharov AA, McDole K, Cheng M, Hao H, Fan CM, et al. Mouse B-type lamins

are required for proper organogenesis but not by embryonic stem cells. Science.

2011;334(6063):1706-10. Epub 2011/11/26. doi: 10.1126/science.1211222. PubMed PMID:

22116031; PubMed Central PMCID: PMCPMC3306219.

212. Arber S, Han B, Mendelsohn M, Smith M, Jessell TM, Sockanathan S. Requirement for

the homeobox gene Hb9 in the consolidation of motor neuron identity. Neuron.

1999;23(4):659-74. Epub 1999/09/11. PubMed PMID: 10482234.

213. Pacifici PG, Peter C, Yampolsky P, Koenen M, McArdle JJ, Witzemann V. Novel mouse

model reveals distinct activity-dependent and -independent contributions to synapse

development. PLoS One. 2011;6(1):e16469. Epub 2011/02/10. doi:

68 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

10.1371/journal.pone.0016469. PubMed PMID: 21305030; PubMed Central PMCID:

PMCPMC3031568.

214. Ibdah JA, Paul H, Zhao Y, Binford S, Salleng K, Cline M, et al. Lack of mitochondrial

trifunctional protein in mice causes neonatal hypoglycemia and sudden death. J Clin Invest.

2001;107(11):1403-9. Epub 2001/06/08. doi: 10.1172/JCI12590. PubMed PMID: 11390422;

PubMed Central PMCID: PMCPMC209324.

215. Sachs M, Brohmann H, Zechner D, Muller T, Hulsken J, Walther I, et al. Essential role

of Gab1 for signaling by the c-Met receptor in vivo. J Cell Biol. 2000;150(6):1375-84. Epub

2000/09/20. PubMed PMID: 10995442; PubMed Central PMCID: PMCPMC2150711.

216. Goshu E, Jin H, Fasnacht R, Sepenski M, Michaud JL, Fan CM. Sim2 mutants have

developmental defects not overlapping with those of Sim1 mutants. Mol Cell Biol.

2002;22(12):4147-57. Epub 2002/05/25. PubMed PMID: 12024028; PubMed Central PMCID:

PMCPMC133848.

217. Kurohara K, Komatsu K, Kurisaki T, Masuda A, Irie N, Asano M, et al. Essential roles of

Meltrin beta (ADAM19) in heart development. Dev Biol. 2004;267(1):14-28. Epub 2004/02/21.

doi: 10.1016/j.ydbio.2003.10.021. PubMed PMID: 14975714.

218. Laurin M, Fradet N, Blangy A, Hall A, Vuori K, Cote JF. The atypical Rac activator

Dock180 (Dock1) regulates myoblast fusion in vivo. Proc Natl Acad Sci U S A.

2008;105(40):15446-51. Epub 2008/09/30. doi: 10.1073/pnas.0805546105. PubMed PMID:

18820033; PubMed Central PMCID: PMCPMC2563090.

219. Meech R, Gonzalez KN, Barro M, Gromova A, Zhuang L, Hulin JA, et al. Barx2 is

expressed in satellite cells and is required for normal muscle growth and regeneration. Stem

69 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Cells. 2012;30(2):253-65. Epub 2011/11/15. doi: 10.1002/stem.777. PubMed PMID: 22076929;

PubMed Central PMCID: PMCPMC4547831.

220. Chevessier F, Girard E, Molgo J, Bartling S, Koenig J, Hantai D, et al. A mouse model

for congenital myasthenic syndrome due to MuSK mutations reveals defects in structure and

function of neuromuscular junctions. Hum Mol Genet. 2008;17(22):3577-95. Epub 2008/08/23.

doi: 10.1093/hmg/ddn251. PubMed PMID: 18718936.

221. Zvaritch E, Depreux F, Kraeva N, Loy RE, Goonasekera SA, Boncompagni S, et al. An

Ryr1I4895T mutation abolishes Ca2+ release channel function and delays development in

homozygous offspring of a mutant mouse line. Proc Natl Acad Sci U S A. 2007;104(47):18537-

42. Epub 2007/11/16. doi: 10.1073/pnas.0709312104. PubMed PMID: 18003898; PubMed

Central PMCID: PMCPMC2141812.

222. Li S, Czubryt MP, McAnally J, Bassel-Duby R, Richardson JA, Wiebel FF, et al.

Requirement for serum response factor for skeletal muscle growth and maturation revealed by

tissue-specific gene deletion in mice. Proc Natl Acad Sci U S A. 2005;102(4):1082-7. Epub

2005/01/14. doi: 10.1073/pnas.0409103102. PubMed PMID: 15647354; PubMed Central

PMCID: PMCPMC545866.

223. Nagata K, Kiryu-Seo S, Maeda M, Yoshida K, Morita T, Kiyama H. Damage-induced

neuronal endopeptidase is critical for presynaptic formation of neuromuscular junctions. J

Neurosci. 2010;30(20):6954-62. Epub 2010/05/21. doi: 10.1523/JNEUROSCI.4521-09.2010.

PubMed PMID: 20484637.

224. Uetani N, Chagnon MJ, Kennedy TE, Iwakura Y, Tremblay ML. Mammalian motoneuron

axon targeting requires receptor protein tyrosine phosphatases sigma and delta. J Neurosci.

70 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

2006;26(22):5872-80. Epub 2006/06/02. doi: 10.1523/JNEUROSCI.0386-06.2006. PubMed

PMID: 16738228.

225. Reinholt BM, Ge X, Cong X, Gerrard DE, Jiang H. Stac3 is a novel regulator of skeletal

muscle development in mice. PLoS One. 2013;8(4):e62760. Epub 2013/04/30. doi:

10.1371/journal.pone.0062760. PubMed PMID: 23626854; PubMed Central PMCID:

PMCPMC3633831.

226. Zhang P, Liegeois NJ, Wong C, Finegold M, Hou H, Thompson JC, et al. Altered cell

differentiation and proliferation in mice lacking p57KIP2 indicates a role in Beckwith-

Wiedemann syndrome. Nature. 1997;387(6629):151-8. Epub 1997/05/08. doi:

10.1038/387151a0. PubMed PMID: 9144284.

227. Hicks AN, Lorenzetti D, Gilley J, Lu B, Andersson KE, Miligan C, et al. Nicotinamide

mononucleotide adenylyltransferase 2 (Nmnat2) regulates axon integrity in the mouse embryo.

PLoS One. 2012;7(10):e47869. Epub 2012/10/20. doi: 10.1371/journal.pone.0047869.

PubMed PMID: 23082226; PubMed Central PMCID: PMCPMC3474723.

228. Xian J, Clark KJ, Fordham R, Pannell R, Rabbitts TH, Rabbitts PH. Inadequate lung

development and bronchial hyperplasia in mice with a targeted deletion in the Dutt1/Robo1

gene. Proc Natl Acad Sci U S A. 2001;98(26):15062-6. Epub 2001/12/06. doi:

10.1073/pnas.251407098. PubMed PMID: 11734623; PubMed Central PMCID:

PMCPMC64983.

229. Bladt F, Riethmacher D, Isenmann S, Aguzzi A, Birchmeier C. Essential role for the c-

met receptor in the migration of myogenic precursor cells into the limb bud. Nature.

1995;376(6543):768-71. Epub 1995/08/31. doi: 10.1038/376768a0. PubMed PMID: 7651534.

71 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

230. Catela C, Bilbao-Cortes D, Slonimsky E, Kratsios P, Rosenthal N, Te Welscher P.

Multiple congenital malformations of Wolf-Hirschhorn syndrome are recapitulated in Fgfrl1 null

mice. Dis Model Mech. 2009;2(5-6):283-94. Epub 2009/04/23. doi: 10.1242/dmm.002287.

PubMed PMID: 19383940; PubMed Central PMCID: PMCPMC2675798.

231. Hasty P, Bradley A, Morris JH, Edmondson DG, Venuti JM, Olson EN, et al. Muscle

deficiency and neonatal death in mice with a targeted mutation in the myogenin gene. Nature.

1993;364(6437):501-6. Epub 1993/08/05. doi: 10.1038/364501a0. PubMed PMID: 8393145.

72 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

SUPPORTING INFORMATION

Figure S1: Alternative allele read depth of somatic de novo variants is significantly lower than

germline de novo variants across CDH probands (A) and tissues (B).

Figure S2: The spectrum of de novo variants varies between the different CDH probands (A), but

does not vary between the three tissues sampled from the probands (B).

Table S1: Genes associated with CDH in the literature. Genes were collated from multiple studies,

including family and single patient case studies, large human patient sequencing studies and functional

studies using mouse genetics. A gene from human studies is included if multiple patients harbor

variants in the gene, the gene is associated with other implicated genes and expressed in the

developing diaphragm, and/or the gene is associated with other developmental disorders or structural

birth defects that co-occur in patients with complex CDH. A gene from mouse studies is included if a

variant in the gene leads to a diaphragm defect. Genes were then ranked based on the amount and

type of evidence implicating their contribution to CDH (see Materials and Methods). Literature cited [11-

17, 21, 23, 28, 30, 34-46, 48, 132-231]. Genes involved in GATA/TBX transcriptional network are

highlighted in blue; in Hedgehog signaling in cyan; in Retinoic Acid signaling in dark green; in

ROBO/SLIT signaling in purple; in MET signaling in orange; in WNT/bCATENIN signaling in yellow; and

in FGF signaling in pink and ECM proteins are in light green and in muscle-related transcription factors

or structural/functional genes in red.

Table S2: DNA samples of CDH probands and their parents are of high quality and expected

ancestry and relatedness. Sheet 1 reports results of Peddy analysis of heterozygosity and indicates

that sequences were of high quality and ancestry congruent with self-reported ancestry. Sheet 2 reports

73 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

results of Peddy sex analysis and shows that sex of DNA samples is congruent with known sex of

sample origins. Sheet 3 reports results of Peddy analysis of relatedness and shows that relatedness is

as expected.

Table S3: Germline de novo and somatic de novo variants identified by rufus in CDH probands.

True positive variants were identified by rufus and then filtered based on read depth (>15 reads total in

all samples compared in the rufus call), call quality (>20 GQ score) and alternate allele frequency

(sample of interest >20% alternate read frequency and other samples in comparison <20% alternate

read frequency). Sheet 1: summary of all de novo variants identified. Sheet 2: de novo coding variants

identified. Sheets 3-5: de novo gene-associated variants identified in diaphragm, skin, and blood.

Sheets 6-8: de novo non-gene associated variants identified in diaphragm, skin, and blood.

Table S4: Predicted gene damaging variants inherited from parents of CDH probands. VAAST3

recessive model comparing CDH patient families to 1000 Genome phase 3 individuals was used to call

predicted damaging, rare recessive variants. Genes harboring variants with a VAAST p-value <0.05

were saved as candidate genes potentially contributing to CDH (Sheet 2 shows unfiltered genes).

Genes were then further filtered based on expression in PPFs (genes needed to be expressed at > 10

TPM in PPFs), and pseudogenes and highly mutable genes were removed (Sheet 1 shows filtered

genes).

Table S5. Germline de novo and inherited structural variants possibly contributing to CDH.

Structural variants were identified using the Lumpy Smooth workflow. Germline de novo variants were

identified by querying variants unique to the CDH proband compared to the parents, filtered based on

reads that Lumpy scored as supporting the variant (>8 reads either split or discordant) and finally

visually inspected in IGV to determine whether parents had evidence of the same variant. Inherited

74 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

structural variants were filtered based on chromosomal regions highly associated with CDH [10] .

Structural variants were visually inspected and filtered out if allele frequency > 0.1 for inherited SVs or >

0.01 for de novo SVs [94], if SV was located in an intergenic region, or overlapped a repetitive element

[125].

FIGURE LEGENDS

Figure 1: Genes associated with CDH in the literature were collected and ranked based on the

amount of human CDH patient and mouse model functional evidence (see Table S1 for

rankings). A) CDH-associated genes overlaid on all genes expressed in mouse PPFs at E12.5. RNA-

seq reads normalized using TPM. Black line denotes TPM = 0 and red dashed line TPM = 10 (Log1).

Black dots, CDH genes highly supported by human and mouse data; dark grey dots, genes with

moderate data support; light grey dots, CDH genes with modest data support. B) STRING protein

network of CDH-associated genes. Within the STRING tool all active interaction sources except text-

mining was used, with a minimum required interaction score of 0.4. Edges are based on the strength of

data support. Nodes without edges are genes that do not interact with any other listed genes. Thick

edges, interaction score > 0.9; thinner edges, 0.9 - 0.7 interaction score; thinnest edges, interaction

score < 0.4. Prominent GATA/TBX transcriptional network, extracellular matrix, and muscle-related

genes are highlighted as well as Hedgehog, Retinoic Acid, ROBO/SLIT, MET, WNT/b-CATENIN, and

FGF signaling pathways. In both RNA-seq expression and protein network genes ranked 1-27 (with a

total score of > 10) are shown in black, genes ranked 28- 88 (with a total score 5-9) are shown in dark

gray, and genes ranked 89- 153 (with a total score <5) are shown in light gray. Details on rankings are

described in Table S1 and Materials and Methods.

75 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Figure 2: Sample cohorts analyzed in this study. Each cohort consists of DNA samples from

diaphragm sac (yellow), skin (blue), and blood (red) of CDH proband and blood from proband’s mother

and father.

Figure 3: Germline and somatic de novo variants, identified via RUFUS, in CDH probands. A) De

novo germline and somatic variants unique to a single tissue and shared between two tissues (coding,

gene-associated non-coding, or non-gene associated variants) were found in the 4 CDH probands. 5

genes (PEX6, SCARB1, OLFM3, ZNF792, AR, and THSD7A) contained germline coding variants.

Genes with coding variants are highlighted in red. #/#/# = exon (coding) variants/ UTR+intron variants/

intergenic variants. B) Five genes with germline de novo coding variants overlaid on all genes

expressed in mouse PPFs at E12.5 (ZNF792 has no mouse ortholog). RNA-seq reads normalized

using TPM. Black line denotes TPM = 0 and red dashed line TPM = 10. Only SCARB1, PEX6, and

THSD7A are expressed in the PPFs at approximately > 10 TPM.

Figure 4: Inherited variants potentially contributing to etiology of CDH probands. A) Compound

heterozygous (boxed in dark purple) or homozygous (boxed in light purple) inherited variants highly

ranked as damaging and rare by VAAST, with pseudogenes, highly mutable genes, and genes

expressed at low levels in PPFs filtered out. B) Candidate genes with inherited variants overlaid on all

genes expressed in mouse PPFs at E12.5. RNA-seq reads normalized using TPM. Black line denotes

TPM = 0 and red dashed line TPM = 10.

Figure 5: Inherited deletion in intron 2 of GATA4, in a region which overlaps an enhancer

conserved in mouse and humans, found in CDH Probands 411 and 967. A-D) Samplot figures

show coverage in grey (right y-axis) and Insert size between pair-end reads (left y-axis), with split and

discordant reads with mis-matched insert size above 500 shown as bars. Paternal inheritance in

76 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Bogenschutz et al.

Proband 411 (A) and maternal inheritance in Proband 967 (D) of intron 2 deletion. E) GATA4 intron 2

and 343 bp deleted region (red box) with tracks of gene location, conservation, DNase 1

hypersensitivity and H3K27 acetylation in human lung fibroblasts. Bottom track shows enhancer

element 2205 from the VISTA enhancer database within which the deletion resides. Built on the hg19

human reference genome. F) Pairwise alignment of 343 bp region deletion in CDH families to the

homologous region in the mouse genome (mm9 reference genome). Track 1, consensus sequence;

track 2, base pair identity; track 3, human (hg19) sequence; track 4, mouse minus strand sequence.

Figure 6: Models of four CDH probands with inherited, germline and somatic de novo variants,

including coding, gene-associated (but non-coding), and non-gene associated SNVs, indels,

and SVs. Each model highlights the genetic complexity and heterogeneity underlying CDH. Z, zygote;

SD, skin-diaphragm progenitor; SB, skin-blood progenitor; DB, diaphragm-blood progenitor; S, skin cell;

D, diaphragm cell; B, blood cell.

77 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2020.04.03.024398; this version posted April 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.