University of Nevada, Reno

Toward Understanding the Genetic Basis of Cross-Incompatibility in Sorghum: de novo genome Assembly of Johnsongrass and Resequencing of Iap and BAM1 loci

A Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Cellular and Molecular Biology

By

Julia N. Trowbridge

Melinda Yerka/Thesis Advisor

December 2019

THE GRADUATE SCHOOL

We recommend that the thesis prepared under our supervision by Julia Trowbridge

Entitled

de novo genome Assembly of Johnsongrass and Resequencing of Iap and BAM1 loci

be accepted in partial fulfillment of the requirements for the degree of Master of Science

Dr. Melinda Yerka, Advisor

Dr. Felipe Barrios-Masias, Committee Member

Dr. David Alvarez-Ponce, Committee Member

Dr. Jeff Harper, Graduate School Representative

David W. Zeh, Ph.D., Dean, Graduate School

December 2019

ABSTRACT

Sorghum [Sorghum bicolor (L.) Moench] (referred to as “sorghum” hereafter) is a C4 grain crop in the grass family Poaceae. It is closely related to other members of sub-family Panicoideae, including the staple crops maize [Zea mays L.] and rice [Oryza sativa L.], and is the fifth-most produced cereal crop in the world31,33,72. The U.S. leads production of sorghum globally34. Johnsongrass [Sorghum halepense N. Steud de Wet] is a noxious weed in 46 states in the United States and is often found growing in close proximity to sorghum, where it has been shown to contaminate harvested seed through pollen-mediated gene flow. The risk of gene flow from sorghum to Johnsongrass is the primary reason why GE sorghum has not been approved for commercialization by USDA-APHIS (personal communication, Dr. Subray Hegde, USDA Biotechnology Regulatory Services branch chief in the Biotechnology Risk Analysis Program).

Many efforts71,73-77 have been made to determine the rate of gene flow between sorghum and Johnsongrass to empirically assess the risk of sorghum traits transferring to feral Johnsongrass populations, but these studies have used limited numbers of accessions from both species, and the lack of high-throughput genotyping methods and of a high-quality Johnsongrass reference genome has led to inconsistent results. Given the polyploid history of Johnsongrass (a putative allotetraploid [2n = 4x = 40]) and the close relationship between it and sorghum (S. bicolor is one of its ancestral genomes), this risk is not insignificant.

In order to determine the frequency of sorghum alleles segregating in regional feral Johnsongrass populations, an assembled and annotated Johnsongrass reference genome is needed to identify species-specific alleles, and their copy number, that may differ from those in the existing, well-annotated sorghum reference genome. Local rates of gene flow are needed because different sorghum genotypes and production methods are used in different geographies, and both factors could impact rates of reproductive success, genetic drift, or the fixation of crop alleles.

This thesis provides the basic genomic framework necessary to assist in NGS-based inquiries into the ancestry, speciation, and comparative genomics of Johnsongrass and sorghum. We completed the first Johnsongrass de novo genome assembly and amplified, through long-read resequencing, the putative reproductive barrier loci (Inhibition of Alien Pollen, Iap, and Barely Any Meristem, BAM1)64 in Johnsongrass that are known to impact rates of gene flow among Sorghum species and closely related genera (Zea and Saccharum).

This new Johnsongrass reference genome and targeted resequencing data will greatly facilitate population genetic studies aimed at clarifying empirical rates of gene flow between sorghum and Johnsongrass specifically, and within the Sorghum species complex generally. They will additionally assist with genetic and physiological investigations into the roles of key loci involved in processes of speciation and reproductive isolation.


ACKNOWLEDGMENTS

Knowledge comes in many forms, and I would like to thank those who have expanded mine through readings, hands-on activities, or conversations. To my new friends Devin Smith and John (Jeep) Baggett, you made grad school bearable and even fun. To my fellow graduate students, especially Haley Toups and Chrystle Weigand, thank you for taking time out of your own research to teach me various lab/greenhouse skills, all while being the most positive and kind people I’ve met. To Jason, thanks for waiting around through countless hours of lab work, reading, and stress, and for listening to my fascination with my project, even when you didn’t quite understand what I was saying. To my Mom, thank you for the belief, support, and pride you have in me. To my advisors, professors, and other researchers, your wealth of knowledge never ceases to amaze and intimidate me in an inspiring way. Your feedback was and is always fair and challenging, and a growing experience that I have appreciated as I become more confident in my own research ability and work ethic.


Table of Contents

1. List of Figures v

2. Chapter I: Literature Review 1

a. Introduction 1

b. Zea cross-incompatibility 4

c. PMEs modulate the stiffening and loosening of cell walls 6

d. Sorghum cross-incompatibility 9

3. Chapter II: Assembly of the Johnsongrass [Sorghum halepense N. Steud de Wet] genome 13

a. Introduction 13

b. Materials and Methods 15

c. Results 25

d. Discussion 30

4. Supplemental Information 34

5. References 72


List of Figures

1. Figure 1. Model of Zea Pectin Methylesterases: Roles of Ga1, Ga2 and Tcb1 Allele Types in Pollen-Pistil Interactions .......... 8-9

2. Figure 2. Synteny between Chromosome 5 in Sorghum bicolor and the Zea mays reference genome, including the Ga1 locus on Chromosome 2 and additional sections of Chromosome 4. Sorghum appears to retain the ancestral Poaceae locus that was divided following an ancient tetraploidization event in Zea. Homologous regions are shown with connecting strands. Repeats are indicated in orange; the positive strand genes are indicated in blue and the negative strand genes are indicated in green .......... 10

3. Figure 3. Representative Johnsongrass Seedlings after One Month of Growth. Meter stick for length reference under control (A) or water deficit (B) conditions. The control condition was 200 ml ± 20 ml SD daily nutrient solution whereas the water deficit condition was 50 ml ± 5 ml SD daily nutrient solution .......... 21

4. Figure 4. 0.4% Agarose Gel with Johnsongrass Amplicons for Iap and BAM1, Putative Cross-Incompatibility Loci. Chr02:2144633..2160696: Iap full-length amplicon containing five candidate genes; Chr02:2144633..2150496: sub-region of Iap containing two candidate genes expressed in floral tissues at anthesis, Sobic.002G023300.1 and Sobic.002G023400; Chr02:2550778..2556242: amplicon containing the full-length BAM1 gene (Sobic.002G027600.1) .......... 25

5. Figure 5. Heatmap of the Johnsongrass De Novo Genome Assembly. Blue boxes denote scaffolds and green boxes denote contigs. The red diagonal lines next to most pairs of green boxes are the result of synteny between those contigs. The gap in the diagonal red line appears to correspond to one chromosome that is more complex, with more polyploidy, heterozygosity, or both than the other chromosomes. It is likely that this unresolved heterozygosity is the reason that only 36 chromosomes (scaffolds) were detected even though Johnsongrass has 40 chromosomes. Further cytology and advanced computational investigations will be needed to resolve this .......... 26

6. Figure 6. Ordering Metrics for the Johnsongrass De Novo Genome Assembly. A total of 36 finished scaffolds (chromosomes) were detected, with a scaffold N50 of 43,822,357 bp .......... 27

7. Figure 7. Phylogenetic Tree of BAM1. The neighbor-joining method and distance corrections were applied in MUSCLE to the coding region of the BAM1 gene in sorghum and in the Poaceae species maize, rice, and Johnsongrass, with Arabidopsis as the outgroup .......... 29

8. Figure 8. Phylogenetic Tree of Sobic.002G023300. The neighbor-joining method and distance corrections were applied in MUSCLE to the coding region of the gene Sobic.002G023300 in sorghum and in the Poaceae species maize, rice, and Johnsongrass, with Arabidopsis as the outgroup .......... 29


LITERATURE REVIEW

Introduction

Sorghum [Sorghum bicolor (L.) Moench] (referred to as “sorghum” hereafter) is a C4 grain crop in the grass family Poaceae. It is closely related to other members of sub-family Panicoideae, including the staple crops maize [Zea mays L.] and rice [Oryza sativa L.], and is the fifth-most produced cereal crop in the world31,33,72. The U.S. is the largest global producer34, with most production going to animal feed, although specialty markets are leading to increased adoption of commercial varieties for gluten-free bread flour, syrup, popping, alcoholic beverages, and biofuels. Sorghum is widely marketed as a gluten-free, non-GMO ancient grain35. The genus Sorghum contains 25 species within five clades: Eu-sorghum, Heterosorghum, Parasorghum, Stiposorghum, and Chaetosorghum11,12,13. Sorghum, S. bicolor, is in the Eu-sorghum clade along with its weedy relatives, which include its conspecific de-domesticated relative, shattercane [S. bicolor (L.) Moench ssp. drummondii (Nees ex Steud.) De Wet ex Davidse]; the noxious weed, Johnsongrass [Sorghum halepense N. Steud de Wet]; and [S. propinquum S. Kunth Hitchc]16,17. Current genetic evidence indicates that Johnsongrass (2n = 4x = 40) is an allopolyploid descendant of S. bicolor (2n = 2x = 20) and S. propinquum (2n = 2x = 20)31, hence the observed cross-compatibility between feral sorghum and Johnsongrass15.

The S. bicolor genome has already been assembled31 and is well-annotated, making genetic investigations of both sorghum and shattercane cross-compatibility and incompatibility possible. However, Johnsongrass is a much more problematic weed for agricultural production throughout the world due to its perennial growth and aggressive rhizomes, and there is currently no genome assembly available.

The focus of this thesis is to provide sufficient information about the Johnsongrass genome to enable the first-ever genetic and physiological studies of cross-compatibility and incompatibility in the Sorghum genus, with the long-term objective of identifying or engineering novel cross-incompatibility systems to minimize gene flow among crops and weeds. Genetic isolation, through understanding of reproductive barriers, is needed to prevent the unintended transfer of crop traits to weeds, and to protect sorghum ovules carrying high-value specialty endosperm traits (e.g., improved protein digestibility, popping, and high-amylopectin “waxy” starch, an ideal ethanol feedstock due to improved starch digestibility) from adventitious cross-pollination by wild-type or weed pollen, which could reduce quality and end-use functionality.

Reproductive barriers have been widely deployed in maize breeding by using cross-incompatibility to genetically isolate popcorn ovules from dent maize pollen to protect endosperm quality traits such as popping ability. Crops lose significant yield and revenue yearly due to the lack of barriers between compatible varieties. According to the Weed Science Society of America, based on data from 2007 to 2015, sorghum yield loss averaged 47.8% per year, even when farmers used best-management practices, due to the lack of effective post-emergence grass herbicides available for weed control. This is an estimated loss of $817 million22. Grass weeds (especially shattercane and Johnsongrass) reduce sorghum yield by competing for nutrients, water, and sunlight, and by contaminating harvested seed (including reducing its quality through hybridization with the crop)64.

Johnsongrass is a noxious weed in 46 states in the United States and is often found in close proximity to sorghum14,15,23. Arriola and Ellstrand (1996)18 reported hybridization between sorghum and Johnsongrass as not totally predictable; however, the 11% outcrossing rate suggested by Doggett (1988)19 was fairly consistent within their study. Actual rates of gene flow from sorghum to Johnsongrass are beyond the ability of current genetic investigations to detect because S. bicolor is one of the ancestors of Johnsongrass, which makes identifying species-specific alleles challenging24. De novo assembly of the Johnsongrass genome is a strategy to help resolve the conserved and unique regions among the S. bicolor and S. halepense genomes, to assist in clarifying crop-weed hybridization rates in feral Johnsongrass populations, and thereby to assess the risk of crop-to-weed transfer of any future GE traits through pollen-mediated gene flow. Such information is highly desirable to federal policymakers when considering the commercialization of new GE or gene-edited traits of sorghum (personal communication, Dr. Subray Hegde, USDA Biotechnology Regulatory Services branch chief in the Biotechnology Risk Analysis Program).

Sorghum is a primarily self-pollinated species, but sufficient outcrossing occurs (18-30%, depending on panicle architecture and distance from the pollen source)32 to enable wind-pollinated production of commercial hybrid seed. While the Gametophytic factor 1 (Ga1) prezygotic reproductive barrier is used to isolate specialty maize varieties such as popcorn from wild-type varieties such as dent corn, there are currently no well-characterized prezygotic reproductive barriers to limit gene flow between sorghum populations and the crop’s Eu-sorghum weedy relatives residing within its primary gene pool. An estimated $1 billion economic advantage could be derived from GE sorghum if not for the considerable risk of gene flow of GE traits to Johnsongrass15,20. In order to better understand the extent of naturally occurring gene flow between sorghum and Johnsongrass, and to identify the underlying mechanisms facilitating cross-compatibility, two things are needed: 1) a suitable Johnsongrass reference genome, and 2) improved characterization of known prezygotic reproductive barrier loci in these closely related species. Understanding the evolution of native cross-incompatibility systems in closely related species is the optimal way to identify both the key genetic players in cross-compatible or -incompatible interactions, and the genomic modifiers of their expression. The following is a description of known loci impacting cross-incompatibility in Zea (the best-characterized genus among the Panicoideae) and Sorghum.

Zea cross-incompatibility

Gametophytic factor 1 (Ga1) and Teosinte crossing barrier 1 (Tcb1) are located on chromosome 4 and Gametophytic factor 2 (Ga2) is located on chromosome 5; however, the genes have not been fully annotated1,2,30. Ga1 is the most well-studied and is the primary barrier in the reproductive isolation of dent corn and popcorn. Popcorn varieties can utilize Ga1-s to stop the pollen tubes of ga1 dent corn from reaching the popcorn ovary79. Ga1 has three known alleles (Ga1-s, ga1, and Ga1-m) with different functions. The ga1 allele is a null allele with no female or male function: ga1 pistils exclude no pollen of any Ga1 allele type, and ga1 pollen cannot grow in pistils carrying the strong Ga1-s allele. The Ga1-s allele excludes non-Ga1-s pollen with the exception of Ga1-m pollen (Figure 1.A). Ga1-m is able to pollinate Ga1-s, but due to a 2-bp insertion leading to a premature stop codon, Ga1-m can be pollinated by ga1 pollen (Figure 1.B)3,4. Recently, all maize cross-incompatibility regions have been linked to pectin methylesterase genes, ZmPME3 and ZmGa1P/ZmPME10-1 complexes3,4.
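The Ga1 pollen-pistil interactions described above can be summarized as a small lookup table. The following Python sketch is illustrative only and is not part of the thesis methods: the names `ALLELES`, `REJECTED_POLLEN`, and `is_compatible` are hypothetical, and the entries for Ga1-m pistils receiving Ga1-s or Ga1-m pollen are an assumption, since the text states only that Ga1-m pistils can be pollinated by ga1 pollen.

```python
# Illustrative lookup of Ga1 pollen-pistil compatibility as summarized in the text.
# All names are hypothetical; this is a sketch, not code from the thesis.

ALLELES = ("ga1", "Ga1-s", "Ga1-m")

# For each pistil allele type, the set of pollen allele types it excludes.
REJECTED_POLLEN = {
    "ga1": set(),      # null allele: no female barrier function
    "Ga1-s": {"ga1"},  # strong allele: excludes ga1 pollen, accepts Ga1-s and Ga1-m
    "Ga1-m": set(),    # cannot exclude ga1 pollen; acceptance of Ga1-s and
                       # Ga1-m pollen is an assumption made for this sketch
}


def is_compatible(pistil: str, pollen: str) -> bool:
    """Return True if pollen of the given allele type can grow in the pistil."""
    if pistil not in REJECTED_POLLEN or pollen not in ALLELES:
        raise ValueError(f"unknown allele type: {pistil!r} or {pollen!r}")
    return pollen not in REJECTED_POLLEN[pistil]


if __name__ == "__main__":
    # Print the full 3 x 3 crossing table.
    for pistil in ALLELES:
        for pollen in ALLELES:
            outcome = "compatible" if is_compatible(pistil, pollen) else "incompatible"
            print(f"{pistil} pistil x {pollen} pollen: {outcome}")
```

Storing only the rejected pollen types per pistil keeps the single incompatible combination (ga1 pollen on a Ga1-s pistil) explicit and makes it easy to extend the table to Tcb1 or Ga2 if their interactions are encoded the same way.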

When ga1 pollen lands on a Ga1-s stigma, ZmPME3 is activated, expressing PME38 and stiffening the pollen tube to a rigidity that arrests growth and prevents fertilization (Figure 1.C). When pollen containing the Ga1-s allele lands on a matching Ga1-s stigma, it is hypothesized that either (1) an inhibitor of pectin methylesterase (a PMEI) is released or (2) PMEs change the pH within the transmitting tract, attracting pectate lyases and polygalacturonases3,4,30,36, which enable pollen growth through the transmitting tract and into the ovary (Figure 1.D).

Tcb1 and Ga2 are less well-characterized than the Ga1 region, but each utilizes a similar model to Ga1. Both confer a recognition system having one allele that can overcome the respective barrier: Tcb1-s and Ga2-s are the ‘strong’ allele types that reject pollen not containing the matching allele (Figure 1.E,G), and Tcb1-m and Ga2-m overcome the strong allele type but fail to reject the recessive (null) alleles tcb1 and ga2 (Figure 1.F,H)9. Ga1 and Tcb1 are loosely linked: Tcb1-s is nonreceptive to Ga1-s and ga1 pollen, but Tcb1-m is receptive to all pollen, including all Ga1, Ga2, and Tcb1 allele types1,2,7,8.

The rejection of pollen by Tcb1-s silks has been linked to the gene Pertunda, which encodes a PME38 homologous to that at the Ga1 locus, differing by only 9 amino acids. Alignments suggest that the two genes, Pertunda and ZmPME3, are conserved cross-incompatibility genes because the two were present in ancestral forms before the split of teosinte from maize37. In a recent study, differentially expressed genes at the Ga2 crossing barrier locus were analyzed using a transcriptomic approach between compatible and incompatible crosses in pollinated and unpollinated maize silks. The common genes upregulated during pollination included those related to cell wall modification: PMEs, PMEIs, expansins, and pectate lyase superfamily proteins, specifically PME45, PME43, PMEI14, and cellulose mannose endo-1,4-beta-mannosidases30. The following is an elaboration of the putative roles for each of these molecular players in Zea mays reproductive isolation.

PMEs modulate the stiffening and loosening of cell walls.

Cell walls are composed of three major layers: the primary cell wall, middle lamella, and secondary cell wall providing tensile strength and protection. These layers are made up of microfibrils, hemicellulose, lignin, soluble proteins and pectin38. Pectin is a group of polysaccharides within the cell matrix that includes homogalacturonan (HG), xylogalacturonan (XGA), apiogalacturonan, rhamnogalacturonan I (RGI), and rhamnogalacturonan II in varying ratios, that contribute to the dynamics of the cell wall growth40. The organization and activity of polysaccharides, enzymes, calcium gradients, secretory proteins, and actin cytoskeletons maintain the tensile strength and integrity of the cell wall while allowing for the rapid elongation of growing pollen tubes39.

Homogalacturonan is thought to be synthesized in the Golgi apparatus and transported to the cell wall in a highly methyl-esterified state, where it serves as the backbone of all pectins except RGI36,39,40,41. HGs are not distributed evenly throughout the cell wall and do not have consistent patterns of methyl-esterification36,42,43. Differences in methyl-esterification alter the rigidity of the cell wall, and methyl-esterification is a tightly regulated process. Pectin methylesterases are enzymes that de-esterify HG, leading to either cell wall loosening or cell wall stiffening27. It is thought that PMEs can act either linearly or randomly on HG. When acting randomly, they release protons, shifting the pH and attracting pectin-degrading enzymes such as polygalacturonases, leading to cell wall loosening. This pH change can also act as a feedback loop, either stimulating or hindering PME activity depending on the isoform27,28,29,36,43. When acting linearly, PMEs free carboxyl groups in tandem, allowing calcium to create linkages that stiffen the cell wall6,27,28,29,36. Cross-incompatible reactions are known to be ancient pathogen defense mechanisms and often function similarly to other defense mechanisms such as those elicited by mechanical damage or herbivory51,52; the role of PME is no exception. The release of the methyl group participates in cell signaling, activating methanol-inducible genes, altering cytosolic calcium availability, and inducing defense mechanisms against biotic or abiotic stress49,50.


Figure 1. Model of Zea Pectin Methylesterases: Roles of Ga1, Ga2 and Tcb1 Allele Types in Pollen-Pistil Interactions. (A) Ga1-s pistils are cross-incompatible with pollen carrying the null allele ga1; they attack it by secreting a defense secretory pectin methyl esterase enzyme (PME38). (B) PME38 de-esterifies pectin, freeing a carboxyl group and opening a position for calcium to cross-link, leading to cell wall stiffening and growth retardation. (C) Ga1-m pistils contain an inactive ZmPME3 gene and do not produce PME38, allowing pollen carrying the ga1 null allele to grow normally. (D) ga1, Ga1-m and Ga1-s pollen each modulate cell wall growth through hydrogen ions, lowering apoplastic pH36,43,51 to attract pectate lyase. The mechanism of the putative ZmGa1P/ZmPME10-1 complex, present only in Ga1-m and Ga1-s pollen4, on apical tip cell wall growth is not yet known, but it is thought to maintain normal methyl-esterification and therefore rigidity, similar to panel D. (E,F) Tcb1-s pistils confer cross-incompatibility with pollen carrying the null allele tcb1, attacking it through a homolog of defense secretory PME38. Tcb1-m pollen can overcome this barrier, but its pistil cannot exclude tcb1 pollen. (G,H) Less is known about the Ga2 barrier, but PME45/43 and PME inhibitor (PMEI) 14 are active during cross-incompatible pollinations and are thought to act similarly to the Ga1 and Tcb1 barriers shown here.

Sorghum cross-incompatibility

While synteny exists between the maize and sorghum reference genomes at the maize Ga1 locus, genes within the locus do not appear to have the same function in sorghum as in maize. The distribution of the ancestral Poaceae Ga1 locus between two chromosomes in Zea mays provides evidence that its evolution as a reproductive barrier followed the divergence of the two species’ genomes ~11 million years ago and a whole-genome duplication event in Zea mays ~4 million years ago, which may have enabled the evolution of new functions. Moreover, no candidate genes at this locus in sorghum are PMEs or PMEIs; and while there is a PMEI elsewhere on this chromosome, it is not expressed during anthesis78.

Figure 2. Synteny between Chromosome 5 in Sorghum bicolor and the Zea mays reference genome, including the Ga1 locus on Chromosome 2 and additional sections of Chromosome 4. Sorghum appears to retain the ancestral Poaceae locus that was divided following an ancient tetraploidization event in Zea. Homologous regions are shown with connecting strands. Repeats are indicated in orange; the positive strand genes are indicated in blue and the negative strand genes are indicated in green.

Sorghum is known to have a prezygotic reproductive barrier locus, Iap, that is unrelated (based on synteny analysis) to known Zea loci. The Iap locus controls two distinct phenotypes and, when homozygous recessive, allows intra- and extrageneric pollen to grow normally to the ovary. In cases where pollen tubes reach the ovary, ploidy appears to be the primary determinant of whether fertilization occurs. Fine mapping of Iap10 identified three genes of interest at the Iap locus within a 48-kb region7. Maize and wild Sorghum spp. pollen placed on wild-type Iap sorghum pistils showed stunted germination and inhibited growth patterns similar to those observed during incompatible pollen-pistil interactions conferred by the Ga1, Ga2, and Tcb1 loci in maize, rarely (<0.1%) reaching the sorghum ovary. Pollen from both maize and wild Sorghum species (including Johnsongrass) displays a range of genotype-specific compatibility and incompatibility with sorghum pistils13,14,18, which is overcome (i.e. cross-compatibility is restored) when sorghum pistils carry homozygous recessive (iap iap) alleles10.

While it is widely known that the dominant wild-type allele of the Iap locus of sorghum prevents foreign pollen tube growth into sorghum pistils (confers incompatibility), and that the recessive allele (iap) enables foreign pollen tube growth to the ovary (confers compatibility)13,47,48, it is unknown which gene(s) at the Iap locus are involved. The homozygous recessive iap grain sorghum inbred line Tx3361 was developed at Texas A&M University using the distance traveled by maize pollen tubes into sorghum pistils as the selected phenotype46. Similarly, when pollen from various wild Australian Sorghum species was placed on sorghum pistils, multiple points of growth inhibition (i.e. adhesion, germination, active elongation) and different growth morphologies (exiting the transmitting tract, bent, arrested) were observed7, underscoring the likely quantitative nature of cross-incompatibility in the Sorghum complex. These results indicate the existence of reproductive barriers leading up to fertilization that act physiologically to inhibit foreign pollen germination and growth into the sorghum pistil during cross-incompatible interactions (which are very likely controlled by genes within the Iap and/or BAM1 loci and could be studied in vivo).

Our investigations of candidate genes at Iap identified one gene (Sobic.002G023300.1) encoding a cysteine-rich secretory defense-related protein that is highly expressed in floral tissues at anthesis. Cysteine-rich secretory proteins (CRISPs) play a role in host-recognition signaling and are involved in both the activation of defense-related mechanisms and pre-zygotic reproductive barriers53,54,55. One such CRISP in Arabidopsis, Wak163, has a role in both mechanisms by functioning as a potential sensor of cell wall signaling by directly binding to the calcium that is cross-linked when PMEs de-methylate homogalacturonan. Based on its location within the fine-mapped region identified by Gill et al.10, its expression profile in floral tissues at anthesis based on the Panther Classification System68, and an evolutionarily conserved role in pathogen defense, Sobic.002G023300.1 merits future investigations for its role, if any, in cross-incompatibility in sorghum.

Investigation of BAM1, originally thought to be within the Iap locus, revealed that it encodes a CLAVATA1-related leucine-rich repeat receptor-like kinase, implicated to have an evolutionary role in microsporogenesis signaling pathways69. In Arabidopsis, BAM1 and BAM2 (having 80.8% identity with BAM1) are suggested to have an early role in somatic cell fates through ovule specification and function, and male gametophyte and anther development. Anther development in a bam1bam2 mutant does not form all cell layers, resulting in a disorganized cell division pattern. The bam1bam2 mutants also have dramatically reduced meristem sizes, but it is suggested that this is independent of altered anther development70. BAM1 merits future investigations for its original placement within the Iap locus when fine-mapped and for its known role in anther development and ovule specification. In addition, the Panther Classification System68 confirms that sorghum BAM1 is expressed in floral tissues during floral initiation.


CHAPTER II

Introduction

The genus Sorghum contains 25 species distributed among five clades: Eu-sorghum, Heterosorghum, Parasorghum, Stiposorghum, and Chaetosorghum11,12,13. Sorghum is an annual Poaceae species and the fifth-most produced grain crop in the world. Grain sorghum is in the Eu-sorghum clade along with its weedy relatives Johnsongrass [Sorghum halepense N. Steud de Wet], shattercane [S. bicolor (L.) Moench ssp. drummondii (Nees ex Steud.) De Wet ex Davidse], and [S. propinquum S. Kunth Hitchc]16,17. Current genetic evidence indicates that Johnsongrass (2n = 4x = 40) is an allopolyploid descendant of sorghum (2n = 2x = 20) and S. propinquum (2n = 2x = 20)15,31.

Johnsongrass is considered a noxious weed in 46 states in the United States and in many other parts of the world due to its perennial growth and aggressive rhizomes.

The close relationship between Johnsongrass and sorghum is challenging in that crop species. Weedy relatives pose a threat to plant breeding companies, which spend many millions of dollars developing traits that may transfer to weeds through pollen-mediated gene flow, and to federal regulatory agencies, which must assess the potential environmental impacts of new crop trait commercialization. These challenges are compounded when crops carry herbicide-resistance traits, as gene flow provides an initial source of herbicide-resistance alleles in weed populations upon which herbicide selection will act to favor both the crop and crop-weed hybrids, leading to contamination of seed lots in breeding companies and reduced end-use quality of seed produced on farms.


The genetic diversity and drought tolerance of sorghum have international relevance in its use as a climate-resilient bioenergy crop; however, biotechnology improvements are limited due to concerns about pollen-mediated gene flow to weedy relatives. Andrew Paterson estimated that an economic advantage of $1 billion could be derived from GE sorghum if not for the risk of pollen-mediated gene flow of GE traits to Johnsongrass15,20. Schmidt et al. (2013) reported 0-16% outcrossing from sorghum to shattercane depending on distance from the pollen source32. Arriola and Ellstrand (1996)18 reported outcrossing between sorghum and Johnsongrass as somewhat variable in frequency, but consistent with the 18-30% frequency suggested by Doggett19. Actual rates of gene flow to Johnsongrass are beyond the ability of current genetic investigations to detect because the ancestral relationship of sorghum to Johnsongrass makes identifying species-specific alleles challenging24. Availability of a reference genome for Johnsongrass will help identify conserved and unique regions among the S. bicolor and S. halepense genomes.

In order to better understand the extent of naturally occurring gene flow between sorghum and Johnsongrass, and to develop better weed control tactics for this aggressive weed, a Johnsongrass reference genome is needed. A reference genome will assist in clarifying empirical rates of gene flow under different cropping systems to help identify those systems that are most effective at reducing rates of gene flow or mitigating it through appropriate crop and herbicide rotation practices. A reference genome will also assist in understanding both sexual and asexual (i.e. through rhizomes) reproductive biology of Johnsongrass, which could lead to novel weed control strategies that are species-specific and safer for the environment and soil health than nonselective herbicides or repeated tillage. The objective of this research was to complete the first de novo genome assembly of Johnsongrass to assist researchers with these two grand challenges.

Materials and Methods

High molecular weight DNA Extraction. Long-read sequencing for genome assembly is optimized through the extraction of high molecular weight (HMW) genomic DNA. Longer DNA fragments allow for easier identification of overlapping regions for de novo genome assembly. DNA-binding beads, such as SeraMag beads, can be a low-shearing method to bind long fragments of DNA, skipping the centrifugation steps common in traditional CTAB, DNAzol, Trizol, or kit-based extractions. A low-shearing HMW gDNA extraction protocol was developed to support the long-read-based genome assembly using a modified protocol for sunflower [Helianthus annuus ssp. jaegeri] as follows. To generate sufficient quantities of HMW DNA suitable for de novo genome assembly, immature Johnsongrass leaves from a single plant were collected, flash-frozen in liquid nitrogen, and stored at -80°C until DNA extraction. The sample was gently ground to a fine powder in a chilled mortar and pestle with liquid nitrogen while avoiding excessive grinding, which could lead to DNA shearing. Approximately 150 mg of ground tissue was added to 1200 µl of preheated lysis buffer67 in a 2 ml Eppendorf tube and inverted 20 times. RNaseA (QIAGEN, Hilden, Germany; 8 µl) was added and the tube was again inverted 20 times. The sample was incubated for 1 hour at 50°C, inverting 20 times every 10 min. Chilled potassium acetate (5M, 400 µl) was added and the solution was mixed by inverting 20 times, then placed at 4°C for 15 min to remove proteins. After incubation, the sample was centrifuged at 4°C at 5000 × g for 10 min. The supernatant (500 µl) was carefully transferred to a new 2-ml tube without disturbing the leaf debris and proteins. Binding buffer and SPRI beads solution (500 µl each) were added and mixed by inverting for 10 min. The tube was placed on a magnetic rack until the solution became clear, and the supernatant was removed. The beads were washed by adding 2 ml of 70% (v/v) ethanol and incubating at 20°C for 1 min, and the ethanol was removed by pipetting without removing the tube from the magnetic rack. The wash step was repeated once more. The tube was removed from the magnetic rack and the DNA was dried for 20 min by leaving it uncapped at 20°C to evaporate the ethanol. Prewarmed elution buffer (55 µl; QIAGEN Buffer EB) was added and the beads were resuspended by gently flicking the tube and incubating overnight at room temperature. The tube was placed on the magnetic rack until the solution was clear, and 50 µl of the elution buffer containing the HMW DNA was transferred to a new 1.5 ml Eppendorf tube. Absorbance ratios (A260/280 and A260/230) were checked on a Nanodrop® spectrophotometer (NanoDrop 2000; ThermoFisher, Waltham, MA, USA). The molecule size and quality of the HMW DNA were evaluated at the University of California-Davis Genome Center using pulsed-field gel electrophoresis (PFGE) (Figure 1). Two duplicate samples (technical replicates) were submitted and the sample with the best quality was selected for 10X Genomics sequencing.

Assembly and phasing of polyploid contigs in the Johnsongrass genome. Tissue from a single wild-type plant from an accession of Johnsongrass collected in Nebraska35 was used in this genome assembly. The 10X Genomics Linked Reads raw reads were de novo assembled using the Supernova assembler (Snyder et al., 2015). The total span of this assembly was 787.57 Mbp, with a contig N50 size of 117.60 kbp. The total span of the assembly was still nearly half the expected polyploid genome size, which was indicative of an allotetraploid genome. Consequently, we further processed the assembly to create a pseudopolyploid assembly that can distinguish pairs of homeologous contig sequences. Our approach is similar to the approach used for heterozygous genomes. Briefly, the algorithm begins by aligning the Linked Reads to Sorghum bicolor v3.1.1, then phases all types of variants using HapCut2 (Edge et al., 2017). The phased SNPs and indels were substituted into the Sorghum bicolor genome sequence, and structural variations were reinstated using local assemblies. Next, the phased contigs were examined to identify structural variations by Linked Reads alignment. The Linked Reads were aligned to the phased genome, and the extents of the large DNA molecules were inferred from the read alignments. Since the physical coverage of large molecules (Linked Reads) is more consistent and less error-prone than short-read alignments, the phased genomic contigs were cut at positions that had insufficient spanning coverage. This process was performed iteratively nine times until no more spurious structures were found. Specifically, only a consistent order and orientation supported by long alignment chains was retained, and the contigs were then realigned with Linked Reads to identify homologous variants. To process the alignments more reliably, the Genome Analysis Toolkit (GATK) HaplotypeCaller algorithm was applied to replace the non-covered loci and SNPs (McKenna et al., 2010). The total span of the final phased assembly was 1,326,519,672 bp, in 3,213 contigs with a contig N50 size of 13,187,480 bp. To construct the chromosome-level assembly, Hi-C libraries were prepared from immature leaves of Johnsongrass. The paired-end reads from Hi-C sequencing were mapped onto the final phased assembly using the 3D-DNA pipeline with default parameters. Misassembled contigs were corrected by detecting abrupt long-range contact patterns, then Hi-C reads were re-mapped to the corrected contigs using the BWA program, and scaffolding was performed using the ALLHiC pipeline (Zhang, 2019). Contigs that could not be anchored were filtered and designated as unanchored. The final pseudomolecule assembly indicates 20 chromosomes with 1,367,364,829 bp and an N50 size of 68,658,214 bp. In addition, BUSCO analysis indicates high (91.6%) completeness of this genome assembly (Simão et al., 2015). The remaining steps include annotation of the genome assembly using comprehensive, multi-tissue RNA-Seq data.
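The coverage-based cutting step described above can be illustrated with a minimal Python sketch. It assumes that each inferred Linked Reads molecule has been reduced to a half-open (start, end) interval on a contig, and that a position counts as insufficiently covered when fewer than min_cov molecules span it. The function name, the interval representation, and the threshold are illustrative assumptions, not the pipeline that was actually used for this assembly.

```python
from typing import List, Tuple


def low_coverage_regions(
    contig_length: int,
    molecules: List[Tuple[int, int]],
    min_cov: int = 2,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) regions spanned by fewer than min_cov molecules.

    `molecules` holds hypothetical (start, end) extents of linked-read
    molecules inferred from read alignments (0-based, end-exclusive).
    """
    # Difference array: +1 where a molecule starts, -1 where it ends.
    delta = [0] * (contig_length + 1)
    for start, end in molecules:
        start = max(0, start)
        end = min(contig_length, end)
        if start < end:
            delta[start] += 1
            delta[end] -= 1

    regions = []
    coverage = 0
    region_start = None
    for pos in range(contig_length):
        coverage += delta[pos]  # physical coverage at this position
        if coverage < min_cov:
            if region_start is None:
                region_start = pos
        elif region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    if region_start is not None:
        regions.append((region_start, contig_length))
    return regions


# Toy example (made-up coordinates): no molecule bridges positions 40-59,
# and only one molecule covers positions 60-61.
toy_molecules = [(0, 40), (0, 38), (10, 40), (60, 100), (62, 100), (65, 100)]
candidate_breaks = low_coverage_regions(100, toy_molecules, min_cov=2)
print(candidate_breaks)
```

In this toy case the single reported region marks where a contig would be a candidate for cutting. A real implementation would work from alignment files and would need a rule for contig ends, where physical coverage drops for reasons unrelated to misassembly.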

RNA-Seq Experiment

To generate a wide variety of mRNA sequences suitable for validation of the de novo genome assembly, Johnsongrass seedlings derived from the same accession as the genome donor were subjected to water deficit and control treatments followed by destructive harvest, tissue sub-sampling, and total RNA extraction. Prior to initiation of the experiment, the amount of water needed to confer water deficit and control conditions was identified as follows. Plastic pots with drainage holes (17.78 cm tall, 16.5 cm diameter) were filled with medium-grit sand (QUIKRETE®, Atlanta, GA). Saturation capacity of the pots (200 ml; the daily application rate for the control condition) was determined by applying a nutrient solution (28.35 g of Miracle-Gro® All Purpose Plant Food dissolved in 53 L tap water) with a graduated cylinder to determine the volume necessary for the solution to run from the bottom of each pot. The water deficit condition was one-fourth the daily application rate of the control, 50 ml.

The experiment was initiated in early April 2019 in the greenhouse by sowing 10 seeds per pot in 25 pots. The number of seeds used was well in excess of the tissue needed in order to ensure selection of similar-sized, healthy seedlings for treatment imposition and RNA extraction. All seeds were sown 3 cm deep in sand that had been pre-moistened to saturation (described above). All pots were provided 200 ml nutrient solution daily at 09:00 for one week to ensure uniform emergence and growth. The plants were maintained under optimal conditions until the first leaf collar was fully expanded among most plants, which were then thinned to one plant per pot, selecting for similarly sized plants in all pots. Remaining plants were divided evenly into water deficit and control treatments, and the two conditions were imposed for three weeks by adjusting daily nutrient solution applications as described. A total of three representative, similar-sized plants were selected for destructive harvest and tissue sub-sampling from both control and water deficit conditions as follows. Tissues harvested from each condition included leaf, stem, and root. Pollen, stigmas, seeds, and rhizomes were harvested separately from mature plants grown under control conditions after three months of growth. A total of 34 samples were collected, placed in MP Biomedicals (Santa Ana, CA) Lysing Matrix A tubes for plant tissues, flash frozen in liquid nitrogen, and immediately used for RNA extraction.

RNA extraction and Illumina sequencing. Total RNA was extracted from tissues using the Spectrum Plant Total RNA Kit (Sigma-Aldrich®; Darmstadt, Germany) with the following modification: Spectrum Plant Total RNA Kit lysis buffer with β-mercaptoethanol was added to the Lysing Matrix A tube containing frozen plant tissue, and the tube was placed in a bead beater to homogenize the tissue in the RNA-stabilizing lysis buffer. The optional DNase treatment was included in the extraction, and total RNA was eluted in 30 µl molecular biology grade water. Absorbance ratios (A260/280 and A260/230) were checked on a Nanodrop® spectrophotometer (NanoDrop 2000; ThermoFisher; Waltham, MA, USA). Purified RNA was checked for quality on a Bioanalyzer (Agilent Bioanalyzer 2100; Agilent; Santa Clara, CA, USA), and 10 samples having RIN scores >6.8 were selected for sequencing at the University of California-Davis Genome Center as follows. Libraries were prepared by pooling tissues from the following samples: mature leaf, stem, and root from both control and water deficit condition plants, seeds, rhizome, stigmas, and pollen. Poly-A enriched single-strand sequencing was conducted on one lane on the HiSeq using 150-bp paired-end reads. Additionally, a single, separate library was prepared by pooling all samples and conducting single-strand sequencing on one lane of a MiSeq using 300-bp paired-end reads.


Figure 3. Representative Johnsongrass Seedlings after One Month of Growth. Seedlings grown under control (A) or water deficit (B) conditions, with a meter stick for length reference. The control condition was 200 ml ± 20 ml SD daily nutrient solution, whereas the water deficit condition was 50 ml ± 5 ml SD daily nutrient solution.

High molecular weight DNA clean-up for long-read sequencing of putative cross-incompatibility loci. In other species with better-characterized prezygotic reproductive barriers, the loci harboring causal genes tend to be hotspots for recombination66. We suspected that this may be the case in Johnsongrass and hypothesized that frequent recombination events may lead to high heterozygosity at candidate loci, making local assemblies more challenging. To address this potential pitfall, we extracted HMW DNA using the protocol above and conducted long-read resequencing of candidate Iap loci using primers developed from the most recent sorghum reference genome (v3.1.1) targeting all fragments on Chromosome 2 originally associated with the Iap locus fine-mapped by Gill et al.10, now known to reside in distinct regions 40 kb apart (Iap and BAM1). Without an existing reference genome, it was not possible to be certain that all copies of Iap and BAM1 homologues in Johnsongrass would be captured using these PCR primers; nevertheless, only one amplified fragment per primer pair was observed in electrophoresis, and HMW DNA resequencing was a strategy capable of both confirming the local genome assembly at resequenced loci and detecting the frequencies of specific haplotypes among the amplified fragments. Thus, the primary genome assembly, the resequencing data of Iap and BAM1, and the RNA-Seq data from pollen and pistils comprised redundant methods aimed at clarifying both copy number and sequence information for candidate loci of the Johnsongrass genome suspected to play roles in pollen-pistil interactions with sorghum.

Additional clean-up steps were taken to optimize long-fragment (>16 kb) DNA amplification. Before beginning, buffer A65 was made [100 µl NaCl (5M), 2 µl EDTA (500mM), 398 µl QIAGEN Buffer EB] and set aside for later use. High molecular weight Johnsongrass gDNA was extracted as previously described and the total volume was brought up to 200 µl with elution buffer. The tube containing the gDNA then received 100 µl NaCl, 2 µl EDTA, and 198 µl of elution buffer. Subsequently, 400 µl phenol:chloroform:isoamyl alcohol (25:24:1) was added to a 2 ml Eppendorf tube (Tube 1), mixed by inverting 20 times, and centrifuged at 14000 × g for 10 min. The aqueous phase was transferred to a new 2 ml Eppendorf tube (Tube 2). Then 400 µl Buffer A was added to Tube 1, mixed by inversion 20 times, and centrifuged at 14000 × g for 10 min. The aqueous phase was again transferred to Tube 2 (for an accumulated total volume of ~800 µl). An equivalent volume of chloroform:isoamyl alcohol (24:1) was added to Tube 2, mixed by inversion 20 times, and centrifuged at 14000 × g for 10 min. The aqueous phase was transferred to a new 2 ml Eppendorf tube (Tube 3). The volume was measured and 0.3x volume of 100% ethanol was added to Tube 3, mixed by inversion, and centrifuged at 14000 × g for 15 min. The supernatant was transferred to a new 2 ml Eppendorf tube (Tube 4). The volume was measured and 1.7x volume of 100% ethanol was added to Tube 4, mixed by inversion 20 times, and centrifuged at 14000 × g for 15 min. The pellet was washed with 70% ethanol (500 µl) and centrifuged at 14000 × g for 15 min. The supernatant was transferred to a new 2 ml Eppendorf tube (Tube 5) and the washing and centrifuging steps were repeated. The ethanol was removed by pipetting and the pellet was air dried in a hood for 5 min. The purified HMW DNA was resuspended in 25 µl elution buffer and stored at -80°C until sequencing.
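The final NaCl and EDTA concentrations implied by the buffer A recipe above follow from the dilution relation C1 × V1 = C2 × V2. The short Python sketch below works through that arithmetic; it assumes the stated stock concentrations (5 M NaCl, 500 mM EDTA) and that Buffer EB contributes no additional NaCl or EDTA. The function and variable names are illustrative only.

```python
def final_concentration(stock_conc: float, stock_vol: float, total_vol: float) -> float:
    """Dilution formula C1*V1 = C2*V2, solved for C2 (same units as stock_conc)."""
    return stock_conc * stock_vol / total_vol


# Buffer A recipe from the text (volumes in µl).
nacl_vol, edta_vol, eb_vol = 100.0, 2.0, 398.0
total_vol = nacl_vol + edta_vol + eb_vol

nacl_final_molar = final_concentration(5.0, nacl_vol, total_vol)    # 5 M NaCl stock
edta_final_mm = final_concentration(500.0, edta_vol, total_vol)     # 500 mM EDTA stock

print(f"total volume: {total_vol:.0f} µl")
print(f"NaCl: {nacl_final_molar:.2f} M, EDTA: {edta_final_mm:.2f} mM")
```

Under the same assumptions, the gDNA tube (200 µl sample plus 100 µl NaCl, 2 µl EDTA, and 198 µl elution buffer) also totals 500 µl, so it would reach the same final NaCl and EDTA concentrations if the same stocks were used.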

Re-sequencing putative cross-incompatibility loci in the Sorghum halepense genome

The Iap locus was originally fine-mapped by Gill et al.10. Our investigations began by comparing mapped sequences and markers used by Gill et al.10 to the most current version of the Sorghum bicolor reference genome (v3.1.1), which was assembled using long reads (approximately 30 kb) and therefore had a much-improved local assembly. Based upon this initial work, it was clear that the region mapped by Gill et al.10 was actually distributed among two distinct regions of Chromosome 2, spaced about 40 kb apart. We refer to the first region as Iap (containing five candidate genes) and the second as BAM1 (containing only one candidate gene), based on nucleotide sequence homology among the two versions of the reference genome. The resequencing of putative reproductive barrier loci and candidate genes was completed using two different techniques. The full Iap locus (roughly 16 kb) was amplified in a two-step PCR cycle using Takara LA Taq DNA polymerase (Cat.# RR002A). The PCR reaction was prepared with the following components to a total volume of 50 µl: 500 ng of genomic high molecular weight Johnsongrass DNA, 5 µl of 10x LA PCR Buffer II with Mg2+, 8 µl of dNTPs, 0.5 µl of Takara LA Taq DNA polymerase, 32.86 µl of distilled water, 0.25 µl of the sense primer (5’-CGGTGACCATGCCAAGTACAGCAAATTAAC-3’), and 0.25 µl of the antisense primer (5’-CGGCAGTGAGAATGTTTACTGTTTGCTCAT-3’). The fragment was then amplified with a two-step PCR as follows: an initial denaturing step of 1 minute at 94°C; 14 cycles alternating between 20 seconds at 98°C and 20 minutes at 68°C; followed by 16 cycles alternating between 20 seconds at 98°C and 20 min + 15 seconds/cycle at 68°C. The final extension was 10 min at 72°C. A 0.4% agarose gel was used to validate the length of the amplicon using 4 µl of the reaction (see the first marked well in Figure 4, Chr02:2144633..2160696).
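Because the two-step program above uses 20-minute extensions that lengthen by 15 seconds per cycle, it can help to estimate the total programmed run time before booking a thermocycler. The Python sketch below sums the hold times given in the text. It is only an estimate under two assumptions that the text does not state: ramp times are ignored, and the first of the 16 incremented cycles already carries one 15-second increment. The function name and parameter names are illustrative.

```python
def two_step_pcr_seconds(
    initial_denature: int = 60,
    phase1_cycles: int = 14,
    phase2_cycles: int = 16,
    denature: int = 20,
    extend: int = 20 * 60,
    increment: int = 15,
    final_extension: int = 10 * 60,
) -> int:
    """Total programmed hold time in seconds, ignoring ramp times."""
    # Phase 1: fixed-length cycles.
    phase1 = phase1_cycles * (denature + extend)
    # Phase 2: extension grows by `increment` seconds each cycle.
    # Assumption: cycle 1 of this phase already includes one increment.
    phase2 = sum(
        denature + extend + increment * cycle
        for cycle in range(1, phase2_cycles + 1)
    )
    return initial_denature + phase1 + phase2 + final_extension


total_seconds = two_step_pcr_seconds()
print(f"{total_seconds} s = {total_seconds / 3600:.2f} h")
```

If the instrument instead applies the first increment on the second cycle of the phase, the total is shorter by 16 × 15 s; either way the program runs for roughly eleven hours.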

Takara PrimeSTAR® GXL Premix (Cat.# R051A) was used to amplify specific genes of interest within the Iap and BAM1 loci. Reactions of 50 µl were prepared with the following components: 500 ng of genomic HMW Johnsongrass DNA, 25 µl of PrimeSTAR GXL Premix (2x), 20.86 µl of distilled water, and 0.25 µl each of the sense and antisense primers for BAM1 (Forward: 5’-CAAAGAAAAGACAAGTTTCTCAAAAGATCA-3’; Reverse: 5’-TCTGTATAACAAAGTAGTAGGAGTACTTGC-3’) or Iap (Forward: 5’-CGGTGACCATGCCAAGTACAGCAAATTAAC-3’; Reverse: 5’-ATGAGCAAACAGTAAACATTCTCACTGCCG-3’). A three-step PCR was used as follows: an initial 10-second denaturing step at 98°C; 30 cycles of 10 seconds at 98°C, 15 seconds at 60°C, and 6 minutes at 68°C; and a final extension of 6 minutes at 68°C. An aliquot of 4 µl of each completed reaction was loaded onto a 0.4% agarose gel and used in electrophoresis to verify fragment lengths (see the second and third marked wells in Figure 4).

Figure 4. 0.4% Agarose Gel with Johnsongrass Amplicons for Iap and BAM1, Putative Cross-Incompatibility Loci. Chr02:2144633..2160696: Iap full-length amplicon containing five candidate genes; Chr02:2144633..2150496: sub-region of Iap containing two candidate genes expressed in floral tissues at anthesis, Sobic.002G023300.1 and Sobic.002G023400; Chr02:2550778..2556242: amplicon containing the full-length BAM1 gene (Sobic.002G027600.1).
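The expected amplicon sizes can be derived directly from the genome coordinates quoted in the Figure 4 caption. The Python sketch below parses region strings of the form Chr02:start..end and reports their lengths. It assumes the coordinates are 1-based with both ends included, which is an assumption about the coordinate convention rather than something stated in the text; the helper names are illustrative.

```python
import re


def parse_region(region: str):
    """Parse a string such as 'Chr02:2144633..2160696' into (chromosome, start, end)."""
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region string: {region!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def region_length(region: str) -> int:
    """Length in bp, assuming 1-based coordinates with both ends included."""
    _, start, end = parse_region(region)
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return end - start + 1


# Regions quoted in the Figure 4 caption.
amplicons = {
    "Iap full-length": "Chr02:2144633..2160696",
    "Iap sub-region": "Chr02:2144633..2150496",
    "BAM1": "Chr02:2550778..2556242",
}
for label, region in amplicons.items():
    print(f"{label}: {region_length(region)} bp")
```

These reference-based lengths give the band sizes to look for on the gel; the Johnsongrass amplicons themselves may differ slightly in length from the sorghum reference intervals.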

Results

While 40 chromosomes were expected, only thirty-six scaffolds of the Johnsongrass de novo genome assembly were identified, likely due to polyploidy and the high levels of heterozygosity that would be expected in a wild, weedy plant. It is possible that the remaining, unresolved heterozygosity is masking one or two sets of chromosomes. A heatmap for the genome assembly is shown in Figure 5, and ordering metrics for the assembly are in Figure 6. The vertically overlapping diagonal lines in the heatmap suggest that more than two copies of at least one allele are present in the assembly, including a contig that appears to contain two copies of the same set of alleles. This suggests the assembler had some trouble with these regions, and they are hard to untangle in scaffolding. Future efforts using a more sophisticated assembler will be needed to resolve these regions, as there is sufficient unresolved sequence and heatmap evidence for several more scaffolds to be present. In the meantime, the most recent results are a significant step forward in understanding the Johnsongrass genome and lay an important foundation for numerous investigations of genome evolution, divergence, and speciation within the wider Sorghum complex.

Figure 5. Heatmap of the Johnsongrass De Novo Genome Assembly. Blue boxes denote scaffolds and green boxes denote contigs. The red diagonal lines next to most pairs of green boxes are the result of synteny between those contigs. The gap in the diagonal red line appears to correspond to one complicated chromosome that has more polyploidy, heterozygosity, or both than the other chromosomes. It is likely that this unresolved heterozygosity is the reason that only 36 chromosomes (scaffolds) were detected despite Johnsongrass having 40 chromosomes. Further cytology and advanced computational investigations will be needed to resolve this.


+----------+-----------+------------+------------+
| SCAFFOLD | NUMBER OF | LENGTH OF  | LENGTH OF  |
| NUMBER   | CONTIGS   | CONTIGS    | SCAFFOLDS  |
+----------+-----------+------------+------------+
| 0        | 20        | 89736022   | 89736022   |
| 1        | 10        | 70469538   | 70469538   |
| 2        | 10        | 81714562   | 81714562   |
| 3        | 15        | 57167484   | 57167484   |
| 4        | 17        | 38311951   | 38311951   |
| 5        | 10        | 45057343   | 45057343   |
| 6        | 9         | 43303485   | 43303485   |
| 7        | 5         | 20750544   | 20750544   |
| 8        | 14        | 33251597   | 33251597   |
| 9        | 14        | 32013198   | 32013198   |
| 10       | 11        | 21661898   | 21661898   |
| 11       | 7         | 17008088   | 17008088   |
| 12       | 5         | 16983251   | 16983251   |
| 13       | 9         | 13597937   | 13597937   |
| 14       | 7         | 11781333   | 11781333   |
| 15       | 2         | 7863030    | 7863030    |
| 16       | 3         | 7171241    | 7171241    |
| 17       | 2         | 1450939    | 1450939    |
| 18       | 16        | 91055638   | 91055638   |
| 19       | 9         | 71138149   | 71138149   |
| 20       | 23        | 82211200   | 82211200   |
| 21       | 4         | 59770169   | 59770169   |
| 22       | 6         | 36909712   | 36909712   |
| 23       | 4         | 43822357   | 43822357   |
| 24       | 15        | 43432125   | 43432125   |
| 25       | 2         | 21666468   | 21666468   |
| 26       | 14        | 32920204   | 32920204   |
| 27       | 8         | 31348303   | 31348303   |
| 28       | 3         | 21268019   | 21268019   |
| 29       | 9         | 17158452   | 17158452   |
| 30       | 5         | 16991136   | 16991136   |
| 31       | 22        | 14464531   | 14464531   |
| 32       | 14        | 12442339   | 12442339   |
| 33       | 5         | 8505103    | 8505103    |
| 34       | 7         | 7571734    | 7571734    |
| 35       | 3         | 1460578    | 1460578    |
+----------+-----------+------------+------------+
| TOTAL    | 36        | 1223429658 | 1223429658 |
| N50      |           | 43822357                |
+----------+-----------+------------+------------+

Figure 6. Ordering Metrics for the Johnsongrass De Novo Genome Assembly. A total of 36 finished scaffolds (chromosomes) were detected, with an N50 of 43,822,357 bp.
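For readers unfamiliar with the N50 statistic reported for the assemblies in this chapter, the Python sketch below shows one common way to compute it from a list of contig or scaffold lengths: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the total span. The function name and the toy lengths are illustrative assumptions. Assembly tools differ in how they handle the boundary case, so values reported by a particular pipeline may not match this definition exactly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L contain at least half the total span."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half_span = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_span:
            return length
    # Unreachable for non-empty input, kept for clarity.
    raise ValueError("could not determine N50")


# Toy scaffold lengths (made-up values, not taken from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

In the toy list the total span is 300, and the two longest sequences together reach 150, so the second-longest length is reported.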


RNA-Seq and Resequencing of Iap and BAM1. Because the Johnsongrass genome assembly unexpectedly resolved only 36 chromosomes, likely due to polyploidy and the high heterozygosity expected of a weed, the RNA-Seq results cannot yet be conclusively mapped to the genome to support differential gene expression analysis and annotation. The future investigations described above, including cytology to confirm expected chromosome numbers and additional computational efforts using a more sophisticated assembler, will therefore be pursued.

Multiple sequence alignment for the coding regions of each candidate gene within the amplicons containing the full-length Johnsongrass Iap region and BAM1 was performed using ClustalW in MUSCLE78. The consensus DNA sequence of each Johnsongrass amplicon sequenced in PacBio containing putative genes associated with the BAM1 (Figure S1) and Iap (Figures S2 and S3) loci (as Iap was originally mapped by Gill et al.10) in sorghum was BLASTed in Phytozome against the Sorghum bicolor v3.1.1, Oryza sativa v7_JGI (rice), Zea mays PH207 v1.1 (maize), and Arabidopsis thaliana TAIR10 reference genomes. Any genes in reverse orientation to the S. bicolor reference sequence were reverse complemented prior to running in MUSCLE.
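The reverse-complement step mentioned above can be sketched in a few lines of Python. This is a generic illustration using only the standard library, not the tool that was used in this work; the function name is an assumption, and only the unambiguous bases A, C, G, T plus N are handled.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Iap reverse primer as written for the PrimeSTAR GXL reaction in the Methods.
iap_reverse_primer = "ATGAGCAAACAGTAAACATTCTCACTGCCG"
print(reverse_complement(iap_reverse_primer))
```

As a usage check, applying the function to the Iap reverse primer listed for the PrimeSTAR GXL reaction returns the antisense primer string listed for the LA Taq reaction, so the two primer strings quoted in the Methods describe the same site written in opposite orientations.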

Phylogenetic trees were constructed based on sequence alignments at these loci. The alignment between the S. bicolor reference genome and the full-length Iap amplicon of Johnsongrass showed 97.3% identity (Figure S4), whereas the alignment of the BAM1 amplicon showed 100% identity (data not shown). The phylogenetic tree of the coding region of Johnsongrass BAM1 (Figure 7) reflects the expected evolutionary relationships, with sorghum and Johnsongrass being very closely related, followed by maize, rice, and Arabidopsis, respectively.
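Percent identity values such as those reported above depend on how alignment columns are counted. The Python sketch below shows one simple convention: identical, non-gap columns divided by all columns in which at least one sequence has a residue. This convention, the function name, and the toy sequences are assumptions for illustration; alignment programs may use other denominators (for example, excluding all gapped columns), so the numbers need not match a given tool's report.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Columns that are gaps in both rows are skipped; a column with a gap in
    only one row counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap and base_b == gap:
            continue  # column is empty in both rows
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment (made-up sequences): one gap and one substitution in 10 columns.
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(f"{toy_identity:.1f}% identity")
```

With this convention, the single gap and the single substitution in the toy alignment each count as one mismatched column out of ten.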


Figure 7. Phylogenetic Tree of BAM1. The tree was built in MUSCLE using the neighbor-joining method with distance corrections, comparing the coding region of the gene BAM1 in sorghum against the Poaceae species maize, rice, and Johnsongrass, with Arabidopsis as the outgroup.

Alignments using putative coding regions of the five candidate genes in the full-length Iap amplicon of Johnsongrass clearly showed Sobic.002G023300.1 as the most conserved sequence across genera. Interestingly, the other genes at Iap had no strong hits in the other Poaceae genomes (rice and maize) and were absent from Arabidopsis, indicating the unique nature of this locus to Sorghum. The Johnsongrass homologue of Sobic.002G023300.1 also reflected the expected evolutionary relationships described for BAM1 (Figure 8).

Figure 8. Phylogenetic Tree of Sobic.002G023300. The tree was built in MUSCLE using the neighbor-joining method with distance corrections, comparing the coding region of the gene Sobic.002G023300 in sorghum against the Poaceae species maize, rice, and Johnsongrass, with Arabidopsis as the outgroup.

These results indicate that both sequence divergence and structural rearrangements have occurred within this region during its evolution as putative prezygotic reproductive barrier loci unique to Sorghum, similar to the structural rearrangements detected within each of the Ga1, Ga2, and Tcb1 maize loci and the remaining Zea species. While structural rearrangements underpinning reproductive barriers are not unexpected, perhaps the most surprising result of this work is that no known prezygotic reproductive barrier locus of Zea demonstrates any degree of synteny to Iap or BAM1 of sorghum or Johnsongrass, despite the close evolutionary relationship of these species. Moreover, while PME/PMEI systems appear conserved amongst all known maize prezygotic reproductive barrier loci, no PMEs or PMEIs were found to play a similar mechanistic role in these two Sorghum species.

Discussion

The focus of this research was to provide sufficient information about the Johnsongrass genome to enable the first genetic and physiological studies of cross-compatibility and incompatibility in the Sorghum genus, with the long-term objective of identifying or engineering novel cross-incompatibility systems to minimize gene flow among crops and weeds. We also expect the availability of a Johnsongrass reference genome to assist in identifying weed control tactics that could minimize the impact of this noxious weed on sorghum production and make sorghum a more competitive crop with maize. In addition, Johnsongrass is being used to breed perennial biofuel sorghum cultivars, and the new reference genome will be important for enabling genomics-assisted approaches.

In this master’s thesis, the identity and putative function of known cross-incompatibility loci in the closely related Poaceae species maize, sorghum, and Johnsongrass have been investigated. The most widely known cross-incompatibility locus for maize (Ga1) does not appear to have a conserved function in other grasses, based on the current sorghum reference genome and the new Johnsongrass genome assembly. The Ga1, Ga2, and Tcb1 loci of maize work through PMEs secreted by the pistil in response to growing pollen tubes to either loosen (during compatible interactions) or stiffen (during incompatible interactions) the pollen tube cell wall. A reasonable hypothesis for why PMEs appear to have conserved function as reproductive barrier loci in maize, but not in Sorghum species, is the extreme length of maize pistils: any enzyme expressed in pollen that is more effective at rapidly dissolving cell walls will be positively selected for. Conversely, sorghum pistils are typically 1-2 mm, and speed may be harder to select for over very short distances, thus emphasizing the importance of alternative physiological pathways such as cell-cell recognition and chemotaxis in establishing reproductive isolation.

The most recent version of the sorghum genome does indicate a single PME, expressed in the panicle at anthesis, in the region syntenic to the maize Ga1 locus. The most recent resequencing data using popcorn having the Ga1-S allele indicate numerous PME duplication events at this locus, consistent with the evolution of new function following chromosomal rearrangements such as tandem duplications at this locus and the ancient whole genome duplication event in maize. Based on this information, it is reasonable to suspect that PMEs may still have a minor role in modulating pollen tube growth in sorghum, although the Iap locus likely plays a dominant role by driving cell-cell recognition through ancient pathogen defense pathways. These questions merit further investigation using NGS strategies, particularly proteomics and investigations using knock-out genotypes that will now be possible for the first time with the information generated herein.

Our investigations of candidate genes at Iap identified one gene (Sobic.002G023300.1) encoding a cysteine-rich secretory defense-related protein that is highly expressed in floral tissues at anthesis. Cysteine-rich secretory proteins (CRISPs) play a role in host-recognition signaling and are involved in both the activation of defense-related mechanisms and pre-zygotic reproductive barriers53,54,55. This gene also seems to be the most well conserved across the Iap and BAM1 loci between Sorghum and closely related Poaceae crops. BAM1, originally thought to be within the Iap locus for sorghum, encodes CLAVATA1-related Leu-rich repeat receptor-like kinases that have been implicated in an evolutionary role in microsporogenesis signaling pathways69. In Arabidopsis, BAM1 and BAM2 (having 80.8% identity with BAM1) are suggested to have an early role in somatic cell fates through ovule specification and function, and in male gametophyte and anther development. Anthers of a bam1bam2 mutant do not form all cell layers during development and have a disorganized cell division pattern. bam1bam2 mutants also have dramatically reduced meristem size, but it is suggested that this is independent of altered anther development70. Based on these phylogenetic results, the function of the Arabidopsis homologues, and the timing of sorghum BAM1 expression during floral initiation in floral tissues (coinciding with meiosis), it will be worth investigating whether BAM1 could serve as a nuclear male-sterility allele in breeding programs, or whether it plays a role in the rate of production of unreduced gametes (diploid pollen) by sorghum that influence cultivar-specific rates of cross-fertility and viable seed set with Johnsongrass. Future work is needed to identify a suitable knock-out mutant of the Iap gene Sobic.002G023300.1 and of BAM1, either in an EMS population or by using CRISPR/Cas9, to clarify their functions.

In summary, the work described in this master’s thesis project included a de novo Johnsongrass genome assembly, revealing 36 finished scaffolds, but leaving four remaining chromosomes to be identified in future efforts that will require further cytology (to confirm expected chromosome numbers) and computational investigations.

Despite these limitations to the assembly, long-read sequencing data for the putative reproductive barrier loci Iap and BAM1 of Johnsongrass were obtained and used to develop a phylogenetic tree between Johnsongrass and its close Panicoideae relatives.

Results indicate high conservation (>98%) between the sorghum and Johnsongrass genomes at the Iap gene, Sobic.002G023300.1, but revealed species-specific structural and sequence differences that may be further investigated and studied in the context of rates of pollen tube germination and distance of growth into the pistil during self- pollination events as compared to cross-pollination events by related species.


Supplemental Information

Figure S1. BAM1 Sequence of Sorghum bicolor. PCR primers used to amplify long reads in Johnsongrass are shaded in gray.

>Chr02 Chr02:2550778..2556242

TCAAAAGAATAAATTTTCTGTTAAAAAAACAAAGAAAAGACAAGTTTCTCAA AAGATCACTTCTCTCCCTCTAGAAAATCCAACAAGGAATCACATGCTTTTTTA TAGCCGCACTAGCGGCCTACTGAGTAGGTGTGTCAGTGTGCGCAGCGTGCTG CTTCATCCAAGAACGACCAGCCGACGAGCGAGCCTATCGTAAAGCTCTCTCC TTTTCTCGCGGCCGGCGCGGCCTCACTCCCTTGTCTCCACTCTCCCGGCGGAG ACGGAGCCTCGAAACGAACCCGATGCCGATGCGCCTCCACCACCTCCTCCTC GTCCTCCTCGCCACCGCCGCCGCGGTGGCGGGGGCTGCCGCTGCCGGCGCGG GCGCGGACGCGGACGCGCTGCTGGCGGCGAAGGCGGCGCTGTCGGACCCGG CCGGCGCGCTGGCGTCCTGGACGAACGCCACTAGCACCGGCCCCTGCGCGTG GTCCGGCGTGACGTGCAACGCGCGCGGCGCGGTCATCGGGCTGGACCTCTCC GGCCGGAACCTCTCGGGCGCCGTCCCCGCCGCCGCGCTGTCCCGCCTGGCGC ACCTCGCGCGCCTCGACCTCGCCGCGAACGCGCTCTCGGGCCCCATCCCGGC GCCGCTGTCCAGGCTGCAGTCCCTCACCCACCTCAACCTCTCCAACAACGTGC TCAACGGCACCTTCCCGCCGCCGTTCGCGCGCCTGCGCGCGCTCCGGGTGCTC GACCTCTACAACAACAACCTCACCGGCCCGCTCCCGCTCGTGGTCGTCGCGTT GCCGATGCTCCGCCACCTCCACCTCGGGGGGAACTTCTTCTCCGGCGAGATCC CGCCCGAGTACGGCCAGTGGCGGAGGCTGCAGTACCTCGCCGTTTCCGGGAA CGAGCTGTCCGGGAAGATTCCTCCGGAGCTCGGCGGCCTCACCAGTCTCAGG GAGCTCTACATTGGCTACTACAACAGCTACTCCAGCGGGATACCGCCGGAGT TCGGCAACATGACGGATCTCGTCCGCCTCGACGCCGCCAACTGCGGGCTCTC CGGCGAGATTCCGCCGGAGCTCGGGAATCTCGAGAACCTCGACACGCTCTTC TTGCAGGTGAACGGGCTCACGGGCGCCATCCCGCCGGAGCTGGGCCGGCTCA GGAGCCTCAGCTCCCTCGACCTGTCCAACAACGGGCTCACCGGCGAGATTCC AGCGAGCTTCGCCGCGCTCAAGAACCTCACTTTGCTCAACCTCTTCCGCAACA AGCTCAGGGGCAGCATCCCCGAGCTCGTCGGCGACCTGCCCAACCTCGAGGT GCTGCAACTCTGGGAGAACAACTTCACCGGCGGCATCCCGCGCCGTCTCGGC CGCAACGGACGGCTCCAGCTGGTCGACCTCTCGTCCAACAGGCTCACCGGCA CCCTCCCGCCGGAGCTCTGCGCCGGGGGCAAGCTGGAGACGCTCATCGCGCT CGGCAACTTCCTCTTCGGCTCAATTCCGGAATCTCTGGGGAAATGCGAGGCA CTCTCCCGCATCCGCCTCGGCGAGAACTACCTCAATGGCTCCATCCCGGAAG GCCTCTTCGAACTGCCGAATCTGACACAGGTTGAGCTGCAGGACAACCTCCT GTCCGGTGGCTTCCCGGCGGTGGCCGGCACCGGCGCGCCAAATCTGGGGGCC ATCACTCTCTCCAATAACCAGCTCACCGGCGCGCTGCCAGCGTCCATTGGGA ATTTCTCAGGTTTGCAAAAATTGCTTCTCGACCAGAATGCATTCACCGGCGCA GTGCCGCCGGAGATTGGCCGGCTGCAGCAGTTGTCTAAGGCTGACCTGAGCG GCAATGCGCTGGACGGCGGCATGCCGCCGGAGATTGGAAAGTGCCGGTTGCT


CACCTACCTGGACCTCAGTCGGAACAACCTGTCTGGGGAGATACCGCCGGCC ATCTCCGGCATGCGAATACTCAACTACCTGAACCTGTCCCGGAACCACCTTG ATGGAGAGATACCGGCAACCATTGCTGCAATGCAGAGCCTCACGGCCGTCGA CTTCTCCTACAACAACCTGTCTGGCCTTGTGCCGGCGACTGGGCAGTTCAGCT ACTTCAATGCGACGTCCTTCGTCGGCAACCCGGGACTGTGTGGGCCGTACCTT GGACCATGCCATTCTGGTGGCGCCGGCACAGGCCATGGTGCACACACCCATG GTGGCATGTCCAATACCTTCAAGTTGCTCATCGTCCTCGGCTTGCTTGTCTGCT CCATTGCGTTTGCTGCCATGGCAATCTGGAAGGCCCGGTCACTGAAGAAGGC CAGCGAGGCACGTGCGTGGAGACTCACTGCGTTCCAGCGCCTTGAATTCACT TGCGACGATGTGCTAGATAGTCTGAAGGAGGAGAACATCATTGGCAAAGGTG GAGCTGGGATTGTGTACAAGGGGACAATGCCAGATGGTGAGCATGTTGCAGT GAAGCGGCTTTCGTCGATGAGCCGTGGCTCGTCGCATGACCATGGGTTCTCTG CTGAGATTCAGACTCTTGGGAGGATCCGGCACCGCTACATTGTGAGATTGCTT GGCTTCTGCTCGAACAATGAGACAAATCTGCTTGTGTACGAGTTTATGCCCAA TGGGAGCCTGGGGGAACTACTCCATGGCAAGAAAGGTGGCCATCTCCACTGG GACACTCGTTACAAAATCGCTGTTGAGGCTGCCAAGGGGCTCAGCTACCTCC ACCATGATTGTTCACCGCCAATCCTGCATCGTGATGTTAAATCAAACAACATC CTGCTTGATTCAGATTTTGAGGCACACGTTGCTGATTTTGGGCTCGCCAAGTT CTTGCAGGACTCCGGCGCATCACAGTGTATGTCTGCCATTGCCGGCTCCTATG GCTATATTGCTCCAGGTATTTTTTTTTATCCTTCTGCTTCACAATTGTTGATTTT AAGTCTTTCTTGTTGTCTACAGATTAAATGTGTTGCTAGCTGTTACTACTTGGC CCCTTTTAGTTTATTCTTCAGTTCAGCTAATGAAACATAGCATTCCTTTGTGGA CCTGGTTACTTACTGCTATGTAGCTAAAGTTTGTTTCCTTGTTCTGATAATATT AGAAATATAGATCTTTTTTTATATCTACATGAGTAAAGAAATCCTGAAATCAA CTAAACCTTTATTGGATTGCTATCTTGTGGATTTACAGAGTATGCATATACCC TCAAGGTCGATGAGAAGAGCGATGTCTACAGCTTTGGTGTCGTGCTTCTTGAG CTTGTCACCGGAAAGAAGCCAGTAGGGGAGTTTGGGGACGGTGTCGACATTG TCCAGTGGGTCAAGACGATGACGGACGCAAACAAGGAGCAGGTGATCAAGA TCATGGACCCCAGGCTGTCGACCGTGCCAGTGCACGAGGTGATGCATGTCTT CTACGTCGCGCTGCTCTGTGTCGAGGAGCAGAGCGTGCAGCGGCCTACGATG CGGGAGGTGGTGCAGATGCTCAGTGAGCTTCCCAAGCCAGCTGCAAGGCAAG GAGATGAACCTCCCAGCGTTGATGATGATGGTTCCGCGGCGCCGTCTGATGC TCCAGCAGGAGATGGATCTGTTGAAGCGCCACATGACGAAGCCACTAACGAG CAGCAGCCGCAACCGATCTCGCAGTCTTCGCCGACTACTGATCTCATCAGCAT GTGAGAGCTGTGCCACGCTTGGTTTCTCTCATTCCTATGTTTCCTTGTAATGGT GAAATTCGGATGTTTGAATATTTCTAGCTTAGGGCCTGGTGTTTTAGGTCTCT GCACTTAATTAGGGTGGTGTGAGATTTCAGATATCTTGTGTTTATCCATCTGC 
ATTTAGGTAGGCTGTAGGGTGTGTGTGAAATCTAGGCCCCAGAATTTCTGGTG TGGTATGCATCATAGCCTCTTAGGACACTGTTCTGGATGTACATATCAAGCCT CCTAATCTCCATTTTGCTGTTAGAATTGAAGAACCATTTAGGGGGCTAGGAGT CTAGGATTCTGGAGCAATCAAGTTCGTTAATGTTCCTGTTGTACTTGTGTTAA TGTTAAGGTGAGCACCCCAGCATTCATATGTTGTACTGCATTGCACTTCACCC


TCTAGCTAGGGAAGAAGTTCAGGTGCGTTTGGATGTTGTCAGTTAATGCAAC ACCTTTCCTTGCTTTGCACTGTACAGGAACAGGACGATACTTCTGAACAACTG TTGATTATCATTATCACAGGGCGAGCTTAATCGCAACATATAATCTCGCATTT GATGAAGATTATCATCATGATCATAGGGCGTATGCTTGCAGCATGTTGCAAG GAGTGGCAAGCCCATGGCTGGTGGAGCGGTGTCTGGCTGCAGATCACCAGGA TTGGTAGATCAGCAGGAGACCGAGACCAGCTGCCAGCAGGGGAACAATTAA GCAAGCGAAGCTTTGGAGTGAAGTGAAAGCAACTGTAATTTGCTCGCATCTC GACTTGGAGATGCTTCCTTGGCTCACCCCAATCTCTTTTTCTGTTACGAGGTC AAAAAAGACCCGCCGGAGCTGCGGGTGCTGTGCCCACAAGCACATGGTCGA CGCCTCTCATTCCTTCCAACCAAGCAGTCTGCAACCTTGGTGGCATTTTAACT ATGCAGAATGGGTGTGTTTGAGCAGTGAGATTCAGGCGTTTATAAACATTTG GTTGCAAAAACAAGAAATTAAAAGATATCTCCAGCACTAGCTAGATTTCGCA TCGTCTTCTTGAGATGCATCAGATCCCCATCCCATCTCCATCTCCCCCGTCGC GGTCGCAAGGAAGCATCGCATGTGTTGGCACGTACGTACCGGCCTTTGCCCA ATCATGGATGCATTCAGAGTTTCAGACATAACACGTCTGTATGTAGGTTGCGC AATGATCCTAGTCCTTGTCCAGGCTAATCTGTTGTCCTCCTACCGCAAATTAA GCATGCGTCGCTAGCATGGCTGCTGCTGCTCTAACTGTTCACATGAGGACTGT AGTTCGTTCTCATCCGGTCATCCCTCCTCGCATCGCATCGCATCGATGCAGTT CAGAGATGCAGATGATGCATCGTTGCTACCTCCATTGCATTGTCGAGGTCCTA CTCGCTAGCTTTTCACAGGAGCGCGTGCCTGAACGGCTGAATCCGTCACTCAT TTCATGTCTCTTGTTTCGGAGCAGTTTCAGGTTCTCATCATCCAAGCCATGGA TGGATGGTCGAGTCACAGTCACCTCACTACCCATGGAACCGGAGCTGAGATT TTTCGACCTGCCGCCAATGTTCTCGCATCCGGCAACAGCATGATCAGGCAGA AAAAAAAAGTTGGTGACACTGACCATTCCACTTGTGTGTGAGACTGTGAGCG AGTGATCAGCGTCATATCATCGGCATGCTTTTGTGCTGTCCAAACTCTACCCA ACAACTGGATGGAGTATACAAGCAAGTACTCCTACTACTTTGTTATACAGAA CAAAGCTA


Figure S2. Iap Region Containing the Tightly-Linked Genes Sobic.002G023300.1 and Sobic.002G023400.1 Expressed in Floral Tissues at Anthesis. PCR primers used to amplify long reads in Johnsongrass are shaded in gray.

>Chr02 Chr02:2144633..2150496 CATTGACCCTGACGGTGACCATGCCAAGTACAGCAAATTAACAGTCACCGTG TTTTACTAGTATATGCCATCTTGCTTGGACAAGTCTCACTAAATTCATCAGTA TAAATACCGTGGCCAGTCGCCACATCTCATGCACCAAAATCCATCTACCACA ACAGCGCGCCATCTCTGCTCATTCAGATAACTTGGTCAGCAGCAACAGCGGC AATGGCGTCGTCCTCCTCGTCACCGACGAAGCTGCTAGCGTGCCTCGTCGCCG CAGCTCTGGCGCTGGCTGCCACCGTCGTCGCGCCGTGCGCGGCGCAGAACTC GCCGCAGGACTACGTGAACCCGCACAACGCGGCGCGCGCCGACGTCGGCGTC GGCCCGGTGTCGTGGGACGACACGGTGGCCGCGTACGCGCAGAGCTACGCG GCGCAGCGGCAGGGCGACTGCAAGCTGATCCACTCCGGCGGTCCCTACGGCG AGAACATCTTCTGGGGCTCCGCCGGCGCCGACTGGTCGGCGTCCGACGCCGT GGCGTCGTGGGTTTCCGAGAAGCAGTACTACAACCACGACACCAACAGCTGC GCGGACGGCAAGGTGTGCGGGCACTACACGCAGGTGGTGTGGCGTGACTCCA CGGCCATCGGCTGCGCCCGCGTCGTCTGCGACAACAACGCCGGCGTCTTCAT CATCTGCAGCTACAACCCGCCGGGCAACTACGTCGGCCAGAGCCCATACTAG ACGTAGTAGTGTGCCGTATGCATGAATTGAATACATGCAAGTATACGTACTG GGGTCGGAGTGAAAATAAATTGTTGTCAACTTTATACCATACTATGAATGTTG ATAAACATAATAAGTCAATAAAATCATGTGATTGCTGAGGCATATATATTTTT GTTCTTATATTTTGATCGTTAATTACTACAGAAACGATCTTTAGGAACATGCA TGATCAATTTTAGTACTATACTACAACCAATTTGCAGCGTCGCATAACAAGAC CATGTGATTTTTCATATATGCTATTACATCTGCCCAAAAAAAATAAAAATAAA CACTACTATAAAACCCATTCTCAATATCATTTTCTCCATCTTTACAGTAAATG CTACTACGTCCACCTAAAAATGCACAATAAACACTTTCACAATATCATTTTTT TGGCACTGATGTAGTAATAACTTATTGTGATAAATTTGTTATCACTATCGGTT CGTTACACAATTGTGGTAACATACAACACCACAATCGGCCCACTAGCCTAAT AGAAAACACTAAGGCTGTGTTTGGTTCTTTGGAATTTGGCCCAGGAACAATTC TATCTCCTCAAGACTATACAAATTAGTGAAGCAATCCAAAACTAGGAATTAT TCGAGGCCCTCATTCCAGTAGAACCGAACAAGGCCTAAGGCTTTGTCCTCAC TCTATTTCAGACTCTTCCCTCTCTCATAGCCGCTGCTCTAGCACCGCCTCCACG TCCCAAGACCCCAACGCCACCGAACTGGAGCTCTATAATTTCATCAATTTTTA GTTTTAGGTCCCGTCAAACTATAAATTTGATGTAATTTTTTCATCTGACTTTAT TTGAATAAATTATAAATTATTCATGCTATAGCTGTTTTTGAGGTCCATGCATG TTTGAAATTTGATTTCTAAGTATGGCTGACTTACCTAGAAATCATTCATTTTA GAGTCGGTTGACATAAGCAACCATCCCTAAAATATATTTTTAGAGATGGGGT AGATATATTTTCAGTGATAGTTCACTTAGTAGAACCATGGTTTGTAAGGTATG GTGTTTCCGGAGCTAGTTGCAAATAGTTTTCTATGTAGAACACCCTTGAAAAA TAGAAGACATCTCAAATTATTTTTTTAACTGGTGTGACTGTGCGAAGAACGCT 
CGTGAGGACTCACGGTCACGGCGCTGGAGCACATATTGGCCGCCAGAGTCAC


CCAGGTGTACAACTCCAGGAAACTAAAGTTGCCACGTCAATCCATTACCATG GCGGCGGTGGCGTGGCACAGCCTTTTGTTTTAAGATAAATAAAAAAGGACCG ACCGGCGCTGATCTTGACCATTACAAACTTGTTTGTTTAAAAATATTGTTCGC TGATTTATTGTGAGAGAAAAACACTGCTGAATAGCTGGCAGATTCAATAGAT AAGCTTAACCAAATACATTAGTCACTGCTCTAGGTCTCCAACGTACAGACAA AATGCAGAGAAAATGAAGAAGAAATGATGAGCACGCGACTGAGAGTGAATG AATACACTGCAGTCATAGTACGTTTGTAATCAGCTGTATCAAAGTTCTTGACA AAATATATATATAATACAGGGCTTGCACATCCAACAGTCTTGGAACAGAAAC TAAGATTAATTAGCATGGTATATAAATTAAAGGTGCAACAAACATCATTGCA AAATTAAAAGTGACATCTCAAAAGCACGCCACGGGTAAAGAAGACCACACA GCGCGCGCGCACAACTGTAAATGAAGTAATCCAAAGCAAAGCGGCCATGGT AGGCCGCCTAAAACAAAGTTGACACAGCAGAATAAGACCAACGACCATACG GACTAATCAAGCACACATGCATGAAGAAAAGACAAATCCATTAAACAGCAC CAGAAGAAGTTGCTGCATCGCCATCTTCTTCACCTGGCCTGCTGACGATCCCG GCGTGGAAGAGCAACGCCCAGAGCAGCGTTATCAGCTCGCCACCCCTGGCAA TGGCCTCCGAGTGACCCTTGAGGTTGTCTGATGGCGCGACGAACAGGATCAT CTCTGACCAGAAGTCAGCCAGCAGCTTCCACGCCATCTCCTCACCATTGATTT TCTCCACCAGCTGCTTTGCCAGCTCCACACCATTCTTGAGCACCTCATGCTTC GAGTTCTCACTCAGCAAGTGGACCAGCTTTTGATACTCATCTTCTGGCTCCGA TGACCTTGCTACTGCGTGACCGGCGAGGGCATGCTCTGCGTCCTTCTTGACAG CCTTGTACAAGCTCTTGCTCCATGCATTGTCTTATCCGGAAGTAGCTCAGGGC ACCAACTCACTAGGTATGCACAGTACTTTGACAAGTGACTGGCAACAATCTT ATAATCTGTAGAGCTGGGATTTGGTGGAGAACCCTGTTCTTGATGATCAGCTC GGTATGGGTGTTTCACTTCAAGGATGCATGTGGCGATATGCCATGTGAGTATG ATATCCGATGTACCTTTGCCATTGCAAGCCCCAATAAAGCTTTGTCCACCAAC TTGGCTTTGTCGTAGAGATTCTGCACCATTGCTCAAACGGTGTCCATTGATCA GATTTCTTACTGCATTCAGGATGCTAACCTTCACGGCCGCTGGCACCTCCATT TTCCTCCTCTGATCTGGGACACGGAGGAGACGGTATACAAGAACTATTGGGG TTGTTTTTGGCTGAGGTACCAACACTGAGCACTGGCTCATTTTTTCATCCCAG TGTTTCATCAGCTTGCATCTGCAGCGTAACAAACTACTAACCAATTTTCGCAT GCATAGTGAATGTGGTGAAGAGTTGCGGTTGATGTAGCAGCAAATAAGGGTA ACTTTTGTCCAGTTGGAGCAAATATAGGAAACCAAGCTTCTAACCTCAGCTGT CACCGCTAACATCAGAACCAAAAATATTGGCAACAGATCGAAGGACCACTCT CCCATATTGATGTGCGACATGTAGCTTGCTAAGTTCCCTTGAATACACCATAA TTTACACTGAAGTTGTTGGCCCCTCCCATGCTCTAAGATCTTCCGTACTATGTT TGCTATGACTAGCAAAGTAAACAAGATACAATAAACTACTGTGAAAAGTGAT AAAATGATATCCAAGATGGGCAGGCAACAATTTGAATAGGAGATTGGGAGA 
GACGAGTAATAATAATCATGAAGAAATGATAACTCTTCCTCAATCACCACAA ATACCCTTTCATCTTCTCCAACCTTGTGCAACAAGCTCCGGAAAAAGTTGCGA GTCCCAGTAGAGCCACCATTAGTGATCTTATACCGTGCAAACCGACATCGCA ACAACTTGAACAAAGCAAAGGACAAGCATATATCTTTTAACTGTGGCGTTGC AATTAGAAGCATGCTATCTAACTGCCAAACTCTATCAATAGTCACCAATCCA


ACATGTTTGTTCAGCATTGTTTCAGAGTCATCCTTGAATGTATACCCATGAGG ATGCTTCTCCACTTGTTCCTCCGATTCTCTCATAACCAAGAGTGGGGGAGCAT GTGCACCATCACCTACCAGCGGTTCATCAACATGCTGATGTGCTGCTTCTTGT GCTTGTAACTTCTGCATGTAAACAAAAACAAGTCCTGGATTGCGTCCGAGTGT GAAGGACGACCGCCGTGCCATTTCGAATGCATAAAATTTGAGCCCTATTTTG GAGCATAGCAGAGCATAAGGTGTGATGAAGATGAAGATGAGTCCTATAATG AAGCCCTCTGCGGCATAGAGCTTAAGGAGACCACCGGCAGCAAGGTAAAAA GTCCAAATCCCCTGTATAAGCAGCTCGGGCGGGGGACCTTTGGATTGGCCTTC TCTGTCATCAACAGAGACTATTGTGCTTGTATTGATCATGACTACCAGCACAA GGGATGCCAGAATTACGAAGAATACCACGTGGCTAAGTGCATTGCATTGTGC AACGATCGCCAGCTCACCATAATCACCATAATCCTCACCATAATCTTCATACA GTGAGGAAAACCTGATGTAATTAGGGTGGAGGCTGATGGTGGAGGTGAGAT AGGAGATGATGGGCAAGAAAAGTGTGTTGGCTCCAAGGAAGACAAAGCGAG TGAAGGGGTGGTGGCGGTACCGCTGACCATAGGCGCCTATCCCAACAAGTAC TACAGCCAGGACAGGTTGAACCACTAGCAGGGCATTAACCTCCCATAGCTGC CGTCTAATGCTCTCGTGGACGGTTGTCTGGAGATCCATCGCTTGTTTGGAGCA ATTTCCGATCAGCTTCCCATTCCCTCCAGCCATGCTCAGCTGCCTGCCAAACT TCAAGTCGTTAGCTAGCACCATGCAAGGCAAGGCACATGCAACCTCGTGCAA GATAGACGGTGTTGTATGTACGAGAAACTATTAACATGCAAGTAATTTGAAG AAAAAAAATAAACGAGTGGAGGAATTACATGAATTAATTAAGGTGTGGCGG TATAGCTTAGGCCGAGATCCAGCTGACTGGTGCTGCGCAATCCAGCAGCCCA GCCGCTCAGCGTTTCTTATGGATGGTTTCCTTCGTAATGGAGTTTTGGGCTTG CTAGCTACTCACGCGTATTAGAGAAGGGTATATATGGGAGATGCCGAGAGGA AGAAAGTAGAGGTTGAACCTGACCTTTGGAACGATTGGGTCATGATCCCACG GCTATTACAGCCAAGAATCTTTGATGTTCCATAAGTATTAGAAGTTTTGAGGA CATATTTAGTAGGTAGAACTAATCAAATGATTATCTTTACTCTCTCATTTTATT TTCTAACACAGTACCCATTTAAATACTATATTCTTTATTTGGTTGCAATAAATT AATCTACCTATTACTCTATATATATGTTTTACCATCATTTGCACTGCGCCTTGG TGGGTTTAATTTATATGTATGTGTCCAAATATTCGGTGTGATAGGAGTTAAAG TTTACCGCTAGAAATCAAACAGGGACTAAGTATTGTCTCGGCGCTGCTAAAA CTACACCAGGGAAAAATCGCAAACCTTTGAAGTTTGTATATGTCGTTATGCTT TTAGTTTTAAGTTTTACTCTAGATAGTTGATGATGCAATCTGCTGTGAATCCTT TTTAATTAATCTTTAAGGAAACAGCAACATGGATGTATGAAC
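The FASTA header of this figure gives the region as `Chr02:2144633..2150496`. The following is a minimal sketch of how such a region string could be parsed and used to slice a chromosome sequence. It assumes the coordinates are 1-based and inclusive, which the thesis does not state; `Region`, `parse_region`, and `extract` are hypothetical helper names introduced only for illustration.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval; coordinates ASSUMED to be 1-based and inclusive."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive interval, so both endpoints are counted.
        return self.end - self.start + 1


# Matches strings such as "Chr02:2144633..2150496".
_REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Parse a 'chrom:start..end' string into a Region."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is greater than end in {text!r}")
    return Region(match["chrom"], start, end)


def extract(chromosome_seq: str, region: Region) -> str:
    """Return the bases covered by `region` from a full chromosome string."""
    # Convert the 1-based inclusive interval to Python's 0-based half-open slice.
    return chromosome_seq[region.start - 1 : region.end]


if __name__ == "__main__":
    region = parse_region("Chr02:2144633..2150496")
    print(region.chrom, region.start, region.end, region.length)
```

Under the inclusive-coordinate assumption, `length` is `end - start + 1`; with half-open coordinates it would be one base shorter, so the convention should be checked against the genome browser that produced the header.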


Figure S3. Iap Full-Length Region of the Sorghum bicolor Genome. PCR primers used to amplify long reads in Johnsongrass are shaded in gray.

>Chr02 Chr02:2144633..2160696 sorghum CATTGACCCTGACGGTGACCATGCCAAGTACAGCAAATTAACAGTCACCGTG TTTTACTAGTATATGCCATCTTGCTTGGACAAGTCTCACTAAATTCATCAGTA TAAATACCGTGGCCAGTCGCCACATCTCATGCACCAAAATCCATCTACCACA ACAGCGCGCCATCTCTGCTCATTCAGATAACTTGGTCAGCAGCAACAGCGGC AATGGCGTCGTCCTCCTCGTCACCGACGAAGCTGCTAGCGTGCCTCGTCGCCG CAGCTCTGGCGCTGGCTGCCACCGTCGTCGCGCCGTGCGCGGCGCAGAACTC GCCGCAGGACTACGTGAACCCGCACAACGCGGCGCGCGCCGACGTCGGCGTC GGCCCGGTGTCGTGGGACGACACGGTGGCCGCGTACGCGCAGAGCTACGCG GCGCAGCGGCAGGGCGACTGCAAGCTGATCCACTCCGGCGGTCCCTACGGCG AGAACATCTTCTGGGGCTCCGCCGGCGCCGACTGGTCGGCGTCCGACGCCGT GGCGTCGTGGGTTTCCGAGAAGCAGTACTACAACCACGACACCAACAGCTGC GCGGACGGCAAGGTGTGCGGGCACTACACGCAGGTGGTGTGGCGTGACTCCA CGGCCATCGGCTGCGCCCGCGTCGTCTGCGACAACAACGCCGGCGTCTTCAT CATCTGCAGCTACAACCCGCCGGGCAACTACGTCGGCCAGAGCCCATACTAG ACGTAGTAGTGTGCCGTATGCATGAATTGAATACATGCAAGTATACGTACTG GGGTCGGAGTGAAAATAAATTGTTGTCAACTTTATACCATACTATGAATGTTG ATAAACATAATAAGTCAATAAAATCATGTGATTGCTGAGGCATATATATTTTT GTTCTTATATTTTGATCGTTAATTACTACAGAAACGATCTTTAGGAACATGCA TGATCAATTTTAGTACTATACTACAACCAATTTGCAGCGTCGCATAACAAGAC CATGTGATTTTTCATATATGCTATTACATCTGCCCAAAAAAAATAAAAATAAA CACTACTATAAAACCCATTCTCAATATCATTTTCTCCATCTTTACAGTAAATG CTACTACGTCCACCTAAAAATGCACAATAAACACTTTCACAATATCATTTTTT TGGCACTGATGTAGTAATAACTTATTGTGATAAATTTGTTATCACTATCGGTT CGTTACACAATTGTGGTAACATACAACACCACAATCGGCCCACTAGCCTAAT AGAAAACACTAAGGCTGTGTTTGGTTCTTTGGAATTTGGCCCAGGAACAATTC TATCTCCTCAAGACTATACAAATTAGTGAAGCAATCCAAAACTAGGAATTAT TCGAGGCCCTCATTCCAGTAGAACCGAACAAGGCCTAAGGCTTTGTCCTCAC TCTATTTCAGACTCTTCCCTCTCTCATAGCCGCTGCTCTAGCACCGCCTCCACG TCCCAAGACCCCAACGCCACCGAACTGGAGCTCTATAATTTCATCAATTTTTA GTTTTAGGTCCCGTCAAACTATAAATTTGATGTAATTTTTTCATCTGACTTTAT TTGAATAAATTATAAATTATTCATGCTATAGCTGTTTTTGAGGTCCATGCATG TTTGAAATTTGATTTCTAAGTATGGCTGACTTACCTAGAAATCATTCATTTTA GAGTCGGTTGACATAAGCAACCATCCCTAAAATATATTTTTAGAGATGGGGT AGATATATTTTCAGTGATAGTTCACTTAGTAGAACCATGGTTTGTAAGGTATG GTGTTTCCGGAGCTAGTTGCAAATAGTTTTCTATGTAGAACACCCTTGAAAAA TAGAAGACATCTCAAATTATTTTTTTAACTGGTGTGACTGTGCGAAGAACGCT 
CGTGAGGACTCACGGTCACGGCGCTGGAGCACATATTGGCCGCCAGAGTCAC CCAGGTGTACAACTCCAGGAAACTAAAGTTGCCACGTCAATCCATTACCATG


GCGGCGGTGGCGTGGCACAGCCTTTTGTTTTAAGATAAATAAAAAAGGACCG ACCGGCGCTGATCTTGACCATTACAAACTTGTTTGTTTAAAAATATTGTTCGC TGATTTATTGTGAGAGAAAAACACTGCTGAATAGCTGGCAGATTCAATAGAT AAGCTTAACCAAATACATTAGTCACTGCTCTAGGTCTCCAACGTACAGACAA AATGCAGAGAAAATGAAGAAGAAATGATGAGCACGCGACTGAGAGTGAATG AATACACTGCAGTCATAGTACGTTTGTAATCAGCTGTATCAAAGTTCTTGACA AAATATATATATAATACAGGGCTTGCACATCCAACAGTCTTGGAACAGAAAC TAAGATTAATTAGCATGGTATATAAATTAAAGGTGCAACAAACATCATTGCA AAATTAAAAGTGACATCTCAAAAGCACGCCACGGGTAAAGAAGACCACACA GCGCGCGCGCACAACTGTAAATGAAGTAATCCAAAGCAAAGCGGCCATGGT AGGCCGCCTAAAACAAAGTTGACACAGCAGAATAAGACCAACGACCATACG GACTAATCAAGCACACATGCATGAAGAAAAGACAAATCCATTAAACAGCAC CAGAAGAAGTTGCTGCATCGCCATCTTCTTCACCTGGCCTGCTGACGATCCCG GCGTGGAAGAGCAACGCCCAGAGCAGCGTTATCAGCTCGCCACCCCTGGCAA TGGCCTCCGAGTGACCCTTGAGGTTGTCTGATGGCGCGACGAACAGGATCAT CTCTGACCAGAAGTCAGCCAGCAGCTTCCACGCCATCTCCTCACCATTGATTT TCTCCACCAGCTGCTTTGCCAGCTCCACACCATTCTTGAGCACCTCATGCTTC GAGTTCTCACTCAGCAAGTGGACCAGCTTTTGATACTCATCTTCTGGCTCCGA TGACCTTGCTACTGCGTGACCGGCGAGGGCATGCTCTGCGTCCTTCTTGACAG CCTTGTACAAGCTCTTGCTCCATGCATTGTCTTATCCGGAAGTAGCTCAGGGC ACCAACTCACTAGGTATGCACAGTACTTTGACAAGTGACTGGCAACAATCTT ATAATCTGTAGAGCTGGGATTTGGTGGAGAACCCTGTTCTTGATGATCAGCTC GGTATGGGTGTTTCACTTCAAGGATGCATGTGGCGATATGCCATGTGAGTATG ATATCCGATGTACCTTTGCCATTGCAAGCCCCAATAAAGCTTTGTCCACCAAC TTGGCTTTGTCGTAGAGATTCTGCACCATTGCTCAAACGGTGTCCATTGATCA GATTTCTTACTGCATTCAGGATGCTAACCTTCACGGCCGCTGGCACCTCCATT TTCCTCCTCTGATCTGGGACACGGAGGAGACGGTATACAAGAACTATTGGGG TTGTTTTTGGCTGAGGTACCAACACTGAGCACTGGCTCATTTTTTCATCCCAG TGTTTCATCAGCTTGCATCTGCAGCGTAACAAACTACTAACCAATTTTCGCAT GCATAGTGAATGTGGTGAAGAGTTGCGGTTGATGTAGCAGCAAATAAGGGTA ACTTTTGTCCAGTTGGAGCAAATATAGGAAACCAAGCTTCTAACCTCAGCTGT CACCGCTAACATCAGAACCAAAAATATTGGCAACAGATCGAAGGACCACTCT CCCATATTGATGTGCGACATGTAGCTTGCTAAGTTCCCTTGAATACACCATAA TTTACACTGAAGTTGTTGGCCCCTCCCATGCTCTAAGATCTTCCGTACTATGTT TGCTATGACTAGCAAAGTAAACAAGATACAATAAACTACTGTGAAAAGTGAT AAAATGATATCCAAGATGGGCAGGCAACAATTTGAATAGGAGATTGGGAGA GACGAGTAATAATAATCATGAAGAAATGATAACTCTTCCTCAATCACCACAA 
ATACCCTTTCATCTTCTCCAACCTTGTGCAACAAGCTCCGGAAAAAGTTGCGA GTCCCAGTAGAGCCACCATTAGTGATCTTATACCGTGCAAACCGACATCGCA ACAACTTGAACAAAGCAAAGGACAAGCATATATCTTTTAACTGTGGCGTTGC AATTAGAAGCATGCTATCTAACTGCCAAACTCTATCAATAGTCACCAATCCA ACATGTTTGTTCAGCATTGTTTCAGAGTCATCCTTGAATGTATACCCATGAGG


ATGCTTCTCCACTTGTTCCTCCGATTCTCTCATAACCAAGAGTGGGGGAGCAT GTGCACCATCACCTACCAGCGGTTCATCAACATGCTGATGTGCTGCTTCTTGT GCTTGTAACTTCTGCATGTAAACAAAAACAAGTCCTGGATTGCGTCCGAGTGT GAAGGACGACCGCCGTGCCATTTCGAATGCATAAAATTTGAGCCCTATTTTG GAGCATAGCAGAGCATAAGGTGTGATGAAGATGAAGATGAGTCCTATAATG AAGCCCTCTGCGGCATAGAGCTTAAGGAGACCACCGGCAGCAAGGTAAAAA GTCCAAATCCCCTGTATAAGCAGCTCGGGCGGGGGACCTTTGGATTGGCCTTC TCTGTCATCAACAGAGACTATTGTGCTTGTATTGATCATGACTACCAGCACAA GGGATGCCAGAATTACGAAGAATACCACGTGGCTAAGTGCATTGCATTGTGC AACGATCGCCAGCTCACCATAATCACCATAATCCTCACCATAATCTTCATACA GTGAGGAAAACCTGATGTAATTAGGGTGGAGGCTGATGGTGGAGGTGAGAT AGGAGATGATGGGCAAGAAAAGTGTGTTGGCTCCAAGGAAGACAAAGCGAG TGAAGGGGTGGTGGCGGTACCGCTGACCATAGGCGCCTATCCCAACAAGTAC TACAGCCAGGACAGGTTGAACCACTAGCAGGGCATTAACCTCCCATAGCTGC CGTCTAATGCTCTCGTGGACGGTTGTCTGGAGATCCATCGCTTGTTTGGAGCA ATTTCCGATCAGCTTCCCATTCCCTCCAGCCATGCTCAGCTGCCTGCCAAACT TCAAGTCGTTAGCTAGCACCATGCAAGGCAAGGCACATGCAACCTCGTGCAA GATAGACGGTGTTGTATGTACGAGAAACTATTAACATGCAAGTAATTTGAAG AAAAAAAATAAACGAGTGGAGGAATTACATGAATTAATTAAGGTGTGGCGG TATAGCTTAGGCCGAGATCCAGCTGACTGGTGCTGCGCAATCCAGCAGCCCA GCCGCTCAGCGTTTCTTATGGATGGTTTCCTTCGTAATGGAGTTTTGGGCTTG CTAGCTACTCACGCGTATTAGAGAAGGGTATATATGGGAGATGCCGAGAGGA AGAAAGTAGAGGTTGAACCTGACCTTTGGAACGATTGGGTCATGATCCCACG GCTATTACAGCCAAGAATCTTTGATGTTCCATAAGTATTAGAAGTTTTGAGGA CATATTTAGTAGGTAGAACTAATCAAATGATTATCTTTACTCTCTCATTTTATT TTCTAACACAGTACCCATTTAAATACTATATTCTTTATTTGGTTGCAATAAATT AATCTACCTATTACTCTATATATATGTTTTACCATCATTTGCACTGCGCCTTGG TGGGTTTAATTTATATGTATGTGTCCAAATATTCGGTGTGATAGGAGTTAAAG TTTACCGCTAGAAATCAAACAGGGACTAAGTATTGTCTCGGCGCTGCTAAAA CTACACCAGGGAAAAATCGCAAACCTTTGAAGTTTGTATATGTCGTTATGCTT TTAGTTTTAAGTTTTACTCTAGATAGTTGATGATGCAATCTGCTGTGAATCCTT TTTAATTAATCTTTAAGGAAACAGCAACATGGATGTATGAACTGATGACTCA ACACAGTCTATTTTCATGTTCAACAAGGAAAATGAAATAATGATTCTCTTGTG TTTTTGAATTGTAGGAGACTAGGAGGATTCCCCGCGTGTTGCCGCGGGTATGA AGAAGGAATATAAATGTTAGAACGTATAGAAATGTGACTTGCTGACAGATAT TATAATTACATAAGTTAGTAGATTTTAATTGTATGTGATAGTAGATTGTCTAG TTGGATAGCTTGTATGTTGATAGAAATAGGTAGTTAGGTGTCATAGTGGATTG 
TCACATGTTAAAAAAATAGGTAGTGAGATTTTGCTTTATAAAAGTATAGTAA GATTGAAAGTCGCTGCAACACAGCAGGAAACCCAGGGTTTATTTTAGTCTGG TCATTCACTCAAATGTACGTAAATTAGCTAGGTAATCTGCTGTGGATCGGAAT ATATACTTTGCTTTATTCATGGAAGAGCAGGTGTTCTGCTAATCTTTTAAAAA AGGAATAAACAATGAAAAAAGAAAAGCAGCTAGTACCTTAAATCAAGCATC


ATCATATGCATGCGAGGTCCCCTTTATTCCTCAGCGCCTGCGTTTTTCTTTTTC TCTTTGATTTCCGTCTTTTTTTTCTTTAATTTCTGCCTTTGTCTATCATTATTAC CTTCTCCATTGCACAGACGATATGAAGCAAGTCGAGTGACTAGAATGGCACG AGCTCGAGTGACTTGTGGCGAGCGTGTGAGAGAGCGTGTGTTTGGCTCAGAG TAATACATGGAGAGGCCAAGTCTAAACAACGACTATCTTCATTTCTATTTAAA TTTGTCTAATCAATGATTCCCACACCATTGCCGTCCACTATACACCACTTGGT ACGTGCGGAGCTATTACCACCATGGCCACTAAACCACTCAGCCCACCAGGGT GTCTTGGAGTCAGTGGCGGAGCCAGGATTTGAATCTTGGGTATTCCTTTGTTA GTGTCTAGTGAGTCGTTATTCAGCTGTTGCCTCTAACCTTCGACCGACGTGCG CTCTCGCCGGTCGCCATGTAAGCATAGCATAGAGATAGAGGTCGCGTCCGCG CTGCATGTCCCGCGCTGCCAGGAGCTTAGGACATTGAGAACCAGCAGTACCG CCACCACATCATGCCTGGTAGGCTAGGGTGAGAGATTTAGGCCTTGTTTAGTT CGCAAAAATTTTCAAGATTCCTCGTCACTTCGAATCTTTGGTTGCATGCATGG AGCATTAAATATAGATGAAAATAAAAACGAATTACACAGTTTACCTGTAATT TGTAAGATGAATCTTTTGAGCCTAGTTACTCCTTGTTTAGACAATGTTTGTCA AATAAAAACGAAAGTGCTACAGTAGCCAAAAATGGAAATTTTTGCCAACTAA ACAAGGCCTTAGGAATGAGGATTATGGAATAGTCTCCCACTCATGTCCTTATT ATATATGCTGCCATGTGCATATTTCCATATTTAGATCCAAAAAGTTTTTCGAT TGGAGTAACTAGGCTTAAAAGATTTGTCTCACGATTTACACGCAATCTGTGCA ATTAGTTTTTATTTTCATATATATTGAAAAGTTTTTTGGTTTTAGGATGAACTA AACAAGGCCTTATGTTGGAAAAGAATTTAGAAATAAAATATGATTTGTACTA ATATTTAGTGACATGGTCAATATGTATGTGGTCTGAAGATAAATAATTAGATT CTCATTTACAGAAACAACAAATTAGTACAAATTCCTTTGAAAAGAATGAGAG AGAAATTTTATTTGTTAAATATAGAAATGCTAATATATTTGATATACATATAT ATAAATATACCTATATATACTTAGTGCACAAAAAGTTAGGTATTCCTAGGAAT ACCAGGAATACCTGGTGTATTCGCCCATGCTTGAGGCATGTTTGGTAACGCG ACTCTGAATGGCCTGGCTGGCTGGCCTAGTCTAGGTCAGCTCGAGCCTGGTCA TAACGGTGCAATCTCATATGTTTGGTTGTCTATTCTAACAGAGACGAGCTCAC TTCAAATCGTGTTTGATTGGTAGTATACTAGAATCTGGTAACCCTAGACTTAC AACATGGTAATGTTACCTCCTCTTTGTCATTTCATCGCGAATACATGGAGGTC GTCGGCGTCGAGGCATGGCATGGCTCCGATTGGCGGTGAGGCATGGCTGCAT ACGACGGTGAACCCTCCTCCCCATCCCTCTCTGCTTGCCTACACCTTAGTCAC CACTAGCGTAACTCGCGAACGCGGAGCACACTACGCCATTGTTGTGGCTGCT CGACCTGATGACAAGCTCGCAGCCACCGCCAGGGCAGCCTCTCTCCATGCCG CACCTGGAGCACGGCATGGGAGCAAGAGTGTCTACGGCCGGCGGCGGCGCC GCTTCTAGGAGCAACGTGACCGAGGCGGTGGTGTCAGAGGGGCGCCGCTTTC AGGAGCCCATCCATCGCGTTGCGTGGTCCCCGTGTGTCCCTAGAGGAGGAGG 
CAGTGTGGCGGGACTCTGATTTTCCCAGATCTCATGACCGGGTCCGGCTCTGC CACCATTGCCGTGACAGAGGCAATGTTGCTTTGCATAGCATCCGTGCAGAAT AGGCTATCCAGTGACTCGTACATGCCGATGTCGAAGAAGGCGAACTCATCGC AGAAGAGGAGCACATTTGGCTCAGAGCCGGAGGAGGCCTATGGTGATGCCTC GTCTTCTGTGCTCTTCGACTCAGTCCTTGGCTGCTTCTTGTCGCCACTCTAGCC


TCAACATCGAGGGCGGGGGCTGCGTCATACGCCGCTGGCATGCCCTACCGTG AGATGTCCAGGAGTGCGCCAGAGCCAGAGCGCTCATTGGCCTCGTACTTCAA GGCGGCGCTGCTCTTCTTCCTCTTGGTTGACAAGGTCGATGACGGTGGCGTCG GTGGTGTGCGTGCTTGTGACTGGCTAGCTAGGGCCCCGATGGTGGCGGAGAA GAGAAATAGGTGAGAGGCCGGGAGGTGGCAAGTTCGTTCACTGCACGCTCCT TCTCGCTCCACCTGGCTAGGCTATGGAAAAACAAAACCCCCCATCATTTTAGT TGATCCCATAACTATAAATAGTTGATTGCGCAACTCGACATGTAAGTGGAGG GTCGGCTCCCAGGGGGGGAGAGGATAGACTTAGACAAACCGATAAGCTAAC AAGTGGAACAACCGAGTGTCTTCCACCAGCAGAAATCTCCACTTTTGTGACG CTAGGGAGAACCCTCACCGCTGACACCGTAAGCTAGCACTGAGCAATCCCCT ACCAAACTCTCATTCTCCTTCATTATAAACCCCCTGAGATTTGGGTGCTCCTG AATACGAAGAACACCACACTACTAGACGAGGGCCTCGATAGCCAAAACTAG GATAAACCGTCATGTCTATCTTGTGTTTTTAAGTACCCAAGCCACCGGAGACC CACACTTCTTATATTCGACAAACAACCATGATGCTTTGCAGTTATGAATCCGC GACAACTAGCACACAAAGTAGGGAAGTGTCGAATGTGAATTCGGCTCTATGG TGGCCGGAGCCACCCGACTTGTCGCCGAAGCAGGTTTAAATGAAGGGGCTGA TTGATGATTCTTCTTGGGATACAAAGTTGGACCATATTGGGCCCAAAAAATG GGAGGAGAGAAAGCAGTCGGTGGAAATTTAGTCTTCTTCAACTCCTAGAACT ATCATTGTGTTCTGATTTTTCTGCTAAAACTATAAAACCGCATATCTTACCCCC CCCCCCCCCCCTCATCTCTCAGAATCAATTACCTCCCTATCCTAGTTAAAAAT GGTGATAAATGCCAGATAAGACTTTTAAACCATAAAAATATCTCTACGGATA TTTACAGGCCGTGAAAGTAGCTCTAAAAGCTTTCAAAAATAGATTTACCTTTT CAGAAAATAGAAAATAAACAGAGAAATCTGTAGAATTCCCAAAATATTCTAA TTGATGCATAAACACGTTTTCAAAACTTTCTATAGGAAACATGAGGGCAATA TCAGCAGCATATGCATTTAGCATCATATTTGTGTTGCTTTCTTGTCGTCTGCAT CTCATATTTTTGTTATTGGAAATTCCTTAGACACATTTGGACATCAATTATGG AAAATGTTCATATTTATTTGGATATTTTCTCTTTTTCTTGAGCTATACATAAAT TTTTGGAAGCTACTAAAGTTAATTTCCTTAGCTCTCAATATTTTATTTGGGTTT ATAAATATTTTAATTGGGGTTTTCAAGTACCAATCTTACTTTAGGATTTCTCCC AAAATTTTAGAAGCTTAGACGTAGTTTTCATGGTTAAAAACCTTATCATATTT TTAGAACCAAATTGTCCTCCTGAGAAGCACTGCTAACGGAGCGGGGAAGTAA TTGAAAAGACTTTGAAGAGTTGGGGTGGGGGGGGGGGTGGGGGGGGGGGTT AAGATTTCCATTTGTGATGTTCGAAAAAGAAATTGGACTATGATAGTAGTTCA AGGAGGATTGTTTTTTCTGAATTTGCGAGCGTGTTCTCTGTTGGCCCATTAGG ACAATGGAGAGGCACGAGACAATATTTGCTTGGCAACGGAAGCAGGCAGCA ATTGGCAATATATTGGCCTCATGACTTGATTTGGCTACAAAATGCAGAAGAG AAAACTAAACCAACATCCAGATGCATGTAGTTTTGAAAAAAAAATGCATCGG 
CGCTGCTGGACCGCAGAACGCCGCTTCTGCCATGGTCTACAGACTTCGGCCA CCTGTGTCATATGTGACCAACTACCTGAGACCATTGATCACATTCTTCTGGGC TGTTGTTTCAGTAGGGAAGTTTGGCAAATTTGCTTCAGCAAGCTGCACCTGCA GGGCTTCGTTGTTGTCAATGAGGAGCCTGTGATGCATTGGTGGGTTCACAGCA GGAAGGCAGTCCCTAAGCAACTTCGGTGTGGATTCGACTTGCTTTTTTTCTAG


TGGGCTGGCTGCTTTCGAAGGAGAGGAATGCGAGAACTTTCGATGCAGTCAC CTCGACACTGTTACAGCTGGCAACAAGAATTATGGATGATATTGTTGTCTGGG AACTGGCAGGGTTTAAACCCTTCTTGGCTCTTCTAGGTTTGGTGTCTTCTTAGT AGTCCTCTCCCCGCTTCCTCTTTAGCCATCTTGGCTACTGCCTGTTCCTTTGCT GTTGAGTGTTTTGGTTCTATGTTCGGCAACATATGGTTACCGTGTAACTAGCT GTATCCTTCAAAGCTATTCCTGCTTAATGAAAAACATGCCTCCCGACGTGGTC TAGAAAAAAAAATTCTTTGGCACTTACTCATCAAACGTTATTCTACCATGCAA GTCATTCGTACAAAAGCTTACACATACATGGATCATCACACAGTATGTGGAG ACAGTAAACAATTTGTATCTGAATTATTAGTAAATCTTAAGAGCTTCATTACA ACATTATTTCAGGCAGGACACCAAACAGACTAGTTACAATTGATACAGAAGA CAAGAGTACCATTATCACATCAGCAGCGACGACAACATCATTCAGAGTGGCA CCTCATCAAATGAAATGCACTCGGCATGCAGAAACATTACCTCACAGGCCGA CAGGCGTACAGCGAGGCGACAAAGTATGAAATAATTAAAACCAACGCTGATT TTTTGTTGGCGAAAGCATAACAAACCAACCCTGAATTAAAACATGCATGTAT GAGTAGTAGGTAATAGGATAAGACTCCATTAGACCACACCAGCGGCAGAGGT AGCAGCAGCAGCAGCAGCAGCAGCACTGTTGTTCTCGCCTGGCCTGCTGACG ATCCCAGCATGGAAGAGCAGCACCCATAGCAGCGTTATCAGCTCGCCGCCCC TGGCGATTGCCTCTTTGTGCCCTTTGAGATTGTCGGACGGCGCAACATACAGG ACCATCTCCGACCAGAATTCAGCCAACAACTTCCACGCCGTATCATCACCAC CATGTAGCACCAGCTTCGCCAATTGCTCTCCCAGCCTCGCGCCGTCCTTGACT ACTTGATGCTTCGCTTTTGTACTCAGCAATTCGACCAGCTGTTGGCATTTGAC TTCCTGCGTCAATGAATCTCCTGCCGTGCAGCCGGCAAGGACACGCCGAGTG TCCTTCTTGACATTCTCGTACAGGCTCCTGCTCCATTCGTGGTCATCGGGAAG CAGCTCAGGGCACCATGTCACGAGGTAAGCACAGTACCGTGATAGGTGGGTG GCAACCATTTTGTAGTCAGTGTTGGAAAGTGGTGGAGAACCTTGTCCTTCGTC ATTCAGGTTTGGGTACCTGACCTCAAGTATACTTGTGGCGATGTGCCATGTGA GTATGGTATAAGATGTACTCTTGTTGCTGCAAGCCCAGTTGAAGCTTTCACCA AGGTCTCGGCAGCGGCGGAGACATGCTGCTCCATTGCTGAGCTGTCCTCCAT CATCTCTAGTCCTTCTTAGCGCTTGCATGATGCAAACCTTCACTACTGCTGGC ACACTGACCTTCCTGTTCTGGTCTGGTAAAAAAGGGAAAAGGTACCAGAGAA GACCAAGCGGGGTTGCTCTTGGTTGAAGCACCAATACTGAGAACTGGCCAAT TTTCTCATCCCAATGCCTCCTCAATAACTTGCATCTGCAGCGGATCAAAAGGC CGGCACATTTCTGAATGCAAAGAGAATGCTTCGATGAGGAGGAGGAGGCAC GGTTTAGGAGGCGACACACGAGGGCTACTTTGGTCCAGTTAGAGCAGACGTA GGAAGCCATGTCCCTCACCTCAGTAATGAAAACTAACACTAAAAGAAAAAGT AGTGGCACCAGACCAAAGAGCCAGCTTCCATACAATCTTATGTCCAATTGTG TTTGCAACTGATTTTCAGTGCAACTGATGAAACACAAAAGCTGAGGTGAACT 
AGTATTTACAGCTAAAAGGAAGCTGAGCCACGTTGCAGCCAAGATGCAATAA GTTATGCTCAAGAGTGAAATAAAGATACCCACGATGGGCAACCAGCACTTGG AATAGGAGATCGGTATGGATGTGTAATAATAGTCATGAACAAAAGAAATCTC CTCTGAAATCAACCGAAACACCCTGTTGTGGTCTCCCTCCTTGAGCAGCAAGC TCCAGAAAAAAGTGAACATGTCACTCGAGGCATTAGCAGTGTGGACATTGTA


CCCTGCAAACCGGCACCGCAACAACTTGAAGAACGCAAAGGACAAGCATAA GTCTTGTAGTCGCTGCAGTGTTGAGAACGGAAAGAAGCAGGCTGTCATCTTC CAAACACTGTCAATAGTTACCAGGCCATCTTTACCGTGCACAGCCATGCCTG AGTCACCCTTCCATACATACCCATGAGACTGCTTCTCCATGTTTCTTTTTCCCT CTCCCATAACCACTAGTGGAGGTGGAGGAGGAGCATCCTCATTAGTTACTGC TGCTGCTTCTCTACGGTGACTTGTCCTTGCTTGTACCGGTGGTGGTTGCCGCAT GTAGGCAAAAATAAGATGAGGATTGTTTCCGAGAGCAAAGGATTGCCGTGCC TTTTCAAATGCATAATATTTGAACACCATTTTGGCACATGTGAGAGCAAAAG GTATCCCTTCAAGTCCAATTATAAGCACAACAAAAGTAATAAGTCCATCTCTG GAATTTATGCCACTATTATTAGTGAAGTAGGTGACACCTAGGTAATAGGTCC AGAGGCCCTGAACAAGCAGGTCAAAAGGAGGACCTATGTTTCTGCCTTCTCT GTCATCAACAGCAACTATTGCACTGGTATTGATCATGATTATCTGAACAAGG GATGCCCATACTATGACCACGGGAGATTGAAATTCTGGATCGCATCTTGCAG CCAACCGCCCCTCATCTGTGTCTGTGACATTGTAGTTTGATCCAGCGACCAAG GGGACAGCGGTGGACATGACAGGCAAGAACAAGGTGGTGGCTCCAACAAAG ATGAACCTAGTTAGGCGGTGGTAGCGGTAACGTTGGCCAAAGATGCCTATCC CAACTATGACTCCCGCCAGGATGGCGCTGGCGACCAACAGAGCGTTCACAAG CCACAACCTGTGTCTGATGCTCTTGTGGAACAAGGACAGGGATTGGCCATCG CATATAATCCGGGAGGAGCATTGTTTGAGTGCATTGTAGGAGCAGTGGTCTC TAGTTGAATTCCCTCCAGCCATGATTGCCCCTGCTCATTGCCAAAATATCAAG GTTATATAAACACAGATTATTTTGTAGCATGCTGAAATGGAAATAATATGGA CTATGGAGGCATGTACATCTTATCTTACAATTACTAGTTAGAGGCCTGAACGG AAACACCATGTTAATTCTCACTAGGCGCACTAGGAAAAAAGATAAATCCTAG AAATCCTCATAATAATCAGAAACATCTGACCTTGAATTTGGTAGACTTGAATC ATCATCACCGATAAATAGCAAATGGATATATTGAATTTAGAATAAAAATTAC CCACCATTGTCTTTAAGAAAAATACATAAAGTGACCCCCTTATTTTGTATCTA AATTACCCTCCTCCGCCATTACACAAGATCATTCAGAATAACCTCCTAAATTG GTATATAGATTACCCATAACTGTCATTATGAAGGATAATTCAAATAGCCCCCA AAATTGACATCTAAATTTCAAAAATTTGTCATTATAAAACATACTACCAAATA ACCACATAACATTGCATCTAATTACCCTTCTTAACCATTATGAAAGATAATTT CAAATAACCTTCTAACGTTGTATCTAAATTCCCATCTTCTATAACAAAAGATG ATTTAAAATACCCCTAAAGTTTTTATAAGTTATCCACCTTTGTGATTATAAAA GATAAAATAACCTCCTAAGTTTGCATTCAACCTACCCATATGTGTCATTATGA ACGATAAATATTCGATGTACCTCAGTCACTATAAAAGTAAACAAAATCATCT CTCAAATATACATGTAAATACTTATATATGTTGGTATGGTACATATAGATATT TCTTGCAATCGATTATGGAGAAGCCAAAAAACGGGTTATTTAATTTCTTTTAC TTTGAATAAAAAGAAGAAACAACTATGATGAACTCAAGGTAGACATTTCTGG 
AAGTAGGTGGGATTACCCCTTTTCATTGTACGGACCATAATTATGCCAAAATC TTCACCCCAACAGGTTGAGAGGCCCCATCACCGTCTTCATGTTTTGGCAGTGG AAGTGGACGATTATGCCCATCTTCATCAAATGTTGGTATCAAACTGGCCATGA AAATCGATAAATTTTTTTTTTAAAAAGTCAAGGTCAAGTGCGTGCCCGCCACA CCCATGGCACCGGTCGTGTCCTTACGCCCCTCGACACTGTTGGCTGATTCAAA


CCCTGATGCCCTCGTCCACCCGATCATAGATCAACACCGTAGGGACGCCGCC TCACCGTTGCTGCCATAGAGCCTCCTCTGCCCCCGCTGAGGGGGGGGGGGGG GCAGGGGCGAACCTAGCAACAAGCTAGTGGGTGCAAATGCACCACCCAAAA TTTGAAAAAACAGTGGTTTTTTCTAAAATTTTACCATATATGCACCACATATA ACACAAGTCTATGCACCCACAAGGCAAGTGCACCACCCTCTAATTTGCTCTA GCTTCGCCACTGGGGGGGCGGGGGACCTGTGAGCGAAGGGAAGCACCCACC GCAATGGGGAGGTGCACCACCACTAGGGTCAGCATGCCTGTTGCAAGGGCAG GCATGGGTGTGGTTAGGGCCAACACATCCACGCCAAGGGGGTCATCCCGTCG AGTGTCTACCATGTGCCGGGTGGTTTGGTACACCAGATGGTAGGGCGAGAGC CTGGGGAAGGGAGAAACAGAGGGATAGAGCCTGGGAGAGAGGGGGGAGGA TAACACTGAGTTGAGAGAGAGAGAGATGGTAAGAGGATAGAGTTCAGATGG TAGCAACCTTACTAAATACATTGAGGATCCAAATTTTGCTTATGTGAGGGGGC TCTTAACATCGGTTCTTCTCAACACGAACCGGCAGTGAGAATGTTTACTGTTT GCTCATAAGATGAACTGGTGGTCAAATTCATTTTCACCACCGTCCATGTTCCA TTCTGAACCAAAAATGATTTTCAACGTTGGGTCCTTTATGGACATGTGGTGAC CAACGCTTACTACGTGAAAATGGTGTAATGAAAACTATTTCTTTAGTAGTGCA TGCATAG


Figure S4. Sequence Alignment Between the Johnsongrass Full-Length Iap Amplicon Consensus Sequence (top) and the Sorghum bicolor Reference Genome (bottom).
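In the alignment, each block consists of the Johnsongrass consensus, a match line in which `|` marks identical bases and a space marks a mismatch, and the sorghum reference. The sketch below shows how such a match line and a simple percent identity could be derived from two already-aligned, equal-length strings. It is an illustration only and not the tool used to produce the figure; `match_line` and `percent_identity` are hypothetical names, and treating a gap character (`-`) as a non-match is an assumption.

```python
def match_line(top: str, bottom: str) -> str:
    """Return '|' where aligned bases are identical and ' ' elsewhere."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    # ASSUMPTION: a gap character ('-') never counts as a match.
    return "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(top.upper(), bottom.upper())
    )


def percent_identity(top: str, bottom: str) -> float:
    """Percentage of aligned columns that are identical bases."""
    line = match_line(top, bottom)
    if not line:
        return 0.0
    return 100.0 * line.count("|") / len(line)


if __name__ == "__main__":
    # Short excerpt from the second alignment block of the figure.
    johnsongrass = "GGCCAGTCACCAC"
    sorghum = "GGCCAGTCGCCAC"
    print(johnsongrass)
    print(match_line(johnsongrass, sorghum))
    print(sorghum)
    print(f"{percent_identity(johnsongrass, sorghum):.1f}% identity")
```

Identity computed this way depends on how gaps and alignment ends are counted, so values from different tools are comparable only when the same convention is used.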

CATTGACCCTGACGGTGACCATGCCAAGTACAGCAAATTAACAGTCACCGTGTTTTACTAGTATATGCCAT CTTG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CATTGACCCTGACGGTGACCATGCCAAGTACAGCAAATTAACAGTCACCGTGTTTTACTAGTATATGCCAT CTTG CTTGGACAAGTCTCACTAAATTCATCAGTATAAATACCGTGGCCAGTCACCACATCTCATGCACCAAAATC CATC |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||| CTTGGACAAGTCTCACTAAATTCATCAGTATAAATACCGTGGCCAGTCGCCACATCTCATGCACCAAAATC CATC TACCACAACAGCGCGCCATCTCTGCTCATTCAGATAACTTGGTCAGCAGCAACAGCGGCAATGGCGTCGTC CTCC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TACCACAACAGCGCGCCATCTCTGCTCATTCAGATAACTTGGTCAGCAGCAACAGCGGCAATGGCGTCGTC CTCC TCGTCACCGACGAAGCTGCTAGCGTGCCTCGTCGCCGCAGCTCTGGCGCTGGCAGCCACCGTCGTCGCGCC GTGC ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| TCGTCACCGACGAAGCTGCTAGCGTGCCTCGTCGCCGCAGCTCTGGCGCTGGCTGCCACCGTCGTCGCGCC GTGC GCGGCGCAGAACTCGCCGCAGGACTACGTGAACCCGCACAACGCGGCGCGCGCCGACGTCGGCGTCGGCCC GGTG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GCGGCGCAGAACTCGCCGCAGGACTACGTGAACCCGCACAACGCGGCGCGCGCCGACGTCGGCGTCGGCCC GGTG TCGTGGGACGACACGGTGGCCGCGTACGCGCAGAGCTACGCGGCGCAGCGGCAGGGCGACTGCAAGCTGAT CCAC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TCGTGGGACGACACGGTGGCCGCGTACGCGCAGAGCTACGCGGCGCAGCGGCAGGGCGACTGCAAGCTGAT CCAC TCCGGCGGTCCCTACGGCGAGAACCTCTTCTGGGGCTCCGCCGGCGCCGACTGGTCGGCGTCCGACGCCGT GGCG |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||| TCCGGCGGTCCCTACGGCGAGAACATCTTCTGGGGCTCCGCCGGCGCCGACTGGTCGGCGTCCGACGCCGT GGCG TCGTGGGTCTCCGAGAAGCAGTACTACAACCACGACACCAACAGCTGCGCGGACGGGAAGGTGTGCGGACA CTAC |||||||| ||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||| |||||| TCGTGGGTTTCCGAGAAGCAGTACTACAACCACGACACCAACAGCTGCGCGGACGGCAAGGTGTGCGGGCA CTAC ACGCAGGTGGTGTGGCGTGACTCCACGGCCATCGGCTGCGCCCGCGTCGTCTGCGACAACAACGCCGGCGT CTTC


||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ACGCAGGTGGTGTGGCGTGACTCCACGGCCATCGGCTGCGCCCGCGTCGTCTGCGACAACAACGCCGGCGT CTTC ATCATCTGCAGCTACAACCCGCCGGGCAACTACGTAGGCAGGAGCCCATACTAGACGTAGTAGTGTGCCGC ATAT ||||||||||||||||||||||||||||||||||| ||| ||||||||||||||||||||||||||||| || ATCATCTGCAGCTACAACCCGCCGGGCAACTACGTCGGCCAGAGCCCATACTAGACGTAGTAGTGTGCCGT ATGC ATAAATTGAGTACATGCAAGTATACGTACTGGGGTCGGAGTGAAAATAAATTGTTGTCAACTTTATACCAT ACTA || |||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ATGAATTGAATACATGCAAGTATACGTACTGGGGTCGGAGTGAAAATAAATTGTTGTCAACTTTATACCAT ACTA TGAATTTTGATAAACATAATAAGTCAATAAAATCATGTGATTGCTGAGGCATATATATTTTTGTTCTTATA TTTT ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| TGAATGTTGATAAACATAATAAGTCAATAAAATCATGTGATTGCTGAGGCATATATATTTTTGTTCTTATA TTTT GATCGTTAATTACTACAGAAACGATCTTTAGGAACATGCATGATCAATTTTAGCACTATACTACAACCAAT TTGC ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| GATCGTTAATTACTACAGAAACGATCTTTAGGAACATGCATGATCAATTTTAGTACTATACTACAACCAAT TTGC AGCGTCGCATAACAAGACCATGTGATTTTTCATATATGCTATTACATCTGCCCAAAAAAAAAAAAAATAAA CACT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||| AGCGTCGCATAACAAGACCATGTGATTTTTCATATATGCTATTACATCTGCCCAAAAAAAATAAAAATAAA CACT ACTATAAAACCCATTCTCAATATCATTTTCTCCATCTTTACAGTAAATGCTACTACGTCCACCTAAAAATG CACA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ACTATAAAACCCATTCTCAATATCATTTTCTCCATCTTTACAGTAAATGCTACTACGTCCACCTAAAAATG CACA ATAAACACTTTCACAATATCATTTTTTTGGCACTGATGTAGTAATAACTTATTGTGATAAATTTGTTATCA CTAT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ATAAACACTTTCACAATATCATTTTTTTGGCACTGATGTAGTAATAACTTATTGTGATAAATTTGTTATCA CTAT CGGTTCGTTACACAATTGTGGTAACATACAACACCACAATCGGCCCACTAGCCTAATAGAAAACACTAAGG GTGT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| CGGTTCGTTACACAATTGTGGTAACATACAACACCACAATCGGCCCACTAGCCTAATAGAAAACACTAAGG CTGT 
GTTTGGTTATTTGGAATTTGGCCCAAGAACAATTCTATCTCCTCAAGACTATACAAATTAGTGAAGCAATC CAAA |||||||| ||||||||||||||||


||||||||||||||||||||||||||||||||||||||||||||||||| GTTTGGTTCTTTGGAATTTGGCCCAGGAACAATTCTATCTCCTCAAGACTATACAAATTAGTGAAGCAATC CAAA ACTAGGAATTATTCTAGGCCCTCATTCCAGTAGAACCGAACAAGGCCTAAGGCTTTGTCCTCACTCTATTT CAGA |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ACTAGGAATTATTCGAGGCCCTCATTCCAGTAGAACCGAACAAGGCCTAAGGCTTTGTCCTCACTCTATTT CAGA CTCTTCCCTCTCTCATAGCCGCTGCTCTAGCACCGCCTCCACGTCCCAAGACCCCAACGCCACCGAACTGG AGCT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CTCTTCCCTCTCTCATAGCCGCTGCTCTAGCACCGCCTCCACGTCCCAAGACCCCAACGCCACCGAACTGG AGCT CTATAATTTCATCAATTTTTAGTTTTAGGTCCCGTCAAACTATAATTTTGATGTAATTTTTTCATCTGACT TTAT ||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||| CTATAATTTCATCAATTTTTAGTTTTAGGTCCCGTCAAACTATAAATTTGATGTAATTTTTTCATCTGACT TTAT TTGAATAAATTATAAATTATTCATGCTACAGCTGTTTTTGAGGTCCATGCATGTTTGAAATTTGATTTCTA AGTA |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||| TTGAATAAATTATAAATTATTCATGCTATAGCTGTTTTTGAGGTCCATGCATGTTTGAAATTTGATTTCTA AGTA TGGCTGACTTACCTAGAAATCATTCATTTTAGAGTCGGTTGACATAAGCAACCATCCCTAAAATATATTTT TAGA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TGGCTGACTTACCTAGAAATCATTCATTTTAGAGTCGGTTGACATAAGCAACCATCCCTAAAATATATTTT TAGA GATGGGGTAGATATATTTTCAGTGATAGTTCACTTAGTAGAACCATGGTTTGTAAGGTATGGTGTTTCCAG AGCT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||| GATGGGGTAGATATATTTTCAGTGATAGTTCACTTAGTAGAACCATGGTTTGTAAGGTATGGTGTTTCCGG AGCT AGTTGCAAATAGTTTTCTATGTAGAACACCCTTGAAAAATAGAAGACATCTCAAATTATTTTTTTAACTGG TGTG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| AGTTGCAAATAGTTTTCTATGTAGAACACCCTTGAAAAATAGAAGACATCTCAAATTATTTTTTTAACTGG TGTG ACTGTGCGAAGAACGCTCGTGAGGACTCACGGTCACGGCGCTGGAGCACATATTGGCCGCCAGAGTCACCC AGGT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ACTGTGCGAAGAACGCTCGTGAGGACTCACGGTCACGGCGCTGGAGCACATATTGGCCGCCAGAGTCACCC AGGT 
GTACAATTCCAGGAAACTAAAGTTGCCACGTCAATCCATTACCATGGCGGCGGTGGCGTGGCACAGCCTTT TGTT |||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


GTACAACTCCAGGAAACTAAAGTTGCCACGTCAATCCATTACCATGGCGGCGGTGGCGTGGCACAGCCTTT TGTT CTAAGATAAATGAAAAAGGACCGACCGGCGCTGATCTTAACCATTACGAACTTGTTTGTTTGAAAATATTG TTCG |||||||||| |||||||||||||||||||||||||| |||||||| ||||||||||||| ||||||||||||| TTAAGATAAATAAAAAAGGACCGACCGGCGCTGATCTTGACCATTACAAACTTGTTTGTTTAAAAATATTG TTCG CTGATTTATTGTGAGAGAAAAACAATGCTGAATAGCTGGCAGATTCAGTAGATAAGCTTAAGCAAATACAT TAGT |||||||||||||||||||||||| |||||||||||||||||||||| ||||||||||||| ||||||||||||| CTGATTTATTGTGAGAGAAAAACACTGCTGAATAGCTGGCAGATTCAATAGATAAGCTTAACCAAATACAT TAGT CACTGCTCTAGGTCTCCAACGTACAGACAAAATGCAGAGAAAATGAAGAAGAAATGATGAGCACGCGACTG AGAG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CACTGCTCTAGGTCTCCAACGTACAGACAAAATGCAGAGAAAATGAAGAAGAAATGATGAGCACGCGACTG AGAG TGAATGAATACACTGCAGTCATAGTACGTTTGTAATCAGCTGTATCAAAGTTCTTGACAAAATATGTATAT AATA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| TGAATGAATACACTGCAGTCATAGTACGTTTGTAATCAGCTGTATCAAAGTTCTTGACAAAATATATATAT AATA CGGGGCTTGCACATCCAACAATCTTGGAACAGAAACTAAGATTAATTAGCATGGTATATAAATTAAAGGTG CAAC | |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||| CAGGGCTTGCACATCCAACAGTCTTGGAACAGAAACTAAGATTAATTAGCATGGTATATAAATTAAAGGTG CAAC AAACATCATTGCAAAATTAAAAGTGACATCTCAAAAGCACACCACGGGTAAAGAAGACCACACAGCGCGCG CGCA |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||| AAACATCATTGCAAAATTAAAAGTGACATCTCAAAAGCACGCCACGGGTAAAGAAGACCACACAGCGCGCG CGCA CAACTGTAAATGAAGTAATCCAAAGCAAAGCGGCCATGGTAGGCCGCCTAAAACAAAGTTGACACAGCAGA ATAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CAACTGTAAATGAAGTAATCCAAAGCAAAGCGGCCATGGTAGGCCGCCTAAAACAAAGTTGACACAGCAGA ATAA GACCAACGACCATACGGACTAATCAAGCACACATGCATGAAGAAAAGACAAATCCATTAAACAGCACCAGA AGAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GACCAACGACCATACGGACTAATCAAGCACACATGCATGAAGAAAAGACAAATCCATTAAACAGCACCAGA AGAA GTTGTTGCAGCGCCATCTTCTTCACCTGGCCTGCTGACGATCCCGGCGTGGAAGAGCAACGCCCAGAGCAG CGTT 
|||| |||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GTTGCTGCATCGCCATCTTCTTCACCTGGCCTGCTGACGATCCCGGCGTGGAAGAGCAACGCCCAGAGCAG

CGTT ATCAGCTCGCCACCCCTGGCAATGGCCTCCGAGTGACCCTTGAGGTTGTCTGATGGCGCGACGAACAGGAT CATC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ATCAGCTCGCCACCCCTGGCAATGGCCTCCGAGTGACCCTTGAGGTTGTCTGATGGCGCGACGAACAGGAT CATC TCTGACCAGAATTCAGCCAGCAGCTTCCACGCCGTCTCGTCACCATTGATTTTCTCCACCAGCTGCTTTGC CAGC ||||||||||| ||||||||||||||||||||| |||| |||||||||||||||||||||||||||||||||||| TCTGACCAGAAGTCAGCCAGCAGCTTCCACGCCATCTCCTCACCATTGATTTTCTCCACCAGCTGCTTTGC CAGC TCCACACCATTCTTGAGCACCTCATGCTTCGAGTTCTCACTCAGCAAGTGGACCAGCTTTTGATACTCTTC TTCT |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||| TCCACACCATTCTTGAGCACCTCATGCTTCGAGTTCTCACTCAGCAAGTGGACCAGCTTTTGATACTCATC TTCT GGCTCCGAGGACCTTGCTACTGCGTGACCGGCGAGGGCATGCTCTGCGTCCTTCTTGACAGCCTTGTACAA GCTC |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GGCTCCGATGACCTTGCTACTGCGTGACCGGCGAGGGCATGCTCTGCGTCCTTCTTGACAGCCTTGTACAA GCTC TTGCTCCATGCAGTGTCTTATCCGGAAGTAGCTCAGGGCACCAACTCACTAGGTATGCACAGTACTTTGAC AAGT |||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| TTGCTCCATGCATTGTCTTATCCGGAAGTAGCTCAGGGCACCAACTCACTAGGTATGCACAGTACTTTGAC AAGT GACTGGCAACAATCTTATAATCTGTAGAGCTGGGATTTGGTGGAGAACCCTGTTCTTGATGATCAGCTCGG TATG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GACTGGCAACAATCTTATAATCTGTAGAGCTGGGATTTGGTGGAGAACCCTGTTCTTGATGATCAGCTCGG TATG GGTGTTTCACTTCAAGGATGCATGTGGCGATATGCCATGTGAGTATGATATCCGATGTACCTTTGCCATTG CAAG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GGTGTTTCACTTCAAGGATGCATGTGGCGATATGCCATGTGAGTATGATATCCGATGTACCTTTGCCATTG CAAG CCCCAATAAAGCTTTGTCCACCAACTTGGCTTTGTCGTAGAGATTCTGCACCATTGCTCAAACGGTGTCCA TTGA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CCCCAATAAAGCTTTGTCCACCAACTTGGCTTTGTCGTAGAGATTCTGCACCATTGCTCAAACGGTGTCCA TTGA TCAGATTTCTTACTGCATTCATGATGCTAACCTTCACGGCCGCTGGCACCTCCATTTTCCTCCTCTGATCT GGGA ||||||||||||||||||||| 
||||||||||||||||||||||||||||||||||||||||||||||||||||| TCAGATTTCTTACTGCATTCAGGATGCTAACCTTCACGGCCGCTGGCACCTCCATTTTCCTCCTCTGATCT GGGA

CACGGAGGAGACGGTATACAAGAACTATTGGGGTTGTTTTTGGCTGAGGTACCAACACTGAGCACTGGCTC ATTT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CACGGAGGAGACGGTATACAAGAACTATTGGGGTTGTTTTTGGCTGAGGTACCAACACTGAGCACTGGCTC ATTT TTTCATCCCAGTGTTTCATCAGCTTGCATCTGCAGCGTAACAAACTACTAACCAATTTTCGCATGCATAGT GAAT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TTTCATCCCAGTGTTTCATCAGCTTGCATCTGCAGCGTAACAAACTACTAACCAATTTTCGCATGCATAGT GAAT GTGGTGAAGAGTTGCGGTTGATGTAGCAGCAAATAAGGGTAACTTTTGTCCAGTTGGAGCAAATATAGGAA ACCA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GTGGTGAAGAGTTGCGGTTGATGTAGCAGCAAATAAGGGTAACTTTTGTCCAGTTGGAGCAAATATAGGAA ACCA AGCTTCTAACCTCAGCTGTCACCGCTAACATCAGAACCAAAAATATTGGCAACAGATCGAAGGACGACTCT CCCA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| AGCTTCTAACCTCAGCTGTCACCGCTAACATCAGAACCAAAAATATTGGCAACAGATCGAAGGACCACTCT CCCA TATTGATGTGTGACATGTAGCTTGCTAAGTTCCCATGAATACACCATAATGTACACTGAAGTTGTTGGCCC CTCC |||||||||| ||||||||||||||||||||||| ||||||||||||||| |||||||||||||||||||||||| TATTGATGTGCGACATGTAGCTTGCTAAGTTCCCTTGAATACACCATAATTTACACTGAAGTTGTTGGCCC CTCC CATGCTCTAAGATCTTCCGTACTATGTTTGCTATGACTAGCAAAGTAAACAAGATACAATAAACTACTGTG AAAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CATGCTCTAAGATCTTCCGTACTATGTTTGCTATGACTAGCAAAGTAAACAAGATACAATAAACTACTGTG AAAA GTGATAAAATGATATCCAAGATGGGCAGGCAACAATTTGAATAGGAGATTGGGAGAGACGAGTAATAATAA TCAT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GTGATAAAATGATATCCAAGATGGGCAGGCAACAATTTGAATAGGAGATTGGGAGAGACGAGTAATAATAA TCAT GAAGAAATGATAACTCTTCCTCAATCACCACAAATACCCTTTCATCTTCTCCAACCTTGTGCAACAAGCTC CGGA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GAAGAAATGATAACTCTTCCTCAATCACCACAAATACCCTTTCATCTTCTCCAACCTTGTGCAACAAGCTC CGGA AAAAGTTGCGAGTCCCAGTAGAGCCACCATTAGTGATCTTATACCGTGCAAACCGACATCGCAACAACTTG AACA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| 
AAAAGTTGCGAGTCCCAGTAGAGCCACCATTAGTGATCTTATACCGTGCAAACCGACATCGCAACAACTTG AACA AAGCAAAGGACAAGCATATATCTTTTAACTGTGGCGTTGCAATTAGAAGCATGCTATCTAACTGCCAAACT

CTAT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| AAGCAAAGGACAAGCATATATCTTTTAACTGTGGCGTTGCAATTAGAAGCATGCTATCTAACTGCCAAACT CTAT CAATAGTCACCAATCCAACATGTTTGTTCAGCATTGTTTCAGAGTCATCCTTGAATGTATACCCATGAGGG TGCT |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CAATAGTCACCAATCCAACATGTTTGTTCAGCATTGTTTCAGAGTCATCCTTGAATGTATACCCATGAGGA TGCT TCTCCACTTGTTCCTCCGATTCTCTCATAACCAAGAGTGGGGGAGCAGGTGCACCATCACCTACCAGCGGT TCAT ||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||| TCTCCACTTGTTCCTCCGATTCTCTCATAACCAAGAGTGGGGGAGCATGTGCACCATCACCTACCAGCGGT TCAT CAACATGCTGATGTGCTGCTTCTTGTGCTTGTAACTTCTGCATGTAAACAAAAACAAGTCCTGGATTGCGT CCGA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CAACATGCTGATGTGCTGCTTCTTGTGCTTGTAACTTCTGCATGTAAACAAAAACAAGTCCTGGATTGCGT CCGA GTGTGAAGGAGGACCGCCGTGCCATTTCGAATGCATAAAATTTGAGCCGTATTTTGGAGCATAGCAGAGCA TAAG |||||||||| ||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||| GTGTGAAGGACGACCGCCGTGCCATTTCGAATGCATAAAATTTGAGCCCTATTTTGGAGCATAGCAGAGCA TAAG GTGTGATTAAGATGAAGATGAGTTCTATAATGAAGTCATCTGCGGCATAGAGCTTAAGGACACCACCGGCA GCAA ||||||| ||||||||||||||| ||||||||||| | |||||||||||||||||||||| |||||||||||||| GTGTGATGAAGATGAAGATGAGTCCTATAATGAAGCCCTCTGCGGCATAGAGCTTAAGGAGACCACCGGCA GCAA GGTAAAAGGTCCAAATCCCCTGTATAAGCAGCTCGGGCGGGGGACCTTTAGATTGGCCTTCTCTGTCATCA ACAG ||||||| ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||| GGTAAAAAGTCCAAATCCCCTGTATAAGCAGCTCGGGCGGGGGACCTTTGGATTGGCCTTCTCTGTCATCA ACAG AGACTATTGTGCTTGTATTGATCATGACTACCAGCACAAGGGATGCCAGAATTACGAAGAATACCACGTGG TTAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| AGACTATTGTGCTTGTATTGATCATGACTACCAGCACAAGGGATGCCAGAATTACGAAGAATACCACGTGG CTAA GTGCATTGCATTGTGCAACGATCGCTAGCTTATCATAATCATCATAATCCTCACCATAAACTTCAGACAGT GAGG ||||||||||||||||||||||||| |||| | |||||||| ||||||||||||||||| ||||| ||||||||| GTGCATTGCATTGTGCAACGATCGCCAGCTCACCATAATCACCATAATCCTCACCATAATCTTCATACAGT GAGG 
AAAACCTGATGTAATTAGGGTGGAGGCTGATGGTGGAGGTGAGATAGGAGATAATGGGCAAGAAAAGTGTG TTGG

|||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||| AAAACCTGATGTAATTAGGGTGGAGGCTGATGGTGGAGGTGAGATAGGAGATGATGGGCAAGAAAAGTGTG TTGG CTCCAAGGAAGACAAAGCGAGTGAAGGGGTGATGGCGGTACCGCTGACCATAGGCGCCTATCCCAACAAGT ACTA ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| CTCCAAGGAAGACAAAGCGAGTGAAGGGGTGGTGGCGGTACCGCTGACCATAGGCGCCTATCCCAACAAGT ACTA CAGCCAGGACAGGTTGAACCACTAGCAGGGCATTAACCTCCCATAGCTGCCGTCTAATGCTCTCGTGGACG GTTT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| CAGCCAGGACAGGTTGAACCACTAGCAGGGCATTAACCTCCCATAGCTGCCGTCTAATGCTCTCGTGGACG GTTG TCTGGAGATCCATCGCTTGTTTGGAGCAATTTGCGATCAGCTTCCCATTCCCTCCAGCCATGCTCAGCTGA CTGC |||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||| |||| TCTGGAGATCCATCGCTTGTTTGGAGCAATTTCCGATCAGCTTCCCATTCCCTCCAGCCATGCTCAGCTGC CTGC CAAACTTCAAGTCGTTAGCTAGCACCATGCAAGGCAAGGCACATGCAACCTCGCGCAGGATAGACGGTGTT GTAT ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| ||||||||||||||||| CAAACTTCAAGTCGTTAGCTAGCACCATGCAAGGCAAGGCACATGCAACCTCGTGCAAGATAGACGGTGTT GTAT GTACGAGAAACTATTAACATGCAAGTAATTTGAAGAAAAAAAATAAACAAGTGGAGGAATTACCTGAATTA ATTA |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||| ||||||||||| GTACGAGAAACTATTAACATGCAAGTAATTTGAAGAAAAAAAATAAACGAGTGGAGGAATTACATGAATTA ATTA AGGTGTGGCGGTATAGCTTAGGCCGAGATCCAGCTGACTGGTGCTGCGCAATCCAGCAGCCCAGCCGCTCA GCGT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| AGGTGTGGCGGTATAGCTTAGGCCGAGATCCAGCTGACTGGTGCTGCGCAATCCAGCAGCCCAGCCGCTCA GCGT TTCTTATGGATGGTTTCCTTCGTAATGGAGTTTTGGGCTTGCTTGCTACTCACGCGTATTAGAGAAGGGTA TATA ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||| TTCTTATGGATGGTTTCCTTCGTAATGGAGTTTTGGGCTTGCTAGCTACTCACGCGTATTAGAGAAGGGTA TATA TGGGAGATGCCGAGAGGAAGAAAGTAGAGGTTGAACATGACCTTTGGAACGATTGGGTCATGATCCCACGG CTAT |||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||| TGGGAGATGCCGAGAGGAAGAAAGTAGAGGTTGAACCTGACCTTTGGAACGATTGGGTCATGATCCCACGG CTAT 
TACAGCCAAGAATCTTTTATGTTCCATAAGTATTAGAAGTTTTGAGGATATATTTAGTAGGTAGAACTAAT CAAA ||||||||||||||||| ||||||||||||||||||||||||||||||

|||||||||||||||||||||||||| TACAGCCAAGAATCTTTGATGTTCCATAAGTATTAGAAGTTTTGAGGACATATTTAGTAGGTAGAACTAAT CAAA TGATTATCTTTACTCTCTCTTTTTATTTTTTAACACAGTACCCATTTAAATACTATATTCTTTATTTGGTT GCAA ||||||||||||||||||| ||||||||| ||||||||||||||||||||||||||||||||||||||||||||| TGATTATCTTTACTCTCTCATTTTATTTTCTAACACAGTACCCATTTAAATACTATATTCTTTATTTGGTT GCAA TAAATTAATCTACCTATTACTCTATATATATGTTGTACCATCATTTGCATTGCGCCTTGGTGGGTTTAATT TATA |||||||||||||||||||||||||||||||||| |||||||||||||| ||||||||||||||||||||||||| TAAATTAATCTACCTATTACTCTATATATATGTTTTACCATCATTTGCACTGCGCCTTGGTGGGTTTAATT TATA TGTATGTGTTCAAACATTCAGTGTGATATGAGTTAAAGTTTACCGCTTCAAACCAAATAGGAACTAAGTAT TGTC ||||||||| |||| |||| |||||||| |||||||||||||||||| ||| |||| ||| ||||||||||||| TGTATGTGTCCAAATATTCGGTGTGATAGGAGTTAAAGTTTACCGCTAGAAATCAAACAGGGACTAAGTAT TGTC TCGGCGCTGCTAAAACTACACCAGGGAAAAATCGCAAACCTTTGAAGTTTGTGTATGTCGTTATGCTTTTA GTTT |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||| TCGGCGCTGCTAAAACTACACCAGGGAAAAATCGCAAACCTTTGAAGTTTGTATATGTCGTTATGCTTTTA GTTT CAAGTTTTACTCTAGATAATTGATGATGTAATCTGCTGTGAATCCTTTTTAATTAATCTTTAACGAAACAG TAAC ||||||||||||||||| ||||||||| |||||||||||||||||||||||||||||||||| ||||||| ||| TAAGTTTTACTCTAGATAGTTGATGATGCAATCTGCTGTGAATCCTTTTTAATTAATCTTTAAGGAAACAG CAAC ATGGATGTATGAACTGATGACTCAACACAGTCTGTTTTCATGTTCAACCAGGAAAATGAAATAATGTTTCT CTTG ||||||||||||||||||||||||||||||||| |||||||||||||| ||||||||||||||||| |||||||| ATGGATGTATGAACTGATGACTCAACACAGTCTATTTTCATGTTCAACAAGGAAAATGAAATAATGATTCT CTTG TGCTTTTGAATTGTAGGAGACTAGGAGGATTTCCCGCATGTTGCCGCGGGTATGAAAAAGGAATATCAATG TTAG || |||||||||||||||||||||||||||| ||||| |||||||||||||||||| ||||||||| |||||||| TGTTTTTGAATTGTAGGAGACTAGGAGGATTCCCCGCGTGTTGCCGCGGGTATGAAGAAGGAATATAAATG TTAG AACGCATAGAAATGTGACTTGCTGACGGATATTATAATTACATGAGCTAATGAATTTTAATTGTATGTGAT AGTA |||| ||||||||||||||||||||| |||||||||||||||| || || | |||||||||||||||||||||| AACGTATAGAAATGTGACTTGCTGACAGATATTATAATTACATAAGTTAGTAGATTTTAATTGTATGTGAT AGTA 
GATTGTCTAGTTGGATAGCTTGTATGTTGAGAGAAATAGGTAGTGGGGTGTGATAGTGGATTGTCACATGT TAAA |||||||||||||||||||||||||||||| ||||||||||||| ||||| |||||||||||||||||||||||

GATTGTCTAGTTGGATAGCTTGTATGTTGATAGAAATAGGTAGTTAGGTGTCATAGTGGATTGTCACATGT TAAA AAAGTAGGTAGTAGGATTTTGCTTTATAAAAGTACAGTAAGATTGAAAGTCGCTGCTACACAGCAGGAAAC CCAG ||| |||||||| |||||||||||||||||||| ||||||||||||||||||||| |||||||||||||||||| AAAATAGGTAGTGAGATTTTGCTTTATAAAAGTATAGTAAGATTGAAAGTCGCTGCAACACAGCAGGAAAC CCAG GGTTTATTTTAGTCTGGTCATTCACTCCAATGTACGTAAATTAGCTAGGTAATCTGCTGTGGATCGGAATA TATA ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||| GGTTTATTTTAGTCTGGTCATTCACTCAAATGTACGTAAATTAGCTAGGTAATCTGCTGTGGATCGGAATA TATA CTTTGCTTTATTCATGGAAGAGCAGGTGTTCTGCTAATCTTTTTAAAAATGAATAAACAATGAAAAAAGAA AAGC ||||||||||||||||||||||||||||||||||||||||||| ||||| ||||||||||||||||||||||||| CTTTGCTTTATTCATGGAAGAGCAGGTGTTCTGCTAATCTTTTAAAAAAGGAATAAACAATGAAAAAAGAA AAGC AGCTAGTACTTTAAAGCAATCATCATCATATGCATGCGAGGTCCCCTTTATTCCTCAGCGCCTGGGTTTTT CTTT ||||||||| ||||| ||| |||||||||||||||||||||||||||||||||||||||||||| |||||||||| AGCTAGTACCTTAAATCAAGCATCATCATATGCATGCGAGGTCCCCTTTATTCCTCAGCGCCTGCGTTTTT CTTT TTCTCTTTGATTTCCGTCTTTTTTTTCTTTAATTTCTGCCTTTGTCTATCATTATTACCTTCTCCATTGCA CAGA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TTCTCTTTGATTTCCGTCTTTTTTTTCTTTAATTTCTGCCTTTGTCTATCATTATTACCTTCTCCATTGCA CAGA CGATATGAAGCAAGTCGAGTGACTAGAATGGCACGAGCTCGAATGACTTGTGGCGAGCGTGTGAGAGAGCG TGTG |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||| CGATATGAAGCAAGTCGAGTGACTAGAATGGCACGAGCTCGAGTGACTTGTGGCGAGCGTGTGAGAGAGCG TGTG TTTGGCTCAGAGTAATACATGGAGAGGCCAGGTCTAAACAACGACTATCTTCATTTCTATTTAAATTTGTC TAAT |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||| TTTGGCTCAGAGTAATACATGGAGAGGCCAAGTCTAAACAACGACTATCTTCATTTCTATTTAAATTTGTC TAAT CAATGATTCCCACACCATTGCCGTCCACTGTACACCAGTTGGTACGTGCGGAGCTATTACCACCATGGCCA CTAA ||||||||||||||||||||||||||||| ||||||| ||||||||||||||||||||||||||||||||||||| CAATGATTCCCACACCATTGCCGTCCACTATACACCACTTGGTACGTGCGGAGCTATTACCACCATGGCCA CTAA ACCACTCAGCCCACCAGGGTGTCTTGGGGCCAGTGACGGAGCCAGGATTTAAATTTTGGGTATTCCCTTGT TAGT 
||||||||||||||||||||||||||| | ||||| |||||||||||||| ||| ||||||||||| |||||||| ACCACTCAGCCCACCAGGGTGTCTTGGAGTCAGTGGCGGAGCCAGGATTTGAATCTTGGGTATTCCTTTGT

TAGT GTCTAGTGAGTCGTTATTCAGCTGTTGCCTCTAACCTTACGACCGACGTGCGCCCTCGCCGGTCGCCATGT AAGC |||||||||||||||||||||||||||||||||||||| |||||||||||||| ||||||||||||||||||||| GTCTAGTGAGTCGTTATTCAGCTGTTGCCTCTAACCTT- CGACCGACGTGCGCTCTCGCCGGTCGCCATGTAAGC ATAGCATAGAGATAGAGGTCGCGTTCGCGCTGTATGTTCCACGCTGTCAGGAGCTTAGGACATTGGGAACC AGCA |||||||||||||||||||||||| ||||||| |||| || ||||| |||||||||||||||||| ||||||||| ATAGCATAGAGATAGAGGTCGCGTCCGCGCTGCATGTCCCGCGCTGCCAGGAGCTTAGGACATTGAGAACC AGCA GTACCGCCACCACATCATGCCTGGTAGGCTAGGGTGAGAGATTTAGGCCTTGTTTAGTTCGCAAAAATTTT CAAG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GTACCGCCACCACATCATGCCTGGTAGGCTAGGGTGAGAGATTTAGGCCTTGTTTAGTTCGCAAAAATTTT CAAG ATTCCTCGTCACTTCGAATCTTTGGTTGCATGCATGGAGCATTAAATATAGATGAAAATAAAAACGAATTA CACA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ATTCCTCGTCACTTCGAATCTTTGGTTGCATGCATGGAGCATTAAATATAGATGAAAATAAAAACGAATTA CACA GTTTACCTGTAATTTGTAAGATGAATCTTTTGAGCCTAGTTACTCCTTGTTTAGACAATGTTTGTCAAATA AAAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GTTTACCTGTAATTTGTAAGATGAATCTTTTGAGCCTAGTTACTCCTTGTTTAGACAATGTTTGTCAAATA AAAA CGAAAGTGCTACAGTAGCCAAAAATGGAAATTTTTGCCAACTAAACAAGGCCTTAGGAATGAGGATTATGG AATA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CGAAAGTGCTACAGTAGCCAAAAATGGAAATTTTTGCCAACTAAACAAGGCCTTAGGAATGAGGATTATGG AATA GTCTCCCACTCATGTCCTTATTATATATGCTGCCATGTGCATATTCCCATATTTAGATCCAAAAAGTTTTT CGAT ||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||| GTCTCCCACTCATGTCCTTATTATATATGCTGCCATGTGCATATTTCCATATTTAGATCCAAAAAGTTTTT CGAT TGGAGTAACTAGGCTTAAAAGATTTGTCTCGCGATTTACACGCAAACTGTGCAATTAGTTTTTATTTTCAT ATAT |||||||||||||||||||||||||||||| |||||||||||||| ||||||||||||||||||||||||||||| TGGAGTAACTAGGCTTAAAAGATTTGTCTCACGATTTACACGCAATCTGTGCAATTAGTTTTTATTTTCAT ATAT ATTGAAAAGTTTTTGGTTTTTAGGATGAACTAAACAAGGCCTTATCTTGGAAAAGAATTTAGAAATAAGAT ATGA |||||||||||||| | |||||||||||||||||||||||||||| |||||||||||||||||||||| 
|||||| ATTGAAAAGTTTTTTGGTTTTAGGATGAACTAAACAAGGCCTTATGTTGGAAAAGAATTTAGAAATAAAAT ATGA

TTTGTACTAATATTTAGTGACATGGTCAATATGTATGTGGTCTGAAGATAAATAATTAGATTCTCATTTAC AGAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TTTGTACTAATATTTAGTGACATGGTCAATATGTATGTGGTCTGAAGATAAATAATTAGATTCTCATTTAC AGAA ACAACAAATTAGTACAAATTCCTTTGAAAAGAATGAGAGAGAAATTTAATTTGTTAAATATAGAAATGCTA ATAT ||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||| ACAACAAATTAGTACAAATTCCTTTGAAAAGAATGAGAGAGAAATTTTATTTGTTAAATATAGAAATGCTA ATAT ATTTGATATACATATATATAAATATACCTATATATACTTAGTGCACAAAAAGTTGGGTATTCCTGGGAATA CCAG |||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| |||||||||| ATTTGATATACATATATATAAATATACCTATATATACTTAGTGCACAAAAAGTTAGGTATTCCTAGGAATA CCAG GAATACCTGGTGTATCCGCCCATGCTTGAGGCGTGTTTGGTAACGCGACTCTGAATGGCCTGGCTGGCTGG CCTA ||||||||||||||| |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||| GAATACCTGGTGTATTCGCCCATGCTTGAGGCATGTTTGGTAACGCGACTCTGAATGGCCTGGCTGGCTGG CCTA GTCTAGGTTAGCTCGAGCCTGGTCATAACGGTGCAATCTCATATGTTTGGTTGTCTATTCTAACAAAGACG AGCT |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| GTCTAGGTCAGCTCGAGCCTGGTCATAACGGTGCAATCTCATATGTTTGGTTGTCTATTCTAACAGAGACG AGCT CACTTCAAATCGTGTTTGGTTGGTAGTATACTAGAATCTGGTAACCCTAGACTTACAACAGGGTAATGTTA CCTC |||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| |||||||||||||| CACTTCAAATCGTGTTTGATTGGTAGTATACTAGAATCTGGTAACCCTAGACTTACAACATGGTAATGTTA CCTC CTCTTTGTTATTTCATCGTGAATACATGGAGGTCATCGGCGTTGAGGCATGGCATGGCTCCGATTGATGGT GAGG |||||||| ||||||||| ||||||||||||||| ||||||| ||||||||||||||||||||||| ||||||| CTCTTTGTCATTTCATCGCGAATACATGGAGGTCGTCGGCGTCGAGGCATGGCATGGCTCCGATTGGCGGT GAGG CATGACTGCATACGACGGTGAACCCTCCTCCCCATCCCTCTCTGCTTGCCTACACCTCTGTCACCACTAGC GTAG |||| |||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||| CATGGCTGCATACGACGGTGAACCCTCCTCCCCATCCCTCTCTGCTTGCCTACACCTTAGTCACCACTAGC GTAA CTCGCGAACGTGGAGCACACTACGCCATCGTTGTGGCTGCTTGACCTGATGACAAGCTCGCATCCGCCACC AGGG |||||||||| ||||||||||||||||| |||||||||||| |||||||||||||||||||| || || |||||| 
CTCGCGAACGCGGAGCACACTACGCCATTGTTGTGGCTGCTCGACCTGATGACAAGCTCGCAGCCACCGCC AGGG CAGCCTCTCTCCGCGCCACACTTGGAGCACACCATGGGAGCAAGAGTGTCTACGGCTGGCGGCGGCGCTGC

TTCT |||||||||||| ||| ||| |||||||| |||||||||||||||||||||||| ||||||||||| |||||| CAGCCTCTCTCCATGCCGCACCTGGAGCACGGCATGGGAGCAAGAGTGTCTACGGCCGGCGGCGGCGCCGC TTCT AGGAGCAACGTGACCGAGGCGGTGGTGTCAGAGGGGCGCCTCTTTCAGGAGCCCATCCGTCGCATTGTGTG GTCC |||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| |||| ||| ||||||| AGGAGCAACGTGACCGAGGCGGTGGTGTCAGAGGGGCGCCGCTTTCAGGAGCCCATCCATCGCGTTGCGTG GTCC CCGTGTGTCCCTAGAGGAGGAGGCGGTGTGGCGAGACTCTGATTTTCCTAGATCTCGTGACCAGGTCCAGC TCTC |||||||||||||||||||||||| |||||||| |||||||||||||| ||||||| ||||| ||||| ||||| CCGTGTGTCCCTAGAGGAGGAGGCAGTGTGGCGGGACTCTGATTTTCCCAGATCTCATGACCGGGTCCGGC TCTG CCACCATGGCCGTGACAGAGGCAATGTTTCTTTGCATAGCATCCGCACAGAATAGGCTATCGAATGACTCG TACG ||||||| |||||||||||||||||||| |||||||||||||||| |||||||||||||| | |||||||||| CCACCATTGCCGTGACAGAGGCAATGTTGCTTTGCATAGCATCCGTGCAGAATAGGCTATCCAGTGACTCG TACA TGCCGGTGTCGAAGAAGGCAAACTCATCGCCGAAGAGGAGCACATCTGGCCCAGAGTCGGAGGAGGCCTAT GGTG ||||| ||||||||||||| |||||||||| |||||||||||||| |||| ||||| |||||||||||||||||| TGCCGATGTCGAAGAAGGCGAACTCATCGCAGAAGAGGAGCACATTTGGCTCAGAGCCGGAGGAGGCCTAT GGTG ATGCCTCGTCTTCGGTGCTCTCCGACTCAGTCCTTGGCTGCTTCTTGTCGCCACTCTAGCCTCAACATCGA GGGC ||||||||||||| ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||| ATGCCTCGTCTTCTGTGCTCTTCGACTCAGTCCTTGGCTGCTTCTTGTCGCCACTCTAGCCTCAACATCGA GGGC GGGGGCCGCGTCATACGCCGTTGGCACGCCCTACCATGAGATTTCTTGGAGTGTGCTAGAGCCAGAGCTCT CATT |||||| ||||||||||||| ||||| |||||||| |||||| || |||||| || ||||||||||| |||||| GGGGGCTGCGTCATACGCCGCTGGCATGCCCTACCGTGAGATGTCCAGGAGTGCGCCAGAGCCAGAGCGCT CATT GGCCTCGAACTTCAAGGCGGCGCTGCTCTTCTTCCTCTTGGTTGACAAGGTCGATGACGGCGGCGTCGGTG GTGT ||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||| GGCCTCGTACTTCAAGGCGGCGCTGCTCTTCTTCCTCTTGGTTGACAAGGTCGATGACGGTGGCGTCGGTG GTGT GCGTGGTTGTGACTGGCTGGCTAGGGCCCCGACAACGGCAGAGAAGAGAAAGAGGTGAGAGGCCGGCAGGT GGCA ||||| |||||||||||| ||| ||||||||||| |||||||||||||| ||||||||||||| |||||||| GCGTGCTTGTGACTGGCTAGCTAGGGCCCCGATGGTGGCGGAGAAGAGAAATAGGTGAGAGGCCGGGAGGT GGCA 
AGCTCGTTCACTGCGCACTCCTTCTCGCTCCACCTGGCTAGGCTGTGAAAAAACAAAACACCACATCATTT TAGT

|| ||||||||||| | ||||||||||||||||||||||||||| || ||||||||||| || |||||||||||| AGTTCGTTCACTGCACGCTCCTTCTCGCTCCACCTGGCTAGGCTATGGAAAAACAAAACCCCCCATCATTT TAGT TGATCCCATAACTATAAATAGTTTATTGTGCAACTCGACAAATAAGGGGAGGGTTGGCTCCCAGGGAGGGA GAAG ||||||||||||||||||||||| |||| ||||||||||| |||| ||||||| ||||||||||| |||||| | TGATCCCATAACTATAAATAGTTGATTGCGCAACTCGACATGTAAGTGGAGGGTCGGCTCCCAGGGGGGGA GAGG ATAGACTTAGACAAACCGATAAGCTAACAAGTGGAATAACCGAGTGTCTTCCACTAGTAGAAATGTCTGCT TTTA |||||||||||||||||||||||||||||||||||| ||||||||||||||||| || |||||| || ||||| ATAGACTTAGACAAACCGATAAGCTAACAAGTGGAACAACCGAGTGTCTTCCACCAGCAGAAATCTCCACT TTTG TGACGTTGGGGAGAACCCTCACCGCTGACACCGTAAGCTAGCACTAAGCAATCCCCTACCAAACTCTCATT CTTT ||||| | ||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||| TGACGCTAGGGAGAACCCTCACCGCTGACACCGTAAGCTAGCACTGAGCAATCCCCTACCAAACTCTCATT CTCC TTCATTATAAACCCCCTGAGATTTGGGTGCTCCTGAATACGAAGAACACCGCACTACTAGACGAGGGCCTC GATA |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| TTCATTATAAACCCCCTGAGATTTGGGTGCTCCTGAATACGAAGAACACCACACTACTAGACGAGGGCCTC GATA GCCCAAACTAGGATAAACCGTCATGTCTATCTTGTGTTTTTAAGTACCCAAGCCACTGGAGACCCACACTT CTTA ||| |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||| GCCAAAACTAGGATAAACCGTCATGTCTATCTTGTGTTTTTAAGTACCCAAGCCACCGGAGACCCACACTT CTTA TATTCGACAAACAACCATGATGCTTGGCAGTTATGAACCTACGACAACTAGCACACAAAGTAGGGAAGTGT CGAA ||||||||||||||||||||||||| ||||||||||| | |||||||||||||||||||||||||||||||||| TATTCGACAAACAACCATGATGCTTTGCAGTTATGAATCCGCGACAACTAGCACACAAAGTAGGGAAGTGT CGAA TGCGAATTTGGCTCTATGGTGGCCGGAGCCACCCGCCTCGTTGTCGGAGCAGGTTTAAATGAAGGGGCTGA TTGA || ||||| |||||||||||||||||||||||||| || || | || |||||||||||||||||||||||||||| TGTGAATTCGGCTCTATGGTGGCCGGAGCCACCCGACTTGTCGCCGAAGCAGGTTTAAATGAAGGGGCTGA TTGA TGTTTCTTCTTGGGATAGAAAGTTGGACCATATTGGGCCCAAAAAATGAGAGGAGAGAATGCAGTCGGTAG AAAT || |||||||||||||| |||||||||||||||||||||||||||||| |||||||||| ||||||||| ||||| TGATTCTTCTTGGGATACAAAGTTGGACCATATTGGGCCCAAAAAATGGGAGGAGAGAAAGCAGTCGGTGG AAAT 
TTAATCTTCTTCAACTCCTAGAACTATCATTGTATTCTGATTTTTCTGCTAAAACTATAAAACTGCATATC TTAC ||| ||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||

||||||||||| TTAGTCTTCTTCAACTCCTAGAACTATCATTGTGTTCTGATTTTTCTGCTAAAACTATAAAACCGCATATC TTAC CCCCCCCCCCCCCCCTCATCTCTCAGAATCAATTACCTCCCTATCCTAGTTAAAATTGGTGATAAATGCCT GATA ||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||| |||| CCCCCCCCCCCCCCCTCATCTCTCAGAATCAATTACCTCCCTATCCTAGTTAAAAATGGTGATAAATGCCA GATA AGACTTTTAAACCATAAAAATATCTCTATGGATATTTAGAGGCCGTGAAAGTAGCTCTAAAAGCTTTCAAA AATA |||||||||||||||||||||||||||| ||||||||| |||||||||||||||||||||||||||||||||||| AGACTTTTAAACCATAAAAATATCTCTACGGATATTTACAGGCCGTGAAAGTAGCTCTAAAAGCTTTCAAA AATA GATTTACCTTTTTGGAAAATAGAAAATAAATAGAGAAATATGTAGAATTCCCAAAATATTCTATTTGATGC ATAA |||||||||||| |||||||||||||||| |||||||| ||||||||||||||||||||||| ||||||||||| GATTTACCTTTTCAGAAAATAGAAAATAAACAGAGAAATCTGTAGAATTCCCAAAATATTCTAATTGATGC ATAA ACACGTTTTCAAAACTTTCTATAGGAAACATGAGGGCAATATCAGCAGCATATGCATTTAGCATCATATTT GTGT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ACACGTTTTCAAAACTTTCTATAGGAAACATGAGGGCAATATCAGCAGCATATGCATTTAGCATCATATTT GTGT TGCTTTCTTGTTGTCTGCATCTCATATTTTTGTTATTGGAAATTCCTTAGACACATTTGGACATTAATTAT GGAA ||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||| TGCTTTCTTGTCGTCTGCATCTCATATTTTTGTTATTGGAAATTCCTTAGACACATTTGGACATCAATTAT GGAA AATTTTCATATTTATTTGGATATTTTCTCTTTTTCTTGAGCTATAAATAATATTTTGGAAGCTGCTAAAAT TAAT ||| ||||||||||||||||||||||||||||||||||||||||| |||| ||||||||||| ||||| ||||| AATGTTCATATTTATTTGGATATTTTCTCTTTTTCTTGAGCTATACATAAATTTTTGGAAGCTACTAAAGT TAAT CTCATTAGCTCTCAATATTTTATTTGGGTTTATAAATATTTTAATTGGGGTTTGCAAGTACCAATCTTACT TTAG || ||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| TTCCTTAGCTCTCAATATTTTATTTGGGTTTATAAATATTTTAATTGGGGTTTTCAAGTACCAATCTTACT TTAG GATTTCTCCCAAAATTTTAGAAGCTTAGACGTAGTTTTCATGGTTAAAAACTTTATCATATTTTTAGAACC AAAT ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| GATTTCTCCCAAAATTTTAGAAGCTTAGACGTAGTTTTCATGGTTAAAAACCTTATCATATTTTTAGAACC AAAT TGTCCTCCTGAGAAGTACTACTAACGGAGCAGGGAAGTAATTGAGAAGACTTTGAAGAGTTGGGGTGTGGG 
GTGG ||||||||||||||| ||| |||||||||| ||||||||||||| |||||||||||||||||||||| |||| ||

TGTCCTCCTGAGAAGCACTGCTAACGGAGCGGGGAAGTAATTGAAAAGACTTTGAAGAGTTGGGGTGGGGG GGGG GGTGGGGGGGGGGGTTAAGATTTCCATTTGTGATGTTCGAAAAAGAAATTAGACTATGACAGTAGTTCAAG GAGG |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||| ||||||||||||||| GGTGGGGGGGGGGGTTAAGATTTCCATTTGTGATGTTCGAAAAAGAAATTGGACTATGATAGTAGTTCAAG GAGG ATTGTTTTTTCTGAATTTGCGAGCGTGTTCTCTGTTGGCCGATTAGGACAATGGAGAGGCCCGAGACAATA TTTG |||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||| |||||||||||||| ATTGTTTTTTCTGAATTTGCGAGCGTGTTCTCTGTTGGCCCATTAGGACAATGGAGAGGCACGAGACAATA TTTG CTTGGCAACGGAAGCAGGCAGCAATTGGCAATATATTGGCCTCATGACTTGATTTGGCTACAAAATGCAGA AGAG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CTTGGCAACGGAAGCAGGCAGCAATTGGCAATATATTGGCCTCATGACTTGATTTGGCTACAAAATGCAGA AGAG AAAACTAAACCAACATCCAGATGCATGTAGTTTTGAAATTGTTTTGCATCGGCGCTGCTGGACCGCAGAAC GCCG |||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||| AAAACTAAACCAACATCCAGATGCATGTAGTTTTGAAAAAAAAATGCATCGGCGCTGCTGGACCGCAGAAC GCCG CTTCTGCCATGGTCTACAGACTTCGGCCACCTGTGTCATATGTGACCAACTACCTAAGACCATTGATCACA TTCT ||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||| CTTCTGCCATGGTCTACAGACTTCGGCCACCTGTGTCATATGTGACCAACTACCTGAGACCATTGATCACA TTCT TCTGGGCTGTTGTTTCAGTAGGGAAGTTTGGCAAATTTGCTTCAGCAAGCTGCACCTGCAGGGCTTCGTTG TTGT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TCTGGGCTGTTGTTTCAGTAGGGAAGTTTGGCAAATTTGCTTCAGCAAGCTGCACCTGCAGGGCTTCGTTG TTGT CAATTATTAGCCTGTGATGCATTGGTGGGTTCACAGCAGGAAGGCAGTCCCTAAGCAACTTCGGCGTAGAT TCGA |||| | |||||||||||||||||||||||||||||||||||||||||||||||||||||||| || ||||||| CAATGAGGAGCCTGTGATGCATTGGTGGGTTCACAGCAGGAAGGCAGTCCCTAAGCAACTTCGGTGTGGAT TCGA CTCGCTTTTTTTCTAGTGGGCTGGCTGCTTTCGTAGGAGAGGAATGCGAGAACTTTCGATGCAGTCACCTC GACA || |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| CTTGCTTTTTTTCTAGTGGGCTGGCTGCTTTCGAAGGAGAGGAATGCGAGAACTTTCGATGCAGTCACCTC GACA CCGTTATAGCTGGCAACAAGAATTATGGATGATGTTCTTGTCTGGGAACTGGCAGGGTTTAAACCCTTCTT GGCG | |||| 
|||||||||||||||||||||||||| || ||||||||||||||||||||||||||||||||||||| CTGTTACAGCTGGCAACAAGAATTATGGATGATATTGTTGTCTGGGAACTGGCAGGGTTTAAACCCTTCTT

GGCT CTTCTAGGTTTGGTGTCTTCTTAGTAGTCCTCTTCCCGCTTGCTCTTTAGCCATCTTGGCTACTGCCTGTT CCTT ||||||||||||||||||||||||||||||||| ||||||| ||||||||||||||||||||||||||||||||| CTTCTAGGTTTGGTGTCTTCTTAGTAGTCCTCTCCCCGCTTCCTCTTTAGCCATCTTGGCTACTGCCTGTT CCTT TGCTGTTGAGTGTTTTGGTTCTATGTTTGGCAACATATGGTTACTGTGTAACTAGCTGTATCCTTCAAAGC TATT ||||||||||||||||||||||||||| |||||||||||||||| |||||||||||||||||||||||||||||| TGCTGTTGAGTGTTTTGGTTCTATGTTCGGCAACATATGGTTACCGTGTAACTAGCTGTATCCTTCAAAGC TATT CCTGCTTAATGAAAAACGTGCCTCCCGACGTGGTCTAGAAAAAAAATTTCTTTGGCACTTACTCATCAAAC GTTA ||||||||||||||||| |||||||||||||||||||||||||||| |||||||||||||||||||||||||||| CCTGCTTAATGAAAAACATGCCTCCCGACGTGGTCTAGAAAAAAAAATTCTTTGGCACTTACTCATCAAAC GTTA TTCTACCATGCAAGTCATTTGTACCAAAGCTTACACATACATGGATCATCACACAGTATGTGGAGACAGTA AACA ||||||||||||||||||| |||| |||||||||||||||||||||||||||||||||||||||||||||||||| TTCTACCATGCAAGTCATTCGTACAAAAGCTTACACATACATGGATCATCACACAGTATGTGGAGACAGTA AACA ATTTGTATCTGAATTATTAGTAAATCTTAAGAGCTTCATTACAACATTATTTCAGGCAGGACACCAAACAG ACTT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| ATTTGTATCTGAATTATTAGTAAATCTTAAGAGCTTCATTACAACATTATTTCAGGCAGGACACCAAACAG ACTA GTTACAATTGATACAGAAGACAAGAGTACCATTATCACATCAGCAGCGACGACAACATCATTCAGAGTGGC ACCT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GTTACAATTGATACAGAAGACAAGAGTACCATTATCACATCAGCAGCGACGACAACATCATTCAGAGTGGC ACCT CATCAAATGAAATGCACTCGGCATGCAGAAACATTACCTCACAGGCGTACAGGCGTACAGCGAGGCGACAA AGTA |||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||| CATCAAATGAAATGCACTCGGCATGCAGAAACATTACCTCACAGGCCGACAGGCGTACAGCGAGGCGACAA AGTA TGAAATAATTAAAACCAACGCTGAGTTTTTGTTGGCGAAAGCATAACAAACCAACCCTAAATTAAAACATG CATG |||||||||||||||||||||||| ||||||||||||||||||||||||||||||||| |||||||||||||||| TGAAATAATTAAAACCAACGCTGATTTTTTGTTGGCGAAAGCATAACAAACCAACCCTGAATTAAAACATG CATG TATGAGTAGTAGGTAATAGGATAAGACTCCATTAGACCACACCAGCGGCAGAGGTAGCAGCAGCAGCAGCA GCAG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| 
TATGAGTAGTAGGTAATAGGATAAGACTCCATTAGACCACACCAGCGGCAGAGGTAGCAGCAGCAGCAGCA GCAG

CAGCACTGTTGTTCTCGCCTGGCCTGCTGACGATCCCAGCATGGAAGAGCAGCACCCATAGCAGCGTTATC AGCT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CAGCACTGTTGTTCTCGCCTGGCCTGCTGACGATCCCAGCATGGAAGAGCAGCACCCATAGCAGCGTTATC AGCT CGCCGCCCCTGGCGATTGCCTCTTTGTGCCCTTTGAGATTGTCGGACGGCGCAACATACAGGACCATCTCT GACC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CGCCGCCCCTGGCGATTGCCTCTTTGTGCCCTTTGAGATTGTCGGACGGCGCAACATACAGGACCATCTCC GACC AGAATTCAGCCAACAACTTCCACACCGTATCATCACCACCATGTAGCACCAGCTTCGCCAATTGCTCTCCC AGCC ||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||| AGAATTCAGCCAACAACTTCCACGCCGTATCATCACCACCATGTAGCACCAGCTTCGCCAATTGCTCTCCC AGCC TCGCGCCGTCCTTGACTACTTGATGCTTCGCTTTTGTACTCAGCAATTCGACCAGCTGTTGGCATTTGGCT TCCT |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||| TCGCGCCGTCCTTGACTACTTGATGCTTCGCTTTTGTACTCAGCAATTCGACCAGCTGTTGGCATTTGACT TCCT GCGTCAATGAATCTCCTGCCGTGCAGCCGGCAAGGACACGCCGAGTGTCCTTCTTGACATTCCCGTACAGG CTCC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||| GCGTCAATGAATCTCCTGCCGTGCAGCCGGCAAGGACACGCCGAGTGTCCTTCTTGACATTCTCGTACAGG CTCC TGCTCCATTCGTGGTCATCGGGAAGCAGCTCAGGGCACCATGTCACGAGGTAAGCACAGTACCGTGATAGG TGGG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TGCTCCATTCGTGGTCATCGGGAAGCAGCTCAGGGCACCATGTCACGAGGTAAGCACAGTACCGTGATAGG TGGG TGGCAACCATTTTGTAGTCAGTGTTGGAAAGTGGTGGAGAACATTGTCCTTCATCATTCTGGTGTGGGTAC CTGA |||||||||||||||||||||||||||||||||||||||||| ||||||||| |||||| ||| ||||||||||| TGGCAACCATTTTGTAGTCAGTGTTGGAAAGTGGTGGAGAACCTTGTCCTTCGTCATTCAGGTTTGGGTAC CTGA CCTCAAGTATACTTGTGGCGATGTGCCATGTGAGTATGGTATAAGATGTACTCTTGTTGCTGCAAGCCCAG TTGA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CCTCAAGTATACTTGTGGCGATGTGCCATGTGAGTATGGTATAAGATGTACTCTTGTTGCTGCAAGCCCAG TTGA AGCTTTCACCAAGGTCTCGGCAGCGGCGGAGACATGCTGCTCCATTACTGAGCTGTCCTCCATCATCTCTA GTCC |||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||| 
AGCTTTCACCAAGGTCTCGGCAGCGGCGGAGACATGCTGCTCCATTGCTGAGCTGTCCTCCATCATCTCTA GTCC TTCTTAGCGCTTGCATGATGCAAACCTTCACTACTGCTGGCACACTGACCTTCCTGTTCTGGTCTGGTAAA

AAAG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TTCTTAGCGCTTGCATGATGCAAACCTTCACTACTGCTGGCACACTGACCTTCCTGTTCTGGTCTGGTAAA AAAG GGAAAAGGTACCAGAGAATACCAAGCGGGGTTGCTCTTGGTTGAAGCACCAATACTGAGAACTGGCCAATT TTCT |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||| GGAAAAGGTACCAGAGAAGACCAAGCGGGGTTGCTCTTGGTTGAAGCACCAATACTGAGAACTGGCCAATT TTCT CATCCCAATGCCTCCTCAATAACTTGCATCTGCAGCGGATCAAAAGGCCGGCACATTTCTGAATGCAAAGA GAAT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CATCCCAATGCCTCCTCAATAACTTGCATCTGCAGCGGATCAAAAGGCCGGCACATTTCTGAATGCAAAGA GAAT GCTTCGATGAGGAGGAGGAGGCACGGTTTAGGAAGCGACACACGAGGGCTACTTTGGTCCAGTTAGAGCAG ACGT ||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| GCTTCGATGAGGAGGAGGAGGCACGGTTTAGGAGGCGACACACGAGGGCTACTTTGGTCCAGTTAGAGCAG ACGT AGGAAGTCATGTCCCTCACCTCAGTAATGAAAAATAACACTAAAAGAAAAAGTAGCGGCACCAGACCAAAG AGCC |||||| |||||||||||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||| AGGAAGCCATGTCCCTCACCTCAGTAATGAAAACTAACACTAAAAGAAAAAGTAGTGGCACCAGACCAAAG AGCC AGCTTCCATACAATCTTATGTCCAATTGTGTTTGCAACTGATTTTCAGTGCAACTGATGAAACACAAAAGC TGAG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| AGCTTCCATACAATCTTATGTCCAATTGTGTTTGCAACTGATTTTCAGTGCAACTGATGAAACACAAAAGC TGAG GTGAACCAGTATTTACAGCTAAAAGGAAGCTGAGCCACGTTGCAGCCAAGATGCAATAAGTTATGCTCAAG AGTG |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GTGAACTAGTATTTACAGCTAAAAGGAAGCTGAGCCACGTTGCAGCCAAGATGCAATAAGTTATGCTCAAG AGTG AAATAAAGATACCCACGATGGGCAACCAGCACTTGGAATAGGAGATCGGTATGGATGTGTAATAATAGTCA TGAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| AAATAAAGATACCCACGATGGGCAACCAGCACTTGGAATAGGAGATCGGTATGGATGTGTAATAATAGTCA TGAA CAAAAGAAATCTCCTCTGAAATCAACCGAAACACCCTGTTGTGGTCTCCCTCCTTGAGCAGCAAGCTCCAG AAAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CAAAAGAAATCTCCTCTGAAATCAACCGAAACACCCTGTTGTGGTCTCCCTCCTTGAGCAGCAAGCTCCAG 
AAAA AAGTGAACATGTCACTCGAGGCATTAGCAGTGTTGACATTGTACCCTGCAAACCGGCACCGCAACAACTTG AAGA

||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| AAGTGAACATGTCACTCGAGGCATTAGCAGTGTGGACATTGTACCCTGCAAACCGGCACCGCAACAACTTG AAGA ACGCAAAGGACAAGCATAAGTCTTGTAGTCGTTGCAGTGTTGAGAACGGAAGGAAGCAGGCTGTCATCTTC CAAA ||||||||||||||||||||||||||||||| ||||||||||||||||||| ||||||||||||||||||||||| ACGCAAAGGACAAGCATAAGTCTTGTAGTCGCTGCAGTGTTGAGAACGGAAAGAAGCAGGCTGTCATCTTC CAAA CAATGTCAATAGTTACCAGGCCATCTTTACCGTGCACAGCCATGCCTGAGTCACCCTTCCATACATACCCA TGAG || ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| | CACTGTCAATAGTTACCAGGCCATCTTTACCGTGCACAGCCATGCCTGAGTCACCCTTCCATACATACCCA TGAG ACTGCTTCTCCATGTTTCTTTTTCCCTCTCCCATAACCACTAGTGGAGGTGGAGGAGGAGCATCCTCATTA GTTA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ACTGCTTCTCCATGTTTCTTTTTCCCTCTCCCATAACCACTAGTGGAGGTGGAGGAGGAGCATCCTCATTA GTTA CTGCTGCTGCTTCTCTACGGTGACTTGTCCTTGCTTGTACCGGTGGTGGTTGCCGCATGTAGGCAAAAATA AGAT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CTGCTGCTGCTTCTCTACGGTGACTTGTCCTTGCTTGTACCGGTGGTGGTTGCCGCATGTAGGCAAAAATA AGAT GAGGATTGTTTCCGAGAGCAAAGGATTGCCGTGCCTTTTCAAATGCATAATATTTGAACACCATTTTGGCA CATG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| GAGGATTGTTTCCGAGAGCAAAGGATTGCCGTGCCTTTTCAAATGCATAATATTTGAACACCATTTTGGCA CATG TGAGAGCAAAAGGTATCCCTTCAAGTCCAATTATAAGCACAACAAAAGTAATAAGTCCATCTCTGGAATTT ATGC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TGAGAGCAAAAGGTATCCCTTCAAGTCCAATTATAAGCACAACAAAAGTAATAAGTCCATCTCTGGAATTT ATGC CACTATTATTAGTGAAGTAGGTCACACCTAGGTAATAGGTCCAGAGGCCCTGAACAAGCAGGTCAAAAGGA GGAC |||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||| CACTATTATTAGTGAAGTAGGTGACACCTAGGTAATAGGTCCAGAGGCCCTGAACAAGCAGGTCAAAAGGA GGAC CTATGTTTCTGCCTTCTCTGTCATCAACAGCAACTATTGCACTGGTATTGATCATGATTATCTGAACAAGG GATG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CTATGTTTCTGCCTTCTCTGTCATCAACAGCAACTATTGCACTGGTATTGATCATGATTATCTGAACAAGG GATG 
CCCATACTATGACCACGGGAGATTGAAATTCTGGATCGCATCTTGCAGCCAACCGCCCCTCATCTGTGTCT GTGA


||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CCCATACTATGACCACGGGAGATTGAAATTCTGGATCGCATCTTGCAGCCAACCGCCCCTCATCTGTGTCT GTGA CATAGTAGTTTGATCCAGCGACCAAGGGGACAGCGGTGGACATGACAGGCAAGAACAAGGTGGTGGCTCCA ACAA ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CATTGTAGTTTGATCCAGCGACCAAGGGGACAGCGGTGGACATGACAGGCAAGAACAAGGTGGTGGCTCCA ACAA AGATGAACCTAGTTAAGCGGTGGTAGCGGTAACGTTGGCCAAAGATGCCTATCCCGACTATGACTCCCGCC AGGA ||||||||||||||| ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||| AGATGAACCTAGTTAGGCGGTGGTAGCGGTAACGTTGGCCAAAGATGCCTATCCCAACTATGACTCCCGCC AGGA TGGCGCTGGCGACCAACAGAGCGTTCACAAGCCACAACCTGTGTCTGATGCTCTTGTGGAACAAGGACAGG GATT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TGGCGCTGGCGACCAACAGAGCGTTCACAAGCCACAACCTGTGTCTGATGCTCTTGTGGAACAAGGACAGG GATT GGCCATCGCATATAATCCGAGAGGAGCATTGTTTGAGTGCATTGTAGGAGCAGTGGTCTCTAGTTGAATTC CCTC ||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||| GGCCATCGCATATAATCCGGGAGGAGCATTGTTTGAGTGCATTGTAGGAGCAGTGGTCTCTAGTTGAATTC CCTC CAGCCATGATTGTCCCTGCTCATTGCCAAAATATCAAGGTTACATAAACAAGGATTATTTTGTAGCATGCT GAAA |||||||||||| ||||||||||||||||||||||||||||| ||||||| ||||||||||||||||||||||| CAGCCATGATTGCCCCTGCTCATTGCCAAAATATCAAGGTTATATAAACACAGATTATTTTGTAGCATGCT GAAA TGGAAATAATATGGACTATGGAGGCATGTACATCTTATCTTACAATTACTAGTTAGAGGCCCGAACGGAAA CACC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||| TGGAAATAATATGGACTATGGAGGCATGTACATCTTATCTTACAATTACTAGTTAGAGGCCTGAACGGAAA CACC ATGTTAATTCTCACTAGGCGCACTAGGAAAAAAGATAAATCCTAGAAATCCTCATAATAATCAGAAACATC TGAC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| ATGTTAATTCTCACTAGGCGCACTAGGAAAAAAGATAAATCCTAGAAATCCTCATAATAATCAGAAACATC TGAC CTTGAATTTGGTAGACTTGAATCATCATCACCGATAAATAGCAAATGGATATATTGAATTTAGAATAAAAA CTAC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| CTTGAATTTGGTAGACTTGAATCATCATCACCGATAAATAGCAAATGGATATATTGAATTTAGAATAAAAA TTAC 
CCACCATTGTCTTTAAGAAAAATACATAAAGTGACCCCCTTATTTTGTATCTAAATTACCCTCCTCCGCCA TTAC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


|||| CCACCATTGTCTTTAAGAAAAATACATAAAGTGACCCCCTTATTTTGTATCTAAATTACCCTCCTCCGCCA TTAC ACAAGATCATTCAGAATAACCTCCTAAATTGGTATATAGATTACCCATAAGTGGCATTATGAAGGATAATT CAAA |||||||||||||||||||||||||||||||||||||||||||||||||| || ||||||||||||||||||||| ACAAGATCATTCAGAATAACCTCCTAAATTGGTATATAGATTACCCATAACTGTCATTATGAAGGATAATT CAAA TAGCCCCCAAAATTGACATCTAAATTACAAAAATTTGTCATTATAAAACATACTACCAAATAACCACATAA CATT |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||| TAGCCCCCAAAATTGACATCTAAATTTCAAAAATTTGTCATTATAAAACATACTACCAAATAACCACATAA CATT GCATCTAATTACCCTTCTTAACTATTATGAAAGATAATTTCAAATAACCTTCTAATGTTGTATCTAAATTC CCAT |||||||||||||||||||||| |||||||||||||||||||||||||||||||| ||||||||||||||||||| GCATCTAATTACCCTTCTTAACCATTATGAAAGATAATTTCAAATAACCTTCTAACGTTGTATCTAAATTC CCAT CTTCTATAACAAAAGATTATTTAAAATACCCCTAAAGTTTTTATCAGTTATCCACCTTTGTGATTATAAAA GATA ||||||||||||||||| |||||||||||||||||||||||||| |||||||||||||||||||||||||||||| CTTCTATAACAAAAGATGATTTAAAATACCCCTAAAGTTTTTATAAGTTATCCACCTTTGTGATTATAAAA GATA AAATAACCTCCTAAGTTTGCATTCAACCTACCCATATGTGTCATTATGAACGATAAATATTCGATGTACCT CAGT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| AAATAACCTCCTAAGTTTGCATTCAACCTACCCATATGTGTCATTATGAACGATAAATATTCGATGTACCT CAGT CACTATAAAAGTAAACAAAATCATCTCTCAAATATACATGTAAATACTTATATATGTTGGTATGGTACATA TAGA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CACTATAAAAGTAAACAAAATCATCTCTCAAATATACATGTAAATACTTATATATGTTGGTATGGTACATA TAGA TATTTCTTGCAATCGAATATGGAGAAGACAAAAAATTGGTTATTTAATTTCTTTTACTTTGAATAAAGAGA AGAA |||||||||||||||| |||||||||| ||||||| |||||||||||||||||||||||||||||| ||||||| TATTTCTTGCAATCGATTATGGAGAAGCCAAAAAACGGGTTATTTAATTTCTTTTACTTTGAATAAAAAGA AGAA ACAACTACGATGAACTCAAGGTGGGCATTTTTGGAATTAGGTGGGATTACCCCTTTTCATTGTACGGACCA CAAT ||||||| |||||||||||||| | ||||| ||||| |||||||||||||||||||||||||||||||||| ||| ACAACTATGATGAACTCAAGGTAGACATTTCTGGAAGTAGGTGGGATTACCCCTTTTCATTGTACGGACCA TAAT TATGCCAAAATCTTCACCCCAACGGATTGAGAGGCCCCATCACCGTCTTCATGTTTTTGCAGTGGAAGTGG ACGA 
||||||||||||||||||||||| | ||||||||||||||||||||||||||||||| |||||||||||||||||


TATGCCAAAATCTTCACCCCAACAGGTTGAGAGGCCCCATCACCGTCTTCATGTTTTGGCAGTGGAAGTGG ACGA TTATGCCCATCTTCATCAAACGCTGGTATCAAACTGGCCATGAATATCGAT- AATAAAAAATTAAAAAAGTCAAG |||||||||||||||||||| | ||||||||||||||||||||| |||||| ||| || ||||||||||| TTATGCCCATCTTCATCAAATGTTGGTATCAAACTGGCCATGAAAATCGATAAATTTTTTTTTTAAAAAGT CAAG GTCAAGTGCGTGCCCGCCACACCCATGGCGCCAGTCATGTCCTTACGCCCCTTGACACTGTTGGCTGATTC AAAC ||||||||||||||||||||||||||||| || ||| ||||||||||||||| |||||||||||||||||||||| GTCAAGTGCGTGCCCGCCACACCCATGGCACCGGTCGTGTCCTTACGCCCCTCGACACTGTTGGCTGATTC AAAC CCTGATGCCCTCGTCCACCCGATCATAGATCAATACCGTAGGAACGCCGCCTCACCGCTG- TGCCATAGAGCCTC ||||||||||||||||||||||||||||||||| |||||||| |||||||||||||| || |||||||||||||| CCTGATGCCCTCGTCCACCCGATCATAGATCAACACCGTAGGGACGCCGCCTCACCGTTGCTGCCATAGAG CCTC CTCTGCCTCTGCTGAGGGGGGGGGGGGGGCAGGGGCGAACCTAGCAACAAGCTAGTGGGTGCAAATGCACC ACCC ||||||| | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CTCTGCCCCCGCTGAGGGGGGGGGGGGGGCAGGGGCGAACCTAGCAACAAGCTAGTGGGTGCAAATGCACC ACCC AAAATTTGAAAAAACAGTGGTTTTTTCTAAAATTTTACCATATATGCACCACATATAACACAAGTCTATGC ACCC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| AAAATTTGAAAAAACAGTGGTTTTTTCTAAAATTTTACCATATATGCACCACATATAACACAAGTCTATGC ACCC ACAAGGCAAGTGCACCACCCTCTAATTTGCTCTAGCTTCGCCACGGGGGGGGGGAGGGACCTGTGAGCGAA GGGA |||||||||||||||||||||||||||||||||||||||||||| ||||||| | |||||||||||||||||||| ACAAGGCAAGTGCACCACCCTCTAATTTGCTCTAGCTTCGCCACTGGGGGGGCGGGGGACCTGTGAGCGAA GGGA AGCACCCACCGCAATGGGGAGGTGCACCACCACTAGGGTCAGCATGCCTGTTGCAAGGGCAGGCATGGGTG TGGT ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| AGCACCCACCGCAATGGGGAGGTGCACCACCACTAGGGTCAGCATGCCTGTTGCAAGGGCAGGCATGGGTG TGGT TAGGGCCAACGCATCCACGCCAAGGGGGTCATCCCGTCGAGTGTCTACCATGTGCCGGGTGGTTTGGTACA CCAG |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| TAGGGCCAACACATCCACGCCAAGGGGGTCATCCCGTCGAGTGTCTACCATGTGCCGGGTGGTTTGGTACA CCAG ATGGTAGGGCGAGAGCCTGGGGAAGGGAGAAACAGAGGGATAGAGCCTGGGAGAGGGGGGGGAGGATAACA CTGA 
||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||| ATGGTAGGGCGAGAGCCTGGGGAAGGGAGAAACAGAGGGATAGAGCCTGGGAGAGAGGGGGGAGGATAACA


CTGA GTTGAGAGAGAGAGAGAAGGAAAGAGGATAGAGTTAAGATGCTAGCAACCTTACTTAATACATTGAGGATC CAAA ||||||||||||||||| || |||||||||||||| ||||| ||||||||||||| ||||||||||||||||||| GTTGAGAGAGAGAGAGATGGTAAGAGGATAGAGTTCAGATGGTAGCAACCTTACTAAATACATTGAGGATC CAAA TTTTGCTTATGTGAGGGGGCTCTTAACATCGGTTCTTCTCAACACGAACCGGCAGTGAGAATGTTTACTGT TTGC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TTTTGCTTATGTGAGGGGGCTCTTAACATCGGTTCTTCTCAACACGAACCGGCAGTGAGAATGTTTACTGT TTGC TCATAAGATGAACTGGTGGTCAAATTCATTTTCACCACCGTCCATGTTCCATTCTGAACCAAAAATGATTT TCAA ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| TCATAAGATGAACTGGTGGTCAAATTCATTTTCACCACCGTCCATGTTCCATTCTGAACCAAAAATGATTT TCAA CGTTGGGTCCTTTATGGACATGTGGTGACCAACGCTTACTACGTGAAAATGGTGTAATGAAAACTATTTCT TTAG ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| CGTTGGGTCCTTTATGGACATGTGGTGACCAACGCTTACTACGTGAAAATGGTGTAATGAAAACTATTTCT TTAG TAGTGCATGCATAGA ||||||||||||||| TAGTGCATGCATAGA


References

1. Evans, M. & Kermicle, J. 2001. “Teosinte crossing barrier1, a locus governing hybridization of teosinte with maize”. Theor Appl Genet 103: 259. https://doi.org/10.1007/s001220100549
2. Zhang, H., Liu, X., Zhang, Y. et al. 2012. “Genetic analysis and fine mapping of the Ga1-S gene region conferring cross-incompatibility in maize”. Theor Appl Genet 124: 459. https://doi.org/10.1007/s00122-011-1720-7
3. Lauter M. et al. 2017. “A Pectin Methylesterase ZmPme3 Is Expressed in Gametophyte factor1-s (Ga1-s) Silks and Maps to that Locus in Maize (Zea mays L.)”. Frontiers in Plant Science vol. 8: 1926. doi:10.3389/fpls.2017.01926
4. Zhang, Z. et al. 2018. “A PECTIN METHYLESTERASE gene at the maize Ga1 locus confers male function in unilateral cross-incompatibility”. Nature Communications vol. 9,1: 3678. doi:10.1038/s41467-018-06139-8
5. Peaucelle, A., Braybrook, S.A., Höfte, H. 2012. “Cell wall mechanics and growth control in plants: the role of pectins revisited”. Front Plant Sci 3: 121.
6. Micheli, F. 2001. “Pectin Methylesterases: Cell Wall Enzymes With Important Roles In Plant Physiology”. Trends in Plant Science vol. 6, pp. 414-419. https://ucanr.edu/datastoreFiles/608-837.pdf
7. Kermicle, J.L. 2006. “A selfish gene governing pollen-pistil compatibility confers reproductive isolation between maize relatives”. Genetics vol. 172: 499-506.
8. Walbot, V. 2004. “Genomic, chromosomal and allelic assessment of the amazing diversity of maize”. Genome Biology vol. 5,6: 328.
9. Kermicle, J. & Evans, M. 2010. “The Zea mays Sexual Compatibility Gene ga2: Naturally Occurring Alleles, Their Distribution, and Role in Reproductive Isolation”. Journal of Heredity, Volume 101, Issue 6. https://doi.org/10.1093/jhered/esq090
10. Gill, J. 2014. “Fine Mapping and Characterization of the iap Gene in Sorghum [Sorghum bicolor (L.) Moench]”. Doctoral dissertation, Texas A&M University. http://hdl.handle.net/1969.1/153258
11. Dillon, S., Lawrence, P., Henry, R. et al. 2016. “Sorghum laxiflorum and S. macrospermum, the Australian native species most closely related to the cultivated S. bicolor based on ITS1 and ndhF sequence analysis of 28 Sorghum species”. Southern Cross Plant Science. Southern Cross University.
12. Lazarides, M., Hacker, J. & Andrew, M. 1991. Taxonomy, cytology and ecology of indigenous Australian sorghums (Sorghum Moench: Andropogoneae: Poaceae). Aust. Syst. Bot. 4: 591–635.
13. Hodnett, G. 2005. “Pollen–Pistil Interactions Result in Reproductive Isolation between Sorghum bicolor and Divergent Sorghum Species”. Crop Science 45: 1403–1409.
14. Plants.usda.gov. 2019. Plants Profile for Sorghum halepense (Johnsongrass). [online] Available at: https://plants.usda.gov/core/profile?symbol=SOHA [Accessed 12 Feb. 2019].


15. Ohadi, S., Hodnett, G., Rooney, W., Bagavathiannan, M. 2017. Gene Flow and its Consequences in Sorghum spp. Critical Reviews in Plant Sciences, 36:5-6, 367-385. doi:10.1080/07352689.2018.1446813
16. De Wet, J.M.J. 1978. Systematics and evolution of Sorghum sect. Sorghum (Gramineae). American Journal of Botany 65: 477–484.
17. Dickson N. et al. 2010. “Phylogenetic analysis of the genus Sorghum based on combined sequence data from cpDNA regions and ITS generate well-supported trees with two major lineages”. Annals of Botany vol. 105,3: 471-80.
18. Arriola, P. and Ellstrand, N. 1996. Crop-To-Weed Gene Flow in the Genus Sorghum (Poaceae): Spontaneous Interspecific Hybridization between Johnsongrass, Sorghum halepense, and Crop Sorghum, S. bicolor. American Journal of Botany, 83(9), p. 1153.
19. Doggett, H. 1988. Sorghum, 2nd ed. Tropical agricultural series. Longman Scientific, Essex.
20. Paterson, A. and Chandler, J. (n.d.). Risk of Gene Flow from Sorghum to "Johnsongrass". Texas A&M: Dept Soil and Crop Science. https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=2ahUKEwiRhKaJnrXgAhVhiVQKHZXyC30QFjAAegQIARAC&url=https%3A%2F%2Fw3.ual.es%2Fpersonal%2Fedana%2Fbot%2Fmh%2Fcomplemento%2Fdocufijos%2Fsorgo%2520y%2520sorgo.doc&usg=AOvVaw0Ho1pA348JLhb3FOfnIpNF
21. Encyclopedia Britannica. 2019. Gene flow | genetics. [online] Available at: https://www.britannica.com/science/gene-flow [Accessed 12 Feb. 2019].
22. Wssa.net. 2019. Crop loss | Weed Science Society of America. [online] Available at: http://wssa.net/wp-content/uploads/WSSA-2018-Sorghum-Yield-Loss-poster.pdf
23. Lopez, J. A. 1988. Biological Aspects and Control of Johnsongrass (Sorghum halepense (L.) Pers.). Diss. Texas A&M University, College Station, Texas.
24. Kong, W. et al. 2013. “Genetic analysis of recombinant inbred lines for Sorghum bicolor × Sorghum propinquum”. G3 (Bethesda, Md.) vol. 3,1: 101-8.
25. De Nettancourt, D. 2001. Incompatibility and Incongruity in Wild and Cultivated Plants (2nd ed.). Springer Verlag, Berlin.
26. Vencill, W., Nichols, R., Webster, T., Soteres, J., Mallory-Smith, C., Burgos, N., McClelland, M. 2012. Herbicide Resistance: Toward an Understanding of Resistance Development and the Impact of Herbicide-Resistant Crops. Weed Science, 60(SP1), 2-30. doi:10.1614/WS-D-11-00206.1
27. Bosch, M. and Hepler, P. 2005. Pectin Methylesterases and Pectin Dynamics in Pollen Tubes. The Plant Cell Online 17(12), pp. 3219-3226. http://www.plantcell.org/content/17/12/3219/tab-article-info [Accessed 22 Mar. 2019].
28. Moustacas, A.M., Nari, J., Borel, M., Noat, G., and Ricard, J. 1991. Pectin methylesterase, metal ions and plant cell-wall extension. The role of metal ions in plant cell-wall extension. Biochem. J. 279, 351–354.


29. Nari, J., Noat, G., Diamantidis, G., Woudstra, M., and Ricard, J. 1986. Electrostatic effects and the dynamics of enzyme reactions at the surface of plant cells. 3. Interplay between limited cell-wall autolysis, pectin methyl esterase-activity and electrostatic effects in soybean cell walls. Eur. J. Biochem. 155, 199–202.
30. Wang, M., Chen, Z., Zhang, H., Chen, H., Gao, X. 2018. Transcriptome Analysis Provides Insight into the Molecular Mechanisms Underlying gametophyte factor 2-Mediated Cross-Incompatibility in Maize. Int. J. Mol. Sci. 19, 1757. doi:10.3390/ijms19061757
31. Paterson, A. H. 2009. The Sorghum Bicolor Genome And The Diversification Of Grasses. Nature 457, 551. https://www.nature.com/articles/nature07723#supplementary-information
32. Schmidt, J.J., Pedersen, J.F., Bernards, M.L., Lindquist, J.L. 2013. Rate of shattercane × sorghum hybridization in situ. Crop Sci 53: 1677–1685.
33. Dahlberg, J. et al. 2011. Assessing Sorghum [Sorghum Bicolor (L) Moench] Germplasm For New Traits: Food, Fuels & Unique Uses. CREA Journals, vol 56, no. 2. https://journals-crea.4science.it/index.php/maydica/article/view/688
34. All About Sorghum. United Sorghum Checkoff, 2019. https://www.sorghumcheckoff.com/all-about-sorghum
35. Ciacci, C. et al. 2007. Celiac Disease: In Vitro And In Vivo Safety And Palatability Of Wheat-Free Sorghum Food Products. Clinical Nutrition, vol 26, no. 6, pp. 799-805. Elsevier BV, doi:10.1016/j.clnu.2007.05.006.
36. Hocq, L. et al. 2017. Connecting Homogalacturonan-Type Pectin Remodeling To Acid Growth. Trends In Plant Science, vol 22, no. 1. http://dx.doi.org/10.1016/j.tplants.2016.10.009
37. Lu, Y. et al. 2019. A Silk-Expressed Pectin Methylesterase Confers Cross-Incompatibility Between Wild And Domesticated Strains Of Zea Mays. bioRxiv, Preprint. Cold Spring Harbor Laboratory, doi:10.1101/529032.
38. Sticklen, M. B. 2008. Plant genetic engineering for biofuel production: towards affordable cellulosic ethanol. Nature Reviews Genetics 9, 433-443. doi:10.1038/nrg2336. https://www.nature.com/articles/nrg2336
39. Hepler, P.K. et al. 2013. Control Of Cell Wall Extensibility During Pollen Tube Growth. Molecular Plant, vol 6, no. 4, pp. 998-1017. Elsevier BV, doi:10.1093/mp/sst103. https://www.sciencedirect.com/science/article/pii/S1674205214608984
40. Harholt, J. et al. 2010. Biosynthesis Of Pectin. Plant Physiology, vol 153, no. 2, pp. 384-395. American Society Of Plant Biologists (ASPB), doi:10.1104/pp.110.156588. http://www.plantphysiol.org/content/153/2/384
41. Caffall, K., Mohnen, D. 2009. The Structure, Function, And Biosynthesis Of Plant Cell Wall Pectic Polysaccharides. Carbohydrate Research, vol 344, no. 14, pp. 1879-1900. Elsevier BV, doi:10.1016/j.carres.2009.05.021. https://www.sciencedirect.com/science/article/pii/S0008621509002006?via%3Dihub


42. Willats, W. G., McCartney, L., Mackie, W. & Knox, J. P. 2001. Pectin: cell biology and prospects for functional analysis. Plant Mol. Biol. 47, 9–27.

43. Wolf, S. et al. 2009. Homogalacturonan Methyl-Esterification And Plant Development. Molecular Plant, vol 2, no. 5, pp. 851-860. Elsevier BV, doi:10.1093/mp/ssp066.

44. Bartek, M.S., Hodnett, G.L., Burson, B.L., Stelly, D.M. & Rooney, W.L. 2012. Pollen tube growth after intergeneric pollinations of iap-homozygous sorghum. Crop Science 52, 1553-1560.
45. Gill, J.R., Rooney, W.L. & Klein, P.E. 2014. Effect of humidity on intergeneric pollinations of iap (Inhibition of Alien Pollen) sorghum [Sorghum bicolor (L.) Moench]. Euphytica 198, 381-387.
46. Kuhlman, L.C. & Rooney, W.L. 2011. Registration of Tx3361 Sorghum Germplasm. Journal of Plant Registrations 5, 133-134.
47. Price, H.J. et al. 2006. Genotype dependent interspecific hybridization of Sorghum bicolor. Crop Science 46, 2617-2622.
48. Laurie, D.A. & Bennett, M.D. 1989. Genetic variation in sorghum for the inhibition of maize pollen-tube growth. Annals of Botany 64, 675-681.
49. Dorokhov, Y. L. et al. 2018. "Methanol In Plant Life". Frontiers In Plant Science, vol 9. Frontiers Media SA, doi:10.3389/fpls.2018.01623.
50. Tran, D., Dauphin, A., Meimoun, P., Kadono, T., Nguyen, H. T. H., Arbelet-Bonnin, D., et al. 2018. Methanol induces cytosolic calcium variations, membrane depolarization and ethylene production in Arabidopsis and tobacco. Ann. Bot. doi:10.1093/aob/mcy038.
51. Chalivendra, S. C. et al. 2012. "Developmental Onset Of Reproductive Barriers And Associated Proteome Changes In Stigma/Styles Of Solanum pennellii". Journal Of Experimental Botany, vol 64, no. 1, pp. 265-279. Oxford University Press (OUP), doi:10.1093/jxb/ers324.
52. Kachroo, A. et al. 2002. "Self-Incompatibility In The Brassicaceae". The Plant Cell, vol 14, no. suppl 1, pp. S227-S238. American Society Of Plant Biologists (ASPB), doi:10.1105/tpc.010440.
53. Marshall, E. et al. 2011. "Cysteine-Rich Peptides (CRPs) Mediate Diverse Aspects Of Cell-Cell Communication In Plant Reproduction And Development". Journal Of Experimental Botany, vol 62, no. 5, pp. 1677-1686. Oxford University Press (OUP), doi:10.1093/jxb/err002.
54. Takayama, S., Shimosato, H., Shiba, H., Funato, M., Che, F.S., Watanabe, M., Iwano, M., Isogai, A. 2001. Direct ligand–receptor complex interaction controls Brassica self-incompatibility. Nature, vol. 413 (pg. 534-538).
55. Wheeler, M.J., de Graaf, B.H., Hadjiosif, N., Perry, R.M., Poulter, N.S., Osman, K., Vatovec, S., Harper, A., Franklin, F.C., Franklin-Tong, V.E. 2009. Identification of the pollen self-incompatibility determinant in Papaver rhoeas. Nature, vol. 459 (pg. 992-995).


56. LA Taq DNA Polymerase: A High-Fidelity PCR Enzyme For Long-Range PCR. Takarabio.com, 2019. https://www.takarabio.com/products/pcr/long-range-pcr/la-taq-products/la-taq-dna-polymerase
57. Hasan, S.A., Rabei, S., Nada, R., Abagadallah, G. 2017. “Water use Efficiency in the Drought-Stressed Sorghum and Maize in Relation to expression of Aquaporin Genes”. Biologia Plantarum vol 61(1): 127-137. doi:10.1007/s10535-016-0656-9
58. Enciso, J., Jifon, J., Ribera, L., Zapata, S., Ganjegunte, G. 2015. “Yield, water use efficiency and economic analysis of energy sorghum in South Texas”. Biomass and Bioenergy 81, 339-344. http://dx.doi.org/10.1016/j.biombioe.2015.07.021

59. Jolivet, P. and Foley, J. 2015. “Solutions for purifying nucleic acids by solid-phase reversible immobilization (SPRI)”. Ludmer Centre for Neuroinformatics and Mental Health, pp. V2.3.
60. Mayjonade, B., Gouzy, J., Donnadieu, C., Pouilly, N., Marande, W., Callot, C. et al. 2016. “Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules”. BioTechniques 61(4), 203-205.

61. Rohland, N. and Reich, D. 2012. “Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture”. Genome Res. 22: 939-46.
62. Spitters, C.J.T., Van Den Bergh, J.P. 1982. Competition between crop and weeds: A system approach. In: Holzner, W., Numata, M. (eds) Biology and ecology of weeds. Geobotany, vol 2. Springer, Dordrecht.
63. Decreux, A. and Messiaen, J. 2005. Wall-associated Kinase WAK1 Interacts with Cell Wall Pectins in a Calcium-induced Conformation. Plant and Cell Physiology, Volume 46, Issue 2, Pages 268–278. https://doi.org/10.1093/pcp/pci026

64. Hord, C.L.H., Chen, C., DeYoung, B.J., Clark, S.E., Ma, H. 2006. The BAM1/BAM2 Receptor-Like Kinases Are Important Regulators of Arabidopsis Early Anther Development. The Plant Cell 18(7), 1667-1680. doi:10.1105/tpc.105.036871
65. Guidelines for Using a Salt:Chloroform Wash to Clean Up gDNA. (2019). [ebook] California: Pacific Biosciences of California, Inc. Available at: https://dnatech.genomecenter.ucdavis.edu/wp-content/uploads/2015/04/PacBio-Shared-Protocol-Guidelines-for-Using-a-High-Salt-Phenol-Chloroform-Clean-Up-of-gDNA.pdf
66. Payseur, B.A. 2016. Genetic Links between Recombination and Speciation. PLoS Genet. 12(6): e1006066. doi:10.1371/journal.pgen.1006066
67. Mayjonade, B., Gouzy, J., Donnadieu, C., Pouilly, N., Marande, W., Callot, C., Langlade, N., and Muños, S. 2016. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. BioTechniques 61(4), 203-205.
68. Pantherdb.org. (2019). PANTHER - Family Information. [online] Available at: http://www.pantherdb.org/panther/family.do?clsAccession=PTHR10334
69. Fletcher, J.C. 2018. The CLV-WUS Stem Cell Signaling Pathway: A Roadmap to Crop Yield Optimization. Plants (Basel) 7(4): 87. doi:10.3390/plants7040087
70. Hord, C., Chen, C., DeYoung, B., Clark, S., & Ma, H. (2006). The BAM1/BAM2 Receptor-Like Kinases Are Important Regulators of Arabidopsis Early Anther Development. http://www.plantcell.org/content/18/7/1667#abstract-1
71. Ohadi, S., Littlejohn, M., Mesgaran, M., Rooney, W., Bagavathiannan, M. (2018) Surveying the spatial distribution of feral sorghum (Sorghum bicolor L.) and its sympatry with johnsongrass (S. halepense) in South Texas. PLoS ONE 13(4): e0195511. https://doi.org/10.1371/journal.pone.0195511
72. FAOSTAT. Food and Agriculture Organization of the United Nations. FAOSTAT Statistics Database. 2019. Available from: http://www.fao.org/faostat/en/#data
73. Korres, N.E., Norsworthy, J.K., Bagavathiannan, M.V., Mauromoustakos, A. 2015. Distribution of arable weed populations along eastern Arkansas–Mississippi Delta roadsides: factors affecting weed occurrence. Weed Technology 29(3): 596–604.
74. Adugna, A., Sweeney, P. M. and Bekele, E. 2013. Estimation of in situ mating systems in wild sorghum (Sorghum bicolor (L.) Moench) in Ethiopia using SSR-based progeny array data: implications for the spread of crop genes into the wild. J. Genet. 92.
75. Andow, D.A. and Zwahlen, C. (2006) Assessing environmental risks of transgenic plants. Ecology Letters 9: 196–214.
76. Paterson, A.H., Schertz, K.F., Lin, Y.R., Liu, S.C. and Chang, Y.L. (1995) The weediness of wild plants: molecular analysis of genes influencing dispersal and persistence of johnsongrass, Sorghum halepense (L.) Pers. Proceedings of the National Academy of Sciences USA 92: 6127–6131.
77. Ejeta, G., & Grenier, C. (2005). Sorghum and its Weedy Hybrids [Ebook] (pp. 5-11). West Lafayette, IN 47907, U.S.A.: Dept. of Agronomy, Lilly Hall of Life Sciences, 915 W. State Street, Purdue University. Retrieved from http://www.ask-force.org/web/Africa-Harvest-Sorghum-Lit/Ejeta-Feral-Chapt8-Sorghum.pdf
78. About UniProt. (2019). https://www.uniprot.org/help/about
79. Hurst, P., Liang, Z., Smith, C., Yerka, M., Sigmon, B., Rodriguez, O., & Schnable, J. (2019). Genome wide analysis of Ga1-s modifiers in maize [Ebook]. bioRxiv. Retrieved from https://www.biorxiv.org/content/biorxiv/early/2019/02/07/543264.full.pdf