INVESTIGATION OF THE BASIS OF LENGTH VARIABILITY IN THE MARAMA (TYLOSEMA ESCULENTUM) LARGE RDNA INTERGENIC SPACER

by

EVAN CADWALLADER MESZAROS

Submitted in partial fulfillment of the requirements

For the degree of Master of Science

Thesis Adviser: Dr. Christopher Cullis

Department of Biology

CASE WESTERN RESERVE UNIVERSITY

August, 2011

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Evan Meszaros

candidate for the degree of Master of Science*.

(signed) Robin Snyder, Ph.D.

(chair of the committee)

Christopher Cullis, Ph.D.

Emmitt Jolly, Ph.D.

Barbara Kuemerle, Ph.D.

(date) May 3, 2011

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Tables

List of Figures

Acknowledgements

Abstract

Introduction

Methods

Results

Discussion

Appendix

References

List of Tables

1. Tandemly-arrayed subrepeat unit consensus sequences organized by the subrepeat cluster (see Fig. 12) which they are proposed to comprise

2. Tandemly-arrayed subrepeat unit consensus sequences organized by the samples in which they were identified

List of Figures

1. Diagrammatic representation of the large rDNA array

2. Survey of 35 genomic marama samples amplified with the ext0 primers

3. Schematic of primer walking across the IGS of the large rDNA repeat unit

4. Comparison of four unique genomic samples amplified with primers ext0 and ext1

5. Re-amplification of the lower of two bands from sample P4 (ext1 primers)

6. Three samples amplified using all four pairs of primers for primer walking

7. Dot plot of the full 3.2Kb IGS sequence of sample P4 plotted against itself

8. Gel showing amplicons resulting from PCR with the ext1 primers

9. Comparison of the full IGS sequence with that of ext1-amplified sample 25J

10. Comparison of the amplifications of samples P4 and 12D with the ext2 primers

11. Comparison of the sequences of P4 and 12D amplified with the ext2 primers

12. Diagram of the clusters of tandemly-arrayed subrepeats in the subrepeat region

13. Survey of genomic samples amplified with the ext3 primer pair

14. Selections from the survey representing the greatest variety of band classes

15. Re-amplified (with ext3 primers) band excisions denoted in Fig. 14

16. Re-amplified (with ext3 primers) band excisions denoted in Fig. 15

17. Single-band samples isolated over many excisions and re-amplifications

18. Alignment of the more conserved 732bp section of the subrepeat sequences

19. Similarity tree of the alignment of a 732bp section of 21 samples’ sequences

20. Alignment of the more variable section of the subrepeat sequences

21. Comparison of the dot plots of the subrepeat regions of three species’ IGS

Acknowledgements

I am very grateful to Dr. Christopher Cullis for his mentorship in this project; all research was performed in and supported by his lab at Case Western Reserve University. I would also like to thank Margaret Cullis, Cory Bickel, and other colleagues in the Cullis Lab for their patient assistance with lab procedures. Thanks also go to Drs. Emmitt Jolly and Barbara Kuemerle for their input and advice, and to my friends and colleagues in the Department of Biology for their support.

Investigation of the Basis of Length Variability in the Marama (Tylosema esculentum) Large rDNA Intergenic Spacer

Abstract

by

EVAN CADWALLADER MESZAROS

The intergenic spacer (IGS) of the large rDNA gene array has been characterized in organisms across the domain Eukarya. A common characteristic of this region is length heterogeneity, not only between conspecific individuals, but within individual organisms.

In most species, this heterogeneity has been found to result from one or more clusters of subrepeats, each containing a varying number of subrepeat units. Length heterogeneity has previously been observed in the IGS of the undomesticated Tylosema esculentum (commonly called marama). This locus was sequenced and the region containing subrepeat clusters compared amongst 21 individuals. The subrepeat clusters contributing to IGS length heterogeneity are identified, and the organization of these clusters within the marama IGS is compared to that of other species. The findings presented here suggest that the subrepeat region in marama is smaller and more complex than in the analogous regions of other species.


Introduction:

The legume Tylosema esculentum, known also by its common name marama, is an uncultivated species endemic to the arid regions of southern Africa. Marama is of particular interest due to the high nutrient content of its beans and its tubers, both of which are currently used as a food source by indigenous human populations (Keegan and van Staden, 1981; Bower et al., 1988). Marama is also thought to have potential as a cover crop and as fodder for livestock, and it grows successfully in poor soil (Thom, 2000). Much of the interest in the species has therefore concerned its domestication as a means of addressing human malnutrition in its native region. However, because the species grows in a patchy distribution across parts of Botswana, Namibia, and South Africa, and because these populations are threatened by human over-harvesting, conservation of the genetic diversity of the species is another major objective (Nepolo et al., 2009). It is for both of these reasons, its development as a crop and the conservation of existing diversity within the species, that more needs to be learned about it.

To these ends, an initiative to characterize the molecular attributes of marama has begun. These efforts have concentrated on the development of novel molecular markers, such as microsatellite repeats (Takundwa et al., 2010a, b) and RAPD markers (Monaghan and Halloran, 1996), as well as on the analysis of known polymorphic loci, including the intergenic spacer regions of the 5S and large ribosomal RNA genes (Kuemerle et al., 2010; Nepolo et al., 2010), with the aim of measuring genetic diversity within the species. While these studies support the general conclusion that there is significant diversity within the species, especially at the sub-population level, they also raise additional questions concerning the origin and extent of the diversity of DNA sequence classes. One such question regards the nature of the variation observed in the intergenic spacer (IGS) of the large ribosomal RNA (rRNA) gene array.

The IGS, also known as the non-transcribed spacer (NTS) region, is located in vascular plants between the 25S and 18S rRNA genes (Fig. 1). These two genes, which flank the 5.8S gene and are separated from it by internal transcribed spacers (ITS), are part of a tandemly arrayed 45S transcription unit found in variable but high copy number (as reviewed by Rogers and Bendich, 1987). Considerable variation in IGS length within individuals of most species has also been observed (for example, see Ellis et al., 1984; Kato et al., 1990; Borisjuk and Hemleben, 1993; Cordesse et al., 1993; Lakshmikumaran and Negi, 1993; Lassner et al., 1987; Saghai-Maroof et al., 1984; Sardana and Flavell, 1996; Yakura et al., 1984; Bhatia et al., 1996), and marama can be counted among these species (Kuemerle et al., 2010; Nepolo et al., 2010). In many species, this variation has been shown to be caused by tandem subrepeats of one or more basic units, ranging from tens to a few hundred bases in size and located in an analogous region of the IGS across different species. The variable number of subrepeats found in the rDNA IGS of animals has been shown to have enhancer activity for the rRNA genes (Doelling et al., 1993; Reeder, 1984 as cited by Sardana and Flavell, 1996), though this has not yet been shown to be the case in plants.

Figure 1: Diagrammatic representation of the large rDNA array (top), with an expanded view of a complete 45S transcription unit, containing the 18S, 5.8S, and 25S rRNA genes separated from each other by internal transcribed spacers (ITS), and itself separated from the next 45S transcription unit by the intergenic spacer (IGS) located between the 25S and 18S genes (bottom).

The purpose of this study is to determine the basis of the length variability observed in the marama rDNA IGS. This length variability ranges from an almost complete deletion to over 4Kb (Fig. 2; note that a combined ca. 820bp of each band shown consists of the 25S and 18S coding regions lying between the primers’ annealing sites and the IGS proper). Based on previous studies of related legumes, the two leading explanations are that the variability is due either primarily to differences in the integral number of only one or two subrepeat unit classes, as observed in the pea (Ellis et al., 1984), or to a complex subrepeat region involving a number of varying subrepeat classes, as seen in lentils, faba beans, and mung beans (Fernández et al., 2000; Kato et al., 1990). The results presented below suggest that the marama large rDNA IGS shares features of both, including a complex subrepeat region within which a single tandem repeat cluster may be responsible for at least some of the observed IGS length heterogeneity (Fig. 2).
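As a worked example of the correction described in the note to Fig. 2, the Python sketch below subtracts the ca. 820bp of flanking 25S and 18S coding sequence (ca. 310bp + 510bp, per the Fig. 2 caption) from an ext0 amplicon size to estimate the length of the IGS proper. Treating the flank as a fixed constant is an assumption, band sizes read from a gel are themselves approximate, and the function name is illustrative only.

```python
# Approximate coding sequence included at each end of an ext0 amplicon
# (ca. 310bp of 25S and ca. 510bp of 18S, per the Fig. 2 caption).
FLANK_25S_BP = 310
FLANK_18S_BP = 510


def igs_length_from_ext0(amplicon_bp: int) -> int:
    """Estimate the length (bp) of IGS proper within an ext0 amplicon."""
    igs_bp = amplicon_bp - (FLANK_25S_BP + FLANK_18S_BP)
    if igs_bp < 0:
        raise ValueError("amplicon is shorter than the flanking coding sequence")
    return igs_bp


if __name__ == "__main__":
    # Bounds of the prominent ext0 band classes (3.5-4.1Kb).
    for size in (3500, 4100):
        print(size, "bp amplicon ->", igs_length_from_ext0(size), "bp IGS")
```

Under these assumptions, the 3.5-4.1Kb ext0 band classes correspond to roughly 2.7-3.3Kb of IGS, a range that includes the 3.2Kb IGS sequenced from sample P4.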


Figure 2: Survey of the marama germplasm represented by 35 unique genomic samples PCR-amplified with a pair of primers (ext0) designed by the BIOL 301/401 class to anneal to sites within the conserved 25S and 18S rRNA coding regions flanking the IGS. The 25S and 18S primers’ annealing sites are located ca. 310bp and 510bp within their respective genes’ coding regions. MII = Bioline Easyladder II marker. This gel image was generated by Dr. Cullis in 2007 for the BIOL 301/401 class.

Methods:

DNA Extraction

The genomic DNA of 12 unique marama samples was isolated from the embryonic axes of marama beans according to the method described in the QIAGEN DNeasy® Plant Handbook, 2004 (Purification of Total DNA from Plant Tissue protocol, pp. 18-20), using the accompanying kit. Embryonic axis tissue was processed either fresh (after the beans had soaked in water) or after being frozen at −20°C. The tissue was disrupted using a micropestle with sterile sand.

Genomic DNA samples were also obtained via extraction from marama leaf tissue (gathered from plants growing at the University’s farm in Pretoria, South Africa), using the same protocol and kit as above. Leaf tissue was processed either fresh or after being stored at −80°C. One gram of leaf tissue was used per extraction, rather than the recommended 100mg; reagent volumes were scaled up proportionally, though only one QIAshredder Mini Spin Column and one DNeasy Mini Spin Column were used per extraction. Tissue was disrupted using a mortar, pestle, and sterile sand. After Step 10, an additional wash of the DNeasy column’s membrane with 500µL of 100% ethanol was required, as per the protocol’s Troubleshooting Guide (p. 25).
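The 10-fold scale-up of reagent volumes described above (1g of tissue in place of the recommended 100mg) amounts to multiplying every protocol volume by the ratio of tissue masses. The Python sketch below illustrates that arithmetic only; the reagent names and base volumes in it are hypothetical placeholders, not values taken from the DNeasy handbook.

```python
RECOMMENDED_TISSUE_MG = 100.0  # tissue mass the protocol volumes are written for


def scale_volumes(base_volumes_ul: dict[str, float], tissue_mg: float) -> dict[str, float]:
    """Scale each reagent volume in proportion to the tissue mass actually used."""
    factor = tissue_mg / RECOMMENDED_TISSUE_MG
    return {name: vol * factor for name, vol in base_volumes_ul.items()}


if __name__ == "__main__":
    # Hypothetical reagent names and base volumes (uL), for illustration only.
    base = {"lysis_buffer": 400.0, "neutralization_buffer": 130.0}
    print(scale_volumes(base, tissue_mg=1000.0))
```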

PCR

Amplification of all samples was accomplished using Takara HS Taq Polymerase and associated reagents. As per the accompanying protocol, PCR reactions totaled 50µL and contained 0.25µL of 5U/µL Taq, 5µL of 10× Fast Buffer I or II (depending upon the expected amplicon size), and 4µL of 2.5mM dNTP mixture. Samples were amplified using MJ Research, Inc. PTC-100 thermocyclers. The basic PCR program followed (with exceptions noted) was:

1. 85°C for 2 min
2. 94°C for 1 min (initialization)
3. 98°C for 5 s (denaturation)
4. 58-62°C (depending on primers) for 10 s (annealing)
5. 72°C for 1 min (elongation)
6. Go to step 3, 15-33× (effective number of cycles determined empirically)
7. 72°C for 5 min (final elongation)
8. 4°C for ∞ (final hold)
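The final reagent concentrations implied by the reaction composition above follow from the dilution relation C_final = C_stock × V_added / V_total. The Python sketch below works this through under two assumptions not stated in the text: that the 2.5mM dNTP mixture contains 2.5mM of each dNTP, and that water, primers, and template make up the remainder of the 50µL. The helper name is illustrative.

```python
TOTAL_UL = 50.0  # total PCR reaction volume (uL)


def final_conc(stock_conc: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Final concentration after diluting a stock into the reaction (C1 * V1 / V2).

    The result is in the same units as the stock concentration.
    """
    return stock_conc * volume_ul / total_ul


if __name__ == "__main__":
    taq_units = 5.0 * 0.25            # 5U/uL stock x 0.25uL = units of Taq per reaction
    buffer_x = final_conc(10.0, 5.0)  # 10x buffer, 5uL
    dntp_mm = final_conc(2.5, 4.0)    # assumes 2.5mM of EACH dNTP in the mixture, 4uL
    print(f"Taq: {taq_units} U per reaction")
    print(f"Buffer: {buffer_x}x")
    print(f"dNTPs: {dntp_mm} mM each")
```

Under these assumptions each 50µL reaction contains 1.25U of Taq, 1× buffer, and 0.2mM of each dNTP.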

PCR sample purification was carried out following the PCR Purification Kit Protocol Using a Microcentrifuge (pp. 19-20) from the QIAGEN QIAquick® Spin Handbook, 2008. Samples were eluted in either 30 or 50µL of H2O.

Gel Electrophoresis and Analysis

Ficoll Orange G (6×) loading buffer was added to all samples before electrophoresis. Agarose gels were made in concentrations ranging from 1-2% with 0.5× TBE buffer. Approximately 1µL of 10µg/mL EtBr was added per 50mL of molten gel.

Gels were imaged with a Kodak DC290 Zoom Digital Camera and a Fotodyne, Inc. FOTO/UV 300 transilluminator. Adobe Photoshop was used to adjust contrast, invert color information, and label gel images. Fragment sizes and sample concentrations were determined by comparing bands to standards (Bioline Easyladder I and II) and by constructing calibration curves using Microsoft Excel.
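Sizing bands against a ladder conventionally relies on the approximately linear relationship between migration distance and the logarithm of fragment length over a gel’s resolving range. The Python sketch below shows such a calibration curve as a least-squares fit of log10(size) against distance (the thesis itself used Microsoft Excel). The ladder sizes are assumed and the migration distances are invented for illustration; the function names are hypothetical.

```python
import math


def fit_log_linear(distances_mm, sizes_bp):
    """Least-squares fit of log10(size) = a + b * distance; returns (a, b)."""
    n = len(distances_mm)
    ys = [math.log10(s) for s in sizes_bp]
    mean_x = sum(distances_mm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_size(distance_mm, a, b):
    """Interpolate a band's size (bp) from its migration distance."""
    return 10 ** (a + b * distance_mm)


if __name__ == "__main__":
    # Hypothetical ladder: band sizes assumed, distances invented for illustration.
    ladder_bp = [2000, 1000, 500, 250, 100]
    ladder_mm = [12.0, 18.0, 24.0, 30.0, 38.0]
    a, b = fit_log_linear(ladder_mm, ladder_bp)
    print(round(estimate_size(21.0, a, b)), "bp (estimated)")
```

Interpolation is only trustworthy between the outermost ladder bands; bands falling outside that range would require extrapolation along a curve that is no longer linear.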

Bands were extracted from gels following the QIAquick® Gel Extraction Kit Protocol Using a Microcentrifuge (pp. 25-26) in QIAGEN’s QIAquick® Spin Handbook, 2008.

Sequencing and Bioinformatic Analysis

All sequencing was performed by Eurofins MWG Operon in Huntsville, AL.

Bioinformatic analysis was carried out using a number of software programs. Primer3 (http://frodo.wi.mit.edu/primer3/) was used to design PCR primer pairs; unless otherwise noted, dot plots were generated with the program at http://www.vivo.colostate.edu/molkit/dnadot/; local alignment comparison of DNA sequences was carried out using the NCBI’s BLAST Align software (linked from http://blast.ncbi.nlm.nih.gov/Blast.cgi); global alignment comparison of two or more sequences was carried out using the ClustalW algorithm (found online at http://align.genome.jp/) and with the BioEdit (v. 7.0.9.0) software; and tandemly arrayed sequences were identified using the Tandem Repeats Finder software (found online at http://tandem.bu.edu/trf/trf.html), with advanced settings set to the most permissive search parameters. Default options were used in all software programs unless otherwise noted.
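Tandem Repeats Finder uses a considerably more elaborate, alignment-based method (Benson, 1999). Purely to illustrate what the “copy number” and “percent matches” statistics describe for a tandem array, the Python sketch below steps along a sequence one period at a time and counts consecutive copies that differ from a given consensus unit by no more than a set number of substitutions. It ignores indels entirely, so it is not a substitute for TRF, and the function and its parameters are hypothetical.

```python
def count_tandem_copies(seq: str, unit: str, start: int, max_mismatches: int = 1):
    """Count consecutive copies of `unit` in `seq` beginning at index `start`.

    Each successive window of len(unit) bases is compared position by position
    (substitutions only, no indels). Scanning stops at the first window with more
    than `max_mismatches` differences or at the end of the sequence.
    Returns (number_of_copies, percent_matching_bases).
    """
    period = len(unit)
    copies = 0
    matches = 0
    pos = start
    while pos + period <= len(seq):
        window = seq[pos:pos + period]
        mismatches = sum(1 for a, b in zip(window, unit) if a != b)
        if mismatches > max_mismatches:
            break
        copies += 1
        matches += period - mismatches
        pos += period
    percent = 100.0 * matches / (copies * period) if copies else 0.0
    return copies, percent


if __name__ == "__main__":
    unit = "ATGGTGCACGGG"  # 12bp consensus unit, used here only as an example
    # Synthetic sequence (not marama data): 3 copies, the middle one with 1 substitution.
    test_seq = "TT" + unit + "ATGGTGCACGTG" + unit + "CCCCCCCCCCCC"
    print(count_tandem_copies(test_seq, unit, start=2))
```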

The same protocols for DNA extraction, gel extraction, and PCR sample purification were followed by the Case Western Reserve University BIOL 301/401 classes of 2007, 2009, and 2010, which contributed work and data to this project.

Results:

The sequence of a full marama IGS was determined via primer walking, starting with the region flanked by the consensus primers designed within the conserved rRNA coding regions at each end of the transcription unit. Individual amplified bands were excised from a gel such as that shown in Figure 2, re-amplified using the same primers, and sequenced from each end, again with the appropriate primers. Based on these sequence data, a second set of primers was designed, and the excised band was used as a template to amplify a shorter fragment. This was again sequenced from both ends, and the process was repeated until the complete IGS sequence was obtained.

Primer Walking Across the IGS

The pair of PCR primers that anneal to sites in the conserved 25S and 18S rRNA gene coding regions flanking the IGS, previously used to amplify the IGS (Nepolo et al., 2010; Fig. 2) and hereafter referred to as primer pair “ext0” (Fig. 3a), served as the starting point for designing new primers. Both primers had previously been used to obtain the sequences of the IGS adjacent to the ends of each coding region (BIOL 301/401 ’09), and these sequences were used to design a new pair of primers, “ext1”, to amplify the intervening spacer (Fig. 3b).

[Figure 3 panels: a) 25S gene, IGS, 18S gene; primer pair “ext0” (25SL/18SR), primary amplicon size 3.5-4.1Kb. b) Primer pair “ext1” (25SLext1/18SRext1), primary amplicon size 2.9-3.5Kb. c) Primer pair “ext2” (25SL2/18SR2), primary amplicon size ≈2.0Kb. d) Primer pair “ext3” (25SL2/18Sext3), primary amplicon size 950bp-1.4Kb*.]

Figure 3: Schematic depiction of the process of primer walking across an idealized IGS of the marama large rDNA repeat unit. Small arrows indicate primer anneal sites; solid line sections of the IGS represent portions for which nucleotide sequences were determined; dashed sections represent portions for which sequence had not yet been obtained. a) Primers were designed for the BIOL 301/401 ’07 class to anneal to sites found in the conserved 25S and 18S rRNA coding regions; IGS sequences determined using these primers were obtained by the BIOL 301/401 ’09 class. b-d) The first through third primer pairs (extensions 1-3, or ext1-3, respectively), each designed to anneal to sites located within sequences of the IGS obtained using the previous primer pair. The size range in which the most prominent amplicons were observed (upon electrophoresis of samples amplified with the listed primers) is given as the “primary amplicon size”. *The primary amplicon sizes resulting from PCR with primer pair ext3 fell into three size classes within this range (see the Sample Surveys and Sequence Analysis section). Primer design was accomplished using the Primer3 software.

The spacers of a number of individual marama plants, previously PCR-amplified with primer pair ext0, were then amplified with primer pair ext1. The most prominent class of resulting bands is between 2.9Kb and 3.5Kb, compared to the 3.5-4.1Kb band classes produced when amplifying with primer pair ext0 (Fig. 3a, 3b, 4).

Figure 4: Comparison of four marama plants’ (P1, 13, P4, and P5) genomic DNA amplified with primer pairs ext0 and ext1 (Fig. 3a, 3b). Included are negative PCR controls for each primer pair. MII = Bioline Easyladder II marker.

The lower of two discrete bands appearing in sample P4 of Fig. 4 (when allowed to electrophoretically separate farther) was isolated (Fig. 5) and chosen for complete sequencing via primer walking (Fig. 3).

Figure 5: The re-amplification of the lower of two discrete bands isolated from sample P4 (when amplified with the ext1 primers). This amplicon was used as the subject of the primer walking project. MII = Bioline Easyladder II marker.

Two additional rounds of sequencing and primer development (Fig. 3c, 3d) allowed a complete sequence of the IGS of this representative sample, P4, to be generated. The most prominent amplicons resulting from PCRs using the second (ext2) and third (ext3) primer pairs were about 2.0Kb and 950bp-1.4Kb, respectively. As when amplifying samples with primer pair ext0 (Fig. 2), high variability in the number and sizes of the resulting amplicons was observed with these primer pairs as well. A gel comparing three of the samples used in Fig. 4 (samples P1, 13, and P5), each amplified with every successive generation of primers, is shown in Fig. 6. Note that as the amplified products become shorter, it becomes possible to resolve the very broad bands into discrete classes.

Figure 6: All four generations of primers used to complete the primer walking project, each used to amplify the same three samples (from left, P1, 13, and P5 for each primer pair), allowing a comparison of the primary amplicon sizes among the primer pairs used. A positive control (sample 8C, amplified with the ext3 primer pair) is included. MI = Bioline Easyladder I marker.

Locating the Subrepeat Region

Analysis of the first round of sequences obtained using primer pair ext1 revealed the highly repetitive nature of the sequence beginning ca. 220bp downstream of the 25S coding region. Therefore, the primer for this direction was designed to anneal as far downstream within this relatively non-repetitive 220bp as possible. Upon assembling the sequence for the entire IGS of sample P4, it was determined that this highly repetitive region proximal to the 25S gene is part of the ca. 1030bp subrepeat region. When the sequence of the IGS is plotted against itself in a dot plot, the subrepeat region is most readily apparent between bases ca. 220 and 1160 (Fig. 7).

Figure 7: Dot plot of the full, 3.2Kb sequence of the IGS of sample P4 (see Appendix) plotted against itself, using a window size = 9 and a mismatch limit = 1. The most prominent portion of the subrepeat region, located proximal to the 25S gene (left/top) and beginning at ca. 220 and ending at ca. 1160, is identified.
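To illustrate how the window size = 9 and mismatch limit = 1 settings shape a plot like Fig. 7, the Python sketch below marks every pair of positions (i, j) at which a 9bp window of one sequence matches a 9bp window of the other with at most one substitution. This naive comparison is written for clarity rather than speed (a full 3.2Kb self-comparison will be slow), it is not the code behind the dnadot program, and the function names are hypothetical. In a self-comparison, a tandem array appears as short diagonals parallel to the main diagonal, offset by the repeat period.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 9, max_mismatches: int = 1):
    """Return (i, j) pairs where the window starting at seq_a[i] matches the
    window starting at seq_b[j] with at most `max_mismatches` substitutions."""
    points = []
    for i in range(len(seq_a) - window + 1):
        wa = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            wb = seq_b[j:j + window]
            mismatches = sum(1 for x, y in zip(wa, wb) if x != y)
            if mismatches <= max_mismatches:
                points.append((i, j))
    return points


def render(points, len_a: int, len_b: int, window: int) -> str:
    """Crude text rendering: rows follow seq_b, columns follow seq_a; '*' = match."""
    hits = set(points)
    rows = []
    for j in range(len_b - window + 1):
        rows.append("".join("*" if (i, j) in hits else "." for i in range(len_a - window + 1)))
    return "\n".join(rows)


if __name__ == "__main__":
    # Synthetic example (not marama data): three copies of a 12bp unit.
    seq = "ATGGTGCACGGG" * 3
    pts = dot_plot(seq, seq)
    print(render(pts, len(seq), len(seq), window=9))
```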

The final pair of primers used to complete the primer walk across the IGS of sample P4, primer pair ext3, anneal to sites closely flanking the subrepeat region (Fig. 3d). This pair of primers is thus suited to surveying length variation across a number of samples in order to determine whether the variation seen at the level of the whole IGS exists within this more narrowly defined region. In addition, the amplicons produced with this primer pair are small enough (950bp-1.4Kb; Fig. 3d) to be sequenced completely. The next section describes the amplicons of varying size produced with, and the analysis of sequences obtained using, the three primer pairs developed for primer walking.

Sample Surveys and Sequence Analysis

Five PCR samples of the marama IGS, chosen because of the range of bands observed when they were amplified with the ext0 primer pair, were re-amplified with the ext1 primer pair. The results of the PCR survey with the ext1 primers recapitulated those of the survey with the ext0 primers (Fig. 2), revealing many classes of bands smaller than the prominently represented 2.9-3.5Kb size class (Fig. 8) expected from the full sequence obtained by primer walking across sample P4 (Fig. 3b, 6).

Figure 8: Gel revealing the many amplicons resulting from amplification of five samples with the ext1 primer pair. Bands of classes D and J (each labeled to the left of the indicated band) from samples 12 and 25, respectively, were among the many smaller bands excised from this gel. MII = Bioline Easyladder II marker.

A number of these bands, serving as representatives of the smaller amplicon size classes, were extracted from the gel in Fig. 8. One of these, band J from sample 25, was re-amplified with the same primers and successfully sequenced (see Appendix for sequence). Bands of class J (ca. 300bp) appear to be generated alongside most of the larger band classes isolated upon re-amplification. Comparison of the sequence of the 300bp fragment 25J with the previously obtained full IGS sequence, using both local and global alignment software, shows that the former mostly matches the IGS sequence immediately adjacent to the 18SRext1 primer site, but with an apparent deletion of about 600 bases if the 11bp perfect match is also significant (Fig. 9).

Figure 9: Graphical result of a BLAST Align comparison (using the ‘BLASTN’ algorithm) of the full 3.2Kb IGS sequence obtained for sample P4 with that of amplicon 25J, representing a class of 300bp fragments produced when amplifying samples with primer pair ext1. ‘Query’ = full IGS sequence, presented 5′-3′ from the conserved 25S coding region (left) to the 18S conserved coding region (right). The small match is 11bp in size.

When testing the final version of the ext2 primer pair, band D, extracted from sample 12 in the gel shown in Fig. 8, was included alongside a number of genomic DNA samples in order to determine whether the ext2 primer sites were present in this 2.0Kb amplicon. Successful amplification with the ext2 primers yielded a new amplicon of 1.0Kb (Fig. 10), which was subsequently sequenced (see Appendix for sequence).

Figure 10: Comparison of the PCR results of sample P4 and amplicon 12D (formerly 2Kb when amplified with primer pair ext1), both amplified with primer pair ext2. MI = Bioline Easyladder I marker.

Comparison of this sequence to the full sequence of the IGS using BLAST Align (and the ‘BLASTN’ algorithm) revealed a large region of high similarity adjacent to the 18SR2 primer annealing site, as well as other regions of similarity matching the end of the subrepeat region proximal to the 25S gene (Fig. 11). These results thus show that the ext2 primers’ annealing sites are present within an ext1 amplicon class that is almost 1Kb shorter than the class most commonly produced by amplification with the ext1 primer pair. This lends support to the idea that IGS length variation is due to the presence or absence of sequences in or adjacent to the subrepeat region rather than elsewhere in the IGS.
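BLAST summarizes an alignment partly by its percent identity and percent gaps, both taken over the full aligned length, gap columns included. The Python sketch below shows how these two figures can be computed from a pair of gapped, equal-length aligned strings; the function is hypothetical and the toy alignment is invented, not taken from the marama sequences.

```python
def identity_and_gaps(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (percent identity, percent gaps) over the full alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    length = len(aligned_a)
    # Identical columns: same base in both sequences (gap-gap columns do not count).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Gap columns: a gap in either sequence.
    gaps = sum(1 for x, y in zip(aligned_a, aligned_b) if x == gap or y == gap)
    return 100.0 * identical / length, 100.0 * gaps / length


if __name__ == "__main__":
    # Invented toy alignment: 13 columns, 11 identical, 2 containing a gap.
    print(identity_and_gaps("ATGGTGCA-CGGG", "ATGGTGCATCG-G"))
```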


Figure 11: (Top) Graphical result of a BLAST Align comparison (using the ‘BLASTN’ algorithm) of the full 3.2Kb IGS sequence obtained for sample P4 with that of the 1Kb amplicon 12D, the product of a previous PCR (see text), amplified with primer pair ext2. ‘Query’ = full IGS sequence, presented 5′-3′ from the conserved 25S coding region (far left) to the 18S conserved coding region (far right). (Bottom) Dot plot from the same BLAST Align execution comparing the full IGS sequence (abscissa) to that obtained for sample 12D (ordinate).

Considering the subrepeat region directly, an analysis of the fully sequenced 3.2Kb IGS for tandemly repeated sequences identified five major clusters of repeats (Fig. 12, Table 1). The subrepeat units shown to comprise clusters C, D, and E in Fig. 12 are those identified as having the greatest period size and percent match with the subrepeat unit’s consensus sequence, and the lowest percent indels relative to that consensus; these are denoted in Table 1 with a Roman numeral ‘I’ next to their respective repeat cluster name.

Figure 12: Diagram of the subrepeat region, showing the clusters of tandemly-arrayed subrepeats identified therein. Small arrows at top of diagram denote the anneal sites of the ext3 primers. Note that the “left primer” anneals to a site partially overlapping the second subrepeat of cluster A.

Table 1: Tandemly-arrayed subrepeat unit consensus sequences organized by the subrepeat cluster (see Fig. 12) which they are proposed to comprise. Tandem repeat loci, subrepeat unit consensus sequences, and accompanying statistics were identified using the Tandem Repeats Finder software (Benson, 1999).

Repeat cluster | Locus in full IGS sequence (bp) | Consensus sequence | Consensus size (bp) | Copy number | % Matches | % Indels
A | 131-214 | GATTCTTGGGATGTGATCCCTAAGTGTGGAGGGAGGGATACT | 42 | 2.0 | 81 | 4
B | 239-317 | CTGCATTCTGCACATCATGCTGCACGAGTGGCC | 33 | 2.3 | 80 | 6
C–I | 439-607 | TGCGCACATCTGCCGCACATATGCGCACATA | 31 | 5.3 | 85 | 5
C–II | 442-592 | GCACATCTGCGCACAGATGCC | 21 | 7.2 | 79 | 10
C–III | 452-606 | CGCACATATG | 10 | 14.7 | 76 | 11
D–I | 694-764 | CATGGTGCACGGGATGGTGCA | 21 | 3.4 | 64 | 13
D–II | 695-769 | ATGGTGCACGGG | 12 | 6.8 | 62 | 22
E–I | 871-1160 | ACGTTGGTGCACATGATGCACACATTGGTGCAA | 33 | 8.8 | 78 | 2
E–II | 883-1020 | ATGGTGCACGGG | 12 | 12.3 | 62 | 16
E–III | 1012-1159 | CACATTGATGCA | 12 | 13.2 | 61 | 16

In addition, alternative subrepeat unit consensus sequences are presented for the same general repeat cluster regions C, D, and E in Table 1. Note that the 10bp consensus sequence proposed as the repeating unit of cluster C (C–III) exactly matches 10bp of the larger 31bp sequence (C–I) proposed to comprise the same cluster. Note also that the 12bp consensus sequence proposed as the repeating unit of cluster D (D–II) exactly matches the sequence proposed as the repeating unit of part of the region designated ‘cluster E’ (base pairs 883-1020). Although these two loci, clusters D and E, are separated from one another by ca. 110bp, a BLASTN alignment of the full IGS with itself revealed that similar regions (base pairs 628-770 and 815-949) have an 80% match, with gaps comprising 7% and an E-value on the order of 10⁻³⁰.
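The two sequence relationships noted above can be checked directly against the consensus sequences in Table 1, as in the short Python sketch below (the variable names are illustrative; the sequences are transcribed from Table 1).

```python
# Consensus sequences transcribed from Table 1.
C_I = "TGCGCACATCTGCCGCACATATGCGCACATA"
C_III = "CGCACATATG"
D_II = "ATGGTGCACGGG"
E_II = "ATGGTGCACGGG"

if __name__ == "__main__":
    print(len(C_I), len(C_III))           # consensus sizes listed in Table 1
    print(C_III in C_I, C_I.find(C_III))  # is C-III contained in C-I, and at what offset?
    print(D_II == E_II)                   # are the D-II and E-II units identical?
```

Running it confirms that the 10bp C–III unit occurs within the 31bp C–I unit and that the D–II and E–II units are identical.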

In order to better understand how the subrepeat regions of individual marama samples differ, primer pair ext3 (the primers used for primer walking across the innermost region of the IGS, Fig. 3d, which flank the subrepeat region with the possible exception of subrepeat cluster A, as shown in Fig. 12) was used in a PCR survey of 55 samples. Each sample typically yielded at least one of three major amplicon size classes: a 950bp-1.05Kb class, an intermediate 1.10-1.18Kb class, and a 1.30-1.55Kb class. Many other amplicons outside (but, for the most part, between) these size class ranges were also observed, though these may have been produced in such low abundance that they did not consistently appear from one gel, or portion of a gel, to another (see Fig. 13). The DNAs used for the amplifications shown in Fig. 13 were isolated by the BIOL 301/401 ’07 class; two other sets of samples had also been prepared, one by the BIOL 301/401 ’09 class and one by me.
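Sorting estimated band sizes into the three major ext3 size classes described above is a simple range lookup, sketched below in Python. The class boundaries come from the text; the class labels, the choice of inclusive bounds, and the example band sizes are assumptions made for illustration.

```python
# ext3 amplicon size classes (bp) as described in the text; labels are hypothetical.
SIZE_CLASSES = {
    "small": (950, 1050),
    "intermediate": (1100, 1180),
    "large": (1300, 1550),
}


def classify_band(size_bp: float) -> str:
    """Assign an estimated band size to one of the three ext3 size classes,
    or 'other' if it falls outside all three ranges (bounds inclusive)."""
    for name, (low, high) in SIZE_CLASSES.items():
        if low <= size_bp <= high:
            return name
    return "other"


if __name__ == "__main__":
    # Hypothetical band size estimates for one sample's lane.
    for size in (980, 1150, 1240, 1420):
        print(size, classify_band(size))
```

Bands labeled “other” correspond to the less abundant amplicons lying between the major classes.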


Figure 13: Survey of genomic samples amplified with the ext3 primer pair. The left and right sections of the image are the upper and lower portions, respectively, of the same gel. The left section may appear darker due to a higher concentration of EtBr remaining in the upper portion of the gel when imaged; this may explain the visibility of the many faint bands on the left as opposed to the right section. The genomic samples amplified were prepared by the BIOL 301/401 ’07 class. C(+) = positive control consisting of a previous amplification of sample P8 with the same primers. MI = Bioline Easyladder I marker.

Nine samples from the survey gel shown in Fig. 13 were selected on the basis of the variety of band classes they presented, and select bands were excised from these in the hope of isolating amplicons of single size classes for subsequent sequencing (Fig. 14).

Figure 14: Samples selected from the survey gel (shown in Fig. 13) to represent the greatest variety of band classes resulting from PCR of genomic samples with the ext3 primer pair. Bands excised from each sample’s lane are labeled with letters (or asterisks, if no other bands were excised from that lane) to the left of each band. Bands were named in order of their chronological excision, with the faintest visible bands in each lane being excised first. MI = Bioline Easyladder I marker.

The 25 excised bands were then re-amplified and run on another gel (Fig. 15). It is evident that the attempted isolation of many bands representing size classes >1Kb is confounded by the co-production (in all but samples 18A and 18B) of a smaller, ca. 950bp amplicon. Again, in an attempt to isolate amplicons representing the major size classes for sequencing, nine bands were excised from the gel shown in Fig. 15.

Figure 15: Re-amplified band excisions denoted in Fig. 14. Samples were re-amplified with the ext3 primer pair. Bands were again selected for excision based on their variety in size and intensity, and they are labeled with U (Upper), M (Middle), or L (Lower) to the left of each band. Ctrl = negative control. MI = Bioline Easyladder I marker.

Upon re-amplification and electrophoresis, samples 11CU and 11CL produced results consistent with those observed for many samples in Fig. 15: a smaller, ca. 950bp band appears in both samples, albeit at a lower concentration when accompanied by another, larger band (see Fig. 16). The remaining samples, with the exception of 17AM and 18AU, presented single bands, representing two amplicon size classes, suitable for subsequent sequencing. Samples 17AM and 18AU were ruled out as candidates for direct sequencing of PCR amplicons: 17AM unexpectedly produced a whole range of bands, and the band excision for 18AU likely contained some material from the 18AL band in Fig. 15.

Figure 16: Re-amplified band excisions denoted in Fig. 15. Samples were re-amplified with the ext3 primer pair. Ctrl = negative control. MI = Bioline Easyladder I marker.

All samples observed (in Figs. 15 and 16) to consistently produce a single band when amplified with the ext3 primer pair (Fig. 17) were subsequently sequenced. Thirteen additional samples, nine of which were contributed by the BIOL 301/401 ’10 class, were also sequenced. Some band isolates deriving from the same genomic sample and originally of different sizes (e.g. samples 11B, 11CL, and 11D) ultimately appeared as single bands of the same size after two rounds of gel extraction and re-amplification (Fig. 17). These were sequenced in order to determine whether the ca. 950bp band appearing in all three shares the same sequence.

Figure 17: Seventeen samples presenting only single bands each, representing amplicon classes successfully isolated over a course of selective band excisions and re-amplifications (with the ext3 primer pair) as shown in Figs. 13-16. Samples shown were subsequently sequenced. MI = Bioline Easyladder I marker.

Due to their relatively small size, these single-band amplicons lent themselves to complete sequencing, allowing a consensus of both reads obtained for each sample to be constructed. While the poor quality of one or both primers’ reads limited the number of consensus sequences developed from the pool of samples, 26 reasonably high-quality sequences were ultimately obtained. Band isolates derived from the same genomic sample whose sequences matched completely, such as bands 11B, 11CL, and 11D, were compared and condensed into a single consensus sequence.
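The merging of two reads into one consensus can be sketched as follows. This is only an illustration, not the software used in this study: it assumes the reverse read has already been reverse-complemented and aligned to the forward read, that `-` marks a column a read does not cover, and that `N` marks an unreadable base. The function names are hypothetical. Where two called bases disagree, it emits a standard IUPAC ambiguity code, of the kind that appears at some positions in the Appendix sequences.

```python
# Standard IUPAC codes for a disagreement between two bases.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
COMPLEMENT = str.maketrans("ACGTN-", "TGCAN-")


def reverse_complement(read: str) -> str:
    """Reverse-complement an unambiguous read so it is on the same strand as its partner."""
    return read.upper().translate(COMPLEMENT)[::-1]


def merge_reads(forward: str, reverse: str) -> str:
    """Column-by-column consensus of two reads that are ALREADY aligned to each other.

    Assumed conventions (hypothetical): equal lengths, '-' = column not covered by
    that read, 'N' = unreadable base. Two disagreeing bases become an IUPAC code.
    """
    if len(forward) != len(reverse):
        raise ValueError("reads must be pre-aligned to the same length")
    merged = []
    for a, b in zip(forward.upper(), reverse.upper()):
        if a == b:
            merged.append(a)
        elif a == "-" or (a == "N" and b != "-"):
            merged.append(b)  # only the reverse read is informative here
        elif b == "-" or b == "N":
            merged.append(a)  # only the forward read is informative here
        else:
            merged.append(IUPAC_PAIRS.get(frozenset(a + b), "N"))
    # Columns covered by neither read are dropped from the consensus.
    return "".join(merged).replace("-", "")


print(merge_reads("ACGT-A", "ACTTGA"))  # ACKTGA
```

In the example, the G/T disagreement in the third column becomes `K`, and the column missing from the forward read is filled from the reverse read.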

The resulting alignment of all 21 of the sequences obtained may be divided into two major sections: one with much higher agreement across all samples (Fig. 18), and one with high variability (Fig. 20) and much less basis for comparison. The section with high agreement across all samples comprises the ca. 750bp of the subrepeat region proximal to the 18Sext3 primer’s annealing site. In order to produce the most meaningful consensus sequence and similarity tree (Fig. 19) for all 21 samples based on this region of high agreement, however, the 40-42bp immediately adjacent to the 18Sext3 primer, which for most samples were of poor quality,

were excluded. In addition, it should be noted that sequences 18AL and 17AU, which are

listed along the bottom of the alignment in Fig. 18, had poor reads from the 18Sext3

primer, and thus appear truncated in the alignment and are grouped on their own branch

on the similarity tree (Fig. 19). Taking these exceptions into account, though, the discrepancies between individual sequences and the consensus sequence for this region included only three apparent deletions of 9-21bp and eight SNPs. All but two of the SNPs

appear only once in the alignment, and thus they may be artifacts of the sequencing

process despite agreement between the two reads for each sample. Analysis of the

deletions in this region suggests that the deletion in sample IX (bp 332-343 in Fig. 18) closely matches the first 12bp subrepeat unit of cluster E listed in Table 1. The 9bp sequence deleted in sample I (bp 161-169) appears as a tandem duplicate in all other samples. Finally, the deletion shared by samples IX and 11 (bp 208-228) does not appear to reside in any of the subrepeat clusters identified in Table 1.
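The deletion sizes follow from the alignment coordinates if both end positions are counted (so bp 161-169 spans 9bp), and the tandem-duplicate observation amounts to asking whether the deleted segment is repeated immediately beside itself in an undeleted sequence. The sketch below illustrates both steps; the function names are hypothetical, and the toy string is not data from this study.

```python
def inclusive_length(first_bp: int, last_bp: int) -> int:
    """Length of an alignment interval when both end coordinates are included (e.g. bp 161-169)."""
    return last_bp - first_bp + 1


def is_tandem_duplicate(seq: str, start: int, length: int) -> bool:
    """Is seq[start:start+length] repeated immediately before or after itself?

    0-based, half-open coordinates. Applied to an undeleted sequence, True means the
    segment lost in a deleted variant is one copy of a tandem duplicate.
    """
    segment = seq[start:start + length]
    if length == 0 or len(segment) != length:
        return False
    before = seq[max(0, start - length):start]
    after = seq[start + length:start + 2 * length]
    return segment in (before, after)


# Alignment coordinates of the three deletions described above (Fig. 18).
DELETIONS = {"I": (161, 169), "IX": (332, 343), "IX and 11": (208, 228)}
for sample, (first, last) in DELETIONS.items():
    print(sample, inclusive_length(first, last), "bp")
```

With inclusive coordinates the three deletions measure 9, 12, and 21bp. In the toy string `AAGCTTGCTTCC`, the segment `GCTT` starting at index 2 is followed directly by a second copy, so `is_tandem_duplicate("AAGCTTGCTTCC", 2, 4)` returns `True`.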

Figure 18: Alignment of the more conserved 732bp section of the subrepeat region, with sequences having been amplified and sequenced using the ext3 primer pair. Individual samples’ sequences are compared to a consensus sequence (along top); agreement with the consensus is denoted with a ‘.’; deviations from the consensus are denoted with the appropriate nucleotide letter or a ‘-’ to indicate a gap. Note: vertical order of sequences is different in Fig. 20.

Figure 19: Similarity tree comparing 21 samples based on the alignment of a 732bp section of the samples’ subrepeat sequences (see Fig. 18). Samples 18AL and 17AU (bottom) cluster together primarily because their sequence reads from the 18Sext3 primer were of poor quality. The image was generated using ClustalW.
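The observation that most SNPs appear only once can be checked mechanically from an alignment written in the Fig. 18 convention, where `.` means agreement with the consensus and a letter or `-` means a deviation. The sketch below counts how many samples carry each deviation and lists base substitutions seen in only one sample. The function names and the toy alignment are hypothetical, and treating spaces as uncovered columns (as for truncated reads) is an assumption.

```python
from collections import Counter

BASES = set("ACGT")


def deviation_counts(rows: dict[str, str]) -> Counter:
    """Count how many samples carry each (column, character) deviation.

    '.' agrees with the consensus; a base letter or '-' deviates from it.
    Spaces are treated as columns the read does not cover.
    """
    counts: Counter = Counter()
    for row in rows.values():
        for column, char in enumerate(row):
            if char not in ". ":
                counts[(column, char)] += 1
    return counts


def singleton_snps(rows: dict[str, str]) -> list[tuple[int, str]]:
    """Base substitutions seen in exactly one sample (candidate sequencing artifacts)."""
    return sorted(
        key for key, n in deviation_counts(rows).items() if n == 1 and key[1] in BASES
    )


# Hypothetical toy alignment, not data from this study.
toy = {"A": "....G...", "B": "....G.T.", "C": "..--...."}
print(singleton_snps(toy))  # [(6, 'T')]
```

Gap characters are excluded from the SNP list because a single deletion spans several columns and would otherwise be counted as several singleton events.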

The section of the subrepeat region showing high variability across all samples (in sequence length, from 42 to 518bp, as well as in sequence agreement) was considerably more difficult to parse. In addition, the quality of the sequence reads from the 25SL2 primer in this section was consistently poorer than that of the reads from the opposite (18Sext3) end. Because many of the samples’ sequences clustered in similarity from the point designated ‘bp 0’ (in both Figs. 18 and 20) toward the 25S gene (Fig. 20), five samples were identified as representative of the variety of subrepeat clusters found in this portion of the subrepeat region. One of these is sample P4, the sample used to generate both the full IGS sequence and the diagram of the subrepeat region shown in Fig. 12. Subrepeat cluster B appears to be unique to sample P4. Another representative sample is IX, which, like many of the samples, appears to be roughly a truncated version of the amplicon obtained for sample P4. Although this sample differs from P4 (Fig. 20), it shares with P4 an almost complete analog of subrepeat cluster C.

The other three representative samples and their unique subrepeats (unique primarily in terms of their copy numbers) are summarized in Table 2. Samples P9B, 18AL, and 17AU exclusively share a 45bp cluster in close proximity to the 25SL2 primer annealing site. They also share large clusters of subrepeats with repeating units analogous to those listed in Table 1 for cluster C, with units classified as 1×, 2×, 3×, 4×, and 6× multiples of an analogous ca. 10bp repeat unit (Table 2). Measured by the number of tandem repeats of this smallest 10bp unit (P9B–V), the longest of these subrepeat clusters is found in sample P9B and comprises 41.0 repeats (i.e. ca. 410bp). However, the longest subrepeat cluster identified overall (found in the same sample) consists of 13.6 repeats of a 31bp unit (P9B–III), giving a cluster size of ca. 422bp. The largest subrepeat unit identified is 63bp (18AL–II), though it forms a cluster only about 302bp in length. It should be noted that the subrepeats listed in Table 2 were identified from the full-length sequence reads of each sample, rather than from the separate sections shown in Figs. 18 and 20.
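The cluster sizes quoted above follow from multiplying each consensus unit size by its copy number. The short sketch below reproduces that arithmetic from the Table 2 values; the dictionary and function names are illustrative only.

```python
# (consensus unit size in bp, copy number) as reported by Tandem Repeats Finder in Table 2.
CLUSTERS = {
    "P9B/18AL/17AU-I": (18, 2.5),
    "P9B-III": (31, 13.6),
    "P9B-V": (10, 41.0),
    "18AL-II": (63, 4.8),
}


def cluster_length(unit_bp: int, copies: float) -> float:
    """Approximate length of a tandem array: unit size times (possibly fractional) copy number."""
    return unit_bp * copies


for name, (unit_bp, copies) in CLUSTERS.items():
    print(f"{name}: {unit_bp} bp x {copies} = ca. {cluster_length(unit_bp, copies):.0f} bp")
```

This gives ca. 45, 422, 410, and 302bp respectively. Because copy numbers are reported to one decimal place, these lengths are approximate.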


Figure 20: Alignment of the section of highly variable (in length, read quality, and mutual agreement) sequence data across the 21 samples whose subrepeat regions were sequenced using the ext3 primer pair. Point ‘0’ on ruler matches the same point in Fig. 18. Note: vertical order of sequences is different in Fig. 18.

Table 2: Tandemly-arrayed subrepeat unit consensus sequences organized by the samples in which they were identified. Consensus sequences and accompanying statistics were identified using the Tandem Repeats Finder software (Benson, 1999). Analogous subrepeat units were identified manually; for subrepeat units C–I, C–II, and C–III, see Table 1.

Representative sample(s) + subrepeat unit | Analogous subrepeat unit(s) | Consensus sequence | Consensus size (bp) | Copy number | % Matches | % Indels
P9B / 18AL / 17AU–I | N/A (see bp –582 to –473 in Fig. 20) | GGAAAGGGACCAATAGACY (Y = C/T) | 18 | 2.5 | 81-85 | 0
P9B–II | 2× of C–II, 18AL–IV, 17AU–III; 4× of P9B–V and C–III | TGCCGCACATATGCGCACATCTGCGCACATCTGCCGCACATC | 42 | 10.0 | 72 | 11
P9B–III | C–I; 18AL–III; 17AU–II; 3× of P9B–V and C–III | TGCGCACATATGCGCACATGTGCCGCACATC | 31 | 13.6 | 70 | 11
P9B–IV | C–II; 18AL–IV; 17AU–III; 2× of P9B–V and C–III | CACATCTGCCGCACATCTGCC | 21 | 19.7 | 65 | 13
P9B–V | C–III | TGCGCACATC | 10 | 41.0 | 68 | 13
18AL–II | 2× of C–I, 18AL–III, 17AU–II, and P9B–III; 3× of C–II, 18AL–IV, 17AU–III, and P9B–IV; 6× of P9B–V and C–III | TGCGCACATATGCGCACATCTGCCGCACATCTGCCGCACATATGCGCACATCTGCCGCACATA | 63 | 4.8 | 77 | 7
18AL–III | C–I; 17AU–II; P9B–III; 3× of P9B–V and C–III | TGCGCACATATGCGCACATCTGCCGCACATC | 31 | 9.6 | 77 | 7
18AL–IV | C–II; 17AU–III; 2× of P9B–V and C–III | TGCGCACATCTGCGCACATC | 20 | 14.5 | 75 | 11
17AU–II | C–I; 18AL–III; P9B–III; 3× of P9B–V and C–III | TGCGCACATCTGCCGCACATCTGCGCACATC | 31 | 12.3 | 72 | 10
17AU–III | C–II; 18AL–IV; 2× of P9B–V and C–III | TGCCGCACATCTGCGCACATC | 21 | 18.5 | 72 | 9
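The “k× of P9B–V” classifications in Table 2 can be cross-checked by dividing each unit size by the 10bp base unit, and the transcribed consensus sequences can be checked against the reported sizes. The sketch below does both; the sequences are copied from Table 2 in the table’s original 8-base chunks, the names are illustrative, and the 18bp unit is omitted because it has no 10bp-based analog.

```python
# Consensus units transcribed from Table 2, written in the table's 8-base chunks.
TABLE2_UNITS = {
    "P9B-II": (42, "TGCCGCAC" "ATATGCGC" "ACATCTGC" "GCACATCT" "GCCGCACA" "TC"),
    "P9B-III": (31, "TGCGCACA" "TATGCGCA" "CATGTGCC" "GCACATC"),
    "P9B-IV": (21, "CACATCTG" "CCGCACAT" "CTGCC"),
    "P9B-V": (10, "TGCGCACA" "TC"),
    "18AL-II": (63, "TGCGCACA" "TATGCGCA" "CATCTGCC" "GCACATCT"
                    "GCCGCACA" "TATGCGCA" "CATCTGCC" "GCACATA"),
    "18AL-III": (31, "TGCGCACA" "TATGCGCA" "CATCTGCC" "GCACATC"),
    "18AL-IV": (20, "TGCGCACA" "TCTGCGCA" "CATC"),
    "17AU-II": (31, "TGCGCACA" "TCTGCCGC" "ACATCTGC" "GCACATC"),
    "17AU-III": (21, "TGCCGCAC" "ATCTGCGC" "ACATC"),
}
BASE_UNIT_BP = 10  # P9B-V, analogous to C-III


def nearest_multiple(unit_bp: int, base_bp: int = BASE_UNIT_BP) -> int:
    """Whole number of base units closest to a larger unit's size."""
    return round(unit_bp / base_bp)


for name, (size, seq) in TABLE2_UNITS.items():
    note = "" if len(seq) == size else "  (length differs from reported size!)"
    print(f"{name}: {size} bp, ~{nearest_multiple(size)}x {BASE_UNIT_BP} bp{note}")
```

The nearest multiples (4, 3, 2, 1, 6, 3, 2, 3, 2) agree with the 2×, 3×, 4×, and 6× relationships listed in the table. Size alone cannot establish homology, however; that rests on the sequence comparisons summarized in Tables 1 and 2.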

Discussion:

The variety of band size classes observed upon amplification of the full marama rDNA IGS (Fig. 2), and the continued observation of considerable variation after amplifying successively shorter sections of the IGS when primer-walking toward the subrepeat region (Figs. 6, 8, 13, 14), suggest that at least some of the IGS length heterogeneity can be accounted for by variation in the subrepeat region. The sequence data for the subrepeat region obtained from 21 unique samples suggest that this variation is largely due to differences in the number of tandem repeats of a basic 10bp unit or, as seems more probable based on the analysis (see Tables 1 and 2), repeats of larger units which may in turn be subdivided into tandemly repeated 10bp units. This of course implies a homologous relationship between the smaller constituent subrepeat units and the larger units appearing as integral multiples thereof, with the larger tandem repeat units (which in almost every case show a higher percentage of matches and a lower percentage of indels) being the product of more recent amplification events. There is ample evidence

that the tandem repeats found in many other IGS subrepeat regions share these

characteristics (for example, see Cordesse et al., 1993; Da Rocha and Bertrand, 1995;

Fernández et al., 2000; and the review by Rogers and Bendich, 1987), though the

mechanism responsible for the production of tandem arrays is less clear, with unequal

crossover and replication slippage being prominent suggestions (Lassner et al., 1987; Da

Rocha and Bertrand, 1995).
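Unequal crossover, one of the two mechanisms mentioned above, can be illustrated with a toy model. The sketch assumes two perfect tandem arrays of the same unit and a reciprocal exchange that breaks exactly at unit boundaries. When the two partners are out of register, one product gains copies and the other loses the same number. This is a simplification for illustration only: real arrays contain diverged copies and flanking sequence, and replication slippage is not modeled here. The function name is hypothetical.

```python
def unequal_crossover(array_a: str, array_b: str, unit_bp: int, i: int, j: int) -> tuple[str, str]:
    """Reciprocal exchange between two tandem arrays, breaking A after unit i and B after unit j.

    Simplifying assumptions: perfect repeats of one unit, breaks exactly at unit
    boundaries. When i != j, copy number shifts from one product to the other.
    """
    cut_a, cut_b = i * unit_bp, j * unit_bp
    product_1 = array_a[:cut_a] + array_b[cut_b:]
    product_2 = array_b[:cut_b] + array_a[cut_a:]
    return product_1, product_2


unit = "TGCGCACATC"  # the 10bp P9B-V consensus, used here only as an example unit
parent = unit * 5
p1, p2 = unequal_crossover(parent, parent, len(unit), i=3, j=1)
print(len(p1) // len(unit), len(p2) // len(unit))  # 7 3
```

Two 5-copy parents yield 7-copy and 3-copy products. The total copy number is conserved, while the spread of array lengths in the population grows.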

Furthermore, a consideration of the structure of the rDNA IGS and of the subrepeat

regions of other species reveals some noteworthy similarities and differences with those

of marama. While the IGS length variation in other legumes, such as pea (Pisum sativum), fava bean (Vicia faba), and mung bean (Vigna radiata), were demonstrated to be based on variations in tandem arrays (Kato et al., 1990; Piller, et al., 1990; Yakura et al., 1984; Ellis et al., 1984), the variation in pea is based almost entirely on only a single, long tandem array of a ca.180bp repeat unit. The structure and size of the subrepeat region of the pea IGS appears quite distinct from marama, as shown in a dot plot comparison of a 2.5Kb section of the IGS adjacent to the 25S gene and containing the subrepeat regions of marama, pea, and the similarly structured white mustard (Sinapis

30 alba) (Fig. 21). Not only is the greater complexity of the marama subrepeat region

apparent, but so is the fact that it appears to be, at most, half the size of the pea and S.

alba subrepeat regions, with the balance of the marama IGS consisting of mostly non-

repetitive sequence.

Figure 21: Comparison of the IGS subrepeat regions of three species, each made by plotting 2.5Kb of each species’ IGS sequence (the full sequences of which are similarly sized, at 3-3.5Kb) against itself. Left: marama (T. esculentum); Middle: pea (P. sativum); Right: white mustard (S. alba). (Kato et al., 1990; Rathgeber and Capesius, 1990).
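The way tandem repeats appear in self dot plots such as Fig. 21 can be illustrated with the simplest form of the method: mark every pair of positions whose following words of fixed length are identical. A tandem array of period p then fills the diagonal offset by p from the main diagonal, with weaker diagonals at 2p, 3p, and so on. The sketch below uses exact word matches of length 10. The word size, function names, and toy sequence are assumptions, and this is not necessarily how the figures here were generated.

```python
from collections import Counter


def self_dot_plot(seq: str, word: int = 10) -> list[tuple[int, int]]:
    """Points (i, j), i < j, where the length-`word` words starting at i and j are identical.

    This is the upper triangle of an exact word-match dot plot of seq against itself.
    """
    positions: dict[str, list[int]] = {}
    for start in range(len(seq) - word + 1):
        positions.setdefault(seq[start:start + word], []).append(start)
    points = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                points.append((starts[a], starts[b]))
    return sorted(points)


def diagonal_offsets(points: list[tuple[int, int]], top: int = 3) -> list[tuple[int, int]]:
    """Most populated off-diagonals (j - i); a tandem repeat fills the diagonal at its period."""
    return Counter(j - i for i, j in points).most_common(top)


# Toy sequence: four tandem copies of a 10bp unit between two short non-repetitive flanks.
toy = "GATTACA" + "TGCGCACATC" * 4 + "CCGGAA"
print(diagonal_offsets(self_dot_plot(toy)))  # [(10, 21), (20, 11), (30, 1)]
```

The strongest off-diagonal lies at offset 10, which recovers the period of the toy repeat. A region with several distinct subrepeat clusters, as in marama, would instead show several short blocks of diagonals with different spacings.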

It should be noted, however, that other species, like marama, do have complex IGS subrepeat region structures. The IGS length variation in fava bean, while largely due to repetition of a 325bp repeat unit (itself found to consist of a duplet), also depends on four other clusters of repeats (Yakura et al., 1984). The IGS of another legume, the lentil (Lens culinaris), seems to be similar in the number and organization of its subrepeat clusters, with two of its four clusters found to be composed of short 21bp and 11bp repeat units (Fernández et al., 2000). Other species shown to have significant tandem subrepeat clusters composed of short repeat units include arugula (Eruca sativa; Lakshmikumaran and Negi, 1994), black mustard (Brassica nigra; Bhatia et al., 1996), and wheat (Lassner et al., 1987).

The above notwithstanding, the greatest amount of variation in the marama large rDNA IGS that can be accounted for by the observed results alone is about 550bp. To account for the remaining ca. 3.5Kb of observed IGS length variability (Fig. 2), it may be supposed that significant deletions in or adjacent to the subrepeat region, as observed in samples 25J and 12D (Figs. 9 and 11), or the existence of exceptionally expanded subrepeat clusters (Fig. 12) are responsible. The former possibility could be addressed by a PCR survey of a larger number of samples presenting significantly shorter IGS length variants, using the ‘outermost’ primer pairs (or combinations thereof) developed for primer walking in this study. A single additional round of primer walking, followed by comparison of the completed sequences with the full 3.2Kb IGS sequence, would likely reveal any prominent deletions. The latter possible explanation for the remaining IGS length heterogeneity, that of expanded subrepeat clusters, is addressed below.
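For the simplest case of that comparison, a shorter variant that differs from the 3.2Kb reference only by one contiguous deletion, the deleted block can be located directly. The sketch below assumes exactly one deletion and no other differences. Within repeats, the placement of a deletion is inherently ambiguous, and this function reports the right-most equivalent position. Real reads with substitutions or several indels would need a proper gapped alignment instead. The function name is hypothetical.

```python
def locate_single_deletion(reference: str, variant: str) -> tuple[int, int]:
    """Find one contiguous deletion in `variant` relative to `reference`.

    Assumes the variant is the reference with exactly one block removed and no other
    differences. Returns the 0-based, half-open span of the removed block; inside
    repeats the right-most equivalent placement is reported.
    """
    deleted = len(reference) - len(variant)
    if deleted <= 0:
        raise ValueError("variant is not shorter than reference")
    prefix = 0
    while prefix < len(variant) and reference[prefix] == variant[prefix]:
        prefix += 1
    start, end = prefix, prefix + deleted
    if reference[:start] + reference[end:] != variant:
        raise ValueError("variant differs from reference by more than one deletion")
    return start, end


print(locate_single_deletion("AAAAGGGCCCTTTT", "AAAATTTT"))  # (4, 10)
```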

It is possible that the many ‘lesser’ bands observed with each successive primer pair were spurious, caused perhaps by partial priming or mis-priming during amplification. This was addressed as far as possible by increasing the PCR annealing temperature from 58° to 62°C (in 2° increments), and by comparing all primer sequences against the full IGS sequence using the relatively permissive BLASTN local alignment algorithm, in an attempt to identify partial matches with primers. When certain samples were amplified with the ext3 primer pair, the effects of mis-priming were evident over multiple gel extraction/amplification cycles, with the recurrent production of ca. 950bp amplicons from ca. 1.3 or 1.5Kb template fragments (Figs. 14-17). This is likely due to a doubly- or triply-repeated region in the template fragment that contains enough of the 25SL2 primer’s annealing site to allow production of the shortest of these amplicons. The shortest amplicon would thus be expected to competitively exclude production of the longer amplicons, which agrees with the observed results.
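A search for partial primer matches can be sketched as an exhaustive, ungapped scan of both strands, reporting every window within a few mismatches of the primer. Mismatches in the primer’s last few 3′ bases are counted separately, since they are generally the most disruptive to extension. This is a simpler stand-in for the BLASTN search described above, not a reproduction of it. The threshold of 3 mismatches and the 5-base tail are arbitrary assumptions, and the function names are hypothetical. The test fragment consists of two consecutive lines of the P4 IGS sequence from the Appendix.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def primer_hits(template: str, primer: str, max_mismatches: int = 3, tail: int = 5):
    """Ungapped scan for sites within `max_mismatches` of a primer, on both strands.

    Returns (offset, strand, total_mismatches, mismatches_in_primer_3prime_tail).
    """
    n = len(primer)
    probes = (("+", primer), ("-", reverse_complement(primer)))
    hits = []
    for offset in range(len(template) - n + 1):
        window = template[offset:offset + n]
        for strand, probe in probes:
            total = count_mismatches(window, probe)
            if total > max_mismatches:
                continue
            # On '+' the primer's 3' end is at the window's right; on '-' it is at the left.
            if strand == "+":
                tail_mm = count_mismatches(window[-tail:], probe[-tail:])
            else:
                tail_mm = count_mismatches(window[:tail], probe[:tail])
            hits.append((offset, strand, total, tail_mm))
    return hits


# Two consecutive lines of the P4 IGS sequence (Appendix) that include the 25SL2 site.
P4_FRAGMENT = (
    "CAGGGGGTTCATGTGAAACGATTGTTGTGATGTGGTGCCTAAGTGTGGAGGGAGGTATA"
    "CTGATTCTTGGGATTTATCCTCTAAGTGTGGAGGGAGGGATACTTGGACCAGACACAGG"
)
PRIMER_25SL2 = "TGTGGAGGGAGGGATACTTG"

for hit in primer_hits(P4_FRAGMENT, PRIMER_25SL2):
    print(hit)
```

Among the hits reported for this fragment are the exact 25SL2 site at offset 85 and a near-match at offset 43 with three mismatches, two of them in the primer’s last five bases. Whether such a near-match could prime under the PCR conditions used is a separate, experimental question.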

To confirm this, PCR of genomic samples could be performed using the 25SLext1 primer (from the ext1 primer pair), or a novel 25S primer, paired with the 18Sext3 primer, and new sequences could be obtained. Alternatively, a restriction digest of the IGS of samples found to produce the larger (1.3-1.5Kb) ext3 fragments could be performed, allowing the resulting fragment sizes (and their sum) to be compared with those expected from the reference IGS sequence. However, if a restriction site were unexpectedly to reside within a large tandem repeat, this could yield confounding results. Another way to control for these outcomes is to isolate and subclone the larger ext3 amplicon classes. Adding primer sequences, which would need to be unique, to the ends of the linear ext3 amplicons would be expected to allow successful sequencing, if not PCR amplification (which may still be hindered by the samples’ self-priming), of these problematic samples.
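The expected fragment sizes for such a digest can be computed in silico from a reference sequence and compared with the gel. The sketch below cuts a linear sequence at every occurrence of a recognition site and returns the fragment lengths, whose sum should equal the undigested length. EcoRI (G^AATTC, cut after the first base) is used purely as an example; its site does occur at least once in the P4 sequence in the Appendix, but no enzyme is specified in the text. Searching only one strand is adequate for palindromic sites like this one. The function name is hypothetical.

```python
import re


def digest_linear(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from cutting a linear molecule at every occurrence of `site`.

    `cut_offset` is where the enzyme cuts within the site on the given strand
    (1 for EcoRI, G^AATTC). A lookahead pattern lets overlapping sites be found.
    """
    pattern = f"(?={re.escape(site)})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


toy = "AAAGAATTCTTTTGAATTCAA"  # hypothetical sequence with two EcoRI sites
fragments = digest_linear(toy, "GAATTC", 1)
print(fragments, sum(fragments) == len(toy))  # [4, 10, 7] True
```

A site lying inside a tandem array would appear here as a run of fragments equal to the repeat-unit size, which is the confounding case noted above.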

This study of the length variation previously observed (Kuemerle et al., 2010; Nepolo et al., 2010) in the marama rDNA IGS involved the production of a complete sequence of a 3.2Kb IGS and of additional amplicons, including the subrepeat regions of 21 individual plants’ samples. Analysis of these sequences identified distinct clusters of tandemly arrayed subrepeats, at least one of which appears capable of accounting for some IGS length heterogeneity. An understanding of the molecular features responsible for the length variation across the full or partial sections of the marama IGS may allow for future investigations into the heritability of these features (as in Ellis et al., 1984, and Saghai-Maroof et al., 1984), and further analysis of the marama IGS sequence generated here may help identify important features related to the regulation of the rRNA genes (Piller et al., 1990; Cordesse et al., 1993; Kato et al., 1990; Lassner et al., 1997).

Appendix

Primer sequences (listed by pair name):

Ext0:

28SL: ACGAGAGGAACCGTTGATTC

18SR: CTCAATGAGCCCGGTATTGT

Ext1:

25SLext1: TGCCACGATCCACTGAGATT

18SRext1: CATCGGGTGGAAAGAATCAT


Ext2:

25SL2: TGTGGAGGGAGGGATACTTG

18SR2: CAACACTGCCTCGCAAAATA

Ext3:

[Same 25S primer as used in Ext2 pair]

18Sext3: ATGATCCCTTGATGGTGATCC
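As a rough cross-check against the 58-62°C annealing temperatures used in this study, each primer’s melting temperature can be estimated with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. This is a crude approximation for short oligonucleotides and not necessarily the method used to design these primers. The primer sequences are copied from the list above; the function names are illustrative.

```python
# Primers from the list above.
PRIMERS = {
    "28SL": "ACGAGAGGAACCGTTGATTC",
    "18SR": "CTCAATGAGCCCGGTATTGT",
    "25SLext1": "TGCCACGATCCACTGAGATT",
    "18SRext1": "CATCGGGTGGAAAGAATCAT",
    "25SL2": "TGTGGAGGGAGGGATACTTG",
    "18SR2": "CAACACTGCCTCGCAAAATA",
    "18Sext3": "ATGATCCCTTGATGGTGATCC",
}


def gc_percent(primer: str) -> float:
    return 100 * sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C; a rough estimate for short oligos."""
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, {gc_percent(seq):.0f}% GC, Wallace Tm ~ {wallace_tm(seq)} °C")
```

For the ext3 pair (25SL2 and 18Sext3), this rule gives 62°C for both primers. Nearest-neighbor methods would give more reliable estimates.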

Complete IGS sequence of sample P4, 5′-3′ from the 25S to the 18S gene:

[25S_gene]CCCCCTCCTCCCAACATCCTCCCTCCGGAATGGGGGAGGTGCCCGGAGGCTA

GGGCCGGTGTTGTGAGACCATCATTTATCCTCCAAGTGTATGGGAATGAAGAGTGGAAC

CAGGGGGTTCATGTGAAACGATTGTTGTGATGTGGTGCCTAAGTGTGGAGGGAGGTATA

CTGATTCTTGGGATTTATCCTCTAAGTGTGGAGGGAGGGATACTTGGACCAGACACAGG

CACATGGTGCTGCATATGATGCACATCGTGCTGCACGAGTGGCCCTGCATTCTGCCCAT

GCATTCTGCACGAGTGGCCCTGTATACTGTAATATTATGTTATGTTATGTAACACTATG

GGGCTGTCTTGGTTGATTTAGCCATTTATGGTTCATGTCCTGTAACTTGAAGAGGATGG

GTTACTTTTTTCTTGACTATTCTTCAATTGGTTGAGCCCAGCTGCCGCACATATGCGCA

CATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCTGCCGCACATATGC

GCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCACAT

TGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGATTAT

TGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGGTGC

ACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCACATG

TTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGGAGG

GGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTTGCT

GAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGCACG

GGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGGGATG

GTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGATGCA

CACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTGCACG

TGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGCACAT

GGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTGCCAA

TCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACATGCAT

TCCCTATTTTTCCAGAATTTTGCTATTTTTTGGTTTTNTGGGATCACCATCAAGGGATC

ATATCTCACTACTNTGAGATGCAATTGCCAAGGTTAAGGTTCCATAATTCAATCCATGT

CCCAGGGATGCTCCACACCAAATTTCACCACCTGGGGCCCACCCGTTAATTTTTGCGAT

TTTTTTAAAGTTTTGCACTTTGAAAAATCATAAAAAATAATTGGTCCATAAAAAAATTA

TAAAATTTGGTCTCCACCTACATTGCATTTTTATGAACCTCCATACAAAATTTCAGCCC

AAAATTCCATCTCTAGCTCCAAAAATGACCCTTATGTTGGCTCAGAAAAATGATGTTTG

GTCCTGCCAGATGGGAATCTGTAAATTAAGCTCTTTAAGGGGGAGCTACTCGTCTGCCT

TGGGTAGGCAAGAGCGTGTGCCATGGGGCTCCCCCCCACGTCCAAGCATGCCGCAAGCA

ACGTGCTAGGTGTGTTTGGGCGTCATACCATGGCACACCATCGGGGGGTTGGCAAGCGC

GGTGGTGGGTGGCACCACCACCGCTGGCGGCCAACTTTCCGGCGAGGCGGGGGAATGTT

CGTATGGCACAAAAATGATTTTCTTGTCGCGCTTGATTGGGTTTTATCGATTAAAATGT

GTCATTTGAGCCCCCCTCAGCCTCGGAATAGCCAACGGATGCACGGACTTGACCCTCCC

CCCACCAGATTGTGCCACTCCCCCTACATTTTGTGACTTCCCAAGCACCATCGTGTGCC

AGTGCTGCCTCGTTCTTGTGGGTTGTGGCATTGATCCTTGCCTATATGGGCACGTTGAT

GTGAAGCATGGGCGCGTACTATGTGCTTGTGTGGATCATCAATGGGCTTGTAGAAGCAT

GGTGATTTGCTTGTCGGGTATTTTGCGAGGCAGTGTTGGCATGCATGGGTTTGGAGATG

CACCCATTGTTGCCTTTCTTTCTTCAATATGAGCACGTTTGGGGGTGCTTTGTTGGGTT

GCCATGCTTGGCATGGTGGGCTTGCTGTTACAATTTGCAAAGGTGAAGGGGTCGGTGCC

TAGCAAACAAAAGGTGTGGGATGACGGGTGTTGGCCTATTGTACGGCGTGCTCCATCTC

ATTAAATCCTTAGTAGCTAGCCCGGTGCTGGAGCTGGAGCTTCCTCGATGGCATTGCTT

ACTATGTTGGGCAGAAGCATTCCCAAAACCTAGCATTTCCATGTTGGTCGTTTGGATGA

CGTGCTCTAACCATTTTAGCCCCCGATTTCCCCACAGTCGATGCCTTCAAGAAGCCAAT

GACGTTTTTTGTTCCCCCCAACCAAAAGCAAAGCACAAGCCAACGTTTTTCATATGCCC

ACAATCCAAGACGTCCCTTGTCAAGTCCGGTTCGGTTCTGTTTTCATGTCGTTGCGACC

TGGTTATGGCATCACTCGGTGTTTCCATTTGCTCATGTGCCATTACGCACATGGGGGGT

GGTTTGCTTGGTGGGTGCTTGACTAGTCCCCGTGATGGTGTGTGAGAGGTGAATGGATT

GCCCGTGTTGGCTGGCTCTGTGCTTGTGCATAGAACTGTCTGCACCCGATCTTTGGTGT

GGTTATCCCTTGCCTTGTGGCATTGGGTTTGCTCGTTGTCGGTTCTTGTGTTGCCTACC

GATTTATCGGAATTCAGTTGCCTTGGATGATTCTTTCCACCCGATGCTCCTTGTTGGGC

AATTGGGGGACCTCGTGGCCTCCTTGGTGTTCCACTAGAGTTGCATGTTCATGCAGCCC

CTCGTGGTGCAACAAGGTTTGTGTTGTTCTTTTGGATGCGGTGCATGCGAAGGTTGTGG

GGCGCTTGCCTCGCTTGACCCTTTAGAGTATGCTCTTTCGAATGACATTCGTGCTCGCG

TCGGCATACCCGCCCTCGTTGGGGGTGTTGCACGTGCGATGTGCGGTGTCCTAGAGGAT[18S_gene]

Sequence of band class J extracted from sample 25 amplified with primer pair ext1:

TTGCCACGATCCACTGAGATTAAGTCCGGTTCGGTTCTGTTTTCATGTCGTTGCGACCT

GGTTATGGCATCACTCGTGTTTCCATTTGCTCATGTGCCATTACGCACATGGGGGGTGG

TTTGCTTGGTGGGTGCTTGACTAGTCCCCGTGATGGTGTGTGAGAGGTGAATGGATTGC

CCGTGTTGGCTGGCTCTGTGCTTGTGCATAGAACTGTCTGCACCCGATCTTTGGTGTGG

TTATCCCTTGCCTTGTGGCATTGGGTTTGCTTGTTGTCGGTTCTTGTGTTGCCTACCGA

TTTATCGGAATTCAGTTGCCTTGGATGATTCTTTCCACCCGATGG

Sequence of band class D extracted from sample 12 amplified with primer pair ext1, and re-amplified with primer pair ext2:

GCATATGATGCACATSGTGSTGCACGAGTGGCCCTGCATTGTGCCCATGCCGCCCCACA

TCCWTGTGAACTTGCTGCCNGTGGGTGGGAACTTGGTGCGCATGGGTGSTGCACATMTG

YGCACATGAGGKGCCCMTGGTGCGCACAGCTGCCGCACAAGCTGCCATCATCTGCCGAA

CATGTGCCCATGGTTGCCCAAACCCGCCGCACATGTTGCCCACGTCTGCTGCACATCTG

TTGCACATCTGCCGACATCTGCCGCACAGCTGCACACATCTGCCGCACATATGCGCCCA

GCTGCCGCATATATGTGCACATTTACGCACAGGTGCCGCACATATGTGCACATTTGCGC

ACAGGTGTCGCACATCTGCCGCACATATGCACACAGCTGCCACACATATACGCACATAT

GCCGCACATATGCGCACATCTGCCGACATTTGCCGCACATATGCGGACATCTGCCGCAC

AGCTGCGCACATCTACCGCACATATGCGCCCAGCTGCCGCACATAAGTGCACATCTGCC

GCACATATGCGCCCAGCTACCGCTGGCGGCCAAATTTTCGGCGAGGCGGGGGAATGTTC

GTATGGCACAAAAATGATTTTCTGGTCGCGCTTGATTGGGTTTTATGGATTAAAATGTG

TCATTTGAGCCCCCCTCGGCCTCGAAATAGCCAACGGATGCACGAACTTGACCCTCCCC

CCACCAGATTGTGCCACTCCCCCAACATTTTGTGACTTCCCAAGCACCATCGTGTGCCA

GTGCTGCCTTGTTCTTGTGGGTTGTGGCATTGATCCTTGCCTATATGGGCACGTTGATG

TGAAGCATGGGCGCGTGCTATGTGCTTGTGTGGATCATGCATGGGCTGTAGAAGCATGT

TGAT

Consensus sequences of the left and right reads of the subrepeat region (amplified with primer pair ext3) of a number of samples:

>P4:

TGCATATGATGCACATCGTGCTGCACGAGTGGCCCTGCATTCTGCCCATGCATTCTGCA

CGAGTGGCCCTGTATACTGTAATATTATGTTATGTTATGTAACACTATGGGGCTGTCTT

GGTTGATTTAGCCATTTATGGTTCATGTCCTGTAACTTGAAGAGGATGGGTTACTTTTT

TCTTGACTATTCTTCAATTGGTTGAGCCCAGCTGCCGCACATATGCGCACATATGCGCC

CAGCTGCCGCACATATGCGCACATATGCCCACATCTGCCGCACATATGCGCACATCTGC

TGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCACATTGCTGCCCAC

GATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGATTATTGAAATGGGC

ATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGGTGCACGGGATGGT

GTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCACATGTTGCTGCACA

ACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGGAGGGGCTGATTAT

TGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGGTG

CACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGCACGGGTTGCTGCA

CACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGGGATGGTGGCATCGT

TGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGATGCACACATTGATG

CCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTGCACGTGATGCACAC

ATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGCACATGGCAAGGGTT

GGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTGCCAATCTCTCAATT

GGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACATGCAT

>1B:

ATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCTGCCGCACATATGCG

CACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCACATT

GCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGATTATT

GAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGGTGCA

CGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCACATGT

TGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGGAGGG

GCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTTGCTG

AGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGCACGG

GTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGGGATGG

TGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGATGCAC

ACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTGCACGT

GATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGCACATG

GCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTGCCAAT

CTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACATGCATT

CCCTATTTTCCAGA

>3C:

TGCGCACATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCTGCCGCAC

ATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCC

GCACATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGT

GATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCA

TGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATG

CACATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCA

AGGAGGGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTAC

GTTGCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGG

TGCACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCAT

GGGATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACAT

GATGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGG

TGCACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGA

GCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGT

TGCCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAAC

ATGCATTCCCTATTTTTCCAGAATT

>4B:

GCACATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCTGCCGCACATA

TGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCA

CATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGAT

TATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGG

TGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCAC

ATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGG

AGGGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTT

GCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGC

ACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGGG

ATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGAT

GCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTGC

ACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGCA

CATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTGC

CAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACATG

CATTCCCTATTTTCCAGA

>11B/CL/D:

GCACATATACGCACATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCT

GCCGCACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCAC

ATCTGCCGCACATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGG

AGGGGGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTG

CTGGCCATGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGG

GATGATGCACATGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGGAGGGGCTGATT

ATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGG

TGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGCACGGGTTGCTG

CACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGGGATGGTGGCATC

GTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGATGCACACATTGA

TGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTGCACGTGATGCAC

ACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGCACATGGCAAGGG

TTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTGCCAATCTCTCAA

TTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACATGCATTCCCTATT

TTCCAGAAT

>12CL/CU:

CACATATGCGCACATATGCCCACATCTGCCGCACATATGCGCACATCTGCTGCACATAT

GCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCACATTGCTGCCCACGATGCTTGC

ACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGATTATTGAAATGGGCATATTTTCG

GATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGGTGCACGGGATGGTGTCATTGGT

GTCATCGCTGGTGCACATTGTGCACGGGATGATGCACATGTTGCTGCACAACGGTGGCA

CGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGGAGGGGCTGATTATTGAAATGGG

CATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGGTGCACTGGGTG

GTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGCACGGGTTGCTGCACACGCTGCA

ATAGATGGTGCACATGTTGCAGGCCATCATGCATGGGATGGTGGCATCGTTGGTGCACT

TGATGCACACATTGATGCCAACGTTGGTGCACATGATGCACACATTGATGCCAAGGTTG

GTGCACGTGATGCACACATTGATGCCAACGTTGGTGCACGTGATGCACACATTGGTGCC

AATGTTGGTGCACGAGCATGTGCACATGGCAGGAGCACATGGCAAGGGTTGGTGCAAGT

AGCTAACAAGGTGATGGGTATAGACCAAATGGGTTGCCAATCTCTCAATTGGACATCTA

GGCAATGTCCCAACGTCCAATCTGACCCCCCAACATGCATTCCCTATTTTCCAGAAT

>17AL/B/CL:

ATATGCGCACATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCTGCCG

CACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCT

GCCGCACATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGG

GGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGG

CCATGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATG

ATGCACATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGC

CCAAGGAGGGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGC

TACGTTGCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGA

TGGTGCACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATG

CATGGGATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCA

CATGATGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGT

TGGTGCACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCA

GGAGCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATG

GGTTGCCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCC

AACATGCATTCCCTATTTTTCCA

>21:

GAGCCCAGCTGCCGCACATATGCGCACATATGCGCCCAGCTGCCGCACATATGCGCACA

TATGCCCACATCTGCCGCACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCG

CACAGCTGCCCACATCTGCCGCACATTGCTGCCCACGATGCTTGCACATCGCTGGCCAT

GTTGCCCTATGGGAGGGGGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTT

TCGGAATATGTTGCTGGCCATGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGC

ACATTGTGCACGGGATGATGCACATGTTGCTGCACAACGGTGGCACGTTGCTGCACATA

GTCGCAATAAGATTAGCCCAAGGAGGGGCTGATTATTGAAATGGGCATATTTTTGGATT

TAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCT

GCACATGGTGCACTGGATGGTGCACGGGTTGCTGCACACGCTGCAATAGATGGTGCACA

TGTTGCAGGCCATCATGCATGGGATGGTGGCATCGTTGGTGCACTTGATGCACACATTG

ATGCCAACGTTGGTGCACATGATGCACACATTGATGCCAAGGTTGGTGCACGTGATGCA

CACATTGATGCCAACGTTGGTGCACGTGATGCACACATTGGTGCCAATGTTGGTGCACG

AGCATGTGCACATGGCAGGAGCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGA

TGGGTATAGACCAAATGGGTTGCCAATCTCTCAATTGGACATCTAGGCAATGTCCCAAC

GTCCAATCTGACCCCCCAACATGCATTCCCTATTTTTCCAGAATTTT

>26A:

CCAGCTGCCGCACATATGCGCACATATGCGCCCAGCTGCCGCACATATGCGCACATATG

CCCACATCTGCCGCACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACA

GCTGCCCACATCTGCCGCACATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTG

CCCTATGGGAGGGGGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGG

AATATGTTGCTGGCCATGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACAT

TGTGCACGGGATGATGCACATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCG

CAATAAGATTAGCCCAAGGAGGGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGA

GGCTTTCTTGGGCTACGTTGCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCAC

ATGGTGCACTGGATGGTGCACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTT

GCAGGCCATCATGCATGGGATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGC

CAACGTTGGTGCACATGATGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACA

TTGATGCCAACGTTGGTGCACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCA

TGTGCACATGGCAGGAGCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGG

TATAGACCAAATGGGTTGCCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCC

AATCTGACCCCCCAACATGCATTCCCTATTTTTCCAGAATT

>31:

AGCTGCCGCACATATGCGCACATATGCGCCCAGCTGCCGCACATATGCGCACATATGCC

CACATCTGCCGCACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGC

TGCCCACATCTGCCGCACATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCC

CTATGGGAGGGGGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAA

TATGTTGCTGGCCATGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTG

TGCACGGGATGATGCACATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCA

ATAAGATTAGCCCAAGGAGGGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGG

CTTTCTTGGGCTACGTTGCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACAT

GGTGCACTGGATGGTGCACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGC

AGGCCATCATGCATGGGATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCA

ACGTTGGTGCACATGATGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATT

GATGCCAACGTTGGTGCACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATG

TGCACATGGCAGGAGCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTA

TAGACCAAATGGGTTGCCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAA

TCTGACCCCCCAACATGCATTCCCTATTTTTCCAGAATTTTGCTA

>34:

GAGCCCAGCTGCCGCACATATGCGCACATATGCGCCCAGCTGCCGCACATATGCGCACA

TATGCCCACATCTGCCGCACATATGCGCACATCTGCGCACAGCTGCCCACATCTGCCGC

ACATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGA

TTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATG

GTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCA

CATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAG

GAGGGGCTGATTATTTAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGT

TGCTGAGAATGGTGCACTAGGCGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTG

CACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGG

GATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGA

TGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTG

CACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGC

ACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTG

CCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACAT

GCATTCCCTAT

>I:

GCACATATGTGCACATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCT

GCCGCACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCAC

ATCTGCCGCACATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGG

AGGGGGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTG

CTGGCCATGGTGCACGGGATGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCA

CATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAG

GAGGGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGT

TGCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTG

CACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGG

GATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGA

TGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTG

CACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGC

ACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTG

CCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACAT

GCATTCCCTATTTTTCCAGAATTTT

>II:

TTTGAGCCCAGCTGCCGCACATATGCGCACATATGCGCCCAGCTGCCGCACATATGCGC

ACATATGCCCACATCTGCCGCACATATGCGCACATCTGCTGCACATATGCCGCACATCT

GCGCACAGCTGCCCACATCTGCCGCACATTGCTGCCCACGATGCTTGCACATCGCTGGC

CATGTTGCCCTATGGGAGGGGGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGT

TTTTCGGAATATGTTGCTGGCCATGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGG

TGCACATTGTGCACGGGATGATGCACATGTTGCTGCACAACGGTGGCACGTTGCTGCAC

ATAGTCGCAATAAGATTAGCCCAAGGAGGGGCTGATTATTGAAATGGGCATATTTTTGG

ATTTAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGGTGCACTGGGTGGTGCACGTGTT

GCTGCACATGGTGCACTGGATGGTGCACGGGTTGCTGCACACGCTGCAATAGATGGTGC

ACATGTTGCAGGCCATCATGCATGGGATGGTGGCATCGTTGGTGCACTTGATGCACACA

TTGATGCCAACGTTGGTGCACATGATGCACACATTGATGCCAAGGTTGGTGCACGTGAT

GCACACATTGATGCCAACGTTGGTGCACGTGATGCACACATTGGTGCCAATGTTGGTGC

ACGAGCATGTGCACATGGCAGGAGCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGG

TGATGGGTATAGACCAAATGGGTTGCCAATCTCTCAATTGGACATCTAGGCAATGTCCC

AACGTCCAATCTGACCCCCCAACATGCATTCCCTATTTTTCCAGAATTTTGCTATTTTT

TGGTTTTCTGGGATCAC

>IV:

TTGGTTGAGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCTGCCGCACATATG

CGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCACA

TTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGATTA

TTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGGTG

CACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCACAT

GTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGGAG

GGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTTGC

TGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGCAC

GGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGGGAT

GGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGATGC

ACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTGCAC

GTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGCACA

TGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTGCCA

ATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACATGCA

TTCCCTATTTTTCCAGAATTTTG

>VII:

TGCCGCACATATGCGCACATATGCCCACATCTGCCGCACATATGCGCACATCTGCTGCA

CATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCACATTGCTGCCCACGATG

CTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGATTATTGAAATGGGCATAT

TTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGGTGCACGGGATGGTGTCA

TTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCACATGTTGCTGCACAACGG

TGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGGAGGGGCTGATTATTGAA

ATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGGTGCACT

GGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGCACGGGTTGCTGCACACG

CTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCATGGGATGGTGGCATCGTTGGT

GCACTTGATGCACACATTGATGCCAACGTTGGTGCACATGATGCACACATTGATGCCAA

GGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTGCACGTGATGCACACATTG

GTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGAGCACATGGCAAGGGTTGGTG

CAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGTTGCCAATCTCTCAATTGGAC

ATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAACATGCATTCC

>VIII:

ATATGCGCACATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCTGCCG

CACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCT

GCCGCACATTGCTGCCCACGATGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGG

GGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGG

CCATGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATG

ATGCACATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGC

CCAAGGAGGGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGC

TACGTTGCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGA

TGGTGCACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATG

CATGGGATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTGGTGCA

CATGATGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGT

TGGTGCACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCA

GGAGCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATG

GGTTGCCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCC

AACATGC

>IX:

TGACTATTCTTTCAATTGGTTGAGCCCAGCTGCCGCACATATGCGCACATATGCGCCCA

GCTGCCGCACATATGCGCACATATGCCCACATCTGCCGCACATATGCGCACATCTGCTG

CACATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCACATTGCTGCCCACGA

TGCTTGCACATCGCTGGCCATGTTGCCCTATGGGAGGGGGTGATTATTGAAATGGGCAT

ATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGGTGCACGGGATGGTGT

CATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCACATGTTGCTGCACATA

GTCGCAATAAGATTAGCCCAAGGAGGGGCTGATTATTGAAATGGGCATATTTTTGGATT

TAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGGTGCACGTGTTGCTGCACATGGTGCA

CTGGATGGTGCACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCA

TCATGCATGGGATGGTGGCATCGTTGGTGCACTTGATGCACACATTGATGCCAACGTTG

GTGCACATGATGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCC

AACGTTGGTGCACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACA

TGGCAGGAGCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACC

AAATGGGTTGCCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGAC

CCCCCAACATGCATTCCCTATTTTTCCAGAATTTTGCTATTTTTTGGTTTTTCTGGGAT

CACCATCAA

>X:

AATTGGTTGAGCCCAGCTGCCGCACATATGCGCACATATGCGCCCAGCTGCCGCACATA

TGCGCACATATGCCCACATCTGCCGCACATATGCGCACATCTGCTGCACATATGCCGCA

CATCTGCGCACAGCTGCCCACATCTGCCGCACATTGCTGCCCACGATGCTTGCACATCG

CTGGCCATGTTGCCCTATGGGAGGGGGTGATTATTGAAATGGGCATATTTTCGGATTTA

GAGGTTTTTCGGAATATGTTGCTGGCCATGGTGCACGGGATGGTGTCATTGGTGTCATC

GCTGGTGCACATTGTGCACGGGATGATGCACATGTTGCTGCACAACGGTGGCACGTTGC

TGCACATAGTCGCAATAAGATTAGCCCAAGGAGGGGCTGATTATTGAAATGGGCATATT

TTTGGATTTAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGGTGCACTGGGTGGTGCAC

GTGTTGCTGCACATGGTGCACTGGATGGTGCACGGGTTGCTGCACACGCTGCAATAGAT

GGTGCACATGTTGCAGGCCATCATGCATGGGATGGTGGCATCGTTGGTGCACTTGATGC

ACACATTGATGCCAACGTTGGTGCACATGATGCACACATTGATGCCAAGGTTGGTGCAC

GTGATGCACACATTGATGCCAACGTTGGTGCACGTGATGCACACATTGGTGCCAATGTT

GGTGCACGAGCATGTGCACATGGCAGGAGCACATGGCAAGGGTTGGTGCAAGTAGCTAA

CAAGGTGATGGGTATAGACCAAATGGGTTGCCAATCTCTCAATTGGACATCTAGGCAAT

GTCCCAACGTCCAATCTGACCCCCCAACATGCATTCCCTATTTTTCCAGAATTTTG

>P9B:

TTCTGTCTTCTTCTCCGAATAGTGCAATGGGAAAGTGACCATTAGATGGAAAGGGGCCA

ATAGATGGAAAGGGATGGTGGGATCCCAAGATATATGCATACAAGATGGATGAACGAGT

GGGTGCTGCACATATGCGCACATGAGGTGCCCATGGTGCGCACAGCTGCCGCACATAGC

TGCCATCATCTGCCGAACATGTGCGCATTGGGTGGTGCACATCTGCGCACATGAAGGTG

CCCGTGGTGCGCACAGCTGCCGCACATAGCTGCCATCATCTGCCGAACATGTGCGCACA

GTTGCCCATATCTGCCGCACATATGCGCACATCTGCCGCACATCTGCCGCACATATGTG

CACATCTGCGCACAGCTGCCGACATGTGCCGCACATATGCGCACATCTGCGCCCAGCTG

CCGCACATGATGCGCACAGCTGCGCACATCTGCCGACATCAGCCACACATGTGCCGCAC

ATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCC

GCACATTGCTGCCCACGATGCTTGCACATCGCTAGCCATGTTGCCCTATGGGAGGGGGT

GATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCA

TGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATG

CACATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCA

AGGAGGGGCTGATTATTGAAATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTAC

GTTGCTGAGAATGGTGCACTGGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGG

TGCACGGGTTGCTGCACACGCTGCAATAGATGGTGCACATGTTGCAGGCCATCATGCAT

GGGATGGTGGCATCGTTGGTGCACTTGATGCACGCATTGATGCCAACGTTGGTGCACAT

GATGCACACATTGATGCCAAGGTTGGTGCACGTGATGCACACATTGATGCCAACGTTGG

TGCACGTGATGCACACATTGGTGCCAATGTTGGTGCACGAGCATGTGCACATGGCAGGA

GCACATGGCAAGGGTTGGTGCAAGTAGCTAACAAGGTGATGGGTATAGACCAAATGGGT

TGCCAATCTCTCAATTGGACATCTAGGCAATGTCCCAACGTCCAATCTGACCCCCCAAC

ATGCATTCCCTATTTTCCAGAATT

>17AU:

TCGTTCTGTCCTTCTTCTCCGAATAGTGCAATGGGAAAGTGACCATTAGACGGAAAGGG

GCCAATAGATGGAAAGGGATGGTGGGATCCCAAGATATATGCATACAAGACGGAAAGGG

GCCCTGCATTCTGCCCATGCCGCCCCACAGTTATGTAAAACTTGGTGCGCATGGGTGCT

GCACATATGCGCACATGAGGTGCCCATGGTGCGCACAGCTGCCGCACATAGCTGCCATC

ATCTGCCGAACATGTGCGCATGGGTGGTGCACATCTGCGCACATGAGGTGCCCGTGGTG

CGCACAGCTGCCGCACATAGCTGCCATCATCTGCCGAACATGTGCGCACAGTTGCCCAT

ATCTGCCGCACATATGCGCACATCTGCCGCACATATGTGCACATCTGCGCACATCTGCC

GCACATCTGCCGCACATATGCGCCCAGCTGCTGCACATATGCGCACATATGCCCACATC

TGCCGCACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCA

CATCTGCCGCACATTGCTGCCCACGATGCTTGCACATCGCTAGCCATGTTGCCCTATGG

GAGGGGGTGATTATTGAAATGGGCATATTTTCGGATTTAGAGGTTTTTCGGAATATGTT

GCTGGCCATGGTGCACGGGATGGTGTCATTGGTGTCATCGCTGGTGCACATTGTGCACG

GGATGATGCACATGTTGCTGCACAACGGTGGCACGTTGCTGCACATAGTCGCAATAAGA

TTAGCCCAAGGAAGGGCTGATTATTGAAATGGGCATATTTTTGGA

>18AL:

TCGTTCTGTCCTTCTTCTCCGAATAGTGCAATGGGAAAGTGACCATTAGACGGAAAGGG

GCCAATAGATGGAAAGGGATGGTGGGATCCCAAGATATATGCATACAAGACGGAAAGGG

GCCCTGCATTCTGCCCATGCCGCCCCACAGTTATGTAAAACTTGGTGCGCATGGGTGCT

GCACATATGCGCACATGAGGTGCCCATGGTGCGCACAGCTGCCGCACATAGCTGCCATC

ATCTGCCGAACATGTGCGCACAGTTGCCCATATCTGCCGCACATATGCGCACATCTGCC

GCACATATGTGCACATCTGCGCACATCTGCCGCACATCTGCCGCACATATGCGCCCAGC

TGCTGCACATATGCGCACATATGCCCACATCTGCCGCACATATGCGCACATCTGCTGCA

CATATGCCGCACATCTGCGCACAGCTGCCCACATCTGCCGCACATTGCTGCCCACGATG

CTTGCACATCGCTAGCCATGTTGCCCTATGGGAGGGGGTGATTATTGAAATGGGCATAT

TTTCGGATTTAGAGGTTTTTCGGAATATGTTGCTGGCCATGGTGCACGGGATGGTGTCA

TTGGTGTCATCGCTGGTGCACATTGTGCACGGGATGATGCACATGTTGCTGCACAACGG

TGGCACGTTGCTGCACATAGTCGCAATAAGATTAGCCCAAGGAAGGGCTGATTATTGAA

ATGGGCATATTTTTGGATTTAGAGGCTTTCTTGGGCTACGTTGCTGAGAATGGTGCACT

GGGTGGTGCACGTGTTGCTGCACATGGTGCACTGGATGGTGCACGGGTTGCTGCACACG

CTGCAATAGATGGTGCACATGTTGCAGCCATCATGCATGGGATGATGGCATCGTTGGTG

CACTTGATGCACACATTGATGCCAACGTTGGTGCACATGATGCACACATTGATGCCAAC

GTTGGTGCACGTGATGCACACATTGATGCCAACGTTGGTGCAC
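As an illustrative sketch only, and not the procedure used to generate or analyze the sequences above, records in this FASTA-like format could be parsed in Python to relate the length of each IGS sequence to the number of copies of a candidate subrepeat. The function names parse_fasta_like, count_motif and max_tandem_copies are hypothetical, and the motif GCACATATGC is simply a 10-bp string that recurs in the sequences above, chosen for illustration rather than a consensus subrepeat reported in this work. The sketch counts exact matches only; dedicated tools such as Tandem Repeats Finder (Benson 1999) also detect approximate repeats.

```python
# Illustrative sketch: exact-match motif counting only; it does not model
# the approximate (mismatch-tolerant) repeats that dedicated tools detect.

def parse_fasta_like(text):
    """Parse '>name' headers followed by wrapped sequence lines.

    Blank lines are skipped, and a trailing colon on a header
    (e.g. '>I:') is dropped from the record name.
    """
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip().rstrip(":")
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    # Join the wrapped lines of each record into one continuous sequence.
    return {key: "".join(parts) for key, parts in records.items()}


def count_motif(seq, motif):
    """Count non-overlapping exact occurrences of motif in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    return seq.count(motif)


def max_tandem_copies(seq, motif):
    """Return the largest k such that motif repeated k times occurs contiguously."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = 0
    while motif * (k + 1) in seq:
        k += 1
    return k


if __name__ == "__main__":
    # Excerpt: the first two wrapped lines of record >I: above.
    excerpt = """>I:
GCACATATGTGCACATATGCGCCCAGCTGCCGCACATATGCGCACATATGCCCACATCT

GCCGCACATATGCGCACATCTGCTGCACATATGCCGCACATCTGCGCACAGCTGCCCAC
"""
    motif = "GCACATATGC"  # hypothetical example motif, not a reported consensus
    for name, seq in parse_fasta_like(excerpt).items():
        print(name, len(seq), count_motif(seq, motif), max_tandem_copies(seq, motif))
```

Applied to complete records, comparing sequence length against tandem copy number for a given motif is one simple way to check whether length differences track differences in subrepeat number.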

References

Benson, G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27(2):573-580.

Bhatia, S., M.S. Negi, and M. Lakshmikumaran. 1996. Structural analysis of the rDNA intergenic spacer of Brassica nigra: evolutionary divergence of the spacers of the three diploid Brassica species. J. Mol. Evol. 43:460-468.

Borisjuk, N., and V. Hemleben. 1993. Nucleotide sequence of the potato rDNA intergenic spacer. Plant Mol. Biol. 21:381-384.

Bower, N., K. Hertel, J. Oh, and R. Torey. 1988. Nutritional evaluation of marama bean (Tylosema esculentum, ): analysis of the seed. Econ. Botany 42(4):533-540.

Cordesse, F., R. Cooke, D. Tremousaygue, F. Grellet, and M. Delseny. 1993. Fine structure and evolution of the rDNA intergenic spacer in rice and other cereals. J. Mol. Evol. 36:369-379.

Da Rocha, P.S.C.F., and H. Bertrand. 1995. Structure and comparative analysis of the rDNA intergenic spacer of Brassica rapa: implications for the function and evolution of the Cruciferae spacer. Eur. J. Biochem. 229:550-557.

Doelling, J.H., J. Reginald, J. Gaudino, and C.S. Pikaard. 1993. Functional analysis of Arabidopsis thaliana rRNA gene and spacer promoters in vivo and by transient expression. Proc. Natl. Acad. Sci. USA 90:7528-7532.

Ellis, T.H.N., D.R. Davies, J.A. Castleton, and I.D. Bedford. 1984. The organization and genetics of rDNA length variants in peas. Chromosoma 91:74-81.

Fernández, M., C. Polanco, M.L. Ruiz, and M. Pérez de la Vega. 2000. A comparative study of the structure of the rDNA intergenic spacer of Lens culinaris Medik., and other legume species. Genome 43:597-603.

Kato, A., T. Nakajima, J. Yamashita, K. Yakura, and S. Tanifuji. 1990. The structure of the large spacer region of the rDNA in Vicia faba and Pisum sativum. Plant Mol. Biol. 14:983-993.

Keegan, A.B., and J. van Staden. 1981. Marama Bean, Tylosema esculentum, a plant worthy of cultivation. South African J. Sci. 77:387.

Kuemerle, B., C. Cullis, C. Johnson, E. Meszaros, N. Frankenstein, J. Sugalski, A. Bell, C. Guo, J. Ha, C. Jin, S. Kyung, B. Lee, B. Manyam, J. Ngo, J. Novak, M. Pelyak, T. Roy, L. Sasala, T. Saw, S. Seward, Y. Shahin, Z. Sharalaya, D. Wolak, S. Woodcraft, J. Yaney, P. Yoon, and D. Zauner. Addressing malnutrition in : development of novel molecular markers for the marama plant. Research Showcase, Cleveland, OH, Apr. 15, 2010.

Lakshmikumaran, M., and M.S. Negi. 1993. Structural analysis of two length variants of the rDNA intergenic spacer from Eruca sativa. Plant Mol. Biol. 24:915-927.

Lassner, M., O. Anderson, and J. Dvořák. 1987. Hypervariation associated with a 12-nucleotide direct repeat and inferences on intergenomic homogenization of ribosomal RNA gene spacers based on the DNA sequence of a clone from the wheat Nor-D3 locus. Genome 29:770-781.

Monaghan, B.G., and G.M. Halloran. 1996. RAPD variation within and between natural populations of morama [Tylosema esculentum (Burchell) Schreiber] in southern Africa. South African J. Botany 62(6):287-291.

Nepolo, E., M. Takundwa, P.M. Chimwamurombe, C.A. Cullis, and K. Kunert. 2009. A review of geographical distribution of marama bean [Tylosema esculentum (Burchell) Schreiber] and genetic diversity in the Namibian germplasm. African J. Biotech. 8(10):2088-2093.

Nepolo, E., P.M. Chimwamurombe, C.A. Cullis, and M.A. Kandawa-Schulz. 2010. Determining genetic diversity based on ribosomal intergenic spacer length variation in Marama bean (Tylosema esculentum) from the Omipanda area, Eastern Namibia. African J. Plant Sci. 4(9):368-373.

Piller, K.J., S.R. Baerson, N.O. Polans, and L.S. Kaufman. 1990. Structural analysis of the short length ribosomal DNA variant from Pisum sativum L. cv. Alaska. Nucleic Acids Research 18(11):3135-3145.

Rathgeber, J., and I. Capesius. 1990. Nucleotide sequence of the intergenic spacer and the 18S ribosomal RNA gene from mustard (Sinapis alba). Nucleic Acids Research 18(5):1288.

Reeder, R.H. 1984. Enhancers and ribosomal RNA gene spacers. Cell 38:39-44.

Rogers, S.O., and A.J. Bendich. 1987. Ribosomal RNA genes in plants: variability in copy number and in the intergenic spacer. Plant Mol. Biol. 9:509-520.

Saghai-Maroof, M.A., K.M. Soliman, R.A. Jorgensen, and R.W. Allard. 1984. Ribosomal DNA spacer-length polymorphisms in barley: Mendelian inheritance, chromosomal location, and population dynamics. Proc. Natl. Acad. Sci. USA 81:8014-8018.

Sardana, R.K., and R.B. Flavell. 1996. Molecular cloning and characterization of an unusually large intergenic spacer from the Nor-B2 locus of hexaploid wheat. Genome 39:288-292.

Takundwa, M., E. Nepolo, P.M. Chimwamurombe, C.A. Cullis, M.A. Kandawa-Schulz, and K. Kunert. 2010a. Development and use of microsatellites markers for genetic variation analysis, in the Namibian germplasm, both within and between populations of marama bean (Tylosema esculentum). J. Plant Breeding and Crop Sci. 2(8):233-242.

Takundwa, M., P.M. Chimwamurombe, K. Kunert, and C.A. Cullis. 2010b. Isolation and characterization of microsatellite repeats in Marama bean (Tylosema esculentum). African J. Agric. Research 5(7):561-566.

Thom, A. Magic marama - the green gold of Africa [Internet]. South Africa: Health-e News Service; Jan. 24, 2000 [cited on Apr. 25, 2011]. Available from: http://www.health-e.org.za/news/article.php?uid=20000103

Yakura, K., A. Kato, and S. Tanifuji. 1984. Length heterogeneity of the large spacer of Vicia faba rDNA is due to the differing number of a 325 bp repetitive sequence elements. Mol. and General Genet. 193:400-405.
