STUDYING DNA POLYMORPHISMS IN GENUS GOSSYPIUM AND THEIR IMPLICATIONS IN COTTON BREEDING

By

NABILA TABBASAM

2016

NATIONAL INSTITUTE FOR BIOTECHNOLOGY & GENETIC ENGINEERING (NIBGE), FAISALABAD, PAKISTAN

QUAID-I-AZAM UNIVERSITY, ISLAMABAD, PAKISTAN


A dissertation submitted in partial fulfilment of the requirements for the degree of

Doctor of Philosophy

In

Biotechnology


Acknowledgement

In the name of Allah, the Most Beneficent and the Most Merciful. I bow my head, with all humility and modesty, before Allah Almighty, the Creator and the Most Supreme, whose mercy enabled me to accomplish this task and bestowed success upon me. May Allah shower His countless salutations upon all His Prophets, including Muhammad (PBUH), His last messenger, who is the fountain of knowledge and guidance for the salvation of mankind in this world and in the hereafter.

It is my privilege to acknowledge a few of the people whose sincere help, guidance and prayers enabled me to accomplish my research work in a congenial and serene environment. First of all, I would like to pay tribute to all the Directors (Drs. Yusuf Zafar, Zafar Mehmood Khalid, Sohail Hameed and Shahid Mansoor, SI) of the National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Pakistan, for providing me with a congenial environment in which to pursue my PhD. I must express my sincere regards to my supervisor, Dr. Mehboob-Ur-Rahman (Pride of Performance), for his continuous guidance and patience throughout my study. I am greatly obliged to my co-supervisor, Dr. Yusuf Zafar, for his helpful suggestions, without which it would not have been possible to bring the present work to a successful end. I am thankful to the Director of CCRI Multan for providing us with the leaves of Gossypium species. I am also grateful to the late Prof. J. McD. Stewart (University of Arkansas, Fayetteville, Arkansas, USA) for providing us with the genomic DNA of G. hirsutum ‘‘yucatanense’’, G. capitis-viridis, G. longicalyx and G. australe. I also acknowledge Prof. Andrew H. Paterson (PGML, UGA, USA) for providing me with research facilities in his labs to carry out a part of my PhD research work. I am also grateful to my lab fellows Dr. Tayyaba Shaheen, Mr. M. Atif Iqbal and Mr. Zaman, as well as the field staff members Mr. Shehbaz Shabbir, Mr. Wasif Ali and Mr. Muhammad Farooq, for their support and cooperation.

This research work was largely funded through a project entitled “DNA-based genetic characterization of cotton germplasm” (Component-1, NIBGE, Faisalabad), financed through the ALP-PARC program. I am also thankful to HEC Islamabad, Pakistan, for providing funds for some of the lab experiments carried out at PGML, University of Georgia, USA.

I owe a heartfelt debt of gratitude to my parents, who endured all the strain and stress during the course of my study and whose hearts were beating with prayers for my success. I am also indebted and grateful to my brothers and sisters, who always remembered me in their prayers. I am especially thankful to my sister Sonia Tabbasam, whose help in taking care of my children enabled me to complete this work. I gratefully acknowledge the encouragement and support of my husband, Mr. Fakhar Z. Wahla (Director Admin, BARDC, PARC, Quetta), whose love, patience and prayers made this work possible. I am also thankful to my in-laws for their encouragement and good wishes during my studies. Last but not least, I thank my loving children, Ch. Shehram Fakhar and Haleema Zaman, whose tiny hands were always raised in prayer for my success. I am thankful to Allah Almighty, without Whose help it would have been impossible to complete this research work.

Ms. Nabila Tabbasam


Dedicated

to my family

LIST OF PUBLICATIONS

1. Tabbasam, N., Zafar, Y. and Rahman, M., 2014. Pros and cons of using genomic SSRs and EST-SSRs for resolving phylogeny of the genus Gossypium. Plant Systematics and Evolution. DOI 10.1007/s00606-013-0891-x

2. Rahman, M., Shaheen, T., Tabbasam, N., Iqbal, M. A., Ashraf, M., Zafar, Y. and Paterson, A. H., 2012. Cotton genetic resources. A review. Agron Sustain Dev 32: 419-432.

3. Rahman, M., Asif, M., Shaheen, T., Tabbasam, N., Zafar, Y. and Paterson, A. H., 2011. Marker-assisted breeding in higher plants. Sustain Agric Review 6: 39-76.

4. Saeed, M., Guo, W., Ullah, I., Tabbasam, N., Zafar, Y., Rahman, M. and Zhang, T., 2011. QTL mapping for physiology, yield and plant architecture traits in cotton (Gossypium hirsutum L.) grown under well-watered versus water-stress conditions. Electronic Journal of Biotechnology, ISSN 0717-3458, DOI: 10.2225/vol14-issue3-fulltext-3

5. Lin, L., Pierce, G. J., Bowers, J. E., Estill, J. C., Compton, R. O., Rainville, L. K., Kim, C., Lemke, C., Rong, J., Tang, H., Wang, X., Braidotti, M., Chen, A. H., Chicola, K., Collura, K., Epps, E., Golser, W., Grover, C., Ingles, J., Karunakaran, S., Kudrna, D., Olive, J., Tabbasam, N., Um, E., Wissotski, M., Yu, Y., Zuccolo, A., Rahman, M., Peterson, D. G., Wing, R. A., Wendel, J. F. and Paterson, A. H., 2010. A draft physical map of a D-genome cotton species (Gossypium raimondii). BMC Genomics 11: 395.

6. Rahman, M., Yasmin, T., Tabbasam, N., Ullah, I., Asif, M. and Zafar, Y., 2008. Studying the extent of genetic diversity among Gossypium arboreum L. genotypes/cultivars using DNA fingerprinting. Genet Resour Crop Evol 55: 331-339.

7. Tabbasam, N., Zafar, Y., Rahman, M. and Paterson, A. H., 2012. BAC derived new SSRs for use in cotton (Gossypium spp.) improvement. Submitted to Pak J Bot.


TABLE OF CONTENTS

Title page
Declaration
Dedication
Acknowledgement
List of Publications
Table of Contents
List of Figures
List of Tables
Abstract
1 Introduction
1.1 Cotton
1.2 Evolution of genome size in cotton
1.3 Cotton and Arabidopsis
1.4 Need for the improvement of cotton genomic resources
1.5 Marker systems for genetic mapping
1.5.1 Morphological and Biochemical markers
1.5.2 DNA based markers
1.5.2.1 Advantages of DNA-based markers over other markers
1.5.2.2 Hybridization based markers
1.5.2.3 PCR-based DNA markers
1.5.2.3.1 PCR-based arbitrarily primed techniques
1.5.2.3.2 Random amplified polymorphic DNA (RAPD)
1.5.2.3.3 Amplified fragment length polymorphism (AFLP)
1.5.2.3.4 Cleaved amplified fragment length polymorphism
1.5.2.3.5 Microsatellites or Simple sequence repeats (SSRs)
1.6 Transcriptome analysis of cotton fiber development
1.6.1 Cotton fiber
1.6.2 Fiber Properties
1.6.2.1 Fiber length
1.6.2.2 Fiber strength
1.6.2.3 Micronaire
1.6.3 Fiber Development
1.6.3.1 Transcriptomes of a developing fiber
1.6.3.2 Comparison of diploid and tetraploid genomes regarding fiber
1.6.3.3 Similarities between Arabidopsis trichome and cotton seed hair development
1.7 Cotton fiber identification through differential display
1.8 Development of new SSRs
1.9 Molecular linkage maps in cotton
1.9.1 Construction of linkage maps
1.9.2 Cotton linkage maps
1.10 Implications of genomic tools for cotton improvement
1.11 Objectives
2 Materials and Methods
2.1 Genetic diversity and relationship of diploid and tetraploid cotton species using EST-SSR and gSSRs
2.1.1 Plant material
2.1.2 Extraction of Genomic DNA
2.1.2.1 DNA purification
2.1.2.2 Quantification of genomic DNA
2.1.3 Simple sequence repeats (SSR) analysis
2.1.3.1 Polymerase chain reaction (PCR)
2.1.4 Electrophoresis of amplified products
2.1.5 Data scoring and statistical analysis
2.2 Identification of differentially expressed genes from normal (fiber producing) and mutant (fiberless) ovules of Gossypium hirsutum during fiber elongation stage
2.2.1 Plant Material and RNA extraction
2.2.2 Synthesis of the first strand cDNA and Differential Display RT-PCR (DDRT-PCR)
2.2.3 DDRT-PCR
2.2.4 Electrophoresis of amplified products
2.2.5 Isolation, Reamplification and Confirmation of Differentially Expressed Transcripts
2.2.5.1 Elution protocol from agarose gel
2.2.6 DNA Sequencing
2.2.7 Data Analysis
2.3 Isolation of fiber related partial sequences from Gossypium arboreum at different developmental stages and phylogenetic study of translation elongation factor-1 gamma gene across the genomes of cotton
2.3.1 Plant material
2.3.2 RNA extraction at different developmental stages of cotton fibers
2.3.3 Synthesis of 1st strand cDNA
2.3.4 Gene specific primer designing
2.3.4.1 Criteria for primer designing
2.3.5 Polymerase chain reaction
2.3.6 Agarose gel electrophoresis of amplified products
2.3.7 DNA fragments elution protocol from agarose gel
2.3.7.1 Ligation of fragments
2.3.7.2 Heat shock Cells Preparation of E. coli
2.3.7.3 Transformation in E. coli by heat shock
2.3.7.4 Plasmid isolation from E. coli
2.3.7.5 Restriction reaction
2.3.8 DNA Sequencing
2.3.8.1 Sequence analysis
2.3.9 BAC library screening
2.3.9.1 Filter preparation, probe designing and hybridization
2.3.9.2 BAC clone sequencing
2.3.9.3 Sequencing reaction
2.3.9.4 Clean up protocol
2.3.9.5 Run on “ABI 377” DNA sequencer
2.3.9.6 Analysis of sequences
2.4 Development of BAC-gSSRs from Gossypium raimondii for use in cotton improvement
2.4.1 Development of BAC-gSSRs
2.4.2 Plant material
2.4.3 DNA extraction
2.4.4 Polymerase chain reaction
2.4.5 Electrophoresis of BAC-gSSRs
2.4.6 Data scoring and analysis
2.5 Genetic and QTL mapping using F2 population derived from G. hirsutum x G. barbadense
2.5.1 Parental genotypes
2.5.2 Mapping population development
2.5.3 Phenotyping of F2 mapping population
2.5.4 Phenotyping of F2:3, F2:4 mapping population
2.5.5 Genomic DNA isolation
2.5.6 Molecular mapping
2.5.6.1 Chemicals and enzymes
2.5.6.2 Simple sequence repeats (SSR) analysis
2.5.6.3 Polymerase chain reaction
2.5.6.4 Electrophoresis of amplified products
2.5.6.5 Map construction
2.5.6.6 QTL analysis
2.5.6.1 Marker assisted selection
3 Results
3.1 Genetic diversity and relationship of diploid and tetraploid species using EST-SSRs and gSSRs
3.1.1 Microsatellite polymorphism
3.1.2 Genetic characterization
3.1.3 Transferability of SSRs across the Gossypium genomes and genome specificity
3.1.4 Microsatellite performance among diploid (A and D) and tetraploid (AD) genome species
3.1.5 A and D genome species relationship with AD genome species
3.1.6 Genetic similarity among diploid and tetraploid cotton species with EST and gSSRs
3.1.7 Phylogenetic study of 36 cotton species/landraces with combined data of EST-SSRs and gSSRs
3.1.8 Clustering of species with EST-SSRs
3.1.9 Clustering of species with gSSRs
3.2 Identification of differentially expressed genes from normal and mutant ovules of G. hirsutum during fiber elongation stage
3.3 Isolation of fiber related sequences from G. arboreum ovules at different developmental stages and phylogenetic study of translation elongation factor-1 gamma gene across diploid Gossypium genomes
3.3.1 Amplification of reverse transcribed RNA (cDNA) with gene specific primers from G. hirsutum
3.3.2 Cloning of PCR amplified products into pTZ57R/T vector
3.3.3 Screening of different cotton BAC libraries with gene specific probes
3.3.3.1 Sequencing of clones
3.3.3.2 Nucleotide variations
3.3.3.3 Analysis of BAC end sequences (BES) and Contig assembly
3.4 BAC derived new SSRs for use in cotton (Gossypium spp) improvement
3.4.1 Identification of microsatellites (SSR)
3.4.2 Surveying of BAC-gSSR
3.5 Experiment No. 5. Genetic and QTL mapping of fiber traits using inter-specific (G. hirsutum x G. barbadense) F2 population
3.5.1 Parental polymorphism
3.5.2 Linkage analysis and map construction
3.5.3 Phenotypic analysis of productivity/taxonomic/fiber traits in F2:3 families
3.5.4 QTL analysis
3.5.4.1 Mapping QTL for productivity/taxonomic traits
3.5.4.2 Description of newly identified QTLs
3.5.4.2.1 Staple length
3.5.4.2.2 Uniformity index
3.5.4.2.3 GOT%
3.5.4.2.4 Marker association with traits using single marker analysis (SMA)
3.5.4.2.5 Genes flanking QTL-linked markers
3.5.5 Marker assisted selection (MAS)
4 Discussion
4.1 Genetic diversity and relationship of diploid and tetraploid species using EST-SSRs and gSSRs
4.1.1 Polymorphism in microsatellite region
4.1.2 Genetic characterization
4.1.3 Performance of microsatellite between A, D and AD-genome species
4.1.4 Cross species amplification and genome specificity
4.1.5 Genetic relationship of tetraploid species with their wild relatives
4.1.6 Genetic diversity and phylogenetic relationship in the genus Gossypium
4.2 Identification of differentially expressed genes involved in fiber development
4.3 Isolation of fiber related sequences from different developmental stages of G. arboreum
4.3.1 Isolation of fiber related sequences
4.3.2 Screening of BAC libraries of different genomes (AD, A, D and K genomes)
4.3.3 Analysis of BAC end sequences (BES) and Contigs assembly
4.4 BAC derived new SSRs for use in cotton (Gossypium spp) improvement
4.4.1 Identification of genomic SSR markers in G. raimondii
4.4.2 Polymorphism detected with BAC-gSSRs
4.5 Genetic mapping
4.5.1 Parental polymorphism
4.5.2 Linkage analysis and map construction
4.5.3 QTL detection
4.5.4 Marker Assisted selection (MAS) for fiber quality traits
4.5.5 Identification of candidate genes involved in staple length
4.6 Conclusions
References
Appendix


LIST OF FIGURES

Figure 1.1 Map showing countries which are members of ICAC
Figure 3.1 Frequency distribution of 109 alleles in 36 cotton species/landraces
Figure 3.2 Distribution of fragment sizes amplified by SSRs
Figure 3.3 Phylogenetic analysis of 36 cotton species with combined data set of ESTs and gSSRs
Figure 3.4 Clustering of species with EST-SSRs
Figure 3.5 Clustering of species with gSSRs
Figure 3.6 RNA isolated from ovules of mutant and normal floral buds 3 dpa
Figure 3.7 Differential display (DDRT-PCR) from normal (fiber producing) and mutant (fiberless) 3 dpa ovules of Gossypium hirsutum
Figure 3.8 Amino acid of six fragments with known genes
Figure 3.9 RNA from ovules and leaf of G. arboreum
Figure 3.10 Nucleotide sequence homology of four fragments amplified with gene specific primers with known genes
Figure 3.11 Both probes showed maximum numbers of hits in G. arboreum BAC library
Figure 3.12 Genome specific nucleotide variation
Figure 3.13 Phylogenetic tree of cotton species
Figure 3.14 Polymorphic BAC-gSSRs between tetraploid species
Figure 3.15 Segregation of BAC-gSSR 48 in 13 plants of F2 population derived from a cross of FH-1000 x PGMB-36
Figure 3.16 Genetic linkage map constructed using inter-specific cross FH-1000 x PGMB-36 derived from F2 population
Figure 3.17 Phenotypic distribution of (A) boll length, (B) boll width, (C) bracteole length, (D) bracteole width, (E) fiber elongation and (F) uniformity index in F2:3 families
Figure 3.18 Phenotypic distribution of (G) staple length, (H) fiber strength, (I) GOT and (J) fiber fineness in F2:3 families
Figure 3.19 Position of QTLs associated with staple length (red) and uniformity index (yellow) on genetic linkage map, and physical locations (Mb) of markers linked to these QTLs and putative genes in their flanking regions
Figure 3.20 Position of QTLs associated with staple length (red) and uniformity index (yellow) on genetic linkage map, and physical locations (Mb) of markers linked to these QTLs and putative genes in their flanking regions
Figure 3.21 Position of QTLs associated with staple length (red) and uniformity index (yellow) on genetic linkage map, and physical locations (Mb) of markers linked to these QTLs and putative genes in their flanking regions
Figure 3.22 Position of QTLs associated with uniformity index (yellow) on genetic linkage map, and physical locations (Mb) of markers linked to these QTLs and putative genes in their flanking regions
Figure 3.23 Position of QTLs associated with GOT%age (green) and uniformity index (yellow) on genetic linkage map, and physical locations (Mb) of markers linked to these QTLs and putative genes in their flanking regions


LIST OF TABLES

Table 1.1 DNA-based genetic maps of tetraploid cotton
Table 2.1 Cotton species included in the study
Table 2.2 Concentration of reagents used in Polymerase chain reaction
Table 2.3 Thermal cycler was programmed using the following profile
Table 2.4 Concentration of PCR reagents
Table 2.5 PCR profile
Table 2.6 Primer sequences used in the synthesis of cDNA
Table 2.7 Growth parameters for raising of cotton seedlings
Table 2.8 Primer sequences
Table 2.9 Concentration of PCR reagents for fiber related genes
Table 2.10 PCR profile
Table 2.11 Protocol for ligation reaction
Table 2.12 Protocol for restriction reaction
Table 2.13 Sequences of Overgo probes
Table 2.14 Primers sequences used for sequencing of positive clones
Table 2.15 Concentration of reagents used in sequencing reaction
Table 2.16 Profile of thermal cycler
Table 2.17 Cotton species used to study BAC-gSSRs polymorphism
Table 3.1 Markers with repeat motif type, number of alleles, frequencies and PIC value
Table 3.2 Most informative SSRs with their position on
Table 3.3 Transferability of G. hirsutum derivative SSRs in other Gossypium species/genomes
Table 3.4 Genome and species-specific amplification features of SSRs
Table 3.5 Genetic similarity coefficient between tetraploid species and A-/D-genome species
Table 3.6 Genetic similarity coefficients of 36 cotton species using two types of markers (EST-SSRs and gSSRs)
Table 3.7 Fragments identification and homologies with known genes
Table 3.8 Different libraries and their hit numbers by both probes
Table 3.9 Sequence alignment of EF 1-gamma of Arabidopsis thaliana and G. arboreum
Table 3.10 Frequency of different repeat motifs in newly identified set of BAC-gSSRs
Table 3.11 BAC-gSSRs polymorphic among four cotton species
Table 3.12 Percentage of polymorphism of BAC-gSSR in G. hirsutum cv. FH-1000 and G. barbadense cv. PGMB-36
Table 3.13 Genetic similarity coefficients of diploid and tetraploid cotton species using G. raimondii derived BAC-gSSRs
Table 3.14 Distribution of DNA markers on linkage map of cotton constructed using F2 population derived from inter-specific cross (FH-1000 x PGMB-36)
Table 3.15 Phenotypic performance for boll length, boll width, bracteole length, bracteole width, fiber length, fiber strength, fiber elongation, fiber fineness, maturity, uniformity index and GOT of parents and F2 mapping population from a cross of FH-1000 and PGMB-36
Table 3.16 Information regarding putative QTLs detected for boll surface, boll beak, boll length, staple length, uniformity index and GOT based on interval mapping
Table 3.17 Information regarding putative QTLs detected for staple length, uniformity index and GOT based on composite interval mapping
Table 3.18 Information regarding putative QTLs detected for boll surface/beak/length/width, bracteole length/width, staple length, uniformity index and GOT %age based on single marker analysis
Table 3.19 Physical locations of markers and putative genes in their flanking regions


Abstract


Diploid cotton species of the genus Gossypium are a rich genetic repository for many important traits, such as fiber strength, high fiber yield and high tolerance/immunity to viral diseases. It is, therefore, vital to assess the genetic diversity and phylogenetic relationships among the available cotton species before utilizing them in molecular breeding programs.

In this dissertation, we have demonstrated the application of genomic SSRs (gSSRs) and EST-SSRs, separately and as a combined data set, for resolving the phylogeny of 36 cotton species, including seven races. Out of the 100 primer pairs surveyed (50 gSSRs and 50 EST-SSRs), 75 produced scoreable amplification products in all species. Of these, 73 were polymorphic and amplified 135 alleles, ranging from 1 to 5 alleles per SSR marker (average 2.87 alleles per marker). The gSSRs amplified a higher number of alleles (72) than the EST-SSRs (63). In total, 22 highly informative SSRs with PIC values ≥0.5 were identified. Among the gSSRs, those containing dinucleotide repeats, and among the EST-SSRs, those containing trinucleotide repeats, exhibited higher polymorphism than markers containing other repeat types. The number of tandem repeats was positively correlated with polymorphism, whereas neither the type of chromosome nor the location of the SSR on the corresponding chromosome was associated with polymorphism. Gossypium herbaceum var. africanum and Gossypium robinsonii were found to be the most genetically diverse species, while among the races of Gossypium hirsutum L., ‘‘yucatanense’’ and ‘‘punctatum’’ were found to be genetically diverse. Of the three data sets, clustering analysis based on the EST-SSR and combined data sets gave results consistent with those reported in earlier studies. This study further confirmed that Gossypium darwinii is closely related to Gossypium barbadense L. Moreover, G. raimondii and G. herbaceum/G. arboreum L. are close living relatives of the ancestors of the allotetraploid species. Our studies suggest that EST-SSRs could be the better choice for resolving phylogenetic relationships among plant species. This information can be instrumental in transferring novel alleles or loci from wild cotton species into cultivated cotton species, setting the stage for genetically diverse cultivars as a way to achieve sustainable cotton production in a changing climate.
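As a brief illustration of the PIC threshold mentioned above, the following Python sketch computes the polymorphism information content of a marker from its allele frequencies. It assumes the commonly used formula of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the abstract does not state which estimator was used, and the marker names and allele frequencies below are hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """PIC of one marker from its allele frequencies.

    Assumed estimator (Botstein et al. 1980 form):
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("allele frequencies must sum to a positive value")
    p = [f / total for f in freqs]  # normalise so the frequencies sum to 1
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - homozygosity - correction


# Hypothetical allele frequencies (illustration only, not data from this study).
markers = {
    "marker_A": [0.5, 0.3, 0.2],
    "marker_B": [0.9, 0.1],
    "marker_C": [0.25, 0.25, 0.25, 0.25],
}

for name, freqs in markers.items():
    value = pic(freqs)
    label = "highly informative" if value >= 0.5 else "less informative"
    print(f"{name}: PIC = {value:.3f} ({label})")
```

Under this estimator a marker with many alleles at similar frequencies scores high, while a marker dominated by a single allele scores low, which is why the ≥0.5 cut-off singles out the most discriminating SSRs.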


Cotton fibers, being single cells, present an ideal system for studying cell differentiation and elongation. However, little is known about the genes involved in cotton fiber elongation because of the complex mechanisms of gene regulation. Genes responsible for cotton fiber elongation can play a vital role in the improvement of present-day cotton cultivars. In this investigation, we performed non-radioactive differential display reverse transcriptase PCR (DDRT-PCR) on total RNA isolated from normal (fiber producing) and mutant (fiberless) ovules of Gossypium hirsutum at 3 dpa (days post anthesis) to identify cotton fiber genes. The amplified products were resolved on an ethidium bromide stained 1% agarose gel. By screening combinations of 11 anchored primers and 15 arbitrary primers, 23 cDNA transcripts were found to be up-regulated. Of these 23 transcripts, ten were false positives; DNA was extracted from the remaining 13 transcripts, re-amplified and sequenced. BLAST search analysis showed that six transcripts differentially expressed in normal (fiber producing) ovules had homology with phosphatase, zinc finger protein, glycosyltransferase and β-tubulin.

Although differential display is a powerful tool for identifying genes, some of the most common genes are not recovered by this technique. To recover such genes, gene specific primers designed from one species are used to identify genes in another species. In this investigation, we isolated cotton fiber specific partial cDNA sequences from G. arboreum using gene specific primers based on sequences already reported from G. hirsutum. All of these sequences are involved in cotton fiber development. A partial sequence of translation elongation factor 1-gamma recovered from G. arboreum was used to screen four cotton BAC libraries, from three diploid species (G. arboreum, G. raimondii, G. kirkii) and one tetraploid species (G. hirsutum). The sequences obtained were aligned to find SNPs and to carry out phylogenetic analysis. The gene was found to be a multi-copy gene, and the different copies found in the same genome grouped together in the phylogenetic analysis. SNPs specific to the respective genomes were detected at eight positions.

In the last part of the present study, new SSRs were derived from BAC sequences to develop new DNA markers. This new set of SSRs, with di-, tri-, tetra-, penta- and hexa-nucleotide repeats, was mined from bacterial artificial chromosome (BAC) end sequences and BAC clone sequences of Gossypium raimondii. A total of 1294 SSR primer pairs were designed: 560 from BAC end sequences and 734 from BAC clone sequences. These primer pairs were named PR-GR-BESS and PR-GR-BS (PR for the last names of the two principal investigators, GR for Gossypium raimondii, BES for BAC end sequences, B for BAC clone and S for simple sequence repeat). From here onward they are called BAC-gSSRs in this dissertation. This new set of G. raimondii derived BAC-gSSRs was tested for transferability to other important cotton genomes, including the cultivated tetraploids (G. hirsutum and G. barbadense) and the diploid ancestral species (G. raimondii and G. arboreum). The BAC-gSSRs contained diverse types of repeat motifs: 722 were dinucleotide (55.80%), followed by tri- (397, 30.68%), tetra- (118, 9.11%), hexa- (40, 3.09%) and pentanucleotide (17, 1.31%) repeats. Hexanucleotide repeats showed the highest level of polymorphism (42.5%), followed by penta- (35.29%), tetra- (30.51%), tri- (17.88%) and dinucleotide (14.54%) repeats. A total of 313 (30%) of the SSRs amplified two fragments, which were separated by high resolution MetaPhor agarose gel electrophoresis. A total of 30% of the primers were unable to amplify clear fragments in G. arboreum. These primers produced some private alleles in G. raimondii and the AD genome species, indicating the specificity of these SSRs for the D-genome. For genetic diversity assessment, PIC values were calculated. The average PIC value was 0.39, with a range of 0.12 to 0.85. G. arboreum was found to be closer to G. barbadense (0.63) than to G. hirsutum (0.57), while G. raimondii showed equal genetic similarity to G. hirsutum (0.59) and G. barbadense (0.59). This study will be instrumental in developing dense genetic maps and in initiating marker-assisted selection (MAS) in cotton.
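The share of each repeat motif class can be recomputed directly from the counts given above. The short Python sketch below does this; the counts come from the text, while the dictionary keys and the function name are only illustrative.

```python
# Counts of BAC-gSSR repeat motif classes as given in the abstract.
motif_counts = {"di": 722, "tri": 397, "tetra": 118, "penta": 17, "hexa": 40}


def motif_shares(counts):
    """Return each motif class as a percentage of all markers."""
    total = sum(counts.values())
    return {motif: 100.0 * n / total for motif, n in counts.items()}


total_markers = sum(motif_counts.values())
shares = motif_shares(motif_counts)

print(f"total markers: {total_markers}")
# List the classes from most to least frequent.
for motif, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{motif:>5}: {motif_counts[motif]:4d} ({pct:.2f}%)")
```

A check of this kind confirms that the five classes add up to the 1294 primer pairs designed and that the percentages sum to 100.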

For genetic mapping, F2 and F2:3 mapping populations derived from an interspecific cross of G. hirsutum cv. FH-1000 x G. barbadense cv. PGMB-36 were evaluated for 18 productivity traits. The continuous variation of the F2 plants and F2:3 families for all the traits indicated that the measured traits were quantitatively inherited. Transgressive segregation observed in both directions indicated that both parents transmitted favourable alleles for each trait. The 1294 newly identified G. raimondii derived BAC-gSSRs and 20 EST-derived SSRs were surveyed on the parent species. None of the EST-SSRs was polymorphic, while 235 (22.53%) BAC-gSSRs were polymorphic. Out of the 235 polymorphic BAC-gSSRs, 73 were used to assay the entire F2 population of 131 individuals. Linkage analysis mapped 61 loci on 9 linkage groups (LG) ranging from 12 to 92.7 cM in length. The remaining 12 markers were unlinked. None of the linkage groups could be assigned a chromosome number with the help of already mapped markers; we therefore tried to assign chromosome numbers to the linkage groups through BLAST searches of the sequences from which these SSRs were designed against the whole genome shotgun (WGS) sequence of G. raimondii. Chromosomes were assigned to two linkage groups, while the remaining seven linkage groups could not be assigned a chromosome number because the sequences from which their markers were designed showed homology with more than one chromosome. The map spanned a total of 322.9 cM, covering ~8% of the total G. hirsutum (2.5 Gb) genome, and the average distance between adjacent markers was 5.3 cM. For QTL mapping, the “interval mapping” and “composite interval mapping” procedures were used. Interval mapping, employed to determine the chromosomal location of genes affecting the productivity traits, yielded seven QTLs for six traits, while composite interval mapping detected four QTLs for three traits.
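The map statistics above can be checked with simple arithmetic. The Python sketch below assumes that the reported average spacing is the total map length divided by the number of mapped loci, which reproduces the 5.3 cM figure; an alternative convention divides by the number of marker intervals (loci minus linkage groups) and is shown only for comparison. The function names are illustrative.

```python
def spacing_per_locus(map_length_cm, n_loci):
    """Average spacing taken as total map length / number of mapped loci (assumed convention)."""
    return map_length_cm / n_loci


def spacing_per_interval(map_length_cm, n_loci, n_groups):
    """Alternative convention: total map length / number of intervals between loci."""
    n_intervals = n_loci - n_groups  # each linkage group has (loci - 1) intervals
    if n_intervals <= 0:
        raise ValueError("need more loci than linkage groups")
    return map_length_cm / n_intervals


# Values taken from the abstract.
assayed, mapped, unlinked = 73, 61, 12
map_length_cm, linkage_groups = 322.9, 9

print(f"markers accounted for: {mapped + unlinked} of {assayed}")
print(f"spacing per locus:    {spacing_per_locus(map_length_cm, mapped):.1f} cM")
print(f"spacing per interval: {spacing_per_interval(map_length_cm, mapped, linkage_groups):.1f} cM")
```

The two conventions differ by roughly 1 cM here, so it is worth stating which one is used when comparing marker density across maps.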

This study would be instrumental in transferring genetic information for desired traits from other cotton species into the cultivated cotton species G. hirsutum using marker-assisted selection.


Chapter 1: Introduction

INTRODUCTION

1.1 Cotton

The northern hemisphere dominates world cotton production, contributing ~91% of the total (ICAC, 2006). Cotton is an important cash crop, grown for its natural fiber and oil as well as for bio-energy production. Although America, Africa and Asia are the regions in which cotton evolved, it is now cultivated in almost one hundred countries. China, USA, India and Pakistan are the major cotton producing countries and together contribute two thirds of the cotton produced worldwide. Pakistan is the 4th largest cotton producing and consuming country. In Pakistan, cotton is grown on 3.0 million hectares, with an average yield of ~670 kg ha-1.

Cotton belongs to the genus Gossypium of the family Malvaceae. There are 51 species in the genus Gossypium; of these, 45 are diploids with a haploid chromosome number of n=13, while 6 are allotetraploids (2n=4x=52). Eight genomes, A through G and K, have been designated on the basis of chromosome pairing (Endrizzi et al., 1984). On a phylogenetic basis, the diploid species have been grouped into two main lineages: the 13 D-genome species, and the remaining 32 species, which represent the other genomes (A-, B-, C-, E-, F-, G- and K-). The previously reported tetraploid species (5 in number; AD1 to AD5) belong to one lineage (Wendel and Cronn, 2003). Only four species (G. hirsutum L., G. barbadense L., G. herbaceum L. and G. arboreum L.) are cultivated. Of these, G. hirsutum L. and G. barbadense L. are the major contributors to world cotton production, contributing 90% and 8%, respectively, whereas G. herbaceum L. and G. arboreum L. together contribute ~2% of the world’s cotton (Zhang et al., 2008).

Figure 1.1 Map showing cotton growing countries (members of ICAC)

1.2 Evolution of genome size in cotton

Over time, some plant genomes were duplicated (polyploidy); during duplication, intra- and inter-chromosomal rearrangements of some genome segments occurred. These rearranged segments exhibit a similar degree of sequence divergence (Zheng et al., 2008).

In recent years, much genomic research has focused on comparing contiguous, homeologous segments of plant genomes, e.g. maize, rice and sorghum (San Miguel et al., 1996; Tikhonov et al., 1999; Paterson et al., 2009).

The term “C-value paradox” was introduced more than half a century ago (Mirsky and Ris, 1951). Extensive studies established that there is no relationship between the size and the complexity of genomes (Thomas, 1971; Claverie, 2000; Bertran and Long, 2002). Eukaryotic genome sizes vary more than 200,000-fold: the genome of Encephalitozoon cuniculi is 2.8 Mb (Biderre et al., 1998), whereas that of Navicula pelliculosa is >690,000 Mb (Li and Graur, 1991; Claverie, 2000). In spite of this remarkable variation in genome size, only about 20-fold variation was observed in the exon portion of genomes (Li, 1997). Thus, it became clear that the major cause of variation in genome size is introns (non-coding DNA). However, the mechanisms involved in such genome size variation are not yet clear (Oliver et al., 2007).
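The fold variation quoted above can be checked directly from the two genome sizes cited. The following Python sketch uses only the figures given in the text (2.8 Mb and the lower bound of 690,000 Mb); the function and variable names are illustrative and not part of any cited work.

```python
def fold_variation(smallest_mb: float, largest_mb: float) -> float:
    """Return how many times larger the largest genome is than the smallest."""
    if smallest_mb <= 0:
        raise ValueError("genome size must be positive")
    return largest_mb / smallest_mb

# Genome sizes quoted in the text, in Mb; 690,000 Mb is a lower bound.
SMALLEST_GENOME_MB = 2.8       # Encephalitozoon cuniculi
LARGEST_GENOME_MB = 690_000    # diatom genome cited in the text

genome_fold = fold_variation(SMALLEST_GENOME_MB, LARGEST_GENOME_MB)
print(f"Genome size varies at least {genome_fold:,.0f}-fold")
```

The result is consistent with the ">200,000 fold" statement, in contrast to the roughly 20-fold variation reported for the exon portion.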

The main forces for genome size expansion in plants are polyploidization (Wendel, 2000) and transposable elements (Bennetzen, 2002). However, smaller-scale processes such as introns of different sizes (Deutsch and Long, 1999; Vinogradov, 1999), a high proportion of pseudogenes (Zhang et al., 2003) and integration of organelle genomes into the nuclear genome (Adams and Palmer, 2003; Shahmuradov et al., 2003) also play a role in genome size increase.

Although the mechanisms of genome size reduction are not well understood, some insertions/deletions and partial mutations (Petrov, 2002) in genomes (Kirik et al., 2000; Orel and Puchta, 2003) are thought to be involved in the process.

The genus Gossypium originated about 5-10 mya (Cronn et al., 2002; Wang et al., 2012) and shows about three-fold variation in genome size (Wendel and Cronn, 2003). Its genome sizes range from 880 Mb (G. raimondii L.) (Lin et al., 2010; Paterson et al., 2012) to 2.5 Gb (G. hirsutum L.) (Wendel and Cronn, 2003; Hendrix and Stewart, 2005). These genomes have conserved gene sequences. The A- and D-genomes shared a common ancestor about 10 mya, and these diploid genomes gave rise to the tetraploid species through hybridization about 1-2 mya (Cronn et al., 2002; Wendel and Cronn, 2003).

As in other crop plants, doubling of the chromosome number caused a variety of genomic responses in polyploid cotton, including interactions between genomes within a single (polyploid) nucleus. Activation of dispersed repetitive sequences after polyploidization in cotton has resulted in novel regulatory changes. Repetitive elements of the ancestral A-genome have been found in the D-genome of present-day polyploid cotton (Hanson et al., 1998; Zhao et al., 1998). It has been suggested that, after polyploid formation, the spread of transposable elements across genomes may have contributed to the adaptation and diversification of Gossypium. Intergenomic interactions did not appear at once; rather, they arose gradually over time as a result of hybridization and polyploidization (Hanson et al., 2000).

The simultaneous duplication of nuclear genes is considered to be the most important event associated with the formation of allopolyploids in cotton. Four outcomes are possible after gene duplication. First, the duplicated genes may retain their original function. Second, they may acquire new functions through divergence of the duplicates, which results from relaxation of selection (Ohno, 1970; Ferris and Whitt, 1979; Li, 1985; Hughes, 1994; Hughes et al., 2000; Lynch and Conery, 2000; Lynch and Force, 2000).

Third, one member of the duplicated gene pair may be silenced (Lynch and Conery, 2000; Wendel, 2000). Fourth, sub-functionalization may occur, in which the expression pattern is divided between the members of the duplicated pair (Force et al., 1999; Lynch and Force, 2000). Through evolution, the repeats present at different loci of the diploid genomes were homogenised in allopolyploids to the same sequence (either ‘A-like’ or ‘D-like’) (Elder and Turner, 1995). Pseudogenes show a high degree of nucleotide diversity, whereas functional genes show less diversity because of the selection pressure imposed during evolution. Because of the co-linearity of its two genomes, allopolyploid cotton is considered a suitable model for such studies. The evolutionary rate is faster in D-genome than in A-genome sequences (Adams and Wendel, 2004).

Although evolutionary changes have played a major role in shaping gene and genome structure, gene doubling also has a significant effect on gene expression. Limited work has been undertaken to determine the effects of polyploidy in cotton. Hence, there is a need to explore the cotton genome to study its evolutionary consequences.

1.3 Cotton and Arabidopsis

The family Malvaceae, which includes cotton, is considered to be the closest relative of Arabidopsis (Rong et al., 2005); cotton is therefore an ideal plant for comparative genomic studies. Cotton also provides an excellent opportunity for studying polyploidization. The complexity of polyploidy can be decoded by understanding the comparative evolution of chromosome structure in Gossypium and Arabidopsis (Rong et al., 2005). Cotton and Arabidopsis diverged from a common ancestor about 83–86 mya (Bowers et al., 2003); according to more recent studies, cotton diverged from Theobroma cacao about 60 mya (Paterson et al., 2012), whereas cereals diverged only about 20–30 mya (Gaut et al., 1996; Koch et al., 2000). A number of studies have shown that duplicated segments in Arabidopsis are similar to the ancient duplicated regions of cotton (Bowers et al., 2003).

Studying the relationship between cotton and Arabidopsis has practical implications. The genetic control of “seed borne epidermal fibers” in cotton can be better understood by studying the growth and development of Arabidopsis “trichomes” (Larkin et al., 2003; Schiefelbein, 2003). The possible arrangement of mapped genes in the hypothetical common ancestor of cotton and Arabidopsis can be inferred from the available linkage maps (Rong et al., 2004). Cotton is an excellent model plant for comparative studies of gene order between divergent taxonomic families because of its close relationship with Arabidopsis and the availability of detailed genetic maps (Rong et al., 2004) and functional genomics information.

1.4 Need for the improvement of cotton genomic resources

Cotton is an important crop for both its natural fiber and its seed (Rahman et al., 2005, 2008a). Cotton seeds are a major source of edible oil and of industrial by-products such as soap, and are also used as animal feed.

Cotton is an excellent system for studying many valuable biological processes (Zhang et al., 2008), such as cell expansion and cellulose biosynthesis. Being a polyploid, cotton offers a unique opportunity for studying the role of polyploidization in genome structuring and angiosperm evolution (Grant, 1981; Leitch and Bennett, 1997; Zhang et al., 2008).

Cotton production must be improved to sustain the economy of Pakistan, because cotton accounts for >60% of the country's total foreign exchange (GOP, 2007). Production is depressed by many biotic (insect pest attack) and abiotic (water shortage) stresses. Cultivated diploid A-genome cotton is resistant to many biotic (Mehetre et al., 2003) and abiotic stresses, whereas upland cotton is highly susceptible to them.

1.5 Marker systems for genetic mapping

Segregating markers provide information about the allelic status (maternal homozygous, paternal homozygous or heterozygous) of each individual in a segregating population. Construction of a genetic linkage map involves assigning markers to linkage groups on the basis of recombination frequencies estimated from the genotypes of a cross, and estimating the genetic distances between them (Kearsey and Pooni, 1996).
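As an illustration of the mapping step described above, the following Python sketch estimates the recombination frequency between two markers and converts it into a map distance. The genotype coding ('A' for homozygous, 'H' for heterozygous, '-' for missing), the backcross-type toy data and the choice of the Haldane and Kosambi mapping functions are assumptions made for this example; the text does not specify a population type or a mapping function.

```python
import math

def recombination_frequency(marker1: str, marker2: str) -> float:
    """Fraction of individuals whose genotypes differ between two markers.

    Assumes a backcross-type population scored 'A' or 'H' at each marker,
    so a mismatch indicates a recombinant. Missing scores ('-') are skipped.
    """
    pairs = [(a, b) for a, b in zip(marker1, marker2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no individuals scored at both markers")
    recombinants = sum(1 for a, b in pairs if a != b)
    return recombinants / len(pairs)

def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans (assumes no interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centimorgans (allows for interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

# Hypothetical scores for 20 individuals at two markers (2 recombinants).
marker_a = "AAHHAHAHHAAHAHHAHAAH"
marker_b = "AAAHAHAHHAAHAHHAHAAA"

r = recombination_frequency(marker_a, marker_b)
print(f"r = {r:.2f}, Haldane = {haldane_cm(r):.1f} cM, Kosambi = {kosambi_cm(r):.1f} cM")
```

Markers whose pairwise recombination frequency is well below 0.5 would be candidates for the same linkage group; dedicated mapping software additionally tests the significance of linkage, which this sketch omits.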

1.5.1 Morphological and Biochemical markers

Morphological markers are based on the phenotype (visual observation) of an individual. In a number of studies these markers have been employed for diversity analysis (Tatineni et al., 1996) and for linkage mapping (Franckowiak, 1997), but QTL analysis with them is difficult because of their recessive nature, developmental stage specificity, vulnerability to environmental effects and possible epistatic effects (Mohan et al., 1997).

Later, in the 1980s, biochemical markers were used for QTL mapping (Weller et al., 1988). The first biochemical locus mapped in cotton, on the long arm of chromosome 12, was Pgm7, which encodes a monomeric phosphoglucomutase isozyme (Saha and Stelly, 1994). Biochemical markers were more useful than morphological markers (Tanksley, 1983, 1993).

1.5.2 DNA based markers

1.5.2.1 Advantages of DNA-based markers over other markers

DNA-based markers are more abundant than morphological and biochemical markers and have unique characteristics. Their advantages include (1) more efficient selection of (a) true-to-type genotypes at an early stage of plant growth, (b) plants with desirable traits and (c) traits of low heritability; (2) pyramiding of multiple useful genes; and (3) monitoring of the introgression of genes from wild to cultivated species.

DNA markers are divided into hybridization-based and PCR-based markers.

1.5.2.2 Hybridization based markers

When genomic DNA is digested with different restriction enzymes, DNA fragments of different lengths are produced, which reveals polymorphisms (Jeffreys et al., 1985). Restriction fragment length polymorphisms (RFLPs) can be identified by analysing these fragments (Shappley et al., 1998a). RFLP markers are co-dominant in expression, reliable and simply inherited (Kohel et al., 2001; Mei et al., 2004). These markers have been exploited extensively for genetic diversity and genome mapping studies in a number of crops (Dudley, 1993; Giese et al., 1993; Schon et al., 1993; Lee et al., 1996). Probes derived from other species, called heterologous probes, are used in RFLP analysis for comparative mapping (Tanksley et al., 1988; Lagercrantz et al., 1996; Lagercrantz, 1998).

The first genetic map using RFLPs was constructed in humans by Botstein et al. (1980), which helped genomic studies of other organisms. Among crops, RFLP markers were first employed in tomato, using 57 loci (Bernatzky and Tanksley, 1986). A number of scientists have used RFLP markers in cotton (Meredith, 1992; Reinisch et al., 1994; Shappley, 1994; Mei et al., 2004; Ulloa et al., 2005). Polymorphisms at both the interspecific (Reinisch et al., 1994; Mei et al., 2004) and intraspecific (Van Becelaere et al., 2005) levels were analysed using RFLPs. RFLPs were also used for the study of heterosis and varietal identification in G. hirsutum L. (Meredith, 1992).

The major limitations of RFLPs are that they require large amounts of high-quality DNA, detect a low level of polymorphism, and are labour-intensive and time-consuming compared with PCR-based markers (Liu and Cordes, 2004).

1.5.2.3 PCR-based DNA markers

The polymerase chain reaction (PCR), invented by Mullis and Faloona in 1987, revolutionized the detection of polymorphisms at the genetic level in different organisms. PCR-based markers are faster, cost-effective, easy to use and universal, as shown by a number of applications in recent years (Caetano-Anollés, 1993; Rafalski and Tingey, 1993; McClelland et al., 1995; Schierwater, 1995; Caetano-Anollés, 1996; Rahman et al., 2012); in addition, these marker assays need smaller quantities of DNA than hybridization-based markers such as RFLPs (Williams et al., 1990).

1.5.2.3.1 PCR-based arbitrarily primed techniques

Several fingerprinting techniques based on arbitrary primers have been used: (1) random amplified polymorphic DNA (RAPD), (2) arbitrarily primed PCR (AP-PCR) and (3) DNA amplification fingerprinting (DAF). These techniques do not require prior sequence information. All three share the same basic principle and differ mainly in primer length (commonly 5 to 10 bases) and in the gel electrophoresis used: 10-mer primers are used in RAPD, primers of 7-8 bases (in some cases 5 bases) in DAF, and primers of 20 bases in AP-PCR. Primer and template concentrations are another important difference: the primer/template ratio is >5 for DAF and ≤2 for RAPD, while for AP-PCR it lies in between. DAF produces more amplicons than RAPD and AP-PCR. Amplicons of DAF, AP-PCR and RAPD are resolved by silver-stained polyacrylamide gel electrophoresis (PAGE), autoradiography and ethidium bromide-stained agarose gel electrophoresis, respectively.

Besides the arbitrarily primed PCR-based techniques mentioned above, microsatellites or simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) are the most elemental PCR-based DNA markers. Both require prior sequence information, and their amplicons are resolved by ethidium bromide-stained MetaPhor agarose gel electrophoresis. Both are preferred over other marker systems because of their co-dominance.

1.5.2.3.2 Random amplified polymorphic DNA (RAPD)

In RAPD analysis, randomly primed genomic regions are used to identify polymorphisms (Bassam et al., 1995; Livak et al., 1995; McClelland et al., 1996). RAPD has a number of advantages: (1) non-radioactive detection, (2) applicability of the same primers to any genome, (3) a low DNA requirement, (4) simplicity and (5) low cost, apart from the thermocycler and transilluminator (Rafalski, 1997).

As in many other crop species, RAPD analysis has been performed in cotton to study phylogenetic relationships and genetic diversity among cotton species/cultivars (Khan et al., 2000; Rahman et al., 2002a, 2008; Wu et al., 2007b). RAPD markers have also been used for the identification of cultivars/genes (Linos et al., 2002; Lu and Myers, 2002) and hybrids (Dongre and Parkhi, 2005; Rahman et al., 2005a) in cotton.

The major limitations of RAPD are (1) poor reproducibility, (2) low reliability and (3) inability to distinguish between different fragments of similar size (Mukhtar et al., 2002; Rahman et al., 2009).

1.5.2.3.3 Amplified fragment length polymorphism (AFLP)

In AFLP, genomic DNA is digested with restriction enzymes and then used as the template in PCR, which helps in exploring complex genomes. Only a small quantity (20-500 ng) of genomic DNA is needed for this analysis. Two restriction enzymes, EcoRI and MseI (or TaqI), are commonly used to digest the genomic DNA. Adapters are designed so that the original restriction site is not restored after ligation, which permits simultaneous restriction and ligation.

AFLP is a useful technique for identifying DNA markers associated with target genes and has been used in a number of mapping studies in different crops (Xu et al., 2000; Hao et al., 2005; Raboin et al., 2006; Semagn et al., 2006b; Grimmer et al., 2007; Robbins and Staub, 2009). Because they are highly polymorphic, AFLPs have been used for genetic diversity studies in many crops (Hashimoto et al., 2004; Moghaddam et al., 2005; Beyene et al., 2006; Allinne et al., 2007; Dong et al., 2007; Jiang et al., 2007; Ley and Hardy, 2013). Furthermore, AFLPs are helpful for cultivar identification (McGregor et al., 2002; Sobotka et al., 2004) and phylogenetic studies (Xu and Ban, 2004).

AFLPs produce a higher number of polymorphic loci than RFLPs, SSRs or RAPDs (Maughan et al., 1996). Like RAPD markers, AFLPs are dominant in expression: polymorphism is scored as the presence or absence of a band.
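Because dominant markers are scored as band presence or absence, AFLP (or RAPD) profiles are naturally represented as binary vectors. The Python sketch below shows one way such scores could be handled: it lists the polymorphic loci in a small set of profiles and computes a pairwise similarity. The genotype names and band scores are hypothetical, and the Jaccard coefficient is used here only as one common choice for dominant data; the text does not prescribe a particular coefficient.

```python
from typing import Dict, List

def polymorphic_loci(profiles: Dict[str, List[int]]) -> List[int]:
    """Indices of loci where the band is present in some genotypes and absent in others."""
    n_loci = len(next(iter(profiles.values())))
    return [
        i for i in range(n_loci)
        if len({bands[i] for bands in profiles.values()}) > 1
    ]

def jaccard_similarity(a: List[int], b: List[int]) -> float:
    """Shared bands divided by bands present in either profile (joint absences ignored)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0

# Hypothetical band scores (1 = band present, 0 = band absent) at six loci.
profiles = {
    "genotype_1": [1, 1, 0, 1, 0, 1],
    "genotype_2": [1, 0, 0, 1, 1, 1],
    "genotype_3": [1, 1, 0, 0, 0, 1],
}

print("Polymorphic loci:", polymorphic_loci(profiles))
print("Similarity 1 vs 2:", jaccard_similarity(profiles["genotype_1"], profiles["genotype_2"]))
```

Ignoring joint absences is a deliberate choice for dominant markers, since a shared missing band need not reflect a shared allele.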

AFLPs were first reported in cotton by Reddy et al. (1997). Since then they have been used successfully in a number of studies for estimating the extent of genetic diversity (Pillay and Meyers, 1999; Murtaza et al., 2005; Wang et al., 2007c), understanding evolutionary relationships (Abdalla et al., 2001; Zhong et al., 2002), identifying disease resistance gene homologues (RGHs) (Niu et al., 2006) and detecting fiber QTLs (Jiang, 2004). In addition, some AFLP markers specific to G. tomentosum L. and G. hirsutum L. have been identified (Hawkins et al., 2005).

The major drawback of AFLP markers is that they are dominant in expression, which makes it difficult to identify the different alleles of a specific locus. Some AFLP markers are co-dominant, but their frequency is only 4-15% (Waugh et al., 1997; Lu et al., 1998; Boivin et al., 1999). These markers are also being used with some modifications.

In cotton, the use of AFLPs in genetic linkage map construction is limited because the occurrence of AFLPs is relatively low. Zhang et al. (2005) therefore proposed cleaving the AFLP fragments with frequent-cutting restriction enzymes, an approach called cleaved AFLP (cAFLP) analysis.

1.5.2.3.4 Cleaved amplified fragment length polymorphism

The cAFLP procedure is the same as that of AFLP. However, because AFLPs are comparatively few in the cultivated tetraploid species G. hirsutum L. and G. barbadense L., and are therefore rarely used for genetic map construction, the AFLP amplicons are additionally cleaved with restriction enzymes, which increases polymorphism. Zhang et al. (2005) found that cAFLP increased polymorphism in G. hirsutum L. and G. barbadense L. by 67% and 132%, respectively. To increase the number of polymorphic markers for genotyping and mapping studies, amplicons of AFLP and cAFLP can be pooled before electrophoresis.

1.5.2.3.5 Microsatellites or Simple sequence repeats (SSRs)

Simple sequence repeats (SSRs) are short (2 to 6 nucleotides), tandemly repeated DNA sequences. They were first reported in humans (Litt and Luty, 1989) and soon afterwards in plants (Condit and Hubbell, 1991). SSRs detect a high level of polymorphism (Lagercrantz et al., 1993; Powell et al., 1996). The frequency of SSRs is lower in plants than in mammals (Lagercrantz et al., 1993; Morgante and Olivieri, 1993). These small repetitive DNA sequences are evenly distributed throughout the genome.
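The definition above (motifs of 2 to 6 nucleotides repeated in tandem) translates directly into a simple pattern search. The following Python sketch scans a DNA sequence for perfect SSRs using a regular expression. The minimum of four repeats, the function name and the example sequence are assumptions made for illustration; published SSR-mining tools typically apply motif-specific thresholds and also handle imperfect repeats.

```python
import re
from typing import List, Tuple

def find_ssrs(sequence: str, min_repeats: int = 4) -> List[Tuple[int, str, int]]:
    """Find perfect tandem repeats of 2-6 nt motifs.

    Returns (start position, motif, repeat count) for each hit. The lazy
    quantifier makes the regex prefer the shortest motif, so (AG)6 is
    reported as a dinucleotide repeat rather than as (AGAG)3.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip mononucleotide runs such as AAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Hypothetical sequence containing an (AG)6 and a (CTT)5 repeat.
example_sequence = "GGCT" + "AG" * 6 + "TTCA" + "CTT" * 5 + "GACC"

for start, motif, count in find_ssrs(example_sequence):
    print(f"({motif}){count} at position {start}")
```

Primers for PCR-based SSR markers would then be designed from the sequences flanking each hit.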

In plant genomes, the percentage of SSRs ranges from 0.37% in maize to 0.85% in Arabidopsis (Morgante et al., 2002). Most plant genomes have a higher proportion of AG/CT-type dinucleotide repeat motifs than AC/GT, whereas the reverse holds in the human genome (Morgante et al., 2002). Compared with other types of SSRs, trinucleotides are found more frequently in cotton (Reddy et al., 1997; Han et al., 2006) and in Arabidopsis, wheat, maize, rice and soybean (Morgante et al., 2002).

Like RFLPs, SSRs can be detected by hybridization with short probes. To enhance efficiency, SSRs are converted into PCR-based markers by designing primers from the sequences flanking the microsatellites. Compared with RFLPs, SSRs are more polymorphic, more amenable to automation and require smaller amounts of genomic DNA (Mitchell et al., 1997).

Variation in amplicon size is due to variation in the number of SSR repeat units, which can easily be measured by electrophoresis followed by staining with ethidium bromide (Ellegran, 1993). Among crop plants, variation in the length of amplified fragments was first reported in soybean (Akkaya et al., 1992). SSRs show a high level of polymorphism at an individual locus. The highest numbers of alleles identified at a single microsatellite locus are 37 in barley (Saghai Maroof et al., 1994) and 26 in soybean (Rongwen et al., 1995).

SSRs are more reproducible than the other marker systems. They are valuable for mapping different genomes because they are easily exchanged among research laboratories (Dib et al., 1996; Dietrich et al., 1996; Schmidt and Heslop, 1998). The precise size of the amplified fragments is sometimes difficult to assess by fractionation on agarose gels; however, MetaPhor agarose (2% MetaPhor and 2% ultra pure agarose) can overcome this issue. MetaPhor agarose can separate alleles differing by ≤3 bp (Karp et al., 2001).

Genome maps based, at least in part, on SSRs exist for a number of plant species, such as Arabidopsis (Bell and Ecker, 1994; Wang et al., 2011), tomato (Broun and Tanksley, 1996; Foolad, 2007; Subramaniam et al., 2011), tetraploid potato (Bradshaw et al., 1998, 2008; McCord et al., 2011), rice (Cho et al., 2000; McCouch et al., 2002; Rangel et al., 2007; Angaji et al., 2010; Suh et al., 2010; Xi et al., 2011), wheat (Borner et al., 2000; Torada et al., 2006; Gupta et al., 2008; Gadaleta et al., 2009; Molnar et al., 2010; Adhikari et al., 2012), barley (Varshney et al., 2006; Varshney et al., 2007b; Sato et al., 2009; Shavrukov et al., 2010; Wang et al., 2010) and many other species (Gyapay et al., 1994; Powell et al., 1996; Sverdlov et al., 1998; Yu et al., 2002; Druka et al., 2010; Cho et al., 2012).

1.5.2.3.6 SSRs for genetic diversity and mapping studies

SSRs have been employed in multiple genetic diversity studies, for example in alfalfa (Flajoulot et al., 2005), white clover (George et al., 2006), rice (Thomson et al., 2007), wheat (Wang et al., 2007b), groundnut (Tang et al., 2007), maize (Yao et al., 2007; Lu et al., 2009) and cotton (Tabbasam et al., 2013). SSRs have also been used for mapping genes such as a female sterile mutant gene in soybean (Kato and Palmer, 2004), a gene conferring leaf blight resistance in sorghum (Mittal and Boora, 2005), a stripe rust resistance gene in wheat (Li et al., 2006), aluminium resistance in alfalfa (Narasimhamoorthy et al., 2007), resistance to root knot nematode (Wang et al., 2006a; Ulloa et al., 2010) and to race 1 of Fusarium (Ulloa et al., 2011), and in cotton genetic mapping (Gao et al., 2004, 2006; Guo et al., 2007a; Lacape et al., 2009, 2010; Yu et al., 2011, 2012).

A number of approaches have been reported for increasing the number of SSRs in cotton, including genomic libraries (Nguyen et al., 2004), BAC clones (Lichtenzveig et al., 2005) and, more recently, genome sequence information (Paterson et al., 2012).

EST-SSRs have also been identified in a number of crops, including barley (Thiel et al., 2003), wheat (Guo et al., 2003; Gupta et al., 2003), cotton (Guo et al., 2006a; Han et al., 2006) and other plant species (Poncet et al., 2006; Asp et al., 2007). EST-derived SSRs are useful for mapping genes of known function.

Over the past years, a large number of genome-derived (Reddy et al., 2001; Yu et al., 2002; Nguyen et al., 2004; Frelichowski et al., 2006; Wang et al., 2007c) as well as EST-derived (Saha et al., 2003; Han et al., 2004; Qureshi et al., 2004; Changbiao et al., 2006; Han et al., 2006; Taliercio et al., 2006; Guo et al., 2007b) SSRs have been identified in cotton. Recently, whole genome sequencing of G. raimondii L. has provided the opportunity to find SSRs at the whole genome level (Paterson et al., 2012; Wang et al., 2012).

The availability of a large number of SSRs has opened new avenues for genetic and genomic studies in cotton, such as hybrid identification (Dongre and Parkhi, 2005) and genetic diversity assessment among diploid (Liu et al., 2006; Stella et al., 2009) and tetraploid germplasm (Liu et al., 2000a; Lacape et al., 2007; Wang et al., 2007d; Wu et al., 2007b), commercial cultivars (Gutierrez et al., 2002; Rahman et al., 2002a; Zhang et al., 2005a; Bertini et al., 2006; Mumtaz et al., 2010) and cotton species (Abdalla et al., 2001; Wu et al., 2007; Kalivas et al., 2011; Tabbasam et al., 2013).

1.6 Transcriptome analysis of cotton fiber development

1.6.1 Cotton fiber

Vegetative organs of plants bear trichomes of different sizes and origins (Werker, 2000), which protect plants from biotic and abiotic stresses and aid in water absorption and seed dispersal. Cotton seed trichomes are single-celled and have considerable economic importance.

Cotton trichomes, or fibers, differentiate from epidermal cells of the seed coat; they range in length from 30 to 40 mm and have a thickness of 15 μm (Saima et al., 2008).

1.6.2 Fiber Properties

The physical properties of the fiber (length, strength, fineness) determine lint quality. These properties play an important role in assessing the economic value and determining the end use of cotton. Lint quality is thus extremely important for the textile industry (Wang et al., 2012).

1.6.2.1 Fiber length

Fiber length is the characteristic most important to the textile industry, because longer fibers twist around each other more times, which allows yarns of higher count to be spun. Fiber length varies considerably among genotypes, within a genotype and even on a single seed (Behery, 1993; May, 2000): fibers are shorter at the pointed end of the seed and longer at the chalazal end. The duration of plasmodesmatal closure and environmental factors (Bradow et al., 1997a, b) may affect fiber length, and the effect depends on the genotype and on the location on the seed. Floral anthesis is the stage most sensitive to fluctuations in environmental conditions. During fiber elongation, suboptimal environmental conditions depress the elongation rate or shorten the elongation period, which ultimately reduces the potential fiber length (Hearn, 1976). In an experiment on the influence of temperature on fiber length, Gipson and Joham (1969) found that when the temperature was lowered from 27.2 °C to 10.0 °C, the fiber elongation rate decreased while the elongation period increased. The sensitivity of the fiber to temperature varies with its age, and fibers are most sensitive up to 15 days of age.

Fiber characteristics vary from season to season and from location to location. They are optimal at a temperature of 30-31 °C, 6-10 h of sunshine, 65-74% air humidity and 2-4 mm of precipitation.

Currently, instruments for direct measurement of fiber length are not available; however, the High Volume Instrument (HVI) and the Advanced Fiber Information System (AFIS) measure length uniformity and short fiber content. The two instruments differ in principle: HVI calculates fiber length uniformity as the ratio of the mean length to the upper half mean length, expressed as a percentage, whereas AFIS calculates length uniformity from the coefficient of variation of length by number and of length by weight.
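The two uniformity measures described above reduce to simple formulas, sketched below in Python. The first function follows the HVI definition given in the text (mean length divided by upper half mean length, as a percentage); the second computes a coefficient of variation from individual fiber lengths, in the spirit of the AFIS measure. All numerical values are hypothetical, and the use of the sample standard deviation is an assumption; the instruments' internal calculations are not described in the text.

```python
import statistics
from typing import Sequence

def uniformity_index(mean_length_mm: float, upper_half_mean_length_mm: float) -> float:
    """HVI-style length uniformity: mean length / upper half mean length, in percent."""
    if upper_half_mean_length_mm <= 0:
        raise ValueError("upper half mean length must be positive")
    return 100.0 * mean_length_mm / upper_half_mean_length_mm

def length_cv_percent(lengths_mm: Sequence[float]) -> float:
    """Coefficient of variation of fiber length, in percent (sample standard deviation)."""
    return 100.0 * statistics.stdev(lengths_mm) / statistics.mean(lengths_mm)

# Hypothetical HVI readings for one lint sample (mm).
ui = uniformity_index(mean_length_mm=23.8, upper_half_mean_length_mm=28.0)

# Hypothetical individual fiber lengths (mm) for the CV calculation.
fiber_lengths = [20.0, 24.0, 26.0, 28.0, 30.0, 22.0]
cv = length_cv_percent(fiber_lengths)

print(f"Uniformity index: {ui:.1f}%  Length CV: {cv:.1f}%")
```

A higher uniformity index and a lower coefficient of variation both indicate a more uniform length distribution.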

1.6.2.2 Fiber strength

Yarn strength is directly related to fiber strength. Fiber strength is determined mainly by genetics, although environmental conditions also affect it.

Fiber durability depends on fiber strength, which is important for ginning and yarn manufacturing (Moore, 1996; May, 2000). With the advent of high-tech spinning machines in the textile sector, high fiber strength is needed to avoid breakage of the fibers in the yarn (Patil and Singh, 1995). The yarn production capacity of rotor spinning is 5 times that of the conventional method (ring spinning). Moreover, rotor-spun yarn is more even than ring-spun yarn but 15-20% weaker.

Fiber strength also plays a vital role in maintaining the original qualities of cotton during chemical processing. Fiber strength is generally measured as the force needed to break prepared fiber bundles. A number of instruments are used for this purpose, including the Pressley tester (Pressley, 1942), the Stelometer (Hertel, 1953) and HVI.

1.6.2.3 Micronaire

Micronaire is one of the traits used to measure fiber quality. It is a combined measure of fiber maturity and fiber fineness (Lord and Heap, 1988; Moore, 1996; May et al., 2000). Micronaire is measured as the degree of air impermeability of a specific weight of fibers at constant pressure (Johnson, 1952). The optimal range of micronaire is 3.8-4.5 (Bednarz et al., 2002; Gordon and Naylor, 2004; Bange et al., 2009). Differences in micronaire values are directly related to the extent of fiber maturity. Fibers with a micronaire above 4.5 are coarse and are not preferred by spinners because of reduced strength (Bange et al., 2010). A micronaire value below 3.8 is indicative of immaturity, which may lead to breakage of the fibers in the yarn and reduced dye uptake. Immature fibers are usually very fine.
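The micronaire thresholds given above can be expressed as a simple decision rule. The Python sketch below uses only the limits stated in the text (3.8 and 4.5); the function name and the category labels are illustrative, and commercial grading schemes may define additional classes.

```python
def classify_micronaire(micronaire: float) -> str:
    """Classify a micronaire reading using the 3.8-4.5 optimal range from the text."""
    if micronaire <= 0:
        raise ValueError("micronaire must be positive")
    if micronaire < 3.8:
        return "low (indicative of immature fiber)"
    if micronaire <= 4.5:
        return "optimal"
    return "high (coarse fiber)"

# Hypothetical readings for three lint samples.
for reading in (3.4, 4.2, 4.9):
    print(reading, "->", classify_micronaire(reading))
```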

Changes in temperature have a great impact on micronaire (Bange et al., 2010). Other factors such as plant defoliation (Siebert et al., 2006; Bange et al., 2010), radiation (Pettigrew, 1995; Wang et al., 2006) and water stress (Hearn, 1994) may also affect micronaire. Understanding the magnitude of the influence of these environmental parameters is therefore vital for adopting management practices that produce fiber with optimum micronaire.

The instruments used for the precise measurement of micronaire are the Arealometer (Hertel and Craven, 1951), the Shirley fineness maturity tester (American Society for Testing and Material, 1993) and AFIS, which provides a module for direct measurement of individual fiber diameter and thus of the degree of fineness.

Fibers with a long staple length, high strength and optimum micronaire are much better suited to textile processing methods, whereas fibers of short staple length give lower yarn strength, which reduces spinning efficiency and ultimately decreases the utility of the yarn. The textile industry requires yarn of high average strength that can withstand harsh spinning operations (Deussen, 1992; Benedict et al., 1999).

Identification of cotton fiber genes would be helpful for developing cotton varieties with improved fiber quality traits.

1.6.3 Fiber Development

Cotton fiber has been used for clothing throughout the world for ages. It is composed of 95% cellulose and small quantities of hemicelluloses, pectin and proteins (Saima et al., 2008).

Cotton is a major source of natural fiber for the textile industry (Wang et al., 2012) and also an excellent single-celled (fiber) developmental system for studying polyploidization and cell elongation (Chen et al., 2007; Wang et al., 2012).

The stages of fiber development are (1) initiation, (2) elongation, (3) secondary cell wall (SCW) thickening and (4) dehydration (Basra and Malik, 1984; Jasdanwala et al., 1977). These developmental stages overlap (Orford and Timmis, 1997; Saima et al., 2008). The initiation stage determines the number of fibers per ovule, while the durations of elongation and SCW thickening determine fiber length and strength/fineness, respectively (Smart et al., 1998; Ruan et al., 2001).

The initiation stage begins 3 days before anthesis and continues until one day after anthesis. During this stage, epidermal cells on the ovule surface protrude and enlarge. Around 30% of the epidermal cells mature into lint fibers (Ramsey and Berlin, 1976; Saima et al., 2008). The elongation stage starts on the day of anthesis (Stewart, 1976) and lasts until around 21-26 dpa (days post anthesis) (Schubert et al., 1973; Meinert and Delmer, 1977).

Secondary cell wall (SCW) deposition overlaps the last days of elongation, as it starts at 16-18 days after anthesis (Schubert et al., 1973; Meinert and Delmer, 1977) and continues until 40-45 days after anthesis. The overlap between SCW deposition and elongation depends on the cultivar and the environment (Saima et al., 2008).

After dehiscence of the seed capsule, at around 45-60 days after anthesis, the fiber cells quickly dehydrate (Kim and Triplett, 2001) and become mature fiber, which is composed of about 95% cellulose (Arthur, 1990; Saima et al., 2008).
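The overlapping timeline described above can be summarised as a lookup from days post anthesis (dpa) to the active developmental stages. The Python sketch below encodes the windows quoted in the text; where the text gives a range for a boundary (e.g. 21-26 dpa), the widest window is used, which is an assumption of this example, as are the names used in the code. Actual timing varies with cultivar and environment.

```python
from typing import List

# Stage windows in dpa (negative values are days before anthesis).
# Outer bounds of the ranges quoted in the text are used as an assumption.
STAGE_WINDOWS_DPA = {
    "initiation": (-3, 1),
    "elongation": (0, 26),
    "secondary cell wall deposition": (16, 45),
    "dehydration and maturation": (45, 60),
}

def stages_at(dpa: int) -> List[str]:
    """Return the developmental stages that may be active at a given dpa."""
    return [
        stage
        for stage, (start, end) in STAGE_WINDOWS_DPA.items()
        if start <= dpa <= end
    ]

for day in (-2, 0, 20, 50):
    print(f"{day:>3} dpa: {', '.join(stages_at(day))}")
```

At 20 dpa, for example, both elongation and SCW deposition are returned, which reflects the overlap between these two stages.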

Cotton yield and quality are determined largely by fiber length, which depends on the onset and duration of cell expansion, while strength and fineness are positively correlated with the duration of SCW deposition (Smart et al., 1998; Ruan et al., 2001).

1.6.3.1 Transcriptomes of a developing fiber

Cotton fiber production depends on the initiation phase, as it determines the number of fibers per ovule. Multiple studies have identified genes involved in the regulation of cotton fiber initiation (Wu et al., 2007b; Xiao et al., 2010; Zhang et al., 2011). For example, the sucrose synthase gene (SuSy) was found to be involved in ovule development (Hendrix, 1990). Comparative analysis of the transcriptomes of a fiberless (fl) mutant and its wild type showed a higher level of SuSy mRNA in wild-type than in mutant ovules at the stage of fiber initiation (Ruan and Chourey, 1998). Down-regulation of SuSy mRNA by RNAi in the epidermal cells resulted in undesirable phenotypes with reduced fiber initiation, elongation and seed development (Ruan et al., 2001, 2003).


In another study, more than 20 genes were identified that were more highly expressed in 3 dpa ovules than in the N1N1 (fiberless) mutant (Lee et al., 2006).

A number of transcription factors (GaMYB2, GaRDL1, GaHOX1 and GhMYB109) vital for cotton fiber initiation have been isolated from cotton fiber (Wang et al., 2004; Guan et al., 2008; Pu et al., 2008; Shangguan et al., 2008).

The elongation period of cotton fiber development has been explored (Kim and Triplett, 2001). Fiber length depends on the duration of cell elongation. This process is associated with plasmodesmata dynamics and transporter activities (Ruan et al., 2001). Genes encoding sucrose and K+ transporters are expressed at high levels during cell elongation (Ruan et al., 2004). The differences in fiber length among cotton genotypes depend on the period for which the plasmodesmata are open. These plasmodesmata remain open from 0-9 dpa, close at 10 dpa and re-open at 16 dpa (Shen et al., 2011).

More than 100 genes identified through subtractive hybridization and cDNA arrays (Ji et al., 2003) were found to be up-regulated during the fiber elongation period, and a number of genes expressed during this period have been isolated (Wang et al., 2010a; Zhang et al., 2010). During this period, fiber growth of ~2 mm/day is observed in G. hirsutum L. (John and Keller, 1996; Ji et al., 2002).

At ~15 dpa, SCW synthesis starts, which results in fiber comprising >94% cellulose. Fiber strength and fineness are affected by the duration of SCW deposition. After SCW synthesis, the fiber acquires its distinctive characteristics and becomes suitable for spinning.

1.6.3.2 Comparison of diploid and tetraploid genomes regarding fiber genes

In the diploid cotton species G. arboreum L., ~18000 genes (Wilkins et al., 2005), and in the tetraploid cotton species G. hirsutum L., ~36000 genes (Saima et al., 2008), were found to be involved in fiber development. Fiber transcriptomes of the cultivated tetraploid cotton species have homeologous loci on both the AT and DT subgenomes. The sequences and arrangement of orthologous loci of diploid (A and D; progenitor genomes of the tetraploid) and tetraploid species are highly conserved (Senchina et al., 2003; Rong et al., 2004), suggesting that the function of fiber genes across all the species is conserved. The phenotypic variation in fiber properties across different species is attributed to quantitative differences in the expression of these genes.

Among diploid species, only G. arboreum L. produces spinnable fiber, but limited studies have been carried out to find the genetic basis of its fiber development. In the present study, G. arboreum var. Ravi (which has good fiber quality and is immune to CLCuD) was selected to isolate genes expressed during the fiber development stages, which would help in cloning genes involved in fiber development. Because the diploid genome avoids the redundancy caused by polyploidy, the rate of gene discovery is expected to be enhanced at least two-fold.

1.6.3.3 Similarities between Arabidopsis trichome and cotton seed hair development

Cotton fiber initiation was found to be similar to Arabidopsis trichome development (Guan et al., 2007). Therefore, the mechanism of cotton fiber initiation can be dissected by identifying analogous transcription factors (Chen et al., 2011).

The MYB transcription factor GL1 is involved in the initiation of Arabidopsis leaf trichomes (Larkin et al., 1993, 1994), as gl1 mutants are trichome-less (Larkin et al., 1993; Serna and Martin, 2006). When cotton cDNAs encoding GL1 (MYB transcription factor) were over-expressed in tobacco, trichomes were produced (Payne et al., 1999).

1.7 Cotton fiber gene identification through differential display

Different techniques such as microarrays (Ruan et al., 1998; Ji et al., 2003; Lee et al., 2006; Wu et al., 2007; Shangguan et al., 2008) and, more recently, deep sequencing (Wang et al., 2010) have been used to identify differentially expressed genes conferring cotton fiber initiation and development, using normal fiber-producing and mutant cotton ovules. However, techniques like microarrays and deep sequencing are limited to well-resourced research groups (Saima et al., 2008). Moreover, microarrays have other limitations; for example, genes with low expression levels cannot be detected precisely (Wang et al., 2010). Most of the research groups working on fiber development focus mainly on the isolation of full-length genes involved in fiber development pathways.

Partial sequences of cotton fiber genes can be obtained by differential display of RNA. DDRT-PCR (differential display reverse transcriptase polymerase chain reaction) (Liang and Pardee, 1992) has a number of advantages: (1) it is easy to perform and (2) it requires only a small quantity of starting material (RNA). However, the method also has limitations: it amplifies small fragments that can only be resolved on polyacrylamide gels using radioactive detection techniques (Welsh et al., 1992), and elution of differentially expressed fragments from thin polyacrylamide gels is difficult (Lohmann et al., 1995). Using ethidium bromide-stained agarose gels can overcome this disadvantage of PAGE (Rompf et al., 1997; Jefferies et al., 1998; Gromova et al., 1999; Ahmed et al., 2000; Kim et al., 2004).

1.8 Development of new SSRs

The global demand for cotton products is expected to increase by 102% by 2030 (Xiao et al., 2009). This demand can be met by increasing production through complementing conventional breeding methodologies with DNA-based tools (Xiao et al., 2009; Rahman et al., 2011).

Simple sequence repeats (SSRs) are useful for a number of applications in molecular biology, particularly in mapping, because of their Mendelian inheritance, co-dominant expression and PCR-based genotyping assay. Several SSR-based cotton linkage maps are available (Han et al., 2004, 2006; Nguyen et al., 2004; Lacape et al., 2005; Park et al., 2005; Frelichowski et al., 2006; Guo et al., 2007; Zhang et al., 2008; Paterson et al., 2009; Yu et al., 2011, 2012; Shaheen et al., 2013). However, more SSR markers are required to increase the density of these maps. A high-density map is important for gene discovery and molecular marker-based breeding.

BAC libraries with large DNA inserts are an ideal choice for physical map construction (Hanson et al., 1995; Zhang et al., 2004; James et al., 2006; Guo et al., 2008; Lin et al., 2010) as well as for developing genetic markers like SSRs. BAC-derived SSR markers have a number of advantages over markers derived from other sources (Yu et al., 2002a, b): their sequences can be integrated efficiently into genetic and physical maps (Danesh et al., 1998; Cregan et al., 1999; Chen et al., 2002; Yu et al., 2002a, b; Wu et al., 2004), and the assignment of BAC clones to chromosomes enables chromosome-based sequencing.

In a number of studies, BAC-derived SSR markers have been used for cotton genetic mapping (Tomkins et al., 2001; Yu et al., 2002a, b; James et al., 2006; Lin et al., 2010).

1.9 Molecular linkage maps in cotton

1.9.1 Construction of Linkage maps

Mapping of genomic regions relies on genes and markers that segregate through crossing-over (Paterson, 1996). Genes or markers that are tightly linked are transmitted together from parent to offspring more frequently than unlinked genes or markers. The frequency of recombinant genotypes in a segregating population is used to estimate the genetic distance between markers. Segregating markers, which can be morphological, biochemical or DNA markers, are used for the construction of linkage maps.
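The link between recombinant frequency and genetic distance can be sketched as follows. This is a minimal illustration, not the procedure used in the studies cited above: the Haldane and Kosambi mapping functions are standard conversions that are assumed here (the text does not name a mapping function), the function names are hypothetical, and the simple recombinants/total estimate applies to backcross- or doubled haploid-type data rather than to F2 data, which need a likelihood-based estimate.

```python
import math


def _check_fraction(r: float) -> None:
    # Mapping functions are only defined for 0 <= r < 0.5.
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")


def recombination_fraction(recombinants: int, total: int) -> float:
    """Observed fraction of recombinant individuals (backcross/DH-type data)."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Map distance in cM assuming no crossover interference (Haldane)."""
    _check_fraction(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Map distance in cM allowing for crossover interference (Kosambi)."""
    _check_fraction(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    # Hypothetical example: 10 recombinants among 100 progeny.
    r = recombination_fraction(10, 100)
    print(f"r = {r:.2f}, Haldane = {haldane_cm(r):.2f} cM, "
          f"Kosambi = {kosambi_cm(r):.2f} cM")
```

For small recombination fractions both functions give distances close to 100 × r cM; they diverge as r approaches 0.5, which is why tightly linked markers are the most informative for ordering.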

A segregating population of about 50-250 genotypes (Mohan et al., 1997), or even more, is a prerequisite for linkage map construction. In self-pollinated species, the mapping population generally originates from parents that are both highly homozygous (inbred). In cross-pollinated species, the mapping population may result from a cross between a heterozygous parent and a haploid or homozygous parent (Wu et al., 1992). Different population types are used, such as F2 segregating populations, recombinant inbred lines (RILs), doubled haploids (DH) and backcross populations. Each population type has its own advantages and disadvantages.

MAPMAKER/EXP (Lander et al., 1987) and Map Manager QTX (Manly et al., 2001) are the most commonly used software for map construction. Linkage between markers is assessed using the odds ratio (the ratio of the likelihood of linkage to that of no linkage), usually expressed as its base-10 logarithm, called the LOD score or LOD value (Risch, 1992). A LOD score of 3 between two markers indicates that linkage is 1000 times more likely than no linkage. After linkage map construction, marker-trait association is performed through QTL analysis for mapping genes. The principle of QTL analysis is the detection of association between trait(s) of interest and marker(s). QTLs are detected by three methods: single marker analysis, simple interval mapping and composite interval mapping (Tanksley, 1993; Liu, 1998; Preetha and Raveendren, 2008).
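The LOD score described above can be illustrated with a short two-point calculation. This is a sketch under simplifying assumptions (fully informative backcross-type data, with recombinant and non-recombinant individuals counted directly); the function names are hypothetical, and mapping software uses more general likelihood models.

```python
import math


def lod_score(recombinants: int, total: int) -> float:
    """Two-point LOD: log10 of likelihood at the estimated r versus r = 0.5."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    r_hat = recombinants / total
    if r_hat >= 0.5:
        # No evidence for linkage: the estimate is capped at r = 0.5.
        return 0.0
    # log10 likelihood under linkage: r^k * (1 - r)^(n - k)
    log_linked = (total - recombinants) * math.log10(1.0 - r_hat)
    if recombinants > 0:
        log_linked += recombinants * math.log10(r_hat)
    # log10 likelihood under independent assortment: 0.5^n
    log_unlinked = total * math.log10(0.5)
    return log_linked - log_unlinked


def odds_in_favour_of_linkage(lod: float) -> float:
    """Convert a LOD score back to an odds ratio."""
    return 10.0 ** lod


if __name__ == "__main__":
    print(odds_in_favour_of_linkage(3.0))   # odds ratio for LOD 3
    print(round(lod_score(10, 100), 2))     # hypothetical marker pair
```

The conversion makes the threshold explicit: each additional LOD unit corresponds to a ten-fold increase in the odds favouring linkage.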

1.9.2 Cotton linkage maps

The first cotton genetic map was constructed with RFLP markers using an F2 population derived from an interspecific cross (G. hirsutum L. x G. barbadense L.) (Reinisch et al., 1994). In this map, 705 loci were grouped into 41 linkage groups, which covered 4,675 cM. Rong et al. (2004) added further markers to this map. The resulting map, consisting of 2,584 loci at 1.74-cM intervals and covering all 13 homeologous chromosomes of allotetraploid cotton, is still considered a complete genetic map of cotton. In multiple studies, different fiber quality traits have been mapped using RFLPs (Kohel et al., 2001; Chee et al., 2005a, b; Draye et al., 2005).

Khan et al. (1998) developed the earliest AFLP-based genetic map in cotton using a tri-species F2 mapping population, followed by Lacape et al. (2003) and Mei et al. (2004). For the G-genome species of cotton (G. nelsonii L. × G. australe L.), the first AFLP-based genetic map was constructed by Brubaker and Brown (2003). Some AFLP markers specific to the G genome were identified, and these markers were used to determine the transmission frequency of G. australe L. chromosomes in the hexaploid bridging family (G. hirsutum L. × G. australe L.).

SSRs are useful for studying intra- and inter-specific polymorphism because of their high occurrence in the genome. The SSRs derived from the cotton genome are considered sufficient for genome mapping and can thus be used in MAS. Liu et al. (2000b) used aneuploid genetic stocks to assign SSRs to cotton chromosomes. A number of SSR-based linkage maps constructed using various populations derived from inter-specific crosses are available (Guo et al., 2002; Zhang et al., 2002b; Lacape et al., 2003; Nguyen et al., 2004; Lin et al., 2005; Park et al., 2005; Song et al., 2005a; Frelichowski et al., 2006; Han et al., 2006; Guo et al., 2007a; He et al., 2007).


A BC1 population {(TM-1 x Hai-7124) x TM-1} was developed and used to construct a genetic map containing 511 loci spanning ~4331 cM of the genome with SSR markers (Song et al., 2005a). In another mapping experiment, 111 additional EST-SSRs were integrated into this map (Han et al., 2004). The resulting map had 34 linkage groups, each with 2 to 41 markers, an average inter-marker distance of 9.0 cM, and covered 5644.3 cM of the genome (Han et al., 2004). Han et al. (2006) extended their previous map with 133 new SSRs; it now has 907 loci spanning 5,060 cM with an average inter-marker distance of 5.6 cM.

BC1 populations developed from interspecific crosses have also been used extensively for the construction of genetic maps using SSRs. A BC1 population derived from [{Guazuncho2 (G. hirsutum L.) × VH8-4602 (G. barbadense L.)} × Guazuncho2] was used for genetic map construction using SSR markers. This map covers 4400 cM of the genome (Lacape et al., 2003). With the addition of 133 new SSR loci (Nguyen et al., 2004), the map now comprises 1160 loci spanning 5519 cM, and thus covers a wide proportion of the tetraploid cotton genome.

Recombinant inbred lines (RILs) derived from inter-specific crosses were also exploited to produce linkage maps by surveying various marker assays, including SSRs. For example, a genetic map spanning 305 cM was constructed using RILs (Wang et al., 2006a). Similarly, a RIL population of 183 individuals developed from an interspecific cross (TM-1 x 3-79) was surveyed with SSRs and complex sequence repeats (CSRs) (Park et al., 2005). A total of 193 loci, including 121 new fiber loci, were mapped, covering 1277 cM of the genome. These loci were grouped into 11 linkage groups and mapped on 19 different chromosomes. The same population was also used by Frelichowski et al. (2006) to add new markers. They developed 1,316 SSRs from 2,603 BAC-end sequences of G. hirsutum. Their map covered 2,126.3 cM (~45%) of the genome with 433 loci, which were grouped into 46 linkage groups. The average distance between these loci was 4.9 cM.

Linkage maps have also been developed using different types of DNA markers in combination, in order to overcome the limitations of any particular marker type. Two F2 inter-specific populations were used for genetic map construction using two types of DNA markers, RAPDs and SSRs (Cantrell et al., 1999). This map covered 1058 cM of the genome, and many fiber QTLs (strength, length, fineness) were mapped on it.

A doubled haploid mapping population of 58 individuals was developed from an inter-specific cross (G. hirsutum L. acc. TM-1 × G. barbadense L. cv. Hai-7124). This population was used for genetic linkage map construction using two types of markers: SSRs (510) and RAPDs (114). The map consisted of 489 loci ordered in 43 linkage groups spanning 3314.5 cM. Monosomic and telo-disomic genetic stocks were used to assign these linkage groups to chromosomes (Zhang et al., 2002). In another study, a BC1 population of 75 plants derived from an inter-specific cross (G. hirsutum × G. barbadense) was used for genetic map construction using three types of markers (RFLP, SSR and AFLP) (Lacape et al., 2003). The map consisted of 888 loci, which were grouped into 37 linkage groups and covered 4,400 cM of the genome. This map was further saturated by integrating new SSR loci, giving 1,160 loci covering 5,519 cM with an average distance of 4.8 cM between loci (Nguyen et al., 2004).

In another study, a genetic map containing 392 loci (42 linkage groups) spanning 3,287 cM (~70%) of the genome was constructed with AFLPs, SSRs and RFLPs using an inter-specific population (Mei et al., 2004).

Similarly, a genetic map containing 566 loci (41 linkage groups) spanning 5141.8 cM of the genome was constructed with SSRs, RAPDs and SRAPs from an F2 population derived from the cross G. hirsutum cv. Handan-208 x G. barbadense cv. Pima-90. The average distance between loci was 9.08 cM (Lin et al., 2005). A more detailed genetic map of the same F2 population (Lin et al., 2005) was constructed by He et al. (2007) using SSRs, SRAPs, RAPDs and retrotransposon-microsatellite amplified polymorphisms (REMAPs). This map assembled 1029 loci into 26 linkage groups spanning 5,472.3 cM, with an average distance between loci of 5.32 cM. Chromosome numbers were assigned to linkage groups with the help of aneuploid stocks (Stelly, 1993; Reinisch et al., 1994) and probes from BACs (Wang et al., 2006a).

A linkage map containing 917 SSR loci spanning 5452.2 cM of the cotton genome was constructed using a BC1 population [(G. hirsutum L. x G. barbadense L.) x G. hirsutum L.]. These loci were grouped into 44 linkage groups, which were assigned to 26 chromosomes. The average distance between loci was 5.9 cM (Zhang et al., 2008). In another study, a genetic linkage map of 2316 loci (EST-SSRs and SSRs) located on the 26 cotton chromosomes was constructed, covering 4418.9 cM of the cotton genome with an average marker distance of 1.91 cM (Yu et al., 2011).

An inter-specific linkage map based on 2072 loci covering 3380 cM of the genome was constructed using a RIL population derived from G. hirsutum (TM-1) x G. barbadense (3-79). In this map, 1726.8 cM of the At subgenome was covered with 1138 loci, while 1653.1 cM of the Dt subgenome was covered with 934 loci. The average marker distance was 1.63 cM (Yu et al., 2012).
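The average marker distances quoted for these maps appear to be the map length divided by the number of loci. As a small arithmetic check under that assumption, the sketch below recomputes the figure for the Yu et al. (2012) map from the subgenome values given above; the variable and function names are illustrative only.

```python
def mean_marker_spacing(map_length_cm: float, n_loci: int) -> float:
    """Average spacing in cM, taken here as map length divided by locus count."""
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return map_length_cm / n_loci


# Subgenome values as reported in the text: (map length in cM, number of loci).
yu_2012 = {"At": (1726.8, 1138), "Dt": (1653.1, 934)}

total_cm = sum(length for length, _ in yu_2012.values())
total_loci = sum(loci for _, loci in yu_2012.values())

if __name__ == "__main__":
    for name, (length, loci) in yu_2012.items():
        print(f"{name}: {mean_marker_spacing(length, loci):.2f} cM per locus")
    print(f"Whole map: {mean_marker_spacing(total_cm, total_loci):.2f} cM per locus")
```

The subgenome totals add up to the 2072 loci and roughly 3380 cM reported for the whole map, so the figures in the text are internally consistent.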

Chromosomes 12 and 26 (chr-12 and chr-26) are homeologous, and a physical map of these chromosomes was developed using BAC contigs. This map covered 73.49 Mb and 34.23 Mb of chr-12 and chr-26, respectively (Xu et al., 2008). Out of the 220 and 115 contigs used, 110 and 48 were mapped to the At and Dt subgenomes, respectively. A total of 67 contigs were common between the two chromosomes, with 40% physical similarity. On chr-12, 401 fiber unigenes and 214 non-fiber unigenes were detected, whereas 207 fiber unigenes and 183 non-fiber unigenes were detected on chr-26 (Xu et al., 2008).

A genetic linkage map was constructed using 188 RILs for the identification of QTLs for 14 agronomic and fiber traits (Wu et al., 2009). This map covered 965 cM with 26 linkage groups. Chromosome numbers were assigned to 24 linkage groups. About 20% of the phenotypic variation in all traits, and 18% in fiber traits, was largely due to genotypic effects. A total of 56 QTLs controlling the 14 agronomic and fiber traits, with LOD scores of more than 3.0, were detected on 17 chromosomes. A total of 27 QTLs (10 for agronomic and 17 for fiber traits) were found on nine chromosomes of the A subgenome, while 29 QTLs (13 for agronomic and 16 for fiber traits) were detected on eight chromosomes of the D subgenome. The QTLs controlling yield and fiber quality identified in this study are useful for MAS, as they were identified in an intra-specific population (Wu et al., 2009).


In cotton, BAC-end sequences are helpful for developing genetic markers like SSRs, and these sequences can be integrated efficiently into genetic and physical maps. G. raimondii (D genome) is a progenitor species of tetraploid cotton. A physical map of this species was assembled by incorporating overgo probes, agarose-based fingerprints and high information content fingerprinting (HICF). For anchoring to the genetic map, 1585, 370 and 438 contigs in cotton, Arabidopsis thaliana and Vitis vinifera, respectively, together with a total of 13,662 BAC-end sequences and 2,828 DNA probes, were used (Lin et al., 2010; Paterson et al., 2014).


Table 1.1 DNA-based genetic maps of tetraploid cotton

Mapping population  Marker                    Genome coverage (cM)  Number of LG  Reference
F2                  RFLP                      4675                  41            (Reinisch et al., 1994)
F2                  RFLP, RAPD                3855                  40            (Yu et al., 1998)
F2                  RAPD, SSR                 1058                  28            (Cantrell et al., 1999)
F2                  RFLP, RAPD                4766                  50            (Kohel et al., 2001)
DH                  RAPD, SSR                 3312                  42            (Guo et al., 2002)
DH                  SSR, RAPD                 3315                  43            (Zhang et al., 2002b)
BC                  RFLP, SSR, AFLP           4400                  37            (Lacape et al., 2003)
BC                  SSR                       5664                  34            (He et al., 2004)
F2                  AFLP, SSR, RFLP           3287                  42            (Mei et al., 2004)
BC                  RFLP, SSR, AFLP           5519                  37            (Nguyen et al., 2004)
F2                  STS                       4908                  26            (Rong et al., 2004)
F2                  SSR, RAPD, SRAP           5141                  41            (Lin et al., 2005)
RIL                 SSR                       1277                  30            (Park et al., 2005)
BC                  SSR                       4331                  34            (Song et al., 2005a)
RIL                 SSR                       2126                  46            (Frelichowski et al., 2006)
BC                  SSR                       5060                  26            (Han et al., 2006)
BC                  RAPD, SSR                 3425                  26            (Guo et al., 2007a)
F2                  SSR, RAPD, SRAP, REMAP    5472                  26            (He et al., 2007)
BC1                 SSR                       5452.2                44            (Zhang et al., 2008)
RIL                 SSR                       965                   26            (Paterson et al., 2009)
BC1                 SSR, EST-SSR              4418.9                26            (Yu et al., 2011)
RIL                 SSRs, SNPs                3380                  26            (Yu et al., 2012)
F2                  SSRs, RAPD                346                   10            (Shaheen et al., 2013)


1.10 Implications of genomic tools for cotton improvement

Marker-assisted selection (MAS) is an important tool for identifying desirable plants in segregating generations, which helps reduce the time needed to develop improved cotton varieties. Mapping and identification of genes, and the mechanisms of cotton genome evolution, were difficult to address before the era of genomic resource development. Selection efficiency can be improved significantly by applying MAS at early developmental stages. The technique is particularly advantageous for traits whose phenotypes are otherwise expensive to screen (Dreher et al., 2000).

In a number of crops, varieties with desired traits have been developed using DNA markers. In maize, successful applications include marker-assisted introgression of novel genomic regions associated with the anthesis-silking interval, marker-based diagnosis of plants containing the opaque 2 gene associated with quality, and marker-based prediction of hybrid vigor. New rice varieties have been developed using DNA markers associated with genes and quantitative trait loci (QTLs) to provide resistance to biotic stresses (e.g. bacterial blight and blast) and abiotic stresses, and to improve yield and quality. The wheat variety ‘Patwin’ was developed through marker-assisted selection for the stripe rust and leaf rust resistance genes Yr17 and Lr37, respectively. The stay-green trait conferring resistance to drought in sorghum has been explored at length. In tomato, cotton, potato, soybean and other crops, many genes conferring resistance against various biotic stresses have been incorporated from wild relatives using DNA markers. In the recent past, plenty of cotton genetic resources have been produced for use in its genetic improvement. As in many other crop species, cotton improvement using DNA markers has been started (Rahman et al., 2011).

The D8 alloplasm (CMS-D8) causes cytoplasmic male sterility (CMS) in cotton. This sterility was restored by the D8 restorer (D8R) and the D2 restorer (D2R, developed for CMS-D2). The two restorer loci were shown to be tightly linked, with an average genetic distance of 0.93 cM, and the new names Rf1 and Rf2 were assigned to these restorer genes. Identification of markers closely linked with these genes could help in the development of hybrid parental lines for use in cotton breeding programmes (Zhang and Stewart, 2001). A high-resolution genetic map of the Rf1 gene consisting of 13 markers with a genetic distance of 0.9 cM was reported (Yin et al., 2006).

Improving resistance to diseases has been one of the major breeding objectives in cotton. He et al. (2004) isolated resistance gene analogues (RGAs) belonging to the NBS-LRR (nucleotide-binding site leucine-rich repeat) family from G. hirsutum for cloning purposes. This gene family is found on very few chromosomes of the AD genome, and the A subgenome has more RGAs than the D subgenome. Root-knot nematodes (RKN) cause severe yield losses in cotton. In the G. hirsutum cultivar “Acala NemX”, a major RKN resistance gene (rkn1) tightly linked to the SSR marker “CIR316” was identified (Wang et al., 2006b). Previously mapped RFLP markers were used to explore the positions of bacterial blight resistance genes on various chromosomes (Wright et al., 1998; Rungis et al., 2002), and a marker linked to the resistance gene “B12” was found on chromosome 14. For introgression of Xcm resistance into G. barbadense through MAS, novel markers linked with this trait were identified using AFLPs and SSRs (Zhang et al., 2008). Three genes involved in resistance to cotton leaf curl virus disease (CLCuD), two for resistance (R1CLCuDhir and R2CLCuDhir) and one suppressor of resistance (SCLCuDhir), were identified in G. hirsutum (Rahman et al., 2005b). A number of markers associated with resistance (CM-43, CM-162) were identified. These markers were used in developing improved cotton cultivars, i.e. NIBGE-2 and NIBGE-115.

1.11 Objectives

For improving fiber quality traits, it is vital to understand genetics and genomics of fiber. In this regard, we adopted two strategies, i.e. isolation of novel gene(s) involved in conferring quality traits and also the identification of QTLs and associated DNA markers.

This study was conducted to determine the phylogenetic relationships among different species of the genus Gossypium and to identify differentially expressed cDNAs from G. arboreum L. to help isolate genes involved in fiber development. A further aim was to find the chromosomal locations of fiber QTLs in an F2 population derived from an interspecific cross (G. hirsutum L. x G. barbadense L.).


The specific objectives of the present study were:

1. Estimation of genetic diversity among cotton species using both functional (EST-SSRs) and genomic markers.
2. Isolation of differentially expressed sequences from fiber tissues at different stages of fiber development.
3. Construction of a genome map of EST-SSRs/SSRs.

45 Chapter 2: Materials and methods

MATERIALS AND METHODS

2.1 Experiment No. 1. Genetic diversity and relationship of diploid and tetraploid cotton species using EST-SSR and gSSRs

2.1.1 Plant material

A total of 36 Gossypium species and land races were explored in the present study: 24 diploid species representing all genomes except the K genome (3 A-, 10 D-, 3 E-, 1 C-, 3 G-, 3 B- and 1 F-genome), five tetraploid AD-genome species and seven land races (morrilli, palmeri, marie-galante, yucatanense, punctatum, latifolium and lanceolatum). The names of these species along with their genomes and geographical distributions are given in Table 2.1. Leaf samples of these species were collected from the Central Cotton Research Institute (CCRI), Multan, Pakistan and the National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Pakistan. The genomic DNA of G. yucatanense L., G. capitis-viridis L., G. longicalyx L. and G. australe L. was kindly provided by the late Prof. J. McD. Stewart (University of Arkansas, Fayetteville, Arkansas, USA).


Table 2.1 Cotton species included in the study

Sr. #  Species name               Genome   Distribution
1      G. herbaceum               (A1)     Old world
2      G. herbaceum africanum     (A1)     Africa
3      G. arboreum                (A2)     Old world
4      G. thurberi                (D1)     Mexico, Arizona
5      G. klotzschianum           (D3-k)   Galapagos Islands
6      G. harknessii              (D2-2)   Mexico
7      G. davidsonii              (D3-d)   Mexico
8      G. aridum                  (D4)     Mexico
9      G. raimondii               (D5)     Peru
10     G. gossypioides            (D6)     Mexico
11     G. lobatum                 (D7)     Mexico
12     G. trilobum                (D8)     Mexico
13     G. laxum                   (D9)     Mexico
14     G. hirsutum cv NIBGE 115   (AD1)    NIBGE, Faisalabad
15     G. tomentosum              (AD3)    Hawaii
16     G. mustelinum              (AD4)    Brazil
17     G. barbadense              (AD2)    Hawaii
18     G. darwinii                (AD5)    Galapagos
19     G. marie-galante           (AD)     Caribbean, Central America
20     G. latifolium              (AD)     Southern Mexico/Guatemala
21     G. morrilli                (AD)     Mexico
22     G. palmeri                 (AD)     Mexico
23     G. punctatum               (AD)     Cameron
24     G. yucatanense             (AD)     Guadeloupe
25     G. lanceolatum             (AD)     Mexican states of Oaxaca & Guerrero
26     G. anomalum                (B1)     Africa
27     G. barbosanum              (B2)     Africa
28     G. capitis-viridis         (B3)     Africa, Cape Verde Islands
29     G. somalense               (E2)     Africa/Arabia
30     G. stocksii                (E1)     Arabian peninsula
31     G. longicalyx              (F)      East Africa
32     G. incanum                 (E4)     Arabian peninsula
33     G. robinsonii              (C2)     Australia
34     G. australe                (G2)     Australia
35     G. nelsonii                (G2)     Australia
36     G. bickii                  (G1)     Australia


2.1.2 Extraction of Genomic DNA

Young leaves from five randomly selected plants of each cotton species were collected from the cotton field. The sampled leaves of each species were placed in pre-labeled plastic bags, immediately carried to the laboratory in liquid nitrogen and stored at -80 °C until total genomic DNA was isolated by the CTAB method described by Iqbal et al. (1997), with minor modifications.

Sampled leaves of each cotton species were ground to a fine powder in liquid nitrogen and transferred to a 50 mL polypropylene tube. Fifteen mL of pre-heated (65 °C) 2X CTAB extraction buffer (Appendix Table 3) was added before the frozen powder started thawing, and the contents were mixed by inverting the tube several times and then incubated at 65 °C for 45 min in a water bath. After incubation with the hot isolation buffer, 15 mL of chloroform:iso-amyl alcohol (24:1) was added and the contents were mixed gently by inverting the tube to form an emulsion. The suspension was centrifuged for 10 min at 4000 rpm. The aqueous layer was transferred to a new tube, while the remaining chloroform phase was discarded. DNA was precipitated by the addition of 0.6 volume of chilled 2-propanol. After 10 min of centrifugation at 4000 rpm, the supernatant was decanted and the pellet was washed twice with chilled 70% ethanol. The pellet was dried thoroughly and dissolved in 0.1X TE buffer.

For RNase treatment, the suspension was transferred to a new 1.5 mL tube and 5 μL of RNase A (10 mg/mL) was added to digest RNA. After incubation for one hour at 37 °C, one volume of chloroform:iso-amyl alcohol was added and the suspension was centrifuged for 10 min at 13000 rpm for phase separation. The supernatant was transferred to a new 1.5 mL tube and mixed gently after adding 0.1 volume of 3 M NaCl. One volume of cold 100% ethanol was added to precipitate the DNA. After 10 min of centrifugation at 13000 rpm at 4 °C, the supernatant was discarded and the pellet was washed thrice with chilled 70% ethanol. Finally, the DNA pellet was dried thoroughly and dissolved in 0.1X TE buffer (Appendix Table 3).


2.1.2.1 DNA purification

1. To 100 µL of the extracted genomic DNA, an equal volume of phenol/chloroform (1:1) solution was added and mixed gently.
2. The mixture was centrifuged for 5 min at 13000 rpm at room temperature for phase separation.
3. The supernatant was collected in a new 1.5 mL Eppendorf tube.
4. One-tenth volume of 3 M sodium acetate (CH3COONa) and 2.5 volumes of cold 100% ethanol were added to precipitate the DNA.
5. The tube was placed at -20 °C for 10 min.
6. The tube was centrifuged at 4 °C for 10 min at 13000 rpm.
7. The supernatant was discarded and the pellet was washed with 70% ethanol.
8. The pellet was air dried and dissolved in 0.1X TE buffer (Appendix Table 3).

2.1.2.2 Quantification of genomic DNA

The quality of the DNA was checked by the gel comparison method (running 50 ng aliquots of DNA on a 0.8% agarose gel). Concentrations of the isolated DNA stocks were measured by Hoechst dye fluorometry using a DyNA Quant™ 200 Fluorometer (Hoefer, USA). All DNA samples were diluted to a working concentration of 15 ng/μL and stored at -20 °C in 50 μL aliquots. Recipes of the buffers and solutions used for DyNA Quant™ 200 Fluorometer measurements are given in Appendix Table 4.
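The dilution to the working concentration follows the usual C1 × V1 = C2 × V2 relation. The sketch below shows the calculation for one 50 μL aliquot at 15 ng/μL; the stock concentration of 150 ng/μL is a hypothetical fluorometer reading used only for illustration, and the function name is not from the original protocol.

```python
def dilution_volumes(stock_ng_per_ul: float,
                     target_ng_per_ul: float,
                     final_volume_ul: float) -> tuple[float, float]:
    """Return (stock volume, diluent volume) in uL using C1*V1 = C2*V2."""
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


if __name__ == "__main__":
    # Hypothetical stock reading of 150 ng/uL, diluted to 15 ng/uL in 50 uL.
    stock_ul, diluent_ul = dilution_volumes(150.0, 15.0, 50.0)
    print(f"{stock_ul:.1f} uL stock + {diluent_ul:.1f} uL diluent")
```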

2.1.3 Simple sequence repeats (SSR) analysis

A total of 100 SSR primer pairs, 50 each from the MGHES and BNL series, were selected for the analysis. Sequence information for these primers was downloaded from http://www.mainlab.clemson.edu/cmd/primer, and the primers were synthesized by GeneLink, USA. Polymerase chain reaction (PCR) was performed in a thermal cycler (Mastercycler gradient, Eppendorf, Germany).

2.1.3.1 Polymerase chain reaction (PCR)

Polymerase chain reaction was performed in 0.2 mL PCR tubes. The total volume of each amplification reaction was 20 µL. The concentrations of the reagents used in PCR (supplied by Fermentas, USA) are given in Table 2.2.


Table 2.2 Concentration of reagents used in polymerase chain reaction

Reagent                               Concentration   Volume
PCR buffer                            10X             2.0 µL
MgCl2                                 25 mM           1.6 µL
dNTPs                                 2.5 mM          2.0 µL
Primer forward                        50 ng/µL        1.0 µL
Primer reverse                        50 ng/µL        1.0 µL
Taq DNA polymerase                    5 units/µL      0.2 µL
Template DNA                          15 ng/µL        2.5 µL
Double distilled de-ionized water     -               9.7 µL
Total volume                                          20.0 µL
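When many samples are amplified with the same primer pair, the per-reaction volumes in Table 2.2 can be scaled into a master mix. The sketch below is an illustration rather than part of the original protocol: the 10% pipetting overage and the function names are assumptions, and the template DNA is kept out of the mix because it differs between samples.

```python
# Per-reaction volumes (uL) from Table 2.2, excluding template DNA.
REAGENTS_UL = {
    "PCR buffer (10X)": 2.0,
    "MgCl2 (25 mM)": 1.6,
    "dNTPs (2.5 mM)": 2.0,
    "Forward primer (50 ng/uL)": 1.0,
    "Reverse primer (50 ng/uL)": 1.0,
    "Taq DNA polymerase (5 U/uL)": 0.2,
    "Water": 9.7,
}
TEMPLATE_UL = 2.5       # added to each tube separately
REACTION_UL = 20.0


def master_mix(n_reactions: int, overage: float = 0.10) -> dict[str, float]:
    """Scale per-reaction volumes; `overage` is an assumed pipetting allowance."""
    if n_reactions <= 0 or overage < 0:
        raise ValueError("n_reactions must be positive and overage non-negative")
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in REAGENTS_UL.items()}


def final_concentration(stock: float, volume_ul: float,
                        total_ul: float = REACTION_UL) -> float:
    """Final concentration in the reaction, in the same unit as `stock`."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    for name, vol in master_mix(36).items():   # e.g. one tube per accession
        print(f"{name:30s} {vol:7.2f} uL")
    print("Final MgCl2 (mM):", final_concentration(25.0, 1.6))
```

With these volumes the reaction contains 2.0 mM MgCl2 and 0.25 mM dNTPs, and each tube receives 17.5 µL of mix plus 2.5 µL of template.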

Table 2.3 Thermal cycler was programmed using the following profile

Steps                                                 Temperature   Time               Number of cycles
Initial denaturation                                  94 °C         4 min              1 (first)
Denaturation                                          94 °C         45 sec             35
Annealing (depending on individual microsatellite)    50-60 °C      30 sec             35
Extension                                             72 °C         1 min              35
Final extension                                       72 °C         5 min              1
Hold                                                  4 °C          Until turned off
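The profile in Table 2.3 can be written down as a small data structure, which also gives a rough lower bound on the run time. This is only a sketch: the 55 °C annealing temperature is a placeholder within the stated 50-60 °C range, and the estimate ignores ramping between temperatures and the final hold.

```python
# (stage name, number of cycles, [(step, temperature in C, seconds), ...])
PROFILE = [
    ("initial denaturation", 1, [("denature", 94, 4 * 60)]),
    ("amplification", 35, [
        ("denature", 94, 45),
        ("anneal", 55, 30),      # placeholder; 50-60 C depending on the SSR
        ("extend", 72, 60),
    ]),
    ("final extension", 1, [("extend", 72, 5 * 60)]),
]


def programmed_seconds(profile) -> int:
    """Sum of programmed step times, excluding ramp times and the 4 C hold."""
    return sum(cycles * sum(seconds for _, _, seconds in steps)
               for _, cycles, steps in profile)


if __name__ == "__main__":
    total = programmed_seconds(PROFILE)
    print(f"{total} s (about {total / 60:.1f} min) of programmed time")
```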

2.1.4 Electrophoresis of amplified products

The amplicons were resolved on 4% MetaPhor™ (Cambrex Corporation, USA) agarose following the protocol described by the supplier. MetaPhor™ is a high-resolution agarose that can separate PCR products and small DNA fragments that differ in size by 2%. For gel casting, 1X TBE (Tris borate EDTA; 45 mM Tris-borate, 1 mM EDTA, pH 8) buffer (the recipe for 5X TBE buffer is given in Appendix Table 5) was chilled overnight at 4 °C. MetaPhor™ and agarose powder were dissolved thoroughly by sprinkling them slowly over the chilled, stirring buffer and were melted completely by heating in a microwave oven in short intervals to avoid excessive foaming and spilling out of the flask. Ethidium bromide was added at a concentration of 0.5 µg/mL when the solution had cooled to ~60 °C, and the solution was poured into the gel casting tray. After the gel had set, it was cooled at 4 °C for 30 min before loading to obtain good resolution. The loading dye {40% (w/v) sucrose, 0.25% (w/v) bromophenol blue, 0.1 mM Tris, 0.05 mM EDTA} was added directly to the products and 12 µL of the mixture was loaded into the wells. A 50 bp DNA ladder (Fermentas, USA) was loaded in the first well of each gel to estimate the size of the amplified products.

2.1.5 Data scoring and statistical analysis

All visible and unambiguously scorable fragments were scored. The amplification profiles of all 36 species and landraces were compared with each other. The amplified DNA fragments were scored as present (1) or absent (0). Polymorphism information content (PIC) analysis was conducted to calculate the genetic diversity of each microsatellite locus (Anderson et al., 1993). The PIC value of each locus was calculated as:

$PIC_i = 1 - \sum_{j} P_{ij}^{2}$

where $P_{ij}$ is the frequency of the jth allele for the ith locus, and the sum runs over all alleles at that locus.
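The PIC formula above can be sketched in a few lines. The function name is illustrative; the example frequencies are those listed for MGHES 33 in Table 3.1.

```python
def pic(allele_frequencies):
    """PIC for one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_frequencies)


# Allele frequencies reported for MGHES 33 in Table 3.1.
print(pic([0.25, 0.75]))
```

For rounded frequencies such as those printed in Table 3.1, the result may differ in the third decimal place from a PIC computed on unrounded frequencies.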

A similarity matrix was calculated from the scoring profile (Nei and Li, 1979). These similarity coefficients were used to construct the phylogenetic tree by the unweighted pair group method with arithmetic means (UPGMA) in PAUP version 4.4. Three dendrograms were developed, using the gSSR, EST-SSR and combined data sets. The amplification percentage was calculated by the formula described by Kuleung et al. (2004).
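For readers unfamiliar with the Nei and Li (1979) coefficient, it is commonly written as 2·N_ab / (N_a + N_b), where N_ab is the number of bands shared by two samples and N_a and N_b are the band counts of each. The sketch below computes it from 0/1 scoring profiles; the three profiles are hypothetical, and the clustering itself (UPGMA in PAUP) is not reproduced here.

```python
def nei_li_similarity(profile_a, profile_b):
    """Nei-Li (Dice) similarity between two equal-length 0/1 band profiles."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    total = sum(profile_a) + sum(profile_b)
    return 2.0 * shared / total if total else 0.0


# Hypothetical 0/1 scores for three samples across five bands.
profiles = {
    "sample_1": [1, 1, 0, 1, 0],
    "sample_2": [1, 0, 0, 1, 1],
    "sample_3": [0, 0, 1, 0, 1],
}

names = list(profiles)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        s = nei_li_similarity(profiles[first], profiles[second])
        print(f"{first} vs {second}: {s:.3f}")
```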

% of amplification = (no. of amplified markers x 100/total no. of markers)
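The amplification percentage is a simple proportion. A minimal sketch follows, with purely illustrative counts (not results of this study) and a hypothetical function name.

```python
def amplification_percentage(n_amplified, n_total):
    """Percentage of markers that produced an amplification product."""
    if n_total <= 0:
        raise ValueError("total number of markers must be positive")
    return n_amplified * 100.0 / n_total


# Illustrative counts only: 45 of 60 markers amplified.
print(amplification_percentage(45, 60))
```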


Similarly, the transferability of each SSR marker was calculated as the percentage of amplified products across all the cotton species. The correlation between repeat type and polymorphism rate was also estimated manually. The frequency distribution of the alleles among all the cotton species was calculated in MS Excel. For this purpose, the alleles were divided into 10 categories, i.e. alleles with a frequency of 0.1 or less, 0.19 or less, 0.29 or less, 0.39 or less, 0.49 or less, 0.59 or less, 0.69 or less, 0.79 or less, 0.89 or less, and 0.99 or less. The association between polymorphism rate and the number of tandem repeats was also calculated manually on the basis of the PIC values given in Table 3.1.
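The binning of allele frequencies into the ten categories can be sketched as below. The study did this in MS Excel; this Python version is only an illustration. Treating each category as "above the previous bound, up to and including this bound", and placing frequencies above 0.99 in an extra eleventh category, are assumptions made here, and the example frequencies are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the ten categories named in the text.
UPPER_BOUNDS = [0.10, 0.19, 0.29, 0.39, 0.49, 0.59, 0.69, 0.79, 0.89, 0.99]


def frequency_category(freq):
    """Index of the first category whose upper bound is >= freq.

    Index 10 (assumption) collects frequencies above 0.99.
    """
    return bisect_left(UPPER_BOUNDS, freq)


# Hypothetical allele frequencies.
frequencies = [0.025, 0.10, 0.15, 0.469, 1.0]
counts = Counter(frequency_category(f) for f in frequencies)
for index in sorted(counts):
    label = f"<= {UPPER_BOUNDS[index]}" if index < 10 else "> 0.99"
    print(label, counts[index])
```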

2.2 Experiment No. 2. Identification of differentially expressed genes from normal (fiber producing) and mutant (fiberless) ovules of Gossypium hirsutum during fiber elongation stage

2.2.1 Plant Material and RNA extraction

Gossypium hirsutum L. is known for its high yield potential. Seeds of normal (fiber producing) plants of this species and of the mutant (fuzzless/lintless, fl) were kindly provided by the late Prof. J. McD. Stewart (University of Arkansas, Fayetteville, Arkansas, USA). Plants were grown in a greenhouse at NIBGE. Environmental conditions (temperature and relative humidity) were controlled artificially and kept at 25 ± 2 °C and 50%, respectively. The seeds germinated 4-10 days after sowing. Floral buds were tagged before anthesis. The first day of flower opening was taken as 0 DAP (day after pollination). Fresh samples of 3 dpa ovules from both types of plants, wild type and mutant (fl), were collected directly in liquid nitrogen. The samples were wrapped in aluminum foil and stored at -80 °C if not used directly for RNA isolation. Ovules were ground into a fine powder. The Qiagen Mini Plant RNA isolation kit was used for RNA isolation following the manufacturer's protocol.

Before starting RNA extraction, all plastic and glassware was treated with 0.01% DEPC (diethyl pyrocarbonate) and autoclaved. The working area was cleaned with RNase Away solution (Invitrogen) to avoid RNase activity.

The integrity of the isolated total RNA was checked by electrophoresis on 1% non-denaturing agarose gels stained with ethidium bromide (0.5 µg/mL) in 1X TBE buffer (Appendix Table 3). The quantity of the isolated RNA was measured with a spectrophotometer at 260/280 nm.

2.2.2 Synthesis of the first strand cDNA and Differential Display RT-PCR (DDRT-PCR)

The RevertAid H Minus First Strand cDNA Synthesis Kit (Fermentas, USA) was used to synthesise cDNA from DNase-treated total RNA. The manufacturer's protocol, with slight modifications in temperature, was performed in a Mastercycler gradient (Eppendorf, Germany). This cDNA was amplified using each of the 11 anchored primers in combination with one of the 15 arbitrary primers (Table 2.6).

2.2.3 DDRT-PCR

Differential display analysis was conducted according to the protocol reported by Liang et al. (1992). The volume of each DDRT-PCR reaction was 20 µL, and the concentrations of the PCR reagents (supplied by Fermentas, USA) used in the reaction mixture are given below.

Table 2.4 Concentration of PCR reagents

| PCR reagents | Concentration | Volume |
|---|---|---|
| Template cDNA | 1:30 dilution | 5 µL |
| dNTPs (10 mM) | 2.5 mM | 2.0 µL |
| Buffer (10X) | 10X | 2.0 µL |
| MgCl2 (25 mM) | 25 mM | 1.6 µL |
| Forward primer (1 µg/µL) | 50 ng/µL | 1 µL |
| Reverse primer (1 µg/µL) | 50 ng/µL | 1 µL |
| Taq polymerase (5 U/µL) | 2.5 U/µL | 0.2 µL |
| Double distilled deionized water (d3H2O) | | 7.2 µL |
| Total volume | | 20 µL |

Amplification was performed in a Mastercycler gradient (Eppendorf, Germany) programmed as follows:


Table 2.5 PCR profile

| Steps | Temperature | Time | No. of cycles |
|---|---|---|---|
| Initial denaturation | 94 °C | 4 min | 1 |
| Denaturation | 94 °C | 1 min | 40 |
| Annealing | 40 °C | 1 min | 40 |
| Extension | 72 °C | 1 min | 40 |
| Final extension | 72 °C | 10 min | 1 |
| Hold | 4 °C | Until turned off | |


Table 2.6 Primer sequences, used in the synthesis of cDNA

| 5' Arbitrary primers | Sequence | 3' Anchored oligo-dT primers | Sequence |
|---|---|---|---|
| P1 | 5'-ATTAACCCTCACTAAATGCTGGGGA-3' | T1 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTAA-3' |
| P2 | 5'-ATTAACCCTCACTAAATCGGTCATAG-3' | T2 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTAC-3' |
| P3 | 5'-ATTAACCCTCACTAAATGCTGGTGG-3' | T3 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTAG-3' |
| P4 | 5'-ATTAACCCTCACTAAATGCTGGTAG-3' | T4 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTCA-3' |
| P5 | 5'-ATTAACCCTCACTAAAGATCTGACTG-3' | T5 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTCC-3' |
| P6 | 5'-ATTAACCCTCACTAAATGCTGGGTG-3' | T6 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTCG-3' |
| P7 | 5'-ATTAACCCTCACTAAATGCTGTATG-3' | T7 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTGA-3' |
| P8 | 5'-ATTAACCCTCACTAAATGGAGCTGG-3' | T8 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTGC-3' |
| P9 | 5'-ATTAACCCTCACTAAATGTGGCAGG-3' | T9 | 5'-CATTATGCTGAGTGATATCTTTTTTTTTGG-3' |
| A1 | 5'-AAGCTTGATTGCC-3' | B1 | 5'-AAGCTTTTTTTTTTTTTA-3' |
| A2 | 5'-AGCTTCAAGACC-3' | B2 | 5'-AAGCTTTTTTTTTTTTTG-3' |
| A3 | 5'-AAGCTTTATTTAT-3' | | |
| A4 | 5'-AAGCTTCGACTGT-3' | | |
| A5 | 5'-AAGCTTGCCTTTA-3' | | |
| A6 | 5'-AAGCTTCTTTGGT-3' | | |


2.2.4 Electrophoresis of amplified products

The DDRT-PCR products were fractionated on 4% MetaPhor agarose gels, visualized and analysed as described in section 2.1.4. A 1 kb DNA ladder (Fermentas, USA) was loaded in the first and last lanes of the gels to estimate the size of the amplicons.

2.2.5 Isolation, Reamplification and Confirmation of Differentially Expressed Transcripts

Bands that appeared consistently in normal (fiber producing) ovules were excised from the gel and eluted using a DNA extraction kit (Fermentas), following the gel elution protocol described in section 2.2.5.1. The DNA was precipitated with ethanol, dissolved again in 0.1X TE buffer and re-amplified. Reamplification was performed in a 25 µL volume containing the same set of primers with which the cDNA had been amplified to produce the differentially expressed products. For each reamplified band, a negative control without reverse transcriptase was used to check RNA contamination. Loading dye {40% (w/v) sucrose, 0.25% (w/v) bromophenol blue, 0.1 mM Tris, 0.05 mM EDTA} was added directly to the reamplified PCR products, which were separated on an ethidium bromide (0.5 µg/mL) stained horizontal agarose gel. The 4% gel of 1:1 agarose:MetaPhor was prepared in chilled (4 °C) 1X TBE buffer (Appendix Table 3). A 100 bp DNA ladder (Fermentas) was used to estimate the sizes of the reamplified PCR products.

2.2.5.1 Elution protocol from agarose gel

1. The DNA fragment was excised from the gel, visualised under UV light.
2. The approximate volume of the gel slice was determined by weighing (0.1 g of gel was taken as ~100 µL), and the slice was transferred to an Eppendorf tube (1.5 mL). Large pieces were sliced into 2 mm cubes to facilitate dissolving the gel in the next step.
3. Binding solution was added at ~3 times the weight of the sliced gel. The tubes were placed in a dry bath incubator at 45-55 °C. After each minute the contents of the tube were mixed and the tube was returned to the dry bath. The gel dissolved completely after about five minutes.
4. Silica was added to the solution, which was incubated for 15 min to bind the DNA to the silica matrix.
5. The contents were spun at 14000 rpm (5-10 seconds) in a microcentrifuge to remove small amounts of silica powder from the DNA.
6. The pellet was washed three times with ice-cold washing buffer, and the supernatant was discarded after the final wash.
7. The pellet was vacuum dried for 2-5 min and then re-suspended in the elution buffer.
8. The DNA was eluted from the silica by centrifugation for 30 sec. The supernatant containing the eluted DNA was carefully collected in a new tube.

2.2.6 DNA Sequencing

Reamplified PCR products were sequenced using an automated DNA sequencer (ABI 3100). Both strands of the transcripts were sequenced with M13 primers.

2.2.7 Data Analysis

Nucleotide sequences were translated into amino acid sequences, and the amino acid sequence of each clone was searched with the BLAST search tool at NCBI to find the maximum similarity with cotton gene sequences.
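The translation step can be illustrated with the standard genetic code. This is only a sketch: the tool actually used for translation is not named in the text, the function name is hypothetical, and the example sequence is made up.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a DNA string in the given forward frame (0, 1 or 2).

    Stop codons are written as '*'; codons with ambiguous bases as 'X'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# Made-up example sequence, translated in the three forward frames.
example = "ATGGCCTGGTAA"
for frame in range(3):
    print(frame, translate(example, frame))
```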

2.3 Experiment No. 3. Isolation of fiber related partial sequences from Gossypium arboreum at different developmental stages and phylogenetic study of translation elongation factor-1 gamma gene across the genomes of cotton.

2.3.1 Plant material

Gossypium arboreum L. cv. Ravi was used in this investigation. Seeds of this variety were delinted in 10% H2SO4, washed with tap water and air-dried. The delinted seeds were sown in pots that were kept in an environmentally controlled growth chamber (Conviron, Canada) and monitored daily. Plants were grown in three replicates. The growth parameters maintained in the growth chamber are given below (Table 2.7).


Table 2.7 Growth parameters for raising cotton seedlings

| Time (24 hr/day) | Temperature (°C) | Relative humidity (%) | Fluorescent light | Incandescent light |
|---|---|---|---|---|
| 0:00 | 25 | 65 | 0 | 0 |
| 6:00 | 25 | 65 | 2 | 2 |
| 10:00 | 30 | 65 | 2 | 2 |
| 14:00 | 35 | 65 | 2 | 2 |
| 21:00 | 30 | 65 | 0 | 0 |
| 23:59 | 25 | 65 | 0 | 0 |

2.3.2 RNA extraction at different developmental stages of cotton fibers

Total RNA was extracted from fresh samples of G. arboreum L. ovules at different developmental stages (0 dpa, 5 dpa and 10 dpa) according to the protocol described in section 2.2.1.

2.3.3 Synthesis of 1st strand cDNA

First strand cDNA was synthesised from total RNA isolated at the different fiber developmental stages (0 dpa, 5 dpa and 10 dpa) using the RevertAid H Minus First Strand cDNA Synthesis Kit (Fermentas, USA). The manufacturer's protocol, with slight modifications in temperature, was performed in a Mastercycler gradient (Eppendorf, Germany). This cDNA was amplified using gene specific primers (Table 2.8).

2.3.4 Gene specific primer designing

Primers were designed from conserved regions of fiber ESTs showing homology with genes encoding proline rich cell wall protein, ubiquitin extension protein, translation elongation factor-1 gamma and Myb like protein.


2.3.4.1 Criteria for primer designing

1. The length of the primers was kept close to 18 nucleotides.
2. The melting temperature (2*(A+T) + 4*(C+G)) of the forward and reverse primers was estimated using Oligo Calc (www.justbio.com) and was kept similar by increasing or decreasing the number of nucleotides in one of the primers of a given pair.
3. To avoid primer dimer formation with itself or with other primers in the reaction, care was taken to ensure that the primers had no complementary sequences.
4. The average GC content of each primer was kept close to 50%.
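Criteria 2 and 4 above can be checked with a short script. The sketch below implements the stated rule 2*(A+T) + 4*(C+G) and the GC percentage; the function names are illustrative, and the primers are the SNP 10 pair from Table 2.8.

```python
def wallace_tm(primer):
    """Melting temperature estimate (°C) using 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def gc_percent(primer):
    """GC content of a primer as a percentage of its length."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


# SNP 10 primer pair from Table 2.8.
pair = {"F": "CATCATTGGGCTGGACATTG", "R": "GTACACATCGGCATAGGTAG"}
for name, seq in pair.items():
    print(name, len(seq), "nt", wallace_tm(seq), "°C", f"{gc_percent(seq):.1f}% GC")
```

This rule of thumb ignores salt and primer concentration, so the values are best used only to compare the two primers of a pair.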

Table 2.8 Primer sequences

| Sr. No | Name | Sequence | Function |
|---|---|---|---|
| 1 | SNP 10 | F-CATCATTGGGCTGGACATTG; R-GTACACATCGGCATAGGTAG | Proline rich cell wall protein |
| 2 | SNP 24 | F-CTGAGGAAGCTGCTATTGC; R-CCAGTCATCCAGTACCATC | Translation elongation factor-1 gamma |
| 3 | SNP 76 | F-CCAATGTGATGAAGCTCC; R-GGTGTACCTGAACCATTG | Ubiquitin extension protein |
| 4 | Y23 | F-GGCTTCTTGCCTTCTTCACC; R-ATTCGGCACGAGAAAAGCC | Myb like protein |


2.3.5 Polymerase chain reaction

Concentration of PCR reagents (supplied by Fermentas, USA) used in reaction mixture is given below:

Table 2.9 Concentration of PCR reagents to amplify fiber related genes.

| PCR reagents | Concentration | Volume |
|---|---|---|
| Template cDNA | 1:30 dilution | 5 µL |
| dNTPs (10 mM) | 10 mM | 1 µL |
| Buffer (10X) | 10X | 5 µL |
| MgCl2 (25 mM) | 25 mM | 4 µL |
| Forward primer (1 µg/µL) | 50 ng/µL | 1 µL |
| Reverse primer (1 µg/µL) | 50 ng/µL | 1 µL |
| Taq polymerase (5 U/µL) | 2.5 U | 1 µL |
| Double distilled deionized water (d3H2O) | - | 32.00 µL |
| Total volume | | 50 µL |

Amplification was performed in a Mastercycler gradient (Eppendorf, Germany) programmed for amplification of the fiber genes as follows:

Table 2.10 PCR profile

| Steps | Temperature | Time | Number of cycles |
|---|---|---|---|
| Initial denaturation | 94 °C | 4 min | 1 (first) |
| Denaturation | 94 °C | 1 min | 40 |
| Annealing | variable | 1 min | 40 |
| Extension | 72 °C | 1 min | 40 |
| Final extension | 72 °C | 10 min | 1 |
| Hold | 4 °C | Until turned off | |


2.3.6 Agarose gel electrophoresis of amplified products

The products amplified with gene specific primers were resolved on 1.2% agarose gel. The protocol described in section 2.2.4 was used for gel electrophoresis.

2.3.7 DNA fragments elution protocol from agarose gel

Fragments were excised from the ethidium bromide-stained gel and eluted according to the protocol described in section 2.2.5.1. Cloning was carried out using E. coli strain DH5α. The vector selected for cloning of the eluted fragments was pTZ57R/T, provided with the T-A cloning kit (Fermentas) (Appendix Table 2). It contains an ampicillin resistance gene.

2.3.7.1 Ligation of fragments

Eluted fragments were ligated to the vector using the following protocol (Table 2.11).

Table 2.11 Protocol for ligation reaction

| Reagents | Volume |
|---|---|
| Vector pTZ57R/T | 2 µL |
| Eluted DNA fragment | 6 µL (variable depending on concentration) |
| Ligase buffer (2X) | 4 µL |
| T4 DNA ligase | 1 µL |
| Double distilled H2O | 7 µL |
| Total | 20 µL |

The ligation mixture was incubated at 14 °C overnight (~14-16 hours).


2.3.7.2 Preparation of heat-shock competent cells of E. coli

1. 100 mL of LB medium was inoculated with a single colony picked from an LB agar plate and shaken overnight at 37 °C.
2. 2 mL of the overnight culture was diluted to 250 mL in a 1 liter flask and shaken at 37 °C to an optical density (OD) of 0.5-1.0.
3. The cells were kept on ice (30 min) and centrifuged at 3000 rpm (5 min).
4. After discarding the supernatant, the tubes were placed on ice. In total, 5 mL of sterilized, pre-cooled 0.1 M MgCl2 was added and the pellet was resuspended by gentle mixing.
5. After the MgCl2 treatment, the cells were centrifuged at 3000 rpm (5 min).
6. The supernatant was discarded and the pellet was re-suspended in 5 mL of pre-cooled 0.1 M CaCl2 containing glycerol (15%).
7. The cells were centrifuged again at 3000 rpm for 5 min.
8. The supernatant was discarded and the pellet was resuspended in 2 mL of 0.1 M CaCl2 and 0.7 mL of 100% glycerol.
9. Aliquots of 200 µL of the cell suspension were taken in liquid-nitrogen-frozen microcentrifuge tubes and stored at -80 °C in a freezer.

2.3.7.3 Transformation in E. coli by heat shock

A 5 µL portion of the DNA ligation mixture was added to a 200 µL aliquot of competent cells, placed on ice for 30 min, heated at 42 °C for 2 min and cooled again on ice for 2 min. After addition of 1 mL of LB medium, the cells were incubated at 37 °C for one hour. Transformed culture (100 µL), IPTG (30 µL) and X-gal (30 µL) were spread on solid LB medium containing ampicillin (100 ng/mL). After complete absorption of the liquid, the plates were sealed with parafilm and kept overnight at 37 °C. Colonies (white and transparent) were picked using sterile toothpicks and cultured overnight in LB medium containing ampicillin (100 ng/mL) at 37 °C in a water bath with vigorous shaking. A 1% agarose gel was used to check the isolated plasmids.


2.3.7.4 Plasmid isolation from E. coli

A single E. coli colony was inoculated into 20 mL of liquid LB medium containing ampicillin (100 µg/mL), grown overnight at 37 °C and then centrifuged at 14000 rpm in a 1.5 mL Eppendorf tube for 5 min. After the supernatant was discarded, the pellet was air dried (2 min). The pellet was suspended in 100 µL of solution (50 mM Tris, 1 mM EDTA and 100 µg/mL RNase) by vortexing. Then 150 µL of solution (0.2 N NaOH and 1% SDS) was added and mixed well by gently inverting the tube, and after addition of 200 µL of solution (potassium acetate) the tube was centrifuged at 14000 rpm (5 min). The supernatant was collected in a new Eppendorf tube, 100% ethanol (2 volumes) was added, and the tube was kept at -40 °C for 20 min and centrifuged again for 10 min at 14000 rpm. The pellet was washed with 70% ethanol, vacuum dried and dissolved in 20 µL of H2O. The concentration and size of the plasmid were determined on a 1% agarose gel by comparison with a 1 kb DNA ladder (Fermentas).

2.3.7.5 Restriction reaction

A restriction reaction was performed to confirm the clones, as set out below (Table 2.12).

Table 2.12 Protocol for restriction reaction

| Reagents | Volume |
|---|---|
| DNA | 8.0 µL |
| Buffer | 2.0 µL |
| EcoRI | 0.5 µL |
| PstI | 0.5 µL |
| RNase | 1.0 µL |
| d3H2O | 8.0 µL |
| Total | 20.0 µL |


The digestion mixture was incubated at 37 °C for one hour and checked on a 1% agarose gel to confirm the size of the cloned fragment by comparison with a 1 kb DNA ladder (Fermentas).

2.3.8 DNA Sequencing

Cloned fragments were sequenced using an automated DNA sequencer (ABI 3100).

2.3.8.1 Sequence analysis

The nucleotide sequence of each clone was searched with the BLAST search tool at NCBI to find the maximum similarity with cotton gene sequences.

2.3.9 BAC library screening

2.3.9.1 Filter preparation, probe designing and hybridization

For hybridization-based screening of the BAC library, Hybond N+ membranes (22.5 cm × 22.25 cm) were prepared with 18432 BACs using a Q-Bot (Genetix). Gridding the clones in a 4 × 4 array in duplicate gave this high number (18432) of BACs per membrane. The membranes were incubated on 1% LB agar containing 12.5 µg/mL chloramphenicol at 37 °C for 12-18 hours.

The procedure described by Bowers et al. (2005) was followed for probe design and hybridization to the libraries. Three probes (Table 2.13) labelled with P-32 were designed from partial sequences of cotton fiber genes. To isolate fiber specific genomic regions from the libraries of four cotton species (G. hirsutum L., G. arboreum L., G. raimondii L. and G. kirkii L.), these probes were hybridized to membranes of each library containing 18,432 BACs. Films were scored manually, text-recognition software (ABBYY FineReader) was used to digitize the scores, and the data were deconvoluted and stored in the MS Access database system "BACMan".


Table 2.13 Sequences of Overgo probes

| Probes | Sequences |
|---|---|
| COV2419 A | TTTATTGACAATTTGCACTAGATAA |
| COV2419 B | ACAAATAAACATACTATTATCTAG |
| COV2420 A | ATGGTCTTCCCCGTTAGGGTTTTG |
| COV2420 B | AGATGCAGATCTTCGTCAAAACCC |
| COV2421 A | TCCAAAAGTACCTCTCAACATGAG |
| COV2422 | TTTACCTCGGAGTTCCCTCATGTT |

2.3.9.2 BAC clone sequencing

To isolate plasmid DNA from the clones showing positive hits with the gene specific probes, 3 mL LB cultures containing 12.5 μg/μL chloramphenicol were grown overnight and miniprepped robotically (Autogen 740 plasmid isolation system). This plasmid DNA was prepared in 96-well plates.

2.3.9.3 Sequencing reaction

The sequencing reaction was performed in 0.2 mL 96-well plates with gene specific primers (Table 2.14). The total volume of each reaction was 7 µL. The concentrations of the reagents used in the reaction mixture are given in Table 2.15. Cycle sequencing was performed as given in Table 2.16.


Table 2.14 Primers sequences used for sequencing of positive clones

| Primer ID | Primer sequence |
|---|---|
| MSP-1R | AGAATCCTCTTGATTTGCTGCCTC |
| MSP-2F | CAGCAATAACAGCTTCCTCAGAAT |
| MSP-3-1-F | TGGTAGAGAGACTGCCTGCT |
| MSP-3-2-F | CATGATAATGTCAGCAAGGG |

Sequence (homology: translation elongation factor-1 gamma):

GAGGCAGCAAATCAAGAGGATTCT
TTGGTTTGGGCTTTGGTGCCTC
CTCCTCCTCAGCGGCCTCTGCCT
TTGCTGCCTGTTTCTCAACCTCCT
CCTTTGGTTTAGTTTCTTTTGGCT
TTTTGGCTTCTTTCTTTGGTTCAG
ACAGGTGGTAGAGAGACTGCCT
GGGCAGCAGGCTTCTTTGAAGGA
GCTTCACTTTACCAAGAATCTTCTT
GATTTTTGGCTGATTAACCAAGGTC
CAAAAGTACCTCTCAACATGAGGG
AACTCCGAGGTAAAACTCTTGGTC
ATGATCTGGGAGAAACCCAAGTAG
AGGTTACATGTCATGATAATGTCA
GCAAGGGTGACAAAATGTCCAACA
GGAAATGTGTTGGAAGCAAGATGA
GTGTTCAAAGCGCCAAGGGCTCTC
TTCAATGCAGCAATAACAGCTTCCT
CAGAAT

Table 2.15 Concentration of reagents used in sequencing reaction

| Reagent | Concentration | Volume |
|---|---|---|
| Big dye | 3.1 | 0.67 µL |
| Buffer | 5X | 1.07 µL |
| Primer forward or reverse | 150 pmol/µL | 0.1 µL |
| Template | 2-2.5 µg | 4 µL |
| ddH2O | | 1.17 µL |
| Total volume | | 7 µL |

Table 2.16 Profile of thermal cycler

| Temperature | Time | Number of cycles |
|---|---|---|
| 96 °C | 5 min | 1 (first) |
| 96 °C | 30 sec | 70 |
| 50 °C | 10 sec | 70 |
| 60 °C | 4 min | 70 |
| 4 °C | Until turned off | |


2.3.9.4 Clean up protocol

1. 5 µL of 0.005% SDS was added to each reaction.
2. The plate was heated at 95 °C for 5 min, cooled to room temperature (R.T) and kept at R.T for 10 min.
3. The plate reactions were passed through Sephadex columns.
4. The final filtrate, with a volume of 5 µL, was collected.

2.3.9.5 Run on “ABI 377” DNA sequencer

The filtrates were loaded on 48-lane sequencing gels in an "ABI 377" automated sequencer, run module LongSeq50_POP7_1 version 2.0, with an 'injection_time' of 30 seconds and a 'run_time' of 4600.

2.3.9.6 Analysis of sequences

High-quality BAC end sequences from each library were used as queries in FASTX. Sequences were aligned using DNASTAR, and phylogenetic analysis and nucleotide variation detection were carried out using Clustal W and Clustal V, respectively.

2.4 Experiment No. 4. Development of BAC-gSSRs from Gossypium raimondii for use in cotton improvement

2.4.1 Development of BAC-gSSRs

From the genome sequence information generated by the G. raimondii L. sequencing effort (lead PI Prof Andrew H Paterson), a total of 1294 sequences (560 BAC end sequences and 734 BAC-clone sequences) containing SSRs were identified. PCR primers were designed at the Plant Genomics and Molecular Breeding (PGMB) Labs, NIBGE, from the flanking regions of the SSRs using Primer3 software, to produce amplicons of 100-300 bp. The annealing temperature varied between primers and ranged from 50 to 61 °C.

2.4.2 Plant material

The newly designed BAC-gSSRs (Appendix Table 9) were screened on four cotton species (G. hirsutum cv. FH-1000, G. barbadense cv. PGMB-36, G. arboreum L. and G. raimondii L.) to evaluate amplification in other cotton genomes and to detect polymorphism. G. raimondii was used as a control for PCR amplification and to check the amplicons for their expected sizes; G. hirsutum cv. FH-1000 and G. barbadense cv. PGMB-36 were used to find polymorphic SSRs for further use in genetic mapping; and G. arboreum was used to check cross-species amplification of the SSRs derived from the D-genome species.

2.4.3 DNA extraction

Genomic DNA from the four cotton species listed in Table 2.17 was isolated using the CTAB method, purified and then quantified. Details of the CTAB method, purification and DNA quantification are given in sections 2.1.2, 2.1.2.1 and 2.1.2.2, respectively.

Table 2.17 Cotton species used to study BAC-gSSR polymorphism

| Species | Genome |
|---|---|
| G. hirsutum L. cv. FH-1000 | AD1 |
| G. barbadense L. cv. PGMB-36 | AD2 |
| G. raimondii L. | D5 |
| G. arboreum L. | A2 |

2.4.4 Polymerase Chain Reaction

PCR reactions (20 µL) were performed in 0.2 mL PCR tubes in a thermal cycler (Mastercycler gradient, Eppendorf, Germany). The concentrations of the PCR reagents and the thermal cycler program were the same as given in Tables 2.2 and 2.3; the only difference was the annealing temperature, which varied between BAC-gSSRs and ranged from 50 °C to 61 °C.


2.4.5 Electrophoresis of BAC-gSSRs

The PCR products were fractionated on 4% MetaPhor agarose gels, visualized and analysed as described in section 2.1.4. A 50 bp DNA ladder (Fermentas, USA) was loaded in the first well of each gel to estimate the size of the amplicons.

2.4.6 Data scoring and analysis

Data scoring and analysis was done as described in section 2.1.5.

2.5 Experiment No. 5. Genetic and QTL mapping using F2 population derived from G. hirsutum x G. barbadense

2.5.1 Parental genotypes

The parental genotypes used to develop the mapping population were G. hirsutum cv. FH-1000 and G. barbadense germplasm accession PGMB-36. FH-1000 is a variety of G. hirsutum developed from a cross between S-12 (cotton leaf curl virus (CLCV) susceptible) and CIM-448 (CLCV resistant) at the Cotton Research Institute (CRI), Faisalabad, Pakistan. It is highly resistant to the mutated strain of CLCV and was selected because of its high yield, early maturity and good adaptation to a wide range of Pakistani environments. PGMB-36 is a modern pure line of G. barbadense developed at NIBGE, Faisalabad, Pakistan. It was selected for its superior fiber quality parameters compared with FH-1000.

2.5.2 Mapping population development

These two genotypes (FH-1000 and PGMB-36) were used to develop a mapping population for constructing a genetic map, followed by the identification of QTLs associated with the contrasting fiber traits. The two parental species were crossed in 2008 in the greenhouse of NIBGE using FH-1000 as the female parent, and the F1 individuals were grown during the normal cotton growing season of November, 2009. These F1s were self-pollinated to produce F2 seeds.

The F2 population, along with the parents, was planted in the normal cotton growing season of 2010. F2 individuals were self-pollinated to produce F3 seeds. The F2:3 family lines were sown in 2011 and self-pollinated to produce F4 seeds. A population of 131 F2:3 family lines was planted along with the parents and F1s as controls in 2012, and a population of 131 F2:4 family lines was planted with the parents in April 2013 in the NIBGE cotton field.

2.5.3 Phenotyping of F2 mapping population

Phenotyping of F2 plants was done for boll beak, number of gossypol, boll shape, boll length, boll width, bracteole width and for fiber quality traits (fiber strength, length, fineness and uniformity ratio). Fiber quality traits were determined by stelometer methodologies at NIBGE, Faisalabad. The stelometer method of measurement was used over high volume instrumentation (HVI) methods to provide precise measurements.

2.5.4 Phenotyping of F2:3 and F2:4 mapping populations

The data scored for the F2 population were recorded again in the F2:3 and F2:4 generations, sampled by lines.

2.5.5 Genomic DNA isolation

Total genomic DNA of each parent (leaves from 8 to 10 plants harvested as a bulk sample), the F1 and the 131 F2 plants was isolated following the CTAB method described in section 2.1.2 and quantified following the method described in section 2.1.2.2.

2.5.6 Molecular mapping

2.5.6.1 Chemicals and enzymes

Names of enzymes and chemicals used in the present study are given in Appendix Table 1.

2.5.6.2 Simple sequence repeats (SSR) analysis

A total of 1294 G. raimondii L. derived BAC-gSSRs (Appendix Table 9) and 20 EST-SSRs (Appendix Table 6) were screened to identify polymorphism(s) between the parent species (G. hirsutum L. cv. FH-1000 and G. barbadense L. cv. PGMB-36). Of these, 74 informative SSR markers were selected for surveying the mapping population.

2.5.6.3 Polymerase chain reaction

The PCR procedure was the same as described in section 2.1.3.1.

2.5.6.4 Electrophoresis of amplified products

The detailed procedure for electrophoresis is given in section 2.1.4.

2.5.6.5 Map construction

On the basis of the 131 F2 genotypes, a linkage map with 74 marker loci was constructed with the computer program MAPMAKER/EXP version 3.0 (Lander et al., 1987), using the Kosambi function (Kosambi, 1944) to convert recombination frequencies to genetic distances (cM). Polymorphic bands in the F2 genotypes were scored as present (1) or absent (0), and the scores were then translated to the codes A, B, C, D and H. Co-dominant amplicons were coded as:

A Genotypes of parent A (FH-1000)

B Genotypes of parent B (PGMB-36)

H Heterozygote

Following are other codes for different situations:

C not A; i.e. H or B (for dominant markers)

D not B; i.e. H or A (for dominant markers)

‘-’ Missing data for the individual at a locus
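The Kosambi function used for the map distances is commonly written as d = 25·ln((1 + 2r)/(1 − 2r)) cM, where r is the recombination frequency. The sketch below only illustrates that conversion; MAPMAKER/EXP performs it internally, the function name is hypothetical, and the example values of r are arbitrary.

```python
import math


def kosambi_cm(r):
    """Convert a recombination frequency r (0 <= r < 0.5) to centimorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Arbitrary example recombination frequencies.
for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance in cM is close to 100·r, and it grows faster than r as r approaches 0.5.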

The ‘Group’ command was used to assemble the linkage groups, the ‘Order’ command to order the markers within the groups formed, and the ‘Compare’ command to determine the marker order of small linkage groups. The marker order obtained was then confirmed using the ‘Ripple’ command. Possible linkages of unassigned loci with small linkage groups were determined using the ‘Try’ command. A chi-square test was used to test the fit of each marker locus to the expected Mendelian segregation ratio, i.e. 1:2:1 for co-dominant and 3:1 for dominant markers.
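The goodness-of-fit test for segregation ratios can be sketched as follows. The genotype counts are hypothetical (chosen only to sum to 131 F2 plants), the function name is illustrative, and the 5% critical values (3.841 for 1 degree of freedom, 5.991 for 2) are standard table values rather than thresholds stated in this study.

```python
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991}  # chi-square, alpha = 0.05


def chi_square_fit(observed, ratio):
    """Chi-square statistic of observed class counts against an expected ratio."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical co-dominant marker: counts of A, H and B genotypes (1:2:1).
codominant = chi_square_fit([30, 68, 33], [1, 2, 1])
# Hypothetical dominant marker: band present vs absent (3:1).
dominant = chi_square_fit([95, 36], [3, 1])

for label, value, df in (("1:2:1", codominant, 2), ("3:1", dominant, 1)):
    fits = value < CRITICAL_5_PERCENT[df]
    print(f"{label}: chi-square = {value:.3f}, fits expected ratio: {fits}")
```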

2.5.6.6 QTL analysis

The F2 data for individual plants were used for QTL mapping of morphological attributes, whereas mean values across replicates of the F2:3 and F2:4 families were used to map QTLs for economic traits. Single marker analysis (SMA), interval mapping (IM) and composite interval mapping (CIM) were carried out using QTL Cartographer (Wang et al., 2006a) to find associations of phenotype with genotype. A significance level of 0.001 was selected for SMA. According to the criteria proposed by Lander and Botstein (1989), a minimum LOD score of 2.4 can be considered significant; in this study, however, a LOD score of ≥2.69 was used to declare a putative QTL in a given genomic region in IM and CIM analysis.
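The LOD criterion can be expressed in code. A LOD score is the base-10 logarithm of a likelihood ratio, so it converts to the likelihood-ratio statistic as LR = 2·ln(10)·LOD. The sketch below applies the ≥2.69 threshold stated above; the function names and the example scan positions and LOD values are hypothetical, not output of QTL Cartographer.

```python
import math

LOD_THRESHOLD = 2.69  # threshold used in this study for IM and CIM


def lod_to_lr(lod):
    """Likelihood-ratio statistic equivalent to a LOD score."""
    return 2.0 * math.log(10.0) * lod


def is_putative_qtl(lod, threshold=LOD_THRESHOLD):
    """True when a LOD peak meets the declaration threshold."""
    return lod >= threshold


# Hypothetical LOD peaks at positions (cM) along one linkage group.
peaks = {12.0: 1.8, 34.5: 2.69, 57.0: 4.1}
for position, lod in peaks.items():
    print(position, lod, round(lod_to_lr(lod), 2), is_putative_qtl(lod))
```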

2.5.6.6.1 Marker assisted selection

The F4 population was screened at the seedling stage using 20 of the 73 polymorphic BAC-gSSRs to check for introgressed segments from G. barbadense L. Lines that carried G. barbadense-like alleles and were phenotypically like G. hirsutum were selected for further analysis. The parent species were used as controls during this selection of seedlings. Fiber traits were measured in each generation.

72 Chapter 3: Results

RESULTS

This chapter is subdivided into sections covering genetic diversity assessment of diploid and tetraploid cotton species with two types of SSRs (gSSRs and EST-SSRs); identification of differentially expressed genes from normal (fiber producing) and mutant (fiberless) ovules of Gossypium hirsutum during the fiber elongation stage; isolation of fiber related sequences from Gossypium arboreum L. at different developmental stages and a phylogenetic study of the translation elongation factor-1 gamma gene across the genomes of cotton; identification of BAC-gSSRs in G. raimondii L.; and genetic mapping followed by QTL analysis of an F2 population derived from an inter-specific cross (G. hirsutum L. x G. barbadense L.).

3.1 Experiment No. 1. Genetic diversity and relationship of diploid and tetraploid species using EST-SSRs and gSSRs

3.1.1 Microsatellite polymorphism

Genomic DNA of 36 cotton species/landraces were surveyed with 100SSR markers. Twenty five, out of 100 SSR markers (16 BNLs and 9 MGHES) were not included in the final results, because of poor quality of amplification in all diploid genome species. The remaining 75 primers amplified 87 loci. Out of these 10primers (MGHES-19, MGHES-33, MGHES-38A, MGHES-41, MGHES-60, BNL-448, BNL- 1350, BNL-1878, BNL-3408 and BNL-3646) amplified two loci while BNL-3955 amplified three loci. A total of 73 primer pairs (97.70%) were polymorphic, whereas only two primers i.e. MGHES-13 and MGHES-17 were monomorphic. Polymorphism was observed on the basis of allele size difference or absence of allele (null allele) in one or more species. Out of 73 polymorphic primers 26 (35.61%) gave polymorphism because of null allele in some species which were confirmed by repeating the reaction twice and 47 (62.66%) gave polymorphism because of amplifying alleles of different sizes in species. The size of amplified fragments was in the range of 70bp (MGHES-12) -700bp (MGHES-07). A total of 109 alleles were detected and the range of alleles on single locus was 1 to 5 with average of 2.87/locus. Maximum numbers of alleles (5) were amplified by each of genomic SSRs i.e. BNL-1878, BNL-2691, BNL-3955 and BNL-3985.

Chapter 3: Results

Comparatively more alleles were amplified in tetraploids (2 alleles per locus) than in diploids (1.36 alleles per locus).

The frequency distribution of the 109 alleles is shown in Figure 3.1. Allele frequencies ranged from 0.025 to 1, with an average of 0.469 (Table 3.1). Eighteen alleles (16.51%) appeared at frequencies of 0.10 or lower, while twenty-nine (26.6%) were present at frequencies of 0.99 or higher. No allelic variation was observed at the MGHES-13 and MGHES-17 loci, which were amplified in all cotton species.

The average polymorphism information content (PIC) value was 0.50, with the highest value of 0.882 for MGHES-27 and the lowest of 0.11 for BNL-3895 (Table 3.2). A higher average PIC value was observed with gSSRs (0.35) than with EST-SSRs (0.291). Diploids showed a higher average PIC value (0.30) than tetraploids (0.21).
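The PIC values in Table 3.1 can be checked against the listed allele frequencies. The sketch below assumes the simplified form PIC = 1 - sum(p_i^2), where p_i is the frequency of the i-th allele at a locus. This form is inferred from the tabulated values (for example MGHES 42 and MGHES 33(2)) rather than stated in this section, and the function name `pic` is illustrative.

```python
def pic(frequencies):
    """Simplified polymorphism information content: 1 - sum(p_i ** 2).

    Assumption: this is the form behind Table 3.1; it is inferred from
    the tabulated values, not quoted from the text.
    """
    return 1.0 - sum(p * p for p in frequencies)


# Allele frequencies as listed in Table 3.1.
table_3_1 = {
    "MGHES 42": [0.5, 0.5],
    "MGHES 33(2)": [0.25, 0.75],
    "MGHES 76": [0.4, 0.6],
    "BNL 1878(2)": [0.025, 0.225, 0.5, 0.15, 0.1],
}

for marker, freqs in table_3_1.items():
    print(f"{marker}: PIC = {pic(freqs):.5f}")
```

For these four markers the computed values agree with the PIC column of Table 3.1 (0.5, 0.375, 0.48 and 0.66625). Frequencies that were rounded to three decimals in the table reproduce the listed PIC only approximately.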

Figure 3.1: Frequency distribution of 109 alleles in 36 cotton species/landraces.


Table 3.1 Markers with repeat motif type, number of alleles, frequencies and PIC Value

| Marker (loci) | Allele size range (bp) | Repeat motif | No. of alleles | Allele frequencies | PIC value |
|---|---|---|---|---|---|
| MGHES 06 | 130-175 | (CCA)7 | 3 | 0.115, 0.769, 0.115 | 0.381657 |
| MGHES 8 | 175-200 | (ATT)11 | 2 | 0.433, 0.566 | 0.491111 |
| MGHES11B | 200-225 | (TTA)3(CGG)3 | 2 | 0.769, 0.230 | 0.35503 |
| MGHES 12 | 70-100 | (AC)10 | 2 | 0.304, 0.695 | 0.42344 |
| MGHES 15 | 175-200 | (AAAC)5 | 2 | 0.523, 0.476 | 0.498866 |
| MGHES 17 | 175-190 | (CAT)6(TTC)4 | 2 | 0.341, 0.658 | 0.449732 |
| MGHES 18 | 100-200 | (AT)13 | 2 | 0.333, 0.667 | 0.444444 |
| MGHES19 (2) | 110-250 | (AGA)5(AGC)3 | 2 | 0.534, 0.465 | 0.497566 |
| MGHES 20 | 200-225 | (CCA)9 | 3 | 0.333, 0.6, 0.066 | 0.524444 |
| MGHES 21 | 200-230 | (GA)14 | 2 | 0.227, 0.772 | 0.35124 |
| MGHES 22 | 175-250 | (AGA)7(GAA)3 | 2 | 0.842, 0.157 | 0.265928 |
| MGHES 27 | 225-280 | (TCT)7(AAC)4(TTC)3 | 3 | 0.375, 0.281, 0.343 | 0.881836 |
| MGHES 30 | 175-210 | (CT)13 | 4 | 0.527, 0.416, 0.027, 0.027 | 0.546296 |
| MGHES 33(2) | 190-280 | (CT)8(TC)8 | 2 | 0.25, 0.75 | 0.375 |
| MGHES 34 | 200-250 | (CCA)3(CAC)4 | 2 | 0.472, 0.528 | 0.498457 |
| MGHES 36 | 160-210 | (CTT)8(TCA)4 | 2 | 0.25, 0.75 | 0.375 |
| MGHES38A(2) | 170-225 | (ACC)5(TCT)3 | 2 | 0.448, 0.551 | 0.494649 |
| MGHES 39 | 175-200 | (GCC)6(TCT)3 | 2 | 0.702, 0.297 | 0.417823 |
| MGHES 40 | 150-200 | (TTC)6 | 2 | 0.233, 0.767 | 0.357778 |
| MGHES 41(2) | 210-290 | (CAA)8 | 2 | 0.791, 0.208 | 0.329861 |
| MGHES 42 | 170-240 | (AGA)6 | 2 | 0.5, 0.5 | 0.5 |
| MGHES 44 | 200-290 | (GAA)10(TCA)3(CAT)3 | 3 | 0.178, 0.678, 0.142 | 0.487245 |
| MGHES 51 | 190-210 | (ACAA)5(TA)4(AT)5 | 3 | 0.281, 0.468, 0.25 | 0.648672 |
| MGHES 52 | 250-340 | (CAC)5(AGG)3(CCA)3 | 3 | 0.166, 0.694, 0.138 | 0.470679 |
| MGHES 60(2) | 125-200 | (AT)15(TA)5 | 2 | 0.355, 0.644 | 0.458272 |
| MGHES 63 | 160-200 | (TTTTA)6 | 3 | 0.057, 0.685, 0.257 | 0.460408 |
| MGHES 76 | 200-230 | (GA)20 | 2 | 0.4, 0.6 | 0.48 |
| BNL 448(2) | 90-225 | (CT)13 | 4 | 0.125, 0.553, 0.196, 0.125 | 0.623724 |
| BNL 827 | 250-275 | (CA)19 | 3 | 0.233, 0.6, 0.166 | 0.557778 |
| BNL 1317 | 175-200 | (AG)14 | 3 | 0.086, 0.826, 0.086 | 0.302457 |
| BNL 1350(2) | 200-300 | (CA)8(GA)16 | 3 | 0.222, 0.278, 0.379 | 0.623457 |
| BNL 1878(2) | 125-250 | (AG)14 | 5 | 0.025, 0.225, 0.5, 0.15, 0.1 | 0.66625 |
| BNL 2691 | 230-260 | (GA)23 | 5 | 0.428, 0.5, 0.071, 0.75, 0.667 | 0.561224 |
| BNL 3065 | 175-200 | (AG)21 | 3 | 0.108, 0.521, 0.369 | 0.579395 |
| BNL 3103 | 150-200 | (GA)13(TC)14 | 3 | 0.367, 0.408, 0.224 | 0.648063 |
| BNL 3147 | 180-220 | (AG)11 | 3 | 0.189, 0.567, 0.243 | 0.582907 |
| BNL 3255 | 225-320 | (GC)6AT(AC)14 | 4 | 0.310, 0.206, 0.103, 0.379 | 0.706302 |
| BNL 3408(2) | 200-250 | (GT)2AT(GT)12 | 4 | 0.0606, 0.469, 0.439, 0.030 | 0.581726 |
| BNL 3558 | 200-250 | (AC)11 | 3 | 0.121, 0.609, 0.268 | 0.541344 |
| BNL 3563 | 240-275 | (CA)13(TA)4 | 3 | 0.041, 0.583, 0.375 | 0.517361 |
| BNL 3627 | 160-200 | (TC)17 | 4 | 0.161, 0.483, 0.354, 0.233 | 0.613944 |
| BNL 3646(2) | 110-170 | (TC)14 | 4 | 0.054, 0.783, 0.054, 0.108 | 0.368152 |
| BNL 3793 | 150-300 | (TG)15 | 3 | 0.225, 0.375, 0.4 | 0.63875 |
| BNL 3888 | 200-250 | (TG)15 | 3 | 0.115, 0.576, 0.307 | 0.559172 |
| BNL 3895 | 180-200 | (TG)10 | 2 | 0.058, 0.942 | 0.110727 |
| BNL 3955(3) | 170-350 | (CA)12(GT)13 | 5 | 0.043, 0.108, 0.195, 0.152, 0.5 | 0.674858 |
| BNL 3985 | 150-300 | (TC)23 | 5 | 0.086, 0.1521, 0.195, 0.5, 0.065 | 0.676749 |

The number in parentheses in column one indicates the number of loci amplified.

76

Chapter 3: Results

3.1.2 Genetic characterization

Twenty-two informative SSRs with PIC ≥ 0.5, comprising 11 gSSRs (BNLs) and 11 EST-SSRs (MGHES), were identified (Table 3.2); together they can distinguish all the Gossypium species.

Based on the PIC values of the most informative SSRs, it is possible to greatly reduce the number of SSR markers employed in genetic diversity assessment (Candida et al., 2006). Among these 22 informative SSRs, all BNLs had di-nucleotide motifs, while 63.63% of the MGHES markers had tri-nucleotide motifs and 27.27% had one of three other motif types (di-, tetra- and penta-nucleotide). The location of an SSR on the chromosome (either proximal to the centromere or near the distal end) had no effect on polymorphism information content (PIC) (Table 3.2), while the polymorphism rate was positively correlated with the number of tandem repeats (Table 3.1).


Table 3.2 Most informative SSRs and their location on genome

| Marker | Chromosome | Position (cM) | PIC value |
|---|---|---|---|
| MGHES 06 | D | 62.7 | 0.582907 |
| MGHES 19 | A | 31.6 | 0.581726 |
| MGHES 20 | D | 31.5 | 0.524444 |
| MGHES 21 | D | 144 | 0.63875 |
| MGHES 27 | A | 0 | 0.881836 |
| MGHES 30 | A | 151 | 0.546296 |
| MGHES 36 | D | 65.6 | 0.648063 |
| MGHES 40 | D | 159 | 0.541344 |
| MGHES 51 | D | 135 | 0.648672 |
| MGHES 52 | A | 168 | 0.579395 |
| MGHES 63 | A | 81.4 | 0.557778 |
| BNL 448 | D | 81.3 | 0.623724 |
| BNL 1878 | D | 64.2 | 0.66625 |
| BNL 2691 | D | 71 | 0.561224 |
| BNL 3255 | A | 119 | 0.706302 |
| BNL 3408 | D | 68.6 | 0.623457 |
| BNL 3563 | A | 71.6 | 0.517361 |
| BNL 3627 | A | 66.7 | 0.613944 |
| BNL 3793 | D | 92.2 | 0.5 |
| BNL 3895 | A | 64.5 | 0.110727 |
| BNL 3955 | D | 167 | 0.674858 |
| BNL 3985 | D | 15 | 0.676749 |


3.1.3 Transferability of SSRs across Gossypium genomes and genome specificity

Out of the 75 SSRs, 22 (29.33%) produced amplicons in all thirty-six species/landraces. No association was found between repeat motif type and rate of transferability. Of the total possible amplified fragments (75 SSRs x 36 species = 2700), 44.16% were found in more than one genome group. Four primers (BNL-1350, BNL-3147, BNL-3065 and BNL-3558) amplified in only a few species, while MGHES-15, MGHES-17, MGHES-21, MGHES-26, MGHES-28, MGHES-30, MGHES-52, BNL-448, BNL-3672, BNL-3793 and BNL-3985 amplified in most of the species. Among diploids, species belonging to the A, B, F and E genomes showed a high transferability rate, while D-genome species exhibited a low transferability rate (Table 3.3).

Fifteen genome- or species-specific SSRs were identified (Table 3.4). None of the species belonging to the A-, B-, C- and G-genomes were amplified with BNL-3147; similarly, none of the species of the A- and AD-genomes were amplified with MGHES-44. Thus BNL-3147 can be used as an A-, B-, C- and G-genome negative marker and MGHES-44 as an A- and AD-genome negative marker. BNL-3888 did not amplify any of the D- and E-genome species and can be used as a D- and E-genome negative marker. MGHES-16, BNL-1053 and BNL-1359 did not amplify in G. trilobum L., and BNL-3482 did not amplify in G. aridum L. Two EST-SSRs (MGHES-20 and MGHES-21) could not amplify the genomic DNA of G. darwinii, while MGHES-15 could not amplify in G. aridum L. and G. trilobum L. (Table 3.4). Thus, this set of primers can be utilized as species-specific primers.


Table 3.3 Transferability of G. hirsutum-derived SSRs in other Gossypium species/genomes

| Genome | # of SSRs amplified in each genome* | % of SSRs amplified in each genome | # of SSRs null amplified | # of SSRs amplified partially** |
|---|---|---|---|---|
| A | 32 | 42.67 | 31 | 12 |
| D | 24 | 32.0 | 41 | 10 |
| AD | 45 | 60 | 0 | 30 |
| B | 40 | 53.34 | 24 | 11 |
| C | 24 | 32 | 44 | 7 |
| E | 19 | 25.34 | 43 | 13 |
| G | 28 | 37.33 | 25 | 22 |
| F | 58 | 77.33 | 0 | 17 |

* indicates amplification in all species of a given genome.
** indicates that there was no amplification in some species of the given genome.
Null amplified means that there was no amplification in any of the species of the given genome.
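The percentage column of Table 3.3 follows from dividing the number of SSRs that amplified in every species of a genome by the 75 SSRs retained in the analysis. A minimal sketch of that arithmetic is given below, with the counts copied from Table 3.3; the variable and function names are illustrative.

```python
TOTAL_SSRS = 75  # SSRs retained after excluding the 25 poorly amplifying markers

# Number of SSRs amplified in all species of each genome (Table 3.3).
amplified_in_all = {"A": 32, "D": 24, "AD": 45, "B": 40,
                    "C": 24, "E": 19, "G": 28, "F": 58}


def transferability(count, total=TOTAL_SSRS):
    """Percentage of SSRs that amplified in every species of a genome."""
    return 100.0 * count / total


for genome, count in amplified_in_all.items():
    print(f"{genome}: {transferability(count):.2f}%")
```

Applied to the 22 SSRs that produced amplicons in all thirty-six species/landraces, the same calculation gives the 29.33% reported in the text.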


Table 3.4 Genome and species-specific amplification features of SSRs

| Primers | A | B | C | D | E | F | G | AD |
|---|---|---|---|---|---|---|---|---|
| MGHES-15 | | | | P(1) | | | | × |
| MGHES-16 | √ | × | × | P(3) | √ | √ | √ | √ |
| MGHES-20 | √ | × | × | √ | √ | √ | √ | P(5) |
| MGHES-21 | √ | √ | × | √ | P(4) | × | √ | P(5) |
| MGHES-22 | × | × | × | √ | × | √ | √ | √ |
| MGHES-44 | × | √ | √ | √ | √ | √ | √ | × |
| BNL-1053 | √ | √ | √ | P(3) | √ | √ | × | √ |
| BNL-1359 | × | √ | × | P(3) | √ | √ | √ | × |
| BNL-2634 | × | √ | × | √ | √ | √ | √ | √ |
| BNL-3066 | √ | √ | √ | √ | √ | × | √ | × |
| BNL-3147 | × | × | × | √ | √ | √ | × | √ |
| BNL-3279 | √ | √ | × | √ | P(4) | × | √ | × |
| BNL-3482 | √ | √ | √ | P(2) | √ | √ | √ | √ |
| BNL-3599 | √ | √ | × | √ | × | √ | √ | √ |
| BNL-3888 | √ | √ | √ | × | × | √ | √ | √ |

× indicates no amplification in any species belonging to this genome.
√ indicates amplification in all species belonging to this genome.
P(n) indicates partial amplification in the given genome, with no amplification in the species numbered n: 1 = G. aridum and G. trilobum; 2 = G. aridum; 3 = G. trilobum; 4 = G. incanum; 5 = G. darwinii.
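The genome-negative markers named in Section 3.1.3 can be read off Table 3.4 mechanically. The sketch below encodes three rows of the table as presence/absence strings (`+` for amplification in all species of a genome, `-` for no amplification) and lists the genomes in which each marker failed. The encoding and the function name are illustrative choices and are not part of the original analysis.

```python
GENOMES = ["A", "B", "C", "D", "E", "F", "G", "AD"]

# Rows copied from Table 3.4: '+' = amplified in all species of the genome,
# '-' = no amplification in any species of the genome.
table_3_4 = {
    "MGHES-44": "-++++++-",
    "BNL-3147": "---+++-+",
    "BNL-3888": "+++--+++",
}


def negative_genomes(pattern):
    """Return the genomes in which a marker gave no amplification."""
    return [g for g, mark in zip(GENOMES, pattern) if mark == "-"]


for marker, pattern in table_3_4.items():
    print(marker, "negative for:", ", ".join(negative_genomes(pattern)))
```

The same lookup extends to the remaining rows once partial amplification is given its own symbol.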


3.1.4 Microsatellite performance among diploid (A and D) and tetraploid (AD) genome species

Twelve primer pairs (MGHES-12, MGHES-22, BNL-1878, BNL-2449, BNL-2634, BNL-2691, BNL-3147, BNL-3103, BNL-3408, BNL-3793, BNL-3955 and BNL-3985) failed to amplify clear fragments in all three A-genome species. When the PCR profiles of these SSRs were compared among A-, AD- and D-genome species, these primers were found to produce some private alleles in D- and AD-genome species, indicating specificity of these primers for the D genome.

The sizes of many fragments amplified in tetraploids differed from those in diploids (A, D) (Fig. 3.2). Fragment sizes in the A-, D- and AD-genome species ranged from 101 to 700 bp. However, the percentage of fragments within 101-300 bp was higher in AD-genome species than in A- and D-genome species. All the A-genome species produced relatively larger fragments, in the range of 301-400 bp (Fig. 3.2).

[Bar chart. x-axis: size of amplified fragments (bp), in the classes 50-100, 101-200, 201-300, 301-400 and 700; y-axis: percentage; series: A, AD and D.]

Fig. 3.2 Distribution of fragment sizes amplified by SSRs


3.1.5 A and D genome species relationship with AD genome species

G. herbaceum L. was genetically more similar to G. hirsutum L. (0.661) and G. barbadense L. (0.624), whereas G. arboreum L. showed genetic similarity of 0.57 and 0.63 with G. hirsutum L. and G. barbadense L., respectively (Table 3.5). Among the D-genome species, G. raimondii L. was genetically closest to G. hirsutum L. and G. barbadense L. (0.642 and 0.667, respectively).

The mean genetic similarity coefficients of the diploid (A-, D-) cotton species (11 in number) with the tetraploid (AD) cotton species (5 in number) ranged from 0.56 to 0.64 (Table 3.5). G. raimondii L. showed the highest (0.652) and G. lobatum L. the lowest (0.566) mean genetic similarity with the five tetraploid species.


Table 3.5 Genetic Similarity coefficient between tetraploid species and A-/D- genome species

| | G. herbaceum A1 | G. arboreum A2 | G. thurberi D1 | G. klotzschianum D2 | G. davidsonii D3-d | G. aridum D4 | G. raimondii D5 | G. gossypioides D6 | G. lobatum D7 | G. trilobum D8 | G. laxum D9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| G. hirsutum 115 AD1 | 0.661 | 0.570 | 0.582 | 0.630 | 0.618 | 0.618 | 0.667 | 0.545 | 0.552 | 0.594 | 0.600 |
| G. barbadense AD2 | 0.624 | 0.630 | 0.630 | 0.618 | 0.631 | 0.582 | 0.642 | 0.594 | 0.588 | 0.632 | 0.636 |
| G. mustelinum AD4 | 0.636 | 0.618 | 0.594 | 0.618 | 0.618 | 0.582 | 0.632 | 0.570 | 0.588 | 0.594 | 0.636 |
| G. tomentosum AD3 | 0.636 | 0.618 | 0.630 | 0.606 | 0.606 | 0.594 | 0.641 | 0.582 | 0.552 | 0.594 | 0.624 |
| G. darwinii AD5 | 0.648 | 0.667 | 0.618 | 0.606 | 0.630 | 0.606 | 0.641 | 0.570 | 0.552 | 0.655 | 0.600 |
| Mean | 0.641 | 0.621 | 0.611 | 0.616 | 0.622 | 0.596 | 0.645 | 0.572 | 0.566 | 0.616 | 0.619 |
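The Mean row of Table 3.5 is the arithmetic mean of each diploid species' similarity with the five tetraploids. The sketch below recomputes these column means from the coefficients printed in the table and picks out the diploid species with the highest and lowest mean; the data layout and names are illustrative.

```python
# Similarity coefficients copied from Table 3.5 (rows: tetraploids,
# columns: diploid A- and D-genome species, in table order).
diploids = ["G. herbaceum A1", "G. arboreum A2", "G. thurberi D1",
            "G. klotzschianum D2", "G. davidsonii D3-d", "G. aridum D4",
            "G. raimondii D5", "G. gossypioides D6", "G. lobatum D7",
            "G. trilobum D8", "G. laxum D9"]

similarity = {
    "G. hirsutum AD1":   [0.661, 0.570, 0.582, 0.630, 0.618, 0.618,
                          0.667, 0.545, 0.552, 0.594, 0.600],
    "G. barbadense AD2": [0.624, 0.630, 0.630, 0.618, 0.631, 0.582,
                          0.642, 0.594, 0.588, 0.632, 0.636],
    "G. mustelinum AD4": [0.636, 0.618, 0.594, 0.618, 0.618, 0.582,
                          0.632, 0.570, 0.588, 0.594, 0.636],
    "G. tomentosum AD3": [0.636, 0.618, 0.630, 0.606, 0.606, 0.594,
                          0.641, 0.582, 0.552, 0.594, 0.624],
    "G. darwinii AD5":   [0.648, 0.667, 0.618, 0.606, 0.630, 0.606,
                          0.641, 0.570, 0.552, 0.655, 0.600],
}


def column_means(rows):
    """Mean of each column across the given rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


means = dict(zip(diploids, column_means(list(similarity.values()))))
closest = max(means, key=means.get)    # highest mean similarity
farthest = min(means, key=means.get)   # lowest mean similarity
print(closest, round(means[closest], 3))
print(farthest, round(means[farthest], 3))
```

With the values as printed in the table, G. raimondii has the highest column mean and G. lobatum the lowest.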


3.1.6 Genetic similarity among diploid and tetraploid cotton species with EST and gSSRs

G. arboreum L. showed the maximum genetic similarity (0.89) with G. herbaceum L., followed by that between G. barbosanum L. and G. capitis-viridis L. (0.84), while G. herbaceum var. africanum showed the least genetic similarity (0.50) with G. robinsonii L. The average genetic similarity coefficient among diploid species was 0.67; G. thurberi L. showed the highest (0.71) and G. robinsonii L. the lowest (0.62) average similarity to all other diploid species.

Genetic similarity coefficients between tetraploid species/landraces ranged from 0.62 to 0.85 (average 0.73). G. yucatanense L. and G. mustelinum L. had the lowest genetic similarity (0.62), while the pairs G. darwinii L. and G. barbadense L., and G. tomentosum L. and G. hirsutum L., had the highest (0.85). G. punctatum L. showed the highest genetic dissimilarity with the other tetraploids. Among the landraces, G. morilii L. and G. punctatum L. showed close relatedness with G. palmeri L. and G. yucatanense L., respectively.

3.1.7 Phylogenetic study of 36 cotton species/landraces with combined data of EST-SSRs and gSSRs

The average genetic similarity coefficient of the 36 Gossypium species/landraces was 0.64, with a range of 0.49 to 0.89 (Table 3.6). A dendrogram (Fig. 3.3) generated from these genetic similarity coefficients grouped the species into three major clusters, ‘A’, ‘B’ and ‘C’. Major cluster ‘A’ consisted of two subclusters (a1, a2). All A-genome species were grouped in subcluster ‘a1’ and all tetraploid species/landraces in subcluster ‘a2’. Among the allotetraploids, G. barbadense L. showed close relatedness with G. darwinii L. Major cluster ‘B’ comprised 10 species in two subclusters, ‘b1’ and ‘b2’. All B- and E-genome species (6 in number) were grouped in subcluster ‘b1’, while subcluster ‘b2’ consisted of four species (Fig. 3.3).

A total of nine D-genome species constituted major cluster ‘C’, containing two subclusters (C1, C2). In subcluster ‘C1’, G. klotzschianum L. D2 with G. davidsonii L. D3, and G. harknessii L. D2-2 with G. aridum L. D4, formed sister clusters. G. thurberi L. D1 was related to G. klotzschianum L. D2 and G. davidsonii L. D3 with a genetic relatedness of 80%. Similarly, in subcluster ‘C2’, G. gossypioides L. D6 and G. lobatum L. D7 formed a sister group. The most divergent species of the dendrogram was G. longicalyx L. F1, which was 62.15% genetically related to all other species, followed by G. laxum L. D9, which was 64.94% genetically related to the other species.
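To illustrate how a dendrogram is built from pairwise similarity coefficients, the sketch below clusters five species using values as printed in Table 3.6. It assumes average-linkage (UPGMA) clustering, a common choice for SSR similarity matrices. The clustering method is not named in this section, so the result is illustrative and is not a reproduction of Fig. 3.3; all function names are hypothetical.

```python
from itertools import combinations

# Pairwise similarity coefficients as printed in Table 3.6 (five-species subset).
sim = {
    ("G. herb.", "G. herb. Afri."): 0.81,
    ("G. herb.", "G. arbo."): 0.73,
    ("G. herb. Afri.", "G. arbo."): 0.76,
    ("G. herb.", "G. hirs."): 0.58,
    ("G. herb. Afri.", "G. hirs."): 0.62,
    ("G. arbo.", "G. hirs."): 0.57,
    ("G. herb.", "G. barb."): 0.63,
    ("G. herb. Afri.", "G. barb."): 0.61,
    ("G. arbo.", "G. barb."): 0.62,
    ("G. hirs.", "G. barb."): 0.85,
}


def pair_similarity(a, b):
    """Look up a similarity regardless of the order of the two species."""
    return sim[(a, b)] if (a, b) in sim else sim[(b, a)]


def average_similarity(cluster_a, cluster_b):
    """UPGMA: mean similarity over all between-cluster species pairs."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(pair_similarity(a, b) for a, b in pairs) / len(pairs)


def upgma(species):
    """Merge the two most similar clusters until one remains.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [(name,) for name in species]
    history = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(*pair))
        history.append((a, b, average_similarity(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history


species = ["G. herb.", "G. herb. Afri.", "G. arbo.", "G. hirs.", "G. barb."]
for a, b, s in upgma(species):
    print(f"{' + '.join(a)}  |  {' + '.join(b)}  at similarity {s:.3f}")
```

In this subset the three A-genome entries end up in one group and the two tetraploid entries in another, which is the kind of split described for subclusters ‘a1’ and ‘a2’.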


Table 3.6 Genetic similarity coefficients of 36 cotton species using two types of markers (EST-SSRs and gSSRs)

| | G. herb. | G. herb. Afri. | G. arb. | G. thur. | G. klot. | G. herk. | G. davi. | G. arid. | G. raim. | G. goss. | G. loba. | G. tril. | G. laxu. | G. hirs. | G. tome. | G. must. | G. barb. | G. darw. | G. mari. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G. herb. | 1 |
| G. herb. Afri. | 0.81 | 1 |
| G. arbo. | 0.73 | 0.76 | 1 |
| G. thur. | 0.74 | 0.67 | 0.62 | 1 |
| G. klot. | 0.7 | 0.65 | 0.64 | 0.8 | 1 |
| G. herk. | 0.72 | 0.67 | 0.64 | 0.75 | 0.78 | 1 |
| G. davi. | 0.66 | 0.67 | 0.65 | 0.8 | 0.8 | 0.79 | 1 |
| G. arid. | 0.7 | 0.68 | 0.66 | 0.76 | 0.73 | 0.8 | 0.79 | 1 |
| G. raim. | 0.71 | 0.68 | 0.67 | 0.74 | 0.72 | 0.75 | 0.71 | 0.73 | 1 |
| G. goss. | 0.77 | 0.67 | 0.64 | 0.8 | 0.75 | 0.74 | 0.72 | 0.78 | 0.71 | 1 |
| G. loba. | 0.67 | 0.66 | 0.67 | 0.76 | 0.69 | 0.68 | 0.73 | 0.78 | 0.72 | 0.82 | 1 |
| G. tril. | 0.65 | 0.67 | 0.65 | 0.76 | 0.75 | 0.74 | 0.77 | 0.73 | 0.71 | 0.71 | 0.73 | 1 |
| G. laxu. | 0.73 | 0.62 | 0.63 | 0.77 | 0.69 | 0.7 | 0.65 | 0.69 | 0.68 | 0.75 | 0.67 | 0.67 | 1 |
| G. hirs. | 0.58 | 0.62 | 0.57 | 0.62 | 0.63 | 0.56 | 0.64 | 0.62 | 0.59 | 0.55 | 0.55 | 0.62 | 0.6 | 1 |
| G. tome. | 0.63 | 0.64 | 0.63 | 0.66 | 0.62 | 0.59 | 0.67 | 0.58 | 0.64 | 0.59 | 0.59 | 0.64 | 0.64 | 0.7 | 1 |
| G. must. | 0.59 | 0.56 | 0.62 | 0.64 | 0.62 | 0.58 | 0.62 | 0.58 | 0.59 | 0.57 | 0.59 | 0.64 | 0.64 | 0.7 | 0.76 | 1 |
| G. barb. | 0.63 | 0.61 | 0.62 | 0.64 | 0.61 | 0.55 | 0.61 | 0.59 | 0.59 | 0.58 | 0.55 | 0.65 | 0.62 | 0.85 | 0.77 | 0.79 | 1 |
| G. darw. | 0.62 | 0.65 | 0.67 | 0.65 | 0.61 | 0.58 | 0.63 | 0.61 | 0.65 | 0.57 | 0.55 | 0.65 | 0.6 | 0.76 | 0.85 | 0.77 | 0.85 | 1 |
| G. mari. | 0.58 | 0.61 | 0.58 | 0.58 | 0.58 | 0.6 | 0.57 | 0.58 | 0.56 | 0.57 | 0.55 | 0.65 | 0.56 | 0.71 | 0.65 | 0.65 | 0.71 | 0.73 | 1 |


Table 3.6 continued

| | G. herb. | G. herb. Afri. | G. arb. | G. thur. | G. klot. | G. herk. | G. davi. | G. arid. | G. raim. | G. goss. | G. loba. | G. tril. | G. laxu. | G. hirs. | G. tome. | G. must. | G. barb. | G. darw. | G. mari. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G. lati. | 0.68 | 0.68 | 0.64 | 0.70 | 0.74 | 0.67 | 0.73 | 0.68 | 0.69 | 0.69 | 0.66 | 0.72 | 0.70 | 0.64 | 0.67 | 0.67 | 0.65 | 0.67 | 0.67 |
| G. mori. | 0.62 | 0.67 | 0.62 | 0.63 | 0.67 | 0.61 | 0.66 | 0.64 | 0.61 | 0.61 | 0.59 | 0.65 | 0.65 | 0.74 | 0.64 | 0.64 | 0.67 | 0.67 | 0.65 |
| G. palm. | 0.65 | 0.63 | 0.65 | 0.62 | 0.64 | 0.56 | 0.59 | 0.59 | 0.55 | 0.61 | 0.59 | 0.64 | 0.63 | 0.69 | 0.72 | 0.69 | 0.73 | 0.72 | 0.64 |
| G. punc. | 0.63 | 0.66 | 0.63 | 0.65 | 0.63 | 0.58 | 0.63 | 0.63 | 0.58 | 0.58 | 0.58 | 0.63 | 0.65 | 0.71 | 0.67 | 0.64 | 0.70 | 0.70 | 0.71 |
| G. yucc. | 0.61 | 0.64 | 0.62 | 0.62 | 0.64 | 0.61 | 0.64 | 0.68 | 0.63 | 0.57 | 0.55 | 0.64 | 0.58 | 0.70 | 0.64 | 0.62 | 0.65 | 0.73 | 0.75 |
| G. lanc. | 0.65 | 0.65 | 0.64 | 0.66 | 0.69 | 0.70 | 0.68 | 0.69 | 0.64 | 0.69 | 0.68 | 0.69 | 0.72 | 0.59 | 0.61 | 0.65 | 0.61 | 0.61 | 0.61 |
| G. anom. | 0.68 | 0.62 | 0.63 | 0.67 | 0.63 | 0.59 | 0.63 | 0.65 | 0.63 | 0.65 | 0.65 | 0.67 | 0.61 | 0.59 | 0.66 | 0.61 | 0.62 | 0.61 | 0.62 |
| G. barbo. | 0.67 | 0.70 | 0.63 | 0.62 | 0.61 | 0.62 | 0.63 | 0.61 | 0.64 | 0.61 | 0.59 | 0.63 | 0.59 | 0.56 | 0.61 | 0.61 | 0.56 | 0.60 | 0.62 |
| G. capi. | 0.68 | 0.65 | 0.63 | 0.64 | 0.62 | 0.61 | 0.62 | 0.59 | 0.63 | 0.62 | 0.60 | 0.62 | 0.59 | 0.59 | 0.64 | 0.61 | 0.59 | 0.61 | 0.65 |
| G. soma. | 0.61 | 0.50 | 0.61 | 0.68 | 0.66 | 0.64 | 0.68 | 0.68 | 0.64 | 0.66 | 0.62 | 0.72 | 0.62 | 0.58 | 0.58 | 0.63 | 0.62 | 0.64 | 0.59 |
| G. stoc. | 0.60 | 0.57 | 0.61 | 0.62 | 0.60 | 0.57 | 0.61 | 0.61 | 0.61 | 0.59 | 0.59 | 0.61 | 0.57 | 0.57 | 0.61 | 0.61 | 0.58 | 0.62 | 0.61 |
| G. long. | 0.70 | 0.62 | 0.58 | 0.65 | 0.65 | 0.67 | 0.64 | 0.65 | 0.59 | 0.70 | 0.65 | 0.66 | 0.67 | 0.50 | 0.59 | 0.62 | 0.55 | 0.50 | 0.53 |
| G. inca. | 0.63 | 0.62 | 0.63 | 0.64 | 0.59 | 0.59 | 0.59 | 0.62 | 0.62 | 0.62 | 0.65 | 0.67 | 0.62 | 0.54 | 0.54 | 0.62 | 0.60 | 0.58 | 0.56 |
| G. robi. | 0.56 | 0.50 | 0.52 | 0.64 | 0.55 | 0.54 | 0.61 | 0.57 | 0.55 | 0.59 | 0.61 | 0.58 | 0.55 | 0.52 | 0.52 | 0.55 | 0.52 | 0.53 | 0.54 |
| G. aust. | 0.59 | 0.57 | 0.55 | 0.63 | 0.59 | 0.56 | 0.65 | 0.61 | 0.64 | 0.60 | 0.59 | 0.60 | 0.55 | 0.55 | 0.58 | 0.61 | 0.53 | 0.58 | 0.53 |
| G. nels. | 0.56 | 0.55 | 0.55 | 0.62 | 0.54 | 0.53 | 0.56 | 0.59 | 0.61 | 0.60 | 0.62 | 0.53 | 0.59 | 0.56 | 0.58 | 0.58 | 0.56 | 0.57 | 0.52 |
| G. bick. | 0.61 | 0.57 | 0.54 | 0.64 | 0.58 | 0.56 | 0.61 | 0.59 | 0.61 | 0.60 | 0.58 | 0.59 | 0.58 | 0.62 | 0.61 | 0.64 | 0.58 | 0.62 | 0.61 |


Table 3.6 continued

| | G. lati. | G. mori. | G. palm. | G. punc. | G. yucc. | G. lanc. | G. anom. | G. barbo. | G. capi. | G. soma. | G. stoc. | G. long. | G. inca. | G. robi. | G. aust. | G. nels. | G. bick. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G. lati. | 1.00 |
| G. mori. | 0.75 | 1.00 |
| G. palm. | 0.68 | 0.77 | 1.00 |
| G. punc. | 0.66 | 0.76 | 0.76 | 1.00 |
| G. yucc. | 0.67 | 0.68 | 0.67 | 0.78 | 1.00 |
| G. lanc. | 0.72 | 0.63 | 0.64 | 0.65 | 0.60 | 1.00 |
| G. anom. | 0.68 | 0.62 | 0.61 | 0.62 | 0.65 | 0.70 | 1.00 |
| G. barbo. | 0.68 | 0.59 | 0.57 | 0.59 | 0.62 | 0.68 | 0.82 | 1.00 |
| G. capi. | 0.72 | 0.61 | 0.62 | 0.62 | 0.65 | 0.70 | 0.83 | 0.84 | 1.00 |
| G. soma. | 0.69 | 0.55 | 0.56 | 0.57 | 0.64 | 0.73 | 0.70 | 0.70 | 0.72 | 1.00 |
| G. stoc. | 0.65 | 0.56 | 0.58 | 0.55 | 0.62 | 0.67 | 0.73 | 0.69 | 0.78 | 0.70 | 1.00 |
| G. long. | 0.67 | 0.58 | 0.60 | 0.52 | 0.53 | 0.74 | 0.73 | 0.72 | 0.73 | 0.68 | 0.70 | 1.00 |
| G. inca. | 0.62 | 0.57 | 0.57 | 0.60 | 0.58 | 0.68 | 0.72 | 0.70 | 0.72 | 0.69 | 0.72 | 0.72 | 1.00 |
| G. robi. | 0.58 | 0.56 | 0.55 | 0.49 | 0.52 | 0.58 | 0.66 | 0.61 | 0.64 | 0.59 | 0.65 | 0.59 | 0.72 | 1.00 |
| G. aust. | 0.63 | 0.54 | 0.55 | 0.51 | 0.52 | 0.56 | 0.67 | 0.61 | 0.62 | 0.62 | 0.62 | 0.60 | 0.67 | 0.74 | 1.00 |
| G. nels. | 0.59 | 0.55 | 0.58 | 0.52 | 0.52 | 0.53 | 0.68 | 0.62 | 0.64 | 0.61 | 0.60 | 0.59 | 0.65 | 0.74 | 0.82 | 1.00 |
| G. bick. | 0.64 | 0.56 | 0.56 | 0.58 | 0.63 | 0.56 | 0.64 | 0.63 | 0.65 | 0.64 | 0.61 | 0.60 | 0.62 | 0.63 | 0.75 | 0.73 | 1.00 |

* Full names of species are given in table 2.1.


Fig. 3.3 Phylogenetic analysis of 36 cotton species with combined data set of ESTs and gSSRs

The code represents the species as G. herbaceum A1 (1), G. herbaceum var Africanum A1 (2), G. arboreum A2 (3), G. thurberi D1 (4), G. klotzschianum D2 (5), G. harknessii D2-2 (6), G. davidsonii D3 (7), G. aridum D4 (8), G. raimondii D5 (9), G. gossypioides D6 (10), G. lobatum D7 (11), G. trilobum D8 (12), G. laxum D9 (13), G. tomentosum AD3 (14), G. hirsutum AD1 (15), G. mustelinum AD4 (16), G. barbadense AD2 (17), G. darwinii AD5 (18), G. mariglanti AD (19), G. latifolium AD (20), G. morilli AD (21), G. palmerii AD (22), G. punctatum AD (23), G. yucatanense AD (24), G. lanceolatum AD (25), G. anomalum B1 (26), G. barbosanum B3 (27), G. capitisvaridis B4 (28), G. somalense E2 (29), G. stocksii E1 (30), G. longicalyx F1 (31), G. incanum E4 (32), G. robinsonii C1 (33), G. australe G2 (34), G. nelsonii G3 (35), G. bickie G1 (36).


3.1.8 Clustering of species using EST-SSRs

Cluster analysis based on EST-SSRs grouped the species into seven major clusters, A through G (Fig. 3.4). The clustering of species with EST-SSRs (Fig. 3.4) was close to the phylogenetic tree obtained from the combined data set, except for a few differences such as the grouping of G. raimondii L. with A-genome species and sister clustering between A- and D-genome species. Sister clustering between B- and E-genome species, and also between C- and G-genome species, was observed with EST-SSRs as well as with the combined data set.

Fig. 3.4 Clustering of Gossypium species using EST-SSRs

The code represents the species as G. herbaceum A1 (1), G. herbaceum var Africanum A1 (2), G. arboreum A2 (3), G. thurberi D1 (4), G. klotzschianum D2 (5), G. harknessii D2-2 (6), G. davidsonii D3 (7), G. aridum D4 (8), G. raimondii D5 (9), G. gossypioides D6 (10), G. lobatum D7 (11), G. trilobum D8 (12), G. laxum D9 (13), G. tomentosum AD3 (14), G. hirsutum AD1 (15), G. mustelinum AD4 (16), G. barbadense AD2 (17), G. darwinii AD5 (18), G. mariglanti AD (19), G. latifolium AD (20), G. morilli AD (21), G. palmerii AD (22), G. punctatum AD (23), G. yucatanense AD (24), G. lanceolatum AD (25), G. anomalum B1 (26), G. barbosanum B3 (27), G. capitisvaridis B4 (28), G. somalense E2 (29), G. stocksii E1 (30), G. longicalyx F1 (31), G. incanum E4 (32), G. robinsonii C1 (33), G. australe G2 (34), G. nelsonii G3 (35), G. bickie G1 (36).

3.1.9 Clustering of species using gSSRs


Cluster analysis based on gSSRs grouped the species into six major clusters, A through F (Fig. 3.5). Clustering of species with gSSRs deviated from both the EST-SSRs and the combined data set. In the gSSR-based dendrogram, the E-genome species G. somalense L. E2 was grouped with D-genome species. Similarly, G. incanum L. E4 and G. stocksii L. E1 were grouped with G. robinsonii L. C1 and G. longicalyx L. F1, respectively. Also, the two races of G. hirsutum L. (G. lanceolatum L. and G. latifolium L.) were grouped with D-genome species.

In this study, all three A-genome species were grouped in one cluster with all three data sets (combined, EST-SSRs and gSSRs) (Figs. 3.3, 3.4 and 3.5 respectively).

Fig. 3.5 Clustering of Gossypium species using gSSRs

The code represents the species as G. herbaceum A1 (1), G. herbaceum var Africanum A1 (2), G. arboreum A2 (3), G. thurberi D1 (4), G. klotzschianum D2 (5), G. harknessii D2-2 (6), G. davidsonii D3 (7), G. aridum D4 (8), G. raimondii D5 (9), G. gossypioides D6 (10), G. lobatum D7 (11), G. trilobum D8 (12), G. laxum D9 (13), G. tomentosum AD3 (14), G. hirsutum AD1 (15), G. mustelinum AD4 (16), G. barbadense AD2 (17), G. darwinii AD5 (18), G. mariglanti AD (19), G. latifolium AD (20), G. morilli AD (21), G. palmerii AD (22), G. punctatum AD (23), G. yucatanense AD (24), G. lanceolatum AD (25), G. anomalum B1 (26), G. barbosanum B3 (27), G. capitisvaridis B4 (28), G. somalense E2 (29), G. stocksii E1 (30), G. longicalyx F1 (31), G. incanum E4 (32), G. robinsonii C1 (33), G. australe G2 (34), G. nelsonii G3 (35), G. bickie G1 (36).


3.2 Experiment No. 2. Identification of differentially expressed genes from normal (fiber producing) and mutant (fiberless) ovules of Gossypium hirsutum during fiber elongation stage.

This experiment was conducted to study the genes involved in the elongation of cotton fiber, as very little information is available on the genes responsible for fiber elongation because of their complex mechanism of regulation. These genes were studied by non-radioactive differential display reverse transcriptase PCR (DDRT-PCR) assays. Total RNA was isolated from ovules collected from floral buds of normal (fiber producing) and mutant (fiberless) G. hirsutum at 3 dpa (days post anthesis) (Fig. 3.6). This RNA was reverse transcribed to cDNA using an oligo(dT) primer.

Fig. 3.6 RNA isolated from ovules of mutant and normal floral buds at 3 dpa

Lane M contained 1kb ladder (Fermentas).

Lane 1 contained RNA from mutant (fiberless) ovules at 3 dpa.

Lane 2 contained RNA from normal (fiber producing) ovules at 3 dpa.

By screening 15 arbitrary primers in combination with 11 anchored primers (Table 2.6), a total of 23 differentially transcribed genes or gene fragments (Fig. 3.7) were found. Of these, 10 were not explored further because of poor re-amplification. The remaining 13 were sequenced with M13 primers. DNA sequencing services were provided by Macrogen.


Fig. 3.7 Differential display (DDRT-PCR) from normal (fiber producing) and mutant (fiberless) 3 dpa ovules of G. hirsutum. Lanes M represent the 1 kb marker, lanes L the normal lint-producing ovules and lanes Lo the mutant (lintless) ovules. RT-PCR reactions were conducted using anchored primers and arbitrary primers (Table 2.6). cDNA fragments that appeared to be differentially expressed in lint-producing ovules are indicated by arrows (lane 2: A6B2; lane 4: A6B1_F; lane 6: A1B5; lane 7: A6B1; lane 10: A1B3; lane 12: A5B2; lane 14: A3B2).

The sequences of the differentially expressed gene fragments were translated into amino acid sequences. In a homology search of the GenBank databases, the amino acid sequences of six fragments showed homology with already reported genes (Fig. 3.8); their sizes ranged from 600 to 800 bp. These partial fiber-related cDNA sequences have been submitted to the NCBI GenBank EST database (except A6B1). Their accession numbers are given in Table 3.7.
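The translation step can be sketched as follows: a cDNA sequence is read codon by codon using the standard genetic code. The example translates a short stretch taken from the first aligned row of the ubiquitin extension protein alignment (Fig. 3.11b), starting at its ATG. The codon table is the standard one, while the function name and the choice of fragment are illustrative and do not reproduce the tool actually used in this work.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Fragment copied from the first row of Fig. 3.11b, starting at the ATG.
fragment = "ATGCAGATCTTCGTCAAAACCCTAACGGGGAAGACC"
print(translate(fragment))
```

For this fragment the translation is MQIFVKTLTGKT, the N-terminal residues of ubiquitin. Calling `translate` with `frame` set to 1 or 2, or on the reverse complement, would cover the remaining reading frames.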


a: Amino acid sequence homology between fragment A6B1 and β-tubulin genes; Identities = 10/23 (43%), Positives = 12/23 (52%)

A6B1           71   DLEPGTMETDSVRSGPYGQIFRP  93
                    DL PGT      R    +GQ +RP
β-tubulin      73   DLNPGTPTGLGPRPSAFGQAWRP  5

b: Amino acid sequence homology between fragment A6B2 and zinc finger; Identities = 9/16 (56%), Positives = 11/16 (68%)

A6B2           154  FRLNRLCIPKCSMETR  169
                     R+ RLC P CS+E R
Zinc finger    116  LRIGRLCXPNCSLERR  163

c: Amino acid sequence homology between fragment A3B2 and protein phosphatase; Identities = 34/79 (43%), Positives = 41/79 (51%)

A3B2           51   NKLSNYKVKEEPIEVGYPISVARLGKDQARQSNTGVLK*TVKQDDQPPADGFPVNYWPFF  230
                    NKLS   V EE  E G  +   RLGKD    +N+G+ +  V Q DQ P D   VN   FF
phosphatase    342  NKLSAVGVVEELFEEGSAMLAERLGKDFPSNTNSGLYRCAVCQVDQTPTDSLSVNSASFF  401

A3B2           231  SRPSTAREGPIICANLGKR  287
                    S  S   EGP +C N  K+
phosphatase    402  SPGSKPWEGPFLCPNCRKK  420


d: Amino acid sequence homology between fragment A5B2 and protein phosphatase; Identities = 47/73 (64%), Positives = 66/73 (90%)

A5B2           1    DLVDIASGQKQGTLVAKIYWGDARDKICESVEDLKLDCLVMGSRGLGTIQRVLIGSVSNY  180
                    D++D A+ Q + T+VAK+YWGDAR+K+C++VE+ K+D LVMGSRGLG+IQR+L+GSV+NY
phosphatase    91   DMLDTAARQLELTVVAKLYWGDAREKLCDAVEEQKIDTLVMGSRGLGSIQRILLGSVTNY  150

A5B2           181  VMVNATCPVTIVK  219
                    V+ NA+CPVT+VK
phosphatase    151  VLSNASCPVTVVK  163

e: Amino acid sequence homology between fragment A1B3 and P-type ATPase; Identities = 34/79 (43%), Positives = 41/79 (51%)

A1B3           120  PMPG*FCIFEDTEFFPDGQAGH*LLTSGDPSASASQSAGIPGMSHHA  260
                    P P  F    +T F   GQA   LLTS DP ASASQ+AGI G+SH A
P-type ATPase  45   PRPAYFLFLVETGFHHVGQASLELLTSDDPPASASQNAGITGVSHRA  91

f: Amino acid sequence homology between fragment A6B1_F and Glycosyltransferase (Glyctransfse); Identities = 19/50 (38%), Positives = 29/50 (58%)

A6B1_F         18   FLAGSAVVPLLGRPTTNETD---LVSTOPDFALLNKEMETVLPVLKKLDS  64
                    +L  + +  ++ +PTT  +D   LV+   DFALLN  M  V  +LK +DS
Glyctransfse   9    YLHTTNIFFVMKQPTTENSDIIKLVAILGDFALLNLSMILVFFILKGIDS  58

Fig. 3.8 Amino acid sequence homology of six fragments with known genes

The deduced amino acid sequences were aligned using the NCBI BLAST pairwise alignment programs (http://www.ncbi.nlm.nih.gov/BLAST/).


Table 3.7 Fragments identification and homologies with known genes

| Clone identification | Size (bp) | GenBank accession # | Homology | Accession # of homologous sequence |
|---|---|---|---|---|
| A6B1 | 825 | -- | β-tubulin gene | AI730987.1 |
| A6B2 | 760 | HO189243 | Zinc finger protein | BAD54569.1 |
| A3B2 | 740 | HO189244 | Protein phosphatase | XP_002512477.1 |
| A5B2 | 730 | HO189245 | Protein phosphatase | XM_002282949.2 |
| A1B5 (lintless) | 750 | HO189241 | Zinc finger protein | CAA35956.1 |
| A1B3 | 760 | HO189240 | P-type ATPase | ZP_05627499.1 |
| A6B1_F | 900 | HO758782 | Glycosyltransferase | ATCC 8482 |

Sequence A6B1 not submitted to GenBank.

A transcript designated A6B1, with a size of 760 bp, showed homology with the β-tubulin gene (Fig. 3.8a), while the transcript designated A6B2 showed homology with zinc finger proteins (Fig. 3.8b).

The deduced amino acid sequences of two transcripts designated A3B2 and A5B2 showed homology with protein phosphatase (Figs. 3.8c and 3.8d, respectively). The fragment designated A1B3 showed homology with the P-type ATPase (Fig. 3.8e) and A6B1_F showed homology with glycosyltransferase (Fig. 3.8f).

A transcript designated A1B5, of 750 bp and isolated from lintless ovules, did not show homology with any putative plant protein; rather, it showed homology with a zinc-finger protein of Homo sapiens, indicating that the expression of fiber development genes is impaired by mutation (Wang et al., 2010).


3.3 Experiment No. 3. Isolation of fiber related differentially expressed genes from G. arboreum ovules at different developmental stages and phylogenetic study of translation elongation factor-1 gamma gene across diploid Gossypium genomes.

This experiment was conducted to isolate sequences involved in fiber development from G. arboreum L. (a diploid species, used to avoid the redundancy due to polyploidy), which can be used for the improvement of modern cultivars. For this, total RNA isolated from ovules at 0, 5 and 10 dpa and from leaf samples of G. arboreum L. (Fig. 3.9) was reverse transcribed into cDNA, which was then amplified with gene-specific primers (Table 2.6).

Fig. 3.9 RNA from ovules and leaf of G. arboreum

Lanes 1-3 contained RNA from ovules at 0, 5 and 10 dpa respectively.

Lane 4 contained RNA from leaf.

Lane M contained 1kb ladder (Fermentas).


3.3.1 Amplification of cDNA with gene specific primers from G. hirsutum

cDNA reverse transcribed from RNA isolated from 0 and 5 dpa developing ovules of G. arboreum L. was amplified with the gene-specific primers Y-23 and SNP-76, respectively, while the cDNA from 10 dpa ovules was amplified with two primers, SNP-10 and SNP-24 (Table 2.8).

3.3.2 Cloning of PCR amplified products into pTZ57 R/T vector

The PCR products amplified with gene-specific primers were cloned into the pTZ57 R/T cloning vector. Clones carrying the inserts of interest were analyzed by restriction enzyme digestion (Fig. 3.10) and then sequenced by Macrogen with M13 reverse and forward primers.

The sequences were BLAST searched and showed homology with reported sequences. Reverse-transcribed total RNA from 0 dpa (amplified with Y-23), 5 dpa (amplified with SNP-76) and 10 dpa (amplified with SNP-10 and SNP-24) showed homology with Myb-like protein (Fig. 3.11a), ubiquitin extension protein (Fig. 3.11b), proline-rich protein (Fig. 3.11c) and translation elongation factor 1-gamma (Fig. 3.11d), respectively.
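The identity and gap figures quoted with each alignment in Fig. 3.11 can be recomputed directly from the aligned rows. The sketch below does this for the myb104b alignment (Fig. 3.11a), whose two blocks are concatenated into one pair of strings. The function name is illustrative, and the percentages are truncated to whole numbers, which is assumed here because it matches the figures printed with the alignment.

```python
def alignment_stats(query, subject):
    """Count columns, identities and gap columns of a pairwise alignment.

    Both rows must already be aligned (equal length, '-' for gaps).
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    columns = len(query)
    identities = sum(q == s and q != "-" for q, s in zip(query, subject))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, subject))
    return columns, identities, gaps


# The two aligned rows of Fig. 3.11a, concatenated across its two blocks.
row_1 = ("CTCCGCAAGTTGAGCATCGGGCCTTCACCCCCGAGGAAGACGAGACCATCA-CAGAGCTC"
         "ATGCCCGATTTGGTAACAAGTGGGCCACAATAGCCCGACTCCTCA-CGGTCGT")
row_2 = ("CTCCGCAAGTTGAGCATCGGGCCTTCACCCCCGAGGAAGACGAGACCATCATCAGAGCTC"
         "ATGCCCGATTTGGTAACAAGTGGGCCACAATAGCCCGACTCCTCAACGGTCGT")

columns, identities, gaps = alignment_stats(row_1, row_2)
print(f"Identities = {identities}/{columns} ({100 * identities // columns}%)")
print(f"Gaps = {gaps}/{columns} ({100 * gaps // columns}%)")
```

For these rows the counts are 111 identities and 2 gaps over 113 columns, in agreement with the values given for Fig. 3.11a.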


a: Sequence alignment of myb104b gene of G. hirsutum L. and G. arboreum L.
Max score = 196, Total score = 196, Query coverage = 44%, E value 6e-47, Max ident = 98%
Identities = 111/113 (98%), Gaps 2/113 (1%)

1    CTCCGCAAGTTGAGCATCGGGCCTTCACCCCCGAGGAAGACGAGACCATCA-CAGAGCTC  59
     ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||
27   CTCCGCAAGTTGAGCATCGGGCCTTCACCCCCGAGGAAGACGAGACCATCATCAGAGCTC  86

60   ATGCCCGATTTGGTAACAAGTGGGCCACAATAGCCCGACTCCTCA-CGGTCGT  111
     ||||||||||||||||||||||||||||||||||||||||||||| |||||||
87   ATGCCCGATTTGGTAACAAGTGGGCCACAATAGCCCGACTCCTCAACGGTCGT  139

b: Sequence alignment of ubiquitin extension protein of G. hirsutum and G. arboreum
Max score = 313, Total score = 313, Query coverage = 90%, E value 6e-82, Max ident = 89%
Identities = 224/251 (89%), Gaps 2/251 (0%)

26   CAAGATGCAGATCTTCGTCAAAACCCTAACGGGGAAGACCATAA-CCTTGAGGTTGAATC  84
     ||||||||||||||||||||||||||||||||||||||| |||| ||| ||||| || ||
54   CAAGATGCAGATCTTCGTCAAAACCCTAACGGGGAAGACTATAACCCTAGAGGTCGAGTC  113

85   CTCCGATACAATCGACAACGTCAAGGCCAAGATCCAAGACAAGGAGGGCATCCCACCCGA  144
      || || || || ||||| ||||| || ||||||||||||||||| ||||| ||||| ||
114  TTCGGACACCATTGACAATGTCAAAGCTAAGATCCAAGACAAGGAAGGCATTCCACCTGA  173

145  CCAACAGCGCCTTATCTTCGCCGGGAAGCAGCTCGAAGACGGCCGCACCTTAGCCGACTA  204
     |||||| || |||||||||||||| ||||| |||||||| ||||||||||||||||||||
174  CCAACAACGTCTTATCTTCGCCGGCAAGCAACTCGAAGATGGCCGCACCTTAGCCGACTA  233

205  CAACATCCAGAAGGAATCCACTCTCCACCTCGTCCATCCGTCTCAGGGGTGGTGCCAAGA  264
     |||||||||||||||||||||||| ||||| |||| ||||||||||||| ||||| ||||
234  CAACATCCAGAAGGAATCCACTCTACACCTTGTCC-TCCGTCTCAGGGGAGGTGCGAAGA  292

265  AGAGGAAGAAG  275
     |||| ||||||
293  AGAGAAAGAAG  303


c: Sequence alignment of proline rich protein of G. herbaceum and G. arboreum
Max score = 291, Total score = 291, Query coverage = 100%, E value = 2e-75, Max ident = 96%
Identities = 176/184 (96%), Gaps = 6/184 (3%)

Query    1  CAAAGACACTTGCTGCCCTGTACTAC-AAGGGCTGTTGGATTT-AGACGCCG-CATTTGT   57
Sbjct  140  CAAAGACACTTGCTGCCCTGTACTACNAAGGGCTGTTGGATTTNAGACGCCGCCATTTGT  199

Query   58  CTCTGTACTACGATCAAGGCCAAACTTT-TGAATATCAATATTAT-AATCCTCATAGCTC  115
Sbjct  200  CTCTGTACTACGATCAAGGCCAAACTTTNTGAATATCAATATTATNAATCCCCATAGCTC  259

Query  116  TTCAGGTCCTCATCGACTGTGGCAAAACTCCACCTCCAGGG-TCCAATGTCCAGCCCAAT  174
Sbjct  260  TTCAGGTCCTCATCGACTGTGGCAAAACTCCACCTCCAGGGTTCCAATGTCCAGCACAAT  319

Query  175  GATG  178
Sbjct  320  GATG  323

d: Sequence alignment of elongation factor 1-gamma of Vitis vinifera and G. arboreum
Max score = 215, Total score = 215, Query coverage = 53%, E value = 3e-52, Max ident = 96%
Identities = 196/235 (83%), Gaps = 3/235 (1%)

Query  198  CCAAGAATCTTCTTGATTTTTGGCTGATTAACCAAGGTCCAAAAGTACCTCTCAACATGA  257
Sbjct  757  CCAAG-ATCTTGCTGAAATTTGGTTGATTAACCATGGTCCAGAAGTATCTCTCAACATGA  699

Query  258  GGGAACTCCGAGGTAAAACTCTTGGTCATGATCTGGGAGAAACCCAAGTAGAGGTTACAT  317
Sbjct  698  GGGAACTCCGAGGTAAAGCTCTTAGTCATGAGCTTACTGAATCCCATATACAAATTGCAT  639

Query  318  GTCATGATAATGTCAGCAAGGGTGACAAAATGTCCAAC-AGGAAATGTGTTGGAAGCAAG  376
Sbjct  638  GTCATGACAATGTCAGCCAGGGTGACAGAATGCCCCACCAGGAAA-GTGTTTGAAGCAAG  580

Query  377  ATGAGTGTTCAAAGCGCCAAGGGCTCTCTTCAATGCAGCAATAACAGCTTCCTCA  431
Sbjct  579  ATGTGTGTTCAAAGCACCTAATGCTCTCTTCAATGCAGCAATTGCAGCCTCCTCA  525

Fig. 3.10 Nucleotide sequence homology of four fragments amplified with gene specific primers with known genes. The nucleotide sequences were aligned using NCBI BLAST programs (http://www.ncbi.nlm.nih.gov/BLAST/).


3.3.3 Screening of different cotton BAC libraries with gene specific probes

Cotton overgo (COV) probes were designed from the sequence obtained after cloning the PCR product amplified with the gene-specific primer SNP-24, and were used to screen cotton BAC libraries for genomic regions associated with cotton fiber development. Four BAC libraries were screened: three from diploid genomes (G. kirkii L., G. raimondii L. and G. arboreum L.) and one from a tetraploid genome (G. hirsutum L.). Three overgo probes (COV2419, COV2420 and COV2421) were designed from these sequences (Table 2.11). This part of the work was carried out in the PGML laboratory at the University of Georgia, USA.

COV2419 had no matches in any library, whereas an adequate number of positive hits was obtained with the other two probes. COV2420 gave too many hits to represent a single-copy gene, which suggests that it belongs to a small multigene family. The numbers of hits obtained with these two probes in the different libraries are shown in Table 3.8.

Table 3.8 Different libraries and their hit numbers by both probes

Species        Enzyme    Membrane   COV2420 hits   COV2421 hits
G. hirsutum    HindIII   Maxxa      39             21
G. arboreum    MboI      GAMBO      61             28
G. raimondii   HindIII   GR         25             9
G. kirkii      HindIII   GKH        33             11
Total hits                          158            69
Average hits                        39.5           17.25
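The totals and averages in Table 3.8 can be checked with a short calculation. The sketch below is only an illustration: the dictionary layout and the function name `summarise` are assumptions introduced here, and the counts are transcribed from the table.

```python
# Hit counts transcribed from Table 3.8 (library -> hits per probe).
hits = {
    "G. hirsutum (Maxxa)": {"COV2420": 39, "COV2421": 21},
    "G. arboreum (GAMBO)": {"COV2420": 61, "COV2421": 28},
    "G. raimondii (GR)": {"COV2420": 25, "COV2421": 9},
    "G. kirkii (GKH)": {"COV2420": 33, "COV2421": 11},
}


def summarise(hits, probe):
    """Return (total hits, mean hits per library) for one probe."""
    counts = [row[probe] for row in hits.values()]
    total = sum(counts)
    return total, total / len(counts)


for probe in ("COV2420", "COV2421"):
    total, mean = summarise(hits, probe)
    print(f"{probe}: total = {total}, average = {mean:.2f}")
```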


Probe hits in different libraries

Fig. 3.11 Both probes showed maximum numbers of hits in G. arboreum BAC library

3.3.3.1 Sequencing of BAC clones.

To assess the practicability of using this strategy to establish a framework for sequencing selected regions of cotton genomes, clones that gave positive hits with probe COV2421 were partially sequenced using gene-specific primers. These primers were designed from the sequence obtained in section 3.3.2 that showed homology with translation elongation factor 1-gamma (Table 2.11). A total of 28 positive clones of G. raimondii L., nine each of G. arboreum L. and G. kirkii L., and 23 clones of the G. hirsutum L. library were sequenced. In total, 138 sequencing reactions were performed. Because of problems encountered during sequencing, only 42 high quality sequences with readable bases (success rate 30.43%), with an average raw base count of 489, were obtained. High quality sequences were defined as those having >100 high quality bases other than vector and sequences. Nine high quality sequences were obtained: four each from G. arboreum L. and G. kirkii L. and one from G. raimondii L.


3.3.3.2 Nucleotide variations

DNASTAR (DNASTAR Inc., Madison, WI, USA) and Clustal V were used for sequence alignment and detection of nucleotide variation in these nine nucleotide sequences. Genome-specific nucleotide variations were observed at eight positions among these sequences (Fig. 3.13).
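Detecting variable positions in a set of aligned sequences can be illustrated as follows. This is a minimal sketch, not the DNASTAR or Clustal V procedure used in the study; the function name `variable_sites` and the three toy sequences are hypothetical.

```python
def variable_sites(alignment):
    """Return 1-based alignment columns where the sequences differ.

    `alignment` maps a sequence name to an aligned sequence; gaps ("-")
    and ambiguous bases ("N") are ignored when comparing a column.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    sites = []
    for col in range(lengths.pop()):
        bases = {seq[col].upper() for seq in alignment.values()} - {"-", "N"}
        if len(bases) > 1:
            sites.append(col + 1)
    return sites


# Hypothetical toy alignment, for illustration only.
toy = {
    "GKH_clone": "ACGTTGCA",
    "GAMBO_clone": "ACGATGCA",
    "GR_clone": "ACGATGTA",
}
print(variable_sites(toy))
```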

Fig. 3.12 Genome specific nucleotide variation


Fig. 3.13 Phylogenetic tree of cotton species. Tip labels: GKH003H23, GKH047F08, GKH009H11, GKH042J14, GAMBO023K0, GAMBO238I11, GAMBO161O1, GAMBO119O0, G. raimondi ful. Scale: nucleotide substitutions (x100), 0 to 26.5.


3.3.3.3 Analysis of BAC end sequences (BES) and Contigs assembly

In total, 23 BES were analyzed using Blast2GO to obtain the distribution of functional gene groups. Of these, 14 sequences did not show homology with previously reported sequences. The other nine high quality sequences showed homology with the reported elongation factor (EF) 1-gamma of Arabidopsis thaliana. All these sequences were aligned using DNASTAR (DNASTAR Inc., Madison, WI, USA) and Clustal V. Joining the ends of these sequences gave a sequence of 1524 bp (Table 3.9).


Table 3.9 Sequence alignment of EF 1-gamma of A. thaliana and G. arboreum
Max score = 313, Total score = 736, Query coverage = 63%, E value = 1e-88, Max ident = 100%
Identities = 375/511 (74%), Gaps = 24/511 (4%)

Query   720  CCTTTAATTGCAACCTCTCGGAAATTGGTCTTTGTGTTAGAGTAAAGCCTCTTCCAGTCA   779
Sbjct   966  CCTTTGATAGCAACCTCACGGAAGTTGGATTTGGTGTTCGAGTAAAGCCTCTTCCAGTCA   907

Query   780  TCCAGTACCATCTTACTTGGAGGCAGCAAATCAAGAGGATTCTTTGGTTTGGGCTTTGGT   839
Sbjct   906  TCGAGAACCATTGGGCTTGGTGGTAGCAAGTCAAGAGGATTCTTGGCTTTAGGCTTTGGT   847

Query   840  GCCTCCTCCTCCTCAGCTGCCTCTGCCTTTGCTGCCTGTTTCTCAACCTCCTTTTTGGCT   899
Sbjct   846  GCTTCCTCCTCCTCAGCAGGCT------TTGGTGCCTCTGCTACAGG------GGCT   802

Query   900  TCTTTCTTTGGTTCGGCCTTTGGTTTAGTTTCCTTTGGCTGGGCAGCAGGCTTCTTTGAA   959
Sbjct   801  GCCTTCTTGGGCTCCTCCTTGGGCTTAGCA------GGCTGTGGAGCT---TTCTTAGTA   751

Query   960  GGAACAGGTGGTAGAGAGACTGCCTGCTTCATTTCACCAAGAATCTTCTTGATTTTTGGC  1019
Sbjct   750  GGAACTGGAGGGACAGCTTCGGTTTGTTTGGCATCACCCAACACCTTCTTGAATTCTGGC   691

Query  1020  TGATTAACCAAGGTCCAAAAGTACCTCTCAACATGAGGGAACTCCGAGGTAAAACTCTTA  1079
Sbjct   690  TGGTTAACCATTGTCCAGAAGTATCTCTCAACATGAGGGAATGCAGAGGTAAATTCCTTG   631

Query  1080  GTCATGATCTGGGAGAAACCCAAGTAGAGGTTACATGTCATGATAATGTCAGCAAGGGTG  1139
Sbjct   630  GTCATCACGGTGGCAAATCCCAAGTTCAAGTTGCAGATTGTGACAATATCAGCAAGGGTG   571

Query  1140  ACAAAATGTCCAACAAGAAATGTGTTGGAAGCAAGATGAGTGTTCAAAGCGCCAAGGGCT  1199
Sbjct   570  ACAGAGTGTCCAACAAGGAAAGTGTTGGAGGCGAGATGTGTGTTCAGAGCCTCAAGTCCT   511

Query  1200  CTCTTCAATGCAGCAATAGCAGCTTCCTCAG  1230
Sbjct   510  CTCTTCAATGCAGAAATTGCCGCTTCCTCAG   480


3.4 Experiment No. 4. BAC derived new SSRs and their utility for use in cotton (Gossypium spp.) improvement

3.4.1 Identification of microsatellite (SSR)

A total of 2397 BAC-end and BAC clone sequences of G. raimondii L. were selected for the identification of SSRs. SSRs were found in 1794 of these sequences. Of these, 1294 were used for primer design, while the others lacked sufficient flanking sequence to permit the design of high quality primers. In total, 1294 new gSSRs (BAC-gSSRs) were designed (Appendix Table 9).
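The SSR identification step can be sketched with a regular-expression scan. This is an illustration only and not the software used in this study; the function name `find_ssrs` is hypothetical, and the minimum repeat numbers (5, 4, 3, 3 and 3 for di- to hexanucleotide motifs) are the thresholds reported later in this chapter.

```python
import re

# Minimum number of repeat units per motif length (di- to hexanucleotide).
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(unit)
    return not any(
        n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):
                found.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(found)


# Hypothetical sequence: (AT)6 and (AAG)4 embedded in short flanks.
example = "GG" + "AT" * 6 + "CC" + "AAG" * 4 + "T"
print(find_ssrs(example))
```

The primitive-unit check prevents a dinucleotide run such as (AT)6 from being reported a second time as a tetranucleotide repeat.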

This new set of BAC-gSSRs contained diverse types (di-, tri-, tetra-, penta- and hexanucleotide) of repeat motifs. Dinucleotide repeats were the most frequent (722, 55.80%), followed by tri- (397, 30.68%), tetra- (118, 9.12%), hexa- (40, 3.09%) and pentanucleotide repeats (17, 1.31%) (Table 3.10).

Table 3.10 Frequency of different repeat motifs in newly identified set of BAC- gSSRs.

Nucleotide repeat    Motif type       Total number   %age of the
types in SSRs        in SSRs          of SSRs        total SSRs
Dinucleotide         AT/TA            722            55.80
Trinucleotide        AAG/TTC          397            30.68
Tetranucleotide      TTAT/TATG        118            9.12
Pentanucleotide      GAAAA/CTTTT      17             1.31
Hexanucleotide       AAAAAT/TTTTTA    40             3.09
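The share of each repeat class follows directly from the counts in Table 3.10. A minimal sketch, in which the variable names are assumptions and the counts are taken from the table:

```python
# Number of BAC-gSSRs per repeat class (Table 3.10).
motif_counts = {"di": 722, "tri": 397, "tetra": 118, "penta": 17, "hexa": 40}

total = sum(motif_counts.values())
shares = {name: round(100 * n / total, 2) for name, n in motif_counts.items()}

for name, pct in shares.items():
    print(f"{name:>5}: {motif_counts[name]:4d} ({pct:.2f}%)")
print(f"total: {total}")
```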

3.4.2 Surveying of BAC-gSSR

This newly identified set of BAC-gSSRs was surveyed on template genomic DNA of G. hirsutum cv FH-1000, G. barbadense cv PGMB-36, G. raimondii L. and G. arboreum L. Amplification was obtained with 1043 (80.60%) primers, while 251 (19.40%) produced non-specific amplification. Of the 1043 amplified markers, 535 (51.29%) were polymorphic between the diploid species (G. raimondii L. and G. arboreum L.), while 235 (22.53%) were polymorphic between the two cultivated species (G. hirsutum cv FH-1000 and G. barbadense cv PGMB-36) (Table 3.11). These two cultivated species were used as parents to develop the segregating population for the next experiment, genetic and QTL mapping in an inter-specific (G. hirsutum L. x G. barbadense L.) F2:3 population.

Table 3.11 BAC-gSSRs polymorphic among four cotton species

                 G. barbadense   Percent   G. hirsutum   Percent   G. raimondii   Percent
G. barbadense
G. hirsutum      235             22.53
G. raimondii     539             51.68     392           37.58
G. arboreum      683             65.48     708           67.88     535            51.29

Polymorphism was scored on the basis of variation in the size of amplified alleles or the presence of a null allele (allele absent). Further analysis of the 235 primers polymorphic between G. hirsutum L. cv FH-1000 and G. barbadense L. cv PGMB-36 showed that 174 (74.04%) were polymorphic on the basis of differences in the size of the amplified allele (Fig. 3.15), while 61 (25.96%) showed dominant behavior (resulting from null alleles), yielding amplification products in only one of the two parental genotypes. Null alleles were confirmed by repeating the reaction three times.


Fig. 3.14 Polymorphic BAC-gSSRs between tetraploid species

For di-, tri-, tetra-, penta- and hexanucleotide repeats, the minimum numbers of repeat units were 5, 4, 3, 3 and 3, respectively. Polymorphism increased with the length of the repeat unit: hexanucleotide repeats had the highest level of polymorphism (42.50%), followed by penta- (35.29%), tetra- (30.51%), tri- (17.88%) and dinucleotide repeats (14.54%) (Table 3.12).

Table 3.12 Percentage of polymorphism of BAC-gSSRs in G. hirsutum cv. FH-1000 and G. barbadense cv. PGMB-36

         Total   Monomorphic   Polymorphic   FH-1000   PGMB-36   NA      %age of polymorphism
Di       722     495           87            10        8         122     14.54
Tri      397     197           45            7         19        129     17.88
Tetra    118     82            22            12        2                 30.51
Penta    17      11            3             2         1                 35.29
Hexa     40      23            17                                        42.50
Total    1294    808           174           31        30        251     22.53
%age             77.47         16.68         2.97      2.88      19.40
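The per-class polymorphism percentages in Table 3.12 are consistent with counting co-dominant and dominant (parent-specific) markers together and dividing by the class total. The sketch below reproduces that arithmetic; the function name is hypothetical, and the blank cells for hexanucleotide repeats are assumed to be zero.

```python
# class: (total, co-dominant polymorphic, FH-1000 only, PGMB-36 only)
table_3_12 = {
    "di": (722, 87, 10, 8),
    "tri": (397, 45, 7, 19),
    "tetra": (118, 22, 12, 2),
    "penta": (17, 3, 2, 1),
    "hexa": (40, 17, 0, 0),  # blank cells in the table assumed to be zero
}


def polymorphism_pct(total, codominant, p1_only, p2_only):
    """Percentage of markers in a class that are polymorphic."""
    return round(100 * (codominant + p1_only + p2_only) / total, 2)


for name, row in table_3_12.items():
    print(f"{name:>5}: {polymorphism_pct(*row):.2f}%")
```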


All these SSRs were designed from cloned sequence of G. raimondii L., one of the progenitors (the D-genome donor) of the allotetraploid cotton species; hence nearly equal numbers of dominant amplification products were obtained from the genomic DNA of the two tetraploid parents.

Of the successfully amplified BAC-gSSRs, 313 (30%) amplified two fragments in the tetraploid species and a single locus in G. raimondii L. and G. arboreum L. These G. raimondii L.-derived BAC-gSSRs amplified successfully in G. hirsutum L. and G. barbadense L.

In this study, 391 (30%) of the total BAC-gSSR primers could not amplify distinct fragments in G. arboreum L. but amplified alleles in the D-genome species (G. raimondii L.) and in the AD-genome species (G. hirsutum L. and G. barbadense L.). This indicates that the loci/alleles amplified by these BAC-gSSRs are specific to the D genome and can be mapped in D-genome species. The average PIC value was 0.39, ranging from 0.12 to 0.85.
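For readers unfamiliar with PIC, the widely used formulation of Botstein et al. (1980) is sketched below. The text does not state which formula was applied here, so this choice is an assumption, and the function name `pic` is hypothetical.

```python
def pic(freqs):
    """Polymorphic information content for one locus.

    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    where p_i are the allele frequencies at the locus.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_term


# Two equally frequent alleles give the maximum PIC for a biallelic locus.
print(pic([0.5, 0.5]))
```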

Genetic similarity coefficients among the diploid and cultivated tetraploid species ranged from 0.57 to 0.85, with an average of 0.65 (Table 3.13). G. arboreum L. was genetically more similar to G. barbadense L. (0.62) than to G. hirsutum L. (0.57), while G. raimondii L. showed equal genetic similarity (0.59) to both tetraploid species.

Table 3.13 Genetic similarity coefficients of diploid and tetraploid cotton species using G. raimondii derived BAC-gSSRs

                      G. arboreum   G. raimondii   G. hirsutum   G. barbadense
                      (A2)          (D5)           (AD1)         (AD2)
G. arboreum (A2)      1.00
G. raimondii (D5)     0.67          1.00
G. hirsutum (AD1)     0.57          0.59           1.00
G. barbadense (AD2)   0.62          0.59           0.85          1.00
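The range and mean of the pairwise coefficients in Table 3.13 can be summarised as follows. This is a simple sketch over the six off-diagonal values; the dictionary and variable names are assumptions.

```python
# Pairwise genetic similarity coefficients from Table 3.13 (off-diagonal values).
similarity = {
    ("G. arboreum", "G. raimondii"): 0.67,
    ("G. arboreum", "G. hirsutum"): 0.57,
    ("G. arboreum", "G. barbadense"): 0.62,
    ("G. raimondii", "G. hirsutum"): 0.59,
    ("G. raimondii", "G. barbadense"): 0.59,
    ("G. hirsutum", "G. barbadense"): 0.85,
}

values = list(similarity.values())
lowest, highest = min(values), max(values)
mean_similarity = sum(values) / len(values)

print(f"range = {lowest:.2f} to {highest:.2f}, mean = {mean_similarity:.2f}")
```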


3.5 Experiment No. 5. Genetic and QTL mapping of fiber traits using inter-specific (G. hirsutum x G. barbadense) F2 population

3.5.1 Parental polymorphism

A total of 1314 SSR primer pairs, including 20 EST-derived SSRs (prefix MGHES) and 1294 newly identified BAC-gSSRs, were surveyed on the genomic DNA of G. hirsutum cv FH-1000 and G. barbadense cv PGMB-36 (the parents used to develop the segregating F2 population). Of the 20 EST-SSRs, only six (MGHES-05, MGHES-12, MGHES-14, MGHES-19, MGHES-20 and MGHES-23) amplified scorable products, and all of these amplified monomorphic fragments.

A total of 2049 alleles were amplified by the 1043 successfully amplified G. raimondii-derived BAC-gSSRs, yielding 1.96 alleles per primer (Fig. 3.15). In total, 235 (22.53%) were polymorphic. The size of most alleles was in good agreement with previously published work (Liu et al., 2000a; Yu et al., 2002; Qureshi et al., 2004; Han et al., 2006).

Of the 235 polymorphic SSRs, 74 were used to assay the entire F2 population of 131 individuals. Among these 74 SSRs, 62 segregated co-dominantly while the remaining markers were dominant (Fig. 3.16). Twelve loci (16.4%) showed distorted segregation when subjected to a chi-square goodness-of-fit test against the expected Mendelian segregation ratios of 1:2:1 for co-dominant and 1:3 for dominant markers.
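The chi-square goodness-of-fit test for segregation distortion can be sketched as below. The genotype counts are hypothetical and chosen only to total 131 F2 plants; the 5% critical values (3.841 for 1 df, 5.991 for 2 df) are standard tabulated values, and the significance level used in the study is not stated, so 0.05 is an assumption.

```python
# Tabulated chi-square critical values at alpha = 0.05.
CRITICAL_005 = {1: 3.841, 2: 5.991}


def chi_square(observed, ratio):
    """Chi-square statistic of observed class counts against an expected ratio."""
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(observed, ratio):
    """True if segregation deviates from the ratio at the 5% level."""
    df = len(observed) - 1
    return chi_square(observed, ratio) > CRITICAL_005[df]


# Hypothetical co-dominant marker (P1 homozygote, heterozygote, P2 homozygote).
print(is_distorted((30, 68, 33), (1, 2, 1)))
# Hypothetical dominant marker (band absent, band present).
print(is_distorted((50, 81), (1, 3)))
```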

3.5.2 Linkage analysis and map construction

Data generated by surveying 74 BAC-SSRs on the F2 mapping population were used for linkage analysis and map construction. Linkage analysis placed 72 of the 74 loci in eleven linkage groups (LG), ranging from 0 to 264.3 cM in length; the remaining two markers were unlinked. The linkage groups were numbered LG01–LG11. They contained 2 to 11 markers each, with an average length of 68.03 cM (Fig. 3.17). The map spanned a total of 748.4 cM, covering ~8% of the total G. hirsutum L. (2.5 Gb) genome, and the average distance between adjacent markers was 10.39 cM (Table 3.14). Previously mapped SSRs assigned to different chromosomes (Lacape et al., 2003; Nguyen et al., 2004; Song et al., 2005a) were used as reference markers. In this study, 20 reference markers were surveyed; only six gave scorable fragments, and all of these were monomorphic, so the linkage groups could not be assigned to chromosomes through reference markers. Instead, chromosome numbers were assigned to linkage groups using the genome sequence of G. raimondii L., through BLAST searches of the sequences from which these SSRs were designed against the whole genome shotgun sequence (WGS) of G. raimondii L. (Appendix Table 7). These BAC-gSSRs were named PR (from the last names of both principal investigators) followed by a number, e.g. PR 542, in the linkage groups (Fig. 3.17).

Figure 3.15 Segregation of BAC-gSSR 48 in 13 plants of the F2 population derived from a cross of FH-1000 x PGMB-36. Lanes: M = 50 bp size marker, P1 = FH-1000, P2 = PGMB-36, 1 through 13 = F2 plants. Size marker bands at 50 bp, 100 bp and 200 bp are labelled.


Table 3.14 Distribution of DNA markers on linkage map of cotton constructed using F2 population derived from inter-specific cross (FH-1000 x PGMB-36)

Linkage   Assigned      No. of        Percentage of   Length   Marker
group     chromosome    marker loci   marker loci     (cM)     density
1         Chr.15        6             8.34            00       6.0
2         Chr.18        9             12.5            103.6    11.51
3         Chr.19        4             5.55            00       4.0
4         Chr.14        9             12.5            75.3     8.36
5         Chr.23        6             8.34            48.6     8.1
6         Chr.21        8             11.12           55.8     6.97
7         Chr.16        2             2.77            00       2.0
8         Chr.26        2             2.77            54.1     27.05
9         Chr.22        11            15.27           264.3    24.02
10        Chr.20        7             9.72            103.9    14.84
11        Chr.24        8             11.12           42.8     5.35
Total                   72            100             748.4    10.39
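The map totals in Table 3.14 can be recomputed from the per-group values. A minimal sketch, with variable names chosen here for illustration and the numbers transcribed from the table:

```python
# (linkage group, number of marker loci, length in cM) from Table 3.14.
linkage_groups = [
    (1, 6, 0.0), (2, 9, 103.6), (3, 4, 0.0), (4, 9, 75.3),
    (5, 6, 48.6), (6, 8, 55.8), (7, 2, 0.0), (8, 2, 54.1),
    (9, 11, 264.3), (10, 7, 103.9), (11, 8, 42.8),
]

total_loci = sum(loci for _, loci, _ in linkage_groups)
total_length = sum(length for _, _, length in linkage_groups)
mean_spacing = total_length / total_loci  # cM per mapped locus

print(f"{total_loci} loci, {total_length:.1f} cM, {mean_spacing:.2f} cM per locus")
```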


Figure 3.16 Genetic linkage map constructed using the F2 population derived from the inter-specific cross FH-1000 x PGMB-36. Marker names are given on the right and the positions of the GR-BAC-SSRs in centiMorgans (Kosambi, 1944) on the left.
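The Kosambi (1944) mapping function converts a recombination fraction r into map distance as d = 25 ln((1 + 2r) / (1 - 2r)) cM. The sketch below implements the function and its inverse; the function names are chosen here for illustration.

```python
import math


def kosambi_cM(r):
    """Map distance in centiMorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cM):
    """Recombination fraction for a Kosambi map distance in centiMorgans."""
    return 0.5 * math.tanh(d_cM / 50.0)


for r in (0.05, 0.10, 0.20):
    print(f"r = {r:.2f} -> {kosambi_cM(r):.2f} cM")
```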


3.5.3 Phenotypic analysis of productivity/taxonomic/fiber traits in F2:3 families

All tested traits fitted a normal distribution according to the Kolmogorov-Smirnov normality test. The continuous variation among F2:3 families indicated quantitative inheritance of the measured traits. Transgressive segregation observed in both directions indicated that both parents transmitted favorable alleles for each trait.

Figure 3.17 Phenotypic distribution of (A) boll length, (B) boll width, (C) bracteole length, (D) bracteole width, (E) fiber elongation and (F) uniformity index in F2:3 families. (Grey arrows and white arrows indicate FH-1000 and PGMB-36, respectively.)


Figure 3.18 Phenotypic distribution of, (G) staple length (H) fiber strength (I) GOT (J) fiber fineness in F2:3 families. (Grey arrows and white arrows indicate FH-1000 and PGMB-36 respectively).


When phenotypic performance was examined, FH-1000 showed higher values for boll length, boll width, bracteole length, bracteole width, fiber elongation, fiber fineness and maturity, while PGMB-36 showed higher values for fiber length, fiber strength, uniformity index and GOT% (Table 3.15).

Table 3.15 Phenotypic performances for boll length, boll width, bracteole length, bracteole width, fiber length, fiber strength, fiber elongation, fiber fineness, maturity, uniformity index and GOT% of parents and F2:3 mapping population from a cross of FH-1000 and PGMB-36.

                                                     F2:3 families
Trait              FH-1000   PGMB-36   Prob a   Mean ± SE        CV (%)   Range
Boll length        3.4       3.0       0.05     3.1 ± 0.12       29.4     2.6-4.2
Boll width         3         2.5       0.01     2.23 ± 0.08      28.7     2.1-3.3
Bracteole length   3.8       3.2       0.01     2.74 ± 0.13      35.4     2.2-4
Bracteole width    3         2.7       0.05     2.32 ± 0.1       34.9     2-3.5
Fiber length       24.79     29.92     0.04     29.87 ± 0.23     8.86     21.19-34.68
Fiber strength     28.57     34.05     0.005    29.68 ± 0.16     6.46     24.90-35.0
Fiber elongation   6.7       5.85      0.04     5.99 ± 0.08      15.89    4.0-8.2
Fiber fineness     4.71      2.92      0.03     3.62 ± 0.062     19.73    2.36-5.52
Maturity           0.89      0.86      0.04     0.84 ± 0.001     1.81     0.82-0.89
Uniformity index   79.70     82.4      0.03     81.41 ± 0.149    2.09     77.30-85.30
GOT (%)            28        35        0.007    31.48 ± 0.479    17.42    20.35-46.01

a The probability of the t-test on the means of the parents.


3.5.4 QTL analysis

QTL Cartographer V 2.5 (Wang et al., 2006a) was used for QTL analysis of productivity and fiber traits. Single marker analysis (SMA), interval mapping (IM) and composite interval mapping (CIM) were used to detect associations between phenotype and marker genotype. A total of six markers were associated (P<0.001) with three traits using SMA (Table 3.18). QTLs identified by IM and CIM for the traits under study are listed in Tables 3.16 and 3.17, respectively. Additive effects are given as positive or negative values. R2 is the proportion of phenotypic variation explained by the QTL. QTLs were named with “Q” followed by the trait acronym and then the linkage group number. A final “I” denotes QTLs discovered with IM, while “C” denotes composite interval mapping.
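Single marker analysis tests whether trait means differ among marker genotype classes. One common formulation is a one-way ANOVA F statistic, sketched below; this is an illustration under that assumption, not the exact computation performed by QTL Cartographer, and the trait values are hypothetical.

```python
def single_marker_f(groups):
    """One-way ANOVA F statistic for trait values grouped by marker genotype.

    `groups` maps a genotype class (e.g. "AA", "AB", "BB") to a list of
    trait values measured on plants carrying that genotype.
    """
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    means = {k: sum(g) / len(g) for k, g in groups.items()}
    ss_between = sum(len(g) * (means[k] - grand_mean) ** 2 for k, g in groups.items())
    ss_within = sum((v - means[k]) ** 2 for k, g in groups.items() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical trait values for the three genotype classes of one marker.
toy = {"AA": [1.0, 2.0, 3.0], "AB": [2.0, 3.0, 4.0], "BB": [3.0, 4.0, 5.0]}
print(single_marker_f(toy))
```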

3.5.4.1 Mapping QTL for productivity/taxonomic traits

The marker data of the F2:3 individuals and data for productivity traits, including boll surface, boll beak, boll length, boll width, boll gossypol, bracteole length, bracteole width, staple length, fiber strength, fiber fineness, fiber elongation, ginning out turn (GOT), maturity and uniformity index, were used to map QTLs. Single marker analysis (Table 3.18) showed six markers significantly (P<0.001) correlated with three traits. To locate the genes on chromosomes, interval mapping and composite interval mapping analyses were carried out. Both interval mapping and composite interval mapping detected three QTLs for three different traits. Tables 3.16 and 3.17 give the detailed biometrical parameters of each QTL detected by IM and CIM, respectively.


Table 3.16 Information regarding putative QTLs detected for staple length, uniformity index and GOT% based on interval mapping (QTL Cartographer V 2.5)

                              LG/           Nearest   QTL            LOD         Additive   R2(4)
Trait              QTL        Chr. no(1)    marker    position(2)    score(3)    effect     (%)
Staple length      QStL2I     2(18)         PR-278    63.9           3.12        1.08       17.20
Uniformity index   QUniI2I    2(18)         PR-361    99.60          2.70        1.30       12.69
GOT                QGOT6I     6(21)         PR-264    00.00          3.54        4.93       11.69

(1) Linkage group/chromosome number in parenthesis
(2) Position in centiMorgan from top of the linkage group/chromosome
(3) Log of odds scores
(4) Phenotypic variation explained
(5) Discovered through interval mapping


Table 3.17 Information regarding putative QTLs detected for staple length, uniformity index and GOT based on composite interval mapping (QTL Cartographer V 2.5)

                                LG/          Nearest   QTL            LOD         Additive   R2(4)
Trait              QTL          chrom(1)     marker    position(2)    score(3)    effect     (%)
Staple length      QStL2C(5)    2(18)        PR-278    63.9           3.11        1.36       12.0
Uniformity index   QUniI1C      1(15)        PR-500    00.00          3.31        0.85       11.21
GOT                QGOT6C       6(21)        PR-264    00.00          3.03        4.30       10.02

(1) Linkage group/chromosome
(2) Position in centiMorgan from top of the linkage group/chromosome
(3) Log of odds scores
(4) Phenotypic variation explained
(5) Discovered through composite interval mapping


Table 3.18 Information regarding putative QTLs detected for staple length, uniformity index and GOT% based on single marker analysis (QTL Cartographer V 2.5)

Trait              LG (Chrom)(1)   Marker    F       P-value(2)
Staple length      2(18)           PR-278    14.77   0.000
                   4(14)           PR-536    9.44    0.000
                   6(21)           PR-443    7.93    0.000
Uniformity index   1(15)           PR-500    11.19   0.001
                   2(18)           PR-278    11.70   0.001
GOT                6(21)           PR-264    11.03   0.001

(1) Linkage group/chromosome
(2) Probability that the marker genotype had no effect on the trait


3.5.4.2 Description of newly identified QTLs

3.5.4.2.1 Staple length

Single marker analysis (SMA) revealed three markers linked with staple length. Of these, PR-278 and PR-536 showed highly significant (P≤0.001) association with staple length. Interval mapping revealed a single QTL (QStL2I) on linkage group 2 for staple length, explaining 17.20% of the phenotypic variance with a LOD score of 3.12. One QTL, QStL2C, was identified by the CIM method with a LOD score of 3.11; it explained 12% of the total phenotypic variation. This QTL was located on chromosome 18. Marker PR-278 was linked with staple length in the IM, CIM and SMA analyses.

3.5.4.2.2 Uniformity index

A total of five markers associated with uniformity index were identified using SMA. Of these, PR-278 and PR-500 showed highly significant (P≤0.001) association with uniformity index. One QTL, QUniI2I, explaining 12.69% of the phenotypic variation, was detected on chromosome 18 using IM, and one QTL, QUniI1C, explaining 11.21% of the phenotypic variation, was detected on chromosome 15 using CIM.

3.5.4.2.3 GOT%

One QTL for GOT% was detected on chromosome 21 using both the IM and CIM analyses. This QTL explained 10.02-11.69% of the total phenotypic variation. Among the five linked markers, PR-264 showed highly significant (P≤0.001) association with GOT%.

3.5.4.2.4 Marker association with traits using single marker analysis (SMA)

Two markers, PR-87 and PR-278, were significantly (P≤0.001) associated with boll length and bracteole width, respectively, using SMA.


3.5.4.2.5 Genes flanking QTL-linked markers

To identify putative genes involved in fiber development, the genomic regions flanking the markers (PR-278, PR-536, PR-443, PR-500 and PR-264) were explored in the genome sequence of G. raimondii L. Ten putative genes on each side of each marker (20 in total) were identified (Table 3.19). The sequences of these genes were then searched with BLAST to identify homology with known genes in NCBI (Appendix Table 8). Of these, the sequences of three putative genes with gene IDs Gorai013G177500, Gorai002G077200 and Gorai011G163300 showed homology with serine/threonine protein phosphatase, MYB-related protein and fiber protein Fb 20, respectively. Gorai013G177000 was found 0.14 Mb away from PR-278 (Fig. 3.20), Gorai002G077200 was 0.17 Mb away from PR-536 (Fig. 3.22) and Gorai011G163300 was 0.96 Mb away from PR-443 (Fig. 3.21). In this study, PR-278 was associated with the fiber length QTL identified by interval mapping (IM) (Table 3.16) as well as composite interval mapping (CIM) (Table 3.17), while PR-536 and PR-443 were associated with fiber length in single marker analysis (SMA) (Table 3.18).
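Selecting the nearest genes on each side of a marker and measuring their distance from it can be sketched as follows. The gene names and coordinates below are hypothetical placeholders, not G. raimondii annotations, and the function names are assumptions made for illustration.

```python
def flanking_genes(marker_bp, genes, n_each_side=10):
    """Return the nearest genes upstream and downstream of a marker.

    `genes` is a list of (gene_id, start_bp, end_bp) tuples on the same
    chromosome as the marker; each returned list is ordered by proximity.
    """
    upstream = sorted(
        (g for g in genes if g[2] < marker_bp), key=lambda g: marker_bp - g[2]
    )[:n_each_side]
    downstream = sorted(
        (g for g in genes if g[1] > marker_bp), key=lambda g: g[1] - marker_bp
    )[:n_each_side]
    return upstream, downstream


def distance_mb(marker_bp, gene):
    """Distance in Mb from the marker to the nearest end of a gene."""
    _, start, end = gene
    if start <= marker_bp <= end:
        return 0.0
    return min(abs(marker_bp - start), abs(marker_bp - end)) / 1e6


# Hypothetical gene models (id, start, end) and marker position.
toy_genes = [("gene_A", 1000, 2000), ("gene_B", 5000, 6000), ("gene_C", 9000, 10000)]
up, down = flanking_genes(7000, toy_genes, n_each_side=1)
print(up, down, distance_mb(7000, up[0]))
```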

3.5.5 Marker assisted selection (MAS)

The F2:3 lines were advanced to F4. A total of nine lines (8 plants/line) showing an increase in fiber length were selected. Genomic DNA from all plants of each line was bulked and surveyed with PR-278 (associated with the fiber length QTL discussed in the previous section). Only one line showed the maximum increase in fiber length and also carried G. barbadense L.-like alleles. Hence, phenotypic selection assisted by DNA markers (for complex traits) may help breeders in cotton improvement programs.

The markers used in this study were newly designed and used for the first time, so no markers were shared with previously reported fiber-related QTLs. Nevertheless, selection based on QTL-linked markers will enable the use of MAS to improve fiber quality in future. To use MAS and dissect the genetic basis of fiber quality traits, different research groups need to develop more common SSR markers.


Table 3.19 Physical locations of markers and putative genes in their flanking regions

Trait              Chr. No.   Marker name   Marker position (cM)   Marker position (bp)
Fiber length       18         PR278         63.9                   46901500
  Putative flanking genes: Gorai013G175100, Gorai013G174900, Gorai013G175200, Gorai013G175300, Gorai013G176600, Gorai013G176700, Gorai013G176800, Gorai013G176900, Gorai013G177000, Gorai013G177100, Gorai013G177200, Gorai013G177300, Gorai013G177400, Gorai013G177500
Fiber length       21         PR443         55.8                   29964835
  Putative flanking genes: Gorai011G160000, Gorai011G160100, Gorai011G160200, Gorai011G160300, Gorai011G160400, Gorai011G160800, Gorai011G160900, Gorai011G161000, Gorai011G161100, Gorai011G161200, Gorai011G161300, Gorai011G161400, Gorai011G161500, Gorai011G161600, Gorai011G161700, Gorai011G162700, Gorai011G162800, Gorai011G163200, Gorai011G163300, Gorai011G163500
Fiber length       14         PR536         75.3                   8979520
  Putative flanking genes: Gorai002G073300, Gorai002G073400, Gorai002G073500, Gorai002G073600, Gorai002G073700, Gorai002G075300, Gorai002G075400, Gorai002G075500, Gorai002G075600, Gorai002G075700, Gorai002G075900, Gorai002G075800, Gorai002G076000, Gorai002G076100, Gorai002G076200, Gorai002G077100, Gorai002G077200, Gorai002G077300, Gorai002G077400, Gorai002G077500
Uniformity index   15         PR500         0                      25040477
  Putative flanking genes: Gorai001G171900, Gorai001G172000, Gorai001G172100, Gorai001G172200, Gorai001G172300, Gorai001G172500, Gorai001G172600, Gorai001G172700, Gorai001G172800, Gorai001G172900, Gorai001G173300, Gorai001G173100, Gorai001G173200, Gorai001G173300, Gorai001G173400, Gorai001G173600, Gorai001G173700, Gorai001G173800, Gorai001G173900, Gorai001G174000
GOT %age           21         PR264         0                      60375736
  Putative flanking genes: Gorai011G267600, Gorai011G267900, Gorai011G268100, Gorai011G268200, Gorai011G268700, Gorai011G270200, Gorai011G270100, Gorai011G270300, Gorai011G270500, Gorai011G270900, Gorai011G271500, Gorai011G272400, Gorai011G272900, Gorai011G273000, Gorai011G273400, Gorai011G274200, Gorai011G274000, Gorai011G274600, Gorai011G275000


Figure 3.19 Position of QTLs associated with staple length (shown in red) and uniformity index (shown in yellow) on the genetic linkage map of cotton constructed using the F2 population derived from the inter-specific cross FH-1000 x PGMB-36. QTL positions were identified at a log of odds ratio ≥3.0 in QTL Cartographer V 2.5. Physical locations in base pairs (converted into million base pairs, Mb) of the markers linked to these QTLs and the putative genes in their flanking regions are also shown.


Figure 3.20 Position of QTLs associated with staple length (red) on the genetic linkage map of cotton constructed using the F2 population derived from the inter-specific cross FH-1000 x PGMB-36. QTL positions were identified at a log of odds ratio ≥3.0 in QTL Cartographer V 2.5. Physical locations in base pairs (converted into million base pairs, Mb) of the markers linked to this QTL and the putative genes in their flanking regions are also shown.


Figure 3.21 Position of QTLs associated with staple length (red) on the genetic linkage map of cotton constructed using the F2 population derived from the inter-specific cross FH-1000 x PGMB-36. QTL positions were identified at a log of odds ratio ≥3.0 in QTL Cartographer V 2.5. Physical locations in base pairs (converted into million base pairs, Mb) of the markers linked to this QTL and the putative genes in their flanking regions are also shown.


Figure 3.22 Position of QTLs associated with uniformity index (yellow) on the genetic linkage map of cotton constructed using the F2 population derived from the inter-specific cross FH-1000 x PGMB-36. QTL positions were identified at a log of odds ratio ≥3.0 in QTL Cartographer V 2.5. Physical locations in base pairs (converted into million base pairs, Mb) of the markers linked to this QTL and the putative genes in their flanking regions are also shown.


Figure 3.23 Position of QTLs associated with GOT% (green) on the genetic linkage map of cotton constructed using the F2 population derived from the inter-specific cross FH-1000 x PGMB-36. QTL positions were identified at a log of odds ratio ≥3.0 in QTL Cartographer V 2.5. Physical locations in base pairs (converted into million base pairs, Mb) of the markers linked to this QTL and the putative genes in their flanking regions are also shown.


Chapter 4: Discussion

DISCUSSION

Cotton is grown worldwide, including in Pakistan, for its natural lint fiber, which sustains a billion dollar textile industry. Genetic diversity assessment is necessary for the selection of species/varieties for breeding programs. The genus Gossypium comprises 51 species (45 diploid and 6 allotetraploid). Of the six allotetraploids, only two species, G. hirsutum L. and G. barbadense L., are cultivated. G. hirsutum L., also called upland cotton, is known for its high lint fiber yield, but this species has lost genetic diversity because of the high selection pressure imposed during its domestication. Low genetic diversity impedes future breeding progress. Among the available genetic resources, diploid cotton species carry many important traits, such as fiber yield/strength and resistance/immunity to both biotic and abiotic stresses. G. raimondii and G. arboreum (both diploid cotton species) are the closest living relatives of the cultivated tetraploid species. G. raimondii has the smallest genome (~880 Mb) and has been sequenced recently (Paterson et al., 2012). In the present study, sequence information of G. raimondii was kindly provided by Andrew H. Paterson (PGML, UGA, USA) to design new microsatellites or simple sequence repeats (SSRs). In total, 2397 sequences were randomly selected from 92160 sequences, and 1294 sequences (560 BAC-end and 734 BAC-clone sequences) containing SSRs were identified. SSRs are useful for distinguishing closely related genotypes because they are hypervariable, randomly dispersed throughout the genome and publicly available through published flanking primer sequences, which makes them the markers of choice (Saghai Maroof et al., 1994). SSRs derived from BAC-end sequences are not only used as genetic markers but also offer the opportunity to efficiently integrate the genetic and physical maps (Zhang et al., 2008).

4.1 Genetic diversity and relationship of diploid and tetraploid species using EST-SSRs and gSSRs

4.1.1 Polymorphism in microsatellite region

Faint amplification, or failure of amplification, of some SSRs was expected because the primers were designed from G. hirsutum. It is likely that, during evolution, mutations accumulated in the primer annealing sites and/or these loci were lost in the diploid species, both of which may affect primer annealing (Liu et al., 2000). In the present study, 97% of the SSRs were polymorphic among the 36 cotton species. Similar results have been reported for the genetic divergence among 31 Gossypium species using RAPDs (Khan et al., 2000) and among 25 diploid Gossypium species using SSRs (Guo et al., 2006). Such a high rate of allelic polymorphism among species results from the accumulation of mutations over evolutionary time (Nei, 2007). In this study, the average number of alleles per locus (2.87) was slightly higher than in previous reports (2 alleles; Wu et al., 2007b). Similarly, more alleles were amplified in tetraploids than in diploids, which agrees with earlier reports (Gutierrez et al., 2002; Kalivas et al., 2011). The multiple-fold (30–36) increase in ploidy level of the tetraploids (Paterson et al., 2012) is one possible explanation for the larger number of alleles amplified. The number of alleles is positively correlated with the repeat number (Lacape et al., 2007), the ploidy level of the germplasm (Udall and Wendel, 2006), the number of genotypes surveyed (Guo et al., 2007; Lacape et al., 2007) and the accuracy of the system used for resolving amplicons. Polymorphic information content (PIC) is an important parameter that helps in choosing SSRs for germplasm evaluation, gene tagging, etc. (Peng and Lapitan, 2005). In the present study, PIC values were higher for gSSRs than for EST-SSRs, suggesting that the transcribed portions of the genome are more conserved (Eujayl et al., 2002; La Rota et al., 2005).

Inconsistent PIC values have been reported across studies (Liu et al., 2000; Kebede et al., 2007; Kalivas et al., 2011), which is attributed to the kind of germplasm explored, domestication bottlenecks (Thuillet et al., 2004; Vigouroux et al., 2005) and the kind of DNA markers surveyed (Liu et al., 2000; Gutierrez et al., 2002). Also, the PIC values of the SSRs surveyed were higher in the diploid species (0.30) than in the tetraploids (0.21). Most diploid species, except the A-genome species, are wild. Because wild species have not been domesticated, no selection pressure for particular alleles has been imposed on them, which explains their higher PIC values (Vigouroux et al., 2002; Qureshi et al., 2004).
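
To illustrate how a PIC value follows from allele frequencies, a minimal Python sketch is given below. It assumes the widely used form PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j²; whether this exact formula was applied in the present study is an assumption, and the function name `pic` and the allele frequencies are hypothetical and are not data from this work.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content for one locus.

    Assumed form: PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    where p_i are allele frequencies that sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity: sum of squared allele frequencies.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term over all unordered pairs of distinct alleles.
    cross_term = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical allele frequencies (illustration only).
even_locus = [0.25, 0.25, 0.25, 0.25]   # four equally frequent alleles
skewed_locus = [0.9, 0.1]               # one predominant allele
print(round(pic(even_locus), 4))
print(round(pic(skewed_locus), 4))
```

Under this form, a locus with several equally frequent alleles scores higher than a locus dominated by one allele, which is in line with the lower PIC values discussed above for germplasm that has passed through selection.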

4.1.2 Genetic characterization

PIC values guide the selection of the most informative SSRs for calculating genetic divergence, so the number of SSRs can be reduced substantially (Candida et al., 2006) before initiating genetic diversity and variety identification experiments (Macaulay et al., 2001; Masi et al., 2003; Jain et al., 2004). In this study, we proposed 22 (11 BNLs and 11 MGHES) of the 75 SSRs, based on their high PIC values (PIC ≥ 0.5) and their potential to amplify distinct DNA fragments, for calculating the extent of genetic diversity among the 36 Gossypium species. Similar marker sets have been proposed in other investigations, including 39 SSRs for cotton genetic studies (Lacape et al., 2007) and 25 SSRs for G. arboreum L. accessions (Kantartzi et al., 2009). A large number of the informative gSSRs had dinucleotide repeats, while a larger portion of the informative EST-SSRs had trinucleotide motifs. The dominance of trimeric SSRs is possibly due to suppression of non-trimeric SSRs in the coding regions of genes, which avoids frameshift mutations (La Rota et al., 2005). Another possible reason for the high proportion of trinucleotide repeats in coding regions is selection pressure for particular single amino acid stretches (Morgante et al., 2002). Also, the most informative SSRs contained ≥10 repeats, in agreement with previous studies (Vigouroux et al., 2002; Qureshi et al., 2004; Kantartzi et al., 2009). It was also observed that the position of SSR loci on the chromosome has no impact on the corresponding PIC values (Lacape et al., 2003, 2007). In the present study, no correlation was observed between the rate of polymorphism and repeat motif type, which contradicts the findings of Lacape et al. (2007). They found repeat motif type dependent polymorphism in cotton and showed that SSRs with the GA repeat motif exhibited higher PIC values and more alleles than SSRs with the CA repeat motif, while Thuillet et al. (2004) found that SSRs with the CA repeat motif exhibited significantly fewer alleles than GA SSRs in wheat. This might be due to differences in nucleotide distribution among genomes, but further investigations with a larger number of markers are required to confirm whether polymorphism depends on repeat motif type.

4.1.3 Performance of microsatellites among A-, D- and AD-genome species

The SSRs that did not amplify distinct fragments from genomic DNA of the A-genome species but produced clear bands in the D- and AD-genome species have been placed on the D-subgenome of allotetraploid cotton (Lacape et al., 2003; Mei et al., 2004). Only 12 markers were D-genome specific, reflecting substantial divergence of the D-genome species from the D-subgenome of allotetraploid cotton (Brubaker et al., 1999; Adams and Wendel 2004; Guo et al., 2006; Wu et al., 2007). Amplicon sizes (101–300 bp) of a number of SSRs differed between the AD-genome species and their diploid ancestral species (A and D). Such differences have been reported previously (Syed et al., 2001) and are due to the type/number of repeat motifs and the flanking sequences (Buteler et al., 1999). The size distribution of fragments amplified from species containing AD- and D-genomes was dispersed, while that of alleles amplified in A-genome species was concentrated (Fig. 3.1). Our results contradict the findings of Liu et al. (2006), who found a dispersed distribution of fragment sizes in G. arboreum L. and a relatively concentrated distribution in G. hirsutum L.

4.1.4 Cross species amplification and genome specificity

Genome/species-specific SSRs can be useful in monitoring the introgression of specific genomic portions of the donor species into the adapted species (Guo et al., 2007), and can be instrumental in assigning unknown plants to species and in distinguishing cotton species. In this study, 15 genome- and/or species-specific SSRs were observed. The transferability of SSRs derived from tetraploids to diploids indicates that all genomes evolved from one ancestor. We found a higher transferability rate in the A-genome than in the D-genome, indicating that the D-subgenome of the tetraploids diverged substantially from its progenitor D-genome during polyploidization (Liu et al., 2006). Second, the higher transferability rate in A-genome species may be due to the larger size of the A-genome (Edwards et al., 1974; Reinisch et al., 1994). In this study, gSSRs (BNLs) showed low transferability (37.28%) and a high polymorphism rate across the species, whereas EST-SSRs showed high transferability (54.72%) and a low polymorphism rate, primarily because of their conserved nature (Cho et al., 2000; Thiel et al., 2003). The EST-SSRs derived from fiber tissues showed a high level of transferability to the diploid genomes, confirming the presence of fiber-related genes in all the cotton genomes. Transferability of SSRs has also been reported in other crop species (Kuleung et al., 2004; Saha et al., 2004).

4.1.5 Genetic relationship of tetraploid species with their wild relatives

Among A-genome species, G. herbaceum (A1) was found relatively closer to G. hirsutum (AD1), while G. arboreum (A2) was closer to G. barbadense. Among D-genome species, G. raimondii (D5) was more similar to G. hirsutum (AD1) (0.667). In another study, G. herbaceum was found genetically closer to G. hirsutum (0.69) than G. arboreum was (0.52). It has also been observed that G. raimondii (D5) and G. gossypioides (D6) are genetically more similar to G. hirsutum (AD1) and G. barbadense (AD2) (Kebede et al., 2007). In a few cytogenetic studies, G. herbaceum was shown to be more similar than G. arboreum to the ancestral species of tetraploid cotton (Endrizzi et al., 1985; Wendel, 1989; Percival and Kohel, 1990).
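
Similarity estimates such as those quoted above are typically computed from presence/absence scores of amplified bands. The sketch below shows one common choice, the Jaccard coefficient (shared bands divided by bands present in either profile). The choice of coefficient, the function name `jaccard_similarity` and the band scores are assumptions made for illustration and are not taken from this study.

```python
def jaccard_similarity(bands_a, bands_b):
    """Jaccard similarity between two 0/1 band profiles of equal length."""
    if len(bands_a) != len(bands_b):
        raise ValueError("profiles must score the same set of bands")
    pairs = list(zip(bands_a, bands_b))
    # Bands present in both profiles.
    shared = sum(1 for a, b in pairs if a == 1 and b == 1)
    # Bands present in at least one profile (joint absences are ignored).
    either = sum(1 for a, b in pairs if a == 1 or b == 1)
    return shared / either if either else 0.0


# Hypothetical presence (1) / absence (0) scores for six SSR bands.
species_x = [1, 1, 0, 1, 0, 1]
species_y = [1, 0, 0, 1, 1, 1]
print(round(jaccard_similarity(species_x, species_y), 3))
```

Because joint absences are ignored, two species that both fail to amplify a band are not counted as similar on that basis, which matters when cross-species amplification is incomplete.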

4.1.6 Genetic diversity and phylogenetic relationship in the genus Gossypium

For evolutionary studies of cotton species, the basic requirement is to work out their phylogeny (Khan et al., 2000; Abdalla et al., 2001; Paterson et al., 2002) and to estimate the extent of genetic diversity (Khan et al., 2000). Various methods based on morphology, meiotic behaviour, and genetic and molecular techniques have been deployed for genetic diversity and phylogeny studies of cotton species. In this study, two types of SSR markers (EST-SSRs and genomic SSRs) were used to study the phylogenetic relationships among the cotton species. It is clear from this study that the diploid species are genetically more diverse from each other than the tetraploid germplasm. In this study, a low to moderate level of genetic similarity among Gossypium species was estimated. This result is consistent with the findings of Abdalla et al. (2001), who calculated relatively high estimates of genetic diversity among Gossypium species using the AFLP marker system.

Among the tetraploid germplasm, the lowest genetic similarity of G. hirsutum ‘‘punctatum’’ (an ancient cultivated race; Brubaker and Wendel, 1994; Lacape et al., 2007) with the other tetraploid germplasm reveals the existence of unique/useful alleles in this race. Such races could be a promising source for broadening genetic diversity within cultivated cotton. G. hirsutum ‘‘latifolium’’, which is genetically closer to G. hirsutum (Lacape et al., 2007), would present the fewest obstacles (Lubbers and Chee, 2009) in attempting crosses. Within the tetraploids, the high genetic similarity estimates between G. barbadense and G. darwinii, and between G. tomentosum and G. hirsutum, are consistent with earlier reports (Liu et al., 2000; Wendel and Cronn 2003; Lacape et al., 2007). However, variation in restriction sites in cpDNA and rDNA and in allozymes (14 enzyme systems) demonstrated greater distinctiveness of G. tomentosum from G. hirsutum (0.82) than from G. barbadense (0.65) (Dejoode and Wendel, 1992), which contradicts our findings.

In cluster analysis, the A-genome species formed a sister cluster with the D-genome species using the data of both EST-SSRs and gSSRs (Fig. 3.4 and 3.5, respectively), while they formed a sister cluster with the AD-genome species in a major cluster ‘A’ using the combined data (Fig. 3.3). Similar results have been found using RAPD markers (Khan et al., 2000), cpDNA, ITS and combined data set based phylogenies (Seelanan et al., 1997) and a cpDNA restriction site based phylogeny (Wendel et al., 1992). The sister clustering of the A-genome species with the AD-genome species strengthens the concept that the A-genome is the cytoplasmic donor of the AD-genome (Wendel, 1989). It is likely that the A-genome species have larger chromosomes and more repetitive sequences in their genomes than the D-genome species (Geever, 1980), thus creating opportunities to amplify similar (homologous) sequences among the genomes. Also, the D-genome evolves faster than the A-genome (Adams and Wendel, 2004). All B-genome species showed a close relationship with each other using the combined data set, EST-SSRs and gSSRs (Fig. 3.3, 3.4 and 3.5, respectively), which is consistent with previous reports (Wendel and Albert, 1992), and the ‘E’ genome species were grouped with the ‘B’ genome species. The ‘G’ genome species also showed closeness with each other using the three data sets. All ‘G’ genome species were grouped with the ‘C’ genome species using EST-SSRs and the combined data set, whereas with gSSRs the ‘C’ genome species were grouped with the ‘E’ genome species and the ‘G’ genome species grouped separately and appeared to be distantly related to all other genomic groups, indicating that all ‘G’ genome species share a common ancestor (Fryxell et al., 1992; Liu et al., 2001).

The eight Gossypium genomes comprise four major lineages spread across three continents. The C-, G- and K-genome species are found in Australia and the D-genome species in America, whereas in Africa/Arabia one lineage comprises the A-, B- and F-genome species and a second lineage comprises the E-genome species (Fryxell, 1979; Fryxell et al., 1992). In this study, the clustering of the ‘B’ and ‘E’ genomes, and of the ‘C’ and ‘G’ genomes, in one cluster using EST-SSRs and the combined data is probably due to evolution from a common ancestor. A large body of genomic data is consistent with the aforementioned taxonomy of the cotton species (Wu et al., 2007).

With all three data sets, all D-genome species were grouped in one cluster except G. raimondii (D5), which grouped with the A-genome species using the EST-SSR data. This grouping was not surprising, as several studies have found G. raimondii isolated from the rest of the D-genome species (Phillips, 1966; Parks et al., 1975; Fryxell, 1979). It is geographically disjunct from the rest of the subgenus. Within the D-genome cluster, (G. aridum D4 and G. harknessii D2-2), (G. gossypioides D6 and G. lobatum D7) and (G. klotzschianum D2 and G. davidsonii D3) showed closeness with all three data sets. The position of the remaining three D-genome species (G. thurberi D1, G. trilobum D8 and G. laxum D9) remained unresolved, as these species were grouped in different clusters with both the independent and the combined EST-SSR and gSSR data, while a few studies have reported a close genetic relationship between G. thurberi D1 and G. trilobum D8 (Guo et al., 2007; Wu et al., 2007).

The close relationship of G. klotzschianum D2 and G. davidsonii D3 is congruent with the earlier reports (Wendel and Albert, 1992; Wu et al., 2007; Guo et al., 2008).

The position of G. longicalyx L., the only F-genome species, remained unresolved in the present study. With the gSSR data, this species showed a close association with the E-genome species (G. stocksii L. E1), while with the EST-SSR data set it showed kinship to the uncommon tetraploid species ‘G. lanceolatum L. (AD)’. In earlier reports, high genetic similarity of G. longicalyx L. with the A-genome or its allotetraploid derivatives was reported (Wendel, 1989). However, the majority of earlier reports found a relatively isolated position for G. longicalyx L. (Saunders, 1961; Phillips and Strickland, 1966; Wu et al., 2007).

The phylogenetic tree obtained with the EST-SSR data closely resembled the tree obtained from the combined data set. On the basis of these findings, it is suggested that a relatively limited number of EST-SSRs, rather than a large number of markers, can be instrumental in resolving phylogenies at the species level. Moreover, this study confirms the usefulness of a limited number of EST-SSRs for fingerprinting geographically isolated species (Gutierrez et al., 2002; Rahman et al., 2002b; Shaheen, 2005) and for determining the phylogeny of the species.

4.2 Identification of differentially expressed genes involved in fiber development

The genetics of cotton fiber is very complex. Multiple studies have shown that a large number of genes are involved in fiber initiation, elongation and maturation. Much work has been done to identify genes involved in fiber initiation (Apart et al., 2004; Lee et al., 2007; Xu et al., 2007); however, the molecular basis of cotton fiber elongation is poorly understood (Chen et al., 2012; Gilbert et al., 2013). Therefore, understanding the genetics of fiber elongation would help to find genes controlling fiber length. Fiberless mutants exhibit cessation of fiber growth at an early stage of fiber development, which results in very short fibers, and thus offer an ideal system for studying the molecular events involved in cotton fiber elongation. In this study, Gossypium genes involved in fiber elongation were studied by differential display of RNA from 3 dpa ovules of normal and mutant plants. The differential display method reveals all aspects of regulation (up and down): the absence/presence of bands suggests qualitative differences, and signals of varying intensity suggest quantitative differences (Voelckel and Baldwin, 2003).

In this study, 15 arbitrary primers in combination with 11 anchored primers were surveyed, and 23 differentially transcribed genes were amplified. Of these 23, ten were rejected because of poor quality upon re-amplification (Lievens et al., 2001). The remaining 13 fragments were sequenced, and six fragments showed homology with known genes. The sizes of these six fragments ranged from 600 to 800 bp. Ji et al. (2003) also found fragments of ~400 bp to ~1 kb while studying differentially expressed genes at early stages of cotton fiber development using subtractive PCR and cDNA arrays.

In this study, a transcript of 760 bp designated A6B1 showed homology with the β-tubulin gene. β-tubulins are a major component of microtubules (Silflow et al., 1987; Goddard et al., 1994; Nogales et al., 1998), which are reported to be involved in fiber elongation (He et al., 2008; Li et al., 2008; Yang et al., 2008). Our findings are consistent with those of Shi et al. (2006), who demonstrated that tubulin genes are expressed at a higher level in normal (fiber-producing) ovules than in mutant (fiberless) ovules. In a number of other studies, β-tubulin genes were found to be preferentially expressed during early cotton fiber development (Ji et al., 2002; Li et al., 2002b, 2005; Ruan et al., 2003; Feng et al., 2004; Wang et al., 2004).

In this study, the homology of transcript A6B2 with zinc finger proteins is congruent with the findings of Ji et al. (2003), who found increased expression of zinc finger proteins during cotton fiber elongation. It contrasts, however, with the findings of Wang et al. (2010) and Chen et al. (2012), who reported that zinc finger proteins are transcriptional regulators involved in fiber initiation.

The zinc-finger domain can exist in reduced and oxidized states. Under reduced conditions it coordinates two zinc ions and can interact with lipid transfer protein (LTP), which was reported to be differentially expressed in developing cotton fibers (Ma et al., 1995) and specifically expressed in tobacco trichomes (Liu et al., 2000), while under oxidized conditions transcription factors of the zinc finger family regulate cellulose synthesis (Kurek et al., 2002). Cellulose is the major component (95%) of fiber.

The fragment designated A1B3 showed homology with P-type ATPase. Fiber elongation is controlled by the interaction between cell wall extensibility and cell turgor pressure (Smart et al., 1998), and molecular transport and cell-to-cell communication are also vital for fiber elongation (Lee et al., 2007). ATPase provides energy for maintaining the cell turgor pressure required during fiber expansion (Joshi et al., 1998; Ruan et al., 2004), and genes encoding K+ transporters are highly expressed in elongating fiber cells (Ruan et al., 2004).

Genes encoding plasma membrane proton-translocating ATPase (one of the P-type ATPases) were found to be up-regulated during peak fiber expansion, with this up-regulation ending at the onset of the next phase of fiber development (Smart et al., 1998; Li et al., 2002; Lee et al., 2007; Chen et al., 2012).

A6B1_F showed homology with glycosyltransferase. Our findings are congruent with those of Qin et al. (2013), who isolated a gene encoding a putative β-1,3-galactosyltransferase (GalT) from cotton. This gene was named GhGalT1; its expression was weak to moderate in 0–1 dpa ovules but peaked in elongating fibers.

Glycosyltransferases are involved in the side chain synthesis of arabinogalactan-proteins (AGPs) (Qin et al., 2013). AGPs have been reported to play an important role in cotton fiber development (Huang et al., 2008; Wu et al., 2009; Yuan et al., 2011; Gong et al., 2012). Many other studies have also demonstrated the critical participation of the side chains of AGPs in the initiation and elongation of cotton fibers (Bowling et al., 2011; Huang et al., 2013). Glycosyltransferase may therefore be a good candidate for genetic manipulation to improve fiber quality.

The homology of two transcripts, A3B2 and A5B2, with protein phosphatase is similar to the finding of Wang et al. (2010), who reported the involvement of protein phosphatase in fiber cell development when analysing differentially expressed genes between wild type and mutant libraries. Our results contradict the findings of Mei et al. (2004), who found that protein phosphatases have higher expression levels at 0 dpa, which then decrease during the early stages of fiber development.

All the aforementioned transcripts were isolated from normal (fiber-producing) ovules. Li et al. (2002) also found that 60 genes were expressed more abundantly in normal ovules than in the fl (fuzzless/lintless) mutant. In another study, thousands of transcripts were found in fiber-bearing ovules using a massively parallel deep sequencing procedure (Wang et al., 2010).

None of the transcripts isolated from mutant (fiberless) ovules showed homology with putative plant proteins/enzymes, indicating that the fiberless mutants suffer from the loss of functional genes responsible for fiber development (Wang et al., 2010).

These fiber-related partial cDNA sequences/transcripts can be used to identify full-length genes, which may in turn be used for improving quality traits in cotton.

4.3 Isolation of fiber related sequences from different developmental stages of G. arboreum

4.3.1 Isolation of fiber related sequences

Although differential display is one of the tools used to identify differentially expressed genes, the most common genes involved in cotton fiber development were not recovered in our previous experiment, entitled “Identification of differentially expressed genes from normal (fiber producing) and mutant (fiberless) ovules of Gossypium hirsutum L. during fiber elongation stage”. Therefore, in this experiment, RT-PCR with gene-specific primers was performed. Total RNA isolated from ovules at 0 dpa, 5 dpa and 10 dpa was reverse transcribed to cDNA. This cDNA was amplified using gene-specific primers designed from the conserved regions of ESTs from G. hirsutum L. showing homology with genes encoding proline-rich cell wall protein, ubiquitin extension protein, translation elongation factor 1-gamma and Myb-like protein.

In this study, reverse-transcribed total RNA (0 dpa) of G. arboreum L. amplified with the gene-specific primer “Y-23” produced an amplicon of 248 bp whose nucleotide sequence showed homology with the MYB 104 gene of G. hirsutum L. A previous study found that genes encoding putative MYB transcription factors are enriched during early cotton fiber development (-3 dpa to 0 dpa), that their expression is reduced at 5 dpa, and that they are expressed at a high level in normal ovules and at a reduced level in mutant ovules (Pu et al., 2008). In cotton, a number of MYB transcription factors, such as GhMYB1-6 (Loguerico et al., 1999; Taliercio and Boykin, 2007), GaMYB2 (Wang et al., 2004b) and GhMYB25 (Wu et al., 2006; Wu et al., 2007), have been identified.

“MYB 109”, which is structurally related to AtGL1 and AtWER (Suo et al., 2003; Taliercio and Boykin, 2007), has been reported to be involved in cotton fiber initiation (Pu et al., 2008). In this study, the identification of another MYB transcription factor, “MYB 104”, from 0 dpa ovules suggests its involvement in cotton fiber initiation. Transcription factors play different roles in cell development (Hori et al., 2003; Perez Rodriguez et al., 2005), differentiation (Igarashi et al., 2007) and signal transduction (Middleton et al., 2007).

In the present study, a partial cDNA sequence of ubiquitin extension protein was obtained from 5 dpa ovules of G. arboreum L. through RT-PCR using primers specific to ubiquitin extension protein, which suggests its involvement in cotton fiber elongation. In a number of studies, the ubiquitin gene has been used as a positive control (Ji et al., 2003), and gene-specific primers designed from this gene have been used for isolating genes of fiber-specific regions from cotton (He et al., 2008).

Ubiquitin extension protein was found to be differentially expressed between wild type and mutant libraries prepared from ovules of G. hirsutum L. at two developmental stages (initiation and elongation) (Wang et al., 2010). As dividing cells need more energy, high ribosome activity is required. Extension proteins are constituents of ribosomes and accumulate as free proteins, suggesting that their role in ribosome function is highly conserved and that their association with ubiquitin is transitory. More mRNA transcripts are present in dividing cells, suggesting that the gene is highly expressed in these cells.

In this study, reverse-transcribed total RNA from 10 dpa ovules of G. arboreum L. amplified with the gene-specific primers “SNP-10” and “SNP-24” gave amplicons of 178 bp and 435 bp, respectively, whose nucleotide sequences showed homology with the proline-rich protein (PRP) of G. herbaceum L. and the translation elongation factor 1-gamma gene of Vitis vinifera, respectively.

Our PRP results are congruent with the findings of Ji et al. (2003), who found an increased level of proline-rich proteins at 10 dpa; a number of other studies have also found that PRPs are involved in cotton fiber development (Xu et al., 2007; Ghazi et al., 2009; Hao et al., 2012; Xu et al., 2013).

Translation elongation factor 1-gamma is necessary for the assembly of microtubules, which are involved in cell wall synthesis during cotton fiber expansion (Wu et al., 2007).

4.3.2 Screening of BAC libraries of different genomes (AD, A, D and K genomes).

A total of four BAC libraries were screened using two probes to identify BACs containing genes involved in fiber development. Both probes were designed from the sequence of translation elongation factor 1-gamma of G. arboreum L. These probes showed the highest number of positive hits in G. arboreum L., followed by G. hirsutum L. The highest number of hits was expected in G. arboreum L. because the probes were designed from the G. arboreum L. derived sequence of translation elongation factor 1-gamma. The second highest number of hits, in G. hirsutum L., showed that G. arboreum is a close living relative of the ancestor of the allotetraploid species. Positive hits in G. kirkii L. during library screening showed the relatedness of these species (Hawkins et al., 2009).

G. kirkii L. has already been used as an outgroup sister species in phylogenetic analyses of cotton species (Cronn et al., 1999; Hawkins et al., 2006). The results showed that the gene has multiple copies in the genomes, as was observed in A. thaliana (Guo et al., 2002). During the course of evolution, the gene has accumulated genome-specific mutations. Mutations specific to G. arboreum and G. kirkii L. were detected at eight positions, while G. raimondii L. was found comparatively closer to G. kirkii L. This study indicates the importance of mutations in phylogenetic studies (Small, 2005). G. kirkii L., which has been used as an outgroup species, showed mutations with respect to the cotton genomes. G. raimondii L., which is the most distant from the other two species, was found to be more diverse as far as this gene is concerned. Gene copies within G. arboreum L. had more variation than those within G. kirkii L. In a previous study, gene sequences from A, D, AD and G. kirkii L. were compared to examine the rate of evolution in the genomes and the effect of polyploidy on them (Cronn et al., 1999); except for a single instance involving a putative AdhC pseudogene, all relative rate tests showed that orthologous loci in A- and D-genome cottons have evolved at statistically equivalent rates. Our results are comparable to these results.

4.3.3 Analysis of BAC end sequences (BES) and contig assembly

In total, 23 BES were obtained from the four cotton BAC libraries. Of these, 14 sequences did not show homology with reported sequences, while the remaining nine showed homology with elongation factor 1-gamma. All these sequences were aligned and their ends were joined, giving a sequence of 1524 bp that showed homology with elongation factor 1-gamma of Arabidopsis thaliana; this gene is involved in cotton fiber expansion (Wu et al., 2007).

4.4 BAC derived new SSRs for use in cotton (Gossypium spp.) improvement

4.4.1 Identification of genomic SSR markers in G. raimondii

DNA markers are an ideal system for studying polymorphisms. DNA markers are of two types: 1) sequence-specific and 2) random markers. Random markers are unreliable (Rahman et al., 2002b; Guo et al., 2003). Simple sequence repeats (SSRs), also called DNA microsatellites, are the markers of choice (Saghai Maroof et al., 1994) because of their hypervariability, practicability and validity (Wang et al., 2006).

Many thousands of SSRs have been derived from different genomic libraries (Hoffman et al., 2007; Lacape et al., 2007; Xiao et al., 2009), the expressed portion of the genome (ESTs) (Qureshi et al., 2004; Park et al., 2005; Han et al., 2006) or bacterial artificial chromosome (BAC) derived sequences (Frelichowski et al., 2006; Guo et al., 2007b), but a large number of DNA markers are still needed for developing a dense genetic map of cotton and for associating markers with multiple QTLs.

In this study, 1294 BAC-gSSRs were designed from the 2397 BAC end genomic sequences randomly selected from approximately 92160 BAC clone sequences of G. raimondii L. Cotton SSRs derived from BAC-end sequences serve not only as genetic markers but also offer the opportunity to integrate the genetic and physical maps efficiently (Zhang et al., 2008). For comparison, Frelichowski et al. (2006) designed 1316 PCR primer pairs from 2,603 BAC end genomic sequences of G. hirsutum cv. Acala “Maxxa”.

This new set of BAC-gSSRs contained diverse types of repeat motifs (di-, tri-, tetra-, penta- and hexanucleotide). Dinucleotide repeats were the most abundant (56.10%), followed by tri- (30.46%), tetra- (9.05%), hexa- (3.06%) and pentanucleotide (1.30%) repeats. Generally, the occurrence, relative abundance and relative density of SSRs decrease as the repeat unit length increases (Lee et al., 2004).

In this study, (AT/TA) and (AAG, TTC and AAT) were the most common di- and trinucleotide repeat motifs, respectively. Wang et al. (2006) and Wangzehen et al. (2007) also reported that the most frequent dinucleotide motifs were TA/AT and the most frequent trinucleotide motifs were AAG and AAT in G. raimondii L. AAG was also the most common repeat in G. arboreum L. (Saha et al., 2003) and in G. hirsutum L. (James et al., 2006). Gupta et al. (1996) reported (AAG)n and (AAT)n as the most abundant trinucleotide repeat motifs in plants, with some species-specific variation.

SSRs derived from the coding region of the genome (expressed sequence tags) are called EST-SSRs (eSSRs) (Qureshi et al., 2004). EST-SSRs are more powerful for studying the selection pressures imposed on elite germplasm (Wang et al., 2007). The trimeric type of SSR motif was more abundant in G. hirsutum L. derived eSSRs and in G. raimondii L. derived BAC-eSSRs (Wang et al., 2006; Wangzehen et al., 2007), whereas in this study dinucleotide motifs were more prevalent in G. raimondii L. derived BAC-gSSRs. This suggests that genomic DNA does not select for trimeric SSRs, whereas non-trimeric SSRs are suppressed in coding regions to avoid the frameshift mutations that may occur when those microsatellites change in size by one unit (La Rota et al., 2005; Hong et al., 2007).

Motifs containing both ‘C’ and ‘G’ were rare in this study, whereas a higher frequency of CGC and CCG motifs was observed in wheat, rice and barley (La Rota et al., 2005); this may be a distinguishing feature between monocotyledonous and dicotyledonous plants (James et al., 2006).

A high level of polymorphism was observed with hexanucleotide repeats (42.4%), similar to the findings of Wangzehen et al. (2007), although they found no relationship between polymorphism and motif type. Legendre and Verstrepen (2008) also suggested that SSRs with more than six repeat units should be used.

The minimum number of repeat units for primer designing was 5, 4, 3, 3 and 3 for di-, tri-, tetra-, penta- and hexanucleotides, respectively. Polymorphism was higher for SSRs with a high number of repeats. Changes in repeat number are usually generated through an intramolecular mutation mechanism called DNA ‘slippage’. The number of tandem repeats showed a positive relationship with the level of polymorphism in tomato (Smulders et al., 1997) and maize (Vigouroux et al., 2002).
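The repeat-number thresholds stated above can be illustrated with a minimal Python sketch. It is not the SSR-mining tool used in this study: the function name find_ssrs and the example sequence are hypothetical, and only perfect (uninterrupted) repeats are considered.

```python
import re

# Minimum repeat counts per motif length, as stated in the text:
# di = 5, tri = 4, tetra = 3, penta = 3, hexa = 3.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # ([ACGT]{size}) captures one motif; \1{min_rep-1,} requires that the
        # same motif follows at least (min_rep - 1) more times.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. ATAT, AAA),
            # so that each SSR is reported once under its shortest motif.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


# Hypothetical test sequence: (AT)6 and (AAG)4 separated by short spacers.
example = "GGC" + "AT" * 6 + "CCT" + "AAG" * 4 + "TT"
ssrs = find_ssrs(example)
```

In this sketch a dinucleotide run of four units, such as ATATATAT, is not reported because it falls below the minimum of five repeats.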

4.4.2 Polymorphism detected with BAC-gSSRs

The newly identified set of 1294 BAC-gSSRs was surveyed on genomic DNA of four species (G. hirsutum L. cv. FH-1000, G. barbadense L. cv. PGMB-36, G. raimondii L. and G. arboreum L.), and 1043 (80.60%) primer pairs amplified successfully. Of these, 535 (51.29%) primer pairs were polymorphic between the diploid species (G. raimondii L. and G. arboreum L.), while 235 (22.53%) were polymorphic between the two cultivated tetraploid species (G. hirsutum L. cv. FH-1000 and G. barbadense L. cv. PGMB-36). A possible explanation for this low level of interspecific polymorphism is that several epidemics reduced the magnitude of genetic diversity (Rahman et al., 2005b), which may have caused a bottleneck in evolution (Iqbal et al., 2001). The level of genetic diversity in cotton germplasm is low (Rahman et al., 2002a, 2008).
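The percentages reported above follow directly from the primer counts. A short Python check is given here for transparency; the variable names are hypothetical, and the denominators (all designed primers for the amplification rate, successfully amplified primers for the polymorphism rates) are inferred from the reported values.

```python
# Counts taken from the text.
designed = 1294          # BAC-gSSR primer pairs designed
amplified = 1043         # primer pairs that amplified successfully
poly_diploid = 535       # polymorphic between G. raimondii and G. arboreum
poly_tetraploid = 235    # polymorphic between FH-1000 and PGMB-36


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


amplification_rate = pct(amplified, designed)     # relative to designed primers
diploid_rate = pct(poly_diploid, amplified)       # relative to amplified primers
tetraploid_rate = pct(poly_tetraploid, amplified)
```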

Polymorphism was scored as variation in the size of amplified alleles or as a null allele (allele absent). Null alleles have been reported in a few other studies (Dakin and Avis, 2004; Ma et al., 2008) and are attributed to divergence in the primer binding sites (Hoffman et al., 2007), for example through complete deletion of primer annealing sites that are otherwise considered highly conserved even across species (Rossetto et al., 2002).

About 313 (30%) of the successfully amplified BAC-gSSRs amplified two fragments in the tetraploid species and a single locus in the diploid species, which agrees with the results of previous studies (Gutierrez et al., 2002; Kalivas et al., 2011). The number of alleles detected is associated with the number of repeats present at a particular SSR locus (Lacape et al., 2007), the ploidy level of the germplasm (Udall and Wendel, 2006) and the number of genotypes surveyed.

Successful amplification of genomic DNA of G. hirsutum L. and G. barbadense L. using BAC-gSSRs of G. raimondii L. corresponds well with the findings of Wangzehen et al. (2007) and shows that there is a high degree of sequence conservation between G. raimondii L. and the tetraploid species of Gossypium L. (Kebede et al., 2006). Moreover, amplification of the tetraploid species with BAC-gSSRs derived from G. raimondii L. was expected because G. raimondii L. is one of the donor genomes of allopolyploid cotton (Beasley, 1940; Endrizzi et al., 1985; Percival and Kohel, 1990; Wendel et al., 1992). The amplification of these SSRs in G. arboreum L. suggests that these primers are transferable to other cotton genomes, and it also supports the earlier findings that all cotton genomes evolved from a single common ancestor (Wendel et al., 1992).

In multiple investigations, SSR primer pairs that could not amplify discrete fragments from the genomic DNA of A-genome species but amplified distinct fragments in AD- and D-genome species were placed on the D sub-genome of allotetraploid cotton (Lacape et al., 2003; Mei et al., 2004). In this study, 391 (30%) of the BAC-gSSRs were found to be D-genome specific, as they amplified clear alleles in the D-genome species (G. raimondii L.) and the AD-genome species (G. hirsutum L. and G. barbadense L.). This indicates that the loci/alleles amplified by these BAC-gSSRs are specific to the D genome and can be mapped on it. These genome-specific SSRs can be helpful in the introgression of specific genomic regions of a donor species into an adapted species (Guo et al., 2007).

To assess the potential utility of the newly designed BAC-gSSRs for germplasm evaluation, gene tagging (Peng and Lapitan, 2005) and genetic mapping (Lacape et al., 2009, 2010; Paterson et al., 2009, 2012; Yu et al., 2011, 2012; Wang et al., 2012; Shaheen et al., 2013), it is important to calculate polymorphism information content (PIC) values. The average PIC value was 0.39, ranging from 0.12 to 0.85. Discrepancies in PIC values have been reported in multiple studies (Liu et al., 2000a; Kebede et al., 2006; Kalivas et al., 2011) and are attributed to the kind of germplasm explored, the domestication bottleneck (Thuillet et al., 2004; Vigouroux et al., 2005) and the type of markers surveyed (Liu et al., 2000a; Gutierrez et al., 2002). SSRs have great potential for identifying allelic variation, selecting appropriate parental lines for molecular mapping and assessing genetic diversity (Anderson et al., 1993).
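As an illustration of how a PIC value is obtained from allele frequencies, the following Python sketch assumes the definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². Whether exactly this form or the simpler 1 - Σ p_i² was applied to the present data is an assumption here, and the function name pic is hypothetical.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphism information content from allele counts or frequencies.

    Assumes the Botstein et al. (1980) definition; inputs are normalised
    so that either raw counts or frequencies may be supplied.
    """
    total = float(sum(allele_counts))
    p = [c / total for c in allele_counts]
    # Sum of squared allele frequencies (expected homozygosity).
    homozygosity = sum(x * x for x in p)
    # Correction term for matings that are uninformative despite heterozygosity.
    correction = sum(2.0 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - homozygosity - correction


# Two equally frequent alleles: 1 - 0.5 - 0.125 = 0.375.
two_allele_pic = pic([0.5, 0.5])
```

Under this definition a locus with two equally frequent alleles cannot exceed 0.375, so PIC values approaching the upper end of the reported range imply loci with several alleles.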

The closer genetic relatedness of G. arboreum L. to G. barbadense L., and the similar level of genetic similarity of G. raimondii L. to G. hirsutum L. and G. barbadense L., found in this study was expected because G. arboreum L. and G. raimondii L. are the donor genomes of these tetraploid species (Beasley, 1940; Endrizzi et al., 1985; Percival and Kohel, 1990; Wendel et al., 1992). Abdalla et al. (2001) also found that G. arboreum L. was genetically closer to G. barbadense L. (0.43) than to G. hirsutum L. (0.41), while the same genetic similarity of G. raimondii L. with both tetraploid species was reported. Our results indicate the reliability of these newly designed BAC-gSSRs for use in genetic diversity and genetic mapping experiments.

4.5 Genetic mapping

Molecular linkage maps based on DNA markers provide essential tools for plant genetic research, facilitating quantitative trait locus (QTL) mapping, marker-assisted selection (MAS) and map-based cloning. A genetic linkage map is a conceptual model of the linear arrangement of a group of genetic markers. It is constructed by identifying parental polymorphism, scoring the segregation of markers in a test population to estimate recombination fractions for linkage grouping, and finally determining the order of the genetic markers within each linkage group (Kearsey and Pooni, 1996). Several genetic maps based on different DNA markers have been constructed for cotton. In the present study, two types of SSRs (EST-SSRs and gSSRs) were used to identify polymorphism between the parental genotypes (G. hirsutum L. cv. FH-1000 and G. barbadense L. cv. PGMB-36) and subsequently to profile the F2 individuals.
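The step from an estimated recombination fraction to a map distance in centimorgans can be sketched as follows. Both the Haldane and the Kosambi mapping functions are shown because the text does not state which one was applied in this study; the function names are hypothetical.

```python
import math


def _check(r):
    # A recombination fraction is only meaningful in the range [0, 0.5).
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")


def haldane_cM(r):
    """Map distance (cM) under Haldane's function (no crossover interference)."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cM(r):
    """Map distance (cM) under Kosambi's function (allows interference)."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))
```

For small recombination fractions both functions give distances close to 100 × r, and they diverge as r approaches 0.5.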

4.5.1 Parental polymorphism

SSR markers have been utilized successfully for genetic map construction (Zhang et al., 2002a; Nguyen et al., 2004; Song et al., 2005b; Han et al., 2006; Abdurakhmonov et al., 2007; He et al., 2007; Shen et al., 2007; Zhang et al., 2008; Gadaleta et al., 2009; Lacape et al., 2010; Saeed et al., 2011; Yu et al., 2012; Gailing et al., 2013; Shaheen et al., 2013). In this study, most of the SSRs amplified two fragments from the genomic DNA of the tetraploid species. Such results were expected because the fusion of the A and D genomes and the doubling of chromosomes during polyploidization (Buteler et al., 1999) may lead to amplification of more than one locus in allopolyploids such as G. hirsutum L. (Gutierrez et al., 2002; Candida et al., 2006; Kalivas et al., 2011).

SSRs are co-dominant markers that reveal polymorphism between the genotypes/accessions under study on the basis of mutations (indels) in the repeat motifs. In this study, about 84.93% of the 74 polymorphic SSRs showed co-dominant behaviour, while the dominant polymorphism shown by 16.07% of the SSRs may be the result of complete deletion or base substitution in primer annealing sites, which are considered highly conserved across species (Rossetto et al., 2002). Null alleles may cause a locus to be scored as dominant. A null allele is an allele at a microsatellite locus that does not amplify during PCR and is therefore not detected when individuals are genotyped (Dakin and Avis, 2004; Ma et al., 2008).

The low rate of polymorphism (22.53%) observed in this study is comparable with previously published research. Lacape et al. (2003) observed 22% polymorphism between G. hirsutum L. and G. barbadense L. using AFLPs, and Yu et al. (2012) found 25% polymorphic gSSRs in the cross G. hirsutum L. (TM-1) x G. barbadense L. (3-79); both values are close to that observed in the present study.

This low level of polymorphism may arise because all cultivated tetraploids have a narrow genetic base in their respective gene pools (Liu et al., 2005; Rahman et al., 2008, 2011; Shaheen et al., 2010), as only a few genotypes have been used for the development of new cultivars (Rahman et al., 2002b, 2005a). Different epidemics may also be one of the reasons for the lack of genetic diversity (Rahman et al., 2005b).

4.5.2 Linkage analysis and map construction

Inter-specific populations (G. hirsutum L. x G. barbadense L.) have been used for constructing genetic maps in cotton (Zhang et al., 2002a; Nguyen et al., 2004; Song et al., 2005b; Frelichowski et al., 2006; Han et al., 2006; He et al., 2007; Zhang et al., 2008; Paterson et al., 2009; Yu et al., 2011). These highly saturated maps provide wider genome coverage (Rong et al., 2004; He et al., 2007). In tetraploid cotton species the level of polymorphism is very low because of selection pressure (Rahman et al., 2002, 2005), but this low level of polymorphism can be detected by using the large number of markers available in cotton databases.

Because of the low genetic polymorphism within cotton species, populations derived from inter-specific crosses have been used for developing genetic maps (Lacape et al., 2003; Rong et al., 2004; Guo et al., 2007; Yu et al., 2011). Most of these maps were developed using a single type of marker. In this study, 131 F2 plants derived from an inter-specific cross between G. hirsutum L. (FH-1000) and G. barbadense L. (PGMB-36) were used for linkage map construction, followed by QTL analysis, using the newly developed G. raimondii L. derived BAC-gSSRs.

A total of 74 SSRs were surveyed on genomic DNA of the 131 F2 plants of the mapping population and used for linkage map development to aid QTL analysis. The map consisted of eleven linkage groups and spanned 748.4 cM (~8% of the total G. hirsutum L. genome). In inter-specific maps of cultivated tetraploid species developed using SSRs and EST-SSRs, genome map coverage ranged from 1216 cM to 4418.9 cM (Desai et al., 2006; Yu et al., 2011), while an intra-specific map of a cultivated diploid species built with RAPDs and SSRs covered 346 cM (Shaheen et al., 2013). In another study, an intra-specific map of a cultivated tetraploid species constructed with SSRs covered 3814.3 cM (QingZhi et al., 2013). Variation in map distances may be due to differences in the size of mapping populations and in the numbers and sources of genetic markers (Yu et al., 2012).

In this study, small linkage groups of 2-11 markers were observed. Such small linkage groups are expected when a small number of markers is used to map a large genome such as that of cotton, which has a genome size of 2.5 Gb (Hendrix and Stewart, 2005). The best confidence interval proposed for QTL detection is 10 cM (Kearsey and Farquhar, 1998; Kearsey, 1998). In this study, the observed average distance between adjacent markers (10.3 cM) is in agreement with this interval.

In this study, 20 previously mapped EST-SSRs were used as reference markers to assign linkage groups to chromosomes, but all of these EST-SSRs were monomorphic between the parental species, so no linkage group could be assigned to a chromosome on the basis of reference markers. Instead, we tried to assign chromosome numbers to linkage groups using the genome sequence information of G. raimondii L., through a BLAST search of the sequences from which these SSRs were designed against the whole genome shotgun (WGS) sequence of G. raimondii L.

Deviation from Mendelian segregation ratios has been observed in populations developed from intra- and inter-specific crosses (Ulloa et al., 2002; Rong et al., 2004; Lin et al., 2005; Lacape et al., 2009; Yu et al., 2011). In this study, 16.4% of the G. raimondii L. derived polymorphic SSRs showed skewed segregation in the F2 mapping population. Similar distortion has also been observed in previous studies (Lin et al., 2005; Guo et al., 2007; Zhang et al., 2007; Yu et al., 2011). Yu et al. (2012) found 32.9% segregation distortion in a mapping population derived from an interspecific cross (G. hirsutum TM-1 x G. barbadense 3-79). A high frequency of segregation distortion (49–80%) has also been observed in populations derived from inter-specific crosses, which is largely due to species divergence (Paterson et al., 1988). A number of factors can cause segregation distortion in a mapping population, such as mutation in the SSR binding site, hitch-hiking, redundant heterozygotes, gametophyte selection, pollen lethality and genetic drift, or cytological attributes such as pollen tube competition, preferential fertilization and zygotic selection; all of these can cause clusters of segregation distortion (Shappley et al., 1998a; Sibov et al., 2003; Ma et al., 2008).
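Skewed segregation at a co-dominant marker in an F2 population is commonly detected with a chi-square goodness-of-fit test against the expected 1:2:1 ratio. The Python sketch below illustrates that test under the assumption that it resembles the procedure applied here; the genotype counts are hypothetical and are not data from this study, and the function names are likewise hypothetical.

```python
# Standard chi-square critical value for 2 degrees of freedom at P = 0.05.
CRITICAL_DF2_P05 = 5.991


def chi_square_f2(n_a, n_h, n_b):
    """Chi-square statistic for a co-dominant F2 marker against 1:2:1.

    n_a, n_h, n_b: counts of parent-A homozygotes, heterozygotes and
    parent-B homozygotes.
    """
    n = n_a + n_h + n_b
    observed = (n_a, n_h, n_b)
    expected = (n / 4.0, n / 2.0, n / 4.0)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(n_a, n_h, n_b):
    """True if the marker deviates significantly from 1:2:1 at P = 0.05."""
    return chi_square_f2(n_a, n_h, n_b) > CRITICAL_DF2_P05


# Hypothetical counts for two markers scored on 131 F2 plants.
balanced = is_distorted(33, 65, 33)   # close to 1:2:1
skewed = is_distorted(50, 61, 20)     # excess of one parental homozygote
```

For dominant markers scored as presence or absence, the same approach would apply with an expected 3:1 ratio and one degree of freedom.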

4.5.3 QTL detection

QTL mapping is an effective way to determine the genetic basis of quantitative traits (Shaheen et al., 2013). F2 segregating populations have been reported to be a valuable source for genetic map construction and QTL analysis (Lacape et al., 2003; Rong et al., 2004; Guo et al., 2007; Yu et al., 2012) because all possible combinations of parental alleles occur in them (Lander et al., 1987). Therefore, in the present study an F2 population derived from an interspecific cross (G. hirsutum cv. FH-1000 x G. barbadense cv. PGMB-36) was used for linkage map construction and QTL analysis.

A threshold value of 2.69 was used for QTL detection. Different LOD thresholds have been suggested for various mapping population types [1.9 (BC), 2.7 (F2), 2.1 (RI) and 3.2 (FS)] (Ooijen, 1999). In the present study, two QTLs for fiber traits were identified on different regions of chromosome 18: one for staple length (QStL2I), accounting for 17.20% of the total phenotypic variation, and one for uniformity index (QUniI2I), accounting for 12.69% of the total phenotypic variation. In addition, one QTL for GOT (QGOT6I), accounting for 11.69% of the total phenotypic variation, was found on chromosome 21. In earlier studies, QTLs affecting different fiber-related traits were detected within the same chromosome regions, suggesting that the genes controlling fiber traits may be linked or that these effects result from pleiotropy (QingZhi et al., 2013). All of these fiber-related QTLs have positive additive effects, showing that alleles from PGMB-36 increase fiber length and also have a positive impact on GOT and uniformity index.
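The relationship between a LOD threshold and the proportion of phenotypic variance explained can be sketched in Python under a simple single-marker regression model, in which LOD = -(n/2) log10(1 - R²). This is an approximation offered for illustration: it assumes that the 2.69 threshold is a LOD score, that all 131 F2 plants were phenotyped, and that a regression model applies, whereas the QTL software actually used may compute LOD differently. The values produced are therefore not the LOD scores of this study.

```python
import math

N_PLANTS = 131        # F2 plants in the mapping population (from the text)
THRESHOLD = 2.69      # detection threshold (from the text), assumed to be a LOD


def lod_from_r2(r2, n):
    """Approximate LOD for a QTL explaining a fraction r2 of variance."""
    return -(n / 2.0) * math.log10(1.0 - r2)


def r2_from_lod(lod, n):
    """Inverse relation: fraction of variance implied by a given LOD."""
    return 1.0 - 10.0 ** (-2.0 * lod / n)


# Smallest variance fraction that would reach the threshold in this model.
min_r2 = r2_from_lod(THRESHOLD, N_PLANTS)

# Reported variance fractions for QStL2I, QUniI2I and QGOT6I.
reported = (0.1720, 0.1269, 0.1169)
all_above = all(lod_from_r2(r2, N_PLANTS) > THRESHOLD for r2 in reported)
```

Under these assumptions a QTL must explain roughly 9% of the phenotypic variance to reach the threshold, which is consistent with the three reported QTLs each explaining more than 11%.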

Quantitative trait loci (QTL) controlling agronomic traits have been reported by several authors in G. hirsutum L. (Ulloa and Meredith, 2000; He et al., 2005, 2007; Wang et al., 2007a; Gore et al., 2013). Plant phenology strongly interacts with the yield potential (Ludlow and Muchow, 1990).

4.5.4 Marker-assisted selection (MAS) for fiber quality traits

With the advent of high-tech spinning machines in the textile industry, high fiber strength is needed to avoid breakage of the yarn fibers (Patil and Singh, 1995). In general, G. barbadense L. has superior fiber quality but is lower yielding and less adapted to cotton growing regions than G. hirsutum L. Conventional breeding has proved less effective in transferring novel genetic variation for fiber traits from G. barbadense L. to modern upland cotton because of hybrid breakdown in the F2 and subsequent generations (Stephens, 1949; Stephens, 1950; Reinisch et al., 1994; Jiang et al., 2000; Zhang et al., 2009). Lint yield is negatively correlated with fiber quality (QingZhi et al., 2013). The only way to make conventional breeding effective is to utilize molecular markers for rapid identification and precise selection of genotypes (Tanksley, 1988). For example, in this study a QTL for fiber length (QStL2I) was detected in the F2 population derived from a cross between FH-1000 and PGMB-36; this QTL was associated with marker PR-278 and explained 17.20% of the phenotypic variance. In succeeding generations, this marker was used to select cotton plants. In one of the lines, we observed a significant increase in fiber length, i.e. 32.68 mm, which is 2.56 mm longer than the parent variety FH-1000 (G. hirsutum L.). Dong et al. (2009) screened three SSR markers linked with QTLs for fiber length and studied the effect of MAS and pyramiding breeding in three combinations. These results suggest that constructing a saturated linkage map for G. hirsutum and using the DNA markers associated with QTLs is an effective means of improving fiber quality through MAS and QTL pyramiding in upland cotton breeding programs.

In the t1 locus region on chromosome 6, Wan et al. (2007) detected QTLs affecting fiber length, fiber strength, fiber uniformity, and spiny bollworm resistance that increased the trait phenotypic values. Because all the QTLs were mapped within about 5 cM of the t1 locus, this locus could be considered as the candidate gene for the QTLs, which should be particularly useful in MAS manipulation of fiber yield and quality.

The results indicated that the genetic effect of the QTL was stable for over three generations. Thus, this can be used for improving the quality of cotton.

4.5.5 Identification of candidate genes involved in staple length

DNA markers associated with QTLs conferring staple length were further explored using the genome sequence information of G. raimondii L. (one of the progenitor species of the cultivated tetraploid cotton). The genomic regions flanking the QTLs (linked with the markers) were examined. Out of the 20 putative genes explored near the marker PR-536, a gene encoding a MYB-related protein was identified. This protein acts as a transcription factor (TF) (Hori et al., 2003). In multiple investigations, the role of TFs in cell development (Perez Rodriguez et al., 2005), differentiation (Igarashi et al., 2007) and signal transduction (Middleton et al., 2007) has been demonstrated. MYB-related TFs have been found abundantly during early cotton fiber development (-3 dpa to 0 dpa) (Pu et al., 2008). Thus this part of the genome has some role in fiber development, which may be investigated at length by conducting a series of QTL cloning experiments.

In the vicinity of the DNA marker PR-278, a gene encoding a protein phosphatase was identified. This protein acts as a regulator of phosphorylation at the post-translational level (Mei et al., 2006) and is involved in fiber cell development (Wang et al., 2010). Similarly, a gene encoding a fiber protein (involved in fiber development) was identified near the DNA marker PR-443.

4.6 Conclusions

The findings of the present studies suggest that genomic SSR markers are more informative for assessing genetic diversity among cotton species, whereas EST-SSRs are more informative for determining the changes that occurred as a result of selection during domestication. Only a few EST-SSR markers are sufficient for resolving the phylogenetic relationships of cotton species, instead of a large number of SSR markers. The number of repeats per locus is positively correlated with the number of alleles amplified, the allele size range and the polymorphism information content. Repeat motif type and the position of loci on the chromosome have no impact on the polymorphism rate. G. mustelinum L., G. tomentosum L., G. darwinii L., G. hirsutum “yucatanense” and G. herbaceum L. var. africanum are genetically diverse and can be utilized for broadening the genetic base of cultivated cotton species, a way of achieving sustainability in cotton production under a changing climate.

Some uniquely expressed fiber-related partial cDNA sequences/transcripts (phosphatase, zinc finger protein, glycosyltransferase and β-tubulin) were identified through DDRT-PCR, which hold promise for identification of full-length novel genes that can be further characterized and may eventually lead to the improvement of fiber quality/yield and help us in understanding the interactions between fiber related genes.

Differential display is a powerful tool for the identification of genes, but not all genes are recovered through this technique. In this study, partial cDNA sequences involved in cotton fiber development were identified from G. arboreum L. using gene-specific primers based on G. hirsutum L. Screening of cotton BAC libraries with probes designed from the translation elongation factor 1-gamma gene of G. arboreum L. showed the relatedness of cotton species, because positive hits were observed in G. arboreum L., G. hirsutum L. and G. kirkii L. In the phylogenetic study of these species, G. raimondii L. was found to be the most distantly related to the other two species as far as this gene is concerned.

Additional portable SSR primers are required for constructing a saturated genetic map, for genetic analysis and for gene tagging. SSRs can be derived from the genome sequence information of G. raimondii L. and G. arboreum L. Successful amplification of G. raimondii derived BAC-gSSRs in other species would also be helpful in initiating comparative mapping studies within the genus Gossypium.

For genetic mapping, a low level of polymorphism was observed between the parent species, i.e. G. hirsutum L. and G. barbadense L., which reflects the narrow genetic base of these species.

QTL analysis is considered an effective means of assessing the genetic control of quantitative traits. In this study, 3 QTLs affecting 3 morphological traits were mapped. As our map covered only 8% of the G. hirsutum L. genome, more QTLs may be present in parts of the genome that were not covered in this study. These unidentified genomic regions can be discovered by using a larger number of markers to obtain extensive genome coverage.

The discovery of QTLs is a step towards implementing MAS to incorporate a specific character. The information regarding QTLs discovered in cultivated tetraploid species, especially for productivity traits, is important.

References

Abdalla, A.M., Reddy, O.U.K., El-Zik, K.M. and Pepper, A.E., 2001. Genetic diversity and relationships of diploid and tetraploid cottons revealed using AFLP. Theoretical and Applied Genetics, 102: 222-229.

Adams, K.L. and Palmer, J.D., 2003. Evolution of mitochondrial gene content: Gene loss and transfer to the nucleus. Molecular Phylogenetics and Evolution, 29: 380-395.

Adams, K.L. and Wendel, J.F., 2004. Exploring the genomic mysteries of polyploidy in cotton. Biological Journal of the Linnean Society, 82: 573-581.

Adhikari, T.B., Gurung, S., Hansen, J.M., Jackson, E.W. and Bonman, J.M., 2012. Association mapping of quantitative trait loci in spring wheat landraces conferring resistance to bacterial leaf streak and spot blotch. The Plant Genome, 5: 1-16.

Akkaya, M.S., Bhagwat, A.A. and Cregan, P.B., 1992. Length polymorphisms of simple sequence repeat DNA in soybean. Genetics, 132: 1131-1139.

Allinne, C., Mariac, C., Vigouroux, Y., Bezancon, G., Couturon, E., Moussa, D., Tidjani, M., Pham, J.L. and Robert, T., 2008. Role of seed flow on the pattern and dynamics of pearl millet (Pennisetum glaucum [L.] R. Br.) genetic diversity assessed by AFLP markers: a study in south-western Niger. Genetica, 133: 167-178.

Anderson, J.A., Churchill, G.A., Autrique, J.E., Tanksley, S.D. and Sorrells, M.E., 1993. Optimizing parental selection for genetic linkage map. Genome, 36: 181-186.

Angaji, S.A., Septiningsih, F.M., Mackill, D.J. and Ismail, A.M., 2010. QTLs associated with tolerance of flooding during germination in rice (Oryza sativa L.). Euphytica, 172: 159-168.

Arthur, J.C., 1990. Polymers, Fibers and Textiles, A Compendium, ed. Kroschwitz, J.I. (Wiley, New York) pp 118-141.

Asp, T., Frei, U.K., Didion, T., Nielsen, K.K. and Lubberstedt, T., 2007. Frequency, type, and distribution of EST-SSRs from three genotypes of Lolium perenne, and their conservation across orthologous sequences of Festuca arundinacea, Brachypodium distachyon, and Oryza sativa. BMC Plant Biology, 7: 36-47.

Bandopadhyay, R., Sharma, S., Rustgi, S., Singh, R., Kumar, A., Balyan, H.S. and Gupta, P.K., 2004. DNA polymorphism among 18 species of Triticum-Aegilops complex using wheat EST-SSRs. Plant Science, 166: 349-356.

Bange, M.P., Long, R.L., Constable, G.A. and Gordon, S., 2010. Minimizing immature fiber and neps in Upland cotton (Gossypium hirsutum L.). Agronomy Journal, 102: 781-789.

Bassam, B.J., Caetano-Anollés, G. and Gresshoff, P.M., 1995. Method for profiling nucleic acids of unknown sequence using arbitrary oligonucleotide primers. US Patent 5,413,909.

Beasley, J.O., 1940. The origin of American tetraploid Gossypium species. American Naturalist, 74: 285-286.

Beasley, J.O., 1940. Meiotic chromosome behavior in species, species hybrids, haploids and induced polyploids of Gossypium. Genetics, 27: 25-54.

Bednarz, C.W., Shurley, D.W. and Anthony, W.S., 2002. Losses in yield, quality, and profitability of cotton from improper harvest timing. Agronomy Journal, 94: 1004-1011.

Behery, H.M., 1993. Short-fiber content and uniformity index in cotton. International Cotton Advisory Committee review article on cotton production research No. 4, CAB Int., Wallingford, UK. p. 40.

Bell, C.J. and Ecker, J., 1994. Assignment of 30 microsatellite loci to the linkage map of Arabidopsis. Genome, 19: 137-144.

Bennetzen, J.L., 2002. Mechanisms and rates of genome expansion and contraction in flowering plants. Genetica, 115: 29-36.

Bernatzky, R. and Tanksley, S.D., 1986. Towards a saturated linkage map in tomato based on isozyme and random cDNA sequences. Genetics, 112: 887-898.

Bertini, C.H.C.M., Schuster, I., Sediyama, T., Barros, E.G. and Moreira, M.A., 2006. Characterization and genetic diversity analysis of cotton cultivars using microsatellites. Genetics and Molecular Biology, 29: 321-329.

Bertran, E. and Long, M., 2002. Expansion of genome coding region by acquisition of new genes. Genetica, 115: 65-80.

Beyene, Y., Botha, A.M. and Myburg, A.A., 2006. Genetic diversity in traditional Ethiopian highland maize accessions assessed by AFLP markers and morphological traits. Biodiversity Conservation, 15: 2655-2671.

Biderre, C., Metenier, G. and Vivares, C.P., 1998. A small spliceosomal-type intron occurs in a ribosomal protein gene of the microsporidian Encephalitozoon cuniculi. Molecular Biochemistry and Parasitology, 74: 229-231.

Boivin, K., Deu, M., Rami, J.F., Trouche, G. and Hamon, P., 1999. Towards a saturated sorghum map using RFLP and AFLP markers. Theoretical and Applied Genetics, 98: 320-328.

Borner, A., Roder, M.S., Unger, O. and Meinel, A., 2000. The detection and molecular mapping of a major gene for non-specific adult-plant disease resistance against stripe rust (Puccinia striiformis) in wheat. Theoretical and Applied Genetics, 100: 1095-1099.

Botstein, D., White, R.L., Skolnik, M. and Davis, R.W., 1980. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. American Journal of Human Genetics, 32: 314-331.

Bourdon, C., 1984. Différenciation génétique inter et intraspécifique dans le genre Gossypium L.: le polymorphisme enzymatique chez des espèces diploïdes et tétraploïdes de cotonnier. PhD Dissertation, Université Paris Sud, Paris, pp. 172.

Bowers, J.E., Chapman, B.A., Rong, J. and Paterson, A.H., 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature, 422: 433-438.

Bradow, J.M., Bauer, P.J., Hinojosa, O. and Sassenrath-Cole, G.F., 1997a. Quantitation of cotton fibre-quality variations arising from boll and plant growth environments. European Journal of Agronomy, 6: 191-204.

Bradshaw, J.E., Hackett, C.A., Meyer, R.C., Milbourne, D., McNichol, J.W., Philips, M.S. and Waugh, R., 1998. Identification of AFLP and SSR marker associated with quantitative resistance to Globodera pallida (Stone) in tetraploid potato (Solanum tuberosum subsp. tuberosum) with a view to marker-assisted selection. Theoretical and Applied Genetics, 97: 202-210.

Bradshaw, J.E., Hackett, C.A., Pande, B., Waugh, R. and Bryan, G.J., 2008. QTL mapping of yield, agronomic and quality traits in tetraploid potato (Solanum tuberosum subsp. tuberosum). Theoretical and Applied Genetics, 116: 193-211.

Broun, P. and Tanksley, S.D., 1996. Characterization and genetic mapping of simple repeat sequences in the tomato genome. Molecular Genetics and Genomics, 250: 39-49.

Brubaker, C.L., Paterson, A.H. and Wendel, J.F., 1999. Comparative genetic mapping of allotetraploid cotton and its diploid progenitors. Genome, 42: 184-203.

Brubaker, C.L. and Wendel, J.F., 1994. Reevaluating the origin of domesticated cotton (Gossypium hirsutum, Malvaceae) using nuclear restriction fragment length polymorphism (RFLPs). American Journal of Botany, 81: 1309-1326.

Brubaker, C., Cantrell, R.G., Giband, M., Lyon, B. and Wilkins, T.A., 2000. Letter to Journal of Cotton Science community from the Steering Committee of the International Cotton Genome Initiative (ICGI). Journal of Cotton Science, 4: 149-151.

Buteler, M.I., Jarret, R.L. and LaBonte, D.R., 1999. Sequence characterization of microsatellites in diploid and polyploid Ipomoea. Theoretical and Applied Genetics, 99: 123-132.

Caballero, A., Garcia-Pereira, M. and Quesada, H., 2013. Genomic distribution of AFLP markers relative to gene locations for different eukaryotic species. BMC Genomics, 14: 528.

Caetano-Anollés, G., 1993. Amplifying DNA with arbitrary oligonucleotide primers. PCR Methods and Applications, 3: 85–94.

Caetano-Anollés, G., 1996. Scanning of nucleic acids by in vitro amplification: new developments and applications. Nature Biotechnology, 14: 1668–1674. Cândida, H., Bertini, D.M., Schuster, I., Sediyama, T., Barros, E.G.D. and Moreira, M.A., 2006. Characterization and genetic diversity analysis of cotton cultivars using microsatellites. Genetetics and Molecular Biology, 29(2): 321-329. Cantrell, R.G. and Xiao. J., 2008. Utilization of cotton DNA markers in cotton breeding. Cotton Science Vol. 20 Supplement, P 15, International Cotton Genome Initiative Research Conference Proceedings, Anyang, China. http:// journal.cricaas.com.cn/en/2008/Supp/supp00.htm. Changbiao, W., Wangzhen, G., Caiping, C. and Tianzhen, Z., 2006. Characterization, development and exploitation of EST-derived microsatellites in Gossypium raimondii Ulbrich. Chinese Science Bulletin, 51: 557-561. Cho, Y.G., Ishii, T., Temnykh, S., Chen, X., Lopovich, L., McCouch, S.R., Park, W.D., Ayres, N. and Cartinhour, S., 2000. Diversity of microsatellites derived

160

References

from genomic libraries and GenBank sequences in rice (Oryza sativa L.). Theoretical and Applied Genetics, 100: 713-722. Cho, Y., Lee, Y.P., Park, B.S., Han, T.H. and Kim, S., 2012. Construction of a high- resolution linkage map of Rfd1, a restorer-of-fertility locus for cytoplasmic male sterility conferred by DCGMS cytoplasm in radish (Raphanus sativus L.) using synteny between radish and Arabidopsis genomes. Theoretical and Applied Genetics, 125(3): 467-77. Claverie, J.M., 2 000. What if there are only 30,000 human genes. Science, 291: 1255–1257. Condit, R. and Hubbell, S.P., 1991. Abundance and DNA sequence of two-base repeat regions in tropical tree genomics. Genome, 34: 66-71. Connell, J.P., Pammi, S., Iqbal, M.J., Huizinga, T. and Reddy, A.S., 1998. A high throughput procedure for capturing microsatellites from complex plant genomes. Plant Molecular Biology Reporter, 16: 341-349. Cronn, R., Small, R.L. and Wendel, J.F., 1999. Duplicated genes evolve independently after polyploid formation in cotton. Proceedings of the Beltwide Cotton Research Production Conference, 96: 14406-14411. Cronn, R., Small, R.L., Haselkorn, T. and Wendel, J.F., 2002. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. American Journal of Botany, 89: 707–725. Dahab, A.A., Saeed, M., Mohamed, B.B., Ashraf, M.A., Puspito, A.N., Shahid, K.S.B.A.A. and Husnain, T., 2013. Genetic diversity assessment of cotton (Gossypium hirsutum L.) genotypes from Pakistan using simple sequence repeat markers. Australian Journal of Crop Science, 7: 261-267. Dakin, E.E. and Avis, J.C., 2004. Microsatellite null alleles in parentage analysis. Heredity, 93: 504-509. Decroocq, V., Fave, M.G., Hagen, L., Bordenave, L. and Decroocq, S., 2003. Development and transferability of apricot and grape EST microsatellite markers across taxa. Theoretical and Applied Genetics, 106: 912-922. Dejoode, D.R. and Wendel, J.F., 1992. 
Genetic diversity and origin of the Hawaiian Islands cotton, Gossypium tomentosum. American Journal of Botany, 79: 1311–1319. Deutsch, M. and Long, M., 1999. Intron–exon structure of eukaryotic model organisms. Nucleic Acids Research, 27: 3219–3228.

Dib, C., Faure, S., Fizames, C., Samson, D., Drouot, N., Vignal, A., Millasseau, P., Marc, S., Hazan, J., Seboun, E., Lathrop, M., Gyapay, G., Morissette, J. and Weissenbach, J., 1996. A comprehensive genetic map of the human genome based on 5264 microsatellites. Nature, 380: 152-154. Dietrich, W.F., Miller, J., Steen, R., Merchant, M.A., Damron-Boles, D., Husain, Z., Dredge, R., Daly, M.J., Ingalls, K.A., O’Connor, T.J., Evans, C.A., DeAngelis, M.M., Levinson, D.M., Kruglyak, L., Goodman, N., Copeland, N.G., Jenkins, N.A., Hawkins, T.L., Stein, L., Page, D.C. and Lander, E.S., 1996. A comprehensive genetic map of the mouse genome. Nature, 380: 149-152. Dong, G.J., Liu, G.S. and Li, K.F., 2007. Studying genetic diversity in the core germplasm of confectionary sunflower (Helianthus annuus L.) in China based on AFLP and morphological analysis. Genetica, 43: 762-770. Dongre, A. and Parkhi, V., 2005. Identification of cotton hybrid through the combination of PCR based RAPD, ISSR and microsatellite markers. Journal of Plant Biochemistry and Biotechnology, 14(1): 53-55. Dreher, K., Morris, M., Khairallah, M., Ribaut, J.M., Pandey, S. and Srinivasan, G., 2000. Is marker-assisted selection cost-effective compared to conventional plant breeding methods? The case of quality protein maize. The Fourth Annual Conference of the International Consortium on Agricultural Biotechnology Research (ICABR), "The Economics of Agricultural Biotechnology," held in Ravello, Italy. Druka, A., Potokina, E., Luo, Z.W., Jiang, N., Chen, X.W., Kearsey, M. and Waugh, R., 2010. Expression quantitative trait loci analysis in plants. Plant Biotechnology Journal, 8: 10-27. Dudley, J.W., 1993. Molecular markers in plant improvement: manipulation of genes affecting quantitative traits. Crop Science, 33: 660-668. Edwards, G.A., Endrizzi, J.E. and Stein, R., 1974. Genome DNA content and chromosome organization in Gossypium. Chromosoma, 47: 309–326. Elder, J.F. and Turner, B.J., 1995. 
Concerted evolution of repetitive DNA sequences in eukaryotes. Quarterly Review of Biology, 70: 297–320. Ellegren, H., 1993. Genome analysis with microsatellite markers. PhD Dissertation. Swedish University of Agricultural Sciences. Ellegren, H., 2000. Microsatellite mutations in the germline: implications for evolutionary inference. Trends in Genetics, 16: 551–558.

Endrizzi, J.E., Turcotte, E.L. and Kohel, R.J., 1984. Qualitative genetics, cytology and cytogenetics. In Cotton (Kohel, R.J. and Lewis, C.F., eds). Madison, WI. American Society of Agronomy, 81-129. Endrizzi, J.E., Turcotte, E.L. and Kohel, R.J., 1985. Genetics, cytology and evolution of Gossypium. Advances in Genetics, 23: 271-375. Eujayl, I., Sorrells, M.E., Baum, M., Wolters, P. and Powell, W., 2002. Isolation of EST-derived microsatellite markers for genotyping the A and B genomes of wheat. Theoretical and Applied Genetics, 104: 399–407. Ferris, S.D. and Whitt, G.S., 1979. Evolution of the differential regulation of duplicate genes after polyploidization. Journal of Molecular Evolution, 12: 267–317. Flajoulot, S., Ronfort, J., Baudouin, P., Barre, P., Huguet, T., Huyghe, C. and Julier, B., 2005. Genetic diversity among alfalfa (Medicago sativa) cultivars coming from a breeding program, using SSR markers. Theoretical and Applied Genetics, 111: 1420-1429. Foolad, M.R., 2007. Genome Mapping and Molecular Breeding of Tomato. International Journal of Plant Genomics, doi:10.1155/2007/64358. Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y-L. and Postlethwait, J., 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics, 151: 1531–1545. Franckowiak, J., 1997. Revised linkage maps for morphological markers in barley (Hordeum vulgare). Barley Genetics Newsletter, 26: 9-21. Frelichowski, J.E., Jr., Palmer, M.B., Main, D., Tomkins, J.P., Cantrell, R.G., Stelly, D.M., Yu, J., Kohel, R.J. and Ulloa, M., 2006. Cotton genome mapping with new microsatellites from Acala 'Maxxa' BAC-ends. Molecular Genetics and Genomics, 275: 479-491. Fryxell, P.A., 1979. The natural history of the cotton tribe. Texas A & M University Press, College Station. Fryxell, P.A., Craven, L.A. and Stewart, J.M., 1992. A revision of Gossypium sect. Grandicalyx (Malvaceae), including the description of six new species. Systematic Botany, 17: 91-114. 
Gadaleta, A., Giancaspro, A., Giove, S.L., Zacheo, S., Mangini, G., Simeone, R., Signorile, A. and Blanco, A., 2009. Genetic and physical mapping of new EST-derived SSRs on the A and B genome chromosomes of wheat. Theoretical and Applied Genetics, 118(5): 1015-1025.

Gailing, O., Bodenes, C., Finkeldey, R., Kremer, A. and Plomion, C., 2013. Genetic mapping of EST-derived simple sequence repeats (EST-SSRs) to identify QTL for leaf morphological characters in a Quercus robur full-sib family. Tree Genetics and Genomes, 9: 1361-1367. Gao, W., Chen, Z.J., Yu, J.Z., Raska, D., Kohel, R.J., et al., 2004. Wide-cross whole-genome radiation hybrid mapping of cotton (Gossypium hirsutum L.). Genetics, 167: 1317-1329. Gaut, B.S., Morton, B.R., McCaig, B.C. and Clegg, M.T., 1996. Substitution rate comparisons between grasses and palms: Synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proceedings of the National Academy of Sciences, USA, 93: 10274–10279. Geever, R.F., 1980. The evolution of single-copy nucleotide sequences in the genome of G. hirsutum. PhD dissertation, University of Arizona, Tucson, USA. Geever, R.F., Katterman, F.R.H. and Endrizzi, J.E., 1989. DNA hybridization analyses of a Gossypium allotetraploid and two closely related diploid species. Theoretical and Applied Genetics, 77: 553–559. George, J., Dobrowolski, M.P., van Zijll de Jong, E., Cogan, N.O., Smith, K.F. and Forster, J.W., 2006. Assessment of genetic diversity in cultivars of white clover (Trifolium repens L.) detected by SSR polymorphisms. Genome, 49: 919-930. Gerstel, D.U., 1953. Chromosome translocations in interspecific hybrids of the genus Gossypium. Evolution, 7: 234–244. Giese, H., Holm-Jensen, A.G. and Jensen, J., 1993. Localization of the laevigatum powdery mildew resistance gene to barley chromosome 2 by the use of RFLP markers. Theoretical and Applied Genetics, 85: 897-900. Gipson, J.R. and Joham, H.E., 1968. Influence of night temperature on growth and development of cotton (Gossypium hirsutum L.). II. Fiber properties. Agronomy Journal, 60: 296-298. Gipson, J.R. and Joham, H.E., 1969. Influence of night temperature on growth and development of cotton (Gossypium hirsutum L.) 
Fiber elongation. Crop Science, 9: 127-129. GOP, 2007. Economic survey of Pakistan: Finance Division, Government of Pakistan. Gordon, S. and Naylor, G., 2004. Instrumentation for rapid direct measurement of cotton fibre fineness and maturity. In Proc. Beltwide Cotton Conf., San Antonio, TX. 5-9. National Cotton Council of America in Memphis, TN.

Grant, V., 1981. Plant speciation. Columbia University Press. New York. Grimmer, M.K., Trybush, S., Hanley, S., Francis, S.A., Karp, A. and Asher, M.J., 2007. An anchored linkage map for sugar beet based on AFLP, SNP and RAPD markers and QTL mapping of a new source of resistance to Beet necrotic yellow vein virus. Theoretical and Applied Genetics, 114: 1151-1160. Guan, X.Y., Li, Q.J., Shan, C.M., Wang, S., Mao, Y.B., Wang, L.J. and Chen, X.Y., 2008. The HD-Zip IV gene GaHOX1 from cotton is a functional homologue of the Arabidopsis GLABRA2. Physiologia Plantarum, 134: 174–182. Guo, W.Z., Cai, C.P., Wang, C.B., Han, Z.G., Song, X.L., Wang, K., Niu, X.W., Wang, C., Lu, K.Y., Shi, B. and Zhang, T.Z., 2007. A microsatellite-based, gene-rich linkage map reveals genome structure, function and evolution in Gossypium. Genetics, 176: 527-541. Guo, W., Wang, W., Zhou, B. and Zhang, T., 2006a. Cross species transferability of G. arboreum-derived EST-SSRs in the diploid species of Gossypium. Theoretical and Applied Genetics, 112: 1573-1581. Guo, W.Z., Sang, Z.Q., Zhou, B.L. and Zhang, T.Z., 2007b. Genetic relationships of D-genome species based on two types of EST-SSR markers derived from G. arboreum and G. raimondii. Plant Science, 172: 808-814. Guo, W.Z., Zhang, T.Z., Shen, X.L., Yu, J.Z. and Kohel, R.J., 2003. Development of SCAR marker linked to a major QTL for high fiber strength and its usage in molecular-marker assisted selection in upland cotton. Crop Science, 43: 2252-2256. Guo, Y., McCarty, J.C., Jenkins, J.N. and Saha, S., 2008. QTLs for node of first fruiting branch in a cross of an upland cotton, Gossypium hirsutum L., cultivar with primitive accession Texas 701. Euphytica, 163: 113–122. Guo, Y., Saha, S., Yu, J., Jenkins, J., Kohel, R., Scheffler, B. and Stelly, D., 2007. BAC-derived SSR markers chromosome locations in cotton. Euphytica, 161: 361-370. Gupta, P.K., Balyan, H.S., Sharma, P.C. and Ramesh, B., 1996. 
Microsatellites in Plants: a new class of molecular markers. Current Science, 70: 45-54. Gupta, P.K., Mir, R.R., Mohan, A. and Kumar, J., 2008. Wheat Genomics: Present Status and Future Prospects. International Journal of Plant Genomics, doi:10.1155/2008/89645.

Gupta, P.K., Rustgi, S., Sharma, S., Singh, R., Kumar, N. and Balyan, H.S., 2003. Transferable EST-SSR markers for the study of polymorphism and genetic diversity in bread wheat. Molecular Genetics and Genomics, 270: 315-323. Gutierrez, O.A., Basu, S., Saha, S., Jenkins, J.N., Shoemaker, D.B., Cheatham, C.L. and McCarty, J.C., 2002. Genetic distance among selected cotton genotypes and
its relationship with F2 performance. Crop Science, 42: 1841-1847. Gyapay, G., Morissette, J., Vignal, A., Dib, C., Fizames, C., Millasseau, P., Marc, S., Bernardi, G., Lathrop, M. and Weissenbach, J., 1994. The 1993-94 Genethon human genetic linkage map. Nature Genetics, 7: 246-339. Han, Z.G., Guo, W.Z., Song, X.L. and Zhang, T.Z., 2004. Genetic mapping of EST-derived microsatellites from the diploid Gossypium arboreum in allotetraploid cotton. Molecular Genetics and Genomics, 272: 308-327. Han, Z.G., Wang, C.B., Song, X.L., Guo, W.Z., Guo, J.Y. and Zhang, T.Z., 2006. Characteristics, development and mapping of Gossypium hirsutum derived EST-SSR in allotetraploid cotton. Theoretical and Applied Genetics, 112: 430–439. Hanson, R.E., Islam-Faridi, M.N., Crane, C.F., Zwick, M.S., Czeschin, D.G., Wendel, J.F., McKnight, T.D., Price, H.J. and Stelly, D.M., 2000. Ty1-copia-retrotransposon behavior in a polyploid cotton. Chromosome Research, 8: 73–76. Hanson, R.E., Zhao, X.P., Islam-Faridi, M.N., Paterson, A.H., Zwick, M.S., Crane, C.F., McKnight, T.D., Stelly, D.M. and Price, H.J., 1998. Evolution of interspersed repetitive elements in Gossypium (Malvaceae). American Journal of Botany, 85: 1364-1368. Hao, Z.F., Li, X.H. and Zhang, S.H., 2005. Towards an expanded linkage map and exploration on co-dominant scoring of AFLPs in maize. Yi Chuan Xue Bao, 32: 960-968. Hashimoto, Z., Mori, N., Kawamura, M., Ishii, T., Yoshida, S., Ikegami, M., Takumi, S. and Nakamura, C., 2004. Genetic diversity and phylogeny of Japanese sake-brewing rice as revealed by AFLP and nuclear and chloroplast SSR markers. Theoretical and Applied Genetics, 109: 1586-1596. Hawkins, J.S., Pleasants, J. and Wendel, J.F., 2005. Identification of AFLP markers that discriminate between cultivated cotton and the Hawaiian island endemic, Gossypium tomentosum Nuttall ex Seemann. Genetic Resources and Crop Evolution, 52: 1069–1078. He, L., Du, C.G., Covaleda, L., Robinson, A.F., Yu, J.Z., Kohel, R.J. and Zhang, H.B., 2004. Cloning, characterization, and evolution of the NBS-encoding resistance gene analogue family in polyploid cotton (Gossypium hirsutum L.). Molecular Plant-Microbe Interactions, 17: 1234-1241. Hoffman, S.M., Yu, J.Z., Grum, D.S., Xiao, J., Kohel, R.J. and Pepper, A.E., 2007. Breeding and Genetics: Identification of 700 new microsatellite loci from cotton (G. hirsutum L.). The Journal of Cotton Science, 11: 208-241. Hong, C.P., Piao, Z.Y., Kang, T.W., Batley, J., Yang, T., Hur, Y.K., Bhak, J., Park, B.S., Edwards, D. and Lim, Y.P., 2007. Genomic distribution of simple sequence repeats in Brassica rapa. Molecules and Cells, 23: 349-356. Hughes, A.L., 1994. The evolution of functionally novel proteins after gene duplication. Proceedings of the Royal Society of London, B 256: 119–124. Hughes, A.L., Green, J.A., Garbayo, J.M. and Roberts, R.M., 2000. Adaptive diversification within a large family of recently duplicated, placentally expressed genes. Proceedings of the National Academy of Sciences, USA, 97: 3319–3323. Hutchinson, J.B., 1951. Intra-specific differentiation in Gossypium hirsutum. Heredity, 5: 169–193. Hutchinson, J.B., 1959. The application of genetics to cotton improvement. Cambridge University Press, Cambridge.

ICAC, 2006. Cotton: Review of the world situation, International Cotton Advisory Committee, Washington D.C., USA. Iqbal, M.J., Aziz, N., Saeed, N.A., Zafar, Y. and Malik, K.A., 1997. Genetic diversity of some elite cotton varieties by RAPD analysis. Theoretical and Applied Genetics, 94: 139-144. Iqbal, M.J., Reddy, O.U.K., El-Zik, K.M. and Pepper, A.E., 2001. A genetic bottleneck in the ‘evolution under domestication’ of upland cotton Gossypium
hirsutum L. examined using DNA fingerprinting. Theoretical and Applied Genetics, 103: 547–554. Iqbal, S., Bashir, A., Naseer, H.M., Ahmed, M. and Malik, K.A., 2008. Identification of differentially expressed genes in developing cotton fibers (Gossypium hirsutum L.) through differential display. Electronic Journal of Biotechnology, doi:10.2225/vol11-issue1-fulltext-11. Jain, S., Jain, R.K. and McCouch, S., 2004. Genetic analysis of Indian aromatic and quality rice (Oryza sativa L.) germplasm using panels of fluorescently-labeled microsatellite markers. Theoretical and Applied Genetics, 109: 965–977. James, E., Frelichowski, J.R., Michael, B., Palmer, D.M., Jeffrey, P., Tomkins, R.G., Cantrell, D.M., Stelly., John, Yu., Russell, J. and Kohel, M.U., 2006. Cotton genome mapping with new microsatellites from Acala 'Maxxa' BAC-ends. Molecular Genetics and Genomics, 275: 479-491. Jasdanwala, R.T., Singh, Y.D. and Chinoy, J.J., 1977. Auxin metabolism in developing Cotton hairs. Journal of Experimental Botany, 28: 1111-1116. Jeffreys, A.J., Wilson, V. and Thein, S.L., 1985. Individual-specific 'fingerprints' of human DNA. Nature, 316: 76-79. Jena, S.N., Srivastava, A., Rai, K.M., Ranjan, A., Singh, S.K., Nisar, T., Srivastava, M., Bag, S.K., Mantri, S. and Asif, M.H., 2012. Development and characterization of genomic and expressed SSRs for levant cotton (Gossypium herbaceum L.). Theoretical and Applied Genetics, 124: 565-576. Jiang, B., 2004. Optimization of Agrobacterium mediated cotton transformation using shoot apices explants and quantitative trait loci analysis of yield and yield component traits in Upland cotton (Gossypium hirsutum L.), PhD Thesis, Louisiana State University and Agricultural and Mechanical College, USA. Jiang, C., Wright, R.J., El-Zik, K.M. and Paterson, A.H., 1998. Polyploid formation created unique avenues for response to selection in Gossypium (cotton). Proceedings of the National Academy of Sciences, USA, 95: 4419–4424. 
Jiang, H., Liao, B., Ren, X., Lei, Y., Mace, E., Fu, T. and Crouch, J.H., 2007. Comparative assessment of genetic diversity of peanut (Arachis hypogaea L.) genotypes with various levels of resistance to bacterial wilt through SSR and AFLP analyses. Journal of Genetics and Genomics, 34: 544-554. John, Z.Y., Kohel, R.J., Fang, D.D., Cho, J., Van Deynze, A., Ulloa, M., Hoffman, S.M., Pepper, A.E., Stelly, D.M. and Jenkins, J.N., 2012. A high-density simple
sequence repeat and single nucleotide polymorphism genetic map of the tetraploid cotton genome. G3: Genes|Genomes|Genetics, 2: 43-58. Karp, A., Isaac, P.G. and Ingram, D.S., 2001. Molecular tools for Screening Biodiversity. Kluwer Academic Publishers, Netherlands. Kato, K.K. and Palmer, R.G., 2004. Molecular mapping of four ovule lethal mutants in soybean. Theoretical and Applied Genetics, 108: 577-585. Kearsey, M. and Pooni, H., 1996. The genetical analysis of quantitative traits. Chapman and Hall, London, pp. 81-90. Kebede, H., Burow, G., Raviprakash, G. and Randy, R.A., 2006. A-genome cotton as a source of genetic variability for Upland cotton (Gossypium hirsutum). Genetic Resources and Crop Evolution, 54: 885-895. Khan, S.A., Hussain, D., Askari, E., Stewart, J.M., Malik, K.A. and Zafar, Y., 2000. Molecular phylogeny of Gossypium species by DNA fingerprinting. Theoretical and Applied Genetics, 101: 931-938. Kim, H.J. and Triplett, B.A., 2001. Cotton fiber growth in planta and in vitro. Models for plant cell elongation and cell wall biogenesis. Plant Physiology, 127: 1361-1366. Kirik, A., Salomon, S. and Puchta, H., 2000. Species-specific double-strand break repair and genome evolution in plants. EMBO Journal, 19: 5562–5566. Koch, M.A., Haubold, B. and Mitchell-Olds, T., 2000. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Molecular Biology and Evolution, 17: 1483–1498. Kohel, R.J., Yu, J., Park, Y.H. and Lazo, G.R., 2001. Molecular mapping and characterization of traits controlling fiber quality in cotton. Euphytica, 121: 163-172. Kuleung, C., Baenziger, P.S. and Dweikat, I., 2004. Transferability of SSR markers among wheat, rye and triticale. Theoretical and Applied Genetics, 108: 1147-1150. Kulkarni, V.N., Khadi, B.M., Maralappanavar, M.S., Deshapande, L.A. and Narayanan, S.S., 2009. The world wide gene pools of Gossypium L., and their improvement. 
Springer book, Genetics and Genomics, 3: PP 69-100.

Smart, L.B., Vojdani, F., Maeshima, M. and Wilkins, T.A., 1998. Genes involved in osmoregulation during turgor-driven cell expansion of developing cotton fibers are differentially regulated. Plant Physiology, 116: 1539–1549. La Rota, M., Kantety, R.V., Yu, J.K. and Sorrells, M.E., 2005. Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genomics, 6: 23-35. Lacape, J.M., Nguyen, T.B., Thibivilliers, S., Bojinov, B., Courtois, B., Cantrell, R.G., Burr, B. and Hau, B., 2003. A combined RFLP-SSR-AFLP map of tetraploid cotton based on a Gossypium hirsutum × Gossypium barbadense backcross population. Genome, 46: 612-626. Lacape, J.M., Dessauw, D., Rajab, M., Noyer, J.L. and Hau, B., 2007. Microsatellite diversity in tetraploid Gossypium germplasm: assembling a highly informative genotyping set of cotton SSRs. Molecular Breeding, 19: 45-58. Lacape, J.M., Jacobs, J., Arioli, T., Derijcker, R., Forestier-Chiron, N., et al., 2009. A new interspecific, Gossypium hirsutum × G. barbadense, RIL population: towards a unified consensus linkage map of tetraploid cotton. Theoretical and Applied Genetics, 119: 281–292. Lagercrantz, U., 1998. Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics, 150: 1217-1228. Lagercrantz, U., Ellegren, H. and Andersson, L., 1993. 
The abundance of various polymorphic microsatellite motifs differs between plants and vertebrates. Nucleic Acids Research, 21: 1111-1115. Lagercrantz, U., Putterill, J., Coupland, G. and Lydiate, D., 1996. Comparative mapping in Arabidopsis and Brassica, fine scale genome collinearity and congruence of genes controlling flowering time. Plant Journal, 9: 13-20.

Larkin, J.C., Brown, M.L. and Schiefelbein, J., 2003. How do cells know what they want to be when they grow up? Lessons from epidermal patterning in Arabidopsis. Annual Review of Plant Biology, 54: 403–430. Laurie, D. and Devos, K., 2002. Trends in comparative genetics and their potential impacts on wheat and barley research. Plant Molecular Biology, 48: 729–740. Lee, J.M., Nahm, S.H., Kim, Y.M. and Kim, B.D., 2004. Characterization and molecular genetic mapping of microsatellite loci in pepper. Theoretical and Applied Genetics, 108: 619-627. Lee, M.-K., Zhang, Y., Zhang, M., Goebel, M., Kim, H., Triplett, B., Stelly, D. and Zhang, H.-B., 2013. Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.). BMC Genomics, 14: 208. Lee, S.H., Bailey, M.A., Mian, M.A.R., Carter, J.T.E., Ashley, D.A., Hussey, D.A., Parrott, W.A. and Boerma, H.R., 1996. Molecular markers associated with soybean plant height, lodging, and maturity across locations. Crop Science, 36: 728-735. Legendre, M. and Verstrepen, K.J., 2008. Using the SERV Applet to Detect Tandem Repeats in DNA Sequences and to Predict Their Variability. CSH Protocols, doi:10.1101/pdb.ip50. Leitch, I.J. and Bennett, M.D., 1997. Polyploidy in angiosperms. Trends in Plant Science, 2: 470-476. Li, F., Fan, G., Wang, K., Sun, F., Yuan, Y., Song, G., Li, Q., Ma, Z., Lu, C., Zou, C., Chen, W., Liang, X., Shang, H., Liu, W., Shi, C., Xiao, G., Gou, C., Ye, W., Xu, X., Zhang, X., Wei, H., Li, Z., Zhang, G., Wang, J., Liu, K., Kohel, R.J., Percy, R.G., Yu, J.Z., Zhu, Y.-X., Wang, J. and Yu, S., 2014. Genome sequence of the cultivated cotton Gossypium arboreum. Nature Genetics, 46: 567-572.

Li, G.Q., Li, Z.F., Yang, W.Y., Zhang, Y., He, Z.H., Xu, S.C., Singh, R.P., Qu, Y.Y. and Xia, X.C., 2006. Molecular mapping of stripe rust resistance gene YrCH42 in Chinese wheat cultivar Chuanmai 42 and its allelism with Yr24 and Yr26. Theoretical and Applied Genetics, 112: 1434-1440.

Li, W.H., 1985. Accelerated evolution following gene duplication and its implications for the neutralist-selectionist controversy. In: T. Ohta and K. Aoki (Editors),
Population genetics and molecular evolution. Berlin: Springer-Verlag, pp. 333–353. Liang, P. and Pardee, A.B., 1992. Differential display of eukaryotic messenger RNA by means of polymerase chain reaction. Science, 257: 967–971. Lichtenzveig, J., Scheuring, C., Dodge, J., Abbo, S. and Zhang, H.B., 2005. Construction of BAC and BIBAC libraries and their applications for generation of SSR markers for genome analysis of chickpea, Cicer arietinum L. Theoretical and Applied Genetics, 110: 492-510. Lin, L., Pierce, G., Bowers, J., Estill, J., Compton, R., Rainville, L., Kim, C., Lemke, C., Rong, J., Tang, H., Wang, X., Braidotti, M., Chen, A., Chicola, K., Collura, K., Epps, E., Golser, W., Grover, C., Ingles, J., Karunakaran, S., Kudrna, D., Olive, J., Tabassum, N., Um, E., Wissotski, M., Yu, Y., Zuccolo, A., ur Rahman, M., Peterson, D., Wing, R., Wendel, J. and Paterson, A.H., 2010. A draft physical map of a D-genome cotton species (Gossypium raimondii). BMC Genomics, 11: 395. Linos, A.A., Bebeli, P.J. and Kaltsikes, P.J., 2002. Cultivar identification in Upland cotton using RAPD markers. Australian Journal of Agricultural Research, 53: 637-642. Litt, M. and Luty, J.A., 1989. A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. American Journal of Human Genetics, 44: 397-401. Liu, S-R., Li, W-Y., Long, D., Hu, C-G. and Zhang, J-Z., 2013. 
Development and Characterization of Genomic and Expressed SSRs in Citrus by Genome-Wide Analysis. PLoS ONE, 8(10). Liu, A. and Burke, J.M., 2006. Patterns of nucleotide diversity in wild and cultivated sunflower. Genetics, 173: 321–330.

Liu, D., Guo, X.P., Lin, Z., Nie, Y.C. and Zhang, X., 2005. Genetic diversity of Asian cotton (Gossypium arboreum L.) in China evaluated by microsatellite analysis. Genetic Resources and Crop Evolution, 53: 1145-1152. Liu, D., Guo, X.P., Lin, Z., Nie, Y.C. and Zhang, X., 2006. Genetic diversity of Asian cotton (Gossypium arboreum L.) in China evaluated by microsatellite analysis. Genetic Resources and Crop Evolution, 53: 1145-1152.

Liu, S., Cantrell, R.G., McCarty, J.C. and Stewart, J.M., 2000a. Simple sequence repeat based assessment of genetic diversity in cotton race stock accessions. Crop Science, 40: 1459-1469. Liu, S., Saha, S., Stelly, D., Burr, B. and Cantrell, R.G., 2000b. Chromosomal assignment of microsatellite loci in cotton. Journal of Heredity, 91: 326-332. Liu, Z.J. and Cordes, J.F., 2004. DNA marker technologies and their applications in aquaculture genetics. Aquaculture, 238: 1-37. Liu, S., Griffey, C.A. and Saghai Maroof, M.A., 2001. Identification of molecular markers associated with adult plant resistance to powdery mildew in common wheat cultivar Massey. Crop Science, 41: 1268-1275. Livak, K.J., Flood, S.A.J., Marmaro, J., Giusti, W. and Deetz, K., 1995. Oligonucleotides with fluorescent dyes at opposite ends provide a quenched probe system useful for detecting PCR product and nucleic acid hybridization. PCR Methods and Applications, 4: 357-362. Lord, E. and Heap, S.A., 1988. The origin and assessment of cotton fibre maturity. International Institute for Cotton, Manchester. Lu, H. and Myers, G.O., 2002. Genetic relationships and discrimination of ten influential Upland cotton varieties using RAPD markers. Theoretical and Applied Genetics, 105: 325-331. Lu, Z.X., Sosinski, B., Reighard, G.L., Baird, W.V. and Abbott, A.G., 1998. Construction of a genetic linkage map and identification of AFLP markers for resistance to root-knot nematodes in peach rootstocks. Genome, 41: 199-207. Lubbers, E.L. and Chee, P.W., 2009. The worldwide gene pool of Gossypium hirsutum and its improvement. Springer book, Genetics and Genomics, 3: pp. 23-52.

Lynch, M. and Conery, J.S., 2000. The evolutionary fate and consequences of duplicate genes. Science, 290: 1151–1155. Lynch, M. and Force, A., 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics, 154: 459-473. Ma, X., Zhou, B., Lu, Y., Guo, W. and Zhang, T., 2008. Simple Sequence Repeat Genetic Linkage Maps of A-genome Diploid Cotton (Gossypium arboreum). Journal of Integrative Plant Biology, 50: 491–502. Macaulay, M., Ramsay, L., Powell, W. and Waugh, R., 2001. A representative, highly informative ‘‘genotyping set’’ of barley SSRs. Theoretical and Applied Genetics, 106: 801–809. Masi, P., Spagnoletti Zeuli, P.L. and Donini, P., 2003. Development and analysis of multiplex microsatellite markers sets in common bean (Phaseolus vulgaris L.). Molecular Breeding, 11: 303–313. Maughan, P.J., Saghai Maroof, M.A. and Buss, G.R., 1996. Amplified fragment length polymorphism (AFLP) in soybean: species diversity, inheritance, and near-isogenic line analysis. Theoretical and Applied Genetics, 93: 392-401. McClelland, M., Mathieu-Daude, F. and Welsh, J., 1995. RNA fingerprinting and differential display using arbitrarily primed PCR. Trends in Genetics, 11: 242–246. McCord, P.H., Sosinski, B.R., Haynes, K.G., Clough, M.E. and Yencho, G.C., 2011. Linkage Mapping and QTL Analysis of Agronomic Traits in Tetraploid Potato (Solanum tuberosum subsp. tuberosum). Crop Science, 51: 771-785. McCouch, S.R., Teytelman, L., Xu, Y., Lobos, K.B., Clare, K., Walton, M., Fu, B., Maghirang, R., Li, Z., Xing, Y., Zhang, Q., Kono, I., Yano, M., Fjellstrom, R., DeClerck, G., Schneider, D., Cartinhour, S., Ware, D. and Stein, L., 2002. Development and mapping of 2240 new SSR markers for rice (Oryza sativa L.). DNA Research, 9: 199-207. McGregor, C.E., van Treuren, R., Hoekstra, R. and van Hintum, T.J.L., 2002. Analysis of the wild potato germplasm of the series Acaulia with AFLPs: implications for ex situ conservation. Theoretical and Applied Genetics, 104: 146-156. 
Mehetre, S.S., Aher, A.R., Gawande, V.L., Patil, V.R. and Mokate, A.S., 2003. Induced polyploidy in Gossypium: A tool to overcome interspecific
incompatibility of cultivated tetraploid and diploid cottons. Current Science, 84: 1510-1512. Mei, M., Syed, N.H., Gao, W., Thaxton, P.M., Smith, C.W., Stelly, D.M. and Chen, Z.J., 2004. Genetic mapping and QTL analysis of fibre related traits in cotton. Theoretical and Applied Genetics, 108: 280-291. Meinert, M.C. and Delmer, D.P., 1977. Changes in biochemical composition of the cell wall of the cotton fiber during development. Plant Physiology, 59: 1088-1097. Menzel, M.Y., 1954. A cytological method for genome analysis in Gossypium. Genetics, 40: 214-223. Meredith, W.R., 1992. RFLP association with varietal origin and heterosis. In: D. Herber (Editor), Proceedings of Beltwide Cotton Conference, Nashville, TN, pp. 607. Micheli, M.R. and Bova, R., 1996. Fingerprinting Methods Based on Arbitrarily Primed PCR. Springer Verlag, Berlin. doi:10.1007/978-3-642-60441-6. Mirsky, A.E. and Ris, H., 1951. The DNA content of animal cells and its evolutionary significance. Journal of General Physiology, 34: 451-462. Mitchell, S.E., Kresovich, S., Jester, C.A., Hernandez, C.J. and Szwec-McFadden, A.K., 1997. Application of multiplex PCR and fluorescence-based, semi-automated allele sizing technology for genotyping plant genetic resources. Crop Science, 37: 617-624. Mittal, M. and Boora, K.S., 2005. Molecular tagging of gene conferring leaf blight resistance using microsatellites in sorghum [Sorghum bicolor (L.) Moench]. Indian Journal of Experimental Biology, 43: 462-466. Moghaddam, M., Trethowan, R., William, H., Rezai, A., Arzani, A. and Mirlohi, A., 2005. Assessment of genetic diversity in bread wheat genotypes for tolerance to drought using AFLPs and agronomic traits. Euphytica, 141: 147-156. Mohan, M., Nair, S., Bhagwat, A., Krishna, T.G., Yano, M., Bhatia, C.R. and Sasaki, T., 1997. Genome mapping, molecular markers and marker-assisted selection in crop plants. Molecular Breeding, 3: 87-103. Molnar, I., Cifuentes, M., Schneider, A., Benavente, E. and Lang, M.M., 2011. 
Association between simple sequence repeat-rich chromosome regions and intergenomic translocation breakpoints in natural populations of allopolyploid wild wheats. Annals of Botany, 107: 65–76.

Moore, J.F., 1996. Cotton classification and quality. p. 51–57. In E.H. Glade Jr., L.A. Meyer, and H. Stults (ed.) The cotton industry in the United States. USDA-ERS Agric. Econ. Rep. 739. U.S. Gov. Print. Office, Washington, DC. Morgante, M. and Olivieri, A.M., 1993. PCR-amplified microsatellites as markers in plant genetics. Plant Journal, 3: 175-182. Morgante, M., Hanafey, M. and Powell, W., 2002. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nature Genetics, 30: 194-200. Mukhtar, M.S., Rahman, M. and Zafar, Y., 2002. Assessment of genetic diversity among wheat (Triticum aestivum L.) cultivars using random amplified polymorphic DNA (RAPD) analysis. Euphytica, 128: 417-425. Mullis, K.B. and Faloona, F., 1987. Specific synthesis of DNA in vitro via a polymerase-catalysed chain reaction. Methods in Enzymology, 155: 335-350. Mumtaz, A.S., Naveed, M. and Shinwari, Z.K., 2010. Assessment of genetic diversity and germination patterns in selected cotton genotypes of Pakistan. Pakistan Journal of Botany, 42: 3949-3956. Murtaza, N., Kitaoka, M. and Ali, G.M., 2005. Genetic differentiation of cotton cultivars by polyacrylamide gel electrophoresis. Journal of Central European Agriculture, 6: 69-76. Narasimhamoorthy, B., Bouton, J.H., Olsen, K.M. and Sledge, M.K., 2007. Quantitative trait loci and candidate gene mapping of aluminum tolerance in diploid alfalfa. Theoretical and Applied Genetics, 114: 901-913. Nei, M. and Li, W.H., 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, USA, 76: 5269-5273. Nei, M., 2007. The new mutation theory of phenotypic evolution. PNAS, 104: 12235-12242. Nguyen, T.B., Giband, M., Brottier, P., Risterucci, A.M. and Lacape, J.M., 2004. Wide coverage of the tetraploid cotton genome using newly developed microsatellite markers. Theoretical and Applied Genetics, 109: 167-175. Niu, C., Lu, Y. and Zhang, J., 2006. 
Plant disease resistance gene analogues in cotton: mapping and expression, Beltwide Cotton Conference, Texas. Ohno, S., 1970. Evolution by gene duplication. New York: Springer-Verlag.


Oliver, M.J., Petrov, D., Ackerly, D., Falkowski, P. and Schofield, O.M., 2007. The mode and tempo of genome size evolution in eukaryotes. Genome Research, 17: 594-601. Orel, N. and Puchta, H., 2003. Differences in the processing of DNA ends in Arabidopsis thaliana and tobacco: Possible implications for genome evolution. Plant Molecular Biology, 51: 523-531. Park, Y.H., Alabady, M.S., Ulloa, M., Sickler, B., Wilkins, T.A., Yu, J., Stelly, D.M., Kohel, R.J., El-Shiny, O.M. and Cantrell, R.G., 2005. Genetic mapping of new cotton fibre loci using EST-derived microsatellites in an interspecific recombinant inbred (RIL) cotton population. Molecular Genetics and Genomics, 274: 428-441. Paterson, A., Estill, J., Rong, J., Williams, D. and Marler, B., 2002. Toward a genetically anchored physical map of the cotton genomes. Cotton Science, 14 (Suppl.): 31. Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., Schmutz, J., Spannagl, M., Tang, H., Wang, X., Wicker, T., Bharti, A.K., Chapman, J., Feltus, F.A., Gowik, U., Grigoriev, I.V., Lyons, E., Maher, C.A., Martis, M., Narechania, A., Otillar, R.P., Penning, B.W., Salamov, A.A., Wang, Y., Zhang, L., Carpita, N.C., Freeling, M., Gingle, A.R., Hash, C.T., Keller, B., Klein, P., Kresovich, S., McCann, M.C., Ming, R., Peterson, D.G., Rahman, M., Ware, D., Westhoff, P., Mayer, K.F.X., Messing, J. and Rokhsar, D.S., 2009. The Sorghum bicolor genome and the diversification of grasses. Nature, 457: 551-556. Paterson, A.H., Wendel, J.F., Gundlach, H., Guo, H., Jenkins, J., Jin, D., Llewellyn, D., …….Rahman, M., Rokhsar, D.S., Wang, X. and Schmutz, J., 2012. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibers. Nature, doi: 10.1038/nature11798. Paterson, A.H., Lin, Y.R., Li, Z.K., Schertz, K.F., Doebley, J.F., Pinson, S.R.M., Liu, S.C., Stansel, J.W. and Irvine, J.E., 1995. 
Convergent Domestication of Cereal Crops by Independent Mutations at Corresponding Genetic-Loci. Science 269: 1714-1718. Patil, N.B. and Singh, M., 1995. Development of medium staple high-strength cotton suitable for rotor spinning systems, p. 264–267. In G.A. Constable and N.W.


Forrester (ed.) Challenging the future. Proc. World Cotton Conf. I, Brisbane, Australia, 14–17 Feb. 1994. CSIRO, Melbourne, Australia. Peakall, R., Gilmore, S., Keys, W., Morgante, M. and Rafalski, A., 1998. Cross-species amplification of soybean (Glycine max) simple sequence repeat (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Molecular Biology and Evolution, 15: 1275-1287. Peng, J.H. and Lapitan, N.L.V., 2005. Characterization of EST-derived microsatellites in the wheat genome and development of eSSR markers. Functional and Integrative Genomics, 5: 80–96. Percival, A.E. and Kohel, R.J., 1990. Distribution, collection, and evaluation of Gossypium. Advances in Agronomy, 44: 225–256. Percival, A.E., Wendel, J.F. and Stewart, J.M., 1999. Taxonomy and germplasm resources. In: Smith CW, Cothren JT (eds) Cotton origin, history, technology, and production. John Wiley & Sons Inc., New York, pp 33–63. PES., 2013. Pakistan economic survey (PES) 2013-14. http://www.finance.gov.pk

Pettigrew, W.T., 1995. Source-to-sink manipulation effects on cotton fiber quality. Agronomy Journal, 87: 947-952. Phillips, L.L., 1966. The cytology and phylogenetics of the diploid species of Gossypium. American Journal of Botany, 53: 328-335. Pillay, M. and Meyers, G.O., 1999. Genetic diversity in cotton assessed by variation in ribosomal RNA genes and AFLP markers. Crop Science, 39: 1881-1886. Poncet, V., Rondeau, M., Tranchant, C., Cayrel, A., Hamon, S., de Kochko, A. and Hamon, P., 2006. SSR mining in coffee tree EST databases: potential use of EST-SSRs as markers for the Coffea genus. Molecular Genetics and Genomics, 276: 436-449. Powell, W., Morgante, M., Andre, C., Hanafey, M., Vogel, J., Tingey, S. and Rafalski, A., 1996. The comparison of RFLP, RAPD, AFLP and SSR (microsatellite) markers for germplasm analysis. Molecular Breeding, 2: 225-238. Pu, L., Li, Q., Fan, X., Yang, W. and Xue, Y., 2008. The R2R3 MYB transcription factor GhMYB109 is required for cotton fiber development. Genetics, 180: 811–820.


Qureshi, S.N., Saha, S., Kantety, R.V. and Jenkins, J.N., 2004. EST-SSR: a new class of genetic markers in cotton. Journal of Cotton Science, 8: 112-123. Raboin, L.M., Oliveira, K.M., Lecunff, L., Telismart, H., Roques, D., Butterfield, M., Hoarau, J.Y. and D'Hont, A., 2006. Genetic mapping in sugarcane, a high polyploid, using bi-parental progeny: identification of a gene controlling stalk colour and a new rust resistance gene. Theoretical and Applied Genetics, 112: 1382-1391. Rafalski, J.A. and Tingey, S.V., 1993. Genetic diagnostics in plant breeding: RAPDs, microsatellites and machines. Trends in Genetics, 9: 275-279. Rafalski, J.A., 1997. Randomly amplified polymorphic DNA (RAPD) analysis. In: G.C. Anolles and P.M. Gresshoff (Editors), DNA markers: Protocols, Applications and Overviews. Wiley-Liss, Inc. USA, pp. 364. Rahman, M., Zafar, Y. and Paterson, A.H., 2009. Gossypium DNA markers: Types, numbers and uses. In: Paterson AH (ed) Genetics and genomics of cotton, Springer, New York. 3: 101-139. Rahman, M., Asif, M., Shaheen, T., Tabbasam, N., Zafar, Y. and Paterson, A.H., 2011. Marker-assisted breeding in Higher Plants. In: Eric Lichtfouse (ed) Sustainable Agriculture Reviews 6; Alternative Farming Systems, Biotechnology, Drought Stress and Ecological Fertilization. Springer, 39–76. Rahman, M., Ahmed, N., Asif, M. and Zafar, Y., 2006. Identification of DNA markers linked with cotton leaf curl disease (CLCD). International Cotton Genome Initiative (ICGI) Workshop, Brasilia Brazil. 77-78. Rahman, M., Asif, M., Ullah, I., Malik, K.A. and Zafar, Y., 2005a. Overview of cotton genomic studies in Pakistan, Plant and Animal Genome Conference XIII. San Diego, CA. USA. Rahman, M., Hussain, D. and Zafar, Y., 2002a. Estimation of genetic divergence among elite cotton (Gossypium hirsutum L.) cultivars/genotypes by DNA fingerprinting technology. Crop Science, 42: 2137-2144. Rahman, M., Hussain, D., Malik, T.A. and Zafar, Y., 2005b. 
Genetics of resistance to cotton leaf curl disease in Gossypium hirsutum. Plant Pathology, 54: 764-772. Rahman, M., Malik, T.A., Aslam, N., Asif, M., Ahmad, R., Khan, I.A. and Zafar, Y., 2002b. Optimisation of PCR Conditions to Amplify Microsatellite Loci in Cotton (Gossypium hirsutum L.). International Journal of Agriculture and Biology, 2: 282-284.


Rahman, M., Tabassum, N., Ullah, I., Asif, M. and Zafar, Y., 2008. Studying the extent of genetic diversity among Gossypium arboreum L. genotypes/cultivars using DNA fingerprinting. Genetic Resources and Crop Evolution, 55: 331-339. Rallo, P., Dorado, G. and Martin, A., 2000. Development of simple sequence repeats (SSRs) in olive tree (Olea europaea L.). Theoretical and Applied Genetics, 101: 984-989. Ramsey, J.C. and Berlin, J.D., 1976. Bot. Gaz (Chicago) 137: 11-19. Rangel, P.N., Brondani, R.P.V., Coelho, A.S.G., Rangel, P.H.N. and Brondani, C., 2007. Comparative linkage mapping of Oryza glumaepatula and Oryza sativa interspecific crosses based on microsatellite and expressed sequence tag markers. Genetics and Molecular Biology, 30: 614-622. Rathore, K.S., Sunilkumar, G., Cantrell, R.G., Hague, S. and Reding, H., 2008. Transgenic Sugar, Tuber and Fiber Crops. In: Compendium of Transgenic Crop Plants. Edited by Kole C, Hall TC, Oxford, UK: Blackwell Publishing, 7: 199-238. Reddy, A., Haisler, R.M., Yu, J. and Kohel, R.J., 1997. AFLP mapping in cotton, Plant Animal Genome Conf. V. USA. Reddy, O.U.K., Pepper, A.E., Abdurakhmonov, I., Saha, S., Jenkins, J.N., Brooks, T., Bolek, Y. and El-Zik, K.M., 2001. New dinucleotide and trinucleotide microsatellite marker resources for cotton genome research. Journal of Cotton Science, 5: 103-113. Reinisch, M.J., Dong, J., Brubaker, C.L., Stelly, D.M., Wendel, J.F. and Paterson, A.H., 1994. A detailed RFLP map of cotton, Gossypium hirsutum x Gossypium barbadense: chromosome organization and evolution in a disomic polyploid genome. Genetics, 138: 829-847. Rohlf, F.J., 2005. Geometric morphometrics simplified. Review of "Geometric Morphometrics for Biologists: a primer". Trends in Ecology and Evolution, 20: 13-14. Rong, J., Abbey, C., Bowers, J.E., Brubaker, C.L., Chang, C., Chee, P.W., Delmonte, T.A., Ding, X., Garza, J.J., Marler, B.S., et al., 2004. 
A 3347-locus genetic recombination map of sequence-tagged sites reveals features of genome organization, transmission and evolution of cotton (Gossypium). Genetics, 166: 389-417.


Rong, J., Feltus, F.A., Waghmare, V.N., Pierce, G.J., Chee, P.W., Draye, X., Saranga, Y., Wright, R.J., Wilkins, T.A., May, O.L., et al., 2007. Meta-analysis of polyploid cotton QTL shows unequal contributions of subgenomes to a complex network of genes and gene clusters implicated in lint fiber development. Genetics, 176: 2577-2588. Rong, J., Abbey, C., Bowers, J.E., Brubaker, C.L., Chang, C., Chee, P.W., Delmonte, T.A., Ding, X., Garza, J.J., Marler, B.S., Park, C., Pierce, G.J., Rainey, K.M., Rastogi, V.K., Schulze, S.R., Trolinder, N.L., Wendel, J.F., Wilkins, T.A., Williams-Coplin, T.D., Wing, R.A., Wright, R.J., Zhao, X., Zhu, L. and Paterson, A.H., 2004. A 3347-Locus genetic recombination map of sequence-tagged sites reveals features of genome organization, transmission and evolution of cotton (Gossypium). Genetics, 166: 389-417. Rong, J., Bowers, J.E., Schulze, S.R., Waghmare, V.N., Rogers, C.J., Pierce, G.J., Zhang, H., Estill, J.C. and Paterson, A.H., 2005. Comparative genomics of Gossypium and Arabidopsis: unraveling the consequences of both ancient and recent polyploidy. Genome Research, 15: 1198-1210. Rongwen, J., Akkaya, H.S., Bhagwat, A.A., Lavi, U. and Cregan, P.B., 1995. The use of microsatellite DNA markers for soybean genotype identification. Theoretical and Applied Genetics, 90: 43-48. Rossetto, M., McNally, J. and Henry, R.J., 2002. Evaluating the potential of SSR flanking regions for examining taxonomic relationships in the Vitaceae. Theoretical and Applied Genetics, 104: 61-66. Ruan, Y.L., Llewellyn, D.J. and Furbank, R.T., 2001. The control of single celled cotton fiber elongation by developmentally reversible gating of plasmodesmata and coordinated expression of sucrose and K+ transporters and expansin. Plant Cell, 13: 47-60. Ruan, Y.L., Llewellyn, D.J. and Furbank, R.T., 2003. Suppression of sucrose synthase gene expression represses cotton fiber cell initiation, elongation and seed development. The Plant Cell, 15: 952-964. 
Ruan, Y.L., Xu, S.M., White, R. and Furbank, R.T., 2004. Genotypic and developmental evidence for the role of plasmodesmatal regulation in cotton fiber elongation mediated by callose turnover. Plant Physiology, 136: 4104–4113.


Rungis, D., Llewellyn, D., Dennis, A.E.S. and Lyon, B.R., 2002. Investigation of the chromosomal location of the bacterial blight resistance gene present in an Australian cotton (Gossypium hirsutum L.) cultivar. Australian Journal of Agricultural Research, 53: 551-560. Rungis, D., Llewellyn, D., Dennis, E.S. and Lyon, B.R., 2005. Simple sequence repeat (SSR) markers reveal low levels of polymorphism between cotton (Gossypium hirsutum L.) cultivars. Australian Journal of Agricultural Research, 56: 301-307. Orford, S.J. and Timmis, J.N., 1997. Abundant mRNAs specific to the developing cotton fibre. Theoretical and Applied Genetics, 94: 909-918. Saghai Maroof, M.A., Biyashev, R.M., Yang, G.P., Zhang, Q. and Allard, R.W., 1994. Extraordinarily polymorphic microsatellite DNA in barley: species diversity, chromosomal locations, and population dynamics. Proceedings of the National Academy of Sciences, USA, 91: 5466-5470. Saha, M.C., Main, M.A.R., Eujayl, L., Zwonitzer, J.C., Wang, L. and May, G.D., 2004. Tall fescue EST-SSR markers with transferability across several grass species. Theoretical and Applied Genetics, 109: 783–791. Saha, S. and Stelly, D.M., 1994. Journal of Heredity, 85: 35-39. Saha, S., Karaca, M., Jenkins, J.N., Zipf, A.E., Reddy, O.U.K., Ramesh, V. and Kantety, R.V., 2003. Simple sequence repeats as useful resources to study transcribed genes of cotton. Euphytica, 130: 355-364. SanMiguel, P., Tikhonov, A., Jin, Y.K., Motchoulskaia, N., Zakharov, D., Melake-Berhan, A., Springer, P.S., Edwards, K.J., Lee, M., Avramova, Z. and Bennetzen, J.L., 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science, 274: 765-768. Sato, K., Nankaku, N. and Takeda, K., 2009. A high-density transcript linkage map of barley derived from a single population. Heredity, 103: 110-117. Schiefelbein, J., 2003. Cell-fate specification in the epidermis: A common patterning mechanism in the root and shoot. Current Opinion in Plant Biology, 6: 74-78. 
Schierwater, B., 1995. Arbitrarily amplified DNA in systematics and phylogenetics. Electrophoresis, 16: 1643-1647. Schmidt, T. and Heslop, H.J., 1998. Genomes, genes and junk: the large-scale organization of plant chromosomes. Trends in Plant Science, 3: 195-198.


Schon, C.C., Lee, M., Melchinger, A.E., Guthrie, W.D. and Woodman, W.L., 1993. Mapping and characterization of quantitative trait loci affecting resistance against second generation European corn borer in maize with the aid of RFLPs. Heredity, 70: 648-659. Schubert, A.M., Benedict, C.R., Berlin, J.D. and Kohel, R.J., 1973. Cotton fiber development kinetics of cell elongation and secondary wall thickening. Crop Science, 13: 704-709. Schwartz, B.M. and Smith, C.W., 2008. Genetic gain in yield potential of upland cotton under varying plant densities. Crop Science, 48: 601-605. Seelanan, T., Schnabel, A. and Wendel, J.F., 1997. Congruence and consensus in the cotton tribe. Systematic Botany, 22: 259-290. Semagn, K., Bjornstad, A. and Ndjiondjop, M.N., 2006a. An overview of molecular marker methods for plants. African Journal of Biotechnology, 5: 2540-2568. Semagn, K., Bjornstad, A., Skinnes, H., Maroy, A.G., Tarkegne, Y. and William, M., 2006b. Distribution of DArT, AFLP, and SSR markers in a genetic linkage map of a doubled-haploid hexaploid wheat population. Genome, 49: 545-555. Shaheen, H.L., 2005. Global view of genetic diversity among cotton cultivars/genotypes by microsatellite analysis. MPhil thesis. Department of Botany, University of Agriculture, Faisalabad, Pakistan. Shahmuradov, I.A., Akbarova, Y.Y., Solovyev, V.V. and Aliyev, J.A., 2003. Abundance of plastid DNA insertions in nuclear genomes of rice and Arabidopsis. Plant Molecular Biology Reporter, 52: 923-934. Shappley, Z.W., 1994. RFLPs in cotton (Gossypium hirsutum L.): Feasibility of use, diversity among plants within a line, and establishment of molecular markers and linkage groups among two F2 populations, M.S. thesis. Mississippi State University, Mississippi State. Shappley, Z.W., Jenkins, J.N., Meredith, S.R. and McCarty, J.C., 1998. An RFLP linkage map of upland cotton, Gossypium hirsutum L. Theoretical and Applied Genetics, 97: 756-761. Siebert, J.D. and Stewart, A.M., 2006. 
Correlation of defoliation timing methods to optimize cotton yield, quality, and revenue. Journal of Cotton Science, 10: 146-154. Small, R.L., Ryburn, J.A., Cronn, R.C., Seelanan, T. and Wendel, J.F., 1998. The tortoise and the hare: choosing between noncoding plastome and nuclear Adh


sequences for phylogeny reconstruction in a recently diverged plant group. American Journal of Botany, 85: 1301-1315. Small, R.L., Ryburn, J.A. and Wendel, J.F., 1999. Low levels of nucleotide diversity at homoeologous Adh loci in allotetraploid cotton (Gossypium L.). Molecular Biology and Evolution, 16: 491-501. Small, R.L. and Wendel, J.F., 2000a. Copy number lability and evolutionary dynamics of the Adh gene family in diploid and tetraploid cotton (Gossypium). Genetics, 155: 1913-1926. Small, R.L. and Wendel, J.F., 2000b. Phylogeny, duplication, and intraspecific variation of Adh sequences in new world diploid cotton (Gossypium L., Malvaceae). Molecular Phylogenetics and Evolution, 16: 73-84. Smulders, M.J.M., Bredemeijer, G., Rus-Kortekaas, W., Arens, P. and Vosman, B., 1997. Use of short microsatellites from database sequences to generate polymorphisms among Lycopersicon esculentum cultivars and accessions of other Lycopersicon species. Theoretical and Applied Genetics, 97: 264–272. doi:10.1007/s001220050409. Sobotka, R., Dolanska, L., Curn, V. and Ovesna, J., 2004. Fluorescence-based AFLPs occur as the most suitable marker system for oilseed rape cultivar identification. Journal of Applied Genetics, 45: 161-173. Sokal, R.R. and Michener, C.D., 1958. A Statistical Method for Evaluating Systematic Relationships. The University of Kansas Scientific Bulletin, 38: 1409-1438. Song, Q.J., Quigley, C.V., Nelson, R.L., Carter, T.E., Boerma, H.R., Strachan, J.L. and Cregan, P.B., 1999. A selected set of trinucleotide simple sequence repeat markers for soybean cultivar identification. Plant Varieties and Seeds, 12: 207-220. Srivastava, A., Jena, S.N., Ranjan, A., Kavita, P., Asif, M.H., Bag, S.K., Shukla, R.P., Yadav, H.K. and Sawant, S.V., 2013. Development of molecular markers from Indian genotypes of two Gossypium L. species. Plant Breeding, 132: 506-513. Stephens, S.G., 1967. Evolution under domestication of the new world cottons (Gossypium spp). 
Ciencia E Cultura, 19: 118-134. Stewart, J.M., 1995. Potential for crop improvement with exotic germplasm and genetic engineering. In: F.N. Constable GA (Editor), Challenging the future.


Proceedings of the world cotton research conference-1, Brisbane Australia, CSIRO, Melbourne, 313-327. Stewart, J.M.C.D., Craven, L.A., Brubaker, C.L. and Wendel, J.F., 2008. Gossypium anapoides: a new species of Gossypium. Novon (in press). Stewart, J.M. and Benedict, P.A., 1976. Fiber initiation on the cotton ovule (Gossypium hirsutum). American Journal of Botany, 62: 723-730. Struss, D. and Plieske, J., 1998. The use of microsatellite markers for detection of genetic diversity in barley populations. Theoretical and Applied Genetics, 97: 308-315. Subramaniam, G., Palchamy, K., Robert, P., Eguru, R. and Jaw-Fen, W., 2011. Development of tomato SSR markers from anchored BAC clones of chromosome 12 and their application for genetic diversity analysis and linkage mapping. Euphytica, 178: 283-295. Suh, J.P., Jeung, J.U., Lee, J.I., Choi, Y.H., Yea, J.D., Virk, P.S., Mackill, D.J. and Jena, K.K., 2010. Identification and analysis of QTLs controlling cold tolerance at the reproductive stage and validation of effective QTLs in cold-tolerant genotypes of rice (Oryza sativa L.). Theoretical and Applied Genetics, 120: 985-995. Sun, D.L., Sun, J.L., Jia, Y.H., Ma, Z.Y. and Du, X.M., 2009. Genetic diversity of colored cotton analyzed by simple sequence repeat markers. International Journal of Plant Sciences, 170: 76-82. Sverdlov, V.E., Dukhanina, O.I., Hoebee, B. and Rapp, J.P., 1998. Linkage mapping of fifty-eight new rat microsatellite markers. Mammalian Genome, 9: 816-821. Syed, N.H., Lee, H.S., Mei, M., Thaxton, P., Stelly, D.M. and Chen, Z.J., 2001. Variability and evolution of microsatellite loci in cotton (Gossypium) diploid and polyploidy genomes. Plant & Animal Genome IX Conference. Town & Country Hotel, San Diego, CA, January, 13-17. Tabbasam, N., Zafar, Y. and Rahman, M., 2014. Pros and cons of using genomic SSRs and EST-SSRs for resolving phylogeny of the Genus Gossypium. Plant Systematics and Evolution, 300: 559-575. 
Tang, R., Gao, G., He, L., Han, Z., Shan, S., Zhong, R., Zhou, C., Jiang, J., Li, Y. and Zhuang, W., 2007. Genetic diversity in cultivated groundnut based on SSR markers. Journal of Genetics and Genomics, 34: 449-459.


Tanksley, S.D., 1983. Molecular markers in plant breeding. Plant Molecular Biology Reporter, 1: 3-8. Tanksley, S.D., 1993. Mapping polygenes. Annual Review of Genetics, 27: 205-233. Tanksley, S.D., Bernatzky, R., Lapitan, N.L. and Prince, J.P., 1988. Conservation of gene repertoire but not gene order in pepper and tomato. Proceedings of the National Academy of Sciences, USA, 85: 6419-6423. Tatineni, V., Cantrell, R.G. and Davis, D.D., 1996. Genetic diversity in elite cotton germplasm determined by morphological characteristics and RAPDs. Crop Science, 36: 186-192. Thiel, T., Michalek, V. and Graner, A., 2003. Exploiting EST databases for the development and characterization of gene derived SSR-markers in barley (Hordeum vulgare L.). Theoretical and Applied Genetics, 106: 411-422. Thomas, C.A., 1971. The genetic organization of chromosomes. Annual Review of Genetics, 5: 237-256. Thomson, M.J., Septiningsih, E.M., Suwardjo, F., Santoso, T.J., Silitonga, T.S. and McCouch, S.R., 2007. Genetic diversity analysis of traditional and improved Indonesian rice (Oryza sativa L.) germplasm using microsatellite markers. Theoretical and Applied Genetics, 114: 559-568. Thuillet, A.C., Bataillon, T., Sourdille, P. and David, J.L., 2004. Factors affecting polymorphism at microsatellite loci in bread wheat (Triticum aestivum L. Thell): effects of mutation processes and physical distance from the centromere. Theoretical and Applied Genetics, 108: 368-377. Tikhonov, A.P., SanMiguel, P.J., Nakajima, Y., Gorenstein, N.M., Bennetzen, J.L. and Avramova, Z., 1999. Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proceedings of the National Academy of Sciences, USA, 96: 7409-7414. Torada, A., Koike, M., Mochida, K. and Ogihara, Y., 2006. SSR-based linkage map with new markers using an intraspecific population of common wheat. Theoretical and Applied Genetics, 112: 1042-1051. Tyagi, P., Gore, M.A., Bowman, D.T., Campbell, B.T., Udall, J.A. 
and Kuraparthy, V., 2014. Genetic diversity and population structure in the US Upland cotton (Gossypium hirsutum L.). Theoretical and Applied Genetics, 127: 283-295. Ulloa, M., Saha, S., Jenkin, N., Meredith, W.R., McCarty, J.C. and Stelly, D.M., 2005. Chromosomal assignment of RFLP linkage groups harboring important


QTL on an intraspecific cotton (Gossypium hirsutum L.) join map. Journal of Heredity, 96: 132-144. Ulloa, M., Stewart, J.M.C.D., Garcia, C.E.A., Godboy, A.S.A., Gaytan, M. and Acosta, N.S., 2006. Cotton genetic resources in the western states of Mexico: in situ conservation status and germplasm collection for ex situ preservation. Genetic Resources and Crop Evolution, 53: 653-668. Ulloa, M., Wang, C. and Roberts, P.A., 2010. Gene action analysis by inheritance and quantitative trait loci mapping of resistance to root-knot nematodes in cotton. Plant Breeding, 129: 541-550. Ulloa, M., Wang, C., Hutmacher, R., Wright, S., Davis, R., et al., 2011. Mapping Fusarium wilt race 1 resistance genes in cotton by inheritance QTL and sequencing composition. Molecular Genetics and Genomics, 286: 21-36. Van Becelaere, G., Lubbers, E.L., Paterson, A.H. and Chee, P.W., 2005. Pedigree vs. RFLP based genetic similarity estimates in cotton. Crop Science, 45: 2281-2287. Varshney, R.K., Graner, A. and Sorrells, M.E., 2005. Genic microsatellite markers in plants: features and applications. Trends in Biotechnology, 23: 48-55. Varshney, R.K., Grosse, I., Hahnel, U., Siefken, R., Prasad, M., Stein, N., Langridge, P., Altschmied, L. and Graner, A., 2006. Genetic mapping and BAC assignment of EST-derived SSR markers shows non-uniform distribution of genes in the barley genome. Theoretical and Applied Genetics, 113: 239-350. Varshney, R.K., Marcel, T.C., Ramsay, L., Russell, J., Roder, M.S., Stein, N., Waugh, R., Langridge, P., Niks, R.E. and Graner, A., 2007. A high density barley microsatellite consensus map with 775 SSR loci. Theoretical and Applied Genetics, 114: 1091-1103. Vigouroux, Y., Jaqueth, J.S., Matsuoka, Y., Smith, O.S., Beavis, W.D., Smith, S.C. and Doebley, J., 2002. Rate and pattern of mutation at microsatellite loci in maize. Molecular Biology and Evolution, 19: 1251–1260. 
Vigouroux, Y., Mitchell, S., Matsuoka, Y., Hamblin, M., Kresovich, S., Smith, J.S.C., Jaqueth, J., Smith, O.S. and Doebley, J., 2005. An analysis of genetic diversity across the maize genome using microsatellites. Genetics, 169: 1617-1630. Vinogradov, A.E., 1999. Intron–genome size relationship on a large evolutionary scale. Journal of Molecular Evolution, 49: 376-384.


Wang, B.H., Guo, W.Z., Zhu, X.F., Wu, Y.T., Huang, N.T. and Zhang, T.Z., 2006. QTL mapping of fiber quality in an elite hybrid derived-RIL population of upland cotton. Euphytica, 152: 367-378. Wang, C., Guo, W., Cai, C. and Zhang, T., 2006a. Characterization, development and exploitation of EST-derived microsatellites in Gossypium raimondii Ulbrich. Chinese Science Bulletin, 51: 557-561. Wang, C., Ulloa, M. and Roberts, P.A., 2006b. Identification and mapping of microsatellite markers linked to a root-knot nematode resistance gene (rkn1) in Acala NemX cotton (Gossypium hirsutum L.). Theoretical and Applied Genetics, 112: 770-777. Wang, G., Schmalenbach, I., Korff, M.V., Leon, J., Kilian, B., Rode, J. and Pillen, K., 2010. Association of barley photoperiod and vernalization genes with QTLs for flowering time and agronomic traits in a BC2DH population and a set of wild barley introgression lines. Theoretical and Applied Genetics, 120: 1559-1574. Wang, H., Li, X., Gao, W., Jin, X., Zhang, X. and Lin, Z., 2014. Comparison and development of EST–SSRs from two 454 sequencing libraries of Gossypium barbadense. Euphytica, 198: 277-288. Wang, H., Wang, X., Chen, P. and Liu, D., 2007. Assessment of genetic diversity of Yunnan, Tibetan, and Xinjiang wheat using SSR markers. Journal of Genetics and Genomics, 34: 623-633. Wang, H.Y., Wei, Yu.M., Yan, Z.H. and Zheng, Y.L., 2007. EST-SSR DNA polymorphism in durum wheat (Triticum durum L.) collections. Journal of Applied Genetics, 48: 35-42. Wang, J., Lydiate, D.J., Parkin, I.A.P., Falentin, C., Delourme, R., Carion, P.W.C. and King, G.J., 2011. Integration of linkage maps for the Amphidiploid Brassica napus and comparative mapping with Arabidopsis and Brassica rapa. BMC Genomics, 12: 101-120. Wang, K., Guo, W. and Zhang, T., 2007. Development of one set of chromosome-specific microsatellite-containing BACs and their physical mapping in Gossypium hirsutum L. Theoretical and Applied Genetics, 115: 675-682. 
Wang, K., Song, X., Han, Z., Guo, W., Yu, J.Z., Sun, J., Pan, J., Kohel, R.J. and Zhang, T., 2006. Complete assignment of the chromosomes of Gossypium


hirsutum L. by translocation and fluorescence in situ hybridization mapping. Theoretical and Applied Genetics, 113: 73-80. Wang, K., Wang, Z., Li, F., et al., 2012. The draft genome of a diploid cotton Gossypium raimondii. Nature Genetics, 44: 1098–1103. Wang, X., Ma, J., Yang, S., Zhang, G. and Ma, Z., 2007. Assessment of genetic diversity among Chinese Upland cottons with Fusarium and/or Verticillium wilts resistance by AFLP and SSR markers. Frontiers of Agriculture in China, 1: 129-135. Wang, X., Yu, Y., Sang, J., Wu, Q., Zhang, X. and Lin, Z., 2013. Intraspecific linkage map construction and QTL mapping of yield and fiber quality of Gossypium barbadense. Australian Journal of Crop Science, 7: 1252-1261. Wangzehen, G., Caiping, C., Changbio, W., Zhiguo, Han., Xianliang, S., Kai, W., Xiaowei, N., Cheng, W., Keyu, Lu., Ben, S. and Tianzehen, Z., 2007. A microsatellite-based, gene rich linkage map reveals genome structure, function and evolution in Gossypium. Genetics, 176: 527-541. Waugh, R., Bonar, N., Baird, E., Thomas, B., Graner, A., Hayes, P. and Powell, W., 1997. Homology of AFLP products in three mapping populations of barley. Molecular Genetics and Genomics, 255: 311-321. Weller, J.I., Soller, M. and Brody, T., 1988. Linkage analysis of quantitative traits in an interspecific cross of tomato (L. esculentum x L. pimpinellifolium) by means of genetic markers. Genetics, 118: 329-339. Wendel, J. and Albert, V.A., 1992. Phylogenetics of the cotton genus (Gossypium): characteristics weighted parsimony analysis of chloroplast-DNA restriction site data and its systematic and biogeographic implications. Systematic Botany, 17: 115-143. Wendel, J.F., 1989. New world tetraploid cottons contain old-world cytoplasm. Proceedings of the National Academy of Sciences, 86: 4132-4136. Wendel, J.F. and Cronn, R., 2003. Polyploidy and the evolutionary history of cotton. Advances in Agronomy, 78: 139-186. Wendel, J.F., 1995. Cotton. In: N.W. 
Simmonds (Editor), Evolution of crop plants. Longman Scientific & Technical, Essex, England, 358-366. Wendel, J.F., 2000. Genome evolution in polyploids. Plant Molecular Biology, 42: 225-249.


Wendel, J.F., Brubaker, C., Alvarez, I., Cronn, R. and Stewart, J.M.C.D., 2009. Evolution and natural history of the cotton Genus. Springer book, Genetics and Genomics, 3: 3-22. Wendel, J.F., Brubaker, C.L. and Percival, A.E., 1992. Genetic diversity in Gossypium hirsutum and the origin of upland cotton. American Journal of Botany, 79: 1291-1310. Werker, E., 2000. Trichome diversity and development. Advances in Botanical Research, 31: 1-35. Basra, A.S. and Malik, C.P., 1984. Development of the cotton fiber. International Review of Cytology, 89: 65-113. Williams, J.G.K., Kubelik, A.R., Livak, K.J., Rafalski, J.A. and Tingey, S.V., 1990. DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Research, 18: 6531-6535. Wright, R.J., Thaxton, P.M., El-Zik, K.M. and Paterson, A.H., 1998. D-subgenome bias of Xcm resistance genes in tetraploid Gossypium suggests that polyploid formation has created novel avenues for evolution. Genetics, 149: 1987-1996. Wu, J., Jenkins, J.N., McCarty, J.C., Zhong, M. and Swindle, M., 2007a. AFLP marker associations with agronomic and fiber traits in cotton. Euphytica, 153: 153-163. Wu, Y.X., Daud, M.K., Chen, L. and Zhu, S.J., 2007b. Phylogenetic diversity and relationship among Gossypium germplasm using SSRs markers. Plant Systematics and Evolution, 268: 199-208. Xia, S., Cheng, L., Zu, F., Dun, X., Zhou, Z., Yi, B., Wen, J., Ma, C., Shen, J., Tu, J. and Fu, T., 2012. Mapping of BnMs4 and BnRf to a common microsyntenic region of Arabidopsis thaliana chromosome 3 using intron polymorphism markers. Theoretical and Applied Genetics, 124: 1193-1200. Xiao, Y.H., Li, D.M. and Yin, M.H., 2010. Gibberellin 20-oxidase promotes initiation and elongation of cotton fibers by regulating gibberellin synthesis. Journal of Plant Physiology, 167: 829-837. Xiao, J., Wu, K., David, D.F., David, M.S., John, Yu. and Roy, G.C., 2009. Breeding and genetics: new SSR markers for use in cotton (Gossypium spp.) improvement. 
Cotton Science, 13: 75-157. Xu, D.H. and Ban, T., 2004. Phylogenetic and evolutionary relationships between Elymus humidus and other Elymus species based on sequencing of non-coding


regions of cpDNA and AFLP of nuclear DNA. Theoretical and Applied Genetics, 108: 1443-1448. Xu, K., Xu, X., Ronald, P.C. and Mackill, D.J., 2000. A high-resolution linkage map of the vicinity of the rice submergence tolerance locus Sub1. Molecular and General Genetics, 263: 681-689. Xu, Q.H., Zhang, X.L. and Nie, Y.C., 2001. Genetic diversity evaluation of cultivars (G. hirsutum L.) from the Changjiang River valley and Yellow River valley by RAPD markers. Acta Genetica Sinica, 28: 683–690. Yao, Q., Yang, K., Pan, G. and Rong, T., 2007. Genetic diversity of maize (Zea mays L.) landraces from southwest China based on SSR data. Journal of Genetics and Genomics, 34: 851-859. Yin, J.M., Guo, W.Z., Yang, L.M., Liu, L.W. and Zhang, T.Z., 2006. Physical mapping of the Rf1 fertility restoring gene to a 100 kb region in cotton. Theoretical and Applied Genetics, 112: 1318-1325. Yu, J.W., Yu, S.X., Lu, C.R., Wang, W., Fan, S.L., Song, M.Z., Lin, Z.X., Zhang, X.L. and Zhang, J.F., 2007. High-density linkage map of cultivated allotetraploid cotton based on SSR, TRAP, SRAP and AFLP markers. Journal of Integrative Plant Biology, 49: 716-724. Yu, J., Kohel, R.J. and Dong, R.J., 2002. Development of integrative SSR markers from TM-1 BACs, Proceedings of Beltwide Cotton Improvement Conference, Atlanta, USA. Yu, J.Z., Kohel, R.J., Fang, D.D., Cho, J., Deynze, A.V., Ulloa, M., Hoffman, S.M., Pepper, A.E., Stelly, D.M., Jenkins, J.N., Saha, S., Kumpatla, S.P., Shah, M.R., Hugie, W.V. and Percy, R.G., 2012. A high-density simple sequence repeat and single nucleotide polymorphism genetic map of the tetraploid cotton genome. G3, 2: 43-58. Yu, Y., Yuan, D., Liang, S., Li, X., Wang, X., Lin, Z. and Zhang, X., 2011. Genome structure of cotton revealed by a genome-wide SSR genetic map constructed from a BC1 population between Gossypium hirsutum and G. barbadense. BMC Genomics, 12: 15-28. Zhang, M., Zheng, X. and Song, S., 2011. 
Spatiotemporal manipulation of auxin biosynthesis in cotton ovule epidermal cells enhances fiber yield and quality. Nature Biotechnology, 29:453-458.

Zhang, H.B., Li, Y., Wang, B. and Chee, P.W., 2008. Recent advances in cotton genomics. International Journal of Plant Genomics, doi:10.1155/2008/742304.

Zhang, J., 2003. Evolution by gene duplication: An update. Trends in Ecology and Evolution, 18: 292-298.

Zhang, J., Lu, Y., Cantrell, R.G. and Hughs, E., 2005a. Molecular marker diversity and field performance in commercial cotton cultivars evaluated in the southwest USA. Crop Science, 45: 1483-1490.

Zhang, J., Stewart, J.M. and Wang, T., 2005b. Linkage analysis between gametophytic restorer Rf2 gene and genetic markers in cotton. Crop Science, 45: 147-156.

Zhang, J., Lu, Y., Cantrell, R.G. and Hughs, E., 2005. Molecular marker diversity and field performance in commercial cotton cultivars evaluated in the Southwestern USA. Crop Science, 45: 1483-1490.

Zhang, J.F. and Stewart, J.M., 2001. Inheritance and genetic relationships of the D8 and D2-2 restorer genes for cotton cytoplasmic male sterility. Crop Science, 41: 289-294.

Zhang, W., Sun, P., He, Q., Shu, F., Wang, J. and Deng, H., 2013. Fine mapping of GS2, a dominant gene for big grain rice. The Crop Journal, 1: 160-165.

Zhao, L., Yuanda, L., Caiping, C., Xiangchao, T., Xiangdong, C., Wei, Z., Hao, D., Xiuhua, G. and Wangzhen, G., 2012. Toward allotetraploid cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA sequence information. BMC Genomics, 13: 539-555.

Zhao, X.P., Si, Y., Hanson, R.E., Crane, C.F., Price, H.J., Stelly, D.M., Wendel, J.F. and Paterson, A.H., 1998. Dispersed repetitive DNA has spread to new genomes since polyploid formation in cotton. Genome Research, 8: 479-492.

Zheng, C., Wall, P.K., Leebens-Mack, J., de Pamphilis, C., Albert, V. and Sankoff, D., 2008. The effect of massive gene loss following whole genome duplication on the algorithmic reconstruction of the ancestral Populus diploid. In Proceedings of CSB: 08.

Zhong, M., McCarty, J.C., Jenkins, J.N. and Saha, S., 2002. Assessment of day-neutral backcross populations of cotton using AFLP markers. Journal of Cotton Science, 6: 97-103.

Zhu, Y.L., Song, Q.J., Hyten, D.L., Van Tassell, C.P., Matukumalli, L.K., Grimm, D.R., Hyatt, S.M., Fickus, E.W., Young, N.D. and Cregan, P.B., 2003. Single-nucleotide polymorphisms in soybean. Genetics, 163: 1123-1134.

APPENDIX

Table 1 List of chemicals and enzymes used in the study.

Product                                         Supplier

a-Chemicals
β-Mercaptoethanol                               MP Biomedicals, Inc. USA
Agarose                                         Research Organics, Inc.
Boric acid                                      MP Biomedicals, Inc. USA
Bromophenol blue                                Research Organics, Inc. USA
Chloroform                                      MP Biomedicals, Inc. USA
Calcium chloride                                Fermentas, USA
Cetyltrimethylammonium bromide (CTAB)           MP Biomedicals, Inc. USA
Deoxyadenosine 5' triphosphate (dATP)           Fermentas, USA
Deoxycytidine 5' triphosphate (dCTP)            Fermentas, USA
Deoxyguanosine 5' triphosphate (dGTP)           Fermentas, USA
Deoxythymidine 5' triphosphate (dTTP)           Fermentas, USA
Ethylene diamine tetra acetate (EDTA)           MP Biomedicals, Inc. USA
Ethanol absolute                                MP Biomedicals, Inc. USA
Ethidium bromide                                Sigma Chemicals, USA
Gelatin                                         Sigma Chemicals, USA
Hydrochloric acid                               MP Biomedicals, Inc. USA
iso-amyl alcohol                                MP Biomedicals, Inc. USA
Isopropanol                                     MP Biomedicals, Inc. USA
Metaphor™ Agarose                               Cambrex Corporation, USA
Magnesium chloride                              Fermentas, USA
Polyvinylpyrrolidone (PVP)                      MP Biomedicals, Inc. USA
Sodium chloride                                 Research Organics, Inc. USA
Sucrose                                         Research Organics, Inc. USA
Tris(hydroxymethyl)aminomethane (Tris)          MP Biomedicals, Inc. USA
bisBENZIMIDE Trihydrochloride (Hoechst 33258)   Sigma Chemicals, USA

b-Enzymes
RNase A                                         Fermentas, USA
Taq DNA polymerase                              Fermentas, USA
MseI                                            Fermentas, USA
EcoRI                                           Fermentas, USA
HindIII                                         Fermentas, USA
BamHI                                           Fermentas, USA
XbaI                                            Fermentas, USA
T4 DNA ligase                                   Fermentas, USA

Table 2 Reagent kits used in the study.

Kit                     Supplier
DNA extraction kit      Fermentas, USA
Gene cloning kit        Fermentas, USA
PCR purification kit    Fermentas, USA

Table 3 Composition of buffers used in genomic DNA isolation.

2X CTAB (isolation buffer)         1X Tris-borate EDTA (TBE) buffer   0.1X Tris-EDTA (TE) buffer
EDTA                20.0 mM        Tris-borate     45 mM              EDTA (pH 8.0)   0.1 mM
Tris                100.0 mM       EDTA            1 mM               Tris (pH 8.0)   10.0 mM
NaCl                1.4 M          pH of buffer    8.0
CTAB                2.0% (w/v)
PVP                 1.0% (w/v)
β-Mercaptoethanol   0.2% (v/v)
pH of buffer        8.0

Table 4 Recipes of buffers and solutions for the DyNA Quant™ 200 Fluorometer.

10X Tris-NaCl-EDTA (TNE) buffer    Assay solution
EDTA    10 mM                      TNE buffer              1X
Tris    100 mM                     Hoechst dye (H 33258)   0.0001% (w/v)
NaCl    2 M

Table 5 Recipes of buffers and gel used for DNA quantification using gel method.

5X Tris-Borate-EDTA (TBE) buffer   0.8% Agarose gel
Boric acid   445 mM                Agarose            0.80% (w/v)
Tris         445 mM                TBE buffer         0.50X
EDTA         10 mM                 Ethidium bromide   0.05% (w/v)
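
Tables 4 and 5 list concentrated stocks (10X TNE, 5X TBE) together with the working strengths actually used (1X TNE, 0.50X TBE). The dilution from stock to working strength follows the usual relation C1 × V1 = C2 × V2. The sketch below is only an illustration of that arithmetic: the function name and the batch volumes (1000 mL and 100 mL) are assumptions made for the example and are not taken from the thesis.

```python
def stock_volume_ml(stock_x: float, working_x: float, final_volume_ml: float) -> float:
    """Return the volume of stock (mL) needed, from C1 * V1 = C2 * V2."""
    if stock_x <= 0 or working_x <= 0 or final_volume_ml <= 0:
        raise ValueError("concentrations and volume must be positive")
    if working_x > stock_x:
        raise ValueError("working strength cannot exceed the stock strength")
    return working_x * final_volume_ml / stock_x


# Hypothetical batch sizes, chosen only for illustration.
tbe_batch_ml = 1000.0
tne_batch_ml = 100.0

tbe_stock_ml = stock_volume_ml(5.0, 0.5, tbe_batch_ml)    # 5X TBE -> 0.50X
tne_stock_ml = stock_volume_ml(10.0, 1.0, tne_batch_ml)   # 10X TNE -> 1X

print(f"0.50X TBE: {tbe_stock_ml:.0f} mL of 5X stock + {tbe_batch_ml - tbe_stock_ml:.0f} mL water")
print(f"1X TNE: {tne_stock_ml:.0f} mL of 10X stock + {tne_batch_ml - tne_stock_ml:.0f} mL water")
```

Under these assumed batch sizes, one tenth of the final volume comes from the stock in both cases, because both dilutions are tenfold.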

Table 6 List of EST-SSRs used to screen the parental species (FH-1000 and PGMB-36).

Sr. #  SSR name  Forward primer  Reverse primer  P/M
1  MGHES1a  GACATGAGGAAGCAGTTGAAAGG  GCATCACCTGAACAACATCCACC  M
2  MGHES1b  ACAGGGCAGCGTTTAATTTG  CACCTGAACAACATCCACCA  M
3  MGHES3  TCTCTCAAAATCTCAAACCCAGA  GCTTAGGGCAAACCACTGAA  M
4  MGHES4  GCCGGTTCCTTTGACCAC  CCCGCATCGTCATTAACTTT  M
5  MGHES5  ATTTGCGGGTGGAGAAGAC  TGGCGATTGAACAACAAAGA  M
6  MGHES6  TCGCTTGACTTTCCATTTCC  AACCCTCGGGATTATCGTCT  M
7  MGHES7  CCTTCTTCAACACCAATCTCC  TGCATTTCTGCTGAGTACCG  M
8  MGHES12  GTTTCCAGGACAGAAAGGTGTC  GAGTTCCCAGTTACAGAGGC  M
9  MGHES13  CAGGGGAGCCATTGTTAGAA  CAGGGGTCCTGTGTTTCAGT  M
10  MGHES14  GAGGAGGCTGTGGTTGAAGA  ATGGTGACCCTGCTTACACC  M
11  MGHES15  AATCGAAGCGTTTCATCACC  CGAAGATCTTGGACAGACGA  M
12  MGHES16  ACCCCAATACAACCCCATTT  GCAGAGAAAAGGGACAGAGG  M
13  MGHES18  GCCATCAATTGGTGAAGCAT  ATGCCTCGGTGAGAAAATTG  M
14  MGHES19  CACCGATCAGATAGCAGCAG  TGGCGTCTCAGAGATGAAGT  M
15  MGHES20  CGAACCCTAGCTTTCAGTCG  AGTCGACGGCTTCAGATTGT  M
16  MGHES21  TTTTTCGGGCTATGCTTTTG  GGGGTTGACATGTCCTATGC  M
17  MGHES22  GGAACAGAGGCAACTGAGGA  TCGAAGGCACAGAGAAGGTT  M
18  MGHES23  AGCCGCATCACTTTTTGCTA  TCAAAAACAGAAGCACCAAGG  M
19  MGHES26  AAGGGGAGGTTTTGTGTAAGG  GACAAGAACCAGCTCCCAAA  M
20  MGHES28  CCTGCAAACGCTATTGATCC  CCCAGACTGGTGATGATGAA  M

Table 7: Positive hits and identity scores of primer sequences mapped against the whole-genome sequence (WGS) of G. raimondii

Sr. No.  Primer   Positive hits with D-genome  Identity score
1        PR-3     D10(c20)                     95%
2        PR-5     D8(c24)                      100%
3        PR-7     D2(c14)                      99%
4        PR-1a    D2(c14)                      99%
5        PR-11    D2(c14)                      99%
6        PR-15    D4(c22)                      99%
7        PR-23    D10(c20)                     99%
8        PR-25    D4(c22)                      99%
9        PR-35    D8(c24)                      99%
10       PR-43    D10(c20)                     99%
11       PR-48    D8(c24)                      99%
12       PR-55    D11(c21)                     99%
13       PR-87    D13(c18)                     99%
14       PR-98    D11(c21)                     100%
15       PR-19    D7(c16)                      100%
16       PR-127   D5(c19)                      100%
17       PR-128   D1(c15)                      99%
18       PR-131   D2(c14)                      95%
19       PR-145   D13(c18)                     99%
20       PR-173   D4(c22)                      100%
21       PR-174   D4(c22)                      99%
22       PR-181   D12(c26)                     99%
23       PR-187   D4(c22)                      99%
24       PR-191   D10(c20)                     100%
25       PR-23    D10(c20)                     99%
26       PR-28    D9(c23)                      98%
27       PR-29    D9(c23)                      100%
28       PR-216   D5(c19)                      99%
29       PR-23    D13(c18)                     100%
30       PR-241   D3(c17)                      99%
31       PR-248   D4(c22)                      99%
32       PR-264   D11(c21)                     99%
33       PR-266   D8(c24)                      99%
34       PR-278   D13(c18)                     99%
35       PR-28    D9(c23)                      99%
36       PR-319   D1(c15)                      99%
37       PR-321   D10(c20)                     96%
38       PR-334   D8(c24)                      99%
39       PR-337   D11(c21)                     100%
40       PR-342   D2(c14)                      100%
41       PR-355   D5(c19)                      99%
42       PR-358   D2(c14)                      99%
43       PR-359   D4(c22)                      99%
44       PR-361   D13(c18)                     99%
45       PR-372   D2(c14)                      99%
46       PR-384   D12(c26)                     100%
47       PR-39    D11(c21)                     99%
48       PR-426   D13(c18)                     99%
49       PR-43    D8(c24)                      99%
50       PR-435   D9(c23)                      100%
51       PR-443   D11(c21)                     100%
52       PR-447   D4(c22)                      99%
53       PR-45    D11(c21)                     99%
54       PR-467   D4(c22)                      99%
55       PR-475   D8(c24)                      99%
56       PR-478   D4(c22)                      99%
57       PR-492   D13(c18)                     99%
58       PR-5     D1(c15)                      99%
59       PR-514   D13(c18)                     98%
60       PR-524   D11(c21)                     99%
61       PR-531   D10(c20)                     99%
62       PR-536   D2(c14)                      100%
63       PR-542   D5(c19)                      99%
64       PR-548   D13(c18)                     99%
65       PR-549   D9(c23)                      99%
66       PR-553   D1(c15)                      91%
67       PR-555   D8(c24)                      94%
68       PR-563   D9(c23)                      99%
69       PR-578   D2(c14)                      100%
70       PR-581   D7(c16)                      100%
71       PR-592   D6(c25)                      100%
72       PR-593   D1(c15)                      100%
73       PR-599   D5(c19)                      99%
74       PR-64    D1(c15)                      100%

Table 8: Description of candidate genes flanking QTL-linked markers

Marker name  Gene ID          Description
PR-443       Gorai011G161400  Dihydroxy-acid dehydratase (G. raimondii)
             Gorai011G161300  Chalcone synthase 1-like (G. raimondii)
             Gorai011G161500  1,4-alpha-glucan-branching enzyme 2-1, chloroplastic/amyloplastic-like isoform X1 (G. raimondii)
             Gorai011G161200  Chalcone synthase 1-like (G. raimondii)
             Gorai011G161000  Ankyrin repeat domain-containing protein 13C-B-like (G. raimondii)
             Gorai011G160900  Vacuolar amino acid transporter 1 (G. raimondii)
             Gorai011G160800  Protein sulfur deficiency-induced 2-like (G. raimondii)
             Gorai011G161100  28S ribosomal S35, mitochondrial (G. arboreum)
             Gorai011G160000  Uncharacterized protein (G. raimondii)
             Gorai011G160100  Hypothetical protein B456 (G. raimondii)
             Gorai011G160200  Glutathione reductase, chloroplastic-like (G. raimondii)
             Gorai011G160300  Non-lysosomal glucosylceramidase-like isoform X2 (G. raimondii)
             Gorai011G160400  Hypothetical protein B456 (G. raimondii)
             Gorai011G161600  Uncharacterized protein (G. raimondii)
             Gorai011G161700  3-Ketoacyl-CoA thiolase 2, peroxisomal-like (Sesamum indicum)
             Gorai011G162700  Uncharacterized protein (G. raimondii)
             Gorai011G162800  Uncharacterized protein isoform X2 (G. raimondii)
             Gorai011G163200  Serine incorporator 3 (G. arboreum)
             Gorai011G163300  Fiber protein Fb20 (G. barbadense)
             Gorai011G163500  Cakin-2-like isoform X1 (G. raimondii)
PR-278       Gorai013G177100  9-cis-epoxycarotenoid dioxygenase NCED3, chloroplastic-like (G. raimondii)
             Gorai013G176900  Gibberellin 2-beta-dioxygenase 1 (Theobroma cacao)
             Gorai013G175100  Uncharacterized protein (G. raimondii)
             Gorai013G174900  Sister chromatid cohesion PDS5 (G. arboreum)
             Gorai013G175200  Protein trichome birefringence-like 41 isoform X2 (G. raimondii)
             Gorai013G175300  Uncharacterized protein (G. raimondii)
             Gorai013G176600  Glutelin type-A1 (G. arboreum)
             Gorai013G176700  Hypothetical protein B456 (G. raimondii)
             Gorai013G176800  Cationic amino acid transporter 8, vacuolar (G. raimondii)
             Gorai013G177000  Uncharacterized protein (G. raimondii)
             Gorai013G177400  Probable glutathione S-transferase (G. raimondii)
             Gorai013G177500  Serine/threonine protein phosphatase 2A 55 kDa regulatory subunit B beta isoform X1 (Fragaria vesca subsp. vesca)
             Gorai013G177200  Probable glutathione S-transferase (G. raimondii)
             Gorai013G177800  Feruloyl CoA ortho-hydroxylase 1-like isoform X1 (G. raimondii)
PR-536       Gorai002G077400  tRNA (guanine(37)-N1)-methyltransferase 1 isoform X1 (G. raimondii)
             Gorai002G077200  MYB-related protein 330 (G. raimondii)
             Gorai002G077100  Diacylglycerol kinase-like isoform X2 (G. raimondii)
             Gorai002G077300  Methyltransferase 1 isoform X2 (G. raimondii)
             Gorai002G076100  Uncharacterized/hypothetical protein (G. raimondii)
             Gorai002G075900  B3 domain-containing protein Os07g0563300-like (G. raimondii)
             Gorai002G076800  Dystrophin (G. raimondii)
             Gorai002G073300  Uncharacterized protein (G. raimondii)
             Gorai002G073400  Hypothetical protein B456 (G. raimondii)
             Gorai002G073500  TMV resistance protein N-like (G. raimondii)
             Gorai002G073600  ABC transporter G family member 9-like (G. raimondii)
             Gorai002G073700  NAC domain-containing protein 72-like (G. raimondii)
             Gorai002G075300  Hypothetical protein B456 (G. raimondii)
             Gorai002G075400  Hypothetical protein B456 (G. raimondii)
             Gorai002G075500  Hypothetical protein B456 (G. raimondii)
             Gorai002G075600  Hypothetical protein B456 (G. raimondii)
             Gorai002G075700  Uncharacterized protein (G. raimondii)
             Gorai002G075800  Dystrophin (G. raimondii)
             Gorai002G076000  Protein RRP5 (G. raimondii)
             Gorai002G076200  Hypothetical protein B456 (G. raimondii)
PR-500       Gorai001G173900  Vestitone reductase-like (G. raimondii)
             Gorai001G174000  Dihydroflavonol 4-reductase (G. raimondii)
             Gorai001G171900  Hypothetical protein B456 (G. raimondii)
             Gorai001G172000  Protein-lysine N-methyltransferase EFM1 isoform X1 (G. raimondii)
             Gorai001G172100  Hypothetical protein B456 (G. raimondii)
             Gorai001G172200  Agamous-like MADS-box protein AGL6 isoform X1 (G. raimondii)
             Gorai001G172300  Protein lap 1 (G. raimondii)
             Gorai001G172500  U-box domain-containing protein 4 isoform X1 (G. raimondii)
             Gorai001G172600  Peroxisome biogenesis protein 16 (G. raimondii)
             Gorai001G172700  ATPase ASNA1 homolog isoform X2 (Nelumbo nucifera)
             Gorai001G172800  Polygalacturonate 4-alpha-galacturonosyltransferase-like (G. raimondii)
             Gorai001G172900  Hypothetical protein B456 (G. raimondii)
             Gorai001G173300  Early nodulin-55-2-like (G. raimondii)
             Gorai001G173100  Glyoxylate/hydroxypyruvate reductase HPR3-like isoform X1 (G. raimondii)
             Gorai001G173200  Probable carboxylesterase 9 (G. raimondii)
             Gorai001G173400  Spastin (G. raimondii)
             Gorai001G173600  Hypothetical protein B456 (G. raimondii)
             Gorai001G173700  Uncharacterized protein (G. raimondii)
             Gorai001G173800  Hypothetical protein B456 (G. raimondii)
PR-264       Gorai11G273400   Rop guanine nucleotide exchange factor 5-like (G. raimondii and G. arboreum)
             Gorai11G272900   Cytochrome P450 81E8-like (G. raimondii)
             Gorai11G272400   Cucumisin precursor, putative (Ricinus communis)
             Gorai11G270900   Protein DA1-related 2 isoform X1 (G. raimondii)
             Gorai11G270500   Protein NEDD1-like isoform X2 (G. raimondii)
             Gorai11G270100   Disease resistance protein RPP8-like isoform X1 (G. raimondii)
             Gorai11G270200   Protein NEDD1-like isoform X2 (G. raimondii)
             Gorai11G267600   Chloroplast ribulose bisphosphate carboxylase/oxygenase activase beta 2 (G. barbadense)
             Gorai11G267900   Thylakoid lumenal 15 kDa protein 1, chloroplastic (G. raimondii)
             Gorai11G268100   Probable transcriptional regulator RABBIT EARS (G. raimondii)
             Gorai11G268200   Hypothetical protein B456 (G. raimondii)
             Gorai11G268700   Ultraviolet-B receptor UVR8-like isoform X3 (G. raimondii)
             Gorai11G270300   Protein NEDD1-like isoform X2 (G. raimondii)
             Gorai11G271500   Hypothetical protein B456 (G. raimondii)
             Gorai11G273000   Hypothetical protein B456 (G. raimondii)
             Gorai11G274200   Uncharacterized protein (G. raimondii)
             Gorai11G274000   Hypothetical protein B456 (G. raimondii)
             Gorai11G274600   Hypothetical protein B456 (G. raimondii)
             Gorai11G275000   7-deoxyloganetic acid glucosyltransferase-like isoform X1 (G. raimondii)

Table 9: SSR primer sequences, repeat motifs and their interspecific polymorphism status.

GR-SSR  ID  Repeat motif  Forward Primer (5' - 3')  Reverse Primer (3' - 5')  P/M
PR-GR-BESS-1  gr172E02.f  (TC)5  GTACCGTTGGATTTGAGGAG  CAATAAAATAAGAACAACCCAC  M
PR-GR-BESS-2  gr167L17.f  (AT)5  GGTGTGTTAGAAATTGATTTTGG  TGAGCACTTTCATAGCC  M
PR-GR-BESS-3  gr165E21.f  (TC)5(TTA)7  GGACGGAAGTGGGGAGTA  GGCCAAATAAAATGTAAACC  P
PR-GR-BESS-4  gr172I23.r  (CA)5  CCACTTCTGGCACCATAACA  AAGGCCCTAGGTTTGCAGAT  M
PR-GR-BESS-5  gr168B17.f  (TTG)  GGGTGGCTCAGAGGTAACTG  GCCGCAAATGTAGACAACAA
PR-GR-BESS-6  gr170I06.r  (AG)10  GCATCACGATAGCAAACCAA  GGACATGGGAAGCACAGT  M
PR-GR-BESS-7  gr174O20.r  (ATG)8  CAAACCACATCCATTTTAGAAC  TGGTGTTCTTAGGGAGCTACG  P
PR-GR-BESS-8  gr177H07.f  (AT)5  GGCAACTAGCAGGCAACAA  TACTGCAATGGTCCACGAAG  M
PR-GR-BESS-9  gr175F05.f  (TA)5  TGAGCCATCACATTCCACT  CATCAATGAAGGAATGGAGG  N
PR-GR-BESS-10a  gr177L02.r  (TA)6(AT)5  ACCTATTACATTTCATATCTGTG  GTGTGTGTGTGTGCATTTGG  P
PR-GR-BESS-10b  gr177L02.r  (AT)7  CCCCAATTGCGTATAAGGAT  TCAGCGTAGGTGGTTGAT  M
PR-GR-BESS-10c  gr177L02.r  (CA)18  TATGCCAAATGCACACACAC  AAGCGCCAGTGAATCTGT  P
PR-GR-BESS-11  gr177L23.r  (TA)5(AT)7(TTA)9  CTTTGGCATGTTCAATAGGG  CTGTGCATGTGTTTATAAAAG  P
PR-GR-BESS-12  gr173E10.f  (CTT)12  ACCACCAAAACGTGGAGAAG  AGAAGACAACCTTGGCGAGA  M
PR-GR-BESS-13  gr175I08.f  (AT)12  GAGAAACTTTTCAACAGCAG  AACAAATAGGACGGGGGAAA  M
PR-GR-BESS-14  gr179L21.f  (AT)8  GCCTAACCATTTGATATTTTTG  AACGCCGCTAAATTTCAT  M
PR-GR-BESS-15  gr179O12.r  (AAT)16  GGCCAACTTGACCCTCTT  TCTGTCCGAGGAGAACCT  P
PR-GR-BESS-16  gr175M19.r  (AG)5  CTTCCAAGTTTCAGCCCAGT  GGGGCTAAAAACCAGAAGGA  M
PR-GR-BESS-17  gr175N15.r  (TA)5(AT)9  GTACTTATCATGTGCATAAAATTG  CATGGAGCCTATGTGTTTGA  M
PR-GR-BESS-18  gr178A05.f  (AT)6  GGGCGGTGAGAGAGAAAAG  TTCCCTTACATGATTCAATGCT  M
PR-GR-BESS-19  gr178B10.f  (TA)8  TGGTTTAACGGGATCAAAATG  CGAGGCATTACTGAACAAAAC  NA
PR-GR-BESS-20  gr178D12.f  (AT)7  GTGGGAACACCACAAAATCC  CTTCCATGCCAAAGTGTGA  M
PR-GR-BESS-21a  gr176H13.f  (TA)6  CATGTCTTAATGATGGAGATCAC  TTGGGGGATCTAGAATCGA  M
PR-GR-BESS-21b  gr176H13.f  (TA)5  GCCGACACACACTCAT  GGATGCCTGCTATCATTCC  M
PR-GR-BESS-22  gr178L12.r  (AT)5  GGGTAGGGACATTTTCTG  CGCGTCCAAATCTAAACC  M
PR-GR-BESS-23  gr178N22.r  (TA)5  AAGGCCTTGGTAATGATAGCC  CAACACTGAGGCTTTGCTCA  P
PR-GR-BESS-24  gr174J12.f  (AT)6  GCATCTAAATTAGGGGTAGG  GCCCGAAAAGACCATCAGTA  NA
PR-GR-BESS-25  gr177C02.r  (TC)8(CTT)7  TCCACTCAAAGGCCAATCTC  AAGGATGAAGGAGGAGGTG  P
PR-GR-BESS-26  gr178P18.r  (TC)8  CTAGAGCCCAACGCTCACTC  GCTTGGACATCCTTTTGGAA  M
PR-GR-BESS-27  gr179C12.r  (TA)10  CGTAGAATGCTTACCAATG  GCGTTCGATTATACACATTTG  M
PR-GR-BESS-28  gr179C21.r  (TA)13  GGCTCTTCCACGGAAAGA  GATCTTATTATGTTTGGGCTTG  M
PR-GR-BESS-29  gr181J09.f  (ATT)5  GGAATCTAGTCTTTTTACTTTTCAG  GTCACAAGTCTCTGCACTG  NA
PR-GR-BESS-30  gr181K02.r  (TA)8  TGCACCAACATTTTAAACC  TGGACGAGGTCAGAACATC  M

PR-GR-BESS-31  gr186G24.f  (TA)5  TGCACCAAGGAATGAGTA  CAGCCAAGTGAGTGGCTTAG  M
PR-GR-BESS-32  gr186H03.f  (TA)5  CTCAGGGATCGTTAGCGAAG  CATTCAGTTCAGTTAAACCG  M
PR-GR-BESS-33  gr186J17.r  (TA)5  AGCAAGGCTTTCCTTTCTCA  TGAAATCAACCCCTTCTCCA  NA
PR-GR-BESS-34  gr182C07.r  (TC)5  ATCCGTACCATTGGATTTGG  ACCAACACAAACGAAAGC  M
PR-GR-BESS-35  gr182D09.r  (GA)8  CCTGGGGATCAACTATCC  GCCTTGCTACAGGGATTG  P
PR-GR-BESS-36  gr184G03.r  (AT)5  ACGTTGAGTTAAATGGTG  GGCCTACGAGACCCGATAC  M
PR-GR-BESS-37  gr187C11.f  (TA)16  GTCGCTTAGGGCTTGCTAC  GGGTAAACTACACTCAAGAC  M
PR-GR-BESS-38  gr182F20.f  (TA)10  TGAAATGGGATCACAGCA  GCTGAATTCGGTGAACAAC  N
PR-GR-BESS-39  gr182H24.f  (TA)5  CCTTGATGGTTGGAGGTGA  CATATTTGACCATAACTTATCC  M
PR-GR-BESS-40  gr182H24.r  (TA)6  CTATAATATGTGAAGGTAGG  CGGTACCGTTTTATGTGGTTG  M
PR-GR-BESS-41  gr184N22.r  (TA)5  AGGTGGTTGGTGTGTGCA  CCAAGCTAACTTGCCAAC  M
PR-GR-BESS-42  gr187H14.r  (TA)5  TCAACAACCAGGCCCATA  CCATCCTCCCCAGTCCTAC  M
PR-GR-BESS-43  gr187O05.f  (TA)5  CCATTCTATTTCAAATTCATG  GCAGGGTCAAGACGAGAATG  P
PR-GR-BESS-44  gr190B17.f  (TA)5  CGAGCTTTGGGTCAAAAGAG  GGTCTAACTGCGGATTTTGC  NA
PR-GR-BESS-45  gr183I15.r  (AT)5  CTCATATTCAAAATTCGGAACC  GGGCCTAAAGTTTCACACG  M
PR-GR-BESS-46  gr188G08.f  (TA)6  GGACCAAAAACGGAGATG  GGGATAGTGGGAAGTGATG  M
PR-GR-BESS-47  gr188I16.r  (TA)6  GGGAAATGCAAAAATTTCGAG  CCCACATGTTTCAATTTATTCC  M
PR-GR-BESS-48  gr190N03.r  (ATA)14  GGAAATATAATAATGCAA  CCCCCAAATTTTGTTTTTAAAG  P
PR-GR-BESS-49  gr186C02.r  (TG)5  CTCGGATTGCAGTTTTGTG  CACTATGCAATCTAACCATCTGC  M
PR-GR-BESS-50  gr186C20.f  (AT)5  GGGTGTGAGTAAAAATTCC  TCACCGCTAATATGCACA  NA
PR-GR-BESS-51  gr183O14.r  (TA)5  CAAGCTTGTGTGAGCAACC  GGTATCCCCCTGGCATTCT  M
PR-GR-BESS-52  gr190O22.r  (AT)13  CGAAGCTTAATTTGTTTGAC  CAAATAAGGCTAGAAGGTT  M
PR-GR-BESS-53  gr188M01.r  (TA)9  GCATGAATGGTATGGCTCA  GGCTCCTCAAATGCAAAC  NA
PR-GR-BESS-54  gr188M17.r  (TA)5  CACACCACAAGTCTCACA  GCATTGTTCCAGGGTATCC  M
PR-GR-BESS-55  gr191D09.f  (AT)5(AAT)6  AGGCATCGATAGCAATGA  CCACTCTTTTGTTCCCACA  P
PR-GR-BESS-56  gr191H16.f  (AG)7(TC)10(ATA)5  ATCCAAACACGCAGACAC  GTTTTGGGGTTGAAATGGTG  M
PR-GR-BESS-57  gr196D22.r  (AAC)25  ATCAATCCGCAGTCTGAAGC  CATCACCGGTTCAAAATCA  NA
PR-GR-BESS-58  gr196E18.r  (TG)5  CACACTATCAACAGACCATC  CCAAAAACTGTTCCATTGC  M
PR-GR-BESS-59  gr194C24.f  (TA)5  TGCTGTTGGAAGTGGTGC  TCACCCACAGTTGGAAAC  NA
PR-GR-BESS-60  gr196F14.f  (TTA)5  GGTGGATGATAGCATGTAGG  CGTAAAAGGTGAAAAAGTAG  M
PR-GR-BESS-61  gr192A14.f  (TA)7  CCCTGCCAAAGTAACCTC  GAAAGGGGGTTGTCAAGG  M
PR-GR-BESS-62  gr192F13.f  (TAT)6  GACCATAGTTGTAATGTGGC  GCCTTGACAAATCATAGCTG  NA
PR-GR-BESS-63  gr192M20.r  (AT)8(AT)5  CAAGCTTATGTATGTAGGT  GTTCAAATTTCACAAGTTACG  M
PR-GR-BESS-64  gr198A01.r  (CAC)7  CACATTTCTTCGGGACTCG  CATTGTCCCGTCTAGCCA  NA
PR-GR-BESS-65  gr193I24.f  (TA)9  CCAATCAAACCGAATTC  CAAGCTCGACTAAGCTCATAC  NA
PR-GR-BESS-66  gr198C21.f  (TC)5  CACAAACATCATCATTACCTTC  GAAAAGGTGAAGAATTTTGAGG  NA
PR-GR-BESS-67  gr204B09.f  (TA)5  TCGATCGGTTGGAATCAAG  GGCAATTATCAATTAACCAA  M
PR-GR-BESS-68  gr209L01.r  (AT)5  GATGATGGAGAGGATGAAGG  AGGGATGAAATCGTAAAGGTG  M
PR-GR-BESS-69  gr206L18.r  (AG)6  CCACCAAAATTGTGAGAACG  TGGCCCCAAAATTAAAATCC  M
PR-GR-BESS-70  gr204C20.f  (GAT)9  ATCACTTGGCAATGCTCC  TCAAGGCAAAATGTCCAACA  NA

PR-GR-BESS-71  gr206M19.r  (AT)18  TCATGTCATATGCCAAACTTCC  TGTGGATTCAAATGGAAAGA  M
PR-GR-BESS-72  gr204I17.r  (AT)5  ACTGGACCCTTCAATGGAGA  GGTCCTGTCATGGACGAG  NA
PR-GR-BESS-73  gr202A11.r  (TA)6(TA)7  GTATTCAGTTGTAAGGTGTTG  GAATTCGCAAATACCACA  M
PR-GR-BESS-74  gr204L04.f  (AT)5  CCTCGGGTTCTTTTGCTG  TGCTGCCATTAATTGCTC  M
PR-GR-BESS-75  gr204L19.f  (ATG)10  CCAAAGGAGTTGTCACCA  CGTCACTCCATGTGATACGC  M
PR-GR-BESS-76a  gr202H19.r  (CT)6  GACACACTCTTCAGTTTTAGG  TGAATGAAAATTGAGCACC  M
PR-GR-BESS-76b  gr202H19.r  (AT)5  GGGTCAGCTTTTAATTTTACCG  GAATGCAGTATAAATCTTGTG  M
PR-GR-BESS-77  gr202J02.r  (AT)5  GCTAATTCACATTGGTTAGGG  CCGAATATACCACATGCATTTA  M
PR-GR-BESS-78  gr205B17.f  (TG)9  AGTGTGTGCGAATTGTGCAT  CACAGGCATACAAAAACCC  NA
PR-GR-BESS-79  gr205B17.f  (AT)5(TA)5  AGTGTGTGCGAATTGTGCA  CACAGGCATACAAAAACCC  M
PR-GR-BESS-80  gr205D15.f  (AT)5  AGATTCCGGCCATCAGTATG  CAAGCAGTAAGTGCCGATG  M
PR-GR-BESS-81a  gr203B08.r  (AG)6  CACCGTGTTGATGAAAAATG  TGAGGCATTGGAGAGATGA  M
PR-GR-BESS-81b  gr203B08.r  (TA)6  CATATTTAAAAATAGAGTTCACG  CTTTCTTATAACGAATTTTATC  M
PR-GR-BESS-82  gr211J17.r  (AT)6  GGAAAGCTTTTCGTGCTC  CATCGTCATTCTTGTCATAATC  NA
PR-GR-BESS-83  gr211O14.f  (AT)13  CCTCTCCAGGTGAAGACCAA  CGTGAATTGTCCTTCAAAGC  M
PR-GR-BESS-84  gr208D13.f  (AC)6  CAATCGAGCTTCCGATAATC  CAAGATTGGGTGGAAATTTG  M
PR-GR-BESS-85  gr208D13.f  (AT)8  CAAATTTCCACCCAATCTTG  GGATGTGGAATGGAGAGGAG  M
PR-GR-BESS-86  gr212G02.r  (TCA)7  TGTTCAATGGCAGTCTCCAC  GGAGGCACCCATATCAGAAG  NA
PR-GR-BESS-87  gr208I13.f  (AT)6(AT)5(AT)9  ATCGCTTGCACTTCATCG  GGTTATCATGAAATTGTTGG  P
PR-GR-BESS-88  gr208J21.f  (TCT)5  CAGGAAAAACTCCATCCTTG  TGTTTGTTATCCGGTGGTCA  NA
PR-GR-BESS-89a  gr203F06.f  (TA)5  GGGTTGAGTTGCACCAAG  GTGATGAAATTTAGAGGCAATG  NA
PR-GR-BESS-89b  gr203F06.f  (AT)5  GGGTTGAGTTGCACCAAGTT  GTGATGAAATTTAGAGGCAATG  M
PR-GR-BESS-89c  gr203F06.f  (AT)5  CTTGGAGTCTCACACATTTTGG  GGCAAATCTCAATCAAAGTG  M
PR-GR-BESS-90  gr206D01.r  (AT)8  CGATGATGATAGCGTCTTCAA  GACCCTTTAGCTCGTATATCTG  NA
PR-GR-BESS-91  gr203L19.f  (TA)10(TA)10  TGGATGGGTGAGTGAGTG  CTGGGATGCTCTTTTTGAC  M
PR-GR-BESS-92  gr206H20.r  (CA)5  GCACCATCAAAGCTCATC  CAGAGAGAATGGCACGATG  FH-1000
PR-GR-BESS-93  gr129K17.r  (AT)6(CTT)5  GACTAACCAACCTCCTTTCC  GGGGGTTTAAAAGGTATCA  BAR
PR-GR-BESS-94  gr132K24.f  (AT)5  GGACAAACTTGGAAGAAATG  GGTGTTAATGGTTCTTGTCG  M
PR-GR-BESS-95  gr135H21.r  (TTAT)5  GTGTGGGTACACCGACTAT  ATGTGATATGCCTTCCTTTG  FH-1000
PR-GR-BESS-96  gr140J14.r  (AT)5  CTTATCCATCCCAAGCAGT  TGGAATCAGAAACTCGAAC  M
PR-GR-BESS-97  gr140K08.r  (TC)6  CCTCTTTCTTTGTTTTCTGC  GGATTCCCCAAAGCTAGTA  M
PR-GR-BESS-98  gr140K20.f  (AT)19(AT)17  ACCACTCATCTGCATTAACC  ACAACACATGGGAGATAACC  P
PR-GR-BESS-99  gr140L04.f  (TC)5  TGAGGAAAGGATAATGGATG  TGGATGCATCTTAGTTAGGG  NA
PR-GR-BESS-100  gr132N21.r  (TA)6  GCCTTGAATGACATATTGGT  GCTTGTATCTCAGGCTGTGT  M
PR-GR-BESS-101  gr132O18.f  (AT)5  CTAGCCTTTCAGATGGTTTC  GTGAGCCTCTAATTTTACCG  M
PR-GR-BESS-102  gr135M07.f  (AG)5  GTAGTGGGAGTTCAACGGTA  CGATGCAGCCAATAGTTAAT  M
PR-GR-BESS-103  gr133C01.f  (ACA)5  GCCTACTCCAACGTTTCTTA  CTATTCGAGCGAGTAAGAGC  NA
PR-GR-BESS-104  gr138K21.r  (TA)7  CCCGGTTGTATCTGTGTAAT  TCTGCATACATGGTTTCAAG  M
PR-GR-BESS-105  gr141A17.f  (AC)6  AGTTTCAGGAGAGGGACTC  CTCAAATGCCAAAGGTATGT  NA
PR-GR-BESS-106  gr133C05.f  (TA)6(AT)5  CGAGGCTTTCTTTAACTGTG  CAGTCTTTGATCATGTTCGT  M

PR-GR-BESS-107  gr133D04.f  (AT)12  GTTTGTCAAGGGCTCAAAT  GACATGTCGATGACAAAGA  M
PR-GR-BESS-108  gr133D16.r  (TA)5  AGTGAGTTTGGCATTTTGAG  TCTTTGTGCACCTATTTTCC  NA
PR-GR-BESS-109  gr136F12.r  (AG)5(AG)6  GCGCCACACTCTATGTATTC  CTTCGTCAGTGTCAATCCT  P
PR-GR-BESS-110  gr136F22.f  (TA)5  GGAAAATCATTTGAGGACAG  ACTTCTTGTGTGCCTGTTG  M
PR-GR-BESS-111  gr136G19.r  (AT)5  TATCCGAGCACTATCATTG  CCTCTGAGATGGGGTCTAT  M
PR-GR-BESS-112  gr141F24.f  (TC)6  TGGTGCATCCCTTTTATAT  TAAAAGGCAAAATGGTCA  NA
PR-GR-BESS-113  gr133H01.r  (TTA)9  GCGCATATACAATGTGGTC  ACGGAGCCATGTATGAATA  NA
PR-GR-BESS-114  gr133H17.f  (AT)5  AGACCAACCTAACCATTCG  GGTTTGTTGGGTTTTGGTA  M
PR-GR-BESS-115  gr133I16.f  (AT)6  AGGAAGTTTTCACGACACAC  TCTTCAGAGTTCACCACACA  M
PR-GR-BESS-116  gr133J21.r  (ATT)5  ACGGTCCATAAAACTCCT  CGATATCGGAGTAGAGACCA  M
PR-GR-BESS-117  gr141J08.f  (AT)8  GCTACGATGACGACAATGT  ACGTGCACTTCATGCTAATA  NA
PR-GR-BESS-118  gr136N09.f  (AT)5  CCATCTCTACTTTTGCGACT  TCAAGATCAGCCCTAATGAC  M
PR-GR-BESS-119  gr133N08.f  (TA)5  GGACAAAACACAACCTAACAC  GCGAAGATCATCACTCTATCT  M
PR-GR-BESS-120  gr137A14.r  (GT)5  GGCTAATGTAGCCTTCTCA  AGCTAAGGCTTGAACTTGG  M
PR-GR-BESS-121  gr137C04.r  (TA)5(TA)5  GGAAGCTGGTCTGAGCTAC  CCTTAGCAACACATATCAACC  M
PR-GR-BESS-122  gr134B24.r  (TA)13  CTCGAGATTTTCTTGATTGG  GGTCATTGCATGGATTTAGA  M
PR-GR-BESS-123  gr137D17.f  (AT)5  ACCCAATACCATAACCAGA  GGTCGGTTTAGGGGTTACTA  NA
PR-GR-BESS-124  gr137D17.r  (TA)5  GATCAAGAAGGCATTCCAC  GATGTTTATGGACCAGCACT  M
PR-GR-BESS-125  gr134F11.r  (CT)5  GGACAGCATTTTCTGTAGGA  TGACAGTGGCATGATTGTAT  M
PR-GR-BESS-126  gr139I19.r  (TA)5  GTTGGTGGTTTGCATTCTAT  ATCAGTTGGTGCAGTTTTCT  M
PR-GR-BESS-127  gr142A09.f  (AT)10(AT)5  CCTCTCGTAGTTTACCATAATC  TACACGATGACTTGGTTGAG  P
PR-GR-BESS-128  gr139K02.f  (AC)5  CTGTCCAAAAACTGTTCCAT  CTATCAACAGACCATCTTGG  M
PR-GR-BESS-129  gr142A19.f  (AT)6  TAGAGTGCTAGAGCCAAAGC  GAGGTGGAAGTCTGTAATGG  FH-1000
PR-GR-BESS-130  gr137H06.r  (AG)12  AGAAAGACGACCGTAAGTCA  GGAGAGAAAAGAGAAAAGGA  M
PR-GR-BESS-131  gr137K21.f  (TG)5(TG)10  TGGCGTGTATAGTGTAGCTG  CACATAAATGCCGAAACAC  P
PR-GR-BESS-132  gr137L02.r  (AT)5  CCTCCTCCTCTCTATTAGGC  GGAGATGAGACTTTAAGAAGG  M
PR-GR-BESS-133  gr142E04.f  (AT)6  CTCTCTTCTCCTCACATTGG  GAAGGCAATGATGATGTTG  M
PR-GR-BESS-134  gr142E04.r  (TCT)8  ACATCAAGTTCGGTTACTGG  AGGGCACAATCACAAATTAC  NA
PR-GR-BESS-135  gr139N24.r  (TA)11  GTCAAGTTGGTCAGGTCAT  TCGATATCTCAACGAAAGGT  M
PR-GR-BESS-136  gr139P20.f  (TC)5  GACTTATTTGTGACCGTTCG  GTCATAAGTTCGAAAGGAG  M
PR-GR-BESS-137  gr140B23.r  (TA)5  ACTGAAAAACAGGTGGATTG  GTGTTGATGGAGAAAAGAG  NA
PR-GR-BESS-138  gr137L13.r  (TA)9  GAATGGTTTAAATGCCTAGC  CAACCCTTTCAGCAAATAAG  M
PR-GR-BESS-139  gr135D07.r  (TA)5  GGAAGCTGGTCTGAGCTAC  CTTAGCAACACATATCAAC  M
PR-GR-BESS-140  gr142H07.f  (TA)5  CACTGTCATCACCCTAGCTT  TGTGCGACCTTAAAGTTTCT  NA
PR-GR-BESS-141  gr140E10.f  (TA)13  ATTCATGAGCGTGAAACAG  TCCATAGGTATGAAGCAAGC  M
PR-GR-BESS-142  gr140H20.r  (AT)5  GATACCGACTTATTCCCACA  CAACTTGGAGACCTAGCAAC  M
PR-GR-BESS-143  gr142K12.r  (TA)6  GTCCCGACTACTGCTCTATG  AGTGATGGTTGGTTATTTGG  NA
PR-GR-BESS-144  gr142M16.r  (TC)5  GTACCCGTTGAACCACTTAG  GTGCTAGTCTCGTACGTCCT  M
PR-GR-BESS-145  gr145J12.r  (AT)8(TGC)6  CTAATTCCGATCCATCTTCA  TGAAAAATAGAGGGAGAGCA  P
PR-GR-BESS-146  gr142O13.f  (ATG)18  ATGACTGGATGCGAGTTAAG  GCAATCAAAGAAACATCGTC  M

PR-GR-BESS-147  gr148E19.r  (TA)5  GATTGTCGGATGTAAGCTGT  GCCAAACATAAACATACCT  M
PR-GR-BESS-148  gr145O07.f  (TC)5  GAAGATGCTTCAAAAGATG  TAACTGACGAGACTCCTTCG  M
PR-GR-BESS-149  gr145P06.f  (AT)7  TGCCATTAGAGAGGTTCCTA  TCTCCCATTTTCTTTGTCC  NA
PR-GR-BESS-150  gr143E21.r  (TA)5(TA)5  ATTTCCTGGTACACTTAATG  CATTCGGTTTTAAGTGTTGA  M
PR-GR-BESS-151  gr151D04.r  (AT)5  TAAGTTGAGTTGAGGGGAGA  AGGGTGAATGGTGAATACAG  M
PR-GR-BESS-152  gr148I09.f  (TA)5  CGTTGACCTTATTTTTAC  TATCTAACAGGCGCGAAT  M
PR-GR-BESS-153  gr148I12.f  (AT)12(AG)9  GATCAAACCAGAATTCTCCA  ATCCTCGCTTTCACCTTC  M
PR-GR-BESS-154  gr143I23.f  (AT)5  GATATAGATCGGGGGAAATC  CAGTCATCCAATGCTCAAC  M
PR-GR-BESS-155  gr143J02.r  (TA)8  CACATCAACGCATATGTCC  GGTTGTGAGTATTCGGTTCT  NA
PR-GR-BESS-156  gr146G02.f  (TA)7  CATGAGTTCCAAATTCCAGT  CATCAATTGTTGGTCATCTC  M
PR-GR-BESS-157  gr146H09.r  (TA)19  CGTAGCTGAAATGCTTGA  ACCCTTGAATGAAATCCAC  M
PR-GR-BESS-158  gr151I21.f  (TA)5  GCTCAGCTCGACTTGATTAC  TCGGGAGTCACACTATTTC  M
PR-GR-BESS-159  gr151K21.f  (TA)5  CGTAGATCGTCGGATACAAG  CTCATTGGTACCACAAGT  M
PR-GR-BESS-160  gr149A06.f  (AT)5  CTCGAAAATCCCAAATTAC  GCTGGTAAGTGTGTTTGATG  M
PR-GR-BESS-161  gr151N03.r  (GA)5  AGGAGAGAGAAGAAGGGTCA  TCATCAACAAGTGCTGGAC  NA
PR-GR-BESS-162  gr146I14.f  (AT)5  TGGGAAGTATGTTTCATGG  ACATACCAATTCCACCATTC  M
PR-GR-BESS-163  gr149B20.r  (AT)5  GAAAAATGACCCACCAGAT  TCAGGCTTTTACAACTACCG  M
PR-GR-BESS-164  gr151O21.f  (CTG)5  ACAATGAACAAGGCTGATCT  CACAGCTGAAGCAGAGTACA  M
PR-GR-BESS-165  r149G02.f  (TC)5  GACAAACAATCCATTGACCT  AGCTCAATCCCTCTACTTC  NA
PR-GR-BESS-166  gr143P21.r  (AT)5  CGACGAACTTCTAACCAATC  CATGATTGTATGGTGACGA  BAR
PR-GR-BESS-167  gr147A10.r  (TA)9  AATCATAGAATATGGACAAACG  CATGAGCATCTCTAATGTGC  FH-1000
PR-GR-BESS-168  gr149H20.f  (AT)11  CCCAAATCAGCTTTCTGTT  ATATCTTTCTAGGGCGCAAC  M
PR-GR-BESS-169  gr152D10.r  (TA)12  GAACCAATCGACATTTTAGG  TATATCCATCCGTTTGCAG  M
PR-GR-BESS-170  gr144E19.f  (ATA)5  GCTTTTAATCCTTAGTTTCTCC  CAAATGTACGGCATAGATGA  NA
PR-GR-BESS-171  gr147H22.r  (AAC)8  AATTGTTTTCATCCACATTG  CATGTTACGAGAGTAGACAC  M
PR-GR-BESS-172  gr152J08.r  (AT)5  GCCACAAATCGTAGTGATA  GCCCATTAGGTCAACTAACT  NA
PR-GR-BESS-173  gr152K02.r  (TA)10(TATG)5  CTGCTTTAGTTCCTGGCTTA  ATAGCATCTCCAAGCACTGT  P
PR-GR-BESS-174  gr149P09.f  (AGA)5  TCAGTACGCCAAGGAATAAC  CCTTTATATCGCTCTTTCCA  P
PR-GR-BESS-175  gr149P09.r  (ATT)7  AATTTTACCCGTCCCTTTAG  CCAACTGAGATCGAAGAAAG  NA
PR-GR-BESS-176  gr147K18.r  (TA)5  GCTTCAAAATAGGCCGAATA  GGTAGTGGATGGCATAATGA  NA
PR-GR-BESS-177  gr147K24.r  (TC)5  GGGAAGTGGGGAGTAATAAT  GCATCATAATCAAAGACCA  M
PR-GR-BESS-178  gr147M01.r  (TA)6  TCTCAAATCTTGCCTCACT  ACCCTCAAACTCATGTTCA  M
PR-GR-BESS-179  gr144P06.f  (TG)5  CAACAGACCATCTCGGTAT  ACTACCCAAAAACTGTTCCA  M
PR-GR-BESS-180  gr152P22.f  (TTC)6  GCTCAGCATTCTTATTCTGG  TGGGAAGCAGACAATCTTA  NA
PR-GR-BESS-181  gr145B08.f  (TCT)6  ACTGCATTGGTTCAATCTCT  TCCAGCATACTTTCCCTTTA  P
PR-GR-BESS-182  gr153A24.f  (AAAT)5  CTCAGTGCACAAAGACCTG  GTGATCTTTGGTTCGTTTGT  M
PR-GR-BESS-183  Ggr155K22.f  (TA)6  GGACCCTTCATATCTTATTGG  GCTTGGTGCACAAGTTGATA  FH-1000
PR-GR-BESS-184  gr158G08.f  (AT)8  CTGTGTTATGTTTTCCTCCA  CTCACATATAAGTCCCCTTAC  M
PR-GR-BESS-185  gr161A16.f  (AT)5  TCTCATGTTCACAATCCA  TGGAAATTCGAGGAAAATAG  M
PR-GR-BESS-186  gr158J08.r  (AT)9  CTGCATAGGGGGAATATAC  TGTTTTGCAATTGGGTATAG  NA

PR-GR-BESS-187  gr156D05.f  (ATC)7  GTCTAGCTAGGTCAGGCATC  GAGCTCCTTGTAGTCTCGA  P
PR-GR-BESS-188  gr153M15.r  (AC)5  GCTTTGAGTGCGTAAACTG  GTCAAGTGTCTAGTTTGTGTC  M
PR-GR-BESS-189  gr156G12.r  (AT)6  GATGCATGGTTTATCTTTGG  CACATATTCACATGCCACT  M
PR-GR-BESS-190  gr159D14.f  (TTC)5  ACCAACTTTGTGAGACCAG  GTTTGCTCCGTTTTCTAATG  NA
PR-GR-BESS-191  gr161J14.r  (CT)8(ATA)6  CAAGCCATAAATGTGAGATG  GAGTGACATGCAAATCCA  P
PR-GR-BESS-192  r161J21.r  (TA)5  GTCTCGAAGACCGACTCC  CGATTTCTCTCTGTTTTCG  FH-1000
PR-GR-BESS-193  gr156L04.r  (AT)6  GATAGACCCTGGATTCACA  GCAAATGGTTAAACAAAAGG  M
PR-GR-BESS-194  gr159J07.f  (TA)7  CCGACTATCCAAGAGTTTC  GTGGAAGGTATTGATGGGT  M
PR-GR-BESS-195  gr162A15.r  (AT)8  CTTCCAACGCTTTCTATGAC  CATAGCATCTCTCTGTCAGG  M
PR-GR-BESS-196  gr162C21.f  (AT)5  CTTAACCATTAGGCCAACAC  GTGTTTTGTAAAACCTTGTTG  NA
PR-GR-BESS-197  gr157D23.f  (AT)16  GATGCTTGAAATCCGAATAC  CTGACAGCTCACTCTTTTCA  M
PR-GR-BESS-198  gr155A09.r  (TA)12  TCCAAGCCTGTGTTATGAG  CTGCCAGTTAAACGAAAAC  BAR
PR-GR-BESS-199  gr157H04.f  (AT)5  TCCCAAATAGAGTGTTCGAG  TGGAGATGTCCTCTATTTCG  NA
PR-GR-BESS-200  gr157H24.r  (TA)5  TCGAACCTATGGCATGTA  GCTAAACACATCACACTTCC  M
PR-GR-BESS-201  gr160F08.r  (AT)5  TGCACAGCAATTATTAGCAC  CACACTGTCATCTACTTTCG  M
PR-GR-BESS-202  gr157N04.r  (AG)5  ATCTCATGCAATGGCTCA  ACCATCTCCTTCCTCTTCTC  M
PR-GR-BESS-203  gr160J07.f  (AT)5(AT)8  GAATAAGAGGCATGGGTATG  GCACGCTCAATTTAGTCCTTG  P
PR-GR-BESS-204  gr162O03.f  (AT)5  CTCTCCCCACTAAGTCATTTGG  GGATCACTTCGGTGGTCA  M
PR-GR-BESS-205  gr165K12.f  (TA)5  CGTTGGTAGAGTTTTTGGAC  TCCTTCCATTTCCAGTTAGA  M
PR-GR-BESS-206  gr163B15.r  (AG)6  CATGAGTTTGTTCATGTTCG  GCTAGCCTATACATGCCCTA  NA
PR-GR-BESS-207  gr165M01.f  (TA)11  ACCCTCAAGAACTCCTCA  CCCTTTCGCACAGATACA  M
PR-GR-BESS-208  gr208I01.r  GCTTCCGATAATCATTTCA  CAAGATTGGGTAGAAATTTG  P
PR-GR-BESS-209  gr168D07.f  (AT)5(TA)5  GTAGATCGCAAGCATTACA  AGTGTGTGTGAATTGTGCA  P
PR-GR-BESS-210  gr170I11.r  (AT)5  TCCATCTCCAATGACCTAAC  TCTTAGGTTGGGCTACAAGA  M
PR-GR-BESS-211  gr170K01.f  (GA)6  TGAGCTTCCCGTTATATGG  TCCCATGTCAGAACTCTTTC  NA
PR-GR-BESS-212  gr163I18.r  (AT)5  ACAAGCATCATCAACTCACA  GGTTATATGGAAAGGCACAC  NA
PR-GR-BESS-213  gr166G05.f  (AT)8  GAGCACCAAATTGTGATTG  GCTAATTATGCTATGCATCC  M
PR-GR-BESS-214  gr166H22.r  (CTT)5  TCCTACCAGTTTCTGCATCT  AGTGTTCAAGCAAGTCAAGC  M
PR-GR-BESS-215  gr171F21.f  (TA)5  GCGACCTTAAAGTTTCTTTTC  CACTGTCATCACCCTAGCTT  M
PR-GR-BESS-216  gr171F21.r  (TA)5(AT)8  GCTTGTTATGGTTGGATTTG  TGCATTCATCTTTTACCTCA  P
PR-GR-BESS-217  gr166J13.f  (TA)5(ATT)12  CGACAAACACTATGCACCT  CAAGACCAATGTTATGCTG  M
PR-GR-BESS-218  gr164E12.f  (AT)5  GTGCCAAAGACATTGGAG  CCCTTCTTTTTCTCTTCTCT  M
PR-GR-BESS-219  gr167B18.r  (CT)5  GAATCTCTCCATTCCCTCT  AGCATGATTCTGATGAAAGG  M
PR-GR-BESS-220  gr171O13.f  (CA)5  GATTCGAACCCAGAACCT  TCACATCCCAAAAACTAGG  NA
PR-GR-BESS-221  gr169J19.r  (AT)5  TGAGAAAGGTAAGGCATGT  CGGATACACAGTGTGGAAG  M
PR-GR-BESS-222  gr164N08.f  (TAA)5  GGGTTTATGATGCTTTGGT  CACTCACTTTAGCAGGAACC  NA
PR-GR-BESS-223  gr164O09.r  (AT)5  CCTTAATCCAAGTAAGGATGC  TCATTGATTCAGTGTCTTGA  M
PR-GR-BESS-224  gr038H06.f  (TA)5  GTCGATATCCGAGGTAAGTC  GCCTGACTCAAAAACACC  M
PR-GR-BESS-225  gr022B01.r  (AT)5  GGCACTTGCCTAAGCATC  GAGAAAACCTCGGTAAGGTG  NA
PR-GR-BESS-226  gr003H04.r  (AAG)6  GGAAATCGGAACAGTGGT  CCGAGCTAGCTAGATTATGC  NA


PR-GR-BESS-227 gr001C11.f (AT)7 GTTCAGACATCAAATCTAGCTC GTTCAATTATGCAGACATGC M PR-GR-BESS-228 gr001F11.f (AT)5 CTCCTCTTCTCCCCACCT CACACAGATAGGACGGATG M PR-GR-BESS-229 gr005L22.f (AAG)17 GCTGTGGTAACGAACGTAG GAGTAGGCAGACACTTGGAC M PR-GR-BESS-230 gr005N03.f (AT)5(AC)7 ACCTCCCAAATACTTAACC GGTTGATGTCTAGGAGACGA P PR-GR-BESS-231 gr003M24.r (TG)10 GGGTTAGTCTCGAGTTTGG GGAGATAGGGGCTTGGTA NA PR-GR-BESS-232 gr005P06.r (AT)17 GAAAATCAGAACCCAATACG CACATGTTTGCAACTTTACTG M PR-GR-BESS-233 gr006C09.r (TA)6 CTCGTCGAGGTAAGTTTGTG AGAACCAGGACCCAAATC M PR-GR-BESS-234 gr003O12.f (AG)5 GATGACGGTGTTCATCTTCT AGCAGACTCATCAAAAGCAG M PR-GR-BESS-235 gr004A17.r (TA)5 CTACCCCTTTATCGGGAAC GTTCCCAGCCTTCAACTAC NA PR-GR-BESS-236 gr001O08.f (TA)17 GACCGAAAACACATAAGCA GGAACGTCTTCACACTATCG M PR-GR-BESS-237 gr006C15.f (AT)11(AT)5 GCTTTCTTCAGGTAGGGTTC GCCATTACCAAATAGGTGCT M PR-GR-BESS-238 gr006F10.r (AT)7 CGGCCTAGGTTGAGCTTG GCTATCCCATACTCGCACAC M PR-GR-BESS-239 gr006G16.f (TA)5 GGCATTAGTCCTGTCAACC CGGGCAAGTTTTTAGTGG M PR-GR-BESS-240 gr006G23.f (AT)26 CTGGATTCCACCCAACTA GTTCAGGCGAATGATCTG NA PR-GR-BESS-241 gr045H05.r CCGATGTCTTCACTTGGAGT GGTTTCCAGGTTAACTGAGC P PR-GR-BESS-242 gr008N04.f (TA)5 CCTTTTCTCTCCCCTACAAC GCGGCACCTATGTTAGAA M PR-GR-BESS-243 gr008P23.r (AG)6 GTAAAGCACGGAAGCTACG GCTGGAACTGTGCTTCAT M PR-GR-BESS-244 gr004J19.f (TATG)5 GGAATCCTACCAGCCTCA CCAGCTAAAAGACCTGGAG BAR PR-GR-BESS-245 gr002F04.r (ATA)11 TGCGTACGTTTAGGTATAGA GGTAGTGGTTTATTGCCA NA PR-GR-BESS-246 gr009E11.f (AT)5 TCGGAATATCTCTTCTCACC GGAAGTTGGTGAGCTGTACT M PR-GR-BESS-247 gr004L10.r (AT)5 CAAGCTTTGAAATGACTTCG GACCTCCCTTGCTAGTTTCT NA PR-GR-BESS-248 gr004O15.f (TA)6(TA)9 TGTGATAGCCGAACATAGAA GCAAACAAACATTCGAACC P PR-GR-BESS-249 gr005A19.r (AT)5(AC)5 TGATGTGTTTGCACCTCTC GGCTGATGTTCAGGTATTG M PR-GR-BESS-250 gr005C01.r (AT)5 GTACAGGTCATCACCATCG AGACCCTAAGAAGGGGAAG M PR-GR-BESS-251 gr005C23.r (TA)6 CAGATCACCAATTCCAAACT GATCTGAATCAAAGGAACGA NA PR-GR-BESS-252 gr007J10.r (AT)5 GGGGTGTCCTCTTCACATAC GGAGGTTGTGGGTTTGAG M PR-GR-BESS-253 gr007J17.f 
(CATA)12 GGCCTGAAGCTGCTGATA TGAACTGCCGGACCTAAC BAR PR-GR-BESS-254 gr003E07.f (TTC)6 ACCTCACTAGCTGACTGCTC AGTGGGGGAGAACATGAG M PR-GR-BESS-255 gr010B02.r (AGC)5 CACTGTCAACCCGTCTGT CAAGGGTCCTTCGTCATC NA PR-GR-BESS-256 gr010D24.f (AGC)6 GCTTCTTTGCTCCATGTG ACTTGGAGGAGGTTTCTAGC M PR-GR-BESS-257 gr010L01.f (AT)6 CGGAATAGGGTACGAGGTA CTTTGTGGGAGATGAGTGG NA PR-GR-BESS-258 gr013E12.f (AC)14 TCAGTGTTAGGCCGGTTG GCGAATCATGTGCTAGTGG M PR-GR-BESS-259 gr015G11.f (AT)5 TAGGCACCATCAAGCAAG CAGGTGGATCGTCAAGAA M PR-GR-BESS-260 gr013I08.f (TCA)7 GCCATAGGCAGACAAGAAG GGCTTATAACCCCTGTTCC M PR-GR-BESS-261 gr013L21.r (TA)16 GCCGATTCGGTTTCATCT GAAGCTTTGGAGGTTTAGGC NA PR-GR-BESS-262 gr016D01.f (AT)18 TCGTACTTGAACCCAGTCG CTTTCGTCAAGGGAGAAGC M PR-GR-BESS-263 gr016E06.f (AT)5 CATGGAATTTGCTGAAAGGA CTGTCACTGTGCCACCTGTT M PR-GR-BESS-264 gr016E06.r (AT)5 AAGACCAACCTAACCATTCGAC TGGTTTGTTGGGTTTTGGTA P PR-GR-BESS-265 gr016F10.f (TA)6(TA)5 CGAGAATCATGAATGCCA CGTTGCAAATTGGTCCTG M PR-GR-BESS-266 gr016F10.r (TA)6(TA)6 CTTCCCGTTACATGGCTCAC GCCCTAGCACATATCCTCCA P


PR-GR-BESS-267 gr018P16.f (TA)5 CGGATAGTCAGGGGTGTCAG CCTTAGCCGTGCATGAGTCT M PR-GR-BESS-268 gr018P20.r (TAT)5 GGATCGCATCATCGTCATC TCCCTGAACTCTTCCAATCC M PR-GR-BESS-269 gr016H23.f (ATA)14 GGGTTCATCGAGTTTTGGTG GCTACTGACTTGCAGGGACTG BAR PR-GR-BESS-270 gr019A10.r (TA)5 CCCCCTTGTTCTTTTCTTTC GCAAAATGGTCAATTAGA M PR-GR-BESS-271 gr021P18.f (ATA)5 CCCATGAGTTTCTCAACGAG CTCGTGGCTTTGATCTTCC NA PR-GR-BESS-272 gr022D22.r (AT)5 TAGATCCTCGCTTTGGTTCG GCATGTTCCCCTTATATGTTGG M PR-GR-BESS-273 gr026L22.r (TCT)5 GGATAGCAACAGGCTGAGT CCCCCATAATCCTAGTTGG NA PR-GR-BESS-274 gr022F23.f (TA)5 TCTTAGGTTAGCAAAGGAGT TGTCGATTCGGAGTGTCTA M PR-GR-BESS-275 gr024K02.f (AG)5 AGCGAGAGCAAGAAAAGC CTGTCTCTCGTTGGAAAGG NA PR-GR-BESS-276 gr020D22.f (AT)5 GAGCTCATAACGCGACTCTA CAGGCATTATGTCGGTGT M PR-GR-BESS-277 gr022M12.f (TTA)19 CCTTCTCTTCCCAACCACTA GTATCGGGACCTTGTTTCTG M PR-GR-BESS-278 gr024L10.r (TA)5(AT)5(TTA)5 CGCAATCCAGTAAGTTAGG GGGTCCTCCTTTTGTGAC P PR-GR-BESS-279 gr020G05.r (AT)6 ACACGGGGCTCGATAGTC CGAGTGGAAAATCGAGGAG M PR-GR-BESS-280 gr020I06.r (TCT)5 GTCTCGATATCAGGTTCAGC CTATAGCCAGGGCACAATC P PR-GR-BESS-281 gr020I16.r (TGA)5 CTTCCTCCCAGAAAGGATTC AGGTGGATCTCCATCTTACC NA PR-GR-BESS-282 gr022P13.r (AAG)18 GCTGTTGTTGAGGTGATAGG CATCACTCTTCACCCTCTTC M PR-GR-BESS-283 gr025D07.f (AT)5 TGAAACCGTAGATCCGATGA CAAACCTGGAGGATCCAGAA NA PR-GR-BESS-284 gr023A10. 
(TA)5 CACCCACTTTGGAGTGTCTC GAATGCATATGGGGTTTGGA M PR-GR-BESS-285 gr020M07.r (AGA)6 CCCGATACAAGTGGGGAGTA CTTTCGCCTTTGCCTTCTC M PR-GR-BESS-286 gr021A23.r (AT)6 ACACCCCTTACCCGTATTCC GGAACTTGGACGGAGTCAGA M PR-GR-BESS-287 gr021D01.r (GAA)6 GTGGGGTCTAAAGCCACTTG TCCTTGTGCCCCTGACAT M PR-GR-BESS-288 gr023K08.f (TA)6 GTGCGAGCCTTGCCTTATG GCACCTAGGGTTCGCATTC NA PR-GR-BESS-289 gr024B06.r (AT)5 CCACCCATCCCTACATTCC CTGTTATGCCCCCTCGTGT M PR-GR-BESS-290 gr026G16.f (AT)8 CCCCTAACCCGTATCTGTCA GCGCACGTGCAAAGTTAAG M PR-GR-BESS-291 gr029D03.r (TTA)5 CTTCCCGTTCATCGGTGT GAACCCCGGATCTAGACAC NA PR-GR-BESS-292 gr029H12.r (AT)5 GCCGAATGTCGCTAACCT CTCCCTTACCACTCGAACCA M PR-GR-BESS-293 gr031M22.r (TA)5 CTTGAGGCCCCATAGATTGA GGTTTCCAAAGCTGTTCTCG M PR-GR-BESS-294 gr031P13.f (TG)5 CCTTTTGGTTATACGGCATGT CTTGCCTTGTTCCAATACCA NA PR-GR-BESS-295 gr034H02.f (AC)7 ACCCACGATTCACACATGG GCTAGGTCTTGGAGTTCACG BAR PR-GR-BESS-296 gr034I18.f (CTC)5 CTCGAATTTCCACCCAATCT TGACTCAAGTGGACAGCTTTG M PR-GR-BESS-297 gr036L12.r (AC)5 ATCGATCTGCCAACCAACTC CCTCACAGCATCTACCGTTTG M PR-GR-BESS-298 gr036M15.r (AT)10 CAATTCCACATCCACTTCC GGGAAAGAGCGTTAATGG M PR-GR-BESS-299 gr037A22.f (AT)7(AT)8(TA)15 GGCGCATACTTAACACATA CGCATTCACCTTACATATC M PR-GR-BESS-300 gr035B03.f (AT)8 GCCATCTTGGAATGTGTG GTTCACCCAGGCATTTTC NA PR-GR-BESS-301 gr032E10.r (AT)11 CAATGGACCACATCAACAG CCTTCTTTATTGCGCAGAC M PR-GR-BESS-302 gr032G11.r (TA)12 GTTCAGCCGTAGGCAAGT CCAGAGCTGCTTCGATCT M PR-GR-BESS-303 gr030C13.r (AT)5 CTGCACAGCTTGCTTCTG CGTAGGGCAGAAGAGGAA NA PR-GR-BESS-304 gr032I04.f (TA)5 GAGGGGAACGGAAATCAC GCTGAGAAGTACTGGCACTG M PR-GR-BESS-305 gr035D12.r (TC)6 GGAAGGCAGCAGAGAATAG GGTAAGGAAGGAGGTTGAAG M PR-GR-BESS-306 gr035E13.r (TTC)5 GAGCAACTGTTAAGGACGAC TCCATGCCATAACGAGTC M


PR-GR-BESS-307 gr035J05.f (AT)5 CTAGCTTTAGCCCGTCCT TCTCCGTTCCGTACTGTC NA PR-GR-BESS-308 gr035K15.r (GCA)7 TACTCAGAGTGTGGCTTCG GAGGGAAAAGGCAAAGTC M PR-GR-BESS-309 gr035P18.f (AT)6(AT)6 ATATGTGTGTAACCGAATGC GGTCATTGTGTTTGATAAGC BAR PR-GR-BESS-310 gr031D19.r (CT)5 CTCTCCCTCTTCCCTATCAT GGCTCTATACGCCTCTAAAC M PR-GR-BESS-311 gr031E04.r (AG)7 CATTTGTATAGCCCTTCTCC AGAGAGCCAGTAACTTCACTTC M PR-GR-BESS-312 gr031E11.r (AGA)6 GCTGTCAATGGAGGTATCC CGACCTAGGTGGGTGTAAC M PR-GR-BESS-313 gr035P21.r (TCT)5 CACGAAGATCTGGGTAAGG CCTCTACGAAGTAGCCAGAG M PR-GR-BESS-314 gr038H06.f (TA)5 CGTCGATATCCGAGGTAAG CCTGACTCAAAACCACCAG NA PR-GR-BESS-315 gr061D14.r (TTAT)5 TGAGTTTGACTCAACGAGTG TCGCTTTTGACTATACTTCG BAR PR-GR-BESS-316 gr031K01.f (AT)7 GACGACAGATTGACCACCT GCACTAGTCTCCACTGGGT M PR-GR-BESS-317 gr038J12.r (AT)5 CTCAACCCGGTGAATTGT CGGTTGGACTAGTTCTTCG M PR-GR-BESS-318 gr038O07.r (TC)11 CTCATGATCTTAGGGTTCCA CATAGTGGACGGAAAGAGAA NA PR-GR-BESS-319 gr041I05.f (ACA)6 ACTCAGAATAGGGGCACAC GGTAGCTCTATCGGTAACTGG P PR-GR-BESS-320 gr041I24.f (TA)6 GCAATCACCATCAGTGGAC GTTGTGGTAAGCACGGACA M PR-GR-BESS-321 gr041J05.r (TA)5 ATGTAGATCAACGGAAGACC CAGATAGACGGAAAAACCAG P PR-GR-BESS-322 gr041K03.r (TA)5 CTTCTTCCTCAAGCTCTGC CCTAGACCTAGTATGGGGTGT NA PR-GR-BESS-323 gr044A14.f (TA)5 GACTTTTGCGACATGCAG GAACCGACTGGTTCCATAG M PR-GR-BESS-324a gr039B19.r (AG)5 CTGATTCTGAAGGGATGCTA TCGTGAGTGTGTCAGAATGT M PR-GR-BESS-324b gr039B19.r (TA)5 CTGATTCTGAAGGGATGCTA CCTACATCGTGAGTGTGTCA M PR-GR-BESS-325 gr041M05.r (AT)6 TGGGTTCAGCCCAAATAGT GGCATTTGGTGTAGTTTACC NA PR-GR-BESS-326 gr039E21.r (TAAA)5 CAGTAAGCTGGAACTCTTGG GAGTGTGGACCCTGTTGAT BAR PR-GR-BESS-327 gr039K03.f (AAT)6 ATGACCGAATATGCTTAGG GTGCAATTCATAGTTCACC M PR-GR-BESS-328 gr044I20.f (AT)5 AGTCGTCATGCATCAACC CCGATGAGTGGTCATAGAGT NA PR-GR-BESS-329 gr044I22.f (AT)13 GCATCCCATCTCATCTCA GTCTTTGGTGCTGGATGT M PR-GR-BESS-330 gr046G15.f (AT)5 GGCTAATGTTGGCCATGTG CAAGCCATACCAAGGCATC M PR-GR-BESS-331 gr046I04.f (AT)5 GAATGGAGTCTCCCCTACT GAAGACAACTGGACTTCAGG M PR-GR-BESS-332 gr042F15.f 
(TCT)5 GTCAGGTTCGGTTACTGGT CGGATAGTTGGAGACTCTG NA PR-GR-BESS-333 gr042F15.r (AT)6 GGGCTTAAACCTAAACTGA CACATGATTCACATTGTTCC NA PR-GR-BESS-334 gr042J15.f (AT)9(TA)5 CCACAGTGAGGGAACATCT CAGACGTCCCAGAACTACC P PR-GR-BESS-335 gr042K05.f (TA)5 CGGAGTTAGGACATCATCC CATGTAACACCCCTTACCC M PR-GR-BESS-336 gr039P22.f (TA)5 CCGTTGTCAAAGGTAAGTC CATGGTCATTCGAACCTAAC M PR-GR-BESS-337 gr042L16.r (ACT)5 CCTTGGTATCCTTGTGTCTG GCCTAAGCTGGAGTAATGG P PR-GR-BESS-338 gr046L07.r (GAG)5(AGG)5 CCCCTGTTTTGACTTTGG ACCGCAATAGTTCCTTCG M PR-GR-BESS-339 gr044L17.r (TA)5 CTTCATCTCCCTCAGTTCCT GCAGGATATAACGCTAGGC NA PR-GR-BESS-340 gr040C04.f (AAT)5 CCCCGGATCTAGACACAAG GTAGGGAGCCGCTCAATC M PR-GR-BESS-341 gr042P22.r (AT)5 CGTAACATATGCAGGACCAC CGAGCACTTGTCAAGGTAAG NA PR-GR-BESS-342 gr040G19.f (AT)5(TA)5 GTGGTGTGGGTTCAAAGG ATGTCCAGGCTCCCATCT P PR-GR-BESS-343 gr043A19.r (AG)5 GCTGAGATCGCTCTACCA ACCGTTGAGCGACACTTC M PR-GR-BESS-344 gr045A07.r (AT)6 GGTCATACCCACAACTCATC TGGCTTGTAATGGTTAAAGG NA PR-GR-BESS-345 gr045A23.r (AT)13 CAGCTTCAGCTCCAACTGT GCTTCCGGTACCTTTAACC M


PR-GR-BESS-346 gr040H20.r (AT)5 CGGAACAGAGTACGAGGTA TGTAATGTGGCGTCACTG M PR-GR-BESS-347 gr043D15.f (AT)5 ACCCCTACGTCGTGAGTTC TAGTCAGGGTGTCGGAAGG M PR-GR-BESS-348 gr047J15.r (TA)5 GAAGAATGGACTGGATGGAG CGGAAATCTGCCTCATTC NA PR-GR-BESS-349 gr047K08.r (CT)6 TGCAGGCTTTCTTCACTG ACTCGAAAGCAACGGAAG M PR-GR-BESS-350 gr043G19.r (AT)6 GAGAAGCCAGGATCCTACA GGATTGTCATGCCTGCTA M PR-GR-BESS-351 gr045G02.r (TG)7 GATCACGCCAAAGGAGT GGAAACTTACCAGTCCCTC NA PR-GR-BESS-352 gr047L03.f (CA)5 GATCTACCAATCTGGTTTGC CTCTGTGAGGAATGAAAAGTG M PR-GR-BESS-353 gr045H08.r (TA)5 GCTTCCTGTTGGTGTGTC GATCCTCGAGTACGTAGCAG M PR-GR-BESS-354 gr047L24.r (TA)8 GGGTGAATCTCAAACATAGC CAATGTGAGAACAATCGTCA M PR-GR-BESS-355 gr045K06.f (AT)5(AT)7 GCAGCCTCTAATGTTCTACC CCAGTGCCGAAACCTAAG P PR-GR-BESS-356 gr047O01.r (AAT)8 ATCGAGACTTGCGACAGA CTTCGTGCTTTTCCAGGT M PR-GR-BESS-357 gr050C03.f (AT)10 ACCCAATACCCTCCAACC CTCTCCTTCGATCTCTCCA NA PR-GR-BESS-358 gr048A20.f (TA)8(AG)8 CTCCAGCTGTATCAAGCATC CCCTGCTTTCCTTATACCTG P PR-GR-BESS-359 gr048B03.r (AT)5(AC)7 GTCATGCATCCACAATAACC TCCTCTGAGATGGGGTCTA P PR-GR-BESS-360 gr056I03.f (AT)5 GCTAACCATTCCAACAGC GGCTACAAGTGTGGCACT M PR-GR-BESS-361 gr056J03.r (TA)5(TA)11 CAGCCTTGGGGTAAAACC GGATCAGCCACCTCACTAC P PR-GR-BESS-362 gr056J09.r (AT)5 GTTGACTACCTGACCATCG GTGAAGGGTGGTGGATAG NA PR-GR-BESS-363 gr050M14.f (TCT)8 GTTGACCTCCATCCTTACTG ACTACTACTACCGCCACCAA M PR-GR-BESS-364 gr056N09.f (ATGT)5 CATGCAGTGCTACAGAAGA CTACTGCTTGATGCTCCAAC NA PR-GR-BESS-365 gr056N09.r (AT)5 CATGTGCTTCTTACCCTCA CTGCATTGGGAGAATGAAC M PR-GR-BESS-366 gr048G16.r (TA)8 AGGCTTGCATCGAATGAG GTAACACCCCTTACCCGT M PR-GR-BESS-367 gr051L11.r (TG)10 CCTCTCTTTCCCCTTACTGT AGAGAGGCCATCCTACTTCT NA PR-GR-BESS-368 gr051N06.f (TC)11 CTAGGTTTGCCCCAACAG GGTAGAGAGCTGCTCGAG M PR-GR-BESS-369 gr048P01.f (TA)5 CTTCCCTAGGTGATTGCAC CTGGTCCTTAGTCGCAGTAG M PR-GR-BESS-370 gr049A04.f (AT)5 ACCTGATCATGTGGCAAG GTTGACATGTTGGACTGTGC NA PR-GR-BESS-371 gr049B12.f (AT)8 CAAACAGATGGTGTTGG GAGGAGACGTGAATAAG M PR-GR-BESS-372 gr052H12.f (AT)5(TAA)9 
CGCCTAACATACTGACAGC GTACTTCGACCGCAGCAT P PR-GR-BESS-373 gr052H12.r (TA)5 GACCAGAGGACGTATGAGG CTCGATCTAGAACGGGAATC M PR-GR-BESS-374 gr049F01.r (TA)5 GCGGAGATCACTCACACT CATCCATATGCTTCCCATC M PR-GR-BESS-375 gr058E04.r (AG)7 CCCCCAGAAGAAGTGCAG CAGCTGTCCCTGAGTGGT NA PR-GR-BESS-376 gr052K19.r (CT)7 CGTCTGTATCTCGGCTGA CCACGTTTCCACGTTTCC M PR-GR-BESS-377 gr056C02.f (TG)6 GGGATCGTAAGGGTATCAG CACGTCAATGCTACTGTCC M PR-GR-BESS-378 gr049L18.f (TTA)5 GGGTTTCTAGAACCTCCAG CAGTGTTGGAATCGGTCT NA PR-GR-BESS-379 gr052P17.r (AT)6 GGGCTGATATCATGTCTACG CCCACATGTCTAGAAGAGGA M PR-GR-BESS-380 gr061L09.r (TA)6 GTCCCGACTACTGCTCTATG CTGGTTGATTCCTCAATAGG M PR-GR-BESS-381 gr061L11.r (TA)5 GGGATCGTCAGCATAGTCT TCTACCTGTTTGGACAGCTC NA PR-GR-BESS-382 gr064F06.r (TA)6 GTCCCGACTACTGCTCTATG TGGAACTGGTTGATTCCTC M PR-GR-BESS-383 gr059G13.r (TTC)6 TCTTCTCCTCCTCCTCCT CTGGCCCTTCTTCAACTC M PR-GR-BESS-384 gr062G21.r (AT)6(AT)5 GCCAAACATGGTCATTCC GGTGTGCTTTTGGCACTT P PR-GR-BESS-385 gr065C02.r (AT)5 CCCCTATATTTTGGGGTGT AAGAGGTACTTTCTGCCTC M


PR-GR-BESS-386 gr067L18.f (TTA)5 CATCTGCCCTTCCAACTT CTTCCTATGCTGCTCAACAC M PR-GR-BESS-387 gr060F12.f (TAT)15 CGGATTGCTGTTAAGGAG TCAGTGCGAGGCTAACTATC M PR-GR-BESS-388 gr065L01.f (AC)5 CTGTCCAAAAACCGTTCC GTAAGGACGTCGGAGAT NA PR-GR-BESS-389 gr068C05.f (TA)5 GTAATGGCCTCGTACCCTA GTCTGTAACACCCCAAACC M PR-GR-BESS-390 gr068D04.f (AT)7(AT)16 GGAGGCTGAAATGGAGTT CAGAGAATGGGTGGAGACT P PR-GR-BESS-391 gr068H01.r (TA)5 CCTCCTCACGAAGTCTTCTC GGTACGCCCAATTACCTG M PR-GR-BESS-392 gr065P24.r (AT)5 CTTTCCACTTCTTGTGTTCA CCATTATATGGCTCGAAAGA NA PR-GR-BESS-393 gr066B11.r (AT)8 GAGGAGGTTACGACTTCCA GGCTTGTACTCACTGATGC M PR-GR-BESS-394 gr068J09.f (AC)5 CACGTCAATGCTGCTGTC GGCGTCAGAGATCGTTTC M PR-GR-BESS-395 gr063M24.r (GT)5 GGTTGTGTCATGGTGAAGAG CGGCATAGCAAAACTCTC M PR-GR-BESS-396 gr063N02.f (TG)5 GGAAAGGTAAGGACTTCCAC ACATGTCAGGAGTGTCACCT NA PR-GR-BESS-397 gr063N09.r (AT)5 AGCCCTTGCTAAGAGCAC GAAGTACCACTCCCTCTCC M PR-GR-BESS-398 gr061E07.f (AT)6 GGTCAGAACATCATCCACA CCCCTCTTACTCGTATTCA M PR-GR-BESS-399 gr063N15.f (AT)8 GGTTCATGTCTACCGTCTATCC CTCTGATCAACACCTTCC M PR-GR-BESS-400 gr061F03.r (TA)5 ACCGTACGCTAAGTGAAGTC TTGAATGGTGGGTTAGGC NA PR-GR-BESS-401 gr061F11.r (AC)7 CCTCGGGCTAGCTAGATT AGCTAGGGCTTGAAGCAT BAR PR-GR-BESS-402 gr061H09.f (TA)5 TAAGGCCGCTTTATGGAC GCTAATGGTCTCCTACATGC BAR PR-GR-BESS-403 gr068P15.r (AAG)5 TGTACTCCACCCCAAGAG GTATCCACCCACCCTAGA M PR-GR-BESS-404 gr069M18.f (TA)7 GGTACAGATGCACGGTTTC CCACATCGGACACCTTAGT M PR-GR-BESS-405 gr069N07.f (ATGT)10 TCCATGAACCATCTCCTG GGACCCCTCTGCATATAG BAR PR-GR-BESS-406 gr074B02.r (AAT)5 GGATGTTGAGGATCATGC GAAAGCGAAGGAACAAGC M PR-GR-BESS-407 gr076N07.r (TA)5 AGGCTTGGATCGAATGAG GTAACACCCCTTACCCGT M PR-GR-BESS-408 gr070A02.f (TGA)8 GAGTCCTTAGTGGGAAAACC CTTGGCAAGATCACTGTCTC M PR-GR-BESS-409 gr072C12.r (ATC)5 GCCTGTTAGTCCACCATTC CCACTCTCTTGCTATCATCC M PR-GR-BESS-410 gr077A02.f (GGT)5 GGAACTGAGTTGGAGTGCT CCATCTCTGTTCTCACCATC NA PR-GR-BESS-411 gr070D09.r (TAA)5 CCCGCAGCTCCATTAGTA GATGTCCTGACACGTACCC M PR-GR-BESS-412 gr072G22.r (TA)5 TCATGTCTTACGGGAAGC 
CTCAATGGCTCTTTCAGG M PR-GR-BESS-413 gr077E20.f (TG)12 CATCGGAGATCGTTTCAC ACGTCAATGTTGCTGTCC M PR-GR-BESS-414 gr072H16.f (TA)6 CTGCACATGTGAACGACA GTATGATCACGACTCCAACG M PR-GR-BESS-415 gr072H16.r (TATG)7 GAATCCTACCAGCCTCAAAC GGACTTGAATGCCTCTTCAG BAR PR-GR-BESS-416 gr070G11.f (AT)6 GCGCACATTCTAACCCTA CCTACAAGTCACCCGATAGA M PR-GR-BESS-417 gr072L17.f (TG)5 CATCGGAGATCGTTTCAC ACGTCAATGTTGCTGTCC M PR-GR-BESS-418 gr075A18.r (CT)5 GGTACACTCTTCATTTTCAGG CACACTGAAGAGGGTTATCT M PR-GR-BESS-419 gr075B05.f (AT)5 CGCAACCACCACTTAATC AGGGGTGTTGTCATAACG M PR-GR-BESS-420 gr075C03.f (AT)5 GAAGCAATTGATCTTCAACC GAAATGTGACACCAAGTTCA NA PR-GR-BESS-421 gr075D05.r (AT)5 TGTCCGAGGTTATTTGAATC TGGAGATGTCCTCTATTTCG M PR-GR-BESS-422 gr070N05.f (CTT)7 ACTCTTCTTCGACCAGGTG GGAAAGAGATAGGGCTGAGT M PR-GR-BESS-423 gr070N13.r (AT)5 GGCAATGGATTCAGTGTTAG GCCTGATTAGGTAGACGACA M PR-GR-BESS-424 gr077L13.f (TA)12 CGAAGGGTTAATTGTTTGGT TTCACTCCGCGTTCTTATAG M PR-GR-BESS-425 gr098L19.f (TTA)12 GTGGAGGTTATGGAAGTTCA CGTTTTCCTTGTTAGCTACG M


PR-GR-BESS-426 gr073D08.f (TA)5 CCAGCTCACACTATCCATC GCATCCTGTGTCCTAGGTT P PR-GR-BESS-427 gr071B01.r (AT)6 TACAGTTCGTGGTTGTCG GTAAGCTAAGGGTGGGTGT M PR-GR-BESS-428 gr073G24.f (TA)7(AT)9 CGAAAATGGCTTAAGTTCC AGTCCCATACATGTCCAAAC M PR-GR-BESS-429 gr075L20.r (TA)5 CAAGCTTGTTCGATCCAC CACTGTCCGAGGTAAGTCT M PR-GR-BESS-430 gr078C04.f (TC)5 CCACACAACTTTAGGCTGA GCTAGGCAGATGAAGAAAC P PR-GR-BESS-431 gr075N02.f (AT)8 CCGAAGTACTACAGGACACG GCGCTAGTTGTCGATGTAAG M PR-GR-BESS-432 gr075N07.r (AT)6 GTTTCGACCAAAAGATGGAG ATAGTTAGTGGGTGGCAAGG M PR-GR-BESS-433 gr078E19.f (AG)8 TACGTCCTCGGGCACTAC GCAAGTTGATCCCTGAACC M PR-GR-BESS-434 gr076E15.r (AT)6 GTCGAAACTCCCAAGACTG CTCCATCTCCACCAAACA M PR-GR-BESS-435 gr078I13.r (AT)6 GCATTGGATGTGACATATTG AACAGAAAGCATCACAACC P PR-GR-BESS-436 gr078I19.r (TG)5 GTAAGGGCATCGAAGATCG CACATCAATGCTGCTGTCC M PR-GR-BESS-437 gr078J05.r (AG)6 GTGAGACCGAGAGGTGATCT GCAGGGGAGACATTCTGA M PR-GR-BESS-438 gr081F01.f (AT)6 GGTTAGGAGGCGTTAGTAAG AATGAGTTGTGGGTCGTG NA PR-GR-BESS-439 gr081F08.r (ACA)5 CGTCAAGAGCACATAGGC CGTGTAGCCCTGTTTCTC FH-1000 PR-GR-BESS-440 gr078L17.r (GAT)5 AAGCCTAGACTTTACAAGCA GAGATCGTTGGAAACTTTGA M PR-GR-BESS-441 gr084F19.r (AAT)9 GATCGAGCAATTTTACTTGG TGTCTTACAGTATGCGCTT M PR-GR-BESS-442 gr084G06.r (TA)5 ATGTGGAAGCGTACTTGAAT GTTATGGTTCCCAAATTACG M PR-GR-BESS-443 gr081J14.r (AT)5 (TA)5 CCAACCATTCCATAACATC GTAGTTTGGTTTGGAGTTG P PR-GR-BESS-444 gr079A05.r (TG)5 GTAAGGACGTCGAAGATCG GCCTACACCGTTTCTTTC M PR-GR-BESS-445 gr079A19.r (AT)5 TGTCATGGGACATACTTACC TGGCATGTATAGGCTATGGT M PR-GR-BESS-446 gr081M07.r (TA)5 CACACTATCGACTCCTCTCG CGCCGGATTAGAGTTACG M PR-GR-BESS-447 gr087F08.r (TA)5 CTGAAACCTTTGAGTTGGTC GACATCCACTTGTGCCTAAC P PR-GR-BESS-448 gr087J23.f (TA)5 CAATGCTTTGACGCAACC CGTGCATAATCCCATCTTTG M PR-GR-BESS-449 gr087J23.r (AC)5 ACATTTCCGTCCTTCCAC GGTAAGGGGTGTTACAAGC M PR-GR-BESS-450 gr085A11.r (AT)5(CT)5 ATAGGTTCGGGAATCACC GTCAATGCATCCCACAGT P PR-GR-BESS-451 gr085C14.f (AAG)5 GGAATTGGTACAGTCCATGT TAAAAGGCACTATGAGAAGC M PR-GR-BESS-452 gr085F11.f (TA)10 
ACATGGGTTGCATGAGAG ACAAACACTACTCCGAGGAC NA PR-GR-BESS-453 gr085F11.r (TA)5 TGACCTCGGTGATTGTTC GGTTCGTTCATTGCACAG M PR-GR-BESS-454 gr085O22.r (AT)12 CTGGATCAGTTCTCCCTTG CTCGAGCAAGTCGAAGAG M PR-GR-BESS-455 gr089B01.r (AT)5 CACACCAAAATCCACAGAGT ACTTGAACCTAAGGGGTGAG M PR-GR-BESS-456 gr080G24.f (ATA)26 CCAACCATGGGGCTAAGT CCCCGCCACTAACTCTT NA PR-GR-BESS-457 gr080G24.r (AT)9 GAACAGAAAGGGGAAAGG GAGGGTGGCTACAAAAGAC M PR-GR-BESS-458 gr080L20.r (TAT)5 CTTCAAGCTCCCAACCAT GTATCGGGACCTCGTTTCTA BAR PR-GR-BESS-459 gr083P11.f (AT)6 CGATGACTGAGGTCCTACAT CATCTTCCAATTCACACGTC M PR-GR-BESS-460 gr086J17.r (AT)6 GCCTTGATGAGGAACTAGC CCCTCCCTCTTCTCTAAATG NA PR-GR-BESS-461 gr084C19.r (TA)6 CATACAAGCCCACACATTTC GGTAGTGGATGGCATAATGA M PR-GR-BESS-462 gr084F02.r (AAT)7 TAGGCTGCTGTGCCTAAG GTAGCTCAGTTGCCCAGA M PR-GR-BESS-463 gr081C23.r (AT)6 CTTGATTGGGCCTTGATG GAGCGATAAAGCTCAATGG M PR-GR-BESS-464 gr086L12.r (AT)5 TGAACTCTTTAGGGTTGTGG ATCTTCCACCATTCAAACAC FH-1000 PR-GR-BESS-465 gr086N15.f (AT)5(AC)5 CCCAATGGCTGAAGAAGA CCATGTATGTATGTGCGTGT M


PR-GR-BESS-466 gr086O15.r (TA)5 AATCGTAGCATCAGTTGGAT ACACTCCTAAACATGCCATT M PR-GR-BESS-467 gr092E17.f (AT)5 GCCTATGGGCATATTAGTTG GTCACACATTGCTGAGACAC P PR-GR-BESS-468 gr089P14.f (TAT)5 TCTATCCATCATAACCATCG GATCGAATTCAAGTTGTTCG NA PR-GR-BESS-469 gr098C20.r (TA)5(AT)13 CTCCCCATAGAAGGTCAGTA CGGAGGTGTTAGAAGATATGC M PR-GR-BESS-470 gr094M14.r (TA)5 AGTTGTCATGCGTCAACC GATGATTGGTGACAGAGTCC M PR-GR-BESS-471 gr098F05.r (AT)6 CCAAAACACGAGTCATACA GTATTTGGCCATATGGTTTC NA PR-GR-BESS-472 gr090F06.r (ATG)5 AGAGTGATGGCCTTGTACTG ATGTCCAGCAAGGTCAAG NA PR-GR-BESS-473 gr098H14.f (CT)5 GTCACATTCATTCTGAATCC GTCCTTCCAAATCAACTTGT M PR-GR-BESS-474 gr090M23.f (TA)6 GTGTTGAAGTACCCCATCC GGCAAGAATTGTGGGAGA M PR-GR-BESS-475 gr090N01.r (GA)18 GCCCCAAACTCTAAAGTGA AGAGGAGGAGAAATGTTTGG P PR-GR-BESS-476 gr093G14.f (TA)6 GCACTTAAAGGTTGTGGTGT CCGAACCATAGACGCATA M PR-GR-BESS-477 gr095P04.f (TA)6 CCACAACTAATCAACAGGTG GCTCGTTTGAGCTTTCTG M PR-GR-BESS-478 gr096C02.f (AT)5 CGATGGATATCGTTGGAG GCAAGGGTTATTGCAAGC P PR-GR-BESS-479 gr096C22.f (CTT)6 ACTTTGCTCCTCACTGGAG GTCATCAGCTGGGTAAAGC M PR-GR-BESS-480 gr091D01.f (AT)5 TCGTAGGAAATGGGTGTG CGAGACCCGATACTTACTC M PR-GR-BESS-481 gr099K16.f (TC)5 CTATACCGTTGGATTTGAGG CTATTGAACACGCCAAAGA M PR-GR-BESS-482 gr091E03.f (TG)6 GTTCTTCAACGCAGGTCTC CACCACACCAAACCTAGAAC M PR-GR-BESS-483 gr091E03.r (TG)6 GTCCCAATTTCTTCATGCT CACTCTTTAGGGTGAAAAGC NA PR-GR-BESS-484 gr097A06.f (AT)6(TA)7 ACCAACAAACTCATATGTTAGC AACTCATGGTATTGGAAACG M PR-GR-BESS-485 gr097F07.r (AAC)6 GTTCTCTCTTGGGCTGTTC ACGTCCTCTTCATCAGAGC NA PR-GR-BESS-486 gr094F01.f (TA)5 GCCTTTCAGATGGTTTGG GCTATTGGCAGCATCATC M PR-GR-BESS-487 gr094F04.r (AAGA)5 CCCCATAGAGGGATTAGTTC GTAGGCTCAATTGTCACCTG FH-1000 PR-GR-BESS-488 gr099P17.f (AT)5 GACTATGCTATGCCATGTGA TGCATACAGCATCACAAAAC M PR-GR-BESS-489 gr094G12.r (AT)15 GGCCTCCATATGTACTCACA TCGCTTGGGTATGATCTTC M PR-GR-BESS-490 gr091P17.r (GAA)12 GTGCAAAACCTTGCACTTAC TGAGAAGGGTTCTGAAAGAG M PR-GR-BESS-491 gr103P11.r (TA)5 AGAGGGCTGTACAGTGATGA AGGCGCCAAAGTGTAATC M PR-GR-BESS-492 
gr100K23.r (TA)5 GGACCTTTAGGAAGGGACA GAGCCTTTCCCACAAGAC P PR-GR-BESS-493 gr100L11.r (AT)5(TA)8 CTCGAGAACCATGAATGC GGGTTCAAAAGGGGCTAT M PR-GR-BESS-494 gr104B20.f (TA)9 GGGATTTGTGGTTGGATATG ATGCACACTTCAAGCAACC M PR-GR-BESS-495 gr108M04.r (AT)10 GGCATGTATAGCATAGCTGA GAATGAAATCACCATGCAC M PR-GR-BESS-496 gr108O02.f (AT)6 CTTACTTCGATGCCAATGTC GCCGAATTTGGTTCGTAT M PR-GR-BESS-496b gr108O02.f (AT)6 ACACGAAATCACACCTC TGCCAAACTTGATTCGT M PR-GR-BESS-497 gr104F16.f (TA)5 CATACCTATTCGAACATCACA TCCGAGGTAAGTCTGTTAGC M PR-GR-BESS-498 gr101F24.r (AT)5 CTACGAGTCAATGTGTGGTG CATGCATTCTTTTACACCCT NA PR-GR-BESS-499 gr108P11.r (AT)8(TC)5 GTGTGGGCACATGGTCT CTGCCACCTCCTTTCCT M PR-GR-BESS-500 gr106J22.r (TA)6(TA)5 ACATACCAAGACCGATCC GCTATACTACCCCCTTTTTG P PR-GR-BESS-501 gr106M12.r (AT)7 ACTGATAAAGGGGTGATGG CGTTGTGTGAAATTATGAGTG p PR-GR-BESS-502 gr104G23.f (TA)5(TA)7 GAAGTCCAAAAAGCTCCAC ACTCCGCTGACAATGGAC M PR-GR-BESS-503 gr106O04.r (CT)5 GCTTAAAGCTTGCCCTCT GTGACTTTGATGGCCTAGTG M PR-GR-BESS-504 gr109F03.r (TA)5 GAGTCTCATGCACAATCAAC TCTATTTTCCTCCCTTCTCC M


PR-GR-BESS-505 gr109K17.f (AT)5 CAGCTCTTGTTCTTGGCTAC GAAGATGCCTAGTCAGCACT M PR-GR-BESS-506 gr107J16.r (AT)6 TCAACCAACCTAGATCCC ATGGCCATTTGATACTTGAG M PR-GR-BESS-507 gr107K14.f (AT)6 CCTACGAGACCCGATATGTA GCCTGGTAAGTTCGTGTAAC M PR-GR-BESS-508 gr107L07.r (AT)25 TCAATCTTGAGTCAGTGTCG GTCATCCTCAACCGAGTTAC M PR-GR-BESS-509 gr105G06.f (AT)6 TCTCCCCTACTTCATTAGC CCTTTGAGAAGACAGCTG M PR-GR-BESS-510 gr108E07.r (AT)14 GCTTGCTGTTAGTAGATAGCTC GGGAAGGGCTTGAAATAG NA PR-GR-BESS-511 gr108F22.r (AT)5 AGATGTGTGTTGAGCTTCC CCTCTCCCTTCATCATTTG M PR-GR-BESS-512 gr103K22.f (AT)5 GCAATGTAGTGAGGTAAGTTCG AGTCGTCGTACGTCGTGA M PR-GR-BESS-513 gr111F13.f (AT)5 CAATTAGGGTTACGAAGCAT GATGTGGATTGTTAATGACC M PR-GR-BESS-514 gr116N05.r (AT)5 CACCGAATGAACACACAAAG CTGATTCAGGTACGTTCGAT P PR-GR-BESS-515 gr119E16.r (TA)5 CCAGAGGTCTAAAACTGCAA TAGCTGAAGAAGATGGGAAC M PR-GR-BESS-516 gr117B02.f (AT)5(CT)5 CTTGGCCATAGCCTAACAC GCTTCCCGTTATACAGCTC M PR-GR-BESS-517 gr117C19.r (GA)5 GTACCCTTTTACTTCCTTCA CTTCCACTTAATTTCACTCG M PR-GR-BESS-518 gr112C15.f (AT)5 GTGGTCAAATAGTCGTCTCG CCAGACGTGGTCTTACACAA M PR-GR-BESS-519 gr119L07.r (TG)5 GTAAGGACGTCGGAGATCA ACCATTGTGACCCTGTCTC M PR-GR-BESS-520 gr114M03.f (AG)11 CCTACATGCCTACCCAA GCCACTAGCCTCTTACAGA BAR PR-GR-BESS-521 gr117J02.f (GT)6 GGCCATCTAGGGCTTTAC CGCACCATAGACACCTTC M PR-GR-BESS-522 gr117J13.r (AT)5 GCCAGGTAAGTTCGTATGAC ACACGGGGCTCAATAGTC M PR-GR-BESS-523 gr113A04.r (TTC)5 CATGGGAATGAGTGGGTA CAACACCTGGTCTCACAAAG M PR-GR-BESS-524 gr120F14.f (AT)6 TGAAGTTGCACAAGAATTTG GACAGATGTTGAAATACAGCA P PR-GR-BESS-525 gr113B21.r (TA)5 GGTCCCACCAACAAAAAC CTCGTATTCGAGCACTATCC M PR-GR-BESS-526 gr115K17.f (TA)8 CCCGTTGAACCAACTAGA GAGGTCAGAACATCATCCAC NA PR-GR-BESS-527 gr113C19.r (AT)5 CCTGTCAACCACAATATCA CATGTTTCGGTTGGATTACT M PR-GR-BESS-528 gr118C14.f (TA)5 GTGATATTGGTTGGGTGGT GTTCCATACATGTCCAAAC M PR-GR-BESS-529 gr120J01.r (TA)5 GGTAAAGAAGCTTACCAGGTC GGTTCCATAGTGAGGGACA M PR-GR-BESS-530 gr116D17.f (TA)5 CAACCCTTTTGCTAACAT ACACACATGAAATTGCAC M PR-GR-BESS-531 Gr121A18f (TTA)5(ATT)6 
GTGGAGGTTATGGAAGTTCA CGTTTTCCTTGTTAGCTACG P PR-GR-BESS-532 gr113I04.r (AT)5 GTTATGCAATGCCATGAGA GTTTAACCATCCATCCATG M PR-GR-BESS-533 gr116E23.r (ATA)6 CTAGCACCCTGGGACTAGA GCTGCCTCTACTCCGAAA NA PR-GR-BESS-534 gr113I23.r (TA)6 CGGACATACGATGTACGATG CCATATCTCATGCACAGTCC M PR-GR-BESS-535 gr092G23.r (ATGT)6 GTAACACCCCTTACCCGT GGCTTGGATCGAATGAGT FH-1000 PR-GR-BESS-536 gr118I20.r (AT)5(TA)6 CGATTGTAGTTGTTGTTGAGC TCCACCTCACATATAATCACC P PR-GR-BESS-537 gr121F03.r (AC)5 CGTGCAATGTTGCTATCC TCGTAAGGACGTCGAAGA M PR-GR-BESS-538 gr118M06.r (AT)5 CTACTTAAAGGGGCCGAA GTGGATGCATGGATTAGT M PR-GR-BESS-539 gr121M18.f (AATA)6 CCCGGATCTAGAGACAAGA CTTCTAGGGGTTTCGTAGG M PR-GR-BESS-540 gr121N17.f (TA)6 GATTAGGTGCAAAAACAAAC CTGAGCATCCCACTTCATAC M PR-GR-BESS-541 gr125A07.r (AG)21 CGTTGATAGGCCGCTATAAG GGAAGCAAAGCCGTAAGTTC M PR-GR-BESS-542 gr130E19.r (AT)5 GTGTCTGCGCTTAGGTAGT TGCAGTTCTCTCCAAAGG P PR-GR-BESS-543 gr127O21.r (AT)17 ACACCAGATAGAACAACAAC CATTGGCATTTTATAGCAC M PR-GR-BESS-544 gr130F01.f (TA)5 ATCGGATCAACATTCAAGTC CATCACAAATGTGCCTCA NA


PR-GR-BESS-545 gr128C17.f (AAT)5 CTCGTTCCTTGTGACTAAGC CTGGCGAAGTCTATTCTAGG M
PR-GR-BESS-546 gr122M09.f (GA)6 CTGCCTCTTCTCTCTCACT GACCTTAGTACCCCATAAC M
PR-GR-BESS-547 gr122M21.r (AT)5 TCACTTGCTGAATTACCAC ACGATTCTGAAGGTGAAAG FH-1000
PR-GR-BESS-548 gr128J10.r (AT)6 CATAACAAATAGGGCCTACG GCCTAGAAAGTTCGTGTAAC P
PR-GR-BESS-549 gr123C10.f (TCT)5(TA)6 ATCATGCATGTGGCTCTG CACTTAGGGCAGGTGGTT P
PR-GR-BESS-550 gr123E21.r (ATTT)6 GAGATGACAACCCAAGCA GCTAAGTAGGGGGTTCGTAG M
PR-GR-BESS-551 gr123F09.r (AT)5 GCTTGTACAATGCCTACGTC GTGTGGGAGAAGTAGATACG M
PR-GR-BESS-552 gr126L06.f (ATA)6 CCGAGTATTGCCTTATTTTG TGTGATAGGCTTTTAGGTC M
PR-GR-BESS-553 gr129B03.f (AG)6 GCTGGAAGCTTTGGTTAGG GGTAGCTGTCAGGAGCAC P
PR-GR-BESS-554 gr127B15.r (AT)9 TGTCGTCTCCCACTGAAAC CCTCAAGGTGCCACAAAC M
PR-GR-BESS-555 gr129E07.r (AT)5 CTCAAGCTTTGGGATATTAGA TGGAATACCCGTGTAGTG P
PR-GR-BESS-556 gr124B10.r (AT)8 TTGGCTCTATTGGTAAAACC GAGTGAATCATGCAAATCAT M
PR-GR-BESS-557 gr124C02.f (CT)5 GGAGTTTAAGGTGCTGAACA GGAGTTTAAGGTGCTGAACA NA
PR-GR-BESS-558 gr132A08.f (ATA)5 GTTTCTCAACGAGTCCAAAG CACATATTTGACCCCGTACT M
PR-GR-BESS-559 gr127F21.f (AT)7 CAACCGAACCTACTTTCATC CACCGGGTAAGTAAAGGT M
PR-GR-BESS-560 gr030M02.r (TA)6(AT)5 CGCATTGGATATGACACG CGCAGCCCAATACGTTAG M
PR-GR-BESS-561 GR-9-IV-1-9_10_09_A02_F (AAT)14 TCGAATGGTTTTTGTTTCTT TCAAATAGTGACAAATGTAGGG M
PR-GR-BESS-562 GR-9-IV-1-9_10_09_P02_R (AAT)12 CGACAAATGAATCGTTCTCT CTTCCTCTCCCATTTAGCTT NA
PR-GR-BESS-563 GR-9-IV-2-9_10_09_F04_R (AAT)11 TGCGTACGTTTAGGTATAGA TGGAGGTAGTGGTTTATTGC P
PR-GR-BESS-564 GR-9-IV-2-9_10_09_O07_R (AAT)7 CCTGCTAGCATAGCTGTCTT CAAGTTCAAGTTTCCTGGAC M
PR-GR-BESS-565 GR-9-IV-12-9_10_09_K23_F (AAT)9 GCAGGCATGCAAGCTTAT TGTGCCTTATATACCCCATT M
PR-GR-BESS-566 GR-9-IV-14-9_10_09_A21_R (AAT)17 ATGGACCACTTGAGAAAAAC GGTCCCATGTGTGATCTATT BAR
PR-GR-BESS-567 GR-9-IV-16-9_10_09_H23_F (AAT)13 GTTCATCGAGTTTTGGTGAT GCTCTAGGCTACTGACTTGC M
PR-GR-BESS-568 GR-9-IV-18-9_10_09_A08_R (AAT)8 TCGAAAATAAAGAACCAAGT CTCCCTAATGTCATTTTAATTTC M
PR-GR-BESS-569 GR-9-IV-20-9_10_09_A11_R (AAT)8 TTTCTCCTTGTTTCGTTTCT GACAAGTGGATGGTGATGAT NA
PR-GR-BESS-570 GR-9-IV-20-9_10_09_e06_F (AAT)14 CATAAGGATTTTCCCCAAAC ACCAAAACTTTGGCTTTGTA M
PR-GR-BESS-571 GR-9-IV-20-9_10_09_L22_F (AAT)18 AGTGATGGTTTTTGGAACTG GAAATTTTCCTGCATTCTCC BAR
PR-GR-BESS-572 GR-9-IV-22-9_10_09_L23_F (AAT)11 GCAGGCATGCAAGCTTTA TGGGTACAAGTTAACCAAGG M


PR-GR-BESS-573 GR-9-IV-24-9_10_09_I24_R (AAT)8 CTCAATCCTTTCAAAGATCCTA CAACTTAATCCTTTGCATTTG NA
PR-GR-BESS-574 GR-9-IV-25-9_10_09_N18_F (AAT)14 GGATGAAATTTCCGTAGACA CGATGTCCACTACACGATTA M
PR-GR-BESS-575 GR-9-IV-27-9_10_09_d06_R (AAT)7 GGATGAAATTTCCGTAGACA CGATTACATTGATCAGAACG P
PR-GR-BESS-576 GR-9-IV-27-9_10_09_H05_R (AAT)7 ACAATGACATCCTCTCTCTCA AATACATAGGATGCGGTTTG M
PR-GR-BESS-577 GR-9-IV-3-9_10_09_P06_F (CAT)5 TGCCTTTCATCTTTTCTCTC TGAAATGGATTGTCACTGAG M
PR-GR-BESS-578 GR-9-IV-12-9_10_09_e06_F (CAT)5 CAACACTTCAGTTTTTGTGC CAACAGGTAGTATATGGACTAGAG P
PR-GR-BESS-579 GR-9-IV-13-9_10_09_I08_F (CAT)6 GTAAACTTGATTTGCCCAAC CCCCTGTTCCTAATGTGATA M
PR-GR-BESS-580 GR-9-IV-16-9_10_09_P08_R (CAT)6 GTCTAGCTAGGTCAGGCATC GAGCTCCTTGTAGTCTCGAA M
PR-GR-BESS-581 GR-9-IV-19-9_10_09_c05_R (CAT)6 TTACCTGCATTTTTCTAGCC GCTTCATTAGCATGGAAGAG P
PR-GR-BESS-582 GR-9-IV-21-9_10_09_M23_R (GGA)6 GAAGAATTGGAGCCCTTAGT GCAAGGTCACTGTCTTCTTC M
PR-GR-BESS-583 GR-9-IV-14-9_10_09_F08_F (AAC)5 GTCAAATTTGGGGGTACTCT GGATGAATCATTTGAAGCTC NA
PR-GR-BESS-584 GR-9-IV-14-9_10_09_F08_R (AAC)5 TCAAATTTGGGGGTACTCTA GGATGAATCATTTGAAGCTC M
PR-GR-BESS-585 GR-9-IV-22-9_10_09_L24_R (AAC)11 AACAATCTGCACACACACAC CCCCTGTAGACGTGATAAGT BAR
PR-GR-BESS-586 GR-9-IV-29-9_10_09_H23_R (AAC)9 GGATTCCCACAGAATACTCA TCAGTTTCAGTTAGCATGGA M
PR-GR-BESS-587 GR-9-IV-1-9_10_09_c11_F (AC)8 AATGGCATACCTAAATTCAA AAGTATGAAGGTTTATAAGGGTA M
PR-GR-BESS-588 GR-9-IV-2-9_10_09_d08_F (AC)14 CCCCCTTTTAGTTTCAGTG TCATGTGCTAGTGGATGGTA M
PR-GR-BESS-589 GR-9-IV-3-9_10_09_K15_F (AC)9 CTCTAGAAAATCCCGGTAGG GACCTAAAAGATGGATGCAC M
PR-GR-BESS-590 GR-9-IV-13-9_10_09_e12_F (AC)14 CCCCCTTTTAGTTTCAGTG TCATGTGCTAGTGGATGGTA NA
PR-GR-BESS-591 GR-9-IV-17-9_10_09_F08_F (AC)7 CAGCTCTTTTAGCAGAGGAC GGCAAAACTCGATAACTTGT M
PR-GR-BESS-592 GR-9-IV-21-9_10_09_G09_F (AC)9 AAGCTTTTCTGCTGTTTGAC CCTTCAAGACTCACATCAAAG P


PR-GR-BESS-593 GR-9-IV-23-9_10_09_e15_F (AC)7 GACCCTTCAGTTTCATCTCA ACCGTTACGAAGAAGTTTGA P
PR-GR-BESS-594 GR-9-IV-12-9_10_09_M16_R (AG)15 TACGGGTTGAAATGTACTCC ATGAATGCAGATCATTACGC M
PR-GR-BESS-595 GR-9-IV-15-9_10_09_K12_R (AG)7 TTCACTTCTTGAGTTGCACA GGTCTTTCGTTGCTATGTTC M
PR-GR-BESS-596 GR-9-IV-22-9_10_09_F06_R (AG)16 GGAAGCTCAGTGTTTTTCAG CGGTGAGTAAACAAAATTCC M
PR-GR-BESS-597 GR-9-IV-22-9_10_09_H10_R (AG)10 CCTCCAAAGTTCACCAGATA CTGGTCAGGATTCATCAGA M
PR-GR-BESS-598 GR-9-IV-24-9_10_09_b01_F (AG)7 TGGATTCATCCCAACTATC GCTATGCTTATGGCCTTATC M
PR-GR-BESS-599 GR-9-IV-24-9_10_09_e01_F (AG)8 GGAGTTTGAATAGAGAACCAG TGGATGAGTGATTGTAGCA P
PR-GR-BESS-600 GR-9-IV-2-9_10_09_A21_R (GAA)6 GGTTAAACAAGGAAAGCAAG GGTTCAATGGACAACACTTC M
PR-GR-BESS-601 GR-9-IV-2-9_10_09_b14_R (GAA)6 CGAGATGAAAAGCATAAACC GCCATGTTCATACAGAGAGG M
PR-GR-BESS-602 GR-9-IV-3-9_10_09_H04_R (GAA)5 AAGTGCAAAAGCATGAGTAG GCTACTGTGATCTCGGAAATA NA
PR-GR-BESS-603 GR-9-IV-12-9_10_09_A21_R (GAA)6 GTGGATGATGTTTCACTTCC CATCAAATCAGTCATTGTGG M
PR-GR-BESS-604 GR-9-IV-12-9_10_09_H07_F (GAA)9 GCTTCCTCAAACAGTTCATC CAACAACTTCCCTCAAATGT P
PR-GR-BESS-605 GR-9-IV-13-9_10_09_A09_R (GAA)5 TCAAAAGGAGCAGAAACAG GTCTATGGTCAAAAACGACA M
PR-GR-BESS-606 GR-9-IV-13-9_10_09_F09_R (GAA)6 CCATATACAATCTGCGTGTG GTGCTCATCTTTTTCATCCT M
PR-GR-BESS-607 GR-9-IV-20-9_10_09_M07_R (GAA)6 GACGATAAAATCAGGGACAA TCCAAATTCCTATGTTCACC NA
PR-GR-BESS-608 GR-9-IV-20-9_10_09_O11_R (GAA)6 GAAAGGAGAAGAAGAGAAGGA CGATATTACAATATATCCCCACA M
PR-GR-BESS-609 GR-9-IV-21-9_10_09_d01_R (GAA)6 GAAAACTCATTTCCATCAGC TCCGTTTTGCTCTTATTCTC P
PR-GR-BESS-610 GR-9-IV-17-9_10_09_I03_F (GAA)5 AAGTTTTGGCACTAGCTTTG CTGCATCACCAATACACAAG NA
PR-GR-BESS-611 GR-9-IV-21-9_10_09_G06_F (GAA)7 AAGTGTTCTAATCGCCAAAC AATTGTAAGCTTCGGAGACA M
PR-GR-BESS-612 GR-9-IV-22-9_10_09_P13_R (GAA)17 TGAGGTGATAGGGATGTTTC TCTTTGGCTCATCACTCTT P


PR-GR-BESS-613 GR-9-IV-23-9_10_09_c02_R (GAA)6 GTGTGTTTGGGGACCTTAT CTTCCCTCTCTCTTTTCTCC M
PR-GR-BESS-614 GR-9-IV-28-9_10_09_A12_R (GAA)12 GTGGTAAGGGGTAACATTGA GAGATTCCAACCAAAACACC M
PR-GR-BESS-615 GR-9-IV-29-9_10_09_J05_R (GAA)5 GCCTGCCCTTAACTCTATTG ATCACCACCACAAAGAAATC P
PR-GR-BESS-616 GR-9-IV-29-9_10_09_O11_R (AT)11 TCCGTCTCAACTCACCTATT TCCACTATCCACTTCACTCC M
PR-GR-BESS-617 GR-9-IV-29-9_10_09_K13_F (AT)8 CGGTTAATACAAGTGGGTGT AGGATGGCTATAGGTTCCAT M
PR-GR-BESS-618 GR-9-IV-29-9_10_09_K05_R (AT)11 GTGGATTTGGGTTATTTGG TTGAAGCACTTAATCCTCGT M
PR-GR-BESS-619 GR-9-IV-29-9_10_09_G09_F (AT)17 AGGTGCATAGTATTGGTTGAA TTGAAATGTTCCAATCACAA NA
PR-GR-BESS-620 GR-9-IV-29-9_10_09_b20_F (AT)8 GTTTCTGCTACCACTTCCAC CCTTTCAAAAACAACCGAGT M
PR-GR-BESS-621 GR-9-IV-28-9_10_09_P12_R (AT)9 GTATTTGGCCATATGGTTTG GAGCTCGTAACCAACTTCC BAR
PR-GR-BESS-622 GR-9-IV-26-9_10_09_N07_F (AT)28 AGCATCTTATGCGCAAGTA GAAAGCTCTAGGTTATTCATGG M
PR-GR-BESS-623 GR-9-IV-27-9_10_09_c01_F (AT)5 CAGGCATGCAAGCTTAAC AACTCGACATAGGTGGTTG M
PR-GR-BESS-624 GR-9-IV-27-9_10_09_F16_R (AT)14 CTATACTGCGGCTAGCATCT TCTGCGCAATACAAGAAAC M
PR-GR-BESS-625 GR-9-IV-27-9_10_09_I06_R (AT)10 TCTCAAATCTCATCATTTGC TGGCACTTTGACATTTATTG P
PR-GR-BESS-626 GR-9-IV-27-9_10_09_N02_R (AT)12 CGAGCACATCTCAGAAGAA GGATTGCAGATGTTTATCA M
PR-GR-BESS-627 GR-9-IV-28-9_10_09_e17_F (AT)14 CTACATCACGCTTTCCTACC GTCATTTAAAACGGGTCAAC P
PR-GR-BESS-628 GR-9-IV-27-9_10_09_I06_R (AT)10 TCTCAAATCTCATCATTTGC TGGCACTTTGACATTTATTG M
PR-GR-BESS-629 GR-9-IV-27-9_10_09_N02_R (AT)12 CGAGCACATCTCAGAAGAA GGATTGCAGATGTTTATCA NA
PR-GR-BESS-630 GR-9-IV-28-9_10_09_e17_F (AT)14 CTACATCACGCTTTCCTACC GTCATTTAAAACGGGTCAAC M
PR-GR-BESS-631 GR-9-IV-28-9_10_09_I24_R (AT)15 TCAGAAAATGACTTGAGCTT TGCCTGATTGGATTTGAC M
PR-GR-BESS-632 GR-9-IV-28-9_10_09_J03_F (AT)20 TGATATCATATGCTCGTGA CCTAGGCCAAATTAGAGAGA M

GR-9-IV-28- PR-GR-BESS-633 9_10_09_N10_F (AT)11 TAGGCTAGTGTTCGTTTGGT TGCATCCATCATCTACTTCA M GR-9-IV-25- PR-GR-BESS-634 9_10_09_G03_R (AT)8 CAAATTCACGTGTAGCAAGA ACGAATCGTTGATGGACA M GR-9-IV-26- PR-GR-BESS-635 9_10_09_b07_F (AT)22 GATATTCAAATTCGGCAGAC GATTGCTGAGAATGGAGAAG P GR-9-IV-26- PR-GR-BESS-636 9_10_09_b12_F (AT)14 ATGAAGAGTGCTTGCTGT GCTAGAAGCAGAGAAGCTCA M GR-9-IV-26- PR-GR-BESS-637 9_10_09_b12_R (AT)8 TCTCAGGATCTACGTTCTGG ACAACTCCAAGAGGCAGTAA P GR-9-IV-26- PR-GR-BESS-638 9_10_09_G16_F (AT)8 GGGTTAAGAGGCATTACCA GCACGTGCAAAGTTAAGAA M GR-9-IV-26- PR-GR-BESS-639 9_10_09_K01_R (AT)6 CACTCAACCAAGAACACCAT TGATCTCTCAAGTACATTCACC M GR-9-IV-26- PR-GR-BESS-640 9_10_09_K19_F (AT)24 GCTTTTAAGAGTTGATGCTG ATCCTGAAGAACCCCTACTC M GR-9-IV-25- PR-GR-BESS-641 9_10_09_F20_R (AT)9 GTTGTTCGAATAGTCCAGTACA ATTCGGCAACACATCTCTAT FH-1000 GR-9-IV-25- PR-GR-BESS-642 9_10_09_e13_R (AT)10 TTGCAGATTAGTCATTGGTG TATCCATACCATTCGATCCT NA GR-9-IV-25- PR-GR-BESS-643 9_10_09_b13_R (AT)8 TGATTTAGCAACATTTACGAAG GCTCTTAACCAAAAGTCAATTC M GR-9-IV-24- PR-GR-BESS-644 9_10_09_J04_F (AT)14 AGGTAGCTATCGCAATTCAG GCAGGAACTCAGTCCATATC FH-1000 GR-9-IV-24- PR-GR-BESS-645 9_10_09_J03_F (AT)11 AGCCTACTTACATACGGACA ACCCAAATACAAAAGCTGGA M GR-9-IV-24- PR-GR-BESS-646 9_10_09_H22_R (AT)11 GACAGATGCTCATTCAATC GTTCTGACACTTGGACTTG P GR-9-IV-24- PR-GR-BESS-647 9_10_09_e01_R (AT)8 GCTATTTGGTGTGGCTTATT GCATACCAATTTCACAAACA M GR-9-IV-23- PR-GR-BESS-648 9_10_09_K03_R (AT)8 GATGTTAATGTGGTGATGG GGTCATTAGCACAACCTCTC M GR-9-IV-23- PR-GR-BESS-649 9_10_09_e19_F (AT)16 GGGGATCCTCTAGAGTCG CTGTTAAGCTGCAAACATGA M GR-9-IV-23- PR-GR-BESS-650 9_10_09_c22_F (AT)14 GTCTCTGAGCCTAAATCGTG GTGGCTATGACTCAAATGG M GR-9-IV-23- PR-GR-BESS-651 9_10_09_b03_F (AT)5 TAGACCAAGAACGAGAGGAA ACTATGTCCAAACGAGCATT M GR-9-IV-23- PR-GR-BESS-652 9_10_09_A20_F (AT)14 GTCTCTGAGCCTAAATCGTG AGTGGCTATGACTCAAATGG M

GR-9-IV-22- PR-GR-BESS-653 9_10_09_P21_F (AT)9 GGTTGTAAATAGCCATTGGA CAAGCCAAAACCAAGTCTTA P GR-9-IV-22- PR-GR-BESS-654 9_10_09_N12_F (AT)13 ATCGACATCTTCCAATTCAC CAGCTCATGTGAGCATACAG M GR-9-IV-22- PR-GR-BESS-655 9_10_09_N07_R (AT)10 AGATAAAGCCAAGATCCACA AATGGAACTTCATCTCATGG M GR-9-IV-22- PR-GR-BESS-656 9_10_09_M14_F (AT)12 GTGAGAGGAGTGATTTGTGAC GTCAATTTTAGCCTCAAGTG M GR-9-IV-22- PR-GR-BESS-657 9_10_09_K18_F (AT)10 CCCTAAATTGTTGCAAACTC GAGATATCACCGCTAGACCA NA GR-9-IV-22- PR-GR-BESS-658 9_10_09_K03_R (AT)9 ATAAAACAACACATACGCTTGA AGTAATCACGTTTAATCGAAAAG M GR-9-IV-22- PR-GR-BESS-659 9_10_09_J04_F (AT)20 GGCATGCAAGCTTAAACC GGAAAGTTTAGGGGTTTCG M GR-9-IV-22- PR-GR-BESS-660 9_10_09_G04_R (AT)13 TGGGTGAATTATGGTTTAGG GTGCATAAATGCTCATGCT M GR-9-IV-22- PR-GR-BESS-661 9_10_09_F15_F (AT)18 GGAAAGAAGCTCAATTACAAGG TACTTTCCCGAAGTGAAAAC M GR-9-IV-22- PR-GR-BESS-662 9_10_09_e16_R (AT)12 ATCCGACGAGATACTTTGAA GTGTCATTAAGAAAGGAGGT M GR-9-IV-22- PR-GR-BESS-663 9_10_09_e02_F (AT)11 GCAGGCATGCAAGCTTTA CTCTCACCAACAATCGTCTT M GR-9-IV-22- PR-GR-BESS-664 9_10_09_b02_F (AT)24 CAGGCATGCAAGCTTATTAT GTTAGGGCTCAATTTCCTTG P GR-9-IV-21- PR-GR-BESS-665 9_10_09_K05_F (AT)29 CAGGCATGCAAGCTTATAG CTTGGGAAAAGGGATAATTC M GR-9-IV-21- PR-GR-BESS-666 9_10_09_I17_F (AT)10 AGGCTTGAAGCTTTGTTGT CACGTGGTCCATTAAACAT P GR-9-IV-21- PR-GR-BESS-667 9_10_09_H15_F (AT)6 GGAAGTTAGGTGATATTGGTTG CATCCATTTCCAATTACCAC M GR-9-IV-21- PR-GR-BESS-668 9_10_09_c14_F (AT)5 GGCATGCAAGCTTATTATGA CGAGCAACAATAAAAAGGAC M GR-9-IV-20- PR-GR-BESS-669 9_10_09_P07_F (AT)8 GAGCCCTATTTTAGGCTTCT ATGCTATGATCAATGCACGA M GR-9-IV-20- PR-GR-BESS-670 9_10_09_L20_R (AT)11 CTTTTTAAACCCACTTGCTG ACAGGTGTAAAGCAAAAAGC NA GR-9-IV-20- PR-GR-BESS-671 9_10_09_G23_F (AT)20 TCTATAGGTTGCCTTGAAGC CTGCCCACACTCATAGGTAG M GR-9-IV-20- PR-GR-BESS-672 9_10_09_F04_F (AT)18 CACTCTTGTCCCATGTACC TCCATAAATGTTTCCATGC M

GR-9-IV-20- PR-GR-BESS-673 9_10_09_e04_R (AT)14 GCACTTCATGCTTCTTCTTC GAAAACCTGAATGACAATCC P GR-9-IV-20- PR-GR-BESS-674 9_10_09_b15_F (AT)12 CATAAAGGATGTGGAATGG CACACACACACACACACAC M PR-GR-BESS-675 (AT)11 TATGTCAATGGACCACATCA GTTGCATTGAGGTTCAACA M GR-9-IV-19- PR-GR-BESS-676 9_10_09_N11_F (AT)8 GGCATGTATAGGTGGACTTG TCATTTGAGCCTTACCATTC P GR-9-IV-19- PR-GR-BESS-677 9_10_09_K15_F (AT)12 AAAGAAGAGCACGTTTCAAG CGATAGGCTGAACATGATTT M GR-9-IV-19- PR-GR-BESS-678 9_10_09_J05_R (AT)2 CCCCCTTTGTTTGACGTG AGCCATCAGACATGAAAACAC M GR-9-IV-19- PR-GR-BESS-679 9_10_09_e22_F (AT)9 CATGACATGGGTTCAAAGA GAACCTTGCTATGGTGTTTC P GR-9-IV-19- PR-GR-BESS-680 9_10_09_e18_R (AT)12 TTCCATTTACAAGGATAGTTCA CCGGTGTGTTGTATCATGT NA GR-9-IV-19- PR-GR-BESS-681 9_10_09_A11_R (AT)27 TGGATTGAAATGTTATGTTTTGTT TGCAAATCGTTTGAATACAGG M GR-9-IV-18- PR-GR-BESS-682 9_10_09_P12_R (AT)17 GCTCGAGCTCGGCTCAATA TGCTAGGAGCCCATCTAAAAA BAR GR-9-IV-18- PR-GR-BESS-683 9_10_09_M09_R (AT)8 TTCCTCTTCTATATGACATTCCAAA AACGATGTTAAAATGATAGTGATGAG P GR-9-IV-18- PR-GR-BESS-684 9_10_09_I09_R (AT)9 CCAACATAATCATGACAACC GCCAGTGAACTGGATTAGAC P GR-9-IV-18- PR-GR-BESS-685 9_10_09_H11_F (AT)10 GCCTACTATCTTCGGAATCA ACGACGTTAACTTTTATCA M GR-9-IV-18- PR-GR-BESS-686 9_10_09_H02_F (AT)9 GTCCCTGAGTCTAAAACGTG AGTGGCTATGACTCAAATGG NA GR-9-IV-18- PR-GR-BESS-687 9_10_09_b16_R (AT)8 GATAATGGTGTGACCGAATC GAGGCCGAACATACTATTGA M GR-9-IV-18- PR-GR-BESS-688 9_10_09_b02_F (AT)20 TAGTGAAAGCCTTCCTCAGA GAAGCAATCAGAAACGAACT M GR-9-IV-17- PR-GR-BESS-689 9_10_09_O13_F (AT)5 CAGGCATGCAAGCTTATAC GATTGACTTCAACCTTGTACG P GR-9-IV-17- PR-GR-BESS-690 9_10_09_M17_F (AT)17 CATTTGGTAGTTCCTTAGGT CCATCAATTGAGAGTAACCA M GR-9-IV-17- PR-GR-BESS-691 9_10_09_J09_R (AT)8 GCTTGGTAAGATCATAAGGTG CCATTGATAAGCCAATACC FH-1000 GR-9-IV-17- PR-GR-BESS-692 9_10_09_I16_R (AT)5 CTTATCATGCACATACAATCG CAGGTTTTAAGCCTAGCAG NA PR-GR-BESS-693 GR-9-IV-17- (AT)9 M

9_10_09_H08_F GACTCGAGAAAATTTACGG CACTTCCCACTTTCAAATC GR-9-IV-17- PR-GR-BESS-694 9_10_09_H03_R (AT)9 CATATTTAAGCCCCGAAG GCAAACACAAACTAATGCAC FH-1000 GR-9-IV-17- PR-GR-BESS-695 9_10_09_G21_R (AT)12 CATCATGTATTCAGGTTCTG GTGACCAAATTTCTCATCC M GR-9-IV-17- PR-GR-BESS-696 9_10_09_e24_R (AT)14 TGGTATGCTTTGGTAAGGT CATAATAAGGCATCCAAATCA M GR-9-IV-17- PR-GR-BESS-697 9_10_09_d07_R (AT)8 GCGTTTATCGCTTTATGAG CAGGTGTTCAATAACATGC M GR-9-IV-17- PR-GR-BESS-698 9_10_09_b04_F (AT)8 TGACGAGAAAGTGAAGGATT GTACGAGGCATTACCAAAAC P GR-9-IV-16- PR-GR-BESS-699 9_10_09_I01_R (AT)14 GCTTACTAAGTGGAATGGA GCCTCAAGATTTGTCAAC M GR-9-IV-16- PR-GR-BESS-700 9_10_09_F01_F (AT)8 CAGCACTCTAGCGTAGTCCT AGTTGGCGTATTGAGTCTTC NA GR-9-IV-16- PR-GR-BESS-701 9_10_09_e20_R (AT)7 CATTCGAACCACACACATT TCCGAGGTAAGTCTATTAGCA M GR-9-IV-3- TGCAAGCTTATCCCCTACTA GGAAAGTGGAAATTTTTGT PR-GR-BESS-702 9_10_09_e13_F (AT)15 M GR-9-IV-16- CGAGTATGACGAATCAAGG GAGTCTGCCCGTTCTATATG PR-GR-BESS-703 9_10_09_d04_F (AT)22 M GR-9-IV-16- PR-GR-BESS-704 9_10_09_d01_F (AT)18 TCGTACTTGAACCCAGTC TCTTTCGTCAAGGGAGAA M GR-9-IV-13- PR-GR-BESS-705 9_10_09_N17_R (AT)15 TCAGAAAATGACTTGAGCT GCCTGATTGGATTTGACT P GR-9-IV-12- PR-GR-BESS-706 9_10_09_K05_R (AT)21 CTCCAAAGCCAAGTGATAAG CCCCAGTGTGGATTTATATG M GR-9-IV-11- PR-GR-BESS-707 9_10_09_M12_F (AT)11 ACAATCTTCCAGAATCACG GAAGGGAGGGTAGCATTAGT FH-1000 GR-9-IV-12- PR-GR-BESS-708 9_10_09_K06_R (AT)9 CTTGGATCATGTGAATTGTC CAATACACCCACTTTGGAG M GR-9-IV-11- PR-GR-BESS-709 9_10_09_N01_F (AT)10 CCATCTTCTTGCCAGATTAG CCTTAGCAAATCAACCAAAC M GR-9-IV-12- PR-GR-BESS-710 9_10_09_K13_F (AT)8 TAAACATGGGAAGAATGTGG GGGCTAGGGCTATCTTATTC P GR-9-IV-14- PR-GR-BESS-711 9_10_09_L10_R (AT)9 CGCGTAAGGATTAAAAGTTG CGGGCTAAAATTGGGTATTA M GR-9-IV-14- PR-GR-BESS-712 9_10_09_N18_F (AT)12 CCTAACAATAGTTCACACCA GTAAAAGTGTTAGCCGAATG FH-1000 PR-GR-BESS-713 GR-9-IV-12- (AT)8 M

9_10_09_K18_R GGTTTGGTTTGTGGTTGTAT GTAAAAGTGTTAGCCGAATG GR-9-IV-11- PR-GR-BESS-714 9_10_09_O13_F (AT)9 GGCCTATCTAGGAACACCTT AACAAACGAAAGGAAAGAAA M GR-9-IV-12- PR-GR-BESS-715 9_10_09_A06_F (AT)20 GCAACTTAAATGGATTGTTTG TTTCCATTATGCCAAGAGTT M GR-9-IV-12- PR-GR-BESS-716 9_10_09_M15_F (AT)11 GCAAGTTGATCCAGAATCTC GGTTATTGTCGAAGAGGTTG NA GR-9-IV-12- PR-GR-BESS-717 9_10_09_A16_F (AT)8 CAAGCTTCGGATAAGTGTTC ACGTGACTCGTTATCCATC M GR-9-IV-12- PR-GR-BESS-718 9_10_09_O13_R (AT)16 TGTCCAATTCGGTTTCTAAC ATATCTGATGCTTTGCGAAC M GR-9-IV-14- PR-GR-BESS-719 9_10_09_O08_F (AT)11 CGATAGTCGTCATACGTCA TCAAGATCGAGTGGAAAATC P GR-9-IV-12- PR-GR-BESS-720 9_10_09_b16_F (AT)9 TGTCGAGAGTCAACAACTCA GACGACATCCGAGGTAAGT M GR-9-IV-12- PR-GR-BESS-721 9_10_09_P07_F (AT)9 TGGAGGATCTAAACACCTG CCACCCAATCAACTATCACT M GR-9-IV-15- PR-GR-BESS-722 9_10_09_d16_R (AT)8 CCCCTTATGGTGTAAATTG CCATCAATGTTGACTTGTTC M GR-9-IV-1- PR-GR-BESS-723 9_10_09_F06_R (AT)6 CCAATCACATTTCGTATCTG GTGTTTGAAGAAGTTGTGC FH-1000 GR-9-IV-1- PR-GR-BESS-724 9_10_09_I10_R (AT)18 TCAGTGTCCAACACAAATG GGTTCAACTTGGATATCAGC M GR-9-IV-1- PR-GR-BESS-725 9_10_09_L14_R (AT)10 GCACTATTCTCTTTGACTCCA GGCGAAAAGATGCTTTGA M GR-9-IV-1- PR-GR-BESS-726 9_10_09_M09_F (AT)19 GAGGTTCGATTGTGTATGA CCAAATCCAGCTCAATAACT P GR-9-IV-1- PR-GR-BESS-727 9_10_09_M12_F (AT)17 CAAAATGATCATGGACAACC CATAGCCATGCAAAAATGG M GR-9-IV-1- PR-GR-BESS-728 9_10_09_O08_F (AT)16 GACCGAAAACACATAAGCAT GGAACGTCTTCACACTATCG P GR-9-IV-2- PR-GR-BESS-729 9_10_09_H13_R (AT)14 CACCAATACCAAAACCAAAC GGCATATAGGTGAGCTTAGTTG M GR-9-IV-15- PR-GR-BESS-730 9_10_09_O10_F (AT)17 GAAAGTTGTGCTGGGTAGG GTTAAACGCAATCCTTTCAG M GR-9-IV-15- PR-GR-BESS-731 9_10_09_L24_R (AT)12 ACTCACTCCTCTGCAACTTC TCTCTGTGAGAATGCAGGT M GR-9-IV-13- PR-GR-BESS-732 9_10_09_H05_F (AT)8 CGGTTTCAGCATGATACATA GGTTGTTCTTTTGACCATGT FH-1000 PR-GR-BESS-733 GR-9-IV-13- (AT)13 M

9_10_09_b06_F GGGGATCCTCTAGAGTCG GTGGAATTACACCTACACAAC GR-9-IV-12- PR-GR-BESS-734 9_10_09_I01_F (AT)9 CTGGAGTTTTGCTTAATTCG GAAGGTGGACAACATTGATT M GR-9-IV-12- PR-GR-BESS-735 9_10_09_G10_R (AT)9 AGGCTAGTGTTCGTTTGGT TGCATCCATCATCTACTTCA M GR-9-IV-11- CTTTGATCTAACATACGTGGAC CCAATTTTTGGTGAGTCTCT PR-GR-BESS-736 9_10_09_H13_R (AT)11 M GR-9-IV-11- CTTAAACTGGAGATGCGATA GGTGTAATTAGAACGACCAT PR-GR-BESS-737 9_10_09_H05_R (AT)9(GT)6 M GR-9-IV-11- PR-GR-BESS-738 9_10_09_c18_R (AT)16 TCTGATTTAAAAGCTGACTCTAAG TTGTAGCACACTTTGAAACG M GR-9-IV-10- PR-GR-BESS-739 9_10_09_O10_F (AT)14 ATGCGAAAATGACTTGATCT TCCATATCAAGCTTACCAAAA P GR-9-IV-10- PR-GR-BESS-740 9_10_09_O07_F (AT)17 CATGGTATTGTCATCAATCTT TGAATTTATCACTCAACCAA P GR-9-IV-10- PR-GR-BESS-741 9_10_09_O06_F (AT)20 TTCTCCGATAAAACAAAAGG TTGGAATTGGAATTTGTTTC M GR-9-IV-10- PR-GR-BESS-742 9_10_09_O05_R (AT)14 TGGGTTGAAATAAAGGTAAGA TTATCTAAATTCAAGACTTGTTCA M GR-9-IV-10- PR-GR-BESS-743 9_10_09_L11_R. (AT)9 GGAATTAGATCAGGGGAAAT GATCAAATCGAACTCGAAAC M GR-9-IV-10- PR-GR-BESS-744 9_10_09_F15_R (AT)7 AAGAATTGACGCATTTAAGTC ACGCTAACAAAAACTTGACC P GR-9-IV-2- PR-GR-BESS-745 9_10_09_J08_R (AT)13 TTCTTTCACTCCCTCGATTA CCTCACAAATGAACCTGTTT M GR-9-IV-2- PR-GR-BESS-746 9_10_09_J09_R (AT)13 TGCTCGATGCATTTAGACTT CCCTTACCACTCATTCTAAAAA P GR-9-IV-2- PR-GR-BESS-747 9_10_09_J14_R (AT)15 TGAGCTAAACAACTGAAGCA TAAGTTCATGCAAACAACCA M GR-9-IV-2- PR-GR-BESS-748 9_10_09_M08_F (AT12 CGATATGGTTAGGTGGTTGA TCGAATTTATGAAAGTACACAGA P GR-9-IV-3- PR-GR-BESS-749 9_10_09_d15_F (AT)14 GGAATGGCTTATGATGGTTA ACTCAAAACATGCCAAAATC M GR-9-IV-10- PR-GR-BESS-750 9_10_09_F05_R (AT)9 GGAATGGCTTGTAATGGTTA CAAAGCACAACCATGTCTTA M GR-9-IV-10- PR-GR-BESS-751 9_10_09_e17_F (AT)10 GGAAAGACAAAAGTGGTTGA AGCCTGAACTTACAAAATCG M GR-9-IV-10- PR-GR-BESS-752 9_10_09_c21_R (AT)30 GAGAAGCTTTTCTTGCACAC GACCAAAATAAACCCTACCC M PR-GR-BESS-753 GR-9-IV-3- (AT)9 M

9_10_09_L09_F GGAATATCATCGCATTATCG AGCATCATTATCAAGCCATT GR-9-IV-3- PR-GR-BESS-754 9_10_09_M10_R (AT)11 CCATGCATAACATACCAAGA AGCATTGTTTATGGAATTGG M GR-9-IV-3- (TA)6 PR-GR-BESS-755 9_10_09_N10_F AGCTTGTTCGATCCACATAC TCCGAGGTAAGTCTGTTAGC M GR-9-IV-3- (AT)10 PR-GR-BESS-756 9_10_09_P04_F TCAACAACAACATTTTCGAG GGGCGTGTTACACTTAGTATG M GR-9-IV-10- PR-GR-BESS-757 9_10_09_c19_R (AT)13 CACCCATTCTTTCTCCTAT AGGGCGTAATTTAAGTGATTT M PR-GR-BS-758 FITB7345.y2 a (AC)7 GCACCTTCTCTAATGCTCTG TATGTCCATAGGGACTGGTG M PR-GR-BS-759 FITB7314.y2 a (CTT)4 CATGGGATGGTAAAGCTCTC GCTCCTAGACACACTCAAGC FH-1000 PR-GR-BS-760 FITB7650.y2 a (TGA)5 TCGGCGATCTCTGTTAAG CCCGTGATGAAAGACTTG M PR-GR-BS-761 FITB7331.y2 a (TTC)4 CGGTCGTTACAAGTATGGAG ACTCGGTTCGAGTAGAGTCA M PR-GR-BS-762 FITB7531.y2 a (ATCAAA)4 CGCATACAAGGTGAAAGTGG GTTTTCACTTTGGCCTCTCC P PR-GR-BS-763 FITB7522.y2 a (AT)9 CACCTATATGGAAGGGCTGT GGAGTTGAACAGTAGCAGCA M PR-GR-BS-764 FITB7539.y2 a (GAG)4 ACAGTAAGGATGTGGAATGG CTTTATTTTCCCACATGAGC NA PR-GR-BS-765 FITB7411.y2 a (CAT)4 TATGGGCCTCATGGATCA GCCGATGCTTTATCTAGACG M PR-GR-BS-766 FITB7538.y2 a (CTG)4 CCCATATTTTCTTCATCCTC GAAACAAAACCAGGTGTGTA M PR-GR-BS-767 FITB7300.y2 a (TA)15 GGTTACGAGGCATTACTGAA CTTATGGTTTAACGGGATCA M PR-GR-BS-768 FITB7580.y2 a (TAAT)4 CGGTTCGACTACTAGACCAA GTCCAAACAAAGGGGTTG FH-1000 PR-GR-BS-769 FITB7404.y2 a (CTCT)8 GATCGACGTGACTATCATCC CCACCCTTGTTTCCTTTG M PR-GR-BS-770 FITB7620.y2 a (TTC)4 GGAAGCATCTTCTAATTCCAC GGACCAAAGAGAGGATTTTC M PR-GR-BS-771 FITB7331.y2 a (TTC)4 CGGTCGTTACAAGTATGGAG ACTCGGTTCGAGTAGAGTCA NA PR-GR-BS-772 FITB7452.y2 a (ATT)4 TGGAATTCAAGGTTAGATCG CAACGGGGCTTTTTAATATC NA PR-GR-BS-773 FITB7333.y2 a (ATTCG)4 GGGTGACAATGGTTACAAGG TAAGTAGGGGTGAGCATTCG M PR-GR-BS-774 FITB2178.y1 a (TC)6 GACCTACAACACAAGTGTAAGG AGAACCATGAGTGTCTAGGTG M PR-GR-BS-775 FITB7531.y2 b (ATCAAA)4 TCAAAAGAGATACCATGGAC GTTTTCACTTTGGCCTCTC M PR-GR-BS-776 FITB2152.y1 a (TTC)5 GGCCTTAAATGGTAATTAGACC GTTCAAAAATGGTGTCTTGC M PR-GR-BS-777 FITB7914.y2 a (ATT)4 CTCTGCGGCTGAATAGAA GGGGGAGGAAAATGAAAG NA 
PR-GR-BS-778 FITB4305.x2 a (AT)5 GTTCTCCGGTGGCAATAC GGATTGGTTCAGATCTCTCC M PR-GR-BS-779 FITB4428.x2 a (AAG)5 CAGGCTTGTGGGATTGAA GGTATTGGCCCAACCAAA NA PR-GR-BS-780 FITB7404.y2 b (TC)5 CCTTACGATCGACGTGACTA CCACCCTTGTTTCCTTTG M PR-GR-BS-781 FITB7365.y2 a (TCAT)5 TAGGGTCATGTCTAAACGTC CATTGATATCTGGGGACCTA M PR-GR-BS-782 FITB7669.y2 a (CAT)4 CAGGTCTACGTGGGAATAAG GGGACACTTGAGCAGATACT M PR-GR-BS-783 FITB7502.y2 a (GAGA)4 TCTTCTCCAAACCCACCT CCAGTTGCAGCAAGAAAC Bar PR-GR-BS-784 FITB7390.y2 a (TCT)4 CGGTAAGTTGACATGACACG TCAGCTTTCTCTCGCTCTCT NA PR-GR-BS-785 FITB7638.y2 a (AATA)4 GGTCTAAAATCCCCACCTAC CCTGTCAAACCCACCATA M PR-GR-BS-786 FITB7335.y2 a (TAC)4 CCAACTTGGGTCTGACAA GCGAGTGTTAAGACTGTGGA M PR-GR-BS-787 FITB7647.y2 a (AT)20 GCAGCCCCATTAGTGTAGAG CAAGGCGCCTCCTAGAAT FH-1000 PR-GR-BS-788 FITB7592.y2 a (CTT)4 GGTTTGGAGAGGCAAAGAG ATCTCGACAATGGGAGAGG M

PR-GR-BS-789 FITB7592.y2 b (AG)7 GGAGCTCAAGAGATTGGAG GGCCTTATTACGTGTGAGC M PR-GR-BS-790 FITB4441.x2 a (AT)10 CGAGTTTTTCAGTCTCATCTGG GAACATGCTCGGATTCTGT M PR-GR-BS-791 FITB4553.x2 a (ACA)4 GAACTACAGGGACAGTTCCA GATCACTGCAGGGTAATCAG M PR-GR-BS-792 FITB4355.x2 a (TTCT)4 CCGTGGTTAAGCTTGTTC GAGTAAAACCGAGCTCTCAC M PR-GR-BS-793 FITB4244.x2 a (AAT)4 AGGTCGACTTAATTCTGTG TTGGATCTAATACGAGTCAC NA PR-GR-BS-794 FITB4580.x2 a (TCT)4 TGTGGAATGGGAGAGATG CATTCTCGAGCATAGTGACC M PR-GR-BS-795 FITB4277.x2 a (TCC)4 CCCTGAACCTAAAACATTCA GACTCAAGTGGACAGTTTTGA M PR-GR-BS-796 FITB4470.x2 a (AGA)6 GAGACTTGGGTGTGCAGTA GACCTAGGTGGGTGTAACAA M PR-GR-BS-797 FITB4311.x2 a (GGTT)4 TGGTACTGTTCGAGCTTCTC CATCATATGCATTAGCATCG Bar PR-GR-BS-798 FITB4503.x2 a (TTA)4 GGATTGATGAGGAGTTATGG GATGGACACTTACGGACTGT M PR-GR-BS-799 FITB4359.x2 a (AT)10 GGTCGAACCAGTCAAACCA GCCAACACATCACCCACA M PR-GR-BS-800 FITB4583.x2 a (AT)8 CCAAAGAAATCAGAGTTAGACG CTTTCAATTTGAGCTCATCC M PR-GR-BS-801 FITB4583.x2b (AT)6 AGAGTTAGACGGGGGAAAG CTTTCAATTTGAGCTCATCC M PR-GR-BS-802 FITB4424.x2 a (TA)15 ATAGCATAAAGCATGCCACA CCTCCTTTTGAATCATGACA NA PR-GR-BS-803 FITB1978.y1 a (GAA)4 TCGTGTGAACACAGGTGTAG GAGTGACGTGGTTAGAGTGC M PR-GR-BS-804 FITB2211.y1 a (GT)6 GACGTATTCCAGGTTCAAGT GCTCATTGCTTGGAAAAC M PR-GR-BS-805 FITB1981.y1 a (TTA)4 TCCTGTGTTAGTTGGGTGTC GATCCATGTTGGTTGTGG NA PR-GR-BS-806 FITB2101.y1 a (TATT)4 GGGCAATATCCCTAGAGAAC TGTCATGACTTTAGCCGAAT M PR-GR-BS-807 FITB2118.y1 a (TAAA)4 CCATCATGTCACTCAATAGC CAACACGTATCGTCTCAGTT M PR-GR-BS-808 FITB2279.y1 a (CTA)4 CCGTGCATTACTTCTTTTCT GGACTGTCATTAATGGTGTGT NA PR-GR-BS-809 FITB2232.y1 a (AT)6 AACCCAAATTACCTTGAAGC GGAGCGAACTATGTGTAGGA P PR-GR-BS-810 FITB7843.y2 a (GAA)4 CATTCTTGCGAATAAAGGTG TAACCATCGTCTCTCGTAGC M PR-GR-BS-811 FITB8035.y2 a (ATT)7 CATCAGGCTCCTTCATTG TGGAGCCTATTTGGTAGG NA PR-GR-BS-812 FITB7740.y2 a (CAT)4 CCCTTTCCTCTATGGTGAG CCCTCTTCCTAGAAGGATTC P PR-GR-BS-813 FITB7804.y2 a (AT)7 GTTTCTCGTGATCTCTCTGG TCTAACCGACTCCTTGTAGG M PR-GR-BS-814 FITB7933.y2 a (CTT)4 GGCCACACCAAGTGAGTT 
CCTCTCGTACTGGTGTCCTA M PR-GR-BS-815 FITB7981.y2 a (ATC)4 GCCCTCCACTTCTACCTATC GCTCTCCTGCTTATGTGCT M PR-GR-BS-816 FITB7981.y2 b (AGC)4 GAGACCTGAGTGCTATGACAG CTGGTGAAGGAGGTTTGAC NA PR-GR-BS-817 FITB8061.y2 a (AT)8 GTGCTGATTTCTCCCAAG CTGACGGGTTTCTCCATA NA PR-GR-BS-818 FITB7703.y2 a (TTTTAG)5 TGTCTGCTAGTCACACCTTC CTATTGGGTGGTACCCTTAC P PR-GR-BS-819 FITB7751.y2 a (ATTA)4 CAGCCTAAGATGGTCCAA GGTCAGCATATGGACAGTTC M PR-GR-BS-820 FITB8063.y2 a (TA)6 AGAAGTCGAGTGCAGCTAAG TGCAATGTGATTAGTTCGAC NA PR-GR-BS-821 FITB7904.y2 a (AT)6 GCACTGAACTCCTTTCACC TGTAGGCAGCCTGTTGAG M PR-GR-BS-822 FITB8016.y2 a (AG)20 GGATGACTTTTTGCTCAACT TGCAACTACCTCACTTTTCA M PR-GR-BS-823 FITB1561.x1 a (AT)6 TGACTAATCAAGCCAACCTT TTCCTTCGTTGTTGACACTA P PR-GR-BS-824 FITB1561.x1 b (TGA)4 GGCAATGTGGTATTGTGG CCATCAGCAACTTTGAGC M PR-GR-BS-825 FITB1826.x1 b (TAA)20 AGGGTTCATGAGCTCGAC GGGCGAAGTTGAATAGGA M PR-GR-BS-826 FITB1581.x1 a (ATAT)4 GACCCCGGAAAATAAAGAC GCTTATGCTTCTCCCTTTGT P PR-GR-BS-827 FITB1821.x1 a (TTA)5 TGTAACACCCCTTACCTGAG GCAGGAGGGATTGAAAGT M PR-GR-BS-828 FITB1614.x1 a (GAA)15 CTGATGCTACAGCCATTACC ACCAGCTCACTATCAGCAGT M

PR-GR-BS-829 FITB1838.x1 a (TATA)4 CATGTGTCACTGGTTTTTGA ATCAGTGAAGAAAGGCATTG M PR-GR-BS-830 FITB1886.x1 a (AAT)4 ACCACACCACATAACATTCC TGGGCATGTACTTAGCTTTC M PR-GR-BS-831 FITB1807.x1 a (AT)7 GGCATGCCATCTATATCC GGTAAGTGGTTGAGGTGAGT M PR-GR-BS-832 FITB1727.x1 a (AT)6 AGAAAGTGAGTGGAGAGAGA GGACACGAAAGGTTGTATGT P PR-GR-BS-833 FITB1632.x1 a (AG)6 AGCGACTACTAATGGTGGAG CAGGGACAAAGTACACCAAG M PR-GR-BS-834 FITB6761.y2 a (TTA)5 TGGAATAGTGAAAGCCCATA CATAAATGATGTGTCCCAAC NA PR-GR-BS-835 FITB6730.y2 a (TGG)4 GTGGATTTCCACATGATACG TGGGCTTAGTGTATGACTCG NA PR-GR-BS-836 FITB6730.y2 b (TGG)4 GTGGATTTCCACATGATACG TGGGCTTAGTGTATGACTCG NA PR-GR-BS-837 FITB6738.y2 a (TCC)4 CATGAGCTTAAACCATGACC ACATTAAGGATGTGGAATGG M PR-GR-BS-838 FITB10254.x1a (ACA)4 GCCAAAAACCAAAGACCT CTCTTCATTGTTGTTGAGTGG M PR-GR-BS-839 FITB10366.x1a (TTA)4 CAGGAGAGGAAAGGTTTGTG CTACTCTAGCACATGCACCTC NA PR-GR-BS-840 FITB10207.x1a (TTAT)4 GCGCTTGTTAGGAGGTAAC ATGCGCTCTAAGTGTTGC M PR-GR-BS-841 FITB10312.x1a (ATAA)4 GGAGTTCTTGTTGGGAAATG CTCGTCACAATTTCGCTATG M PR-GR-BS-842 FITB7705.x1 a (AAT)4 CTGGTCTAAGAGGAGTGGTC CTCTGGAAGTTTGGTGAG M PR-GR-BS-843 FITB7745.x1 a (ATT)4 TACCTCTAGCAGGAGACAGG CGAGGTGTTCTTCATGGTAG NA PR-GR-BS-844 FITB7785.x1 a (TA)6 GTCCATACGGGTAGACACAC GATAAGGGGGTGAGTTCG M PR-GR-BS-845 FITB7738.x1 a (AAT)4 GTGTAACACACTCAATGCAA CACTTAAAGTCACACACTTCTG M PR-GR-BS-846 FITC82214.b1a (TA)6 AGCATCATTATCAAGCCATT GGAATATCATCGCATTATCG M PR-GR-BS-847 FITC62428.g1a (TA)6 CCATGCATAACATACCAAGA AGCATTGTTTATGGAATTGG M PR-GR-BS-848 FITB7891.x1 a (TCT)4 GGGTGGCCTTAAATGGTA CACACACAAGGATGAACTTG NA PR-GR-BS-849 FITB7884.x1 a (CTAG)5 GAACAGCATTGGGGATGT CAGAACCTGAACTCCTCTGA M PR-GR-BS-850 FITB7812.x1 a (TAA)4 ATCGTATAACACGCAAAAGG TGTGCATTGTCGTTGTTAAG M PR-GR-BS-851 FITB7942.x1 a (TTC)4 GCCTGACCAGGATACTCAC GAGAACTAGATGGCTGATGC NA PR-GR-BS-852 FITB8022.x1 a (GAT)5 TGGAGCTTCTTCTGCTAGTG CATAGCCTCACCTTCATCAC M PR-GR-BS-853 FITB7695.x1 a (CT)7 ACTCCTGCTTCCAACATCTC CGGTAGAAGAGAAGGTTCCA NA PR-GR-BS-854 FITB7919.x1 a (TCTC)8 CTACGAGGAAAGGATGTCAG 
TAAGGATGATCGGAGCAC M PR-GR-BS-855 FITB7808.x1 a (AAT)4 CGTCACTGCATGTGTTCGTT CATCGGTCGGCATTTTGT M PR-GR-BS-856 FITB3325.x2 a (TTA)4 CTGCAACTTCGTGCTTTGAT CCCTTGGCCAAATTGTATGT M PR-GR-BS-857 FITB3365.x2 a (TCTC)8 GCCGTTTACTCCAACAACG GGGTTCACGTCGAACTTCA M PR-GR-BS-858 FITB3429.x2 a (TCTTTC)5 TGGCGAACATAAAGCATAGG CGAATGTGATTCAAGGAGGA P PR-GR-BS-859 FITB3095.x2 a (TCC)5 CCATTTCATGCAATTTGGTC CATTAAGGATGTGGAATGGAGA NA PR-GR-BS-860 FITB3096.x2 a (AT)7 CGCTCCTCTCGTCACTTTCT AAGAACAGGCTGAGGAACCA M PR-GR-BS-861 FITB3304.x2 a (TAT)4 GGTCCGCCAGGATACAGAC CGGGTCTGGAGTACCTTGAA M PR-GR-BS-862 FITB5001.x2 a (AAAAG)5 GGCCCTAACGTCCTAAAGT GCCAAGTATGAGAGTGGGTA M PR-GR-BS-863 FITB5097.x2 a (TTG)4 TGATTACACAAGCCATTCAG TGTAACGGCTATGAACAACT NA PR-GR-BS-864 FITB5282.x2 a (AATA)4 ACAAAGCCTTGAATACCTGA AACACTTGAATGGTCTCTCC M PR-GR-BS-865 FITB5307.x2 a (TATA)4 AGTTCGAGTGCGTTAAAGT TCTTTTGACCTAACTCAAGG P PR-GR-BS-866 FITB5181.x2 a (TCA)4 GTGATCTATCAGTCCGGTT CTACACGAGCCAGTACCAT M PR-GR-BS-867 FITC22629.g1a (GAA)4 ACCTAGTTGCAACGATCTCA AGATGGTGGACGGTTTAAG M PR-GR-BS-868 FITC2087.g1 a (TCC)4 ACCGCAATAGTCCTTCAGTA GGAGCTCTATAGCTTCGGATA M

PR-GR-BS-869 FITB5350.x2 a (TTA)4 TGTACCATAGTCGACCAACC CACCCCACAACGAAAATC NA PR-GR-BS-870 FITB5350.x2 b (GAT)4 GCCTATGTCGACTATGTTCA GGATTGAGCGAGTACCTTAC NA PR-GR-BS-871 FITB5350.x2 b (GAT)4 GCCTATGTCGACTATGTTCA GGATTGAGCGAGTACCTTAC M PR-GR-BS-872 FITB5247.x2 a (TA)6 GGCTACGATGCAAACAAG AGCCTTGGAGAAGGGTTT M PR-GR-BS-873 FITB5232.x2 a (TGTG)5 GGGATGTTGAGTCTTTATGC CATGGTAGTCCTTACTACGAGA M PR-GR-BS-874 FITB5072.x2 a (AGAA)4 AGGAGTGGAAATTGATGTTG AAGCTCTCATGACCTCAGAA P PR-GR-BS-875 FITB5120.x2 a (AAAAT)4 CGTTAGGGTTTTAGACTCCA GGAATCAGCTAAACGATGAC FH-1000 PR-GR-BS-876 FITB1466.y2 a (TAT)4 TTGAATGAACCCTACTCTG TGTGTTCCCTCAAGAATTAC M PR-GR-BS-877 FITB1155.y2 a (AAT)4 CACCCTACGATAGAAACTCG CTACCAAAACAGCCAGGGTA M PR-GR-BS-878 FITB1244.y2 a (TTAA)4 ATCCAGCAAAGCTATCAGAC AGACATAAAGTGGGTCATGC M PR-GR-BS-879 FITB1260.y2 a (CTT)5 AACTCTTCCTGTGTCACCAC GCAACCAACCTAATCAACTC M PR-GR-BS-880 FITB1260.y2 a (CTT)5 AACTCTTCCTGTGTCACCAC GCAACCAACCTAATCAACTC NA PR-GR-BS-881 FITB1238.y2 a (ATAT)12 GTTTGGAAGCAAACAAGAAC ATTCAAGTGGCTGCAATC P PR-GR-BS-882 FITB1454.y2 a (AT)7 GATAAAGGGGCTTGGGTA GCACCGAGTGTTTTGGTA NA PR-GR-BS-883 FITB1423.y2 a (TCT)4 CTATTACAGCTGCCCCTCT GAGTAGGTTGAAGCACCAGT M PR-GR-BS-884 FITB1239.y2 a (ATAT)5 GCATATGATAAGTGTTGAGGTG CCATCTATATCCGACTCAGC M PR-GR-BS-885 FITB1463.y2 a (AAAT)5 CTCATTTCTGTCTTGCAACC ACTGACCCCAATGTATTTGA M PR-GR-BS-886 FITB1295.y2 a (TTAT)4 TCTTCTAGGGGTTTCGTAGG CCGGATCTAGACACAAGAGA M PR-GR-BS-887 FITB1320.y2 a (AG)6 CTTGCATTGTAACTGCTTCC GTATGTGTTGTGGCTTTCG M PR-GR-BS-888 FITB1320.y2 b (AG)6 CATATCGGGAGGTGCTTC GTTGTTCTGCCTGCCTCTA M PR-GR-BS-889 FITB5538.y1 a (TA)6 AGACCCCCAATCAAAGAG ATGGCCCATGAGTATTCC NA PR-GR-BS-890 FITB5487.y1 a (AAAG)4 CATGACTCGAATTTCCTAGC ACCAGATTAATGGACTGTGC M PR-GR-BS-891 FITB5487.y1 b (AAAG)4 GGACAAAAGCACAGTCCA ACTGCTTATCCCCTCACAG M PR-GR-BS-892 FITB5680.y1 a (AGATT)4 GAAGCCTGTCAGATTGAAAC TAGTTGAGCTCCTCCAAAAC M PR-GR-BS-893 FITB2705.x1 a (TAG)4 GCACAGGGAAAACAACTACT ACCATACTTTGCAGCATCTC NA PR-GR-BS-894 FITB2761.x1 a (ATAT)5 
CACACAATTCAGACAATGACA CAAGCTAGGAGTGGAATTTG P PR-GR-BS-895 FITB3049.x1 a (GCA)4 CAGATTGAGACTGGGTTTGT GTCGACTCCTCTGAATGAAG M PR-GR-BS-896 FITB2938.x1 a (TCC)5 CAATACCGCAATAGTCTTTC CATAAAGGATGTGGATATGG M PR-GR-BS-897 FITB2963.x1 a (ATAT)5 CAACCTCATCCATTTTCGT GGGTTGTGTATTGAATTGAGA M PR-GR-BS-898 FITB3027.x1 a (GAG)4 CAAGTGGACAACTTTGAATG TTCAATTTAGTCCCTGAACC NA PR-GR-BS-899 FITB2772.x1 a (ATATAT)13 CGATAGTTGAATGCACACAC TCTGCTACGTTTTCACTTTG P PR-GR-BS-900 FITB2852.x1 a (ATAA)4 GGAGCTCATCGAAGAAAAC TCCCTCTCCATTCTTTCAC M PR-GR-BS-901 FITB3044.x1 a (TAT)4 CCCTAATCTTTTGTCGTAGC CCCACAAGGATTGAATAGG M PR-GR-BS-902 FITB2885.x1 a (AAG)4 CTGCAGAAGAGGAAGTTGTT CAACTTCTGCTAGTCCCACT M PR-GR-BS-903 FITB2717.x1 a (ATATAT)5 GACGTCGAATAGCTCAGAAT TCTGACTCAACCCGATACA P PR-GR-BS-904 FITB2949.x1 a (TGTG)4 TCTGTCACAAAGGGAACC GTGACAGAAGAAGGTGGAAC M PR-GR-BS-905 FITB2957.x1 a (ATAT)7 GCTTGAAAAGGAGATTACCAC CCTTTGAAGAAAGTCATGAGAC M PR-GR-BS-906 FITB2989.x1 a (AAC)4 TAGGTGCAACTTAGCAAGC CTTGGCTCGGTCTAGCTC M PR-GR-BS-907 FITB3061.x1 a (TGA)4 GCCACACATGTCACTCATAG GTCGGAGTAGACACGGATAC NA PR-GR-BS-908 FITB2720.x1 a (CCCATG)4 GCATCACTTTCAGCTTCTGT CACCATCACAACAATCACTG P

PR-GR-BS-909 FITB8521.x1 a (ACCCG)6 CCTCTCTATGAGGCACCTT CTACTGAAATTGGGTGTCAG M PR-GR-BS-910 FITB8812.x1 a (ATT)4 GGTGGTTAATTGTGCATGTC TCTTTGAACACCAACAGGAC M PR-GR-BS-911 FITB8828.x1 a (ATATAT)5 CTGCTAACCGACTTCACTCT GAGGAAGAAAGGGCATAAGT P PR-GR-BS-912 FITB8782.x1 a (TTTA)4 CCAAACCTTGGAAATGGT GGTGCAACTTAACAAGCAA M PR-GR-BS-913 FITB8606.x1 a (TCTC)6 TCATTTTCACGTCCTGTGTA ATGAGGGTCTTGTGTAATGC M PR-GR-BS-914 FITB8606.x1 a (TC)6 TCCTACTGCAGCTAGTGCTAC GGAAATGTAGAGCAATGGTG M PR-GR-BS-915 FITB8759.x1 a (TGACAT)4 GCAATGATCCCTGAATGTC GGTCTATCAACGACTTGGAGT P PR-GR-BS-916 FITB7665.x1 a (AAG)4 CTGAAGGGCTTGACTTTAGC ATGTATTGCCCTCCACCTAC NA PR-GR-BS-917 FITB7673.x1 a (TTA)5 CTGCCATGGAACTGAAAG ACAACAACAGCAGCAGACT M PR-GR-BS-918 FITB7298.x1 a (TAT)4 GGAAGAGTTAGTGGTTGTCG GTGAATGGGAGAGCACTAAG NA PR-GR-BS-919 FITB7522.x1 a (AT)6 AGTAACGCGTAAACCATGTC AAGCCTTCGATGTAGTGATG M PR-GR-BS-920 FITB7338.x1 a (GGT)4 GTCAGAATTTGGTGCAGAGT CATCATCCAAGTTTCAGACC M PR-GR-BS-921 FITB7338.x1 b (TCT)6 GTCAGAATTTGGTGCAGAGT CATCATCCAAGTTTCAGACC NA PR-GR-BS-922 FITB7650.x1 a (TAT)4 TCCAGTTGTTATGACATTGC CACGAGGAAGAAGAAAGAAG M PR-GR-BS-923 FITB7650.x1 b (ATATAT)3 TCGAAGGATTTCTTCAGGTA GAAACGCTTTGAACTTTGAC P PR-GR-BS-924 FITB7339.x1 a (AT)7 GCATTTCTTGGAGTTGCAC GCATACCGGTCGTATCCTAT FH-1000 PR-GR-BS-925 FITB7635.x1 a (CTT)4 GCTTAACATGGAGATGGAGA GAGAAGCATTTGGGACAGT M PR-GR-BS-926 FITB7667.x1 a (TTA)5 CCTTGTTGAAGTTACCTGGA GAATGTTGCTATTCCAGTGC M PR-GR-BS-927 FITB7675.x1 a (CTCTCT)4 GCGGAAGTTACTTGAATCC GGGTTATGGTCTCCAAACTA M PR-GR-BS-928 FITB7300.x1 a (ATATAT)4 ATGAATGTTGAGCACGTATG CACATTCTTGCCTTTACCTA P PR-GR-BS-929 FITB7549.x1 a (GTA)4 CTTGATCAACAAGGCGAAG GAGATGGGGTGTTTTTGAC M PR-GR-BS-930 FITB7597.x1 a (ATA)4 CCTCTCTTCTCACGTTGGTT GGCAATGATGATGTTGGTG M PR-GR-BS-931 FITB7597.x1 a (ATA)4 GGTTGCATGGAAGAAGATG GTGATGAGTCAAGGATGGA M PR-GR-BS-932 FITB7654.x1 a (ATTG)4 CTCTCTTTCCTAGGCCTGTT GATCATCTGGTAGGTCGTCA M PR-GR-BS-933 FITB7478.x1 a (TA)7 GGAGACACTGATAACGGAGA CCGCTCCCCTCTACTACTAT M PR-GR-BS-934 FITB7343.x1 a (TGTTT)4 
CTTCCTTCCTTACCCTCTGT AGGCCCCATTCTCTTTAC M PR-GR-BS-935 FITB7487.x1 a (CTT)4 ACCCTAAACCCTCCAATTAC ATCCCATCTCTTCCGTTTC M PR-GR-BS-936 FITB7328.x1 a (CCTCT)7 TCCTCCTTTCATTGCTACAC GTGGTGCACAGCTAAGAAGT M PR-GR-BS-937 FITB7552.x1 a (TATATA)5 CTACAAAAGCCAGATGGTTG TGTAACGTGTTTCATCTGGA M PR-GR-BS-938 FITB7552.x1 b (AAG)4 AACCGAGTCTACACCTTCAA CAAAGATTATCGGAGAAGGA NA PR-GR-BS-939 FITB3643.x2 a (GAA)4 TCCTCATCACGAGATTTACC ATCCCTAAAATTGGCTTACC M PR-GR-BS-940 FITB3789.x2 a (ACTAAT)4 CGTAGTTGACCAACACGACT GGTCGGAAGCAGAAGTAAG P PR-GR-BS-941 FITB3782.x2 a (TAA)6 CATGCAGTACCGGAAGAA GTACTTTTGCCCAAGCTG P PR-GR-BS-942 FITB8169.x1 a (CAG)4 GGCACACCTATTGTAGTAGCA CTCTTAGGGCATTATCATCG NA PR-GR-BS-943 FITB8169.x1 b (CAG)5 GACTCATAGTAGAGGCACACC GGGTAACTGTGGTACTAGTGG M PR-GR-BS-944 FITB8099.x1 a (AC)7 TCCACCATGCTATTTTCTTC ACACGTGGTTTCATCTAAGG NA PR-GR-BS-945 FITB8115.x1 a (AAT)4 CCACGGAACGATTGTACTC TAGCCTGGTAAGGTCGTATG NA PR-GR-BS-946 FITB8276.x1 a (TTA)5 AAACAAGCGGTACATTTGTC ACACAATCCCAACAGGAAC NA PR-GR-BS-947 FITB8220.x1 b (AAG)4 GGACGCACACGTTAAAGG GGAGTAATGGGAGACCATCT M PR-GR-BS-948 FITB8117.x1 a (AAT)4 TCACAAGCTAAGTCGCTGTA TGCACTGCTTATTGTATGAG M

PR-GR-BS-949 FITB8325.x1 a (TATA)4 GCCAACTCACATGTACATACC CCACTGTTCCGATAACCTT P PR-GR-BS-950 FITB8189.x1 a (TTG)7 TGGAGTGTTGCTGGTAATC CCCGTGATACTTACAGGTTC NA PR-GR-BS-951 FITB8126.x1 a (AT)6 GAGTTGTGATACCAACAAGA GAGTGAACATATGCACTGAT M PR-GR-BS-952 FITB8375.x1 a (TATATA)3 CCTATGCTTTTAGGGACTGG GTGCAGTACCCTCCTGATT P PR-GR-BS-953 FITB8192.x1 a (TTC)4 TCCCTTCTCCTCTCTACACC GACCCAAGAAAGAGAGGAAG M PR-GR-BS-954 FITB6362.y1 a (ATA)5 GCATGTCATGGATTACCTTT GTTAGGGCTTAATTCCCTTG NA PR-GR-BS-955 FITB6362.y1 a (ATA)5 GTGCATGTCATGGATTACC CGGAACCCAATACGTTAG P PR-GR-BS-956 FITB6298.y1 a (TCA)5 CCGCCTGGTTATACTCAAC CTCAAGAGCTCAGAGAGACG M PR-GR-BS-957 FITB6506.y1 a (TTTC)4 CTCGATTGCCATCAACAG ATAGCACACTTGTGGGTAGG M PR-GR-BS-958 FITB6283.y1 a (CATC)5 CATTGCTCACCTTAAAGCTC AGCAGTAACGGTTTCGTTT M PR-GR-BS-959 FITB6429.y1 a (ATC)4 GTTCTTATTGGGCAGCCTAC GACCAGTGGTTAAGGTGGAC NA PR-GR-BS-960 FITB6502.y1 a (AAT)6 ATGCTATGACAGACAGGTCA GCAGTTTCACTGTGCATGT P PR-GR-BS-961 FITB6207.y1 a (TATTT)4 CTAAGGCCCCTTCCTATTC CGTGAGCACGAGAGATAGA M PR-GR-BS-962 FITB6400.y1 a (ATT)4 ACAAAAGCGCTACAGAGGTC GTCCCCTTTACTCCTAGACA M PR-GR-BS-963 FITB6216.y1 a (AT)6 GCCGAATGTACTCAAATCTC CAGGTAAGTTCGTACGGTTA M PR-GR-BS-964 FITB3777.y2 a (TAAA)4 AGGGCCCTTTTCTTTCAC GAGAATGTGCTCCACATGA P PR-GR-BS-965 FITB3738.y2 a (TAT)4 CGAACAAGGATTTACGTATCA TCGTGAATGTAATACCAGTGA NA PR-GR-BS-966 FITB3778.y2 a (ATATAT)4 GGCTTACCTAATGGGATATG CAATAGGGCTCAACAGTACC M PR-GR-BS-967 FITB3738.y2 a (AAT)5 GTTGGGAACCTAGATAGCAA CAACCTTAGGGAAGAAGATG M PR-GR-BS-968 FITB3494.y2 a (TCC)7 CCCTGAACCTAAAACATGC TGTGACAATGACTCAAGTGG M PR-GR-BS-969 FITB3782.y2 a (TCT)4 CTCAATCCCGAATGAGTTAC CGGTAAACAACCAAGTTCAG NA PR-GR-BS-970 FITB3830.y2 a (ATAA)4 GGCGAGCACTATGTTAACTC AAGAAGCCCAAAGTGACAG M PR-GR-BS-971 FITB3600.y2 a (GAG)4 CTCAAATGGATAGCCTTGAT CCATTATGCAATATCGCTCT M PR-GR-BS-972 FITB3792.y2 a (TAA)6 TGTTGATTATGGGTGGAATG CTTATGCTCAATGGGGAAA M PR-GR-BS-973 FITB985.y2 a (AAT)4 TGCACTAGAGCCTACAAATAG CTTTCCTCTACATAACGGACT FH-1000 PR-GR-BS-974 FITB945.y2 a (TCTT)5 
ACACTTGCAAATCACATGC CCTTGGTTTAAGGGCTAGTT M PR-GR-BS-975 FITB945.y2 b (TCTT)5 GTTACGAGCTCGAATTTAGG CCGGTTGTAATGAATGGAT FH-1000 PR-GR-BS-976 FITB858.y2 a (GGTGC)4 TTGTCTCACGCTACAGTGTC TCTCTTCCAACATCTCAATCT BAR PR-GR-BS-977 FITB963.y2 a (ACA)5 AAGAGGGGCAGCTGTAATAG GGGCTAGTGTTGGGTTATTC M PR-GR-BS-978 FITB1051.y2 a (GAG)4 GGAGCTTGGTAGCTCAAT CCCACATGAGCTTAAACC NA PR-GR-BS-979 FITB1062.y2 a (ATAT)4 TCACTTTGGCATGTGAGAC GTGATGTTATGGCATATGTGAG M PR-GR-BS-980 FITB1062.y2 a (TTA)7 CTCACATATGCCATAACATCAC AGAGGAATAAAAACCCATCG M PR-GR-BS-981 FITB1151.y2 a (GTG)6 CCATTGTAGTGAAGGGTGAC ATCTTCTCACTCTCCCCATC NA PR-GR-BS-982 FITB5321.y1 a (GA)6 ATAGGAAAGGGCCACAAG CCGGTGGATAGACTGTAATG M PR-GR-BS-983 FITB5138.y1 a (AT)6 GACTCAAATGGACAGCTTTG GTCCCTGAGCCTAAAACG NA PR-GR-BS-984 FITB5330.y1 a (AGT)4 GAGTTTTAGCTCGGGAAAG AAGGAGCGTTAAAGACACAC NA PR-GR-BS-985 FITB5299.y1 a (TGC)5 GCAAAGAAGGTGTTGAGG AAGGAGTGCAAAGGGAAG M PR-GR-BS-986 FITB5115.y1 a (AAG)6 CTGAGAAGCCTGAAAAGATG TGATCGATGTCACAGTCTTG M PR-GR-BS-987 FITB5101.y1 a (TATTTT)4 CAATATTGAAGTACACCGAGAC CTTTTCCTGTGACCAGAATC M PR-GR-BS-988 FITB9145.y2 a (TTG)4 ACACTCCTCTACCGATTCC TAGGGATGCCATGAAGTCT NA

PR-GR-BS-989 FITB9130.y2 a ( TATTAT)4 GCTGTTGGTAGGAAATCG AGTCTGGCTTTGAGTAGCC P PR-GR-BS-990 FITB9202.y2 a (TATA)4 CCTAGACAACTCCAGGTTGA CATTGTCAACCAATTCTACG M PR-GR-BS-991 FITB9051.y2 a (ATAT)4 GTAAAGCAACATGCAACAGG GCCAATCAAGCTAAGACTCA M PR-GR-BS-992 FITB8852.y2 a (CAA)4 TAGGGATGCCATGAAGTC GGTTTGCCTCGAACTAAG NA PR-GR-BS-993 FITB8964.y2 a (TC)6 ATTTCGCATCATGGACAG CGCTAGCGTTATCAAGTTGT M PR-GR-BS-994 FITB9125.y2 a (GAA)7 GTAGGTATGTGCCTCCAAAG TCCTGGTGCCTCTCATAG M PR-GR-BS-995 FITB9205.y2 a (CT)7 GATATGCTTTCGGATTGCT CAGCTAGGAAGTAGGTCGTG M PR-GR-BS-996 FITB9112.y2 a (GT)9 CCTTCTCGTAGACAAATCAAG CAATATCGCCCTTTCTATTC M PR-GR-BS-997 FITB4385.y1 a (TTG)4 GGCATGTTTCCTTACCTACC CTTTGTTGCTGCCTGTTC NA PR-GR-BS-998 FITB4385.y1 b (AG)11 TCCTTACCTACCATCCATTG CAGCTTGTAAGTAACCACTGC M PR-GR-BS-999 FITB4602.y1 a (GAA)4 AGAAGCTCTGATCTTGAGTAGG CTTTGCAAGGTTTTCAGC M PR-GR-BS-1000 FITB4299.y1 (TATG)13 ATGGAGATTTTGTCGTATGG CGTGTATGATCAAAGCAAGA M PR-GR-BS-1001 FITB4292.y1 a (AATT)4 CTCCTCGCATACCCTCATT CGCCTTTGATGCTCAAGT M PR-GR-BS-1002 FITB4477.y1 a (CT)6 GTCCAAGTGCCACTCTATCT CTACTATACACGGTGGTTCG M PR-GR-BS-1003 FITB4439.y1 a (TCT)6 GAAGGAGTAGTTGAGTTGG CCCCAGTGATAGAAGCAAC M PR-GR-BS-1004 FITB4479.y1 a (ATAA)4 GTAACAATGCATCATCCTC GATAGCGTAATTGTGGAAG M PR-GR-BS-1005 FITB4272.y1 a (CAG)4 CCCTAGCCTCTGGAAACTT CGCACCAGTCACCATTTA M PR-GR-BS-1006 FITB2521.y2 a (AAC)4 CAGGATGAAGACTCACCTACA GCATGATGGTTGTCAGTTCT FH-1000 PR-GR-BS-1007 FITB2585.y2 a (AATT)6 GCGAATATCACGACAACTTT GCAGGATGAATCAAATGAAC P PR-GR-BS-1008 FITB2414.y2 a (ACA)4 AGCAAAGCGAAGTAATCAAC GCTTCATGCCTATCTCACAT NA PR-GR-BS-1009 FITB2430.y2 a (GTG)4 AACAACTCGCAAGCATGG GTGGTTATCTAGCGTGGAAC NA PR-GR-BS-1010 FITB2335.y2 a (AATAAT)5 TATCGTGTCACATCTTCAGC GTCCAACTTGTCTGTCTGCT M PR-GR-BS-1011 FITB2607.y2 a (ATATAT)5 ACAGGCTGTTGAGAATCAAG GAGACATTAATATGGGGTAGCA M PR-GR-BS-1012 FITB2631.y2 a (AATA)5 CCGGATCTAGACACAAGAGA TCTAGGGGTTTCGTAGGG M PR-GR-BS-1013 FITB2607.y2 b (ATATAT)5 CTATGTCTGCGTCTGCTACA CATGCTCTTTTCCCACCA M PR-GR-BS-1014 FITB8561.y2 a (AGA)4 
TGCTCAAGAGCAGAGAAGG GGACACAAGGTGGGAACTT M PR-GR-BS-1015 FITB8466.y2 a (AG)7 CGGTAGAAGAGAAGGTTCCA CTCCTGCTTCCAACATCTC NA PR-GR-BS-1016 FITB8482.y2 a (CTCT)4 GGCTCCAATTGTTTTACTGT ACATTCGAATTGTGGATAGC P PR-GR-BS-1017 FITB8714.y2 a (TTTA)4 CAAGGCTATAGGCAACAGG GTGATTGGGGAAGCTCAG M PR-GR-BS-1018 FITB8483.y2 a (AAT)4 CGGATATTGAGGTCAGTTCT GACCTAAATCTACATCGGACTC BAR PR-GR-BS-1019 FITB8747.y2 a (TATT)4 CCAAATTCTTGAAGACAAGG GGGTCTTTATTTGGGACTCT M PR-GR-BS-1020 FITB8611.y2 a (TTC)4 CGCAAGAAAAGGCTAGGA GACGGAGCAGACAGAGATT BAR PR-GR-BS-1021 FITB8812.y2 a (TTG)4 GGTAGCTTTGGGATCTGAG CGGATCCTGGTAATTTGG BAR PR-GR-BS-1022 FITB8573.y2 a (TA)6 AGATGGCCAGGAATGAAC GATCTATGCTCAGCTCCAAG M PR-GR-BS-1023 FITB8789.y2 (ATATAT)10 GATAGGTGCACTTGCCTTAG GCGATAGGGCTCAGTGATA P PR-GR-BS-1024 FITB8454.y2 a (GTG)4 CCATGACGCAAACTAGAGG TACAGTAGAAGGGGCAGAAG NA PR-GR-BS-1025 FITB8718.y2 a (TA)6 GCCGAGAGAAAAGTTGTGT GCTTTGTAACGTCTTCAACC M PR-GR-BS-1026 FITB8534.y2 a (TATA)5 GACCCATTCATAGAAGACCA TCTCGAGGGATCCATATTC M PR-GR-BS-1027 FITB8574.y2 a (TA)7 GAGCATGGAAGGTGTTTG CACGATTCGAATACAGAGC NA PR-GR-BS-1028 FITB8830.y2 a (ATAT)4 ATACCCGCTTGTTGAGTG AGAGCCCTGTTTCATTGG M

PR-GR-BS-1029 FITB8535.y2 a (ATAT)5 GTCGTCGGATTAGAGTTACG GATGGGGATGAATTGTTG M
PR-GR-BS-1030 FITB8456.y2 a (TA)6 GGTTGGTCTCTTTGATCAGT TGTCCACCCACTAATTTACC M
PR-GR-BS-1031 FITB8568.y2 a (CTT)4 CTACCCTGTTTTACCCTCCT GGCGAAGTTTGGAGTCTT NA
PR-GR-BS-1032 FITB4937.x2 a (TGTTGT)6 GCAAGTGGTGGTGGATCT GACTCATAGTAGGGGAACACC P
PR-GR-BS-1033 FITB4803.x2 b (TTAA)4 CCTCTACATACGTTGGTTGC GGCAATGATATGTTGGTGA M
PR-GR-BS-1034 FITB4619.x2 a (TTATTA)4 CTGCTCCATTGCTAACTCC CTTCATGCCCCCAGAATA M
PR-GR-BS-1035 FITB4845.x2 a (TATATA)4 AGCAGACTCTTCAGGTTCAA ACAACCAGAAAGCTCCTACA P
PR-GR-BS-1036 FITB4646.x2 a (ACA)5 CAGGAACCAAGCCAACAT TTCCCCTTCTGGAATCAG NA
PR-GR-BS-1037 FITB4846.x2 a (TCT)4 CGTAACCCATCTTTCACCTC CAACCCACTCCATGATACAG NA
PR-GR-BS-1038 FITB4760.x2 a (AAT)4 ACATCAATATGATGGACACG GGTGGCTGGTATGGATAGTA M
PR-GR-BS-1039 FITB3841.x2 a (AT)6 GCCCAAATCGTACTCCAT CCATGGCTACCAAAGTCA M
PR-GR-BS-1040 FITB4089.x2 a (GTT)4 TACGGTTACTCTTGGTTTGG GGTATTGAGATGCGTATGGT M
PR-GR-BS-1041 FITB4089.x2 b (TTC)4 GTCCCTTTGAGTTGTTGTTG TGGAGGCAGTGAGATTAGAG NA
PR-GR-BS-1042 FITB4090.x2 a (TA)7 CTCCACATTCAATCCAAACT ACTTTCCAACCGAAAAAGTC NA
PR-GR-BS-1043 FITB3938.x2 a (AAATTT)4 GTAGTTAGGGGGTTGGTAGG GACCGACGTAAAATGAATTG M
PR-GR-BS-1044 FITB4091.x2 a (AT)6 GGGGGAAACTAAAGTCGT TAGCCGTAGCTATTCCAGAC M
PR-GR-BS-1045 FITB4116.x2 a (GAA)4 CTGCCACCATAAAAAGTAGC AAACAATGGCTGTTGACG M
PR-GR-BS-1046 FITB3932.x2 a (ATAT)4 CTACTCACCTGGCAACACA AACGAACACTCAGACTACGG M
PR-GR-BS-1047 FITB3973.x2 a (AGAG)4 GACAGGTTCTTGCATAGGAG CAAAGTAGCCGTTACACCTC M
PR-GR-BS-1048 FITB4165.x2 a (TATA)4 CCTTAGGTGAAACCCTAACC TGAGGGCAATAACTAGCC M
PR-GR-BS-1049 FITB3966.x2 a (TTA)4 GTATCTCCGAAAACCCGTAG CACATCACCAATCCAGCA NA
PR-GR-BS-1050 FITB3935.x2 a (AAT)6 TTGAGTGATTAACGGTGAGG CTCCCATTTGGATGTTAAGG M
PR-GR-BS-1051 FITB3880.x2 a (ATAT)4 CCAAACTAACTACCCCTTCC CATGAGAGTCCCTAACCAGA FH-1000
PR-GR-BS-1052 FITB4104.x2 a (ATAT)4 GGAAAGTTCGTACGGTTTAG TCCAAATCCATCTCTACTCC M
PR-GR-BS-1053 FITB385.x2 a (AAT)4 TGGAAAGGTGAAGAGAGTTC GAGACAACTCTAGAAGGCAATG M
PR-GR-BS-1054 FITB481.x2 a (AAG)4 AGGGTATAACGAACAGGTGA ACGCTGTACTCGCTATGTTT M
PR-GR-BS-1055 FITB447.x2 a (AGAG)4 TGATGTGTGGATCAAACAAG CACTTGCCTTTTACTCCTGT M
PR-GR-BS-1056 FITB9137.x1 a (ATATAT)4 ATCGTGTCCGCTAATATCC GTGTACACATATAAGCCATCCA P
PR-GR-BS-1057 FITB9122.x1 a (AT)6 CCTATTGGAAGACCCAACT ATAGGTGGAAACCTTCCTG M
PR-GR-BS-1058 FITB9154.x1 c (TA)6 CTCACGTCCTCCTATTCTGA AACTTCTAGGCCCTAACGAG M
PR-GR-BS-1059 FITB9044.x1 a (TATA)5 GGTCCAGGACAAAATCATC GTACATCAATTGGTGCATGA M
PR-GR-BS-1060 FITB9135.x1 a (AGAAGA)4 CATTCTCGTTAATGGGTGAT TGTTTCCTTCTGCTACTGGT P
PR-GR-BS-1061 FITB5586.x2 a (AT)7 GTCACCCAGAACTGAGAACT ATACCACACCTCCATCCTCT M
PR-GR-BS-1062 FITB5635.x2 a (TG)7 GCGTTGCAGTAAAGGATG AGCTCCCACCTCAATTTC NA
PR-GR-BS-1063 FITB5755.x2 a (AT)6 GTTGACGAGGTAGCTGTGTT CTAAACGCGAAGTGTGAATC M
PR-GR-BS-1064 FITB5388.x2 a (TGT)4 TGGTGCATCCACTGTCTATC GAGCGTATGCAAGGATAAGG NA
PR-GR-BS-1065 FITB5660.x2 a (AC)6 GAATCGGGTCGGGTTAAG GAATCGGGTCGGGTTAAG P
PR-GR-BS-1066 FITB5660.x2 a (AC)6 GAATCGGGTCGGGTTAAG GTGGACAAAACCCAATTTC M
PR-GR-BS-1067 FITC3830.g1 a (AT)7 GCCGAATGCCTAAATATG GCAGTGGCATATATGTTTGA M
PR-GR-BS-1068 FITB7120.x2 a (GA)6 TGCATAAATTGAGAGGGACT GGTGATCCGAATATGATGA P

PR-GR-BS-1069 FITB6480.x2 a (AT)7 TGTGACGTATTGATTTCTCG AGCACCAGCATACTTTAGGA M
PR-GR-BS-1070 FITB10491.x1 a (TA)7 GGTAGGGGACTAATGAGAATG CATGAATGCACTAACGTCTC M
PR-GR-BS-1071 FITB10742.x1 a (AT)7 GAATCAAGCCGGAGTAGG GAGTGTGGCATTGGTGTT M
PR-GR-BS-1072 FITB10752.x1 a (AT)6 CTCCAATACCATCTCCACTC GCTAGTTAAGTTCGTACGGTTA M
PR-GR-BS-1073 FITB9605.y2 a (TG)7 TGATATTGAAGGTCGAGAGC GTCATTGTTTGTCCTCAAGC M
PR-GR-BS-1074 FITB9605.y2 a (TG)7 GGTCGAGAGCCAATTGTA AACCAAACTCCCCAACAC M
PR-GR-BS-1075 FITB7017.y1 b (ATT)8 CCTTCATCTTCTTCCTCCTC CTTGCCCTTGCTATGAAGTA M
PR-GR-BS-1076 FITB7233.y1 a (AAAG)4 ACATGAGTTCAAACCCTGTC TGCCCTTGCTATGAAGTAAT M
PR-GR-BS-1077 FITB7249.y1 a (TATA)4 CCATACTTGTGGGTGCAA TGACTCAACTCAGCTTGTGA M
PR-GR-BS-1078 FITB7065.y1 a (ATAT)4 TGTCGAGAGTCAACAACTCA GACGACATCCGAGGTAAGT M
PR-GR-BS-1079 FITC25210.b1 a (TA)6 GACGACATCCGAGGTAAGT TGTCGAAAGTCAACAACTCA M
PR-GR-BS-1080 FITB7266.y1 a (TAAA)4 CACATCTTTCTAGGGTACCAA GTTAGCTTTGTGGGAGATG M
PR-GR-BS-1081 FITB6995.y1 a (GAA)4 CACTCCAATTCCTGCTTAAC GAAAGAGGCTCGAGCTAAGT M
PR-GR-BS-1082 FITB7180.y1 a (GGT)4 GACAAGAGTGTGTGAGAGCA CATTGGATTTGGAGGAGTT M
PR-GR-BS-1083 FITB7180.y1 b (TGG)4 TAGTGTGCCAAACGTTACTG AAACCACCACCATCAGTAAC NA
PR-GR-BS-1084 FITB7284.y1 a (TTTA)4 CATCAAACAACCACCAAAC TGGCTAACAAGACAGATCCT P
PR-GR-BS-1085 FITB7165.y1 a (TC)6 GAGGTAAGCTGTGGCTATGA TCCCAAGGAGAACAATAAAG M
PR-GR-BS-1086 FITB7165.y1 b (TC)6 GAGGTAAGCTGTGGCTATGA TCCCAAGGAGAACAATAAAG NA
PR-GR-BS-1087 FITB6981.y1 b (CT)6 GTCATTCTACCCATCACCAT GACATCGATGAGTTCACCTT M
PR-GR-BS-1088 FITB7111.y1 a (CTTT)4 ACCTTTGCCAACATACACC CAGTGTTTGTTTGGCTGAA M
PR-GR-BS-1089 FITB6983.y1 a (GGGAAG)5 GATTGGAGGGAATTGAGTTT TTCGGTAAGACATTTTCAGG M
PR-GR-BS-1090 FITB7047.y1 a (AAAC)7 TCTCTGTGACCCTTTGAGAT GTCAAGGTCAAGAAGTCGAG M
PR-GR-BS-1091 FITB7144.y1 b (TTATA)4 CTGAGGCGGTTTAACCTT CTCTGTTTCGTTTTGCTCTC M
PR-GR-BS-1092 FITB7144.y1 b (AATA)4 CTGAGGCGGTTTAACCTT CTCTGTTTCGTTTTGCTCTC M
PR-GR-BS-1093 FITB6009.y2 a (AAGCTC)4 GTCACCAATCCCAGTTTCTA AGGGCATTGTTGTATCGAG M
PR-GR-BS-1094 FITB5889.y2 a (AT)6 TAAACACACCGCTCATCC CACCGGCGGTATATTTGT M
PR-GR-BS-1095 FITB6018.y2 a (GA)6 CAAGTTCAGGGCCTAGTGAT AAATGGGAAGTGGGGAGT M
PR-GR-BS-1096 FITB6066.y2 a (CTT)4 TCAAGTAGGAAGGAGGCAAT TCCATCAACATGACAATGTG NA
PR-GR-BS-1097 FITB5916.y2 a (AGAAGA)5 AGATGAAAAGAGGATGAACG GTCCACGTCACCTTGTTAGT P
PR-GR-BS-1098 FITB6116.y2 a (GAT)4 TCACTCATGAGAATCAGCAC GTATGGCAGAGAGTTTACGG NA
PR-GR-BS-1099 FITB6141.y2 a (TTA)4 ACGTTAGGGCACAATTTCTC GGGTTCCAATTTTCCTTCTC NA
PR-GR-BS-1100 FITB5918.y2 a (ATGCAG)4 ATGTGTCAGATGTGTGATGC ATCACTGACACTCACTGCAA P
PR-GR-BS-1101 FITB6110.y2 a (AT)6 AATCCCCCAACGTAGTATCT GTACGTGAATGACGACAACA M
PR-GR-BS-1102 FITB5863.y2 a (AGTTT)4 CACCGTTGGTTTCTTTCTAC TCAACCAGTCTCCAGGTTAC P
PR-GR-BS-1103 FITB3353.y1 a (AAG)4 AACAGGAATGCTTTCTCAAC CTATGGTCTTACACGGAAGC M
PR-GR-BS-1104 FITB3225.y1 a (TTC)4 GTGCCTTTTCTTTGGGTGA CCCATCCCCAAATTTTAGC M
PR-GR-BS-1105 FITB3426.y1 a (GGTT)4 GACACTGTTCGAGTTTTTCC GAATCAACGTCTCTTTTTGC M
PR-GR-BS-1106 FITB3371.y1 a (AATT)4 TTGTTGCTCTTGCTCTTATG TCCCGTAACTTTCTGTAACG M
PR-GR-BS-1107 FITB3387.y1 a (TTA)4 CCCGTATTTTTCATCCACT AGTGGACCATTGTACATTCC M
PR-GR-BS-1108 FITB3092.y1 a (ATAATA)5 GGCCAGATGTTATGTTCG AATCTCGTGCATGGTCAG M

PR-GR-BS-1109 FITB3364.y1 a (TACA)7 TCAAAAACTCCTCACTGCTG TATGCTGTACCTCGTTGATG M
PR-GR-BS-1110 FITB3260.y1 a (GCA)7 AAGCAGCTTGTGAGCTTG ACGAGTATCTGCCCTCTTG M
PR-GR-BS-1111 FITB3165.y1 a (GAA)4 CAAGAGGAACCAACTGAGC TGACCGGTTCTTCTACTAGC M
PR-GR-BS-1112 FITB3383.y1 a (TA)7 CCCAGGTGAGAAACTCTCTA TCCTGTCCTGCTTTATTACC M
PR-GR-BS-1113 FITB3215.y1 a (GTT)4 ATGCATGCCATAGCTCAAAC CAAAGCTAATTCGGGGATCT NA
PR-GR-BS-1114 FITB3320.y1 a (AAAT)4 CTGTAATAGGCCATTTTTGC GCTACGGTTTTCTGACATTG M
PR-GR-BS-1115 FITB3336.y1 a (AAAAAG)5 AGAGATGACCAGGACTTGC CTTCAGCTCAGCTAATACCC M
PR-GR-BS-1116 FITB1761.y2 a (ATA)4 TTCCGTAGACCATCAGAAAC GATGTTTAGCTGGTTTGGTT M
PR-GR-BS-1117 FITB1577.y2 a (GTT)4 GGGTTGTCTGTGATTGAGAT GGTTAGGGACTCACCTGATA NA
PR-GR-BS-1118 FITB1769.y2 a (ATAT)4 GGAAGAGTTGGAAAAAGGTT ATAACCGTTGGTATCCAGTG M
PR-GR-BS-1119 FITB1889.y2 a (GAA)4 ACGCAACCTTATCAGGAAC CTACCCTGTTTTACCCTCCT NA
PR-GR-BS-1120 FITB1562.y2 a (TTC)4 CAGGGTATTCACGATTTTTC TTAGCTTCCATGGTCAAAGT M
PR-GR-BS-1121 FITB1906.y2 a (GTGT)4 ACCCCTTGTTTCTCTCTCAC GGAACGAACCCAACTACAC M
PR-GR-BS-1122 FITB1739.y2 a (CAA)5 TAGCTTCTTCAACCGCAGAT GGAATTGAAAGCCTAAGGAG M
PR-GR-BS-1123 FITB1915.y2 a (ATA)4 GCTGTGACTTGTTGCATCT GGTCATAGTAGAGGACGAGGT NA
PR-GR-BS-1124 FITB1836.y2 a (AT)6 GCCTTATTTAGTGGACGACA CTTAGATATCGTGCCTCCAG NA
PR-GR-BS-1125 FITB1829.y2 a (AT)6 AATGAAGCAAACCAAGTCAG GTAATTTTGGAGAGCAATGG M
PR-GR-BS-1126 FITB1885.y2 a (TA)7 GGGTAAAGCGAGACTTAGTG TTCATCCCCACCCTATCT M
PR-GR-BS-1127 FITB1735.y2 a (TTC)4 CAATGTCAAGCTCCTCTTTC AACTTTGGGATATGTGGTTG NA
PR-GR-BS-1128 FITB1783.y2 a (ATA)4 CGTTTAGGTCAGATTGAGGA TAGAGAAAAGATTGGCGTTC M
PR-GR-BS-1129 FITB1704.y2 a (TA)6 ATCCGGATCCAACCATAC CCACATGACCCACTTGTATT M
PR-GR-BS-1130 FITB8113.y2 a (AAT)4 AAGAACACGTAATCGTGGAG GTGGCAAATACTGGAGACAA NA
PR-GR-BS-1131 FITB8185.y2 a (AGAAA) GACGGATCTTTTGATAGGG ATTCCGACGACTCAGGTAA P
PR-GR-BS-1132 FITB8284.y2 a (ATT)4 AGACAACTGAAGACGAAAGC GGAAACACTTAGCACTGGAA M
PR-GR-BS-1133 FITB8148.y2 a (CATCAT)4 GGGATAACATATCTGCTGTTG ATAGAATTGGGCCTCTTTG M
PR-GR-BS-1134 FITB8341.y2 a (TAATT)4 AGAAAGAGGTGTGGATTGTG GTTGGTTCGACTTCTCATCT FH-1000
PR-GR-BS-1135 FITB8173.y2 a (ACC)5 TAAGACTACCGAGGAACCAC CACCATTATTACCCCCATT M
PR-GR-BS-1136 FITB8126.y2 a (TA)6 GCATGAAGGTTTAAGACCAG CAAGTTTGTCGCATGATCT M
PR-GR-BS-1137 FITB8398.y2 a (TCC)4 TGACCTTAAACCATGACCA TAAGGATGTGGAATGGAGA NA
PR-GR-BS-1138 FITB8246.y2 a (TCTC)5 CGACGCCAAAAATGAATAC CTTGCTCACTTCGATATGCT M
PR-GR-BS-1139 FITB8103.y2 a (ATATAT)4 AATCCCAAGATCAAACCAG ACCAACGTTTCTGAGACATC M
PR-GR-BS-1140 FITB8152.y2 a (TC)6 GGTGCCGTTTCATTTCTTA GATCTGATTTTGTGCCTCTG P
PR-GR-BS-1141 FITB1362.x1 a (TATA)4 CAGTTGGTTTATTCCTTTCC CATGGACCATCCTCACTAAT P
PR-GR-BS-1142 FITB1322.x1 a (ATAT)4 GGATCCCGGCTTAAGATT GCTAAGCTAGACTTCCGTCTTC M
PR-GR-BS-1143 FITB1443.x1 a (CT)6 CACACCCTATGGCTAAATGA CTTATTCCGACGATTAGACG M
PR-GR-BS-1144 FITB1404.x1 a (CTT)4 GTGTCCTCATCACTCCAAGT ACTCTAATCAAGGCGAAAGG NA
PR-GR-BS-1145 FITB1492.x1 a (TC)6 AGGAGTGTGTGGCTGCTA AAGACGGATATCGACAAGG BAR
PR-GR-BS-1146 FITB1197.x1 a (AAT)4 CTCTAGCCCCTTAAGGTTTC CAACTAAGAGGGCAATTGTA M
PR-GR-BS-1147 FITB1221.x1 a (GCT)5 AATGCGGACTTGAACATTAG TGGGATACATGTTGTGACAG NA
PR-GR-BS-1148 FITB1166.x1 a (TCA)4 AACAAGGACGGTTCTCTACA GCACTGATCGATATCCAGTT NA

PR-GR-BS-1149 FITB1262.x1 a (AAT)7 ACGGTCGTCACTTGATTG GTAGTACGGTGCAACGAGAG NA
PR-GR-BS-1150 FITB1262.x1 b (AAT)5 ACGGTCGTCACTTGATTG GTAGTACGGTGCAACGAGAG NA
PR-GR-BS-1151 FITB1454.x1 a (GCT)4 TTGTTGTGTGCAGGTACG CTCTAGCCCAAACCTAAACC M
PR-GR-BS-1152 FITB1518.x1 a (GCA)4 TAGCTGCAGAATGTCTCTCC GACTCCTCCGATCTGACTTA M
PR-GR-BS-1153 FITB1159.x1 a (GA)6 TAGGGAGAGGAAGTGAAACC CTAGGCCCAAAACTCCTC NA
PR-GR-BS-1154 FITB1399.x1 a (TA)6 GTCTTTTGGGCTACGTTAGA GTAACAAGGTGGAGAAATGG NA
PR-GR-BS-1155 FITB1184.x1 a (TA)6 AACCTTACTGACGGTATTGC CGTTTGGGACTGTTTGAG P
PR-GR-BS-1156 FITB4761.y2 a (TA)7 AAGCTTCGTTACAGTTCAGG CATTCTTGGACAACCCTTAC P
PR-GR-BS-1157 FITB4762.y2 a (TCT)7 CCATCTTCAGATGCTTCTTC ACCACGAAAACCAATATCTC NA
PR-GR-BS-1158 FITB4819.y2 a (GGAT)4 ATCTCCTGCTTCTGTTTCTG CGATAAAGTAAACCCAGGAC M
PR-GR-BS-1159 FITB4613.y2 a (AGAC)4 CGTATTGGGGAGTTAGTTG TTGGCATACCTGTCCATAG P
PR-GR-BS-1160 FITB4645.y2 a (TA)6 GATCAGCTGAACTATGGGAGT GGTGGCTTGGAAGAATTTG P
PR-GR-BS-1161 FITB4823.y2 a (AAT)4 AGTTACCAAGCCACAGACC GCTCACCCCCAATAGTGT M
PR-GR-BS-1162 FITB2545.x1 a (TCC)5 CATGAGCTTAAACCATGACC CTAACTCAGAGAGACAGCTTGA M
PR-GR-BS-1163 FITB2649.x1 a (AT)6 AAGCCGGTAATTAGCATTC TGGTGTCCATAACATGTGTC NA
PR-GR-BS-1164 FITB2314.x1 a (AAT)7 AGACCTCACTAACATCCCATAG GCAATTTGAAGCATTGAGAC NA
PR-GR-BS-1165 FITB2501.x1 a (TAA)5 GATTTGGGTTAGCATATGGA CGACTCCGATGAAGAAGAAT NA
PR-GR-BS-1166 FITB2582.x1 a (TTC)5 TGGATGTTGGGTATTTGAGT CTACACCCAGGCTTTTGA NA
PR-GR-BS-1167 FITB2406.x1 a (CAA)4 TGTGTAGCAACACCAAGC GCGGGAAAGGTTAGATTACT FH-1000
PR-GR-BS-1168 FITB2622.x1 a (TTC)4 GTGTGTGAGGGAGGTAACAT GCGGGAAAGGTTAGATTACT M
PR-GR-BS-1169 FITB2631.x1 a (TTA)5 TTGTCCAAGACTTTGTTGC CCCAACCAATGAGAATTAAG NA
PR-GR-BS-1170 FITB2320.x1 a (TATATA)5 GGGAACATTGAGAAAGACAA AAGGCAAGTCGTAAACATGA M
PR-GR-BS-1171 FITB773.x2 a (CT)7 CTTATTTGGTTGCCTTGTTC CCAAGTAGATGGAAGTGGAG M
PR-GR-BS-1172 FITB2890.y2 a (AGAAGA)5 GTTAGAGTCGAGGTTCGATG TCCTCCCCTCATTTTTCTAT M
PR-GR-BS-1173 FITB2980.y2 a (AAG)4 ACAGTGTGATTGGAAGCCTA CAAGAAAGGTAAGCCAGTTG M
PR-GR-BS-1174 FITB3060.y2 (TGC)4 ACTCGTTCACCAGCTTTG GGTCATTGGACATAAAGAGG M
PR-GR-BS-1175 FITB3021.y2 b (CA)6 AAGGTTAGGGTTGGAGAGTC CAAGTCAGTAACCCCAATGT NA
PR-GR-BS-1176 FITB3047.y2 b (TA)6 AAGCCAAAAGGAAGAGGTAG GAACGCCATGATATGAAAGT M
PR-GR-BS-1177 FITB2768.y2 a (ATATAT)5 GGCGTCTATATGCACTGACT CACTCAATTTGTGGTTTCAG M
PR-GR-BS-1178 FITB585.y2 a (AAAG)4 CCAGCTTACGCAATTATTCT AAGTTCTCCCTCTTTCCTGT M
PR-GR-BS-1179 FITB601.y2 a (TTAT)4 GGAGACGGTTGGATTAACTA GGAGACGGTTGGATTAACTA M
PR-GR-BS-1180 FITB482.y2 a (ATA)4 CTGACGAAATGCGACTTATG CAGGAAAGAGATGACGAAAG NA
PR-GR-BS-1181 FITB387.y2 a (AAT)4 ATCATAAACCCGACATAGGA GGAACCCTGAAGGATACCTA M
PR-GR-BS-1182 FITB764.y2 a (ACAT)4 CAAGCTTTCTCTCCCACTAA CACGAACAGCAAACAATTAC M
PR-GR-BS-1183 FITB391.y2 a (ATA)4 GCATCCATGTGCTCTAGT AGTCATGCCTCGTCTTCTAC NA
PR-GR-BS-1184 FITB719.y2 a (TTA)4 CACCACTCATATTCGGACTT CAATCTTGTAACTCCCTACCC M
PR-GR-BS-1185 FITB10178.y2 a (GCT)4 GGGACTGTATGCTTCTTTCA GAAGATTCTCAACCAGCAAC M
PR-GR-BS-1186 FITB10106.y2 a (AG)7 TGAAGGAACTGTGTGTGTTG GATGATGGGTTTAGTTTTGG NA
PR-GR-BS-1187 FITB10106.y2 a (AG)7 ATGAAGGGTGTGAGAAGAGA TTGTTGGGTTAGGGTTAGG P
PR-GR-BS-1188 FITB10314.y2 a (CTG)4 GTCACATGCAATGGCTTATC ATCATGGTTAGGACAGCAAC M

PR-GR-BS-1189 FITB10195.y2 a (ATTA)4 TCTCTTAGCTAGGGTTTCCA TGCTTAGTGGTTAAGGATGA M
PR-GR-BS-1190 FITB10180.y2 a (TATA)4 TTTGAACGATCTCTCTTTGG CATCATGAAGATGGGTTTCT M
PR-GR-BS-1191 FITB9996.y2 a (ACACAC)5 GATGGACGATAATGGAAGAG TAGTTAGTGGCTTGCGTTG M
PR-GR-BS-1192 FITB10253.y2 a (ATG)4 CCGTTCTCTTGCAGAACTTA TCCTATATTGCTAGGGTGGA M
PR-GR-BS-1193 FITC75717.b1 (TTG)4 TGGCCTTTACACTCCTCTAC TGCCACATGTGTAGCATC NA
PR-GR-BS-1194 FITB10014.y2 a (GCA)4 CAATCCTGGAGTGAAACC CATGACAGCAAACGTTCC NA
PR-GR-BS-1195 FITB10022.y2 a (GAA)5 CAGTCACATAGCCTTAGCAG GTGGTGGCTTTATCTTGG M
PR-GR-BS-1196 FITB10094.y2 a (AT)6 GCTCATGAGAGGAAATAACG ATGTGACATCACCAGATTCG NA
PR-GR-BS-1197 FITB10294.y2 a (TGA)4 ATCTGAGTTTCCGGAGTAGC TGTTGTTGGAGTAGACACGA NA
PR-GR-BS-1198 FITB10271.y2 a (CAA)4 TAGGGATGCCATGAAGTCT GCCTCGAACTAAGAAAGGA NA
PR-GR-BS-1199 FITB10119.y2 a (ATATT)4 GCATATGTCAGCATCCACT GATGCCTATGGGTGAAACT M
PR-GR-BS-1200 FITB9992.y2 a (CCT)4 ACCCTCTTCTTATCCGTAGC GGATTTGCTTGGGCTCTA NA
PR-GR-BS-1201 FITB5969.x2 b (TTTA) TACTTTGCATGTCGTTGGT GAATGAATGCCGCTTTAAC P
PR-GR-BS-1202 FITB6009.x2 a (AT)6 GGAGGGATAAAATAGGATGG GAACGCTTCATTAGAATTGG M
PR-GR-BS-1203 FITB6073.x2 a (TCT)4 AGGAAGTCGAAGTTTTGTCA GTGATCAACACGACCTACAA NA
PR-GR-BS-1204 FITB5962.x2 a (AG)6 GAGGGAGAAAAGACTGTGTG GCTTCTCCTTCTTCATTGTG P
PR-GR-BS-1205 FITB5930.x2 a (CAA) ATGATTTCCCCTATCCATTC TGCTTAAGAGTTGGGATTTG P
PR-GR-BS-1206 FITB6083.x2 a (TA)6 AATCGGTCAGTCAATCACTC CCTAAGAAGCAGAGACAGGA M
PR-GR-BS-1207 FITB5948.x2 a (TCTTCT)5 GTTTCTAAGGGTGAGGGATT GTGACAGAAAAGTGGCTTGT M
PR-GR-BS-1208 FITB5948.x2 b (ATGCAG)3 CCACTTTTCTGTCACCTCTG TGTAGTGTCGTCCCCCTAT P
PR-GR-BS-1209 FITB5765.x2 a (TTAT)4 GGTGTCCCTACAAAATGAGA CTGAGTCGACGGTTTATAGC P
PR-GR-BS-1210 FITB6094.x2 a (TAA)4 GAAATGCTACCCATTGAGAC CTGCTACCATATGAGTCTGCT M
PR-GR-BS-1211 FITB5934.x2 a (TCTC)4 GCACTGAGGGTTATTAGGAA GGCTGAGAGCATACCATAAC P
PR-GR-BS-1212 FITB5943.x2 a (AT)7 TGTGACTGCATAGCTAATCG GAGCTGTTGAAATTGTCCTG BAR
PR-GR-BS-1213 FITB5992.x2 a (TA)7 CAAGTTAGTGCGTGGCATAG GAAAGTGAAGGGGCGATT M
PR-GR-BS-1214 FITB6080.x2 a (GCG)4 GTGACAAGCTAGGGTTTGG GCCTCTTTTCTTGGGTTG NA
PR-GR-BS-1215 FITB9482.y2 a (ATC)4 CTTGACATCCATGGCCTA CTGCAAACGATCAACAGC NA
PR-GR-BS-1216 FITB9523.y2 a (ATC)5 GAATTCCCACCATGATCC GCATGAGTTCCGAGAATG NA
PR-GR-BS-1217 FITB9318.y2 a (TG)6 TCCATACTCTGCCACATTG GTTCTAGTGACCCCAGGAG NA
PR-GR-BS-1218 FITB9295.y2 a (ATAT)4 AGATTACTTGCCTTTCATGG ATTGAGCTTCATTCTCTCCA M
PR-GR-BS-1219 FITB9359.y2 a (AAAT)4 GACTTATCGAGCGAGTGAAT ACAATGTAACACCCTGAACC M
PR-GR-BS-1220 FITB9368.y2 a (TCT)5 ATCAGGGATTTCAGCAGAC AGGTTATAGTCAGGGCACAA NA
PR-GR-BS-1221 FITB9584.y2 a (TGT)4 ACCTCATCCGAGTTAGATCC CGGCTATGGTAATGATGC M
PR-GR-BS-1222 FITB9584.y2 b (TGC)4 ACCTCATCCGAGTTAGATCC GCGGCTATGGTAATGATG NA
PR-GR-BS-1223 FITB9817.x1 a (AAT)4 ACAACTCGACCAAAGGAC GAGAGCTAACGAGTGAGGTT NA
PR-GR-BS-1224 FITB9729.x1 a (AT)6 CTCAGAGTACCAGACCGAGT CTTGGTTGGTTTAGTAGGTTG M
PR-GR-BS-1225 FITB9769.x1 a (AAT)4 ATACTCGATTCAGCCTCAGA CAAGGCAGAGTTGGTAAAGT NA
PR-GR-BS-1226 FITB9842.x1 a (TTC)12 GTCCACCCTTAGGCTTAGAA ACCCTAAGGCCCTTCTCTA P
PR-GR-BS-1227 FITB9637.x1 a (GAA)8 GGACCTCTTCTGAGTAGCAA ATGTTGAGGTAGCCATGC P
PR-GR-BS-1228 FITB9726.x1 a (AGA)5 CCCCAATCCATCCTAACTC CTGGATTGCCTCTCGATT M

PR-GR-BS-1229 FITB9950.x1 a (CTT4)4 GGTCACCATGTTAGGTTCAG TCACAGATTGGGACCTCTAC M
PR-GR-BS-1230 FITB9441.x1 a (ATT)4 GAGAGAGGCATCAGATCAAT GTTTCAGAATCCGACTTGAG M
PR-GR-BS-1231 FITB9441.x1 b (ATAATA)10 CGCATAATTTCTCTCTCACC CACTACGCTGTTTGCTCA M
PR-GR-BS-1232 FITB9505.x1 a (TA)7 GTTGAACATGTTTCAGCTTG GGTCCATGTCCTGTTGATAG M
PR-GR-BS-1233 FITB9341.x1 a (AGA)6 ACCTTGAGATCGTTCAGGA AATCATCCACGAGTGCTAAC NA
PR-GR-BS-1234 FITB9286.x1 a (TCTC)5 TGCAAGTTGATTCCTAAACC ATCACACTCACAATTCATGC P
PR-GR-BS-1235 FITB9279.x1 a (AT)16 CTCCTGCCAAATGACATTAC TCCAAGTTAAGAAGTAGGAACC M
PR-GR-BS-1236 FITB9543.x1 a (AGA)4 CCCATACCCTTTTTGAGC CCCTATTTTCCTCCGACA NA
PR-GR-BS-1237 FITB4115.y2 a (CAA)4 AGTGGCTAAGGAATTTGACA CTAGGTTTGCCTCGAACTAA M
PR-GR-BS-1238 FITB4062.y2 a (TTCT)4 GGGCTAAATTGACTTACCAA GAAGTGACGAAACAGAGAAGA M
PR-GR-BS-1239 FITB4102.y2 a (AAT)4 GCGTTGAATCGAGTAGAGAG TTGGTTCTTTTGGTGTTAGG NA
PR-GR-BS-1240 FITB3847.y2 a (AAGA)4 GAAAGTATGGGCTCTGATGA GACTGTTACTCAACCACGGTA M
PR-GR-BS-1241 FITB4119.y2 a (AAT)5 TTCCACAGAAGGAAATGAAG TCCGTACTTATGTTCCCAAT M
PR-GR-BS-1242 FITB4208.y2 a (AGATT)4 GTGGCCCAATGATTAAGATA CTAAGTCAGCCTAACCACCA P
PR-GR-BS-1243 FITC96594.g1 a (AT)6 AGCTGTTGGTCTGGTCTTC CAACTATGCCAGTTGAGCTT M
PR-GR-BS-1244 FITC96620.g1 a (TCTC)4 GGTCATCTGGCAATTCCT GTGGGCAAGACAAGAAGTC P
PR-GR-BS-1245 FITC96620.g1 b (TA)8 GGCCAGGACATATTCTCTCT GGGCAAGACAAGAAGTCAG M
PR-GR-BS-1246 FITC124228.b1 a (TTC)10 CTGCTACCTGTTTCCGAGT GAAAGGACCGGTGAAGTC NA
PR-GR-BS-1247 FITC124273.b1 a (TTTA)4 CTATAGTCGGAGGGCAGAC GAAGCCATTTTAGGCATCTG P
PR-GR-BS-1248 FITC124294.b1 a (CTC)6 GCCAAGATAGGTGAGCAAG GAGGATGGAGCTGGTGAT P
PR-GR-BS-1249 FITC124296.b1 a (ATTCG)5 GGGTGACAATGGTTACAAGG CTTGCTAGGGGTGAGCAT M
PR-GR-BS-1250 FITC124298.b1 a (TCT)4 CAGGTTTAGCTTCTGGTTCC CGAATAGCTAGAGACGCTGT NA
PR-GR-BS-1251 FITC104852.b1 a (AG)7 CCGACTTCAAGTTCAACC CTGAAATGGAGGAGAGGAG M
PR-GR-BS-1252 FITC104854.b1 a (GAG)4 GTGGACAGCTTTGAATGAAT GTTCAATTTAGTCCCTGAACC M
PR-GR-BS-1253 FITC104894.b1 a (AATA)4 CAATACTTTGTAGGTCGAAGC AAGTCCTTTTAACTCCACAGG P
PR-GR-BS-1254 FITC104898.b1 a (CAA)4 TGTGCTATCTTGGAAAACCT TCTAAGGTAATGCCTCATGG M
PR-GR-BS-1255 FITC158606.g1 b (CTTTT)4 CTGAAGTTGTCCGTCTCACT CAACAGGCGAACACAATG M
PR-GR-BS-1256 FITC158606.g1 a (TATA)4 GTTGTCCGTCTCACTTCTCA CAACAGGCGAACACAATG M
PR-GR-BS-1257 FITC158643.g1 a (GAG)4 ACAATTGAGAGCACACAACA AAGGAAGGAAAATCGTAAGG NA
PR-GR-BS-1258 FITC158680.g1 a (AG)6 AAAGCTGGCGTAACAAGG TGTCTCTTCCTCTCTTGCAC NA
PR-GR-BS-1259 FITC63083.g1 a (TCT)4 GTGTTTCTCAATTCCTCGAA AATCAACTTGGACATGAAGC M
PR-GR-BS-1260 FITC63115.g1 a (ATT)5 GGATTGGCTTTTCCAATG GGAAAGGACTTTGAGTTTCC NA
PR-GR-BS-1261 FITC63143.g1 a (TTA)5 GCAGTTTATGTTGTGGCAAT TCTTCTCCACTACCCAGATG P
PR-GR-BS-1262 FITC63156.g1 a (AATA)4 AACTCGTGGCTAGCATCTAC GACAGTTGACAACACACCTC M
PR-GR-BS-1263 FITC63156.g1 b (TGA)7 GGAACCGTTAAAAGGAGGAG AAGCCCACAATGGGAACT NA
PR-GR-BS-1264 FITC58043.g1 a (TGTT)5 CATTCTTCCACCGAAAAGTA AGAAGGCATCTTTGACAGAA P
PR-GR-BS-1265 FITC164377.g1 a (AT)4 CATCATGTTAACTGCTGGA GCTTTCAACTTTTCAAGGTC P
PR-GR-BS-1266 FITC164414.g1 a (TTTAAA)4 GACTCCATTTTGGGTGTACT GCTCCAAACTCCAAGTATGT M
PR-GR-BS-1267 FITC164448.g1 a (TTA)4 AGGAAAAGGAAAGGAATCAC GGATGTGACCCACATACATT NA
PR-GR-BS-1268 FITC69184.g1 a (TCC)4 CATGAGTTTACCATGACCAA ACATAAAGGATGTGGAATGG FH-1000

PR-GR-BS-1269 FITC69196.g1 a (CT)6 TTTCATCTCCCCATGCAG GTCATTGGTGATCGTGTGTG M
PR-GR-BS-1270 FITC69202.g1 a (CAA)4 TGCCATGAAGTCTTACCAC TGGCCTTTACACTCCTCTAC NA
PR-GR-BS-1271 FITB8852.y2 a (CAA)4 TGCCATGAAGTCTTACCAC TGGCCTTTACACTCCTCTAC NA
PR-GR-BS-1272 FITC168690.b1 a (AAT)4 TAGTGGGAAATGAGGTGAGG CCAGACGTTGTAACACCACT NA
PR-GR-BS-1273 FITC168708.b1 a (AT)7 CTTCTTTCATGTGAGGTTCC AGAGGATCGAGAAGATGACA M
PR-GR-BS-1274 FITC168751.b1 a (TTG)4 CGGACGATAAGAAGTGGTAA CACCAAAATCTCAGCTATGG P
PR-GR-BS-1275 FITC168752.b1 a (AAG)4 TCTAGTCTTCTTCCGCTTCA GAAAGGAGCACTAGCACAGT P
PR-GR-BS-1276 FITC168760.b1 a (AGG)4 CCGAGTCTTAACGAACTGAC CTAGGATTGGGATCGACAT M
PR-GR-BS-1277 FITC9898.b1 a (TA)12 AGGCAAGCTAGATTGGTTAG CCAGGTGCACTTTAGACAAT M
PR-GR-BS-1278 FITC9899.b1 a (AT)6 TCATAACGTCCCTCTATTGG GATACGCCCGTATGAATCT NA
PR-GR-BS-1279 FITC9899.b1 b (AT)12 CTTACCGTTCCAATGCAC GGAATGGAGAGGAGGAAA BAR
PR-GR-BS-1280 FITC9901.b1 a (AAG)6 GCAGGTAAAGTGGACTCTCA TCCAAACAGCAGCAAGTG M
PR-GR-BS-1281 FITC9903.b1 a (TCGAA)4 TAGGGGTGAGCATTAGATCG TGGGTGACAGTGGTTACAAG M
PR-GR-BS-1282 FITC9910.b1 a (CAT)4 CAGTCCTGACTGCGAAATA GAATAGTTGCATCCGTGAAG P
PR-GR-BS-1283 FITC23981.b1 a (GAA)20 CTCCTTTCTTCCTTCCCTA ATCCACGAATGGACACAC P
PR-GR-BS-1284 FITC180034.b1 a (AC)16 GTTCGGCAACCTATGTCA GGCTAATTGAGTGTGTAGGG NA
PR-GR-BS-1285 FITC24772.b1 a (AAG)4 GTGCTTGGATGAGGACAA ATCAGAAGGCCTTGGTTC M
PR-GR-BS-1286 FITC24810.b1 a (CCA)4 CACATTCTTGTGGGGTGT CCCACTACTAGGTCTTGCAT M
PR-GR-BS-1287 FITC24832.b1 a (CTCT)8 CGAAACACCTACAGGCATC GGTTTGAACCCCCAATGT P
PR-GR-BS-1288 FITC24861.b1 a (AT)16 ACCACAACCTAATGGGTCTA ACGTTTTGAGCTTATGGAGA M
PR-GR-BS-1289 FITC106762.b1 a (AAAT)4 CTAAGCTTAACTCGTCGTCAC CCATTCCCTATCCTCAGTCT P
PR-GR-BS-1290 FITC106795.b1 a (TAA)7 CTGATACCAACTGATGCAGA TCATCGGTGTTTCTAGAGGT P
PR-GR-BS-1291 FITC106795.b1 b (AAT)6 CCGGATCTAGACACAAGAGA TCTAGGGGTTTCGTAGGG P
PR-GR-BS-1292 FITC106847.b1 a (GCT)5 GCTTTAAGACCTGCCAACTA CACCACATGCCACATTATC P
PR-GR-BS-1293 FITC148830.b1 a (AT)12 CTGAAGCCACCTTAGAACAC ATGACCTCGAGGATGAACTA NA
PR-GR-BS-1294 FITC124917.b1 a (GAA)20 CTCCTTTCTTCCTTCCCTA ATCCACGAATGGACACAC P

*In the marker name PR-GR-BESS, PR stands for the last names of the principal investigators, GR for Gossypium raimondii, BES for BAC end sequence, B for BAC clone and S for SSR; the suffix a, b or c indicates multiple primer pairs designed from the same sequence. P: polymorphic; M: monomorphic. BAR: G. barbadense; FH-1000: a cultivar of G. hirsutum.
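
For readers who wish to process the primer list computationally, the row layout described in the footnote can be sketched as a small parser. This is only an illustrative sketch and not part of the original analysis: it assumes each table entry has been placed on its own line, that fields are separated by spaces, and that the columns are, in order, marker name, clone identifier, optional primer-pair suffix, repeat motif with optional repeat number, forward primer, reverse primer and status. The names `SsrRow`, `ROW` and `parse_row` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SsrRow(NamedTuple):
    """One entry of the primer table (field names are illustrative)."""
    marker: str             # e.g. PR-GR-BS-998
    clone: str              # e.g. FITB4385.y1
    suffix: Optional[str]   # a, b or c; None when the table shows no suffix
    motif: str              # repeat unit, e.g. AG
    repeats: Optional[int]  # repeat number; None when absent in the table
    forward: str            # forward primer
    reverse: str            # reverse primer
    status: str             # P, M, NA, BAR or FH-1000 as printed in the table


ROW = re.compile(
    r"^(PR-GR-BS-\d+)\s+"                # marker name
    r"(FIT[A-Z]\d+\.[a-z]\d)\s*"         # clone identifier
    r"(?:([abc])\s+)?"                   # optional suffix (may be fused to the clone)
    r"\(\s*([ACGT]+)\s*\)\s*(\d+)?\s+"   # motif and optional repeat number
    r"([ACGT]+)\s+([ACGT]+)\s+"          # forward and reverse primers
    r"(\S+)$"                            # status code
)


def parse_row(line: str) -> Optional[SsrRow]:
    """Return the parsed row, or None if the line does not fit the assumed layout."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    marker, clone, suffix, motif, count, fwd, rev, status = m.groups()
    repeats = int(count) if count else None
    return SsrRow(marker, clone, suffix, motif, repeats, fwd, rev, status.upper())


if __name__ == "__main__":
    example = ("PR-GR-BS-998 FITB4385.y1 b (AG)11 "
               "TCCTTACCTACCATCCATTG CAGCTTGTAAGTAACCACTGC M")
    print(parse_row(example))
```

Entries whose motif field is damaged in the table are returned as `None` rather than guessed, so they can be checked by hand against the source sequences.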
