
PHENOTYPIC CHARACTERIZATION AND GENETIC DIVERSITY STUDIES OF SELECTED RICE (ORYZA SATIVA L.) POPULATIONS BASED ON AROMA AND COOKED KERNEL ELONGATION

Wambua Festus Kioko

I56/27393/2013

A Thesis Submitted in Partial Fulfillment of the Requirements for the Award of the Degree of Master of Science (Biotechnology) in the School of Pure and Applied Sciences of Kenyatta University

MAY 2016

DECLARATION

I declare that this thesis is my original work and has not been presented for a degree in any other University or any other award.

Festus Kioko Wambua, Department of Biochemistry and Biotechnology, Kenyatta University.

Signature……………………………………..Date………………………………….

We confirm that the work reported in this thesis was carried out by the candidate under our supervision.

Dr. Mathew Piero Ngugi, Department of Biochemistry and Biotechnology, Kenyatta University.

Signature……………………....……………..Date………..…………………………

Dr. Geoffrey Muriira Karau, Molecular Biology Laboratory, Kenya Bureau of Standards.

Signature……………………………………..Date……………………………………


DEDICATION

This thesis is a special dedication to my dear parents, who brought me into being and continue to work tirelessly to get me educated. Avai wia ni wenyu.


ACKNOWLEDGEMENTS

I thank the almighty God for the gift of life and good health that enabled me to pursue my education dreams. I am forever indebted to my supervisors, Dr. Mathew Piero Ngugi and Dr. Geoffrey Muriira Karau, for their dedicated mentorship during my project work. This work would not have been a success without the support and provision of resources from the Kenya Bureau of Standards, Molecular Biology Laboratory, where I carried out my research. Thanks to my colleague Amos Mawia for his encouragement and support. I recognize, with appreciation, Samantha Mary Nyawira and Maureen Langat for offering their technical assistance whenever I needed it. My gratitude goes also to my family for their financial and moral support.


TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS AND ACRONYMS
ABSTRACT
CHAPTER ONE
INTRODUCTION
  1.1 Background information
  1.2 Statement of the problem
  1.3 Justification
  1.5 Objectives
    1.5.1 General Objective
    1.5.2 Specific Objectives
  1.6 Significance of the study
CHAPTER TWO
LITERATURE REVIEW
  2.1 The biology of rice
  2.2 Origin and geographical distribution of rice
  2.3 Global economic impact of rice
  2.4 Quality traits of rice grain
    2.4.1 Rice aroma
    2.4.2 Cooked kernel elongation
  2.5 Rice genetic diversity
  2.6 Measurement of genetic diversity
    2.6.1 Morphological markers
    2.6.2 Biochemical markers
    2.6.3 Molecular markers
      2.6.3.1 Restriction Fragment Length Polymorphism (RFLP)
      2.6.3.2 Random Amplified Polymorphic DNA (RAPD)
      2.6.3.3 Amplified Fragment Length Polymorphism (AFLP)
      2.6.3.4 Single Nucleotide Polymorphism (SNP)
  2.7 Simple Sequence Repeats (SSR) Markers
CHAPTER THREE
MATERIALS AND METHODS
  3.1 material
  3.2 Determination of phenotypic diversity
    3.2.1 Measurement of grains and kernel traits
  3.3 Determination of genetic diversity
    3.3.1 Genomic DNA extraction
    3.3.2 Quantification of genomic DNA
    3.3.3 Simple Sequence Repeat (SSR) analysis
    3.3.4 Polymerase Chain Reaction (PCR)
    3.3.5 Electrophoretic separation and visualization of PCR products
  3.4 Data management and analysis
CHAPTER FOUR
RESULTS
  4.1 Determination of phenotypic diversity
    4.1.1 Measurement of grain and kernel traits
    4.1.2 Principal Component Analysis
    4.1.3 Cluster analysis of rice varieties based on morphological traits
  4.2 Determination of genetic diversity
    4.2.1 Quality of extracted DNA
    4.2.2 Simple Sequence Repeat (SSR) analysis, allele number and PIC
      4.2.2.1 Simple Sequence Repeat (SSR) analysis
      4.2.2.2 Number of alleles
      4.2.2.3 Rare alleles
      4.2.2.4 Polymorphic Information Content (PIC) values
      4.2.2.5 Heterozygosity
  4.3 Determination of genetic relatedness
    4.3.1 Genetic distance
    4.3.2 Cluster analysis
    4.3.3 Analysis of molecular variance (AMOVA)
    4.3.4 Principal coordinates analysis (PCoA)
CHAPTER FIVE
DISCUSSION, CONCLUSIONS, RECOMMENDATIONS AND SUGGESTIONS FOR FURTHER RESEARCH
  5.1 Discussion
  5.2 Conclusions
  5.3 Recommendations
  5.4 Suggestion for further research
REFERENCES
APPENDICES


LIST OF TABLES

Table 1.1: Taxonomy of rice

Table 2.1: Species complexes of the genus Oryza and their geographical distribution

Table 3.1: Profiles of rice varieties used in the study

Table 3.2: Profiles of rice microsatellite (RM) or SSR markers used in this study

Table 4.1: Analysis of variance (ANOVA)

Table 4.2: Eigenvalues and percent of variation for 7 principal component axes in 13 rice varieties

Table 4.3: Profiles of Simple Sequence Repeat (SSR) analysis

Table 4.4: C.S. Chord coefficients of dissimilarity among pairs of 13 rice varieties

Table 4.5: Analysis of molecular variance (AMOVA) based on 8 SSR loci


LIST OF FIGURES

Figure 2.1: Schematic representation of the evolutionary pathways of Asian and African rice

Figure 2.2: Global rice production

Figure 4.1: Scatter plot of 13 rice varieties based on the first two principal components

Figure 4.2: Dendrogram generated by cluster analysis of morphological characters

Figure 4.3: Neighbor joining tree, 1000 bootstraps, dissimilarity matrix index presence/absence (Jaccard's coefficient)

Figure 4.4: Two-dimensional scatter plot of principal coordinate analysis for all 13 test rice varieties


LIST OF PLATES

Plate 4.1: A gel picture showing fourteen rice DNA samples extracted using CTAB method

Plate 4.2: SSR banding pattern of 13 rice varieties generated by marker RM 339


LIST OF APPENDICES

APPENDIX 1: One-way ANOVA: GL versus genotype
APPENDIX 2: Tukey Pairwise Comparisons
APPENDIX 3: One-way ANOVA: GB versus genotype
APPENDIX 4: Tukey Pairwise Comparisons
APPENDIX 5: One-way ANOVA: GL/B versus genotype
APPENDIX 6: Tukey Pairwise Comparisons
APPENDIX 7: One-way ANOVA: KL versus genotype
APPENDIX 8: Tukey Pairwise Comparisons
APPENDIX 9: One-way ANOVA: KB versus genotype
APPENDIX 10: Tukey Pairwise Comparisons
APPENDIX 11: One-way ANOVA: KL/B versus genotype
APPENDIX 12: Tukey Pairwise Comparisons
APPENDIX 13: One-way ANOVA: 10GW versus genotype
APPENDIX 14: Tukey Pairwise Comparisons
APPENDIX 15: Principal component analysis based on genetic distance matrix


ABBREVIATIONS AND ACRONYMS

2AP     2-Acetyl-1-Pyrroline
BAD     Betaine Aldehyde Dehydrogenase
cM      Centimorgan
Fgr     Fragrance
CTAB    Cetyl Trimethyl Ammonium Bromide
DNA     Deoxyribonucleic Acid
GC-MS   Gas Chromatography Mass Spectrometry
MAS     Marker Assisted Selection
MAB     Marker Assisted Breeding
PCR     Polymerase Chain Reaction
QTL     Quantitative Trait Loci
RFLP    Restriction Fragment Length Polymorphism
RAPD    Random Amplified Polymorphic DNA
SNP     Single Nucleotide Polymorphism
SSR     Simple Sequence Repeats
UPGMA   Unweighted Pair Group Method with Arithmetic Mean
PCA     Principal Component Analysis
PCoA    Principal Coordinate Analysis
AMOVA   Analysis of Molecular Variance
PIC     Polymorphic Information Content
MIAD    Mwea Irrigation and Agricultural Development
KATC    Kilimanjaro Agricultural Training Center


ABSTRACT

Rice (Oryza sativa L.) is the main staple food for more than half of the world’s population. Improving the cooking and eating quality of rice is one of the important objectives of many breeding programs. Aroma and cooked kernel elongation are two critical parameters that determine the market value and the cooking and eating qualities of rice. The objective of this study was to evaluate the phenotypic and genetic diversity of thirteen (13) Oryza sativa L. populations from Kenya and Tanzania. Genetic diversity was determined using 8 simple sequence repeat (SSR) markers. Phenotypic diversity was determined based on measurement of seven (7) grain and kernel traits. Diversity data were analyzed using Minitab 17.0 and PowerMarker version 3.25 software packages (North Carolina State University). The number of alleles per locus ranged from 2 to 4, with an average of 3.12 across the 8 loci. The polymorphic information content (PIC) ranged from 0.2920 (RM 282) to 0.6409 (RM 339), with an average of 0.4821 across all loci. Pair-wise genetic dissimilarity coefficients ranged from 0.1125 to 0.9003 with an average of 0.5312. Maximum genetic similarity was observed between Kilombero and Supa, BS 370 and BS 217. Minimum genetic similarity was observed between Kahogo and BS 217. Cluster analysis was used to group varieties by constructing dendrograms based on SSR data and morphological characterization of grains. The dendrogram based on SSR data grouped the 13 rice varieties into two distinct clusters. RM 339 and RM 241 were the most informative markers and could be used for differentiating rice varieties from diverse geographical origins. The results obtained from this study demonstrated that phenotypic trait measurement and use of trait-specific SSR markers can be relied upon in diversity studies among diverse and closely related genotypes. RM 339 and RM 241 markers are recommended for use in diversity studies and in quality assurance for grading of rice varieties. Further analysis should be carried out using a larger number of samples and markers to reach a more conclusive assessment of the discriminating power of microsatellite markers based on rice grain quality traits.


CHAPTER ONE

INTRODUCTION

1.1 Background information

Rice (Oryza sativa L.) is a member of the grass family (Gramineae) belonging to the genus Oryza (Table 1.1). The genus Oryza includes 23 wild species and 2 cultivated species. Of the two cultivated species, African rice (Oryza glaberrima) is widely grown in West Africa, whereas the Asian rice (Oryza sativa L.) has spread over time and is grown on all continents in the world. Being able to grow in a wide spectrum of climates and conditions, rice is a staple food for one third of the world’s population (Chakravarthi and Naravaneni, 2006).

Table 1.1: Taxonomy of rice

Taxonomic level   Name
Kingdom           Plantae
Division          Magnoliophyta
Class             Liliopsida
Order
Family            Gramineae or Poaceae
Tribe             Oryzeae
Genus             Oryza
Species           sativa


Rice (Oryza sativa L.) is regarded as one of the major cereal crops with high agronomic and nutritional importance. It is one of the food crops for which a complete genome sequence is available. Therefore, it is an ideal model plant for the study of grass genetics due to its relatively small genome size of 430 Mb compared to other cereals (Causse et al., 1994).

The current global production of rice is about 738.1 million metric tonnes per year. This constitutes more than a quarter of all cereal grains. Of this, Asia accounts for the largest share, totaling about 584 million tonnes, whereas Africa produces approximately 21.9 million tonnes. In Kenya, rice is the third most important staple food after maize and wheat. Local production is estimated at between 45,000 and 80,000 tonnes, whereas consumption is about 300,000 tonnes. This large production-consumption gap of roughly 220,000 to 255,000 tonnes is met through imports. About 80% of the rice grown in Kenya is from irrigation schemes in Mwea, Ahero, Bunyala, West Kano and Yala swamp. The remaining 20% is produced under rain-fed conditions (Ouma, 2014).

Rice is mainly used as a major source of human food. However, it has other uses such as animal feed, production of alcoholic beverages such as wine, rice bran oil, fuel and manufacture of insulation materials (Chakravarthi and Naravaneni, 2006).

There is wide genetic diversity available in rice within and between landraces, leaving a wide scope for future crop improvement. Landraces are the local or traditional varieties of a domesticated plant species which have developed over time through adaptation to their natural environment. The demand for productive and homogeneous crops has led to the development of a small number of standard, high-yielding varieties. This has consequently resulted in a tremendous loss of heterogeneous traditional cultivars through genetic erosion. Landraces preserve much of this lost diversity and are known to harbor great genetic potential for breeding new crop varieties that can cope with environmental and demographic changes (Esquinas-Alcázar, 2005).

Proliferation of rice varieties has narrowed down the number of combinations of morphological descriptors available to describe the uniqueness of a variety. Therefore, characterization and varietal identification of available landraces and improved varieties have become important in modern day crop improvement (Vanniarajan et al., 2012).

There are more than 120,000 rice varieties worldwide, but the major categories include indica, japonica, and glutinous. These varieties differ in their grain qualities, which include milling quality, grain shape, cooking quality and nutritional quality. These traits are crucial determinants of grain quality. In rice, aroma is caused by accumulation of 2-acetyl-1-pyrroline (2-AP). The level of this compound is controlled by the betaine aldehyde dehydrogenase 2 (BAD2) gene, also called the fragrance (fgr) gene, which is located on chromosome 8. Accumulation of 2-AP results from a mutation in the BAD2 gene involving an 8 bp deletion (Roy et al., 2012).


The kernel elongation trait is influenced by several physicochemical and genetic factors, including genotype, aging temperature, aging time, water uptake, amylose content and gelatinization temperature. Cooked kernel elongation is influenced by the kne gene, and the major QTL has been mapped on chromosome 8. Previous genetic analyses have shown that the genes and/or QTLs for cooked kernel elongation and aroma are linked (Ahn et al., 1993).

Various methods previously used in quality trait studies in rice include sensory and chemical methods for determining aroma, and measurement of kernel length and breadth before and after cooking. Sensory and chemical methods require panels of analysts to distinguish fragrant from non-fragrant rice samples, hence they are unreliable and inconsistent. Other methods include the use of spectroscopy, stable isotope dilution, gas chromatography-mass spectrometry (GC-MS) and near-infrared reflectance (Yoshihashi et al., 2005). These methods have various limitations, including low sensitivity, long analysis times and large sample volume requirements, hence they are not reliable (Garland et al., 2000).

Simple sequence repeat (SSR) markers offer a simple way of detecting genetic variation in rice varieties with a high level of polymorphism (Bligh et al., 1999). These markers are preferred over other PCR-based molecular markers due to their ease of application, easy scoring patterns, high reproducibility, greater allelic diversity and their distribution throughout the genome (Blair and McCouch, 1997).

This study was aimed at assessing the phenotypic variation and genetic diversity among rice varieties based on aroma and kernel elongation. This was achieved by measuring morphological traits and using previously mapped PCR-based SSR markers to determine the genetic diversity.

1.2 Statement of the problem

Kenya is home to many rice varieties and landraces. These varieties were developed through selection based on agronomic traits. This resulted in a wide spectrum of varieties that are highly valued in both domestic and foreign markets. In Kenya, rice consumers prefer the aromatic basmati rice, which is high in quality and hence in price. Fragrant rice is often blended with low-quality non-fragrant rice and sold as premium rice. In addition, lack of genetic background information on these varieties has constrained the development of better varieties (Vlachos and Arvanitoyannis, 2008). Various conventional methods routinely used to evaluate and grade rice varieties include sensory and chemical methods. These methods are inconsistent and have failed to address these concerns due to low sensitivity, long analysis times and large sample volume requirements.

1.3 Justification

Molecular characterization using PCR-based SSR markers provides a suitable method for varietal identification in rice supplies and for differentiating between the various grades of fragrant rice. This is because these markers are highly reproducible, co-dominant, interspersed throughout the genome and require only a small amount of tissue, hence they are cost-effective to use (Chen et al., 1997). In addition, grain quality evaluation is a key step in the development of better rice varieties through marker-assisted selection. This project was therefore aimed at validating the SSR markers for diversity analysis as a tool for grading Kenyan rice.

1.4 Research questions

i. What are the levels of phenotypic diversity among selected Kenyan and Tanzanian rice varieties?

ii. What are the levels of heterozygosity among selected Kenyan and Tanzanian rice varieties?

iii. What are the levels of genetic diversity among selected Kenyan and Tanzanian rice varieties?

1.5 Objectives

1.5.1 General Objective

To carry out phenotypic characterization and genetic diversity studies on selected rice (Oryza sativa L.) populations based on aroma and cooked kernel elongation using microsatellite markers.

1.5.2 Specific Objectives

i. To determine phenotypic diversity among selected Kenyan and Tanzanian rice varieties.

ii. To determine heterozygosity among selected Kenyan and Tanzanian rice varieties based on aroma and cooked kernel elongation traits.

iii. To determine genetic diversity among selected Kenyan and Tanzanian rice varieties based on aroma and cooked kernel elongation traits.

1.6 Significance of the study

Information on the phenotypic diversity, genetic relatedness and heterozygosity among the rice varieties was developed in this study. This can be used for quality assurance in discriminating good-quality from low-quality rice in the market. This information is also imperative in Marker Assisted Breeding (MAB) in the era of modern biotechnology to add value to our own local rice varieties and landraces.


CHAPTER TWO

LITERATURE REVIEW

2.1 The biology of rice

Rice (Oryza sativa L.) belongs to the grass family Gramineae and is a member of the genus Oryza. The genus Oryza includes 25 species, of which 23 are wild species and two (O. sativa and O. glaberrima) are cultivated species (Londo et al., 2006). Rice is normally grown as a monocarpic annual plant but can also survive as a perennial crop in tropical areas and can produce a ratoon crop for up to 20 years (Linares, 2002).

The rice plant can grow 2 to 6 ft (61 to 183 cm) tall, depending on the variety and soil fertility. As a member of the grass family, it has long, pointed leaves 50-100 cm long and 2-2.6 cm broad. It has small wind-pollinated flowers that are produced in a branched, arching to pendulous inflorescence. The edible part of the rice plant is the rice grain, which is a caryopsis 5-12 mm long and 2-3 mm thick that includes glumes, endosperm, and embryo (Izawa and Shimamoto, 1996).

Rice endosperm consists mostly of starch granules in a proteinaceous matrix, together with sugar, fats, crude fiber, and organic matter. Oryza sativa has a relatively small diploid genome (2n = 24) of about 430 million base pairs. This is the smallest genome of all food crops, and approximately half of the genome is composed of repetitive sequences (Sang et al., 2007). The basic chromosome number of the genus Oryza is 12. O. sativa, O. glaberrima and 14 wild species are diploids with 24 chromosomes, while the other eight wild species are tetraploids with 48 chromosomes (4n = 48). Incompatibility exists among species having different genomes. Partial sterility in hybrids is common when different ecogeographic races of Oryza sativa are hybridized (Vaughan et al., 2005).

2.2 Origin and geographical distribution of rice

It is generally agreed that Oryza sativa could have originated from the river valleys of the Yangtze and Mekong in China. On the other hand, the delta of the Niger River in Africa is believed to be the primary center of origin of Oryza glaberrima (Sweeney and McCouch, 2007). The foothills of the Himalayas, northeastern India, Chattisgarh, the Jeypore Tract of Orissa, northern parts of Myanmar and Thailand, and Yunnan Province of China are some of the centers of diversity for Asian varieties. The inner delta of the Niger River and some areas around the Guinean coast of Africa are the centers of diversity of the African species, Oryza glaberrima (Linares, 2002).

Oryza sativa and Oryza glaberrima are believed to have evolved independently from two different progenitors, Oryza nivara and Oryza barthii, as shown in Figure 2.1. These two types of rice are believed to have been domesticated in South or Southeast Asia and tropical West Africa, respectively. The progenitors of Oryza sativa are considered to be the Asian AA genome diploid species and those of Oryza glaberrima to be the African AA genome diploid species (Sweeney and McCouch, 2007).


Figure 2.1: Schematic representation of the evolutionary pathways of Asian and African cultivated rice. Source: Chang (1976).

Of the two cultivated species, the Asian rice, Oryza sativa, is the most widely grown. It is grown worldwide, including in Asian, European Union, North and South American, Middle Eastern and African countries. Oryza glaberrima, however, is grown solely in West African countries. Asian rice (both indica and japonica) was domesticated about 8,200 to 13,500 years ago in the Pearl River valley region of China and later spread from East Asia to Southeast and South Asia. The crop was then introduced to Europe through the Western Asia route and to the Americas during European colonization (Huang et al., 2012).

African rice was domesticated in the inland delta of the upper Niger River, which is today about 3500 years ago and extended to . However, this rice species did not spread further from its original region because the Asian species was introduced through East Africa by the Portuguese during the 16th century and spread to the west.

The wild species are widely distributed in the tropics of Africa, Central and South America, Asia, and Australia (Vaughan et al., 2005). The geographical distribution of the species complexes of the genus Oryza is summarized in Table 2.1.


Table 2.1: Species complexes of the genus Oryza and their geographical distribution

No. Species                                        Chromosome number  Genome  Geographical distribution

I Sativa Complex
1.  O. sativa L.                                   24                 AA      Worldwide: originally South & Southeast Asia
2.  O. nivara Sharma et Shastry                    24                 AA      South & Southeast Asia
3.  O. rufipogon Griff.                            24                 AA      South & Southeast Asia, South China
4.  O. meridionalis Ng                             24                 AA      Tropical Australia
5.  O. glumaepatula Steud.                         24                 AA      Tropical America
6.  O. glaberrima Steud.                           24                 AA      Tropical West Africa
7.  O. barthii A. Chev. et Roehr.                  24                 AA      West Africa
8.  O. longistaminata A. Chev. et Roehr.           24                 AA      Tropical Africa

II Officinalis Complex
9.  O. punctata Kotschy ex Steud.                  24                 BB      East Africa
10. O. rhizomatis Vaughan                          24                 CC      Sri Lanka
11. O. minuta J.S. Presl. ex C.B. Presl.           48                 BBCC    Philippines, New Guinea
12. O. malampuzhaensis Krishn                      48                 BBCC    Kerala & Tamil Nadu
13. O. officinalis                                 24                 CC      South & Southeast Asia
14. O. eichingeri A. Peter                         24                 CC      East Africa & Sri Lanka
15. O. latifolia Desv.                             48                 CCDD    Central & South America
16. O. alta Swallen                                48                 CCDD    Central & South America
17. O. grandiglumis (Doell) Prod.                  48                 CCDD    South America
18. O. australiensis Domin.                        24                 EE      Northern Australia
19. O. schweinfurthiana Prod.                      48                 BBCC    Tropical Africa

III Meyeriana Complex
20. O. granulata Nees et Arn. ex Watt              24                 GG      South & Southeast Asia
21. O. meyeriana (Zoll. et Mor. ex Steud.) Baill.  24                 GG      Southeast Asia

IV Ridleyi Complex
22. O. longiglumis Jansen                          48                 HHJJ    Indonesia, New Guinea
23. O. ridleyi Hook. f.                            48                 HHJJ    Southeast Asia

V Unclassified (belonging to no complex)
24. O. brachyantha A. Chev. et Roehr.              24                 FF      West & Central Africa
25. O. schlechteri Pilger                          48                 HHKK    Indonesia, New Guinea

Source: Brar and Khush (2003).


2.3 Global economic impact of rice

Rice is cultivated on about 162.3 million hectares in the world, accounting for a total production of about 738.1 million tonnes (Choudhury et al., 2004). Of this, developing countries account for 95%, with the largest producers being China, India, Indonesia, Bangladesh, Vietnam, and Thailand, as shown in Figure 2.2. It is therefore a major economic mainstay for the majority of rural populations, being mainly cultivated by small-scale farmers, and is a source of income for workers in the non-agricultural sectors.

In Africa, rice is largely cultivated in West Africa, with Benin, Cameroon, Burkina Faso and Chad being the greatest producers. However, rice production in Africa has not kept pace with increasing demand. Consequently, only 54% of Sub-Saharan Africa’s rice consumption is supplied locally. It is estimated that 3.4 billion people derive 20% of their daily calories from rice, hence it is regarded as the most important grain with respect to human nutrition and calorific intake (Smith, 2001).

Figure 2.2: Global rice production. Source: Zhao et al. (2011).

2.4 Quality traits of rice grain

In the major rice-producing countries, grain quality traits largely determine the market value of the rice. The quality traits of rice grain range from physical to biochemical properties and include grain shape and appearance, milling efficiency, ease of cooking, palatability, and nutrition. In particular, the cooking and eating qualities are crucial determinants of cooked rice grain quality (Fitzgerald et al., 2009).

The cooking and eating qualities of rice are influenced by several factors, with amylose content being the most important determinant of cooked rice quality. Others include gelatinization temperature, gel consistency, and aroma (Fitzgerald et al., 2009). In particular, aroma and cooked kernel elongation are the most important quality traits of rice, which differentiate the highly valued aromatic rice from the other rice types.

2.4.1 Rice aroma

Aromatic rice is preferred by consumers and fetches a high price in both domestic and international markets. In Kenya, rice consumers prefer the aromatic basmati rice, which has superior cooking and eating qualities compared to the other local and imported varieties. Rice grain aroma results from the production of many biochemical compounds (Lorieux et al., 1996). Sakthivel et al. (2009) reported that 2-acetyl-1-pyrroline (2AP) is the most important compound responsible for aroma. Accumulation of this aroma compound is controlled by the betaine aldehyde dehydrogenase 2 (BAD2) gene, which is located on chromosome 8, and the level of aroma depends on a mutation in this gene caused by an 8 bp deletion (Bradbury et al., 2005a).

2.4.2 Cooked kernel elongation

Linear elongation of the kernel after cooking is one of the major characteristics of fine rice (Govindaraj et al., 2009). It is considered to be a physical phenomenon which is influenced by several physicochemical and genetic factors, including water uptake, amylose content, aging temperature, gelatinization temperature, aging time and genotype. During cooking, rice kernels absorb water and increase in volume through an increase in length or breadth. Length-wise increase without increase in girth is a desirable characteristic in high-quality premium rice. Conventional methods routinely used to evaluate kernel elongation involve measuring grain length and breadth before and after cooking to obtain the grain elongation ratio and hence the proportionate change (Cruz and Khush, 2000).
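The measurements described above can be illustrated with a minimal Python sketch. It assumes the elongation ratio is the cooked kernel length divided by the raw kernel length, and it reads "proportionate change" as the relative change in the length/breadth ratio after cooking. Both definitions, the function names and the sample values are illustrative assumptions and are not taken from the cited reference.

```python
def elongation_ratio(raw_length_mm, cooked_length_mm):
    """Cooked kernel length divided by raw kernel length (assumed definition)."""
    if raw_length_mm <= 0:
        raise ValueError("raw length must be positive")
    return cooked_length_mm / raw_length_mm


def length_breadth_ratio(length_mm, breadth_mm):
    """Length/breadth (L/B) ratio of a kernel."""
    if breadth_mm <= 0:
        raise ValueError("breadth must be positive")
    return length_mm / breadth_mm


def proportionate_change(raw_length_mm, raw_breadth_mm,
                         cooked_length_mm, cooked_breadth_mm):
    """Relative change in L/B ratio after cooking (assumed definition)."""
    raw_lb = length_breadth_ratio(raw_length_mm, raw_breadth_mm)
    cooked_lb = length_breadth_ratio(cooked_length_mm, cooked_breadth_mm)
    return (cooked_lb - raw_lb) / raw_lb


# Hypothetical kernel: 6.0 x 2.0 mm raw, 9.0 x 2.25 mm cooked.
print(elongation_ratio(6.0, 9.0))
print(proportionate_change(6.0, 2.0, 9.0, 2.25))
```

A kernel that lengthens with little increase in girth gives a high elongation ratio and a positive proportionate change, which corresponds to the desirable behaviour described above.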

2.5 Rice genetic diversity

Genetic diversity refers to the total number of genetic characteristics in the genetic makeup of a species. It occurs as a result of recombination, mutation, selection and genetic drift. Mutation and recombination lead to the development of new varieties in a population, whereas selection and genetic drift remove some alleles.

Landraces, or traditional varieties, which are maintained through traditional farming practices, contain huge genetic variability which can be used to improve and widen the gene pool of existing genotypes (Villa et al., 2005). Information about genetic diversity and relationships among rice varieties is crucial in crop improvement strategies. Genetic diversity is of crucial importance in the continuity of a plant species as it allows adaptation to the prevailing environmental stress. Genetic diversity determines the inherent potential of a cross and the frequency of desirable recombinants in advanced generations.

In a breeding programme, genetic distance or parental diversity of optimum magnitude is a prerequisite to obtain superior genotypes. The analysis of genetic variation among breeding materials is of critical interest to plant breeders, as it contributes immensely to selection, prediction of potential genetic gains and monitoring of germplasm (Chakravarthi and Naravaneni, 2006).

2.6 Measurement of genetic diversity

The assessment of genetic diversity among plant populations is done using various techniques such as morphological markers, biochemical markers and DNA or molecular marker analysis. Of these, DNA markers are considered best for analysis of genetic diversity and varietal identification since they are not influenced by the stage of plant development or by environmental changes (Virk et al., 2000).

Further, these markers can also be utilized in the detection of genes influencing agronomically important traits. Molecular marker technology provides an essential tool for evaluating genetic diversity among different varieties as well as identifying cultivars, and thus contributes to the management of plant genetic resources (Virk et al., 2000). Molecular markers include restriction fragment length polymorphism (RFLP) (Devos and Gale, 1992), random amplified polymorphic DNA (RAPD) (Williams et al., 1990), microsatellite or simple sequence repeat (SSR) (McCouch et al., 2002), amplified fragment length polymorphism (AFLP) (Vekemans et al., 2002), and single nucleotide polymorphism (SNP) (Ganal et al., 2009).

2.6.1 Morphological markers

Morphological markers are based on visually accessible traits such as plant height, seed shape and colour. They involve field experiments and hence require large tracts of land, which makes them more expensive than other techniques. In addition, morphological markers are less abundant, vary during plant development and are adversely affected by environmental variation. Therefore, morphological markers are not reliable (Staub et al., 1996).

2.6.2 Biochemical markers

Biochemical markers are also called isozymes. Isozymes are allelic variants of enzymes and are usually detected by electrophoresis and specific staining. They arise from amino acid alterations which cause net charge changes or conformational (spatial structural) changes, resulting in a shift in their electrophoretic mobility. Addition of a specific enzyme stain can reveal the isozyme profile of individual samples (Knapp and Rice, 1998). Isozyme markers are codominant in nature. They detect diversity at the functional gene level, require only a small amount of material for detection and are less influenced by the environment. However, these markers offer limited polymorphism, only a limited number are available, and they often do not allow discrimination between closely related genotypes (Kumar et al., 2009).


2.6.3 Molecular markers

A molecular marker is a DNA sequence that is readily detected and whose inheritance can be easily monitored. Molecular markers are the most widely used type of genetic marker and comprise a large variety of DNA markers. They offer a wide range of advantages over morphological and biochemical markers as they are stable and detectable in all tissues regardless of growth, differentiation, development, or defense status of the cell. Additionally, they are not confounded by environmental, pleiotropic, and epistatic effects. Molecular markers can cover the whole genome, hence they are able to detect the variation that arises from deletion, duplication, inversion, and/or insertion in the chromosomes. They are neutral and therefore do not affect the phenotype of the traits of interest, because they are located only near or linked to genes controlling the traits. Many DNA markers are co-dominant and can differentiate between the homozygous and heterozygous genotypes (Kumar et al., 2009).

2.6.3.1 Restriction Fragment Length Polymorphism (RFLP)

Restriction Fragment Length Polymorphism (RFLP) is a technique in which varieties are differentiated by analysis of patterns derived from cleavage of their DNA. This technique is mainly based on a special class of enzymes called restriction endonucleases. The two main advantages of RFLP markers are co-dominance and high reproducibility. The disadvantages are the requirement for relatively large amounts of pure and intact DNA and the tedious experimental procedure (Devos and Gale, 1992).


2.6.3.2 Random Amplified Polymorphic DNA (RAPD)

RAPD markers involve PCR amplification of random DNA segments with single, typically short primers of arbitrary nucleotide sequence. A disadvantage of RAPD markers is that polymorphisms are detected only as the presence or absence of a band of a certain molecular weight, with no information on heterozygosity. Besides being dominantly inherited, RAPDs also show some problems with reproducibility of data. Their major advantages are technical simplicity and independence from any prior DNA sequence information (Williams et al., 1990).

2.6.3.3 Amplified Fragment Length Polymorphism (AFLP)

The AFLP technique combines elements of RFLP and RAPD. It is based on the selective PCR amplification of restriction fragments. Possible reasons for AFLP polymorphisms are sequence variations in a restriction site, insertions or deletions within an amplified fragment, and differences in the nucleotide sequence immediately adjoining the restriction site (not detected with RFLPs). Thus, the use of AFLP technology results in the detection of higher levels of polymorphism compared with RFLPs. Amplified fragment length polymorphisms (AFLPs) also have a much higher multiplex ratio (more markers per experiment) and better reproducibility than RAPDs. A drawback can be that most AFLP markers are dominant rather than co-dominant, due to the complex banding patterns (Vos et al., 1995).

2.6.3.4 Single Nucleotide Polymorphism (SNP)

SNP markers are based on sequence differences at single-base pair positions in genomes. Single nucleotide exchanges in genomes are numerous; therefore SNP markers provide a great marker density. Another important advantage of SNP is that it is not a gel-based technology. For the large-scale genotyping required in marker assisted breeding programs, technologies based on gel electrophoresis are often too labor intensive and time consuming. Among these markers, SSR markers have several advantages: they are co-dominant, stable and highly polymorphic, and have been used intensively for rice cultivar identification, genetic diversity evaluation, phylogenetic comparison and marker assisted selection (Ganal et al., 2009).

2.7 Simple Sequence Repeats (SSR) Markers

Simple sequence repeats (SSRs) are DNA sequences with repeat lengths of a few base pairs that are well distributed throughout the genome and are flanked by highly conserved regions. Variation in the number of nucleotide repeats can be detected with PCR by designing primers for the conserved DNA sequences flanking the SSR. Among the different PCR based markers, SSR markers are preferred over other molecular markers due to their ease of application, high reproducibility, rapid analysis, low cost, easy scoring patterns, and relative distribution throughout the genome (Chen et al., 1997).

Simple sequence repeat (SSR) markers have been widely applied in genetic diversity studies as they are able to detect high levels of polymorphism (McCouch et al., 1997). In rice, SSRs have been used to assess the genetic diversity of both cultivated and wild species (Neeraja et al., 2005). More than 2,000 rice SSR markers are available from SSR-enriched libraries (McCouch et al., 2002). This high number of markers permits the selection of the most informative and well distributed SSR loci in the rice genome for use in molecular analysis. Simple Sequence Repeats (SSRs) are better markers for discriminating good quality rice because they are genetically linked to the fgr and kne loci (Cordeiro et al., 2002).

Specific SSR markers that are genetically linked to the fragrance (fgr) locus can be used to discriminate fragrant and non-fragrant rice varieties (Bradbury et al., 2005b). According to Lorieux et al. (1996), the fgr gene is flanked by the RFLP molecular markers RG28 and RG1 at a genetic distance of 6.4 ± 2.6 and 5.3 ± 2.7 cM, respectively. There is close linkage between RG28 and fgr (5.8 cM) and two quantitative trait loci for fragrance, one on chromosome 4 and the other on chromosome 12. Several SSR markers based on the RG28 locus have been developed for discrimination of fragrant and non-fragrant rice varieties (Garland et al., 2000).

A major quantitative trait locus (QTL) for the cooked kernel elongation trait has been identified in close proximity to the RFLP marker RZ 323 in linkage group 8. Kernel elongation without increase in breadth on cooking is an equally important characteristic of high quality rice. Previous studies on genetic analysis have shown that genes and/or QTLs for cooked kernel elongation and aroma are linked and present on chromosome number 8. The RM 44 primer set has been identified for use as a selection marker for identifying the kernel elongation trait (Ahn et al., 1993).


CHAPTER THREE

MATERIALS AND METHODS

3.1 Plant material

A total of 500 g of thirteen different rice varieties were collected from Mwea Irrigation Agricultural Development (MIAD) and Kilimanjaro Agricultural Training Center (KATC). The names and attributes of the rice varieties and the names of the corresponding sources are detailed in table 3.1. The rice seeds were stored in the Molecular Biology laboratory at Kenya Bureau of Standards, Nairobi, Kenya.

Table 3.1: Profiles of rice varieties used in the study

Sr. no.  Genotype   Source  Attribute
1        IR 2793    MIAD    Improved variety
2        BS 217     MIAD    Improved variety
3        BS 370     MIAD    Improved variety
4        BW 196     MIAD    Improved variety
5        ITA 310    MIAD    Improved variety
6        Red Afaa   KATC    Landrace
7        IR 54      KATC    Improved variety
8        Kilombero  KATC    Landrace
9        IR 64      KATC    Improved variety
10       Kahogo     KATC    Landrace
11       Saro 5     KATC    Improved variety
12       Wahiwahi   KATC    Landrace
13       Supa       KATC    Landrace

3.2 Determination of phenotypic diversity

3.2.1 Measurement of grains and kernel traits

A total of seven traits were measured in this study. They included grain length (GL), grain breadth (GB), grain length/breadth ratio (G-L/B), grain weight (GW), kernel length (KL), kernel breadth (KB) and kernel length/breadth ratio (K-L/B). Ten randomly selected raw rice grains and kernels from each rice variety were measured for their length and breadth using a digital vernier caliper. Each measurement was thus replicated 10 times per variety and the average of the 10 grains was recorded. The grain weight of 100 randomly counted rice kernels from each variety was determined using a weighing balance (METTLER TOLEDO) and an average recorded (Varnamkhasti et al., 2008).

The grain and kernel length/breadth ratio (a measure of slenderness) for each variety was obtained by dividing length by breadth.
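The length/breadth calculation described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name and the caliper readings are hypothetical, and it assumes that the ratio is computed for each grain and then averaged (the text does not say whether the ratio of the means was used instead).

```python
from statistics import mean


def length_breadth_summary(lengths_mm, breadths_mm):
    """Return (mean length, mean breadth, mean L/B ratio) for one variety.

    Assumption: the ratio is taken grain by grain and then averaged.
    """
    if not lengths_mm or len(lengths_mm) != len(breadths_mm):
        raise ValueError("need one breadth reading per length reading")
    ratios = [length / breadth for length, breadth in zip(lengths_mm, breadths_mm)]
    return mean(lengths_mm), mean(breadths_mm), mean(ratios)


# Hypothetical caliper readings (mm) for ten grains of one variety.
lengths = [9.1, 9.3, 9.0, 9.2, 9.4, 9.1, 9.2, 9.3, 9.0, 9.4]
breadths = [2.0] * 10

mean_length, mean_breadth, mean_ratio = length_breadth_summary(lengths, breadths)
print(f"L = {mean_length:.3f} mm, B = {mean_breadth:.3f} mm, L/B = {mean_ratio:.3f}")
```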

3.3 Determination of genetic diversity

3.3.1 Genomic DNA extraction

DNA was extracted from each sample by the Cetyl Trimethyl Ammonium Bromide (CTAB) method with slight modifications (Ferrari et al., 2007).

The rice grains were ground using a blender until a fine powder was formed. Further, 20 g of the ground samples were transferred to 50 ml falcon tubes and soaked in 600 μl ice-cold extraction buffer. The samples were incubated for 30 min at 65°C and then the mixture was centrifuged at 6500 x g for 10 min at 17°C. The supernatant was transferred into new centrifuge tubes and 600 μl of isopropanol added into each tube. The mixture was then incubated at room temperature for 5 min for precipitation of the DNA. The content was centrifuged at 13000 x g for 10 min, after which the supernatant was discarded and the DNA pellet was left to dry overnight.

The dried DNA pellet was dissolved in 200 μl TE buffer containing RNase and incubated at 37°C for 2 hours. CTAB buffer (400 μl) was added and the tubes incubated for 15 min at 65°C. Five hundred microliters of chloroform-isoamyl alcohol was added and the tubes centrifuged for 5 min. After centrifugation, the upper phase was transferred into fresh Eppendorf tubes and mixed with 1.4 μl of ethanol (96%), and the mixture was incubated at room temperature for 15 min for DNA precipitation. The mixture was then centrifuged at 13000 x g for 10 min. After centrifugation, the supernatant was discarded and the DNA pellet washed with 500 μl of 70% ethanol. This was centrifuged at 13000 x g for 10 min at 17°C. Finally, the supernatant was discarded and the pellet dissolved in 20 μl of sterile TE buffer for purification and stored at -20°C.

3.3.2 Quantification of genomic DNA

The purity of the extracted genomic DNA in each sample solution was determined using a nanodrop spectrophotometer (JENWAY Genova plus) from the absorbance ratios A260/A280, for protein contaminants, and A260/A230, for polyphenol, buffer and carbohydrate contaminants. The DNA was also quantified by 1% agarose gel electrophoresis. A suitable gel tray and combs were cleaned with tissue paper soaked in rectified spirit. The ends of the gel tray were then sealed with adhesive tape. An agarose gel mix of 1% (1 g agarose in 100 mL of 1X TAE) was prepared and boiled in a microwave oven until a homogeneous, clear solution was formed. The gel solution was cooled to ~45°C. Ethidium bromide was added as a staining agent when the temperature reached 45-50°C.

The gel was poured into the gel tray with the combs in place, avoiding trapping of air bubbles. It was allowed to set for at least 15 min at room temperature. TAE buffer (1X) was then poured into the buffer tank of the electrophoresis unit. The comb was removed carefully from the gel, the tapes were pulled off the gel tray and the gel tray was immersed in the buffer tank.

The DNA sample dissolved in TE was pipetted onto a parafilm and mixed well with 3 μl of 10X loading dye by pipetting up and down several times, and the samples were loaded into the gel wells. The lid was closed and the electrodes were fixed. Care was taken that the negative terminal was at the same end of the unit as the sample loading wells. The power supply was turned on, the constant voltage was adjusted to 75 V and the gel was allowed to run for 40 min until the dye front was ~2 cm from the opposite end. DNA bands were detected by direct examination of the gel in ultraviolet light and photographed using a Uvitec gel documentation system (Cambridge, UK).

The quantified DNA was used to run PCR using trait specific SSR markers.

3.3.3 Simple Sequence Repeat (SSR) analysis

Genetic diversity among the rice varieties was assessed using 8 SSR markers of the RM series selected from the Gramene database (http://www.gramene.org/). Details of the markers used in this study are described in table 3.2. The basis for selection was an annealing temperature of 55°C to 62°C and an amplicon size of less than 300 bp. Based on the information available on the genome-wide SSR markers in rice, a total of 35 SSR primer pairs were initially screened and the 8 SSRs that amplified consistently in our analysis were used.

3.3.4 Polymerase Chain Reaction (PCR)

The quantified DNA samples were amplified in 25 μl reaction volumes containing 5.0 μl template DNA, 5.4 μl ddH2O, 6 μl PCR buffer, 3.0 μl MgCl2, 3.6 μl dNTPs, 0.6 μl of each primer and 0.8 μl of Taq DNA Polymerase.
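As a check on the recipe above, the listed volumes can be summed and scaled to a master mix. The sketch below is illustrative only: the per-reaction volumes come from the text, while the function name, the 10% pipetting overage and the choice to add template DNA separately to each tube are assumptions, not details reported in this study.

```python
# Per-reaction volumes (microliters) as listed in the text.
REACTION_UL = {
    "template DNA": 5.0,
    "ddH2O": 5.4,
    "PCR buffer": 6.0,
    "MgCl2": 3.0,
    "dNTPs": 3.6,
    "forward primer": 0.6,
    "reverse primer": 0.6,
    "Taq DNA polymerase": 0.8,
}


def master_mix(per_reaction, n_reactions, overage=0.10, exclude=("template DNA",)):
    """Scale shared reagents for n reactions plus a pipetting overage.

    Assumptions: 10% overage and template added per tube (hence excluded).
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in per_reaction.items()
        if name not in exclude
    }


total_ul = round(sum(REACTION_UL.values()), 2)  # the components add up to 25 ul
mix = master_mix(REACTION_UL, n_reactions=13)   # one reaction per variety

print(f"Total per reaction: {total_ul} ul")
for reagent, volume in mix.items():
    print(f"{reagent}: {volume} ul")
```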

This was carried out in a thermal cycler with the following cycle profile: initial denaturation at 94°C for 4 min; 40 cycles of 1 min denaturation at 94°C, 30 sec annealing at 55°C or 62°C (depending on the marker used) and 1 min extension at 72°C; and then 4 min at 72°C for the final extension. Variety IR 64, an international check variety, was used as the positive control in PCR. The resultant PCR products were analysed by electrophoresis on 2% agarose gels with a 100-bp ladder.
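The programmed hold time implied by this profile can be worked out directly. The short sketch below is only an arithmetic aid with a hypothetical function name; it ignores the ramping time between temperatures, which depends on the thermal cycler.

```python
def cycling_minutes(initial_denat, cycles, denat, anneal, extend, final_ext):
    """Total programmed hold time in minutes, excluding temperature ramps."""
    return initial_denat + cycles * (denat + anneal + extend) + final_ext


# Profile from the text: 4 min; 40 x (1 min, 30 sec, 1 min); 4 min.
hold_time = cycling_minutes(
    initial_denat=4, cycles=40, denat=1.0, anneal=0.5, extend=1.0, final_ext=4
)
print(f"Programmed hold time: {hold_time:.0f} min")
```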


Table 3.2: Profiles of rice microsatellite (RM) or SSR markers used for this study

Sr. no.  Locus   Forward primer          Reverse primer          Motif             A*  C*  Amplicon size (bp)
1        RM 277  CGGTCAAATCATCACCTGAC    CAAGGCTTGCAAGGGAAG      (GA)11            55  8   124
2        RM 232  CCGGTATCCTTCGATATTGC    CCGACTTTTCCTCCTGACG     (CT)24            55  3   158
3        RM 252  TTCGCTGACGTGATAGGTTG    ATGACTTGATCCCGAGAACG    (CT)19            55  4   216
4        RM 282  CTGTGTCGAAAGGCTGCAC     CAGTCCTGTGTTGCAGCAAG    (GA)15            55  3   136
5        RM 241  GAGCCAAATAAGATCGCTGA    TGCAAGCAGCAGATTTAGTG    (CT)31            55  4   138
6        RM 215  CAAAATGGAGCAGCAAGAGC    TGAGCACCTCCTTCTCTGTAG   (CT)16            55  9   148
7        RM 339  GTAATCGATGCTGTGGGAAG    GAGTCATGTGATAGCCGATATG  (CTT)8CCT(CTT)5   55  8   167
8        RM 225  TGCCCATATGGTCTGGATG     GAAAGTGGATCAGGAAGGC     (CT)18            55  6   140

Name, motif, chromosomal location (C*), annealing temperature (A*, °C) and product size (bp) of rice microsatellites.

3.3.5 Electrophoretic separation and visualization of PCR products

Five microliters of PCR products were separated by electrophoresis on 2% agarose gel. A loading dye comprising 0.25% xylene cyanol, 0.25% bromophenol blue, 30% glycerol and 1 mM EDTA was used for each PCR product to monitor the loading and the progress of electrophoresis, and to increase the weight of the sample so that it stays in the well of the gel.

A gel tray and combs were cleaned and dried with tissue paper soaked in rectified spirit. The open ends of the gel casting plate were sealed with cello tape and the plate was placed on a perfectly levelled horizontal platform. Two percent agarose was added to 1X TAE buffer, boiled until the agarose dissolved completely and then cooled. Ethidium bromide was used as a staining agent at a final concentration of 1 μg/ml.

The gel was carefully placed in the electrophoresis gel chamber, keeping the gel horizontal and submerged in the running buffer (1× TBE), with the final level of buffer ~5 mm above the gel. The comb was placed properly and the gel allowed to solidify.

After solidification of the agarose, the comb and cello tape were removed. Two microliters of loading dye was mixed with the amplified DNA samples on a parafilm using a pipette and the samples were loaded into the gel wells. A 100 bp molecular weight marker was loaded on either side of the gel.

To achieve good separation of the PCR products, agarose gel electrophoresis was performed at 100 V for 1 hour. The electrophoresis was stopped after the loading dye had reached three quarters of the gel length. The gel was then taken out of the electrophoresis chamber and placed on a high performance ultraviolet transilluminator. It was examined, photographed using a gel documentation instrument and saved on a computer. The size of the amplified DNA bands (microsatellite alleles) was determined with reference to the 100 bp DNA ladder included in the gel as a size marker.
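One common way to turn a ladder into band sizes is semi-log interpolation, since migration distance is roughly linear in the logarithm of fragment size over a short range. The sketch below illustrates that idea only; the text does not state that sizes were interpolated numerically rather than read by eye, and the function name and migration distances are hypothetical.

```python
import math


def estimate_size_bp(distance_mm, ladder):
    """Estimate fragment size by log-linear interpolation between ladder bands.

    ladder: list of (size_bp, migration_distance_mm) pairs.
    """
    points = sorted(ladder, key=lambda band: band[1])  # order by distance run
    for (size1, dist1), (size2, dist2) in zip(points, points[1:]):
        if dist1 <= distance_mm <= dist2:
            fraction = (distance_mm - dist1) / (dist2 - dist1)
            log_size = math.log10(size1) + fraction * (
                math.log10(size2) - math.log10(size1)
            )
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the ladder")


# Hypothetical migration distances (mm) for three bands of a 100 bp ladder.
ladder = [(300, 20.0), (200, 30.0), (100, 45.0)]

size_at_ladder_band = estimate_size_bp(30.0, ladder)
size_between_bands = estimate_size_bp(37.5, ladder)
print(f"{size_at_ladder_band:.0f} bp, {size_between_bands:.0f} bp")
```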

3.4 Data management and analysis

The phenotypic data were analysed using Analysis Of Variance (ANOVA) followed by Tukey’s post hoc test as implemented in the Minitab 17 software package (State College, Pennsylvania). A dendrogram was obtained from the mean values of the seven traits across all the test varieties with the help of the Minitab 15 software package. Principal Component Analysis (PCA) was carried out to investigate the overall pattern of phenotypic diversity and the individual trait contributions to the observed phenotypic diversity (Ray et al., 2013).

On the other hand, genetic data were analysed using the PowerMarker version 3.25 (Liu and Muse, 2005) and GenAlEx version 6.5 (Peakall and Smouse, 2012) statistical software packages. Clearly resolved bands of the genotypes were manually scored using the binary coding system, ‘1’ for presence of a band and ‘0’ for absence of a band.

The resultant binary matrix was analysed in PowerMarker to assess the genetic diversity of each variety on the basis of the following parameters: major allele frequencies, allele number, Polymorphism Information Content (PIC) and gene diversity (Devos and Gale, 1992). A dendrogram of cluster analysis was constructed using the Un-weighted Pair Group Method with Arithmetic average (UPGMA) as implemented in PowerMarker and was viewed using TreeView.
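For orientation, the per-locus statistics named above can be sketched with their textbook definitions: gene diversity as 1 minus the sum of squared allele frequencies, and PIC as gene diversity minus the sum over allele pairs of 2 p_i^2 p_j^2. This is not the PowerMarker implementation, which may apply sample-size corrections. The function names are hypothetical, and the 10:3 allele split is an illustrative assumption that happens to reproduce the figures reported for the two-allele marker RM 282 in table 4.3.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each allele at one locus (one allele per line)."""
    counts = Counter(alleles)
    total = len(alleles)
    return [count / total for count in counts.values()]


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content: gene diversity - sum_{i<j} 2 p_i^2 p_j^2."""
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


# Hypothetical scoring of 13 homozygous varieties at a two-allele locus.
scored = ["a"] * 10 + ["b"] * 3
freqs = allele_frequencies(scored)

major_allele_frequency = max(freqs)
he = gene_diversity(freqs)
pic_value = pic(freqs)
print(f"major allele {major_allele_frequency:.3f}, gene diversity {he:.4f}, PIC {pic_value:.3f}")
```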

Analysis of Molecular Variance (AMOVA) was used to reveal the partitioning of variation within and among the populations. Principal Coordinate Analysis (PCoA) was carried out based on SSR data to generate a 2-dimensional representation of the genetic relationships across the 13 rice varieties with the help of GenAlEx software.


CHAPTER FOUR

RESULTS

4.1 Determination of phenotypic diversity

4.1.1 Measurement of grain and kernel traits

Seven grain and kernel trait measurements were found to vary across the 13 studied rice varieties as shown in table 4.1. Of all the traits, the highest variation was observed in grain weight, where most of the rice varieties differed significantly (P<0.05; Table 4.1). The Supa rice variety showed the highest grain weight followed by IR 2793 and IR 54, whereas BS 370, ITA 310 and BS 217 showed the lowest grain weight mean values. It was observed that short and bold grains were heavier compared to long and slender grains. High grain length coupled with grain breadth was associated with high weight values for Supa and most of the improved rice varieties. However, Kahogo, Saro 5, Kilombero and Red Afaa had no significant variation in grain weight (P>0.05; Table 4.1).

Moderate variation was observed in kernel length, where dimensions ranged from 6.520 mm to 7.586 mm. Based on this trait, Supa and Wahiwahi, which showed the highest kernel length mean values, were significantly different from the rest of the test varieties (P<0.05; Table 4.1). The lowest kernel length mean values were identified in ITA 310 and Red Afaa, and the two varieties differed significantly from the other varieties (P<0.05; Table 4.1). All the varieties that had high and low values for grain length also showed high and low values for kernel length.


Low variation was observed in the grain and kernel breadth traits, where grain breadth dimensions across the rice varieties ranged from 1.846 to 2.055 mm. The highest grain breadth mean values were observed in Supa, followed closely by Red Afaa and IR 54, and they differed significantly from the rest of the varieties (P<0.05; Table 4.1). The lowest grain breadth mean values were observed in BS 370, BS 217 and ITA 310 respectively and, based on this trait, they were significantly different from the other test varieties (P<0.05; Table 4.1).

On the other hand, kernel breadth dimensions ranged from 1.64 mm to 1.87 mm, where the highest mean values were observed in Red Afaa, Supa and Kilombero. The three rice varieties had almost similar kernel breadth dimensions but differed significantly when compared to the rest of the varieties in this study (P<0.05; Table 4.1). The lowest kernel breadth mean values were identified in BS 217, BS 370 and ITA 310.

These results indicated that there was an association between the grain and kernel breadth traits, since the same varieties consistently showed high or low values for both.

Grain length measurements ranged from 8.999 mm to 10.666 mm. Wahiwahi had the longest grain size followed by Supa and Kilombero. Unlike for other traits, the three rice varieties that had the longest grain sizes were significantly different from each other (P<0.05; Table 4.1). On the other hand, ITA 310 and IR 64 had the shortest grain sizes and were significantly different from the other rice varieties. It was observed that aromatic landraces had the longest grains among the test varieties and shared a common source, Tanzania. On the other hand, non-aromatic improved varieties were found to have the shortest grains and shared a common origin, as shown by the IR codes, which indicate improved varieties from the Philippines.

Grain length/breadth ratio was calculated and the highest mean values were observed in the Wahiwahi, BS 217 and Saro 5 varieties. The lowest values were observed in Red Afaa, IR 54 and IR 2793. Combination of the two traits depicted IR 54 and IR 2793 as short and bold grains.

Mean values of the kernel length/breadth ratio, which is the measure of slenderness, ranged from 3.45 to 4.34. The highest mean values for this trait were observed in BS 217, BS 370 and Wahiwahi, where BS 217, an improved aromatic variety from Kenya, had the most slender kernel and differed significantly from the rest of the rice varieties (P<0.05; Table 4.1). On the other hand, Red Afaa, IR 64 and IR 2793 had the lowest mean values for kernel length/breadth ratio.

Red Afaa, IR 64 and IR 2793, with a KL/B ratio of less than 3.80, were categorized as short grain varieties, whereas the Kahogo, IR 54, ITA 310 and BW 196 rice varieties, having a KL/B ratio of less than 4.0, were considered medium grain varieties. BS 217, BS 370, Kilombero, Saro 5, Wahiwahi and Supa showed a KL/B ratio greater than 4.0 and were categorized as long grain varieties. From these results, it was inferred that basmati varieties had long and slender rice kernels, followed by Supa, Wahiwahi, Kilombero and Saro 5 which had medium grain sizes. Red Afaa, IR 64 and IR 2793 varieties had short and bold kernels.
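The grouping by KL/B ratio can be reproduced from the mean values in table 4.1 with a simple threshold rule. The sketch below uses the cut-offs stated in the text (3.80 and 4.0); the function name is hypothetical, and the handling of a ratio exactly equal to a cut-off is an assumption, since the text does not define it.

```python
# Mean KL/B ratios taken from table 4.1.
KL_B = {
    "IR 2793": 3.759, "BS 217": 4.336, "BS 370": 4.177, "BW 196": 3.808,
    "ITA 310": 3.971, "Red Afaa": 3.452, "IR 54": 3.997, "Kilombero": 4.169,
    "IR 64": 3.697, "Kahogo": 3.924, "Saro 5": 4.075, "Wahiwahi": 4.130,
    "Supa": 4.108,
}


def grain_class(kl_b):
    """Classify a variety by kernel length/breadth ratio.

    Assumption: values exactly on a cut-off fall into the higher class.
    """
    if kl_b < 3.80:
        return "short"
    if kl_b < 4.00:
        return "medium"
    return "long"


groups = {"short": [], "medium": [], "long": []}
for variety, ratio in KL_B.items():
    groups[grain_class(ratio)].append(variety)

for label, members in groups.items():
    print(f"{label}: {', '.join(members)}")
```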

Analysis of variance revealed high variability in the seven traits across all the varieties, as shown in table 4.1. The ANOVA table clearly showed that the means of the characters measured varied significantly across all the varieties.


Table 4.1: Analysis Of Variance (ANOVA) of seven grain and kernel traits of the 13 studied rice genotypes

Variety         GL (mm)            GB (mm)            GL/B               KL (mm)            KB (mm)            KL/B               GW (g)
IR 2793 (1)     9.199±0.37 ef      1.994±0.10 ab      4.625±0.31 cde     6.619±0.30 cd      1.762±0.05 abc     3.759±0.19 de      28.9±0.01 b
BS 217 (2)      9.543±0.50 bcdef   1.85±0.08 b        5.159±0.18 ab      7.112±0.39 abc     1.641±0.08 c       4.336±0.18 a       23.5±0.02 e
BS 370 (3)      9.225±0.38 def     1.843±0.06 b       5.005±0.10 abc     6.931±0.27 cd      1.659±0.05 bc      4.177±0.09 ab      18.2±0.01 g
BW 196 (4)      9.302±0.36 cdef    2.011±0.13 ab      4.640±0.30 cde     6.625±0.29 cd      1.749±0.16 abc     3.808±0.26 cd      26.2±0.01 d
ITA 310 (5)     8.999±0.31 f       1.846±0.06 b       4.877±0.15 abcde   6.522±0.35 d       1.643±0.08 c       3.971±0.15 bcd     20.6±0.01 f
Red Afaa (6)    9.138±0.22 ef      2.028±0.16 a       4.531±0.37 e       6.435±0.29 d       1.868±0.13 a       3.452±0.16 e       27.2±0.01 bcd
IR 54 (7)       9.929±0.35 bcd     2.049±0.10 a       4.852±0.20 bcde    7.139±0.35 abc     1.788±0.11 abc     3.997±0.15 bcd     28.5±0.01 bc
Kilombero (8)   10.02±0.60 abc     1.991±0.17 ab      5.057±0.41 abc     7.501±0.32 ab      1.814±0.06 a       4.169±0.22 ab      27.3±0.01 bcd
IR 64 (9)       9.072±0.63 f       1.989±0.07 ab      4.565±0.33 de      6.600±0.50 cd      1.786±0.06 abc     3.697±0.27 de      26.5±0.01 cd
Kahogo (10)     9.952±0.47 abc     2.012±0.12 ab      4.961±0.35 abcde   6.981±0.27 bcd     1.783±0.09 abc     3.924±0.24 bcd     27.5±0.01 bcd
Saro 5 (11)     9.855±0.42 bcde    1.939±0.12 ab      5.096±0.32 ab      7.078±0.31 abc     1.744±0.14 abc     4.075±0.26 abc     27.5±0.01 bcd
Wahiwahi (12)   10.666±0.65 a      2.017±0.16 ab      5.302±0.29 a       7.540±0.44 a       1.804±0.08 ab      4.130±0.15 ab      28.4±0.02 bc
Supa (13)       10.243±0.66 ab     2.055±0.11 a       4.989±0.29 abcd    7.586±0.54 a       1.849±0.11 a       4.108±0.25 abc     32.7±0.03 a

The values are mean ± SEM of ten independent determinations at the 5% level of significance. Data were analysed using Analysis Of Variance (ANOVA) followed by Tukey’s post hoc test. Means that do not share a letter are significantly different (P<0.05).

4.1.2 Principal Component Analysis (PCA)

The principal component analysis (PCA) was carried out to investigate the morphological traits that played a key role in phenotypic diversity among the rice varieties. It provided the eigen values and percent of variation for seven principal component axes across 13 rice varieties as shown in table 4.2. It was found that the first three principal components jointly accounted for 99.5% of the total variation among all the studied varieties. Combination of the first and the second principal components accounted for 95.2% of the total variation among the seven component axes of the total rice varieties.
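The percentages quoted here follow from the eigen values in table 4.2: each component's share is its eigen value divided by the sum of all eigen values. The sketch below redoes that arithmetic with a hypothetical helper function; small differences from the tabulated figures (for example 84.7 against 84.6 for PC1) are presumably due to rounding of the published eigen values.

```python
# Eigen values for PC1 to PC7 as listed in table 4.2.
EIGENVALUES = [0.5192, 0.0647, 0.0265, 0.0026, 0.00025, 0.00002, 0.0]


def variance_explained(eigenvalues):
    """Return per-component and cumulative percentages of total variance."""
    total = sum(eigenvalues)
    proportions = [100.0 * value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for share in proportions:
        running += share
        cumulative.append(running)
    return proportions, cumulative


proportions, cumulative = variance_explained(EIGENVALUES)
for index, (share, total_so_far) in enumerate(zip(proportions, cumulative), start=1):
    print(f"PC{index}: {share:5.1f}%  cumulative {total_so_far:5.1f}%")
```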

Principal component 1 (PC1) accounted for 84.6% of the total variation, with the traits grain length, grain breadth, grain length/breadth ratio, kernel length, kernel length/breadth ratio and grain weight all contributing positively. Of these, three traits (grain length, kernel length and grain weight) made a notably major contribution to PC1. In the case of Principal Component 2 (PC2), three traits (grain length/breadth ratio, kernel length/breadth ratio and kernel length) contributed positively, and PC2 accounted for 10.6% of the total morphological variability. On the other hand, grain length, grain breadth, kernel breadth and grain weight were negatively associated with PC2.

The first two principal components efficiently separated most of the improved varieties from landraces, with varieties possessing long grains clustering close together as shown in figure 4.1. Basmati varieties clustered in a separate group distant from the other varieties, and this corresponds well with their slender grains. GL, KL, GW and KL/B were found to be the major contributors to PC1 and PC2.

Table 4.2: Eigen values and percent of variation for 7 principal component axes in 13 rice varieties

                         PC1      PC2      PC3      PC4      PC5      PC6      PC7
Eigen values             0.5192   0.0647   0.0265   0.0026   0.00025  0.00002  0.0000
Proportion of variance   84.6     10.6     4.3      0.4      0.00     0.00     0.00
Cumulative % variance    84.6     95.2     99.5     99.9     100      100      100

Eigen vectors
GL                       0.730    -0.456   0.408    -0.235   0.093    0.153
GB                       0.029    -0.281   -0.113   -0.247   0.048    -0.517   -0.759
GL/B                     0.294    0.099    0.489    0.568    -0.143   -0.177   -0.304
KL                       0.562    0.455    -0.742   0.271    0.029    0.188    -0.119
KB                       0.027    -0.261   -0.150   0.356    -0.030   -0.729   0.499
KL/B                     0.250    0.645    -0.064   -0.591   -0.016   -0.352   0.211
GW                       0.323    -0.112   -0.064   -0.105   -0.985   0.040    -0.004


Figure 4.1: Scatter plot of 13 rice varieties based on the first two principal components

4.1.3 Cluster analysis of rice varieties based on morphological traits

Cluster analysis grouped the 13 rice varieties into two distinct major clusters, I and II, with a similarity index of 1.25, thereby revealing the presence of high diversity as shown in figure 4.2. Cluster I was the largest with 8 rice varieties whereas cluster II had only 5. Cluster I was further subdivided into three sub clusters, CIA, CIB and CIC, where Wahiwahi, a landrace, formed its own sub cluster, CIA. Sub cluster CIB contained two smaller groups, i and ii.

Among these two groups, Supa, an improved aromatic variety, clustered close together with Kilombero, a semi aromatic variety, in group i. In group ii, Saro 5, an improved aromatic variety, clustered together with two other varieties from the same origin. In sub cluster CIC, improved aromatic Basmati genotypes clustered together with a similarity coefficient of 0.43.

Cluster II contained four improved rice varieties and only one landrace, where ITA 310 showed parentage to the rest of the varieties on the pedigree. Two improved varieties from Kenya in this cluster, IR 2793 and BW 196, were the most similar with a similarity coefficient of 0.21. The relationship among the 13 rice varieties was revealed by the dendrogram as shown in figure 4.2.

[Dendrogram: vertical axis, Similarity; horizontal axis, Varieties; major clusters I and II; sub clusters CIA, CIB and CIC]

Figure 4.2: Dendrogram generated by cluster analysis of morphological characters.


4.2 Determination of genetic diversity

4.2.1 Quality of extracted DNA

It was found that the entire DNA was intact and of good quality as shown in plate 4.1. The numbers on the gel photo represent the lab codes assigned to each of the rice samples as indicated in table 3.1.

[Gel image: lanes 1 to 13]

Plate 4.1: A gel picture showing thirteen rice DNA samples extracted using the CTAB method. Sample code numbers 1 to 5 represent the Kenyan samples whereas samples 6 to 13 represent the Tanzanian samples.

The purity of the extracted DNA was found to be above 1.8 whereas the concentration of the DNA was on an average 428.72 ng per µl and was used for subsequent SSR analysis.

4.2.2 Simple Sequence Repeat (SSR) analysis, allele number, PIC and Heterozygosity estimates

4.2.2.1 Simple Sequence Repeat (SSR) analysis

It was found that of all the SSR markers used in this study, only RM 42 was monomorphic and was present at the same level. This marker, which is tightly linked to the fgr gene responsible for aroma, targeted one allele in all the varieties.

The other eight markers utilized showed clear and consistent banding patterns and were chosen for assessment of genetic diversity among the varieties. The RM 339 and RM 241 rice microsatellite markers demonstrated distinct bands in most of the improved aromatic rice varieties compared to all other varieties.

Marker RM 339 revealed a considerable level of divergence among the different rice varieties as shown in plate 4.2. Several bands presented by the microsatellite markers were shared between aromatic and non-aromatic varieties as shown in plate 4.2.

[Gel image: lanes Mw and 1 to 13; ladder bands at 100 bp and 200 bp marked]

Plate 4.2: SSR banding pattern of 13 landraces and improved rice varieties from Kenya and Tanzania generated by marker RM 339. The lanes represent Mw: 100 bp molecular weight ladder; lane 1: IR 2793; lane 2: BS 217; lane 3: BS 370; lane 4: BW 196; lane 5: ITA 310; lane 6: Red Afaa; lane 7: IR 54; lane 8: Kilombero; lane 9: IR 64; lane 10: Kahogo; lane 11: Saro 5; lane 12: Wahiwahi; lane 13: Supa.

4.2.2.2 Number of alleles

The ability of each of the eight microsatellite markers to determine genetic diversity among the varieties varied. A total of 25 alleles were detected from the 13 varieties using the eight SSR markers as shown in table 4.3. The allelic richness per locus generated by each marker varied from 2 for RM 282 to 4 for RM 241 and RM 339, with an average of 3.125 alleles per locus. The maximum number of alleles per locus was obtained with markers RM 241 and RM 339.

The number of alleles detected by particular markers provides an estimation of genetic diversity. This indicated that markers RM 241 and RM 339 were the most informative for the 13 test genotypes and hence the most suitable for diversity studies. The minimum number of polymorphic alleles was observed with marker RM 282. As shown in table 4.3, there was no association between the number of alleles detected and the number of SSR repeat motifs. The loci with the repeat motif varying from (GA)11 to (GA)15 did not show any association with the number of alleles detected.
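The allele totals quoted in this section can be checked against the per-marker counts in table 4.3. The sketch below is only a worked tally; the variable names are illustrative.

```python
# Number of polymorphic alleles per marker, as listed in table 4.3.
ALLELES_PER_LOCUS = {
    "RM 277": 3, "RM 232": 3, "RM 252": 3, "RM 282": 2,
    "RM 241": 4, "RM 215": 3, "RM 339": 4, "RM 225": 3,
}

total_alleles = sum(ALLELES_PER_LOCUS.values())
average_alleles = total_alleles / len(ALLELES_PER_LOCUS)

highest = max(ALLELES_PER_LOCUS.values())
most_alleles = [marker for marker, n in ALLELES_PER_LOCUS.items() if n == highest]
fewest_alleles = min(ALLELES_PER_LOCUS, key=ALLELES_PER_LOCUS.get)

print(f"{total_alleles} alleles in total, {average_alleles} per locus on average")
print(f"most alleles: {', '.join(most_alleles)}; fewest: {fewest_alleles}")
```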

4.2.2.3 Rare alleles

Alleles observed in less than 5% of all the rice varieties (commonly termed rare) were investigated and identified at three loci: RM 277, RM 241 and RM 339. A total of 5 rare alleles (20%) were detected, with the maximum number being observed at RM 241 followed by RM 339. Five of the rice varieties (38%) showed rare alleles. ITA 310, Wahiwahi and Supa had one rare allele each while IR 2793 had two rare alleles. It was found that markers RM 241 and RM 339, which detected a higher number of alleles (4), also detected more rare alleles.

4.2.2.4 Polymorphic Information Content (PIC) values

The level of polymorphism among the 13 rice varieties was evaluated by calculating PIC values for each SSR locus. This calculation was based on the alleles produced by each marker. The PIC values varied from 0.292 for RM 282 to 0.641 for RM 339, with an average of 0.534 per locus, as shown in table 4.3. The PIC value serves as an indicator of the discriminating power of a particular marker because it takes into account the number of alleles at each locus and their relative frequencies among the tested varieties.

Six out of the eight markers (RM 277, RM 252, RM 241, RM 339, RM 215 and RM 225) had PIC values above 0.5. On this basis, RM 339 was considered the best marker for the 13 test genotypes. The results are summarized in table 4.3.
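The relationship between allele frequencies, gene diversity and PIC can be illustrated with a minimal sketch. It assumes the widely used Botstein-type PIC formula and the usual gene diversity formula; the thesis section does not state which formulas or software were used, and the allele counts below (3 and 10 out of 13 homozygous varieties) are an assumption chosen only because they are consistent with the values tabulated for the two-allele locus RM 282.

```python
def gene_diversity(freqs):
    """Expected heterozygosity (gene diversity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


# Assumed allele counts for a two-allele locus scored on 13 homozygous varieties.
counts = [3, 10]
freqs = [c / sum(counts) for c in counts]

he_value = gene_diversity(freqs)
pic_value = pic(freqs)
print(f"gene diversity = {he_value:.4f}, PIC = {pic_value:.3f}")
```

Under these assumed counts the sketch gives a gene diversity of about 0.355 and a PIC of about 0.292, which shows why PIC is always somewhat lower than gene diversity at the same locus.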

4.2.2.5 Heterozygosity

No heterozygosity was observed (Ho = 0) across the varieties, whereas expected heterozygosity (He), which is reflected by the gene diversity at each locus, ranged from 0.355 to 0.698 with an average of 0.604. This heterozygosity deficiency was consistent with the high inbreeding coefficients (F) of 1.0 across all the varieties.
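The reported F of 1.0 follows directly from Ho = 0 if, as assumed here, F is the usual fixation index computed from observed and expected heterozygosity. The thesis section does not state the formula, so the following is a sketch of the arithmetic using the average gene diversity of 0.6036, not a statement of the author's method.

```latex
F = 1 - \frac{H_o}{H_e}, \qquad
H_o = 0,\; H_e = 0.6036 \;\Rightarrow\; F = 1 - \frac{0}{0.6036} = 1
```

The same result holds at every locus, because any positive He combined with Ho = 0 gives F = 1.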

Table 4.3: Profiles of SSR analysis

| Marker | SSR motif | Chromosome number | Polymorphic alleles | Gene diversity | PIC value | F | Ho |
|---|---|---|---|---|---|---|---|
| RM 277 | (GA)11 | 8 | 3 | 0.6036 | 0.536 | 1.0 | 0 |
| RM 232 | (CT)24 | 3 | 3 | 0.5680 | 0.472 | 1.0 | 0 |
| RM 252 | (CT)19 | 4 | 3 | 0.6509 | 0.576 | 1.0 | 0 |
| RM 282 | (GA)15 | 3 | 2 | 0.3550 | 0.292 | 1.0 | 0 |
| RM 241 | (CT)31 | 4 | 4 | 0.6746 | 0.620 | 1.0 | 0 |
| RM 215 | (CT)16 | 9 | 3 | 0.6391 | 0.566 | 1.0 | 0 |
| RM 339 | (GA)14 | 8 | 4 | 0.6982 | 0.641 | 1.0 | 0 |
| RM 225 | (CT)25 | 6 | 3 | 0.6391 | 0.566 | 1.0 | 0 |
| Average | _ | _ | 3.125 | 0.6036 | 0.534 | 1.0 | 0 |

Simple Sequence Repeat (SSR) motifs, their location on rice chromosomes, number of polymorphic alleles, gene diversity, Polymorphic Information Content (PIC) values, inbreeding coefficients (F) and observed heterozygosity (Ho) are shown.

4.3 Determination of genetic relatedness

4.3.1 Genetic distance

Pair-wise genetic dissimilarity estimates ranged from 0.113 to 0.900 and the average dissimilarity among all the rice varieties was 0.482, as shown in table 4.4. Kilombero and Supa were the closest genotypes, with the lowest dissimilarity value of 0.113. This was followed by BS 217 and BS 370, with a dissimilarity value of 0.225.

On the other hand, the highest level of dissimilarity was observed between Kahogo and BS 217, with a dissimilarity index of 0.900. Higher similarity was evident among improved rice varieties than among landraces.


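Table 4.4: C.S. Chord coefficients of dissimilarity among pairs of 13 rice varieties

| | BS 217 | BS 370 | BW 196 | IR 2793 | IR 54 | IR 64 | ITA 310 | KAHOGO | KILOMBERO | RED AFAA | SARO 5 | SUPA | WAHIWAHI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BS 217 | 0.0000 | | | | | | | | | | | | |
| BS 370 | 0.2251 | 0.0000 | | | | | | | | | | | |
| BW 196 | 0.4502 | 0.3376 | 0.0000 | | | | | | | | | | |
| IR 2793 | 0.5627 | 0.5627 | 0.5627 | 0.0000 | | | | | | | | | |
| IR 54 | 0.5627 | 0.4502 | 0.5627 | 0.4502 | 0.0000 | | | | | | | | |
| IR 64 | 0.6752 | 0.6752 | 0.6752 | 0.5627 | 0.7878 | 0.0000 | | | | | | | |
| ITA 310 | 0.6752 | 0.7878 | 0.9003 | 0.5627 | 0.6752 | 0.3376 | 0.0000 | | | | | | |
| KAHOGO | 0.9003 | 0.7878 | 0.7878 | 0.7878 | 0.6752 | 0.5627 | 0.3376 | 0.0000 | | | | | |
| KILOMBERO | 0.5627 | 0.5627 | 0.5627 | 0.5627 | 0.4502 | 0.5627 | 0.4502 | 0.6752 | 0.0000 | | | | |
| RED AFAA | 0.5627 | 0.5627 | 0.4502 | 0.4502 | 0.5627 | 0.4502 | 0.6752 | 0.5627 | 0.7878 | 0.0000 | | | |
| SARO 5 | 0.5627 | 0.5627 | 0.9003 | 0.6752 | 0.5627 | 0.5627 | 0.5627 | 0.5627 | 0.4502 | 0.5627 | 0.0000 | | |
| SUPA | 0.4502 | 0.6752 | 0.6752 | 0.6752 | 0.5627 | 0.5627 | 0.4502 | 0.6752 | 0.1125 | 0.7878 | 0.3376 | 0.0000 | |
| WAHIWAHI | 0.7878 | 0.7878 | 0.7878 | 0.5627 | 0.5627 | 0.5627 | 0.5627 | 0.5627 | 0.7878 | 0.3376 | 0.5627 | 0.7878 | 0.0000 |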
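The tabulated coefficients are all multiples of about 0.1125, which is what a chord distance averaged over 8 loci produces for fully homozygous varieties. The sketch below assumes that "C.S. Chord" refers to the Cavalli-Sforza and Edwards chord distance in one common per-locus form; the thesis section does not give the formula, and the allele labels are purely hypothetical.

```python
import math


def chord_distance(genotype_a, genotype_b):
    """Chord distance between two individuals, averaged over loci.

    Each genotype is a list with one allele label per locus (fully
    homozygous individuals), so at every locus the two allele frequency
    vectors are either identical or completely disjoint.
    """
    per_locus = []
    for allele_a, allele_b in zip(genotype_a, genotype_b):
        # Sum over alleles of sqrt(p_i * q_i): 1 if the allele is shared, else 0.
        shared = 1.0 if allele_a == allele_b else 0.0
        per_locus.append((2.0 / math.pi) * math.sqrt(2.0 * (1.0 - shared)))
    return sum(per_locus) / len(per_locus)


# Hypothetical allele calls at 8 loci (labels are illustrative only).
variety_x = ["a", "b", "a", "c", "b", "a", "d", "b"]
variety_y = ["a", "b", "a", "c", "b", "a", "d", "c"]  # differs from x at 1 locus
variety_z = ["b", "c", "b", "a", "c", "b", "a", "a"]  # differs from x at all 8 loci

d_one = chord_distance(variety_x, variety_y)
d_all = chord_distance(variety_x, variety_z)
print(round(d_one, 4), round(d_all, 4))
```

Under this reading, a coefficient of 0.1125 corresponds to two varieties differing at one of the eight loci, and 0.9003 to two varieties sharing no allele at any locus.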


4.3.2 Cluster analysis

A dendrogram based on a neighbor-joining tree grouped the 13 rice varieties into three major clusters, I, II and III, as shown in figure 4.3. Cluster I contained 4 rice varieties, in which two varieties from the Philippines, IR 2793 and IR 54, clustered close together. Two landraces from Tanzania, Wahiwahi and Red Afaa, also clustered together.

Cluster II was more diverse and consisted of 6 rice varieties, which were further subdivided into 2 sub-clusters, each having two smaller clusters. The first sub-cluster consisted solely of 3 landrace varieties from both source countries, whereas the other sub-cluster consisted of 3 aromatic varieties from Tanzania. Cluster III consisted of 3 rice varieties from Kenya, in which the 2 aromatic basmati varieties, BS 217 and BS 370, clustered close together.

Most of the landraces and improved varieties formed sub-clusters of their own. A high level of genetic relatedness was revealed among Basmati varieties, which clustered in one sub-group as shown in figure 4.3.

Figure 4.3: Neighbor-joining tree showing clusters I, II and III (1000 bootstraps; dissimilarity matrix based on presence/absence data, Jaccard's coefficient)

4.3.3 Analysis of molecular variance (AMOVA)

The analysis of molecular variance (AMOVA) showed statistically significant differentiation (P < 0.001; Table 4.5). Eighty-six percent (86%) of the total variation was contained within populations, whereas a small but significant 14% was contained among the two populations.

Table 4.5: Analysis of Molecular Variance (AMOVA) based on 8 SSR loci

| Source of variation | DF | SS | MS | Est. Var. | % variation | P-value |
|---|---|---|---|---|---|---|
| Among populations | 1 | 8.710 | 8.710 | 0.696 | 14% | |
| Within populations | 11 | 48.675 | 4.425 | 4.425 | 86% | P < 0.001 |
| Total | 12 | 57.385 | 13.135 | 5.121 | 100% | |

Degrees of freedom (DF), sum of squares (SS), mean squares (MS), estimated variance (Est. Var.), % variation and P-values are shown.
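The way the estimated variance components and percentages follow from the mean squares can be sketched as below. This assumes a standard one-level AMOVA in which the among-population component is scaled by an effective group size n0; the group sizes of 5 and 8 are hypothetical, chosen only because they reproduce the tabulated estimate, since this section does not state how the 13 varieties were split between the two populations.

```python
def amova_components(ms_among, ms_within, group_sizes):
    """Estimate variance components for a one-level AMOVA from mean squares."""
    n_total = sum(group_sizes)
    k = len(group_sizes)
    # Effective group size used to scale the among-group mean square.
    n0 = (n_total - sum(n * n for n in group_sizes) / n_total) / (k - 1)
    var_within = ms_within
    var_among = (ms_among - ms_within) / n0
    share_among = var_among / (var_among + var_within)
    return var_among, var_within, share_among


# Mean squares from Table 4.5; group sizes are an assumption (see lead-in).
var_among, var_within, share_among = amova_components(8.710, 4.425, [5, 8])
pct_among = 100 * share_among
pct_within = 100 * (1 - share_among)
print(f"{var_among:.3f} {var_within:.3f} {pct_among:.0f}% {pct_within:.0f}%")
```

With these inputs the among-population component comes to about 0.696, or roughly 14% of the total estimated variance, in line with the table.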

4.3.4 Principal coordinates analysis (PCoA)

The first and second component axes of the PCoA explained 27.96% and 24.23% of the variance respectively, totaling 52.19%. Principal coordinate analysis revealed that considerable genetic diversity existed in the test rice varieties, which formed two clusters, A and B, as shown in figure 4.4. In cluster A, three improved and high quality rice varieties from Kenya, BS 217, BS 370 and BW 196, were grouped close together, whereas Wahiwahi and Red Afaa, landraces from Tanzania, were grouped far apart in the same cluster. On the other hand, two improved aromatic varieties from Tanzania, Supa and Saro 5, were grouped close together with one aromatic landrace, Kilombero, in cluster B. These two clusters, A and B, corresponded well with the two major clusters I and II of the UPGMA dendrogram.

Figure 4.4: Two-dimensional scatter plot of principal coordinate analysis for all 13 test rice varieties, showing clusters A and B. The first and second dimensions explained 27.96% and 24.23% of the genetic diversity respectively. Varieties were labeled with different colours to identify their region specificity; the intermixing of colours across the coordinates further supports the UPGMA dendrogram in showing that there is no location-specific grouping of varieties.

CHAPTER FIVE

DISCUSSION, CONCLUSIONS, RECOMMENDATIONS AND SUGGESTIONS FOR FURTHER RESEARCH

5.1 Discussion

Rice is a domesticated crop, and the process of domestication involves classical breeding by selection for quality traits and subsequent hybridization of the elite lines. This process may lead to the loss of useful genes and a reduction of genetic diversity among landraces. Landraces are highly heterogeneous and show better adaptation to the local environment than modern varieties (Brondani et al., 2005).

Studies have shown that these traditional varieties possess great genetic variability and specific traits such as resistance to biotic and abiotic stresses, including diseases; hence they are an important resource for plant breeding in crop improvement strategies as well as for the preservation of genetic diversity. Morphological and seed traits are important means of studying variability among crop plants (Kamarouthu, 2013).

Analysis of variance of the grain and kernel trait measurements indicated a wide spectrum of exploitable variation among the rice varieties for all the characters. The means of the characters measured varied significantly across the varieties, demonstrating that, based on these traits, the varieties were distinct from each other.

This variation is useful in the initial breeding material as it offers opportunities for desired traits through phenotypic selection. The highest variability was observed in grain weight, which ranged from 18.2 g to 32.7 g. The maximum grain weight was recorded in Supa with 32.7 g, followed by IR 2793 with 28.9 g and then IR 54 with 28.5 g. The lowest grain weights were observed among basmati varieties, BS 370 and BS 217, with 18.2 g and 23.5 g, respectively.

Low grain weight among BS varieties can be explained by the fact that these varieties possess longer and more slender grains. This observation relates to the report of Ray et al. (2013), who found that most landrace varieties had heavier grains than improved varieties in a set of Indian aromatic and non-aromatic rice varieties. From a commercial point of view, seed weight is an important quality trait that increases the market value of rice; hence, the heavy-grained varieties can be used as donor plants for developing varieties with bold seeds.

On the other hand, varieties showed moderate variation in kernel length, ranging from 6.052 mm for ITA 310 to 7.586 mm for Supa, with landrace varieties having the highest grain and kernel length. These results agree with the findings of Parikh et al. (2013), who reported moderate variability for kernel length among Indian rice varieties. The same varieties also showed very high values for grain and kernel breadth, ranging from 1.804 mm for Kilombero to 1.849 mm for Supa. Maximum grain length coupled with grain breadth gave these rice varieties the highest grain weight among the varieties in this study.

Low variation was observed in grain and kernel breadth, indicating limited scope for improvement in these traits through selection. The results clearly indicated that BS 217 and BS 370 had a small grain breadth that contributes to their slender appearance. Red Afaa, IR 54 and IR 2793 had the lowest values of grain length/breadth ratio. The combination of these two traits depicted IR 54 and IR 2793 as short and bold grains. Supa, an improved variety, and Kilombero, a landrace, showed the highest grain length and breadth mean values, which corresponded well with their long and bold rice grains. High grain lengths coupled with high grain breadths contributed to the round appearance of these varieties.

In the case of grain and kernel length/breadth ratio (a measure of slenderness), the highest mean values were observed in BS 217 with 4.336, followed by BS 370 with 4.177 and Kilombero with 4.169. Rice varieties with a KL/B ratio of > 4.0 were categorized as slender and long grains (Yadav et al., 2007). The most slender rice varieties are desirable and preferred by consumers. Most of the aromatic landraces and improved varieties belonged to this category, with KL/B ratios ranging from 4.075 for Saro 5 to 4.336 for BS 217. Generally, Basmati varieties showed the highest kernel L/B values. This type of rice is long and slender in shape and possesses the most desirable grain and cooking quality traits.

Wahiwahi and Kilombero were the most slender landrace varieties, with KL/B ratios of 4.180 and 4.139 respectively. The varieties with high grain and kernel length/breadth ratios may be utilized as sources of these traits in breeding for long grain varieties. Immense variation in grain and kernel L/B ratios has also been reported by Shahidullah et al. (2009). Grain shape and size are important traits that determine the market value of rice. These traits are highly considered by breeders in developing new varieties for commercial release (Cruz and Khush, 2000).
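The slenderness rule described above (KL/B ratio greater than 4.0) can be written as a small sketch. The function name and the "hypothetical bold variety" entry are illustrative assumptions; the other ratios are the values quoted in the text.

```python
SLENDER_THRESHOLD = 4.0  # KL/B ratio above which kernels are classed as slender


def classify_kernel(kl_b_ratio):
    """Return a shape label for a kernel length/breadth (KL/B) ratio."""
    return "slender" if kl_b_ratio > SLENDER_THRESHOLD else "not slender"


# KL/B ratios quoted in the discussion, plus one hypothetical bold-grained entry.
ratios = {
    "BS 217": 4.336,
    "BS 370": 4.177,
    "Saro 5": 4.075,
    "hypothetical bold variety": 3.2,
}
labels = {name: classify_kernel(value) for name, value in ratios.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```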

Principal Component Analysis (PCA) provided an insight into the contribution of each trait towards divergence among the rice varieties. Principal component analysis grouped the rice varieties into three clusters, indicating the presence of considerable phenotypic diversity among the varieties. The first two principal components were used because together they expressed most of the total variability (95.2%) of the plant material.

The traits that contributed most to the observed phenotypic diversity were grain length, kernel length, grain weight, and kernel length/breadth ratio. This analysis validates their use as the main discriminating traits among the test varieties. The other traits were found to have a minimal contribution to variability.

These results are consistent with the report of Ray et al. (2013), who found major contributions of grain length, kernel length, grain weight and grain length/breadth ratio to phenotypic diversity in a set of Indian rice. The major contribution of grain length, kernel length, grain weight, and kernel length/breadth ratio could perhaps be due to the fact that they are the most important agronomic traits subjected to selection by farmers and breeders over time (Duvick et al., 2010).

Cluster analysis provided a good opportunity to identify and group the rice varieties into distinct categories with respect to similarity levels based on the phenotypic traits. The dendrogram showed two major clusters, I and II. Cluster I contained eight rice varieties from both source countries. All the aromatic rice varieties, both improved and landraces, were grouped in cluster I. This indicated the extent to which most aromatic varieties share phenotypic grain quality traits.

On the other hand, cluster II contained five rice varieties from both source countries where all were non aromatic varieties, representing both improved and landrace varieties. Cluster analysis indicated that there was no association between the observed pattern of variation and the geographical origin of the rice varieties.

Cluster analysis grouped the basmati varieties, BS 217 and BS 370, together and close to other aromatic varieties from different sources, which presented aroma as a grain quality trait that distinguished rice varieties into different categories. This also suggests a possibility that the widely preferred basmati type of rice may have evolved through natural mutation from non-basmati genotypes (Joshi and Behera, 2007). These results are consistent with the findings of Ray et al. (2013), who observed clustering together of the basmati group based on morphological traits using aromatic and non-aromatic rice varieties from India.

Only one improved aromatic variety, Saro 5, clustered with an improved non-aromatic variety, IR 54, and Kahogo, a semi-aromatic landrace, which is reasonable since they are collections from Kilimanjaro Agricultural Research Training College and possibly have common ancestors. These varieties had similar phenotypic traits such as grain length, kernel breadth and grain weight but differed in grain breadth, grain length/breadth ratio and kernel length/breadth ratio.

These differences contribute to the slender nature of Saro 5 and distinguish it from other varieties of almost similar size. Other long grain non-aromatic varieties fell in one sub-cluster, corresponding well with their grain characters. These varieties, although they have varying cooking and eating qualities, lack the desirable basmati traits.

Generally, cluster analysis gave an insight into the diversity of the rice varieties as shown in the dendrogram in figure 4.2.

Variety groups were primarily associated with phenotypic differences among them and with variety type. These results agree with the earlier report of Hien et al. (2007), who also found that in cluster analysis, varieties with greater phenotypic similarity grouped together but the groups did not necessarily include varieties from the same origin.

The assessment of genetic diversity is crucial in germplasm characterization, conservation and breeding. The advent of DNA marker technology has greatly facilitated studies of genetic variation through development of genetic markers to follow inheritance of agronomically important traits. The results obtained from assessment of genetic diversity at the DNA level could be used in development of better breeding strategies.

Simple Sequence Repeat (SSR) markers were chosen for the analysis of genetic diversity among Kenyan and Tanzanian rice varieties because previous studies have shown that they are a reliable tool for differentiating even closely related lines. In addition, these markers have numerous advantages over other PCR-based DNA markers, including co-dominance, high abundance in the genome, suitability for high throughput screening, reproducibility and ease of automation (Ni et al., 2002).

Although diversity analysis in rice has been previously reported (Mwangi et al., 2013), very little is known about the relationship between Kenyan and Tanzanian rice landraces and improved varieties on the basis of phenotypic and molecular analysis. In this study, 8 microsatellite markers were used to assess the genetic diversity among 13 rice varieties. The results obtained indicated a significant level of genetic variation among the rice varieties used. Many of the alleles produced by the microsatellite assays were shared among improved and landrace varieties, but comparatively fewer alleles were common to aromatic and non-aromatic rice varieties (Peano et al., 2004).

The microsatellite assays produced variety-specific alleles in some of the varieties assessed. Markers RM 241 and RM 339 detected the highest number of polymorphic alleles (4) and more rare alleles. These findings agree with Jain et al. (2004), who observed a similar output in Indian aromatic and quality rice accessions. Rare alleles are highly informative in fingerprinting of rice varieties, which indicates the value of RM 241 and RM 339 in the creation of DNA fingerprints. These can be very useful in quality assurance for varietal identification as well as for determination of cultivar purity. This observation supports reports that some markers are more specific to subspecies genomes than others, which makes them very useful for discrimination of closely related genotypes (Bligh et al., 1999).

The number of alleles detected by microsatellite markers varied from 2 to 4 with an average of 3.13 alleles per locus. These results correspond well with an earlier report by Shah et al. (2013) on Basmati and non-Basmati rice varieties from Pakistan, who detected 2 to 4 alleles with an average of 2.75 alleles per SSR locus. In contrast, the average number of alleles detected in this study was higher than the values obtained by Meti et al. (2013) and Vanniarajan et al. (2012), who reported averages of 2.08 and 2.5 alleles per locus, respectively, using Indian aromatic rice varieties.

The average number of alleles per locus detected in this study was lower than that of Siwach et al. (2004), who obtained an average of 4.5 alleles per locus using Indian rice varieties. Jayamani et al. (2007) and Ghneim et al. (2008) reported 7.7 and 13.0 alleles per locus for various classes of SSR markers using rice germplasm from Portugal and Venezuela respectively. The differences among those reports might be due to the use of more diverse germplasm and a higher number of rice accessions by these researchers. They may also be due to the inclusion of landraces of diverse origin as well as of Basmati and non-Basmati varieties in the present study.

The level of polymorphism as assessed by the PIC values, which reflect allele diversity and frequency among the rice varieties, was considerably high and ranged from 0.29 to 0.64 with an average of 0.53. Six out of the eight markers had PIC values above 0.5, which confirms that the SSR markers used in this study were highly informative, because PIC values higher than 0.5 indicate high polymorphism. Only two markers, RM 282 and RM 232, had low PIC values; this could perhaps result from closely related varieties, whereas high PIC values could be due to diverse varieties. The highest PIC value was observed at RM 339 (0.64), making this marker the most polymorphic and informative. Markers with high PIC values also detected a high number of alleles per locus, showing an association between PIC values and the number of alleles produced at each SSR locus. This indicated that the two parameters could be used in combination to provide an estimation of genetic diversity across the rice varieties.
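One way to put a number on the association between PIC and allele number is a Pearson correlation across the eight loci of Table 4.3. The thesis section does not say how the association was assessed, so the choice of Pearson's r here is an assumption made for illustration; the input values are those listed in Table 4.3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Order: RM 277, RM 232, RM 252, RM 282, RM 241, RM 215, RM 339, RM 225.
alleles = [3, 3, 3, 2, 4, 3, 4, 3]
pic_values = [0.536, 0.472, 0.576, 0.292, 0.620, 0.566, 0.641, 0.566]

r = pearson_r(alleles, pic_values)
print(f"r = {r:.2f}")
```

A strongly positive r would be in line with the statement that markers detecting more alleles also had higher PIC values, although eight loci are too few for a firm statistical conclusion.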

The average PIC value observed in this study was consistent with the findings of Shah et al. (2013). In contrast, it was higher than the observations of Lin et al. (2012), who obtained an average PIC value of 0.43 on Taiwan modern elite varieties, domestic and imported germplasm. The PIC value obtained in this study was notably lower than that previously reported by Brondani et al. (2005), who observed an average of 0.61 on Brazilian landraces and improved lines. This could indicate that the genotypes used in this study were more diverse due to differences in origin and ecotype.

Generally, microsatellite markers exhibit high PIC values due to their co-dominant nature and multiallelism (Temnykh et al., 2000).

The study demonstrated heterozygosity deficiency across all the study varieties, an indication that they were all pure breeds. This could be associated with forces such as inbreeding, which was reflected by the high inbreeding coefficients (F) of 1.0 across all the varieties. This is supported by the fact that rice is self-pollinated and the effects of cross-pollination are minimal. However, the level of polymorphism as indicated by the number of alleles and the PIC values concurred with the levels of expected heterozygosity at each locus. This reflected the high genetic variability contained across the rice varieties. These results are comparable to those of Cao et al. (2006), who reported heterozygosity deficiency and a high level of inbreeding among Chinese rice varieties.

The genetic dissimilarity among all varieties was also determined using a dissimilarity matrix. In the present study, it ranged from 0.1125 to 0.9003 with an average of 0.482. This was consistent with the reported similarity coefficients ranging from 0.24 to 0.92 observed in eight rice varieties from Pakistan (Jayamani et al., 2007). It was lower than the average genetic similarity of 0.79 obtained by Ravi et al. (2003) among 40 cultivated rice varieties and 5 wild relatives of rice.

A high degree of genetic similarity ranging from 0.67 to 0.91 was also reported by Siwach et al. (2004) among Basmati and non-Basmati long grain indica varieties using SSR markers. This discrepancy in the level of genetic similarity could perhaps be due to intra-specific variation in the germplasm used. It could also be explained by the use of similar ancestors and selection of similar traits.

Based on genetic similarity analysis of the 13 rice varieties with 8 microsatellite markers, a close relationship was observed between Kilombero and Supa. Another close relationship was evident between the Basmati varieties BS 217 and BS 370. This closeness could be due to very small differences between the varieties at the DNA level (Cho et al., 2000). In addition, three of them are improved aromatic varieties that expressed similar morphological traits, which strongly supported a close association between them. However, a considerable amount of genetic variation was evident among other improved and landrace aromatic and non-aromatic rice varieties.

Cluster analysis based on the similarity coefficients conspicuously placed the 13 rice accessions into two major groups. Most of the aromatic and non-aromatic varieties clustered into close sub-groups. Cluster analysis grouped 2 improved aromatic rice varieties from Kenya, BS 217 and BS 370, in a sub-group distinct from the rest of the varieties studied. This indicates that these varieties are genetically similar and share common ancestors. These varieties share BW 196 as one of their parents in the pedigree.

Similarly, other aromatic varieties from Tanzania, Saro 5, Supa and Kilombero, were clustered closely in the same sub-group on the upper part of the dendrogram. This is consistent with the use of the quality trait aroma as a parameter of discrimination among the varieties.

Two improved non-aromatic varieties, IR 2793 and IR 54, from both source countries were grouped close together into one sub-group. These two varieties were introduced into East Africa from the International Rice Research Institute (IRRI) in the Philippines and possibly had similar ancestors. Wahiwahi and Red Afaa, non-aromatic landraces from Tanzania, also clustered close together on the lower part of the dendrogram. In similar studies carried out by Pervaiz et al. (2009) and Saini et al. (2004) using microsatellite markers, long slender grained Basmati varieties were placed in the same group, whereas short grain non-aromatic varieties clustered into different but close groups.

Although a small number of Basmati varieties were used in this study, the results demonstrated a clear distinction between Basmati and indica rice varieties. This indicated that Basmati varieties are genetically distinct from other groups of aromatic and non-aromatic varieties. The higher level of genetic variability between Basmati and non-Basmati rice varieties supports the view that the Basmati group had a long and complex evolutionary history and diverged from non-Basmati varieties a long time ago (Jain et al., 2004). Similar observations were made by Glaszmann (1987), who, using biochemical markers, described Basmati as a clearly distinct group from other rice groups.

Analysis of Molecular Variance (AMOVA) revealed that the main contribution to the genetic variation was due to variation within populations. Indeed, 86% of the genetic variation was found within populations, while differences among populations contributed only 14% of the total genetic variation. This small genetic difference between the two populations could perhaps be due to exchange of germplasm between the two countries. These results are comparable to what was observed by Singh et al. (2013) using Indian rice varieties.

In the PCoA scatter plot, the distances among the varieties reflected the genetic distances among them; hence varieties that clustered close together were interpreted to be closely related and to share similar quality traits, whereas those clustered far apart were distantly related. Both cluster analyses based on genetic distance, UPGMA and PCoA, identified two major groups corresponding to improved varieties with high grain quality traits and landraces. Most of the improved rice varieties had similar morphological traits and showed genetic similarity, which was supported by both UPGMA and PCoA. Since most rice breeding programs are geared towards improvement of grain quality, varieties with good cooking and eating qualities were grouped together in clusters IA and IIA.

Clustering of the rice varieties by both methods revealed that there was no association between the observed pattern of variation and geographical origin. Similar observations were made by Lin et al. (2012) using Taiwan landraces and improved rice varieties. Such non-congruence between the clustering pattern and geographical origin could be due to exchange of germplasm between the two origin countries.

5.2 Conclusions

From this study, it can be concluded that:

i. Phenotypic analysis of the 13 studied rice varieties revealed an enormous diversity across all the varieties for all the traits evaluated. These variations can be exploited in crop improvement programmes.

ii. The most distinct phenotypic traits among the studied Kenyan and Tanzanian rice varieties were GL, KL, GW and KL/B ratio. This analysis validates their use as the main discriminating traits among the test varieties. Based on these traits, at least three landrace rice varieties, Kilombero, Wahiwahi and Kahogo, were found to possess good grain quality traits. These promising landraces should be conserved as a reservoir of beneficial genes for improvement of grain quality traits in rice varieties.

iii. Heterozygosity deficiency was evident among all the test varieties, which indicated that the varieties used in this study were all pure breeds. This could be associated with strong inbreeding, as shown by the inbreeding coefficients (F).

iv. On the basis of morphological and molecular analysis, improved rice varieties from both source countries used in this study had a low genetic diversity compared to landraces. This indicated a high genetic similarity among these varieties and could perhaps be due to high selection pressure for good quality traits and sharing of a common ancestry.

v. RM 339 and RM 241 were found to be the most reproducible and polymorphic markers, suitable for differentiating most of the rice varieties. These trait-specific markers demonstrated good sensitivity and can be relied upon for use in quality assurance, for characterization of other rice varieties as well as in breeding. This would be of benefit to both consumers and farmers.

vi. Rice varieties from both source countries were found to share some common alleles, with some alleles being specific to particular rice varieties. The variety-specific alleles can be employed in variety identification and DNA fingerprinting to differentiate rice varieties in the market from different countries.

5.3 Recommendations

From this study, the following recommendations can be made:

i. Agro-morphological traits can be employed as a common approach for assessing genetic and phenotypic variability among varieties.

ii. Use of trait-specific microsatellite markers is recommended for diversity analysis and differentiation of rice varieties from different geographical origins.

iii. RM 339 and RM 241 markers are recommended for use in diversity studies and in quality assurance for grading of rice varieties.

iv. This study suggests use of morphological and molecular evaluation of traits for selection of donor lines for future breeding programmes in crop improvement.

v. Landraces which express huge genetic variability should be conserved as a reservoir of beneficial genes for improvement of grain quality traits in rice varieties.

5.4 Suggestions for further research

i. Sequencing of all the alleles should be carried out to better visualize the differences between them.

ii. Further analysis should be carried out using a larger number of samples and markers to come up with a more conclusive report on the discriminating power of microsatellite markers based on rice grain quality traits.

iii. Hybridization of distantly related rice varieties from different clusters should be carried out to obtain segregants with a high degree of hybrid vigor for the traits studied.

REFERENCES

Ahn, S. N., Bollich, C. N., McClung, A. M., & Tanksley, S. D. (1993). RFLP analysis of genomic regions associated with cooked-kernel elongation in rice. Theoretical and Applied Genetics, 87(1-2), 27-32.

Blair, M. W., & McCouch, S. R. (1997). Microsatellite and sequence-tagged site markers diagnostic for the rice bacterial leaf blight resistance gene xa-5. Theoretical and Applied Genetics, 95(1-2), 174-184.

Bligh, H. F. J., Blackhall, N. W., Edwards, K. J., & McClung, A. M. (1999). Using amplified fragment length polymorphisms and simple sequence length polymorphisms to identify cultivars of brown and white milled rice. Crop Science, 39(6), 1715-1721.

Bradbury, L. M., Fitzgerald, T. L., Henry, R. J., Jin, Q., & Waters, D. L. (2005a). The gene for fragrance in rice. Plant Biotechnology Journal, 3(3), 363-370.

Bradbury, L. M., Henry, R. J., Jin, Q., Reinke, R. F., & Waters, D. L. (2005b). A perfect marker for fragrance genotyping in rice. Molecular Breeding, 16(4), 279-283.

Brar, D. S., & Khush, G. S. (2003). Utilization of wild species of genus Oryza in rice improvement. Monograph on Genus Oryza. Plymouth, 283-309.

Brondani, R. P. V., Zucchi, M. I., Brondani, C., Rangel, P. H. N., Borba, T. C. D. O., Rangel, P. N., & Vencovsky, R. (2005). Genetic structure of wild rice Oryza glumaepatula populations in three Brazilian biomes using microsatellite markers. Genetica, 125(2-3), 115-123.

Cao, Q., Lu, B. R., Xia, H. U. I., Rong, J., Sala, F., Spada, A., & Grassi, F. (2006). Genetic Diversity and Origin of in China, Annals of Botany, 98(6), 1241–1252.

Causse, M. A., Fulton, T. M., Cho, Y. G., Ahn, S. N., Chunwongse, J., Wu, K., & Tanksley, S. D. (1994). Saturated molecular map of the rice genome based on an interspecific backcross population. Genetics, 138(4), 1251-1274.

Chakravarthi, B. K., & Naravaneni, R. (2006). SSR marker based DNA fingerprinting and diversity study in rice (Oryza sativa L). African Journal of Biotechnology, 5(9), 684-688.

69

Chang, T. T. (1976). The origin, evolution, cultivation, dissemination and diversification of Asian and African . Euphytica, 25(1), 425-441.

Chang, T.T. (2003). Origin, Domestication, and Diversification. In RiceOrigin, History, Technology, and Production, 3-25.

Chen, X., Temnykh, S., Xu, Y., Cho, Y. G., & McCouch, S. R. (1997). Development of a microsatellite framework map providing genome-wide coverage in rice (Oryza sativa L.). Theoretical and Applied Genetics, 95(4), 553-567.

Cho, Y. G., Ishii, T., Temnykh, S., Chen, X., Lipovich, L., McCouch, S. R.,& Cartinhour, S. (2000). Diversity of microsatellites derived from genomic libraries and GenBank sequences in rice (Oryza sativa L.). Theoretical and Applied Genetics, 100(5), 713-722.

Choudhury, A. T. M. A., & Kennedy, I. R. (2004). Prospects and potentials for systems of biological nitrogen fixation in sustainable rice production. Biology and Fertility of Soils, 39(4), 219-227.

Cordeiro, G. M., Christopher, M. J., Henry, R. J., & Reinke, R. F. (2002). Identification of microsatellite markers for fragrance in rice by analysis of the rice genome sequence. Molecular Breeding, 9(4), 245-250.

Cruz, N. D., & Khush, G. S. (2000). Rice grain quality evaluation procedures. Aromatic Rices, 15-28.

Devos, K. M., & Gale, M. (1992). The use of random amplified polymorphic DNA markers in wheat. Theoretical and Applied Genetics, 84(5-6), 567-572.

Duvick, D. N., Smith, J. S. C., & Cooper, M. (2010). Long term selection in a commercial hybrid maize breeding program. Janick. I. Plant Breeding Reviews. Part, 2(24), 109-152.

Esquinas-Alacazar J. (2005). Protecting crop genetic diversity for food security: political, ethical and technical challenges. Nature Revision on Genetics, 6, 946–953.

Ferrari, C.D., Cibele, D.S., &Luciana, L.V. (2007). Evaluation of polymerase chain reaction and DNA isolation protocols for detection of genetically modified soybean. International Journal of Food Science and Technology, 42, 1249– 1255.

Fitzgerald, M. A., McCouch, S. R., & Hall, R. D. (2009). Not just a grain of rice: The quest for quality. Trends in Plant Science, 14(3), 133-139. 70

Ganal, M. W., Altmann, T., & Röder, M. S. (2009). SNP identification in crop plants. Current opinion in Plant Biology, 12(2), 211-217. Garland, S., Lewin, L., Blakeney, A., Reinke, R., & Henry, R. (2000). PCR-based molecular markers for the fragrance gene in rice (Oryza sativa L.). Theoretical and Applied Genetics, 101(3), 364-371.

Ghneim Herrera, T., Posso Duque, D., Pérez Almeida, I., Torrealba Núñez, G., Pieters, A. J., Martinez, C. P., & Tohme, J. M. (2008). Assessment of genetic diversity in Venezuelan rice cultivars using simple sequence repeats markers. Electronic Journal of Biotechnology, 11(5), 3-4.

Glaszmann, J. C. (1987). Isozymes and classification of Asian rice varieties. Theoretical and Applied Genetics, 74(1), 21-30.

Govindaraj, P., Vinod, K. K., Arumugachamy, S., & Maheswaran, M. (2009). Analysing genetic control of cooked grain traits and gelatinization temperature in a double haploid population of rice by quantitative trait loci mapping. Euphytica, 166 (2), 165-176.

Hien, N. L., Sarhadi, W. A., Oikawa, Y., & Hirata, Y. (2007). Genetic diversity of morphological responses and the relationships among Asia aromatic rice (Oryza sativa L.) cultivars. Tropics, 16(4), 343-355.

Huang, X., Kurata, N., Wei, X., Wang, Z. X., Wang, A., Zhao, Q., & Han, B. (2012). A map of rice genome variation reveals the origin of cultivated rice. NatureInternational Weekly Journal of Science, 490(7421), 497-501.

Izawa, T., & Shimamoto, K. (1996). Becoming a model plant: the importance of rice to plant science. Trends in Plant Science, 1(3), 95-99.

Jain, N., Jain, S., Saini, N., & Jain, R. K. (2006). SSR analysis of chromosome 8 regions associated with aroma and cooked kernel elongation in Basmati rice. Euphytica, 152(2), 259-273. Jain, S., Jain, R. K., & McCouch, S. R. (2004). Genetic analysis of Indian aromatic and quality rice (Oryza sativa L.) germplasm using panels of fluorescently- labeled microsatellite markers. Theoretical and Applied Genetics, 109(5), 965-977. Jayamani, P., Negrao, S., Martins, M., Macas, B., & Oliveira, M. M. (2007). Genetic relatedness of Portuguese rice accessions from diverse origins as assessed by microsatellite markers. Crop Science, 47(2), 879-884. 71

Joshi, R. K., & Behera, L. (2007). Identification and differentiation of indigenous non-Basmati aromatic rice genotypes of India using microsatellite markers. African Journal of Biotechnology, 6(4), 348-354.

Kamarouthu, D. K. (2013). Study of Genetic Diversity in Karnataka Rice (Oryza Sativa) Landraces Using Trait Specific Simple Sequence Repeat (SSR) Markers.International Journal of Thesis Projects and Dissertations,1, (1), 45-70. Knapp, E. E., & Rice, K. J. (1998). Comparison of isozymes and quantitative traits for evaluating patterns of genetic variation in purple needle grass (Nassellapulchra). Conservation Biology, 12(5), 1031-1041.

Kumar, P., Gupta, V. K., Misra, A. K., Modi, D. R., & Pandey, B. K. (2009). Potential of molecular markers in plant biotechnology. Plant Omics Journal, 2(4), 141-162.

Lin, H. Y., Wu, Y. P., Hour, A. L., Ho, S. W., Wei, F. J., Hsing, Y. I., & Lin, Y. R. (2012). Genetic diversity of rice germplasm used in Taiwan breeding programs. Botanical. Studies, 53, 363-376.

Linares, O. F. (2002). African rice (Oryza glaberrima): history and future potential. Proceedings of the National Academy of Sciences, 99(25), 16360-16365.

Liu, K., & Muse, S. V. (2005). PowerMarker: integrated analysis environment for genetic marker data. Bioinformatics, 21(9), 2128-2129. Londo, J. P., Chiang, Y. C., Hung, K. H., Chiang, T. Y., & Schaal, B. A. (2006). Phylogeography of Asian wild rice, , reveals multiple independent domestications of cultivated rice, Oryza sativa. Proceedings of the National Academy of Sciences, 103(25), 9578-9583.

Lorieux, M., Petrov, M., Huang, N., Guiderdoni, E., &Ghesquiere, A. (1996). Aroma in rice: genetic analysis of a quantitative trait. Theoretical and Applied Genetics, 93(7), 1145-1151.

McCouch, S. R., Chen, X., Panaud, O., Temnykh, S., Xu, Y., Cho, Y. G., & Blair, M. (1997). Microsatellite marker development, mapping and applications in rice genetics and breeding. Plant Molecular Biology, 35(1-2), 89-99.

McCouch, S. R., Teytelman, L., Xu, Y., Lobos, K. B., Clare, K., Walton, M., & Stein, L. (2002). Development and mapping of 2240 new SSR markers for rice (Oryza sativa L.). DNA Research, 9(6), 199-207.

Meti, N., Samal, K. C., Bastia, D. N., & Rout, G. R. (2013). Genetic diversity analysis in aromatic rice genotypes using microsatellite based simple 72

sequence repeats (SSR) marker. African Journal of Biotechnology, 12(27), 4238-4250. Mwangi, J. K., Murage, H., Ateka, E. M., & Nyende, A. B. (2013). Agronomic diversity among rice (Oryza sativa L.) lines in a germplasm collection from Kenya. In Scientific Conference Proceedings, 730-743.

Neeraja, C. N., Hariprasad, A. S., Malathi, S., & Siddiq, E. A. (2005). Characterization of tall landraces of rice (Oryza sativa L.) using gene-derived simple sequence repeats. Current Science, 88(1), 149-152. Ni, J., Colowit, P. M., & Mackill, D. J. (2002). Evaluation of genetic diversity in rice subspecies using microsatellite markers. Crop Science, 42(2), 601-607. Ouma-Onyango, A. (2014). Promotion of Rice Production: A Likely Step to Making Kenya Food Secure. An Assessment of Current Production and Potential. Developing Country Studies, 4(19), 26-31.

Parikh, M., Rastogi, N. K., & Sarawgi, A. K. (2013). Variability in grain quality traits of aromatic rice (Oryza sativa L.). Bangladesh Journal of Agricultural Research, 37(4), 551-558. Peakall, R., & Smouse, P. E. (2012). GenAlex 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update. Bioinformatics, 28(19), 2537-2539.

Peano, C., Samson, M. C., Palmieri, L., Gulli, M., & Marmiroli, N. (2004). Qualitative and quantitative evaluation of the genomic DNA extracted from GMO and non-GMO foodstuffs with four different extraction methods. Journal of Agricultural and Food Chemistry, 52(23), 6962-6968.

Pervaiz, Z. H., Rabbani, M. A., Pearce, S. R., & Malik, S. A. (2009). Determination of genetic variability of Asian rice (Oryza sativa L.) varieties using microsatellite markers. African Journal of Biotechnology, 8(21), 5641- 5651. Pusadee, T., Jamjod, S., Chiang, Y. C., Rerkasem, B., & Schaal, B. A. (2009). Genetic structure and isolation by distance in a landrace of Thai rice. Proceedings of the National Academy of Sciences, 106(33), 13880-13885.

Ravi, M., Geethanjali, S., Sameeyafarheen, F., & Maheswaran, M. (2003). Molecular marker based genetic diversity analysis in rice (Oryza sativa L.) using RAPD and SSR markers. Euphytica, 133(2), 243-252. Ray, A., Deb D., Ray, R., &Chattopadhayay, B. (2013). Phenotypic characters of rice landraces reveal independent lineages of short-grain aromatic indica rice. Journal for Plant Sciences, 5, 258-264.

73

Roy, S., Banerjee, A., Senapati, B. K., & Sarkar, G. (2012). Comparative analysis of agro‐morphology, grain quality and aroma traits of traditional and Basmati‐type genotypes of rice, Oryza sativa L. Plant Breeding, 131(4), 486-492.

Saini, N., Jain, N., Jain, S., & Jain, R. K. (2004). Assessment of genetic diversity within and among Basmati and non-Basmati rice varieties using AFLP, ISSR and SSR markers. Euphytica, 140(3), 133-146. Sang, T., & Ge, S. (2007). The puzzle of rice domestication. Journal of Integrative Plant Biology, 49(6), 760.

Sakthivel, K., Sundaram, R. M., Rani, N. S., Balachandran, S. M., & Neeraja, C. N. (2009). Genetic and molecular basis of fragrance in rice. Biotechnology Advances, 27(4), 468-473.

Shah, S. M., Naveed, S. A., & Arif, M. (2013). Genetic diversity in basmati and non-basmati rice varieties based on microsatellite markers. Pakistan Journal of Botany, 45, 423-431. Shahidullah, S. M., Hanafi, M. M., Ashrafuzzaman, M., Ismail, M. R., & Khair, A. (2009). Genetic diversity in grain quality and nutrition of aromatic rices. African Journal of Biotechnology, 8(7), 1238-1246. Singh, N., Choudhury, D. R., Singh, A. K., Kumar, S., Srinivasan, K., Tyagi, R. K., & Singh, R. (2013). Comparison of SSR and SNP markers in estimation of genetic diversity and population structure of Indian rice varieties. Plos one Journal, 8(12), e84136. Siwach, P., Jain, S., Saini, N., Chowdhury, V. K., & Jain, R. K. (2004). Allelic diversity among Basmati and non-Basmati long-grain indica rice varieties using microsatellite markers. Journal of Plant Biochemistry and Biotechnology, 13(1), 25-32. Smith, B. D. (2001). Documenting plant domestication: the consilience of biological and archaeological approaches. Proceedings of the National Academy of Sciences, 98(4), 1324-1326.

Staub, J. E., Serquen, F. C., & Gupta, M. (1996). Genetic markers, map construction, and their application in plant breeding. Horticultural Science, 31(5), 729-741. Sweeney, M., & McCouch, S. (2007). The complex history of the domestication of rice. Annals of Botany, 100(5), 951-957.

Temnykh, S., Park, W. D., Ayres, N., Cartinhour, S., Hauck, N., Lipovich, L., & McCouch, S. R. (2000). Mapping and genome organization of 74

microsatellite sequences in rice (Oryza sativa L.). Theoretical and Applied Genetics, 100(5), 697-712.

Vanniarajan, C., Vinod, K. K., & Pereira, A. (2012). Molecular evaluation of genetic diversity and association studies in rice (Oryza sativa L.). Journal of Genetics, 91(1), 9-19.

Varnamkhasti, M. G., Mobli, H., Jafari, A., Keyhani, A. R., Soltanabadi, M. H., Rafiee, S., & Kheiralipour, K. (2008). Some physical properties of rough rice (Oryza Sativa L.) grain. Journal of Cereal Science, 47(3), 496-501.

Vaughan, D. A., Kadowaki, K. I., Kaga, A., & Tomooka, N. (2005). On the phylogeny and biogeography of the genus Oryza. Breeding Science, 55(2), 113-122.

Vekemans, X., Beauwens, T., Lemaire, M., & Roldán‐Ruiz, I. (2002). Data from amplified fragment length polymorphism (AFLP) markers show indication of size homoplasy and of a relationship between degree of homoplasy and fragment size. Molecular Ecology, 11(1), 139-151. Villa, T. C. C., Maxted, N., Scholten, M., & Ford-Lloyd, B. (2005). Defining and identifying crop landraces. Plant Genetic Resources: characterization and utilization, 3(3), 373-384. Virk, P. S., Zhu, J., Newbury, H. J., Bryan, G. J., Jackson, M. T., & Ford-Lloyd, B. V. (2000). Effectiveness of different classes of molecular marker for classifying and revealing variation in rice (Oryza sativa L.) germplasm. Euphytica, 112(3), 275-284. Vos, P., Hogers, R., Bleeker, M., Reijans, M., Van de Lee, T., Hornes, M., & Zabeau, M. (1995). AFLP: a new technique for DNA fingerprinting. Nucleic Acids Research, 23(21), 4407-4414. Vlachos, A., & Arvanitoyannis, I. S. (2008). A review of rice authenticity/adulteration methods and results. Critical Reviews in Food Science and Nutrition, 48(6), 553-598. Williams, J. G., Kubelik, A. R., Livak, K. J., Rafalski, J. A., & Tingey, S. V. (1990). DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Research, 18(22), 6531-6535. Yadav, R. B., Khatkar, B. S., & Yadav, B. S. (2007). Morphological, physicochemical and cooking properties of some Indian rice (Oryza sativa L.) cultivars. Journal of Agricultural Technology, 3(2), 203-210.

Yoshihashi, T., Huong, N. T. T., Surojanametakul, V., Tungtrakul, P., & Varanyanond, W. (2005). Effect of Storage Conditions on 2–Acetyl‐1– pyrroline Content in Aromatic Rice Variety, Khao Dawk Mali 105. Journal of Food Science, 70(1), S34-S37. 75

Zhao, K., Tung, C. W., Eizenga, G. C., Wright, M. H., Ali, M. L., Price, A. H.,& McCouch, S. R. (2011). Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature C ommunications, 2, 467.

76

APPENDICES

APPENDIX 1: One-way ANOVA: GL versus GENOTYPE

Analysis of Variance

Source      DF   Adj SS   Adj MS   F-Value   P-Value
GENOTYPE    12    32.61   2.7177     12.03     0.000
Error      117    26.43   0.2259
Total      129    59.04

Means

GENOTYPE     N   Mean      StDev    95% CI
VARIETY A   10    9.199    0.374    (8.901, 9.497)
VARIETY B   10    9.543    0.496    (9.245, 9.841)
VARIETY C   10    9.225    0.378    (8.927, 9.523)
VARIETY D   10    9.302    0.358    (9.004, 9.600)
VARIETY E   10    8.9990   0.3105   (8.7014, 9.2966)
VARIETY F   10    9.1380   0.2175   (8.8404, 9.4356)
VARIETY G   10    9.929    0.353    (9.631, 10.227)
VARIETY H   10   10.020    0.601    (9.722, 10.318)
VARIETY I   10    9.072    0.633    (8.774, 9.370)
VARIETY J   10    9.952    0.465    (9.654, 10.250)
VARIETY K   10    9.855    0.417    (9.557, 10.153)
VARIETY L   10   10.666    0.654    (10.368, 10.964)
VARIETY M   10   10.243    0.657    (9.945, 10.541)

Pooled StDev = 0.475269
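The entries of each one-way ANOVA table in these appendices are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, the F-value is the ratio of the two mean squares, and the pooled standard deviation is the square root of the error mean square. The short Python sketch below recomputes these quantities from the rounded sums of squares in Appendix 1. The function name `anova_summary` is hypothetical and is not part of the software used in the study; because the inputs are rounded, the results agree with the reported values only to about three significant figures.

```python
import math

def anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Recompute mean squares, F ratio and pooled SD from sums of squares."""
    ms_factor = ss_factor / df_factor   # mean square for GENOTYPE
    ms_error = ss_error / df_error      # mean square for Error
    return {
        "ms_factor": ms_factor,
        "ms_error": ms_error,
        "f_value": ms_factor / ms_error,
        "pooled_sd": math.sqrt(ms_error),
    }

# Rounded values taken from Appendix 1 (GL versus GENOTYPE).
result = anova_summary(ss_factor=32.61, df_factor=12,
                       ss_error=26.43, df_error=117)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The same function can be applied to the sums of squares in Appendices 3, 5, 7, 9, 11 and 13 as a quick consistency check on the tabulated values.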

APPENDIX 2: Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

GENOTYPE     N   Mean      Grouping
VARIETY L   10   10.666    A
VARIETY M   10   10.243    A B
VARIETY H   10   10.020    A B C
VARIETY J   10    9.952    A B C
VARIETY G   10    9.929      B C D
VARIETY K   10    9.855      B C D E
VARIETY B   10    9.543      B C D E F
VARIETY D   10    9.302        C D E F
VARIETY C   10    9.225          D E F
VARIETY A   10    9.199            E F
VARIETY F   10    9.1380           E F
VARIETY I   10    9.072              F
VARIETY E   10    8.9990             F

Means that do not share a letter are significantly different.
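The grouping rule stated above can be applied mechanically: two varieties differ significantly at the 95% level only if their grouping strings have no letter in common. The following Python sketch illustrates this with the letters from Appendix 2. The dictionary `GROUPING` and the function `differ_significantly` are illustrative names introduced here, not output of the statistical package.

```python
# Grouping letters copied from Appendix 2 (GL versus GENOTYPE),
# keyed by variety label with the spaces between letters removed.
GROUPING = {
    "L": "A", "M": "AB", "H": "ABC", "J": "ABC", "G": "BCD",
    "K": "BCDE", "B": "BCDEF", "D": "CDEF", "C": "DEF",
    "A": "EF", "F": "EF", "I": "F", "E": "F",
}

def differ_significantly(variety_1, variety_2, grouping=GROUPING):
    """Return True when the two varieties share no Tukey grouping letter."""
    return set(grouping[variety_1]).isdisjoint(grouping[variety_2])

# Varieties L and M share the letter A; L and E share no letter.
print(differ_significantly("L", "M"))
print(differ_significantly("L", "E"))
```

Replacing the dictionary with the letters from any of the other Tukey tables gives the corresponding pairwise conclusions for that trait.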


APPENDIX 3: One-way ANOVA: GB versus GENOTYPE

Analysis of Variance

Source      DF   Adj SS   Adj MS    F-Value   P-Value
GENOTYPE    12   0.7071   0.05893      4.45     0.000
Error      117   1.5507   0.01325
Total      129   2.2578

Means

GENOTYPE     N   Mean     StDev    95% CI
VARIETY A   10   1.9940   0.1041   (1.9219, 2.0661)
VARIETY B   10   1.8500   0.0796   (1.7779, 1.9221)
VARIETY C   10   1.8430   0.0617   (1.7709, 1.9151)
VARIETY D   10   2.0110   0.1254   (1.9389, 2.0831)
VARIETY E   10   1.8460   0.0593   (1.7739, 1.9181)
VARIETY F   10   2.0280   0.1605   (1.9559, 2.1001)
VARIETY G   10   2.0490   0.0976   (1.9769, 2.1211)
VARIETY H   10   1.9910   0.1687   (1.9189, 2.0631)
VARIETY I   10   1.9890   0.0737   (1.9169, 2.0611)
VARIETY J   10   2.0120   0.1163   (1.9399, 2.0841)
VARIETY K   10   1.9390   0.1171   (1.8669, 2.0111)
VARIETY L   10   2.0170   0.1559   (1.9449, 2.0891)
VARIETY M   10   2.0550   0.1068   (1.9829, 2.1271)

Pooled StDev = 0.115126

APPENDIX 4: Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

GENOTYPE     N   Mean     Grouping
VARIETY M   10   2.0550   A
VARIETY G   10   2.0490   A
VARIETY F   10   2.0280   A
VARIETY L   10   2.0170   A B
VARIETY J   10   2.0120   A B
VARIETY D   10   2.0110   A B
VARIETY A   10   1.9940   A B
VARIETY H   10   1.9910   A B
VARIETY I   10   1.9890   A B
VARIETY K   10   1.9390   A B
VARIETY B   10   1.8500     B
VARIETY E   10   1.8460     B
VARIETY C   10   1.8430     B

Means that do not share a letter are significantly different.


APPENDIX 5: One-way ANOVA: GL/B versus GENOTYPE

Analysis of Variance

Source      DF   Adj SS   Adj MS    F-Value   P-Value
GENOTYPE    12    7.085   0.59042      6.98     0.000
Error      117    9.902   0.08463
Total      129   16.987

Means

GENOTYPE     N   Mean     StDev    95% CI
VARIETY A   10   4.6253   0.3088   (4.4431, 4.8075)
VARIETY B   10   5.1593   0.1830   (4.9771, 5.3415)
VARIETY C   10   5.0051   0.1006   (4.8229, 5.1872)
VARIETY D   10   4.6397   0.3048   (4.4575, 4.8219)
VARIETY E   10   4.8766   0.1499   (4.6944, 5.0588)
VARIETY F   10   4.531    0.369    (4.349, 4.713)
VARIETY G   10   4.8519   0.2007   (4.6697, 5.0341)
VARIETY H   10   5.057    0.414    (4.875, 5.239)
VARIETY I   10   4.565    0.331    (4.383, 4.747)
VARIETY J   10   4.961    0.352    (4.778, 5.143)
VARIETY K   10   5.096    0.318    (4.914, 5.278)
VARIETY L   10   5.3015   0.2856   (5.1193, 5.4837)
VARIETY M   10   4.9890   0.2850   (4.8068, 5.1712)

Pooled StDev = 0.290920

APPENDIX 6: Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

GENOTYPE     N   Mean     Grouping
VARIETY L   10   5.3015   A
VARIETY B   10   5.1593   A B
VARIETY K   10   5.096    A B
VARIETY H   10   5.057    A B C
VARIETY C   10   5.0051   A B C
VARIETY M   10   4.9890   A B C D
VARIETY J   10   4.961    A B C D E
VARIETY E   10   4.8766   A B C D E
VARIETY G   10   4.8519     B C D E
VARIETY D   10   4.6397       C D E
VARIETY A   10   4.6253       C D E
VARIETY I   10   4.565          D E
VARIETY F   10   4.531            E

Means that do not share a letter are significantly different.


APPENDIX 7: One-way ANOVA: KL versus GENOTYPE

Analysis of Variance

Source      DF   Adj SS   Adj MS   F-Value   P-Value
GENOTYPE    12    19.14   1.5951     11.99     0.000
Error      117    15.57   0.1331
Total      129    34.71

Means

GENOTYPE     N   Mean     StDev    95% CI
VARIETY A   10   6.6190   0.2957   (6.3905, 6.8475)
VARIETY B   10   7.112    0.386    (6.884, 7.340)
VARIETY C   10   6.9310   0.2659   (6.7025, 7.1595)
VARIETY D   10   6.6250   0.2878   (6.3965, 6.8535)
VARIETY E   10   6.522    0.346    (6.294, 6.750)
VARIETY F   10   6.4350   0.2941   (6.2065, 6.6635)
VARIETY G   10   7.139    0.345    (6.911, 7.367)
VARIETY H   10   7.501    0.324    (7.273, 7.729)
VARIETY I   10   6.600    0.503    (6.372, 6.828)
VARIETY J   10   6.9810   0.2707   (6.7525, 7.2095)
VARIETY K   10   7.0780   0.3128   (6.8495, 7.3065)
VARIETY L   10   7.540    0.437    (7.312, 7.768)
VARIETY M   10   7.586    0.543    (7.358, 7.814)

Pooled StDev = 0.364793

APPENDIX 8: Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

GENOTYPE     N   Mean     Grouping
VARIETY M   10   7.586    A
VARIETY L   10   7.540    A
VARIETY H   10   7.501    A B
VARIETY G   10   7.139    A B C
VARIETY B   10   7.112    A B C
VARIETY K   10   7.0780   A B C
VARIETY J   10   6.9810     B C D
VARIETY C   10   6.9310       C D
VARIETY D   10   6.6250       C D
VARIETY A   10   6.6190       C D
VARIETY I   10   6.600        C D
VARIETY E   10   6.522          D
VARIETY F   10   6.4350         D

Means that do not share a letter are significantly different.


APPENDIX 9: One-way ANOVA: KB versus GENOTYPE

Analysis of Variance

Source      DF   Adj SS   Adj MS     F-Value   P-Value
GENOTYPE    12   0.6485   0.054042      5.64     0.000
Error      117   1.1208   0.009580
Total      129   1.7693

Means

GENOTYPE     N   Mean     StDev    95% CI
VARIETY A   10   1.7620   0.0507   (1.7007, 1.8233)
VARIETY B   10   1.6410   0.0803   (1.5797, 1.7023)
VARIETY C   10   1.6590   0.0453   (1.5977, 1.7203)
VARIETY D   10   1.7490   0.1590   (1.6877, 1.8103)
VARIETY E   10   1.6430   0.0757   (1.5817, 1.7043)
VARIETY F   10   1.8680   0.1268   (1.8067, 1.9293)
VARIETY G   10   1.7880   0.1053   (1.7267, 1.8493)
VARIETY H   10   1.8140   0.0638   (1.7527, 1.8753)
VARIETY I   10   1.7860   0.0580   (1.7247, 1.8473)
VARIETY J   10   1.7830   0.0904   (1.7217, 1.8443)
VARIETY K   10   1.7440   0.1419   (1.6827, 1.8053)
VARIETY L   10   1.8040   0.0821   (1.7427, 1.8653)
VARIETY M   10   1.8490   0.1131   (1.7877, 1.9103)

Pooled StDev = 0.0978757

APPENDIX 10: Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

GENOTYPE     N   Mean     Grouping
VARIETY F   10   1.8680   A
VARIETY M   10   1.8490   A
VARIETY H   10   1.8140   A
VARIETY L   10   1.8040   A B
VARIETY G   10   1.7880   A B C
VARIETY I   10   1.7860   A B C
VARIETY J   10   1.7830   A B C
VARIETY A   10   1.7620   A B C
VARIETY D   10   1.7490   A B C
VARIETY K   10   1.7440   A B C
VARIETY C   10   1.6590     B C
VARIETY E   10   1.6430       C
VARIETY B   10   1.6410       C

Means that do not share a letter are significantly different.


APPENDIX 11: One-way ANOVA: KL/B versus GENOTYPE

Analysis of Variance

Source      DF   Adj SS   Adj MS    F-Value   P-Value
GENOTYPE    12    6.961   0.58007     13.67     0.000
Error      117    4.965   0.04243
Total      129   11.926

Means

GENOTYPE     N   Mean     StDev    95% CI
VARIETY A   10   3.7590   0.1898   (3.6300, 3.8880)
VARIETY B   10   4.3362   0.1751   (4.2072, 4.4652)
VARIETY C   10   4.1773   0.0901   (4.0483, 4.3063)
VARIETY D   10   3.8079   0.2642   (3.6789, 3.9369)
VARIETY E   10   3.9711   0.1522   (3.8421, 4.1001)
VARIETY F   10   3.4524   0.1575   (3.3234, 3.5814)
VARIETY G   10   3.9974   0.1511   (3.8684, 4.1264)
VARIETY H   10   4.1393   0.2227   (4.0103, 4.2683)
VARIETY I   10   3.6965   0.2715   (3.5675, 3.8255)
VARIETY J   10   3.9237   0.2376   (3.7947, 4.0528)
VARIETY K   10   4.0749   0.2629   (3.9459, 4.2039)
VARIETY L   10   4.1797   0.1506   (4.0507, 4.3087)
VARIETY M   10   4.1076   0.2542   (3.9786, 4.2367)

Pooled StDev = 0.205995

APPENDIX 12: Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

GENOTYPE     N   Mean     Grouping
VARIETY B   10   4.3362   A
VARIETY L   10   4.1797   A B
VARIETY C   10   4.1773   A B
VARIETY H   10   4.1393   A B
VARIETY M   10   4.1076   A B C
VARIETY K   10   4.0749   A B C
VARIETY G   10   3.9974     B C D
VARIETY E   10   3.9711     B C D
VARIETY J   10   3.9237     B C D
VARIETY D   10   3.8079       C D
VARIETY A   10   3.7590         D E
VARIETY I   10   3.6965         D E
VARIETY F   10   3.4524           E

Means that do not share a letter are significantly different.


APPENDIX 13: One-way ANOVA: 10GW versus GENOTYPE

Analysis of Variance

Source      DF   Adj SS    Adj MS     F-Value   P-Value
GENOTYPE    12   0.16756   0.013963     66.09     0.000
Error      117   0.02472   0.000211
Total      129   0.19228

Means

GENOTYPE     N   Mean      StDev     95% CI
VARIETY A   10   0.28900   0.01101   (0.27990, 0.29810)
VARIETY B   10   0.23500   0.01581   (0.22590, 0.24410)
VARIETY C   10   0.18200   0.01135   (0.17290, 0.19110)
VARIETY D   10   0.26200   0.01317   (0.25290, 0.27110)
VARIETY E   10   0.20600   0.01075   (0.19690, 0.21510)
VARIETY F   10   0.27200   0.01476   (0.26290, 0.28110)
VARIETY G   10   0.28500   0.01434   (0.27590, 0.29410)
VARIETY H   10   0.27300   0.01337   (0.26390, 0.28210)
VARIETY I   10   0.26500   0.00850   (0.25590, 0.27410)
VARIETY J   10   0.27500   0.01354   (0.26590, 0.28410)
VARIETY K   10   0.27500   0.01269   (0.26590, 0.28410)
VARIETY L   10   0.28400   0.01776   (0.27490, 0.29310)
VARIETY M   10   0.32700   0.02497   (0.31790, 0.33610)

Pooled StDev = 0.0145355

APPENDIX 14: Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

GENOTYPE     N   Mean      Grouping
VARIETY M   10   0.32700   A
VARIETY A   10   0.28900     B
VARIETY G   10   0.28500     B C
VARIETY L   10   0.28400     B C
VARIETY K   10   0.27500     B C D
VARIETY J   10   0.27500     B C D
VARIETY H   10   0.27300     B C D
VARIETY F   10   0.27200     B C D
VARIETY I   10   0.26500       C D
VARIETY D   10   0.26200         D
VARIETY B   10   0.23500           E
VARIETY E   10   0.20600             F
VARIETY C   10   0.18200               G

Means that do not share a letter are significantly different.


APPENDIX 15: Principal component analysis based on genetic distance matrix

Eigenvalues by axis and sample eigenvectors

Axis No.          1        2        3        4        5        6
Eigenvalue    3.962    3.433    2.134    1.819    1.571    1.252
Kenya         0.388    0.263   -0.054   -0.594    0.660   -0.061
Kenya        -0.489    0.621    0.083    0.036    0.286   -0.297
Kenya        -0.236    0.632    0.080    0.330   -0.062   -0.503
Kenya         0.040    0.676    0.415   -0.296   -0.636    0.256
Kenya         0.094   -0.968    0.131   -0.326    0.199   -0.370
Tanzania      0.830    0.449    0.149    0.152   -0.113    0.170
Tanzania     -0.727   -0.346   -0.171   -0.384   -0.132    0.323
Tanzania      0.455   -0.759   -0.086    0.127   -0.640   -0.372
Tanzania     -0.437   -0.170   -0.258    0.909    0.223    0.103
Tanzania      0.916   -0.165   -0.387    0.238    0.211    0.503
Tanzania     -1.016   -0.224   -0.027   -0.083   -0.054    0.359
Tanzania      0.067    0.297   -0.844   -0.253   -0.153   -0.188
Tanzania      0.117   -0.305    0.969    0.143    0.210    0.076

Percentage of variation explained by the first 3 axes

Axis          1        2        3
%         27.96    24.23    15.06
Cum %     27.96    52.19    67.25
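The percentages above are consistent with dividing each eigenvalue by the sum of the six eigenvalues listed in this appendix. The Python sketch below reproduces that arithmetic. Treating the six listed axes as the full total is an assumption made here for illustration; the normalisation used by the analysis software is not stated in this appendix. The cumulative values can differ from the tabulated ones in the second decimal place because the table accumulates rounded percentages.

```python
from itertools import accumulate

# Eigenvalues for axes 1-6 as reported in Appendix 15.
eigenvalues = [3.962, 3.433, 2.134, 1.819, 1.571, 1.252]

# Assumption: only the six listed axes contribute to the total variation.
total = sum(eigenvalues)
percent = [100 * value / total for value in eigenvalues]
cumulative = list(accumulate(percent))

for axis, (pct, cum) in enumerate(zip(percent[:3], cumulative[:3]), start=1):
    print(f"Axis {axis}: {pct:.2f}% (cumulative {cum:.2f}%)")
```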
