TRANSCRIPTOME ANALYSIS AND SNP MARKER IDENTIFICATION OF FINGER MILLET (ELEUSINE CORACANA (L.) GAERTN) AT CRITICAL STAGES OF STRIGA (STRIGA HERMONTHICA) INFESTATION

Mikwa Erick Owuor (BSc. Biology) I56/24662/2012

A Thesis Submitted in Partial Fulfilment of the Requirements for the Award of the Degree of Master of Science (Biotechnology) in the School of Pure and Applied Sciences of Kenyatta University

APRIL, 2019

DECLARATION

I, Mikwa Erick Owuor, duly declare that the work presented in this thesis is my original work and has not been presented for a degree or any other award in any other university or any other institution.

Mikwa Erick Owuor (B.Sc. Biology) I56/24662/2012

Signature……………………………. Date…………………………………

We hereby confirm that the candidate carried out the work reported in this thesis under our supervision

Dr. Mark Wamalwa Signature…………………...Date………………………… Department of Biochemistry and Biotechnology, Kenyatta University

Dr. Richard O. Oduor Signature……………...... Date………………………. Department of Biochemistry and Biotechnology, Kenyatta University

Dr. Damaris A. Odeny Signature…………………...Date………………………… Genetic Gains, ICRISAT-ESA, Nairobi

DEDICATION

This thesis is dedicated to my mother, Zilpah Mikwa, whose sacrifice, belief, motivation and self-denial have led me this far.

ACKNOWLEDGEMENTS

I would like to thank God Almighty for His guidance, strength and health to the end of my course. I would like to thank my supervisors, Dr. Mark Wamalwa, Dr. Damaris Odeny and Dr. Richard Oduor, for their support, encouragement and supervision throughout the course. I appreciate the effort and time they took to correct the thesis and improve its clarity. I am very grateful to Dr. Damaris Odeny specifically for believing in me and accepting me into the finger millet project at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT). Many thanks to Prof. Mathew Dida (Department of Botany and Horticulture, Maseno University) and Dr. Rasha Ali (Wad Medani Agricultural Research Station, Sudan) for providing the materials I needed for the project.

At the Biosciences eastern and central Africa–International Livestock Research Institute (BecA-ILRI) Hub, I am grateful to Dr. Francesca Stomeo, Solomon Maina, Mercy Macharia, James Oguya and Joyce Njuguna for their support during my fellowship. At ICRISAT, I thank Samuel Manthi, Rajneesh Palwaal and Vincent Njunge for their support during my fellowship in their lab. I especially thank the Dryland group, through the CGIAR Research Program on Dryland Cereals (CRP-DC), for sponsoring part of my MSc. and part of my project. I also thank the BecA-ILRI Hub, under the Africa Bioscience Challenge Fund (ABCF) program, and the Bio-resources Innovations Network for Eastern Africa Development (Bio-innovate) for sponsoring the first part of my project and making it part of the finger millet whole genome annotation project.

I am very grateful to my family: my mum, my sisters Ruth and Emma, and my brothers Fred and Zachary, for their tremendous love and support. I further extend my gratitude to my fellow students and colleagues, including Davis, Sandra, Godfrey, Easter, Milcah, Umar and Erick, for their continued support and advice whenever I needed it. Thank you very much.

TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
ABBREVIATIONS AND ACRONYMS
ABSTRACT
CHAPTER ONE
GENERAL INTRODUCTION
1.1 Background information
1.2 Problem statement and Justification
1.3 Null Hypotheses
1.4 Objectives
1.4.1 General Objectives
1.4.2 Specific Objectives
1.5 Significance of the study
CHAPTER TWO
LITERATURE REVIEW
2.1 Finger millet origin, distribution and adaptation
2.2 Global economic importance
2.3 Taxonomy and botany
2.4
2.5 Production constraints
2.5.1 Striga on finger millet
2.5.2 Striga life cycle
2.5.3 Striga resistance mechanisms
2.5.4 biosynthetic pathways and pre-germination resistance
2.5.5 Gene for gene resistance and Striga virulence
2.5.6 Mechanisms of Striga resistance in and
2.5.7 Control strategies
2.5.8 Screening for Striga resistance breeding resources
2.6 Genomics-assisted breeding in finger millet
2.6.1 Next Generation Sequencing
2.6.2 Transcript analysis
2.6.3 Single nucleotide polymorphisms (SNPs) identification
CHAPTER THREE
MATERIALS AND METHODS
3.1 Agronomic competence of the finger millet accessions and in vitro response to Striga exposure
3.2 Infecting of pre-germinated finger millet seedlings with Striga
3.3 RNA extraction
3.4 Library preparation and sequencing
3.5 Quality control, de novo assembly of the sequences, and abundance estimation
3.6 Differential gene expression analysis
3.7 Gene prediction, annotation and classification
3.8 SNP analysis
CHAPTER FOUR
RESULTS
4.1 Agronomic traits scoring in the screen house
4.2 Critical Striga infestation stages and laboratory screening for resistance and susceptibility at each stage
4.3 Quality of RNA and the cDNA library
4.4 Transcriptome sequencing and de novo assembly
4.5 Functional annotation and classification of transcripts
4.6 Differentially expressed transcripts
4.7 SNPs identification
CHAPTER FIVE
DISCUSSIONS, CONCLUSIONS AND RECOMMENDATIONS
5.1 Discussion
5.1.1 Agronomic competence and in vitro Striga response evaluation
5.1.2 Differential expression and annotation of the transcripts
5.1.3 Functional annotation of Single Nucleotide Polymorphisms (SNPs)
5.2 Conclusions
5.3 Recommendations and Suggestions for Future Research
REFERENCES
APPENDICES

LIST OF TABLES

Table 3.1: Finger millet accessions

Table 3.2: Striga infestation and/or RNA extraction time points

Table 4.1: Screen house agronomical traits scores

Table 4.2: Total number of reads for all the accessions

Table 4.3: Statistical summary of the de novo assembly results per finger millet accession

Table 4.4: Number of differentially expressed transcripts at a false discovery rate (FDR) of 1% and a log2FC of 4 between infected and control samples per stage for each finger millet accession

Table 4.5: Top ten Differentially Expressed Transcripts (both up-regulated and down-regulated) for all the accessions at the attachment stage

Table 4.6: Differentially Expressed (5% FDR) transcripts (both up-regulated and down-regulated) across all the accessions at 3 dpi

Table 4.7: Differentially Expressed (5% FDR) transcripts (both up-regulated and down-regulated) across the accessions at 5 dpi

Table 4.8: Differentially Expressed (5% FDR) transcripts (both up-regulated and down-regulated) across the accessions at 7 dpi

Table 4.9: Homoeologous SNPs and INDELS identified from de novo assembly of each accession

Table 4.10: Non-Homoeologous Bi-allelic SNPs identified from de novo assembly of each accession

Table 4.11: The distribution of Ts/Tv coverages of Non-Homoeologous Bi-allelic SNPs

LIST OF FIGURES

Figure 4.1: The different ear shapes of the four accessions at maturity

Figure 4.2: Optimal Striga infestation stages

Figure 4.3: A comparison of the means of Striga counts per accession in all the four seedlings per petri dish at 3 and 9 days post infestation (dpi)

Figure 4.4: Tukey's post hoc analysis of the group means of number of Striga at 3 and 9 days post infection

Figure 4.5: Denaturing gel of the 24 E. coracana performed as a quality check before proceeding to library preparation

Figure 4.6: Bio-analyzer gel (A) for the first twelve samples

Figure 4.7: Venn diagram of transcript homology in four databases for each accession

Figure 4.8: Global view of Gene Ontology in GuluE

Figure 4.9: Global view of Gene Ontology in IE2396

Figure 4.10: Global view of Gene Ontology in IE2459

Figure 4.11: Global view of Gene Ontology in White Sel6

Figure 4.12: Cumulative view of Cluster of Orthologous Groups in all the accessions

Figure 4.13: Major KEGG pathways per accession

Figure 4.14: classes

Figure 4.15: Volcano plot showing differentially expressed transcripts at 5 dpi

LIST OF APPENDICES

Appendix 1: Number of reads of GuluE accession (Striga tolerant)

Appendix 2: Number of reads of White Sel6 accession (Striga susceptible)

Appendix 3: Number of reads of IE2396 accession (Striga tolerant)

Appendix 4: Number of reads of IE2459 accession (Striga susceptible)

Appendix 5: Statistical summary of the number of annotated transcripts using BLASTX e-value 1e-3 per finger millet accession

Appendix 6: Summary of KEGG Pathway mapping per accession

Appendix 7: Attachment stage commonly Annotated Transcripts against Uniprot Viridiplantae databases at an e-value of 1e-3

Appendix 8: Summary of Panther Protein Classes mapping per accession

Appendix 9: Delayed infection in IE2396 after 5 dpi

Appendix 10: Top ten differentially expressed transcripts at 3 dpi

Appendix 11: Top ten differentially expressed transcripts at 5 dpi

Appendix 12: Top ten differentially expressed transcripts at 7 dpi

Appendix 13: Distribution of Bi-allelic SNPs Ts/Tv ratio across all accessions

Appendix 14: Statistical summary of the de novo assembly results per finger millet accession

Appendix 15: IE2459 Samples vs transcripts heatmap

Appendix 16: White Sel6 Samples vs transcripts heatmap

Appendix 17: IE2396 Samples vs transcripts heatmap

Appendix 18: GuluE Samples vs transcripts heatmap

Appendix 19: Summary of the Differentially Expressed transcripts mapping to KEGG Pathways

Appendix 20: Phenylpropanoid Biosynthesis pathway

Appendix 21: Tetracycline Biosynthesis pathway

Appendix 22: Purine Biosynthesis pathway

ABBREVIATIONS AND ACRONYMS

ABCF Africa Bioscience Challenge Fund

AFLP Amplified Fragment Length Polymorphism

BecA-ILRI Hub Biosciences eastern and central Africa–International Livestock Research Institute

CD-HIT Cluster Database at High Identity with Tolerance

COG Cluster of Orthologous Groups

DIAL De novo identification of alleles

EST Expressed Sequence Tags

FDR False Discovery Rate

FPKM Fragments per kilobase of transcript per million mapped reads

GATK Genome Analysis Toolkit

GO Gene Ontology

HIF Haustorial Initiation Factor

ICRISAT International Crops Research Institute for the Semi-Arid Tropics

KEGG Kyoto Encyclopedia of Genes and Genomes

MAS Marker Assisted Selection

NCBI National Center for Biotechnology Information

NGS Next Generation Sequencing

PANTHER Protein Analysis Through Evolutionary Relationships

Pfam Protein families

PWF Probability Weighting Function

RFLP Restriction Fragment Length Polymorphism

RefSeq NCBI Reference Sequence database

RNA-Seq RNA sequencing

RSEM RNA-Seq by Expectation Maximization

qRT-PCR Quantitative Real Time-Polymerase Chain Reaction

SNP Single Nucleotide Polymorphism

SSA Sub-Saharan Africa

SSRs Simple Sequence Repeats

TMM Trimmed mean of M-values

VCF Variant Call Format

ABSTRACT

Finger millet is a highly nutritious annual crop grown in the semi-arid tropics of the world. However, its yield potential has not been achieved in Sub-Saharan Africa (SSA) due to deleterious biotic stresses such as Striga hermonthica. The survival of Striga on finger millet depends on a complex host-parasite interaction that is defined by three critical infestation time points: 3 days post-inoculation (dpi), 5 dpi and 7 dpi. In this study, the transcriptomes of tolerant (GuluE and IE2396) and susceptible (White Sel6 and IE2459) finger millet accessions were analysed upon infection with Striga, to evaluate the molecular mechanisms involved in Striga tolerance or susceptibility at the critical time points. Roots of finger millet seedlings were infected with Striga seeds and samples were collected at 3, 5 and 7 dpi. Total RNA was extracted from the young seedlings, converted into cDNA libraries and sequenced using Illumina MiSeq. Although observable Striga germination commenced at 2 dpi, there were significant differences (mean ± standard error) among the accessions in haustoria development and Striga attachment at 3 dpi and 9 dpi, respectively. At 3 dpi, IE2396 (6.583 ± 0.7488) had the lowest number of Striga attachments when compared to GuluE (8.75 ± 0.5171), IE2459 (9.667 ± 0.2465) and White Sel6 (9.4722 ± 0.2222). At 9 dpi, susceptible IE2459 (2.306 ± 0.314) was significantly different from GuluE (1.306 ± 0.2312), IE2396 (0.25 ± 0.1954) and White Sel6 (0.6944 ± 0.1806). Differential expression analysis identified 407 differentially expressed (DE) transcripts (at least 4-fold) at a false discovery rate (FDR) of 0.1% in a pairwise comparison of all samples. A total of 33527 Gene Ontology (GO) terms and 459 KEGG pathways were annotated at a cutoff e-value of 1e-5. The majority of the annotated transcripts were associated with signal transduction, hormone metabolism, cell wall development, mitochondrial electron transport/ATP synthesis and transport. Bi-allelic non-homoeologous single nucleotide polymorphisms (SNPs) were identified at an average density of 1 SNP/153 bp, 1 SNP/160 bp, 1 SNP/358 bp and 1 SNP/189 bp in GuluE, IE2396, IE2459 and White Sel6, respectively. Finger millet accessions differ in their Striga tolerance with regard to growth and yield. GuluE and IE2396 exhibited varying mechanisms of tolerance and escape to Striga infestation. The higher Striga infestation and the larger number of differentially expressed transcripts observed in White Sel6 and IE2459 when compared to GuluE and IE2396 suggest a higher Striga impact on susceptible accessions, especially at 5 dpi. The SNP frequency doubling that was observed in GuluE confirms its wide geographical distance from the three other accessions. Genes involved in cell wall development, photosynthesis, signaling and transport should be studied and/or used for breeding in the tolerant accessions. The genes identified at critical stages of Striga infestation will inform key targeted breeding points for Striga tolerance in finger millet. Validation and introgression of these genes and SNP markers into farmer-preferred varieties will substantially improve finger millet yields in SSA.

CHAPTER ONE

GENERAL INTRODUCTION

1.1 Background information

Finger millet (Eleusine coracana) is a robust annual grass (Upadhyaya et al., 2007) belonging to the family Poaceae and subfamily Chloridoideae. It is mainly grown in the drought-prone regions of Africa and South and East Asia (Oduori et al., 2005). Finger millet is the third most abundantly produced millet in the world after foxtail millet (Setaria italica) and pearl millet (Pennisetum glaucum) (Reddy et al., 2009). In Africa, its production ranks second after pearl millet, covering 19% of the area under millet (Belton and Taylor, 2004). Although there are limited statistics on the crop's production in Africa, total production has been estimated at about 10% of the 34.6 million tons produced worldwide (Dida et al., 2008). In East Africa, Uganda is the largest finger millet producer, with about 405 000 ha of farmland under millet production (Dida and Devos, 2006). In Kenya, the area under finger millet production (about 65 000 ha) is considerably smaller than that of Uganda (Mgonja et al., 2007), hence the need for crop improvement in the country. Uganda's higher production can be linked to its many years of cultivation, since it is presumed to be the centre of origin of the cultivated E. coracana (Owere et al., 2014).

Finger millet is believed to have originated from the highlands of Eastern Africa, most probably in Uganda and Ethiopia (Kajuna, 2001). The earliest archaeological records of finger millet date to about 5000 years ago and relate to early agricultural practices in Ethiopia (Upadhyaya et al., 2007). From East Africa it spread to Asia, reportedly around 2000 years ago, and farther towards Java, China and Japan (Global Crop Diversity Trust, 2011). In Africa, finger millet is mainly grown in the East and Southern regions (Reddy et al., 2009). In Kenya, it is cultivated in the Rift Valley, Western and Nyanza regions (Oduori, 2008), mainly for human consumption.

Finger millet's nutritional properties are superior to those of other millets and major cereals (Shobana et al., 2013), making it more favourable as animal feed and human food. Its calcium content (344 mg/100 g) is far higher than that of pearl millet (Pennisetum glaucum) (37 mg/100 g), wheat (Triticum aestivum) (30 mg/100 g) and sorghum (Sorghum bicolor) (27 mg/100 g) (Obilana, 2003). In countries like India, USA and Ireland, finger millet grain and stover have successfully been used for livestock feed (Heuzé et al., 2015). In contrast, in SSA countries it has been used for many years as a special diet during festivities and specific functions (Oduori, 2005). In such countries, it is mainly preferred for weaning infants and for feeding pregnant women and people with diabetes.

Finger millet continues to alleviate poverty, enhance nutritional health and open new markets to the people in the drought-prone regions of Sub-Saharan Africa (SSA). It adapts well to the erratic droughts experienced in the SSA region and has good post-harvest qualities (Dida and Devos, 2006). Finger millet's small grain size and low fat content have been associated with its better storage abilities as compared to other millets and cereals (Shobana et al., 2013). Finger millet has good malting properties (Obilana, 2003) and its 'saccharifying' power is second only to that of barley (Hordeum vulgare L.) (Oduori, 2005). In Uganda and , it is a major substrate for brewing of both traditional and modern beer (Belton and Taylor, 2004).

Despite finger millet's superior qualities, its production in the SSA region remains low, mainly because of prevailing biotic and abiotic constraints. One of the main challenges in cultivating finger millet is the intensive labour needed for weeding and harvesting the crop (Oduori, 2005). Striga hermonthica is one of the prevalent weeds found in finger millet fields, contributing to the low yields and to the intensive labour required to eradicate the weed (Midega et al., 2010). Striga can result in yield losses ranging from 15% to 100% depending on the prevailing weather conditions and the susceptibility of the crop (Atera et al., 2013). A recent assessment of Striga damage and severity in Western Kenya revealed that more than 50% of the farmers' fields were heavily affected by Striga and needed urgent intervention (Ngesa et al., 2015).

Striga is an obligate hemi-parasite: although it depends exclusively on the host plant for its nutrients, it carries out its own photosynthesis during the above-ground stages of development. The survival of Striga depends upon three critical infestation stages of its life cycle, which are closely associated with the host plant physiology (Ejeta, 2007). The three stages are germination, attachment and penetration. At the germination stage, a germination stimulant from the host plant induces Striga seeds to develop a radicle known as a haustorium (Bouwmeester et al., 2003). The haustorium then receives a second stimulus from the host, known as haustorial initiation factors (HIF), and produces sticky haustorial hairs at the attachment stage (Scholes and Press, 2008). The development of a tubercle that penetrates to the xylem tissue of the host root marks the penetration stage, which is the last critical stage (Ejeta, 2007).

The complex host-parasite interaction between Striga and its hosts is the major constraint to some of the previously proposed long-term control strategies (Oswald, 2005; Khan et al., 2014). Marker assisted selection (MAS) and genetic engineering of novel resistant varieties provide durable, polygenic control strategies (Scholes and Press, 2008). This is based upon the assumption that, in order to survive, plants are able to recruit complex defence mechanisms, regulated at the molecular level, against the environmental stressors to which they are constantly exposed (Gachomo et al., 2003). Therefore, a clear understanding of the host-parasite interaction will provide a strong basis for defining the resistance that is essential in designing control strategies against Striga (Scholes and Press, 2008).

Resistance against Striga has been studied in other cereal crops (Haussmann et al., 2000). So far, specific Striga resistance mechanisms such as low germination stimulant production, low production of HIF, hypersensitivity and incompatible responses have been described in sorghum (Ejeta, 2007) and maize (Amusan et al., 2008). More information on the actual host defences that discourage parasitic growth and establishment has recently been made available for sorghum (Mohamed et al., 2010). Although no such information is available for finger millet, recent advances in next generation sequencing (NGS) (Nagalakshmi et al., 2010) make it possible to profile genes acting at various Striga infestation levels.

RNA sequencing (RNA-Seq) provides a powerful tool for profiling gene expression levels in specific tissues of an organism or under different treatment conditions (Wang et al., 2009). RNA-Seq is also emerging as the method of choice for studying the functional effects of genetic variability and for establishing causal relationships between genetic variants and diseases (Duitama et al., 2011). For example, Xu et al. (2012) used NGS tools to profile genes expressed during embryogenesis and identified more than 1,011 differentially expressed genes between the early and mature stages of embryogenesis. In a different study, Yang et al. (2011) identified several single nucleotide polymorphisms (SNPs), simple sequence repeats (SSRs) and candidate genes between two alfalfa (Medicago sativa) plants with varying cell wall components, and confirmed that RNA-Seq requires no pre-existing genome. In comparison with quantitative real time PCR (qRT-PCR), RNA-Seq data present a good starting point for finding control genes during Striga development in finger millet (Fernández-Aparicio et al., 2013).

Recently, WRKY45 was identified as a major gene involved in resistance to Striga in rice (Oryza sativa) using RNA-Seq (Mutuku et al., 2015). In previous studies, only a few genes acting at various Striga infestation stages have been identified using other molecular tools (Yoshida and Shirasu, 2012). In a separate review, it was proposed that essential genes can be identified at different critical stages of Striga development in rice using expressed sequence tag (EST) collections (Scholes and Press, 2008). However, no RNA-Seq study has been used to profile the finger millet genome at critical stages of Striga infestation. Since transcriptome analysis is key to interpreting the functional elements of the genome and understanding the development of a disease (Wang et al., 2009), analysing the transcriptome of finger millet at the specific Striga infective stages using RNA-Seq is essential to understanding the development of Striga infestation in finger millet.

In this study, four finger millet accessions were exposed to Striga and their transcriptomes were investigated at three Striga infestation time points, namely 3 days post infection (dpi), 5 dpi and 7 dpi. The transcriptome data were then compared between the tolerant and the susceptible finger millet accessions and among the Striga infestation stages, in order to find differentially expressed genes. In addition, Single Nucleotide Polymorphisms (SNPs) were identified in order to deduce variability between the four accessions.
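
As an illustration only, selecting differentially expressed transcripts from an infected-versus-control comparison can be sketched in Python as below. The transcript names, fold changes and FDR values are hypothetical, and the cut-offs simply mirror the thresholds quoted in the abstract (at least a 4-fold change at an FDR of 0.1%); the sketch is not the analysis pipeline used in this study.

```python
import math


def is_differentially_expressed(fold_change, fdr, min_fold=4.0, max_fdr=0.001):
    """Return True when a transcript passes both illustrative thresholds.

    fold_change is the infected/control expression ratio (must be > 0).
    """
    # Up- and down-regulation both count, so compare |log2(fold change)|.
    return abs(math.log2(fold_change)) >= math.log2(min_fold) and fdr <= max_fdr


# Hypothetical records: (transcript id, infected/control fold change, FDR).
records = [
    ("transcript_A", 8.0, 0.0005),  # up-regulated 8-fold, significant
    ("transcript_B", 0.2, 0.0001),  # down-regulated 5-fold, significant
    ("transcript_C", 2.0, 0.0001),  # significant, but change below 4-fold
    ("transcript_D", 16.0, 0.05),   # large change, but FDR above 0.1%
]

de_transcripts = [
    name for name, fold_change, fdr in records
    if is_differentially_expressed(fold_change, fdr)
]
print(de_transcripts)
```

Working on the log2 scale treats up-regulation and down-regulation symmetrically, so a transcript reduced to one fifth of its control level is retained in the same way as one increased five-fold.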

1.2 Problem statement and Justification

The Striga weed has led to devastating yield losses of major cereals in SSA. Yield losses ranging from 15% to 100% have been estimated, depending on the prevailing weather conditions and the susceptibility of the crops (Atera et al., 2013). Previously, various control strategies such as crop rotation, herbicide control, use of nitrogen and , and use of suicidal germination plants like Desmodium spp., have been applied in a bid to control Striga. However, these strategies are expensive, unreliable or not easily accessible to the smallholder farmers in SSA. Therefore, there is a need to explore inexpensive, more reliable and permanent alternative control mechanisms. Currently, the use of molecular breeding technologies is the most preferred alternative for solving the Striga menace. There is therefore a need to identify specific molecular markers and genes that are responsible for Striga resistance, to be used in breeding programs.

The genes profiled in this study will be introgressed into finger millet varieties with superior and desirable qualities, such as drought tolerance and high yield, in order to further improve their productivity. The identified single nucleotide polymorphism (SNP) markers will help in deciphering the genetic diversity between finger millet accessions and in selecting the most adaptable genotype in the SSA region for further breeding. The set of SNPs will also improve the efficiency of breeding in finger millet and lead to the development of better yielding varieties with improved qualities, including resistance to Striga.

1.3 Null Hypotheses

i. There are no phenotypic differences in responses against Striga infestation among the four finger millet accessions.

ii. There are no functional differentially expressed (DE) transcripts that are specifically induced or suppressed in Striga resistant versus susceptible finger millet accessions.

iii. There are no nucleotide variations in transcripts of the finger millet accessions.

1.4 Objectives

1.4.1 General Objectives

To determine the transcriptome of finger millet accessions with varying response to Striga in order to identify differentially expressed transcripts and single nucleotide polymorphisms.

1.4.2 Specific Objectives

i. To determine in vitro response to Striga among four finger millet accessions (GuluE, IE2396, White Sel6 and IE2459);

ii. To determine differentially expressed transcripts and functions of finger millet transcripts upon Striga infestation at different time points;

iii. To determine single nucleotide polymorphisms (SNPs) among the four finger millet accessions.

1.5 Significance of the study

The transcripts and SNP markers identified in this study will augment the existing finger millet breeding programs. Improved finger millet yields will directly raise the living standards of the smallholder farmers in SSA. National food and nutritional security will be enhanced in the semi-arid regions. As a result, the financial status of women in the region, who are the main stakeholders in finger millet farming, will improve. Future development of Striga resistant finger millet varieties will be a milestone towards achieving stable food and nutritional security, as envisaged in Kenya Vision 2030 and the Millennium Development Goals.

CHAPTER TWO

LITERATURE REVIEW

2.1 Finger millet origin, distribution and adaptation

Many theories suggest that finger millet (E. coracana) originated from Africa (Shobana et al., 2013). Archaeological records of finger millet dating from 5000 years ago still exist at the national museum of Axum, Ethiopia (Dida et al., 2008). It is believed that finger millet was domesticated during the Iron Age in Africa and later introduced to India before spreading to the rest of South-East Asia (Heuzé et al., 2015). Other evidence points to East Africa as the finger millet centre of origin, leaving India as a 'secondary centre of diversity' (Dida and Devos, 2006). The domestication of the cultivated finger millet from E. coracana subsp. africana took place in the East African highlands stretching from Ethiopia to Uganda (Dida et al., 2008).

The geographical distribution of finger millet is more diverse and stable than that of other millets (Global Crop Diversity Trust, 2011). One of the factors contributing to its stability across various agro-ecological zones is its C4 nature (Oduori, 2005). Finger millet cultivation is extensive in warm temperate regions from Africa to Japan and Australia (Heuzé et al., 2015). Currently, it is grown on an area covering over 4 million ha worldwide (Oduori, 2005). Finger millet was the first staple crop in East and Southern Africa before maize was introduced and is still widely cultivated in the region, where it is a principal cereal in countries like Uganda (Oduori, 2008). In East Africa, the crop is normally cultivated around Lake Victoria and on the highlands. Finger millet also serves as an important famine crop in countries such as Tanzania, Malawi, Zambia, Zimbabwe and Mozambique (Oduori, 2005).

Finger millet adapts well to a wide range of agro-ecological conditions with minimal inputs. The crop can grow well from sea level up to an altitude of 2,400 m on the slopes of the Himalayas in Nepal (Dida and Devos, 2006). Finger millet cultivation is highly suitable in areas with both long and short rainfall (500 to 1000 mm) and both low and high temperatures (18-35 °C) (Manyasa, 2013). In Nepal, finger millet is highly popular for its widely known tolerance to cold temperatures (Dida and Devos, 2006). The productive agro-ecological zones vary substantially in East Africa. The low temperatures in the cold highlands of the Great Rift Valley are known to affect its seed set, while the low and erratic rainfall in the drier lowlands of Eastern Kenya exposes the crop to moisture stress (Manyasa, 2013).

2.2 Global economic importance

Finger millet is one of the most nutritious cereals around the globe, well known for its high calcium and potassium content, good dietary fibre, reasonable amount of essential proteins and low fat content (Obilana, 2003; Oduori, 2005; Shobana et al., 2013). The seed is covered by a unique multi-layered testa, which is the reason for its high dietary fibre (Shobana et al., 2013). The digestibility of finger millet and the high levels of methionine and other essential amino acids in the main protein fraction, eleusinin, are of important medicinal value (Obilana, 2003; Oduori, 2005). The crop is an acclaimed supplement in the prevention and management of diseases like measles, anaemia and diabetes (Oduori, 2005). Through fermentation and germination, its rich calcium and iron content can also be upgraded and used as a supplement for children and adolescents to improve bone health and haemoglobin (Shobana et al., 2013). The high polyphenol content of the seed coat can also provide natural antioxidants and food preservatives (Varsha et al., 2009).

Finger millet is well adapted to famine-prone regions, offering a more reliable harvest than other crops (Kajuna, 2001). For example, during the 2002 drought in the dry districts of Siraro and Alaba in Ethiopia, nearly all the crops failed except finger millet, which showed high tolerance to drought (Aduguna, 2007). As a result, many farmers resorted to replacing maize with finger millet in the drought-prone districts of Ethiopia (Aduguna, 2007). The grain suffers insignificant insect damage or viability loss during storage, even over many years, because of its small size and low fat content (Dida and Devos, 2006; Shobana et al., 2013).

Finger millet grains are sold at high prices in the markets and can also be processed into valuable end-use products. The grains have been sold to villagers at twice the price of maize in countries such as Ethiopia (Aduguna, 2007). The value-added flour and grain are sold at premium prices in Kenyan markets (Oduori, 2005). Finger millet possesses superior malting properties and a 'saccharifying' power second only to that of barley (Hordeum vulgare L.) (Obilana, 2003; Oduori, 2005), which boosts its use in the brewing industry. Most of the processing in East Africa has focused on flour for thin and thick porridge, while malting and brewing are mostly applied in West and Southern Africa. In India, finger millet is processed by milling, malting, fermentation, popping, and decortication to produce several commercial products (Shobana et al., 2013).

2.3 Taxonomy and botany

Finger millet (E. coracana) belongs to the family Poaceae, sub-family Chloridoideae and genus Eleusine (Dida and Devos, 2006). The only other member of the sub-family Chloridoideae is tef (Eragrostis tef) (Oduori, 2008). In the genus Eleusine, E. coracana is the only species that contains nine sub-species, eight of which are African wild grasses, both annuals and perennials (Oduori, 2008). There are two well-known sub-species: the cultivated E. coracana subsp. coracana (L.) Gaertn, and the wild type E. coracana subsp. africana (Manyasa, 2013).

A wide range of phenotypic characters can be used to identify finger millet. Generally, finger millet is a tufted annual crop that grows to about 40 to 150 cm tall and matures in 2.5 to 6 months (Dida and Devos, 2006). Finger millet exhibits three types of growth habit: decumbent, erect and prostrate (Upadhyaya et al., 2007).

Finger millet possesses a panicle consisting of finger-like bisexual spikes with spikelets and hermaphrodite florets that make it highly self-pollinating (Oduori, 2008). Finger millet seeds are largely non-dormant, and the grain is spherical with a pericarp that is not fused to the seed coat (Manyasa, 2013).

Based on the inflorescence morphology, finger millet can be grouped into five races: Coracana, Vulgaris, Compacta, Plana, and Elongata (Dida and Devos, 2006). The races can further be classified into subraces. The race Elongata has three subraces, laxa, reclusa, and sparsa; Plana consists of three subraces, seriata, confundere, and grandigluma; Vulgaris has four subraces, lilicea, stellata, incurvata and digitata; while Compacta has no subrace (Upadhyaya et al., 2007). The morphology of race Coracana closely resembles that of the subspecies africana, having about 5 to 20 well-developed central spikes (Dida and Devos, 2006).

2.4 Genetic diversity

Early studies using flow cytometry indicate that finger millet has a 2C nuclear DNA content that translates into ca. 3.6 × 10^9 base pairs (Dida and Devos, 2006). The genome of finger millet is also estimated to contain about 58% of interspersed repetitive and non-repetitive DNA sequences (Vidya and Ranjekar, 1981). Currently, the finger millet genome is being sequenced using next generation sequencing facilities, which will reveal the exact size of the genome.
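As a rough illustration of how a flow cytometry DNA amount relates to a base-pair count, the sketch below applies the widely used conversion of 1 pg of DNA to approximately 0.978 × 10^9 bp. The conversion factor and the function names are assumptions introduced here for illustration and are not taken from the cited studies.

```python
# Conversion factor commonly used in flow cytometry work. It is an assumption
# here, not a value stated in this thesis: 1 pg DNA ~ 0.978 x 10^9 base pairs.
BP_PER_PG = 0.978e9


def pg_to_bp(picograms):
    """Convert a DNA amount in picograms to an approximate base-pair count."""
    return picograms * BP_PER_PG


def bp_to_pg(base_pairs):
    """Convert a base-pair count to an approximate DNA amount in picograms."""
    return base_pairs / BP_PER_PG


if __name__ == "__main__":
    # The 2C estimate quoted in the text, expressed back in picograms.
    two_c_bp = 3.6e9
    print(f"2C content of {two_c_bp:.1e} bp is about {bp_to_pg(two_c_bp):.2f} pg")
```

Under this assumed factor, the quoted 2C value corresponds to roughly 3.7 pg of DNA per nucleus.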

The high self-pollinating rate, coupled with small flowers that are difficult to open (Dida and Devos, 2006), is one of the reasons for the low level of variation in finger millet. The earliest genetic mapping carried out within cultivated finger millet (E. coracana subsp. coracana), and between cultivated finger millet and the weedy wild relative E. coracana subsp. africana, using amplified fragment length polymorphism (AFLP) and SNP markers, identified limited levels of polymorphism despite the diverse origin of the lines (Dida and Devos, 2006). The cultivated finger millet and the weedy wild relative are both allotetraploids with 2n = 4x = 36 chromosomes (Dida and Devos, 2006). They possess a genomic notation of AABB. The donor of the A genome is believed to be E. indica, but the donor of the B genome is yet to be established (Dida et al., 2008).

Progress towards identifying markers for use in genetic diversity programs has been made through the identification of restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), expressed sequence tag (EST) and simple sequence repeat (SSR) markers in finger millet (Dida et al., 2007). Eighty-two SSR markers that can be used for breeding purposes are available in the public domain (Dida et al., 2007). An additional 45 SSR markers have also been identified in 79 finger millet accessions from Africa and Asia, comprising both cultivated and wild lines (Dida et al., 2008). The diversity within the wild accessions collected from Kenya and Uganda was higher than the diversity between the cultivated accessions, which originated from Asia and Africa (Dida et al., 2008). Additionally, the diversity of the Asian subgroup was found to be lower than that of the African subgroup, partly reflecting the origin of finger millet in Africa and its later introduction into Asia (Dida et al., 2008).

2.5 Production constraints

The main abiotic and biotic factors hindering millet production in Africa include salinity, drought, diseases and weeds. The widespread arid and semi-arid regions in Africa are not favourable for farming. However, some level of tolerance to drought has been recorded in finger millet (Aduguna, 2007). A variety of diseases, pests and weeds also affects the production of finger millet. Blast disease, caused by the fungus Pyricularia grisea, is one of the major constraints to finger millet production in Africa. Blast disease is transmitted through seed movement from one plant to another, making it difficult to manage and a potential source of epidemics in finger millet fields (Oduori, 2008). Amongst the several weeds in millet fields, Striga has had serious socio-economic impacts on the lives of farmers in Africa. Abandonment of farms in heavily infested areas and occasional migration of farming communities have also been reported (Obilana and Ramaiah, 1992). There are two well-known species of Striga present in finger millet farms in East Africa, Striga hermonthica and .

2.5.1 Striga weed on finger millet

Striga hermonthica is one of the major constraints to finger millet production in SSA. Among the 23 Striga spp. in the SSA region, S. hermonthica has been implicated in causing the most serious damage to major cereals in Eastern Africa (Midega et al., 2010). Therefore, screening and breeding for S. hermonthica resistance in crops such as sorghum and maize (Zea mays) have been of interest in this region (Ejeta, 2007; Amusan et al., 2008). Not much has been done on finger millet, except an evaluation of the effects of finger millet with D. intortum (Midega et al., 2010), in order to determine the potential of D. intortum in reducing the negative effects of Striga.

Striga infects finger millet by competing for nutrients but photosynthesizes independently at later stages of its lifecycle. The crop supports this complex relationship by stimulating the germination of Striga seeds and the initiation of a radicle known as the haustorium. So far, the main Striga germination stimulants and haustorial initiation factors (HIF) studied are strigolactones and 2,6-dimethoxy-1,4-benzoquinone (DMBQ), respectively (Ejeta, 2007). Restricted exudation of these phytochemicals, especially by resistant plants, reduces the germination and the later growth and development of Striga. As a result, Striga does not grow, even under favourable germination conditions, unless exposed to its specific host.

Striga exploits the infected host plant through several mechanisms (Swarbrick et al., 2008). Following seed production and an after-ripening period, the seeds germinate upon coming into contact with root exudates, but also with non-root exudates and synthetic compounds (Berner et al., 1995). Striga seeds then develop haustoria, which penetrate to the xylem tissues of the host root from where they acquire nutrients, leading to stunting, wilting and chlorosis of the host plant. Without successful haustorial induction and full attachment through xylem-xylem connections between the parasite and the host, the weed dies (Berner et al., 1995).

2.5.2 Striga life cycle

The main drawback towards effective control of Striga has been the complex interaction between the weed and its hosts (Ichihashi et al., 2015). The intimate link between Striga and its hosts has been suggested as a challenge and a potential opportunity for controlling Striga, especially from the attachment stage (Scholes and Press, 2008). The parasite-host interaction starts right from the beginning of the lifecycle, where a group of sesquiterpene lactones known as strigolactones stimulates the germination of the seed (Ejeta and Butler, 1993; Ejeta, 2007; Mohamed et al., 2010; Yoshida and Shirasu, 2012).

Strigolactones can be produced by many plants, especially when exposed to various abiotic and biotic stresses like drought, pests and diseases. One of the main reasons for production of strigolactones by plants is to develop a symbiotic relationship with arbuscular mycorrhizal (AM) fungi (Yoshida and Shirasu, 2012). In the process, plants obtain nitrogen and phosphorus from the fungi, while the fungi acquire water and other nutrients. In nutrient-deficient conditions, plants are expected to exude more strigolactones in order to attract more AM fungi (Yoneyama et al., 2015). However, it has been shown that Striga-resistance traits in maize are not necessarily accompanied by a reduction in compatibility with AM fungi (Yoneyama et al., 2015). The resistant genotypes are thus capable of attracting Striga to a lesser extent while maintaining high productivity in infertile conditions.

Strigolactones are quite ubiquitous and can be produced by host and non-host plants (Scholes and Press, 2008). As an obligate root hemi-parasite, Striga takes advantage of the strigolactone exudates to partially complete its life cycle on specific hosts.

Following germination, the attachment of Striga to the host determines its ultimate survival (Ejeta, 2007; Scholes and Press, 2008). Prior to attachment, the germinating Striga seed is stimulated to develop the haustorium by the haustorial initiation factors (HIF) from the host. At this stage it is hypothesized that Striga triggers the host to produce DMBQ, which induces the parasite's quinone oxidoreductases that eventually convert DMBQ into an active single-electron free radical with a suitable redox potential for haustorial induction (Ichihashi et al., 2015). The haustorium then attaches to the root of the plant using sticky haustorial hairs (Scholes and Press, 2008). Subsequently, the haustorium penetrates to the xylem tissue of the host.

At the penetration stage the haustorium is known to form a vascular bridge at the point of attachment (Ichihashi et al., 2015). Cell wall degrading enzymes like pectin methylesterases often accumulate at the point of attachment to aid penetration (Yoshida and Shirasu, 2012). At the centre of the penetrating haustorium of all members of the Orobanchaceae family is a 'hyaline body' with heavy parenchyma cells used for trafficking of host nutrients (Ichihashi et al., 2015). In the genus Striga, there is a special structure known as 'Oscula' developed specifically for intrusive penetration into the host (Ichihashi et al., 2015). The remaining above-ground stages of development largely depend on the successful penetration of Striga to the xylem tissue, followed by acquisition of nutrients. In contrast to host plants, successful penetration is not equivalent to successful establishment of the parasite in non-host plants. For instance, it has been shown that Striga can penetrate to the xylem of Arabidopsis, which is incapable of supporting its entire lifecycle (Yoshida and Shirasu, 2012).

2.5.3 Striga resistance mechanisms

Plant defence mechanisms can be defined as: (i) resistance, the ability to reduce infection and (ii) tolerance, the ability to minimize the consequences of infection (Rodenburg et al., 2005). Accordingly, a reliable resistance measure is a prerequisite for identification of resistance or tolerance (Rodenburg et al., 2005).

Resistance to Striga can occur before or after the development of the vascular connection between the host and the parasite (Amusan et al., 2008; Rodenburg et al., 2015). Pre-attachment resistance involves mechanisms that prevent the development of the parasite before attachment, while post-attachment resistance involves mechanisms that prevent establishment of the parasite after attachment (Rodenburg et al., 2005).

Many resistance mechanisms to Striga have been proposed (Haussmann et al., 2000). In summary, they include low production of germination stimulants, mechanical barriers, phytoalexin synthesis, incompatibility reaction, antibiosis, insensitivity to Striga toxin and avoidance through root growth habit. Specific resistance mechanisms have also been identified in sorghum and maize (Ejeta, 2007; Amusan et al., 2008; Rich and Ejeta, 2008; Oswald and Ransom, 2004). For instance, resistance mechanisms such as low strigolactone activity on Striga, low HIF activity, formation of necrotic lesions at the point of attachment, and incompatibility reactions have been identified in sorghum (Ejeta, 2007).

2.5.4 Strigolactone biosynthetic pathways and pre-germination resistance

Germination of Striga is one of the main processes that ensure its survival. Germination is stimulated by strigolactones (SL) produced by the host plants. For a successful invasion, Striga perceives the signals in the soil in a complexly synchronized interaction. Some of the main germination stimulants (SL) reported so far include strigol, sorgolactone and alectrol first isolated from , sorghum and , respectively (Ejeta, 2007). These and several other naturally occurring SL have been grouped into two families, namely the Strigol and Orobanchol families (Zwanenburg et al., 2015).

Strigolactones have successfully evolved as signalling molecules between plants, on which parasitic plants such as witchweed (Striga spp.) and broomrape (Orobanche spp.) wholly depend (Zwanenburg et al., 2015). The strigolactone biosynthetic pathway involves an SL intermediate known as carlactone (CL), which is produced from all-trans-β-carotene by three enzymes. Carlactone is then oxidized by oxidation enzyme(s) such as the cytochrome P450 monooxygenase MORE AXILLARY GROWTH 1 (MAX1) and then converted to two naturally occurring isomers of 5-deoxystrigol (5DS) (Seto and Yamaguchi, 2014). 5-Deoxystrigol might be converted into other SLs (Seto and Yamaguchi, 2014).

Strigolactones attach to a surface protein on the parasite's cell membrane, stimulating a variety of signal pathways within the cell (Zwanenburg et al., 2015). In vascular plants such as rice and Arabidopsis, SL have been reported to stimulate members of the α/β fold hydrolase superfamily known to participate in hormone signalling and the Skp, Cullin, F-box containing complex (SCF)-mediated signal transduction pathway (Zwanenburg et al., 2015). A simple signal transduction occurs as a result, where target proteins act as SL repressors until SL binds to a receptor, which recruits an F-box protein to degrade the target (Lumba et al., 2017). Unlike in vascular plants, SL perception in Striga hermonthica is more specific due to compatibility interaction with SL from exogenous sources. Isolation of α/β hydrolase family genes from the family consisting of obligate parasitic plants has suggested their involvement in SL perception (Lumba et al., 2017). A gene known as Striga hermonthica HYPOSENSITIVITY TO LIGHT (ShHTL) and its homologs were implicated in sensing SLs in S. hermonthica seeds, and differences in perception of SLs among plant species might be associated with receptor evolution (Toh et al., 2015; Lumba et al., 2017).

Pre-germination resistance to Striga through the production of low germination stimulants (lgs) has been reported in sorghum (Ejeta et al., 2007). Recently, progress was made by elucidating the mechanism of action of the lgs mutant gene LGS1 (Gobena et al., 2016). Mutation at LGS1 does not knock out SLs in root exudates; it only changes the relative abundance of certain types, such that the other essential functions of SLs (ability for mycorrhizal colonization, favourable tillering, and root responsiveness to nutritional deficiencies) remain intact (Gobena et al., 2016).

2.5.5 Gene for gene resistance and Striga virulence

Plant immunity and specific effector mechanisms associated with Striga resistance have been reported recently (Timko et al., 2012). Although S. hermonthica triggers a host of polygenic reactions that cannot elicit specific effector reactions from the host, specific effector mechanisms have been established in S. gesnerioides against cowpea (Timko et al., 2012). Several genes associated with resistance have been identified in susceptible rice genotypes, serving as a hint of effector-triggered immunity (ETI) towards Striga invasion (Timko et al., 2012). Hypersensitivity reaction (HR) is one of the resistance reactions towards Striga that can be associated with direct effector-triggered, nucleotide-binding domain and leucine-rich repeat-containing (NLR)-mediated immunity (Saucet and Shirasu, 2016).

2.5.6 Mechanisms of Striga resistance in Sorghum and maize

Plants encounter invasion at various stages of infection in the lifecycle of Striga hermonthica. The multi-level invasiveness of Striga that makes it a successful plant parasite in many cereals was recently reported (Runo et al., 2018). Resistance and/or tolerance mechanisms have been documented in maize, sorghum and rice (Bawa et al., 2015; Rodenburg et al., 2017; Mbuvi et al., 2017). These cereal crops provide breeders with several breeding materials as well as sources for further molecular exploration (Shayanowako et al., 2017). Wild sorghum accessions with at least two mechanisms for resistance provided additional material that can be exploited for Striga resistance using molecular studies and breeding (Mbuvi et al., 2017).

Genomics studies have also improved the ability to profile genes conferring resistance in sorghum at particular infection stages. In the process, many sorghum varieties have been reported to be resistant to Striga (Liu et al., 2016). Molecular markers and genes that are associated with resistance have also been reviewed (Liu et al., 2016).

2.5.7 Control strategies

A combination of several Striga control strategies is more viable than any single control approach (Rodenburg et al., 2005; Bozkurt et al., 2014). The high genetic variation within S. hermonthica populations, especially in the East and West Africa regions (Bozkurt et al., 2014), can explain in part the difficulty in controlling Striga. Nevertheless, previous Striga control efforts have focused on achieving long-term and cost-effective control strategies (Berner et al., 1995; Olupot et al., 2003; Ahonsi et al., 2004; Midega et al., 2010). Some of the earlier control methods included cultural practices like crop rotation, use of organic manure, fallowing, and hand weeding. Besides the cultural methods, control strategies such as herbicide application (De Groote et al., 2007; Groote et al., 2008), intercropping with resistant (Khan et al., 2006) and use of suicidal germination chemicals have also been applied with varying success. For example, the US government successfully used ethylene to control Striga asiatica by suicidal germination in the absence of the host plants (Ransom, 2000).

Alternative control strategies such as application and various biological control strategies have also been of interest. The potential of various fertilizer and herbicide formulations has been evaluated in many studies (Ahonsi et al., 2004; Kanampiu et al., 2009; Jamil et al., 2012a). A number of biological control strategies have been tried in Striga control (Sauerborn et al., 2007; Elzein et al., 2008; Zarafi et al., 2015). For example, Fusarium oxysporum and AM fungi (Lendzemo et al., 2005), the endophytic bacteria Gluconacetobacter spp. from (Gafar et al., 2015) and several soil bacteria (Mohammed et al., 2009) have the potential to reduce the Striga scourge substantially. Some of these control strategies are not readily available to smallholder farmers in SSA. Advancements in the application of biotechnological tools in breeding for improved cultivars and the development of transgenic resistant crops promise greater success in Striga control (Ejeta, 2007).

2.5.8 Screening for Striga resistance breeding resources

Marker assisted selection (MAS) breeding and genetic engineering of resistant cultivars are some of the recently suggested reliable and permanent control strategies (Scholes and Press, 2008). Plants' own immunity can be improved to supplement the tolerance or resistance level in farmer-preferred cultivars. Breeding is only possible after selection of the resistant varieties. Previous reports on resistance to Striga in sorghum and maize provide a strong basis on which to build the current selection efforts (Amusan et al., 2008; Rich and Ejeta, 2008; Mohamed et al., 2010).

To select for Striga-resistance traits in host plants, screening under both field and laboratory conditions is essential. In vitro methods of screening for resistance are useful for identification of better breeding materials (Rodenburg et al., 2015). Screening under field conditions is preferred when targeting complex traits that are influenced by different environmental conditions (Haussmann et al., 2000). Preliminary field and screen house studies conducted in Western Kenya using finger millet cultivars from Kenya and Uganda identified four accessions exhibiting extreme resistance and susceptibility to Striga (Mathews Dida, unpublished). The accessions identified through the screening program include Gulu-E and IE-2396 (tolerant), and White Sel6 and IE-2459 (susceptible). The screening was done over two seasons and in a field with more than one Striga species.

For specific selection of resistance vis-à-vis susceptible traits at the underground Striga infestation stages, further screening under controlled laboratory conditions is required. Since the advent of Striga resistance screening assays, efforts have been made to improve the assays to suit the specific plant of interest (Reda, 1994). The Extended Agar Gel assay (EAGA) was one of the latest improvements in sorghum screening (Mohamed et al., 2010).

Identification of Striga-related genetic variation in host plants has been intensified in different crops through screening (Oswald and Ransom, 2004; Haussmann et al., 2000; Amusan et al., 2008). Highly variable effects have been observed in commercial maize varieties during Striga (Oswald and Ransom, 2004). An investigation of the inheritance of genes controlling resistance and tolerance to S. hermonthica in maize inbred lines, based on visible host plant symptoms and Striga emergence counts, showed that genetic control of tolerance and resistance to S. hermonthica in the maize genotypes tested was quantitative (Ransom, 2000). In a previous study in pearl millet, the accumulated percentage gain from selection amounted to 51% lower Striga infestation as measured by the area under Striga number progress curve (ASNPC) (Kountche et al., 2013). Physiological and histological changes in maize have also been reported to occur following exposure to various inducers or environmental conditions (Amusan et al., 2008; Zheng et al., 2010). Physiological changes have been observed in root development of Zea diploperennis upon exposure to S. hermonthica, where the resistant inbred lines expressed a developmental barrier and an incompatible response against Striga parasitism (Amusan et al., 2008).
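The area under Striga number progress curve (ASNPC) mentioned above can be sketched as a trapezoidal sum of emerged Striga counts over successive scoring dates, by analogy with disease progress curves. The function names and the scoring data below are hypothetical and serve only to show the arithmetic; they are not data from the cited studies.

```python
def asnpc(days, striga_counts):
    """Area under the Striga number progress curve (trapezoidal rule).

    days          -- scoring dates, e.g. days after sowing, in increasing order
    striga_counts -- number of emerged Striga plants recorded on each date
    """
    if len(days) != len(striga_counts):
        raise ValueError("days and striga_counts must have the same length")
    if len(days) < 2:
        raise ValueError("at least two scoring dates are needed")
    area = 0.0
    for i in range(len(days) - 1):
        interval = days[i + 1] - days[i]
        mean_count = (striga_counts[i] + striga_counts[i + 1]) / 2.0
        area += mean_count * interval
    return area


def percent_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical counts for an unselected and a selected population.
    days = [56, 70, 84, 98]
    unselected = asnpc(days, [2, 6, 10, 12])
    selected = asnpc(days, [1, 3, 5, 6])
    print(unselected, selected, percent_reduction(unselected, selected))
```

A lower ASNPC indicates fewer emerged Striga plants accumulated over the season, which is why a reduction in this value is read as a gain from selection.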

2.6 Genomics-assisted breeding in finger millet

Genomic resources have been widely applied to address crop productivity, stability and quality through advanced breeding techniques (Kole et al., 2015). Multi-dimensional data from genetic resources and from morphological and physiological traits have improved the efficiency of plant breeding. Because of the low variation among finger millet accessions, there is a need for advanced techniques to evaluate the most salient traits and to maximize them in breeding for improved cultivars.

Both genome-wide association studies (GWAS) and next generation sequencing (NGS) technologies have become the primary approaches in the identification and characterization of specific variants causing susceptibility or resistance to common diseases. Current technologies such as genotyping-by-sequencing (GBS) can be used to discover novel genes/alleles for any given trait. NGS together with GWAS, which involves the use of statistical associations between DNA polymorphisms and morphological trait variations in a plant in order to identify genomic regions governing traits of interest, has increased the precision of gene mapping (Kole et al., 2015). Functionally characterized genes, EST and genome sequencing projects have also facilitated the development of molecular markers from the transcribed regions of the genome (Varshney et al., 2005).
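The statistical association that underlies GWAS can be illustrated, in its simplest single-marker form, as a regression of a trait on allele dosage. The sketch below is a minimal illustration under that assumption; real analyses also correct for population structure and multiple testing, and the function name and the data are hypothetical.

```python
def snp_trait_regression(dosages, trait_values):
    """Least-squares slope and R^2 for a trait regressed on allele dosage.

    dosages      -- copies of the alternative allele per plant (e.g. 0, 1, 2)
    trait_values -- phenotype measured on the same plants
    """
    n = len(dosages)
    if n != len(trait_values) or n < 3:
        raise ValueError("need matching lists with at least three plants")
    mean_x = sum(dosages) / n
    mean_y = sum(trait_values) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in trait_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait_values))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or trait")
    slope = sxy / sxx                 # trait change per extra allele copy
    r_squared = sxy ** 2 / (sxx * syy)  # share of trait variance explained
    return slope, r_squared


if __name__ == "__main__":
    # Hypothetical Striga counts for six plants grouped by genotype.
    slope, r2 = snp_trait_regression([0, 0, 1, 1, 2, 2], [10, 12, 7, 9, 4, 6])
    print(slope, round(r2, 3))
```

In this made-up example each extra copy of the allele is associated with three fewer Striga plants, which is the kind of signal a marker-trait association scan looks for across many loci.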

Identification of specific molecular markers is a first step towards breeding of Striga-resistant cultivars using the current molecular breeding programs. The markers should be identified before integration into improved cultivars. The commonly used markers in plant breeding include AFLP, RFLP, SSR, and SNPs. From previous studies, slightly over 100 SSR markers were identified in finger millet and made available in the public domain (Dida et al., 2007; Dida et al., 2008). In other studies, much interest has been focused on the development and testing of the genotyping potential of EST-SSRs (Lingeswara et al., 2012; Obidiegwu et al., 2014). The limited number of molecular markers available for breeding purposes in finger millet also limits the scope of breeding for improved cultivars. Using NGS analysis to identify SNPs has been one of the most recent and viable ways of molecular marker discovery in many crops (Yang et al., 2011; Trick et al., 2012; Sharpe et al., 2013; Duarte et al., 2014).

2.6.1 Next Generation Sequencing

Next generation sequencing involves massively parallel sequencing of DNA molecules in a step-wise iterative process or continuous real-time manner (Pareek et al., 2011). Next generation sequencing has improved the ability to profile genes at the transcript level. The improved performance of various technologies such as Ion Torrent's PGM (Personal Genome Machine), Pacific Biosciences' (Pac-Bio) RS (Read Sequencer), and Illumina MiSeq and HiSeq continues to provide sufficient output for meaningful analysis (Quail et al., 2012).

A particular sequencing technology is chosen depending on its advantages and the purpose for which the reads are desired. For instance, Pac-Bio outputs long (in terms of base pairs) but few sequences and is mainly preferred for genome sequencing. Illumina HiSeq, on the other hand, outputs millions of short sequences ideal for RNA-seq studies. Compared to low-throughput EST sequencing, transcript analysis (RNA-seq) through NGS has improved tremendously. It is now possible to sequence and assemble non-model organisms with ease, enabling new discoveries that could not be explored by the Sanger method (Martin and Wang, 2011).

2.6.2 Transcript analysis

The transcriptome of a given genome comprises all the transcripts in a cell, their quantity and type, at a specific treatment condition or developmental stage (Wang et al., 2009; Nagalakshmi et al., 2010). The focus of performing transcriptome analysis is to catalogue the transcripts, to identify the expressed genes and to quantify their level of expression (Wang et al., 2009). Unlike genomic analysis, transcriptome analysis provides information on the significant mutations of the expressed genes that would otherwise not be present at the genomic level.
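Quantifying expression levels usually requires normalising raw read counts for transcript length and sequencing depth. The sketch below shows one common normalisation, transcripts per million (TPM), as an illustration only; the text does not state which measure is intended, and the function name and transcript identifiers are hypothetical.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths.

    read_counts -- dict mapping transcript id to number of mapped reads
    lengths_bp  -- dict mapping transcript id to transcript length in bp
    """
    # Step 1: reads per kilobase corrects for transcript length.
    rpk = {t: read_counts[t] / (lengths_bp[t] / 1000.0) for t in read_counts}
    # Step 2: scale so that the values of one sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        raise ValueError("no reads mapped to any transcript")
    return {t: value / scale for t, value in rpk.items()}


if __name__ == "__main__":
    # Hypothetical counts: the longer transcript has twice the reads but the
    # same per-kilobase coverage, so both receive the same TPM.
    counts = {"transcript_1": 100, "transcript_2": 200}
    lengths = {"transcript_1": 1000, "transcript_2": 2000}
    print(tpm(counts, lengths))
```

Because TPM values sum to the same total in every sample, they give a simple basis for comparing the relative abundance of a transcript between treatment conditions.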

RNA sequencing is the current technique of choice for transcriptome analysis. The technique refers to all the experimental procedures, including sequencing methods, that generate transcripts of varying lengths (Garber et al., 2011). RNA-seq entails the direct sequencing of complementary DNA (cDNA) using NGS technologies, followed by either mapping to a reference genome (Trapnell et al., 2012) or mapping to a reference generated de novo (Martin and Wang, 2011; Haas et al., 2013). Unlike microarrays, which are limited in profiling of expressed genes, RNA-seq can now resolve most of the biological problems that could not previously be addressed (Garber et al., 2011). RNA-seq is also emerging as a superior method for studying the effects of genetic variation on transcriptional regulation as well as comparing the relationships between normal individuals and disease or mutation causal agents (Duitama et al., 2011).


Sequencing of the cDNA is normally done on various NGS platforms depending on the interest of the investigator. The viability and cost-effectiveness of some of the currently used platforms like Illumina MiSeq, Pac-Bio and Ion Torrent have been studied and reviewed (Pareek et al., 2011; Quail et al., 2012). Illumina sequencers are usually used in transcriptome analysis, mainly because of their high-throughput generation of reads, which increases transcriptome profiling efficiency. Biases in Illumina sequencers have been studied, and one of the causes identified was the PCR step during library preparation (Aird et al., 2011). In a previous study, the Illumina MiSeq platform generated a greater percentage of SNPs than the HiSeq, even with the same libraries (Quail et al., 2012).

Sequencing is followed in the analysis pipeline by transcript assembly, mapping, and determination of differentially expressed (DE) transcripts as well as transcriptional variations. The assembly can be done by a de novo strategy, a reference-based strategy or both, but these strategies face several challenges. Possible challenges in the de novo strategy include random and non-random sequencing errors, the scale of reads and inherent transcript complexity (Tulin et al., 2013). These challenges have been addressed in order to improve assembly quality through advancements in software such as Trinity, SOAPdenovo, Velvet/Oases and Trans-ABySS (Tulin et al., 2013). The advantages and disadvantages of de novo as well as reference-based assembly methods have been evaluated and reviewed (Martin and Wang, 2011).
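The de novo assemblers named above are built around de Bruijn graphs of k-mers. The toy sketch below illustrates that idea only; it ignores sequencing errors, coverage, reverse complements and alternative isoforms, all of which real assemblers must handle, and the function names and reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:  # stop instead of looping around a cycle
            break
        contig += next_node[-1]
        seen.add(next_node)
        node = next_node
    return contig


if __name__ == "__main__":
    # Three overlapping hypothetical reads from one short transcript.
    reads = ["ATGGCG", "GGCGTA", "CGTACC"]
    graph = build_de_bruijn(reads, k=4)
    print(walk_unambiguous_path(graph, "ATG"))
```

Branch points in such a graph arise from sequencing errors, repeats or alternative transcripts, which is where the challenges listed in the preceding paragraph enter.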

2.6.3 Single nucleotide polymorphisms (SNPs) identification

Single nucleotide polymorphism (SNP) discovery and calling is part of the downstream analysis of the transcriptome, especially when determining the variations between conditions or developmental stages against a reference genome. SNP identification, which is also known as single nucleotide variant (SNV) calling, can also be accomplished with a de novo generated reference transcriptome as well as in combination with the reference genome (Duitama et al., 2011). A combination of de novo and reference-based strategies, using reference transcripts like consensus coding sequences (CCDS) as the de novo database, improved variant calling compared to using a single approach (Duitama et al., 2011).

Accuracy in SNP calling is crucial, especially for data generated by the ever-improving NGS technologies. The steps to be followed and some of the recent tools for accurate SNP calling have been outlined (Liu et al., 2012). Further details on how to identify SNPs without a reference genome and the algorithms needed therein have also been made available (Ratan et al., 2010). For the de novo approach, a pipeline known as DIAL (De novo identification of alleles) was developed to detect substitutions between two closely related genomes without a reference genome.

SNP identification has been applied in many plants as a method of choice for variant detection in RNA-seq studies (Yang et al., 2011; Trick et al., 2012; Duarte et al., 2014). Using bulked segregant analysis (BSA) to map putative SNPs at a particular gene region, about 67% of the validated SNPs mapped across the investigated gene in polyploid wheat (Trick et al., 2012). In a dicot plant (Medicago truncatula), over 35 informative SNPs were identified and over 1500 validated (Duarte et al., 2014). Although there is no SNP data on finger millet, there is potential in using RNA-seq methods to call SNPs without a reference genome (Wolf, 2013).


CHAPTER THREE

MATERIALS AND METHODS

3.1 Agronomic competence of the finger millet accessions and in vitro response to Striga exposure

Four finger millet accessions, i.e. GuluE, IE2396, White Sel6 and IE2459, were sourced from the Department of Botany and Horticulture, Maseno University. Striga seeds previously obtained from Wad Medani Agricultural Research Station, Sudan (courtesy of Dr. Rasha Ali) and stored at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) genebank were used in this study.

The four accessions (Table 3.1) were planted separately in the screen house (a maximum of four pots per accession with four plants per pot) and their development was monitored to maturity between July and September 2014. The major morphological descriptors of the finger millet accessions used in this study were previously reviewed by Bioversity International (Bioversity, 2010). All the experiments were undertaken at the BecA-ILRI Hub screen house at an altitude of 1229 m above sea level and an average temperature of 25-29°C.

In vitro phenotypic evaluation at the specific infestation time points was performed in an incubator at 29±2°C. Finger millet seedlings were pre-germinated in petri dishes, four seedlings per dish, until they were three days old. The roots of each of the four seedlings per petri dish were infested with ten pre-germinated Striga seeds. The number of Striga seeds that had developed haustorial hairs at 3 days post infection, and the number that had successfully attached and penetrated at 9 days post infection, were counted in nine different experiments. Each experiment was independent of the others, but replicates were performed concurrently and under similar conditions.

Two-way analysis of variance (ANOVA) was used to test for differences in Striga germination and attachment among the four accessions at a significance level of p<0.05. The ANOVA was followed by a post hoc Tukey's test (Brown, 2005) to identify the sources of variation among the accessions at 3 dpi and 9 dpi. P-values were corrected for multiple comparisons using the Bonferroni method (Bland and Altman, 1995).
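The multiple-comparison correction can be illustrated with a minimal sketch. The Bonferroni adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1. The function name and the p-values below are hypothetical and are not results from this study.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values (p times number of tests, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for the six pairwise comparisons among four accessions
raw = [0.001, 0.008, 0.020, 0.040, 0.300, 0.700]
adjusted = bonferroni(raw)

# A comparison is declared significant if its adjusted p-value is below 0.05
significant = [p < 0.05 for p in adjusted]
```

With four accessions there are six pairwise comparisons, so a raw p-value must fall below roughly 0.0083 to remain significant at the 0.05 level after this correction.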

Table 3.1: Finger millet accessions

Accessions              Response to Striga
White Sel6              Susceptible
IE2459 (GBK 033340)     Susceptible
GuluE                   Tolerant
IE2396 (GBK 0333474)    Tolerant

The plants were exposed to Striga-infested fields in Western Kenya for two seasons. The fields were infested with more than one Striga species. Tolerance is the ability of the crop to minimize the effects of infection; susceptibility is the lack of that ability.

3.2 Infection of pre-germinated finger millet seedlings with Striga

Finger millet seeds were germinated in vitro using the filter paper method (Hess et al., 1992). They were sterilized in 10% NaOCl (domestic bleach) (Kobian Scientific, EA, Kenya) for about 30-60 minutes. The seeds were then rinsed at least three times with double distilled water (ddH2O) to remove any NaOCl residue and soaked in 0.5% Ridomil Gold (methyl N-(2,6-dimethyl)-N-(methoxyacetyl)-D-alaninate) (Syngenta, EA Ltd., Kenya) for about 12 hours to ensure both surface and systemic sterilization. This was followed by rinsing at least three times with sterile ddH2O. The clean seeds were then placed in 90 mm petri dishes on moist filter paper and incubated in the dark at 29±2°C for 72 hours.

Simultaneously, Striga seeds were washed in 30 ml of double distilled water (ddH2O) in a 50-ml Falcon tube containing about 5 drops of Tween 20. The seeds were shaken and vortexed several times, and debris was removed with a pipette. The seeds were then sonicated for 3 minutes with occasional swirling to remove any leftover sand and debris. After sonication, the seeds were rinsed three times and soaked for at least 10 minutes in ddH2O. The Striga seeds were then transferred into 90 mm petri dishes containing two layers of moist filter paper and conditioned in the dark at 29±2°C for 18-20 days.

3.3 RNA extraction

Whole finger millet seedlings were harvested at each time point, immediately flash frozen in liquid nitrogen, and used for RNA extraction. RNA was extracted from 24 E. coracana plant tissue samples, including controls (Table 3.2). Total RNA was extracted using the Zymo Research Plant RNA MiniPrep RNeasy kit (ZR, Irvine, CA, USA) according to the manufacturer's instructions. Before library preparation, the integrity of the RNA was checked using denaturing gel electrophoresis (Life Technologies, Carlsbad, CA, USA) (Figure 4.5).

3.4 Library preparation and sequencing

Total RNA from the 24 samples was used to prepare cDNA in one batch (Table 3.2). The cDNA libraries were prepared using the Illumina TruSeq Stranded Total RNA Sample Preparation kit with Ribo-Zero™ Plant (Illumina, San Diego, California, USA). After rRNA depletion, the mRNA was fragmented before synthesis of the first and second cDNA strands. The double-stranded cDNA fragments were adenylated with a single 'A' base, ligated to adapters and enriched by PCR. The libraries were assessed for quality and concentration using the Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA, USA) and the Qubit® 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA), and normalized to 4 nM. The libraries were pooled in equimolar ratios and loaded at a final concentration of 12 pM with a 1% PhiX control. The libraries were sequenced on the MiSeq (Illumina) available at the BecA-ILRI Hub. A 300 bp paired-end sequencing run was performed using a 600-cycle PE v3 kit (Illumina, San Diego, California, USA) according to the manufacturer's instructions.
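The normalization step can be sketched as a short calculation. The sketch below converts a library concentration in ng/µl to nM using the commonly used average mass of 660 g/mol per base pair of double-stranded DNA, and then applies C1V1 = C2V2 to reach the 4 nM target. The function names, the example concentration and the mean fragment size are hypothetical and are not measurements from this study.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp, bp_mass=660.0):
    """Convert a dsDNA library concentration (ng/ul) to nanomolar."""
    return conc_ng_per_ul * 1e6 / (bp_mass * mean_fragment_bp)

def stock_volume_ul(stock_nM, target_nM, final_volume_ul):
    """Volume of stock needed for final_volume_ul at target_nM (C1*V1 = C2*V2)."""
    if stock_nM < target_nM:
        raise ValueError("stock is more dilute than the target")
    return target_nM * final_volume_ul / stock_nM

# Hypothetical library: 2.0 ng/ul with a mean fragment size of 400 bp
stock = library_molarity_nM(2.0, 400)

# Volume of that stock needed to prepare 10 ul at 4 nM
v_stock = stock_volume_ul(stock, 4.0, 10.0)
```

The remainder of the final volume would be made up with the dilution buffer recommended by the kit manufacturer.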

Table 3.2: Striga infestation and/or RNA extraction time points. The RNA was extracted at 3, 5 and 7 days post infestation (dpi).

Accession     3 dpi                  5 dpi                  7 dpi
GuluE         Infested vs Control    Infested vs Control    Infested vs Control
IE2396        Infested vs Control    Infested vs Control    Infested vs Control
White Sel6    Infested vs Control    Infested vs Control    Infested vs Control
IE2459        Infested vs Control    Infested vs Control    Infested vs Control

3.5 Quality control, de novo assembly of the sequences, and abundance estimation

As a preliminary check of the sequence data, the quality of the raw reads was examined using FastQC (http://www.bioinformatics.babraham.ac.uk/). The large and small subunit rRNA sequences downloaded from Silva (http://ftp.arb-silva.de/release_123/) and the Enterobacteria phage phiX174 sensu lato sequence downloaded from NCBI (http://www.ncbi.nlm.nih.gov/genome/) were used to remove rRNA and phiX contaminants, respectively. Clean reads were trimmed using Trimmomatic (Bolger et al., 2014). The trimming settings were HEADCROP:15, MINLEN:50, ILLUMINACLIP and CROP:175. All reads were aligned to the respective databases using Bowtie2 v2.2.5 (Song et al., 2014).
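The effect of the length-based trimming settings on a single read can be sketched as follows. This is an illustration only and not Trimmomatic itself: the function name and the toy reads are hypothetical, adapter clipping (ILLUMINACLIP) is omitted, and the order CROP, HEADCROP, MINLEN is an assumption, since Trimmomatic applies its steps in the order given on the command line.

```python
def trim_read(seq, crop=175, headcrop=15, minlen=50):
    """Apply CROP, then HEADCROP, then MINLEN to one read; return None if dropped."""
    seq = seq[:crop]       # CROP: keep at most the first `crop` bases
    seq = seq[headcrop:]   # HEADCROP: remove the first `headcrop` bases
    return seq if len(seq) >= minlen else None  # MINLEN: drop short reads

# Toy reads (hypothetical): one of 300 bases and one of 60 bases
long_read = "A" * 15 + "C" * 200 + "G" * 85
short_read = "ACGT" * 15

kept = trim_read(long_read)      # 160 bases remain after both crops
dropped = trim_read(short_read)  # 45 bases remain, below MINLEN
```

Under these settings no retained read can be longer than 160 bases or shorter than 50 bases.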

Clean, quality-filtered reads were first assembled de novo using Trinity v2.0.3 (November 10, 2013 release) (Haas et al., 2013) with default settings for non-strand-specific paired-end reads. A second assembly was done using the Shannon assembler (Kannan et al., 2016) with default settings. De novo assembly was done for each accession, followed by removal of redundancy using CD-HIT. To estimate transcript abundance, the cleaned reads were aligned to the reference transcriptome (created by combining the individual assemblies) using the RSEM (RNA-Seq by Expectation Maximization) software available in the Trinity utilities (Haas et al., 2013). RSEM uses Bowtie (Langmead et al., 2009) for the alignment, calculates transcript abundance and estimates FPKM (fragments per kilobase of target transcript length per million reads mapped) and TMM (trimmed mean of M-values) values (Li and Dewey, 2011).
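The FPKM definition given above corresponds to a simple formula, sketched here with hypothetical numbers. The sketch uses the raw transcript length, whereas RSEM works with an effective length and probabilistically assigned fragment counts, so it should be read as an illustration of the unit rather than of the RSEM implementation.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library with 10 million mapped fragments
value = fpkm(500, 2000, 10_000_000)
```

Doubling the transcript length or the library size halves the FPKM for the same fragment count, which is what makes values comparable across transcripts and samples.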

3.6 Differential gene expression analysis

The edgeR software package from the Bioconductor project (Anders et al., 2013) was used to identify differentially expressed (DE) transcripts between the samples and experimental conditions. edgeR compares transcript expression across all the samples and generates statistical significance values that can be visualized as MA plots (log ratios versus abundance) (Robinson et al., 2010), correlation plots and clustered heat maps (Haas et al., 2013). DE transcripts were compared across all accessions and stages at a 1% false discovery rate (FDR). DE transcripts between infested and control samples per accession at an FDR cut-off of 5% were used for further analysis. A change of at least four-fold, expressed as log2 fold change (logFC), was used to identify up-regulated and down-regulated transcripts. Cluster heat maps were generated from the DE transcripts at an FDR of 0.1%.
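The selection logic described above can be sketched in a few lines. The sketch assumes Benjamini-Hochberg adjustment for the FDR and treats a four-fold change as an absolute log2 fold change of at least 2; the function names and the example records are hypothetical, and the code is not a substitute for the edgeR workflow.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_de(records, fdr_cutoff=0.05, min_abs_logfc=2.0):
    """records: list of (transcript_id, log2FC, p_value) tuples."""
    fdr = benjamini_hochberg([r[2] for r in records])
    return [(tid, lfc, "up" if lfc > 0 else "down")
            for (tid, lfc, _), q in zip(records, fdr)
            if q <= fdr_cutoff and abs(lfc) >= min_abs_logfc]

# Hypothetical records: (transcript, log2 fold change, raw p-value)
records = [("t1", 3.1, 0.0001), ("t2", -2.5, 0.004),
           ("t3", 1.2, 0.003), ("t4", 4.0, 0.20)]
selected = select_de(records)
```

Here t3 is excluded because its fold change is below four-fold, and t4 because its adjusted p-value exceeds the FDR cut-off.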

3.7 Gene prediction, annotation and classification

The transcriptome of each accession was annotated against proteins from NCBI RefSeq plants (http://www.ncbi.nlm.nih.gov), UniProt Viridiplantae (www.uniprot.org), Protein Families (Pfam) and the Clusters of Orthologous Groups (COG) database (ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/data) using BLAST (Altschul et al., 1997) at an e-value of 1e-3. TransDecoder (http://transdecoder.github.io/) was used to find proteins with the longest open reading frames (ORFs) before searching against the Pfam database. Blast2GO was used to assign Gene Ontology (GO) terms to transcripts and to identify the major KEGG pathways per accession. BLAST output from RefSeq plants at an e-value of 1e-5 was used in the Blast2GO analysis. The GO distribution was visualized using the WEGO online software. The PANTHER database (http://pantherdb.org/) was used to classify UniProt BLAST hits from all the accessions into protein classes.
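Applying an e-value cut-off to BLAST results can be sketched as below. The sketch assumes the standard 12-column tabular output (outfmt 6), in which the e-value and bit score are the 11th and 12th columns, and keeps the best-scoring hit per query. The function name and the example lines are hypothetical and are not hits from this study.

```python
def best_hits(tabular_lines, max_evalue=1e-3):
    """Keep the best-scoring hit per query from BLAST tabular (outfmt 6) lines."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit does not pass the e-value cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical tabular lines for two query transcripts
lines = [
    "tx1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "tx1\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t120",
    "tx2\tP3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30",
]
hits = best_hits(lines)
```

In this toy input only tx1 is annotated, because the single hit for tx2 has an e-value above 1e-3.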

3.8 SNP analysis

FreeBayes (Garrison and Marth, 2012) and VCFtools (Danecek et al., 2011) were used for SNP calling and filtering, respectively. Reads from each sample were aligned to the finger millet reference transcriptome. After the initial mapping, the alignments were refined by recalibrating the base qualities and masking known SNPs. SNPs and indels (insertions and deletions) were then called with FreeBayes. Additional filtering was done using VCFtools.
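The filtering step can be illustrated with a minimal sketch that reads VCF records, separates SNPs from indels, and keeps SNPs above a quality threshold. It relies only on the fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL). The function names, the threshold of 20 and the example records are hypothetical and are not the filters or variants of this study.

```python
def classify_variant(ref, alt):
    """Call a record a SNP when REF and every ALT allele are a single base."""
    alts = alt.split(",")
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "SNP"
    return "indel_or_complex"

def filter_snps(vcf_lines, min_qual=20.0):
    """Return (chrom, pos, ref, alt, qual) for SNP records with QUAL >= min_qual."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if qual == ".":
            continue  # missing quality value
        if classify_variant(ref, alt) == "SNP" and float(qual) >= min_qual:
            kept.append((chrom, int(pos), ref, alt, float(qual)))
    return kept

# Hypothetical VCF content
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig1\t101\t.\tA\tG\t55.3\t.\tDP=30",
    "contig1\t250\t.\tAT\tA\t80.0\t.\tDP=25",
    "contig2\t40\t.\tC\tT\t5.1\t.\tDP=4",
]
snps = filter_snps(vcf)
```

The second record is excluded as a deletion and the third as a low-quality call.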


CHAPTER FOUR

RESULTS

4.1 Agronomic traits scoring in the screen house

Agronomic traits are crucial for the adaptability of finger millet accessions to their agro-ecological regions. Based on plant height from ground level to the ear tip at the dough stage, GuluE was the shortest (average of 68 cm), while IE2459 was the tallest (average of 93 cm) (Table 4.1). IE2396 and White Sel6 attained almost the same height at this stage. Visual examination revealed that GuluE and IE2459 were pigmented at the flowering stage, while the other two accessions were not (Table 4.1). Most of the accessions had an average of two basal tillers bearing mature ears, except IE2459, which had a single tiller or none (Table 4.1). GuluE matured earlier (90 days) than the other accessions, which took 98-105 days from sowing to the stage when 50% of the main tillers had mature ears (Table 4.1).

At the dough stage there was finger branching in all the accessions, but finger lengths differed (Figure 4.1). At this stage GuluE and White Sel6 had semi-compact ears with curved tops, while IE2396 had a compact or incurved ear and IE2459 a fist-like ear shape. The ear shapes had changed by the maturity stage (Figure 4.1). IE2396 and IE2459 had the same range of finger number (3-6), which differed from that of GuluE (4-6) and White Sel6 (5-10) at this stage (Table 4.1).


Among the four accessions, only GuluE yielded green fodder at maturity (Table 4.1). Its grain colour was the same as that of IE2459 (copper-brown), but different from IE2396 (purple-brown) and White Sel6 (white, as its name suggests). On average, IE2396 had the highest grain weight, whether measured as 1000-grain weight or as grain yield per plant (Table 4.1). White Sel6 had the lowest 1000-grain weight, but its average grain yield per plant was above that of GuluE and IE2459 (Table 4.1).

Table 4.1: Screen house agronomic trait scores. Mean scores and morphological descriptors of finger millet based on the Bioversity International (2010) scoring index.

Descriptors               GuluE    IE2396   White Sel6   IE2459
Plant Height (cm)         68       79       77           93
Plant Pigmentation        1        0        0            1
Productive tillers        2-3      1-3      1-3          1
Days to flowering         84       98       91           98
Ear shape                 3        4        3            5
Finger branching          1        1        1            1
Finger length (mm)        57       68       86           64
No. of grains/spikelet    5        3        7            7
Grain colour              3        4        1            3
Green fodder yield        Yes      no       no           no
Finger number             4-6      3-6      5-10         3-6
Days to maturity          90       105      98           105
1000-grain weight (g)     3.43     4.27     2.86         3.66
Grain yield/plant (g)     9.98     15.26    11.264       10.778

The details of the morphological descriptors and their classifications are available in Bioversity International (2010). For example, the value of 3 used to score ear shape in both GuluE and White Sel6 denotes a semi-compact ear with curved tops.


Figure 4.1: The different ear shapes of the four accessions at maturity. GuluE (A) adopts open straight fingers, IE2396 (B) semi-compact with curved tops, White Sel6 (C) droopy (the fingers lax and droop), and IE2459 (D) semi-compact ear shapes.

4.2 Critical Striga infestation stages and laboratory screening for resistance and susceptibility at each stage

Observable Striga germination commenced as early as 2 days post infestation (dpi). However, optimal germination was seen at 3 dpi. An example of clear Striga germination was observed in IE2459 at 3 dpi (Figure 4.2.A). Germination also occurred in all the other accessions, at different percentages.

Striga attachment was most visible at 5 dpi, after optimal germination. For example, attachment of Striga to the GuluE and White Sel6 accessions was delayed, while optimal attachment was observed in IE2459 two days after germination (Figure 4.2.B). Penetration was observed two days after attachment (Figure 4.2.C). These three stages were the time points at which total RNA was extracted to identify differentially expressed genes in tolerant and susceptible finger millet accessions exposed to Striga. Figure 4.2.D shows an observation made after 7 dpi that confirmed successful penetration and Striga shoot development in accession IE2459. Although there was no clearly observable necrosis of Striga at 7 dpi in IE2459, development was delayed in the tolerant accession IE2396 (Appendix 9), in which no apical growth of the kind seen in IE2459 ensued (Figure 4.2.D).

At 3 dpi, the IE2459 accession stimulated the highest number of Striga haustorial hairs per plate (Figure 4.3). The frequency of haustorial hair growth in petri dishes containing GuluE and White Sel6 was equal across the nine set-ups. The lowest number of Striga seeds that attached to and penetrated the host root was observed in IE2396 petri dishes, while the susceptible accession IE2459 had the highest number of Striga attachments at 9 dpi (Figure 4.3).

Figure 4.2: Optimal Striga infestation stages. Optimal germination was achieved after three days (A), attachment two days after germination (B), and penetration two days after attachment (C). Continued development was monitored as a confirmation of the establishment of Striga (D).


GuluE and White Sel6 also had almost similar attachment frequencies (Figure 4.3). Successful attachment was judged largely by the subsequent penetration.

Figure 4.3: A comparison of the means of Striga counts per accession in all the four seedlings per petri dish at 3 and 9-days post infestation (dpi).

Furthermore, post hoc analysis identified different sources of variation in haustoria development and Striga attachment at 3 dpi and 9 dpi (Figure 4.4). At 3 days post infection the main source of variation was the resistant accession IE2396 (superscript a), which was significantly different from the other three accessions, GuluE, IE2459 and White Sel6 (all superscript b) (Figure 4.4). At 9 dpi, the susceptible accession IE2459 (superscript b) was significantly different from the rest of the accessions (Figure 4.4).


Figure 4.4: Tukey's post hoc analysis of the group means of the number of Striga attachments at 3 and 9 days post infection. Accessions with the same superscript did not show significant differences in mean Striga attachment (p<0.05).

4.3 Quality of RNA and the cDNA library

Good quality RNA was obtained from all the 24 samples, as shown by denaturing gel electrophoresis (Figure 4.5). On the gel photo (Figure 4.5), sample ten did not show a visible band, while sample six showed a very faint one. However, the concentrations measured with the Qubit® 3.0 Fluorometer were well above the minimum (6 ng/µl) required for cDNA library preparation.

[Figure 4.5 image: denaturing gel, lanes 1-12 (upper panel) and 13-24 (lower panel); band labels 28S and 25S]

Figure 4.5: Denaturing gel of the 24 E. coracana RNA samples, performed as a quality check before library preparation. For every sample, 4 µl RNA, 6 µl formamide, 1 µl 10x loading dye and 0.5 µl GelRed were denatured at 65°C for 5 min, then immediately chilled on ice for 5 min.

The cDNA libraries were of good quality and length (Figure 4.6). The Agilent Bioanalyzer gel (Figure 4.6.A) shows the quality of the 12 GuluE and White Sel6 samples. Similar results were observed for the other accessions, IE2459 and IE2396. The Bioanalyzer traces also showed that all the peaks were at 70 seconds, which is equivalent to a 270 bp fragment (Figure 4.6.B). This confirmed the length of the mRNA, which had been fragmented to 270 bp before synthesis of the reverse strand.


Figure 4.6: Bioanalyzer results. (A) Gel for twelve samples from the GuluE and White Sel6 accessions. (B) A peak at 70 seconds is equivalent to the expected fragments of 270 bp.

4.4 Transcriptome sequencing and de novo assembly

The finger millet cDNA libraries were sequenced on the Illumina MiSeq platform, which yielded slightly above 85.6 million 250 bp paired-end reads (Table 4.2). A total of 67,570,534 (79%) clean reads were obtained after quality control and were used for de novo assembly and downstream analysis. The number of reads per sample varied widely, between 400,000 and 7 million (Table 4.2), while the number of reads per accession varied between 5 million and 10 million (Appendices 1-4).

Table 4.2: Total number of reads for all the accessions

          Raw reads                  Quality reads              Quality     Trimmed
Sample    R1           R2            R1           R2            reads (%)   (%)
1         1,529,378    1,529,378     1,308,816    1,308,816     86          14
2         2,692,004    2,692,004     2,083,593    2,083,593     77          23
3         818,627      818,627       679,356      679,356       83          17
4         1,301,674    1,301,674     1,094,587    1,094,587     84          16
5         556,928      556,928       435,696      435,696       78          22
6         782,784      782,784       482,341      482,341       62          38
7         1,184,349    1,184,349     810,551      810,551       68          32
8         712,086      712,086       563,496      563,496       79          21
9         919,657      919,657       778,454      778,454       85          15
10        7,467,082    7,467,082     5,894,746    5,894,746     79          21
11        1,529,719    1,529,719     1,244,831    1,244,831     81          19
12        1,285,002    1,285,002     1,009,781    1,009,781     79          21
13        1,386,583    1,386,583     1,138,602    1,138,602     82          18
14        2,295,942    2,295,942     1,929,631    1,929,631     84          16
15        1,520,724    1,520,724     1,256,362    1,256,362     83          17
16        1,532,275    1,532,275     973,181      973,181       64          36
17        1,629,222    1,629,222     1,331,119    1,331,119     82          18
18        843,675      843,675       595,498      595,498       71          29
19        1,143,907    1,143,907     872,017      872,017       76          24
20        2,512,467    2,512,467     1,738,263    1,738,263     69          31
21        1,897,188    1,897,188     1,565,174    1,565,174     82          18
22        1,844,108    1,844,108     1,592,819    1,592,819     86          14
23        4,007,297    4,007,297     3,213,886    3,213,886     80          20
24        1,411,996    1,411,996     1,192,467    1,192,467     84          16
Total     42,804,674   42,804,674    33,785,267   33,785,267    79          21

Quality reads are the reads that remained after trimming, while trimmed reads are the reads that were discarded during quality control. R1 is the forward read and R2 the reverse read of each paired-end fragment. Samples 1-24 are all the sequenced samples from all the accessions.
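The percentages in Table 4.2 follow directly from the read counts, as the short calculation below shows for the column totals and for sample 1. The function and variable names are illustrative only; the counts are taken from Table 4.2.

```python
def percent_retained(raw_reads, quality_reads):
    """Percentage of raw reads that remained after quality control."""
    return 100.0 * quality_reads / raw_reads

# Column totals from Table 4.2 (read pairs), counted over R1 and R2 together
raw_pairs_total = 42_804_674
quality_pairs_total = 33_785_267
total_clean_reads = 2 * quality_pairs_total
overall = percent_retained(2 * raw_pairs_total, total_clean_reads)

# Sample 1 from Table 4.2
sample1 = percent_retained(1_529_378, 1_308_816)
sample1_trimmed = 100.0 - sample1
```

Rounded to whole numbers, these values reproduce the 79% overall retention and the 86% quality and 14% trimmed reads reported for sample 1.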


The CD-HIT program was used to reduce the redundancy of the transcript assemblies generated by Trinity de novo full-length transcriptome reconstruction. CD-HIT clustering for the four accessions generated the following numbers of contigs: White Sel6 (125,630), IE2459 (111,811), GuluE (138,557), and IE2396 (117,856) (Appendix 14).

Additionally, the reads were assembled using the Shannon program, generating the following numbers of contigs: White Sel6 (39,186), IE2459 (63,604), GuluE (28,110), and IE2396 (33,156) (Table 4.3). The N50 (the contig length at or above which contigs contain at least half of the assembled bases) of these assemblies was 346 bp, 429 bp, 287 bp and 310 bp for White Sel6, IE2459, GuluE and IE2396, respectively (Table 4.3). Furthermore, the contigs from the Shannon program were clustered using CD-HIT at a similarity level of 95%, generating 33,368, 51,627, 25,154 and 29,013 non-redundant contigs from the original assemblies of White Sel6, IE2459, GuluE and IE2396, respectively (Table 4.3). The average GC content for all the assemblies was 48.75%.
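The two summary statistics used here can be computed as in the following sketch. N50 is taken as the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length; GC content is the percentage of G and C bases. The function names and the toy inputs are hypothetical and are not contigs from these assemblies.

```python
def n50(lengths):
    """Contig length at which the sorted cumulative length reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")

def gc_content(sequences):
    """Percentage of G and C bases over all sequences."""
    total = sum(len(s) for s in sequences)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return 100.0 * gc / total

# Toy contig lengths (total 2,000 bp): 850 + 400 = 1,250 bp passes the halfway mark
toy_n50 = n50([200, 250, 300, 400, 850])

# Toy sequences with four G/C bases out of eight
toy_gc = gc_content(["GGCC", "ATAT"])

# Mean of the per-accession GC values listed in Table 4.3
mean_gc = (48.64 + 48.21 + 50.28 + 47.88) / 4
```

The mean of the four per-accession GC values in Table 4.3 is 48.7525%, consistent with the reported average of 48.75%.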

The Trinity assembly produced more transcripts than the Shannon assembler (Table 4.3). The number of genes in finger millet has been estimated at about 85,243, half of which are repetitive in nature (Hittalmani et al., 2017). Further analysis was done using the Shannon assemblies.


Table 4.3: Statistical summary of the de novo assembly results per finger millet accession

Accessions                               White Sel6   IE2459   GuluE    IE2396
No. of contigs for reference assembly    39186        63604    28110    33156
No. of nr contigs from CD-Hit (95%)      33368        51627    25154    29013
N50 (bp)                                 346          429      287      310
GC (%)                                   48.64        48.21    50.28    47.88

The minimum contig length for the reference assembly is 200 bp. nr: non-redundant.

4.5 Functional annotation and classification of transcripts

Similarity searches using the four transcriptomes generated by the Shannon program produced a total of 76415, 78825, 2049 and 17930 hits against the RefSeq, UniProt, Pfam and COG databases, respectively (Appendix 5). The highest proportions of BLAST hits were obtained from the RefSeq plants (54.9%) and UniProt (56.6%) databases. The mapping rate in these two databases exceeded 50% in all the accessions except GuluE (Appendix 5).

Furthermore, likely protein-coding regions in the four transcriptomes were identified using TransDecoder. The numbers of putative proteins with an open reading frame (ORF) of at least 70 bp were 12980, 26620, 5606 and 8247 for White Sel6, IE2459, GuluE and IE2396, respectively. The ORFs were annotated by searching against four databases, namely NCBI RefSeq, COG, Pfam and UniProt (Figure 4.7).
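The idea of extracting the longest ORF from a transcript can be sketched as below. This is a deliberately simplified illustration and not the TransDecoder algorithm: it scans only the three forward reading frames, requires an ATG start and an in-frame stop codon, and applies the 70 bp minimum length mentioned above. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_forward_orf(seq, min_len_bp=70):
    """Longest ATG-to-stop ORF over the three forward frames, or None."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) >= min_len_bp and (best is None or len(orf) > len(best)):
                    best = orf
                start = None
    return best

# Toy transcript: 2 bp leader, ATG, 25 GCT codons, TAA stop, 2 bp trailer
toy = "CC" + "ATG" + "GCT" * 25 + "TAA" + "GG"
orf = longest_forward_orf(toy)
```

A complete implementation would also scan the reverse complement and handle ORFs that run off the end of a transcript, neither of which is attempted here.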


Figure 4.7: Venn diagram of transcript homology in four databases for each accession. A. GuluE, B. IE2396, C. IE2459 and D. White Sel6. The Blastx was done at an e-value of 1e-3.

Gene ontology (GO) enrichment of the putative ORFs provided further annotation. A total of 15783, 8599, 6262, and 2883 GO terms were recovered from IE2459 (Figure 4.10), White Sel6 (Figure 4.11), IE2396 (Figure 4.9) and GuluE (Figure 4.8), respectively. The most abundant "Cellular Component" terms in the four accessions were "cell part" (GO:0044464) (11625), "cell" (GO:0005623) (11806), and "organelle" (GO:0043226) (7556). The top "Biological Process" terms were "metabolic process" (GO:0008152) (12268) and "cellular process" (GO:0009987) (11367). The most abundant "Molecular Function" terms were "binding" (GO:0005488) (13383) and "catalytic activity" (GO:0003824) (12589). Other highly represented GO terms per accession included, under "Cellular Component", "macromolecular complex" (GO:0032991) in GuluE, IE2396 and IE2459; under "Biological Process", "localization" (GO:0051179), "response to stimulus" (GO:0050896) and "establishment of localization" (GO:0051234); and under "Molecular Function", "transporter activity" (GO:0005215) and "structural molecule activity" (GO:0005198).

Figure 4.8: Global view of Gene Ontology in GuluE.


Figure 4.9: Global view of Gene Ontology in IE2396.

Figure 4.10: Global view of Gene Ontology in IE2459.

Figure 4.11: Global view of Gene Ontology in White Sel6.

Orthologous transcripts from all the accessions mapping to the COG database were further classified into 25 major clusters (Figure 4.12). The most abundant COGs across all the accessions were "Translation, ribosomal structure and biogenesis" (1760), "Signal transduction mechanisms" (1759), "Posttranslational modification, protein turnover, chaperones" (1501), "Energy production and conversion" (1076), "Amino acid transport and metabolism" (1038), "General function prediction only" (979), "… transport and metabolism" (821), "Replication, recombination and repair" (650), "Inorganic ion transport and metabolism" (580), "Cell wall/membrane/envelope biogenesis" (543), and "Defense mechanisms" (401). More of the "Defense mechanisms" orthologs mapped to IE2459 (192) and White Sel6 (133) than to IE2396 (51) and GuluE (25). "Signal transduction mechanisms" showed a similar trend, with more orthologs mapping to IE2459 (992) and White Sel6 (403) than to IE2396 (266) and GuluE (98).

A total of 17,039 RefSeq BLAST hits mapped to 459 KEGG metabolic pathways, including 3,063 enzymes (Appendix 6). The highest number of pathways was recovered from IE2459 (128), while GuluE had the fewest (102). In addition, 4554 and 3253 protein sequences in White Sel6 and IE2396 mapped to 119 and 110 pathways, respectively (Appendix 6). The top twenty pathways were consistent across all the accessions (Figure 4.13). The "Purine metabolism" (Appendix 22), "Thiamine metabolism" and "Biosynthesis of antibiotics" (Appendix 21) pathways were consistently overrepresented, in that order, in all the accessions (Figure 4.13). Other pathways varied in number from one accession to another. For example, "Oxidative phosphorylation" and "Phenylpropanoid biosynthesis" (Appendix 20) were overrepresented in IE2459 and White Sel6 and underrepresented in IE2396 and GuluE (Figure 4.13).

Out of the 407 DE transcripts, a total of 118 mapped to the KEGG pathways database (Appendix 19). Most of the transcripts mapping to the pathway database were DE transcripts at 5 dpi, the time point at which the largest expression differences were observed. No down-regulated transcripts mapped to the pathways except in White Sel6, in which about 10 transcripts mapped to the database (Appendix 19).


Figure 4.12: Cumulative view of Cluster of Orthologous Groups in all the accessions. The clusters were derived from pooling classes from each accession together.

A total of 2713 protein class hits were recovered from the PANTHER (Protein ANalysis THrough Evolutionary Relationships) classification system (Appendix 8). The most represented classes were nucleic acid binding (17.8%), hydrolase (12.9%), transferase (11.6%), transporter (9.4%) and oxidoreductase (9.3%) (Figure 4.14). The least represented classes were cell junction protein (0.1%) and structural protein (0.1%). No proteins represented the transmembrane receptor regulatory/adaptor protein class (PC00226) in GuluE and IE2396 (Appendix 8).

Figure 4.13: Major KEGG pathways per accession. The pathways were identified from RefSeq Blastx (e-value 1e-3) output using Blast2GO. Three major pathways, Purine metabolism, Thiamine metabolism and Biosynthesis of antibiotics were overrepresented in all the accessions.

Figure 4.14: Protein classes. The distribution of the protein classes derived from the transcripts mapping to Uniprot BLAST at an e-value 1e-3. The top represented protein class was nucleic acid binding.

4.6 Differentially expressed transcripts

A total of 407 transcripts were predicted with at least 4-fold differential expression at 1% FDR across all accessions (Table 4.4). There were 320 up-regulated and 97 down-regulated transcripts between Striga-infested samples and their respective controls (Table 4.4). The highest number of DE transcripts was observed at 5 dpi (291) in almost all the accessions, compared with 3 dpi (47) and 7 dpi (69) (Table 4.4).


Table 4.4: Number of differentially expressed (DE) transcripts

Accessions    Time points   Differentially Expressed transcripts   Up-regulated   Down-regulated
IE2459        3 dpi         7                                      2              5
              5 dpi         143                                    141            22
              7 dpi         4                                      2              2
White Sel6    3 dpi         0                                      0              0
              5 dpi         68                                     24             44
              7 dpi         17                                     9              8
GuluE         3 dpi         31                                     29             2
              5 dpi         33                                     13             10
              7 dpi         47                                     47             0
IE2396        3 dpi         9                                      9              0
              5 dpi         47                                     44             3
              7 dpi         1                                      0              1
Total                       407                                    320            97

The DE transcripts were identified at a false discovery rate (FDR) of 1% and a log2FC of 4 between infected and control samples per infestation time point for each finger millet accession
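The per-time-point totals quoted in the text can be reproduced from the "Differentially Expressed transcripts" column of Table 4.4 with a short tally. The variable names are illustrative only; the counts are those in the table.

```python
# DE transcript counts per accession and time point, from Table 4.4
de_counts = {
    "IE2459":     {"3 dpi": 7,  "5 dpi": 143, "7 dpi": 4},
    "White Sel6": {"3 dpi": 0,  "5 dpi": 68,  "7 dpi": 17},
    "GuluE":      {"3 dpi": 31, "5 dpi": 33,  "7 dpi": 47},
    "IE2396":     {"3 dpi": 9,  "5 dpi": 47,  "7 dpi": 1},
}

# Sum the counts across accessions for each time point
per_time_point = {}
for accession, counts in de_counts.items():
    for time_point, n in counts.items():
        per_time_point[time_point] = per_time_point.get(time_point, 0) + n

grand_total = sum(per_time_point.values())
```

The tally gives 47, 291 and 69 DE transcripts at 3, 5 and 7 dpi, respectively, and 407 in total.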

Clustering of the differentially expressed transcripts at 0.1% FDR, comparing infested samples with their controls at each infestation time point, varied across the genotypes (Appendices 15, 16, 17 and 18). The infested samples clustered distinctly from the controls. Four major sub-clusters of transcripts (represented on the y-axis) defined the clustering of the samples (Appendices 15, 16, 17 and 18). Differentially expressed transcripts with a logFC of at least 2 between the infected and control samples were visualized using volcano plots, where they are shown as red dots (Figure 4.15). More pronounced differences were observed in IE2459 and White Sel6 than in GuluE and IE2396 (Figure 4.15).

Figure 4.15: Volcano plots showing differentially expressed transcripts at 5 dpi. A. GuluE, B. IE2396, C. White Sel6 and D. IE2459.

Among the top 10 differentially expressed transcripts, more hits to known proteins were recovered from NCBI RefSeq at 5 dpi (39) than at 3 dpi (15) and 7 dpi (15) (Appendices 10, 11 and 12). The majority of the differentially expressed transcripts mapped to hypothetical proteins at all the time points of Striga infection. The top ten hits (both up- and down-regulated) to the UniProt database at 5 dpi are shown in Table 4.5. Up-regulated proteins included Chlorophyll a-b binding, Elongation Factor 1-alpha and Photosystem II D1 (IE2396); Cytochrome b6, Cinnamic acid 4-hydroxylase and Chlorophyll a-b binding (IE2459); and ATP synthase subunit a, Hexosyltransferase and Dehydrin (White Sel6) (Table 4.5). The top down-regulated proteins were Senescence-associated (IE2459) and Signal anchor (White Sel6) (Table 4.5).

Table 4.5: Top ten differentially expressed transcripts (both up-regulated and down-regulated) for all the accessions at the attachment stage

Transcript ID                               LogFC    Blast Evalue   Uniprot Hit Title             Uniprot ID    Species

GuluE
Shannon_shannonout_cremaining1_123054_0     9.53     5.00E-105      Uncharacterized               R0GWN2        C. rubella
Shannon_shannonout_r2_c1_3_3160_316         9.49     2.00E-50       Uncharacterized               M1DWM7        S. tuberosum
Shannon_shannonout_r2_c1_3_1590_159         9.25     1.00E-70       BnaCnng12640D                 A0A078I3U3    B. napus
Shannon_shannonout_r2_c1_3_2440_244         8.66     2.00E-48       Uncharacterized               M1DWM7        S. tuberosum
Shannon_shannonout_cremaining1_115929_2     8.54     3.00E-29       Uncharacterized               A0A161X3X2    D. carota
Shannon_shannonout_c1_17_11580_1158         -6.64    8.00E-73       Uncharacterized               A0A162A9Q8    D. carota
Shannon_shannonout_r2_c1_3_1580_158         -10.38   1.00E-70       BnaCnng12640D                 A0A078I3U3    B. napus
Shannon_shannonout_c1_18_1660_166           -12.24   3.00E-83       Uncharacterized               I1R8D4        O. glaberrima
Shannon_shannonout_r2_c1_3                  -12.58   1.00E-07       BnaCnng12640D                 A0A078I3U3    B. napus
Shannon_shannonout_r2_c1_3_8450_845         -14.27   2.00E-46       Uncharacterized               G7J100        M. truncatula

IE2396
Shannon_shannonout_r2_c1_4_1330_133         13.78    1.00E-07       Uncharacterized               A0A0D3DV04    B. oleracea
Shannon_shannonout_r2_c1_5_1740_174         9.91     3.00E-37       Uncharacterized               S8EP84        G. aurea
Shannon_shannonout_cremaining2_52017_3      9.62     0              Chlorophyll a-b binding       A0A1E5VI92    D. oligosanthes
Single_236                                  9.26     5.00E-79       Uncharacterized               A0A0E0B8L4    O. glumipatula
Shannon_shannonout_cremaining3_single_70    9.04     0              EF 1-alpha                    A0A0D3FEJ4    O. barthii
Shannon_shannonout_cremaining1_6986_2       3.12     5.00E-110      Uncharacterized               R0GWN2        C. rubella
Single_0                                    3.02     0              Photosystem II D1             A0A109NDD5    S. bicolor
Shannon_shannonout_r2_c1_5_6140_614         -4.30    3.00E-30       Uncharacterized               A0A161X3X2    D. carota
Shannon_shannonout_r2_c1_4_3550_355         -8.71    2.00E-46       Uncharacterized               G7J100        M. truncatula
Shannon_shannonout_r2_c1_4_1340_134         -15.58   2.00E-07       BnaCnng12640D                 A0A078I3U3    B. napus

IE2459
Shannon_shannonout_cremaining1_34110_2      14.26    4.00E-148      Cytochrome b6                 P05642        Z. mays
Shannon_shannonout_cremaining2_36734_2      11.47    0              Cinnamic acid 4-hydroxylase   Q94IP1        S. bicolor
Shannon_shannonout_cremaining1_1993_5       11.18    0              Chlorophyll a-b binding       A0A1E5VI92    D. oligosanthes
Shannon_shannonout_cremaining2_49471_1      10.89    0              Uncharacterized               A0A0D3HQA8    O. barthii
Shannon_shannonout_r2_c1_3_980_98           10.43    9.00E-15       Uncharacterized               A0A0E0C3U0    O. meridionalis
Shannon_shannonout_c1_0_5080_508            -10.53   2.00E-142      Senescence-associated         A0A072TJF5    M. truncatula
Shannon_shannonout_r2_c1_3_4920_492         -10.59   4.00E-162      Uncharacterized               A0A1B6QNI8    S. bicolor
Shannon_shannonout_r2_c1_3_2850_285         -13.55   4.00E-81       Uncharacterized               I1R8D4        O. glaberrima
Shannon_shannonout_r2_c1_3_1240_124         -18.59   3.00E-33       Uncharacterized               M0UBY5        M. acuminata
Shannon_shannonout_r2_c1_4_1860_186         -18.74   4.00E-96       Uncharacterized               A0A162A9Q8    D. carota

White Sel6
Single_14                                   10.23    1.00E-17       ATP synthase subunit a        A0A199V1R8    A. comosus
Shannon_shannonout_cremaining2_70245_2      9.50     9.00E-18       Uncharacterized               Q6R9N6        Z. mays
Shannon_shannonout_cremaining7_42799_2      8.48     0              Hexosyltransferase            K4AC77        S. italica
Single_623                                  8.39     3.00E-71       Uncharacterized               K3Z9P0        S. italica
Shannon_shannonout_cremaining1_single_67    5.70     4.00E-11       Dehydrin                      C5YX70        S. bicolor
Shannon_shannonout_cremaining2_70145_1      -8.23    1.00E-17       Uncharacterized               Q6R9N6        Z. mays
Shannon_shannonout_r2_c1_0_8323_1           -8.47    6.00E-14       Uncharacterized               K4D8K8        S. lycopersicum
Shannon_shannonout_r2_c1_0_8673_36          -11.35   7.00E-57       Uncharacterized               I1QKX6        O. glaberrima
Shannon_shannonout_r2_c1_0_8563_25          -11.92   2.00E-10       Signal anchor                 G7K1Z2        M. truncatula
Shannon_shannonout_r2_c1_4_1550_155         -17.48   3.00E-48       Uncharacterized               A0A072TKH6    M. truncatula

The top transcripts mapping to Uniprot proteins at an e-value of 1e-3 were filtered using logFC scale.

Differential expression analysis identified few proteins that were either up-regulated or down-regulated among the accessions at 3 dpi and 7 dpi, owing to the lower expression profiles and subsequently lower BLAST mapping rates (Appendix 11 and 12). At 3 dpi, only five protein hits could be annotated, of which two signal anchor proteins (XP_013455718 and XP_003616487) were conversely up- and down-regulated in GuluE and IE2396 (Table 4.6).

Table 4.6: Differentially Expressed (5% FDR) transcripts (both up-regulated and down-regulated) across all the accessions at 3 dpi

Proteins | RefSeq ID | Species | LogFC GuluE | LogFC IE2396 | LogFC IE2459 | LogFC White Sel6
Hypothetical protein MTR_8g040260 | XP_003627937 | M. truncatula | 5.50 ↑ | * | -2.69 ↓ | -14.08 ↓
Signal anchor, putative | XP_013455718 | M. truncatula | 2.83 ↑ | 4.60 ↑ | * | *
Hypothetical protein L484_018823 | XP_010104771 | M. notabilis | 3.67 ↑ | 4.14 ↑ | * | *
Hypothetical protein | YP_009162764 | B. braunii | 2.63 ↑ | * | -8.05 ↓ | *
Signal anchor, putative | XP_003616487 | M. truncatula | -3.54 ↓ | 2.84 ↑ | * | *

↑=Up-regulated, ↓=Down-regulated, *=Not Expressed. The top transcripts mapping to RefSeq proteins at an e-value of 1e-3 were filtered using logFC scale

The majority of the up-regulated proteins were observed at 5 dpi, with shared annotations in the Uniprot (Appendix 7) and NCBI RefSeq (Table 4.7) databases.

Table 4.7: Differentially Expressed (5% FDR) transcripts (both up-regulated and down-regulated) across the accessions at 5 dpi

Protein name | RefSeq ID | Species | LogFC GuluE | LogFC IE2396 | LogFC IE2459 | LogFC White Sel6
Hypothetical protein PHAVU_011G146200g | XP_007133039 | P. vulgaris | 5.36 ↑ | 4.49 ↑ | 3.77 ↑ | 2.96 ↑
Cytochrome b6 | YP_009155818 | O. officinalis | 3.15 ↑ | 3.145 ↑ | 13.18 ↑ | 3.97 ↑
Hypothetical protein CARUB_v10011286mg | XP_006306006 | C. rubella | 4.56 ↑ | 3.78 ↑ | 10.43 ↑ | 2.55 ↑
Maturase K | YP_009233928 | S. michauxianus | 2.53 ↑ | * | 2.96 ↑ | 3.69 ↑
ADP, ATP carrier protein 2 | XP_004953705 | S. italica | * | 4.43 ↑ | 3.13 ↑ | 4.16 ↑
Putative cytochrome P450 superfamily protein | NP_001140726 | Z. mays | * | 3.75 ↑ | 11.47 ↑ | 3.59 ↑
NADH-plastoquinone oxidoreductase subunit K | YP_009053884 | A. calcareus | * | 2.67 ↑ | 3.35 ↑ | 3.18 ↑
Cytochrome f | YP_009172152 | S. viridis | * | 2.60 ↑ | 3.14 ↑ | 2.79 ↑
Photosystem II protein M | YP_009233850 | H. cenchroides | * | 2.84 ↑ | 4.75 ↑ | 2.73 ↑
Cysteine proteinase 1 | XP_004952477 | S. italica | * | 3.85 ↑ | 4.47 ↑ | 3.96 ↑
Chlorophyll a-b binding protein 1 | XP_004957570 | S. italica | 8.41 ↑ | 9.62 ↑ | 11.18 ↑ | *
Hypothetical protein POPTR_1605s00200g | XP_006387178 | P. trichocarpa | 8.45 ↑ | 4.82 ↑ | 3.26 ↑ | *
Hypothetical protein CARUB_v10001685mg | XP_006288426 | C. rubella | 9.23 ↑ | 3.12 ↑ | 2.75 ↑ | *
Hypothetical protein MTR_8g040260 | XP_003627937 | M. truncatula | -2.67 ↓ | -15.58 ↓ | 4.28 ↑ | -3.47 ↓
Hypothetical protein PHAVU_006G011900g | XP_007146093 | P. vulgaris | -4.01 ↓ | * | -3.19 ↓ | -3.29 ↓
Senescence-associated protein, putative | XP_013442963 | M. truncatula | -4.42 ↓ | * | -3.46 ↓ | -3.46 ↓

↑ =Up-regulated, ↓=Down-regulated, *=Not Expressed. The top transcripts mapping to RefSeq proteins at an e-value of 1e-3 were filtered using logFC scale.

At 7 dpi, only five of the proteins annotated against NCBI RefSeq were commonly expressed among the accessions (Table 4.8). All five were hypothetical or uncharacterized proteins, except the metal transporter Nramp5.

Table 4.8: Differentially expressed (5% FDR) transcripts (both up-regulated and down-regulated) across the accessions at 7 dpi

Proteins | RefSeq ID | Species | LogFC GuluE | LogFC IE2396 | LogFC IE2459 | LogFC White Sel6
Hypothetical protein MTR_0055s0130 | XP_013442970 | M. truncatula | -19.55 ↓ | -9.75 ↓ | * | -2.93 ↓
Hypothetical protein L484_018823 | XP_010104771 | M. notabilis | -6.83 ↓ | * | * | 4.24 ↑
Metal transporter Nramp5 | XP_010112784 | M. notabilis | -6.85 ↓ | * | * | 4.52 ↑
Hypothetical protein | YP_009162764 | B. braunii | -6.62 ↓ | * | * | -8.64 ↓
Uncharacterized protein LOC100502416 | NP_001183823 | Z. mays | -5.48 ↓ | * | * | -7.62 ↓

↑=Up-regulated, ↓=Down-regulated, *=Not Expressed

LogFC = log2(fold change) = log2(Infected/Control), except in GuluE, where LogFC = log2(Control/Infected). The top transcripts mapping to RefSeq proteins at an e-value of 1e-3 were filtered using logFC scale.
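The sign convention in the footnote above can be illustrated with a minimal Python sketch. The function name `log_fc`, the `control_over_infected` flag and the example expression values are hypothetical and introduced only for illustration; they are not taken from the pipeline used in this study.

```python
import math

def log_fc(infected, control, control_over_infected=False):
    """Return the log2 fold change between two positive expression values.

    By default the ratio is infected/control. Setting
    control_over_infected=True flips it to control/infected, the
    convention the footnote describes for GuluE.
    """
    if infected <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    ratio = control / infected if control_over_infected else infected / control
    return math.log2(ratio)

# Hypothetical values: a transcript four times more abundant in infected tissue.
default_sign = log_fc(8.0, 2.0)                              # infected/control
flipped_sign = log_fc(8.0, 2.0, control_over_infected=True)  # control/infected
print(default_sign, flipped_sign)
```

Under the flipped convention the same biological change carries the opposite sign, which is worth keeping in mind when GuluE values are read alongside those of the other accessions.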

4.7 SNPs identification

The present study identified a total of 49332, 47015, 73654 and 46524 homoeologous SNPs in the GuluE, IE2396, IE2459 and White Sel6 accessions, respectively (Table 4.9). The corresponding numbers of insertions and deletions (INDELs) were 1148, 1177, 2681 and 1387 (Table 4.9). The highest homoeologous SNP frequencies were observed in GuluE (1SNP/152bp) and IE2396 (1SNP/196bp). In contrast, lower frequencies were observed in IE2459 (1SNP/279bp) and White Sel6 (1SNP/247bp) (Table 4.9). The frequencies of INDELs varied across the accessions from a high of 1INDEL/6565bp in GuluE to a low of 1INDEL/8297bp in White Sel6 (Table 4.9).

Table 4.9: Homoeologous SNPs and INDELS identified from de novo assembly of each accession

Accession | Contigs (bp) | No. of SNPs | No. of INDELs | SNP frequency | INDEL frequency
GuluE | 7536325 | 49332 | 1148 | 1/152 | 1/6565
IE2396 | 9229098 | 47015 | 1177 | 1/196 | 1/7841
IE2459 | 20555477 | 73654 | 2681 | 1/279 | 1/7667
White Sel6 | 11508475 | 46524 | 1387 | 1/247 | 1/8297

Reads from each accession were mapped back to its own assembly, and no filtering was applied to the variants.
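The frequencies in Table 4.9 appear to follow from dividing the assembled length by the number of variants. The sketch below works on that assumption; the helper name `bp_per_variant` is hypothetical, and because the rounding rule used for the table is not stated, the printed values may differ from the tabulated ones in the last digit.

```python
def bp_per_variant(assembly_bp, n_variants):
    """Average number of assembled bases per variant (the x in '1/x')."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return assembly_bp / n_variants

# Assembly sizes, SNP counts and INDEL counts as reported in Table 4.9.
table_4_9 = {
    "GuluE": (7536325, 49332, 1148),
    "IE2396": (9229098, 47015, 1177),
    "IE2459": (20555477, 73654, 2681),
    "White Sel6": (11508475, 46524, 1387),
}

for accession, (size_bp, snps, indels) in table_4_9.items():
    print(
        f"{accession}: 1 SNP per {bp_per_variant(size_bp, snps):.1f} bp, "
        f"1 INDEL per {bp_per_variant(size_bp, indels):.1f} bp"
    )
```

Expressing the frequency as bases per variant means a smaller denominator corresponds to a denser (higher) variant frequency.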

Conversely, using GuluE as the reference, a total of 53141, 46660 and 48502 non-homoeologous SNPs were identified against IE2396, IE2459 and White Sel6, respectively (Table 4.10). Different SNP counts were obtained when the reference transcriptomes of the four accessions were used interchangeably to screen for SNPs. For example, a total of 54278, 63190 and 55624 non-homoeologous SNPs were identified against GuluE, IE2459 and White Sel6, respectively, when IE2396 was used as the reference transcriptome (Table 4.10). With the IE2459 assembly as the reference, a total of 47362, 58336 and 66447 non-homoeologous SNPs were identified against GuluE, IE2396 and White Sel6, respectively (Table 4.10). In addition, using White Sel6 as the reference, a total of 51595, 57203 and 74042 SNPs were identified against GuluE, IE2396 and IE2459, respectively (Table 4.10). The average frequencies, obtained from the SNP frequencies of the three accessions mapped against each reference, varied considerably: averages of 1SNP/153bp, 1SNP/160bp, 1SNP/358bp and 1SNP/189bp were recovered with GuluE, IE2396, IE2459 and White Sel6 as references, respectively (Table 4.10).

Table 4.10: Non-Homoeologous Bi-allelic SNPs identified from de novo assembly of each accession

Reference | Contigs (bp) | GuluE SNPs | GuluE frequency | IE2396 SNPs | IE2396 frequency | IE2459 SNPs | IE2459 frequency | White Sel6 SNPs | White Sel6 frequency | Av. frequency
GuluE | 7536325 | * | * | 53141 | 1/142 | 46660 | 1/161 | 48502 | 1/155 | 1/153
IE2396 | 9229098 | 54278 | 1/146 | * | * | 63190 | 1/146 | 55624 | 1/166 | 1/160
IE2459 | 20555477 | 47362 | 1/434 | 58336 | 1/352 | * | * | 66447 | 1/309 | 1/358
White Sel6 | 11508475 | 51595 | 1/223 | 57203 | 1/201 | 74042 | 1/155 | * | * | 1/189

*=N/A. Reads from one accession were mapped to the assembly of another, and only bi-allelic SNPs were identified.
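The text does not spell out how the average frequency per reference was derived. One reading, assumed here and not confirmed by the source, is that the three per-accession SNP rates (SNPs per base pair) are averaged and the result is re-expressed as base pairs per SNP. With the counts reported for Table 4.10, this reading agrees with the tabulated averages to within about one base pair. The function name `mean_interval_bp` is hypothetical.

```python
def mean_interval_bp(reference_bp, snp_counts):
    """Average the per-accession SNP rates and return bp per SNP."""
    rates = [n / reference_bp for n in snp_counts]  # SNPs per bp
    mean_rate = sum(rates) / len(rates)
    return 1 / mean_rate

# Reference size and the SNP counts of the three accessions mapped to it,
# as reported in the text for Table 4.10.
references = {
    "GuluE": (7536325, [53141, 46660, 48502]),
    "IE2396": (9229098, [54278, 63190, 55624]),
    "IE2459": (20555477, [47362, 58336, 66447]),
    "White Sel6": (11508475, [51595, 57203, 74042]),
}

for name, (size_bp, counts) in references.items():
    print(f"{name}: about 1 SNP per {mean_interval_bp(size_bp, counts):.1f} bp")
```

Averaging the rates rather than the intervals gives more weight to the denser comparisons, which is why the result can differ from a simple mean of the three "1/x" denominators.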

The average transition/transversion (Ts/Tv) ratio ranged from 2.4 to 2.5 (Table 4.11). Apart from GuluE, the average ratio in the accessions was 2.4 (Table 4.11). Transitions were consistently more frequent than transversions. The lowest and highest Ts/Tv ratios were 2.17 (between White Sel6 and IE2459) and 2.65 (between IE2396 and GuluE), respectively (Appendix 13).

Table 4.11: Distribution of Ts/Tv coverages of Non-Homoeologous Bi-allelic SNPs

Reference | No. of sites | % Bi-allelic SNPs | A/G Ts (%) | C/T Ts (%) | T/G Tv (%) | A/C Tv (%) | A/T Tv (%) | C/G Tv (%) | Ts/Tv
GuluE | 49434 | 98.3 | 36.6 | 33.8 | 6.6 | 7 | 7.8 | 6.4 | 2.5
IE2396 | 57697 | 98.4 | 36.2 | 33.3 | 6.8 | 7.4 | 7.8 | 6.8 | 2.4
IE2459 | 57381 | 98.6 | 35.6 | 33.8 | 7 | 7.4 | 7.6 | 7.2 | 2.4
White Sel6 | 60947 | 98.5 | 36 | 33.4 | 6.8 | 7.4 | 7.7 | 7.2 | 2.4

Ts= transition, Tv= transversion
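The Ts/Tv column of Table 4.11 can be recovered from the percentage columns by dividing the summed transition percentages by the summed transversion percentages. The following sketch does this for each reference; the dictionary layout and the function name `ts_tv_ratio` are illustrative assumptions, while the percentages are those given in the table.

```python
def ts_tv_ratio(ts_percent, tv_percent):
    """Ratio of summed transition to summed transversion percentages."""
    return sum(ts_percent) / sum(tv_percent)

# Percentages from Table 4.11: two transition classes (A/G, C/T) and
# four transversion classes per reference.
table_4_11 = {
    "GuluE": ([36.6, 33.8], [6.6, 7.0, 7.8, 6.4]),
    "IE2396": ([36.2, 33.3], [6.8, 7.4, 7.8, 6.8]),
    "IE2459": ([35.6, 33.8], [7.0, 7.4, 7.6, 7.2]),
    "White Sel6": ([36.0, 33.4], [6.8, 7.4, 7.7, 7.2]),
}

ratios = {ref: round(ts_tv_ratio(ts, tv), 1) for ref, (ts, tv) in table_4_11.items()}
print(ratios)
```

Rounded to one decimal place, the computed ratios match the tabulated values of 2.5 for GuluE and 2.4 for the other three references.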

CHAPTER FIVE

DISCUSSIONS, CONCLUSIONS AND RECOMMENDATIONS

5.1 Discussion

5.1.1 Agronomic competence and in vitro Striga response evaluation

Transcriptome sequencing and analysis is one of the efficient ways of studying molecular variations in a non-model organism like finger millet without having to sequence the whole genome. The current study focused on understanding the transcriptome of finger millet at critical stages of Striga infestation. The critical stages or time points used were 3 dpi, 5 dpi and 7 dpi, which have been evaluated and used in other studies previously (Ejeta and Butler, 1993; Ejeta, 2007). These critical time points are crucial to the establishment, infection and survival of Striga, thereby acting as potential targets for controlling parasite invasion and focal points for devising control strategies (Ejeta, 2007).

Finger millet accessions differ in their Striga tolerance with regard to growth and yield. Accessions that tolerate Striga during the critical stages and that carry the grain quality demanded by consumers are needed for the success of the finger millet crop.

A preliminary evaluation of the impact of Striga on the four finger millet accessions used in this study was performed over two seasons in Western Kenya, in a field with several Striga spp. (Prof. Mathews Dida, unpublished). That evaluation was not part of this study, but it informed the choice of the accessions used. Detailed agronomic competence and in vitro Striga response evaluations were then carried out, respectively, to establish the plants' vigour and productivity in relation to their tolerance or susceptibility to Striga, and to optimize the specific time points at which Striga infestation took place.

Agronomic performance of the different accessions in the screen house was evaluated using morphological traits. Accession IE2396 was the most productive in terms of grain weight per plant and grain weight per 1000 seeds per finger head. Although this accession had very few fingers and grains per spikelet, it produced the heaviest grain, while White Sel6 produced the highest number of fingers and grains per spikelet but had very low grain weight. Traits such as high grain yield, medium plant height, high tillering ability and early maturity are some of the farmer-preferred qualities in finger millet (Owere et al., 2014), and can be associated with plant tolerance or escape mechanisms against Striga.

Variations in tillering were observed among the four finger millet accessions. GuluE (tolerant) attained the earliest maturity, the shortest height and the highest number of basal tillers. IE2459 (susceptible) was the tallest, had zero to one tiller, and was the latest to flower and mature. Variation in strigolactone production among the four accessions was inferred from the variation in tillering and in the levels of S. hermonthica infection. Jamil et al. (2012b) described tillering as a complex mechanism in which strigolactones play a role. That study suggested that genetic variations in tillering and strigolactone production are associated, and that high strigolactone production inhibits tillering in plants. Low-tillering, high strigolactone-producing genotypes are more prone to S. hermonthica infection, whereas high-tillering, low strigolactone-producing genotypes are less prone (Jamil et al., 2012b).

Testing for resistance to Striga was performed using in vitro assays to provide information concerning the basis of resistance in the host plant. In these assays, the presence of haustoria was microscopically detected around the growing host roots. Striga impact was more observable in IE2459 than in any other accession. IE2459 also stimulated the highest number of haustorial hair development at 3 dpi, hence a probable high producer of haustorial initiation factors.

The highest frequency of attachment and penetration, after 9 dpi, was observed in accession IE2459 and the lowest in IE2396. GuluE and White Sel6 attained nearly similar penetration frequencies. Haustoria on the susceptible finger millet accessions (IE2459 and White Sel6) penetrated the xylem and established xylem-xylem connections, as indicated by successful development until 9 dpi; this observation should be confirmed through histology. Haustorial development on the tolerant accessions (GuluE and IE2396) was diminished relative to that on the susceptible accessions. According to Ejeta (2007), successful haustorium attachment and penetration by the hemiparasite Striga is attainable nine days after infection.

Prior to attachment of Striga, haustorial initiation factors (HIF) stimulate the elongation of the radicle towards the host root, a process that is directly linked to the attachment frequency. According to Mohamed (2010), low haustorial initiation and high germination stimulant producers could be viable sources for depletion of Striga seed banks. In a different study, Striga-resistant maize had fewer Striga attachments, delayed parasitic development and higher mortality of attached parasites compared with a susceptible inbred line (Amusan et al., 2008). Further evidence of the impact of the parasite on the four finger millet accessions was sought in the genetic variation underlying their responses to Striga.

5.1.2 Differential expression and annotation of the transcripts

RNA-seq analysis was performed to identify changes in gene expression and metabolic pathways associated with the infection and development of S. hermonthica on the four finger millet accessions. Analysis of transcriptional gene expression involved in parasitism identified a total of 47 differentially expressed transcripts (DETs) with a fold change of ≥4 in the four analysed accessions. Fewer transcripts were differentially expressed at 3 dpi and 7 dpi than at 5 dpi. At 3 dpi, pre-conditioned Striga seeds are induced to germinate by strigolactones (SL), a chemical signal exuded from the roots of the host. During this period, host-parasite interaction is too limited to elicit substantial differences in gene expression between control and infested finger millet. In relation to SL stimulation by the host and haustorial development, a total of 407 transcripts were differentially expressed across the finger millet accessions. Unlike at 5 dpi, many of the DE transcripts at 3 dpi and 7 dpi mapped to hypothetical proteins.
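A fold change of ≥4 corresponds to an absolute logFC of at least 2 on the log2 scale used in Tables 4.6 to 4.8. The sketch below shows how such a cutoff could be combined with the 5% FDR threshold quoted in the table captions. It assumes the cutoff is applied symmetrically to up- and down-regulated transcripts, which the text does not state explicitly; the function name `is_det` and the example records are hypothetical.

```python
import math

FOLD_CHANGE_CUTOFF = 4   # fold change of >= 4, as quoted in the text
FDR_CUTOFF = 0.05        # 5% FDR, as quoted in the table captions

def is_det(log_fc, fdr, fold_change=FOLD_CHANGE_CUTOFF, fdr_cutoff=FDR_CUTOFF):
    """True if a transcript passes both the fold-change and FDR cutoffs.

    The fold-change cutoff is converted to the log2 scale and applied to
    the absolute logFC, so up- and down-regulation are treated alike
    (an assumption of this sketch).
    """
    return abs(log_fc) >= math.log2(fold_change) and fdr <= fdr_cutoff

# Hypothetical records: (transcript name, logFC, FDR).
records = [
    ("t1", 5.36, 0.001),   # passes both cutoffs
    ("t2", 1.20, 0.001),   # fold change too small
    ("t3", -3.46, 0.20),   # FDR too high
    ("t4", -4.42, 0.01),   # passes both cutoffs (down-regulated)
]

dets = [name for name, lfc, fdr in records if is_det(lfc, fdr)]
print(dets)
```

Working on the log2 scale keeps the two directions of regulation symmetric: a four-fold increase and a four-fold decrease sit at +2 and -2 respectively.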

At 3 dpi, the most abundant family encoded signal anchor proteins (XP_013455718 and XP_003616487), which were up- and down-regulated in the resistant accessions GuluE and IE2396, respectively. Signal anchor proteins carry out functions such as receiving chemical signals (for example SL) from outside the cell, translating chemical signals into intracellular action, and sometimes anchoring the cell in a particular location.

Stimulation, attachment and penetration of a haustorium are crucial to the infection and survival of Striga hermonthica. These were the interactions taking place at 5 dpi of Striga infestation. Apart from the cytochrome P450 (CYP450) superfamily protein ("monooxygenase activity") and cysteine protease 1 ("hydrolase"), the commonly DE proteins at 5 dpi were associated with functions involved in transport, photosynthesis, oxidation-reduction processes, signal transduction, cell wall development and mitochondrial electron transport/ATP synthesis. Plant CYP enzymes, members of the cytochrome P450 superfamily, are important for the biosynthesis of several compounds, such as hormones, defensive compounds and fatty acid conjugates (Gachomo et al., 2003). Some of the counteractive responses of finger millet to Striga exposure are increased transport and photosynthesis, because Striga, as an obligate hemiparasite, derives its nutrients and water from its host. Redox changes have been implicated in plant-parasite interactions, where the changes are catalysed by quinone oxidoreductases to initiate haustorium induction in parasitic plants (Ichihashi et al., 2015). Additionally, redox-mediated mechanisms involving quinone oxidoreductases induced by HIFs have been suggested to trigger haustorial development in parasitic plants (Matvienko et al., 2001; Bandaranayake et al., 2012; Ngo et al., 2013).

The high number of differentially expressed transcripts at 5 dpi indicates a full invasion of the host by Striga at this stage. The few defence-related genes detected at this point may reflect the fact that Striga and other parasitic plants, unlike pathogens such as fungi or bacteria, sometimes go unnoticed by their hosts (Volker et al., 2017). However, the up-regulation of the cytochrome P450 superfamily protein in at least three accessions could suggest some degree of defence. A cytochrome P450 gene was among the defence-related genes found to be up-regulated during ethylene or jasmonate treatment or during attack on the plant (Gachomo et al., 2003). Other proteins up-regulated in an individual accession (IE2396), such as pathogenesis-related protein (PRB1-2-like) and glycine-rich RNA-binding protein 1-like (GRPA) (Appendix 11), may also suggest a resistance reaction to Striga invasion. In a recent study, RAB15 (a GRPA gene) was observed to be central to ABA and JA signaling, transferring the signal to receptors to induce the expression of JA-responsive defense genes in maize (Zhang et al., 2015). Pathogenesis-related proteins have likewise been shown to confer broad-spectrum protection against fungal and bacterial pathogens in rice, with a stronger and quicker defense response during pathogen infection (Gómez-Ariza et al., 2007).

After the vascular connection between Striga and the host is established at 7 dpi, the parasite develops mechanisms for surviving on its host until it reaches the seed production stage. An untimely death of the host would lead to the death of the parasite as well. The parasite's life cycle is therefore synchronized with that of the host, which requires a trade-off in survival tactics (Volker et al., 2017). It has been shown that Striga hermonthica handles this trade-off by increasing its transpiration rate, leading to high potassium accumulation in the leaves (Volker et al., 2017). This imposes drought stress on the host plant, leading to increased development of its root system. Up-regulation of metal transporter Nramp5 at this stage in White Sel6 and GuluE may be a result of active transpiration in the parasite.

Gene ontology (GO) terms for the finger millet transcriptome were assigned based on GO slim terms for biological processes, molecular functions and cellular components. GO terms of significant differentially expressed transcripts in the four accessions revealed related molecular functions and biological processes that influence specific physiological processes. GO term enrichment identified abundant representation of genes in the "transporter activity" category, which can be associated with the transfer of nutrients and molecules between the host and the parasite during active Striga parasitism. In a previous study, parasitism by Cuscuta spp. (holoparasites) was associated with responses to stimuli, increased transporter activity and reduced photosynthesis (Ranjan et al., 2014). Transporter-related genes involved in the acquisition of nutrients from the host have also been identified in Cuscuta spp. (Ichihashi et al., 2015).

Plants respond in broadly similar ways when exposed to biotic or abiotic stresses. For example, when Aegilops variabilis (highly resistant to cereal cyst nematode) was exposed to nematode attack, GO annotation identified overrepresented terms that included "localization", "establishment of localization", "metabolic process", "cellular process" and "response to stimuli" (Xu et al., 2012). Huang et al. (2012) also found that GO categories such as "catalytic activity", "transferase activity" and "response to stimuli" were enriched at a late infection stage (13 dpi) of cowpea infested by Striga gesnerioides race SG3. The finger millet transcriptome in all the accessions similarly revealed related molecular functions and biological processes that influence specific physiological processes.

Transcripts mapping to the Clusters of Orthologous Groups (COG) database identified twenty-five functional clusters across all the accessions. The transcripts were functionally classified using the COG database and the relative abundance of COG categories was quantified (Tatusov et al., 1997). Globally, the transcripts were enriched in cellular processes and signalling categories such as translation, ribosomal structure and biogenesis; signal transduction mechanisms; cell wall/membrane/envelope biogenesis; and posttranslational modification, protein turnover and chaperones. Defence mechanism and signal transduction orthologs were more abundant in the susceptible genotypes (IE2459 and White Sel6) than in the tolerant accessions (GuluE and IE2396). These observations are consistent with the need of the parasite to decipher hormonal (SL) cues from the host and with cell wall biogenesis for haustoria formation. They are also consistent with the obligate nature of Striga, which depends on the transfer of nutrients and molecules between the host and the parasite during active parasitism (Ichihashi et al., 2015). After germination, Striga actively develops haustoria to attach to and penetrate the roots of the host. This is accompanied by active cell division, in which the cell must accurately duplicate the genetic information encoded in its DNA (replication), which is then transcribed into mRNA and eventually translated into proteins.

To adapt to environmental stresses, plants also stimulate the production of different classes of proteins acting in specific metabolic pathways. Tolerance or susceptibility to Striga is controlled by multiple traits. This entails several metabolic pathways, in both host and parasite, that interact during the expression of resistance or susceptibility in the host and of virulence or avirulence in the parasite (Rispail et al., 2007). "Purine metabolism", "Thiamine metabolism" and "Biosynthesis of antibiotics" were the most overrepresented pathways in all four accessions.

The purine metabolism pathway in plants is involved in nitrogen assimilation and storage. Purines are converted to ureides (allantoin and allantoic acid), which are then stored in the stems and roots of the plants (Smith and Atkins, 2002). The pathway's products include xanthosine monophosphate (XMP), adenosine monophosphate (AMP) and guanosine monophosphate (GMP). While AMP and GMP serve as DNA and RNA precursors, AMP is also a precursor of the cytokinin group of plant growth regulators (Smith and Atkins, 2002).

The thiamine metabolism pathway plays a role in processes underlying plant defence by indirectly enhancing anti-oxidative capacity in plants (Goyer, 2010). Studies have shown overrepresentation of thiamine during biotic and abiotic stresses (Subki et al., 2018; Goyer, 2010). The active form of thiamine (thiamine pyrophosphate) is a co-factor that is a crucial component of many metabolic activities such as acetyl-CoA biosynthesis, amino acid biosynthesis, the Krebs cycle and the Calvin cycle (Subki et al., 2018). Antibiotic biosynthesis pathways such as the tetracycline pathway are also involved in disease management. Antibiotics such as polyketides, lipopeptides and heterocyclic nitrogenous compounds have been shown to have a broad-spectrum action against plant pathogens (Fernando et al., 2005). Antibiotics act in two major ways: either by targeting and killing cells or by inhibiting or regulating enzymes involved in cell wall biosynthesis, nucleic acid metabolism and repair.

The phenylpropanoid pathway was more active (up-regulated) in the susceptible accessions (IE2459 and White Sel6) and down-regulated in the tolerant accessions (GuluE and IE2396). The over-representation of this pathway can be attributed to the host response to damage of the root cell walls by Striga haustoria. Repair of root cell walls in susceptible accessions involves lignification of the cell wall to mechanically prevent further haustorial penetration. Monolignols, the lignin monomers, are synthesized through the lignin-specific branch of the phenylpropanoid metabolic pathway. Furthermore, phenylpropanoid metabolism and related metabolic pathways are responsible for synthesizing a variety of secondary metabolites that play various roles in developmental and stress-related processes (Dixon and Paiva, 1995; Zabala et al., 2006).

Additionally, phenylalanine, tryptophan and tyrosine metabolic pathways were enriched in the four accessions. These amino acids are derived from the shikimate pathway and play an important role in phenylpropanoid metabolism. The deamination of phenylalanine to cinnamic acid, catalysed by phenylalanine ammonia lyase (PAL), is the first committed step in phenylpropanoid metabolism (Friend et al., 1973; Howles et al., 1996). Previously, enzymes and genes associated with "Phenylpropanoid metabolism" were up-regulated during resistance reactions in rice against S. hermonthica (Swarbrick et al., 2008) and in cowpea against Striga gesnerioides (Huang et al., 2012). Significant detection of genes related to the "Oxidative Phosphorylation" pathway was an indication that finger millet seedlings were actively minimizing the effects of reactive oxygen species produced under both abiotic and biotic stresses (Zheng et al., 2010).

Three major enzyme groups, transferases, oxidoreductases and hydrolases, were overrepresented in the finger millet transcriptome when exposed to Striga. Related protein classes were also identified when chickpea (Cicer arietinum L.) was exposed to both salinity and drought stresses (Varshney et al., 2009). This may be attributed to an active response of finger millet seedlings to the stressful environment caused by Striga infestation. A specific quinone oxidoreductase has also been implicated in the stimulation of haustorium development in the hemiparasitic plant Triphysaria (Bandaranayake et al., 2010), which can be directly associated with haustorium development in Striga in this study.

5.1.3 Functional annotation of Single Nucleotide Polymorphisms (SNPs)

Differences in the genetic diversity of the four finger millet accessions in this study could explain the observed variation in the expression profiles. Since finger millet is an allotetraploid (2n=4x=36) with two sub-genomes (AABB), homoeologous variants can occur between the sub-genomes. Homoeologous SNPs display little variation in frequency within individual accessions. Higher frequencies of homoeologous SNPs were observed in GuluE and IE2396 than in IE2459 and White Sel6, which could also explain the internal effect each accession experienced on exposure to Striga. SNP frequencies are determined by the genetic diversity, the number of accessions and the regions being analyzed, whether coding or non-coding (Mammadov et al., 2010). However, the high frequencies of homoeologous SNPs observed in this study may be due to the effect of polyploidy (genome duplication). Non-homoeologous SNPs provide a better representation of the differences between accessions.

Generally, higher non-homoeologous SNP frequencies were observed when the GuluE and IE2396 accessions were used as reference transcriptomes than when IE2459 and White Sel6 were used. The highest bi-allelic SNP frequencies were observed when GuluE was used as the reference assembly, ranging from 1SNP/142bp against IE2396 to 1SNP/161bp against IE2459 (Table 4.10). Similarly, the highest frequencies were observed when IE2459 reads were mapped to the other accessions. The lowest frequency (1SNP/434bp) was observed when GuluE reads were mapped to the IE2459 reference (Table 4.10).

SNPs occur at a density of 1SNP/375bp in exon coding sequences and 1SNP/490bp in intron non-coding sequences. However, due to high polymorphism, a frequency of 1SNP/124bp has been detected in the maize genome (Ching et al., 2002). A doubling of SNP frequency has been detected in poplar (Populus balsamifera) due to geographical diversity among the species analyzed (Cronk, 2005). Geographical diversity might have led to the high SNP frequencies between GuluE (domesticated in Uganda) and the other accessions (domesticated in Kenya). In contrast, the low frequencies of bi-allelic SNPs called against IE2459 in this study suggest a closer relationship with all the other accessions.

The average Ts/Tv ratio for bi-allelic SNPs was 2.4. Ts/Tv ratios of 2.6 and 2.5 have been observed between four separate genotypes in foxtail millet (Bai et al., 2013). Ts/Tv ratios for exome calls are expected to be higher than for the whole-genome set, which is around 2.2 (Bai et al., 2013). The Ts/Tv ratio is calculated separately for known and novel SNPs, and the higher the ratio, the better the quality of the SNPs (Liu et al., 2012). If nucleotide substitutions occurred at random, transversions (Tv) would be expected to be twice as abundant as transitions (Ts). In practice a bias towards transitions is observed, for which two hypotheses have been proposed (Lyons and Lauring, 2017). The first, the "mutation hypothesis", states that the transition mutation rates of polymerases are higher than the transversion rates. The second, the "selective hypothesis", posits that natural selection disfavours transversions because nonsynonymous transitions are more likely to conserve the biochemical properties of the original amino acid (Lyons and Lauring, 2017).
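The expectation that random substitutions would produce twice as many transversions as transitions follows from counting the possible base changes. The short sketch below enumerates all ordered substitutions among the four bases and classifies them; the function name `is_transition` is introduced here purely for illustration.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES

# All 12 ordered single-base substitutions among A, C, G and T.
substitutions = list(permutations("ACGT", 2))
transitions = [s for s in substitutions if is_transition(*s)]
transversions = [s for s in substitutions if not is_transition(*s)]

print(len(transitions), len(transversions))
print(len(transitions) / len(transversions))
```

Of the twelve possible substitutions, four are transitions and eight are transversions, so a random process would give a Ts/Tv ratio of 0.5. The ratios of about 2.4 observed in this study are well above that baseline.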

5.2 Conclusions

The following conclusion can be made from this study:

i) The two accessions previously identified as tolerant (GuluE and IE2396),

exhibited varying mechanisms of tolerance and escape to Striga infestation. This 80

was contrary to IE2459 and White Sel6, which had higher infestation of Striga

in White Sel6 and IE2459 petri-dishes as compared to GuluE and IE2396.

ii) Genes associated with different pathways, gene ontologies and protein classes

were expressed across all the accessions. Genes related to phenylpropanoid

biosynthesis and phenylalanine metabolism were significantly expressed in

susceptible accessions IE2459 and White Sel6. The high numbers of

differentially expressed transcripts identified at the 5 dpi suggests higher

impacts of Striga to finger millet at time of infestation.

iii) Higher non-homoeologous SNP frequencies were observed between GuluE and all other accessions. This suggests that the accessions were closely related, except for GuluE, which was domesticated in Uganda. The appropriate use of advanced molecular breeding techniques, such as SNP markers, can further enhance the efficiency of integrated Striga management practices, and hence crop productivity.

5.3 Recommendations and Suggestions for Future Research

The following recommendations were deduced:

i) Due to the superior agronomical vigour observed in GuluE and IE2396,

breeding of these accessions towards tolerance to Striga is recommended.

ii) Genes involved in cell wall development, photosynthesis, signalling and transport should be a focal point in studies involving Striga infestation in finger millet, especially at 5 dpi.

iii) In polyploids such as finger millet, non-homoeologous bi-allelic SNPs should be used to study variation within individual species.

The following are some of the suggested further studies:

i) Histopathological studies should be done in order to elucidate host-parasite

interactions at each infestation time point.

ii) The transcripts identified at three time points in this study should be

validated using quantitative polymerase chain reaction (qPCR).

iii) The SNPs identified should be mapped to the finger millet genome and

validated.


REFERENCES

Aduguna, A. (2007). The role of introduced sorghum and millets in Ethiopian agriculture. SAT J. 1(3), 1–4.

Ahonsi, M. O., Berner, D. K., Emechebe, A. M. and Lagoke, S. T. (2004). Effects of ALS-inhibitor herbicides, crop sequence, and fertilization on natural soil suppressiveness to Striga hermonthica. Agriculture, ecosystems and environment, 104(3), 453-463.

Aird, D., Ross, M. G., Chen, W. S., Danielsson, M., Fennell, T., Russ, C. and Gnirke, A. (2011). Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome biology, 12(2), 1-14.

Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research, 25(17), 3389-3402.

Amusan, I. O., Rich, P. J., Menkir, A., Housley, T. and Ejeta, G. (2008). Resistance to Striga hermonthica in a maize inbred line derived from Zea diploperennis. New Phytologist, 178(1), 157-166.

Anders, S., McCarthy, D. J., Chen, Y., Okoniewski, M., Smyth, G. K., Huber, W. and Robinson, M. D. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature protocols, 8(9), 1765-1786.

Atera, E. A., Ishii, T., Onyango, J. C., Itoh, K. and Azuma, T. (2013). Striga infestation in Kenya: status, distribution and management options. Sustainable Agriculture Research, 2(2), 526-535.

Bai, H., Cao, Y., Quan, J., Dong, L., Li, Z., Zhu, Y. and Li, D. (2013). Identifying the genome-wide sequence variations and developing new molecular markers for genetics research by re-sequencing a landrace of foxtail millet. PloS one, 8(9), 1-12.

Bandaranayake, P. C., Filappova, T., Tomilov, A., Tomilova, N. B., Jamison-McClung, D., Ngo, Q. and Yoder, J. I. (2010). A single-electron reducing quinone oxidoreductase is necessary to induce haustorium development in the root Triphysaria. The Plant Cell, 22(4), 1404-1419.

Bandaranayake, P. C., Tomilov, A., Tomilova, N. B., Ngo, Q. A., Wickett, N. and Yoder, J. I. (2011). The TvPirin gene is necessary for haustorium development in the parasitic plant Triphysaria versicolor. Plant physiology, 158(2), 1046-1053.

Bawa, A., Isaac, A. K., Abdulai, M. S. and Kugbe, J. X. (2015). Evaluation of some genotypes of maize (Zea mays L.) for tolerance to Striga hermonthica (Del.) Benth in Northern . Research in Plant Biology, 5(5), 1-12.

Belton, P. S. and Taylor, J. R. (2004). Sorghum and millets: protein sources for Africa. Trends in Food Science and Technology, 15(2), 94-98.

Berner, D. K., Kling, J. G. and Singh, B. B. (1995). Striga research and control. A perspective from Africa. Plant Disease, 79(7), 652-660.

Bolger, A. M., Lohse, M. and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114-2120.

Bouwmeester, H. J., Matusova, R., Zhongkui, S. and Beale, M. H. (2003). Secondary metabolite signalling in host–parasitic plant interactions. Current opinion in plant biology, 6(4), 358-364.

Bozkurt, M. L., Muth, P., Parzies, H. K. and Haussmann, B. I. G. (2015). Genetic diversity of East and West African Striga hermonthica populations and virulence effects on a contrasting set of sorghum cultivars. Weed research, 55(1), 71-81.

Ching, A. D. A., Caldwell, K. S., Jung, M., Dolan, M., Smith, O., Tingey, S. and Rafalski, A. J. (2002). SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC genetics, 3(1), 440-448.

Chrispus, O. A. (2008). Breeding Investigation of Finger millet characteristics including blast disease and striga resistance in Western Kenya (Doctoral dissertation), University of KwaZulu-Natal.

Cronk, Q. C. B. (2005). Plant eco-devo: the potential of poplar as a model organism. New Phytologist, 166(1), 39-48.

Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A. and McVean, G. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156-2158.

De Groote, H., Wangare, L., Kanampiu, F., Odendo, M., Diallo, A., Karaya, H. and Friesen, D. (2008). The potential of a herbicide resistant maize technology for Striga control in Africa. Agricultural Systems, 97(1), 83-94.

Dida, M. M. and Devos, K. M. (2006). Finger millet. Cereals and Millets, Springer, Heidelberg, Berlin, Germany.

Dida, M. M., Ramakrishnan, S., Bennetzen, J. L., Gale, M. D. and Devos, K. M. (2007). The genetic map of finger millet, Eleusine coracana. Theoretical and Applied Genetics, 114(2), 321-332.

Dida, M. M., Wanyera, N., Dunn, M. L. H., Bennetzen, J. L. and Devos, K. M. (2008). Population structure and diversity in finger millet (Eleusine coracana) germplasm. Tropical Plant Biology, 1(2), 131-141.

Dixon, R. A. and Paiva, N. L. (1995). Stress-induced phenylpropanoid metabolism. The plant cell, 7(7), 1085.

Duarte, J., Rivière, N., Baranger, A., Aubert, G., Burstin, J., Cornet, L. and Pilet-Nayel, M. L. (2014). Transcriptome sequencing for high throughput SNP development and genetic mapping in pea. BMC genomics, 15(1), 126-131.

Duitama, J., Srivastava, P. K. and Măndoiu, I. I. (2012). Towards accurate detection and genotyping of expressed variants from whole transcriptome sequencing data. BMC genomics, 13(2), 2-6.

Ejeta, G. (2007). Breeding for resistance in sorghum: Exploitation of an intricate host–parasite biology. Crop Science, 47(3), 209-216.

Ejeta, G. and Butler, L. G. (1993). Host-parasite interactions throughout the Striga life cycle, and their contributions to Striga resistance. African Crop Science Journal, 1(2), 78-80.

Elzein, A., Kroschel, J. and Cadisch, G. (2008). Efficacy of Pesta granular formulation of Striga-mycoherbicide Fusarium oxysporum f. sp. strigae Foxy 2 after 5-year of storage. Journal of Plant Diseases and Protection, 115(6), 259-262.

Fernández-Aparicio, M., Huang, K., Wafula, E. K., Honaas, L. A., Wickett, N. J., Timko, M. P. and Westwood, J. H. (2013). Application of qRT-PCR and RNA-Seq analysis for the identification of housekeeping genes useful for normalization of gene expression values during Striga hermonthica development. Molecular biology reports, 40(4), 3395-3407.

Fernando, W. D., Nakkeeran, S. and Zhang, Y. (2005). Biosynthesis of antibiotics by PGPR and its relation in biocontrol of plant diseases. In PGPR: Biocontrol and Biofertilization (pp. 67-109).

Friend, J., Reynolds, S. B. and Aveyard, M. A. (1973). Phenylalanine ammonia lyase, chlorogenic acid and lignin in potato tuber tissue inoculated with Phytophthora infestans. Physiological Plant Pathology, 3(4), 495-507.

Gachomo, E. W., Shonukan, O. O. and Kotchoni, S. O. (2003). The molecular initiation and subsequent acquisition of disease resistance in plants. African Journal of Biotechnology, 2(2), 26-32.

Gafar, N. Y., Hassan, M. M., Rugheim, A. M. E., Osman, A. G., Mohamed, I. S., Abdelgani, M. E. and Babiker, A. G. T. (2015). Evaluation of endophytic bacterial isolates on germination and haustorium initiation of Striga hermonthica (Del.) Benth. International Journal of Farming and Allied Sciences, 4(4), 302–8.

Garber, M., Grabherr, M. G., Guttman, M. and Trapnell, C. (2011). Computational methods for transcriptome annotation and quantification using RNA-seq. Nature methods, 8(6), 469-477.

Garrison, E. and Marth, G. (2012). Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907.

Global Crop Diversity Trust. (2011). Global Strategy for the Ex Situ Conservation of Finger Millet and Its Wild Relatives. Genetic Resources and Crop Evolution, 57(4), 625-639.

Gobena, D., Shimels, M., Rich, P. J., Ruyter-Spira, C., Bouwmeester, H., Kanuganti, S. and Ejeta, G. (2017). Mutation in sorghum LOW GERMINATION STIMULANT 1 alters strigolactones and causes Striga resistance. Proceedings of the National Academy of Sciences, 114(17), 4471-4476.

Gómez-Ariza, J., Campo, S., Rufat, M., Estopà, M., Messeguer, J., Segundo, B. S. and Coca, M. (2007). Sucrose-mediated priming of plant defense responses and broad-spectrum disease resistance by overexpression of the maize pathogenesis-related PRms protein in rice plants. Molecular Plant-Microbe Interactions, 20(7), 832-842.

Goyer, A. (2010). Thiamine in plants: aspects of its metabolism and functions. Phytochemistry, 71(14-15), 1615-1624.

Gupta, V. S. and Ranjekar, P. K. (1981). DNA sequence organization in finger millet (Eleusine coracana). Journal of Biosciences, 3(4), 417-430.

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J. and MacManes, M. D. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature protocols, 8(8), 1494-1512.

Hassan, M. M., Gani, M. E. S. A. and Babiker, A. G. E. T. (2009). Management of Striga hermonthica in sorghum using soil rhizosphere bacteria and host plant resistance. Int. J. Agric. Biol, 11(2009), 367-373.

Haussmann, B. I., Hess, D. E., Welz, H. G. and Geiger, H. H. (2000). Improved methodologies for breeding Striga-resistant . Field Crops Research, 66(3), 195-211.

Hegenauer, V., Körner, M. and Albert, M. (2017). Plants under stress by parasitic plants. Current Opinion in Plant Biology, 3(84), 34-41.

Heuzé, V., Tran, G., Lebas, F. and Hassoun, P. (2015). Finger Millet (Eleusine Coracana), . Feedipedia, a Programme by INRA, CIRAD, AFZ and FAO.

Hittalmani, S., Mahesh, H. B., Shirke, M. D., Biradar, H., Uday, G., Aruna, Y. R. and Mohanrao, A. (2017). Genome and Transcriptome sequence of Finger millet (Eleusine coracana (L.) Gaertn.) provides insights into drought tolerance and nutraceutical properties. BMC Genomics, 18(1), 465-470.

Howles, P. A., Sewalt, V. J., Paiva, N. L., Elkind, Y., Bate, N. J., Lamb, C. and Dixon, R. A. (1996). Overexpression of L-phenylalanine ammonia-lyase in transgenic tobacco plants reveals control points for flux into phenylpropanoid biosynthesis. Plant Physiology, 112(4), 1617-1624.

Huang, K., Mellor, K. E., Paul, S. N., Lawson, M. J., Mackey, A. J. and Timko, M. P. (2012). Global changes in gene expression during compatible and incompatible interactions of cowpea ( unguiculata L.) with the root parasitic angiosperm Striga gesnerioides. BMC genomics, 13(1), 402.

Ichihashi, Y., Mutuku, J. M., Yoshida, S. and Shirasu, K. (2015). Transcriptomics exposes the uniqueness of parasitic plants. Briefings in functional genomics, 14(4), 275-282.

International, B. (2010). Key access and utilization descriptors for finger millet genetic resources. [ebook] p.4. Available at: https://www.bioversityinternational.org/e-library/publications [Accessed 24 Aug. 2014].

Jamil, M., Charnikhova, T., Houshyani, B., van Ast, A. and Bouwmeester, H. J. (2012b). Genetic variation in strigolactone production and tillering in rice and its effect on Striga hermonthica infection. Planta, 235(3), 473-484.

Jamil, M., Kanampiu, F. K., Karaya, H., Charnikhova, T. and Bouwmeester, H. J. (2012a). Striga hermonthica parasitism in maize in response to N and P fertilisers. Field Crops Research, 134(7), 1-10.

Kajuna, S. T. A. R. (2001). Millet: Post-harvest operations. Food and Agricultural Organisation, 5(2), 1-49.

Kanampiu, F., Karaya, H., Burnet, M. and Gressel, J. (2009). Needs for and effectiveness of slow release herbicide seed treatment Striga control formulations for protection against early season crop phytotoxicity. Crop protection, 28(10), 845-853.

Kannan, S., Hui, J., Mazooji, K., Pachter, L. and Tse, D. (2016). Shannon: An Information-Optimal de Novo RNA-Seq Assembler. bioRxiv, 039230.

Khan, Z. R., Midega, C. A., Pittchar, J. O., Murage, A. W., Birkett, M. A., Bruce, T. J. and Pickett, J. A. (2014). Achieving food security for one million sub-Saharan African poor through push–pull innovation by 2020. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 369(1639), 20120284.

Khan, Z. R., Pickett, J. A., Wadhams, L. J., Hassanali, A. and Midega, C. A. (2006). Combined control of Striga hermonthica and stemborers by maize–Desmodium spp. intercrops. Crop Protection, 25(9), 989-995.

Kole, C., Muthamilarasan, M., Henry, R., Edwards, D., Sharma, R., Abberton, M. and Cai, H. (2015). Application of genomics-assisted breeding for generation of climate resilient crops: progress and prospects. Frontiers in plant science, 6(2015), 1-11.

Kountche, B. A., Hash, C. T., Dodo, H., Laoualy, O., Sanogo, M. D., Timbeli, A. and Haussmann, B. I. (2013). Development of a pearl millet Striga-resistant genepool: Response to five cycles of recurrent selection under Striga-infested field conditions in West Africa. Field crops research, 154(2013), 82-90.

Langmead, B., Trapnell, C., Pop, M. and Salzberg, S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology, 10(3), 25-34.

Lendzemo, V. W., Kuyper, T. W., Kropff, M. J. and van Ast, A. V. (2005). Field inoculation with arbuscular mycorrhizal fungi reduces Striga hermonthica performance on cereal crops and has the potential to contribute to integrated Striga management. Field Crops Research, 91(1), 51-61.

Li, B. and Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC bioinformatics, 12(1), 323-335.

Liu, Q., Guo, Y., Li, J., Long, J., Zhang, B. and Shyr, Y. (2012). Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data. BMC genomics, 13(8), 8-14.

Liu, G., Han, Y., Jiang, Y., Wang, Y., Lv, P. and Li, H. (2016). Genomics Approaches to Biotic Stress Resistance. In the Sorghum Genome 1(2016), 149-167.

Lumba, S., Holbrook-Smith, D. and McCourt, P. (2017). The perception of strigolactones in vascular plants. Nature Chemical Biology, 13(6), 599-607.

Lyons, D. M. and Lauring, A. S. (2017). Evidence for the selective basis of transition-to-transversion substitution bias in two RNA viruses. Molecular biology and evolution, 34(12), 3205-3215.

Mammadov, J. A., Chen, W., Ren, R., Pai, R., Marchione, W., Yalçin, F. and Kumpatla, S. P. (2010). Development of highly polymorphic SNP markers from the complexity reduced portion of maize [Zea mays L.] genome for use in marker-assisted breeding. Theoretical and applied genetics, 121(3), 577-588.

Manyasa, E. O. (2013). A study of the diversity, adaptation and gene effects for blast resistance and yield traits in East African finger millet (Eleusine coracana (L.) Gaertn) landraces (Doctoral dissertation), University of KwaZulu-Natal.

Martin, J. A. and Wang, Z. (2011). Next-generation transcriptome assembly. Nature Reviews Genetics, 12(10), 671-682.

Matvienko, M., Wojtowicz, A., Wrobel, R., Jamison, D., Goldwasser, Y. and Yoder, J. I. (2001). Quinone oxidoreductase message levels are differentially regulated in parasitic and non-parasitic plants exposed to allelopathic quinones. The Plant Journal, 25(4), 375-387.

Mbuvi, D. A., Masiga, C. W., Kuria, E. K., Masanga, J., Wamalwa, M., Mohamed, A., Odeny, D. A., Hamza, N., Timko, M. P. and Runo, S. M. (2017). Novel sources of witchweed (Striga) resistance from wild sorghum accessions. Frontiers in plant science, 8(2017), 116-127.

Mgonja, M. A., Lenne, J. M., Manyasa, E. and Sreenivasaprasad, S. (2007). Finger millet blast management in East Africa Creating opportunities for improving production and utilization of finger millet. International Crops Research Institute for the Semi-Arid Tropics, 1(1), 1-196.

Midega, C. A., Khan, Z. R., Amudavi, D. M., Pittchar, J. and Pickett, J. A. (2010). Integrated management of Striga hermonthica and cereal stemborers in finger millet (Eleusine coracana (L.) Gaertn.) through intercropping with . International Journal of Pest Management, 56(2), 145-151.

Mohamed, A. H., Housley, T. L. and Ejeta, G. (2010). An in vitro technique for studying specific Striga resistance mechanisms in sorghum. African Journal of Agricultural Research, 5(14), 1868-1875.

Mutuku, J. M., Yoshida, S., Shimizu, T., Ichihashi, Y., Wakatake, T., Takahashi, A. and Shirasu, K. (2015). The WRKY45-dependent signaling pathway is required for resistance against Striga hermonthica parasitism. Plant physiology, 168(3), 1152-1163.

Naga, B. L. R. I., Mangamoori, L. N. and Subramanyam, S. (2012). Identification and characterization of EST-SSRs in finger millet (Eleusine coracana (L.) Gaertn.). Journal of Crop Science and Biotechnology, 15(1), 9-16.

Nagalakshmi, U., Waern, K. and Snyder, M. (2010). RNA-Seq: a method for comprehensive transcriptome analysis. Current Protocols in Molecular Biology, 89(1), 4-11.

Ngesa, H. O., Okora, J. O., Maobe, S. N., Opondo, M. A. and Ayako, P. O. (2015). Assessment of the Damage, Severity and Effectiveness of Current Control Methods for Striga in Western Kenya. African Journal of Agriculture and Utilisation of Natural Resources for Sustainable Development, 1(1), 60–64.

Ngo, Q. A., Albrecht, H., Tsuchimatsu, T. and Grossniklaus, U. (2013). The differentially regulated genes TvQR1 and TvPirin of the parasitic plant Triphysaria exhibit distinctive natural allelic diversity. BMC plant biology, 13(1), 28.

Obidiegwu, O. N., Parzies, H. and Obidiegwu, J. E. (2014). Development and genotyping potentials of EST-SSRs in finger millet (E. Coracana (L.) Gaertn.). Int. J. Genet. Genomics, 2(1), 42-46.

Obilana, A. B. (2003). Overview: importance of millets in Africa. World (all cultivated millet species), 38, 28.

Obilana, A. T. and Ramaiah, K. V. (1992). Striga (witchweeds) in sorghum and millet: knowledge and future research needs. International Crops Research Institute for the Semi-Arid Tropics, p: 187-201.

Oduori, C. O. (2005). The importance and research status of finger millet in Africa. Nairobi: The McKnight Foundation Collaborative Crop Research.

Olupot, J. R., Oryokot, J., Osiru, D. S. O., Gebrekdam, B. and Warren, H. (2003). The Potential of Using Silver Leaf Desmodium (Desmodium uncinatum) for the Control of Striga on Sorghum in Uganda. Crop Protection, 22(1), 463–68.

Oswald, A. (2005). Striga control–technologies and their dissemination. Crop Protection, 24(4), 333-342.

Oswald, A. and Ransom, J. K. (2004). Response of maize varieties to Striga infestation. Crop Protection, 23(2), 89-94.

Owere, L., Tongoona, P., Derera, J. and Wanyera, N. (2014). Farmers' perceptions of finger millet production constraints, varietal preferences and their implications to finger millet breeding in Uganda. Journal of Agricultural Science, 6(12), 126.

Pareek, C. S., Smoczynski, R. and Tretyn, A. (2011). Sequencing technologies and genome sequencing. Journal of applied genetics, 52(4), 413-435.

Quail, M. A., Smith, M., Coupland, P., Otto, T. D., Harris, S. R., Connor, T. R. and Gu, Y. (2012). A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC genomics, 13(1), 341-347.

Ranjan, A., Ichihashi, Y., Farhi, M., Zumstein, K., Townsley, B., David-Schwartz, R. and Sinha, N. R. (2014). De novo assembly and characterization of the transcriptome of the parasitic weed dodder identifies genes associated with plant parasitism. Plant Physiology, 166(3), 1186-1199.

Ransom, J. K. (2000). Long-term approaches for the control of Striga in cereals: field management options. Crop Protection, 19(8), 759-763.

Ratan, A., Zhang, Y., Hayes, V. M., Schuster, S. C. and Miller, W. (2010). Calling SNPs without a reference sequence. BMC bioinformatics, 11(1), 130-139.

Reda, F., Butler, L. G., Ejeta, G. and Ransom, K. (1994). Screening of maize genotypes for low Striga asiatica stimulant production using the 'agar gel technique'. African Crop Science Journal, 2(2), 1-16.

Reddy, V. G., Upadhyaya, H. D., Gowda, C. L. L. and Singh, S. (2009). Characterization of eastern African finger millet germplasm for qualitative and quantitative characters at ICRISAT. Journal of SAT Agricultural Research, 7, pp9.

Rich, P. J. and Ejeta, G. (2008). Towards effective resistance to Striga in African maize. Plant signaling and behavior, 3(9), 618-621.

Rispail, N., Dita, M. A., González-Verdejo, C., Pérez-de-Luque, A., Castillejo, M. A., Prats, E. and Rubiales, D. (2007). Plant resistance to parasitic plants: molecular approaches to an old foe. New Phytologist, 173(4), 703-712.

Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.

Rodenburg, J., Bastiaans, L., Weltzien, E. and Hess, D. E. (2005). How can field selection for Striga resistance and tolerance in sorghum be improved? Field Crops Research, 93(1), 34-50.

Rodenburg, J., Cissoko, M., Kayeke, J., Dieng, I., Khan, Z. R., Midega, C. A. and Scholes, J. D. (2015). Do NERICA rice cultivars express resistance to Striga hermonthica (Del.) Benth. and Striga asiatica (L.) Kuntze under field conditions? Field crops research, 170(1), 83-94.

Rodenburg, J., Cissoko, M., Kayongo, N., Dieng, I., Bisikwa, J., Irakiza, R. and Scholes, J. D. (2017). Genetic variation and host–parasite specificity of Striga resistance and tolerance in rice: the need for predictive breeding. New Phytologist, 214(3), 1267-1280.

Runo, S. and Kuria, E. K. (2018). Habits of a highly successful cereal killer, Striga. PLoS pathogens, 14(1), 1-5.

Sauerborn, J., Müller-Stöver, D. and Hershenhorn, J. (2007). The role of biological control in managing parasitic weeds. Crop protection, 26(3), 246-254.

Saucet, S. B. and Shirasu, K. (2016). Molecular parasitic plant–host interactions. PLoS pathogens, 12(12), 234-240.

Scholes, J. D. and Press, M. C. (2008). Striga infestation of cereal crops–an unsolved problem in resource limited agriculture. Current opinion in plant biology, 11(2), 180-186.

Seto, Y. and Yamaguchi, S. (2014). Strigolactone biosynthesis and perception. Current Opinion in Plant Biology, 21(1), 1-6.

Sharpe, A. G., Ramsay, L., Sanderson, L. A., Fedoruk, M. J., Clarke, W. E., Li, R. and Bett, K. E. (2013). Ancient orphan crop joins modern era: gene-based SNP discovery and mapping in lentil. BMC genomics, 14(1), 192-197.

Shayanowako, A. T., Laing, M., Shimelis, H. and Mwadzingeni, L. (2018). Resistance breeding and biocontrol of Striga asiatica (L.) Kuntze in maize: a review. Acta Agriculturae Scandinavica, Section B—Soil and Plant Science, 68(2), 110-120.

Shobana, S., Krishnaswamy, K., Sudha, V., Malleshi, N. G., Anjana, R. M., Palaniappan, L. and Mohan, V. (2013). Finger millet (Ragi, Eleusine coracana L.): a review of its nutritional properties, processing, and plausible health benefits. Adv Food Nutr Res, 69(1), 39-43.

Smith, P. M. and Atkins, C. A. (2002). Purine biosynthesis. Big in cell division, even bigger in nitrogen assimilation. Plant Physiology, 128(3), 793-802.

Song, L., Florea, L. and Langmead, B. (2014). Lighter: fast and memory-efficient sequencing error correction without counting. Genome biology, 15(11), 509.

Subki, A., Abidin, A. A. Z. and Yusof, Z. N. B. (2018). The Role of Thiamine in Plants and Current Perspectives in Crop Improvement. In B Group - Current Uses and Perspectives. 5(3), 33-44.

Swarbrick, P. J., Huang, K., Liu, G., Slate, J., Press, M. C. and Scholes, J. D. (2008). Global patterns of gene expression in rice cultivars undergoing a susceptible or resistant interaction with the parasitic plant Striga hermonthica. New Phytologist, 179(2), 515-529.

Tatusov, R. L., Koonin, E. V. and Lipman, D. J. (1997). A genomic perspective on protein families. Science, 278(5338), 631-637.

Timko, M. P., Huang, K. and Lis, K. E. (2012). Host resistance and parasite virulence in Striga–host plant interactions: a shifting balance of power. Weed science, 60(2), 307-315.

Toh, S., Holbrook-Smith, D., Stogios, P. J., Onopriyenko, O., Lumba, S., Tsuchiya, Y. and McCourt, P. (2015). Structure-function analysis identifies highly sensitive strigolactone receptors in Striga. Science, 350(6257), 203-207.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R. and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature protocols, 7(3), 562-578.

Trick, M., Adamski, N. M., Mugford, S. G., Jiang, C. C., Febrer, M. and Uauy, C. (2012). Combining SNP discovery from next-generation sequencing data with bulked segregant analysis (BSA) to fine-map genes in polyploid wheat. BMC Plant Biology, 12(1), 14-20.

Tulin, S., Aguiar, D., Istrail, S. and Smith, J. (2013). A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems. EvoDevo, 4(1), 16.

Upadhyaya, H. D., Gowda, C. L. L. and Reddy, V. G. (2007). Morphological diversity in finger millet germplasm introduced from Southern and Eastern Africa. Journal of SAT Agricultural Research, 3(1), 1-3.

Varsha, V., Urooj, A. and Malleshi, N. G. (2009). Evaluation of antioxidant and antimicrobial properties of finger millet (Eleusine coracana) polyphenols. Food Chem. 114(1), 340–346.

Varshney, R. K., Graner, A. and Sorrells, M. E. (2005). Genomics-assisted breeding for crop improvement. Trends in plant science, 10(12), 621-630.

Varshney, R. K., Hiremath, P. J., Lekha, P., Kashiwagi, J., Balaji, J., Deokar, A. A. and Siddique, K. H. (2009). A comprehensive resource of drought- and salinity-responsive ESTs for gene discovery and marker development in chickpea (Cicer arietinum L.). BMC genomics, 10(1), 523-530.

Viswanath, V., Urooj, A. and Malleshi, N. G. (2009). Evaluation of antioxidant and antimicrobial properties of finger millet polyphenols (Eleusine coracana). Food Chemistry, 114(1), 340-346.

Wang, Z., Gerstein, M. and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews genetics, 10(1), 57-63.

Wolf, J. B. (2013). Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial. Molecular ecology resources, 13(4), 559-572.

Xu, D. L., Long, H., Liang, J. J., Zhang, J., Chen, X., Li, J. L. and Yu, M. Q. (2012). De novo assembly and characterization of the root transcriptome of Aegilops variabilis during an interaction with the cereal cyst nematode. BMC genomics, 13(1), 133-136.

Xu, H., Gao, Y. and Wang, J. (2012). Transcriptomic analysis of rice (Oryza sativa) developing embryos using the RNA-Seq technique. PLoS One, 7(2), 1-12.

Yang, S. S., Tu, Z. J., Cheung, F., Xu, W. W., Lamb, J. F., Jung, H. J. G. and Gronwald, J. W. (2011). Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems. BMC genomics, 12(1), 199-203.

Yoneyama, K., Arakawa, R., Ishimoto, K., Kim, H. I., Kisugi, T., Xie, X. and Yoneyama, K. (2015). Difference in Striga-susceptibility is reflected in strigolactone secretion profile, but not in compatibility and host preference in arbuscular mycorrhizal symbiosis in two maize cultivars. New Phytologist, 206(3), 983-989.

Yoshida, S. and Shirasu, K. (2012). Plants that attack plants: molecular elucidation of plant parasitism. Current opinion in plant biology, 15(6), 708-713.

Zabala, G., Zou, J., Tuteja, J., Gonzalez, D. O., Clough, S. J. and Vodkin, L. O. (2006). Transcriptome changes in the phenylpropanoid pathway of Glycine max in response to Pseudomonas syringae infection. BMC Plant Biology, 6(1), 26-33.

Zarafi, A. B., Elzein, A., Abdulkadir, D. I., Beed, F. and Akinola, O. M. (2015). Host range studies of Fusarium oxysporum f. sp. strigae meant for the biological control of Striga hermonthica on maize and sorghum. Archives of Phytopathology and Plant Protection, 48(1), 1-9.

Zhang, X., Berkowitz, O., da Silva, J. A. T., Zhang, M., Ma, G., Whelan, J. and Duan, J. (2015). RNA-Seq analysis identifies key genes associated with haustorial development in the root hemiparasite Santalum album. Frontiers in plant science, 6(2015), 1-11.

Zhang, Y. T., Zhang, Y. L., Chen, S. X., Yin, G. H., Yang, Z. Z., Lee, S. and Bennett, J. W. (2015). Proteomics of methyl jasmonate induced defense response in maize leaves against Asian corn borer. BMC genomics, 16(1), 224-229.

Zheng, J., Fu, J., Gou, M., Huai, J., Liu, Y., Jian, M. and Wang, G. (2010). Genome-wide transcriptome analysis of two maize inbred lines under drought stress. Plant molecular biology, 72(5), 407-421.

Zwanenburg, B., Pospíšil, T. and Zeljković, S. Ć. (2016). Strigolactones: new plant hormones in action. Planta, 243(6), 1311-1326.


APPENDICES

Appendix 1: Number of reads of GuluE accession (Striga tolerant)

Infestation time point    Raw reads     Quality reads
3 dpi infected            1,529,378     1,308,816
3 dpi control             2,692,004     2,083,593
5 dpi infected              556,928       435,696
5 dpi control               782,784       482,341
7 dpi infected              919,657       778,454
7 dpi control             7,467,082     5,894,746
Total                    13,947,833    10,983,646

Appendix 2: Number of reads of White Sel6 accession (Striga susceptible)

Infestation time point    Raw reads     Quality reads
3 dpi infected              818,627       679,356
3 dpi control             1,301,674     1,094,587
5 dpi infected            1,184,349       810,551
5 dpi control               712,086       563,496
7 dpi infected            1,529,719     1,244,831
7 dpi control             1,285,002     1,009,781
Total                     6,831,457     5,402,602

Appendix 3: Number of reads of IE2396 accession (Striga tolerant)

Infestation time point    Raw reads     Quality reads
3 dpi infected            1,520,724     1,256,362
3 dpi control             1,532,275       973,181
5 dpi infected            1,143,907       872,017
5 dpi control             2,512,467     1,738,263
7 dpi infected            1,386,583     1,138,602
7 dpi control             2,295,942     1,929,631
Total                    10,391,898     7,908,056

Appendix 4: Number of reads of IE2459 accession (Striga susceptible)

Infestation time point    Raw reads     Quality reads
3 dpi infected            1,629,222     1,331,119
3 dpi control               843,675       595,498
5 dpi infected            1,897,188     1,565,174
5 dpi control             1,844,108     1,592,819
7 dpi infected            4,007,297     3,213,886
7 dpi control             1,411,996     1,192,467
Total                    11,633,486     9,490,963

Appendix 5: Statistical summary of the number of annotated transcripts using BLASTX e-value 1e-3 per finger millet accession

Accessions    No. of contigs    RefSeq          Uniprot         Pfam            COG
White Sel6    33368             19463           20027           4873            4636
IE2459        51627             33997           34751           11378           8561
GuluE         25154             8252            8772            1282            1500
IE2396        29013             14703           15275           2958            3233
Total (%)     139162 (100)      76415 (54.9)    78825 (56.6)    20491 (14.7)    17930 (12.9)

Appendix 6: Summary of KEGG pathway mapping per accession. The RefSeq BLAST output at an e-value of 1e-5 was queried against the KEGG pathway databases using the Blast2GO online tool.

Accessions    No. of Pathways    No. of Sequences    No. of Enzymes
White Sel6    119                4554                814
GuluE         102                1530                490
IE2396        110                3253                708
IE2459        128                7702                1051
Total         459                17039               3063

Appendix 7: Attachment stage commonly Annotated Transcripts against Uniprot Viridiplantae databases at an e-value of 1e-3.

Appendix 8: Summary of Panther Protein Classes mapping per accession.

Protein Classes                                               IE2459           White Sel6       IE2396           GuluE
                                                              Hits   %Hit      Hits   %Hit      Hits   %Hit      Hits   %Hit
calcium-binding protein (PC00060)                             34     2.0%      15     1.6%      11     1.7%      7      2.1%
cell adhesion molecule (PC00069)                              13     0.7%      7      0.8%      3      0.5%      2      0.6%
cell junction protein (PC00070)                               2      0.1%      1      0.1%      2      0.3%      2      0.6%
chaperone (PC00072)                                           40     2.3%      17     1.8%      15     2.3%      12     3.6%
cytoskeletal protein (PC00085)                                60     3.5%      33     3.6%      30     4.6%      5      1.5%
defense/immunity protein (PC00090)                            3      0.2%      2      0.2%      3      0.5%      3      0.9%
enzyme modulator (PC00095)                                    97     5.6%      55     6.0%      35     5.3%      19     5.7%
extracellular matrix protein (PC00102)                        5      0.3%      1      0.1%      -      -         1      0.3%
hydrolase (PC00121)                                           218    12.6%     115    12.5%     82     12.4%     45     13.6%
isomerase (PC00135)                                           51     2.9%      25     2.7%      15     2.3%      8      2.4%
ligase (PC00142)                                              71     4.1%      37     4.0%      26     3.9%      12     3.6%
lyase (PC00144)                                               53     3.1%      23     2.5%      22     3.3%      9      2.7%
membrane traffic protein (PC00150)                            49     2.8%      29     3.1%      15     2.3%      5      1.5%
nucleic acid binding (PC00171)                                310    17.9%     166    18.0%     134    20.3%     64     19.3%
oxidoreductase (PC00176)                                      167    9.6%      84     9.1%      64     9.7%      22     6.0%
receptor (PC00197)                                            29     1.7%      20     2.2%      15     2.3%      16     4.8%
signaling molecule (PC00207)                                  24     1.4%      20     2.2%      9      1.4%      3      0.9%
storage protein (PC00210)                                     6      0.3%      2      0.2%      1      0.2%      1      0.3%
structural protein (PC00211)                                  2      0.1%      -      -         -      -         -      -
transcription factor (PC00218)                                85     4.9%      38     4.1%      32     4.9%      17     5.1%
transfer/carrier protein (PC00219)                            53     3.1%      16     1.7%      14     2.1%      10     3.0%
transferase (PC00220)                                         199    11.5%     112    12.1%     77     11.7%     32     9.6%
transmembrane receptor regulatory/adaptor protein (PC00226)   5      0.3%      4      0.4%      -      -         -      -
transporter (PC00227)                                         159    9.2%      100    10.8%     54     8.2%      37     11.1%
Total                                                         1735   100%      922    100%      659    100%      332    100%

Appendix 9: Delayed infection in IE2396 after 5dpi

[Figure: IE2396. Labels: increased swelling at the attachment site; haustorial attachment]

Appendix 10: Top ten differentially expressed transcripts at 3 dpi

Transcript ID                             LogFC         FDR       Blast P-value  RefSeq Title                                          Species

GuluE Infested Vs Control
Up-regulated
Shannon_shannonout_r2_c1_0_2720_272       2.827754      0.011739  1E-19          signal anchor, putative                               M. truncatula
Shannon_shannonout_r2_c1_0_5120_512       3.681206      0.01327   0.00001        hypothetical protein PHAVU_003G113100g                P. vulgaris
Shannon_shannonout_r2_c1_3_3240_324       2.761426      0.013136  1E-08          hypothetical protein L484_018823                      M. notabilis
Shannon_shannonout_r2_c1_3_8470_847       2.628023      0.023151  1E-12          hypothetical protein (mitochondrion)                  B. braunii
Shannon_shannonout_r2_c1_6_14800_1480     3.428774      0.003867  0.000002       uncharacterized protein LOC105914754                  S. italica
Single_0                                  3.671516      0.000299  1E-08          hypothetical protein L484_018823                      M. notabilis
Shannon_shannonout_c1_12_9770_977         7.830198      0.030171  4.00E-05       hypothetical protein PHAVU_006G011900g                P. vulgaris
Shannon_shannonout_c1_11_3400_340         2.856658      0.036871  0              uncharacterized protein LOC104804969, partial         T. hassleriana
Shannon_shannonout_c1_12_56230_5623       2.763381      0.045458  7.00E-99       uncharacterized protein LOC103631140                  Z. mays
Shannon_shannonout_r2_c1_7_8290_829       2.793854      0.033695  2.00E-06       uncharacterized protein LOC104878589, partial         V. vinifera
Down-regulated
Shannon_shannonout_r2_c1_3_8410_841       -2.6888       0.012302  0.0003         hypothetical protein MTR_8g040260                     M. truncatula
Shannon_shannonout_c1_18_1660_166         -3.5384       0.000354  3E-15          hypothetical protein MTR_0002s0250                    M. truncatula

IE2396 Infested Vs Control
Up-regulated
Shannon_shannonout_r2_c1_4_140_14         4.596348      2.2E-05   1E-13          signal anchor, putative                               M. truncatula
Shannon_shannonout_c1_8_170_17            4.783506      2.2E-05   2E-32          uncharacterized protein LOC105117228                  Populus euphratica
Single_3                                  4.138275      0.000369  1E-08          hypothetical protein L484_018823                      M. notabilis
Shannon_shannonout_r2_c1_4_1950_195       3.435404      0.001655  9E-55          hypothetical chloroplast RF68 (plastid)               T. latifolia
Shannon_shannonout_r2_c1_4_3560_356       5.498438      0.001781  8E-10          hypothetical protein MTR_8g040260                     M. truncatula
Shannon_shannonout_cremaining1_single_55  3.467877      0.003037  2E-10          Metal transporter Nramp5                              M. notabilis
Shannon_shannonout_r2_c1_4_4090_409       3.578002      0.003137  4E-13          hypothetical protein MTR_0002s0270                    M. truncatula
Shannon_shannonout_r2_c1_4_3430_343       2.979572      0.008295  0.0008         hypothetical protein MTR_8g040260                     M. truncatula
Shannon_shannonout_c1_9_2100_210          2.8443        0.015584  7E-10          signal anchor, putative                               M. truncatula
Shannon_shannonout_c1_9_1750_175          -2.25817      0.279212  1E-53          hypothetical chloroplast RF68                         T. latifolia

IE2459 Infested Vs Control
Up-regulated
Shannon_shannonout_r2_c1_4_1030_103       9.029269      0.000197  2E-150         cytochrome b6 (plastid)                               L. pauciflora
Shannon_shannonout_r2_c1_5_4452_3         8.474174      0.002833  6E-09          NADH-quinone oxidoreductase protein                   M. truncatula
Shannon_shannonout_cremaining1_34110_2    3.757188      0.017535  3E-148         cytochrome b6 (plastid)                               L. pauciflora
Shannon_shannonout_cremaining1_48649_0    3.735274      0.017535  2.00E-116      maturase K (chloroplast)                              P. australis
Down-regulated
Shannon_shannonout_c1_0_5080_508          -2.8294       0.04072   2.00E-142      senescence-associated protein, putative               M. truncatula
Shannon_shannonout_c1_2_1430_143          -8.0482       0.017535  4E-12          hypothetical protein (mitochondrion)                  B. braunii
Shannon_shannonout_c1_3_11685_1           -5.67367      0.008649  5E-09          NADH-quinone oxidoreductase protein                   M. truncatula
Shannon_shannonout_c1_4_single_34         -3.1613       0.023489  6.00E-05       hypothetical protein (mitochondrion)                  B. braunii
Shannon_shannonout_cremaining4_single_79  -3.46846      0.015999  0              probable galactinol--sucrose galactosyltransferase 2  S. italica
Shannon_shannonout_r2_c1_3_1240_124       -6.90515      3.76E-12  8E-35          uncharacterized protein LOC103626742                  Z. mays
Shannon_shannonout_r2_c1_3_4920_492       -4.21629      5.72E-05  8E-160         alpha-L-arabinofuranosidase 1-like                    Z. mays
Shannon_shannonout_r2_c1_4_1860_186       -14.0817      3.62E-18  0.0004         hypothetical protein MTR_8g040260                     M. truncatula
Shannon_shannonout_r2_c1_4_3140_314       -2.97246      0.02329   1.00E-05       hypothetical protein MTR_0021s0160                    M. truncatula
Single_385                                -4.64656      0.000378  1E-08          dehydrin DHN1                                         Z. mays

White Sel 6 Infested Vs Control
Up-regulated
Single_43                                 8.141438      0.019206  5E-145         envelope membrane protein                             P. olyriformis
Shannon_shannonout_r2_c1_11_2800_280      3.395833      0.019206  2E-88          uncharacterized protein LOC100274305                  Z. mays
Shannon_shannonout_cremaining1_37313_1    4.57631       0.019206  5E-145         cytochrome b6                                         O. officinalis

Appendix 11: Top ten differentially expressed transcripts at 5 dpi

Transcript ID                             LogFC         FDR       Blast P-value  RefSeq Title                                                                      Species

GuluE Infested Vs Control
Up-regulated
Shannon_shannonout_c1_17_610_61           5.514771      2.91E-07  0.001          hypothetical protein MTR_4g006070                                                 M. truncatula
Shannon_shannonout_cremaining1_106918_1   8.310799      0.00811   3.00E-54       ribosomal protein L23 (chloroplast)                                               Saccharum spp.
Shannon_shannonout_c1_18_1410_141         8.335384      0.006958  9.00E-10       hypothetical protein POPTR_1605s00200g                                            P. trichocarpa
Shannon_shannonout_cremaining3_52441_0    8.406722      0.005002  0              chlorophyll a-b binding protein 1, chloroplastic                                  S. italica
Shannon_shannonout_c1_18_1390_139         8.452392      0.0048    3.00E-11       hypothetical protein POPTR_1605s00200g                                            P. trichocarpa
Shannon_shannonout_cremaining1_115929_2   8.539613      0.003629  4.00E-18       hypothetical protein L484_022991                                                  M. notabilis
Shannon_shannonout_r2_c1_3_2440_244       8.664138      2.31E-16  6.00E-17       hypothetical protein MTR_0055s0130                                                M. truncatula
Shannon_shannonout_r2_c1_3_1590_159       9.245063      0.000119  7.00E-50       hypothetical protein MTR_4g091430                                                 M. truncatula
Shannon_shannonout_r2_c1_3_3160_316       9.494663      3.71E-05  3.00E-37       hypothetical protein MTR_0055s0130                                                M. truncatula
Shannon_shannonout_cremaining1_123054_0   9.52704       3.28E-05  4.00E-105      hypothetical protein CARUB_v10001685mg                                            C. rubella
Down-regulated
Shannon_shannonout_r2_c1_3_8450_845       -14.2711      1.29E-18  1.00E-46       hypothetical protein MTR_3g035650                                                 M. truncatula
Shannon_shannonout_r2_c1_3                -12.5844      4.61E-14  5.00E-66       hypothetical protein MTR_4g091430                                                 M. truncatula
Shannon_shannonout_c1_18_1660_166         -12.2404      3.60E-13  3.00E-15       hypothetical protein MTR_0002s0250                                                M. truncatula
Shannon_shannonout_r2_c1_3_1580_158       -10.3815      5.97E-08  2.00E-50       hypothetical protein MTR_4g091430                                                 M. truncatula
Shannon_shannonout_c1_17_11580_1158       -6.63801      6.65E-12  1.00E-34       hypothetical protein MTR_8g040260                                                 M. truncatula
Shannon_shannonout_c1_18_7260_726         -6.38627      6.47E-05  4.00E-44       hypothetical protein L484_002552                                                  M. notabilis
Shannon_shannonout_r2_c1_3_670_67         -4.42408      1.95E-06  1.00E-11       senescence-associated protein, putative                                           M. truncatula
Shannon_shannonout_c1_11_8550_855         -4.01084      0.000337  1.00E-05       hypothetical protein PHAVU_006G011900g                                            P. vulgaris
Shannon_shannonout_r2_c1_3_8410_841       -3.80841      7.07E-05  3.00E-04       hypothetical protein MTR_8g040260                                                 M. truncatula
Shannon_shannonout_c1_18_930_93           -3.05798      0.0048    6.00E-08       hypothetical protein (mitochondrion)                                              B. braunii

IE2396 Infested Vs Control
Up-regulated
Shannon_shannonout_cremaining5_64293_1    8.487407      0.009051  1.00E-152      PREDICTED: protochlorophyllide reductase                                          S. italica
Shannon_shannonout_cremaining2_58524_1    8.529112      0.007055  3.00E-145      PREDICTED: catalase isozyme 3                                                     S. italica
Shannon_shannonout_cremaining2_3257_0     8.808531      0.002279  1.00E-43       glycine-rich RNA-binding protein 1-like                                           S. italica
Single_792                                8.825367      0.002089  4.00E-177      shikimate O-hydroxycinnamoyltransferase-like                                      S. italica
Single_311                                8.874729      0.001928  3.00E-100      ribulose bisphosphate carboxylase small chain, chloroplastic-like                 B. distachyon
Shannon_shannonout_cremaining3_single_70  9.042513      0.000839  0              elongation factor 1-alpha                                                         O. sativa
Single_236                                9.256261      0.000322  2.00E-75       pathogenesis-related protein PRB1-2-like                                          S. italica
Shannon_shannonout_cremaining2_52017_3    9.616926      5.70E-05  0              chlorophyll a-b binding protein 1, chloroplastic                                  S. italica
Shannon_shannonout_r2_c1_5_1740_174       9.913116      5.16E-05  9.00E-24       hypothetical protein POPTR_0001s42200g                                            P. trichocarpa
Shannon_shannonout_r2_c1_4_1330_133       13.77594      4.82E-15  8.00E-67       hypothetical protein MTR_4g091430                                                 M. truncatula
Down-regulated
Shannon_shannonout_r2_c1_4_1340_134       -15.581       6.06E-22  9.00E-04       hypothetical protein MTR_8g040260                                                 M. truncatula
Shannon_shannonout_r2_c1_4_3550_355       -8.70624      0.001543  1.00E-46       hypothetical protein MTR_3g035650                                                 M. truncatula
Shannon_shannonout_r2_c1_5_6140_614       -4.30443      2.01E-05  1.00E-18       hypothetical protein L484_022991                                                  M. notabilis
Single_0                                  3.019015      0.007491  0              photosystem II protein D (chloroplast)                                            F. racemosa
Shannon_shannonout_cremaining1_6986_2     3.123831      0.009051  3.00E-110      hypothetical protein CARUB_v10001685mg                                            C. rubella
Shannon_shannonout_cremaining1_72510_2    3.146624      0.004831  9.00E-145      cytochrome b6 (chloroplast)                                                       O. officinalis
Shannon_shannonout_cremaining1_88347_0    3.212934      0.00391   1.00E-20       hypothetical protein MsiCp_p010 (chloroplast)                                     M. sinensis
Single_55                                 3.42047       0.009051  9.00E-73       ribosomal protein S18 (plastid)                                                   C. prionitis
Shannon_shannonout_r2_c1_5_640_64         3.584437      0.009051  4.00E-11       hypothetical protein POPTR_1605s00200g                                            P. trichocarpa
Shannon_shannonout_c1_8_5970_597          3.782578      0.000179  4.00E-07       hypothetical protein CARUB_v10011286mg                                            C. rubella

IE2459 Infested Vs Control
Up-regulated
Single_376                                10.05631      0.000454  0              LOC100147734 isoform X1                                                           Z. mays
Shannon_shannonout_cremaining2_37537_1    10.05631      0.000454  7.00E-78       uncharacterized protein LOC100286322                                              Z. mays
Single_453                                10.16446      0.000288  0              adenosylhomocysteinase                                                            S. italica
Single_206                                10.22677      0.000236  4.00E-73       glyceraldehyde-3-phosphate dehydrogenase 1, cytosolic-like                        B. distachyon
Shannon_shannonout_cremaining2_1264_0     10.26507      0.000192  9.00E-172      floral homeotic protein APETALA 2 isoform X1                                      B. distachyon
Shannon_shannonout_r2_c1_3_980_98         10.42825      8.80E-05  4.00E-06       hypothetical protein CARUB_v10011286mg                                            C. rubella
Shannon_shannonout_cremaining2_49471_1    10.89046      8.04E-06  0              ribulose bisphosphate carboxylase/oxygenase activase A, chloroplastic isoform X1  B. distachyon
Shannon_shannonout_cremaining1_1993_5     11.17537      1.56E-06  0              chlorophyll a-b binding protein 1, chloroplastic                                  S. italica
Shannon_shannonout_cremaining2_36734_2    11.4705       3.02E-07  0              putative cytochrome P450 superfamily protein                                      Z. mays
Shannon_shannonout_r2_c1_4_1030_103       13.17562      2.45E-11  2.00E-150      cytochrome b6 (plastid)                                                           L. pauciflora
Shannon_shannonout_cremaining1_34110_2    14.2555       1.84E-14  3.00E-148      cytochrome b6 (plastid)                                                           L. pauciflora
Down-regulated
Shannon_shannonout_r2_c1_4_1860_186       -18.7442      4.59E-30  4.00E-04       hypothetical protein MTR_8g040260                                                 M. truncatula
Shannon_shannonout_r2_c1_3_1240_124       -18.5943      6.49E-30  8.00E-35       uncharacterized protein LOC103626742                                              Z. mays
Shannon_shannonout_r2_c1_3_2850_285       -13.5504      3.80E-15  3.00E-15       hypothetical protein MTR_0002s0250                                                M. truncatula
Shannon_shannonout_r2_c1_3_4920_492       -10.586       3.56E-23  8.00E-160      alpha-L-arabinofuranosidase 1-like                                                Z. mays
Shannon_shannonout_c1_0_5080_508          -10.5296      3.56E-23  2.00E-142      senescence-associated protein, putative                                           M. truncatula
Shannon_shannonout_r2_c1_4_3140_314       -7.6341       9.90E-15  1.00E-05       hypothetical protein MTR_0021s0160                                                M. truncatula
Shannon_shannonout_c1_2_290_29            -7.28776      1.86E-08  2.00E-06       hypothetical protein PHAVU_003G113100g                                            P. vulgaris
Shannon_shannonout_r2_c1_4_860_86         -7.06231      7.63E-08  3.00E-95       signal anchor, putative                                                           M. truncatula
Shannon_shannonout_r2_c1_3_5790_579       -6.90307      0.000521  1.00E-34       hypothetical protein (mitochondrion)                                              C. annuum
Shannon_shannonout_c1_2_1460_146          -5.90494      1.11E-09  2.00E-12       hypothetical protein (mitochondrion)                                              B. braunii

White Sel 6 Infested Vs Control
Up-regulated
Shannon_shannonout_cremaining2_55217_0    3.929746      0.000763  0              probable galactinol--sucrose galactosyltransferase 2                              S. italica
Shannon_shannonout_cremaining4_20124_0    3.958062      0.009877  0              cysteine proteinase 1                                                             S. italica
Shannon_shannonout_cremaining1_37514_1    3.964621      0.000513  1.00E-143      cytochrome b6 (chloroplast)                                                       O. officinalis
Shannon_shannonout_cremaining1_58557_2    3.981714      0.005579  5.00E-52       photosystem II cytochrome b559 alpha subunit (plastid)                            H. cenchroides
Shannon_shannonout_r2_c1_11_4550_455      4.611611      0.000763  6.00E-39       MOB kinase activator-like 1A                                                      O. sativa
Shannon_shannonout_cremaining1_37313_1    5.097192      1.07E-06  5.00E-145      cytochrome b6 (chloroplast)                                                       O. officinalis
Shannon_shannonout_cremaining1_single_67  5.703474      3.32E-06  7.00E-09       dehydrin DHN1                                                                     Z. mays
Single_623                                8.387453      0.006676  2.00E-71       late embryogenesis abundant protein, group 3-like                                 S. italica
Shannon_shannonout_cremaining7_42799_2    8.481817      0.004999  0              galactinol synthase 2                                                             S. italica
Single_14                                 10.22962      1.70E-06  4.00E-17       photosystem II protein L (plastid)                                                H. cenchroides
Down-regulated
Shannon_shannonout_r2_c1_4_1550_155       -17.4782      6.28E-28  2.00E-48       hypothetical protein MTR_0021s0160                                                M. truncatula
Shannon_shannonout_r2_c1_0_8563_25        -11.9154      1.38E-11  1.00E-10       signal anchor, putative                                                           M. truncatula
Shannon_shannonout_r2_c1_0_8673_36        -11.3483      4.18E-10  3.00E-50       Metal transporter Nramp5                                                          M. notabilis
Shannon_shannonout_r2_c1_0_8323_1         -8.47486      0.002155  3.00E-12       hypothetical protein (mitochondrion)                                              B. braunii
Shannon_shannonout_r2_c1_9_1150_115       -5.28858      0.008195  5.00E-21       uncharacterized protein LOC105914134                                              S. italica
Shannon_shannonout_r2_c1_4_5020_502       -4.38113      7.23E-06  6.00E-07       hypothetical protein MTR_0055s0130                                                M. truncatula
Shannon_shannonout_c1_7_16630_1663        -4.0602       0.000513  2.00E-05       hypothetical protein PHAVU_006G011900g                                            P. vulgaris
Shannon_shannonout_r2_c1_1_8330_833       -3.87645      0.001855  4.00E-11       uncharacterized protein LOC103630876                                              Z. mays
Shannon_shannonout_cremaining2_2563_11    -3.59847      0.008277  2.00E-74       uncharacterized protein LOC104896858                                              B. vulgaris
Shannon_shannonout_r2_c1_0_8503_19        -3.47158      0.000855  7.00E-04       hypothetical protein MTR_8g040260                                                 M. truncatula

Appendix 12: Top ten differentially expressed transcripts at 7 dpi

Transcript ID                             LogFC         FDR       Blast P-value  RefSeq Title                                          Species

GuluE Control Vs Infested
Down-regulated
Shannon_shannonout_r2_c1_3_1670_167       -19.55170275  2.92E-32  6.00E-37       hypothetical protein MTR_0055s0130                    M. truncatula
Shannon_shannonout_r2_c1_3_4700_470       -12.16649491  6.54E-11  4.00E-15       hypothetical protein L484_018333                      M. notabilis
Shannon_shannonout_r2_c1_0_9900_990       -10.63907939  5.12E-07  5.00E-13       hypothetical protein L484_012699                      M. notabilis
Shannon_shannonout_r2_c1_0_9430_943       -8.919621165  0.000694  2.00E-09       hypothetical protein (mitochondrion)                  B. braunii
Shannon_shannonout_cremaining1_120745_0   -8.148503012  0.016714  9.00E-24       ribosomal protein S12 (chloroplast)                   I. batatas
Shannon_shannonout_r2_c1_18_7920_792      -7.922597302  0.035271  4.00E-89       uncharacterized protein LOC100502343                  Z. mays
Shannon_shannonout_r2_c1_6_1370_137       -7.08037039   1.68E-09  8.00E-05       hypothetical protein PHAVU_008G179700g                P. vulgaris
Shannon_shannonout_r2_c1_3_3300_330       -6.851744209  5.71E-12  3.00E-07       Metal transporter Nramp5                              M. notabilis
Shannon_shannonout_r2_c1_3_3240_324       -6.832380044  3.06E-10  1.00E-08       hypothetical protein L484_018823                      M. notabilis
Shannon_shannonout_r2_c1_3_8470_847       -6.614563451  3.06E-10  1.00E-12       hypothetical protein (mitochondrion)                  B. braunii

IE2396 Infested Vs Control
Down-regulated
Shannon_shannonout_r2_c1_4_160_16         -9.747463062  2.87E-19  8.00E-18       hypothetical protein MTR_0055s0130                    M. truncatula

IE2459 Infested Vs Control
Up-regulated
Shannon_shannonout_r2_c1_3_2850_285       8.444373885   4.84E-16  3.00E-15       hypothetical protein MTR_0002s0250                    M. truncatula
Down-regulated
Shannon_shannonout_r2_c1_4_107911_2       -14.02365033  1.71E-16  3.00E-18       hypothetical protein L484_022991                      M. notabilis
Shannon_shannonout_cremaining2_41843_0    -8.613985355  0.022168  3.00E-06       hypothetical protein MTR_2155s0010                    M. truncatula
Shannon_shannonout_cremaining2_1629_6     -8.439373692  0.038786  3.00E-164      pyruvate, phosphate dikinase 1, chloroplastic-like    Z. mays
Shannon_shannonout_cremaining3_20429_1    -8.343444965  0.049409  0              vacuolar cation/proton exchanger 1a-like              S. italica

White Sel 6 Infested Vs Control
Up-regulated
Shannon_shannonout_r2_c1_4_1550_155       3.196794989   0.004816  2.00E-48       hypothetical protein MTR_0021s0160                    M. truncatula
Single_5                                  4.243803055   6.97E-05  4.00E-08       hypothetical protein L484_018823                      M. notabilis
Shannon_shannonout_r2_c1_0_8413_10        4.258688291   0.000115  7.00E-12       hypothetical protein L484_004636                      M. notabilis
Shannon_shannonout_cremaining1_single_66  4.519856669   1.44E-05  2.00E-10       Metal transporter Nramp5                              M. notabilis
Shannon_shannonout_c1_5_240_24            5.882740512   1.44E-06  0              maturase K (plastid)                                  S. michauxianus
Shannon_shannonout_r2_c1_0_8171_0         8.550285955   0.001104  4.00E-163      LOW QUALITY PROTEIN: cytochrome c oxidase subunit 3   G. raimondii
Shannon_shannonout_r2_c1_0_8673_36        12.55141067   1.18E-13  3.00E-50       Metal transporter Nramp5                              M. notabilis
Down-regulated
Shannon_shannonout_r2_c1_0_8323_1         -8.639061386  0.000633  3.00E-12       hypothetical protein (mitochondrion)                  B. braunii
Shannon_shannonout_cremaining5_single_54  -8.04367513   0.007658  1.00E-115      PREDICTED: tetraspanin-8-like                         S. italica
Single_979                                -7.846539569  0.014804  0              PREDICTED: acetyl-CoA carboxylase 2-like isoform X1   S. italica
Shannon_shannonout_r2_c1_5_2680_268       -7.618138013  0.031388  4.00E-09       uncharacterized protein LOC100502416                  Z. mays
Shannon_shannonout_cremaining4_single_33  -7.567774174  0.031388  7.00E-121      finger CCCH domain-containing protein 33              S. italica
Single_403                                -7.567774174  0.031388  2.00E-49       protein EARLY RESPONSIVE TO DEHYDRATION 15-like       Z. mays
Shannon_shannonout_r2_c1_11_4550_455      -7.515588359  0.036775  6.00E-39       MOB kinase activator-like 1A                          O. sativa
Shannon_shannonout_r2_c1_11_10325_2       -7.515588359  0.036775  2.00E-62       uncharacterized protein LOC107030119                  S. pennellii
Shannon_shannonout_cremaining4_26043_0    -7.461443781  0.043042  9.00E-149      26 kDa endochitinase 1                                B. distachyon
Shannon_shannonout_cremaining8_24282_1    -7.405187644  0.049887  0              protein TSS                                           S. italica

Appendix 13: Distribution of Bi-allelic SNPs Ts/Tv ratio across all accessions

Accession     Number   Bi-allelic SNPs   %      A/G Ts %   C/T Ts %   T/G Tv %   A/C Tv %   A/T Tv %   C>G Tv %   Ts/Tv

GuluE as reference
IE2396        53141    52110             98.1   37.1       33.8       6.5        6.9        7.8        6          2.61
IE2459        46660    45960             98.5   35.8       33.7       6.8        7.4        7.7        7          2.4
White Sel6    48502    47618             98.2   36.9       34         6.4        6.8        7.8        6.3        2.61

IE2396 as reference
GuluE         54278    53218             98     37.3       33.9       6.3        7          7.8        5.8        2.65
IE2459        63190    62425             98.8   35.3       32.7       7.3        7.8        7.8        7.8        2.21
White Sel6    55624    54799             98.5   36.1       33.4       6.8        7.5        7.8        6.9        2.39

IE2459 as reference
GuluE         47362    46553             98.3   36.5       34         6.7        7.1        7.6        6.4        2.53
IE2396        58336    57542             98.6   35.4       33.9       7          7.4        7.7        7.2        2.36
White Sel6    66447    65695             98.9   34.9       33.4       7.2        7.6        7.6        8.1        2.23

White Sel6 as reference
GuluE         51595    50577             98     37.1       33.7       6.3        7.2        7.6        6.2        2.59
IE2396        57203    56264             98.4   36         33.5       6.7        7.3        7.9        6.9        2.41
IE2459        74042    73334             99     34.8       33         7.4        7.6        7.7        8.5        2.17
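
The Ts/Tv column can be reproduced from the class percentages in the same row, since the ratio is the summed transition percentages divided by the summed transversion percentages. The sketch below illustrates this for IE2396 called against the GuluE reference; it is an illustration of the arithmetic only, not the SNP-calling code used in this study, and the names TRANSITIONS, TRANSVERSIONS and ts_tv_ratio are hypothetical.

```python
# Percentages of bi-allelic SNP classes for IE2396 with GuluE as reference,
# copied from the table above. All names here are illustrative assumptions.
TRANSITIONS = {"A/G": 37.1, "C/T": 33.8}
TRANSVERSIONS = {"T/G": 6.5, "A/C": 6.9, "A/T": 7.8, "C>G": 6.0}


def ts_tv_ratio(transitions, transversions):
    """Return the transition/transversion ratio from per-class percentages."""
    return sum(transitions.values()) / sum(transversions.values())


ratio = ts_tv_ratio(TRANSITIONS, TRANSVERSIONS)
print(f"Ts/Tv = {ratio:.2f}")
```

Because the tabulated percentages are rounded to one decimal place, a ratio recomputed this way may differ slightly in the last digit from the tabulated value for some rows.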

Appendix 14: Statistical summary of the de novo assembly results per finger millet accession.

                    White Sel6   IE2459    GuluE     IE2396
Trinity assembly    131305       119641    144309    123110
CD-HIT clusters     125630       111811    138557    117856

Shannon assembly    39186        63604     28110     33156
CD-HIT clusters     33368        51627     25154     29013

The minimum contig length for the reference assembly is 200 bp.

Appendix 15: IE2459 Samples vs transcripts heatmap

Appendix 16: White Sel6 Samples vs transcripts heatmap

Appendix 17: IE2396 Samples vs transcripts heatmap

Appendix 18: GuluE Samples vs transcripts heatmap

Appendix 19: Summary of the Differentially Expressed transcripts mapping to KEGG Pathways

Genotypes          Infection stages   Up-regulated   Down-regulated
IE2459 (S)         3 dpi              0              0
                   5 dpi              63             0
                   7 dpi              0              0
White Sel 6 (S)    3 dpi              0              0
                   5 dpi              31             0
                   7 dpi              0              10
GuluE (T)          3 dpi              0              0
                   5 dpi              0              0
                   7 dpi              0              0
IE2396 (T)         3 dpi              0              0
                   5 dpi              14             0
                   7 dpi              0              0
Total                                 108            10

Appendix 20: Phenylpropanoid Biosynthesis pathway

Appendix 21: Tetracycline Biosynthesis pathway

Appendix 22: Purine Biosynthesis pathway